How to convert PDF to Office (pdf2office) using Node.js
System requirements
Platform: Windows
Programming Language: Node.js
License Key requirement: ‘pdf2office’ module permission in the license key
SDK Version: Foxit PDF CONVERSION SDK 1.0 or later
Foxit PDF Conversion SDK is a flexible, high-performance library to convert PDF files to MS office suite formats while maintaining the layout and format of your original documents.
From version 1.0, Foxit PDF CONVERSION SDK has added support for the Node.js programming language, which is an open-source, cross-platform, server-side JavaScript runtime environment.
With this new support for Node.js, developers can now use the Foxit PDF CONVERSION SDK to build powerful conversion tools and applications using the popular programming language.
Requirement
● Node.js version 8 to 18.
● npm, which is used to install the foxit-pdf-conversion-sdk-node library.
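A script can check this range at start-up before it loads the SDK. The sketch below is only an illustration: the helper name isSupportedNodeVersion is hypothetical, and the bounds 8 and 18 are taken from the requirement above.

```javascript
// Hypothetical helper: checks a Node.js version string against the
// 8 ~ 18 range stated in the requirements above.
function isSupportedNodeVersion(version) {
  // "18.17.1" -> 18; a leading "v" (as in process.version) is tolerated.
  var major = parseInt(String(version).replace(/^v/, "").split(".")[0], 10);
  return major >= 8 && major <= 18;
}

if (!isSupportedNodeVersion(process.versions.node)) {
  console.warn("Node " + process.versions.node + " is outside the supported range (8 ~ 18).");
}
```

An unparsable version string yields NaN for the major number, so the helper reports it as unsupported rather than throwing.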
Installation
Use npm to install the Node.js package for the Foxit PDF Conversion SDK.
npm i @foxitsoftware/foxit-pdf-conversion-sdk-node
Working with Foxit PDF Conversion SDK Node.js API
It’s easy to use Foxit PDF Conversion SDK Node.js for converting PDF to Office. You can supply the source PDF either as a file path or as a stream. Before using the API, first look at the function signatures and the definition of their input parameters.
PDFConversionSDK.PDF2Office.StartConvertToWordWithPath(src_pdf_path, src_pdf_password, saved_word_file_path, setting_data, convert_callback);
PDFConversionSDK.PDF2Office.StartConvertToPPTWithPath(src_pdf_path, src_pdf_password, saved_ppt_file_path, setting_data, convert_callback);
PDFConversionSDK.PDF2Office.StartConvertToExcelWithPath(src_pdf_path, src_pdf_password, saved_excel_file_path, setting_data, convert_callback);
PDFConversionSDK.PDF2Office.StartConvertToWordWithStream(src_pdf_reader, src_pdf_password, saved_word_file_stream, setting_data, convert_callback);
PDFConversionSDK.PDF2Office.StartConvertToPPTWithStream(src_pdf_reader, src_pdf_password, saved_ppt_file_stream, setting_data, convert_callback);
PDFConversionSDK.PDF2Office.StartConvertToExcelWithStream(src_pdf_reader, src_pdf_password, saved_excel_file_stream, setting_data, convert_callback);
Parameter | Description |
src_pdf_path | Path of a PDF file. This should not be an empty string. |
src_pdf_password | Password for the input PDF file. If no password is needed for the file, please pass an empty string. |
saved_word_file_path | Path of the saved Word format file as conversion result. This should not be an empty string. If the suffix name of the saved Word format file is not “docx”, a new suffix named “docx” will be added to the original file name. |
saved_ppt_file_path | Path of the saved PowerPoint format file as conversion result. This should not be an empty string. If the suffix name of the saved PowerPoint format file is not “pptx”, a new suffix named “pptx” will be added to the original file name. |
saved_excel_file_path | Path of the saved Excel format file as conversion result. This should not be an empty string. If the suffix name of the saved Excel format file is not “xlsx”, a new suffix named “xlsx” will be added to the original file name. |
src_pdf_reader | A ReaderCallback object which is implemented by user to load a PDF document. It should not be NULL. |
saved_word_file_stream | A StreamCallback object which is implemented by user to read the contents of the converted Word format file. It should not be NULL. |
saved_ppt_file_stream | A StreamCallback object which is implemented by user to read the contents of the converted PowerPoint format file. It should not be NULL. |
saved_excel_file_stream | A StreamCallback object which is implemented by user to read the contents of the converted Excel format file. It should not be NULL. |
setting_data | Setting data used for converting. |
convert_callback | A PDFConversionSDK.ConvertCallback object which is implemented by the user to pause and notify the conversion progress during the converting process. This can be NULL which means not to pause and notify the conversion progress. If this is not NULL, it should be a valid PDFConversionSDK.ConvertCallback object implemented by user. Default value: NULL. |
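The constraints in the table for the path-based functions can be checked before the SDK is called. The following is a minimal sketch under that reading of the table; checkPathArguments is a hypothetical helper, not part of the SDK, and the file names are placeholders.

```javascript
// Hypothetical pre-flight check mirroring the constraints in the table above.
// It does not call the SDK; it only validates the values you intend to pass.
function checkPathArguments(src_pdf_path, src_pdf_password, saved_file_path) {
  var problems = [];
  if (typeof src_pdf_path !== "string" || src_pdf_path === "") {
    problems.push("src_pdf_path must be a non-empty string");
  }
  // An unprotected PDF still needs a password argument: the empty string.
  if (typeof src_pdf_password !== "string") {
    problems.push("src_pdf_password must be a string (use \"\" when no password is needed)");
  }
  if (typeof saved_file_path !== "string" || saved_file_path === "") {
    problems.push("saved file path must be a non-empty string");
  }
  return problems;
}

console.log(checkPathArguments("input.pdf", "", "output.docx")); // empty list: nothing to fix
console.log(checkPathArguments("", undefined, "output.docx"));   // two problems reported
```

Returning a list of problems instead of throwing on the first one lets a caller report every invalid argument at once.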
Import the Foxit PDF Conversion SDK Node module
var fs = require("fs"); // used by the ReaderCallback and StreamCallback samples
var {
ErrorCode,
State,
Library,
PDF2OfficeSettingData,
ConvertCallback,
PDF2Office,
Progressive,
ReaderCallback,
StreamCallback } = require("@foxitsoftware/foxit-pdf-conversion-sdk-node");
Specify PDF2OfficeSettingData for pdf2office
Before conversion, set the path to the metrics data files and choose whether to enable machine learning based recognition. The metrics data files are provided in the “res/metrics_data” folder of the Foxit PDF Conversion SDK package. Machine learning based recognition improves recognition results for PDF documents; it runs on the server side and returns the relevant results when it is done. It is disabled in the following code.
var setting_data = new PDF2OfficeSettingData('../../res/metrics_data', false);
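A relative path such as '../../res/metrics_data' is easy to get wrong, so it can help to resolve it and confirm the folder exists before it is used. The sketch below is one way to do that with Node's fs and path modules; resolveMetricsDataPath is a hypothetical helper, and resolving against __dirname is an assumption about where your script sits relative to the SDK package.

```javascript
var fs = require("fs");
var path = require("path");

// Hypothetical helper: resolves the metrics data folder against a base
// directory and fails early if it is missing.
function resolveMetricsDataPath(base_dir, relative_path) {
  var resolved = path.resolve(base_dir, relative_path);
  if (!fs.existsSync(resolved) || !fs.statSync(resolved).isDirectory()) {
    throw new Error("Metrics data folder not found: " + resolved);
  }
  return resolved;
}

try {
  // '../../res/metrics_data' is the relative path used in the sample above.
  var metrics_path = resolveMetricsDataPath(__dirname, "../../res/metrics_data");
  console.log("Using metrics data from " + metrics_path);
} catch (e) {
  console.log(e.message);
}
```

The resolved absolute path can then be passed as the first argument of PDF2OfficeSettingData in place of the relative one.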
Specify PDFConversionSDK.ConvertCallback
class CustomConvertCallback extends ConvertCallback {
NeedToPause() {
return true;
}
ProgressNotify(converted_count, total_count) {
}
}
var custom_callback = new CustomConvertCallback();
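ProgressNotify is left empty in the sample above. A small helper like the one below could be called from it to log progress. This is a sketch only: formatProgress is a hypothetical name, and it assumes converted_count and total_count are numeric counters, which is not stated explicitly here.

```javascript
// Hypothetical helper that a ProgressNotify(converted_count, total_count)
// implementation could call to produce a log line.
function formatProgress(converted_count, total_count) {
  // Guard against a zero total to avoid dividing by zero.
  var percent = total_count > 0
    ? Math.floor((converted_count * 100) / total_count)
    : 0;
  return "Converted " + converted_count + " of " + total_count + " (" + percent + "%)";
}

console.log(formatProgress(3, 12)); // prints "Converted 3 of 12 (25%)"
```

Inside CustomConvertCallback, the body of ProgressNotify would then be a single console.log(formatProgress(converted_count, total_count)) call.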
Specify PDFConversionSDK.ReaderCallback
class CustomReaderCallback extends ReaderCallback {
LoadFile(file_path) {
this.file_path = file_path;
this.fd = fs.openSync(file_path, 'r');
}
GetSize() {
var states = fs.statSync(this.file_path);
return states.size;
}
ReadBlock(offset, size) {
var buf = Buffer.alloc(size);
var read_size = fs.readSync(this.fd, buf, 0, size, offset);
if (read_size == 0) {
return "";
}
return buf.toString('binary');
}
}
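The ReadBlock method above relies on positional reads with fs.readSync and on the 'binary' (latin1) string encoding, which maps each byte to one character. The following self-contained sketch shows that pattern on a temporary file, without the SDK. The helper name and the file name are illustrative. Unlike the sample above, it returns only the bytes actually read; whether the SDK expects short reads to be padded is not stated here, so treat that as an assumption.

```javascript
var fs = require("fs");
var os = require("os");
var path = require("path");

// Read `size` bytes starting at `offset` and return them as a 'binary'
// (latin1) string, the same shape ReadBlock returns above.
function readBlockAsBinaryString(fd, offset, size) {
  var buf = Buffer.alloc(size);
  var read_size = fs.readSync(fd, buf, 0, size, offset);
  // Only the bytes actually read are returned, so a short read near the
  // end of the file is not padded with zero bytes.
  return buf.slice(0, read_size).toString("binary");
}

// Demo on a temporary file (the file name is illustrative).
var demo_dir = fs.mkdtempSync(path.join(os.tmpdir(), "reader-demo-"));
var demo_file = path.join(demo_dir, "sample.bin");
fs.writeFileSync(demo_file, Buffer.from([0x25, 0x50, 0x44, 0x46, 0xff, 0x00]));

var demo_fd = fs.openSync(demo_file, "r");
var header = readBlockAsBinaryString(demo_fd, 0, 4); // first four bytes: "%PDF"
var tail = readBlockAsBinaryString(demo_fd, 4, 10);  // asks for 10, only 2 remain
fs.closeSync(demo_fd);
fs.unlinkSync(demo_file);
fs.rmdirSync(demo_dir);

console.log("header:", header, "tail bytes:", tail.length);
```

Because 'binary' maps bytes 0x00 to 0xFF onto single characters, the length of the returned string equals the number of bytes read.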
Specify PDFConversionSDK.StreamCallback
class CustomStreamCallback extends StreamCallback {
LoadFile(file_path) {
this.file_path = file_path;
this.fd = fs.openSync(file_path, 'w+')
this.curpos = 0;
this.offset = 0;
}
Retain() {
return this;
}
Release() {
fs.closeSync(this.fd);
}
GetSize() {
var states = fs.statSync(this.file_path);
return states.size;
}
IsEOF() {
var states = fs.statSync(this.file_path);
return this.curpos - this.offset >= states.size;
}
GetPosition() {
return this.curpos;
}
ReadBlock0(offset, size) {
var buf = Buffer.alloc(size);
this.offset = offset;
var read_size = fs.readSync(this.fd, buf, 0, size, offset);
if (read_size == 0) {
return false;
}
this.curpos = offset + size;
return buf.toString('binary');
}
ReadBlock1(buffer, size) {
// Sequential read: continue from the current position.
return this.ReadBlock0(this.curpos, size);
}
WriteBlock(buffer, offset, size) {
this.offset = offset;
var buf = Buffer.from(buffer, 'binary');
var write_size = fs.writeSync(this.fd, buf, 0, size, offset);
if (write_size == size) {
this.curpos = offset + size;
return true;
}
return false;
}
Flush(user_data) {
}
}
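The write side of the stream callback uses the mirror-image pattern: a 'binary' string is turned back into bytes with Buffer.from and written at an explicit offset with fs.writeSync. The sketch below exercises that round trip on a temporary file, independent of the SDK; writeBinaryString and the file name are illustrative.

```javascript
var fs = require("fs");
var os = require("os");
var path = require("path");

var out_dir = fs.mkdtempSync(path.join(os.tmpdir(), "stream-demo-"));
var out_file = path.join(out_dir, "out.bin");
var out_fd = fs.openSync(out_file, "w+"); // same flag as LoadFile above

// Write a 'binary' string at an explicit offset, as WriteBlock does.
// Returns the number of bytes written.
function writeBinaryString(fd, data, offset) {
  var buf = Buffer.from(data, "binary");
  return fs.writeSync(fd, buf, 0, buf.length, offset);
}

// Positional writes place each block at its offset regardless of call order.
writeBinaryString(out_fd, "WORLD", 5);
writeBinaryString(out_fd, "HELLO", 0);

var round_trip = Buffer.alloc(10);
fs.readSync(out_fd, round_trip, 0, 10, 0);
fs.closeSync(out_fd);
fs.unlinkSync(out_file);
fs.rmdirSync(out_dir);

console.log(round_trip.toString("binary")); // prints "HELLOWORLD"
```

The 'w+' flag opens the file for both reading and writing and truncates it, which is why the same descriptor can serve ReadBlock0 and WriteBlock in the class above.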
Call conversion
Call the PDF2Office.StartConvertToWordWithPath function to begin converting PDF to Word. The interface call methods for PPT and Excel are similar.
// convert pdf to word with path
try {
var progressive = PDF2Office.StartConvertToWordWithPath(src_pdf_path, "", saved_word_file_path, setting_data, custom_callback);
if (progressive.GetRateOfProgress() != 100) {
var state = State.e_ToBeContinued;
while (State.e_ToBeContinued == state) {
state = progressive.Continue();
}
}
console.log("Convert PDF file to Word format file with path.");
} catch (e) {
console.log(e.message);
}
// convert pdf to word with stream
try {
var custom_readercallback_word = new CustomReaderCallback();
var custom_streamcallback_word = new CustomStreamCallback();
custom_readercallback_word.LoadFile(src_pdf_path);
custom_streamcallback_word.LoadFile(saved_word_file_path);
var progressive = PDF2Office.StartConvertToWordWithStream(custom_readercallback_word, "", custom_streamcallback_word, setting_data, custom_callback);
if (progressive.GetRateOfProgress() != 100) {
var state = State.e_ToBeContinued;
while (State.e_ToBeContinued == state) {
state = progressive.Continue();
}
}
console.log("Convert PDF file to Word format file with stream.");
// Drop the references so the callback objects can be garbage-collected.
custom_readercallback_word = null;
custom_streamcallback_word = null;
} catch (e) {
console.log(e.message);
}
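Both samples above repeat the same loop: check GetRateOfProgress(), then call Continue() until the state is no longer State.e_ToBeContinued. That loop can be factored into a helper. The sketch below is a hypothetical refactoring, not part of the SDK. To stay runnable without the SDK it is exercised with a fake progressive object whose state values ("to_be_continued", "finished") are made up for the demo.

```javascript
// Drives any object that exposes GetRateOfProgress() and Continue() until
// Continue() stops returning `to_be_continued`. In real code, pass
// State.e_ToBeContinued from the SDK as the second argument.
// Returns the last state from Continue(), or null if no call was needed.
function runToCompletion(progressive, to_be_continued) {
  var state = null;
  if (progressive.GetRateOfProgress() != 100) {
    do {
      state = progressive.Continue();
    } while (state == to_be_continued);
  }
  return state;
}

// Fake progressive object for illustration only; the state values
// "to_be_continued" and "finished" are made up and are not SDK constants.
function makeFakeProgressive(steps) {
  var done = 0;
  return {
    GetRateOfProgress: function () { return Math.floor((done * 100) / steps); },
    Continue: function () {
      done += 1;
      return done < steps ? "to_be_continued" : "finished";
    }
  };
}

var fake = makeFakeProgressive(4);
var final_state = runToCompletion(fake, "to_be_continued");
console.log(final_state, fake.GetRateOfProgress()); // prints "finished 100"
```

With the SDK, the call would look like runToCompletion(progressive, State.e_ToBeContinued), where progressive is the object returned by one of the StartConvertTo... functions.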
How to Run conversion in foxit-pdf-sdk-web-demo
Foxit-pdf-sdk-web-demo is an online demo application based on Foxit PDF SDK that helps users learn and familiarize themselves with the various features and capabilities of Foxit PDF SDK. It now includes a demo of the pdf2office function built with the Foxit PDF Conversion SDK.
How to get
- Get foxit-pdf-sdk-web-demo from https://github.com/foxitsoftware/foxit-pdf-sdk-web-demo.git
The conversion demo is located in the foxit-pdf-sdk-web-demo/examples/conversion directory. PDFToOfficeService is a Node.js service that performs PDF to Office conversion with the Foxit PDF Conversion SDK.
How to run
Install foxit-pdf-sdk-web-demo dependencies
cd foxit-pdf-sdk-web-demo
npm i
Install conversion server dependencies
cd foxit-pdf-sdk-web-demo/examples/conversion/PDFToOfficeService
npm i
Start conversion server
cd foxit-pdf-sdk-web-demo
npm run start:conversion-server
Start conversion client
Change the serverUrlBase value in foxit-pdf-sdk-web-demo/examples/version/config.ts as follows: export let serverUrlBase = `http://localhost:19113`
cd foxit-pdf-sdk-web-demo
npm install http-server -g
npm run build
http-server ./dist -p 8083
How to use
1. Open http://localhost:8083/#/conversion in your browser to access the demo.
2. Click on the “Upload” button to upload a PDF file.
3. Once the PDF has been successfully uploaded, select one of the conversion options to convert the PDF to the desired office document format.
4. If your PDF file contains tables, it is recommended that you select the option to “Use AI to recognize borderless tables” for better table extraction.
5. Once the conversion is complete, click on the “Download” button to download the converted file.
Updated on October 13, 2023