Foxit PDF Conversion SDK

How to convert PDF to Office using Node.js

System requirements

Platform: Windows
Programming Language: Node.js
License Key requirement: ‘pdf2office’ module permission in the license key
SDK Version: Foxit PDF Conversion SDK 1.0 or later

Foxit PDF Conversion SDK is a flexible, high-performance library that converts PDF files to Microsoft Office formats while preserving the layout and formatting of the original documents.

As of version 1.0, Foxit PDF Conversion SDK supports Node.js, an open-source, cross-platform, server-side JavaScript runtime environment.

With this support, developers can use Foxit PDF Conversion SDK to build conversion tools and applications in Node.js.

Requirements

● Node.js version 8 to 18 is required.
● With Node.js 8 to 18, use npm to install the foxit-pdf-conversion-sdk-node library.
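
The supported range can be checked when an application starts. The following is a minimal sketch; the helper name isSupportedNodeVersion is hypothetical and not part of the SDK, and the bounds 8 and 18 are taken from the requirement above.

```javascript
// Hypothetical helper (not part of the SDK): checks the documented Node.js range 8 to 18.
function isSupportedNodeVersion(versionString) {
  // process.versions.node looks like "18.17.0"; the major version is the first field.
  var major = parseInt(String(versionString).split(".")[0], 10);
  return major >= 8 && major <= 18;
}

if (!isSupportedNodeVersion(process.versions.node)) {
  console.warn("Node.js " + process.versions.node + " is outside the supported range (8 to 18).");
}
```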

Installation

Use npm to install the Node.js package for Foxit PDF Conversion SDK:

npm i @foxitsoftware/foxit-pdf-conversion-sdk-node

Working with Foxit PDF Conversion SDK Node.js API

Converting PDF to Office with the Foxit PDF Conversion SDK Node.js API is straightforward. The PDF input can be supplied either as a file path or as a stream. Before using the API, review the function signatures and the definitions of their input parameters.

PDFConversionSDK.PDF2Office.StartConvertToWordWithPath(src_pdf_path, src_pdf_password, saved_word_file_path, setting_data, convert_callback);
PDFConversionSDK.PDF2Office.StartConvertToPPTWithPath(src_pdf_path, src_pdf_password, saved_ppt_file_path, setting_data, convert_callback);
PDFConversionSDK.PDF2Office.StartConvertToExcelWithPath(src_pdf_path, src_pdf_password, saved_excel_file_path, setting_data, convert_callback);
PDFConversionSDK.PDF2Office.StartConvertToWordWithStream(src_pdf_reader, src_pdf_password, saved_word_file_stream, setting_data, convert_callback);
PDFConversionSDK.PDF2Office.StartConvertToPPTWithStream(src_pdf_reader, src_pdf_password, saved_ppt_file_stream, setting_data, convert_callback);
PDFConversionSDK.PDF2Office.StartConvertToExcelWithStream(src_pdf_reader, src_pdf_password, saved_excel_file_stream, setting_data, convert_callback);

Parameters (Windows)

src_pdf_path: Path of a PDF file. This should not be an empty string.
src_pdf_password: Password for the input PDF file. If the file needs no password, pass an empty string.
saved_word_file_path: Path of the Word format file saved as the conversion result. This should not be an empty string. If the file name suffix is not “docx”, the suffix “docx” is added to the original file name.
saved_ppt_file_path: Path of the PowerPoint format file saved as the conversion result. This should not be an empty string. If the file name suffix is not “pptx”, the suffix “pptx” is added to the original file name.
saved_excel_file_path: Path of the Excel format file saved as the conversion result. This should not be an empty string. If the file name suffix is not “xlsx”, the suffix “xlsx” is added to the original file name.
src_pdf_reader: ReaderCallback object implemented by the user to load a PDF document. It should not be NULL.
saved_word_file_stream: StreamCallback object implemented by the user to read the contents of the converted Word format file. It should not be NULL.
saved_ppt_file_stream: StreamCallback object implemented by the user to read the contents of the converted PowerPoint format file. It should not be NULL.
saved_excel_file_stream: StreamCallback object implemented by the user to read the contents of the converted Excel format file. It should not be NULL.
setting_data: Setting data used for converting.
convert_callback: PDFConversionSDK.ConvertCallback object implemented by the user to pause the conversion and to be notified of its progress. This can be NULL, which means the conversion is neither paused nor reported. If it is not NULL, it should be a valid PDFConversionSDK.ConvertCallback object implemented by the user. Default value: NULL.
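
The suffix rule for the saved file paths can be illustrated with a small helper. This is only a sketch of the rule as worded in the parameter descriptions: the function name predictSavedPath is hypothetical, and both the appending of the suffix (rather than replacing the existing one) and the case-insensitive comparison are assumptions, since the exact behavior of the SDK is not spelled out here.

```javascript
var path = require("path");

// Hypothetical helper (not part of the SDK): predicts the file name that results
// from the documented rule "if the suffix is not X, a new suffix X is added".
function predictSavedPath(requested_path, suffix) {
  if (typeof requested_path !== "string" || requested_path.length === 0) {
    throw new Error("The saved file path should not be an empty string.");
  }
  // Assumption: the suffix comparison ignores case.
  var current = path.extname(requested_path).slice(1).toLowerCase();
  return current === suffix ? requested_path : requested_path + "." + suffix;
}

console.log(predictSavedPath("report.docx", "docx")); // prints "report.docx"
console.log(predictSavedPath("report.doc", "docx"));  // prints "report.doc.docx"
```

Passing a path that already ends in the expected suffix avoids any ambiguity about the final file name.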

Import the Foxit PDF Conversion SDK Node.js module

var {
  ErrorCode,
  State,
  Library,
  PDF2OfficeSettingData,
  ConvertCallback,
  PDF2Office,
  Progressive,
  ReaderCallback,
  StreamCallback } = require("@foxitsoftware/foxit-pdf-conversion-sdk-node");
// The callback classes below use the built-in fs module.
var fs = require("fs");

Specify PDF2OfficeSettingData for pdf2office

Before conversion, set the path to the metrics data files and choose whether to enable machine learning-based recognition. The metrics data files are provided in the “res/metrics_data” folder of the Foxit PDF Conversion SDK package. Enabling machine learning-based recognition can improve recognition results for PDF documents; the recognition runs on the server side and returns its results when it finishes. In the following code, machine learning-based recognition is disabled.

var setting_data = new PDF2OfficeSettingData('../../res/metrics_data', false);
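
Because the metrics data path is relative, it may help to verify it before creating the setting data. The following is a minimal sketch using only Node.js built-ins; the helper name resolveMetricsDataDir is hypothetical, and whether the SDK reports a missing folder on its own is not stated here.

```javascript
var fs = require("fs");
var path = require("path");

// Hypothetical helper (not part of the SDK): resolves the metrics data folder
// and fails early with a readable message if it does not exist.
function resolveMetricsDataDir(dir) {
  var resolved = path.resolve(dir);
  if (!fs.existsSync(resolved) || !fs.statSync(resolved).isDirectory()) {
    throw new Error("Metrics data folder not found: " + resolved);
  }
  return resolved;
}

// Possible use with the constructor shown above (assumes an absolute path is accepted):
//   var setting_data = new PDF2OfficeSettingData(resolveMetricsDataDir('../../res/metrics_data'), false);
```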

Specify PDFConversionSDK.ConvertCallback

class CustomConvertCallback extends ConvertCallback {
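  // Returning true allows the conversion to pause; the calling code then
  // resumes it with Progressive.Continue().
  NeedToPause() {
    return true;
  }
 
  // Receives the conversion progress; left empty in this example.
  ProgressNotify(converted_count, total_count) {
  }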
}
 
var custom_callback = new CustomConvertCallback();
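
ProgressNotify is left empty above. One way to fill it in is to log a progress line; the formatter below is a sketch, and the name formatProgress is hypothetical rather than part of the SDK.

```javascript
// Hypothetical helper: turns the two counts passed to ProgressNotify into a log line.
function formatProgress(converted_count, total_count) {
  // Guard against a zero total to avoid dividing by zero.
  var percent = total_count > 0 ? Math.floor((converted_count * 100) / total_count) : 0;
  return converted_count + "/" + total_count + " (" + percent + "%)";
}

// Inside CustomConvertCallback it could be used as:
//   ProgressNotify(converted_count, total_count) {
//     console.log(formatProgress(converted_count, total_count));
//   }
console.log(formatProgress(3, 10)); // prints "3/10 (30%)"
```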

Specify PDFConversionSDK.ReaderCallback

class CustomReaderCallback extends ReaderCallback {
  LoadFile(file_path) {
    this.file_path = file_path;
    this.fd = fs.openSync(file_path, 'r');
  }
 
  GetSize() {
    var states = fs.statSync(this.file_path);
    return states.size;
  }
 
  ReadBlock(offset, size) {
    var buf = Buffer.alloc(size);
    var read_size = fs.readSync(this.fd, buf, 0, size, offset);
    if (read_size == 0) {
      return "";
    }
    return buf.toString('binary');
  }
}
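
The block-reading logic can be tried without the SDK. The sketch below mirrors ReadBlock(offset, size) as a standalone function (the name readBlock and the temporary file are illustrative only). Unlike the class above, it returns only the bytes that were actually read, which matters when the last block is shorter than the requested size; whether the SDK requires this is an assumption.

```javascript
var fs = require("fs");
var os = require("os");
var path = require("path");

// Hypothetical standalone counterpart of ReadBlock(offset, size).
function readBlock(fd, offset, size) {
  var buf = Buffer.alloc(size);
  var read_size = fs.readSync(fd, buf, 0, size, offset);
  // Keep only the bytes that were read; nothing read yields an empty string.
  return buf.slice(0, read_size).toString("binary");
}

// Exercise it on a small temporary file.
var demo_path = path.join(os.tmpdir(), "readblock_demo_" + process.pid + ".bin");
fs.writeFileSync(demo_path, "hello world");
var demo_fd = fs.openSync(demo_path, "r");
var tail = readBlock(demo_fd, 6, 100);     // asks for more bytes than remain
var past_end = readBlock(demo_fd, 50, 4);  // offset beyond the end of the file
fs.closeSync(demo_fd);
fs.unlinkSync(demo_path);
console.log(JSON.stringify(tail), JSON.stringify(past_end)); // prints "world" ""
```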

Specify PDFConversionSDK.StreamCallback

class CustomStreamCallback extends StreamCallback {
  LoadFile(file_path) {
    this.file_path = file_path;
    this.fd = fs.openSync(file_path, 'w+')
    this.curpos = 0;
    this.offset = 0;
  }
 
  Retain() {
    return this;
  }
 
  Release() {
    fs.closeSync(this.fd);
  }
 
  GetSize() {
    var states = fs.statSync(this.file_path);
    return states.size;
  }
 
  IsEOF() {
    var states = fs.statSync(this.file_path);
    return this.curpos - this.offset >= states.size;
  }
 
  GetPosition() {
    return this.curpos;
  }
 
  ReadBlock0(offset, size) {
    var buf = Buffer.alloc(size);
    this.offset = offset;
    var read_size = fs.readSync(this.fd, buf, 0, size, offset);
    if (read_size == 0) {
      return false;
    }
    this.curpos = offset + size;
    return buf.toString('binary');
  }
 
  ReadBlock1(buffer, size) {
    // Assumed intent: read `size` bytes starting at the current position.
    return this.ReadBlock0(this.curpos, size);
  }
 
  WriteBlock(buffer, offset, size) {
    this.offset = offset;
    var buf = Buffer.from(buffer, 'binary');
    var write_size = fs.writeSync(this.fd, buf, 0, size, offset);
    if (write_size == size) {
      this.curpos = offset + size;
      return true;
    }
    return false;
  }
 
  Flush(user_data) {
  }
}
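
Both callbacks pass file contents as strings with the 'binary' encoding, which is Node.js's alias for latin1. The short check below, which uses only Node.js built-ins, illustrates why that encoding suits this purpose: every byte value survives a round trip through a 'binary' string, whereas a round trip through 'utf8' does not preserve arbitrary bytes.

```javascript
// Build a buffer that contains every possible byte value once.
var bytes = Buffer.alloc(256);
for (var i = 0; i < 256; i++) {
  bytes[i] = i;
}

// Round trip through a 'binary' (latin1) string, as the callbacks do.
var as_string = bytes.toString("binary");
var back = Buffer.from(as_string, "binary");

// The same round trip through 'utf8' for comparison.
var utf8_back = Buffer.from(bytes.toString("utf8"), "utf8");

console.log(back.equals(bytes));      // prints true
console.log(utf8_back.equals(bytes)); // prints false
```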

Call conversion

Call the PDF2Office.StartConvertToWordWithPath function to start converting a PDF to Word. The calls for PowerPoint and Excel follow the same pattern.

// convert pdf to word with path
try {
  var progressive = PDF2Office.StartConvertToWordWithPath(src_pdf_path, "", saved_word_file_path, setting_data, custom_callback);
  if (progressive.GetRateOfProgress() != 100) {
    var state = State.e_ToBeContinued;
    while (State.e_ToBeContinued == state) {
      state = progressive.Continue();
    }
  }
  console.log("Convert PDF file to Word format file with path.");
} catch (e) {
  console.log(e.message);
}
 
// convert pdf to word with stream
try {
  var custom_readercallback_word = new CustomReaderCallback();
  var custom_streamcallback_word = new CustomStreamCallback();
  custom_readercallback_word.LoadFile(src_pdf_path);
  custom_streamcallback_word.LoadFile(saved_word_file_path);
  var progressive = PDF2Office.StartConvertToWordWithStream(custom_readercallback_word, "", custom_streamcallback_word, setting_data, custom_callback);
  if (progressive.GetRateOfProgress() != 100) {
    var state = State.e_ToBeContinued;
    while (State.e_ToBeContinued == state) {
      state = progressive.Continue();
    }
  }
  console.log("Convert PDF file to Word format file with stream.");
  // Drop the references; `delete` has no effect on variables declared with var.
  custom_readercallback_word = null;
  custom_streamcallback_word = null;
} catch (e) {
  console.log(e.message);
}
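
Both examples repeat the same loop that calls Continue() until the conversion stops reporting a to-be-continued state. A sketch of that loop as a reusable function follows. The names driveConversion and makeFakeProgressive are hypothetical, and the fake object only stands in for the SDK's Progressive so that the loop can run without the SDK installed.

```javascript
// Hypothetical helper: drives a progressive object to completion.
// With the SDK, to_be_continued would be State.e_ToBeContinued.
function driveConversion(progressive, to_be_continued) {
  var steps = 0;
  var state = null;
  if (progressive.GetRateOfProgress() !== 100) {
    state = to_be_continued;
    while (state === to_be_continued) {
      state = progressive.Continue();
      steps += 1;
    }
  }
  return { state: state, steps: steps };
}

// Stand-in for the SDK's Progressive object (illustration only).
function makeFakeProgressive(total_steps) {
  var done = 0;
  return {
    GetRateOfProgress: function () { return Math.floor((done * 100) / total_steps); },
    Continue: function () {
      done += 1;
      return done < total_steps ? "to-be-continued" : "finished";
    }
  };
}

var result = driveConversion(makeFakeProgressive(3), "to-be-continued");
console.log(result.state, result.steps); // prints finished 3
```

With the SDK, the call would look like driveConversion(progressive, State.e_ToBeContinued), where progressive is the value returned by one of the StartConvertTo... functions.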

How to Run conversion in foxit-pdf-sdk-web-demo

foxit-pdf-sdk-web-demo is an online demo application based on Foxit PDF SDK that helps users learn and become familiar with the features and capabilities of Foxit PDF SDK. It now includes a demo of the pdf2office function, built with Foxit PDF Conversion SDK.

How to get

The conversion demo is located in the foxit-pdf-sdk-web-demo/examples/conversion directory. PDFToOfficeService is a Node.js service that converts PDF to Office formats using Foxit PDF Conversion SDK.

How to run

Install foxit-pdf-sdk-web-demo dependencies

cd foxit-pdf-sdk-web-demo
npm i

Install conversion server dependencies

cd foxit-pdf-sdk-web-demo/examples/conversion/PDFToOfficeService
npm i

Start conversion server

cd foxit-pdf-sdk-web-demo
npm run start:conversion-server

Start conversion client 

Set the serverUrlBase value in foxit-pdf-sdk-web-demo/examples/version/config.ts as follows: export let serverUrlBase = `http://localhost:19113`.

cd foxit-pdf-sdk-web-demo
npm install http-server -g
npm run build
http-server ./dist -p 8083

How to use

1. Open http://localhost:8083/#/conversion in your browser to access the demo.

2. Click on the “Upload” button to upload a PDF file.

3. Once the PDF has been successfully uploaded, select one of the conversion options to convert the PDF to the desired Office document format.

4. If your PDF file contains tables, it is recommended that you select the option to “Use AI to recognize borderless tables” for better table extraction.

6. Once the conversion is complete, click on the “Download” button to download the converted file.

Updated on October 13, 2023
