Extract text from a defined rectangular area on a page
Foxit Quick PDF Library includes a range of functions for extracting text from PDF files, but most of them extract text from an entire page. The extraction functions that include “area” in their names let you specify a rectangular area from which to extract text. With the regular in-memory functions the key function is SetTextExtractionArea; with the direct access (DA) functions it is DASetTextExtractionArea.
Sample code demonstrating the use of the regular and DA functions for extracting text from a portion of the page is shown below:
SetTextExtractionArea with GetPageText
DPL.LoadFromFile(@"Sample.pdf", "");
DPL.SetOrigin(1); // Sets 0,0 coordinate position to top left of page, default is bottom left
DPL.SetTextExtractionArea(35, 35, 229, 30); // Left, Top, Width, Height
string ExtractedContent = DPL.GetPageText(8); // 8 is the extract options value, not a page number
Console.WriteLine(ExtractedContent);
DASetTextExtractionArea with ExtractFilePageText
SetOrigin cannot be used with DASetTextExtractionArea, so the 0,0 coordinate stays at the bottom left of the page. This means we need to adjust the Top parameter so that it is measured from the bottom of the page up, rather than from the top down. The page height in this example is 792 points, so we subtract the Top value of 35 used in the example above from 792, which gives 757 points.
DPL.DASetTextExtractionArea(35, 757, 229, 30); // Left, Top, Width, Height
ExtractedContent = DPL.ExtractFilePageText(@"Sample.pdf", "", 1, 8); // File name, Password, Page, Options
Console.WriteLine(ExtractedContent);
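The subtraction described above can be wrapped in a small helper so that the same top-down measurement works for pages that are not 792 points high. This is only a sketch: ExtractionAreaHelper and TopFromBottomLeft are hypothetical names introduced for illustration and are not part of Foxit Quick PDF Library, and the page height is assumed to be known in points.

```csharp
using System;

// Hypothetical helper, not part of Foxit Quick PDF Library.
public static class ExtractionAreaHelper
{
    // Converts a Top value measured down from the top edge of the page
    // into the equivalent value measured up from the bottom edge,
    // which is what a bottom-left origin expects.
    // Both arguments are in points.
    public static double TopFromBottomLeft(double pageHeight, double topFromPageTop)
    {
        if (pageHeight <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(pageHeight));
        }
        if (topFromPageTop < 0 || topFromPageTop > pageHeight)
        {
            throw new ArgumentOutOfRangeException(nameof(topFromPageTop));
        }
        return pageHeight - topFromPageTop;
    }
}
```

For the example page, ExtractionAreaHelper.TopFromBottomLeft(792, 35) returns 757, which is the Top value passed to DASetTextExtractionArea. The Left, Width and Height values do not change between the two origins.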
DASetTextExtractionArea with DAExtractPageText
int fileHandle = DPL.DAOpenFile(@"Sample.pdf", "");
int pageRef = DPL.DAFindPage(fileHandle, 1);
DPL.DASetTextExtractionArea(35, 757, 229, 30); // Left, Top, Width, Height
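ExtractedContent = DPL.DAExtractPageText(fileHandle, pageRef, 8); // File handle, Page reference, Options
Console.WriteLine(ExtractedContent);
DPL.DACloseFile(fileHandle); // Release the file handle once extraction is finished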
With these functions, Foxit Quick PDF Library gives you precise control over which text is extracted from the document.
This article refers to a deprecated product.
Updated on May 16, 2022