Foxit PDF SDK for Mac

How to Extract & Search for Text with Foxit PDF SDK (Objective-C)

Text Page

Foxit PDF SDK provides APIs to extract, select, search, and retrieve text in PDF documents. The text content of a page is held in an FSTextPage object, which is tied to that specific page. Use the FSTextPage class to retrieve text information from a page, such as a single character, a single word, or the text within a specified character range or rectangle. An FSTextPage object can also be used to construct objects of other text-related classes, which perform further operations on the text or access specific information in it:

  • To search for text in the text contents of a PDF page, construct an FSTextSearch object with an FSTextPage object.
  • To access text such as hypertext links, construct an FSPageTextLinks object with an FSTextPage object.
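
The first bullet can be sketched as follows. This is a minimal sketch, not a definitive implementation: the helper name CountMatchesOnPage is hypothetical, and the initializer initWithText_page: is an assumption about the Objective-C binding that should be checked against the FSPDFObjC.h header of your SDK version. The remaining calls are the ones used in the examples in this article.

```objc
#import "FSPDFObjC.h"

// Hypothetical helper: counts the occurrences of `pattern` on one page.
// `page` is assumed to be loaded and parsed already.
static int CountMatchesOnPage(FSPDFPage *page, NSString *pattern) {
    FSTextPage *textPage = [[FSTextPage alloc] initWithPage:page
                                                      flags:FSTextPageParseTextNormal];
    // Assumption: initWithText_page: is the page-scoped initializer.
    // Verify the exact name in FSPDFObjC.h before relying on it.
    FSTextSearch *search = [[FSTextSearch alloc] initWithText_page:textPage];
    [search setPattern:pattern];
    [search setSearchFlags:FSTextSearchSearchNormal];

    int count = 0;
    while ([search findNext]) {
        count++;
    }
    return count;
}
```

Because the search is bound to a single text page, no start or end page is set here.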

Example:

How to extract text from a PDF page

#import "FSPDFObjC.h"
...
// Assuming FSPDFPage page has been loaded and parsed.
// Get the text page object.
FSTextPage *textPage = [[FSTextPage alloc] initWithPage:page flags:FSTextPageParseTextNormal];
int charCount = [textPage getCharCount];
if (charCount > 0) {
    NSString *currentText = [textPage getChars:0 count:-1];
}
...

How to select text of a rectangle area in a PDF

#import "FSPDFObjC.h"
...
// Assuming FSPDFPage page has been loaded and parsed.
FSTextPage *textPage = [[FSTextPage alloc] initWithPage:page flags:FSTextPageParseTextNormal];
// Rectangle in PDF page coordinates (left, bottom, right, top).
FSRectF *rect = [[FSRectF alloc] initWithLeft1:90 bottom1:580 right1:450 top1:595];
NSString *text = [textPage getTextInRect:rect];

Text Search

Foxit PDF SDK provides APIs to search for text in a PDF document, an XFA document, a text page, or a PDF annotation's appearance. The FSTextSearch class offers methods to run a text search and read the results:

  • To specify the search pattern and options, use setPattern: and setSearchFlags:, together with setStartPage: and setEndPage: (the last two are only useful when searching a whole PDF document).
  • To perform the search, use findNext or findPrev.
  • To get the search results, use the getMatchXXX methods, such as getMatchRects.

Example:

How to search for a text pattern in a PDF document

#import "FSPDFObjC.h"
...
// Assuming FSPDFDoc doc has been loaded.
...
FSTextSearch *search = [[FSTextSearch alloc] initWithDocument:doc cancel:nil];
// Search every page of the document.
int startIndex = 0;
int endIndex = [doc getPageCount] - 1;
[search setStartPage:startIndex];
[search setEndPage:endIndex];
NSString *pattern = @"Foxit";
[search setPattern:pattern];
unsigned int flags = FSTextSearchSearchNormal;
[search setSearchFlags:flags];
int match_count = 0;
while ([search findNext]) {
    // Bounding rectangles of the current match.
    FSRectFArray *rects = [search getMatchRects];
    match_count++;
}
...
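
Beyond the match rectangles, each hit can be inspected through the other getMatchXXX accessors. The sketch below logs where every match was found. The helper name LogMatches is hypothetical, and the accessors getMatchPageIndex, getMatchStartCharIndex, getMatchEndCharIndex and getMatchSentence are assumed to be available in your SDK version, since they do not appear in the example above; check them against FSPDFObjC.h.

```objc
#import "FSPDFObjC.h"

// Hypothetical helper: logs every match of `pattern` in `doc` and
// returns the number of matches. `doc` is assumed to be loaded.
static int LogMatches(FSPDFDoc *doc, NSString *pattern) {
    FSTextSearch *search = [[FSTextSearch alloc] initWithDocument:doc cancel:nil];
    [search setStartPage:0];
    [search setEndPage:[doc getPageCount] - 1];
    [search setPattern:pattern];
    [search setSearchFlags:FSTextSearchSearchNormal];

    int matchCount = 0;
    while ([search findNext]) {
        // Assumed accessors: page index, character range, and the
        // sentence that contains the current match.
        NSLog(@"page %d, chars %d-%d: %@",
              [search getMatchPageIndex],
              [search getMatchStartCharIndex],
              [search getMatchEndCharIndex],
              [search getMatchSentence]);
        matchCount++;
    }
    return matchCount;
}
```

A call such as LogMatches(doc, @"Foxit") would walk the document from the first page to the last, in the same order as the loop above.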

Text Link

In a PDF page, text that represents a hyperlink to a website or other internet resource, or an email address, is stored the same way as ordinary text. Before processing a text link, first call getTextLink: on an FSPageTextLinks object to get an FSTextLink object.

Example:

How to retrieve hyperlinks in a PDF page

#import "FSPDFObjC.h"
...
// Assuming FSPDFPage page has been loaded and parsed.
FSTextPage *textPage = [[FSTextPage alloc] initWithPage:page flags:FSTextPageParseTextNormal];
FSPageTextLinks *page_text_links = [[FSPageTextLinks alloc] initWithPage:textPage];
if (NO == [page_text_links isEmpty]) {
    // Get the first text link on the page.
    int index = 0;
    FSTextLink *text_link = [page_text_links getTextLink:index];
    if (NO == [text_link isEmpty]) {
        NSString *url = [text_link getURI];
    }
}
...
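
The example above reads only the link at index 0. To collect every link on a page, the same calls can be placed in a loop. This is a sketch under stated assumptions: the helper name CollectLinkURIs is hypothetical, and getTextLinkCount is assumed to be the FSPageTextLinks method that reports the number of links; confirm it in FSPDFObjC.h for your SDK version.

```objc
#import "FSPDFObjC.h"

// Hypothetical helper: returns the URI of every text link on `page`.
// `page` is assumed to be loaded and parsed already.
static NSArray<NSString *> *CollectLinkURIs(FSPDFPage *page) {
    FSTextPage *textPage = [[FSTextPage alloc] initWithPage:page
                                                      flags:FSTextPageParseTextNormal];
    FSPageTextLinks *links = [[FSPageTextLinks alloc] initWithPage:textPage];
    NSMutableArray<NSString *> *uris = [NSMutableArray array];
    if ([links isEmpty]) {
        return uris;
    }

    // Assumption: getTextLinkCount reports how many links were detected.
    int count = [links getTextLinkCount];
    for (int i = 0; i < count; i++) {
        FSTextLink *link = [links getTextLink:i];
        if ([link isEmpty]) {
            continue;
        }
        NSString *uri = [link getURI];
        if (uri != nil) {
            [uris addObject:uri];
        }
    }
    return uris;
}
```

Returning an empty array when nothing is found lets callers iterate over the result without a separate nil check.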

Updated on April 28, 2019
