
How Full-Text Search Revolutionizes your Document Management

by Conor Smith | April 4, 2019

According to research performed by Feldman and Sherman, workers spend 15-35% of their workday searching for information but are successful less than 50% of the time.

More data is being collected and turned into useful information than ever before in this “age of information”, so fast, successful search is crucial for your company. Getting your document text index and search functionality in order is key to improving productivity.

 

Full-text search: what is it?

Full-text search is defined as “searching a single computer-stored document or a collection in a full-text database”. In this article, we’re focusing on the latter. For example, you could search your database for business contracts and get back every place where they’re mentioned. With a PDF SDK, you can perform full-text search programmatically or in a UI. Doing it programmatically lets you search for more kinds of elements and across multiple files at once.

The workflow below is a very simple example of how to carry out a full-text search: matching strings are written to a file, and you can manipulate the data from there.

 

 

The full-text search workflow

Find string that matches a string pattern -> Save string -> Gather saved strings into console -> Create database file -> Write console result strings to the .db file -> Save .db file
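To make the workflow concrete, here is a minimal Python sketch of those steps. It is not the Foxit PDF SDK API. It assumes the text has already been extracted from your PDFs into plain strings, and it uses only the standard library (`re` and `sqlite3`). The function name `save_matches`, the table name `hits` and the sample file names are made up for illustration.

```python
import re
import sqlite3


def save_matches(documents, pattern, db_path):
    """Find every match of `pattern` in each document and store it in a .db file.

    `documents` maps a document name to its already-extracted text.
    """
    regex = re.compile(pattern, re.IGNORECASE)

    # Find the strings that match the pattern and gather them in memory.
    hits = [
        (name, m.start(), m.group(0))
        for name, text in documents.items()
        for m in regex.finditer(text)
    ]

    # Create the database file, write the gathered strings, and save.
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS hits "
            "(doc TEXT, char_pos INTEGER, matched_text TEXT)"
        )
        conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", hits)
        conn.commit()
    finally:
        conn.close()
    return hits


if __name__ == "__main__":
    docs = {
        "contract_a.txt": "This business contract renews yearly.",
        "memo.txt": "No agreements are mentioned here.",
    }
    for hit in save_matches(docs, r"contract", "search_results.db"):
        print(hit)
```

Once the matches are in the `.db` file, you can query or export them like any other SQLite table.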
Read on below as we look at the main ways in which full-text search can revolutionize your document management.

 

Full-text search for scale & speed

Full-text search enables you to search for a string across thousands of documents, since the full content (including split text, combined words, metadata, annotations and so on) is indexed. This means no more sifting through endless reams of paper to find that all-important contract or invoice. A full-text search takes seconds to execute, and your query can be as simple or as complex as you need.
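Indexing is what makes this fast: the content is processed once, and each query becomes a lookup instead of a rescan of every document. As a rough illustration only, here is a tiny inverted index in Python. It assumes plain extracted text, splits on simple word boundaries, and requires every query word to be present. A production index handles far more than this, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    docs = {
        "invoice_17.txt": "Invoice for the Acme contract",
        "renewal.txt": "Contract renewal notice",
        "menu.txt": "Lunch menu",
    }
    index = build_index(docs)
    print(sorted(search(index, "contract")))
    print(sorted(search(index, "contract invoice")))
```

The index is built once up front, so adding more queries costs very little compared with reading every file again.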

 

Full-text search for cost & waste reduction

The savings from performing a full-text search versus searching through printed documents are huge. Only 2% of companies are paperless, and a 2018 survey found that 44% of businesses still use paper in their day-to-day operations, so this represents a major cost-cutting opportunity for companies, and a boon for the environment too.

It can’t be overstated that archiving paper is an extremely costly exercise that also comes with great risk. Too often we hear of companies that have had an unfortunate accident such as a fire, leak or storm that destroyed untold amounts of precious information. Often there is no backup, which adds even more to the replacement cost.

 

 

Full-text search for redaction & extraction

When it comes to redaction, full-text search enables you to find all instances of a query and mark them for redaction instantly. A common use case for a company would be to redact all instances of sensitive customer information, such as social security numbers, in a database.
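Here is a small Python sketch of that idea for social security numbers. It is an illustration, not the Foxit PDF SDK API: it works on plain extracted text, assumes the common `123-45-6789` format, and the function names are hypothetical. Redacting a real PDF also means removing the underlying content from the file, which this sketch does not attempt.

```python
import re

# Assumed format: three digits, two digits, four digits, separated by hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mark_for_redaction(text):
    """Return the (start, end) character span of every SSN-like match."""
    return [(m.start(), m.end()) for m in SSN_PATTERN.finditer(text)]


def apply_redaction(text, spans, mask="X"):
    """Replace every character inside the marked spans with the mask."""
    chars = list(text)
    for start, end in spans:
        for i in range(start, end):
            chars[i] = mask
    return "".join(chars)


if __name__ == "__main__":
    sample = "SSN 123-45-6789 on file."
    spans = mark_for_redaction(sample)
    print(spans)
    print(apply_redaction(sample, spans))
```

Keeping the marking and applying steps separate lets someone review the marked spans before anything is changed.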

Text extraction is one of the main ways in which developers use full-text search, and its primary function here is to create lists. A common example would be an insurance company pulling the data located in the “Name” field of its documents and writing it to an exportable format such as SQL, CSV or XML. Another way to do this is to base the extraction on specific x,y coordinates in a PDF. In this example, either approach leaves you with a list of customer names.
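The “Name” field example might look something like the following Python sketch. It assumes each document has already been converted to plain text with a line of the form `Name: ...`, which is an invented layout for illustration. Real documents would need a pattern or coordinates that match their actual structure.

```python
import csv
import io
import re

# Assumed layout: a line that starts with "Name:" followed by the value.
NAME_FIELD = re.compile(r"^Name:[ \t]*(.+)$", re.MULTILINE)


def extract_names(documents):
    """Return (document, name) pairs for every "Name:" line found."""
    rows = []
    for doc_name, text in documents.items():
        for m in NAME_FIELD.finditer(text):
            rows.append((doc_name, m.group(1).strip()))
    return rows


def to_csv(rows):
    """Serialize the extracted rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["document", "name"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    docs = {
        "policy_1.txt": "Policy 1\nName: Ada Lovelace\nPlan: Gold",
        "policy_2.txt": "Policy 2\nName: Alan Turing\nPlan: Silver",
    }
    print(to_csv(extract_names(docs)))
```

The same rows could just as easily be written to a SQL table or an XML file instead of CSV.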

Both redaction and extraction work in tandem with OCR (optical character recognition), which converts your image-based PDFs (“bad” PDFs) into instantly editable text (“good” PDFs). This ensures you don’t miss any place where redaction or extraction is required in your scanned PDF files. Also, keep in mind this doesn’t only apply to Latin-script languages. Premium software development kits work with Chinese, Korean, Russian, and many more languages.

 

Full-text search for compliance & GDPR

For any company that has EU customers, full-text search is a great asset in helping you to meet GDPR compliance. In particular, ‘the right to data portability’ (Article 20 of the GDPR) and ‘the right to erasure’ (Articles 17 & 19 of the GDPR) are easier to honor when you can search for a contact’s details and export or delete their data much faster and more conveniently. Depending on the countries your business operates in, there are countless other regulations that full-text search can help you comply with.
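As a loose sketch of how a search can support these requests, the Python below finds every document that mentions a contact, bundles the matches as JSON for export, and returns a copy of the collection without them. It assumes plain extracted text and simple case-insensitive matching on identifiers you supply, and the function names are hypothetical. In practice you may need to redact part of a document rather than remove it entirely.

```python
import json
import re


def find_mentions(documents, identifiers):
    """Return {document: [matched identifiers]} for documents naming the contact."""
    found = {}
    for name, text in documents.items():
        matched = [
            ident
            for ident in identifiers
            if re.search(re.escape(ident), text, re.IGNORECASE)
        ]
        if matched:
            found[name] = matched
    return found


def export_mentions(documents, identifiers):
    """Bundle the matching documents as JSON, e.g. for a portability request."""
    hits = find_mentions(documents, identifiers)
    bundle = [
        {"document": name, "matched": hits[name], "text": documents[name]}
        for name in sorted(hits)
    ]
    return json.dumps(bundle, indent=2)


def erase_mentions(documents, identifiers):
    """Return a copy of the collection without documents that mention the contact."""
    hits = find_mentions(documents, identifiers)
    return {name: text for name, text in documents.items() if name not in hits}


if __name__ == "__main__":
    docs = {
        "crm_note.txt": "Contact: Jane Doe, jane@example.com",
        "report.txt": "Quarterly report",
    }
    ids = ["Jane Doe", "jane@example.com"]
    print(export_mentions(docs, ids))
    print(sorted(erase_mentions(docs, ids)))
```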

 

 
We’re experts when it comes to full-text search. Foxit PDF SDK offers the fastest full-text search technology on the market. To discover more about how Foxit PDF SDK can revolutionize your document management, get in touch with us below for a risk-free trial.