Foxit PDF SDK 5.3

Why is the length of the string retrieved from interface FSPDF_TextPage_GetChars different from the character count returned by interface FSPDF_TextPage_CountChars?

The interface FSPDF_TextPage_GetChars returns text as a UTF-8 string. The interface FSPDF_TextPage_CountChars returns the number of characters in a page.

“The length of the string” and “the count of characters” are two different concepts. The length of a UTF-8 string is the number of bytes the string occupies, not the number of characters it contains. When every character takes up exactly one byte, as plain ASCII characters do, the length of the UTF-8 string equals the count of characters. Other characters, such as Chinese characters and some special characters, take up more than one byte each. When the text contains such characters, the length of the UTF-8 string is larger than the count of characters. So the length of the string retrieved from interface FSPDF_TextPage_GetChars will not always equal the count of characters returned by interface FSPDF_TextPage_CountChars.
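The difference between byte length and character count can be illustrated with a minimal sketch in plain C. It does not call the Foxit SDK. The helper name utf8_codepoint_count is hypothetical and not part of any Foxit API, and the sketch assumes the buffer is a valid, NUL-terminated UTF-8 string. It counts Unicode code points, which may not match exactly how FSPDF_TextPage_CountChars defines a character.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the Foxit SDK): counts Unicode code
 * points in a NUL-terminated UTF-8 string. Every byte of the form
 * 10xxxxxx is a continuation byte, so only the other bytes, which start
 * a new character, are counted. Assumes the input is valid UTF-8. */
static size_t utf8_codepoint_count(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

/* Returns how many more bytes than characters the string contains.
 * The result is 0 for pure ASCII text and positive as soon as the text
 * contains at least one multi-byte character. */
static size_t utf8_extra_bytes(const char *s)
{
    return strlen(s) - utf8_codepoint_count(s);
}
```

For example, the string "PDF" followed by the two Chinese characters U+6587 and U+6863 occupies 9 bytes in UTF-8 (3 single-byte characters plus 2 three-byte characters), so strlen reports 9 while utf8_codepoint_count reports 5.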

In addition, a PDF document may contain invisible characters that are counted by interface FSPDF_TextPage_CountChars but cannot be retrieved by interface FSPDF_TextPage_GetChars. In this case, too, the length of the string retrieved from interface FSPDF_TextPage_GetChars will differ from the count of characters returned by interface FSPDF_TextPage_CountChars.

Note: This article refers to a deprecated version of a Foxit product. If you are still using Foxit PDF SDK 5.3 or older, please refer to the Developer Guide and API Reference included in your download package.


Updated on March 26, 2017
