
How to Programmatically Redact Data from PDFs

by PDF SDK | September 29, 2020

When your web application needs to allow users to download reports, charts, or invoices, PDFs are a great option. However, there are times when you must remove personal information from the PDFs you generate. For example, an administrator in your application may be permitted to view all user data, but other users may be restricted in what they can view. The need to limit access is especially relevant in applications that store a user’s address, phone number, date of birth, or government identification numbers. Due to regulatory and security requirements, you may need to hide this information from your app’s user interface and any PDFs it creates.

Fortunately, Foxit has a PDF library for .NET & .NET Core applications that allows you to build data redaction into any PDFs you generate. Our library dramatically reduces the amount of work required to generate PDFs that contain sensitive user data.

In this tutorial, you’ll see how to create a .NET Core Razor Pages web application that will generate a PDF with private data visually redacted. The application is available on GitHub if you want to see the final application, or you can follow along to learn how it works.

Building the Application

This web application will use Foxit PDF SDK to generate a PDF using customer data stored on your filesystem. It will look for instances of “SIN:” (indicating a user’s Social Insurance Number) and hide the data on that line with a black box. While this is a relatively simple algorithm, you can see how it could be expanded to build a robust data protection layer into your PDF generation process.

Prerequisites

– A recent version of .NET Core such as version 3.1

– An IDE that supports .NET Core development like Visual Studio 2019 or JetBrains Rider

– Download Foxit PDF SDK for .NET Core on Windows using a free trial

Creating a New .NET Core Web Application

For any new .NET-based application, Microsoft and the community recommend using .NET Core (or the upcoming re-branded .NET version 5).

To create a .NET Core web application that uses Foxit’s PDF tooling, create a new folder on your local machine. Navigate to it and run the following from your command prompt:

dotnet new webapp

This will scaffold a new .NET Core web application using Razor Pages.

Note: Alternatively, you may use your IDE to create a new Razor Page web application.

Referencing the PDF SDK

After you’ve downloaded and extracted Foxit PDF SDK for Windows and .NET Core, copy the “lib” folder over to your .NET Core project’s root.

Next, using your IDE, add a reference to “lib\x64_vc15\fsdk_dotnetcore.dll”.

In your IDE (such as Visual Studio 2019), right-click your project in the solution explorer and choose “Add -> Existing Item…”. Add the file found at “lib\x64_vc15\fsdk.dll”.

Note: If using an IDE like JetBrains Rider, you may need to manually copy that file into the root of the project.

In the solution explorer, right-click the file and choose “Properties”. Under “Copy to output directory”, choose “Copy if newer”.

Your .csproj file should now look like this:

xml
<Project Sdk="Microsoft.NET.Sdk.Web">
  <PropertyGroup>
    <TargetFramework>netcoreapp3.1</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <Reference Include="fsdk_dotnetcore, Version=7.3.0.730, Culture=neutral, PublicKeyToken=null">
      <HintPath>lib\x64_vc15\fsdk_dotnetcore.dll</HintPath>
    </Reference>
  </ItemGroup>
  <ItemGroup>
    <None Update="fsdk.dll">
      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
    </None>
  </ItemGroup>
</Project>

You’re now ready to set up your data source and build the user interface.

Adding A Data Source

In a real-world application, you will probably use a database, HTTP API, or file cache, but you can use a static JSON file as your data source for the sake of this demonstration. In your project’s root, create a file called `datasource.json` and add the following:

json
{
  "FullName": "Bob Smith",
  "City": "New York",
  "State": "New York",
  "SIN": 123456789
}

This data represents a single customer with a name, location, and SIN (to be redacted) that will be used to populate a PDF document. Now you are ready to build the user interface for your PDF generation app.

Creating The User Interface

You need a web page that allows users to generate PDFs based on your stored user data. In the file “Pages/Index.cshtml” add the following:

html
@page
@model IndexModel
@{
    ViewData["Title"] = "Home page";
}
<p>The user's data looks like this:</p>
<ul>
    <li>Fullname: @Model.UserData.FullName</li>
    <li>City: @Model.UserData.City</li>
    <li>State: @Model.UserData.State</li>
    <li>SIN: @Model.UserData.SIN</li>
</ul>
<br />
<p>
    You can generate a PDF with the user's data below:
</p>
<form asp-page-handler="PDF" method="POST">
    <button type="submit" class="btn btn-primary">Generate PDF</button>
</form>
<br/>
<form asp-page-handler="SecurePDF" method="POST">
    <button type="submit" class="btn btn-primary">View Secured Version Of Generated PDF</button>
</form>

Each of the two forms posts to a named Razor Pages handler through its `asp-page-handler` attribute. In the next section, you’ll build the matching handler methods in your code-behind file.
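
Razor Pages resolves a named handler by convention: the method name is `On`, then the HTTP verb, then the handler name, with an optional `Async` suffix. The sketch below only illustrates that naming rule. `HandlerNames` and `MethodNameFor` are hypothetical helpers written for this example and are not part of ASP.NET Core, which performs the lookup internally.

```csharp
using System;

// Hypothetical helper for illustration only; ASP.NET Core does this lookup internally.
public static class HandlerNames
{
    // Builds the conventional handler method name, e.g. "OnPostPDFAsync".
    public static string MethodNameFor(string httpVerb, string handler, bool isAsync)
    {
        // Normalize the verb to "Post", "Get", and so on.
        string verb = char.ToUpperInvariant(httpVerb[0]) + httpVerb.Substring(1).ToLowerInvariant();
        return "On" + verb + handler + (isAsync ? "Async" : string.Empty);
    }

    public static void Main()
    {
        Console.WriteLine(MethodNameFor("POST", "PDF", true));       // OnPostPDFAsync
        Console.WriteLine(MethodNameFor("POST", "SecurePDF", true)); // OnPostSecurePDFAsync
    }
}
```

These are the two method names the code-behind file needs to define for the forms above.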

PDF Generation Handlers

As you might suspect, the code that generates a PDF is more complex than our UI. Open the file “Pages/Index.cshtml.cs” to follow along, and in the coming steps, we’ll go through each section of code to explain how it works.

Index Model

First, this is the skeleton of the Razor Page Index Model:

csharp
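using System;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.RazorPages;
// Foxit SDK namespaces for the types used in this file; check the API
// reference if your SDK version organizes them differently.
using foxit.addon;
using foxit.common;
using foxit.common.fxcrt;
using foxit.pdf;
using foxit.pdf.annots;
using foxit.pdf.interform;

public class IndexModel : PageModel
{
    public class UserDataJsonModel
    {
        public string FullName { get; set; }
        public string City { get; set; }
        public string State { get; set; }
        public int SIN { get; set; }
    }

    /** Found under "lib/gsdk_sn.txt" **/
    private static string serialNo = "YOUR_SERIAL_NO";
    /** Found under "lib/gsdk_key.txt" **/
    private static string key = "YOUR_KEY";
    public UserDataJsonModel UserData { get; set; }
    /**
    More code to come!...
    **/
}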

There’s a nested class `UserDataJsonModel` that represents your user data, and a `UserData` property that holds an instance of it. You’ll assign data to that property in the next section. Be sure to paste your Foxit serial number and SDK key in the appropriate places in this file.

Page Handler

Next, add this method to the same file:

csharp
public async Task OnGetAsync()
{
    using var fileStream = System.IO.File.OpenRead("./datasource.json");
    this.UserData = await JsonSerializer.DeserializeAsync<UserDataJsonModel>(fileStream);
}

This will read your user data from the JSON file created above and deserialize it, assigning the data to the `UserData` property.

Note: `UserData` is referenced and used in the view file “Pages/Index.cshtml” that you created earlier.

PDF Handlers

Next, add the following two methods:

csharp
// Returns the PDF with all of the user's data visible.
public async Task<IActionResult> OnPostPDFAsync()
{
    using var fileStream = System.IO.File.OpenRead("./datasource.json");
    UserDataJsonModel model = await JsonSerializer.DeserializeAsync<UserDataJsonModel>(fileStream);
    string path = GeneratePDF(model, false);
    // FileStreamResult disposes the stream once the response has been written.
    var stream = new FileStream(path, FileMode.Open, FileAccess.Read);
    return new FileStreamResult(stream, "application/pdf");
}

// Returns the same PDF with the SIN line redacted.
public async Task<IActionResult> OnPostSecurePDFAsync()
{
    using var fileStream = System.IO.File.OpenRead("./datasource.json");
    UserDataJsonModel model = await JsonSerializer.DeserializeAsync<UserDataJsonModel>(fileStream);
    string path = GeneratePDF(model, true);
    var stream = new FileStream(path, FileMode.Open, FileAccess.Read);
    return new FileStreamResult(stream, "application/pdf");
}

Each method will read the user data, deserialize it, and then pass it to the `GeneratePDF` method. This method, which you’ll create next, will display the user data in a PDF document and return the location where it was saved on the file system.
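
Since both handlers start with the same read-and-deserialize step, that step could be pulled into one helper. Here is a minimal sketch, assuming a hypothetical `UserDataLoader` class that is not part of the sample application. It uses only `System.Text.Json`, and an in-memory stream stands in for the JSON file so the example runs on its own.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Same shape as the tutorial's nested model class.
public class UserDataJsonModel
{
    public string FullName { get; set; }
    public string City { get; set; }
    public string State { get; set; }
    public int SIN { get; set; }
}

// Hypothetical helper: not part of the tutorial's sample application.
public static class UserDataLoader
{
    // Deserializes the customer record from any readable stream.
    public static async Task<UserDataJsonModel> LoadAsync(Stream json)
    {
        return await JsonSerializer.DeserializeAsync<UserDataJsonModel>(json);
    }

    public static async Task Main()
    {
        // An in-memory stream stands in for System.IO.File.OpenRead("./datasource.json").
        const string sample =
            "{\"FullName\":\"Bob Smith\",\"City\":\"New York\",\"State\":\"New York\",\"SIN\":123456789}";
        using var stream = new MemoryStream(Encoding.UTF8.GetBytes(sample));
        UserDataJsonModel model = await LoadAsync(stream);
        Console.WriteLine($"{model.FullName}, {model.City}: SIN {model.SIN}");
    }
}
```

With a helper like this, the two handlers would differ only in the boolean they pass to `GeneratePDF`.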

PDF Generation

Finally, this section describes the core of the PDF generation code that uses Foxit’s PDF SDK. We’ve added comments to the code to help you understand what it’s doing, but be sure to read our developer guide and documentation for more details.

csharp
private static string GeneratePDF(UserDataJsonModel model, bool applyRedaction)
{
    Library.Initialize(serialNo, key);
    try
    {
        // Create a new PDF with a blank page.
        using PDFDoc doc = new PDFDoc();
        using Form form = new Form(doc);
        using PDFPage page = doc.InsertPage(0, PDFPage.Size.e_SizeLetter);
        // Create a text field for each property of the user data json model.
        int iteration = 0;
        foreach (var prop in model.GetType().GetProperties())
        {
            int offset = iteration * 60;
            using Control control = form.AddControl(page, prop.Name, Field.Type.e_TypeTextField,
                new RectF(50f, 600f - offset, 400f, 640f - offset));
            using Field field = control.GetField();
            var propValue = prop.GetValue(model);
            field.SetValue($"{prop.Name}: {propValue} {Environment.NewLine}");
            iteration++;
        }
        // Convert fillable form fields into read-only text.
        page.Flatten(true, (int) PDFPage.FlattenOptions.e_FlattenAll);
        if (applyRedaction)
        {
            /**
            Described in the following section
            **/
        }
        // Save the file locally and return the file's location to the caller.
        string fileOutput = "./Output.pdf";
        doc.SaveAs(fileOutput, (int) PDFDoc.SaveFlags.e_SaveFlagNoOriginal);
        return fileOutput;
    }
    finally
    {
        Library.Release();
    }
}

This lets you generate PDFs without redacting the user’s SIN. In the next section, you’ll see how to apply redaction and search for specific patterns that indicate sensitive user data.
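
The field placement in `GeneratePDF` is plain arithmetic: every field keeps the same horizontal extent and moves down 60 points per property. The sketch below reproduces only that calculation, without any Foxit calls, so you can see where each field lands. `FieldLayout` is a hypothetical helper, and the tuple order (left, bottom, right, top) is an assumption about how the four `RectF` arguments are interpreted.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that mirrors the offset arithmetic used in GeneratePDF.
public static class FieldLayout
{
    // Assumption: the four values correspond to RectF's (left, bottom, right, top).
    public static List<(float Left, float Bottom, float Right, float Top)> Compute(int fieldCount)
    {
        var rects = new List<(float Left, float Bottom, float Right, float Top)>();
        for (int iteration = 0; iteration < fieldCount; iteration++)
        {
            // Each field sits 60 points below the previous one.
            int offset = iteration * 60;
            rects.Add((50f, 600f - offset, 400f, 640f - offset));
        }
        return rects;
    }

    public static void Main()
    {
        // The model has four properties: FullName, City, State, SIN.
        foreach (var r in Compute(4))
        {
            Console.WriteLine($"{r.Left}, {r.Bottom}, {r.Right}, {r.Top}");
        }
    }
}
```

For the four properties in the model, the last field (the SIN) ends up with a bottom edge at 420 and a top edge at 460.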

Private Data Redaction

The code sample above leaves out the code that performs the private data redaction. Here is that piece in isolation (with code comments for clarity):

csharp
if (applyRedaction)
{
    // Configure text search to look at the first (and only) page in the PDF.
    using var redaction = new Redaction(doc);
    page.StartParse((int) foxit.pdf.PDFPage.ParseFlags.e_ParsePageNormal, null, false);
    using TextPage textPage = new TextPage(page,
        (int) foxit.pdf.TextPage.TextParseFlags.e_ParseTextUseStreamOrder);
    using TextSearch search = new TextSearch(textPage);
    RectFArray rectArray = new RectFArray();
    // Mark text starting with the pattern "SIN:" to be redacted.
    search.SetPattern("SIN:");
    while (search.FindNext())
    {
        var sentence = search.GetMatchSentence();

        // The matched sentence might not start with the pattern text, so keep only
        // the part that begins at "SIN:". Skip the match if the label cannot be
        // located, because Substring would throw on a negative index.
        int labelIndex = sentence.IndexOf("SIN:", StringComparison.Ordinal);
        if (labelIndex < 0)
        {
            continue;
        }

        using var searchSentence = new TextSearch(textPage);
        searchSentence.SetPattern(sentence.Substring(labelIndex));
        while (searchSentence.FindNext())
        {
            RectFArray itemArray = searchSentence.GetMatchRects();
            rectArray.InsertAt(rectArray.GetSize(), itemArray);
        }
    }
    // If there are matches, then apply the redaction.
    if (rectArray.GetSize() > 0)
    {
        using Redact redact = redaction.MarkRedactAnnot(page, rectArray);
        redact.SetFillColor(0xFF0000);
        redact.ResetAppearanceStream();
        redaction.Apply();
    }
}
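
To cover more than one kind of sensitive field, the "find the label, keep the rest of the sentence" step can be driven by a list of labels. What follows is a minimal sketch in plain .NET with no Foxit calls. `LabelSlicer` is a hypothetical helper, and only the `SIN:` label comes from this tutorial; the other labels are made-up examples.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: generalizes the Substring/IndexOf step to several labels.
public static class LabelSlicer
{
    // Only "SIN:" comes from the tutorial; the other labels are made-up examples.
    public static readonly string[] Labels = { "SIN:", "DOB:", "Phone:" };

    // Returns the text from the earliest known label onward, or null when no label is present.
    public static string SliceFromLabel(string sentence, IEnumerable<string> labels)
    {
        int best = -1;
        foreach (string label in labels)
        {
            int index = sentence.IndexOf(label, StringComparison.Ordinal);
            if (index >= 0 && (best < 0 || index < best))
            {
                best = index;
            }
        }
        return best < 0 ? null : sentence.Substring(best);
    }

    public static void Main()
    {
        Console.WriteLine(SliceFromLabel("State: New York SIN: 123456789", Labels)); // SIN: 123456789
        Console.WriteLine(SliceFromLabel("City: New York", Labels) ?? "(no match)"); // (no match)
    }
}
```

Each non-null result could then be passed to `SetPattern` on a second `TextSearch`, in the same way the tutorial code handles the `SIN:` sentence.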

At this point, you can run the application from your IDE or by running `dotnet run` from a command prompt. You’ll see two buttons to generate either a normal PDF or a PDF with the user’s SIN redacted.

In this tutorial, you’ve seen how to build a .NET Core web application that allows you to redact sensitive information inside any PDF document. This example shows how using Foxit PDF SDK makes generating and redacting data from PDFs significantly easier.

While this tutorial has focused on building a web application, Foxit provides SDKs for several platforms, including mobile and native apps. There are more tutorials and details in the documentation if you want to see what else is possible with Foxit’s SDK. The entire SDK API reference is also available so you can dig into all our features.

Author: James Hickey