PII Detection

The PII Detection job task uses regular expressions to detect PII in any document or metadata passing through 3Sixty. The regular expressions are stored in a .properties file.
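
As a rough illustration of how regex rules kept in a .properties-style file could be applied, the sketch below parses name=pattern lines and counts matches per rule. The rule names, the patterns, and the idea that the result holds match counts are all assumptions made for this example; they are not 3Sixty's shipped rules or its implementation. The sketch also ignores the backslash escaping that real Java .properties files require.

```python
import re

# Hypothetical rules in .properties style (name=regex); not 3Sixty's shipped rules.
SAMPLE_RULES = r"""
# comment lines and blank lines are ignored
PhoneNumber=\b\d{3}-\d{3}-\d{4}\b
Email=[\w.+-]+@[\w-]+\.[\w.-]+
"""

def load_rules(text):
    """Parse name=regex lines into a dict of compiled patterns."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blanks and comments
        name, _, pattern = line.partition("=")
        rules[name.strip()] = re.compile(pattern.strip())
    return rules

def count_pii(text, rules):
    """Return {rule name: match count} for rules with at least one match."""
    counts = {}
    for name, pattern in rules.items():
        hits = len(pattern.findall(text))
        if hits:
            counts[name] = hits
    return counts

rules = load_rules(SAMPLE_RULES)
sample = "Call 555-123-4567 or 555-987-6543, or mail jane@example.com."
result = count_pii(sample, rules)
print(result)
```

The actual rule format used by the task is described in PII Properties Rules.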

PII detection requires Tesseract OCR 5.x.

Once Tesseract is installed, check the box to attach text as metadata and enter the field name for the extracted content. We recommend using "content" if it does not conflict with other metadata fields in your run.

Important: The file size limit is 95 MB.

Note:  PII FLAG
This task always adds the boolean field hasPii, which can be used for mapping and analysis.


Configuration

To use this task, go to the task tab in your job. Select the task from the drop-down and click the plus circle to configure it. Click done after making any changes to save them.

Condition check

The task runs when the condition evaluates to 'true', 't', 'on', '1', or 'yes' (case-insensitive). If the condition is left empty, the task runs for every document. The condition is evaluated separately for each document.

Example: To run this task only for PDF documents, use the expression: equals('#{rd.mimetype}',"application/pdf")
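
The accepted values can be pictured with a small sketch. This is not 3Sixty's code: the function name is hypothetical, None stands in for a condition that was left empty, and trimming surrounding whitespace is an assumption.

```python
# Values the documentation lists as meaning "run the task".
TRUTHY = {"true", "t", "on", "1", "yes"}

def should_run(condition_result):
    """Decide whether the task runs for one document.

    condition_result is the evaluated condition as a string,
    or None when no condition was configured (assumption).
    """
    if condition_result is None:
        return True  # empty condition: run for every document
    # Case-insensitive comparison; whitespace trimming is an assumption.
    return condition_result.strip().lower() in TRUTHY

print(should_run("TRUE"), should_run("no"), should_run(None))
```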

Field To Mark

The output metadata property that stores the PII detected. The value of this field is a map. To view this metadata, the field 'pii' must be mapped.

{
  "PhoneNumber": 20,
  "Names": 200
}

Break up PII data into individual fields with a prefix

Instead of adding the PII as a map, 3Sixty breaks it up into individual fields with a prefix for easier mapping and processing.

For example, the PhoneNumber and Names entries in the map shown under Field To Mark can be mapped as pii.phonenumber and pii.names.

Prefix for PII fields

If breaking up PII data, the prefix to use for each field. If left blank, 'pii' is used.
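
A minimal sketch of the break-up step, assuming the field name is built as prefix, a dot, then the lowercased map key (inferred from pii.phonenumber and pii.names). The function name is hypothetical and this is not 3Sixty's implementation.

```python
def split_pii_fields(pii_map, prefix=""):
    """Turn a PII map into individual prefixed fields.

    A blank prefix falls back to 'pii', as the documentation describes.
    The '.' separator and lowercasing are inferred from the documented
    field names and may not match the product exactly.
    """
    prefix = prefix.strip() or "pii"
    return {f"{prefix}.{name.lower()}": value for name, value in pii_map.items()}

fields = split_pii_fields({"PhoneNumber": 20, "Names": 200})
print(fields)
```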

Fields To Check

The source properties and/or document to check for PII. Use ALL_PROPS to check all properties, BINARY to check the document content (extracted via Tika), or individual property names.
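
The selection logic can be sketched as follows. This is an illustration only: the function name is hypothetical, the setting is modeled as a Python list (how multiple values are entered in the UI is not stated here), and binary_text stands in for text already extracted from the document.

```python
def collect_text_sources(fields_to_check, properties, binary_text=""):
    """Gather the text to scan, keyed by where it came from.

    fields_to_check: list of ALL_PROPS, BINARY, or property names (assumed shape).
    properties: dict of the document's metadata properties.
    binary_text: text already extracted from the document content.
    """
    sources = {}
    for field in fields_to_check:
        if field == "ALL_PROPS":
            # Every metadata property is checked.
            sources.update({key: str(value) for key, value in properties.items()})
        elif field == "BINARY":
            # The document content itself is checked.
            sources["BINARY"] = binary_text
        elif field in properties:
            # A single named property; unknown names are skipped (assumption).
            sources[field] = str(properties[field])
    return sources

props = {"title": "Invoice 42", "author": "Jane Doe"}
selected = collect_text_sources(["BINARY", "title"], props, binary_text="body text")
print(selected)
```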


Examples


Related Links

PII Properties Rules