|Applies To||RSA Product Set: Data Loss Prevention (DLP)|
RSA Product/Service Type: Enterprise Manager, Datacenter
RSA Version/Condition: 9.6 SP2
|Tasks||This article provides steps to configure the File Fingerprint Crawler with filtering of target-scan contents using exclusions.|
What is the purpose of Fingerprinting files?
The purpose of the file fingerprinting feature is to identify potentially sensitive information, in the form of files within the organization, and to collect fingerprints on those files that help detect data-leakage violations during content analysis.
Depending on the type of files that have to be protected, the feature offers two types of fingerprinting: i) partial/full text fingerprinting, which extracts and fingerprints the textual content of a file, and ii) binary fingerprinting, where the entire binary content of the file is fingerprinted.
An example of partial text fingerprint detection is identifying portions of sensitive information interspersed within an email leaving the organization. An example of binary fingerprint detection is catching someone who tries to leak an organization's next-generation design diagram or an executable to the outside world.
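RSA's actual fingerprinting algorithm is proprietary and not described here; the sketch below only illustrates the concept using SHA-256 hashes (an assumption for illustration, not the product's method). A binary fingerprint hashes the whole file, so it matches only exact copies, while partial text fingerprinting hashes overlapping word windows, so an excerpt interspersed in other content can still be detected.

```python
import hashlib

def binary_fingerprint(path):
    # Hash the entire raw file content -- matches only exact binary copies.
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def partial_text_fingerprints(text, window=8):
    # Hash overlapping word windows so excerpts of the text can still be
    # detected when interspersed in other content (a conceptual stand-in
    # for the product's partial text fingerprinting).
    words = text.split()
    if len(words) <= window:
        return {hashlib.sha256(" ".join(words).encode()).hexdigest()}
    return {
        hashlib.sha256(" ".join(words[i:i + window]).encode()).hexdigest()
        for i in range(len(words) - window + 1)
    }

# An email excerpt of a sensitive document shares window hashes with the original:
doc = "the quarterly revenue figures are strictly confidential until the public filing date"
leak = "as discussed the quarterly revenue figures are strictly confidential until further notice"
print(bool(partial_text_fingerprints(doc) & partial_text_fingerprints(leak)))  # True
```

This is why partial text fingerprints survive reformatting and excerpting, whereas a binary fingerprint is defeated by changing even one byte of the file.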
Note: The crawler function is hosted on the Datacenter Site Coordinator (SC) server.
|Resolution||On the RSA DLP Enterprise Manager web interface, go to Settings, then Fingerprint Crawler Manager.|
First: Configure the basic crawler settings as shown below.
1- Crawler Name: The name associated with the crawler configuration; used by Enterprise Manager (EM) to manage the different crawlers.
2- Resulting Content Blade Name: The name of the content blade that will be created upon a successful run of this crawler configuration.
3- Run At Site (drop-down): A drop-down box that enumerates the available sites and lets the user choose the site under which the crawler will run.
4- Credentials (User/Pass): Credentials for the Site Coordinator. Only an authorized user can run the crawler within a site.
5- File Content Match:
Full And Partial Text -> Collects fingerprints on partial and full textual content.
Full Binary -> Collects fingerprints on binary content.
6- Full UNC Path: The UNC path to a file or directory whose contents are to be fingerprinted. One or more paths may be added. An option is available against each path that allows the user to enter credentials for that path, which enables the crawler to crawl files across domains.
7- Default User Credentials: The default credentials used to connect to any file share pointed to by the paths above when credentials for an individual path are not given.
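Before entering paths into the configuration, it can help to confirm they are well-formed UNC paths of the form `\\server\share\subpath`. The sketch below is a minimal illustration of such a check, not EM's own validation logic.

```python
import re

# Minimal well-formedness check for UNC paths (\\server\share\optional\subpath).
# This pattern is an illustrative assumption, not the product's validation rule.
UNC_PATTERN = re.compile(
    r'^\\\\[^\\/:*?"<>|]+\\[^\\/:*?"<>|]+(\\[^\\/:*?"<>|]+)*\\?$'
)

def is_unc_path(path):
    # Return True when the path starts with \\server\share.
    return bool(UNC_PATTERN.match(path))

print(is_unc_path(r"\\fileserver01\finance\reports"))  # True
print(is_unc_path(r"C:\finance\reports"))              # False -- local drive path, not UNC
```

Local drive paths such as `C:\finance` are rejected here because the crawler connects to network file shares, which require the UNC form.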
Second: Configure the crawler's advanced options for exclusions as shown below.
1- Advanced Options: Advanced options refine the crawler configuration. They allow a user to specify inclusion and exclusion rules to narrow down which files from the paths above are fingerprinted. The values can be either simple strings or regular expressions, and the regular expression syntax can be validated before saving the configuration.
How do you validate the execution of the crawler function and confirm that the exclusions have been applied?
Once the status of the crawler function on the EM web interface shows Completed, open the More Info tab to review the summary of the crawler run, and verify whether the exclusions on the target scan were applied by checking the number of files listed under "Files Fingerprinted" and "Filtered from Fingerprint".
In the example shown in the picture below, the crawler function ran over a target-scan folder that contains four files, two of which were excluded using the file extensions .xlsx and .docx; as a result, "Files Fingerprinted" shows two and "Filtered from Fingerprint" shows two.
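The arithmetic of that summary can be sketched as follows. The file names below are hypothetical stand-ins for the example folder; the point is simply that fingerprinted plus filtered counts should add up to the total files scanned.

```python
import fnmatch

# Hypothetical target-scan contents mirroring the example: four files,
# two of which match the excluded extensions.
scan_files = ["report.txt", "notes.txt", "budget.xlsx", "contract.docx"]
excluded_extensions = ["*.xlsx", "*.docx"]

# Files matching any exclusion pattern are filtered from fingerprinting.
filtered = [f for f in scan_files
            if any(fnmatch.fnmatch(f, pat) for pat in excluded_extensions)]
fingerprinted = [f for f in scan_files if f not in filtered]

print(len(fingerprinted), len(filtered))  # 2 2
```

If "Files Fingerprinted" and "Filtered from Fingerprint" do not sum to the number of files in the target-scan folder, revisit the exclusion rules in the crawler's advanced options.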