000032430 - How to configure filtering on the File Fingerprint Crawler using exclusions in RSA DLP

Document created by RSA Customer Support Employee on Jun 14, 2016. Last modified by RSA Customer Support Employee on Apr 21, 2017.
Version 4

Article Content

Article Number: 000032430
Applies To:
RSA Product Set: Data Loss Prevention (DLP)
RSA Product/Service Type: Enterprise Manager, Datacenter
RSA Version/Condition: 9.6 SP2
Platform: Windows
 
Tasks
This article provides steps to configure the File Fingerprint Crawler to filter scan-target contents using exclusions.
What is the purpose of fingerprinting files?
The File Fingerprinting feature identifies potentially sensitive information, in the form of files within the organization, and collects fingerprints of those files that are then used to detect data-leakage violations during content analysis.
Depending on the type of files to be protected, the feature supports two types of fingerprinting: i) partial/full text fingerprinting, which extracts and fingerprints the textual content of a file, and ii) binary fingerprinting, which fingerprints the entire binary content of the file.
For example, partial text fingerprint detection can identify portions of sensitive information interspersed within an email leaving the organization, while binary fingerprint detection can catch someone trying to publicize an organization's next-generation design diagram or an executable to the outside world.
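To make the distinction concrete, here is a minimal conceptual sketch in Python. It is an illustration only: the actual hashing scheme, chunk sizes, and text extraction used by RSA DLP are not documented here, so those details are assumptions.

    import hashlib

    def binary_fingerprint(path):
        # Full Binary: hash the raw bytes, so only an exact copy of the
        # file (e.g. a design diagram or an executable) will match.
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    def text_fingerprints(path, window=64, step=32):
        # Full And Partial Text: extract the text and hash overlapping
        # chunks, so an excerpt pasted into an email can still match.
        with open(path, "r", errors="ignore") as f:
            text = f.read()
        return {hashlib.sha256(text[i:i + window].encode("utf-8")).hexdigest()
                for i in range(0, max(1, len(text) - window + 1), step)}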
Note: The crawler function is hosted on the Datacenter Site Coordinator (SC) server.
Resolution
On the RSA DLP Enterprise Manager web interface, go to Settings, then Fingerprint Crawler Manager.
First: Configure the basic crawler settings described below; a consolidated example follows the list.
1- Crawler Name: Name associated with the crawler configuration. Used by Enterprise Manager (EM) to manage the different crawlers.
2- Resulting Content Blade Name: Name of the content blade that will be created upon a successful run of this crawl configuration.
3- Run At Site (drop-down): Drop-down box that lists the available sites and lets the user choose the one under which the crawler should run.
4- Credentials (User/Pass): Credentials to the Site Coordinator. Only an authorized user can run the crawler within a Site.
5- File Content Match: 
Full And Partial Text -> Collects fingerprints on partial and full textual content.
Full Binary -> Collects fingerprints on binary content.
6- Full UNC Path: UNC path to a file or directory whose contents need to be fingerprinted. One or more paths may be added. An option next to each path allows the user to enter credentials specific to that path, which enables the crawler to crawl files across domains.
7- Default User Credentials: The default credentials used to connect to any of the file shares pointed to by the paths when credentials for individual paths are not given.
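As an illustration, the basic settings above could be summarized as in the Python sketch below. Every name, site, share, and account shown is a hypothetical placeholder rather than a value from this article.

    # Hypothetical values for the basic crawler settings (items 1-7 above).
    crawler_config = {
        "crawler_name": "HR-Share-Fingerprints",        # 1- Crawler Name
        "content_blade_name": "HR-Share-Blade",         # 2- Resulting Content Blade Name
        "run_at_site": "Datacenter-Site-1",             # 3- Run At Site
        "site_credentials": ("sc_admin", "********"),   # 4- Credentials for the Site Coordinator
        "file_content_match": "Full And Partial Text",  # 5- or "Full Binary"
        "unc_paths": [                                  # 6- Full UNC Path(s)
            r"\\fileserver01\hr\contracts",
            r"\\fileserver02\legal\agreements",
        ],
        "default_user_credentials": ("CORP\\svc_dlp", "********"),  # 7- Default User Credentials
    }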
Second: Configure the crawler's advanced options for exclusions as shown below.
1- Advanced Options: Advanced options refine the crawler configuration. They allow a user to specify inclusion and exclusion rules that narrow down which files from the paths above are fingerprinted. The values can be either simple strings or regular expressions, and the regular expression syntax can be validated before saving the configuration. A sketch of how these filters behave follows the list below.
  • File Extension: Filter (include/exclude) files based on their extensions (e.g., doc).
  • File Name: Filter (include/exclude) files based on their names (e.g., exclude the file named "blacklist.xls").
  • Trailing Directory Names: Filter (include/exclude) all files and sub-directories within a directory of the given name.
  • Full UNC Path: Filter (exclusion only) a complete path. This is used to exclude a sub-directory of the paths described above. Inclusion is not supported because it would be equivalent to item 6.
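The following Python sketch illustrates how such include/exclude rules could behave. The rule values are hypothetical and the matching logic is an assumption for illustration; the crawler's actual evaluation order is not documented here.

    import re

    # Hypothetical exclusion rules mirroring the four filter types above.
    exclude_extensions = {"xlsx", "docx"}                     # File Extension
    exclude_name_re    = re.compile(r"^blacklist\.xls$")      # File Name (re.compile validates the regex)
    exclude_dirs       = {"archive"}                          # Trailing Directory Names
    exclude_full_path  = r"\\fileserver01\hr\contracts\temp"  # Full UNC Path (exclusion only)

    def is_excluded(path):
        """Return True if any exclusion rule matches the given UNC path."""
        parts = path.split("\\")
        name = parts[-1]
        ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
        return (ext in exclude_extensions
                or bool(exclude_name_re.match(name))
                or any(d in exclude_dirs for d in parts[:-1])
                or path.lower().startswith(exclude_full_path.lower()))

    print(is_excluded(r"\\fileserver01\hr\contracts\report.docx"))    # True  (extension)
    print(is_excluded(r"\\fileserver01\hr\contracts\blacklist.xls"))  # True  (file name)
    print(is_excluded(r"\\fileserver01\hr\contracts\notes.txt"))      # False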
Third: Save the configuration and run the crawler.

How do you validate that the crawler ran and that the exclusions were applied?
Once the crawler's status on the EM web interface shows completed, open the "more info" tab to review a summary of the crawler run, and verify whether the exclusions were applied to the scan target by checking the number of "Files Fingerprinted" against the number of files "Filtered from Fingerprint".
In the example shown in the picture below, the crawler ran over a scan-target folder containing four files, two of which were excluded using the File Extension filters .xlsx and .docx. Accordingly, "Files Fingerprinted" shows two and "Filtered from Fingerprint" shows two.
[Screenshot: crawler run summary showing the "Files Fingerprinted" and "Filtered from Fingerprint" counts]
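A quick sanity check on those counters, using the numbers from this example. The assumption that the two counts add up to the total number of files found holds only when no files were skipped for other reasons, such as read errors.

    # Counts taken from the example run above: four files found, two excluded.
    files_found         = 4
    files_fingerprinted = 2
    filtered_from_fp    = 2
    assert files_fingerprinted + filtered_from_fp == files_found  # 2 + 2 == 4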
 
