000032430 - How to configure RSA Data Loss Prevention filtering on File Fingerprint Crawler using exclusions

Document created by RSA Customer Support Employee on Jun 14, 2016. Last modified by RSA Customer Support on Jan 23, 2018.
Version 5

Article Content

Article Number: 000032430
Applies To: RSA Product Set: Data Loss Prevention
RSA Product/Service Type: Enterprise Manager, Datacenter
RSA Version/Condition: 9.6 SP2
Platform: Windows
Tasks: This article provides steps to configure the File Fingerprint Crawler to filter target-scan contents using exclusions.

What is the purpose of Fingerprinting files?

The intent and purpose of the File fingerprinting feature is to identify potentially sensitive information in the form of files within the organization and collect fingerprints on them that will help in detecting data leakage violations during content analysis.
Depending on the type of files to be protected, the feature offers two types of fingerprinting:

  1. Partial/full text fingerprinting, which extracts and fingerprints the textual content of a file, and
  2. Binary fingerprinting, which fingerprints the entire binary content of the file.
An example of partial text fingerprint detection is identifying portions of sensitive information interspersed within an email leaving the organization. An example of binary fingerprint detection is catching someone who attempts to send an organization’s next-generation design diagram or an executable to the outside world.
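The distinction between the two fingerprint types can be illustrated with a short sketch. This is a conceptual illustration only, using SHA-256 hashing; it is not RSA's proprietary fingerprinting algorithm, and the function names and window size are invented:

```python
import hashlib

def binary_fingerprint(data: bytes) -> str:
    """Fingerprint the exact byte content of a file (full binary match)."""
    return hashlib.sha256(data).hexdigest()

def text_fingerprints(text: str, window: int = 8) -> set[str]:
    """Fingerprint overlapping word windows so partial matches can be detected."""
    words = text.lower().split()
    return {
        hashlib.sha256(" ".join(words[i:i + window]).encode()).hexdigest()
        for i in range(max(1, len(words) - window + 1))
    }

# A leaked excerpt shares window hashes with the original document,
# even though the two full texts differ.
original = "the quarterly forecast projects revenue growth of twelve percent across all regions"
excerpt = "please note the quarterly forecast projects revenue growth of twelve percent going forward"
assert text_fingerprints(original) & text_fingerprints(excerpt)
```

A binary fingerprint changes completely if even one byte of the file changes, which is why it suits executables and design diagrams, while the windowed text fingerprints survive excerpting and rewording around the sensitive passage.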
Note: The Crawler function is hosted on the Datacenter Site Coordinator (SC) server.

Configure the File Fingerprint Crawler
  1. On the RSA Data Loss Prevention Enterprise-Manager (EM) web interface, go to Settings > Fingerprint Crawler Manager.
  2. Start configuring the basic crawler settings as shown below.
    1. Crawler Name: Name associated with the crawler configuration. Used by EM to manage the different crawlers.
    2. Resulting Content Blade Name: Name of the Content Blade that will be created upon a successful run of this crawl configuration.
    3. Run At Site (drop-down): Lists the available sites and lets the user choose the one under which the crawler is to run.
    4. Credentials (User/Pass): Credentials to the Site Coordinator. Only an authorized user can run the crawler within a site.
    5. File Content Match
  • Full And Partial Text > Collects fingerprints on partial and full textual content.
  • Full Binary > Collects fingerprints on binary content.
  1. Full UNC Path: UNC path to a file or directory whose contents need to be fingerprinted. One or more paths may be added. An option against each path allows the user to enter credentials for that path, which enables the crawler to crawl files across domains.
  2. Default User Credentials: The default credentials used to connect to any of the file shares pointed to by the paths when credentials for an individual path are not given.
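The basic settings above can be pictured as a single configuration record. The field names below mirror the EM form labels but are illustrative only, not the product's internal schema, and all values (crawler name, site, accounts, shares) are invented:

```python
# Illustrative crawler configuration; keys mirror the EM form, values are examples.
crawler_config = {
    "crawler_name": "HR-Documents-Crawler",
    "content_blade_name": "HR-Documents-Blade",
    "run_at_site": "Datacenter-Site-1",
    "site_credentials": {"user": "svc_dlp", "password": "********"},
    "file_content_match": "full_and_partial_text",  # or "full_binary"
    "paths": [
        # No per-path credentials: the default credentials below are used.
        {"unc": r"\\fileserver\hr\policies", "credentials": None},
        # Per-path credentials let the crawler reach a share in another domain.
        {"unc": r"\\otherdomain\share\contracts",
         "credentials": {"user": "OTHERDOMAIN\\svc_dlp", "password": "********"}},
    ],
    "default_credentials": {"user": "CORP\\svc_dlp", "password": "********"},
}
```

The two-level credential scheme is the point here: per-path credentials override the defaults, so a single crawler can span shares in multiple domains.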
  1. Start configuring the crawler Advanced Options for exclusions as shown below.
    1. Advanced Options: Used to refine the crawler configuration. They allow a user to specify inclusion and exclusion rules to narrow down the files to be fingerprinted from the paths given above. The values can be either simple strings or regular expressions; the regular expression syntax can be validated before saving the configuration.
    2. File Extension: Filter (include/exclude) files based on their extensions (e.g., doc).
    3. File Name: Filter (include/exclude) files based on their names (e.g., exclude a file named "blacklist.xls").
    4. Trailing Directory names: Filter (include/exclude) all files and sub-directories within a directory with the given name.
    5. Full UNC Path: Filter (exclusion only) a complete path. This is used to exclude a sub-directory of the paths described above. Inclusion is not supported, as it would be equivalent to adding the path to the Full UNC Path list above.
  2. Save the configuration and run the crawler.
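The four exclusion rule types can be sketched as a simple filter. This is a conceptual illustration, not the product's actual matching engine; the function name and parameters are invented:

```python
import re

def is_excluded(unc_path, extensions=(), names=(), trailing_dirs=(), full_paths=()):
    """Return True if a file should be filtered out, mimicking the four rule
    types: File Extension, File Name, Trailing Directory names, Full UNC Path."""
    norm = unc_path.replace("/", "\\").lower()
    parts = norm.strip("\\").split("\\")
    filename, dirs = parts[-1], parts[:-1]
    # File Extension rule (e.g. "xlsx")
    if any(filename.endswith("." + ext.lower()) for ext in extensions):
        return True
    # File Name rule: simple strings or regular expressions
    if any(re.fullmatch(pat, filename, re.IGNORECASE) for pat in names):
        return True
    # Trailing Directory names rule: any directory component with that name
    if any(d == td.lower() for d in dirs for td in trailing_dirs):
        return True
    # Full UNC Path rule: exclude everything under the given sub-directory
    if any(norm.startswith(fp.replace("/", "\\").lower()) for fp in full_paths):
        return True
    return False

# Example: exclude a specific file by name, matched as a regular expression.
is_excluded(r"\\server\share\docs\blacklist.xls", names=[r"blacklist\.xls"])  # True
```

Note how the Full UNC Path rule is a prefix match: excluding `\\server\share\old` drops everything beneath that sub-directory, which is why an inclusion variant would be redundant with listing the path among the crawl targets.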

Validate the execution of the Crawler function and that exclusions have been applied

Once the status of the Crawler function on the EM web interface shows completed,
  1. Open the More Info tab to view a summary of the crawler function execution.
  2. Verify whether the exclusions were applied to the target scan by comparing the Files Fingerprinted count with the Filtered from Fingerprint count. In the example below, the crawler ran over a target-scan folder that contains four files, two of which were excluded using the file extensions .xlsx and .docx. Files Fingerprinted and Filtered from Fingerprint each show two.
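The arithmetic behind those two counters can be checked with a short sketch. The file names below are invented; only the counts (four files scanned, two excluded by extension) follow the article's example:

```python
# Hypothetical re-creation of the example: four files in the target folder,
# with the .xlsx and .docx extensions excluded.
scanned = ["notes.txt", "summary.txt", "payroll.xlsx", "contract.docx"]
excluded_extensions = ("xlsx", "docx")

filtered = [f for f in scanned if f.rsplit(".", 1)[-1] in excluded_extensions]
fingerprinted = [f for f in scanned if f not in filtered]

print(len(fingerprinted), len(filtered))  # 2 2
```

Files Fingerprinted plus Filtered from Fingerprint should always add up to the total number of files the crawler scanned, which is the quickest sanity check that the exclusions took effect.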