Data Privacy: Recommended Configurations

Document created by RSA Information Design and Development on Jul 26, 2016
Version 1

Administrators can set up the Security Analytics hosts and services to meet data privacy requirements for their environment. RSA has recommended configurations for both data privacy and data retention.

Recommended Data Privacy Configuration

The recommended configuration to obtain the best analytical value with data obfuscation enabled is to define privacy-sensitive meta data and keep both original and obfuscated (hash) values of privacy-sensitive data on disk for Decoders, Log Decoders, Concentrators, and Brokers.

The assumption is that only a handful of meta data (approximately 10 meta keys) will be classified as protected and a FIPS 140-compliant algorithm for hashing will be used along with a salt to make reverse engineering the original value difficult. The recommended solution is SHA-256 with a salt of length at least 16 characters and up to 60 characters.

Note: By default, hash values are stored in binary format, which gives faster response times and requires less storage space in the database than saving them in string format. The recommended storage method, however, is text/string.
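As an illustration of the recommended scheme, the sketch below hashes a sensitive value with SHA-256 and a 16-character salt, and shows the size difference between the binary and text/string forms of the hash. The salting scheme shown is an assumption for illustration; the product's internal implementation may differ.

```python
import hashlib

def obfuscate(value: str, salt: str) -> bytes:
    """Return the salted SHA-256 digest of a privacy-sensitive value.

    Illustrative only: how Security Analytics combines salt and value
    internally is not documented here, so prepending is an assumption.
    """
    return hashlib.sha256(salt.encode() + value.encode()).digest()

salt = "0123456789abcdef"                # at least 16 characters, as recommended
digest = obfuscate("jsmith@example.com", salt)

print(len(digest))        # binary form: 32 bytes
print(len(digest.hex()))  # text/string form: 64 characters
```

The same input and salt always yield the same hash, which is what lets obfuscated values be correlated across services without exposing the original.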

Brokers and Investigation may have original and obfuscated data in cache due to data privacy officers using Investigation to confirm the original value to which the obfuscated value maps during investigations. Downstream services can also limit the use of the original sensitive values to in-memory processing so that data does not persist on disk in those downstream systems; this holds true for ESA and Malware Analysis.

The recommended solution to delete data when ready is the built-in and automatic data retention enforcement, which deletes data at a certain threshold. You can use this method for the following components in Security Analytics 10.5: Decoder, Log Decoder, Log Collector, Archiver, Malware Analysis, Incident Management, and Reporting Engine. You can manually configure Event Stream Analysis to support similar automatic data retention enforcement.

To manage cache storage, the Security Analytics server clears cache related to investigations of events every 24 hours. The Broker can also be configured to execute a periodic removal of locally stored cache.
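The cleanup described above is equivalent in spirit to the following sketch, which removes cached files older than 24 hours. The cache directory layout is hypothetical; Security Analytics performs this cleanup itself on the schedule described above.

```python
import os
import time

def clear_stale_cache(cache_dir: str, max_age_hours: float = 24.0) -> int:
    """Delete cached files older than max_age_hours; return the count removed.

    The flat cache-directory layout is an assumption for illustration --
    the Security Analytics server and Broker manage their own cache paths.
    """
    cutoff = time.time() - max_age_hours * 3600
    removed = 0
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed += 1
    return removed
```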

Options for Data Retention Configurations

Security Analytics provides alternative controls that the administrator can apply to enforce stronger restrictions on privacy-sensitive data storage when data obfuscation is enabled.

Data Storage With Data Retention Options in Effect

The following table summarizes where data is stored in the default configuration with no data privacy as well as for each data retention alternative. A checkmark indicates that privacy-sensitive data is saved on the component; an X indicates that no privacy-sensitive data is stored on the component.

                                                                                                              
| Component               | Original Data Stored (default) | Original Data and Hash Stored (recommended) | Only Hash Stored | No Data Stored (all meta data is transient) |
|-------------------------|--------------------------------|---------------------------------------------|------------------|---------------------------------------------|
| Ingestion               |                                |                                             |                  |                                             |
| Decoder                 | ✓                              | ✓                                           | X                | X                                           |
| Log Decoder             | ✓                              | ✓                                           | X                | X                                           |
| Meta Aggregation        |                                |                                             |                  |                                             |
| Concentrator            | ✓                              | ✓                                           | X                | X                                           |
| Broker                  | ✓ (Cache only)                 | ✓ (Cache only)                              | X                | X                                           |
| Real-Time Analysis      |                                |                                             |                  |                                             |
| Investigation           | ✓                              | ✓ (Cache only)                              | X                | X                                           |
| Event Stream Analysis   | ✓                              | X                                           | X                | X                                           |
| Malware Analysis        | ✓                              | X                                           | X                | X                                           |
| Incident Management     | ✓                              | X                                           | X                | X                                           |
| Reporting               |                                |                                             |                  |                                             |
| Reporting Engine        | ✓                              | ✓ (Optional)                                | X                | X                                           |
| Long-Term Analytics     |                                |                                             |                  |                                             |
| Archiver                | ✓ (Optional)                   | ✓ (Optional)                                | X                | X                                           |
| Warehouse               | ✓ (Optional)                   | ✓ (Optional)                                | X                | X                                           |
| Content                 |                                |                                             |                  |                                             |
| Live                    | n/a                            | n/a                                         | n/a              | n/a                                         |
| Fraud Analysis          |                                |                                             |                  |                                             |
| WebThreat Detection     | n/a                            | n/a                                         | n/a              | n/a                                         |
| End Point Protection    |                                |                                             |                  |                                             |
| ECAT                    | n/a                            | n/a                                         | n/a              | n/a                                         |

Notes:

Cache Only means that sensitive data is in the Broker or Security Analytics Server cache. Configure Data Retention provides details about automated and manual clearing of cache.

Optional means that sensitive data storage does occur, but can be limited by optional configurations. For example, to limit where sensitive data is stored, do not enable DPO access for Reporting and do not aggregate original protected data into the Archiver.

Option 1: No Original Data Saved to Disk, Only Hash Stored

Administrators can eliminate the persistence of sensitive data to disk and store only an obfuscated value if the risk of exposure is too great. In this scenario, meta data generated during parsing on the Decoders and Log Decoders is used only in memory and not written to disk. Administrators can configure individual meta keys on a Decoder or Log Decoder as transient to ensure that sensitive meta data is not written to disk. Downstream services do not see original values and must use obfuscated values to conduct investigation and analytics.

To configure this data privacy scheme, data obfuscation must be enabled with hash values configured. You can configure individual meta keys on a Decoder or Log Decoder as transient to ensure that original values are not written to disk.

  • Original values identified as sensitive are extracted from the raw data during parsing on the Decoder and Log Decoder and are accessible to the system during parsing (parsers, rules, feeds).
  • The Decoder does not save the original values for meta keys identified as sensitive, storing only the hash of original values along with other non-sensitive meta data related to the event.
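The parse-time flow above can be sketched as follows. The meta key names, salt, and feed-matching logic are illustrative assumptions; they stand in for the Decoder's internal parsing pipeline, in which original values are visible to parsers, rules, and feeds only in memory.

```python
import hashlib

SENSITIVE_KEYS = {"username", "ip.src"}   # hypothetical keys marked transient
SALT = b"0123456789abcdef"                # illustrative 16-character salt

def persist_meta(parsed_meta: dict, watchlist: set) -> tuple:
    """Return (meta written to disk, parse-time feed matches).

    Original sensitive values are available here, at parse time only;
    what reaches disk is the salted hash plus the non-sensitive meta.
    """
    # Feeds and rules can still act on the original value in memory.
    alerts = [v for k, v in parsed_meta.items()
              if k in SENSITIVE_KEYS and v in watchlist]
    stored = {}
    for key, value in parsed_meta.items():
        if key in SENSITIVE_KEYS:
            stored[key + ".hash"] = hashlib.sha256(SALT + value.encode()).hexdigest()
        else:
            stored[key] = value
    return stored, alerts
```

Downstream services querying the stored meta see only the `.hash` values, so investigation and analytics proceed on the obfuscated form.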

A side effect of these options is some loss in analytical capability, but you can configure these to suit the needs of your environment.

  • By configuring all sensitive data as Transient, sensitive values are not persisted to disk, and the analytic capabilities using the original value are available at parse time only (parsers, rules, feeds).
  • Event stream analysis (ESA) and malware analysis systems must rely only on the obfuscated meta values when doing their correlation and scoring respectively.
  • Reporting Engine is limited to pulling reports using the non-sensitive and obfuscated values.
  • The data privacy officer cannot view the original value, but can use the configured hash and salt to determine if an obfuscated value represents a specific known original value.
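The known-value check available to the data privacy officer amounts to recomputing the hash, sketched below under the assumption of salted SHA-256 as recommended earlier; the salt-prepending convention is illustrative.

```python
import hashlib

def matches_known_value(obfuscated_hex: str, candidate: str, salt: str) -> bool:
    """Check whether a stored obfuscated value corresponds to a suspected
    original. The DPO never recovers unknown originals this way; the check
    only confirms or refutes a value the DPO already knows.
    """
    digest = hashlib.sha256(salt.encode() + candidate.encode()).hexdigest()
    return digest == obfuscated_hex
```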

Option 2: No Original or Obfuscated Values Stored (Not Recommended)

Administrators can eliminate the persistence of the original value to disk entirely if the risk of exposure is too great. As in Option 1, in this scenario, meta data generated during parsing on the Decoders and Log Decoders is used only in memory and not written to disk. Administrators can configure individual meta keys on a Decoder or Log Decoder as transient to ensure that sensitive meta data is not written to disk. Downstream services do not see original values and have no obfuscated values to conduct investigation and analytics.

To configure this data privacy scheme, configure individual meta keys on a Decoder or Log Decoder as transient to ensure that original values are not written to disk.

  • Original values identified as sensitive are extracted from the raw data during parsing on the Decoder and Log Decoder and are accessible to the system during parsing (parsers, rules, feeds).
  • The Decoder does not save the original values for meta keys identified as sensitive, storing only non-sensitive meta data related to the event.

A side effect of these options is significant loss in analytical capability, but you can configure these to suit the needs of your environment.

  • By configuring all sensitive data as Transient, sensitive values are not persisted to disk, and the analytic capabilities using the original value are available at parse time only (parsers, rules, feeds). See Configure Data Retention.
  • All downstream components have no visibility into the original values, obfuscated or otherwise.
  • The data privacy officer has no visibility into the original value, obfuscated or otherwise.

Optional Data Overwriting Options

Option 1: Limit Disk Space for Continuous Overwriting of Older Data

If the desired data retention period, and therefore the amount of storage required for that period, is known, the size of the underlying hardware or the partition can be limited to match. Reducing the hard drive storage or the partition size limits the amount of free space that must be filled before newly ingested data begins to overwrite the oldest data. Either solution must be implemented at deployment time to be effective.
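A quick sizing calculation makes this concrete. The sketch below is a rough estimate assuming a known daily ingest rate; the 10% overhead factor for index and metadata growth is an assumption, not a product figure.

```python
def required_storage_gb(ingest_gb_per_day: float, retention_days: int,
                        overhead_factor: float = 1.1) -> float:
    """Rough partition size needed so that, once full, newly ingested
    data continually overwrites data older than the retention period.

    overhead_factor (10% here) is an assumed allowance for index and
    metadata growth on top of the raw capture volume.
    """
    return ingest_gb_per_day * retention_days * overhead_factor

print(required_storage_gb(500, 30))   # e.g. 500 GB/day for 30 days: ~16500 GB
```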

Side effects of this option are:

  • The removal of some disks will limit the number of resources available to distribute the I/O, causing some degradation in performance.
  • The smaller partition size may cause some degradation in performance, but would alleviate some of the performance impact of removing disks.

Option 2: Use Tiered Storage to Overwrite Data on a Scheduled Basis

If overwriting of data is required on a scheduled, automatic basis, you can configure the Decoders and Concentrators to use tiered storage. The tiered storage configuration provides a mechanism for invoking a script after a database file has been removed from the application but prior to its removal from the file system. If necessary, instead of moving the file to second-tier (cold) storage, which is the intended function in a tiered storage use case, the script can use a utility like the CentOS shred utility to overwrite the file. This tool is less effective when the database is stored in a journaling file system like XFS, in which the Security Analytics Core database resides, and on RAID logical drives like those used in Security Analytics Core appliances.
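As a rough sketch of such a hook, the script below overwrites a retired database file in place instead of moving it to cold storage. The invocation convention (the file path passed as the first argument) is an assumption for illustration; consult the tiered storage documentation for the actual interface, and note the shred caveats on XFS and RAID described above.

```python
import subprocess
import sys

def build_shred_command(path: str) -> list:
    """Overwrite the file with three random passes, add a final zeroing
    pass, then unlink it (GNU coreutils shred long options)."""
    return ["shred", "--iterations=3", "--zero", "--remove", path]

def on_db_file_retired(path: str) -> None:
    """Tiered-storage hook: called after the service has released a
    database file but before it leaves the file system. The calling
    convention is hypothetical; shredding replaces the usual move to
    cold storage."""
    subprocess.run(build_shred_command(path), check=True)

if __name__ == "__main__" and len(sys.argv) > 1:
    on_db_file_retired(sys.argv[1])
```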

Most other Security Analytics components do not have this option; their data is stored in a database that does not support the tiered storage mechanism. The only other component that could use this overwrite method is the Reporting Engine since it saves reports and alerts as individual files. However, the Reporting Engine charts are stored in a database so they would be immune to this technique.

Option 3: Purge Data Using String and Pattern Redaction Option

Data purging provides a mechanism to strategically overwrite a specific subset of data from the system in case any sensitive data has been persisted either on purpose or by accident. The Security Analytics wipe utility allows for unique patterns to be written over the data in the meta and packet databases for Security Analytics Core services, which may contain raw packets or logs for existing sessions, based on a session identifier. All Security Analytics Core components have the capability to overwrite a subset of data that has been found by executing a query string, including regex patterns. The session identifiers resulting from the query are fed into the Security Analytics wipe utility.

Note: This option is not available if the data in the SA Core database has been compressed (as typically done in Archiver deployments).

In most Security Analytics components the database in use does not provide a built-in redaction or secure deletion mechanism. The Malware Analysis component can overwrite the data object in the database with the value private instead of deleting it during the data retention management process, but this is not meant to be a secure deletion mechanism.

Caution: Using this method on a large number of sessions has two drawbacks: it can be time-consuming, and it can impact performance.

Limitations to Data Overwriting

There are limitations to the overwriting techniques described in Options 2 and 3. To overwrite the data in the disk sectors, the above options and the command line tool provided as an alternative method (shred, a CentOS utility) make assumptions about the disk layout. Security Analytics appliances use SSD drives and RAID configurations for performance and reliability reasons, and these inhibit the effectiveness of the overwrite techniques. If overwrite techniques alter SSD drives and RAID configurations in an attempt to increase security, there will inevitably be an associated performance cost, reflected in ingest rates, query speeds, and potentially other areas. The command line tools available for overwriting are recommended only for special use cases in which specific data must be redacted; they are not for continuous, real-time use because of the performance cost that would be incurred.
