Decoder: Configure 10G Capability

Document created by RSA Information Design and Development on Mar 22, 2017. Last modified by RSA Information Design and Development on Sep 25, 2017.
Version 3

This topic explains how administrators can tune a Packet Decoder specifically for high-speed packet capture.

This guide applies when capturing packets on a 10G interface card. Packet capture at high speeds requires careful configuration and pushes the Decoder hardware to its limits, so please read this entire topic when implementing a 10G capture solution.

RSA Security Analytics Version 10.6.2 provides support for high-speed collection on the Decoder. You can capture network packet data from higher speed networks and optimize your Packet Decoder to capture network traffic up to 8Gb/sec sustained and 10Gb/sec burst, depending on which parsers and feeds you have enabled.

Note: You can skip to Configure 10G Decoder if you are starting with new Series 5 hardware.

Enhancements introduced to facilitate capture in these environments include the following:

• Use of the pf_ring capture driver to leverage commodity 10G Intel NICs for high-speed capture.

• Introduction of the assembler.parse.valve configuration. This setting automatically disables application parsers when certain thresholds are exceeded, which limits the risk of packet loss. While application parsers are disabled, network layer parsers remain active. Once the statistics fall back below the thresholds, application parsers are automatically re-enabled.

• Introduction of parallel.values configuration on the Concentrator for query optimizations.

Hardware Prerequisites

  • Series 4S Decoder
  • Intel 82599-based ethernet card, such as the Intel x520. All RSA-provided 10G cards meet this requirement.
  • 96 GB of DDR3-1600 memory in dual-rank DIMMs. Single-rank DIMMs may decrease performance by as much as 10%. To determine the speed and rank of the installed DIMMs, run the command dmidecode -t 17.
  • Sufficiently large and fast storage to meet the capture requirement. Storage considerations are covered later in this topic.
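
The DIMM check above can be scripted. The following is a minimal sketch, not an RSA tool: it parses text in the style of dmidecode -t 17 output and flags DIMMs that are slower than 1600 MHz or not at least dual-rank. The sample text and the function names are illustrative assumptions; on a real appliance you would feed in the actual command output.

```python
import re

# Illustrative excerpt in the style of `dmidecode -t 17` output.
# The values are made up for this example, not taken from a real Decoder.
SAMPLE_OUTPUT = """\
Memory Device
    Size: 8192 MB
    Locator: DIMM_A1
    Type: DDR3
    Speed: 1600 MHz
    Rank: 2

Memory Device
    Size: 8192 MB
    Locator: DIMM_A2
    Type: DDR3
    Speed: 1600 MHz
    Rank: 1

Memory Device
    Size: No Module Installed
    Locator: DIMM_A3
    Speed: Unknown
    Rank: Unknown
"""


def leading_int(value):
    """Return the integer at the start of a field value, or None."""
    match = re.match(r"\d+", value)
    return int(match.group()) if match else None


def parse_dimms(text):
    """Return one dict of 'Key: value' fields per blank-line-separated block."""
    dimms = []
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.strip().partition(": ")
            if sep:
                fields[key] = value
        if fields:
            dimms.append(fields)
    return dimms


def dimm_warnings(dimms, min_speed_mhz=1600, min_rank=2):
    """List populated DIMMs that miss the speed or rank prerequisite."""
    warnings = []
    for dimm in dimms:
        if leading_int(dimm.get("Size", "")) is None:
            continue  # empty slot, nothing to check
        name = dimm.get("Locator", "unknown DIMM")
        speed = leading_int(dimm.get("Speed", ""))
        rank = leading_int(dimm.get("Rank", ""))
        if speed is None or speed < min_speed_mhz:
            warnings.append(f"{name}: speed {dimm.get('Speed')} is below {min_speed_mhz} MHz")
        if rank is None or rank < min_rank:
            warnings.append(f"{name}: rank {dimm.get('Rank')} is below {min_rank}")
    return warnings


for warning in dimm_warnings(parse_dimms(SAMPLE_OUTPUT)):
    print(warning)
```

With the sample text, only the single-rank DIMM is reported; empty slots are skipped.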

Software Prerequisites

  • Linux kernel package obtained from RSA. Only Linux kernel packages provided by RSA are supported.
  • The pfring package that matches the currently installed kernel. The kernel version must match the pfring version exactly.
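
Because the kernel and pfring versions must match exactly, it can help to check the pairing before installing. The sketch below assumes the naming convention visible in the package names used in this topic, where the pfring package name embeds the kernel version with the hyphen replaced by a dot (for example, kernel 2.6.32-573.12.1 appears as 2.6.32.573.12.1). The function names are hypothetical, and the kernel release string is assumed to be in the form reported by uname -r.

```python
import re


def kernel_tag(kernel_release):
    """Normalize '2.6.32-573.12.1.el6.x86_64' to '2.6.32.573.12.1'."""
    match = re.match(r"(\d+\.\d+\.\d+)-([\d.]+\d)", kernel_release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {kernel_release!r}")
    return f"{match.group(1)}.{match.group(2)}"


def pfring_matches_kernel(pfring_package, kernel_release):
    """True when the pfring package name embeds exactly this kernel version."""
    tag = re.escape(kernel_tag(kernel_release))
    # Require a non-digit on both sides so 573.12.1 does not match 573.12.10.
    return re.search(r"(?<!\d)" + tag + r"(?!\d)", pfring_package) is not None


package = "pfring-6.0.3-8598.2.6.32.573.12.1.el6.x86_64.rpm"
print(pfring_matches_kernel(package, "2.6.32-573.12.1.el6.x86_64"))
print(pfring_matches_kernel(package, "2.6.32-642.6.2.el6.x86_64"))
```

The first call reports a match and the second a mismatch, which is the situation to resolve before rebooting into the pf_ring drivers.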

10G Decoder Installation

Perform the following steps to install the Security Analytics 10.6.2 10G Decoder:

Prerequisites

• SA-S4H-P-DEC or SMC-S4H-P-DEC platforms built on the Dell R620 Platform

• SMC-10GE-* 10G Intel 520 NIC installed (available from RSA)

• Packet Decoders updated to 10.6.2

• Each Packet Decoder configured with a minimum of 2 DACs or SAN connectivity.

Note: Refer to Storage Considerations in this document prior to update, as physical re-cabling may be required.

• Dell R620 BIOS v1.2.6 or later. RSA recommends updating to the latest BIOS, v2.2.3, but this is not required for 10G if the system is already running v1.2.6 or later.

Note: BIOS revisions earlier than v1.2.6 have issues properly identifying the location of the 10G capture card within the system. It is important to update the BIOS before installing packages, as the packages use information provided by the BIOS to initialize the system.

BIOS Installation Instructions

1. Download BIOS v2.2.3 from the following location:

http://www.dell.com/support/home/us/en/19/Drivers/DriversDetails?driverId=V7P04

2. Download the Update Package for Red Hat Linux file.

3. Copy the file to the Security Analytics server.

4. Log in as root.

5. Make the file executable (for example, with chmod +x).

6. Run the following file:

./BIOS_V7P04_LN_2.2.3.BIN

7. When the update completes, the system requests a reboot.

Note: The BIOS installation procedure takes approximately 10 minutes.

Update 10G Decoder

1. Update the Decoder appliance to version 10.6.2, including all OS patches. The minimum supported security patch level is RSA Security Analytics Version 10.6.2. This release requires the following Linux kernel package:

kernel-2.6.32-642.6.2, which is the kernel release for RSA Security Analytics Version 10.6.2.

2. Ensure that the kernel, pfring, and numactl versions are as follows:

kernel-2.6.32-573.12.1.el6.x86_64

pf_ring 6.0.3-8598.2.6.32.642.6.2

numactl-2.0.9-2.el6.x86_64.rpm

Install 10G Decoder

1. Download the latest version of the pfring rpm package from smcupdate:

pfring-6.0.3-8598.2.6.32.573.12.1.el6.x86_64.rpm

For more information, refer to RSA SecurCare: https://knowledge.rsasecurity.com.

2. Copy the files to the Decoder using scp, then install the packages over ssh using the following command:

rpm -ivh pfring*

Note: Be sure to perform the following checks:

a. Check for the el6 rpm using the following command:

rpm -qa | grep numactl

b. Check to ensure the version is numactl-2.0.9-2.el6.x86_64.rpm

Note: If you performed the update step above before upgrading the BIOS, do the following:

• Uninstall the packages using the rpm -e command.

• Update the BIOS to v2.2.3.

• Run the rpm commands to install the necessary packages again.

3. Ensure that the kernel, pfring, and numactl versions are as follows:

kernel-2.6.32-573.12.1.el6.x86_64

pfring-6.0.3-8598.2.6.32.573.12.1.el6.x86_64.rpm

numactl-2.0.9-2.el6.x86_64.rpm

4. Reboot the Decoder appliance (full system restart is required to ensure the pf_ring drivers load correctly).

5. Once the Decoder reboots, verify that the installation was successful by confirming that additional PFRINGZC interfaces are available under the options for “Capture Interface Selected”.

Configure 10G Decoder

Once updated, perform the following steps to configure the 10G Decoder:

1. From the Decoder Explorer view, right-click Decoder and select Properties.

2. In the Properties drop-down menu, select reconfig and enter the following parameters:

update=1 op=10g

3. From the Decoder Explorer view, right-click database and select Properties.

4. In the Properties drop-down menu, select reconfig and enter the following parameters:

update=1 op=10g

5. Select the capture port adapter. Options for this include:

a. Single port capture - PFRINGZC,p1p1 or PFRINGZC,p1p2

b. Capture on both ports:

i. Select PFRINGZC,p1p1

ii. In Explorer view, set capture.device.params = device=zc:p1p2,zc:p1p1

c. Ensure that the selected capture hardware is on the correct NUMA node.

From an ssh session to the appliance, execute the following statement:

cat /sys/class/net/<interface_name>/device/numa_node

where <interface_name> is the selected capture interface (for example, p1p1).

If the result is 0 (zero), no additional configuration is necessary.

If not, add the result as the parameter core to the capture parameters, as shown below:

/decoder/config/capture.device.params: core=1

This change requires a service restart to take effect.

Note: Based on hardware configuration, the capture ports may be identified with a different name other than p1p1/p1p2 but will always have the prefix PFRINGZC. For example, on some appliances these ports may be identified as eth4 / eth5. To capture from eth4, select PFRINGZC,eth4. To capture from eth5, select PFRINGZC,eth5.

6. If the write thread is having trouble sustaining the capture speed, you can try the following:

Change /database/config/packet.integrity.flush to normal.

Note: You can also try increasing packet.file.size, but keep the file size under 10 GB, because the whole file is buffered in memory at these speeds.

7. (Optional) Application parsing is extremely CPU intensive and can cause the Decoder to drop packets. To mitigate application parsing induced drops, the setting /decoder/config/assembler.parse.valve can be set to true. This will result in the following:

• When session parsing becomes a bottleneck, application parsers (HTTP, SMTP, FTP, etc.) will be temporarily disabled.

• Sessions are not dropped while the application parsers are disabled; only the fidelity of the parsing performed on those sessions is reduced.

• Sessions parsed when the application parsers are disabled will still have associated network meta (NETWORK parser).

• The statistic /decoder/parsers/stats/blowoff.count displays the count of all sessions that bypassed application parsers (network parsing is still performed).

• When session parsing is no longer a potential bottleneck, the application parsers are automatically re-enabled.

8. The assembler session pool should be large enough that it is not forcing sessions.

• To determine whether sessions are being forced, check the statistic /decoder/stats/assembler.sessions.forced (it will be increasing) and check whether /decoder/stats/assembler.sessions is within several hundred of /decoder/config/assembler.session.pool.

• RSA Security’s test site used the following configuration at just under 10G:

/decoder/config/assembler.session.pool was set to 1000000

and /decoder/stats/assembler.sessions would average 630K.
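
The check in Step 8 can be expressed as a small rule. This is a sketch only: the function name is hypothetical, the values would come from two successive readings of the statistics named above, and the margin of 500 is an assumed stand-in for "within several hundred".

```python
def sessions_being_forced(forced_before, forced_after, sessions, pool_size, margin=500):
    """Apply the Step 8 rule to two readings of the Decoder statistics.

    forced_before / forced_after: two successive readings of
        /decoder/stats/assembler.sessions.forced
    sessions:  current /decoder/stats/assembler.sessions
    pool_size: /decoder/config/assembler.session.pool
    margin:    assumed meaning of "within several hundred"
    """
    forced_rising = forced_after > forced_before
    pool_nearly_full = (pool_size - sessions) <= margin
    return forced_rising and pool_nearly_full


# Values reported for the test site: pool of 1,000,000 with about 630K sessions.
print(sessions_being_forced(0, 0, 630_000, 1_000_000))

# Hypothetical undersized pool: counter rising and the pool almost exhausted.
print(sessions_being_forced(1_200, 1_950, 999_800, 1_000_000))
```

The test-site values do not trip the rule, while the hypothetical second reading does, which would suggest increasing assembler.session.pool.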

As an alternative to Steps 1 through 4 above, you can configure the 10G Decoder by performing Steps 1 through 4 below. Steps 5 through 8 above are still required when you use this method.

1. Update session and packet pool settings to the following values (under /decoder/config):

a. pool.packet.pages = 1000000

b. pool.session.pages = 300000

2. Set the packet write block size (/database/config/packet.write.block.size) to exactly 4 GB or, for Version 10.6 and later, use filesize.

Note: This configures the Decoder to buffer the file with huge pages and write using direct I/O for maximum performance.

3. Update parse thread settings to the following values (under /decoder/config):

a. parse.threads = 12

4. Select the capture port adapter. Options for this include:

a. Single port capture - PFRINGZC,p1p1 or PFRINGZC,p1p2

b. Capture on both ports:

i. Select PFRINGZC,p1p1

ii. In Explorer view, set capture.device.params = device=zc:p1p2,zc:p1p1

Note: Based on hardware configuration, the capture ports may be identified with a different name other than p1p1/p1p2 but will always have the prefix PFRINGZC. For example, on some appliances these ports may be identified as eth4 / eth5. To capture from eth4, select PFRINGZC,eth4. To capture from eth5, select PFRINGZC,eth5.
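
Whichever method you use, the NUMA check from the capture port step (read numa_node for the capture interface and add core=<result> when the result is not zero) can be sketched as follows. The function names are hypothetical. The sysfs_root parameter exists only so the logic can be exercised against a test directory, and treating a negative value as "no NUMA information, change nothing" is an assumption of this sketch rather than guidance from this topic.

```python
from pathlib import Path


def read_numa_node(interface, sysfs_root="/sys/class/net"):
    """Read the NUMA node of a network interface from sysfs."""
    path = Path(sysfs_root) / interface / "device" / "numa_node"
    return int(path.read_text().strip())


def core_param(numa_node):
    """Return the parameter to add to capture.device.params, or '' if none."""
    if numa_node <= 0:
        # 0: no additional configuration is necessary.
        # Negative: sysfs reports -1 when no NUMA information is exposed;
        # leaving the parameters unchanged here is an assumption.
        return ""
    return f"core={numa_node}"


if __name__ == "__main__":
    interface = "p1p1"  # replace with the selected capture interface
    try:
        node = read_numa_node(interface)
    except (FileNotFoundError, ValueError):
        print(f"could not read the NUMA node for {interface}")
    else:
        print(f"{interface}: numa_node={node}, add: {core_param(node) or '(nothing)'}")
```

Remember that a change to capture.device.params requires a service restart to take effect.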

Storage Considerations

When capturing at higher speed rates, the storage system holding the packet and meta databases must be capable of the necessary throughput for reads and writes to disk. Supported options for DAC and SAN configurations are outlined below.

Using the Series 4S Hardware (With Two or More DAC Units)

The Decoder head unit is equipped with a hardware-RAID SAS controller card providing connectivity to the DAC. In most deployments, the DACs are daisy-chained off a single port on the SAS card. To support higher speed environments, a minimum of two DACs is required per Decoder, and each must be connected directly to the SAS card. To accommodate two DACs, connect the first DAC to one port on the SAS card, and then connect the second DAC to the other port on the SAS card. For environments with more than two DACs, chain them off each port in a balanced manner. This may require re-cabling the DACs in an existing deployment, but should not affect data that has already been captured on the Decoder.

If adding new capacity, use the currently available NwMakeArray script to provision the DAC units. The script adds one DAC per execution (for example, to add three DACs, run the script three times) and adds each one to NwDecoder10G's configuration as a separate mount point. The independent mount points are important because they allow NwDecoder10G to separate the write I/O from capture from the read I/O needed to satisfy packet content requests.

Using SAN Storage

The Decoder will allow any storage configuration that can meet the sustained throughput requirement. Note that a standard 8Gbit FC link to a SAN is not sufficient to read and write packet data at 10G, so environments that use a SAN must configure connectivity to the SAN over multiple FC links.

Parsing and Content Consideration for Packet Capture

Capturing and performing enrichment against raw packets can present unique challenges at any capture rate. With the higher session and packet rates seen at 10G, parsing efficiency is paramount. A single parser can have a detrimental effect on the system, ultimately resulting in packet drops. Testing performed for 10G capture included baseline parsers as well as combinations of feeds, rules, and other content accessible via RSA Live. Whether you are updating a currently deployed system or deploying a new system, follow the best practices below to minimize the risk of packet loss. The exception is an update to an existing 10G deployment that does not add traffic. For example, a Decoder currently capturing off a 10G card at 2G sustained should see no difference in performance, unless the update also adds traffic for capture.

10G Best Practices

1. Incorporate baseline parsers (except SMB/Webmail, both of which generally have high CPU utilization) and monitor to ensure little to no packet loss.

2. When adding additional parsers, add only one or two parsers at a time.

3. Measure performance impact of newly added content, especially during peak traffic periods.

• If drops start occurring when they did not happen before, disable all newly added parsers, then enable just one at a time and measure the impact. This helps pinpoint the individual parsers that are degrading performance. It may be possible to refactor such a parser to perform better or to reduce its feature set to just what the customer use case requires.

• Although feeds have a smaller performance impact, they should also be reviewed and added in phases so that their impact can be measured.

• Application Rules also tend to have little observable impact, though again, it is best not to add a large number of rules at once without measuring the performance impact.
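
The isolation procedure described above (disable all newly added parsers, then enable one at a time and measure) can be sketched as a loop. Everything here is illustrative: measure_drops is a hypothetical callback standing in for "enable exactly these parsers, observe through a peak traffic period, and return the packet-drop count", and the parser names and drop counts in the usage example are made up.

```python
def isolate_costly_parsers(new_parsers, measure_drops, baseline_drops=0):
    """Enable each new parser on its own and record those that add drops.

    new_parsers:    names of the newly added parsers, all currently disabled
    measure_drops:  hypothetical callback; receives the set of parsers to
                    enable and returns the drop count observed with them
    baseline_drops: drop count observed with none of the new parsers enabled
    """
    costly = []
    for parser in new_parsers:
        drops = measure_drops({parser})
        if drops > baseline_drops:
            costly.append((parser, drops))
    return costly


# Stand-in measurement for demonstration; real numbers would come from
# observing the Decoder's drop statistics during peak traffic.
FAKE_DROPS = {"parser_a": 0, "parser_b": 4200, "parser_c": 0}


def fake_measure(enabled):
    return sum(FAKE_DROPS[name] for name in enabled)


print(isolate_costly_parsers(["parser_a", "parser_b", "parser_c"], fake_measure))
```

Testing parsers individually takes longer than enabling them in batches, but it attributes drops to a specific parser, which is what makes refactoring or trimming that parser possible.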

Finally, making the recommended configuration changes outlined in the Configuration section will help minimize potential issues.

Aggregation on a 10G Decoder to Other Security Analytics Components

With the initial release, aggregation from the Packet Decoder to a Concentrator is supported. Deployments utilizing Malware Analysis, Event Stream Analysis, Warehouse Connector, and Reporting Engine are expected to impact performance and can lead to packet loss. Due to the high session rates, the following configuration changes are recommended:

• Enable nice aggregation on the Concentrator to limit the performance impact on the 10G Decoder:

/concentrator/config/aggregate.nice = true

• Due to the high volume of sessions on the Concentrator, consider activating "parallel values" mode on the Concentrator by setting /sdk/config/parallel.values to true. This improves investigation performance when the number of sessions per second is above 30,000.

• Deployments that also use other SA components (that is, Warehouse, Malware Analysis, ESA, and Reporting Engine) require further review of content and parsing.
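
As a compact summary of the Concentrator recommendations above, the sketch below returns the settings suggested for a given session rate. The function name is hypothetical; the node paths and the 30,000 sessions/sec threshold are taken from the text above, and the function only computes a recommendation, it does not apply anything to a Concentrator.

```python
def recommended_concentrator_settings(sessions_per_sec, parallel_threshold=30_000):
    """Return the Concentrator settings this topic recommends for a 10G Decoder."""
    settings = {"/concentrator/config/aggregate.nice": "true"}
    if sessions_per_sec > parallel_threshold:
        # "parallel values" mode helps investigation above the threshold.
        settings["/sdk/config/parallel.values"] = "true"
    return settings


for rate in (20_000, 45_000):
    print(rate, recommended_concentrator_settings(rate))
```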

Decoder

Storage Considerations

When capturing at 10G line rates, the storage system holding the packet and meta databases must be capable of sustained write throughput of 1400 MBytes/s.

There are several ways to achieve such high sustained throughput. Here we describe one such possible solution, though other storage architectures are possible.

Using the Series 4S hardware, with two DAC units

The Series 4S is equipped with a hardware RAID SAS controller capable of an aggregate 48Gbit/s of I/O throughput. It is equipped with 8 external 6 Gbit ports, organized into two 4-lane SAS cables. The recommended configuration for 10G is to balance at least 2 DAC units across these two external connectors. For example, connect 1 DAC to one port on SAS card, and then connect another DAC to the other port on the SAS card. As you add more DACs, chain them off of each port in a balanced manner.
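
The controller figures above can be checked with a little arithmetic. This sketch reproduces the 48 Gbit/s aggregate and estimates the usable byte rate per 4-lane connector, assuming 8b/10b line encoding (10 line bits per data byte), which is typical for 6 Gbit/s SAS links. The encoding assumption and the resulting MByte/s figures are estimates and are not stated in this topic.

```python
LANE_GBIT = 6             # external port speed, from the text above
LANES_PER_CABLE = 4       # lanes per SAS cable, from the text above
CABLES = 2                # external connectors, from the text above
REQUIRED_MBYTES = 1400    # sustained write requirement stated in this topic

LINE_BITS_PER_BYTE = 10   # assumption: 8b/10b encoding


def usable_mbytes_per_sec(line_gbit):
    """Estimate the data rate in MBytes/s for a given line rate in Gbit/s."""
    return line_gbit * 1000 / LINE_BITS_PER_BYTE


aggregate_gbit = LANE_GBIT * LANES_PER_CABLE * CABLES
per_cable_mbytes = usable_mbytes_per_sec(LANE_GBIT * LANES_PER_CABLE)

print(f"aggregate line rate: {aggregate_gbit} Gbit/s")
print(f"estimated usable rate per connector: {per_cable_mbytes:.0f} MBytes/s")
print(f"write requirement: {REQUIRED_MBYTES} MBytes/s")
```

Under these assumptions a single connector has the raw bandwidth for the write stream alone, and balancing DACs across both connectors leaves more room for the read I/O that packet content requests add on top of capture.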

As you add capacity, use the NwMakeArray script to provision the DAC units. This automatically adds them to NwDecoder10G's configuration as separate mount points. The independent mount points are important because they allow NwDecoder10G to separate the write I/O from capture from the read I/O needed to satisfy packet content requests.

Other Storage Configurations (SAN, etc.)

The Decoder will allow any storage configuration that can meet the sustained throughput requirement. Note that a standard 8Gbit FC link to a SAN is not sufficient to store packet data at 10G, so using a SAN may require aggregating across multiple targets with a software-RAID scheme.
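
To see why one 8Gbit FC link falls short, compare the 1400 MBytes/s write requirement stated earlier with the usable rate of a single link. The sketch below assumes roughly 800 MBytes/s of usable throughput per 8Gbit FC link, a commonly quoted nominal figure that is an assumption here, and counts only the write stream; read traffic for packet content requests would add to the total.

```python
import math


def fc_links_needed(required_mbytes_per_sec, usable_mbytes_per_link=800):
    """Return the minimum number of FC links for a sustained write rate.

    usable_mbytes_per_link defaults to an assumed nominal figure for
    8Gbit Fibre Channel; measure the real fabric before sizing.
    """
    return math.ceil(required_mbytes_per_sec / usable_mbytes_per_link)


print(fc_links_needed(1400))
```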

Parsing at High Speeds

Parsing raw packets at high speeds presents unique challenges. Given the high session and packet rates, parsing efficiency is paramount. A single inefficient parser (one that spends too long examining packets) can slow the whole system down to the point where packets are dropped at the card. For initial 10G testing, start with only native parsers (except SMB/WebMail). Use the native parsers to establish baseline performance with little to no packet drops. Do not download any Live content until this has been done and the system is proven to capture without issue at high speeds.

After the system has been operational and running smoothly, Live content should be added very slowly - especially parsers. Parsers can have a dramatic effect on performance. Here are some rules of thumb:

Tested Live Content

The following content can all be run together (not just individually) at 10G on our test data set:

  • MA content (7 Lua parsers, 1 feed, 1 application rule)
  • 4 feeds (alert ids info, nwmalwaredomains, warning and suspicious)
  • 41 application rules
  • DNS_verbose_lua (disable DNS)
  • fingerprint_javascript_lua
  • fingerprint_pdf_lua
  • fingerprint_rar_lua
  • fingerprint_rtf_lua
  • MAIL_lua (disable MAIL)
  • SNMP_lua (disable SNMP)
  • spectrum_lua
  • SSH_lua (disable SSH)
  • TLS_lua
  • windows_command_shell
  • windows_executable

NOT TESTED:

  • SMB_lua, native SMB disabled by default
  • html_threat

OTHER:

HTTP_lua reduces the capture rate from >9G to <7G. At just under 5G, this parser can be used in place of the native HTTP parser without dropping packets (in addition to the list above). xor_executable pushes parse CPU to 100%, and the system can drop a significant number of packets at any time due to parse backup.

Aggregation on a 10G Decoder

A 10G Decoder can serve aggregation to a single concentrator while running at 10G speeds. 

  1. The Concentrator aggregates between 45k and 70k sessions/sec.
  2. The 10G Decoder captures between 40k and 50k sessions/sec.
    With the content identified above, this is about 1.5 to 2 million meta/sec.
  3. Turn on nice aggregation on the Concentrator to limit the performance impact on the Decoder.
    /concentrator/config/aggregate.nice=true
  4. Due to the high volume of sessions on the Concentrator, consider activating "parallel values" mode on the Concentrator by setting /sdk/config/parallel.values to true. This improves investigation performance when the number of sessions per second is above 30k.

If multiple aggregation streams are necessary, it would be less impactful on the Decoder to aggregate from the Concentrator instead.
