This topic guides administrators in tuning a Packet Decoder for high-speed packet capture using NetWitness Suite 11.0. It applies when capturing packets on a 10G interface card. Packet capture at high speeds requires careful configuration and pushes the Decoder hardware to its limits, so read this entire topic before implementing a 10G capture solution.
RSA NetWitness Suite provides support for high-speed collection on the Decoder. You can capture network packet data from higher speed networks and optimize your Packet Decoder to capture network traffic up to 8Gb/sec sustained and 10Gb/sec burst, depending on which parsers and feeds you have enabled.
Enhancements that facilitate capture in these environments include the following:
- Use of the pf_ring capture driver to leverage the commodity 10G Intel NIC for high-speed capture.
- Introduction of the assembler.parse.valve configuration, which automatically disables application parsers when certain thresholds are exceeded, to limit the risk of packet loss. While the application parsers are disabled, network layer parsers remain active. When the statistics fall back below those thresholds, application parsers are automatically re-enabled.
- A Series 4S or Series 5 Decoder
- An Intel 82599-based ethernet card, such as the Intel x520. All RSA-provided 10G cards meet this requirement. Two examples are:
- All SMC-10GE cards provided by RSA.
- A Dell Network Daughter Card using an Intel controller to provide 10G network interfaces. This is included in all Series 5 hardware.
- For the Series 4S / Dell R620 only: 96 GB of DDR3-1600 memory in dual-rank DIMMs. Single-rank DIMMs may decrease performance by as much as 10%. To determine the speed and rank of the installed DIMMs, run this command:
dmidecode -t 17
- Sufficiently large and fast storage to meet the capture requirement. Storage considerations are covered later in this topic.
- Each Packet Decoder configured with a minimum of 2 DACs or SAN connectivity.
- Dell R620-based systems, such as the Series 4S, must have their BIOS updated to v1.2.6 or later.
- The 10G Decoder capability is only supported on RSA-provided Decoder Installation images. All required software is installed by default.
- If upgrading from a previous release, perform the upgrade before proceeding with configuration.
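The DIMM check from the prerequisites can be scripted. The following is a minimal sketch, assuming the output of `dmidecode -t 17` has been saved as text and uses the usual `Size:`, `Locator:`, `Speed:`, and `Rank:` field labels. The function name and the sample text are illustrative and are not part of NetWitness.

```python
import re

def parse_dimms(dmidecode_output: str) -> list[dict]:
    """Return one dict per populated DIMM found in `dmidecode -t 17` text."""
    dimms = []
    # dmidecode reports each DIMM slot in its own "Memory Device" block.
    for block in dmidecode_output.split("Memory Device")[1:]:
        fields = {
            key.strip(): value.strip()
            for key, value in re.findall(
                r"^[ \t]*([^:\n]+):[ \t]*(.*)$", block, re.MULTILINE
            )
        }
        if fields.get("Size", "").startswith("No Module"):
            continue  # empty slot, nothing to check
        dimms.append({
            "locator": fields.get("Locator"),
            "speed": fields.get("Speed"),
            "rank": fields.get("Rank"),
        })
    return dimms

# Illustrative sample only; real output contains many more fields.
SAMPLE = """\
Handle 0x1100, DMI type 17, 40 bytes
Memory Device
    Size: 16384 MB
    Locator: DIMM_A1
    Speed: 1600 MHz
    Rank: 2

Handle 0x1101, DMI type 17, 40 bytes
Memory Device
    Size: No Module Installed
    Locator: DIMM_A2
    Speed: Unknown
    Rank: Unknown
"""

dimms = parse_dimms(SAMPLE)
# Single-rank DIMMs are the ones the prerequisite warns about.
single_rank = [d["locator"] for d in dimms if d["rank"] == "1"]
print(dimms, single_rank)
```

Any DIMM listed in `single_rank` would be a candidate for replacement with a dual-rank module.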
Install the 10G Decoder
Perform the following steps to install the NetWitness 10G Decoder:
Download and Update the BIOS
Download BIOS v2.2.3 from the following location:
- Download the Update Package for the Red Hat Linux file.
- Copy the file to the NetWitness server.
- Log in as root.
- Change the permissions on the file to make it executable.
Run the following file:
- Reboot the system when execution is complete and a reboot is requested.
Locate the 10G Decoder Packages
The packages required to configure the 10G Decoder should already be present on the Decoder installation image. You should not have to install any additional packages. The packages that provide the 10G driver capability are:
Verify 10G Decoder Packages Are Installed
Installation of the 10G Decoder packages is handled automatically. Therefore, there should be no action to enable the 10G functionality.
- If you upgraded the kernel packages as part of an upgrade, a reboot is required. The operating system will recompile and install the drivers for the upgraded kernel.
- You can verify that the installation was successful if you see additional PFRINGZC interfaces available when selecting the Capture Port Adapter as described below.
Configure the 10G Decoder
Perform the following steps to configure the 10G Decoder:
- From the Decoder Explorer view, right-click Decoder and select Properties.
- In the properties drop-down menu, select reconfig and enter the following parameters:
This adjusts the Decoder packet processing pipeline to allow for higher raw data throughput, but less parsing ability.
- From the Decoder Explorer view, right-click database and select Properties.
- In the Properties drop-down menu, select reconfig and enter the following parameters:
This adjusts the packet database to use very large file sizes and Direct I/O.
- Select the capture port adapter. Options for this include:
- Single port capture - PFRINGZC,p1p1 or PFRINGZC,p1p2
- Capture off both ports - Select PFRINGZC,p1p1 and, in the Explorer view, set capture.device.params = device=zc:p1p2,zc:p1p1
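For reference, the value of capture.device.params in the dual-port case follows a simple pattern: each interface name is prefixed with `zc:` and the results are joined with commas. A small sketch of that pattern follows; the helper name is hypothetical and the code only builds the string, it does not apply it to a Decoder.

```python
def capture_device_params(interfaces: list[str]) -> str:
    """Build the device= value for capture.device.params from interface names."""
    # Each capture interface is addressed through the zero-copy (zc:) prefix.
    return "device=" + ",".join(f"zc:{name}" for name in interfaces)

print(capture_device_params(["p1p2", "p1p1"]))  # device=zc:p1p2,zc:p1p1
```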
If the write thread is having trouble sustaining the capture speed, you can try the following:
Change /database/config/packet.integrity.flush to normal.
(Optional) Application parsing is extremely CPU intensive and can cause the Decoder to drop packets. To mitigate application parsing-induced drops, you can set /decoder/config/assembler.parse.valve to true. These are the results:
- When session parsing becomes a bottleneck, application parsers (HTTP, SMTP, FTP, and others) are temporarily disabled.
- Sessions are not dropped when the application parsers are disabled; only the fidelity of the parsing performed on those sessions is reduced.
- Sessions parsed when the application parsers are disabled still have associated network meta (from the network parser).
- The statistic /decoder/parsers/stats/blowoff.count displays the count of all sessions that bypassed application parsers (network parsing is still performed).
- When session parsing is no longer a potential bottleneck, the application parsers are automatically re-enabled.
- The assembler session pool should be large enough that it is not forcing sessions.
- You can determine whether sessions are being forced from the statistic /decoder/stats/assembler.sessions.forced, which increases steadily when they are. In addition, /decoder/stats/assembler.sessions will be within several hundred of /decoder/config/assembler.session.pool.
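The two indicators for forced sessions can be combined into a single check. The sketch below assumes the statistic values have already been read from the Decoder by some other means; the function name, the sample numbers, and the default margin of 500 (standing in for "several hundred") are assumptions for illustration.

```python
def sessions_being_forced(forced_samples: list[int], sessions: int,
                          session_pool: int, margin: int = 500) -> bool:
    """Apply both indicators described above.

    forced_samples: successive readings of /decoder/stats/assembler.sessions.forced
    sessions:       current value of /decoder/stats/assembler.sessions
    session_pool:   configured /decoder/config/assembler.session.pool
    margin:         assumed meaning of "within several hundred"
    """
    forced_increasing = (
        len(forced_samples) >= 2 and forced_samples[-1] > forced_samples[0]
    )
    near_pool_limit = session_pool - sessions <= margin
    return forced_increasing and near_pool_limit

# Hypothetical readings: the forced counter is climbing and the pool is nearly full.
print(sessions_being_forced([10, 25, 40], sessions=299_800, session_pool=300_000))
```

If the check returns True, the session pool is likely too small for the traffic being captured.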
(Optional) If you need to adjust the MTU for capture, add the snaplen parameter to capture.device.params. Unlike previous releases, the snaplen does not need to be rounded up to any specific boundary. The Decoder automatically adjusts the MTU set on the capture interfaces.
The following configuration parameters are deprecated and no longer necessary:
- The core= parameter in capture.device.params
- Any configuration files under /etc/pf_ring directory
Typical Configuration Parameters
Typical configuration parameters are listed below. Actual parameters may vary depending on the amount of memory and CPU resources available.
- Session and packet pool settings (under /decoder/config):
- pool.packet.pages = 1000000
- pool.session.pages = 300000
Packet write block size (/database/config/packet.write.block.size) set to filesize.
Parse Thread Count (under /decoder/config).
When capturing at 10G line rates, the storage system holding the packet and meta databases must be capable of sustained write throughput of 1400 MBytes/s.
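To put the 1400 MBytes/s requirement in context, the arithmetic below converts the 10 Gbit/s line rate to bytes. It uses decimal units (1 Gbit = 1000 Mbit) and is only a sketch; this topic does not state how the 1400 figure was derived, so reading the difference as headroom for the meta database and other overhead is an assumption.

```python
REQUIRED_MBYTES_PER_S = 1400  # sustained write requirement stated above

def gbit_to_mbytes(gbit_per_s: float) -> float:
    """Convert Gbit/s to MBytes/s using decimal units."""
    return gbit_per_s * 1000 / 8

def tbytes_per_day(mbytes_per_s: float) -> float:
    """Data volume written in 24 hours at a sustained rate, in TBytes."""
    return mbytes_per_s * 86_400 / 1_000_000

raw_packets = gbit_to_mbytes(10)                  # 1250.0 MBytes/s of raw packets
headroom = REQUIRED_MBYTES_PER_S - raw_packets    # 150.0 MBytes/s beyond raw packets
print(raw_packets, headroom, tbytes_per_day(raw_packets))
```

At a full 10 Gbit/s, raw packets alone come to roughly 108 TBytes per day, which is why storage sizing matters as much as write speed.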
Using the Series 4S Hardware (With Two or More DAC Units)
The Series 4S is equipped with a hardware RAID SAS controller capable of an aggregate 48 Gbit/s of I/O throughput. It has eight external 6 Gbit ports, organized into two 4-lane SAS cables. The recommended configuration for 10G is to balance at least two DAC units across these two external connectors. For example, connect one DAC to one port on the SAS card, and then connect another DAC to the other port on the SAS card.
For environments with more than two DACs, chain them off each port in a balanced manner. This may require re-cabling of DACs in an existing deployment, but should not affect data that has already been captured on the Decoder.
If adding new capacity, use the currently available NwMakeArray script to provision the DAC units. The script adds one DAC per execution (that is, to add three DACs, run the script three times), and adds the DACs to the NwDecoder10G configuration as separate mount points. The independent mount points are important because they allow the NwDecoder10G to segregate the write I/O from capture from the read I/O needed to satisfy packet content requests.
Using SAN and Other Storage Configurations
The Decoder allows any storage configuration that can meet the sustained throughput requirement. A single standard 8-Gbit FC link to a SAN is not sufficient to store packet data at 10G. To use a SAN, it may be necessary to aggregate across multiple targets using a software RAID scheme. Environments using a SAN are therefore required to configure connectivity to the SAN using multiple FC links.
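A rough way to see why one FC link falls short is to divide the required write rate by the usable rate of a single link. The sketch below assumes a nominal 800 MBytes/s per 8-Gbit FC link, which is the figure commonly quoted for that link speed and is not taken from this topic; substitute the measured throughput of your own fabric.

```python
import math

def fc_links_needed(required_mbytes_per_s: float, link_mbytes_per_s: float) -> int:
    """Minimum number of equally loaded links needed to carry the required rate."""
    return math.ceil(required_mbytes_per_s / link_mbytes_per_s)

# 1400 MBytes/s is the sustained requirement from this topic;
# 800 MBytes/s per link is an assumed nominal value for 8-Gbit FC.
print(fc_links_needed(1400, 800))
```

This gives a lower bound only. It ignores read I/O for packet content requests and any imbalance across the aggregated targets.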
Parsing and Content Considerations
Parsing raw packets at high speeds presents unique challenges. Given the high session and packet rates, parsing efficiency is paramount. A single parser that is inefficient (spends too long examining packets) can slow the whole system down to the point where packets are dropped at the card.
For initial 10G testing, start with only native parsers (except SMB/WebMail). Use the native parsers to establish baseline performance with little to no packet loss. Do not download any Live content until this has been done and the system is proven to capture without issue at high speeds.
After the system has been operational and running smoothly, add Live content gradually, especially parsers.
Whether you are updating a currently deployed system or deploying a new system, use the following best practices to minimize the risk of packet loss. The one exception is an update to a current 10G deployment that does not add any traffic. For example, a current Decoder capturing off a 10G card at 2G sustained should see no difference in performance, unless part of the update also entails adding traffic for capture.
- Incorporate baseline parsers (except SMB/Webmail, both of which generally have high CPU utilization) and monitor to ensure little to no packet loss.
- When adding additional parsers, add only one or two parsers at a time.
- Measure performance impact of newly added content, especially during peak traffic periods.
- If drops start occurring when they did not happen before, disable all newly added parsers, then enable just one at a time and measure the impact. This helps pinpoint individual parsers that are degrading performance. It may be possible to refactor an offending parser to perform better, or to reduce its feature set to just what is necessary for the customer use case.
- Although feeds have a smaller performance impact, they should also be reviewed and added in a phased approach so that their impact can be measured.
- Application Rules also tend to have little observable impact, though again, it is best not to add a large number of rules at once without measuring the performance impact.
Finally, making the recommended configuration changes outlined in the Configuration section will help minimize potential issues.
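The one-at-a-time isolation procedure described above can be sketched as a loop. Here `measure_drops` is a hypothetical callback standing in for the manual step of enabling the given parsers on top of the baseline, waiting through a peak traffic period, and reading the packet drop count; no such function exists in NetWitness.

```python
from typing import Callable

def find_problem_parsers(new_parsers: list[str],
                         measure_drops: Callable[[list[str]], int],
                         baseline_drops: int = 0) -> list[str]:
    """Enable each newly added parser on its own and record those that cause drops."""
    offenders = []
    for parser in new_parsers:
        # Only this one new parser is enabled in addition to the baseline set.
        drops = measure_drops([parser])
        if drops > baseline_drops:
            offenders.append(parser)
    return offenders

# Simulated measurement for illustration: one parser causes drops, the other does not.
def fake_measure(enabled: list[str]) -> int:
    return 120 if "xor_executable" in enabled else 0

print(find_problem_parsers(["SSH_lua", "xor_executable"], fake_measure))
```

Testing parsers singly takes longer than enabling them in groups, but it attributes any new drops to a specific parser.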
Tested Live Content
All (not each) of the following parsers can run at 10G on the test data set used:
- MA content (7 Lua parsers, 1 feed, 1 application rule)
- 4 feeds (alert ids info, nwmalwaredomains, warning, and suspicious)
- 41 application rules
- DNS_verbose_lua (disable DNS)
- MAIL_lua (disable MAIL)
- SNMP_lua (disable SNMP)
- SSH_lua (disable SSH)
- SMB_lua, native SMB disabled by default
- HTTP_lua reduces the capture rate from >9G to <7G. At just under 5G this parser can be used in place of the native without dropping (in addition to the list above).
- xor_executable pushes parse CPU to 100% and the system can drop significantly due to parse backup.
Aggregation Adjustments Based on Tested Live Content
A 10G Decoder can serve aggregation to a single Concentrator while running at 10G speeds. Deployments using Malware Analysis, Event Stream Analysis, Warehouse Connector, and Reporting Engine are expected to impact performance and can lead to packet loss.
For the tested scenario, the Concentrator aggregates between 45k and 70k sessions/sec. The 10G Decoder captures between 40k and 50k sessions/sec. With the content identified above, this is about 1.5 to 2 million meta/sec. Due to the high session rates, the following configuration changes are recommended:
- Nice aggregation on the Concentrator limits the performance impact on the 10G Decoder. The following setting turns on nice aggregation:
/concentrator/config/aggregate.nice = true
- Due to the high volume of sessions on the Concentrator, you may consider activating parallel values mode on the Concentrator by setting /sdk/config/parallel.values to 16. This improves Investigation performance when the number of sessions per second is greater than 30,000.
- If multiple aggregation streams are necessary, aggregating from the Concentrator instead of from the Decoder has less impact on the Decoder.
- Further review for content and parsing is required for deployments where you want to use other NetWitness Suite components (Warehouse, Malware Analysis, ESA, and Reporting Engine).
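The recommendations above can be summarized as a small decision helper. This is an illustrative sketch only: it returns the settings named in this topic as a dictionary and does not connect to or modify a Concentrator. The function names are hypothetical.

```python
def concentrator_recommendations(sessions_per_sec: int) -> dict[str, str]:
    """Return the Concentrator settings suggested in this topic for a session rate."""
    settings = {"/concentrator/config/aggregate.nice": "true"}
    # Parallel values mode is suggested above 30,000 sessions per second.
    if sessions_per_sec > 30_000:
        settings["/sdk/config/parallel.values"] = "16"
    return settings

def meta_per_session(meta_per_sec: float, sessions_per_sec: float) -> float:
    """Average number of meta items generated per session."""
    return meta_per_sec / sessions_per_sec

print(concentrator_recommendations(45_000))
# The tested range works out to roughly 37.5 to 40 meta items per session.
print(meta_per_session(1_500_000, 40_000), meta_per_session(2_000_000, 50_000))
```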