This topic guides administrators on how to tune a Packet Decoder for high-speed packet capture using NetWitness Suite 11.0, specifically when capturing packets on a 10G interface card. Packet capture at high speeds requires careful configuration and pushes the Decoder hardware to its limits, so please read this entire topic before implementing a 10G capture solution.
RSA NetWitness Suite provides support for high-speed collection on the Decoder. You can capture network packet data from higher speed networks and optimize your Packet Decoder to capture network traffic up to 8Gb/sec sustained and 10Gb/sec burst, depending on which parsers and feeds you have enabled.
Enhancements that facilitate capture in these environments include the following:
- Utilization of the pf_ring capture driver to leverage commodity Intel 10G NICs for high-speed capture.
- Introduction of the assembler.parse.valve configuration, which automatically disables application parsers when certain thresholds are exceeded, to limit the risk of packet loss. When the application parsers are disabled, network layer parsers are still active. When the statistics fall back below the thresholds, application parsers are automatically re-enabled.
Prerequisites
The following are required to deploy a 10G Decoder:
- A Series 4S or Series 5 Decoder
- An Intel 82599-based Ethernet card, such as the Intel X520. All RSA-provided 10G cards meet this requirement. Two examples are:
- All SMC-10GE cards provided by RSA.
- A Dell Network Daughter Card using an Intel controller to provide 10G network interfaces. This is included in all Series 5 hardware.
- For the Series 4S / Dell R620 only: 96 GB of DDR3-1600 memory in dual-rank DIMMs. Single-rank DIMMs may decrease performance by as much as 10%. To determine the speed and rank of the installed DIMMs, run this command:
dmidecode -t 17
- Sufficiently large and fast storage to meet the capture requirement. Storage considerations are covered later in this topic.
- Each Packet Decoder must be configured with a minimum of 2 DACs or SAN connectivity.
- Dell R620-based systems, such as the Series 4S, must have their BIOS updated to v1.2.6 or later.
- The 10G Decoder capability is only supported on RSA-provided Decoder Installation images. All required software is installed by default.
- If upgrading from a previous release, perform the upgrade before proceeding with configuration.
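To check the Series 4S memory requirement against the `dmidecode -t 17` output, a small parser can pull out the speed and rank of each populated DIMM. This is a minimal sketch that assumes the usual dmidecode layout (one `Memory Device` section per slot with `Size:`, `Speed:`, and `Rank:` lines); the function names are hypothetical, and field formats can vary between dmidecode versions.

```python
import re


def parse_dimms(dmidecode_output):
    """Return speed and rank for each populated slot in `dmidecode -t 17` text."""
    dimms = []
    # dmidecode prints one "Memory Device" section per DIMM slot.
    for section in dmidecode_output.split("Memory Device")[1:]:
        size = re.search(r"^\s*Size:\s*(.+)$", section, re.MULTILINE)
        if size is None or "No Module Installed" in size.group(1):
            continue  # empty slot
        # Anchored at line start so "Configured Clock Speed:" is not matched.
        speed = re.search(r"^\s*Speed:\s*(\d+)", section, re.MULTILINE)
        rank = re.search(r"^\s*Rank:\s*(\d+)", section, re.MULTILINE)
        dimms.append({
            "speed": int(speed.group(1)) if speed else None,
            "rank": int(rank.group(1)) if rank else None,
        })
    return dimms


def flag_dimms(dimms, min_speed=1600):
    """List (dimm_index, reason) pairs for single-rank or slower-than-required DIMMs."""
    issues = []
    for i, dimm in enumerate(dimms):
        if dimm["rank"] == 1:
            issues.append((i, "single-rank"))
        if dimm["speed"] is not None and dimm["speed"] < min_speed:
            issues.append((i, "speed %d below %d" % (dimm["speed"], min_speed)))
    return issues
```

Feed it the captured command output, for example `flag_dimms(parse_dimms(text))`; an empty list means no single-rank or slow DIMMs were detected.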
Install the 10G Decoder
Perform the following steps to install the NetWitness 10G Decoder:
Download and Update the BIOS
Download BIOS v2.2.3 from the following location:
- Download the Update Package for the Red Hat Linux file.
- Copy the file to the NetWitness server.
- Login as root.
- Change the permissions on the file to make it executable.
Run the following file:
- Reboot the system when execution is complete and a reboot is requested.
Locate the 10G Decoder Packages
The packages required to configure the 10G Decoder should already be present on the Decoder installation image. You should not have to install any additional packages.
Verify 10G Decoder Packages Are Installed
Installation of the 10G Decoder packages is handled automatically, so no action should be required to enable the 10G functionality.
- If you upgraded the kernel packages as part of an upgrade, a reboot is required. The operating system will recompile and install the drivers for the upgraded kernel.
- You can verify that the installation was successful if you see additional PFRINGZC interfaces available when selecting the Capture Port Adapter as described below.
Configure the 10G Decoder
Perform the following steps to configure the 10G Decoder:
- From the Decoder Explorer view, right-click Decoder and select Properties.
- In the Properties drop-down menu, select reconfig and enter the following parameters:
This adjusts the Decoder packet processing pipeline to allow for higher raw data throughput, but less parsing ability.
- From the Decoder Explorer view, right-click database and select Properties.
- In the Properties drop-down menu, select reconfig and enter the following parameters:
This adjusts the packet database to use very large file sizes and Direct I/O.
- Select the capture port adapter. Options for this include (in the following examples, "p1p1" and "p1p2" are placeholders and should be replaced with your own interface names):
- Single port capture - PFRINGZC,p1p1 or PFRINGZC,p1p2
- Capture off both ports - Select PFRINGZC,p1p1 and, in the Explorer view, set capture.device.params = device=zc:p1p2,zc:p1p1
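The two capture options above follow a simple naming pattern: the adapter is `PFRINGZC,<interface>` and multi-port capture lists each port as `zc:<interface>` after `device=`. A minimal sketch that builds both strings; the helper names are hypothetical and the interface names remain placeholders for your own.

```python
def capture_adapter(interface):
    """Capture Port Adapter value for pf_ring ZC capture, e.g. PFRINGZC,p1p1."""
    return "PFRINGZC," + interface


def capture_device_params(interfaces):
    """capture.device.params value for multi-port ZC capture, e.g. device=zc:p1p2,zc:p1p1."""
    if not interfaces:
        raise ValueError("at least one interface name is required")
    return "device=" + ",".join("zc:" + name for name in interfaces)
```

For two-port capture, `capture_adapter("p1p1")` gives the adapter to select and `capture_device_params(["p1p2", "p1p1"])` gives the parameter value shown in the list above.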
If the write thread is having trouble sustaining the capture speed, you can try the following:
Change /database/config/packet.integrity.flush to normal.
(Optional) Application parsing is extremely CPU intensive and can cause the Decoder to drop packets. To mitigate drops caused by application parsing, you can set /decoder/config/assembler.parse.valve to true. This has the following effects:
- When session parsing becomes a bottleneck, application parsers (HTTP, SMTP, FTP, and others) are temporarily disabled.
- Sessions are not dropped when the application parsers are disabled, just the fidelity of the parsing performed on those sessions.
- Sessions parsed when the application parsers are disabled still have associated network meta (from the network parser).
- The statistic /decoder/parsers/stats/blowoff.count displays the count of all sessions that bypassed application parsers (network parsing is still performed).
- When session parsing is no longer a potential bottleneck, the application parsers are automatically re-enabled.
- The assembler session pool should be large enough that sessions are not being forced.
- You can determine whether sessions are being forced by watching the statistic /decoder/stats/assembler.sessions.forced: if it is increasing, sessions are being forced. In that case, /decoder/stats/assembler.sessions will also be within several hundred of /decoder/config/assembler.session.pool.
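The checks above compare statistics over time, so they can be expressed as a diff between two snapshots. This is a minimal sketch, assuming you have already read the listed stat values into dictionaries (how you collect them is up to you); the key names mirror the stat node names, and the 500-session `margin` is a hypothetical stand-in for "within several hundred".

```python
def parse_valve_report(before, after, pool_size, margin=500):
    """Compare two snapshots of Decoder statistics taken some time apart.

    before/after: dicts with 'blowoff.count', 'assembler.sessions.forced',
    and 'assembler.sessions' values. pool_size: assembler.session.pool.
    margin: hypothetical threshold for "within several hundred" of the pool.
    """
    bypassed = after["blowoff.count"] - before["blowoff.count"]
    forced = after["assembler.sessions.forced"] - before["assembler.sessions.forced"]
    near_pool_limit = pool_size - after["assembler.sessions"] <= margin
    return {
        # Sessions that skipped application parsers (network parsing still ran).
        "sessions_bypassing_app_parsers": bypassed,
        "sessions_forced": forced > 0,
        "pool_nearly_full": near_pool_limit,
        # Both symptoms together suggest the session pool is too small.
        "increase_session_pool": forced > 0 and near_pool_limit,
    }
```

Sampling a few minutes apart during peak traffic gives the most meaningful deltas.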
(Optional) If you need to adjust the MTU for capture, add the snaplen parameter to capture.device.params. Unlike previous releases, the snaplen does not need to be rounded up to any specific boundary. The Decoder automatically adjusts the MTU set on the capture interfaces.
The following configuration parameters are deprecated and no longer necessary:
- The core= parameter in capture.device.params
- Any configuration files under /etc/pf_ring directory
Typical Configuration Parameters
Typical configuration parameters are listed below. Actual parameters may vary depending on the amount of memory and CPU resources available.
- Session and packet pool settings (under /decoder/config):
- pool.packet.pages = 1000000
- pool.session.pages = 300000
Packet write block size (/database/config/packet.write.block.size) set to filesize.
Parse Thread Count (under /decoder/config).
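When reviewing an existing Decoder, it can help to compare its current values against the typical ones listed above. A minimal sketch, assuming you have exported the current values into a dictionary keyed by node path (the export step is not shown and the helper is hypothetical); parse thread count is omitted because this topic gives no typical value for it.

```python
# Typical values from this topic; actual values depend on available memory and CPU.
TYPICAL = {
    "/decoder/config/pool.packet.pages": "1000000",
    "/decoder/config/pool.session.pages": "300000",
    "/database/config/packet.write.block.size": "filesize",
}


def diff_against_typical(current):
    """Return {path: (current_value, typical_value)} for settings that differ."""
    return {
        path: (current.get(path), typical)
        for path, typical in TYPICAL.items()
        if current.get(path) != typical
    }
```

A missing key shows up with `None` as its current value, so unset parameters are reported too.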
Storage Considerations
When capturing at 10G line rates, the storage system holding the packet and meta databases must be capable of a sustained write throughput of 1400 MBytes/s.
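The 1400 MBytes/s figure sits above the raw 10 Gbit/s line rate of 1250 MB/s; the extra margin presumably covers meta writes and overhead, though the topic does not break it down. The same number also drives retention planning. A minimal arithmetic sketch using decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes); the function names are illustrative only.

```python
def line_rate_mb_per_s(gbit_per_s):
    """Raw line rate in MB/s (decimal units): Gbit/s * 1000 / 8."""
    return gbit_per_s * 1000 / 8


def daily_volume_tb(mb_per_s):
    """Data written in 24 hours at a sustained rate, in TB."""
    return mb_per_s * 86400 / 1_000_000


def retention_days(capacity_tb, mb_per_s):
    """Approximate days of retention for a given capacity and sustained rate."""
    return capacity_tb / daily_volume_tb(mb_per_s)
```

At 1400 MB/s sustained, roughly 121 TB is written per day, so retention in days is about the usable capacity in TB divided by 121.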
Using the Series 4S Hardware (With Two or More DAC Units)
The Series 4S is equipped with a hardware RAID SAS controller capable of an aggregate 48 Gbit/s of I/O throughput. It has eight external 6 Gbit ports, organized into two 4-lane SAS cables. The recommended configuration for 10G is to balance at least two DAC units across these two external connectors. For example, connect one DAC to one port on the SAS card, and then connect another DAC to the other port on the SAS card.
For environments with more than two DACs, chain them off each port in a balanced manner. This may require re-cabling of DACs in an existing deployment, but should not affect data that has already been captured on the Decoder.
If adding new capacity, use the currently available NwMakeArray script to provision the DAC units. The script adds one DAC per execution (so if you are adding three DACs, run the script three times), and adds each DAC to the NwDecoder10G configuration as a separate mount point. Independent mount points are important because they allow NwDecoder10G to separate the write I/O from capture from the read I/O needed to satisfy packet content requests.
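Balanced chaining amounts to alternating DACs between the two external SAS connectors. A minimal planning sketch, assuming DACs are listed in the order they will be cabled; the port and DAC labels are hypothetical, and the sketch only plans the layout (cabling and the one-NwMakeArray-run-per-DAC step remain manual).

```python
def balance_dacs(dac_names, ports=("port 1", "port 2")):
    """Assign DACs to SAS ports round-robin so each chain stays balanced."""
    layout = {port: [] for port in ports}
    for i, dac in enumerate(dac_names):
        # Alternate ports: 1st DAC on port 1, 2nd on port 2, 3rd on port 1, ...
        layout[ports[i % len(ports)]].append(dac)
    return layout
```

With three DACs, two are chained off one port and one off the other, which is as balanced as an odd count allows.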
Using SAN and Other Storage Configurations
The Decoder allows any storage configuration that can meet the sustained throughput requirement. A standard 8-Gbit FC link to a SAN is not sufficient to store packet data at 10G; to use a SAN, you may need to aggregate across multiple targets using a software RAID scheme. Therefore, environments using a SAN must configure connectivity to the SAN over multiple FC links.
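The reason a single 8-Gbit FC link falls short is simple arithmetic: 8GFC carries a nominal 800 MB/s per direction, below the 1400 MB/s requirement. A minimal sketch of the link count; the 800 MB/s nominal figure is the standard 8GFC rating, and real-world throughput after protocol and multipath overhead will be lower, so treat the result as a floor.

```python
import math


def fc_links_needed(required_mb_per_s, link_mb_per_s=800):
    """Minimum number of FC links whose nominal throughput covers the requirement.

    link_mb_per_s defaults to the nominal 8GFC rate; actual usable
    throughput is lower, so plan headroom beyond this minimum.
    """
    return math.ceil(required_mb_per_s / link_mb_per_s)
```

`fc_links_needed(1400)` returns 2, which matches the requirement above for multiple FC links.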
Parsing and Content Considerations
Parsing raw packets at high speeds presents unique challenges. Given the high session and packet rates, parsing efficiency is paramount. A single parser that is inefficient (spends too long examining packets) can slow the whole system down to the point where packets are dropped at the card.
For initial 10G testing, start with only native parsers (except SMB/WebMail). Use the native parsers to establish baseline performance with little to no packet drops. Do not download any Live content until this has been done and the system is proven to capture without issue at high speeds.
After the system has been operational and running smoothly, add Live content very slowly, especially parsers.
Whether you are updating a currently deployed system or deploying a new system, use the following best practices to minimize the risk of packet loss. The exception is updating an existing 10G deployment without adding traffic. For example, a Decoder already capturing off a 10G card at 2G sustained should see no difference in performance after an update, unless the update also adds traffic for capture.
- Incorporate baseline parsers (except SMB/Webmail, both of which generally have high CPU utilization) and monitor to ensure little to no packet loss.
- When adding additional parsers, add only one or two parsers at a time.
- Measure performance impact of newly added content, especially during peak traffic periods.
- If drops start occurring when they did not happen before, disable all newly added parsers, then enable them one at a time and measure the impact. This helps pinpoint individual parsers that degrade performance. It may be possible to refactor an offending parser to perform better, or to reduce its feature set to just what is necessary for the customer use case.
- Although feeds have a smaller performance impact, they should also be reviewed and added in phases so that their impact can be measured.
- Application Rules also tend to have little observable impact, though again, it is best not to add a large number of rules at once without measuring the performance impact.
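The rollout and isolation steps above can be sketched as two small helpers: one splits new parsers into batches of at most two, and one enables new parsers one at a time to find those that cause drops. This is a sketch under assumptions: `drops_with` is a hypothetical, caller-supplied function that enables exactly the given set of new parsers, measures during peak traffic, and returns True if drops were observed.

```python
def batches(parsers, size=2):
    """Split parsers into groups of at most `size` for a phased rollout."""
    return [parsers[i:i + size] for i in range(0, len(parsers), size)]


def find_problem_parsers(new_parsers, drops_with):
    """Enable each newly added parser on its own and report those that cause drops.

    drops_with: hypothetical callback taking a set of parser names to enable
    and returning True if packet drops were measured with that set.
    """
    return [parser for parser in new_parsers if drops_with({parser})]
```

Testing one parser at a time does not reveal interaction effects, where two parsers are fine alone but drop together; if every parser passes individually, re-test the batches as well.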
Finally, making the recommended configuration changes outlined in the Configuration section will help minimize potential issues.
Tested Live Content
All of the following content (running together, not just individually) can run at 10G on the test data set used:
- MA content (7 Lua parsers, 1 feed, 1 application rule)
- 4 feeds (alert ids info, nwmalwaredomains, warning, and suspicious)
- 41 application rules
- DNS_verbose_lua (with native DNS disabled)
- MAIL_lua (with native MAIL disabled)
- SNMP_lua (with native SNMP disabled)
- SSH_lua (with native SSH disabled)
- SMB_lua (native SMB is disabled by default)
The following content has a significant performance impact at 10G:
- HTTP_lua reduces the capture rate from more than 9G to less than 7G. At rates just under 5G, this parser can be used in place of the native HTTP parser without drops (in addition to the content listed above).
- xor_executable pushes parse CPU to 100%, and the system can drop significantly due to parse backup.
Aggregation Adjustments Based on Tested Live Content
A 10G Decoder can serve aggregation to a single Concentrator while running at 10G speeds. Deployments using Malware Analysis, Event Stream Analysis, Warehouse Connector, and Reporting Engine are expected to impact performance and can lead to packet loss.
For the tested scenario, the Concentrator aggregates between 45,000 and 70,000 sessions/sec, and the 10G Decoder captures between 40,000 and 50,000 sessions/sec. With the content identified above, this amounts to about 1.5 to 2 million meta/sec. Because of these high session rates, the following configuration changes are recommended:
- Nice aggregation on the Concentrator limits the performance impact on the 10G Decoder. The following setting turns on nice aggregation:
/concentrator/config/aggregate.nice = true
- Due to the high volume of sessions on the Concentrator, you may consider activating parallel values mode on the Concentrator by setting /sdk/config/parallel.values to 16. This improves Investigation performance when the number of sessions per second is greater than 30,000.
- If multiple aggregation streams are necessary, aggregate from the Concentrator rather than directly from the Decoder; this has less impact on the Decoder.
- Further review for content and parsing is required for deployments where you want to use other NetWitness Suite components (Warehouse, Malware Analysis, ESA, and Reporting Engine).
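The figures and recommendations above reduce to a little arithmetic and one threshold. A minimal sketch: `meta_per_session` shows that the tested rates imply roughly 30 to 50 meta items per session, and `concentrator_settings` maps a session rate to the settings recommended above. Treating nice aggregation as always on for a 10G Decoder is a reading of this section, not a documented rule, and the helper names are hypothetical.

```python
def meta_per_session(meta_per_sec, sessions_per_sec):
    """Average number of meta items generated per session."""
    return meta_per_sec / sessions_per_sec


def concentrator_settings(sessions_per_sec):
    """Concentrator settings suggested in this section for a given session rate."""
    # Nice aggregation limits the impact on the 10G Decoder.
    settings = {"/concentrator/config/aggregate.nice": "true"}
    # Parallel values mode helps Investigation above 30,000 sessions/sec.
    if sessions_per_sec > 30_000:
        settings["/sdk/config/parallel.values"] = "16"
    return settings
```

For the tested scenario, 1.5 million meta/sec at 50,000 sessions/sec is 30 meta per session, and 2 million at 40,000 is 50.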
Optimize Read/Write Operations When Adding New Storage
A 10G Decoder is optimized to stagger read and write operations across multiple volumes so that the current file being written is on a different volume from the next file that will be written. This allows maximum throughput on the RAID volume when reading data from the last file written while writing the current file on a different volume. However, if volumes are added after a Decoder has been in use, the ability to stagger is limited because one or more volumes are already full, so the new volume is the only place new files can be written.
To remedy this situation, an administrator can run the stagger command on an existing NetWitness Suite database (packet, log, meta, or session) that has at least two volumes, to redistribute the files across all volumes in the optimal read/write pattern. The primary use case is when new storage is added to an existing Decoder and you want to stagger the volumes BEFORE restarting capture.
The configuration nodes for this command are the directory settings of the session, meta, and packet databases. These live under /database/config (/database is usually a root node). The config nodes for a Decoder are:
- /database/config/session.dir
- /database/config/meta.dir
- /database/config/packet.dir
The NetWitness Suite Core Database Tuning Guide has information on how those configurations are formatted.
The stagger command is typically only useful for a 10G Decoder and usually just for the packet database. Maximum performance is achieved for storing and retrieving packets when multiple volumes are present. In this scenario, the Decoder always fills the volume with the most free space. When the volumes are roughly the same size, this results in a staggered write pattern, which allows maximum throughput for reading and writing across all volumes. However, this only naturally occurs when multiple packet storage volumes are present at the time the Decoder is first deployed.
A typical use case is adding more storage to an existing Decoder to increase retention. However, when you add storage to a deployment that has already filled the existing volumes with stored packets, the Decoder naturally fills the new storage with packets before rolling out any packets on the existing storage. This results in a suboptimal read/write pattern, because most reads occur on the same volume that is currently being written to. In a 10G deployment, reads are blocked from a volume while writes are occurring. This does not stop ALL reads on that volume, because the file is buffered in memory before being written, but it does result in suboptimal read performance.
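Both behaviors described above follow from the "fill the volume with the most free space" rule. The sketch below is a toy model, not Decoder code: volumes are measured in file-sized units, rollover of old files is not modeled, and ties go to the lowest-numbered volume (an assumption). It shows why equal empty volumes alternate writes while a new empty volume next to full ones receives every write.

```python
def simulate_writes(capacities, used, num_files):
    """Return the volume index chosen for each new file under the most-free-space rule.

    capacities/used are in file-sized units. Rollover is not modeled,
    so num_files must fit in the remaining free space.
    """
    used = list(used)
    free_total = sum(c - u for c, u in zip(capacities, used))
    if num_files > free_total:
        raise ValueError("this toy model does not simulate rollover")
    choices = []
    for _ in range(num_files):
        # max() returns the first volume on ties (assumed tie-break).
        target = max(range(len(capacities)), key=lambda i: capacities[i] - used[i])
        used[target] += 1
        choices.append(target)
    return choices
```

Two empty volumes give `[0, 1, 0, 1]` (a natural stagger), while two full volumes plus one new empty volume give `[2, 2, 2]`, so reads of recent files land on the volume being written.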
With the stagger command, you can add more storage and then have the service naturally stagger the files across ALL volumes (existing and new) so that read performance is optimized.
The downside to this command is that it can take some time to complete, and the Decoder should not be capturing during the stagger operation. To stagger the files, perform the following steps:
- Add all storage and configure mount points.
- Add new storage mount points to packet.dir (or session.dir/meta.dir) and restart service (very important!).
- Ensure capture is stopped.
- Run the stagger operation, and make sure the connection that initiated it is not terminated until the operation is complete. If the connection is terminated, the stagger operation is canceled, and files that were already staggered remain in place. You can resume the operation by rerunning the same command; work already done is not repeated. If running stagger from NwConsole, run the timeout 0 command before sending the stagger command to prevent the normal 30-second command timeout.
- Start capture after the stagger command finishes.
The following are the parameters for the command:
- type - The database that will be staggered (session, meta, or packet). Typically only the packet database is useful for staggering, but it is possible to do the session or meta database when multiple volumes are present for those databases. Since the session and meta databases write far less data than the packet database, typically staggering those databases results in less noticeable performance gains.
- dryRun - If true (the default), will only return a description of the operations that would be performed. If false, then the files will actually be moved to an optimal read/write pattern. You MUST pass false to actually stagger the files.
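Because dryRun defaults to true, a safe pattern is to review a dry run before the real move. A minimal sketch that generates the NwConsole command sequence from the parameters above, including `timeout 0`; the helper is hypothetical, and the port 50004 is taken from the example that follows.

```python
VALID_TYPES = {"session", "meta", "packet"}


def stagger_console_commands(host, username, password, db_type="packet", dry_run=True):
    """Return the NwConsole lines for a stagger run (dry run by default)."""
    if db_type not in VALID_TYPES:
        raise ValueError("type must be one of %s" % sorted(VALID_TYPES))
    return [
        "login %s:50004 %s %s" % (host, username, password),
        "timeout 0",  # disable the normal 30-second command timeout
        "send /database stagger type=%s dryRun=%s" % (db_type, "true" if dry_run else "false"),
    ]
```

Run the lines with `dry_run=True` first to see the planned moves, then regenerate them with `dry_run=False` once capture is stopped.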
Example usage from NwConsole:
login <decoder>:50004 <username> <password>
timeout 0
send /database stagger type=packet dryRun=false
If you run this command via the RESTful API, please pass the additional parameter expiry=0 to prevent a timeout from the service. You will also need to ensure the HTTP client does not disconnect before the operation completes.
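A client-side sketch of the REST call, covering both requirements: `expiry=0` for the service and no HTTP client timeout. The URL layout (`/database?msg=stagger&...`) and the REST port you pass in (commonly 50104 for a Decoder) are assumptions to verify against your deployment; the helper names are hypothetical. Only standard-library `urllib` calls are used, and `timeout=None` makes the request block until the service responds.

```python
from urllib.parse import urlencode
import urllib.request


def stagger_rest_url(host, port, db_type="packet", dry_run=False):
    """Build the stagger request URL (assumed layout: node path plus msg= and parameters)."""
    query = urlencode({
        "msg": "stagger",
        "type": db_type,
        "dryRun": "true" if dry_run else "false",
        "expiry": "0",  # documented: prevents a timeout from the service
    })
    return "http://%s:%d/database?%s" % (host, port, query)


def run_stagger(url, username, password):
    """Send the request with basic auth and no client-side timeout."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, url, username, password)
    opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(passwords))
    # timeout=None blocks until the stagger operation completes.
    with opener.open(url, timeout=None) as response:
        return response.read().decode()
```

Any proxy or load balancer between the client and the Decoder must also allow the connection to stay open for the full duration.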