Configure High Speed Packet Capture Capability (Version 11.6 and Later)

This topic guides administrators on how to tune a Network Decoder specifically for high speed packet capture using NetWitness Platform 11.6. This applies when capturing packets on 10G, 40G, or higher speed interface cards.

IMPORTANT: Packet capture at high speeds requires careful configuration and pushes the Decoder hardware to its limits, so you must read this entire topic while implementing a high speed capture solution.

RSA NetWitness Platform supports high-speed collection on the Decoder. You can capture network packet data from higher speed networks and optimize your Network Decoder to capture network traffic at up to 40 Gbit/s. A Decoder can achieve these capture speeds, but note that not all NetWitness Platform features operate at these rates. The actual throughput of the Decoder depends on the following factors:

  • Number of packets filtered vs. the number of packets retained
    It is important to filter the traffic that enters the Decoder to minimize packet drops. You can use Network Rules or a BPF Rule to filter the traffic before it enters advanced features such as stream reassembly and deep packet inspection. For more information, see Configure Network Rules and (Optional) Configure System-Level (BPF) Packet Filtering.
  • Average packet size
    A Decoder that captures mostly small packets, for example 64-byte packets, has lower overall throughput. If the average packet size is small, the Decoder ingest rate falls significantly below the network link speed. This happens because every Ethernet frame carries a fixed amount of per-packet overhead, so smaller packets mean more packets, and more overhead, for the same amount of data.
  • Shared system resources
    If the Decoder host also hosts other applications or add-on features that manipulate data or perform other analysis, the overall throughput of the Decoder can drop. Even if these add-on applications do not use much CPU, they still consume shared resources such as memory bandwidth and CPU interconnect bandwidth.
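
The effect of average packet size can be estimated with simple arithmetic. The sketch below is illustrative only and is not part of the product: it assumes the standard Ethernet wire overhead of 20 bytes per frame (preamble, start-of-frame delimiter, and inter-frame gap) and back-to-back frames of a single size. The function names are hypothetical.

```python
# Bytes that occupy the wire for every Ethernet frame but carry no frame
# data: 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte
# inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def max_frames_per_second(link_bps: float, frame_bytes: int) -> float:
    """Upper bound on frames per second for back-to-back frames of one size."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def max_frame_data_bps(link_bps: float, frame_bytes: int) -> float:
    """Share of the link rate that is actual frame data at that frame size."""
    return link_bps * frame_bytes / (frame_bytes + WIRE_OVERHEAD_BYTES)


# Compare small, medium, and full-size frames on a 10 Gbit/s link.
for size in (64, 512, 1518):
    pps = max_frames_per_second(10e9, size)
    gbps = max_frame_data_bps(10e9, size) / 1e9
    print(f"{size:>5}-byte frames: {pps / 1e6:6.2f} Mpps, {gbps:5.2f} Gbit/s of frame data")
```

Under these assumptions, a 10 Gbit/s link filled with 64-byte frames delivers about 14.88 million frames per second but only about 7.6 Gbit/s of frame data, while 1518-byte frames deliver about 0.81 million frames per second. The per-packet work therefore grows roughly 18-fold between the two cases.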

Enhancements that facilitate capture in these high speed environments include the following:

  • Utilization of the DPDK capture drivers to leverage the commodity 10G and 40G Intel NIC cards for high-speed capture environments. For more information on how to configure DPDK, see (Optional) Data Plane Development Kit Packet Capture.
  • Introduction of the assembler.parse.valve configuration, which automatically disables application parsers when certain thresholds are exceeded, to limit the risk of packet loss. While the application parsers are disabled, network layer parsers remain active. When the statistics fall back below those thresholds, application parsers are automatically re-enabled.
  • Utilization of Berkeley Packet Filters (BPF) in 10G environments. For more information, see (Optional) Configure System-Level (BPF) Packet Filtering in Configure Capture Settings.
  • Utilization of symmetric RSS to distribute capture loads among multiple CPU cores. For more information, see "Utilizing Receive Side Scaling with DPDK" in (Optional) Data Plane Development Kit Packet Capture.

  • Decoder will distribute the session reassembly work among multiple CPU cores as well, provided that the packets fed into each assembly are naturally segregated using the RSS features or as a result of being captured on different physical interfaces.
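
The property that makes symmetric RSS useful for session reassembly is that both directions of a connection land on the same queue. The following sketch illustrates only that property. It is not the hash that the NIC or DPDK actually computes; the CRC-based hash and the function name are assumptions chosen for brevity.

```python
import zlib


def symmetric_queue(src_ip: str, src_port: int,
                    dst_ip: str, dst_port: int, num_queues: int) -> int:
    """Map a flow to a queue so that A->B and B->A give the same result."""
    # Sort the two endpoints so the key does not depend on direction.
    first, second = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{first[0]}:{first[1]}|{second[0]}:{second[1]}".encode()
    # zlib.crc32 is a stand-in hash for illustration only.
    return zlib.crc32(key) % num_queues


request = symmetric_queue("192.0.2.10", 51514, "198.51.100.7", 443, 6)
response = symmetric_queue("198.51.100.7", 443, "192.0.2.10", 51514, 6)
print(request == response)  # both directions share one queue
```

Because each queue sees complete two-way conversations, the assembly work for that queue can run on its own core without coordinating with the others.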

Note: Beginning with 11.5, the Network Decoder can capture from multiple interfaces simultaneously. This functionality allows Network Decoders to capture from multiple physical Network Interface Cards (NICs) while leveraging the same network rules, application rules, and parsers for each NIC. You can use this feature in all capture environments. For more information, see (Optional) Multiple Adapter Packet Capture.

Note: Enabling intra-session for HTTP pipelining is not supported when capturing at 10G rates on a single Decoder, because the HTTP_Lua parser that this feature requires can cause dropped packets at that rate of ingest. In this case, spread the load across multiple Concentrator-Decoder pairs.

Hardware Prerequisites

  • A Series 5 or Series 6 Decoder
  • An Intel 82599-based Ethernet card, such as the Intel X520, or an Intel i40e-based card, such as the Intel XL710. All RSA-provided 10G cards meet this requirement, including the following:
    • All SMC-10GE cards provided by RSA.
    • An RSA Network Daughter Card that uses an Intel controller to provide 10G network interfaces. This is included in all Series 5 hardware.
  • Storage that is large enough and fast enough to meet the packet capture requirement, if you set up the Decoder in a mode other than meta-only. Storage considerations are covered later in this topic.
  • If the Decoder will reassemble and handle high numbers of sessions, for example 30000 sessions per second or more, then at least two physical storage volumes are required. This allows one volume to be used for writing session data while the other is used to read session data and deliver it for downstream analytics.

  • A minimum of 2 DACs or SAN connectivity for each Network Decoder that is set up in a mode other than meta-only.

Software Prerequisites

RSA NetWitness Platform 11.6 or later.

Decoder Installation

Perform the following steps to install the Decoder.

Install the Decoder Service on Host that Will Perform the Capture

The Network Decoder service package contains the software components needed to use the features described in this document. The Network Decoder service package and all other required packages are installed during the Decoder installation.

Select the Capture Interfaces and Assign Interfaces to DPDK

You must choose the physical network connections that will be used for packet ingest. The physical interface type will depend on the type of network traffic sent to the Decoder. The interface you select must match the physical characteristics of the Decoder feed (or network link) used to receive packets. For example, if you want to capture at 40G speeds on a cable, then you need a 40G interface capable of attaching to that cable.

For information on how to assign interfaces to DPDK, see (Optional) Data Plane Development Kit Packet Capture.

Verify the Decoder Packages Installation

Perform the following steps to verify the Decoder package installation:

  • After you configure the interfaces to use with DPDK, ensure that you reboot the Decoder host.
  • After the reboot, ensure that the DPDK interfaces appear in the Decoder's capture adapter list.

Select the Decoder Operating Mode

You can configure the Decoder to handle any type of capture scenario, ranging from very deep application-level inspection to very fast and simple network connection tracking. The Decoder includes the following three built-in templates that serve as a starting point for configuring your capture needs:

  1. Normal: It is the default Decoder mode with no defined resource allocation and all tunable parameters set to default. It can be a good option if your capture rate is less than 5 Gbit/s or if you want to run a large amount of deep inspection content such as application parsers, feeds, OpenAppID detectors, complex Snort rules, and so on. For more information, see Configure the Decoder for Normal Mode (Default Mode).

  2. 10G: This mode is available in NetWitness Platform 10.4 and later. You can use this mode if your capture rate is under 10 Gbit/s. This mode assumes that most of the packets are retained and saved in the packet database. Make a note that not all parsers can run at 10 Gbit/s speeds as some parsers are too complex for high speeds. For more information, see Configure the Decoder for 10G Mode.

  3. NDR: Use this mode to capture at speeds above 10 Gbit/s but below 40 Gbit/s. Note the following points while using this mode:

    1. By default, this mode assumes that packets are not retained, because retaining packets at these rates consumes storage very quickly.

    2. By default, the ingested packets are dropped. To keep portions of the network stream for further analysis, you must insert network rules that select that traffic before the drop statement.

    3. At higher speeds, only a limited amount of deep packet inspection content can be used.

    4. Ensure that you enable multi-thread assembly while using this mode.

    For more information, see Configure the Decoder for NDR Mode.
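
As a rough guide, the choice between the three templates can be expressed as a function of the expected sustained capture rate. This is a minimal sketch based only on the rate boundaries stated above. The function name and the "normal" label are hypothetical, and the amount of deep inspection content you plan to run should also weigh on the decision.

```python
def suggest_decoder_mode(capture_gbps: float) -> str:
    """Suggest a starting template from the sustained capture rate in Gbit/s."""
    if capture_gbps < 0:
        raise ValueError("capture rate cannot be negative")
    if capture_gbps < 5:
        return "normal"  # default mode, room for heavy deep inspection
    if capture_gbps <= 10:
        return "10g"     # matches the op=10g reconfig parameter
    if capture_gbps <= 40:
        return "ndr"     # matches the op=ndr reconfig parameter
    raise ValueError("rates above 40 Gbit/s are outside the documented range")


for rate in (2, 8, 25):
    print(rate, "Gbit/s ->", suggest_decoder_mode(rate))
```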

Configure the Decoder for Normal Mode (Default Mode)

By default, the Decoder runs in normal mode and requires no additional configuration. The default normal mode captures up to 5 Gbit/s with large amounts of deep packet inspection while storing network sessions.

Configure the Decoder for 10G Mode

This section describes how to configure the Decoder for capture speeds up to 10 Gbit/s with medium amounts of deep packet inspection while storing network sessions.

To configure the Decoder:

  1. From the Decoder Explore view, right-click Decoder and select Properties. In the properties drop-down menu, select reconfig and enter the following parameters:
    op=10g
    Ensure that the correct parameter is displayed in the output.
  2. Add the update=true parameter and run reconfig again to save the configuration changes. For example, your final parameters can be op=10g update=true.
  3. From the Decoder Explore view, right-click database and select Properties.
  4. In the Properties drop-down menu, select reconfig and enter the following parameters:
    update=1 op=10g
    These parameters adjust the packet database to use very large file sizes and Direct I/O.
  5. Perform the following steps to select capture interfaces:
    1. In the Decoder Explore view, right-click on Decoder and select Properties.
    2. In the Properties drop-down menu, click select and run the command to view the available adapters. For example, this might produce output like the list below. The actual list of adapters will vary depending on the hardware available on the Decoder host.

      1: packet_mmap_,eth0

      2: DPDK,0000:41:00.0

      3: DPDK,0000:41:00.1

    3. To select a single interface for capture, enter the parameter adapter=N where N is the capture interface. For example, using the list above you might choose to capture on the first DPDK interface using the adapter=2 parameter.

    4. To select more than one interface for capture, enter the parameter adapter=N,M,... where N,M and so on are the capture interfaces. For example, using the list above you might choose to capture on both DPDK interfaces using the adapter=2,3 parameter.

  6. (Optional) Application parsing is extremely CPU intensive and can cause the Decoder to drop packets. To mitigate application parsing-induced drops, you can set /decoder/config/assembler.parse.valve to true. These are the results:

    • When session parsing becomes a bottleneck, application parsers (HTTP, SMTP, FTP, and others) are temporarily disabled.
    • Sessions are not dropped when the application parsers are disabled. Only the fidelity of the parsing performed on those sessions is reduced.
    • Sessions parsed when the application parsers are disabled still have associated network meta (from the network parser).
    • The statistic /decoder/parsers/stats/blowoff.count displays the count of all sessions that bypassed application parsers (network parsing is still performed).
    • When session parsing is no longer a potential bottleneck, the application parsers are automatically re-enabled.
    • The assembler session pool should be large enough that it is not forcing sessions.
    • You can determine if sessions are being forced by the statistic /decoder/stats/assembler.sessions.forced (it will be increasing). Also /decoder/stats/assembler.sessions will be within several hundred of /decoder/config/assembler.session.pool.
  7. (Optional) If you need to adjust the MTU for capture, add the snaplen parameter to capture.device.params. Unlike previous releases, the snaplen does not need to be rounded up to any specific boundary. The Decoder automatically adjusts the MTU set on the capture interfaces.

  8. The following configuration parameters are deprecated and no longer necessary.

    • The core= parameter in capture.device.params
    • Any configuration files under /etc/pf_ring directory
    • Separate device= parameters in capture.device.params. All multi-interface selection is performed with the select command described in Step 5, substep 2.

    Note: An Ethernet device installed post imaging must be added to DPDK if you want to use it as a capture interface. Similarly, it also requires configuration if used as a network interface, or for system tools to access it without manual configuration.
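
When the adapter list is long, building the adapter= parameter by hand is error prone. The following sketch parses a listing in the format shown in Step 5 and builds the parameter for all DPDK interfaces. It assumes the listing always follows the "index: driver,device" layout of the sample above; the function names are hypothetical and this is not a NetWitness tool.

```python
import re

# One adapter per line, for example "2: DPDK,0000:41:00.0".
ADAPTER_LINE = re.compile(r"^\s*(\d+):\s*([^,]+),(.+?)\s*$")


def parse_adapters(listing: str) -> list:
    """Return (index, driver, device) tuples from the select output."""
    adapters = []
    for line in listing.splitlines():
        match = ADAPTER_LINE.match(line)
        if match:
            adapters.append((int(match.group(1)), match.group(2), match.group(3)))
    return adapters


def adapter_param(adapters: list, driver: str = "DPDK") -> str:
    """Build the adapter=N,M,... parameter for every adapter of one driver."""
    indexes = [str(index) for index, drv, _ in adapters if drv == driver]
    if not indexes:
        raise ValueError(f"no adapters found for driver {driver!r}")
    return "adapter=" + ",".join(indexes)


SAMPLE = """\
1: packet_mmap_,eth0
2: DPDK,0000:41:00.0
3: DPDK,0000:41:00.1
"""
print(adapter_param(parse_adapters(SAMPLE)))  # adapter=2,3
```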

Performance Tuning Parameters

By default, the following tunable parameters are disabled. It is recommended that you enable these to achieve high capture rates and consistent performance.

  1. You can use the BPF filter to perform fast filtering of the packets. The BPF filter is the fastest way to remove packets from ingest. By dropping the unwanted traffic as early as possible, you can reduce the workload on the Decoder and ensure that essential packets are not dropped. For more information on BPF/PCAP filter, see Configure Capture Settings.
  2. You can turn on the Receive Side Scaling (RSS) feature for your network capture interface. The RSS feature splits the traffic coming into the interface into separate queues. It allows each queue to be handled by a different thread, and therefore run on a different CPU core. This provides more CPU time to execute per-packet operations like evaluating BPF and Network rules. RSS makes handling of higher packet rates easier.
    The correct number of RSS queues depends on the number of CPU cores available on the host. A good starting point is half the number of cores on a single CPU socket of your Decoder host. For example, on a 12-core processor, you might use up to 6 RSS queues. You can distribute the RSS queues unequally between interfaces. For example, you can assign 4 RSS queues to a busy interface and 2 to a less busy interface.

    IMPORTANT: A network interface on a Decoder host is typically attached to only one CPU socket. Therefore, you must count only the CPU cores of the CPU socket that is attached to the network interfaces.
    Setting a higher number of RSS queues is possible, but there are diminishing returns if the total number of RSS queues spawns more capture and assembly threads than there are physical cores on the Decoder host.

    For more details on how to configure RSS, see "Utilizing Receive Side Scaling with DPDK" in (Optional) Data Plane Development Kit Packet Capture.
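
The starting-point rule for RSS queues can be sketched as follows. The first function applies the "half the cores on one socket" guideline. The second shows one possible way to split the queues unevenly by relative load; the proportional split is an assumption made for this example and is not a documented algorithm. All names are hypothetical.

```python
def starting_rss_queues(cores_per_socket: int) -> int:
    """Half the cores of the socket attached to the NIC, at least one."""
    return max(1, cores_per_socket // 2)


def split_queues(total_queues: int, load_by_interface: dict) -> dict:
    """Give every interface one queue, then share the rest by relative load."""
    if total_queues < len(load_by_interface):
        raise ValueError("need at least one queue per interface")
    total_load = sum(load_by_interface.values())
    if total_load <= 0:
        raise ValueError("total load must be positive")
    spare = total_queues - len(load_by_interface)
    shares = {n: spare * load / total_load for n, load in load_by_interface.items()}
    result = {n: 1 + int(shares[n]) for n in load_by_interface}
    # Hand out queues lost to rounding, largest fractional share first.
    leftover = total_queues - sum(result.values())
    by_fraction = sorted(shares, key=lambda n: shares[n] - int(shares[n]), reverse=True)
    for name in by_fraction[:leftover]:
        result[name] += 1
    return result


queues = starting_rss_queues(12)
print(queues)
print(split_queues(queues, {"0000:41:00.0": 2.0, "0000:41:00.1": 1.0}))
```

With a 12-core socket and one interface carrying about twice the load of the other, this reproduces the 4 and 2 split used in the example above.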

Storage Considerations

Packet Retention Requires Extremely High Sustained Throughput

When capturing at 10G line rates, the storage system holding the packet and meta databases must be capable of a sustained write throughput of 1400 MBytes/s. Note that many SANs are not capable of achieving these speeds.
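
To put this requirement in perspective, the sketch below converts a link rate into a raw byte rate and a write rate into a daily volume. It uses decimal units (1 MByte = 10^6 bytes) and assumes a fully loaded link; the function names are hypothetical. How the 1400 MBytes/s figure divides between the packet and meta databases is not broken down in this topic.

```python
def line_rate_mbytes_per_s(link_gbps: float) -> float:
    """Raw byte rate of a fully loaded link, in MBytes/s (decimal units)."""
    return link_gbps * 1e9 / 8 / 1e6


def tbytes_per_day(mbytes_per_s: float) -> float:
    """Data written in 24 hours at a sustained rate, in TBytes (decimal)."""
    return mbytes_per_s * 86400 / 1e6


print(line_rate_mbytes_per_s(10))        # raw 10 Gbit/s expressed in MBytes/s
print(round(tbytes_per_day(1400), 2))    # daily volume at the required rate
```

A fully loaded 10 Gbit/s link corresponds to 1250 MBytes/s of raw data, and sustaining 1400 MBytes/s for a day writes roughly 121 TBytes, which is why both throughput and capacity need to be planned together.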

Using SAN and Other Storage Configurations

The Decoder allows any storage configuration that can meet the sustained throughput requirement. A standard 8 Gbit FC link to a SAN is not sufficient to store packet data at 10G. To use a SAN, you may need to aggregate across multiple targets using a software RAID scheme. Environments that use a SAN must therefore configure connectivity to the SAN over multiple FC links.
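
The need for multiple FC links follows from a short calculation. The sketch below assumes a nominal throughput of about 800 MBytes/s per direction for one 8 Gbit FC link, which is the commonly quoted figure and not a value taken from this topic. Real links deliver less after protocol and array overhead, so treat the result as a lower bound.

```python
import math

# Assumption: nominal per-direction throughput of one 8 Gbit FC link.
FC_8G_MBYTES_PER_S = 800


def links_needed(required_mbytes_per_s: float,
                 link_mbytes_per_s: float = FC_8G_MBYTES_PER_S) -> int:
    """Minimum number of links whose combined nominal rate meets the need."""
    return math.ceil(required_mbytes_per_s / link_mbytes_per_s)


print(links_needed(1400))  # links required for the 10G write requirement
```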

Optimize Read/Write Operations When Adding New Storage

A 10G Decoder is optimized to stagger read and write operations across multiple volumes so that the file currently being written is on a different volume from the next file that will be written. This allows maximum throughput on the RAID volumes, because data can be read from the most recently written file while the current file is written on a different volume. However, if volumes are added after a Decoder has been in use, the ability to stagger is limited because one or more volumes are already full, so the new volume is the only place where new files can be written.

To remedy this situation, an administrator can run a stagger command on an existing NetWitness Platform database (packet, log, meta, or session) that has at least two volumes, to stagger the files across all volumes in the most optimal read/write pattern. The major use case is when new storage is added to an existing Decoder and you want to stagger the volumes before restarting capture.

The configuration nodes for this command are the session, meta, and packet databases. Each of these lives under /database/config, which is usually a root node. The config nodes for a Decoder are:

  • /database/config/packet.dir
  • /database/config/meta.dir
  • /database/config/session.dir

The NetWitness Platform Core Database Tuning Guide has information on how those configurations are formatted.

The stagger command is typically only useful for a 10G Decoder and usually just for the packet database. Maximum performance is achieved for storing and retrieving packets when multiple volumes are present. In this scenario, the Decoder always fills the volume with the most free space. When the volumes are roughly the same size, this results in a staggered write pattern, which allows maximum throughput for reading and writing across all volumes. However, this only naturally occurs when multiple packet storage volumes are present at the time the Decoder is first deployed.

A typical use case is adding more storage to an existing Decoder to increase retention. However, when adding storage to a deployment that has already filled the existing volumes with stored packets, the Decoder will naturally fill the new storage with packets before rolling out any packets on the existing storage. This results in a suboptimal read/write pattern because most reads will occur on the same volume that is currently being written to. In a 10G deployment, reads are blocked from the volume when writes are occurring. This does not stop ALL reads on that volume, because the file is buffered in memory before being written, but it does result in suboptimal read performance.

With the stagger command, you can add more storage and then have the service naturally stagger the files across ALL volumes (existing and new) so that read performance is optimized.
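
The "fill the volume with the most free space" behavior, and why it breaks down after storage is added, can be illustrated with a small simulation. This is a simplified model and not the Decoder's implementation: it ignores file roll-out and uses arbitrary units and hypothetical volume names.

```python
def simulate_writes(free_space: dict, file_size: int, num_files: int) -> list:
    """Return the volume chosen for each new file, most free space first."""
    free = dict(free_space)  # work on a copy
    order = []
    for _ in range(num_files):
        target = max(free, key=free.get)
        if free[target] < file_size:
            break  # nothing fits; a real Decoder would roll out old files
        free[target] -= file_size
        order.append(target)
    return order


# Two equal, empty volumes at deployment time: writes alternate.
print(simulate_writes({"vol0": 100, "vol1": 100}, 10, 6))

# Two full volumes plus one new volume: every write hits the new volume.
print(simulate_writes({"vol0": 0, "vol1": 0, "vol2": 100}, 10, 6))
```

In the first case the files alternate between the two volumes, so the most recently written file can be read from one volume while the next file is written to the other. In the second case all new files land on vol2, which is the situation the stagger command is meant to correct.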

Caution: This command should only be performed after the storage is mounted and the Decoder configured to use it (for example, after adding the mount point(s) to packet.dir).

The downside to this command is it can take some time to stagger and the Decoder should not be capturing during the stagger operation.

Recommended workflow:

  1. Add all storage and configure mount points.
  2. Add new storage mount points to packet.dir (or session.dir/meta.dir) and restart service (very important!).
  3. Ensure capture is stopped.
  4. Run the stagger operation. You must not terminate the connection that initiated the stagger operation until the operation is complete. If you run stagger from NwConsole, run the timeout 0 command before sending the stagger command. This will prevent the normal 30 second command timeout.
  5. Start capture after the stagger command finishes.

The following are the parameters for the stagger command:

  • type - The database that will be staggered (session, meta, or packet). Typically only the packet database is useful for staggering, but it is possible to do the session or meta database when multiple volumes are present for those databases. Since the session and meta databases write far less data than the packet database, typically staggering those databases results in less noticeable performance gains.
  • dryRun - If true (the default), will only return a description of the operations that would be performed. If false, then the files will actually be moved to an optimal read/write pattern. You MUST pass false to actually stagger the files.

Example usage from NwConsole:

login <decoder>:50004 <username> <password>

timeout 0

send /database stagger type=packet dryRun=false

Note: If you run this command using the RESTful API, pass the additional parameter expiry=0 to prevent a timeout from the service. You will also need to ensure the HTTP client does not disconnect before the operation completes.
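
If you script the NwConsole example above, it is worth keeping dryRun=true as the default so that files are moved only when explicitly requested. The following sketch generates the three console lines shown in the example. The helper name and its arguments are hypothetical, and it only builds text; it does not connect to a Decoder.

```python
def stagger_script(host: str, username: str, password: str,
                   db_type: str = "packet", dry_run: bool = True,
                   port: int = 50004) -> list:
    """Build the NwConsole lines for a stagger run (dry run unless disabled)."""
    if db_type not in ("session", "meta", "packet"):
        raise ValueError("type must be session, meta, or packet")
    dry = "true" if dry_run else "false"
    return [
        f"login {host}:{port} {username} {password}",
        "timeout 0",  # prevent the normal 30 second command timeout
        f"send /database stagger type={db_type} dryRun={dry}",
    ]


# Placeholders mirror the example above; replace them with real values.
for line in stagger_script("<decoder>", "<username>", "<password>"):
    print(line)
```

Reviewing the dry-run description first, and then rerunning with dry_run=False, follows the recommended workflow without risking an accidental move.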

Parsing and Content Considerations

Parsing raw packets at high speeds presents unique challenges. Given the high session and packet rates, parsing efficiency is paramount. A single parser that is inefficient (spends too long examining packets) can slow the whole system down to the point where packets are dropped at the card.

For initial 10G testing, start with only native parsers (except SMB/WebMail). Use the native parsers to establish baseline performance with little to no packet drops. Do not download any Live content until this has been done and the system is proven to capture without issue at high speeds.

After the system has been operational and running smoothly, Live content should be added very slowly, especially the parsers.

Best Practices

Whether you are updating a currently deployed system or deploying a new system, it is recommended that you use the following best practices to minimize the risk of packet loss. The one exception is when you are updating a current 10G deployment without adding any traffic. For example, a current Decoder capturing off a 10G card at 2G sustained should see no difference in performance, unless part of the update also entails adding additional traffic for capture.

  • Incorporate baseline parsers (except SMB/Webmail, both of which generally have high CPU utilization) and monitor to ensure little to no packet loss.
  • When adding additional parsers, add only one or two parsers at a time.
  • Measure performance impact of newly added content, especially during peak traffic periods.
  • If drops start occurring when they did not happen before, disable all newly-added parsers and enable just one at a time and measure the impact. This helps pinpoint individual parsers causing detrimental effects on performance. It may be possible to re-factor it to perform better or reduce its feature set to just what is necessary for the customer use case.
  • Although feeds have a smaller performance impact, they should also be reviewed and added in a phased approach to help measure their effect on performance.
  • Application Rules also tend to have little observable impact, though again, it is best not to add a large number of rules at once without measuring the performance impact.
  • If you regularly get a timeout message in the Investigate > Events view, such as "The query on channel 192577 was auto-canceled by the system for exceeding time usage limits. Check timeout values. Query running time was 00:05:00 (HH:MM:SS)", first check the query console to determine whether there are issues with the time it takes for a service to respond, index error messages, or other warnings that may need to be addressed to improve query response time. If there are no messages indicating any specific warnings, try increasing the Core Query Timeout from the default 5 minutes to 10 minutes as described in the "View Query and Session Attributes per Role" section of the System Security and User Management Guide.

Also, making the recommended configuration changes outlined in the Configuration section will help minimize potential issues.
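
The "enable just one at a time and measure the impact" practice can be written as a simple loop. The sketch below is a generic outline: measure_drops stands for whatever procedure you use to enable a set of parsers, capture through a peak period, and read the drop count. It is a hypothetical callback, not a NetWitness API, and the numbers in the example are invented for illustration.

```python
from typing import Callable, Iterable, List


def find_costly_parsers(new_parsers: Iterable[str],
                        baseline_drops: float,
                        measure_drops: Callable[[List[str]], float],
                        tolerance: float = 0.0) -> List[str]:
    """Enable each new parser alone and keep those that raise drops."""
    offenders = []
    for parser in new_parsers:
        drops = measure_drops([parser])  # only this parser added to baseline
        if drops > baseline_drops + tolerance:
            offenders.append(parser)
    return offenders


# Invented measurements standing in for real observations.
FAKE_DROPS = {"TLS_lua": 0, "xor_executable": 1200, "fingerprint_pdf_lua": 0}


def fake_measure(enabled: List[str]) -> float:
    return sum(FAKE_DROPS[name] for name in enabled)


print(find_costly_parsers(FAKE_DROPS, 0, fake_measure))  # ['xor_executable']
```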

Use Case 1: 10G Mode - Egress, Deep Packet Inspection

In this setup the goal is to ingest at 10G sustained line rates, perform Deep Packet Inspection (DPI), store metadata, and store raw packets for some time.

Tested Live Content

All (not each) of the following parsers can run at 10G on the test data set used:

  • MA content (7 Lua parsers, 1 feed, 1 application rule)
  • 4 feeds (alert ids info, nwmalwaredomains, warning, and suspicious)
  • 41 application rules
  • DNS_verbose_lua (disable DNS)
  • fingerprint_javascript_lua
  • fingerprint_pdf_lua
  • fingerprint_rar_lua
  • fingerprint_rtf_lua
  • MAIL_lua (disable MAIL)
  • SNMP_lua (disable SNMP)
  • spectrum_lua
  • SSH_lua (disable SSH)
  • TLS_lua
  • windows_command_shell
  • windows_executable

Not Tested

  • SMB_lua, native SMB disabled by default
  • html_threat

Other

  • HTTP_lua reduces the capture rate from >9G to <7G. At just under 5G, this parser can be used in place of the native HTTP parser without dropping packets (in addition to the list above).
  • xor_executable pushes parse CPU to 100%, and the system can drop a significant number of packets due to parse backup.

Aggregation Adjustments Based on Tested Live Content

A 10G Decoder can serve aggregation to a single Concentrator while running at 10G speeds. Deployments using Malware Analysis, Event Stream Analysis, Warehouse Connector, and Reporting Engine are expected to impact performance and can lead to packet loss.

For the tested scenario, the Concentrator aggregates between 45k and 70k sessions per second. The 10G Decoder captures between 40k and 50k sessions per second. With the content identified above, this is about 1.5 to 2 million meta per second. Due to the high volume of session rates, the following configuration changes are recommended:

  • Nice aggregation on the Concentrator limits the performance impact on the 10G Decoder. The following setting turns on nice aggregation.
    /concentrator/config/aggregate.nice = true
  • Due to the high volume of sessions on the Concentrator, you may consider activating parallel values mode on the Concentrator by setting /sdk/config/parallel.values to 16. This improves Investigation performance when the number of sessions per second is greater than 30,000.
  • If multiple aggregation streams are necessary, aggregating from the Concentrator instead has less impact on the Decoder.
  • Further review for content and parsing is required for deployments where you want to use other NetWitness Platform components (Warehouse Connector, Malware Analysis, ESA, and Reporting Engine).
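
The tested figures and the recommendations above can be combined into a short worked example. The sketch below derives the meta-per-session ratio implied by the tested rates and collects the two Concentrator settings named in this section, adding parallel values mode only above the 30,000 sessions per second threshold. The function names are hypothetical, and the code only builds a dictionary; it does not apply any configuration.

```python
def meta_per_session(meta_per_second: float, sessions_per_second: float) -> float:
    """Average number of meta items generated for each session."""
    return meta_per_second / sessions_per_second


def concentrator_settings(sessions_per_second: float) -> dict:
    """Settings suggested in this section for a given session rate."""
    settings = {"/concentrator/config/aggregate.nice": "true"}
    if sessions_per_second > 30_000:
        settings["/sdk/config/parallel.values"] = "16"
    return settings


# Bounds implied by the tested scenario (1.5 to 2 million meta per second,
# 40k to 50k sessions per second).
print(meta_per_session(1.5e6, 50_000), meta_per_session(2e6, 40_000))
print(concentrator_settings(45_000))
```

The tested scenario therefore corresponds to roughly 30 to 50 meta items per session, which is a useful sanity check when comparing your own deployment against these numbers.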

Configure the Decoder for NDR Mode

This section describes how to configure the Decoder for capture speeds above 10 Gbit/s but below 40 Gbit/s with small amounts of DPI while storing only metadata.

To configure the Decoder:

  1. From the Decoder Explore view, right-click Decoder and select Properties. In the properties drop-down menu, select reconfig and enter the following parameters:
    op=ndr
    Ensure that the correct parameter is displayed in the output.
  2. Add the update=true parameter and run reconfig again to save the configuration changes. For example, your final parameters will be op=ndr update=true.
  3. From the Decoder Explore view, right-click database and select Properties.
  4. In the Properties drop-down menu, select reconfig and enter the following parameters:
    update=1 op=ndr
    These parameters adjust the packet database to use very large file sizes and Direct I/O.
  5. Perform the following steps to select capture interfaces:
    1. In the Decoder Explore view, right-click on Decoder and select Properties.
    2. In the Properties drop-down menu, click select and run the command to view the available adapters. For example, this might produce output like the list below. The actual list of adapters will vary depending on the hardware available on the Decoder host.

      1: packet_mmap_,eth0

      2: DPDK,0000:41:00.0

      3: DPDK,0000:41:00.1

    3. To select a single interface for capture, enter the parameter adapter=N where N is the capture interface. For example, using the list above you might choose to capture on the first DPDK interface using the adapter=2 parameter.

    4. To select more than one interface for capture, enter the parameter adapter=N,M,... where N,M and so on are the capture interfaces. For example, using the list above you might choose to capture on both DPDK interfaces using the adapter=2,3 parameter.

  6. (Optional) Application parsing is extremely CPU intensive and can cause the Decoder to drop packets. To mitigate application parsing-induced drops, you can set /decoder/config/assembler.parse.valve to true. These are the results:

    • When session parsing becomes a bottleneck, application parsers (HTTP, SMTP, FTP, and others) are temporarily disabled.
    • Sessions are not dropped when the application parsers are disabled. Only the fidelity of the parsing performed on those sessions is reduced.
    • Sessions parsed when the application parsers are disabled still have associated network meta (from the network parser).
    • The statistic /decoder/parsers/stats/blowoff.count displays the count of all sessions that bypassed application parsers (network parsing is still performed).
    • When session parsing is no longer a potential bottleneck, the application parsers are automatically re-enabled.
    • The assembler session pool should be large enough that it is not forcing sessions.
    • You can determine if sessions are being forced by the statistic /decoder/stats/assembler.sessions.forced (it will be increasing). Also /decoder/stats/assembler.sessions will be within several hundred of /decoder/config/assembler.session.pool.
  7. (Optional) If you need to adjust the MTU for capture, add the snaplen parameter to capture.device.params. Unlike previous releases, the snaplen does not need to be rounded up to any specific boundary. The Decoder automatically adjusts the MTU set on the capture interfaces.

  8. The following configuration parameters are deprecated and no longer necessary.

    • The core= parameter in capture.device.params
    • Any configuration files under /etc/pf_ring directory
    • Separate device= parameters in capture.device.params. All multi-interface selection is performed with the select command described in Step 5, substep 2.

    Note: An Ethernet device installed post imaging must be added to DPDK if you want to use it as a capture interface. Similarly, it also requires configuration if used as a network interface, or for system tools to access it without manual configuration.
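
The two indicators of forced sessions described in Step 6 can be checked together once you have sampled the statistics. The sketch below takes values you have already read from /decoder/stats/assembler.sessions.forced, /decoder/stats/assembler.sessions, and /decoder/config/assembler.session.pool. The 500-session margin is an assumed reading of "within several hundred", and the function name is hypothetical.

```python
def forced_session_indicators(forced_before: int, forced_after: int,
                              sessions_in_use: int, session_pool: int,
                              margin: int = 500) -> dict:
    """Evaluate both signs that the assembler session pool is too small."""
    return {
        # assembler.sessions.forced grew between the two samples
        "forced_increasing": forced_after > forced_before,
        # assembler.sessions is close to assembler.session.pool
        "pool_nearly_full": session_pool - sessions_in_use <= margin,
    }


result = forced_session_indicators(forced_before=1000, forced_after=1450,
                                   sessions_in_use=999_800, session_pool=1_000_000)
print(result)
```

If both indicators are true, the session pool is likely too small for the current traffic and should be reviewed before relying on the parse valve.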

Performance Tuning Parameters

By default, the following tunable parameters are disabled. It is recommended that you enable these to achieve high capture rates and consistent performance.

  1. You can use the BPF filter to perform fast filtering of the packets. The BPF filter is the fastest way to remove packets from ingest. By dropping traffic that you don't need to retain as early as possible, you can reduce the workload on the Decoder and ensure that essential packets are not dropped. For more information on BPF/PCAP filter, see Configure Capture Settings.
  2. You can turn on the Receive Side Scaling (RSS) feature for your network capture interface. The RSS feature splits the traffic coming into the interface into separate queues. It allows each queue to be handled by a different thread, and therefore run on a different CPU core. This provides more CPU time to execute per-packet operations like evaluating BPF and Network rules. RSS makes handling of higher packet rates easier.
    The correct number of RSS queues depends on the number of CPU cores available on the host. A good starting point is half the number of cores on a single CPU socket of your Decoder host. For example, on a 12-core processor, you might use up to 6 RSS queues. You can distribute the RSS queues unequally between interfaces. For example, you can assign 4 RSS queues to a busy interface and 2 to a less busy interface.

    IMPORTANT: A network interface on a Decoder host is typically attached to only one CPU socket. Therefore, you must count only the CPU cores of the CPU socket that is attached to the network interfaces.
    Setting a higher number of RSS queues is possible, but there are diminishing returns if the total number of RSS queues spawns more capture and assembly threads than there are physical cores on the Decoder host.

    For more details on how to configure RSS, see "Utilizing Receive Side Scaling with DPDK" in (Optional) Data Plane Development Kit Packet Capture.
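
The RSS sizing guidance above can be sketched as a small calculation. This is only an illustration of the "cores per socket divided by 2" starting point: the function names and the interface labels are hypothetical, and the sketch does not read or write any Decoder configuration.

```python
def rss_queue_budget(cores_per_socket: int) -> int:
    """Starting-point RSS queue count: half the cores of the CPU socket
    that the capture interface is attached to (never less than 1)."""
    return max(1, cores_per_socket // 2)


def allocation_fits(cores_per_socket: int, queues_per_interface: dict) -> bool:
    """True if the queues planned across all interfaces stay within the budget."""
    return sum(queues_per_interface.values()) <= rss_queue_budget(cores_per_socket)


# Example from the text: a 12-core socket, split unequally between two interfaces.
# The interface labels are placeholders, not real device names.
plan = {"busy_interface": 4, "less_busy_interface": 2}
print("budget:", rss_queue_budget(12))
print("plan fits:", allocation_fits(12, plan))
```

Treat the result as a starting point to be validated under peak traffic rather than a fixed rule, because the best value depends on how the capture and assembly threads share the physical cores.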

Storage Considerations

NDR Mode Assumes No Packet Retention

The base or default configuration for NDR mode starts with all packet writes disabled. In this scenario, the storage throughput requirements are relatively low. If you choose to turn on packet retention, be aware that it is relatively easy to overwhelm the I/O throughput of most storage solutions with a 40G network feed.
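
To get a feel for why a 40G feed can overwhelm storage, the following sketch converts a link rate into a sustained write rate and a daily volume. It assumes decimal units (1 Gbit = 10^9 bits) and ignores any metadata, index, or file-format overhead; the function names and the `retained_fraction` parameter are illustrative only.

```python
SECONDS_PER_DAY = 86_400


def write_rate_bytes_per_sec(link_gbps: float, retained_fraction: float = 1.0) -> float:
    """Bytes per second written to disk if `retained_fraction` of the feed is kept."""
    if not 0.0 <= retained_fraction <= 1.0:
        raise ValueError("retained_fraction must be between 0 and 1")
    return link_gbps * 1e9 / 8 * retained_fraction


def bytes_per_day(rate_bytes_per_sec: float) -> float:
    """Total bytes written in 24 hours at a constant rate."""
    return rate_bytes_per_sec * SECONDS_PER_DAY


rate = write_rate_bytes_per_sec(40)  # a fully utilized 40 Gbit/s feed, all packets retained
print(f"{rate / 1e9:.1f} GB/s sustained write rate")
print(f"{bytes_per_day(rate) / 1e12:.0f} TB written per day")
```

A fully retained, fully utilized 40 Gbit/s feed works out to 5 GB/s of sustained writes, or 432 TB per day, which is why filtering before retention matters at these speeds.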

Parsing and Content Considerations

Parsing at Speeds Greater than 10G

As network ingest speeds increase, less CPU time is available to examine each packet or each session. This means that only a limited amount of parsing can be performed. Fortunately, you only need to account for the amount of traffic that enters the Decoder's parsing subsystem. For example, if you are ingesting 40 Gbit/s but filter out 30 Gbit/s of traffic, you only need to allow 10 Gbit/s of traffic to move through the parsers. At raw 40 Gbit/s speeds, only "well-behaved" content works. "Well-behaved" content can be defined as:

  • Content that only activates when a large, uncommon search token is registered. This includes Snort rules with a large, uncommon "fast-pattern" content field. For more information on fast-pattern and Snort rules, see Decoder Snort Detection.

  • Content that generates a finite number of metas per session.

  • Parsers that do not activate on session begin/end events.

The overall maximum throughput of the reassembly and parsing systems on the Decoder is about 100,000 sessions (or streams) per second. However, in practical terms, very few downstream analytical or database services can handle such activity. If you find that your session rate is too high, or that your downstream services cannot handle the number of sessions the Decoder is generating, consider filtering out lower-value traffic as it is ingested.

Best Practices

The NDR mode has a network rule configured to drop all incoming traffic by default. In general, when capturing above 10 Gbit/s, you must limit the amount of traffic analyzed using deep packet inspection (DPI). The use cases in this document are example guidelines that do not guarantee the actual throughput of the Decoder. As mentioned previously in this document, the actual throughput of the Decoder depends on the following factors:

  • Amount of packets filtered vs. the amount of packets retained
    It is important to filter the traffic that enters the Decoder to minimize packet drops. You can use Network Rules or a BPF Rule to filter the traffic before it enters advanced features such as stream reassembly and DPI. For more information, see Configure Network Rules and (Optional) Configure System-Level (BPF) Packet Filtering.
  • Average packet size
    A Decoder capturing small packets, for example 64-byte packets, can see a decrease in overall throughput. If average packet sizes are small, the line rate of Decoder ingest will fall significantly below the network link speed. This occurs because of the fixed per-packet overhead that accompanies each Ethernet frame.
  • Shared system resources
    If the Decoder host is also hosting other applications or add-on features that manipulate data and perform other analysis, it can lower the overall throughput of the Decoder. Even if these add-on applications are not utilizing the CPU, they still use shared resources such as memory bandwidth and CPU interconnect bandwidth.
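
The effect of average packet size can be illustrated with standard Ethernet framing arithmetic. The sketch below assumes the usual 20 bytes of per-frame wire overhead (8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap); it models the link only, not the Decoder itself, and the function names are illustrative.

```python
# Per-frame overhead on the wire that carries no frame data:
# 8 bytes preamble/start-of-frame delimiter + 12 bytes inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def max_packets_per_sec(link_gbps: float, frame_bytes: int) -> float:
    """Maximum frames per second the link can carry at a given frame size."""
    return link_gbps * 1e9 / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def frame_throughput_gbps(link_gbps: float, frame_bytes: int) -> float:
    """Gbit/s of actual frame data delivered when the link is saturated."""
    return link_gbps * frame_bytes / (frame_bytes + WIRE_OVERHEAD_BYTES)


for size in (64, 512, 1518):
    pps = max_packets_per_sec(40, size)
    gbps = frame_throughput_gbps(40, size)
    print(f"{size:>5}-byte frames: {pps / 1e6:6.2f} Mpps, {gbps:5.2f} Gbit/s of frame data")
```

With 64-byte frames, roughly a quarter of the link capacity is consumed by framing overhead, and the packet rate (and therefore the per-packet work on the Decoder) is far higher than with large frames.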

Whether you are updating a currently deployed system or deploying a new system, it is recommended that you use the following best practices to minimize the risk of packet loss.

  • Incorporate baseline parsers (except SMB/Webmail, both of which generally have high CPU utilization) and monitor to ensure little to no packet loss.
  • When adding additional parsers, add only one or two parsers at a time.
  • Measure performance impact of newly added content, especially during peak traffic periods.
  • If drops start occurring when they did not happen before, disable all newly added parsers, then enable just one at a time and measure the impact. This helps pinpoint individual parsers that have a detrimental effect on performance. It may be possible to refactor such a parser to perform better, or to reduce its feature set to just what is necessary for the customer use case.
  • If you regularly get a timeout message in the Investigate > Events view, such as "The query on channel 192577 was auto-canceled by the system for exceeding time usage limits. Check timeout values. Query running time was 00:05:00 (HH:MM:SS)", first check the query console to determine whether there are issues with the time it takes for a service to respond, index error messages, or other warnings that need to be addressed to improve query response time. If no messages indicate specific warnings, try increasing the Core Query Timeout from the default 5 minutes to 10 minutes, as described in the "View Query and Session Attributes per Role" section of the System Security and User Management Guide.

Use Case 1: NDR Mode - Egress, General Purpose

Generate NetFlow Style Meta Only + Small Subset of Snort Rules + All Native Parsers

In this setup the goal is to ingest at rates higher than 10G sustained line rates, perform DPI with only native network parsers (non-Lua), and store only metadata for some time.

Tested Live Content

All (not each) of the following parsers can run at 10G on the test data set used:

  • The NETWORK native parsers

  • 35 Snort rules for FireEye red team tool detection

Not Tested

All Lua parsers.

Other

The following considerations are recommended while running the Decoder in the NDR mode:

  • Some native parsers should be disabled in some environments. For example, the DNS parser might generate too many small sessions or too much meta per session.
  • It is recommended to keep the session rate less than 35,000 sessions per second. This is to limit the packet drops to less than 1% at such high throughput rates.

  • Configure rules to filter the traffic that you do not require. For more information on BPF/PCAP filter, see Configure Capture Settings.
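
The session-rate and packet-drop guidance above can be checked with a simple calculation over counters collected during a measurement window. This is a minimal sketch: the `Sample` fields are hypothetical names for values you would gather from your own monitoring, not actual Decoder statistics names, and the drop percentage is assumed to be measured against received plus dropped packets.

```python
from dataclasses import dataclass

MAX_SESSIONS_PER_SEC = 35_000  # recommended ceiling from the guidance above
MAX_DROP_PERCENT = 1.0         # target packet-drop limit from the guidance above


@dataclass
class Sample:
    """Counters observed over one measurement window (hypothetical field names)."""
    seconds: float
    sessions: int
    packets_received: int
    packets_dropped: int


def evaluate(sample: Sample):
    """Return (sessions per second, drop percentage, within_guidance)."""
    session_rate = sample.sessions / sample.seconds
    total = sample.packets_received + sample.packets_dropped
    drop_percent = 100.0 * sample.packets_dropped / total if total else 0.0
    within = session_rate < MAX_SESSIONS_PER_SEC and drop_percent < MAX_DROP_PERCENT
    return session_rate, drop_percent, within


# Illustrative numbers only, not measured results.
window = Sample(seconds=60, sessions=1_800_000,
                packets_received=995_000_000, packets_dropped=5_000_000)
rate, drops, ok = evaluate(window)
print(f"{rate:.0f} sessions/s, {drops:.2f}% dropped, within guidance: {ok}")
```

Running the same check during peak traffic periods, and again after each content change, makes it easier to see when additional filtering is needed.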

Use Case 2: NDR Mode - Egress, Data Exfiltration

Generate NetFlow Style Meta Only + Exfiltration Focused Specific Native Parsers

In this setup the goal is to ingest at rates higher than 10G sustained line rates, perform DPI with only native network parsers (non-Lua), and store only metadata for some time.

Tested Live Content

All (not each) of the following parsers can run at 10G on the test data set used:

  • Native parsers - HTTP, HTTPS(SSL), SMTP, DNS, FTP, SFTP/SSH, and VNC

  • 35 Snort rules for FireEye red team tool detection

Not Tested

All Lua parsers.

Other

The following considerations are recommended while running the Decoder in the NDR mode:

  • Some native parsers should be disabled in some environments. For example, the DNS parser might generate too many small sessions or too much meta per session.
  • It is recommended to keep the session rate less than 35,000 sessions per second. This is to limit the packet drops to less than 1% at such high throughput rates.

  • Configure rules to filter the traffic that you do not require. For more information on BPF/PCAP filter, see Configure Capture Settings.

Use Case 3: NDR Mode - Lateral Movement

Generate NetFlow Style Meta Only + Small Subset of Snort Rules + Specific Native Parsers

In this setup the goal is to ingest at rates higher than 10G sustained line rates, perform DPI with only native network parsers (non-Lua), and store only metadata for some time.

Tested Live Content

All (not each) of the following parsers can run at 10G on the test data set used:

  • Native parsers - Kerberos, SMB, VNC, and SFTP/SSH

  • 35 Snort rules for FireEye red team tool detection

Not Tested

All Lua parsers.

Other

The following considerations are recommended while running the Decoder in the NDR mode:

  • Disable the Snort rule related to SMB traffic (for example, M.HackTool.SMB.Impacket-Obfuscation.[Service Names]). This is recommended because large regex parameters can cause the Snort parser to utilize more than 50% CPU.

  • Some native parsers should be disabled in some environments. For example, the DNS parser might generate too many small sessions or too much meta per session.
  • It is recommended to keep the session rate less than 35,000 sessions per second. This is to limit the packet drops to less than 1% at such high throughput rates.

  • Configure rules to filter the traffic that you do not require. For more information on BPF/PCAP filter, see Configure Capture Settings.