|Applies To||RSA Product Set: NetWitness Logs & Packets, Security Analytics|
RSA Product/Service Type: Packet Decoder, Health & Wellness
RSA Version/Condition: 10.6.x, 10.5.x
O/S Version: EL6
- The Decoder Capture Rate Zero alarm is triggered on Health & Wellness.
- The Decoder Packet Capture Pool Depleted alarm is triggered on Health & Wellness.
- The Decoder Dropping >xx% of Packets alarm is triggered on Health & Wellness.
- The Decoder packet capture rate drops to zero and stays there until the Decoder service is restarted.
- After the Decoder is restarted, the same issue recurs. (This can happen every few days or several times a day, depending on the incoming traffic and the parser.)
|Cause||Because a session was stuck in the Parse thread on the Decoder, all available memory pages for the Decoder were allocated to the Assembler and Parse threads.|
No memory is left for the Capture thread, so newly ingested packets cannot be captured.
|Resolution||To resolve the issue you must unload or remove the problematic parser that is causing the Decoder to become stuck and then restart the Decoder service.|
|Notes||How to determine whether a session is stuck in the Parse thread.|
Review statdb or statHist for the stat values below, and learn the pattern under both Normal and Issue status. The patterns vary with each customer's environment, incoming traffic volume, and loaded parsers, so take some time to review statdb in Normal status and compare it with the stats seen during the issue.
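The comparison of Normal and Issue status can be sketched as a simple range check. This is a minimal sketch, not a NetWitness tool: it assumes the stat values have already been exported from statdb or statHist into lists of plain dictionaries, and the function names `summarize` and `out_of_baseline` are hypothetical.

```python
def summarize(samples):
    """Return {stat_name: (min, max)} over a list of {stat_name: value} samples."""
    names = sorted({name for sample in samples for name in sample})
    summary = {}
    for name in names:
        values = [sample[name] for sample in samples if name in sample]
        summary[name] = (min(values), max(values))
    return summary


def out_of_baseline(normal_samples, issue_samples):
    """Return the stats whose range during the issue leaves the Normal range."""
    baseline = summarize(normal_samples)
    observed = summarize(issue_samples)
    flagged = {}
    for name, (low, high) in baseline.items():
        if name not in observed:
            continue  # stat was not sampled during the issue window
        seen_low, seen_high = observed[name]
        if seen_low < low or seen_high > high:
            flagged[name] = {"normal": (low, high), "issue": (seen_low, seen_high)}
    return flagged
```

A range check like this only catches values that move outside what was seen in Normal status. A counter that stays inside its usual range but stops returning to 0 needs a separate check on consecutive samples.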
Patterns in Normal status.
- /decoder/stats/pool.packet.capture : Usually the largest pool. (It takes up approximately 60~90% of /decoder/config/pool.packet.pages, depending on Decoder load.)
- /decoder/stats/pool.packet.assembler : A very low number.
- /decoder/stats/assembler.packet.pages : Usually the second largest pool in the Decoder. (It takes up 10~40% of /decoder/config/pool.packet.pages, depending on Decoder load. Based on /decoder/config/assembler.pool.ratio, it can take up to 70% of /decoder/config/pool.packet.pages by default.)
- /decoder/parsers/stats/queue.sessions.total : Repeatedly returns to 0, because each session is handed back to the Assembler thread to be saved once parsing completes.
(The full path of each value is omitted from here on.)
Patterns in Issue status
This behavior can be checked with the statdb and statHist output.
- If a session is stuck in the Parse thread, queue.sessions.total stops returning to 0 (for example, it stays at a value > 0).
- assembler.packet.pages starts to grow, up to 70% of pool.packet.pages.
- Once assembler.packet.pages reaches the limit, pool.packet.assembler starts to grow, up to the remaining 30% of pool.packet.pages.
- Once pool.packet.assembler reaches the limit, pool.packet.capture becomes 0. (pool.packet.pages is a shared pool, and pool.packet.assembler and assembler.packet.pages together have taken all of it, so no pages are available for pool.packet.capture.)
- capture.rate becomes 0 because pool.packet.capture is 0. This takes no more than 10 minutes from the time the session gets stuck.
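The sequence above can be sketched as two checks over exported samples. This is a minimal sketch under stated assumptions: each sample is a plain dictionary keyed by the short stat names used above, the 70% figure is the default ratio quoted in this article, and the helper names (`page_limits`, `first_stuck_index`, `capture_starved`) and the `min_run` threshold are hypothetical.

```python
def page_limits(pool_pages, assembler_percent=70):
    """Split the shared pool into the two ceilings described above.

    Returns (limit for assembler.packet.pages, remainder for pool.packet.assembler).
    """
    assembler_limit = pool_pages * assembler_percent // 100
    return assembler_limit, pool_pages - assembler_limit


def first_stuck_index(samples, min_run=3):
    """Return the index where queue.sessions.total began a run of at least
    min_run consecutive nonzero samples, or None if no such run exists."""
    run = 0
    for index, sample in enumerate(samples):
        run = run + 1 if sample["queue.sessions.total"] > 0 else 0
        if run >= min_run:
            return index - min_run + 1
    return None


def capture_starved(sample):
    """True when the capture pool and the capture rate are both 0."""
    return sample["pool.packet.capture"] == 0 and sample["capture.rate"] == 0
```

The value of `min_run` should come from the Normal-status review: pick a run length longer than any nonzero streak seen while the Decoder was healthy.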
How to identify the problematic parser that causes the stuck session.
- A core dump of the Decoder, taken while the issue is occurring, is needed. Core dump analysis by the CE team will identify the problematic parser. However, the dump is tens of GB in size, so transferring the file may be difficult.
- If a core dump is not allowed, list the recently changed or added parsers. Then remove them one at a time, letting the Decoder run for several days after each removal, to find the bad parser.
- This rarely happens with RSA Live parsers; it mostly happens with the customer's custom parsers.
- The CE team has gencore.sh to collect the dump. Refer to the reference link below.
How to collect/investigate statdb
- Ask the customer to transfer the Decoder statdb files under /var/netwitness/decoder/statdb on the Decoder.
- There is no need to stop the Decoder service to get a closed db file, because the open db file can be queried without any problem.
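One way to prepare the statdb files for transfer is to bundle the directory into a single compressed archive. This is a minimal sketch, assuming a Python 3 interpreter is available where it runs; the function name `bundle_statdb` and the default archive name are hypothetical, and only the source directory comes from this article.

```python
import os
import tarfile

STATDB_DIR = "/var/netwitness/decoder/statdb"  # directory named in this article


def bundle_statdb(source_dir=STATDB_DIR, archive_path="decoder-statdb.tar.gz"):
    """Write a gzip-compressed tar of source_dir and return its member names."""
    top_level = os.path.basename(source_dir.rstrip("/"))
    with tarfile.open(archive_path, "w:gz") as archive:
        # Store paths relative to the directory name, not the absolute path.
        archive.add(source_dir, arcname=top_level)
    # Reopen the archive to confirm what was actually written.
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()
```

Write the archive to a volume with enough free space, and outside the statdb directory itself so the archive does not try to include itself.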
How to collect statHist
- Refer to the /sys/statHist function on the Decoder explore page. It can be accessed with NWConsole or the Decoder REST web page (http://decoderIP:50104/sys).