000035681 - Decoder Capture Rate Zero on Health & Wellness due to parser stuck in RSA NetWitness Platform

Document created by RSA Customer Support Employee on Aug 8, 2019
Version 1

Article Content

Article Number
000035681
Applies To
RSA Product Set: NetWitness Platform
RSA Product/Service Type: Packet Decoder, Health & Wellness
RSA Version/Condition: 10.6.x, 10.5.x, 11.0, 11.1, 11.2
Platform: CentOS
O/S Version: EL6, EL7
Issue
  • The Decoder Capture Rate Zero alarm is triggered on Health & Wellness.
  • The Decoder Packet Capture Pool Depleted alarm is triggered on Health & Wellness.
  • The Decoder Dropping >xx% of Packets alarm is triggered on Health & Wellness.
  • The Decoder packet capture rate drops to zero and stays there until the Decoder service is restarted.
  • After the Decoder is restarted, the same issue recurs. (It can happen every couple of days or several times a day, depending on the incoming traffic and the parser.)
Cause
Because a session is stuck in the Parse thread on the Decoder, all available memory pages for the Decoder are allocated to the Assembler and Parse threads.

No memory pages are left for the Capture thread, so newly ingested packets cannot be captured.
Resolution
To resolve the issue, unload or remove the problematic parser that is causing the Decoder to become stuck, and then restart the Decoder service.
Notes

How to determine that a session is stuck in the Parse thread


To diagnose this issue, review statdb or statHist for the stat values listed below. You also need to understand the pattern in both Normal and Issue status. The pattern varies with each customer's environment, the amount of incoming traffic, and the loaded parsers. Take some time to review statdb in Normal status and compare it with the stats recorded during the issue.


Patterns in Normal status



  • /decoder/stats/pool.packet.capture: Usually the largest pool. (It takes up approximately 60~90% of /decoder/config/pool.packet.pages, depending on Decoder load.)
  • /decoder/stats/pool.packet.assembler: A very low number.
  • /decoder/stats/assembler.packet.pages: Usually the second largest pool in the Decoder. (It takes up 10~40% of /decoder/config/pool.packet.pages, depending on Decoder load. Based on /decoder/config/assembler.pool.ratio, it can take up to 70% of /decoder/config/pool.packet.pages by default.)
  • /decoder/parsers/stats/queue.sessions.total: It repeatedly returns to 0, because each session is handed back to the Assembler thread to be saved once parsing completes.
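
The queue.sessions.total behavior described above can be checked over a series of samples. The following is a minimal sketch, not an RSA tool: it assumes the values have already been exported from statdb or statHist into a plain list (oldest first), and the sample numbers are hypothetical. A long run of samples that never returns to 0 is a hint worth comparing against the Normal baseline for the environment, not proof of a stuck session.

```python
from typing import Sequence


def longest_nonzero_run(samples: Sequence[int]) -> int:
    """Return the length of the longest run of consecutive samples above 0."""
    longest = current = 0
    for value in samples:
        # Extend the run while the queue is non-empty, reset when it drains.
        current = current + 1 if value > 0 else 0
        longest = max(longest, current)
    return longest


# Hypothetical queue.sessions.total samples, oldest first.
healthy = [0, 12, 0, 3, 7, 0, 0, 5, 0]
suspect = [0, 4, 9, 31, 120, 754, 754, 754]

print(longest_nonzero_run(healthy))  # 2
print(longest_nonzero_run(suspect))  # 7
```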


Patterns in Issue status


(Full paths of the values are omitted from here on.)

  • If a session is stuck in the Parse thread, queue.sessions.total stops returning to 0 and stays at a value greater than 0.
  • assembler.packet.pages starts to grow, up to 70% of pool.packet.pages.
  • Once assembler.packet.pages reaches that limit, pool.packet.assembler starts to grow into the remaining 30% of pool.packet.pages.
  • Once pool.packet.assembler reaches its limit, pool.packet.capture becomes 0. (pool.packet.pages is a shared pool, and pool.packet.assembler and assembler.packet.pages together have taken all of it, so no pages are available for pool.packet.capture.)
  • capture.rate becomes 0 because pool.packet.capture is 0. This takes no more than 10 minutes from the moment the session becomes stuck.
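
The progression above can be sketched as a small classifier over one snapshot of the stats. This is an illustrative sketch only: the function name, the dictionary layout and the stage labels are hypothetical, and a single snapshot with a non-empty parse queue is not abnormal by itself, so results should be read together with the trend over time.

```python
def depletion_stage(stats: dict) -> str:
    """Map one snapshot of Decoder stats to a stage of the issue pattern."""
    total = stats["pool.packet.pages"]
    # assembler.pool.ratio is a percentage of pool.packet.pages (70 by default).
    assembler_limit = total * stats["assembler.pool.ratio"] // 100

    if stats["queue.sessions.total"] == 0:
        return "parse queue drained"
    if stats["pool.packet.capture"] == 0:
        return "capture pool depleted"
    if stats["assembler.packet.pages"] >= assembler_limit:
        return "assembler at limit"
    return "parse queue not drained"


# Snapshot using the abnormal values from this article's example.
abnormal = {
    "pool.packet.capture": 0,
    "pool.packet.assembler": 60000,
    "assembler.packet.pages": 140000,
    "pool.packet.pages": 200000,
    "assembler.pool.ratio": 70,
    "queue.sessions.total": 754,
}
print(depletion_stage(abnormal))  # capture pool depleted
```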
 

Example of Normal versus Abnormal Stats in Explore
Stat/Config Node                               Normal    Abnormal
/decoder/stats/pool.packet.capture             188701    0
/decoder/stats/pool.packet.assembler           0         60000
/decoder/stats/assembler.packet.pages          100       140000
/decoder/config/pool.packet.pages              200000    200000
/decoder/config/assembler.pool.ratio           70        70
/decoder/parsers/stats/queue.sessions.total    0         754
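
As a quick check of the abnormal values in this example, the arithmetic can be worked through as below. This is only a sketch of the calculation; the variable names are illustrative and the numbers are the ones from the example above.

```python
pool_packet_pages = 200000      # /decoder/config/pool.packet.pages
assembler_pool_ratio = 70       # /decoder/config/assembler.pool.ratio (percent)

# Limit for assembler.packet.pages: 70% of the shared pool.
assembler_limit = pool_packet_pages * assembler_pool_ratio // 100

# Abnormal values from the example.
assembler_packet_pages = 140000
pool_packet_assembler = 60000

# Pages left for pool.packet.capture once both have grown to their limits.
capture_left = pool_packet_pages - assembler_packet_pages - pool_packet_assembler

print(assembler_limit)  # 140000
print(capture_left)     # 0
```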


This behavior can be checked with statdb and statHist output.
                                                                                           

How to identify the problematic parser that causes the stuck session



  • A core dump of the Decoder taken while the issue is occurring is needed. Core dump analysis by the CE team identifies the problematic parser. However, the dump is tens of GB in size, so transferring the file may be difficult.
  • If a core dump is not allowed, list the recently changed or added parsers. Then remove them one at a time, letting the Decoder run for several days after each removal, to find the bad parser.
  • This rarely happens with RSA Live parsers. In most cases it happens with the customer's custom parsers.
  • The CE team has gencore.sh to collect the dump. Refer to the reference link below.


How to collect/investigate statdb



Collect


  • Ask the customer to transfer the Decoder statdb files located under /var/netwitness/decoder/statdb on the Decoder.
  • There is no need to stop the Decoder service to get a closed db file, because an open db file can be queried without any problem.

Investigate




How to collect statHist



  • Refer to the statHist function under /sys on the Decoder Explore page. It can be accessed with NWConsole or through the Decoder REST web page (http://decoderIP:50104/sys).
