000030209 - RSA Security Analytics UI becomes unavailable because the System Monitoring message queue is full

Document created by RSA Customer Support Employee on Jun 14, 2016
Version 1

Article Content

Article Number: 000030209
Applies To
RSA Product Set: Security Analytics
RSA Product/Service Type: Security Analytics UI, Health & Wellness, Security Analytics Server
RSA Version/Condition:,,,, 10.5.1
Platform: CentOS
O/S Version: EL6
Issue
Multiple CLOSE_WAIT connections to the Jetty web server from various hosts eventually lead to connectivity issues that make the Security Analytics UI unreachable, as shown in the example below.
[root@SA-Server ~]# netstat -anp | grep 443
tcp      982      0          CLOSE_WAIT  30163/java
tcp      523      0            CLOSE_WAIT  30163/java
tcp        1      0            CLOSE_WAIT  30163/java
tcp        1      0            CLOSE_WAIT  30163/java
tcp      982      0          CLOSE_WAIT  30163/java
tcp     2054      0            CLOSE_WAIT  30163/java
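
To track how many such connections have accumulated, the CLOSE_WAIT sockets can be counted instead of read by eye. The following is a minimal sketch and not part of the original article: the helper name count_close_wait is hypothetical, and it assumes the usual Linux "netstat -ant" column layout (local address in the fourth field, state in the sixth).

```shell
# Hypothetical helper: count sockets in CLOSE_WAIT whose local address
# ends in the given port. Reads "netstat -ant" style output on stdin:
#   Proto Recv-Q Send-Q Local-Address Foreign-Address State
count_close_wait() {
    port="${1:-443}"
    awk -v port="$port" '
        # Match only rows in CLOSE_WAIT whose local address ends in ":<port>"
        $6 == "CLOSE_WAIT" && $4 ~ (":" port "$") { n++ }
        END { print n + 0 }   # n + 0 prints 0 when nothing matched
    '
}

# Typical use on the appliance:
#   netstat -ant | count_close_wait 443
```

A count that keeps growing between runs is consistent with the behaviour described above; a single snapshot says little on its own.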

The /var/lib/netwitness/uax/logs/sa.log file streams the following errors when the issue is occurring:
2015-05-07 17:33:10,457 [pool-3-thread-31304] ERROR com.rsa.smc.sa.admin.util.monitoring.MessageBusReader - System Monitoring message queue is full
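
To confirm that the log shows this error, and how recently, the message can be searched for directly. This is a small sketch under stated assumptions: the function names queue_full_count and queue_full_last are hypothetical, and the default path is the sa.log location quoted above.

```shell
# Hypothetical helpers for checking sa.log for the queue-full error.
SA_LOG_DEFAULT=/var/lib/netwitness/uax/logs/sa.log
QUEUE_FULL_MSG='System Monitoring message queue is full'

# Print the number of log lines containing the error message.
queue_full_count() {
    log="${1:-$SA_LOG_DEFAULT}"
    grep -c -F "$QUEUE_FULL_MSG" "$log"
}

# Print the most recent matching line (empty output if there is none).
queue_full_last() {
    log="${1:-$SA_LOG_DEFAULT}"
    grep -F "$QUEUE_FULL_MSG" "$log" | tail -n 1
}

# Typical use on the appliance:
#   queue_full_count
#   queue_full_last
```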

Restarting the rsa-sms, collectd, and rabbitmq-server services on the appliance has no effect on the issue.
Resolution
A hotfix to resolve the issue has been prepared by the Engineering team for Security Analytics
The issue is being investigated by the Engineering team for the other affected versions.
If you are experiencing this issue, contact RSA Support and quote this article number for further assistance.
Workaround
To temporarily resolve the issue, restart the jettysrv service on the Security Analytics server appliance by issuing the commands below.
  1. stop jettysrv
  2. start jettysrv
NOTE: Restarting the jettysrv service makes the Security Analytics UI unreachable (if it is not already) for a few minutes while the service re-initializes completely.

Another workaround, which prevents the issue from recurring, is to lengthen the polling intervals in the puppet recipes: the SMS polling interval from every 10 seconds to every 60 seconds, and the collectd interval from every 60 seconds to every 180 seconds. The command also changes the interval in the Broker template (NwBroker.conf.erb) from 5 to 60 and then restarts the puppet service. Issue the command below on the Security Analytics server.
[root@SA-Server ~]# updatedb && \
for files in $(locate --regex ".conf.erb$"); do \
  sed -i 's/interval  "60"/interval  "180"/g' $files; \
  sed -i 's/interval  "10"/interval  "60"/g' $files; \
done && \
sed -i 's/interval 60/interval 180/g' /etc/puppet/modules/rsa-sms-server/files/_collectd_java.conf && \
sed -i 's/interval  "5"/interval  "60"/g' /etc/puppet/modules/broker/templates/NwBroker.conf.erb && \
service puppet restart
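
Because the command above edits files in place, it may be worth previewing the effect on a template first. The sketch below is an illustration only: the function name preview_interval_changes is hypothetical, and it covers just the two substitutions applied to the .conf.erb templates (60 to 180, then 10 to 60), not the _collectd_java.conf or NwBroker.conf.erb edits. It runs sed without -i and diffs the result against the untouched file.

```shell
# Hypothetical helper: show what the two template substitutions would
# change in one file, without modifying it.
preview_interval_changes() {
    file="$1"
    # Same expressions and same order as the in-place command:
    # 60 -> 180 first, then 10 -> 60, so a former "10" is not bumped twice.
    sed -e 's/interval  "60"/interval  "180"/g' \
        -e 's/interval  "10"/interval  "60"/g' "$file" \
        | diff "$file" - || [ $? -eq 1 ]
    # diff exits 1 when the inputs differ; that is treated as success here,
    # while exit status 2 (a real error) still yields a failure.
}

# Typical use, looping over the same files the command would touch:
#   for f in $(locate --regex ".conf.erb$"); do
#       echo "== $f"; preview_interval_changes "$f"
#   done
```

Lines prefixed with "<" are the current contents and lines prefixed with ">" are what the file would contain afterwards; no output means the file has no matching interval settings.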