SysMaint: Troubleshooting Health & Wellness

Document created by RSA Information Design and Development on Jul 29, 2016. Last modified by Susan Ewald on Nov 1, 2016.
Version 2

This topic describes Health & Wellness issues that you may encounter and suggests solutions to them.

Issues Common to All Hosts and Services 

You may see the wrong statistics in the Health & Wellness interface if:

  • Some or all of the hosts and services are not provisioned and enabled correctly.
  • You have a mixed-version deployment (that is, hosts updated to different Security Analytics versions).
  • Supporting services are not running.

Issues Identified by Messages in the Interface or Log Files

This section provides troubleshooting information for issues identified by messages that Security Analytics displays in the Health & Wellness interface or writes to the Health & Wellness log files.

               
Message: User Interface: Cannot connect to System Management Service
System Management Service (SMS) logs:

Caught an exception during connection recovery!
java.io.IOException
at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:106)
at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:102)
at com.rabbitmq.client.impl.AMQConnection.start(AMQConnection.java:346)
at com.rabbitmq.client.impl.recovery.RecoveryAwareAMQConnectionFactory.newConnection(RecoveryAwareAMQConnectionFactory.java:36)
at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.recoverConnection(AutorecoveringConnection.java:388)
at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.beginAutomaticRecovery(AutorecoveringConnection.java:360)
at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.access$000(AutorecoveringConnection.java:48)
at com.rabbitmq.client.impl.recovery.AutorecoveringConnection$1.shutdownCompleted(AutorecoveringConnection.java:345)
at com.rabbitmq.client.impl.ShutdownNotifierComponent.notifyListeners(ShutdownNotifierComponent.java:75)
at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:572)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.rabbitmq.client.ShutdownSignalException: connection error
at com.rabbitmq.utility.ValueOrException.getValue(ValueOrException.java:67)
at com.rabbitmq.utility.BlockingValueOrException.uninterruptibleGetValue(BlockingValueOrException.java:33)
at com.rabbitmq.client.impl.AMQChannel$BlockingRpcContinuation.getReply(AMQChannel.java:343)
at com.rabbitmq.client.impl.AMQConnection.start(AMQConnection.java:292)
... 8 more
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:189)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
at java.io.DataInputStream.readUnsignedByte(DataInputStream.java:288)
at com.rabbitmq.client.impl.Frame.readFrom(Frame.java:95)
at com.rabbitmq.client.impl.SocketFrameHandler.readFrame(SocketFrameHandler.java:139)
at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:532)
Possible Cause: The RabbitMQ service is not running on the Security Analytics host.
Solution: Restart the RabbitMQ service using the following command:
service rabbitmq-server restart

 

                        
Message/Problem: User Interface: Cannot connect to System Management Service
Cause: The System Management Service (SMS), RabbitMQ, or Tokumx service is not running.
Solution: Run the following commands on the Security Analytics server to make sure that all of these services are running:
[root@saserver ~]# service rsa-sms status
RSA NetWitness SMS :: Server is not running.
[root@saserver ~]# service rsa-sms start
Starting RSA NetWitness SMS :: Server...
[root@saserver ~]# service rsa-sms status
RSA NetWitness SMS :: Server is running (5687).
[root@saserver ~]# service tokumx status
tokumx (pid  2779) is running...
[root@saserver ~]# service rabbitmq-server status
Status of node sa@localhost ...
[{pid,2501},
 {running_applications,
     [{rabbitmq_federation_management,"RabbitMQ Federation Management",
          "3.3.4"},

Message/Problem: User Interface: Cannot connect to System Management Service
Possible Cause: The /var/lib/rabbitmq partition usage is 70% or greater.
Solution: Contact Customer Care.
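
Before you contact Customer Care, it may help to confirm how full the partition actually is. The following is a minimal sketch, not an RSA-supplied tool: the function names usage_percent and check_usage are made up for this illustration, and the threshold of 70 is taken from the possible cause above. It relies only on the standard df -P output format.

```shell
# Sketch only: usage_percent and check_usage are illustrative names,
# not part of Security Analytics.

# usage_percent: reads `df -P` output on stdin and prints the Capacity
# value of the first filesystem row, without the percent sign.
usage_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_usage PERCENT THRESHOLD: succeeds while PERCENT is below THRESHOLD.
check_usage() {
    [ "$1" -lt "$2" ]
}

# Example (run on the Security Analytics server):
#   pct=$(df -P /var/lib/rabbitmq | usage_percent)
#   check_usage "$pct" 70 || echo "/var/lib/rabbitmq is ${pct}% full"
```

If /var/lib/rabbitmq is not a separate mount point on your host, df reports the filesystem that contains it.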

 

               
Message/Problem: User Interface: Host migration failed.
Possible Cause: One or more Security Analytics services may be in a stopped state.
Solution: Make sure that the following services are running, and then restart the Security Analytics server: Archiver, Broker, Concentrator, Decoder, Event Stream Analysis, Incident Management, IPDB Extractor, Log Collector, Log Decoder, Malware Analysis, Reporting Engine, Warehouse Connector, Workbench.

 

              
Message/Problem: User Interface: Server Unavailable.
Possible Cause: One or more Security Analytics services may be in a stopped state.
Solution: Make sure that the following services are running, and then restart the Security Analytics server: Archiver, Broker, Concentrator, Decoder, Event Stream Analysis, Incident Management, IPDB Extractor, Log Collector, Log Decoder, Malware Analysis, Reporting Engine, Warehouse Connector, Workbench.

 

                      
Message/Problem: User Interface: Server Unavailable.
Possible Cause: The System Management Service (SMS), RabbitMQ, or Tokumx service is not running.
Solution 1: Run the following commands on the Security Analytics server to make sure that all of these services are running:
[root@saserver ~]# service rsa-sms status
RSA NetWitness SMS :: Server is not running.
[root@saserver ~]# service rsa-sms start
Starting RSA NetWitness SMS :: Server...
[root@saserver ~]# service rsa-sms status
RSA NetWitness SMS :: Server is running (5687).
[root@saserver ~]# service tokumx status
tokumx (pid  2779) is running...
[root@saserver ~]# service rabbitmq-server status
Status of node sa@localhost ...
[{pid,2501},
 {running_applications,
     [{rabbitmq_federation_management,"RabbitMQ Federation Management",
          "3.3.4"},
Solution 2: Make sure that the /var/lib/rabbitmq partition is less than 75% full.
Solution 3: Check the Security Analytics host log file (/var/lib/netwitness/uax/logs/sa.log) for any errors.
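
One possible way to carry out the log check in Solution 3 is to list the most recent error lines in the log file. This is a rough sketch under stated assumptions: the function name recent_errors is made up for this illustration, and matching on the string "ERROR" is an assumption about how sa.log marks errors, so adjust the pattern if your log uses a different marker.

```shell
# Sketch only: recent_errors is an illustrative name, and the "ERROR"
# pattern is an assumption about the log format.

# recent_errors FILE [COUNT]: print the last COUNT lines (default 20)
# of FILE that contain the string ERROR.
recent_errors() {
    grep "ERROR" "$1" | tail -n "${2:-20}"
}

# Example (run on the Security Analytics server):
#   recent_errors /var/lib/netwitness/uax/logs/sa.log 50
```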

Issues Not Identified by the User Interface or Logs

This section provides troubleshooting information for issues that are not identified by messages that Security Analytics displays in the Health & Wellness interface or writes to the Health & Wellness log files. For example, you may see incorrect statistical information in the interface.

              
Problem: Incorrect statistics displayed in the Health & Wellness interface.
Possible Cause: The Puppet service is not running. Puppet service must be running on all services.
Solution: Restart the Puppet service.

 

             
Problem: Incorrect statistics displayed in the Health & Wellness interface.
Possible Cause: The SMS service is not running. The SMS service must be running on the Security Analytics host.
Solution: Restart the SMS service.
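
For the case where the SMS service is down, the status and start commands shown earlier in this topic can be combined into a small helper. This is a minimal sketch, not an RSA-supplied script: the function name ensure_running and the SERVICE_CMD variable are made up for this illustration, and the sketch assumes that each init script exits with a non-zero status when its service is not running.

```shell
# Sketch only: ensure_running and SERVICE_CMD are illustrative names.
# SERVICE_CMD is a hypothetical override hook for dry runs; by default
# the real `service` command is used.
SERVICE_CMD="${SERVICE_CMD:-service}"

# ensure_running NAME...: start each named service whose status check fails.
# Assumes `service NAME status` exits non-zero when NAME is not running.
ensure_running() {
    for svc in "$@"; do
        if "$SERVICE_CMD" "$svc" status >/dev/null 2>&1; then
            echo "running: $svc"
        else
            echo "not running, starting: $svc"
            "$SERVICE_CMD" "$svc" start
        fi
    done
}

# Example (run as root on the Security Analytics server):
#   ensure_running rsa-sms
```

The helper only starts a service that is stopped. It does not restart a service that is running but misbehaving.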

 

             
Problem: Security Analytics does not show the version to which you upgraded until you restart jettysrv.
Possible Cause: When Security Analytics checks a connection, it polls the service every 30 seconds to see whether it is active. If the service comes back up within that 30-second interval, Security Analytics does not detect the restart and does not get the new version.
Solution:
  1. Manually stop the service.
  2. Wait until you see that it is offline.
  3. Restart the service.
    Security Analytics displays the correct version.
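
The stop, wait, and restart steps above can be approximated from the command line. The following is a rough sketch under stated assumptions: the function name wait_until_offline is made up for this illustration, SERVICE_NAME is a placeholder for the affected service, and the sketch assumes that the service's status command exits with a non-zero status once the service has stopped. It confirms only that the service has stopped locally. The interface remains the authoritative place to confirm that Security Analytics shows the service as offline, which can take up to one 30-second polling interval.

```shell
# Sketch only: wait_until_offline is an illustrative name.

# wait_until_offline TRIES DELAY CMD [ARGS...]
# Runs CMD up to TRIES times, DELAY seconds apart, and succeeds as soon
# as CMD fails (that is, the status check reports the service as stopped).
wait_until_offline() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        if ! "$@" >/dev/null 2>&1; then
            return 0    # status command failed: service is offline
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1            # still running after all attempts
}

# Example (SERVICE_NAME is a placeholder; run as root):
#   service SERVICE_NAME stop
#   wait_until_offline 12 5 service SERVICE_NAME status \
#       && sleep 35 \
#       && service SERVICE_NAME start
```

The sleep of 35 seconds in the example is an assumption chosen to be longer than the 30-second polling interval, so that Security Analytics has a chance to see the service offline before it comes back.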

 

              
Problem: The Security Analytics server does not display the Service Unavailable page.
Possible Cause: After you upgrade to Security Analytics version 10.5, JDK 1.8 is not the default Java version, which causes jettysrv to fail to start. Without the Jetty server, the Security Analytics server cannot display the Service Unavailable page.
Solution: Restart jettysrv.
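
To see whether the possible cause above applies, you can check which Java version is the default on the host. This is a minimal sketch: the function name is_java_18 is made up for this illustration, and it only inspects the banner printed by the standard java -version command. This topic does not describe how to change the default JDK, so the sketch is diagnostic only.

```shell
# Sketch only: is_java_18 is an illustrative name.

# is_java_18: reads `java -version` output on stdin and succeeds if any
# line reports a 1.8.x version string.
is_java_18() {
    grep -q 'version "1\.8\.'
}

# Example (java -version writes to stderr, hence 2>&1):
#   java -version 2>&1 | is_java_18 || echo "Default Java is not 1.8"
```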