Security Analytics System Maintenance Checklist


This checklist covers troubleshooting of system issues as well as regular maintenance that can improve the health of your systems. For example, if you run into issues with disk space (such as disk space filling up regularly), refer to this document. It is not mandatory that you perform these tasks exactly as suggested here; the steps are designed to help with troubleshooting. This checklist is intended for reference purposes.

Several of the following troubleshooting tasks suggest restarting services. Please check with your organization's policies on restarting services before you perform those tasks.

If you need assistance with these tasks, contact Customer Support. For information about how to contact Customer Support, see the "How to Contact RSA Customer Support" page in RSA Link (https://community.rsa.com/docs/DOC-1294).

Audience

The primary audience for this guide is members of the Administration team who are responsible for maintaining Security Analytics.

All Host Types Health Checks

In this section, we describe the most common health checks that apply across all the Security Analytics platforms. You perform these tasks using both the Security Analytics user interface and SSH-Session/CLI.

Checks for All Host Types Using the Security Analytics UI

Check services
  1. Go to Administration > Hosts and ensure that all the boxes in the Services column are green.
  2. Go to Administration > Services and ensure that all the listed services display green circles.
 
Check alarms

In the Security Analytics UI, go to Administration > Health & Wellness and click the Alarms tab. For information about interpreting the alarms, see Monitor Alarms.

 

Checks for All Host Types Using SSH-Session/CLI


Check memory usage

Run the following command:
free -g ; top

 

Check CPU usage

Run the following command:
iostat

 

Check for any Security Analytics configuration changes

Run the following command:
puppet agent -t

 

Check the status of mcollective and collectd services

Run the following commands:
service mcollective status
service collectd status

 

Log maintenance

It is a best practice to monitor service and system logs daily for content and physical size. It is important to verify that logs are being rolled over, to keep disk partitions from filling up. (A log is rotated after it reaches a certain size, for example 50 MB, and a log control tool such as logrotate creates a new file in its place for logging purposes.) Some services might not function properly if root partition usage exceeds 80%. Follow the steps in System Log Maintenance to address problems that can arise when this happens.
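The following is a minimal logrotate sketch that implements the 50 MB rollover described above. The path and service name are hypothetical; adapt them to the log you need to manage:

    # /etc/logrotate.d/example-service (illustrative sketch only)
    /var/log/example-service/*.log {
        size 50M        # rotate once the log reaches 50 MB
        rotate 5        # keep five rotated copies
        compress        # gzip rotated logs to save disk space
        missingok       # do not report an error if the log is absent
        notifempty      # skip rotation for empty logs
    }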

 

Monitor Reporting Engine

Monitor the Reporting Engine to ensure that it does not fill up the /home/rsasoc/ partition. For information about how to monitor Reporting Engine, see Monitor Reporting Engine.
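For a quick CLI spot check, you can list the largest Reporting Engine directories; this sketch uses the directory paths given in the Monitor Reporting Engine task details later in this document:

    du -sh /home/rsasoc/rsa/soc/reporting-engine/* | sort -rh | head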

 

Monitor Malware Co-Located service

The Malware Analysis colo service may fail if the spectrum.h2.db database grows beyond 10 GB. Avoid running the Malware Analysis colo service for continuous scans, and check the size of the database frequently. This service is located on all Security Analytics servers; do not confuse it with the stand-alone Malware Analysis appliance or virtual machine. If the service fails due to unavailable disk space, follow the steps described in Malware Analysis Colo Service Failure.
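To check the database size from the CLI, you can run the following command (the path is taken from the Malware Analysis Colo Service Failure procedure later in this document):

    du -h /var/lib/netwitness/rsamalware/spectrum/db/spectrum.h2.db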

 

Monitor RabbitMQ server

Security Analytics servers use the RabbitMQ service for features such as federation, Health and Wellness,
and Incident Management. Ensure that the RabbitMQ service is in a healthy state by running a report and looking for alarms, memory usage, and sockets used. To run this report, follow the steps described in RabbitMQ Service Report.

 

Back up host systems and services

Scheduled daily backups of all essential Security Analytics configurations should be taken for each of the following components:

  • Log Decoder
  • Archiver
  • Concentrator
  • Broker
  • ESA
  • Remote Log Collectors (VLC)
  • Reporting Engine
  • Security Analytics server

For information about backing up these components, see Back Up and Restore Data for Hosts and Services.

 

Check Storage Usage

Run the following command:
df -h

 

Sort the files consuming the largest amount of disk space

Run the following command:
du -sh * | sort -rh

 

 

Check for core service dump files

In the Security Analytics console, run the following command:
find /var/netwitness/ -iname "core*"

 

Check for any process claiming space for a deleted file

Run the following command:
lsof | grep -i deleted

 

Check the date and time to make sure they are synchronized

In the Security Analytics console, run the following command:
date

 

Check size of H2 database

Security Analytics uses an in-memory H2 database. If the H2 database is over 5 GB, and the user interface is slow, contact Customer Support.
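To locate and size H2 database files from the CLI, you can run a find such as the following (a sketch; the search paths are assumptions based on the log locations listed later in this document):

    find /var/lib/netwitness /var/netwitness -name "*.h2.db" -exec du -h {} + 2>/dev/null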

 

Check the current Security Analytics version

Run the following command:
rpm -qa | grep -i nwappliance

 

Check the status of all attached storage disks and RAID configurations

Run the following command:
/usr/sbin/nwraidutil.pl

 

 

Check for NTP operations

Run the following command:
ntpstat

 

Check for kernel version

Run the following command:
uname -a

 


Verify Custom Index File Configurations

Validate that the following files are consistent across all hosts of the same type; for example, all Concentrators should have a consistent index-concentrator-custom.xml file.

  • Decoders: /etc/netwitness/ng/index-decoder-custom.xml
  • Concentrators: /etc/netwitness/ng/index-concentrator-custom.xml

If there are any file discrepancies, verify which host has the correct version and push that version to the other hosts. For instructions, see Push Correct Versions of Custom Index Files to Hosts.

 

Verify Custom Feeds

Verify that custom feeds are correctly deployed to hosts. For instructions, see Verify Custom Feeds.

 

Back Up Feeds, Rules and Parsers

Backing up feeds, correlation rules, parsers, and application rules regularly ensures that your configuration is correct if recovery is necessary and makes the recovery procedure easier and faster. If these items are not changed often, they can be backed up less frequently, but you should back them up regularly. You can use the backup scripts to back up these artifacts. For more information, see Back Up and Restore Data for Hosts and Services.
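As an illustrative sketch only (content layout varies by host type and release, so verify these paths on your hosts before relying on them), a simple tar backup of custom content might look like this:

    tar czf /root/nw-custom-backup-$(date +%F).tar.gz \
        /etc/netwitness/ng/*-custom.xml \
        /etc/netwitness/ng/feeds \
        /etc/netwitness/ng/parsers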

 

Security Analytics Head Server Health Checks

In this section, we describe regular health checks to perform on the Security Analytics Head Server. You perform these tasks using the Security Analytics user interface and SSH-Session/CLI.

Head Server checks using the Security Analytics UI


List Health and Wellness alarms for false and true positive

Create a list of Health and Wellness alarms and filter for false-positive and true-positive alarms so that you can address them. Go to Administration > Health & Wellness and select the Alarms tab.

 

Head Server Checks Using SSH-Session/CLI


Test connectivity of SA Head Server with other hosts

Run the command:
mco ping

 

Check for Reporting Engine critical errors

Run the command:
tailf /home/rsasoc/rsa/soc/reporting-engine/logs/reporting-engine.log | grep -i error

 

Check for Mcollective errors

Run the command:
tailf /var/log/mcollective.log | grep -i error

 

Check SA certificates keystore contents

Run the command:
keytool -list -keystore /etc/pki/java/cacerts -storetype JKS -storepass changeit

 

Check for any SA Jetty server critical errors

Run the command:
more /var/lib/netwitness/uax/logs/sa.log | grep -i error

 

Verify all attached storage disks

Run the command: nwraidutil.pl

 

Verify network connectivity

  • Run the command: curl host_IP:port

  • With outgoing SMTP servers, run the command:
    curl smtp_server_IP:25

 

Verify required Security Analytics ports are open

Run the command:
netstat -alnp | grep "port_no"

 

Concentrator Health Checks

Indexes

By default, Security Analytics hosts create index slices based on index save session count. The option /index/config/save.session.count enables you to configure the system to perform automatic checkpoint saves. "0" (zero) means that no checkpoint saves will occur based on sessions that are added. "auto" means that a save will occur at an interval chosen automatically based on available resources.
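You can also read this setting over the Core service's REST interface, as in the sketch below. The default Concentrator REST port (50105), the admin account, and the query parameters are assumptions to verify against your deployment:

    curl -s -u admin "http://<concentrator_ip>:50105/index/config/save.session.count?msg=get&force-content-type=text/plain"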

Older versions of Security Analytics Core, or systems that have been upgraded from Security Analytics versions prior to 10.5, use a time-based save schedule that saves the index every eight hours. You can see the current save interval by using the scheduler editor in the Security Analytics Administration UI for the service.

Within the index slice window, there is a maximum limit on the number of unique values that can be indexed for a meta key. This limit is defined by the valueMax setting in the index-concentrator-custom.xml file. If a meta key is indexed by value and valueMax is missing or set to 0, the number of unique values for that key is unlimited, which can increase index usage and degrade Concentrator performance. Therefore, RSA recommends that you set valueMax for meta keys that are indexed by value.
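For illustration, a custom index entry with valueMax might look like the following in index-concentrator-custom.xml (the key name and limit shown here are examples, not recommendations):

    <key description="Hostname Aliases" level="IndexValues" name="alias.host" format="Text" valueMax="100000"/>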

It is also important to monitor the number of index slices created on a Hybrid or Concentrator host. As the number of slices grows, query performance degrades because each query must consult more slices. When hosts reach the following number of index slices, an index reset is recommended if overall query performance is reduced:

  • Log Hybrid: 250 index slices
  • Log Concentrator: 500 index slices

Caution: Be aware that a full re-index takes days to complete on a fully loaded Concentrator.

NW Database Configuration Verification

Professional Services usually configures the Core NW database parameters and handles related issues. The information below is quoted from an internal support document, "The Core Database Tuning Guide," and describes the syntax that can be used and configuration best practices for the core NW database.

Syntax used

The following example shows the syntax for NW database configuration:
/var/netwitness/decoder/packetdb=10tb;/var/netwitness/decoder0/packetdb==20.5tb

The size values are optional. If set, they indicate the maximum total size of files stored before databases roll over. If the size is not present, the database does not automatically roll over, but its size can be managed using other mechanisms.

The use of = or == is significant. The default behavior of the databases is to automatically create directories specified when the Core service starts. However, this behavior can be overridden by using the == syntax. If == is used, the service does not create any directories. If the directories do not exist when the service starts, the service does not successfully start processing. This gives the service resilience against file systems that are missing or unmounted when the host boots.

Verification of the 95% threshold

To ensure that the NW database directory sizes are configured with the correct 95% threshold, in the Security Analytics UI:

  1. Go to the service's Explore view, right-click the database node, select Properties, and from the drop-down menu, select reconfig.
  2. In the Parameters field, type update=0 and click Send. The response output checks the host storage and attached storage and automatically calculates the 95% threshold.
  3. When you type update=1 and click Send, the output displays the same response as in the previous step, but when you refresh the Explore view, you see that the session, meta, and packet database directory sizes have been updated to 95% of the currently available storage.
  4. Restart the Concentrator or Decoder service for the changes to take effect.

Concentrator Health Checks Using the Security Analytics UI


Check Health and Wellness for any errors related to hosts

Go to Administration > Health & Wellness and click the Alarms tab.

 

Check aggregation status, rate and auto start

  1. Go to Administration > Services and select a Concentrator service.

  2. Click View > Config. From the Config drop-down menu at the top of the page, select Stats.

  3. In Key Stats, check the Rate, Behind, and Status values, and make sure the number of sessions behind is less than 100,000.

 

Confirm metadata at the Concentrator is available for investigation

Go to Investigation > Navigate and select Load Values. 

Set query.parse to strict

  1. Go to Administration > Services and select a Concentrator service.

  2. In the Actions menu, click View > Explore.

  3. In the left pane, expand sdk and select config. Ensure that query.parse is set to strict.

 

Review configured storage for NWDB

  1. Go to Administration > Services and select a Concentrator service.

  2. In the Actions menu, click View > Explore and in the left pane, select database > config.

  3. Look in the configured storage for NWDB (meta.dir, session.dir, index.dir), which should be using up to 95% of available storage (local storage and DAC).

 

Index check: Check the number of slices

In the service's Explore view, check the stat /index/stats/slices.total (Index Slice Total). The number of slices should be less than 500, and ideally 400 or less, to avoid slowing down query performance.

 

Ensure NWDB storage configuration is correct

  1. Go to Administration > Services and select a Concentrator service.

  2. In the Actions menu, click View > Explore.

  3. In the left panel, right-click on database and select Properties.

  4. From the drop-down menu, select reconfig, and in Parameters, type update=0, and click Send. This calculates what the NWDB size configuration should be for all storage available to the server.

  5. If this configuration does not match the current configuration, in Parameters, type update=1, click Send, and then restart the nwconcentrator service to implement the correct NWDB storage configuration.

 

Verify all meta keys are configured with correct format and valueMax entries

  1. Go to Administration > Services and select a Concentrator service.

  2. In the Actions menu, click View > Config.

  3. Select the Files tab, and from the drop-down list, select the index-concentrator-custom.xml file and verify that all the meta keys are configured with the correct format and valueMax entries.

 

Check /index/config/save.session.count

  1. Go to Administration > Services and select a Concentrator service.

  2. In the Actions menu, click View > Config.

  3. From the Config menu at the top of the page, select Explore.

  4. In the left pane, select index > config. save.session.count is displayed in the right pane. save.session.count is 600000000 by default (in 10.5.X and later). If save.session.count=0, then index slice creation is still controlled by the service scheduler.

  5. In View > Config, select the Files tab and from the drop-down list, select scheduler. The scheduler entry should look similar to: /sys/config/scheduler/351 = hours=8 pathname=/index msg=save

  6. If /index/config/save.session.count=0 and the index save schedule is every 8 hours, at least 21 index slices are created every week. Assuming that the majority of queries cover two weeks or less, change /index/config/index.slices.open (Index Open Slice Count) from 0 to 42 (42 is the default open slice count and corresponds to two weeks of slices at this save rate).

This change should reduce the maximum amount of memory that the Concentrator service can use for queries.

Note: The change is immediate and does not require a service restart.

 

Concentrator Checks Using SSH-Session/CLI


Check storage usage

Run the command: df -h

 

Check memory usage

Run the command: free -g

 

Check for meta keys exceeding their valueMax per slice

Run the command:
cat /var/log/messages | grep -i index | grep -i max

 

Test execution of Puppet provisioning script

Run the command: puppet agent -t

 

Verify required ports are open

Run the command: netstat -alnp | grep "port_no"

 

Index check: Check size of index slices

Run the command:
cd /var/netwitness/concentrator ; du -h index

Note: RSA recommends that index slice size should be less than 20 GB for optimal performance. If you see very large index slices, you can verify the following index configuration settings:

  • Proper valueMax values are set for meta keys with the IndexValues format in index-concentrator-custom.xml.
  • The index save scheduler entry is set to 8 hours, or
  • The index config /index/config/save.session.count is set to auto or 600000000 (600 million).
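To list individual slice sizes sorted from largest to smallest, you can run the following sketch, which assumes the index directory location given above:

    du -sh /var/netwitness/concentrator/index/* | sort -rh | head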
 

Event Stream Analysis (ESA) Health Checks

In this section, we describe regular health checks to perform for ESA. You perform these tasks using both the Security Analytics user interface and SSH-Session/CLI.

ESA checks using the Security Analytics UI


Check the Events per Second (EPS) rate

Monitor EPS for an ESA host at the following location:

Alerts > Configure > Services, select an ESA host and check Offered Rate.
Compare your current ESA EPS rates to previous results, and if there is a significant difference, call Customer Support. If you are using a virtual system, you can also refer to the "Basic Deployment" topic in the Virtual Host Setup Guide (https://community.rsa.com/docs/DOC-83321) for more information.

 

Check Mongo database status

The Mongo database on the ESA host is responsible for storing the alerts and incident management information. After a period of time, it is possible for this database to grow large and cause performance issues. RSA recommends that the Mongo database does not exceed 5 GB in size. Ensure that you set up database maintenance to prevent it from exceeding 5 GB at the following location:

Administration > Services, select an ESA service. From the Actions menu, select View > Explore > Alert > Storage > Maintenance.

 

Ensure that all data source connections are enabled

  1. Go to Administration > Services and select an ESA service.

  2. From the Actions menu, select View > Config.

 

Ensure that ESA rules resource-usage monitoring is enabled

  1. Go to Administration > Services, select an ESA service. From the Actions menu, select View > Explore.

  2. In the left pane, expand CEP and go to Metrics > configuration, and ensure that EnabledMemoryMetric, EnabledCaptureSnapshot and EnableStats are set to true.

  3. Restart the ESA service.

 

Monitor ESA rules memory usage

  1. Go to Administration > Health & Wellness > System Stats Browser.

  2. Enter the following options in the fields at the top of the page:
    Host = ESA
    Component = Event Stream Analytics
    Category = esa-metrics

  3. Click Apply.

 

Ensure that the correct Concentrators are added to the ESA service as data sources

  1. Go to Administration > Services and select an ESA service.

  2. From the Actions menu, select View > Config and ensure that the list of Concentrators is correct.

  3. Ensure that all Concentrators are enabled and that the default port is set to 56005.

 

Ensure that the number of enabled rules meets requirements

Go to Alerts > Configure > Services > Rule Stats.

 

Ensure all ESA rules are deployed after updates

  1. Go to Alerts > Configure > Rules.

  2. Ensure that there is no exclamation mark beside the deployment.

 

ESA Checks Using SSH-Session/CLI


Make sure that there are no sessions behind between ESA and its data sources (Concentrators, Decoders)

SSH to the ESA host and run the following commands.

Note: In the transcript below, the lines that begin with a prompt (ending in # or >) are user input; the remaining lines are system output.

[root@ESA]# /opt/rsa/esa/client/bin/esa-client --profiles carlos

carlos:offline||jmx:localhost:com.rsa.netwitness.esa:/> carlos-connect

RemoteJmsDirectEndpoint { jms://localhost:50030?carlos.useSSL=true } ; running = true

carlos:localhost||jmx:localhost:com.rsa.netwitness.esa:/> cd nextgen/Workflow/Source/nextgenAggregationSource

carlos:localhost||jmx:localhost:com.rsa.netwitness.esa:/Workflow/Source/nextgenAggregationSource> get .

"name" : "10.xx.xx.xx:56005",
"note" : "",
"sessionId" : 24462390949,
"sessionsBehind" : 58501036,
"state" : "IDLE_QUEUED",
"status" : "Streaming",
"time" : 1459508373000
}, {

In this example output, sessionsBehind is 58501036, which means ESA is far behind this data source; a healthy source shows a sessionsBehind value at or near 0.

 

Log Collector Health Checks

In this section, we describe regular health checks to perform for Log Collector. You perform these tasks using both the Security Analytics user interface and SSH-Session/CLI.

Log Collector checks using the Security Analytics UI


Ensure all subcollections are started

  1. Go to Administration > Services, and select a Log Collector service.

  2. From the Actions menu, select View > System and check the collection status to ensure that the relevant collections have been started.

 

Check the Start collection on service startup status

  1. Go to Administration > Services, and select a Log Collector service.

  2. From the Actions menu, select View > Config > Collector Configuration.

 

Ensure Remote Log Collectors (VLCs) are configured

If VLCs are available and are configured in the Pull model, ensure that the VLCs are included in the Log Collector configuration.

  1. Go to Administration > Services, and select a Log Collector service.

  2. In the Actions menu, select View > Config > Remote Collectors.

 

Ensure the Decoder is defined for the Log Collector in EventDestinations

  1. Go to Administration > Services, and select a Log Collector service.

  2. In the Actions menu, select View > Config > Event Destinations and make sure the status is "started".

 

Ensure ports are set to default 50001 and 56001 for SSL

  1. Go to Administration > Services, and select a Log Collector service.

  2. In the Actions menu, select View > Config. Select the General tab and ensure that:
    Port is set to 50001
    SSL Port is set to 56001

 

Log Collector Checks Using SSH-Session/CLI


Ensure rabbitmq-server is started

From the Log Collector SSH session, run the command:
service rabbitmq-server status

 

Ensure nwlogcollector service is up and running

From the Log Collector SSH session, run the command:
status nwlogcollector
 

Make sure all queues have at least one consumer

Run the command:
rabbitmqctl list_queues -p logcollection messages_ready name consumers

 

Ensure that there are no stuck rdq files

Navigate to the following directory and ensure that rdq files are not accumulating in msg_store_persistent:
/var/netwitness/logcollector/rabbitmq/mnesia/sa@localhost/msg_store_persistent
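A quick way to count the queued rdq files (a sketch using the path above):

    ls /var/netwitness/logcollector/rabbitmq/mnesia/sa@localhost/msg_store_persistent/*.rdq 2>/dev/null | wc -l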

 

If host is VLC, verify that the logCollectionType is set to RC

Run the command:
cat /etc/netwitness/ng/logcollection/{logCollectionType}

Note: If you are deploying new plugin collection content on a VLC, you must deploy it on the local Log Collector as well.

 

Log Decoder Health Checks

In this section, we describe regular health checks to perform for Log Decoder. You perform these tasks using both the Security Analytics user interface and SSH-Session/CLI.

Log Decoder checks using the Security Analytics UI and Explore/REST


Ensure that capture has been started.

  1. Go to Administration > Services, and select a Log Decoder service.

  2. From the Actions menu, select View > System and check the capture status.

 

Ensure that the capture rate is within the EPS range

  1. Go to Administration > Services, and select a Log Decoder service.

  2. From the Actions menu, select View > Config.

  3. From the Config dropdown menu, select Stats.

 

Ensure that parsers are enabled

  1. Go to Administration > Services, and select a Log Decoder service.

  2. From the Actions menu, select View > Config and on the General tab, check the Parsers Configuration section.

 

Ensure that the ports are set to default 50002 and 56002 for SSL

  1. Go to Administration > Services, and select a Log Decoder service.

  2. From the Actions menu, select View > Config and on the General tab in the System Configuration section, ensure that:
    Port is set to 50002
    SSL Port is set to 56002

 

Ensure that the correct capture interface is selected

  1. Go to Administration > Services, and select a Log Decoder service.

  2. From the Actions menu, select View > Config.

  3. On the General tab, check the Decoder Configuration section.

 

Ensure that databases are correctly configured

  1. Go to Administration > Services, and select a Log Decoder service. From the Actions menu, select View > Explore.

  2. In the left pane, right-click on database and select Properties, and from the drop-down menu, select reconfig.

  3. In Parameters, type update=0, and then click Send.

  4. Compare the response output with the current configuration.

 

Ensure Decoders are Synchronized with Enabled Parsers

To ensure accurate analysis and that forensic data is available to analysts, Decoders must be in sync with enabled parsers. Because the list of enabled parsers can grow in a typical installation, the easiest way to validate the list of parsers that are enabled on Decoders is to check which parsers are currently disabled by using the parsers.disabled attribute in the Explore/REST interface. Perform the steps in Validate the List of Enabled Parsers on Decoders.
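As an alternative to clicking through each Decoder, you can read the same attribute over the service's REST interface, for example with curl. This is a sketch only: the default Log Decoder REST port (50102), the admin account, and the query parameters are assumptions to verify for your deployment:

    curl -s -u admin "http://<log_decoder_ip>:50102/decoder/parsers/config/parsers.disabled?msg=get&force-content-type=text/plain"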

 

Log Decoder Checks Using SSH-Session/CLI


Ensure that Log Decoder is listening on port 514

Run the command: netstat -anp | grep 514

Archiver Health Checks

In this section, we describe regular health checks to perform for Archiver. You perform these tasks using both the Security Analytics user interface and SSH-Session/CLI.

Archiver checks using the Security Analytics UI


Ensure that aggregation has started

  1. Go to Administration > Services, and select an Archiver service.

  2. From the Actions menu, select View > System and check the aggregation status.

 

Ensure that Log Decoder aggregation is correct and status is consuming

  1. Go to Administration > Services, and select an Archiver service.

  2. From the Actions menu, select View > Config and check the General tab.

 

Ensure that aggregation is automatically started

  1. Go to Administration > Services, and select an Archiver service.

  2. From the Actions menu, select View > Config.

  3. On the General tab, under Aggregation Configuration > Aggregation Settings, ensure that Aggregate Autostart is selected.

 

Ensure that all metas are added to Meta Include

  1. Go to Administration > Services, and select an Archiver service.

  2. From the Actions menu, select View > Config and on the General tab, check the Meta Include column.

 

Ensure that ports are set to default 50008 and SSL 56008

  1. Go to Administration > Services, and select an Archiver service.

  2. In the Actions menu, select View > Config. Ensure that:
    Port is set to 50008
    SSL Port is set to 56008

 

Ensure that all database directories are set to 0 B

  1. Go to Administration > Services, and select an Archiver service. From the Actions menu, select View > Explore.

  2. In the left pane, expand archiver > collections > default > database > config.

  3. In the right pane, check meta.dir, packet.dir, and session.dir and ensure that they are set to 0 B.

 

Ensure that 95% of usable storage is set

  1. Go to Administration > Services, and select an Archiver service. In the Actions menu, select View > Config.

  2. Select the Data Retention tab, and under Collections, check the values for Hot Storage, Warm Storage, and Cold Storage.

 

Packet Decoder Health Checks

In this section, we describe regular health checks to perform for Packet Decoders. You perform these tasks using both the Security Analytics user interface and SSH-Session/CLI.

Packet Decoder checks using the Security Analytics UI


Ensure that capture has been started.

  1. Go to Administration > Services, and select a Packet Decoder service.

  2. From the Actions menu, select View > System and check the capture status.

 

Ensure that the correct capture interface is selected

  1. Go to Administration > Services, and select a Packet Decoder service.

  2. From the Actions menu, select View > Config and on the General tab, check the Decoder Configuration section.

 

Ensure that parsers are enabled

  1. Go to Administration > Services, and select a Packet Decoder service.

  2. From the Actions menu, select View > Config and on the General tab, check the Parsers Configuration section.

 

Ensure that the capture rate is within the Mbps range

  1. Go to Administration > Services, and select a Packet Decoder service.

  2. From the Actions menu, select View > Config.

  3. From the Config dropdown menu, select Stats.

 

Ensure that ports are set to default 50004 and SSL 56004

  1. Go to Administration > Services, and select a Packet Decoder service.

  2. In the Actions menu, select View > Config.

  3. Ensure that:
    Port is set to 50004
    SSL Port is set to 56004

 

Ensure that there are no Flex parsers or duplicate parsers enabled

  1. Go to Administration > Services, and select a Packet Decoder service.

  2. From the Actions menu, select View > Config and on the General tab, check the Parsers Configuration section.

 

Packet Decoder Checks Using SSH-Session/CLI


Monitor current transmission rate, packet drops, and errors on all interfaces

Run the command: netstat -i 

Log Locations

If issues arise with any component of the Security Analytics platform, use the following locations to find component log files to assist with troubleshooting.

You can also export logs from the user interface, for example, from Decoders and Log Collectors. For information about viewing and exporting logs, see the "Search and Export Historical Logs" topic in the System Maintenance Guide (https://community.rsa.com/docs/DOC-84570).

SA Server UI

/var/lib/netwitness/uax/logs/sa.log

SA Server Jetty

/var/lib/netwitness/uax/logs/sa.log
/opt/rsa/jetty9/logs/<YYYY>_<MM>_<DD>.stderrorout.log

RabbitMQ

/var/log/rabbitmq/sa@localhost.log
/var/log/rabbitmq/startup.log
/var/log/rabbitmq/sa@localhost-sasl.log

Reporting Engine

/home/rsasoc/rsa/soc/reporting-engine/logs/reporting-engine.log

Upgrading

/var/log/yum.log

CollectD

/var/log/messages

Puppet

/var/log/messages

Puppet Master

/var/log/puppet/masterhttp.log

MCollective

/var/log/mcollective.log

ESA

/opt/rsa/esa/logs/esa.log

General

/var/log/messages

Number of DACs per Host

Series 4 Log Decoder

Up to 5, or one UltraDAC.

Series 5 Log Decoder

Up to 8, or one UltraDAC.

Supported Browsers

When using the Security Analytics user interface, RSA recommends that you use the following browsers:

  • Google Chrome
  • Firefox

Task Details

This section contains detailed procedures for some of the tasks in the checklist.

Check Services

To check the health of services using Health and Wellness:

  1. Log into the Security Analytics user interface.
  2. Select Administration > Health & Wellness and then click the Policies tab.
  3. Ensure that the Broker, Concentrator, and Decoder services are displayed as green (enabled) and that the services indicate All with the correct number of devices for your environment.

To learn more about health and wellness, read the "Health and Wellness" topic
in the System Maintenance Guide in RSA Link (https://community.rsa.com/).

System Log Maintenance

To address issues if the root partition runs over 80%:

  1. Check disk volume partition space and ensure that the root partition is not over 80%. Run the following command:
    df
  2. Check the size of the logs and ensure that they are being rolled over. Most services use logrotate to manage their logs; the logrotate configurations are in the /etc/logrotate.conf file and the /etc/logrotate.d directory. The following logs should be monitored:
    /var/log/tokumx/
    /var/log/puppet/
    /var/log/logstash/
    /var/log/audit/
    /var/log/rabbitmq/
    /var/lib/netwitness/uax/logs
    /var/lib/netwitness/rsamalware/jetty/logs
    /opt/rsa/im/
    /opt/rsa/jetty9/logs/
    /home/rsasoc/rsa/soc/reporting-engine/logs
    /opt/rsa/sms/
    /opt/rsa/sms/logs
    /var/lib/netwitness/rsamalware/spectrum/logs

  3. Pay special attention to the /var/lib/netwitness/uax/scheduler/ directory. This is where Security Analytics stores all PCAPs generated by analysts using the Investigation module. Ensure that this directory does not fill up all the available space in the partition.
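To see which of these locations is consuming the most space, you can run a du sweep such as the following (a sketch; trim the list to the directories that exist on the host):

    du -sh /var/log/tokumx /var/log/puppet /var/log/audit /var/log/rabbitmq \
        /var/lib/netwitness/uax/logs /opt/rsa/jetty9/logs \
        /home/rsasoc/rsa/soc/reporting-engine/logs 2>/dev/null | sort -rh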

Monitor Reporting Engine

To resolve Reporting Engine issues, run a df command. If the command shows that the partition is getting full, the most common directories that cause this are:

  • /home/rsasoc/rsa/soc/reporting-engine/formattedReports
  • /home/rsasoc/rsa/soc/reporting-engine/resultstore

Recovery steps: Open a ticket with Customer Support, as this can indicate a unique situation that should be evaluated by Support.

Malware Analysis Colo Service Failure

To resolve a Malware Analysis Colo service failure:

  1. Run stop rsaMalwareDevice

  2. Move the contents of /var/lib/netwitness/rsamalware/spectrum/db/ to a backup location.
  3. Run start rsaMalwareDevice

RabbitMQ Service Report

To run the RabbitMQ service report and recover if RabbitMQ is down:

  1. SSH to the Security Analytics server.
  2. Run rabbitmqctl status.

Recovery Steps: If RabbitMQ is down, follow these steps:

  1. Collect the logs under /var/log/rabbitmq/
  2. Run the following commands:
    service puppet stop
    service rsa-sms stop
    service rabbitmq-server stop
    service rabbitmq-server start
    service rsa-sms start
    service puppet start

Packet Retention Data Management Script

To run the rest_packet_retention.py script that provides packet retention data:

  1. Copy the rest_packet_retention.py script to a server that has access to the other Security Analytics hosts.
  2. Make the script executable by running the following command (this is a one-time task):
    chmod +x rest_packet_retention.py
  3. Create a host .csv file named decoders.csv that contains a list of all Decoders, one per line, with IP addresses or hostnames (this is a one-time task).
  4. Run the following command:
    ./rest_packet_retention.py
  5. Enter the user name (the local admin account, not root) for the host and press ENTER.
  6. Enter the password for the host and press ENTER.
  7. The following example shows output from the rest_packet_retention.py script:
    Username: admin
    Password for admin:
    Host: 172.16.0.0 Packet Oldest Time: 2017-05-10 15:06:49 Days (Retention): 10 days, 10:28:47

Meta Retention Data Management Script

To run the rest_meta_retention.py script that provides meta retention data:

  1. Copy the rest_meta_retention.py script to a server that has access to the other Security Analytics hosts.
  2. Make the script executable by running the following command (this is a one-time task):
    chmod +x rest_meta_retention.py
  3. Create a host .csv file named concentrators.csv that contains a list of all Concentrators, one per line, with IP addresses or hostnames (this is a one-time task).
  4. Run the following command:
    ./rest_meta_retention.py
  5. Enter the user name (the local admin account, not root) for the host and press ENTER.
  6. Enter the password for the host and press ENTER.
  7. The following example shows output from the rest_meta_retention.py script:
    Username: admin
    Password for admin:
    Host: 172.16.0.0 Meta Retention: 10 days, 20:27:46

Push Correct Versions of Custom Index Files to Hosts

To verify which host has the correct version of custom index files, and then push it to the other hosts, follow the steps in this example, which uses a Concentrator with an inconsistent file:

  1. Log into the Security Analytics user interface.
  2. From the main menu, select Administration > Services.
  3. Select a Concentrator that has the correct version of the custom index file and click View > Config.
  4. Select the Files tab.

  5. Select the index-concentrator-custom.xml from the drop-down list and click Push to push this file to a Concentrator with a known bad configuration.

Verify Custom Feeds

To verify that custom feeds are deployed correctly to hosts:

  1. Log into the Security Analytics user interface.
  2. From the main menu, select Live > Feeds.
  3. Check the Status column for any failed feeds and remediate them.

Validate the List of Enabled Parsers on Decoders

To validate enabled parsers for each Decoder and then compare the list of disabled parsers to ensure uniformity across all Decoders:

  1. Log into the Security Analytics user interface.
  2. Select Administration > Services.
  3. Select a Decoder and click View > Explore.
  4. In the left pane, navigate to decoder > parsers > config.
    The parsers.disabled attribute is displayed on the right and lists the parsers that are disabled.

  5. Compare the parsers.disabled attribute to other Decoders in your environment. Any discrepancies can be cleared up by copying and pasting settings from a known good Decoder configuration to one with errors. You can also manually enable or disable parsers using View > Config.

 
