Alerting: Troubleshoot ESA

Document created by RSA Information Design and Development on Jun 26, 2017. Last modified by RSA Information Design and Development on Sep 14, 2017.
Version 2

This section describes common issues that may occur while using ESA, and it suggests common solutions to these problems.

Troubleshoot ESA Services

                              
Problem: On the Security Analytics Dashboard, the ESA service displays in red to indicate it is offline. On the Alerts > Configure page, the following message displays: "The Service is either offline or not reachable."

Possible causes: Several.

Solutions:

When an ESA service is offline, there are many possible causes. However, a common issue is that you have created a rule that uses excessive memory and causes the ESA service to fail. To troubleshoot this problem, see "Steps to Troubleshoot Memory Issues with an ESA Service Offline."

Other common causes might be that your firewall is blocking the connection between the ESA and Security Analytics, or the ESA service machine may be down.  

  

To bring up ESA Services:

From Administration > Services, select the actions icon (Actions_Icon.png) for your ESA service, and choose Start.

If your ESA service is stopping and restarting in a loop, you may need to call Customer Support to get the services to start.
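Before restarting the service, it can help to rule out the firewall and host-down causes mentioned above. The following is a minimal Python sketch, not an RSA tool: it attempts a plain TCP connection to the ESA host from the machine where you run it. The function name, host name, and port in the usage comment are placeholders introduced for illustration; substitute the values configured for your ESA service under Administration > Services.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Hypothetical usage: replace the host and port with your ESA service values.
# print(can_connect("esa.example.com", 12345))
```

A False result does not by itself distinguish a blocked port from a stopped host or service, so treat it only as a prompt to check the firewall rules and the ESA machine.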

Problem: After a recent upgrade, the ESA service displays in red on the Security Analytics Dashboard to indicate it is offline. On the Alerts > Configure page, the following message displays: "The Service is either offline or not reachable."

Possible causes: Configuration issues.

Solutions: If your system was recently upgraded, a configuration error may have been introduced. Under Administration > Services, select your ESA service, and click Edit Service. In Edit Service, click Test Connection. If the connection fails, you likely have a configuration error. Fix the configuration error, and test the connection again.

Problem: The ESA appears to be running slowly.

Possible causes: Configuration issues.

Solutions: You may be able to improve performance by modifying the read buffer size (the default value is 1048576 bytes), or by enabling the TCP no-delay setting to prevent a delay in receiving TCP ACKs. You can modify these settings (readBufferSize and tcpNoDelay) by going to Explorer and navigating to /Workflow/Source/nextgenAggregation.
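For background on what these two settings control, the sketch below sets the comparable options on an ordinary TCP socket in Python. It illustrates general socket behavior only and assumes nothing about how ESA applies readBufferSize and tcpNoDelay internally; the 1048576 value is the default quoted above.

```python
import socket

READ_BUFFER_SIZE = 1048576  # default buffer size quoted in this section, in bytes

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Disable Nagle's algorithm so small writes are sent immediately instead of
# being held back while earlier data is still unacknowledged.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
# Request a receive buffer of the given size; the OS may adjust the value.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, READ_BUFFER_SIZE)

# Read the options back to see what the operating system actually applied.
no_delay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
recv_buffer = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
sock.close()
```

The value read back for the receive buffer can differ from the requested size because operating systems round or cap it, which is one reason to change such settings incrementally and observe the effect.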

Troubleshoot ESA Database Issues

                  
Problem: My ESA Dashboard doesn't load.

  • Or, there is an error getting data.
  • Or, it loads very slowly.

Possible causes: The database that stores alerts has grown too large.

Solutions:

You may need to configure Alert database settings so that the database clears old alerts on a timely basis. For information on configuring these settings, see "Configure ESA Storage" in the Event Stream Analysis (ESA) Configuration Guide.

Once the database has become too large, you need to clear the alerts. Contact Customer Support to do this. 

Troubleshoot RSA Live Rules for ESA

                    
Problem: I imported a group of rules from RSA Live, and now my ESA service is crashing. Why?

Possible causes: You may not have configured the parameters for the RSA Live rule to tune it for your environment.

Solutions:

Each rule in RSA Live has a description that includes the parameters you must configure and prerequisites for your environment. Review this description to see if the rule is appropriate for your environment.

To ensure that you deploy rules safely in your environment, configure new rules as trial rules to test them in your environment. Trial rules add a safeguard for testing new rules.  For details on this, see  Deploy Rules as Trial Rules.

Problem: I imported a group of rules from RSA Live, and while the rules deployed without errors, they were later disabled.

Possible causes: Not all RSA Live rules are meant for every environment. You may not have the correct meta in your ESA for the rule to run.

Solutions:

You can verify that a rule was disabled by going to Alerts > Services > Deployed Rule Stats. If the rule is disabled, the green icon does not display next to the rule.

If a rule deployed correctly but was disabled, check the logs for exceptions related to the rule. Specifically, check whether the rule was disabled due to missing meta. To do this, go to Administration > Services, select your ESA service, and then select the actions icon (ic-actns.png) > View > Logs.

Then, search for a message similar to the following:

"Property named '<meta_name>' is not valid in any stream"

For example, you might see:

Failed to validate filter expression '(medium=1 and streams=2 or medium=3...(238 chars)': Property named 'tcp_flags_seen' is not valid in any stream

If a similar message displays, you may need to add a custom meta key to the Log Decoder or Concentrator. To do this, follow the instructions in "Create Custom Meta Keys Using Custom Feed" in the Decoder and Log Decoder Configuration Guide.
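If the log is long, the search can be scripted. The following Python sketch extracts the meta key names from messages of the form quoted above. The function and pattern names are introduced here for illustration, and the pattern is based solely on the example message in this section, so it may need adjusting if your log lines are worded differently.

```python
import re

# Matches the log message quoted above; accepts straight or curly quotes.
MISSING_META = re.compile(
    r"Property named [‘'’]([^‘'’]+)[‘'’] is not valid in any stream"
)


def missing_meta_keys(log_lines):
    """Return the sorted, de-duplicated meta key names reported as invalid."""
    found = set()
    for line in log_lines:
        match = MISSING_META.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)
```

The function accepts any iterable of strings, so you can pass it an open log file or lines copied from the log view. Each name it returns is a candidate for a custom meta key.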

Troubleshoot Deployments

                  
Problem: ESA rules are disabled after an update.

Possible causes: The hunting pack meta keys (inv.category, inv.context, ioc, boc, eoc, analysis.session, analysis.service, analysis.file) used in the ESA rule were changed from the single string data type to the array data type.

Solutions: You must update the hunting pack meta keys and redeploy the ESA rules. For more information, see the KB article in RSA Link at https://community.rsa.com/docs/DOC-76158.

Troubleshoot Rules

                    
Problem: I created a custom rule (via the Rule Builder or Advanced EPL), and my rule is not firing. Why?

Possible causes: You may have connectivity issues.

Solutions:

Check the 'Offered Rate' statistic on the Alerts > Configure > Services tab.

If the offered rate is zero, then the ESA service is not receiving data from Concentrators. Validate the Concentrator connectivity. Go to Administration > Services, select your ESA service, and click View > Config. Ensure that the Concentrator is enabled. Select the Concentrator and click Test Connection.

If the offered rate is not zero, the meta key name and type used in the rule likely don't match the meta key present in events. Check whether the meta key name and type used in the rule are valid by searching for the meta key name on the Alerts > Configure > Settings tab (Meta key references search).

Possible causes: There may be a problem with the rule.

Solutions:

If a specific rule is not firing, go to Alerts > Configure > Services to see if the rule was disabled. In the Deployed Rule Stats section, a rule that is disabled displays a clear enabled button (instead of the green enabled button).

You can also check the Events Matched field. Go to Alerts > Configure > Services. From there, you can see the number of events that were matched in the Events Matched column.

If no events matched, check the logic of your rule for errors. For example, check the syntax for uppercase and lowercase errors, and check the time window. If the rule still doesn't fire, consider simplifying the logic of the rule to see if it fires when there is less complexity. 
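The checks for a rule that is not firing can be read as a short decision sequence. The Python sketch below encodes one possible ordering of those checks; the function name, the argument names, and the ordering itself are assumptions made for illustration, and the inputs are values you read manually from the Alerts > Configure > Services tab.

```python
def next_check(offered_rate, rule_enabled, events_matched):
    """Suggest the next troubleshooting step for a rule that is not firing."""
    if offered_rate == 0:
        # No data is arriving from Concentrators.
        return "Validate Concentrator connectivity"
    if not rule_enabled:
        # The rule deployed but was later disabled.
        return "Check the logs to see why the rule was disabled"
    if events_matched == 0:
        # Data is arriving, but nothing matches the rule.
        return "Verify meta key names and types, then review the rule logic"
    return "No issue identified by these checks"


# Example: data is flowing and the rule is enabled, but nothing has matched.
suggestion = next_check(offered_rate=1200, rule_enabled=True, events_matched=0)
```

Working from connectivity toward rule logic keeps the cheapest checks first, so you do not spend time simplifying a rule that is simply receiving no events.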

Steps to Troubleshoot Memory Issues with an ESA Service Offline

Step 1: Verify that your Host is Running

The first step in troubleshooting is to ensure that your host is running. To do this, go to Administration > Hosts. If the host is down, the system parameters do not display (updates to host information can sometimes be delayed), the Services field displays in red, and the Updates field displays an error message.

ESA_Host_Down.png

If your host is down, contact your SA Administrator to restart it. Otherwise, go to Step 2. 

Step 2: View Detailed Statistics in Health & Wellness

Once you are sure your ESA service is down, you can go to Health & Wellness to see where potential issues are occurring. The most common problem is that your ESA service is exceeding memory thresholds which causes it to stop or fail.

  1. Go to Health & Wellness > Alarms to see if the ESA triggered any alarms. Look for the following alarms:

    • ESA Overall Memory Utilization > 85%
    • ESA Overall Memory Utilization > 95%
    • ESA Service Stopped
  2. Go to Health & Wellness > System Stats Browser to see the memory metrics for each rule's performance. To view the metrics, enter the following:

    Host         Component              Category
    <your host>  Event Stream Analysis  esa-metrics

    esa_metrics.png

    The memory for each rule is displayed in the Value column, and the value is displayed in bytes. You can view a historical view of memory usage in the Historical Graph column. 

    ESA_hist_graph.png

  3. Go to Health & Wellness > System Stats Browser to see details of your ESA performance. Select your host, and use the following filters to view the corresponding statistics:

    Host         Component              Category     Statistic                 Example
    <your host>  Host                   SystemInfo   CPU Utilization           1.08%
    <your host>  Host                   SystemInfo   Memory Utilization        45.43%
    <your host>  Host                   SystemInfo   Used Memory               7.08 GB
    <your host>  Host                   SystemInfo   Total Memory              15.58 GB
    <your host>  Host                   SystemInfo   Uptime                    77758, 1 week, 2 day...
    <your host>  Event Stream Analysis  ProcessInfo  Memory Utilization        7.07 GB
    <your host>  Event Stream Analysis  ProcessInfo  CPU Utilization           0.2%
    <your host>  Event Stream Analysis  JVM.Memory   all                       Committed Heap Memory Usage 8.0 GB
    <your host>  Event Stream Analysis  ESA-Metrics  Total ESA Memory Usage %  4.64%

    System_Stats_Browser_1.png

If you are having a problem with memory or CPU utilization, continue to step 3. 
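To interpret the numbers from Step 2, you can compare memory utilization against the two alarm thresholds listed there. The Python sketch below does this using the example values from the statistics table. The severity labels and function names are hypothetical, and the byte conversion assumes binary units (1 GiB = 1024^3 bytes), which may not match how the interface rounds its GB figures.

```python
WARNING_PCT = 85.0   # "ESA Overall Memory Utilization > 85%" alarm
CRITICAL_PCT = 95.0  # "ESA Overall Memory Utilization > 95%" alarm


def utilization_pct(used, total):
    """Return used/total as a percentage (both in the same unit)."""
    return 100.0 * used / total


def alarm_level(pct):
    """Map a memory utilization percentage to a hypothetical severity label."""
    if pct > CRITICAL_PCT:
        return "critical"
    if pct > WARNING_PCT:
        return "warning"
    return "ok"


def bytes_to_gib(num_bytes):
    """Convert a per-rule memory value (reported in bytes) to GiB."""
    return num_bytes / 1024 ** 3


# Used Memory and Total Memory examples from the table, in GB.
host_pct = utilization_pct(7.08, 15.58)
host_level = alarm_level(host_pct)
```

With the example values, the host sits well below both thresholds, so a host in that state would not be expected to raise either memory alarm.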

Step 3: Bring up your ESA Services

  1. From Administration > Services, select the actions icon (Actions_Icon.png) for your ESA service, and choose Start.
  2. Return to the ESA Service to troubleshoot which rules have created memory issues. 

If your ESA service is stopping and restarting in a loop, you may need to call Customer Support to get the services to start.

If you are able to start your ESA service without a shutdown, continue to step 4.

Step 4: Check the Alerts and Events Volume

Once you are able to restart your ESA service without an immediate shutdown, you can review the stats for your rules to see which rules are consuming too many resources. Sometimes, ESA services fail because a rule is generating too many alerts or a rule is matching too many events. Check for both of these issues if you have determined that memory usage is causing your ESA service to shut down. 

View Alert Summaries

Rules that generate a high volume of alerts can overwhelm the system and cause it to fail or restart. To view the alert summaries, go to Dashboard > Alerts > Summary. On the lower half of the screen, you can see the number of alerts generated for each rule in the Count field. If the number is unusually high for a particular rule, disable the rule and rewrite it to be more efficient.

ESA_Alerts_High.png
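If you have many rules, one way to decide which counts are out of line is to compare each rule against the typical count. The Python sketch below flags rules whose alert count exceeds a multiple of the median. The function name, the factor of 10, and the sample counts are arbitrary assumptions for illustration, not RSA guidance; you enter the counts yourself from the Count field.

```python
from statistics import median


def noisy_rules(alert_counts, factor=10):
    """Return rules whose alert count exceeds factor times the median count."""
    if not alert_counts:
        return []
    threshold = factor * median(alert_counts.values())
    flagged = [name for name, count in alert_counts.items() if count > threshold]
    # List the worst offenders first.
    return sorted(flagged, key=lambda name: alert_counts[name], reverse=True)


# Hypothetical counts copied by hand from the Count field.
counts = {"Rule A": 10, "Rule B": 12, "Rule C": 5000}
flagged = noisy_rules(counts)
```

The median is used instead of the mean so that a single very noisy rule does not raise the threshold enough to hide itself.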

View Events Matched

Sometimes a rule matches too many events, which can use up excessive memory. This typically occurs if you create a large event window where a great number of events accumulate without triggering an alert. This is a problem because each event is stored in memory while the rule waits for the alert to trigger. To check for this issue, go to Dashboard > Alerts > Services. From there, you can see the number of events that were matched in the Events Matched column. If a high number of events matched for a given rule, investigate the rule further to see if you can make it more efficient.

ESA_High_Events.png
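A rough calculation shows why large windows are costly. The Python sketch below estimates how much memory a window could hold if every matched event were retained for the full window. The function names, the event rate, the window length, and the average event size are all assumed values for illustration; actual per-event memory in ESA depends on the rule and the meta it references.

```python
def retained_events(events_per_second, window_seconds):
    """Events held in memory if each matched event stays for the whole window."""
    return events_per_second * window_seconds


def estimated_window_bytes(events_per_second, window_seconds, avg_event_bytes):
    """Rough estimate of the memory held by one rule's event window."""
    return retained_events(events_per_second, window_seconds) * avg_event_bytes


# Assumed inputs: 500 matching events per second, a 1-hour window,
# and 1024 bytes per stored event.
events = retained_events(500, 3600)
estimate = estimated_window_bytes(500, 3600, 1024)
```

With these assumed inputs the window holds 1,800,000 events and about 1.8 billion bytes. Halving the window or tightening the match condition reduces the estimate proportionally.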

Step 5: Disable and Repair the Rule that Caused Issues

Once you have determined which rules need to be rewritten, disable them and rewrite them so that they don't generate such a high volume of alerts or events. For pointers on how to write more efficient rules, see Best Practices.

Disable Rules

  1. To disable rules, go to Alerts > Services, and select the rules you want to disable in the Deployed Rule Stats section.
  2. Select Disable to disable the rules. 

Edit Rules

  1. To repair the rules, go to Alerts > Rules > Rule Library. Select the rule to edit, and click the actions icon Actions_Icon.png.
  2. Select Edit.
  3. Edit the rule to be more efficient. For instructions on creating rules, see Add Rules to the Rule Library.
  4. Once you are satisfied with your rule, you can save the rule as a trial rule to ensure that any memory issues do not affect ESA services performance. To do this, follow the steps listed in Work with Trial Rules.

Enable Rules

  1. To enable rules, go to Alerts > Services, and select the rules you want to enable in the Deployed Rule Stats section.
  2. Select Enable to enable the rules. 

(Optional) Check the ESA Log Files for More Information

Once you have verified that your services are down and identified some potential causes, check whether the service is stopping and restarting in a loop. To do this, review the ESA logs. From the Administration > Services module, select your ESA service, click the actions icon (Actions_Icon.png), and select View > Logs.

If you cannot access the ESA logs from the Security Analytics interface, you can SSH into the system and go to: /opt/rsa/esa/logs/esa.log
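When reading esa.log directly, a restart loop usually shows up as the same startup message repeating. The Python sketch below counts how often a given marker string appears. This topic does not specify the exact startup message text, so the marker is a value you supply after inspecting your own log; the function names, the sample lines, and the threshold of three are hypothetical.

```python
def count_marker(log_lines, marker):
    """Count the log lines that contain the given marker string."""
    return sum(1 for line in log_lines if marker in line)


def looks_like_restart_loop(log_lines, marker, min_restarts=3):
    """Return True if the marker appears at least min_restarts times."""
    return count_marker(log_lines, marker) >= min_restarts


# Hypothetical log excerpt; "service starting" is a made-up marker.
sample_lines = [
    "10:00:01 service starting",
    "10:00:40 out of memory",
    "10:01:05 service starting",
    "10:01:44 out of memory",
    "10:02:10 service starting",
]
loop_suspected = looks_like_restart_loop(sample_lines, "service starting")
```

Both functions accept any iterable of strings, so you can pass an open file object for the log instead of a list.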
