Alerting: Troubleshoot ESA

Document created by RSA Information Design and Development on Sep 12, 2017
Last modified by RSA Information Design and Development on Jul 8, 2019
Version 10
 

This section describes common issues that may occur while using ESA and suggests solutions to these problems.

Troubleshoot ESA Correlation Services

                    
Problem | Possible Causes | Solutions

On the NetWitness Platform Dashboard, the ESA service appears in red to indicate it is offline.

In the CONFIGURE > ESA Rules view, the following message appears: "The Service is either offline or not reachable."

Several

When an ESA Correlation service is offline, there are many possible causes. However, a common issue is that you have created a rule that uses excessive memory and causes the ESA service to fail. To troubleshoot this problem, see Steps to Troubleshoot Memory Issues with an ESA Service Offline.

Other common causes might be that your firewall is blocking the connection between the ESA and NetWitness Platform, or the ESA Correlation service machine may be down.

To bring up ESA Services:

Go to ADMIN > Services, select your ESA service, and then select Actions icon > Start.

If your ESA service is stopping and restarting in a loop, you may need to call Customer Support to get the services to start.

After a recent upgrade, the ESA service appears in red on the NetWitness Platform Dashboard to indicate it is offline.

In the CONFIGURE > ESA Rules view, the following message appears: "The Service is either offline or not reachable."

Configuration issues

If your system has been recently upgraded, you may have made a configuration error. Go to ADMIN > Services, select your ESA service, and then select Actions icon > Edit. In the Edit Service dialog, click Test Connection. If the connection fails, you likely have a configuration error. Fix the configuration error and try again.

Troubleshoot RSA Live Rules for ESA

                    
Problem | Possible Causes | Solutions

I imported a group of rules from RSA Live, and now my ESA service is crashing. Why?

You may not have configured the parameters for the RSA Live rule to tune it for your environment.

Each rule in RSA Live has a description that includes the parameters you must configure and prerequisites for your environment. Review this description to see if the rule is appropriate for your environment.

To ensure that you deploy rules safely in your environment, configure new rules as trial rules to test them in your environment. Trial rules add a safeguard for testing new rules. For details on this, see Deploy Rules as Trial Rules.

I imported a group of rules from RSA Live, and while the rules deployed without errors, they were later disabled.

Not all RSA Live rules are meant for every environment. You may not have the correct meta in your ESA for the rule to run.

You can verify that a rule was disabled by going to CONFIGURE > ESA Rules > Services > Deployed Rule Stats. If the rule is disabled, the green icon does not display next to the rule. 

If a rule deployed correctly but was disabled, check the logs for exceptions related to the rule. Specifically, check whether the rule was disabled due to missing meta. To do this, connect to the ESA host over SSH and open the ESA Correlation log: /var/log/netwitness/correlation-server/correlation-server.log.

Then, search for a message similar to the following:

"Property named '<meta_name>' is not valid in any stream"

For example, you might see:

Failed to validate filter expression '(medium=1 and streams=2 or medium=3...(238 chars)': Property named 'tcp_flags_seen' is not valid in any stream

If a similar message displays, you may need to add a custom meta key to the Log Decoder or Concentrator. To do this, follow these instructions: "Create Custom Meta Keys Using Custom Feed" in the Decoder and Log Decoder Configuration Guide.
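The log search described above can be sketched in Python. This is only an illustration: the regular expression is built from the message text quoted in this section, and the function name `missing_meta_keys` is a hypothetical helper, not part of NetWitness.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the message quoted above, for example:
#   Property named 'tcp_flags_seen' is not valid in any stream
INVALID_PROPERTY = re.compile(r"Property named '([^']+)' is not valid in any stream")


def missing_meta_keys(lines: Iterable[str]) -> Counter:
    """Count how many times each meta key is reported as invalid."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(INVALID_PROPERTY.findall(line))
    return counts


# Sample line taken from the example in this section.
sample = [
    "Failed to validate filter expression '(medium=1 and streams=2 or "
    "medium=3...(238 chars)': Property named 'tcp_flags_seen' is not valid in any stream",
]
for key, hits in missing_meta_keys(sample).most_common():
    print(f"{key}: {hits}")
```

To run the same check against the real log, pass an open file object for /var/log/netwitness/correlation-server/correlation-server.log to `missing_meta_keys` instead of the sample list.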

Troubleshoot ESA Rules

                                                
Problem | Possible Causes | Solutions

I have an ESA rule that is not getting deployed and is not creating alerts.

A meta key that the rule uses is a string array type, but it shows as a string type on ESA.

Check to see if any string array meta keys that the rule uses are configured as string array types on ESA. Go to CONFIGURE > ESA Rules > Settings tab (Meta Key References).

  • If it shows string[], it is configured as a string array type on ESA. This is fine.
  • If it shows string without the brackets, it is configured as a string type and you need to fix it on ESA.

In the ESA Correlation service Explore view, go to correlation/stream. Add string array meta keys to the multi-valued list to allow them to be used as an array in ESA rules. Go back to the Meta Key References and click the Meta Re-Sync (Refresh) icon. Verify that the meta keys with a string array type show a value of string[]. For additional details, see "Configure Meta Keys as Arrays in ESA Correlation Rules" in the ESA Configuration Guide.
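The comparison above can be expressed as a small check. The sketch below assumes you have copied the types shown in Meta Key References into a dictionary by hand; the function name and the sample meta keys are hypothetical and serve only to illustrate the string versus string[] distinction.

```python
from typing import Dict, Iterable, List


def keys_needing_array_type(rule_array_keys: Iterable[str],
                            configured_types: Dict[str, str]) -> List[str]:
    """Return keys the rule uses as arrays that ESA lists as plain string."""
    return sorted(
        key for key in rule_array_keys
        if configured_types.get(key) == "string"
    )


# Hypothetical values copied from the Meta Key References list.
configured = {"alias_host": "string", "action": "string[]"}
print(keys_needing_array_type({"alias_host", "action"}, configured))
```

Any key returned by this check is a candidate for the multi-valued list in correlation/stream.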

I created a rule, and I checked the syntax. The rule looked fine. When I went to deploy the rule, I got an error. Why?

You may not have the correct meta to deploy the rule.

Check the Meta Key References. You may not have the correct meta to deploy the rule. Check the ESA Correlation service log files to see which meta keys are missing: /var/log/netwitness/correlation-server/correlation-server.log.

In NetWitness Platform version 11.3.1 and later, you can check the ESA rule status in the ESA rule deployment. Go to CONFIGURE > ESA Rules > Rules tab, select a deployment in the options panel on the left, and look in the ESA Rules section. If a disabled rule has an error message, an error icon appears in the Status field. Hover over the rule to view the error message tooltip.

[Figure: Disabled rule tooltip showing an error message]

In the above example, the error message shows: "Failed to resolve event type: Event type or class named "Host_Whitelist" was not found." In this case, a Context Hub list called "Host_Whitelist" that is used by the rule is not available. For more information on context hub lists, see the Context Hub Configuration Guide.

For more information, see the ESA Rules section of the Deployment Panel reference.

I created a rule with an enrichment, added an SMTP notification, and deployed my rule. We are not receiving SMTP notifications. Why?

You do not have a template that meets the criteria to parse the events.

Check the ESA Correlation service log files to see if the SMTP notification failed: /var/log/netwitness/correlation-server/correlation-server.log. For more details on the notification error, check the Integration-Server log file on the NetWitness Server (also known as Node 0, Admin server, or NWServer): /var/log/netwitness/integration-server/integration-server.log.

If you use an ESA rule that has an enrichment, such as a Context Hub list, you must create a custom template. You can duplicate a default template and adjust it for your enrichment. See SMTP Notification Error Example below for a notification error example.

For information on creating a custom ESA template, see "Define a Template for ESA Alert Notifications" in the System Configuration Guide.

Go to the Master Table of Contents to find all NetWitness Platform Logs & Network 11.x documents.

I created a custom rule (via the Rule Builder or Advanced EPL), and my rule is not firing. Why?

You may have connectivity issues.

Check the Offered Rate statistic on the CONFIGURE > ESA Rules > Services tab. Select the ESA service and then look at the statistics on the tab for the Deployment.

If the Offered Rate is zero, then the ESA service is not receiving data from Concentrators. Check the ESA Correlation log files for connectivity issues: /var/log/netwitness/correlation-server/correlation-server.log.

If the Offered Rate is not zero, the meta key name and type used in the rule likely do not match the meta key present in events. To check whether the meta key name and type used in the rule are valid, search for the meta key name in the CONFIGURE > ESA Rules > Settings tab (Meta Key References).

I created a custom rule (via the Rule Builder or Advanced EPL), and my rule is not firing. Why?

There may be a problem with the rule.

If a specific rule is not firing, go to CONFIGURE > ESA Rules > Services to see if the rule was disabled. In the Deployed Rule Stats section, a rule that is disabled displays a clear enabled button (instead of the green enabled button).

You can also check the Events Matched column. Go to CONFIGURE > ESA Rules > Services, where the Events Matched column shows the number of events that each rule matched.

If no events matched, check the logic of your rule for errors. For example, check the syntax for uppercase and lowercase errors, and check the time window. If the rule still doesn't fire, consider simplifying the logic of the rule to see if it fires when there is less complexity. 
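The checks for a rule that is not firing can be read as a short decision sequence. The following sketch is one possible ordering of those checks, not a NetWitness API: the function and its three inputs are hypothetical, and you would read the values from the Services tab yourself.

```python
CHECK_DISABLED = "Rule is disabled: check the ESA Correlation log for errors."
CHECK_CONNECTIVITY = "Offered Rate is zero: check connectivity to the Concentrators."
CHECK_RULE = "No events matched: check meta key names and types, then the rule logic."
RULE_MATCHING = "Events are matching: the rule is receiving and matching data."


def triage_rule_not_firing(rule_enabled: bool, offered_rate: float,
                           events_matched: int) -> str:
    """Suggest the next check based on values read from the Services tab."""
    if not rule_enabled:
        return CHECK_DISABLED
    if offered_rate == 0:
        # The ESA service is not receiving data from Concentrators.
        return CHECK_CONNECTIVITY
    if events_matched == 0:
        # Data is arriving, so the rule itself is the likely problem.
        return CHECK_RULE
    return RULE_MATCHING


print(triage_rule_not_firing(rule_enabled=True, offered_rate=0, events_matched=0))
```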

After a recent upgrade, I am not seeing alerts and I am seeing disabled rules.

There may be a problem with the ESA rule deployment.

Deploy the ESA rule deployments again. ESA Rule Deployment Steps provides more information on deploying rules using the ESA Correlation service.

If this does not resolve the issue, check the ESA Correlation log files for more information: /var/log/netwitness/correlation-server/correlation-server.log.

After an update or upgrade to 11.3.1 or later, if I try to make an adjustment to some rules, I get an error when trying to save them.

The Ignore Case option may be selected for a meta key that does not contain alphabetic values, such as an IP address.

In NetWitness Platform 11.3.1 and later, the Ignore Case option has been removed from the ESA Rule Builder - Build a Statement dialog for meta keys that do not contain text values. Adding Ignore Case on meta keys which do not contain alphabetic values causes additional processing to occur for no added benefit.

In the ESA Rule Builder - Build a Statement dialog, check to see if you have any meta keys that do not contain alphabetic characters, for example, ip_src and ip_dst. If you do, clear the Ignore Case checkbox for those meta keys and try to save the rule again.
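As an illustration of this check, the sketch below flags statements that combine Ignore Case with a non-text meta key. The dictionary layout and the function name are assumptions made for the example; the Rule Builder does not expose statements in this form. Only ip_src and ip_dst come from the text above, so extend the set for your environment.

```python
from typing import Dict, Iterable, List

# Meta keys without alphabetic values (examples named in this section).
NON_TEXT_META_KEYS = {"ip_src", "ip_dst"}


def statements_to_fix(statements: Iterable[Dict]) -> List[str]:
    """Return meta keys where Ignore Case is set but has no effect."""
    return [
        s["meta_key"] for s in statements
        if s["ignore_case"] and s["meta_key"] in NON_TEXT_META_KEYS
    ]


# Hypothetical statements transcribed from a rule.
rule_statements = [
    {"meta_key": "ip_src", "ignore_case": True},
    {"meta_key": "user_dst", "ignore_case": True},
    {"meta_key": "ip_dst", "ignore_case": False},
]
print(statements_to_fix(rule_statements))
```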

SMTP Notification Error Example

The following excerpt from a correlation-server.log file shows the error that occurs when a notification is sent with an unsupported template. In this example, the rule is configured with the GeoIP enrichment, which has a hash table as one of its fields (the GeoIPLookup meta). The default SMTP template only handles metas that are either singular values or arrays that contain only singular values, such as "ip.src":"1.1.1.1" and "action":["fw:inbound-network-traffic"]. Because the array in this example contains a hash table, sending the email notification fails.

FTL stack trace ("~" means nesting-related):

- Failed at: ${value!""} [in template "smtp.ftl" in macro "value_of" at line 1, column 152]

- Reached through: @value_of metadata[key] [in template "smtp.ftl" at line 85, column 141]

----

...

For "${...}" content: Expected a string or something automatically convertible to string (number, date or boolean), or "template output" , but this has evaluated to an extended_hash (LinkedHashMap wrapped into f.t.DefaultMapAdapter):

==> value!"" [in template "smtp.ftl" at line 1, column 154]
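To make the constraint concrete, the sketch below checks alert metadata for values that the default template, as described above, cannot render: anything other than a singular value or an array of singular values. The function name and the contents of the GeoIPLookup hash table are hypothetical; only the two supported examples are taken from this section.

```python
from typing import Any, List, Mapping

# Singular values the default template can print directly. None is included
# because ${value!""} in the template falls back to an empty string.
SCALARS = (str, int, float, bool, type(None))


def unsupported_keys(metadata: Mapping[str, Any]) -> List[str]:
    """Return keys whose values are not scalars or arrays of scalars."""
    bad = []
    for key, value in metadata.items():
        items = value if isinstance(value, list) else [value]
        if any(not isinstance(item, SCALARS) for item in items):
            bad.append(key)
    return bad


event = {
    "ip.src": "1.1.1.1",
    "action": ["fw:inbound-network-traffic"],
    "GeoIPLookup": [{"country": "example"}],  # hypothetical hash table content
}
print(unsupported_keys(event))
```

Any key reported here would need explicit handling in a custom template.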

Steps to Troubleshoot Memory Issues with an ESA Service Offline

Step 1: Verify that your Host Is Running

The first step to troubleshooting is to ensure that your host is running. To do this, go to ADMIN > Hosts. If the host is down, the system parameters will not display (updating host information can sometimes be delayed), the Services display in red, and you may see an error message. 

[Figure: Hosts view showing services in red]

If your host is down, contact your NetWitness Platform Administrator to restart it. Otherwise, go to Step 2. 

Step 2: View Detailed Statistics in Health & Wellness

If your ESA service is down, you can go to Health & Wellness and view the last known metrics to see where potential issues are occurring. The most common problem is that your ESA service is exceeding memory thresholds, which causes it to stop or fail.

  1. Go to ADMIN > Health & Wellness > Alarms to see if the ESA triggered any alarms. Look for the following alarms for ESA Correlation:

    • Correlation Server in Critical State
    • Correlation Server in Unhealthy State
    • Correlation Server Stopped

    [Figure: Health & Wellness alarm showing Correlation Server stopped]

  2. Go to ADMIN > Health & Wellness > System Stats Browser to see the memory metrics for each rule's performance. To view the metrics, enter the following and click Apply:

                      
    Host        | Component          | Category
    <your host> | Correlation Server | Correlation Engine Metrics

    [Figure: ESA Correlation Health & Wellness System Stats Browser metrics for a selected rule]

    The name of the rule is in the Statistic column and the memory usage in bytes is in the Value column.

  3. Click the Historical graph icon in the Historical Graph column to view memory usage over time for the rule.

    [Figure: ESA Correlation Historical Graph showing ESA rule memory usage]

  4. In the System Stats Browser, you can also see details of your ESA Correlation service performance.

    [Figure: Health and Wellness memory metrics]

    Select your host, and use the following filters to view the following statistics:

                                                                                           
    Host        | Component          | Category    | Statistic          | Example
    <your host> | Host               | SystemInfo  | CPU Utilization    | 1.14%
    <your host> | Host               | SystemInfo  | Memory Utilization | 30.64%
    <your host> | Host               | SystemInfo  | Used Memory        | 15.05 GB
    <your host> | Host               | SystemInfo  | Total Memory       | 49.14 GB
    <your host> | Host               | SystemInfo  | Uptime             | 259493, 3 days 16 minutes 53 seconds
    <your host> | Correlation Server | Process jvm | Memory Total Max   | 64.00 GB
    <your host> | Correlation Server | Process jvm | Memory Total Used  | 593.70 MB
    <your host> | Correlation Server | ProcessInfo | CPU Utilization    | 0.4%
    <your host> | Correlation Server | ProcessInfo | Maximum Memory     | 62.92 GB
    <your host> | Correlation Server | ProcessInfo | Memory Utilization | 1.48 GB
     

    The following figure shows the location of the ESA Correlation service CPU and Memory Utilization statistics.

    [Figure: H&W System Stats Browser tab showing system info for the ESA host]

  5. Click the Historical graph icon to view historical CPU and memory utilization.

    The following figure shows the historical graph of CPU utilization.

    [Figure: Historical graph of the ESA Correlation service CPU utilization]

    The following figure shows the historical graph of Memory Utilization.

    [Figure: Historical graph of the ESA Correlation service memory utilization]

If you are having a problem with memory or CPU utilization, continue to step 3. 
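When many rules are deployed, it can help to rank the per-rule memory values from the System Stats Browser (rule name in the Statistic column, bytes in the Value column). The sketch below assumes you have transcribed those values into a dictionary; the function name and the rule names are hypothetical.

```python
from typing import Dict, List, Tuple


def top_memory_rules(rule_bytes: Dict[str, int],
                     limit: int = 5) -> List[Tuple[str, float]]:
    """Return up to `limit` (rule name, MiB) pairs, largest first."""
    ranked = sorted(rule_bytes.items(), key=lambda item: item[1], reverse=True)
    return [(name, round(size / 1024 ** 2, 2)) for name, size in ranked[:limit]]


# Hypothetical values in bytes, as shown in the Value column.
stats = {"Rule A": 52428800, "Rule B": 1048576, "Rule C": 734003200}
for name, mib in top_memory_rules(stats, limit=2):
    print(f"{name}: {mib} MiB")
```

The rules at the top of the list are the first candidates to review in the later steps.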

Step 3: Bring up your ESA Services

  1. Go to ADMIN > Services, select your ESA service, and then select Actions icon > Start.
  2. Return to the ESA Service to troubleshoot which rules have created memory issues. 

If your ESA service is stopping and restarting in a loop, you may need to call Customer Support to get the services to start.

If you are able to start your ESA service without a shutdown, continue to step 4.

Step 4: Check the Alerts and Events Volume

After you are able to restart your ESA service without an immediate shutdown, you can review the stats for your rules to see which rules are consuming too many resources. Sometimes, ESA services fail because a rule is generating too many alerts or a rule is matching too many events. Check for both of these issues if you have determined that memory usage is causing your ESA service to shut down. 

View Alert Summaries

Rules that generate a high volume of alerts can overwhelm the system and cause it to fail or restart. To view the alert summaries, go to RESPOND > Alerts. In the Filters panel on the left, in the ALERT NAMES section, select the alert name for the rule. The number of alerts with that name appears at the bottom of the Alerts list results. If the number is unusually high for a particular rule, you need to disable the rule and rewrite it to be more efficient.

[Figure: Respond Alerts view showing the number of alerts for a selected rule]

To clear your filter, click Reset Filters.

View Events Matched

Sometimes a rule matches too many events, which can use up excessive memory. This typically occurs if you create a large event window where a great number of events accumulate without triggering an alert. This is a problem because each event is stored in memory while the rule waits for the alert to trigger. To check for this issue, go to CONFIGURE > ESA Rules > Services. From there, you can see the number of events that were matched in the Events Matched column for the deployment. If a high number of events were matched for a given rule, you can investigate the rule further to see if you can make it more efficient.

[Figure: ESA Rules Services tab]
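The effect of the event window on memory can be illustrated with a simplified model. The sketch below is not how ESA stores events internally; it only assumes that every event inside a sliding time window is held until it ages out, and it reports the largest number of events held at once.

```python
from collections import deque
from typing import Iterable


def peak_events_held(event_times: Iterable[float], window_seconds: float) -> int:
    """Return the largest number of events held at once in a sliding window."""
    window: deque = deque()
    peak = 0
    for t in event_times:
        window.append(t)
        # Drop events that have aged out of the window.
        while window and window[0] <= t - window_seconds:
            window.popleft()
        peak = max(peak, len(window))
    return peak


# One event per second for an hour (timestamps in seconds).
one_hour = range(3600)
print(peak_events_held(one_hour, window_seconds=60))
print(peak_events_held(one_hour, window_seconds=3600))
```

In this model, at the same event rate, a one-hour window holds 60 times as many events as a one-minute window. This is why narrowing the window or filtering events earlier in the rule tends to reduce memory use.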

Step 5: Disable and Repair the Rule that Caused Issues

Once you have determined which rules need to be rewritten, disable them and rewrite them so that they don't generate such a high volume of alerts or events. For pointers on how to write more efficient rules, see Best Practices.

Disable Rules

  1. To disable rules, go to CONFIGURE > ESA Rules > Services, and select the rules you want to disable in the Deployed Rule Stats section.
  2. Select Disable to disable the rules. 

Edit Rules

  1. To repair the rules, go to CONFIGURE > ESA Rules > Rules tab > Rule Library.
  2. For each rule that you repair, do the following:
    1. Select the rule to edit and then select Actions icon > Edit.
    2. Edit the rule to be more efficient. For instructions on creating rules, see Add Rules to the Rule Library.
    3. When you are satisfied with your rule, you can save the rule as a trial rule to ensure that any memory issues do not affect ESA services performance. To do this, follow the steps listed in Work with Trial Rules.

Deploy Rules

  1. Go to CONFIGURE > ESA Rules > Rules tab.
  2. In the options panel on the left, select the deployment that contains the rule.
  3. In the Deployment view, the rule that you changed shows a status of Updated. Click Deploy Now.
    The rule status changes to Deployed.

Verify that the Rules are Enabled

After you deploy the ESA rules, they should automatically show as enabled. If not, you can enable the rules.

  1. Go to CONFIGURE > ESA Rules > Services tab, and select the ESA service in the options panel.
  2. On the deployment tab for the deployment that contains the rules, in the Deployed Rule Stats section, look at the status of the rules in the Enable column. Enabled rules show a green circle. If the rules show a white circle, you can enable the rules.
  3. To enable rules, select the rules you want to enable and select Enable above the table.

(Optional) Check the ESA Correlation Log Files for More Information

After you verify that your services are down and identify some potential causes, check whether the service is stopping and restarting in a loop. To do this, connect to the ESA host over SSH and review the ESA Correlation log: /var/log/netwitness/correlation-server/correlation-server.log.
