Alerting: Best Practices

Document created by RSA Information Design and Development on Mar 23, 2017. Last modified by RSA Information Design and Development on Apr 26, 2017.
Version 4

Best practices provide guidelines to help you write and manage rules, deploy rules, and maintain system health for your ESA services.

Understand Event Stream Analysis Rule Types

The Security Analytics Event Stream Analysis (ESA) service provides advanced stream analytics, such as correlation and complex event processing, at high throughput and low latency. It is capable of processing large volumes of disparate event data from Concentrators. However, when working with Event Stream Analysis, you should be aware of the factors that affect resource usage in order to create effective rules.

Each event that ESA receives is evaluated to determine whether it triggers a rule. Three types of rules can be deployed to determine what the ESA engine should do with the incoming event. Each of these rule types has a different impact on system resource utilization. All three rule types can be created in the Rule Builder, written as Advanced EPL rules, or downloaded from RSA Live. The table below lists each rule type and the impact it may have on system resources.

Rule Type | Description

Simple Filter Rule

This rule has no correlation to other events. At ingestion time, each event is evaluated against the rule's conditions, and if those conditions are met, an alert is generated. If no conditions match, the engine quickly releases the event to free memory. These rules do not take up memory because events are not retained beyond the initial evaluation, so memory usage does not increase as more simple filter rules are deployed. However, if the filter condition is too generic, the rule can generate too many alerts, which strains the system resources used to store and retrieve those alerts.
For example, you might write a rule to generate an alert when HTTP network activity arrives over a non-standard HTTP port.
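
The behavior of a simple filter rule can be illustrated with a minimal Python sketch. This is not EPL and not how ESA is implemented; the event shape (a dict with `service` and `tcp_dstport` keys), the set of "standard" ports, and the function names are all assumptions made for illustration. The point is that each event is checked on its own and nothing is kept afterward.

```python
# Assumption: events are plain dicts; these key names are hypothetical.
STANDARD_HTTP_PORTS = {80, 8080}  # assumption: what counts as "standard"


def is_http_on_nonstandard_port(event):
    """Stateless check that looks only at the current event."""
    return (
        event.get("service") == "http"
        and event.get("tcp_dstport") not in STANDARD_HTTP_PORTS
    )


def run_simple_filter(events):
    """Yield one alert per matching event; no event is retained."""
    for event in events:
        if is_http_on_nonstandard_port(event):
            yield {"rule": "http_nonstandard_port", "event": event}


events = [
    {"service": "http", "tcp_dstport": 80},
    {"service": "http", "tcp_dstport": 8443},
    {"service": "ssh", "tcp_dstport": 22},
]
alerts = list(run_simple_filter(events))
```

Because the check is stateless, memory stays flat no matter how many such rules run. The cost shows up in alert volume instead: a looser condition would emit an alert for most of the stream.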

Event Window Rule

This rule evaluates a set of events over a time period for specific conditions. At ingestion time, the rule is evaluated against a set of conditions. If those conditions are met, the event is retained in memory for a specific amount of time. After the specified time passes, the events are removed from the time window if the number of events collected does not meet the threshold to trigger an alert. The memory consumption of such rules is highly dependent on the incoming event rate (traffic), the amount of data per event, and the length of the event window. Each matching event is retained in memory until the time window has passed, so the longer the time window, the greater the potential volume.
For example, you might write a rule that generates an alert if a user fails to log into any system five times within a ten-minute time frame.
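
The failed-login example can be sketched in Python to show why window rules hold memory. This is an illustration only, not ESA's engine: the class name, the per-user grouping, and the choice to clear the window after an alert are assumptions.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 600  # ten-minute window, from the example above
THRESHOLD = 5         # five failed logins


class FailedLoginWindow:
    """Hypothetical sliding time window of failed logins, kept per user."""

    def __init__(self, window=WINDOW_SECONDS, threshold=THRESHOLD):
        self.window = window
        self.threshold = threshold
        self.by_user = defaultdict(deque)  # user -> retained timestamps

    def on_event(self, user, timestamp, failed):
        """Return True if this event triggers an alert."""
        if not failed:
            return False  # does not match the filter, so it is never retained
        times = self.by_user[user]
        times.append(timestamp)
        # Expire events that have aged out of the time window.
        while times and timestamp - times[0] >= self.window:
            times.popleft()
        if len(times) >= self.threshold:
            times.clear()  # assumption: start over after alerting
            return True
        return False

    def retained_events(self):
        """Number of events currently held in memory."""
        return sum(len(times) for times in self.by_user.values())


window = FailedLoginWindow()
results = [window.on_event("alice", t, True) for t in (0, 60, 120, 180, 240)]
```

Every matching event sits in `by_user` until it ages out, so the number of retained events grows with the match rate and with `WINDOW_SECONDS`.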

Followed By Rule

This rule evaluates a chain of incoming events to determine if the sequence of events matches a particular condition. At ingestion time, the rule is evaluated against a set of conditions. If the conditions are met, one of two actions occurs:

  • If this is the first event of the sequence, a new event thread is started, and the event is retained as the head of the sequence.
  • If the event belongs to an existing event thread, it is added to that sequence.

In both cases, the event is retained in memory. The amount of resource usage is particularly sensitive to the customer environment for this type of rule. If the filter condition generates many event threads, resources are consumed for each new thread (in addition to the event). Additionally, if the end of the event thread is never reached (that is, an alert is never generated), the events in that thread remain in memory indefinitely.
For example, you might write a rule to generate an alert when a user fails to log into a server, then performs a successful login, and then creates a new account.
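
A minimal Python sketch of the login example shows how event threads accumulate. It is a simplification under stated assumptions, not ESA's matching logic: the action names are hypothetical, there is one thread per user, and events that do not match the next expected step are ignored.

```python
# Hypothetical action names for the three steps in the example above.
SEQUENCE = ("login_failure", "login_success", "account_created")


class FollowedByRule:
    """Tracks one open event thread per user (a simplifying assumption)."""

    def __init__(self, sequence=SEQUENCE):
        self.sequence = sequence
        self.threads = {}  # user -> list of actions matched so far

    def on_event(self, user, action):
        """Advance the user's thread; return the full sequence on a match."""
        thread = self.threads.get(user)
        if thread is None:
            if action == self.sequence[0]:
                self.threads[user] = [action]  # head of a new thread
            return None
        if action != self.sequence[len(thread)]:
            return None  # not the next step; the thread stays in memory
        thread.append(action)
        if len(thread) == len(self.sequence):
            return self.threads.pop(user)  # alert fires, memory is released
        return None


rule = FollowedByRule()
rule.on_event("alice", "login_failure")
rule.on_event("alice", "login_success")
alert = rule.on_event("alice", "account_created")

# bob fails once and never completes the sequence.
rule.on_event("bob", "login_failure")
open_threads = len(rule.threads)
```

In this sketch, bob's thread is never removed because nothing expires it. That mirrors the warning above: a broad first condition opens many threads, and threads that never complete keep their events in memory.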

 

In addition to the memory usage discussed above, alert generation also consumes system resources. Each alert that is generated must be stored for retrieval and must also be processed by Incident Management. This process uses disk space for storage, consumes database memory, and increases CPU utilization when queries are run.

When writing and deploying rules, you should be aware that each of these actions "costs" you system resources. The sections below are designed to help you keep your usage at a healthy level and monitor for problems if systems become overloaded.

Best Practices for Writing Rules

These are general guidelines for writing rules.

  • Create alerts for actionable events.  The purpose of an alert should be to notify you of an event that requires immediate and specific action. For events that do not require action, or only require you to have awareness of the event, you can create a report. This helps to prevent you from overloading the database that stores alerts.
  • Configure new rules as trial rules so you can observe how they react in your environment.  If you deploy new rules as trial rules, they will be disabled if the configured memory threshold is exceeded. You can also use the memory snapshot feature to see how much memory was being used when a trial rule was disabled. For more details, see Work with Trial Rules.
  • Configure Alert notifications only after your rule testing and tuning is complete. This can help ensure you do not get flooded with notifications if a rule behaves differently than you expect. 
  • Rules need to be specific so that you limit resource usage. Use the following guidelines to limit usage:
    • Make the filters on the rule exclude all but the necessary events for the rule to fire accurately.
    • Make the size of your windows (window time for correlation) as small as possible.
    • Limit the events that you include in the window: For example, if you only want to see IDS events, ensure that you only include those events in your time window. 
  • Rules need to be tuned to an alert level that is manageable. If you are flooded with alerts, the purpose and utility of an alert are lost. In addition, it’s possible to flood the database that stores alerts, which can slow or prevent your system from processing alerts. For example, if you want to know about encrypted traffic to other countries, you could limit the list to countries that are known risks. This limits the volume of alerts to a level you can manage.
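
The effect of tighter filters and smaller windows can be estimated with simple arithmetic: retained events are roughly the event rate times the fraction of events that match the filter times the window length. The Python sketch below applies that estimate. The function name and all input numbers are made up for illustration, and real memory usage depends on how the engine stores events.

```python
def estimate_window_memory_bytes(events_per_second, match_fraction,
                                 window_seconds, bytes_per_event):
    """Rough estimate of memory held by one event window rule."""
    retained_events = events_per_second * match_fraction * window_seconds
    return retained_events * bytes_per_event


# Assumed inputs: 10,000 events per second, about 1,000 bytes per event.
# Broad rule: matches 20% of events, ten-minute window.
broad = estimate_window_memory_bytes(10_000, 0.20, 600, 1_000)
# Specific rule: matches 1% of events, one-minute window.
narrow = estimate_window_memory_bytes(10_000, 0.01, 60, 1_000)
```

With these assumed inputs, the broad rule holds on the order of a gigabyte while the specific rule holds a few megabytes, which is why both the filter and the window size matter.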

Best Practices for Working with RSA Live Rules

These are guidelines for RSA Live Rules.

  • Deploy RSA Live rules in small batches. Not every rule is suited to every environment. The best way to ensure your RSA Live rules are successful is to deploy them in small batches so you can test them in your environment. If you deploy small batches, it's much easier to tell if a particular rule has an issue. 
  • Read the rule descriptions provided with RSA Live rules. ESA rules are not “one size fits all.” Not all rules will work in your environment. The rule descriptions tell you which parameters you will need to modify to successfully deploy a rule in your environment.
  • Set your parameters.  RSA Live rules have parameters that need to be modified.  If you do not modify your parameters, the rule may not work or it may exhaust your memory.
  • Deploy new rules as trial rules so you can observe how they react in your environment. If you deploy new rules as trial rules, they will be disabled if the configured memory threshold is exceeded. For more details, see Work with Trial Rules.

Best Practices for Deploying Rules

These are general guidelines for deploying rules.

  • Deploy rules in small batches so you can observe how they react in your environment. Not all environments are the same, and a rule will need to be tuned for memory usage, alert volume, and effective detection of events.
  • Test rules before you configure alert notifications. Configure Alert notifications only after your rule testing and tuning is complete. This can help ensure you do not get flooded with alerts if a rule behaves differently than you expect.
  • Monitor system health as a part of your deployment process. When you deploy rules, check total memory utilization for your ESA in the Health and Wellness tab. For more information, see "Viewing Health and Wellness statistics" in Troubleshoot ESA.

Best Practices for System Health

These are general guidelines for system health.

  • Configure the alerts database to maintain a healthy level of alerts. ESA uses MongoDB to store alerts. If MongoDB is flooded with alerts, the database can slow down or stop. To ensure your database maintains a healthy level of alerts, configure settings to clear out alerts regularly. To do this, see "Configure ESA Storage" in the Event Stream Analysis (ESA) Configuration Guide.
  • Set up new rules as trial rules. A common issue is that new rules may cause memory issues. To prevent this, you can set up new rules as trial rules. If the configured memory threshold is met, all trial rules are disabled to prevent the system from running out of memory.  For more information about trial rules, see Work with Trial Rules.
  • Set up thresholds in the Health & Wellness module to alert you if memory usage is too high. There are metrics in the Health & Wellness module that track memory usage. You can set up alerts and notifications to send you an email if those thresholds are crossed. For more information about the memory statistics you can view, see "Viewing Health and Wellness statistics" in Troubleshoot ESA.
  • Monitor memory metrics for each rule in the Health & Wellness module. For each rule, you can view the estimated memory usage in the Health & Wellness module. You can use this information to ensure that rules do not use too much memory. For more information about the memory statistics you can view, see "Viewing Health and Wellness statistics" in Troubleshoot ESA.
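
The trial rule safeguard described above can be pictured with a short Python sketch: when memory usage reaches the configured threshold, every enabled trial rule is disabled and other rules are left alone. The rule records, field names, and function name are hypothetical; this is not the ESA API, only an illustration of the behavior.

```python
def apply_trial_rule_threshold(rules, memory_used_pct, threshold_pct):
    """Disable all enabled trial rules once the threshold is met.

    Returns the names of the rules that were disabled.
    """
    if memory_used_pct < threshold_pct:
        return []
    disabled = []
    for rule in rules:
        if rule["trial"] and rule["enabled"]:
            rule["enabled"] = False
            disabled.append(rule["name"])
    return disabled


# Hypothetical rule records for illustration.
rules = [
    {"name": "failed_logins", "trial": False, "enabled": True},
    {"name": "new_window_rule", "trial": True, "enabled": True},
    {"name": "new_sequence_rule", "trial": True, "enabled": True},
]
below = apply_trial_rule_threshold(rules, memory_used_pct=70, threshold_pct=85)
at_limit = apply_trial_rule_threshold(rules, memory_used_pct=85, threshold_pct=85)
```

Only rules marked as trial are affected, so established rules keep running while the new ones are investigated.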