Davide Veneziano

Tools and troubleshooting: calculating the index utilization

Discussion created by Davide Veneziano (Employee) on Mar 10, 2015
Latest reply on Apr 13, 2015 by Andy Cunningham

Security Analytics relies heavily on indexed data when running a query, which gives both investigations and reports high performance and flexibility. Whatever condition is set in the where clause is evaluated against the index database to find matches, so the platform does not have to dig through the raw data multiple times and you get near-instant access to your security events.


To make this mechanism practical, the platform has to be told how many unique values each meta key is expected to hold until the index is next saved (every 8 hours by default). This limit is stored alongside the definition of each meta key in the index-concentrator and index-archiver xml files, in the ValueMax setting, which defines the maximum number of unique values a key can hold in each time slice. When the buffer is full, new data is prevented from entering the database, so it is important to monitor index utilization carefully and make sure no relevant key is approaching its ValueMax.
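To make the ValueMax idea concrete, here is a minimal Python sketch that reads a per-key limit out of an index definition. The XML layout in it (a `language` root, `key` elements, `name` and `valueMax` attributes) and the sample keys are assumptions for illustration only; check them against your own index-concentrator file before relying on them.

```python
import xml.etree.ElementTree as ET

# Illustrative snippet only: element and attribute names are assumed,
# not copied from a real index-concentrator file.
SAMPLE_INDEX_XML = """
<language>
  <key name="service" format="UInt32" level="IndexValues" valueMax="75"/>
  <key name="action" format="Text" level="IndexValues" valueMax="1000"/>
  <key name="time" format="TimeT" level="IndexKeys"/>
</language>
"""


def read_value_max(xml_text):
    """Return {meta key name: ValueMax} for every key that declares a limit."""
    root = ET.fromstring(xml_text.strip())
    limits = {}
    for key in root.iter("key"):
        raw = key.get("valueMax")
        if raw is not None:  # keys without a limit are skipped
            limits[key.get("name")] = int(raw)
    return limits


limits = read_value_max(SAMPLE_INDEX_XML)
print(limits)
```

Keys that do not declare a limit are left out of the result, so anything built on top of it only deals with keys that can actually fill up.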


To help with this, I’m attaching a simple script that builds a profile for each indexed key from its configured ValueMax and the number of unique values it currently holds, and returns the buffer utilization as a percentage, in descending order. This makes it easy to spot at a glance the meta keys that may need a bigger buffer, or that are storing far more unique values than they should.
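The calculation itself is simple: utilization is the number of unique values in use divided by ValueMax, expressed as a percentage. The following is a minimal Python sketch of that arithmetic, not the attached Perl script; the function names are hypothetical and the numbers are taken from three lines of the sample output in this post.

```python
def utilization_profile(value_max, unique_counts):
    """Return (key, percent, used, limit) tuples, highest utilization first."""
    rows = []
    for key, limit in value_max.items():
        if limit <= 0:
            continue  # no usable limit, nothing to measure against
        used = unique_counts.get(key, 0)
        rows.append((key, 100.0 * used / limit, used, limit))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def format_row(row):
    """Render one row in the 'key: pct% (used/limit)' style."""
    key, pct, used, limit = row
    return f"{key}: {pct:.2f}% ({used}/{limit})"


# Example inputs (hypothetical variable names, values from the sample output).
value_max = {"ip.proto": 256, "service": 75, "action": 1000}
unique_counts = {"ip.proto": 3, "service": 8, "action": 15}

profile = utilization_profile(value_max, unique_counts)
for row in profile:
    print(format_row(row))
```

This sketch sorts on the exact ratio rather than on the rounded percentage, so keys whose rounded values tie may come out in a different order than in another tool's report.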


The script requires a perl interpreter and wget, so it can be run on a Security Analytics appliance or on any box with perl and wget installed. To configure it, open the script and set the variables at the top: the IP of the concentrator and an admin username and password. When run, the script connects to the concentrator’s REST API, pulls out the required data and generates the profile, which is printed to the user’s screen.
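For anyone who would rather not depend on wget, the connection step can be sketched in Python with the standard library alone. This is an illustration under assumptions, not a description of what the attached script does: the port (50105), the use of HTTP basic authentication, the `/index` path, the IP and the credentials are all placeholders that you need to replace with the values valid for your own concentrator.

```python
import base64
import urllib.request


def build_request(host, path, username, password, port=50105, scheme="http"):
    """Build an authenticated GET request; nothing is sent at this point.

    The default port and the basic-auth scheme are assumptions.
    """
    url = f"{scheme}://{host}:{port}{path}"
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def fetch_text(request, timeout=30):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Placeholder values: documentation-range IP, dummy credentials, hypothetical path.
req = build_request("192.0.2.10", "/index", "admin", "changeme")
print(req.full_url)
# body = fetch_text(req)  # uncomment once the host, path and credentials are real
```

Keeping the request construction separate from the network call makes it possible to check the URL and credentials before anything is sent to the appliance.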


Sample output:

service: 10.67% (8/75)

action: 1.50% (15/1000)

ip.proto: 1.17% (3/256)

medium: 1.00% (1/100)

did: 0.39% (1/256)

content: 0.05% (23/50000)

country.dst: 0.05% (5/10000)

city.dst: 0.04% (20/50000)

extension: 0.04% (21/50000)

error: 0.03% (14/50000)
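Since the report is plain text in a fixed `key: pct% (used/max)` shape, it is easy to post-process. The Python sketch below parses lines in that shape and lists the keys at or above a chosen utilization; the function name and the 1% threshold are hypothetical and only there to produce a visible result with the small sample numbers above. In practice you would pick a much higher threshold.

```python
import re

# Matches lines such as "service: 10.67% (8/75)".
LINE = re.compile(
    r"^(?P<key>\S+):\s+(?P<pct>[\d.]+)%\s+\((?P<used>\d+)/(?P<limit>\d+)\)$"
)


def keys_over(report, threshold_pct):
    """Return the keys whose used/limit ratio is at or above threshold_pct."""
    flagged = []
    for line in report.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # blank or unrecognised lines are ignored
        used, limit = int(match["used"]), int(match["limit"])
        if limit > 0 and 100.0 * used / limit >= threshold_pct:
            flagged.append(match["key"])
    return flagged


REPORT = """
service: 10.67% (8/75)

action: 1.50% (15/1000)

did: 0.39% (1/256)
"""

flagged = keys_over(REPORT, 1.0)
print(flagged)
```

The ratio is recomputed from the raw counts instead of trusting the printed percentage, which avoids rounding effects near the threshold.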

Disclaimer: please DO NOT consider what is described and attached to this post as RSA official content. Like any other unofficial material, it has to be tested in a controlled environment first, and its impact has to be evaluated carefully before it is promoted to production. Also note that the content I publish is usually intended to prove a concept rather than to be fully working in every environment. As such, handle it with care.