
Size Index Bucketing

Blog Post created by Kevin Arunski on May 4, 2018

One of the more challenging tasks in the RSA NetWitness core database is querying and filtering on meta items that represent byte sizes. At first glance this may seem simple: sizes are just numbers, so why would it be difficult to compare the size value in each session with the search criteria?

The traditional RSA NetWitness index does not handle "size" values particularly well. The index tracks the sessions in which every unique value appears, so for size it may have to maintain a separate list of sessions for every possible value from 0 all the way up to the maximum possible size. Because the number of sessions is so large (in the billions), the number of distinct size values to track quickly runs into the millions. What's more, each of those values is associated with only a few sessions on average, and those sessions tend to be spread out all over the data. Such small session lists don't compress well, so they waste disk space and RAM.
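To make the scale problem concrete, here is a minimal Python sketch of a value-keyed index: one list of session IDs per distinct size. The uniform size distribution and the `build_value_index` helper are hypothetical simplifications, not how the NetWitness index is actually implemented; the sketch only shows that when sizes are widely spread, nearly every value gets its own index entry holding about one session.

```python
import random
from collections import defaultdict


def build_value_index(sizes):
    """Map each distinct size to the list of session IDs carrying it.

    Hypothetical model of a per-value index: one posting list per unique value.
    """
    index = defaultdict(list)
    for session_id, size in enumerate(sizes):
        index[size].append(session_id)
    return index


# Assumption: 100,000 sessions with sizes spread uniformly over 0 .. 1 GiB - 1.
rng = random.Random(42)
sizes = [rng.randrange(2**30) for _ in range(100_000)]

index = build_value_index(sizes)
avg_sessions = len(sizes) / len(index)
print(f"{len(sizes)} sessions -> {len(index)} index entries, "
      f"{avg_sessions:.3f} sessions per entry on average")
```

Real traffic is not uniformly distributed, but the same effect appears whenever the value space is huge compared to how often any single value repeats.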

Enter Size Bucketing

To make size indexes work well, we have introduced a new indexing mode that can be used on size keys. It is called "bucket" mode, and it works like this: instead of indexing every possible size, we round each size down to its nearest "bucket." The buckets are whole-number values of kilobytes, megabytes, gigabytes, and so on. This drastically reduces the number of index entries and solves the performance issues. Fortunately, using this type of index does not lead to any real loss of functionality. You can still use size indexes to perform queries, so expressions like:

size > 1024 

or 

size == 1234456

are valid and are evaluated exactly, even when a bucketed index is used. The bucketed index narrows down the candidate sessions enough that the exact expression can then be evaluated against the values in the meta database.
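The two-step evaluation can be sketched in Python. Everything here is an illustrative assumption: the 1024-based units (inferred from the post's statement that 2048 is exactly 2 KB), keeping sizes under 1 KB as exact values, and the hypothetical helpers `bucket`, `build_bucket_index`, and `range_query`. The index only narrows the search to candidate buckets; a second pass over the exact stored sizes produces the final answer.

```python
from collections import defaultdict

# Assumption: units are powers of 1024 (KB .. EB), largest first.
UNITS = [1024**k for k in range(6, 0, -1)]


def unit_of(value):
    """Largest unit not exceeding value; 1 for values under 1 KB (assumption)."""
    for unit in UNITS:
        if value >= unit:
            return unit
    return 1


def bucket(size):
    """Round size down to a whole number of its largest unit."""
    unit = unit_of(size)
    return (size // unit) * unit


def build_bucket_index(sizes):
    """Map each bucket value to the session IDs whose size falls in it."""
    index = defaultdict(list)
    for session_id, size in enumerate(sizes):
        index[bucket(size)].append(session_id)
    return index


def range_query(index, sizes, lo, hi):
    """Session IDs with lo <= size <= hi, using the index and then exact values."""
    # Phase 1: keep only buckets whose byte range [b, b + unit - 1] overlaps [lo, hi].
    candidates = [sid for b, sids in index.items()
                  if b <= hi and b + unit_of(b) - 1 >= lo
                  for sid in sids]
    # Phase 2: check each candidate's exact size (standing in for the meta database).
    return sorted(sid for sid in candidates if lo <= sizes[sid] <= hi)


sizes = [512, 1024, 1500, 2048, 3000, 1234456, 1234457, 5 * 1024**3]
index = build_bucket_index(sizes)

print(len(index))                                   # 5 bucket entries for 8 sessions
print(range_query(index, sizes, 1025, 2**64 - 1))   # size > 1024 -> [2, 3, 4, 5, 6, 7]
print(range_query(index, sizes, 1234456, 1234456))  # size == 1234456 -> [5]
```

Session 1 (exactly 1024 bytes) lands in a candidate bucket for `size > 1024` but is dropped in the exact pass, and session 6 shares the 1 MB bucket with session 5 but does not equal 1234456.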

There is a subtle difference in index behavior, however. If your query specifies an exact bucket value, the results include all the sessions whose values fall in that bucket. For example, if you ask for size = 2048, the index engine identifies this as exactly 2 kilobytes and returns sessions with sizes greater than or equal to 2 KB but less than 3 KB. If the query value does not match an exact bucket value, the query engine narrows the results down to the sessions that match the value exactly. This behavior lets the Navigate view work logically, since selecting a bucket returns every session in it, while still allowing more specific queries to use the index.
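The equality rule described above can be sketched as follows. The helper names and the 1024-based units are assumptions for illustration, not the NetWitness query engine: if the queried value is itself a bucket value, the whole bucket is returned; otherwise the bucket is only used to narrow the candidates before an exact match.

```python
from collections import defaultdict

UNITS = [1024**k for k in range(6, 0, -1)]  # EB down to KB (assumption: 1024-based)


def bucket(size):
    """Round size down to a whole number of its largest unit; sizes < 1 KB stay exact."""
    for unit in UNITS:
        if size >= unit:
            return (size // unit) * unit
    return size


def equals_query(index, sizes, value):
    """Evaluate size = value with bucket semantics (hypothetical sketch)."""
    if bucket(value) == value:
        # The value is itself a bucket: return every session in that bucket.
        # (Under 1 KB a bucket holds only that exact value, so both branches agree.)
        return sorted(index.get(value, []))
    # Otherwise use the bucket only to narrow, then match the exact value.
    return sorted(sid for sid in index.get(bucket(value), []) if sizes[sid] == value)


sizes = [2048, 2500, 3071, 3072, 1234456, 1234999]
index = defaultdict(list)
for sid, size in enumerate(sizes):
    index[bucket(size)].append(sid)

print(equals_query(index, sizes, 2048))     # [0, 1, 2]: everything from 2 KB up to 3 KB
print(equals_query(index, sizes, 2500))     # [1]: not a bucket value, so exact match
print(equals_query(index, sizes, 1234456))  # [4]
```

Note that 3072 bytes is excluded from `size = 2048` because it starts the 3 KB bucket.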

Using Size Bucketing

Size bucketing can be enabled on a custom index key that meets the following requirements:

  • The index format must be UInt32 or UInt64.
  • The key must be indexed by value (level="IndexValues").

To enable size buckets, add the bucket="true" attribute to the key's custom index entry. For example:

<key name="size" description="size" format="UInt32" bucket="true" level="IndexValues" />
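If you generate or review index entries with scripts, a quick pre-check of the two requirements listed above might look like this. `bucket_config_problems` is a hypothetical helper built on Python's standard `xml.etree.ElementTree`; it is not NetWitness's own validation, and the case-insensitive format comparison is an assumption.

```python
import xml.etree.ElementTree as ET

# Lower-cased for a lenient comparison; whether NetWitness itself is
# case-sensitive here is not stated in the post.
ALLOWED_FORMATS = {"uint32", "uint64"}


def bucket_config_problems(key_xml):
    """List reasons a <key> entry does not meet the size-bucketing requirements.

    Hypothetical helper: it checks only the listed requirements and is
    not NetWitness's own validation logic.
    """
    key = ET.fromstring(key_xml)
    problems = []
    if key.get("bucket", "").lower() != "true":
        problems.append('bucket is not set to "true"')
    if key.get("format", "").lower() not in ALLOWED_FORMATS:
        problems.append(f"format {key.get('format')!r} is not UInt32 or UInt64")
    if key.get("level") != "IndexValues":
        problems.append(f"level {key.get('level')!r} is not IndexValues")
    return problems


good = '<key name="size" description="size" format="UInt32" bucket="true" level="IndexValues" />'
bad = '<key name="size" description="size" format="Text" bucket="true" level="IndexKeys" />'

print(bucket_config_problems(good))  # []
print(bucket_config_problems(bad))   # two problems: format and level
```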

After the index configuration is saved or reloaded, the meta will be indexed with buckets. Note that version 11.1 also removes an explicit restriction on indexing "size": it is now acceptable to index this meta key.

When size bucketing is used, it is not necessary to specify a valueMax parameter. The buckets keep the number of unique values for the key small, so the count never approaches a large valueMax.
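One way to see why valueMax stops mattering is to count how many distinct bucket values a key can ever produce. This sketch assumes 1024-based units from KB up to EB and that sizes under 1 KB are kept as exact values; both are assumptions, not documented NetWitness behavior. Under them, even the full UInt64 range yields only a few thousand distinct values.

```python
UNITS = [1024**k for k in range(6, 0, -1)]  # EB, PB, TB, GB, MB, KB (assumed 1024-based)


def bucket(size):
    """Round size down to a whole number of its largest unit; sizes < 1 KB stay exact."""
    for unit in UNITS:
        if size >= unit:
            return (size // unit) * unit
    return size


def max_distinct_buckets(max_value):
    """Number of distinct bucket values for sizes in 0 .. max_value."""
    count = min(max_value, 1023) + 1  # exact values 0 .. 1023 (assumption)
    for unit in UNITS:
        if max_value >= unit:
            # Buckets unit, 2*unit, ..., 1023*unit; 1024*unit starts the next unit.
            count += min(max_value // unit, 1023)
    return count


print(max_distinct_buckets(2**32 - 1))  # UInt32 range -> 3073
print(max_distinct_buckets(2**64 - 1))  # UInt64 range -> 6154
```

For UInt32 that is 1024 exact values plus 1023 KB buckets, 1023 MB buckets, and 3 GB buckets.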

Size Buckets in Navigate View

One immediate benefit of size-bucketed indexes is that they are useful in the Navigate view. The Navigate view renders the size buckets in human-readable form with the appropriate digital unit, so sizes are shown as "1 MB", "11 MB", "1 TB", and so on. The navigation report gives you session totals for each bucket, so you can see which sizes are encountered most frequently in the collection. In addition, the buckets are preserved when you click into them and pivot to the Events view, where you will see a listing of the sessions in the bucket.
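The labeling and per-bucket totals can be approximated in a few lines. `size_label` is a hypothetical helper, the 1024-based units are inferred from the post's "2048 is exactly 2 KB" example, and the "bytes" label for sizes under 1 KB is an invented placeholder. Because the label uses floor division, it can be applied directly to raw sizes.

```python
from collections import Counter

# Assumption: 1024-based units, matching "2048 is exactly 2 KB".
UNIT_NAMES = [(1024**6, "EB"), (1024**5, "PB"), (1024**4, "TB"),
              (1024**3, "GB"), (1024**2, "MB"), (1024, "KB")]


def size_label(size):
    """Human-readable label for the bucket a size falls in, e.g. "11 MB"."""
    for unit, name in UNIT_NAMES:
        if size >= unit:
            return f"{size // unit} {name}"  # floor division = round down to the bucket
    return f"{size} bytes"  # hypothetical label for sub-1 KB sizes


sizes = [1500, 2000, 1024**2 + 5, 11 * 1024**2 + 100, 1024**4]
totals = Counter(size_label(s) for s in sizes)
for label, count in totals.most_common():
    print(f"{label}: {count}")
```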

[Screenshot: Session Size Meta Can Be Added to Navigate]

The labels used for sizes are also supported as part of the raw query syntax, so you may specify a query using human-readable aliases such as:

size > "1 KB"

or 

size < "10 GB"

Note that you have to put the value in quotes, because it's really a text label on the bucket.
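The translation from a quoted label to a byte threshold can be sketched like this. The helper name, the accepted operators, and the 1024-based units are assumptions for illustration; this is not the NetWitness query parser. The quotes are what mark the value as a label rather than a plain number, so an unquoted value is rejected here.

```python
import re

# Assumption: 1024-based units, as in the post's "2048 is exactly 2 KB" example.
UNIT_EXPONENTS = {"KB": 1, "MB": 2, "GB": 3, "TB": 4, "PB": 5, "EB": 6}
CONDITION_RE = re.compile(r'^\s*(\w+)\s*(<=|>=|!=|=|<|>)\s*"(\d+)\s*([KMGTPE]B)"\s*$',
                          re.IGNORECASE)


def resolve_size_condition(condition):
    """Turn a condition like 'size > "1 KB"' into (key, operator, bytes).

    Hypothetical helper for illustration; it is not the NetWitness query parser.
    """
    match = CONDITION_RE.match(condition)
    if match is None:
        raise ValueError(f"not a quoted size-label condition: {condition!r}")
    key, op, number, unit = match.groups()
    return key, op, int(number) * 1024 ** UNIT_EXPONENTS[unit.upper()]


print(resolve_size_condition('size > "1 KB"'))   # ('size', '>', 1024)
print(resolve_size_condition('size < "10 GB"'))  # ('size', '<', 10737418240)
```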
