Marinos Roussos

IM retention is misconfigured, causing the alert table to grow continuously because a regular reindex is never performed

Discussion created by Marinos Roussos on Mar 16, 2018
Latest reply on Apr 9, 2018 by Marinos Roussos

Just came across another issue that should have been fixed years ago but RSA engineering didn't think it would be useful to do. There has been a Jira open for this since around 2015 (which was probably closed as it was likely not understood).

 

To anyone with a basic NoSQL understanding it should be clear that the database will not always release disk space after documents are deleted; the freed space is only marked for reuse. For the space to actually be reclaimed, the collection and its indexes need to be rebuilt.

 

Reference: reIndex — MongoDB Manual 2.4

 

With IM retention, the documents are deleted but SA/NW/IM never instructs the database to re-index the collection. This is a perfect recipe for disaster: the system runs slower and slower and keeps growing in size. If retention weren't set at all, IM wouldn't even be able to load the table in the GUI.
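On a system showing these symptoms, the gap between logical data size and on-disk size can be inspected from the mongo shell before touching anything. This is a minimal sketch, assuming the IM database contains the `alert` collection as shown below; the field names come from MongoDB's standard `db.collection.stats()` output:

```javascript
// Run in the mongo shell, connected to the IM database.
var s = db.alert.stats();
print("logical data size: " + s.size);          // bytes of live documents
print("storage size:      " + s.storageSize);   // bytes allocated on disk
print("total index size:  " + s.totalIndexSize);
// A storage or index size far above the logical data size suggests
// space that retention deletes have freed but that was never reclaimed.
```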

 

The sad thing is that, having raised a ticket with engineering (and given that there was already a Jira about this issue), one would expect them to come back and say: this is a known issue because we are not re-indexing the data, and here is the workaround.

But no, weeks and weeks on they were still going in circles, until I figured out part of the problem myself. First they tried to blame me, then they tried to blame a bug in Mongo.

 

How about the fact that you have a bad implementation of MongoDB?

 

All I had to do was run this inside the IM database:

db.alert.reIndex()
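For reference, a sketch of the whole sequence, capturing sizes before and after so the effect is measurable. The database name `im` is an assumption here; verify the actual name on your own install, and note that `use` only works in the interactive shell:

```javascript
// In the interactive mongo shell, connected to the IM MongoDB instance:
use im                                    // switch to the IM database (name assumed)
var before = db.alert.stats().storageSize;
db.alert.reIndex();                       // rebuild all indexes on `alert`
var after = db.alert.stats().storageSize;
print("storage before: " + before + ", after: " + after);
// reIndex blocks operations on the collection while it runs,
// so on a busy system schedule it in a maintenance window.
```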

 

Before:

> show collections
system.indexes 3.48KB (uncompressed), 32.00KB (compressed)
system.users NaNundefined (uncompressed), NaNundefined (compressed)
categories 16.61KB (uncompressed), 32.00KB (compressed)
aggregation_rule 45.20KB (uncompressed), 64.00KB (compressed)
alert 163.11GB (uncompressed), 40.93GB (compressed)
incident 681.45KB (uncompressed), 2.47MB (compressed)
remediation_task 0.00B (uncompressed), 72.00KB (compressed)
tracking_id_sequence 285.00B (uncompressed), 32.00KB (compressed)
fs.files 0.00B (uncompressed), 48.50KB (compressed)
fs.chunks 0.00B (uncompressed), 48.50KB (compressed)

 

After:

> show collections
system.indexes 3.48KB (uncompressed), 32.00KB (compressed)
system.users NaNundefined (uncompressed), NaNundefined (compressed)
categories 16.61KB (uncompressed), 32.00KB (compressed)
aggregation_rule 45.20KB (uncompressed), 64.00KB (compressed)
alert 26.98GB (uncompressed), 12.41GB (compressed)
incident 681.45KB (uncompressed), 2.47MB (compressed)
remediation_task 0.00B (uncompressed), 72.00KB (compressed)
tracking_id_sequence 285.00B (uncompressed), 32.00KB (compressed)
fs.files 0.00B (uncompressed), 48.50KB (compressed)
fs.chunks 0.00B (uncompressed), 48.50KB (compressed)

 

My issue is still not completely gone: the IM alert table, holding the same number of ESA alerts, is still about eight times the size of the equivalent alerts table in ESA, but it's a good start.

 

Suggestions to management:

-Make sure your staff are trained well enough to deal with your own software.

-Engineering should have a basic understanding of the platform (i.e. what appliance the service is running on) and basic knowledge of how to log in to a service, for example. The default passwords have been the same since 10.0.

-Take Jira tickets raised by people who work with customers seriously, and implement them properly, instead of just testing things in an "empty" virtual machine and signing them off.

 

As always, do this at your own risk. If you are not comfortable or unsure, raise a Support ticket with RSA.
