000029675 - How to calculate Data Retention for a MapR-based Security Analytics Warehouse

Document created by RSA Customer Support Employee on Jun 14, 2016
Last modified by RSA Customer Support Employee on Apr 21, 2017
Version 4

Article Content

Article Number: 000029675
Applies To: Security Analytics: 10.X
Security Analytics Warehouse (MapR based)
Issue: This article explains how to determine how many days of storage are left when a MapR-based warehouse is in use.
The attached script is designed to run on a machine that has access to an NFS mount of the warehouse. This can be any Unix host that uses NFS and has access to the Warehouse Connector.
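Before running the script, it can be useful to confirm that the NFS mount is visible from the host and to see how much free space it reports. The following is a minimal sketch, not part of the attached script: `free_kb` is a hypothetical helper name, and `/` is used only so the example runs anywhere. Substitute the warehouse mount point, for example /saw/lonsaw.

```shell
#!/bin/bash
# Hypothetical helper: print the free space in KB on the filesystem holding a path.
# -k forces 1 KB units and -P keeps each filesystem on a single output line,
# so the "Available" value is always the fourth column of the second line.
free_kb() {
    df -kP "$1" | awk 'NR == 2 { print $4 }'
}

# Substitute the NFS mount point of the warehouse here
free_kb /
```

If the command prints nothing or reports an error, the mount is probably not available on that host.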
 
Resolution: The attached script takes a single argument: the full path to the parent directory that contains the rsasoc directory.
A typical Warehouse structure would look as follows when viewed on the Appliance that is running the Warehouse Connector service:
/saw/lonsaw/rsasoc/tmp
/saw/lonsaw/rsasoc/v1/logs
/saw/lonsaw/rsasoc/v1/sessions
/saw/lonsaw/rsasoc/v1
/saw/lonsaw/rsasoc
/saw/lonsaw/
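The layout above can be checked before the script is run. The sketch below assumes bash and uses a hypothetical helper name, `check_warehouse_layout`, which is not part of the attached script. It tests for the `data` subdirectories that the script reads under logs and sessions.

```shell
#!/bin/bash
# Hypothetical helper: verify that the directories the retention script reads
# exist under the given parent directory (the directory ABOVE rsasoc).
check_warehouse_layout() {
    local root="$1"
    local sub
    for sub in rsasoc/v1/logs/data rsasoc/v1/sessions/data; do
        if [ ! -d "$root/$sub" ]; then
            echo "missing: $root/$sub"
            return 1
        fi
    done
    echo "ok: $root"
}

# Example call (path taken from the article):
#   check_warehouse_layout /saw/lonsaw
```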


In this case, the command would be given as follows:
./saw-retention.sh /saw/lonsaw
[root@LOGDECCOL1 ~]# ./saw-retention.sh /saw/lonsaw
*** Looking at Warehouse under Mount /saw/lonsaw ***
1/1/1970 LOGS: 9 SESSIONS: 11 TOTAL: 20
1/12/2014 LOGS: 205699 SESSIONS: 101484 TOTAL: 307183
10/12/2014 LOGS: 87072 SESSIONS: 55402 TOTAL: 142474
11/12/2014 LOGS: 93811 SESSIONS: 57009 TOTAL: 150820
12/12/2014 LOGS: 38054 SESSIONS: 21268 TOTAL: 59322
2/12/2014 LOGS: 205507 SESSIONS: 95613 TOTAL: 301120
3/12/2014 LOGS: 145948 SESSIONS: 77254 TOTAL: 223202
4/12/2014 LOGS: 105432 SESSIONS: 64673 TOTAL: 170105
5/12/2014 LOGS: 104930 SESSIONS: 61988 TOTAL: 166918
6/12/2014 LOGS: 108924 SESSIONS: 64465 TOTAL: 173389
7/12/2014 LOGS: 103382 SESSIONS: 60190 TOTAL: 163572
8/12/2014 LOGS: 104874 SESSIONS: 62152 TOTAL: 167026
9/12/2014 LOGS: 85405 SESSIONS: 54391 TOTAL: 139796
23/1/2015 LOGS: 0 SESSIONS: 0 TOTAL: 0
24/1/2015 LOGS: 0 SESSIONS: 0 TOTAL: 0
25/1/2015 LOGS: 0 SESSIONS: 0 TOTAL: 0
26/1/2015 LOGS: 0 SESSIONS: 0 TOTAL: 0
27/1/2015 LOGS: 0 SESSIONS: 0 TOTAL: 0
28/1/2015 LOGS: 0 SESSIONS: 0 TOTAL: 0
29/1/2015 LOGS: 0 SESSIONS: 0 TOTAL: 0
30/1/2015 LOGS: 0 SESSIONS: 0 TOTAL: 0
31/1/2015 LOGS: 0 SESSIONS: 0 TOTAL: 0
6/1/2015 LOGS: 96310 SESSIONS: 48119 TOTAL: 144429
7/1/2015 LOGS: 135520 SESSIONS: 71831 TOTAL: 207351
8/1/2015 LOGS: 147738 SESSIONS: 83692 TOTAL: 231430
9/1/2015 LOGS: 146525 SESSIONS: 79970 TOTAL: 226495
1/2/2015 LOGS: 160802 SESSIONS: 96255 TOTAL: 257057
13/2/2015 LOGS: 171560 SESSIONS: 133313 TOTAL: 304873
14/2/2015 LOGS: 137303 SESSIONS: 139007 TOTAL: 276310
15/2/2015 LOGS: 123968 SESSIONS: 108528 TOTAL: 232496
16/2/2015 LOGS: 127682 SESSIONS: 93216 TOTAL: 220898
17/2/2015 LOGS: 134877 SESSIONS: 97928 TOTAL: 232805
18/2/2015 LOGS: 148089 SESSIONS: 102859 TOTAL: 250948
19/2/2015 LOGS: 65756 SESSIONS: 45479 TOTAL: 111235
2/2/2015 LOGS: 137995 SESSIONS: 94762 TOTAL: 232757
3/2/2015 LOGS: 119767 SESSIONS: 90763 TOTAL: 210530
4/2/2015 LOGS: 120845 SESSIONS: 84982 TOTAL: 205827
5/2/2015 LOGS: 118199 SESSIONS: 78232 TOTAL: 196431
6/2/2015 LOGS: 119947 SESSIONS: 80005 TOTAL: 199952
7/2/2015 LOGS: 117875 SESSIONS: 80339 TOTAL: 198214
8/2/2015 LOGS: 121163 SESSIONS: 81231 TOTAL: 202394
9/2/2015 LOGS: 125858 SESSIONS: 88819 TOTAL: 214677
MIN(MB):  0   MAX(MB): 221  AVG(MB): 193 DAYS: 33
DAYS LEFT AT AVERAGE: 193 MB/ DAY: 104
DAYS LEFT AT MAXIMUM: 221 MB/ DAY: 91
DAYS LEFT AT TWICE MAXIMUM 442 MB /DAY: 45

 

The last three lines of the output give the number of days of storage left based on the average, maximum, and twice-maximum daily storage consumption rates.
Each line beginning with a date (day/month/year) shows the space used in KB on that day by Warehouse logs, by sessions, and in total.
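The days-left figures come from integer division of the free space by a daily usage rate, both in KB. The sketch below illustrates that arithmetic with made-up numbers: the free-space and usage values are hypothetical and are not taken from the sample output, and `days_left` is a helper name introduced only for this example.

```shell
#!/bin/bash
# Hypothetical figures, all in KB
free_kb=20971520        # 20 GB free on the warehouse filesystem
avg_daily_kb=204800     # 200 MB written per day on average
max_daily_kb=262144     # 256 MB written on the busiest day

# Whole days until the free space is used up at the given daily rate
days_left() {
    echo $(( $1 / $2 ))
}

echo "At average:       $(days_left "$free_kb" "$avg_daily_kb") days"        # 102
echo "At maximum:       $(days_left "$free_kb" "$max_daily_kb") days"        # 80
echo "At twice maximum: $(days_left "$free_kb" $(( 2 * max_daily_kb ))) days" # 40
```

Because the division truncates, the results are whole days rounded down.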
Notes: Copy of script
 
#!/bin/bash
#
# SCRIPT: saw-retention.sh
# AUTHOR: David Waugh
# DATE: 18/02/2015
# REV: 1.0a
#
# PLATFORM: Linux Centos (Security Analytics 10.X)
#
# PURPOSE: Calculate Data Retention of a Warehouse
#
#
######################################################################
#
# Expects the path to the Warehouse as the first argument
# eg the argument given should point to the directory ABOVE rsasoc
# My files are written to /mnt/saw/lonsaw/rsasoc/v1/logs/data/2015/2
# So I will give /mnt/saw/lonsaw as the first argument
# We will calculate
# MIN DAILY USAGE RATE
# MAX DAILY USAGE RATE
# AVG DAILY USAGE RATE
MIN_DAILY=10000000000
MAX_DAILY=0
AVG_DAILY=0
DAY_COUNT=0
DAYSIZE=0
TOTALSIZE=0
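# Stop early if the argument is missing or does not contain the expected layout
if [ -z "$1" ] || [ ! -d "$1/rsasoc/v1/logs/data" ]
then
        echo "Usage: $0 <path to the directory above rsasoc>"
        exit 1
fi
echo -e "*** Looking at Warehouse under Mount $1 ***\n"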
#First we do Logs
for years in $(ls -1 $1/rsasoc/v1/logs/data)
do
        #echo -e "Year: $years\n"
        for month in $(ls -1 $1/rsasoc/v1/logs/data/$years)
        do
           #echo -e "Month: $month\n"
           for day in $(ls -1 $1/rsasoc/v1/logs/data/$years/$month)
           do
             #echo -e "Day: $day "
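             # Size in KB of this day's logs and sessions directories (0 if a directory is missing)
             DAYLOGSIZE=`du -sk "$1/rsasoc/v1/logs/data/$years/$month/$day" 2>/dev/null | cut -f 1`
             DAYSESSIONSIZE=`du -sk "$1/rsasoc/v1/sessions/data/$years/$month/$day" 2>/dev/null | cut -f 1`
             DAYLOGSIZE=${DAYLOGSIZE:-0}
             DAYSESSIONSIZE=${DAYSESSIONSIZE:-0}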
             DAYSIZE=`expr $DAYLOGSIZE + $DAYSESSIONSIZE`
             TOTALSIZE=`expr $TOTALSIZE + $DAYSIZE`
             echo -e "$day/$month/$years LOGS: $DAYLOGSIZE SESSIONS: $DAYSESSIONSIZE TOTAL: $DAYSIZE "
                # Only Increment Day Count for Non Empty Directories
                if [ $DAYSIZE -gt 0 ]
                then
                        DAY_COUNT=`expr $DAY_COUNT + 1`
                fi

                # Empty days are included in the minimum, so MIN can be 0
                if [ $DAYSIZE -lt $MIN_DAILY ]
                then
                    MIN_DAILY=$DAYSIZE
                fi
                if [ $DAYSIZE -gt $MAX_DAILY ]
                then
                    MAX_DAILY=$DAYSIZE
                fi

           done
       
        done
done
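# Stop if no non-empty day directories were found (avoids division by zero)
if [ $DAY_COUNT -eq 0 ]
then
        echo "No data found under $1/rsasoc/v1/logs/data"
        exit 1
fi
AVG_DAILY=`expr $TOTALSIZE / $DAY_COUNT`
# Free space in KB on the warehouse filesystem (fourth column of the df output)
FREESPACE=`df -kP "$1" | tail -n 1 |tr -s ' ' |cut -d ' ' -f4`
DAYSLEFT_AVG=`expr $FREESPACE / $AVG_DAILY`
DAYSLEFT_MAX=`expr $FREESPACE / $MAX_DAILY`
DAYSLEFT_2MAX=`expr $FREESPACE / $MAX_DAILY / 2`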
# Convert to MB
MIN_DAILY=`expr $MIN_DAILY / 1024`
MAX_DAILY=`expr $MAX_DAILY / 1024`
AVG_DAILY=`expr $AVG_DAILY / 1024`
TWICEMAX=`expr 2 \* $MAX_DAILY`
echo -e "MIN(MB):  $MIN_DAILY   MAX(MB): $MAX_DAILY  AVG(MB): $AVG_DAILY DAYS: $DAY_COUNT \n"
echo -e "DAYS LEFT AT AVERAGE: $AVG_DAILY MB/ DAY: $DAYSLEFT_AVG"
echo -e "DAYS LEFT AT MAXIMUM: $MAX_DAILY MB/ DAY: $DAYSLEFT_MAX"
echo -e "DAYS LEFT AT TWICE MAXIMUM $TWICEMAX MB /DAY: $DAYSLEFT_2MAX"
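
Because the script walks the directories in `ls` order, days and months are printed in lexical order (1, 10, 11, 12, 2, ...) rather than chronologically. One possible way to reorder the per-day lines is sketched below. It assumes a `sort` that supports `-t` and numeric keys (such as GNU sort), and `sort_by_date` is a hypothetical helper name. The sample lines are copied from the output shown earlier.

```shell
#!/bin/bash
# Sort "day/month/year ..." lines by year, then month, then day (all numeric).
sort_by_date() {
    sort -t/ -k3,3n -k2,2n -k1,1n
}

# Sample lines from the article's output, in the order the script printed them
sorted=$(printf '%s\n' \
    "10/12/2014 LOGS: 87072 SESSIONS: 55402 TOTAL: 142474" \
    "2/12/2014 LOGS: 205507 SESSIONS: 95613 TOTAL: 301120" \
    "6/1/2015 LOGS: 96310 SESSIONS: 48119 TOTAL: 144429" | sort_by_date)
echo "$sorted"
```

Against a live warehouse, the same helper could be applied to the per-day lines only, for example `./saw-retention.sh /saw/lonsaw | grep 'LOGS:' | sort_by_date`.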
