000034041 - RSA Identity Governance and Lifecycle Raw Data Cleanup

Document created by RSA Customer Support Employee on Oct 5, 2016Last modified by RSA Customer Support Employee on Apr 21, 2017
Version 2Show Document
  • View in full screen mode

Article Content

Article Number000034041
Applies ToRSA Product Set: RSA Identity Governance and Lifecycle (RSA G&L)
IssueRSA Identity Governance and Lifecycle maintains the last several raw data sets from each collector for the purposes of validation and troubleshooting errors. This job can be unintentionally interrupted by collections and other processes, and if it is consistently not allowed to run to completion, it can result in database growth and collection performance degradation.
 
TasksWe typically suggest running collections off-hours, as they run best when there is a lower load on the server. With this in mind, many customers will schedule them in the early morning hours. This unfortunately can lead to contention as there are some processes that the system will schedule by default to run at that time as well. Out of the box we will run our statistics job at 12:30 AM on the server. At the end of statistics there is a process that will purge all collectors’ raw data. The SQL query below can be used to determine the time of the last successful completion of the purge.
SELECT MAX(EVENT_TIME) FROM T_AV_JOB_STATS WHERE PKG_NAME='RawData_Helper' AND 
PROC_NAME='Purge_And_Refresh_Raw_Tables' AND EVENT = 'END';

Ideally, this will return a recent date; however if the date is several days in the past you may have an issue with your collections interrupting the clean up task. If the date is weeks or further in the past, you may have noticeable database bloat and performance impact.
ResolutionThere are multiple things that can be done to improve this area

  • Running Collector > Scheduling > Tasks > DB Statistics Refresh


This determines at what time database statistics and the clean-up job run. By default, it is scheduled to run at 12:30 AM. Ideally you want to schedule it at a period where you have the longest period absent of collections and other interruptions. For example, if your collections run from 10:00 PM to 3:00 AM, a good time to schedule database statistics to run would be 3:15 AM. Assuming your work day begins at 8:00 AM, that gives database statistics a period of 6 hours and 45 minutes to complete the run time, which is higher than the average expected run time, so it should complete regularly each night.


  • Raw Data Set Retention


The number of runs worth of raw data maintained is determined by the CollectorRawDataSetsToKeep value in the T_SYSTEM_SETTINGS table. There is a different KB article that goes into more details on this topic (KB 000030107). The short answer is that the minimum recommended setting for this value is 2. You can check the value by running the below SQL:

SELECT VALUE FROM T_SYSTEM_SETTINGS WHERE PARAMETER = 'CollectorRawDataSetsToKeep';

You can change the setting to the recommended value of 2 with the SQL command below:

UPDATE T_SYSTEM_SETTINGS SET VALUE = '2' WHERE PARAMETER = 'CollectorRawDataSetsToKeep'; 
COMMIT;


  • Shrink the T_RAW_* and T_DC_SOURCEDATA_* Tables


After the purge process has cleaned up excess rows from the raw data tables, you can shrink the tables to reclaim space. This is only necessary if significant clean up (4+ runs) was needed. The below should only be run when the app-server is offline.

ALTER TABLE {TABLENAME} ENABLE ROW MOVEMENT;
ALTER TABLE {TABLENAME} SHRINK SPACE;
ALTER TABLE {TABLENAME} DISABLE ROW MOVEMENT;

Attachments

    Outcomes