000034041 - How to manage raw data cleanup in RSA Identity Governance & Lifecycle

Document created by RSA Customer Support Employee on Oct 5, 2016Last modified by RSA Customer Support on Feb 26, 2020
Version 3Show Document
  • View in full screen mode

Article Content

Article Number000034041
Applies ToRSA Product Set: RSA Identity Governance & Lifecycle
RSA Version/Condition: All
IssueRSA Identity Governance & Lifecycle maintains the last several raw data sets from each collection for the purposes of validation and troubleshooting errors. This job can be unintentionally interrupted by collections and other processes, and if it is consistently not allowed to run to completion, it can result in database growth and collection performance degradation.
TasksRSA Identity Governance & Lifecycle Support typically suggests running collections off-hours, as collections run best when there is a lower load on the server. With this in mind, many customers will schedule them to run in the early morning hours. This unfortunately can lead to contention as there are some processes that the system will schedule by default to run at that time as well. Out of the box RSA Identity Governance & Lifecycle runs the database statistics job at 12:30 AM on the server. At the end of database statistics there is a process that will purge all collectors’ raw data. The SQL query below can be used to determine the time of the last successful completion of the purge. Run this query as avuser.

PROC_NAME='Purge_And_Refresh_Raw_Tables' AND EVENT = 'END';

Ideally, this will return a recent date; however if the date is several days in the past you may have an issue with your collections interrupting the clean-up task that is purging the raw data. If the date is weeks or further in the past, you may have noticeable database bloat and performance impact.
ResolutionBelow are suggestions for managing this issue:

  • Database Statistics scheduling

The date, time and frequency the database statistics and clean-up job run can be scheduled under Collectors > Scheduling > Tasks tab (DB Statistics Refresh) > Edit. By default, database statistics are scheduled to run at 12:30 AM. Ideally you want to schedule it at a period where you have the longest period absent of collections and other interruptions. For example, if your collections run from 10:00 PM to 3:00 AM (five hours), a good time to schedule database statistics to run would be 3:15 AM. Assuming your work day begins at 8:00 AM, that gives database statistics a period of six hours and 45 minutes to complete the run time, which is higher than the average expected run time of five hours, so it should complete regularly each night.

  • Raw Data Set Retention

The number of runs worth of raw data maintained is determined by the CollectorRawDataSetsToKeep custom parameter. You may want to modify this setting to reduce the number of raw data sets to keep. Please see RSA Knowledge Base Article 000030107 -- Optimal value for CollectorRawDataSetsToKeep in RSA Identity Governance & Lifecycle for details on modifying this setting.

  • Shrink the T_RAW_* and T_DC_SOURCEDATA_* Tables

After the purge process has cleaned up excess rows from the raw data tables, you can shrink the tables to reclaim space. This is only necessary if significant clean up (4+ runs) was needed. The below SQL statements should only be run when the application server is offline. I.e. when RSA Identity Governance & Lifecycle is shutdown or in maintenance mode. Perform the following steps as SYSDBA:


For more information on managing space and shrinking tables in RSA Identity Governance & Lifecycle, please see RSA Knowledge Base Article How to manage database growth in RSA Identity Governance & Lifecycle.