In RSA Identity Governance & Lifecycle, workflow jobs or nodes with a large number of history records have been known to cause system performance issues such as 100% CPU usage and out of memory errors. Either condition can leave the system unresponsive and in need of an application restart.
An important artifact to gather when there is 100% CPU usage and/or out of memory errors is the Aveksa Statistics Report (ASR). The ASR can aid in determining whether workflow jobs or workflow nodes with large amounts of history could be the root cause of the performance issue.
To generate an ASR, in the RSA Identity Governance & Lifecycle user interface go to Admin > Diagnostics > Create Report.
There are two sections in the ASR under Workflow Information that are useful for determining if Workflows are causing CPU and memory issues. These are:
Top 10 Workflow Jobs By History – A table containing information about the top ten workflow jobs that have the largest number of history records in the database. Information includes the Job Name, Job ID, Job DB, and History Count; and
Top 10 Workflow Nodes By History – A table containing information about the top ten workflow job nodes that have the largest number of history records in the database. Information includes the Node Name, Job ID, Job DB, and History Count.
Look at the Top 10 Workflow Jobs By History section to see if there are any workflow jobs with a History Count >= 1000. A workflow job with a History Count of 1000 or more is indicative of a long-running workflow. Similarly, look at the Top 10 Workflow Nodes By History section to see if there are any nodes with a History Count >= 1000. A workflow node with a History Count of 1000 or more is indicative of a workflow job that has excessive looping. Both long-running workflows and workflows with excessive looping can be the root cause of 100% CPU usage and/or out of memory errors.
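The threshold check described above can be sketched in Python for cases where the two ASR tables have been transcribed by hand into a script. This is only an illustrative sketch: the HistoryRow type, the flag_rows function, and the sample rows are hypothetical and are not part of the ASR or of any RSA Identity Governance & Lifecycle API. Only the threshold of 1000 comes from this article.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Threshold named in this article: a History Count of 1000 or more is suspicious.
HISTORY_THRESHOLD = 1000


@dataclass
class HistoryRow:
    """One row transcribed from an ASR 'Top 10 ... By History' table (hypothetical type)."""
    name: str           # Job Name or Node Name column
    job_id: str         # Job ID column
    history_count: int  # History Count column


def flag_rows(rows: Iterable[HistoryRow],
              threshold: int = HISTORY_THRESHOLD) -> List[HistoryRow]:
    """Return rows at or above the threshold, largest History Count first."""
    flagged = [row for row in rows if row.history_count >= threshold]
    return sorted(flagged, key=lambda row: row.history_count, reverse=True)


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real report.
    sample = [
        HistoryRow("Example Approval Job", "101", 250),
        HistoryRow("Example Fulfillment Job", "102", 4200),
        HistoryRow("Example Polling Job", "103", 1000),
    ]
    for row in flag_rows(sample):
        print(f"{row.name} (Job ID {row.job_id}): {row.history_count} history records")
```

Rows flagged from the jobs table point to long-running workflows, and rows flagged from the nodes table point to excessive looping.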
Typically, long-running workflows or workflows with excessive looping are the result of poor workflow design or abnormal circumstances (e.g., polling for state every 30 seconds, or looping indefinitely rather than stopping after a reasonable threshold), which prevent the workflow from completing in an appropriate time frame.
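The design principle of stopping after a reasonable threshold, rather than polling indefinitely at a short fixed interval, can be illustrated with a generic Python sketch. RSA Identity Governance & Lifecycle workflows are not written in Python, so this is not workflow code. The poll_until function, its parameters, and the default values are assumptions chosen only to show the shape of a bounded loop with increasing delays.

```python
import time
from typing import Callable, Optional


def poll_until(check: Callable[[], bool],
               max_attempts: int = 10,
               initial_delay: float = 1.0,
               max_delay: float = 60.0,
               sleep: Callable[[float], None] = time.sleep) -> Optional[int]:
    """Call check() until it returns True or max_attempts is reached.

    Returns the attempt number that succeeded, or None if the limit was hit.
    The delay doubles after each failed attempt, capped at max_delay, so a
    slow external system is not polled at a constant short interval.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        if check():
            return attempt
        if attempt < max_attempts:
            sleep(delay)                       # wait before the next attempt
            delay = min(delay * 2, max_delay)  # back off, but never beyond the cap
    return None  # give up instead of looping forever
```

A caller that receives None can escalate or fail the task explicitly. In workflow terms, each extra pass through a loop is more history for the node, so a bounded loop also bounds the number of passes.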
For a short-term resolution to the problem, please contact RSA Identity Governance & Lifecycle Customer Support and mention this RSA Knowledge Base Article ID 000032254 for reference. The long-term resolution may require a redesign of the problematic workflow(s).