000037495 - Queue depth admin errors in RSA Identity Governance & Lifecycle

Document created by RSA Customer Support Employee on Jun 6, 2019. Last modified by RSA Customer Support Employee on Jun 17, 2019.
Version 3

Article Content

Article Number: 000037495
Applies To: RSA Product Set: Identity Governance & Lifecycle
RSA Version/Condition: 7.1.x

 
Issue

Emails are being generated for Queue Depth admin errors. For example:



A new admin error has been generated.
The error requires the attention of AveksaAdmin.

Admin error Details:

Type: SystemStatusEvent
Description: Queue Depth
Date Generated: 3/27/19 4:14 PM
Priority: High



 


The /home/oracle/wildfly-10.1.0.Final/standalone/log/aveksaServer.log file contains messages like these:


 


03/27/2018 15:21:22.442 WARN  (Worker_eventq#Event Queue#WPDS_16471) [com.aveksa.server.workflow.statistics.QueueDepthProcessor] WARNING: Monitor[alertq#Normal] has taken 18886 ms to process an item. please check logs for details.

03/27/2018 15:29:11.669 WARN  (Worker_eventq#Event Queue#WPDS_16471) [com.aveksa.server.workflow.statistics.QueueDepthProcessor] ERROR: Monitor[actionq#Role] has taken 141751 ms to process an item. please check logs for details.

03/27/2018 15:37:12.223 WARN  (Worker_eventq#Event Queue#WPDS_16468) [com.aveksa.server.workflow.statistics.QueueDepthProcessor] CRITICAL: Monitor[jobq#Normal] has taken 505538 ms to process an item. please check logs for details.
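
To gauge how often each queue is reporting, messages like the ones above can be extracted from the log with a short script. The following is a minimal sketch, not an RSA-supplied tool: the regular expression is derived only from the three sample lines shown here, and the function names parse_queue_depth and summarize are hypothetical. Other versions or patch levels may format these messages differently.

```python
import re
from collections import Counter

# Pattern derived from the sample log lines in this article (an assumption:
# other builds may format the message differently).
QUEUE_DEPTH_RE = re.compile(
    r"\[com\.aveksa\.server\.workflow\.statistics\.QueueDepthProcessor\]\s+"
    r"(?P<level>WARNING|ERROR|CRITICAL): "
    r"Monitor\[(?P<queue>[^\]]+)\] has taken (?P<ms>\d+) ms"
)


def parse_queue_depth(lines):
    """Yield (level, queue, duration_ms) for each Queue Depth message found."""
    for line in lines:
        match = QUEUE_DEPTH_RE.search(line)
        if match:
            yield match["level"], match["queue"], int(match["ms"])


def summarize(lines):
    """Count messages per (level, queue) and record the longest duration per queue."""
    counts = Counter()
    longest = {}
    for level, queue, ms in parse_queue_depth(lines):
        counts[(level, queue)] += 1
        longest[queue] = max(ms, longest.get(queue, 0))
    return counts, longest
```

To use it, open the aveksaServer.log file named above and pass the file object to summarize; lines that are not Queue Depth messages are skipped.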


These messages may be filling up the aveksaServer.log.

Cause

Automated workflow tasks run in queues. These messages indicate that the workflow queues are backed up and that workflow activities have been waiting in them longer than the notification thresholds allow.


Depending on the length of time workflow activities have been waiting in the queue, the message is labeled WARNING, ERROR, or CRITICAL, where CRITICAL indicates the longest wait of the three types.

Common causes are long-running SQL queries, looping workflows, and/or database locks.



 
Resolution
Long waits in the Workpoint queues can result in performance degradation.


When these messages appear, look at the system as a whole. Check for system issues such as long-running queries or database locks, and review your workflow implementation for issues such as looping workflows. If the messages coincide with a one-time activity such as an upgrade or a patch, they may be ignored.

Queues may be monitored from the RSA Identity Governance & Lifecycle User Interface (Admin > Workflow > Monitoring).

For further help, please contact RSA Customer Support.
Workaround
The wait time thresholds for sending the notifications are controlled by three variables that were introduced along with the queue monitoring functionality. Each variable holds the length of time a workflow task is allowed to sit in a queue before a notification is sent out and logged to the aveksaServer.log. A separate threshold can be set for warning, error, and critical notifications. The variables and their default values are:
 


WF_QUEUE_DEPTH_DURATION_WARNING (default=3000 ms)
WF_QUEUE_DEPTH_DURATION_ERROR (default=120000 ms)
WF_QUEUE_DEPTH_DURATION_CRITICAL (default=240000 ms)
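
As an illustration of how these thresholds relate to the labels in the log, the sketch below classifies a wait time against the three default values. It rests on two assumptions that this article does not confirm: that the values are in milliseconds, and that a notification is raised once the wait time reaches a threshold (the product may use a strict comparison instead). The names DEFAULT_THRESHOLDS_MS and classify are hypothetical and are not part of the product.

```python
# Default values listed in this article, assumed here to be milliseconds.
DEFAULT_THRESHOLDS_MS = {
    "WARNING": 3000,
    "ERROR": 120000,
    "CRITICAL": 240000,
}


def classify(duration_ms, thresholds=DEFAULT_THRESHOLDS_MS):
    """Return the most severe level whose threshold the duration reaches, or None."""
    label = None
    # Walk the levels from the lowest threshold to the highest so the last
    # match is the most severe one. The >= comparison is an assumption.
    for level, limit in sorted(thresholds.items(), key=lambda item: item[1]):
        if duration_ms >= limit:
            label = level
    return label
```

With the defaults, the three durations in the sample log messages (18886, 141751, and 505538 ms) classify as WARNING, ERROR, and CRITICAL respectively, which matches the labels in those messages.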
 


To reduce the amount of logging to the aveksaServer.log and the number of emails generated by these notifications, the default thresholds may need to be increased to values that are more meaningful for your system's usage. There is no "right" value to set them to. The goal is a high water mark just beyond what your regular usage should be hitting. If you are seeing this message because of a single item with an extremely long wait, the message is doing its job correctly. Adjust the thresholds only if you are being flooded with warnings.

To determine the best values for your implementation, in the RSA Identity Governance & Lifecycle User Interface, go to Admin > Admin Errors and add the Details column to the table. Then search the table for the string queue. This should bring up all the recent hits, and the Details column shows the values, in milliseconds (ms), that are triggering the notifications. If you are seeing ERROR and CRITICAL messages as well as WARNING messages, those thresholds may need to be adjusted too.
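
Once the routine wait times have been collected from the Details column, one possible way to turn them into a candidate threshold is to take the highest routine value and add some headroom. The sketch below is only an illustration of that idea, not an RSA recommendation: the function name suggest_threshold, the 25% headroom, and the rounding to the next 1000 ms are all assumptions to adjust for your environment. Leave genuine outliers out of the input, since those are the cases the notification is meant to catch.

```python
import math


def suggest_threshold(routine_durations_ms, headroom=1.25, round_to=1000):
    """Suggest a threshold just above the highest routine wait time.

    headroom and round_to are illustrative defaults, not product settings.
    """
    if not routine_durations_ms:
        raise ValueError("at least one observed duration is required")
    high_water_mark = max(routine_durations_ms)
    # Add headroom, then round up to the next multiple of round_to.
    return math.ceil(high_water_mark * headroom / round_to) * round_to
```

For example, routine waits of 18886, 9500, and 12040 ms give a suggested warning threshold of 24000 ms. Repeat the calculation with the routine ERROR and CRITICAL values if those notifications are also too frequent.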


 


Once you have determined a meaningful new high water mark, in the RSA Identity Governance & Lifecycle User Interface, go to Admin > System > Edit and add the following three parameters to the Custom section at the bottom of the screen, replacing X, Y, and Z with the values you have determined are best for your implementation.


 


 WF_QUEUE_DEPTH_DURATION_WARNING      >> X
 WF_QUEUE_DEPTH_DURATION_ERROR          >> Y
 WF_QUEUE_DEPTH_DURATION_CRITICAL       >> Z
 

You do not need to preface these parameters with the custom tag. This happens automatically once you save the settings.


