Admin Error: Description: Queue Depth
Can somebody please help me understand what this error is?
How can I resolve it?
Version: 7.1 P03
Oracle 12 - RSA Onboard
I have been working with RSA Customer Support for over two months on this without a resolution. This is simply incredible. I followed the solution here https://community.rsa.com/docs/DOC-97453 and it didn't help. After going back and forth with the support rep trying different things that didn't work, I was most recently told that I needed to add two additional entries, WF_QUEUE_DEPTH_DURATION_ERROR and WF_QUEUE_DEPTH_DURATION_CRITICAL, which have nothing to do with the Warning Admin Errors. Not surprisingly, this didn't help either.
Oh, and another thing: we also get dozens of these Admin Errors filling up the queue just to tell us everything is OK, which I really don't see the need for:
Admin Error: 107940
Error ID: 107940
Description: Queue Depth
Type: System Status Event
Created On: 2/11/19 12:51 PM
OK: Monitor[actionq#Normal] has taken 1976 ms to process an item.
Sorry for your frustration here. It is important we understand why the queue depth alerts are triggering before we look for ways to turn them off. Otherwise it is equivalent to asking for the check engine light on your dashboard to be turned off. It is usually on for a reason.
The queue depth warning/error/critical detection occurs when the workflow engine sees a lot of work backing up within the engine. It is meant to warn you that something is going on that you need to look at more closely. It is possible you just have a very busy system, in which case increasing the levels at which each detection occurs is OK. Before we do that, though, I recommend you go to the Admin->Workflow->Monitoring screen to understand which queue in particular is backing up. For example, it might be the custom node queue, which indicates you have some long-running Java or SQL nodes in your workflow that are causing concern. In that case, you probably want to examine what that logic is doing and whether it can be optimized, rather than suppress these detection messages.
If, after investigation, you are convinced the system is fine and you really do not want to see these detection messages at the current levels, you can look at increasing them. From your comments above, it is not clear whether it was properly explained to you how the detection levels work. The levels represent degrees of severity, much like the dashboard lights on your car, starting with yellow and progressing to red. Similarly, we have the levels warning, error, and critical. So if you are looking to increase the warning level, you very likely also have to increase the error and critical levels so those aren't triggering. For example, if you raise just the warning level from 200 to 1000 but the error level is still set to 800, you have a problem: the supposedly more severe alert now fires before the less severe one.
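To make the ordering rule above concrete, here is a minimal sketch (in Python, purely illustrative; the function name and example values are mine, not RSA IGL configuration keys) of a check you could run on a proposed set of detection levels before applying them:

```python
def validate_queue_depth_levels(warning: int, error: int, critical: int) -> list:
    """Return a list of problems with a proposed set of detection levels.

    The levels must stay strictly ordered (warning < error < critical);
    otherwise a more severe alert fires before a less severe one.
    """
    problems = []
    if not warning < error:
        problems.append(f"warning ({warning}) must be below error ({error})")
    if not error < critical:
        problems.append(f"error ({error}) must be below critical ({critical})")
    return problems

# Raising only the warning level past the error level inverts the ordering
# and gets flagged:
print(validate_queue_depth_levels(1000, 800, 1200))
# Raising all three while keeping them ordered passes (empty list):
print(validate_queue_depth_levels(1000, 1500, 2000))
```

The same sanity check applies whether you are adjusting the item-count thresholds or the duration-based ones (like the WF_QUEUE_DEPTH_DURATION_* entries mentioned earlier): raise all three levels together so the ordering is preserved.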
I hope this helps. Please look at why the detection messages are occurring first before suppressing.
Just yesterday I noticed that I have thousands of pending verification items for requests that were closed.
Additionally, open escalations, again for requests that were closed.
I have sent this information to RSA Support, but what is the impact if I just cancel them?
Thanks for the explanation.
Can you tell me what the format is for stopping the "OK: Monitor" Admin Error entries? We have dozens of them daily.
Cancelling them means those requests will no longer be outstanding. They will show up as cancelled in our UI, but then nothing will be waiting to try and verify those changes.
It depends on what type of monitor event it is; I need more details. In general, these are raised because you need to do something (not ignore the admin events). If it is pending verifications like the ones above, you can cancel them. If it is queue depth, it may mean you need to look at your workflows and see whether a particular node takes too long (like an inefficient query or bad Java code in a Java node).