SecurID® Governance & Lifecycle Blog

New Feature:  Workflow System Status

SeanMiller1
Moderator

We have all been driving when a light comes on in the dashboard.  Sometimes it is a simple orange light, like the one for windshield washer fluid.  You should top that up, but you can likely keep driving without harm (unless you can no longer see the road).  The dashboard might similarly show an orange check engine light.  This usually means you need to get your car into the shop, but it isn't an immediate concern.  Alternatively, the same light might show red, telling you that a serious problem has occurred in your engine and you need to stop driving now.  In the recent RSA Identity Governance and Lifecycle 7.1 release, we have introduced a similar concept focusing on workflow system status.

The Admin > Workflow > Monitoring page shows a real-time view of the workflow system status.  This includes graphs for how hard the system is working (Number of Items Serviced), whether anything is backing up (Queue Size), and system status indicators.  The status indicators only show if there is an issue.  Not only do they surface that there is a problem, they generally offer a means to resolve it or at least to get more details.  A status indicator shows a hand cursor if you can click it for more information on resolving the issue.  In addition to the visual indicators, the system sends out admin errors with the appropriate status and information.  Administrators can configure Notification rules to email these events to the appropriate administrator.

The system is configured to monitor the following conditions and surface workflow status indicators.

Verification (Count)

This status indicator counts the changes pending verification that are older than one month but less than 12 months old.

Thresholds

  1. Warning - 100 changes
  2. Error - 500 changes
  3. Critical - 1000 changes
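
The thresholds above can be read as a simple severity ladder. The following is a minimal Python sketch of that reading, not product code: the function name, the level strings, and the assumption that a level is reached when the count is at or above its threshold are all illustrative, since the post does not say whether the comparison is inclusive.

```python
# Illustrative only: names and the inclusive (>=) comparison are assumptions,
# not taken from the product.
WARNING_THRESHOLD = 100
ERROR_THRESHOLD = 500
CRITICAL_THRESHOLD = 1000


def classify_pending_verifications(count: int) -> str:
    """Map a count of pending verifications to a status level."""
    # Check the most severe level first so the highest match wins.
    if count >= CRITICAL_THRESHOLD:
        return "CRITICAL"
    if count >= ERROR_THRESHOLD:
        return "ERROR"
    if count >= WARNING_THRESHOLD:
        return "WARNING"
    return "OK"
```

Under these assumptions, 250 pending verifications would surface as a warning and 1,200 as critical.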

Resolution

This status indicator allows you to click through to a screen that shows the changes that the system is trying to verify.  The verifications will be dealt with by future collections, or an administrator can choose to cancel a change here to remove the verification.

Verification (Age)

This status indicator determines if there are any changes pending verification that are older than n months.

Thresholds

  1. Warning - no warning by default
  2. Error - There are changes older than 6 months that haven't been verified
  3. Critical - There are changes older than 12 months that haven't been verified

Resolution

This status indicator allows you to click through to a screen that shows the changes that the system is trying to verify.  The verifications will be dealt with by future collections, or an administrator can choose to cancel a change here to remove the verification.

Queue Backup

This is a series of status indicators (one for each priority queue type) that will show if work is backing up in that queue.

Thresholds

  1. Warning - 1000 ms (1 second) by default
  2. Error - 2*60*1000 ms (2 minutes) by default
  3. Critical - 4*60*1000 ms (4 minutes) by default
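
The default values are written as products of milliseconds, so it may help to work them out. The short Python sketch below does the arithmetic and then classifies a measured duration. It is illustrative only: the constant and function names are hypothetical, and the inclusive comparison is an assumption rather than documented behaviour.

```python
# Default thresholds as given in the post, expressed in milliseconds.
WARNING_MS = 1000             # 1 second
ERROR_MS = 2 * 60 * 1000      # 120,000 ms = 2 minutes
CRITICAL_MS = 4 * 60 * 1000   # 240,000 ms = 4 minutes


def classify_queue_duration(duration_ms: int) -> str:
    """Map a measured duration in milliseconds to a status level.

    Assumption: a level applies once the duration reaches its threshold.
    """
    if duration_ms >= CRITICAL_MS:
        return "CRITICAL"
    if duration_ms >= ERROR_MS:
        return "ERROR"
    if duration_ms >= WARNING_MS:
        return "WARNING"
    return "OK"
```

With these defaults, a value of 7,900 ms would fall in the warning band, well short of the 120,000 ms needed for an error.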

Stalled Workflows

This status indicator determines if there are any workflows marked as stalled.

Thresholds

  1. Warning - 0
  2. Error - 50
  3. Critical - 100

Workflows should never be marked as stalled, so even one is considered a warning.
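
Because the Warning threshold is 0 and a single stalled workflow already counts as a warning, these thresholds read most naturally as "more than" rather than "at least". The Python sketch below encodes that reading. The function name and the strict comparison are assumptions for illustration; whether Error begins at exactly 50 or at 51 is not stated in the post.

```python
def classify_stalled_workflows(stalled_count: int) -> str:
    """Map a count of stalled workflows to a status level.

    Assumption: strict ">" comparisons, so a Warning threshold of 0
    fires on the first stalled workflow and stays quiet at zero.
    """
    if stalled_count > 100:
        return "CRITICAL"
    if stalled_count > 50:
        return "ERROR"
    if stalled_count > 0:
        return "WARNING"
    return "OK"
```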

Resolution

This status indicator allows you to click through to see the stalled workflow jobs.   In general, a stalled workflow needs to be examined more closely to see if there is some flaw in the business logic.  A stalled workflow indicates something took longer than expected.  From this screen you can also evaluate the workflow(s) to see if they can proceed. 

Database Connections

Thresholds

  1. Critical - Any exception thrown by the workflow engine indicating that it can no longer communicate with the database

Resolution

Clicking this status indicator icon opens a dialog where an administrator can check whether the workflow engine can communicate with the database.  If the connection is successful, the status indicator is cleared and an admin error is logged for the change of status.

For more information on this feature – please check out Video Link : 32182 

4 Comments
ChrisPope
Frequent Contributor

We are testing 7.1 Patch 03 and are getting dozens of Admin Errors of Type = "System Status Event" and Description = "Queue Depth".  Mostly they have Detail values that start with:

- OK: Monitor[actionq#Role]

- WARNING: Monitor[actionq#Role]

- OK: Monitor[actionq#Normal]

- WARNING: Monitor[actionq#Normal].

 

I followed the instructions found here https://community.rsa.com/docs/DOC-97453 and added WF_QUEUE_DEPTH_DURATION_WARNING with a value of 7900, which is higher than the value in any of the errors.

The Admin Errors are still being generated for values less than the 7900 threshold, so this doesn't seem to work.  Now what?  What am I missing?

Also, is there any way to stop the "OK" messages completely?  I don't understand why we would want dozens of "Admin Errors" telling us everything is OK.

MHelmy
Moderator

I would recommend opening a support case to check this behaviour. It's hard to give an answer on such a situation here without the proper logs and an investigation by support.

EmreGuloglu1
Beginner

Facing the same issue. Even though the value is set to 10k, I keep getting "OK: Monitor[actionq#CustomNode] has taken 0 ms to process an item". Could you please share the solution, if you have found one?

ChrisPope
Frequent Contributor

I opened a case with RSA and after several months received a partial solution.

For the Warning messages, the actual format for the entry in Admin > System is:

custom.WF_QUEUE_DEPTH_DURATION_WARNING

I have received no response explaining the purpose of the "OK" Admin Errors, nor have I received a way to stop them from happening.

It appears to me that this entire "feature" was not thought through very well.  It is very annoying...as is the lack of response from RSA.