000034993 - One of the server nodes in RSA Identity Management and Governance 6.9.1 in a WebSphere clustered environment keeps crashing due to 'ORA-02049 db locks' error and 'wpEventQueue not available'

Document created by RSA Customer Support Employee on Apr 3, 2017
Version 1

Article Content

Article Number: 000034993
Applies To:
RSA Product Set: RSA Identity Management and Governance
RSA Product/Service Type: Enterprise Software
RSA Version/Condition: 6.9.1
Platform: IBM WebSphere
Issue: One node (for example, App2) of the three nodes in a WebSphere clustered environment has database locking issues (blocked sessions and row contention) that result in a production outage. However, when that server is down, the other two servers in the cluster run fine.
All nodes in the WebSphere clustered environment share the same wpBus configuration and Message store configuration (except for the directory path).
The Message store type used across all nodes in the cluster is File store. See the configuration below:
wpBus Configuration:
Initial State: Started
Message store type: File store
High message threshold per message point: 50000 messages
Default blocked destination retry interval: 5000 milliseconds
Message Store:
Log size: 500 MB
Minimum permanent store size: 500 MB (Unlimited permanent store size)
Maximum permanent store size: 500 MB
Minimum temporary store size: 500 MB (Unlimited temporary store size)
Maximum temporary store size: 500 MB


Issue Symptoms


  • When the problematic node (App2) is started, it tries to replay all the transactions it has stored. The moment it comes up, it attempts to replay all 50,000 messages in its queue, most of which have likely already been picked up by the other application servers long ago.
  • Because of this, the database locks thousands of records. When the other application servers (e.g., App1 and App3) then try to get a row, the table is locked and they cannot get it.
  • Because the SQL queries performed by App2 never complete or clear out, the error "ORA-02049: timeout: distributed transaction waiting for lock" is seen in App2's SystemOut.log.
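While the lock contention is occurring, the blocked and blocking sessions can be confirmed on the database side. The sketch below builds a query against Oracle's standard dynamic performance view `gv$session` (the `blocking_session` and `seconds_in_wait` columns are part of that view); running it, for example via `sqlplus` as a DBA user, is left to your environment.

```shell
# Minimal sketch: a query to list sessions currently blocked by another
# session, longest waiters first. Run it as a DBA user, e.g.:
#   echo "$BLOCKERS_SQL" | sqlplus -s / as sysdba
BLOCKERS_SQL="
SELECT sid, serial#, blocking_session, seconds_in_wait, event
FROM   gv\$session
WHERE  blocking_session IS NOT NULL
ORDER  BY seconds_in_wait DESC;"
echo "$BLOCKERS_SQL"
```

In the scenario described here, the blocking sessions would be the ones opened by App2's message replay, with App1 and App3 sessions appearing as the waiters.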
Aside from database lock contention, the errors below in the log files are also symptoms of the issue:


Crashing node


The aveksaServer.log file for the problematic node will have the following error:
 


09/15/2016 10:40:24.309 ERROR (SIBJMSRAThreadPool : 8) [com.aveksa.server.message.MessageSubscriber] Listener threw during Message notification, for listener AuthorizationServiceProvider
java.lang.RuntimeException: Illegal TXN State: Cannot commit once a rollback begins. Txn count=1

at com.aveksa.server.db.persistence.PersistenceServiceProvider.commitTransaction(PersistenceServiceProvider.java:2536)
at com.aveksa.server.db.persistence.PersistenceServiceProvider.commitTransaction(PersistenceServiceProvider.java:2526)
at com.aveksa.server.db.persistence.PersistenceServiceProvider.closeJDBCQuery(PersistenceServiceProvider.java:3250)
at com.aveksa.server.db.persistence.PersistenceServiceProvider.executeJDBCQueryInteger(PersistenceServiceProvider.java:3374)
at com.aveksa.server.db.PersistenceManager.executeJDBCQueryInteger(PersistenceManager.java:496)
at com.aveksa.server.authorization.XXAuthorizationServiceProvider.updateImplicitBusinessSourceOwnerEntitlements(XXAuthorizationServiceProvider.java:2009)
at com.aveksa.server.authorization.XXAuthorizationServiceProvider.internalRefreshAuthorizationData(XXAuthorizationServiceProvider.java:2650)
at com.aveksa.server.authorization.XXAuthorizationServiceProvider.notifyMessage(XXAuthorizationServiceProvider.java:2581)
at com.aveksa.server.message.MessageSubscriberProvider.distributeMessage(MessageSubscriberProvider.java:78)
at com.aveksa.server.message.SubscriberMDB.onMessage(SubscriberMDB.java:78)
at com.ibm.ejs.container.WASMessageEndpointHandler.invokeJMSMethod(WASMessageEndpointHandler.java:138)
at com.ibm.ws.ejbcontainer.mdb.MessageEndpointHandler.invokeMdbMethod(MessageEndpointHandler.java:1146)
at com.ibm.ws.ejbcontainer.mdb.MessageEndpointHandler.invoke(MessageEndpointHandler.java:844)
at com.sun.proxy.$Proxy27.onMessage(Unknown Source)
at com.ibm.ws.sib.api.jmsra.impl.JmsJcaEndpointInvokerImpl.invokeEndpoint(JmsJcaEndpointInvokerImpl.java:233)
at com.ibm.ws.sib.ra.inbound.impl.SibRaDispatcher.dispatch(SibRaDispatcher.java:919)
at com.ibm.ws.sib.ra.inbound.impl.SibRaSingleProcessListener$SibRaWork.run(SibRaSingleProcessListener.java:592)
at com.ibm.ejs.j2c.work.WorkProxy.run(WorkProxy.java:668)
at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1865)
09/13/2016 00:05:19.472 ERROR (WebContainer : 35) [com.aveksa.UI] com.aveksa.gui.pages.admin.workflow.workitem.WorkflowWorkItemPageData.<init>(WorkflowWorkItemPageData.java:84) -
com.aveksa.server.workflow.WorkflowServiceException: javax.ejb.EJBTransactionRolledbackException: Transaction rolled back; nested exception is: javax.transaction.TransactionRolledbackException: Transaction is ended due to timeout

at com.aveksa.server.workflow.WorkflowWorkItem.open(WorkflowWorkItem.java:2164)
at com.aveksa.gui.objects.workflow.GuiWorkflowWorkItem.open(GuiWorkflowWorkItem.java:233)
at com.aveksa.gui.pages.admin.workflow.workitem.WorkflowWorkItemPageData.<init>(WorkflowWorkItemPageData.java:82)
at sun.reflect.GeneratedConstructorAccessor212.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:39)
at java.lang.reflect.Constructor.newInstance(Constructor.java:527)
at com.aveksa.gui.pages.PageManager.makeNewPage(PageManager.java:491)
at com.aveksa.gui.pages.PageManager.handleRequest(PageManager.java:344)
at com.aveksa.gui.pages.PageManager.handleRequest(PageManager.java:254)
at com.aveksa.gui.core.MainManager.handleRequest(MainManager.java:176)
at com.aveksa.gui.core.MainManager.doGet(MainManager.java:125)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:575)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:668)
at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1230)
at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:779)
at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:478)
at com.ibm.ws.webcontainer.servlet.ServletWrapperImpl.handleRequest(ServletWrapperImpl.java:178)
at com.ibm.ws.webcontainer.filter.WebAppFilterChain.invokeTarget(WebAppFilterChain.java:136)
at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:97)
at com.aveksa.gui.core.filters.LoginFilter.doFilter(LoginFilter.java:67)
at com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:195)
at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:91)
at com.aveksa.gui.util.security.XSSFilter.doFilter(XSSFilter.java:20)
at com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:195)
at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:91)
at com.ibm.ws.webcontainer.filter.WebAppFilterManager.doFilter(WebAppFilterManager.java:964)
at com.ibm.ws.webcontainer.filter.WebAppFilterManager.invokeFilters(WebAppFilterManager.java:1104)
at com.ibm.ws.webcontainer.servlet.CacheServletWrapper.handleRequest(CacheServletWrapper.java:87)
at com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:914)
09/15/2016 11:08:06.833 ERROR (Worker_actionq#ActionQ1#WPDS_14) [com.aveksa.server.workflow.scripts.WorkflowContextImpl] Error Completing WorkItem: com.aveksa.server.db.PersistenceException: com.ibm.websphere.ce.cm.StaleConnectionException: No more data to read from socket
...
com.aveksa.server.workflow.WorkflowServiceException: com.aveksa.server.db.PersistenceException: com.ibm.websphere.ce.cm.StaleConnectionException: No more data to read from socket

at com.aveksa.server.workflow.scripts.WorkflowContextImpl.setCompletionInformation(WorkflowContextImpl.java:1066)
at com.aveksa.server.workflow.scripts.WorkflowContextImpl.completeWorkItem(WorkflowContextImpl.java:954)
at com.aveksa.server.workflow.scripts.WorkflowContextImpl.completeWorkItem(WorkflowContextImpl.java:897)
at com.aveksa.server.workflow.scripts.WorkflowContextImpl.completeWorkItem(WorkflowContextImpl.java:1079)
at com.aveksa.server.workflow.scripts.nodes.BaseWorkflowNode.nodeAvailableAsynchronous(BaseWorkflowNode.java:67)
at com.aveksa.server.workflow.scripts.nodes.SubprocessNode.nodeAvailableAsynchronous(SubprocessNode.java:41)
at com.aveksa.server.workflow.scripts.nodes.WorkflowNodeHandler.nodeAvailableAsynchronous(WorkflowNodeHandler.java:55)
at sun.reflect.GeneratedMethodAccessor200.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
at java.lang.reflect.Method.invoke(Method.java:611)
at com.workpoint.server.script.StatementEngineJava.execute(Unknown Source)
at com.workpoint.server.script.ScriptEngine.A(Unknown Source)
at com.workpoint.server.script.ScriptEngine.execute(Unknown Source)
at com.workpoint.server.monitor.ActionMonitorHelper.A(Unknown Source)
at com.workpoint.server.monitor.ActionMonitorHelper.execute(Unknown Source)
at com.workpoint.server.pojo.ScriptExecAsyncPvtBean.executeScriptMonitor(Unknown Source)
at com.workpoint.server.pojo.EJSRemote0SLScriptExecAsyncPvt_EJB_8b5c6ed5.executeScriptMonitor(EJSRemote0SLScriptExecAsyncPvt_EJB_8b5c6ed5.java)
at com.workpoint.server.pojo._ScriptExecAsyncPvt_Stub.executeScriptMonitor(_ScriptExecAsyncPvt_Stub.java:1)
at com.workpoint.client.Monitor.executeScriptMonitor(Unknown Source)
at com.workpoint.queue.work.ActionQWorker.A(Unknown Source)
at com.workpoint.queue.work.ActionQWorker.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:784)
Caused by:
com.aveksa.server.db.PersistenceException: com.ibm.websphere.ce.cm.StaleConnectionException: No more data to read from socket

at com.aveksa.server.db.persistence.PersistenceServiceProvider.runStoredProcedure(PersistenceServiceProvider.java:1458)
at com.aveksa.server.db.persistence.PersistenceServiceProvider.runStoredProcedure(PersistenceServiceProvider.java:1329)
at com.aveksa.server.db.PersistenceManager.runStoredProcedure(PersistenceManager.java:235)
at com.aveksa.server.workflow.scripts.WorkflowContextImpl.setCompletionInformation(WorkflowContextImpl.java:1064)
... 23 more
Caused by:
com.ibm.websphere.ce.cm.StaleConnectionException: No more data to read from socket

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:56)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:39)
at java.lang.reflect.Constructor.newInstance(Constructor.java:527)
at com.ibm.websphere.rsadapter.GenericDataStoreHelper.mapExceptionHelper(GenericDataStoreHelper.java:626)
at com.ibm.websphere.rsadapter.GenericDataStoreHelper.mapException(GenericDataStoreHelper.java:685)
at com.ibm.ws.rsadapter.AdapterUtil.mapException(AdapterUtil.java:2267)
at com.ibm.ws.rsadapter.jdbc.WSJdbcUtil.mapException(WSJdbcUtil.java:1191)
at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.execute(WSJdbcPreparedStatement.java:635)
at com.aveksa.server.db.persistence.PersistenceServiceProvider.runStoredProcedure(PersistenceServiceProvider.java:1432)
... 26 more

All nodes


Below are occurrences of the error in the aveksaServer.log for ALL nodes (including the problematic node):


09/15/2016 10:53:28.328 ERROR (SIBJMSRAThreadPool : 5) [org.hibernate.transaction.JDBCTransaction] Could not toggle autocommit
java.sql.SQLException: DSRA9350E: Operation setAutoCommit is not allowed during a global transaction.
    at com.ibm.ws.rsadapter.jdbc.WSJdbcConnection.setAutoCommit(WSJdbcConnection.java:3504)
    at org.hibernate.transaction.JDBCTransaction.toggleAutoCommit(JDBCTransaction.java:224)
    at org.hibernate.transaction.JDBCTransaction.rollbackAndResetAutoCommit(JDBCTransaction.java:216)
    at org.hibernate.transaction.JDBCTransaction.rollback(JDBCTransaction.java:192)
    at com.aveksa.server.db.persistence.PersistenceServiceProvider.cleanTransaction(PersistenceServiceProvider.java:2593)
    at com.aveksa.server.db.persistence.PersistenceServiceProvider.cleanTransaction(PersistenceServiceProvider.java:2569)
    at com.aveksa.server.db.PersistenceManager.cleanTransaction(PersistenceManager.java:416)
    at com.aveksa.server.workflow.scripts.split.ContextObject.getChangeRequestItemIds(ContextObject.java:1034)
    at com.aveksa.server.workflow.scripts.action.ConditionAction.evaluateChangeRequestConditions(ConditionAction.java:133)
    at com.aveksa.server.workflow.scripts.action.ConditionAction.evaluate(ConditionAction.java:76)
    at sun.reflect.GeneratedMethodAccessor184.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:611)
    at com.workpoint.server.script.StatementEngineJava.execute(Unknown Source)
    at com.workpoint.server.script.ScriptEngine.A(Unknown Source)
    at com.workpoint.server.script.ScriptEngine.execute(Unknown Source)
    at com.workpoint.server.script.ScriptEngine.execute(Unknown Source)
    at com.workpoint.server.job.JobNode.changeState(Unknown Source)
    at com.workpoint.server.job.JobNode.changeState(Unknown Source)
    at com.workpoint.server.job.JobNode.isComplete(Unknown Source)
    at com.workpoint.server.job.Job.changeWorkItemState(Unknown Source)
    at com.workpoint.server.pojo.ServerAutomatedActivityMDBean.autoCompleteWorkItems(Unknown Source)
    at com.workpoint.server.pojo.ServerAutomatedActivityMDBean.autoCompleteWorkItems(Unknown Source)
    at com.workpoint.server.pojo.ServerAutomatedActivityMDBean.onMessage(Unknown Source)
    at com.ibm.ejs.container.WASMessageEndpointHandler.invokeJMSMethod(WASMessageEndpointHandler.java:138)
    at com.ibm.ws.ejbcontainer.mdb.MessageEndpointHandler.invokeMdbMethod(MessageEndpointHandler.java:1146)
    at com.ibm.ws.ejbcontainer.mdb.MessageEndpointHandler.invoke(MessageEndpointHandler.java:844)
    at com.sun.proxy.$Proxy27.onMessage(Unknown Source)
    at com.ibm.ws.sib.api.jmsra.impl.JmsJcaEndpointInvokerImpl.invokeEndpoint(JmsJcaEndpointInvokerImpl.java:233)
    at com.ibm.ws.sib.ra.inbound.impl.SibRaDispatcher.dispatch(SibRaDispatcher.java:919)
    at com.ibm.ws.sib.ra.inbound.impl.SibRaSingleProcessListener$SibRaWork.run(SibRaSingleProcessListener.java:592)
    at com.ibm.ejs.j2c.work.WorkProxy.run(WorkProxy.java:668)
    at com.ibm.ws.util.ThreadPool$Worker.run(ThreadPool.java:1865)
09/15/2016 10:53:28.330 ERROR (SIBJMSRAThreadPool : 5) [org.hibernate.transaction.JDBCTransaction] JDBC rollback failed
java.sql.SQLException: DSRA9350E: Operation Connection.rollback is not allowed during a global transaction.
    at com.ibm.ws.rsadapter.jdbc.WSJdbcConnection.rollback(WSJdbcConnection.java:3350)
    at org.hibernate.transaction.JDBCTransaction.rollbackAndResetAutoCommit(JDBCTransaction.java:213)
    at org.hibernate.transaction.JDBCTransaction.rollback(JDBCTransaction.java:192)
    at com.aveksa.server.db.persistence.PersistenceServiceProvider.cleanTransaction(PersistenceServiceProvider.java:2593)
    at com.aveksa.server.db.persistence.PersistenceServiceProvider.cleanTransaction(PersistenceServiceProvider.java:2569)
    at com.aveksa.server.db.PersistenceManager.cleanTransaction(PersistenceManager.java:416)
    at com.aveksa.server.workflow.scripts.split.ContextObject.getChangeRequestItemIds(ContextObject.java:1034)
    at com.aveksa.server.workflow.scripts.action.ConditionAction.evaluateChangeRequestConditions(ConditionAction.java:133)
    at com.aveksa.server.workflow.scripts.action.ConditionAction.evaluate(ConditionAction.java:76)
    at sun.reflect.GeneratedMethodAccessor184.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:611)
    at com.workpoint.server.script.StatementEngineJava.execute(Unknown Source)
    at com.workpoint.server.script.ScriptEngine.A(Unknown Source)
    at com.workpoint.server.script.ScriptEngine.execute(Unknown Source)
    at com.workpoint.server.script.ScriptEngine.execute(Unknown Source)
    at com.workpoint.server.job.JobNode.changeState(Unknown Source)
    at com.workpoint.server.job.JobNode.changeState(Unknown Source)
    at com.workpoint.server.job.JobNode.isComplete(Unknown Source)
    at com.workpoint.server.job.Job.changeWorkItemState(Unknown Source)
    at com.workpoint.server.pojo.ServerAutomatedActivityMDBean.autoCompleteWorkItems(Unknown Source)
    at com.workpoint.server.pojo.ServerAutomatedActivityMDBean.autoCompleteWorkItems(Unknown Source)
    at com.workpoint.server.pojo.ServerAutomatedActivityMDBean.onMessage(Unknown Source)
    at com.ibm.ejs.container.WASMessageEndpointHandler.invokeJMSMethod(WASMessageEndpointHandler.java:138)
    at com.ibm.ws.ejbcontainer.mdb.MessageEndpointHandler.invokeMdbMethod(MessageEndpointHandler.java:1146)
    at com.ibm.ws.ejbcontainer.mdb.MessageEndpointHandler.invoke(MessageEndpointHandler.java:844)
    at com.sun.proxy.$Proxy27.onMessage(Unknown Source)
    at com.ibm.ws.sib.api.jmsra.impl.JmsJcaEndpointInvokerImpl.invokeEndpoint(JmsJcaEndpointInvokerImpl.java:233)
    at com.ibm.ws.sib.ra.inbound.impl.SibRaDispatcher.dispatch(SibRaDispatcher.java:919)
    at com.ibm.ws.sib.ra.inbound.impl.SibRaSingleProcessListener$SibRaWork.run(SibRaSingleProcessListener.java:592)
    at com.ibm.ejs.j2c.work.WorkProxy.run(WorkProxy.java:668)
    at com.ibm.ws.util.ThreadPool


WebSphere SystemOut.log


Below are occurrences of the error in WebSphere's SystemOut.log file for the problematic node, which keeps crashing:
 
[15/09/16 11:33:38:144 EST] 0000013f SystemOut     O 2016-09-15 11:33:38,144 [Thread-137] INFO  com.workpoint.server.ServerProperties  - ServerProperties.setProperty() invoked for property= calculated.db.offset.millis, value=0
[15/09/16 11:33:39:775 EST] 0000013e SibMessage    I   [:] CWSIP0555W: The Remote Message Point on ME App2Node01.App2-wpBus for destination wpEventQueue, localized at 1CB1FB9AB589D497 has reached its message depth high threshold.
[15/09/16 11:33:44:793 EST] 0000013e SystemOut     O 2016-09-15 11:33:44,793 [Thread-136] ERROR com.workpoint.server.monitor.MonitorHelper  - Exception occurred attempting to queue a Monitor Started Message for monitor type actionq#ActionQ1#881
javax.jms.JMSException: CWSIA0067E: An exception was received during the call to the method JmsMsgProducerImpl.sendMessage (#4): com.ibm.ws.sib.processor.exceptions.SIMPLimitExceededException: CWSIK0025E: The destination wpEventQueue on messaging engine App2Node01.App2-wpBus is not available because the high limit for the number of messages for this destination has already been reached..
    at com.ibm.ws.sib.api.jms.impl.JmsMsgProducerImpl.sendMessage(JmsMsgProducerImpl.java:1346)
    at com.ibm.ws.sib.api.jms.impl.JmsMsgProducerImpl.send(JmsMsgProducerImpl.java:736)
    at com.workpoint.common.util.JMSUtils.sendQueueMessage(Unknown Source)
    at com.workpoint.common.util.JMSUtils.sendQueueMessage(Unknown Source)
    at com.workpoint.server.monitor.MonitorHelper.monitorStarted(Unknown Source)
    at com.workpoint.server.pojo.MonitorPvtBean.addMonitor(Unknown Source)
    at com.workpoint.server.pojo.EJSRemote0SLMonitorPvt_EJB_65486d32.addMonitor(EJSRemote0SLMonitorPvt_EJB_65486d32.java)
    at com.workpoint.server.pojo._MonitorPvt_Stub.addMonitor(_MonitorPvt_Stub.java:1)
    at com.workpoint.client.Monitor.addMonitor(Unknown Source)
    at com.workpoint.queue.core.QMonitor.H(Unknown Source)
    at com.workpoint.queue.core.QMonitor.startMonitor(Unknown Source)
    at com.workpoint.queue.WpQMonitors$_A.run(Unknown Source)
    at java.util.Timer$TimerImpl.run(Timer.java:296)
Caused by:
com.ibm.ws.sib.processor.exceptions.SIMPLimitExceededException: CWSIK0025E: The destination wpEventQueue on messaging engine App2Node01.App2-wpBus is not available because the high limit for the number of messages for this destination has already been reached.
    at com.ibm.ws.sib.processor.impl.PtoPInputHandler.checkHandlerAvailable(PtoPInputHandler.java:2969)
    at com.ibm.ws.sib.processor.impl.PtoPInputHandler.internalHandleMessage(PtoPInputHandler.java:532)
    at com.ibm.ws.sib.processor.impl.PtoPInputHandler.handleProducerMessage(PtoPInputHandler.java:283)
    at com.ibm.ws.sib.processor.impl.ProducerSessionImpl.send(ProducerSessionImpl.java:643)
    at com.ibm.ws.sib.api.jms.impl.JmsMsgProducerImpl.sendMessage(JmsMsgProducerImpl.java:1277)
... 12 more

[15/09/16 11:53:45:146 EST] 00000172 SystemOut     O 2016-09-15 11:53:45,145 [Worker_alertq#AlertQ1#WPDS_2] ERROR com.workpoint.server.recordset.SmartStatement  - SQLException caught
java.sql.SQLSyntaxErrorException: ORA-02049: timeout: distributed transaction waiting for lock
    at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:440)
    at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
    at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:837)
    at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:445)
    at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:191)
    at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:523)
    at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:207)
    at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:1010)
    at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1315)
    at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3576)
    at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:3657)
    at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeUpdate(OraclePreparedStatementWrapper.java:1350)
    at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.pmiExecuteUpdate(WSJdbcPreparedStatement.java:1187)
    at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.executeUpdate(WSJdbcPreparedStatement.java:804)
    at com.workpoint.server.recordset.SmartStatement.executeUpdate(Unknown Source)
    at com.workpoint.server.recordset.RecordSet.executeUpdate(Unknown Source)
    at com.workpoint.server.recordset.WP_PROCI_CONTROL.update(Unknown Source)
    at com.workpoint.server.job.Job.touchRootParent(Unknown Source)
    at com.workpoint.server.job.Job.touchRootParent(Unknown Source)
    at com.workpoint.server.pojo.JobUpdatePvtBean.evaluateData(Unknown Source)
    at com.workpoint.server.pojo.EJSRemote0SLJobUpdatePvt_EJB_5fb93aca.evaluateData(EJSRemote0SLJobUpdatePvt_EJB_5fb93aca.java)
    at com.workpoint.server.pojo._JobUpdatePvt_Stub.evaluateData(_JobUpdatePvt_Stub.java:1)
    at com.workpoint.client.Job.evaluate(Unknown Source)
    at     com.aveksa.server.workflow.RetryEnabledOperationsJobUtils$JobEvaluateStrategy.execute(RetryEnabledOperationsJobUtils.java:345)
    at     com.aveksa.server.workflow.RetryEnabledOperationsJobUtils.executeJobStrategyWithProcessing(RetryEnabledOperationsJobUtils.java:169)
    at com.aveksa.server.workflow.RetryEnabledOperationsJobUtils.evaluateWithProcessing(RetryEnabledOperationsJobUtils.java:137)
    at com.aveksa.server.workflow.scripts.nodes.EscalationHandler.createEscalationNode(EscalationHandler.java:294)
    at com.aveksa.server.workflow.scripts.nodes.EscalationHandler.escalate(EscalationHandler.java:113)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:611)
    at com.workpoint.server.script.StatementEngineJava.execute(Unknown Source)
    at com.workpoint.server.script.ScriptEngine.A(Unknown Source)
    at com.workpoint.server.script.ScriptEngine.execute(Unknown Source)
    at com.workpoint.server.monitor.AlertMonitorHelper.A(Unknown Source)
    at com.workpoint.server.monitor.AlertMonitorHelper.A(Unknown Source)
    at com.workpoint.server.monitor.AlertMonitorHelper.A(Unknown Source)
    at com.workpoint.server.monitor.AlertMonitorHelper.doExecute(Unknown Source)
    at com.workpoint.server.monitor.AlertMonitorHelper.execute(Unknown Source)
    at com.workpoint.server.pojo.AlertPvtBean.executeAlertMonitor(Unknown Source)
    at com.workpoint.server.pojo.EJSRemote0SLAlertPvt_EJB_0fb3523f.executeAlertMonitor(EJSRemote0SLAlertPvt_EJB_0fb3523f.java)
    at com.workpoint.server.pojo._AlertPvt_Stub.executeAlertMonitor(_AlertPvt_Stub.java:1)
    at com.workpoint.client.Monitor.executeAlertMonitor(Unknown Source)
    at com.workpoint.queue.work.AlertQWorker.A(Unknown Source)
    at com.workpoint.queue.work.AlertQWorker.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:784)

Cause: The JMS queue of the problematic node was corrupted due to a bad file store.
Resolution: There is no known fix to prevent JMS queue corruption when a file store goes bad.
Workaround: As a workaround for the issue, follow the steps below:
  1. Bring down all the application servers in the WebSphere clustered environment and ensure that there are no connections to the database.
  2. Rename the file store destination folder on the problematic node (App2) and restart the problematic node. For example, rename /opt/IBM/WebSphere/AppServer/profiles/AppSrv01/filestores/com.ibm.ws.sib/App2-wpBus-8B7CA4D08CCCB229 to App2-wpBus-8B7CA4D08CCCB229.orig<date>.
  3. After the file store destination folder is renamed and App2's application server is restarted, the queue file store destination folder is recreated automatically.
  4. Ensure that activities are going to App2 and monitor this node; i.e., there should not be a database locking issue.
  5. Bring down the application server (App2).
  6. Bring up the primary application server (App1).
  7. Ensure that the majority of the requests/activities are going to App1 and monitor this node; i.e., there should not be a database locking issue.
  8. Restart the primary application server (App1).
  9. Start up the second application server (App2).
  10. Start up the third application server (App3).
  11. Allow more requests to be processed and ensure that each of the servers receives traffic.
  12. Monitor all servers and observe that there are no database locking issues. There should not be any occurrences of the specific errors in the logs that were symptoms of the issue.
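The rename in step 2 can be scripted. The sketch below is a minimal helper, not an official procedure: the directory and store names shown in the usage comment are the example values from this article, and the `.orig<timestamp>` suffix mirrors the convention suggested above.

```shell
# rotate_filestore: rename a SIB file store folder to <name>.orig<timestamp>
# so that the messaging engine recreates a fresh store on the next restart
# (workaround step 2). Run only while the application server is stopped.
rotate_filestore() {
  dir="$1"     # file store parent dir, e.g. .../filestores/com.ibm.ws.sib
  store="$2"   # store folder name, e.g. App2-wpBus-8B7CA4D08CCCB229
  stamp=$(date +%Y%m%d%H%M%S)
  mv "$dir/$store" "$dir/$store.orig$stamp"
}

# Example (paths taken from the example in step 2 above):
# rotate_filestore /opt/IBM/WebSphere/AppServer/profiles/AppSrv01/filestores/com.ibm.ws.sib \
#                  App2-wpBus-8B7CA4D08CCCB229
```

After the node restarts and recreates the folder, the renamed copy can be kept for a while in case the corrupted store needs to be examined, then deleted.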
