000032859 - Internal replication error and RSA Authentication Manager 8.1 replica instance failed to sync to primary after forcing it out of sync

Document created by RSA Customer Support Employee on Jun 14, 2016
Last modified by RSA Customer Support Employee on Jul 14, 2017
Version 7

Article Content

Article Number: 000032859
Applies To:
RSA Product Set: SecurID
RSA Product/Service Type: Authentication Manager
RSA Version/Condition: 8.1
 

 
Issue
  • The replication status report in the primary's Operations Console shows the following error after the replica is forced out of sync and a resynchronization to the primary is attempted:
 
Internal replication error

 

  • The RSA Replication (Replica) service on the replica instance is shut down:
rsaadmin@am81r1:/opt/rsa/am/server> ./rsaserv status all
RSA Database Server                                        [RUNNING]
RSA Administration Server with Operations Console          [RUNNING]
RSA RADIUS Server Operations Console                       [RUNNING]
RSA Runtime Server                                         [RUNNING]
RSA RADIUS Server                                          [RUNNING]
RSA Console Server                                         [RUNNING]
RSA Replication (Replica)                                  [SHUTDOWN]

  • When the replication service is started manually, it comes up and then goes down again after a few minutes.
  • The /opt/rsa/am/server/logs/ReplicaReplication.log file shows the following error:
@@@2016-02-19 17:33:18,090 SYSTEM [Shutdown Hook ] 
 Service.shutdownVm(198) | SVRSA81SRVB-Lab.cgimss.com,,,,Stopped the service.
@@@2016-02-19 17:33:26,086 SYSTEM [WrapperSimpleAppMain ] 
 Service.start(88) | SVRSA81SRVB-Lab.cgimss.com,,,,Starting the service.
@@@2016-02-19 17:33:26,770 ERROR [ApplyP2R latestAppliedSweepId: 54 linesCommittedInNextSweep: 0 nextSweepIdToApply: 55] 
 ApplyP2R.executeBatchUpdate(202) | SVRSA81SRVB-Lab.cgimss.com,,,,unable to execute batch update when applying primary changes
java.sql.BatchUpdateException: Batch entry 2 insert into rsa_rep.am_attr_categories ( id, label_key, is_editable_ind, domain_object_type ) values
(   E'000000000000000000002001f0020036', E'TOKEN_SOFT_ANDROID_2.x', 'false', E'TOKEN' ) was aborted.  Call getNextException to see the cause.
    at org.postgresql.jdbc2.AbstractJdbc2Statement$BatchResultHandler.handleError(AbstractJdbc2Statement.java:2598)
    at org.postgresql.core.v3.QueryExecutorImpl$1.handleError(QueryExecutorImpl.java:459)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1836)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:407)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2737)
    at com.rsa.replication.ApplyP2R.executeBatchUpdate(ApplyP2R.java:200)
    at com.rsa.replication.ApplyP2R.commitBatches(ApplyP2R.java:175)
    at com.rsa.replication.ApplyThread$1.doWork(ApplyThread.java:64)
    at com.rsa.replication.AutoFileCloser.<init>(AutoFileCloser.java:28)
    at com.rsa.replication.ApplyThread$1.<init>(ApplyThread.java:55)
    at com.rsa.replication.ApplyThread.applyChangesFromFileToDatabase(ApplyThread.java:55)
    at com.rsa.replication.ApplyThread.applyInsideTransaction(ApplyThread.java:49)
    at com.rsa.replication.ApplyThread.apply(ApplyThread.java:36)
    at com.rsa.replication.ApplyP2R.workIfNeccessary(ApplyP2R.java:73)
    at com.rsa.replication.ReplicationRunnable.work(ReplicationRunnable.java:70)
    at com.rsa.replication.util.ServiceCallable.work(ServiceCallable.java:110)
    at com.rsa.replication.util.ServiceCallable.runMainLoopUnsafe(ServiceCallable.java:99)
    at com.rsa.replication.util.ServiceCallable.runMainLoop(ServiceCallable.java:79)
    at com.rsa.replication.util.ServiceCallable.call(ServiceCallable.java:42)
    at com.rsa.replication.util.ServiceCallable.call(ServiceCallable.java:1)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:680)
@@@2016-02-19 17:33:26,792 FATAL [ApplyP2R latestAppliedSweepId: 54 linesCommittedInNextSweep: 0 nextSweepIdToApply: 55] 
 ServiceCallable.runMainLoop(83) | SVRSA81SRVB-Lab.cgimss.com,,,,Unhandled exception during main loop. Shutting down all service threads.
java.lang.RuntimeException: unable to apply primary changes
    at com.rsa.replication.ApplyP2R.wrapAndThrowPossibleDeadlock(ApplyP2R.java:213)
    at com.rsa.replication.ApplyP2R.executeBatchUpdate(ApplyP2R.java:203)
    at com.rsa.replication.ApplyP2R.commitBatches(ApplyP2R.java:175)
    at com.rsa.replication.ApplyThread$1.doWork(ApplyThread.java:64)
    at com.rsa.replication.AutoFileCloser.<init>(AutoFileCloser.java:28)
    at com.rsa.replication.ApplyThread$1.<init>(ApplyThread.java:55)
    at com.rsa.replication.ApplyThread.applyChangesFromFileToDatabase(ApplyThread.java:71)
    at com.rsa.replication.ApplyThread.applyInsideTransaction(ApplyThread.java:50)
    at com.rsa.replication.ApplyThread.apply(ApplyThread.java:36)
    at com.rsa.replication.ApplyP2R.workIfNeccessary(ApplyP2R.java:73)
    at com.rsa.replication.ReplicationRunnable.work(ReplicationRunnable.java:73)
    at com.rsa.replication.util.ServiceCallable.work(ServiceCallable.java:110)
    at com.rsa.replication.util.ServiceCallable.runMainLoopUnsafe(ServiceCallable.java:99)
    at com.rsa.replication.util.ServiceCallable.runMainLoop(ServiceCallable.java:80)
    at com.rsa.replication.util.ServiceCallable.call(ServiceCallable.java:42)
    at com.rsa.replication.util.ServiceCallable.call(ServiceCallable.java:1)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:139)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
    at java.lang.Thread.run(Thread.java:680)
Cause
This behavior occurs when, while the primary and replica are synchronizing, the primary attempts to insert data into a table in the replica's database that already contains the same values. The primary does not overwrite existing data, so the batch update fails and the RSA Replication (Replica) service shuts down.
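The failure mode described here can be reproduced in miniature. The following is only a sketch under stated assumptions: an in-memory SQLite database stands in for the replica's PostgreSQL database, the two-column table is a simplified stand-in for rsa_rep.am_attr_categories, and the rejection is assumed to come from a uniqueness constraint on id, which the article implies but does not state. The helper name apply_insert is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the replica's PostgreSQL database.
# The table layout and the primary key on id are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE am_attr_categories (id TEXT PRIMARY KEY, label_key TEXT)")

row = ("000000000000000000002001f0020036", "TOKEN_SOFT_ANDROID_2.x")


def apply_insert(connection, values):
    """Apply a plain INSERT; return True on success, False if the id already exists."""
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(
                "INSERT INTO am_attr_categories (id, label_key) VALUES (?, ?)",
                values,
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = apply_insert(conn, row)    # row is new, so the insert succeeds
second = apply_insert(conn, row)   # same id again, so the insert is rejected

# Deleting the existing row clears the conflict.
conn.execute("DELETE FROM am_attr_categories WHERE id = ?", (row[0],))
conn.commit()
third = apply_insert(conn, row)    # the insert succeeds again

print(first, second, third)  # prints: True False True
```

A plain INSERT never overwrites an existing row, so the second attempt fails until the conflicting row is removed.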
Resolution
The existing data must be deleted manually from the replica's database before attempting to synchronize the replica again.
If the ReplicaReplication.log in your deployment contains the error shown above (that is, unable to execute batch update when applying primary changes), follow the steps below to resolve the issue:
  1. SSH to the replica and log in as rsaadmin.
  2. Navigate to /opt/rsa/am/server/logs.
  3. Display the contents of /opt/rsa/am/server/logs/ReplicaReplication.log (for example, with cat).
  4. Look for entries similar to this one:
"unable to execute batch update when applying primary changes
java.sql.BatchUpdateException: Batch entry 2 insert into rsa_rep.am_attr_categories ( id, label_key, is_editable_ind, domain_object_type )
values (E'000000000000000000002001f0020036', E'TOKEN_SOFT_ANDROID_2.x', 'false', E'TOKEN' ) was aborted.  Call getNextException to see the cause.
    at org.postgresql.jdbc2.AbstractJdbc2Statement$BatchResultHandler.handleError(AbstractJdbc2Statement.java:2598)"

In this example, the table name is rsa_rep.am_attr_categories and the id value is 000000000000000000002001f0020036. This is the problematic data that needs to be deleted from the database.

  5. Navigate to /opt/rsa/am/utils and run the command line utility (CLU) to retrieve the database password.
rsaadmin@am81p:/opt/rsa/am/utils> ./rsautil manage-secrets -a get com.rsa.db.dba.password
Please enter OC Administrator username: <enter Operations Console administrator name>
Please enter OC Administrator password: <enter Operations Console administrator password>
com.rsa.db.dba.password: rSKD5bGguLGNL9uGvFWnJoxIcHJah2

  6. Navigate to /opt/rsa/am/pgsql/bin and connect to the database.
rsaadmin@am81p:/opt/rsa/am/utils> cd ../pgsql/bin
rsaadmin@am81p:/opt/rsa/am/pgsql/bin> ./psql -h localhost -p 7050 -d db -U rsa_dba
Password for user rsa_dba: <enter the com.rsa.db.dba.password captured above>
psql.bin (9.2.4)
SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
Type "help" for help.

  7. Run the following command to delete the value. Be sure to commit the change to the database and then quit the connection.
db=# DELETE FROM rsa_rep.am_attr_categories WHERE id ='000000000000000000002001f0020036';
commit;
\q

  8. Repeat this process for every entry that the ReplicaReplication.log reports as a duplicate, until all of the duplicates have been removed.
  9. SSH to the primary and force the replica out of sync, then go to the Operations Console and synchronize the replica again. Please contact RSA Support for the information in article 000029415 - Replication status of "internal replication error" in Authentication Manager 8.1.
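When the log contains many failed entries, the table and id pairs to delete can be collected in one pass instead of by eye. The following is a minimal sketch, not an RSA-provided tool: the function name find_duplicates and both regular expressions are assumptions written against the log excerpt shown in this article, and entries whose layout differs (for example, no id column or unquoted values) are skipped so they can be inspected by hand.

```python
import re

# Matches the failed statement as it appears in the log excerpt in this article.
# \s* and \s+ tolerate a line wrap between "values" and the opening parenthesis.
FAILED_INSERT = re.compile(
    r"Batch entry \d+ insert into\s+(?P<table>[\w.]+)\s*"
    r"\((?P<cols>[^)]*)\)\s*values\s*\((?P<vals>.*?)\)\s*was aborted",
    re.IGNORECASE | re.DOTALL,
)
# A single-quoted literal, optionally prefixed with E (PostgreSQL escape string).
LITERAL = re.compile(r"E?'((?:[^']|'')*)'")


def find_duplicates(log_text):
    """Return the distinct (table, id) pairs named in failed batch inserts."""
    found = []
    for match in FAILED_INSERT.finditer(log_text):
        cols = [c.strip() for c in match.group("cols").split(",")]
        vals = LITERAL.findall(match.group("vals"))
        if "id" not in cols or len(vals) != len(cols):
            continue  # unexpected layout: inspect this entry by hand
        pair = (match.group("table"), vals[cols.index("id")])
        if pair not in found:
            found.append(pair)
    return found


sample = """unable to execute batch update when applying primary changes
java.sql.BatchUpdateException: Batch entry 2 insert into rsa_rep.am_attr_categories ( id, label_key, is_editable_ind, domain_object_type ) values
(   E'000000000000000000002001f0020036', E'TOKEN_SOFT_ANDROID_2.x', 'false', E'TOKEN' ) was aborted.  Call getNextException to see the cause.
"""

for table, row_id in find_duplicates(sample):
    print(table, row_id)
```

Each printed pair corresponds to one DELETE statement of the form shown in step 7. In practice the sample string would be replaced by the contents of /opt/rsa/am/server/logs/ReplicaReplication.log. Because the service stops at the first failing entry, later duplicates may only appear in the log after earlier ones have been removed and the synchronization is retried.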
Workaround
If redeploying the replica is an available option, consider doing so instead, as the solution provided here might take much longer to complete than a redeployment.
 
