000017401 - AM 8.x Replication status Instance Offline

Document created by RSA Customer Support Employee on Jun 14, 2016. Last modified by RSA Customer Support Employee on Apr 21, 2017.
Version 2

Article Content

Article Number: 000017401
Applies To: RSA Authentication Manager 8.x
On the Operations Console, the replication status is shown as: Instance Offline
According to the explanation of this status, either the instance is shut down or there is a network issue between the primary and the replica.
TCP port 7002 is not listening. Run on the primary:  netstat -ano | grep 7002  => the output is blank or does not show the replica
Issue: Replication Status in the Operations Console shows: [Instance Offline]

The RSA Replication service on the replica is [SHUTDOWN]:

RSA Replication (Replica)                                  [SHUTDOWN]

Replica is out of sync.
Cause: A network issue, such as a firewall or DNS problem, is blocking communication between the primary and the replica,
or the replica is down or the replication service is down on the replica.

Network issue: check the firewall and the DNS server to verify their status.

Commands that can be used to verify the network:

1. netstat -aon   

2. traceroute [ip]

3. nslookup [hostname]

4. tcpdump tcp port 7002
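
The netstat check mentioned earlier can be scripted so that it gives a yes/no answer instead of output that has to be read by eye. The following is only a minimal sketch, not an RSA-provided tool: the function name port_is_listening is hypothetical, and it assumes the usual Linux netstat column layout (local address in column 4, state in column 6).

```shell
#!/bin/sh
# Hypothetical helper: reads `netstat -ano` style output on stdin and
# succeeds (exit 0) if a TCP socket is in LISTEN state on the given port.
# Assumes the Linux column layout: $4 = local address, $6 = state.
port_is_listening() {
    port="$1"
    awk -v p="$port" '
        $1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" p "$") { found = 1 }
        END { exit !found }
    '
}

# Example usage on the primary:
#   netstat -ano | port_is_listening 7002 && echo "7002 is listening" \
#                                         || echo "7002 is NOT listening"
```

If the function reports that nothing is listening, continue with the traceroute, nslookup and tcpdump checks to narrow down where the communication is blocked.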


Replica is down or Replication service is down

1. Try manually restarting the services on the replica:   cd /opt/rsa/am/server && ./rsaserv restart all

RSA Replication (Replica)                                  [SHUTDOWN]

2. If the replica_replication service is still shut down, check the replication log.

3. Here is an example from one of my cases.

In the replication log, /opt/rsa/am/server/logs/ReplicaReplication.log,

I found the following exception:

java.lang.RuntimeException: exception occurred while attempting to read file to apply /opt/rsa/am/replication/p2r_sweeps_to_apply/p2r_sweep_781748.sql.gz
        at com.rsa.replication.ApplyThread$1.doWork(ApplyThread.java:66)
        at com.rsa.replication.AutoFileCloser.<init>(AutoFileCloser.java:28)
        at com.rsa.replication.ApplyThread$1.<init>(ApplyThread.java:55)
        at com.rsa.replication.ApplyThread.applyChangesFromFileToDatabase(ApplyThread.java:71)
        at com.rsa.replication.ApplyThread.applyInsideTransaction(ApplyThread.java:50)
        at com.rsa.replication.ApplyThread.apply(ApplyThread.java:36)
        at com.rsa.replication.ApplyP2R.workIfNeccessary(ApplyP2R.java:69)
        at com.rsa.replication.ReplicationRunnable.work(ReplicationRunnable.java:73)
        at com.rsa.replication.util.ServiceCallable.work(ServiceCallable.java:110)
        at com.rsa.replication.util.ServiceCallable.runMainLoopUnsafe(ServiceCallable.java:99)
        at com.rsa.replication.util.ServiceCallable.runMainLoop(ServiceCallable.java:80)
        at com.rsa.replication.util.ServiceCallable.call(ServiceCallable.java:42)
        at com.rsa.replication.util.ServiceCallable.call(ServiceCallable.java:1)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:139)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:909)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Not in GZIP format
        at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:141)
        at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:56)
        at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:65)
        at com.rsa.replication.ApplyThread$1.doWork(ApplyThread.java:60)

This indicates that the replication process on the replica cannot gunzip the replication package.
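
When the log is long, the affected sweep files can be pulled out of it rather than searched for by hand. This is a rough sketch that assumes the exception message has exactly the wording shown above; the function name list_unreadable_sweeps is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints the unique sweep file paths that the
# replication service reported it could not read, based on the
# "exception occurred while attempting to read file to apply" message.
list_unreadable_sweeps() {
    logfile="$1"
    sed -n 's/.*exception occurred while attempting to read file to apply \(.*\)$/\1/p' "$logfile" | sort -u
}

# Example usage on the replica:
#   list_unreadable_sweeps /opt/rsa/am/server/logs/ReplicaReplication.log
```

Each path printed is a candidate for the checks described next.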

Manually locate the sweep file p2r_sweep_781748.sql.gz on the replica (the exception above shows it under /opt/rsa/am/replication/p2r_sweeps_to_apply/).

            1. gunzip p2r_sweep_781748.sql.gz     ---- This may fail in some cases with the same error as the Java exception.

            2. Locate the same sweep file on the primary and compare MD5 checksums by running md5sum p2r_sweep_781748.sql.gz on both sides:

           Primary: 5e8b30233bee8c0943a7f91a72464d9a  p2r_sweep_781748.sql.gz

            Replica:  8d356e6dd7cd35ae76d1c291b62f655b  p2r_sweep_781748.sql.gz

The checksums differ, which makes the cause clear: the sweep file on the replica must be corrupted.
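
The two checks above (does the file decompress, and does its checksum match the primary's) can be wrapped in small helpers. This is a sketch under the assumption that gzip and md5sum are available on the appliance shell; the function names are hypothetical. Note that gzip -t only tests the archive and, unlike gunzip, leaves the file in place.

```shell
#!/bin/sh
# Succeeds if the file passes gzip's built-in integrity test.
sweep_is_valid_gzip() {
    gzip -t "$1" 2>/dev/null
}

# Prints only the MD5 digest of a file (without the file name).
sweep_md5() {
    md5sum "$1" | awk '{ print $1 }'
}

# Compares a local sweep file with the digest taken on the primary
# and prints "match" or "mismatch".
compare_with_primary() {
    file="$1"
    primary_md5="$2"
    if [ "$(sweep_md5 "$file")" = "$primary_md5" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

# Example usage on the replica, with the primary's digest from this case:
#   sweep_is_valid_gzip p2r_sweep_781748.sql.gz || echo "not a valid gzip file"
#   compare_with_primary p2r_sweep_781748.sql.gz 5e8b30233bee8c0943a7f91a72464d9a
```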

To fix the problem, manually transfer the sweep file (both the .gz and the .sha256) from the primary to replace the problematic one on the replica.

1. On the replica, move the problematic sweep files (both the .gz and the .sha256) to /tmp.

2. On the primary, change to the directory /opt/rsa/am/replication/p2r_sweeps_to_propagate/

3. Connect with SFTP: sftp rsaadmin@<replica ip/hostname>

4. In SFTP, change to the replica's sweep folder: cd /opt/rsa/am/replication/p2r_sweeps_to_apply

5. In SFTP, copy the files to the replica: put p2r_sweep_781748.sql.gz.sha256   and   put p2r_sweep_781748.sql.gz

homer:/opt/rsa/am/replication/p2r_sweeps_to_propagate/tmp # sftp rsaadmin@marge      # homer is my primary and marge is my replica

Connecting to marge...
sftp> cd /opt/rsa/am/replication/p2r_sweeps_to_apply
sftp> put p2r_sweep_781748.sql.gz.sha256
Uploading p2r_sweep_781748.sql.gz.sha256 to /opt/rsa/am/replication/p2r_sweeps_to_apply/p2r_sweep_781748.sql.gz.sha256
p2r_sweep_781748.sql.gz.sha256                                                  100%   64     0.1KB/s   00:00
sftp> put p2r_sweep_781748.sql.gz
Uploading p2r_sweep_781748.sql.gz to /opt/rsa/am/replication/p2r_sweeps_to_apply/p2r_sweep_781748.sql.gz
p2r_sweep_781748.sql.gz                                                         100% 1634     1.6KB/s   00:00
sftp> exit

6. On the replica, restart the services:  cd /opt/rsa/am/server && ./rsaserv restart all

7. The replication service on the replica, RSA Replication (Replica), is now in Running status.

marge:/opt/rsa/am/server # ./rsaserv status
Running as rsaadmin...
RSA Database Server                                        [RUNNING]
RSA Administration Server with Operations Console          [RUNNING]
RSA RADIUS Server Operations Console                       [RUNNING]
RSA Runtime Server                                         [RUNNING]
RSA RADIUS Server                                          [RUNNING]
RSA Console Server                                         [RUNNING]
RSA Replication (Replica)                                  [RUNNING]

8. Check the Operations Console; the replication status is now: Normal
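
Two parts of the procedure above lend themselves to scripting: the SFTP transfer and the final status check. The following is a minimal sketch with hypothetical function names. It assumes the replica's target folder is /opt/rsa/am/replication/p2r_sweeps_to_apply, as in the session shown above, and that ./rsaserv status prints each service followed by its state in square brackets. Batch mode (sftp -b) does not prompt for a password, so it only works where non-interactive authentication such as SSH keys is set up; otherwise use the interactive session.

```shell
#!/bin/sh
# Prints an sftp batch that uploads a sweep file and its .sha256
# companion into the replica's sweep folder (second argument overrides
# the assumed default folder).
sftp_batch_for_sweep() {
    sweep="$1"
    remote_dir="${2:-/opt/rsa/am/replication/p2r_sweeps_to_apply}"
    printf 'cd %s\n' "$remote_dir"
    printf 'put %s.sha256\n' "$sweep"
    printf 'put %s\n' "$sweep"
}

# Reads `./rsaserv status` output on stdin and prints the name of every
# service whose bracketed state is not RUNNING. No output means all
# listed services are running.
services_not_running() {
    awk '/\[[A-Z]+\] *$/ && !/\[RUNNING\] *$/ {
        sub(/ *\[[A-Z]+\] *$/, "")
        print
    }'
}

# Example usage:
#   On the primary, from the folder that holds the good sweep file:
#     sftp_batch_for_sweep p2r_sweep_781748.sql.gz > /tmp/sweep.batch
#     sftp -b /tmp/sweep.batch rsaadmin@<replica ip/hostname>
#   On the replica, after restarting the services:
#     cd /opt/rsa/am/server && ./rsaserv status | services_not_running
```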

Legacy Article ID: a65478