RSA Authentication Manager 8.x shows replication status as "Instance Offline"
Originally Published: 2014-05-08
Article Number
000051901
Applies To
RSA Product Set:  SecurID
RSA Product/Service Type:  Authentication Manager
RSA Version/Condition:  8.x

 

Issue
  • In the Operations Console, under Deployment Configuration > Instances > Status Report, the replication status may appear as “Instance Offline.” This status may indicate that the replica instance is shut down or that there is a network connectivity issue between the primary and replica server(s).
  • When running netstat from an SSH session on the primary server, port 7002 may not appear as listening. The output may be blank or may not display the replica connection at all.
netstat -ano | grep 7002
  • The RSA Replication service on the replica is shut down:
RSA Replication (Replica)                                  [SHUTDOWN]
  • Replication is shown as out of sync.
Cause

This message typically occurs due to a network or communication issue, such as:

  • A firewall blocking the required communication port.
  • DNS issues preventing communication between the primary and replica servers.
  • The replica server being offline or the replication service on the replica server being stopped.
Resolution

To resolve the issue, first determine whether it is caused by a network connectivity problem or by the replica/replication service being down.

1. Verify Network Connectivity

Check the firewall and DNS configuration to ensure communication between the primary and replica servers is functioning correctly.

Run the following commands on both servers:

netstat -aon
traceroute <Primary_IP_Address>
traceroute <Replica_IP_Address>
nslookup <Primary_IP_Address>
nslookup <Replica_IP_Address>
nslookup <Primary_FQDN>
nslookup <Replica_FQDN>
tcpdump tcp port 7002

These commands help verify:

  • Port accessibility
  • Network routing
  • DNS resolution
  • Replication traffic on port 7002
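
As a rough aid for the port check, the sketch below reads netstat output on standard input and reports whether anything is listening on a given TCP port. The function name port_is_listening is hypothetical, and the column positions assume the usual Linux netstat layout (local address in column 4, state in column 6).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the netstat output on stdin shows a
# TCP socket in LISTEN state on the given port (default 7002).
port_is_listening() {
    local port="${1:-7002}"
    # Assumes Linux netstat columns: $4 = local address (for example
    # 0.0.0.0:7002 or :::7002), $6 = socket state.
    awk -v port="$port" '
        $1 ~ /^tcp/ && $4 ~ (":" port "$") && $6 == "LISTEN" { found = 1 }
        END { exit !found }
    '
}
```

Possible usage on the primary: netstat -ano | port_is_listening 7002 && echo "port 7002 is listening"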

2. If the Replica or Replication Service Is Down

Try restarting the Authentication Manager services on the replica server.

Check Current Service Status

cd /opt/rsa/am/server
./rsaserv status all

Example output:

RSA Database Server                                        [RUNNING]
RSA Administration Server with Operations Console          [RUNNING]
RSA Runtime Server                                         [RUNNING]
RSA Replication (Replica)                                  [SHUTDOWN]

Restart the Replication Service

cd /opt/rsa/am/server
./rsaserv restart replica_replication

After the restart completes, verify that the replication service is running:

./rsaserv status all

Expected status:

RSA Replication (Replica)                                  [RUNNING]
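
If the status check needs to be scripted, one option is to pull the state out of the status listing. This is a minimal sketch that assumes the output format shown above (service name followed by the state in square brackets); the function name service_state is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the bracketed state (for example RUNNING
# or SHUTDOWN) of the named service, reading status output from stdin.
service_state() {
    local name="$1"
    # Select the line for the service, then keep only the text inside
    # the trailing [ ... ] marker.
    grep -F -- "$name" \
        | sed -n 's/.*\[\([A-Z_]*\)\][[:space:]]*$/\1/p' \
        | head -n 1
}
```

Possible usage on the replica: cd /opt/rsa/am/server && ./rsaserv status all | service_state "RSA Replication (Replica)"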

3. If the Replication Service Remains Down

Check the following log file on the replica:

/opt/rsa/am/server/logs/ReplicaReplication.log

Look for errors similar to:

java.lang.RuntimeException: exception occurred while attempting to read file to apply /opt/rsa/am/replication/p2r_sweeps_to_apply/p2r_sweep_781748.sql.gz
        at com.rsa.replication.ApplyThread$1.doWork(ApplyThread.java:66)
        at com.rsa.replication.AutoFileCloser.<init>(AutoFileCloser.java:28)
        at com.rsa.replication.ApplyThread$1.<init>(ApplyThread.java:55)
        at com.rsa.replication.ApplyThread.applyChangesFromFileToDatabase(ApplyThread.java:71)
        at com.rsa.replication.ApplyThread.applyInsideTransaction(ApplyThread.java:50)
        at com.rsa.replication.ApplyThread.apply(ApplyThread.java:36)
        at com.rsa.replication.ApplyP2R.workIfNeccessary(ApplyP2R.java:69)
        at com.rsa.replication.ReplicationRunnable.work(ReplicationRunnable.java:73)
        at com.rsa.replication.util.ServiceCallable.work(ServiceCallable.java:110)
        at com.rsa.replication.util.ServiceCallable.runMainLoopUnsafe(ServiceCallable.java:99)
        at com.rsa.replication.util.ServiceCallable.runMainLoop(ServiceCallable.java:80)
        at com.rsa.replication.util.ServiceCallable.call(ServiceCallable.java:42)
        at com.rsa.replication.util.ServiceCallable.call(ServiceCallable.java:1)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:139)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:909)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException:Not in GZIP format
        at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:141)
        at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:56)
        at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:65)
        at com.rsa.replication.ApplyThread$1.doWork(ApplyThread.java:60)

This error indicates that the replication sweep package on the replica is corrupted and cannot be decompressed.
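
To find which sweep files the log complains about, a search along these lines may help. It is only a sketch: the function name failing_sweeps is hypothetical, and it assumes the error text and file naming match the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lists the distinct sweep file paths named in
# "read file to apply" errors in the given log file.
failing_sweeps() {
    local log="$1"
    grep -F 'exception occurred while attempting to read file to apply' "$log" \
        | grep -o '/[^[:space:]]*p2r_sweep_[0-9]*\.sql\.gz' \
        | sort -u
}
```

Possible usage on the replica: failing_sweeps /opt/rsa/am/server/logs/ReplicaReplication.log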

 

4. Validate the Sweep File

Locate the affected sweep file referenced in the error message, for example: p2r_sweep_781748.sql.gz 

Navigate to the replication sweep directory on the replica: 

cd /opt/rsa/am/replication/p2r_sweeps_to_apply/

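Test whether the file decompresses cleanly. The -t option checks integrity without extracting the archive or removing the .gz file:

gunzip -t p2r_sweep_781748.sql.gz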

If the same GZIP error appears, compare the file checksum between the primary and replica servers.

md5sum p2r_sweep_781748.sql.gz

Example:

Primary: 5e8b30233bee8c0943a7f91a72464d9a
Replica: 8d356e6dd7cd35ae76d1c291b62f655b

If the checksums differ, the sweep file on the replica is corrupted.
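
The comparison can also be done in one step by passing the digest reported on the primary to a small check on the replica. This is a sketch only; the function name md5_matches is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the file's MD5 digest equals the
# expected value (for example, the digest md5sum printed on the primary).
# Returns 2 if the file cannot be read.
md5_matches() {
    local file="$1" expected="$2" actual
    [ -r "$file" ] || return 2
    # md5sum prints "<digest>  <filename>"; keep only the digest.
    actual="$(md5sum -- "$file" | awk '{ print $1 }')"
    [ "$actual" = "$expected" ]
}
```

Possible usage on the replica: md5_matches p2r_sweep_781748.sql.gz 5e8b30233bee8c0943a7f91a72464d9a || echo "checksum differs from primary"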

 

5. Replace the Corrupted Sweep File

On the Replica Server

From /opt/rsa/am/replication/p2r_sweeps_to_apply/, move the corrupted sweep file and its .sha256 companion to /tmp:

mv p2r_sweep_781748.sql.gz* /tmp/

On the Primary Server

Navigate to the sweep directory: 

cd /opt/rsa/am/replication/p2r_sweeps_to_propagate/

Transfer both the .gz and .sha256 files to the replica server using either SFTP or a file transfer tool such as WinSCP.

sftp rsaadmin@<Replica_IP_or_Hostname>

Example:

sftp> cd /opt/rsa/am/replication/p2r_sweeps_to_apply/
sftp> put p2r_sweep_781748.sql.gz.sha256
sftp> put p2r_sweep_781748.sql.gz
sftp> exit
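
Before restarting the service, it may be worth confirming on the replica that the transferred copy is a valid gzip archive. The sketch below uses gzip -t, which tests integrity without extracting the file; the function name sweep_is_valid_gzip is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the given file passes gzip's
# integrity test. The file is neither extracted nor modified.
sweep_is_valid_gzip() {
    local file="$1"
    gzip -t -- "$file" 2>/dev/null
}
```

Possible usage on the replica: sweep_is_valid_gzip /opt/rsa/am/replication/p2r_sweeps_to_apply/p2r_sweep_781748.sql.gz && echo "archive is intact"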

6. Restart Services Again

On the replica server, restart the replication service:

cd /opt/rsa/am/server
./rsaserv restart replica_replication

Verify that the replication service is running:

./rsaserv status all

Expected output:

RSA Replication (Replica)                                  [RUNNING]

7. Verify Replication Status

Log in to the Operations Console, navigate to Deployment Configuration > Instances > Status Report, and verify that the replication status is displayed as: Normal

Notes

Verified in RSA Labs