000017401 - RSA Authentication Manager 8.x shows replication status as "Instance Offline"

Document created by RSA Customer Support Employee on Jul 24, 2018
Version 1

Article Content

Article Number: 000017401
Applies To
RSA Product Set: SecurID
RSA Product/Service Type: Authentication Manager
RSA Version/Condition: 8.x
Issue
  • In the Operations Console, the replication status is listed as Instance Offline. This status can mean that the instance is shut down or that there is a network issue between the primary and replica server(s).
  • On the primary, running netstat from an SSH session shows that nothing is listening on port 7002. The output is blank or does not show the replica.


netstat -ano | grep 7002


  • Replication Status in the Operations Console shows as Instance Offline
  • The RSA Replication service on the replica is shut down:


RSA Replication (Replica)                                  [SHUTDOWN]


  • Replication is shown as out of sync.
Cause
This status appears when the primary cannot communicate with the replica, for example when:
  • A firewall is blocking a port.
  • A DNS problem is preventing communication between the primary and replica servers.
  • The replica is down or the replication service is down on the replica.
Resolution

To resolve a network issue



  1. Check the firewall and the DNS server to verify that neither is blocking communication between the primary and the replica.
  2. Run the following commands:


netstat -aon   
traceroute <IP address of primary>
traceroute <IP address of replica>
nslookup <IP address of primary>
nslookup <IP address of replica>
nslookup <FQDN of primary>
nslookup <FQDN of replica>
tcpdump tcp port 7002
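
A plain grep for 7002 also matches that number anywhere else in the netstat output, for example as a process ID. The sketch below narrows the match to addresses that end in the port number. The function name has_port is a hypothetical helper for illustration and is not part of any RSA tooling.

```shell
#!/usr/bin/env bash
# has_port (hypothetical helper): reads netstat-style output on stdin and
# succeeds only if some address column ends in ":<port>".
has_port() {
    # The port must be followed by whitespace or end of line, so that
    # ":70021" or a PID of 7002 does not count as a match.
    grep -Eq ":$1([[:space:]]|\$)"
}
```

A possible use on the primary is `netstat -ano | has_port 7002 && echo "port 7002 present" || echo "port 7002 not found"`.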


If the replica is down or the replication service is down



  1. Try manually restarting the Authentication Manager services on the replica:


rsaadmin@marge:~> cd /opt/rsa/am/server  
rsaadmin@marge:/opt/rsa/am/server> ./rsaserv status all
RSA Database Server                                        [RUNNING]
RSA Administration Server with Operations Console          [RUNNING]
RSA RADIUS Server Operations Console                       [RUNNING]
RSA Runtime Server                                         [RUNNING]
RSA RADIUS Server                                          [RUNNING]
RSA Console Server                                         [RUNNING]
RSA Replication (Primary)                                  [RUNNING]
rsaadmin@marge:/opt/rsa/am/server> ./rsaserv restart all
Stopping RSA RADIUS Server: **
RSA RADIUS Server                                          [SHUTDOWN]
Stopping RSA Runtime Server: ***
RSA Runtime Server                                         [SHUTDOWN]
Stopping RSA Console Server: **
RSA Console Server                                         [SHUTDOWN]
Stopping RSA Replication (Primary): **
RSA Replication (Primary)                                  [SHUTDOWN]
Stopping RSA Database Server: *
RSA Database Server                                        [SHUTDOWN]
Stopping RSA RADIUS Server Operations Console: **
RSA RADIUS Server Operations Console                       [SHUTDOWN]
Stopping RSA Administration Server with Operations Console: **
RSA Administration Server with Operations Console          [SHUTDOWN]
Starting RSA Administration Server with Operations Console:
Starting RSA Database Server: *************
RSA Administration Server with Operations Console          [RUNNING]
Starting RSA RADIUS Server Operations Console: *\ RSA Database Server     [RUNNING] *************
RSA RADIUS Server Operations Console                       [RUNNING]
Starting RSA Runtime Server: *****************************
RSA Runtime Server                                         [RUNNING]
Starting RSA RADIUS Server: *
RSA RADIUS Server                                          [RUNNING]
Starting RSA Console Server: *
Starting RSA Replication (Primary): ***
RSA Replication (Primary)                                  [RUNNING]*****************
RSA Console Server                                         [RUNNING]
rsaadmin@marge:/opt/rsa/am/server>


  1. Alternatively, restart just the replication service:


rsaadmin@marge:/opt/rsa/am> cd  ../server  
rsaadmin@marge:/opt/rsa/am/server> ./rsaserv restart replica_replication
Stopping RSA Replication (Replica): / RSA Database Server     [RUNNING]
RSA Replication (Replica)                                     [SHUTDOWN]
Starting RSA Database Server:
Starting RSA Replication (Replica): ***
RSA Replication (Replica)                                     [RUNNING]
rsaadmin@marge:/opt/rsa/am/server>


  1. If the replica_replication service is still shut down, check /opt/rsa/am/server/logs/ReplicaReplication.log for the following exception:


java.lang.RuntimeException: exception occurred while attempting to read file to apply /opt/rsa/am/replication/p2r_sweeps_to_apply/p2r_sweep_781748.sql.gz
        at com.rsa.replication.ApplyThread$1.doWork(ApplyThread.java:66)
        at com.rsa.replication.AutoFileCloser.<init>(AutoFileCloser.java:28)
        at com.rsa.replication.ApplyThread$1.<init>(ApplyThread.java:55)
        at com.rsa.replication.ApplyThread.applyChangesFromFileToDatabase(ApplyThread.java:71)
        at com.rsa.replication.ApplyThread.applyInsideTransaction(ApplyThread.java:50)
        at com.rsa.replication.ApplyThread.apply(ApplyThread.java:36)
        at com.rsa.replication.ApplyP2R.workIfNeccessary(ApplyP2R.java:69)
        at com.rsa.replication.ReplicationRunnable.work(ReplicationRunnable.java:73)
        at com.rsa.replication.util.ServiceCallable.work(ServiceCallable.java:110)
        at com.rsa.replication.util.ServiceCallable.runMainLoopUnsafe(ServiceCallable.java:99)
        at com.rsa.replication.util.ServiceCallable.runMainLoop(ServiceCallable.java:80)
        at com.rsa.replication.util.ServiceCallable.call(ServiceCallable.java:42)
        at com.rsa.replication.util.ServiceCallable.call(ServiceCallable.java:1)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:139)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:909)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Not in GZIP format
        at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:141)
        at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:56)
        at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:65)
        at com.rsa.replication.ApplyThread$1.doWork(ApplyThread.java:60)
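
If the log is long, the name of the affected sweep file can be extracted from it rather than read by eye. The following is a minimal sketch that assumes the exception message has the same wording as the example above. The function name sweep_from_log is hypothetical.

```shell
#!/usr/bin/env bash
# sweep_from_log (hypothetical helper): reads log text on stdin and prints
# each distinct sweep file path named in a
# "attempting to read file to apply" exception message.
sweep_from_log() {
    sed -n 's/.*attempting to read file to apply \(.*\.sql\.gz\).*/\1/p' | sort -u
}
```

For example, `sweep_from_log < /opt/rsa/am/server/logs/ReplicaReplication.log` would print one line per affected sweep file.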


  1. This error indicates that the replication process on the replica cannot gunzip the replication package.
  2. Navigate to the location of the sweep files noted in the error (e.g., /opt/rsa/am/replication/p2r_sweeps_to_apply/).
  3. Locate the sweep file noted in the error, which is p2r_sweep_781748.sql.gz.
  4. Expand the .gz file on the replica. Note that this may fail in some cases with the same error as the Java exception above.


gunzip p2r_sweep_781748.sql.gz
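
Because a successful gunzip replaces the .gz file with its expanded contents, it may be preferable to test the archive in place first. The sketch below uses the standard gzip -t option for that. The function name check_gz is a hypothetical helper and not part of the product.

```shell
#!/usr/bin/env bash
# check_gz (hypothetical helper): tests a gzip archive without expanding it.
# Prints OK or CORRUPT and returns non-zero when the integrity test fails.
check_gz() {
    if gzip -t "$1" 2>/dev/null; then
        echo "OK: $1"
    else
        echo "CORRUPT: $1"
        return 1
    fi
}
```

On the replica this could be run as `check_gz p2r_sweep_781748.sql.gz` from the directory that holds the sweep file.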


  1. Locate the sweep file with the same name on the primary.
  2. Run md5sum against the file on both the primary and the replica, then compare the two checksums:


md5sum p2r_sweep_781748.sql.gz
Primary: 5e8b30233bee8c0943a7f91a72464d9a  p2r_sweep_781748.sql.gz
Replica:  8d356e6dd7cd35ae76d1c291b62f655b  p2r_sweep_781748.sql.gz
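
Comparing two 32-character checksums by eye is error-prone. As a sketch, the comparison can be scripted on one server by passing in the checksum obtained on the other. The function name verify_md5 is hypothetical, and the checksum argument is whatever md5sum printed on the other server.

```shell
#!/usr/bin/env bash
# verify_md5 (hypothetical helper): compares the MD5 checksum of a local file
# with an expected value, for example the checksum reported on the primary.
# Returns 0 on a match, 1 on a mismatch, 2 if the file cannot be read.
verify_md5() {
    local file="$1" expected="$2" actual
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
    actual=$(md5sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "MATCH"
    else
        echo "MISMATCH: expected $expected, got $actual"
        return 1
    fi
}
```

With the values from the example above, running `verify_md5 p2r_sweep_781748.sql.gz 5e8b30233bee8c0943a7f91a72464d9a` on the replica would be expected to report a mismatch.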


  1. The checksums differ, which confirms that the sweep file on the replica is corrupted.
  2. To fix the problem, manually transfer the sweep file (both the .gz and the .sha256) from the primary server to the replica, replacing the problematic file on the replica.
    1. On the replica, move the problematic sweep files (both the .gz and the .sha256) to /tmp.
    2. On the primary, navigate to /opt/rsa/am/replication/p2r_sweeps_to_propagate/.
    3. Connect with sftp rsaadmin@<replica_IP_address/hostname> and upload both files to /opt/rsa/am/replication/p2r_sweeps_to_apply on the replica. In the example below, homer is the primary and marge is the replica.


cd /opt/rsa/am/replication/p2r_sweeps_to_propagate/
put p2r_sweep_781748.sql.gz.sha256
put p2r_sweep_781748.sql.gz

homer:/opt/rsa/am/replication/p2r_sweeps_to_propagate/tmp # sftp rsaadmin@marge      
Connecting to marge...
Password: <enter password>
sftp> cd /opt/rsa/am/replication/p2r_sweeps_to_apply
sftp> put p2r_sweep_781748.sql.gz.sha256
Uploading p2r_sweep_781748.sql.gz.sha256 to /opt/rsa/am/replication/p2r_sweeps_to_apply/p2r_sweep_781748.sql.gz.sha256
p2r_sweep_781748.sql.gz.sha256                                                  100%   64     0.1KB/s   00:00
sftp> put p2r_sweep_781748.sql.gz
Uploading p2r_sweep_781748.sql.gz to /opt/rsa/am/replication/p2r_sweeps_to_apply/p2r_sweep_781748.sql.gz
p2r_sweep_781748.sql.gz                                                         100% 1634     1.6KB/s   00:00
sftp> exit


  1. On the replica, restart the Authentication Manager services: 


rsaadmin@marge:/opt/rsa/am> cd  /opt/rsa/am/server  
rsaadmin@marge:/opt/rsa/am/server> ./rsaserv restart all


  1. Verify that the replica_replication service is now running:


rsaadmin@marge:/opt/rsa/am/server> ./rsaserv status
Running as rsaadmin...
RSA Database Server                                        [RUNNING]
RSA Administration Server with Operations Console          [RUNNING]
RSA RADIUS Server Operations Console                       [RUNNING]
RSA Runtime Server                                         [RUNNING]
RSA RADIUS Server                                          [RUNNING]
RSA Console Server                                         [RUNNING]
RSA Replication (Replica)                                  [RUNNING]
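
To spot a stopped service quickly, the status output can be filtered so that only services in a state other than RUNNING remain. This is a minimal sketch that assumes the bracketed state format shown above. The function name list_not_running is hypothetical.

```shell
#!/usr/bin/env bash
# list_not_running (hypothetical helper): reads rsaserv status output on stdin
# and prints only the lines whose bracketed state is not RUNNING.
# Lines without a bracketed state, such as "Running as rsaadmin...", are ignored.
list_not_running() {
    grep -E '\[[A-Z]+\]' | grep -v '\[RUNNING\]' || true
}
```

For example, `./rsaserv status all | list_not_running` prints nothing when every service is running.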


  1. Log in to the Operations Console and confirm that the replication status is Normal.
Legacy Article ID: a65478
