000017066 - How to collect log data and restore replication after a replication failure on AM 7.1 on Linux or Solaris

Document created by RSA Customer Support Employee on Jun 14, 2016
Last modified by RSA Customer Support Employee on Apr 21, 2017
Version 2

Article Content

Article Number: 000017066
Applies To: RSA Authentication Manager 7.1 Service Pack 2 (SP2), Service Pack 3 (SP3), and Service Pack 4 (SP4)
Replication failed
Replication error or Replication logs
RHEL Linux 4.0
RHEL 5.5
Solaris 10
 
Issue: Collect log data that can be sent to RSA Customer Service for analysis, and restore replication after a replication failure on RSA Authentication Manager 7.1.2 or later.
Cause: When a replication failure occurs in RSA Authentication Manager, the replica can be deleted and reattached (assuming SP2 or higher has been installed). However, simply reattaching a replica does not address the root cause of a replication failure and does not prevent the same failure from recurring. Also, if the primary is not recoverable, a complete reinstallation of the environment may need to be done. For these reasons, it is important to gather data prior to taking action to restore replication.
After the data is collected, you should disable and then enable replication. (This is different from pausing and resuming.) Depending on the cause of the replication failure, this measure may not restore replication. Also, successfully restoring replication does not mean that the underlying replication issue is resolved.
Note: RSA recommends that you identify the cause of a replication failure before you reattach a replica. However, if you decide to reattach a replica without identifying the cause, follow the important instructions at the end of this solution.
 
Resolution: How to collect log data and restore replication after a replication failure on RSA Authentication Manager on UNIX
Note: This procedure instructs you to run RSA utilities. These utilities require the master password for your deployment. You created the master password during the Quick Setup of the primary server. 

Collecting log data
To collect log data that you can send to Customer Service for analysis:
  1. Do the following on the primary server:

    a. Connect to the server using the console, telnet, or an SSH client, as appropriate.
    Switch to the file owner account used during the installation.
    Change to the home directory used for the installation, for example /usr/local/rsasecurity/rsaauthenticationmanager (substitute the directory actually used in your environment). For simplicity, the rest of this document refers to that directory as [RSAHome].

    b. Set the current directory to [RSAHome]/utils. Run:
      cd utils

    c. Run the RSA script that sets the environment variables. Run:
      . ./rsaenv
    Note: The first 4 characters in this command are period, space, period, and forward slash.

    d. Generate the logs. Run the following commands. Enter each command on one line.
     
    ./rsautil manage-database -a exec-sql -f diagnostics/IMS_RepLogRpt.sql -A log_primary.html -U com.rsa.replication.admin

    ./rsautil manage-database -a exec-sql -f diagnostics/IMS_RepInfoRpt.sql -A info_primary.html -U com.rsa.replication.admin

    ./rsautil manage-database -a exec-sql -f diagnostics/IMS_RepErrorRpt.sql -A error_primary.html -U com.rsa.replication.admin

    ./rsautil manage-database -a exec-sql -f diagnostics/streams_hc_10GR2.sql -U com.rsa.replication.admin

    df -k > /tmp/dfkpri.txt

    uptime > /tmp/uptimepri.txt

    On Linux, also run:
    free -m > /tmp/freempri.txt

    e. Collect the files from the [RSAHome]/utils directory (log_primary.html, info_primary.html, error_primary.html, and checkresult.html) and copy them to the /tmp directory.
    Note: Modify the file names so that the file name identifies the system where you collected the log data. For example, you can add the local host name to the file name so that it looks similar to the following: checkresult_USIT-RSAApp2.html
     
  2. Do the following on the replica server where the replication failure occurred:

    a. Connect to the server using the console, telnet, or an SSH client, as appropriate.
    Switch to the file owner account used during the installation.
    Change to the [RSAHome] directory.


    b. Set the current directory to [RSAHome]/utils. Run:
      cd utils

    c. Run the RSA script that sets the environment variables. Run:
      . ./rsaenv
    Note: The first 4 characters in this command are period, space, period, and forward slash.

    d. Generate the logs. Run the following commands. Enter each command on one line.
     
    ./rsautil manage-database -a exec-sql -f diagnostics/IMS_RepLogRpt.sql -A log_replica.html -U com.rsa.replication.admin

    ./rsautil manage-database -a exec-sql -f diagnostics/IMS_RepInfoRpt.sql -A info_replica.html -U com.rsa.replication.admin

    ./rsautil manage-database -a exec-sql -f diagnostics/IMS_RepErrorRpt.sql -A error_replica.html -U com.rsa.replication.admin

    ./rsautil manage-database -a exec-sql -f diagnostics/streams_hc_10GR2.sql -U com.rsa.replication.admin

    df -k > /tmp/dfkrep.txt

    uptime > /tmp/uptimerep.txt

    On Linux, also run:
    free -m > /tmp/freerep.txt

    e. Collect the files from the [RSAHome]/utils directory (log_replica.html, info_replica.html, error_replica.html, and checkresult.html) and copy them to the /tmp directory.
    Note: Modify the file names so that the file name identifies the system where you collected the log data. For example, you can add the local host name to the file name so that it looks similar to the following: checkresult_USIT-RSAApp2.html
     
  3. On the primary and each replica, do the following:

    a. Run a replication status report. From [RSAHome]/utils, run:
      ./rsautil manage-replication -a report

    b. Collect the alert_<instance name>.log file, where <instance name> is an 8-character, random-looking name. This file is in:
    [RSAHome]/db/admin/<instance name>/bdump

    c. If the RSA engineer requests them, also collect the *.trc files dated from the point at which replication failed onward. They are in the same directory as the alert_<instance name>.log file.
     
  4. Copy all collected and generated files to the /tmp directory.
  5. Upload the logs to RSA Customer Support for analysis.
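
Parts of the collection procedure above can be scripted. The following is a minimal sketch, not an RSA-supplied tool. It assumes the report files have already been generated in [RSAHome]/utils by the rsautil commands shown earlier. The helper names (run_if_present, snapshot_host_state, stage_reports), their arguments, and the output file names are hypothetical. It captures the host-state snapshots (skipping any command that is not installed, such as free on Solaris) and copies each HTML report with the local host name added to the file name, as the note in step e suggests.

```shell
#!/bin/sh
# Sketch only: all helper names and output file names are illustrative.

# Run a command and save its output to file $1, if the command is installed.
run_if_present() {
    out=$1
    shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" > "$out"
    else
        echo "skipped: $1 not found" >&2
    fi
}

# Write host-state snapshots into directory $1, tagged with $2 (e.g. pri or rep).
snapshot_host_state() {
    out_dir=$1
    tag=$2
    run_if_present "$out_dir/free_$tag.txt" free -m    # Linux only
    run_if_present "$out_dir/dfk_$tag.txt" df -k
    run_if_present "$out_dir/uptime_$tag.txt" uptime
}

# Copy each *.html report from $1 into $2, adding host name $3 to the file name.
stage_reports() {
    src_dir=$1
    out_dir=$2
    host=$3
    for f in "$src_dir"/*.html; do
        [ -f "$f" ] || continue      # no reports present: nothing to copy
        base=$(basename "$f" .html)
        cp "$f" "$out_dir/${base}_${host}.html"
    done
}
```

On the primary, for example, these could be called as `snapshot_host_state /tmp pri` followed by `stage_reports . /tmp "$(hostname)"` from the [RSAHome]/utils directory. The originals are copied rather than renamed so that the files in [RSAHome]/utils stay untouched.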
Restarting replication
Restarting replication stops and starts the replication processes. This restores replication in some cases, but does not address the cause of the replication failure.
Avoid pausing and resuming replication when a replication issue exists. Pausing and resuming requires communication between the primary and the replicas; if the replication processes are abnormal, it might take a significant amount of time and often fails.
To restart replication (if it is imperative that the replica be restored immediately), run the following commands on the primary and then on each replica:
 
./rsautil manage-database -a exec-sql -U com.rsa.replication.admin -f diagnostics/disable-rep.sql
./rsautil manage-database -a exec-sql -U com.rsa.replication.admin -f diagnostics/enable-rep.sql
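
The order matters: replication is disabled first and enabled second. A minimal sketch of a wrapper that runs the two commands above in that order, and stops if the first one reports failure, follows. It assumes it is run from [RSAHome]/utils after the rsaenv script has been sourced, and that rsautil returns a non-zero exit status on failure, which this article does not confirm. The function name restart_replication and the RSAUTIL variable are hypothetical. rsautil still needs the master password for the deployment.

```shell
#!/bin/sh
# Sketch only: RSAUTIL and restart_replication are illustrative names.
# Assumption: rsautil exits non-zero when a script fails (not confirmed here).
RSAUTIL=${RSAUTIL:-./rsautil}

restart_replication() {
    # Disable first, then enable; stop at the first failure.
    for script in disable-rep.sql enable-rep.sql; do
        echo "Running diagnostics/$script"
        "$RSAUTIL" manage-database -a exec-sql \
            -U com.rsa.replication.admin -f "diagnostics/$script" || {
            echo "diagnostics/$script failed; stopping" >&2
            return 1
        }
    done
}
```

Calling `restart_replication` on the primary and then on each replica follows the sequence described above. Stopping after a failed disable step is a design choice of this sketch, not a documented RSA requirement.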


Reattaching the affected replica
You can reattach the affected replica using the RSA Operations Console. Before you proceed, read the information in the following note.
 
Notes: Important information about reattaching replicas
RSA recommends that you identify the cause of a replication failure before you reattach a replica. When you reattach a replica without knowing the cause of replication issues, consider the following aspects:
  • If the primary has an issue that is not recoverable, deleting a replica removes it as a promotion target.
  • Before you can attach a replica, the primary must have replication functioning normally, and you must do a replication cleanup.
  • Reattaching a replica often defers a replication failure; it does not resolve it.
Important: If the cause of a replication issue is an SP3 or SP4 upgrade failure, do not reattach the replica. The upgrade failure needs to be addressed first.
If you decide to reattach a replica without identifying the cause, perform these steps first:
  1. Create a backup of the primary using the RSA Operations Console. If the backup fails, do not delete the replica. Backup and attach both use Oracle Data Pump, and if there is a problem with Data Pump, you cannot attach the replica.
  2. Copy the backup and the secrets file from the primary.
  3. Maintain your license (a *.ZIP file), and all Service Pack and Patch files necessary to rebuild your environment to the same version from which you took your backup.
    Note: If you need to rebuild your environment, someone must have physical access to each Appliance to restore the factory default settings and run Quick Setup.
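
The preparation steps above can be double-checked before the replica is deleted. The sketch below is an illustration only and is not part of the RSA procedure: the function name check_reattach_prereqs is hypothetical, and the paths passed to it are whatever locations you copied the backup, secrets file, license, and Service Pack or Patch files to. It only confirms that each file exists and is not empty; it cannot confirm that the backup itself is usable.

```shell
#!/bin/sh
# Sketch only: check_reattach_prereqs is an illustrative helper.
# Arguments: paths to the backup, secrets file, license .zip, and any
# Service Pack or Patch files you want to confirm are on hand.
check_reattach_prereqs() {
    missing=0
    for f in "$@"; do
        if [ -s "$f" ]; then
            echo "found:   $f"
        else
            echo "MISSING: $f" >&2
            missing=1
        fi
    done
    # Returns 0 only when every listed file exists and is non-empty.
    return "$missing"
}
```

A non-zero return status means at least one file is absent or empty, in which case the replica should not be deleted yet.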
For a similar solution for the RSA Appliance 3.0, see A51084.
For a similar solution for RSA Authentication Manager 7.1 on Windows 2003 or Windows 2008, see A51085.
 
Legacy Article ID: a60894
