000013224 - Appliance 3.0.2 - How to clean up the Primary Appliance and re-attach a replica after a replication failure on a post-SP2 Appliance

Document created by RSA Customer Support Employee on Jun 14, 2016. Last modified by RSA Customer Support Employee on Apr 21, 2017.
Version 2

Article Content

Article Number: 000013224
Applies To: Appliance 3.0.2, 3.0.4
AM 7.1 SP2, AM 7.1 SP4
Customer Support Training module (CSTM) on replication and other topics. Copy and paste this link into your browser:
https://knowledge.rsasecurity.com/scolcms/set.aspx?id=9488
 
Issue
How to clean up the Primary Appliance and re-attach a replica after a replication failure on a post-SP2 Appliance.
Symptoms: replication failure, replication failed, failing to re-attach a replica Appliance. The alert log shows: Found the stuck propagation process to <SID>
remote_apply_error => ORA-26714: User error encountered while applying  -  Apply Error
ORA-01280: Fatal LogMiner Error. The capture process on local site is ABORTED
The archive log was deleted from the Primary and the change was not applied to the Replica - you must clean up and re-attach.
====info_replica.html====
remote_log_apply_time => <not current date>
Archived Log Status is not current date, and Deleted = NO
<date>     ../backup/<SID>/archivelog/<date>/<name>_.arc
e.g. /usr/local/RSASecurity/RSAAuthenticationManager/backup/IKAGWYZR/archivelog/2014_03_30/o1_mf_1_5712_9mhdq1pr_.arc     NO
====info_primary=========
archive log .arc files not deleted since <not current date>
Local Apply TIMEOUT <not current date>
=====alert_<replica>.log======
Errors in file ../db/admin/bijqumds/bdump/<name>.trc:
ORA-07445: exception encountered: core dump [kghbshrt()+112] [SIGBUS] [Non-existent physical address] [0xB714050C] [] []
<not current date>
 - IMS Trace - 0 rows deleted in rsa_logrep.ims_log_audit_adm for .01 seconds
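
As a quick check for the symptoms above, an alert log can be searched for the three ORA codes listed. This is only a minimal sketch, not an RSA-supplied tool: the function name scan_alert_log is hypothetical, and the file you pass in is whichever alert_<oracleSID>.log applies to your instance.

```shell
# Hypothetical helper: print (with line numbers) every alert-log line that
# contains one of the ORA errors listed in the symptoms above.
# Usage: scan_alert_log /path/to/alert_<oracleSID>.log
scan_alert_log() {
    grep -nE 'ORA-(26714|01280|07445)' "$1"
}
```

No output means none of the three codes appear in that file; it does not by itself prove that replication is healthy.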
 
Resolution
NOTE: All "rsautil" commands are run from the "/usr/local/RSASecurity/RSAAuthenticationManager/utils" directory, also known as (rsahome)/utils. Log on to the appliance as "emcsrv" (you will be prompted for the Operating System Password), then run the following commands:
sudo su         (you will be prompted for a password again, use the OS Password from above, this makes you the root user)
su rsaadmin  (this makes you the rsaadmin user) 

cd /usr/local/RSASecurity/RSAAuthenticationManager/utils     (this changes you to the (rsahome)/utils directory)
. ./rsaenv            (notice this starts with dot-space-dot-slash, this sets some required environment variables)

Prerequisite: Verify that you have the correct Master Password. From the (rsahome)/utils directory, run the following command:


./rsautil manage-replication -a list


You will be prompted for the Master Password.  If the command fails, do NOT continue with this process until you have the proper Master Password.
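
If you script any part of this procedure, the prerequisite can be treated as a gate. The sketch below is illustrative only: require_ok is a hypothetical helper, and it assumes the checked command reports failure through a non-zero exit status, which this article does not confirm for rsautil.

```shell
# Hypothetical helper: run a prerequisite command and report whether it passed.
# ASSUMPTION: the command signals failure with a non-zero exit status.
require_ok() {
    if "$@"; then
        echo "prerequisite passed: $*"
        return 0
    fi
    echo "prerequisite FAILED: $* - do NOT continue" >&2
    return 1
}

# On the appliance, from (rsahome)/utils, it might be used like this:
#   require_ok ./rsautil manage-replication -a list || exit 1
```

The Master Password prompt is still interactive; the helper only decides whether to go on after the command returns.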


Unless noted, all of these steps are done on the Primary.

This procedure covers both scenarios: a single broken replica, and all replicas broken.


1. On the Primary, run a backup using the backup utility in the Operations Console (Maintenance->Backups->Create Backup). You can download the backup off the Primary by following the procedure in a45460.
 


2. If RADIUS is configured, delete the Replica RADIUS server from the Primary Operations Console (Deployment Configuration>Radius>Manage Existing)
Note: If you are running at least SP4, the RADIUS server is operating properly, and you will be re-adding the existing appliance, you can skip steps 2 and 17.
  2a. If there is at least one functional Replica and the problem Replica is not running at least SP4, apply the SP4 factory reset patch to the problem Replica and factory reset it. This makes it an unconfigured SP4 appliance.

 


3. Delete the Replica Server from the Primary Operations Console (Deployment Configuration>Instances>Manage Existing)
   IMPORTANT: If more than one Replica is to be removed from the Operations Console, do not select multiple Replicas; remove them ONE AT A TIME.


If this hangs with "--Status:Stopping propagation process at [<instance>]", stop the RSA services on that replica. You may also need to reboot the Primary.

4. On the Primary, run the following command:


 ./rsautil setup-replication -a list


You will be prompted for the Master password and/or SuperAdmin  Password to run all rsautil commands.


5. If the failed replica is in the list, run the following command:


./rsautil setup-replication -a remove-replica -n <name of replica to be removed>


6. On the Primary, run the following command:


 ./rsautil setup-replication -a remove-unreg-replicas


***NOTE: Step 7 applies ONLY if there are no working replicas. If you have any working replicas, skip step 7.***


 7. On the Primary, run the following commands:


 ./rsautil setup-replication -a remove-primary


 ./rsautil manage-rep-error -a run-script -o cleanup_propagation.sql


 ./rsautil setup-replication -a set-primary


Confirm and answer Y to all questions.




8. On the Primary, run the following command:


 ./rsautil manage-rep-error -a run-script -o cleanup_propagation.sql


9. Restart the Authentication Manager database services TWICE. The second restart is performed by the reboot at the end of this step.


 


      Switch from the (rsahome)/utils directory to the (rsahome)/server directory:


      cd ../server


      ./rsaam stop all


      ./rsaam start db   (this is the first db restart)


      Wait 2 minutes after the database start finishes.


      ./rsaam stop all


      exit   (leaving rsaadmin)


      sudo su -   (to become root; use the same password as emcsrv)


      reboot     (this also performs the second database restart and frees up locked files)


The reboot normally takes approximately 10 minutes. If it has been more than 6 months since the last reboot with fsck (disk check), the system runs a Linux fsck, which increases the reboot time to approximately 20 minutes.


 


10. On the Primary, log in to the Security Console and click the Setup->Instances menu. Verify that the replication status is "Running".


 


11. Log on to the Security Console of the Primary and update the Authentication Manager Contact List:


    Access/Authentication Agents/Authentication Manager Contact List/Automatic Rebalance, Rebalance


12. On the Primary, run a backup using the backup utility in the Operations Console (Maintenance->Backups->Create Backup).


13. On the Primary, generate a new Replica Package/Dump file from the Operations Console (Deployment Configuration>Instances>Generate Replica Package. Always use the "Manual" Option)


14. Log on to the Replica's Operations Console and ATTACH to the Primary using the new Replica Package/Dump files.


NOTE: The Replica Operations Console should only give you the option to ATTACH to the Primary. If the ATTACH option is not available on the Replica's Operations Console, run the command below on the Replica server to prepare it for attachment to the Primary.


Change to "/usr/local/RSASecurity/RSAAuthenticationManager/utils" and run the following command:


 ./rsautil manage-replication -a cleanup-offline-site


Once the command finishes successfully, log on to the Replica's Operations Console and ATTACH to the Primary using the new Replica Package/Dump files.


15. Once the replica attaches to the Primary, log on to the Primary Operations Console and check the replication Status Report (Deployment Configuration>Instances>Status Report).


  The Data Transfer Status should show "COMPLETE" both ways.


16. Log on to the Security Console of the Primary and update the Authentication Manager Contact List:


    Access/Authentication Agents/Authentication Manager Contact List/Automatic Rebalance, Rebalance


17. If you use RADIUS in your environment and removed the Replica RADIUS server as part of step 2, reconfigure RADIUS on the Replica server. Open the Operations Console, navigate to Deployment Configuration>RADIUS>Configure Server and enter the required data to configure RADIUS.
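
The Primary cleanup commands in steps 4 through 8 depend on whether any replica is still working. The dry-run sketch below only prints the commands in the order given above so that the branch is easy to review; it runs nothing. The function name print_cleanup_plan, its yes/no argument and the example host name are hypothetical.

```shell
# Hypothetical dry-run helper: print the rsautil commands from steps 4-8.
# Usage: print_cleanup_plan <yes|no: any working replicas?> <replica-name>...
print_cleanup_plan() {
    have_working=$1
    shift
    echo "./rsautil setup-replication -a list"                      # step 4
    for replica in "$@"; do                                         # step 5
        echo "./rsautil setup-replication -a remove-replica -n $replica"
    done
    echo "./rsautil setup-replication -a remove-unreg-replicas"     # step 6
    if [ "$have_working" = "no" ]; then                             # step 7
        echo "./rsautil setup-replication -a remove-primary"
        echo "./rsautil manage-rep-error -a run-script -o cleanup_propagation.sql"
        echo "./rsautil setup-replication -a set-primary"
    fi
    echo "./rsautil manage-rep-error -a run-script -o cleanup_propagation.sql"  # step 8
}

# Example (hypothetical host name):
#   print_cleanup_plan no replica1.example.com
```

Step 5 applies only to replicas that actually appear in the step 4 list, so pass only those names.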


If you have only one replica and replication is broken, or ALL of your replicas are not working, you can follow this shorter procedure instead.
SSH to the Primary Appliance with the emcsrv account to create a backup:
sudo su rsaadmin
<same password as emcsrv>
 cd /usr/local/RSASecurity/RSAAuthenticationManager/utils
 ./rsautil manage-backups -a export -f /tmp/bac<date>.dmp
        This creates /tmp/bac<date>.dmp and /tmp/bac<date>.secrets. Use today's date for <date>, and use WinSCP to copy both files off the appliance.
SSH to the Replica to stop services. If there are problems on the Primary, this preserves the replica for promotion.
sudo su rsaadmin
<same password as emcsrv>
 cd /usr/local/RSASecurity/RSAAuthenticationManager/server
 ./rsaam stop all
SSH to the Primary to clean up:
 ./rsautil setup-replication -a list  <prompted: Master password>
        The failed replica should be in the list.
 ./rsautil setup-replication -a remove-replica -n <fully qualified name of replica>     (repeat for other replicas)
 ./rsautil setup-replication -a remove-unreg-replicas
 ./rsautil setup-replication -a remove-primary     (don't worry - this is Step 7 above)
 ./rsautil manage-rep-error -a run-script -o cleanup_propagation.sql
 ./rsautil setup-replication -a set-primary     (this sets the Primary again)
Confirm and answer Y to all questions.
Stop the services, reboot the Primary to unlock files if possible, and wait for all RSA services to start.

 cd ../server/
 ./rsaam stop all
 exit
 sudo su -
<same Password as emcsrv>
#  reboot                                  
Log in to the Primary Operations Console - Deployment Configuration > Instances > Generate Replica Package.
        Generate with the Manual option for both the .pkg file and the .dmp file, and download them to your PC.

SSH to the Replica to start services and prepare it to receive the replica package:
sudo su rsaadmin
<same password as emcsrv>
 cd /usr/local/RSASecurity/RSAAuthenticationManager/server
 ./rsaam start all
 cd ../utils
 ./rsautil manage-replication -a cleanup-offline-site
Log in to the Replica Operations Console - you should be prompted for a Replica Package. Browse to the .pkg file first and enter the Master Password.
[Next>]    Browse to the .dmp file.
When the Replica Package apply is done, log in to the Primary Security Console to update the RSA Authentication Manager Contact List: Security Console - Access - Authentication Agents - Authentication Manager Contact List - Automatic Rebalance. [Automatic Rebalance]
This allows all RSA agents to use the replica.
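
For the backup export at the start of this shorter procedure, the file name carries today's date. The sketch below builds that command line without running it. The helper name backup_export_cmd and the YYYYMMDD date format are assumptions; the article only says to use today's date for <date>.

```shell
# Hypothetical helper: print the backup export command with a date stamp.
# ASSUMPTION: a YYYYMMDD stamp; the article does not prescribe a format.
# Usage: backup_export_cmd [stamp]   (defaults to today's date)
backup_export_cmd() {
    stamp=${1:-$(date +%Y%m%d)}
    echo "./rsautil manage-backups -a export -f /tmp/bac${stamp}.dmp"
}
```

According to the procedure above, the export also creates a matching .secrets file beside the .dmp file, so copy both off the appliance.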

 
Notes
Replication error report data collection:
Access the Operating system command prompt via SSH with Linux
cd /usr/local/RSASecurity/RSAAuthenticationManager/utils
./rsautil manage-replication -a error-report -f error.htm
[If running SP4 unpatched or patch 3 or less, you may need to run an older version of this report - rsautil manage-database -a exec-sql -f diagnostics/IMS_RepErrorRpt.sql -A error_primary.html -U com.rsa.replication.admin - you'll get a java exception error]
./rsautil manage-database -a exec-sql -f diagnostics/IMS_RepLogRpt.sql -A log_primary.html -U com.rsa.replication.admin
./rsautil manage-database -a exec-sql -f diagnostics/IMS_RepInfoRpt.sql -A info_primary.html -U com.rsa.replication.admin
./rsautil manage-database -a exec-sql -f diagnostics/streams_hc_10GR2.sql -U com.rsa.replication.admin
You'll need to know the Master Password to run these reports.
You will have four files in the ../utils directory: error.htm (or error_primary.html if running older versions), log_primary.html, info_primary.html and checkresult.html.
Modify the -A output to log_replica, info_replica and checkresult_replica, then run these reports on the Replica(s).
Send these files and the alert log to Customer Support.
Linux/Solaris/Appliance
/usr/local/RSASecurity/RSAAuthenticationManager/db/admin/<oracleSID>/bdump/alert_<oracleSID>.log
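
When collecting the same reports on a Replica, only the -A output name changes. The sketch below prints the log and info report commands for a given role so the names stay consistent. The helper name report_cmds is hypothetical, and the streams_hc_10GR2.sql command is left out because the article shows it without an -A argument.

```shell
# Hypothetical helper: print the log and info report commands for a role
# ("primary" or "replica"). It only prints the commands; it does not run them.
report_cmds() {
    role=$1
    for pair in Log:log Info:info; do
        script=${pair%%:*}   # part before the colon, e.g. "Log"
        prefix=${pair##*:}   # part after the colon, e.g. "log"
        echo "./rsautil manage-database -a exec-sql -f diagnostics/IMS_Rep${script}Rpt.sql -A ${prefix}_${role}.html -U com.rsa.replication.admin"
    done
}

# Example: report_cmds replica
```

Calling it with "primary" reproduces the two manage-database commands shown above for the Primary.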
Contact RSA Customer Support if the Appliance instances are not all at the same Service Pack/Patch level.

It's highly recommended that all Appliance instances are upgraded to the latest Appliance 3.0 Service Packs/Patches. Follow the readme instructions of each Service Pack/Patch to apply the updates.


AM 7.1.2 Server- How to clean-up a Primary and re-attach a replica after a replication Failure: a51068
Legacy Article ID: a51069
