Article Content
Article Number | 000013224 |
Applies To | Appliance 3.0.2, 3.0.4; AM 7.1 SP2, AM 7.1 SP4. For the Customer Support Training Module (CSTM) on replication and other topics, copy and paste this link into your browser: https://knowledge.rsasecurity.com/scolcms/set.aspx?id=9488 |
Issue | How to clean up the Primary Appliance and re-attach a replica after a replication failure on a post-SP2 Appliance.
Symptoms: replication failure; replication failed; failing to re-attach a replica Appliance.
The alert log shows:
    Found the stuck propagation process to <SID>
    remote_apply_error => ORA-26714: User error encountered while applying - Apply Error
    ORA-01280: Fatal LogMiner Error.
    The capture process on local site is ABORTED
An archive log was deleted from the Primary and the change was not applied to the Replica, so you must clean up and re-attach.
====info_replica.html====
    remote_log_apply_time => <not current date>
    Archived Log Status is not the current date, and Deleted = NO:
    <date> ../backup/<SID>/archivelog/<date>/<name>_.arc
    e.g. /usr/local/RSASecurity/RSAAuthenticationManager/backup/IKAGWYZR/archivelog/2014_03_30/o1_mf_1_5712_9mhdq1pr_.arc NO
====info_primary====
    Archive log .arc files not deleted since <not current date>
    Local Apply TIMEOUT <not current date>
====alert_<replica>.log====
    Errors in file ../db/admin/bijqumds/bdump/<name>.trc:
    ORA-07445: exception encountered: core dump [kghbshrt()+112] [SIGBUS] [Non-existent physical address] [0xB714050C] [] []
    <not current date> - IMS Trace - 0 rows deleted in rsa_logrep.ims_log_audit_adm for .01 seconds |
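A quick way to confirm the symptoms above is to search the alert log for the ORA codes named in the Issue. The following is a minimal sketch, not an RSA-supplied tool: the function name find_replication_errors is hypothetical, and it only reads the file you pass to it.

```shell
#!/bin/sh
# find_replication_errors is a hypothetical helper name (not part of rsautil).
# It prints every line of the given alert log that carries one of the
# ORA codes listed in the Issue section: ORA-26714, ORA-01280, ORA-07445.
find_replication_errors() {
    alert_log=$1
    grep -E 'ORA-(26714|01280|07445)' "$alert_log"
}
```

Pass it the alert log path described in the Notes, for example the alert_<oracleSID>.log file under (rsahome)/db/admin/<oracleSID>/bdump. No output means none of the three codes were found in that file.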
Resolution | NOTE: All "rsautil" commands are run from the "/usr/local/RSASecurity/RSAAuthenticationManager/utils" directory, also known as (rsahome)/utils.
Log on to the appliance as "emcsrv" (you will be prompted for the Operating System password), then run the following commands:
    sudo su (you will be prompted for a password again; use the same OS password. This makes you the root user)
    su rsaadmin (this makes you the rsaadmin user)
    cd /usr/local/RSASecurity/RSAAuthenticationManager/utils (this changes to the (rsahome)/utils directory)
    . ./rsaenv (note that this starts with dot-space-dot-slash; it sets some required environment variables)
Prerequisite: Verify that you have the correct Master Password. From the (rsahome)/utils directory, run the following command:
    ./rsautil manage-replication -a list
You will be prompted for the Master Password. If the command fails, do NOT continue with this process until you have the correct Master Password.
Unless noted, all of these steps are done on the Primary. This fix covers both scenarios: one replica broken and all replicas broken.
1. On the Primary, run a backup using the backup utility in the Operations Console (Maintenance->Backups->Create Backup). You can download the backup off the Primary by following the procedure in a45460.
2. If RADIUS is configured, delete the Replica RADIUS server from the Primary Operations Console (Deployment Configuration>Radius>Manage Existing).
3. Delete the Replica Server from the Primary Operations Console (Deployment Configuration>Instances>Manage Existing). If this hangs with "--Status:Stopping propagation process at [<instance>]", stop the RSA services on that replica. You may also need to reboot the Primary.
4. On the Primary, list the replicas:
    ./rsautil setup-replication -a list
You will be prompted for the Master Password and/or SuperAdmin password to run all rsautil commands.
5.
If the failed replica is in the list, run the following command:
    ./rsautil setup-replication -a remove-replica -n <name of replica to be removed>
6. On the Primary, run the following command:
    ./rsautil setup-replication -a remove-unreg-replicas
***NOTE: Step 7 applies ONLY if there are no working replicas. If you have any working replicas, skip step 7.***
7. On the Primary, run the following commands:
    ./rsautil setup-replication -a remove-primary
    ./rsautil manage-rep-error -a run-script -o cleanup_propagation.sql
    ./rsautil setup-replication -a set-primary
Confirm and answer Y to all questions.
8. On the Primary, issue the following command:
    ./rsautil manage-rep-error -a run-script -o cleanup_propagation.sql
9. Restart the Authentication Manager database services TWICE, including a reboot:
Switch from the (rsahome)/utils directory to the (rsahome)/server directory:
    cd ../server
    ./rsaam stop all
    ./rsaam start db (this is the first database restart)
Wait 2 minutes after the database start finishes, then:
    ./rsaam stop all
    exit (leaves the rsaadmin shell)
    sudo su - (to become root; use the same password as emcsrv)
    reboot (this also performs the second database restart and frees locked files)
The reboot normally takes approximately 10 minutes. If more than 6 months have passed since the last reboot with fsck (disk check), the system runs a Linux fsck, which increases the reboot time to approximately 20 minutes.
10. On the Primary, log in to the Security Console and click the Setup->Instances menu. Verify that the replication status is "Running".
11. Log on to the Security Console of the Primary and update the Authentication Manager Contact List: Access/Authentication Agents/Authentication Manager Contact List/Automatic Rebalance, Rebalance.
12. On the Primary, run a backup using the backup utility in the Operations Console (Maintenance->Backups->Create Backup).
13. On the Primary, generate a new Replica Package/Dump file from the Operations Console (Deployment Configuration>Instances>Generate Replica Package). Always use the "Manual" option.
14. Log on to the Replica's Operations Console and ATTACH to the Primary using the new Replica Package/Dump files.
NOTE: The Replica Operations Console should only give you the option to ATTACH to the Primary. If the ATTACH option is not available on the Replica's Operations Console, run the command below on the Replica Server to prepare it for attachment to the Primary. Change to "/usr/local/RSASecurity/RSAAuthenticationManager/utils" and run:
    ./rsautil manage-replication -a cleanup-offline-site
Once the command finishes successfully, log on to the Replica's Operations Console and ATTACH to the Primary using the new Replica Package/Dump files.
15. Once the replica attaches to the Primary, log on to the Primary Operations Console and check the replication Status Report (Deployment Configuration>Instances>Status Report). The Data Transfer Status should show "COMPLETE" both ways.
16. Log on to the Security Console of the Primary and update the Authentication Manager Contact List: Access/Authentication Agents/Authentication Manager Contact List/Automatic Rebalance, Rebalance.
17. If you use RADIUS in your environment and removed the Replica RADIUS Server as part of Step 2, reconfigure RADIUS on the Replica server: open the Operations Console, navigate to Deployment Configuration>RADIUS>Configure Server, and enter the required data to configure RADIUS.
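The rsautil clean-up sequence on the Primary (the replica listing through step 8) depends on whether any working replica remains. As a hedged illustration, the sketch below only prints the commands in order so they can be reviewed before anything is run; it executes nothing. The function name plan_cleanup and its yes/no argument are hypothetical, while the rsautil commands themselves are copied from the steps above.

```shell
#!/bin/sh
# Dry-run planner: prints the Primary clean-up commands in order and runs nothing.
# plan_cleanup is a hypothetical helper name, not part of rsautil.
# Usage: plan_cleanup <working replicas remain: yes|no> <failed replica name>...
plan_cleanup() {
    working_replicas=$1
    shift
    # List the replicas known to the Primary
    echo "./rsautil setup-replication -a list"
    # Remove each failed replica that appears in the list
    for replica in "$@"; do
        echo "./rsautil setup-replication -a remove-replica -n $replica"
    done
    # Step 6: remove unregistered replicas
    echo "./rsautil setup-replication -a remove-unreg-replicas"
    # Step 7: only when NO working replica remains
    if [ "$working_replicas" = "no" ]; then
        echo "./rsautil setup-replication -a remove-primary"
        echo "./rsautil manage-rep-error -a run-script -o cleanup_propagation.sql"
        echo "./rsautil setup-replication -a set-primary"
    fi
    # Step 8: run the propagation clean-up script
    echo "./rsautil manage-rep-error -a run-script -o cleanup_propagation.sql"
}
```

For example, `plan_cleanup yes <name of replica to be removed>` prints the sequence without step 7, and `plan_cleanup no` followed by every replica name prints the full sequence. The planner does not cover the backup, console, or restart steps.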
If you have only one replica and replication is broken, or ALL of your replicas are not working, you can follow these steps:
SSH to the Primary Appliance with the emcsrv account to create a backup:
    sudo su rsaadmin <same password as emcsrv>
    cd /usr/local/RSASecurity/RSAAuthenticationManager/utils
    ./rsautil manage-backups -a export -f /tmp/bac<date>.dmp
This creates /tmp/bac<date>.dmp and /tmp/bac<date>.secrets. Use today's date for <date>, and use WinSCP to copy both files off the appliance.
SSH to the Replica to stop services (if there are problems on the Primary, this preserves the replica for promotion):
    sudo su rsaadmin <same password as emcsrv>
    cd /usr/local/RSASecurity/RSAAuthenticationManager/server
    ./rsaam stop all
SSH on the Primary - clean-up:
    ./rsautil setup-replication -a list <prompted: Master Password>
The failed replica should be in the list.
    ./rsautil setup-replication -a remove-replica -n <fully qualified name of replica> (repeat for other replicas)
    ./rsautil setup-replication -a remove-unreg-replicas
    ./rsautil setup-replication -a remove-primary (don't worry, this is Step 7 above)
    ./rsautil manage-rep-error -a run-script -o cleanup_propagation.sql
    ./rsautil setup-replication -a set-primary (told you not to worry)
Confirm and answer Y to all questions.
Stop the services, reboot the Primary to unlock files if possible, and wait for all RSA services to start:
    cd ../server/
    ./rsaam stop all
    exit
    sudo su - <same password as emcsrv>
    reboot
Log in to the Primary Operations Console - Deployment Configuration > Instances > Generate Replica Package. Generate with the Manual option for both the .pkg file and the .dmp file, and download them to your PC.
SSH to the Replica to start services and configure it to receive the replica package:
    sudo su rsaadmin <same password as emcsrv>
    cd /usr/local/RSASecurity/RSAAuthenticationManager/server
    ./rsaam start all
    cd ../utils
    ./rsautil manage-replication -a cleanup-offline-site
Log in to the Replica Operations Console - you should be prompted for a Replica Package.
Browse to the .PKG file first, enter the Master Password, and click [Next>]. Then browse to the .DMP file.
When the Replica Package apply is done, log in to the Primary Security Console to update the RSA Authentication Manager Contact List: Security Console - Access - Authentication Agents - Authentication Manager Contact List - Automatic Rebalance. Click [Automatic Rebalance]. This allows all RSA agents to use the replica. |
Notes | Replication error report data collection:
Access the operating system command prompt via SSH, then run:
    cd /usr/local/RSASecurity/RSAAuthenticationManager/utils
    ./rsautil manage-replication -a error-report -f error.htm
[If running SP4 un-patched or patch 3 or less, you'll get a Java exception error and may need to run an older version of this report:
    ./rsautil manage-database -a exec-sql -f diagnostics/IMS_RepErrorRpt.sql -A error_primary.html -U com.rsa.replication.admin ]
    ./rsautil manage-database -a exec-sql -f diagnostics/IMS_RepLogRpt.sql -A log_primary.html -U com.rsa.replication.admin
    ./rsautil manage-database -a exec-sql -f diagnostics/IMS_RepInfoRpt.sql -A info_primary.html -U com.rsa.replication.admin
    ./rsautil manage-database -a exec-sql -f diagnostics/streams_hc_10GR2.sql -U com.rsa.replication.admin
You'll need to know the Master Password to run these reports.
You will have four files in the ../utils directory: error.htm (or error_primary.html if running older versions), log_primary.html, info_primary.html and checkresult.html.
Modify the -A output to log_replica, info_replica and checkresult_replica, then run these reports on the Replica(s).
Send all of these files and the alert log to Customer Support. The alert log location on Linux/Solaris/Appliance is:
    /usr/local/RSASecurity/RSAAuthenticationManager/db/admin/<oracleSID>/bdump/alert_<oracleSID>.log
Contact RSA Customer Support if the Appliance instances are not all at the same Service Pack/Patch level. It is highly recommended that all Appliance instances be upgraded to the latest Appliance 3.0 Service Packs/Patches. Follow the readme instructions of each Service Pack/Patch to apply the updates.
Related article: AM 7.1.2 Server - How to clean up a Primary and re-attach a replica after a replication failure: a51068 |
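Because the same reports are collected on the Primary and on each Replica with different -A output names, the commands can be generated per instance. This is a sketch under stated assumptions: report_commands and its role argument are hypothetical names, the function only prints commands and runs nothing, and it renames only the log and info reports. The error report and the streams_hc_10GR2.sql health check are printed exactly as they appear in the Notes.

```shell
#!/bin/sh
# Prints the report-collection commands for one instance; it runs nothing.
# report_commands and its role argument are hypothetical names for this sketch.
# Usage: report_commands primary   or   report_commands replica
report_commands() {
    role=$1
    user="com.rsa.replication.admin"
    # Error report, as shown in the Notes
    echo "./rsautil manage-replication -a error-report -f error.htm"
    # Log and info reports, with the -A output named after the instance role
    echo "./rsautil manage-database -a exec-sql -f diagnostics/IMS_RepLogRpt.sql -A log_${role}.html -U $user"
    echo "./rsautil manage-database -a exec-sql -f diagnostics/IMS_RepInfoRpt.sql -A info_${role}.html -U $user"
    # Health check, printed as in the Notes (no -A option is shown there)
    echo "./rsautil manage-database -a exec-sql -f diagnostics/streams_hc_10GR2.sql -U $user"
}
```

Run the printed commands from the (rsahome)/utils directory on the matching instance; each one prompts for the Master Password.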
Legacy Article ID | a51069 |