000011794 - RKM primary and secondary servers won't sync

Document created by RSA Customer Support Employee on Jun 14, 2016
Last modified by RSA Customer Support Employee on Apr 21, 2017
Version 2

Article Content

Article Number
000011794

Applies To
RSA Key Manager Appliance 1.6
RSA Key Manager Appliance
rPath Linux

Issue

RKM primary and secondary servers won't sync.
Oracle reports ORA-16661 on the primary node.

The customer noticed a problem with the synchronization of their RKM primary and secondary servers. Ultimately the system became unresponsive and had to be power cycled.


Table of Oracle Errors Noticed

Error       Description
ORA-27037   Unable to obtain file status
ORA-16572   DataGuard configuration file not found
ORA-16825   FastStart Failover and other errors or warnings detected for the database.
ORA-16820   Fast-Start Failover observer is no longer observing this database
ORA-16817   unsynchronized Fast-Start Failover configuration
ORA-16606   Unable to find property string
ORA-16656   higher DRC UID sequence number detected
ORA-16596   Object not part of the Dataguard broker configuration

Cause

Errors related to Dataguard configuration on Dataguard Broker Bootstrap

The drc log shows Oracle errors ORA-27037 and ORA-16572 on several occasions during the Data Guard broker bootstrap, leaving the server with no broker configuration. The errors stem from the bootstrap program being unable to access the configuration file /opt/oracle/product/10.2.0/db_1/dbs/dr1PRODRKMS.dat. The likely cause is a problem with the /opt mount on the Unix system that made the file inaccessible to the bootstrap program.
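
ORA-27037 means that the file status could not be obtained, so one way to reproduce the condition outside Oracle is to stat the same file from the operating system. The following is a minimal sketch, not part of the original investigation: the path is the one quoted above, the function names are illustrative, and the assumption that /opt is a separate mount point should be verified on the actual appliance.

```python
import os
import stat

# Path quoted in this article; adjust to the actual Oracle home and database name.
DG_CONFIG_FILE = "/opt/oracle/product/10.2.0/db_1/dbs/dr1PRODRKMS.dat"


def check_file_status(path):
    """Return (ok, detail) describing whether the file can be stat'ed and read."""
    try:
        info = os.stat(path)
    except OSError as exc:
        # This is the OS-level condition that ORA-27037 reports.
        return False, f"cannot stat {path}: {exc.strerror}"
    if not stat.S_ISREG(info.st_mode):
        return False, f"{path} exists but is not a regular file"
    if not os.access(path, os.R_OK):
        return False, f"{path} exists but is not readable by this user"
    return True, f"{path} is readable ({info.st_size} bytes)"


if __name__ == "__main__":
    # Assumption: /opt is its own mount point on the appliance.
    print("/opt is a mount point:", os.path.ismount("/opt"))
    ok, detail = check_file_status(DG_CONFIG_FILE)
    print("OK" if ok else "PROBLEM", "-", detail)
```

Running the check as the operating system user that owns the Oracle processes gives the most representative result, since read permission is evaluated for the calling user.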

The issue persisted from October 2008 to January 2009, when the system was restarted by a power cycle. The failure of the bootstrap to load a valid Data Guard configuration could have caused the primary and standby to fall out of sync, as indicated by the unsynchronized Fast-Start Failover (FSFO) configuration error ORA-16817.


Interpretation of errors ORA-16817, ORA-16820 and ORA-16825 in the drc log

Error ORA-16825 and the associated errors ORA-16820 and ORA-16817 appear to have started after the successful bootstrap of the Data Guard broker configuration in January 2009. However, the broker was started without first resolving the errors caused by the faulty Unix mount described above and without reconciling the gaps reported in v$archive_gap. This left the system in an unstable state, with a failed attempt at managed recovery or resynchronization.

An excerpt of the error messages from the drc logs that were reviewed is given here:

DG 2009-01-23-10:10:12 0 2 0 RSM0: HEALTH CHECK ERROR: ORA-16820: Fast-Start Failover observer is no longer observing this database

DG 2009-01-23-10:10:12 0 2 676894187 Operation CTL_GET_STATUS cancelled during phase 2, error = ORA-16825

DG 2009-01-23-10:10:12 0 2 676894187 Operation CTL_GET_STATUS cancelled during phase 2, error = ORA-16825

DG 2009-01-23-10:11:12 0 2 0 RSM0: HEALTH CHECK WARNING: ORA-16817: unsynchronized Fast-Start Failover configuration
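
When a drc log is long, tallying the ORA codes and noting when each first appears makes it easier to see which errors dominate. The sketch below does this for the four lines quoted above. It assumes every entry starts with "DG " followed by a timestamp in the format shown in the excerpt; the function and variable names are illustrative.

```python
import re
from collections import Counter

ORA_CODE = re.compile(r"ORA-\d{5}")
# Assumed entry prefix, based only on the excerpt in this article.
TIMESTAMP = re.compile(r"^DG (\d{4}-\d{2}-\d{2}-\d{2}:\d{2}:\d{2})")

# The four lines quoted in the excerpt.
SAMPLE = [
    "DG 2009-01-23-10:10:12 0 2 0 RSM0: HEALTH CHECK ERROR: ORA-16820: "
    "Fast-Start Failover observer is no longer observing this database",
    "DG 2009-01-23-10:10:12 0 2 676894187 Operation CTL_GET_STATUS "
    "cancelled during phase 2, error = ORA-16825",
    "DG 2009-01-23-10:10:12 0 2 676894187 Operation CTL_GET_STATUS "
    "cancelled during phase 2, error = ORA-16825",
    "DG 2009-01-23-10:11:12 0 2 0 RSM0: HEALTH CHECK WARNING: ORA-16817: "
    "unsynchronized Fast-Start Failover configuration",
]


def summarize_ora_errors(lines):
    """Count each ORA code and record the timestamp of its first occurrence."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        match = TIMESTAMP.match(line)
        timestamp = match.group(1) if match else None
        for code in ORA_CODE.findall(line):
            counts[code] += 1
            first_seen.setdefault(code, timestamp)
    return counts, first_seen


if __name__ == "__main__":
    counts, first_seen = summarize_ora_errors(SAMPLE)
    for code, count in counts.most_common():
        print(code, count, "first seen", first_seen[code])
```

To scan a real log, pass an open file object instead of SAMPLE, since the function accepts any iterable of lines.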


Repeated invocation of "Entered rfm_get_chief_lock() for MON_VERIFY, reason 0" in the drc log

The continued occurrence of the "Entered rfm_get_chief_lock()" entry in the log is attributed to known Oracle Bug 5752220, "BROKER IS HANGING WHEN ENABLING CONFIGURATION". The repeated calls to the procedure rfm_get_chief_lock spawn a large number of threads, ultimately leaving the system hung. This was observed at the customer site as well.

As this bug was not noticed with any of our customers in the past, it is believed that the already unstable state of the system caused by the Unix mount error created the conditions that triggered it. The bug affects Oracle versions 10.2.0.1 and 10.2.0.2.

Resolution

The current logs show that, even after a successful bootstrap of the Data Guard broker, the system remains unstable because of errors left unresolved before that bootstrap. It is strongly suggested to investigate and resolve the existing differences in the archive logs, along with any other outstanding errors, before considering the system stable. In addition, RKM Appliance 1.6.1 and later use Oracle 10.2.0.4, which is free of some critical bugs found in earlier releases. A planned move to Appliance 1.6.1 at the earliest opportunity is therefore strongly advised.
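
As a starting point for investigating the archive log differences, the rows of v$archive_gap on the standby can be turned into a count of missing log sequences per thread. The sketch below is illustrative only: it assumes the low and high sequence numbers bound an inclusive range of missing logs, it does not connect to the database (run the query with whichever Oracle client is available and pass in the rows), and the rows in the usage example are hypothetical.

```python
# Standard Oracle view; run this query on the standby database.
ARCHIVE_GAP_QUERY = (
    "SELECT thread#, low_sequence#, high_sequence# FROM v$archive_gap"
)


def summarize_gaps(rows):
    """Summarize (thread, low_sequence, high_sequence) rows from v$archive_gap.

    Assumption: low and high bound an inclusive range of missing sequences.
    """
    summary = []
    for thread, low, high in rows:
        summary.append(
            {"thread": thread, "low": low, "high": high, "missing": high - low + 1}
        )
    return summary


def has_archive_gap(rows):
    """Return True if the query returned at least one row."""
    return len(list(rows)) > 0


if __name__ == "__main__":
    hypothetical_rows = [(1, 7, 10)]  # placeholder values, not from this case
    for gap in summarize_gaps(hypothetical_rows):
        print(
            f"thread {gap['thread']}: sequences {gap['low']}-{gap['high']} "
            f"({gap['missing']} logs missing)"
        )
```

An empty result from the query means that no gap is currently detected. That is a necessary check before calling the system stable, but it does not rule out the broker errors listed earlier.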

Notes
BZ 118378

Legacy Article ID
a44562