000014770 - Unable to join a new cluster to a group; no failures are reported when joining the clusters

Document created by RSA Customer Support Employee on Jun 16, 2016. Last modified by RSA Customer Support Employee on Apr 22, 2017.
Version 2

Article Content

Article Number: 000014770
Applies To: RSA Key Manager Appliance
Issue: Unable to join a new cluster to a group; no failures are reported when joining the clusters.
The operation to join a cluster to a group appears to complete without errors in the RKM Appliance Operations Console, but on both clusters neither the Clusters information nor the Query Replication State option in the Operations Console shows that the clusters are joined.
Cause: The original standby, reconfigured as the primary of a new cluster, was assigned a new host name, but its IP address was not changed and the cluster number chosen for the new cluster was the same as that of the first cluster. The IP address of the reconfigured appliance must differ from its original one, and the cluster number of each cluster must be unique and between 1 and 9.
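The two conditions named in the cause can be checked before a reinstall is attempted. The following is a minimal Python sketch, not part of the RKM Appliance tooling; the function name, parameter names, and message strings are hypothetical and chosen only for illustration.

```python
import ipaddress
from typing import List


def find_join_conflicts(original_ip: str, new_ip: str,
                        first_cluster_number: int,
                        new_cluster_number: int) -> List[str]:
    """Return the problems described in the Cause section, if any.

    Hypothetical helper: the appliance itself does not provide this check.
    """
    problems = []
    # The reconfigured appliance must not keep the IP address it had as a standby.
    if ipaddress.ip_address(new_ip) == ipaddress.ip_address(original_ip):
        problems.append("IP address unchanged from the original appliance")
    # Each cluster in the group needs its own cluster number.
    if new_cluster_number == first_cluster_number:
        problems.append("cluster number duplicates the first cluster")
    # Cluster numbers must be between 1 and 9.
    if not 1 <= new_cluster_number <= 9:
        problems.append("cluster number outside the range 1 to 9")
    return problems
```

An empty list means that neither condition applies; it does not prove that the join will succeed.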
Resolution: The state of the first cluster (which has only a primary at this point) is unknown, and the cluster may need to be restored from previous backups. Even so, the following steps can be taken to configure the two clusters in a group:

1. Back up the primary in the first cluster (if not already backed up).  See the RKM Appliance Deployment Guide for more details on how to configure backup.

2. Run the uninstall script on the second appliance (currently configured as the primary of the second cluster).

3. Reconfigure/reinstall the second appliance as a new cluster using:
    - a new IP address (different from its original IP address)
    - a new host name (different from any host name previously assigned to it)
    - a new cluster name (different from the first cluster's name and from any name previously attempted for the new cluster)
    - a new cluster number (different from the first cluster's number and from any number previously attempted for the new cluster, and no greater than 9)
   The new IP address and host name chosen for the second appliance must resolve properly, both by IP address and by host name.
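The requirement in step 3 that the new address and host name resolve both ways can be verified with a forward and a reverse lookup. The sketch below uses Python's standard socket module; the function name and the injectable forward and reverse parameters are assumptions made for illustration, and the comparison assumes the reverse lookup returns the same form of the name (short or fully qualified) that was passed in.

```python
import socket


def resolves_both_ways(hostname, ip,
                       forward=socket.gethostbyname,
                       reverse=socket.gethostbyaddr):
    """Return True if hostname resolves to ip and ip resolves back to hostname.

    Hypothetical helper; `forward` and `reverse` exist only so the lookups
    can be replaced in tests.
    """
    try:
        forward_ip = forward(hostname)          # host name -> IPv4 address
        primary, aliases, _addrs = reverse(ip)  # IP address -> host names
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return False
    wanted = hostname.lower().rstrip(".")
    names = {n.lower().rstrip(".") for n in [primary, *aliases]}
    return forward_ip == ip and wanted in names
```

Running the check from both appliances, with the host name exactly as it will be entered during configuration, is one way to catch a stale DNS or hosts-file entry left over from the appliance's earlier identity.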
Workaround: Initially, two appliances were configured in a single cluster (one as the primary and the other as a standby).  The standby was removed from the cluster and reconfigured as a new cluster (with only a primary) so that it could be joined in a group with the first cluster.
The original standby appliance was removed from the original cluster using the following steps:

1. Ran the uninstall script on the standby

2. Took the following steps to make the primary appliance accept updates:
   a. su - oracle
   b. Connect to the DGMGRL CLI with the command dgmgrl sys/passwd
   c. Disable fast start failover in primary with the force option:
         DGMGRL> disable fast_start failover force;
   d. Connect to the database with sqlplus / as sysdba and issue the command:
         alter system set dg_broker_start=false;
   e. Shut down the primary database with the shutdown immediate command in sqlplus
   f. Start up the primary database with the startup command in sqlplus
   g. If access to the KMS GUI fails with the error "You are not authorized to access this resource", the Access Manager services may require a restart. To restart the Access Manager and Tomcat services manually, log in as root and run the following commands:
         service ctrust stop
         service tomcat stop
         service ctrust start
         [wait a few seconds, then issue:]
         service tomcat start
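The restart order in step g (stop both services, start ctrust, pause, then start tomcat) can be scripted so that the pause is not skipped. This is a minimal Python sketch under stated assumptions: the function name is hypothetical, the 5-second default delay is a guess at "a few seconds" and not a documented value, and the script must run as root on the appliance.

```python
import subprocess
import time

# Order taken from step g: stop both services, then start ctrust before tomcat.
RESTART_STEPS = [
    ("ctrust", "stop"),
    ("tomcat", "stop"),
    ("ctrust", "start"),
    ("tomcat", "start"),
]


def restart_access_manager(run=None, sleep=time.sleep, delay_seconds=5.0):
    """Issue the service commands in order and return the commands that ran.

    `run` and `sleep` are injectable for dry runs; delay_seconds is an
    assumed value, since the article only says to wait a few seconds.
    """
    if run is None:
        # check=True raises CalledProcessError if a command exits non-zero.
        run = lambda cmd: subprocess.run(cmd, check=True)
    issued = []
    for service, action in RESTART_STEPS:
        if (service, action) == ("tomcat", "start"):
            sleep(delay_seconds)  # give ctrust time to come up first
        cmd = ["service", service, action]
        run(cmd)
        issued.append(cmd)
    return issued
```

Passing run=print and sleep=lambda s: None gives a dry run that only prints the commands. Because of check=True, a stop command that exits non-zero (for example, if the service is already stopped) aborts the sequence, which may or may not be the desired behavior.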
Legacy Article ID: a49140