000012663 - RSA Authentication Manager database instance and other services fail to start after reboot with ORA-1172

Document created by RSA Customer Support Employee on Jun 14, 2016. Last modified by RSA Customer Support Employee on Apr 7, 2017.
Version 2

Article Content

Article Number: 000012663
Applies To
RSA Product Set: SecurID
RSA Product/Service Type: Authentication Manager
RSA Version/Condition: 7.1 SP2 and higher, 3.0 SP2 and higher
O/S Version: Microsoft Windows Server 2003 or 2008
 
Issue
  • The RSA Authentication Manager Administration Server and the other RSA Authentication Manager services all fail to start after a reboot.
  • The operating system disk is full.
  • C:\Windows\Temp\ is full of dbmgmt<###>.sql files.
  • The <RSA_HOME>/db/admin/<instance>/bdump/<instance>*.trc files contain the following errors:
ORA-01172:  recovery of thread 1 stuck at block 2 of file 2
ERROR at line 1:
ORA-16038: log 1 sequence# 4294 cannot be archived
ORA-19809: limit exceeded for recovery files
ORA-00312: online log 1 thread 1:
'/usr/local/RSASecurity/RSAAuthenticationManager/db/oradata/azqjz0aw/redo01.log'
Starting up ORACLE RDBMS Version: 10.2.0.4.0.
Error reported in the alert log: ORA-19815: WARNING: db_recovery_file_dest_size of 107374182400 bytes is 100.00% used, and has 0 remaining bytes available.
ORA-19809:  DB instance fails to start with 1067 error
System parameters with non-default values
ALTER DATABASE OPEN
ORA-1507 signalled during: ALTER DATABASE OPEN
Windows could not start the RSA Authentication Manager Database Instance service on <Server Name>.
Error 1067: The process terminated unexpectedly
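The errors listed above can be searched for in the trace files in one pass. The following is a minimal sketch, assuming a POSIX shell with grep is available on the server; the function name scan_ora_errors is hypothetical and is not part of any RSA or Oracle tool.

```shell
#!/bin/sh
# Hypothetical helper (not an RSA or Oracle utility).
# Prints every line in the given files that mentions one of the Oracle
# error codes listed in this article. The pattern accepts both the
# zero-padded form (ORA-01172) and the short form (ORA-1172).
scan_ora_errors() {
    grep -hE 'ORA-(0?1172|16038|19809|19815|00312)' "$@"
}
```

For example, it could be pointed at the <RSA_HOME>/db/admin/<instance>/bdump/<instance>*.trc files named above, after substituting the real paths. The function prints nothing if none of the codes appear.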
Resolution
  • On a Windows server where the C:\Windows\Temp\ folder is full of .sql files, the folder needs to be renamed. DO NOT try to CD into it or remove it while the server is booted normally. To remove it safely, boot to Safe Mode if necessary, create a temporary alternate folder, and then remove the original folder and/or the .sql files.
  • If deleting all the .sql files does not resolve the issue, a second option is to check the system fingerprint. If hardware was changed, the server stops running as a security check until the fingerprint is regenerated. From a command line on the primary, run the following:
cd <drive>\Program Files\RSA Security\RSA Authentication Manager\utils
rsautil manage-secrets -a recover

Enter master password:  <enter the system's master password>
System fingerprint restored successfully.


Authentication Manager services still do not start


If the Authentication Manager services still do not start, proceed with the following:
  1. Stop all Authentication Manager services, either in Windows Services.msc or, on Unix/Linux, with the following commands:
rsaadmin@cs-appliance3-03:~> cd /usr/local/RSASecurity/RSAAuthenticationManager/server
rsaadmin@cs-appliance3-03:~> ./rsaam stop all
rsaadmin@cs-appliance3-03:~> exit
rsaadmin@cs-appliance3-03:~> sudo su - 
rsaadmin's password: <enter OS user password> 

  2. Look for stuck Oracle processes, which should not exist if Oracle stopped cleanly. The database will not start while these stuck processes are present. For example:
[root@cs-appliance3-03 ~]# ps -ef | grep ora
rsaadmin  5006     1  0 16:26 ?        00:00:00 ora_j001_oyrpf14j
root      9469  9141  0 16:28 pts/0    00:00:00 grep ora
rsaadmin  9655     1  0 May14 ?        00:00:00 ora_q004_oyrpf14j
rsaadmin 11829     1  0 May09 ?        00:00:53 ora_pmon_oyrpf14j
rsaadmin 11831     1  0 May09 ?        00:00:27 ora_psp0_oyrpf14j
rsaadmin 11833     1  0 May09 ?        00:00:08 ora_mman_oyrpf14j
rsaadmin 11835     1  0 May09 ?        00:00:42 ora_dbw0_oyrpf14j
rsaadmin 11837     1  0 May09 ?        00:01:02 ora_lgwr_oyrpf14j
rsaadmin 11839     1  0 May09 ?        00:06:57 ora_ckpt_oyrpf14j

  3. If you still see these processes after you have stopped the database, either reboot or, as root, kill them manually with kill -9 and their PID, which is the number in the second column from the left. For example:
kill -9 11839

  4. After killing the PID, check again. Sometimes you have to kill every single PID; sometimes killing one PID stops all the other Oracle PIDs.
[root@cs-appliance3-03 ~]# ps -ef | grep ora
rsaadmin  5006     1  0 16:26 ?        00:00:00 ora_j001_oyrpf14j
root      9469  9141  0 16:28 pts/0    00:00:00 grep ora
rsaadmin  9655     1  0 May14 ?        00:00:00 ora_q004_oyrpf14j
rsaadmin 11829     1  0 May09 ?        00:00:53 ora_pmon_oyrpf14j
rsaadmin 11831     1  0 May09 ?        00:00:27 ora_psp0_oyrpf14j
rsaadmin 11833     1  0 May09 ?        00:00:08 ora_mman_oyrpf14j
rsaadmin 11835     1  0 May09 ?        00:00:42 ora_dbw0_oyrpf14j
rsaadmin 11837     1  0 May09 ?        00:01:02 ora_lgwr_oyrpf14j

  5. Work your way down, or up, the list of process IDs until all ora processes except your own grep are gone. For example:
kill -9 11839
kill -9 11837
kill -9 11835
kill -9 11833
...
[root@cs-appliance3-03 ~]# ps -ef | grep ora
root      7786  7573  0 16:40 pts/0    00:00:00 grep ora
[root@cs-appliance3-03 ~]#

  6. Now search for running Java processes as well, using the following command:
ps -ef | grep java

  7. If any are found, kill them until only the grep remains.
  8. Try to start the Authentication Manager services again:
sudo su rsaadmin
cd /usr/local/RSASecurity/RSAAuthenticationManager/server
./rsaam start all
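
When many Oracle processes are stuck, reading the PIDs off the ps listing by hand is error-prone. The following is a minimal sketch, assuming a POSIX shell with awk, that extracts only the PIDs of processes whose command name starts with ora_. The function name list_ora_pids is hypothetical. It only prints PIDs and does not kill anything.

```shell
#!/bin/sh
# Hypothetical helper: reads `ps -ef` output on stdin and prints the PID
# (second column from the left) of every process whose last field, the
# command name, starts with "ora_". The line for `grep ora` is skipped
# because its last field is "ora", which has no trailing underscore.
list_ora_pids() {
    awk '$NF ~ /^ora_/ { print $2 }'
}
```

Running ps -ef | list_ora_pids prints one PID per line. Review the list before passing any of the PIDs to kill -9 as root.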

When error ORA-1172 is signaled during ALTER DATABASE OPEN, the RSA Authentication Manager Database Server crash recovery or instance recovery could not apply a change to a block because it was not the next change. This can happen if the block was corrupted and then repaired during recovery. For example, if an unexpected power outage happened, the RSA Authentication Manager Database Server may fail to start. It may help to try the following:
  1. Stop all the services except RSA Authentication Manager Database Listener and RSA Authentication Manager Database Server. On Windows, use Services.msc. On a Unix/Linux server, navigate to /usr/local/RSASecurity/RSAAuthenticationManager/server and run the following:
./rsaam stop all
./rsaam start db

  2. You can also stop and start the Oracle database with the following commands, run from the utils directory:

./rsautil manage-database -a stop-db                  
./rsautil manage-database -a start-db

  3. From a command prompt or SSH session, navigate to <RSAHOME>/RSASecurity/RSAAuthenticationManager/utils.
  4. Run the following command to capture the com.rsa.db.root.password:
./rsautil manage-secrets -a get com.rsa.db.root.password
Enter master password:  <enter master password>

  5. Set the environment variables, then connect to the database with SQL*Plus. Note that for Unix/Linux the command is . ./rsaenv (dot space dot slash). For Windows, run the command rsaenv.cmd.
    . ./rsaenv                    
    sqlplus sys/<paste com.rsa.db.root.password captured above> as sysdba
    SQL*Plus: Release 10.2.0.4.0 - Production on Thu Sep 3 15:46:33 2009
    Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.
     
    Connected to:
    Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
    With the Partitioning, Data Mining and Real Application Testing options

    6. Run the following commands:
    SQL> shutdown immediate
    ORA-01109: database not open
    Database dismounted.
    ORACLE instance shut down.
    SQL> startup mount
    ORACLE instance started.
    Total System Global Area 1241513984 bytes
    Fixed Size                  1267212 bytes
    Variable Size             738200052 bytes
    Database Buffers          486539264 bytes
    Redo Buffers               15507456 bytes
    Database mounted.
    SQL> alter database open
      2  ;
    alter database open
    *
    ERROR at line 1:
    ORA-01172: recovery of thread 1 stuck at block 323 of file 2
    ORA-01151: use media recovery to recover block, restore backup if needed
    SQL> recover database;                              
    SQL> alter database open;
    Database altered.
    SQL> exit
     
    Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
    With the Partitioning, Data Mining and Real Application Testing options

    7. If the recover database; command is not needed, you will see the following messages. In that case, just continue to the alter database open; and exit commands.
    ORA-00283: recovery session canceled due to errors
    ORA-00264: no recovery required
    Media recovery complete.

    8. Now start all the services; this should resolve the issue.
    cd /usr/local/RSASecurity/RSAAuthenticationManager/server
    ./rsaam start all

    9. If you see the following messages, follow the steps in the section on Additional Oracle Errors below:
    ORA-16038: log 1 sequence# 5498 cannot be archived
    ORA-19809: limit exceeded for recovery files
    ORA-00312: online log 1 thread 1:
    '/usr/local/RSASecurity/RSAAuthenticationManager/db/oradata/kc5xtcke/redo01.log'
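
The procedure above branches on what alter database open; returns. The following is a minimal sketch of that decision, assuming a POSIX shell. The function name next_step and the labels it prints are hypothetical and exist only for illustration; they are not RSA or Oracle commands.

```shell
#!/bin/sh
# Hypothetical helper: reads the SQL*Plus output of `alter database open;`
# on stdin and prints which part of this article applies next.
next_step() {
    out=$(cat)
    case "$out" in
        # Archiving errors: use the Additional Oracle Errors section.
        *ORA-16038*|*ORA-19809*) echo "additional-oracle-errors" ;;
        # Recovery is stuck: run `recover database;` and open again.
        *ORA-01172*)             echo "recover-database" ;;
        # The database opened: exit SQL*Plus and start all services.
        *"Database altered."*)   echo "start-services" ;;
        *)                       echo "unknown" ;;
    esac
}
```

The archiving errors are checked first so that output containing both kinds of error is routed to the Additional Oracle Errors section.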


    Additional Oracle Errors


    1. Follow steps 1 - 5 above.
    2. Run the following commands from within SQL*Plus:
    SQL> shutdown immediate ;
    ORA-01109: database not open
    Database dismounted.
    ORACLE instance shut down.
    SQL> startup mount
    SQL> alter database clear unarchived logfile group 1;
    SQL> alter database clear unarchived logfile group 2;
    SQL> alter database clear unarchived logfile group 3;
    SQL> alter database open;
    SQL> shutdown immediate;
    SQL> startup;
    SQL> exit

    3. You may need to restart the database with rsautil instead of with ./rsaam:
    cd /usr/local/RSASecurity/RSAAuthenticationManager/utils
    ./rsautil manage-database -a stop-db 
    ./rsautil manage-database -a start-db 
    ../server/rsaam start all

    4. If this still does not start all of the Authentication Manager services, you may need to increase the archive log limit as a final fix. See KB 000027714 for details.
    sqlplus sys/<paste com.rsa.db.root.password captured above> as sysdba
    sql> shutdown immediate;
    sql> startup nomount;
    sql> alter system set db_recovery_file_dest_size=160G scope=both;
    sql> alter database mount;
    sql> alter database open;
    sql> exit
    -bash-3.00$ cd $RSA_HOME/db/bin
    -bash-3.00$ ./rman
    > connect target sys/"encrypted password"
    > crosscheck archivelog all;
    > delete expired archivelog all;
    > exit

    5. Stop and start the RSA Authentication Manager services.
    cd /usr/local/RSASecurity/RSAAuthenticationManager/server
    ./rsaam stop all
    ./rsaam start all
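
For reference, the recovery area sizes in this article can be compared with simple arithmetic. The following is a minimal sketch, assuming a POSIX shell with 64-bit arithmetic and assuming that the G suffix in db_recovery_file_dest_size means 1024^3 bytes. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for comparing recovery area sizes.

# Convert a size in bytes to whole GiB (1 GiB = 1024^3 bytes).
bytes_to_gib() {
    echo $(( $1 / 1024 / 1024 / 1024 ))
}

# Convert a size in GiB to bytes.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

bytes_to_gib 107374182400   # limit from the ORA-19815 warning; prints 100
gib_to_bytes 160            # the 160G limit set above; prints 171798691840
```

Under these assumptions, the limit in the ORA-19815 warning is 100 GiB, and the 160G setting raises it by 60 GiB.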
    Legacy Article ID: a57396
