000036992 - Collector reports ORA-12805: parallel query server died unexpectedly, in RSA Identity Governance & Lifecycle

Document created by RSA Customer Support Employee on Dec 7, 2018
Version 1


Article Number
000036992

Applies To
RSA Product Set: RSA Identity Governance & Lifecycle
RSA Product/Service Type: Appliance
RSA Version/Condition: 7.0.1 and above
Platform: Linux

Issue
A Data Collection fails, and the aveksaServer.log shows the following error:

11/03/2018 18:49:08.963 ERROR (Exec Task Consumer#0) [com.aveksa.server.xfw.TaskExecutor] Failed method=Execute ExecutionTask[TaskID=158008 RunID=274732 Source=3669 Type=DataRelationshipProcessing Status=InProgress] com.aveksa.server.xfw.ExecutionException: com.aveksa.server.collector.DataProcessorException: com.aveksa.server.db.PersistenceException:
java.sql.SQLException: ORA-20011: Approximate NDV failed: ORA-12805: parallel query server died unexpectedly
ORA-06512: at "AVUSER.EDC_DATA_COLLECTOR", line 265
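To correlate the failure with the Oracle logs, you need its exact timestamp. The following Python sketch pulls every aveksaServer.log entry that mentions ORA-12805, together with its TaskID, RunID, and ORA codes. It assumes the entry layout shown above (a timestamped first line followed by continuation lines) and a month/day/year date format, which matches the Nov 03 date in the Oracle alert log; the function name `ora12805_failures` is hypothetical.

```python
import re
from datetime import datetime

# Start of an aveksaServer.log entry, e.g. "11/03/2018 18:49:08.963 ERROR ..."
ENTRY_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) (\w+) ")
TASK_RE = re.compile(r"TaskID=(\d+) RunID=(\d+) Source=(\d+) Type=(\w+)")
ORA_RE = re.compile(r"ORA-(\d{5})")


def ora12805_failures(log_text):
    """Return one dict per log entry whose text mentions ORA-12805."""
    entries = []
    for line in log_text.splitlines():
        m = ENTRY_RE.match(line)
        if m:
            # A timestamped line starts a new entry; later lines are continuations.
            entries.append({
                "timestamp": datetime.strptime(m.group(1), "%m/%d/%Y %H:%M:%S.%f"),
                "level": m.group(2),
                "task": None,
                "ora_codes": [],
            })
        if not entries:
            continue  # ignore text before the first timestamped entry
        entry = entries[-1]
        t = TASK_RE.search(line)
        if t and entry["task"] is None:
            entry["task"] = {"task_id": int(t.group(1)), "run_id": int(t.group(2)),
                             "source": int(t.group(3)), "type": t.group(4)}
        # Record each distinct ORA code once, in order of appearance.
        for code in ORA_RE.findall(line):
            if code not in entry["ora_codes"]:
                entry["ora_codes"].append(code)
    return [e for e in entries if "12805" in e["ora_codes"]]
```

For the entry above, the sketch reports a timestamp of 2018-11-03 18:49:08.963, TaskID 158008, RunID 274732, and the codes ORA-20011, ORA-12805, and ORA-06512.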


Because this error comes from the Oracle database, examine the Oracle database alert log.
Its typical location is /u01/app/oracle/diag/rdbms/avdb/AVDB/trace/alert_AVDB.log.



At around the time the error appears in aveksaServer.log, the alert log should also reference a trace file written by an Oracle parallel query server process.
In the alert log extract below, for example, that process is p003 (trace file AVDB_p003_18234.trc).




Sat Nov 03 18:41:30 2018
Errors in file /u01/app/oracle/diag/rdbms/avdb/AVDB/trace/AVDB_p003_18234.trc:
ORA-15186: ASMLIB error function = [kfk_asm_ioerror],  error = [0],  mesg = [I/O Error]
ORA-01115: IO error reading block from file 10 (block # 12636800)
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk
ORA-15186: ASMLIB error function = [kfk_asm_ioerror],  error = [0],  mesg = [I/O Error]
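The search for the parallel process trace file can be sketched in Python. This assumes the alert log uses the `Sat Nov 03 18:41:30 2018` timestamp style shown in the extract (later Oracle releases may write ISO-8601 timestamps, which this sketch does not handle), that parallel query servers appear as `pNNN` in the trace file name, and that both logs use the same clock. `around` is the time of the aveksaServer.log error (18:49:08 in this example); the 15-minute window and the name `parallel_trace_files` are illustrative choices, not part of any Oracle or RSA tool.

```python
import re
from datetime import datetime, timedelta

# Timestamp header in the format shown in the extract, e.g. "Sat Nov 03 18:41:30 2018"
HEADER_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} \d{4}$")
ERRORS_IN_RE = re.compile(r"^Errors in file (\S+):$")
# Assumed naming of parallel query server traces: <SID>_pNNN_<OS pid>.trc
PX_NAME_RE = re.compile(r"_(p\d{3})_(\d+)\.trc$")
ORA_RE = re.compile(r"ORA-(\d{5})")


def parallel_trace_files(alert_text, around, window=timedelta(minutes=15)):
    """List parallel-server trace files reported within `window` of `around`."""
    results = []
    stamp = None
    current = None  # entry whose ORA codes are currently being collected
    for raw in alert_text.splitlines():
        line = raw.strip()
        if HEADER_RE.match(line):
            stamp = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
            current = None
            continue
        m = ERRORS_IN_RE.match(line)
        if m:
            current = None
            px = PX_NAME_RE.search(m.group(1))
            if px and stamp is not None and abs(stamp - around) <= window:
                current = {"timestamp": stamp, "trace_file": m.group(1),
                           "process": px.group(1), "ospid": int(px.group(2)),
                           "ora_codes": []}
                results.append(current)
            continue
        if current is not None:
            # Collect each distinct ORA code that follows the "Errors in file" line.
            for code in ORA_RE.findall(line):
                if code not in current["ora_codes"]:
                    current["ora_codes"].append(code)
    return results
```

Applied to the extract above with `around=datetime(2018, 11, 3, 18, 49, 8)`, the sketch should return a single entry for process p003 (OS pid 18234) with the codes ORA-15186, ORA-01115, and ORA-15081.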


Examining the parallel process trace file confirms the errors that caused the process to fail.




ORA-00603: ORACLE server session terminated by fatal error 
ORA-24557: error 1115 encountered while handling error 1115; exiting server process 
ORA-01115: IO error reading block from file (block # ) 
ORA-01115: IO error reading block from file 10 (block # 12636800) 
ORA-15081: failed to submit an I/O operation to a disk 
ORA-15081: failed to submit an I/O operation to a disk 
ORA-15186: ASMLIB error function = [kfk_asm_ioerror], error = [0], mesg = [I/O Error]
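The most useful detail in the trace file is the ORA-01115 line that carries a file and block number (the first ORA-01115 line has them blank). A small Python sketch, with hypothetical function names, that extracts the populated pairs and checks whether the session was terminated while handling an I/O read error:

```python
import re

# ORA-01115 with a populated file and block number; the blank variant does not match
ORA_01115_RE = re.compile(r"ORA-01115: IO error reading block from file (\d+) \(block # (\d+)\)")


def failed_blocks(trace_text):
    """Return unique (file number, block number) pairs in order of appearance."""
    pairs = []
    for m in ORA_01115_RE.finditer(trace_text):
        pair = (int(m.group(1)), int(m.group(2)))
        if pair not in pairs:
            pairs.append(pair)
    return pairs


def died_on_io_error(trace_text):
    """True if the trace shows both a fatal session termination (ORA-00603)
    and an I/O read error (ORA-01115)."""
    return "ORA-00603" in trace_text and "ORA-01115" in trace_text
```

For the trace above, `failed_blocks` should return `[(10, 12636800)]`. The file number can then be mapped to a datafile name, for example with `SELECT file_name FROM dba_data_files WHERE file_id = 10;`.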


Finally, because this error also involves ASMLIB, also examine the alert log for the Oracle ASM instance.
Its typical location is /u01/app/oracle/diag/rdbms/+asm/+ASM/trace/alert_+ASM.log.



At around the same time as the errors in the database alert log, the +ASM instance may report warnings such as the following.
These warnings show that Oracle ASM also detected problems writing to a disk.




Sat Nov 03 18:39:59 2018
WARNING: Waited 121 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 121 secs for write IO to PST disk 0 in group 1.
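PST here refers to the ASM Partner Status Table. To line these warnings up with the database errors, a short Python sketch can collect each wait with its timestamp, disk group, and disk. It assumes the `Sat Nov 03 18:39:59 2018` header format shown above, and the name `pst_write_waits` is hypothetical.

```python
import re
from datetime import datetime

# Timestamp header as in the extract, e.g. "Sat Nov 03 18:39:59 2018"
HEADER_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} \d{4}$")
PST_WAIT_RE = re.compile(r"WARNING: Waited (\d+) secs for write IO to PST disk (\d+) in group (\d+)\.")


def pst_write_waits(asm_alert_text):
    """Return (timestamp, group, disk, seconds) for each PST write-wait warning."""
    waits = []
    stamp = None  # most recent timestamp header seen
    for raw in asm_alert_text.splitlines():
        line = raw.strip()
        if HEADER_RE.match(line):
            stamp = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
            continue
        m = PST_WAIT_RE.search(line)
        if m:
            waits.append((stamp, int(m.group(3)), int(m.group(2)), int(m.group(1))))
    return waits
```

Here both warnings are for disk 0 in group 1 at 18:39:59, about 90 seconds before the database reported the I/O errors at 18:41:30.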

Cause
An I/O problem occurred while reading from or writing to the Oracle database files managed by Oracle Automatic Storage Management (ASM).

If no specific operating system or disk errors are reported, the problem is most likely an underlying hardware fault or, on a virtual machine, a fault in the disk emulation.
In some cases no OS error is reported because a Linux software defect causes the hardware to hang (for example, Oracle Linux orabug 20561622).

If the problem is hardware related, one or more of the following messages may also appear in the Oracle database alert log.

ORA-63999: data file suffered media failure



WARNING: Read Failed. group:1 disk:0 AU:171184 offset:0 size:262144 
path:ORCL:VOL1 
incarnation:0x0 asynchronous result:'I/O error' 
subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so krq:0x2b03c7aef670 bufp:0x2b03c96cf000 osderr1:0x3 osderr2:0x2e 
IO elapsed time: 436092000 usec Time waited on I/O: 0 usec 
WARNING: failed to read mirror side 1 of virtual extent 112098 logical extent 0 of file 277 in group [1.1781028343] from disk VOL1 allocation unit 171184 reason error; if possible, will try another mirror side

Finally, the errors can also occur if the disks managed by Oracle ASM have the wrong permissions.
In that case, the following errors may also be reported.

ORA-15025: could not open disk '/dev/sda3'
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied  (or Input/Output error)
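The markers listed in this section can be checked mechanically. This Python sketch (hypothetical name `likely_causes`) scans log text for the hardware and permission messages quoted above and reports which cause they point to. An empty result does not rule out hardware: as noted above, a hardware or disk emulation fault may produce no specific OS error at all.

```python
import re

# Messages quoted in this article that point to a hardware fault
HARDWARE_MARKERS = [
    re.compile(r"ORA-63999"),                        # data file suffered media failure
    re.compile(r"WARNING: Read Failed\."),           # ASM failed to read from a disk
    re.compile(r"asynchronous result:'I/O error'"),  # ASMLIB reported an I/O error
]
# Messages quoted in this article that point to wrong disk permissions
PERMISSION_MARKERS = [
    re.compile(r"ORA-15025"),                        # could not open disk
    re.compile(r"ORA-27041"),                        # unable to open file
    re.compile(r"Error: 13: Permission denied"),
]


def likely_causes(log_text):
    """Return the set of causes ("hardware", "permissions") suggested by the text."""
    causes = set()
    if any(p.search(log_text) for p in HARDWARE_MARKERS):
        causes.add("hardware")
    if any(p.search(log_text) for p in PERMISSION_MARKERS):
        causes.add("permissions")
    return causes
```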
Resolution

Hardware Fault


If you are running an RSA Appliance, please engage RSA Customer Support for further assistance, as the hard disk may need to be replaced.

If you are not running an RSA Appliance, ask your local Linux system administrator to review the hard disk and replace it if necessary.

Permission Denied


If the problem is permissions, check that the Oracle ASM disk used by RSA Identity Governance and Lifecycle has the correct ownership and permissions.

  1. Log in to the operating system as "root".
  2. Go to the directory where the Oracle ASM disk files are located: cd /dev/oracleasm/disks
  3. List the disks and their permissions: ll (or ls -l)

    /dev/oracleasm/disks # ll
    total 0
    brw-rw---- 1 oracle dba 8, 3 Oct 31 16:25 VOL1

  4. The block device file VOL1 should be owned by user "oracle" and group "dba", with read/write access for the owner and the group and no access for others (brw-rw----, mode 660).
  5. If the owner for VOL1 is incorrect, then use the Linux chown command to fix the ownership: chown oracle:dba VOL1
  6. If the permissions for VOL1 are incorrect, then use the Linux chmod command to fix the permissions: chmod 660 VOL1
 
Workaround
Restart the whole Linux server to see whether the hardware failure is only temporary.
You may want to shut down RSA Identity Governance and Lifecycle gracefully first:
  1. Log in to the operating system as "oracle".
  2. Shut down RSA Identity Governance and Lifecycle: acm stop
  3. Shut down the Oracle services, including the Oracle database and ASM: acm stoporacle
  4. Restart the Linux server.
However, please still consider replacing any faulty hardware.
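If you script the graceful shutdown, a Python sketch like the following runs steps 2 and 3 in order and stops at the first failure, so the Oracle services are not stopped while the application shutdown is incomplete. It defaults to a dry run that only lists the commands, assumes it is run as the "oracle" user (step 1) with `acm` on the PATH, and deliberately leaves the server restart (step 4) as a manual step. The function name `graceful_shutdown` is hypothetical.

```python
import subprocess

# Commands taken from steps 2 and 3 above
SHUTDOWN_STEPS = [
    ["acm", "stop"],        # stop RSA Identity Governance and Lifecycle
    ["acm", "stoporacle"],  # stop the Oracle services, including the database and ASM
]


def graceful_shutdown(run=subprocess.run, dry_run=True):
    """Run the shutdown steps in order; return the commands attempted."""
    executed = []
    for cmd in SHUTDOWN_STEPS:
        executed.append(" ".join(cmd))
        if dry_run:
            continue  # only report what would be run
        # check=True raises CalledProcessError on a non-zero exit,
        # which aborts the remaining steps.
        run(cmd, check=True)
    return executed
```

Calling `graceful_shutdown()` only returns the command list; pass `dry_run=False` to actually run the commands.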
