000035076 - Process for replacing RSA Identity Governance and Lifecycle faulty hard drive (HDD) in RAID 5

Document created by RSA Customer Support Employee on Jun 6, 2017. Last modified by RSA Customer Support Employee on Jun 6, 2017.
Version 2

Article Content

Article Number: 000035076
Applies To
Product Set: RSA Identity Governance and Lifecycle
RSA Version: All Versions
Platform: Dell Hardware Appliance R620, R720, R730
Operating Systems: SUSE Linux Enterprise Server, Red Hat Enterprise Linux
Issue
This document describes the process for replacing a faulty hard drive (HDD) in a RAID 5 array.
Resolution
If a hard drive blinks an amber light 4 times per second, the drive is faulty (failed), is offline, and needs to be replaced.
If lights are blinking on multiple hard drives, replace only one drive at a time, starting with the failed drive. The diagram below shows the different indicator patterns on hard drives, based on the light color and the way it blinks.
[Image: HDD Indicator Patterns]
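As a rough aid for reading the indicators, the patterns that this article names can be written as a small lookup. This is a minimal sketch: it covers only the three patterns mentioned in the text (the full indicator chart has more), and the names `LED_MEANING`, `drive_status`, and `first_drive_to_replace` are hypothetical helpers, not part of any RSA or Dell tool.

```python
# LED patterns named in this article only; the full indicator chart has more.
LED_MEANING = {
    ("amber", "blinking 4x per second"): "failed",
    ("green", "fast flashing"): "rebuilding",
    ("green", "steady"): "online",
}


def drive_status(color, pattern):
    """Return the status for an LED observation, or 'unknown' if not listed."""
    return LED_MEANING.get((color, pattern), "unknown")


def first_drive_to_replace(observations):
    """Pick the bay to replace first.

    observations maps a bay number to a (color, pattern) tuple.
    Returns the lowest-numbered bay whose LED indicates a failed drive,
    or None when no bay shows the failed pattern.
    """
    failed = [
        bay for bay, led in observations.items()
        if drive_status(*led) == "failed"
    ]
    return min(failed) if failed else None
```

For example, `first_drive_to_replace({0: ("green", "steady"), 2: ("amber", "blinking 4x per second")})` selects bay 2, matching the rule of replacing the failed drive first and only one drive at a time.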
RAID 5 tolerates the failure of one drive. In addition, the appliance ships with a hot spare that is activated automatically to take over if a drive fails. If a second drive fails after the hot spare has been engaged, RAID 5 can still run, but in a degraded state.
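The single-drive tolerance of RAID 5 comes from parity: each stripe stores the XOR of its data blocks, so any one missing block can be recomputed from the rest. The sketch below illustrates that general principle only. The block contents are made up, parity is kept in one place rather than rotated across drives as real RAID 5 does, and it is not a description of how the PERC controller is implemented.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across three data drives (illustrative contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing drive 1, then rebuild its block from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
```

If two blocks of the same stripe are lost, there is no longer enough information to solve for both, which is why a further failure before a rebuild finishes is the main risk.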
This procedure is for hot-swappable drives. Hot swap is an important part of data protection.
When the drives are hot-swappable (accessible and removable from the front of the machine), they should be removed and replaced "hot", that is, while the server is running. For a running server with a failed drive, powering down is strongly discouraged, because power cycling a failed array can trigger array failure. The risk in RAID 5 is losing another drive before the rebuild completes, so the sooner the failed drive is replaced the better. Replacing it is also easier than rebuilding a server from backups.
Hot-swap capability is determined by your RAID controller. If you have a PERC (which ships by default in the server), you have hot swap. (You can disable it, but this is not recommended.)
Faulty drives should therefore be replaced while the system is on. A faulty HDD is shown with a ‘failed’ state. For example, the iDRAC Virtual Console shows the status below:
[Image: drive status in the iDRAC Virtual Console]

To replace the faulty HDD, remove it slowly from the bay, wait 10-12 seconds, and insert the new drive. After about 30 seconds, the server recognizes the new HDD and starts rebuilding automatically. This is indicated by a fast-flashing green light. Your storage software should show the state of the drive as rebuilding.
[Image: drive state during rebuild]

At this point, let the system repair the RAID until everything shows as healthy. Once the light shows steady green, the drive replacement process is complete. Note that it is the RAID array controller (PERC) that performs the rebuild of the HDD.
Once the rebuild is complete, the disk state is Online, which indicates that the disk is now part of the RAID array.
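If you prefer to track the rebuild from a script rather than by watching the light, the wait can be sketched as a simple polling loop. This is only an illustration under stated assumptions: `read_state` is a hypothetical callable that you supply, and it must return the drive state as a string by whatever means your storage software provides. The state names follow this article (failed, rebuilding, Online), and the polling interval and limit are arbitrary defaults.

```python
import time


def wait_until_online(read_state, poll_seconds=60, max_polls=1000, sleep=time.sleep):
    """Poll read_state() until the replaced drive reports Online.

    read_state is a hypothetical callable supplied by the caller; this
    sketch does not query any controller itself.
    Returns the list of states observed, in order.
    Raises RuntimeError if the drive reports a failed state or the
    polling limit is reached first.
    """
    seen = []
    for _ in range(max_polls):
        state = read_state()
        seen.append(state)
        if state.lower() == "online":
            return seen
        if state.lower() == "failed":
            raise RuntimeError("drive reported a failed state during rebuild")
        sleep(poll_seconds)
    raise RuntimeError("drive did not reach Online within the polling limit")
```

Passing `sleep` as a parameter keeps the loop easy to exercise with a canned sequence of states before pointing it at a real system.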
Notes
The Global Hot Spare is not part of the array, and READY is its normal state. A hard drive moves from the READY state to the Online state only when it becomes part of the array, having been assigned as a Global Hot Spare so that it can trigger the rebuild process.