000030838 - How to replace a single hard drive in an RSA NetWitness Logs & Network appliance

Document created by RSA Customer Support Employee on Jun 14, 2016. Last modified by RSA Customer Support on Jun 15, 2018.
Version 4

Article Content

Article Number: 000030838
Applies To:
RSA Product Set: NetWitness Logs & Network, Security Analytics, NetWitness NextGen (Legacy)
RSA Product/Service Type: Core Appliance, Packet Hybrid, Log Hybrid, SA All-in-One, Event Stream Analysis (ESA), Malware Analysis, Warehouse, Archiver, Security Analytics Server
RSA Version/Condition: 9.8, 10.1.x, 10.2.x, 10.3.x, 10.4.x, 10.5.x, 10.6.x
Platform: CentOS
Platform (Other): Series 4 Appliances, S4S Appliances, Security Analytics Appliances, MegaCli Hardware RAID
O/S Version: EL5, EL6
Issue: Need to replace a hard disk in either an appliance head unit or a Direct Attached Capacity (DAC) array.
Resolution

Notes:



  • Steps 1 to 7 may be done remotely either via SSH or DRAC Console
  • If you need to replace more than one disk then replace them in the following order of preference:
    • Disks on-board the appliance head unit (versus a DAC)
    • Disks which currently have physical state of Failed, Offline, Unknown state or Missing
    • Disks which currently have physical state of Unconfigured or Hotspare
  • If you are replacing the OS drive on an appliance (these show as 136.732 GB in nwraidutil.pl output), shut down the appliance operating system first. Before shutting down, stop capture/aggregation on decoders and concentrators in either the SA UI or NetWitness Administrator, and then stop the services. For example, on a packet concentrator running CentOS 6: stop nwappliance && stop nwconcentrator
  • Warning: If you need to replace more than one disk, do not replace all disks simultaneously. If the previous disk was a member of a virtual disk/RAID volume, wait for any RAID rebuild to complete before replacing the next disk (see Step 6 for the command).
  • For SA Warehouse Nodes (MAPR SAW) please refer to the article entitled Security Analytics | How to replace faulty disk on RSA Security Analytics Warehouse (SAW) node in RSA Security Analytics 10.3 and higher instead of this article for disk replacement.


Steps for Replacing Single Disk
1. Confirm the logical location of the disk
Use nwraidutil.pl to confirm the Adapter and whether the disk is on-board the appliance head unit or on a DAC.
nwraidutil.pl - How to Download and Use the RSA NetWitness RAID Utility


Record the enclosure number and the slot number of the disk to be replaced, if not already known. For example, Enclosure 15 Slot 2 is referred to as [15:2] in the commands below.
Please refer to Example 2 in notes below.


The disks in the appliance head unit typically show as either 136.732 GB (146 GB) or 931.512 GB (1 TB) and are on Adapter 0, which is usually a Dell PERC H700/H710. The disks in a JBOD/DAC are typically in enclosures attached to Adapter 1, which is usually a Dell PERC H800/H810.
In rare instances Linux reverses the Adapters and the on-board disks are on Adapter 1.


2. Confirm the current role of the disk - if it is part of a Virtual Disk (VD), a Hotspare or Unconfigured/Failed
Use nwraidutil.pl and examine Physical Disk Information.

2a. If the disk to be replaced shows as GEI or ID-X (e.g. ID-0) in the Physical Disk Information, then it is a hotspare.
2b. If the disk to be replaced shows as (U) - Unconfigured, (X) - Offline or (!) - Failed in the Physical Disk Information, then it is currently not part of a VD or a hotspare.
Please refer to Example 2 in notes below.

Use nwraidutil.pl and examine Logical Disk Information.
2c. If the disk to be replaced shows as (O) in the Logical Disk Information, then it is part of a VD set.
Please refer to Example 1 in notes below.
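
The role check in Step 2 can be summarised as a small lookup. The sketch below is illustrative only: disk_role is a hypothetical helper name (not part of nwraidutil.pl), and it assumes you pass in the state code exactly as it appears between the parentheses in nwraidutil.pl output. Treating (OS) as an online SSD member is an assumption based on the state legend in the examples.

```shell
#!/bin/sh
# Hypothetical helper: map a nwraidutil.pl state code to the role described in Step 2.
# $1 = state code without parentheses, e.g. GEI, ID-0, U, X, !, O, OS
disk_role() {
    case "$1" in
        G*|ID-*)   echo "hotspare" ;;    # 2a: GEI or ID-X
        U|X|'!')   echo "not-in-vd" ;;   # 2b: Unconfigured, Offline or Failed
        O*)        echo "vd-member" ;;   # 2c: Online (OS = online SSD, assumed)
        *)         echo "unknown" ;;
    esac
}

disk_role GEI
disk_role O
```

A result of vd-member means Steps 3 to 5 apply; for the other roles they can be skipped, as noted in those steps.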

3. Take Disk Offline
Note: This does not need to be done if the disk is not currently part of a VD and so does not show as (O) in Logical Disk Information.




Syntax:
/opt/MegaRAID/MegaCli/MegaCli64 -PDOffline -PhysDrv[ENCLOSURE:SLOT] -a<adapter_num>



Example - Replacing the 14th disk in 1st DAC (slots start at 0)
Example Command:
/opt/MegaRAID/MegaCli/MegaCli64 -PDOffline -PhysDrv[15:13] -a1

Example Output:
Adapter: 1: EnclId-15 SlotId-13 state changed to OffLine.

Exit Code: 0x00

Disk state may change to (U) for Unconfigured and in Adapter Information, you will see: [VD's Degraded]  1

4. Mark drive as missing [optional]
Note: This does not need to be done if the disk is not currently part of a VD and so does not show as (O) in Logical Disk Information.



Syntax:
/opt/MegaRAID/MegaCli/MegaCli64 -PdMarkMissing -physdrv[ENCLOSURE:SLOT] -a<adapter_num>



Example - Replacing the 14th disk in 1st DAC
Example Command:
/opt/MegaRAID/MegaCli/MegaCli64 -PdMarkMissing -physdrv[15:13] -a1

Example Output:
Adapter: 1: EnclId-15 SlotId-13 is marked Missing.

Exit Code: 0x00

 

5. Prepare for removal
Note: This does not need to be done if the disk is not currently part of a VD and so does not show as (O) in Logical Disk Information.




Syntax:
/opt/MegaRAID/MegaCli/MegaCli64 -PdPrpRmv -physdrv[ENCLOSURE:SLOT] -a<adapter_num>



Example - Replacing the 14th disk in 1st DAC
Example Command:
/opt/MegaRAID/MegaCli/MegaCli64 -PdPrpRmv -physdrv[15:13] -a1

Example Output:
Prepare for removal Success

Exit Code: 0x00
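
Steps 3 to 5 use the same enclosure, slot and adapter values, so it can help to generate the three commands together and review them before running anything. The following is a dry-run sketch: print_removal_cmds is a hypothetical helper that only prints the commands shown above and does not execute MegaCli.

```shell
#!/bin/sh
# Dry-run sketch: print the Step 3, 4 and 5 commands for one disk.
MEGACLI=/opt/MegaRAID/MegaCli/MegaCli64

print_removal_cmds() {
    # $1 = enclosure, $2 = slot (slots start at 0), $3 = adapter number
    drv="[$1:$2]"
    echo "$MEGACLI -PDOffline -PhysDrv$drv -a$3"      # Step 3: take disk offline
    echo "$MEGACLI -PdMarkMissing -physdrv$drv -a$3"  # Step 4: mark as missing
    echo "$MEGACLI -PdPrpRmv -physdrv$drv -a$3"       # Step 5: prepare for removal
}

# 14th disk in the 1st DAC (enclosure 15, slot 13) on Adapter 1
print_removal_cmds 15 13 1
```

Run each printed command one at a time and check that the Exit Code is 0x00 before moving to the next.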


6. Show progress of rebuild [optional]
If there is a hotspare disk available and a member of the VD is taken offline, then the hotspare disk state will change to (R) - Rebuild once Step 3 is run.
This is not applicable if the disk taken offline was not previously (O) in the Logical Disk Information.



Syntax:
/opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -PhysDrv[ENCLOSURE:SLOT] -a<adapter_num>



Example - Watching rebuild of hotspare to member of VD
Example Command:
/opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -PhysDrv[15:14] -a1

Example Output:
Rebuild Progress on Device at Enclosure 15, Slot 14 Completed 3% in 7 Minutes.

Exit Code: 0x00
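
If you want to check the rebuild from a script, the percentage can be pulled out of the progress line. This is a minimal sketch that assumes the output has the same wording as the Example Output above; rebuild_percent is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: extract the "Completed N%" figure from -PDRbld -ShowProg output on stdin.
rebuild_percent() {
    sed -n 's/.*Completed \([0-9][0-9]*\)% in .*/\1/p'
}

# Using the Example Output from this step as input:
printf '%s\n' 'Rebuild Progress on Device at Enclosure 15, Slot 14 Completed 3% in 7 Minutes.' | rebuild_percent
```

On a live system the input would come from piping the Step 6 command into rebuild_percent. The function prints nothing if no progress line is present.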


7. Show physical location of drive to be replaced


Start flashing a drive's amber LED (you may need to remove the DAC faceplate to see the drive LEDs properly):
/opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -start -physdrv[ENCLOSURE:SLOT] -a<adapter_num>



Stop flashing a drive's LED:
/opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -stop -physdrv[ENCLOSURE:SLOT] -a<adapter_num>


8. Physically replace drive
 

9. Get the missing drive's Array and Row (required for next step) [optional if Step 4 was skipped or failed]
This is optional if Step 4 was skipped or failed but the physical disk status shows as (U) - Unconfigured.




Syntax:
/opt/MegaRAID/MegaCli/MegaCli64 -PdGetMissing -a<adapter_num>



Example Command:
/opt/MegaRAID/MegaCli/MegaCli64 -PdGetMissing -a1

Example Output 1:
    Adapter 1 - No Missing Drive is Found.

Exit Code: 0x00

Example Output 2:
    Adapter 1 - Missing Physical drives

    No.   Array   Row   Size Expected
    1     1       13    139392 MB

Exit Code: 0x00


If the output shows "No Missing Drive is Found", you can skip Step 10.
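
The Array and Row columns from this output are the values that go into the -PdReplaceMissing command in Step 10. The sketch below shows one way to extract them and print the resulting command without running it. It assumes the table layout shown in Example Output 2; missing_array_row and replace_missing_cmds are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: turn -PdGetMissing output (on stdin) into Step 10 commands (printed only).
MEGACLI=/opt/MegaRAID/MegaCli/MegaCli64

missing_array_row() {
    # Data rows look like: "1     1       13    139392 MB" (No., Array, Row, Size)
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ { print $2, $3 }'
}

replace_missing_cmds() {
    # $1 = enclosure, $2 = slot, $3 = adapter number of the replacement disk
    missing_array_row | while read -r array row; do
        echo "$MEGACLI -PdReplaceMissing -PhysDrv[$1:$2] -Array$array -row$row -a$3"
    done
}

# Using Example Output 2 as input:
printf '%s\n' \
    '    Adapter 1 - Missing Physical drives' \
    '' \
    '    No.   Array   Row   Size Expected' \
    '    1     1       13    139392 MB' | replace_missing_cmds 15 13 1
```

With Example Output 1 as input nothing is printed, which corresponds to skipping Step 10.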



10. Replace Missing Drive


Syntax:
/opt/MegaRAID/MegaCli/MegaCli64 -PdReplaceMissing -PhysDrv[ENCLOSURE:SLOT] -Array<N> -row<N> -a<adapter_num>
Note: Arrays start at 0, so the first array is Array0.

Example Command:
/opt/MegaRAID/MegaCli/MegaCli64 -PdReplaceMissing -PhysDrv[15:13] -Array1 -row13 -a1

Example Output:
Adapter: 1: Missing PD at Array 1, Row 13 is replaced.

Exit Code: 0x00


11. If disk shows as (X) for Offline in nwraidutil.pl output, change disk back to Online.


Syntax:
/opt/MegaRAID/MegaCli/MegaCli64 -PDOnline -PhysDrv[ENCLOSURE:SLOT] -a<adapter_num>

Example Command:
/opt/MegaRAID/MegaCli/MegaCli64 -PDOnline -PhysDrv[15:13] -a1

Example Output:
EnclId-15SlotId-13 state changed to OnLine.

Exit Code: 0x00


12a. Show progress of rebuild (not applicable if replaced disk will become hotspare)
See Step 6 above for command.


OR


12b. If the disk was previously a hotspare (or there is currently no hotspare present in the enclosure), then a hotspare needs to be configured.
nwraidutil.pl should show the Physical Disk state as (U) - Unconfigured.





Example Output:
15    14    (U)       0             1.819 TB       HITACHI HUS72302CLAR2000C1D6YGKNKHTK


If the disk is onboard the head unit or comes from the 1st DAC on a S4/S4S/SA appliance then it is usually set as Global hotspare with Enclosure Affinity:


Syntax:
/opt/MegaRAID/MegaCli/MegaCli64 pdhsp set enclaffinity physdrv[ENCLOSURE:SLOT] -a<adapter_num>

After running this command disk status will show as GEI.


For disks from enclosures after the 1st DAC, it is usually recommended to configure hotspare as Virtual Disk dedicated hotspare:





Syntax:
/opt/MegaRAID/MegaCli/MegaCli64 pdhsp set dedicated Array<N> physdrv[ENCLOSURE:SLOT] -a<adapter_num>




Example: Setting 14th SATA disk in the 1st DAC of a packet concentrator as VD dedicated hotspare
/opt/MegaRAID/MegaCli/MegaCli64 pdhsp set dedicated Array1 physdrv[15:13] -a1

After running this command disk status will show as ID-X. Above example creates a hotspare for 2nd VD (Array1) and so disk status will show as ID-1
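
The choice between the two hotspare types above can be sketched as a helper that prints the matching command. hotspare_cmd is a hypothetical name, the command text is copied from the syntax lines in this step, and nothing is executed. Deciding whether a disk belongs to the head unit, the 1st DAC or a later DAC is left to the operator.

```shell
#!/bin/sh
# Sketch: print the Step 12b hotspare command for the chosen hotspare type.
MEGACLI=/opt/MegaRAID/MegaCli/MegaCli64

hotspare_cmd() {
    # $1 = global|dedicated, $2 = enclosure, $3 = slot, $4 = adapter,
    # $5 = array index (dedicated only; arrays start at 0)
    case "$1" in
        global)    echo "$MEGACLI pdhsp set enclaffinity physdrv[$2:$3] -a$4" ;;
        dedicated) echo "$MEGACLI pdhsp set dedicated Array$5 physdrv[$2:$3] -a$4" ;;
        *)         echo "usage: hotspare_cmd global|dedicated ENCL SLOT ADAPTER [ARRAY]" >&2
                   return 1 ;;
    esac
}

hotspare_cmd global 15 14 1
hotspare_cmd dedicated 15 13 1 1
```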

Notes on setting hotspare:


  • In order for a hotspare to take over the role of a disk in an array, the disk needs to have the same capacity as the disk it is replacing. The impact of this is that a SATA disk cannot be a hotspare for the array of Solid State Disk (SSD) found in concentrator DACs.
  • If you are creating a hotspare which is not the last disk in the DAC, consider adding the parameter -nonRevertible to make the hotspare non-revertible.
  • If a disk in VD is still in (R) - Rebuild state then setting hotspare for this VD will likely fail while this is still rebuilding.

13. If the hotspare was previously a VD dedicated hotspare, then it is revertible: once the bad disk is replaced, the hotspare goes back to being a hotspare. This is done using copyback.


Example Status in nwraidutil.pl output (script version prior to 2015.08.10):
15   14     (?)       0             2.728 TB       SEAGATE ST3000NXCLAR3000GS18Z1Y1H6A1

 


Example Status in nwraidutil.pl output (script version 2015.08.10 and later):
15   14     (C)       0             2.728 TB       SEAGATE ST3000NXCLAR3000GS18Z1Y1H6A1


So once the old hotspare is a member of the VD with (O) status, the replaced drive will be in copyback status while it copies the contents of the old hotspare, in preparation for that drive to revert to a hotspare.

There are two options:
1) Wait for copyback to complete (which will take somewhere in the region of 7 hours)
This can be monitored through the RAID adapter log:



Syntax:
/opt/MegaRAID/MegaCli/MegaCli64 adpeventlog getsinceshutdown -f <output_filename.log> -a<adapter_num>




Example Command:
/opt/MegaRAID/MegaCli/MegaCli64 adpeventlog getsinceshutdown -f raid_events_since_boot.log -a1

See Example 3 in Notes below for example of raid adapter event log.

Copyback progress can also be monitored using the following command:



Syntax:
/opt/MegaRAID/MegaCli/MegaCli64 pdcpybk showprog physdrv[ENCLOSURE:SLOT] -a<adapter_num>




Example Command:
/opt/MegaRAID/MegaCli/MegaCli64 pdcpybk showprog physdrv[15:13] -a1
Example Output:
Copyback Progress on Device at Enclosure 15, Slot 13 Completed 5% in 21 minutes

Exit Code: 0x00
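
A rough total duration can be estimated from a single progress line. The sketch below assumes progress is roughly linear, which is an approximation and not something the controller guarantees. It also assumes the wording of the Example Output above; estimate_total_minutes is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: estimate total copyback/rebuild time in minutes from a progress line on stdin.
# Assumes linear progress: total = elapsed_minutes * 100 / percent_complete.
estimate_total_minutes() {
    sed -n 's/.*Completed \([0-9][0-9]*\)% in \([0-9][0-9]*\) [Mm]inutes.*/\1 \2/p' |
    while read -r pct mins; do
        # Skip 0% to avoid dividing by zero
        if [ "$pct" -gt 0 ]; then
            echo $(( mins * 100 / pct ))
        fi
    done
}

# Using the Example Output from this step as input:
printf '%s\n' 'Copyback Progress on Device at Enclosure 15, Slot 13 Completed 5% in 21 minutes' | estimate_total_minutes
```

For the example line (5% in 21 minutes) this prints 420, i.e. about 7 hours, which is in line with the duration quoted for option 1.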


2) Stop copyback and manually set the hotspare using the command in Step 12b.


Syntax:
/opt/MegaRAID/MegaCli/MegaCli64 pdcpybk stop physdrv[ENCLOSURE:SLOT] -a<adapter_num>




Example Command:
/opt/MegaRAID/MegaCli/MegaCli64 pdcpybk stop physdrv[15:13] -a1


If you have any questions, concerns or feedback about this article, please contact RSA Support quoting this KB number.
Notes
Assumptions:
This article assumes the appliance hardware is either a Series 4 appliance or a Security Analytics/S4S Appliance.

For Series 3 appliances, which have reached End of Product Support (EOPS), you may need to exchange

/opt/MegaRAID/MegaCli/MegaCli64

for

/opt/MegaRAID/CmdTool2/CmdTool2

in the above commands.

Example 1: Example of Logical Disk Information in nwraidutil.pl output from a packet concentrator appliance


------------------------
Logical Disk Information
------------------------

                        Physical Drive State Legend
-------------------------------------------------------------------------------------
B  Unconfigured(Bad)                                            O  Online
D  Dedicated Hotspare and associated virtual drive number       R  Rebuild
E  Hotspare prefers same enclosure                              S  Solid-State Drive
F  Foreign                                                      U  Unconfigured(Good)
G  Global hotspare                                              X  Offline
I  Hotspare is revertible                                       !  Failed
M  Missing                                                      ?  Unknown state
-------------------------------------------------------------------------------------
NOTE: 'E' does not prohibit a hotspare from being used in another enclosure, it is merely a preference

        Logical Drive State Legend
------------------------------------------------
D  Degraded                     X  Offline
O  Optimal                      !  Failed
P  Partially Degraded           ?  Unknown state
R  Rebuild
------------------------------------------------

Adapter: 0 - PERC H710P Mini
        Virtual Disk: 0 (O) - Found 2 of 2 {Raid Level 1, 136.125 GB, 128 KB Stripe Size, WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU}
                PD: 0   Enclosure: 32   Slot: 0  (O)   136.732 GB   SEAGATE ST9146853SS     YS0A6XM3H6HP
                PD: 1   Enclosure: 32   Slot: 1  (O)   136.732 GB   SEAGATE ST9146853SS     YS0A6XM3H7DG
        Virtual Disk: 1 (O) - Found 2 of 2 {Raid Level 1, 931.0 GB, 128 KB Stripe Size, WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU}
                PD: 0   Enclosure: 32   Slot: 2  (O)   931.512 GB   SEAGATE ST91000640SS    AS099XG5SYBG
                PD: 1   Enclosure: 32   Slot: 3  (O)   931.512 GB   SEAGATE ST91000640SS    AS099XG5SXBT
Adapter: 1 - PERC H810 Adapter
        Virtual Disk: 0 (O) - Found 7 of 7 {Raid Level 5, 10.913 TB, 128 KB Stripe Size, WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU}
                PD: 0   Enclosure: 15   Slot: 7  (O)   1.819 TB     HITACHI HUS72302CLAR2000C1D6YFKV9MJK
                PD: 1   Enclosure: 15   Slot: 8  (O)   1.819 TB     HITACHI HUS72302CLAR2000C1D6YFKUYZDK
                PD: 2   Enclosure: 15   Slot: 9  (O)   1.819 TB     HITACHI HUS72302CLAR2000C1D6YFKTAXWK
                PD: 3   Enclosure: 15   Slot: 10 (O)   1.819 TB     HITACHI HUS72302CLAR2000C1D6YGKU290K
                PD: 4   Enclosure: 15   Slot: 11 (O)   1.819 TB     HITACHI HUS72302CLAR2000C1D6YFKV6S9K
                PD: 5   Enclosure: 15   Slot: 12 (O)   1.819 TB     HITACHI HUS72302CLAR2000C1D6YGKNP24K
                PD: 6   Enclosure: 15   Slot: 13 (O)   1.819 TB     HITACHI HUS72302CLAR2000C1D6YGKNKHTK
        Virtual Disk: 1 (O) - Found 7 of 7 {Raid Level 5, 1.087 TB, 128 KB Stripe Size, WriteBack, ReadAdaptive, Cached, Write Cache OK if Bad BBU}
                PD: 0   Enclosure: 15   Slot: 0  (OS)   186.310 GB   HITACHI HUSRL402 CLAR200C190XTVVDY5A
                PD: 1   Enclosure: 15   Slot: 1  (OS)   186.310 GB   HITACHI HUSRL402 CLAR200C190XTVVZHMA
                PD: 2   Enclosure: 15   Slot: 2  (OS)   186.310 GB   HITACHI HUSRL402 CLAR200C190XTVUX9RA
                PD: 3   Enclosure: 15   Slot: 3  (OS)   186.310 GB   HITACHI HUSRL402 CLAR200C190XTVVZ9LA
                PD: 4   Enclosure: 15   Slot: 4  (OS)   186.310 GB   HITACHI HUSRL402 CLAR200C190XTVVW9RA
                PD: 5   Enclosure: 15   Slot: 5  (OS)   186.310 GB   HITACHI HUSRL402 CLAR200C190XTVW1VEA
                PD: 6   Enclosure: 15   Slot: 6  (OS)   186.310 GB   HITACHI HUSRL402 CLAR200C190XTVVAG6A

No logical disk problems found.


Example 2: Example of Physical Disk Information in nwraidutil.pl output from a packet concentrator appliance


-------------------------
Physical Disk Information
-------------------------

                        Physical Drive State Legend
-------------------------------------------------------------------------------------
B  Unconfigured(Bad)                                            O  Online
D  Dedicated Hotspare and associated virtual drive number       R  Rebuild
E  Hotspare prefers same enclosure                              S  Solid-State Drive
F  Foreign                                                      U  Unconfigured(Good)
G  Global hotspare                                              X  Offline
I  Hotspare is revertible                                       !  Failed
M  Missing                                                      ?  Unknown state
-------------------------------------------------------------------------------------
NOTE: 'E' does not prohibit a hotspare from being used in another enclosure, it is merely a preference

Adapters found: 2

Adapter 0 (PERC H710P Mini) enclosures found: 1
Adapter 0 (PERC H710P Mini) enclosure 32 slots found: 4
Encl  Slot  State     P.Fail.Count  Raw Size       Inquiry Data
32    0     (O)       0             136.732 GB     SEAGATE ST9146853SS     YS0A6XM3H6HP
32    1     (O)       0             136.732 GB     SEAGATE ST9146853SS     YS0A6XM3H7DG
32    2     (O)       0             931.512 GB     SEAGATE ST91000640SS    AS099XG5SYBG
32    3     (O)       0             931.512 GB     SEAGATE ST91000640SS    AS099XG5SXBT

Adapter 1 (PERC H810 Adapter) enclosures found: 1
Adapter 1 (PERC H810 Adapter) enclosure 15 slots found: 15
Encl  Slot  State     P.Fail.Count  Raw Size       Inquiry Data
15    0     (OS)      0             186.310 GB     HITACHI HUSRL402 CLAR200C190XTVVDY5A
15    1     (OS)      0             186.310 GB     HITACHI HUSRL402 CLAR200C190XTVVZHMA
15    2     (OS)      0             186.310 GB     HITACHI HUSRL402 CLAR200C190XTVUX9RA
15    3     (OS)      0             186.310 GB     HITACHI HUSRL402 CLAR200C190XTVVZ9LA
15    4     (OS)      0             186.310 GB     HITACHI HUSRL402 CLAR200C190XTVVW9RA
15    5     (OS)      0             186.310 GB     HITACHI HUSRL402 CLAR200C190XTVW1VEA
15    6     (OS)      0             186.310 GB     HITACHI HUSRL402 CLAR200C190XTVVAG6A
15    7     (O)       0             1.819 TB       HITACHI HUS72302CLAR2000C1D6YFKV9MJK
15    8     (O)       0             1.819 TB       HITACHI HUS72302CLAR2000C1D6YFKUYZDK
15    9     (O)       0             1.819 TB       HITACHI HUS72302CLAR2000C1D6YFKTAXWK
15    10    (O)       0             1.819 TB       HITACHI HUS72302CLAR2000C1D6YGKU290K
15    11    (O)       0             1.819 TB       HITACHI HUS72302CLAR2000C1D6YFKV6S9K
15    12    (O)       0             1.819 TB       HITACHI HUS72302CLAR2000C1D6YGKNP24K
15    13    (O)       0             1.819 TB       HITACHI HUS72302CLAR2000C1D6YGKNKHTK
15    14    (GEI)     0             1.819 TB       HITACHI HUS72302CLAR2000C1D6YGKNVJKK            Hotspare Information

No physical disk problems found.
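
To look up the state of one disk in a listing like the one above, the table rows can be filtered by enclosure and slot. This is a minimal sketch that assumes the column layout shown in Example 2 (Encl, Slot, State as the first three columns); disk_state is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the state code for one disk from Physical Disk Information on stdin.
disk_state() {
    # $1 = enclosure, $2 = slot
    awk -v e="$1" -v s="$2" '$1 == e && $2 == s { gsub(/[()]/, "", $3); print $3 }'
}

# Using two rows from Example 2 as input:
printf '%s\n' \
    '15    13    (O)       0             1.819 TB       HITACHI HUS72302CLAR2000C1D6YGKNKHTK' \
    '15    14    (GEI)     0             1.819 TB       HITACHI HUS72302CLAR2000C1D6YGKNVJKK' | disk_state 15 14
```

On an appliance the input would be the nwraidutil.pl output itself. Header and legend lines are ignored because their first two fields do not match the enclosure and slot numbers.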


Example 3:  Example RAID Adapter Event Log


seqNum: 0x00001c68
Time: Tue Nov  4 09:24:15 2014

Code: 0x0000002a
Class: 0
Locale: 0x20
Event Description: Shutdown command received from host
Event Data:
===========
None


seqNum: 0x00001c69
Seconds since last reboot: 4
Code: 0x00000000
Class: 0
Locale: 0x20
Event Description: Firmware initialization started (PCI ID 005b/1000/1f2d/1028)
Event Data:
===========
VendorId: 1000
DeviceId: 5b
SubVendorId: 1028
SubDeviceId: 1f2d


seqNum: 0x00001d6a
Seconds since last reboot: 131
Code: 0x00000119
Class: 0
Locale: 0x02
Event Description: CopyBack automatically started on PD 39(e0x3f/s5) from PD 30(e0x3f/s14)
Event Data:
===========
None

seqNum: 0x00001d6b
Seconds since last reboot: 131
Code: 0x00000072
Class: 0
Locale: 0x02
Event Description: State change on PD 39(e0x3f/s5) from UNCONFIGURED_GOOD(0) to COPYBACK(20)
Event Data:
===========
Device ID: 57
Enclosure Index: 63
Slot Number: 5
Previous state: 0
New state: 32

seqNum: 0x00001e6a
Time: Tue Nov  4 16:13:51 2014

Code: 0x00000116
Class: 0
Locale: 0x02
Event Description: CopyBack complete on PD 39(e0x3f/s5) from PD 30(e0x3f/s14)
Event Data:
===========
None
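
When the event log has been saved to a file with the adpeventlog command from Step 13, the copyback entries can be filtered out of the surrounding events. The sketch below assumes the "Event Description:" wording shown in Example 3; copyback_events is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list CopyBack events from a saved RAID adapter event log.
copyback_events() {
    # $1 = path to the saved log, e.g. raid_events_since_boot.log
    grep 'Event Description: CopyBack' "$1" | sed 's/^Event Description: //'
}

# Usage (file name taken from the Step 13 example command):
#   copyback_events raid_events_since_boot.log
```

A line starting with "CopyBack complete" indicates that the replaced drive has finished copying and the old hotspare can revert to its hotspare role.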




 
