000029929 - Repairing kernel panics (boot failure) on RSA Security Analytics appliances

Document created by RSA Customer Support Employee on Jun 14, 2016. Last modified by RSA Customer Support Employee on Apr 21, 2017.
Version 2

Article Content

Article Number: 000029929
Applies To
RSA Product Set: Security Analytics
RSA Version/Condition: 10.0.x, 10.1.x, 10.2.x, 10.3.x, 10.4.x
Platform: CentOS

 
Issue
At times, Security Analytics appliances may fail to boot with a message indicating a kernel panic. When this happens, the panic output is printed on the console of the Security Analytics appliance. This output cannot currently be retrieved remotely unless the integrated Dell Remote Access Controller (also known as Dell RAC, iDRAC, iDRAC6, or iDRAC7) has been configured and is available on the network. For more information, please see How to configure iDRAC 7 on Dell R620 RSA Security Analytics appliances.
If an alternate vKVM (virtual console) solution is configured or some alternate means of accessing the Linux console is available, this may be used in place of the Dell RAC.
During a kernel panic, output similar to this will be printed:
Kernel panic - not syncing: Attempted to kill init!

 
Cause
There are many possible causes of kernel panics. This article currently addresses one specific cause: failure of the Linux kernel to find, use, or mount the root (/) filesystem.
More specifically, we have found that some customers have encountered a failure to boot after upgrading a kernel on a Security Analytics appliance. There are several possible causes:
  • A misconfigured or incomplete initramfs (formerly "initial RAMdisk") image. We provide a tool to fix these problems; please see The default kernel in the grub boot loader configuration is not the latest on an RSA Security Analytics appliance for more information.
  • On newer releases of Linux, the block device providing the root filesystem has moved to a new path. Older appliances may still contain a kernel command-line parameter that specifies the root filesystem by the old path, similar to this: root=/dev/VolGroup00/root.
  • The Logical Volume Manager's configuration file, /etc/lvm/lvm.conf, may have been provided a filter directive that is over-broad and prevents the LVM from considering the physical devices needed to initialize itself.
These last two causes are addressed below.
Resolution
First, if the boot failure is linked to a kernel upgrade, use the console of your Security Analytics appliance to attempt to boot your older kernel. Kernel upgrades are contained in most of the quarterly security packs we release on SecurCare Online and may optionally be provided as out-of-band releases in the future for pressing issues. Upgrading to a major new release of Security Analytics, such as 10.3.5 -> 10.4.1, also includes a mandatory kernel upgrade.
During the boot process, after POST finishes, select the entry for the older kernel from the GRUB boot menu.
The Linux bootloader's configuration is a small text file, /boot/grub/grub.conf. This file contains one configuration stanza for each defined kernel that you may wish to boot. This is a sample entry:
title CentOS (2.6.32-431.23.3.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-431.23.3.el6.x86_64 root=/dev/VolGroup00/root ro nodmraid quiet
        initrd /initramfs-2.6.32-431.23.3.el6.x86_64.img

In this example, note the quiet parameter at the end of the kernel line. It is advisable to remove the quiet directive during troubleshooting; much more information will then be printed to the Linux console during boot, which may help diagnose the exact problem. Leaving out this directive is harmless, so you may also omit it from the bootloader configuration permanently to simplify future troubleshooting.
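As a minimal sketch of removing the quiet directive, the helper below prints a copy of a grub.conf-style file with quiet stripped from every kernel line. The function name strip_quiet is hypothetical and not part of any RSA tooling, and applying the change to every stanza rather than a single one is an assumption made for illustration.

```shell
# Sketch only: strip_quiet is a hypothetical helper name.
# Prints the file given as $1 with "quiet" removed from kernel lines.
strip_quiet() {
    # Only lines whose first word is "kernel" are touched.
    sed '/^[[:blank:]]*kernel[[:blank:]]/ {
        s/[[:blank:]]\{1,\}quiet[[:blank:]]*$//
        s/[[:blank:]]\{1,\}quiet\([[:blank:]]\)/\1/g
    }' "$1"
}
```

One cautious way to use it is to copy /boot/grub/grub.conf to root's home directory first, then run strip_quiet against that copy and redirect the output back to /boot/grub/grub.conf, so the untouched original remains available.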
The root= parameter on the kernel line specifies where the kernel should look for the root filesystem; in this case, the block device /dev/VolGroup00/root. Even on appliances running older kernels, the newer-style /dev/mapper/VolGroup00-root should exist; you can verify this for yourself:
ls -l /dev/mapper/VolGroup00-root
lrwxrwxrwx. 1 root root 8 Mar 10 14:24 /dev/mapper/VolGroup00-root -> ../dm-13

Changing the root= parameter to the newer-style path, /dev/mapper/VolGroup00-root, can be done while running most older kernels; kernels as old as 2.6.32-358.x.x have been verified.
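The mapping from the old-style path to the newer-style path can be sketched as a small shell function. The name to_mapper_path is hypothetical; the only behaviour relied on beyond the article is that device-mapper doubles any hyphen that appears inside a volume group or logical volume name.

```shell
# Sketch only: to_mapper_path is a hypothetical helper name.
# Converts /dev/<vg>/<lv> to /dev/mapper/<vg>-<lv>.
to_mapper_path() {
    old=${1#/dev/}    # strip the leading /dev/  -> VolGroup00/root
    vg=${old%%/*}     # volume group name        -> VolGroup00
    lv=${old#*/}      # logical volume name      -> root
    # device-mapper doubles hyphens inside VG and LV names
    vg=$(printf '%s' "$vg" | sed 's/-/--/g')
    lv=$(printf '%s' "$lv" | sed 's/-/--/g')
    printf '/dev/mapper/%s-%s\n' "$vg" "$lv"
}

to_mapper_path /dev/VolGroup00/root    # prints /dev/mapper/VolGroup00-root
```

Before writing the result into grub.conf, it is worth confirming with ls -l that the printed path actually exists on the appliance.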
An even better option, one that works on even relatively old appliances, is to identify the root filesystem by its UUID; this is the default recommended for CentOS systems. On most appliances, the UUID for the root filesystem can be found with tune2fs from the e2fsprogs package:
tune2fs -l /dev/mapper/VolGroup00-root | grep -i uuid
Filesystem UUID:          4cb91617-3b80-4594-a282-e5a4d6df1cb3

If the e2fsprogs package is not installed, you may need to install it. Once you have the UUID of the root filesystem, modify /boot/grub/grub.conf to use it, similar to this:
title CentOS (2.6.32-431.23.3.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-431.23.3.el6.x86_64 root=UUID=4cb91617-3b80-4594-a282-e5a4d6df1cb3 ro nodmraid
        initrd /initramfs-2.6.32-431.23.3.el6.x86_64.img

Note that "root" and "UUID" both take an assignment symbol, "=". Typos in grub.conf can cause your appliance to fail to boot; in extreme cases, they may require a reinstall or a rescue image to fix. Please exercise caution when modifying this file and make periodic backups.
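To reduce the risk of typos, the edit can be sketched as a function that prints a modified copy of the file instead of changing it in place. The name set_root_uuid is hypothetical, and rewriting root= on every kernel line (rather than in one stanza) assumes that all stanzas share the same root filesystem.

```shell
# Sketch only: set_root_uuid is a hypothetical helper name.
# Usage: set_root_uuid <grub.conf copy> <uuid>
set_root_uuid() {
    file=$1
    uuid=$2
    # Refuse anything that does not look like a filesystem UUID.
    case $uuid in
        *[!0-9a-fA-F-]*|'') echo "invalid UUID: $uuid" >&2; return 1 ;;
    esac
    # Replace the root= parameter on kernel lines only; the GRUB
    # "root (hd0,0)" line is left untouched.
    sed "/^[[:blank:]]*kernel[[:blank:]]/ s|\([[:blank:]]\)root=[^[:blank:]]*|\1root=UUID=${uuid}|" "$file"
}

# Possible use, after copying grub.conf to root's home directory:
#   uuid=$(tune2fs -l /dev/mapper/VolGroup00-root | awk '/UUID/ {print $3}')
#   set_root_uuid ~/grub.conf.bak "$uuid" > /boot/grub/grub.conf
```

Reading from the backup copy and writing to the live file keeps the original intact if the result needs to be reverted.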
After making this change, your appliance may still fail to boot. If you modified grub.conf to exclude the quiet directive, as shown above, you may see errors regarding the ability to find or activate volume groups. If this is the case, modifying the root filesystem path may still be required, but it is not enough.
It may be necessary to modify /etc/lvm/lvm.conf to remove any filter directives. These are normally only added when running the arrayCfg script to add DAC or SAN storage to a Security Analytics appliance. To find these lines, look for any un-commented lines beginning with the filter directive.
grep -Ev '^[[:blank:]]*#' /etc/lvm/lvm.conf | grep filter
    filter = [ "a/.*/" ]

In this example, the file contains an "accept filter", denoted by the "a", that determines which block device names the Logical Volume Manager will consider. This particular filter is the correct LVM default: it considers any and all block devices when searching for physical volumes and volume groups. In a default lvm.conf this line is commented out, so the command above would produce no output.
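Where active filter lines do turn up, commenting them out can be sketched as follows. The function name comment_out_filters is hypothetical, and the sketch assumes each filter directive sits on a single line; a filter array spread over several lines would need to be edited by hand.

```shell
# Sketch only: comment_out_filters is a hypothetical helper name.
# Prints the file given as $1 with active "filter = ..." lines commented out.
comment_out_filters() {
    # Lines that are already commented do not match and are left alone.
    sed 's/^\([[:blank:]]*\)\(filter[[:blank:]]*=\)/\1# \2/' "$1"
}
```

As with grub.conf, a cautious approach is to run this against a backup copy of /etc/lvm/lvm.conf kept in root's home directory and redirect the output to the live file.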
If there are any filter directives at all, make a backup copy of /etc/lvm/lvm.conf and comment out these directives. Once this is done, the initramfs for the kernel must be rebuilt. First back up the existing initramfs, for example by moving it to root's home directory:
mv /boot/initramfs-2.6.32-504.el6.x86_64.img ~/

Do not back up important files to /tmp. The contents of this directory are considered unimportant by default, and may be erased on reboot or at any other time.
Once the original initramfs is out of the way, create a new one by invoking the system utility dracut, preferably with the -f (force) and -v (verbose) options:
dracut -f -v /boot/initramfs-2.6.32-504.el6.x86_64.img 2.6.32-504.el6.x86_64

Please note the format carefully: the first non-optional parameter is the full path of the new initramfs image file to create. The second non-optional parameter is the version of the kernel that you wish to boot. It is not a filename and does not contain any path components. Once the new initramfs is built, you should be able to boot the associated kernel successfully.
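Because the two arguments are easy to confuse, the relationship between them can be sketched as a pair of functions that build the image path from the kernel version and print the resulting command without running it. Both function names are hypothetical.

```shell
# Sketch only: initramfs_path and dracut_command are hypothetical names.
initramfs_path() {
    # Full path of the image file for kernel version $1.
    printf '/boot/initramfs-%s.img\n' "$1"
}

dracut_command() {
    # First dracut argument: image path. Second: bare kernel version.
    printf 'dracut -f -v %s %s\n' "$(initramfs_path "$1")" "$1"
}

dracut_command 2.6.32-504.el6.x86_64
# prints: dracut -f -v /boot/initramfs-2.6.32-504.el6.x86_64.img 2.6.32-504.el6.x86_64
```

The kernel versions installed on the appliance correspond to the directory names under /lib/modules, which is a convenient place to copy the version string from.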
Notes
Advanced users can make use of the grubby-wrapper utility referenced in The default kernel in the grub boot loader configuration is not the latest on an RSA Security Analytics appliance.
If the initramfs file is missing (or otherwise appears corrupt) when grubby-wrapper runs, it will emit an error and prompt you to rebuild the initramfs image file. Answer "Yes" (without quotes) to proceed. Again, please back up the existing initramfs before attempting this.
At this time, this functionality is not exposed directly; it is part of the error-checking performed every time grubby-wrapper prepares to make changes to grub.conf. Simply call grubby-wrapper with the -d (default) or -k flags and wait for the error-checking routines to find the initramfs problems.
