The purpose of this document is to suggest a way of configuring grub on a system where the operating system is mirrored between two disks, so that if the first disk fails completely, the system will boot automatically from the second disk.
Whether this actually works depends on precisely how the system fails, so there are also instructions on how to get the system up if it still won't boot. Note that not everything in this document has been fully tested, and the grub documentation suggests there are machines on which these instructions definitely won't work.
The assumption is that we have two identically partitioned disks,
containing a mirrored boot partition [1], and we will be booting from the
master boot record on each disk [2]. The examples assume that the first
partitions of the two disks together make up the mirror mounted at /boot,
and that grub numbers these disks (hd0) and (hd1). Grub numbers disks in
order of detection by the BIOS. The order is probably the same as the
operating system's, but in some cases the ordering may not be obvious,
for example if you have a mixture of IDE and SCSI disks.
First make sure that the BIOS setup of your machine will attempt to boot from the second disk if it can't find a valid boot sector on the first disk. Then run the grub command from your root prompt and, within the grub shell, type the following (note that the install commands may be broken for legibility but should be entered as one long line):
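A sketch of what the two install commands might look like, assuming the layout described above (the first partition of (hd0) and (hd1) holding /boot) and the usual stage1 and stage2 file names; check the paths against your own system before running them:

```
grub> install --stage2=/boot/grub/stage2 (hd0,0)/grub/stage1 (hd0) (hd0,0)/grub/stage2 p /grub/grub.conf
grub> install --stage2=/boot/grub/stage2 (hd1,0)/grub/stage1 (hd1) (hd1,0)/grub/stage2 p /grub/grub.conf
```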
Here the --stage2 value is the location of the grub stage2 file within
the operating system, which means grub writes the changes to the mirrored
partition rather than to one or the other raw disk, and hence avoids
corrupting the mirror. The other file references are in grub format,
i.e. a partition reference followed by the location of the file within
that partition. We don't specify an explicit partition for the grub.conf
file, as the "p" flag tells grub to remember which partition it found the
stage2 file on and make it the default. We therefore avoid setting the
disk explicitly and use the current disk, which is presumably the good
half of the mirror. If we didn't have a separate /boot partition, all the
file references would start /boot/grub rather than /grub.
The grub.conf file (sometimes called menu.lst) in /boot/grub/ should
have no "root" commands, at least for the default target, and there
should be no explicit partition references, since we are trying to use
relative references. For example:
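A minimal sketch of such a file, assuming a separate /boot partition; the title, the kernel and initrd file names and the root= device are placeholders, not values from any particular system:

```
# No grub "root" command and no (hdX,Y) prefixes: file paths are relative
# to the partition grub found stage2 on. The root= below is a kernel
# argument (placeholder device), not a grub command.
default 0
timeout 10

title Linux (placeholder entry)
    kernel /vmlinuz ro root=/dev/md1
    initrd /initrd.img
```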
Hopefully this means the system will boot even if the first disk is corrupted or removed, though it may not if the problem means that the system can read some, but not all, of the boot files.
If you upgrade your Linux distribution or your grub package
directly, you will probably need to redo these instructions (your
distribution may try to update the grub setup, but the chances are it
won't do it this way). You may also have to fix your grub.conf
file after upgrading the kernel.
Depending on the problem, the system may still not boot, or only partially boot (e.g. as far as a grub prompt), in which case some manual intervention is required. If there is a grub prompt we can use it; otherwise we can get one by booting from a floppy disk with grub installed [3]. All we need to do here is to select the good boot partition with the root command, and give the kernel and initrd lines as in a normal boot, e.g.:
    root (hd1,0)
However, you may have forgotten which boot partition to use and what arguments you need. In this case the grub commands find, cat and configfile are useful, and grub also has tab completion of files and partitions, which can be used to work out which disk is which. Use find /grub/grub.conf to work out which partition(s) have a valid copy. Set this partition to be the active one with root (hd1,0). Then you can view the configuration file with cat /grub/grub.conf and enter the commands by hand, or use configfile /grub/grub.conf to load it. Note that you may have to edit a menu item by adding a root command to get the system to boot from it.
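The recovery steps above might look roughly like this at the grub prompt, assuming the good copy turns out to be on (hd1,0); the kernel and initrd file names and the root= argument are placeholders, to be replaced with the values shown by cat:

```
grub> find /grub/grub.conf
grub> root (hd1,0)
grub> cat /grub/grub.conf
grub> kernel /vmlinuz ro root=/dev/md1
grub> initrd /initrd.img
grub> boot
```

Alternatively, entering configfile /grub/grub.conf after the root command loads the menu, so the kernel and initrd lines need not be typed by hand.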
If you have any unmirrored partitions on the failed disk, the
system is likely not to boot fully when it looks for the missing
partition, though commenting out the appropriate line in /etc/fstab
should work if the disk isn't vital.