Linux RAID Failure Recovery: Making Mirrored or Cloned Disks Bootable When Moved

Frederick Denny (fred@freddenny.com)
v. 0.90.0 17th of March 2003

Background: The purpose of this document is to provide a disaster recovery procedure for Linux Software RAID failures on my servers that run RAID-1 (mirroring). These procedures also apply when cloning a disk using Linux Software RAID and making the disk(s) bootable in other machine(s) or drive position(s).

Note: For initial configuration of Linux Software RAID volumes, refer to The Software-RAID HOWTO: RAID Setup.

Scenario #1 (A simple mirrored slave drive failure replacement):

- We are running RAID-1 (mirroring) on a set of bootable SCSI disks.
- The second (mirror slave) disk is lost and replaced with a fresh replacement.
1.  Run fdisk -l to determine which volume is to be replaced:

 
Disk /dev/sda: 255 heads, 63 sectors, 4462 cylinders
Units = cylinders of 16065 * 512 bytes
Device Boot Start End Blocks Id System
/dev/sda1 * 1 4462 35840983+ fd Linux raid autodetect

Disk /dev/sdb: 255 heads, 63 sectors, 4462 cylinders
Units = cylinders of 16065 * 512 bytes
Device Boot Start End Blocks Id System
/dev/sdb1 1 261 2096451 6 FAT16
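
On a machine with many disks it can help to filter that listing down to the partitions already typed as "Linux raid autodetect". The following is only a sketch: raid_partitions is a hypothetical helper name (not part of fdisk), and it assumes the fdisk -l column layout shown above, where the system ID appears as its own field.

```shell
# List partitions whose system ID is fd (Linux raid autodetect).
# Reads "fdisk -l" output on standard input.
# raid_partitions is a hypothetical helper, not an fdisk feature.
raid_partitions() {
    awk '$1 ~ /^\/dev\// {
        # The boot flag "*" shifts the columns, so scan every field.
        for (i = 2; i <= NF; i++)
            if ($i == "fd") { print $1; break }
    }'
}
```

Typical use, as root: fdisk -l | raid_partitions. Any disk whose partition does not appear in the result (here /dev/sdb1) is a candidate for the replacement drive.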

2.  In this case our new disk is /dev/sdb and we will need to delete the existing FAT16 partition and create a new partition using the "Linux raid autodetect" partition system ID.

2a. # fdisk /dev/sdb

2b. Hit "p" to see the current info:
  Command (m for help): p


Disk /dev/sdb: 255 heads, 63 sectors, 4462 cylinders
Units = cylinders of 16065 * 512 bytes
Device Boot Start End Blocks Id System
/dev/sdb1 1 261 2096451 6 FAT16


2c. Delete the FAT16 partition (or whatever other partitions you might have):


Command (m for help): d

Partition number (1-4): 1


2d. Create a new partition that matches your mirror size (I use the default full disk from start to end)

Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-4462, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-4462, default 4462):
Using default value 4462

2e. Now we must set the partition system ID to Linux raid autodetect (type fd):

Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)


2f. The next step is to write all this information to the disk:


Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.

2g. Now you can re-build your RAID mirror set or clone your master disk to a slave disk:

# raidhotadd /dev/md0 /dev/sdb1

You can monitor the rebuild progress by typing: cat /proc/mdstat
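
If you would rather see just the percentage than the whole status file, the output can be filtered. This is a minimal sketch, assuming the usual /proc/mdstat layout in which a rebuilding array shows a "recovery = N%" (or "resync = N%") line beneath its mdN line; rebuild_progress is a hypothetical helper name.

```shell
# Print "<md device> <percent>" for every array that is rebuilding.
# Reads /proc/mdstat-style text on standard input.
# rebuild_progress is a hypothetical helper, not part of raidtools.
rebuild_progress() {
    awk '
        /^md[0-9]+ :/ { dev = $1 }          # remember the current array
        /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) {
                    pct = $i
                    sub(/^=/, "", pct)      # "=100.0%" can appear fused
                    print dev, pct
                }
        }'
}
```

Typical use: rebuild_progress < /proc/mdstat. No output means no array is currently rebuilding.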

Scenario #2 (replacing a mirrored master, or cloning a disk and making it bootable in another drive position):

- Assume we are running RAID-1, our master mirrored disk dies hard, and we replace it with the slave (in the master position, hd0)
                                        or
  you have used raidhotadd to duplicate a disk that will later move from the slave/clone position to the master position.

- I prefer to put the surviving/good disk in the master position in my server

- You can try booting up the disk or disk set, but it probably will not boot (if it works, there is no need to go any further)

1.  Boot from a Linux CD-ROM and type linux rescue at the boot prompt.

2.  Then mount the master disk on a temporary mount point:
mount /dev/sda1 /mnt/fred

3.  Go to the mount location and make sure everything is there for a reality check.

4.  Now we will rewrite the boot loader so we can boot the disk. Assuming hd0 is our target drive position, the sda1 partition becomes 0 because GRUB starts counting at 0. This comes together as (hd0,0).

 type:
/mnt/fred/sbin/grub
grub> root (hd0,0)
grub> setup (hd0)
grub> quit
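
The off-by-one between Linux partition numbers and GRUB's zero-based numbering is easy to get wrong, so here is a small sketch of the translation. grub_device is a hypothetical helper name; it assumes you already know the drive position the disk will occupy at boot (0 for hd0) and that the device name ends in a partition number.

```shell
# Translate a Linux partition name into GRUB (legacy) notation.
#   $1 = drive position at boot (0 for the first/master drive)
#   $2 = partition device ending in its number, e.g. /dev/sda1
# grub_device is a hypothetical helper, not a GRUB command.
grub_device() {
    part=${2##*[!0-9]}      # keep only the trailing digits
    # GRUB counts partitions from 0, Linux from 1.
    printf '(hd%s,%s)\n' "$1" "$((part - 1))"
}
```

For example, grub_device 0 /dev/sda1 prints (hd0,0), which is the value given to the root command above.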

5. Now reboot the machine and your clone drive should be a bootable master disk.

6. You can now proceed to sync up your slave disk or make more clones using raidhotadd /dev/md0 /dev/sdb1