Configuring a Linux Server with RAID-1

Copyright 2009 by Stephen Vermeulen
Last updated: 2009 Sep 25

See also:

Linux RAID Links

  • 2009-Sep-25: A collection of articles that talk about Linux RAID and how it handles errors: [8562]
  • 2009-Sep-10: Another possibility for a Linux-based NAS machine would be to use unRAID from Lime Technology; their hardware compatibility page is here. This is reviewed by SmallNetBuilder, and here is another approach. unRAID is somewhat like RAID-4 in that it uses a single parity disk, but it does not stripe the data across multiple disks. This costs it some potential performance due to the lost striping, but it provides some important gains in flexibility: you can upgrade existing data disks much faster (the only data regeneration is reloading the contents of the replaced disk) and there are fewer limitations on the sizes of the individual disks. There is a long support thread on the Lime Technology unRAID product here. A video review of unRAID can be found here, though it glosses over some of unRAID's biggest advantages: it can make a fault-tolerant array out of any random assortment of IDE and SATA drives (they don't have to be all the same size, which also lets you easily upgrade old (small) drives by just unplugging them and replacing them with a larger new drive), and if you have the bad luck of having two drives die at the same time the data on the remaining drives in the array is still usable (which is not the case for RAID-1 or RAID-5). Another video on unRAID goes through reasons for using it and a full build; further discussion here. [7188]
  • 2008-Nov-11: FlexRAID is another potential RAID system. [7196]
  • 2008-Sep-18: This article, Benchmarking hardware RAID vs. Linux kernel software RAID, shows that a high-end RAID card can outperform a software implementation of RAID5 by about a factor of two (from about 150MB/s to 300MB/s write speeds with 6 disks) using an AMD X2 2.2GHz CPU. They also mention that XFS has some performance advantages over ext3 when used on a RAID disk set. [6877]
  • 2007-Dec-19: Managing RAID on Linux by Derek Vadala, ISBN: 978-1565927308. This might be useful to have, but with a 2002 publication date it is probably in need of an update (the mdadm tool was being introduced about then and I think the 2.6 kernel series had yet to start). [4467]

Overview

While rebuilding my Linux Samba file server (see: Replacing a Windows NT4 Server) I thought it might be time to investigate protecting its single drive by installing a second similar drive and configuring the system for RAID-1. I had initially thought about this in 2005, but at that time the state of the software RAID world was in a bit of flux, so I filed the task away for later. Well, at the end of 2007 it looked like later might finally have arrived.

I initially did some searching for information (see above) and eventually stumbled across this note on Debian's support area that talks about installing to a combination of RAID and crypto file systems. Ah ha, I thought: the Debian installer does include RAID support, so I must have missed it somehow (in my Samba testing I had reinstalled Debian many times without seeing any mention of RAID). It turns out they call it SataRaid, and as I'm running parallel IDE I probably just skimmed right past it.

Eventually I came across this Debian new installation software RAID1 HOWTO which is a pretty accurate account of how to install your Debian system to a software RAID-1 drive pair, and how to test that it really does work. I followed this guide and got a working system, then I tested it and rebuilt it a couple of times on different hardware to make sure I understood it correctly.

Following this guide will enable you to install a Debian 4.0 (Etch) system to a pair of drives in RAID-1 mode, set up your system so that /boot, root and swap are all on RAID-1 devices, and reconfigure grub so that you can boot off either drive, or automatically boot off the good drive in the event that one drive fails. The guide also goes through some simple tests that show you how to check the health of the drives, remove a drive from service, put a new drive into the array, partition the new drive and place it into service, and how to monitor the rebuilding process.

This howto was done on a SCSI system, so the device names were a bit different from those on my system. I therefore went through some tests to make sure I understood what the commands would be on my hardware, and to practice in advance so I would know what to do if a drive really did fail. Of course Murphy's Law applies here: having spent the time preparing, I will never have to use this skill - something else will fail instead.

There are a few points worth noting about the RAID install:

  1. The whole process starts in the partition manager.
  2. You need to erase all partitions from each disk and then create three partitions on each disk: one for /boot, one for swap and one for the root file system.
  3. You can make the boot partitions quite small; I used 250MB, but only 16MB is actually occupied.
  4. The swap partitions should typically be 2-3 times the size of your physical RAM.
  5. The root partitions can be as large as the available space, but if the disks are different sizes you need to make the partitions the same size, so you will be leaving some free space on the larger disk. Even if your disks are identical in size you might want to leave a few gigs unused, in case the replacement drive you eventually buy is just a little bit smaller.
  6. If you are going to do a test install so that you can try out the recovery procedures, you might want to keep the root partition small (say 5GB) so that any rebuilds go quickly.
  7. The goal is to make the partition tables identical on both drives.
  8. You don't see anything about RAID until you select the "Use as" field in the "Partition settings" screen. When you do, you are presented with another menu that has the item "physical volume for RAID" - this is what you must select for all of the partitions.
  9. You will also need to set "Bootable flag: on" for the two partitions that will host the final /boot device.
  10. Once you have set up the three pairs of raw partitions, write the partition tables and then select the "Configure software RAID" menu item that now appears on the main partitioning menu (it may have scrolled out of the visible area of the display, so you might not have noticed it).
  11. You should now see a menu that allows you to "Create MD device"; when you select this you can choose to create a "RAID1" device (if you were setting up a RAID0 for speed, or a RAID5 with 3 or more disks, you would select those at this point).
  12. You then get to specify how many partitions are involved and whether there are any spares.
  13. The last step is to select which raw devices will be paired to make the RAID-1 arrays.
  14. You repeat this process for all three RAID-1 MD devices, and then you can "Finish partitioning and write changes to disk".
  15. At this point you can proceed with the rest of the install, but the RAID arrays are synchronizing so you probably should not reboot right away - instead open an additional console by pressing ALT-F2 and then issue the command cat /proc/mdstat to monitor the progress of the RAID rebuild. To return to the installer you can press ALT-F1.
  16. You will want to install the grub boot manager.
  17. Once the system has rebooted you will need to do a couple more things. First edit the /boot/grub/menu.lst file and add a boot entry for the (hd1,0) device in addition to the (hd0,0) one. Next add the line fallback 1 right after the line default 0, so that if device zero fails grub will automatically try device 1 (a sketch of the resulting menu.lst appears just after this list).
  18. You will also need to run grub and at its command prompt type:
         root (hd0,0)
         setup (hd0)
         root (hd1,0)
         setup (hd1)
         quit
    
  19. At this point you should have a system that will boot and run off RAID-1 and should continue working even if you lose one drive. You might want to do some tests before putting this into production.
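
For reference, here is a rough sketch of what the relevant parts of /boot/grub/menu.lst might look like after steps 17 and 18. The stanza titles, kernel version and root device (/dev/md2) shown here are assumptions based on my own setup; your installer-generated entries will differ, so keep them as they are and only add the fallback line and the second (hd1,0) stanza.

default         0
fallback        1

# original entry, boots off the first drive (kernel version is just an example)
title           Debian GNU/Linux (hd0)
root            (hd0,0)
kernel          /vmlinuz-2.6.18-5-686 root=/dev/md2 ro
initrd          /initrd.img-2.6.18-5-686

# added entry, identical except that it boots off the second drive
title           Debian GNU/Linux (hd1)
root            (hd1,0)
kernel          /vmlinuz-2.6.18-5-686 root=/dev/md2 ro
initrd          /initrd.img-2.6.18-5-686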

Test 1

The server I configured had two drives, both set to master on the two IDE channels of the motherboard, so Linux named the drives hda and hdc.

For the first test I shut down the system, disconnected hdc from the IDE cable and started the system up again. The system was still able to boot using hda, as expected.

Next I shut down the system and rebooted it using DBAN (Darik's Boot and Nuke), then wiped out the first 4GB of hda (remember, hdc was still disconnected, so it was safe).

I then shut down the system and reconnected hdc (so now a bad hda and a good hdc are attached). When I rebooted, the system came up running off hdc as expected, and cat /proc/mdstat showed the md0, md1 and md2 devices running in degraded mode using partitions from hdc only.
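
For illustration, a degraded two-disk RAID-1 array shows up in /proc/mdstat roughly like the sketch below (the block count here is invented and the member listed depends on which drive survived; the important parts are the [2/1] count and the [_U] marker showing that only one of the two members is present):

md2 : active raid1 hdc3[1]
      38957440 blocks [2/1] [_U]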

Now it was time to try rebuilding the system (simulating a new, empty disk having been attached in hda's position). The first command is:

sfdisk -d /dev/hdc | sfdisk /dev/hda
which is a pretty dangerous one: it copies the partition table from the good disk (hdc) over to the new disk (hda), so be very sure the source and destination are not reversed.
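
If you are nervous about getting the direction wrong, one cautious approach (my own habit, not part of the original howto) is to dump the good disk's partition table to a file first, inspect it, and only then apply it to the new disk:

sfdisk -d /dev/hdc > hdc-partitions.txt
cat hdc-partitions.txt                  # confirm this really is the good disk's layout
sfdisk /dev/hda < hdc-partitions.txt    # apply it to the new, empty disk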

The next thing you do is use mdadm to add the partitions from the new disk into the RAID arrays:

mdadm --add /dev/md0 /dev/hda1
mdadm --add /dev/md1 /dev/hda2
mdadm --add /dev/md2 /dev/hda3
As soon as you do this the rebuilding will start. Since md0 and md1 (boot and swap) are pretty small they will rebuild in seconds to minutes, but md2 will probably take some time. You can cat /proc/mdstat to monitor the rebuild progress, or try mdadm -D /dev/md2 for more detailed information.
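
Rather than re-running cat /proc/mdstat by hand, you can leave the status on screen and let it refresh itself; this is just a convenience, not something the howto requires:

watch -n 10 cat /proc/mdstat    # redraw the RAID status every 10 seconds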

Once the rebuild is done you should run grub again:

grub
root (hd0,0)
setup (hd0)
root (hd1,0)
setup (hd1)
quit
and at this point the rebuild is complete; you can reboot and check that the system can boot off either drive again.
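
Since these grub commands have to be rerun after every rebuild, it may be handy to keep them in a small script. This is my own sketch (the file name is made up) using grub's batch mode instead of the interactive prompt:

#!/bin/sh
# reinstall-grub.sh - reinstall the boot loader on both halves of the mirror
grub --batch <<EOF
root (hd0,0)
setup (hd0)
root (hd1,0)
setup (hd1)
quit
EOF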

Test 2

For my second test I repeated the first test, but with the roles of the drives swapped. I did this so that I could make certain all the drive letters were documented correctly.

In the second test I shut down the system, disconnected hda from the IDE cable and started the system up again. The system was still able to boot using hdc, as expected.

Next I shut down the system and rebooted it using DBAN (Darik's Boot and Nuke), then wiped out the first 4GB of hdc (remember, hda was still disconnected, so it was safe).

I then shut down the system and reconnected hda (so now a bad hdc and a good hda are attached). When I rebooted, the system came up running off hda as expected, and cat /proc/mdstat showed the md0, md1 and md2 devices running in degraded mode using partitions from hda only.

Now it was time to try rebuilding the system (simulating a new, empty disk having been attached in hdc's position). The first command is:

sfdisk -d /dev/hda | sfdisk /dev/hdc
which is a pretty dangerous one: it copies the partition table from the good disk (hda) over to the new disk (hdc), so be very sure the source and destination are not reversed.

The next thing you do is use mdadm to add the partitions from the new disk into the RAID arrays:

mdadm --add /dev/md0 /dev/hdc1
mdadm --add /dev/md1 /dev/hdc2
mdadm --add /dev/md2 /dev/hdc3
As soon as you do this the rebuilding will start. Since md0 and md1 (boot and swap) are pretty small they will rebuild in seconds to minutes, but md2 will probably take some time. You can cat /proc/mdstat to monitor the rebuild progress, or try mdadm -D /dev/md2 for more detailed information.

Once the rebuild is done you should run grub again:

grub
root (hd0,0)
setup (hd0)
root (hd1,0)
setup (hd1)
quit
and at this point the rebuild is complete; you can reboot and check that the system can boot off either drive again.
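
One follow-up worth doing before putting the server into production (it is not covered by the howto above) is making sure you will actually notice when an array degrades: the Debian mdadm package can run a monitor that emails an alert when a drive drops out. A minimal sketch, assuming local mail delivery to root works on your system:

# in /etc/mdadm/mdadm.conf - where the monitor should send alerts
MAILADDR root

# one-off check that a test notification actually gets delivered
mdadm --monitor --scan --test --oneshot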


