RAID

From Noah.org
Revision as of 22:05, 21 March 2013 by Root (mdadm RAID recovery)


This was my old RAID setup for an Ubuntu NAS box. Note that since Ubuntu 7.10 this is much easier: simply use the Ubuntu Alternate Server disk, select manual disk partitioning when installing, and there will be an option to create an MD device.

Hardware

I used two identical 180 GB PATA drives. Each drive was connected to a separate IDE bus (each on its own cable). Do not put two RAID drives on the same IDE bus: two drives on one cable slow everything down, and if one drive goes bad it can bring down the entire IDE bus.

Software

When installing mdadm you will be asked if you want to automatically start MD Arrays. Select "all".

apt-get install mdadm

First attempt -- a non-bootable RAID-1 partition

Format

I created a 4GB partition on both drives for the operating system (boot and / on an ext3 partition). The first partition (4GB) of each drive was formatted as ext3. The 4GB partition on the second drive is unused; it's just there to make it easier to keep both drives partitioned exactly the same. I then created a second partition on each drive and left it unformatted. In other words, I installed the full operating system on the 4GB partition before I even started with the RAID configuration. I didn't create a swap partition. I could have, but this machine has 1GB of RAM and will do no work besides Samba, so swap isn't going to help much.

After all is said and done I had two drives partitioned and formatted identically:

# fdisk -l

Disk /dev/hda: 180.0 GB, 180045766656 bytes
255 heads, 63 sectors/track, 21889 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1         486     3903763+  83  Linux
/dev/hda2             487       21889   171919597+  83  Linux

Disk /dev/hdd: 180.0 GB, 180045766656 bytes
255 heads, 63 sectors/track, 21889 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hdd1               1         486     3903763+  83  Linux
/dev/hdd2             487       21889   171919597+  83  Linux

This setup is not "ideal", but it's easier to set up. I'm building a NAS, so I don't care as much about the integrity of the operating system; I just care about the files on the file server. If the boot sector goes bad I can reinstall Linux and recover my files offline. A more clever setup would put the boot and operating system partitions on the RAID array itself.

Create the RAID-1 MD Array

I had a small issue where, after I created the two partitions (hda2 and hdd2) using fdisk, they did not show up as /dev/hda2 or /dev/hdd2, although I could see them with `fdisk -l`. What I did was use cfdisk to write the partition table again; after that the device nodes showed up in /dev/. (Anything that makes the kernel re-read the partition table, such as `partprobe` or a reboot, should also work.)

Once the partitions are ready it's easy to turn them into a RAID device:

mdadm /dev/md0 --create --auto=yes --level=1 --raid-devices=2 /dev/hda2 /dev/hdd2

This will create the RAID device /dev/md0 and start to "sync" the two partitions. `cat /proc/mdstat` to see the status of the resync process:

# cat /proc/mdstat 
Personalities : [raid1] 
md0 : active raid1 hdd2[1] hda2[0]
      171919488 blocks [2/2] [UU]
      [>....................]  resync =  2.5% (4372032/171919488) finish=105.1min speed=26557K/sec
      
unused devices: <none>

Second attempt -- Bootable RAID-1 on root

That was so easy that I decided to start over and get the whole drive properly under RAID-1.

Ubuntu 7.04 is broken: it does not reliably let you build a bootable RAID array. Use Ubuntu 6.10 Server instead.

I installed Ubuntu Server 6.10 on one of the disks (hda). I allowed the installer to repartition and use all of hda (it created one root partition and a swap partition). When all was done I had a stock system with the following partitions on /dev/hda:

Disk /dev/hda: 180.0 GB, 180045766656 bytes
255 heads, 63 sectors/track, 21889 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1       21470   172457743+  83  Linux
/dev/hda2           21471       21889     3365617+   5  Extended
/dev/hda5           21471       21889     3365586   82  Linux swap / Solaris

Then I used `dd` to copy the main boot drive to the second drive.

dd bs=1M if=/dev/hda of=/dev/hdd

After that was done I had two identically partitioned drives:

# fdisk -l

Disk /dev/hda: 180.0 GB, 180045766656 bytes
255 heads, 63 sectors/track, 21889 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1       21470   172457743+  83  Linux
/dev/hda2           21471       21889     3365617+   5  Extended
/dev/hda5           21471       21889     3365586   82  Linux swap / Solaris

Disk /dev/hdd: 180.0 GB, 180045766656 bytes
255 heads, 63 sectors/track, 21889 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hdd1   *           1       21470   172457743+  83  Linux
/dev/hdd2           21471       21889     3365617+   5  Extended
/dev/hdd5           21471       21889     3365586   82  Linux swap / Solaris

It's probably not necessary to use dd, since the RAID system is going to resync the drive anyway, but this was a lazy way to make sure both drives were identically partitioned. If I were building a RAID-1 with 1TB drives I would copy only the partition table, since that would be much faster. The partition information can be copied without copying the data:

sfdisk -d /dev/hda | sed s/hda/hdd/ > /tmp/hdd
sfdisk -f /dev/hdd < /tmp/hdd
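The sed step in that pipeline just rewrites the device names inside the dumped partition table so the dump can be fed back into the second drive. A quick illustration on a single fabricated dump line (not from a live sfdisk run):

```shell
# sfdisk -d emits lines naming the source device; sed renames them:
echo "/dev/hda1 : start=63, size=7807527, Id=83, bootable" | sed s/hda/hdd/
# prints: /dev/hdd1 : start=63, size=7807527, Id=83, bootable
```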

Next set the secondary drive's first partition's type to fd (Linux raid autodetect). I just used fdisk and the `t` command. Set type fd for partition 1 only; the Extended and swap partitions will not be part of the RAID. Afterwards it should look like this:

# fdisk -l /dev/hdd

Disk /dev/hdd: 180.0 GB, 180045766656 bytes
255 heads, 63 sectors/track, 21889 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hdd1   *           1       21470   172457743+  fd  Linux raid autodetect
/dev/hdd2           21471       21889     3365617+   5  Extended
/dev/hdd5           21471       21889     3365586   82  Linux swap / Solaris

Now I actually create the RAID array and add the secondary drive. This is a "degraded" array because one drive (hda) is marked as "missing"; that drive will be added later.

mdadm /dev/md0 --create --auto=yes --level=1 --raid-devices=2 missing /dev/hdd1

At first I got this error message:

mdadm: Cannot open /dev/hdd1: Device or resource busy

This was likely because installing `mdadm` started /dev/md0 for me (perhaps it saw that I had set /dev/hdd1 to type fd, or it saw a RAID superblock on /dev/hdd left over from a previous attempt). This is easy to fix -- just shut down RAID on /dev/md0:

# mdadm --stop /dev/md0
mdadm: stopped /dev/md0

Edit /etc/fstab. Change the / mount line, which looks like this:

/dev/hda1    /    ext3    defaults,errors=remount-ro 0 1

Change it to use the md0 device:

/dev/md0    /    ext3    defaults,errors=remount-ro 0 1

Edit /boot/grub/menu.lst and add the following boot menu item (be sure to reference the kernel image you actually need to boot):

title RAID kernel 2.6.15-26-server
root (hd0,0)
kernel /boot/vmlinuz-2.6.15-26-server root=/dev/md0 ro
initrd /boot/initrd.img-2.6.15-26-server
boot

Next create a filesystem on the array, drop to single-user mode, and copy the running system onto it:

mkfs.ext3 /dev/md0
mkdir -p /mnt/md0
mount /dev/md0 /mnt/md0
telinit 1
cp -aux / /mnt/md0
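The cp flags in that last step matter; here is what each does, demonstrated on a scratch directory that is safe to run anywhere (the /tmp paths are just for illustration):

```shell
# -a: archive mode (preserve permissions, times, symlinks)
# -u: copy only when the source file is newer than the destination
# -x: stay on one filesystem, so pseudo-filesystems like /proc and /sys
#     (and /mnt/md0 itself) are skipped when cloning /
mkdir -p /tmp/raiddemo/src/sub /tmp/raiddemo/dst
echo "data" > /tmp/raiddemo/src/sub/file
cp -aux /tmp/raiddemo/src/. /tmp/raiddemo/dst/
cat /tmp/raiddemo/dst/sub/file   # prints: data
```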

Now restart:

shutdown -r now

It should boot off of /dev/md0. Check this after booting by running `mount`.

I'm not sure this step is necessary. It may be enough to simply set the partition ID to fd. But this step can't hurt since once the drive is added to the array the md driver will resync it anyway.

sfdisk -d /dev/hdd | sfdisk --no-reread /dev/hda

Next add the original primary disk to the array. This will trigger a resync.

mdadm /dev/md0 -a /dev/hda1

Monitor the resync with `mdadm --detail /dev/md0` or `cat /proc/mdstat`. When it shows the drives are in sync and the array is healthy then you can reboot.
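If you want just the progress figure for a script, the percentage can be pulled out of /proc/mdstat with awk. Shown here against a sample status line copied from above rather than a live array:

```shell
# Find the field ending in '%' on a resync/recovery status line:
line='[>....................]  resync =  2.5% (4372032/171919488) finish=105.1min speed=26557K/sec'
echo "$line" | awk '{ for (i = 1; i <= NF; i++) if ($i ~ /%$/) print $i }'
# prints: 2.5%
```

On a live system, feed it the real thing with `grep -E 'resync|recovery' /proc/mdstat` instead of the sample line.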

Still more problems: it seems that this does not build the superblock quite right. I got an error something like this:

/dev/md0: The filesystem size (according to the superblock) is 97677200 blocks
The physical size of the device is 97677184 blocks
Either the superblock or the partition table is likely to be corrupt!

/dev/md0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
	(i.e., without -a or -p options)

The fix was tedious, but not hard. First grab a live CD; I used Ubuntu 7.04. I booted off the CD, started a shell, and became root with `sudo su -` (no password is necessary on the live CD). Even though this is a live CD you can still "install" packages using apt-get. You need the mdadm package in order to assemble and mount the RAID array:

apt-get install mdadm
mdadm --assemble /dev/md0 /dev/hda1 /dev/hdd1
e2fsck -f /dev/md0
resize2fs /dev/md0

Also delete any /etc/mdadm.conf or /etc/mdadm/mdadm.conf files you may have. After this I was able to reboot and the array worked perfectly.

errors

If you used Ubuntu 7.04 it may appear that the system has locked up on boot, but if you wait a couple of minutes you might see an error message. If you see this error then you may have hit the "Ubuntu MD race condition" bug:

        Check root= bootarg cat /proc/cmdline
        or missing modules, devices: cat /proc/modules ls /dev
ALERT! /dev/md0 does not exist.  Dropping to a shell!

Here are some notes on this problem. The work-around is not satisfactory; the best thing to do is to downgrade to Ubuntu 6.10 Server (Edgy).

https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/103177
https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/79204

speed tests

Speed isn't really important for my NAS purposes, but I was curious as to how RAID-1 would improve read performance. First I got a baseline speed for each drive before they were put into the RAID.

# hdparm -t /dev/hda

/dev/hda:
 Timing buffered disk reads:  140 MB in  3.01 seconds =  46.49 MB/sec

# hdparm -t /dev/hdd

/dev/hdd:
 Timing buffered disk reads:  134 MB in  3.03 seconds =  44.20 MB/sec

And oddly enough, it seemed to make performance no faster:

# hdparm -t /dev/md0

/dev/md0:
 Timing buffered disk reads:  136 MB in  3.02 seconds =  45.03 MB/sec

Well, that just sucks... In fairness, md RAID-1 serves a single sequential reader from one mirror, so `hdparm -t` was never going to show a gain; RAID-1 read throughput only improves when multiple readers hit the array at once.

mdadm RAID recovery

If a drive fails, replace it and then copy the partition information from one of the good drives in the array to the new drive. In this example the good drive is sda and the new, empty drive is sdc:

# sfdisk -d /dev/sda | sfdisk /dev/sdc
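Since an sfdisk dump is plain text, it's also easy to keep a copy around and verify that the clone matches the original with ordinary tools. A sketch on fabricated dump files (the contents below are hypothetical, not real sfdisk output):

```shell
printf '/dev/sda1 : start=63, size=1000, Id=fd\n' > /tmp/sda.dump
printf '/dev/sdc1 : start=63, size=1000, Id=fd\n' > /tmp/sdc.dump
# Normalize the device names, then diff; no differences means the tables match.
sed s/sda/X/ /tmp/sda.dump > /tmp/a.dump
sed s/sdc/X/ /tmp/sdc.dump > /tmp/b.dump
diff /tmp/a.dump /tmp/b.dump && echo "partition tables match"
# prints: partition tables match
```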

Add the new drive to the array:

# mdadm --manage /dev/md0 --add /dev/sdc1
mdadm: added /dev/sdc1

After adding the device it will show up as a spare as indicated by (S) next to sdc1 in the /proc/mdstat output.

# cat /proc/mdstat 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid10 sdc1[4](S) sdd1[2] sda1[0]
      3906763776 blocks super 1.2 512K chunks 2 near-copies [4/2] [U_U_]
      
unused devices: <none>

The problem is that now the drive just sits there as a spare. You need to force the md system to recover the array. There's probably some way to do this with mdadm, but I just use the sys filesystem:

# cat /sys/block/md0/md/sync_action
frozen
# echo repair >/sys/block/md0/md/sync_action
# cat /sys/block/md0/md/sync_action
recover

The OS will need to rebuild the array by resyncing the required data to the new drive. You can check on the recovery progress by looking at /proc/mdstat.

cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid10 sdb1[4] sdd1[2] sdc1[3] sda1[0]
      3906763776 blocks super 1.2 512K chunks 2 near-copies [4/3] [U_UU]
      [>....................]  recovery =  0.0% (35072/1953381888) finish=31522.6min speed=1032K/sec
      
unused devices: <none>

Notice that the estimated recovery finish is 31522.6 minutes. That's over 21 days. It turns out that there is a speed limit on the recovery process.
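The estimate is easy to sanity-check with shell arithmetic: the (done/total) figures in mdstat are in KiB and the speed is in KiB/s, so remaining divided by speed gives seconds:

```shell
blocks=1953381888   # KiB still to sync, from /proc/mdstat above
speed=1032          # KiB/s
echo "$(( blocks / speed / 60 )) minutes"    # prints: 31546 minutes
echo "$(( blocks / speed / 86400 )) days"    # prints: 21 days
```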

cat /proc/sys/dev/raid/speed_limit_min
1000
cat /proc/sys/dev/raid/speed_limit_max
200000

You can raise speed_limit_min to increase the recovery speed. I like to set it to half the max speed limit. This decreased the recovery time to about 37 hours.

echo 100000 > /proc/sys/dev/raid/speed_limit_min
cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : active raid10 sdb1[4] sdd1[2] sdc1[3] sda1[0]
      3906763776 blocks super 1.2 512K chunks 2 near-copies [4/3] [U_UU]
      [>....................]  recovery =  0.0% (1118080/1953381888) finish=2233.5min speed=14567K/sec
      
unused devices: <none>

Setting the min speed limit equal to the max speed limit will decrease recovery time further, to about 5.5 hours, but will likely make the system unusable for other applications.
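The same arithmetic accounts for both figures: at the observed 14567 K/s the rebuild takes about 37 hours, and the 5.5-hour figure implies the disks sustain roughly 100 MB/s once the limit is out of the way (the limit is only a ceiling; the disks themselves bound the real speed):

```shell
blocks=1953381888   # KiB to sync
echo "$(( blocks / 14567 / 3600 )) hours"    # prints: 37 hours
echo "$(( blocks / 100000 / 3600 )) hours"   # prints: 5 hours (~5.4 really)
```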

notes

There is lots of useful information under /sys/block/md0/md/.

script

Note that this section appears to have gotten a bit corrupted -- perhaps from a cut-and-paste error.

mdadm /dev/md0 --create --auto=yes --level=1 --raid-devices=2 /dev/hda2 /dev/hdd2
dd bs=1M if=/dev/hda of=/dev/hdd
sfdisk -d /dev/hda | sed s/hda/hdd/ > /tmp/hdd
sfdisk -f /dev/hdd < /tmp/hdd
mdadm /dev/md0 --create --auto=yes --level=1 --raid-devices=2 missing /dev/hdd1
mdadm --stop /dev/md0

Edit /boot/grub/menu.lst and add the following boot menu item (Be sure to copy the kernel image you need to boot):

title RAID kernel 2.6.15-26-server
root (hd0,0)
kernel /boot/vmlinuz-2.6.15-26-server root=/dev/md0 ro
initrd /boot/initrd.img-2.6.15-26-server
boot

Does this following section belong somewhere else?

mkfs.ext3 /dev/md0
mkdir -p /mnt/md0
mount /dev/md0 /mnt/md0
telinit 1
cp -aux / /mnt/md0
sfdisk -d /dev/hdd | sfdisk --no-reread /dev/hda
mdadm /dev/md0 -a /dev/hda1
#Also delete any /etc/mdadm.conf or /etc/mdadm/mdadm.conf files you may have. After this I was able to reboot and the array worked perfectly.