Nexcess Blog

Software RAID on Linux with mdadm

August 8, 2011

Software RAID can be a great alternative when greater disk redundancy and/or performance is needed but the expense of a hardware RAID card can’t be justified. On Linux, software RAID almost universally means mdadm, so that is what I’m going to cover here. For the examples in this post I will be using a VM with CentOS 5 installed on the first drive (sda) and 4 other drives to use for working with mdadm. CentOS 5 comes with mdadm version 2.6.9, which is somewhat older than the most current release but shouldn’t be missing any important features.

md devices can be created using the --create (-C) subcommand, generally like this:

# mdadm --create /dev/mdA -n B -l C -c D <list of component devices>

Here A is the device number, which can be chosen arbitrarily but usually starts at 0. The device should not already exist, since it will be created by mdadm. B is the number of physical devices in the array (not counting spares), C is the RAID level and D is the chunk size in KB. The chunk size only applies to RAID levels with striping, such as 0, 5 and 6, and determines how big the stripes (chunks of data) on each physical device are. Chunk size can have a big impact on the performance of an array. I would recommend reading what this page says on the topic (keeping in mind that it was written for smaller drives than are currently available). Some example array creation commands:
Simple RAID5 array with 256K chunk size:

# mdadm --create /dev/md0 -n 4 -l 5 -c 256 /dev/sd[bcde]1

RAID1 with 2 devices, plus a hot spare in case of a drive failure:

# mdadm --create /dev/md0 -n 2 -l 1 --spare-devices 1 /dev/sd[bcd]1
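To make the general form concrete, here is a minimal sketch of a helper that only prints an mdadm --create command instead of running it, so the device count can be checked before anything touches a disk. The function name build_create_cmd and its argument order are my own invention for illustration, not part of mdadm, and it deliberately ignores --spare-devices:

```shell
# Hypothetical helper (not part of mdadm): print, but never run,
# an "mdadm --create" command.
# Usage: build_create_cmd <md number> <raid level> <chunk KB or ""> <device>...
build_create_cmd() {
    if [ "$#" -lt 4 ]; then
        echo "usage: build_create_cmd <num> <level> <chunk|\"\"> <device>..." >&2
        return 1
    fi
    md_num=$1
    level=$2
    chunk=$3
    shift 3
    # After the shift, "$#" is the number of component devices,
    # which is what -n expects when no spares are involved.
    cmd="mdadm --create /dev/md$md_num -n $# -l $level"
    # Chunk size only matters for striped levels, so it is optional here.
    if [ -n "$chunk" ]; then
        cmd="$cmd -c $chunk"
    fi
    echo "$cmd $*"
}
```

Calling it with 0, 5, 256 and the four partitions /dev/sdb1 through /dev/sde1 prints the RAID5 command from the first example with the glob spelled out. The word "missing" is counted like any other device, which matches how -n is used with it.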

A couple of notes about array creation:
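1) When creating a RAID 5 (or 6 likely) array, mdadm initially creates the array as degraded, with the last disk added as a spare that is then rebuilt into it. This is intentional for performance reasons (rebuilding onto a spare is generally faster than resyncing parity across all disks) and not a reason to panic.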
2) You can create an array with one (or more) missing disks as long as there are enough actual disks to start the array by using the word “missing” in the device list, like so:

# mdadm --create /dev/md0 -n 2 -l 1 /dev/sdb1 missing

3) mdadm supports layering of md devices, so you can in theory do something like joining 2x500GB + 1x1TB drives into a 1TB RAID1 like so (assuming /dev/sd[bc] are 500GB and /dev/sdd is 1TB):

# mdadm --create /dev/md0 -n 2 -l linear /dev/sd[bc]1
# mdadm --create /dev/md1 -n 2 -l 1 /dev/sdd1 /dev/md0

4) mdadm can also work with entire drives, rather than just partitions (as I’ve been using so far). Which one you use is entirely up to personal preference, as there is no performance difference.
5) After array creation, you should save your configuration to /etc/mdadm.conf (/etc/mdadm/mdadm.conf on some systems) like so:

# mdadm --detail --scan > /etc/mdadm.conf
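One thing to keep in mind with note 5 is that a plain redirect replaces the whole file, including any lines that are not ARRAY entries (MAILADDR or DEVICE, for example). As a rough sketch, assuming every ARRAY entry sits on a single line the way mdadm --detail --scan writes it, a small filter can keep the other lines and swap in fresh ARRAY lines. The name refresh_array_lines is hypothetical:

```shell
# Hypothetical helper: print a new mdadm.conf built from an existing file
# plus fresh "mdadm --detail --scan" output arriving on standard input.
# Assumption: each ARRAY entry is a single line beginning with "ARRAY".
refresh_array_lines() {
    conf=$1
    if [ -f "$conf" ]; then
        # Keep everything except the old ARRAY entries.
        awk '$1 != "ARRAY"' "$conf"
    fi
    # Append the fresh ARRAY lines from standard input unchanged.
    cat
}
```

It would be used by piping mdadm --detail --scan into refresh_array_lines /etc/mdadm.conf and redirecting the result to a temporary file, which can be reviewed before it is copied into place. Writing straight back to the file being read would truncate it first.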

mdadm can also modify RAID devices after they’ve been created by using the --grow (-G) subcommand. This is covered in detail here. The important things to note are that you should be very careful when shrinking an array, shrinking the filesystem on it first, and that you should have patience, since some operations (like adding new disks to striped arrays) can take a very long time to complete. If an array is no longer needed, it can be removed by unmounting it, stopping the array, and then removing its corresponding line from /etc/mdadm.conf, like so:

# umount /dev/md0
# mdadm --stop /dev/md0
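The last step, removing the array's line from /etc/mdadm.conf, is easy to do by hand in an editor. If you would rather script it, something like the following sketch could work. It assumes single-line entries of the form "ARRAY <device> ..." with the keyword in upper case, and the function name drop_array_line is made up for this example:

```shell
# Hypothetical helper: print the contents of an mdadm.conf-style file
# without the ARRAY line for one device, e.g. /dev/md0.
# Assumption: entries are single lines of the form "ARRAY <device> ...".
drop_array_line() {
    conf=$1
    dev=$2
    # Compare the whole second field so /dev/md1 does not also match /dev/md10.
    awk -v dev="$dev" '!($1 == "ARRAY" && $2 == dev)' "$conf"
}
```

The function only prints the filtered file, so the output should be redirected to a temporary file and checked before it replaces the real configuration.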

The status of all arrays on the system can be checked quickly by running ‘cat /proc/mdstat’ (some devices have been removed from the output to keep it short):

$ cat /proc/mdstat 
Personalities : [raid1] [raid0] [raid6] [raid5] [raid4] 
md6 : active raid1 sdc3[2] sdg3[0]
      1465061695 blocks super 1.2 [2/2] [UU]
      bitmap: 0/11 pages [0KB], 65536KB chunk

md5 : active raid5 sdc2[4] sdd2[2] sdh2[1] sdg2[0]
      1464929280 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/4 pages [0KB], 65536KB chunk

md1 : active raid0 sda3[0] sdb3[1]
      4016000 blocks 64k chunks
unused devices: <none>

The important parts of the output are the “active” and “[UU]” parts, which will show you when an array is degraded or failed, and which disk(s) are having issues. When a disk is missing, its corresponding ‘U’ will be replaced with an ‘_’ (underscore), something like this (if sdb1 had failed):

md2 : active raid1 sda1[0]
      505920 blocks [2/1] [U_]
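Since the underscore is the telltale sign, checking for it is simple to automate. Below is a minimal sketch that reads mdstat-style text on standard input and prints the name of each array with a missing member. It assumes the layout shown in the examples in this post (an "mdN :" line followed by a "blocks" line ending in the status brackets), and the name list_degraded is my own:

```shell
# Hypothetical helper: read /proc/mdstat-style text on standard input and
# print the name of every array whose [UU...] status contains an underscore.
list_degraded() {
    awk '
        # Array header lines look like "md2 : active raid1 sda1[0]".
        /^md[0-9]+ :/ { name = $1 }
        # On the "blocks" line the status is the last field, e.g. "[U_]".
        / blocks / && $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { print name }
    '
}
```

Run as list_degraded < /proc/mdstat, it prints nothing when every array is healthy, which makes it easy to use from a cron job that only sends mail when there is output.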

Array recovery with mdadm is usually painless when a single disk has gone bad and dropped out of an array with a RAID level that provides redundancy (1, 5, etc). Using the example above, once sdb has been replaced (either by bringing the machine down, or by hot-swapping) and partitioned so that sdb1 exists again, we can re-add it to the array with:

# mdadm /dev/md2 --add /dev/sdb1

The array will start resyncing, which will show up in /proc/mdstat looking something like this:

md2 : active raid1 sdb1[2] sda1[0]
      505920 blocks [2/1] [U_]
      [======>..............]  recovery = 33.2% (168512/505920) finish=0.1min speed=56170K/sec
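If you want to keep an eye on a rebuild from a script rather than by re-running cat, the percentage can be pulled out of that progress line. This is only a sketch that assumes the line format shown above, and recovery_progress is a hypothetical name:

```shell
# Hypothetical helper: read /proc/mdstat-style text on standard input and
# print "<array> <percent>" for each array that is currently rebuilding.
recovery_progress() {
    awk '
        # Remember which array the following lines belong to.
        /^md[0-9]+ :/ { name = $1 }
        # Progress lines contain "recovery = NN.N%" (or "resync = NN.N%").
        /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) { print name, $i; break }
        }
    '
}
```

Feeding it /proc/mdstat on standard input while the resync above is running would print the array name followed by the percentage from the progress line, and nothing at all once the rebuild has finished.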

Unfortunately, array recovery is not always that simple, such as when multiple disks in a RAID5 array fail. In a case like that, the most important thing is to look for help. The Linux Raid Wiki and the mdadm mailing list are both excellent sources of information. Attempting to fix an advanced failure without knowing what you’re doing could (and likely would) result in a total loss of data on the array.

Posted in: Linux