Anders Brownworth

Technology and Disruption

Activating an Inactive Software RAID

I had some sort of unrecoverable drive / controller error on my home Linux server, which locked up the machine and caused its software RAID to go inactive after reboot.


--------
box # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : inactive sdb1[1] sda1[0] sdh1[6] sdg1[7] sdd1[2] sdf1[5] sde1[4]
6837319552 blocks

unused devices: <none>
--------


One disk (/dev/sdc) was showing IO errors so I removed and replaced it.


--------
box # mdadm --manage /dev/md0 --remove /dev/sdc1
box # fdisk /dev/sdc
(one partition of type "fd" spanning the disk)
box # mdadm --manage /dev/md0 --add /dev/sdc1
--------


But when I attempted to re-assemble the RAID:


--------
box # mdadm --assemble /dev/md0 /dev/sd[abcdefgh]1
mdadm: cannot open device /dev/sda1: Device or resource busy
mdadm: /dev/sda1 has no superblock - assembly aborted
--------


What? No superblock? I can clearly read it:


--------
box ~ # mdadm --examine /dev/sda1
/dev/sda1:
Magic : a92b4efc
Version : 0.90.00
UUID : 2476ddcb:ac8eb7ae:d7a6d8c7:9aeca122 (local to host box)
Creation Time : Mon Jan 25 21:51:25 2010
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 6837319552 (6520.58 GiB 7001.42 GB)
Raid Devices : 8
Total Devices : 8
Preferred Minor : 0

Update Time : Sun Aug 28 16:18:58 2011
State : active
Active Devices : 8
Working Devices : 8
Failed Devices : 0
Spare Devices : 0
Checksum : c1380029 - correct
Events : 8480985

Layout : left-symmetric
Chunk Size : 64K

Number Major Minor RaidDevice State
this 0 8 1 0 active sync /dev/sda1

0 0 8 1 0 active sync /dev/sda1
1 1 8 17 1 active sync /dev/sdb1
2 2 8 33 2 active sync /dev/sdc1
3 3 8 49 3 active sync /dev/sdd1
4 4 8 65 4 active sync /dev/sde1
5 5 8 81 5 active sync /dev/sdf1
6 6 8 113 6 active sync /dev/sdh1
7 7 8 97 7 active sync /dev/sdg1
--------


On a calmer reading, I noticed the "Device or resource busy" error on the line before, so I suppose the second error is just a misleading side effect of the first.

But then it occurred to me: this superblock lists 8 good drives and no spares, which clearly isn't the case. And why is /dev/md0 even defined? An earlier mdadm --assemble must have already run, presumably at boot, and decided /dev/md0 was not fit to start.
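One way to check whether the members actually agree with each other is to compare the Events counter that mdadm --examine prints for each one. A member that dropped out earlier will typically show a lower count than the rest. The helper below is only a sketch: events_per_member is a made-up name, and it assumes the --examine layout shown above, where each device starts with a "/dev/...:" header line and has an "Events : N" line.

```shell
# Print "<device> <event count>" for each member found in
# `mdadm --examine` output supplied on stdin.
# Assumes the layout shown in this post: a "/dev/xxx:" header line
# followed by an "Events : N" line somewhere below it.
events_per_member() {
    awk '
        /^\/dev\// { dev = $1; sub(/:$/, "", dev) }      # remember current device
        $1 == "Events" && $2 == ":" { print dev, $3 }    # emit its event count
    '
}

# Usage on a live system (needs root; read-only):
#   mdadm --examine /dev/sd[abcdefgh]1 | events_per_member
```

Members whose counts lag behind the others are the ones a plain assemble is likely to refuse.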

I needed to stop /dev/md0 and re-assemble it, this time with the --force option so the RAID comes up active:


--------
box # mdadm --stop /dev/md0
mdadm: stopped /dev/md0
box # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
unused devices: <none>
box # mdadm --assemble --force /dev/md0 /dev/sd[abcdefgh]1
mdadm: /dev/md0 has been started with 7 drives (out of 8) and 1 spare.
box main # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid5 sda1[0] sde1[8] sdg1[7] sdh1[6] sdf1[5] sdd1[4] sdc1[2] sdb1[1]
6837319552 blocks level 5, 64k chunk, algorithm 2 [8/7] [UUU_UUUU]
[>....................] recovery = 0.0% (71456/976759936) finish=455.5min speed=35728K/sec

unused devices: <none>
--------


If you have an inactive /dev/md0, you have to stop it before retrying the assemble. The "Device or resource busy" and "no superblock" errors are slightly misleading.
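
A quick way to spot that state before retrying is to look for "inactive" in /proc/mdstat. The function below is a minimal sketch, assuming the mdstat line format shown earlier ("md0 : inactive ..."); inactive_arrays is a hypothetical helper name, not an mdadm feature.

```shell
# List md arrays that mdstat-formatted text reports as inactive.
# Reads from stdin so it can also be tried on saved output.
inactive_arrays() {
    # Array lines look like: "md0 : inactive sdb1[1] sda1[0] ..."
    awk '$2 == ":" && $3 == "inactive" { print $1 }'
}

# Usage on a live system (read-only):
#   inactive_arrays < /proc/mdstat
```

Any array it prints would need an mdadm --stop before another --assemble attempt.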

All that is left is to watch the rebuild happen.


--------
watch -n .5 cat /proc/mdstat
--------
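
If you would rather log progress than watch the screen, the recovery line can be pulled out of /proc/mdstat. This is a rough sketch that assumes the line format shown in the output above ("recovery = 0.0% ... finish=455.5min"); rebuild_progress is a made-up helper name.

```shell
# Print "<array> <percent done> <estimated time left>" for every array
# that mdstat-formatted text on stdin shows as recovering or resyncing.
rebuild_progress() {
    awk '
        $1 ~ /^md/ && $2 == ":" { array = $1 }           # remember current array
        /(recovery|resync) *=/ {
            pct = ""; eta = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/) pct = $i                  # e.g. 0.0%
                if ($i ~ /^finish=/) { eta = $i; sub(/^finish=/, "", eta) }
            }
            if (pct != "") print array, pct, eta
        }
    '
}

# Usage on a live system (read-only):
#   rebuild_progress < /proc/mdstat
```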

Comments (22)

atripl from Russia, Volgograd

I had the same problem today and completely forgot that the device md0 had started during the boot and not allowed me to re-assemble it. Thanks for your post!

SeanB from MN, US

This was very, very, very helpful, thanks for posting. I did have to stop my existing arrays and re-assemble them, was beating my head against the wall to figure out why I couldn't get them to start, superblocks, etc.

CodeMonkeyJ from MD, USA

You are a godsend. Was going crazy trying to get my RAID up and running and was dreading losing all my stuff. Scouring Google for hours with all the various misleading errors, within five minutes of this post, the RAID was up and running again.

Anders from Cambridge, MA

Great to hear - glad I could help. Yeah, I was sweating it out thinking I had lost everything as well! mdadm should really reply with something a little more helpful when the RAID is administratively down.

Dhirham from Indonesia

Thanks for posting. Now my raid5 running up again.

Sandra from Sweden

Thank you. At first I thought I had two faulty disks on my hand and a raid5 beyond repair. This helped me out a lot. Thanks again. :)

Kristian from Denmark

This is a sysadmin career saver! Thanks a lot!

Walter from Colombia

Thank you so much. [>....................] recovering...

Ramesh from Texas

If you have an inactive /dev/md0, you have to stop it before retrying the assemble.

That was the missing piece. Thanks.

Rodrigo from Argentina

Very great post!!! Very very useful for me!!!!!

Many many thanks!!

rocko from Szeged/ Hungary /

Thx, It's is ok!

Mario from Tallinn

Hello,
I made Swap to MD1 as RAID0. MD0 is RAID1 and /boot and MD2 is RAID1 and /root.
/proc/mdstat shows md1 inactive sda2[0](S) sdb[2](S)

I entered command:
# mdadm --assemble --force /dev/md0 /dev/sd[ab]2

It said:
/dev/md1 assembled from 1 drive and 1 spare- not enough to start the array.

Any toughts?
History:
Have 2xWD 500GB RE4 disks with Debian 6.
1 disk failed weirdly. On boot it showed that md0 was 2of2 (RAID1 worked), MD1 also worked then, but MD2 (root) was 1 of 2. And it didnt rebuild the disk. So i decided to DD zero the non working drive and manually recreated the partitions with fdisk. The other partitions rebuilded fine with mdadm, but RAID0 swap didnt. It didnt add the second MD1- ARRAY UUID and name to mdadm.conf either, I manually added the 3'rd partition info.

Anders from Cambridge, MA

@Mario: You must have a typo - asking md0 to assemble shouldn't show an error on md1.

Mario from

Yeah, sorry. That command was pasted from this thread. I modified it when i used it myself with MD1.

Anders from Cambridge, MA

@Mario: It might be more helpful if you just post info about the md* that isn't working. As written, your original post is hard to follow.

Tom from USA

This post was a life-saver! You rock!

Mark from Australia

Saved my bacon!

Anatoli from

Actually, there is no need to stop and force-start an array, just run: mdadm -R /dev/<array_name> (-R for run).

Selby from Sofia / Bulgaria

I have four HDD

# cat /proc/mdstat SHOW:
Personalities : [raid1]
md124 : active raid1 sda[1] sdb[0]
976748544 blocks super external:/md1/0 [2/2] [UU]

md1 : inactive sda[1](S) sdb[0](S)
6306 blocks super external:imsm

md125 : active raid1 sdc[1] sdd[0]
976748544 blocks super external:/md0/0 [2/2] [UU]

md0 : inactive sdc[1](S) sdd[0](S)
6306 blocks super external:imsm

unused devices: <none>

Please help me.
Thanks

Brett from California

Thank you very much for this post! Solved my problem easily after hours of google searching.

tejo from TEMSE

I guess my raid5 (8 * 3tb set) on my home server failed one drive after a reboot today. Yeah, can happen to us all. Now, my question is, since i have 8 identical WDC_WD30EFRX-68EUZN0 drives in this raid, how can i identify the disk when i open the "box" ?

Sounds like a silly question, but its "kinda" easy to identify it by the mdadm soft, but how would i find the hardware ?
I kinda feel "try and error" isnt the right approach ....

I know this is a old post, so i not really expect an answer ... continuing my search :-)

Thx in advance !
Tee

Anders from Cambridge, MA

When the system is running, the drive whose light doesn't follow the others during normal operation is the broken one.
