Monthly Archives: October 2007

Solaris 8 Disk Failure During Reboot

I had a disk fail while rebooting a Solaris 8 machine. The machine had a RAID 1 (mirror) setup across two disks using Solstice DiskSuite, and the drive that failed was the first one in the boot order. Because it failed during the reboot, Solaris came back up in single-user mode with the error:
Insufficient metadevice database replicas located

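Here is what metastat and metadb showed. Four of the eight state database replicas were on the failed disk (c0t0d0), and DiskSuite needs a majority (half + 1) of its replicas to boot into multiuser mode. Four out of eight is not a majority, which is why the boot stopped in single-user mode: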

root@server / # metastat
d0: Mirror
    Submirror 0: d10
      State: Needs maintenance
    Submirror 1: d20
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 33002848 blocks

d10: Submirror of d0
    State: Needs maintenance
    Invoke: metareplace d0 c0t0d0s0
    Size: 33002848 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c0t0d0s0                      0  No    Maintenance

d20: Submirror of d0
    State: Okay
    Size: 33002848 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c0t1d0s0                      0  No    Okay

d1: Mirror
    Submirror 0: d11
      State: Needs maintenance
    Submirror 1: d21
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 2101552 blocks

d11: Submirror of d1
    State: Needs maintenance
    Invoke: metareplace d1 c0t0d0s1
    Size: 2101552 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c0t0d0s1                      0  No    Maintenance

d21: Submirror of d1
    State: Okay
    Size: 2101552 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c0t1d0s1                      0  No    Okay

root@server / # metastat -p
d0 -m d10 d20 1
d10 1 1 c0t0d0s0
d20 1 1 c0t1d0s0
d1 -m d11 d21 1
d11 1 1 c0t0d0s1
d21 1 1 c0t1d0s1
root@server / # metadb -i
        flags           first blk       block count
     M    p             unknown         unknown         /dev/dsk/c0t0d0s3
     M    p             unknown         unknown         /dev/dsk/c0t0d0s3
     M    p             unknown         unknown         /dev/dsk/c0t0d0s4
     M    p             unknown         unknown         /dev/dsk/c0t0d0s4
     a m  p  lu         16              1034            /dev/dsk/c0t1d0s3
     a    p  l          1050            1034            /dev/dsk/c0t1d0s3
     a    p  l          16              1034            /dev/dsk/c0t1d0s4
     a    p  l          1050            1034            /dev/dsk/c0t1d0s4
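The majority rule (half + 1 of all configured replicas must be healthy for a multiuser boot) can be checked mechanically. The sketch below is a hypothetical shell function, replica_quorum (not a DiskSuite command), that reads ‘metadb -i’ output on stdin. It treats a replica as healthy when its flags include a and no uppercase error flag such as M, and it assumes the last three fields of each replica line are first blk, block count and device, as in the output above. It also assumes a POSIX awk; on Solaris 8 that means nawk rather than the old /usr/bin/awk.

```shell
# replica_quorum: hypothetical helper, not part of DiskSuite.
# Reads `metadb -i` output on stdin and reports whether a majority
# (half + 1) of the state database replicas is healthy.
# Assumes a POSIX awk (use nawk on Solaris 8).
replica_quorum() {
    awk '
    # Replica lines end with a /dev/dsk path; the header and legend do not.
    $NF ~ /^\/dev\/dsk\// {
        total++
        # Flags are every field before "first blk", "block count" and the device.
        flags = ""
        for (i = 1; i <= NF - 3; i++) flags = flags $i
        # Healthy: active (a) and no uppercase error flag (M, D, F, R, W, ...).
        if (flags ~ /a/ && flags !~ /[A-Z]/) good++
    }
    END {
        need = int(total / 2) + 1
        printf "replicas: %d total, %d good, %d needed\n", total, good, need
        if (good >= need) {
            print "quorum: ok"
        } else {
            print "quorum: insufficient"
            exit 1
        }
    }'
}
```

Fed the output above, it reports 8 total, 4 good, 5 needed and exits non-zero. Once the four stale replicas on c0t0d0 are deleted it reports 4 total, 4 good, 3 needed, which is why removing them lets the machine boot normally.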

To fix this, do the following:

1. While booted in single-user mode, delete every metadevice state database replica that was on the ‘bad’ disk. The disk does not even have to be installed for this, but the step is required: once the stale replicas are gone, every remaining replica is good, so DiskSuite again has a majority and can boot normally. On my system that meant:
# metadb -d /dev/dsk/c0t0d0s3
# metadb -d /dev/dsk/c0t0d0s4

2. Make sure that the replicas on the ‘bad’ disk have been deleted:
# metadb -i

3. Pull the ‘bad’ disk out and leave the bay empty. Reboot the machine.

4. The machine should now go through the normal boot sequence and come up as a running system, with only one disk in the mirror.

5. Put the replacement disk in the bay.

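6. Partition the new disk. The easiest way to do this is to copy the partition table from the working disk. This dd copies the first 16 512-byte sectors of the whole-disk slice (s2), which hold the disk label (VTOC) followed by the boot block area, so it assumes the replacement disk has the same geometry as the surviving one: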
# dd if=/dev/rdsk/c0t1d0s2 of=/dev/rdsk/c0t0d0s2 count=16

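7. Recreate the metadevice state database replicas on the new disk. The surviving disk holds two replicas on each of slices 3 and 4, so add two per slice to match (-c sets the number of replicas per slice; -f is only needed when no state database exists at all):
# metadb -a -c 2 /dev/dsk/c0t0d0s3
# metadb -a -c 2 /dev/dsk/c0t0d0s4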

8. Re-enable the slices on the new disk so that they rejoin their submirrors and resync:
# metareplace -e d0 c0t0d0s0
# metareplace -e d1 c0t0d0s1
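The commands for the replica and submirror steps can also be derived from the tools' own output instead of being typed by hand. Below is a sketch of three hypothetical shell functions (the names are made up for this example, not DiskSuite commands). They read ‘metadb -i’ or ‘metastat -p’ output on stdin and only print commands, so you can review them before running anything. They assume the output layout shown above and a POSIX awk (nawk on Solaris 8).

```shell
# Hypothetical helpers, not part of DiskSuite. Each reads command output on
# stdin and prints commands for review; nothing is executed.
# Assumes a POSIX awk (use nawk on Solaris 8).

# Deleting stale replicas: print one "metadb -d" per slice holding a replica
# that is not active or shows an uppercase error flag (e.g. M).
# metadb -d removes every replica on a slice, so each slice is printed once.
stale_replica_cmds() {
    awk '$NF ~ /^\/dev\/dsk\// {
        flags = ""
        for (i = 1; i <= NF - 3; i++) flags = flags $i
        if ((flags !~ /a/ || flags ~ /[A-Z]/) && !seen[$NF]++)
            print "metadb -d " $NF
    }'
}

# Recreating replicas: print "metadb -a -c N" for the new disk, copying the
# per-slice replica count from the surviving disk.
# Usage: replica_add_cmds SURVIVING_DISK NEW_DISK, e.g. c0t1d0 c0t0d0
replica_add_cmds() {
    awk -v src="/dev/dsk/$1" -v dst="/dev/dsk/$2" '
    index($NF, src) == 1 {
        slice = substr($NF, length(src) + 1)   # e.g. "s3"
        if (slice !~ /^s[0-9]+$/) next
        if (!(slice in count)) order[++n] = slice
        count[slice]++
    }
    END {
        for (i = 1; i <= n; i++)
            printf "metadb -a -c %d %s%s\n", count[order[i]], dst, order[i]
    }'
}

# Re-enabling submirrors: print "metareplace -e MIRROR COMPONENT" for every
# component on DISK, using the layout from "metastat -p".
reenable_cmds() {
    awk -v prefix="$1s" '
    # Mirror line, e.g. "d0 -m d10 d20 1": remember which mirror owns each submirror.
    $2 == "-m" {
        for (i = 3; i <= NF; i++)
            if ($i ~ /^d[0-9]+$/) parent[$i] = $1
        next
    }
    # Stripe line, e.g. "d10 1 1 c0t0d0s0": record components on DISK.
    {
        for (i = 2; i <= NF; i++)
            if (index($i, prefix) == 1) { n++; owner[n] = $1; comp[n] = $i }
    }
    END {
        for (i = 1; i <= n; i++)
            if (owner[i] in parent)
                printf "metareplace -e %s %s\n", parent[owner[i]], comp[i]
    }'
}
```

On the system above, ‘metadb -i | stale_replica_cmds’ prints the two ‘metadb -d’ commands for c0t0d0s3 and c0t0d0s4, ‘metadb -i | replica_add_cmds c0t1d0 c0t0d0’ prints ‘metadb -a’ commands with two replicas per slice, and ‘metastat -p | reenable_cmds c0t0d0’ prints the ‘metareplace -e’ commands for d0 and d1.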

At that point, you should have everything up and running. You can use ‘metastat’ to watch the disks sync up.
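Rather than re-reading metastat by hand while the resync runs, the check can be scripted. The sketch below is a hypothetical function, mirrors_synced (not a DiskSuite command), that reads ‘metastat’ output on stdin, prints every submirror whose State: line is anything other than Okay, and returns non-zero while any remain. It assumes each submirror section starts at the left margin with a line like ‘d10: Submirror of d0’, as in the output above, and a POSIX awk (nawk on Solaris 8).

```shell
# mirrors_synced: hypothetical helper, not part of DiskSuite.
# Reads `metastat` output on stdin, prints each submirror whose state is
# not "Okay", and exits non-zero if there is any.
# Assumes a POSIX awk (use nawk on Solaris 8).
# Example: until metastat | mirrors_synced; do sleep 60; done
mirrors_synced() {
    awk '
    # Section header such as "d10: Submirror of d0" or "d0: Mirror"
    /^d[0-9]+: / {
        name = $1
        sub(/:$/, "", name)
        is_sub = ($0 ~ /Submirror of/)
        next
    }
    # Only the State: line of a submirror section is checked
    is_sub && $1 == "State:" {
        state = $2
        for (i = 3; i <= NF; i++) state = state " " $i
        if (state != "Okay") {
            print name ": " state
            bad = 1
        }
    }
    END { if (bad) exit 1 }'
}
```

Only the submirror sections are checked, so the per-submirror State: lines repeated inside each Mirror section are not reported twice.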