This article walks through the steps required to replace a defective drive in a software RAID (mdadm) setup.
Let’s outline a real scenario I encountered yesterday. I have a mirrored (RAID1) array, and I just received an email alert from mdadm monitoring indicating that a “Degraded Array event has been detected on the md device /dev/md/”.
Step 1 – Check the array status
To verify that the disk has failed, check /proc/mdstat:
root@cp11 ~ # cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sda3[0]
1918777408 blocks super 1.2 [2/1] [U_]
bitmap: 8/15 pages [32KB], 65536KB chunk
md1 : active raid1 sda2[0]
1046528 blocks super 1.2 [2/1] [U_]
md0 : active raid1 sda1[0]
33520640 blocks super 1.2 [2/1] [U_]
unused devices: &lt;none&gt;
Only “sda” appears in the active arrays; there is no “sdb” member left, which likely means that “sdb” is already dead.
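Reading /proc/mdstat by eye is fine for a couple of arrays; on a machine with many of them, a small awk filter can flag the degraded ones. This is only a sketch: it runs against a sample of the output above, and on a real system you would pipe in `cat /proc/mdstat` instead.

```shell
# Flag degraded md arrays: a status like [U_] (an underscore inside the
# bracket pair) means a member is missing. The sample input mirrors the
# mdstat output above; replace the printf with: cat /proc/mdstat
printf '%s\n' \
  'md2 : active raid1 sda3[0]' \
  '      1918777408 blocks super 1.2 [2/1] [U_]' \
  'md0 : active raid1 sda1[0]' \
  '      33520640 blocks super 1.2 [2/2] [UU]' |
awk '/^md/ { dev = $1 }
     /\[[U_]+\]/ && /_\]/ { print dev " is degraded" }'
```

With the sample above it reports only md2, since md0 shows the healthy [UU] status.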
Step 2 – Find the HDD serial numbers
My data center is requesting the serial number of the HDD that needs to be replaced. You can use the lsblk utility to determine which serial numbers correspond to which drives.
root@cp11 / # lsblk -o NAME,SERIAL
NAME SERIAL
sda K5HRNX7A
├─sda1
│ └─md0
├─sda2
│ └─md1
└─sda3
└─md2
In my situation, lsblk only reported a serial number for the functioning HDD, which further suggests that the “sdb” drive is defective. I contacted the data center to request a replacement and gave them the serial number of the working drive, so they could identify the failed disk by elimination.
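Depending on how a drive dies, it may disappear from lsblk entirely (as it did here) or still show up with an empty SERIAL column. The latter case is easy to machine-check; the sample input below is hypothetical and stands in for the real lsblk output.

```shell
# List whole disks whose SERIAL column is empty (possible failures).
# The sample stands in for: lsblk -dn -o NAME,SERIAL
# (-d: whole disks only, -n: no header line)
printf '%s\n' \
  'sda K5HRNX7A' \
  'sdb' |
awk 'NF < 2 { print $1 ": no serial reported" }'
```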
Step 3 – Arranging an appointment with the Data Center support team to exchange the defective drive
Because the device had failed completely, I didn’t attempt to remove the disk from the arrays with mdadm first. To replace the defective drive, make an appointment with the Data Center support team in advance; they will need to take the server offline for a short period of time.
Step 4 – Preparing the new drive
Both drives in the array must have identical partitioning. Copy the partition table using the appropriate utility for your partition table type: sfdisk for MBR (recent versions also handle GPT), or sgdisk for GPT.
Backing up the MBR
Before copying the MBR to the new drive, it’s essential to back it up. This precaution protects your data and gives you peace of mind: if something goes wrong during the copying process, you have a reliable way to restore the original.
Back up the MBR:
# sfdisk --dump /dev/sda > sda_parttable_mbr.bak
Restore the MBR:
# sfdisk /dev/sda < sda_parttable_mbr.bak
For GPT disks, sgdisk provides the equivalent options --backup and --load-backup.
Step 5 – Partition the new drive
In this example, the new drive is identified as /dev/sdb, just like the old one. We need to copy the partition table from another drive in the array, which, in this case, is /dev/sda. We will use the `sfdisk` command to export the partition table from /dev/sda
sfdisk command to write the table to sdb:
# sfdisk -d /dev/sda | sfdisk /dev/sdb
If the drives use GPT, randomize the disk and partition GUIDs on the new drive so they don’t duplicate the ones just copied from /dev/sda:
# sgdisk -G /dev/sdb
Step 6 – Add the new drive to the array
Integration of the new drive
After removing the defective drive and installing the new one, add each of its partitions to the corresponding RAID array:
# mdadm /dev/md0 -a /dev/sdb1
# mdadm /dev/md1 -a /dev/sdb2
# mdadm /dev/md2 -a /dev/sdb3
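With this layout the partition-to-array pairing follows a simple pattern, so the three commands can also be generated in a loop. Shown here as a dry run that only prints the commands; remove the `echo` to execute them (this assumes the mdN/sdbN+1 pairing above, which you should verify against your own /proc/mdstat).

```shell
# Print the mdadm commands for partitions 1-3; drop 'echo' to run them.
for i in 1 2 3; do
  echo mdadm /dev/md$((i-1)) -a /dev/sdb$i
done
```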
The new drive is now integrated into the array and will begin synchronization. Depending on the sizes of the partitions, this process may take some time. You can monitor the synchronization status using the command cat /proc/mdstat.
root@cp11 /home # cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sdb3[2] sda3[0]
1918777408 blocks super 1.2 [2/1] [U_]
[=>...................] recovery = 5.8% (112027840/1918777408) finish=179.6min speed=167571K/sec
bitmap: 13/15 pages [52KB], 65536KB chunk
md1 : active raid1 sdb2[2] sda2[0]
1046528 blocks super 1.2 [2/1] [U_]
resync=DELAYED
md0 : active raid1 sdb1[2] sda1[0]
33520640 blocks super 1.2 [2/2] [UU]
unused devices: &lt;none&gt;
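As a sanity check, the finish estimate in the recovery line can be reproduced from its own numbers: the remaining blocks (1 KiB units) divided by the reported speed (K/sec).

```shell
# (total - done) blocks / speed = seconds left; convert to minutes.
# Numbers taken from the md2 recovery line above.
awk 'BEGIN {
  done  = 112027840      # blocks recovered so far
  total = 1918777408     # total blocks in md2
  speed = 167571         # K/sec
  printf "%.1f min remaining\n", (total - done) / speed / 60
}'
```

This yields roughly 179.7 minutes, matching mdstat’s finish=179.6min (the kernel uses a smoothed speed, so small differences are expected).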
Step 7 – Bootloader installation
Since the disk has been replaced, regenerate GRUB2’s device map so it no longer references the old drive:
# grub-mkdevicemap -n
If you are performing this repair on a system already booted, running `grub-install` on the new drive is sufficient for GRUB2. For example:
# grub-install /dev/sdb
Done.
IMPORTANT NOTE: The commands provided are merely examples. Please adjust them as needed!