Update with new procedure docs

Update for disk replacement
Joey Hafner 2024-02-29 01:38:57 -08:00
parent 0ccf7a3a81
commit 93dea6c474


@@ -1,5 +1,5 @@
# Physical Disk Locations (DS4243) # Physical Disk Locations (DS4243)
*Updated 2024/02/27* *Updated 2024/02/28*
Each cell contains the serial number for the drive in the mapped bay. Each cell contains the serial number for the drive in the mapped bay.
| | X1 | X2 | X3 | X4 | | | X1 | X2 | X3 | X4 |
@@ -9,17 +9,40 @@ Each cell contains the serial number for the drive in the mapped bay.
| Y3 | VJGJUWNX | 2EGXD27V | VJGJAS1X | VJG2UTUX | | Y3 | VJGJUWNX | 2EGXD27V | VJGJAS1X | VJG2UTUX |
| Y4 | VJGRGD2X | 2EGL8AVV | 2EKA903X | VJGRRG9X | | Y4 | VJGRGD2X | 2EGL8AVV | 2EKA903X | VJGRRG9X |
| Y5 | VJGK56KX | 2EGNPVWV | 2EKATR2X | VKH3Y3XX | | Y5 | VJGK56KX | 2EGNPVWV | 2EKATR2X | VKH3Y3XX |
| Y6 | VLKV9N8V | R5G4W2VV | 2EKA92XX | VKGW5YGX | | Y6 | VLKV9N8V | R5G4W2VV | VLKXPS1V | VKGW5YGX |
# Identify a Failing Disk # Identify a Failing Disk
Disk SMART test errors are reported by device ID (e.g. /dev/sdw) rather than by serial number. To find the serial number for a particular device ID, run the following one-liner with `$dev` set to that device: Disk SMART test errors are reported by device ID (e.g. /dev/sdw) rather than by serial number. To find the serial number for a particular device ID, run the following one-liner with `$dev` set to that device:
`TODO` `TODO`
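Until the TODO above is filled in, here is a minimal sketch of one way to do the lookup, assuming `smartctl` is installed and prints a `Serial Number:` (ATA) or `Serial number:` (SCSI/SAS) line. The function names `serial_from_smartctl` and `serial_for_dev` are hypothetical, not part of the original procedure.

```shell
#!/bin/sh
# Extract the serial number from `smartctl -a` output supplied on stdin.
# Matches "Serial Number:" and "Serial number:" case-insensitively.
serial_from_smartctl() {
    awk -F': *' 'tolower($1) == "serial number" { print $2; exit }'
}

# Hypothetical wrapper: print the serial for a device such as /dev/sdw.
serial_for_dev() {
    sudo smartctl -a "$1" | serial_from_smartctl
}
```

With these defined, `serial_for_dev /dev/sdw` should print only the serial, which can then be looked up in the bay table.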
# Get Serial Number from part-uuid
`ls -l /dev/disk/by-partuuid`
This returns one line per partition, showing the `/dev/sd*` Linux block device each partition UUID maps to.
From there, run `smartctl -a <block device> | grep Serial` where `<block device>` is like `/dev/sdw`.
Or, as a one-liner with `$DISK_UUID` set to the UUID to find:
`ls -l /dev/disk/by-partuuid | grep $DISK_UUID | cut -d' ' -f 11 | xargs basename | sed 's/^/\/dev\//' | xargs sudo smartctl -a | grep Serial | tr -s ' ' | cut -d' ' -f 3`
It might be possible to pull the part UUID from the `zpool status` command directly. An exercise for the reader.
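The `ls | grep | cut` chain depends on the column layout of `ls -l`. A possibly sturdier sketch resolves the symlink directly with `readlink -f`. The function name `dev_for_partuuid` and the overridable `BY_PARTUUID_DIR` variable are assumptions made for illustration.

```shell
#!/bin/sh
# Directory of partition-UUID symlinks; overridable for testing (assumption).
BY_PARTUUID_DIR=${BY_PARTUUID_DIR:-/dev/disk/by-partuuid}

# Resolve a partition UUID to the block device its symlink points at,
# e.g. /dev/sdw1. Returns non-zero if no such symlink exists.
dev_for_partuuid() {
    link="$BY_PARTUUID_DIR/$1"
    [ -L "$link" ] || { echo "no such part UUID: $1" >&2; return 1; }
    readlink -f "$link"
}
```

The result can be passed to the same `smartctl` step as before, for example `sudo smartctl -a "$(dev_for_partuuid "$DISK_UUID")" | grep Serial`.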
# Offline and wipe the failing disk
0. Match the disk name (e.g. `/dev/sdw`) to the UUID (e.g. `13846695584571018356`). Use `lsblk --fs` for this.
1. Offline the disk: `zpool offline $pool $disk_id`
2. Wipe the disk: `wipefs -a $disklabel` (where `$disklabel` is like `/dev/sdw`). Without `-a`, `wipefs` only lists the signatures it finds and erases nothing.
3. Run `lsblk --fs` again to verify the wipe worked. If not, you'll need to run a full dd wipe with `dd if=/dev/zero of=$disklabel bs=1M`. This will take a long time as it writes zeroes across the entire drive.
4. Physically remove the disk.
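Steps 1 to 3 above can be sketched as a small helper that defaults to a dry run, so the commands can be reviewed before anything destructive happens. The names `run` and `offline_and_wipe` and the `DRY_RUN` variable are hypothetical. The sketch assumes `wipefs -a` is intended, since plain `wipefs` only lists signatures.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: offline_and_wipe <pool> <disk id as shown by zpool status> <block device>
offline_and_wipe() {
    pool=$1
    disk_id=$2
    disklabel=$3
    run zpool offline "$pool" "$disk_id"   # step 1: take the disk offline
    run wipefs -a "$disklabel"             # step 2: erase all signatures
    run lsblk --fs "$disklabel"            # step 3: verify nothing remains
}
```

For example, `offline_and_wipe tank 13846695584571018356 /dev/sdw` prints the three commands (the pool name `tank` is a placeholder). Setting `DRY_RUN=0` runs them for real.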
# Replace Disk in Pool # Replace Disk in Pool
Once the failed disk has been identified and physically replaced, you should know the old drive's UUID (via `zpool status`) and the new drive's device name (via `lsblk` and deduction) Once the failed disk has been identified and physically replaced, you should know the old drive's UUID (via `zpool status`) and the new drive's device name (via `lsblk` and deduction)
# Update Log # Update Log
**Most recent first**
### 2024/02/27 - *2024/02/28*: Replaced 2EKA92XX with VLKXPS1V at Y6/X3
- Replaced VJG2T4YX with VJG282NX at Y2/X3 - *2024/02/27*: Replaced VJG2T4YX with VJG282NX at Y2/X3