Linux RAID and LVM Setup

This is a short introduction to working with Linux RAID and LVM.

Please note: If the RAID array is to include SSDs (Solid State Disks), there is also some general information on using SSDs under Linux.

Linux RAID provides redundancy of disks which increases the fault tolerance of storage systems and avoids data loss in case a disk drive fails.

LVM is a concept where several physical disks or even complete RAID arrays can be combined to provide one or more disk volumes. The advantage of LVM is that at any time, even during normal operation, LVM volumes can be changed in size, or disks can be added, removed, or replaced.

From the point of view of an LVM, a physical disk can be either a single disk drive, or a RAID array of disks. Using a RAID array of disks significantly increases the fault tolerance of the LVM volume.

  • Linux can be booted from a RAID1 array, but not from an LVM volume, so the boot partition should be located on a RAID1 array, not on an LVM volume.
  • LVM volumes can easily be increased in size by adding new disks, or replacing existing disks by newer, larger ones, so they can well be used for large data storage partitions.

Linux installation on a RAID1 and/or LVM can be cumbersome, depending on the Linux distribution. It may therefore make sense to first prepare at least a partition on the RAID1 array for booting and system installation, and then select these partitions during installation instead of letting the installer create them.

For example, at the time of this writing, openSUSE's YaST tool is unable to create a RAID1 array with a missing drive.

The mdadm tool is used to manage Linux RAID arrays. The index number of a RAID device, as in md0, has to be unique. If md0 already exists, a different, unused number has to be chosen for a new array.
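
To check which index numbers are already in use, the existing RAID arrays can be listed, e.g.:

cat /proc/mdstat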


A new RAID1 array consisting of two partitions, e.g. /dev/sdx1 and /dev/sdy1, can be created like this:

mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 /dev/sdx1 /dev/sdy1


Assume there is an existing non-RAID drive /dev/sdx, and a single new drive /dev/sdy is available which should become part of the new RAID1 array:

  • Prepare a new partition /dev/sdy1 on the new disk /dev/sdy
  • Create a new RAID1 array using only the new partition /dev/sdy1, and declare the other RAID component as missing:
mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 missing /dev/sdy1
  • The second component, e.g. a partition /dev/sdx1 created on the old drive after its data has been copied to the array, can be added later:
mdadm --manage /dev/md0 --add /dev/sdx1
  • Create a new partition on the RAID device /dev/md0
  • Format and mount the new partition /dev/md0p1 as usual
  • Add the mounting information to /etc/fstab so the partition can be mounted automatically (see the example after this list)
  • Done
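
A minimal example of such an /etc/fstab entry, assuming /dev/md0p1 is formatted with ext4 and mounted at /data (file system and mount point are just assumptions here):

/dev/md0p1  /data  ext4  defaults  0  2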

Assume there is an existing RAID1 array /dev/md0 consisting of /dev/sda1 and /dev/sdb1, where the partitions (and thus the RAID array) are to be grown:

  • Fail and remove /dev/sda1 from the array:
mdadm --fail /dev/md0 /dev/sda1
mdadm --remove /dev/md0 /dev/sda1
  • Resize or re-create partition /dev/sda1, then add the grown /dev/sda1 to the existing RAID:
mdadm --add /dev/md0 /dev/sda1
  • Wait until the resync is complete
  • Then update the partition on /dev/sdb (see the example commands after this list):
    • Remove /dev/sdb1 from /dev/md0
    • Copy the partition table from /dev/sda to /dev/sdb
    • Add the new partition /dev/sdb1 to /dev/md0
  • Wait until the resync is complete
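
A possible command sequence for these steps, assuming a gpt partition table (for dos-type tables sfdisk can be used instead, as described further below):

mdadm --fail /dev/md0 /dev/sdb1
mdadm --remove /dev/md0 /dev/sdb1
sgdisk -R /dev/sdb /dev/sda     # copy the partition table from sda to sdb
sgdisk -G /dev/sdb              # give the copied partition table new random GUIDs
mdadm --add /dev/md0 /dev/sdb1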

If the array has a write-intent bitmap, it is strongly recommended that you remove the bitmap before increasing the size of the array. Failure to observe this precaution can lead to the destruction of the array if the existing bitmap is insufficiently large, especially if the increased array size necessitates a change to the bitmap's chunksize.

mdadm --grow /dev/mdX --bitmap none
mdadm --grow /dev/mdX --size max
mdadm --wait /dev/mdX
mdadm --grow /dev/mdX --bitmap internal

FIXME

If there is a partition on the RAID array /dev/md0 then this partition also needs to be grown, using a tool like parted.
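
For example, assuming the partition in question is partition number 1 on /dev/md0, it could be grown to the end of the device like this:

parted /dev/md0 resizepart 1 100%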

Finally the file system needs to be extended. First make sure the file system is consistent, then extend it. For ext file systems (here the file system is assumed to reside directly on /dev/md0; otherwise use the partition device, e.g. /dev/md0p1):

fsck /dev/md0
resize2fs /dev/md0

If the RAID array is to become part of an LVM volume group then an LVM physical volume has to be created from the RAID array.

Some actions on a RAID array can only be taken after the RAID array has been stopped:

mdadm --stop /dev/md1

If the RAID array can't be stopped then a partition on the device may have to be unmounted first, or, if the RAID array is part of an LVM setup, the associated logical volume has to be deactivated first.
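
A possible sequence, assuming the array /dev/md1 is used by a logical volume lv-data in the volume group vg-data (names chosen for illustration only):

lvchange -an /dev/vg-data/lv-data    # deactivate the logical volume using the array
mdadm --stop /dev/md1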

In some cases a RAID array needs to be renamed, for example if the hostname is part of the array name and has been changed.

In the example below the array is assembled as /dev/md125, and its old internal name is mediaplayer:2:

~ # mdadm --detail /dev/md125
/dev/md125:
        Version : 1.0
  Creation Time : Sun Oct 28 15:08:56 2012
     Raid Level : raid1
     Array Size : 5244916 (5.00 GiB 5.37 GB)
  Used Dev Size : 5244916 (5.00 GiB 5.37 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Thu Nov  3 23:50:33 2016
          State : clean 
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : mediaplayer:2
           UUID : d35131f9:adf82215:10b667d6:9fab3ed3
         Events : 187

    Number   Major   Minor   RaidDevice State
       0       8       98        0      active sync   /dev/sdg2
       1       8       50        1      active sync   /dev/sdd2

We want to change the internal name to pc-martin:5 and assemble it as /dev/md/5 AKA /dev/md5. To do this, the array first has to be stopped, and then re-assembled with a new name:

~ # mdadm --stop /dev/md125
mdadm: stopped /dev/md125
~ # mdadm --assemble /dev/md/5 --name=pc-martin:5 --update=name /dev/sdg2 /dev/sdd2
mdadm: /dev/md/5 has been started with 2 drives.

It is important that both the parameters --name=pc-martin:5 and --update=name are given in the commands above.

~ # mdadm --detail /dev/md5
/dev/md5:
        Version : 1.0
  Creation Time : Sun Oct 28 15:08:56 2012
     Raid Level : raid1
     Array Size : 5244916 (5.00 GiB 5.37 GB)
  Used Dev Size : 5244916 (5.00 GiB 5.37 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Thu Nov  3 23:50:33 2016
          State : clean 
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : pc-martin:5  (local to host pc-martin)
           UUID : d35131f9:adf82215:10b667d6:9fab3ed3
         Events : 187

    Number   Major   Minor   RaidDevice State
       0       8       98        0      active sync   /dev/sdg2
       1       8       50        1      active sync   /dev/sdd2

Finally an appropriate line has to be added to or replaced in /etc/mdadm.conf:

~ # mdadm --detail --brief /dev/md5
ARRAY /dev/md5 metadata=1.0 name=pc-martin:5 UUID=d35131f9:adf82215:10b667d6:9fab3ed3
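
If no entry for this array exists yet, the output can simply be appended to the configuration file (note that on some distributions the file is /etc/mdadm/mdadm.conf instead):

mdadm --detail --brief /dev/md5 >> /etc/mdadm.conf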

In many cases it is important to know the type of a partition table on a disk.

  • gpt is a newer partition table type which can be used for large disks, and is supported for UEFI boot
  • dos (sometimes displayed as ms-dos) is a legacy partition table type which doesn't support very large disks, and is not supported for UEFI boot

If the partition table type is gpt then the tools gdisk and sgdisk are appropriate to work with the partition table. If the partition table type is shown as dos or ms-dos then the tools fdisk or sfdisk have to be used.

The parted tool and its graphical frontend gparted support both gpt and dos partition tables. They can be used to create partitions as well as to determine the partition table type.

Determining The Partition Table Type

The parted tool can be used to display a disk's current partition table type, e.g.:

~ # parted /dev/sda print
Model: ATA OCZ-VERTEX3 (scsi)
Disk /dev/sda: 240GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End    Size    File system     Name        Flags
 1      1049kB  223GB  223GB   linux-swap(v1)  Linux RAID  raid
 2      223GB   240GB  17.2GB  linux-swap(v1)  Linux swap

In the example above, the partition table type is gpt.

Preparing A New RAID Partition On A New Disk

  • Determine the type of the existing partition table, or create a new partition table, e.g. with the parted tool
  • Create a partition on the new disk drive
  • Don't format the partition, but set the partition type to Linux RAID (type 0xFD for a dos partition table, or gdisk type code FD00 for gpt)

FIXME Add some example commands
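
A possible set of example commands, using parted and assuming the new disk is /dev/sdy and should get a gpt partition table:

parted -s /dev/sdy mklabel gpt                 # create a new gpt partition table (destroys existing data!)
parted -s /dev/sdy mkpart primary 1MiB 100%    # create one partition spanning the whole disk
parted -s /dev/sdy set 1 raid on               # mark partition 1 as a Linux RAID member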

Please note that the sequence of device parameters differs between the two commands below. In both cases /dev/sda is the existing disk with a valid partition table, and /dev/sdb is the new disk to which the partition table is to be copied.

Copy an msdos-type partition table from /dev/sda to /dev/sdb:

sfdisk -d /dev/sda | sfdisk /dev/sdb

Copy a gpt-type partition table from /dev/sda to /dev/sdb:

sgdisk -R /dev/sdb /dev/sda
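
Since gpt disk and partition GUIDs are supposed to be unique, it may be advisable to randomize them on the copy afterwards:

sgdisk -G /dev/sdb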

LVM distinguishes between different layers:

  • A Physical Volume (pv) can be a single disk drive, or a whole RAID array
  • Physical Volumes are combined to implement a Volume Group (vg)
  • A Volume Group can contain one or more Logical Volumes (lv)
  • A Logical Volume can be used like a disk partition; it can be formatted and mounted

For each layer there are different tools available:

  • pvdisplay, pvcreate etc. can be used to manage a physical LVM volume (pv)
  • vgdisplay, vgcreate etc. can be used to manage an LVM volume group (vg) consisting of one or more physical volume(s)
  • lvdisplay, lvcreate etc. can be used to manage a logical LVM volume (lv) allocated in a volume group

Each physical disk or RAID array to be used with LVM has to be registered as a Physical Volume (pv) first.

Assuming /dev/md1 is an existing RAID array used as a physical volume which is to be replaced by a newly installed RAID array /dev/md2:

pvcreate /dev/sda   # Use a whole physical disk drive
pvcreate /dev/md2   # Use a whole RAID array

pvdisplay can be used to display existing physical volumes. In the example below /dev/md0 is a pure RAID1 array from which the system boots, so it is not listed. The output shows an older physical volume /dev/md1 which already belongs to the volume group vg-data, and a newly created /dev/md2 which doesn't belong to a volume group yet:

pvdisplay
  --- Physical volume ---
  PV Name               /dev/md1
  VG Name               vg-data
  PV Size               465.76 GiB / not usable 1.87 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              119234
  Free PE               0
  Allocated PE          119234
  PV UUID               NEfhlq-t3cH-ThOM-YGGV-uHWp-GzfL-23Agaj

/dev/md2 is a new physical volume of "953.74 GiB" size:

  --- NEW Physical volume ---
  PV Name               /dev/md2
  VG Name               
  PV Size               953.74 GiB
  Allocatable           NO
  PE Size               0   
  Total PE              0
  Free PE               0
  Allocated PE          0
  PV UUID               kgj8w4-bU0c-5RAL-ETkY-Rnfk-arbh-TyNd8Q

A new volume group and a logical volume spanning all free space in the group can be created like this:

vgcreate data /dev/md0
lvcreate -l 100%FREE -n data data
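
The resulting logical volume can then be formatted and mounted like any other block device, e.g. (assuming the device path /dev/data/data resulting from the names above, and an ext4 file system):

mkfs.ext4 /dev/data/data
mount /dev/data/data /mnt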

Add the new physical volume /dev/md2 to the existing volume group vg-data:

vgextend vg-data /dev/md2

Move all data off the old physical volume /dev/md1, which is to be removed, to other free space in the volume group, e.g. on /dev/md2. This may take quite some time to complete, depending on the disk sizes:

pvmove /dev/md1
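
If the data should be moved to one specific physical volume, the target can be given explicitly:

pvmove /dev/md1 /dev/md2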

Remove an unused physical volume /dev/md1 from the volume group vg-data:

vgreduce vg-data /dev/md1

Remove an old physical volume (which could be a whole RAID1 array) /dev/md1 from LVM. This wipes the LVM label on the device so that LVM will no longer recognize it as a physical volume:

pvremove /dev/md1

If the size of a logical volume is to be increased to the maximum size of the volume group, e.g. after the volume group has been enlarged by a new physical volume, then the following command can be used:

lvresize -r -l 100%VG /dev/vg-data/lv-data

The parameter -r ensures that the size of the file system on the logical volume is also adjusted accordingly. If the logical volume is to be shrunk, however, then the file system needs to be shrunk first, before the logical volume is shrunk using the lvresize command, otherwise data may be lost. See man lvresize for details.
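
Instead of using all free space in the volume group, the logical volume can also be grown by a specific amount, e.g. by 100 GiB (the value here is just an example):

lvresize -r -L +100G /dev/vg-data/lv-data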

FIXME

lvchange -an /dev/….    # deactivate a logical volume
lvchange -ay /dev/….    # activate a logical volume
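
For example, for a logical volume lv-data in the volume group vg-data (hypothetical names matching the examples above):

lvchange -an /dev/vg-data/lv-data    # deactivate
lvchange -ay /dev/vg-data/lv-data    # activate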

The command

cat /proc/mdstat

can be used to monitor the state of all RAID arrays in a system. For example:

~ # cat /proc/mdstat 
Personalities : [raid1] 
md2 : active raid1 sdd1[0] sdb1[2]
      1000072192 blocks super 1.2 [2/2] [UU]
      bitmap: 0/8 pages [0KB], 65536KB chunk

md0 : active raid1 sdc1[0] sda1[2]
      217508864 blocks super 1.2 [2/2] [UU]
      bitmap: 2/2 pages [8KB], 65536KB chunk

unused devices: <none>

In the example output above md0 is a RAID1 array from which the system boots, md2 is a RAID array which is used as a physical volume in a volume group providing a logical volume used as data partition. The original RAID array for LVM was md1, but this has been replaced by a newer, larger array named md2, as described above.

The most important thing here is that both arrays are labelled [UU], which indicates that the array is healthy. If the status code is [U_] or [_U], an array drive is faulty or missing.

After a disk has been removed from a RAID array or from an LVM volume, RAID or LVM signatures may still be present on the disk, so if the disk is re-used the old metadata may appear again.

To fix this the metadata can be removed from the disk before it is re-used.

The safest way is to boot a live system like partedmagic from a USB stick, with only the old disk connected. The mdadm program refuses to remove the RAID metadata from the partition if the RAID array is still running, and the RAID array can only be stopped if it isn't in use, e.g. by an LVM volume group. So if the live system has recognized an old logical volume vgdata based on a RAID device /dev/md1 which consisted of the partition /dev/sda1, then the following actions need to be taken:
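
If the live system has already activated the old volume group and the RAID array, they have to be deactivated and stopped first (assuming vgdata is the volume group name and /dev/md1 the array, as in the description above):

vgchange -an vgdata
mdadm --stop /dev/md1

After that, the RAID superblock and any remaining signatures can be removed: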

mdadm --zero-superblock /dev/sda1
wipefs --all /dev/sda

Martin Burnicki 2016-04-08 12:20
