author     David Teigland <teigland@redhat.com>    2016-09-06 14:50:08 -0500
committer  David Teigland <teigland@redhat.com>    2016-09-06 14:50:08 -0500
commit     a1fb7b51b799f8169a4dc03c7e414c9c88934f2d (patch)
tree       55589b57ffd71385f93681476d4410c67189c9c3
parent     c8a14a29cdcf7e800af16f9b6a8fa9ed49250d30 (diff)
download   lvm2-a1fb7b51b799f8169a4dc03c7e414c9c88934f2d.tar.gz
man: add lvmraid(7)
-rw-r--r--   man/lvmraid.7.in   1314
1 file changed, 1314 insertions, 0 deletions
diff --git a/man/lvmraid.7.in b/man/lvmraid.7.in new file mode 100644 index 000000000..9725cbf37 --- /dev/null +++ b/man/lvmraid.7.in @@ -0,0 +1,1314 @@ +.TH "LVMRAID" "7" "LVM TOOLS #VERSION#" "Red Hat, Inc" "\"" + +.SH NAME +lvmraid \(em LVM RAID + +.SH DESCRIPTION + +LVM RAID is a way to create logical volumes (LVs) that use multiple physical +devices to improve performance or tolerate device failure. How blocks of +data in an LV are placed onto physical devices is determined by the RAID +level. RAID levels are commonly referred to by number, e.g. raid1, raid5. +Selecting a RAID level involves tradeoffs among physical device +requirements, fault tolerance, and performance. A description of the RAID +levels can be found at +.br +www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf + +LVM RAID uses both Device Mapper (DM) and Multiple Device (MD) drivers +from the Linux kernel. DM is used to create and manage visible LVM +devices, and MD is used to place data on physical devices. + +.SH Create a RAID LV + +To create a RAID LV, use lvcreate and specify an LV type. +The LV type corresponds to a RAID level. +The basic RAID levels that can be used are: +.B raid0, raid1, raid4, raid5, raid6, raid10. + +.B lvcreate \-\-type +.I RaidLevel +[\fIOPTIONS\fP] +.B \-\-name +.I Name +.B \-\-size +.I Size +.I VG +[\fIPVs\fP] + +To display the LV type of an existing LV, run: + +.B lvs -o name,segtype +\fIVG\fP/\fILV\fP + +(The LV type is also referred to as "segment type" or "segtype".) + +LVs can be created with the following types: + +.SS raid0 + +\& + +Also called striping, raid0 spreads LV data across multiple devices in +units of stripe size. This is used to increase performance. LV data will +be lost if any of the devices fail. + +.B lvcreate \-\-type raid0 +[\fB\-\-stripes\fP \fINumber\fP \fB\-\-stripesize\fP \fISize\fP] +\fIVG\fP +[\fIPVs\fP] + +.HP +.B \-\-stripes +specifies the number of devices to spread the LV across. + +.HP +.B \-\-stripesize +specifies the size of each stripe in kilobytes. This is the amount of +data that is written to one device before moving to the next. +.P + +\fIPVs\fP specifies the devices to use. If not specified, lvm will choose +\fINumber\fP devices, one for each stripe. + +.SS raid1 + +\& + +Also called mirroring, raid1 uses multiple devices to duplicate LV data. +The LV data remains available if all but one of the devices fail. +The minimum number of devices required is 2. + +.B lvcreate \-\-type raid1 +[\fB\-\-mirrors\fP \fINumber\fP] +\fIVG\fP +[\fIPVs\fP] + +.HP +.B \-\-mirrors +specifies the number of mirror images in addition to the original LV +image, e.g. \-\-mirrors 1 means there are two images of the data, the +original and one mirror image. +.P + +\fIPVs\fP specifies the devices to use. If not specified, lvm will choose +\fINumber\fP devices, one for each image. + +.SS raid4 + +\& + +raid4 is a form of striping that uses an extra device dedicated to storing +parity blocks. The LV data remains available if one device fails. The +parity is used to recalculate data that is lost from a single device. The +minimum number of devices required is 3. + +.B lvcreate \-\-type raid4 +[\fB\-\-stripes\fP \fINumber\fP \fB\-\-stripesize\fP \fISize\fP] +\fIVG\fP +[\fIPVs\fP] + +.HP +.B \-\-stripes +specifies the number of devices to use for LV data. This does not include +the extra device lvm adds for storing parity blocks. \fINumber\fP stripes +requires \fINumber\fP+1 devices. \fINumber\fP must be 2 or more. 
+ +.HP +.B \-\-stripesize +specifies the size of each stripe in kilobytes. This is the amount of +data that is written to one device before moving to the next. +.P + +\fIPVs\fP specifies the devices to use. If not specified, lvm will choose +\fINumber\fP+1 separate devices. + +raid4 is called non-rotating parity because the parity blocks are always +stored on the same device. + +.SS raid5 + +\& + +raid5 is a form of striping that uses an extra device for storing parity +blocks. LV data and parity blocks are stored on each device. The LV data +remains available if one device fails. The parity is used to recalculate +data that is lost from a single device. The minimum number of devices +required is 3. + +.B lvcreate \-\-type raid5 +[\fB\-\-stripes\fP \fINumber\fP \fB\-\-stripesize\fP \fISize\fP] +\fIVG\fP +[\fIPVs\fP] + +.HP +.B \-\-stripes +specifies the number of devices to use for LV data. This does not include +the extra device lvm adds for storing parity blocks. \fINumber\fP stripes +requires \fINumber\fP+1 devices. \fINumber\fP must be 2 or more. + +.HP +.B \-\-stripesize +specifies the size of each stripe in kilobytes. This is the amount of +data that is written to one device before moving to the next. +.P + +\fIPVs\fP specifies the devices to use. If not specified, lvm will choose +\fINumber\fP+1 separate devices. + +raid5 is called rotating parity because the parity blocks are placed on +different devices in a round-robin sequence. There are variations of +raid5 with different algorithms for placing the parity blocks. The +default variant is raid5_ls (raid5 left symmetric, which is a rotating +parity 0 with data restart.) See \fBRAID5 variants\fP below. + +.SS raid6 + +\& + +raid6 is a form of striping like raid5, but uses two extra devices for +parity blocks. LV data and parity blocks are stored on each device. The +LV data remains available if up to two devices fail. The parity is used +to recalculate data that is lost from one or two devices. The minimum +number of devices required is 5. + +.B lvcreate \-\-type raid6 +[\fB\-\-stripes\fP \fINumber\fP \fB\-\-stripesize\fP \fISize\fP] +\fIVG\fP +[\fIPVs\fP] + +.HP +.B \-\-stripes +specifies the number of devices to use for LV data. This does not include +the extra two devices lvm adds for storing parity blocks. \fINumber\fP +stripes requires \fINumber\fP+2 devices. \fINumber\fP must be 3 or more. + +.HP +.B \-\-stripesize +specifies the size of each stripe in kilobytes. This is the amount of +data that is written to one device before moving to the next. +.P + +\fIPVs\fP specifies the devices to use. If not specified, lvm will choose +\fINumber\fP+2 separate devices. + +Like raid5, there are variations of raid6 with different algorithms for +placing the parity blocks. The default variant is raid6_zr (raid6 zero +restart, aka left symmetric, which is a rotating parity 0 with data +restart.) See \fBRAID6 variants\fP below. + +.SS raid10 + +\& + +raid10 is a combination of raid1 and raid0, striping data across mirrored +devices. LV data remains available if one or more devices remains in each +mirror set. The minimum number of devices required is 4. + +.B lvcreate \-\-type raid10 +.RS +[\fB\-\-mirrors\fP \fINumberMirrors\fP] +.br +[\fB\-\-stripes\fP \fINumberStripes\fP \fB\-\-stripesize\fP \fISize\fP] +.br +\fIVG\fP +[\fIPVs\fP] +.RE + +.HP +.B \-\-mirrors +specifies the number of mirror images within each stripe. e.g. +\-\-mirrors 1 means there are two images of the data, the original and one +mirror image. 
+
+.HP
+.B \-\-stripes
+specifies the total number of devices to use in all raid1 images (not the
+number of raid1 devices to spread the LV across, even though that is the
+effective result). The number of devices in each raid1 mirror will be
+NumberStripes/(NumberMirrors+1), e.g. mirrors 1 and stripes 4 will stripe
+data across two raid1 mirrors, where each mirror uses two devices.
+
+.HP
+.B \-\-stripesize
+specifies the size of each stripe in kilobytes. This is the amount of
+data that is written to one device before moving to the next.
+.P
+
+\fIPVs\fP specifies the devices to use. If not specified, lvm will choose
+the necessary devices. Devices are used to create mirrors in the
+order listed, e.g. for mirrors 1, stripes 2, listing PV1 PV2 PV3 PV4
+results in mirrors PV1/PV2 and PV3/PV4.
+
+RAID10 is not mirroring on top of stripes; that would be RAID01, which is
+less tolerant of device failures.
+
+
+.SH Synchronization
+
+Synchronization makes all the devices in a RAID LV consistent with each
+other.
+
+In a RAID1 LV, all mirror images should have the same data. When a new
+mirror image is added, or a mirror image is missing data, then images need
+to be synchronized. Data blocks are copied from an existing image to a
+new or outdated image to make them match.
+
+In a RAID 4/5/6 LV, parity blocks and data blocks should match based on
+the parity calculation. When the devices in a RAID LV change, the data
+and parity blocks can become inconsistent and need to be synchronized.
+Correct blocks are read, parity is calculated, and recalculated blocks are
+written.
+
+The RAID implementation keeps track of which parts of a RAID LV are
+synchronized. This uses a bitmap saved in the RAID metadata. The bitmap
+can exclude large parts of the LV from synchronization to reduce the
+amount of work. Without this, the entire LV would need to be synchronized
+every time it was activated. When a RAID LV is first created and
+activated, the first synchronization is called initialization.
+
+Automatic synchronization happens when a RAID LV is activated, but it is
+usually partial because the bitmaps reduce the areas that are checked.
+A full sync may become necessary when devices in the RAID LV are changed.
+
+The synchronization status of a RAID LV is reported by the
+following command, where "image synced" means sync is complete:
+
+.B lvs -a -o name,sync_percent
+
+
+.SS Scrubbing
+
+Scrubbing is a full scan/synchronization of the RAID LV requested by a user.
+Scrubbing can find problems that are missed by partial synchronization.
+
+Scrubbing assumes that RAID metadata and bitmaps may be inaccurate, so it
+verifies all RAID metadata, LV data, and parity blocks. Scrubbing can
+find inconsistencies caused by hardware errors or degradation. These
+kinds of problems may be undetected by automatic synchronization, which
+excludes areas outside of the RAID write-intent bitmap.
+
+The command to scrub a RAID LV can operate in two different modes:
+
+.B lvchange \-\-syncaction
+.BR check | repair
+.IR VG / LV
+
+.HP
+.B check
+Check mode is read\-only and only detects inconsistent areas in the RAID
+LV; it does not correct them.
+
+.HP
+.B repair
+Repair mode checks and writes corrected blocks to synchronize any
+inconsistent areas.
+
+.P
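+
+For example, a read\-only scrub of a RAID LV can be started, monitored,
+and its results inspected with the following commands (vg/lv0 is an
+illustrative LV name):
+
+.nf
+# lvchange --syncaction check vg/lv0
+
+# lvs -a -o name,raid_sync_action,sync_percent vg/lv0
+
+# lvs -o name,raid_mismatch_count vg/lv0
+.fi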
+
+Scrubbing can consume a lot of bandwidth and slow down application I/O on
+the RAID LV. To control the I/O rate used for scrubbing, use:
+
+.HP
+.B \-\-maxrecoveryrate
+.BR \fIRate [ b | B | s | S | k | K | m | M | g | G ]
+.br
+Sets the maximum recovery rate for a RAID LV. \fIRate\fP is specified as
+an amount per second for each device in the array. If no suffix is given,
+then KiB/sec/device is assumed. Setting the recovery rate to \fB0\fP
+means it will be unbounded.
+
+.HP
+.BR \-\-minrecoveryrate
+.BR \fIRate [ b | B | s | S | k | K | m | M | g | G ]
+.br
+Sets the minimum recovery rate for a RAID LV. \fIRate\fP is specified as
+an amount per second for each device in the array. If no suffix is given,
+then KiB/sec/device is assumed. Setting the recovery rate to \fB0\fP
+means it will be unbounded.
+
+.P
+
+To display the current scrubbing in progress on an LV, including
+the syncaction mode and percent complete, run:
+
+.B lvs -a -o name,raid_sync_action,sync_percent
+
+After scrubbing is complete, to display the number of inconsistent blocks
+found, run:
+
+.B lvs -o name,raid_mismatch_count
+
+Also, if mismatches were found, the lvs attr field will display the letter
+"m" (mismatch) in the 9th position, e.g.
+
+.nf
+# lvs -o name,vgname,segtype,attr vg/lvol0
+  LV    VG  Type  Attr
+  lvol0 vg  raid1 Rwi-a-r-m-
+.fi
+
+
+.SS Scrubbing Limitations
+
+The \fBcheck\fP mode can only report the number of inconsistent blocks; it
+cannot report which blocks are inconsistent. This makes it impossible to
+know which device has errors, or if the errors affect file system data,
+metadata or nothing at all.
+
+The \fBrepair\fP mode can make the RAID LV data consistent, but it does
+not know which data is correct. The result may be consistent but
+incorrect data. When two different blocks of data must be made
+consistent, it chooses the block from the device that would be used during
+RAID initialization. However, if the PV holding corrupt data is known,
+lvchange \-\-rebuild can be used to reconstruct the data on the bad
+device.
+
+Future developments might include:
+
+Allowing a user to choose the correct version of data during repair.
+
+Using a majority of devices to determine the correct version of data to
+use in a three-way RAID1 or RAID6 LV.
+
+Using a checksumming device to pin-point when and where an error occurs,
+allowing it to be rewritten.
+
+
+.SH SubLVs
+
+An LV is often a combination of other hidden LVs called SubLVs. The
+SubLVs either use physical devices, or are built from other SubLVs
+themselves. SubLVs hold LV data blocks, RAID parity blocks, and RAID
+metadata. SubLVs are generally hidden, so the lvs \-a option is required
+to display them:
+
+.B lvs -a -o name,segtype,devices
+
+SubLV names begin with the visible LV name, and have an automatic suffix
+indicating their role:
+
+.IP \(bu 3
+SubLVs holding LV data or parity blocks have the suffix _rimage_#.
+These SubLVs are sometimes referred to as DataLVs.
+
+.IP \(bu 3
+SubLVs holding RAID metadata have the suffix _rmeta_#. RAID metadata
+includes superblock information, RAID type, bitmap, and device health
+information. These SubLVs are sometimes referred to as MetaLVs.
+
+.P
+
+SubLVs are an internal implementation detail of LVM. The way they are
+used, constructed and named may change.
+
+The following examples show the SubLV arrangement for each of the basic
+RAID LV types, using the fewest number of devices allowed for each.
+
+.SS Examples
+
+.B raid0
+.br
+Each rimage SubLV holds a portion of LV data. No parity is used.
+No RAID metadata is used.
+
+.nf
+lvcreate --type raid0 --stripes 2 --name lvr0 ...
+
+lvs -a -o name,segtype,devices
+  lvr0            raid0  lvr0_rimage_0(0),lvr0_rimage_1(0)
+  [lvr0_rimage_0] linear /dev/sda(...)
+  [lvr0_rimage_1] linear /dev/sdb(...)
+.fi + +.B raid1 +.br +Each rimage SubLV holds a complete copy of LV data. No parity is used. +Each rmeta SubLV holds RAID metadata. + +.nf +lvcreate --type raid1 --mirrors 1 --name lvr1 ... + +lvs -a -o name,segtype,devices + lvr1 raid1 lvr1_rimage_0(0),lvr1_rimage_1(0) + [lvr1_rimage_0] linear /dev/sda(...) + [lvr1_rimage_1] linear /dev/sdb(...) + [lvr1_rmeta_0] linear /dev/sda(...) + [lvr1_rmeta_1] linear /dev/sdb(...) +.fi + +.B raid4 +.br +Two rimage SubLVs each hold a portion of LV data and one rimage SubLV +holds parity. Each rmeta SubLV holds RAID metadata. + +.nf +lvcreate --type raid4 --stripes 2 --name lvr4 ... + +lvs -a -o name,segtype,devices + lvr4 raid4 lvr4_rimage_0(0),\\ + lvr4_rimage_1(0),\\ + lvr4_rimage_2(0) + [lvr4_rimage_0] linear /dev/sda(...) + [lvr4_rimage_1] linear /dev/sdb(...) + [lvr4_rimage_2] linear /dev/sdc(...) + [lvr4_rmeta_0] linear /dev/sda(...) + [lvr4_rmeta_1] linear /dev/sdb(...) + [lvr4_rmeta_2] linear /dev/sdc(...) +.fi + +.B raid5 +.br +Three rimage SubLVs each hold a portion of LV data and parity. +Each rmeta SubLV holds RAID metadata. + +.nf +lvcreate --type raid5 --stripes 2 --name lvr5 ... + +lvs -a -o name,segtype,devices + lvr5 raid5 lvr5_rimage_0(0),\\ + lvr5_rimage_1(0),\\ + lvr5_rimage_2(0) + [lvr5_rimage_0] linear /dev/sda(...) + [lvr5_rimage_1] linear /dev/sdb(...) + [lvr5_rimage_2] linear /dev/sdc(...) + [lvr5_rmeta_0] linear /dev/sda(...) + [lvr5_rmeta_1] linear /dev/sdb(...) + [lvr5_rmeta_2] linear /dev/sdc(...) +.fi + +.B raid6 +.br +Six rimage SubLVs each hold a portion of LV data and parity. +Each rmeta SubLV holds RAID metadata. + +.nf +lvcreate --type raid6 --stripes 3 --name lvr6 + +lvs -a -o name,segtype,devices + lvr6 raid6 lvr6_rimage_0(0),\\ + lvr6_rimage_1(0),\\ + lvr6_rimage_2(0),\\ + lvr6_rimage_3(0),\\ + lvr6_rimage_4(0),\\ + lvr6_rimage_5(0) + [lvr6_rimage_0] linear /dev/sda(...) + [lvr6_rimage_1] linear /dev/sdb(...) + [lvr6_rimage_2] linear /dev/sdc(...) + [lvr6_rimage_3] linear /dev/sdd(...) + [lvr6_rimage_4] linear /dev/sde(...) + [lvr6_rimage_5] linear /dev/sdf(...) + [lvr6_rmeta_0] linear /dev/sda(...) + [lvr6_rmeta_1] linear /dev/sdb(...) + [lvr6_rmeta_2] linear /dev/sdc(...) + [lvr6_rmeta_3] linear /dev/sdd(...) + [lvr6_rmeta_4] linear /dev/sde(...) + [lvr6_rmeta_5] linear /dev/sdf(...) + +.B raid10 +.br +Four rimage SubLVs each hold a portion of LV data. No parity is used. +Each rmeta SubLV holds RAID metadata. + +.nf +lvcreate --type raid10 --stripes 2 --mirrors 1 --name lvr10 + +lvs -a -o name,segtype,devices + lvr10 raid10 lvr10_rimage_0(0),\\ + lvr10_rimage_1(0),\\ + lvr10_rimage_2(0),\\ + lvr10_rimage_3(0) + [lvr10_rimage_0] linear /dev/sda(...) + [lvr10_rimage_1] linear /dev/sdb(...) + [lvr10_rimage_2] linear /dev/sdc(...) + [lvr10_rimage_3] linear /dev/sdd(...) + [lvr10_rmeta_0] linear /dev/sda(...) + [lvr10_rmeta_1] linear /dev/sdb(...) + [lvr10_rmeta_2] linear /dev/sdc(...) + [lvr10_rmeta_3] linear /dev/sdd(...) +.fi + + +.SH Device Failure + +Physical devices in a RAID LV can fail or be lost for multiple reasons. +A device could be disconnected, permanently failed, or temporarily +disconnected. The purpose of RAID LVs (levels 1 and higher) is to +continue operating in a degraded mode, without losing LV data, even after +a device fails. The number of devices that can fail without the loss of +LV data depends on the RAID level: + +.IP \[bu] 3 +RAID0 (striped) LVs cannot tolerate losing any devices. LV data will be +lost if any devices fail. 
+
+.IP \[bu] 3
+RAID1 LVs can tolerate losing all but one device without LV data loss.
+
+.IP \[bu] 3
+RAID4 and RAID5 LVs can tolerate losing one device without LV data loss.
+
+.IP \[bu] 3
+RAID6 LVs can tolerate losing two devices without LV data loss.
+
+.IP \[bu] 3
+RAID10 is variable, and depends on which devices are lost. It can
+tolerate losing all but one device in a single raid1 mirror without
+LV data loss.
+
+.P
+
+If a RAID LV is missing devices, or has other device-related problems, lvs
+reports this in the health_status (and attr) fields:
+
+.B lvs -o name,lv_health_status
+
+.B partial
+.br
+Devices are missing from the LV. This is also indicated by the letter "p"
+(partial) in the 9th position of the lvs attr field.
+
+.B refresh needed
+.br
+A device was temporarily missing but has returned. The LV needs to be
+refreshed to use the device again (which will usually require
+partial synchronization). This is also indicated by the letter "r" (refresh
+needed) in the 9th position of the lvs attr field. See
+\fBRefreshing an LV\fP. This could also indicate a problem with the
+device, in which case it should be replaced, see
+\fBReplacing Devices\fP.
+
+.B mismatches exist
+.br
+See
+.BR Scrubbing .
+
+Most commands will also print a warning if a device is missing, e.g.
+.br
+.nf
+WARNING: Device for PV uItL3Z-wBME-DQy0-... not found or rejected ...
+.fi
+
+This warning will go away if the device returns or is removed from the
+VG (see \fBvgreduce \-\-removemissing\fP).
+
+
+.SS Activating an LV with missing devices
+
+A RAID LV that is missing devices may be activated or not, depending on
+the "activation mode" used in lvchange:
+
+.B lvchange \-ay \-\-activationmode
+.RB { complete | degraded | partial }
+.IR VG / LV
+
+.B complete
+.br
+The LV is only activated if all devices are present.
+
+.B degraded
+.br
+The LV is activated with missing devices if the RAID level can
+tolerate the number of missing devices without LV data loss.
+
+.B partial
+.br
+The LV is always activated, even if portions of the LV data are missing
+because of the missing device(s). This should only be used to perform
+recovery or repair operations.
+
+.BR lvm.conf (5)
+.B activation/activation_mode
+.br
+controls the activation mode when not specified by the command.
+
+The default value is printed by:
+.nf
+lvmconfig --type default activation/activation_mode
+.fi
+
+.SS Replacing Devices
+
+Devices in a RAID LV can be replaced with other devices in the VG. When
+replacing devices that are no longer visible on the system, use lvconvert
+\-\-repair. When replacing devices that are still visible, use lvconvert
+\-\-replace. The repair command will attempt to restore the same number
+of data LVs that were previously in the LV. The replace option can be
+repeated to replace multiple PVs. Replacement devices can be optionally
+listed with either option.
+
+.B lvconvert \-\-repair
+.IR VG / LV
+[\fINewPVs\fP]
+
+.B lvconvert \-\-replace
+\fIOldPV\fP
+.IR VG / LV
+[\fINewPV\fP]
+
+.B lvconvert
+.B \-\-replace
+\fIOldPV1\fP
+.B \-\-replace
+\fIOldPV2\fP
+...
+.IR VG / LV
+[\fINewPVs\fP]
+
+New devices require synchronization with existing devices, see
+.BR Synchronization .
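+
+For example, assuming a hypothetical RAID LV vg/lv0, a missing device can
+be repaired from other devices in the VG, or a still\-visible /dev/sdb can
+be replaced with /dev/sdf (all device and LV names here are illustrative):
+
+.nf
+# lvconvert --repair vg/lv0
+
+# lvconvert --replace /dev/sdb vg/lv0 /dev/sdf
+.fi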
+
+.SS Refreshing an LV
+
+Refreshing a RAID LV clears any transient device failures (device was
+temporarily disconnected) and returns the LV to its fully redundant mode.
+Restoring a device will usually require at least partial synchronization
+(see \fBSynchronization\fP). Failure to clear a transient failure results
+in the RAID LV operating in degraded mode until it is reactivated. Use
+the lvchange command to refresh an LV:
+
+.B lvchange \-\-refresh
+.IR VG / LV
+
+.nf
+# lvs -o name,vgname,segtype,attr,size vg
+  LV    VG  Type  Attr       LSize
+  raid1 vg  raid1 Rwi-a-r-r- 100.00g
+
+# lvchange --refresh vg/raid1
+
+# lvs -o name,vgname,segtype,attr,size vg
+  LV    VG  Type  Attr       LSize
+  raid1 vg  raid1 Rwi-a-r--- 100.00g
+.fi
+
+.SS Automatic repair
+
+If a device in a RAID LV fails, device-mapper in the kernel notifies the
+.BR dmeventd (8)
+monitoring process (see \fBMonitoring\fP).
+dmeventd can be configured to automatically respond using:
+
+.BR lvm.conf (5)
+.B activation/raid_fault_policy
+
+Possible settings are:
+
+.B warn
+.br
+A warning is added to the system log indicating that a device has
+failed in the RAID LV. It is left to the user to repair the LV, e.g.
+replace failed devices.
+
+.B allocate
+.br
+dmeventd automatically attempts to repair the LV using spare devices
+in the VG. Note that even a transient failure is handled as a permanent
+failure; a new device is allocated and full synchronization is started.
+
+The specific command run by dmeventd to warn or repair is:
+.br
+.B lvconvert \-\-repair \-\-use\-policies
+.IR VG / LV
+
+
+.SS Corrupted Data
+
+Data on a device can be corrupted due to hardware errors, without the
+device ever being disconnected, and without any fault in the software.
+This should be rare, and can be detected (see \fBScrubbing\fP).
+
+
+.SS Rebuild specific PVs
+
+If specific PVs in a RAID LV are known to have corrupt data, the data on
+those PVs can be reconstructed with:
+
+.B lvchange \-\-rebuild PV
+.IR VG / LV
+
+The rebuild option can be repeated with different PVs to replace the data
+on multiple PVs.
+
+For example, in a raid1 LV, the master mirror image on PV1 may have
+corrupt data due to a transient disk error. In this case, \-\-rebuild PV1
+reconstructs data on the master image rather than rebuilding all other
+images from the master image.
+
+
+.SH Monitoring
+
+When a RAID LV is activated, the \fBdmeventd\fP(8) process is started to
+monitor the health of the LV. Various events detected in the kernel can
+cause a notification to be sent from device-mapper to the monitoring
+process, including device failures and synchronization completion (e.g.
+for initialization or scrubbing).
+
+The LVM configuration file contains options that affect how the monitoring
+process will respond to failure events (e.g. raid_fault_policy). It is
+possible to turn on and off monitoring with lvchange, but it is not
+recommended to turn this off unless you have a thorough knowledge of the
+consequences.
+
+
+.SH Configuration Options
+
+There are a number of options in the LVM configuration file that affect
+the behavior of RAID LVs. The tunable options are listed
+below. A detailed description of each can be found in the LVM
+configuration file itself.
+.br
+ mirror_segtype_default
+.br
+ raid10_segtype_default
+.br
+ raid_region_size
+.br
+ raid_fault_policy
+.br
+ activation_mode
+
+
+.SH RAID1 Tuning
+
+A RAID1 LV can be tuned so that certain devices are avoided for reading
+while all devices are still written to.
+
+.B lvchange
+.BR \-\- [ raid ] writemostly
+.BR \fIPhysicalVolume [ : { y | n | t }]
+.IR VG / LV
+
+The specified device will be marked as "write mostly", which means that
+reading from this device will be avoided, and other devices will be
+preferred for reading (unless no other devices are available).
This +minimizes the I/O to the specified device. + +If the PV name has no suffix, the write mostly attribute is set. If the +PV name has the suffix \fB:n\fP, the write mostly attribute is cleared, +and the suffix \fB:t\fP toggles the current setting. + +The write mostly option can be repeated on the command line to change +multiple devices at once. + +To report the current write mostly setting, the lvs attr field will show +the letter "w" in the 9th position when write mostly is set: + +.B lvs -a -o name,attr + +When a device is marked write mostly, the maximum number of outstanding +writes to that device can be configured. Once the maximum is reached, +further writes become synchronous. When synchronous, a write to the LV +will not complete until writes to all the mirror images are complete. + +.B lvchange +.BR \-\- [ raid ] writebehind +.IR IOCount +.IR VG / LV + +To report the current write behind setting, run: + +.B lvs -o name,raid_write_behind + +When write behind is not configured, or set to 0, all LV writes are +synchronous. + + +.SH RAID Takeover + +RAID takeover is converting a RAID LV from one RAID level to another, e.g. +raid5 to raid6. Changing the RAID level is usually done to increase or +decrease resilience to device failures. This is done using lvconvert and +specifying the new RAID level as the LV type: + +.B lvconvert --type +.I RaidLevel +\fIVG\fP/\fILV\fP +[\fIPVs\fP] + +The most common and recommended RAID takeover conversions are: + +.HP +\fBlinear\fP to \fBraid1\fP +.br +Linear is a single image of LV data, and +converting it to raid1 adds a mirror image which is a direct copy of the +original linear image. + +.HP +\fBstriped\fP/\fBraid0\fP to \fBraid4/5/6\fP +.br +Adding parity devices to a +striped volume results in raid4/5/6. + +.P + +Unnatural conversions that are not recommended include converting between +striped and non-striped types. This is because file systems often +optimize I/O patterns based on device striping values. If those values +change, it can decrease performance. + +Converting to a higher RAID level requires allocating new SubLVs to hold +RAID metadata, and new SubLVs to hold parity blocks for LV data. +Converting to a lower RAID level removes the SubLVs that are no longer +needed. + +Conversion often requires full synchronization of the RAID LV (see +\fBSynchronization\fP). Converting to RAID1 requires copying all LV data +blocks to a new image on a new device. Converting to a parity RAID level +requires reading all LV data blocks, calculating parity, and writing the +new parity blocks. Synchronization can take a long time and degrade +performance (rate controls also apply to conversion, see +\fB\-\-maxrecoveryrate\fP.) + +.P + +The following takeover conversions are currently possible: +.br +.IP \(bu 3 +between linear and raid1. +.IP \(bu 3 +between striped and raid4. + +.SS Examples + +1. Converting an LV from \fBlinear\fP to \fBraid1\fP. + +.nf +# lvs -a -o name,segtype,size vg + LV Type LSize + lv linear 300.00g + +# lvconvert --type raid1 --mirrors 1 vg/lv + +# lvs -a -o name,segtype,size vg + LV Type LSize + lv raid1 300.00g + [lv_rimage_0] linear 300.00g + [lv_rimage_1] linear 300.00g + [lv_rmeta_0] linear 3.00m + [lv_rmeta_1] linear 3.00m +.fi + +2. Converting an LV from \fBmirror\fP to \fBraid1\fP. 
+ +.nf +# lvs -a -o name,segtype,size vg + LV Type LSize + lv mirror 100.00g + [lv_mimage_0] linear 100.00g + [lv_mimage_1] linear 100.00g + [lv_mlog] linear 3.00m + +# lvconvert --type raid1 vg/lv + +# lvs -a -o name,segtype,size vg + LV Type LSize + lv raid1 100.00g + [lv_rimage_0] linear 100.00g + [lv_rimage_1] linear 100.00g + [lv_rmeta_0] linear 3.00m + [lv_rmeta_1] linear 3.00m +.fi + +3. Converting an LV from \fBstriped\fP (with 4 stripes) to \fBraid6_nc\fP. + +.nf +Start with a striped LV: + +# lvcreate --stripes 4 -L64M -n my_lv vg + +Convert the striped LV to raid6_nc: + +# lvconvert --type raid6_nc vg/my_lv + +# lvs -a -o lv_name,segtype,sync_percent,data_copies + LV Type Cpy%Sync #Cpy + my_lv raid6_n_6 100.00 3 + [my_lv_rimage_0] linear + [my_lv_rimage_1] linear + [my_lv_rimage_2] linear + [my_lv_rimage_3] linear + [my_lv_rimage_4] linear + [my_lv_rimage_5] linear + [my_lv_rmeta_0] linear + [my_lv_rmeta_1] linear + [my_lv_rmeta_2] linear + [my_lv_rmeta_3] linear + [my_lv_rmeta_4] linear + [my_lv_rmeta_5] linear +.fi + +This convert begins by allocating MetaLVs (rmeta_#) for each of the +existing stripe devices. It then creates 2 additional MetaLV/DataLV pairs +(rmeta_#/rimage_#) for dedicated raid6 parity. + +If rotating data/parity is required, such as with raid6_nr, it must be +done by reshaping (see below). + +4. Converting an LV from \fBlinear\fP to \fBraid1\fP (with 3 images). + +.nf +Start with a linear LV: + +# lvcreate -L1G -n my_lv vg + +Convert the linear LV to raid1 with three images +(original linear image plus 2 mirror images): + +# lvconvert --type raid1 --mirrors 2 vg/my_lv +.fi + + +.SH RAID Reshaping + +RAID reshaping is changing attributes of a RAID LV while keeping the same +RAID level, i.e. changes that do not involve changing the number of +devices. This includes changing RAID layout, stripe size, or number of +stripes. + +When changing the RAID layout or stripe size, no new SubLVs (MetaLVs or +DataLVs) need to be allocated, but DataLVs are extended by a small amount +(typically 1 extent). The extra space allows blocks in a stripe to be +updated safely, and not corrupted in case of a crash. If a crash occurs, +reshaping can just be restarted. + +(If blocks in a stripe were updated in place, a crash could leave them +partially updated and corrupted. Instead, an existing stripe is quiesced, +read, changed in layout, and the new stripe written to free space. Once +that is done, the new stripe is unquiesced and used.) + +.SS Examples + +1. Converting raid6_n_6 to raid6_nr with rotating data/parity. + +This conversion naturally follows a previous conversion from striped to +raid6_n_6 (shown above). It completes the transition to a more +traditional RAID6. 
+
+.nf
+# lvs -a -o lv_name,segtype,sync_percent,data_copies
+  LV               Type      Cpy%Sync #Cpy
+  my_lv            raid6_n_6 100.00   3
+  [my_lv_rimage_0] linear
+  [my_lv_rimage_1] linear
+  [my_lv_rimage_2] linear
+  [my_lv_rimage_3] linear
+  [my_lv_rimage_4] linear
+  [my_lv_rimage_5] linear
+  [my_lv_rmeta_0]  linear
+  [my_lv_rmeta_1]  linear
+  [my_lv_rmeta_2]  linear
+  [my_lv_rmeta_3]  linear
+  [my_lv_rmeta_4]  linear
+  [my_lv_rmeta_5]  linear
+
+# lvconvert --type raid6_nr vg/my_lv
+
+# lvs -a -o lv_name,segtype,sync_percent,data_copies
+  LV               Type     Cpy%Sync #Cpy
+  my_lv            raid6_nr 100.00   3
+  [my_lv_rimage_0] linear
+  [my_lv_rimage_0] linear
+  [my_lv_rimage_1] linear
+  [my_lv_rimage_1] linear
+  [my_lv_rimage_2] linear
+  [my_lv_rimage_2] linear
+  [my_lv_rimage_3] linear
+  [my_lv_rimage_3] linear
+  [my_lv_rimage_4] linear
+  [my_lv_rimage_5] linear
+  [my_lv_rmeta_0]  linear
+  [my_lv_rmeta_1]  linear
+  [my_lv_rmeta_2]  linear
+  [my_lv_rmeta_3]  linear
+  [my_lv_rmeta_4]  linear
+  [my_lv_rmeta_5]  linear
+.fi
+
+The DataLVs are larger (additional segment in each), which provides space
+for out-of-place reshaping. The result is:
+
+FIXME: did the lv name change from my_lv to r?
+.br
+FIXME: should we change device names in the example to sda,sdb,sdc?
+.br
+FIXME: include -o devices or seg_pe_ranges above also?
+
+.nf
+# lvs -a -o lv_name,segtype,seg_pe_ranges,dataoffset
+  LV           Type     PE Ranges          data
+  r            raid6_nr r_rimage_0:0-32 \\
+                        r_rimage_1:0-32 \\
+                        r_rimage_2:0-32 \\
+                        r_rimage_3:0-32
+  [r_rimage_0] linear   /dev/sda:0-31      2048
+  [r_rimage_0] linear   /dev/sda:33-33
+  [r_rimage_1] linear   /dev/sdaa:0-31     2048
+  [r_rimage_1] linear   /dev/sdaa:33-33
+  [r_rimage_2] linear   /dev/sdab:1-33     2048
+  [r_rimage_3] linear   /dev/sdac:1-33     2048
+  [r_rmeta_0]  linear   /dev/sda:32-32
+  [r_rmeta_1]  linear   /dev/sdaa:32-32
+  [r_rmeta_2]  linear   /dev/sdab:0-0
+  [r_rmeta_3]  linear   /dev/sdac:0-0
+.fi
+
+All segments with PE ranges '33-33' provide the out-of-place reshape space.
+The dataoffset column shows that the data was moved from initial offset 0 to
+2048 sectors on each component DataLV.
+
+
+.SH RAID5 Variants
+
+raid5_ls
+.br
+\[bu]
+RAID5 left symmetric
+.br
+\[bu]
+Rotating parity 0 with data restart
+
+raid5_la
+.br
+\[bu]
+RAID5 left asymmetric
+.br
+\[bu]
+Rotating parity 0 with data continuation
+
+raid5_rs
+.br
+\[bu]
+RAID5 right symmetric
+.br
+\[bu]
+Rotating parity N with data restart
+
+raid5_ra
+.br
+\[bu]
+RAID5 right asymmetric
+.br
+\[bu]
+Rotating parity N with data continuation
+
+raid5_n
+.br
+\[bu]
+RAID5 striping
+.br
+\[bu]
+Same layout as raid4 with a dedicated parity N with striped data.
+.br +\[bu] +Used for +.B RAID Takeover + +.SH RAID6 Variants + +raid6 +.br +\[bu] +RAID6 zero restart (aka left symmetric) +.br +\[bu] +Rotating parity 0 with data restart +.br +\[bu] +Same as raid6_zr + +raid6_zr +.br +\[bu] +RAID6 zero restart (aka left symmetric) +.br +\[bu] +Rotating parity 0 with data restart + +raid6_nr +.br +\[bu] +RAID6 N restart (aka right symmetric) +.br +\[bu] +Rotating parity N with data restart + +raid6_nc +.br +\[bu] +RAID6 N continue +.br +\[bu] +Rotating parity N with data continuation + +raid6_n_6 +.br +\[bu] +RAID6 N continue +.br +\[bu] +Fixed P-Syndrome N-1 and Q-Syndrome N with striped data +.br +\[bu] +Used for +.B RAID Takeover + +raid6_ls_6 +.br +\[bu] +RAID6 N continue +.br +\[bu] +Same as raid5_ls for N-1 disks with fixed Q-Syndrome N +.br +\[bu] +Used for +.B RAID Takeover + +raid6_la_6 +.br +\[bu] +RAID6 N continue +.br +\[bu] +Same as raid5_la for N-1 disks with fixed Q-Syndrome N +.br +\[bu] +Used for +.B RAID Takeover + +raid6_rs_6 +.br +\[bu] +RAID6 N continue +.br +\[bu] +Same as raid5_rs for N-1 disks with fixed Q-Syndrome N +.br +\[bu] +Used for +.B RAID Takeover + +raid6_ra_6 +.br +\[bu] +RAID6 N continue +.br +\[bu] +Same as raid5_ra for N-1 disks with fixed Q-Syndrome N +.br +\[bu] +Used for +.B RAID Takeover + + + +.SH RAID Duplication + +RAID LV conversion (takeover or reshaping) can be done out\-of\-place by +copying the LV data onto new devices while changing the RAID properties. +Copying avoids modifying the original LV but requires additional devices. +Once the LV data has been copied/converted onto the new devices, there are +multiple options: + +1. The RAID LV can be switched over to run from just the new devices, and +the original copy of the data removed. The converted LV then has the new +RAID properties, and exists on new devices. The old devices holding the +original data can be removed or reused. + +2. The new copy of the data can be dropped, leaving the original RAID LV +unchanged and using its original devices. + +3. The new copy of the data can be separated and used as a new independent +LV, leaving the original RAID LV unchanged on its original devices. + +The command to start duplication is: + +.B lvconvert \-\-type +.I RaidLevel +[\fB\-\-stripes\fP \fINumber\fP \fB\-\-stripesize\fP \fISize\fP] +.RS +.B \-\-duplicate +.IR VG / LV +[\fIPVs\fP] +.RE + +.HP +.B \-\-duplicate +.br +Specifies that the LV conversion should be done out\-of\-place, copying +LV data to new devices while converting. + +.HP +.BR \-\-type , \-\-stripes , \-\-stripesize +.br +Specifies the RAID properties to use when creating the copy. + +.P +\fIPVs\fP specifies the new devices to use. + +The steps in the duplication process: + +.IP \(bu 3 +LVM creates a new LV on new devices using the specified RAID properties +(type, stripes, etc) and optionally specified devices. + +.IP \(bu 3 +LVM changes the visible RAID LV to type raid1, making the original LV the +first raid1 image (SubLV 0), and the new LV the second raid1 image +(SubLV 1). + +.IP \(bu 3 +The RAID1 synchronization process copies data from the original LV +image (SubLV 0) to the new LV image (SubLV 1). + +.IP \(bu 3 +When synchronization is complete, the original and new LVs are +mirror images of each other and can be separated. + +.P + +The duplication process retains both the original and new LVs (both +SubLVs) until an explicit unduplicate command is run to separate them. 
The unduplicate command specifies whether the original LV should use the old
+devices (SubLV 0) or the new devices (SubLV 1).
+
+To make the RAID LV use the data on the old devices, and drop the copy on
+the new devices, specify the name of SubLV 0 (suffix _dup_0):
+
+.B lvconvert \-\-unduplicate
+.BI \-\-name
+.IB LV _dup_0
+.IR VG / LV
+
+To make the RAID LV use the data copy on the new devices, and drop the old
+devices, specify the name of SubLV 1 (suffix _dup_1):
+
+.B lvconvert \-\-unduplicate
+.BI \-\-name
+.IB LV _dup_1
+.IR VG / LV
+
+FIXME: To make the LV use the data on the original devices, but keep the
+data copy as a new LV, ...
+
+FIXME: include how splitmirrors can be used.
+
+
+.SH RAID1E
+
+TODO
+
+.SH History
+
+The 2.6.38-rc1 version of the Linux kernel introduced a device-mapper
+target to interface with the software RAID (MD) personalities. This
+provided device-mapper with RAID 4/5/6 capabilities and a larger
+development community. Later, support for RAID1, RAID10, and RAID1E (RAID
+10 variants) was added. Support for these new kernel RAID targets was
+added to LVM version 2.02.87. The capabilities of the LVM \fBraid1\fP
+type have surpassed the old \fBmirror\fP type. raid1 is now recommended
+instead of mirror. raid1 became the default for mirroring in LVM version
+2.02.100.