Commit message | Author | Date | Files | Lines
* improve reading and repairing vg metadata (dev-dct-read-11) | David Teigland | 2019-02-01 | 37 | -1630/+1938
    The fact that vg repair is implemented as a part of vg read has led to a very poor, complicated implementation of vg_read and very limited and uncontrolled repair capability. This splits read and repair apart.

    Summary
    -------
    - take all kinds of various repairs out of vg_read
    - vg_read no longer writes anything
    - vg_read now simply reads and returns vg metadata
    - vg_read ignores bad or old copies of metadata
    - vg_read proceeds with a single good copy of metadata
    - improve error checks and handling when reading
    - keep track of bad (corrupt) copies of metadata in lvmcache
    - keep track of old (seqno) copies of metadata in lvmcache
    - keep track of outdated PVs in lvmcache
    - vg_write will do basic repairs
    - new command vgck --updatemetadata will do all repairs

    Reading bad/old metadata
    ------------------------
    - "bad metadata": the mda_header or metadata text has invalid fields or can't be parsed by lvm. This is a form of corruption that would not be caused by known failure scenarios. A checksum error is typically included among the errors reported.
    - "old metadata": a valid copy of the metadata that has a smaller seqno than other copies of the metadata. This can happen if the device failed, or io failed, or lvm failed while committing new metadata to all the metadata areas. Old metadata on a PV that has been removed from the VG is the "outdated" case below.

    When a VG has some PVs with bad/old metadata, lvm can simply ignore the bad/old copies, and use a good copy. This is why there are multiple copies of the metadata -- so it's available even when some of the copies cannot be used. The bad/old copies do not have to be repaired before the VG can be used (the repair can happen later.)

    A PV with no good copies of the metadata simply falls back to being treated like a PV with no mdas; a common and harmless configuration.

    When bad/old metadata exists, lvm warns the user about it, and suggests repairing it using a new metadata repair command. Bad metadata in particular is something that users will want to investigate and repair themselves, since it should not happen and may indicate some other problem that needs to be fixed.

    PVs with bad/old metadata are not the same as missing devices. Missing devices will block various kinds of VG modification or activation, but bad/old metadata will not.

    Previously, lvm would attempt to repair bad/old metadata whenever it was read. This was unnecessary since lvm does not require every copy of the metadata to be used. It would also hide potential problems that should be investigated by the user. It was also dangerous in cases where the VG was on shared storage. The user is now allowed to investigate potential problems and decide how and when to repair them.

    Repairing bad/old metadata
    --------------------------
    When label scan sees bad metadata in an mda, that mda is removed from the lvmcache info->mdas list. This means that vg_read will skip it, and not attempt to read/process it again. If it was the only in-use mda on a PV, that PV is treated like a PV with no mdas. It also means that vg_write will skip the bad mda, and not attempt to write new metadata to it. The only way to repair bad metadata is with the metadata repair command.

    When label scan sees old metadata in an mda, that mda is kept in the lvmcache info->mdas list. This means that vg_read will read/process it again, and likely see the same mismatch with the other copies of the metadata. Like label_scan, vg_read will simply ignore the old copy of the metadata and use the latest copy. If the command is modifying the vg (e.g. lvcreate), then vg_write, which writes new metadata to every mda on info->mdas, will write the new metadata to the mda that had the old version. If successful, this will resolve the old metadata problem (without needing to run a metadata repair command.)

    Outdated PVs
    ------------
    An outdated PV is a PV that has an old copy of VG metadata that shows it is a member of the VG, but the latest copy of the VG metadata does not include this PV. This happens if the PV is disconnected, vgreduce --removemissing is run to remove the PV from the VG, then the PV is reconnected. In this case, the outdated PV needs to have its outdated metadata removed and the PV used flag needs to be cleared. This repair will be done by the subsequent repair command. It is also done if vgremove is run on the VG.

    MISSING PVs
    -----------
    When a device is missing, most commands will refuse to modify the VG. This is the simple case. More complicated is when a command is allowed to modify the VG while it is missing a device.

    When a VG is written while a device is missing for one of its PVs, the VG metadata includes the MISSING_PV flag on the PV with the missing device. When the VG is next used, it needs to be treated as if this PV with the MISSING flag is still missing, even if the device has reappeared.

    vgreduce --removemissing will remove PVs with missing devices, or PVs with the MISSING flag where the device has reappeared.

    vgextend --restoremissing will clear the MISSING flag on PVs where the device has reappeared, allowing the VG to be used normally. This must be done with caution since the reappeared device may have old data that is inconsistent with data on other PVs.

    Bad mda repair
    --------------
    The new command:

        vgck --updatemetadata VG

    first uses vg_write to repair the basic issues mentioned above (old metadata, outdated PVs, pv_header flags, MISSING_PV flags). It will also go further and repair bad metadata:
    . text metadata that has a bad checksum
    . text metadata that is not parsable
    . corrupt mda_header checksum and version fields
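    As a rough illustration of the repair workflow described above (the VG name vg0 and device /dev/sdX are placeholders; the commands are the ones named in this message, and exact behaviour depends on the lvm2 version):

        # reporting still works with a single good metadata copy; lvm only warns about bad/old copies
        vgs vg0
        # repair old/bad metadata, outdated PVs, pv_header flags and MISSING_PV flags
        vgck --updatemetadata vg0
        # if a device is truly gone, drop its PVs from the VG
        vgreduce --removemissing vg0
        # if a flagged device has reappeared and its data is trusted, clear MISSING
        vgextend --restoremissing vg0 /dev/sdX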
* cleanup: indent | Zdenek Kabelac | 2019-01-28 | 3 | -4/+2
* man: lvmvdo component activation description | Zdenek Kabelac | 2019-01-28 | 1 | -0/+15
    Describe component activation for the VDO Data LV.
* man: vdo regenerated | Zdenek Kabelac | 2019-01-28 | 4 | -3347/+134
    Correcting the order of appearance of the VDO description in lvcreate.
* vdo: add some basic example | Zdenek Kabelac | 2019-01-28 | 1 | -0/+8
* vdo: document types vdo and vdo-pool | Zdenek Kabelac | 2019-01-28 | 3 | -6/+8
* vdo: complete matching with thin syntax | Zdenek Kabelac | 2019-01-28 | 2 | -26/+30
    Just like the thin-pool syntax we already support:

        lvcreate --thinpool new_tpoolname -L Size vg

    add the same support logic for vdo-pool:

        lvcreate --vdopool new_vpoolname -L Size vg

    Also move the description of this syntax below thin-pool, so it is correctly ordered in the generated man page.
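    For illustration, a complete creation might look like the sketch below (pool, LV and VG names plus sizes are made up; the --type vdo / -V form is the one documented in the lvmvdo(7) page added in this series, so verify it against your version):

        # create a VDO pool of a given physical size, matching the thin-pool form
        lvcreate --vdopool vpool0 -L 100G vg
        # or create pool and the virtual VDO volume on top of it in one step
        lvcreate --type vdo -n vdo0 -L 100G -V 1T vg/vpool0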
* lvconvert: pass force and yes options for vdo conversion | Zdenek Kabelac | 2019-01-28 | 1 | -1/+3
* tests: rounding for pools changed to power of 2 | Zdenek Kabelac | 2019-01-28 | 1 | -5/+3
    Even with 64K chunk support, lvm2 will target power-of-2 chunks.
* thin: select chunk size as power of 2 | Zdenek Kabelac | 2019-01-28 | 2 | -14/+4
    Whenever the thin-pool chunk size is unspecified and left for lvm to calculate, select the nearest higher power of 2 instead of just a multiple of 64KiB.
* cache: select chunk size as power of 2 | Zdenek Kabelac | 2019-01-28 | 2 | -1/+5
    When the cache chunk size is not configured and is left for lvm to deduce, select a value which is a power of 2.
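    A minimal sketch of the rounding rule described in the two commits above (illustrative shell only, not the lvm2 source; the 64KiB floor is an assumption based on the old 64KiB-multiple behaviour):

        # round a size in KiB up to the next power of 2, starting from 64KiB
        next_pow2() {
                local n=$1 p=64
                while [ "$p" -lt "$n" ]; do p=$((p * 2)); done
                echo "$p"
        }
        next_pow2 192    # the old logic could pick 192KiB (a 64KiB multiple); this prints 256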
* tests: use pvscan after enable_dev in process-each-duplicate-vgnames | David Teigland | 2019-01-28 | 1 | -7/+2
    Use pvscan instead of vgscan, so that the new dev is recognized with hints.
* vgscan: drop 'take a while' message | David Teigland | 2019-01-28 | 1 | -2/+0
    Every command does this.
* rpm: package lvmvdo man page | Zdenek Kabelac | 2019-01-22 | 1 | -0/+3
* vdo: some formatting updates | Zdenek Kabelac | 2019-01-22 | 1 | -10/+16
* lv_manip: better work with PERCENT_VG modifier with lvresize | Zdenek Kabelac | 2019-01-21 | 1 | -2/+2
    Fixes recent commit 022ebb0cfebee4ac8fdbe4e0c61e85db1038a115. A resize request already has an existing size that must be counted in, otherwise an upsizing operation could turn into a size reduction.
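    For example (VG and LV names are hypothetical), a request such as

        lvextend -l +50%VG vg/lv0

    must add 50% of the VG size on top of the current LV size; without counting the existing size, an intended extension could be computed as a smaller total and effectively become a reduction.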
* vdo: minor API cleanup | Zdenek Kabelac | 2019-01-21 | 3 | -20/+9
    Since parse_vdo_pool_status() became part of the vdo_manip API, and there will be no matching 'dm' status parser, the API can be simplified and made to closely match the thin API here.
* tests: vdo dmeventd resize | Zdenek Kabelac | 2019-01-21 | 1 | -0/+68
* vdo: enable dmeventd resize | Zdenek Kabelac | 2019-01-21 | 2 | -3/+20
* vdo: add simple wrapper for getting pool percentage | Zdenek Kabelac | 2019-01-21 | 2 | -0/+14
    Just like with e.g. thins, provide a simple function for getting the percentage of VDO pool usage (it uses the existing status function).
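    From the command line the same information surfaces through the reporting fields, for example (an illustrative query; it assumes the usual data_percent field is populated for VDO pools and that the names exist):

        lvs -o lv_name,data_percent vg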
* tests: initial test for vdo resize | Zdenek Kabelac | 2019-01-21 | 2 | -0/+69
* tests: aux fix testing for kvdo | Zdenek Kabelac | 2019-01-21 | 1 | -0/+1
* tests: update cache test | Zdenek Kabelac | 2019-01-21 | 2 | -17/+20
    Since migration_threshold is now protected so it cannot be smaller than 8*chunk_size, update the tests to account for this modification.
* vdo: man documenting resize | Zdenek Kabelac | 2019-01-21 | 1 | -0/+36
* cleanup: better naming | Zdenek Kabelac | 2019-01-21 | 1 | -7/+7
* vdo: allow resize of VDO and VDO pool volumes | Zdenek Kabelac | 2019-01-21 | 2 | -8/+12
    With the newer VDO kvdo target we can start to use the standard mechanism to enable resize of VDO volumes. The VDO pool can be grown. The virtual volume grows on top of the VDO pool when it is not big enough. A reduced VDO LV issues discards for the reduced areas - this can take a long time!
    TODO: implement some pollable mechanism for out-of-lock TRIM.
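    Illustrative resize commands following the description above (pool, LV and size names are placeholders; remember that reducing the VDO LV sends discards and may take a long time):

        # grow the VDO pool
        lvextend -L +10G vg/vpool0
        # grow the virtual VDO volume on top of it
        lvextend -L +1T vg/vdo0
        # reduction requires the VDO LV to be active so discards can be sent
        lvreduce -L -100G vg/vdo0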
* vdo: size reduction requires VDO to be active | Zdenek Kabelac | 2019-01-21 | 1 | -0/+5
    To be able to send discards to the reduced areas, the VDO LV needs to be active.
* vdo: discard reduced area | Zdenek Kabelac | 2019-01-21 | 1 | -0/+37
    Implement sending discards to the reduced LV area.
* vdo: estimate virtual size after resize | Zdenek Kabelac | 2019-01-21 | 1 | -0/+4
* vdo: introduce function for estimation of virtual size | Zdenek Kabelac | 2019-01-21 | 2 | -1/+19
* lv_manip: better work with PERCENT_VG modifier | Zdenek Kabelac | 2019-01-21 | 3 | -0/+13
    When using 'lvcreate -l100%VG' and there is a big disproportion between the real available space and the requested setting, automatically fall back to 100%FREE.
    The difference shows up when the VG is big and most of its space is already allocated: the requested 100%VG can then end up (and by the spec for the % modifier this is correct) as an LV with a size of 1%VG. Usually this is not a big problem, but in some cases - like cache-pool allocation - it can make a big difference for chunk-size selection.
    With this patch the behaviour more closely matches common-sense logic, without reiterating too-big changes in the lvm2 core ATM.
    TODO: in the future there should be an allocator solving all allocations in a single call.
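    For example (names hypothetical), on a VG that is already mostly allocated:

        lvcreate -l 100%VG -n pool0 vg

    could previously yield an LV holding only the small remaining fraction of the VG (correct by the %VG definition), which in turn skewed things like cache-pool chunk-size selection; with this change such a request behaves like -l 100%FREE.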
* dm: migration_threshold for old linked tools | Zdenek Kabelac | 2019-01-21 | 2 | -3/+18
    Just like with the preceding lvm2 device_mapper patch, ensure that old users of libdm will also get a fixed migration threshold for caches.
* dm: ensure migration_threshold is big enough | Zdenek Kabelac | 2019-01-21 | 2 | -3/+18
    When using caches with a BIG pool size (>TB), a relatively huge chunk size is required. Once the chunk size goes over 1MiB, the kernel cache target stops writing such chunks back if the migration_threshold remains at its default 1MiB (2048 sectors).
    This patch ensures the DM layer will not let through a table line whose migration threshold is not big enough to let at least 8 chunks pass (independently of lvm2 metadata). See the sketch below for the arithmetic.
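    An illustration of the constraint (numbers are an example, and the lvchange form is only a sketch of how the setting is normally tuned; the patch itself enforces the floor in the generated device-mapper table):

        # 1MiB chunk = 2048 sectors; 8 chunks * 2048 = 16384 sectors minimum migration_threshold
        lvchange --cachesettings migration_threshold=16384 vg/cached_lv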
* man: document dD attrs for VDO lvs | Zdenek Kabelac | 2019-01-21 | 2 | -4/+6
    New attrs: v(d)o pool and v(D) pool data.
* man: initial man page for VDO support | Zdenek Kabelac | 2019-01-21 | 2 | -1/+243
    Basic lvm2 command support for VDO.
* man: missed --zero option for thin-pool creation | Zdenek Kabelac | 2019-01-21 | 3 | -0/+3
    During the man page rewrite this info got lost and remained only for lvconvert. Restore it for lvcreate.
* vdo: update vdo profile | Zdenek Kabelac | 2019-01-21 | 1 | -21/+20
* vdo: fix archived metadata comment | Zdenek Kabelac | 2019-01-21 | 2 | -1/+2
    lvm uses the 'minimum_io_size' name to exactly match the VDO naming here; however, in all common cases a *_size setting uses the 'sector/512b' unit. But in this case the value is in bytes and can have only 2 values: either 512 or 4096. It's probably not worth renaming it internally, so we can just drop the comment - instead of using 1 or 8. Though let's think about it...
* lvmdbusd: Use UUID instead of name for VG rename | Tony Asleson | 2019-01-16 | 2 | -3/+3
    Use the UUID to specify the VG to rename instead of the name, as this approach works when we have duplicate VG names.
* lvmdbusd: Handle duplicate VG names | Tony Asleson | 2019-01-16 | 2 | -9/+42
    Lvm can at times have duplicate names. When this happens, the daemon will internally use vg_name:vg_uuid as the name for lookups, but display just the vg_name externally. If an API user uses Manager.LookUpByLvmId and queries the VG name, they will only get one result, as the API can only accommodate returning 1. The one returned is the first instance found when sorting the volume groups by UUID.
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1583510
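    A sketch of the lookup described above using busctl (the bus name, object path, method signature and the VG name "vgname" are assumptions drawn from typical lvmdbusd usage, not from this commit):

        # returns a single object path even when two VGs share the name;
        # with duplicates, the daemon returns the first match when sorting by UUID
        busctl call com.redhat.lvmdbus1 /com/redhat/lvmdbus1/Manager \
                com.redhat.lvmdbus1.Manager LookUpByLvmId s "vgname"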
* lvmdbusd: Correct object manager lookups | Tony Asleson | 2019-01-16 | 1 | -35/+15
    When we have two logical volumes which switch their names at the same time, we are left with incorrect lookups. Any time we find an entry by doing a lookup by UUID or by name, we will ensure that the lookups are indeed correct.
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1642176
* lvmdbusd: Spelling correction | Tony Asleson | 2019-01-16 | 1 | -1/+1
* lvmdbusd: LookUpByLvmId: Add doc for cb, cbe | Tony Asleson | 2019-01-16 | 1 | -0/+2
* lvmdbusd: Ensure all paths return value | Tony Asleson | 2019-01-16 | 1 | -1/+1
* hints: invalidate when pvscan --cache sees a new PV | David Teigland | 2019-01-16 | 4 | -2/+54
    An idea from Zdenek for better ensuring valid hints: invalidate them when pvscan --cache <device> sees a new PV, which is a case where we know that hints should be invalidated. This is triggered from systemd/udev logic, and there may be some cases where it would invalidate hints that the existing methods wouldn't detect.
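    For instance (the device name is a placeholder), the scan that now also invalidates hints can be run by hand as:

        pvscan --cache /dev/sdb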
* lvmlockd: fix make lockstart wait | David Teigland | 2019-01-16 | 1 | -1/+1
    when building without lvmlockd
* move init_use_aio | David Teigland | 2019-01-16 | 1 | -2/+2
    It doesn't make sense to call it from init_logging.
* lvmlockd: make lockstart wait for existing start | David Teigland | 2019-01-16 | 5 | -9/+21
    If there are two independent scripts doing:

        vgchange --lockstart vg
        lvchange -ay vg/lv

    the first vgchange to do the lockstart will wait for the lockstart to complete before returning. The second vgchange to do the lockstart will see that the start is already in progress (from the first) and will do nothing. This means the second does not wait for any lockstart to complete, and moves on to the lvchange, which may find the lockspace still starting and fail.
    To fix this, make the vgchange lockstart command wait for any lockstarts in progress to complete.
* hints: fix hint flock when using lvm shell | David Teigland | 2019-01-15 | 4 | -5/+17
    Also, cmd->use_hints needs to be set for each shell command.
* WHATS_NEW: device hints | David Teigland | 2019-01-15 | 1 | -0/+1