The fact that vg repair is implemented as part of vg_read
has led to a very poor implementation of vg_read. This splits
read and repair apart.
Summary
-------
- take the various kinds of repair out of vg_read
- vg_read no longer writes anything
- vg_read now simply reads and returns vg metadata
- vg_read can proceed with a single good copy of metadata
- vg_read should ignore bad or old copies of metadata
- improve error checks and handling when reading
- keep track of bad (corrupt) copies of metadata in lvmcache
- keep track of old (seqno) copies of metadata in lvmcache
- keep track of outdated PVs in lvmcache
- wipe outdated PVs in vg_write instead of vg_read
- fix PV headers in vg_write instead of vg_read
- update old metadata in vg_write instead of vg_read
- do not conflate bad/old metadata with missing devs
- separate commands for other vg repairs will follow
Reading bad/old metadata
------------------------
- "bad metadata" is a copy of the metadata that has been corrupted,
or can't be read, or has some invalid data that can't be parsed or
understood by lvm. It's often reported as a checksum error, but
not always. Bad metadata should be replaced with a copy of good
metadata from another PV (or from a good copy on the same PV.)
- "old metadata" is a copy of the metadata that has a smaller seqno
than other copies of the metadata. It could happen if the device
failed, or io failed, or lvm failed while committing new metadata
to all the metadata areas. Old metadata on a PV that is still in
the VG should be replaced with a copy of good metadata from another
PV (or from a good copy on the same PV). Old metadata on a PV that
has been removed from the VG should be erased.
When a VG has some PVs with bad/old metadata, lvm can simply ignore
the bad/old copies, and use a good copy. This is why there are
multiple copies of the metadata -- so it's available even when some
of the copies cannot be used. The bad/old copies do not have to be
repaired before the VG can be used (the repair can happen later.)
A PV with no good copies of the metadata simply falls back to being
treated like a PV with no mdas; a common and harmless configuration.
When bad/old metadata exists, lvm warns the user about it, and
suggests repairing it using a new metadata repair command.
Bad/old metadata is something that users will often want to
investigate and repair themselves, since it should not generally
happen and may indicate some other problem that needs to be fixed.
PVs with bad/old metadata are not the same as missing devices.
Missing devices will block various kinds of VG modification or
activation, but bad/old metadata will not.
Previously, lvm would attempt to repair bad/old metadata whenever
it was read. This was unnecessary since lvm does not require every
copy of the metadata to be used. It would also hide potential
problems that should be investigated by the user. It was also
dangerous in cases where the VG was on shared storage. The user
is now allowed to investigate potential problems and decide how
and when to repair them.
Repairing bad/old metadata
--------------------------
When label scan sees bad metadata in an mda, that mda is removed
from the lvmcache info->mdas list. This means that vg_read will
skip it, and not attempt to read/process it again. If it was
the only in-use mda on a PV, that PV is treated like a PV with
no mdas. It also means that vg_write will skip the bad mda,
and not attempt to write new metadata to it. The only way to
repair bad metadata is with metadata repair (see next commit).
(We may also want to allow pvchange --metadataignore on a PV
with bad metadata.)
When label scan sees old metadata in an mda, that mda is kept
in the lvmcache info->mdas list. This means that vg_read will
read/process it again, and likely see the same mismatch with
the other copies of the metadata. Like label_scan, vg_read
will simply ignore the old copy of the metadata and
use the latest copy. If the command is modifying the vg
(e.g. lvcreate), then vg_write, which writes new metadata to
every mda on info->mdas, will write the new metadata to the
mda that had the problematic old version. If successful,
this will resolve the old metadata problem (without needing
to run a metadata repair command.)
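To illustrate the skip logic, a minimal sketch (the names and
structures below are illustrative, not the real lvmcache internals):

    #include <stdbool.h>

    struct mda {
        struct mda *next;
        bool bad;  /* set when label scan hits a checksum/parse error */
    };

    /* Unlink every bad mda from a PV's mda list.  Since vg_read and
     * vg_write only walk this list, a dropped mda is never read or
     * written again until a repair command restores it. */
    static void drop_bad_mdas(struct mda **list)
    {
        while (*list) {
            if ((*list)->bad)
                *list = (*list)->next;  /* unlink and skip from now on */
            else
                list = &(*list)->next;
        }
    }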
Outdated PVs
------------
An outdated PV is a PV that has an old copy of VG metadata
that shows it is a member of the VG, but the latest copy of
the VG metadata does not include this PV. This happens if
the PV is disconnected, vgreduce --removemissing is run to
remove the PV from the VG, then the PV is reconnected.
In this case, the outdated PV needs to have its outdated metadata
removed and the PV used flag needs to be cleared. This repair
will be done by the subsequent repair command. It is also done
if vgremove is run on the VG.
MISSING PVs
-----------
When a device is missing, most commands will refuse to modify
the VG. This is the simple case. More complicated is when
a command is allowed to modify the VG while it is missing a
device.
When a VG is written while a device is missing for one of its PVs,
the VG metadata includes the MISSING_PV flag on the PV with the
missing device. When the VG is next used, it needs to be treated
as if this PV with the MISSING flag is still missing, even if the
device has reappeared.
vgreduce --removemissing will remove PVs with missing devices,
or PVs with the MISSING flag where the device has reappeared.
vgextend --restoremissing will clear the MISSING flag on PVs
where the device has reappeared, allowing the VG to be used
normally. This must be done with caution since the reappeared
device may have old data that is inconsistent with data on other PVs.
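As a sketch of that rule (the field and function names here are
hypothetical, not the real lvm2 structures):

    #include <stdbool.h>

    struct pv {
        bool missing_flag;  /* MISSING_PV committed in the VG metadata */
        bool dev_present;   /* device currently visible on the system */
    };

    /* A reappeared device does not override the committed flag; only
     * vgextend --restoremissing (or vgreduce --removemissing)
     * changes the outcome of this check. */
    static bool pv_usable(const struct pv *pv)
    {
        return pv->dev_present && !pv->missing_flag;
    }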
|
Describe component activation for VDO Data LV.
|
Correcting order of appearance of VDO description in lvcreate.
|
Just like we support the thin-pool syntax:
lvcreate --thinpool new_tpoolname -L Size vg
add the same support logic for vdo-pool:
lvcreate --vdopool new_vpoolname -L Size vg
Also move the description of this syntax below thin-pool, so it's
correctly ordered in the generated man page.
|
Even with 64K chunk support, lvm2 will target power-of-2 chunks.
|
Whenever the thin-pool chunk size is unspecified and left for lvm to
calculate, try to select the size as the nearest higher power-of-2
instead of just a multiple of 64KiB.
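A sketch of the selection described above (an illustrative helper,
not the actual lvm2 function):

    #include <stdint.h>

    /* Round a computed chunk size up to the next power of two,
     * e.g. 192KiB -> 256KiB.  A value that is already a power of
     * two is returned unchanged (sizes stay far below 2^63). */
    static uint64_t round_up_pow2(uint64_t n)
    {
        uint64_t p = 1;

        while (p < n)
            p <<= 1;
        return p;
    }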
|
When the cache chunk size is not configured, and left for lvm to
deduce, select a value which is a power of 2.
|
instead of vgscan, so that the new dev is recognized with hints
|
every command does this
|
Fixes recent commit 022ebb0cfebee4ac8fdbe4e0c61e85db1038a115.
A resize already has an existing size that needs to be counted in,
otherwise an upsizing operation could turn into a size reduction
(e.g. extending a 10-extent LV by 5 must yield 15 extents, not 5).
|
Since parse_vdo_pool_status() became part of the vdo_manip API,
and there will be no matching 'dm' status parser,
the API can be simplified to closely match the thin API here.
|
Just like with e.g. thins, provide a simple function for
getting the percentage of VDO pool usage (it uses the existing
status function).
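A sketch of such a helper (the parameter names are assumptions; the
real code takes the numbers from the parsed kernel status):

    #include <stdint.h>

    /* Usage as a percentage of the VDO pool's physical data blocks. */
    static float vdo_pool_usage_percent(uint64_t used_blocks,
                                        uint64_t total_blocks)
    {
        if (!total_blocks)
            return 0.0f;
        return 100.0f * (float)used_blocks / (float)total_blocks;
    }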
|
Since migration_threshold is now protected so it cannot be smaller
than 8 * chunk_size, update the tests to account for this modification.
|
Now with the newer VDO kvdo target we can start to use the standard
mechanism to enable resize of VDO volumes.
The VDO pool can be grown.
The virtual volume grows on top of the VDO pool when it is not big
enough. A reduced VDO LV calls discard for the reduced areas - this
can take a long time!
TODO: implement some pollable mechanism for out-of-lock TRIM.
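A usage sketch of the behavior described above (the LV names are
illustrative):

lvextend -L+10G vg/vpool     # grow the VDO pool
lvextend -L+10G vg/vdo_lv    # grow the virtual volume on top of it
lvreduce -L-10G vg/vdo_lv    # discards the reduced area - can be slow!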
|
To be able to send discard to the reduced areas, the VDO LV needs to
be active.
|
Implement sending discard to reduced LV area.
|
When using 'lvcreate -l100%VG' and there is a big disproportion between
the real available space and the requested setting - automatically fall
back to 100%FREE.
The difference can be seen when the VG is big and most space was already
allocated, so the requested 100%VG can end up (and by the spec for the %
modifier it's correct) as an LV with the size of 1%VG. Usually this is
not a big problem - but in some cases - like cache-pool allocation - this
can result in a big difference for chunksize selection.
With this patch it more closely matches common-sense logic, without
the need to reiterate too big changes in the lvm2 core ATM.
TODO: in the future there should be an allocator solving all allocations
in a single call.
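The fallback itself amounts to a tiny clamp (illustrative names, not
the allocator's real API):

    #include <stdint.h>

    /* If %VG requests more extents than remain unallocated,
     * silently behave as if 100%FREE had been requested. */
    static uint64_t extents_to_allocate(uint64_t requested_extents,
                                        uint64_t free_extents)
    {
        return (requested_extents > free_extents)
                ? free_extents : requested_extents;
    }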
|
Just like with the preceding lvm2 device_mapper patch, ensure
that old users of libdm will also get the fixed migration threshold
for caches.
|
When using caches with a BIG pool size (>TB) it is required
to use a relatively huge chunk size. Once the chunk size
got over 1MiB, the kernel cache target stopped writing such chunks
back if the migration_threshold remained at its default 1MiB
(2048 sectors) size.
This patch ensures the DM layer will not let pass a table line whose
migration threshold is not big enough to let pass
at least 8 chunks (independently of lvm2 metadata).
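A sketch of the enforced rule (an illustrative helper; the real check
lives in the device_mapper table handling):

    #include <stdint.h>

    /* migration_threshold must cover at least 8 cache chunks.
     * With a 1MiB chunk (2048 sectors) the minimum becomes 16384
     * sectors (8MiB), not the old default of 2048 sectors. */
    static uint64_t clamp_migration_threshold(uint64_t threshold_sectors,
                                              uint64_t chunk_sectors)
    {
        uint64_t min = 8 * chunk_sectors;

        return (threshold_sectors < min) ? min : threshold_sectors;
    }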
|
New attrs v(d)o pool and v(D) pool data.
|
Basic lvm2 command support for VDO.
|
During the man page rewrite this info got lost and remained
only for lvconvert. So restore it for lvcreate.
|
lvm uses the 'minimum_io_size' name to exactly match the VDO naming here;
however, in all common cases _size uses the 'sector/512b' unit.
But in this case the value is in bytes and can have only 2 values:
either 512 or 4096.
It's probably not worth renaming it internally, so we can just
drop a comment there - instead of using 1 or 8.
Though let's think about it...
|
Use the UUID to specify the VG to rename instead of the name, as this
approach works when we have duplicate VG names.
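For reference, the command-line tool takes the same approach - a VG
UUID may be given in place of the old name (the UUID below is only an
example):

vgrename Zvlifi-Ep3t-e0Ng-U42h-o0ye-KHu1-nl7Ns4 vg_new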
|
Lvm can at times have duplicate names. When this happens the daemon will
internally use vg_name:vg_uuid as the name for lookups, but display just
the vg_name externally. If an API user uses Manager.LookUpByLvmId
and queries the vg name, they will get only one result returned, as
the API can only accommodate returning one. The one returned is the
first instance found when sorting the volume groups by UUID.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1583510
|
When we have two logical volumes which switch their names at the
same time, we are left with incorrect lookups. Anytime we find
an entry by doing a lookup by UUID or by name we will ensure
that the lookups are indeed correct.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1642176
|
An idea from Zdenek for better ensuring valid hints by invalidating
them when pvscan --cache <device> sees a new PV, which is a case
where we know that hints should be invalidated. This is triggered
from systemd/udev logic, and there may be some cases where it would
invalidate hints that the existing methods wouldn't detect.
|
when building without lvmlockd
|
it doesn't make sense to call from init_logging
|
If there are two independent scripts doing:
vgchange --lockstart vg
lvchange -ay vg/lv
The first vgchange to do the lockstart will wait for
the lockstart to complete before returning.
The second vgchange to do the lockstart will see that
the start is already in progress (from the first) and
will do nothing. This means the second does not wait
for any lockstart to complete, and moves on to the
lvchange which may find the lockspace still starting
and fail.
To fix this, make the vgchange lockstart command
wait for any lockstarts in progress to complete.
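A minimal sketch of the fix (the polling helper is a stand-in, not
lvmlockd's real API):

    #include <stdbool.h>
    #include <unistd.h>

    /* Stand-in: the real code would ask lvmlockd whether the VG's
     * lockspace has finished starting. */
    static bool lockspace_start_done(const char *vg_name)
    {
        (void)vg_name;
        return true;  /* placeholder for the sketch */
    }

    /* Previously, seeing "start already in progress" returned
     * immediately, so a following lvchange -ay could race against
     * the still-starting lockspace and fail.  Now wait for it. */
    static void wait_for_lockstart(const char *vg_name)
    {
        while (!lockspace_start_done(vg_name))
            usleep(100000);  /* 100ms between checks */
    }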
|
also cmd->use_hints needs to be set for each shell command