| Commit message (Collapse) | Author | Age | Files | Lines |
| |
The copy of VG metadata stored in lvmcache was not being used
in general. It pretended to be a generic VG metadata cache,
but was not being used except for clvmd activation. There
it was used to avoid reading from disk while devices were
suspended, i.e. in resume.
This removes the code that attempted to make this look
like a generic metadata cache, and replaces it with
something narrowly targeted to what it's actually used for.
This is a way of passing the VG from suspend to resume in
clvmd. Since in the case of clvmd one caller can't simply
pass the same VG to both suspend and resume, suspend needs
to stash the VG somewhere that resume can grab it from.
(resume doesn't want to read it from disk since devices
are suspended.) The lvmcache vginfo struct is used as a
convenient place to stash the VG to pass it from suspend
to resume, even though it isn't related to the lvmcache
or vginfo. These suspended_vg* vginfo fields should
not be used or touched anywhere else; they are only to
be used for passing the VG data from suspend to resume
in clvmd. The VG data being passed between suspend and
resume is never modified, and will only exist in the
brief period between suspend and resume in clvmd.
suspend has both old (current) and new (precommitted)
copies of the VG metadata. It stashes both of these in
the vginfo prior to suspending devices. When vg_commit
is successful, it sets a flag in vginfo as before,
signaling the transition from old to new metadata.
resume grabs the VG stashed by suspend. If the vg_commit
happened, it grabs the new VG, and if the vg_commit didn't
happen it grabs the old VG. The VG is then used to resume
LVs.
This isolates clvmd-specific code and usage from the
normal lvm vg_read code, making the code simpler and
the behavior easier to verify.
Sequence of operations:
- lv_suspend() has both vg_old and vg_new
and stashes a copy of each onto the vginfo:
lvmcache_save_suspended_vg(vg_old);
lvmcache_save_suspended_vg(vg_new);
- vg_commit() happens, which causes all clvmd
instances to call lvmcache_commit_metadata(vg).
A flag is set in the vginfo indicating the
transition from the old to new VG:
vginfo->suspended_vg_committed = 1;
- lv_resume() needs either vg_old or vg_new
to use in resuming LVs. It doesn't want to
read the VG from disk since devices are
suspended, so it gets the VG stashed by
lv_suspend:
vg = lvmcache_get_suspended_vg(vgid);
If the vg_commit did not happen, suspended_vg_committed
will not be set, and in this case, lvmcache_get_suspended_vg()
will return the old VG instead of the new VG, and it will
resume LVs based on the old metadata.
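The sequence above can be modeled with a small sketch. The types and
function names used here (struct vg_model, struct suspend_stash,
stash_save, stash_commit, stash_get) are hypothetical stand-ins for
illustration only; they are not the lvmcache functions named above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for VG metadata; only a sequence number is kept. */
struct vg_model {
    int seqno;
};

/* Hypothetical stand-in for the suspended_vg* fields stashed on a vginfo. */
struct suspend_stash {
    const struct vg_model *vg_old;  /* current metadata */
    const struct vg_model *vg_new;  /* precommitted metadata */
    int committed;                  /* set once the commit has happened */
};

/* suspend: stash both copies before devices are suspended. */
void stash_save(struct suspend_stash *s,
                const struct vg_model *vg_old,
                const struct vg_model *vg_new)
{
    assert(s != NULL);
    s->vg_old = vg_old;
    s->vg_new = vg_new;
    s->committed = 0;
}

/* commit: record the transition from old to new metadata. */
void stash_commit(struct suspend_stash *s)
{
    assert(s != NULL);
    s->committed = 1;
}

/* resume: pick the new VG if the commit happened, otherwise the old one. */
const struct vg_model *stash_get(const struct suspend_stash *s)
{
    assert(s != NULL);
    return s->committed ? s->vg_new : s->vg_old;
}
```

In this model, calling stash_get() before stash_commit() yields the old
copy, and calling it afterwards yields the new copy, which mirrors the
two resume cases described above.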
| |
When process_each_pv() calls vg_read() on the orphan VG, the
internal implementation was doing an unnecessary
lvmcache_label_scan() and two unnecessary label_read() calls
on each orphan. Some of those unnecessary label scans/reads
would sometimes be skipped due to caching, but the code was
always doing at least one unnecessary read on each orphan.
The common format_text case was also unnecessarily calling into
the format-specific pv_read() function which actually did nothing.
By analyzing each case in which vg_read() was being called on
the orphan VG, we can say that all of the label scans/reads
in vg_read_orphans are unnecessary:
1. reporting commands: the information saved in lvmcache by
the original label scan can be reported. There is no advantage
to repeating the label scan on the orphans a second time before
reporting it.
2. pvcreate/vgcreate/vgextend: these all share a common
implementation in pvcreate_each_device(). That function
already rescans labels after acquiring the orphan VG lock,
which ensures that the command is using valid lvmcache
information.
| |
The old code was doing unnecessary label scans when
checking to see if the new VG name exists. A single
label_scan is sufficient if it is done after the
new VG lock is held.
| |
Take advantage of the common implementation with aio
and reduced disk reads.
| |
When lvmlockd indicates that the lvmetad cache is out of
date because of changes by another node, lvmetad_pvscan_vg()
rescans the devices in the VG to update lvmetad. Use the
new label_scan in this function to use the common code and
take advantage of the new aio and reduced reads.
| |
The new label_scan() function reads a large buffer of data
from the start of the disk, and saves it so that multiple
structs can be read from it. Previously, only the label_header
was read from this buffer, and the code which needed data
structures that immediately followed the label_header would
read those from disk separately. This created a large
number of small, unnecessary disk reads.
In each place that the two read paths (label_scan and vg_read)
need to read data from disk, first check if that data is
already available from the label_read_data buffer, and if
so just copy it from the buffer instead of reading from disk.
Code changes
------------
- passing the label_read_data struct down through
both read paths to make it available.
- before every disk read, first check if the location
and size of the desired piece of data exists fully
in the label_read_data buffer, and if so copy it
from there. Otherwise, use the existing code to
read the data from disk.
- adding some log_error messages on existing error paths
that were already being updated for the reasons above.
- using similar naming for parallel functions on the two
parallel read paths that are being updated above.
label_scan path calls:
read_metadata_location_summary, text_read_metadata_summary
vg_read path calls:
read_metadata_location_vg, text_read_metadata_file
Previously, those functions were named:
label_scan path calls:
vgname_from_mda, text_vgsummary_import
vg_read path calls:
_find_vg_rlocn, text_vg_import_fd
I/O changes
-----------
In the label_scan path, the following data is either copied
from label_read_data or read from disk for each PV:
- label_header and pv_header
- mda_header (in _raw_read_mda_header)
- vg metadata name (in read_metadata_location_summary)
- vg metadata (in config_file_read_fd)
Total of 4 reads per PV in the label_scan path.
In the vg_read path, the following data is either copied from
label_read_data or read from disk for each PV:
- mda_header (in _raw_read_mda_header)
- vg metadata name (in read_metadata_location_vg)
- vg metadata (in config_file_read_fd)
Total of 3 reads per PV in the vg_read path.
For a common read/reporting command, each PV will be:
- read by the command's initial lvmcache_label_scan()
- read by lvmcache_label_rescan_vg() at the start of vg_read()
- read by vg_read()
Previously, this would cause 11 synchronous disk reads per PV:
4 from lvmcache_label_scan(), 4 from lvmcache_label_rescan_vg()
and 3 from vg_read().
With this commit's optimization, there are now 2 async disk reads
per PV: 1 from lvmcache_label_scan() and 1 from
lvmcache_label_rescan_vg().
When a second mda is used on a PV, it is located at the
end of the PV. This second mda and copy of metadata will
not be found in the label_read_data buffer, and will always
require separate disk reads.
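The "check the buffer before reading from disk" step can be sketched as
follows. The struct and function names (struct scan_buf, scan_buf_copy)
are hypothetical and only model the idea; they are not the
label_read_data code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the saved scan buffer for one device. */
struct scan_buf {
    uint64_t start;   /* device offset of buf[0] */
    size_t len;       /* number of valid bytes in buf */
    const char *buf;
};

/*
 * Copy 'size' bytes at device offset 'offset' out of the saved buffer.
 * Returns 1 on success, or 0 if the range is not fully inside the
 * buffer, in which case a caller would fall back to a disk read.
 */
int scan_buf_copy(const struct scan_buf *sb, uint64_t offset,
                  size_t size, void *out)
{
    uint64_t rel;

    assert(sb != NULL && out != NULL);

    if (offset < sb->start)
        return 0;

    rel = offset - sb->start;
    if (rel > sb->len || size > sb->len - rel)
        return 0;

    memcpy(out, sb->buf + rel, size);
    return 1;
}
```

A request that starts before the buffer, or runs past its end, is
rejected as a whole rather than partially served, matching the "exists
fully in the buffer" condition above. A second mda at the end of the PV
would always take the fallback path in this model.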
| |
Fix mixing bitwise & and logical && which was
always 1 in any case.
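The expression that was fixed is not shown in this message, so the
following is only a generic illustration of how the two operators
differ. The function names are hypothetical.

```c
#include <assert.h>

/* Logical AND: 1 whenever both operands are nonzero, whatever bits are set. */
int test_logical(unsigned flags, unsigned mask)
{
    return flags && mask;
}

/* Bitwise AND: 1 only when flags and mask share at least one set bit. */
int test_bitwise(unsigned flags, unsigned mask)
{
    return (flags & mask) != 0;
}

/* Self-check: the two forms disagree when the set bits do not overlap. */
void check_operators(void)
{
    assert(test_logical(0x1u, 0x2u) == 1);
    assert(test_bitwise(0x1u, 0x2u) == 0);
}
```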
| |
This fixes the use of lvmcache_label_rescan_vg() in the previous
commit for the special case of independent metadata areas.
label scan is about discovering VG name to device associations
using information from disks, but devices in VGs with
independent metadata areas have no information on disk, so
the label scan does nothing for these VGs/devices.
With independent metadata areas, only the VG metadata found
in files is used. This metadata is found and read in
vg_read in the processing phase.
lvmcache_label_rescan_vg() drops lvmcache info for the VG devices
before repeating the label scan on them. In the case of
independent metadata areas, there is no metadata on devices, so the
label scan of the devices will find nothing, so will not recreate
the necessary vginfo/info data in lvmcache for the VG. Fix this
by setting a flag in the lvmcache vginfo struct indicating that
the VG uses independent metadata areas, and label rescanning should
be skipped.
In the case of independent metadata areas, it is the metadata
processing in the vg_read phase that sets up the lvmcache
vginfo/info information, and label scan has no role.
| |
LVM's general design for scanning/reading of metadata from disks is
that a command begins with a discovery phase, called "label scan",
in which it discovers which devices belong to lvm, what VGs exist on
those devices, and which devices are associated with each VG.
After this comes the processing phase, which is based around
processing specific VGs. In this phase, lvm acquires a lock on
the VG, and rescans the devices associated with that VG, i.e.
it repeats the label scan steps on the devices in the VG in case
something has changed between the initial label scan and taking
the VG lock. This ensures that the command is processing the
latest, unchanging data on disk.
This commit moves the location of these label scans to make them
clearer and avoid unnecessary repeated calls to them.
Previously, the initial label scan was called as a side effect
from various utility functions. This would lead to it being called
unnecessarily. It is an expensive operation, and should only be
called when necessary. Also, this is a primary step in the
function of the command, and as such it should be called prominently
at the top level of command processing, not as a hidden side effect
of a utility function. lvm knows exactly where and when the
label scan needs to be done. Because of this, move the label scan
calls from the internal functions to the top level of processing.
Other specific instances of lvmcache_label_scan() are still called
unnecessarily or unclearly by specific commands that do not use
the common process_each functions. These will be improved in
future commits.
During the processing phase, rescanning labels for devices in a VG
needs to be done after the VG lock is acquired in case things have
changed since the initial label scan. This was being done by way
of rescanning devices that had the INVALID flag set in lvmcache.
This usually approximated the right set of devices, but it was not
exact, and obfuscated the real requirement. Correct this by using
a new function that rescans the devices in the VG:
lvmcache_label_rescan_vg().
Apart from being inexact, the rescanning was extremely well hidden.
_vg_read() would call ->create_instance(), _text_create_text_instance(),
_create_vg_text_instance() which would call lvmcache_label_scan()
which would call _scan_invalid() which repeats the label scan on
devices flagged INVALID. lvmcache_label_rescan_vg() is now called
prominently by _vg_read() directly.
| |
To do label scanning, lvm code calls lvmcache_label_scan().
Change lvmcache_label_scan() to use the new label_scan()
which can use async io, rather than implementing its own
dev iter loop and calling the synchronous label_read() on
each device.
Also add lvmcache_label_rescan_vg() which calls the new
label_scan_devs() which does label scanning on only the
specified devices. This is for a subsequent commit and
is not yet used.
| |
This adds the implementation without using it in the code.
The code still calls label_read() on each individual device
to do scanning.
Calling the new label_scan() will use async io if async io is
enabled in config settings. If not enabled, or if async io fails,
label_scan() will fall back to using synchronous io. If only some
aio ops fail, the code will attempt to perform synchronous io on just
the ios that failed.
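The partial-failure fallback described above could look roughly like
this. The struct read_op type and retry_failed_sync() are hypothetical
names for this sketch; only pread() is a real call, and the async
submission itself is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record of one read request. */
struct read_op {
    int fd;
    off_t offset;
    size_t len;
    char *buf;
    int aio_failed;  /* set when the async path could not complete this op */
    int done;        /* set when buf holds the requested data */
};

/*
 * Retry only the ops marked aio_failed, using a synchronous pread().
 * Ops that are already done are left untouched.
 * Returns the number of ops that are still not done afterwards.
 */
int retry_failed_sync(struct read_op *ops, int count)
{
    int remaining = 0;

    assert(ops != NULL || count == 0);

    for (int i = 0; i < count; i++) {
        if (ops[i].done)
            continue;
        if (ops[i].aio_failed) {
            ssize_t n = pread(ops[i].fd, ops[i].buf, ops[i].len,
                              ops[i].offset);
            if (n == (ssize_t)ops[i].len) {
                ops[i].done = 1;
                continue;
            }
            fprintf(stderr, "sync read failed for op %d\n", i);
        }
        remaining++;
    }
    return remaining;
}
```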
Uses linux native aio system calls, not the posix wrappers which
are messier and may not have all the latest linux capabilities.
Internally, the same functionality is used as before:
- iterate through each visible device on the system,
provided from dev-cache
- call _find_label_header on the dev to find the sector
containing the label_header
- call _text_read to look at the pv_header and mda locations
after the pv_header
- for each mda location, read the mda_header and the vg metadata
- add info/vginfo structs to lvmcache which associate the
device name (info) with the VG name (vginfo) so that vg_read
can know which devices to read for a given VG name
The new label scanning issues a "large" read beginning at the start
of the device, where large is configurable, but intended to cover
all the labels/headers/metadata that is located at the start of
the device. This large data buffer from each device is saved in a
global list using a new 'label_read_data' struct. Currently, this
buffer is only used to find the label_header from the first four
sectors of the device. In subsequent commits, other functions
that read other structs/metadata will first try to find that data in
the saved label_read_data buffer. In most common cases, the data
they need can simply be copied out of the existing buffer, and
they can avoid issuing another disk read to get it.
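Finding the label_header in the saved buffer can be sketched like this.
The sketch assumes 512-byte sectors and the "LABELONE" identifier at the
start of the label sector; the function name is hypothetical, and the
other checks a real label reader would perform are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_SECTOR_SIZE 512       /* assumed sector size */
#define SKETCH_SCAN_SECTORS 4        /* "first four sectors" from the text */
#define SKETCH_LABEL_ID "LABELONE"   /* assumed label identifier */

/*
 * Look for the label identifier at the start of each of the first four
 * sectors in a buffer that was read from the start of a device.
 * Returns the sector index (0..3), or -1 if no label is found.
 */
int find_label_sector(const char *buf, size_t buf_len)
{
    const size_t id_len = strlen(SKETCH_LABEL_ID);

    assert(buf != NULL);

    for (int sector = 0; sector < SKETCH_SCAN_SECTORS; sector++) {
        size_t off = (size_t)sector * SKETCH_SECTOR_SIZE;

        if (off + id_len > buf_len)
            break;  /* buffer too short to hold this sector's header */
        if (memcmp(buf + off, SKETCH_LABEL_ID, id_len) == 0)
            return sector;
    }
    return -1;
}
```

Because the search runs on the in-memory buffer, no further disk read is
issued for this step once the large read has completed.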
| |
There are config settings to enable aio, and to configure
the concurrency and read size.
| |
The interface consists of:
- A context struct, one for the entire command.
- An io struct, one per io operation (read).
- dev_async_context_setup() creates an aio context.
- dev_async_context_destroy() destroys an aio context.
- dev_async_alloc_ios() allocates a specified number of io structs,
along with an associated buffer for the data.
- dev_async_free_ios() frees all the allocated io structs+buffers.
- dev_async_io_get() gets an available io struct from those
allocated in alloc_ios. If none are available, it will allocate
a new io struct if under limit.
- dev_async_io_put() puts a used io struct back into the set
of unused io structs, making it available for get.
- dev_async_read_submit() starts an async read io.
- dev_async_getevents() collects async io completions.
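The get/put behaviour of the io structs can be modeled with a simple
free list. All names below (struct io_model, struct io_pool, pool_get,
pool_put, pool_free) are hypothetical; this is a sketch of the described
behaviour, not the dev_async_* implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical io struct: one per read, with its own data buffer. */
struct io_model {
    struct io_model *next;  /* link in the list of unused io structs */
    char *buf;
    size_t buf_len;
};

/* Hypothetical pool of io structs. */
struct io_pool {
    struct io_model *free_list;  /* unused io structs */
    int allocated;               /* io structs created so far */
    int limit;                   /* maximum number allowed */
    size_t buf_len;              /* buffer size for each io struct */
};

static struct io_model *io_new(size_t buf_len)
{
    struct io_model *io = calloc(1, sizeof(*io));

    if (!io)
        return NULL;
    io->buf = malloc(buf_len);
    if (!io->buf) {
        free(io);
        return NULL;
    }
    io->buf_len = buf_len;
    return io;
}

/* Get an unused io struct; allocate a new one only while under the limit. */
struct io_model *pool_get(struct io_pool *p)
{
    struct io_model *io;

    assert(p != NULL);

    io = p->free_list;
    if (io) {
        p->free_list = io->next;
        io->next = NULL;
        return io;
    }
    if (p->allocated >= p->limit)
        return NULL;
    io = io_new(p->buf_len);
    if (io)
        p->allocated++;
    return io;
}

/* Put a used io struct back into the set of unused ones. */
void pool_put(struct io_pool *p, struct io_model *io)
{
    assert(p != NULL && io != NULL);
    io->next = p->free_list;
    p->free_list = io;
}

/* Free every io struct, and its buffer, that is currently unused. */
void pool_free(struct io_pool *p)
{
    assert(p != NULL);
    while (p->free_list) {
        struct io_model *io = p->free_list;

        p->free_list = io->next;
        free(io->buf);
        free(io);
        p->allocated--;
    }
}
```

In this model a caller that hits the limit gets NULL back and would have
to wait for a completion and a put before issuing another read.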
| |
These two methods might be useful for debugging, but are not testing
anything.
| |
Commit 763db8aab02d7df551a3e8500d261ef6c9651bdb rejects 2-legged
conversions to striped/raid0 but different messages are displayed
for raid0 or striped. This commit provides the same rejection messages.
| |
raid4/5 LVs may only be converted to striped or raid0/raid0_meta
in case they have at least 3 legs. 2-legged raid4/5 are a result
of either converting a raid1 to raid4/5 (takeover) or converting
a raid4/5 with more than 2 legs to raid1 with 2 legs (reshape).
The raid4/5 personalities map those as raid1,
thus reject conversion to striped/raid0.
Resolves: rhbz1511047
| |
raid does not currently create raid arrays with a bigger regionsize,
so just use a smaller regionsize.
| |
Systemd 222 has a bug where it sometimes unpredictably unmounts a
just-mounted device. Skip testing when this happens.
| |
pvcreate with 2MDAs needs some extra space.
| |
Make more obvious the operation just got delayed
(using same wording as with thin snapshots)
| |
Coverity cannot do a deeper analysis, so make its reports
go away by initializing the variables to 0.
| |
Coverity reported these are no longer in use.
| |
Use display_lvname and update thin snapshot merge error message.
| |
Use some more "" for bash vars
| |
Do not use lvm's @SBINDIR@ for the mdadm path.
Set this directly to /sbin/mdadm like other tools.
Group them separately.
| |
TODO: it likely should be checked value is >0...
| |
When security_level was set, allocated filename was leaking.
| |
In an HA cluster, we have the "clvm" resource agent to manage the clvmd daemon.
The agent invokes clvmd like: "clvmd -T90 -d0", which always prints
a scary error message:
"""
local socket: connect failed: No such file or directory
"""
When specified with the "-d" option, clvmd checks whether an instance
of the clvmd daemon is already running by making a test connection.
When no instance is running, connect() fails with ENOENT, so suppress
the error message in that case.
TODO: add missing error reaction code, since after log_error the program
is not supposed to continue running (log_error() is for reporting
stopping problems).
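The test connection described above can be sketched with plain POSIX
socket calls. The function name probe_local_socket() and its return
convention are assumptions made for this sketch; it is not the clvmd
code.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Probe a local (AF_UNIX) socket path.
 * Returns 1 if something accepted the connection, 0 if the socket file
 * does not exist (ENOENT, treated as "no daemon running", nothing is
 * logged), and -1 for any other failure, which a caller would report.
 */
int probe_local_socket(const char *path)
{
    struct sockaddr_un addr;
    int fd, rc;

    assert(path != NULL);

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        rc = 1;
    else if (errno == ENOENT)
        rc = 0;   /* expected when no instance is running: stay quiet */
    else
        rc = -1;  /* unexpected: worth an error message */

    close(fd);
    return rc;
}
```

Keeping ENOENT separate from other errno values is what lets the
expected "not running" case stay silent while real failures are still
reported.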
Signed-off-by: Eric Ren <zren@suse.com>
| |
Check and prevent starting another snapshot merge before
the existing merge has finished.
TODO: we can possibly implement smarter logic to drop the existing
merge and start a new one.
| |
Test will now fail rather than warn if conditions are not met.
| |
The lvchange-raid[456].sh test checks that mismatches can be detected
properly. It does this by writing garbage to the back half of one of
the legs directly. When performing a "check" or "repair" of mismatches,
MD does a good job going directly to disk and bypassing any buffers that
may prevent it from seeing mismatches. However, in the case of RAID4/5/6
we have the stripe cache to contend with and this is not bypassed. Thus,
mismatches which have /just/ happened to an area that now populates the
stripe cache may be overlooked. This isn't a serious issue, however,
because the stripe cache is short-lived and reasonably small. So, while
there may be a small window of time between the disk changing underneath
the RAID array and when you run a "check"/"repair" - causing a mismatch
to be missed - that would be no worse than if a user had simply run a
"check" a few seconds before the disk changed. IOW, it simply isn't worth
making a fuss over dropping the stripe cache before beginning a "check" or
"repair" (which we actually did attempt to do a while back).
So, to get the test running smoothly, we simply deactivate and reactivate
the LV to force the stripe cache to be dropped and then proceed. We could
just as easily wait a few seconds for the stripe cache to empty also.
| |
When a "recover" is just starting for a RAID LV, it is possible to get
"idle" for the sync action if the status is issued quickly enough. This
is fine, the MD thread just hasn't gotten things going yet. However,
the /need/ for a "recover" should be marked in md->recovery and it would
be simple enough to fix the kernel so this doesn't happen. May eventually
want a separate bug for this, but for now it fits with RHBZ 1507719.
| |
We always preferred and recommended socket activation for our services
so remove the Install section in related .service units which are unused
in this case and keep only the Install section in associated .socket
units.
Signed-off-by: Bastian Blank <waldi@debian.org>
| |
Replace with common macro.
| |
Just add some dots to messages and remove unneeded
stack trace from return after log_error.
| |
Since vg_validate() now rejects LVs without segments, and
insert_layer_for_segments_on_pv() gets a just-created
'layer_lv' without a segment, the LV needs to be hidden
from vg->lvs while _align_segment_boundary_to_pe_range() runs,
because that function calls lv_validate() and now requires
the vg to be consistent. The LV is then put back into vg->lvs.
| |
There are two known bugs in the lvconvert-raid-status-validation.sh
test. The first one I consider to be more of an annoyance (1507719).
The second one I consider to be more serious (1507729).
RHBZ 1507719 simply documents the fact that the three RAID status
fields may not always be coherent due to the way they are set and
unset when the MD thread is shutting down and starting up. For
example, the sync ratio may be 100% but the sync action may not
yet have switched to "idle" and the health characters may not yet
all be 'A's (i.e. the devices set to InSync).
RHBZ 1507729 is more serious. The sync ratio can be 100% for a
short period of time after upconverting linear -> RAID1. It is
reset to 0 once the MD sync thread gets to work on it. It does
this because, technically, the array /is/ in-sync if the new
devices are excluded - i.e. the data is 100% available and
consistent. I'm not sure what to do about this problem, but we'd
much rather not have this state that looks exactly like the
end of the process when the sync ratio is 100% because the
"recover" process finished, but the sync action and health
characters haven't been updated yet. Put simply, the problem
is that we can't tell if a sync is starting or finished based
on the status output.
| |
Since 4fa5add6b1bd4d7f7313f2950021a09e4130ad08 ("pvcreate: Wipe cached
bootloaderarea when wiping label.") label_remove is responsible
for the lvmcache_del. (toollib and liblvm need fixing to share
the code.)
| |
Commit 04244107732feb5274bc24efed428a0d4ddae8f6 mistakenly included
this unwanted local modification of the test. Revert it.
| |
New validation code, which requires that no LV is stored without a size
(no segments), revealed that this size setup code needs to happen
earlier.
| |
Preload the reiserfs module for the case where the fs is compiled for the
kernel but is not loaded in memory.
Size reduction needs --yes confirmation to proceed for reiserfs.
| |
Before accessing content make sure LV has segment.
This can be used in case code removes LV without segments
(i.e. on some error path)