Commit message | Author | Age | Files | Lines
* lvmcache: simplify metadata cache [dev-dct-new-scan-29] | David Teigland | 2017-11-09 | 14 | -343/+260
    The copy of VG metadata stored in lvmcache was not being used in general. It pretended to be a generic VG metadata cache, but was not being used except for clvmd activation. There it was used to avoid reading from disk while devices were suspended, i.e. in resume.

    This removes the code that attempted to make this look like a generic metadata cache, and replaces it with something narrowly targeted to what it's actually used for.

    This is a way of passing the VG from suspend to resume in clvmd. Since in the case of clvmd one caller can't simply pass the same VG to both suspend and resume, suspend needs to stash the VG somewhere that resume can grab it from. (resume doesn't want to read it from disk since devices are suspended.) The lvmcache vginfo struct is used as a convenient place to stash the VG to pass it from suspend to resume, even though it isn't related to the lvmcache or vginfo. These suspended_vg* vginfo fields should not be used or touched anywhere else; they are only to be used for passing the VG data from suspend to resume in clvmd. The VG data being passed between suspend and resume is never modified, and will only exist in the brief period between suspend and resume in clvmd.

    suspend has both old (current) and new (precommitted) copies of the VG metadata. It stashes both of these in the vginfo prior to suspending devices. When vg_commit is successful, it sets a flag in vginfo as before, signaling the transition from old to new metadata.

    resume grabs the VG stashed by suspend. If the vg_commit happened, it grabs the new VG, and if the vg_commit didn't happen it grabs the old VG. The VG is then used to resume LVs.

    This isolates clvmd-specific code and usage from the normal lvm vg_read code, making the code simpler and the behavior easier to verify.

    Sequence of operations:

    - lv_suspend() has both vg_old and vg_new and stashes a copy of each onto the vginfo:
        lvmcache_save_suspended_vg(vg_old);
        lvmcache_save_suspended_vg(vg_new);

    - vg_commit() happens, which causes all clvmd instances to call lvmcache_commit_metadata(vg). A flag is set in the vginfo indicating the transition from the old to new VG:
        vginfo->suspended_vg_committed = 1;

    - lv_resume() needs either vg_old or vg_new to use in resuming LVs. It doesn't want to read the VG from disk since devices are suspended, so it gets the VG stashed by lv_suspend:
        vg = lvmcache_get_suspended_vg(vgid);

      If the vg_commit did not happen, suspended_vg_committed will not be set, and in this case, lvmcache_get_suspended_vg() will return the old VG instead of the new VG, and it will resume LVs based on the old metadata.
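The suspend/commit/resume hand-off described above can be sketched as a small stash with two slots and a flag. This is only an illustration of the selection logic: the struct and function names ending in `_sketch` are hypothetical stand-ins, not the lvmcache API, and the real code stashes full VG structs in the vginfo rather than the toy struct used here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for VG metadata; only seqno matters here. */
struct vg_sketch {
    const char *name;
    int seqno;
};

/* Hypothetical stand-in for the suspended_vg* fields in vginfo. */
struct stash_sketch {
    const struct vg_sketch *vg_old; /* current metadata */
    const struct vg_sketch *vg_new; /* precommitted metadata */
    int committed;                  /* set once vg_commit succeeds */
};

/* suspend: stash both copies before devices are suspended. */
void stash_save_sketch(struct stash_sketch *s,
                       const struct vg_sketch *vg_old,
                       const struct vg_sketch *vg_new)
{
    assert(s != NULL && vg_old != NULL && vg_new != NULL);
    s->vg_old = vg_old;
    s->vg_new = vg_new;
    s->committed = 0;
}

/* vg_commit: record the transition from old to new metadata. */
void stash_commit_sketch(struct stash_sketch *s)
{
    s->committed = 1;
}

/* resume: pick the new VG only if the commit happened. */
const struct vg_sketch *stash_get_sketch(const struct stash_sketch *s)
{
    return s->committed ? s->vg_new : s->vg_old;
}
```

If resume runs without an intervening commit, `stash_get_sketch()` returns the old copy, which mirrors the fallback described in the last step.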
* label_scan: remove extra label scan and read for orphan PVs | David Teigland | 2017-11-09 | 5 | -188/+54
    When process_each_pv() calls vg_read() on the orphan VG, the internal implementation was doing an unnecessary lvmcache_label_scan() and two unnecessary label_read() calls on each orphan. Some of those unnecessary label scans/reads would sometimes be skipped due to caching, but the code was always doing at least one unnecessary read on each orphan.

    The common format_text case was also unnecessarily calling into the format-specific pv_read() function which actually did nothing.

    By analyzing each case in which vg_read() was being called on the orphan VG, we can say that all of the label scans/reads in vg_read_orphans are unnecessary:

    1. reporting commands: the information saved in lvmcache by the original label scan can be reported. There is no advantage to repeating the label scan on the orphans a second time before reporting it.

    2. pvcreate/vgcreate/vgextend: these all share a common implementation in pvcreate_each_device(). That function already rescans labels after acquiring the orphan VG lock, which ensures that the command is using valid lvmcache information.
* vgcreate: improve the use of label_scan | David Teigland | 2017-11-09 | 1 | -8/+20
    The old code was doing unnecessary label scans when checking to see if the new VG name exists. A single label_scan is sufficient if it is done after the new VG lock is held.
* lvmetad: use new label_scan for update from pvscan | David Teigland | 2017-11-09 | 4 | -18/+82
    Take advantage of the common implementation with aio and reduced disk reads.
* lvmetad: use new label_scan for update from lvmlockd | David Teigland | 2017-11-09 | 1 | -136/+301
    When lvmlockd indicates that the lvmetad cache is out of date because of changes by another node, lvmetad_pvscan_vg() rescans the devices in the VG to update lvmetad. Use the new label_scan in this function to use the common code and take advantage of the new aio and reduced reads.
* label_scan/vg_read: use label_read_data to avoid disk reads | David Teigland | 2017-11-09 | 20 | -182/+367
    The new label_scan() function reads a large buffer of data from the start of the disk, and saves it so that multiple structs can be read from it. Previously, only the label_header was read from this buffer, and the code which needed data structures that immediately followed the label_header would read those from disk separately. This created a large number of small, unnecessary disk reads.

    In each place that the two read paths (label_scan and vg_read) need to read data from disk, first check if that data is already available from the label_read_data buffer, and if so just copy it from the buffer instead of reading from disk.

    Code changes
    ------------

    - passing the label_read_data struct down through both read paths to make it available.

    - before every disk read, first check if the location and size of the desired piece of data exists fully in the label_read_data buffer, and if so copy it from there. Otherwise, use the existing code to read the data from disk.

    - adding some log_error messages on existing error paths that were already being updated for the reasons above.

    - using similar naming for parallel functions on the two parallel read paths that are being updated above.

      label_scan path calls: read_metadata_location_summary, text_read_metadata_summary
      vg_read path calls: read_metadata_location_vg, text_read_metadata_file

      Previously, those functions were named:

      label_scan path calls: vgname_from_mda, text_vgsummary_import
      vg_read path calls: _find_vg_rlocn, text_vg_import_fd

    I/O changes
    -----------

    In the label_scan path, the following data is either copied from label_read_data or read from disk for each PV:

    - label_header and pv_header
    - mda_header (in _raw_read_mda_header)
    - vg metadata name (in read_metadata_location_summary)
    - vg metadata (in config_file_read_fd)

    Total of 4 reads per PV in the label_scan path.

    In the vg_read path, the following data is either copied from label_read_data or read from disk for each PV:

    - mda_header (in _raw_read_mda_header)
    - vg metadata name (in read_metadata_location_vg)
    - vg metadata (in config_file_read_fd)

    Total of 3 reads per PV in the vg_read path.

    For a common read/reporting command, each PV will be:

    - read by the command's initial lvmcache_label_scan()
    - read by lvmcache_label_rescan_vg() at the start of vg_read()
    - read by vg_read()

    Previously, this would cause 11 synchronous disk reads per PV: 4 from lvmcache_label_scan(), 4 from lvmcache_label_rescan_vg() and 3 from vg_read().

    With this commit's optimization, there are now 2 async disk reads per PV: 1 from lvmcache_label_scan() and 1 from lvmcache_label_rescan_vg().

    When a second mda is used on a PV, it is located at the end of the PV. This second mda and copy of metadata will not be found in the label_read_data buffer, and will always require separate disk reads.
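The "check the saved buffer before every disk read" step can be sketched as a bounds check followed by a copy. This is a minimal sketch under stated assumptions: `struct read_buf_sketch` and `copy_from_saved_sketch()` are hypothetical names, and the field layout is not that of the real label_read_data struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a saved read buffer covering
 * device bytes [start, start + len). */
struct read_buf_sketch {
    uint64_t start;   /* device offset of buf[0] */
    size_t len;       /* number of valid bytes in buf */
    const char *buf;  /* data saved by the initial large read */
};

/* Copy `size` bytes at device offset `offset` into `out` if the whole
 * range lies inside the saved buffer. Returns 1 when the copy was done,
 * 0 when the caller must fall back to a disk read. */
int copy_from_saved_sketch(const struct read_buf_sketch *rb,
                           uint64_t offset, size_t size, void *out)
{
    uint64_t rel;

    assert(out != NULL);
    if (rb == NULL || rb->buf == NULL)
        return 0;
    if (offset < rb->start)
        return 0;                 /* range starts before the buffer */
    rel = offset - rb->start;
    if (rel > rb->len || size > rb->len - rel)
        return 0;                 /* range runs past the buffer end */
    memcpy(out, rb->buf + rel, size);
    return 1;
}
```

A range that is only partly covered is treated as a miss, which matches the "exists fully" condition in the commit message. A second mda at the end of the PV would always take the fallback path in this model.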
* independent metadata areas: fix bogus code | David Teigland | 2017-11-09 | 1 | -1/+3
    Fix an expression that mixed bitwise & and logical &&, and that always evaluated to 1 in any case.
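The commit message does not quote the offending expression, so the following is only a generic illustration of why confusing `&` with `&&` can produce a test that is always true. The flag names and helper functions are hypothetical and are not taken from the lvm2 source.

```c
#include <assert.h>

/* Hypothetical flag bits, for illustration only. */
#define SKETCH_FLAG_A 0x1u
#define SKETCH_FLAG_B 0x2u

/* Bitwise test: true only when a bit of `mask` is set in `flags`. */
int test_bitwise_sketch(unsigned flags, unsigned mask)
{
    return (flags & mask) != 0;
}

/* Logical test: true whenever both operands are nonzero,
 * regardless of which bits are set. */
int test_logical_sketch(unsigned flags, unsigned mask)
{
    return flags && mask;
}

/* FLAG_B is not set in FLAG_A, yet the logical form reports 1. */
void demo_sketch(void)
{
    assert(test_bitwise_sketch(SKETCH_FLAG_A, SKETCH_FLAG_B) == 0);
    assert(test_logical_sketch(SKETCH_FLAG_A, SKETCH_FLAG_B) == 1);
}
```

With a nonzero constant mask, the logical form reduces to "are any flags set at all", which is the kind of condition that ends up true in every practical case.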
* label_scan: fix independent metadata areas | David Teigland | 2017-11-09 | 1 | -1/+23
    This fixes the use of lvmcache_label_rescan_vg() in the previous commit for the special case of independent metadata areas.

    label scan is about discovering VG name to device associations using information from disks, but devices in VGs with independent metadata areas have no information on disk, so the label scan does nothing for these VGs/devices. With independent metadata areas, only the VG metadata found in files is used. This metadata is found and read in vg_read in the processing phase.

    lvmcache_label_rescan_vg() drops lvmcache info for the VG devices before repeating the label scan on them. In the case of independent metadata areas, there is no metadata on devices, so the label scan of the devices will find nothing, so will not recreate the necessary vginfo/info data in lvmcache for the VG. Fix this by setting a flag in the lvmcache vginfo struct indicating that the VG uses independent metadata areas, and label rescanning should be skipped.

    In the case of independent metadata areas, it is the metadata processing in the vg_read phase that sets up the lvmcache vginfo/info information, and label scan has no role.
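The skip-on-flag behaviour can be modelled in a few lines. This is a toy model, assuming a hypothetical `independent_metadata` field and a simulated scan; it shows only why dropping the cached info before a rescan that cannot find anything would lose state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant vginfo state. */
struct vginfo_sketch {
    int independent_metadata; /* VG metadata lives in files, not on devices */
    int has_info;             /* 1 while cached info exists for the VG */
    int rescans;              /* number of device rescans performed */
};

/* Simulated rescan of the devices in one VG. */
void rescan_vg_sketch(struct vginfo_sketch *v)
{
    assert(v != NULL);
    if (v->independent_metadata)
        return;        /* nothing on disk to find: keep the cached info */
    v->has_info = 0;   /* drop cached info for the VG's devices */
    v->rescans++;
    v->has_info = 1;   /* the scan recreates it from on-disk metadata */
}
```

For a VG with the flag set, the function returns before dropping anything, so the info set up by the vg_read phase survives.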
* label_scan: move to start of command | David Teigland | 2017-11-09 | 5 | -55/+71
    LVM's general design for scanning/reading of metadata from disks is that a command begins with a discovery phase, called "label scan", in which it discovers which devices belong to lvm, what VGs exist on those devices, and which devices are associated with each VG. After this comes the processing phase, which is based around processing specific VGs. In this phase, lvm acquires a lock on the VG, and rescans the devices associated with that VG, i.e. it repeats the label scan steps on the devices in the VG in case something has changed between the initial label scan and taking the VG lock. This ensures that the command is processing the latest, unchanging data on disk.

    This commit moves the location of these label scans to make them clearer and avoid unnecessary repeated calls to them.

    Previously, the initial label scan was called as a side effect from various utility functions. This would lead to it being called unnecessarily. It is an expensive operation, and should only be called when necessary. Also, this is a primary step in the function of the command, and as such it should be called prominently at the top level of command processing, not as a hidden side effect of a utility function. lvm knows exactly where and when the label scan needs to be done. Because of this, move the label scan calls from the internal functions to the top level of processing.

    Other specific instances of lvmcache_label_scan() are still called unnecessarily or unclearly by specific commands that do not use the common process_each functions. These will be improved in future commits.

    During the processing phase, rescanning labels for devices in a VG needs to be done after the VG lock is acquired in case things have changed since the initial label scan. This was being done by way of rescanning devices that had the INVALID flag set in lvmcache. This usually approximated the right set of devices, but it was not exact, and obfuscated the real requirement. Correct this by using a new function that rescans the devices in the VG: lvmcache_label_rescan_vg().

    Apart from being inexact, the rescanning was extremely well hidden. _vg_read() would call ->create_instance(), _text_create_text_instance(), _create_vg_text_instance() which would call lvmcache_label_scan() which would call _scan_invalid() which repeats the label scan on devices flagged INVALID. lvmcache_label_rescan_vg() is now called prominently by _vg_read() directly.
* label_scan: call new label_scan from lvmcache_label_scan | David Teigland | 2017-11-09 | 5 | -65/+180
    To do label scanning, lvm code calls lvmcache_label_scan(). Change lvmcache_label_scan() to use the new label_scan() which can use async io, rather than implementing its own dev iter loop and calling the synchronous label_read() on each device.

    Also add lvmcache_label_rescan_vg() which calls the new label_scan_devs() which does label scanning on only the specified devices. This is for a subsequent commit and is not yet used.
* label_scan: add new implementation for async and sync | David Teigland | 2017-11-09 | 3 | -57/+1509
    This adds the implementation without using it in the code. The code still calls label_read() on each individual device to do scanning.

    Calling the new label_scan() will use async io if async io is enabled in config settings. If not enabled, or if async io fails, label_scan() will fall back to using synchronous io. If only some aio ops fail, the code will attempt to perform synchronous io on just the ios that failed.

    Uses linux native aio system calls, not the posix wrappers which are messier and may not have all the latest linux capabilities.

    Internally, the same functionality is used as before:

    - iterate through each visible device on the system, provided from dev-cache

    - call _find_label_header on the dev to find the sector containing the label_header

    - call _text_read to look at the pv_header and mda locations after the pv_header

    - for each mda location, read the mda_header and the vg metadata

    - add info/vginfo structs to lvmcache which associate the device name (info) with the VG name (vginfo) so that vg_read can know which devices to read for a given VG name

    The new label scanning issues a "large" read beginning at the start of the device, where large is configurable, but intended to cover all the labels/headers/metadata that is located at the start of the device. This large data buffer from each device is saved in a global list using a new 'label_read_data' struct. Currently, this buffer is only used to find the label_header from the first four sectors of the device. In subsequent commits, other functions that read other structs/metadata will first try to find that data in the saved label_read_data buffer. In most common cases, the data they need can simply be copied out of the existing buffer, and they can avoid issuing another disk read to get it.
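The fallback policy described above (async when enabled, synchronous for just the ios that failed, synchronous for everything when aio is off) can be sketched as a control-flow model. This does not issue real I/O: the `async_fails` and `sync_fails` fields are simulation knobs, and all names ending in `_sketch` are hypothetical rather than the label_scan() internals.

```c
#include <assert.h>
#include <stddef.h>

enum scan_result_sketch { SCAN_ASYNC, SCAN_SYNC, SCAN_FAILED };

/* One simulated read. */
struct scan_io_sketch {
    int async_fails;                 /* simulate an async failure */
    int sync_fails;                  /* simulate a sync failure */
    enum scan_result_sketch result;  /* how the read was completed */
};

/* Simulated async submit: 0 on success, -1 on failure. */
int submit_async_sketch(const struct scan_io_sketch *io)
{
    return io->async_fails ? -1 : 0;
}

/* Simulated synchronous read: 0 on success, -1 on failure. */
int read_sync_sketch(const struct scan_io_sketch *io)
{
    return io->sync_fails ? -1 : 0;
}

/* Try async first when enabled; retry only the failed ios synchronously.
 * Returns the number of reads that completed either way. */
size_t label_scan_sketch(struct scan_io_sketch *ios, size_t n, int aio_enabled)
{
    size_t i, ok = 0;

    assert(ios != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (aio_enabled && submit_async_sketch(&ios[i]) == 0) {
            ios[i].result = SCAN_ASYNC;
            ok++;
            continue;
        }
        if (read_sync_sketch(&ios[i]) == 0) {
            ios[i].result = SCAN_SYNC;
            ok++;
        } else {
            ios[i].result = SCAN_FAILED;
        }
    }
    return ok;
}
```

The point of the per-io fallback is that one failed async op does not force the whole scan back to synchronous reads.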
* command: add settings to enable async io | David Teigland | 2017-11-09 | 5 | -0/+49
    There are config settings to enable aio, and to configure the concurrency and read size.
* io: add low level async io support | David Teigland | 2017-11-09 | 11 | -0/+484
    The interface consists of:

    - A context struct, one for the entire command.

    - An io struct, one per io operation (read).

    - dev_async_context_setup() creates an aio context.

    - dev_async_context_destroy() destroys an aio context.

    - dev_async_alloc_ios() allocates a specified number of io structs, along with an associated buffer for the data.

    - dev_async_free_ios() frees all the allocated io structs+buffers.

    - dev_async_io_get() gets an available io struct from those allocated in alloc_ios. If none are available, it will allocate a new io struct if under limit.

    - dev_async_io_put() puts a used io struct back into the set of unused io structs, making it available for get.

    - dev_async_read_submit() starts an async read io.

    - dev_async_getevents() collects async io completions.
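The get/put behaviour of the io struct pool (reuse a free struct, grow while under the limit, otherwise report none available) can be sketched with a fixed array. This is an assumption-laden model: the names, the 512-byte buffer, and the hard cap of 8 are invented for the example, and the real dev_async_* functions have their own signatures, which are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IO_POOL_MAX_SKETCH 8   /* hard cap chosen for the example */

/* One io struct with its associated data buffer. */
struct io_sketch {
    int in_use;
    char buf[512];
};

struct io_pool_sketch {
    struct io_sketch ios[IO_POOL_MAX_SKETCH];
    int allocated;  /* io structs handed out at least once or preallocated */
    int limit;      /* maximum number the pool may grow to */
};

/* Set up a pool with `prealloc` structs ready and room to grow to `limit`. */
void io_pool_init_sketch(struct io_pool_sketch *p, int prealloc, int limit)
{
    assert(p != NULL);
    memset(p, 0, sizeof(*p));
    if (limit > IO_POOL_MAX_SKETCH)
        limit = IO_POOL_MAX_SKETCH;
    if (limit < 0)
        limit = 0;
    if (prealloc > limit)
        prealloc = limit;
    if (prealloc < 0)
        prealloc = 0;
    p->allocated = prealloc;
    p->limit = limit;
}

/* Get a free io struct, growing the pool if under the limit. */
struct io_sketch *io_get_sketch(struct io_pool_sketch *p)
{
    int i;

    for (i = 0; i < p->allocated; i++) {
        if (!p->ios[i].in_use) {
            p->ios[i].in_use = 1;
            return &p->ios[i];
        }
    }
    if (p->allocated < p->limit) {
        struct io_sketch *io = &p->ios[p->allocated++];
        io->in_use = 1;
        return io;
    }
    return NULL; /* all structs busy and the limit is reached */
}

/* Return a used io struct to the pool, making it available for get. */
void io_put_sketch(struct io_sketch *io)
{
    io->in_use = 0;
}
```

A caller that receives NULL would wait for completions (the role of the getevents step) and put finished ios back before trying again.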
* WHATS_NEW: ignore stripes/stripesize on RAID takover | Heinz Mauelshagen | 2017-10-26 | 1 | -0/+1
* lvcreate: error message with dot. | Heinz Mauelshagen | 2017-10-26 | 1 | -1/+1
* raid: ignore --stripes/--stripesize on takeover | Heinz Mauelshagen | 2017-10-26 | 1 | -3/+22
    When converting from one raid level to another, no changes of stripes or stripesize can be requested because those are subject to reshaping. I.e. the process requires a takeover first, followed by a separate request for raid algorithm, stripe or stripesize changes.

    Ignore any such requested changes, display warnings, and proceed with the takeover.

    Without this patch, a takeover requesting a stripesize change causes data corruption!
* tests: better clustering support | Zdenek Kabelac | 2017-10-26 | 2 | -2/+2
    Use exclusive activation for snapshot conversion since we can only convert exclusively active volumes.
* tests: allow override of LVM_LOG_FILE_MAX_LINES | Zdenek Kabelac | 2017-10-26 | 1 | -1/+1
    Just like with other vars, support this:

        make check_local T=xyz LVM_LOG_FILE_MAX_LINES=10000000

    This makes it easy to override the existing line limit.

    Also increase the size limit of logs per command, since some of our commands are becoming very verbose.
* Makefile: help shows hint about LVM_LOG_FILE_MAX_LINES | Zdenek Kabelac | 2017-10-26 | 1 | -2/+3
* log: better message when reached log limit | Zdenek Kabelac | 2017-10-26 | 1 | -3/+8
    Add an explanatory message when a command is aborted because it reached the configured line count limit (LVM_LOG_FILE_MAX_LINES) for logging (used mainly with testing).
* WHATS_NEW: missed | Zdenek Kabelac | 2017-10-26 | 1 | -0/+1
    The last patch failed to mention that we've improved/fixed the generated paths in units and init.d shell scripts when lvm2 was plainly configured with just e.g. --prefix.

    Note: some distros might have fully specified --sbindir and --usrsbindir, thus they were not seeing problems in generated paths.
* commands: drop secondary for lvconvert --type snapshot | Zdenek Kabelac | 2017-10-25 | 1 | -1/+0
    Both forms were marked as secondary, thus none of the supported syntax entered the man page. This restores the appearance of snapshot conversion in the man page.
* shellcheck: some apostrophe changes and cleanups | Zdenek Kabelac | 2017-10-25 | 9 | -61/+59
* scripts: paths update | Zdenek Kabelac | 2017-10-25 | 8 | -32/+31
    Correct the usage of sbindir for scripts as well, so the path no longer needs further vars like exec_prefix & prefix to be resolved.
* systemd: use proper sbindir path | Zdenek Kabelac | 2017-10-25 | 13 | -19/+19
    Replace lowercase @sbindir@ with @SBINDIR@, which contains the fully decoded path. Do the same with @usrsbindir@, which is also used with clvmd and cmirrord. Also handle SYSCONFDIR for EnvironmentFile.

    This patch fixes generated unit files containing strings like:

        ExecStart=${exec_prefix}/sbin/lvm
* configure: improve support for sbindir path | Zdenek Kabelac | 2017-10-25 | 2 | -39/+57
    Introduce a few more AC_SUBST vars for use in *.in generation. In some cases we want to replace e.g. $sbindir with the full path instead of the current ${exec_prefix}/sbin.

    This patch provides:

        USRSBINDIR
        SBINDIR
        DEFAULT_SYS_LOCK_DIR
        SYSCONFDIR

    At the same time, properly use sbindir & usrsbindir with lvm, fsadm, clvmd from one primary definition.
* tests: snapshot conversions | Zdenek Kabelac | 2017-10-25 | 4 | -0/+273
    Add missing tests for snapshot conversions.
* typo: fix invalid | Zdenek Kabelac | 2017-10-25 | 1 | -1/+1
* snapshot: improve validation | Zdenek Kabelac | 2017-10-25 | 3 | -2/+13
    Do not allow taking a snapshot of a mirror/raid leg, log or metadata LV. This was actually never supported, but the user was able to create one, and this put the device stack in a state that was hard to fix (it needed manual work).

    This prevents such a creation from passing. Also improve validation when recreating the snapshot volume type from origin and COW volume.
* clean-up: Correct the comment to match the particular test case | Jonathan Brassow | 2017-10-24 | 1 | -2/+2
* tests:check lvconvert with /dev in vglvname | Zdenek Kabelac | 2017-10-24 | 1 | -0/+8
* lvconvert: fixing extraction of vgname | Zdenek Kabelac | 2017-10-24 | 2 | -69/+39
    Correction to the function for extracting the vgname out of lvconvert parameters.

    Avoid repeating some checks.

    Add code to handle generic options which may provide a vgname in their argument, and compare them all so they match a single vgname (otherwise it's an error).

    Extract the default (envvar) vgname only when neither a positional nor an optional vgname is found.

    Fixes a regression introduced with the patchset started with commit: 1e2420bca85da9a37570871cd70192e9ae831786 (2.02.169)
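The matching rule described above (every vgname that is given must agree, and the default is consulted only when none is given) can be sketched as follows. `pick_vgname_sketch()` is a hypothetical helper written for this illustration; it is not the lvconvert function, and how the real code gathers the candidate names is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pick the single VG name implied by `names` (entries may be NULL when an
 * argument carried no vgname). Falls back to `default_vg` only when no
 * candidate names a VG. Returns 0 and stores the name in *out on success,
 * -1 when candidates disagree or nothing is available. */
int pick_vgname_sketch(const char **names, size_t n,
                       const char *default_vg, const char **out)
{
    const char *found = NULL;
    size_t i;

    assert(out != NULL);
    for (i = 0; i < n; i++) {
        if (names[i] == NULL)
            continue;
        if (found == NULL)
            found = names[i];
        else if (strcmp(found, names[i]) != 0)
            return -1;           /* two different VG names: an error */
    }
    if (found == NULL)
        found = default_vg;      /* e.g. taken from an environment variable */
    if (found == NULL)
        return -1;
    *out = found;
    return 0;
}
```

Checking the default last means an explicit vgname always wins, and a stale default cannot cause a spurious mismatch.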
* fsadm: refactor resize_crypt function | Ondrej Kozina | 2017-10-24 | 1 | -28/+32
    Split the resize_crypt function in two:

    a) Detect the proper dm-crypt device type and compute the new --size value for the cryptsetup resize command.

    b) Perform the resize.
* fsadm: rename local variables to avoid confusion | Ondrej Kozina | 2017-10-24 | 1 | -11/+11
* test: add regression test for fsadm bug | Ondrej Kozina | 2017-10-24 | 1 | -2/+60
    The bug in the LUKS grow/shrink decision in fsadm was masked by the fact that the default LVM2 extent size was larger than the LUKS1 default data offset for the dm-crypt mapping. The new test addresses this bug.
* fsadm: fix bug in LUKS grow/shrink decision branch | Ondrej Kozina | 2017-10-24 | 1 | -1/+1
* fsadm: add luks specific error message for small devices | Ondrej Kozina | 2017-10-24 | 1 | -0/+4
* tests: check stacked cache dataLV of thin-pool | Zdenek Kabelac | 2017-10-23 | 2 | -0/+84
* lvcreate: skip checking for name restriction for caching | Zdenek Kabelac | 2017-10-23 | 2 | -1/+2
    lvcreate supports a 'conversion' when caching an LV. This normally worked fine; however, when the passed LV was a thin-pool's data LV with the suffix _tdata, we failed too early.

    The easiest fix appears to be dropping the validation of the name when the caching type is selected. The name check will happen later, once the VG is opened again, and will properly detect whether the LV with the protected name already exists and can be converted, or the operation will be rejected as ambiguous, requiring the user to specify --type cache | --type cache-pool.
* lvextend: detect stacked cache lv used for thinpool | Zdenek Kabelac | 2017-10-23 | 2 | -1/+3
    Ensure that no resize of a cache LV is attempted until full support is added.
* lvconvert: preserve names of converted LV | Zdenek Kabelac | 2017-10-23 | 2 | -11/+12
    When prompting and warning for conversion, remember the initial LV names, so that after the conversion is finished the correct original names are printed.
* test: remove 'should's from test to test target status race fix | Heinz Mauelshagen | 2017-10-19 | 1 | -8/+6
* liblvm: Move lib code used exclusively into metadata-liblvm.c | Alasdair G Kergon | 2017-10-18 | 9 | -673/+727
    Also remove some redundant function definitions from metadata.h.
* tidy: Add missing underscores to statics. | Alasdair G Kergon | 2017-10-18 | 37 | -362/+358
* libdm: fix typo in libdevmapper.pc | Zdenek Kabelac | 2017-10-18 | 2 | -1/+2
    Fixing name for RT libraries and using RT_LIBS.
* lvmlockd: check error for sanlock access to lvmlock LV | David Teigland | 2017-10-17 | 3 | -6/+18
    When the sanlock daemon does not have permission to access the lvmlock LV, make the error messages more helpful.
* device: Separate errors for dev not found and filtered. | Alasdair G Kergon | 2017-10-17 | 5 | -4/+18
    Replaced the confusing device error message "not found (or ignored by filtering)" by either "not found" or "excluded by a filter".

    (Later we should be able to say which filter.)

    Left the liblvm code paths alone.
* tests: check external origin is monitored | Zdenek Kabelac | 2017-10-16 | 1 | -0/+47
* thin: monitor also external origin | Zdenek Kabelac | 2017-10-16 | 2 | -1/+9
    Add missing monitoring for external origin LVs and add -real suffix for UUID used for monitoring of external origin.
* configure: autoreconf | Marian Csontos | 2017-10-16 | 1 | -26/+1