From 99a8bfcd7233a7f18ac191c3b3150068e8ac3a72 Mon Sep 17 00:00:00 2001
From: David Teigland
Date: Fri, 26 Jan 2018 06:50:52 -0600
Subject: doc: lvm disk reading

---
 doc/lvm-disk-reading.txt | 246 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 246 insertions(+)
 create mode 100644 doc/lvm-disk-reading.txt

diff --git a/doc/lvm-disk-reading.txt b/doc/lvm-disk-reading.txt
new file mode 100644
index 000000000..5d5e6b575
--- /dev/null
+++ b/doc/lvm-disk-reading.txt
@@ -0,0 +1,246 @@
+
+LVM disk reading
+
+Reading disks happens in two phases. The first is a discovery phase,
+which determines what's on the disks. The second is a working phase,
+which does a particular job for the command.
+
+
+Phase 1: Discovery
+------------------
+
+Read all the disks on the system to find out:
+- What are the LVM devices?
+- What VGs exist on those devices?
+
+This phase is called "label scan" (although it reads and scans everything,
+not just the label). It stores the information it discovers (what LVM
+devices exist, and what VGs exist on them) in lvmcache. The devs/VGs info
+in lvmcache is the starting point for phase two.
+
+
+Phase 1 in outline:
+
+For each device:
+
+a. Read the first N KB of the device. (N is configurable.)
+
+b. Look for the lvm label_header in the first four sectors.
+   If none exists, it's not an lvm device, so quit looking at it.
+   (By default, label_header is in the second sector.)
+
+c. Look at the pv_header, which follows the label_header.
+   This tells us the location of VG metadata on the device.
+   There can be 0, 1 or 2 copies of VG metadata. The first
+   is always at the start of the device, the second (if used)
+   is at the end.
+
+d. Look at the first mda_header (location came from pv_header
+   in the previous step). This is by default in sector 8,
+   4096 bytes from the start of the device. This tells us the
+   location of the actual VG metadata text.
+
+e.
+   Look at the first copy of the text VG metadata (location came
+   from mda_header in the previous step). This is by default
+   in sector 9, 4608 bytes from the start of the device.
+   The VG metadata is only partially analyzed to create a basic
+   summary of the VG.
+
+f. Store an "info" entry in lvmcache for this device,
+   indicating that it is an lvm device, and store a "vginfo"
+   entry in lvmcache indicating the name of the VG seen
+   in the metadata in step e.
+
+g. If the pv_header in step c shows a second mda_header
+   location at the end of the device, then read that as
+   in step d, and repeat steps e-f for it.
+
+At the end of phase 1, lvmcache will have a list of devices
+that belong to LVM, and a list of VG names that exist on
+those devices. Each device (info struct) is associated
+with the VG (vginfo struct) it is used in.
+
+If the number of KB read in step a was large enough, then
+all the structs/metadata needed in steps b-e will be found
+in the data buffer returned by step a. If a particular struct
+or piece of metadata needed in steps b-e is located outside the
+range of the initial read, then that step needs to issue its own
+read at the necessary location to get that bit of data.
+(The optional second mda_header and VG metadata in step g
+are located at the end of the device, and will always require
+an additional read.)
+
+
+Phase 1 in code:
+
+The most relevant functions are listed for each step in the outline.
+
+lvmcache_label_scan()
+label_scan()
+_label_scan_async()
+
+for each dev: dev = dev_iter_get(iter) ...
+
+a. _label_read_async_start()
+
+b. _label_read_data_process()
+   _find_label_header()
+
+c. _label_read_data_process()
+   ops->read()
+   _text_read()
+
+d. _read_mda_header_and_metadata()
+   raw_read_mda_header()
+
+e. _read_mda_header_and_metadata()
+   read_metadata_location()
+   text_read_metadata_summary()
+   config_file_read_fd()
+   ops->read_vgsummary()
+   _read_vgsummary()
+
+f.
+   _text_read(): lvmcache_add()
+   [adds this device to list of lvm devices]
+   _read_mda_header_and_metadata(): lvmcache_update_vgname_and_id()
+   [adds the VG name to list of VGs]
+
+
+Phase 1 in log messages:
+
+For each device:
+   Scanning data from all devs async
+
+a. Reading sectors from device
+
+b. Parsing label and data from device
+
+d. Copying mda header sector from ...
+   or if the mda_header needs to be read from disk:
+   Reading mda header sector from ...
+
+e. Copying metadata summary for ...
+   or if the metadata needs to be read from disk:
+   Reading metadata summary for ...
+
+f. lvmcache ...
+
+
+Phase 2: Work
+-------------
+
+This phase carries out the operation requested by the command that was
+run.
+
+Whereas the first phase is based on iterating through each device on the
+system, this phase is based on iterating through each VG name. The list
+of VG names comes from phase 1, which stored the list in lvmcache to be
+used by phase 2.
+
+Some commands may need to iterate through all VG names, while others may
+need to iterate through just one or two.
+
+This phase includes locking each VG as work is done on it, so that two
+commands do not interfere with each other.
+
+
+Phase 2 in outline:
+
+For each VG name:
+
+a. Lock the VG.
+
+b. Repeat the phase 1 scan steps for each device (PV) in this VG.
+   The phase 1 information in lvmcache may have changed because no VG lock
+   was held during phase 1. So, repeat the phase 1 steps, but only for the
+   devices in this VG.
+
+c. Get the list of on-disk metadata locations for this VG.
+   Phase 1 created this list in lvmcache to be used here. At this
+   point we copy it out of lvmcache. In the simple/common case,
+   this is a list of devices in the VG. But, some devices may
+   have 0 or 2 metadata locations instead of the default 1, so it
+   is not always equal to the list of devices. We want to read
+   every copy of the metadata for this VG.
+
+d.
+   For each metadata location on each device in the VG
+   (the list from the previous step):
+
+   1) Look at the mda_header. The location of the mda_header was saved
+      in the lvmcache info struct by phase 1 (where it came from the
+      pv_header). The mda_header tells us where the text VG metadata is
+      located.
+
+   2) Look at the text VG metadata. The location came from mda_header
+      in the previous step. The VG metadata is fully analyzed and used
+      to create an in-memory 'struct volume_group'.
+
+   Copying or reading the mda_header and VG metadata in steps d.1 and d.2
+   follows the same model as in phase 1: if the data read in scan step 2.b
+   covered these areas, then data is simply copied out of the buffer from
+   step 2.b, otherwise new reads are done.
+
+e. Compare the copies of VG metadata that were found in each location.
+   If some copies are older, choose the newest one to use, and update
+   any older copies.
+
+f. Update details about the devices/VG in lvmcache.
+
+g. Pass the 'vg' struct to the command-specific code to work with.
+
+
+Phase 2 in code:
+
+The most relevant functions are listed for each step in the outline.
+
+For each VG name:
+   process_each_vg()
+
+a. vg_read()
+   lock_vol()
+
+b. vg_read()
+   lvmcache_label_rescan_vg()
+   [insert phase 1 steps a-f]
+
+c. vg_read()
+   create_instance()
+   _text_create_text_instance()
+   _create_vg_text_instance()
+   lvmcache_fid_add_mdas_vg()
+   [Copies mda locations from info->mdas, where they were saved
+   by phase 1, into fid->metadata_areas_in_use. This is
+   the key connection between phase 1 and phase 2.]
+
+d. dm_list_iterate_items(mda, &fid->metadata_areas_in_use)
+
+d1. ops->vg_read()
+    _vg_read_raw()
+    raw_read_mda_header()
+
+d2. _vg_read_raw()
+    text_read_metadata()
+    config_file_read_fd()
+    ops->read_vg()
+    _read_vg()
+
+
+Phase 2 in log messages:
+
+For each VG name:
+   Processing VG
+   Reading VG
+
+b.
+   Reading VG rereading labels for
+   Scanning data from devs async
+   [insert log messages from phase 1 steps a-f]
+   Scanned data from devs async
+
+For each mda on each device in the VG:
+
+d. Reading VG from
+
+d.1. Copying|Reading mda header sector from ...
+
+d.2. Copying|Reading metadata from ...
+
-- 
cgit v1.2.1
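
The copy-or-read rule that both outlines rely on (phase 1 steps b-e and
phase 2 steps d.1 and d.2) can be sketched in C as follows. This is a
minimal illustration under stated assumptions, not LVM code: struct
scan_buf, fetch_region(), sector_to_bytes() and log_fetch() are
hypothetical names, and a 512-byte sector is assumed, which is consistent
with the offsets quoted in the outline (sector 8 at 4096 bytes, sector 9
at 4608 bytes).

```c
#define _XOPEN_SOURCE 700 /* for pread() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR_SIZE 512 /* assumption: matches "sector 8 = 4096 bytes" */

/* Hypothetical holder for the data returned by the initial scan read. */
struct scan_buf {
    const unsigned char *data; /* bytes read starting at device offset 0 */
    size_t len;                /* number of valid bytes in data */
};

enum fetch_source { FETCH_COPIED, FETCH_READ, FETCH_FAILED };

/* Convert a sector number to a byte offset from the start of the device. */
static uint64_t sector_to_bytes(uint64_t sector)
{
    return sector * SECTOR_SIZE;
}

/*
 * Get len bytes at device offset `offset` into out.
 * If the initial scan buffer covers the whole region, copy from it;
 * otherwise issue a separate read at that location.
 */
static enum fetch_source fetch_region(const struct scan_buf *sb, int fd,
                                      uint64_t offset, size_t len, void *out)
{
    assert(sb != NULL && out != NULL);

    /* Region lies entirely inside the buffer: no extra disk I/O. */
    if (offset <= sb->len && len <= sb->len - offset) {
        memcpy(out, sb->data + offset, len);
        return FETCH_COPIED;
    }

    /* Region is (partly) outside the buffer: read it from the device.
     * Short reads are retried; any error or EOF is reported as failure. */
    size_t done = 0;
    while (done < len) {
        ssize_t n = pread(fd, (char *)out + done, len - done,
                          (off_t)(offset + done));
        if (n <= 0)
            return FETCH_FAILED;
        done += (size_t)n;
    }
    return FETCH_READ;
}

/* Report which path was taken. The wording only loosely echoes the
 * "Copying ..." / "Reading ..." log lines and is made up for this sketch. */
static void log_fetch(FILE *out, enum fetch_source src, const char *what)
{
    const char *verb = src == FETCH_COPIED ? "Copying"
                     : src == FETCH_READ   ? "Reading"
                                           : "Failed to get";
    fprintf(out, "%s %s\n", verb, what);
}
```

With a 4 KB initial read, a request for sector 1 is served from the
buffer, while a request for sector 8 starts exactly at the end of the
buffer and therefore triggers a separate pread(). A larger initial read
would turn that second case into a copy as well, which is the trade-off
behind making the size of the first read configurable.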