author: David Teigland <teigland@redhat.com> 2019-01-02 13:32:05 -0600
committer: David Teigland <teigland@redhat.com> 2019-01-10 11:27:59 -0600
commit: a0f3b852b4dc8871584ff4799fee9999db01b4b4
tree: 89f183badd346d5907a002211d7f86030e6e6a6d
parent: df0797db8cec82df92fb1df2bace184f5a811924
Handling bad or old metadata (branch: dev-dct-repair-4)

Reading bad/old metadata
------------------------

- "bad metadata" is a copy of the metadata that has been corrupted,
  or can't be read, or has some invalid data that can't be parsed
  or understood by lvm. It's often reported as a checksum error,
  but not always. Bad metadata should be replaced with a copy of
  good metadata from another PV (or from a good copy on the same PV).

- "old metadata" is a copy of the metadata that has a smaller seqno
  than other copies of the metadata. It could happen if the device
  failed, or io failed, or lvm failed while committing new metadata
  to all the metadata areas. Old metadata on a PV that is still in
  the VG should be replaced with a copy of good metadata from
  another PV (or from a good copy on the same PV). Old metadata on
  a PV that has been removed from the VG should be erased.

When a VG has some PVs with bad/old metadata, lvm can simply ignore
the bad/old copies, and use a good copy. This is why there are
multiple copies of the metadata -- so it's available even when some
of the copies cannot be used. The bad/old copies do not have to be
repaired before the VG can be used (the repair can happen later).

A PV with no good copies of the metadata simply falls back to being
treated like a PV with no mdas, a common and harmless configuration.

When bad/old metadata exists, lvm warns the user about it, and
suggests repairing it using a new metadata repair command.
Bad/old metadata is something that users will often want to
investigate and repair themselves, since it should not generally
happen and may indicate some other problem that needs to be fixed.

PVs with bad/old metadata are not the same as missing devices.
Missing devices will block various kinds of VG modification or
activation, but bad/old metadata will not.

Previously, lvm would attempt to repair bad/old metadata whenever
it was read. This was unnecessary since lvm does not require every
copy of the metadata to be used. It would also hide potential
problems that should be investigated by the user. It was also
dangerous in cases where the VG was on shared storage. The user is
now allowed to investigate potential problems and decide how and
when to repair them.

Repairing bad/old metadata
--------------------------

When label scan sees bad metadata in an mda, that mda is removed
from the lvmcache info->mdas list. This means that vg_read will
skip it, and not attempt to read/process it again. If it was
the only in-use mda on a PV, that PV is treated like a PV with
no mdas. It also means that vg_write will skip the bad mda,
and not attempt to write new metadata to it. The only way to
repair bad metadata is with metadata repair (see below).

(We may also want to allow pvchange --metadataignore on a PV
with bad metadata.)

When label scan sees old metadata in an mda, that mda is kept
in the lvmcache info->mdas list. This means that vg_read will
read/process it again, and likely see the same mismatch with
the other copies of the metadata. Like the label_scan, the
vg_read will simply ignore the old copy of the metadata and
use the latest copy. If the command is modifying the vg
(e.g. lvcreate), then vg_write, which writes new metadata to
every mda on info->mdas, will write the new metadata to the
mda that had the problematic old version. If successful,
this will resolve the old metadata problem (without needing
to run a metadata repair command).

Repair command
--------------

A new metadata repair command replaces bad/old metadata on PVs
with a good copy of the metadata.

This will find a good copy of metadata from a PV and use it to
replace any bad/old metadata copies:

  $ vgck --repairmetadata VG

This will use metadata from the specified source PV and use it
to replace any bad/old metadata copies:

  $ vgck --repairmetadata --sourcedevice PV VG

This will find a good copy of metadata from a PV and write it
to the specified PV(s):

  $ vgck --repairmetadata VG PV ...

This will use metadata from the specified source PV and write it
to the specified PV(s):

  $ vgck --repairmetadata --sourcedevice PV VG PV ...

This will use raw metadata from the specified file and use it
to replace any bad/old metadata copies:

  $ vgck --repairmetadata --file String VG

This will use raw metadata from the specified file and write it
to the specified PV(s):

  $ vgck --repairmetadata --file String VG PV ...

The raw metadata file used by --repairmetadata above is produced
by the following command, which writes it to the specified file
(or stdout). It can be manually edited if needed, and then used
as input.

  $ vgck --dumpmetadata [--sourcedevice PV] [--file String] VG

Some of these features could potentially be incorporated into
vgcfgrestore with new options. In general vgcfgrestore has never
been a repair tool, and is not aimed at handling bad/old metadata,
but rather going between two good versions of metadata.

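The bad/old distinction in the commit message can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the `mda_copy` struct, the `copy_state` enum and `classify_copies()` are hypothetical names invented here, not lvm code, and a seqno of 0 is assumed to mean "no metadata seen". The patch itself keeps the equivalent state in the lvmcache info and vginfo structs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state of one metadata copy; not an lvm type. */
enum copy_state { COPY_GOOD, COPY_OLD, COPY_BAD };

/* Hypothetical summary of one metadata area (mda) as seen by a scan. */
struct mda_copy {
	uint32_t seqno;        /* seqno parsed from this copy (0 = none) */
	int readable;          /* 0 if the checksum or parsing failed */
	enum copy_state state; /* filled in by classify_copies() */
};

/*
 * Label every copy as good, old or bad, and return the largest seqno
 * found among the readable copies (0 if no copy is readable).
 */
static uint32_t classify_copies(struct mda_copy *copies, size_t count)
{
	uint32_t latest = 0;
	size_t i;

	/* Pass 1: find the latest seqno among copies that can be read. */
	for (i = 0; i < count; i++) {
		if (copies[i].readable && copies[i].seqno > latest)
			latest = copies[i].seqno;
	}

	/* Pass 2: unreadable copies are bad, smaller seqnos are old. */
	for (i = 0; i < count; i++) {
		if (!copies[i].readable)
			copies[i].state = COPY_BAD;
		else if (copies[i].seqno < latest)
			copies[i].state = COPY_OLD;
		else
			copies[i].state = COPY_GOOD;
	}

	return latest;
}
```

In this sketch any copy labelled COPY_GOOD could serve as the source for a repair, while COPY_OLD and COPY_BAD copies are the ones a repair would overwrite. Reading the VG only needs one good copy, which matches the point above that repair can be deferred.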
 include/.symlinks.in              |    2
 lib/cache/lvmcache.c              |  283
 lib/cache/lvmcache.h              |   17
 lib/format_text/format-text.c     |   18
 lib/format_text/format-text.h     |    6
 lib/format_text/import.c          |   14
 lib/format_text/import_vsn1.c     |   40
 lib/format_text/text_label.c      |  202
 lib/label/label.c                 |   15
 lib/metadata/metadata-exported.h  |    8
 lib/metadata/metadata.c           | 1372
 lib/metadata/metadata.h           |    3
 tools/args.h                      |    9
 tools/command-lines.in            |   12
 tools/vgck.c                      |  105
15 files changed, 1568 insertions, 538 deletions
diff --git a/include/.symlinks.in b/include/.symlinks.in
index 3d5075058..37061c0e8 100644
--- a/include/.symlinks.in
+++ b/include/.symlinks.in
@@ -26,6 +26,8 @@
@top_srcdir@/lib/format_pool/format_pool.h
@top_srcdir@/lib/format_text/archiver.h
@top_srcdir@/lib/format_text/format-text.h
+@top_srcdir@/lib/format_text/layout.h
+@top_srcdir@/lib/format_text/import-export.h
@top_srcdir@/lib/format_text/text_export.h
@top_srcdir@/lib/format_text/text_import.h
@top_srcdir@/lib/label/label.h
diff --git a/lib/cache/lvmcache.c b/lib/cache/lvmcache.c
index a2ee0cd43..f481b84a6 100644
--- a/lib/cache/lvmcache.c
+++ b/lib/cache/lvmcache.c
@@ -43,6 +43,12 @@ struct lvmcache_info {
uint32_t ext_version; /* Extension version */
uint32_t ext_flags; /* Extension flags */
uint32_t status;
+ int summary_seqno; /* vg seqno found on this dev during scan */
+ int mda1_seqno;
+ int mda2_seqno;
+ unsigned summary_seqno_mismatch:1; /* two mdas on this dev has mismatching metadata */
+ unsigned mda1_bad:1; /* label scan found bad metadata in mda1 */
+ unsigned mda2_bad:1; /* label scan found bad metadata in mda2 */
};
/* One per VG */
@@ -713,6 +719,33 @@ static void _destroy_duplicate_device_list(struct dm_list *head)
dm_list_init(head);
}
+int lvmcache_has_bad_metadata(struct device *dev)
+{
+ struct lvmcache_info *info;
+
+ if (!(info = lvmcache_info_from_pvid(dev->pvid, dev, 0))) {
+ /* shouldn't happen */
+ log_error("No lvmcache info for checking bad metadata on %s", dev_name(dev));
+ return 0;
+ }
+
+ if (info->mda1_bad || info->mda2_bad)
+ return 1;
+ return 0;
+}
+
+/*
+ * "bad" metadata cannot be used/processed by lvm, e.g.
+ * it has a bad checksum, invalid/unrecognizable content.
+ */
+void lvmcache_set_bad_metadata(struct lvmcache_info *info, int mda1_bad, int mda2_bad)
+{
+ if (mda1_bad)
+ info->mda1_bad = 1;
+ if (mda2_bad)
+ info->mda2_bad = 1;
+}
+
static void _vginfo_attach_info(struct lvmcache_vginfo *vginfo,
struct lvmcache_info *info)
{
@@ -2073,8 +2106,9 @@ static int _lvmcache_update_vgname(struct lvmcache_info *info,
sprintf(mdabuf, " with %u mda(s)", dm_list_size(&info->mdas));
else
mdabuf[0] = '\0';
- log_debug_cache("lvmcache %s: now in VG %s%s%s%s%s.",
+ log_debug_cache("lvmcache %s %s: now in VG %s%s%s%s%s.",
dev_name(info->dev),
+ info->dev->pvid,
vgname, vginfo->vgid[0] ? " (" : "",
vginfo->vgid[0] ? vginfo->vgid : "",
vginfo->vgid[0] ? ")" : "", mdabuf);
@@ -2162,12 +2196,9 @@ int lvmcache_add_orphan_vginfo(const char *vgname, struct format_type *fmt)
}
/*
- * FIXME: get rid of other callers of this function which call it
- * in odd cases to "fix up" some bit of lvmcache state. Make those
- * callers fix up what they need to directly, and leave this function
- * with one purpose and caller.
+ * Returning 0 causes the caller to remove the info struct for this
+ * device from lvmcache, which will make it look like a missing device.
*/
-
int lvmcache_update_vgname_and_id(struct lvmcache_info *info, struct lvmcache_vgsummary *vgsummary)
{
const char *vgname = vgsummary->vgname;
@@ -2193,6 +2224,7 @@ int lvmcache_update_vgname_and_id(struct lvmcache_info *info, struct lvmcache_vg
* Puts the vginfo into the vgname hash table.
*/
if (!_lvmcache_update_vgname(info, vgname, vgid, vgsummary->vgstatus, vgsummary->creation_host, info->fmt)) {
+ /* shouldn't happen, internal error */
log_error("Failed to update VG %s info in lvmcache.", vgname);
return 0;
}
@@ -2201,6 +2233,7 @@ int lvmcache_update_vgname_and_id(struct lvmcache_info *info, struct lvmcache_vg
* Puts the vginfo into the vgid hash table.
*/
if (!_lvmcache_update_vgid(info, info->vginfo, vgid)) {
+ /* shouldn't happen, internal error */
log_error("Failed to update VG %s info in lvmcache.", vgname);
return 0;
}
@@ -2216,50 +2249,111 @@ int lvmcache_update_vgname_and_id(struct lvmcache_info *info, struct lvmcache_vg
if (!vgsummary->seqno && !vgsummary->mda_size && !vgsummary->mda_checksum)
return 1;
+ /*
+ * Keep track of which devs/mdas have old versions of the metadata.
+ * The values we keep in vginfo are from the metadata with the largest
+ * seqno. One dev may have more recent metadata than another dev, and
+ * one mda may have more recent metadata than the other mda on the same
+ * device.
+ *
+ * When a device holds old metadata, the info struct for the device
+ * remains in lvmcache, so the device is not treated as missing.
+ * Also the mda struct containing the old metadata is kept on
+ * info->mdas. This means that vg_read will read metadata from
+ * the mda again (and probably see the same old metadata). It
+ * also means that vg_write will use the mda to write new metadata
+ * into the mda that currently has the old metadata.
+ */
+ if (vgsummary->mda_num == 1)
+ info->mda1_seqno = vgsummary->seqno;
+ else if (vgsummary->mda_num == 2)
+ info->mda2_seqno = vgsummary->seqno;
+
+ if (!info->summary_seqno)
+ info->summary_seqno = vgsummary->seqno;
+ else {
+ if (info->summary_seqno == vgsummary->seqno) {
+ /* This mda has the same metadata as the prev mda on this dev. */
+ return 1;
+
+ } else if (info->summary_seqno > vgsummary->seqno) {
+ /* This mda has older metadata than the prev mda on this dev. */
+ info->summary_seqno_mismatch = 1;
+
+ } else if (info->summary_seqno < vgsummary->seqno) {
+ /* This mda has newer metadata than the prev mda on this dev. */
+ info->summary_seqno_mismatch = 1;
+ info->summary_seqno = vgsummary->seqno;
+ }
+ }
+
+ /* this shouldn't happen */
if (!(vginfo = info->vginfo))
return 1;
if (!vginfo->seqno) {
vginfo->seqno = vgsummary->seqno;
+ vginfo->mda_checksum = vgsummary->mda_checksum;
+ vginfo->mda_size = vgsummary->mda_size;
- log_debug_cache("lvmcache %s: VG %s: set seqno to %d",
- dev_name(info->dev), vginfo->vgname, vginfo->seqno);
+ log_debug_cache("lvmcache %s mda%d VG %s set seqno %u checksum %x mda_size %zu",
+ dev_name(info->dev), vgsummary->mda_num, vgname,
+ vgsummary->seqno, vgsummary->mda_checksum, vgsummary->mda_size);
+ goto update_vginfo;
- } else if (vgsummary->seqno != vginfo->seqno) {
- log_warn("Scan of VG %s from %s found metadata seqno %d vs previous %d.",
- vgname, dev_name(info->dev), vgsummary->seqno, vginfo->seqno);
+ } else if (vgsummary->seqno < vginfo->seqno) {
vginfo->scan_summary_mismatch = 1;
- /* If we don't return success, this dev info will be removed from lvmcache,
- and then we won't be able to rescan it or repair it. */
+
+ log_debug_cache("lvmcache %s mda%d VG %s older seqno %u checksum %x mda_size %zu",
+ dev_name(info->dev), vgsummary->mda_num, vgname,
+ vgsummary->seqno, vgsummary->mda_checksum, vgsummary->mda_size);
return 1;
- }
- if (!vginfo->mda_size) {
+ } else if (vgsummary->seqno > vginfo->seqno) {
+ vginfo->scan_summary_mismatch = 1;
+
+ /* Replace vginfo values with values from newer metadata. */
+ vginfo->seqno = vgsummary->seqno;
vginfo->mda_checksum = vgsummary->mda_checksum;
vginfo->mda_size = vgsummary->mda_size;
- log_debug_cache("lvmcache %s: VG %s: set mda_checksum to %x mda_size to %zu",
- dev_name(info->dev), vginfo->vgname,
- vginfo->mda_checksum, vginfo->mda_size);
+ log_debug_cache("lvmcache %s mda%d VG %s newer seqno %u checksum %x mda_size %zu",
+ dev_name(info->dev), vgsummary->mda_num, vgname,
+ vgsummary->seqno, vgsummary->mda_checksum, vgsummary->mda_size);
- } else if ((vginfo->mda_size != vgsummary->mda_size) || (vginfo->mda_checksum != vgsummary->mda_checksum)) {
- log_warn("Scan of VG %s from %s found mda_checksum %x mda_size %zu vs previous %x %zu",
- vgname, dev_name(info->dev), vgsummary->mda_checksum, vgsummary->mda_size,
- vginfo->mda_checksum, vginfo->mda_size);
- vginfo->scan_summary_mismatch = 1;
- /* If we don't return success, this dev info will be removed from lvmcache,
- and then we won't be able to rescan it or repair it. */
+ goto update_vginfo;
+ } else {
+ /*
+ * Same seqno as previous metadata we saw for this VG.
+ * If the metadata somehow has a different checksum or size,
+ * even though it has the same seqno, something has gone wrong.
+ * FIXME: figure out if this can happen, how, and what to do with it.
+ */
+
+ if ((vginfo->mda_size != vgsummary->mda_size) || (vginfo->mda_checksum != vgsummary->mda_checksum)) {
+ log_warn("WARNING: scan of VG %s from %s mda%d found mda_checksum %x mda_size %zu vs %x %zu",
+ vgname, dev_name(info->dev), vgsummary->mda_num,
+ vgsummary->mda_checksum, vgsummary->mda_size,
+ vginfo->mda_checksum, vginfo->mda_size);
+ vginfo->scan_summary_mismatch = 1;
+ return 0;
+ }
+
+ /*
+ * The seqno and checksum matches what was previously seen;
+ * the summary values have already been saved in vginfo.
+ */
return 1;
}
- /*
- * If a dev has an unmatching checksum, ignore the other
- * info from it, keeping the info we already saved.
- */
+ update_vginfo:
if (!_lvmcache_update_vgstatus(info, vgsummary->vgstatus, vgsummary->creation_host,
vgsummary->lock_type, vgsummary->system_id)) {
+ /*
+ * This shouldn't happen, it's an internal error, and we can leave
+ * the info in place without saving the summary values in vginfo.
+ */
log_error("Failed to update VG %s info in lvmcache.", vgname);
- return 0;
}
return 1;
@@ -2280,10 +2374,26 @@ int lvmcache_update_vg(struct volume_group *vg, unsigned precommitted)
dm_list_iterate_items(pvl, &vg->pvs) {
(void) dm_strncpy(pvid_s, (char *) &pvl->pv->id, sizeof(pvid_s));
- /* FIXME Could pvl->pv->dev->pvid ever be different? */
- if ((info = lvmcache_info_from_pvid(pvid_s, pvl->pv->dev, 0)) &&
- !lvmcache_update_vgname_and_id(info, &vgsummary))
- return_0;
+
+ if (!(info = lvmcache_info_from_pvid(pvid_s, pvl->pv->dev, 0))) {
+ log_debug_cache("lvmcache_update_vg %s no info for %s %s",
+ vg->name,
+ (char *) &pvl->pv->id,
+ pvl->pv->dev ? dev_name(pvl->pv->dev) : "missing");
+ continue;
+ }
+
+ log_debug_cache("lvmcache_update_vg %s for info %s",
+ vg->name, dev_name(info->dev));
+
+ /*
+ * FIXME: use a different function that just attaches info's that
+ * had no metadata onto the correct vginfo.
+ */
+ if (!lvmcache_update_vgname_and_id(info, &vgsummary)) {
+ log_debug_cache("lvmcache_update_vg %s failed to update info for %s",
+ vg->name, dev_name(info->dev));
+ }
}
return 1;
@@ -2703,9 +2813,10 @@ void lvmcache_del_bas(struct lvmcache_info *info)
}
int lvmcache_add_mda(struct lvmcache_info *info, struct device *dev,
- uint64_t start, uint64_t size, unsigned ignored)
+ uint64_t start, uint64_t size, unsigned ignored,
+ struct metadata_area **mda_new)
{
- return add_mda(info->fmt, NULL, &info->mdas, dev, start, size, ignored);
+ return add_mda(info->fmt, NULL, &info->mdas, dev, start, size, ignored, mda_new);
}
int lvmcache_add_da(struct lvmcache_info *info, uint64_t start, uint64_t size)
@@ -3044,3 +3155,103 @@ int lvmcache_scan_mismatch(struct cmd_context *cmd, const char *vgname, const ch
return 1;
}
+/*
+ * This is used by the metadata repair command to check if
+ * the metadata on a dev needs repair because it's old.
+ */
+int lvmcache_has_old_metadata(struct cmd_context *cmd, const char *vgname, const char *vgid, struct device *dev)
+{
+ struct lvmcache_vginfo *vginfo;
+ struct lvmcache_info *info;
+
+ /* shouldn't happen */
+ if (!vgname || !vgid)
+ return 0;
+
+ /* shouldn't happen */
+ if (!(vginfo = lvmcache_vginfo_from_vgid(vgid)))
+ return 0;
+
+ /* shouldn't happen */
+ if (!(info = lvmcache_info_from_pvid(dev->pvid, NULL, 0)))
+ return 0;
+
+ /* on same dev, one mda has newer metadata than the other */
+ if (info->summary_seqno_mismatch)
+ return 1;
+
+ /* one or both mdas on this dev has older metadata than another dev */
+ if (vginfo->seqno > info->summary_seqno)
+ return 1;
+
+ return 0;
+}
+
+/*
+ * Return a dev that has a copy of the metadata with the latest seqno.
+ */
+struct device *lvmcache_get_repair_src_dev(struct cmd_context *cmd,
+ const char *vgname)
+{
+ struct lvmcache_vginfo *vginfo;
+ struct lvmcache_info *info;
+
+ if (!(vginfo = lvmcache_vginfo_from_vgname(vgname, NULL))) {
+ log_error("No info found for VG %s", vgname);
+ return NULL;
+ }
+
+ dm_list_iterate_items(info, &vginfo->infos) {
+ log_debug("get_repair_src_dev %s vg %u dev %u",
+ dev_name(info->dev), vginfo->seqno, info->summary_seqno);
+
+ if (info->summary_seqno == vginfo->seqno)
+ return info->dev;
+ }
+ return NULL;
+}
+
+struct metadata_area *lvmcache_get_repair_src_mda(struct cmd_context *cmd,
+ const char *vgname,
+ struct device *dev,
+ int use_mda_num)
+{
+ struct lvmcache_vginfo *vginfo;
+ struct lvmcache_info *info;
+ struct metadata_area *mda;
+ int use_mda1 = 0;
+ int use_mda2 = 0;
+
+ if (!(vginfo = lvmcache_vginfo_from_vgname(vgname, NULL)))
+ return NULL;
+
+ dm_list_iterate_items(info, &vginfo->infos) {
+ if (info->dev != dev)
+ continue;
+
+ if (use_mda_num == 1)
+ use_mda1 = 1;
+ else if (use_mda_num == 2)
+ use_mda2 = 1;
+ else if (!info->summary_seqno_mismatch)
+ use_mda1 = 1;
+ else if (info->mda1_seqno > info->mda2_seqno)
+ use_mda1 = 1;
+ else if (info->mda2_seqno > info->mda1_seqno)
+ use_mda2 = 1;
+
+ dm_list_iterate_items(mda, &info->mdas) {
+ if (use_mda1 && (mda->status & MDA_PRIMARY)) {
+ log_warn("get_repair_src_mda use mda1");
+ return mda;
+ }
+ if (use_mda2 && !(mda->status & MDA_PRIMARY)) {
+ log_warn("get_repair_src_mda use mda2");
+ return mda;
+ }
+ }
+ return NULL;
+ }
+ return NULL;
+}
+
diff --git a/lib/cache/lvmcache.h b/lib/cache/lvmcache.h
index bf976e9c6..3921a27cb 100644
--- a/lib/cache/lvmcache.h
+++ b/lib/cache/lvmcache.h
@@ -57,10 +57,12 @@ struct lvmcache_vgsummary {
char *creation_host;
const char *system_id;
const char *lock_type;
+ uint32_t seqno;
uint32_t mda_checksum;
size_t mda_size;
- int zero_offset;
- int seqno;
+ int mda_num; /* 1 = summary from mda1, 2 = summary from mda2 */
+ unsigned mda_ignored:1;
+ unsigned zero_offset:1;
};
int lvmcache_init(struct cmd_context *cmd);
@@ -144,7 +146,8 @@ void lvmcache_del_mdas(struct lvmcache_info *info);
void lvmcache_del_das(struct lvmcache_info *info);
void lvmcache_del_bas(struct lvmcache_info *info);
int lvmcache_add_mda(struct lvmcache_info *info, struct device *dev,
- uint64_t start, uint64_t size, unsigned ignored);
+ uint64_t start, uint64_t size, unsigned ignored,
+ struct metadata_area **mda_new);
int lvmcache_add_da(struct lvmcache_info *info, uint64_t start, uint64_t size);
int lvmcache_add_ba(struct lvmcache_info *info, uint64_t start, uint64_t size);
@@ -225,4 +228,12 @@ struct volume_group *lvmcache_get_saved_vg(const char *vgid, int precommitted);
struct volume_group *lvmcache_get_saved_vg_latest(const char *vgid);
void lvmcache_drop_saved_vgid(const char *vgid);
+void lvmcache_set_bad_metadata(struct lvmcache_info *info, int mda1_bad, int mda2_bad);
+int lvmcache_has_bad_metadata(struct device *dev);
+int lvmcache_has_old_metadata(struct cmd_context *cmd, const char *vgname, const char *vgid, struct device *dev);
+struct device *lvmcache_get_repair_src_dev(struct cmd_context *cmd,
+ const char *vgname);
+struct metadata_area *lvmcache_get_repair_src_mda(struct cmd_context *cmd,
+ const char *vgname,
+ struct device *dev, int use_mda_num);
#endif
diff --git a/lib/format_text/format-text.c b/lib/format_text/format-text.c
index 4160ba810..977ad84b5 100644
--- a/lib/format_text/format-text.c
+++ b/lib/format_text/format-text.c
@@ -308,7 +308,7 @@ static int _text_lv_setup(struct format_instance *fid __attribute__((unused)),
return 1;
}
-static void _xlate_mdah(struct mda_header *mdah)
+void xlate_mdah(struct mda_header *mdah)
{
struct raw_locn *rl;
@@ -344,7 +344,7 @@ static int _raw_read_mda_header(struct mda_header *mdah, struct device_area *dev
return 0;
}
- _xlate_mdah(mdah);
+ xlate_mdah(mdah);
if (strncmp((char *)mdah->magic, FMTT_MAGIC, sizeof(mdah->magic))) {
log_error("Wrong magic number in metadata area header on %s at %llu",
@@ -395,7 +395,7 @@ static int _raw_write_mda_header(const struct format_type *fmt,
mdah->version = FMTT_VERSION;
mdah->start = start_byte;
- _xlate_mdah(mdah);
+ xlate_mdah(mdah);
mdah->checksum_xl = xlate32(calc_crc(INITIAL_CRC, (uint8_t *)mdah->magic,
MDA_HEADER_SIZE -
sizeof(mdah->checksum_xl)));
@@ -1239,7 +1239,7 @@ int read_metadata_location_summary(const struct format_type *fmt,
/* Ignore this entry if the characters aren't permissible */
if (!validate_name(buf)) {
- log_error("Metadata location on %s at %llu begins with invalid VG name.",
+ log_warn("WARNING: metadata on %s at %llu begins with invalid VG name.",
dev_name(dev_area->dev),
(unsigned long long)(dev_area->start + rlocn->offset));
return 0;
@@ -1250,7 +1250,7 @@ int read_metadata_location_summary(const struct format_type *fmt,
wrap = (uint32_t) ((rlocn->offset + rlocn->size) - mdah->size);
if (wrap > rlocn->offset) {
- log_error("Metadata location on %s at %llu is too large for circular buffer.",
+ log_warn("WARNING: metadata on %s at %llu is too large for circular buffer.",
dev_name(dev_area->dev),
(unsigned long long)(dev_area->start + rlocn->offset));
return 0;
@@ -1302,7 +1302,7 @@ int read_metadata_location_summary(const struct format_type *fmt,
(off_t) (dev_area->start + MDA_HEADER_SIZE),
wrap, calc_crc, vgsummary->vgname ? 1 : 0,
vgsummary)) {
- log_error("Metadata location on %s at %llu has invalid summary for VG.",
+ log_warn("WARNING: metadata on %s at %llu has invalid summary for VG.",
dev_name(dev_area->dev),
(unsigned long long)(dev_area->start + rlocn->offset));
return 0;
@@ -1310,7 +1310,7 @@ int read_metadata_location_summary(const struct format_type *fmt,
/* Ignore this entry if the characters aren't permissible */
if (!validate_name(vgsummary->vgname)) {
- log_error("Metadata location on %s at %llu has invalid VG name.",
+ log_warn("WARNING: metadata on %s at %llu has invalid VG name.",
dev_name(dev_area->dev),
(unsigned long long)(dev_area->start + rlocn->offset));
return 0;
@@ -1472,7 +1472,7 @@ static int _text_pv_write(const struct format_type *fmt, struct physical_volume
// if fmt is not the same as info->fmt we are in trouble
if (!lvmcache_add_mda(info, mdac->area.dev,
mdac->area.start, mdac->area.size,
- mda_is_ignored(mda)))
+ mda_is_ignored(mda), NULL))
return_0;
}
@@ -1842,7 +1842,7 @@ static int _mda_import_text_raw(struct lvmcache_info *info, const struct dm_conf
offset = dm_config_find_int64(cn, "start", 0);
ignore = dm_config_find_int(cn, "ignore", 0);
- lvmcache_add_mda(info, device, offset, size, ignore);
+ lvmcache_add_mda(info, device, offset, size, ignore, NULL);
return 1;
}
diff --git a/lib/format_text/format-text.h b/lib/format_text/format-text.h
index d6e6b033e..431280e59 100644
--- a/lib/format_text/format-text.h
+++ b/lib/format_text/format-text.h
@@ -61,7 +61,8 @@ int add_ba(struct dm_pool *mem, struct dm_list *eas,
uint64_t start, uint64_t size);
void del_bas(struct dm_list *bas);
int add_mda(const struct format_type *fmt, struct dm_pool *mem, struct dm_list *mdas,
- struct device *dev, uint64_t start, uint64_t size, unsigned ignored);
+ struct device *dev, uint64_t start, uint64_t size, unsigned ignored,
+ struct metadata_area **mda_new);
void del_mdas(struct dm_list *mdas);
/* On disk */
@@ -76,4 +77,7 @@ struct data_area_list {
struct disk_locn disk_locn;
};
+struct mda_header;
+void xlate_mdah(struct mda_header *mdah);
+
#endif
diff --git a/lib/format_text/import.c b/lib/format_text/import.c
index 4b344856f..0c13c20d2 100644
--- a/lib/format_text/import.c
+++ b/lib/format_text/import.c
@@ -61,13 +61,13 @@ int text_read_metadata_summary(const struct format_type *fmt,
offset2, size2, checksum_fn,
vgsummary->mda_checksum,
checksum_only, 1)) {
- /* FIXME: handle errors */
- log_error("Couldn't read volume group metadata from %s.", dev_name(dev));
+ log_warn("WARNING: invalid metadata text from %s at %llu.",
+ dev_name(dev), (unsigned long long)offset);
goto out;
}
} else {
if (!config_file_read(cft)) {
- log_error("Couldn't read volume group metadata from file.");
+ log_warn("WARNING: invalid metadata text from file.");
goto out;
}
}
@@ -229,9 +229,11 @@ static struct volume_group *_import_vg_from_config_tree(const struct dm_config_t
*/
if (!(vg = (*vsn)->read_vg(fid, cft, allow_lvmetad_extensions)))
stack;
- else if ((vg_missing = vg_missing_pv_count(vg))) {
- log_verbose("There are %d physical volumes missing.",
- vg_missing);
+ else {
+ set_pv_devices(fid, vg);
+
+ if ((vg_missing = vg_missing_pv_count(vg)))
+ log_verbose("There are %d physical volumes missing.", vg_missing);
vg_mark_partial_lvs(vg, 1);
/* FIXME: move this code inside read_vg() */
}
diff --git a/lib/format_text/import_vsn1.c b/lib/format_text/import_vsn1.c
index 58f517ec2..176873083 100644
--- a/lib/format_text/import_vsn1.c
+++ b/lib/format_text/import_vsn1.c
@@ -213,21 +213,6 @@ static int _read_pv(struct format_instance *fid,
pv->is_labelled = 1; /* All format_text PVs are labelled. */
- /*
- * Convert the uuid into a device.
- */
- if (!(pv->dev = lvmcache_device_from_pvid(fid->fmt->cmd, &pv->id, &pv->label_sector))) {
- char buffer[64] __attribute__((aligned(8)));
-
- if (!id_write_format(&pv->id, buffer, sizeof(buffer)))
- buffer[0] = '\0';
-
- if (fid->fmt->cmd && !fid->fmt->cmd->pvscan_cache_single)
- log_error_once("Couldn't find device with uuid %s.", buffer);
- else
- log_debug_metadata("Couldn't find device with uuid %s.", buffer);
- }
-
if (!(pv->vg_name = dm_pool_strdup(mem, vg->name)))
return_0;
@@ -238,16 +223,6 @@ static int _read_pv(struct format_instance *fid,
return 0;
}
- /* TODO is the !lvmetad_used() too coarse here? */
- if (!pv->dev && !lvmetad_used())
- pv->status |= MISSING_PV;
-
- if ((pv->status & MISSING_PV) && pv->dev && pv_mda_used_count(pv) == 0) {
- pv->status &= ~MISSING_PV;
- log_info("Recovering a previously MISSING PV %s with no MDAs.",
- pv_dev_name(pv));
- }
-
/* Late addition */
if (dm_config_has_node(pvn, "dev_size") &&
!_read_uint64(pvn, "dev_size", &pv->size)) {
@@ -300,21 +275,6 @@ static int _read_pv(struct format_instance *fid,
pv->pe_align = 0;
pv->fmt = fid->fmt;
- /* Fix up pv size if missing or impossibly large */
- if ((!pv->size || pv->size > (1ULL << 62)) && pv->dev) {
- if (!dev_get_size(pv->dev, &pv->size)) {
- log_error("%s: Couldn't get size.", pv_dev_name(pv));
- return 0;
- }
- log_verbose("Fixing up missing size (%s) "
- "for PV %s", display_size(fid->fmt->cmd, pv->size),
- pv_dev_name(pv));
- size = pv->pe_count * (uint64_t) vg->extent_size + pv->pe_start;
- if (size > pv->size)
- log_warn("WARNING: Physical Volume %s is too large "
- "for underlying device", pv_dev_name(pv));
- }
-
if (!alloc_pv_segment_whole_pv(mem, pv))
return_0;
diff --git a/lib/format_text/text_label.c b/lib/format_text/text_label.c
index 7d10e065b..f95d3ea61 100644
--- a/lib/format_text/text_label.c
+++ b/lib/format_text/text_label.c
@@ -240,11 +240,10 @@ void del_bas(struct dm_list *bas)
del_das(bas);
}
-/* FIXME: refactor this function with other mda constructor code */
int add_mda(const struct format_type *fmt, struct dm_pool *mem, struct dm_list *mdas,
- struct device *dev, uint64_t start, uint64_t size, unsigned ignored)
+ struct device *dev, uint64_t start, uint64_t size, unsigned ignored,
+ struct metadata_area **mda_new)
{
-/* FIXME List size restricted by pv_header SECTOR_SIZE */
struct metadata_area *mdal, *mda;
struct mda_lists *mda_lists = (struct mda_lists *) fmt->private;
struct mda_context *mdac, *mdac2;
@@ -294,9 +293,18 @@ int add_mda(const struct format_type *fmt, struct dm_pool *mem, struct dm_list *
mda_set_ignored(mdal, ignored);
dm_list_add(mdas, &mdal->list);
+ if (mda_new)
+ *mda_new = mdal;
return 1;
}
+static void _del_mda(struct metadata_area *mda)
+{
+ dm_free(mda->metadata_locn);
+ dm_list_del(&mda->list);
+ dm_free(mda);
+}
+
void del_mdas(struct dm_list *mdas)
{
struct dm_list *mdah, *tmp;
@@ -304,9 +312,7 @@ void del_mdas(struct dm_list *mdas)
dm_list_iterate_safe(mdah, tmp, mdas) {
mda = dm_list_item(mdah, struct metadata_area);
- dm_free(mda->metadata_locn);
- dm_list_del(&mda->list);
- dm_free(mda);
+ _del_mda(mda);
}
}
@@ -318,72 +324,84 @@ static int _text_initialise_label(struct labeller *l __attribute__((unused)),
return 1;
}
-struct _update_mda_baton {
- struct lvmcache_info *info;
- struct label *label;
-};
-
-static int _read_mda_header_and_metadata(struct metadata_area *mda, void *baton)
+static int _read_mda_header_and_metadata(const struct format_type *fmt,
+ struct metadata_area *mda, int mda_num,
+ struct lvmcache_vgsummary *vgsummary)
{
- struct _update_mda_baton *p = baton;
- const struct format_type *fmt = p->label->labeller->fmt;
struct mda_context *mdac = (struct mda_context *) mda->metadata_locn;
struct mda_header *mdah;
- struct lvmcache_vgsummary vgsummary = { 0 };
if (!(mdah = raw_read_mda_header(fmt, &mdac->area, mda_is_primary(mda)))) {
- log_error("Failed to read mda header from %s", dev_name(mdac->area.dev));
- goto fail;
+ log_warn("WARNING: bad metadata header on %s at %llu.",
+ dev_name(mdac->area.dev),
+ (unsigned long long)mdac->area.start);
+ return 0;
}
+ if (mda)
+ mda->header_start = mdah->start;
+
mda_set_ignored(mda, rlocn_is_ignored(mdah->raw_locns));
if (mda_is_ignored(mda)) {
log_debug_metadata("Ignoring mda on device %s at offset " FMTu64,
dev_name(mdac->area.dev),
mdac->area.start);
+ vgsummary->mda_ignored = 1;
return 1;
}
if (!read_metadata_location_summary(fmt, mdah, mda_is_primary(mda), &mdac->area,
- &vgsummary, &mdac->free_sectors)) {
- if (vgsummary.zero_offset)
+ vgsummary, &mdac->free_sectors)) {
+ if (vgsummary->zero_offset)
return 1;
- log_error("Failed to read metadata summary from %s", dev_name(mdac->area.dev));
- goto fail;
- }
-
- if (!lvmcache_update_vgname_and_id(p->info, &vgsummary)) {
- log_error("Failed to save lvm summary for %s", dev_name(mdac->area.dev));
- goto fail;
+ log_warn("WARNING: bad metadata text on %s in mda%d",
+ dev_name(mdac->area.dev), mda_num);
+ return 0;
}
return 1;
-
-fail:
- lvmcache_del(p->info);
- return 0;
}
-static int _text_read(struct labeller *l, struct device *dev, void *label_buf,
+/*
+ * Used by label_scan to get a summary of the VG that exists on this PV. This
+ * summary is stored in lvmcache vginfo/info/info->mdas and is used later by
+ * vg_read which needs to know which PVs to read for a given VG name, and where
+ * the metadata is at for those PVs.
+ */
+
+static int _text_read(struct labeller *labeller, struct device *dev, void *label_buf,
struct label **label)
{
+ struct lvmcache_vgsummary vgsummary;
+ struct lvmcache_info *info;
+ const struct format_type *fmt = labeller->fmt;
struct label_header *lh = (struct label_header *) label_buf;
struct pv_header *pvhdr;
struct pv_header_extension *pvhdr_ext;
- struct lvmcache_info *info;
+ struct metadata_area *mda;
+ struct metadata_area *mda1 = NULL;
+ struct metadata_area *mda2 = NULL;
struct disk_locn *dlocn_xl;
uint64_t offset;
uint32_t ext_version;
- struct _update_mda_baton baton;
+ int mda_count = 0;
+ int good_mda_count = 0;
+ int bad_mda_count = 0;
+ int mda1_bad = 0;
+ int mda2_bad = 0;
+ int rv1, rv2;
/*
* PV header base
*/
pvhdr = (struct pv_header *) ((char *) label_buf + xlate32(lh->offset_xl));
- if (!(info = lvmcache_add(l, (char *)pvhdr->pv_uuid, dev,
+ /* FIXME: errors from lvmcache_add need to be handled better */
+ /* FIXME: stop pretending that the PV is initially an orphan. */
+
+ if (!(info = lvmcache_add(labeller, (char *)pvhdr->pv_uuid, dev,
FMT_TEXT_ORPHAN_VG_NAME,
FMT_TEXT_ORPHAN_VG_NAME, 0)))
return_0;
@@ -403,11 +421,23 @@ static int _text_read(struct labeller *l, struct device *dev, void *label_buf,
dlocn_xl++;
}
- /* Metadata area headers */
dlocn_xl++;
+
+ /* Metadata areas */
while ((offset = xlate64(dlocn_xl->offset))) {
- lvmcache_add_mda(info, dev, offset, xlate64(dlocn_xl->size), 0);
+
+ /*
+ * This just calls add_mda() above, replacing info with info->mdas.
+ */
+ lvmcache_add_mda(info, dev, offset, xlate64(dlocn_xl->size), 0, &mda);
+
dlocn_xl++;
+ mda_count++;
+
+ if (mda_count == 1)
+ mda1 = mda;
+ else if (mda_count == 2)
+ mda2 = mda;
}
dlocn_xl++;
@@ -417,7 +447,7 @@ static int _text_read(struct labeller *l, struct device *dev, void *label_buf,
*/
pvhdr_ext = (struct pv_header_extension *) ((char *) dlocn_xl);
if (!(ext_version = xlate32(pvhdr_ext->version)))
- goto out;
+ goto scan_mdas;
log_debug_metadata("%s: PV header extension version " FMTu32 " found",
dev_name(dev), ext_version);
@@ -434,22 +464,100 @@ static int _text_read(struct labeller *l, struct device *dev, void *label_buf,
lvmcache_add_ba(info, offset, xlate64(dlocn_xl->size));
dlocn_xl++;
}
-out:
- baton.info = info;
- baton.label = *label;
+
+ scan_mdas:
+ if (!mda_count) {
+ log_debug_metadata("Scanning %s found no mdas.", dev_name(dev));
+ return 1;
+ }
+
+ if (mda1) {
+ log_debug_metadata("Scanning %s mda1 summary.", dev_name(dev));
+ memset(&vgsummary, 0, sizeof(vgsummary));
+ vgsummary.mda_num = 1;
+
+ rv1 = _read_mda_header_and_metadata(fmt, mda1, 1, &vgsummary);
+
+ if (rv1 && !vgsummary.zero_offset && !vgsummary.mda_ignored) {
+ if (!lvmcache_update_vgname_and_id(info, &vgsummary)) {
+ /* I believe this is only an internal error. */
+ log_warn("Scanning %s mda1 failed to save summary.", dev_name(dev));
+ _del_mda(mda1);
+ mda1 = NULL;
+ mda1_bad = 1;
+ bad_mda_count++;
+ } else {
+ log_warn("Scanned %s mda1 seqno %u", dev_name(dev), vgsummary.seqno);
+ good_mda_count++;
+ }
+ }
+
+ if (!rv1) {
+ /* Remove the bad mda so vg_read won't try to read it. */
+ log_warn("WARNING: scanning %s mda1 failed to read metadata summary.", dev_name(dev));
+ log_warn("WARNING: repair VG metadata on %s with vgck --repairmetadata.", dev_name(dev));
+ _del_mda(mda1);
+ mda1 = NULL;
+ mda1_bad = 1;
+ bad_mda_count++;
+ }
+ }
+
+ if (mda2) {
+ log_debug_metadata("Scanning %s mda2 summary.", dev_name(dev));
+ memset(&vgsummary, 0, sizeof(vgsummary));
+ vgsummary.mda_num = 2;
+
+ rv2 = _read_mda_header_and_metadata(fmt, mda2, 2, &vgsummary);
+
+ if (rv2 && !vgsummary.zero_offset && !vgsummary.mda_ignored) {
+ if (!lvmcache_update_vgname_and_id(info, &vgsummary)) {
+ /* I believe this is only an internal error. */
+ log_warn("Scanning %s mda2 failed to save summary.", dev_name(dev));
+ _del_mda(mda2);
+ mda2 = NULL;
+ mda2_bad = 1;
+ bad_mda_count++;
+ } else {
+ log_warn("Scanned %s mda2 seqno %u", dev_name(dev), vgsummary.seqno);
+ good_mda_count++;
+ }
+ }
+
+ if (!rv2) {
+ /* Remove the bad mda so vg_read won't try to read it. */
+ log_warn("WARNING: scanning %s mda2 failed to read metadata summary.", dev_name(dev));
+ log_warn("WARNING: repair VG metadata on %s with vgck --repairmetadata.", dev_name(dev));
+ _del_mda(mda2);
+ mda2 = NULL;
+ mda2_bad = 1;
+ bad_mda_count++;
+ }
+ }
+
+ log_debug_metadata("Scanning %s found mda_count %d mda1_bad %d mda2_bad %d",
+ dev_name(dev), mda_count, mda1_bad, mda2_bad);
/*
- * In the vg_read phase, we compare all mdas and decide which to use
- * which are bad and need repair.
+ * Track which devs have bad metadata so repair can find them
+ * (even if this dev also has good metadata that we are able to use).
*
- * FIXME: this quits if the first mda is bad, but we need something
- * smarter to be able to use the second mda if it's good.
+ * When bad metadata is seen above, the unusable mda struct is
+ * removed from lvmcache info->mdas. This means that vg_read will
+ * skip the bad mda and not try to read the bad metadata. It also means
+ * that vg_write will skip the bad mda and not try to write
+ * new metadata to it.
*/
- if (!lvmcache_foreach_mda(info, _read_mda_header_and_metadata, &baton)) {
- log_error("Failed to scan VG from %s", dev_name(dev));
+ if (mda1_bad || mda2_bad)
+ lvmcache_set_bad_metadata(info, mda1_bad, mda2_bad);
+
+ if (good_mda_count)
+ return 1;
+
+ if (bad_mda_count)
return 0;
- }
+ /* no metadata in the mdas */
return 1;
}
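The return logic at the end of `_text_read` above can be restated in isolation. The following is a minimal sketch, not code from the patch: the function name `scan_result` is hypothetical, and its two parameters simply mirror the `good_mda_count` and `bad_mda_count` counters.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of lvm2): mirrors the final return
 * logic of _text_read() using only the two counters.
 *
 * Returns 1 if the scan of the PV counts as successful, 0 if every
 * metadata copy that was examined turned out to be bad.
 */
static int scan_result(int good_mda_count, int bad_mda_count)
{
	assert(good_mda_count >= 0 && bad_mda_count >= 0);

	/* At least one usable copy: the scan succeeded. */
	if (good_mda_count)
		return 1;

	/* No usable copy, but at least one bad one: report failure. */
	if (bad_mda_count)
		return 0;

	/* The mdas hold no metadata (for example empty or ignored). */
	return 1;
}
```

A PV whose mda1 is corrupt but whose mda2 is readable corresponds to `scan_result(1, 1)`, so the scan still succeeds and the bad copy is only recorded for later repair.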
diff --git a/lib/label/label.c b/lib/label/label.c
index e01608d2c..842871984 100644
--- a/lib/label/label.c
+++ b/lib/label/label.c
@@ -426,8 +426,18 @@ static int _process_block(struct cmd_context *cmd, struct dev_filter *f,
label->dev = dev;
label->sector = sector;
} else {
- /* FIXME: handle errors */
- lvmcache_del_dev(dev);
+ /*
+ * Leave the info in lvmcache because the device is present and can
+ * still be used even if it has metadata that we can't use (we can
+ * use metadata from another PV/mda.)
+ * _text_read only saves mdas with good metadata in lvmcache, and
+ * if a PV has no mdas with good metadata, then the info for the PV
+ * will be in lvmcache with empty info->mdas, and it will behave
+ * like a PV with no mdas (a common configuration.)
+ */
+ label->dev = dev;
+ label->sector = sector;
+ log_warn("WARNING: failed to read metadata summary from %s PVID %s", dev_name(dev), dev->pvid);
}
out:
return ret;
@@ -690,7 +700,6 @@ static int _scan_list(struct cmd_context *cmd, struct dev_filter *f,
scan_failed = 1;
scan_process_errors++;
scan_failed_count++;
- lvmcache_del_dev(devl->dev);
}
}
diff --git a/lib/metadata/metadata-exported.h b/lib/metadata/metadata-exported.h
index f4fb112a8..76fdfd1e6 100644
--- a/lib/metadata/metadata-exported.h
+++ b/lib/metadata/metadata-exported.h
@@ -1320,4 +1320,12 @@ int is_system_id_allowed(struct cmd_context *cmd, const char *system_id);
int vg_strip_outdated_historical_lvs(struct volume_group *vg);
+int vg_repair_metadata(struct cmd_context *cmd, const char *vgname,
+ const char *dev_src_name,
+ const char *file_src_name,
+ struct dm_list *dev_dst_list);
+
+int vg_dump_metadata(struct cmd_context *cmd, const char *vgname,
+ const char *dev_src_name, int force, const char *tofile);
+
#endif
diff --git a/lib/metadata/metadata.c b/lib/metadata/metadata.c
index 229256822..263ae48f5 100644
--- a/lib/metadata/metadata.c
+++ b/lib/metadata/metadata.c
@@ -34,6 +34,12 @@
#include "time.h"
#include "lvmnotify.h"
+#include "format-text.h"
+#include "layout.h"
+#include "import-export.h"
+#include "xlate.h"
+#include "crc.h"
+
#include <math.h>
#include <sys/param.h>
@@ -109,6 +115,61 @@ out:
return pv->pe_align;
}
+static void _set_pv_device(struct format_instance *fid,
+ struct volume_group *vg,
+ struct physical_volume *pv)
+{
+ char buffer[64] __attribute__((aligned(8)));
+ uint64_t size;
+
+ if (!(pv->dev = lvmcache_device_from_pvid(fid->fmt->cmd, &pv->id, &pv->label_sector))) {
+ if (!id_write_format(&pv->id, buffer, sizeof(buffer)))
+ buffer[0] = '\0';
+
+ if (fid->fmt->cmd && !fid->fmt->cmd->pvscan_cache_single)
+ log_error_once("Couldn't find device with uuid %s.", buffer);
+ else
+ log_debug_metadata("Couldn't find device with uuid %s.", buffer);
+ }
+
+ if (!pv->dev && !lvmetad_used())
+ pv->status |= MISSING_PV;
+
+ if ((pv->status & MISSING_PV) && pv->dev && pv_mda_used_count(pv) == 0) {
+ pv->status &= ~MISSING_PV;
+ log_info("Found a previously MISSING PV %s with no MDAs.", pv_dev_name(pv));
+ }
+
+ /* Fix up pv size if missing or impossibly large */
+ if ((!pv->size || pv->size > (1ULL << 62)) && pv->dev) {
+ if (!dev_get_size(pv->dev, &pv->size)) {
+ log_error("%s: Couldn't get size.", pv_dev_name(pv));
+ return;
+ }
+ log_verbose("Fixing up missing size (%s) for PV %s", display_size(fid->fmt->cmd, pv->size),
+ pv_dev_name(pv));
+ size = pv->pe_count * (uint64_t) vg->extent_size + pv->pe_start;
+ if (size > pv->size)
+ log_warn("WARNING: Physical Volume %s is too large "
+ "for underlying device", pv_dev_name(pv));
+ }
+}
+
+/*
+ * Finds the 'struct device' that corresponds to each PV in the metadata,
+ * and may make some adjustments to vg fields based on the dev properties.
+ */
+void set_pv_devices(struct format_instance *fid, struct volume_group *vg)
+{
+ struct pv_list *pvl;
+
+ dm_list_iterate_items(pvl, &vg->pvs)
+ _set_pv_device(fid, vg, pvl->pv);
+
+ dm_list_iterate_items(pvl, &vg->pvs_outdated)
+ _set_pv_device(fid, vg, pvl->pv);
+}
+
unsigned long set_pe_align_offset(struct physical_volume *pv,
unsigned long data_alignment_offset)
{
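The size fix-up in `_set_pv_device` above rests on two small checks. The sketch below restates them as standalone functions; both function names are hypothetical, and it assumes that all quantities are expressed in the same unit (taken here to be sectors). The patch additionally requires `pv->dev` to be set before it attempts the fix-up, which the sketch leaves out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: a recorded PV size is treated as unusable when
 * it is zero or larger than 2^62, matching the condition in the patch.
 */
static int pv_size_needs_fixup(uint64_t pv_size)
{
	return !pv_size || pv_size > (1ULL << 62);
}

/*
 * Hypothetical helper: the space the VG metadata expects on the PV is
 * the extents plus the offset of the first extent. Returns 1 when that
 * exceeds the size reported by the device.
 */
static int pv_layout_exceeds_device(uint32_t pe_count, uint32_t extent_size,
				    uint64_t pe_start, uint64_t dev_size)
{
	/* Widen before multiplying so the product cannot wrap at 32 bits. */
	uint64_t needed = pe_count * (uint64_t) extent_size + pe_start;

	return needed > dev_size;
}
```

With 100 extents of 8192 sectors starting at sector 2048, the layout needs 821248 sectors, so a device of exactly that size passes and one sector fewer triggers the warning.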
@@ -2927,6 +2988,7 @@ int vg_write(struct volume_group *vg)
struct pv_list *pvl, *pvl_safe;
struct metadata_area *mda;
struct lv_list *lvl;
+ struct device *mda_dev;
int revert = 0, wrote = 0;
dm_list_iterate_items(lvl, &vg->lvs) {
@@ -3006,8 +3068,23 @@ int vg_write(struct volume_group *vg)
/* Write to each copy of the metadata area */
dm_list_iterate_items(mda, &vg->fid->metadata_areas_in_use) {
+ mda_dev = mda_get_device(mda);
+
if (mda->status & MDA_FAILED)
continue;
+
+ /*
+ * When the scan and vg_read find old metadata in an mda, they
+ * leave the info struct in lvmcache, and leave the mda in
+ * info->mdas. That means we use the mda here to write new
+ * metadata into. This means that a command writing a VG will
+ * automatically update old metadata to the latest.
+ */
+ if (lvmcache_has_old_metadata(vg->cmd, vg->name, (const char *)&vg->id, mda_dev)) {
+ log_warn("WARNING: updating old metadata to %u on %s for VG %s.",
+ vg->seqno, dev_name(mda_dev), vg->name);
+ }
+
if (!mda->ops->vg_write) {
log_error("Format does not support writing volume"
"group metadata areas");
@@ -3708,9 +3785,6 @@ static int _check_or_repair_pv_ext(struct cmd_context *cmd,
r = 1;
out:
- if ((pvs_fixed > 0) && !_repair_inconsistent_vg(vg, lockd_state))
- return_0;
-
return r;
}
@@ -3739,20 +3813,18 @@ static struct volume_group *_vg_read(struct cmd_context *cmd,
struct format_instance *fid = NULL;
struct format_instance_ctx fic;
const struct format_type *fmt;
- struct volume_group *vg, *correct_vg = NULL;
+ struct volume_group *vg, *vg_ret = NULL;
struct metadata_area *mda;
- struct lvmcache_info *info;
int inconsistent = 0;
int inconsistent_vgid = 0;
int inconsistent_pvs = 0;
int inconsistent_mdas = 0;
- int inconsistent_mda_count = 0;
int strip_historical_lvs = *consistent;
int update_old_pv_ext = *consistent;
unsigned use_precommitted = precommitted;
struct dm_list *pvids;
struct pv_list *pvl;
- struct dm_list all_pvs;
+ struct device *mda_dev, *dev_ret;
char uuid[64] __attribute__((aligned(8)));
int skipped_rescan = 0;
@@ -3778,29 +3850,29 @@ static struct volume_group *_vg_read(struct cmd_context *cmd,
log_very_verbose("Reading VG %s %s", vgname ?: "<no name>", vgid ? uuid : "<no vgid>");
if (lvmetad_used() && !use_precommitted) {
- if ((correct_vg = lvmetad_vg_lookup(cmd, vgname, vgid))) {
- dm_list_iterate_items(pvl, &correct_vg->pvs)
- reappeared += _check_reappeared_pv(correct_vg, pvl->pv, *consistent);
+ if ((vg_ret = lvmetad_vg_lookup(cmd, vgname, vgid))) {
+ dm_list_iterate_items(pvl, &vg_ret->pvs)
+ reappeared += _check_reappeared_pv(vg_ret, pvl->pv, *consistent);
if (reappeared && *consistent)
- *consistent = _repair_inconsistent_vg(correct_vg, lockd_state);
+ *consistent = _repair_inconsistent_vg(vg_ret, lockd_state);
else
*consistent = !reappeared;
- if (_wipe_outdated_pvs(cmd, correct_vg, &correct_vg->pvs_outdated, lockd_state)) {
+ if (_wipe_outdated_pvs(cmd, vg_ret, &vg_ret->pvs_outdated, lockd_state)) {
/* clear the list */
- dm_list_init(&correct_vg->pvs_outdated);
- lvmetad_vg_clear_outdated_pvs(correct_vg);
+ dm_list_init(&vg_ret->pvs_outdated);
+ lvmetad_vg_clear_outdated_pvs(vg_ret);
}
}
- if (correct_vg) {
- if (update_old_pv_ext && !_vg_update_old_pv_ext_if_needed(correct_vg)) {
- release_vg(correct_vg);
+ if (vg_ret) {
+ if (update_old_pv_ext && !_vg_update_old_pv_ext_if_needed(vg_ret)) {
+ release_vg(vg_ret);
return_NULL;
}
- if (strip_historical_lvs && !vg_strip_outdated_historical_lvs(correct_vg)) {
- release_vg(correct_vg);
+ if (strip_historical_lvs && !vg_strip_outdated_historical_lvs(vg_ret)) {
+ release_vg(vg_ret);
return_NULL;
}
@@ -3812,12 +3884,12 @@ static struct volume_group *_vg_read(struct cmd_context *cmd,
* we should just read the vg from disk entirely
* and skip reading it from lvmetad.
*/
- dm_list_iterate_items(pvl, &correct_vg->pvs)
+ dm_list_iterate_items(pvl, &vg_ret->pvs)
label_scan_open(pvl->pv->dev);
}
- return correct_vg;
+ return vg_ret;
}
/*
@@ -3930,440 +4002,155 @@ static struct volume_group *_vg_read(struct cmd_context *cmd,
* label scan and then copied into fid by create_instance().
*/
- /* create format instance with appropriate metadata area */
fic.type = FMT_INSTANCE_MDAS | FMT_INSTANCE_AUX_MDAS;
fic.context.vg_ref.vg_name = vgname;
fic.context.vg_ref.vg_id = vgid;
+
+ /*
+ * Sets up the metadata areas that we need to read below.
+ */
if (!(fid = fmt->ops->create_instance(fmt, &fic))) {
log_error("Failed to create format instance");
return NULL;
}
- /* Store pvids for later so we can check if any are missing */
- if (!(pvids = lvmcache_get_pvids(cmd, vgname, vgid))) {
- _destroy_fid(&fid);
- return_NULL;
- }
-
/*
* We use the fid globally here so prevent the release_vg
* call to destroy the fid - we may want to reuse it!
*/
fid->ref_count++;
- /* Ensure contents of all metadata areas match - else do recovery */
- inconsistent_mda_count=0;
- dm_list_iterate_items(mda, &fid->metadata_areas_in_use) {
- struct device *mda_dev = mda_get_device(mda);
-
- use_previous_vg = 0;
-
- log_debug_metadata("Reading VG %s from %s", vgname, dev_name(mda_dev));
- if ((use_precommitted &&
- !(vg = mda->ops->vg_read_precommit(fid, vgname, mda, &vg_fmtdata, &use_previous_vg)) && !use_previous_vg) ||
- (!use_precommitted &&
- !(vg = mda->ops->vg_read(fid, vgname, mda, &vg_fmtdata, &use_previous_vg)) && !use_previous_vg)) {
- inconsistent = 1;
- vg_fmtdata = NULL;
- continue;
- }
- /* Use previous VG because checksum matches */
- if (!vg) {
- vg = correct_vg;
- continue;
- }
+ /*
+ * label_scan found PVs for this VG and set up lvmcache to describe the
+ * VG/PVs that we use here to read the VG. It created 'vginfo' for the
+ * VG, and created an 'info' attached to vginfo for each PV. It also
+ * added a metadata_area struct to info->mdas for each metadata area it
+ * found on the PV. The info->mdas structs are copied to
+ * fid->metadata_areas_in_use by create_instance above, and here we
+ * read VG metadata from each of those mdas.
+ */
+ dm_list_iterate_items(mda, &fid->metadata_areas_in_use) {
+ mda_dev = mda_get_device(mda);
- if (!correct_vg) {
- correct_vg = vg;
+ /* I don't think this can happen */
+ if (!mda_dev) {
+ log_warn("Ignoring metadata for VG %s from missing dev.", vgname);
continue;
}
- /* FIXME Also ensure contents same - checksum compare? */
- if (correct_vg->seqno != vg->seqno) {
- if (cmd->metadata_read_only || skipped_rescan)
- log_warn("Not repairing metadata for VG %s.", vgname);
- else
- inconsistent = 1;
-
- if (vg->seqno > correct_vg->seqno) {
- release_vg(correct_vg);
- correct_vg = vg;
- } else {
- mda->status |= MDA_INCONSISTENT;
- ++inconsistent_mda_count;
- }
- }
-
- if (vg != correct_vg) {
- release_vg(vg);
- vg_fmtdata = NULL;
- }
- }
- fid->ref_count--;
-
- /* Ensure every PV in the VG was in the cache */
- if (correct_vg) {
- /*
- * Update the seqno from the cache, for the benefit of
- * retro-style metadata formats like LVM1.
- */
- // correct_vg->seqno = seqno > correct_vg->seqno ? seqno : correct_vg->seqno;
-
- /*
- * If the VG has PVs without mdas, or ignored mdas, they may
- * still be orphans in the cache: update the cache state here,
- * and update the metadata lists in the vg.
- */
- if (!inconsistent &&
- dm_list_size(&correct_vg->pvs) > dm_list_size(pvids)) {
- dm_list_iterate_items(pvl, &correct_vg->pvs) {
- if (!pvl->pv->dev) {
- inconsistent_pvs = 1;
- break;
- }
-
- if (str_list_match_item(pvids, pvl->pv->dev->pvid))
- continue;
-
- /*
- * PV not marked as belonging to this VG in cache.
- * Check it's an orphan without metadata area
- * not ignored.
- */
- if (!(info = lvmcache_info_from_pvid(pvl->pv->dev->pvid, pvl->pv->dev, 1)) ||
- !lvmcache_is_orphan(info)) {
- inconsistent_pvs = 1;
- break;
- }
-
- if (lvmcache_mda_count(info)) {
- if (!lvmcache_fid_add_mdas_pv(info, fid)) {
- release_vg(correct_vg);
- return_NULL;
- }
-
- log_debug_metadata("Empty mda found for VG %s on %s.",
- vgname, dev_name(pvl->pv->dev));
-
-#if 0
- /*
- * If we are going to do any repair we have to be using
- * the latest metadata on disk, so we have to rescan devs
- * if we skipped that at the start of the vg_read. We'll
- * likely come back through here, but without having
- * skipped_rescan.
- *
- * FIXME: in some cases we don't want to do this.
- */
- if (skipped_rescan && cmd->can_use_one_scan) {
- log_debug_metadata("Restarting read to rescan devs.");
- cmd->can_use_one_scan = 0;
- release_vg(correct_vg);
- correct_vg = NULL;
- lvmcache_del(info);
- label_read(pvl->pv->dev);
- goto restart_scan;
- }
-#endif
-
- if (inconsistent_mdas)
- continue;
-
- /*
- * If any newly-added mdas are in-use then their
- * metadata needs updating.
- */
- lvmcache_foreach_mda(info, _check_mda_in_use,
- &inconsistent_mdas);
- }
- }
-
- /* If the check passed, let's update VG and recalculate pvids */
- if (!inconsistent_pvs) {
- log_debug_metadata("Updating cache for PVs without mdas "
- "in VG %s.", vgname);
- /*
- * If there is no precommitted metadata, committed metadata
- * is read and stored in the cache even if use_precommitted is set
- */
- lvmcache_update_vg(correct_vg, correct_vg->status & PRECOMMITTED);
-
- if (!(pvids = lvmcache_get_pvids(cmd, vgname, vgid))) {
- release_vg(correct_vg);
- return_NULL;
- }
- }
- }
-
- fid->ref_count++;
- if (dm_list_size(&correct_vg->pvs) !=
- dm_list_size(pvids) + vg_missing_pv_count(correct_vg)) {
- log_debug_metadata("Cached VG %s had incorrect PV list",
- vgname);
-
- if (prioritized_section())
- inconsistent = 1;
- else {
- release_vg(correct_vg);
- correct_vg = NULL;
- }
- } else dm_list_iterate_items(pvl, &correct_vg->pvs) {
- if (is_missing_pv(pvl->pv))
- continue;
- if (!str_list_match_item(pvids, pvl->pv->dev->pvid)) {
- log_debug_metadata("Cached VG %s had incorrect PV list",
- vgname);
- release_vg(correct_vg);
- correct_vg = NULL;
- break;
- }
- }
-
- if (correct_vg && inconsistent_mdas) {
- release_vg(correct_vg);
- correct_vg = NULL;
- }
- fid->ref_count--;
- }
-
- dm_list_init(&all_pvs);
-
- /* Failed to find VG where we expected it - full scan and retry */
- if (!correct_vg) {
- /*
- * Free outstanding format instance that remained unassigned
- * from previous step where we tried to get the "correct_vg",
- * but we failed to do so (so there's a dangling fid now).
- */
- _destroy_fid(&fid);
- vg_fmtdata = NULL;
-
- inconsistent = 0;
+ use_previous_vg = 0;
- /* Independent MDAs aren't supported under low memory */
- if (!cmd->independent_metadata_areas && prioritized_section())
- return_NULL;
- if (!(fmt = lvmcache_fmt_from_vgname(cmd, vgname, vgid, 0)))
- return_NULL;
+ if (use_precommitted) {
+ log_warn("Reading VG %s precommit metadata from %s %llu",
+ vgname, dev_name(mda_dev), (unsigned long long)mda->header_start);
- if (precommitted && !(fmt->features & FMT_PRECOMMIT))
- use_precommitted = 0;
+ vg = mda->ops->vg_read_precommit(fid, vgname, mda, &vg_fmtdata, &use_previous_vg);
- /* create format instance with appropriate metadata area */
- fic.type = FMT_INSTANCE_MDAS | FMT_INSTANCE_AUX_MDAS;
- fic.context.vg_ref.vg_name = vgname;
- fic.context.vg_ref.vg_id = vgid;
- if (!(fid = fmt->ops->create_instance(fmt, &fic))) {
- log_error("Failed to create format instance");
- return NULL;
- }
-
- /*
- * We use the fid globally here so prevent the release_vg
- * call to destroy the fid - we may want to reuse it!
- */
- fid->ref_count++;
- /* Ensure contents of all metadata areas match - else recover */
- inconsistent_mda_count=0;
- dm_list_iterate_items(mda, &fid->metadata_areas_in_use) {
- use_previous_vg = 0;
-
- if ((use_precommitted &&
- !(vg = mda->ops->vg_read_precommit(fid, vgname, mda, &vg_fmtdata, &use_previous_vg)) && !use_previous_vg) ||
- (!use_precommitted &&
- !(vg = mda->ops->vg_read(fid, vgname, mda, &vg_fmtdata, &use_previous_vg)) && !use_previous_vg)) {
- inconsistent = 1;
+ if (!vg && !use_previous_vg) {
+ log_warn("WARNING: Reading VG %s precommit on %s failed.", vgname, dev_name(mda_dev));
vg_fmtdata = NULL;
continue;
}
+ } else {
+ log_warn("Reading VG %s metadata from %s %llu",
+ vgname, dev_name(mda_dev), (unsigned long long)mda->header_start);
- /* Use previous VG because checksum matches */
- if (!vg) {
- vg = correct_vg;
- continue;
- }
-
- if (!correct_vg) {
- correct_vg = vg;
- if (!_update_pv_list(cmd->mem, &all_pvs, correct_vg)) {
- _free_pv_list(&all_pvs);
- fid->ref_count--;
- release_vg(vg);
- return_NULL;
- }
- continue;
- }
-
- if (!id_equal(&vg->id, &correct_vg->id)) {
- inconsistent = 1;
- inconsistent_vgid = 1;
- }
-
- /* FIXME Also ensure contents same - checksums same? */
- if (correct_vg->seqno != vg->seqno) {
- /* Ignore inconsistent seqno if told to skip repair logic */
- if (cmd->metadata_read_only || skipped_rescan)
- log_warn("Not repairing metadata for VG %s.", vgname);
- else
- inconsistent = 1;
-
- if (!_update_pv_list(cmd->mem, &all_pvs, vg)) {
- _free_pv_list(&all_pvs);
- fid->ref_count--;
- release_vg(vg);
- release_vg(correct_vg);
- return_NULL;
- }
- if (vg->seqno > correct_vg->seqno) {
- release_vg(correct_vg);
- correct_vg = vg;
- } else {
- mda->status |= MDA_INCONSISTENT;
- ++inconsistent_mda_count;
- }
- }
+ vg = mda->ops->vg_read(fid, vgname, mda, &vg_fmtdata, &use_previous_vg);
- if (vg != correct_vg) {
- release_vg(vg);
+ if (!vg && !use_previous_vg) {
+ log_warn("WARNING: Reading VG %s on %s failed.", vgname, dev_name(mda_dev));
vg_fmtdata = NULL;
+ continue;
}
}
- fid->ref_count--;
-
- /* Give up looking */
- if (!correct_vg) {
- _free_pv_list(&all_pvs);
- _destroy_fid(&fid);
- return_NULL;
- }
- }
-
- /*
- * If there is no precommitted metadata, committed metadata
- * is read and stored in the cache even if use_precommitted is set
- */
- lvmcache_update_vg(correct_vg, (correct_vg->status & PRECOMMITTED));
-
- if (inconsistent) {
- /* FIXME Test should be if we're *using* precommitted metadata not if we were searching for it */
- if (use_precommitted) {
- log_error("Inconsistent pre-commit metadata copies "
- "for volume group %s", vgname);
- /*
- * Check whether all of the inconsistent MDAs were on
- * MISSING PVs -- in that case, we should be safe.
- */
- dm_list_iterate_items(mda, &fid->metadata_areas_in_use) {
- if (mda->status & MDA_INCONSISTENT) {
- log_debug_metadata("Checking inconsistent MDA: %s", dev_name(mda_get_device(mda)));
- dm_list_iterate_items(pvl, &correct_vg->pvs) {
- if (mda_get_device(mda) == pvl->pv->dev &&
- (pvl->pv->status & MISSING_PV))
- --inconsistent_mda_count;
- }
- }
- }
-
- if (inconsistent_mda_count < 0)
- log_error(INTERNAL_ERROR "Too many inconsistent MDAs.");
+ if (!vg)
+ continue;
- if (!inconsistent_mda_count) {
- *consistent = 0;
- _free_pv_list(&all_pvs);
- return correct_vg;
- }
- _free_pv_list(&all_pvs);
- release_vg(correct_vg);
- return NULL;
+ if (vg && !vg_ret) {
+ vg_ret = vg;
+ dev_ret = mda_dev;
+ continue;
}
- if (!*consistent) {
- _free_pv_list(&all_pvs);
- return correct_vg;
- }
+ /*
+ * Use the newest copy of the metadata found on any mdas.
+ * Above, we could check if the scan found an old metadata
+ * seqno in this mda and just skip reading it again; then these
+ * seqno checks would just be sanity checks.
+ */
- if (cmd->is_clvmd) {
- _free_pv_list(&all_pvs);
- return correct_vg;
+ if (vg->seqno == vg_ret->seqno) {
+ release_vg(vg);
+ continue;
}
- if (skipped_rescan) {
- log_warn("Not repairing metadata for VG %s.", vgname);
- _free_pv_list(&all_pvs);
- release_vg(correct_vg);
- return_NULL;
+ if (vg->seqno > vg_ret->seqno) {
+ log_warn("WARNING: ignore old metadata seqno %u on %s vs new metadata seqno %u on %s for VG %s.",
+ vg_ret->seqno, dev_name(dev_ret),
+ vg->seqno, dev_name(mda_dev), vg->name);
+ release_vg(vg_ret);
+ vg_ret = vg;
+ dev_ret = mda_dev;
+ vg_fmtdata = NULL;
+ continue;
}
- /* Don't touch if vgids didn't match */
- if (inconsistent_vgid) {
- log_warn("WARNING: Inconsistent metadata UUIDs found for "
- "volume group %s.", vgname);
- *consistent = 0;
- _free_pv_list(&all_pvs);
- return correct_vg;
+ if (vg_ret->seqno > vg->seqno) {
+ log_warn("WARNING: ignore old metadata seqno %u on %s vs new metadata seqno %u on %s for VG %s.",
+ vg->seqno, dev_name(mda_dev),
+ vg_ret->seqno, dev_name(dev_ret), vg->name);
+ release_vg(vg);
+ vg_fmtdata = NULL;
+ continue;
}
+ }
- /*
- * If PV is marked missing but we found it,
- * update metadata and remove MISSING flag
- */
- dm_list_iterate_items(pvl, &all_pvs)
- _check_reappeared_pv(correct_vg, pvl->pv, 1);
+ if (vg_ret)
+ set_pv_devices(fid, vg_ret);
- if (!_repair_inconsistent_vg(correct_vg, lockd_state)) {
- _free_pv_list(&all_pvs);
- release_vg(correct_vg);
- return NULL;
- }
-
- if (!_wipe_outdated_pvs(cmd, correct_vg, &all_pvs, lockd_state)) {
- _free_pv_list(&all_pvs);
- release_vg(correct_vg);
- return_NULL;
- }
- }
+ fid->ref_count--;
- _free_pv_list(&all_pvs);
+ if (!vg_ret)
+ return_NULL;
- if (vg_missing_pv_count(correct_vg)) {
- log_verbose("There are %d physical volumes missing.",
- vg_missing_pv_count(correct_vg));
- vg_mark_partial_lvs(correct_vg, 1);
- }
+ /*
+ * In lvmcache, PVs with no mdas were not attached to the vginfo during
+ * label_scan because label_scan didn't know where they should go. Now
+ * that we have the VG metadata we can tell, so use that to attach those
+ * info's to the vginfo.
+ */
+ lvmcache_update_vg(vg_ret, vg_ret->status & PRECOMMITTED);
- if ((correct_vg->status & PVMOVE) && !pvmove_mode()) {
- log_error("Interrupted pvmove detected in volume group %s.",
- correct_vg->name);
- log_print("Please restore the metadata by running vgcfgrestore.");
- release_vg(correct_vg);
- return NULL;
+ if (vg_missing_pv_count(vg_ret)) {
+ log_warn("WARNING: VG %s is missing %d PVs.", vgname, vg_missing_pv_count(vg_ret));
+ vg_mark_partial_lvs(vg_ret, 1);
}
/* We have the VG now finally, check if PV ext info is in sync with VG metadata. */
- if (!cmd->is_clvmd && !_check_or_repair_pv_ext(cmd, correct_vg, lockd_state,
+ if (!cmd->is_clvmd && !_check_or_repair_pv_ext(cmd, vg_ret, lockd_state,
skipped_rescan ? 0 : *consistent,
&inconsistent_pvs)) {
- release_vg(correct_vg);
+ release_vg(vg_ret);
return_NULL;
}
*consistent = !inconsistent_pvs;
- if (!cmd->is_clvmd && correct_vg && *consistent && !skipped_rescan) {
- if (update_old_pv_ext && !_vg_update_old_pv_ext_if_needed(correct_vg)) {
- release_vg(correct_vg);
+ if (!cmd->is_clvmd && vg_ret && *consistent && !skipped_rescan) {
+ if (update_old_pv_ext && !_vg_update_old_pv_ext_if_needed(vg_ret)) {
+ release_vg(vg_ret);
return_NULL;
}
- if (strip_historical_lvs && !vg_strip_outdated_historical_lvs(correct_vg)) {
- release_vg(correct_vg);
+ if (strip_historical_lvs && !vg_strip_outdated_historical_lvs(vg_ret)) {
+ release_vg(vg_ret);
return_NULL;
}
}
- return correct_vg;
+ return vg_ret;
}
#define DEV_LIST_DELIM ", "
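The rewritten loop in `_vg_read` above reduces to "keep the readable copy with the highest seqno". A minimal sketch of that selection follows; `struct md_copy` and `pick_newest` are hypothetical names used only for illustration, and the sketch ignores the release of discarded copies and the warnings the patch logs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one metadata copy read from an mda. */
struct md_copy {
	int readable;   /* 0 if the read or parse failed */
	uint32_t seqno; /* meaningful only when readable is set */
};

/*
 * Returns the index of the copy to use, or -1 if none is readable.
 * The first readable copy wins ties, as in the patch, where a later
 * copy with an equal seqno is released.
 */
static int pick_newest(const struct md_copy *copies, size_t count)
{
	int best = -1;
	size_t i;

	for (i = 0; i < count; i++) {
		/* Unreadable copies are skipped, not treated as fatal. */
		if (!copies[i].readable)
			continue;

		if (best < 0 || copies[i].seqno > copies[best].seqno)
			best = (int) i;
	}

	return best;
}
```

Returning an index rather than the copy itself keeps the device that supplied the winning metadata identifiable, which is the role `dev_ret` plays in the patch.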
@@ -6109,3 +5896,802 @@ int vg_strip_outdated_historical_lvs(struct volume_group *vg) {
return 1;
}
+
+static int _vg_repair_metadata_on_dev(struct cmd_context *cmd,
+ const struct format_type *fmt,
+ struct volume_group *vg,
+ char *textbuf_src,
+ uint32_t textlen_src,
+ uint32_t textcrc_src,
+ struct device *dev)
+{
+ char uuidstr[64] __attribute__((aligned(8)));
+ char *read_buf = NULL;
+ char *mh_buf = NULL;
+ struct metadata_area *mda1;
+ struct metadata_area *mda2;
+ struct mda_context *mdac1;
+ struct mda_context *mdac2;
+ struct label_header *lh;
+ struct pv_header *ph;
+ struct disk_locn *dl;
+ struct mda_header *mh;
+ struct raw_locn *rlocn_slot0;
+ uint64_t area_offset; /* where metadata area begins (mda_header starts) */
+ uint64_t area_size; /* size of metadata area (header + buffer) */
+ uint64_t new_offset; /* where text begins in metadata area */
+ uint32_t lh_offset; /* label_header offset from start of disk */
+ uint32_t ph_offset; /* pv_header offset from start of label_header */
+ uint32_t mh_crc; /* of mda header sector */
+ size_t mh_buf_size = MDA_HEADER_SIZE; /* size of mda header buffer */
+ int label_found = 0;
+ int mda1_found = 0;
+ int mda2_found = 0;
+ int i;
+
+ /* for reading the headers on the dev being repaired */
+ if (!(read_buf = dm_zalloc(4096)))
+ return_0;
+
+ /* for writing new mda_header on the dev being repaired */
+ if (!(mh_buf = dm_zalloc(mh_buf_size)))
+ return_0;
+
+ /* for tracking update to first mda on the dev being repaired */
+ if (!(mda1 = dm_zalloc(sizeof(struct metadata_area))))
+ return_0;
+
+ /* for tracking update to first mda on the dev being repaired */
+ if (!(mdac1 = dm_zalloc(sizeof(struct mda_context))))
+ return_0;
+
+ /* for tracking update to second mda on the dev being repaired */
+ if (!(mda2 = dm_zalloc(sizeof(struct metadata_area))))
+ return_0;
+
+ /* for tracking update to second mda on the dev being repaired */
+ if (!(mdac2 = dm_zalloc(sizeof(struct mda_context))))
+ return_0;
+
+ mda1->metadata_locn = mdac1;
+ mda2->metadata_locn = mdac2;
+
+ mda1->ops = dm_list_item(dm_list_first(&fmt->mda_ops), struct metadata_area_ops);
+ mda2->ops = dm_list_item(dm_list_first(&fmt->mda_ops), struct metadata_area_ops);
+
+ mda1->status = MDA_PRIMARY;
+
+ /*
+ * Read the first 4K of the disk being repaired and get values from
+ * - label_header
+ * - pv_header
+ * - disk_locn's
+ *
+ * Those three things cannot yet be repaired; we require
+ * that they be intact and correct so they can be used
+ * to repair the mda_header's and text metadata.
+ */
+
+ if (!dev_read_bytes(dev, 0, 4096, read_buf)) {
+ log_error("Repair %s: cannot read device.", dev_name(dev));
+ return 0;
+ }
+
+ for (i = 0; i < LABEL_SCAN_SECTORS; i++) {
+ lh_offset = i * 512;
+
+ /* label_header usually in the second sector */
+ lh = (struct label_header *)(read_buf + lh_offset);
+
+ if (strncmp((char *)lh->id, LABEL_ID, sizeof(lh->id)))
+ continue;
+
+ if (xlate64(lh->sector_xl) != i)
+ continue;
+
+ /* TODO: check label crc and if it's not valid allow repair */
+
+ label_found = 1;
+
+ ph_offset = xlate32(lh->offset_xl);
+
+ log_warn("Repair %s: label_header found at %u", dev_name(dev), lh_offset);
+ log_warn("Repair %s: label_header pv_header offset %llu", dev_name(dev), (unsigned long long)ph_offset);
+ break;
+ }
+
+ if (!label_found) {
+ /* TODO: allow writing correct label header here */
+ log_error("Repair %s: no label_header found.", dev_name(dev));
+ return 0;
+ }
+
+ ph = (struct pv_header *)(read_buf + lh_offset + ph_offset);
+
+ /*
+ * TODO: allow an option that specifies a pv_uuid that we should use to
+ * write a new pv_header here.
+ */
+ if (!id_write_format((const struct id *)ph->pv_uuid, uuidstr, sizeof(uuidstr))) {
+ log_error("Repair %s: pv_header has invalid pv_uuid", dev_name(dev));
+ return 0;
+ }
+
+ log_warn("Repair %s: pv_header pv_uuid %s", dev_name(dev), uuidstr);
+ log_warn("Repair %s: pv_header device_size %llu", dev_name(dev),
+ (unsigned long long)xlate64(ph->device_size_xl));
+
+ /* disk_locn's follow the pv_header, specifying locations of data and metadata */
+ dl = ph->disk_areas_xl;
+
+ /* one disk_locn for each data area, we don't do anything with these */
+ while (1) {
+ area_offset = xlate64(dl->offset);
+ if (!area_offset)
+ break;
+ area_size = xlate64(dl->size);
+
+ log_warn("Repair %s: disk_locn data area offset %llu size %llu",
+ dev_name(dev),
+ (unsigned long long)area_offset,
+ (unsigned long long)area_size);
+
+ dl++;
+ }
+
+ /* one disk_locn is empty as separator */
+ dl++;
+
+ /* one disk_locn for each metadata area, we need these to repair metadata */
+ while (1) {
+ area_offset = xlate64(dl->offset);
+ if (!area_offset)
+ break;
+ area_size = xlate64(dl->size);
+
+ log_warn("Repair %s: disk_locn metadata area offset %llu size %llu",
+ dev_name(dev),
+ (unsigned long long)area_offset,
+ (unsigned long long)area_size);
+
+ if (!mda1_found) {
+ mda1_found = 1;
+ mdac1->area.dev = dev;
+ mdac1->area.start = area_offset;
+ mdac1->area.size = area_size;
+ } else if (!mda2_found) {
+ mda2_found = 1;
+ mdac2->area.dev = dev;
+ mdac2->area.start = area_offset;
+ mdac2->area.size = area_size;
+ }
+
+ dl++;
+ }
+
+ if (!mda1_found && !mda2_found) {
+ /* TODO: allow writing new disk_locn's for existing mda_headers here? */
+ log_error("Repair %s: no metadata locations found.", dev_name(dev));
+ return 0;
+ }
+
+ /* Ignoring pv_header_extension and disk_locn's for boot areas. */
+
+ /*
+ * Check that the PVID on this device matches a PVID in the VG metadata
+ * we're going to write. This is important in case the wrong
+ * destination dev is specified, one that belongs to another VG.
+ */
+ if (!find_pv_in_vg_by_uuid(vg, (struct id *)ph->pv_uuid)) {
+ log_error("Repair %s: pv_header uuid %s not found in source metadata.",
+ dev_name(dev), uuidstr);
+ return 0;
+ }
+
+ /*
+ * We cannot export 'vg' into a new buf to write because the export
+ * adds new comment fields which produce a different metadata checksum
+ * from the metadata on the other devices. Instead we have to write
+ * the exact copy of text metadata from the source device (in
+ * textbuf_src) so that the existing and repaired devices match.
+ */
+
+ /* TODO: syslog description of what we're doing */
+
+ /*
+ * Write new text metadata and mda_header. This ignores any existing
+ * mda_header and text area, and writes the new metadata at the start
+ * of the text area, and a new mda_header pointing to it.
+ *
+ * TODO: we could look for the largest start/size in slot0 or slot1
+ * and put this new repaired metadata after that. This would preserve
+ * older metadata copies for analysis/debugging.
+ *
+ * Should we first read the current/invalid/bad mda_header to check
+ * anything before rewriting it?
+ * We could check for the RAW_LOCN_IGNORED flag set in the
+ * mda_header/rlocn struct which indicates that we are ignoring
+ * this metadata area. But since we are repairing possibly corrupt
+ * data should we trust that bit?
+ *
+ * There are two different checksums: meta_buf_crc and mh_crc.
+ *
+ * meta_buf_crc is the checksum of the text metadata buffer.
+ * This is written in the mda_header/rlocn struct.
+ *
+ * mh_crc is the checksum of the sector containing the mda_header.
+ * It begins after the mda_header.checksum_xl field, ending at 512.
+ * It includes mda_header and rlocn structs (including meta_buf_crc.)
+ * It is written in the mda_header.checksum_xl field.
+ * It needs to be computed after xlate of the other fields.
+ */
+
+ if (mda1_found) {
+ new_offset = MDA_HEADER_SIZE; /* maybe start after existing? */
+
+ memset(mh_buf, 0, mh_buf_size);
+ mh = (struct mda_header *)mh_buf;
+ strncpy((char *)mh->magic, FMTT_MAGIC, sizeof(mh->magic));
+ mh->version = FMTT_VERSION;
+ mh->start = mdac1->area.start;
+ mh->size = mdac1->area.size;
+ rlocn_slot0 = &mh->raw_locns[0];
+ rlocn_slot0->offset = new_offset;
+ rlocn_slot0->size = textlen_src;
+ rlocn_slot0->checksum = textcrc_src;
+ /* slot1 is zero like it would usually be for committed metadata */
+
+ xlate_mdah(mh); /* must happen before crc is calculated */
+
+ mh_crc = calc_crc(INITIAL_CRC,
+ (uint8_t *)mh->magic,
+ (MDA_HEADER_SIZE - sizeof(mh->checksum_xl)));
+
+ mh->checksum_xl = xlate32(mh_crc);
+
+ /* TODO: write some unofficial data into the circular buffer
+ indicating the repair */
+
+ log_warn("Repair %s: writing new metadata at %llu", dev_name(dev),
+ (unsigned long long)(mdac1->area.start + new_offset));
+
+ /* write text metadata into circular buffer */
+ if (!test_mode()) {
+ if (!dev_write_bytes(dev, mdac1->area.start + new_offset,
+ (size_t)textlen_src, textbuf_src)) {
+ log_error("Repair %s: failed to write new metadata", dev_name(dev));
+ return 0;
+ }
+ } else {
+ log_warn("Skip write for test mode.");
+ }
+
+ log_warn("Repair %s: writing new mda_header at %llu", dev_name(dev),
+ (unsigned long long)mdac1->area.start);
+
+ /* write mda_header at start of metadata area */
+ if (!test_mode()) {
+ if (!dev_write_bytes(dev, mdac1->area.start,
+ (size_t)mh_buf_size, mh_buf)) {
+ log_error("Repair %s: failed to write new mda_header", dev_name(dev));
+ return 0;
+ }
+ } else {
+ log_warn("Skip write for test mode.");
+ }
+ }
+
+ /* catch typos below */
+ mda1 = NULL;
+ mdac1 = NULL;
+
+ /*
+ * TODO: the existing rlocn struct may have RAW_LOCN_IGNORED set which
+ * means this metadata area should not contain metadata.
+ * For now, if there's a metadata area here we will write the new
+ * repaired metadata into it even if it was previously ignored.
+ */
+
+ if (mda2_found) {
+ new_offset = MDA_HEADER_SIZE; /* maybe start after existing? */
+
+ memset(mh_buf, 0, mh_buf_size);
+ mh = (struct mda_header *)mh_buf;
+ strncpy((char *)mh->magic, FMTT_MAGIC, sizeof(mh->magic));
+ mh->version = FMTT_VERSION;
+ mh->start = mdac2->area.start;
+ mh->size = mdac2->area.size;
+ rlocn_slot0 = &mh->raw_locns[0];
+ rlocn_slot0->offset = new_offset;
+ rlocn_slot0->size = textlen_src;
+ rlocn_slot0->checksum = textcrc_src;
+ /* slot1 is zero like it would usually be for committed metadata */
+
+ xlate_mdah(mh); /* must happen before crc is calculated */
+
+ mh_crc = calc_crc(INITIAL_CRC,
+ (uint8_t *)mh->magic,
+ (MDA_HEADER_SIZE - sizeof(mh->checksum_xl)));
+
+ mh->checksum_xl = xlate32(mh_crc);
+
+ /* TODO: write some unofficial data into the circular buffer
+ indicating the repair */
+
+ log_warn("Repair %s: writing new metadata at %llu", dev_name(dev),
+ (unsigned long long)(mdac2->area.start + new_offset));
+
+ /* write text metadata into circular buffer */
+ if (!test_mode()) {
+ if (!dev_write_bytes(dev, mdac2->area.start + new_offset,
+ (size_t)textlen_src, textbuf_src)) {
+ log_error("Repair %s: failed to write new metadata", dev_name(dev));
+ return 0;
+ }
+ } else {
+ log_warn("Skip write for test mode.");
+ }
+
+ log_warn("Repair %s: writing new mda_header at %llu", dev_name(dev),
+ (unsigned long long)mdac2->area.start);
+
+ /* write mda_header at start of metadata area */
+ if (!test_mode()) {
+ if (!dev_write_bytes(dev, mdac2->area.start,
+ (size_t)mh_buf_size, mh_buf)) {
+ log_error("Repair %s: failed to write new mda_header", dev_name(dev));
+ return 0;
+ }
+ } else {
+ log_warn("Skip write for test mode.");
+ }
+ }
+
+ return 1;
+}
+
+static char *_read_metadata_text(struct cmd_context *cmd, struct device *dev,
+ uint64_t area_start, uint64_t area_size,
+ uint32_t *len, uint64_t *disk_offset)
+{
+ struct mda_header *mh;
+ struct raw_locn *rlocn_slot0;
+ uint64_t text_offset, text_size;
+ char *area_buf;
+ char *text_buf;
+
+ /*
+ * Read the entire metadata area, including mda_header and entire
+ * circular buffer.
+ */
+ if (!(area_buf = dm_malloc(area_size)))
+ return NULL;
+
+ if (!dev_read_bytes(dev, area_start, area_size, area_buf)) {
+ return NULL;
+ }
+
+ mh = (struct mda_header *)area_buf;
+ xlate_mdah(mh);
+
+ rlocn_slot0 = &mh->raw_locns[0];
+ text_offset = rlocn_slot0->offset;
+ text_size = rlocn_slot0->size;
+
+ /*
+ * Copy and return the current metadata text out of the metadata area.
+ */
+
+ if (!(text_buf = dm_malloc(text_size))) {
+ return NULL;
+ }
+
+ memcpy(text_buf, area_buf + text_offset, text_size);
+
+ if (len)
+ *len = (uint32_t)text_size;
+ if (disk_offset)
+ *disk_offset = area_start + text_offset;
+
+ return text_buf;
+}
+
+int vg_repair_metadata(struct cmd_context *cmd, const char *vgname,
+ const char *dev_src_name, const char *file_src_name,
+ struct dm_list *dev_dst_list)
+{
+ struct device *dev_src = NULL;
+ char *textbuf_src;
+ const char *vgid;
+ const struct format_type *fmt;
+ struct format_instance *fid;
+ struct format_instance_ctx fic;
+ struct metadata_area *mda_src;
+ struct mda_context *mdac_src;
+ struct volume_group *vg;
+ struct pv_list *pvl;
+ struct device_list *devl;
+ unsigned use_previous_vg = 0;
+ uint32_t textlen_src = 0;
+ uint32_t textcrc_src = 0;
+ int use_mda_num = 0;
+ int ret = 0;
+
+ /*
+ * The source metadata to use for repair comes from a specific PV,
+ * a specific file, or when neither is specified, we look for a
+ * good copy of metadata from any available PV in the named VG.
+ */
+ if (dev_src_name && file_src_name) {
+ log_error("Cannot use both source device and source file.");
+ return 0;
+
+ } else if (dev_src_name) {
+ if (!(dev_src = dev_cache_get(dev_src_name, NULL))) {
+ log_error("No device found for repair source.");
+ return 0;
+ }
+
+ } else if (file_src_name) {
+ /* Will open and read the file below. */
+
+ } else {
+ if (!(dev_src = lvmcache_get_repair_src_dev(cmd, vgname))) {
+ log_error("No device found to use for repair.");
+ return 0;
+ }
+ }
+
+ /*
+ * Set up overhead/abstractions for reading a given vgname
+ * (fmt/fid/fic/vgid).
+ *
+ * This requires that the label scan (already done) has
+ * found the vgname on at least one good PV and stuffed
+ * vginfo about it into lvmcache.
+ *
+ * TODO: remove this limitation so we can repair things
+ * even if label scan can't fully process any PVs?
+ */
+ if (!(vgid = lvmcache_vgid_from_vgname(cmd, vgname))) {
+ log_error("No VG found for %s", vgname);
+ return 0;
+ }
+
+ if (!(fmt = lvmcache_fmt_from_vgname(cmd, vgname, vgid, 0))) {
+ log_error("No fmt found for %s", vgname);
+ return 0;
+ }
+
+ fic.type = FMT_INSTANCE_MDAS | FMT_INSTANCE_AUX_MDAS;
+ fic.context.vg_ref.vg_name = vgname;
+ fic.context.vg_ref.vg_id = vgid;
+ if (!(fid = fmt->ops->create_instance(fmt, &fic))) {
+ log_error("Failed to create format instance");
+ return 0;
+ }
+
+ /*
+ * Read the source metadata from a single mda on a single PV,
+ * or from a text file. The source metadata is read into a
+ * raw buffer of text, and also read+parsed into a struct vg.
+ * The raw text is what needs to be written to the destination
+ * PVs, but the parsed vg struct allows us to verify the raw
+ * text metadata is parsable/valid, and to look up and verify
+ * values like PVID's in it.
+ *
+ * For dev_src:
+ * we require that the label_scan (already done) has found the
+ * VG on the source dev and added info to lvmcache about it.
+ * We get the source mda/mdac structs from lvmcache info->mdas.
+ * This assumes there is one good PV that label_scan can process
+ * correctly.
+ *
+ * TODO: if there was a :2 suffix on the dev src name, then
+ * use the second mda from the device.
+ *
+ * TODO: even if the textbuf is not parsable, should we allow
+ * using it with the force option? There may be cases where
+ * there's a problem with the parsing that shouldn't prevent
+ * us from writing the new metadata if we know it's what we want.
+ */
+ if (dev_src) {
+ if (!(mda_src = lvmcache_get_repair_src_mda(cmd, vgname, dev_src, use_mda_num))) {
+ log_error("No mda on source device %s", dev_name(dev_src));
+ goto out;
+ }
+ mdac_src = mda_src->metadata_locn;
+
+ /*
+ * Read the VG metadata from the source device as a raw chunk of
+ * original text.
+ */
+ textbuf_src = _read_metadata_text(cmd, dev_src,
+ mdac_src->area.start, mdac_src->area.size,
+ &textlen_src, NULL);
+ if (!textbuf_src || !textlen_src) {
+ log_error("No metadata text on source device %s", dev_name(dev_src));
+ goto out;
+ }
+
+ /*
+ * Read the same VG metadata, but imported/parsed into a vg struct
+ * format so we know it's valid/parsable, and can look at values in it.
+ * _vg_read_raw()
+ */
+ if (!(vg = mda_src->ops->vg_read(fid, vgname, mda_src, NULL, &use_previous_vg))) {
+ log_error("No parsable metadata on source device %s", dev_name(dev_src));
+ goto out;
+ }
+ set_pv_devices(fid, vg);
+
+ } else if (file_src_name) {
+ char *desc = NULL;
+ time_t when;
+ struct stat sb;
+ int fd, rv;
+
+ if ((fd = open(file_src_name, O_RDONLY)) < 0) { /* open() returns -1 on failure, not 0 */
+ log_error("Cannot open file to use for repair: %s", file_src_name);
+ goto out;
+ }
+
+ rv = fstat(fd, &sb);
+ if (rv) {
+ log_error("Cannot access file to use for repair: %s", file_src_name);
+ close(fd);
+ goto out;
+ }
+
+ if (!(textlen_src = (uint32_t)sb.st_size)) {
+ log_error("Empty file to use for repair: %s", file_src_name);
+ close(fd);
+ goto out;
+ }
+
+ if (!(textbuf_src = dm_zalloc(textlen_src + 1))) {
+ close(fd);
+ goto_out;
+ }
+
+ rv = read(fd, textbuf_src, textlen_src);
+ if (rv != textlen_src) {
+ log_error("Cannot read file to use for repair: %s", file_src_name);
+ close(fd);
+ goto out;
+ }
+
+ textlen_src += 1; /* null terminating byte */
+
+ close(fd);
+
+ /*
+ * Read the VG metadata from the file, but imported/parsed
+ * into a vg struct so we know it's valid/parsable, and can
+ * look at values in it.
+ *
+ * TODO: error if a "backup" metadata file is specified instead
+ * of raw output (or munge a backup file into a raw format.)
+ */
+ if (!(vg = text_read_metadata(fid, file_src_name, NULL, &use_previous_vg,
+ NULL, 0, 0, 0, 0, 0, NULL, 0, &when, &desc))) {
+ log_error("No parsable metadata in source file %s", file_src_name);
+ goto out;
+ }
+ set_pv_devices(fid, vg);
+ }
+
+ textcrc_src = calc_crc(INITIAL_CRC, (uint8_t *)textbuf_src, textlen_src);
+
+ /*
+ * If we haven't been told which devs to repair, then repair any related
+ * to this VG that label_scan saw a problem with.
+ *
+ * FIXME: on a given device, there may be a problem with just mda1 or
+ * just mda2. Get that info from lvmcache and use it to repair just
+ * one mda if only one of them has a problem.
+ */
+ if (dm_list_empty(dev_dst_list)) {
+ dm_list_iterate_items(pvl, &vg->pvs) {
+ if (lvmcache_has_bad_metadata(pvl->pv->dev)) {
+ if (!(devl = malloc(sizeof(*devl))))
+ return_0;
+ devl->dev = pvl->pv->dev;
+ dm_list_add(dev_dst_list, &devl->list);
+ log_warn("Device %s has bad metadata.", dev_name(devl->dev));
+ continue;
+ }
+
+ if (lvmcache_has_old_metadata(cmd, vgname, vgid, pvl->pv->dev)) {
+ if (!(devl = malloc(sizeof(*devl))))
+ return_0;
+ devl->dev = pvl->pv->dev;
+ dm_list_add(dev_dst_list, &devl->list);
+ log_warn("Device %s has old metadata.", dev_name(devl->dev));
+ continue;
+ }
+ }
+ }
+
+ if (dm_list_empty(dev_dst_list)) {
+ log_warn("No devices found needing repair.");
+ ret = 1;
+ goto out_release;
+ }
+
+ log_print("Using metadata for %s from %s with seqno %u size %u checksum 0x%x.",
+ vgname, dev_src ? dev_name(dev_src) : file_src_name,
+ vg->seqno, textlen_src, textcrc_src);
+ dm_list_iterate_items(devl, dev_dst_list)
+ log_warn("Repair %s ?", dev_name(devl->dev));
+
+#if 0
+ if (!arg_count(cmd, yes_ARG) &&
+ yes_no_prompt("Replace metadata on these devices? [y/n]: ") == 'n') {
+ log_error("Repair aborted.");
+ return 0;
+ }
+#endif
+
+ ret = 1;
+
+ dm_list_iterate_items(devl, dev_dst_list) {
+ if (!_vg_repair_metadata_on_dev(cmd, fmt, vg, textbuf_src, textlen_src, textcrc_src, devl->dev)) {
+ log_error("Repair failed on %s", dev_name(devl->dev));
+ ret = 0;
+ }
+ }
+
+out_release:
+ release_vg(vg);
+out:
+ fid->ref_count--;
+ return ret;
+}
+
+int vg_dump_metadata(struct cmd_context *cmd, const char *vgname,
+ const char *dev_src_name, int force, const char *tofile)
+{
+ struct device *dev_src;
+ char *textbuf_src;
+ const char *vgid;
+ const struct format_type *fmt;
+ struct format_instance *fid;
+ struct format_instance_ctx fic;
+ struct metadata_area *mda_src;
+ struct mda_context *mdac_src;
+ struct volume_group *vg;
+ unsigned use_previous_vg = 0;
+ uint32_t textlen_src = 0;
+ uint32_t textcrc_src;
+ uint64_t text_disk_offset;
+ int use_mda_num = 0;
+ int ret = 0;
+
+ if (!dev_src_name) {
+ if (!(dev_src = lvmcache_get_repair_src_dev(cmd, vgname))) {
+ log_error("No device found to use for dump.");
+ return 0;
+ }
+ } else {
+ if (!(dev_src = dev_cache_get(dev_src_name, NULL))) {
+ log_error("No device found for dump source.");
+ return 0;
+ }
+ }
+
+ /*
+ * Set up overhead/abstractions for reading a given vgname
+ * (fmt/fid/fic/vgid).
+ *
+ * This requires that the label scan (already done) has
+ * found the vgname on at least one good PV and stuffed
+ * vginfo about it into lvmcache.
+ *
+ * TODO: remove this limitation so we can repair things
+ * even if label scan can't fully process any PVs?
+ */
+ if (!(vgid = lvmcache_vgid_from_vgname(cmd, vgname))) {
+ log_error("No VG found for %s", vgname);
+ return 0;
+ }
+
+ if (!(fmt = lvmcache_fmt_from_vgname(cmd, vgname, vgid, 0))) {
+ log_error("No fmt found for %s", vgname);
+ return 0;
+ }
+
+ fic.type = FMT_INSTANCE_MDAS | FMT_INSTANCE_AUX_MDAS;
+ fic.context.vg_ref.vg_name = vgname;
+ fic.context.vg_ref.vg_id = vgid;
+ if (!(fid = fmt->ops->create_instance(fmt, &fic))) {
+ log_error("Failed to create format instance");
+ return 0;
+ }
+
+ /*
+ * Read the source metadata from a single mda on a single device.
+ * This metadata will be printed, or written to a file.
+ *
+ * We require that the label_scan (already done) has found the
+ * VG on the source dev and added info to lvmcache about it.
+ * We get the source mda/mdac structs from lvmcache info->mdas.
+ *
+ * This assumes there is one good PV that label_scan can process
+ * correctly.
+ *
+ * TODO: if there was a :2 suffix on the dev src name, then
+ * use the second mda from the device.
+ *
+ * TODO: add an option to look at a specific offset in the
+ * circular buffer instead of using the rlocn value.
+ */
+ if (!(mda_src = lvmcache_get_repair_src_mda(cmd, vgname, dev_src, use_mda_num))) {
+ log_error("No mda on source device %s", dev_name(dev_src));
+ goto out;
+ }
+ mdac_src = mda_src->metadata_locn;
+
+ /*
+ * Read the VG metadata from the source device as a raw chunk of
+ * original text.
+ */
+ textbuf_src = _read_metadata_text(cmd, dev_src,
+ mdac_src->area.start, mdac_src->area.size,
+ &textlen_src, &text_disk_offset);
+ if (!textbuf_src || !textlen_src) {
+ log_error("No metadata text on source device %s", dev_name(dev_src));
+ goto out;
+ }
+
+ textcrc_src = calc_crc(INITIAL_CRC, (uint8_t *)textbuf_src, textlen_src);
+
+ /*
+ * Read the same VG metadata, but imported/parsed into a vg struct
+ * format so we know it's valid/parsable, and can look at values in it.
+ * _vg_read_raw()
+ */
+ if (!(vg = mda_src->ops->vg_read(fid, vgname, mda_src, NULL, &use_previous_vg))) {
+ log_error("Parse error for metadata on source device %s.", dev_name(dev_src));
+ if (force)
+ goto print;
+ log_error("Use --force to dump unparsable metadata buffer.");
+ goto out;
+ }
+
+ set_pv_devices(fid, vg);
+
+print:
+ log_print("Printing metadata for %s from %s at %llu with seqno %u size %u checksum 0x%x.",
+ vgname, dev_name(dev_src),
+ (unsigned long long)text_disk_offset,
+ vg ? vg->seqno : 0, textlen_src, textcrc_src); /* vg is NULL when --force skips a parse error */
+
+ if (!tofile) {
+ log_print("---");
+ printf("%s\n", textbuf_src);
+ log_print("---");
+ } else {
+ FILE *fp;
+ if (!(fp = fopen(tofile, "wx"))) {
+ log_error("Failed to create file %s", tofile);
+ goto out;
+ }
+
+ fprintf(fp, "%s", textbuf_src);
+
+ if (fflush(fp))
+ stack;
+ if (fclose(fp))
+ stack;
+ }
+
+ if (vg)
+ release_vg(vg);
+
+ ret = 1;
+out:
+ fid->ref_count--;
+ return ret;
+}
+
diff --git a/lib/metadata/metadata.h b/lib/metadata/metadata.h
index 1e3dd1b97..b411f7d5f 100644
--- a/lib/metadata/metadata.h
+++ b/lib/metadata/metadata.h
@@ -173,6 +173,7 @@ struct metadata_area {
struct metadata_area_ops *ops;
void *metadata_locn;
uint32_t status;
+ uint64_t header_start; /* mda_header.start */
};
struct metadata_area *mda_copy(struct dm_pool *mem,
struct metadata_area *mda);
@@ -509,4 +510,6 @@ struct id pv_vgid(const struct physical_volume *pv);
uint64_t find_min_mda_size(struct dm_list *mdas);
char *tags_format_and_copy(struct dm_pool *mem, const struct dm_list *tagsl);
+void set_pv_devices(struct format_instance *fid, struct volume_group *vg);
+
#endif
diff --git a/tools/args.h b/tools/args.h
index c2fbac696..8d5ad5cd4 100644
--- a/tools/args.h
+++ b/tools/args.h
@@ -529,6 +529,15 @@ arg(repair_ARG, '\0', "repair", 0, 0, 0,
"utility on a thin pool. See \\fBlvmraid\\fP(7) and \\fBlvmthin\\fP(7)\n"
"for more information.\n")
+arg(repairmetadata_ARG, '\0', "repairmetadata", 0, 0, 0,
+ "Repair metadata on PVs in a VG.\n")
+
+arg(dumpmetadata_ARG, '\0', "dumpmetadata", 0, 0, 0,
+ "Print raw metadata for the VG.\n")
+
+arg(sourcedevice_ARG, '\0', "sourcedevice", pv_VAL, 0, 0,
+ "Use metadata from this device.\n")
+
arg(replace_ARG, '\0', "replace", pv_VAL, ARG_GROUPABLE, 0,
"Replace a specific PV in a raid LV with another PV.\n"
"The new PV to use can be optionally specified after the LV.\n"
diff --git a/tools/command-lines.in b/tools/command-lines.in
index a8c06bafc..624862713 100644
--- a/tools/command-lines.in
+++ b/tools/command-lines.in
@@ -1545,6 +1545,18 @@ vgck
OO: --reportformat ReportFmt
OP: VG|Tag ...
ID: vgck_general
+DESC: Read VG and display information
+
+vgck --repairmetadata VG
+OO: --sourcedevice PV, --file String
+OP: PV ...
+ID: vgck_repair_metadata
+DESC: Repair metadata on PVs
+
+vgck --dumpmetadata VG
+OO: --sourcedevice PV, --file String, --force
+ID: vgck_dump_metadata
+DESC: Read and print raw metadata from a PV
---
diff --git a/tools/vgck.c b/tools/vgck.c
index 54bc9d649..aa1f2e387 100644
--- a/tools/vgck.c
+++ b/tools/vgck.c
@@ -35,9 +35,114 @@ static int vgck_single(struct cmd_context *cmd __attribute__((unused)),
return ECMD_PROCESSED;
}
+/*
+ * vgck --repairmetadata [--sourcedevice PV_src] VG [PV_dst ...]
+ *
+ * PV_src: if specified, the metadata from this PV is written to
+ * the PVs needing repair. If not specified, a copy of the metadata
+ * with the largest seqno is used.
+ *
+ * PV_dst: if specified, new metadata is written to these PVs.
+ */
+
+static int _repair_metadata(struct cmd_context *cmd, int argc, char **argv)
+{
+ const char *dev_src_name = NULL;
+ const char *file_src_name = NULL;
+ const char *vgname;
+ struct device_list *devl;
+ struct dm_list dev_dst_list;
+ int ret = 1;
+ int i;
+
+ dm_list_init(&dev_dst_list);
+
+ vgname = cmd->position_argv[0];
+ /* TODO: verify valid vgname */
+
+ if (!lock_vol(cmd, vgname, LCK_VG_WRITE, NULL))
+ return ECMD_FAILED;
+
+ lvmcache_label_scan(cmd);
+
+ /*
+ * specific PVs to repair
+ */
+ for (i = 1; i < cmd->position_argc; i++) {
+ if (!(devl = malloc(sizeof(*devl))))
+ return ECMD_FAILED;
+ if (!(devl->dev = dev_cache_get(cmd->position_argv[i], NULL))) {
+ log_error("Device not found to repair: %s", cmd->position_argv[i]);
+ return ECMD_FAILED;
+ }
+ dm_list_add(&dev_dst_list, &devl->list);
+ }
+
+ /*
+ * specific PV or file to use as the source of metadata for repair
+ */
+ if (arg_is_set(cmd, sourcedevice_ARG)) {
+ if (!(dev_src_name = arg_str_value(cmd, sourcedevice_ARG, NULL)))
+ return ECMD_FAILED;
+
+ } else if (arg_is_set(cmd, file_ARG)) {
+ if (!(file_src_name = arg_str_value(cmd, file_ARG, NULL)))
+ return ECMD_FAILED;
+ }
+
+ ret = vg_repair_metadata(cmd, vgname, dev_src_name, file_src_name, &dev_dst_list);
+
+ unlock_vg(cmd, NULL, vgname);
+
+ if (!ret)
+ return ECMD_FAILED;
+ return ECMD_PROCESSED;
+}
+
+static int _dump_metadata(struct cmd_context *cmd, int argc, char **argv)
+{
+ const char *dev_src_name = NULL;
+ const char *vgname;
+ const char *tofile = NULL;
+ int ret = 1;
+
+ vgname = cmd->position_argv[0];
+ /* TODO: verify valid vgname */
+
+ if (!lock_vol(cmd, vgname, LCK_VG_READ, NULL))
+ return ECMD_FAILED;
+
+ lvmcache_label_scan(cmd);
+
+ if (arg_is_set(cmd, sourcedevice_ARG)) {
+ if (!(dev_src_name = arg_str_value(cmd, sourcedevice_ARG, NULL)))
+ return ECMD_FAILED;
+ }
+
+ if (arg_is_set(cmd, file_ARG)) {
+ if (!(tofile = arg_str_value(cmd, file_ARG, NULL)))
+ return ECMD_FAILED;
+ }
+
+ ret = vg_dump_metadata(cmd, vgname, dev_src_name, arg_is_set(cmd, force_ARG), tofile);
+
+ unlock_vg(cmd, NULL, vgname);
+
+ if (!ret)
+ return ECMD_FAILED;
+ return ECMD_PROCESSED;
+}
+
int vgck(struct cmd_context *cmd, int argc, char **argv)
{
lvmetad_make_unused(cmd);
+
+ if (arg_is_set(cmd, repairmetadata_ARG))
+ return _repair_metadata(cmd, argc, argv);
+
+ if (arg_is_set(cmd, dumpmetadata_ARG))
+ return _dump_metadata(cmd, argc, argv);
+
return process_each_vg(cmd, argc, argv, NULL, NULL, 0, 0, NULL,
&vgck_single);
}