From 0e8189e2708bc1da08c77c7e1d960f420b6890a5 Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Fri, 31 Oct 2008 11:31:08 -0400 Subject: close another possibility for propagating pack corruption Abstract -------- With index v2 we have a per object CRC to allow quick and safe reuse of pack data when repacking. This, however, doesn't currently prevent a stealth corruption from being propagated into a new pack when _not_ reusing pack data as demonstrated by the modification to t5302 included here. The Context ----------- The Git database is all checksummed with SHA1 hashes. Any kind of corruption can be confirmed by verifying this per object hash against corresponding data. However this can be costly to perform systematically and therefore this check is often not performed at run time when accessing the object database. First, the loose object format is entirely compressed with zlib which already provide a CRC verification of its own when inflating data. Any disk corruption would be caught already in this case. Then, packed objects are also compressed with zlib but only for their actual payload. The object headers and delta base references are not deflated for obvious performance reasons, however this leave them vulnerable to potentially undetected disk corruptions. Object types are often validated against the expected type when they're requested, and deflated size must always match the size recorded in the object header, so those cases are pretty much covered as well. Where corruptions could go unnoticed is in the delta base reference. Of course, in the OBJ_REF_DELTA case, the odds for a SHA1 reference to get corrupted so it actually matches the SHA1 of another object with the same size (the delta header stores the expected size of the base object to apply against) are virtually zero. In the OBJ_OFS_DELTA case, the reference is a pack offset which would have to match the start boundary of a different base object but still with the same size, and although this is relatively much more "probable" than in the OBJ_REF_DELTA case, the probability is also about zero in absolute terms. Still, the possibility exists as demonstrated in t5302 and is certainly greater than a SHA1 collision, especially in the OBJ_OFS_DELTA case which is now the default when repacking. Again, repacking by reusing existing pack data is OK since the per object CRC provided by index v2 guards against any such corruptions. What t5302 failed to test is a full repack in such case. The Solution ------------ As unlikely as this kind of stealth corruption can be in practice, it certainly isn't acceptable to propagate it into a freshly created pack. But, because this is so unlikely, we don't want to pay the run time cost associated with extra validation checks all the time either. Furthermore, consequences of such corruption in anything but repacking should be rather visible, and even if it could be quite unpleasant, it still has far less severe consequences than actively creating bad packs. So the best compromize is to check packed object CRC when unpacking objects, and only during the compression/writing phase of a repack, and only when not streaming the result. The cost of this is minimal (less than 1% CPU time), and visible only with a full repack. Someone with a stats background could provide an objective evaluation of this, but I suspect that it's bad RAM that has more potential for data corruptions at this point, even in those cases where this extra check is not performed. Still, it is best to prevent a known hole for corruption when recreating object data into a new pack. What about the streamed pack case? Well, any client receiving a pack must always consider that pack as untrusty and perform full validation anyway, hence no such stealth corruption could be propagated to remote repositoryes already. It is therefore worthless doing local validation in that case. Signed-off-by: Nicolas Pitre Signed-off-by: Junio C Hamano --- builtin-pack-objects.c | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'builtin-pack-objects.c') diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index 15b80db5a1..1591417962 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -1698,6 +1698,16 @@ static void prepare_pack(int window, int depth) get_object_details(); + /* + * If we're locally repacking then we need to be doubly careful + * from now on in order to make sure no stealth corruption gets + * propagated to the new pack. Clients receiving streamed packs + * should validate everything they get anyway so no need to incur + * the additional cost here in that case. + */ + if (!pack_to_stdout) + do_check_packed_object_crc = 1; + if (!nr_objects || !window || !depth) return; -- cgit v1.2.1 From d8f325563d85abcd9816311b3a84093b2d1cda9f Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Wed, 29 Oct 2008 19:02:45 -0400 Subject: better validation on delta base object offsets In one case, it was possible to have a bad offset equal to 0 effectively pointing a delta onto itself and crashing git after too many recursions. In the other cases, a negative offset could result due to off_t being signed. Catch those. Signed-off-by: Nicolas Pitre Signed-off-by: Junio C Hamano --- builtin-pack-objects.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'builtin-pack-objects.c') diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index 1591417962..cc1e47f41a 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -1038,10 +1038,10 @@ static void check_object(struct object_entry *entry) c = buf[used_0++]; ofs = (ofs << 7) + (c & 127); } - if (ofs >= entry->in_pack_offset) + ofs = entry->in_pack_offset - ofs; + if (ofs <= 0 || ofs >= entry->in_pack_offset) die("delta base offset out of bound for %s", sha1_to_hex(entry->idx.sha1)); - ofs = entry->in_pack_offset - ofs; if (reuse_delta && !entry->preferred_base) { struct revindex_entry *revidx; revidx = find_pack_revindex(p, ofs); -- cgit v1.2.1 From 09ded04b7e1f0096bb2fe356b2f5a298296151dd Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Wed, 29 Oct 2008 19:02:46 -0400 Subject: make unpack_object_header() non fatal It is possible to have pack corruption in the object header. Currently unpack_object_header() simply die() on them instead of letting the caller deal with that gracefully. So let's have unpack_object_header() return an error instead, and find a better name for unpack_object_header_gently() in that context. All callers of unpack_object_header() are ready for it. Signed-off-by: Nicolas Pitre Signed-off-by: Junio C Hamano --- builtin-pack-objects.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'builtin-pack-objects.c') diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index cc1e47f41a..64aefdf23b 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -1002,7 +1002,7 @@ static void check_object(struct object_entry *entry) * We want in_pack_type even if we do not reuse delta * since non-delta representations could still be reused. */ - used = unpack_object_header_gently(buf, avail, + used = unpack_object_header_buffer(buf, avail, &entry->in_pack_type, &entry->size); -- cgit v1.2.1 From 03d660150cbc80cd7d2ec90c3c4e6ce563295e3a Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Wed, 29 Oct 2008 19:02:48 -0400 Subject: make check_object() resilient to pack corruptions The check_object() function tries to get away with the least amount of pack access possible when it already has partial information on given object rather than calling the more costly packed_object_info(). When things don't look right, it should just give up and fall back to packed_object_info() directly instead of die()'ing. Signed-off-by: Nicolas Pitre Signed-off-by: Junio C Hamano --- builtin-pack-objects.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-) (limited to 'builtin-pack-objects.c') diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index 64aefdf23b..c05fb944b0 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -1005,6 +1005,8 @@ static void check_object(struct object_entry *entry) used = unpack_object_header_buffer(buf, avail, &entry->in_pack_type, &entry->size); + if (used == 0) + goto give_up; /* * Determine if this is a delta and if so whether we can @@ -1016,6 +1018,8 @@ static void check_object(struct object_entry *entry) /* Not a delta hence we've already got all we need. */ entry->type = entry->in_pack_type; entry->in_pack_header_size = used; + if (entry->type < OBJ_COMMIT || entry->type > OBJ_BLOB) + goto give_up; unuse_pack(&w_curs); return; case OBJ_REF_DELTA: @@ -1032,16 +1036,20 @@ static void check_object(struct object_entry *entry) ofs = c & 127; while (c & 128) { ofs += 1; - if (!ofs || MSB(ofs, 7)) - die("delta base offset overflow in pack for %s", - sha1_to_hex(entry->idx.sha1)); + if (!ofs || MSB(ofs, 7)) { + error("delta base offset overflow in pack for %s", + sha1_to_hex(entry->idx.sha1)); + goto give_up; + } c = buf[used_0++]; ofs = (ofs << 7) + (c & 127); } ofs = entry->in_pack_offset - ofs; - if (ofs <= 0 || ofs >= entry->in_pack_offset) - die("delta base offset out of bound for %s", - sha1_to_hex(entry->idx.sha1)); + if (ofs <= 0 || ofs >= entry->in_pack_offset) { + error("delta base offset out of bound for %s", + sha1_to_hex(entry->idx.sha1)); + goto give_up; + } if (reuse_delta && !entry->preferred_base) { struct revindex_entry *revidx; revidx = find_pack_revindex(p, ofs); @@ -1078,6 +1086,8 @@ static void check_object(struct object_entry *entry) */ entry->size = get_size_from_delta(p, &w_curs, entry->in_pack_offset + entry->in_pack_header_size); + if (entry->size == 0) + goto give_up; unuse_pack(&w_curs); return; } @@ -1087,6 +1097,7 @@ static void check_object(struct object_entry *entry) * with sha1_object_info() to find about the object type * at this point... */ + give_up: unuse_pack(&w_curs); } -- cgit v1.2.1 From 08698b1e32bc414f214b7300b40c30a30d9ecd1c Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Wed, 29 Oct 2008 19:02:49 -0400 Subject: make find_pack_revindex() aware of the nasty world It currently calls die() whenever given offset is not found thinking that such thing should never happen. But this offset may come from a corrupted pack whych _could_ happen and not be found. Callers should deal with this possibility gracefully instead. Signed-off-by: Nicolas Pitre Signed-off-by: Junio C Hamano --- builtin-pack-objects.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'builtin-pack-objects.c') diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index c05fb944b0..04a5a55ef9 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -1053,6 +1053,8 @@ static void check_object(struct object_entry *entry) if (reuse_delta && !entry->preferred_base) { struct revindex_entry *revidx; revidx = find_pack_revindex(p, ofs); + if (!revidx) + goto give_up; base_ref = nth_packed_object_sha1(p, revidx->nr); } entry->in_pack_header_size = used + used_0; -- cgit v1.2.1 From 64bd76b1de75483dea646c39c390113ffc821299 Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Wed, 29 Oct 2008 19:02:50 -0400 Subject: pack-objects: allow "fixing" a corrupted pack without a full repack When the pack data to be reused is found to be bad, let's fall back to full object access through the generic path which has its own strategies to find alternate object sources in that case. This allows for "fixing" a corrupted pack simply by copying either another pack containing the object(s) found to be bad, or the loose object itself, into the object store and launch a repack without the need for -f. Signed-off-by: Nicolas Pitre Signed-off-by: Junio C Hamano --- builtin-pack-objects.c | 28 +++++++++++++++++++--------- 1 file changed, 19 insertions(+), 9 deletions(-) (limited to 'builtin-pack-objects.c') diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index 04a5a55ef9..5be6664c72 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -277,6 +277,7 @@ static unsigned long write_object(struct sha1file *f, */ if (!to_reuse) { + no_reuse: if (!usable_delta) { buf = read_sha1_file(entry->idx.sha1, &type, &size); if (!buf) @@ -358,20 +359,30 @@ static unsigned long write_object(struct sha1file *f, struct revindex_entry *revidx; off_t offset; - if (entry->delta) { + if (entry->delta) type = (allow_ofs_delta && entry->delta->idx.offset) ? OBJ_OFS_DELTA : OBJ_REF_DELTA; - reused_delta++; - } hdrlen = encode_header(type, entry->size, header); + offset = entry->in_pack_offset; revidx = find_pack_revindex(p, offset); datalen = revidx[1].offset - offset; if (!pack_to_stdout && p->index_version > 1 && - check_pack_crc(p, &w_curs, offset, datalen, revidx->nr)) - die("bad packed object CRC for %s", sha1_to_hex(entry->idx.sha1)); + check_pack_crc(p, &w_curs, offset, datalen, revidx->nr)) { + error("bad packed object CRC for %s", sha1_to_hex(entry->idx.sha1)); + unuse_pack(&w_curs); + goto no_reuse; + } + offset += entry->in_pack_header_size; datalen -= entry->in_pack_header_size; + if (!pack_to_stdout && p->index_version == 1 && + check_pack_inflate(p, &w_curs, offset, datalen, entry->size)) { + error("corrupt packed object for %s", sha1_to_hex(entry->idx.sha1)); + unuse_pack(&w_curs); + goto no_reuse; + } + if (type == OBJ_OFS_DELTA) { off_t ofs = entry->idx.offset - entry->delta->idx.offset; unsigned pos = sizeof(dheader) - 1; @@ -383,21 +394,19 @@ static unsigned long write_object(struct sha1file *f, sha1write(f, header, hdrlen); sha1write(f, dheader + pos, sizeof(dheader) - pos); hdrlen += sizeof(dheader) - pos; + reused_delta++; } else if (type == OBJ_REF_DELTA) { if (limit && hdrlen + 20 + datalen + 20 >= limit) return 0; sha1write(f, header, hdrlen); sha1write(f, entry->delta->idx.sha1, 20); hdrlen += 20; + reused_delta++; } else { if (limit && hdrlen + datalen + 20 >= limit) return 0; sha1write(f, header, hdrlen); } - - if (!pack_to_stdout && p->index_version == 1 && - check_pack_inflate(p, &w_curs, offset, datalen, entry->size)) - die("corrupt packed object for %s", sha1_to_hex(entry->idx.sha1)); copy_pack_data(f, p, &w_curs, offset, datalen); unuse_pack(&w_curs); reused++; @@ -1074,6 +1083,7 @@ static void check_object(struct object_entry *entry) */ entry->type = entry->in_pack_type; entry->delta = base_entry; + entry->delta_size = entry->size; entry->delta_sibling = base_entry->delta_child; base_entry->delta_child = entry; unuse_pack(&w_curs); -- cgit v1.2.1 From 59dd9ed18398d96922e345f7987f1245870fb3e5 Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Wed, 29 Oct 2008 19:02:52 -0400 Subject: pack-objects: don't leak pack window reference when splitting packs Signed-off-by: Nicolas Pitre Signed-off-by: Junio C Hamano --- builtin-pack-objects.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) (limited to 'builtin-pack-objects.c') diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index 5be6664c72..1b6eff314e 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -389,22 +389,28 @@ static unsigned long write_object(struct sha1file *f, dheader[pos] = ofs & 127; while (ofs >>= 7) dheader[--pos] = 128 | (--ofs & 127); - if (limit && hdrlen + sizeof(dheader) - pos + datalen + 20 >= limit) + if (limit && hdrlen + sizeof(dheader) - pos + datalen + 20 >= limit) { + unuse_pack(&w_curs); return 0; + } sha1write(f, header, hdrlen); sha1write(f, dheader + pos, sizeof(dheader) - pos); hdrlen += sizeof(dheader) - pos; reused_delta++; } else if (type == OBJ_REF_DELTA) { - if (limit && hdrlen + 20 + datalen + 20 >= limit) + if (limit && hdrlen + 20 + datalen + 20 >= limit) { + unuse_pack(&w_curs); return 0; + } sha1write(f, header, hdrlen); sha1write(f, entry->delta->idx.sha1, 20); hdrlen += 20; reused_delta++; } else { - if (limit && hdrlen + datalen + 20 >= limit) + if (limit && hdrlen + datalen + 20 >= limit) { + unuse_pack(&w_curs); return 0; + } sha1write(f, header, hdrlen); } copy_pack_data(f, p, &w_curs, offset, datalen); -- cgit v1.2.1