============================================ Merge Directive format 2 and Bundle format 4 ============================================ :Date: 2007-06-21 Motivation ---------- Merge Directive format 2 represents a request to perform a certain merge. It provides access to all the data necessary to perform that merge, by including a branch URL or a bundle payload. It typically will include a preview of what applying the patch would do. Bundle Format 4 is designed to be a compact format for storing revision metadata that can be generated quickly and installed into a repository efficiently. It is not intended to be human-readable. Note ---- These two formats, taken together, can be viewed as the successor of Bundle format 0.9, so their specifications are combined. It is expected that in the future, bundle and merge-directive formats will vary independently. Bundle Format Name ------------------ This is the fourth bundle format to see public use. Previous versions were 0.7, 0.8, and 0.9. Only 0.7's version number was aligned with a Bazaar release. Dependencies ------------ - Container format 1 - Multiparent diffs - Bencode - Patch-RIO Description ----------- Merge Directives fulfil the role previous bundle formats had of requesting a merge to be performed, but are a more flexible way of doing so. With the introduction of these two formats, there is a clear split between "directive", which is a request to merge (and therefore signable), and "bundle", which is just data. Merge Directive format 2 may provide a patch preview of the change being requested. If a preview is supplied, the receiving client will verify that the actual change matches the preview. Merge Directive format 2 also includes a testament hash, to ensure that if a branch is used, the branch cannot be subverted to cause the wrong changes to be applied. Bundle format 4 is designed to trade human-readability for speed and compactness. It does not contain a human-readable "prelude" patch. Merge Directive 2 Contents -------------------------- This format consists of three sections, in the following order. Patch-RIO command section ~~~~~~~~~~~~~~~~~~~~~~~~~ This section is identical to the corresponding section in Format 1 merge directives, except as noted below. It is mandatory. It is terminated by a line reading ``#`` that is not preceeded by a line ending with ``\``. In order to support cherry-picking and patch comparison, this format adds a new piece of information, the ``base_revision_id``. This is a suggested base revision for merging. It may be supplied by the user. If not, it is calculated using the standard merge base algorithm, with the ``revision_id`` and target branch's ``last_revision`` as its inputs. When merging, clients should use the ``base_revision_id`` when it is not already present in the ancestry of the ``last_revision`` of the target branch. If it is already present, clients should calculate a merge base in the normal way. Patch preview section ~~~~~~~~~~~~~~~~~~~~~ This section is optional. It begins with the line ``# Begin patch``. It is terminated by the end-of-file or by the beginning of a bundle section. Its contents are a unified diff, as per the ``bzr diff`` command. The FROM revision is the ``base_revision_id`` specified in the Patch-RIO section. Bundle section ~~~~~~~~~~~~~~ This section is optional, but if it is not supplied, a source_branch must be supplied. It begins with the line ``# Begin bundle``, and is terminated by the end-of-file. The contents are a base-64 encoded bundle. This may be any bundle format, but formats 4+ are strongly recommended. The base revision is the newest revision in the source branch which is an ancestor of all revisions not present in target which are ancestors of revision_id. This base revision may or may not be the same as the ``base_revision_id``. In particular, the ``base_revision_id`` may specify a cherry-pick, but all the ancestors of the ``base_revision_id`` should be installed in the target repository before performing such a merge. Bundle 4 Contents ----------------- Bazaar revision bundles begin with a format marker that reads ``# Bazaar revision bundle v4`` in plaintext. The remainder of the file is a ``Bazaar pack format 1`` container. The container is compressed using bzip2. Putting the format marker in plaintext ensures that old clients will give good diagnostics, but renders the file unreadable by standard bzip2 utilities. Serialization ~~~~~~~~~~~~~ Format 4 records revision and inventory records in their repository serialization format. This minimizes translation and compression costs in the common case, where the sender and receiver use the same serialization format for their repository. Steps have been taken to ensure a faithful conversion when serialization formats are mismatched. Bundle Records ~~~~~~~~~~~~~~ The bundle format creates a single bundle-level record out of two container records. The first container record contains metainfo as a Bencoded dict. The second container record contains the body. The bundle record name is associated with the metainfo record. The body record is anonymous. Record metainfo ~~~~~~~~~~~~~~~ :record_kind: The storage strategy of the record. May be ``fulltext`` (the record body contains the full text of the value), ``mpdiff`` (the record body contains a multi-parent diff of the value), or ``header`` (no record body). :parents: Used in fulltext and mpdiff records. The revisions that should be noted as parents of this revision in the repository. For mpdiffs, this is also the list of build-parents. :sha1: Used in mpdiff records. The sha-1 hash of the full-text value. Bundle record naming ~~~~~~~~~~~~~~~~~~~~~ All bundle records have a single name, which is associated with the metainfo container record. Records are named according to the body's content-kind, revision-id, and file-id. Content-kind may be one of: :file: a version of a user file :inventory: the tree inventory :revision: the revision metadata for a revision :signature: the revision signature for a revision Names are constructed like so: ``content-kind/revision-id/file-id``. Values are iterpreted left-to-right, so if two values are present, they are content-kind and revision-id. A record has a file-id if-and-only-if it is a file record. Info records have no revision or file-id. Inventory, revision and signature all have content-kind and revision-id, but no file-id. Layout ~~~~~~ The first record is an info/header record. The subsequent records are mpdiff file records. The are ordered first by file id, then in topological order by revision-id. The next records are mpdiff inventory records. They are topologically sorted. The next records are revision and signature fulltexts. They are interleaved and topologically sorted. Info record ~~~~~~~~~~~ The info record has type ``header``. It has no revision_id or file_id. Its metadata contains: :serializer: A string describing the serialization format used for inventory and revision data. May be ``xml5``, ``xml6`` or ``xml7``. :supports_rich_root: 1 if the source repository supports rich roots, 0 otherwise. Implementation notes ~~~~~~~~~~~~~~~~~~~~ - knit deltas contain almost enough information to extract the original SequenceMatcher.get_matching_blocks() call used to produce them. Combining that information with the relevant fulltexts allows us to avoid performing sequence matching on any fulltexts for which we have deltas. - MultiParent deltas contain ``get_matching_blocks`` output almost verbatim, but if there is more than one parent, the information about the leftmost parent may be incomplete. However, for single-parent multiparent diffs, we can extract the ``SequenceMatcher.get_matching_blocks`` output, and therefore ``the SequenceMatcher.get_opcodes`` output used to create knit deltas. Installing data across serialization mismatches ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In practice, there cannot be revision serialization mismatches, because the serialization of revisions has been consistent in serializations 5-7 If there is a mismatch in inventory serialization formats, the receiver can 1. extract the inventory objects for the parents 2. serialize them using the bundle serialize 3. apply the mpdiff 4. calculate the fulltext sha1 5. compare the calculated sha1 to the expected sha1 6. deserialize using the bundle serializer 7. serialize using the repository serializer 8. add to the repository This is much slower, of course. But since the since the fulltext is verified at step 5, it should be just as safe as any other conversion. Model differences ~~~~~~~~~~~~~~~~~ Note that there may be model differences requiring additional changes. These differences are described by the "supports_rich_root" value in the info record. A subset of xml6 and xml7 records are compatible with xml5 (i.e. those that were converted from xml5 originally). When installing from a bundle whose serializer supports tree references to a repository that does not support tree references, clients should halt if they encounter a record containing a tree reference. When installing from a supports_rich_root bundle to a repository that does not support rich roots, clients should halt if they encounter an inventory record whose root directory revision-id does not match the inventory revision id. When installing from a bundle that does not support rich roots to a repository that does, additional knits should be added for the root directory, with a revision for each inventory revision. Validating preview patches ~~~~~~~~~~~~~~~~~~~~~~~~~~ When applying a merge directive that includes a preview, clients should verify that the preview matches the changes requested by the merge directive. In order to do this, the client should generate a diff from the ``base_revision_id`` to the ``revision_id``. This diff should be compared against the preview patch, making allowances for the fact that whitespace munging may have occurred. One form of whitespace munging that has been observed is line-ending conversion. Certain mail clients such as Evolution do not respect the line-endings of text attachments. Since line-ending conversion is unlikely to alter the meaning of a patch, it seems safe to ignore line endings when comparing the preview patch. Another form of whitespace munging that has been observed is trailing-whitespace stripping. Again, it seems unlikely that stripping trailing whitespace could alter the meaning of a patch. Such a distinction is also invisible to readers, so ignoring it does not create a new threat. So it seems reasonable to ignore trailing whitespace when comparing the patches. Other mungings are possible, but it is recommended not to implement support for them until they have been observed. Each of these changes makes the comparison more approximate, and the more approximate it becomes, the easier it is to provide a preview patch that does not match the requested changes.