Diffstat (limited to 'doc/developers/bundle-format4.txt')
-rw-r--r-- | doc/developers/bundle-format4.txt | 272 |
1 files changed, 272 insertions, 0 deletions
diff --git a/doc/developers/bundle-format4.txt b/doc/developers/bundle-format4.txt
new file mode 100644
index 0000000..0af1489
--- /dev/null
+++ b/doc/developers/bundle-format4.txt
@@ -0,0 +1,272 @@
+============================================
+Merge Directive format 2 and Bundle format 4
+============================================
+
+:Date: 2007-06-21
+
+Motivation
+----------
+Merge Directive format 2 represents a request to perform a certain merge. It
+provides access to all the data necessary to perform that merge, by including
+a branch URL or a bundle payload. It typically will include a preview of
+what applying the patch would do.
+
+Bundle Format 4 is designed to be a compact format for storing revision
+metadata that can be generated quickly and installed into a repository
+efficiently. It is not intended to be human-readable.
+
+Note
+----
+These two formats, taken together, can be viewed as the successor of Bundle
+format 0.9, so their specifications are combined. It is expected that in the
+future, bundle and merge-directive formats will vary independently.
+
+
+Bundle Format Name
+------------------
+This is the fourth bundle format to see public use. Previous versions were
+0.7, 0.8, and 0.9. Only 0.7's version number was aligned with a Bazaar
+release.
+
+
+Dependencies
+------------
+- Container format 1
+- Multiparent diffs
+- Bencode
+- Patch-RIO
+
+
+Description
+-----------
+Merge Directives fulfil the role previous bundle formats had of requesting a
+merge to be performed, but are a more flexible way of doing so. With the
+introduction of these two formats, there is a clear split between "directive",
+which is a request to merge (and therefore signable), and "bundle", which is
+just data.
+
+Merge Directive format 2 may provide a patch preview of the change being
+requested. If a preview is supplied, the receiving client will verify that
+the actual change matches the preview.
+
+Merge Directive format 2 also includes a testament hash, to ensure that if a
+branch is used, the branch cannot be subverted to cause the wrong changes to be
+applied.
+
+Bundle format 4 is designed to trade human-readability for speed and
+compactness. It does not contain a human-readable "prelude" patch.
+
+Merge Directive 2 Contents
+--------------------------
+This format consists of three sections, in the following order.
+
+
+Patch-RIO command section
+~~~~~~~~~~~~~~~~~~~~~~~~~
+This section is identical to the corresponding section in Format 1 merge
+directives, except as noted below. It is mandatory. It is terminated by a
+line reading ``#`` that is not preceded by a line ending with ``\``.
+
+In order to support cherry-picking and patch comparison, this format adds a new
+piece of information, the ``base_revision_id``. This is a suggested base
+revision for merging. It may be supplied by the user. If not, it is
+calculated using the standard merge base algorithm, with the ``revision_id``
+and target branch's ``last_revision`` as its inputs.
+
+When merging, clients should use the ``base_revision_id`` when it is not
+already present in the ancestry of the ``last_revision`` of the target branch.
+If it is already present, clients should calculate a merge base in the normal
+way.
+
+
+Patch preview section
+~~~~~~~~~~~~~~~~~~~~~
+This section is optional. It begins with the line ``# Begin patch``. It is
+terminated by the end-of-file or by the beginning of a bundle section.
+
+Its contents are a unified diff, as per the ``bzr diff`` command. The FROM
+revision is the ``base_revision_id`` specified in the Patch-RIO section.
+
+
+Bundle section
+~~~~~~~~~~~~~~
+This section is optional, but if it is not supplied, a source_branch must be
+supplied. It begins with the line ``# Begin bundle``, and is terminated by the
+end-of-file.
+
+The contents are a base-64 encoded bundle. This may be any bundle format, but
+formats 4+ are strongly recommended.
+The base revision is the newest revision
+in the source branch which is an ancestor of all revisions not present in
+target which are ancestors of revision_id.
+
+This base revision may or may not be the same as the ``base_revision_id``. In
+particular, the ``base_revision_id`` may specify a cherry-pick, but all the
+ancestors of the ``base_revision_id`` should be installed in the target
+repository before performing such a merge.
+
+
+Bundle 4 Contents
+-----------------
+Bazaar revision bundles begin with a format marker that reads
+``# Bazaar revision bundle v4`` in plaintext. The remainder of the file is a
+``Bazaar pack format 1`` container. The container is compressed using bzip2.
+
+Putting the format marker in plaintext ensures that old clients will give good
+diagnostics, but renders the file unreadable by standard bzip2 utilities.
+
+Serialization
+~~~~~~~~~~~~~
+Format 4 records revision and inventory records in their repository
+serialization format. This minimizes translation and compression costs
+in the common case, where the sender and receiver use the same serialization
+format for their repository. Steps have been taken to ensure a faithful
+conversion when serialization formats are mismatched.
+
+
+Bundle Records
+~~~~~~~~~~~~~~
+The bundle format creates a single bundle-level record out of two container
+records. The first container record contains metainfo as a Bencoded dict. The
+second container record contains the body.
+
+The bundle record name is associated with the metainfo record. The body record
+is anonymous.
+
+
+Record metainfo
+~~~~~~~~~~~~~~~
+
+:record_kind: The storage strategy of the record. May be ``fulltext`` (the
+  record body contains the full text of the value), ``mpdiff`` (the record
+  body contains a multi-parent diff of the value), or ``header`` (no record
+  body).
+:parents: Used in fulltext and mpdiff records. The revisions that should be
+  noted as parents of this revision in the repository.
+  For mpdiffs, this is also the list of build-parents.
+:sha1: Used in mpdiff records. The sha-1 hash of the full-text value.
+
+
+Bundle record naming
+~~~~~~~~~~~~~~~~~~~~
+All bundle records have a single name, which is associated with the metainfo
+container record. Records are named according to the body's content-kind,
+revision-id, and file-id.
+
+Content-kind may be one of:
+
+:file: a version of a user file
+:inventory: the tree inventory
+:revision: the revision metadata for a revision
+:signature: the revision signature for a revision
+
+Names are constructed like so: ``content-kind/revision-id/file-id``. Values
+are interpreted left-to-right, so if two values are present, they are
+content-kind and revision-id.
+A record has a file-id if-and-only-if it is a file record.
+Info records have no revision-id or file-id.
+Inventory, revision and signature records all have content-kind and
+revision-id, but no file-id.
+
+Layout
+~~~~~~
+The first record is an info/header record.
+
+The subsequent records are mpdiff file records. They are ordered first by file
+id, then in topological order by revision-id.
+
+The next records are mpdiff inventory records. They are topologically sorted.
+
+The next records are revision and signature fulltexts. They are interleaved
+and topologically sorted.
+
+Info record
+~~~~~~~~~~~
+The info record has type ``header``. It has no revision_id or file_id.
+Its metadata contains:
+
+:serializer: A string describing the serialization format used for inventory
+  and revision data. May be ``xml5``, ``xml6`` or ``xml7``.
+:supports_rich_root: 1 if the source repository supports rich roots,
+  0 otherwise.
+
+
+Implementation notes
+~~~~~~~~~~~~~~~~~~~~
+- Knit deltas contain almost enough information to reconstruct the output of
+  the original ``SequenceMatcher.get_matching_blocks()`` call used to produce
+  them. Combining that information with the relevant fulltexts allows us to
+  avoid performing sequence matching on any fulltexts for which we have deltas.
+
+- MultiParent deltas contain ``get_matching_blocks`` output almost verbatim,
+  but if there is more than one parent, the information about the leftmost
+  parent may be incomplete. However, for single-parent multiparent diffs, we
+  can extract the ``SequenceMatcher.get_matching_blocks`` output, and therefore
+  the ``SequenceMatcher.get_opcodes`` output used to create knit deltas.
+
+
+Installing data across serialization mismatches
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+In practice, there cannot be revision serialization mismatches, because the
+serialization of revisions has been consistent in serializations 5-7.
+
+If there is a mismatch in inventory serialization formats, the receiver can:
+
+  1. extract the inventory objects for the parents
+  2. serialize them using the bundle serializer
+  3. apply the mpdiff
+  4. calculate the fulltext sha1
+  5. compare the calculated sha1 to the expected sha1
+  6. deserialize using the bundle serializer
+  7. serialize using the repository serializer
+  8. add to the repository
+
+This is much slower, of course. But since the fulltext is verified
+at step 5, it should be just as safe as any other conversion.
+
+Model differences
+~~~~~~~~~~~~~~~~~
+
+Note that there may be model differences requiring additional changes. These
+differences are described by the "supports_rich_root" value in the info record.
+
+A subset of xml6 and xml7 records is compatible with xml5 (i.e. those that
+were converted from xml5 originally).
+
+When installing from a bundle whose serializer supports tree references to a
+repository that does not support tree references, clients should halt if they
+encounter a record containing a tree reference.
+
+When installing from a supports_rich_root bundle to a repository that does not
+support rich roots, clients should halt if they encounter an inventory record
+whose root directory revision-id does not match the inventory revision id.
+
+When installing from a bundle that does not support rich roots to a repository
+that does, additional knits should be added for the root directory, with a
+revision for each inventory revision.
+
+Validating preview patches
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+When applying a merge directive that includes a preview, clients should
+verify that the preview matches the changes requested by the merge directive.
+
+In order to do this, the client should generate a diff from the
+``base_revision_id`` to the ``revision_id``. This diff should be compared
+against the preview patch, making allowances for the fact that whitespace
+munging may have occurred.
+
+One form of whitespace munging that has been observed is line-ending
+conversion. Certain mail clients such as Evolution do not respect the
+line-endings of text attachments. Since line-ending conversion is unlikely to
+alter the meaning of a patch, it seems safe to ignore line endings when
+comparing the preview patch.
+
+Another form of whitespace munging that has been observed is
+trailing-whitespace stripping. Again, it seems unlikely that stripping
+trailing whitespace could alter the meaning of a patch. Such a distinction
+is also invisible to readers, so ignoring it does not create a new threat. So
+it seems reasonable to ignore trailing whitespace when comparing the patches.
+
+Other mungings are possible, but it is recommended not to implement support
+for them until they have been observed. Each of these changes makes the
+comparison more approximate, and the more approximate it becomes, the easier it
+is to provide a preview patch that does not match the requested changes.
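
Editorial note (not part of the patch): the comparison described under
"Validating preview patches" can be sketched roughly as follows. This is an
illustration under stated assumptions, not the Bazaar implementation. The
function names ``normalize_patch_lines`` and ``preview_matches`` are
hypothetical, and both patches are assumed to be already decoded to text. Only
the two mungings the document names are tolerated: line-ending conversion and
trailing-whitespace stripping.

```python
import re

# Hypothetical helper names; this is not bzrlib API.
_LINE_END = re.compile(r"\r\n|\r|\n")


def normalize_patch_lines(patch_text):
    """Split a patch into lines, dropping line endings and trailing blanks.

    Splitting on CRLF, CR and LF makes the result independent of any
    line-ending conversion; stripping trailing spaces and tabs makes it
    independent of trailing-whitespace stripping. Leading whitespace and
    every other character are kept exactly as they are.
    """
    return [line.rstrip(" \t") for line in _LINE_END.split(patch_text)]


def preview_matches(preview_text, generated_text):
    """Return True if the preview equals the locally generated diff,
    ignoring only line endings and trailing whitespace."""
    return normalize_patch_lines(preview_text) == normalize_patch_lines(
        generated_text
    )


if __name__ == "__main__":
    generated = "--- a/foo\n+++ b/foo\n@@ -1 +1 @@\n-old\n+new  \n"
    munged = "--- a/foo\r\n+++ b/foo\r\n@@ -1 +1 @@\r\n-old\r\n+new\r\n"
    print(preview_matches(munged, generated))
    print(preview_matches(munged.replace("new", "evil"), generated))
```

Keeping the normalization this narrow follows the document's caution: each
extra tolerance makes the comparison more approximate, so nothing beyond the
two observed mungings is ignored here.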