| Field | Value | Date |
|---|---|---|
| author | Timothée Ravier <travier@redhat.com> | 2020-10-02 14:34:33 +0200 |
| committer | Timothée Ravier <travier@redhat.com> | 2020-10-02 14:34:48 +0200 |
| commit | 68ac9e9c50d0e190a5a82bef5fdea29bd46fdd0a | |
| tree | 57e51adcd58f966a09beae6dd437c3e14a9d2273 /docs/formats.md | |
| parent | 6ca312a92399c92b60d83eae161a8b8ea148989b | |
docs: Move and update pages from the manual
Diffstat (limited to 'docs/formats.md')
-rw-r--r-- | docs/formats.md | 196 |
1 files changed, 196 insertions, 0 deletions
diff --git a/docs/formats.md b/docs/formats.md
new file mode 100644
index 00000000..36d395bd
--- /dev/null
+++ b/docs/formats.md
@@ -0,0 +1,196 @@

---
nav_order: 7
---

# OSTree data formats
{: .no_toc }

1. TOC
{:toc}

## On the topic of "smart servers"

One crucial difference between OSTree and git is that git has a
"smart server". Even when fetching over `https://`, it isn't just a
static webserver, but one that, for example, dynamically computes and
compresses pack files for each client.

In contrast, the author of OSTree feels that for operating system
updates, many deployments will want to use simple static webservers,
the same target most package systems were designed to use. The
primary advantages are security and compute efficiency. Services like
Amazon S3 and CDNs are canonical targets, as is a stock static
nginx server.

## The archive format

In the [repo](repo) section, the concept of objects was introduced,
where file/content objects are checksummed and managed individually
(unlike a package system, which operates on compressed aggregates).

The `archive` format simply gzip-compresses each content object.
Metadata objects are stored uncompressed. This makes an `archive`
repository easy to serve via static HTTP. Note: the repo config file
still uses the historical term `archive-z2` for the mode, but this
indicates the modern `archive` format.

When you commit new content, you will see new `.filez` files appearing
in `objects/`.

## archive efficiency

The advantages of `archive`:

 - It's easy to understand and implement
 - It can be served directly over plain HTTP by a static webserver
 - Clients can download/unpack updates incrementally
 - It's space efficient on the server

The biggest disadvantage of this format is that for a client to
perform an update, one HTTP request per changed file is required.

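To make that per-file cost concrete, here is a minimal Python sketch of what a client pulling from an `archive` repository has to download. It assumes a hypothetical, simplified view of each commit as a mapping from file path to content checksum (a real pull walks `commit`, `dirtree` and `dirmeta` objects, which it must also fetch), and it uses the loose-object layout `objects/<first two hex characters>/<rest>.filez`. Because objects are addressed by checksum, content the client already has is skipped.

```python
from typing import Dict, List

# Hypothetical, simplified view of a commit: file path -> SHA256 (hex) of
# its content object. Real OSTree walks dirtree/dirmeta objects instead.
Tree = Dict[str, str]


def filez_path(checksum: str) -> str:
    """Relative URL path of a content object in an `archive` repo."""
    return f"objects/{checksum[:2]}/{checksum[2:]}.filez"


def objects_to_fetch(current: Tree, new: Tree) -> List[str]:
    """Content objects a client must GET to go from `current` to `new`.

    Objects are addressed by checksum, so anything already present locally
    (under any path) is skipped and each new checksum is fetched only once.
    Metadata objects (commit, dirtree, dirmeta) are ignored in this sketch.
    """
    have = set(current.values())
    needed = sorted(set(new.values()) - have)
    return [filez_path(csum) for csum in needed]


if __name__ == "__main__":
    current = {"/usr/bin/app": "aa" * 32, "/usr/share/logo.png": "bb" * 32}
    new = {"/usr/bin/app": "cc" * 32, "/usr/share/logo.png": "bb" * 32}
    paths = objects_to_fetch(current, new)
    # Only the changed binary needs a request; logo.png is reused.
    print(len(paths), "request(s):", paths)
```

Each returned path is a separate HTTP GET, so the request count grows with the number of changed files rather than with the number of changed bytes.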

In some scenarios, one request per changed file actually isn't bad at all,
particularly with techniques to reduce HTTP overhead, such as
[HTTP/2](https://en.wikipedia.org/wiki/HTTP/2).

To make this format work well, design your content so that large data
that changes infrequently (e.g. graphic images) is stored separately
from small, frequently changing data (application code).

Other disadvantages of `archive`:

 - It's quite bad when clients are performing an initial pull (without HTTP/2)
 - One doesn't know the total size (compressed or uncompressed) of content
   before downloading everything

## Aside: the bare and bare-user formats

The most common operation is to pull from an `archive` repository
into a `bare` or `bare-user` formatted repository. These latter two
are not compressed on disk. In other words, pulling to them is
similar to unpacking (but not installing) an RPM/deb package.

The `bare-user` format is a bit special in that the uid/gid and xattrs
from the content are not applied to the files on disk. This is primarily
useful if you want the same OSTree-managed content to be runnable on a
host system or in an unprivileged container.

## Static deltas

OSTree itself was originally focused on a continuous delivery model, where
client systems are expected to update regularly. However, many OS vendors
would like to supply content that's updated e.g. once a month or less often.

For this model, we can support batched updates much better than a basic
`archive` repo does. However, we still want to preserve the "static
webserver only" model. Given this, OSTree has gained the concept of a
"static delta".

Each delta is computed between two specific commit objects, and can
include "bsdiff" and "rsync-style" deltas within individual content objects.
Static deltas also support `from NULL`, where the client can more
efficiently download a commit from scratch. This is mostly useful when
using OSTree for containers rather than OS images; for OS images, one
tends to download an installer ISO or qcow2 image, which is a single
file that already contains the tree data.

Effectively, we're spending server-side storage (and a one-time compute
cost) to gain efficiency in client network bandwidth.

## Static delta repository layout

Since a static delta may not exist for a given pair of commits, the
client first needs to attempt to locate one. Suppose a client wants to
retrieve commit `${new}` while currently running `${current}`.

The first thing to understand is that, in order to save space, the two
commit checksums are encoded in "modified base64": standard base64 with
the `/` character replaced by `_` (so it cannot be mistaken for a path
separator) and without the trailing `=` padding.

Like the commit objects, a "prefix directory" is used to make
management easier for filesystem tools.

A delta is named `$(mbase64 $from)-$(mbase64 $to)`, for example
`GpTyZaVut2jXFPWnO4LJiKEdRTvOw_mFUCtIKW1NIX0-L8f+VVDkEBKNc1Ncd+mDUrSVR4EyybQGCkuKtkDnTwk`,
which in hexadecimal SHA256 form is
`1a94f265a56eb768d714f5a73b82c988a11d453bcec3f985502b48296d4d217d-2fc7fe5550e410128d73535c77e98352b495478132c9b4060a4b8ab640e74f09`.

Finally, the actual content can be found in
`deltas/$fromprefix/$fromsuffix-$to`, where `$fromprefix` is the first
two characters of `$(mbase64 $from)`, `$fromsuffix` is the rest, and
`$to` is likewise mbase64-encoded.

## Static delta internal structure

A delta is itself a directory. Inside, there is a file called
`superblock` which contains metadata. The remaining files are named
with integers and contain packs of content.

The file format of static deltas should currently be considered an
OSTree implementation detail. Obviously, nothing stops one from
writing code which is compatible with OSTree today. However, we would
like the flexibility to expand and change things, and having multiple
codebases makes that more problematic. Please contact the authors
with any requests.

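The naming rules from the repository layout section and the directory structure just described can be combined into a small path helper. This is a Python sketch, not OSTree's own code, and it only computes paths; it does not read the (unstable) file format. It assumes hex SHA256 inputs and derives "modified base64" as standard base64 with `/` replaced by `_` and the `=` padding dropped, which reproduces the example name above. Two further details are assumptions: that a `from NULL` delta is stored under the prefix of its target commit, and that the integer-named part files start at `0`.

```python
import base64
from typing import List, Optional


def mbase64(hex_checksum: str) -> str:
    """'Modified base64' of a SHA256 checksum: '/' -> '_', no '=' padding."""
    raw = bytes.fromhex(hex_checksum)
    if len(raw) != 32:
        raise ValueError("expected a SHA256 checksum (64 hex characters)")
    return base64.b64encode(raw).decode("ascii").rstrip("=").replace("/", "_")


def delta_name(from_csum: Optional[str], to_csum: str) -> str:
    """'$(mbase64 $from)-$(mbase64 $to)'; a from-NULL delta (assumed) uses only $to."""
    if from_csum is None:
        return mbase64(to_csum)
    return f"{mbase64(from_csum)}-{mbase64(to_csum)}"


def delta_dir(from_csum: Optional[str], to_csum: str) -> str:
    """Delta directory: deltas/<first two chars of name>/<rest of name>."""
    name = delta_name(from_csum, to_csum)
    return f"deltas/{name[:2]}/{name[2:]}"


def delta_files(from_csum: Optional[str], to_csum: str, n_parts: int) -> List[str]:
    """The superblock plus integer-named part files (numbering from 0 assumed)."""
    d = delta_dir(from_csum, to_csum)
    return [f"{d}/superblock"] + [f"{d}/{i}" for i in range(n_parts)]


if __name__ == "__main__":
    old = "1a94f265a56eb768d714f5a73b82c988a11d453bcec3f985502b48296d4d217d"
    new = "2fc7fe5550e410128d73535c77e98352b495478132c9b4060a4b8ab640e74f09"
    print(delta_dir(old, new))
    # deltas/Gp/TyZaVut2jXFPWnO4LJiKEdRTvOw_mFUCtIKW1NIX0-L8f+VVDkEBKNc1Ncd+mDUrSVR4EyybQGCkuKtkDnTwk
```

Note that only `/` is rewritten: the second checksum's encoding keeps its `+` characters, exactly as in the example name.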

That said, one critical thing to understand about the design is that
delta payloads are a bit more like "restricted programs" than they are
raw data. There's a "compilation" phase which generates output that
the client executes.

This "updates as code" model allows for multiple content generation
strategies. This design was inspired by Chromium's:
[ChromiumOS Autoupdate](http://dev.chromium.org/chromium-os/chromiumos-design-docs/filesystem-autoupdate).

### The delta superblock

The superblock contains:

 - arbitrary metadata
 - the delta generation timestamp
 - the new commit object
 - an array of recursive deltas to apply
 - an array of per-part metadata, including total object sizes (compressed and uncompressed)
 - an array of fallback objects

Let's define a delta part, then return to the details:

## A delta part

A delta part is a combination of a raw blob of data and a very
restricted bytecode that operates on it. For example, say two files
happen to share a common section. The delta compiler can include that
section once in the data blob, then generate instructions to write it
out twice, once for each object.

Realistically, though, most of a delta is commonly just a "stream of
new objects": operating system content rarely contains much
duplication at this level.

What's more interesting is that OSTree static deltas support
a per-file delta algorithm called
[bsdiff](https://github.com/mendsley/bsdiff), which notably works
well on executable code.

The current delta compiler scans for files with matching basenames in
each commit that have a similar size, and attempts a bsdiff between
them. (It would make sense later to have a build system provide a
hint for this, for example, files within the same package.)

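The basename-and-size heuristic can be sketched as follows. This is an illustration in Python, not the actual delta compiler: the flattened `path -> (checksum, size)` view of a commit and the `max_ratio` threshold that defines "similar size" are hypothetical choices, and OSTree's own criteria may differ.

```python
import os
from typing import Dict, List, Optional, Tuple

# Hypothetical flattened commit: path -> (content checksum, size in bytes).
Commit = Dict[str, Tuple[str, int]]


def bsdiff_candidates(
    old: Commit, new: Commit, max_ratio: float = 2.0
) -> List[Tuple[str, str]]:
    """Pair changed content objects in `new` with an old object to bsdiff against.

    A pair qualifies when the basenames match and the larger file is at most
    `max_ratio` times the size of the smaller one (a hypothetical rule for
    "similar size"). Returns (old_checksum, new_checksum) pairs.
    """
    old_checksums = {csum for csum, _ in old.values()}

    # Index the old commit by basename, e.g. "libfoo.so" -> [(csum, size), ...].
    by_basename: Dict[str, List[Tuple[str, int]]] = {}
    for path, (csum, size) in old.items():
        by_basename.setdefault(os.path.basename(path), []).append((csum, size))

    pairs: List[Tuple[str, str]] = []
    seen = set()
    for path, (csum, size) in sorted(new.items()):
        if csum in old_checksums or csum in seen:
            continue  # unchanged, or already handled under another path
        seen.add(csum)
        best: Optional[Tuple[str, int]] = None
        for old_csum, old_size in by_basename.get(os.path.basename(path), []):
            small, large = sorted((size, old_size))
            # Skip empty files (this also avoids dividing by zero).
            if small > 0 and large / small <= max_ratio:
                # Among similar-sized matches, prefer the closest in size.
                if best is None or abs(old_size - size) < abs(best[1] - size):
                    best = (old_csum, old_size)
        if best is not None:
            pairs.append((best[0], csum))
    return pairs


if __name__ == "__main__":
    old = {"/usr/bin/bash": ("o1", 1000), "/usr/lib/libfoo.so": ("o2", 5000)}
    new = {"/usr/bin/bash": ("n1", 1100), "/usr/lib64/libfoo.so": ("n2", 5200),
           "/usr/bin/newtool": ("n3", 300)}
    print(bsdiff_candidates(old, new))  # [('o1', 'n1'), ('o2', 'n2')]
```

Objects without a candidate (like `newtool` above) would be shipped whole. A name-and-size match says nothing about whether the resulting diff will actually be small: a compressed file can match and still diff poorly.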

A generated bsdiff is included in the payload blob, and applying it is
one of the instructions.

## Fallback objects

Some large-ish files may be resistant to bsdiff. A good example is the
"initramfs" that operating systems commonly use, which is itself a
compressed filesystem. This "internal compression" defeats bsdiff
analysis.

For these types of objects, the delta superblock contains an array of
"fallback objects". These objects aren't included in the delta
parts; the client simply fetches them from the underlying `.filez`
object.

###### Licensing for this document:
`SPDX-License-Identifier: (CC-BY-SA-3.0 OR GFDL-1.3-or-later)`