summaryrefslogtreecommitdiff
path: root/docs/formats.md
diff options
context:
space:
mode:
authorTimothée Ravier <travier@redhat.com>2020-10-02 14:34:33 +0200
committerTimothée Ravier <travier@redhat.com>2020-10-02 14:34:48 +0200
commit68ac9e9c50d0e190a5a82bef5fdea29bd46fdd0a (patch)
tree57e51adcd58f966a09beae6dd437c3e14a9d2273 /docs/formats.md
parent6ca312a92399c92b60d83eae161a8b8ea148989b (diff)
downloadostree-68ac9e9c50d0e190a5a82bef5fdea29bd46fdd0a.tar.gz
docs: Move and update pages from the manual
Diffstat (limited to 'docs/formats.md')
-rw-r--r--docs/formats.md196
1 files changed, 196 insertions, 0 deletions
diff --git a/docs/formats.md b/docs/formats.md
new file mode 100644
index 00000000..36d395bd
--- /dev/null
+++ b/docs/formats.md
@@ -0,0 +1,196 @@
+---
+nav_order: 7
+---
+
+# OSTree data formats
+{: .no_toc }
+
+1. TOC
+{:toc}
+
+## On the topic of "smart servers"
+
+One really crucial difference between OSTree and git is that git has a
+"smart server". Even when fetching over `https://`, it isn't just a
+static webserver, but one that e.g. dynamically computes and
+compresses pack files for each client.
+
+In contrast, the author of OSTree feels that for operating system
+updates, many deployments will want to use simple static webservers,
+the same target most package systems were designed to use. The
+primary advantages are security and compute efficiency. Services like
+Amazon S3 and CDNs are a canonical target, as well as a stock static
+nginx server.
+
+## The archive format
+
+In the [repo](repo) section, the concept of objects was introduced,
+where file/content objects are checksummed and managed individually.
+(Unlike a package system, which operates on compressed aggregates).
+
+The `archive` format simply gzip-compresses each content object.
+Metadata objects are stored uncompressed. This means that it's easy
+to serve via static HTTP. Note: the repo config file still uses the
+historical term `archive-z2` as mode. But this essentially indicates
+the modern `archive` format.
+
+When you commit new content, you will see new `.filez` files appearing
+in `objects/`.
+
+## archive efficiency
+
+The advantages of `archive`:
+
+ - It's easy to understand and implement
+ - Can be served directly over plain HTTP by a static webserver
+ - Clients can download/unpack updates incrementally
+ - Space efficient on the server
+
+The biggest disadvantage of this format is that for a client to
+perform an update, one HTTP request per changed file is required. In
+some scenarios, this actually isn't bad at all, particularly with
+techniques to reduce HTTP overhead, such as
+[HTTP/2](https://en.wikipedia.org/wiki/HTTP/2).
+
+In order to make this format work well, you should design your content
+such that large data that changes infrequently (e.g. graphic images)
+are stored separately from small frequently changing data (application
+code).
+
+Other disadvantages of `archive`:
+
+ - It's quite bad when clients are performing an initial pull (without HTTP/2),
+ - One doesn't know the total size (compressed or uncompressed) of content
+ before downloading everything
+
+## Aside: the bare and bare-user formats
+
+The most common operation is to pull from an `archive` repository
+into a `bare` or `bare-user` formatted repository. These latter two
+are not compressed on disk. In other words, pulling to them is
+similar to unpacking (but not installing) an RPM/deb package.
+
+The `bare-user` format is a bit special in that the uid/gid and xattrs
+from the content are ignored. This is primarily useful if you want to
+have the same OSTree-managed content that can be run on a host system
+or an unprivileged container.
+
+## Static deltas
+
+OSTree itself was originally focused on a continuous delivery model, where
+client systems are expected to update regularly. However, many OS vendors
+would like to supply content that's updated e.g. once a month or less often.
+
+For this model, we can do a lot better to support batched updates than
+a basic `archive` repo. However, we still want to preserve the
+model of "static webserver only". Given this, OSTree has gained the
+concept of a "static delta".
+
+These deltas are targeted to be a delta between two specific commit
+objects, including "bsdiff" and "rsync-style" deltas within a content
+object. Static deltas also support `from NULL`, where the client can
+more efficiently download a commit object from scratch - this is
+mostly useful when using OSTree for containers, rather than OS images.
+For OS images, one tends to download an installer ISO or qcow2 image
+which is a single file that contains the tree data already.
+
+Effectively, we're spending server-side storage (and one-time compute
+cost), and gaining efficiency in client network bandwidth.
+
+## Static delta repository layout
+
+Since static deltas may not exist, the client first needs to attempt
+to locate one. Suppose a client wants to retrieve commit `${new}`
+while currently running `${current}`.
+
+The first thing to understand is that in order to save space, these
+two commits are "modified base64" - the `/` character is replaced with
+`_`.
+
+Like the commit objects, a "prefix directory" is used to make
+management easier for filesystem tools
+
+A delta is named `$(mbase64 $from)-$(mbase64 $to)`, for example
+`GpTyZaVut2jXFPWnO4LJiKEdRTvOw_mFUCtIKW1NIX0-L8f+VVDkEBKNc1Ncd+mDUrSVR4EyybQGCkuKtkDnTwk`,
+which in SHA256 format is
+`1a94f265a56eb768d714f5a73b82c988a11d453bcec3f985502b48296d4d217d-2fc7fe5550e410128d73535c77e98352b495478132c9b4060a4b8ab640e74f09`.
+
+Finally, the actual content can be found in
+`deltas/$fromprefix/$fromsuffix-$to`.
+
+## Static delta internal structure
+
+A delta is itself a directory. Inside, there is a file called
+`superblock` which contains metadata. The rest of the files will be
+integers bearing packs of content.
+
+The file format of static deltas should be currently considered an
+OSTree implementation detail. Obviously, nothing stops one from
+writing code which is compatible with OSTree today. However, we would
+like the flexibility to expand and change things, and having multiple
+codebases makes that more problematic. Please contact the authors
+with any requests.
+
+That said, one critical thing to understand about the design is that
+delta payloads are a bit more like "restricted programs" than they are
+raw data. There's a "compilation" phase which generates output that
+the client executes.
+
+This "updates as code" model allows for multiple content generation
+strategies. The design of this was inspired by that of Chromium:
+[ChromiumOS Autoupdate](http://dev.chromium.org/chromium-os/chromiumos-design-docs/filesystem-autoupdate).
+
+### The delta superblock
+
+The superblock contains:
+
+ - arbitrary metadata
+ - delta generation timestamp
+ - the new commit object
+ - An array of recursive deltas to apply
+ - An array of per-part metadata, including total object sizes (compressed and uncompressed),
+ - An array of fallback objects
+
+Let's define a delta part, then return to discuss details:
+
+## A delta part
+
+A delta part is a combination of a raw blob of data, plus a very
+restricted bytecode that operates on it. Say for example two files
+happen to share a common section. It's possible for the delta
+compilation to include that section once in the delta data blob, then
+generate instructions to write out that blob twice when generating
+both objects.
+
+Realistically though, it's very common for most of a delta to just be
+"stream of new objects" - if one considers it, it doesn't make sense
+to have too much duplication inside operating system content at this
+level.
+
+So then, what's more interesting is that OSTree static deltas support
+a per-file delta algorithm called
+[bsdiff](https://github.com/mendsley/bsdiff) that most notably works
+well on executable code.
+
+The current delta compiler scans for files with matching basenames in
+each commit that have a similar size, and attempts a bsdiff between
+them. (It would make sense later to have a build system provide a
+hint for this - for example, files within a same package).
+
+A generated bsdiff is included in the payload blob, and applying it is
+an instruction.
+
+## Fallback objects
+
+It's possible for there to be large-ish files which might be resistant
+to bsdiff. A good example is that it's common for operating systems
+to use an "initramfs", which is itself a compressed filesystem. This
+"internal compression" defeats bsdiff analysis.
+
+For these types of objects, the delta superblock contains an array of
+"fallback objects". These objects aren't included in the delta
+parts - the client simply fetches them from the underlying `.filez`
+object.
+
+###### Licensing for this document:
+`SPDX-License-Identifier: (CC-BY-SA-3.0 OR GFDL-1.3-or-later)`