From 7f474d66e1a8bbedde54abe3ef2ffcd344b177fe Mon Sep 17 00:00:00 2001 From: Monty Date: Sat, 20 Mar 2010 06:32:37 +0000 Subject: Substantial expansion of Ogg container overview document; still requires filling in of several references by not-yet-present examples. git-svn-id: http://svn.xiph.org/trunk/ogg@16991 0101bb08-14d6-0310-b084-bc0e0c8e3800 --- doc/oggstream.html | 494 ++++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 378 insertions(+), 116 deletions(-) (limited to 'doc') diff --git a/doc/oggstream.html b/doc/oggstream.html index d39a82c..08fbca4 100644 --- a/doc/oggstream.html +++ b/doc/oggstream.html @@ -70,135 +70,397 @@ li { Fish Logo and Xiph.org -

Ogg logical and physical bitstream overview

- -

Ogg bitstreams

- -

Ogg codecs use octet vectors of raw, compressed data -(packets). These compressed packets do not have any -high-level structure or boundary information; strung together, they -appear to be streams of random bytes with no landmarks.

- -

Raw packets may be used directly by transport mechanisms that provide -their own framing and packet-separation mechanisms (such as UDP -datagrams). For stream based storage (such as files) and transport -(such as TCP streams or pipes), Vorbis and other future Ogg codecs use -the Ogg bitstream format to provide framing/sync, sync recapture -after error, landmarks during seeking, and enough information to -properly separate data back into packets at the original packet -boundaries without relying on decoding to find packet boundaries.

- -

Logical and physical bitstreams

- -

Raw packets are grouped and encoded into contiguous pages of -structured bitstream data called logical bitstreams. A -logical bitstream consists of pages, in order, belonging to a single -codec instance. Each page is a self contained entity (although it is -possible that a packet may be split and encoded across one or more -pages); that is, the page decode mechanism is designed to recognize, -verify and handle single pages at a time from the overall bitstream.

- -

Multiple logical bitstreams can be combined (with restrictions) into a -single physical bitstream. A physical bitstream consists of -multiple logical bitstreams multiplexed at the page level and may -include a 'meta-header' at the beginning of the multiplexed logical -stream that serves as identification magic. Whole pages are taken in -order from multiple logical bitstreams and combined into a single -physical stream of pages. The decoder reconstructs the original -logical bitstreams from the physical bitstream by taking the pages in -order from the physical bitstream and redirecting them into the -appropriate logical decoding entity. The simplest physical bitstream -is a single, unmultiplexed logical bitstream with no meta-header; this -is referred to as a 'degenerate stream'.

- -

Ogg Logical Bitstream Framing discusses +

Ogg bitstream overview

+ +This document serves as starting point for understanding the design +and implementation of the Ogg container format. If you're new to Ogg +or merely want a high-level technical overview, start reading here. +Other documents linked from the index page +give distilled technical descriptions and references of the container +mechanisms. This document is intended to aid understanding. + +

Container format design points

+ +

Ogg is intended to be a simplest-possible container, concerned only +with framing, ordering, and interleave. It can be used as a stream delivery +mechanism, for media file storage, or as a building block toward +implementing a more complex, non-linear container (for example, see +the Skeleton or Annodex/CMML). + +

The Ogg container is not intended to be a monolithic +'kitchen-sink'. It exists only to frame and deliver in-order stream +data and as such is vastly simpler than most other containers. +Elementary and multiplexed streams are both constructed entirely from a +single building block (an Ogg page) composed of eight fields +totalling twenty-seven bytes (the page header), a list of packet lengths +(up to 255 bytes), and payload data (up to 65025 bytes). The structure +of every page is the same. There are no optional fields or alternate +encodings. + +
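As a rough illustration of how little there is to a page, the following Python sketch frames one packet and parses it back. The field order and the `OggS` capture pattern follow the Ogg framing specification (RFC 3533), which this overview does not spell out; `build_page` and `parse_page` are hypothetical helper names, and the CRC field is left at zero rather than computed.

```python
import struct

# Fixed page header fields, little-endian, following RFC 3533 (assumed layout):
# capture pattern, version, flags, granule position, serial number,
# page sequence number, CRC, segment count. 27 bytes in total.
HEADER = struct.Struct("<4sBBqIIIB")

def build_page(serial, seqno, granulepos, payload, flags=0):
    """Frame one complete packet as a page. The CRC field is left at zero."""
    if len(payload) >= 255 * 255:
        raise ValueError("packet would not fit on a single page")
    full, rest = divmod(len(payload), 255)
    lacing = bytes([255] * full + [rest])  # list of packet lengths
    header = HEADER.pack(b"OggS", 0, flags, granulepos,
                         serial, seqno, 0, len(lacing))
    return header + lacing + payload

def parse_page(buf, offset=0):
    """Parse the page at offset; return (fields, offset of the next page)."""
    (magic, version, flags, granulepos,
     serial, seqno, crc, nsegs) = HEADER.unpack_from(buf, offset)
    if magic != b"OggS":
        raise ValueError("not at a page boundary")
    table = offset + HEADER.size
    lacing = list(buf[table:table + nsegs])
    body = table + nsegs
    end = body + sum(lacing)
    if len(lacing) != nsegs or end > len(buf):
        raise ValueError("truncated page")
    fields = {"flags": flags, "granulepos": granulepos, "serial": serial,
              "seqno": seqno, "lacing": lacing, "body": buf[body:end]}
    return fields, end
```

Every page goes through the same two functions; there is no branch for optional fields because the format has none.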

Stream and media metadata is contained in Ogg and not built into +the Ogg container itself. Metadata is thus compartmentalized and +layered rather than part of a monolithic design, an especially good +idea as no two groups seem able to agree on what a complete or +complete-enough metadata set should be. In this way, the container and +container implementation are isolated from unnecessary design flux. + +

Streaming

+ +

The Ogg container is primarily a streaming format, +encapsulating chronological, time-linear mixed media into a single +delivery stream or file. The design is such that an application can +always encode and/or decode all features of a bitstream in one pass +with no seeking and minimal buffering. Seeking to provide optimized +encoding (such as two-pass encoding) or interactive decoding (such as +scrubbing or instant replay) is not disallowed or discouraged, however +no container feature requires nonlinear access of the bitstream. + +

Variable Bit Rate, Variable Payload Size

+ +

Ogg is designed to contain any size data payload with bounded, +predictable efficiency. Ogg packets have no maximum size and a +zero-byte minimum size. There is no restriction on size changes from +packet to packet. Variable size packets do not require the use of any +optional or additional container features. There is no optimal +suggested packet size, though special care was taken to make +sure 50-200 byte packets were no less efficient than larger packet +sizes. The original design criterion was a 2% overhead at 50 byte +packets, dropping to a maximum working overhead of 1% with larger +packets, and a typical working overhead of 0.5-0.7% for most practical +uses. + +
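The overhead figures can be checked with a little arithmetic. The sketch below assumes the RFC 3533 framing (a 27 byte fixed header, one length byte per 255 bytes of packet plus a terminating value, at most 255 length bytes per page) and assumes pages are packed as full as possible with equal-sized packets. A real muxer usually flushes pages earlier, so treat the result as a best case; the function names are hypothetical.

```python
HEADER_BYTES = 27    # fixed page header size, per RFC 3533 (assumed)
MAX_SEGMENTS = 255   # maximum entries in the list of packet lengths

def lacing_values(packet_len):
    """Length bytes needed to describe one packet of packet_len bytes."""
    return packet_len // 255 + 1

def full_page_overhead(packet_len):
    """Framing bytes as a fraction of total bytes for a fully packed page."""
    per_packet = lacing_values(packet_len)
    packets = MAX_SEGMENTS // per_packet
    if packets == 0:
        raise ValueError("packet spans more than one page")
    overhead = HEADER_BYTES + packets * per_packet
    payload = packets * packet_len
    return overhead / (overhead + payload)
```

Under these assumptions `full_page_overhead(50)` comes out at roughly 2.2% and `full_page_overhead(4096)` at roughly 0.46%, in line with the design figures quoted above.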

Simple pagination

+ +

Ogg is a byte-aligned container with no context-dependent, optional +or variable-length fields. Ogg requires no repacking of codec data. +The page structure is written out in-line as packet data is submitted +to the streaming abstraction. In addition, it is possible to +implement both Ogg mux and demux as MT-hot zero-copy abstractions (as +is done in the Tremor sourcebase). + +

Capture

+ +

Ogg is designed for efficient and immediate stream capture with +high confidence. Although packets have no size limit in Ogg, pages +are a maximum of just under 64kB meaning that any Ogg stream can be +captured with confidence after seeing 128kB of data or less [worst +case; typical figure is 6kB] from any random starting point in the +stream. + +

Seeking

+ +

Ogg implements simple coarse- and fine-grained seeking by design. + +

Coarse seeking may be performed by simply 'moving the tone arm' to a +new position and 'dropping the needle'. Rapid capture with +accompanying timecode from any location in an Ogg file is guaranteed +by the stream design. From the acquisition of the first timecode, +all data needed to play back from that time code forward is ahead of +the stream cursor. + +

Ogg implements full sample-granularity seeking using an +interpolated bisection search built on the capture and timecode +mechanisms used by coarse seeking. As above, once a search finds +the desired timecode, all data needed to play back from that time code +forward is ahead of the stream cursor. + +

Both coarse and fine seeking use the page structure and sequencing +inherent to the Ogg format. All Ogg streams are fully seekable from +creation; seekability is unaffected by truncation or missing data, and +is tolerant of gross corruption. Seek operations are neither 'fuzzy' nor +heuristic. + +
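The interpolated bisection search can be sketched in Python as follows. This is a toy model, not the libogg or libvorbisfile implementation: `pages` is a hypothetical in-memory list of `(byte_offset, timestamp)` pairs standing in for a file, timestamps are plain seconds rather than codec-mapped granule positions, and the `bisect_left` lookup stands in for capturing the next page from an arbitrary byte position.

```python
import bisect

def seek(pages, target, total_bytes):
    """Return the index of the first page stamped at or after target.

    pages: list of (byte_offset, timestamp), both increasing.
    Returns len(pages) if target lies beyond the end of the stream.
    """
    offsets = [off for off, _ in pages]
    lo, hi = 0, total_bytes          # byte range still to be searched
    t_lo, t_hi = 0.0, pages[-1][1]   # timestamps bracketing that range
    best = len(pages)
    while lo < hi:
        # Guess a byte position by interpolating between known timestamps.
        if t_hi > t_lo:
            frac = min(max((target - t_lo) / (t_hi - t_lo), 0.0), 1.0)
        else:
            frac = 0.5
        guess = min(lo + int((hi - lo) * frac), hi - 1)
        # 'Drop the needle': capture the first page at or after the guess.
        i = bisect.bisect_left(offsets, guess)
        if i == len(pages) or offsets[i] >= hi:
            hi = guess               # nothing new in [guess, hi)
            continue
        off, stamp = pages[i]
        if stamp >= target:
            best, hi, t_hi = i, off, stamp
        else:
            lo, t_lo = off + 1, stamp
    return best
```

No index is consulted at any point; the search relies only on being able to capture a timestamped page from any byte position.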

Seeking without use of an index is a major point of the Ogg +design. There are several reasons why Ogg forgoes an index: + +

+ +

Simple multiplexing

+ +

Ogg multiplexes streams by interleaving pages from multiple elementary streams into a +multiplexed stream in time order. The multiplexed pages are not +altered. Muxing an Ogg AV stream out of separate audio, +video and data streams is akin to shuffling several decks of cards +together into a single deck; the cards themselves remain unchanged. +Demultiplexing is similarly simple. + +

The goal of this design is to make the mux/demux operation as +trivial as possible to allow live streaming systems to build and +rebuild streams on the fly with minimal CPU usage and no additional +storage or latency requirements. + +
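The card-shuffling picture translates almost directly into code. In this minimal sketch, `Page` is a hypothetical stand-in for a framed Ogg page carrying an already decoded timestamp; a real muxer would obtain that time from the codec's granule position mapping.

```python
import heapq
from collections import defaultdict, namedtuple

# Hypothetical stand-in for an Ogg page: timestamp, stream serial, raw bytes.
Page = namedtuple("Page", "time serial data")

def mux(*streams):
    """Interleave whole pages from several elementary streams in time order."""
    return list(heapq.merge(*streams, key=lambda page: page.time))

def demux(pages):
    """Deal pages back out to their logical streams by serial number."""
    streams = defaultdict(list)
    for page in pages:
        streams[page.serial].append(page)
    return dict(streams)
```

Neither function looks inside a page or rewrites it, which is why demultiplexing a muxed stream returns exactly the pages that went in.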

Continuous and Discontinuous Media

+ +

Ogg streams belong to one of two categories, "Continuous" streams and +"Discontinuous" streams. + +

A stream that provides a gapless, time-continuous media type with a +fine-grained timebase is considered to be 'Continuous'. A continuous +stream should never be starved of data. Examples of continuous data +types include broadcast audio and video. + +

A stream that delivers data in a potentially irregular pattern or +with widely spaced timing gaps is considered to be 'Discontinuous'. A +discontinuous stream may be best thought of as data representing +scattered events; although they happen in order, they are typically +unconnected data often located far apart. One example of a +discontinuous stream type would be captioning, such as Ogg Kate. Although it's +possible to design captions as a continuous stream type, it's most +natural to think of captions as widely spaced pieces of text with +little happening between. + +

The fundamental reason for distinction between continuous and +discontinuous streams concerns buffering. + +

Buffering

+ +

A continuous stream is, by definition, gapless. Ogg buffering is based +on the simple premise of never allowing an active continuous stream +to starve for data during decode; buffering works ahead until all +continuous streams in a physical stream have data ready and no further. + +

Discontinuous stream data is not assumed to be predictable. The +buffering design takes discontinuous data 'as it comes' rather than +working ahead to look for future discontinuous data for a potentially +unbounded period. Thus, the buffering process makes no attempt to fill +discontinuous stream buffers; their pages simply 'fall out' of the +stream when continuous streams are handled properly. + +

Buffering requirements in this design need not be explicitly +declared or managed in the encoded stream. The decoder simply reads as +much data as is necessary to keep all continuous stream types gapless +and no more, with discontinuous data processed as it arrives in the +continuous data. Buffering is implicitly optimal for the given +stream. Because all pages of all data types are stamped with absolute +timing information within the stream, inter-stream synchronization +timing is always maintained without the need for explicitly declared +buffer-ahead hinting. + +
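A minimal sketch of this buffering rule, assuming pages arrive as `(serial, data)` tuples and that the caller already knows which serial numbers belong to continuous streams (in practice the codec declares this). The name `buffer_ahead` is hypothetical.

```python
from collections import deque

def buffer_ahead(page_iter, continuous, buffers):
    """Read pages until every continuous stream has at least one page queued.

    Discontinuous pages are queued as they are encountered but never waited
    for. Returns False if the physical stream ran out first.
    """
    while any(not buffers.get(serial) for serial in continuous):
        page = next(page_iter, None)
        if page is None:
            return False
        buffers.setdefault(page[0], deque()).append(page)
    return True
```

The loop stops as soon as the continuous streams are satisfied, so any caption page read along the way has simply 'fallen out' of the stream without the decoder ever searching ahead for it.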

Codec metadata

+ +

Ogg does not replicate codec-specific metadata into the mux layer +in an attempt to make the mux and codec layer implementations 'fully +separable'. Things like specific timebase, keyframing strategy, frame +duration, etc, do not appear in the Ogg container. The mux layer is, +instead, expected to query a codec through a standardized interface, +left to the implementation, for this data when it is needed. + +

Though modern design wisdom usually prefers to predict all possible +needs of current and future codecs then embed these dependencies and +the required metadata into the container itself, this strategy +increases container specification complexity, fragility, and rigidity. +The mux and codec implementations become more independent, but the +specifications become less independent. A codec can't do what a +container hasn't already provided for. New codecs are harder to +support, and you can do fewer useful things with the ones you've +already got (eg, try to make a good splitter without using any codecs. +You're stuck splitting at keyframes only, or building yet another new +mechanism into the container layer to mark what frames to skip +displaying). + +

Ogg's design goes the opposite direction, where the specification +is to be as simple, easy to understand, and 'proofed' against novel +codecs as possible. When an Ogg mux layer requires codec-specific +information, it queries the codec (or a codec stub). This trades a +more complex implementation for a simpler, more flexible +specification. + +

Stream structure metadata

+ +

The Ogg container itself does not define a metadata system for +declaring the structure and interrelations between multiple media +types in a muxed stream. That is, the Ogg container itself does not +specify data like 'which stream is the subtitle stream?' or 'which +video stream is the primary angle?'. This metadata still exists, but +is stored as data within the Ogg container rather than being built into the +container format. Xiph specifies the 'Skeleton' metadata format for Ogg +streams, but this decoupling of container and stream structure +metadata means it is possible to use Ogg with any metadata +specification without altering the container itself, or without stream +structure metadata at all. + +

Frame accurate absolute position

+ +

Every Ogg page is stamped with a 64 bit 'granule position' that +serves as an absolute timestamp for mux and seeking. A few nifty +little tricks are usually also embedded in the granpos state, but +we'll leave those aside for the moment (strictly speaking, they're +part of each codec's mapping, not Ogg). + +

As mentioned above, granule positions are mapped into +absolute timestamps by the codec, rather than being a hard timestamp. +This allows maximally efficient use of the available 64 bits to +address every sample/frame position without approximation while +supporting new and previously unknown timebase encodings without +needing to extend or update the mux layer. When a codec needs a novel +timebase, it simply brings the code for that mapping along with it. +This is not a theoretical curiosity; new, wholly novel timebases were +deployed with the adoption of both Theora and Dirac. "Rolling INTRA" +(keyframeless video) also benefits from novel use of the granule +position. + +

Ogg stream arrangement

+ +

Packets, pages, and bitstreams

+ +

Ogg codecs use packets. Packets are octet payloads of +raw, compressed data, containing the data needed for a single +decompressed unit, eg, one video frame. Packets have no maximum size +and may be zero length. They do not have any high-level structure or +boundary information; strung together, the unframed packets form a +logical bitstream of apparently random bytes with no internal +landmarks. + +

Logical bitstream packets are grouped and framed into Ogg pages +along with a unique stream serial number to produce a +physical bitstream. An elementary stream is a +physical bitstream containing only the pages framing a single logical +bitstream. Each page is a self contained entity, although a packet may +be split and encoded across one or more pages. The page decode +mechanism is designed to recognize, verify and handle single pages at +a time from the overall bitstream. + +

Ogg Bitstream Framing specifies the page format of an Ogg bitstream, the packet coding process -and logical bitstreams in detail. The remainder of this document -specifies requirements for constructing finished, physical Ogg -bitstreams.

- -

Mapping Restrictions

- -

Logical bitstreams may not be mapped/multiplexed into physical -bitstreams without restriction. Here we discuss design restrictions -on Ogg physical bitstreams in general, mostly to introduce -design rationale. Each 'media' format defines its own (generally more -restrictive) mapping. An 'Ogg Vorbis Audio Bitstream', for example, has a -specific physical bitstream structure. -Any other codec or combination of codecs will generally also mandate a -corresponding restricted physical bitstream format.

- -

additional end-to-end structure

+and elementary bitstreams in detail. + +

Multiplexed bitstreams

+ +

Multiple logical/elementary bitstreams can be combined into a single +multiplexed bitstream by interleaving whole pages from each +contributing elementary stream in time order. The result is a single +physical stream that multiplexes and frames multiple logical streams. +Each logical stream is identified by the unique stream serial number +stamped in its pages. A physical stream may include a 'meta-header' +(such as the Ogg Skeleton) comprising its +own Ogg page at the beginning of the physical stream. A decoder +recovers the original logical/elementary bitstreams out of the +physical bitstream by taking the pages in order from the physical +bitstream and redirecting them into the appropriate logical decoding +entity. + +

Ogg Bitstream Multiplexing specifies +proper multiplexing of an Ogg bitstream in detail. + +

Chaining

+ +

Multiple Ogg physical bitstreams may be concatenated into a single new +stream; this is chaining. The bitstreams do not overlap; the +final page of a given logical bitstream is immediately followed by the +initial page of the next.

+ +

Each logical bitstream in a chain must have a unique serial number +within the scope of the full physical bitstream, not only within a +particular link or segment of the chain.

+ +

Continuous and discontinuous streams

+ +

Within Ogg, each stream must be declared (by the codec) to be +continuous- or discontinuous-time. Most codecs treat all streams they +use as either inherently continuous- or discontinuous-time, although +this is not a requirement. A codec may, as part of its mapping, choose +according to data in the initial header. + +

Continuous-time pages are stamped by end-time, discontinuous pages +are stamped by begin-time. Pages in a multiplexed stream are +interleaved in order of the time stamp regardless of stream type. +Both continuous and discontinuous logical streams are used to seek +within a physical stream, however only continuous streams are used to +determine buffering depth; because discontinuous streams are stamped +by start time, they will always 'fall out' in time when buffering +tracks only the continuous streams. See 'Examples' for an +illustration of the buffering mechanism. + +

Mapping Requirements

+ +

Each codec is allowed some freedom in deciding how its logical +bitstream is encapsulated into an Ogg bitstream (even if it is a +trivial mapping, eg, 'plop the packets in and go'). This is the +codec's mapping. Ogg imposes a few mapping requirements +on any codec.

The framing specification defines 'beginning of stream' and 'end of stream' page markers via a header flag (it is possible for a stream to consist of a single page). A -stream always consists of an integer number of pages, an easy +correct stream always consists of an integer number of pages, an easy requirement given the variable size nature of pages.

-

In addition to the header flag marking the first and last pages of a -logical bitstream, the first page of an Ogg bitstream obeys -additional restrictions. Each individual media mapping specifies its -own implementation details regarding these restrictions.

- -

The first page of a logical Ogg bitstream consists of a single, -small 'initial header' packet that includes sufficient information to -identify the exact CODEC type and media requirements of the logical -bitstream. The intent of this restriction is to simplify identifying -the bitstream type and content; for a given media type (or across all -Ogg media types) we can know that we only need a small, fixed -amount of data to uniquely identify the bitstream type.

- -

As an example, Ogg Vorbis places the name and revision of the Vorbis -CODEC, the audio rate and the audio quality into this initial header, -thus simplifying vastly the certain identification of an Ogg Vorbis -audio bitstream.

- -

sequential multiplexing (chaining)

- -

The simplest form of logical bitstream multiplexing is concatenation -(chaining). Complete logical bitstreams are strung -one-after-another in order. The bitstreams do not overlap; the final -page of a given logical bitstream is immediately followed by the -initial page of the next. Chaining is the only logical->physical -mapping allowed by Ogg Vorbis.

- -

Each chained logical bitstream must have a unique serial number within -the scope of the physical bitstream.

- -

concurrent multiplexing (grouping)

- -

Logical bitstreams may also be multiplexed 'in parallel' -(grouped). An example of grouping would be to allow -streaming of separate audio and video streams, using different codecs -and different logical bitstreams, in the same physical bitstream. -Whole pages from multiple logical bitstreams are mixed together.

- -

The initial pages of each logical bitstream must appear first; the -media mapping specifies the order of the initial pages. For example, -Ogg Theora describes video bitstream with audio. -The mapping specifies that the physical bitstream must begin -with the initial page of a logical video bitstream, followed by the -initial page of an audio stream. Unlike initial pages, terminal pages -for the logical bitstreams need not all occur contiguously (although a -specific media mapping may require this; it is not mandated by the -generic Ogg stream spec). Terminal pages may be 'nil' pages, -that is, pages containing no content but simply a page header with -position information and the 'last page of bitstream' flag set in the -page header.

+

The first page of an elementary Ogg bitstream consists of a single, +small 'initial header' packet that must include sufficient information +to identify the exact CODEC type. From this initial header, the codec +must also be able to determine its timebase and whether or not it is a +continuous- or discontinuous-time stream. The initial header must fit +on a single page. If a codec makes use of auxiliary headers (for +example, Vorbis uses two auxiliary headers), these headers must follow +the initial header immediately. The last header finishes its page; +data begins on a fresh page. + +

As an example, Ogg Vorbis places the name and revision of the +Vorbis CODEC, the audio rate and the audio quality into this initial +header. Comments and detailed codec setup appear in the larger +auxiliary headers.

+ +

Multiplexing Requirements

+ +

Multiplexing requirements within Ogg are straightforward. When +constructing a single-link (unchained) physical bitstream consisting +of multiple elementary streams: + +

    + +
  1. The initial header for each stream appears in sequence, each +header on a single page. All initial headers must appear with no +intervening data (no auxiliary header pages or packets, no data pages +or packets). Order of the initial headers is unspecified. The +'beginning of stream' flag is set on each initial header. + +
  2. All auxiliary headers for all streams must follow. Order +is unspecified. The final auxiliary header of each stream must flush +its page. + +
  3. Data pages for each stream follow, interleaved in time order. + +
  4. The final page of each stream sets the 'end of stream' flag. +Unlike initial pages, terminal pages for the logical bitstreams need +not occur contiguously; indeed it may not be possible for them to do so. +

Each grouped bitstream must have a unique serial number within the scope of the physical bitstream.
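The ordering rules above lend themselves to a simple check. The sketch below looks only at serial numbers and the 'beginning of stream' and 'end of stream' header flags; the flag bit values are taken from RFC 3533 and are an assumption here. It cannot tell auxiliary header pages from data pages, since that requires the codec, so rule 2 is not checked. `check_link` is a hypothetical name.

```python
BOS, EOS = 0x02, 0x04  # header flag bits per RFC 3533 (assumed)

def check_link(pages):
    """Check one unchained link given as (serial, flags) pairs in stream order.

    Returns a list of human-readable problems; an empty list means none found.
    """
    problems = []
    started, ended = set(), set()
    headers_done = False  # becomes True once a non-initial page is seen
    for n, (serial, flags) in enumerate(pages):
        if flags & BOS:
            if headers_done:
                problems.append(f"page {n}: initial header after other pages")
            if serial in started:
                problems.append(f"page {n}: serial {serial} reused")
            started.add(serial)
        else:
            headers_done = True
            if serial not in started:
                problems.append(f"page {n}: serial {serial} has no initial header")
        if serial in ended:
            problems.append(f"page {n}: serial {serial} continues after end of stream")
        if flags & EOS:
            ended.add(serial)
    for serial in sorted(started - ended):
        problems.append(f"serial {serial}: no end of stream page")
    return problems
```

Note that the terminal pages of the two streams in a valid link need not be adjacent; the check only requires that each stream's final page carries the flag.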

-

sequential and concurrent multiplexing

+

chaining and multiplexing

-

Groups of concurrently multiplexed bitstreams may be chained +

Multiplexed and/or unmultiplexed bitstreams may be chained consecutively. Such a physical bitstream obeys all the rules of both -grouped and chained multiplexed streams; the groups, when unchained , -must stand on their own as a valid concurrently multiplexed -bitstream.

+chained and multiplexed streams. Each link, when unchained, must +stand on its own as a valid physical bitstream. Chained streams do +not mix; a new segment may not begin until all streams in the +preceding segment have terminated.

+ +

Examples

-

multiplexing example

+[More to come shortly; this section is currently being revised and expanded]

Below, we present an example of a grouped and chained bitstream:

@@ -227,7 +489,7 @@ where decode requires more information). The Xiph Fish Logo is a trademark (™) of Xiph.Org.
- These pages © 1994 - 2005 Xiph.Org. All rights reserved. + These pages © 1994 - 2010 Xiph.Org. All rights reserved. -- cgit v1.2.1