From 7f474d66e1a8bbedde54abe3ef2ffcd344b177fe Mon Sep 17 00:00:00 2001
From: Monty Ogg codecs use octet vectors of raw, compressed data
-(packets). These compressed packets do not have any
-high-level structure or boundary information; strung together, they
-appear to be streams of random bytes with no landmarks. Raw packets may be used directly by transport mechanisms that provide
-their own framing and packet-separation mechanisms (such as UDP
-datagrams). For stream based storage (such as files) and transport
-(such as TCP streams or pipes), Vorbis and other future Ogg codecs use
-the Ogg bitstream format to provide framing/sync, sync recapture
-after error, landmarks during seeking, and enough information to
-properly separate data back into packets at the original packet
-boundaries without relying on decoding to find packet boundaries. Raw packets are grouped and encoded into contiguous pages of
-structured bitstream data called logical bitstreams. A
-logical bitstream consists of pages, in order, belonging to a single
-codec instance. Each page is a self contained entity (although it is
-possible that a packet may be split and encoded across one or more
-pages); that is, the page decode mechanism is designed to recognize,
-verify and handle single pages at a time from the overall bitstream. Multiple logical bitstreams can be combined (with restrictions) into a
-single physical bitstream. A physical bitstream consists of
-multiple logical bitstreams multiplexed at the page level and may
-include a 'meta-header' at the beginning of the multiplexed logical
-stream that serves as identification magic. Whole pages are taken in
-order from multiple logical bitstreams and combined into a single
-physical stream of pages. The decoder reconstructs the original
-logical bitstreams from the physical bitstream by taking the pages in
-order from the physical bitstream and redirecting them into the
-appropriate logical decoding entity. The simplest physical bitstream
-is a single, unmultiplexed logical bitstream with no meta-header; this
-is referred to as a 'degenerate stream'. Ogg Logical Bitstream Framing discusses
+ Ogg is intended to be a simplest-possible container, concerned only
+with framing, ordering, and interleave. It can be used as a stream delivery
+mechanism, for media file storage, or as a building block toward
+implementing a more complex, non-linear container (for example, see
+the Skeleton or Annodex/CMML).
+
+ The Ogg container is not intended to be a monolithic
+'kitchen-sink'. It exists only to frame and deliver in-order stream
+data and as such is vastly simpler than most other containers.
+Elementary and multiplexed streams are both constructed entirely from a
+single building block (an Ogg page) comprised of eight fields
+totalling twenty-seven bytes (the page header), a list of packet lengths
+(up to 255 bytes) and payload data (up to 65025 bytes). The structure
+of every page is the same. There are no optional fields or alternate
+encodings.
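
The size limits quoted above can be checked with a few lines of arithmetic. This is a minimal sketch: the field widths are those of the Ogg framing layout as published in RFC 3533, while the dictionary keys are illustrative names of my own, not identifiers from any Ogg library.

```python
# Field widths in bytes, following the Ogg framing layout (RFC 3533).
# The keys are illustrative names, not identifiers taken from libogg.
HEADER_FIELDS = {
    "capture_pattern": 4,           # the ASCII bytes "OggS"
    "stream_structure_version": 1,
    "header_type_flags": 1,         # continuation / first page / last page bits
    "granule_position": 8,
    "stream_serial_number": 4,
    "page_sequence_number": 4,
    "crc_checksum": 4,
    "segment_count": 1,             # number of entries in the packet-length list
}

MAX_SEGMENTS = 255       # the packet-length list holds at most 255 entries
MAX_SEGMENT_BYTES = 255  # each entry accounts for at most 255 payload bytes

fixed_header_bytes = sum(HEADER_FIELDS.values())
max_payload_bytes = MAX_SEGMENTS * MAX_SEGMENT_BYTES
max_page_bytes = fixed_header_bytes + MAX_SEGMENTS + max_payload_bytes

print(fixed_header_bytes, max_payload_bytes, max_page_bytes)  # 27 65025 65307
```

The largest possible page is therefore 65307 bytes, which is where the "just under 64kB" figure comes from.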
+
+ Stream and media metadata is contained in Ogg and not built into
+the Ogg container itself. Metadata is thus compartmentalized and
+layered rather than part of a monolithic design, an especially good
+idea as no two groups seem able to agree on what a complete or
+complete-enough metadata set should be. In this way, the container and
+container implementation are isolated from unnecessary design flux.
+
+ The Ogg container is primarily a streaming format,
+encapsulating chronological, time-linear mixed media into a single
+delivery stream or file. The design is such that an application can
+always encode and/or decode all features of a bitstream in one pass
+with no seeking and minimal buffering. Seeking to provide optimized
+encoding (such as two-pass encoding) or interactive decoding (such as
+scrubbing or instant replay) is not disallowed or discouraged; however,
+no container feature requires nonlinear access of the bitstream.
+
+ Ogg is designed to contain any size data payload with bounded,
+predictable efficiency. Ogg packets have no maximum size and a
+zero-byte minimum size. There is no restriction on size changes from
+packet to packet. Variable size packets do not require the use of any
+optional or additional container features. There is no optimal
+suggested packet size, though special consideration was paid to make
+sure 50-200 byte packets were no less efficient than larger packet
+sizes. The original design criterion was a 2% overhead at 50 byte
+packets, dropping to a maximum working overhead of 1% with larger
+packets, and a typical working overhead of 0.5-0.7% for most practical
+uses.
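
The overhead figures can be estimated with a short calculation. The sketch below makes simplifying assumptions that are mine, not the document's: every page is filled with equal-size packets, no packet spans a page boundary, and the fixed header is 27 bytes as in RFC 3533. The function names are hypothetical.

```python
FIXED_HEADER_BYTES = 27  # fixed page header size per RFC 3533


def lacing_values(packet_size: int) -> int:
    """Entries one packet occupies in the page's list of packet lengths.

    Each entry covers up to 255 bytes, and a packet always ends with an
    entry below 255, so a 255-byte packet needs two entries (255, 0) and
    a zero-byte packet needs one (0).
    """
    return packet_size // 255 + 1


def framing_overhead(packet_size: int, packets_per_page: int) -> float:
    """Fraction of a page's bytes spent on framing rather than payload."""
    segments = lacing_values(packet_size) * packets_per_page
    if not 1 <= segments <= 255:
        raise ValueError("this sketch expects 1 to 255 length entries per page")
    framing = FIXED_HEADER_BYTES + segments
    payload = packet_size * packets_per_page
    return framing / (framing + payload)


small = framing_overhead(50, 255)   # 255 packets of 50 bytes on one page
large = framing_overhead(4000, 15)  # 15 packets of 4000 bytes on one page
print(f"{small:.2%} {large:.2%}")
```

Under these assumptions, 50 byte packets land slightly above 2% and large packets well below 1%, in line with the design criterion described above.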
+
+ Ogg is a byte-aligned container with no context-dependent, optional
+or variable-length fields. Ogg requires no repacking of codec data.
+The page structure is written out in-line as packet data is submitted
+to the streaming abstraction. In addition, it is possible to
+implement both Ogg mux and demux as MT-hot zero-copy abstractions (as
+is done in the Tremor sourcebase).
+
+ Ogg is designed for efficient and immediate stream capture with
+high confidence. Although packets have no size limit in Ogg, pages
+are a maximum of just under 64kB, meaning that any Ogg stream can be
+captured with confidence after seeing 128kB of data or less [worst
+case; typical figure is 6kB] from any random starting point in the
+stream.
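
Capture from a random starting point can be sketched as: scan for the "OggS" capture pattern, read the page length from the header, then accept the page only if its checksum verifies. The code below is an illustration under stated assumptions, not the reference implementation. The header layout and the CRC parameters (polynomial 0x04C11DB7, zero initial value, checksum field zeroed during calculation) follow RFC 3533; `build_page`, `page_length` and `capture` are hypothetical helper names.

```python
import struct

CAPTURE_PATTERN = b"OggS"
HEADER_BYTES = 27  # fixed header size per RFC 3533


def ogg_crc(data: bytes) -> int:
    """Bitwise CRC-32: polynomial 0x04C11DB7, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            crc <<= 1
            if crc & 0x100000000:
                crc ^= 0x104C11DB7  # also clears the bit shifted out of 32 bits
    return crc


def build_page(serial: int, sequence: int, granule: int, payload: bytes) -> bytes:
    """Frame one short packet as a single page (helper for the demo only)."""
    assert len(payload) < 255
    header = struct.pack("<4sBBqIIIB", CAPTURE_PATTERN, 0, 0,
                         granule, serial, sequence, 0, 1)
    page = bytearray(header + bytes([len(payload)]) + payload)
    page[22:26] = struct.pack("<I", ogg_crc(bytes(page)))
    return bytes(page)


def page_length(data: bytes, pos: int):
    """Page size implied by the header at pos, or None if data ends first."""
    table_start = pos + HEADER_BYTES
    if table_start > len(data):
        return None
    segments = data[pos + 26]
    table_end = table_start + segments
    if table_end > len(data):
        return None
    length = HEADER_BYTES + segments + sum(data[table_start:table_end])
    return length if pos + length <= len(data) else None


def capture(data: bytes, start: int = 0):
    """Return (offset, length) of the first page that verifies, else None."""
    pos = data.find(CAPTURE_PATTERN, start)
    while pos != -1:
        length = page_length(data, pos)
        if length is not None:
            page = bytearray(data[pos:pos + length])
            stored = struct.unpack_from("<I", page, 22)[0]
            page[22:26] = bytes(4)  # checksum is computed with this field zeroed
            if ogg_crc(bytes(page)) == stored:
                return pos, length
        pos = data.find(CAPTURE_PATTERN, pos + 1)
    return None


junk = b"\x01\x02OggS\x00garbage"  # contains a false capture pattern
first = build_page(0x1234, 0, 100, b"hello packet")
second = build_page(0x1234, 1, 200, b"other packet")
print(capture(junk + first + second))
```

The false "OggS" inside the junk prefix is rejected because the page it implies does not fit and would not checksum, so capture locks onto the first real page.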
+
+ Ogg implements simple coarse- and fine-grained seeking by design.
+
+ Coarse seeking may be performed by simply 'moving the tone arm' to a
+new position and 'dropping the needle'. Rapid capture with
+accompanying timecode from any location in an Ogg file is guaranteed
+by the stream design. From the acquisition of the first timecode,
+all data needed to play back from that time code forward is ahead of
+the stream cursor.
+
+ Ogg implements full sample-granularity seeking using an
+interpolated bisection search built on the capture and timecode
+mechanisms used by coarse seeking. As above, once a search finds
+the desired timecode, all data needed to play back from that time code
+forward is ahead of the stream cursor.
+
+ Both coarse and fine seeking use the page structure and sequencing
+inherent to the Ogg format. All Ogg streams are fully seekable from
+creation; seekability is unaffected by truncation or missing data, and
+is tolerant of gross corruption. Seek operations are neither 'fuzzy' nor
+heuristic.
+
+ Seeking without use of an index is a major point of the Ogg
+design. There are several reasons why Ogg forgoes an index:
+
+ Ogg multiplexes streams by interleaving pages from multiple elementary streams into a
+multiplexed stream in time order. The multiplexed pages are not
+altered. Muxing an Ogg AV stream out of separate audio,
+video and data streams is akin to shuffling several decks of cards
+together into a single deck; the cards themselves remain unchanged.
+Demultiplexing is similarly simple.
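
The card-shuffling analogy can be illustrated with a small sketch. It assumes each page already carries a comparable time value (in a real stream the codec derives this from the granule position); the `Page` type and the `mux`/`demux` functions are hypothetical names for illustration only.

```python
import heapq
from collections import defaultdict
from typing import NamedTuple


class Page(NamedTuple):
    time: float     # assumed: time the codec derives from the granule position
    serial: int     # stream serial number stamped in the page
    payload: bytes  # page contents, passed through untouched


def mux(*elementary_streams):
    """Interleave whole pages from several streams in time order."""
    return list(heapq.merge(*elementary_streams, key=lambda page: page.time))


def demux(pages):
    """Route each page back to its elementary stream by serial number."""
    streams = defaultdict(list)
    for page in pages:
        streams[page.serial].append(page)
    return dict(streams)


audio = [Page(0.5, 1, b"a0"), Page(1.0, 1, b"a1"), Page(1.5, 1, b"a2")]
video = [Page(0.4, 2, b"v0"), Page(1.2, 2, b"v1")]

muxed = mux(audio, video)
print([page.payload for page in muxed])
print(demux(muxed) == {1: audio, 2: video})
```

Neither direction inspects or rewrites a page, which is the property that keeps live remultiplexing cheap.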
+
+ The goal of this design is to make the mux/demux operation as
+trivial as possible to allow live streaming systems to build and
+rebuild streams on the fly with minimal CPU usage and no additional
+storage or latency requirements.
+
+ Ogg streams belong to one of two categories, "Continuous" streams and
+"Discontinuous" streams.
+
+ A stream that provides a gapless, time-continuous media type with a
+fine-grained timebase is considered to be 'Continuous'. A continuous
+stream should never be starved of data. Examples of continuous data
+types include broadcast audio and video.
+
+ A stream that delivers data in a potentially irregular pattern or
+with widely spaced timing gaps is considered to be 'Discontinuous'. A
+discontinuous stream may be best thought of as data representing
+scattered events; although they happen in order, they are typically
+unconnected data often located far apart. One example of a
+discontinuous stream type would be captioning such as Ogg Kate. Although it's
+possible to design captions as a continuous stream type, it's most
+natural to think of captions as widely spaced pieces of text with
+little happening between.
+
+ The fundamental reason for distinction between continuous and
+discontinuous streams concerns buffering.
+
+ A continuous stream is, by definition, gapless. Ogg buffering is based
+on the simple premise of never allowing an active continuous stream
+to starve for data during decode; buffering works ahead until all
+continuous streams in a physical stream have data ready and no further.
+
+ Discontinuous stream data is not assumed to be predictable. The
+buffering design takes discontinuous data 'as it comes' rather than
+working ahead to look for future discontinuous data for a potentially
+unbounded period. Thus, the buffering process makes no attempt to fill
+discontinuous stream buffers; their pages simply 'fall out' of the
+stream when continuous streams are handled properly.
+
+ Buffering requirements in this design need not be explicitly
+declared or managed in the encoded stream. The decoder simply reads as
+much data as is necessary to keep all continuous stream types gapless
+and no more, with discontinuous data processed as it arrives in the
+continuous data. Buffering is implicitly optimal for the given
+stream. Because all pages of all data types are stamped with absolute
+timing information within the stream, inter-stream synchronization
+timing is always maintained without the need for explicitly declared
+buffer-ahead hinting.
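
The buffering rule can be sketched in a few lines. This is an illustration under simplifying assumptions: pages are modeled as hypothetical `(serial, payload)` pairs, and the decoder is told up front which serial numbers belong to continuous streams.

```python
def buffer_ahead(pages, continuous_serials):
    """Read pages until every continuous stream has data queued, then stop.

    Discontinuous pages met along the way are queued as they come; the
    loop never reads ahead to look for more of them.
    """
    queued = {}
    waiting = set(continuous_serials)
    if not waiting:
        return queued
    for serial, payload in pages:
        queued.setdefault(serial, []).append(payload)
        waiting.discard(serial)
        if not waiting:
            break
    return queued


AUDIO, VIDEO, CAPTIONS = 1, 2, 3  # hypothetical serial numbers
physical = iter([
    (CAPTIONS, "caption 0"),
    (AUDIO, "audio 0"),
    (AUDIO, "audio 1"),
    (VIDEO, "video 0"),
    (AUDIO, "audio 2"),
])

queued = buffer_ahead(physical, {AUDIO, VIDEO})
print(queued)
```

Reading stops at the first video page because both continuous streams then have data; the caption page was picked up on the way without any attempt to search for the next one.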
+
+ Ogg does not replicate codec-specific metadata into the mux layer
+in an attempt to make the mux and codec layer implementations 'fully
+separable'. Things like specific timebase, keyframing strategy, frame
+duration, etc, do not appear in the Ogg container. The mux layer is,
+instead, expected to query a codec through a standardized interface,
+left to the implementation, for this data when it is needed.
+
+ Though modern design wisdom usually prefers to predict all possible
+needs of current and future codecs then embed these dependencies and
+the required metadata into the container itself, this strategy
+increases container specification complexity, fragility, and rigidity.
+The mux and codec implementations become more independent, but the
+specifications become less independent. A codec can't do what a
+container hasn't already provided for. New codecs are harder to
+support, and you can do fewer useful things with the ones you've
+already got (eg, try to make a good splitter without using any codecs.
+You're stuck splitting at keyframes only, or building yet another new
+mechanism into the container layer to mark what frames to skip
+displaying).
+
+ Ogg's design goes the opposite direction, where the specification
+is to be as simple, easy to understand, and 'proofed' against novel
+codecs as possible. When an Ogg mux layer requires codec-specific
+information, it queries the codec (or a codec stub). This trades a
+more complex implementation for a simpler, more flexible
+specification.
+
+ The Ogg container itself does not define a metadata system for
+declaring the structure and interrelations between multiple media
+types in a muxed stream. That is, the Ogg container itself does not
+specify data like 'which stream is the subtitle stream?' or 'which
+video stream is the primary angle?'. This metadata still exists, but
+is stored in the Ogg container rather than being built into the Ogg
+container. Xiph specifies the 'Skeleton' metadata format for Ogg
+streams, but this decoupling of container and stream structure
+metadata means it is possible to use Ogg with any metadata
+specification without altering the container itself, or without stream
+structure metadata at all.
+
+ Every Ogg page is stamped with a 64 bit 'granule position' that
+serves as an absolute timestamp for mux and seeking. A few nifty
+little tricks are usually also embedded in the granpos state, but
+we'll leave those aside for the moment (strictly speaking, they're
+part of each codec's mapping, not Ogg).
+
+ As mentioned above, granule positions are mapped into
+absolute timestamps by the codec, rather than being a hard timestamp.
+This allows maximally efficient use of the available 64 bits to
+address every sample/frame position without approximation while
+supporting new and previously unknown timebase encodings without
+needing to extend or update the mux layer. When a codec needs a novel
+timebase, it simply brings the code for that mapping along with it.
+This is not a theoretical curiosity; new, wholly novel timebases were
+deployed with the adoption of both Theora and Dirac. "Rolling INTRA"
+(keyframeless video) also benefits from novel use of the granule
+position.
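
As an illustration of codec-defined mappings, the sketch below shows two hypothetical functions a codec might bring along. The first treats the granule position as a running sample count, as an audio codec such as Vorbis does. The second splits the 64 bits into a keyframe index and a count of frames since that keyframe, in the spirit of a video mapping such as Theora's, without reproducing that codec's exact rules. Function and parameter names are assumptions.

```python
def sample_count_to_seconds(granule_position: int, sample_rate: int) -> float:
    """Audio-style mapping: the granule position counts samples."""
    return granule_position / sample_rate


def split_granule_to_seconds(granule_position: int, shift: int,
                             frames_per_second: float) -> float:
    """Video-style mapping: high bits hold the last keyframe's index,
    the low `shift` bits hold the number of frames since that keyframe."""
    keyframe_index = granule_position >> shift
    frames_since_keyframe = granule_position & ((1 << shift) - 1)
    return (keyframe_index + frames_since_keyframe) / frames_per_second


print(sample_count_to_seconds(44100, 44100))             # 1.0
print(split_granule_to_seconds((30 << 6) | 15, 6, 30))   # 1.5
```

The mux layer only needs to call the mapping; it does not need to know how the 64 bits are divided up.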
+
+ Ogg codecs use packets. Packets are octet payloads of
+raw, compressed data, containing the data needed for a single
+decompressed unit, eg, one video frame. Packets have no maximum size
+and may be zero length. They do not have any high-level structure or
+boundary information; strung together, the unframed packets form a
+logical bitstream of apparently random bytes with no internal
+landmarks.
+
+ Logical bitstream packets are grouped and framed into Ogg pages
+along with a unique stream serial number to produce a
+physical bitstream. An elementary stream is a
+physical bitstream containing only the pages framing a single logical
+bitstream. Each page is a self contained entity, although a packet may
+be split and encoded across one or more pages. The page decode
+mechanism is designed to recognize, verify and handle single pages at
+a time from the overall bitstream.
+
+ Ogg Bitstream Framing specifies
the page format of an Ogg bitstream, the packet coding process
-and logical bitstreams in detail. The remainder of this document
-specifies requirements for constructing finished, physical Ogg
-bitstreams. Logical bitstreams may not be mapped/multiplexed into physical
-bitstreams without restriction. Here we discuss design restrictions
-on Ogg physical bitstreams in general, mostly to introduce
-design rationale. Each 'media' format defines its own (generally more
-restrictive) mapping. An 'Ogg Vorbis Audio Bitstream', for example, has a
-specific physical bitstream structure.
-Any other codec or combination of codecs will generally also mandate a
-corresponding restricted physical bitstream format. Multiple logical/elementary bitstreams can be combined into a single
+multiplexed bitstream by interleaving whole pages from each
+contributing elementary stream in time order. The result is a single
+physical stream that multiplexes and frames multiple logical streams.
+Each logical stream is identified by the unique stream serial number
+stamped in its pages. A physical stream may include a 'meta-header'
+(such as the Ogg Skeleton) comprising its
+own Ogg page at the beginning of the physical stream. A decoder
+recovers the original logical/elementary bitstreams out of the
+physical bitstream by taking the pages in order from the physical
+bitstream and redirecting them into the appropriate logical decoding
+entity.
+
+ Ogg Bitstream Multiplexing specifies
+proper multiplexing of an Ogg bitstream in detail.
+
+ Multiple Ogg physical bitstreams may be concatenated into a single new
+stream; this is chaining. The bitstreams do not overlap; the
+final page of a given logical bitstream is immediately followed by the
+initial page of the next. Each logical bitstream in a chain must have a unique serial number
+within the scope of the full physical bitstream, not only within a
+particular link or segment of the chain. Within Ogg, each stream must be declared (by the codec) to be
+continuous- or discontinuous-time. Most codecs treat all streams they
+use as either inherently continuous- or discontinuous-time, although
+this is not a requirement. A codec may, as part of its mapping, choose
+according to data in the initial header.
+
+ Continuous-time pages are stamped by end-time, discontinuous pages
+are stamped by begin-time. Pages in a multiplexed stream are
+interleaved in order of the time stamp regardless of stream type.
+Both continuous and discontinuous logical streams are used to seek
+within a physical stream; however, only continuous streams are used to
+determine buffering depth; because discontinuous streams are stamped
+by start time, they will always 'fall out' in time when buffering
+tracks only the continuous streams. See 'Examples' for an
+illustration of the buffering mechanism.
+
+ Each codec is allowed some freedom in deciding how its logical
+bitstream is encapsulated into an Ogg bitstream (even if it is a
+trivial mapping, eg, 'plop the packets in and go'). This is the
+codec's mapping. Ogg imposes a few mapping requirements
+on any codec.
The framing specification defines
'beginning of stream' and 'end of stream' page markers via a header
flag (it is possible for a stream to consist of a single page). A
-stream always consists of an integer number of pages, an easy
+correct stream always consists of an integer number of pages, an easy
requirement given the variable size nature of pages. In addition to the header flag marking the first and last pages of a
-logical bitstream, the first page of an Ogg bitstream obeys
-additional restrictions. Each individual media mapping specifies its
-own implementation details regarding these restrictions. The first page of a logical Ogg bitstream consists of a single,
-small 'initial header' packet that includes sufficient information to
-identify the exact CODEC type and media requirements of the logical
-bitstream. The intent of this restriction is to simplify identifying
-the bitstream type and content; for a given media type (or across all
-Ogg media types) we can know that we only need a small, fixed
-amount of data to uniquely identify the bitstream type. As an example, Ogg Vorbis places the name and revision of the Vorbis
-CODEC, the audio rate and the audio quality into this initial header,
-thus simplifying vastly the certain identification of an Ogg Vorbis
-audio bitstream. The simplest form of logical bitstream multiplexing is concatenation
-(chaining). Complete logical bitstreams are strung
-one-after-another in order. The bitstreams do not overlap; the final
-page of a given logical bitstream is immediately followed by the
-initial page of the next. Chaining is the only logical->physical
-mapping allowed by Ogg Vorbis. Each chained logical bitstream must have a unique serial number within
-the scope of the physical bitstream. Logical bitstreams may also be multiplexed 'in parallel'
-(grouped). An example of grouping would be to allow
-streaming of separate audio and video streams, using different codecs
-and different logical bitstreams, in the same physical bitstream.
-Whole pages from multiple logical bitstreams are mixed together. The initial pages of each logical bitstream must appear first; the
-media mapping specifies the order of the initial pages. For example,
-Ogg Theora describes video bitstream with audio.
-The mapping specifies that the physical bitstream must begin
-with the initial page of a logical video bitstream, followed by the
-initial page of an audio stream. Unlike initial pages, terminal pages
-for the logical bitstreams need not all occur contiguously (although a
-specific media mapping may require this; it is not mandated by the
-generic Ogg stream spec). Terminal pages may be 'nil' pages,
-that is, pages containing no content but simply a page header with
-position information and the 'last page of bitstream' flag set in the
-page header. The first page of an elementary Ogg bitstream consists of a single,
+small 'initial header' packet that must include sufficient information
+to identify the exact CODEC type. From this initial header, the codec
+must also be able to determine its timebase and whether or not it is a
+continuous- or discontinuous-time stream. The initial header must fit
+on a single page. If a codec makes use of auxiliary headers (for
+example, Vorbis uses two auxiliary headers), these headers must follow
+the initial header immediately. The last header finishes its page;
+data begins on a fresh page.
+
+ As an example, Ogg Vorbis places the name and revision of the
+Vorbis CODEC, the audio rate and the audio quality into this initial
+header. Comments and detailed codec setup appear in the larger
+auxiliary headers. Multiplexing requirements within Ogg are straightforward. When
+constructing a single-link (unchained) physical bitstream consisting
+of multiple elementary streams:
+
+ Each grouped bitstream must have a unique serial number within the
scope of the physical bitstream. Groups of concurrently multiplexed bitstreams may be chained
+ Multiplexed and/or unmultiplexed bitstreams may be chained
consecutively. Such a physical bitstream obeys all the rules of both
-grouped and chained multiplexed streams; the groups, when unchained ,
-must stand on their own as a valid concurrently multiplexed
-bitstream.Ogg logical and physical bitstream overview
-
-Ogg bitstreams
-
-Logical and physical bitstreams
-
-Ogg bitstream overview
+
+This document serves as a starting point for understanding the design
+and implementation of the Ogg container format. If you're new to Ogg
+or merely want a high-level technical overview, start reading here.
+Other documents linked from the index page
+give distilled technical descriptions and references of the container
+mechanisms. This document is intended to aid understanding.
+
+Container format design points
+
+Streaming
+
+Variable Bit Rate, Variable Payload Size
+
+Simple pagination
+
+Capture
+
+Seeking
+
+
+
+
+
+Simple multiplexing
+
+Continuous and Discontinuous Media
+
+Buffering
+
+Codec metadata
+
+Stream structure metadata
+
+Frame accurate absolute position
+
+Ogg stream arrangement
+
+Packets, pages, and bitstreams
+
+Mapping Restrictions
-
-additional end-to-end structure
+and elementary bitstreams in detail.
+
+Multiplexed bitstreams
+
+Chaining
+
+Continuous and discontinuous streams
+
+Mapping Requirements
+
+sequential multiplexing (chaining)
-
-concurrent multiplexing (grouping)
-
-Multiplexing Requirements
+
+
+
+
sequential and concurrent multiplexing
+chaining and multiplexing
-
Below, we present an example of a grouped and chained bitstream:
@@ -227,7 +489,7 @@ where decode requires more information). The Xiph Fish Logo is a trademark (™) of Xiph.Org.