author | Jack Moffitt <jack@xiph.org> | 2000-09-03 05:54:26 +0000 |
---|---|---|
committer | Jack Moffitt <jack@xiph.org> | 2000-09-03 05:54:26 +0000 |
commit | e2cee72399a6b0d7986f582e86baf4e240ccaab6 (patch) | |
tree | 1d5ac0d04a81d4954932d3073ea096bd576abf2c /doc | |
download | ogg-e2cee72399a6b0d7986f582e86baf4e240ccaab6.tar.gz |
Initial revision
git-svn-id: http://svn.xiph.org/trunk/ogg@618 0101bb08-14d6-0310-b084-bc0e0c8e3800
Diffstat (limited to 'doc')
| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | doc/Makefile.am | 9 |
| -rw-r--r-- | doc/framing.html | 384 |
| -rw-r--r-- | doc/index.html | 3 |
| -rw-r--r-- | doc/oggstream.html | 192 |
| -rw-r--r-- | doc/stream.png | bin 0 -> 2327 bytes |
| -rw-r--r-- | doc/white-ogg.png | bin 0 -> 1181 bytes |
| -rw-r--r-- | doc/white-xifish.png | bin 0 -> 965 bytes |
7 files changed, 588 insertions, 0 deletions
diff --git a/doc/Makefile.am b/doc/Makefile.am
new file mode 100644
index 0000000..8893207
--- /dev/null
+++ b/doc/Makefile.am
@@ -0,0 +1,9 @@
+## Process this with automake to create Makefile.in
+
+AUTOMAKE_OPTIONS = foreign
+
+docdir = $(prefix)/doc/$(PACKAGE)-$(VERSION)
+
+doc_DATA = index.html framing.html oggstream.html white-xifish.png stream.png white-ogg.png
+
+EXTRA_DIST = $(doc_DATA)
diff --git a/doc/framing.html b/doc/framing.html
new file mode 100644
index 0000000..51054c7
--- /dev/null
+++ b/doc/framing.html
@@ -0,0 +1,384 @@
+<HTML><HEAD><TITLE>xiph.org: Ogg Vorbis documentation</TITLE>
+<BODY bgcolor="#ffffff" text="#202020" link="#006666" vlink="#000000">
+<nobr><a href="vorbis.html"><img src="white-ogg.png" border=0><img
+src="vorbisword2.png" border=0></a></nobr><p>
+
+<h1><font color=#000070>
+Ogg logical bitstream framing
+</font></h1>
+
+<em>Last update to this document: July 15, 1999</em><br>
+
+<h2>Ogg bitstreams</h2>
+
+Vorbis encodes short-time blocks of PCM data into raw packets of
+bit-packed data. These raw packets may be used directly by transport
+mechanisms that provide their own framing and packet-separation
+mechanisms (such as UDP datagrams). For stream based storage (such as
+files) and transport (such as TCP streams or pipes), Vorbis uses the
+Ogg bitstream format to provide framing/sync, sync recapture
+after error, landmarks during seeking, and enough information to
+properly separate data back into packets at the original packet
+boundaries without relying on decoding to find packet boundaries.<p>
+
+<h2>Design constraints for Ogg bitstreams</h2>
+
+<ol><li>True streaming; we must not need to seek to build a 100%
+ complete bitstream.
+
+<li> Use no more than approximately 1-2% of bitstream bandwidth for
+ packet boundary marking, high-level framing, sync and seeking.
+
+<li> Specification of absolute position within the original sample
+ stream.
+
+<li> Simple mechanism to ease limited editing, such as a simplified
+ concatenation mechanism.
+
+<li> Detection of corruption, recapture after error and direct, random
+ access to data at arbitrary positions in the bitstream.
+</ol>
+
+<h2>Logical and Physical Bitstreams</h2>
+
+A <em>logical</em> Ogg bitstream is a contiguous stream of
+sequential pages belonging only to the logical bitstream. A
+<em>physical</em> Ogg bitstream is constructed from one or more
+logical Ogg bitstreams (the simplest physical bitstream
+is simply a single logical bitstream). We describe below the exact
+formatting of an Ogg logical bitstream. Combining logical
+bitstreams into more complex physical bitstreams is described in the
+<a href="oggstream.html">Ogg bitstream overview</a>. The exact
+mapping of raw Vorbis packets into a valid Ogg Vorbis physical
+bitstream is described in <a href="vorbis-stream.html">Vorbis
+bitstream mapping</a>.
+
+<h2>Bitstream structure</h2>
+
+An Ogg stream is structured by dividing incoming packets into
+segments of up to 255 bytes and then wrapping a group of contiguous
+packet segments into a variable length page preceded by a page
+header. Both the header size and page size are variable; the page
+header contains sizing information and checksum data to determine
+header/page size and data integrity.<p>
+
+The bitstream is captured (or recaptured) by looking for the beginning
+of a page, specifically the capture pattern. Once the capture pattern
+is found, the decoder verifies page sync and integrity by computing
+and comparing the checksum. At that point, the decoder can extract the
+packets themselves.<p>
+
+<h3>Packet segmentation</h3>
+
+Packets are logically divided into multiple segments before encoding
+into a page. Note that the segmentation and fragmentation process is a
+logical one; it's used to compute page header values and the original
+page data need not be disturbed, even when a packet spans page
+boundaries.<p>
+
+The raw packet is logically divided into [n] 255 byte segments and a
+last fractional segment of < 255 bytes. A packet may well
+consist only of the trailing fractional segment, and a fractional
+segment may be zero length. These values, called "lacing values", are
+then saved and placed into the header segment table.<p>
+
+An example should make the basic concept clear:<p>
+
+<pre>
+<tt>
+raw packet:
+ ___________________________________________
+ |______________packet data__________________| 753 bytes
+
+lacing values for page header segment table: 255,255,243
+</tt>
+</pre>
+
+We simply add the lacing values for the total size; the last lacing
+value for a packet is always the value that is less than 255. Note
+that this encoding avoids imposing a maximum packet size while also
+keeping overhead on small packets to a minimum (as opposed to, eg,
+simply using two bytes at the head of every packet and having a max
+packet size of 32k. Small packets (<255, the typical case) are
+penalized with twice the segmentation overhead). Using the lacing
+values as suggested, small packets see the minimum possible
+byte-aligned overhead (1 byte) and large packets, over 512 bytes or
+so, see a fairly constant ~.5% overhead on encoding space.<p>
+
+Note that a lacing value of 255 implies that a second lacing value
+follows in the packet, and a value of < 255 marks the end of the
+packet after that many additional bytes. A packet of 255 bytes (or a
+multiple of 255 bytes) is terminated by a lacing value of 0:<p>
+
+<pre><tt>
+raw packet:
+ _______________________________
+ |________packet data____________| 255 bytes
+
+lacing values: 255, 0
+</tt></pre>
+
+Note also that a 'nil' (zero length) packet is not an error; it
+consists of nothing more than a lacing value of zero in the header.<p>
+
+<h3>Packets spanning pages</h3>
+
+Packets are not restricted to beginning and ending within a page,
+although individual segments are, by definition, required to do so.
+Packets are not restricted to a maximum size, although excessively
+large packets in the data stream are discouraged; the Ogg
+bitstream specification strongly recommends a nominal page size of
+approximately 4-8kB (large packets are foreseen as being useful for
+initialization data at the beginning of a logical bitstream).<p>
+
+After segmenting a packet, the encoder may decide not to place all the
+resulting segments into the current page; to do so, the encoder places
+the lacing values of the segments it wishes to belong to the current
+page into the current segment table, then finishes the page. The next
+page begins its segment table with the lacing value of the next
+packet segment, thus continuing the packet (data in the page bodies
+must also correspond properly to the lacing values in the spanned
+pages: the packet data corresponding to the lacing values of the
+first page belongs in that page, and the packet segments listed in
+the segment table of the following page must begin the body of that
+following page).<p>
+
+The final step in spanning a page boundary is to set the header
+flag in the new page to indicate that the first lacing value in the
+segment table continues rather than begins a packet; a header flag of
+0x01 is set to indicate a continued packet. Although mandatory, it
+is not actually algorithmically necessary; one could inspect the
+preceding segment table to determine if the packet is new or
+continued. Adding the information to the header_type_flag allows a
+simpler design (with no overhead) that needs only to inspect the current
+page header after frame capture. This also allows faster error
+recovery in the event that the packet originates in a corrupt
+preceding page, implying that the previous page's segment table
+cannot be trusted.<p>
+
+Note that a packet can span an arbitrary number of pages; the above
+spanning process is repeated for each spanned page boundary. Also a
+'zero termination' on a packet size that is an exact multiple of 255
+must appear even if the lacing value appears in the next page as a
+zero-length continuation of the current packet. The header flag
+should be set to 0x01 to indicate that the packet spanned, even though
+the span is a nil case as far as data is concerned.<p>
+
+The encoding looks odd, but is properly optimized for speed and the
+expected case of the majority of packets being between 50 and 200
+bytes (note that it is designed such that packets of wildly different
+sizes can be handled within the model; placing packet size
+restrictions on the encoder would have only slightly simplified design
+in page generation and increased overall encoder complexity).<p>
+
+The main point behind tracking individual packets (and packet
+segments) is to allow more flexible encoding tricks that require
+explicit knowledge of packet size. An example is simple bandwidth
+limiting, implemented by simply truncating packets in the nominal case
+if the packet is arranged so that the least sensitive portion of the
+data comes last.<p>
+
+<h3>Page header</h3>
+
+The headering mechanism is designed to avoid copying and re-assembly
+of the packet data (ie, making the packet segmentation process a
+logical one); the header can be generated directly from incoming
+packet data. The encoder buffers packet data until it finishes a
+complete page, at which point it writes the header followed by the
+buffered packet segments.<p>
+
+<h4>capture_pattern</h4>
+
+ A header begins with a capture pattern that simplifies identifying
+ pages; once the decoder has found the capture pattern it can do a more
+ intensive job of verifying that it has in fact found a page boundary
+ (as opposed to an inadvertent coincidence in the byte stream).<p>
+
+<pre><tt>
+  byte   value
+
+  0      0x4f 'O'
+  1      0x67 'g'
+  2      0x67 'g'
+  3      0x53 'S'
+</tt></pre>
+
+<h4>stream_structure_version</h4>
+
+ The capture pattern is followed by the stream structure revision:
+
+<pre><tt>
+  byte   value
+
+  4      0x00
+</tt></pre>
+
+<h4>header_type_flag</h4>
+
+ The header type flag identifies this page's context in the bitstream:
+
+<pre><tt>
+  byte   value
+
+  5      bitflags: 0x01: unset = fresh packet
+                           set = continued packet
+                   0x02: unset = not first page of logical bitstream
+                           set = first page of logical bitstream (bos)
+                   0x04: unset = not last page of logical bitstream
+                           set = last page of logical bitstream (eos)
+</tt></pre>
+
+<h4>PCM absolute position</h4>
+
+ This is packed in the same way the rest of Ogg data is packed;
+ LSb of LSB first. Note that the 'position' data specifies a 'sample'
+ number (eg, in CD-quality audio a sample is four octets, 16 bits for left
+ and 16 bits for right; in video it would be the frame number). The
+ position specified is the total number of samples encoded after including all
+ packets finished on this page (packets begun on this page but
+ continuing on to the next page do not count). The rationale here is
+ that the position specified in the page header of the last page
+ tells how long the PCM data coded by the bitstream is. A truncated
+ stream will still return the proper number of samples that can be
+ decoded fully.
+
+<pre><tt>
+  byte   value
+
+  6      0xXX LSB
+  7      0xXX
+  8      0xXX
+  9      0xXX
+  10     0xXX
+  11     0xXX
+  12     0xXX
+  13     0xXX MSB
+</tt></pre>
+
+<h4>stream serial number</h4>
+
+ Ogg allows for separate logical bitstreams to be mixed at page
+ granularity in a physical bitstream. The most common case would be
+ sequential arrangement, but it is possible to interleave pages for
+ two separate bitstreams to be decoded concurrently. The serial
+ number is the means by which physical pages are associated with
+ a particular logical stream. Each logical stream must have a unique
+ serial number within a physical stream:
+
+<pre><tt>
+  byte   value
+
+  14     0xXX LSB
+  15     0xXX
+  16     0xXX
+  17     0xXX MSB
+</tt></pre>
+
+<h4>page sequence no</h4>
+
+ Page counter; lets us know if a page is lost (useful where packets
+ span page boundaries).
+
+<pre><tt>
+  byte   value
+
+  18     0xXX LSB
+  19     0xXX
+  20     0xXX
+  21     0xXX MSB
+</tt></pre>
+
+<h4>page checksum</h4>
+
+ 32 bit CRC value (direct algorithm, initial val and final XOR = 0,
+ generator polynomial=0x04c11db7). The value is computed over the
+ entire header (with the CRC field in the header set to zero) and then
+ continued over the page body. The CRC field is then filled with the
+ computed value.<p>
+
+ (A thorough discussion of CRC algorithms can be found in <a
+ href="ftp://ftp.rocksoft.com/clients/rocksoft/papers/crc_v3.txt">"A
+ Painless Guide to CRC Error Detection Algorithms"</a> by Ross
+ Williams <a
+ href="mailto:ross@guest.adelaide.edu.au">ross@guest.adelaide.edu.au</a>.)
+
+<pre><tt>
+  byte   value
+
+  22     0xXX LSB
+  23     0xXX
+  24     0xXX
+  25     0xXX MSB
+</tt></pre>
+
+<h4>page_segments</h4>
+
+ The number of segment entries to appear in the segment table. The
+ maximum of 255 segments (255 bytes each) sets the maximum
+ possible physical page size at 65307 bytes or just under 64kB (thus
+ we know that a header corrupted so as to destroy sizing/alignment
+ information will not cause a runaway bitstream. We'll read in the
+ page according to the corrupted size information that's guaranteed to
+ be a reasonable size regardless, notice the checksum mismatch, drop
+ sync and then look for recapture).<p>
+
+<pre><tt>
+  byte   value
+
+  26     0x00-0xff (0-255)
+</tt></pre>
+
+<h4>segment_table (containing packet lacing values)</h4>
+
+ The lacing values for each packet segment physically appearing in
+ this page are listed in contiguous order.
+
+<pre><tt>
+  byte   value
+
+  27     0x00-0xff (0-255)
+  [...]
+  n      0x00-0xff (0-255, n=page_segments+26)
+</tt></pre>
+
+Total page size is calculated directly from the known header size and
+lacing values in the segment table. Packet data segments follow
+immediately after the header.<p>
+
+Page headers typically impose a flat .25-.5% space overhead assuming
+nominal ~8k page sizes. The segmentation table needed for exact
+packet recovery in the streaming layer adds approximately .5-1%
+nominal assuming expected encoder behavior for 44.1kHz, 128kbps
+stereo encodings.<p>
+
+<hr>
+<a href="http://www.xiph.org/">
+<img src="white-xifish.png" align=left border=0>
+</a>
+<font size=-2 color=#505050>
+
+Ogg is a <a href="http://www.xiph.org">Xiphophorus</a> effort to
+protect essential tenets of Internet multimedia from corporate
+hostage-taking; Open Source is the net's greatest tool to keep
+everyone honest. See <a href="http://www.xiph.org/about.html">About
+Xiphophorus</a> for details.
+<p>
+
+Ogg Vorbis is the first Ogg audio CODEC. Anyone may
+freely use and distribute the Ogg and Vorbis specification,
+whether in a private, public or corporate capacity. However,
+Xiphophorus and the Ogg project (xiph.org) reserve the right to set
+the Ogg/Vorbis specification and certify specification compliance.<p>
+
+Xiphophorus's Vorbis software CODEC implementation is distributed
+under the Lesser/Library GNU Public License. This does not restrict
+third parties from distributing independent implementations of Vorbis
+software under other licenses.<p>
+
+OggSquish, Vorbis, Xiphophorus and their logos are trademarks (tm) of
+<a href="http://www.xiph.org/">Xiphophorus</a>. These pages are
+copyright (C) 1994-2000 Xiphophorus. All rights reserved.<p>
+
+</body>
+
+
diff --git a/doc/index.html b/doc/index.html
new file mode 100644
index 0000000..9b4232b
--- /dev/null
+++ b/doc/index.html
@@ -0,0 +1,3 @@
+<a href="oggstream.html">Ogg logical and physical bitstream overview</a><br>
+<a href="framing.html">Ogg logical bitstream framing</a><br>
+
diff --git a/doc/oggstream.html b/doc/oggstream.html
new file mode 100644
index 0000000..46a221c
--- /dev/null
+++ b/doc/oggstream.html
@@ -0,0 +1,192 @@
+<HTML><HEAD><TITLE>xiph.org: Ogg Vorbis documentation</TITLE>
+<BODY bgcolor="#ffffff" text="#202020" link="#006666" vlink="#000000">
+<nobr><a href="vorbis.html"><img src="white-ogg.png" border=0><img
+src="vorbisword2.png" border=0></a></nobr><p>
+
+
+<h1><font color=#000070>
+Ogg logical and physical bitstream overview
+</font></h1>
+
+<em>Last update to this document: July 18, 1999</em><br>
+
+<h2>Ogg bitstreams</h2>
+
+Ogg codecs use octet vectors of raw, compressed data
+(<em>packets</em>). These compressed packets do not have any
+high-level structure or boundary information; strung together, they
+appear to be streams of random bytes with no landmarks.<p>
+
+Raw packets may be used directly by transport mechanisms that provide
+their own framing and packet-separation mechanisms (such as UDP
+datagrams). For stream based storage (such as files) and transport
+(such as TCP streams or pipes), Vorbis and other future Ogg codecs use
+the Ogg bitstream format to provide framing/sync, sync recapture
+after error, landmarks during seeking, and enough information to
+properly separate data back into packets at the original packet
+boundaries without relying on decoding to find packet boundaries.<p>
+
+<h2>Logical and physical bitstreams</h2>
+
+Raw packets are grouped and encoded into contiguous pages of
+structured bitstream data called <em>logical bitstreams</em>. A
+logical bitstream consists of pages, in order, belonging to a single
+codec instance. Each page is a self-contained entity (although it is
+possible that a packet may be split and encoded across more than one
+page); that is, the page decode mechanism is designed to recognize,
+verify and handle single pages at a time from the overall bitstream.<p>
+
+Multiple logical bitstreams can be combined (with restrictions) into
+a single <em>physical bitstream</em>. A physical bitstream consists
+of multiple logical bitstreams multiplexed at the page level. Whole
+pages are taken in order from multiple logical bitstreams and combined
+into a single physical stream of pages. The decoder reconstructs the
+original logical bitstreams from the physical bitstream by taking the
+pages in order from the physical bitstream and redirecting them into
+the appropriate logical decoding entity. The simplest physical
+bitstream is a single, unmultiplexed logical bitstream. <p>
+
+<a href=framing.html>Ogg Logical Bitstream Framing</a> discusses
+the page format of an Ogg bitstream, the packet coding process
+and logical bitstreams in detail. The remainder of this document
+specifies requirements for constructing finished, physical Ogg
+bitstreams.<p>
+
+<h2>Mapping Restrictions</h2>
+
+Logical bitstreams may not be mapped/multiplexed into physical
+bitstreams without restriction. Here we discuss design restrictions
+on Ogg physical bitstreams in general, mostly to introduce
+design rationale. Each 'media' format defines its own (generally more
+restrictive) mapping. An '<a href="vorbis-stream.html">Ogg Vorbis
+Audio Bitstream</a>', for example, has a <a
+href="vorbis-stream.html">specific physical bitstream structure</a>.
+An 'Ogg A/V' bitstream (not currently specified) will also mandate a
+specific, restricted physical bitstream format.<p>
+
+<h3>additional end-to-end structure</h3>
+
+The <a href="framing.html">framing specification</a> defines
+'beginning of stream' and 'end of stream' page markers via a header
+flag (it is possible for a stream to consist of a single page). A
+stream always consists of an integer number of pages, an easy
+requirement given the variable-size nature of pages.<p>
+
+In addition to the header flag marking the first and last pages of a
+logical bitstream, the first page of an Ogg bitstream obeys
+additional restrictions. Each individual media mapping specifies its
+own implementation details regarding these restrictions.<p>
+
+The first page of a logical Ogg bitstream consists of a single,
+small 'initial header' packet that includes sufficient information to
+identify the exact CODEC type and media requirements of the logical
+bitstream. The intent of this restriction is to simplify identifying
+the bitstream type and content; for a given media type (or across all
+Ogg media types) we can know that we only need a small, fixed
+amount of data to uniquely identify the bitstream type.<p>
+
+As an example, Ogg Vorbis places the name and revision of the Vorbis
+CODEC, the audio rate and the audio quality into this initial header,
+thus greatly simplifying reliable identification of an Ogg Vorbis
+audio bitstream.<p>
+
+<h3>sequential multiplexing (chaining)</h3>
+
+The simplest form of logical bitstream multiplexing is concatenation
+(<em>chaining</em>). Complete logical bitstreams are strung
+one after another in order. The bitstreams do not overlap; the final
+page of a given logical bitstream is immediately followed by the
+initial page of the next. Chaining is the only logical->physical
+mapping allowed by Ogg Vorbis.<p>
+
+Each chained logical bitstream must have a unique serial number within
+the scope of the physical bitstream.<p>
+
+<h3>concurrent multiplexing (grouping)</h3>
+
+Logical bitstreams may also be multiplexed 'in parallel'
+(<em>grouped</em>). An example of grouping would be to allow
+streaming of separate audio and video streams, using different codecs
+and different logical bitstreams, in the same physical bitstream.
+Whole pages from multiple logical bitstreams are mixed together.<p>
+
+The initial pages of each logical bitstream must appear first; the
+media mapping specifies the order of the initial pages. For example,
+Ogg A/V will eventually specify an Ogg video bitstream with
+audio. The mapping may specify that the physical bitstream must begin
+with the initial page of a logical video bitstream, followed by the
+initial page of an audio stream. Unlike initial pages, terminal pages
+for the logical bitstreams need not all occur contiguously (although a
+specific media mapping may require this; it is not mandated by the
+generic Ogg stream spec). Terminal pages may be 'nil' pages,
+that is, pages containing no content but simply a page header with
+position information and the 'last page of bitstream' flag set in the
+page header.<p>
+
+Each grouped bitstream must have a unique serial number within the
+scope of the physical bitstream.<p>
+
+<h3>sequential and concurrent multiplexing</h3>
+
+Groups of concurrently multiplexed bitstreams may be chained
+consecutively. Such a physical bitstream obeys all the rules of both
+grouped and chained multiplexed streams; the groups, when unchained,
+must stand on their own as a valid concurrently multiplexed
+bitstream.<p>
+
+<h3>multiplexing example</h3>
+
+Below, we present an example of a grouped and chained bitstream:<p>
+
+<img src=stream.png><p>
+
+In this example, we see pages from five logical bitstreams in total
+multiplexed into a physical bitstream. Note the following
+characteristics:
+
+<ol><li>Grouped bitstreams begin together; all of the initial pages
+must appear before any data pages. When concurrently multiplexed
+groups are chained, the new group does not begin until all the
+bitstreams in the previous group have terminated.<p>
+
+<li>The pages of concurrently multiplexed bitstreams need not conform
+to a regular order; the only requirement is that page <tt>n</tt> of a
+logical bitstream follow page <tt>n-1</tt> in the physical bitstream.
+There are no restrictions on intervening pages belonging to other
+logical bitstreams. (Tying page appearance to bitrate demands is one
+logical strategy, ie, the page appears at the chronological point
+where decode requires more information.)
+
+</ol>
+
+<hr>
+<a href="http://www.xiph.org/">
+<img src="white-xifish.png" align=left border=0>
+</a>
+<font size=-2 color=#505050>
+
+Ogg is a <a href="http://www.xiph.org">Xiphophorus</a> effort to
+protect essential tenets of Internet multimedia from corporate
+hostage-taking; Open Source is the net's greatest tool to keep
+everyone honest. See <a href="http://www.xiph.org/about.html">About
+Xiphophorus</a> for details.
+<p>
+
+Ogg Vorbis is the first Ogg audio CODEC. Anyone may
+freely use and distribute the Ogg and Vorbis specification,
+whether in a private, public or corporate capacity. However,
+Xiphophorus and the Ogg project (xiph.org) reserve the right to set
+the Ogg/Vorbis specification and certify specification compliance.<p>
+
+Xiphophorus's Vorbis software CODEC implementation is distributed
+under the Lesser/Library GNU Public License. This does not restrict
+third parties from distributing independent implementations of Vorbis
+software under other licenses.<p>
+
+OggSquish, Vorbis, Xiphophorus and their logos are trademarks (tm) of
+<a href="http://www.xiph.org/">Xiphophorus</a>. These pages are
+copyright (C) 1994-2000 Xiphophorus. All rights reserved.<p>
+
+</body>
+
+
diff --git a/doc/stream.png b/doc/stream.png
new file mode 100644
index 0000000..6e9dca8
--- /dev/null
+++ b/doc/stream.png
Binary files differ
diff --git a/doc/white-ogg.png b/doc/white-ogg.png
new file mode 100644
index 0000000..45dc0ac
--- /dev/null
+++ b/doc/white-ogg.png
Binary files differ
diff --git a/doc/white-xifish.png b/doc/white-xifish.png
new file mode 100644
index 0000000..ab25cc8
--- /dev/null
+++ b/doc/white-xifish.png
Binary files differ