summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorJack Moffitt <jack@xiph.org>2000-09-03 06:27:46 +0000
committerJack Moffitt <jack@xiph.org>2000-09-03 06:27:46 +0000
commit273523a8e3f84abe37232b0265428f3d48b84d81 (patch)
treea12a361d834f8dd15c97d84c37361cbbb5c6b9b9
parent2975eba833b46985966696cee3e71a2f285db3bb (diff)
downloadlibvorbis-git-273523a8e3f84abe37232b0265428f3d48b84d81.tar.gz
cleared out the doc dir
svn path=/branches/branch_jackoggsvorbis/vorbis/; revision=623
-rw-r--r--doc/framing.html384
-rw-r--r--doc/oggstream.html192
-rw-r--r--doc/stream.pngbin0 -> 2327 bytes
3 files changed, 576 insertions, 0 deletions
diff --git a/doc/framing.html b/doc/framing.html
new file mode 100644
index 00000000..51054c7d
--- /dev/null
+++ b/doc/framing.html
@@ -0,0 +1,384 @@
+<HTML><HEAD><TITLE>xiph.org: Ogg Vorbis documentation</TITLE>
+<BODY bgcolor="#ffffff" text="#202020" link="#006666" vlink="#000000">
+<nobr><a href="vorbis.html"><img src="white-ogg.png" border=0><img
+src="vorbisword2.png" border=0></a></nobr><p>
+
+<h1><font color=#000070>
+Ogg logical bitstream framing
+</font></h1>
+
+<em>Last update to this document: July 15, 1999</em><br>
+
+<h2>Ogg bitstreams</h2>
+
+Vorbis encodes short-time blocks of PCM data into raw packets of
+bit-packed data. These raw packets may be used directly by transport
+mechanisms that provide their own framing and packet-seperation
+mechanisms (such as UDP datagrams). For stream based storage (such as
+files) and transport (such as TCP streams or pipes), Vorbis uses the
+Ogg bitstream format to provide framing/sync, sync recapture
+after error, landmarks during seeking, and enough information to
+properly seperate data back into packets at the original packet
+boundaries without relying on decoding to find packet boundaries.<p>
+
+<h2>Design constraints for Ogg bitstreams</h2>
+
+<ol><li>True streaming; we must not need to seek to build a 100%
+ complete bitstream.
+
+<li> Use no more than approximately 1-2% of bitstream bandwidth for
+ packet boundary marking, high-level framing, sync and seeking.
+
+<li> Specification of absolute position within the original sample
+ stream.
+
+<li> Simple mechanism to ease limited editing, such as a simplified
+ concatenation mechanism.
+
+<li> Detection of corruption, recapture after error and direct, random
+ access to data at arbitrary positions in the bitstream.
+</ol>
+
+<h2>Logical and Physical Bitstreams</h2>
+
+A <em>logical</em> Ogg bitstream is a contiguous stream of
+sequential pages belonging only to the logical bitstream. A
+<em>physical</em> Ogg bitstream is constructed from one or more
+than one logical Ogg bitstream (the simplest physical bitstream
+is simply a single logical bitstream). We describe below the exact
+formatting of an Ogg logical bitstream. Combining logical
+bitstreams into more complex physical bitstreams is described in the
+<a href="oggstream.html">Ogg bitstream overview</a>. The exact
+mapping of raw Vorbis packets into a valid Ogg Vorbis physical
+bitstream is described in <a href="vorbis-stream.html">Vorbis
+bitstream mapping</a>.
+
+<h2>Bitstream structure</h2>
+
+An Ogg stream is structured by dividing incoming packets into
+segments of up to 255 bytes and then wrapping a group of contiguous
+packet segments into a variable length page preceeded by a page
+header. Both the header size and page size are variable; the page
+header contains sizing information and checksum data to determine
+header/page size and data integrity.<p>
+
+The bitstream is captured (or recaptured) by looking for the beginning
+of a page, specifically the capture pattern. Once the capture pattern
+is found, the decoder verifies page sync and integrity by computing
+and comparing the checksum. At that point, the decoder can extract the
+packets themselves.<p>
+
+<h3>Packet segmentation</h3>
+
+Packets are logically divided into multiple segments before encoding
+into a page. Note that the segmentation and fragmentation process is a
+logical one; it's used to compute page header values and the original
+page data need not be disturbed, even when a packet spans page
+boundaries.<p>
+
+The raw packet is logically divided into [n] 255 byte segments and a
+last fractional segment of < 255 bytes. A packet size may well
+consist only of the trailing fractional segment, and a fractional
+segment may be zero length. These values, called "lacing values" are
+then saved and placed into the header segment table.<p>
+
+An example should make the basic concept clear:<p>
+
+<pre>
+<tt>
+raw packet:
+ ___________________________________________
+ |______________packet data__________________| 753 bytes
+
+lacing values for page header segment table: 255,255,243
+</tt>
+</pre>
+
+We simply add the lacing values for the total size; the last lacing
+value for a packet is always the value that is less than 255. Note
+that this encoding both avoids imposing a maximum packet size as well
+as imposing minimum overhead on small packets (as opposed to, eg,
+simply using two bytes at the head of every packet and having a max
+packet size of 32k. Small packets (<255, the typical case) are
+penalized with twice the segmentation overhead). Using the lacing
+values as suggested, small packets see the minimum possible
+byte-aligned overheade (1 byte) and large packets, over 512 bytes or
+so, see a fairly constant ~.5% overhead on encoding space.<p>
+
+Note that a lacing value of 255 implies that a second lacing value
+follows in the packet, and a value of < 255 marks the end of the
+packet after that many additional bytes. A packet of 255 bytes (or a
+multiple of 255 bytes) is terminated by a lacing value of 0:<p>
+
+<pre><tt>
+raw packet:
+ _______________________________
+ |________packet data____________| 255 bytes
+
+lacing values: 255, 0
+</tt></pre>
+
+Note also that a 'nil' (zero length) packet is not an error; it
+consists of nothing more than a lacing value of zero in the header.<p>
+
+<h3>Packets spanning pages</h3>
+
+Packets are not resticted to beginning and ending within a page,
+although individual segments are, by definition, required to do so.
+Packets are not restricted to a maximum size, although excessively
+large packets in the data stream are discouraged; the Ogg
+bitstream specification strongly recommends nominal page size of
+approximately 4-8kB (large packets are forseen as being useful for
+initialization data at the beginning of a logical bitstream).<p>
+
+After segmenting a packet, the encoder may decide not to place all the
+resulting segments into the current page; to do so, the encoder places
+the lacing values of the segments it wishes to belong to the current
+page into the current segment table, then finishes the page. The next
+page is begun with the first value in the segment table belonging to
+the next packet segment, thus continuing the packet (data in the
+packet body must also correspond properly to the lacing values in the
+spanned pages. The segment data in the first packet corresponding to
+the lacing values of the first page belong in that page; packet
+segments listed in the segment table of the following page must begin
+the page body of the subsequent page).<p>
+
+The last mechanic to spanning a page boundary is to set the header
+flag in the new page to indicate that the first lacing value in the
+segment table continues rather than begins a packet; a header flag of
+0x01 is set to indicate a continued packet. Although mandatory, it
+is not actually algorithmically necessary; one could inspect the
+preceeding segment table to determine if the packet is new or
+continued. Adding the information to the packet_header flag allows a
+simpler design (with no overhead) that needs only inspect the current
+page header after frame capture. This also allows faster error
+recovery in the event that the packet originates in a corrupt
+preceeding page, implying that the previous page's segment table
+cannot be trusted.<p>
+
+Note that a packet can span an arbitrary number of pages; the above
+spanning process is repeated for each spanned page boundary. Also a
+'zero termination' on a packet size that is an even multiple of 255
+must appear even if the lacing value appears in the next page as a
+zero-length continuation of the current packet. The header flag
+should be set to 0x01 to indicate that the packet spanned, even though
+the span is a nil case as far as data is concerned.<p>
+
+The encoding looks odd, but is properly optimized for speed and the
+expected case of the majority of packets being between 50 and 200
+bytes (note that it is designed such that packets of wildly different
+sizes can be handled within the model; placing packet size
+restrictions on the encoder would have only slightly simplified design
+in page generation and increased overall encoder complexity).<p>
+
+The main point behind tracking individual packets (and packet
+segments) is to allow more flexible encoding tricks that requiring
+explicit knowledge of packet size. An example is simple bandwidth
+limiting, implemented by simply truncating packets in the nominal case
+if the packet is arranged so that the least sensitive portion of the
+data comes last.<p>
+
+<h3>Page header</h3>
+
+The headering mechanism is designed to avoid copying and re-assembly
+of the packet data (ie, making the packet segmentation process a
+logical one); the header can be generated directly from incoming
+packet data. The encoder buffers packet data until it finishes a
+complete page at which point it writes the header followed by the
+buffered packet segments.<p>
+
+<h4>capture_pattern</h4>
+
+ A header begins with a capture pattern that simplifies identifying
+ pages; once the decoder has found the capture pattern it can do a more
+ intensive job of verifying that it has in fact found a page boundary
+ (as opposed to an inadvertant coincidence in the byte stream).<p>
+
+<pre><tt>
+ byte value
+
+ 0 0x4f 'O'
+ 1 0x67 'g'
+ 2 0x67 'g'
+ 3 0x53 'S'
+</tt></pre>
+
+<h4>stream_structure_version</h4>
+
+ The capture pattern is followed by the stream structure revision:
+
+<pre><tt>
+ byte value
+
+ 4 0x00
+</tt></pre>
+
+<h4>header_type_flag</h4>
+
+ The header type flag identifies this page's context in the bitstream:
+
+<pre><tt>
+ byte value
+
+ 5 bitflags: 0x01: unset = fresh packet
+ set = continued packet
+ 0x02: unset = not first page of logical bitstream
+ set = first page of logical bitstream (bos)
+ 0x04: unset = not last page of logical bitstream
+ set = last page of logical bitstream (eos)
+</tt></pre>
+
+<h4>PCM absolute position</h4>
+
+ (This is packed in the same way the rest of Ogg data is packed;
+ LSb of LSB first. Note that the 'position' data specifies a 'sample'
+ number (eg, in a CD quality sample is four octets, 16 bits for left
+ and 16 bits for right; in video it would be the frame number). The
+ position specified is the total samples encoded after including all
+ packets finished on this page (packets begun on this page but
+ continuing on to thenext page do not count). The rationale here is
+ that the position specified in the frame header of the last page
+ tells how long the PCM data coded by the bitstream is. A truncated
+ stream will still return the proper number of samples that can be
+ decoded fully.
+
+<pre><tt>
+ byte value
+
+ 6 0xXX LSB
+ 7 0xXX
+ 8 0xXX
+ 9 0xXX
+ 10 0xXX
+ 11 0xXX
+ 12 0xXX
+ 13 0xXX MSB
+</tt></pre>
+
+<h4>stream serial number</h4>
+
+ Ogg allows for seperate logical bitstreams to be mixed at page
+ granularity in a physical bitstream. The most common case would be
+ sequential arrangement, but it is possible to interleave pages for
+ two seperate bitstreams to be decoded concurrently. The serial
+ number is the means by which pages physical pages are associated with
+ a particular logical stream. Each logical stream must have a unique
+ serial number within a physical stream:
+
+<pre><tt>
+ byte value
+
+ 14 0xXX LSB
+ 15 0xXX
+ 16 0xXX
+ 17 0xXX MSB
+</tt></pre>
+
+<h4>page sequence no</h4>
+
+ Page counter; lets us know if a page is lost (useful where packets
+ span page boundaries).
+
+<pre><tt>
+ byte value
+
+ 18 0xXX LSB
+ 19 0xXX
+ 20 0xXX
+ 21 0xXX MSB
+</tt></pre>
+
+<h4>page checksum</h4>
+
+ 32 bit CRC value (direct algorithm, initial val and final XOR = 0,
+ generator polynomial=0x04c11db7). The value is computed over the
+ entire header (with the CRC field in the header set to zero) and then
+ continued over the page. The CRC field is then filled with the
+ computed value.<p>
+
+ (A thorough discussion of CRC algorithms can be found in <a
+ href="ftp://ftp.rocksoft.com/clients/rocksoft/papers/crc_v3.txt">"A
+ Painless Guide to CRC Error Detection Algorithms"</a> by Ross
+ Williams <a
+ href="mailto:ross@guest.adelaide.edu.au">ross@guest.adelaide.edu.au</a>.)
+
+<pre><tt>
+ byte value
+
+ 22 0xXX LSB
+ 23 0xXX
+ 24 0xXX
+ 25 0xXX MSB
+</tt></pre>
+
+<h4>page_segments</h4>
+
+ The number of segment entries to appear in the segment table. The
+ maximum number of 255 segments (255 bytes each) sets the maximum
+ possible physical page size at 65307 bytes or just under 64kB (thus
+ we know that a header corrupted so as destroy sizing/alignment
+ information will not cause a runaway bitstream. We'll read in the
+ page according to the corrupted size information that's guaranteed to
+ be a reasonable size regardless, notice the checksum mismatch, drop
+ sync and then look for recapture).<p>
+
+<pre><tt>
+ byte value
+
+ 26 0x00-0xff (0-255)
+</tt></pre>
+
+<h4>segment_table (containing packet lacing values)</h4>
+
+ The lacing values for each packet segment physically appearing in
+ this page are listed in contiguous order.
+
+<pre><tt>
+ byte value
+
+ 27 0x00-0xff (0-255)
+ [...]
+ n 0x00-0xff (0-255, n=page_segments+26)
+</tt></pre>
+
+Total page size is calculated directly from the known header size and
+lacing values in the segment table. Packet data segments follow
+immediately after the header.<p>
+
+Page headers typically impose a flat .25-.5% space overhead assuming
+nominal ~8k page sizes. The segmentation table needed for exact
+packet recovery in the streaming layer adds approximately .5-1%
+nominal assuming expected encoder behavior in the 44.1kHz, 128kbps
+stereo encodings.<p>
+
+<hr>
+<a href="http://www.xiph.org/">
+<img src="white-xifish.png" align=left border=0>
+</a>
+<font size=-2 color=#505050>
+
+Ogg is a <a href="http://www.xiph.org">Xiphophorus</a> effort to
+protect essential tenets of Internet multimedia from corporate
+hostage-taking; Open Source is the net's greatest tool to keep
+everyone honest. See <a href="http://www.xiph.org/about.html">About
+Xiphophorus</a> for details.
+<p>
+
+Ogg Vorbis is the first Ogg audio CODEC. Anyone may
+freely use and distribute the Ogg and Vorbis specification,
+whether in a private, public or corporate capacity. However,
+Xiphophorus and the Ogg project (xiph.org) reserve the right to set
+the Ogg/Vorbis specification and certify specification compliance.<p>
+
+Xiphophorus's Vorbis software CODEC implementation is distributed
+under the Lessr/Library GNU Public License. This does not restrict
+third parties from distributing independent implementations of Vorbis
+software under other licenses.<p>
+
+OggSquish, Vorbis, Xiphophorus and their logos are trademarks (tm) of
+<a href="http://www.xiph.org/">Xiphophorus</a>. These pages are
+copyright (C) 1994-2000 Xiphophorus. All rights reserved.<p>
+
+</body>
+
+
diff --git a/doc/oggstream.html b/doc/oggstream.html
new file mode 100644
index 00000000..46a221c8
--- /dev/null
+++ b/doc/oggstream.html
@@ -0,0 +1,192 @@
+<HTML><HEAD><TITLE>xiph.org: Ogg Vorbis documentation</TITLE>
+<BODY bgcolor="#ffffff" text="#202020" link="#006666" vlink="#000000">
+<nobr><a href="vorbis.html"><img src="white-ogg.png" border=0><img
+src="vorbisword2.png" border=0></a></nobr><p>
+
+
+<h1><font color=#000070>
+Ogg logical and physical bitstream overview
+</font></h1>
+
+<em>Last update to this document: July 18, 1999</em><br>
+
+<h2>Ogg bitstreams</h2>
+
+Ogg codecs use octet vectors of raw, compressed data
+(<em>packets</em>). These compressed packets do not have any
+high-level structure or boundary information; strung together, they
+appear to be streams of random bytes with no landmarks.<p>
+
+Raw packets may be used directly by transport mechanisms that provide
+their own framing and packet-seperation mechanisms (such as UDP
+datagrams). For stream based storage (such as files) and transport
+(such as TCP streams or pipes), Vorbis and other future Ogg codecs use
+the Ogg bitstream format to provide framing/sync, sync recapture
+after error, landmarks during seeking, and enough information to
+properly seperate data back into packets at the original packet
+boundaries without relying on decoding to find packet boundaries.<p>
+
+<h2>Logical and physical bitstreams</h2>
+
+Raw packets are grouped and encoded into contiguous pages of
+structured bitstream data called <em>logical bitstreams</em>. A
+logical bitstream consists of pages, in order, belonging to a single
+codec instance. Each page is a self contained entity (although it is
+possible that a packet may be split and encoded across one or more
+pages); that is, the page decode mechanism is designed to recognize,
+verify and handle single pages at a time from the overall bitstream.<p>
+
+Multiple logical bitstreams can be combined (with restricctions) into
+a single <em>physical bitstream</em>. A physical bitstream consists
+of multiple logical bitstreams multiplexed at the page level. Whole
+pages are taken in order from multiple logical bitstreams and combined
+into a single physical stream of pages. The decoder reconstructs the
+original logical bitstreams from the physical bitstream by taking the
+pages in order fromt he physical bitstream and redirecting them into
+the appropriate logical decoding entitiy. The simplest physical
+bitstream is a single, unmultiplexed logical bitstream. <p>
+
+<a href=framing.html>Ogg Logical Bitstream Framing</a> discusses
+the page format of an Ogg bitstream, the packet coding process
+and logical bitstreams in detail. The remainder of this document
+specifies requirements for constructing finished, physical Ogg
+bitstreams.<p>
+
+<h2>Mapping Restrictions</h2>
+
+Logical bitstreams may not be mapped/multiplexed into physical
+bitstreams without restriction. Here we discuss design restrictions
+on Ogg physical bitstreams in general, mostly to introduce
+design rationale. Each 'media' format defines its own (generally more
+restrictive) mapping. An '<a href="vorbis-stream.html">Ogg Vorbis
+Audio Bitstream</a>', for example, has a <a
+href="vorbis-stream.html">specific physical bitstream structure</a>.
+An 'Ogg A/V' bitstream (not currently specified) will also mandate a
+specific, restricted physical bitstream format.<p>
+
+<h3>additional end-to-end structure</h3>
+
+The <a href="framing.html">framing specification</a> defines
+'beginning of stream' and 'end of stream' page markers via a header
+flag (it is possible for a stream to consist of a single page). A
+stream always consists of an integer number of pages, an easy
+requirement given the variable size nature of pages.<p>
+
+In addition to the header flag marking the first and last pages of a
+logical bitstream, the first page of an Ogg bitstream obeys
+additional restrictions. Each individual media mapping specifies its
+own implementation details regarding these restrictions.<p>
+
+The first page of a logical Ogg bitstream consists of a single,
+small 'initial header' packet that includes sufficient information to
+identify the exact CODEC type and media requirements of the logical
+bitstream. The intent of this restriction is to simplify identifying
+the bitstream type and content; for a given media type (or across all
+Ogg media types) we can know that we only need a small, fixed
+amount of data to uniquely identify the bitstream type.<p>
+
+As an example, Ogg Vorbis places the name and revision of the Vorbis
+CODEC, the audio rate and the audio quality into this initial header,
+thus simplifying vastly the certain identification of an Ogg Vorbis
+audio bitstream.<p>
+
+<h3>sequential multiplexing (chaining)</h3>
+
+The simplest form of logical bitstream multiplexing is concatenation
+(<em>chaining</em>). Complete logical bitstreams are strung
+one-after-another in order. The bitstreams do not overlap; the final
+page of a given logical bitstream is immediately followed by the
+initial page of the next. Chaining is the only logical->physical
+mapping allowed by Ogg Vorbis.<p>
+
+Each chained logical bitstream must have a unique serial number within
+the scope of the physical bitstream.<p>
+
+<h3>concurrent multiplexing (grouping)</h3>
+
+Logical bitstreams may also be multiplexed 'in parallel'
+(<em>grouped</em>). An example of grouping would be to allow
+streaming of seperate audio and video streams, using differnt codecs
+and different logical bitstreams, in the same physical bitstream.
+Whole pages from multiple logical bitstreams are mixed together.<p>
+
+The initial pages of each logical bitstream must appear first; the
+media mapping specifies the order of the initial pages. For example,
+Ogg A/V will eventually specify an Ogg video bitstream with
+audio. The mapping may specify that the physical bitstream must begin
+with the initial page of a logical video bitstream, followed by the
+initial page of an audio stream. Unlike initial pages, terminal pages
+for the logical bitstreams need not all occur contiguously (although a
+specific media mapping may require this; it is not mandated by the
+generic Ogg stream spec). Terminal pages may be 'nil' pages,
+that is, pages containing no content but simply a page header with
+position information and the 'last page of bitstream' flag set in the
+page header.<p>
+
+Each grouped bitstream must have a unique serial number within the
+scope of the physical bitstream.<p>
+
+<h3>sequential and concurrent multiplexing</h3>
+
+Groups of concurrently multiplexed bitstreams may be chained
+consecutively. Such a physical bitstream obeys all the rules of both
+grouped and chained multiplexed streams; the groups, when unchained ,
+must stand on their own as a valid concurrently multiplexed
+bitstream.<p>
+
+<h3>multiplexing example</h3>
+
+Below, we present an example of a grouped and chained bitstream:<p>
+
+<img src=stream.png><p>
+
+In this example, we see pages from five total logical bitstreams
+multiplexed into a physical bitstream. Note the following
+characteristics:
+
+<ol><li>Grouped bitstreams begin together; all of the initial pages
+must appear before any data pages. When concurrently multiplexed
+groups are chained, the new group does not begin until all the
+bitstreams in the previous group have terminated.<p>
+
+<li>The pages of concurrently multiplexed bitstreams need not conform
+to a regular order; the only requirement is that page <tt>n</tt> of a
+logical bitstream follow page <tt>n-1</tt> in the physical bitstream.
+There are no restrictions on intervening pages belonging to other
+logical bitstreams. (Tying page appearence to bitrate demands is one
+logical strategy, ie, the page appears at the chronological point
+where decode requires more information).
+
+</ol>
+
+<hr>
+<a href="http://www.xiph.org/">
+<img src="white-xifish.png" align=left border=0>
+</a>
+<font size=-2 color=#505050>
+
+Ogg is a <a href="http://www.xiph.org">Xiphophorus</a> effort to
+protect essential tenets of Internet multimedia from corporate
+hostage-taking; Open Source is the net's greatest tool to keep
+everyone honest. See <a href="http://www.xiph.org/about.html">About
+Xiphophorus</a> for details.
+<p>
+
+Ogg Vorbis is the first Ogg audio CODEC. Anyone may
+freely use and distribute the Ogg and Vorbis specification,
+whether in a private, public or corporate capacity. However,
+Xiphophorus and the Ogg project (xiph.org) reserve the right to set
+the Ogg/Vorbis specification and certify specification compliance.<p>
+
+Xiphophorus's Vorbis software CODEC implementation is distributed
+under the Lesser/Library GNU Public License. This does not restrict
+third parties from distributing independent implementations of Vorbis
+software under other licenses.<p>
+
+OggSquish, Vorbis, Xiphophorus and their logos are trademarks (tm) of
+<a href="http://www.xiph.org/">Xiphophorus</a>. These pages are
+copyright (C) 1994-2000 Xiphophorus. All rights reserved.<p>
+
+</body>
+
+
diff --git a/doc/stream.png b/doc/stream.png
new file mode 100644
index 00000000..6e9dca88
--- /dev/null
+++ b/doc/stream.png
Binary files differ