author Timothy B. Terriberry <tterribe@xiph.org> 2012-05-13 22:16:44 -0400
committer Jean-Marc Valin <jmvalin@jmvalin.ca> 2012-05-13 22:16:44 -0400
commit 3fe9cca1fb02d5c29fe2e1521bb88360ef3e27ae
tree ea05391c50fc15b89040ff76a2000f0a2157c352
parent e134dc4785d793a24622d232dfb0cf04f702bb99
Gen-art changes
Diffstat (limited to 'doc')
-rw-r--r-- doc/draft-ietf-codec-opus.xml 252
1 files changed, 154 insertions, 98 deletions
diff --git a/doc/draft-ietf-codec-opus.xml b/doc/draft-ietf-codec-opus.xml
index 34c9aeb5..084af0c0 100644
--- a/doc/draft-ietf-codec-opus.xml
+++ b/doc/draft-ietf-codec-opus.xml
@@ -98,7 +98,7 @@ Only the decoder portion of this software is normative, though a
significant amount of code is shared by both the encoder and decoder.
<xref target="conformance"/> provides a decoder conformance test.
The decoder contains a great deal of integer and fixed-point arithmetic which
- must be performed exactly, including all rounding considerations, so any
+ needs to be performed exactly, including all rounding considerations, so any
useful specification requires domain-specific symbolic language to adequately
define these operations.
Additionally, any
@@ -136,8 +136,8 @@ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
interpreted as described in RFC 2119 <xref target="rfc2119"></xref>.
</t>
<t>
-Even when using floating-point, various operations in the codec require
- bit-exact fixed-point behavior.
+Various operations in the codec require bit-exact fixed-point behavior, even
+ when writing a floating point implementation.
The notation "Q&lt;n&gt;", where n is an integer, denotes the number of binary
digits to the right of the decimal point in a fixed-point number.
For example, a signed Q14 value in a 16-bit word can represent values from
@@ -191,6 +191,41 @@ sign(x) = < 0, x == 0 ,
</t>
</section>
+<section anchor="abs" toc="exclude" title="abs(x)">
+<t>
+The absolute value of x, i.e.,
+<figure align="center">
+<artwork align="center"><![CDATA[
+abs(x) = sign(x)*x .
+]]></artwork>
+</figure>
+</t>
+</section>
+
+<section anchor="floor" toc="exclude" title="floor(f)">
+<t>
+The largest integer z such that z &lt;= f.
+</t>
+</section>
+
+<section anchor="ceil" toc="exclude" title="ceil(f)">
+<t>
+The smallest integer z such that z &gt;= f.
+</t>
+</section>
+
+<section anchor="round" toc="exclude" title="round(f)">
+<t>
+The integer z nearest to f, with ties rounded towards negative infinity,
+ i.e.,
+<figure align="center">
+<artwork align="center"><![CDATA[
+ round(f) = ceil(f - 0.5) .
+]]></artwork>
+</figure>
+</t>
+</section>
+
<section anchor="log2" toc="exclude" title="log2(f)">
<t>
The base-two logarithm of f.
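The abs(), floor(), ceil(), and round() definitions added in this hunk can be checked with a small C sketch. It assumes the argument fits in a long long (the casts truncate toward zero, and the helpers correct that), and the names floor_ll, ceil_ll, round_ll, sign_i, and abs_ll are hypothetical, not taken from the reference implementation. Note that round(f) as defined here differs from C99's round(), which rounds ties away from zero: round(0.5) is 0 under this definition but 1.0 in C.

```c
#include <assert.h>
#include <limits.h>

/* floor(f): largest integer z with z <= f. Assumes f fits in a long long. */
static long long floor_ll(double f)
{
    long long z;
    assert(f > (double)LLONG_MIN && f < (double)LLONG_MAX);
    z = (long long)f;              /* truncates toward zero */
    return (z > f) ? z - 1 : z;    /* negative non-integers were rounded up */
}

/* ceil(f): smallest integer z with z >= f. Assumes f fits in a long long. */
static long long ceil_ll(double f)
{
    long long z;
    assert(f > (double)LLONG_MIN && f < (double)LLONG_MAX);
    z = (long long)f;              /* truncates toward zero */
    return (z < f) ? z + 1 : z;    /* positive non-integers were rounded down */
}

/* round(f) = ceil(f - 0.5): nearest integer, ties toward negative infinity. */
static long long round_ll(double f)
{
    return ceil_ll(f - 0.5);
}

/* sign(x) is -1, 0, or 1; abs(x) = sign(x)*x. */
static int sign_i(long long x) { return (x > 0) - (x < 0); }
static long long abs_ll(long long x) { return sign_i(x) * x; }
```

For example, round_ll(1.5) is 1 and round_ll(-0.5) is -1, since both ties move toward negative infinity.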
@@ -221,12 +256,6 @@ Examples:
</t>
</section>
-<section anchor="floor" toc="exclude" title="floor(x)">
-<t>
-Largest integer z such that z &lt;= x.
-</t>
-</section>
-
</section>
</section>
@@ -312,10 +341,9 @@ On the other hand, non-speech signals are not always adequately coded using
<t>
A "Hybrid" mode allows the use of both layers simultaneously with a frame size
of 10&nbsp;or 20&nbsp;ms and a SWB or FB audio bandwidth.
-Each frame is split into a low frequency signal and a high frequency signal,
- with a cutoff of 8&nbsp;kHz.
-The LP layer then codes the low frequency signal, followed by the MDCT layer
- coding the high frequency signal.
+The LP layer codes the low frequencies by resampling the signal down to WB.
+The MDCT layer follows, coding the high frequency portion of the signal.
+The cutoff between the two lies at 8&nbsp;kHz, the maximum WB audio bandwidth.
In the MDCT layer, all bands below 8&nbsp;kHz are discarded, so there is no
coding redundancy between the two layers.
</t>
@@ -528,6 +556,10 @@ Support for that variant is OPTIONAL.
All bit diagrams in this document number the bits so that bit 0 is the most
significant bit of the first byte, and bit 7 is the least significant.
Bit 8 is thus the most significant bit of the second byte, etc.
+Well-formed Opus packets obey certain requirements, marked [R1] through [R7]
+ below.
+These are summarized in <xref target="malformed-packets"/> along with
+ appropriate means of handling malformed packets.
</t>
<section anchor="toc_byte" title="The TOC Byte">
@@ -606,9 +638,10 @@ This draft refers to a packet as a code 0 packet, code 1 packet, etc., based on
the value of "c".
</t>
-<t>
+<t anchor="R1">
A well-formed Opus packet MUST contain at least one byte with the TOC
- information, though the frame(s) within a packet MAY be zero bytes long.
+ information&nbsp;[R1], though the frame(s) within a packet MAY be zero bytes
+ long.
</t>
</section>
@@ -649,12 +682,13 @@ It is also roughly the maximum useful rate of the MDCT layer, as shortly
on the codebook sizes.
</t>
-<t>
+<t anchor="R2">
No length is transmitted for the last frame in a VBR packet, or for any of the
frames in a CBR packet, as it can be inferred from the total size of the
packet and the size of all other data in the packet.
-However, the length of any individual frame MUST NOT exceed 1275&nbsp;bytes, to
- allow for repacketization by gateways, conference bridges, or other software.
+However, the length of any individual frame MUST NOT exceed
+ 1275&nbsp;bytes&nbsp;[R2], to allow for repacketization by gateways,
+ conference bridges, or other software.
</t>
</section>
@@ -681,13 +715,13 @@ For code&nbsp;0 packets, the TOC byte is immediately followed by N-1&nbsp;bytes
</section>
<section title="Code 1: Two Frames in the Packet, Each with Equal Compressed Size">
-<t>
+<t anchor="R3">
For code 1 packets, the TOC byte is immediately followed by the
(N-1)/2&nbsp;bytes of compressed data for the first frame, followed by
(N-1)/2&nbsp;bytes of compressed data for the second frame, as illustrated in
<xref target="code1_packet"/>.
The number of payload bytes available for compressed data, N-1, MUST be even
- for all code 1 packets.
+ for all code 1 packets&nbsp;[R3].
</t>
<figure anchor="code1_packet" title="A Code 1 Packet" align="center">
<artwork align="center"><![CDATA[
@@ -709,7 +743,7 @@ The number of payload bytes available for compressed data, N-1, MUST be even
</section>
<section title="Code 2: Two Frames in the Packet, with Different Compressed Sizes">
-<t>
+<t anchor="R4">
For code 2 packets, the TOC byte is followed by a one- or two-byte sequence
indicating the length of the first frame (marked N1 in <xref target='code2_packet'/>),
followed by N1 bytes of compressed data for the first frame.
@@ -720,7 +754,7 @@ A code 2 packet MUST contain enough bytes to represent a valid length.
For example, a 1-byte code 2 packet is always invalid, and a 2-byte code 2
packet whose second byte is in the range 252...255 is also invalid.
The length of the first frame, N1, MUST also be no larger than the size of the
- payload remaining after decoding that length for all code 2 packets.
+ payload remaining after decoding that length for all code 2 packets&nbsp;[R4].
This makes, for example, a 2-byte code 2 packet with a second byte in the range
1...251 invalid as well (the only valid 2-byte code 2 packet is one where the
length of both frames is zero).
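A minimal C sketch of the code 2 checks ([R1], [R2], [R4]) follows. It assumes the one- or two-byte frame length coding referred to above (a first byte of 0...251 is the length itself; a first byte of 252...255 requires a second byte, and the length is 4*second_byte + first_byte) and that the code "c" occupies the two least significant bits of the TOC byte (bits 6 and 7 in this document's bit numbering). The function names read_length() and parse_code2() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAME_BYTES 1275  /* limit on any frame length, [R2] */

/* Decodes the one- or two-byte frame length at p, with avail bytes left.
   Returns the number of bytes consumed, or -1 if the length is truncated. */
static int read_length(const unsigned char *p, size_t avail, int *len)
{
    if (avail < 1) return -1;
    if (p[0] < 252) { *len = p[0]; return 1; }   /* 0...251: the length itself */
    if (avail < 2) return -1;                    /* 252...255 need a second byte */
    *len = 4 * p[1] + p[0];
    return 2;
}

/* Validates a code 2 packet of n bytes (TOC included) and fills sizes[0..1].
   Returns 0 on success or -1 if the packet is malformed. */
static int parse_code2(const unsigned char *pkt, size_t n, int sizes[2])
{
    size_t remaining;
    int n1, used;

    assert(pkt != NULL);
    if (n < 1 || (pkt[0] & 0x3) != 2) return -1;        /* [R1]; c must be 2 */
    used = read_length(pkt + 1, n - 1, &n1);
    if (used < 0) return -1;                             /* no valid length [R4] */
    remaining = n - 1 - (size_t)used;
    if ((size_t)n1 > remaining) return -1;               /* N1 too large [R4] */
    if (remaining - (size_t)n1 > MAX_FRAME_BYTES) return -1;  /* implicit length [R2] */
    sizes[0] = n1;
    sizes[1] = (int)(remaining - (size_t)n1);
    return 0;
}
```

Consistent with the text, the only 2-byte packet this accepts is {0x02, 0x00}, where both frames are zero bytes long.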
@@ -745,17 +779,17 @@ This makes, for example, a 2-byte code 2 packet with a second byte in the range
</section>
<section title="Code 3: A Signaled Number of Frames in the Packet">
-<t>
+<t anchor="R5">
Code 3 packets signal the number of frames, as well as additional
padding, called "Opus padding" to indicate that this padding is added at the
Opus layer, rather than at the transport layer.
-Code 3 packets MUST have at least 2 bytes.
+Code 3 packets MUST have at least 2 bytes&nbsp;[R6,R7].
The TOC byte is followed by a byte encoding the number of frames in the packet
in bits 2 to 7 (marked "M" in <xref target='frame_count_byte'/>), with bit 1 indicating whether
or not Opus padding is inserted (marked "p" in <xref target='frame_count_byte'/>), and bit 0
indicating VBR (marked "v" in <xref target='frame_count_byte'/>).
M MUST NOT be zero, and the audio duration contained within a packet MUST NOT
- exceed 120&nbsp;ms.
+ exceed 120&nbsp;ms&nbsp;[R5].
This limits the maximum frame count for any frame size to 48 (for 2.5&nbsp;ms
frames), with lower limits for longer frame sizes.
<xref target="frame_count_byte"/> illustrates the layout of the frame count
@@ -777,7 +811,7 @@ Values from 0...254 indicate that 0...254&nbsp;bytes of padding are included,
in addition to the byte(s) used to indicate the size of the padding.
If the value is 255, then the size of the additional padding is 254&nbsp;bytes,
plus the padding value encoded in the next byte.
-There MUST be at least one more byte in the packet in this case.
+There MUST be at least one more byte in the packet in this case&nbsp;[R6,R7].
The additional padding bytes appear at the end of the packet, and MUST be set
to zero by the encoder to avoid creating a covert channel.
The decoder MUST accept any value for the padding bytes, however.
@@ -795,17 +829,17 @@ To add 256 bytes to a packet, set the padding bit to 1, insert two bytes after
By using the value 255 multiple times, it is possible to create a packet of any
specific, desired size.
Let P be the number of header bytes used to indicate the padding size plus the
- total amount of padding bytes (i.e., the total number of bytes added to the
- packet).
-Then P MUST be no more than N-2.
+ number of padding bytes themselves (i.e., P is the total number of bytes added
+ to the packet).
+Then P MUST be no more than N-2&nbsp;[R6,R7].
</t>
-<t>
-In the CBR case, the compressed length of each frame in bytes is equal to the
- number of remaining bytes R in the packet after subtracting the (optional)
- padding, (R=N-2-P), divided by M.
-The value R MUST be a non-negative integer multiple of M.
-The compressed data for all M frames then follows, each of size
- (N-2-P)/M&nbsp;bytes, as illustrated in <xref target="code3cbr_packet"/>.
+<t anchor="R6">
+In the CBR case, let R=N-2-P be the number of bytes remaining in the packet
+ after subtracting the (optional) padding.
+Then the compressed length of each frame in bytes is equal to R/M.
+The value R MUST be a non-negative integer multiple of M&nbsp;[R6].
+The compressed data for all M frames follows, each of size
+ R/M&nbsp;bytes, as illustrated in <xref target="code3cbr_packet"/>.
</t>
<figure anchor="code3cbr_packet" title="A CBR Code 3 Packet" align="center">
@@ -816,11 +850,11 @@ The compressed data for all M frames then follows, each of size
| config |s|1|1|0|p| M | Padding length (Optional) :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
-: Compressed frame 1 ((N-2-P)/M bytes)... :
+: Compressed frame 1 (R/M bytes)... :
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
-: Compressed frame 2 ((N-2-P)/M bytes)... :
+: Compressed frame 2 (R/M bytes)... :
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
@@ -828,7 +862,7 @@ The compressed data for all M frames then follows, each of size
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
-: Compressed frame M ((N-2-P)/M bytes)... :
+: Compressed frame M (R/M bytes)... :
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: Opus Padding (Optional)... |
@@ -836,13 +870,13 @@ The compressed data for all M frames then follows, each of size
]]></artwork>
</figure>
-<t>
+<t anchor="R7">
In the VBR case, the (optional) padding length is followed by M-1 frame
lengths (indicated by "N1" to "N[M-1]" in <xref target='code3vbr_packet'/>), each encoded in a
one- or two-byte sequence as described above.
The packet MUST contain enough data for the M-1 lengths after removing the
(optional) padding, and the sum of these lengths MUST be no larger than the
- number of bytes remaining in the packet after decoding them.
+ number of bytes remaining in the packet after decoding them&nbsp;[R7].
The compressed data for all M frames follows, each frame consisting of the
indicated number of bytes, with the final frame consuming any remaining bytes
before the final padding, as illustrated in <xref target="code3cbr_packet"/>.
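The code 3 rules [R5] through [R7], together with [R2] for the implicit frame lengths, can be sketched in C as below. This is an illustrative parser, not the reference implementation. parse_code3() and read_length() are hypothetical names. The per-frame duration is passed in as a count of 48 kHz samples (120 ms is 5760 samples) because the TOC config mapping is outside this excerpt. The frame count byte is read with v, p, and M in bits 0, 1, and 2...7, as described above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAME_BYTES 1275         /* [R2] */
#define MAX_PACKET_SAMPLES_48K 5760  /* 120 ms at 48 kHz, [R5] */
#define MAX_FRAMES 48                /* 120 ms of 2.5 ms frames */

/* One- or two-byte frame length; returns bytes consumed or -1. */
static int read_length(const unsigned char *p, size_t avail, int *len)
{
    if (avail < 1) return -1;
    if (p[0] < 252) { *len = p[0]; return 1; }
    if (avail < 2) return -1;
    *len = 4 * p[1] + p[0];
    return 2;
}

/* Parses a code 3 packet of n bytes. Returns M and fills sizes[0..M-1],
   or returns -1 if the packet is malformed. */
static int parse_code3(const unsigned char *pkt, size_t n,
                       int frame_samples_48k, int sizes[MAX_FRAMES])
{
    size_t pos = 2;   /* first byte after the TOC and frame count bytes */
    size_t pad = 0;   /* number of trailing padding bytes */
    size_t p_total, r;
    int m, i;

    assert(pkt != NULL && frame_samples_48k >= 120);   /* >= 2.5 ms frames */
    if (n < 2 || (pkt[0] & 0x3) != 3) return -1;       /* at least 2 bytes */
    m = pkt[1] & 0x3F;                                 /* M: bits 2...7 */
    if (m == 0 || m * frame_samples_48k > MAX_PACKET_SAMPLES_48K)
        return -1;                                     /* [R5] */
    if (pkt[1] & 0x40) {                               /* p: Opus padding */
        int b;
        do {
            if (pos >= n) return -1;                   /* [R6,R7] */
            b = pkt[pos++];
            pad += (b == 255) ? 254 : (size_t)b;       /* 255: 254 more, continue */
        } while (b == 255);
    }
    p_total = (pos - 2) + pad;                         /* P */
    if (p_total > n - 2) return -1;                    /* P <= N-2 [R6,R7] */
    r = n - 2 - p_total;                               /* R = N-2-P */

    if (!(pkt[1] & 0x80)) {                            /* v == 0: CBR */
        if (r % (size_t)m != 0 || r / (size_t)m > MAX_FRAME_BYTES)
            return -1;                                 /* [R6], [R2] */
        for (i = 0; i < m; i++) sizes[i] = (int)(r / (size_t)m);
        return m;
    }
    /* v == 1: VBR. Here r covers the M-1 length fields plus all frame data. */
    for (i = 0; i < m - 1; i++) {
        int used = read_length(pkt + pos, r, &sizes[i]);
        if (used < 0) return -1;                       /* lengths truncated [R7] */
        pos += (size_t)used;
        r -= (size_t)used;
    }
    for (i = 0; i < m - 1; i++) {
        if ((size_t)sizes[i] > r) return -1;           /* sum too large [R7] */
        r -= (size_t)sizes[i];
    }
    if (r > MAX_FRAME_BYTES) return -1;                /* last frame [R2] */
    sizes[m - 1] = (int)r;
    return m;
}
```

Reading the padding size before the VBR lengths follows the layout described above: the optional padding length comes first, and the padding bytes themselves are excluded from R before any frame sizes are derived.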
@@ -944,7 +978,7 @@ Four FB stereo 20&nbsp;ms CELT frames of the same compressed size:
</figure>
</section>
-<section title="Receiving Malformed Packets">
+<section anchor="malformed-packets" title="Receiving Malformed Packets">
<t>
A receiver MUST NOT process packets which violate any of the rules above as
normal Opus packets.
@@ -956,15 +990,16 @@ Packets which violate these constraints may cause implementations of
</t>
<t>
These constraints are summarized here for reference:
-<list style="symbols">
+<list style="format [R%d]">
<t>Packets are at least one byte.</t>
<t>No implicit frame length is larger than 1275 bytes.</t>
<t>Code 1 packets have an odd total length, N, so that (N-1)/2 is an
integer.</t>
-<t>Code 2 packets have enough bytes after the TOC for a valid frame length, and
- that length is no larger than the number of bytes remaining in the packet.</t>
-<t>Code 3 packets contain at least one frame, but no more than 120&nbsp;ms of
- audio total.</t>
+<t>Code 2 packets have enough bytes after the TOC for a valid frame
+ length, and that length is no larger than the number of bytes remaining in the
+ packet.</t>
+<t>Code 3 packets contain at least one frame, but no more than 120&nbsp;ms
+ of audio total.</t>
<t>The length of a CBR code 3 packet, N, is at least two bytes, the number of
bytes added to indicate the padding size plus the trailing padding bytes
themselves, P, is no more than N-2, and the frame count, M, satisfies
@@ -1078,14 +1113,22 @@ The range decoder maintains an internal state vector composed of the two-tuple
current range and the actual coded value, minus one, and the size of the
current range, respectively.
Both val and rng are 32-bit unsigned integer values.
-The decoder initializes rng to 128 and initializes val to 127 minus the top 7
- bits of the first input octet.
-It saves the remaining bit for use in the renormalization procedure described
- in <xref target="range-decoder-renorm"/>, which the decoder invokes
- immediately after initialization to read additional bits and establish the
- invariant that rng&nbsp;&gt;&nbsp;2**23.
</t>
+<section anchor="range-decoder-init" title="Range Decoder Initialization">
+<t>
+Let b0 be the first input octet (or zero if there are no octets in this Opus
+ frame).
+The decoder initializes rng to 128 and initializes val to
+ (127&nbsp;-&nbsp;(b0&gt;&gt;1)), where (b0&gt;&gt;1) is the top 7 bits of the
+ first input octet.
+It saves the remaining bit, (b0&amp;1), for use in the renormalization
+ procedure described in <xref target="range-decoder-renorm"/>, which the
+ decoder invokes immediately after initialization to read additional bits and
+ establish the invariant that rng&nbsp;&gt;&nbsp;2**23.
+</t>
+</section>
+
<section anchor="decoding-symbols" title="Decoding Symbols">
<t>
Decoding a symbol is a two-step process.
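The initialization in the new "Range Decoder Initialization" section, together with the renormalization loop described further down, can be sketched in C as follows. The state layout and the names range_dec, read_octet(), dec_normalize(), and dec_init() are hypothetical. The val update inside the loop, ((val<<8) + (255-sym)) & 0x7FFFFFFF, is the formula from the renormalization figure, which this excerpt cuts off, and is assumed here to match ec_dec_normalize() in entdec.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const unsigned char *buf;  /* octets of the Opus frame */
    size_t len, pos;
    uint32_t rng, val;
    int rem;                   /* left-over low bit of the last octet read */
} range_dec;

/* Next octet of the frame, or zero once the input is exhausted. */
static int read_octet(range_dec *d)
{
    return d->pos < d->len ? d->buf[d->pos++] : 0;
}

/* Repeats until rng > 2**23; skipped entirely if that already holds. */
static void dec_normalize(range_dec *d)
{
    while (d->rng <= (1u << 23)) {
        int octet, sym;
        d->rng <<= 8;
        octet = read_octet(d);
        /* buffered bit becomes bit 7, top 7 bits of the new octet fill the rest */
        sym = ((d->rem << 7) | (octet >> 1)) & 0xFF;
        d->rem = octet & 1;    /* buffer the new low bit for the next pass */
        d->val = ((d->val << 8) + (uint32_t)(255 - sym)) & 0x7FFFFFFFu;
    }
}

static void dec_init(range_dec *d, const unsigned char *buf, size_t len)
{
    int b0;
    assert(buf != NULL || len == 0);
    d->buf = buf;
    d->len = len;
    d->pos = 0;
    b0 = read_octet(d);                  /* zero if the frame is empty */
    d->rng = 128;
    d->val = 127 - (uint32_t)(b0 >> 1);  /* top 7 bits of the first octet */
    d->rem = b0 & 1;                     /* saved for renormalization */
    dec_normalize(d);
}
```

With an empty frame every octet read is zero, so three passes of the loop leave rng at 2**31 and val at 2**31 - 1.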
@@ -1103,7 +1146,7 @@ fs = ft - min(------ + 1, ft) .
rng/ft
]]></artwork>
</figure>
-The divisions here are exact integer division.
+The divisions here are integer division.
</t>
<t>
The decoder then identifies the symbol in the current context corresponding to
@@ -1159,13 +1202,14 @@ To normalize the range, the decoder repeats the following process, implemented
by ec_dec_normalize() (entdec.c), until rng&nbsp;&gt;&nbsp;2**23.
If rng is already greater than 2**23, the entire process is skipped.
First, it sets rng to (rng&lt;&lt;8).
-Then it reads the next octet of the payload and combines it with the left-over
- bit buffered from the previous octet to form the 8-bit value sym.
-It takes the left-over bit as the high bit (bit 7) of sym, and the top 7 bits
- of the octet it just read as the other 7 bits of sym.
+Then it reads the next octet of the Opus frame and forms an 8-bit value sym,
+ using the left-over bit buffered from the previous octet as the high bit
+ and the top 7 bits of the octet just read as the other 7 bits of sym.
The remaining bit in the octet just read is buffered for use in the next
iteration.
If no more input octets remain, it uses zero bits instead.
+See <xref target="range-decoder-init"/> for the initialization used to process
+ the first octet.
Then, it sets
<figure align="center">
<artwork align="center"><![CDATA[
@@ -1771,6 +1815,8 @@ In order to properly produce LBRR frames under all conditions, an encoder might
transitions.
However, the reference implementation opts to disable LBRR frames at the
transition point for simplicity.
+Since transitions are relatively infrequent in normal usage, this does not have
+ a significant impact on packet loss robustness.
</t>
<t>
@@ -1849,11 +1895,11 @@ The quantized excitation signal (see <xref target="silk_excitation"/>) follows
<c><xref target="silk_gains"/></c>
<c/>
-<c>Normalized LSF Stage 1 Index</c>
+<c>Normalized LSF Stage-1 Index</c>
<c><xref target="silk_nlsf_stage1_pdfs"/></c>
<c/>
-<c>Normalized LSF Stage 2 Residual</c>
+<c>Normalized LSF Stage-2 Residual</c>
<c><xref target="silk_nlsf_stage2"/></c>
<c/>
@@ -1978,7 +2024,7 @@ wi0 = i0 + 3*(n/5)
wi1 = i2 + 3*(n%5)
]]></artwork>
</figure>
- where the division is exact integer division.
+ where the division is integer division.
The range of these indices is 0 to 14, inclusive.
Let w[i] be the i'th weight from <xref target="silk_stereo_weights_table"/>.
Then the two prediction weights, w0_Q13 and w1_Q13, are
@@ -1994,6 +2040,9 @@ w0_Q13 = w_Q13[wi0]
</figure>
N.b., w1_Q13 is computed first here, because w0_Q13 depends on it.
The constant 6554 is approximately 0.1 in Q16.
+Although wi0 and wi1 only have 15 possible values,
+ <xref target="silk_stereo_weights_table"/> contains 16 entries to allow
+ interpolation between entry wi0 and (wi0&nbsp;+&nbsp;1) (and likewise for wi1).
</t>
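As a sketch of the interpolation described above, the C code below computes w1_Q13 and then w0_Q13. Neither the weight formula figure nor the table body appears in this diff, so two things are assumptions here, taken from the published RFC 6716 and libopus (silk_stereo_pred_quant_Q13): the 16 table values, and the step rule, which adds (2*i+1) steps of 0.1 (6554 in Q16) of the gap between entries wi and wi+1, with i1 and i3 the 5-way sub-step indices. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed contents of silk_stereo_weights_table (Q13). The 16th entry exists
   so that wi + 1 is a valid index for every wi in 0...14. */
static const int32_t w_Q13[16] = {
    -13732, -10050, -8266, -7526, -6500, -5000, -2950, -820,
       820,   2950,  5000,  6500,  7526,  8266, 10050, 13732
};

/* Entry wi plus (2*sub + 1) steps of 0.1 times the gap to entry wi + 1. */
static int32_t interp_weight_Q13(int wi, int sub)
{
    int32_t step_Q13 = ((w_Q13[wi + 1] - w_Q13[wi]) * 6554) >> 16;
    return w_Q13[wi] + step_Q13 * (2 * sub + 1);
}

/* n: 0...24 (joint index); i0, i2: 0...2; i1, i3: 0...4. */
static void stereo_weights(int n, int i0, int i1, int i2, int i3,
                           int32_t *w0_Q13, int32_t *w1_Q13)
{
    int wi0, wi1;
    assert(n >= 0 && n < 25 && i0 >= 0 && i0 < 3 && i2 >= 0 && i2 < 3);
    assert(i1 >= 0 && i1 < 5 && i3 >= 0 && i3 < 5);
    wi0 = i0 + 3 * (n / 5);                          /* 0...14 */
    wi1 = i2 + 3 * (n % 5);                          /* 0...14 */
    *w1_Q13 = interp_weight_Q13(wi1, i3);            /* computed first... */
    *w0_Q13 = interp_weight_Q13(wi0, i1) - *w1_Q13;  /* ...since w0 depends on it */
}
```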
<texttable anchor="silk_stereo_weights_table"
@@ -2064,6 +2113,7 @@ In that case, if this flag is zero (indicating that there should be a side
channel), then Packet Loss Concealment (PLC, see
<xref target="Packet Loss Concealment"/>) SHOULD be invoked to recover a
side channel signal.
+Otherwise, the stereo image will collapse.
</t>
<texttable anchor="silk_mid_only_pdf" title="Mid-only Flag PDF">
@@ -2171,7 +2221,7 @@ The 3 least significant bits are decoded using a uniform PDF:
</texttable>
<t>
-These 6 bits are combined to form a gain index between 0 and 63.
+These 6 bits are combined to form a value, gain_index, between 0 and 63.
When the gain for the previous subframe is available, then the current gain is
limited as follows:
<figure align="center">
@@ -2182,11 +2232,10 @@ log_gain = max(gain_index, previous_log_gain - 16) .
This may help some implementations limit the change in precision of their
internal LTP history.
The indices which this clamp applies to cannot simply be removed from the
- codebook, because the previous gain index will not be available after packet
- loss.
-This step is skipped after a decoder reset, and in the side channel if the
- previous frame in the side channel was not coded, since there is no previous
- gain index.
+ codebook, because previous_log_gain will not be available after packet loss.
+The clamping is skipped after a decoder reset, and in the side channel if the
+ previous frame in the side channel was not coded, since there is no value for
+ previous_log_gain available.
It MAY also be skipped after packet loss.
</t>
@@ -2195,7 +2244,7 @@ For subframes which do not have an independent gain (including the first
subframe of frames not listed as using independent coding above), the
quantization gain is coded relative to the gain from the previous subframe (in
the same channel).
-The PDF in <xref target="silk_delta_gain_pdf"/> yields a delta gain index
+The PDF in <xref target="silk_delta_gain_pdf"/> yields a delta_gain_index value
between 0 and 40, inclusive.
</t>
<texttable anchor="silk_delta_gain_pdf"
@@ -2212,8 +2261,8 @@ The following formula translates this index into a quantization gain for the
current subframe using the gain from the previous subframe:
<figure align="center">
<artwork align="center"><![CDATA[
-log_gain = clamp(0, max(2*gain_index - 16,
- previous_log_gain + gain_index - 4), 63) .
+log_gain = clamp(0, max(2*delta_gain_index - 16,
+ previous_log_gain + delta_gain_index - 4), 63) .
]]></artwork>
</figure>
</t>
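The two gain rules above can be sketched in C as follows. The sketch assumes the 3 most significant bits and the 3 uniformly coded least significant bits combine as gain_index = (msbs<<3) | lsbs. The helper names are hypothetical. The have_prev flag stands in for the cases where the clamp is skipped: after a decoder reset, when the previous side-channel frame was not coded, and optionally after packet loss.

```c
#include <assert.h>
#include <stdbool.h>

static int max_i(int a, int b) { return a > b ? a : b; }
static int clamp_i(int lo, int x, int hi) { return x < lo ? lo : (x > hi ? hi : x); }

/* Independently coded gain: log_gain = max(gain_index, previous_log_gain - 16). */
static int independent_log_gain(int msbs, int lsbs, bool have_prev,
                                int previous_log_gain)
{
    int gain_index;
    assert(msbs >= 0 && msbs < 8 && lsbs >= 0 && lsbs < 8);
    gain_index = (msbs << 3) | lsbs;          /* 0...63 */
    if (!have_prev) return gain_index;        /* no previous_log_gain to clamp to */
    return max_i(gain_index, previous_log_gain - 16);
}

/* Delta-coded gain relative to the previous subframe in the same channel. */
static int delta_log_gain(int delta_gain_index, int previous_log_gain)
{
    assert(delta_gain_index >= 0 && delta_gain_index <= 40);
    return clamp_i(0, max_i(2 * delta_gain_index - 16,
                            previous_log_gain + delta_gain_index - 4), 63);
}
```

For example, a delta_gain_index of 0 lowers the gain by 4 relative to previous_log_gain, while large indices grow roughly twice as fast as small ones because of the 2*delta_gain_index - 16 term.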
@@ -2251,10 +2300,10 @@ A set of normalized Line Spectral Frequency (LSF) coefficients follow the
Coding (LPC) coefficients for the current SILK frame.
Once decoded, the normalized LSFs form an increasing list of Q15 values between
0 and 1.
-These represent the interleaved zeros on the unit circle between 0 and pi
- (hence "normalized") in the standard decomposition of the LPC filter into a
- symmetric part and an anti-symmetric part (P and Q in
- <xref target="silk_nlsf2lpc"/>).
+These represent the interleaved zeros on the upper half of the unit circle
+ (between 0 and pi, hence "normalized") in the standard decomposition
+ <xref target="line-spectral-pairs"/> of the LPC filter into a symmetric part
+ and an anti-symmetric part (P and Q in <xref target="silk_nlsf2lpc"/>).
Because of non-linear effects in the decoding process, an implementation SHOULD
match the fixed-point arithmetic described in this section exactly.
An encoder SHOULD also use the same process.
@@ -2275,7 +2324,7 @@ After reconstructing the normalized LSFs
All of this is necessary to ensure the reconstruction process is stable.
</t>
-<section anchor="silk_nlsf_stage1" title="Stage 1 Normalized LSF Decoding">
+<section anchor="silk_nlsf_stage1" title="Normalized LSF Stage 1 Decoding">
<t>
The first VQ stage uses a 32-element codebook, coded with one of the PDFs in
<xref target="silk_nlsf_stage1_pdfs"/>, depending on the audio bandwidth and
@@ -2291,7 +2340,7 @@ The actual codebook elements are listed in
</t>
<texttable anchor="silk_nlsf_stage1_pdfs"
- title="PDFs for Normalized LSF Index Stage-1 Decoding">
+ title="PDFs for Normalized LSF Stage-1 Index Decoding">
<ttcol align="left">Audio Bandwidth</ttcol>
<ttcol align="left">Signal Type</ttcol>
<ttcol align="left">PDF</ttcol>
@@ -2327,7 +2376,7 @@ The actual codebook elements are listed in
</section>
-<section anchor="silk_nlsf_stage2" title="Stage 2 Normalized LSF Decoding">
+<section anchor="silk_nlsf_stage2" title="Normalized LSF Stage 2 Decoding">
<t>
A total of 16 PDFs are available for the LSF residual in the second stage: the
8 (a...h) for NB and MB frames given in
@@ -2341,7 +2390,7 @@ Which PDF is used for which coefficient is driven by the index, I1,
</t>
<texttable anchor="silk_nlsf_stage2_nbmb_pdfs"
- title="PDFs for NB/MB Normalized LSF Index Stage-2 Decoding">
+ title="PDFs for NB/MB Normalized LSF Stage-2 Index Decoding">
<ttcol align="left">Codebook</ttcol>
<ttcol align="left">PDF</ttcol>
<c>a</c> <c>{1, 1, 1, 15, 224, 11, 1, 1, 1}/256</c>
@@ -2355,7 +2404,7 @@ Which PDF is used for which coefficient is driven by the index, I1,
</texttable>
<texttable anchor="silk_nlsf_stage2_wb_pdfs"
- title="PDFs for WB Normalized LSF Index Stage-2 Decoding">
+ title="PDFs for WB Normalized LSF Stage-2 Index Decoding">
<ttcol align="left">Codebook</ttcol>
<ttcol align="left">PDF</ttcol>
<c>i</c> <c>{1, 1, 1, 9, 232, 9, 1, 1, 1}/256</c>
@@ -2369,7 +2418,7 @@ Which PDF is used for which coefficient is driven by the index, I1,
</texttable>
<texttable anchor="silk_nlsf_nbmb_stage2_cb_sel"
- title="Codebook Selection for NB/MB Normalized LSF Index Stage 2 Decoding">
+ title="Codebook Selection for NB/MB Normalized LSF Stage-2 Index Decoding">
<ttcol>I1</ttcol>
<ttcol>Coefficient</ttcol>
<c/>
@@ -2441,7 +2490,7 @@ Which PDF is used for which coefficient is driven by the index, I1,
</texttable>
<texttable anchor="silk_nlsf_wb_stage2_cb_sel"
- title="Codebook Selection for WB Normalized LSF Index Stage 2 Decoding">
+ title="Codebook Selection for WB Normalized LSF Stage-2 Index Decoding">
<ttcol>I1</ttcol>
<ttcol>Coefficient</ttcol>
<c/>
@@ -2763,7 +2812,7 @@ w2_Q18[k] = (1024/(cb1_Q8[k] - cb1_Q8[k-1])
</artwork>
</figure>
where cb1_Q8[-1]&nbsp;=&nbsp;0 and cb1_Q8[d_LPC]&nbsp;=&nbsp;256, and the
- division is exact integer division.
+ division is integer division.
This is reduced to an unsquared, Q9 value using the following square-root
approximation:
<figure align="center">
@@ -2786,7 +2835,7 @@ The reference implementation already requires code to compute these weights on
</t>
<texttable anchor="silk_nlsf_nbmb_codebook"
- title="Codebook Vectors for NB/MB Normalized LSF Stage 1 Decoding">
+ title="NB/MB Normalized LSF Stage-1 Codebook Vectors">
<ttcol>I1</ttcol>
<ttcol>Codebook (Q8)</ttcol>
<c/>
@@ -2858,7 +2907,7 @@ The reference implementation already requires code to compute these weights on
</texttable>
<texttable anchor="silk_nlsf_wb_codebook"
- title="Codebook Vectors for WB Normalized LSF Stage 1 Decoding">
+ title="WB Normalized LSF Stage-1 Codebook Vectors">
<ttcol>I1</ttcol>
<ttcol>Codebook (Q8)</ttcol>
<c/>
@@ -2939,7 +2988,7 @@ NLSF_Q15[k] = clamp(0,
(cb1_Q8[k]<<7) + (res_Q10[k]<<14)/w_Q9[k], 32767) ,
]]></artwork>
</figure>
- where the division is exact integer division.
+ where the division is integer division.
However, nothing in either the reconstruction process or the
quantization process in the encoder thus far guarantees that the coefficients
are monotonically increasing and separated well enough to ensure a stable
@@ -3010,16 +3059,16 @@ For all other values of i, both NLSF_Q15[i-1] and NLSF_Q15[i] are updated as
follows:
<figure align="center">
<artwork align="center"><![CDATA[
-                                        i-1
-                                        __
-   min_center_Q15 = (NDeltaMin[i]>>1) + \  NDeltaMin[k]
-                                        /_
-                                        k=0
-                                                d_LPC
-                                                __
-   max_center_Q15 = 32768 - (NDeltaMin[i]>>1) - \  NDeltaMin[k]
-                                                /_
-                                                k=i+1
+                                            i-1
+                                            __
+   min_center_Q15 = (NDeltaMin_Q15[i]>>1) + \  NDeltaMin_Q15[k]
+                                            /_
+                                            k=0
+                                                    d_LPC
+                                                    __
+   max_center_Q15 = 32768 - (NDeltaMin_Q15[i]>>1) - \  NDeltaMin_Q15[k]
+                                                    /_
+                                                    k=i+1
center_freq_Q15 = clamp(min_center_Q15[i],
(NLSF_Q15[i-1] + NLSF_Q15[i] + 1)>>1,
max_center_Q15[i])
@@ -3353,7 +3402,7 @@ sc_Q16[0] = 65470 - -------------------------- ,
(maxabs_Q12 * (k+1)) >> 2
]]></artwork>
</figure>
- where the division here is exact integer division.
+ where the division here is integer division.
This is an approximation of the chirp factor needed to reduce the target
coefficient to 32767, though it is both less than 0.999 and, for
k&nbsp;&gt;&nbsp;0 when maxabs_Q12 is much greater than 32767, still slightly
@@ -6035,7 +6084,7 @@ rng = rng - --- * (fh - fl) .
ft
]]></artwork>
</figure>
-The divisions here are exact integer division.
+The divisions here are integer division.
</t>
<section anchor="range-encoder-renorm" title="Renormalization">
@@ -7605,7 +7654,7 @@ Robust and Efficient Quantization of Speech LSP Parameters Using Structured Vect
<reference anchor="Martin79">
<front>
<title>Range encoding: An algorithm for removing redundancy from a digitised message</title>
-<author initials="N." surname="Martin" fullname=""><organization/></author>
+<author initials="G.N.N." surname="Martin" fullname="G. Nigel N. Martin"><organization/></author>
<date year="1979" />
</front>
<seriesInfo name="Proc. Institution of Electronic and Radio Engineers International Conference on Video and Data Recording" value="" />
@@ -7693,6 +7742,13 @@ Robust and Efficient Quantization of Speech LSP Parameters Using Structured Vect
</front>
</reference>
+<reference anchor="line-spectral-pairs" target="http://en.wikipedia.org/wiki/Line_spectral_pairs">
+<front>
+<title>Line Spectral Pairs</title>
+<author><organization>Wikipedia</organization></author>
+</front>
+</reference>
+
<reference anchor="range-coding" target="http://en.wikipedia.org/wiki/Range_coding">
<front>
<title>Range Coding</title>