author Keith Bostic <keith.bostic@wiredtiger.com> 2011-05-20 18:58:34 -0400
committer Keith Bostic <keith.bostic@wiredtiger.com> 2011-05-20 18:58:34 -0400
commit 04209dc4fa0af080a656ff3edda34b900a6e74d9
tree 0e9aec6cd0b5df7b501933d51ce3082a7fa08cd2 /docs
parent c342b339e3c2028f48c934fb053fb6a75d95fc1c
Change the huffman code to accept frequencies from 0 to UINTMAX_T, which gives
the engine the appropriate information for creating a weight set.
Diffstat (limited to 'docs')
-rw-r--r-- docs/src/huffman.dox 39
1 files changed, 23 insertions, 16 deletions
diff --git a/docs/src/huffman.dox b/docs/src/huffman.dox
index 96c7c6791f2..e94c866f5c2 100644
--- a/docs/src/huffman.dox
+++ b/docs/src/huffman.dox
@@ -18,31 +18,38 @@ row-store or a column-store, specify "huffman_value=english", \c
configuration passed to \c WT_SESSION::create.
Setting Huffman encoding to \c "english" configures WiredTiger to
-compress each byte individually and to use a built-in English language
+compress individual bytes and to use a built-in English language
frequency table.
Setting Huffman encoding to \c "utf8:<file>" configures WiredTiger to
-encode each byte individually, and to read the specified file for the
+encode individual bytes, and to read the specified file for the
frequency table. The format of the frequency table file is lines
containing pairs of unsigned integers separated by whitespace. The
-first integer is the byte value, the second integer is the frequency
-value. Byte values and frequency values may be specified as hexadecimal
-numbers (with a leading \c "0x" prefix), or as integers. Byte values
-must be unique and be in the range of 0 to 255. Frequency values do not
-need to be unique, but must be in the range of 0 to 255, where the lower
-the frequency value, the less likely the byte value is to occur. Any
-unspecified byte values are assumed to have frequencies of 0.
+first integer is the symbol value, the second integer is the frequency
+value. Symbol values may be specified as hexadecimal numbers (with a
+leading \c "0x" prefix), or as integers. Symbol values must be unique
+and in the range of 0 to 255. Frequency values do not need to be
+unique, but must be in the range of 0 to the maximum 32-bit unsigned
+integer value (4,294,967,295), where the lower the frequency value, the
+less likely the symbol value is to occur. Any unspecified symbol
+values are assumed to have frequencies of 0.
Setting Huffman encoding to \c "utf16:<file>" configures WiredTiger to
encode pairs of bytes, and to read the specified file for the frequency
table. The format of the frequency table file is lines containing pairs
of unsigned integers separated by whitespace. The first integer is the
-byte value, the second integer is the frequency value. Byte values and
-frequency values may be specified as hexadecimal numbers (with a leading
-\c "0x" prefix), or as integers. Byte values must be unique and be in
-the range of 0 to 65535. Frequency values do not need to be unique, but
-must be in the range of 0 to 65535, where the lower the frequency value,
-the less likely the byte value is to occur. Any unspecified byte
-values are assumed to have frequencies of 0.
+symbol value, the second integer is the frequency value. Symbol values
+may be specified as hexadecimal numbers (with a leading \c "0x" prefix),
+or as integers. Symbol values must be unique and in the range of 0 to
+65,535. Frequency values do not need to be unique, but must be in the
+range of 0 to the maximum 32-bit unsigned integer value (4,294,967,295),
+where the lower the frequency value, the less likely the symbol value is
+to occur. Any unspecified symbol values are assumed to have frequencies
+of 0.
+
+Input containing symbol values that do not appear in the frequency
+table (or that appear in it with a frequency value of 0) is accepted,
+but will not compress as well as it would if those symbols were listed
+in the frequency table with non-zero frequency values.
*/
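To make the frequency-table file format described in the patch concrete, here is a minimal C sketch of a loader for it. `parse_frequency_table`, `frequency_total` and the other helpers are hypothetical, written only for this illustration; they are not WiredTiger functions, and WiredTiger's own loader may differ (for example in how it treats comments or line endings, which this sketch assumes do not exist beyond an optional trailing carriage return). The sketch reads each symbol and frequency as decimal or as `0x`-prefixed hexadecimal, rejects symbols above the table's limit (255 for `utf8`, 65535 for `utf16`), duplicate symbols, and frequencies above 4,294,967,295, and leaves unlisted symbols at a frequency of 0.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Spaces, tabs and a trailing carriage return may surround the fields. */
static int
is_blank(int c)
{
    return (c == ' ' || c == '\t' || c == '\r');
}

/*
 * parse_number --
 *     Hypothetical helper: parse one unsigned value, "0x"-prefixed
 *     hexadecimal or otherwise decimal. Rejects signs and empty fields.
 *     On success, advances *pp past the digits.
 */
static int
parse_number(const char **pp, unsigned long long *valp)
{
    const char *p = *pp;
    char *end;
    int base;

    while (*p == ' ' || *p == '\t')
        ++p;
    if (!isdigit((unsigned char)*p))
        return (-1);
    base = (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) ? 16 : 10;
    errno = 0;
    *valp = strtoull(p, &end, base);
    if (errno != 0 || end == p)
        return (-1);
    *pp = end;
    return (0);
}

/*
 * parse_frequency_table --
 *     Hypothetical loader: fill freq[0..max_symbol] from "symbol frequency"
 *     lines. max_symbol is 255 for utf8 tables and 65535 for utf16 tables.
 *     Unlisted symbols keep a frequency of 0. Returns 0, or -1 on bad input.
 */
static int
parse_frequency_table(const char *text, unsigned long max_symbol, uint32_t *freq)
{
    const char *p, *q, *eol = NULL;
    unsigned char *seen;
    unsigned long long sym, val;
    int ret = 0;

    memset(freq, 0, (max_symbol + 1) * sizeof(*freq));
    if ((seen = calloc(max_symbol + 1, 1)) == NULL)
        return (-1);

    for (p = text; *p != '\0'; p = (*eol == '\n') ? eol + 1 : eol) {
        if ((eol = strchr(p, '\n')) == NULL)
            eol = p + strlen(p);

        for (q = p; q < eol && is_blank(*q); ++q)
            ;
        if (q == eol) /* Skip blank lines. */
            continue;

        /* Symbol, at least one space or tab, then frequency. */
        if (parse_number(&q, &sym) != 0 || (*q != ' ' && *q != '\t') ||
            parse_number(&q, &val) != 0) {
            ret = -1;
            break;
        }
        while (q < eol && is_blank(*q))
            ++q;

        /* Reject trailing text, out-of-range values and duplicate symbols. */
        if (q != eol || sym > max_symbol || val > UINT32_MAX || seen[sym]) {
            ret = -1;
            break;
        }
        seen[sym] = 1;
        freq[sym] = (uint32_t)val;
    }

    free(seen);
    return (ret);
}

/* Sum all frequencies in a 64-bit accumulator so the total cannot overflow. */
static uint64_t
frequency_total(const uint32_t *freq, unsigned long max_symbol)
{
    uint64_t total = 0;
    unsigned long i;

    for (i = 0; i <= max_symbol; ++i)
        total += freq[i];
    return (total);
}
```

A `utf8` table would be loaded into a 256-entry array with `max_symbol` set to 255, and a `utf16` table into a 65,536-entry array with `max_symbol` set to 65535. Because each frequency may now be as large as 4,294,967,295, anything that adds weights together needs a wider accumulator than 32 bits: even a full `utf16` table sums to at most 65,536 × 4,294,967,295, which is below 2^48, so a `uint64_t` is sufficient.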