author     Michael Cahill <michael.cahill@wiredtiger.com>    2011-02-01 21:25:48 +1100
committer  Michael Cahill <michael.cahill@wiredtiger.com>    2011-02-01 21:25:48 +1100
commit     11750c575e25a67ecffd02fedc13552ea7957a06 (patch)
tree       f18bf22ee1237439aaa035301ec009a122d463f6 /docs
parent     6997f5a8cd4c140e66530c5eacd1cc4239cc542d (diff)
Run doxygen to generate documentation from s_all.
refs #27
Diffstat (limited to 'docs')
-rw-r--r--  docs/Doxyfile       2
-rw-r--r--  docs/design.html  633
-rw-r--r--  docs/layout.html  201
3 files changed, 1 insertion(+), 835 deletions(-)
diff --git a/docs/Doxyfile b/docs/Doxyfile
index 88db1ee0316..041f93de028 100644
--- a/docs/Doxyfile
+++ b/docs/Doxyfile
@@ -595,7 +595,7 @@ WARN_LOGFILE = doxygen.log
# directories like "/usr/src/myproject". Separate the files or directories
# with spaces.
-INPUT = ../include/wiredtiger.h \
+INPUT = ../include/wiredtiger.in \
../include/wiredtiger_ext.h \
src
diff --git a/docs/design.html b/docs/design.html
deleted file mode 100644
index d39171db43e..00000000000
--- a/docs/design.html
+++ /dev/null
@@ -1,633 +0,0 @@
-<html>
-<head>
-<title>Wired Tiger</title>
-</head>
-<body bgcolor=white>
-<center><h1><font color="green">Wired Tiger</font></h1></center>
-<p align=right>$Date$</p>
-
-<hr size=1 noshade>
-
-<h3>Major architectural differences from Berkeley DB:</h3>
-<ol>
-
-<li><b>Support for multi-core architectures</b>:
-Modern architectures have tens or hundreds of cores, with multiple
-threads per core, and touching shared memory kills performance.
-Berkeley DB is built around shared-memory data structures protected by
-mutexes (mutexes serialize access to those structures), which doesn't
-scale well.
-
-<p>Wired Tiger will have a "work queue architecture", that is, an
-interface outside of the database engine where threads of control queue
-work. Inside the database engine, a different thread of control will
-perform the work. The work queue will be read and written by only two
-threads, limiting the amount of memory that's touched by more than a
-single thread of control.
-
-<p>Initial benchmarks show that schedulers/architectures are not
-uniform: on some, processes outperform threads and polling outperforms
-mutexes; on others, the reverse is true. For that reason, we will use
-a combination of polling and mutexes for the work queue, and configure
-at run-time whether the database engine uses separate processes or
-threads to perform work.
-
-<p>Mutexes will behave in the usual fashion: clients will write their
-work into the shared memory queue, wake the worker if necessary, then
-sleep on a mutex. Workers will perform the work and wake the client.
-Polling will be implemented using shared memory and a memory flush, in
-other words, the usual "lock; write the data structure; unlock" cycle
-is replaced with "write new task into a memory slot, flush memory, poll
-for new task to complete". However, only the application knows how many
-threads it has and what they're doing, so the application will have to
-be responsible for binding threads to cores. This means we will need
-to export controls so applications can bind threads of control to
-specific processors and do other platform-specific scheduling tasks.
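-
-<p>
-A minimal sketch of the polling variant, in C (the slot layout, state
-names and barrier primitive here are all hypothetical; the real design
-is undecided):
-
-<blockquote><pre>
-/* One shared-memory slot, written by a single client thread. */
-typedef struct {
-    volatile int state;         /* SLOT_EMPTY, SLOT_PENDING, SLOT_DONE */
-    int op;                     /* operation code */
-    void *args;                 /* operation arguments */
-    int ret;                    /* operation result */
-} WorkSlot;
-
-#define SLOT_EMPTY   0
-#define SLOT_PENDING 1
-#define SLOT_DONE    2
-
-int
-queue_work(WorkSlot *slot, int op, void *args)
-{
-    slot->op = op;              /* write the new task into the slot... */
-    slot->args = args;
-    __sync_synchronize();       /* ...flush memory... */
-    slot->state = SLOT_PENDING;
-
-    while (slot->state != SLOT_DONE)    /* ...poll for completion */
-        ;
-    slot->state = SLOT_EMPTY;
-    return (slot->ret);
-}
-</pre></blockquote>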
-
-<p>As in Berkeley DB, this "work queue architecture" must handle two
-different programmatic architectures:
-
-<ol>
-<li>A single process, without threads -- in other words, a single
-thread of control. In this architecture, the work queue will be skipped
-entirely, and the thread of control will enter the library and do all
-work itself.
-
-<li>One or more processes, optionally threaded -- in other words,
-multiple threads of control. Complicating this architecture is the
-possibility that Wired Tiger may be called from other library routines
-(this is the Berkeley DB Subversion problem, and the reason Berkeley DB
-implemented the DB_REGISTER flag).
-</ol>
-
-<p>
-The programmatic architecture will be selectable at run-time (unless
-that proves too costly in terms of performance, in which case we can
-add configuration-time options).
-
-<p>
-The "work queue architecture" results in one significant difference from
-the Berkeley DB architecture. A separate process cannot configure any
-database engine callbacks or other functions for the database (such as
-providing a key sort function), those must be configured within the
-threads of control executing the actual database code. This is
-annoying, as it requires the database configuration and callback to be
-done before any separate process accesses the database. On the other
-hand, it is useful as only one process in the application suite has to
-be aware of the underlying configuration details of the database.
-
-<p>
-<font color="blue">Maybe </font>Automatically partition the
-database to transfer work requests to a different machine or thread of
-control. Once the database is large enough that threads will rarely
-conflict, it would be best to have multiple threads of control
-partitioning the work on the data. In this case, each thread which
-owns a chunk of a database should write to its own individual log
-file.
-
-<p><li><b>Environment creation and recovery</b>:
-Berkeley DB's environment creation and recovery model is too complex.
-Applications were often unable to determine when to run recovery,
-requiring, for example, features like the DB_REGISTER flag and the
-DbEnv.failchk method.
-
-<p>As previously described, Wired Tiger environments will have separate
-threads of control performing all database work. The default Wired
-Tiger environment will be a combination of Berkeley DB's DbEnv.failchk
-and DB_REGISTER configurations: whenever a thread of control "joins" a
-Wired Tiger environment, or periodically as called by an application
-thread, Wired Tiger will confirm the database engine's thread of control
-is still running. If that thread of control has failed, another thread
-of control will optionally be created, whatever recovery is necessary
-will be performed, and the system will continue. In this sense, Wired
-Tiger will be a "crashless" system: applications will never know that
-recovery was performed; it happens behind the scenes as necessary.
-
-<p><li><b>Schema layer</b>:
-Wired Tiger database environments have a configuration which describes
-the contents of the environment. This focuses on, but is not limited
-to, the database files. The goal is to eliminate Berkeley DB's baroque
-file handling. File operations (for example, remove, rename, create,
-etc.) will be done by transactionally modifying the schema database. To
-remove a file, we can remove the name in the schema database, and deal
-with the physical removal as part of the transaction commit. Any
-transaction serialization required by operations on that file can be
-done based on handling the record in the schema database. Recovery
-cleanup can be done by reviewing the entries in the schema database
-after it has been recovered. By divorcing file system names from
-logical database names, things get easier: as another example, file
-striping can be done at a logical layer, where looking up a name in the
-schema database can return whatever physical file and/or offset we want.
-
-<p><li><b><font color="blue">Maybe </font>Record-level locking</b>:
-Record-level locking is tempting in order to increase concurrency. It
-significantly complicates recovery, unfortunately. However, there
-isn't any other way to increase insert rates where the keys are inserted
-in sorted order, which is a significant failing of Berkeley DB.
-
-<p><li><b><font color="blue">Maybe </font>Logical logging</b>:
-Logical logging is tempting because of the possibility of supporting
-multi-master replication (by shipping logical operations to the replicas
-instead of shipping physical record changes). However, logical logging
-significantly complicates recovery (for example, no split was required
-during the operation but a split is unexpectedly required during
-recovery), and complicating recovery is a bad idea.
-
-</ol>
-
-<h3>General API changes:</h3>
-<ol>
-
-<p><li><b>Per-thread handles</b>:
-Applications must allocate a per-thread object for any thread that will
-be calling a WiredTiger function. When crossing the library API, the
-per-thread handle will be updated to reference any current environment
-and database handles; the per-thread handle is then passed as the first
-argument of every internal database function.
-
-<p>
-Each thread of control's information will include a lock ID which will
-be used for deadlock detection. (This assumes we eventually implement
-some kind of deadlock detection...)
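-
-<p>
-As a sketch (every name below is invented for illustration, not a
-committed API), the per-thread handle might look like:
-
-<blockquote><pre>
-typedef struct __env ENV;       /* opaque handles, for the sketch */
-typedef struct __db DB;
-typedef struct __dbt DBT;
-
-/* Hypothetical per-thread handle, one allocated per thread. */
-typedef struct {
-    ENV *env;                   /* current environment, set at the API boundary */
-    DB *db;                     /* current database, set at the API boundary */
-    unsigned int lockid;        /* lock ID, reserved for deadlock detection */
-} TOC;                          /* "thread of control" */
-
-/* Every internal function takes the per-thread handle first. */
-int __wt_db_get(TOC *toc, DBT *key, DBT *data);
-</pre></blockquote>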
-
-<p><li><b>Object constructors/destructors</b>:
-There will be explicit object constructor/destructor functions which are
-used for configuration only (the DB_ENV vs. ENV split done right). The
-application has to allocate the object, then configure it, then use it
-and then destroy it. The object may be reused repeatedly, as it
-contains no internal information.
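-
-<p>
-The intended calling sequence, sketched with placeholder names (not a
-committed API):
-
-<blockquote><pre>
-ENV *env;
-
-env_create(&env);                               /* allocate the object */
-env->set_cachesize(env, 64 * 1024 * 1024);      /* configure it */
-env->open(env, "/database/home");               /* use it */
-/* ... database operations ... */
-env->close(env);
-env_destroy(env);                               /* destroy it */
-</pre></blockquote>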
-
-<p><li><b>Error handling</b>:
-Wired Tiger error messages will include a URL which can be used to look
-up detailed error information. This makes error configuration fairly
-complicated: an error message will have two parts, a per-release URL for
-detailed information and a cryptic error message. By default, both will
-be displayed, but there will be a compile-time option to output only
-the number (to support a small footprint). Error messages in the code
-will need to contain the number, the short message and the detailed
-message, and the URL support will be automatically generated for each
-release.
-
-<p>
-Error handling (error message, prefix, stream, whatever) will be
-database name specific.
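-
-<p>
-A sketch of how the two-part messages might be produced (the URL scheme
-and macro are invented for illustration; WT_RELEASE would be filled in
-by the release process):
-
-<blockquote><pre>
-#include <stdio.h>
-
-#define WT_RELEASE "x.y"        /* set per release */
-#define WT_ERR_URL "http://www.wiredtiger.com/err/" WT_RELEASE "/"
-
-/* Print the number and short message, then the per-release URL. */
-#define WT_ERROR(num, msg) do {                                 \
-    fprintf(stderr, "wt: error %d: %s\n", (num), (msg));        \
-    fprintf(stderr, "wt: see " WT_ERR_URL "%d\n", (num));       \
-} while (0)
-</pre></blockquote>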
-
-<p><li><b>Configuration</b>:
-There will be no general configuration functions; Wired Tiger will use
-configuration methods only, and as in Berkeley DB, there will be getter
-functions in all cases. This will also apply to many of the arguments
-for methods: optional arguments will be set with configuration
-methods, rather than by passing potentially ignored values to the
-method call.
-
-<p>
-Configuration routines will only update specific shared memory region
-locations, to allow auto-generation of the configuration code.
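-
-<p>
-For example (invented names), each generated setter writes a single,
-known location in the shared region, and has a matching getter:
-
-<blockquote><pre>
-typedef struct { unsigned int cachesize; } REGION;  /* for the sketch */
-typedef struct { REGION *region; } ENV;
-
-int
-env_set_cachesize(ENV *env, unsigned int mbytes)
-{
-    env->region->cachesize = mbytes;    /* one fixed shared-memory slot */
-    return (0);
-}
-
-int
-env_get_cachesize(ENV *env, unsigned int *mbytesp)
-{
-    *mbytesp = env->region->cachesize;
-    return (0);
-}
-</pre></blockquote>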
-
-<p><li><b>Cursor errors</b>:
-By default, cursors will become unusable after any error; applications
-wanting Berkeley DB behavior, where the cursor has not moved and can be
-used after error, will have a flag which duplicates the cursor before
-each operation (Berkeley DB's DbCursor.dup method). The point is that
-cursor duplication is painful and difficult, and most applications don't
-need it.
-
-<p><li><b>Secondary indices</b>:
-Combine the Db.get and Db.pget calls into a single method.
-
-<p><li><b>Delete of non-existent entry</b>:
-Delete of a non-existent entry should optionally not return an error.
-
-<p><li><b>File/directory permission modes</b>:
-Support explicit modes for all physical files.
-
-<p><li><b>Flash support</b>:
-Add high-level environment configuration which indicates no underlying
-disk support.
-
-</ol>
-
-<h3>Btree access method changes:</h3>
-
-<table width="100%" border="1" align=left>
-<tr valign="top">
-<th align="left" valign="top" width="15%">Feature</th>
-<th align="left" width="2%">Implemented</th>
-<th align="left">Notes</th>
-</tr>
-
-</td></tr>
-<tr valign="top"><td>
-Bulk load
-</td><td>
-Yes
-</td><td>
-Wired Tiger supports bulk load, where sorted key/data pairs are read
-from a callback function, and a B+tree is built bottom-up in a new
-database. (A sketch of the callback interface follows this table.)
-
-<tr valign="top"><td>
-Page size
-</td><td>
-Yes
-</td><td>
-Berkeley DB's maximum page size is 64KB, limiting on-page key/data items
-to roughly 16KB. Wired Tiger supports pages of any practical size. The
-goal is two-fold, to allow databases with key or data items larger than
-16KB to be stored on-page, and to increase the data transfer of a single
-disk read. However, simply increasing the size of the database page has
-a problem: in Berkeley DB, internal nodes are the same size as leaf
-pages, and a binary search of large pages will behave badly in the L1/L2
-caches. To avoid this problem, Wired Tiger supports configuration of
-different sizes for internal and leaf nodes.
-
-</td></tr>
-<tr valign="top"><td>
-Overflow items
-</td><td>
-Yes
-</td><td>
-<p>
-In Berkeley DB, overflow items (key or data items too large to fit on a
-leaf page) are allocated in units of database page size. This wastes,
-on average, half a database page for each overflow item. Also, overflow
-items are then marshalled to/from those database pages. WiredTiger
-allocates overflow objects directly from the underlying file, using a
-configurable value as the unit of allocation. This eliminates the
-wasted space and the manipulation of multiple pages in order to store
-or retrieve overflow items.
-
-</td></tr>
-<tr valign="top"><td>
-Extent allocation
-</td><td>
-</td><td>
-Wired Tiger supports extent-based allocation in the B+tree. Pages will
-be initially allocated at the page size; when splitting, pages will
-increase in size to the maximum page size. Pages split after reaching
-the maximum page size will be aggregated into extents. For example,
-imagine an 8KB page size with a maximum page size of 64KB and 10MB
-extents. The page is initially 8KB; as that runs out of space, the page
-size doubles until it's 64KB. The next split will result in 2
-contiguous 64KB pages, until there are 160 contiguous pages in memory,
-that is, a 10MB extent. Internal pages will not be gathered into
-extents, although they will increase in size from the initial size to
-the maximum page size.
-
-<p>
-The goal is to speed sequential search as much larger chunks of disk can
-be returned in a single disk read. As extents are made up of smaller
-pages, random accesses do not need to read the full extent.
-
-</td></tr>
-<tr valign="top"><td>
-Secondary indices
-</td><td>
-</td><td>
-Wired Tiger optionally stores a copy (or part of a copy) of the primary
-data item in the secondary to speed the use of a secondary index,
-avoiding subsequent lookup in the primary. In Berkeley DB, an "ordered"
-lookup when walking a secondary becomes a "random" lookup on the primary
-in order to return the primary's data item, resulting in a colder cache
-and more deadlocks. By storing a copy of the primary's data in the
-secondary, we can avoid the second lookup.
-
-</td></tr>
-<tr valign="top"><td>
-Compression
-</td><td>
-</td><td>
-Wired Tiger will support static (probably Huffman encoding) key/data
-item compression. This is worth doing because CPU is less important
-than memory, and trading CPU to keep the cache hotter is the right
-choice to make.
-
-</td></tr>
-<tr valign="top"><td>
-Salvage
-</td><td>
-</td><td>
-Wired Tiger supports in-place database salvage.
-
-</td></tr>
-<tr valign="top"><td>
-Automatic transaction retry
-</td><td>
-</td><td>
-Wired Tiger supports automatic retry of an individual transactional
-operation, should it be selected to resolve a deadlock.
-
-</td></tr>
-<tr valign="top"><td>
-Copy avoidance
-</td><td>
-</td><td>
-Wired Tiger supports a call-back function which is called from inside
-the library and given a pointer to the key/data item (normally on the
-page), avoiding any data copy at all.
-
-</td></tr>
-<tr valign="top"><td>
-BLOB support
-</td><td>
-</td><td>
-Wired Tiger will support BLOBs using Berkeley DB's "returned file
-handle" design. While this is less necessary given Wired Tiger's
-support for large page sizes, it's still useful for truly large data
-items.
-
-</td></tr>
-</table>
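-
-<p>
-Returning to the bulk load entry above: a plausible shape for the
-callback interface, sketched with invented names.
-
-<blockquote><pre>
-typedef struct __db DB;         /* opaque handles, for the sketch */
-typedef struct __dbt DBT;
-
-/*
- * Hypothetical bulk-load callback: the application returns the next
- * key/data pair in sorted order, or non-zero when the input is done;
- * the library builds the B+tree bottom-up from the pairs.
- */
-typedef int (*wt_bulk_cb)(void *cookie, DBT *keyp, DBT *datap);
-
-int db_bulk_load(DB *db, wt_bulk_cb next, void *cookie);
-</pre></blockquote>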
-
-<h3>Cache changes:</h3>
-
-<table width="100%" border="1" align=left>
-<tr valign="top">
-<th align="left" valign="top" width="15%">Feature</th>
-<th align="left" width="2%">Implemented</th>
-<th align="left">Notes</th>
-</tr>
-
-</td></tr>
-<tr valign="top"><td>
-Database page alignment
-</td><td>
-Yes
-</td><td>
-Database pages are separately allocated from their headers so direct I/O
-from the underlying file into memory is possible.
-
-</td></tr>
-<tr valign="top"><td>
-File identification
-</td><td>
-Yes
-</td><td>
-Berkeley DB encodes a file ID string in the file header; WiredTiger
-uses the file name itself, and expects it to be unique.
-
-</td></tr>
-<tr valign="top"><td>
-MVCC
-</td><td>
-</td><td>
-Remove MVCC's support for backing freeze files; applications must grow
-the cache if there's insufficient space for the transactional
-requirements.
-
-</td></tr>
-</table>
-
-
-
-<h3>Implementation changes:</h3>
-<ol>
-
-<p><li><b>Micro-logging</b>:
-To the extent possible, Wired Tiger should support TimesTen's notion of
-micro-logging (where threads of control do work in such a way that, if
-they fail, other threads of control can clean up without interrupting the
-overall progress of the database engine). While desirable, this is not
-a requirement of the architecture itself; it's a performance enhancement
-in that it minimizes the events which can force more or less expensive
-recovery operations to be performed.
-
-<p>
-The biggest problem for Berkeley DB was the shared regions, as they
-could be corrupted by an application thread of control dying while
-modifying the shared region, resulting in an expensive recovery period
-as well as an expensive re-creation of the large shared region. In
-Wired Tiger, the only shared regions will be the individual chunks used
-by threads of control to communicate with the threads running in the
-database engine, removing this problem entirely.
-
-<p><li><b>Utilities</b>:
-All of the Wired Tiger utilities (except for any code generation
-utility) will be implemented as thin wrappers. (In Berkeley DB,
-db_dump, db_load and db_hotbackup have far too much program logic.)
-
-<p><li><b>Exported subsystems</b>:
-Wired Tiger will not export the cache, locking, logging, mutex and
-transaction APIs in a general way. Reason: the insistence on separate
-subsystems meant we didn't pass a single object handle throughout the
-entire system. There isn't enough demand for the subsystems to make
-them available initially. That said, we will continue to rigorously
-separate the functionality to the extent possible.
-
-<p><li><b>API layering</b>:
-Wired Tiger will not require mutexes to cross the API layer. This will
-allow us to consistently use handle methods to access internal as well
-as external functions throughout the library. The "global" thread
-cookie will contain the information we need to figure out if any locks
-are needed as we cross API layers.
-
-<p><li><b>Code Generation</b>:
-Wired Tiger will support a code-generation utility early on. It will
-be a script that outputs a C-language program, supporting all features.
-
-<p><li><b>File types</b>:
-Wired Tiger will use the file name space to aggressively categorize
-storage by class: optionally taking advantage of Flash vs. RAM vs.
-durable disk writes vs. standard disk vs. network storage.
-
-<p><li><b>Hot backup</b>:
-Wired Tiger will have a hot-backup operation as part of the library.
-Hot backups can be faster if the hot backup thread writes a log record
-("starting database copy") into the log, because recovery only has to
-go back to the checkpoint preceding that record, rather than to the
-beginning of the log, when doing "catastrophic" recovery on a hot
-backup.
-
-<p>
-Additionally, hot backup should be integrated with replication: a hot
-backup is no different from the initialization of a replica.
-
-<p><li><b>Scripting language</b>:
-Use Python or Ruby as our initial scripting language, and possibly
-Erlang to write multi-threaded tests.
-
-<p><li><b>Cursor operations</b>:
-Split up the cursor get/pget functions into a group of methods, instead
-of the current complex set of flags.
-
-<p><li><b>Record numbers</b>:
-Support 64-bit logical record numbers.
-<p>
-Give the application a flag to specify that keys are numbers; it's too
-common an error.
-<p>
-Support holes in the record number space.
-
-<p><li><b>Statistics</b>:
-Implement 64-bit statistics values.
-<p>
-Output statistics in XML, and provide a script to display them in text.
-<p>
-Store all statistics in a single chunk of space so it's easy to snapshot
-the engine's performance.
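-
-<p>
-For instance (field names invented), keeping all of the 64-bit counters
-in one contiguous structure makes a snapshot a single copy:
-
-<blockquote><pre>
-#include <stdint.h>
-#include <string.h>
-
-/* All engine statistics, stored in a single chunk of space. */
-typedef struct {
-    uint64_t cache_hits;
-    uint64_t cache_misses;
-    uint64_t pages_read;
-    uint64_t pages_written;
-} WT_STATS;
-
-/* Snapshot the engine's performance in one copy. */
-void
-stats_snapshot(const WT_STATS *live, WT_STATS *copy)
-{
-    memcpy(copy, live, sizeof(WT_STATS));
-}
-</pre></blockquote>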
-
-<p><li><b>Log files</b>:
-Dump log files in XML and provide a script which turns them into flat text.
-
-<p><li><b>Compaction</b>:
-Do compaction lazily, top-down through the tree instead of bottom-up.
-
-</ol>
-<h3>Documentation:</h3>
-<ol>
-
-<p><li>
-Support HTML only. Use cpp instead of m4 as our flat-text to HTML
-pre-processor.
-
-<p><li>
-Write tutorials (booklets) around the features from day 1, don't depend
-on the man pages alone. (Instead of having a Reference Guide, have
-booklets.)
-
-<p><li>
-Every man page will have a code example that can be cut-and-pasted by
-programmers.
-
-</ol>
-<h3>Do it the same (mostly):</h3>
-<ol>
-
-<ul type=disc>
-<li>Bulk get
-<li>C API (require ANSI C)
-<li>C++ API (we should support C++ hash & cursors)
-<li>Checksums
-<li>Documentation
-<li>Duplicates (both on- and off-page)
-<li>Error handling
-<li>Event notification
-<li>Group commit
-<li>Growing and shrinking the cache
-<li>Micro tests
-<li>Mutexes
-<li>Page layout
-<li>POSIX and Windows support, initially
-<li>Purify
-<li>Secondary indices, foreign key constraints
-<li>Subsystem separation (access methods, cache, lock, log, replication, txn)
-<li>Test suite (written in Python, Ruby, or C)
-</ul>
-
-<p><li><b>Replication Manager</b>:
-Wired Tiger's focus needs to be on partial and cascading replication;
-there are useful features in both.
-
-</ol>
-<h3>Don't do it at all:</h3>
-<ol>
-
-<p><li><b>Concurrent Data Store, in-memory databases</b>:
-Named, non-durable databases are the simpler solution to both problems.
-
-<p><li><b>Cryptography</b>:
-It's hard and nobody on the engineering team ever understands it. If
-the files need to be encrypted, rely on 3rd party products that encrypt
-the entire filesystem.
-
-<p><li><b>Deadlock detection</b>:
-Use timeout-based deadlock detection, instead -- the graphs are
-complicated and expensive to build.
-
-<p><li><b>Java API</b>:
-Java: the Berkeley DB JE product is a better solution than JNI.
-
-<p><li><b>Application-specific logging</b>:
-Not useful enough to replace, although we should continue to allow
-applications to insert their own records into the log; we will just no
-longer document any interface to those records during recovery (or even
-when the records will be used and/or discarded). Application-written log
-records will only be subsequently available by reading the log from
-outside of the engine.
-
-<p><li><b>Hash access method</b>:
-The Berkeley DB Hash access method offers one feature: a hotter cache
-because fewer meta-data pages are required when the data is truly large.
-This feature is less useful for Wired Tiger, given Wired Tiger's support
-for larger pages, but is still useful in some cases. However, it isn't
-useful enough to implement a separate access method as in Berkeley DB.
-Could we do it by modifying the B+tree somehow?
-
-<p><li><b>Recno access method</b>:
-The Berkeley DB Recno access method offers two features: immutable
-record numbers and backing text files. Immutable record numbers could
-be supported in a key-based B+tree where deleted records were replaced
-with placeholders. Backing text files aren't useful enough to re-implement.
-
-<p><li><b>Queue access method</b>:
-The Berkeley DB Queue access method offers two features: removal of
-extent-based files as the database grows and shrinks, and record-level
-locking, at the price of fixed-sized key/data items. These features
-could be provided by a key-based, record-level locking B+tree, that
-could be striped across multiple files (using the schema layer
-database). The one problem is this will result in phantoms, which may
-be a problem for queue-based applications.
-
-<p><li><b>Subdatabases</b>:
-Subdatabases (multiple databases per file) are actually useful for the
-few applications wanting to use the database name space for things like
-objects (for example, a C++ compiler writing objects into databases, or
-an application storing a user mailbox per database). However, they
-greatly complicate database creation and deletion, not to mention
-logging and recovery. It might be possible to do something cleaner than
-Berkeley DB in the context of a schema file, but for now I don't think
-it's worth the effort.
-
-<p><li><b>Db.feedback</b>:
-Not useful enough to replace -- include any feedback as part of general
-event notification.
-
-<p><li>
-List of other features/support not useful enough to replace:
-<ul type=disc>
-<li>Apache, PHP, Perl, Tcl
-<li>BREW, QNX, S60
-<li>dbm, gdbm, ndbm, hsearch, DB 1.85 APIs
-<li>DB_REGISTER (it's now the default)
-<li>Db.key_range
-<li>Example code (replaced by documentation and code generation)
-<li>Flat-text backing files
-<li>RPC
-<li>Replication base API
-<li>SWIG
-<li>Sequences
-<li>System V shared memory regions
-<li>XA-style distributed transactions
-</ul>
-
-</ol>
-<h3>FAQ:</h3>
-<ol>
-<p><li>
-Why not use real database terms ("database" means environment, "table"
-means database)?
-<p>
-Even though that's desirable, since we're continuing to use the
-Berkeley DB API, it doesn't make sense to switch.
-
-<p><li>
-Will the engine require a 64-bit type?
-<p>
-I guess so, this shouldn't be a problem, should it?
-
-</ol>
-<h3>References:</h3>
-
-<ul type="none">
-
-<li><a href="http://www.cs.berkeley.edu/~brewer/cs262/Aries.pdf">ARIES:
-A Transaction Recovery Method Supporting Fine-Granularity Locking and
-Partial Rollbacks Using Write-Ahead Logging</a>
-
-<p>
-<li><a href="http://Berkeley DB.cs.yale.edu/hstore/oltpperf-sigmod08.pdf">OLTP Through the Looking Glass, and What We Found There</a>
-
-</ul>
-
-</body>
-</html>
diff --git a/docs/layout.html b/docs/layout.html
deleted file mode 100644
index f2e9953187d..00000000000
--- a/docs/layout.html
+++ /dev/null
@@ -1,201 +0,0 @@
-<html>
-<head>
-<title>Wired Tiger Layout</title>
-</head>
-<body bgcolor=white>
-<center><h1><font color="green">Wired Tiger Layout</font></h1></center>
-<p align=right>$Date$</p>
-<hr size=1 noshade>
-
-<p>
-There are a number of page formats used in WiredTiger, each with a unique page
-type. However, all of the pages store data in some specific order, and it's
-simplest to think of all of the formats as sets of key/data pairs.
-
-<p>
-Keys are either record numbers or variable-length byte strings.
-
-<p>
-Data items are either fixed-length byte strings, variable-length byte strings
-or off-page references (pointers to other pages in the file).
-
-<p>
-WiredTiger page types are presented in the API as "row" or "column" stores.
-
-<p>
-A "row" store is a Berkeley DB style key/data store, where both keys and data
-are variable-length byte strings. A "column" store is one where the key is a
-record number and the data item is either a variable- or fixed-length byte
-string. We support both because row stores are faster for queries where all
-columns are required by every lookup (because there's only a single set of
-meta-data pages to go through, or read into the cache). Column stores are
-faster for queries where only a few of the columns are required for any lookup
-(because only the columns being returned are present in the cache). Column
-stores also offer significant compression advantages when there is minimal
-difference between adjacent column values.
-
-<p>
-To restate, there are 2 different types of keys in WiredTiger:
-
-<blockquote><pre>
-Recno : 64-bit record number
-Variable-Length : byte string + length
-</pre></blockquote>
-
-<p>
-The record number isn't stored explicitly in the page; it's implicitly
-stored as part of the Btree.
-
-<p>
-And there are 3 different types of data items in WiredTiger:
-
-<blockquote><pre>
-Fixed-Length : fixed length number of bytes
-Variable-Length : byte string + length
-Off-page Ref    : off-page reference (a ref to another page in the file)
-</pre></blockquote>
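-
-<p>
-One way to picture that taxonomy in C (a sketch only, not the actual
-on-page format):
-
-<blockquote><pre>
-#include <stdint.h>
-
-typedef enum {
-    ITEM_FIXED,                 /* fixed-length byte string */
-    ITEM_VARIABLE,              /* variable-length byte string */
-    ITEM_OFFPAGE                /* reference to another page in the file */
-} ITEM_TYPE;
-
-typedef struct {
-    ITEM_TYPE type;
-    uint32_t len;               /* length, for byte-string items */
-    union {
-        const void *data;       /* on-page bytes */
-        uint64_t pgno;          /* off-page reference: target page */
-    } u;
-} ITEM;
-</pre></blockquote>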
-
-<p>
-A row store is one where there's a variable-length key associated with one or
-more variable-length data items. Because record numbers are an implicit part
-of the Btree, it's possible to retrieve records from a row store using record
-numbers, where that's useful. In a row store, a key may reference multiple
-data items (that is, duplicate data items).
-
-<p>
-Here's the most complex possible row store layout:
-
-<blockquote><pre>
- IP
- |
-IP -> IP -> IP -> IP
- |
- LP -> LP -> LP
- |
- DIP
- |
- DIP -> DIP -> DIP
- |
- |
- DLP -> DLP
-
-Key: IP Internal page
- LP Leaf page
- DIP Duplicate internal page
- DLP Duplicate leaf page
-</pre></blockquote>
-
-<p>
-This is a standard B+tree layout. One or more levels of internal pages, then
-one level of leaf pages.
-
-<p>
-A row store's internal page keys are variable-length byte strings. The internal
-page data items are references to other pages (the off-page reference type).
-
-<p>
-A row store's leaf page key and data items are both variable-length byte
-strings.
-
-<p>
-As mentioned above, it's possible to store multiple data items with a single
-key. In this case, the key is only stored a single time. If there are enough
-duplicate data items, the duplicate data items are moved into their own,
-separate Btree, referenced from the leaf page. These are the "duplicate
-internal" and "duplicate leaf" pages in the above diagram.
-
-<p>
-Duplicate internal page keys are record numbers, and the duplicate internal
-page data items are references to other pages (the off-page reference type).
-
-<p>
-Duplicate leaf page keys are record numbers, and the duplicate leaf page data
-items are variable-length strings.
-
-<p>
-Row store notes:
-<ul>
-
-<li>
-WiredTiger internal pages are used only for navigation, and need not
-store true database keys. In other words, this is a true B+tree, where
-keys can only be returned from leaf pages.
-
-<li>
-Any variable-length key or data item can be replaced with a reference to
-an overflow page. This is used whenever a key or data item won't fit onto a
-page because it's too large. However, references to overflow pages can be
-thought of as an additional step taken to get a key or data item, and not a
-fundamental layout structure.
-
-<li>
-Any variable-length key or data item can be compressed using the Huffman
-compression engine. Again, decompression is an additional step taken to get
-a key or data item, and not a fundamental layout structure issue.
-
-</ul>
-
-<p>
-This concludes the discussion of a row store.
-
-<p>
-A column store is one where there's a single key that references a
-single data item, and that data item may be fixed- or variable-length.
-
-<p>
-Here's the most complex possible column store layout:
-
-<blockquote><pre>
- IP
- |
-IP -> IP -> IP -> IP
- |
- LP -> LP -> LP
-
-Key: IP Internal page
- LP Leaf page
-</pre></blockquote>
-
-<p>
-A column store's internal page keys are record numbers. The internal page
-data items are references to other pages (the off-page reference type).
-
-<p>
-A column store's leaf page keys are record numbers. The leaf page data items
-are fixed- or variable-length byte strings.
-
-<p>
-Because column store keys do not disappear on delete, there needs to be an
-on-page, permanent "deleted" data value; for variable-length byte strings,
-this is a specific item type. For fixed-length byte strings, the top bit
-of the field is the deleted flag (which means that the user-specified data
-length is off by a single bit).
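-
-<p>
-A sketch of the flag manipulation (assuming, for illustration, that the
-deleted flag is the high bit of the field's first byte):
-
-<blockquote><pre>
-#include <stdint.h>
-
-#define DELETED_BIT 0x80        /* top bit of the field's first byte */
-
-static int
-is_deleted(const uint8_t *field)
-{
-    return ((field[0] & DELETED_BIT) != 0);
-}
-
-static void
-set_deleted(uint8_t *field)
-{
-    field[0] |= DELETED_BIT;
-}
-</pre></blockquote>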
-
-<p>
-Column store notes:
-<ul>
-
-<li>
-Like the row store, any variable-length data item can be replaced
-with a reference to an overflow page, and can be compressed using the
-Huffman compression engine.
-
-<li>
-Any fixed-length data item can be compressed using a dictionary lookup
-or repeat count.
-
-</ul>
-
-<p>
-Q&A:
-<ul>
-
-<li>
-Why not have fixed-length data items in a row store? Because once you
-have variable-length items on the page, having fixed-length items
-doesn't make things any faster; you still have to maintain a page index.
-
-</ul>
-
-</body>
-</html>