More editing of documentation, expanding of examples.

--HG-- branch : mjc extra : transplant_source : %16%C1%5B%C8%91F%D3%D0%AA%C2%1BY%FE%F6%27%AC%02%16%D9I
author: Michael Cahill <michael.cahill@wiredtiger.com> 2011-01-04 17:48:09 +1100
committer: Michael Cahill <michael.cahill@wiredtiger.com> 2011-01-04 17:48:09 +1100
commit: e51557e46f94c83ec7bdce4f686916d4f8bd5d94 (patch)
tree: ac8a71fa719bc3b1db747363ce7222693561fe94 /docs
parent: 771a014ea9bae78f94c6abe7412a11e1cf4625d6 (diff)
download: mongo-e51557e46f94c83ec7bdce4f686916d4f8bd5d94.tar.gz
7 files changed, 105 insertions, 26 deletions
diff --git a/docs/src/cursors.dox b/docs/src/cursors.dox
index e31fd371e0d..df3d629dbbf 100644
--- a/docs/src/cursors.dox
+++ b/docs/src/cursors.dox
@@ -15,12 +15,23 @@ The following are builtin cursor types:
 <table>
 <tr><th>URI</th><th>Function</th></tr>
 <tr><td><tt>table:[\<tablename\>]</tt></td><td>ordinary table cursor</td></tr>
-<tr><td><tt>column:[\<tablename\>.\<columnname\>]</tt></td><td>column cursor</td></tr>
-<tr><td><tt>config:[table:\<tablename\>]</tt></td><td>database or table configuration</td></tr>
-<tr><td><tt>cursortype:</tt></td><td>types of cursor (key=(string)prefix, data=NULL)</td></tr>
+<tr><td><tt>columnset:[\<tablename\>.\<columnset\>]</tt></td><td>column set cursor</td></tr>
+<tr><td><tt>index:[\<tablename\>.\<index\>]</tt></td><td>index cursor</td></tr>
 <tr><td><tt>join:\<cursor1\>\&\<cursor2\>[&\<cursor3\>...]</tt></td><td>Join the contents of multiple cursors together.</td></tr>
 <tr><td><tt>module:</tt></td><td>loadable modules (key=(string)name, data=(string)path)</td></tr>
+<tr><td><tt>config:[table:\<tablename\>]</tt></td><td>database or table configuration</td></tr>
+<tr><td><tt>cursortype:</tt></td><td>types of cursor (key=(string)prefix, data=NULL)</td></tr>
 <tr><td><tt>sequence:[\<seqname\>]</tt></td><td>Sequence cursor (key=recno, data=NULL)</td></tr>
 <tr><td><tt>statistics:[table:\<tablename\>]</tt></td><td>database or table statistics (key=(string)keyname, data=(string)value)</td></tr>
 </table>
+
+\section cursor_projections Projections
+
Cursors on tables, column sets and indices can return a subset of columns.  This is done by listing the column names in parentheses in the <code>uri</code> parameter to WT_SESSION::open_cursor.  Only the fields from the listed columns are returned by WT_CURSOR::get_key and WT_CURSOR::get_value.
+
+This is particularly useful with index cursors, because if all columns in the projection are available in the index (including primary key columns, which are the values of the index), there is no need to access any column set.
+
+\section cursor_ranges Restricting the Range of a Scan
+
+XXX
  */
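The covering rule for index projections comes down to a set test. The C sketch below illustrates that rule only; it is not WiredTiger code. Columns are identified by name, and the hypothetical helper `projection_covered` returns true only when every projected column is either an index key column or one of the primary key columns stored as the index values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if name appears in the list cols[0..n-1]. */
static bool
column_in(const char *name, const char *const *cols, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(name, cols[i]) == 0)
            return true;
    return false;
}

/*
 * Hypothetical helper: a projection is "covered" by an index when every
 * projected column is stored in the index, either as an index key column
 * or as a primary key column (the values of the index).  A covered
 * projection can be answered without reading any column set.
 */
static bool
projection_covered(const char *const *proj, size_t nproj,
    const char *const *index_cols, size_t nindex,
    const char *const *pk_cols, size_t npk)
{
    assert(proj != NULL || nproj == 0);
    for (size_t i = 0; i < nproj; i++)
        if (!column_in(proj[i], index_cols, nindex) &&
            !column_in(proj[i], pk_cols, npk))
            return false;
    return true;
}
```

Using the employee table from the Schemas page, an index on `Lastname` with primary key `EmpId` covers the projection `(Lastname,EmpId)`. It does not cover `(Lastname,Salary)`, because `Salary` would have to be fetched from a column set.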
diff --git a/docs/src/extending.dox b/docs/src/extending.dox
index d7b5ac2e574..c79a0ce2b0a 100644
--- a/docs/src/extending.dox
+++ b/docs/src/extending.dox
@@ -6,9 +6,17 @@
 
 \section modules Loadable Modules
 
+XXX
+
 \section cursor_factory Adding Cursor Types
 
+XXX
+
 \section collations Custom Collators
 
+XXX
+
 \section extractors Custom Field Extractors
+
+XXX
  */
diff --git a/docs/src/overview.dox b/docs/src/overview.dox
index 9eff24c12e5..583690120db 100644
--- a/docs/src/overview.dox
+++ b/docs/src/overview.dox
@@ -4,16 +4,13 @@
 
 \section overview_intro Introduction
 
-The WiredTiger Data Store is a platform for extensible data management.
-This documentation describes the public interface used by developers to
-construct applications.
+The WiredTiger Data Store is an extensible platform for data management.  This documentation describes the public interface to WiredTiger used by developers to construct applications.
 
 We follow SQL terminology: a database is a set of tables that are managed together.  Tables logically consist of rows; each row has a key and a value.  Tables may optionally have an associated schema, which splits the key/value pair into a set of columns.  Tables may also have associated indices, each of which is ordered by some set of columns.
 
-TODO: we need more paragraphs on this page -- maybe two more paragraphs in the Introduction that say "Applications will generally do X, Y and Z, configuration is based on strings passed to the methods, the retrieval/update interfaces is based on cursors that look like X, Y and Z."
+WiredTiger supports column-oriented storage in addition to traditional row-oriented storage.  Instead of storing all fields from a row together, WiredTiger can efficiently store and access sets of columns (including single columns) separately.  The same programming interface is used to access row- and column-oriented data, making it possible to change the data layout to improve throughput without rewriting applications.
 
-The API consists of only a small set of classes, of which the following are
-the most commonly used:
+Applications will generally use the following classes to access and manage data:
 
  - a WT_CONNECTION represents a connection to a database.  Most applications
  will only open one connection to a database for each process.  The first
@@ -34,9 +31,11 @@ the most commonly used:
  columns), as an interface to statistics, configuration data or
  application-specific data sources.
 
+Handles and operations are configured using strings, which keeps the set of methods in the API relatively small and makes the interface very similar regardless of the programming language used in the application.  WiredTiger supports the C, C++, Java and Python programming languages (among others).
+
 \section overview_details Programmer's Reference
 
-For more details about using WiredTiger, see:
+For full details about using WiredTiger, see:
 
 - \subpage architecture
 - \ref using
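String-based configuration can be illustrated with a small lookup function. This is a hypothetical C sketch, not WiredTiger's parser. It assumes that entries are comma-separated `key=value` pairs, as in the `"sharing=on"` and `"transactional"` strings used elsewhere in these pages, and that a bare key means `true`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical lookup in a configuration string of the assumed form
 * "key1=value1,key2,key3=value3".  A key with no "=value" is treated as
 * "true".  Copies the value for key into buf (truncating if needed) and
 * returns true if the key is present.
 */
static bool
config_get(const char *cfg, const char *key, char *buf, size_t buflen)
{
    size_t keylen = strlen(key);

    assert(buf != NULL && buflen > 0);
    while (*cfg != '\0') {
        /* Find the end of this "key[=value]" entry and its '=' if any. */
        const char *end = strchr(cfg, ',');
        if (end == NULL)
            end = cfg + strlen(cfg);
        const char *eq = memchr(cfg, '=', (size_t)(end - cfg));
        const char *kend = (eq != NULL) ? eq : end;

        if ((size_t)(kend - cfg) == keylen &&
            strncmp(cfg, key, keylen) == 0) {
            const char *val = (eq != NULL) ? eq + 1 : "true";
            size_t vlen = (eq != NULL) ? (size_t)(end - val) : strlen(val);
            if (vlen >= buflen)
                vlen = buflen - 1;      /* truncate to fit the buffer */
            memcpy(buf, val, vlen);
            buf[vlen] = '\0';
            return true;
        }
        /* Skip past the comma, or stop at the end of the string. */
        cfg = (*end == ',') ? end + 1 : end;
    }
    return false;
}
```

Because every option travels in the same string argument, a new option does not require a new method signature. This is the property that keeps the API small and similar across language bindings.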
diff --git a/docs/src/packing.dox b/docs/src/packing.dox
index e3623fd7e8a..5df7e903a59 100644
--- a/docs/src/packing.dox
+++ b/docs/src/packing.dox
@@ -2,5 +2,8 @@
 
 /*! \page packing Packing and Unpacking Data
 
-What are WT_CURSOR::get_key, WT_CURSOR::get_value, WT_CURSOR::set_key and WT_CURSOR::set_value all about?
+XXX What are WT_CURSOR::get_key, WT_CURSOR::get_value, WT_CURSOR::set_key and WT_CURSOR::set_value all about?
+
+- native C structs
+- portability between programming languages
  */
diff --git a/docs/src/processes.dox b/docs/src/processes.dox
index 39f81e8d1c4..893b8aada2a 100644
--- a/docs/src/processes.dox
+++ b/docs/src/processes.dox
@@ -6,5 +6,17 @@ WiredTiger includes a server that provides remote access to databases.  The prim
 
 The remote interface is the way languages other than C/C++ are supported, and the interface for client processes in multiprocess C/C++ applications.
 
-The server can be embedded in the application or run from the command line.  For more details, see \ref command_line.
+The server can be embedded in an application or run from the command line.  To embed the RPC server in your application, pass <code>"sharing=on"</code> to ::wiredtiger_open.  Note that in this case, when your application exits, all client connections are forcibly closed.
+
+For details on running a standalone RPC server, see \ref command_line.
+
+\section processes_sharing Multiple Processes calling ::wiredtiger_open
+
When ::wiredtiger_open is called for a database, one of the following occurs:

-# No process has the database open, and it was closed cleanly.  In this case, the opening process becomes the primary process and the database is opened without recovery.
-# No process has the database open, but it was not closed cleanly.  In this case, the opening process becomes the primary process and recovery is run before the database is opened.  See \ref transaction_recovery for details.
-# Another process has the database open and is running the RPC server.  In this case, the opening process becomes a client.
-# Another process has the database open but is not running the RPC server.  In this case, the open fails.

  */
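The four cases above form a small decision table. The C sketch below encodes that table for illustration only. `enum open_outcome` and `classify_open` are hypothetical names. The three boolean inputs stand for state that ::wiredtiger_open would have to discover for itself, and how it does so is not described here.

```c
#include <assert.h>
#include <stdbool.h>

/* The four possible results of calling ::wiredtiger_open. */
enum open_outcome {
    OPEN_PRIMARY,           /* closed cleanly: open without recovery */
    OPEN_PRIMARY_RECOVER,   /* not closed cleanly: run recovery first */
    OPEN_CLIENT,            /* another process runs the RPC server */
    OPEN_FAIL               /* another process has it open, no RPC server */
};

/*
 * Hypothetical classification of an open request.  The inputs would come
 * from inspecting the database and any running primary process, which is
 * not modelled here.
 */
static enum open_outcome
classify_open(bool open_elsewhere, bool rpc_server_running,
    bool closed_cleanly)
{
    /* An RPC server can only be running inside a process that has it open. */
    assert(open_elsewhere || !rpc_server_running);

    if (open_elsewhere)
        return rpc_server_running ? OPEN_CLIENT : OPEN_FAIL;
    return closed_cleanly ? OPEN_PRIMARY : OPEN_PRIMARY_RECOVER;
}
```

When another process already has the database open, `closed_cleanly` does not affect the outcome. Only the primary process is responsible for recovery.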
diff --git a/docs/src/schema.dox b/docs/src/schema.dox
index 5535b47d81c..8f8257239d2 100644
--- a/docs/src/schema.dox
+++ b/docs/src/schema.dox
@@ -2,13 +2,11 @@
 
 /*! \page schema Schemas
 
-\section schema_intro Tables, Rows and Columns
-
-XXX rewrite from scratch, fill out with more details about how to use WT_SCHEMA for various things: native C structs, portability between programming languages, non-relational data such as multiple index keys per row, etc.
+The tables we have seen so far have all had simple key/value pairs for records.
 
-Lifted from http://en.wikipedia.org/wiki/Column-oriented_DBMS.
+\section schema_intro Tables, Rows and Columns
 
-A database program must show its data as two-dimensional tables, of columns and rows, but store it as one-dimensional strings. For example, a database might have this table.
+A table is a logical representation of data consisting of cells in rows and columns.  For example, a database might have this table.
 
 <table>
 <tr><th>EmpId</th><th>Lastname</th><th>Firstname</th><th>Salary</th></tr>
@@ -17,17 +15,17 @@ A database program must show its data as two-dimensional tables, of columns and
 <tr><td>3</td><td>Johnson</td><td>Cathy</td><td>44000</td></tr>
 </table>
 
-This simple table includes an employee identifier (EmpId), name fields (Lastname and Firstname) and a salary (Salary).
+This simple table includes an employee identifier, last name and first name, and a salary.
 
-A row-oriented database serializes all of the values in a row together, then the values in the next row, and so on:
A row-oriented database stores all of the values in a row together, then the values in the next row, and so on:
 
-<pre>k
+<pre>
       1,Smith,Joe,40000;
       2,Jones,Mary,50000;
       3,Johnson,Cathy,44000;
 </pre>
 
-A column-oriented database serializes all of the values of a column together, then the values of the next column, and so on:
+A column-oriented database stores all of the values of a column together, then the values of the next column, and so on:
 
 <pre>
       1,2,3;
@@ -36,11 +34,24 @@ A column-oriented database serializes all of the values of a column together, th
       40000,50000,44000;
 </pre>
 
-\section packing Serializing Values
+WiredTiger supports both storage formats, and can mix and match the storage of columns within a logical table.
+
+Applications describe the format of their data by supplying a *schema* to WT_SESSION::create_table.  This specifies how the application's data can be split into fields and mapped onto rows and columns.
+
+\section schema_columns Describing Columns
+
+XXX
+
+\section schema_indices Adding an Index
+
+XXX
+
+\section schema_mapping Column Storage
 
-\section schema_storage How Schemas are Stored
+XXX
 
-\section schema_startup Loading of Schemas during Startup
+\section schema_advanced Advanced Schemas
 
-\section schema_mapping Mapping Tables onto Access Methods
+- non-relational data such as multiple index keys per row
+- application-supplied extractors and collators may need to be registered before recovery can run.
  */
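The two layouts above can be reproduced with a short C sketch. It is an illustration only. The `struct employee` type and the `serialize_rows` and `serialize_columns` helpers are hypothetical, and they write comma-separated text rather than any WiredTiger on-disk format. They do not describe how WiredTiger actually stores column sets.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct employee {
    int empid;
    const char *lastname;
    const char *firstname;
    int salary;
};

/* Append printf-style output at *off, never writing past size bytes. */
static void
emit(char *buf, size_t size, size_t *off, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (*off >= size)
        return;                 /* buffer already full: drop output */
    va_start(ap, fmt);
    n = vsnprintf(buf + *off, size - *off, fmt, ap);
    va_end(ap);
    if (n > 0)
        *off += (size_t)n;
}

/* Row-oriented: all values of one row, then the next row, and so on. */
static void
serialize_rows(const struct employee *e, size_t n, char *buf, size_t size)
{
    size_t off = 0;

    assert(size > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++)
        emit(buf, size, &off, "%d,%s,%s,%d;\n",
            e[i].empid, e[i].lastname, e[i].firstname, e[i].salary);
}

/* Column-oriented: all values of one column, then the next column. */
static void
serialize_columns(const struct employee *e, size_t n, char *buf, size_t size)
{
    size_t off = 0;

    assert(size > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++)      /* EmpId column */
        emit(buf, size, &off, "%d%s", e[i].empid, i + 1 < n ? "," : ";\n");
    for (size_t i = 0; i < n; i++)      /* Lastname column */
        emit(buf, size, &off, "%s%s", e[i].lastname, i + 1 < n ? "," : ";\n");
    for (size_t i = 0; i < n; i++)      /* Firstname column */
        emit(buf, size, &off, "%s%s", e[i].firstname, i + 1 < n ? "," : ";\n");
    for (size_t i = 0; i < n; i++)      /* Salary column */
        emit(buf, size, &off, "%d%s", e[i].salary, i + 1 < n ? "," : ";\n");
}
```

A scan that needs only `Salary` touches one contiguous run of values in the column layout, but the row layout forces it to step over every other field. This is the trade-off that makes mixing the two formats within one logical table attractive.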
diff --git a/docs/src/transactions.dox b/docs/src/transactions.dox
index d63efba3a8e..8c595f68f83 100644
--- a/docs/src/transactions.dox
+++ b/docs/src/transactions.dox
@@ -2,5 +2,40 @@
 
 /*! \page transactions Transactions
 
-\section acid The ACID properties.
+Note: the initial release of WiredTiger does not include support for transactions.  This page describes the behavior of a future release.
+
\section transactions_acid The ACID Properties
+
+Transactions provide a powerful abstraction for multiple threads to operate on data concurrently because they have the following properties:
+
+- Atomicity: all or none of a transaction is completed.
+- Consistency: if each transaction maintains some property when considered separately, then the combined effect of executing the transactions concurrently will maintain the same property.
+- Isolation: developers can reason about transactions independently.
+- Durability: once a transaction commits, its updates are saved.
+
+\section transactions_api The Transaction API
+
To use transactions, the database must be created with transaction support enabled.  This is done by passing the configuration string <code>"transactional"</code> to ::wiredtiger_open when creating the database.
+
In WiredTiger, the transactional context is managed by the WT_SESSION class.  Applications call WT_SESSION::begin_transaction to start a new transaction, which is only permitted when no cursors are open.  Operations performed with that WT_SESSION handle are then part of the transaction.  The transaction ends with a call to WT_SESSION::commit_transaction, which makes its effects permanent, or to WT_SESSION::rollback_transaction, which discards them.  Both calls implicitly close any open cursors.
+
+When transactions are used, operations may fail with additional errors, including ::WT_DEADLOCK, ::WT_UPDATE_CONFLICT, ... [XXX].
+
+\section transactions_cc Concurrency Control
+
+WiredTiger uses a timestamp-based optimistic concurrency control algorithm.  This avoids the bottleneck of a centralized lock manager and expensive graph searching to identify deadlock cycles. [XXX]
+
+\section transaction_isolation Isolation Levels
+
+The default isolation level is <code>serializable</code>, which means that the concurrent execution of committed transactions is equivalent to executing the transactions in some serial order.
+
+Weaker isolation levels are also provided, including <code>repeatable read</code>, which permits phantoms, <code>snapshot isolation</code>, which permits write skew, <code>read committed</code>, which permits lost updates and always returns the most recently committed changes, and <code>read uncommitted</code>, which always reads the most recent version of data, regardless of whether it is committed.
+
+\section transaction_recovery Recovery
+
+Recovery is run automatically during ::wiredtiger_open when required.  See \ref processes_sharing for details.
+
+Recovery works by using a database log that contains a record of the actions of all transactions.  Recovery first finds the last complete checkpoint, and then scans forward through the log from that point to determine which transactions committed after the checkpoint.  All actions are rolled forward from the checkpoint so that the in-memory tree matches its state when the crash occurred.  Then the "losing" transactions (those that did not commit) are rolled back to return the database to a consistent state.
+
+This suggests the importance of regular checkpoints: they limit the amount of work required during recovery, which speeds up the ::wiredtiger_open call.  See WT_SESSION::checkpoint for information about triggering checkpoints.
  */
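The two recovery passes described above can be sketched over a toy in-memory log. Everything in this C sketch is hypothetical and heavily simplified, including the `struct log_rec` format, the integer-array "database", and the `MAX_TXN` bound. The sketch also assumes that checkpoints are only taken while no transaction is active, so nothing before the last checkpoint ever needs to be undone. It is not WiredTiger's log format or recovery code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum rec_type { REC_CHECKPOINT, REC_UPDATE, REC_COMMIT };

/* A simplified log record: updates carry before and after images. */
struct log_rec {
    enum rec_type type;
    int txn;        /* transaction id (unused for checkpoints) */
    int key;        /* index into the data array (updates only) */
    int before;     /* value before the update: used to roll back */
    int after;      /* value after the update: used to roll forward */
};

#define MAX_TXN 64  /* assumed upper bound on transaction ids */

/*
 * Hypothetical redo/undo recovery.  data[] holds the state as of the last
 * checkpoint.  Assumes checkpoints are taken only when no transaction is
 * active, so only records after the last checkpoint are considered.
 */
static void
recover(const struct log_rec *log, size_t nlog, int *data)
{
    bool committed[MAX_TXN] = { false };
    size_t start = 0, i;

    /* 1. Find the last complete checkpoint. */
    for (i = 0; i < nlog; i++)
        if (log[i].type == REC_CHECKPOINT)
            start = i + 1;

    /* 2. Scan forward to find which transactions committed. */
    for (i = start; i < nlog; i++)
        if (log[i].type == REC_COMMIT) {
            assert(log[i].txn >= 0 && log[i].txn < MAX_TXN);
            committed[log[i].txn] = true;
        }

    /* 3. Roll forward every update, committed or not (redo). */
    for (i = start; i < nlog; i++)
        if (log[i].type == REC_UPDATE) {
            assert(log[i].txn >= 0 && log[i].txn < MAX_TXN);
            data[log[i].key] = log[i].after;
        }

    /* 4. Roll back the losers' updates in reverse log order (undo). */
    for (i = nlog; i > start; i--)
        if (log[i - 1].type == REC_UPDATE && !committed[log[i - 1].txn])
            data[log[i - 1].key] = log[i - 1].before;
}
```

The redo pass replays the log exactly as written, including the losers' updates. The undo pass then walks backwards so that each before-image restores the value that preceded it, even when a losing transaction updated the same item more than once.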