diff options
Diffstat (limited to 'src/docs/schema.dox')
-rw-r--r-- | src/docs/schema.dox | 206 |
1 files changed, 206 insertions, 0 deletions
diff --git a/src/docs/schema.dox b/src/docs/schema.dox new file mode 100644 index 00000000000..01b9572c0ff --- /dev/null +++ b/src/docs/schema.dox @@ -0,0 +1,206 @@ +/*! @page schema Schemas + +While many tables have simple key/value pairs for records, WiredTiger +also supports more complex data patterns. + +@section schema_intro Tables, rows and columns + +A table is a logical representation of data consisting of cells in rows and +columns. For example, a database might have this table. + +<table> +<tr><th>EmpId</th><th>Lastname</th><th>Firstname</th><th>Salary</th></tr> +<tr><td>1</td><td>Smith</td><td>Joe</td><td>40000</td></tr> +<tr><td>2</td><td>Jones</td><td>Mary</td><td>50000</td></tr> +<tr><td>3</td><td>Johnson</td><td>Cathy</td><td>44000</td></tr> +</table> + +This simple table includes an employee identifier, last name and first +name, and a salary. + +A row-oriented database would store all of the values in a row together, +then the values in the next row, and so on: + +<pre> + 1,Smith,Joe,40000; + 2,Jones,Mary,50000; + 3,Johnson,Cathy,44000; +</pre> + +A column-oriented database stores all of the values of a column together, +then the values of the next column, and so on: + +<pre> + 1,2,3; + Smith,Jones,Johnson; + Joe,Mary,Cathy; + 40000,50000,44000; +</pre> + +WiredTiger supports both storage formats, and can mix and match the storage +of columns within a logical table. + +Applications describe the format of their data by supplying a schema to +WT_SESSION::create. This specifies how the application's data can be split +into fields and mapped onto rows and columns. + +@section schema_types Column types + +By default, WiredTiger works as a traditional key/value store, where the +keys and values are raw byte arrays accessed using a WT_ITEM structure. +Keys and values may be up to (4GB - 512B) bytes in size, but depending +on how @ref WT_SESSION::create "maximum item sizes" are configured, +large key and value items will be stored on overflow pages. + +See @subpage keyvalue for more details on raw key / value items. + +The schema layer allows key and value types to be chosen from a list, +or composite keys or values made up of columns with any combination of +types. The size (4GB - 512B) byte limit on keys and values still +applies. + +WiredTiger's uses format strings similar to those specified in the Python +struct module to describe the types of columns in a table: + http://docs.python.org/library/struct + +@section schema_format_types Format types + +<table> +@hrow{Format, C Type, Java type, Python type, Standard size} +@row{x, pad byte, N/A, N/A, 1} +@row{b, signed char, byte, integer, 1} +@row{B, unsigned char, byte, integer, 1} +@row{h, short, short, integer, 2} +@row{H, unsigned short, short, integer, 2} +@row{i, int, int, integer, 4} +@row{I, unsigned int, int, integer, 4} +@row{l, long, int, integer, 4} +@row{L, unsigned long, int, integer, 4} +@row{q, long long, long, integer, 8} +@row{Q, unsigned long long, long, integer, 8} +@row{r, uint64_t, long, integer, 8} +@row{s, char[], String, string, fixed length} +@row{S, char[], String, string, variable} +@row{t, unsigned char, byte, integer, fixed bit length} +@row{u, WT_ITEM *, byte[], string, variable} +</table> + +The <code>'r'</code> type is used for record number keys in column stores. + +The <code>'S'</code> type is encoded as a C language string terminated by a +NUL character. + +The <code>'t'</code> type is used for fixed-length bit field values. If +it is preceded by a size, that indicates the number of bits to store, +between 1 and 8. That number of low-order bits will be stored in the +table. The default is a size of 1 bit: that is, a boolean. The +application must always use an <code>unsigned char</code> type (or +equivalently, <code>uint8_t</code>) for calls to WT_CURSOR::set_value, +and a pointer to the same for calls to WT_CURSOR::get_value. If a bit +field value is combined with other types in a packing format, it is +equivalent to <code>'B'</code>, and a full byte is used to store it. + +When referenced by a record number (that is, a key format of \c 'r'), +the <code>'t'</code> type will be stored in a fixed-length column-store, +and will not have a out-of-band value to indicate the record does not +exist. In this case, a 0 byte value is used to indicate the record does +not exist. This means removing a record with WT_CURSOR::remove is +equivalent to storing a value of 0 in the record with WT_CURSOR::update +(and storing a value of 0 in the record will cause cursor scans to +skip the record). Additionally, creating a record past the end of the +table or file implies the creation of any missing intermediate records, +with byte values of 0. + +The <code>'u'</code> type is for raw byte arrays: if it appears at the end +of a format string (including in the default <code>"u"</code> format for +untyped tables), the size is not stored explicitly. When <code>'u'</code> +appears within a format string, the size is stored as a 32-bit integer in +the same byte order as the rest of the format string, followed by the data. + +There is a default collator that gives lexicographic (byte-wise) +comparisons, and the default encoding is designed so that lexicographic +ordering of encoded keys is usually the expected ordering. For example, +the variable-length encoding of integers is designed so that they have the +natural integer ordering under the default collator. + +See @subpage packing for details of WiredTiger's packing format. + +WiredTiger can also be extended with @ref WT_COLLATOR. + +@section schema_data_access Columns in key and values + +Every table has a key format and a value format as describe in @ref +schema_types. These types are configured when the table is created by +passing \c key_format and \c value_format keys to WT_SESSION::create. + +Cursors for a table have the same key format as the table itself. The key +columns of a cursor are set with WT_CURSOR::set_key and accessed with +WT_CURSOR::get_key. WT_CURSOR::set_key is analogous to \c printf, and takes +a list of values in the order the key columns are configured in \c +key_format. The columns values are accessed with WT_CURSOR::get_key, which +is analogous to \c scanf, and takes a list of pointers to values in the same +order. + +Cursors for a table have the same value format as the table, unless a +projection is specified to WT_SESSION::open_cursor. WT_CURSOR::set_value is +used to set value columns, and WT_CURSOR::get_value is used to get value +columns. + +@section schema_columns Describing columns + +The columns in a table can be assigned names by passing a \c columns key to +WT_SESSION::create. The column names are assigned first to the columns in +the \c key_format, and then to the columns in \c value_format. There must be +a name for every column, and no column names may be repeated. + +@section schema_column_groups Storing groups of columns together + +Once column names are assigned, they can be used to configure column groups, +where groups of columns are stored in separate files. + +There are two steps involved in setting up column groups: first, pass a list +of names for the column groups in the \c colgroups configuration key to +WT_SESSION::create. Then make a call to WT_SESSION::create for each column +group, using the URI <code>colgroup:\<table\>:\<colgroup name\></code> and a +\c columns key in the configuration. Columns can be stored in multiple +column groups, but all value columns must appear in at least on column group. + +Column groups have the same key as the table. This is particularly useful +for column stores, because record numbers are not stored explicitly on disk, +so there is no repetition of keys across multiple files. Also note that key +columns cannot be stored in column group values: they can be retrieved with +WT_CURSOR::get_key. + +@section schema_indices Adding an index + +Schema columns can also be used to configure indices on tables. To create an +index, call WT_SESSION::create using the URI +<code>index:\<table\>:\<index name\></code> and include a \c columns key in +the configuration. WiredTiger updates all indices for a table whenever the +table is modified. + +A cursor can be opened on an index by passing the index URI to +WT_SESSION::open_cursor. Index cursors use the specified index key columns +for WT_CURSOR::get_key and WT_CURSOR::set_key, and by default return all of +the table value columns in WT_CURSOR::get_value. Index cursors are +read-only: they cannot be used to perform updates. + +@section schema_examples Code samples + +The code below is taken from the complete example program +@ex_ref{ex_schema.c}. + +@snippet ex_schema.c schema decl +@snippet ex_schema.c schema work + +The code below is taken from the complete example program +@ex_ref{ex_call_center.c}. + +@snippet ex_call_center.c call-center decl +@snippet ex_call_center.c call-center work + +@todo new section: schema_advanced Advanced Schemas +@todo non-relational data such as multiple index keys per row +@todo application-supplied extractors and collators need to be +registered before recovery can run. + */ |