Diffstat (limited to 'src/mongo/db/timeseries')
-rw-r--r-- | src/mongo/db/timeseries/README.md | 91 |
1 files changed, 91 insertions, 0 deletions
diff --git a/src/mongo/db/timeseries/README.md b/src/mongo/db/timeseries/README.md
new file mode 100644
index 00000000000..d0c12629d8e
--- /dev/null
+++ b/src/mongo/db/timeseries/README.md
@@ -0,0 +1,91 @@
+# Time-Series Collections
+
+MongoDB supports a new collection type for storing time-series data with the [timeseries](../commands/create.idl)
+collection option. A time-series collection presents a simple interface for inserting and querying
+measurements while organizing the actual data in buckets.
+
+A minimally configured time-series collection is defined by providing the [timeField](timeseries.idl)
+at creation. Optionally, a meta-data field may also be specified to help group
+measurements in the buckets. MongoDB also supports an expiration mechanism on measurements through
+the `expireAfterSeconds` option.
+
+A time-series collection `mytscoll` in the `mydb` database is represented in the [catalog](../catalog/README.md) by a
+combination of a view and a system collection:
+* The view `mydb.mytscoll` is defined with the bucket collection as the source collection with
+certain properties:
+    * Writes (inserts only) are allowed on the view. Every document inserted must contain a time field.
+    * Querying the view implicitly unwinds the data in the underlying bucket collection to return
+      documents in their original non-bucketed form.
+    * The aggregation stage [$_internalUnpackBucket](../pipeline/document_source_internal_unpack_bucket.h) is used to
+      unwind the bucket data for the view.
+* The system collection has the namespace `mydb.system.buckets.mytscoll` and is where the actual
+  data is stored.
+    * Each document in the bucket collection represents a set of time-series data within a period of time.
+    * If a meta-data field is defined at creation time, this will be used to organize the buckets so that
+      all measurements within a bucket have a common meta-data value.
+    * Besides the time range, buckets are also constrained by the total number and size of measurements.
+
+## Bucket Collection Schema
+
+```
+{
+    _id: <Object ID with time component equal to first measurement in this bucket>,
+    control: {
+        // <Some statistics on the measurements such as min/max values of data fields>
+        version: 1, // Version of bucket schema. Currently fixed at 1 since this is the
+                    // first iteration of time-series collections.
+        min: {
+            <time field>: <time of first measurement in this bucket>,
+            <field0>: <minimum value of 'field0' across all measurements>,
+            <field1>: <minimum value of 'field1' across all measurements>,
+            ...
+        },
+        max: {
+            <time field>: <time of last measurement in this bucket>,
+            <field0>: <maximum value of 'field0' across all measurements>,
+            <field1>: <maximum value of 'field1' across all measurements>,
+            ...
+        },
+    },
+    meta: <meta-data field (if specified at creation) value common to all measurements in this bucket>,
+    data: {
+        <time field>: {
+            '0': <time of first measurement>,
+            '1': <time of second measurement>,
+            ...
+            '<n-1>': <time of n-th measurement>,
+        },
+        <field0>: {
+            '0': <value of 'field0' in first measurement>,
+            '1': <value of 'field0' in second measurement>,
+            ...
+        },
+        <field1>: {
+            '0': <value of 'field1' in first measurement>,
+            '1': <value of 'field1' in second measurement>,
+            ...
+        },
+        ...
+    }
+}
+```
+
+See:
+[MongoDB Blog: Time Series Data and MongoDB: Part 2 - Schema Design Best Practices](https://www.mongodb.com/blog/post/time-series-data-and-mongodb-part-2-schema-design-best-practices)
+
+# Glossary
+**bucket**: A group of measurements with the same meta-data over a limited period of time.
+
+**bucket collection**: A system collection used for storing the buckets underlying a time-series
+collection. Replication, sharding and indexing are all done at the level of buckets in the bucket
+collection.
+
+**measurement**: A set of related key-value pairs at a specific time.
+
+**meta-data**: The key-value pairs of a time-series that rarely change over time and serve to
+identify the time-series as a whole.
+
+**time-series**: A sequence of measurements over a period of time.
+
+**time-series collection**: A collection type representing a writable non-materialized view that
+allows storing and querying a number of time-series, each with different meta-data.
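
The bucket schema in the README above can be illustrated with a small Python sketch that packs measurements sharing one meta-data value into a bucket-shaped dictionary and unpacks them again. This is only an illustration of the document layout, not the server's implementation: the names `pack_bucket` and `unpack_bucket` are hypothetical, the `_id` field is omitted, the limits on bucket size and time range are ignored, and values within a field are assumed to be mutually comparable.

```python
from datetime import datetime, timezone


def pack_bucket(measurements, time_field, meta_field=None):
    """Pack measurements into a dict shaped like the bucket schema (hypothetical helper)."""
    if not measurements:
        raise ValueError("a bucket needs at least one measurement")
    # Order by time so index '0' holds the first measurement in the bucket.
    ordered = sorted(measurements, key=lambda m: m[time_field])
    bucket = {"control": {"version": 1, "min": {}, "max": {}}, "data": {}}
    if meta_field is not None:
        metas = [m[meta_field] for m in ordered]
        if any(meta != metas[0] for meta in metas):
            raise ValueError("all measurements in a bucket must share one meta-data value")
        bucket["meta"] = metas[0]
    lo, hi = bucket["control"]["min"], bucket["control"]["max"]
    for i, measurement in enumerate(ordered):
        for key, value in measurement.items():
            if key == meta_field:
                continue  # stored once per bucket under 'meta'
            # Column-oriented storage: one sub-document per field, keyed by index.
            bucket["data"].setdefault(key, {})[str(i)] = value
            lo[key] = value if key not in lo else min(lo[key], value)
            hi[key] = value if key not in hi else max(hi[key], value)
    return bucket


def unpack_bucket(bucket, meta_field=None):
    """Rebuild the original measurements, loosely analogous to unwinding a bucket."""
    indexes = sorted({int(i) for column in bucket["data"].values() for i in column})
    documents = []
    for i in indexes:
        doc = {}
        if meta_field is not None and "meta" in bucket:
            doc[meta_field] = bucket["meta"]
        for key, column in bucket["data"].items():
            if str(i) in column:
                doc[key] = column[str(i)]
        documents.append(doc)
    return documents


readings = [
    {"ts": datetime(2021, 1, 1, 0, 0, 10, tzinfo=timezone.utc), "sensor": "a", "temp": 21.0},
    {"ts": datetime(2021, 1, 1, 0, 0, 0, tzinfo=timezone.utc), "sensor": "a", "temp": 20.5},
]
example_bucket = pack_bucket(readings, time_field="ts", meta_field="sensor")
restored = unpack_bucket(example_bucket, meta_field="sensor")
```

Keeping each field in its own indexed sub-document mirrors the `data` layout in the schema, and `control.min` and `control.max` summarize every field so a reader can see what the per-bucket statistics contain.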