SERVER-49327 Architecture Guide: Time-Series Collections

author: Benety Goh <benety@mongodb.com> 2021-01-29 14:51:28 -0500
committer: Evergreen Agent <no-reply@evergreen.mongodb.com> 2021-01-30 00:29:30 +0000
commit: 92d10390cbab0206c05e24c40d9284e71b282dd6 (patch)
tree: c8b8adbeb6dd5b44d88b57e3353ffba60781f65f /src/mongo/db
parent: e5f0892d0d74eb4b90707a0fd5b19e3220a8344b (diff)
download: mongo-92d10390cbab0206c05e24c40d9284e71b282dd6.tar.gz
1 files changed, 91 insertions, 0 deletions
diff --git a/src/mongo/db/timeseries/README.md b/src/mongo/db/timeseries/README.md
new file mode 100644
index 00000000000..d0c12629d8e
--- /dev/null
+++ b/src/mongo/db/timeseries/README.md
@@ -0,0 +1,91 @@
+# Time-Series Collections
+
+MongoDB supports a new collection type for storing time-series data with the [timeseries](../commands/create.idl)
+collection option. A time-series collection presents a simple interface for inserting and querying
+measurements while organizing the actual data in buckets.
+
+A minimally configured time-series collection is defined by providing the [timeField](timeseries.idl)
+at creation. Optionally, a meta-data field may also be specified to help group
+measurements in the buckets. MongoDB also supports an expiration mechanism on measurements through
+the `expireAfterSeconds` option.
+
+A time-series collection `mytscoll` in the `mydb` database is represented in the [catalog](../catalog/README.md) by a
+combination of a view and a system collection:
+* The view `mydb.mytscoll` is defined with the bucket collection as its source collection and has
+the following properties:
+    * Writes (inserts only) are allowed on the view. Every document inserted must contain a time field.
+    * Querying the view implicitly unwinds the data in the underlying bucket collection to return
+      documents in their original non-bucketed form.
+        * The aggregation stage [$_internalUnpackBucket](../pipeline/document_source_internal_unpack_bucket.h) is used to
+          unwind the bucket data for the view.
+* The system collection has the namespace `mydb.system.buckets.mytscoll` and is where the actual
+  data is stored.
+    * Each document in the bucket collection represents a set of time-series data within a period of time.
+    * If a meta-data field is defined at creation time, this will be used to organize the buckets so that
+      all measurements within a bucket have a common meta-data value.
+    * Besides the time range, buckets are also constrained by the total number and size of measurements.
+
+## Bucket Collection Schema
+
+```
+{
+    _id: <Object ID with time component equal to first measurement in this bucket>,
+    control: {
+        // <Some statistics on the measurements such as min/max values of data fields>
+        version: 1,  // Version of bucket schema. Currently fixed at 1 since this is the
+                     // first iteration of time-series collections.
+        min: {
+            <time field>: <time of first measurement in this bucket>,
+            <field0>: <minimum value of 'field0' across all measurements>,
+            <field1>: <minimum value of 'field1' across all measurements>,
+            ...
+        },
+        max: {
+            <time field>: <time of last measurement in this bucket>,
+            <field0>: <maximum value of 'field0' across all measurements>,
+            <field1>: <maximum value of 'field1' across all measurements>,
+            ...
+        },
+    },
+    meta: <meta-data field (if specified at creation) value common to all measurements in this bucket>,
+    data: {
+        <time field>: {
+            '0': <time of first measurement>,
+            '1': <time of second measurement>,
+            ...
+            '<n-1>': <time of n-th measurement>,
+        },
+        <field0>: {
+            '0': <value of 'field0' in first measurement>,
+            '1': <value of 'field0' in second measurement>,
+            ...
+        },
+        <field1>: {
+            '0': <value of 'field1' in first measurement>,
+            '1': <value of 'field1' in second measurement>,
+            ...
+        },
+        ...
+    }
+}
+```
+
+See:
+[MongoDB Blog: Time Series Data and MongoDB: Part 2 - Schema Design Best Practices](https://www.mongodb.com/blog/post/time-series-data-and-mongodb-part-2-schema-design-best-practices)
+
+# Glossary
+**bucket**: A group of measurements with the same meta-data over a limited period of time.
+
+**bucket collection**: A system collection used for storing the buckets underlying a time-series
+collection. Replication, sharding and indexing are all done at the level of buckets in the bucket
+collection.
+
+**measurement**: A set of related key-value pairs at a specific time.
+
+**meta-data**: The key-value pairs of a time-series that rarely change over time and serve to
+identify the time-series as a whole.
+
+**time-series**: A sequence of measurements over a period of time.
+
+**time-series collection**: A collection type representing a writable non-materialized view that
+allows storing and querying a number of time-series, each with different meta-data.
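
Illustration (not part of the patch): the README above says that querying the view unwinds each bucket back into documents in their original non-bucketed form. The Python sketch below shows that idea against the documented bucket schema. It is not the server's `$_internalUnpackBucket` implementation. The function name `unpack_bucket`, the field names `t`, `temp` and `tags`, and the handling of a column that lacks an entry for some measurement are all assumptions made for the example.

```python
from datetime import datetime, timezone


def unpack_bucket(bucket, time_field, meta_field=None):
    """Rebuild individual measurements from one bucket document.

    Hypothetical helper: it only mirrors the schema described in the README.
    """
    data = bucket["data"]
    # Column keys are stringified positions ('0', '1', ...). The time column
    # is assumed to hold an entry for every measurement in the bucket.
    indexes = sorted(data[time_field], key=int)
    measurements = []
    for idx in indexes:
        doc = {}
        # The bucket stores the shared meta-data value under 'meta'; the
        # measurement exposes it under the user-chosen meta-data field name.
        if meta_field is not None and "meta" in bucket:
            doc[meta_field] = bucket["meta"]
        for field, column in data.items():
            # Assumption: a column may lack an entry for some measurement.
            if idx in column:
                doc[field] = column[idx]
        measurements.append(doc)
    return measurements


t0 = datetime(2021, 1, 29, 12, 0, 0, tzinfo=timezone.utc)
t1 = datetime(2021, 1, 29, 12, 0, 30, tzinfo=timezone.utc)

# Sample bucket following the documented layout ('_id' omitted for brevity).
bucket = {
    "control": {
        "version": 1,
        "min": {"t": t0, "temp": 20.5},
        "max": {"t": t1, "temp": 21.0},
    },
    "meta": {"sensor": "a"},
    "data": {
        "t": {"0": t0, "1": t1},
        "temp": {"0": 20.5, "1": 21.0},
    },
}

rows = unpack_bucket(bucket, time_field="t", meta_field="tags")
for row in rows:
    print(row)
```

Each returned dictionary corresponds to one measurement as it would have been inserted through the view: the time field, the data fields, and the common meta-data value.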