author: Benety Goh <benety@mongodb.com> 2021-01-29 14:51:28 -0500
committer: Evergreen Agent <no-reply@evergreen.mongodb.com> 2021-01-30 00:29:30 +0000
commit: 92d10390cbab0206c05e24c40d9284e71b282dd6
tree: c8b8adbeb6dd5b44d88b57e3353ffba60781f65f /src/mongo/db
parent: e5f0892d0d74eb4b90707a0fd5b19e3220a8344b
SERVER-49327 Architecture Guide: Time-Series Collections
Diffstat (limited to 'src/mongo/db')
-rw-r--r-- src/mongo/db/timeseries/README.md | 91
1 files changed, 91 insertions, 0 deletions
diff --git a/src/mongo/db/timeseries/README.md b/src/mongo/db/timeseries/README.md
new file mode 100644
index 00000000000..d0c12629d8e
--- /dev/null
+++ b/src/mongo/db/timeseries/README.md
@@ -0,0 +1,91 @@
+# Time-Series Collections
+
+MongoDB supports a new collection type for storing time-series data with the [timeseries](../commands/create.idl)
+collection option. A time-series collection presents a simple interface for inserting and querying
+measurements while organizing the actual data in buckets.
+
+A minimally configured time-series collection is defined by providing the [timeField](timeseries.idl)
+at creation. Optionally, a meta-data field may also be specified to help group
+measurements in the buckets. MongoDB also supports an expiration mechanism on measurements through
+the `expireAfterSeconds` option.
+
+A time-series collection `mytscoll` in the `mydb` database is represented in the [catalog](../catalog/README.md) by a
+combination of a view and a system collection:
+* The view `mydb.mytscoll` is defined with the bucket collection as its source collection and has
+  the following properties:
+    * Writes (inserts only) are allowed on the view. Every inserted document must contain the time field.
+    * Querying the view implicitly unwinds the data in the underlying bucket collection to return
+      documents in their original non-bucketed form.
+    * The aggregation stage [$_internalUnpackBucket](../pipeline/document_source_internal_unpack_bucket.h) is used to
+      unwind the bucket data for the view.
+* The system collection has the namespace `mydb.system.buckets.mytscoll` and is where the actual
+  data is stored.
+    * Each document in the bucket collection represents a set of time-series data within a period of time.
+    * If a meta-data field is defined at creation time, it is used to organize the buckets so that
+      all measurements within a bucket have a common meta-data value.
+    * Besides the time range, buckets are also constrained by the total number and size of measurements.
+
+## Bucket Collection Schema
+
+```
+{
+    _id: <Object ID with time component equal to first measurement in this bucket>,
+    control: {
+        // <Some statistics on the measurements such as min/max values of data fields>
+        version: 1,  // Version of bucket schema. Currently fixed at 1 since this is the
+                     // first iteration of time-series collections.
+        min: {
+            <time field>: <time of first measurement in this bucket>,
+            <field0>: <minimum value of 'field0' across all measurements>,
+            <field1>: <minimum value of 'field1' across all measurements>,
+            ...
+        },
+        max: {
+            <time field>: <time of last measurement in this bucket>,
+            <field0>: <maximum value of 'field0' across all measurements>,
+            <field1>: <maximum value of 'field1' across all measurements>,
+            ...
+        },
+    },
+    meta: <meta-data field (if specified at creation) value common to all measurements in this bucket>,
+    data: {
+        <time field>: {
+            '0': <time of first measurement>,
+            '1': <time of second measurement>,
+            ...
+            '<n-1>': <time of n-th measurement>,
+        },
+        <field0>: {
+            '0': <value of 'field0' in first measurement>,
+            '1': <value of 'field0' in second measurement>,
+            ...
+        },
+        <field1>: {
+            '0': <value of 'field1' in first measurement>,
+            '1': <value of 'field1' in second measurement>,
+            ...
+        },
+        ...
+    }
+}
+```
+
+See:
+[MongoDB Blog: Time Series Data and MongoDB: Part 2 - Schema Design Best Practices](https://www.mongodb.com/blog/post/time-series-data-and-mongodb-part-2-schema-design-best-practices)
+
+# Glossary
+**bucket**: A group of measurements with the same meta-data over a limited period of time.
+
+**bucket collection**: A system collection used for storing the buckets underlying a time-series
+collection. Replication, sharding and indexing are all done at the level of buckets in the bucket
+collection.
+
+**measurement**: A set of related key-value pairs at a specific time.
+
+**meta-data**: The key-value pairs of a time-series that rarely change over time and serve to
+identify the time-series as a whole.
+
+**time-series**: A sequence of measurements over a period of time.
+
+**time-series collection**: A collection type representing a writable non-materialized view that
+allows storing and querying a number of time-series, each with different meta-data.
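
The bucket layout and the view's unwinding described in the README above can be illustrated with a small, self-contained Python sketch. It is not the server's implementation: the names `pack`, `unpack`, `max_count`, and `max_span` are hypothetical, the limits are arbitrary, the `_id` field is omitted, and the size-based bucket limit is not modelled. It only shows how measurements sharing a meta-data value could be stored column-wise with `control.min`/`control.max` summaries and then rebuilt in their original form.

```python
from datetime import datetime, timedelta


def pack(measurements, time_field, meta_field=None,
         max_count=3, max_span=timedelta(hours=1)):
    """Pack measurements into bucket-shaped dicts (limits are assumptions)."""
    buckets = []        # every bucket, in creation order
    open_by_meta = {}   # repr(meta value) -> bucket still accepting measurements
    for m in sorted(measurements, key=lambda d: d[time_field]):
        meta = m.get(meta_field) if meta_field else None
        bucket = open_by_meta.get(repr(meta))
        full = bucket is not None and (
            len(bucket["data"][time_field]) >= max_count
            or m[time_field] - bucket["control"]["min"][time_field] > max_span
        )
        if bucket is None or full:
            # Start a new bucket for this meta-data value.
            bucket = {"control": {"version": 1, "min": {}, "max": {}}, "data": {}}
            if meta_field is not None:
                bucket["meta"] = meta
            buckets.append(bucket)
            open_by_meta[repr(meta)] = bucket
        # Position of this measurement within the bucket, as a string key.
        index = str(len(bucket["data"].get(time_field, {})))
        lo, hi = bucket["control"]["min"], bucket["control"]["max"]
        for name, value in m.items():
            if name == meta_field:
                continue  # stored once per bucket under "meta"
            bucket["data"].setdefault(name, {})[index] = value
            lo[name] = value if name not in lo else min(lo[name], value)
            hi[name] = value if name not in hi else max(hi[name], value)
    return buckets


def unpack(bucket, time_field, meta_field=None):
    """Rebuild the original measurements from one bucket-shaped dict."""
    docs = []
    for index in sorted(bucket["data"][time_field], key=int):
        doc = {name: column[index]
               for name, column in bucket["data"].items() if index in column}
        if meta_field is not None:
            doc[meta_field] = bucket["meta"]
        docs.append(doc)
    return docs


t0 = datetime(2021, 1, 29, 14, 0)
readings = [
    {"ts": t0, "sensor": "a", "temp": 20.5},
    {"ts": t0 + timedelta(minutes=1), "sensor": "b", "temp": 18.0},
    {"ts": t0 + timedelta(minutes=2), "sensor": "a", "temp": 21.0},
]
buckets = pack(readings, "ts", "sensor")
restored = [doc for b in buckets for doc in unpack(b, "ts", "sensor")]
```

In this sketch the two readings for sensor `a` share one bucket and the reading for sensor `b` gets its own, mirroring the rule that all measurements in a bucket have a common meta-data value. Unpacking every bucket yields the same documents that were inserted, which is the role the README assigns to the view.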