summaryrefslogtreecommitdiff
path: root/src/mongo/db/s/README.md
diff options
context:
space:
mode:
authorPierlauro Sciarelli <pierlauro.sciarelli@mongodb.com>2021-12-17 16:01:00 +0000
committerEvergreen Agent <no-reply@evergreen.mongodb.com>2021-12-17 16:29:36 +0000
commitd2d933fbd2cb1a2bb4befcdbcbef56814447c7e7 (patch)
tree9fac7bcfeec4ab04ecd16b8dcee752d1805fb032 /src/mongo/db/s/README.md
parentca8ba5328f177be1a27b22808ff54950ffa3ba1b (diff)
downloadmongo-d2d933fbd2cb1a2bb4befcdbcbef56814447c7e7.tar.gz
SERVER-49516 Architecture Guide updates for Sharded collections support all ddl operations
Diffstat (limited to 'src/mongo/db/s/README.md')
-rw-r--r--src/mongo/db/s/README.md67
1 files changed, 67 insertions, 0 deletions
diff --git a/src/mongo/db/s/README.md b/src/mongo/db/s/README.md
index 48ad03ea795..262050886a0 100644
--- a/src/mongo/db/s/README.md
+++ b/src/mongo/db/s/README.md
@@ -1181,3 +1181,70 @@ drivers will not specify this flag at all, so the behavior remains the same.
#### Code references
* [isMaster command](https://github.com/mongodb/mongo/blob/r4.8.0-alpha/src/mongo/s/commands/cluster_is_master_cmd.cpp#L248) for mongos.
* [hello command](https://github.com/mongodb/mongo/blob/r4.8.0-alpha/src/mongo/s/commands/cluster_is_master_cmd.cpp#L64) for mongos.
+
+# Cluster DDL operations
+
+[Data Definition Language](https://en.wikipedia.org/wiki/Data_definition_language) (DDL) operations are operations that change the metadata;
+some examples of DDLs are create/drop database or create/rename/drop collection.
+
+Metadata are tracked in the two main MongoDB catalogs:
+- *[Local catalog](https://github.com/mongodb/mongo/blob/master/src/mongo/db/catalog/README.md#the-catalog)*: present on each shard, keeping
+track of databases/collections/indexes the shard owns or has knowledge of.
+- *Sharded Catalog*: residing on the config server, keeping track of the metadata of databases and sharded collections for which it serves
+as the authoritative source of information.
+
+## Sharding DDL Coordinator
+The [ShardingDDLCoordinator](https://github.com/mongodb/mongo/blob/106b96548c5214a8e246a1cf6ac005a3985c16d4/src/mongo/db/s/sharding_ddl_coordinator.h#L47-L191)
+is the main component of the DDL infrastructure for sharded clusters: it is an abstract class whose concrete implementations have the
+responsibility of coordinating the different DDL operations between shards and the config server in order to keep the two catalogs
+consistent. When a DDL request is received by a router, it gets forwarded to the [primary shard](https://docs.mongodb.com/manual/core/sharded-cluster-shards/#primary-shard)
+of the targeted database. For the sake of clarity, createDatabase is the only DDL operation that cannot possibly get forwarded to the
+database primary but is instead routed to the config server, as the database may not exist yet.
+
+##### Serialization and joinability of DDL operations
+When a primary shard receives a DDL request, it tries to construct a DDL coordinator performing the following steps:
+- Acquire the [distributed lock for the database](https://github.com/mongodb/mongo/blob/908e394d39b223ce498fde0d40e18c9200c188e2/src/mongo/db/s/sharding_ddl_coordinator.cpp#L155). This ensures that at most one DDL operation at a time will run for namespaces belonging to the same database on that particular primary node.
+- Acquire the distributed lock for the [collection](https://github.com/mongodb/mongo/blob/908e394d39b223ce498fde0d40e18c9200c188e2/src/mongo/db/s/sharding_ddl_coordinator.cpp#L171) (or [collections](https://github.com/mongodb/mongo/blob/908e394d39b223ce498fde0d40e18c9200c188e2/src/mongo/db/s/sharding_ddl_coordinator.cpp#L181)) involved in the operation.
+
+In case a new DDL petition on the same namespace gets forwarded by a router while a DDL coordinator is instantiated, a [check is performed](https://github.com/mongodb/mongo/blob/b7a055f55a202ba870730fb865579acf5d9fb90f/src/mongo/db/s/sharding_ddl_coordinator.h#L54-L61)
+on the shard in order to join the ongoing operation if the options match (same operation with same parameters) or fail if they don't
+(different operation or same operation with different parameters).
+
+##### Execution of DDL coordinators
+Once the distributed locks have been acquired, it is guaranteed that no other concurrent DDLs are happening for the same database,
+hence a DDL coordinator can safely start [executing the operation](https://github.com/mongodb/mongo/blob/master/src/mongo/db/s/sharding_ddl_coordinator.cpp#L207).
+
+As first step, each coordinator is required to [majority commit a document](https://github.com/mongodb/mongo/blob/2ae2bcedfb7d48e64843dd56b9e4f107c56944b6/src/mongo/db/s/sharding_ddl_coordinator.h#L105-L116) -
+that we will refer to as state document - containing all information regarding the running operation such as name of the DDL, namespaces
+involved and other metadata identifying the original request. At this point, the coordinator is entitled to start making both local and
+remote catalog modifications, eventually after blocking CRUD operations on the changing namespaces; when the execution reaches relevant
+points, the state can be checkpointed by [updating the state document](https://github.com/mongodb/mongo/blob/b7a055f55a202ba870730fb865579acf5d9fb90f/src/mongo/db/s/sharding_ddl_coordinator.h#L118-L127).
+
+The completion of a DDL operation is marked by the [state document removal](https://github.com/mongodb/mongo/blob/b7a055f55a202ba870730fb865579acf5d9fb90f/src/mongo/db/s/sharding_ddl_coordinator.cpp#L258)
+followed by the [release of the distributed locks](https://github.com/mongodb/mongo/blob/b7a055f55a202ba870730fb865579acf5d9fb90f/src/mongo/db/s/sharding_ddl_coordinator.cpp#L291-L298)
+in inverse order of acquisition.
+
+Some DDL operations are required to block migrations before actually executing so that the coordinator has a consistent view of which
+shards contain data for the collection. The [setAllowMigration command](https://github.com/mongodb/mongo/blob/c5fd926e176fcaf613d9fb785f5bdc70e1aa14be/src/mongo/db/s/config/configsvr_set_allow_migrations_command.cpp#L42)
+serves the purpose of blocking ongoing migrations and avoiding new ones to start.
+
+##### Resiliency to elections, crashes and errors
+
+DDL coordinators are resilient to elections and sudden crashes because they're instances of a [primary only service](https://github.com/mongodb/mongo/blob/master/docs/primary_only_service.md)
+that - by definition - gets automatically resumed when the node of a shard steps up.
+
+The coordinator state document has a double aim:
+- It serves the purpose of primary only service state document.
+- It tracks the progress of a DDL operation.
+
+Steps executed by coordinators are implemented in idempotent phases. When entering a phase, the state is checkpointed as majority committed
+on the state document before actually executing the phase. If a node fails or steps down, it is then safe to resume the DDL operation as
+follows: skip previous phases and re-execute starting from the checkpointed phase.
+
+When a new primary node is elected, the DDL primary only service is [rebuilt](https://github.com/mongodb/mongo/blob/20549d58943b586749d1570eee834c71bdef1b37/src/mongo/db/s/sharding_ddl_coordinator_service.cpp#L158-L185)
+resuming outstanding coordinators, if present; during this recovery phase, incoming DDL operations are [temporarily put on hold](https://github.com/mongodb/mongo/blob/20549d58943b586749d1570eee834c71bdef1b37/src/mongo/db/s/sharding_ddl_coordinator_service.cpp#L152-L156)
+waiting for pre-existing DDL coordinators to be re-instantiated in order to avoid conflicts in the acquisition of the distributed locks.
+
+If a [recoverable error](https://github.com/mongodb/mongo/blob/a1752a0f5300b3a4df10c0a704c07e597c3cd291/src/mongo/db/s/sharding_ddl_coordinator.cpp#L216-L226)
+is caught at execution-time, it will be retried indefinitely; all other errors errors have the effect of stopping and destructing the DDL coordinator and -
+because of that - are never expected to happen after a coordinator performs destructive operations.