summaryrefslogtreecommitdiff
path: root/src/mongo/db/s
diff options
context:
space:
mode:
authorAllison Easton <allison.easton@mongodb.com>2023-03-29 06:18:31 +0000
committerEvergreen Agent <no-reply@evergreen.mongodb.com>2023-03-29 07:28:53 +0000
commitc1f6281e1193a2d81e003bae0989318dab730a14 (patch)
treed3248326484df913b4500fc17994c7767fbea4f2 /src/mongo/db/s
parent77bc214bdd1816cd1ce76983a72f536cd6ecbc1b (diff)
downloadmongo-c1f6281e1193a2d81e003bae0989318dab730a14.tar.gz
SERVER-74692 Introduce a section on DDL operations to the new Arch Guide
Diffstat (limited to 'src/mongo/db/s')
-rw-r--r--src/mongo/db/s/README_ddl_operations.md62
-rw-r--r--src/mongo/db/s/README_new.md1
2 files changed, 63 insertions, 0 deletions
diff --git a/src/mongo/db/s/README_ddl_operations.md b/src/mongo/db/s/README_ddl_operations.md
new file mode 100644
index 00000000000..040fe768d33
--- /dev/null
+++ b/src/mongo/db/s/README_ddl_operations.md
@@ -0,0 +1,62 @@
+# DDL Operations
+On the Sharding team, we use the term *DDL* to mean any operation that needs to update any subset of [catalog containers](https://github.com/mongodb/mongo/blob/f8a2113103a509ffa361c5aacb3ec0fa94858f9b/src/mongo/db/s/README_sharding_catalog.md#catalog-containers). Within this definition, there are standard DDLs that use the DDL coordinator infrastructure as well as non-standard DDLs that each have their own implementations.
+
+## Standard DDLs
+Most DDL operations are built upon the DDL coordinator infrastructure which provides some [retriability](#retriability), [synchronization](#synchronization), and [recoverability](#recovery) guarantees.
+
+Each of these operations has a *coordinator* - a node that drives the execution of the operation. In [most operations](https://github.com/mongodb/mongo/blob/e61bf27c2f6a83fed36e5a13c008a32d563babe2/src/mongo/db/s/sharding_ddl_coordinator_service.cpp#L60-L120), this coordinator is the database primary, but in [a few others](https://github.com/mongodb/mongo/blob/e61bf27c2f6a83fed36e5a13c008a32d563babe2/src/mongo/db/s/config/configsvr_coordinator_service.cpp#L75-L94) the coordinator is the CSRS. These coordinators extend either the [RecoverableShardingDDLCoordinator class](https://github.com/mongodb/mongo/blob/9fe03fd6c85760920398b7891fde74069f5457db/src/mongo/db/s/sharding_ddl_coordinator.h#L266) or the [ConfigSvrCoordinator class](https://github.com/mongodb/mongo/blob/9fe03fd6c85760920398b7891fde74069f5457db/src/mongo/db/s/config/configsvr_coordinator.h#L47), which make up the DDL coordinator infrastructure.
+
+The diagram below shows a simplified example of a DDL operation's execution. The coordinator can be one of the shards or the config server, and the commands sent to that node will just be applied locally.
+
+```mermaid
+sequenceDiagram
+participant C as Coordinator
+participant S0 as Shard0
+participant S1 as Shard1
+participant CSRS as CSRS
+loop DDL Coordinator Infrastructure
+ note over C: Acquire DDL locks
+ loop Coordinator Implementation
+ note over C: checkpoint
+ C->>CSRS: Stop migrations
+ note over C: checkpoint
+ C->>S0: Acquire critical section
+ C->>S1: Acquire critical section
+ note over C: checkpoint
+ C->>CSRS: Do some operation
+ note over C: checkpoint
+ C->>S0: Do some operation
+ C->>S1: Do some operation
+ note over C: checkpoint
+ C->>S0: Release critical section
+ C->>S1: Release critical section
+ note over C: checkpoint
+ C->>CSRS: Restart migrations
+ end
+ note over C: Release DDL locks
+end
+```
+
+The outer loop is common to all standard DDL operations and ensures that different DDLs within the same database serialize properly by acquiring [DDL locks](#synchronization). The inner loop is specific to each operation, and will perform some set of updates to the sharding metadata while under the critical section. Some operations may not involve all shards or may have more complex phases on the participant shards, but all follow the same general pattern of acquiring the critical section, updating some metadata, and releasing the critical section, checkpointing their progress along the way.
+
+The checkpoints are majority write concern updates to a persisted document on the coordinator. This document - called a state document - contains all the information about the running operation including the operation type, namespaces involved, and which checkpoint the operation has reached. An initial checkpoint must be the first thing any coordinator does in order to ensure that the operation will continue to run even [in the presence of failovers](#recovery). Subsequent checkpoints allow retries to skip phases that have already been completed.
+
+### Retriability
+Most DDL operations must complete after they have started. An exception to this is often a *CheckPreconditions* phase at the beginning of a coordinator in which the operation will check some conditions and will be allowed to exit if these conditions are not met. After this, however, the operation will continue to retry until it succeeds. This is because the updates to the sharding metadata would cause inconsistencies if the critical section were released partially through the operation. For this reason, DDL operations should not throw non-retriable errors after the initial phase of checking preconditions.
+
+### Synchronization
+DDL operations are serialized on the coordinator by acquisition of the DDL locks, handled by the [DDL Lock Manager](https://github.com/mongodb/mongo/blob/r6.2.0/src/mongo/db/s/ddl_lock_manager.h). DDL locks are local to the coordinator and only in memory, so they must be reacquired during [recovery](#recovery).
+
+Each operation acquires a DDL lock on the database first, and then the collection for the operation. Some operations also acquire additional DDL locks, such as renameCollection, which will acquire the target namespace after acquiring the database and source collection locks. At the end of the operation, the locks are released in reverse order.
+
+### Recovery
+DDL coordinators are resilient to elections and sudden crashes because they are implemented as [primary only services](https://github.com/mongodb/mongo/blob/r6.0.0/docs/primary_only_service.md#primaryonlyservice) that - by definition - get automatically resumed when the node of a shard steps up.
+
+When a new primary node is elected, the DDL primary only service is rebuilt, and any ongoing coordinators will be restarted based on their persisted state document. During this recovery phase, any new requests for DDL operations are put on hold, waiting for existing coordinators to be re-instatiated to avoid conflicts with the DDL locks.
+
+## Non-Standard DDLs
+Some DDL operations do not follow the structure outlined in the section above. These operations are chunk migration, resharding, and refine collection shard key. There are also other operations such as add and remove shard that do not modify the sharding catalog but do modify local metadata and need to coordinate with ddl operations. These operations also do not use the DDL coordinator infrastructure, but they do take the DDl lock to synchronize with other ddls.
+
+Both chunk migration and resharding have to copy user data across shards. This is too time intensive to happen entirely while holding the collection critical section, so these operations have separate machinery to transfer the data and commit the changes. These commands do not commit transactionally across the shards and the config server, rather they commit on the config server and rely on shards pulling the updated commit information from the config server after learning via a router that there is new information. They also do not have the same requirement as standard DDL operations that they must complete after starting except after entering their commit phases.
+
+Refine shard key commits only on the config server, again relying on shards to pull updated information from the config server after hearing about this more recent information from a router. In this case, this was done not because of the cost of transfering data, but so that refine shard key did not need to involve the shards. This allows the refineShardKey command to run quickly and not block operations.
diff --git a/src/mongo/db/s/README_new.md b/src/mongo/db/s/README_new.md
index a232b158395..271d2e517f3 100644
--- a/src/mongo/db/s/README_new.md
+++ b/src/mongo/db/s/README_new.md
@@ -25,6 +25,7 @@ The graph further down visualises the architecture of the MongoDB Sharding syste
- [Shard role](README_sharding_catalog.md#router-role)
- [Shard versioning protocol](README_versioning_protocols.md)
- [Balancer](README_balancer.md)
+- [DDL Operations](README_ddl_operations.md)
- [Migrations](README_migrations.md)
```mermaid