path: root/src/mongo/db/s/README.md
author    Brett Nawrocki <brett.nawrocki@mongodb.com>  2021-09-13 16:49:05 +0000
committer Evergreen Agent <no-reply@evergreen.mongodb.com>  2021-09-14 15:13:39 +0000
commit    f859381ef5c03988df9130164b86cdf8b262729e (patch)
tree      38e620219f3f9e5d263f32a253b98a26037a4aa7 /src/mongo/db/s/README.md
parent    77c17082ef7744f0f83e03e403af54a17e116d78 (diff)
download  mongo-f859381ef5c03988df9130164b86cdf8b262729e.tar.gz
SERVER-59864 Reword confusing documentation
Previously, it was specified that the ChunkSplitter can split chunks concurrently, provided those chunks do not overlap. This implies that chunks are able to overlap, which is not the case. Instead specify that those chunks must be distinct (i.e. the same chunk cannot be split multiple times concurrently). Additionally, reword a confusing sentence regarding the storage of the key management collection for sharded clusters vs. replica sets to be clear that it is talking about two separate cases. Finally, fix various small typos throughout the file.
Diffstat (limited to 'src/mongo/db/s/README.md')
-rw-r--r--  src/mongo/db/s/README.md | 16
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/src/mongo/db/s/README.md b/src/mongo/db/s/README.md
index 8dd759a82cd..1b127b9b008 100644
--- a/src/mongo/db/s/README.md
+++ b/src/mongo/db/s/README.md
@@ -156,11 +156,11 @@ scenarios:
### Types of operations that will cause the versioning information to become stale
-Before going through the table that explains which operations modify the versioning information and how, it is important to give a bit more information about the move chunk operation. When we move a chunk C from the *i* shard to the *j* shard, where *i* and *j* are different, we end up updating the shard version of both shards. For the recipient shard (i.e. *j* shard), the version of the migrated chunk defines its shard version. For the donor shard (i.e. *i* shard) what we do is looking for another chunk of that collection on that shard and update its version. That chunk is called the control chunk and its version defines the *i* shard version. If there are no other chunks, the shard version is updated to SV<sub>i</sub><0, 0, E<sub>cv</sub>>.
+Before going through the table that explains which operations modify the versioning information and how, it is important to give a bit more information about the move chunk operation. When we move a chunk C from the *i* shard to the *j* shard, where *i* and *j* are different, we end up updating the shard version of both shards. For the recipient shard (i.e. *j* shard), the version of the migrated chunk defines its shard version. For the donor shard (i.e. *i* shard) what we do is look for another chunk of that collection on that shard and update its version. That chunk is called the control chunk and its version defines the *i* shard version. If there are no other chunks, the shard version is updated to SV<sub>i</sub><0, 0, E<sub>cv</sub>>.
Operation Type | Version Modification Behavior |
-------------- | ----------------------------- |
-Moving a chunk C <br> C<M, m, E> | C<M<sub>cv</sub> + 1, 0, E<sub>cv</sub>> <br> ControlChunk<<sub>cv</sub> + 1, 1, E<sub>cv</sub>> if any |
+Moving a chunk C <br> C<M, m, E> | C<M<sub>cv</sub> + 1, 0, E<sub>cv</sub>> <br> ControlChunk<M<sub>cv</sub> + 1, 1, E<sub>cv</sub>> if any |
Splitting a chunk C into n pieces <br> C<M, m, E> | C<sub>new 1</sub><M<sub>cv</sub>, m<sub>cv</sub> + 1, E<sub>cv</sub>> <br> ... <br> C<sub>new n</sub><M<sub>cv</sub>, m<sub>cv</sub> + n, E<sub>cv</sub>> |
Merging chunks C<sub>1</sub>, ..., C<sub>n</sub> <br> C<sub>1</sub><M<sub>1</sub>, m<sub>1</sub>, E<sub>1</sub>> <br> ... <br> C<sub>n</sub><M<sub>n</sub>, m<sub>n</sub>, E<sub>n</sub>> <br> | C<sub>new</sub><M<sub>cv</sub>, m<sub>cv</sub> + 1, E<sub>cv</sub>> |
Dropping a collection | SV<sub>i</sub><0, 0, objectid()> forall i in 1 <= i <= #Shards |
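
As a rough illustration of the table above, here is a minimal sketch of the bump rules, assuming a simplified `ChunkVersion` of `<major, minor, epoch>` (the real class lives in `src/mongo/s/chunk_version.h` and is considerably richer):

```
// Simplified stand-in for the server's ChunkVersion; only models
// <major, minor, epoch> as used in the table above.
#include <vector>

struct ChunkVersion {
    unsigned major;   // M: bumped when a chunk moves between shards
    unsigned minor;   // m: bumped on splits and merges
    long long epoch;  // E: regenerated when the collection is dropped/recreated
};

// Moving a chunk: the migrated chunk gets <Mcv + 1, 0, Ecv>; the donor's
// control chunk (if any) gets <Mcv + 1, 1, Ecv>. "cv" = collection version.
ChunkVersion versionAfterMove(const ChunkVersion& cv) {
    return {cv.major + 1, 0, cv.epoch};
}

// Splitting a chunk into n pieces: piece k receives <Mcv, mcv + k, Ecv>.
std::vector<ChunkVersion> versionsAfterSplit(const ChunkVersion& cv, unsigned n) {
    std::vector<ChunkVersion> out;
    for (unsigned k = 1; k <= n; ++k)
        out.push_back({cv.major, cv.minor + k, cv.epoch});
    return out;
}
```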
@@ -259,7 +259,7 @@ A chunk is moved from one shard to another by the moveChunk command. This comman
1. **Transfer queued modifications** - Once the initial batch of documents are copied, the recipient then needs to retrieve any modifications that have been queued up on the donor. This is done by [repeatedly sending the _transferMods command to the donor](https://github.com/mongodb/mongo/blob/5c72483523561c0331769abc3250cf623817883f/src/mongo/db/s/migration_destination_manager.cpp#L1060-L1111). These are [inserts, updates and deletes](https://github.com/mongodb/mongo/blob/11eddfac181ff6ff9faf3e1d55c050373bc6fc24/src/mongo/db/s/migration_chunk_cloner_source_legacy.cpp#L534-L550) that occurred on the donor while the initial batch was being transferred.
1. **Wait for recipient to clone documents** - The donor [polls the recipient](https://github.com/mongodb/mongo/blob/3f849d508692c038afb643b1acb99b8a8cb98d38/src/mongo/db/s/migration_chunk_cloner_source_legacy.cpp#L984) to see when the transfer of documents has been completed or the timeout has been exceeded. This is indicated when the recipient returns a state of “steady” as a result of the _recvChunkStatus command.
1. **Enter the critical section** - Once the recipient has cloned the initial documents, the donor then [declares that it is in a critical section](https://github.com/mongodb/mongo/blob/3f849d508692c038afb643b1acb99b8a8cb98d38/src/mongo/db/s/migration_source_manager.cpp#L344). This indicates that the next operations must not be interrupted and will require recovery actions if they are interrupted. Writes to the donor shard will be suspended while the critical section is in effect. The mechanism to implement the critical section writes the ShardingStateRecovery document to store the minimum optime of the sharding config metadata. If this document exists on stepup it is used to update the optime so that the correct metadata is used.
-1. **Commit on the recipient** - While in the critical section, the [_recvChunkCommit](https://github.com/mongodb/mongo/blob/3f849d508692c038afb643b1acb99b8a8cb98d38/src/mongo/db/s/migration_chunk_cloner_source_legacy.cpp#L360) command is sent to the recipient directing it to fetch any remaining documents for this chunk. The recipient responds by sending _transferMods to fetch the remaining documents while writes are blocked on the donor. Once the documents are transferred successfully, the _recvChunkCommit command returns it’s status to unblock the donor.
+1. **Commit on the recipient** - While in the critical section, the [_recvChunkCommit](https://github.com/mongodb/mongo/blob/3f849d508692c038afb643b1acb99b8a8cb98d38/src/mongo/db/s/migration_chunk_cloner_source_legacy.cpp#L360) command is sent to the recipient directing it to fetch any remaining documents for this chunk. The recipient responds by sending _transferMods to fetch the remaining documents while writes are blocked on the donor. Once the documents are transferred successfully, the _recvChunkCommit command returns its status to unblock the donor.
1. **Commit on the config server** - The donor sends the _configsvrCommitChunkMigration command to the config server. Before the command is sent, [reads are also suspended](https://github.com/mongodb/mongo/blob/3f849d508692c038afb643b1acb99b8a8cb98d38/src/mongo/db/s/migration_source_manager.cpp#L436) on the donor shard.
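
A minimal sketch of the donor-side sequence above; the helpers (`sendRecvChunkStatus`, `enterCriticalSection`, `sendRecvChunkCommit`, `commitOnConfigServer`) are hypothetical stand-ins for the server's real machinery, with trivial stubs so the sketch compiles:

```
#include <chrono>
#include <stdexcept>
#include <string>
#include <thread>

// Stub "recipient": reports steady after a few polls (simulation only).
std::string sendRecvChunkStatus() {
    static int polls = 0;
    return (++polls < 3) ? std::string("clone") : std::string("steady");
}
void enterCriticalSection() {}   // would suspend writes on the donor
void sendRecvChunkCommit() {}    // would drain remaining mods to the recipient
void commitOnConfigServer() {}   // would run _configsvrCommitChunkMigration

void donorSideCommit(std::chrono::seconds timeout) {
    auto deadline = std::chrono::steady_clock::now() + timeout;
    // Poll until the recipient reports "steady", i.e. cloning has caught up.
    while (sendRecvChunkStatus() != "steady") {
        if (std::chrono::steady_clock::now() > deadline)
            throw std::runtime_error("timed out waiting for recipient to clone");
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    enterCriticalSection();   // writes to the chunk are now blocked on the donor
    sendRecvChunkCommit();    // recipient fetches the final batch of documents
    commitOnConfigServer();   // config server makes the new ownership durable
}
```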
#### Code references
@@ -291,7 +291,7 @@ On either donor or recipient, the range deletion is [submitted asynchronously](h
## Orphan filtering
There are two cases that arise where orphaned documents need to be filtered out from the results of commands. The first case occurs while the migration protocol described above is in progress. Queries on the recipient that include documents in the chunk that is being migrated will need to be filtered out. This is because this chunk is not yet owned by the recipient shard and should not be visible there until the migration commits.
-The other case where orphans need to be filtered occurs once the migration is completed but the orphaned documents on the donor have not yet been deleted. The results of the filtering depend on what version of the chunk is in use by the query. If the query was in flight before the migration was completed, any documents that were moved by the migration must still be returned. The orphan deletion mechanism desribed above respects this and will not delete these orphans until the outstanding queries complete. If the query has started after the migration was committed, then the orphaned documents will not be returned since they are not owned by this shard.
+The other case where orphans need to be filtered occurs once the migration is completed but the orphaned documents on the donor have not yet been deleted. The results of the filtering depend on what version of the chunk is in use by the query. If the query was in flight before the migration was completed, any documents that were moved by the migration must still be returned. The orphan deletion mechanism described above respects this and will not delete these orphans until the outstanding queries complete. If the query has started after the migration was committed, then the orphaned documents will not be returned since they are not owned by this shard.
Shards store a copy of the chunk distribution for each collection for which they own data. This copy, often called the "filtering metadata" since it is used to filter out orphaned documents for chunks the shard does not own, is stored in memory in the [CollectionShardingStateMap](https://github.com/mongodb/mongo/blob/r4.4.0-rc3/src/mongo/db/s/collection_sharding_state.cpp#L45). The map is keyed by namespace, and the values are instances of [CollectionShardingRuntime](https://github.com/mongodb/mongo/blob/8b8488340f53a71f29f40ead546e36c59323ca93/src/mongo/db/s/collection_sharding_runtime.h). A CollectionShardingRuntime stores the filtering metadata for the collection [in its MetadataManager member](https://github.com/mongodb/mongo/blob/8b8488340f53a71f29f40ead546e36c59323ca93/src/mongo/db/s/metadata_manager.h#L277-L281).
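
A minimal sketch of the filtering idea, assuming a numeric shard key and using a plain map of owned ranges in place of the real CollectionShardingStateMap; `keyOwned` is a hypothetical stand-in for the server's actual ownership check:

```
#include <map>
#include <string>
#include <utility>
#include <vector>

using KeyRange = std::pair<long, long>;  // [min, max) of a numeric shard key

struct FilteringMetadata {
    std::vector<KeyRange> ownedRanges;  // chunks this shard currently owns

    // A document is visible only if its shard key falls inside an owned
    // range; otherwise it is an orphan and filtered out of query results.
    bool keyOwned(long shardKey) const {
        for (const auto& r : ownedRanges)
            if (shardKey >= r.first && shardKey < r.second)
                return true;
        return false;
    }
};

// One entry per sharded collection, keyed by namespace, mirroring the idea
// of the CollectionShardingStateMap described above.
std::map<std::string, FilteringMetadata> filteringMetadataByNamespace;
```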
@@ -340,8 +340,8 @@ submits the chunk to the ChunkSplitter to be auto-split.
### The auto split task
The ChunkSplitter is a replica set primary-only service that manages the process of auto-splitting
-chunks. The ChunkSplitter runs auto-split tasks asynchronously - thus, multiple chunks can undergo
-an auto-split concurrently, provided they don't overlap.
+chunks. The ChunkSplitter runs auto-split tasks asynchronously - thus, distinct chunks can
+undergo an auto-split concurrently.
To prepare for the split point selection process, the ChunkSplitter flags that an auto-split for the
chunk is in progress. There may be incoming writes to the original chunk while the split is in
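
A minimal sketch of the "distinct chunks only" guarantee: a guard set records chunks with an auto-split in flight, so a second request for the same chunk is rejected while distinct chunks proceed concurrently. This is illustrative only; the real ChunkSplitter tracks this internally and schedules tasks on a thread pool.

```
#include <mutex>
#include <set>
#include <string>

class AutoSplitGuard {
    std::mutex _mutex;
    std::set<std::string> _inProgress;  // min-keys of chunks being split

public:
    // Returns false if this chunk already has an auto-split in flight,
    // preventing the same chunk from being split multiple times at once.
    bool tryStartSplit(const std::string& chunkMinKey) {
        std::lock_guard<std::mutex> lk(_mutex);
        return _inProgress.insert(chunkMinKey).second;
    }

    void finishSplit(const std::string& chunkMinKey) {
        std::lock_guard<std::mutex> lk(_mutex);
        _inProgress.erase(chunkMinKey);
    }
};
```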
@@ -633,8 +633,8 @@ with the keyId from the message. If the signature does not match, the message wi
### Key management
To provide HMAC message verification all nodes inside a security perimeter i.e. mongos and mongod need to access a secret key to generate and
verify message signatures. MongoDB maintains keys in a `system.keys` collection in the `admin`
-database. In the sharded cluster this collection is located on the config server, in the Replica Set
-its managed by the primary node (and propagated to secondaries via normal replication).
+database. In a sharded cluster this collection is located on the config server replica set and managed by the config server primary.
+In a replica set, this collection is managed by the primary node and propagated to secondaries via normal replication.
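
A minimal sketch of how a node might sign and verify a message with a `system.keys` secret, using OpenSSL's HMAC for illustration; the SHA-1 choice, the helper names, and the exact signed payload are assumptions here, not the server's actual API:

```
#include <openssl/evp.h>
#include <openssl/hmac.h>
#include <string>
#include <vector>

std::vector<unsigned char> signMessage(const std::string& message,
                                       const std::string& secretKey) {
    unsigned char digest[EVP_MAX_MD_SIZE];
    unsigned int digestLen = 0;
    // HMAC keyed by the system.keys secret; SHA-1 is assumed here.
    HMAC(EVP_sha1(),
         secretKey.data(), static_cast<int>(secretKey.size()),
         reinterpret_cast<const unsigned char*>(message.data()), message.size(),
         digest, &digestLen);
    return {digest, digest + digestLen};
}

// Verification recomputes the HMAC with the key matching the message's
// keyId and compares; on mismatch the message would be rejected.
bool verifyMessage(const std::string& message,
                   const std::string& secretKey,
                   const std::vector<unsigned char>& signature) {
    return signMessage(message, secretKey) == signature;
}
```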
The key document has the following format:
```