author | Ted Tuckman <ted.tuckman@mongodb.com> | 2020-12-23 07:04:20 -0500 |
---|---|---|
committer | Evergreen Agent <no-reply@evergreen.mongodb.com> | 2021-01-27 14:23:42 +0000 |
commit | 49985eb4a42f43e5da75f6c8da9a0cb76c482c9e (patch) | |
tree | 70eb4274dc2daedbb53de5254b08fe06a51fcea8 /src/mongo/db/query/README.md | |
parent | c08a910858f79ac4af5fa4b7f33684110da3e05c (diff) | |
download | mongo-49985eb4a42f43e5da75f6c8da9a0cb76c482c9e.tar.gz |
SERVER-53495 Extend query README with MQL parsing details
Diffstat (limited to 'src/mongo/db/query/README.md')
-rw-r--r-- | src/mongo/db/query/README.md | 71 |
1 files changed, 71 insertions, 0 deletions
diff --git a/src/mongo/db/query/README.md b/src/mongo/db/query/README.md
index 62f5f434a0b..ede094cc3d4 100644
--- a/src/mongo/db/query/README.md
+++ b/src/mongo/db/query/README.md
@@ -211,4 +211,75 @@ aggregation pipelines.
 ## Query Language Parsing & Validation
+Once we have parsed the command and checked authorization, we move on to parsing the individual
+parts of the query. Once again, we will focus on the find and aggregate commands.
+
+### Find command parsing
+The find command is parsed entirely by the IDL. Initially the IDL parser creates a QueryRequest. As
+mentioned above, the IDL parser does all of the required type checking and stores all options for
+the query. The QueryRequest is then turned into a CanonicalQuery. The CanonicalQuery
+parses the collation and the filter while just holding the rest of the IDL parsed fields.
+The parsing of the collation is straightforward: for each field that is allowed to be in the object,
+we check for that field and then build the collation from the parsed fields.
+
+When the CanonicalQuery is built we also parse the filter argument. A filter is composed of one or
+more MatchExpressions which are parsed recursively using hand written code. The parser builds a
+tree of MatchExpressions from the filter BSON object. The parser performs some validation at the
+same time -- for example, type validation and checking the number of arguments for expressions are
+both done here.
+
+### Aggregate Command Parsing
+
+#### LiteParsedPipeline
+In the process of parsing an aggregation we create two versions of the pipeline: a
+LiteParsedPipeline (that contains LiteParsedDocumentSource objects) and the Pipeline (that contains
+DocumentSource objects) that is eventually used for execution. See the above section on
+authorization checking for more details.
+
+#### DocumentSource
+Before talking about the aggregate command as a whole, we will first briefly discuss
+the concept of a DocumentSource. A DocumentSource represents one stage in an aggregation
+pipeline. For each stage in the pipeline, we create another DocumentSource. A DocumentSource
+either represents a stage in the user's pipeline or a stage generated from a user facing
+alias, but the relation to the user's pipeline is not always one-to-one. For example, a $bucket in
+a user pipeline becomes a $group stage followed by a $sort stage, while a user specified $group
+will remain as a DocumentSourceGroup. Each DocumentSource has its own parser that performs
+validation of its internal fields and arguments and then generates the DocumentSource that will be
+added to the final pipeline.
+
+#### Pipeline
+The pipeline parser uses the individual document source parsers to parse the entire pipeline
+argument of the aggregate command. The parsing process is fairly simple -- for each object in the
+user specified pipeline lookup the document source parser for the stage name, and then parse the
+object using that parser. The final pipeline is composed of the DocumentSources generated by the
+individual parsers.
+
+#### Aggregation Command
+When an aggregation is run, the first thing that happens is the request is parsed into a
+LiteParsedPipeline. As mentioned above, the LiteParsedPipeline is used to check options and
+permissions on namespaces. More checks are done in addition to those performed by the
+LiteParsedPipeline, but the next parsing step is after all of those have been completed. Next, the
+BSON object is parsed again into the pipeline using the DocumentSource parsers that we mentioned
+above. Note that we use the original BSON for parsing the pipeline and DocumentSources as opposed
+to continuing from the LiteParsedPipeline. This could be improved in the future.
+
+### Other command parsing
+As mentioned above, there are several other commands maintained by the query team. We will quickly
+give a summary of how each is parsed, but not get into the same level of detail.
+
+* count : Parsed by IDL and then turned into a CountStage which can be executed in a similar way to
+ a find command.
+* distinct : The distinct specific arguments are parsed by IDL, and the generic command arguments
+ are parsed by custom code. They are then combined into a QueryRequest (mentioned above),
+ canonicalized, packaged into a ParsedDistinct, which is eventually turned into an executable
+ stage.
+* mapReduce : Parsed by IDL and then turned into an equivalent aggregation command.
+* update : Parsed by IDL. An update command can contain both query (find) and pipeline syntax
+ (for updates) which each get delegated to their own parsers.
+* delete : Parsed by IDL. The filter portion of the delete command is delegated to the find
+ parser.
+* findAndModify : Parsed by IDL. The findAndModify command can contain find and update syntax. The
+ query portion is delegated to the query parser and if this is an update (rather than a delete) it
+ uses the same parser as the update command.
+
 TODO from here on.
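
The pipeline parsing that the patch describes (look up a parser by stage name, let it validate the stage's arguments, and collect the resulting DocumentSources) can be illustrated with a small sketch. This is a toy Python model, not the server's C++ code: the `DocumentSource` dataclass, the `PARSERS` registry, the `register` decorator, and `parse_pipeline` are all hypothetical names chosen for the illustration, and the `$group` spec produced for `$bucket` is a simplified placeholder rather than the real desugaring.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class DocumentSource:
    """Toy stand-in for one executable pipeline stage (hypothetical, not the server class)."""
    name: str
    spec: Any


# Hypothetical registry: stage name -> parser. A parser returns a list because
# one user-facing stage may expand into several DocumentSources.
StageParser = Callable[[Any], List[DocumentSource]]
PARSERS: Dict[str, StageParser] = {}


def register(stage_name: str):
    """Decorator that records a parser under its stage name."""
    def wrap(fn: StageParser) -> StageParser:
        PARSERS[stage_name] = fn
        return fn
    return wrap


@register("$group")
def parse_group(spec: Any) -> List[DocumentSource]:
    # Each stage parser validates its own arguments.
    if not isinstance(spec, dict) or "_id" not in spec:
        raise ValueError("$group requires an object with an _id field")
    return [DocumentSource("$group", spec)]


@register("$sort")
def parse_sort(spec: Any) -> List[DocumentSource]:
    if not isinstance(spec, dict) or not spec:
        raise ValueError("$sort requires a non-empty object")
    return [DocumentSource("$sort", spec)]


@register("$bucket")
def parse_bucket(spec: Any) -> List[DocumentSource]:
    # $bucket is an alias: it becomes a $group stage followed by a $sort stage.
    if not isinstance(spec, dict) or "groupBy" not in spec or "boundaries" not in spec:
        raise ValueError("$bucket requires groupBy and boundaries")
    # Placeholder group key; the real expansion is more involved (assumption).
    group_spec = {"_id": {"value": spec["groupBy"], "boundaries": spec["boundaries"]}}
    return parse_group(group_spec) + parse_sort({"_id": 1})


def parse_pipeline(raw_pipeline: List[Any]) -> List[DocumentSource]:
    """For each stage object, look up the parser by stage name and apply it."""
    stages: List[DocumentSource] = []
    for raw_stage in raw_pipeline:
        if not isinstance(raw_stage, dict) or len(raw_stage) != 1:
            raise ValueError("each stage must be an object with exactly one field")
        (name, spec), = raw_stage.items()
        parser = PARSERS.get(name)
        if parser is None:
            raise ValueError(f"unrecognized pipeline stage name: {name}")
        stages.extend(parser(spec))
    return stages
```

In this sketch, parsing `[{"$bucket": {...}}, {"$group": {...}}]` produces three stages, which mirrors the point that the relation between the user's pipeline and the parsed DocumentSources is not always one-to-one.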