author     Ted Tuckman <ted.tuckman@mongodb.com>  2020-12-23 07:04:20 -0500
committer  Evergreen Agent <no-reply@evergreen.mongodb.com>  2021-01-27 14:23:42 +0000
commit     49985eb4a42f43e5da75f6c8da9a0cb76c482c9e (patch)
tree       70eb4274dc2daedbb53de5254b08fe06a51fcea8 /src/mongo/db/query/README.md
parent     c08a910858f79ac4af5fa4b7f33684110da3e05c (diff)
SERVER-53495 Extend query README with MQL parsing details
Diffstat (limited to 'src/mongo/db/query/README.md')
-rw-r--r--  src/mongo/db/query/README.md | 71
1 file changed, 71 insertions(+), 0 deletions(-)
diff --git a/src/mongo/db/query/README.md b/src/mongo/db/query/README.md
index 62f5f434a0b..ede094cc3d4 100644
--- a/src/mongo/db/query/README.md
+++ b/src/mongo/db/query/README.md
@@ -211,4 +211,75 @@ aggregation pipelines.
## Query Language Parsing & Validation
+Once we have parsed the command and checked authorization, we move on to parsing the individual
+parts of the query. Once again, we will focus on the find and aggregate commands.
+
+### Find command parsing
+The find command is parsed entirely by the IDL. Initially, the IDL parser creates a QueryRequest.
+As mentioned above, the IDL parser does all of the required type checking and stores all options
+for the query. The QueryRequest is then turned into a CanonicalQuery. The CanonicalQuery parses
+the collation and the filter, while simply holding on to the rest of the IDL-parsed fields.
+Parsing the collation is straightforward: for each field that is allowed to appear in the
+collation object, we check whether it is present and then build the collation from the parsed
+fields.
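+
+To make the pieces concrete, here is a hypothetical find command (the values are made up, but the
+field names are real find command options). The QueryRequest holds all of these options; the
+collation is parsed as just described, and the filter is parsed as described in the next
+paragraph:
+
+```
+{
+    find: "users",
+    filter: {age: {$gte: 21}},               // parsed into a MatchExpression tree (next paragraph)
+    sort: {age: -1},                         // held by the QueryRequest as parsed by the IDL
+    projection: {_id: 0, name: 1},           // held by the QueryRequest as parsed by the IDL
+    collation: {locale: "en", strength: 2},  // each recognized field is checked, then the
+                                             // collation is built from the parsed fields
+    limit: 10
+}
+```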
+
+When the CanonicalQuery is built, we also parse the filter argument. A filter is composed of one
+or more MatchExpressions, which are parsed recursively using hand-written code. The parser builds
+a tree of MatchExpressions from the filter BSON object and performs some validation at the same
+time -- for example, type checking and validating the number of arguments for each expression are
+both done here.
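+
+As a rough sketch of what the recursive parse produces, consider the filter below. Top-level
+fields are implicitly ANDed together, and the class names shown are illustrative of the
+MatchExpression naming convention rather than an exact dump of the parser's output:
+
+```
+// Filter as written by the user:
+{active: true, $or: [{age: {$gte: 21}}, {role: "admin"}]}
+
+// Conceptual MatchExpression tree built by the parser:
+// AndMatchExpression
+//  |-- EqualityMatchExpression   (active == true)
+//  `-- OrMatchExpression
+//       |-- GTEMatchExpression      (age >= 21)
+//       `-- EqualityMatchExpression (role == "admin")
+```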
+
+### Aggregate Command Parsing
+
+#### LiteParsedPipeline
+In the process of parsing an aggregation, we create two versions of the pipeline: a
+LiteParsedPipeline (which contains LiteParsedDocumentSource objects) and the Pipeline (which
+contains DocumentSource objects), the latter of which is eventually used for execution. See the
+section on authorization checking above for more details.
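+
+For example, given the pipeline below, the LiteParsedPipeline is enough to tell -- without fully
+parsing every stage -- that the request involves a second namespace ("warehouses"), so
+authorization can be checked on both collections before any heavier parsing is done:
+
+```
+[
+    {$match: {status: "A"}},
+    {$lookup: {from: "warehouses", localField: "item", foreignField: "sku", as: "stock"}}
+]
+```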
+
+#### DocumentSource
+Before talking about the aggregate command as a whole, we will first briefly discuss
+the concept of a DocumentSource. A DocumentSource represents one stage in an aggregation
+pipeline. For each stage in the pipeline, we create a corresponding DocumentSource. A
+DocumentSource either represents a stage in the user's pipeline or a stage generated from a
+user-facing alias, but the relation to the user's pipeline is not always one-to-one. For example,
+a $bucket in a user pipeline becomes a $group stage followed by a $sort stage, while a
+user-specified $group will remain a DocumentSourceGroup. Each DocumentSource has its own parser
+that performs validation of its internal fields and arguments and then generates the
+DocumentSource that will be added to the final pipeline.
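+
+As an illustration of the $bucket example, the user-facing stage below is expanded by its parser
+into a $group followed by a $sort. The desugared stages shown are a simplified sketch, not the
+literal output of the parser:
+
+```
+// User-specified stage:
+{$bucket: {groupBy: "$price", boundaries: [0, 100, 200], default: "Other"}}
+
+// Roughly equivalent desugared stages (simplified):
+[
+    {$group: {_id: /* the boundary bucket that $price falls into */, count: {$sum: 1}}},
+    {$sort: {_id: 1}}
+]
+```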
+
+#### Pipeline
+The pipeline parser uses the individual document source parsers to parse the entire pipeline
+argument of the aggregate command. The parsing process is fairly simple -- for each object in the
+user-specified pipeline, look up the document source parser registered for the stage name and
+parse the object using that parser. The final pipeline is composed of the DocumentSources
+generated by the individual parsers.
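+
+Concretely, for the pipeline below the parser looks at the single top-level field name of each
+stage object, finds the registered document source parser for that name, and hands the stage's
+value to it. The resulting stages (roughly a DocumentSourceMatch followed by a DocumentSourceGroup
+here) are then assembled into the final Pipeline:
+
+```
+[
+    {$match: {status: "A"}},                              // parsed by the $match parser
+    {$group: {_id: "$custId", total: {$sum: "$amount"}}}  // parsed by the $group parser
+]
+```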
+
+#### Aggregation Command
+When an aggregation is run, the first thing that happens is that the request is parsed into a
+LiteParsedPipeline. As mentioned above, the LiteParsedPipeline is used to check options and
+permissions on namespaces. More checks are done in addition to those performed by the
+LiteParsedPipeline, and the next parsing step happens only after all of them have completed. Next,
+the BSON object is parsed again into the Pipeline using the DocumentSource parsers mentioned
+above. Note that we use the original BSON for parsing the Pipeline and DocumentSources rather
+than continuing from the LiteParsedPipeline. This could be improved in the future.
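+
+Putting this together, a complete aggregate command looks roughly like the following. The same
+pipeline array is walked once to build the LiteParsedPipeline and then again, starting from the
+original BSON, to build the executable Pipeline:
+
+```
+{
+    aggregate: "orders",
+    pipeline: [
+        {$match: {status: "A"}},
+        {$group: {_id: "$custId", total: {$sum: "$amount"}}}
+    ],
+    cursor: {}
+}
+```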
+
+### Other command parsing
+As mentioned above, there are several other commands maintained by the query team. We will quickly
+summarize how each is parsed, without going into the same level of detail.
+
+* count : Parsed by IDL and then turned into a CountStage, which can be executed in a similar way
+ to a find command.
+* distinct : The distinct-specific arguments are parsed by IDL, and the generic command arguments
+ are parsed by custom code. They are then combined into a QueryRequest (mentioned above),
+ canonicalized, and packaged into a ParsedDistinct, which is eventually turned into an executable
+ stage.
+* mapReduce : Parsed by IDL and then turned into an equivalent aggregation command.
+* update : Parsed by IDL. An update command can contain both query (find) syntax and pipeline
+ syntax (for pipeline-style updates), each of which is delegated to its own parser (see the
+ example after this list).
+* delete : Parsed by IDL. The filter portion of the delete command is delegated to the find
+ parser.
+* findAndModify : Parsed by IDL. The findAndModify command can contain find and update syntax. The
+ query portion is delegated to the find parser, and if the command specifies an update (rather
+ than a delete), the update portion uses the same parser as the update command.
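+
+As an example of the update case referenced above, the update command below contains both kinds of
+syntax: the "q" field is a find-style filter handled by the MatchExpression parser, and the "u"
+field is an aggregation-style pipeline handled by the pipeline parsers:
+
+```
+{
+    update: "users",
+    updates: [
+        {
+            q: {age: {$gte: 21}},            // find-style filter
+            u: [{$set: {status: "adult"}}],  // pipeline-style update
+            multi: true
+        }
+    ]
+}
+```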
+
TODO from here on.