diff options
author | Robert Guo <robert.guo@10gen.com> | 2022-03-21 21:57:53 -0400 |
---|---|---|
committer | Evergreen Agent <no-reply@evergreen.mongodb.com> | 2022-03-23 16:43:56 +0000 |
commit | 1881a6bf29b756110d9e13e82e0999345854bf9c (patch) | |
tree | 9951608e736fb2a97890412cdb548ea60b0b1f1c /docs | |
parent | 7c4fa21deaee0f1e0280a1dc129c7e5da82df99b (diff) | |
download | mongo-1881a6bf29b756110d9e13e82e0999345854bf9c.tar.gz |
SERVER-64771 move doc for FSM concurrency testing framework
Diffstat (limited to 'docs')
-rw-r--r-- | docs/images/testing/fsm.png | bin | 0 -> 25815 bytes | |||
-rw-r--r-- | docs/images/testing/fsm_composed_example.png | bin | 0 -> 99745 bytes | |||
-rw-r--r-- | docs/images/testing/fsm_serial_example.png | bin | 0 -> 104473 bytes | |||
-rw-r--r-- | docs/images/testing/fsm_simultaneous_example.png | bin | 0 -> 106377 bytes | |||
-rw-r--r-- | docs/testing/fsm_concurrency_testing_framework.md | 536 |
5 files changed, 536 insertions, 0 deletions
diff --git a/docs/images/testing/fsm.png b/docs/images/testing/fsm.png Binary files differnew file mode 100644 index 00000000000..2942162d1c3 --- /dev/null +++ b/docs/images/testing/fsm.png diff --git a/docs/images/testing/fsm_composed_example.png b/docs/images/testing/fsm_composed_example.png Binary files differnew file mode 100644 index 00000000000..463c0844bea --- /dev/null +++ b/docs/images/testing/fsm_composed_example.png diff --git a/docs/images/testing/fsm_serial_example.png b/docs/images/testing/fsm_serial_example.png Binary files differnew file mode 100644 index 00000000000..0c01f0e9604 --- /dev/null +++ b/docs/images/testing/fsm_serial_example.png diff --git a/docs/images/testing/fsm_simultaneous_example.png b/docs/images/testing/fsm_simultaneous_example.png Binary files differnew file mode 100644 index 00000000000..aff66370571 --- /dev/null +++ b/docs/images/testing/fsm_simultaneous_example.png diff --git a/docs/testing/fsm_concurrency_testing_framework.md b/docs/testing/fsm_concurrency_testing_framework.md new file mode 100644 index 00000000000..f65f5e0b3ea --- /dev/null +++ b/docs/testing/fsm_concurrency_testing_framework.md @@ -0,0 +1,536 @@ +# FSM-based Concurrency Testing Framework + + +## Overview +The FSM tests are meant to exercise concurrency within MongoDB. The suite +consists of workloads, which define discrete units of work as states in a FSM, +and runners, which define which tests to run and how they should be run. Each +workload defines states, which are JS functions that perform some meaningful +series of tasks and assertions, and transitions, which define how to move +between those states. A single workload begins by executing its setup function, +which is called once during the runner's thread of execution. Next, the runner +generates the number of threads specified by the workload, and each spawned +thread executes the start state (typically named "init") defined by the +workload. From this point on, each worker thread executes its own independent +copy of the FSM, and will randomly move between states (after executing the +function) based on the probabilities defined in the workload's transition table. +Each worker thread continues doing so until the number of transitions it makes +has reached the number of iterations defined by the workload. Once all the +worker threads have finished, the runner executes the workload's teardown +function. + +![fsm.png](/docs/images/testing/fsm.png) + +The runner provides three modes of execution for workloads: serial, parallel, +and composed. Serial mode runs the provided workloads one after the other, +waiting for all threads of a workload to complete before moving on to the next +workload. Parallel mode runs subsets of the provided workloads in separate +threads simultaneously. Composed mode runs subsets of the provided workloads in +the same threads simultaneously, with each thread moving between the states of +multiple workloads. + +New methods were added to allow for finer-grained assertions under different +situations. For example, a test that inserts a document into a collection, and +wants to assert its existence will fail if another test removes that document. +One option would have been to disable all assertions when running a mixture of +different workloads together, but doing so would make the system incapable of +detecting anything other than server crashes. Another option would have been to +design the workloads to be conflict-free (e.g. writing to separate collections, +using commutative operators), but this would leave large gaps in the achievable +test coverage. Neither of those options were found to be very appealing. +Instead, we chose to introduce the concept of an "assertion level" that acts as +a precondition for when an assertion is evaluated. This allows us to still make +some assertions, even when running a mixture of different workloads together. +There are three assertion levels: `ALWAYS`, `OWN_COLL`, and `OWN_DB`. They can +be thought of as follows: + +* `ALWAYS`: A statement that remains unequivocally true, regardless of what + another workload might be doing to the collection I was given (hint: think + defensively). Examples include "1 = 1" or inserting a document into a + collection (disregarding any unique indices). + +* `OWN_COLL`: A statement that is true only if I am the only workload operating + on the collection I was given. Examples include counting the number of + documents in a collection or updating a previously inserted document. + +* `OWN_DB`: A statement that is true only if I am the only workload operating on + the database I was given. Examples include renaming a collection or verifying + that a collection is capped. The workload typically relies on the use of + another collection aside from the one given. + +## Creating your own workload + +All workloads are stored in `jstests/concurrency/fsm_workloads` and as specific +examples you can refer to + +1. `jstests/concurrency/fsm_example.js` +1. `jstests/concurrency/fsm_example_inheritance.js` + +for writing new workloads. Every workload is loaded in as inline JavaScript +using the "load" function, which is a lot more like a `#include` than +`require.js`. This means that whatever variables are declared in the global +scope of the file will become part of the scope where load is called. The runner +will be looking for a variable called `$config` which will store the +configuration of your workload. + +### The $config object + +There should be exactly one `$config` per workload. For style consistency as +well as safety, be sure to wrap the value of `$config` in an anonymous function. +This will create a JS closure and a new scope: + +```javascript +$config = (function() { + /* ... */ + return { + threadCount: "<number of threads>", + iterations: "<number of steps>", + startState: "<start state for this workload>", + states: "<state functions>", + transitions: "<transition probability map>", + setup: "<function to initialize workload>", + teardown: "<function to cleanup workload if necessary>", + data: "<'this' property available to each state function>", + }; +)(); +``` + +When finished executing, `$config` must return an object containing the properties +above (some of which are optional, see below). + +### Defining states +It's best to also declare states within its own closure so as not to interfere +with the scope of $config. Each state takes two arguments, the db object and the +collection name. For later, note that this db and collection are the only one +that you can be guaranteed to "own" when asserting. Try to make each state a +discrete unit of work that can stand alone without the other states (in fact, an +FSM that requires another state to run before it will probably not work in +Composed mode). Additionally, try to define each function that makes up a state +with a name as opposed to anonymously - this makes easier to read backtraces +when things go wrong. + +```javascript +$config = (function() { + /* ... */ + var states = (function() { + function getRand() { + return Random.randInt(10); + } + + function init(db, collName) { + this.start = getRand() * this.tid; + } + + function scanGT(db, collName) { + db[collName].find({ _id: { $gt: this.start } }).itcount(); + } + + function scanLTE(db, collName) { + db[collName].find({ _id: { $lte: this.start } }).itcount(); + } + + + return { + init: init, + scanGT: scanGT, + scanLTE: scanLTE + }; + })(); + + /* ... */ + + return { + /* ... */ + states: states, + /* ... */ + }; +})(); +``` + +### Defining transitions + +The transitions object defines the probabilities of moving from one state to a +different state. When a state's function is finished executing, the FSM randomly +chooses the next state using the probabilities provided in the transitions +object. The probabilities of the transitions object do not necessarily need to +sum to 1.0, since the mechanism for choosing the next state uses normalized +random values. Here it is not necessary to use a separate closure. In the +example below, we're denoting an equal probability of moving to either of the +scan states from the init state: + +```javascript + +$config = (function() { + /* ... */ + var transitions = { + init: { scanGT: 0.5, scanLTE: 0.5 }, + scanGT: { scanGT: 0.8, scanLTE: 0.2 }, + scanLTE: { scanGT: 0.2, scanLTE: 0.8 } + }; + /* ... */ + return { + /* ... */ + transitions: transitions, + /* ... */ + }; +})(); +``` + +### Setup and teardown functions + +The setup and teardown functions are special in that they'll only be executed in +one thread. See the Runners section for more information about when they're run +relative to other workloads in various modes. The setup and teardown functions +take three arguments: db, coll, and cluster. The setup function (and +corresponding teardown) should perform most of the initialization your workload +needs, for example setting parameters on the server, adding seed data, or +setting up indexes. Note that rather than executing adminCommands (and others) +against the provided `db` you should use the provided +`cluster.executeOnMongodNodes` and `cluster.executeOnMongosNodes` functionality. + +```javascript +$config = (function() { + /* ... */ + function setup(db, collName, cluster) { + // Workloads should NOT drop the collection db[collName], as doing so + // is handled by jstests/concurrency/fsm_libs/runner.js before 'setup' is called. + for (var i = 0; i < 1000; ++i) { + db[collName].insert({ _id: i }); + } + cluster.executeOnMongodNodes(function(db) { + db.adminCommand({ setParameter: 1, internalQueryExecYieldIterations: 5 }); + }); + cluster.executeOnMongosNodes(function(db) { + printjson(db.serverCmdLineOpts()); + }); + } + + function teardown(db, collName, cluster) { + cluster.executeOnMongodNodes(function(db) { + db.adminCommand({ setParameter: 1, internalQueryExecYieldIterations: 128 }); + }); + } + /* ... */ + return { + /* ... */ + setup: setup, + teardown: teardown, + /* ... */ + }; +})(); +``` + +### The `data` object + +The `data` object preserves information between different states of an FSM within +an individual thread. Within a single state, the data object becomes the 'this' +context in which the state executes. Additionally, a tid attribute is added to +data by the runner to allow each thread to access a unique ID. Data is usually +defined above states inside the config, but left below it in the returned +object. Data is also available as the 'this' context in setup and teardown +functions. Note that once the FSM begins, the context data that was passed to +the setup function is copied into each thread - meaning each thread has its own +copy of the data and modifications to data will not be passed back to the +teardown function outside of what was changed in setup. Additionally, in +composition, each workload has its own data, meaning you don't have to worry +about properties being overridden by workloads other than the current one. + +```javascript +$config = (function() { + var data = { + start: 0 + }; + /* ... */ + return { + /* ... */ + data: data, + /* ... */ + }; +})(); +``` + +### Other properties of $config + +#### `threadCount` + +threadCount is the number of threads that will be used to run your workload in +Serial and Parallel modes. In both modes, the number of threads you provide will +execute the FSM simultaneously, cycling through different states of the +workload. Note that in serial mode, no other threads will be running outside of +those pertaining to this workload, and in parallel mode, other workloads will +also be given threads to execute their FSM. In some cases in parallel mode, this +number will be scaled down to make sure that all workloads can fit within the +number of threads available due to system or performance constraints. + +#### `iterations` + +This is just the number of states the FSM will go through before exiting. NOTE: +it is *not* the number of times each state will be executed. + +#### `startState` (optional) + +Default value is 'init'. If your workload does not have an init state than you +must specify in which state to begin. + +### Workload helpers + +`jstests/concurrency/fsm_workload_helpers` contains a few files that you can +include using 'load' at the top of a workload. These provide auxiliary +functionality that might be necessary for some workloads. The most important of +which is probably server_types.js + +#### server_types.js + +This helper file contains four functions: isMongos, isMongod, isMMAPv1, and +isWiredTiger. These can be used to restrict operations on different +functionality available in sharded environments, as well as based on storage +engine, and work as you would expect. One thing to note is that before calling +either isMMAPv1 or isWiredTiger, first verify isMongod. When special casing +functionality for sharded environments or storage engines, try to special case a +test for the exceptionality while still leaving in place assertions for either +case. + +#### indexed_noindex.js + +This helper can be used along with inheritance, to create a workload that is +exactly the same as an existing workload, but with the index created during +setup removed. In order to use this replace the function you provide to the +extendWorkload function with indexedNoindex. Additionally, ensure that the +workload you are extending has a function in its data object called +"getIndexSpec" that returns the spec for the index to be removed. + +```javascript + +load('jstests/concurrency/fsm_libs/extend_workload.js'); // for extendWorkload +load('jstests/concurrency/fsm_workload_modifiers/indexed_noindex.js'); // for indexedNoindex +load('jstests/concurrency/fsm_workloads/workload_with_index.js'); //for $config + +$config = extendWorkload($config, indexedNoIndex); +``` + +#### drop_utils.js + +These helpers provide safe methods for dropping collections, databases, roles, +and users created during a workload's execution. The methods take a regular +expression that the collection, database, role, or user name must match for it +to be dropped. Prefixing the items in any of these categories you create with a +prefix defined by your workload name is a good idea since the workload file name +can be assumed unique and will allow you to only affect your workload in these +cases. + + +## Test runners + +By default, all runners below are allowed to open a maximum of +`maxAllowedConnections` (= 100 by default) explicit connections. In replicated +and sharded environments, implicit connections are created to the original +mongod provided to the mongo shell executing the runner (one for each thread). +This behavior cannot be controlled, but it highlights the importance of always +using the db object provided in the FSM states rather than the global db which +will always correspond to the mongod the mongo shell initially connected to. + +### Execution modes + +#### Serial + +Serial is the simplest of all three modes and basically works as explained +above. Setup is run single threaded, data is copied into multiple threads where +the states are executed, and once all the threads have finished a teardown +function is run and the runner moves onto the next workload. + +![fsm_serial_example.png](/docs/images/testing/fsm_serial_example.png) + +#### Parallel (Simultaneous) + +In parallel or simultaneous mode (the naming convention has been slightly +inconsistent), the ordering becomes a little different. All workloads have their +setup functions run, then threads are spawned for each workload, and once they +all complete, all threads have their teardown function run. + +![fsm_simultaneous_example.png](/docs/images/testing/fsm_simultaneous_example.png) + +#### Composed + +Composed is the most complicated of runner modes. It is similar to parallel in +that all setups and teardowns are run in sequence, but the execution that goes +on between them is a little different. In composed mode, each thread spawned +after setup has a two step transition between states. The first step is to +decide whether or not to switch which FSM it is executing within. The +probability of switching is defined by the option composeProb. If no switching +between FSMs will be performed, then the next step is just to choose which state +to move to within the existing FSM using the workload's defined transition +probabilities. If switching between FSMs is the outcome of the first step, then +the second step becomes choosing a state from all the other workloads with equal +probability. For a walkthrough of this process, see the slides under "Other +Resources". + +![fsm_composed_example.png](/docs/images/testing/fsm_composed_example.png) + + +### Existing runners + +The existing runners all use `jstests/concurrency/fsm_libs/runner.js` to +actually execute the workloads. Most information about arguments and available +runWorkloads methods can be found by inspecting the source. Below you can find +the existing runners explained. The first argument to the three runWorkloads +methods (each corresponding to a different run mode), is an array of workload +files to run. clusterOptions, the second argument to the runWorkloads functions, +is explained in the other components section below. Execution options for +runWorkloads functions, the third argument, can contain the following options +(some depend on the run mode): + +* `numSubsets` - Not available in serial mode, determines how many subsets of + workloads to execute in parallel or composed mode +* `subsetSize` - Not available in serial mode, determines how large each subset of + workloads executed is +* `composeProb` - Only available in composed mode, determines the probability of + leaving the current FSM to enter a new FSM in composed mode +* `iterations` - Only available in composed mode, determines how many transitions + to perform between states in a single thread of composition. + +#### fsm_all.js + +Runs all workloads serially. For each workload, `$config.threadCount` threads +are spawned and each thread runs for exactly `$config.iterations` steps starting +at `$config.startState` and transitioning to other states based on the +transition probabilities defined in $config.transitions. + +#### fsm_all_composed.js + +options: numSubsets, subsetSize, composeProb, iterations (=100 by default) + +Runs numSubsets subsets (each subset has size subsetSize) of all workloads in +composition. During composition, the probability of transitioning from some +state in a workload to another state in the same workload is equal to +1-composeProb. Each thread spawned during composition goes for exactly +iterations steps. The same database and collection are used throughout a +composition of workloads. By default, each workload in each subset is run +between 2 and 3 times. The number of threads used during composition equals the +sum of the `$config.threadCount` values for each workload in each subset. + +#### fsm_all_simultaneous.js + +options: numSubsets, subsetSize + +Runs numSubsets subsets of size subsetSize of all workloads. The workloads in +each subset are started in parallel and each workload is run according to +settings in `$config`. + +#### fsm_all_replication.js + +Sets up a replica set (with 3 mongods by default) and runs workloads serially, +in parallel, or in composed mode. For example, + +`runWorkloadsSerially([<workload1>, <workload2>, ...], { replication: true } )` + +creates a replica set with 3 members and runs some workloads serially on the +primary. + +#### fsm_all_sharded.js + +Sets up a sharded cluster (with 2 shards and 1 mongos by default) and runs +workloads serially, in parallel, or in composed mode. For example, + +`runWorkloadsInParallel([<workload1>, <workload2>, ...], { sharded: true } )` + +creates a sharded cluster and runs workloads in parallel. + +#### fsm_all_sharded_replication.js + +Sets up a sharded cluster (with 2 shards, each having 3 replica set members, and +1 mongos by default) and runs workloads serially, in parallel, or in composed +mode. + +### Excluding a workload + +If any workloads fail because of known bugs in MongoDB, persistent MCI failures +or timeouts, the troublesome workload can be excluded from running by placing it +in the exclusion array in the corresponding runner. Please remember to place a +comment next to the excluded workload name identifying the reason a workload is +being excluded. For example, + +`'agg_sort_external.js', // SERVER-16700 Deadlock on WiredTiger LSM` + +Each file should also have two predefined sections - one for known bugs and one +for restrictions. The one above would be considered a known bug. However, +excluding a compact workload from sharded runners would be a restriction because +compact can only be run against individual mongods. + +## Using assertions + +Do NOT use built-in mongo shell asserts within any FSM-based workload. Instead +use any of the following assertion levels which are defined in terms of the DB +and collection passed to state functions within a workload. + +#### `assertAlways` + +Works just like a regular shell assert(), but is more clear about its +consideration for the other two assert modes + +#### `assertWhenOwnColl` + +Asserts when the current workload owns the collection. In this case, other +workloads could potentially modify other collections in the same database, or +access the database itself. Use this assertion when all you need is a guarantee +that no other FSM has been given access to the collection, but other FSMs may +have access to the same DB. + +#### `assertWhenOwnDB` + +Asserts when the current workload owns the database, including all the +collections in that database. Use this assertion when you need to guarantee +exclusive access for this workload to the provided database and all its included +collections. + +## Other components of the FSM library + +Most of these components live in jstests/concurrency/fsm_libs and provide the +functionality used by the runner. + +### ThreadManager + +Responsible for spawning and joining worker threads. Each spawned thread is +wrapped in a try/finally block to ensure that the database connection implicitly +created during the thread's execution is eventually closed explicitly. The +ThreadManager sets a random seed `([0, randInt(1e13))` which is the range of +`new Date().getTime())` before executing each workload. + +### Worker Thread + +Thread spawned by ThreadManager and used to run a Finite State Machine. + +### Cluster + +cluster.js is responsible for providing the cluster object that is passed to +setup and teardown functions, and the initial connection to a db to be used by +runner to pass to the workloads. For anything except for standalone, it makes +use of the shell's built-in cluster test helpers like `ShardingTest` and +`ReplSetTest`. clusterOptions are passed to cluster.js for initialization. +clusterOptions include: + +* `replication`: boolean, whether or not to use replication in the cluster +* `sameCollection`: boolean, whether or not all workloads are passed the same + collection +* `sameDB`: boolean, whether or not all workloads are passed the same DB +* `setupFunctions`: object, containing at most two functions under the keys + 'mongod' and 'mongos'. This allows you to run a function against all mongod or + mongos nodes in the cluster as part of the cluster initialization. Each + function takes a single argument, the db object against which configuration + can be run (will be set for each mongod/mongos) +* `sharded`: boolean, whether or not to use sharding in the cluster + +Note that sameCollection and sameDB can increase contention for a resource, but +will also decrease the strength of the assertions by ruling out the use of OwnDB +and OwnColl assertions. + +### Miscellaneous Execution Notes + +* A `CountDownLatch` (exposed through the v8-based mongo shell, as of MongoDB 3.0) + is used as a synchronization primitive by the ThreadManager to wait until all + spawned threads have finished being spawned before starting workload + execution. +* If more than 20% of the threads fail while spawning, we abort the test. If + fewer than 20% of the threads fail while spawning we allow the non-failed + threads to continue with the test. The 20% threshold is somewhat arbitrary; + the goal is to abort if "mostly all" of the threads failed but to tolerate "a + few" threads failing.
\ No newline at end of file |