From a5ee4e0d7b5912b4a4ce1ea7e15a8bbcff176a96 Mon Sep 17 00:00:00 2001 From: Nick Thomas Date: Fri, 19 Oct 2018 03:57:00 +0100 Subject: Document how GitLab keeps its tests pristine --- doc/development/testing_guide/best_practices.md | 124 ++++++++++++++++++++++++ 1 file changed, 124 insertions(+) diff --git a/doc/development/testing_guide/best_practices.md b/doc/development/testing_guide/best_practices.md index acbfa1850b4..440c812f0b4 100644 --- a/doc/development/testing_guide/best_practices.md +++ b/doc/development/testing_guide/best_practices.md @@ -209,6 +209,130 @@ it 'is overdue' do end ``` +### Pristine test environments + +The code exercised by a single GitLab test may access and modify many items of +data. Without careful preparation before a test runs, and cleanup afterward, +data can be changed by a test in such a way that it affects the behaviour of +following tests. This should be avoided at all costs! Fortunately, the existing +test framework handles most cases already. + +When the test environment does get polluted, a common outcome is +[flaky tests](flaky_tests.md). Pollution will often manifest as an order +dependency: running spec A followed by spec B will reliably fail, but running +spec B followed by spec A will reliably succeed. In these cases, you can use +`rspec --bisect` (or a manual pairwise bisect of spec files) to determine which +spec is at fault. Fixing the problem requires some understanding of how the test +suite ensures the environment is pristine. Read on to discover more about each +data store! + +#### SQL database + +This is managed for us by the `database_cleaner` gem. Each spec is surrounded in +a transaction, which is rolled back once the test completes. Certain specs will +instead issue `DELETE FROM` queries against every table after completion; this +allows the created rows to be viewed from multiple database connections, which +is important for specs that run in a browser, or migration specs, among others. + +One consequence of using these strategies, instead of the well-known +`TRUNCATE TABLES` approach, is that primary keys and other sequences are **not** +reset across specs. So if you create a project in spec A, then create a project +in spec B, the first will have `id=1`, while the second will have `id=2`. + +This means that specs should **never** rely on the value of an ID, or any other +sequence-generated column. To avoid accidental conflicts, specs should also +avoid manually specifying any values in these kinds of columns. Instead, leave +them unspecified, and look up the value after the row is created. + +#### Redis + +GitLab stores two main categories of data in Redis: cached items, and sidekiq +jobs. + +In most specs, the Rails cache is actually an in-memory store. This is replaced +between specs, so calls to `Rails.cache.read` and `Rails.cache.write` are safe. +However, if a spec makes direct Redis calls, it should mark itself with the +`:clean_gitlab_redis_cache`, `:clean_gitlab_redis_shared_state` or +`:clean_gitlab_redis_queues` traits as appropriate. + +Sidekiq jobs are typically not run in specs, but this behaviour can be altered +in each spec through the use of `Sidekiq::Testing.inline!` blocks. Any spec that +causes Sidekiq jobs to be pushed to Redis should use the `:sidekiq` trait, to +ensure that they are removed once the spec completes. + +#### Filesystem + +Filesystem data can be roughly split into "repositories", and "everything else". +Repositories are stored in `tmp/tests/repositories`. This directory is emptied +before a test run starts, and after the test run ends. It is not emptied between +specs, so created repositories accumulate within this directory over the +lifetime of the process. Deleting them is expensive, but this could lead to +pollution unless carefully managed. + +To avoid this, [hashed storage](../../administration/repository_storage_types.md) +is enabled in the test suite. This means that repositories are given a unique +path that depends on their project's ID. Since the project IDs are not reset +between specs, this guarantees that each spec gets its own repository on disk, +and prevents changes from being visible between specs. + +If a spec manually specifies a project ID, or inspects the state of the +`tmp/tests/repositories/` directory directly, then it should clean up the +directory both before and after it runs. In general, these patterns should be +completely avoided. + +Other classes of file linked to database objects, such as uploads, are generally +managed in the same way. With hashed storage enabled in the specs, they are +written to disk in locations determined by ID, so conflicts should not occur. + +Some specs disable hashed storage by passing the `:legacy_storage` trait to the +`projects` factory. Specs that do this must **never** override the `path` of the +project, or any of its groups. The default path includes the project ID, so will +not conflict; but if two specs create a `:legacy_storage` project with the same +path, they will use the same repository on disk and lead to test environment +pollution. + +Other files must be managed manually by the spec. If you run code that creates a +`tmp/test-file.csv` file, for instance, the spec must ensure that the file is +removed as part of cleanup. + +#### Persistent in-memory application state + +All the specs in a given `rspec` run share the same Ruby process, which means +they can affect each other by modifying Ruby objects that are accessible between +specs. In practice, this means global variables, and constants (which includes +Ruby classes, modules, etc). + +Global variables should generally not be modified. If absolutely necessary, a +block like this can be used to ensure the change is rolled back afterwards: + +```ruby +around(:each) do |example| + old_value = $0 + + begin + $0 = "new-value" + example.run + ensure + $0 = old_value + end +end +``` + +If a spec needs to modify a constant, it should use the `stub_const` helper to +ensure the change is rolled back. + +If you need to modify the contents of the `ENV` constant, you can use the +`stub_env` helper method instead. + +While most Ruby **instances** are not shared between specs, **classes** +and **modules** generally are. Class and module instance variables, accessors, +class variables, and other stateful idioms, should be treated in the same way as +global variables - don't modify them unless you have to! In particular, prefer +using expectations, or dependency injection along with stubs, to avoid the need +for modifications. If you have no other choice, an `around` block similar to the +example for global variables, above, can be used, but this should be avoided if +at all possible. + ### Table-based / Parameterized tests This style of testing is used to exercise one piece of code with a comprehensive -- cgit v1.2.1