summaryrefslogtreecommitdiff
path: root/testsuite/HACKING.adoc
blob: 05bc9bad71cc27b73511421dcf99dfb4a379bf79 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
== Running the testsuite

From the root directory of this repository, `make tests` will run all
the tests, which takes a few minutes.

More control is available from the `testsuite/` subdirectory (here):
`make all` runs all tests, `make parallel` runs the tests in parallel,
and you can use `make one DIR=tests/foo` to run the tests of
a specific sub-directory.

== Writing new tests

There are many kind of tests already, so the easiest way to start is
to extend or copy an existing test.

Note: in april 2023, the test scripting language has changed: the
org-mode-based syntax was replaced by at C-like syntax. ocamltest
includes options to translate from the old to the new syntax:
`-translate`, `-compact`, `-keep-lines`, `-keep-chars`. Look at
`tools/translate-all-tests` for an example using them. These options
will be removed after a transitional period.

== Sorts of tests

A test is specified in a `.ml` file containing a `TEST` block which is
processed by our `ocamltest` testing tool, and contains a description
of the testing to be performed.

There are two common sorts of tests:

- file-based tests: running them produces a `.result` file, that are
  compared to one (or several) `.reference` file(s) present in the
  repository. (Some tests, for example, have different reference
  results depending on whether the bytecode or native compilers are
  used, or depending on whether flambda is used or not). The tests
  succeeds if the .result and .reference files are identical.

- expect-style tests: those are toplevel-based tests with inline
  `[%%expect {|...|}]` blocks that contain the expected output for the
  toplevel phrases before them. The test succeeds if the toplevel
  output is identical to the expected output.

expect-style tests are typically easier to read and maintain; we
recommend using then whenever the feature that you want to test can be
observed from the bytecode toplevel.

== Promotion

Tests compare their current result with recorded reference results,
and fail if they differ. Sometimes changes to test results are in fact
expected and innocuous, they come from an intended change in output
instead of a regression. Modifying each reference file by hand would
be extremely time-consuming. The standard approach in this case is to
automatically "promote" the current results into expected results, by
overwriting the `.reference` file or `expect` payload.

Promoting a test is done using the `make promote DIR=tests/foo`
target. In general `make promote` accepts the same arguments (`TEST`,
`LIST`) as `make one` -- there is no analogue to `make all`.

Typical use-cases for promotion are changes to a compiler message that
occurs in the reference results, or code modifications that change
source locations included in reference results.

Whenever you promote test results, please check carefully using `git
diff` that the changes really correspond to an intended output
difference, and not to a regression. You then need to commit the
changes to the reference files (or expect test output); consider
explaining in the commit message why the output changed.

== Useful Makefile targets

`make parallel`::
  Runs the tests in parallel using the
  link:https://www.gnu.org/software/parallel/[GNU parallel] tool: tests run
  twice as fast with no difference in output order.

`make all-foo`, `make parallel-foo`::
  Runs only the tests in the directories whose name starts with `foo`:
  `parallel-typing`, `all-lib`, etc.

`make one DIR=tests/foo`::
  Runs only the tests in the directory `tests/foo`. This is often equivalent to
  `cd tests/foo && make`, but sometimes the latter breaks the test makefile if
  it contains fragile relative filesystem paths. Such errors should be fixed if
  you find them, but `make one DIR=...` is the more reliable option as it runs
  exactly as `make all` which is heavily tested.

`make one TEST=tests/foo/bar.ml`::
  Runs only the specific test `tests/foo/bar.ml`.

`make one LIST=tests.txt`::
  Runs only the tests in the directories listed in the file `tests.txt`.  The
  file should contain one directory per line; for instance, if the contents of
  `tests.txt` are:
+
....
tests/foo
tests/bar
tests/baz
....
+
then this will run all the tests in those three directories.

`make promote DIR=tests/foo`, `make promote TEST=tests/foo/bar.ml`, `make promote LIST=file.txt`::
  Promotes the current outputs to reference outputs for the specified tests.


== Useful environment variables

`KEEP_TEST_DIR_ON_SUCCESS=1`::
  Keeps temporary output files from a test run. This is handy to validate the
  content of temporary output files, run a produced executable by hand, etc.

`OCAMLTESTDIR=/tmp/foo`::
  Changes the output directory to the specified one. This should be combined
  with `KEEP_TEST_DIR_ON_SUCCESS=1` to inspect the test output. By default
  `OCAMLTESTDIR` is `_ocamltest`.

== Dimensioning the tests

By default, tests should run well on small virtual machines (2 cores,
2 Gb RAM, 64 or 32 bits), taking at most one minute, and preferably
less than 10 seconds, to run on such a machine.

Some machines used for continuous integration are more capable than
that.  They use the `OCAML_TEST_SIZE` environment variable to report
the available resources:

|====
| `OCAML_TEST_SIZE`  |  Resources          | Word size

| `1` or unset       | 2 cores, 2 Gb RAM   | 32 or 64 bits
| `2`                | 4 cores, 4 Gb RAM   | 64 bits
| `3`                | 8 cores, 8 Gb RAM   | 64 bits
|=====

Tests, then, can check the `OCAML_TEST_SIZE` environment variable and
increase the number of cores or the amount of memory used.  The
default should always be 2 cores and 2 Gb RAM.