GHC Driver Readme
=================

Greetings and well met.  If you are reading this, I can only assume that you
are interested in working on the testsuite in some capacity.  For more
detailed documentation, please see [here][1].

## ToC

1. Entry points of the testsuite performance tests
2. Overview of how the performance test bits work
3. Quick overview of program parts
4. How to use the comparison tool
5. Quick answers for "how do I do X"?


## Entry Points of the testsuite performance tests

The testsuite has two main entry points, depending on the perspective from
which you approach it.  From the perspective of the test writer, the entry
point is the `collect_stats` function called in `*.T` files.  This function is
declared in perf_notes.py along with its associated infrastructure.  The
purpose of this function is to tell the test driver which metrics to compare
when processing the test.  From the perspective of running the testsuite,
e.g. via make, the entry point is the runtests.py file.  That file contains
the main logic for running the individual tests, collecting information,
handling failures, and outputting the final results.
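
As an example of the first entry point, a performance test might be declared
in its `*.T` file roughly as follows (the test name and threshold are
hypothetical; see the existing all.T files for real declarations):

```
# Hypothetical entry: measure this test's runtime allocations and fail if
# they drift by more than 10% from the locally recorded baseline.
test('T1234', [collect_stats('bytes allocated', 10)], compile_and_run, [''])
```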

## Overview of how the performance test bits work

During a Haskell Summer of Code project, an intern went through and revamped
most of the performance test code, so there have been a few changes that might
be unusual to anyone previously familiar with the testsuite.  One of the
biggest immediate benefits is that platform differences, compiler differences,
and the like no longer need to be considered by the test writer.  This is
because the test comparison relies entirely on metrics collected locally on
the testing machine.

As such, it is perfectly sufficient to write `collect_stats('all', 20)` in the
".T" files to measure the 3 potential stats that can be collected for that test
and automatically check them for regressions, failing if there is more than a
20% change in either direction.  In fact, even that is not necessary, as
`collect_stats()` defaults to 'all' with a 20% allowed deviation.
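
Concretely, the following two declarations request the same measurements (the
test names are hypothetical):

```
# Equivalent setups: collect_stats() defaults to ('all', 20).
test('T1234a', [collect_stats()],          compile_and_run, [''])
test('T1234b', [collect_stats('all', 20)], compile_and_run, [''])
```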

The function `collect_compiler_stats()` is equivalent to `collect_stats()` in
every way, except that it measures the performance of the compiler itself
rather than the performance of the code generated by the compiler.  See the
implementation of collect_stats in /driver/testlib.py for more information.
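
For example, a test that tracks compiler allocations rather than runtime
allocations might be declared like this (again, the test name and threshold
are hypothetical):

```
# Hypothetical entry: fail if the compiler's allocations while building this
# test drift by more than 10%.
test('T9999', [collect_compiler_stats('bytes allocated', 10)], compile, [''])
```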

If the performance of a test improves so much that the test fails, the value
will still be recorded.  The warning that is emitted is merely a precaution so
that the programmer can double-check that they didn't introduce a bug; a
sudden 70% improvement, for example, might be suspicious.

Performance metrics for performance tests are now stored in git notes under the
namespace 'perf'.  The format of the git note file is that each line represents
a single metric for a particular test: `$test_env $test_name $test_way
$metric_measured $value_collected` (delimited by tabs).
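
For example, the recorded metrics for a commit can be inspected with
`git notes`; the test name and value below are invented for illustration, and
the fields are tab-separated:

```
$ git notes --ref=perf show HEAD
local	T1234	normal	bytes allocated	771456
```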

One can view the maximum deviation a test allows by looking inside its
respective all.T file; additionally, if the verbosity level of the testsuite
is set to a value >= 4, a good amount of output is printed per test detailing
all the information about the measured values.  This information is also
printed if the test falls outside of the allowed bounds.  (See the test_cmp
function in /driver/perf_notes.py for the exact formatting of the message.)
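
For instance, assuming the VERBOSE variable in mk/test.mk still controls the
driver's verbosity and TEST still selects individual tests, a single test can
be run with this extra per-test output like so:

```
$ make test VERBOSE=4 TEST=T1234
```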

The git notes are only appended to by the testsuite, in a single atomic Python
subprocess at the end of the test run; if the run is canceled at any time, the
notes will not be written.  The note-appending command will be retried up to 4
times in the event of a failure (such as one caused by a lock on the repo),
although this is never anticipated to happen.  If, for some reason, the 5
attempts were not enough, an error message will be printed.  Further, there is
currently no process or method for stripping duplicates, updating values, etc.,
so if the testsuite is run multiple times per commit there will be multiple
values in the git notes corresponding to the tests that were run.  In this case
the average value is used.
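
The sketch below illustrates that averaging behavior; it is not the driver's
actual code, and the note entries are invented for the example:

```
from collections import defaultdict
from statistics import mean

# Each tuple mirrors one tab-delimited git-note line:
# (test_env, test_name, test_way, metric, value)
notes = [
    ('local', 'T1234', 'normal', 'bytes allocated', 771456.0),
    ('local', 'T1234', 'normal', 'bytes allocated', 773120.0),  # second run
]

# Group duplicate entries for the same test/metric and average them.
grouped = defaultdict(list)
for env, name, way, metric, value in notes:
    grouped[(env, name, way, metric)].append(value)

baselines = {key: mean(vals) for key, vals in grouped.items()}
# -> {('local', 'T1234', 'normal', 'bytes allocated'): 772288.0}
```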

## Quick overview of program parts

The relevant bits of the directory tree are as such:

```
├── driver                   -- Testsuite driver directory
    ├── junit.py             -- Contains code implementing JUnit features.
    ├── kill_extra_files.py  -- Some of the uglier implementation details.
    ├── perf_notes.py        -- Comparison tool and performance tests.
    ├── runtests.py          -- Main entrypoint for program; runs tests.
    ├── testglobals.py       -- Global data structures and objects.
    ├── testlib.py           -- Bulk of implementation is in here.
    └── testutil.py          -- Misc helper functions.
├── mk
    └── test.mk              -- Master makefile for running tests.
└── tests                    -- Main tests directory.
```

## How to Use the Comparison Tool

The comparison tool exists in `/driver/perf_notes.py`.

When the testsuite is run, the performance metrics of the performance tests
are saved automatically in a local git note that is attached to the commit.
The comparison tool is designed to help analyze these performance metrics
across commits.

Currently, it can only be run by executing the file directly, like so:
```
$ python3 perf_notes.py (arguments go here)
```

If you run `perf_notes.py -h` you will see a description of all of the
arguments and how to use them.  The optional arguments exist to filter the
output to include only the commits that you're interested in.  The most
typical usage of this tool will likely be running `perf_notes.py HEAD 'HEAD~1'
'(commit hash)' ...`
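
For example, to compare the metrics recorded for the current checkout against
those of its parent commit (the commit selectors are only illustrative):

```
$ python3 perf_notes.py HEAD HEAD~1
```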

The performance metrics stored in git notes remain strictly local to the
machine; as such, performance metrics will not exist for a commit until you
check out that commit and run the testsuite (or an individual test) on it.

## Quick Answers for "How do I do X?"

* Q: How do I add a flag to "make test" to extend the testsuite functionality?
    1. Add the flag to the appropriate global object in testglobals.py
    2. Add an argument to the parser in runtests.py that sets the flag
       (see the sketch after this list for an illustration of steps 1 and 2)
    3. Go to the `testsuite/mk/test.mk` file and add a new ifeq (or ifneq)
        block. I suggest adding the block around line 200.
* Q: How do I modify how performance tests work?
    * That functionality resides in perf_notes.py, which has pretty good
      in-code documentation.
    * Additionally, you will want to look at `compile_and_run`, `simple_run`,
      and `simple_build` in testlib.py
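
The following is a hypothetical sketch of steps 1 and 2 from the first answer;
every name in it is made up for illustration, and it assumes an argparse-style
parser -- consult testglobals.py and runtests.py for the real code:

```
import argparse

# 1. testglobals.py: give the global config object a new field.
class TestConfig:
    def __init__(self):
        self.my_new_flag = False   # hypothetical flag, off by default

config = TestConfig()

# 2. runtests.py: add a parser argument that sets the field.
parser = argparse.ArgumentParser()
parser.add_argument("--my-new-flag", action="store_true",
                    help="enable the hypothetical new behavior")
args = parser.parse_args(["--my-new-flag"])
config.my_new_flag = args.my_new_flag

# 3. test.mk would then pass --my-new-flag through to runtests.py,
#    typically guarded by an ifeq/ifneq block on a make variable.
```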

  [1]: https://gitlab.haskell.org/ghc/ghc/wikis/building/running-tests