author    alainfrisch <alain@frisch.fr>  2018-01-26 16:15:49 +0100
committer alainfrisch <alain@frisch.fr>  2018-01-26 16:15:49 +0100
commit    70b36a7d18a9c507cb52f3563c17e5767fb06bae (patch)
tree      36c1ecd5847e5e7eb3337506a79c77d81d8ef2ba /experimental
parent    0e67880879d5f9f103eac8f7d89d17124ecf9480 (diff)
download  ocaml-70b36a7d18a9c507cb52f3563c17e5767fb06bae.tar.gz
Benchmark results.
Diffstat (limited to 'experimental')
-rwxr-xr-x  experimental/frisch/bench_ocamllex_optims.md  98
1 file changed, 98 insertions, 0 deletions
diff --git a/experimental/frisch/bench_ocamllex_optims.md b/experimental/frisch/bench_ocamllex_optims.md
new file mode 100755
index 0000000000..7be2acb684
--- /dev/null
+++ b/experimental/frisch/bench_ocamllex_optims.md
@@ -0,0 +1,98 @@
+Some benchmarks to evaluate the speedup of the optimizations to `ocamllex -ml`.
+
+In all tests, we tokenize the `typecore.ml` file (first loaded into
+memory) using either:
+
+ - the OCaml lexer, or
+
+ - a simpler lexer with trivial actions (to eliminate the cost of
+actions themselves, which is not under the control of ocamllex).
+
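To make "trivial actions" concrete, here is a hand-written stand-in: each token is a whitespace-separated word and the action does nothing but count. This is a sketch for exposition only; the actual benchmarks use ocamllex-generated lexers, and none of the names below come from the repository:

```ocaml
(* Toy lexer with trivial actions over an in-memory source string.
   [token] returns the next token starting at [!pos], advancing [pos]. *)
type token = Word | Eof

let is_blank c = c = ' ' || c = '\n' || c = '\t'

let token (buf : string) (pos : int ref) : token =
  let n = String.length buf in
  while !pos < n && is_blank buf.[!pos] do incr pos done;
  if !pos >= n then Eof
  else begin
    while !pos < n && not (is_blank buf.[!pos]) do incr pos done;
    Word
  end

(* Tokenize the whole source, as the benchmark loop does. *)
let count_tokens source =
  let pos = ref 0 in
  let rec loop acc =
    match token source pos with
    | Eof -> acc
    | Word -> loop (acc + 1)
  in
  loop 0
```

Since the actions do nothing, any time measured around `count_tokens` is attributable to the scanning machinery itself.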
+We run the output of:
+
+ - `ocamllex` without the `-ml` flag, i.e. using tables interpreted at
+runtime by the C support code;
+
+ - `ocamllex -ml`, i.e. with the automaton translated to OCaml code;
+this is done both before and after the optimizations.
+
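The difference between the two backends can be illustrated on a toy automaton (strings of `a`s followed by one `b`): first as a transition table interpreted by a generic loop, then as direct-coded mutually recursive functions, one per state. Both forms are illustrative sketches, not actual ocamllex output:

```ocaml
(* Table backend sketch: trans.(state).(symbol) gives the next state,
   -1 rejects; state 1 is accepting. *)
let trans = [| [| 0; 1 |];      (* state 0: 'a' -> 0, 'b' -> 1 *)
               [| -1; -1 |] |]  (* state 1: no transitions *)

let table_accepts s =
  let n = String.length s in
  let rec run state i =
    if state < 0 then false
    else if i = n then state = 1
    else
      match s.[i] with
      | 'a' -> run trans.(state).(0) (i + 1)
      | 'b' -> run trans.(state).(1) (i + 1)
      | _ -> false
  in
  run 0 0

(* -ml backend sketch: the same automaton as direct OCaml code. *)
let rec state0 s i =
  if i = String.length s then false
  else match s.[i] with
    | 'a' -> state0 s (i + 1)
    | 'b' -> state1 s (i + 1)
    | _ -> false
and state1 s i = i = String.length s
```

The direct-coded form gives the compiler ordinary branches and calls to optimize, whereas the table form pays for the generic dispatch on every character.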
+For each case, we compile the benchmark with:
+
+ - `ocamlc`
+
+ - `ocamlopt -inline 10`
+
+ - `ocamlopt -inline 1000`
+
+(flambda disabled).
+
+The tables below show:
+
+ - the throughput (Mb of source code tokenized
+per second) -- higher is better;
+
+ - its inverse (number of milliseconds to tokenize one Mb) -- lower is better;
+
+ - the allocation ratio (number of bytes allocated by the GC for each byte of source code).
+
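The three figures are straightforward to derive from raw measurements; a minimal sketch, assuming the elapsed time comes from e.g. `Sys.time` and the allocation count from `Gc.allocated_bytes` (both in the standard library):

```ocaml
(* How the reported figures can be computed, given [bytes] of source
   tokenized in [seconds] with [allocated] bytes allocated by the GC. *)
let mb = 1024. *. 1024.

(* Throughput in Mb/s -- higher is better. *)
let throughput_mb_s bytes seconds = bytes /. mb /. seconds

(* Its inverse in ms/Mb -- lower is better. *)
let ms_per_mb bytes seconds = seconds *. 1000. /. (bytes /. mb)

(* Allocation ratio: GC bytes per byte of source code. *)
let alloc_ratio allocated source_bytes = allocated /. source_bytes
```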
+
+Conclusions:
+
+ - In native code, the "-ml" mode is slightly slower than the table
+ mode before the optimizations, but it becomes significantly faster
+ after the optimizations, obviously even more so when the
+ lexer actions are trivial (throughput 58.44 -> 98.30).
+
+ - In bytecode, the "-ml" mode is always much slower than the table
+   mode, but the optimizations reduce the gap a little bit.
+
+ - Not tested here, but it is likely that the optimizations produce
+ code which would be more friendly to Javascript backends
+ (js_of_ocaml and Bucklescript), as they reduce quite a bit
+ the number of function calls and mutations.
+
+Note:
+
+ - The "refill handler" mode has only been lightly tested.
+
+
+OCaml lexer:
+
+````
+WITHOUT -ml flag:
+ NATIVE, -inline 1000: 38.07 Mb/s 26.27 ms/Mb alloc x 36.79
+ NATIVE, -inline 10 : 35.42 Mb/s 28.23 ms/Mb alloc x 36.79
+ BYTECODE : 7.84 Mb/s 127.54 ms/Mb alloc x 35.48
+
+
+WITH -ml flag, TRUNK:
+ NATIVE, -inline 1000: 34.36 Mb/s 29.11 ms/Mb alloc x 36.79
+ NATIVE, -inline 10 : 34.12 Mb/s 29.31 ms/Mb alloc x 36.79
+ BYTECODE : 4.08 Mb/s 244.93 ms/Mb alloc x 35.48
+
+
+WITH -ml flag, BRANCH:
+ NATIVE, -inline 1000: 45.56 Mb/s 21.95 ms/Mb alloc x 36.79
+ NATIVE, -inline 10 : 43.19 Mb/s 23.15 ms/Mb alloc x 36.79
+ BYTECODE : 4.35 Mb/s 229.91 ms/Mb alloc x 35.48
+````
+
+
+Simpler lexer (trivial actions):
+
+````
+WITHOUT -ml flag:
+ NATIVE, -inline 1000: 58.44 Mb/s 17.11 ms/Mb alloc x 21.94
+ NATIVE, -inline 10 : 58.24 Mb/s 17.17 ms/Mb alloc x 21.94
+ BYTECODE : 12.63 Mb/s 79.21 ms/Mb alloc x 21.93
+
+WITH -ml flag, TRUNK:
+ NATIVE, -inline 1000: 55.14 Mb/s 18.13 ms/Mb alloc x 21.94
+ NATIVE, -inline 10 : 50.76 Mb/s 19.70 ms/Mb alloc x 21.94
+ BYTECODE : 5.74 Mb/s 174.22 ms/Mb alloc x 21.93
+
+WITH -ml flag, BRANCH:
+ NATIVE, -inline 1000: 98.30 Mb/s 10.17 ms/Mb alloc x 21.94
+ NATIVE, -inline 10 : 87.16 Mb/s 11.47 ms/Mb alloc x 21.94
+ BYTECODE : 6.48 Mb/s 154.43 ms/Mb alloc x 21.93
+````