= Tips on hacking the OCaml runtime system =

== Linking a test program with the debug runtime ==

Suppose you have a self-contained OCaml program `test.ml` that
crashes, and you are working in a development repository (rather than
with an installed version of the OCaml system). You probably want to
run `test.ml` against the "debug runtime", which in particular
activates the `CAMLassert` debug assertions.

If you want to use the bytecode compiler:

----
# build the runtime
make runtime -j
# compile as usual
./ocamlc.opt -nostdlib -I stdlib test.ml -o test
# run with the debug runtime (ocamlrund)
./runtime/ocamlrund ./test
----

If you want to use the native compiler:

----
# build the native runtime
make runtimeopt -j
# compile with "-runtime-variant d"
./ocamlopt.opt -nostdlib -I stdlib -runtime-variant d -I runtime test.ml -o test
./test
----

Note that the debug runtime does extra work, so it may slow down your
program -- and sometimes make the issue you are trying to debug
vanish.

== GC messages ==

The GC can print various messages about what it is doing. They are
enabled with the "v" option of OCAMLRUNPARAM. The available options
are more or less documented in
link:https://ocaml.org/manual/runtime.html#s:ocamlrun-options[].

You can enable all printing with:

----
OCAMLRUNPARAM="v=0xffffffff" ./test
----

Note: `caml_gc_log` can be used to show log messages prefixed with the
thread number, and it corresponds to the more precise setting
`v=0x800`.

== Heap verification ==

Another useful OCAMLRUNPARAM setting is `V=1`, which enables
additional sanity checks on the heap during major GC cycles.

----
OCAMLRUNPARAM="V=1" ./test
----

== Getting stack traces after assertion failures (Linux) ==

The output of a crashing OCaml program may end like this:

----
[03] file domain.c; line 404 ### Assertion failed: domain_state->young_start == NULL
Aborted (core dumped)
----

The message "core dumped" indicates that some debugging information
was kept on the disk.
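Where that information ends up depends on the kernel's `core_pattern`
setting and on the core size limit of your shell. As a rough sketch
(the helper name `describe_core_pattern` is made up for illustration,
and the script assumes a Linux-style `/proc` filesystem), the
following reports both:

[source,shell]
----
# Hypothetical helper: classify a core_pattern value.
# A leading '|' means the kernel pipes the dump to a program;
# anything else is a template for the name of the core file.
describe_core_pattern() {
  case "$1" in
    '|'*) echo "piped to helper: ${1#?}" ;;   # strip the leading '|'
    *)    echo "written to file: $1" ;;
  esac
}

# Soft limit on core file size for this shell. A value of 0 normally
# disables core files written directly by the kernel.
echo "core size limit: $(ulimit -c)"

if [ -r /proc/sys/kernel/core_pattern ]; then
  describe_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
else
  echo "no /proc/sys/kernel/core_pattern (not a Linux system?)"
fi
----

When the pattern is piped to a helper, that helper may apply its own
storage policy, so treat the reported limit as a hint only.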
On Linux, systemd-enabled systems tend to use
a systemd tool (of course!) to store core dumps.

----
# ask your system how core dumps are handled.
$ cat /proc/sys/kernel/core_pattern
|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
----

If your system is also using `systemd-coredump`, then the command
`coredumpctl dump` will show you information about the last "core
dump".

----
$ coredumpctl dump
           PID: 678260 (Domain0)
           UID: 1000 (gasche)
           GID: 1000 (gasche)
        Signal: 6 (ABRT)
     Timestamp: Fri 2022-02-25 09:30:32 CET (4min 30s ago)
  Command Line: ./test
    Executable: /home/gasche/Prog/ocaml/github-max_domains/test
 Control Group: [...]
 [...]
     Disk Size: 133.0K
       Message: Process 678260 (Domain0) of user 1000 dumped core.

                Stack trace of thread 678266:
                #0  0x00007f60ee4842a2 raise (libc.so.6 + 0x3d2a2)
                #1  0x00007f60ee46d8a4 abort (libc.so.6 + 0x268a4)
                #2  0x0000000000475022 n/a (/home/gasche/Prog/ocaml/github-max_domains/test + 0x75022)
Refusing to dump core to tty (use shell redirection or specify --output).
----

You can get a full backtrace using `echo bt | coredumpctl debug`:

----
$ echo bt | coredumpctl debug
[...]
Core was generated by `./test'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f60ee4842a2 in raise () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7f60d77fe640 (LWP 678266))]
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.33-20.fc34.x86_64
(gdb) #0  0x00007f60ee4842a2 in raise () from /lib64/libc.so.6
#1  0x00007f60ee46d8a4 in abort () from /lib64/libc.so.6
#2  0x0000000000475022 in caml_failed_assert (
    expr=expr@entry=0x488498 "domain_state->young_start == NULL",
    file_os=file_os@entry=0x488218 "domain.c", line=line@entry=404) at misc.c:56
#3  0x0000000000461831 in caml_free_minor_heap () at domain.c:404
#4  0x000000000046237b in caml_reallocate_minor_heap (wsize=wsize@entry=786432) at domain.c:469
#5  0x0000000000474404 in caml_set_minor_heap_size (wsize=wsize@entry=786432) at minor_gc.c:130
#6  0x00000000004696b3 in caml_gc_set (v=) at gc_ctrl.c:222
#7
#8  0x000000000042a3b2 in camlTest__set_gc_280 () at test.ml:17
#9  0x000000000042a818 in camlTest__fun_529 () at test.ml:39
#10 0x000000000044947a in camlStdlib__Domain__body_694 () at domain.ml:204
#11
#12 0x000000000045fe38 in caml_callback_exn (closure=, arg=, arg@entry=1) at callback.c:169
#13 0x0000000000460369 in caml_callback (closure=, arg=arg@entry=1) at callback.c:253
#14 0x0000000000461f6a in domain_thread_func (v=0x7ffdd7357bb0) at domain.c:1034
#15 0x00007f60ee61f299 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f60ee547353 in clone () from /lib64/libc.so.6
(gdb) quit
----

== Using `rr` for deterministic replay debugging ==

There is a lot of information on how to use `rr` to debug the OCaml
runtime on the OCaml Multicore wiki:
link:https://github.com/ocaml-multicore/ocaml-multicore/wiki/Debugging-the-OCaml-Multicore-runtime#rr[].

TODO: it would be nice to migrate some information here.

== Compiling with sanitizers ==

TODO: I would be curious to know!
(For the brave there are some scripts in
link:../tools/ci/inria/sanitizers/script[], but you probably don't
want to run them directly. In particular they run `git clean -xfd`,
which deletes all untracked files, including ignored ones, from your
development repository!)
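
Before running anything that invokes `git clean -xfd`, it may be worth
previewing what would be deleted. A minimal sketch, relying on git's
`-n` (dry-run) flag; the function name `preview_clean` is only
illustrative and is not part of the scripts mentioned above:

[source,shell]
----
# Hypothetical helper: list what `git clean -xfd` would delete in the
# repository at $1 (default: current directory), without deleting anything.
# -n is the dry-run flag; -x includes ignored files; -d includes directories.
preview_clean() {
  git -C "${1:-.}" clean -xd -n
}
----

Calling `preview_clean .` from the root of your working tree prints
one "Would remove ..." line per file or directory that is at risk, so
you can commit or copy away anything you want to keep first.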