1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
|
\chapter{The ``Tail Modulo Constructor'' program transformation} \label{c:tail_mod_cons}
%HEVEA\cutname{tail_mod_cons.html}
(Introduced in OCaml 4.14)
Note: this feature is considered experimental, and its interface may
evolve, with user feedback, in the next few releases of the language.
Consider this natural implementation of the "List.map" function:
\begin{caml_example*}{verbatim}
let rec map f l =
match l with
| [] -> []
| x :: xs ->
let y = f x in
y :: map f xs
\end{caml_example*}
A well-known limitation of this implementation is that the recursive
call, "map f xs", is not in \emph{tail} position. The runtime needs to
remember to continue with "y :: r" after the call returns a value "r",
therefore this function consumes some amount of call-stack space on
each recursive call. The stack usage of "map f li" is proportional to
the length of "li". This is a correctness issue for large lists on
systems configured with limited stack space -- the dreaded
"Stack_overflow" exception.
\begin{caml_example}{toplevel}
let with_stack_limit stack_limit f =
let old_gc_settings = Gc.get () in
Gc.set { old_gc_settings with stack_limit };
Fun.protect ~finally:(fun () -> Gc.set old_gc_settings) f
;;
with_stack_limit 20_000 (fun () ->
List.length (map Fun.id (List.init 1_000_000 Fun.id))
);;
\end{caml_example}
In this implementation of "map", the recursive call happens in
a position that is not a \emph{tail} position in the program, but
within a datatype constructor application that is itself in
\emph{tail} position. We say that those positions, that are composed
of tail positions and constructor applications, are \emph{tail modulo
constructor} (TMC) positions -- we sometimes write \emph{tail modulo
cons} for brevity.
It is possible to rewrite programs such that tail modulo cons
positions become tail positions; after this transformation, the
implementation of "map" above becomes \emph{tail-recursive}, in the
sense that it only consumes a constant amount of stack space. The
OCaml compiler implements this transformation on demand, using the
"[\@tail_mod_cons]" or "[\@ocaml.tail_mod_cons]" attribute on the
function to transform.
\begin{caml_example*}{verbatim}
let[@tail_mod_cons] rec map f l =
match l with
| [] -> []
| x :: xs ->
let y = f x in
y :: map f xs
\end{caml_example*}
\begin{caml_example}{toplevel}
List.length (map Fun.id (List.init 1_000_000 Fun.id));;
\end{caml_example}
This transformation only improves calls in tail-modulo-cons position,
it does not improve recursive calls that do not fit in this fragment:
\begin{caml_example*}{verbatim}[warning=71]
(* does *not* work: addition is not a data constructor *)
let[@tail_mod_cons] rec length l =
match l with
| [] -> 0
| _ :: xs -> 1 + length xs
\end{caml_example*}
It is of course possible to use the "[\@tail_mod_cons]" transformation
on functions that contain some recursive calls in tail-modulo-cons
position, and some calls in other, arbitrary positions. Only the tail
calls and tail-modulo-cons calls will happen in constant stack space.
\paragraph{General design} This feature is provided as an explicit
program transformation, not an implicit optimization. It is
annotation-driven: the user is expected to express their intent by
adding annotations in the program using attributes, and will be asked
to do so in any ambiguous situation.
We expect it to be used mostly by advanced OCaml users needing to get
some guarantees on the stack-consumption behavior of their
programs. Our recommendation is to use the "[\@tailcall]" annotation on
all callsites that should not consume any stack
space. "[\@tail_mod_cons]" extends the set of functions on which calls
can be annotated to be tail calls, helping establish stack-consumption
guarantees in more cases.
\paragraph{Performance} A standard approach to get a tail-recursive
version of "List.map" is to use an accumulator to collect output
elements, and reverse it at the end of the traversal.
\begin{caml_example*}{verbatim}
let rec map f l = map_aux f [] l
and map_aux f acc l =
match l with
| [] -> List.rev acc
| x :: xs ->
let y = f x in
map_aux f (y :: acc) xs
\end{caml_example*}
This version is tail-recursive, but it is measurably slower than the
simple, non-tail-recursive version. In contrast, the tail-mod-cons
transformation provides an implementation that has comparable
performance to the original version, even on small inputs.
\paragraph{Evaluation order} Beware that the tail-modulo-cons
transformation has an effect on evaluation order: the constructor
argument that is transformed into tail-position will always be
evaluated last. Consider the following example:
\begin{caml_example*}{verbatim}
type 'a two_headed_list =
| Nil
| Consnoc of 'a * 'a two_headed_list * 'a
let[@tail_mod_cons] rec map f = function
| Nil -> Nil
| Consnoc (front, body, rear) ->
Consnoc (f front, map f body, f rear)
\end{caml_example*}
Due to the "[\@tail_mod_cons]" transformation, the calls to "f front"
and "f rear" will be evaluated before "map f body". In particular,
this is likely to be different from the evaluation order of the
unannotated version. (The evaluation order of constructor arguments
is unspecified in OCaml, but many implementations typically use
left-to-right or right-to-left.)
This effect on evaluation order is one of the reasons why the
tail-modulo-cons transformation has to be explicitly requested by the
user, instead of being applied as an automatic optimization.
\paragraph{Why tail-modulo-cons?} Other program transformations, in
particular a transformation to continuation-passing style (CPS), can
make all functions tail-recursive, instead of targeting only a small
fragment. Some reasons to provide builtin support for the less-general
tail-mod-cons are as follows:
\begin{itemize}
\item The tail-mod-cons transformation preserves the performance of
the original, non-tail-recursive version, while
a continuation-passing-style transformation incurs a measurable
constant-factor overhead.
\item The tail-mod-cons transformation cannot be expressed as
a source-to-source transformation of OCaml programs, as it relies on
mutable state in type-unsafe ways. In contrast,
continuation-passing-style versions can be written by hand, possibly
using a convenient monadic notation.
\end{itemize}
\paragraph{Note: OCaml call stack size} In OCaml 4.x and earlier,
bytecode programs respect the "stack_limit" runtime parameter
configuration (as set using "Gc.set" in the example above), or the "l"
setting of the "OCAMLRUNPARAM" variable. Native programs ignore these
settings and only respect the operating system native stack limit, as
set by "ulimit" on Unix systems. Most operating systems run with
a relatively low stack size limit by default, so stack overflow on
non-tail-recursive functions are a common programming bug.
Starting from OCaml 5.0, native code does not use the native system
stack for OCaml function calls anymore, so it is not affected by the
operating system native stack size; both native and bytecode programs
respect the OCaml runtime's own limit. The runtime limit is set to
a much higher default than most operating system native stacks, with
a limit of at least 512MiB, so stack overflow should be much less
common in practice. There is still a stack limit by default, as it
remains useful to quickly catch bugs with looping non-tail-recursive
functions. Without a stack limit, one has to wait for the whole memory
to be consumed by the stack for the program to crash, which can take
a long time and make the system unresponsive.
This means that the "tail modulo constructor" transformation is less
important on OCaml 5: it does improve performance noticeably in some
cases, but it is not necessary for basic correctness for most
use-cases.
\section{sec:disambiguation}{Disambiguation}
It may happen that several arguments of a constructor are recursive
calls to a tail-modulo-cons function. The transformation can only turn
one of these calls into a tail call. The compiler will not make an
implicit choice, but ask the user to provide an explicit
disambiguation.
Consider this type of syntactic expressions (assuming some
pre-existing type "var" of expression variables):
\begin{caml_example*}{verbatim}
type var (* some pre-existing type of variables *)
type exp =
| Var of var
| Let of binding * exp
and binding = var * exp
\end{caml_example*}
Consider a "map" function on variables. The direct definition has two
recursive calls inside arguments of the "Let" constructor, so it gets
rejected as ambiguous.
\begin{caml_example*}{verbatim}[error]
let[@tail_mod_cons] rec map_vars f exp =
match exp with
| Var v -> Var (f v)
| Let ((v, def), body) ->
Let ((f v, map_vars f def), map_vars f body)
\end{caml_example*}
To disambiguate, the user should add a "[\@tailcall]" attribute to the
recursive call that should be transformed to tail position:
\begin{caml_example*}{verbatim}
let[@tail_mod_cons] rec map_vars f exp =
match exp with
| Var v -> Var (f v)
| Let ((v, def), body) ->
Let ((f v, map_vars f def), (map_vars[@tailcall]) f body)
\end{caml_example*}
Be aware that the resulting function is \emph{not} tail-recursive, the
recursive call on "def" will consume stack space. However, expression
trees tend to be right-leaning (lots of "Let" in sequence,
rather than nested inside each other), so putting the call on "body"
in tail position is an interesting improvement over the naive
definition: it gives bounded stack space consumption if we assume
a bound on the nesting depth of "Let" constructs.
One would also get an error when using conflicting annotations, asking
for two of the constructor arguments to be put in tail position:
\begin{caml_example*}{verbatim}[error]
let[@tail_mod_cons] rec map_vars f exp =
match exp with
| Var v -> Var (f v)
| Let ((v, def), body) ->
Let ((f v, (map_vars[@tailcall]) f def), (map_vars[@tailcall]) f body)
\end{caml_example*}
\section{sec:out-of-tmc}{Danger: getting out of tail-mod-cons}
Due to the nature of the tail-mod-cons transformation
(see Section~\ref{sec:details} for a presentation of transformation):
\begin{itemize}
\item Calls from a tail-mod-cons function to another tail-mod-cons
function declared in the same recursive-binding group are
transformed into tail calls, as soon as they occur in tail position
or tail-modulo-cons position in the source function.
\item Calls from a function \emph{not} annotated tail-mod-cons to
a tail-mod-cons function or, conversely, from a tail-mod-cons
function to a non-tail-mod-cons function are transformed into
\emph{non}-tail calls, even if they syntactically appear in tail
position in the source program.
\end{itemize}
The fact that calls in tail position in the source program may become
non-tail calls if they go from a tail-mod-cons to a non-tail-mod-cons
function is surprising, and the transformation will warn about them.
For example:
\begin{caml_example*}{verbatim}[warning=71]
let[@tail_mod_cons] rec flatten = function
| [] -> []
| xs :: xss ->
let rec append_flatten xs xss =
match xs with
| [] -> flatten xss
| x :: xs -> x :: append_flatten xs xss
in append_flatten xs xss
\end{caml_example*}
Here the "append_flatten" helper is not annotated with
"[\@tail_mod_cons]", so the calls "append_flatten xs xss" and "flatten
xss" will \emph{not} be tail calls. The correct fix here is to annotate
"append_flatten" to be tail-mod-cons.
\begin{caml_example*}{verbatim}
let[@tail_mod_cons] rec flatten = function
| [] -> []
| xs :: xss ->
let[@tail_mod_cons] rec append_flatten xs xss =
match xs with
| [] -> flatten xss
| x :: xs -> x :: append_flatten xs xss
in append_flatten xs xss
\end{caml_example*}
The same warning occurs when "append_flatten" is a non-tail-mod-cons
function of the same recursive group; using the tail-mod-cons
transformation is a property of individual functions, not whole
recursive groups.
\begin{caml_example*}{verbatim}[warning=71]
let[@tail_mod_cons] rec flatten = function
| [] -> []
| xs :: xss -> append_flatten xs xss
and append_flatten xs xss =
match xs with
| [] -> flatten xss
| x :: xs -> x :: append_flatten xs xss
\end{caml_example*}
Again, the fix is to specialize "append_flatten" as well:
\begin{caml_example*}{verbatim}
let[@tail_mod_cons] rec flatten = function
| [] -> []
| xs :: xss -> append_flatten xs xss
and[@tail_mod_cons] append_flatten xs xss =
match xs with
| [] -> flatten xss
| x :: xs -> x :: append_flatten xs xss
\end{caml_example*}
Non-recursive functions can also be annotated "[\@tail_mod_cons]"; this is
typically useful for local bindings to recursive functions.
Incorrect version:
\begin{caml_example*}{verbatim}[warning=51,warning=71]
let[@tail_mod_cons] rec map_vars f exp =
let self exp = map_vars f exp in
match exp with
| Var v -> Var (f v)
| Let ((v, def), body) ->
Let ((f v, self def), (self[@tailcall]) body)
\end{caml_example*}
Recommended fix:
\begin{caml_example*}{verbatim}
let[@tail_mod_cons] rec map_vars f exp =
let[@tail_mod_cons] self exp = map_vars f exp in
match exp with
| Var v -> Var (f v)
| Let ((v, def), body) ->
Let ((f v, self def), (self[@tailcall]) body)
\end{caml_example*}
In other cases, there is either no benefit in making the called function
tail-mod-cons, or it is not possible: for example, it is a function
parameter (the transformation only works with direct calls to
known functions).
For example, consider a substitution function on binary trees:
\begin{caml_example*}{verbatim}[warning=72]
type 'a tree = Leaf of 'a | Node of 'a tree * 'a tree
let[@tail_mod_cons] rec bind (f : 'a -> 'a tree) (t : 'a tree) : 'a tree =
match t with
| Leaf v -> f v
| Node (left, right) ->
Node (bind f left, (bind[@tailcall]) f right)
\end{caml_example*}
Here "f" is a function parameter, not a direct call, and the current
implementation is strictly first-order, it does not support
tail-mod-cons arguments. In this case, the user should indicate that
they realize this call to "f v" is not, in fact, in tail position, by
using "(f[\@tailcall false]) v".
\begin{caml_example*}{verbatim}
type 'a tree = Leaf of 'a | Node of 'a tree * 'a tree
let[@tail_mod_cons] rec bind (f : 'a -> 'a tree) (t : 'a tree) : 'a tree =
match t with
| Leaf v -> (f[@tailcall false]) v
| Node (left, right) ->
Node (bind f left, (bind[@tailcall]) f right)
\end{caml_example*}
\section{sec:details}{Details on the transformation}
To use this advanced feature, it helps to be aware that the function transformation produces a specialized function in destination-passing-style.
Recall our "map" example:
\begin{caml_example*}{verbatim}
let rec map f l =
match l with
| [] -> []
| x :: xs ->
let y = f x in
y :: map f xs
\end{caml_example*}
Below is a description of the transformed program in pseudo-OCaml
notation: some operations are not expressible in OCaml source code.
(The transformation in fact happens on the Lambda intermediate
representation of the OCaml compiler.)
\begin{verbatim}
let rec map f l =
match l with
| [] -> []
| x :: xs ->
let y = f x in
let dst = y ::{mutable} Hole in
map_dps f xs dst 1;
dst
and map_dps f l dst idx =
match l with
| [] -> dst.idx <- []
| x :: xs ->
let y = f x in
let dst' = y ::{mutable} Hole in
dst.idx <- dst';
map_dps f xs dst' 1
\end{verbatim}
The source version of "map" gets transformed into two functions,
a \emph{direct-style} version that is also called "map", and
a \emph{destination-passing-style} version (DPS) called "map_dps". The
destination-passing-style version does not return a result directly,
instead it writes it into a memory location specified by two
additional function parameters, "dst" (a memory block) and "i"
(a position within the memory block).
The source call "y :: map f xs" gets transformed into the creation of
a mutable block "y ::{mutable} Hole", whose second parameter is an
un-initialized \emph{hole}. The block is then passed to "map_dps" as
a destination parameter (with offset "1").
Notice that "map" does not call itself recursively, it calls
"map_dps". Then, "map_dps" calls itself recursively, in
a tail-recursive way.
The call from "map" to "map_dps" is \emph{not} a tail call (this is
something that we could improve in the future); but this call happens
only once when invoking "map f l", with all list elements after the
first one processed in constant stack by "map_dps".
This explains the ``getting out of tail-mod-cons''
subtleties. Consider our previous example involving mutual recursion
between "flatten" and "append_flatten".
\begin{verbatim}
let[@tail_mod_cons] rec flatten l =
match l with
| [] -> []
| xs :: xss ->
append_flatten xs xss
\end{verbatim}
The call to "append_flatten", which syntactically appears in tail
position, gets transformed differently depending on whether the
function has a destination-passing-style version available, that is,
whether it is itself annotated "[\@tail_mod_cons]":
\begin{verbatim}
(* if append_flatten_dps exists *)
and flatten_dps l dst i =
match l with
| [] -> dst.i <- []
| xs :: xss ->
append_flatten_dps xs xss dst i
(* if append_flatten_dps does not exist *)
and rec flatten_dps l dst i =
match l with
| [] -> dst.i <- []
| xs :: xss ->
dst.i <- append_flatten xs xss
\end{verbatim}
If "append_flatten" does not have a destination-passing-style version,
the call gets transformed to a non-tail call.
\section{sec:limitations}{Current limitations}
\paragraph{Purely syntactic criterion} Just like tail calls in
general, the notion of tail-modulo-constructor position is purely
syntactic; some simple refactoring will move calls out of
tail-modulo-constructor position.
\begin{caml_example*}{verbatim}
(* works as expected *)
let[@tail_mod_cons] rec map f li =
match li with
| [] -> []
| x :: xs ->
let y = f x in
y ::
(* this call is in TMC position *)
map f xs
\end{caml_example*}
\begin{caml_example*}{verbatim}[warning=71]
(* not optimizable anymore *)
let[@tail_mod_cons] rec map f li =
match li with
| [] -> []
| x :: xs ->
let y = f x in
let ys =
(* this call is not in TMC position anymore *)
map f xs in
y :: ys
\end{caml_example*}
\paragraph{Local, first-order transformation} When a function gets
transformed with tail-mod-cons, two definitions are generated, one
providing a direct-style interface and one providing the
destination-passing-style version. However, not all calls to this
function in tail-modulo-cons position will use the
destination-passing-style version and become tail calls:
\begin{itemize}
\item The transformation is local: only tail-mod-cons calls to "foo"
within the same compilation unit as "foo" become tail calls.
\item The transformation is first-order: only direct calls to
known tail-mod-cons functions become tail calls when in
tail-mod-cons position, never calls to function parameters.
\end{itemize}
Consider the call "Option.map foo x" for example: even if "foo" is
called in tail-mod-cons position within the definition of
"Option.map", that call will never become a tail call. (This would be the
case even if the call to "Option.map" was inside the "Option"
module.)
In general this limitation is not a problem for recursive functions:
the first call from an outside module or a higher-order function will
consume stack space, but further recursive calls in tail-mod-cons
position will get optimized. For example, if "List.map" is defined as
a tail-mod-cons function, calls from outside the "List" module will
not become tail calls when in tail positions, but the recursive calls
within the definition of "List.map" are in tail-modulo-cons positions
and do become tail calls: processing the first element of the list
will consume stack space, but all further elements are handled in
constant space.
These limitations may be an issue in more complex situations where
mutual recursion happens between functions, with some functions not
annotated tail-mod-cons, or defined across different modules, or called
indirectly, for example through function parameters.
\paragraph{Non-exact calls to tupled functions} OCaml performs an
implicit optimization for ``tupled'' functions, which take a single
parameter that is a tuple: "let f (x, y, z) = ...". Direct calls to
these functions with a tuple literal argument (like "f (a, b, c)") will
call the ``tupled'' function by passing the parameters directly, instead
of building a tuple of them. Other calls, either indirect calls or calls
passing a more complex tuple value (like "let t = (a, b, c) in f t") are
compiled as ``inexact'' calls that go through a wrapper.
The "[\@tail_mod_cons]" transformation supports tupled functions, but
will only optimize ``exact'' calls in tail position; direct calls to
something other than a tuple literal will not become tail calls. The
user can manually unpack a tuple to force a call to be ``exact'': "let
(x, y, z) = t in f (x, y, z)". If there is any doubt as to whether a call
can be tail-mod-cons-optimized or not, one can use the "[\@tailcall]"
attribute on the called function, which will warn if the
transformation is not possible.
\begin{caml_example*}{verbatim}[warning=51]
let rec map (f, l) =
match l with
| [] -> []
| x :: xs ->
let y = f x in
let args = (f, xs) in
(* this inexact call cannot be tail-optimized, so a warning will be raised *)
y :: (map[@tailcall]) args
\end{caml_example*}
|