\chapter{Native-code compilation (ocamlopt)} \label{c:nativecomp}
%HEVEA\cutname{native.html}

This chapter describes the OCaml high-performance
native-code compiler "ocamlopt", which compiles OCaml source files to
native code object files and links these object files to produce
standalone executables.

The native-code compiler is only available on certain platforms.
It produces code that runs faster than the bytecode produced by
"ocamlc", at the cost of increased compilation time and executable code
size. Compatibility with the bytecode compiler is extremely high: the
same source code should run identically when compiled with "ocamlc" and
"ocamlopt".

It is not possible to mix native-code object files produced by "ocamlopt"
with bytecode object files produced by "ocamlc": a program must be
compiled entirely with "ocamlopt" or entirely with "ocamlc". Native-code
object files produced by "ocamlopt" cannot be loaded in the toplevel
system "ocaml".

\section{s:native-overview}{Overview of the compiler}

The "ocamlopt" command has a command-line interface very close to that
of "ocamlc". It accepts the same types of arguments, and processes them
sequentially, after all options have been processed:

\begin{itemize}
\item
Arguments ending in ".mli" are taken to be source files for
compilation unit interfaces. Interfaces specify the names exported by
compilation units: they declare value names with their types, define
public data types, declare abstract data types, and so on. From the
file \var{x}".mli", the "ocamlopt" compiler produces a compiled interface
in the file \var{x}".cmi". The interface produced is identical to that
produced by the bytecode compiler "ocamlc".

\item
Arguments ending in ".ml" are taken to be source files for compilation
unit implementations. Implementations provide definitions for the
names exported by the unit, and also contain expressions to be
evaluated for their side-effects.  From the file \var{x}".ml", the "ocamlopt"
compiler produces two files: \var{x}".o", containing native object code,
and \var{x}".cmx", containing extra information for linking and
optimization of the clients of the unit. The compiled implementation
should always be referred to under the name \var{x}".cmx" (when given
a ".o" or ".obj" file, "ocamlopt" assumes that it contains code compiled from C,
not from OCaml).

The implementation is checked against the interface file \var{x}".mli"
(if it exists) as described in the manual for "ocamlc"
(chapter~\ref{c:camlc}).

\item
Arguments ending in ".cmx" are taken to be compiled object code.  These
files are linked together, along with the object files obtained
by compiling ".ml" arguments (if any), and the OCaml standard
library, to produce a native-code executable program. The order in
which ".cmx" and ".ml" arguments are presented on the command line is
relevant: compilation units are initialized in that order at
run-time, and it is a link-time error to use a component of a unit
before having initialized it. Hence, a given \var{x}".cmx" file must come
before all ".cmx" files that refer to the unit \var{x}.

\item
Arguments ending in ".cmxa" are taken to be libraries of object code.
Such a library packs in two files (\var{lib}".cmxa" and \var{lib}".a"/".lib")
a set of object files (".cmx" and ".o"/".obj" files). Libraries are built with
"ocamlopt -a" (see the description of the "-a" option below). The object
files contained in the library are linked as regular ".cmx" files (see
above), in the order specified when the library was built. The only
difference is that if an object file contained in a library is not
referenced anywhere in the program, then it is not linked in.

\item
Arguments ending in ".c" are passed to the C compiler, which generates
a ".o"/".obj" object file. This object file is linked with the program.

\item
Arguments ending in ".o", ".a" or ".so" (".obj", ".lib" and ".dll"
under Windows) are assumed to be C object files and
libraries. They are linked with the program.

\end{itemize}

The output of the linking phase is a regular Unix or Windows
executable file. It does not need "ocamlrun" to run.
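
As an illustration of the argument kinds above, the sketch below models two
compilation units inside a single file. The unit names "Counter" and "Main" are
hypothetical: the signature of "Counter" plays the role of "counter.mli", its
structure plays the role of "counter.ml", and "Main" is a client unit. In a
real project the units would live in separate files and could be compiled and
linked with a command such as "ocamlopt -o prog counter.mli counter.ml main.ml",
where "counter.ml" must come before "main.ml" because "Main" refers to "Counter".
Submodules are only a stand-in for separate units here; they are initialized in
textual order, which mirrors the command-line order.

\begin{verbatim}
(* Hypothetical example: Counter and Main stand for two separate
   compilation units, counter.mli/counter.ml and main.ml. *)

(* Records the order in which the two units are initialized. *)
let init_log : string list ref = ref []

(* This signature plays the role of counter.mli: it exports an
   abstract type t and the three values below, and nothing else. *)
module Counter : sig
  type t
  val create : unit -> t
  val incr : t -> unit
  val get : t -> int
end = struct
  (* This structure plays the role of counter.ml; it is checked
     against the signature above, as x.ml is checked against x.mli. *)
  type t = int ref
  let () = init_log := "Counter" :: !init_log
  let create () = ref 0
  let incr c = c := !c + 1
  let get c = !c
end

(* Main refers to Counter, so Counter must be initialized first,
   just as counter.cmx must precede main.cmx when linking. *)
module Main = struct
  let () = init_log := "Main" :: !init_log
  let result =
    let c = Counter.create () in
    Counter.incr c;
    Counter.incr c;
    Counter.get c
end

let () =
  Printf.printf "init order: %s; result = %d\n"
    (String.concat ", " (List.rev !init_log)) Main.result
\end{verbatim}

If the structure omitted a value declared in the signature, compilation would
fail with an interface mismatch, which is the same kind of check that
"ocamlopt" performs between \var{x}".ml" and \var{x}".mli".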

The compiler is able to emit some information on its internal stages:

\begin{itemize}
\item
%  The following two paragraphs are a duplicate from the description of the batch compiler.
".cmt" files for the implementation of the compilation unit
and ".cmti" for signatures if the option "-bin-annot" is passed to it (see the
description of "-bin-annot" below).
Each such file contains a typed abstract syntax tree (AST), that is produced
during the type checking procedure. This tree contains all available information
about the location and the specific type of each term in the source file.
The AST is partial if type checking was unsuccessful.

These ".cmt" and ".cmti" files are typically useful for code inspection tools.

\item
".cmir-linear" files for the implementation of the compilation unit
if the option "-save-ir-after scheduling" is passed to it.
Each such file contains a low-level intermediate representation,
produced by the instruction scheduling pass.

An external tool can perform low-level optimizations,
such as code layout, by transforming a ".cmir-linear" file.
To continue compilation, the compiler can be invoked with a (possibly modified)
".cmir-linear" file as an argument, instead of the corresponding source file.
\end{itemize}

\section{s:native-options}{Options}

The following command-line options are recognized by "ocamlopt".
The options "-pack", "-a", "-shared", "-c", "-output-obj" and
"-output-complete-obj" are mutually exclusive.

% Configure boolean variables used by the macros in unified-options.etex
\compfalse
\nattrue
\topfalse
% unified-options gathers all options across the native/bytecode
% compilers and toplevel
\input{unified-options.tex}

\paragraph{Options for the 32-bit x86 architecture}
The 32-bit code generator for Intel/AMD x86 processors ("i386"
architecture) supports the
following additional option:

\begin{options}
\item["-ffast-math"] Use the processor instructions to compute
trigonometric and exponential functions, instead of calling the
corresponding library routines.  The functions affected are:
"atan", "atan2", "cos", "log", "log10", "sin", "sqrt" and "tan".
The resulting code runs faster, but the range of supported arguments
and the precision of the result can be reduced.  In particular,
trigonometric operations "cos", "sin", "tan" have their range reduced to
$[-2^{64}, 2^{64}]$.
\end{options}

\paragraph{Options for the 64-bit x86 architecture}
The 64-bit code generator for Intel/AMD x86 processors ("amd64"
architecture) supports the following additional options:

\begin{options}
\item["-fPIC"] Generate position-independent machine code.  This is
the default.
\item["-fno-PIC"] Generate position-dependent machine code.
\end{options}

\paragraph{Options for the PowerPC architecture}
The PowerPC code generator supports the following additional options:

\begin{options}
\item["-flarge-toc"] Enables the PowerPC large model allowing the TOC (table of
contents) to be arbitrarily large.  This is the default since 4.11.
\item["-fsmall-toc"] Enables the PowerPC small model allowing the TOC to be up
to 64 kbytes per compilation unit.  Prior to 4.11 this was the default
behaviour.
\end{options}

\paragraph{Contextual control of command-line options}

The compiler command line can be modified ``from the outside''
with the following mechanisms. These are experimental
and subject to change. They should be used only for experimental and
development work, not in released packages.

\begin{options}
\item["OCAMLPARAM" \rm(environment variable)]
A set of arguments that will be inserted before or after the arguments from
the command line. Arguments are specified in a comma-separated list
of "name=value" pairs. A "_" marks the position of the command-line
arguments: for example, "a=x,_,b=y" means that "a=x" is applied before
the command-line arguments are parsed, and "b=y" after them. Finally,
an alternative separator can be specified as the
first character of the string, chosen from the set ":|; ,".
\item["ocaml_compiler_internal_params" \rm(file in the stdlib directory)]
A mapping of file names to lists of arguments that
will be added to the command line (and "OCAMLPARAM") arguments.
\item["OCAML_FLEXLINK" \rm(environment variable)]
Alternative executable to use on native
Windows for "flexlink" instead of the
configured value. Primarily used for bootstrapping.
\end{options}
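
To make the "OCAMLPARAM" format concrete, the sketch below splits a setting
string into the "name=value" pairs applied before the command-line arguments
and those applied after them. The function "split_ocamlparam" is hypothetical:
it models the format described above and is not the compiler's own parser. In
particular, rejecting a string that contains no "_" or more than one "_" is an
assumption of this sketch.

\begin{verbatim}
(* Hypothetical model of the OCAMLPARAM format; not the compiler code. *)

(* Split name=value at the first equals sign. *)
let binding field =
  match String.index_opt field '=' with
  | Some i ->
      (String.sub field 0 i,
       String.sub field (i + 1) (String.length field - i - 1))
  | None -> invalid_arg ("missing '=' in " ^ field)

(* Returns (settings applied before the command line,
            settings applied after it). *)
let split_ocamlparam (s : string) =
  (* An optional first character taken from the set : | ; space ,
     replaces the default comma separator. *)
  let sep, body =
    if s <> "" && String.contains ":|; ," s.[0]
    then (s.[0], String.sub s 1 (String.length s - 1))
    else (',', s)
  in
  let rec go seen_marker before after = function
    | [] ->
        (* Assumption of this sketch: exactly one marker is required. *)
        if not seen_marker then invalid_arg "no '_' marker";
        (List.rev before, List.rev after)
    | "" :: rest -> go seen_marker before after rest
    | "_" :: _ when seen_marker -> invalid_arg "more than one '_' marker"
    | "_" :: rest -> go true before after rest
    | field :: rest ->
        let b = binding field in
        if seen_marker then go seen_marker before (b :: after) rest
        else go seen_marker (b :: before) after rest
  in
  go false [] [] (String.split_on_char sep body)
\end{verbatim}

Applied to "a=x,_,b=y", the function puts "a=x" in the first list and "b=y"
in the second; applied to ":a=x:_:b=y", which selects ":" as the separator,
it produces the same result.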

\section{s:native-common-errors}{Common errors}

The error messages are almost identical to those of "ocamlc".
See section~\ref{s:comp-errors}.

\section{s:native:running-executable}{Running executables produced by ocamlopt}

Executables generated by "ocamlopt" are native, stand-alone executable
files that can be invoked directly.  They do
not depend on the "ocamlrun" bytecode runtime system nor on
dynamically-loaded C/OCaml stub libraries.

During execution of an "ocamlopt"-generated executable,
the following environment variables are also consulted:
\begin{options}
\item["OCAMLRUNPARAM"]  Same usage as in "ocamlrun"
  (see section~\ref{s:ocamlrun-options}), except that option "l"
  is ignored (the operating system's stack size limit
  is used instead).
\item["CAMLRUNPARAM"]  If "OCAMLRUNPARAM" is not found in the
  environment, then "CAMLRUNPARAM" will be used instead.  If
  "CAMLRUNPARAM" is not found, then the default values will be used.
\end{options}
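
The lookup order described above can be modelled as follows. The function
"effective_runparam" is hypothetical and only mirrors the documented rule:
"OCAMLRUNPARAM" first, then "CAMLRUNPARAM", otherwise the default values. The
environment lookup is passed as a parameter so that the rule can be exercised
without modifying the real environment. The last lines use the standard
"Gc.get" function to print the minor heap size in effect, which is one of the
settings that "OCAMLRUNPARAM" can change (through its "s" parameter).

\begin{verbatim}
(* Hypothetical model of the runtime parameter lookup:
   OCAMLRUNPARAM wins, CAMLRUNPARAM is the fallback, and None
   means that the default values are used. *)
let effective_runparam (getenv : string -> string option) =
  match getenv "OCAMLRUNPARAM" with
  | Some v -> Some v
  | None -> getenv "CAMLRUNPARAM"

(* Fake environment, used to exercise the rule without touching
   the real environment of the process. *)
let fake_env bindings name = List.assoc_opt name bindings

let () =
  (* Apply the rule to the real environment of this process. *)
  (match effective_runparam Sys.getenv_opt with
   | Some v -> Printf.printf "runtime parameters: %s\n" v
   | None -> print_endline "runtime parameters: defaults");
  (* Minor heap size currently in effect, in words. *)
  Printf.printf "minor heap size: %d words\n" (Gc.get ()).Gc.minor_heap_size
\end{verbatim}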

\section{s:compat-native-bytecode}{Compatibility with the bytecode compiler}

This section lists the known incompatibilities between the bytecode
compiler and the native-code compiler. Except on those points, the two
compilers should generate code that behaves identically.

\begin{itemize}

\item Signals are detected only when the program performs an
allocation in the heap. That is, if a signal is delivered while in a
piece of code that does not allocate, its handler will not be called
until the next heap allocation.

\item On ARM and PowerPC processors (32 and 64 bits), fused
  multiply-add (FMA) instructions can be generated for a
  floating-point multiplication followed by a floating-point addition
  or subtraction, as in "x *. y +. z".  The FMA instruction avoids
  rounding the intermediate result "x *. y", which is generally
  beneficial, but produces floating-point results that differ slightly
  from those produced by the bytecode interpreter.

\item On Intel/AMD x86 processors in 32-bit mode,
some intermediate results in floating-point computations are
kept in extended precision rather than being rounded to double
precision after each operation, which is what the bytecode compiler
always does.  Floating-point results can therefore differ slightly
between bytecode and native code.

\item The native-code compiler performs a number of optimizations that
the bytecode compiler does not perform, especially when the Flambda
optimizer is active.  In particular, the native-code compiler
identifies and eliminates ``dead code'', i.e.\ computations that do
not contribute to the results of the program.  For example,
\begin{verbatim}
        let _ = ignore M.f
\end{verbatim}
contains a reference to compilation unit "M" when compiled to
bytecode.  This reference forces "M" to be linked and its
initialization code to be executed.  The native-code compiler
eliminates the reference to "M", hence the compilation unit "M" may
not be linked and executed.  A workaround is to compile "M" with the
"-linkall" flag so that it will always be linked and executed, even if
not referenced.  See also the "Sys.opaque_identity" function from the
"Sys" standard library module.

\item Before 4.10, stack overflows, typically caused by excessively
  deep recursion, were not always turned into a "Stack_overflow"
  exception as with the bytecode compiler. The runtime system made
  a best effort to trap stack overflows and raise the "Stack_overflow"
  exception, but sometimes it failed and a ``segmentation fault'' or
  another system fault occurred instead.

\end{itemize}
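
Two of the points above can be handled explicitly in source code, as in the
illustrative sketch below. The module "M" is a hypothetical stand-in for a
separate compilation unit "m.ml". The function "Float.fma" (available since
OCaml 4.07) computes "x *. y +. z" with a single rounding, so a program that
wants the fused result gets it with both compilers instead of depending on
whether the native-code compiler emits an FMA instruction. The function
"Sys.opaque_identity" hides a value from the optimizer, so a reference such as
the one to "M.f" below should not be eliminated as dead code.

\begin{verbatim}
(* M stands for a separate compilation unit m.ml (hypothetical). *)
module M = struct
  let f x = x + 1
end

(* Single rounding: the same result with ocamlc and ocamlopt. *)
let fused = Float.fma 0.1 10.0 (-1.0)

(* Two roundings when each operation is rounded to double precision;
   native code may fuse these operations or keep extra precision. *)
let unfused = 0.1 *. 10.0 -. 1.0

(* Unlike ignore M.f on its own, the reference to M.f is hidden
   from the optimizer by Sys.opaque_identity. *)
let () = ignore (Sys.opaque_identity M.f)

let () = Printf.printf "fused = %.17g, unfused = %.17g\n" fused unfused
\end{verbatim}

When every operation is rounded to double precision, as the bytecode compiler
always does, "unfused" evaluates to 0.0 while "fused" is exactly $2^{-54}$;
with the native-code compiler on the platforms mentioned above, "unfused" may
come out differently.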