<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE chapter SYSTEM "chapter.dtd">

<chapter>
  <header>
    <copyright>
      <year>2001</year><year>2017</year>
      <holder>Ericsson AB. All Rights Reserved.</holder>
    </copyright>
    <legalnotice>
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License.

    </legalnotice>

    <title>Processes</title>
    <prepared>Bjorn Gustavsson</prepared>
    <docno></docno>
    <date>2007-11-21</date>
    <rev></rev>
    <file>processes.xmlsrc</file>
  </header>

  <section>
    <title>Creating an Erlang Process</title>

    <p>An Erlang process is lightweight compared to threads and
    processes in operating systems.</p>

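    <p>A newly spawned Erlang process uses about 326 words of memory. The
    exact figure varies with the release, word size, and build options, so
    the numbers printed on your system can differ from the sample session
    below. The size can be found as follows:</p>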

    <pre>
Erlang/OTP 24 [erts-12.0] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]

Eshell V5.6  (abort with ^G)
1> <input>Fun = fun() -> receive after infinity -> ok end end.</input>
#Fun&lt;...>
2> <input>{_,Bytes} = process_info(spawn(Fun), memory).</input>
{memory,1232}
3> <input>Bytes div erlang:system_info(wordsize).</input>
309</pre>

    <p>The size includes 233 words for the heap area (which includes the
    stack). The garbage collector increases the heap as needed.</p>

    <p>The main (outer) loop for a process <em>must</em> be tail-recursive.
    Otherwise, the stack grows until the process terminates.</p>

    <p><em>DO NOT</em></p>
    <code type="erl">
loop() ->
    receive
        {sys, Msg} ->
            handle_sys_msg(Msg),
            loop();
        {From, Msg} ->
            Reply = handle_msg(Msg),
            From ! Reply,
            loop()
    end,
    io:format("Message is processed~n", []).</code>

    <p>The call to <c>io:format/2</c> will never be executed, but a
    return address will still be pushed to the stack each time
    <c>loop/0</c> is called recursively. The correct tail-recursive
    version of the function looks as follows:</p>

    <p><em>DO</em></p>
    <code type="erl">
loop() ->
    receive
        {sys, Msg} ->
            handle_sys_msg(Msg),
            loop();
        {From, Msg} ->
            Reply = handle_msg(Msg),
            From ! Reply,
            loop()
    end.</code>

    <section>
      <title>Initial Heap Size</title>

      <p>The default initial heap size of 233 words is deliberately
      conservative, so that Erlang systems with hundreds of thousands or
      even millions of processes remain feasible. The garbage collector
      grows and shrinks the heap as needed.</p>

      <p>In a system that uses comparatively few processes, performance
      <em>might</em> be improved by increasing the minimum heap size,
      either globally with the <c>+h</c> option for
      <seecom marker="erts:erl">erl</seecom> or per process
      with the <c>min_heap_size</c> option for
      <seemfa marker="erts:erlang#spawn_opt/4">spawn_opt/4</seemfa>.</p>

      <p>The gain is twofold:</p>
      <list type="bulleted">
	<item>Although the garbage collector grows the heap, it grows it
	step-by-step, which is more costly than directly establishing a
	larger heap when the process is spawned.</item>
	<item>The garbage collector can also shrink the heap if it is
	much larger than the amount of data stored on it;
	setting the minimum heap size prevents that.</item>
      </list>

      <warning><p>The emulator probably uses more memory, and because garbage
      collections occur less frequently, huge binaries can be
      kept much longer.</p></warning>

      <p>In systems with many processes, computation tasks that run
      for a short time can be spawned off into a new process with
      a higher minimum heap size. When the process is done, it sends
      the result of the computation to another process and terminates.
      If the minimum heap size is calculated properly, the process might
      not have to do any garbage collections at all.
      <em>This optimization is not to be attempted
      without proper measurements.</em></p>
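
      <p>As a rough illustration of the pattern described above, the
      following sketch runs a computation in a short-lived process that
      starts with a larger heap. The module name <c>heap_worker</c>, the
      function <c>run/2</c>, and the heap size chosen by the caller are
      hypothetical; a suitable size has to come from measurements of the
      real workload.</p>

      <code type="erl"><![CDATA[
-module(heap_worker).
-export([run/2]).

%% Run Fun in a short-lived process whose heap starts at MinHeapWords
%% words, and hand the result back to the caller. (Hypothetical helper.)
run(Fun, MinHeapWords) when is_function(Fun, 0), is_integer(MinHeapWords) ->
    Parent = self(),
    Ref = make_ref(),
    %% With the monitor option, spawn_opt/2 returns {Pid, MonitorRef}.
    {Pid, MRef} = spawn_opt(fun() -> Parent ! {Ref, Fun()} end,
                            [monitor, {min_heap_size, MinHeapWords}]),
    receive
        {Ref, Result} ->
            %% The worker replied; drop the monitor and any 'DOWN' message.
            erlang:demonitor(MRef, [flush]),
            {ok, Result};
        {'DOWN', MRef, process, Pid, Reason} ->
            %% The worker died before it could reply.
            {error, Reason}
    end.]]></code>

      <p>A call such as
      <c>heap_worker:run(fun() -> lists:sum(lists:seq(1, 100)) end, 10000)</c>
      would return <c>{ok,5050}</c>. Whether the larger heap pays off
      depends on how much garbage the computation produces.</p>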
    </section>
  </section>

  <section>
    <title>Sending Messages</title>

    <p>All data in messages sent between Erlang processes is copied,
    except for <seeguide marker="binaryhandling#refc_binary">refc
    binaries</seeguide> and <seeguide
    marker="#literal-pool">literals</seeguide> on the same Erlang
    node.</p>

    <p>When a message is sent to a process on another Erlang node,
    it is first encoded to the Erlang External Format before
    being sent through a TCP/IP socket. The receiving Erlang node decodes
    the message and distributes it to the correct process.</p>
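
    <p>The encoding step can be observed locally with
    <c>term_to_binary/1</c>, which produces a term in the Erlang External
    Format. This is only a sketch: the distribution layer adds its own
    framing and can apply optimizations such as atom caching, so the bytes
    on the wire are not necessarily identical. The module name
    <c>ext_format_demo</c> is hypothetical.</p>

    <code type="erl"><![CDATA[
-module(ext_format_demo).
-export([roundtrip/1]).

%% Encode Term to the external format and decode it again, returning
%% the encoded size in bytes together with the decoded term.
roundtrip(Term) ->
    Bin = term_to_binary(Term),
    %% Every term encoded this way starts with the version byte 131.
    <<131, _/binary>> = Bin,
    {byte_size(Bin), binary_to_term(Bin)}.]]></code>

    <p>Comparing the encoded size of typical messages gives a first
    impression of how much data a remote send has to encode, transfer,
    and decode.</p>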
  </section>

  <section>
    <marker id="receiving-messages"></marker>
    <title>Receiving Messages</title>

    <p>The cost of receiving messages depends on how complicated the
    <c>receive</c> expression is. A simple expression that matches any
    message is very cheap because it retrieves the first message in the
    message queue:</p>

    <p><em>DO</em></p>
    <code type="erl"><![CDATA[
receive
    Message -> handle_msg(Message)
end.]]></code>

    <p>However, this is not always convenient: we can receive a message that
    we do not know how to handle at this point, so it is common to
    only match the messages we expect:</p>

    <code type="erl"><![CDATA[
receive
    {Tag, Message} -> handle_msg(Message)
end.]]></code>

    <p>While this is convenient, it means that the message queue must be
    searched until a matching message is found, in the worst case from
    start to end. This is very expensive for processes with long message
    queues, so there is an optimization for the common case of sending a
    request and waiting for a response shortly after:</p>

    <p><em>DO</em></p>
    <code type="erl"><![CDATA[
MRef = monitor(process, Process),
Process ! {self(), MRef, Request},
receive
    {MRef, Reply} ->
        erlang:demonitor(MRef, [flush]),
        handle_reply(Reply);
    {'DOWN', MRef, _, _, Reason} ->
        handle_error(Reason)
end.]]></code>

    <p>Since the compiler knows that the reference created by <c>monitor/2</c>
    cannot exist before the call (since it is a globally unique identifier),
    and that the <c>receive</c> only matches messages that contain said
    reference, it will tell the emulator to search only the messages that
    arrived after the call to <c>monitor/2</c>.</p>

    <p>The above is a simple example where the optimization is all but
    guaranteed to apply, but what about more complicated code?</p>

    <section>
      <marker id="recv_opt_info"></marker>
      <title>Option recv_opt_info</title>

      <p>Use the <c>recv_opt_info</c> option to have the compiler print
      information about receive optimizations. It can be given either to
      the compiler or <c>erlc</c>:</p>

      <code type="erl"><![CDATA[
erlc +recv_opt_info Mod.erl]]></code>

      <p>or passed through an environment variable:</p>

      <code type="erl"><![CDATA[
export ERL_COMPILER_OPTIONS=recv_opt_info]]></code>

      <p>Notice that <c>recv_opt_info</c> is not meant to be a permanent
      option added to your <c>Makefile</c>s, because not all of the
      messages that it generates can be eliminated. Therefore, passing the
      option through the environment is in most cases the most practical
      approach.</p>

      <p>The warnings look as follows:</p>

      <code type="erl"><![CDATA[
efficiency_guide.erl:194: Warning: INFO: receive matches any message, this is always fast
efficiency_guide.erl:200: Warning: NOT OPTIMIZED: all clauses do not match a suitable reference
efficiency_guide.erl:206: Warning: OPTIMIZED: reference used to mark a message queue position
efficiency_guide.erl:208: Warning: OPTIMIZED: all clauses match reference created by monitor/2 at efficiency_guide.erl:206
efficiency_guide.erl:219: Warning: INFO: passing reference created by make_ref/0 at efficiency_guide.erl:218
efficiency_guide.erl:222: Warning: OPTIMIZED: all clauses match reference in function parameter 1]]></code>

      <p>To make it clearer exactly what code the warnings refer to, the
      warnings in the following examples are inserted as comments
      immediately before the code they refer to, for example:</p>

      <code type="erl"><![CDATA[
%% DO
simple_receive() ->
    %% efficiency_guide.erl:194: Warning: INFO: receive matches any message, this is always fast
    receive
        Message -> handle_msg(Message)
    end.

%% DO NOT, unless Tag is known to be a suitable reference: see
%% cross_function_receive/0 further down.
selective_receive(Tag, Message) ->
    %% efficiency_guide.erl:200: Warning: NOT OPTIMIZED: all clauses do not match a suitable reference
    receive
        {Tag, Message} -> handle_msg(Message)
    end.

%% DO
optimized_receive(Process, Request) ->
    %% efficiency_guide.erl:206: Warning: OPTIMIZED: reference used to mark a message queue position
    MRef = monitor(process, Process),
    Process ! {self(), MRef, Request},
    %% efficiency_guide.erl:208: Warning: OPTIMIZED: all clauses match reference created by monitor/2 at efficiency_guide.erl:206
    receive
        {MRef, Reply} ->
            erlang:demonitor(MRef, [flush]),
            handle_reply(Reply);
        {'DOWN', MRef, _, _, Reason} ->
            handle_error(Reason)
    end.

%% DO
cross_function_receive() ->
    %% efficiency_guide.erl:218: Warning: OPTIMIZED: reference used to mark a message queue position
    Ref = make_ref(),
    %% efficiency_guide.erl:219: Warning: INFO: passing reference created by make_ref/0 at efficiency_guide.erl:218
    cross_function_receive(Ref).

cross_function_receive(Ref) ->
    %% efficiency_guide.erl:222: Warning: OPTIMIZED: all clauses match reference in function parameter 1
    receive
        {Ref, Message} -> handle_msg(Message)
    end.]]></code>
    </section>
  </section>

  <section>
    <marker id="literal-pool"/>
    <title>Literal Pool</title>

    <p>Constant Erlang terms (hereafter called <em>literals</em>) are
    kept in <em>literal pools</em>; each loaded module has its own pool.
    The following function does not build the tuple every time
    it is called (only to have it discarded the next time the garbage
    collector runs); instead, the tuple is located in the module's
    literal pool:</p>

    <p><em>DO</em></p>
    <code type="erl"><![CDATA[
days_in_month(M) ->
    element(M, {31,28,31,30,31,30,31,31,30,31,30,31}).]]></code>

    <p>If a literal, or a term that contains a literal, is inserted
    into an Ets table, it is <em>copied</em>. The reason is that the
    module containing the literal can be unloaded in the future.</p>

    <p>When a literal is sent to another process, it is <em>not</em>
    copied. When a module holding a literal is unloaded, the literal
    will be copied to the heap of all processes that hold references
    to that literal.</p>

    <p>There also exists a global literal pool that is managed by the
    <seeerl marker="erts:persistent_term">persistent_term</seeerl>
    module.</p>
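
    <p>A minimal sketch of using the global literal pool is shown below.
    The module name <c>pt_demo</c>, its functions, and the key are
    hypothetical. Persistent terms are optimized for reading; updating or
    erasing them is expensive, so the sketch assumes data that is written
    once and then read many times.</p>

    <code type="erl"><![CDATA[
-module(pt_demo).
-export([store_config/1, lookup/0]).

%% Key under which the term is stored; any term can be used as a key.
-define(KEY, {?MODULE, config}).

%% Store Config as a persistent term. Intended to be called rarely,
%% for example once at application start.
store_config(Config) ->
    persistent_term:put(?KEY, Config).

%% Fetch the stored term, or undefined if nothing has been stored yet.
%% The term is not copied to the heap of the calling process.
lookup() ->
    persistent_term:get(?KEY, undefined).]]></code>

    <p>After <c>pt_demo:store_config(#{retries => 3})</c>, every process
    that calls <c>pt_demo:lookup()</c> reads the same stored map.</p>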

    <p>By default, 1 GB of virtual address space is reserved for all
    literal pools (in BEAM code and persistent terms). The amount of
    virtual address space reserved for literals can be changed by
    using the <seecref marker="erts:erts_alloc#MIscs"><c>+MIscs
    option</c></seecref> when starting the emulator.</p>

    <p>Here is an example of how the reserved virtual address space for
    literals can be raised to 2 GB (2048 MB):</p>

    <pre>
    erl +MIscs 2048</pre>
  </section>

  <section>
    <marker id="loss-of-sharing"></marker>
    <title>Loss of Sharing</title>

    <p>An Erlang term can have shared subterms. Here is a simple
    example:</p>

    <code type="erl"><![CDATA[
{SubTerm, SubTerm}]]></code>

    <p>Shared subterms are <em>not</em> preserved in the following
    cases:</p>
    <list type="bulleted">
      <item>When a term is sent to another process</item>
      <item>When a term is passed as the initial process arguments in
      the <c>spawn</c> call</item>
      <item>When a term is stored in an Ets table</item>
    </list>
    <p>Copying without preserving sharing is a deliberate optimization:
    most applications do not send messages with shared subterms.</p>

    <p>The following example shows how a shared subterm can be created:</p>

    <codeinclude file="efficiency_guide.erl" tag="%%kilo_byte" type="erl"/>

    <p><c>kilo_byte/0</c> creates a deep list.
    Calling <c>list_to_binary/1</c> on the deep list yields
    a binary of 1024 bytes:</p>

    <pre>
1> <input>byte_size(list_to_binary(efficiency_guide:kilo_byte())).</input>
1024</pre>

    <p>Using the <c>erts_debug:size/1</c> BIF, it can be seen that the
    deep list only requires 22 words of heap space:</p>

    <pre>
2> <input>erts_debug:size(efficiency_guide:kilo_byte()).</input>
22</pre>

    <p>Using the <c>erts_debug:flat_size/1</c> BIF, the size of the
    deep list can be calculated if sharing is ignored. It becomes
    the size of the list when it has been sent to another process
    or stored in an Ets table:</p>

    <pre>
3> <input>erts_debug:flat_size(efficiency_guide:kilo_byte()).</input>
4094</pre>

    <p>It can be verified that sharing will be lost if the data is
    inserted into an Ets table:</p>

    <pre>
4> <input>T = ets:new(tab, []).</input>
#Ref&lt;0.1662103692.2407923716.214181>
5> <input>ets:insert(T, {key,efficiency_guide:kilo_byte()}).</input>
true
6> <input>erts_debug:size(element(2, hd(ets:lookup(T, key)))).</input>
4094
7> <input>erts_debug:flat_size(element(2, hd(ets:lookup(T, key)))).</input>
4094</pre>

    <p>When the data has passed through an Ets table,
    <c>erts_debug:size/1</c> and <c>erts_debug:flat_size/1</c>
    return the same value. Sharing has been lost.</p>
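
    <p>The same effect can be checked for message passing. The sketch
    below uses a hypothetical module <c>sharing_demo</c> whose helper
    <c>deep_list/1</c> is an assumption made for this example: it builds
    a list with the same shape as the one above, but from a run-time
    argument so that no part of it is a literal. A second process
    receives the list and reports its size.</p>

    <code type="erl"><![CDATA[
-module(sharing_demo).
-export([deep_list/1, size_after_send/0]).

%% Build a deep list with 1024 leaves in which every level shares the
%% previous level as both head and tail. (Hypothetical helper.)
deep_list(Byte) ->
    double(10, [Byte]).

double(0, Acc) -> Acc;
double(N, Acc) -> double(N - 1, [Acc | Acc]).

%% Return {SizeInSender, SizeInReceiver}, both measured in words.
size_after_send() ->
    Term = deep_list(42),
    Parent = self(),
    %% The fun captures only Parent; Term reaches the new process
    %% solely through the message sent below.
    Pid = spawn(fun() ->
                        receive
                            {Parent, T} ->
                                Parent ! {self(), erts_debug:size(T)}
                        end
                end),
    Pid ! {Parent, Term},
    receive
        {Pid, Size} -> {erts_debug:size(Term), Size}
    end.]]></code>

    <p>On a runtime system built without sharing preservation,
    <c>sharing_demo:size_after_send()</c> is expected to return
    <c>{22,4094}</c>, matching the sizes shown above.</p>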

    <p>It is possible to build an <em>experimental</em> variant of the
    runtime system that will preserve sharing when copying terms by
    giving the <c>--enable-sharing-preserving</c> option to the
    <c>configure</c> script.</p>
  </section>

  <section>
    <title>SMP Emulator</title>

    <p>The emulator takes advantage of a multi-core or multi-CPU
    computer by running several Erlang scheduler
    threads (typically, the same as the number of cores).</p>

    <p>To gain performance from a multi-core computer, your application
    <em>must have more than one runnable Erlang process</em> most of the time.
    Otherwise, the Erlang emulator can still only run one Erlang process
    at a time.</p>

    <p>Benchmarks that appear to be concurrent are often sequential.
    The estone benchmark, for example, is entirely sequential. So is
    the most common implementation of the "ring benchmark"; usually one process
    is active, while the others wait in a <c>receive</c> statement.</p>
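
    <p>As an illustration of keeping several processes runnable at once,
    the following sketch applies a function to every element of a list in
    a separate process. The module name <c>pmap_demo</c> and its functions
    are hypothetical, and spawning one process per element is an
    assumption that only makes sense when each call does a substantial
    amount of work; measure before adopting it.</p>

    <code type="erl"><![CDATA[
-module(pmap_demo).
-export([pmap/2, schedulers/0]).

%% Number of scheduler threads currently available to run processes.
schedulers() ->
    erlang:system_info(schedulers_online).

%% Apply Fun to each element of List in its own process and collect
%% the results in the original order.
pmap(Fun, List) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                %% Linked, so that a crashing worker also stops the caller.
                spawn_link(fun() -> Parent ! {Ref, Fun(X)} end),
                Ref
            end || X <- List],
    %% Wait for the replies one reference at a time to preserve order.
    [receive {Ref, Result} -> Result end || Ref <- Refs].]]></code>

    <p>With this sketch, <c>pmap_demo:pmap(fun(X) -> X * X end, [1,2,3])</c>
    returns <c>[1,4,9]</c>. The workers can run on different schedulers
    only while more than one of them is runnable.</p>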
  </section>
</chapter>