This guide to using maps efficiently starts with a brief section
on the choice between records or maps, followed by three sections
giving concrete (but brief) advice on using maps as an alternative to
records, as dictionaries, and as sets. The remaining sections dig
deeper, looking at how maps are implemented, the map syntax, and
finally the functions in the maps module.
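Terminology used in this chapter: a map with at most 32 elements is informally called a small map, and a map with more than 32 elements is informally called a large map.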
If the advice in this chapter is followed, records and small maps used in place of records are expected to perform similarly. Therefore, the choice between records and maps should be based on the desired properties of the data structure and not on performance.
The advantages of records compared to maps are:
The disadvantage of records compared to maps is that if a new field is added to a record, all code that uses that record must be recompiled. Because of that, it is recommended to only use records within a unit of code that can easily be recompiled all at once, for example within a single application or single module.
Use the map syntax instead of the functions in the maps module.
Avoid having more than 32 elements in the map. As soon as there are more than 32 elements in the map, it will require more memory and keys can no longer be shared with other instances of the map.
When creating a new map, always create it with all keys that will ever be used. To maximize sharing of keys (thus minimizing memory use), create a single function that constructs the map using the map syntax and always use it.
Always update the map using the := operator (that is, requiring that an element with that key already exists).
Whenever possible, match multiple map elements at once.
Whenever possible, update multiple map elements at once.
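Avoid default values and the maps:get/3 function. If there are default values, sharing of keys between different instances of the map will be less effective.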
To avoid having to deal with a map that may lack some keys, maps:merge/2 can efficiently add multiple default values. For example:
DefaultMap = #{shoe_size => 42, editor => emacs},
MapWithDefaultsApplied = maps:merge(DefaultMap, OtherMap)
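The advice above on using a map as a record replacement can be sketched as a small module. This is only an illustration: the module name person_map, its keys, and its function names are hypothetical and do not come from this guide.

```erlang
-module(person_map).
-export([new/0, rename/3, full_name/1]).

%% Single constructor: every instance is created with the same full set
%% of keys, which gives small maps the best chance of sharing keys.
new() ->
    #{first => undefined, last => undefined, age => undefined}.

%% Update several existing keys in one expression. The ':=' operator
%% raises a {badkey, Key} error if a key is missing, which helps catch
%% misspelled keys.
rename(Person, First, Last) ->
    Person#{first := First, last := Last}.

%% Match several keys at once in the function head.
full_name(#{first := First, last := Last}) ->
    {First, Last}.
```

A caller would always start from person_map:new() and then modify the returned map, rather than writing out a fresh map literal at each call site.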
Using a map as a dictionary implies the following usage pattern:
Keys are usually variables not known at compile-time.
There can be any number of elements in the map.
Usually, no more than one element is looked up or updated at once.
Given that usage pattern, the difference in performance between using the map syntax and the maps module is usually small. Therefore, which one to use is mostly a matter of taste.
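As a minimal sketch of the dictionary usage pattern, the following hypothetical module (word_count is not part of OTP) counts how often each element occurs in a list. The keys are known only at run time, and a single element is looked up and updated per step. It deliberately mixes the map syntax and the maps module to show that either style fits this pattern.

```erlang
-module(word_count).
-export([count/1]).

%% Fold over the list, keeping a map from element to occurrence count.
count(Words) ->
    lists:foldl(fun bump/2, #{}, Words).

%% Look up and update exactly one element per call.
bump(Word, Counts) ->
    case Counts of
        #{Word := N} ->
            %% Key already present: update with the map syntax.
            Counts#{Word := N + 1};
        #{} ->
            %% Key not present: insert with the maps module.
            maps:put(Word, 1, Counts)
    end.
```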
Maps are usually the most efficient dictionary data structure, with a few exceptions:
If it is necessary to frequently convert a dictionary to a sorted list, or from a sorted list to a dictionary, using gb_trees can sometimes be a better choice.
If all keys are non-negative integers, the array module can be a better choice.
Starting in OTP 24, the sets module has an option to represent sets as maps. Examples:
1> sets:new([{version,2}]).
#{}
2> sets:from_list([x,y,z], [{version,2}]).
#{x => [],y => [],z => []}
If the intersection operation is frequently used and operations that operate on a single element in a set (such as is_element/2) are avoided, ordsets can be a better choice than sets.
If the elements of the set are integers in a fairly compact range, the set can be represented as an integer where each bit represents an element in the set. The union operation is performed by bor and the intersection operation by band.
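The integer-as-set representation can be sketched as follows. The module name bitset and its function names are hypothetical. The sketch assumes that all elements are non-negative integers, with element I stored as bit I.

```erlang
-module(bitset).
-export([from_list/1, union/2, intersection/2, is_element/2]).

%% Build a set from a list of non-negative integers by setting bit I
%% for every element I.
from_list(Ints) ->
    lists:foldl(fun(I, Set) -> Set bor (1 bsl I) end, 0, Ints).

%% Union: a bit is set if it is set in either operand.
union(A, B) -> A bor B.

%% Intersection: a bit is set only if it is set in both operands.
intersection(A, B) -> A band B.

%% Membership test for a single element.
is_element(I, Set) -> (Set band (1 bsl I)) =/= 0.
```

Because Erlang integers have arbitrary precision, the range is not limited to a machine word, but the representation only pays off when the range is fairly compact.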
Internally, maps have two distinct representations depending on the number of elements in the map. The representation changes when a map grows beyond 32 elements, or when it shrinks to 32 elements or fewer.
A small map looks like this inside the runtime system:
As an example, let us look at how the map
Let us update the map:
Finally, change the value of one element:
When the value for an existing key is updated, the key tuple is not updated, allowing the key tuple to be shared with other instances of the map that have the same keys. In fact, the key tuple can be shared between all maps with the same keys with some care. To arrange that, define a function that returns a map. For example:
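new() ->
    #{a => default, b => default, c => default}.
Defined like this, the key tuple {a,b,c} will be a global literal. To ensure that the key tuple is shared when creating an instance of the map, always call new() and modify the returned map.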
Using the map syntax with small maps is particularly efficient. As long as the keys are known at compile-time, the map is updated in one go, making the time to update a map essentially constant regardless of the number of keys updated. The same goes for matching. (When the keys are variables, one or more of the keys could be identical, so the operations need to be performed sequentially from left to right.)
The memory size for a small map is the size of all keys and values plus 5 words. See the Memory section of the Efficiency Guide for more information about memory sizes.
A map with more than 32 elements is implemented as a hash array mapped trie (HAMT).
There is less performance to be gained by matching or updating multiple elements using the map syntax on a large map compared to a small map. The execution time is roughly proportional to the number of elements matched or updated.
The storage overhead for a large map is higher than for a small map. For a large map, the extra number of words besides the keys and values is roughly proportional to the number of elements. For a map with 33 elements the overhead is at least 53 heap words according to the formula in the Memory section of the Efficiency Guide.
When a large map is updated, the updated map and the original map will share common parts of the HAMT, but sharing will never be as effective as the best possible sharing of the key tuple for small maps.
Therefore, if maps are used instead of records and it is expected that many instances of the map will be created, it is more efficient from a memory standpoint to avoid using large maps (for example, by grouping related map elements into sub maps to reduce the number of elements).
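As a sketch of grouping related elements into sub maps, consider the hypothetical module below (the name conn_state and all of its keys are invented for illustration). Each top-level map and each sub map stays small even as the total number of fields grows.

```erlang
-module(conn_state).
-export([new/0, set_timeout/2]).

%% The top-level map has four keys; related settings live in sub maps
%% instead of being flattened into one map with many keys.
new() ->
    #{socket => undefined,
      peer => undefined,
      limits => #{timeout => 5000, max_retries => 3},
      stats => #{sent => 0, received => 0}}.

%% Update a nested element: match out the sub map, update it with ':=',
%% and store it back under the same top-level key.
set_timeout(#{limits := Limits} = State, Timeout) ->
    State#{limits := Limits#{timeout := Timeout}}.
```

The trade-off is an extra level of matching and updating for nested elements, so this layout is mainly of interest when many instances are kept in memory.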
Using the map syntax is usually slightly more efficient than using the corresponding function in the maps module.
The gain in efficiency for the map syntax is more noticeable for the following operations that can only be achieved using the map syntax:
Matching multiple literal keys
Updating multiple literal keys
Adding multiple literal keys to a map
For example:
DO
DO NOT
If the map is a small map, the first example runs roughly three times as fast.
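The two styles being compared can be sketched like this. The module and function names are hypothetical, and no timing is implied by the sketch itself. Both functions return the same map, but the first updates all three literal keys in a single expression.

```erlang
-module(point_update).
-export([move/4, move_stepwise/4]).

%% Preferred: one update expression covering all three literal keys.
move(Map1, X, Y, Z) ->
    Map1#{x := X, y := Y, z := Z}.

%% Same result, but written as three separate updates, each of which
%% produces its own intermediate map.
move_stepwise(Map1, X, Y, Z) ->
    Map2 = Map1#{x := X},
    Map3 = Map2#{y := Y},
    Map3#{z := Z}.
```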
Note that for variable keys, the elements are updated sequentially from left to right. For example, given the following update with variable keys:
the compiler rewrites it like this to ensure that the updates are applied from left to right:
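A sketch of such an update and its sequential equivalent is shown below, using hypothetical module and function names. The first function is the form a programmer would write. The second spells out the left-to-right order by hand and is not a listing of actual compiler output.

```erlang
-module(var_keys).
-export([update/7, update_sequential/7]).

%% Update with variable keys, written as a single expression.
update(Map1, Key1, Key2, Key3, X, Y, Z) ->
    Map1#{Key1 := X, Key2 := Y, Key3 := Z}.

%% The same update written out step by step, left to right.
update_sequential(Map1, Key1, Key2, Key3, X, Y, Z) ->
    Map2 = Map1#{Key1 := X},
    Map3 = Map2#{Key2 := Y},
    Map3#{Key3 := Z}.
```

The order matters when two of the variables hold the same key: the rightmost value for that key is the one that remains in the resulting map.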
If a key is known to exist in a map, using the := operator is slightly more efficient than using the => operator for a small map.
Here follow some notes about most of the functions in the maps module.
If a function is implemented in C, it is pretty much impossible to implement the same functionality more efficiently in Erlang.
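However, it might be possible to beat the maps module functions that are implemented in Erlang, because they are generally implemented in a way that attempts to make the performance reasonable for all possible inputs.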
For example,
The implementation details given in this section can change in the future.
Using the map matching syntax instead of
As an optimization, the compiler will rewrite a call to
If the map is small and the keys are constants known at
compile-time, using the map matching syntax will be more
efficient than multiple calls to
As an optimization, the compiler will rewrite a call to maps:get/3 to Erlang code similar to the following:
case Map of
    #{Key := Value} -> Value;
    #{} -> Default
end
This is reasonably efficient, but if a small map is used as an alternative to using a record it is often better not to rely on default values as it prevents sharing of keys, which may in the end use more memory than what you save from not storing default values in the map.
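If default values are nevertheless required, instead of calling maps:get/3 multiple times, consider putting the default values in a map and merging that map with the other map:
DefaultMap = #{Key1 => Value1, Key2 => Value2, ..., KeyN => ValueN},
MapWithDefaultsApplied = maps:merge(DefaultMap, OtherMap)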
This helps share keys between the default map and the one you applied
defaults to, as long as the default map contains all the keys
that will ever be used and not just the ones with default values.
Whether this is faster than calling maps:get/3 multiple times depends on the size of the map and the number of default values.
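The merge-based approach to defaults can be sketched as a small module. The name editor_config and its keys are hypothetical. The default map lists every key that will ever be used, including one (theme) whose default is simply undefined, so that the merged result always has the full set of keys.

```erlang
-module(editor_config).
-export([defaults/0, with_defaults/1]).

%% Every key that can ever appear, each with its default value.
defaults() ->
    #{editor => emacs, tab_width => 8, theme => undefined}.

%% Values in UserMap take precedence over the defaults, because
%% maps:merge/2 prefers the second map when a key is in both.
with_defaults(UserMap) ->
    maps:merge(defaults(), UserMap).
```

After the merge, the rest of the code can match on the keys directly without supplying defaults at each lookup.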
Before OTP @OTP-18502@
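A map is usually the most efficient way to implement a set, but an exception is the intersection operation, where ordsets:intersection/2 used on ordsets can be more efficient than maps:intersect/2 on sets implemented as maps.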
As an optimization, the compiler rewrites calls to
The sharing of key tuples by
The compiler rewrites a call to
If the key is known to already exist in the map,
If the keys are constants known at compile-time, using the
map update syntax with the
As an optimization, the compiler rewrites calls to
Maps are usually more performant than
If the keys are constants known at compile-time, using the
map update syntax with the