.. _merging_user_data:

Merging user data sections
**************************

Merging of user data sections was implemented by popular request. Users
needed a way to specify how cloud-config YAML "dictionaries" provided as user
data are handled when multiple YAML files are merged together (e.g., when
performing an ``#include``).

The previous merging algorithm was very simple: it would only overwrite, never
append. The new algorithm merges dictionaries (and the objects they contain)
in a customisable way, so users who provide cloud-config user data can
determine exactly how their objects will be merged.

For example:

.. code-block:: yaml

   #cloud-config (1)
   runcmd:
     - bash1
     - bash2

   #cloud-config (2)
   runcmd:
     - bash3
     - bash4

The previous way of merging the two objects above would result in a final
cloud-config object that contains the following:

.. code-block:: yaml

   #cloud-config (merged)
   runcmd:
     - bash3
     - bash4

Typically this is not what users want; instead they would prefer:

.. code-block:: yaml

   #cloud-config (merged)
   runcmd:
     - bash1
     - bash2
     - bash3
     - bash4

This change makes it easier to combine the various cloud-config objects you
have into a more useful list. In this way, we reduce the duplication necessary
to accomplish the same result with the previous method.
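
The difference between the two behaviours can be sketched in plain Python.
This is only an illustration, not ``cloud-init`` code: the function names
``merge_overwrite`` and ``merge_append_lists`` are hypothetical, and the
cloud-configs are represented as already-parsed dictionaries.

.. code-block:: python

   def merge_overwrite(first, second):
       """Previous behaviour: keys in ``second`` replace keys in ``first``."""
       merged = dict(first)
       merged.update(second)
       return merged


   def merge_append_lists(first, second):
       """Preferred behaviour: list values under the same key are joined."""
       merged = dict(first)
       for key, value in second.items():
           if isinstance(merged.get(key), list) and isinstance(value, list):
               merged[key] = merged[key] + value
           else:
               merged[key] = value
       return merged


   config_1 = {"runcmd": ["bash1", "bash2"]}
   config_2 = {"runcmd": ["bash3", "bash4"]}

   old_result = merge_overwrite(config_1, config_2)
   new_result = merge_append_lists(config_1, config_2)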

Built-in mergers
================

``Cloud-init`` provides merging for the following built-in types:

- :command:`Dict`
- :command:`List`
- :command:`String`

``Dict``
--------

The :command:`Dict` merger has the following options, which control what is
done with values contained within the config.

- :command:`allow_delete`: Existing values not present in the new value can be
  deleted. Defaults to ``False``.
- :command:`no_replace`: Do not replace an existing value if one is already
  present. Enabled by default.
- :command:`replace`: Overwrite existing values with new ones.

``List``
--------

The :command:`List` merger has the following options, which control what is
done with the values contained within the config.

- :command:`append`: Add new values to the end of the list. Defaults to
  ``False``.
- :command:`prepend`: Add new values to the start of the list. Defaults to
  ``False``.
- :command:`no_replace`: Do not replace an existing value if one is already
  present. Enabled by default.
- :command:`replace`: Overwrite existing values with new ones.

``String``
----------

The :command:`Str` merger has the following options, which control what is
done with the values contained within the config.

- :command:`append`: Add new value to the end of the string. Defaults to
  ``False``.

Common options
--------------

These are the common options for all merge types, which control how recursive
merging is done on other types.

- :command:`recurse_dict`: If ``True``, merge the new values of the
  dictionary. Defaults to ``True``.
- :command:`recurse_list`: If ``True``, merge the new values of the list.
  Defaults to ``False``.
- :command:`recurse_array`: Alias for ``recurse_list``.
- :command:`recurse_str`: If ``True``, merge the new values of the string.
  Defaults to ``False``.
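
As a rough illustration of how the :command:`List` options listed above
interact, the following sketch applies them to two plain Python lists. The
function ``merge_list`` is hypothetical and deliberately simplified: it
assumes ``replace`` takes the new list wholesale and that ``no_replace`` (the
default) keeps the existing list untouched. The real merger may handle these
cases in more detail.

.. code-block:: python

   def merge_list(existing, new, options=()):
       """Merge two lists using a simplified reading of the list options."""
       if "append" in options:
           return list(existing) + list(new)
       if "prepend" in options:
           return list(new) + list(existing)
       if "replace" in options:
           # Assumption: the new list replaces the existing one entirely.
           return list(new)
       # "no_replace" is described as the default: keep the existing value.
       return list(existing)


   existing = ["bash1", "bash2"]
   new = ["bash3"]

   default_result = merge_list(existing, new)
   appended = merge_list(existing, new, ["append"])
   prepended = merge_list(existing, new, ["prepend"])
   replaced = merge_list(existing, new, ["replace"])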

Customisation
=============

Because the above merging algorithm may not always be desired (just as the
previous merging algorithm was not always the preferred one), the concept of
customised merging was introduced through `merge classes`.

A `merge class` is a class definition providing functions that can be used
to merge a given type with another given type.

An example of one of these `merge classes` is the following:

.. code-block:: python

   class Merger:
       def __init__(self, merger, opts):
           self._merger = merger
           self._overwrite = 'overwrite' in opts

       # Merge ``value`` with another dictionary. If ``merge_with`` is any
       # other type of object, no merge is attempted and the original
       # value is returned unchanged.
       #
       # When both are dictionaries, a new dictionary is built from the
       # original and the one to merge with. If 'overwrite' is enabled,
       # keys that exist in the original are overwritten by the values
       # from the one to merge with. Otherwise, the two values stored
       # under a conflicting key are themselves merged.
       def _on_dict(self, value, merge_with):
           if not isinstance(merge_with, dict):
               return value
           merged = dict(value)
           for k, v in merge_with.items():
               if k in merged:
                   if not self._overwrite:
                       merged[k] = self._merger.merge(merged[k], v)
                   else:
                       merged[k] = v
               else:
                   merged[k] = v
           return merged

As you can see, there is an ``_on_dict`` method here that will be given a
source value, and a value to merge with. The result will be the merged object.

This code itself is called by another merging class which "directs" the
merging to happen by analysing the object types to merge, and attempting to
find a known object that will merge that type. An example of this can be found
in the :file:`mergers/__init__.py` file (see ``LookupMerger`` and
``UnknownMerger``).

So, following the typical ``cloud-init`` approach of allowing source code to
be downloaded and used dynamically, it is possible for users to inject their
own merging files to handle specific types of merging as they choose (the
basic ones included will handle lists, dicts, and strings). Note how each
merge can have options associated with it, which affect how the merging is
performed. For example, a dictionary merger can be told to overwrite instead
of attempting to merge, or a string merger can be told to append strings
instead of discarding other strings to merge with.
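
To make the "directing" step more concrete, here is a minimal sketch of how
such a director could look up an ``_on_<type>`` method by the type of the
source value. The ``Dispatcher`` class and its ``register`` method are
hypothetical and are not the ``LookupMerger`` implementation; ``DictMerger``
is a condensed copy of the example ``Merger`` class shown earlier. Returning
the source value when no merger is found is an assumption of this sketch.

.. code-block:: python

   class DictMerger:
       """Condensed copy of the example ``Merger`` class."""

       def __init__(self, merger, opts):
           self._merger = merger
           self._overwrite = "overwrite" in opts

       def _on_dict(self, value, merge_with):
           if not isinstance(merge_with, dict):
               return value
           merged = dict(value)
           for k, v in merge_with.items():
               if k in merged and not self._overwrite:
                   merged[k] = self._merger.merge(merged[k], v)
               else:
                   merged[k] = v
           return merged


   class Dispatcher:
       """Hypothetical director: picks a merger by the source value's type."""

       def __init__(self):
           self._mergers = []

       def register(self, merger_cls, opts=()):
           # Each merge class receives the director and its options.
           self._mergers.append(merger_cls(self, opts))

       def merge(self, value, merge_with):
           method_name = "_on_" + type(value).__name__
           for merger in self._mergers:
               handler = getattr(merger, method_name, None)
               if handler is not None:
                   return handler(value, merge_with)
           # Assumption: with no known merger, keep the source value.
           return value


   dispatcher = Dispatcher()
   dispatcher.register(DictMerger)
   merged = dispatcher.merge({"a": {"x": 1}, "b": 1}, {"a": {"y": 2}, "c": 3})

   overwriting = Dispatcher()
   overwriting.register(DictMerger, ["overwrite"])
   overwritten = overwriting.merge({"a": {"x": 1}}, {"a": {"y": 2}})

Without ``overwrite``, the nested dictionaries under ``a`` are merged
recursively through the director; with ``overwrite``, the new value under
``a`` replaces the original.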

How to activate
===============

There are a few ways to activate the merging algorithms, and to customise them
for your own usage.

1. The first way involves the usage of MIME messages in ``cloud-init`` to
   specify multi-part documents (this is one way in which multiple
   cloud-configs can be joined together into a single cloud-config). Two
   headers are looked for, both of which can define the way merging is done
   (the first header found "wins"). These headers (in lookup order) are
   ``'Merge-Type'`` and ``'X-Merge-Type'``. The value should be a string that
   satisfies the merging format definition (see below for this format).

2. The second way is to specify the `merge type` in the body of the
   cloud-config dictionary. There are two ways to specify this; either as a
   string, or as a dictionary (see format below). The keys that are looked up
   for this definition are the following (in order): ``'merge_how'``,
   ``'merge_type'``.
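
As an illustration of the first approach, the following sketch uses Python's
standard ``email`` package to build a multi-part document in which every part
carries a ``Merge-Type`` header. The helper name ``build_user_data``, the file
names, and the choice to set the header on each part are assumptions made for
this example.

.. code-block:: python

   from email.mime.multipart import MIMEMultipart
   from email.mime.text import MIMEText


   def build_user_data(parts, merge_type):
       """Build a multi-part message from (filename, body) pairs."""
       message = MIMEMultipart()
       for filename, body in parts:
           # The subtype "cloud-config" yields Content-Type: text/cloud-config
           part = MIMEText(body, "cloud-config")
           part.add_header(
               "Content-Disposition", "attachment", filename=filename
           )
           part["Merge-Type"] = merge_type
           message.attach(part)
       return message


   message = build_user_data(
       [
           ("first.yaml", "#cloud-config\nruncmd:\n  - bash1\n  - bash2\n"),
           ("second.yaml", "#cloud-config\nruncmd:\n  - bash3\n  - bash4\n"),
       ],
       "list(append)+dict(no_replace,recurse_list)+str()",
   )
   user_data = message.as_string()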

String format
-------------

The following string format is expected: ::

   classname1(option1,option2)+classname2(option3,option4)....

Each ``classname`` is matched against the names of the available merge classes
to find the class that performs the merge, and the options in parentheses are
passed to that class when it is constructed.

The following example shows the default string that gets used when none is
otherwise provided: ::

   list()+dict()+str()
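
The string format can be broken down mechanically. The following sketch shows
one way to parse it into ``(name, options)`` pairs; ``parse_merge_string`` is
a hypothetical helper written for this page, not the parser that
``cloud-init`` uses.

.. code-block:: python

   import re


   def parse_merge_string(text):
       """Split 'name(opt,opt)+name(opt)' into a list of (name, options)."""
       parsed = []
       for part in text.split("+"):
           match = re.fullmatch(r"(\w+)\((.*)\)", part.strip())
           if match is None:
               raise ValueError("malformed merger definition: %r" % part)
           name, raw_options = match.groups()
           options = [o.strip() for o in raw_options.split(",") if o.strip()]
           parsed.append((name, options))
       return parsed


   default = parse_merge_string("list()+dict()+str()")
   custom = parse_merge_string(
       "list(append)+dict(no_replace,recurse_list)+str()"
   )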

Dictionary format
-----------------

A dictionary can be used when it specifies the same information as the
string format (i.e., the second option above). For example:

.. code-block:: python

   {'merge_how': [{'name': 'list', 'settings': ['append']},
                  {'name': 'dict', 'settings': ['no_replace', 'recurse_list']},
                  {'name': 'str', 'settings': ['append']}]}

This is the dictionary equivalent of the string
``list(append)+dict(no_replace,recurse_list)+str(append)``.
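
The correspondence between the two formats can be checked with a short
conversion. ``to_merge_string`` is a hypothetical helper used only for
illustration; it assumes each entry has a ``name`` key and an optional
``settings`` list, as in the example above.

.. code-block:: python

   def to_merge_string(merge_how):
       """Render a list of {'name': ..., 'settings': [...]} as a string."""
       return "+".join(
           "{}({})".format(entry["name"], ",".join(entry.get("settings", [])))
           for entry in merge_how
       )


   merge_how = [
       {"name": "list", "settings": ["append"]},
       {"name": "dict", "settings": ["no_replace", "recurse_list"]},
       {"name": "str", "settings": ["append"]},
   ]
   as_string = to_merge_string(merge_how)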

Specifying multiple types, and what this does
=============================================

Now you may be asking yourself: "What exactly happens if I specify a
``Merge-Type`` header or ``merge_how`` dictionary for every cloud-config I
provide?"

The answer is that when merging, a stack of `merging classes` is kept. The
first entry in the stack is the default set of merging classes. This set is
used when the first cloud-config is merged with the initial empty
cloud-config dictionary. If the cloud-config that was just merged provided a
set of merging classes (via the above formats), those merging classes are
pushed onto the stack. If there is a second cloud-config to be merged, the
merging classes from the cloud-config before it (here, the first) are used
instead of the default, and so on. In this way, a cloud-config can decide how
it will merge with a cloud-config dictionary coming after it.
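
The stack behaviour described above can be traced with a small simulation.
This sketch follows the prose literally and is not taken from the
``cloud-init`` source; ``choose_mergers`` is a hypothetical function that only
reports which merger definition would be applied to each cloud-config, without
performing any merge.

.. code-block:: python

   DEFAULT_MERGERS = "list()+dict()+str()"


   def choose_mergers(configs):
       """Return the merger definition applied to each config, in order."""
       stack = [DEFAULT_MERGERS]
       used = []
       for config in configs:
           # The top of the stack decides how this config is merged.
           used.append(stack[-1])
           # Keys are looked up in the documented order.
           own = config.get("merge_how") or config.get("merge_type")
           if own is not None:
               stack.append(own)
       return used


   configs = [
       {"merge_how": "list(append)+dict()+str()", "runcmd": ["bash1"]},
       {"runcmd": ["bash2"]},
       {"merge_type": "dict(replace)+list()+str()", "runcmd": ["bash3"]},
       {"runcmd": ["bash4"]},
   ]
   used = choose_mergers(configs)

Here the first cloud-config is merged with the default definition, the second
and third with the definition supplied by the first, and the fourth with the
definition supplied by the third.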

Other uses
==========

In addition to being used for merging user data sections, the default merging
algorithm for merging :file:`'conf.d'` YAML files (which form an initial YAML
config for ``cloud-init``) was also changed to use this mechanism, to take
advantage of the full benefits (and customisation) here as well. Other places
that used the previous merging are also, similarly, now extensible (metadata
merging, for example).

Note, however, that merge algorithms are not used *across* configuration types.
As was the case before merging was implemented, user data will overwrite
:file:`'conf.d'` configuration without merging.

Example cloud-config
====================

A common request is to include multiple ``runcmd`` directives in different
files and merge all of the commands together. To achieve this, we must modify
the default merging to allow for dictionaries to join list values.

The first config:

.. code-block:: yaml

   #cloud-config
   merge_how:
    - name: list
      settings: [append]
    - name: dict
      settings: [no_replace, recurse_list]

   runcmd:
     - bash1
     - bash2

The second config:

.. code-block:: yaml

   #cloud-config
   merge_how:
    - name: list
      settings: [append]
    - name: dict
      settings: [no_replace, recurse_list]

   runcmd:
     - bash3
     - bash4