summaryrefslogtreecommitdiff
path: root/doc/source/contributor/rolling-upgrades.rst
blob: 663b78cb051ad041846ea95bdf4bc8449fe86326 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
.. _rolling-upgrades-dev:

================
Rolling Upgrades
================

The ironic (ironic-api and ironic-conductor) services support rolling upgrades,
starting with a rolling upgrade from the Ocata to the Pike release. This
describes the design of rolling upgrades, followed by notes for developing new
features or modifying an IronicObject.

Design
======

Rolling upgrades between releases
---------------------------------
Ironic follows the `release-cycle-with-intermediary release model
<https://releases.openstack.org/reference/release_models.html>`_.
The releases are `semantic-versioned <http://semver.org/>`_, in the form
<major>.<minor>.<patch>.
We refer to a ``named release`` of ironic as the release associated with a
development cycle like Pike.

In addition, ironic follows the `standard deprecation policy
<https://governance.openstack.org/tc/reference/tags/assert_follows-standard-deprecation.html>`_,
which says that the deprecation period must be at least three months
and a cycle boundary. This means that there will never be anything that
is both deprecated *and* removed between two named releases.

Rolling upgrades will be supported between:

* named release N to N+1 (starting with N == Ocata)
* any named release to its latest revision, containing backported bug fixes.
  Because those bug fixes can contain improvements to the upgrade process, the
  operator should patch the system before upgrading between named releases.
* most recent named release N (and semver releases newer than N) to master.
  As with the above bullet point, there may be a bug or a feature introduced
  on a master branch, that we want to remove before publishing a named release.
  Deprecation policy allows to do this in a 3 month time frame.
  If the feature was included and removed in intermediate releases, there
  should be a release note added, with instructions on how to do a rolling
  upgrade to master from an affected release or release span. This would
  typically instruct the operator to upgrade to a particular intermediate
  release, before upgrading to master.

Rolling upgrade process
-----------------------
Ironic supports rolling upgrades as described in the
:doc:`upgrade guide <../admin/upgrade-guide>`.

The upgrade process will cause the ironic services to be running the ``FromVer``
and ``ToVer`` releases in this order:

0. Upgrade ironic code and run database schema migrations via the
   ``ironic-dbsync upgrade`` command.

1. Upgrade code and restart ironic-conductor services, one at a time.

2. Upgrade code and restart ironic-api services, one at a time.

3. Unpin API, RPC and object versions so that the services can now use the
   latest versions in ``ToVer``. This is done via updating the
   configuration option described below in `API, RPC and object version
   pinning`_ and then restarting the services.
   ironic-conductor services should be restarted
   first, followed by the ironic-api services. This is to ensure that when new
   functionality is exposed on the unpinned API service (via API micro
   version), it is available on the backend.

+------+---------------------------------+---------------------------------+
| step | ironic-api                      | ironic-conductor                |
+======+=================================+=================================+
|  0   | all FromVer                     | all FromVer                     |
+------+---------------------------------+---------------------------------+
|  1.1 | all FromVer                     | some FromVer, some ToVer-pinned |
+------+---------------------------------+---------------------------------+
|  1.2 | all FromVer                     | all ToVer-pinned                |
+------+---------------------------------+---------------------------------+
|  2.1 | some FromVer, some ToVer-pinned | all ToVer-pinned                |
+------+---------------------------------+---------------------------------+
|  2.2 | all ToVer-pinned                | all ToVer-pinned                |
+------+---------------------------------+---------------------------------+
|  3.1 | all ToVer-pinned                | some ToVer-pinned, some ToVer   |
+------+---------------------------------+---------------------------------+
|  3.2 | all ToVer-pinned                | all ToVer                       |
+------+---------------------------------+---------------------------------+
|  3.3 | some ToVer-pinned, some ToVer   | all ToVer                       |
+------+---------------------------------+---------------------------------+
|  3.4 | all ToVer                       | all ToVer                       |
+------+---------------------------------+---------------------------------+

Policy for changes to the DB model
----------------------------------

The policy for changes to the DB model is as follows:

* Adding new items to the DB model is supported.

* The dropping of columns or tables and corresponding objects' fields is
  subject to ironic's `deprecation policy
  <https://governance.openstack.org/tc/reference/tags/assert_follows-standard-deprecation.html>`_.
  But its alembic script has to wait one more deprecation period, otherwise
  an ``unknown column`` exception will be thrown when ``FromVer`` services
  access the DB. This is because :command:`ironic-dbsync upgrade` upgrades the
  DB schema but ``FromVer`` services still contain the dropped field in their
  SQLAlchemy DB model.

* An ``alembic.op.alter_column()`` to rename or resize a column is not allowed.
  Instead, split it into multiple operations, with one operation per release
  cycle (to maintain compatibility with an old SQLAlchemy model). For example,
  to rename a column, add the new column in release N, then remove the old
  column in release N+1.

* Some implementations of SQL's ``ALTER TABLE``, such as adding foreign keys in
  PostgreSQL, may impose table locks and cause downtime. If the change cannot
  be avoided and the impact is significant (e.g. the table can be frequently
  accessed and/or store a large dataset), these cases must be mentioned in the
  release notes.

API, RPC and object version pinning
-----------------------------------

For the ironic services to be running old and new releases at the same time
during a rolling upgrade, the services need to be able to handle different API,
RPC and object versions.

This versioning is handled via the configuration option:
``[DEFAULT]/pin_release_version``. It is used to pin the API, RPC and
IronicObject (e.g., Node, Conductor, Chassis, Port, and Portgroup) versions for
all the ironic services.

The default value of empty indicates that ironic-api and ironic-conductor
will use the latest versions of API, RPC and IronicObjects. Its possible values
are releases, named (e.g. ``ocata``) or sem-versioned (e.g. ``7.0``).

Internally, in `common/release_mappings.py
<https://git.openstack.org/cgit/openstack/ironic/tree/ironic/common/release_mappings.py>`_,
ironic maintains a mapping that indicates the API, RPC and
IronicObject versions associated with each release. This mapping is
maintained manually.

During a rolling upgrade, the services using the new release will set the
configuration option value to be the name (or version) of the old release.
This will indicate to the services running the new release, which API, RPC and
object versions that they should be compatible with, in order to communicate
with the services using the old release.

Handling API versions
---------------------

When the (newer) service is pinned, the maximum API version it supports
will be the pinned version -- which the older service supports (as described
above at `API, RPC and object version pinning`_). The ironic-api
service returns HTTP status code 406 for any requests with API versions that
are higher than this maximum version.

Handling RPC versions
---------------------

`ConductorAPI.__init__()
<https://git.openstack.org/cgit/openstack/ironic/tree/ironic/conductor/rpcapi.py?id=338fdb94fc3b031e8d91bc7131cb4cadf05d7b92#n111>`_
sets the ``version_cap`` variable to the desired (latest or pinned) RPC API
version and passes it to the ``RPCClient`` as an initialization parameter. This
variable is then used to determine the maximum requested message version that
the ``RPCClient`` can send.

Each RPC call can customize the request according to this ``version_cap``.
The `Ironic RPC versions`_ section below has more details about this.

Handling IronicObject versions
------------------------------

Internally, ironic services deal with IronicObjects in their latest versions.
Only at these boundaries, when the IronicObject enters or leaves the service,
do we deal with object versioning:

* getting objects from the database: convert to latest version
* saving objects to the database: if pinned, save in pinned version; else
  save in latest version
* serializing objects (to send over RPC): if pinned, send pinned version;
  else send latest version
* deserializing objects (receiving objects from RPC): convert to latest
  version

The ironic-api service also has to handle API requests/responses
based on whether or how a feature is supported by the API version and object
versions. For example, when the ironic-api service is pinned, it can only
allow actions that are available to the object's pinned version, and cannot
allow actions that are only available for the latest version of that object.

To support this:

* All the database tables (SQLAlchemy models) of the IronicObjects have a
  column named ``version``. The value is the version of the object that
  is saved in the database.

* The method ``IronicObject.get_target_version()`` returns the target version.
  If pinned, the pinned version is returned. Otherwise, the latest version is
  returned.

* The method ``IronicObject.convert_to_version()`` converts the object into the
  target version. The target version may be a newer or older version than the
  existing version of the object. The bulk of the work is done in the helper
  method ``IronicObject._convert_to_version()``. Subclasses that have new
  versions redefine this to perform the actual conversions.

In the following,

* The old release is ``FromVer``; it uses version 1.14 of a Node object.
* The new release is ``ToVer``. It uses version 1.15 of a Node object --
  this has a deprecated ``extra`` field and a new ``meta`` field that replaces
  ``extra``.
* db_obj['meta'] and db_obj['extra'] are the database representations of those
  node fields.

Getting objects from the database (API/conductor <-- DB)
::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Both ironic-api and ironic-conductor services read values from the database.
These values are converted to IronicObjects via the method
``IronicObject._from_db_object()``. This method always returns the IronicObject
in its latest version, even if it was in an older version in the database.
This is done regardless of the service being pinned or not.

Note that if an object is converted to a later version, that IronicObject will
retain any changes (in its ``_changed_fields`` field) resulting from that
conversion. This is needed in case the object gets saved later, in the latest
version.

For example, if the node in the database is in version 1.14 and has
db_obj['extra'] set:

* a ``FromVer`` service will get a Node with node.extra = db_obj['extra']
  (and no knowledge of node.meta since it doesn't exist)

* a ``ToVer`` service (pinned or unpinned), will get a Node with:

  * node.meta = db_obj['extra']
  * node.extra = None
  * node._changed_fields = ['meta', 'extra']

Saving objects to the database (API/conductor --> DB)
:::::::::::::::::::::::::::::::::::::::::::::::::::::

The version used for saving IronicObjects to the database is determined as
follows:

* For an unpinned service, the object is saved in its latest version. Since
  objects are always in their latest version, no conversions are needed.
* For a pinned service, the object is saved in its pinned version. Since
  objects are always in their latest version, the object needs to be converted
  to the pinned version before being saved.

The method ``IronicObject.do_version_changes_for_db()`` handles this logic,
returning a dictionary of changed fields and their new values (similar to the
existing ``oslo.versionedobjects.VersionedObject.obj_get_changes()``).
Since we do not keep track internally, of the database version of an object,
the object's ``version`` field will always be part of these changes.

The `Rolling upgrade process`_  (at step 3.1) ensures that by the time an
object can be saved in its latest version, all services are running the newer
release (although some may still be pinned) and can handle the latest object
versions.

An interesting situation can occur when the services are as described in step
3.1. It is possible for an IronicObject to be saved in a newer version and
subsequently get saved in an older version. For example, a ``ToVer`` unpinned
conductor might save a node in version 1.5. A subsequent request may cause a
``ToVer`` pinned conductor to replace and save the same node in version 1.4!

Sending objects via RPC (API/conductor -> RPC)
::::::::::::::::::::::::::::::::::::::::::::::

When a service makes an RPC request, any IronicObjects that are sent as
part of that request are serialized into entities or primitives via
``IronicObjectSerializer.serialize_entity()``. The version used for objects
being serialized is as follows:

* For an unpinned service, the object is serialized to its latest version.
  Since objects are always in their latest version, no conversions are needed.
* For a pinned service, the object is serialized to its pinned version.
  Since objects are always in their latest version, the object is converted to
  the pinned version before being serialized. The converted object includes
  changes that resulted from the conversion; this is needed so that the service
  at the other end of the RPC request has the necessary information if that
  object will be saved to the database.

Receiving objects via RPC (API/conductor <- RPC)
::::::::::::::::::::::::::::::::::::::::::::::::

When a service receives an RPC request, any entities that are part of the
request need to be deserialized (via
``oslo.versionedobjects.VersionedObjectSerializer.deserialize_entity()``).
For entities that represent IronicObjects, we want the deserialization process
(via ``IronicObjectSerializer._process_object()``) to result in IronicObjects
that are in their latest version, regardless of the version they were sent in
and regardless of whether the receiving service is pinned or not. Again, any
objects that are converted will retain the changes that resulted from the
conversion, useful if that object is later saved to the database.

For example, a ``FromVer`` ironic-api could issue an ``update_node()`` RPC
request with a node in version 1.4, where node.extra was changed (so
node._changed_fields = ['extra']). This node will be serialized in version 1.4.
The receiving ``ToVer`` pinned ironic-conductor deserializes it and converts
it to version 1.5. The resulting node will have node.meta set (to the changed
value from node.extra in v1.4), node.extra = None, and node._changed_fields =
['meta', 'extra'].


When developing a new feature or modifying an IronicObject
==========================================================

When adding a new feature or changing an IronicObject, they need to be coded so
that things work during a rolling upgrade.

The following describe areas where the code may need to be changed, as well as
some points to keep in mind when developing code.

ironic-api
----------

During a rolling upgrade, the new, pinned ironic-api is talking to a new
conductor that might also be pinned. There may also be old ironic-api services.
So the new, pinned ironic-api service needs to act like it was the older
service:

* New features should not be made available, unless they are somehow totally
  supported in the old and new releases. Pinning the API version is in place
  to handle this.

  * If, for whatever reason, the API version pinning doesn't prevent a request
    from being handled that cannot or should not be handled, it should be
    coded so that the response has HTTP status code 406 (Not Acceptable).
    This is the same response to requests that have an incorrect (old) version
    specified.

Ironic RPC versions
-------------------
When the signature (arguments) of an RPC method is changed or new methods are
added, the following needs to be considered:

- The RPC version must be incremented and be the same value for both the
  client (``ironic/conductor/rpcapi.py``, used by ironic-api) and the server
  (``ironic/conductor/manager.py``, used by ironic-conductor). It should also
  be updated in ``ironic/common/release_mappings.py``.
- Until there is a major version bump, new arguments of an RPC method can only
  be added as optional. Existing arguments cannot be removed or changed in
  incompatible ways with the method in older RPC versions.
- ironic-api (client-side) sets a version cap (by passing the version cap to
  the constructor of oslo_messaging.RPCClient). This "pinning" is in place
  during a rolling upgrade when the ``[DEFAULT]/pin_release_version``
  configuration option is set.
- New RPC methods are not available when the service is pinned to the older
  release version. In this case, the corresponding REST API function should
  return a server error or implement alternative behaviours.
- Methods which change arguments should run
  ``client.can_send_version()`` to see if the version of the request is
  compatible with the version cap of the RPC Client. Otherwise the request
  needs to be created to work with a previous version that is supported.
- ironic-conductor (server-side) should tolerate older versions of requests in
  order to keep working during the rolling upgrade process. The behaviour of
  ironic-conductor will depend on the input parameters passed from the
  client-side.
- Old methods can be removed only after they are no longer used by a previous
  named release.

Object versions
---------------
When subclasses of ``ironic.objects.base.IronicObject`` are modified, the
following needs to be considered:

- Any change of fields or change in signature of remotable methods needs a bump
  of the object version. The object versions are also maintained in
  ``ironic/common/release_mappings.py``.
- New objects must be added to ``ironic/common/release_mappings.py``. Also for
  the first releases they should be excluded from the version check by adding
  their class names to the ``NEW_MODELS`` list in ``ironic/cmd/dbsync.py``.
- The arguments of remotable methods (methods which are remoted to the
  conductor via RPC) can only be added as optional. They cannot be removed or
  changed in an incompatible way (to the previous release).
- Field types cannot be changed. Instead, create a new field and deprecate
  the old one.
- There is a `unit test
  <https://git.openstack.org/cgit/openstack/ironic/tree/ironic/tests/unit/objects/test_objects.py?id=e9318c75748c87a318b4ff35d9385b4d09e79da6#n721>`_
  that generates the hash of an object using its fields and the
  signatures of its remotable methods. Objects that have a version bump need
  to be updated in the
  `expected_object_fingerprints
  <https://git.openstack.org/cgit/openstack/ironic/tree/ironic/tests/unit/objects/test_objects.py?id=e9318c75748c87a318b4ff35d9385b4d09e79da6#n682>`_
  dictionary; otherwise this test will fail. A failed test can also indicate to
  the developer that their change(s) to an object require a version bump.
- When new version objects communicate with old version objects and when
  reading or writing to the database,
  ``ironic.objects.base.IronicObject._convert_to_version()`` will be called to
  convert objects to the target version. Objects should implement their own
  ._convert_to_version() to remove or alter fields which were added or changed
  after the target version::

    def _convert_to_version(self, target_version,
                            remove_unavailable_fields=True):
        """Convert to the target version.

        Subclasses should redefine this method, to do the conversion of the
        object to the target version.

        Convert the object to the target version. The target version may be
        the same, older, or newer than the version of the object. This is
        used for DB interactions as well as for serialization/deserialization.

        The remove_unavailable_fields flag is used to distinguish these two
        cases:

        1) For serialization/deserialization, we need to remove the unavailable
           fields, because the service receiving the object may not know about
           these fields. remove_unavailable_fields is set to True in this case.

        2) For DB interactions, we need to set the unavailable fields to their
           appropriate values so that these fields are saved in the DB. (If
           they are not set, the VersionedObject magic will not know to
           save/update them to the DB.) remove_unavailable_fields is set to
           False in this case.

        :param target_version: the desired version of the object
        :param remove_unavailable_fields: True to remove fields that are
            unavailable in the target version; set this to True when
            (de)serializing. False to set the unavailable fields to appropriate
            values; set this to False for DB interactions.

  This method must handle:

  * converting from an older version to a newer version
  * converting from a newer version to an older version
  * making sure, when converting, that you take into consideration other
    object fields that may have been affected by a field (value) only available
    in a newer version. For example, if field 'new' is only available in Node
    version 1.5 and Node.affected = Node.new+3, when converting to 1.4 (an
    older version), you may need to change the value of Node.affected too.

Online data migrations
----------------------
The ``ironic-dbsync online_data_migrations`` command will perform online
data migrations.

Keep in mind the `Policy for changes to the DB model`_.
Future incompatible changes in SQLAlchemy models, like removing or renaming
columns and tables can break rolling upgrades (when ironic services are run
with different release versions simultaneously). It is forbidden to remove these
database resources when they may still be used by the previous named release.

When `creating new Alembic migrations <faq>`_ which modify existing models,
make sure that any new columns default to NULL. Test the migration out on a
non-empty database to make sure that any new constraints don't cause the
database to be locked out for normal operations.

You can find an overview on what DDL operations may cause downtime in
https://dev.mysql.com/doc/refman/5.7/en/innodb-create-index-overview.html.
(You should also check older, widely deployed InnoDB versions for issues.)
In the case of PostgreSQL, adding a foreign key may lock a whole table for
writes.

Make sure to add a release note if there are any downtime-related concerns.

Backfilling default values, and migrating data between columns or between tables
must be implemented inside an online migration script. A script is a database
API method (added to ``ironic/db/api.py`` and ``ironic/db/sqlalchemy/api.py``)
which takes two arguments:

- context: an admin context
- max_count: this is used to limit the query. It is the maximum number of
  objects to migrate; >= 0. If zero, all the objects will be migrated.

It returns a two-tuple:

- the total number of objects that need to be migrated, at the start of
  the method, and
- the number of migrated objects.

In this method, the version column can be used to select and update old
objects.

The method name should be added to the list of ``ONLINE_MIGRATIONS`` in
``ironic/cmd/dbsync.py``.

The method should be removed in the next named release after this one.

After online data migrations are completed and the SQLAlchemy models no longer
contain old fields, old columns can be removed from the database. This takes
at least 3 releases, since we have to wait until the previous named release no
longer contains references to the old schema. Before removing any resources
from the database by modifying the schema, make sure that your implementation
checks that all objects in the affected tables have been migrated. This check
can be implemented using the version column.

"ironic-dbsync upgrade" command
-------------------------------
The ``ironic-dbsync upgrade`` command first checks that the versions of the
objects are compatible with the (new) release of ironic, before it will make
any DB schema changes. If one or more objects are not compatible, the upgrade
will not be performed.

This check is done by comparing the objects' ``version`` field in the database
with the expected (or supported) versions of these objects. The supported
versions are the versions specified in
``ironic.common.release_mappings.RELEASE_MAPPING``.
The newly created tables cannot pass this check and thus have to be excluded by
adding their object class names (e.g. ``Node``) to
``ironic.cmd.dbsync.NEW_MODELS``.