author Zuul <> 2017-11-08 08:17:21 +0000
committer Gerrit Code Review <> 2017-11-08 08:17:21 +0000
commit 1f957b3f2b7330a45b26efab1b507cc24364fe31
parent 281355ae5bd12d7598d105e38bebacb3de12f1d7
parent 4fb3808316791999efcd76dc9932c2a479477d56
Merge "Rolling upgrades related dev documentation"
3 files changed, 481 insertions, 43 deletions
diff --git a/doc/source/contributor/code-contribution-guide.rst b/doc/source/contributor/code-contribution-guide.rst
index 94f4e2b27..2c6a9a502 100644
--- a/doc/source/contributor/code-contribution-guide.rst
+++ b/doc/source/contributor/code-contribution-guide.rst
@@ -142,49 +142,7 @@ not be tolerated, and will be called out in public on the mailing list.
Live Upgrade Related Concerns
-Ironic implements upgrade with the `same methodology as Nova <>`_.
-Ironic API RPC Versions
-When the signature(arguments) of an RPC method is changed, the following
-things need to be considered:
-- The RPC version must be incremented and be the same value for both the
- client (conductor/, used by ironic-api) and the server
- (conductor/, used by ironic-conductor).
-- New arguments of the method can only be added as optional. Existing
- arguments cannot be removed or changed in incompatible ways (with the
- method in older RPC versions).
-- Client-side can pin a version cap by passing ``version_cap`` to the
- constructor of oslo_messaging.RPCClient. Methods which change arguments
- should run client.can_send_version() to see if the version of the request
- is compatible with the version cap of RPC Client, otherwise the request
- needs to be created to work with a previous version that is supported.
-- Server-side should tolerate the older version of requests in order to keep
- working during the progress of live upgrade. The behavior of server-side
- should depend on the input parameters passed from the client-side.
-Object Versions
-When Object classes (subclasses of ironic.objects.base.IronicObject) are
-modified, the following things need to be considered:
-- The change of fields and the signature of remotable method needs a bump of
- object version.
-- The arguments of methods can only be added as optional, they cannot be
- removed or changed in an incompatible way.
-- Fields types cannot be changed. If it is a must, create a new field and
- deprecate the old one.
-- When new version objects communicate with old version objects,
- obj_make_compatible() will be called to convert objects to the target
- version during serialization. So objects should implement their own
- obj_make_compatible() to remove/alter attributes which was added/changed
- after the target version.
-- There is a test (object/ to generate the hash of object
- fields and the signatures of remotable methods, which helps developers to
- check if the change of objects need a version bump. The object fingerprint
- should only be updated with a version bump.
+See :doc:`/contributor/rolling-upgrades`.
Driver Internal Info
diff --git a/doc/source/contributor/index.rst b/doc/source/contributor/index.rst
index da30a3b0b..e2d53770d 100644
--- a/doc/source/contributor/index.rst
+++ b/doc/source/contributor/index.rst
@@ -26,6 +26,7 @@ primarily for developers.
Provisioning State Machine <states>
Developing New Notifications <notifications>
OSProfiler Tracing <osprofiler-support>
+ Rolling Upgrades <rolling-upgrades>
These pages contain information for PTLs, cross-project liaisons, and core
diff --git a/doc/source/contributor/rolling-upgrades.rst b/doc/source/contributor/rolling-upgrades.rst
new file mode 100644
index 000000000..d2b5b7052
--- /dev/null
+++ b/doc/source/contributor/rolling-upgrades.rst
@@ -0,0 +1,479 @@
+.. _rolling-upgrades-dev:
+Rolling Upgrades
+The ironic (ironic-api and ironic-conductor) services support rolling upgrades,
+starting with a rolling upgrade from the Ocata to the Pike release. This page
+describes the design of rolling upgrades, followed by notes for developing new
+features or modifying an IronicObject.
+Rolling upgrades between releases
+Ironic follows the `release-cycle-with-intermediary release model
+The releases are `semantic-versioned <>`_, in the form
+``<major>.<minor>.<patch>``.
+We refer to a ``named release`` of ironic as the release associated with a
+development cycle like Pike.
+In addition, ironic follows the `standard deprecation policy
+which says that the deprecation period must be at least three months
+and a cycle boundary. This means that there will never be anything that
+is both deprecated *and* removed between two named releases.
+Rolling upgrades will be supported between:
+* named release N to N+1 (starting with N == Ocata)
+* any named release to its latest revision, containing backported bug fixes.
+ Because those bug fixes can contain improvements to the upgrade process, the
+ operator should patch the system before upgrading between named releases.
+* most recent named release N (and semver releases newer than N) to master.
+ As with the above bullet point, there may be a bug or a feature introduced
+ on the master branch that we want to remove before publishing a named
+ release. The deprecation policy allows this to be done within a three-month
+ time frame. If the feature was included and removed in intermediate
+ releases, a release note should be added with instructions on how to do a
+ rolling upgrade to master from an affected release or release span. This
+ would typically instruct the operator to upgrade to a particular
+ intermediate release before upgrading to master.
+Rolling upgrade process
+Ironic supports rolling upgrades as described in the
+:doc:`upgrade guide <../admin/upgrade-guide>`.
+The upgrade process will cause the ironic services to be running the ``FromVer``
+and ``ToVer`` releases in this order:
+1. Upgrade code and restart ironic-conductor services, one at a time.
+2. Upgrade code and restart ironic-api services, one at a time.
+3. Unpin RPC and object versions so that the services can now use the latest
+ versions in ``ToVer``. This is done by updating the new configuration
+ option described below in `RPC and object version pinning`_ and then
+ restarting the services. ironic-conductor services should be restarted
+ first, followed by the ironic-api services. This ensures that when new
+ functionality is exposed on the unpinned API service (via an API
+ microversion), it is already available on the backend.
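++------+---------------------------------+---------------------------------+
+| step | ironic-api                      | ironic-conductor                |
++======+=================================+=================================+
+| 0    | all FromVer                     | all FromVer                     |
++------+---------------------------------+---------------------------------+
+| 1.1  | all FromVer                     | some FromVer, some ToVer-pinned |
++------+---------------------------------+---------------------------------+
+| 1.2  | all FromVer                     | all ToVer-pinned                |
++------+---------------------------------+---------------------------------+
+| 2.1  | some FromVer, some ToVer-pinned | all ToVer-pinned                |
++------+---------------------------------+---------------------------------+
+| 2.2  | all ToVer-pinned                | all ToVer-pinned                |
++------+---------------------------------+---------------------------------+
+| 3.1  | all ToVer-pinned                | some ToVer-pinned, some ToVer   |
++------+---------------------------------+---------------------------------+
+| 3.2  | all ToVer-pinned                | all ToVer                       |
++------+---------------------------------+---------------------------------+
+| 3.3  | some ToVer-pinned, some ToVer   | all ToVer                       |
++------+---------------------------------+---------------------------------+
+| 3.4  | all ToVer                       | all ToVer                       |
++------+---------------------------------+---------------------------------+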
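The ordering in the table above can be checked mechanically. The sketch below is not part of ironic; it assumes a simplified model in which each service state has a newest version level it sends (over RPC or to the database) and a newest level it can read: ``FromVer`` and pinned ``ToVer`` services send old versions, while pinned and unpinned ``ToVer`` services can read new ones. The names ``SEND``, ``READ``, ``STEPS`` and ``step_is_safe()`` are hypothetical.

```python
# Hypothetical model of the upgrade table (not ironic code).
# SEND: newest version level a service emits (RPC requests, objects, DB rows).
# READ: newest version level a service can understand.
SEND = {"FromVer": 0, "ToVer-pinned": 0, "ToVer": 1}
READ = {"FromVer": 0, "ToVer-pinned": 1, "ToVer": 1}

# step -> (states present among ironic-api, states among ironic-conductor)
STEPS = {
    "0": (["FromVer"], ["FromVer"]),
    "1.1": (["FromVer"], ["FromVer", "ToVer-pinned"]),
    "1.2": (["FromVer"], ["ToVer-pinned"]),
    "2.1": (["FromVer", "ToVer-pinned"], ["ToVer-pinned"]),
    "2.2": (["ToVer-pinned"], ["ToVer-pinned"]),
    "3.1": (["ToVer-pinned"], ["ToVer-pinned", "ToVer"]),
    "3.2": (["ToVer-pinned"], ["ToVer"]),
    "3.3": (["ToVer-pinned", "ToVer"], ["ToVer"]),
    "3.4": (["ToVer"], ["ToVer"]),
}


def step_is_safe(api_states, conductor_states):
    """True if every running service can read what any service sends.

    All services share the database and exchange objects over RPC, so the
    newest level anyone sends must not exceed the oldest level anyone reads.
    """
    everyone = list(api_states) + list(conductor_states)
    return max(SEND[s] for s in everyone) <= min(READ[s] for s in everyone)


for step, (apis, conductors) in STEPS.items():
    print(step, "safe" if step_is_safe(apis, conductors) else "UNSAFE")
```

Under this model every step is reported as safe, whereas a shortcut such as ``step_is_safe(["FromVer"], ["ToVer"])`` (unpinning a conductor while an old ironic-api is still running) returns ``False``.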
+Policy for changes to the DB model
+The policy for changes to the DB model is as follows:
+* Adding new items to the DB model is supported.
+* The dropping of columns or tables and corresponding objects' fields is
+ subject to ironic's `deprecation policy
+ <>`_.
+ However, the corresponding alembic script has to wait one more deprecation
+ period; otherwise an ``unknown column`` exception will be thrown when
+ ``FromVer`` services access the DB. This is because
+ :command:`ironic-dbsync upgrade` upgrades the DB schema but ``FromVer``
+ services still contain the dropped field in their SQLAlchemy DB model.
+* An ``alembic.op.alter_column()`` to rename or resize a column is not allowed.
+ Instead, split it into multiple operations, with one operation per release
+ cycle (to maintain compatibility with an old SQLAlchemy model). For example,
+ to rename a column, add the new column in release N, then remove the old
+ column in release N+1.
+* Some implementations of SQL's ``ALTER TABLE``, such as adding foreign keys in
+ PostgreSQL, may impose table locks and cause downtime. If the change cannot
+ be avoided and the impact is significant (e.g. the table is frequently
+ accessed and/or stores a large dataset), these cases must be mentioned in the
+ release notes.
+RPC and object version pinning
+For the ironic services to be running old and new releases at the same time
+during a rolling upgrade, the services need to be able to handle different RPC
+versions and object versions.
+This versioning is handled via the configuration option:
+``[DEFAULT]/pin_release_version``. It is used to pin the RPC and IronicObject
+(e.g., Node, Conductor, Chassis, Port, and Portgroup) versions for
+all the ironic services.
+The default (empty) value indicates that ironic-api and ironic-conductor
+will use the latest versions of RPC and IronicObjects. Its possible values are
+releases, either named (e.g. ``ocata``) or semantic-versioned (e.g. ``7.0``).
+Internally, in `common/
+ironic maintains a mapping that indicates the RPC and
+IronicObject versions associated with each release. This mapping is
+maintained manually.
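A minimal sketch of such a mapping and of how a pin value could be resolved against it. The dictionary layout, the release keys, and every version number below are hypothetical placeholders rather than the contents of ironic's real mapping; ``get_pinned_versions()`` is likewise an illustrative helper.

```python
# Hypothetical release mapping: the layout, keys and version numbers are
# placeholders, not the contents of ironic's real mapping.
RELEASE_MAPPING = {
    "7.0": {"rpc": "1.40", "objects": {"Node": "1.14", "Port": "1.6"}},
    "8.0": {"rpc": "1.41", "objects": {"Node": "1.15", "Port": "1.7"}},
    "master": {"rpc": "1.42", "objects": {"Node": "1.15", "Port": "1.7"}},
}
# In this sketch a named release is simply an alias for a semver release.
RELEASE_MAPPING["ocata"] = RELEASE_MAPPING["7.0"]


def get_pinned_versions(pin_release_version=None):
    """Return (rpc_version, {object_name: version}) for a pin value.

    An empty or missing pin means "use the latest versions", which this
    sketch models with the "master" entry.
    """
    key = pin_release_version or "master"
    if key not in RELEASE_MAPPING:
        raise ValueError("unknown release: %r" % pin_release_version)
    entry = RELEASE_MAPPING[key]
    return entry["rpc"], dict(entry["objects"])


print(get_pinned_versions("ocata"))  # pinned to the old release
print(get_pinned_versions(""))       # not pinned: latest versions
```

Keeping named and semver keys in one dictionary lets either form of the option value resolve to the same entry.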
+During a rolling upgrade, the services using the new release will set the
+configuration option value to be the name (or version) of the old release.
+This tells the services running the new release which RPC and object
+versions they should be compatible with, in order to communicate with the
+services using the old release.
+Handling RPC versions
+sets the ``version_cap`` variable to the desired (latest or pinned) RPC API
+version and passes it to the ``RPCClient`` as an initialization parameter. This
+variable is then used to determine the maximum requested message version that
+the ``RPCClient`` can send.
+Each RPC call can customize the request according to this ``version_cap``.
+The `Ironic RPC versions`_ section below has more details about this.
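oslo.messaging's ``RPCClient`` exposes this check as ``can_send_version()``. The stand-alone sketch below approximates it without importing oslo.messaging, assuming the usual ``major.minor`` compatibility rule (same major version, requested minor not newer than the cap's minor); ``FakeRPCClient`` and ``version_is_compatible()`` here are hypothetical.

```python
def version_is_compatible(cap, requested):
    """True if a peer capped at `cap` can handle `requested` (major.minor)."""
    cap_major, cap_minor = (int(p) for p in cap.split(".")[:2])
    req_major, req_minor = (int(p) for p in requested.split(".")[:2])
    return cap_major == req_major and req_minor <= cap_minor


class FakeRPCClient:
    """Hypothetical stand-in for an RPC client created with a version cap."""

    def __init__(self, version_cap=None):
        # None means "not pinned": any version may be sent.
        self.version_cap = version_cap

    def can_send_version(self, version):
        return self.version_cap is None or version_is_compatible(
            self.version_cap, version)


pinned = FakeRPCClient(version_cap="1.40")
unpinned = FakeRPCClient()
print(pinned.can_send_version("1.41"), unpinned.can_send_version("1.41"))
```

Versions are compared as integers per component, so ``1.9`` is correctly treated as older than ``1.40``.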
+Handling IronicObject versions
+Internally, ironic services deal with IronicObjects in their latest versions.
+Only at these boundaries, when the IronicObject enters or leaves the service,
+do we deal with object versioning:
+* getting objects from the database: convert to latest version
+* saving objects to the database: if pinned, save in pinned version; else
+ save in latest version
+* serializing objects (to send over RPC): if pinned, send pinned version;
+ else send latest version
+* deserializing objects (receiving objects from RPC): convert to latest
+ version
+The ironic-api service also has to handle API requests/responses
+based on whether or how a feature is supported by the API version and object
+versions. For example, when the ironic-api service is pinned, it can only
+allow actions that are available to the object's pinned version, and cannot
+allow actions that are only available for the latest version of that object.
+To support this:
+* All the database tables (SQLAlchemy models) of the IronicObjects have a
+ column named ``version``. The value is the version of the object that
+ is saved in the database.
+* The method ``IronicObject.get_target_version()`` returns the target version.
+ If pinned, the pinned version is returned. Otherwise, the latest version is
+ returned.
+* The method ``IronicObject.convert_to_version()`` converts the object into the
+ target version. The target version may be a newer or older version than the
+ existing version of the object. The bulk of the work is done in the helper
+ method ``IronicObject._convert_to_version()``. Subclasses that have new
+ versions redefine this to perform the actual conversions.
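The following simplified sketch shows how these two methods could fit together. It is not ironic's code: the ``PINNED_OBJECT_VERSIONS`` dictionary stands in for the release mapping combined with ``[DEFAULT]/pin_release_version``, and ``ToyObject`` is a hypothetical class whose ``_convert_to_version()`` does nothing but is where a real subclass would add or remove fields.

```python
# Hypothetical stand-in for "release mapping + pin_release_version":
# object name -> pinned version. An empty dict means "not pinned".
PINNED_OBJECT_VERSIONS = {}


def version_tuple(version):
    """'1.15' -> (1, 15) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


class ToyObject:
    VERSION = "1.15"  # latest version known to this release

    def __init__(self, version=None):
        self.version = version or self.VERSION

    @classmethod
    def get_target_version(cls):
        """Pinned version if the service is pinned, else the latest."""
        return PINNED_OBJECT_VERSIONS.get(cls.__name__, cls.VERSION)

    def convert_to_version(self, target_version,
                           remove_unavailable_fields=True):
        """Convert in place to target_version, which may be older or newer."""
        if version_tuple(target_version) == version_tuple(self.version):
            return
        self._convert_to_version(target_version, remove_unavailable_fields)
        self.version = target_version

    def _convert_to_version(self, target_version,
                            remove_unavailable_fields=True):
        # Subclasses with several versions add/remove/alter fields here.
        pass


obj = ToyObject()
PINNED_OBJECT_VERSIONS["ToyObject"] = "1.14"  # simulate a pinned service
obj.convert_to_version(obj.get_target_version())
print(obj.version)
```

Splitting the public method from the helper keeps the version bookkeeping in one place, so subclasses only describe field changes.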
+In the following examples:
+* The old release is ``FromVer``; it uses version 1.14 of a Node object.
+* The new release is ``ToVer``; it uses version 1.15 of a Node object, which
+ has a deprecated ``extra`` field and a new ``meta`` field that replaces
+ ``extra``.
+* ``db_obj['meta']`` and ``db_obj['extra']`` are the database representations
+ of those node fields.
+Getting objects from the database (API/conductor <-- DB)
+Both ironic-api and ironic-conductor services read values from the database.
+These values are converted to IronicObjects via the method
+``IronicObject._from_db_object()``. This method always returns the IronicObject
+in its latest version, even if it was in an older version in the database.
+This is done regardless of the service being pinned or not.
+Note that if an object is converted to a later version, that IronicObject will
+retain any changes (in its ``_changed_fields`` field) resulting from that
+conversion. This is needed in case the object gets saved later, in the latest
+version.
+For example, if the node in the database is in version 1.14 and has
+db_obj['extra'] set:
+* a ``FromVer`` service will get a Node with ``node.extra = db_obj['extra']``
+ (and no knowledge of ``node.meta`` since it doesn't exist)
+* a ``ToVer`` service (pinned or unpinned) will get a Node with:
+ * ``node.meta = db_obj['extra']``
+ * ``node.extra = None``
+ * ``node._changed_fields = ['meta', 'extra']``
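The ``ToVer`` case of this example can be sketched with plain dictionaries. Everything here is hypothetical and heavily simplified: ``node_from_db()`` plays the role of ``IronicObject._from_db_object()`` for the Node 1.14 to 1.15 change described above, and ``_changed_fields`` is modelled as a set.

```python
LATEST_NODE_VERSION = "1.15"


def node_from_db(db_obj):
    """Build an in-memory node (a dict) in its latest version.

    Hypothetical sketch: version 1.14 rows only have 'extra'; in 1.15 the
    value lives in 'meta' and 'extra' is deprecated (None).
    """
    node = {"version": db_obj["version"], "extra": db_obj.get("extra"),
            "meta": db_obj.get("meta"), "_changed_fields": set()}
    if node["version"] == "1.14":
        # Upgrade 1.14 -> 1.15 and remember what the conversion changed,
        # so that a later save writes these fields back to the DB.
        node["meta"] = node["extra"]
        node["extra"] = None
        node["_changed_fields"].update({"meta", "extra"})
        node["version"] = LATEST_NODE_VERSION
    return node


node = node_from_db({"version": "1.14", "extra": {"foo": "bar"}})
print(node["meta"], node["extra"], sorted(node["_changed_fields"]))
```

A row already stored at 1.15 passes through unchanged, with no recorded changes.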
+Saving objects to the database (API/conductor --> DB)
+The version used for saving IronicObjects to the database is determined as
+follows:
+* For an unpinned service, the object is saved in its latest version. Since
+ objects are always in their latest version, no conversions are needed.
+* For a pinned service, the object is saved in its pinned version. Since
+ objects are always in their latest version, the object needs to be converted
+ to the pinned version before being saved.
+The method ``IronicObject.do_version_changes_for_db()`` handles this logic,
+returning a dictionary of changed fields and their new values (similar to the
+existing ``oslo.versionedobjects.VersionedObject.obj_get_changes()``).
+Since we do not internally keep track of the database version of an object,
+the object's ``version`` field will always be part of these changes.
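A minimal sketch of this logic for the Node 1.14/1.15 example, written as a stand-alone function over a dict rather than ironic's method. How the pinned downgrade fills the fields (value moved back to ``extra``, ``meta`` stored as ``None``) is an assumption chosen for illustration.

```python
def do_version_changes_for_db(node, target_version):
    """Return {field: value} to write, converting to target_version first.

    `node` is a dict in its latest version (1.15) with a '_changed_fields'
    set. Hypothetical 1.15 -> 1.14 downgrade for a pinned service: the value
    goes back to 'extra' and 'meta' is set to None (not removed) so that
    both columns are written.
    """
    node = dict(node, _changed_fields=set(node["_changed_fields"]))
    if target_version == "1.14" and node["version"] == "1.15":
        node["extra"], node["meta"] = node["meta"], None
        node["_changed_fields"].update({"extra", "meta"})
    node["version"] = target_version
    # The DB version is not tracked, so 'version' is always written.
    fields = node["_changed_fields"] | {"version"}
    return {name: node[name] for name in sorted(fields)}


node = {"version": "1.15", "meta": {"a": 1}, "extra": None,
        "_changed_fields": {"meta"}}
print(do_version_changes_for_db(node, "1.15"))  # unpinned
print(do_version_changes_for_db(node, "1.14"))  # pinned
```

The function works on a copy, so the in-memory object stays in its latest version after the save.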
+The `Rolling upgrade process`_ (at step 3.1) ensures that by the time an
+object can be saved in its latest version, all services are running the newer
+release (although some may still be pinned) and can handle the latest object
+versions.
+An interesting situation can occur when the services are as described in step
+3.1. It is possible for an IronicObject to be saved in a newer version and
+subsequently get saved in an older version. For example, a ``ToVer`` unpinned
+conductor might save a node in version 1.15. A subsequent request may cause a
+``ToVer`` pinned conductor to replace and save the same node in version 1.14!
+Sending objects via RPC (API/conductor -> RPC)
+When a service makes an RPC request, any IronicObjects that are sent as
+part of that request are serialized into entities or primitives via
+``IronicObjectSerializer.serialize_entity()``. The version used for objects
+being serialized is as follows:
+* For an unpinned service, the object is serialized to its latest version.
+ Since objects are always in their latest version, no conversions are needed.
+* For a pinned service, the object is serialized to its pinned version.
+ Since objects are always in their latest version, the object is converted to
+ the pinned version before being serialized. The converted object includes
+ changes that resulted from the conversion; this is needed so that the service
+ at the other end of the RPC request has the necessary information if that
+ object will be saved to the database.
+Receiving objects via RPC (API/conductor <- RPC)
+When a service receives an RPC request, any entities that are part of the
+request need to be deserialized (via
+For entities that represent IronicObjects, we want the deserialization process
+(via ``IronicObjectSerializer._process_object()``) to result in IronicObjects
+that are in their latest version, regardless of the version they were sent in
+and regardless of whether the receiving service is pinned or not. Again, any
+objects that are converted will retain the changes that resulted from the
+conversion, useful if that object is later saved to the database.
+For example, a ``FromVer`` ironic-api could issue an ``update_node()`` RPC
+request with a node in version 1.14, where ``node.extra`` was changed (so
+``node._changed_fields = ['extra']``). This node will be serialized in version
+1.14. The receiving ``ToVer`` pinned ironic-conductor deserializes it and
+converts it to version 1.15. The resulting node will have ``node.meta`` set (to
+the changed value from ``node.extra`` in v1.14), ``node.extra = None``, and
+``node._changed_fields = ['meta', 'extra']``.
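This round trip can be sketched as follows. ``serialize_node()`` and ``deserialize_node()`` are hypothetical stand-ins for ``IronicObjectSerializer.serialize_entity()`` and ``IronicObjectSerializer._process_object()``, and the primitive format (a plain dict with ``version``, ``data`` and ``changes`` keys) is a simplifying assumption, not what oslo.versionedobjects actually emits.

```python
def serialize_node(node, target_version):
    """Serialize a latest-version (1.15) node dict to target_version."""
    data = {"meta": node["meta"], "extra": node["extra"]}
    changes = set(node["_changed_fields"])
    if target_version == "1.14":
        # 1.14 has no 'meta': move the value back and drop the field, since
        # the receiver may not know it (remove_unavailable_fields=True).
        data["extra"] = data.pop("meta")
        changes.discard("meta")
        changes.add("extra")  # keep the change caused by the conversion
    return {"version": target_version, "data": data,
            "changes": sorted(changes)}


def deserialize_node(primitive):
    """Turn a primitive into a node dict in its latest version (1.15)."""
    data = dict(primitive["data"])
    changes = set(primitive["changes"])
    if primitive["version"] == "1.14":
        data["meta"], data["extra"] = data.get("extra"), None
        changes |= {"meta", "extra"}
    return {"version": "1.15", "meta": data.get("meta"),
            "extra": data.get("extra"), "_changed_fields": changes}


# A FromVer ironic-api sends a 1.14 node whose 'extra' was changed.
from_old_api = {"version": "1.14", "data": {"extra": {"k": "v"}},
                "changes": ["extra"]}
received = deserialize_node(from_old_api)
print(received)

# A pinned ToVer service sending the same node back uses version 1.14.
sent = serialize_node(received, "1.14")
print(sent == from_old_api)
```

Because both directions keep the changes caused by conversion, the receiving side has what it needs if it later saves the node.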
+When developing a new feature or modifying an IronicObject
+When adding a new feature or changing an IronicObject, the code needs to be
+written so that things work during a rolling upgrade.
+The following sections describe areas where the code may need to be changed,
+as well as some points to keep in mind when developing code.
+During a rolling upgrade, the new, pinned ironic-api is talking to a new
+conductor that might also be pinned. There may also be old ironic-api services.
+So the new, pinned ironic-api service needs to act like the older service:
+* New features should not be made available, unless they are fully supported
+ in both the old and new releases. As a `future enhancement
+ <>`_, since new features or
+ new REST APIs are associated with new API microversions, we should enhance
+ the ``[DEFAULT]/pin_release_version`` configuration option to also include
+ pinning the API microversion.
+* For requests that cannot or should not be handled, the response should be
+ HTTP 406 (Not Acceptable). This is the same response as for requests that
+ have an incorrect (old) version specified.
+ * This includes accessing a new field of an object. For example, if a
+ "new_field" field was added to Node, any requests pertaining to
+ Node.new_field should not be processed.
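A minimal sketch of such a check, assuming a hypothetical table recording the Node version in which each field first appeared and a hypothetical ``check_fields_allowed()`` helper. Ironic's real API layer is structured differently (for instance, it raises a web exception rather than returning a status code).

```python
from http import HTTPStatus

# Hypothetical: the Node version in which each field was introduced.
NODE_FIELD_ADDED_IN = {"uuid": (1, 0), "extra": (1, 0), "new_field": (1, 15)}


def parse_version(version):
    return tuple(int(part) for part in version.split("."))


def check_fields_allowed(requested_fields, node_target_version):
    """Return HTTPStatus.OK, or NOT_ACCEPTABLE (406) if a field is too new.

    `node_target_version` is the pinned Node version when the service is
    pinned, otherwise the latest version. Unknown fields are left to other
    validation and are not rejected here.
    """
    target = parse_version(node_target_version)
    for field in requested_fields:
        if NODE_FIELD_ADDED_IN.get(field, (0, 0)) > target:
            return HTTPStatus.NOT_ACCEPTABLE
    return HTTPStatus.OK


print(check_fields_allowed(["uuid", "new_field"], "1.14"))  # pinned
print(check_fields_allowed(["uuid", "new_field"], "1.15"))  # unpinned
```

Basing the decision on the object's target version means the same code path serves both the pinned and the unpinned service.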
+Ironic RPC versions
+When the signature (arguments) of an RPC method is changed or new methods are
+added, the following needs to be considered:
+- The RPC version must be incremented and be the same value for both the
+ client (``ironic/conductor/``, used by ironic-api) and the server
+ (``ironic/conductor/``, used by ironic-conductor). It should also
+ be updated in ``ironic/common/``.
+- Until there is a major version bump, new arguments of an RPC method can only
+ be added as optional. Existing arguments cannot be removed or changed in
+ incompatible ways with the method in older RPC versions.
+- ironic-api (client-side) sets a version cap (by passing the version cap to
+ the constructor of oslo_messaging.RPCClient). This "pinning" is in place
+ during a rolling upgrade when the ``[DEFAULT]/pin_release_version``
+ configuration option is set.
+- New RPC methods are not available when the service is pinned to the older
+ release version. In this case, the corresponding REST API function should
+ return a server error or implement alternative behaviours.
+- Methods which change arguments should run
+ ``client.can_send_version()`` to see if the version of the request is
+ compatible with the version cap of the RPC Client. Otherwise the request
+ needs to be created to work with a previous version that is supported.
+- ironic-conductor (server-side) should tolerate older versions of requests in
+ order to keep working during the rolling upgrade process. The behaviour of
+ ironic-conductor will depend on the input parameters passed from the
+ client-side.
+- Old methods can be removed only after they are no longer used by a previous
+ named release.
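The client-side and server-side rules above can be sketched together. Everything here is hypothetical: ``update_node()`` gaining an optional ``new_option`` argument in RPC version 1.42, and the ``Client``/``Manager`` classes, which only mimic the shape of a conductor RPC API and manager without using oslo.messaging.

```python
def can_send(version_cap, version):
    """Same major version and requested minor <= cap minor (None: no cap)."""
    if version_cap is None:
        return True
    cap = tuple(int(p) for p in version_cap.split("."))
    req = tuple(int(p) for p in version.split("."))
    return cap[0] == req[0] and req[1] <= cap[1]


class Manager:
    """Server side: tolerates requests from older clients."""

    def update_node(self, node, new_option=None):
        # Older clients never send new_option; the default keeps the
        # pre-1.42 behaviour.
        if new_option is None:
            return "updated %s (old behaviour)" % node
        return "updated %s with %s" % (node, new_option)


class Client:
    """Client side: builds the request according to its version cap."""

    def __init__(self, server, version_cap=None):
        self.server = server
        self.version_cap = version_cap

    def update_node(self, node, new_option=None):
        if can_send(self.version_cap, "1.42"):
            version, kwargs = "1.42", {"new_option": new_option}
        else:
            # Pinned: build a 1.41-compatible request without the new
            # argument (the REST layer should reject requests needing it).
            version, kwargs = "1.41", {}
        return version, self.server.update_node(node, **kwargs)


server = Manager()
print(Client(server, version_cap="1.41").update_node("node-1", "x"))
print(Client(server).update_node("node-1", "x"))
```

Making the new argument optional on the server is what allows a single conductor to serve both pinned and unpinned clients.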
+Object versions
+When subclasses of ``ironic.objects.base.IronicObject`` are modified, the
+following needs to be considered:
+- Any change of fields or change in signature of remotable methods needs a bump
+ of the object version. The object versions are also maintained in
+ ``ironic/common/``.
+- New objects must be added to ``ironic/common/``.
+- The arguments of remotable methods (methods which are remoted to the
+ conductor via RPC) can only be added as optional. They cannot be removed or
+ changed in an incompatible way (to the previous release).
+- Field types cannot be changed. Instead, create a new field and deprecate
+ the old one.
+- There is a `unit test
+ <>`_
+ that generates the hash of an object using its fields and the
+ signatures of its remotable methods. Objects that have a version bump need
+ to be updated in the
+ `expected_object_fingerprints
+ <>`_
+ dictionary; otherwise this test will fail. A failed test can also indicate to
+ the developer that their change(s) to an object require a version bump.
+- When new version objects communicate with old version objects and when
+ reading from or writing to the database,
+ ``ironic.objects.base.IronicObject._convert_to_version()`` will be called to
+ convert objects to the target version. Objects should implement their own
+ ``_convert_to_version()`` to remove or alter fields which were added or
+ changed after the target version::
+    def _convert_to_version(self, target_version,
+                            remove_unavailable_fields=True):
+        """Convert to the target version.
+
+        Subclasses should redefine this method, to do the conversion of the
+        object to the target version.
+
+        Convert the object to the target version. The target version may be
+        the same, older, or newer than the version of the object. This is
+        used for DB interactions as well as for serialization/deserialization.
+
+        The remove_unavailable_fields flag is used to distinguish these two
+        cases:
+
+        1) For serialization/deserialization, we need to remove the unavailable
+           fields, because the service receiving the object may not know about
+           these fields. remove_unavailable_fields is set to True in this case.
+
+        2) For DB interactions, we need to set the unavailable fields to their
+           appropriate values so that these fields are saved in the DB. (If
+           they are not set, the VersionedObject magic will not know to
+           save/update them to the DB.) remove_unavailable_fields is set to
+           False in this case.
+
+        :param target_version: the desired version of the object
+        :param remove_unavailable_fields: True to remove fields that are
+            unavailable in the target version; set this to True when
+            (de)serializing. False to set the unavailable fields to appropriate
+            values; set this to False for DB interactions.
+        """
+ This method must handle:
+ * converting from an older version to a newer version
+ * converting from a newer version to an older version
+ * making sure, when converting, that you take into consideration other
+ object fields that may have been affected by a field (value) only available
+ in a newer version. For example, if field 'new' is only available in Node
+ version 1.5 and the value of Node.affected is derived from Node.new, when
+ converting to 1.4 (an older version), you may need to change the value of
+ Node.affected too.
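A sketch of such an implementation for the Node 1.14/1.15 change used as the running example on this page (``extra`` deprecated and replaced by ``meta``). ``Node`` here is a hypothetical plain Python class that imitates a versioned object with a ``_changed_fields`` set; it is not ironic's Node, and storing ``None`` for an unavailable field on DB writes is an assumption.

```python
class Node:
    """Hypothetical stand-in for a versioned Node (not ironic's class)."""

    VERSION = "1.15"

    def __init__(self, version="1.15", **fields):
        self.version = version
        self._fields = dict(fields)
        self._changed_fields = set()

    def get(self, name):
        return self._fields.get(name)

    def set(self, name, value):
        # Record the change so that a later save/serialize includes it.
        self._fields[name] = value
        self._changed_fields.add(name)

    def unset(self, name):
        # Drop the field entirely, as if this version never had it.
        self._fields.pop(name, None)
        self._changed_fields.discard(name)

    def _convert_to_version(self, target_version,
                            remove_unavailable_fields=True):
        if self.version == "1.14" and target_version == "1.15":
            # Older -> newer: 'meta' takes over the value of 'extra'.
            self.set("meta", self.get("extra"))
            self.set("extra", None)
        elif self.version == "1.15" and target_version == "1.14":
            # Newer -> older: 1.14 only knows 'extra'.
            self.set("extra", self.get("meta"))
            if remove_unavailable_fields:
                self.unset("meta")        # (de)serialization
            else:
                self.set("meta", None)    # DB write: store an explicit NULL
        self.version = target_version


for_rpc = Node(meta={"a": 1}, extra=None)
for_rpc._convert_to_version("1.14", remove_unavailable_fields=True)

for_db = Node(meta={"a": 1}, extra=None)
for_db._convert_to_version("1.14", remove_unavailable_fields=False)

from_db = Node(version="1.14", extra={"a": 1})
from_db._convert_to_version("1.15")
print(for_rpc._fields, for_db._fields, from_db._fields)
```

The only difference between the RPC and DB paths is whether ``meta`` disappears or is kept with an explicit value, mirroring the two cases in the docstring.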
+Online data migrations
+Keep in mind the `Policy for changes to the DB model`_.
+Future incompatible changes in SQLAlchemy models, like removing or renaming
+columns and tables can break rolling upgrades (when ironic services are run
+with different release versions simultaneously). It is forbidden to remove these
+database resources when they may still be used by the previous named release.
+When `creating new Alembic migrations <faq>`_ which modify existing models,
+make sure that any new columns default to NULL. Test the migration out on a
+non-empty database to make sure that any new constraints don't cause the
+database to be locked out for normal operations.
+You can find an overview on what DDL operations may cause downtime in
+(You should also check older, widely deployed InnoDB versions for issues.)
+In the case of PostgreSQL, adding a foreign key may lock a whole table for
+writes.
+Make sure to add a release note if there are any downtime-related concerns.
+Backfilling default values and migrating data between columns or between
+tables must be implemented inside an online migration script. A script is a
+database API method (added to ``ironic/db/`` and ``ironic/db/sqlalchemy/``)
+which takes two arguments:
+- context: an admin context
+- max_count: this is used to limit the query. It is the maximum number of
+ objects to migrate; >= 0. If zero, all the objects will be migrated.
+It returns a two-tuple:
+- the total number of objects that need to be migrated, at the start of
+ the method, and
+- the number of migrated objects.
+In this method, the version column can be used to select and update old
+objects.
+The method name should be added to the list of ``ONLINE_MIGRATIONS`` in
+The method should be removed in the next named release after this one.
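A minimal sketch of the shape of such a method, using the standard-library ``sqlite3`` module instead of ironic's SQLAlchemy database API so that it runs on its own. The ``nodes`` table, its columns and the ``migrate_extra_to_meta()`` name are hypothetical; to keep the two-argument signature described above, the connection is read from a module-level ``CONN``, and ``context`` is accepted but unused.

```python
import sqlite3

# Hypothetical stand-in for ironic's database: one 'nodes' table whose
# 1.14 rows keep their data in 'extra'; in 1.15 it belongs in 'meta'.
CONN = sqlite3.connect(":memory:")
CONN.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, version TEXT, "
             "extra TEXT, meta TEXT)")


def migrate_extra_to_meta(context, max_count):
    """Hypothetical online migration: backfill 'meta' for 1.14 rows.

    :param context: admin context (unused in this sketch)
    :param max_count: maximum number of rows to migrate; 0 means all
    :returns: (rows needing migration at the start, rows migrated)
    """
    total = CONN.execute(
        "SELECT COUNT(*) FROM nodes WHERE version = '1.14'").fetchone()[0]
    limit = total if max_count == 0 else min(max_count, total)
    # The version column selects the rows that still need migrating.
    ids = [row[0] for row in CONN.execute(
        "SELECT id FROM nodes WHERE version = '1.14' ORDER BY id LIMIT ?",
        (limit,))]
    CONN.executemany(
        "UPDATE nodes SET meta = extra, version = '1.15' WHERE id = ?",
        [(node_id,) for node_id in ids])
    CONN.commit()
    return total, len(ids)


CONN.executemany("INSERT INTO nodes (version, extra) VALUES ('1.14', ?)",
                 [("a",), ("b",), ("c",)])
print(migrate_extra_to_meta(None, 2))  # (3, 2): three pending, two done
print(migrate_extra_to_meta(None, 0))  # (1, 1): the remaining one
```

Returning both counts lets a caller run the migration in batches and stop once the pending total reaches zero.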
+After online data migrations are completed and the SQLAlchemy models no longer
+contain old fields, old columns can be removed from the database. This takes
+at least 3 releases, since we have to wait until the previous named release no
+longer contains references to the old schema. Before removing any resources
+from the database by modifying the schema, make sure that your implementation
+checks that all objects in the affected tables have been migrated. This check
+can be implemented using the version column.
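A hypothetical pre-removal check could count rows whose version column is still older than the version that stopped using the old column. Again ``sqlite3`` stands in for the real database layer, and the table name and version strings are illustrative. Versions are compared numerically, because a plain string comparison would place ``1.9`` after ``1.15``.

```python
import sqlite3


def parse_version(version):
    """'1.15' -> (1, 15); string comparison would wrongly put '1.9' after it."""
    return tuple(int(part) for part in version.split("."))


def all_rows_migrated(conn, table, min_version):
    """True if every row in `table` has a version >= min_version.

    A NULL version is treated as "not migrated". `table` must be a trusted
    constant because it is interpolated into the SQL text.
    """
    rows = conn.execute("SELECT DISTINCT version FROM %s" % table)
    return all(v is not None and parse_version(v) >= parse_version(min_version)
               for (v,) in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, version TEXT)")
conn.executemany("INSERT INTO nodes (version) VALUES (?)",
                 [("1.15",), ("1.9",)])
print(all_rows_migrated(conn, "nodes", "1.15"))  # False: a 1.9 row remains
conn.execute("UPDATE nodes SET version = '1.15'")
print(all_rows_migrated(conn, "nodes", "1.15"))  # True
```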