diff options
author | Julia Kreger <juliaashleykreger@gmail.com> | 2022-09-07 09:58:57 -0700 |
---|---|---|
committer | Julia Kreger <juliaashleykreger@gmail.com> | 2022-10-13 21:21:40 +0000 |
commit | 49e085583dec81c63d19f80a9ba067e38d8043ae (patch) | |
tree | 8a7da1a1c1c540f56bd578d3e06cf6038fcae239 /ironic/conductor | |
parent | 1435a15ce3013da0a3138e1c5ab6ac523c382bb8 (diff) | |
download | ironic-49e085583dec81c63d19f80a9ba067e38d8043ae.tar.gz |
Phase 1 - SQLAlchemy 2.0 Compatability
One of the major changes in SQLAlchemy 2.0 is the removal
of autocommit support. It turns out Ironic was using this quite
aggressively without even really being aware of it.
* Moved the declaritive_base to ORM, as noted in the SQLAlchemy 2.0
changes[0].
* Console testing caused us to become aware of issues around locking
where session synchronization, when autocommit was enabled, was
defaulted to False. The result of this is that you could have two
sessions have different results, which could results on different
threads, and where one could still attempt to lock based upon prior
information. Inherently, while this basically worked, it was
also sort of broken behavior. This resulted in locking being
rewritten to use the style mandated in SQLAlchemy 2.0 migration
documentation. This ultimately is due to locking, which is *heavily*
relied upon in Ironic, and in unit testing with sqlite, there are
no transactions, which means we can get some data inconsistency
in unit testing as well if we're reliant upon the database to
precisely and exactly return what we committed.[1]
* Begins changing the query.one()/query.all() style to use explicit
select statements as part of the new style mandated for migration
to SQLAlchemy 2.0.
* Instead of using field label strings for joined queries, use the
object format, which makes much more sense now, and is part of
the items required for eventual migration to 2.0.
* DB queries involving Traits are now loaded using SelectInLoad
as opposed to Joins. The now deprecated ORM queries were quietly
and silently de-duplicating rows and providing consistent sets
from the resulting joined table responses, however putting much
higher CPU load on the processing of results on the client.
Prior performance testing has informed us this should be a minimal
overhead impact, however these queries should no longer be in
transactions with the Database Servers which should offset the
shift in load pattern. The reason we cannot continue to deduplicate
locally in our code is because we carry Dict data sets which cannot
be hashed for deduplication. Most projects have handled this by
treating them as Text and then converting, but without a massive
rewrite, this seems to be the viable middle ground.
* Adds an explict mapping for traits and tags on the Node object
to point directly to the NodeTrait and NodeTag classes. This
superceeds the prior usage of a backref to make the association.
* Splits SQLAlchemy class model Node into Node and NodeBase, which
allows for high performance queries to skip querying for ``tags``
and ``traits``. Otherwise with the afrormentioned lookups would
always execute as they are now properties as well on the Node
class. This more common of a SQLAlchemy model, but Ironic's model
has been a bit more rigid to date.
* Adds a ``start_consoles`` and ``start_allocations`` option to the
conductor ``init_host`` method. This allows unit tests to be
executed and launched with the service context, while *not* also
creating race conditions which resulted in failed tests.
* The db API ``_paginate_query`` wrapper now contains additional
logic to handle traditional ORM query responses and the newer style
of unified query responses. Due to differences in queries and handling,
which also was part of the driver for the creation of ``NodeBase``,
as SQLAlchemy will only create an object if a base object is referenced.
Also, by default, everything returned is a tuple in 1.4 with the
unified interface.
* Also modified one unit test which counted time.sleep calls, which is
a known pattern which can create failures which are ultimately noise.
Ultimately, I have labelled the remaining places which SQLAlchemy
warnings are raised at for deprecation/removal of functionality,
which needs to be addressed.
[0] https://docs.sqlalchemy.org/en/14/changelog/migration_20.html
[1] https://docs.sqlalchemy.org/en/14/dialects/sqlite.html#transaction-isolation-level-autocommit
Change-Id: Ie0f4b8a814eaef1e852088d12d33ce1eab408e23
Diffstat (limited to 'ironic/conductor')
-rw-r--r-- | ironic/conductor/base_manager.py | 17 | ||||
-rw-r--r-- | ironic/conductor/manager.py | 7 |
2 files changed, 14 insertions, 10 deletions
diff --git a/ironic/conductor/base_manager.py b/ironic/conductor/base_manager.py index aa684408f..22ebd57f5 100644 --- a/ironic/conductor/base_manager.py +++ b/ironic/conductor/base_manager.py @@ -88,10 +88,14 @@ class BaseConductorManager(object): # clear all locks held by this conductor before registering self.dbapi.clear_node_reservations_for_conductor(self.host) - def init_host(self, admin_context=None): + def init_host(self, admin_context=None, start_consoles=True, + start_allocations=True): """Initialize the conductor host. :param admin_context: the admin context to pass to periodic tasks. + :param start_consoles: If consoles should be started in intialization. + :param start_allocations: If allocations should be started in + initialization. :raises: RuntimeError when conductor is already running. :raises: NoDriversLoaded when no drivers are enabled on the conductor. :raises: DriverNotFound if a driver is enabled that does not exist. @@ -189,8 +193,9 @@ class BaseConductorManager(object): # Start consoles if it set enabled in a greenthread. try: - self._spawn_worker(self._start_consoles, - ironic_context.get_admin_context()) + if start_consoles: + self._spawn_worker(self._start_consoles, + ironic_context.get_admin_context()) except exception.NoFreeConductorWorker: LOG.warning('Failed to start worker for restarting consoles.') @@ -207,8 +212,9 @@ class BaseConductorManager(object): # Resume allocations that started before the restart. try: - self._spawn_worker(self._resume_allocations, - ironic_context.get_admin_context()) + if start_allocations: + self._spawn_worker(self._resume_allocations, + ironic_context.get_admin_context()) except exception.NoFreeConductorWorker: LOG.warning('Failed to start worker for resuming allocations.') @@ -539,6 +545,7 @@ class BaseConductorManager(object): try: with task_manager.acquire(context, node_uuid, shared=False, purpose='start console') as task: + notify_utils.emit_console_notification( task, 'console_restore', obj_fields.NotificationStatus.START) diff --git a/ironic/conductor/manager.py b/ironic/conductor/manager.py index 7e98459ff..cf4988958 100644 --- a/ironic/conductor/manager.py +++ b/ironic/conductor/manager.py @@ -2203,18 +2203,16 @@ class ConductorManager(base_manager.BaseConductorManager): """ LOG.debug('RPC set_console_mode called for node %(node)s with ' 'enabled %(enabled)s', {'node': node_id, 'enabled': enabled}) - - with task_manager.acquire(context, node_id, shared=False, + with task_manager.acquire(context, node_id, shared=True, purpose='setting console mode') as task: node = task.node - task.driver.console.validate(task) - if enabled == node.console_enabled: op = 'enabled' if enabled else 'disabled' LOG.info("No console action was triggered because the " "console is already %s", op) else: + task.upgrade_lock() node.last_error = None node.save() task.spawn_after(self._spawn_worker, @@ -3469,7 +3467,6 @@ class ConductorManager(base_manager.BaseConductorManager): self.conductor.id): # Another conductor has taken over, skipping continue - LOG.debug('Taking over allocation %s', allocation.uuid) allocations.do_allocate(context, allocation) except Exception: |