Instead of clearing existing reservations at the beginning of
del_host, wait for the tasks holding them to run to completion. This
check continues indefinitely until the conductor process exits due to
one of:
- All reservations for this conductor are released
- CONF.graceful_shutdown_timeout has elapsed
- The process manager (systemd, kubernetes) sends SIGKILL after the
configured graceful period
Because the default values of [DEFAULT]graceful_shutdown_timeout and
[conductor]heartbeat_timeout are the same (60s), no other conductor
will claim a node as an orphan until this conductor exits.
Change-Id: Ib8db915746228cd87272740825aaaea1fdf953c7
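The wait described above is essentially a polling loop. This is a minimal sketch that assumes a hypothetical ``has_reservations()`` callable reporting whether any node is still reserved by this conductor. For illustration, the graceful-shutdown deadline is checked inside the loop. The message above instead relies on the process being stopped externally.

```python
import time


def wait_for_reservations(has_reservations, timeout, poll_interval=1.0,
                          clock=time.monotonic, sleep=time.sleep):
    """Wait until has_reservations() returns False or timeout elapses.

    Returns True if every reservation was released, False if the
    deadline passed first. clock and sleep are injectable for testing.
    """
    deadline = clock() + timeout
    while has_reservations():
        if clock() >= deadline:
            # Give up; in a real conductor the process would now exit and
            # another conductor would later claim the node as an orphan.
            return False
        sleep(poll_interval)
    return True
```

Injecting ``clock`` and ``sleep`` keeps the loop testable without real delays.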
Bandit 1.7.5 was released with logic to check requests invocations.
Specifically, if a timeout is not explicitly set, bandit reports an
error.
This change should make our bandit job go green.
Closes-Bug: 2015284
Change-Id: I1dcb3075de63aae97bb22012a54736c293393185
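The fix amounts to always passing ``timeout=`` to requests calls; bandit reports its absence as B113 (``request_without_timeout``). This is a minimal sketch. The ``get_json()`` helper and the 60-second default are hypothetical, and the session is injected so the example runs without network access.

```python
# Hypothetical default in seconds; not taken from the actual change.
DEFAULT_TIMEOUT = 60


def get_json(session, url, timeout=DEFAULT_TIMEOUT):
    """GET url with an explicit timeout and return the decoded JSON.

    session is a requests.Session (or the requests module itself);
    it is passed in so the sketch can be exercised offline.
    """
    # Without timeout=, requests can wait indefinitely on a server that
    # never answers, which is what the bandit check guards against.
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response.json()
```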
Lookup returns generic 404 errors for security reasons. Logging is
the only way to debug any issues that occur during lookup.
Change-Id: I860ed6b90468a403f0f6cdec9c3d84bc872fda06
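This is a minimal sketch of the pattern. It assumes lookup by MAC address and uses a hypothetical ``lookup_node()`` helper. The caller always gets the same generic ``NotFound``, which an API layer would turn into a 404. The specific reason for the failure only reaches the log.

```python
import logging

LOG = logging.getLogger(__name__)


class NotFound(Exception):
    """Generic error returned to the caller; deliberately uninformative."""


def lookup_node(nodes_by_mac, addresses):
    """Return the single node UUID matching one of the given addresses.

    nodes_by_mac is a hypothetical mapping of MAC address -> node UUID.
    """
    matches = {nodes_by_mac[a] for a in addresses if a in nodes_by_mac}
    if not matches:
        LOG.warning('Lookup failed: no node matches addresses %s', addresses)
        raise NotFound()
    if len(matches) > 1:
        LOG.warning('Lookup failed: addresses %s match several nodes %s',
                    addresses, sorted(matches))
        raise NotFound()
    return matches.pop()
```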
Enables boot mode switching with Anaconda deploy for the ilo driver
Story: 2010357
Task: 46530
Change-Id: I383cdd5c9d45b074d351ec98b1145fd68e2f3ac3
* Avoid using the term "introspection". We need to settle on either
"inspection" or "introspection", and the Ironic API already uses
the former.
* Accept (and return) inventory and plugin data separately to reflect
the Ironic API (single JSON blobs are an Inspector legacy).
* Make sure to mention the container name in error logging.
* Use more readable formatting syntax for building Swift names.
* Do not mock objects with dicts (in unit tests).
* Simplify inventory API tests.
Change-Id: Id8c4bc6d35b9634f5a5ac2b345a8fd7f1dba13c0
Change-Id: I7bba31e73daef7292d0710242e6f88793b7ab357
... to reduce the already frightening size of ironic.conductor.manager
and make space for more inspection additions.
While here, fix up log messages for clarity and brevity.
Change-Id: I5196d58016ae094f17e0aad187a11d9cceaab04b
Fixes Secure Boot with Anaconda deploy when using PXE and iPXE
Story: 2010356
Task: 46529
Change-Id: Id6262654bb5e41e02c7d90b9a9aaf395e7b6a088
The SNMP driver was using the wrong dictionary key to retrieve
auth_protocol and priv_protocol from driver info. As a result, the SNMP
client was created with empty strings for both of those fields. Any
nodes configured to use SNMP v3 with those fields failed because the
SNMP driver was unable to perform power-related operations due to an
authentication error.
- Use correct keys for snmp auth_protocol and priv_protocol when
creating SNMP client
- Sanitize snmp auth_key and priv_key in API results
Story: 2010613
Task: 47535
Change-Id: I5efd3c9f79a021f1a8e613c3d13b6596a7972672
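This is a minimal sketch of both bullet points. It assumes driver_info keys named ``snmp_auth_protocol``, ``snmp_priv_protocol``, ``snmp_auth_key`` and ``snmp_priv_key``, a ``******`` mask, and hypothetical helpers ``client_kwargs()`` and ``sanitize()``. Note that ``dict.get()`` with an empty-string default turns a wrong key into exactly the silent empty value described above.

```python
# Assumed driver_info key names for the SNMP v3 secrets.
SNMP_SECRET_KEYS = ('snmp_auth_key', 'snmp_priv_key')


def client_kwargs(driver_info):
    """Build hypothetical SNMP client kwargs from driver_info.

    A misspelled key here would not raise; .get() would just return ''.
    """
    return {
        'auth_protocol': driver_info.get('snmp_auth_protocol', ''),
        'priv_protocol': driver_info.get('snmp_priv_protocol', ''),
    }


def sanitize(driver_info, mask='******'):
    """Return a copy of driver_info with non-empty SNMP secrets masked."""
    return {key: (mask if key in SNMP_SECRET_KEYS and value else value)
            for key, value in driver_info.items()}
```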
In a relatively odd turn of events, if cleaning had started but then
timed out due to lost communication or a hard failure of the machine,
the agent token could previously be orphaned, preventing re-cleaning.
We now explicitly remove the token in this case.
Change-Id: I236cdf6ddb040284e9fd1fa10136ad17ef665638
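This is a sketch of the explicit removal. It assumes the token lives in ``driver_internal_info`` under ``agent_secret_token`` and ``agent_secret_token_pregenerated``, and ``drop_agent_token()`` is a hypothetical helper. A cleaning-timeout handler would call it before moving the node to the failed state.

```python
# Assumed driver_internal_info keys holding the agent token.
TOKEN_KEYS = ('agent_secret_token', 'agent_secret_token_pregenerated')


def drop_agent_token(driver_internal_info):
    """Return a copy of driver_internal_info without agent token fields.

    Meant for a (hypothetical) cleaning-timeout handler, so the next
    cleaning attempt is not blocked by a token issued to an agent that
    is no longer reachable.
    """
    info = dict(driver_internal_info)
    for key in TOKEN_KEYS:
        info.pop(key, None)  # absent keys are fine
    return info
```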
While investigating a very curious report, I discovered that
if the power was somehow *already* turned off to a node, say
through an incorrect BMC *or* human action, and Ironic were
to pick it up (as it does by default, because it checks before
applying the power state), then it would not wipe the token
information, preventing the agent from connecting on the next
action, attempt, or operation.
We now remove the token on all calls to the conductor utilities'
node_power_action method when appropriate, even if no other
work is required.
Change-Id: Ie89e8be9ad2887467f277772445d4bef79fa5ea1
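The essential ordering is that the token wipe happens before the "already in the target state" early return. This is a simplified, hypothetical ``node_power_action()`` showing that ordering. The node is a plain dict, the token keys are assumed, and only power off is treated as "appropriate" for the wipe, which is a simplification.

```python
POWER_ON = 'power on'
POWER_OFF = 'power off'
# Assumed driver_internal_info keys holding the agent token.
TOKEN_KEYS = ('agent_secret_token', 'agent_secret_token_pregenerated')


def node_power_action(node, target_state, get_power_state, set_power_state):
    """Return True if the power state was changed, False otherwise.

    The token is wiped before the early return below, so a node that
    was powered off behind Ironic's back still loses its stale token.
    """
    if target_state == POWER_OFF:
        for key in TOKEN_KEYS:
            node['driver_internal_info'].pop(key, None)
    if get_power_state() == target_state:
        return False  # no power change needed
    set_power_state(target_state)
    return True
```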
Change-Id: I0acc5303c1a38645318fb9be4cb068d069b7fe6a
Even if a Glance image is already raw, we still recalculate the checksum
after "converting" it to raw. This can take an exceptionally long time.
Change-Id: Id93d518b8d2b8064ff901f1a0452abd825e366c0
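This is a sketch of the short-circuit the message implies. It assumes a hypothetical in-place ``convert_to_raw()`` callable. It also assumes the checksum already known from the image service uses the same algorithm as ``file_checksum()``. When the source is already raw no bytes change, so the expensive re-hash can be skipped.

```python
import hashlib


def file_checksum(path, algorithm='sha256', chunk_size=1024 * 1024):
    """Stream a file through hashlib; slow for multi-GiB images."""
    digest = hashlib.new(algorithm)
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_after_conversion(path, source_format, known_checksum,
                              convert_to_raw):
    """Hypothetical helper: only re-hash when a real conversion happened."""
    if source_format == 'raw':
        # Nothing is rewritten, so the existing checksum still holds.
        return known_checksum
    convert_to_raw(path)
    return file_checksum(path)
```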
Converts ironic.drivers.modules.inspector into a package with
two subpackages: client and interface, the latter containing most
of the current content.
Change-Id: Idbfd275c60a873e3de2e0a34db793619f8c99d85
This mapping allows object version upgrades to be navigated
and needs to be updated before release; otherwise we break the
inherent upgrade job to the latest state of the development
branch.
Also, we had to backfill the records for the bugfix branch since,
while not required for that version to run, they are required
to upgrade from that version.
Also, list antelope and 2023.1 as "named" releases; given the
ambiguity and configuration, it just seemed better to be on the
safe side.
Change-Id: I633275caf8c3dc750023fbb27bd8a3f4d23e9fa5
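This is a sketch of the shape such a mapping can take. It shows why the named aliases are cheap: they can point at the very same entry. Every release number and version below is a placeholder. The real table, and which numbered entry antelope and 2023.1 correspond to, live in the ironic tree. ``pinned_object_version()`` is a hypothetical helper.

```python
# Placeholder values only; not ironic's real release numbers.
RELEASE_MAPPING = {
    # Backfilled bugfix-branch entry: not needed to run that release, but
    # needed so an upgrade *from* it knows which versions it wrote.
    '99.0': {'rpc': '1.0', 'objects': {'Node': ['1.0'], 'BIOSSetting': ['1.0']}},
    '99.1': {'rpc': '1.1', 'objects': {'Node': ['1.1'], 'BIOSSetting': ['1.1']}},
}
# Named aliases share the same dict, so either spelling resolves identically.
RELEASE_MAPPING['antelope'] = RELEASE_MAPPING['99.1']
RELEASE_MAPPING['2023.1'] = RELEASE_MAPPING['99.1']


def pinned_object_version(release, obj_name):
    """Return the first version listed for obj_name in release's entry."""
    try:
        return RELEASE_MAPPING[release]['objects'][obj_name][0]
    except KeyError:
        raise ValueError(
            f'no version of {obj_name} recorded for release {release!r}'
        ) from None
```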
... And tags, but nobody uses tags since they are not available
via the API.
Anyhow, the online upgrade code was written under the assumption
that *all* tables had an "id" column. This is not always true
in the ironic data model for tables which started as pure extensions
of the Nodes table, and the code fails in particular when:
1) A database row has data stored in an earlier version of the object
2) That same object gets a version upgrade.
In the case which discovered this, BIOSSetting was added at version
1.0, and later updated to include additional fields which incremented
the version to 1.1. When the upgrade went to evaluate and iterate
through the fields, the command failed because the table was designed
around "node_id" instead of "id".
Story: 2010632
Task: 47590
Change-Id: I7bec6cfacb9d1558bc514c07386583436759f4df
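The failure mode can be reproduced in miniature with SQLite. This is not ironic's online-upgrade code. The table layout and the ``key_column()`` and ``bump_versions()`` helpers are hypothetical. The sketch only shows iterating rows by whichever key column a table actually has, instead of assuming ``id``.

```python
import sqlite3


def key_column(conn, table):
    """Choose the column to iterate by: 'id' if present, else 'node_id'."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt, pk).
    cols = {row[1] for row in conn.execute(f'PRAGMA table_info({table})')}
    if 'id' in cols:
        return 'id'
    if 'node_id' in cols:
        return 'node_id'
    raise ValueError(f'{table} has neither id nor node_id')


def bump_versions(conn, table, old, new):
    """Move rows stored at object version old to new; return keys touched."""
    key = key_column(conn, table)
    keys = conn.execute(
        f'SELECT DISTINCT {key} FROM {table} WHERE version = ?',
        (old,)).fetchall()
    for (value,) in keys:
        conn.execute(
            f'UPDATE {table} SET version = ? WHERE {key} = ? AND version = ?',
            (new, value, old))
    return len(keys)
```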
When cleaning fails, we power off the node, unless it has been running
a clean step already. This happens when aborting cleaning or on a boot
failure. This change makes sure that the power action does not wipe
the last_error field, which previously left a node with
provision_state=CLEANFAIL and last_error=None for several seconds.
I've hit this in Metal3.
Also, when aborting cleaning, make sure last_error is set during
the transition to CLEANFAIL, not when the clean-up thread starts
running.
While here, make sure to log the current step in all cases, not only
when aborting a non-abortable step.
Change-Id: Id21dd7eb44dad149661ebe2d75a9b030aa70526f
Story: #2010603
Task: #47476
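This is one way to express the first fix, as a sketch only and not necessarily how the change implements it. A hypothetical wrapper saves ``last_error`` before a power action that would otherwise reset it, then restores it. ``power_off`` is an assumed stand-in callable, and the node is a plain dict.

```python
def power_off_preserving_error(node, power_off):
    """Power off via power_off(node) without losing node['last_error'].

    power_off stands in for a power action that, like the one described
    above, clears last_error as a side effect.
    """
    saved = node.get('last_error')
    power_off(node)
    if saved is not None and node.get('last_error') is None:
        # Restore the cleaning failure reason the power action wiped.
        node['last_error'] = saved
    return node
```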
Currently when a conductor is stopped, the rpc service stops
responding to requests as soon as self.manager.del_host returns. This
means that until the hash ring is reset on the whole cluster, requests
can be sent to a service which is stopped.
This change delays stopping for the remaining seconds until
CONF.hash_ring_reset_interval has elapsed. This will improve the
reliability of the cluster when scaling down or rolling out updates.
This delay only occurs when there is more than one online conductor,
to allow fast restarts on single-node ironic installs (bifrost,
metal3).
Change-Id: I643eb34f9605532c5c12dd2a42f4ea67bf3e0b40
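The delay is just the unexpired part of the reset interval. This sketch assumes the interval is measured from the moment the stop was requested. ``stop_delay()`` and its parameter names are hypothetical.

```python
def stop_delay(stop_requested_at, now, hash_ring_reset_interval,
               online_conductors):
    """Seconds to keep serving RPC after del_host returns.

    With a single online conductor there is no ring to rebalance, so
    the delay is 0 to allow fast restarts (bifrost, metal3).
    """
    if online_conductors <= 1:
        return 0.0
    elapsed = now - stop_requested_at
    return max(0.0, hash_ring_reset_interval - elapsed)
```

A caller would simply ``time.sleep(stop_delay(...))`` before stopping the RPC server.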
Simulating workloads with the fake driver currently misses the reality
that some operations take time to complete, rather than occurring
instantly. This makes it difficult to mock real workloads for
performance and functional testing of ironic itself.
This change adds configurable random wait times for fake drivers in a
new ironic.conf [fake] section. Each supported driver has one
configuration option controlling the delay. These delays are applied
to operations which typically block in other drivers.
The default value of zero continues the existing behaviour of no
delay. A single integer value will result in a constant delay in
seconds. Two values separated by a comma will result in a triangular
distribution whose mode is the first value; specifically, in Python[1]:
random.triangular(a, b, a)
Change-Id: I7cb1b50d035939e6c4538b3373002a309bfedea4
[1] https://docs.python.org/3/library/random.html#random.triangular
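This sketch shows how one such option value can be interpreted. ``parse_delay()`` and ``sleep_for()`` are hypothetical names, and the actual option names in the [fake] section are not reproduced here. It follows the three cases described above: ``0`` means no delay, a single integer is a constant, and ``a,b`` samples ``random.triangular(a, b, a)``.

```python
import random
import time


def parse_delay(value):
    """Turn '0', '5' or 'a,b' into a zero-argument delay generator."""
    parts = [int(p) for p in str(value).split(',')]
    if len(parts) == 1:
        constant = parts[0]
        return lambda: constant
    if len(parts) == 2:
        a, b = parts
        # Mode is the first value a, so sampled delays cluster near a.
        return lambda: random.triangular(a, b, a)
    raise ValueError(f'invalid delay specification: {value!r}')


def sleep_for(value):
    """Apply the configured delay; '0' keeps the old no-delay behaviour."""
    delay = parse_delay(value)()
    if delay:
        time.sleep(delay)
    return delay
```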
This change adds the capability for the ironic-conductor
and standalone service processes to transmit timer and counter
metrics to the message bus notifier, where they may be consumed by
ceilometer, ironic-prometheus-exporter, or any other consumer of
metrics event data on the message bus.
This functionality is not presently supported on dedicated API
services such as those running as an ``ironic-api`` application
process, or Ironic WSGI application. This is due to the lack of
an internal trigger mechanism to transmit the data in a metrics
update to the message bus and/or notifier plugin.
This change requires ironic-lib 5.4.0 to collect and ship metrics via
the message bus.
Depends-On: https://review.opendev.org/c/openstack/ironic-lib/+/865311
Change-Id: If6941f970241a22d96e06d88365f76edc4683364
Follow-up to I385594339028c20cfc83fdcc4cbbec107efdacff
Story: 2010378
Task: 46624
Change-Id: I95f3caaaf3fd92d60ce39b5803747728f65bbc17
While developing some internal metrics collection capability,
we realized that a lock was needed, and that the lock activity
itself would be a bit noisy. Image actions also get lock logging,
which is really noisy but not especially helpful for
troubleshooting.
So, set the lock logging level to WARNING instead.
Discussion wise, see:
https://review.opendev.org/c/openstack/ironic-lib/+/865311
Change-Id: I3ab14ee5b5cc063784d26e3c760f1422c692060d
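This is a sketch of the effect using the standard library. It assumes the noisy messages come from oslo.concurrency's ``oslo_concurrency.lockutils`` logger, and ``quiet_lock_logging()`` is a hypothetical helper. In an oslo.log-based service the same effect would more typically come from an ``oslo_concurrency.lockutils=WARNING`` entry in ``default_log_levels``.

```python
import logging

# Assumption: lock acquire/release messages are emitted by this logger.
LOCK_LOGGER = 'oslo_concurrency.lockutils'


def quiet_lock_logging(level=logging.WARNING):
    """Raise the threshold so routine DEBUG lock messages are dropped."""
    logging.getLogger(LOCK_LOGGER).setLevel(level)
```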
Recently we hit an issue where the pid file was missing. The current
logic simply removes the pid file if the corresponding process is not
found, but if the pid file itself is lost, the console can never be
stopped, or furthermore restarted, regardless of whether the process
is still running.
This patch adds FileNotFoundError to the exception handling to allow
console recovery.
Change-Id: I1a0b8347e960c6cff8aca10a22c67b710f7d617e
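This is a minimal sketch of the recovery path, built around a hypothetical ``stop_console()`` that reads the pid from a file. A missing pid file is treated as "nothing to stop" instead of an error. A stale pid, whose process is already gone, just has its file cleaned up.

```python
import os
import signal


def stop_console(pid_file):
    """Stop the console process named in pid_file, tolerating its absence.

    Returns the pid that was signalled, or None if nothing was running.
    """
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        # Pid file lost: treat the console as stopped so a later start
        # is not blocked forever.
        return None
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        pid = None  # process already gone; only the stale file remains
    os.remove(pid_file)
    return pid
```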
The tl;dr is that we changed ``inspecting`` to include an
``inspect wait`` state. Unfortunately we never spotted the logic
inside of the db API. We never spotted it because our testing in
inspection code uses a mocked task manager... and we *really* don't
have intense db testing because we expect the objects and higher
level interactions to validate the lowest db level.
Unfortunately, because of the out of band inspection workflow,
we have to cover both cases in terms of what the starting state
and ending state could be, but we've added tests to
validate this is handled as we expect.
Change-Id: Icccbc6d65531e460c55555e021bf81d362f5fc8b
Some of these metrics decorators were labeled without a class name,
which resulted in semi-confusing structures for the metrics counters.
Now, we should be semi-consistent.
Change-Id: Ie2795419991dc941f2a2b2bc0c6116b92d285041
The dynamically allocated console port for a node is saved
into the database and reused on subsequent console operations.
In certain code paths the port record cannot be trusted and
we should do a re-allocation.
This patch fixes the issue by ignoring the previous allocation
record. The extra cleanup in the takeover is no longer required
and has been removed as well.
Change-Id: I1a07ea9b30a2c760af7a6a4e39f3ff227df28fff
Story: 2010489
Task: 47061
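This sketch shows an allocator that never consults the node's saved port. ``allocate_console_port()`` and ``port_is_free()`` are hypothetical helpers. Checking freeness by binding a TCP socket is an assumption made for illustration, not necessarily how ironic probes ports.

```python
import socket


def port_is_free(port, host='127.0.0.1'):
    """Probe whether this host can bind port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def allocate_console_port(candidates, ports_in_use, is_free=port_is_free):
    """Pick a fresh console port, deliberately ignoring any saved one.

    candidates is an iterable of ports to try; ports_in_use holds the
    ports currently recorded for other nodes. The node's own previous
    allocation is not an input, since that record may be stale.
    """
    for port in candidates:
        if port not in ports_in_use and is_free(port):
            return port
    raise RuntimeError('no free console port available')
```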