path: root/ironic/conductor
Commit message  Author  Age  Files  Lines
...
* | Merge "Add "none" RPC transport that disables the RPC bus"Zuul2021-12-081-10/+23
|\ \
| * | Add "none" RPC transport that disables the RPC busDmitry Tantsur2021-12-071-10/+23
| | | When using the new combined executable in a single-conductor scenario, it may make sense to completely disable the remote RPC. The new ``rpc_transport`` value ``none`` achieves that.
| | | Change-Id: I6a83358c65b3ed213c8a991d42660ca51fc3a8ec
| | | Story: #2009676
| | | Task: #44104
* | | Merge "All-in-one Ironic service with a local RPC bus"Zuul2021-12-082-57/+122
|\ \ \ | |/ /
| * | All-in-one Ironic service with a local RPC busDmitry Tantsur2021-12-072-57/+122
| |/
| | This adds a new executable /usr/bin/ironic (cool that we no longer have a CLI with this name) that starts API and conductor together in the same process. When an RPC host name matches the current one, the call is not routed through the remote RPC; a local function call is made instead.
| | Story: #2009676
| | Task: #43953
| | Change-Id: I51bf7226aea145dc7c8fd93d61caa233ca16c9c9
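The local short-circuit described in this commit message can be pictured roughly as follows. This is only a sketch under stated assumptions: `ToyConductor`, `ToyRPCClient`, `fake_remote` and the host names are hypothetical and are not taken from the Ironic code base.

```python
class ToyConductor:
    """Stands in for a conductor manager living in the same process."""

    def power_on(self, node_id):
        return f"local:{node_id}"


class ToyRPCClient:
    """Hypothetical client that skips the RPC bus for the local host."""

    def __init__(self, local_host, local_manager, remote_call):
        self.local_host = local_host
        self.local_manager = local_manager
        self.remote_call = remote_call  # callable(host, method, **kwargs)

    def call(self, host, method, **kwargs):
        if host == self.local_host and self.local_manager is not None:
            # Same host: plain function call, no remote RPC involved.
            return getattr(self.local_manager, method)(**kwargs)
        return self.remote_call(host, method, **kwargs)


def fake_remote(host, method, **kwargs):
    # Placeholder for whatever transport a real deployment would use.
    return f"remote:{host}:{method}"


client = ToyRPCClient("cond-1", ToyConductor(), fake_remote)
local_result = client.call("cond-1", "power_on", node_id="n1")
remote_result = client.call("cond-2", "power_on", node_id="n1")
```

Only the call addressed to `cond-1` reaches the in-process object; the call to `cond-2` still goes through the remote path.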
* | Merge "Refactor driver_internal_info updates to methods"Zuul2021-12-075-128/+81
|\ \
| * | Refactor driver_internal_info updates to methodsSteve Baker2021-12-035-128/+81
| |/
| | Making updates to driver_internal_info can result in hard-to-read code due to the requirement to assign the whole driver_internal_info back to the node to trigger the expected update operation.
| | This change replaces driver_internal_info update operations with new methods:
| | - set_driver_internal_info
| | - del_driver_internal_info
| | - timestamp_driver_internal_info
| | This change defines the functions and moves core conductor logic to use them. Subsequent changes in this series will move drivers to use the new functions.
| | Change-Id: Ib8917c3c674e77cd3aba6a1e73c65162e3ee1141
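A toy model may help show why whole-field assignment is awkward and what helper methods of this kind buy. The method names come from the commit message; their signatures, the `ToyNode` class, the copy-on-read behaviour and the key names are assumptions made for illustration, not Ironic's implementation.

```python
from datetime import datetime, timezone


class ToyNode:
    """Toy object that only notices a change when the field is assigned."""

    def __init__(self):
        self._info = {}
        self.changed = False

    @property
    def driver_internal_info(self):
        # Hand out a copy, so in-place edits never reach the stored value.
        return dict(self._info)

    @driver_internal_info.setter
    def driver_internal_info(self, value):
        self._info = dict(value)
        self.changed = True

    def set_driver_internal_info(self, key, value):
        info = self.driver_internal_info
        info[key] = value
        self.driver_internal_info = info  # assign back to register the change

    def del_driver_internal_info(self, key):
        info = self.driver_internal_info
        info.pop(key, None)
        self.driver_internal_info = info

    def timestamp_driver_internal_info(self, key):
        now = datetime.now(timezone.utc).isoformat()
        self.set_driver_internal_info(key, now)


node = ToyNode()
node.driver_internal_info["agent_url"] = "http://example"  # edits a copy
lost = "agent_url" not in node.driver_internal_info and not node.changed

node.set_driver_internal_info("agent_url", "http://example")
node.timestamp_driver_internal_info("cleaning_started_at")
node.del_driver_internal_info("agent_url")
```

The helpers hide the read, modify, assign-back sequence that callers would otherwise repeat at every update site.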
* | Merge "Avoid RPC notify_conductor_resume_{deploy,clean} in agent_base"Zuul2021-12-072-5/+9
|\ \
| * | Avoid RPC notify_conductor_resume_{deploy,clean} in agent_baseDmitry Tantsur2021-12-062-5/+9
| |/
| | Currently we use an RPC call to the conductor itself to proceed to the next clean or deploy step. This is unnecessary and requires temporarily lifting the lock, potentially causing race conditions.
| | This change makes the agent code use continue_node_{deploy,clean} directly. The drivers still need updating; that will be done later.
| | Story: #2008167
| | Task: #40922
| | Change-Id: If4763d542029b9021432425532f24a0228f04c25
* | Trivial: log current state when continuing cleaningDmitry Tantsur2021-12-061-2/+2
|/
| Change-Id: I02a8ed6802fffee071e94be3c0cab2382b7e60ca
* Merge "[Trivial] Clarify conditions under which power recovery is attempted"Zuul2021-11-151-3/+4
|\
| * [Trivial] Clarify conditions under which power recovery is attemptedArne Wiebalck2021-11-041-3/+4
| | Be more precise when describing the conditions for automatic recovery from power failures ('maintenance type' is a term we use nowhere else).
| | Change-Id: Iaf14c0fc73f8c97b9d8669485011966a650c21a8
* | Avoid handling a deploy failure twiceDmitry Tantsur2021-11-041-18/+25
|/
| In some cases we handle the same exception twice in a row: in agent_base and in deployments.do_next_deploy_step. This change avoids it.
| Also make deploy step error messages more uniform across the board.
| Change-Id: Ic84c04118b1a85b10a761fc58796827583a5b086
* Merge "node_periodics: encapsulate the interface class check"Zuul2021-10-141-0/+13
|\
| * node_periodics: encapsulate the interface class checkDmitry Tantsur2021-10-121-0/+13
| | Change-Id: I887d4fe4836bc58b5605e950a4287f0d27a590cb
* | Merge "Add a helper for node-based periodics"Zuul2021-10-142-96/+212
|\ \ | |/
| * Add a helper for node-based periodicsDmitry Tantsur2021-10-112-96/+212
| | We have a very common pattern of periodic tasks that use iter_nodes to fetch some nodes, check them, create a task and conduct some operation. This change introduces a helper decorator for that and migrates the drivers to it.
| | I'm intentionally leaving unit tests intact to demonstrate that the new decorator works exactly the same way (modulo cosmetic changes) as the previous hand-written code.
| | Change-Id: Ifed4a457275d9451cc412dc80f3c09df72f50492
| | Story: #2009203
| | Task: #43522
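The fetch, check, act pattern that this commit folds into a decorator could look something like the sketch below. `node_periodic_sketch`, its parameters and the dict-based nodes are hypothetical, and the task-creation step mentioned in the message is left out to keep the example small.

```python
import functools


def node_periodic_sketch(iter_nodes, predicate):
    """Hypothetical decorator: fetch nodes, check each, run the function."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            handled = []
            for node in iter_nodes():
                if not predicate(node):
                    continue  # skip nodes that fail the check
                func(node, *args, **kwargs)
                handled.append(node["uuid"])
            return handled
        return wrapper
    return decorator


NODES = [
    {"uuid": "a", "provision_state": "deploy wait"},
    {"uuid": "b", "provision_state": "active"},
]
seen = []


@node_periodic_sketch(lambda: iter(NODES),
                      lambda n: n["provision_state"] == "deploy wait")
def check_deploy_timeout(node):
    # Only the per-node operation remains in the decorated function.
    seen.append(node["uuid"])


handled = check_deploy_timeout()
```

With the loop and the filtering moved into the decorator, each periodic task body shrinks to the operation it performs on one node.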
* | Merge "Demote three warning messages"Zuul2021-10-081-3/+3
|\ \
| * | Demote three warning messagesDmitry Tantsur2021-10-061-3/+3
| |/
| | These 3 messages do not convey a lot of useful information to the operators and definitely do not represent a potential issue that warrants a warning.
| | Change-Id: I77f5802125f79c945eb05a278f7ce53696df830a
* | Follow up to Add support for verify stepsJacob Anders2021-10-071-13/+2
| | This is a follow up to commit b385d9ae5bda45d683e51d0140be7c0b2c1d6aca shortening log messages, removing unnecessary validations and fixing a typo.
| | Change-Id: Iedb32b5e571c554e19c78c8b7ef9be05d1909242
* | Merge "Add support for verify steps"Zuul2021-10-063-0/+121
|\ \ | |/ |/|
| * Add support for verify stepsJacob Anders2021-09-303-0/+121
| | This change adds support for verify steps in Ironic. Verify steps allow executing actions on transition from the "verifying" to the "manageable" state and can perform actions such as cleaning the BMC job queue or resetting the BMC on supported platforms.
| | Verify steps are similar to deploy and clean steps, just simpler.
| | Story: 2009025
| | Task: 42751
| | Change-Id: Iee27199a0315b8609e629bac272998c28274802b
* | Merge "require_exclusive_lock: log traceback that lead to an error"Zuul2021-09-291-0/+5
|\ \
| * | require_exclusive_lock: log traceback that lead to an errorDmitry Tantsur2021-09-231-0/+5
| |/
| | Change-Id: I546bcc4e53ed9963069f5d57a203421814e2a5be
* | Clean up caches periodicallyDmitry Tantsur2021-09-221-0/+7
|/
| Currently they're only cleaned up on demand, which can lead to unnecessary disk usage on deployments that are not actively used.
| Story: #2008909
| Task: #42500
| Change-Id: Id5b58d1d1b2bbd2988db7a08d4ccfe2166033147
* Facilitate asset copy for bootloader opsJulia Kreger2021-09-151-3/+8
| Adds capability to copy bootloader assets from the system OS into the network boot folders on conductor startup.
| Change-Id: Ica8f9472d0a2409cf78832166c57f2bb96677833
* Record node history and manage events in dbJulia Kreger2021-09-104-35/+213
| * Adds periodic task to purge node_history entries based upon provided configuration.
| * Adds recording of node history entries for errors in the core conductor code.
| * Also changes the rescue abort behavior to remove the notice from being recorded as an error, as this is a likely bug in behavior for any process or service evaluating the node last_error field.
| * Makes use of a semi-free form event_type field to help provide some additional context into what is going on and why. For example, if deployments are repeatedly failing, then perhaps it is a configuration issue, as opposed to a general failure. If a conductor has no resources, then the failure, in theory, would point back to the conductor itself.
| Story: 2002980
| Task: 42960
| Change-Id: Ibfa8ac4878cacd98a43dd4424f6d53021ad91166
* Improve edge-case debugging for deployment and cleaningDmitry Tantsur2021-09-023-4/+19
| * Log traceback in fail_on_error
|   This is a last-resort error handler; it needs to log the traceback.
| * Use an assertion when we expect a present list of steps
| * Log the freshly built list of steps.
| Change-Id: I8cd4cd330551b7bc9a44957e0d15c8b75c09c299
* Split node verification code out of manager.pyJacob Anders2021-08-312-47/+73
| Splitting code specific to node verification from manager.py into verify.py (as well as test_verify.py for tests). This is done in preparation for adding support for verify steps.
| Story: 2009025
| Task: 43137
| Change-Id: I22a9bd7ceac3dfd65f20e52cbacff4b9d3998c64
* Minor formatting and doc changes to change boot mode feature commit.Cenne2021-08-241-2/+2
| Story: 2008567
| Task: 41709
| depends-on: https://review.opendev.org/c/openstack/ironic/+/800084
| Change-Id: I44e41dc3d8abcb99a2248d7b9c7ac5e9d786bb98
* Add api endpoints for changing boot_mode and secure_boot stateCenne2021-08-233-6/+205
| Done:
| - Node API endpoints expose
| - RPC methods
| - Conductor Manager methods
| - Conductor utils new methods
| - RBAC new policies
| - Node API tests
| - Manager Tests (+ some testing for utils methods)
| - RBAC tests
| - Docs (api-ref)
| - REST API version history
| - Releasenotes
| Story: 2008567
| Task: 41709
| Change-Id: I2d72389edf546b99c536c6b130ca85ababf80591
* Enable priority overrides to enable/disable stepsJacob Anders2021-08-101-2/+9
| The "Generic way to configure clean step priorities" feature ( https://review.opendev.org/c/openstack/ironic/+/744117 ) enabled support for customising the priority of any clean step by setting a configuration option. However, due to an error in the code, it was not possible to use this option to enable or disable steps entirely, because overrides were applied too late, after the disabled steps had already been filtered out.
| This change fixes this error, making it possible to use the step priority override configuration option to enable or disable steps as required.
| Story: 2009105
| Change-Id: If3c01e6e4e8cedfe053e78fab9632bfff3682b06
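The ordering problem described here is easy to reproduce in miniature. The sketch below assumes that a priority of 0 means "disabled" and uses made-up function names and step data; it illustrates the two orderings and is not the code touched by the commit.

```python
def filter_then_override(steps, overrides):
    """Buggy order: disabled steps are dropped before overrides apply."""
    enabled = [dict(s) for s in steps if s["priority"] > 0]
    for s in enabled:
        s["priority"] = overrides.get(s["step"], s["priority"])
    return sorted(enabled, key=lambda s: -s["priority"])


def override_then_filter(steps, overrides):
    """Fixed order: apply overrides first, then drop priority-0 steps."""
    updated = [dict(s, priority=overrides.get(s["step"], s["priority"]))
               for s in steps]
    enabled = [s for s in updated if s["priority"] > 0]
    return sorted(enabled, key=lambda s: -s["priority"])


steps = [
    {"step": "erase_devices", "priority": 10},
    {"step": "update_firmware", "priority": 0},  # disabled by default
]
# Operator intent: turn update_firmware on and erase_devices off.
overrides = {"update_firmware": 20, "erase_devices": 0}

buggy = [s["step"] for s in filter_then_override(steps, overrides)]
fixed = [s["step"] for s in override_then_filter(steps, overrides)]
```

In the buggy ordering neither override has the intended effect: `update_firmware` is already gone, and `erase_devices` stays in the list even though its priority was set to 0.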
* Fixes missing argument for log format stringCenne2021-07-271-1/+2
| Story: 2008567
| Change-Id: Id5bcfad5cd4514dd710232d75fbd729856f16b17
* Add `boot_mode` and `secure_boot` to node object and expose in apiCenne2021-07-082-0/+48
| * add fields to Node object
| * expose them at endpoint `/v1/nodes/{node_ident}/states`
| * update states on powersync / entering managed state.
| * tests
| * update api endpoint info in api-ref
| Story: 2008567
| Task: 41709
| Change-Id: Iddd1421a6fa37d69da56658a2fefa5bc8cfd15e4
* Nicer error message when a deploy step failsDmitry Tantsur2021-06-222-8/+6
| No need to output the whole step structure, especially if it has arguments. Also no need to repeat "deploy".
| Change-Id: I06275fe894ee24638dca9bb8c78844ff3ca3d29b
* Include bios registry fields in bios APIBob Fournier2021-05-271-2/+1
| Provide the fields in the BIOS setting API - ``/v1/nodes/{node}/bios/{setting}``, and in the BIOS setting list API when details are requested - ``/v1/nodes/<node>/bios?detail=True``.
| Story: #2008571
| Task: #42483
| Change-Id: Ie86ec57e428e2bb2efd099a839105e51a94824ab
* Merge "Delay rendering configdrive"Zuul2021-05-262-2/+19
|\
| * Delay rendering configdriveDmitry Tantsur2021-05-192-2/+19
| | When the configdrive input is JSON (meta_data, etc), delay the rendering until the ISO image is actually used. It has two benefits:
| | 1) Avoid storing a large ISO image in instance_info,
| | 2) Allow deploy steps to access the original user's input.
| | Fix configdrive masking to correctly mask dicts.
| | Story: #2008875
| | Task: #42419
| | Change-Id: I86d30bbb505b8c794bfa6412606f4516f8885aa9
* | Trivial: comment why we don't check retired in allocationsDmitry Tantsur2021-05-201-0/+2
|/
| Change-Id: I31e128f5273cc50bf7662a62080251d8f226f6c5
* Merge "Process in-band deploy steps on fast-track"Zuul2021-04-281-20/+33
|\
| * Process in-band deploy steps on fast-trackDmitry Tantsur2021-04-131-20/+33
| | Currently we only record them, but do not validate them or take them into account. This change fixes it.
| | The deployment code has been updated to account for the fact that deploy steps can change at run time.
| | Change-Id: I01bd9e3a11fed68213bb392c04aa1d33bbe16b30
* | Remove temporary cleaning information on starting cleaningDmitry Tantsur2021-04-222-6/+2
|/
| Currently we only remove the URL, which may leave a stale token.
| Change-Id: I9ff2d726cb75317fe09bd43342541db0e721f2b8
* Wipe agent tokens on inspection start and abortDmitry Tantsur2021-04-082-3/+6
| Also make sure the pregenerated flag is always reset.
| Change-Id: I73aaa803d3eb84ddac59a778e998836a645217eb
* Merge "Add agent_status and agent_status_message params to heartbeat"Zuul2021-04-012-7/+23
|\
| * Add agent_status and agent_status_message params to heartbeatArun S A G2021-03-312-7/+23
| | agent_status is used by the anaconda ramdisk to inform the conductor about the state of the deployment. Valid agent states are 'start', 'end' and 'error'. The agent_status_message is used to describe why the agent_status is set to a particular state. Use of these parameters requires API version 1.72 or greater.
| | When anaconda finishes deployment, the agent_status is set to 'end'. When the anaconda ramdisk is unable to deploy the OS for some reason, the agent_status is set to 'error'.
| | PXEAnacondaDeploy is implemented to handle the 'anaconda' deploy interface. PXEAnacondaDeploy ties together the pieces needed to deploy a node using the anaconda ramdisk.
| | Co-Authored-By: Jay Faulkner <jay@jvf.cc>
| | Change-Id: Ieb452149730510b001c4712bbb2e0f28acfc3c2e
* | Merge "Generic way to configure clean step priorites"Zuul2021-03-311-3/+29
|\ \ | |/ |/|
| * Generic way to configure clean step prioritesJacob Anders2021-03-311-3/+29
| | This change adds a generic method of configuring clean step priorities instead of making changes in Ironic code every time a new clean step is introduced.
| | Change-Id: I56b9a878724d27af2ac05232a1680017de4d8df5
| | Story: 1618014
* | Merge "Switch to JSON RPC from ironic-lib"Zuul2021-03-191-1/+1
|\ \
| * | Switch to JSON RPC from ironic-libDmitry Tantsur2021-03-101-1/+1
| | | Change-Id: I8b438861780c85faae7ff18646960723a1fd9876
* | | API to force manual cleaning without booting IPADmitry Tantsur2021-03-165-32/+78
| | | Adds a new argument disable_ramdisk to the manual cleaning API. Only steps that are marked with requires_ramdisk=False can be run in this mode. Cleaning prepare/tear down is not done.
| | | Some steps (like redfish BIOS) currently require IPA to detect a successful reboot. They are not marked with requires_ramdisk just yet.
| | | Change-Id: Icacac871603bd48536188813647bc669c574de2a
| | | Story: #2008491
| | | Task: #41540
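The rule that only steps marked requires_ramdisk=False may run when disable_ramdisk is set can be sketched as a simple validation pass. The function name, the dict-based step format, the default of treating unmarked steps as needing the ramdisk, and the step names are all assumptions for illustration.

```python
def validate_manual_clean_steps(steps, disable_ramdisk=False):
    """Reject steps that need the ramdisk when it is disabled (toy check)."""
    if not disable_ramdisk:
        return list(steps)
    # Assumption: a step without an explicit flag is treated as needing it.
    blocked = [s["step"] for s in steps if s.get("requires_ramdisk", True)]
    if blocked:
        raise ValueError(
            "steps require the ramdisk: " + ", ".join(blocked))
    return list(steps)


steps = [
    {"step": "set_power_policy", "requires_ramdisk": False},
    {"step": "erase_devices", "requires_ramdisk": True},
]

accepted = validate_manual_clean_steps(steps[:1], disable_ramdisk=True)
try:
    validate_manual_clean_steps(steps, disable_ramdisk=True)
    rejected = False
except ValueError:
    rejected = True
```

Failing the whole request up front, rather than silently skipping the blocked steps, keeps the operator's step list and the executed step list identical.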
* | | Don't try to use attempts=None with tenacityDmitry Tantsur2021-03-111-4/+5
|/ /
| | Change-Id: Ifb139f71e9cb57409f95512e0dc087d0198b4b86