Commit message (Collapse)  Author  Age  Files  Lines
* Merge "tests: Use GreenThreadPoolExecutor.shutdown(wait=True)"  (HEAD, master)  Zuul  2023-05-17  2  -0/+42
|\
| * tests: Use GreenThreadPoolExecutor.shutdown(wait=True)  melanie witt  2023-05-17  2  -0/+42
| |     We are still having some issues in the gate where greenlets from previous tests continue to run while the next test starts, causing false negative failures in unit or functional test jobs.
| |
| |     This adds a new fixture that will ensure GreenThreadPoolExecutor.shutdown() is called with wait=True, to wait for greenlets in the pool to finish running before moving on.
| |
| |     In local testing, doing this does not appear to adversely affect test run times, which was my primary concern. As a baseline, I ran a subset of functional tests in a loop until failure without the patch and after 11 hours, I got a failure reproducing the bug. With the patch, running the same subset of functional tests in a loop has been running for 24 hours and has not failed yet.
| |
| |     Based on this, I think it may be worth trying this out to see if it will help stability of our unit and functional test jobs. And if it ends up impacting test run times or causes other issues, we can revert it.
| |
| |     Partial-Bug: #1946339
| |     Change-Id: Ia916310522b007061660172fa4d63d0fde9a55ac
* | Merge "doc: Update version info"  Zuul  2023-05-11  1  -2/+2
|\ \
| * | doc: Update version info  Stephen Finucane  2023-04-17  1  -2/+2
| | |     The patch to remove legacy migrations merged during the Bobcat cycle, not the Antelope cycle, so the docs need to be updated accordingly.
| | |
| | |     Change-Id: I0d164ff1aaaab8d84116a0210f668330d2f86e7e
| | |     Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
* | | CI: fix backport validator for new branch naming  Elod Illes  2023-05-11  1  -1/+1
| |/
|/|
| |     validate-backport job started to fail as only old stable branch naming is accepted. This patch extends the script to allow numbers and dot as well in the branch names (like stable/2023.1).
| |
| |     Change-Id: Icbdcd5d124717e195d55d9e42530611ed812fadd
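As a rough illustration of the change described in this commit, the sketch below contrasts a letters-only branch pattern with one that also accepts digits and dots. Both patterns and the helper `is_stable_branch` are hypothetical and are not taken from Nova's validator script, which this log does not show.

```python
import re

# Hypothetical patterns for illustration only; the real script may differ.
OLD_STABLE = re.compile(r"^stable/[a-z]+$")       # e.g. stable/zed
NEW_STABLE = re.compile(r"^stable/[a-z0-9.]+$")   # also stable/2023.1


def is_stable_branch(name, pattern=NEW_STABLE):
    """Return True if the branch name matches the given stable pattern."""
    return pattern.match(name) is not None
```

With the letters-only pattern, `stable/2023.1` is rejected, which matches the failure mode the commit message describes.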
* | Merge "Bump nova-ceph-multstore timeout"  Zuul  2023-05-11  1  -1/+2
|\ \
| * | Bump nova-ceph-multstore timeout  Dan Smith  2023-05-10  1  -1/+2
| | |     The recent change(s) to enable a lot more SSHABLE checks puts the runtime of the ceph job really close to the 2h timeout even when things are working. Sometimes it times out before it finishes even though things are progressing. Bump the timeout to avoid that.
| | |
| | |     Also bump us to 8G swap to match what is set on the parent ceph job when we upgraded to jammy. We could just unset this, but better to pin it high in case that job (defined elsewhere) changes. Our job is the largest ceph job, so it makes sense that it keeps its own swap level high.
| | |
| | |     Change-Id: I6cefd87671614d87d92e4675fbc989fc9453c8b9
* | | Enable use of service user token with admin context  melanie witt  2023-05-10  6  -8/+51
| | |     When the [service_user] section is configured in nova.conf, nova will have the ability to send a service user token alongside the user's token. The service user token is sent when nova calls other services' REST APIs to authenticate as a service, and service calls can sometimes have elevated privileges.
| | |
| | |     Currently, nova does not however have the ability to send a service user token with an admin context. This means that when nova makes REST API calls to other services with an anonymous admin RequestContext (such as in nova-manage or periodic tasks), it will not be authenticated as a service.
| | |
| | |     This adds a keyword argument to service_auth.get_auth_plugin() to enable callers to provide a user_auth object instead of attempting to extract the user_auth from the RequestContext.
| | |
| | |     The cinder and neutron client modules are also adjusted to make use of the new user_auth keyword argument so that nova calls made with anonymous admin request contexts can authenticate as a service when configured.
| | |
| | |     Related-Bug: #2004555
| | |     Change-Id: I14df2d55f4b2f0be58f1a6ad3f19e48f7a6bfcb4
* | | Use force=True for os-brick disconnect during delete  melanie witt  2023-05-10  39  -114/+413
|/ /
| |     The 'force' parameter of os-brick's disconnect_volume() method allows callers to ignore flushing errors and ensure that devices are being removed from the host.
| |
| |     We should use force=True when we are going to delete an instance to avoid leaving leftover devices connected to the compute host which could then potentially be reused to map volumes to an instance that should not have access to those volumes.
| |
| |     We can use force=True even when disconnecting a volume that will not be deleted on termination because os-brick will always attempt to flush and disconnect gracefully before forcefully removing devices.
| |
| |     Closes-Bug: #2004555
| |     Change-Id: I3629b84d3255a8fe9d8a7cea8c6131d7c40899e8
* | Merge "Revert "Debug Nova APIs call failures""  Zuul  2023-05-09  1  -6/+0
|\ \
| * | Revert "Debug Nova APIs call failures"  Sylvain Bauza  2023-05-02  1  -6/+0
| | |     This reverts commit afb0f774841d30dcae9c074d524e7fa9be840678.
| | |
| | |     Reason for revert: We unfortunately leak the token in the logs which is considered a security flaw, even if only provided on DEBUG level.
| | |
| | |     Change-Id: I52b52e65b689dadbdb08122c94652c491f850de6
| | |     Closes-Bug: #2012993
* | | Merge "Handle zero pinned CPU in a cell with mixed policy"  Zuul  2023-05-09  2  -27/+25
|\ \ \
| * | | Handle zero pinned CPU in a cell with mixed policy  Balazs Gibizer  2022-12-13  2  -27/+25
| | | |     When cpu_policy is mixed, the scheduler tries to find a valid CPU pinning for each instance NUMA cell. However, if there is an instance NUMA cell that does not request any pinned CPUs, then such logic will calculate empty pinning information for that cell. The scheduler logic then wrongly assumes that an empty pinning result means there was no valid pinning. However, there is a difference between a None result, which means no valid pinning was found, and an empty result [], which means there was nothing to pin. This patch makes sure that pinning == None is differentiated from pinning == [].
| | | |
| | | |     Closes-Bug: #1994526
| | | |     Change-Id: I5a35a45abfcfbbb858a94927853777f112e73e5b
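The distinction this commit message draws can be shown with a minimal Python sketch. The names `pin_cell`, `fits_buggy` and `fits_fixed` are hypothetical and are not Nova's code; they only illustrate why a truthiness check conflates `None` with `[]`.

```python
from typing import List, Optional


def pin_cell(requested: int, free_cpus: List[int]) -> Optional[List[int]]:
    """Return the CPUs chosen for pinning, or None if the request cannot fit."""
    if requested > len(free_cpus):
        return None  # no valid pinning exists
    return free_cpus[:requested]  # [] when the cell asks for no pinned CPUs


def fits_buggy(pinning: Optional[List[int]]) -> bool:
    # Truthiness treats [] like None, so "nothing to pin" looks like a failure.
    return bool(pinning)


def fits_fixed(pinning: Optional[List[int]]) -> bool:
    # Only None means that no valid pinning was found.
    return pinning is not None
```

A cell that requests zero pinned CPUs yields `[]`, which the truthiness check rejects and the identity check accepts.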
* | | | Merge "Reproduce asym NUMA mixed CPU policy bug"  Zuul  2023-05-05  1  -0/+74
|\ \ \ \
| |/ / /
| * | | Reproduce asym NUMA mixed CPU policy bug  Balazs Gibizer  2022-12-13  1  -0/+74
| | | |     Related-Bug: #1994526
| | | |     Change-Id: I52ee068377cc48ef4b4cdcb4b05fdc8d926faddf
* | | | Merge "Fix get_segments_id with subnets without segment_id"  Zuul  2023-05-04  2  -2/+18
|\ \ \ \
| * | | | Fix get_segments_id with subnets without segment_id  Sylvain Bauza  2023-05-03  2  -2/+18
| | | | |     Unfortunately, when we merged Ie166f3b51fddeaf916cda7c5ac34bbcdda0fd17a we forgot that subnets can have no segment_id field.
| | | | |
| | | | |     Change-Id: Idb35b7e3c69fe8efe498abe4ebcc6cad8918c4ed
| | | | |     Closes-Bug: #2018375
* | | | | Merge "Have host look for CPU controller of cgroupsv2 location."  Zuul  2023-05-04  9  -51/+170
|\ \ \ \ \
| * | | | | Have host look for CPU controller of cgroupsv2 location.  Jorge San Emeterio  2023-05-03  9  -51/+170
| | | | | |     Make the host class look under '/sys/fs/cgroup/cgroup.controllers' for support of the cpu controller. The host will try searching through cgroupsv1 first, just like up until now, and if that fails, it will then try cgroupsv2. The host will not support the feature if both checks fail.
| | | | | |
| | | | | |     This new check needs to be mocked by all tests that focus on this piece of code, as it touches a system file that requires privileges. To that end, the CGroupsFixture is defined to easily add such mocking to all test cases that require it.
| | | | | |
| | | | | |     I also removed old mocking at test_driver.py in favor of the fixture from above.
| | | | | |
| | | | | |     Partial-Bug: #2008102
| | | | | |     Change-Id: I99b57c27c8a4425389bec2b7f05af660bab85610
* | | | | | Merge "Save cell socket correctly when updating host NUMA topology"  Zuul  2023-05-04  6  -19/+53
|\ \ \ \ \ \
| * | | | | | Save cell socket correctly when updating host NUMA topology  Artom Lifshitz  2023-04-25  6  -19/+53
| | | | | | |     Previously, in numa_usage_from_instance_numa(), any new NUMACell objects we created did not have the `socket` attribute. In some cases this was persisted all the way down to the database. Fix this by copying `socket` from the old_cell.
| | | | | | |
| | | | | | |     Change-Id: I9ed3c31ccd3220b02d951fc6dbc5ea049a240a68
| | | | | | |     Closes-Bug: 1995153
* | | | | | | Merge "add hypervisor version weigher"  Zuul  2023-05-04  6  -0/+232
|\ \ \ \ \ \ \
| |_|_|/ / / /
|/| | | | | |
| * | | | | | add hypervisor version weigher  Sean Mooney  2023-04-20  6  -0/+232
| | |_|_|_|/
| |/| | | |
| | | | | |     implements: blueprint weigh-host-by-hypervisor-version
| | | | | |     Change-Id: I36b16a388383c26bdf432030bc9e28b2fd75d120
* | | | | | Merge "Fix a typo in this URL: https://docs.openstack.org/nova/latest/admin/availability-zones.html"  Zuul  2023-04-28  1  -1/+1
|\ \ \ \ \ \
| * | | | | | Fix a typo in this URL: https://docs.openstack.org/nova/latest/admin/availability-zones.html  Author: Carl Morris  2023-03-28  1  -1/+1
| | | | | | |     Closes-Bug: #1956506
| | | | | | |     Change-Id: Iec536713923b17cfceb19f2382b7a10c8527705e
* | | | | | | Merge "Stop ignoring missing compute nodes in claims"  Zuul  2023-04-26  4  -38/+66
|\ \ \ \ \ \ \
| * | | | | | | Stop ignoring missing compute nodes in claims  Dan Smith  2023-04-24  4  -38/+66
| | | | | | | |     The resource tracker will silently ignore attempts to claim resources when the node requested is not managed by this host. The misleading "self.disabled(nodename)" check will fail if the nodename is not known to the resource tracker, causing us to bail early with a NopClaim. That means we also don't do additional setup like creating a migration context for the instance, claiming resources in placement, and handling PCI/NUMA things. This behavior is quite old, and clearly doesn't make sense in a world with things like placement.
| | | | | | | |
| | | | | | | |     The bulk of the test changes here are due to the fact that a lot of tests were relying on this silent ignoring of a mismatching node, because they were passing node names that weren't even tracked.
| | | | | | | |
| | | | | | | |     This change makes us raise an error if this happens so that we can actually catch it, and avoid silently continuing with no resource claim.
| | | | | | | |
| | | | | | | |     Change-Id: I416126ee5d10428c296fe618aa877cca0e8dffcf
* | | | | | | | Merge "Remove silent failure to find a node on rebuild"  Zuul  2023-04-26  2  -7/+26
|\ \ \ \ \ \ \ \
| |/ / / / / / /
| * | | | | | | Remove silent failure to find a node on rebuild  Dan Smith  2023-04-24  2  -7/+26
| | | | | | | |     We have been ignoring the case where a rebuild or evacuate is triggered and we fail to find *any* node for our host. This appears to be a very old behavior, which I traced back ten years to this:
| | | | | | | |
| | | | | | | |     https://review.opendev.org/c/openstack/nova/+/35851
| | | | | | | |
| | | | | | | |     which was merely fixing the failure to reset instance.node during an evacuate (which re-uses rebuild, which before that was a single-host operation). That patch intended to make a failure to find a node for our host a non-fatal error, but it just means we fall through that check with no node selected, which means we never update instance.node *and* run ResourceTracker code that will fail to find the node later.
| | | | | | | |
| | | | | | | |     So, this makes it an explicit error, where we stop further processing, set the migration for the evacuation to 'failed', and send a notification for it. This is the same behavior as happens further down if we find that the instance has been deleted underneath us.
| | | | | | | |
| | | | | | | |     Change-Id: I88b962aaeaa0554da4ab00906ac4d9e6deb43589
* | | | | | | | Merge "Reproduce bug 1995153"  Zuul  2023-04-26  1  -0/+109
|\ \ \ \ \ \ \ \
| |/ / / / / / /
|/| | | / / / /
| | |_|/ / / /
| |/| | | | |
| * | | | | | Reproduce bug 1995153  Artom Lifshitz  2023-04-25  1  -0/+109
| | | | | | |     If we first boot an instance with NUMA topology on a host, any subsequent attempts to boot instances with the `socket` PCI NUMA policy will fail with `Cannot load 'socket' in the base class`. Demonstrate this in a functional test.
| | | | | | |
| | | | | | |     Change-Id: I63f4e3dfa38f65b73d0051b8e52b1abd0f027e9b
| | | | | | |     Related-bug: 1995153
* | | | | | | Remove focal job for 2023.2  Dan Smith  2023-04-24  1  -13/+0
|/ / / / / /
| | | | | |     Neutron also flipped to python>=3.9 on all their repos this morning[1] which means we can't install neutron on focal at all. I'm not sure if that's going to get reverted at this point, but even if it is, it's going to take a while to undo.
| | | | | |
| | | | | |     As noted in the comments and the original commit[2] that added this job, it was intended to be removed when we dropped focal from the test interface, which we have now done.
| | | | | |
| | | | | |     1: https://review.opendev.org/q/topic:bug%252F2017478
| | | | | |     2: https://review.opendev.org/c/openstack/nova/+/861111
| | | | | |
| | | | | |     Change-Id: I5be638a702629e07ec9c88bd67bb9b7f1212f7fc
* | | | | | Merge "hyperv: Mark driver as experimental"  Zuul  2023-04-19  2  -0/+14
|\ \ \ \ \ \
| |_|/ / / /
|/| | | | |
| * | | | | hyperv: Mark driver as experimental  Stephen Finucane  2022-11-07  2  -0/+14
| | | | | |     Cloudbase have changed priorities and will no longer be testing the Hyper-V driver. We need to mark this as experimental and consider removing it in the future.
| | | | | |
| | | | | |     Change-Id: I823fbf660948c062581d4e0aaaadc6a6983de2a3
| | | | | |     Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
* | | | | | Merge "db: Remove legacy migrations"  Zuul  2023-04-17  59  -3537/+24
|\ \ \ \ \ \
| * | | | | | db: Remove legacy migrations  Stephen Finucane  2023-02-01  59  -3537/+24
| | | | | | |     sqlalchemy-migrate does not (and will not) support sqlalchemy 2.0. We need to drop these migrations to ensure we can upgrade our sqlalchemy version.
| | | | | | |
| | | | | | |     Change-Id: I7756e393b78296fb8dbf3ca69c759d75b816376d
| | | | | | |     Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
* | | | | | | Merge "Update to the PTL guide"  Zuul  2023-04-14  1  -29/+44
|\ \ \ \ \ \ \
| * | | | | | | Update to the PTL guide  Sylvain Bauza  2023-04-05  1  -29/+44
| | | | | | | |     Was a bit old, refreshed with more up-to-date information and links.
| | | | | | | |
| | | | | | | |     Change-Id: I5b5da4748238acda98f29570fa97d09d8aa8df82
* | | | | | | | Merge "Make scheduler lazy-load the placement client"  Zuul  2023-04-10  2  -1/+70
|\ \ \ \ \ \ \ \
| * | | | | | | | Make scheduler lazy-load the placement client  Dan Smith  2023-03-22  2  -1/+70
| | | | | | | | |     Like we did for conductor, this makes the scheduler lazy-load the placement client instead of only doing it during __init__. This avoids a startup crash if keystone or placement are not available, but retains startup failures for other problems and errors likely to be a result of misconfigurations.
| | | | | | | | |
| | | | | | | | |     Closes-Bug: #2012530
| | | | | | | | |     Change-Id: I42ed876b84d80536e83d9ae01696b0a64299c9f7
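The lazy-loading pattern described in this commit message might look roughly like the sketch below. `SchedulerSketch`, `ServiceUnavailable` and `client_factory` are hypothetical names introduced for illustration; Nova's actual classes and exception types are not shown in this log.

```python
class ServiceUnavailable(Exception):
    """Hypothetical stand-in for 'keystone or placement is not reachable'."""


class SchedulerSketch:
    def __init__(self, client_factory):
        self._client_factory = client_factory
        self._client = None
        try:
            # Try eagerly so that misconfiguration still fails at startup.
            self._load_client()
        except ServiceUnavailable:
            # The remote service is down: defer loading until first use.
            pass

    def _load_client(self):
        if self._client is None:
            self._client = self._client_factory()
        return self._client

    @property
    def placement_client(self):
        # Created on first access if startup could not create it.
        return self._load_client()
```

Any exception other than the "unavailable" one still propagates out of `__init__`, which mirrors the stated intent of keeping startup failures for likely misconfigurations.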
* | | | | | | | | Allow running functional-py311  Dan Smith  2023-04-05  1  -1/+1
| | | | | | | | |     This makes us able to run functional on python 3.11. Without this, tox will happily (and silently) run the default venv, which is unit tests.
| | | | | | | | |
| | | | | | | | |     Change-Id: I544a29ae78814f9a454daba8c1978f7ab2c2505c
* | | | | | | | | Merge "Update min support for Bobcat"  Zuul  2023-04-04  2  -1/+17
|\ \ \ \ \ \ \ \ \
| * | | | | | | | | Update min support for Bobcat  Sylvain Bauza  2023-03-08  2  -1/+17
| | | | | | | | | |     I needed to update some VDPA functests as they were verifying a Yoga compute service.
| | | | | | | | | |
| | | | | | | | | |     NOTE(sbauza): For the moment, the grenade-skip-level is not voting but it will be done once I2b21e7d5f487f65ce4391f5c934046552d01a1e2 is merged.
| | | | | | | | | |
| | | | | | | | | |     Change-Id: I8ef2a8f251a3142c359e14841459bffcc3b50ac9
* | | | | | | | | | mypy: Fix implicit optional usage  Eric Harney  2023-03-27  7  -11/+16
| |_|_|_|_|/ / / /
|/| | | | | | | |
| | | | | | | | |     Current versions of mypy run with no-implicit-optional by default. This change gets Nova's mypy test environment to pass again.
| | | | | | | | |
| | | | | | | | |     Change-Id: Ie50c8d364ad9c339355cc138b560ec4df14fe307
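For readers unfamiliar with the mypy behavior mentioned here, the following sketch shows the general kind of annotation change involved. The function `greet` is a made-up example and is not taken from the Nova change.

```python
from typing import Optional

# With no-implicit-optional in effect, mypy rejects a default of None on a
# parameter annotated as plain `str`, for example:
#
#     def greet(name: str = None) -> str: ...
#
# Spelling out Optional makes the None default explicit in the type.


def greet(name: Optional[str] = None) -> str:
    """Return a greeting, falling back to a default when no name is given."""
    return "hello " + (name if name is not None else "world")
```

The runtime behavior is unchanged; only the annotation differs.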
* | | | | | | | | Add grenade-skip-level-always to nova  Dan Smith  2023-03-23  1  -1/+3
| |/ / / / / / /
|/| | | | | | |
| | | | | | | |     This makes us test N-2->N even for non-SLURP releases. Ideally we would continue to keep this working, even though we don't have to. But, even if this highlights some breaking change and we have to drop this job, the sentinel will be useful.
| | | | | | | |
| | | | | | | |     Depends-On: https://review.opendev.org/c/openstack/grenade/+/875990
| | | | | | | |     Change-Id: I2b21e7d5f487f65ce4391f5c934046552d01a1e2
* | | | | | | | Merge "Unbind port when offloading a shelved instance"  Zuul  2023-03-13  6  -26/+73
|\ \ \ \ \ \ \ \
| * | | | | | | | Unbind port when offloading a shelved instance  Arnaud Morin  2022-11-29  6  -26/+73
| | | | | | | | |     When offloading a shelved instance, the compute needs to remove the binding so the port will appear as "unbound" in neutron.
| | | | | | | | |
| | | | | | | | |     Closes-Bug: 1983471
| | | | | | | | |     Change-Id: Ia49271b126870c7936c84527a4c39ab96b6c5ea7
| | | | | | | | |     Signed-off-by: Arnaud Morin <arnaud.morin@ovhcloud.com>
* | | | | | | | | Merge "fup for power management series"  Zuul  2023-03-09  3  -26/+4
|\ \ \ \ \ \ \ \ \
| * | | | | | | | | fup for power management series  Sylvain Bauza  2023-02-23  3  -26/+4
| | | | | | | | | |     Emptying the cpu init file and directly calling the submodule API.
| | | | | | | | | |
| | | | | | | | | |     Relates to blueprint libvirt-cpu-state-mgmt
| | | | | | | | | |     Change-Id: I1299ca4b49743f58bec6f541785dd9fbee0ae9e2
* | | | | | | | | | Merge "Update master for stable/2023.1"  Zuul  2023-03-09  2  -0/+7
|\ \ \ \ \ \ \ \ \ \