author | Balazs Gibizer <gibi@redhat.com> | 2022-08-17 18:19:15 +0200 |
---|---|---|
committer | Balazs Gibizer <gibi@redhat.com> | 2022-08-25 10:00:10 +0200 |
commit | 2b447b7236f95752d00ebcee8c32cfef4850cf5d (patch) | |
tree | ef41ec5ed96d70bcf165b57cc869dc189f5f9902 /nova/compute | |
parent | 2aeb0a96b77e05172b13b4d1f692ff2b08f10bc9 (diff) | |
Trigger reschedule if PCI consumption fails on compute
The PciPassthroughFilter logic checks each InstancePCIRequest
individually against the available PCI pools of a given host and given
boot request. So the scheduler can accept a host that has a single PCI
device available even if two devices are requested for a single
instance via two separate PCI aliases. The PCI claim on the compute
then detects the shortfall but does not stop the boot; it only logs an
ERROR. As a result, the instance boots without any PCI device.
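
The accounting gap described above can be illustrated with a toy model. This is only a sketch, not nova's implementation: the dict-based pool and request shapes and the function name `supports_each_individually` are hypothetical, chosen to show why checking every request against the same untouched pools over-accepts a host.

```python
# Toy model only: the pool/request dicts and this function are hypothetical,
# not the data structures or API used by nova.

def supports_each_individually(pools, requests):
    """Check every request against the full, unmodified pool list."""
    for request in requests:
        free = sum(
            pool["count"]
            for pool in pools
            if pool["product_id"] == request["product_id"]
        )
        if free < request["count"]:
            return False
    return True


# One free device on the host, but two requests (for example from two
# separate PCI aliases) that both match it.
pools = [{"product_id": "1572", "count": 1}]
requests = [
    {"product_id": "1572", "count": 1},
    {"product_id": "1572", "count": 1},
]

# Each request fits when looked at alone, so the host is accepted even
# though both requests cannot be satisfied at the same time.
accepted = supports_each_individually(pools, requests)
```

In this model `accepted` is true, which corresponds to the scheduler passing a host that later cannot fulfil the claim.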
This patch does two things:
1) changes the PCI claim to fail with an exception and trigger a
re-schedule instead of just logging an ERROR.
2) changes PciDeviceStats.support_requests, which is called during
scheduling, to not just filter pools for individual requests but also
consume each request from the pools within the scope of a single boot
request.
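
The idea behind 2) can be sketched as follows. This is a simplified illustration under assumed data shapes, not the actual PciDeviceStats code: the dict-based pools and the name `supports_with_consumption` are hypothetical. The check works on a private copy of the pools and subtracts each request from it, so later requests in the same boot request only see what is left.

```python
import copy

# Toy model only: the pool/request dicts and this function are hypothetical,
# not the data structures or API used by nova.

def supports_with_consumption(pools, requests):
    """Check all requests of one boot request against a shrinking copy."""
    remaining = copy.deepcopy(pools)  # never mutate the host's real pools
    for request in requests:
        needed = request["count"]
        for pool in remaining:
            if pool["product_id"] != request["product_id"]:
                continue
            taken = min(pool["count"], needed)
            pool["count"] -= taken
            needed -= taken
            if needed == 0:
                break
        if needed > 0:
            # The earlier requests used up the matching devices.
            return False
    return True


pools = [{"product_id": "1572", "count": 1}]
one_request = [{"product_id": "1572", "count": 1}]
two_requests = one_request * 2

single_ok = supports_with_consumption(pools, one_request)
double_ok = supports_with_consumption(pools, two_requests)
```

Here a single request still passes, while two requests for the one free device are rejected. Because the consumption happens on a copy, the check itself leaves the host's pools unchanged.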
The fix in #2) alone would not be enough, as two parallel scheduling
requests could race for a single device on the same host. #1) is the
ultimate place where we consume devices under a compute global lock, so
we need the fix there too.
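
Why the claim itself has to fail loudly can be shown with a small threading sketch. Everything here is an assumption made for illustration: `HostDevices`, `ClaimFailed` and `boot` are hypothetical stand-ins, not nova classes. Two boots that were both accepted for the same host race for one device; the lock serialises the claims, and the loser gets an exception it can turn into a reschedule rather than booting without a device.

```python
import threading


class ClaimFailed(Exception):
    """Hypothetical stand-in for the exception raised by a failed PCI claim."""


class HostDevices:
    """Toy per-host device counter guarded by a single lock."""

    def __init__(self, free):
        self._free = free
        self._lock = threading.Lock()

    def claim(self, count):
        # All claims on this host are serialised by the lock, so the free
        # count seen here is authoritative.
        with self._lock:
            if count > self._free:
                raise ClaimFailed(f"requested {count}, only {self._free} free")
            self._free -= count


def boot(host, count, outcomes):
    try:
        host.claim(count)
        outcomes.append("booted")
    except ClaimFailed:
        # Instead of only logging, hand the request back for rescheduling.
        outcomes.append("reschedule")


host = HostDevices(free=1)
outcomes = []
threads = [
    threading.Thread(target=boot, args=(host, 1, outcomes)) for _ in range(2)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Whichever thread reaches the lock first gets the device; the other one ends up in the reschedule branch.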
Closes-Bug: #1986838
Change-Id: Iea477be57ae4e95dfc03acc9368f31d4be895343
Diffstat (limited to 'nova/compute')
-rw-r--r-- | nova/compute/manager.py | 8 |
1 files changed, 4 insertions, 4 deletions
diff --git a/nova/compute/manager.py b/nova/compute/manager.py
index afa8ecc463..866fe65ff9 100644
--- a/nova/compute/manager.py
+++ b/nova/compute/manager.py
@@ -7712,10 +7712,10 @@ class ComputeManager(manager.Manager):
         if not pci_reqs.requests:
             return None
 
-        devices = self.rt.claim_pci_devices(
-            context, pci_reqs, instance.numa_topology)
-
-        if not devices:
+        try:
+            devices = self.rt.claim_pci_devices(
+                context, pci_reqs, instance.numa_topology)
+        except exception.PciDeviceRequestFailed:
             LOG.info('Failed to claim PCI devices during interface attach '
                      'for PCI request %s', pci_reqs, instance=instance)
             raise exception.InterfaceAttachPciClaimFailed(