* Merge "Hook resource_tracker to remove stale node information" into stable/ocataocata-eolstable/ocataZuul2020-03-194-1/+38
|\
| * Hook resource_tracker to remove stale node information  (Dan Smith, 2019-08-14; 4 files, +38/-1)
    When we remove a node from a host due to rebalance or decommission,
    we need to tell the resource tracker so it stops keeping track of the
    details for that node. This adds a hook for doing that and calls it
    from the place in compute manager where we purge the record from the
    database.

    Change-Id: Ie6b6bb2a9e8a4ad33675fccb3827e8197fa16398
    Closes-Bug: #1784874
    (cherry picked from commit 99db9faae5c1e84e5aa8586c32ffcca7a61ae276)
    (cherry picked from commit 7302a625ee9af2c9dd88f8630f1e95fc5977a70a)
    (cherry picked from commit 4f112195b0ad99f07784a9fe8dbe32a25329151d)
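    A minimal sketch of the shape of such a hook, assuming a
    remove_node()-style method on the resource tracker (names simplified
    and illustrative, not the actual Nova code):

        def purge_stale_compute_node(context, compute_node, resource_tracker):
            # Remove the DB record for the node that left this host...
            compute_node.destroy()
            # ...and tell the resource tracker to forget it, so it stops
            # reporting details for a node it no longer manages.
            resource_tracker.remove_node(compute_node.hypervisor_hostname)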
* | Merge "Fix incompatible version handling in BuildRequest" into stable/ocataZuul2020-03-192-3/+4
|\ \
| * | Fix incompatible version handling in BuildRequest  (Balazs Gibizer, 2019-12-17; 2 files, +4/-3)
    The BuildRequest object code assumed that the
    IncompatibleObjectVersion exception has an objver field that contains
    the object version. This assumption is not true.

    The unit test made another mistake, serializing the function object
    obj_to_primitive instead of serializing the result of the call of
    obj_to_primitive. This caused a false-positive test, covering the
    error in the implementation as well.

    Closes-Bug: #1812177
    Change-Id: I1ef4a23aa2bf5cb46b481045f3d968f62f74606d
    (cherry picked from commit 975f0156137e37f7a9139c0268547d79dcc3c43c)
    (cherry picked from commit 6061186d2d86eb628dcdc61667c240576fa8f10e)
    (cherry picked from commit 7c85ebad0bf13d609b384e7637b0787bf8d484f7)
    (cherry picked from commit ebf011f11ebb4d329e90e7ec0e498a3a6832e154)
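    The test mistake is the classic missing-call bug; a hedged sketch,
    assuming the oslo.versionedobjects helper (simplified, not the actual
    Nova test code):

        from oslo_versionedobjects.base import obj_to_primitive

        def serialize(build_request):
            # Bug: stores the function object itself, which is always
            # truthy, so later assertions pass vacuously.
            wrong = obj_to_primitive
            # Fix: actually call the function on the object under test.
            right = obj_to_primitive(build_request)
            return right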
* | | Merge "Fix listing deleted servers with a marker" into stable/ocataZuul2020-03-192-8/+8
|\ \ \
| * | | Fix listing deleted servers with a marker  (Matt Riedemann, 2019-12-05; 2 files, +8/-8)
    Change I1aa3ca6cc70cef65d24dec1e7db9491c9b73f7ab in Queens, which was
    backported through to Newton, introduced a regression when listing
    deleted servers with a marker, because it assumes that if
    BuildRequestList.get_by_filters does not raise MarkerNotFound then
    the marker was found among the build requests, and does not account
    for that get_by_filters method short-circuiting if filtering servers
    with deleted/cleaned/limit=0.

    The API code then nulls out the marker, which means you'll continue
    to get the marker instance back in the results even though you
    shouldn't, and that can cause an infinite loop in some client-side
    tooling like nova's CLI:

      nova list --deleted --limit -1

    This fixes the bug by raising MarkerNotFound from
    BuildRequestList.get_by_filters if we have a marker but are
    short-circuiting and returning early from the method based on limit
    or filters.

    Change-Id: Ic2b19c2aa06b3059ab0344b6ac56ffd62b3f755d
    Closes-Bug: #1849409
    (cherry picked from commit df03499843aa7fd6089bd4d07b9d0eb5a8c14b47)
    (cherry picked from commit 03a2508362ecf50463d0659142312a30d0fb91f3)
    (cherry picked from commit 6038455e1dafeaf8ccfdd4be9e993beea3c9fb45)
    (cherry picked from commit c84846dbf93588e5420b3565d163c0393443e6d6)
    (cherry picked from commit 8c18293f93a0ef957010198ae3ef2f336364b97f)
    (cherry picked from commit b277ec154f2a175c4cab609b89fd8d3c7de4d0e9)
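    A hedged sketch of the fix's shape (names follow the commit message;
    the body is simplified, not the actual BuildRequestList code):

        from nova import exception

        def get_by_filters(context, filters, limit=None, marker=None):
            if limit == 0 or filters.get('deleted') or filters.get('cleaned'):
                # Short-circuit: build requests are never deleted/cleaned.
                # If a marker was given, it was NOT found here, so raise
                # instead of letting the API null it out and repeat the
                # marker instance in the results.
                if marker is not None:
                    raise exception.MarkerNotFound(marker=marker)
                return []
            ...  # normal listing path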
* | | | Merge "Add functional regression test for bug 1849409" into stable/ocataZuul2020-03-193-14/+92
|\ \ \ \
| |/ / /
| * | | Add functional regression test for bug 1849409  (Matt Riedemann, 2019-12-05; 3 files, +92/-14)
    Change I1aa3ca6cc70cef65d24dec1e7db9491c9b73f7ab in Queens, which was
    backported through to Newton, introduced a regression when listing
    deleted servers with a marker, because it assumes that if
    BuildRequestList.get_by_filters does not raise MarkerNotFound then
    the marker was found among the build requests, and does not account
    for that get_by_filters method short-circuiting if filtering servers
    with deleted/cleaned/limit=0.

    The API code then nulls out the marker, which means you'll continue
    to get the marker instance back in the results even though you
    shouldn't, and that can cause an infinite loop in some client-side
    tooling like nova's CLI:

      nova list --deleted --limit -1

    This adds a functional recreate test for the regression which will be
    updated when the bug is fixed.

    NOTE(mriedem): In this backport the test is modified to disable the
    DiskFilter since we're using Placement for filtering on DISK_GB.
    Also, _wait_until_deleted is moved to InstanceHelperMixin since
    If7b02bcd8d77e94c7fb42b721792c1391bc0e3b7 is not in Ocata.

    Change-Id: I324193129acb6ac739133c7e76920762a8987a84
    Related-Bug: #1849409
    (cherry picked from commit 45c2752f2ce08b314012eff044b01aa7d626b43d)
    (cherry picked from commit 727d942b2830fb6970d99507f2b5eb1a28df01b2)
    (cherry picked from commit 47caaccd4a03660d7df144f2eadd821d36baeaa8)
    (cherry picked from commit 08337cccb060d0b3cad388004c1f802d5d053813)
    (cherry picked from commit f03f5075e3751ccd03a60999eab1d7f4bf7c4f02)
    (cherry picked from commit e2ba87cb7b1cfd4226cfd8028436ecf87ad8a2be)
* | | | Use stable constraint for Tempest pinned stable branches  (Ghanshyam Mann, 2020-02-10; 2 files, +22/-1)
    Stable branches till stable/rocky are using python versions < py3.6.
    Tempest tests those branches in a venv, but the Tempest tox config
    uses the master upper-constraints [1], which blocks installation due
    to dependencies requiring >= py3.6. For example, oslo.concurrency
    4.0.0 is not compatible with < py3.6.

    As we pin Tempest for EM stable branches, we should be able to use
    the stable constraints for the Tempest installation as well as while
    running during the run-tempest playbook. tox.ini is hard coded to use
    the master constraints [1], which forces run-tempest to recreate the
    tox env and use the master constraints. Fix for that:
    https://review.opendev.org/#/c/705870/

    The nova-live-migration test hook run_test.sh needs to use the stable
    u-c so that the Tempest installation in the venv will use the stable
    branch constraints.

    Modify the irrelevant-files for the nova-live-migration job so it
    runs for the run_test.sh script.

    [1] https://opendev.org/openstack/tempest/src/commit/bc9fe8eca801f54915ff3eafa418e6e18ac2df63/tox.ini#L14

    Change-Id: I8190f93e0a754fa59ed848a3a230d1ef63a06abc
    (cherry picked from commit 48a66c56441861a206f9369b8c242cfd4dffd80d)
* | | | ocata-only: drop non-voting ceph job  (Matt Riedemann, 2019-12-19; 1 file, +0/-16)
    Nova on stable/ocata is the last remaining user of
    legacy-tempest-dsvm-full-devstack-plugin-ceph and runs it non-voting.
    stable/ocata is in extended maintenance mode and doesn't see much
    action anymore. Furthermore, the ceph job is being removed from nova
    in stable/pike testing [1], so it'd be a bit weird to run a ceph job
    on ocata but not pike, so let's just remove it. Since it's non-voting,
    if it breaks no one will likely notice anyway.

    [1] I9e153f86c81ed6d9f8d9682b66d6d5c7f7b25296

    Change-Id: I3b088925dd46898bda6f74af4b933cfa732c31b7
* | | | Merge "Exclude build request marker from server listing" into stable/ocataZuul2019-12-182-9/+15
|\ \ \ \
| |_|/ /
|/| | |
| * | | Exclude build request marker from server listing  (Andrey Volkov, 2019-08-19; 2 files, +15/-9)
| | |/
| |/|
    When listing "real" (already in cell) instances, e.g. from the
    nova_cell1.instances table, a marker option means "start after the
    instance with marker". For VMs:

      | uuid | name |
      | 1    | vm1  |
      | 2    | vm2  |

    "openstack server list --marker 1" returns vm2 only.

    But for VMs from the nova_api.build_requests table it's different.
    For VMs:

      | uuid | name |
      | 1    | vm1  |
      | 2    | vm2  |

    "openstack server list --marker 1" returns both vm1 and vm2.

    This patch excludes the instance with the marker from the listing for
    instances from the build_requests table.

    Closes-Bug: #1808286
    Change-Id: I5165b69f956fbf1904112a742698b2739f747e72
    (cherry picked from commit 2ef704cba619a149336692311c2614742aa32909)
    (cherry picked from commit 3eb0ba988f4cfca4399c92cd6bb7b4ae8665720c)
    (cherry picked from commit 8aadd4ebdfea3f1b46b2e2e62b355e8b40b6261b)
    (cherry picked from commit 3d3a263789c5cc49899d63d18f9233c981f5b894)
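    A hedged sketch of the intended marker semantics for build requests
    after this patch (illustrative, not the actual code):

        def filter_after_marker(build_requests, marker_uuid):
            """Start the listing strictly after the marker, matching how
            the cell database listing treats a marker."""
            found_marker = False
            results = []
            for req in build_requests:
                if found_marker:
                    results.append(req)
                elif req.uuid == marker_uuid:
                    found_marker = True  # exclude the marker itself
            return results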
* | | Merge "Workaround missing RequestSpec.instance_group.uuid" into stable/ocataZuul2019-12-062-15/+14
|\ \ \
| * | | Workaround missing RequestSpec.instance_group.uuid  (Matt Riedemann, 2019-06-04; 2 files, +14/-15)
    It's clear that we could have a RequestSpec.instance_group without a
    uuid field if the InstanceGroup is set from the _populate_group_info
    method, which should only be used for legacy translation of request
    specs using legacy filter properties dicts.

    To work around the issue, we look for the group scheduler hint to get
    the group uuid before loading it from the DB.

    The related functional regression recreate test is updated to show
    this solves the issue.

    Change-Id: I20981c987549eec40ad9762e74b0db16e54f4e63
    Closes-Bug: #1830747
    (cherry picked from commit da453c2bfe86ab7a825f0aa7ebced15886f7a5fd)
    (cherry picked from commit 8569eb9b4fb905cb92041b84c293dc4e7af27fa8)
    (cherry picked from commit 9fed1803b4d6b2778c47add9c327f0610edc5952)
    (cherry picked from commit 20b90f2e26e6a46a12c2fd943b4472c3147528fa)
    (cherry picked from commit 79cc08642172a3df1cd8d7a7c413adc21b468dcf)
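    A hedged sketch of the workaround (obj_attr_is_set follows
    oslo.versionedobjects; the rest is simplified and illustrative):

        def ensure_group_uuid(request_spec, scheduler_hints):
            group = request_spec.instance_group
            if group is not None and not group.obj_attr_is_set('uuid'):
                # Legacy-translated request specs may lack the group uuid;
                # recover it from the 'group' scheduler hint before trying
                # to load the group from the database.
                hint = scheduler_hints.get('group')
                if hint:
                    group.uuid = hint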
* | | | Merge "Add regression recreate test for bug 1830747" into stable/ocataZuul2019-12-061-0/+153
|\ \ \ \
| |/ / /
| | | /
| |_|/
|/| |
| * | Add regression recreate test for bug 1830747  (Matt Riedemann, 2019-06-04; 1 file, +153/-0)
    Before change I4244f7dd8fe74565180f73684678027067b4506e in Stein,
    when a cold migration would reschedule to conductor it would not send
    the RequestSpec, only the filter_properties. The filter_properties
    contain a primitive version of the instance group information from
    the RequestSpec for things like the group members, hosts and
    policies, but not the uuid.

    When conductor is trying to reschedule the cold migration without a
    RequestSpec, it builds a RequestSpec from the components it has, like
    the filter_properties. This results in a RequestSpec with an
    instance_group field set but with no uuid field in the
    RequestSpec.instance_group. That RequestSpec gets persisted, and
    then, because of change Ie70c77db753711e1449e99534d3b83669871943f,
    later attempts to load the RequestSpec from the database will fail
    because of the missing RequestSpec.instance_group.uuid.

    The test added here recreates the pre-Stein scenario which could
    still be a problem (on master) for any corrupted RequestSpecs for
    older instances.

    NOTE(mriedem): In this version we have to use the FakeDriver because
    the MediumFakeDriver did not exist in Ocata. Also, we have to disable
    the DiskFilter since we are using placement during scheduling.

    Change-Id: I05700c97f756edb7470be7273d5c9c3d76d63e29
    Related-Bug: #1830747
    (cherry picked from commit c96c7c5e13bde39944a9dde7da7fe418b311ca2d)
    (cherry picked from commit 8478a754802e29dffbb65ef363ee189162f0adea)
    (cherry picked from commit a0a187c9bb9bef149e193027a6eedc09ba10ce1f)
    (cherry picked from commit 581df2c98676b6734e8195ab56c9e0dba74789a5)
    (cherry picked from commit 09ec97b95b19a42b949da76fe1f1b3cc06da8f35)
* | | Merge "Do not persist RequestSpec.ignore_hosts" into stable/ocataZuul2019-10-316-30/+71
|\ \ \
| * | | Do not persist RequestSpec.ignore_hosts  (Matt Riedemann, 2019-04-02; 6 files, +71/-30)
    Change Ic3968721d257a167f3f946e5387cd227a7eeec6c in Newton started
    setting the RequestSpec.ignore_hosts field to the source
    instance.host during resize/cold migrate if
    allow_resize_to_same_host=False in config, which it is by default.

    Change I8abdf58a6537dd5e15a012ea37a7b48abd726579, also in Newton,
    persists changes to the RequestSpec in conductor in order to save the
    RequestSpec.flavor for the new flavor. This inadvertently persists
    the ignore_hosts field as well.

    Later, if you try to evacuate or unshelve the server, it will ignore
    the original source host because of the persisted ignore_hosts value.
    This is obviously a problem in a small deployment with only a few
    compute nodes (like an edge deployment). As a result, an evacuation
    can fail if the only available host is the one being ignored.

    This change does two things:

    1. In order to deal with existing corrupted RequestSpecs in the DB,
       this change simply makes conductor overwrite
       RequestSpec.ignore_hosts, rather than append, during evacuate
       before calling the scheduler, so the current instance host (which
       is down) is filtered out. This evacuate code dealing with
       ignore_hosts goes back to Mitaka:
       I7fe694175bb47f53d281bd62ac200f1c8416682b

       The test_rebuild_instance_with_request_spec unit test is updated
       and renamed to actually be doing an evacuate, which is what it was
       intended for, i.e. the host would not change during rebuild.

    2. This change makes the RequestSpec no longer persist the
       ignore_hosts field, like several other per-operation fields in the
       RequestSpec. The only operations that use ignore_hosts are resize
       (if allow_resize_to_same_host=False), evacuate and live migration,
       and the field gets reset in each case to ignore the source
       instance.host.

    The related functional recreate test is also updated to show the bug
    is fixed. Note that as part of that, the confirm_migration method in
    the fake virt driver needed to be implemented, otherwise trying to
    evacuate back to the source host fails with an InstanceExists error
    since the confirmResize operation did not remove the guest from the
    source host.

    Conflicts:
        nova/tests/unit/conductor/test_conductor.py

    NOTE(mriedem): The conflict is due to not having change
    I434af8e4ad991ac114dd67d66797a562d16bafe2 in Ocata.

    Change-Id: I3f488be6f3c399f23ccf2b9ee0d76cd000da0e3e
    Closes-Bug: #1669054
    (cherry picked from commit e4c998e57390c15891131205f7443fee98dde4ee)
    (cherry picked from commit 76dfb9d0b6cee1bbb553d3ec11139ca99a6f9474)
    (cherry picked from commit 31c08d0c7dc73bb2eefc69fa0f014b6478f3d149)
    (cherry picked from commit 8f1773a7af273e2739843a804fb4fe718c0a242d)
    (cherry picked from commit efab235f88ac040ef9654c782236b83ffcb473e3)
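    A hedged sketch of change (1), the overwrite-instead-of-append
    behavior during evacuate (simplified, not the actual conductor code):

        def prepare_evacuate(request_spec, instance):
            # Overwrite rather than append: any stale ignore_hosts entries
            # persisted by the resize bug are discarded, and only the down
            # source host is filtered out by the scheduler.
            request_spec.ignore_hosts = [instance.host]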
* | | | Merge "Add functional regression test for bug 1669054" into stable/ocataZuul2019-10-312-3/+93
|\ \ \ \
| |/ / /
| | | /
| |_|/
|/| |
| * | Add functional regression test for bug 1669054  (Matt Riedemann, 2019-04-02; 2 files, +93/-3)
    Change Ic3968721d257a167f3f946e5387cd227a7eeec6c in Newton started
    setting the RequestSpec.ignore_hosts field to the source
    instance.host during resize/cold migrate if
    allow_resize_to_same_host=False in config, which it is by default.

    Change I8abdf58a6537dd5e15a012ea37a7b48abd726579, also in Newton,
    persists changes to the RequestSpec in conductor in order to save the
    RequestSpec.flavor for the new flavor. This inadvertently persists
    the ignore_hosts field as well.

    Later, if you try to evacuate or unshelve the server, it will ignore
    the original source host because of the persisted ignore_hosts value.
    This is obviously a problem in a small deployment with only a few
    compute nodes (like an edge deployment). As a result, an evacuation
    can fail if the only available host is the one being ignored.

    This adds a functional regression recreate test for the bug.

    Conflicts:
        nova/tests/functional/api/client.py

    NOTE(mriedem): The conflict is due to not having change
    I71e9d8dae55653ad3ee70f708a6d92c98ed20c1c in Ocata which added the
    201 value to the check_response_status default.

    NOTE(mriedem): This backport differs slightly in that (a)
    REQUIRES_LOCKING is removed and the NeutronFixture is used explicitly
    since change I9b35ed7497db8bd1eb74f4bb89631aabbcfeec0d is not in
    Ocata, and (b) kwargs are passed through post_server_action because
    change If7b02bcd8d77e94c7fb42b721792c1391bc0e3b7 is not in Ocata.

    Change-Id: I6ce2d6b1baf47796f867aede1acf292ec9739d6d
    Related-Bug: #1669054
    (cherry picked from commit 556cf103b22ab6bebecc9d824d6f918cda38fe3e)
    (cherry picked from commit 20c1414945db633ed00c1f19f1f0d163028454d9)
    (cherry picked from commit 77164128bf5eef53b49547658be2ef902c020207)
    (cherry picked from commit 9fd4082d7c076146ec314b86e0e4772d0a021712)
    (cherry picked from commit fcd718dcdd3cb7ba46f16ff97ecee068d55c8801)
* | | Replace non-nova server fault message  (Matt Riedemann, 2019-08-06; 4 files, +193/-16)
    The server fault "message" is always shown in the API server
    response, regardless of policy or user role. The fault "details" are
    only shown to users with the admin role when the fault code is 500.
    The problem with this is that, for non-nova exceptions, the fault
    message is a string-ified version of the exception (see
    nova.compute.utils.exception_to_dict) which can contain sensitive
    information which the non-admin owner of the server can see.

    This change adds a functional test to recreate the issue and a change
    to exception_to_dict which, for the non-nova case, changes the fault
    message by simply storing the exception type class name. Admins can
    still see the fault traceback in the "details" key of the fault dict
    in the server API response.

    Note that _get_fault_details is changed so that the details also
    include the exception value, which is what used to be in the fault
    message for non-nova exceptions. This is necessary so admins can
    still get the exception message with the traceback details.

    Note that nova exceptions with a %(reason)s replacement variable
    could potentially be leaking sensitive details as well, but those
    would need to be cleaned up on a case-by-case basis since we don't
    want to change the behavior of all fault messages, otherwise users
    might not see information like NoValidHost when their server goes to
    ERROR status during scheduling.

    SecurityImpact: This change contains a fix for CVE-2019-14433.

    NOTE(mriedem): In this backport the functional test is modified
    slightly to remove the DiskFilter since we are using placement for
    scheduler filtering on DISK_GB.

    Change-Id: I5e0a43ec59341c9ac62f89105ddf82c4a014df81
    Closes-Bug: #1837877
    (cherry picked from commit 298b337a16c0d10916b4431c436d19b3d6f5360e)
    (cherry picked from commit 67651881163b75eb1983eaf753471a91ecec35eb)
    (cherry picked from commit e0b91a5b1e89bd0506dc6da86bc61f1708f0215a)
    (cherry picked from commit 3dcefba60a4f4553888a9dfda9fe3bee094d617a)
    (cherry picked from commit 6da28b0aa9b6a0ba67460f88dd2c397605b0679b)
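    A hedged sketch of the exception_to_dict behavior described here
    (simplified; see nova.compute.utils for the real implementation):

        from nova import exception as nova_exception

        def exception_to_dict(fault):
            code = getattr(fault, 'code', 500)
            if isinstance(fault, nova_exception.NovaException):
                # Deliberate, user-facing text (e.g. NoValidHost).
                message = fault.format_message()
            else:
                # str(fault) may leak sensitive data for arbitrary
                # exceptions; non-admins now only see the class name.
                message = fault.__class__.__name__
            return {'code': code, 'message': message}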
* | | fix unshelve notification test instability  (Balazs Gibizer, 2019-08-06; 2 files, +25/-9)
    The unshelve notification sample test shelve-offloads an instance,
    waits for its state to change to SHELVED_OFFLOADED, then unshelves
    the instance and matches the generated unshelve notification with the
    stored sample.

    This test intermittently fails as the host parameter of the instance
    sometimes doesn't match. The reason is that during shelve offloading
    the compute manager first sets the state of the instance and only
    later sets the host of the instance. So the test can start unshelving
    the instance before the host is cleaned by the shelve offload code.

    The test is updated to not just wait for the state change but also
    wait for the change of the host attribute.

    Change-Id: I459332de407187724fd2962effb7f3a34751f505
    Closes-Bug: #1704423
    (cherry picked from commit da57d17e6c7d5d7e84af3c46a836ee587581bf8d)
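    A hedged sketch of the stabilized wait, in test-helper style (the
    client object and its get_server method are hypothetical, not the
    actual sample test code):

        import time

        def wait_for_shelve_offload(api, server_id, timeout=10):
            deadline = time.time() + timeout
            while time.time() < deadline:
                server = api.get_server(server_id)  # hypothetical client
                # Wait for BOTH the status change and the cleared host,
                # since the compute manager sets them at different times.
                if (server['status'] == 'SHELVED_OFFLOADED'
                        and server.get('OS-EXT-SRV-ATTR:host') is None):
                    return server
                time.sleep(0.2)
            raise AssertionError('instance never finished offloading')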
* | | Implement power_off/power_on for the FakeDriver  (Matt Riedemann, 2019-08-06; 12 files, +23/-15)
| |/
|/|
    When trying to recreate hundreds of instance action events for scale
    testing with the FakeDriver, a nice simple way to do that is by
    stopping those instances and starting them again. However, since
    power_off/on aren't implemented, once you "stop" them the
    sync_instance_power_state periodic task in the compute manager thinks
    they are still running on the "hypervisor" and will stop them again
    via the API, which records yet another instance action and set of
    events.

    This just toggles the power state bit on the fake instance in the
    FakeDriver to make the periodic task do the right thing. As a result,
    we also have more realistic API and notification samples.

    Change-Id: Ie621686053ad774c4ae4f22bb2a455f98900b611
    (cherry picked from commit 211e9b1961a14e22b39194fdd90e40738fc27202)
    (cherry picked from commit 4a3eea8c30f5a452bcf82b145c66de3d3416a1cd)
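    A hedged sketch of the FakeDriver change (simplified; see
    nova.virt.fake for the real driver):

        from nova.compute import power_state

        class FakeDriver(object):
            instances = {}  # uuid -> fake instance with a .state attr

            def power_off(self, instance, timeout=0, retry_interval=0):
                # Flip the power state bit so sync_instance_power_state
                # sees the instance as stopped instead of re-stopping it
                # via the API.
                self.instances[instance.uuid].state = power_state.SHUTDOWN

            def power_on(self, context, instance, network_info,
                         block_device_info=None):
                self.instances[instance.uuid].state = power_state.RUNNING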
* | Do not dump all instances in the scheduler  (Radoslav Gerganov, 2019-05-18; 1 file, +2/-1)
    There is a log message in the scheduler which dumps all instances
    running on a compute host. There are at least 2 problems with this:

    1. it generates a huge amount of logs which are not really useful
    2. it crashes when there is an instance with a non-ascii name

    Instead of fixing 2), just print instance UUIDs.

    Closes-Bug: #1790126
    Related-Bug: #1620692
    Change-Id: I0eda1c58a7eb54121230c880818b4b1d0fdf4893
    (cherry picked from commit 4fd7c93726eff4cc0b010741ea1772cf19c314fc)
    (cherry picked from commit 1a22c456b4c1ccd41ffa1bae5a37ba88652b2081)
    (cherry picked from commit 16b2a56005cc44973ec87f71037214db73264580)
    (cherry picked from commit fc8cd8f3b346c8f53e2dfc8e3de9fdcaedb0d35d)
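    A hedged sketch of the logging change (illustrative; it assumes the
    scheduler's HostState.instances is a dict keyed by instance UUID):

        import logging

        LOG = logging.getLogger(__name__)

        def log_host_state(host_state):
            # Before: the debug log dumped whole instance objects and
            # could crash on non-ascii instance names.
            LOG.debug('Host %s has %d instances: %s',
                      host_state.host, len(host_state.instances),
                      list(host_state.instances))  # UUIDs only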
* | OpenDev Migration Patch  (OpenDev Sysadmins, 2019-04-19; 3 files, +5/-5)
    This commit was bulk generated and pushed by the OpenDev sysadmins as
    a part of the Git hosting and code review systems migration detailed
    in these mailing list posts:

    http://lists.openstack.org/pipermail/openstack-discuss/2019-March/003603.html
    http://lists.openstack.org/pipermail/openstack-discuss/2019-April/004920.html

    Attempts have been made to correct repository namespaces and
    hostnames based on simple pattern matching, but it's possible some
    were updated incorrectly or missed entirely. Please reach out to us
    via the contact information listed at https://opendev.org/ with any
    questions you may have.
* | Merge "Update port device_owner when unshelving" into stable/ocataZuul2019-04-152-3/+25
|\ \
| * | Update port device_owner when unshelving  (Matt Riedemann, 2018-12-19; 2 files, +25/-3)
    When we shelve offload an instance, we unplug VIFs, delete the guest
    from the compute host, etc. The instance is still logically attached
    to its ports but they aren't connected on any host.

    When we unshelve an instance, it is scheduled and created on a
    potentially new host, in a potentially new availability zone. During
    unshelve, the compute manager will call the
    setup_instance_network_on_host() method to update the port host
    binding information for the new host, but was not updating the
    device_owner, which reflects the availability zone that the instance
    is in.

    Because of this, an instance can be created in az1, shelved, and then
    unshelved in az2, but the port device_owner still says az1 even
    though the port host binding is for a compute host in az2.

    This change simply updates the port device_owner when updating the
    port binding host during unshelve.

    A TODO is left in the cleanup_instance_network_on_host() method,
    which is called during shelve offload but is currently not
    implemented. We should unbind ports when shelve offloading, but that
    is a bit of a bigger change and left for a separate patch since it is
    not technically needed for this bug fix.

    Change-Id: Ibd1cbe0e9b5cf3ede542dbf62b1a7d503ba7ea06
    Closes-Bug: #1759924
    (cherry picked from commit 93802619adde69bf84d26d7231480abb4da07c91)
    (cherry picked from commit 3e38d1cf16ca29948be499aa37e2494ffd001f12)
    (cherry picked from commit 245364ece1689853bf33ac25d7319618d064d909)
    (cherry picked from commit c7bb9b1652b6df3e0a353a5c9f4cf70299c4e5e7)
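    A hedged sketch of the port update performed during unshelve
    (simplified; the device_owner of an instance port encodes the
    availability zone, e.g. "compute:az2"):

        def update_port_for_new_host(neutron, port_id, host,
                                     availability_zone):
            neutron.update_port(port_id, {
                'port': {
                    'binding:host_id': host,  # updated before this fix
                    'device_owner': 'compute:%s' % availability_zone,
                }
            })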
* | | Merge "Refix disk size during live migration with disk over-commit" into ↵Zuul2019-04-152-3/+3
|\ \ \ | | | | | | | | | | | | stable/ocata
| * | | Refix disk size during live migration with disk over-commit  (int32bit, 2019-01-25; 2 files, +3/-3)
    Currently the available disk of the target host is calculated based
    on local_gb, not the available disk (free_disk_gb). This check can be
    negative if the target host has no free disk.

    Change-Id: Iec50269ef31dfe090f0cd4db95a37909661bd910
    closes-bug: 1744079
    (cherry picked from commit e2cc275063658b23ed88824100919a6dfccb760d)
    Signed-off-by: Zhang Hua <joshua.zhang@canonical.com>
* | | | Merge "Fix disk size during live migration with disk over-commit" into ↵Zuul2019-04-152-1/+37
|\ \ \ \ | |/ / / | | | | | | | | stable/ocata
| * | | Fix disk size during live migration with disk over-commit  (Matthew Booth, 2019-01-25; 2 files, +37/-1)
| |/ /
    Prior to microversion 2.25, the migration api supported a
    'disk_over_commit' parameter, which indicated that we should do a
    disk size check on the destination host taking into account disk
    over-commit. Versions since 2.25 no longer have this parameter, but
    we continue to support the older microversion.

    This disk size check was broken when disk over-commit was in use on
    the host. In LibvirtDriver._assert_dest_node_has_enough_disk() we
    correctly calculate the required space using allocated disk size
    rather than virtual disk size when doing an over-committed check.
    However, we were checking against 'disk_available_least' as reported
    by the destination host. This value is: the amount of disk space
    which the host would have if all instances fully allocated their
    storage. On an over-committed host it will therefore be negative,
    despite there actually being space on the destination host.

    The check we actually want to do for disk over-commit is: does the
    target host have enough free space to take the allocated size of this
    instance's disks? We leave checking over-allocation ratios to the
    scheduler.

    Note that if we use disk_available_least here and the destination
    host is over-allocated, this will always fail because free space will
    be negative, even though we're explicitly ok with that. Using
    disk_available_least would make sense for the non-overcommit case,
    where the test would be: would the target host have enough free space
    for this instance if the instance fully allocated all its storage,
    and everything else on the target also fully allocated its storage?
    As noted, we no longer actually run that test, though.

    We fix the issue for legacy microversions by fixing the destination
    host's reported disk space according to the given disk_over_commit
    parameter.

    Change-Id: I8a705114d47384fcd00955d4a4f204072fed57c2
    Resolves-bug: #1708572
    (cherry picked from commit e097c001c8e11110efe8879da57264fcb7bdfdf2)
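    A hedged sketch of the corrected choice of destination free-space
    value (illustrative, not the actual libvirt driver code):

        def dest_disk_available_gb(dest_info, disk_over_commit):
            if disk_over_commit:
                # Over-commit allowed: compare the instance's *allocated*
                # disk size against the real free space on the destination.
                return dest_info['free_disk_gb']
            # No over-commit: 'disk_available_least' is the space left if
            # every instance fully allocated its storage; it can be
            # negative on an over-committed host, which is exactly why it
            # is the wrong value for the over-commit branch above.
            return dest_info['disk_available_least']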
* | | Merge "tox: Don't write byte code (maybe)" into stable/ocataZuul2019-04-061-0/+3
|\ \ \
| * | | tox: Don't write byte code (maybe)  (Eric Fried, 2019-02-14; 1 file, +3/-0)
    In tox versions after 3.0.0rc1 [1], setting the environment variable
    PYTHONDONTWRITEBYTECODE will cause tox not to write .pyc files, which
    means you don't have to delete them, which makes things faster. In
    older tox versions, the env var is ignored.

    If we bump the minimum tox version to something later than 3.0.0rc1,
    we can remove the commands that find and remove .pyc files.

    Conflicts:
        tox.ini

    NOTE(stephenfin): Additional stable/queens conflicts are due to a
    number of envvars not being set because we're still using testr here.

    NOTE(stephenfin): Conflict is due to a number of unrelated changes
    made during Rocky, such as Idda28f153d5054efc885ef2bde0989841df29cd3.

    [1] https://github.com/tox-dev/tox/commit/336f4f6bd8b53223f940fc5cfc43b1bbd78d4699

    Change-Id: I779a17afade78997ab084909a9e6a46b0f91f055
    (cherry picked from commit 590a2b6bbf71294d187d7082cc302069797db029)
    (cherry picked from commit 99f0c4c0144a551f0fa7f3a8847327660e5ccb89)
    (cherry picked from commit 643908ad3a36bc9ad182d79f26dcc6212c638696)
* | | | Merge "Make supports_direct_io work on 4096b sector size" into stable/ocataZuul2019-04-062-2/+10
|\ \ \ \
| * | | | Make supports_direct_io work on 4096b sector size  (Jens Harbott, 2019-01-18; 2 files, +10/-2)
| | |/ /
| |/| |
    The current check uses an alignment of 512 bytes and will fail when
    the underlying device has sectors of size 4096 bytes, as is common
    e.g. for NVMe disks.

    So use an alignment of 4096 bytes, which is a multiple of 512 bytes
    and thus will cover both cases.

    Conflicts:
        nova/privsep/utils.py - supports_direct_io() is still in
        nova/virt/libvirt/driver.py for this branch

    Change-Id: I5151ae01e90506747860d9780547b0d4ce91d8bc
    Closes-Bug: 1801702
    Co-Authored-By: Alexandre Arents <alexandre.arents@corp.ovh.com>
    (cherry picked from commit 14d98ef1b48ca7b2ea468a8f1ec967b954955a63)
    (cherry picked from commit ba844f2a7c1d4602017bb0fc600a30b150c28dbc)
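    The probe itself is short; a hedged sketch close in spirit to what
    supports_direct_io() does (Linux-only, since it relies on O_DIRECT;
    mmap-allocated memory is page-aligned, which satisfies O_DIRECT):

        import mmap
        import os

        def supports_direct_io(dirpath, align=4096):
            # 4096 is a multiple of 512, so the probe works on both 512b
            # and 4096b sector devices.
            testfile = os.path.join(dirpath, '.directio.test')
            buf = mmap.mmap(-1, align)
            buf.write(b'x' * align)
            try:
                fd = os.open(testfile,
                             os.O_CREAT | os.O_WRONLY | os.O_DIRECT)
                try:
                    os.write(fd, buf)
                finally:
                    os.close(fd)
                return True
            except OSError:
                return False
            finally:
                if os.path.exists(testfile):
                    os.unlink(testfile)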
* | | | PCI: do not force remove allocated devices  (Sean Mooney, 2019-04-03; 2 files, +78/-5)
| |_|/
|/| |
    In the Ocata release the pci_passthrough_whitelist was moved from the
    [DEFAULT] section of nova.conf to the [pci] section and renamed to
    passthrough_whitelist. On upgrading, if the operator chooses to
    migrate the config value to the new section, it is not uncommon to
    forget to rename the config value. Similarly, if an operator is
    updating the whitelist and mistypes the value, it can also lead to
    the whitelist being ignored.

    As a result of either error, the nova compute agent would delete all
    database entries for a host regardless of whether the pci device was
    in use by an instance. If this occurs, the only recourse for an
    operator is to delete and recreate the guest on that host after
    correcting the error, or manually restore the database to a backup or
    otherwise consistent state.

    This change alters the _set_hvdevs function to not force remove
    allocated or claimed devices if they are no longer present in the pci
    whitelist.

    Conflicts:
        nova/pci/manager.py

    Closes-Bug: #1633120
    Change-Id: I6e871311a0fa10beaf601ca6912b4a33ba4094e0
    (cherry picked from commit 26c41eccade6412f61f9a8721d853b545061adcc)
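    A hedged sketch of the guard added to _set_hvdevs (simplified; see
    nova/pci/manager.py for the real logic):

        def drop_unlisted_devices(tracked_devs, whitelisted_addresses):
            for dev in list(tracked_devs):
                if dev.address in whitelisted_addresses:
                    continue
                if dev.status in ('claimed', 'allocated'):
                    # The device vanished from the whitelist (often a
                    # config typo) but an instance still uses it: keep
                    # the DB record instead of force removing it.
                    continue
                tracked_devs.remove(dev)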
* | | Merge "Drop legacy-grenade-dsvm-neutron-multinode-live-migration" into ↵Zuul2019-03-271-19/+0
|\ \ \ | | | | | | | | | | | | stable/ocata
| * | | Drop legacy-grenade-dsvm-neutron-multinode-live-migration  (Matt Riedemann, 2019-02-28; 1 file, +0/-19)
| | |/
| |/|
    This job doesn't run on stable/ocata changes anyway due to its branch
    restrictions in openstack-zuul-jobs, because there is no
    stable/newton to upgrade from.

    Change-Id: I77c4cbbeccc4e3a814876c08437c15226265b523
* | | Merge "Ensure rbd auth fallback uses matching credentials" into stable/ocataZuul2019-03-262-3/+7
|\ \ \
| * | | Ensure rbd auth fallback uses matching credentials  (Corey Bryant, 2019-01-08; 2 files, +7/-3)
| | |/
| |/|
    As of Ocata, cinder config is preferred for rbd auth values with a
    fallback to nova values [1]. The fallback path, for the case when
    rbd_user is configured in cinder.conf and rbd_secret_uuid is not
    configured in cinder.conf, results in the mismatched use of the
    cinder rbd_user with the nova rbd_secret_uuid. This fixes that
    fallback path to use the nova rbd_user from nova.conf with the
    rbd_secret_uuid from nova.conf.

    [1] See commit f2d27f6a8afb62815fb6a885bd4f8ae4ed287fd3

    Thanks to David Ames for this fix.

    Change-Id: Ieba216275c07ab16414065ee47e66915e9e9477d
    Co-Authored-By: David Ames <david.ames@canonical.com>
    Closes-Bug: #1809454
    (cherry picked from commit 47b7c4f3cc582bf463fd0c796df84736a0074f48)
    (cherry picked from commit f5d8ee1bfc3b7b9f1a25f85b42e207db0c9f4b04)
    (cherry picked from commit accef50f9648dc40f1a6f457f83f5359e9dd2a24)
    (cherry picked from commit a7e25aa3d2088e2726988c03e84b3b5ea47bfb7e)
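    A hedged sketch of the pairing rule (illustrative; the real logic is
    in nova's libvirt volume configuration code):

        def pick_rbd_auth(cinder_user, cinder_secret,
                          nova_user, nova_secret):
            if cinder_user is not None and cinder_secret is not None:
                return cinder_user, cinder_secret
            # Fall back to the nova.conf values as a matched pair; never
            # mix the cinder rbd_user with the nova rbd_secret_uuid.
            return nova_user, nova_secret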
* | | Merge "Handle unbound vif plug errors on compute restart" into stable/ocataZuul2019-03-263-5/+28
|\ \ \
| * | | Handle unbound vif plug errors on compute restart  (Stephen Finucane, 2018-12-20; 3 files, +28/-5)
    As with change Ia963a093a1b26d90b4de2e8fc623031cf175aece, we can
    sometimes cache failed port binding information which we'll see on
    startup. Long term, the fix for both issues is to figure out how this
    is being cached and stop that happening, but for now we simply need
    to allow the service to start up.

    To this end, we copy the approach in the aforementioned change and
    implement a translation function in os_vif_util for unbound, which
    will make the plug_vifs code raise VirtualInterfacePlugException,
    which is what the _init_instance code in ComputeManager is already
    handling. This has the same caveats as that change, namely that there
    may be smarter ways to do this that we should explore. However, that
    change also included a note which goes some way to explaining this.

    Conflicts:
        nova/compute/manager.py
        nova/tests/unit/network/test_os_vif_util.py

    NOTE(sfinucan): As with the 'stable/ocata' backport of change
    Ia963a093a1b26d90b4de2e8fc623031cf175aece, the compute manager
    conflicts are due to change I2740ea14e0c4ecee0d91c7f3e401b2c29498d097
    in Queens. The _LE() marker has to be left intact for pep8 checks in
    Ocata. The test_os_vif_util conflicts are due to not having change
    Ic23effc05c901575f608f2b4c5ccd2b1fb3c2d5a nor change
    I3f38954bc5cf7b1690182dc8af45078eea275aa4 in Ocata.

    Change-Id: Iaec1f6fd12dba8b11991b7a7595593d5c8b1db50
    Signed-off-by: Stephen Finucane <sfinucan@redhat.com>
    Related-bug: #1784579
    Closes-bug: #1809136
    (cherry picked from commit 1def76a1c49032d93ab6c7ee61dbbfe8e29cafca)
    (cherry picked from commit bc0a5d0355311641daa87b46e311ae101f1817ad)
    (cherry picked from commit 79a90d37027b7ca131218e16eaee70d6d5152206)
    (cherry picked from commit 7b4f5725f821ef89176ef69f036471eaaf8a6201)
* | | | Merge "Handle binding_failed vif plug errors on compute restart" into ↵Zuul2019-03-263-5/+39
|\ \ \ \ | |/ / / | | | | | | | | stable/ocata
| * | | Handle binding_failed vif plug errors on compute restart  (Matt Riedemann, 2018-12-19; 3 files, +39/-5)
| |/ /
    Like change Ia584dba66affb86787e3069df19bd17b89cb5c49 which came
    before, if port binding fails and we have a "binding_failed" vif type
    in the info cache, we'll fail to plug vifs for an instance on compute
    restart, which will prevent the service from restarting.

    Before the os-vif conversion code, this was handled with
    VirtualInterfacePlugException, but the os-vif conversion code fails
    in a different way by raising a generic NovaException because the
    os-vif conversion utility doesn't handle a vif_type of
    "binding_failed".

    To resolve this and make the os-vif flow for binding_failed behave
    the same as the legacy path, we implement a translation function in
    os_vif_util for binding_failed which will make the plug_vifs code
    raise VirtualInterfacePlugException, which is what the _init_instance
    code in ComputeManager is already handling. Admittedly this isn't the
    smartest thing and doesn't attempt to recover / fix the instance
    networking info, but it at least gives a clearer indication of what's
    wrong and lets the nova-compute service start up.

    A note is left in the _init_instance error handling that we could
    potentially try to heal binding_failed vifs in
    _heal_instance_info_cache, but that would need to be done in a
    separate change since it's more invasive.

    Conflicts:
        nova/compute/manager.py
        nova/tests/unit/network/test_os_vif_util.py

    NOTE(mriedem): The compute manager conflicts are due to change
    I2740ea14e0c4ecee0d91c7f3e401b2c29498d097 in Queens. The _LE() marker
    has to be left intact for pep8 checks in Ocata. The test_os_vif_util
    conflicts are due to not having change
    Ic23effc05c901575f608f2b4c5ccd2b1fb3c2d5a nor change
    I3f38954bc5cf7b1690182dc8af45078eea275aa4 in Ocata.

    Change-Id: Ia963a093a1b26d90b4de2e8fc623031cf175aece
    Closes-Bug: #1784579
    (cherry picked from commit cdf8ba5acb7f65042af9c21fcbe1a126bd857ad0)
    (cherry picked from commit a890e3d624a84d8eb0306fab580e2cec33e26bc3)
    (cherry picked from commit 4827cedbc56033c2ac3caf0d7998fca6aff997d6)
    (cherry picked from commit 254a19f0d326ae5d3b5890d0d5fc735a771fcc0b)
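    A hedged sketch of the translation-function trick used by this change
    and the later "unbound" one (simplified; not the actual os_vif_util
    code):

        from nova import exception

        def _nova_to_osvif_vif_binding_failed(vif):
            # Nothing useful can be plugged for a failed binding; raising
            # here surfaces as VirtualInterfacePlugException in plug_vifs,
            # which ComputeManager._init_instance already handles.
            raise exception.VirtualInterfacePlugException(
                'vif_type is binding_failed for port %s' % vif['id'])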
* | | Replace openstack.org git:// URLs with https://  (Ian Wienand, 2019-03-24; 2 files, +3/-3)
| |/
|/|
    This is a mechanically generated change to replace openstack.org
    git:// URLs with https:// equivalents.

    This is in aid of a planned future move of the git hosting
    infrastructure to a self-hosted instance of gitea
    (https://gitea.io), which does not support the git wire protocol at
    this stage.

    This update should result in no functional change.

    For more information see the thread at
    http://lists.openstack.org/pipermail/openstack-discuss/2019-March/003825.html

    Change-Id: I14a9063a44aa75cfabed6f2ca0cc75583d3d4d14
* | Default embedded instance.flavor.is_public attribute  (Mohammed Naser, 2018-11-21; 2 files, +19/-11)
|/
    It is possible that really old instances don't actually have this
    attribute defined, which can lead to raising exceptions when loading
    their embedded flavors from the database. This patch fixes this by
    defaulting these values to true if they are not set.

    Change-Id: If04cd802ce7184dc94f94804c743faebe0d4bd8c
    Closes-Bug: #1789423
    (cherry picked from commit c4f6b0bf6cc903cf52c4b238c3771604dda174b8)
    (cherry picked from commit c689c09996c4a3da9e05ccd5178a4b5060949889)
    (cherry picked from commit 61bf7ae5795a3dfac7afc57bed9b4195cfdb6042)
    (cherry picked from commit 9a0d338c6775803516c5fdb99f0581e40957cb0c)
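    A hedged sketch of the defaulting (illustrative; the real code works
    on the versioned Flavor object, not a plain dict):

        def load_embedded_flavor(flavor_primitive):
            # Really old instances may predate the is_public attribute;
            # treat a missing value as public rather than raising on load.
            flavor_primitive.setdefault('is_public', True)
            return flavor_primitive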
* Merge "Don't delete neutron port when attach failed" into stable/ocataocata-em15.1.5Zuul2018-10-054-20/+31
|\
| * Don't delete neutron port when attach failed  (Kevin_Zheng, 2018-10-03; 4 files, +31/-20)
    Currently, when attaching a neutron pre-existing port to an instance,
    if the attach failed, the port will also be deleted on the Neutron
    side due to a bad judgment of who created the port, made by reading a
    not-up-to-date info_cache.

    The workflow starts at:
    https://github.com/openstack/nova/blob/9ed0d6114/nova/network/neutronv2/api.py#L881

    ordered_ports and preexisting_port_ids are the same when attaching a
    preexisting port to an instance, and it calls
    https://github.com/openstack/nova/blob/9ed0d6114/nova/network/base_api.py#L246
    which calls back into the neutronv2 api code
    https://github.com/openstack/nova/blob/9ed0d6114/nova/network/neutronv2/api.py#L1274
    and at this point,
    compute_utils.refresh_info_cache_for_instance(context, instance)
    won't have the newly attached port in it (see debug log:
    http://paste.openstack.org/show/613232/) because
    _build_network_info_model() is going to process it. The instance obj
    in memory with the old info_cache will be used in the rollback
    process, causing the misjudging.

    This patch fixes it by updating instance.info_cache to the new ic
    after it is created.

    Conflicts:
        doc/notification_samples/instance-shutdown-end.json

    Co-Authored-By: huangtianhua@huawei.com
    Change-Id: Ib323b74d4ea1e874b476ab5addfc6bc79cb7c751
    closes-bug: #1645175
    (cherry picked from commit 115cf068a6d48cdf8b0d20a3c5a779bb8120aa9b)
* | Merge "Add check for invalid inventory amounts" into stable/ocataZuul2018-10-052-9/+94
|\ \
| * | Add check for invalid inventory amounts  (EdLeafe, 2018-10-02; 2 files, +94/-9)
    This patch adds sane minimums and maximums to the fields for an
    inventory posting, which will quickly return a 400 error if invalid
    values are passed, instead of proceeding only to fail at the DB
    layer.

    Conflicts:
        nova/tests/functional/api/openstack/placement/gabbits/inventory.yaml

    Partial-Bug: #1673227
    Change-Id: I6296cee6b8a4be1dd53c52f6290ebda926cf6465
    (cherry picked from commit 0b2d7c4d028104d3a2a0d5851b194eface707ec3)
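    A hedged sketch of the kind of bounds validation added (field names
    follow the placement inventory API; the exact limits here are
    illustrative, though the DB integer max of 2147483647 is the natural
    upper bound):

        import jsonschema

        INVENTORY_SCHEMA = {
            'type': 'object',
            'properties': {
                'total': {'type': 'integer', 'minimum': 1,
                          'maximum': 2147483647},
                'reserved': {'type': 'integer', 'minimum': 0,
                             'maximum': 2147483647},
                'min_unit': {'type': 'integer', 'minimum': 1,
                             'maximum': 2147483647},
                'max_unit': {'type': 'integer', 'minimum': 1,
                             'maximum': 2147483647},
                'step_size': {'type': 'integer', 'minimum': 1,
                              'maximum': 2147483647},
            },
            'required': ['total'],
        }

        def validate_inventory(body):
            # Reject bad amounts with a 400 up front instead of letting
            # them fail at the DB layer.
            try:
                jsonschema.validate(body, INVENTORY_SCHEMA)
            except jsonschema.ValidationError as exc:
                raise ValueError('400 Bad Request: %s' % exc.message)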