Commit message | Author | Age | Files | Lines
* Update git (tag: 8.1.0)  [James E. Blair, 2023-01-18, 1 file, -0/+13]

      This updates git to address CVE-2022-23521.

      Change-Id: Ib08ff1fc7b3c8623fa6b927f3010af72e1b946cf
      Co-Authored-By: Jeremy Stanley <fungi@yuggoth.org>
      Co-Authored-By: Clark Boylan <clark.boylan@gmail.com>
* Merge "Re-elect James Blair as project lead"  [Zuul, 2023-01-17, 1 file, -1/+1]
|\
| * Re-elect James Blair as project lead  [James E. Blair, 2023-01-09, 1 file, -1/+1]

      Extend my term as project lead for another year.

      Change-Id: I48b34551601236c99a2f2d0786cdde32d01d2c80
* | Merge "Honor independent pipeline requirements for non-live changes"  [Zuul, 2023-01-17, 15 files, -12/+167]
|\ \
| * | Honor independent pipeline requirements for non-live changes  [James E. Blair, 2023-01-17, 15 files, -12/+167]

      Independent pipelines ignore requirements for non-live changes
      because they are not actually executed. However, a user might
      configure an independent pipeline that requires code review and
      expect a positive code-review pipeline requirement to be enforced.
      To ignore it risks executing unreviewed code via dependencies.

      To correct this, we now enforce pipeline requirements in independent
      pipelines in the same way as dependent ones. This also adds a new
      "allow-other-connections" pipeline configuration option which
      permits users to specify exhaustive pipeline requirements.

      Change-Id: I6c006f9e63a888f83494e575455395bd534b955f
      Story: 2010515
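The commit describes the new option only at a high level. As a sketch, a pipeline using it might look like the following; everything except the `allow-other-connections` option itself (the pipeline name, connection name, approval stanza, and trigger) is illustrative, not taken from the commit:

```yaml
- pipeline:
    name: post-review
    manager: independent
    # With this option set to false, changes whose requirements come from
    # connections not listed under "require" are not enqueued -- i.e. the
    # requirements below are treated as exhaustive.
    allow-other-connections: false
    require:
      gerrit:
        approval:
          - Code-Review: 2
    trigger:
      gerrit:
        - event: comment-added
```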
* | | Merge "Fix scheduleFilesChanges fallback to target branch ref"  [Zuul, 2023-01-12, 1 file, -2/+3]
|\ \ \
| * | | Fix scheduleFilesChanges fallback to target branch ref  [Fabien Boucher, 2023-01-11, 1 file, -2/+3]

      When a driver does not specify change.base_sha, the
      `scheduleFilesChanges` method implements a fallback on
      change.branch. However, the piece of code that implemented that
      fallback was not functional. This change fixes the fallback
      mechanism.

      Change-Id: If8da86d4a5e4af5aa1af4cd3860dc13c15833fd6
* | | | Merge "Further avoid unnecessary change dependency updates"  [Zuul, 2023-01-12, 3 files, -1/+64]
|\ \ \ \
| * | | | Further avoid unnecessary change dependency updates  [James E. Blair, 2023-01-04, 3 files, -1/+64]

      When adding a unit test for change
      I4fd6c0d4cf2839010ddf7105a7db12da06ef1074 I noticed that we were
      still querying the dependent change 4 times instead of the expected
      2. This was due to an indentation error which caused all 3 query
      retry attempts to execute. This change corrects that and adds a
      unit test that covers this as well as the previous optimization.

      Change-Id: I798d8d713b8303abcebc32d5f9ccad84bd4a28b0
* | | | | Merge "Avoid unnecessary change dependency updates"  [Zuul, 2023-01-12, 1 file, -26/+43]
|\ \ \ \ \
| |/ / / /
| * | | | Avoid unnecessary change dependency updates  [Simon Westphahl, 2023-01-04, 1 file, -26/+43]

      Updating commit and topic dependencies incurs a cost, as we query
      the source system for the change details. The current
      implementation will update the commit and topic dependencies
      independent of whether or not the dependencies are already
      populated and when they were last updated. This can lead to
      multiple updates for the same change in a short amount of time,
      e.g. when an event leads to a change being added to multiple
      pipelines or when a circular dependency is enqueued.

      Instead we can use the following conditions to determine if the
      dependencies need to be refreshed:

      1. when `updateCommitDependencies()` is called without an event
         (force a dependency refresh)
      2. when the change's commit or topic dependencies were never
         updated
      3. when the event ltime is greater than the last modified zxid of
         the change (dependencies could have changed in the meantime)

      Change-Id: I4fd6c0d4cf2839010ddf7105a7db12da06ef1074
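The three refresh conditions above can be sketched as a small decision function. The names (`Change`, `needs_dependency_refresh`, the field names) are illustrative stand-ins, not Zuul's actual implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Change:
    # Were the commit/topic dependencies ever populated?
    deps_populated: bool = False
    # ZooKeeper transaction id (zxid) of the change's last modification.
    last_modified_zxid: Optional[int] = None

def needs_dependency_refresh(change: Change,
                             event_ltime: Optional[int]) -> bool:
    # 1. No event supplied: the caller is forcing a refresh.
    if event_ltime is None:
        return True
    # 2. The dependencies were never populated.
    if not change.deps_populated:
        return True
    # 3. The event is newer than the change's last modification, so the
    #    dependencies could have changed in the meantime.
    return event_ltime > (change.last_modified_zxid or 0)
```

A refresh is skipped only in the remaining case: an event older than the change's last modification arriving for a change whose dependencies are already populated.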
* | | | | Unvendor kazoo locks recipes  [Clark Boylan, 2023-01-11, 4 files, -762/+3]

      These recipes were vendored so that we could carry this fix
      locally: https://github.com/python-zk/kazoo/pull/650

      It appears that this fix has been merged and included in
      kazoo>=2.9.0, so we include that as the minimum version and drop
      the vendored file.

      This also fixes the isSet() deprecation warning, as upstream kazoo
      has switched to is_set().

      Change-Id: Ide48e9f949e083b658775b74db3856b118fc5d69
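The isSet() deprecation mentioned above comes from the standard library: the camelCase aliases on `threading.Event` (and friends) were deprecated in Python 3.10 in favor of the snake_case names. A minimal illustration:

```python
import threading

event = threading.Event()

# Deprecated camelCase spelling (emits DeprecationWarning on 3.10+):
#     event.isSet()
# Preferred snake_case spelling:
assert not event.is_set()
event.set()
assert event.is_set()
```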
* | | | | Use unsafe_skip_rsa_key_validation with cryptography  [James E. Blair, 2023-01-11, 2 files, -19/+4]

      This is a partial revert of
      c4476d1b6aebec0ea3198e0203c7d35bedbea57a, which added the use of a
      private flag to skip unnecessary (for us) cryptography checks. The
      cryptography package has now normalized that flag into a parameter
      we can pass, so use the new param and update the dependency to
      require the version that supports it.

      Change-Id: I1dfa203525e85020ccf942422ad3cc7040b851dd
* | | | | Cleanup test logging  [Clark Boylan, 2023-01-11, 2 files, -2/+11]

      We were overlogging because we check for an openssl flag early and
      warn if it isn't present. That warning creates a default root
      StreamHandler that emits to stderr, causing all our logging to be
      emitted there. Fix this by creating a specific logger for this
      warning (which avoids polluting the root logger) and add an
      assertion that the root logger's handler list is empty when we
      modify it for testing.

      Note I'm not sure why this warning is happening now and wasn't
      before. Maybe our openssl installations changed, or cryptography
      modified the flag? This is worth investigating in a followup.

      Change-Id: I2a82cd6575e86facb80b28c81418ddfee8a32fa5
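The root-handler pollution described here is standard `logging`-module behavior: the module-level convenience functions (`logging.warning()` and friends) call `basicConfig()` when the root logger has no handlers, silently installing a stderr `StreamHandler`; logging through a named logger does not. A small demonstration (the logger name is illustrative):

```python
import logging

root = logging.getLogger()
baseline = len(root.handlers)  # typically 0 in a fresh interpreter

# Logging through a named logger leaves the root logger's handlers
# alone; the message still reaches stderr via logging's "last resort"
# handler.
logging.getLogger("zuul.demo").warning("openssl flag missing")
assert len(root.handlers) == baseline

# The module-level convenience function calls basicConfig() when the
# root logger has no handlers, installing a default stderr
# StreamHandler -- the pollution described in the commit message.
logging.warning("openssl flag missing")
assert len(root.handlers) >= 1
```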
* | | | Merge "Update openshift client install version"  [Zuul, 2023-01-11, 1 file, -3/+3]
|\ \ \ \
| |_|/ /
|/| | |
| * | | Update openshift client install version  [Clark Boylan, 2023-01-10, 1 file, -3/+3]

      This updates the openshift client install to use the latest stable
      release. Hashes of the oc and kubectl commands remain the same,
      which should continue to allow us to avoid copying both files.

      Note we don't fetch the client from the stable-4.11/ path because
      the versions of the client under this path are updated when the
      stable version updates. Instead we fetch it from the permanent
      location for the current stable release (4.11.20/).

      Change-Id: Ie78ecd9108f8d6d100479910aa524f867020774f
* | | | | Merge "Dedup the oc and kubectl commands in the docker images"  [Zuul, 2023-01-09, 1 file, -1/+3]
|\ \ \ \
| |/ / /
| | | /
| |_|/
|/| |
| * | Dedup the oc and kubectl commands in the docker images  [Clark Boylan, 2022-12-09, 1 file, -1/+3]

      These binaries are about 115MB each and we copy both of them.
      Fortunately they are identical according to hashing routines, so we
      can save space by copying them once and using a symlink. We choose
      to make `oc` the canonical file as these binaries come from
      openshift.

      Change-Id: I3a34acf4ee20db935a471c4fa9ca5e2f7d297d39
* | | Merge "Add timer event directly to pipeline event queues"  [Zuul, 2023-01-09, 1 file, -1/+3]
|\ \ \
| * | | Add timer event directly to pipeline event queues  [Simon Westphahl, 2023-01-04, 1 file, -1/+3]

      Timer events are always generated for a particular pipeline and
      tenant. Because of that, we can directly add the timer triggers to
      the matching pipeline event queue instead of taking a detour via
      the tenant trigger event queue.

      This will also eliminate duplicate events in cases where a project
      is part of multiple tenants. Those duplicates happened because the
      scheduler added the event to every tenant that has the event's
      project configured.

      Change-Id: I6b4b77a82aac6d2c3441ace9f64dc9b4a80c5856
* | | Merge "Correctly set the repo remote URL"  [Zuul, 2023-01-09, 2 files, -2/+68]
|\ \ \
| |/ /
|/| |
| * | Correctly set the repo remote URL  [Simon Westphahl, 2022-12-07, 2 files, -2/+68]

      I8e1b5b26f03cb75727d2b2e3c9310214a3eac447 introduced a regression
      that prevented us from re-cloning a repo that no longer exists on
      the file system (e.g. deleted by an operator) but where we still
      have the cached `Repo` object. The problem was that we only updated
      the remote URL of the repo object after we wrote it to the Git
      config. Unfortunately, if the repo no longer existed on the file
      system we would attempt to re-clone it with a possibly outdated
      remote URL.

      `test_set_remote_url` is a regression test for the issue described
      above. `test_set_remote_url_invalid` verifies that the original
      issue is fixed, where we updated the remote URL attribute of the
      repo object but failed to update the Git config.

      Change-Id: I311842ccc7af38664c28177450ea9e80e1371638
* | Merge "Store pause and resume events on the build and report them"  [Zuul, 2023-01-02, 9 files, -5/+220]
|\ \
| * | Store pause and resume events on the build and report them  [Felix Edel, 2023-01-02, 9 files, -5/+220]

      When a build is paused or resumed, we now store this information on
      the build together with the event time. Instead of additional
      attributes for each timestamp, we add an "event" list attribute to
      the build which can also be used for other events in the future.

      The events are stored in the SQL database and added to the MQTT
      payload so the information can be used by the zuul-web UI (e.g. in
      the "build times" gantt chart) or provided to external services.

      Change-Id: I789b4f69faf96e3b8fd090a2e389df3bb9efd602
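An event-list attribute as described above might look roughly like this sketch; the class, method, and field names are illustrative, not the actual Zuul schema:

```python
import time
from dataclasses import dataclass, field

@dataclass
class Build:
    # Illustrative stand-in for a build record; each entry in `events`
    # records an event type and the time at which it occurred, so new
    # event kinds need no new attributes.
    uuid: str
    events: list = field(default_factory=list)

    def addEvent(self, event_type: str, event_time: float = None) -> None:
        self.events.append({
            "event_type": event_type,
            "event_time": (event_time if event_time is not None
                           else time.time()),
        })

build = Build(uuid="abc123")
build.addEvent("paused", 100.0)
build.addEvent("resumed", 160.0)
```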
* | Don't install '.' in bindep target.  [Clark Boylan, 2022-12-29, 1 file, -1/+0]

      The bindep target needs to be able to run when distro deps required
      to install Zuul are not yet installed (as bindep is what tells the
      user which distro deps to install). Drop the session install for
      '.' to address this.

      Change-Id: I1dfa125df7dfaf9601880f4eadbfafb91ab01945
* | Pass session.posargs to nox venv session's run call  [Clark Boylan, 2022-12-29, 1 file, -1/+1]

      It is an error to pass no arguments to session.run(), and using
      session.posargs preserves previous tox behavior.

      Change-Id: I7393b400059313528774ef477c4a96c71c04fe7e
* | Merge "Switch to nox-docs"  [Zuul, 2022-12-22, 1 file, -4/+4]
|\ \
| * | Switch to nox-docs  [James E. Blair, 2022-12-20, 1 file, -4/+4]

      Depends-On: https://review.opendev.org/868228
      Change-Id: I95dd6f751bd3d64a146ed32ec660e48dbe473d81
* | | Merge "Add noxfile and switch to nox"  [Zuul, 2022-12-22, 8 files, -41/+259]
|\ \ \
| |/ /
| * | Add noxfile and switch to nox  [James E. Blair, 2022-12-20, 8 files, -41/+259]

      Tox v4 behaves significantly differently than v3, and some of the
      more complex things we do with tox would need an overhaul to
      continue to use it. Meanwhile, nox is much simpler and more
      flexible, so let's try using it.

      This adds a noxfile which should be equivalent to our tox.ini file.
      We still need to update the docs build (which involves changes to
      base jobs) before we can completely remove tox.

      Depends-On: https://review.opendev.org/868134
      Change-Id: Ibebb0988d2702d310e46c437e58917db3f091382
* | | Merge "Pin tox to 3"  [Zuul, 2022-12-22, 1 file, -0/+1]
|\ \ \
| |/ /
| * | Pin tox to 3  [James E. Blair, 2022-12-15, 1 file, -0/+1]

      There are many issues with tox v4 that make it difficult to use
      with Zuul. Pin to tox version 3 while we find a solution.

      Change-Id: I608b2ad4ab9407d8a0b77d5def5188922875e00f
* | | Merge "Fix deduplication exceptions in pipeline processing"  [Zuul, 2022-12-20, 6 files, -15/+43]
|\ \ \
| * | | Fix deduplication exceptions in pipeline processing  [James E. Blair, 2022-11-21, 6 files, -15/+43]

      If a build is to be deduplicated, has not started yet, and has a
      pending node request, we store a dictionary describing the target
      deduplicated build in the node_requests dictionary on the buildset.

      There were a few places where we directly accessed that dictionary
      and assumed the results would be the node request id. Notably, this
      could cause an error in pipeline processing (as well as potentially
      some other edge cases such as reconfiguring).

      Most of the time we can just ignore deduplicated node requests
      since the "real" buildset will take care of them. This change
      enriches the API to help with that. In other places, we add a check
      for the type.

      To test this, we enable relative_priority in the config file which
      is used in the deduplication tests, and we also add an assertion
      which runs at the end of every test that ensures there were no
      pipeline exceptions during the test (almost all the existing dedup
      tests fail this assertion before this change).

      Change-Id: Ia0c3f000426011b59542d8e56b43767fccc89a22
* | | | Merge "Document file-matcher behavior for refs w/o files"  [Zuul, 2022-12-19, 1 file, -0/+16]
|\ \ \ \
| * | | | Document file-matcher behavior for refs w/o files  [Simon Westphahl, 2022-12-02, 1 file, -0/+16]

      This change documents the behavior of file matchers for refs that
      don't contain any files. This documents the behavior that was
      introduced with Icf5df145e4cd351ffd04b1e417e9f7ab8c5ccd12 after the
      related discussion in If7a3a7cc212c981529be086dadb8157f08bda342.

      Change-Id: I579dd6b50cd50a78d5e846f7c2376ffc9e7ba4a1
* | | | | Merge "Report a config error for unsupported merge mode"  [Zuul, 2022-12-19, 13 files, -15/+401]
|\ \ \ \ \
| * | | | | Report a config error for unsupported merge mode  [James E. Blair, 2022-11-11, 13 files, -15/+401]

      This updates the branch cache (and associated connection mixin) to
      include information about supported project merge modes. With this,
      if a project on github has the "squash" merge mode disabled and a
      Zuul user attempts to configure Zuul to use the "squash" mode, then
      Zuul will report a configuration syntax error.

      This change adds implementation support only to the github driver.
      Other drivers may add support in the future. For all other drivers,
      the branch cache mixin simply returns a value indicating that all
      merge modes are supported, so there will be no behavior change.

      This is also the upgrade strategy: the branch cache uses a
      defaultdict that reports all merge modes supported for any project
      when it first loads the cache from ZK after an upgrade.

      Change-Id: I3ed9a98dfc1ed63ac11025eb792c61c9a6414384
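The defaultdict-based upgrade strategy can be illustrated in a few lines; the list of merge-mode names and the project keys here are illustrative, not Zuul's actual data:

```python
from collections import defaultdict

ALL_MERGE_MODES = ["merge", "squash", "rebase", "cherry-pick"]  # illustrative

# After an upgrade the cache has no per-project entries yet, so every
# lookup falls back to "all merge modes supported" -- the safe default
# that preserves pre-upgrade behavior.
supported_merge_modes = defaultdict(lambda: list(ALL_MERGE_MODES))

# A driver that actually queried the source system can narrow this down.
supported_merge_modes["org/known-project"] = ["merge"]

assert supported_merge_modes["org/known-project"] == ["merge"]
assert supported_merge_modes["org/new-project"] == ALL_MERGE_MODES
```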
* | | | | | Merge "Reuse queue items after reconfiguration"  [Zuul, 2022-12-15, 4 files, -18/+33]
|\ \ \ \ \ \
| * | | | | | Reuse queue items after reconfiguration  [James E. Blair, 2022-12-13, 4 files, -18/+33]

      When we reconfigure, we create new Pipeline objects, empty the
      values in the PipelineState, and then reload all the objects from
      ZK. We then re-enqueue all the QueueItems to adjust and correct the
      object pointers between them (item_ahead and items_behind).

      We can avoid reloading all the objects from ZK if we keep queue
      items from the previous layout and rely on the re-enqueue method
      correctly resetting any relevant object pointers.

      We already defer this re-enqueue work to the next pipeline
      processing after a reconfiguration (so the reconfiguration itself
      doesn't take very long, but the first pipeline run after a
      reconfiguration must perform a complete refresh). With this change,
      that first refresh is no longer a complete refresh but a normal
      refresh, so we will get the benefits of previous reductions in
      refresh times.

      The main risk of this change is that it could introduce a memory
      leak. During development, additional debugging was performed to
      verify that after a re-enqueue, there are no obsolete layout or
      pipeline objects reachable from the pipeline state object.

      On schedulers where a re-enqueue does not take place (these
      schedulers would simply see the layout update, re-create their
      PipelineState python objects, and refresh them after another
      scheduler has already performed the re-enqueue), we need to ensure
      that we update any internal references to Pipeline objects (which
      then lead to Layout objects and can cause memory leaks). To address
      that, we update the pipeline references in the ChangeQueue
      instances underneath a given PipelineState when that state is being
      reset after a reconfiguration.

      This change also removes the pipeline reference from the QueueItem,
      replacing it with a property that uses the pipeline reference on
      the ChangeQueue instead. This removes one extra place where an
      incorrect reference could cause a memory leak.

      Change-Id: I7fa99cd83a857216321f8d946fd42abd9ec427a3
* | | | | | Merge "Abort job if cleanup playbook timed out"  [Zuul, 2022-12-14, 4 files, -19/+51]
|\ \ \ \ \ \
| |/ / / / /
|/| | | | |
| * | | | | Abort job if cleanup playbook timed out  [Felix Edel, 2022-12-12, 4 files, -19/+51]

      We've investigated an issue where a job was stuck on the executor
      because it wasn't aborted properly. The job was cancelled by the
      scheduler, but the cleanup playbook on the executor ran into a
      timeout. This caused another abort via the WatchDog.

      The problem is that the abort function doesn't do anything if the
      cleanup playbook is running [1]. Most probably this covers the case
      that we don't want to abort the cleanup playbook after a normal job
      cancellation. However, this doesn't differentiate whether the abort
      was caused by the run of the cleanup playbook itself, resulting in
      a build that's hanging indefinitely on the executor.

      To fix this, we now differentiate whether the abort was caused by a
      stop() call [2] or by a timeout. In case of a timeout, we kill the
      running process.

      Add a test case to validate the changed behaviour. Without the fix,
      the test case runs indefinitely because the cleanup playbook won't
      be aborted even after the test times out (during the test cleanup).

      [1]: https://opendev.org/zuul/zuul/src/commit/4d555ca675d204b1d668a63fab2942a70f159143/zuul/executor/server.py#L2688
      [2]: https://opendev.org/zuul/zuul/src/commit/4d555ca675d204b1d668a63fab2942a70f159143/zuul/executor/server.py#L1064

      Change-Id: I979f55b52da3b7a237ac826dfa8f3007e8679932
* | | | | Merge "Avoid acquiring pipeline locks in manager postConfig"  [Zuul, 2022-12-13, 4 files, -93/+65]
|\ \ \ \ \
| * | | | | Avoid acquiring pipeline locks in manager postConfig  [James E. Blair, 2022-12-12, 4 files, -93/+65]

      After creating a new tenant layout, we call _postConfig on every
      pipeline manager. That creates the shared change queues and then
      also attaches new PipelineState and PipelineChangeList objects to
      the new Pipeline objects on the new layout. If the routine detects
      that it is the first scheduler to deal with this pipeline under the
      new layout UUID, it also resets the pipeline state in ZK (resets
      flags and moves all queues to the old_queues attribute so they will
      be re-enqueued). It also drops a PostConfigEvent into the pipeline
      queues to trigger a run of the pipeline after the re-enqueues.

      The work in the above paragraph means that we must hold the
      pipeline lock for each pipeline in turn during the reconfiguration.
      Most pipelines should not be processed at this point since we do
      hold the tenant write lock; however, some cleanup routines can be
      operating, and so we would end up waiting for them to complete
      before completing the reconfiguration. This could end up adding
      minutes to a reconfiguration. Incidentally, these cleanup routines
      are written with the expectation that they may be running during a
      reconfiguration and handle missing data from refreshes.

      We can avoid this by moving the "reset" work into the PipelineState
      deserialization method, where we can determine at the moment we
      refresh the object whether we need to "reset" it and do so. We can
      tell that a reset needs to happen if the layout uuid of the state
      object does not match the running layout of the tenant.

      We still need to attach new state and change list objects to the
      pipeline in _postConfig (since our pipeline object is new). We also
      should make sure that the objects exist in ZK before we leave that
      method, so that if a new pipeline is created, other schedulers will
      be able to load the (potentially still empty) objects from ZK.

      As an alternative, we could avoid even this work in _postConfig,
      but then we would have to handle missing objects on refresh, and it
      would not be possible to tell if the object was missing due to it
      being new or due to an error. To avoid masking errors, we keep the
      current expectation that we will create these objects in ZK on the
      initial reconfiguration.

      Finally, we do still want to run the pipeline processing after a
      reconfiguration (at least for now -- we may be approaching a point
      where that will no longer be necessary), so we move the emission of
      the PostConfigEvent into the scheduler in the cases where we know
      it has just updated the tenant layout.

      Change-Id: Ib1e467b5adb907f93bab0de61da84d2efc22e2a7
* | | | | | Merge "Consider queue settings for topic dependencies"  [Zuul, 2022-12-13, 13 files, -27/+218]
|\ \ \ \ \ \
| |_|/ / / /
|/| | | | |
| * | | | | Consider queue settings for topic dependencies  [Simon Westphahl, 2022-11-30, 13 files, -27/+218]

      Most of a change's attributes are tenant-independent. This however
      is different for topic dependencies, which should only be
      considered in tenants where the dependencies-by-topic feature is
      enabled. This is mainly a problem when a project is part of
      multiple tenants, as the dependencies-by-topic setting might be
      different for each tenant.

      To fix this we will only return the topic dependencies for a change
      in tenants where the feature has been activated. Since the
      `needs_changes` property is now a method called
      `getNeedsChanges()`, we also changed `needed_by_changes` to
      `getNeededByChanges()` so they match.

      Change-Id: I343306db0abbe2fbf98ddb3f81b6d509eaf4a2bf
* | | | | | Merge "Add Python 3.11 testing"  [Zuul, 2022-12-02, 3 files, -14/+15]
|\ \ \ \ \ \
| * | | | | | Add Python 3.11 testing  [Clark Boylan, 2022-10-27, 3 files, -14/+15]

      This adds Python 3.11 testing and drops Python 3.10 in order to
      keep testing only the bounds of what Zuul supports. Note that
      currently the Python 3.11 available for jammy is based on an RC
      release. This should be fine, as we do functional testing with a
      released Python 3.11 and that is what people will consume via the
      docker images.

      Change-Id: Ic5ecf2e23b250d3dbf592983b17ec715d6e9722e
* | | | | | | Merge "Allow clean scheduler shutdown when priming fails"  [Zuul, 2022-12-01, 1 file, -2/+3]
|\ \ \ \ \ \ \
| |_|_|_|_|/ /
|/| | | | | |
| * | | | | | Allow clean scheduler shutdown when priming fails  [Simon Westphahl, 2022-12-01, 1 file, -2/+3]

      If config priming failed, we need to unblock the main thread by
      setting the primed event so we can join the main thread during
      shutdown. So far this was only done after trying to join the main
      thread, so we need to set the primed event earlier.

      Change-Id: I5aef9a215cfd53baa94e525f8c303592a6c7e4b8
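The deadlock pattern described here is generic: a thread blocks waiting on an `Event` that is only set on success, so a shutdown path that waits and then joins hangs forever when startup fails. A minimal sketch with illustrative names (this is not Zuul's actual scheduler code):

```python
import threading

primed = threading.Event()

def main_loop():
    try:
        # Simulate config priming that fails.
        raise RuntimeError("priming failed")
    except RuntimeError:
        # Set the event even on failure so anyone waiting on it (e.g.
        # the shutdown path, before joining this thread) is unblocked.
        primed.set()
        return

thread = threading.Thread(target=main_loop)
thread.start()

# Shutdown path: wait for priming, then join. Without the set() in the
# failure branch above, this wait would block forever.
assert primed.wait(timeout=10)
thread.join(timeout=10)
assert not thread.is_alive()
```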