summaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAgeFilesLines
* Add Gerrit pipeline trigger requirementsJames E. Blair2023-04-289-197/+761
| | | | | | | | | | | | | | | | | | | | | | This updates the Gerrit driver to match the pattern in the GitHub driver where instead of specifying individual trigger requirements such as "require-approvals", instead a complete ref filter (a la "requirements") can be embedded in the trigger filter. The "require-approvals" and "reject-approvals" attributes are deprecated in favor of the new approach. Additionally, all require filters in Gerrit are now available as reject filters. And finally, the Gerrit filters are updated to return FalseWithReason so that log messages are more useful, and the Github filters are updated to improve the language, avoid apostraphes for ease of grepping, and match the new Gerrit filters. Change-Id: Ia9c749f1c8e318fe01e84e52831a9d0d2c10b203
* Add GitHub pipeline trigger requirementsJames E. Blair2023-04-288-202/+598
| | | | | | | | | | | | | | | | | | | | This mimics a useful feature of the Gerrit driver and allows users to configure pipelines that trigger on events but only if certain conditions of the PR are met. Unlike the Gerrit driver, this embeds the entire require/reject filter within the trigger filter (the trigger filter has-a require or reject filter). This makes the code simpler and is easier for users to configure. If we like this approach, we should migrate the gerrit driver as well, and perhaps the other drivers. The "require-status" attribute already existed, but was undocumented. This documents it, adds backwards-compat handling for it, and deprecates it. Some documentation typos are also corrected. Change-Id: I4b6dd8c970691b1e74ffd5a96c2be4b8075f1a87
* Fix user creation in Docker test setup scriptSimon Westphahl2023-02-211-1/+2
| | | | | | | | | | | | MySQL 8 no longer supports implicitly creating a user using the GRANT statement. Use a separate CREATE USER statement instead. ERROR 1064 (42000) at line 1: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'identified by 'openstack_citest' WITH GRANT OPTION' at line 1 Change-Id: I4cab4c1855d1ba97cbfc9dd0835b3d302d73aa62
* Merge "Cleanup old rebase-merge dirs on repo reset"8.2.0Zuul2023-02-172-1/+85
|\
| * Cleanup old rebase-merge dirs on repo resetSimon Westphahl2023-02-172-1/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using the rebase merge-mode a failed "merge" will leave the repo in a state that Zuul so far could not recover from. The rebase will create a `.git/rebase-merge` directory which is not removed when the rebase fails. To fix this we will abort the rebase when it fails and also remove any existing `.git/rebase-merge` and `.git/rebase-apply` directories when resetting the repository. DEBUG zuul.Merger: [e: ...] Unable to merge {'branch': 'master', 'buildset_uuid': 'f7be4215f37049b4ba0236892a5d8197', 'connection': 'github', 'merge_mode': 5, 'newrev': None, 'number': 71, 'oldrev': None, 'patchset': 'e81d0b248565db290b30d9a638095947b699c76d', 'project': 'org/project', 'ref': 'refs/pull/71/head'} Traceback (most recent call last): File "/opt/zuul/lib/python3.10/site-packages/zuul/merger/merger.py", line 1099, in _mergeChange commit = repo.rebaseMerge( File "/opt/zuul/lib/python3.10/site-packages/zuul/merger/merger.py", line 626, in rebaseMerge repo.git.rebase(*args) File "/opt/zuul/lib/python3.10/site-packages/git/cmd.py", line 542, in <lambda> return lambda *args, **kwargs: self._call_process(name, *args, **kwargs) File "/opt/zuul/lib/python3.10/site-packages/git/cmd.py", line 1005, in _call_process return self.execute(call, **exec_kwargs) File "/opt/zuul/lib/python3.10/site-packages/git/cmd.py", line 822, in execute raise GitCommandError(command, status, stderr_value, stdout_value) git.exc.GitCommandError: Cmd('git') failed due to: exit code(128) cmdline: git rebase 39fead1852ef01a716a1c6470cee9e4197ff5587 stderr: 'fatal: It seems that there is already a rebase-merge directory, and I wonder if you are in the middle of another rebase. If that is the case, please try git rebase (--continue | --abort | --skip) If that is not the case, please rm -fr ".git/rebase-merge" and run me again. I am stopping in case you still have something valuable there. Change-Id: I8518cc5e4b3f7bbfc2c2283a2b946dee504991dd
* | Merge "Return cached Github change on concurrent update"Zuul2023-02-172-1/+43
|\ \
| * | Return cached Github change on concurrent updateSimon Westphahl2023-02-172-1/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a pull-request is updated concurrently on a scheduler we'll wait until the first thread has updated the change. The problem is if we needed to create a new PR object. In this case we'd return a Github change that wasn't updated and also doesn't have a cache key set. ERROR zuul.Scheduler: Exception processing pipeline check in tenant foobar Traceback (most recent call last): File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2149, in process_pipelines refreshed = self._process_pipeline( File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2241, in _process_pipeline self.process_pipeline_trigger_queue(tenant, pipeline) File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2447, in process_pipeline_trigger_queue self._process_trigger_event(tenant, pipeline, event) File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2480, in _process_trigger_event pipeline.manager.addChange(change, event) File "/opt/zuul/lib/python3.10/site-packages/zuul/manager/__init__.py", line 534, in addChange self.updateCommitDependencies(change, None, event) File "/opt/zuul/lib/python3.10/site-packages/zuul/manager/__init__.py", line 868, in updateCommitDependencies new_commit_needs_changes = [d.cache_key for d in dependencies] File "/opt/zuul/lib/python3.10/site-packages/zuul/manager/__init__.py", line 868, in <listcomp> new_commit_needs_changes = [d.cache_key for d in dependencies] File "/opt/zuul/lib/python3.10/site-packages/zuul/model.py", line 5946, in cache_key return self.cache_stat.key.reference AttributeError: 'NoneType' object has no attribute 'key' Change-Id: I2f3012060c486d593ad857e046334c3d9bef0ed5
* | | Merge "Update reconfig event ltime on (smart) reconfig"Zuul2023-02-172-1/+5
|\ \ \
| * | | Update reconfig event ltime on (smart) reconfigSimon Westphahl2023-02-172-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make sure to update the last reconfig event ltime in the layout state during a (smart) reconfig. This is important in order to unblock pipelines when a tenant reconfig event is lost. So far the last reconfig event ltime was passed as -1, but wasn't set in the layout state since the ltime is not allowed to go backward. Change-Id: Iab04a962abbfbe901c22e4d5f1d484e3f53b0d33
* | | | Merge "Allow default-ansible-version to be an int"Zuul2023-02-171-3/+3
|\ \ \ \
| * | | | Allow default-ansible-version to be an intJames E. Blair2023-01-311-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We allow job.ansible-version to be a str, int, or float and coerce it to a string. We should do the same for tenant.default-ansible-version. Change-Id: I1a104b84d578f4932597893f62cd0cd06f031b4a
* | | | | Merge "Switch our local testing docker-compose to mysql 8.0"Zuul2023-02-171-1/+1
|\ \ \ \ \ | |_|/ / / |/| | | |
| * | | | Switch our local testing docker-compose to mysql 8.0Clark Boylan2023-01-311-1/+1
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recently I debugged an issue that reproduced on mysql 8.0 on Jammy but not my local system. It took quite some time for me to realize that there was a difference in database versions which ended up being an important detail. Update our docker-compose to better match what we get in CI. Change-Id: I7de268acb81680f3e6b7d3b1aa057e7babd3fa62
* | | | Merge "Fix exception on retry in source base class"Zuul2023-02-151-2/+1
|\ \ \ \
| * | | | Fix exception on retry in source base classSimon Westphahl2023-02-141-2/+1
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ERROR zuul.Scheduler: Exception processing pipeline check in tenant foobar Traceback (most recent call last): File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2149, in process_pipelines refreshed = self._process_pipeline( File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2241, in _process_pipeline self.process_pipeline_trigger_queue(tenant, pipeline) File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2447, in process_pipeline_trigger_queue self._process_trigger_event(tenant, pipeline, event) File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2480, in _process_trigger_event pipeline.manager.addChange(change, event) File "/opt/zuul/lib/python3.10/site-packages/zuul/manager/__init__.py", line 534, in addChange self.updateCommitDependencies(change, None, event) File "/opt/zuul/lib/python3.10/site-packages/zuul/manager/__init__.py", line 864, in updateCommitDependencies dep = source.getChangeByURLWithRetry(match, event) File "/opt/zuul/lib/python3.10/site-packages/zuul/source/__init__.py", line 112, in getChangeByURLWithRetry return dep UnboundLocalError: local variable 'dep' referenced before assignment Change-Id: I1c706e5e5d2d337ec84b8fc1ad5e900191f2362c
* | | | Merge "Add project-ssh-key to API docs"Zuul2023-02-151-4/+36
|\ \ \ \
| * | | | Add project-ssh-key to API docsDmitriy Rabotyagov2022-12-071-4/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At the moment API call for fetching SSH public key is not documented anywhere. It's a bit confusing when a user tries to check through available calls. This call documented only in ``job-content.rst``. Change-Id: I69a8d8994b1b4e49ac2c5b07690ebb9ff2a62e38
* | | | | Merge "Prevent files cache ltime from going backward"Zuul2023-02-151-0/+6
|\ \ \ \ \
| * | | | | Prevent files cache ltime from going backwardSimon Westphahl2023-02-011-0/+6
| | |_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a race condition when two schedulers request files for the same repository at roughly the same time. The files cache ltime is generated before creating the cat job. We also cannot guarantee the order in which the job results are processed. Since we include the files cache ltime in the min. ltimes as part of the layout state we need to make sure that the timestamp of the cache doesn't go backward. This should fix the following exception we occassionally see in zuul-web: WARNING zuul.ConfigLoader: Zuul encountered an error while accessing the repo org/project. The error was: Configuration files missing from cache. Check Zuul scheduler logs for more information. Change-Id: I9fe2ed0b7a95dd12bb978273673e905d06bee72d
* | | | | Merge "Add PBR_VERSION argument to Dockerfile"Zuul2023-02-151-0/+1
|\ \ \ \ \
| * | | | | Add PBR_VERSION argument to DockerfileJames E. Blair2023-02-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows the use of the PBR_VERSION environment variable when building container imasges. This facilitates custom version numbers with builds. Change-Id: Ib0156836285a798ebe184691d109301bdf751efb
* | | | | | Merge "Simplufy shouldRefresh* checks"Zuul2023-02-151-6/+2
|\ \ \ \ \ \
| * | | | | | Simplufy shouldRefresh* checksJames E. Blair2023-02-011-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In comments on I3824af6149bf27c41a8d895fc682236bd0d91f6b Clark Boylan suggested that an equality check of the version number might be a simpler and more robust way of checking that an object should be refreshed. This change implements that suggestion. Change-Id: I9261fc68442e4002c4579db4f99d3c151ffd485c
* | | | | | | Merge "Expand the scope of zkprofile"Zuul2023-02-151-4/+3
|\ \ \ \ \ \ \
| * | | | | | | Expand the scope of zkprofileJames E. Blair2023-02-141-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the zkprofile command only profiles the pipeline refresh operations. This expands the scope to include all pipeline operations (refreshing and then any subsequent changes). In particular, this will help identify places where we write too much data. Change-Id: Ida7bcb311b5037f6ce15a166ddeae6db44ed6709
* | | | | | | | Merge "Add more log messages for run handler steps"Zuul2023-02-151-1/+7
|\ \ \ \ \ \ \ \
| * | | | | | | | Add more log messages for run handler stepsSimon Westphahl2023-02-061-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change adds distinct log message for certain run handler steps. So far some of those phases could only be inferred by other related log messages. Change-Id: Icceca78456b20754cbebb5747da3a538586cd5f5
* | | | | | | | | Merge "Add scheduler run handler metric"Zuul2023-02-153-52/+63
|\ \ \ \ \ \ \ \ \ | |/ / / / / / / /
| * | | | | | | | Add scheduler run handler metricSimon Westphahl2023-02-063-52/+63
| | |_|_|_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to better understand the scheduler run handler performance this change adds a new `zuul.scheduler.run_handler` metric to measure the duration of one run handler loop. Change-Id: I77e862cf99d6a8445e71d7daab410d5853487dc3
* | | | | | | | Merge "gerrit driver: fix bug around unicode branch names"Zuul2023-02-152-2/+52
|\ \ \ \ \ \ \ \ | |_|/ / / / / / |/| | | | | | |
| * | | | | | | gerrit driver: fix bug around unicode branch namesAlex Hornung2023-01-242-2/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a branch name contains unicode characters that are more than 1-byte wide, the size in bytes of the pack record won't match the size in characters, and the pack parsing will be incorrect. Instead, treat everything as an encoded byte string until parsing is done - and only decode when handling a single, parsed, record. Change-Id: I7f1a0cc96a36129fbc04c7a8687da3f66c1eef99
* | | | | | | | Merge "Document pct_used_hdd stat"Zuul2023-02-141-0/+5
|\ \ \ \ \ \ \ \
| * | | | | | | | Document pct_used_hdd statJames E. Blair2022-11-031-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change-Id: I072d81f6cfd489cf1cf69189eeb547e5cb68bebb
* | | | | | | | | Merge "Fix queue item creation in bundle deserialization"Zuul2023-02-141-2/+1
|\ \ \ \ \ \ \ \ \
| * | | | | | | | | Fix queue item creation in bundle deserializationSimon Westphahl2023-02-141-2/+1
| | |_|_|_|_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I7fa99cd83a857216321f8d946fd42abd9ec427a3 replaced the pipeline reference of the `QueueItem` with a property. However, the pipeline was still passed to the `fromZK()` method when deserializing the bundle. ERROR zuul.Scheduler: Exception processing pipeline check in tenant foobar Traceback (most recent call last): File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2149, in process_pipelines refreshed = self._process_pipeline( File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2227, in _process_pipeline pipeline.state.refresh(ctx) File "/opt/zuul/lib/python3.10/site-packages/zuul/model.py", line 715, in refresh return super().refresh(context) File "/opt/zuul/lib/python3.10/site-packages/zuul/zk/zkobject.py", line 238, in refresh self._load(context) File "/opt/zuul/lib/python3.10/site-packages/zuul/zk/zkobject.py", line 361, in _load self._set(**self.deserialize(data, context)) File "/opt/zuul/lib/python3.10/site-packages/zuul/model.py", line 762, in deserialize queue.refresh(context) File "/opt/zuul/lib/python3.10/site-packages/zuul/zk/zkobject.py", line 238, in refresh self._load(context) File "/opt/zuul/lib/python3.10/site-packages/zuul/zk/zkobject.py", line 361, in _load self._set(**self.deserialize(data, context)) File "/opt/zuul/lib/python3.10/site-packages/zuul/model.py", line 1103, in deserialize Bundle.deserialize(context, self, items_by_path, bundle_data))) File "/opt/zuul/lib/python3.10/site-packages/zuul/model.py", line 5912, in deserialize bundle.items = [ File "/opt/zuul/lib/python3.10/site-packages/zuul/model.py", line 5913, in <listcomp> items_by_path.get(p) or QueueItem.fromZK( File "/opt/zuul/lib/python3.10/site-packages/zuul/zk/zkobject.py", line 232, in fromZK obj._set(**kw) File "/opt/zuul/lib/python3.10/site-packages/zuul/zk/zkobject.py", line 420, in _set super().__setattr__(name, value) AttributeError: can't set attribute 'pipeline' Change-Id: Ia8de8f6717b3a1211cf4d1841e56ccd0c78ee677
* | | | | | | | | Merge "Fix deprecated use of currentThread() in REPL"Zuul2023-02-141-3/+3
|\ \ \ \ \ \ \ \ \
| * | | | | | | | | Fix deprecated use of currentThread() in REPLSimon Westphahl2023-02-141-3/+3
| |/ / / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | zuul/lib/repl.py:29: DeprecationWarning: currentThread() is deprecated, use current_thread() instead obj = self.files.get(threading.currentThread(), self.default) Change-Id: I6f5a9b6b169b882024a41623364eefe0955796ef
* | | | | | | | | Merge "Switch to sqlalchemy 2.0"Zuul2023-02-142-3/+1
|\ \ \ \ \ \ \ \ \
| * | | | | | | | | Switch to sqlalchemy 2.0Clark Boylan2023-02-012-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Our test suite no longer complains with RemovedIn20 warnings for sqlalchemy 2.0 incompatibilities. Check if this works now. Change-Id: I0ffab3788493dcddf39a1262813293abb6611c19
* | | | | | | | | | Merge "Don't nest alembic transactions"Zuul2023-02-141-18/+22
|\ \ \ \ \ \ \ \ \ \ | |/ / / / / / / / /
| * | | | | | | | | Don't nest alembic transactionsClark Boylan2023-02-011-18/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SqlAlchemy 2.0 has gotten a lot more picky about transactions. In addition to needing to explicitly set up transactions using SqlAlchemy 2.0 it seems alembic's get_current_revision() call cannot be in the same transaction as the alembic migrate/stamp calls with MySQL 8.0. In particular the get_current_revision call seems to get a SHARED_HIGH_PRIO lock on the alembic_version table. This prevents the migrate/stamp calls from creating the alembic_version table as this requires an EXCLUSIVE lock. The SHARED_HIGH_PRIO lock appears to be in place as long as the get_current_revision transaction is active. To fix this we simplify our migration tooling and put get_current_revision in a transaction block of its own. The rest of our migrate function calls into functions that will setup new transactions and it doesn't need to be in this block. Change-Id: Ic71ddf1968610784cef72c4634ccec3a18855a0e
* | | | | | | | | | Merge "Address SqlAlchemy Removed in 2.0 Warnings"Zuul2023-02-145-113/+120
|\ \ \ \ \ \ \ \ \ \ | |/ / / / / / / / /
| * | | | | | | | | Address SqlAlchemy Removed in 2.0 WarningsClark Boylan2023-01-305-113/+120
| | |_|_|_|_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This addresses SqlAlchemy's removed in 2.0 warnings. Now that SqlAlchemy 2.0 has released we can see that we are not compatible yet. A good first step in adding compatibility is fixing warnings in 1.4. In particular there are four types of warning we fix here: 1. Using raw strings in conn.execute() calls. We need to use the text() construct instead. 2. Passing a list of items to select when doing select queries. Instead we need to pass things as normal posargs. 3. Accessing row result items as if the row is a dict. THis is not longer possible without first going through the row._mapping system. Instead we can access items as normal object attributes. 4. You must now use sqlalchemy.inspect() on a connectable to create an Inspector object rather than instantiating it directly. Finally we set up alembic's engine creation to run with future 2.0 behavior now that the warnings are cleared up. This appears to have already been done for the main zuul application. Change-Id: I5475e39bd93d71cd1106ec6d3a5423ea2dd51859
* | | | | | | | | Merge "Fix more file opening ResourceWarnings"Zuul2023-02-144-4/+8
|\ \ \ \ \ \ \ \ \ | |_|/ / / / / / / |/| | | | | | | |
| * | | | | | | | Fix more file opening ResourceWarningsClark Boylan2023-02-074-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I've managed to get better at grepping for this and this finds some of the stragglers. They are all file opens without closes fixed by using a with open() context manager. Change-Id: I7b8c8516a86558e2027cb0f01aefe2dd1849069c
* | | | | | | | | Fix configuration error related to YAML anchorsJames E. Blair2023-02-132-0/+73
| |_|_|_|_|_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PyYAML is efficient with YAML anchors and will only construct one Python object for a YAML mapping that is used multiple times via anchors. We copy job variables (a mapping) straight from the YAML to an attribute of the Job object, then we freeze the Job object which converts all of its dicts to mappingproxy objects. This mutates the contents of the vars dict, and if that is used on another job via an anchor, we will end up with mappingproxy objects in what we think is a "raw yaml" configuration dict, and we will not be able to serialize it in case of errors. To address this, perform a deep copy of every configuration yaml blob before parsing it into an object. Change-Id: Idaa6ff78b1ac5a108fb9f43700cf8e66192c43ce
* | | | | | | | Merge "Update build/job versions in place"Zuul2023-02-111-6/+4
|\ \ \ \ \ \ \ \ | | |_|_|_|_|/ / | |/| | | | | |
| * | | | | | | Update build/job versions in placeJames E. Blair2023-01-171-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous change to add build/job object version numbers in ZK was perhaps too conservative with the way it updated them. It updated a copy of the version dictionary in order to avoid updating the in-memory dictionary in case the ZK write failed. This behavior is common for calls to updateAttributes, but we don't get this behavior when we use activeContext, so we generally don't rely on it. This change makes it slightly more efficient by updating the in-memory dictionary in place before writing it to ZK. Change-Id: If48d745ceefc0897946232947e5c6056fc833041
* | | | | | | | Remove layout_uuid from PipelineState create callJames E. Blair2023-02-103-6/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This argument is no longer necessary because it is set via an inderect route through the pipeline and tenant objects. Remove it, and also validate that it is set correctly in unit tests. Change-Id: I62ce18a61a416cbc397866838f8ac3b0ec1bd564
* | | | | | | | Fix race condition in pipeline change list initJames E. Blair2023-02-106-49/+241
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Simon Westphahl describes the race condition: > [The race condition] can occur after a reconfiguration while > some schedulers are updating their local layout and some > already start processing pipelines in the same tenant. > > In this case the pipeline manager's `_postConfig()` method that > calls `PipelineChangeList.create(...)` races with the pipeline > processor updating the change keys. > > This leads to two change lists being written as separate > shards, that can't be correctly loaded, as all shards combined > are expected to form a single JSON object. > > The sequence of events seems to be the following: > 1. S1: pipeline processor needs to update the change keys > 2. S1 the shard writer will delete the `change_list` key with the old > shards > 3. S2: configloader calls the `_postConfig()` method > 4. S2: `PipelineChangeList.create()` notices that the `change_list` node > doesn't exists in Zookeeper: > https://opendev.org/zuul/zuul/src/branch/master/zuul/model.py#L921 > 6. S2: the shard writer creates the first shard `0000000000` > 7. S1: the shard writer creates the second shared `0000000001` > > The race condition seems to be introduced with > Ib1e467b5adb907f93bab0de61da84d2efc22e2a7 That change updated the pipeline manager _postConfig method so that it no longer acquires the pipeline lock when initalizing the pipeline state and change lists. This greatly reduces potential delays during reconfiguration, but has, perhaps predictably, lead to the race condition above. In the commit message for that change, I noted that we might be able to avoid even more work if we accept some caveats related to error reporting. Avoiding that work mean avoiding performing any writes during _postConfig which addresses the root cause of the race condition (steps 3-6 above. Ed. note: there is no step 5). From the commit message: > We still need to attach new state and change list objects to > the pipeline in _postConfig (since our pipeline object is new). > We also should make sure that the objects exist in ZK before we > leave that method, so that if a new pipeline is created, other > schedulers will be able to load the (potentially still empty) > objects from ZK. As an alternative, we could avoid even this > work in _postConfig, but then we would have to handle missing > objects on refresh, and it would not be possible to tell if the > object was missing due to it being new or due to an error. To > avoid masking errors, we keep the current expectation that we > will create these objects in ZK on the initial reconfiguration. The current change does exactly that. We no longer perform any ZK write operations on the state and change list objects in _postConfig. Instead, inside of the refresh methods, we detect the cases where they should be newly created and do so at that time. This happens with the pipeline lock, so is safe against any simultaneous operation from other components. There will be "ERROR" level log messages indicating that reading the state from ZK has failed when these objects are first initialized. To indicate that this is probably okay, they will now be immediately followed by "WARNING" level messages explaining that. Strictly speaking, this particular race should only occur for the change list object, not the pipeline state, since the race condition above requires a sharded object and of the two, only the change list is sharded. However, to keep the lifecycle of these two objects matched (and to simplify _postConfig) the same treatment is applied to both. Note that change I7fa99cd83a857216321f8d946fd42abd9ec427a3 merged after Ib1e467b and changed the behavior slightly, introducing the old_state and old_list arguments. Curiously, the old_list argument is effectively unused, so it is removed entirely in this change. Old_state still has a purpose and is retained. Change-Id: I519348e7d5d74e675808e990920480fb6e1fb981