Commit log for path: root/zuul

* Do not wait for streamer when disabled (James E. Blair, 2023-04-10; 1 file, +5/-0)

When a user sets zuul_console_disabled, we don't need to try to connect to the streaming daemon. In fact, they may have set it because they know it won't be running. Check for this and avoid the connection step in that case, and therefore avoid the extraneous "Waiting on logger" messages and the extra 30 second delay at the end of each task.

Change-Id: I86af231f1ca1c5b54b21daae29387a8798190a58
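
A minimal sketch of the guard described above, assuming a hypothetical helper around the log-streaming connection; the port, argument names, and error handling are illustrative rather than Zuul's actual module code:

```python
import socket

def connect_to_console(host, port=19885, console_disabled=False, timeout=10):
    """Return a socket to the zuul_console streaming daemon, or None.

    If the user set zuul_console_disabled, the daemon is presumably not
    running, so skip the connection attempt (and the repeated
    "Waiting on logger" retries) entirely.
    """
    if console_disabled:
        # Assumption for this sketch: the flag is passed in as a boolean.
        return None
    try:
        return socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return None
```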
* Merge "Check Gerrit submit requirements"Zuul2023-03-302-6/+38
|\
| * Check Gerrit submit requirementsJames E. Blair2023-03-282-6/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With newer versions of Gerrit, we are increasingly likely to encounter systems where the traditional label requirements are minimized in favor of the new submit requirements rules. If Gerrit is configured to use a submit requirement instead of a traditional label blocking rule, that is typically done by switching the label function to "NoBlock", which, like the "NoOp" function, will still cause the label to appear in the "submit_record" field, but with a value of "MAY" instead of "OK", "NEED", or "REJECT". Instead, the interesting information will be in the "submit_requirements" field. In this field we can see the individual submit requirement rules and whether they are satisfied or not. Since submit requirements do not have a 1:1 mapping with labels, determining whether an "UNSATISFIED" submit requirement should be ignored (because it pertains to a label that Zuul will alter, like "Verified") is not as straightforward is it is for submit records. To be conservative, this change looks for any of the "allow needs" labels (typically "Verified") in each unsatisfied submit record and if it finds one, it ignores that record. With this change in place, we can avoid enqueing changes which we are certain can not be merged into gate pipelines, and will continue to enqueue changes about which we are uncertain. Change-Id: I667181565684d97c1d036e2db6193dc606c76c57
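
A minimal sketch of that conservative filter; the dictionary layout loosely follows Gerrit's change JSON, but treat the exact field names here as assumptions rather than a precise rendering of either Gerrit's API or Zuul's code:

```python
def unsatisfied_requirements_block(change: dict,
                                   allow_needs=("Verified",)) -> bool:
    """Return True if an unsatisfied submit requirement should block enqueue.

    Requirements that mention a label Zuul will alter itself (e.g. Verified)
    are ignored, since Zuul may satisfy them later.
    """
    for req in change.get("submit_requirements", []):
        if req.get("status") != "UNSATISFIED":
            continue
        # Assumed layout: the requirement's expression mentions the labels
        # it depends on, e.g. "label:Verified=MAX AND -label:Verified=MIN".
        expression = req.get(
            "submittability_expression_result", {}).get("expression", "")
        if any(label in expression for label in allow_needs):
            # Pertains to a label Zuul will set; be optimistic and ignore it.
            continue
        return True
    return False
```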

* Merge "Fix prune-database command" (Zuul, 2023-03-30; 2 files, +37/-8)

* Fix prune-database command (James E. Blair, 2023-03-29; 2 files, +37/-8)

This command had two problems:

* It would only delete the first 50 buildsets.
* Depending on DB configuration, it may not have deleted anything, or it may have left orphan data.

We did not tell sqlalchemy to cascade delete operations, meaning that when we deleted the buildset, we didn't delete anything else. If the database enforces foreign keys (innodb, psql) then the command would have failed. If it doesn't (myisam) then it would have deleted the buildset rows but not anything else.

The tests use myisam, so they ran without error and without deleting the builds. They check that the builds are deleted, but only through the ORM via a joined load with the buildsets, and since the buildsets are gone, the builds weren't returned. To address this shortcoming, the tests now use distinct ORM methods which return objects without any joins. This would have caught the error had it been in place before.

Additionally, the delete operation retained the default limit of 50 rows (set in place for the web UI), meaning that when it did run, it would only delete the most recent 50 matching builds. We now explicitly set the limit to a user-configurable batch size (by default, 10,000 builds) so that we keep transaction sizes manageable and avoid monopolizing database locks. We continue deleting buildsets in batches as long as any matching buildsets remain. This should allow users to remove very large amounts of data without affecting ongoing operations too much.

Change-Id: I4c678b294eeda25589b75ab1ce7c5d0b93a07df3
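
A minimal, self-contained sketch of the batched cascade-delete pattern using plain SQLAlchemy; the Buildset/Build models and the missing prune filter are illustrative assumptions, not Zuul's actual schema or command:

```python
from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class Buildset(Base):
    __tablename__ = "buildset"
    id = Column(Integer, primary_key=True)
    # ORM-level cascade: deleting a buildset also deletes its builds.
    builds = relationship("Build", cascade="all, delete-orphan")

class Build(Base):
    __tablename__ = "build"
    id = Column(Integer, primary_key=True)
    buildset_id = Column(Integer, ForeignKey("buildset.id"))

def prune(engine, batch_size=10000):
    """Delete buildsets (and, via cascade, their builds) in small batches."""
    while True:
        with Session(engine) as session:
            # A real prune would also filter here, e.g. on an end-time cutoff.
            batch = session.query(Buildset).limit(batch_size).all()
            if not batch:
                return
            for buildset in batch:
                session.delete(buildset)
            session.commit()  # one manageable transaction per batch
```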

* Merge "Set cache ltime when branch protection changed" (Zuul, 2023-03-24; 1 file, +5/-0)

* Set cache ltime when branch protection changed (Simon Westphahl, 2023-03-23; 1 file, +5/-0)

When we detect a newly protected branch we also need to set the branch cache ltime accordingly. Otherwise we might end up with schedulers using an outdated branch cache during reconfig and layout update, which can result in config not being loaded.

Change-Id: Ie18ef0ce9664e58d25f34018f8eb4513bc8b559a

* Merge "Add installation_id to event log" (Zuul, 2023-03-23; 1 file, +2/-1)

* Add installation_id to event log (Dong Zhang, 2023-03-23; 1 file, +2/-1)

Occasionally we need to look into hanging event processing. With the installation_id included in the log, it is easier to find out which events are blocked waiting for the lock.

Change-Id: I824e299501642b61a57883f4b37dc121f5ea0979

* Merge "Retry jobs on transient IO errors on repo update" (Zuul, 2023-03-22; 1 file, +4/-0)

* Retry jobs on transient IO errors on repo update (Simon Westphahl, 2023-02-24; 1 file, +4/-0)

We are occasionally seeing different types of IO errors when updating repos on an executor. Currently those exceptions will abort the build and result in an error being reported. Since those errors are usually transient and point to some infrastructure problem, we should retry those builds instead.

We'll catch all IOErrors, which includes request-related exceptions from the "requests" Python package. See: https://github.com/psf/requests/blob/main/requests/exceptions.py

Traceback (most recent call last):
  File "/opt/zuul/lib/python3.10/site-packages/zuul/executor/server.py", line 3609, in _innerUpdateLoop
    self.merger.updateRepo(
  File "/opt/zuul/lib/python3.10/site-packages/zuul/merger/merger.py", line 994, in updateRepo
    repo = self.getRepo(connection_name, project_name,
  File "/opt/zuul/lib/python3.10/site-packages/zuul/merger/merger.py", line 966, in getRepo
    url = source.getGitUrl(project)
  File "/opt/zuul/lib/python3.10/site-packages/zuul/driver/github/githubsource.py", line 154, in getGitUrl
    return self.connection.getGitUrl(project)
  File "/opt/zuul/lib/python3.10/site-packages/zuul/driver/github/githubconnection.py", line 1744, in getGitUrl
    self._github_client_manager.get_installation_key(
  File "/opt/zuul/lib/python3.10/site-packages/zuul/driver/github/githubconnection.py", line 1126, in get_installation_key
    response = github.session.post(url, headers=headers, json=None)
  File "/opt/zuul/lib/python3.10/site-packages/requests/sessions.py", line 635, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
  File "/opt/zuul/lib/python3.10/site-packages/github3/session.py", line 171, in request
    response = super().request(*args, **kwargs)
  File "/opt/zuul/lib/python3.10/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/zuul/lib/python3.10/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/opt/zuul/lib/python3.10/site-packages/cachecontrol/adapter.py", line 53, in send
    resp = super(CacheControlAdapter, self).send(request, **kw)
  File "/opt/zuul/lib/python3.10/site-packages/requests/adapters.py", line 565, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /api/v3/app/installations/123/access_tokens (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f44f6136ef0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))

Change-Id: I4e07e945c88b9ba61f83131076fbf7b9768a61f9
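
The actual change marks such builds for retry; as a standalone illustration, here is a minimal sketch of treating IOError as the transient error class to catch, with a hypothetical update_repo callable standing in for the executor's repo update:

```python
import logging
import time

log = logging.getLogger("example")

def update_with_retries(update_repo, attempts=3, delay=5):
    """Run update_repo(), retrying on transient IO errors.

    IOError (an alias of OSError) also covers requests' ConnectionError,
    so DNS hiccups and similar infrastructure problems are retried rather
    than immediately failing the build.
    """
    for attempt in range(1, attempts + 1):
        try:
            return update_repo()
        except IOError:
            log.exception("Repo update failed (attempt %d/%d)",
                          attempt, attempts)
            if attempt == attempts:
                raise
            time.sleep(delay)
```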

* Merge "Don't add PR title in commit message on squash" (Zuul, 2023-03-22; 1 file, +8/-5)

* Don't add PR title in commit message on squash (Simon Westphahl, 2023-03-20; 1 file, +8/-5)

Github will use the PR title as the commit subject for squash merges, so we don't need to include the title once again in the commit description.

Change-Id: Id5a00701c236235f5a49abd025bcfad1b2da916c

* Merge "Don't discard all cat job results in case of error" (Zuul, 2023-03-22; 1 file, +58/-54)

* Don't discard all cat job results in case of error (Simon Westphahl, 2023-03-20; 1 file, +58/-54)

So far we've aborted all cat jobs when any of the cat jobs failed. However, since we ignore those exceptions anyway unless we are validating the tenant configuration, we should continue processing the rest of the job results that haven't failed. If an early cat job failed we might otherwise not load configuration from repositories and branches even if those cat jobs were successful.

Change-Id: I34f2a23641de9138b1e887df86ae2602ca190277

* Merge "Truncate Github file annotation message to 64 KB" (Zuul, 2023-03-22; 1 file, +7/-1)

* Truncate Github file annotation message to 64 KB (Simon Westphahl, 2023-03-03; 1 file, +7/-1)

File annotations that are posted to a PR as part of a check run have a size limit of 64KB for the message field. Since it's unclear if this should be 64KiB or 64KB, we'll use KB as a unit to be on the safe side.

Change-Id: I43e4cfbc3a96bf1e8a9828c55150216940a64728
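
A minimal sketch of that truncation, assuming the message arrives as a Python string; 64 KB (64,000) is the smaller of the two interpretations, so it stays under the limit either way:

```python
MAX_ANNOTATION_SIZE = 64 * 1000  # 64 KB, not 64 KiB, to be on the safe side

def truncate_annotation_message(message: str) -> str:
    """Trim a check-run file annotation message to the GitHub size limit."""
    return message[:MAX_ANNOTATION_SIZE]
```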

* Merge "Don't connect to MQTT broker in zuul-web" (Zuul, 2023-03-22; 1 file, +11/-4)

* Don't connect to MQTT broker in zuul-web (Simon Westphahl, 2023-03-13; 1 file, +11/-4)

Similar to zuul-web not starting the Github event connector, we also don't want it to connect to the configured MQTT broker, as the web component won't be using it.

Change-Id: If61da19ab0af39bc68d12c9b6613bf6c41d7efaa

* Merge "Fix variable name in job request queue log message" (Zuul, 2023-03-22; 1 file, +1/-1)

* Fix variable name in job request queue log message (Simon Westphahl, 2023-03-14; 1 file, +1/-1)

Traceback (most recent call last):
  File "/opt/zuul/lib/python3.10/site-packages/zuul/zk/job_request_queue.py", line 612, in cleanup
    "Unable to delete lock %s", path)
UnboundLocalError: local variable 'path' referenced before assignment

Change-Id: I9d76abce0f6e539374765bc7a988484548fda3f6

* Merge "merger: Keep redundant cherry-pick commits" (Zuul, 2023-03-15; 1 file, +20/-2)

* merger: Keep redundant cherry-pick commits (Joshua Watt, 2023-03-01; 1 file, +20/-2)

In normal git usage, cherry-picking a commit that has already been applied and doesn't do anything, or cherry-picking an empty commit, causes git to exit with an error to let the user decide what they want to do. However, this doesn't match the behavior of merges and rebases, where non-empty commits that have already been applied are simply skipped (empty source commits are preserved).

To fix this, add the --keep-redundant-commits option to `git cherry-pick` to make git always keep a commit when cherry-picking, even when it is empty for either reason. Then, after the cherry-pick, check if the new commit is empty and, if so, back it out if the original commit _wasn't_ empty. This two-step process is necessary because git doesn't have any options to simply skip cherry-pick commits that have already been applied to the tree.

Removing commits that have already been applied is particularly important in a "deploy" pipeline triggered by a Gerrit "change-merged" event, since the scheduler will try to cherry-pick the change on top of the commit that just merged. Without this option, the cherry-pick will fail and the deploy pipeline will fail with a MERGE_CONFLICT.

Change-Id: I326ba49e2268197662d11fd79e46f3c020675f21
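
A minimal sketch of the two-step process using plain git commands via subprocess; the helper names and the use of `git reset` to back out the redundant commit are illustrative, not Zuul's actual merger implementation:

```python
import subprocess

def git(repo, *args, check=False):
    """Run a git command in the given repo and return the completed process."""
    return subprocess.run(["git", "-C", repo, *args], check=check)

def is_empty(repo, rev):
    """True if the commit introduces no changes relative to its first parent."""
    return git(repo, "diff", "--quiet", f"{rev}^", rev).returncode == 0

def cherry_pick_skip_redundant(repo, commit):
    # Step 1: keep redundant/empty results instead of erroring out, which
    # matches how merges and rebases skip already-applied commits.
    git(repo, "cherry-pick", "--keep-redundant-commits", commit, check=True)
    # Step 2: if the cherry-pick produced an empty commit but the original
    # commit was not empty, drop the redundant result again.
    if is_empty(repo, "HEAD") and not is_empty(repo, commit):
        git(repo, "reset", "--hard", "HEAD^", check=True)
```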

* Merge "Expose nodepool slot attribute" (Zuul, 2023-03-15; 2 files, +2/-0)

* Expose nodepool slot attribute (James E. Blair, 2022-11-30; 2 files, +2/-0)

Nodepool now exposes a slot attribute which is set by the static and metastatic drivers to provide a stable id for which "slot" is occupied by a node on a host with max-parallel-jobs > 1. Expose this as a variable to Ansible so that jobs can use it to provide stable but non-conflicting workspace paths.

This also documents all of the current "nodepool" host vars.

Change-Id: I07cea423df7811c1de7763ff48b8308768246810

* Merge "Elasticsearch: filter zuul data from job returned vars" (Zuul, 2023-03-10; 1 file, +3/-1)

* Elasticsearch: filter zuul data from job returned vars (Thomas Cardonne, 2022-09-17; 1 file, +3/-1)

Remove data under the `zuul` key from the job returned vars. These returned values are meant to be used only by Zuul and shouldn't be included in documents, as they may include a large amount of data such as file comments.

Change-Id: Ie6de7e3373b21b7c234ffedd5db7d3ca5a0645b6
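
A minimal sketch of that filtering step; the function and variable names are illustrative:

```python
def filter_returned_vars(returned_vars: dict) -> dict:
    """Drop the Zuul-internal `zuul` key before indexing job returned vars."""
    return {key: value for key, value in returned_vars.items()
            if key != "zuul"}
```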

* Merge "Use buildset merger items to collect extra config" (Zuul, 2023-03-06; 1 file, +9/-17)

* Use buildset merger items to collect extra config (Simon Westphahl, 2023-02-24; 1 file, +9/-17)

Instead of deciding in the pipeline manager which items are relevant when collecting extra config dirs and files, we'll use the buildset's merger items for that. The merger items don't contain the canonical project name, but we can get this info from the source project.

Change-Id: I5885fd8687a85b44d6a5c202cf66bc78920e3b58

* Merge "extra-config-files/dirs in items of a bundle should be loaded" (Zuul, 2023-03-06; 1 file, +17/-8)

* extra-config-files/dirs in items of a bundle should be loaded (Dong Zhang, 2023-02-23; 1 file, +17/-8)

In case of a bundle, zuul should load extra-config-paths not only from items ahead, but from all items in that bundle. Otherwise it might throw an "invalid config" error, because the required zuul items in extra-config-paths are not found.

Change-Id: I5c14bcb14b7f5c627fd9bd49f887dcd55803c6a1

* Merge "Fix race related to PR with changed base branch" (Zuul, 2023-03-03; 1 file, +5/-1)

* Fix race related to PR with changed base branch (Simon Westphahl, 2023-03-02; 1 file, +5/-1)

Some people use a workflow that's known as "stacked pull requests" in order to split a change into more reviewable chunks. In this workflow the first PR in the stack targets a protected branch (e.g. master) and all other PRs target the unprotected branch of the next item in the stack. E.g.

master <- feature-A (PR#1) <- feature-B (PR#2) <- ...

Now, when the first PR in the stack is merged, Github will automatically change the target branch of dependent PRs. For the above example this would look like the following after PR#1 is merged:

master <- feature-B (PR#2) <- ...

The problem here is that we might still process events for a PR before the base branch change, but the Github API already returns the info about the updated target branch. We used the target branch name from the event (the outdated branch name) together with the information from the change object (the new target branch) to determine whether a branch was configured as protected. In the above example Zuul might wrongly conclude that the "feature-A" branch (taken from the event) is now protected.

In the related incident we also observed that this triggered a reconfiguration with the wrong state of now two protected branches (master + feature-A). Since the project in question previously had only one branch, this led to a change in branch matching behavior for jobs defined in that repository.

Change-Id: Ia037e3070aaecb05c062865a6bb0479b86e4dcde

* Merge "Avoid layout updates after delete-pipeline-state" (Zuul, 2023-03-02; 1 file, +13/-24)

* Avoid layout updates after delete-pipeline-state (James E. Blair, 2023-03-01; 1 file, +13/-24)

The delete-pipeline-state command forces a layout update on every scheduler, but that isn't strictly necessary. While it may be helpful for some issues, if it really is necessary, the operator can issue a tenant reconfiguration after performing the delete-pipeline-state. In most cases, where only the state information itself is causing a problem, we can omit the layout updates and assume that the state reset alone is sufficient.

To that end, this change removes the layout state changes from the delete-pipeline-state command and instead simply empties and recreates the pipeline state and change list objects. This is very similar to what happens in the pipeline manager _postConfig call, except in this case, we have the tenant lock so we know we can write with impunity, and we know we are creating objects in ZK from scratch, so we use direct create calls.

We set the pipeline state's layout uuid to None, which will cause the first scheduler that comes across it to (assuming its internal layout is up to date) perform a pipeline reset (which is almost a noop on an empty pipeline) and update the pipeline state layout to the current tenant layout state.

Change-Id: I1c503280b516ffa7bbe4cf456d9c900b500e16b0

* Merge "Set layout state event ltime in delete-pipeline-state" (Zuul, 2023-03-02; 1 file, +4/-5)

* Set layout state event ltime in delete-pipeline-state (James E. Blair, 2023-02-28; 1 file, +4/-5)

The delete-pipeline-state command updates the layout state in order to force schedulers to update their local layout (essentially perform a local-only reconfiguration). In doing so, it sets the last event ltime to -1. This is reasonable for initializing a new system, but in an existing system, when an event arrives at the tenant trigger event queue it is assigned the last reconfiguration event ltime seen by that trigger event queue. Later, when a scheduler processes such a trigger event after the delete-pipeline-state command has run, it will refuse to handle the event since it arrived much later than its local layout state. This must then be corrected manually by the operator by forcing a tenant reconfiguration. This means that the system essentially suffers the delay of two sequential reconfigurations before it can proceed.

To correct this, set the last event ltime for the layout state to the ltime of the layout state itself. This means that once a scheduler has updated its local layout, it can proceed in processing old events.

Change-Id: I66e798adbbdd55ff1beb1ecee39c7f5a5351fc4b

* Merge "Ignore fetch-ref-replicated gerrit event" (Zuul, 2023-03-02; 1 file, +2/-0)

* Ignore fetch-ref-replicated gerrit event (Dong Zhang, 2023-02-28; 1 file, +2/-0)

This event appears in our new gerrit setup, and I think we should ignore it.

Change-Id: Iad424678be2e2f4fc1809862acceca5b9240a3cf

* Merge "Handle missing diff_refs attribute" (Zuul, 2023-03-01; 1 file, +37/-7)

* Handle missing diff_refs attribute (Fabien Boucher, 2023-01-17; 1 file, +37/-7)

Recently, on a production Zuul acting on projects hosted on gitlab.com, it has been discovered that a merge request fetched from the API (just after Zuul receives the merge request created event) could have the "diff_refs" attribute set to None.

Related bug: https://gitlab.com/gitlab-org/gitlab/-/issues/386562

This leads to the following stacktrace in the logs:

2022-12-14 10:08:47,921 ERROR zuul.GitlabEventConnector: Exception handling Gitlab event:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/zuul/driver/gitlab/gitlabconnection.py", line 102, in run
    self.event_queue.election.run(self._run)
  File "/usr/local/lib/python3.8/site-packages/zuul/zk/election.py", line 28, in run
    return super().run(func, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/kazoo/recipe/election.py", line 54, in run
    func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/zuul/driver/gitlab/gitlabconnection.py", line 110, in _run
    self._handleEvent(event)
  File "/usr/local/lib/python3.8/site-packages/zuul/driver/gitlab/gitlabconnection.py", line 246, in _handleEvent
    self.connection._getChange(change_key, refresh=True,
  File "/usr/local/lib/python3.8/site-packages/zuul/driver/gitlab/gitlabconnection.py", line 621, in _getChange
    change = self._change_cache.updateChangeWithRetry(change_key, change,
  File "/usr/local/lib/python3.8/site-packages/zuul/zk/change_cache.py", line 432, in updateChangeWithRetry
    update_func(change)
  File "/usr/local/lib/python3.8/site-packages/zuul/driver/gitlab/gitlabconnection.py", line 619, in _update_change
    self._updateChange(c, event, mr)
  File "/usr/local/lib/python3.8/site-packages/zuul/driver/gitlab/gitlabconnection.py", line 665, in _updateChange
    change.commit_id = change.mr['diff_refs'].get('head_sha')
AttributeError: 'NoneType' object has no attribute 'get'

The "diff_refs" attribute becomes an object (with the expected keys) a few moments later. In order to avoid this situation, this change adds a mechanism to retry fetching a MR until it contains the expected fields; in our case only "diff_refs". See https://docs.gitlab.com/ee/api/merge_requests.html#response

Tests are included with that change.

Change-Id: I6f279516728def655acb8933542a02a4dbb3ccb6
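
A minimal sketch of the retry-until-populated pattern, with a hypothetical fetch_merge_request callable standing in for the GitLab API call; the attempt count and delay are illustrative:

```python
import time

def get_complete_mr(fetch_merge_request, required_fields=("diff_refs",),
                    attempts=5, delay=2):
    """Fetch a merge request, retrying until the required fields are set.

    GitLab may briefly return the MR with diff_refs set to None right after
    creation, so retry a few times before giving up.
    """
    for _ in range(attempts):
        mr = fetch_merge_request()
        if all(mr.get(field) for field in required_fields):
            return mr
        time.sleep(delay)
    raise RuntimeError("Merge request still missing fields: %s"
                       % (required_fields,))
```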

* Merge "Re-enqueue changes if dequeued missing deps" (Zuul, 2023-02-28; 1 file, +32/-5)

* Re-enqueue changes if dequeued missing deps (James E. Blair, 2023-02-20; 1 file, +32/-5)

When users create dependency cycles, the process often takes multiple steps, some of which cause changes enqueued in earlier steps to be dequeued. Users then need to re-enqueue those changes in order to have all changes in the cycle tested. This change attempts to improve this by detecting that situation and re-enqueuing changes that are being dequeued because of missing deps.

Note, there are plenty of cases where we dequeue because of missing deps and we don't want to re-enqueue (ie, a one-way dependent change ahead has failed). To restrict this to only the situation we're interested in, we only act if the dependencies are already in the pipeline. A recently updated change in a cycle will have just been added to the pipeline, so this is true in the case we're interested in. A one-way dependent change that failed will have already been removed from the pipeline, and so this will be false in cases in which we are not interested.

Change-Id: I84b3b2f8fffd1c946dafa605d1c17a37131558bd

* Merge "Match events to pipelines based on topic deps" (Zuul, 2023-02-27; 2 files, +31/-7)

* Match events to pipelines based on topic deps (James E. Blair, 2023-02-17; 2 files, +31/-7)

We distribute tenant events to pipelines based on whether the event matches the pipeline (ie, patchset-created for a check pipeline) or if the event is related to a change already in the pipeline. The latter condition means that pipelines notice quickly when dependencies are changed and they can take appropriate action (such as ejecting changes which no longer have correct dependencies).

For git and commit dependencies, an update to a cycle to add a new change requires an update to at least one existing change (for example, adding a new change to a cycle usually requires at least two Depends-On footers: the new change, as well as one of the changes already in the cycle). This means that updates to add new changes to cycles quickly come to the attention of pipelines.

However, it is possible to add a new change to a topic dependency cycle without updating any existing changes. Merely uploading a new change in the same topic adds it to the cycle. Since that new change does not appear in any existing pipelines, pipelines won't notice the update until their next natural processing cycle, at which point they will refresh dependencies of any changes they contain, and they will notice the new dependency and eject the cycle.

To align the behavior of topic dependencies with git and commit dependencies, this change causes the scheduler to refresh the dependencies of the change it is handling during tenant trigger event processing, so that it can then compare that change's dependencies to changes already in pipelines to determine if this event is potentially relevant.

This moves some work from pipeline processing (which is highly parallel) to tenant processing (which is only somewhat parallel). This could slow tenant event processing somewhat. However, the work is persisted in the change cache, and so it will not need to be repeated during pipeline processing.

This is necessary because the tenant trigger event processor operates only with the pipeline change list data; it does not perform a full pipeline refresh, so it does not have access to the current queue items and their changes in order to compare the event change's topic with currently enqueued topics.

There are some alternative ways we could implement this if the additional cost is an issue:

1) At the beginning of tenant trigger event processing, using the change list, restore each of the queue's change items from the change cache and compare topics. For large queues, this could end up generating quite a bit of ZK traffic.

2) Add the change topic to the change reference data structure so that it is stored in the change list. This is an abuse of this structure which otherwise exists only to store the minimum amount of information about a change in order to uniquely identify it.

3) Implement a PipelineTopicList similar to a PipelineChangeList for storing pipeline topics and accessing them without a full refresh.

Another alternative would be to accept the delayed event handling of topic dependencies and elect not to "fix" this behavior.

Change-Id: Ia9d691fa45d4a71a1bc78cc7a4bdec206cc025c8

* Merge "Only store trigger event info on queue item" (Zuul, 2023-02-27; 3 files, +67/-18)

* Only store trigger event info on queue item (Simon Westphahl, 2023-02-22; 3 files, +67/-18)

The event that's currently stored as part of the queue item is not sharded. This means that we can see Zookeeper disconnects when the queue item data exceeds the max. znode size of 1MB.

Since we only need the event's timestamp and the Zuul event-id after an item is enqueued, we can reduce the amount of data we store in Zookeeper and also avoid sharding the event.

Change-Id: I13577498e55fd4bb189679836219dea4dc5729fc
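
A minimal sketch of that reduction; the class and attribute names are assumptions for illustration, not Zuul's actual data model:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EventInfo:
    """The only parts of a trigger event needed after an item is enqueued."""
    timestamp: float
    zuul_event_id: str

def event_info_from(event) -> EventInfo:
    # Store this small object on the queue item instead of the full event,
    # keeping the item's znode well under the 1 MB limit without sharding.
    return EventInfo(timestamp=event.timestamp,
                     zuul_event_id=event.zuul_event_id)
```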

* Merge "[gitlab driver] fix "'EnqueueEvent' object has no attribute 'change_url'"" (Zuul, 2023-02-22; 1 file, +1/-6)

* [gitlab driver] fix "'EnqueueEvent' object has no attribute 'change_url'" (Fabien Boucher, 2022-12-15; 1 file, +1/-6)

This change fixes an issue that prevents an admin from using the zuul enqueue and enqueue-ref commands.

  File "/usr/local/lib/python3.8/site-packages/zuul/driver/gitlab/gitlabconnection.py", line 602, in _getChange
AttributeError: 'EnqueueEvent' object has no attribute 'change_url'

Change-Id: Iceaeadc64baee26adb71909122d8c55314b8e036

* Merge "Cleanup old rebase-merge dirs on repo reset" (tag: 8.2.0) (Zuul, 2023-02-17; 1 file, +27/-1)