The test helper method that handles fake gerrit queries had a bug
which would cause the "topic:" queries to return all open changes.
When we correct that, we can see, by virtue of a newly raised
exception, that there was some unexercised code in getChangesByTopic
which is now exercised. This change also corrects the exception
that is raised when mutating a set while iterating over it.
Change-Id: I1874482b2c28fd1082fcd56036afb20333232409
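For illustration, the set-mutation pitfall referenced above looks like
this in plain Python (a minimal sketch, not Zuul's actual code):

    deps = {"change-a", "change-b"}

    # Wrong: mutating a set while iterating over it raises
    # "RuntimeError: Set changed size during iteration".
    #
    #   for dep in deps:
    #       deps.discard(dep)

    # Right: iterate over a snapshot and mutate the original.
    for dep in list(deps):
        deps.discard(dep)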
Before getting a change by key, we need to convert the reference to a
`ChangeKey` object.
Traceback (most recent call last):
File "/opt/src/zuul/zuul/scheduler.py", line 2423, in process_tenant_trigger_queue
self._forward_trigger_event(event, tenant)
File "/opt/src/zuul/zuul/scheduler.py", line 2538, in _forward_trigger_event
pipeline.manager.updateCommitDependencies(change, event)
File "/opt/src/zuul/zuul/manager/__init__.py", line 924, in updateCommitDependencies
for dep in source.getChangesByTopic(change.topic):
File "/opt/src/zuul/zuul/driver/gerrit/gerritsource.py", line 171, in getChangesByTopic
git_change = self.getChange(git_key)
File "/opt/src/zuul/zuul/driver/gerrit/gerritsource.py", line 86, in getChange
return self.connection.getChange(change_key, refresh=refresh,
File "/opt/src/zuul/zuul/driver/gerrit/gerritconnection.py", line 791, in getChange
if change_key.connection_name != self.connection_name:
AttributeError: 'str' object has no attribute 'connection_name'
Change-Id: I20663b538ccb48bc13191a134caf3326b6f5b76e
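The shape of the fix can be sketched as follows; the fromReference
classmethod and the query helper are assumptions for illustration, not
necessarily Zuul's exact API:

    # Sketch only: convert a serialized reference string back into a
    # ChangeKey before looking the change up. _queryTopicKeys is a
    # hypothetical helper standing in for the Gerrit topic query.
    def getChangesByTopic(self, topic):
        changes = []
        for ref in self._queryTopicKeys(topic):
            key = ChangeKey.fromReference(ref) if isinstance(ref, str) else ref
            changes.append(self.getChange(key))
        return changes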
With newer versions of Gerrit, we are increasingly likely to encounter
systems where the traditional label requirements are minimized in favor
of the new submit requirements rules. If Gerrit is configured to use
a submit requirement instead of a traditional label blocking rule, that
is typically done by switching the label function to "NoBlock", which,
like the "NoOp" function, will still cause the label to appear in the
"submit_record" field, but with a value of "MAY" instead of "OK", "NEED",
or "REJECT".
Instead, the interesting information will be in the "submit_requirements"
field. In this field we can see the individual submit requirement rules
and whether they are satisfied or not.
Since submit requirements do not have a 1:1 mapping with labels, determining
whether an "UNSATISFIED" submit requirement should be ignored (because it
pertains to a label that Zuul will alter, like "Verified") is not as
straightforward as it is for submit records. To be conservative, this
change looks for any of the "allow needs" labels (typically "Verified") in
each unsatisfied submit requirement and, if it finds one, ignores that requirement.
With this change in place, we can avoid enqueuing changes which we are certain
cannot be merged into gate pipelines, and will continue to enqueue changes
about which we are uncertain.
Change-Id: I667181565684d97c1d036e2db6193dc606c76c57
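A rough sketch of that check against Gerrit's REST data (the field
shapes are assumed from Gerrit's documented SubmitRequirementResultInfo;
this is not Zuul's exact code):

    def can_merge(change_info, allow_needs=("Verified",)):
        # Ignore UNSATISFIED submit requirements whose expression
        # mentions a label Zuul itself will set (e.g. Verified).
        for req in change_info.get("submit_requirements", []):
            if req.get("status") != "UNSATISFIED":
                continue
            result = req.get("submittability_expression_result", {})
            expression = result.get("expression", "")
            if any(label in expression for label in allow_needs):
                continue  # pertains to a label we will alter; skip it
            return False  # genuinely unsatisfied; don't enqueue
        return True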
This command had two problems:
* It would only delete the first 50 buildsets
* Depending on DB configuration, it may not have deleted anything,
or may have left orphaned data.
We did not tell sqlalchemy to cascade delete operations, meaning that
when we deleted the buildset, we didn't delete anything else.
If the database enforces foreign keys (innodb, psql) then the command
would have failed. If it doesn't (myisam) then it would have deleted
the buildset rows but not anything else.
The tests use myisam, so they ran without error and without deleting
the builds. They check that the builds are deleted, but only through
the ORM via a joined load with the buildsets, and since the buildsets
are gone, the builds weren't returned.
To address this shortcoming, the tests now use distinct ORM methods
which return objects without any joins. This would have caught
the error had it been in place before.
Additionally, the delete operation retained the default limit of 50
rows (set in place for the web UI), meaning that when it did run,
it would only delete the most recent 50 matching builds.
We now explicitly set the limit to a user-configurable batch size
(by default, 10,000 builds) so that we keep transaction sizes
manageable and avoid monopolizing database locks. We continue deleting
buildsets in batches as long as any matching buildsets remain. This
should allow users to remove very large amounts of data without
affecting ongoing operations too much.
Change-Id: I4c678b294eeda25589b75ab1ce7c5d0b93a07df3
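A condensed sketch of both fixes with SQLAlchemy (table and attribute
names are illustrative, not Zuul's actual schema):

    import sqlalchemy as sa
    from sqlalchemy.orm import declarative_base, relationship, Session

    Base = declarative_base()

    class BuildSet(Base):
        __tablename__ = "buildset"
        id = sa.Column(sa.Integer, primary_key=True)
        # Cascade so that deleting a buildset also deletes its builds.
        builds = relationship("Build", cascade="all, delete-orphan")

    class Build(Base):
        __tablename__ = "build"
        id = sa.Column(sa.Integer, primary_key=True)
        buildset_id = sa.Column(sa.Integer, sa.ForeignKey("buildset.id"))

    def delete_buildsets(session: Session, batch_size: int = 10000):
        # Delete in explicit batches to keep transactions manageable.
        while True:
            batch = session.query(BuildSet).limit(batch_size).all()
            if not batch:
                break
            for buildset in batch:
                session.delete(buildset)  # cascades to builds
            session.commit()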
Github will use the PR title as the commit subject for squash merges, so
we don't need to include the title again in the commit description.
Change-Id: Id5a00701c236235f5a49abd025bcfad1b2da916c
In normal git usage, cherry-picking a commit that has already been
applied (and therefore does nothing), or cherry-picking an empty commit,
causes git to exit with an error to let the user decide what to do.
However, this doesn't match the behavior of merges and rebases where
non-empty commits that have already been applied are simply skipped
(empty source commits are preserved).
To fix this, add the --keep-redundant-commits option to `git cherry-pick`
to make git always keep a commit when cherry-picking even when it is
empty for either reason. Then, after the cherry-pick, check if the new
commit is empty and if so back it out if the original commit _wasn't_
empty.
This two step process is necessary because git doesn't have any options
to simply skip cherry-picked commits that have already been applied to the
tree.
Removing commits that have already been applied is particularly
important in a "deploy" pipeline triggered by a Gerrit "change-merged"
event, since the scheduler will try to cherry-pick the change on top of
the commit that just merged. Without this option, the cherry-pick will
fail and the deploy pipeline will fail with a MERGE_CONFLICT.
Change-Id: I326ba49e2268197662d11fd79e46f3c020675f21
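The two-step process can be sketched with plain git commands (the
helpers here are illustrative, not Zuul's merger code):

    import subprocess

    def run_git(repo, *args, **kwargs):
        return subprocess.run(["git", "-C", repo, *args],
                              capture_output=True, text=True, **kwargs)

    def tree_of(repo, rev):
        # A commit is empty when its tree equals its parent's tree.
        return run_git(repo, "rev-parse", rev + "^{tree}",
                       check=True).stdout.strip()

    def cherry_pick(repo, commit):
        # Keep redundant commits so the cherry-pick never fails on
        # already-applied or empty commits.
        run_git(repo, "cherry-pick", "--keep-redundant-commits",
                commit, check=True)
        new_is_empty = tree_of(repo, "HEAD") == tree_of(repo, "HEAD^")
        orig_was_empty = tree_of(repo, commit) == tree_of(repo, commit + "^")
        # Back out the new commit only if the original wasn't empty.
        if new_is_empty and not orig_was_empty:
            run_git(repo, "reset", "--hard", "HEAD^", check=True)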
Remove data under the `zuul` key from the job returned vars.
These returned values are meant to be used only by Zuul and
shouldn't be included in documents, as they may include large amounts
of data such as file comments.
Change-Id: Ie6de7e3373b21b7c234ffedd5db7d3ca5a0645b6
In the case of a bundle, Zuul should load extra-config-paths not only from
items ahead, but from all items in that bundle. Otherwise it might
throw an "invalid config" error, because the required zuul items in
extra-config-paths are not found.
Change-Id: I5c14bcb14b7f5c627fd9bd49f887dcd55803c6a1
Some people use a workflow that's known as "stacked pull requests" in
order to split a change into more reviewable chunks.
In this workflow the first PR in the stack targets a protected branch
(e.g. master) and all other PRs target the unprotected branch of the
next item in the stack.
E.g. master <- feature-A (PR#1) <- feature-B (PR#2) <- ...
Now, when the first PR in the stack is merged Github will automatically
change the target branch of dependent PRs. For the above example this
would look like the following after PR#1 is merged:
master <- feature-B (PR#2) <- ...
The problem here is that we might still process events for a PR before
the base branch change, but the Github API already returns the info
about the updated target branch.
The problem is that we used the target branch name from the
event (the outdated branch name) together with the information from the
change object (which reflects the new target branch) about whether the
target branch is protected, to determine if a branch was configured as protected.
In the above example Zuul might wrongly conclude that the "feature-A"
branch (taken from the event) is now protected.
In the related incident we also observed that this triggered a
reconfiguration with the wrong state of now two protected branches
(master + feature-A). Since the project in question previously had only
one branch, this led to a change in branch matching behavior for jobs
defined in that repository.
Change-Id: Ia037e3070aaecb05c062865a6bb0479b86e4dcde
The delete-pipeline-state command forces a layout update on every
scheduler, but that isn't strictly necessary. While it may be helpful
for some issues, if it really is necessary, the operator can issue
a tenant reconfiguration after performing the delete-pipeline-state.
In most cases, where only the state information itself is causing a
problem, we can omit the layout updates and assume that the state reset
alone is sufficient.
To that end, this change removes the layout state changes from the
delete-pipeline-state command and instead simply empties and recreates
the pipeline state and change list objects. This is very similar to
what happens in the pipeline manager _postConfig call, except in this
case, we have the tenant lock so we know we can write with impunity,
and we know we are creating objects in ZK from scratch, so we use
direct create calls.
We set the pipeline state's layout uuid to None, which will cause the
first scheduler that comes across it to (assuming its internal layout
is up to date) perform a pipeline reset (which is almost a noop on an
empty pipeline) and update the pipeline state layout to the current
tenant layout state.
Change-Id: I1c503280b516ffa7bbe4cf456d9c900b500e16b0
The delete-pipeline-state command updates the layout state in order
to force schedulers to update their local layout (essentially perform
a local-only reconfiguration). In doing so, it sets the last event
ltime to -1. This is reasonable for initializing a new system, but
in an existing system, when an event arrives at the tenant trigger
event queue it is assigned the last reconfiguration event ltime seen
by that trigger event queue. Later, when a scheduler processes such
a trigger event after the delete-pipeline-state command has run, it
will refuse to handle the event, since the event is much newer than
its local layout state.
This must then be corrected manually by the operator by forcing a
tenant reconfiguration. This means that the system essentially suffers
the delay of two sequential reconfigurations before it can proceed.
To correct this, set the last event ltime for the layout state to
the ltime of the layout state itself. This means that once a scheduler
has updated its local layout, it can proceed in processing old events.
Change-Id: I66e798adbbdd55ff1beb1ecee39c7f5a5351fc4b
Recently, on a production Zuul acting on projects hosted on gitlab.com,
it has been discovered that a merge request fetched from the
API (just after Zuul receives the merge request created event) could have
the "diff_refs" attribute set to None.
Related bug: https://gitlab.com/gitlab-org/gitlab/-/issues/386562
Leading to the following stacktrace in the logs:
2022-12-14 10:08:47,921 ERROR zuul.GitlabEventConnector: Exception handling Gitlab event:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/zuul/driver/gitlab/gitlabconnection.py", line 102, in run
self.event_queue.election.run(self._run)
File "/usr/local/lib/python3.8/site-packages/zuul/zk/election.py", line 28, in run
return super().run(func, *args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/kazoo/recipe/election.py", line 54, in run
func(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/zuul/driver/gitlab/gitlabconnection.py", line 110, in _run
self._handleEvent(event)
File "/usr/local/lib/python3.8/site-packages/zuul/driver/gitlab/gitlabconnection.py", line 246, in _handleEvent
self.connection._getChange(change_key, refresh=True,
File "/usr/local/lib/python3.8/site-packages/zuul/driver/gitlab/gitlabconnection.py", line 621, in _getChange
change = self._change_cache.updateChangeWithRetry(change_key, change,
File "/usr/local/lib/python3.8/site-packages/zuul/zk/change_cache.py", line 432, in updateChangeWithRetry
update_func(change)
File "/usr/local/lib/python3.8/site-packages/zuul/driver/gitlab/gitlabconnection.py", line 619, in _update_change
self._updateChange(c, event, mr)
File "/usr/local/lib/python3.8/site-packages/zuul/driver/gitlab/gitlabconnection.py", line 665, in _updateChange
change.commit_id = change.mr['diff_refs'].get('head_sha')
AttributeError: 'NoneType' object has no attribute 'get'
The attribute "diff_refs" becomes an object (with the expected keys) a few
moments later.
In order to avoid this situation, this change adds a mechanism to retry
fetching an MR until it contains the expected fields (in our case, only
"diff_refs").
https://docs.gitlab.com/ee/api/merge_requests.html#response
Tests are included with this change.
Change-Id: I6f279516728def655acb8933542a02a4dbb3ccb6
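The retry mechanism can be sketched like this (the fetch callable is a
stand-in, not the actual python-gitlab or Zuul API):

    import time

    def get_mr_with_retry(fetch_mr, required_fields=("diff_refs",),
                          attempts=5, delay=1.0):
        # Refetch the merge request until the expected fields are
        # populated; GitLab may briefly return diff_refs as None.
        for _ in range(attempts):
            mr = fetch_mr()
            if all(mr.get(field) for field in required_fields):
                return mr
            time.sleep(delay)
        raise RuntimeError(
            "merge request still missing %s" % (required_fields,))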
When users create dependency cycles, the process often takes multiple
steps, some of which cause changes enqueued in earlier steps to be
dequeued. Users then need to re-enqueue those changes in order to
have all changes in the cycle tested.
This change attempts to improve this by detecting that situation and
re-enqueuing changes that are being dequeued because of missing deps.
Note, there are plenty of cases where we dequeue because of missing
deps and don't want to re-enqueue (e.g., a one-way dependent change
ahead has failed). To restrict this to only the situation we're
interested in, we only act if the dependencies are already in the
pipeline. A recently updated change in a cycle will have just been
added to the pipeline, so this is true in the case we're interested in.
A one-way dependent change that failed will have already been removed
from the pipeline, and so this will be false in cases in which we
are not interested.
Change-Id: I84b3b2f8fffd1c946dafa605d1c17a37131558bd
We distribute tenant events to pipelines based on whether the event
matches the pipeline (e.g., patchset-created for a check pipeline) or
if the event is related to a change already in the pipeline. The
latter condition means that pipelines notice quickly when dependencies
are changed and they can take appropriate action (such as ejecting
changes which no longer have correct dependencies).
For git and commit dependencies, an update to a cycle to add a new
change requires an update to at least one existing change (for example,
adding a new change to a cycle usually requires at least two Depends-On
footers: the new change, as well as one of the changes already in the
cycle). This means that updates to add new changes to cycles quickly
come to the attention of pipelines.
However, it is possible to add a new change to a topic dependency cycle
without updating any existing changes. Merely uploading a new change
in the same topic adds it to the cycle. Since that new change does
not appear in any existing pipelines, pipelines won't notice the update
until their next natural processing cycle, at which point they will
refresh dependencies of any changes they contain, and they will notice
the new dependency and eject the cycle.
To align the behavior of topic dependencies with git and commit
dependencies, this change causes the scheduler to refresh the
dependencies of the change it is handling during tenant trigger event
processing, so that it can then compare that change's dependencies
to changes already in pipelines to determine if this event is
potentially relevant.
This moves some work from pipeline processing (which is highly parallel)
to tenant processing (which is only somewhat parallel). This could
slow tenant event processing somewhat. However, the work is
persisted in the change cache, and so it will not need to be repeated
during pipeline processing.
This is necessary because the tenant trigger event processor operates
only with the pipeline change list data; it does not perform a full
pipeline refresh, so it does not have access to the current queue items
and their changes in order to compare the event change's topic with
currently enqueued topics.
There are some alternative ways we could implement this if the additional
cost is an issue:
1) At the beginning of tenant trigger event processing, using the change
list, restore each of the queue's change items from the change cache
and compare topics. For large queues, this could end up generating
quite a bit of ZK traffic.
2) Add the change topic to the change reference data structure so that
it is stored in the change list. This is an abuse of this structure
which otherwise exists only to store the minimum amount of information
about a change in order to uniquely identify it.
3) Implement a PipelineTopicList similar to a PipelineChangeList for storing
pipeline topics and accessing them without a full refresh.
Another alternative would be to accept the delayed event handling of topic
dependencies and elect not to "fix" this behavior.
Change-Id: Ia9d691fa45d4a71a1bc78cc7a4bdec206cc025c8
The event that's currently stored as part of the queue item is not
sharded. This means that we can see Zookeeper disconnects when the queue
item data exceeds the maximum znode size of 1 MB.
Since we only need the event's timestamp and the Zuul event-id after an
item is enqueued, we can reduce the amount of data we store in Zookeeper
and also avoid sharding the event.
Change-Id: I13577498e55fd4bb189679836219dea4dc5729fc
When using the rebase merge-mode, a failed "merge" will leave the repo in
a state that Zuul so far could not recover from. The rebase will create
a `.git/rebase-merge` directory which is not removed when the rebase
fails.
To fix this we will abort the rebase when it fails and also remove any
existing `.git/rebase-merge` and `.git/rebase-apply` directories when
resetting the repository.
DEBUG zuul.Merger: [e: ...] Unable to merge {'branch': 'master', 'buildset_uuid': 'f7be4215f37049b4ba0236892a5d8197', 'connection': 'github', 'merge_mode': 5, 'newrev': None, 'number': 71, 'oldrev': None, 'patchset': 'e81d0b248565db290b30d9a638095947b699c76d', 'project': 'org/project', 'ref': 'refs/pull/71/head'}
Traceback (most recent call last):
File "/opt/zuul/lib/python3.10/site-packages/zuul/merger/merger.py", line 1099, in _mergeChange
commit = repo.rebaseMerge(
File "/opt/zuul/lib/python3.10/site-packages/zuul/merger/merger.py", line 626, in rebaseMerge
repo.git.rebase(*args)
File "/opt/zuul/lib/python3.10/site-packages/git/cmd.py", line 542, in <lambda>
return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
File "/opt/zuul/lib/python3.10/site-packages/git/cmd.py", line 1005, in _call_process
return self.execute(call, **exec_kwargs)
File "/opt/zuul/lib/python3.10/site-packages/git/cmd.py", line 822, in execute
raise GitCommandError(command, status, stderr_value, stdout_value)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
cmdline: git rebase 39fead1852ef01a716a1c6470cee9e4197ff5587
stderr: 'fatal: It seems that there is already a rebase-merge directory, and
I wonder if you are in the middle of another rebase. If that is the
case, please try
git rebase (--continue | --abort | --skip)
If that is not the case, please
rm -fr ".git/rebase-merge"
and run me again. I am stopping in case you still have something
valuable there.
Change-Id: I8518cc5e4b3f7bbfc2c2283a2b946dee504991dd
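The cleanup can be sketched as follows (a simplification of the idea,
not Zuul's merger code):

    import os
    import shutil
    import subprocess

    def reset_repo(repo_dir):
        # Abort any in-progress rebase; this fails harmlessly when no
        # rebase is in progress, so we don't check the exit code.
        subprocess.run(["git", "-C", repo_dir, "rebase", "--abort"],
                       stderr=subprocess.DEVNULL)
        # Remove leftover rebase state directories just in case.
        for leftover in ("rebase-merge", "rebase-apply"):
            path = os.path.join(repo_dir, ".git", leftover)
            shutil.rmtree(path, ignore_errors=True)
        subprocess.run(["git", "-C", repo_dir, "reset", "--hard", "HEAD"],
                       check=True)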
When a pull request is updated concurrently on a scheduler, we wait
until the first thread has updated the change. The problem arises when we
needed to create a new PR object: in that case we'd return a Github
change that wasn't updated and also didn't have a cache key set.
ERROR zuul.Scheduler: Exception processing pipeline check in tenant foobar
Traceback (most recent call last):
File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2149, in process_pipelines
refreshed = self._process_pipeline(
File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2241, in _process_pipeline
self.process_pipeline_trigger_queue(tenant, pipeline)
File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2447, in process_pipeline_trigger_queue
self._process_trigger_event(tenant, pipeline, event)
File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2480, in _process_trigger_event
pipeline.manager.addChange(change, event)
File "/opt/zuul/lib/python3.10/site-packages/zuul/manager/__init__.py", line 534, in addChange
self.updateCommitDependencies(change, None, event)
File "/opt/zuul/lib/python3.10/site-packages/zuul/manager/__init__.py", line 868, in updateCommitDependencies
new_commit_needs_changes = [d.cache_key for d in dependencies]
File "/opt/zuul/lib/python3.10/site-packages/zuul/manager/__init__.py", line 868, in <listcomp>
new_commit_needs_changes = [d.cache_key for d in dependencies]
File "/opt/zuul/lib/python3.10/site-packages/zuul/model.py", line 5946, in cache_key
return self.cache_stat.key.reference
AttributeError: 'NoneType' object has no attribute 'key'
Change-Id: I2f3012060c486d593ad857e046334c3d9bef0ed5
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Make sure to update the last reconfig event ltime in the layout state
during a (smart) reconfig. This is important in order to unblock
pipelines when a tenant reconfig event is lost.
So far the last reconfig event ltime was passed as -1, but wasn't set in
the layout state since the ltime is not allowed to go backward.
Change-Id: Iab04a962abbfbe901c22e4d5f1d484e3f53b0d33
In order to better understand scheduler run handler performance, this
change adds a new `zuul.scheduler.run_handler` metric to measure the
duration of one run handler loop.
Change-Id: I77e862cf99d6a8445e71d7daab410d5853487dc3
If a branch name contains unicode characters that are more than 1-byte
wide, the size in bytes of the pack record won't match the size in
characters, and the pack parsing will be incorrect.
Instead, treat everything as an encoded byte string until parsing is
done - and only decode when handling a single, parsed, record.
Change-Id: I7f1a0cc96a36129fbc04c7a8687da3f66c1eef99
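A small illustration of the mismatch and the fix (the record content is
made up for the example):

    branch = "änderung/feature"  # "ä" is two bytes in UTF-8
    record = ("0043 refs/heads/%s" % branch).encode("utf-8")

    # The character count understates the size in bytes:
    assert len(record.decode("utf-8")) < len(record)

    # So parse on bytes, and decode only the final parsed field:
    ref = record.split(b" ", 1)[1].decode("utf-8")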
This addresses SQLAlchemy's "removed in 2.0" warnings. Now that SQLAlchemy
2.0 has been released, we can see that we are not yet compatible. A good
first step toward compatibility is fixing the warnings in 1.4.
In particular there are four types of warning we fix here:
1. Using raw strings in conn.execute() calls. We need to use the text()
construct instead.
2. Passing a list of items to select() when doing select queries. Instead
we need to pass them as normal positional arguments.
3. Accessing row result items as if the row were a dict. This is no
longer possible without first going through the row._mapping system.
Instead we can access items as normal object attributes.
4. You must now use sqlalchemy.inspect() on a connectable to create an
Inspector object rather than instantiating it directly.
Finally we set up alembic's engine creation to run with future 2.0
behavior now that the warnings are cleared up. This appears to have
already been done for the main zuul application.
Change-Id: I5475e39bd93d71cd1106ec6d3a5423ea2dd51859
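The four fixes look roughly like this in SQLAlchemy 1.4 (a minimal
sketch against a throwaway schema, not Zuul's actual queries):

    import sqlalchemy as sa

    engine = sa.create_engine("sqlite://", future=True)
    metadata = sa.MetaData()
    users = sa.Table("users", metadata, sa.Column("name", sa.String))
    metadata.create_all(engine)

    with engine.connect() as conn:
        # 1. Wrap raw SQL in text() rather than passing a bare string.
        conn.execute(sa.text("INSERT INTO users (name) VALUES ('alice')"))
        # 2. Pass columns to select() as positional args, not as a list.
        result = conn.execute(sa.select(users.c.name))
        # 3. Access columns as attributes (or via row._mapping), not
        #    with dict-style indexing.
        names = [row.name for row in result]
        conn.commit()

    # 4. Create an Inspector with sa.inspect() instead of directly.
    print(sa.inspect(engine).get_table_names())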
I've managed to get better at grepping for this, and that found some of
the stragglers. They are all file opens without corresponding closes, fixed
by using a with open() context manager.
Change-Id: I7b8c8516a86558e2027cb0f01aefe2dd1849069c
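The pattern being fixed, in miniature (the file name is arbitrary):

    # Before: the handle leaks if read() raises.
    f = open("zuul.conf")
    data = f.read()
    f.close()

    # After: the context manager closes the file even on error.
    with open("zuul.conf") as f:
        data = f.read()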
PyYAML is efficient with YAML anchors and will only construct one
Python object for a YAML mapping that is used multiple times via
anchors.
We copy job variables (a mapping) straight from the YAML to an
attribute of the Job object, then we freeze the Job object which
converts all of its dicts to mappingproxy objects. This mutates
the contents of the vars dict, and if that is used on another
job via an anchor, we will end up with mappingproxy objects in
what we think is a "raw yaml" configuration dict, and we will not
be able to serialize it in case of errors.
To address this, perform a deep copy of every configuration yaml
blob before parsing it into an object.
Change-Id: Idaa6ff78b1ac5a108fb9f43700cf8e66192c43ce
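The aliasing behavior is easy to demonstrate (a minimal sketch of the
hazard and the fix, not Zuul's configloader code):

    import copy
    import yaml

    doc = "common: &vars {foo: 1}\njob_a: *vars\njob_b: *vars\n"
    data = yaml.safe_load(doc)
    # PyYAML constructs a single object for the anchor and reuses it
    # for every alias:
    assert data["job_a"] is data["job_b"]

    # Deep-copying a blob before handing it to the Job object means
    # that freezing the copy cannot mutate the shared raw-YAML dict:
    job_vars = copy.deepcopy(data["job_a"])
    assert job_vars is not data["job_b"]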
This argument is no longer necessary because it is set via an indirect
route through the pipeline and tenant objects. Remove it, and also
validate that it is set correctly in unit tests.
Change-Id: I62ce18a61a416cbc397866838f8ac3b0ec1bd564
Simon Westphahl describes the race condition:
> [The race condition] can occur after a reconfiguration while
> some schedulers are updating their local layout and some
> already start processing pipelines in the same tenant.
>
> In this case the pipeline manager's `_postConfig()` method that
> calls `PipelineChangeList.create(...)` races with the pipeline
> processor updating the change keys.
>
> This leads to two change lists being written as separate
> shards, that can't be correctly loaded, as all shards combined
> are expected to form a single JSON object.
>
> The sequence of events seems to be the following:
> 1. S1: pipeline processor needs to update the change keys
> 2. S1 the shard writer will delete the `change_list` key with the old
> shards
> 3. S2: configloader calls the `_postConfig()` method
> 4. S2: `PipelineChangeList.create()` notices that the `change_list` node
> doesn't exist in Zookeeper:
> https://opendev.org/zuul/zuul/src/branch/master/zuul/model.py#L921
> 6. S2: the shard writer creates the first shard `0000000000`
> 7. S1: the shard writer creates the second shard `0000000001`
>
> The race condition seems to be introduced with
> Ib1e467b5adb907f93bab0de61da84d2efc22e2a7
That change updated the pipeline manager _postConfig method so
that it no longer acquires the pipeline lock when initializing the
pipeline state and change lists. This greatly reduces potential
delays during reconfiguration, but has, perhaps predictably, led
to the race condition above.
In the commit message for that change, I noted that we might be
able to avoid even more work if we accept some caveats related to
error reporting. Avoiding that work means avoiding performing any
writes during _postConfig, which addresses the root cause of the
race condition (steps 3-6 above. Ed. note: there is no step 5).
From the commit message:
> We still need to attach new state and change list objects to
> the pipeline in _postConfig (since our pipeline object is new).
> We also should make sure that the objects exist in ZK before we
> leave that method, so that if a new pipeline is created, other
> schedulers will be able to load the (potentially still empty)
> objects from ZK. As an alternative, we could avoid even this
> work in _postConfig, but then we would have to handle missing
> objects on refresh, and it would not be possible to tell if the
> object was missing due to it being new or due to an error. To
> avoid masking errors, we keep the current expectation that we
> will create these objects in ZK on the initial reconfiguration.
The current change does exactly that. We no longer perform any
ZK write operations on the state and change list objects in
_postConfig. Instead, inside of the refresh methods, we detect
the cases where they should be newly created and do so at that
time. This happens with the pipeline lock, so is safe against
any simultaneous operation from other components.
There will be "ERROR" level log messages indicating that reading
the state from ZK has failed when these objects are first
initialized. To indicate that this is probably okay, they will
now be immediately followed by "WARNING" level messages explaining
that.
Strictly speaking, this particular race should only occur for the
change list object, not the pipeline state, since the race
condition above requires a sharded object and of the two, only
the change list is sharded. However, to keep the lifecycle of
these two objects matched (and to simplify _postConfig) the same
treatment is applied to both.
Note that change I7fa99cd83a857216321f8d946fd42abd9ec427a3 merged
after Ib1e467b and changed the behavior slightly, introducing the
old_state and old_list arguments. Curiously, the old_list
argument is effectively unused, so it is removed entirely in this
change. Old_state still has a purpose and is retained.
Change-Id: I519348e7d5d74e675808e990920480fb6e1fb981
These tests call getChangeQueue which mutates the pipeline state
objects in ZK. They should therefore hold the pipeline lock while
doing this. Otherwise, cleanup routines may run during the test
and delete the state out from under them.
Change-Id: If85d3cf66669f5786203309294528e1f528b0423
In I9628e2770dda120b269612e28bb6217036942b8e we switched zuul.message from
a plain string tagged with !unsafe to base64 encoded with no !unsafe tag.
The idea was to make the inventory file parseable by external tools while
avoiding accidental interpolation of the commit message by Ansible.
That doesn't work in all cases -- it's not hard to construct a scenario
where after base64 decoding the message any further processing by Ansible
causes it to undergo interpolation. Moreover, since then we have made
many changes to how we deal with variables; notably, the inventory.yaml
is no longer actually used by Zuul's Ansible -- it is now there only
for human and downstream processing. We call it the "debug inventory".
The actual inventory is much more complex and in some cases has lots of
!unsafe tags in it.
Given all that, it now seems like the most straightforward way to deal
with this is to tag the message variable as !unsafe when passing it to
Zuul's Ansible, but render it as plain text in the inventory.yaml.
To address backwards compatibility, this is done in a new variable called
zuul.change_message. Since that's a more descriptive variable anyway,
we will just keep that one in the future and drop the current base64-
encoded zuul.message variable.
Change-Id: Iea86de15e722bc271c1bf0540db2c9efb032500c
Commit 549103881899751e84b411212f7820cde502b42b appears to have
had an error. It attempted to only refresh changes if they are
in the pipeline or are dependencies of changes in the pipeline,
but it only did the latter, which has an adverse effect for the
GitHub driver. We can change dependencies of GitHub changes
without creating new patchsets, which means that this error
(not updating dependencies of changes in the pipeline) would cause
Zuul not to eject a GitHub change from a pipeline when its
dependencies changed via a PR message update.
The error went unnoticed with Gerrit since dependency updates
happen with new patchsets, which themselves cause changes to be
ejected.
This change corrects the behavior.
Repetitive calls to updateCommitDependencies are also eliminated
by creating a set of changes to update across the whole pipeline.
Finally, two tests are added, one for Gerrit (which previously would
have passed) and one for GitHub (which previously would have failed).
Change-Id: I005820223471e8f6372ef70730f381918ec9c1af
Since python 3.10 ssl.PROTOCOL_TLS has been deprecated. We are expected
to use ssl.PROTOCOL_TLS_CLIENT and ssl.PROTOCOL_TLS_SERVER depending on
how the sockets are to be used. Switch over to these new constants to
avoid the DeprecationWarning.
One thing to note is that PROTOCOL_TLS_CLIENT has default behaviors
around cert verification and hostname checking. Zuul is already
explicitly setting those options the way it wants to and I've left that
alone to avoid trouble if the defaults change later.
Finally, this doesn't fix the occurrence of this error that happens
within kazoo. A separate PR has been made upstream to kazoo, and this
should be fixed in the next kazoo release.
Change-Id: Ib41640f1d33d60503066464c8c98f865a74f003a
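The replacement constants in use (a minimal sketch; note the
client-side defaults):

    import ssl

    # Client sockets: PROTOCOL_TLS_CLIENT enables certificate
    # verification and hostname checking by default.
    client_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Zuul already sets these explicitly rather than relying on the
    # defaults, which guards against future default changes:
    client_ctx.check_hostname = True
    client_ctx.verify_mode = ssl.CERT_REQUIRED

    # Server sockets:
    server_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)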