| Commit message | Author | Age | Files | Lines |
This updates the Gerrit driver to match the pattern in the GitHub
driver: instead of specifying individual trigger requirements
such as "require-approvals", a complete ref filter (a la
"requirements") can be embedded in the trigger filter.
The "require-approvals" and "reject-approvals" attributes are
deprecated in favor of the new approach.
Additionally, all require filters in Gerrit are now available as
reject filters.
And finally, the Gerrit filters are updated to return
FalseWithReason so that log messages are more useful, and the
GitHub filters are updated to improve the language, avoid
apostrophes for ease of grepping, and match the new Gerrit
filters.
Change-Id: Ia9c749f1c8e318fe01e84e52831a9d0d2c10b203
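The FalseWithReason pattern mentioned above can be sketched as a falsy
object that carries an explanation, so a failed filter match produces a
useful log message instead of a bare False. This is a minimal
illustration; the class and function names here are not Zuul's actual
implementation:

```python
class FalseWithReason:
    """Evaluates as False but carries a human-readable reason."""

    def __init__(self, reason):
        self.reason = reason

    def __bool__(self):
        return False

    def __str__(self):
        return self.reason


def matches_approvals(change_approvals, required):
    # Hypothetical filter check: report which approval was missing.
    for approval, value in required.items():
        if change_approvals.get(approval) != value:
            return FalseWithReason(
                "approval %s=%s not found" % (approval, value))
    return True
```

A caller can then log `str(result)` whenever `not result`, rather than
only knowing that the filter did not match.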
This mimics a useful feature of the Gerrit driver and allows users
to configure pipelines that trigger on events but only if certain
conditions of the PR are met.
Unlike the Gerrit driver, this embeds the entire require/reject
filter within the trigger filter (the trigger filter has-a require
or reject filter). This makes the code simpler and is easier for
users to configure. If we like this approach, we should migrate the
Gerrit driver as well, and perhaps the other drivers.
The "require-status" attribute already existed, but was undocumented.
This documents it, adds backwards-compat handling for it, and
deprecates it.
Some documentation typos are also corrected.
Change-Id: I4b6dd8c970691b1e74ffd5a96c2be4b8075f1a87
MySQL 8 no longer supports implicitly creating a user using the GRANT
statement. Use a separate CREATE USER statement instead.
ERROR 1064 (42000) at line 1: You have an error in your SQL syntax;
check the manual that corresponds to your MySQL server version for the
right syntax to use near 'identified by 'openstack_citest' WITH GRANT
OPTION' at line 1
Change-Id: I4cab4c1855d1ba97cbfc9dd0835b3d302d73aa62
When using the rebase merge-mode a failed "merge" will leave the repo in
a state that Zuul so far could not recover from. The rebase will create
a `.git/rebase-merge` directory which is not removed when the rebase
fails.
To fix this we will abort the rebase when it fails and also remove any
existing `.git/rebase-merge` and `.git/rebase-apply` directories when
resetting the repository.
DEBUG zuul.Merger: [e: ...] Unable to merge {'branch': 'master', 'buildset_uuid': 'f7be4215f37049b4ba0236892a5d8197', 'connection': 'github', 'merge_mode': 5, 'newrev': None, 'number': 71, 'oldrev': None, 'patchset': 'e81d0b248565db290b30d9a638095947b699c76d', 'project': 'org/project', 'ref': 'refs/pull/71/head'}
Traceback (most recent call last):
File "/opt/zuul/lib/python3.10/site-packages/zuul/merger/merger.py", line 1099, in _mergeChange
commit = repo.rebaseMerge(
File "/opt/zuul/lib/python3.10/site-packages/zuul/merger/merger.py", line 626, in rebaseMerge
repo.git.rebase(*args)
File "/opt/zuul/lib/python3.10/site-packages/git/cmd.py", line 542, in <lambda>
return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
File "/opt/zuul/lib/python3.10/site-packages/git/cmd.py", line 1005, in _call_process
return self.execute(call, **exec_kwargs)
File "/opt/zuul/lib/python3.10/site-packages/git/cmd.py", line 822, in execute
raise GitCommandError(command, status, stderr_value, stdout_value)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
cmdline: git rebase 39fead1852ef01a716a1c6470cee9e4197ff5587
stderr: 'fatal: It seems that there is already a rebase-merge directory, and
I wonder if you are in the middle of another rebase. If that is the
case, please try
git rebase (--continue | --abort | --skip)
If that is not the case, please
rm -fr ".git/rebase-merge"
and run me again. I am stopping in case you still have something
valuable there.
Change-Id: I8518cc5e4b3f7bbfc2c2283a2b946dee504991dd
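The recovery described above can be sketched with stdlib calls only.
The real merger uses GitPython and also aborts the in-progress rebase;
the function name and the shape of this helper are illustrative:

```python
import os
import shutil


def cleanup_stale_rebase(git_dir):
    """Remove leftover rebase state so a later reset can succeed.

    A failed rebase leaves .git/rebase-merge (or .git/rebase-apply)
    behind, and git refuses to start a new rebase while either exists.
    """
    removed = []
    for name in ("rebase-merge", "rebase-apply"):
        path = os.path.join(git_dir, name)
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(name)
    return removed
```

Running this while resetting the repository makes the reset idempotent
with respect to a previously failed rebase.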
When a pull-request is updated concurrently on a scheduler, we'll wait
until the first thread has updated the change. The problem is if we
needed to create a new PR object: in this case we'd return a GitHub
change that wasn't updated and also didn't have a cache key set.
ERROR zuul.Scheduler: Exception processing pipeline check in tenant foobar
Traceback (most recent call last):
File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2149, in process_pipelines
refreshed = self._process_pipeline(
File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2241, in _process_pipeline
self.process_pipeline_trigger_queue(tenant, pipeline)
File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2447, in process_pipeline_trigger_queue
self._process_trigger_event(tenant, pipeline, event)
File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2480, in _process_trigger_event
pipeline.manager.addChange(change, event)
File "/opt/zuul/lib/python3.10/site-packages/zuul/manager/__init__.py", line 534, in addChange
self.updateCommitDependencies(change, None, event)
File "/opt/zuul/lib/python3.10/site-packages/zuul/manager/__init__.py", line 868, in updateCommitDependencies
new_commit_needs_changes = [d.cache_key for d in dependencies]
File "/opt/zuul/lib/python3.10/site-packages/zuul/manager/__init__.py", line 868, in <listcomp>
new_commit_needs_changes = [d.cache_key for d in dependencies]
File "/opt/zuul/lib/python3.10/site-packages/zuul/model.py", line 5946, in cache_key
return self.cache_stat.key.reference
AttributeError: 'NoneType' object has no attribute 'key'
Change-Id: I2f3012060c486d593ad857e046334c3d9bef0ed5
Make sure to update the last reconfig event ltime in the layout state
during a (smart) reconfig. This is important in order to unblock
pipelines when a tenant reconfig event is lost.
Previously, the last reconfig event ltime was passed as -1, but wasn't
set in the layout state, since the ltime is not allowed to go backward.
Change-Id: Iab04a962abbfbe901c22e4d5f1d484e3f53b0d33
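The "must not go backward" constraint can be sketched as a monotonic
update; the field name here is hypothetical, not Zuul's actual layout
state schema. Passing -1 (no event) then naturally leaves the stored
value alone, which is why a smart reconfig needs to pass the real ltime:

```python
def update_last_reconfig_ltime(layout_state, event_ltime):
    """Record the reconfig event ltime without ever moving it backward."""
    current = layout_state.get("last_reconfig_event_ltime", -1)
    layout_state["last_reconfig_event_ltime"] = max(current, event_ltime)
    return layout_state["last_reconfig_event_ltime"]
```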
We allow job.ansible-version to be a str, int, or float and coerce
it to a string. We should do the same for
tenant.default-ansible-version.
Change-Id: I1a104b84d578f4932597893f62cd0cd06f031b4a
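The coercion described above can be sketched as follows; this is a
simplified stand-in, not the actual schema validation code:

```python
def coerce_ansible_version(value):
    """Accept str, int, or float and normalize to a string."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool):
        raise ValueError("booleans are not valid versions")
    if isinstance(value, (str, int, float)):
        return str(value)
    raise ValueError("unsupported type: %r" % type(value))
```

Note that unquoted YAML scalars like `6` or `2.9` arrive as int or
float, which is why the coercion is needed at all.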
Recently I debugged an issue that reproduced on MySQL 8.0 on Jammy but
not on my local system. It took quite some time for me to realize that
there was a difference in database versions which ended up being
an important detail. Update our docker-compose to better match what we
get in CI.
Change-Id: I7de268acb81680f3e6b7d3b1aa057e7babd3fa62
ERROR zuul.Scheduler: Exception processing pipeline check in tenant foobar
Traceback (most recent call last):
File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2149, in process_pipelines
refreshed = self._process_pipeline(
File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2241, in _process_pipeline
self.process_pipeline_trigger_queue(tenant, pipeline)
File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2447, in process_pipeline_trigger_queue
self._process_trigger_event(tenant, pipeline, event)
File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2480, in _process_trigger_event
pipeline.manager.addChange(change, event)
File "/opt/zuul/lib/python3.10/site-packages/zuul/manager/__init__.py", line 534, in addChange
self.updateCommitDependencies(change, None, event)
File "/opt/zuul/lib/python3.10/site-packages/zuul/manager/__init__.py", line 864, in updateCommitDependencies
dep = source.getChangeByURLWithRetry(match, event)
File "/opt/zuul/lib/python3.10/site-packages/zuul/source/__init__.py", line 112, in getChangeByURLWithRetry
return dep
UnboundLocalError: local variable 'dep' referenced before assignment
Change-Id: I1c706e5e5d2d337ec84b8fc1ad5e900191f2362c
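The UnboundLocalError points at a retry loop whose result variable is
only bound on success. A safe shape binds it before the loop so every
path to the return statement has a defined value; the function and
exception names here are illustrative, not Zuul's actual code:

```python
def get_change_by_url_with_retry(fetch, retries=3):
    """Call fetch() up to `retries` times, re-raising the last error.

    `dep` is bound before the loop, so a caller never hits
    UnboundLocalError even if the control flow is later changed.
    """
    dep = None
    for attempt in range(retries):
        try:
            dep = fetch()
            break
        except IOError:
            if attempt == retries - 1:
                raise
    return dep
```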
At the moment, the API call for fetching the SSH public key is not
documented anywhere except ``job-content.rst``, which is confusing
when a user tries to look through the available calls.
Change-Id: I69a8d8994b1b4e49ac2c5b07690ebb9ff2a62e38
This fixes a race condition when two schedulers request files for the
same repository at roughly the same time.
The files cache ltime is generated before creating the cat job. We also
cannot guarantee the order in which the job results are processed.
Since we include the files cache ltime in the min. ltimes as part
of the layout state we need to make sure that the timestamp of the cache
doesn't go backward.
This should fix the following exception we occasionally see in
zuul-web:
WARNING zuul.ConfigLoader: Zuul encountered an error while accessing the repo org/project.
The error was:
Configuration files missing from cache. Check Zuul scheduler logs for more information.
Change-Id: I9fe2ed0b7a95dd12bb978273673e905d06bee72d
This allows the use of the PBR_VERSION environment variable when
building container images. This facilitates custom version numbers
for builds.
Change-Id: Ib0156836285a798ebe184691d109301bdf751efb
In comments on I3824af6149bf27c41a8d895fc682236bd0d91f6b Clark
Boylan suggested that an equality check of the version number
might be a simpler and more robust way of checking that an object
should be refreshed. This change implements that suggestion.
Change-Id: I9261fc68442e4002c4579db4f99d3c151ffd485c
Currently the zkprofile command only profiles the pipeline refresh
operations. This expands the scope to include all pipeline operations
(refreshing and then any subsequent changes). In particular, this
will help identify places where we write too much data.
Change-Id: Ida7bcb311b5037f6ce15a166ddeae6db44ed6709
This change adds distinct log messages for certain run handler steps.
So far some of those phases could only be inferred by other related log
messages.
Change-Id: Icceca78456b20754cbebb5747da3a538586cd5f5
In order to better understand the scheduler run handler performance this
change adds a new `zuul.scheduler.run_handler` metric to measure the
duration of one run handler loop.
Change-Id: I77e862cf99d6a8445e71d7daab410d5853487dc3
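Measuring one loop iteration can be sketched as below. The
`emit_timing` callable is a stand-in for a statsd timer; Zuul's actual
statsd client API may differ:

```python
import time


def timed_run_handler(handler, emit_timing):
    """Run one scheduler loop iteration and report its duration.

    emit_timing(name, milliseconds) stands in for a statsd timing call.
    The finally block ensures a sample is emitted even if the handler
    raises.
    """
    start = time.monotonic()
    try:
        handler()
    finally:
        elapsed_ms = (time.monotonic() - start) * 1000
        emit_timing("zuul.scheduler.run_handler", elapsed_ms)
```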
If a branch name contains unicode characters that are more than 1-byte
wide, the size in bytes of the pack record won't match the size in
characters, and the pack parsing will be incorrect.
Instead, treat everything as an encoded byte string until parsing is
done - and only decode when handling a single, parsed, record.
Change-Id: I7f1a0cc96a36129fbc04c7a8687da3f66c1eef99
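The byte-versus-character mismatch is easy to demonstrate: pack records
give lengths in bytes, so parsing must slice the encoded byte string
and decode only afterwards. The helper name here is illustrative:

```python
def split_record(buf, size):
    """Split a pack-style buffer at `size` *bytes*, then decode.

    Splitting the decoded string at `size` characters would be wrong
    whenever the record contains multi-byte UTF-8 characters.
    """
    record, rest = buf[:size], buf[size:]
    return record.decode("utf-8"), rest


branch = "refs/heads/brañch"          # 17 characters...
encoded = branch.encode("utf-8")      # ...but 18 bytes
```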
Change-Id: I072d81f6cfd489cf1cf69189eeb547e5cb68bebb
I7fa99cd83a857216321f8d946fd42abd9ec427a3 replaced the pipeline
reference of the `QueueItem` with a property. However, the pipeline was
still passed to the `fromZK()` method when deserializing the bundle.
ERROR zuul.Scheduler: Exception processing pipeline check in tenant foobar
Traceback (most recent call last):
File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2149, in process_pipelines
refreshed = self._process_pipeline(
File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2227, in _process_pipeline
pipeline.state.refresh(ctx)
File "/opt/zuul/lib/python3.10/site-packages/zuul/model.py", line 715, in refresh
return super().refresh(context)
File "/opt/zuul/lib/python3.10/site-packages/zuul/zk/zkobject.py", line 238, in refresh
self._load(context)
File "/opt/zuul/lib/python3.10/site-packages/zuul/zk/zkobject.py", line 361, in _load
self._set(**self.deserialize(data, context))
File "/opt/zuul/lib/python3.10/site-packages/zuul/model.py", line 762, in deserialize
queue.refresh(context)
File "/opt/zuul/lib/python3.10/site-packages/zuul/zk/zkobject.py", line 238, in refresh
self._load(context)
File "/opt/zuul/lib/python3.10/site-packages/zuul/zk/zkobject.py", line 361, in _load
self._set(**self.deserialize(data, context))
File "/opt/zuul/lib/python3.10/site-packages/zuul/model.py", line 1103, in deserialize
Bundle.deserialize(context, self, items_by_path, bundle_data)))
File "/opt/zuul/lib/python3.10/site-packages/zuul/model.py", line 5912, in deserialize
bundle.items = [
File "/opt/zuul/lib/python3.10/site-packages/zuul/model.py", line 5913, in <listcomp>
items_by_path.get(p) or QueueItem.fromZK(
File "/opt/zuul/lib/python3.10/site-packages/zuul/zk/zkobject.py", line 232, in fromZK
obj._set(**kw)
File "/opt/zuul/lib/python3.10/site-packages/zuul/zk/zkobject.py", line 420, in _set
super().__setattr__(name, value)
AttributeError: can't set attribute 'pipeline'
Change-Id: Ia8de8f6717b3a1211cf4d1841e56ccd0c78ee677
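The AttributeError above is the standard failure mode of assigning to a
property that only defines a getter. A minimal reproduction (class
names only echo the commit, not the real implementation):

```python
class QueueItemSketch:
    """pipeline is derived from the queue, so it has no setter."""

    def __init__(self, queue):
        self.queue = queue

    @property
    def pipeline(self):
        return self.queue["pipeline"]
```

Passing `pipeline` among the deserialized attributes of `fromZK()`
leads to exactly this assignment, hence the fix of no longer passing it.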
zuul/lib/repl.py:29: DeprecationWarning: currentThread() is deprecated, use current_thread() instead
obj = self.files.get(threading.currentThread(), self.default)
Change-Id: I6f5a9b6b169b882024a41623364eefe0955796ef
Our test suite no longer complains with RemovedIn20 warnings for
sqlalchemy 2.0 incompatibilities. Check if this works now.
Change-Id: I0ffab3788493dcddf39a1262813293abb6611c19
SqlAlchemy 2.0 has gotten a lot more picky about transactions. In
addition to needing to explicitly set up transactions using SqlAlchemy
2.0 it seems alembic's get_current_revision() call cannot be in the same
transaction as the alembic migrate/stamp calls with MySQL 8.0.
In particular the get_current_revision call seems to get a
SHARED_HIGH_PRIO lock on the alembic_version table. This prevents the
migrate/stamp calls from creating the alembic_version table as this
requires an EXCLUSIVE lock. The SHARED_HIGH_PRIO lock appears to be in
place as long as the get_current_revision transaction is active. To fix
this we simplify our migration tooling and put get_current_revision in a
transaction block of its own. The rest of our migrate function calls
into functions that set up new transactions, and it doesn't need to
be in this block.
Change-Id: Ic71ddf1968610784cef72c4634ccec3a18855a0e
This addresses SqlAlchemy's removed in 2.0 warnings. Now that SqlAlchemy
2.0 has released we can see that we are not compatible yet. A good first
step in adding compatibility is fixing warnings in 1.4.
In particular there are four types of warning we fix here:
1. Using raw strings in conn.execute() calls. We need to use the text()
construct instead.
2. Passing a list of items to select when doing select queries. Instead
we need to pass things as normal posargs.
3. Accessing row result items as if the row is a dict. This is no
longer possible without first going through the row._mapping system.
Instead we can access items as normal object attributes.
4. You must now use sqlalchemy.inspect() on a connectable to create an
Inspector object rather than instantiating it directly.
Finally we set up alembic's engine creation to run with future 2.0
behavior now that the warnings are cleared up. This appears to have
already been done for the main zuul application.
Change-Id: I5475e39bd93d71cd1106ec6d3a5423ea2dd51859
I've managed to get better at grepping for this, and this finds some of
the stragglers. They are all file opens without corresponding closes,
fixed by using a with open() context manager.
Change-Id: I7b8c8516a86558e2027cb0f01aefe2dd1849069c
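The fix pattern is the standard context manager, which closes the file
even if reading raises:

```python
def read_text(path):
    """Read a file with automatic close (no leaked file handle)."""
    with open(path) as f:
        return f.read()
```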
PyYAML is efficient with YAML anchors and will only construct one
Python object for a YAML mapping that is used multiple times via
anchors.
We copy job variables (a mapping) straight from the YAML to an
attribute of the Job object, then we freeze the Job object which
converts all of its dicts to mappingproxy objects. This mutates
the contents of the vars dict, and if that is used on another
job via an anchor, we will end up with mappingproxy objects in
what we think is a "raw yaml" configuration dict, and we will not
be able to serialize it in case of errors.
To address this, perform a deep copy of every configuration yaml
blob before parsing it into an object.
Change-Id: Idaa6ff78b1ac5a108fb9f43700cf8e66192c43ce
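The failure mode can be reproduced without PyYAML: two jobs share one
vars dict (as anchors produce), an in-place freeze swaps nested dicts
for mappingproxy objects, and serializing the other job's "raw" view
then fails. A deep copy before freezing avoids it. `freeze()` here is a
simplified stand-in for Zuul's Job freezing:

```python
import copy
import json
from types import MappingProxyType


def freeze(d):
    """Recursively replace nested dicts with read-only mappingproxies."""
    for key, value in d.items():
        if isinstance(value, dict):
            d[key] = MappingProxyType(freeze(value))
    return d


shared_vars = {"nested": {"a": 1}}   # one object, referenced via anchors
job_a = {"vars": shared_vars}
job_b = {"vars": shared_vars}        # the "raw yaml" view of the same dict

freeze(job_a)                        # mutates shared_vars in place
```

After the freeze, `job_b["vars"]["nested"]` is a mappingproxy, so
`json.dumps(job_b)` raises TypeError; deep-copying the raw config
before parsing keeps the original serializable.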
The previous change to add build/job object version numbers in ZK
was perhaps too conservative with the way it updated them. It
updated a copy of the version dictionary in order to avoid updating
the in-memory dictionary in case the ZK write failed. This behavior
is common for calls to updateAttributes, but we don't get this
behavior when we use activeContext, so we generally don't rely on
it.
This change makes it slightly more efficient by updating the
in-memory dictionary in place before writing it to ZK.
Change-Id: If48d745ceefc0897946232947e5c6056fc833041
This argument is no longer necessary because it is set via an indirect
route through the pipeline and tenant objects. Remove it, and also
validate that it is set correctly in unit tests.
Change-Id: I62ce18a61a416cbc397866838f8ac3b0ec1bd564
Simon Westphahl describes the race condition:
> [The race condition] can occur after a reconfiguration while
> some schedulers are updating their local layout and some
> already start processing pipelines in the same tenant.
>
> In this case the pipeline manager's `_postConfig()` method that
> calls `PipelineChangeList.create(...)` races with the pipeline
> processor updating the change keys.
>
> This leads to two change lists being written as separate
> shards, that can't be correctly loaded, as all shards combined
> are expected to form a single JSON object.
>
> The sequence of events seems to be the following:
> 1. S1: pipeline processor needs to update the change keys
> 2. S1 the shard writer will delete the `change_list` key with the old
> shards
> 3. S2: configloader calls the `_postConfig()` method
> 4. S2: `PipelineChangeList.create()` notices that the `change_list` node
> doesn't exists in Zookeeper:
> https://opendev.org/zuul/zuul/src/branch/master/zuul/model.py#L921
> 6. S2: the shard writer creates the first shard `0000000000`
> 7. S1: the shard writer creates the second shared `0000000001`
>
> The race condition seems to be introduced with
> Ib1e467b5adb907f93bab0de61da84d2efc22e2a7
That change updated the pipeline manager _postConfig method so
that it no longer acquires the pipeline lock when initializing the
pipeline state and change lists. This greatly reduces potential
delays during reconfiguration, but has, perhaps predictably, led
to the race condition above.
In the commit message for that change, I noted that we might be
able to avoid even more work if we accept some caveats related to
error reporting. Avoiding that work means avoiding performing any
writes during _postConfig, which addresses the root cause of the
race condition (steps 3-6 above. Ed. note: there is no step 5).
From the commit message:
> We still need to attach new state and change list objects to
> the pipeline in _postConfig (since our pipeline object is new).
> We also should make sure that the objects exist in ZK before we
> leave that method, so that if a new pipeline is created, other
> schedulers will be able to load the (potentially still empty)
> objects from ZK. As an alternative, we could avoid even this
> work in _postConfig, but then we would have to handle missing
> objects on refresh, and it would not be possible to tell if the
> object was missing due to it being new or due to an error. To
> avoid masking errors, we keep the current expectation that we
> will create these objects in ZK on the initial reconfiguration.
The current change does exactly that. We no longer perform any
ZK write operations on the state and change list objects in
_postConfig. Instead, inside of the refresh methods, we detect
the cases where they should be newly created and do so at that
time. This happens with the pipeline lock, so is safe against
any simultaneous operation from other components.
There will be "ERROR" level log messages indicating that reading
the state from ZK has failed when these objects are first
initialized. To indicate that this is probably okay, they will
now be immediately followed by "WARNING" level messages explaining
that.
Strictly speaking, this particular race should only occur for the
change list object, not the pipeline state, since the race
condition above requires a sharded object and of the two, only
the change list is sharded. However, to keep the lifecycle of
these two objects matched (and to simplify _postConfig) the same
treatment is applied to both.
Note that change I7fa99cd83a857216321f8d946fd42abd9ec427a3 merged
after Ib1e467b and changed the behavior slightly, introducing the
old_state and old_list arguments. Curiously, the old_list
argument is effectively unused, so it is removed entirely in this
change. Old_state still has a purpose and is retained.
Change-Id: I519348e7d5d74e675808e990920480fb6e1fb981
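The "create on first refresh, under the pipeline lock" approach can be
sketched against a dict standing in for ZooKeeper. The class, method,
and exception names here are illustrative, not Zuul's actual code:

```python
import threading


class NoNodeError(Exception):
    pass


class FakeZK:
    """A dict standing in for ZooKeeper nodes."""

    def __init__(self):
        self.nodes = {}

    def get(self, path):
        if path not in self.nodes:
            raise NoNodeError(path)
        return self.nodes[path]

    def set(self, path, data):
        self.nodes[path] = data


class PipelineChangeListSketch:
    def __init__(self, zk, path, lock):
        self.zk = zk
        self.path = path
        self.lock = lock
        self.changes = []

    def refresh(self):
        # Runs with the pipeline lock held, so creating the missing
        # node here cannot race another scheduler's shard writes the
        # way an unlocked write from _postConfig could.
        with self.lock:
            try:
                self.changes = self.zk.get(self.path)
            except NoNodeError:
                # First refresh after the object was (re)created:
                # initialize it now instead of writing in _postConfig.
                self.zk.set(self.path, [])
                self.changes = []
```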