This command had two problems:
* It would only delete the first 50 buildsets
* Depending on DB configuration, it may have deleted nothing, or it may
have left orphaned data behind.
We did not tell sqlalchemy to cascade delete operations, meaning that
when we deleted the buildset, we didn't delete anything else.
If the database engine enforces foreign keys (InnoDB, PostgreSQL), the
command would have failed. If it does not (MyISAM), it would have deleted
the buildset rows but nothing else.
The tests use MyISAM, so the command ran without error and without deleting
the builds. The tests check that the builds are deleted, but only through
the ORM via a joined load with the buildsets, and since the buildsets
were gone, the builds weren't returned.
To address this shortcoming, the tests now use distinct ORM methods
which return objects without any joins. This would have caught
the error had it been in place before.
Additionally, the delete operation retained the default limit of 50
rows (set in place for the web UI), meaning that when it did run,
it would only delete the most recent 50 matching builds.
We now explicitly set the limit to a user-configurable batch size
(by default, 10,000 builds) so that we keep transaction sizes
manageable and avoid monopolizing database locks. We continue deleting
buildsets in batches as long as any matching buildsets remain. This
should allow users to remove very large amounts of data without
affecting ongoing operations too much.
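As an illustration of the combined fix, here is a minimal sketch of batched,
cascading deletion with SQLAlchemy; the table, column and filter names are
placeholders, not Zuul's actual schema or connection code:

    # Illustrative only: table/column names and the age filter are placeholders.
    from sqlalchemy import Column, ForeignKey, Integer
    from sqlalchemy.orm import Session, declarative_base, relationship

    Base = declarative_base()

    class Buildset(Base):
        __tablename__ = 'example_buildset'
        id = Column(Integer, primary_key=True)
        # Cascading tells the ORM to delete dependent rows with the parent,
        # so builds no longer become orphans (or trip FK constraints).
        builds = relationship('Build', cascade='all, delete-orphan')

    class Build(Base):
        __tablename__ = 'example_build'
        id = Column(Integer, primary_key=True)
        buildset_id = Column(Integer, ForeignKey('example_buildset.id'))

    def delete_buildsets(engine, cutoff_id, batch_size=10000):
        """Delete matching buildsets in batches until none remain."""
        while True:
            with Session(engine) as session:
                batch = (session.query(Buildset)
                         .filter(Buildset.id < cutoff_id)  # placeholder filter
                         .limit(batch_size)
                         .all())
                if not batch:
                    return
                for buildset in batch:
                    session.delete(buildset)  # cascades to the build rows
                session.commit()

Committing after each batch keeps individual transactions small, which is the
point of the configurable batch size described above.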
Change-Id: I4c678b294eeda25589b75ab1ce7c5d0b93a07df3
The delete-pipeline-state command forces a layout update on every
scheduler, but that isn't strictly necessary. While it may be helpful
for some issues, if it really is necessary, the operator can issue
a tenant reconfiguration after performing the delete-pipeline-state.
In most cases, where only the state information itself is causing a
problem, we can omit the layout updates and assume that the state reset
alone is sufficient.
To that end, this change removes the layout state changes from the
delete-pipeline-state command and instead simply empties and recreates
the pipeline state and change list objects. This is very similar to
what happens in the pipeline manager _postConfig call, except in this
case, we have the tenant lock, so we know we can write with impunity,
and we know we are creating objects in ZK from scratch, so we use
direct create calls.
We set the pipeline state's layout uuid to None, which will cause the
first scheduler that comes across it to (assuming its internal layout
is up to date) perform a pipeline reset (which is almost a noop on an
empty pipeline) and update the pipeline state layout to the current
tenant layout state.
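For illustration, the ZooKeeper side of such a reset might look roughly like
the following kazoo sketch; the path layout and payload here are hypothetical
stand-ins, not Zuul's actual object schema:

    # Hypothetical paths and payloads; shown only to illustrate the approach.
    import json
    from kazoo.client import KazooClient

    def reset_pipeline_state(hosts, tenant, pipeline):
        path = f'/zuul-example/{tenant}/pipeline/{pipeline}'  # placeholder path
        client = KazooClient(hosts=hosts)
        client.start()
        try:
            # Empty the existing state entirely...
            if client.exists(path):
                client.delete(path, recursive=True)
            # ...then create fresh state and change-list objects directly.
            # A null layout uuid lets the first up-to-date scheduler perform
            # a (near-noop) pipeline reset and stamp the current layout.
            state = json.dumps({'layout_uuid': None, 'queues': []})
            client.create(path, state.encode('utf8'), makepath=True)
            client.create(path + '/change_list', b'[]')
        finally:
            client.stop()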
Change-Id: I1c503280b516ffa7bbe4cf456d9c900b500e16b0
The delete-pipeline-state command updates the layout state in order
to force schedulers to update their local layout (essentially perform
a local-only reconfiguration). In doing so, it sets the last event
ltime to -1. This is reasonable for initializing a new system, but
in an existing system, when an event arrives at the tenant trigger
event queue it is assigned the last reconfiguration event ltime seen
by that trigger event queue. Later, when a scheduler processes such
a trigger event after the delete-pipeline-state command has run, it
will refuse to handle the event since it arrived much later than
its local layout state.
This must then be corrected manually by the operator by forcing a
tenant reconfiguration. This means that the system essentially suffers
the delay of two sequential reconfigurations before it can proceed.
To correct this, set the last event ltime for the layout state to
the ltime of the layout state itself. This means that once a scheduler
has updated its local layout, it can proceed in processing old events.
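Conceptually the guard and the fix look something like this (the attribute
names below are hypothetical, not Zuul's actual model):

    # Hypothetical attribute names; this only illustrates the comparison.
    def can_process(event, local_layout_state):
        # A scheduler refuses events that reference a newer reconfiguration
        # than the layout state it has loaded.
        return event.reconfig_event_ltime <= local_layout_state.last_event_ltime

    def reset_layout_state(layout_state):
        # Instead of -1, carry the layout state's own ltime forward so that
        # previously queued events pass the check above.
        layout_state.last_event_ltime = layout_state.ltime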
Change-Id: I66e798adbbdd55ff1beb1ecee39c7f5a5351fc4b
Loading config involves significant network operations for each project:
* Loading project keys
* Asking the source for the list of branches for each project
* Retrieving the config file contents from the ZK cache (if present)
* Retrieving the config file contents from git (otherwise)
Only the third item in that list is parallelized currently; the others
are serialized. To parallelize the remainder, use a thread pool executor.
The value of max_workers=4 is chosen because, in practice on OpenDev, it
provides the most significant reduction in startup time, while higher values
make little difference (and could potentially contribute to DoS scenarios
or local thread contention). Observed config priming times for various
worker counts:
1: 282s
2: 181s
4: 144s
8: 146s
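A minimal sketch of the pattern (not the actual loader code; load_project
stands in for the per-project steps listed above):

    # Sketch only; load_project is a stand-in for the real per-project work.
    import concurrent.futures

    def load_project(project):
        # Load keys, query branches, fetch config contents for one project.
        ...

    def load_all_projects(projects, max_workers=4):
        with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [pool.submit(load_project, p) for p in projects]
            for future in concurrent.futures.as_completed(futures):
                future.result()  # surface any exception raised by a worker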
Change-Id: I65472a8af96ed95eb28b88cc623ef103be76a46f
This adds a zuul-admin command which allows operators to delete old
database entries.
Change-Id: I4e277a07394aa4852a563f4c9cdc39b5801ab4ba
We have two CLIs: zuul-client for REST-related operations, which covers
tenant-scoped, workflow-modifying actions such as enqueue, dequeue and
promote; and zuul, which supersedes zuul-client and also covers true admin
operations like ZooKeeper maintenance, config checking and issuing auth tokens.
This is a bit confusing for users and operators, and can lead to code
duplication.
* Rename the zuul CLI to zuul-admin. zuul is still a valid entry point
and will be removed after the next release.
* Print a deprecation warning when invoking the admin CLI as zuul
instead of zuul-admin, and when running the autohold-*, enqueue-*,
dequeue and promote subcommands. These subcommands will need to be
run with zuul-client after the next release.
* Clarify the scopes and deprecations in the documentation.
Change-Id: I90cf6f2be4e4c8180ad0f5e2696b7eaa7380b411
This is intended to aid Zuul developers who are diagnosing a bug
with a running Zuul and who have determined that Zuul may be able to
correct the situation and resume if a pipeline is completely reset.
It is intrusive and not at all guaranteed to work. It may make things
worse. It's basically just a convenience method to avoid firing up
the REPL and issuing Python commands directly. I can't enumerate the
conditions under which it may or may not work. Therefore the documentation
recommends against its use and there is no release note included.
Nevertheless, we may find it useful to have such a command during
a crisis in the future.
Change-Id: Ib637c31ff3ebbb2733a4ad9b903075e7b3dc349c
This is an early preparation step for removing the RPC calls between
zuul-web and the scheduler.
We want to format the status JSON and do the job freezing (job freezing
API) directly in zuul-web without utilising the scheduler via RPC. In
order to make this work, zuul-web must instantiate a ConfigLoader.
Currently this would require a scheduler instance which is not available
in zuul-web, thus we have to make this parameter optional.
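The shape of the change is simply an optional parameter; the signature below
is a simplified illustration, not the real one:

    # Simplified, illustrative signature only.
    class ConfigLoader:
        def __init__(self, connections, zk_client, scheduler=None):
            self.connections = connections
            self.zk_client = zk_client
            # zuul-web passes no scheduler; code paths that need one must
            # check for None before using it.
            self.scheduler = scheduler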
Change-Id: I41214086aaa9d822ab888baf001972d2846528be
It occurred to me that we should also test that the removal of one
project's keys from an org in ZooKeeper does not remove other projects or
the org itself. We only want to remove a project when its keys are removed,
and remove an org only when all of its projects are removed.
Change-Id: I5bb3192785fe8a863b82f7d13494bd330541f0a1
The zuul delete-keys command can leave us with empty org and project
dirs in zookeeper. When this happens the zuul export-keys command
complains about secrets not being present. Address this by checking if
the project dir and org dir should be cleaned up when calling
delete-keys.
Note this happened to OpenDev after renaming all projects from foo/* to
bar/* orphaning the org level portion of the name.
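The cleanup amounts to walking up from the project node and pruning empty
parents; a rough kazoo sketch, with a placeholder path layout rather than the
real keystore schema:

    # Placeholder path layout; illustrates pruning empty parent znodes only.
    from kazoo.client import KazooClient
    from kazoo.exceptions import NoNodeError, NotEmptyError

    def prune_empty_parents(client: KazooClient, project_path: str):
        node = project_path
        while node.count('/') > 1:  # stop before deleting the root
            try:
                if client.get_children(node):
                    break            # other projects/orgs remain; stop here
                client.delete(node)  # empty: remove it and check its parent
            except (NoNodeError, NotEmptyError):
                break
            node = node.rsplit('/', 1)[0]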
Change-Id: I6bba5ea29a752593b76b8e58a0d84615cc639346
Change-Id: Idb2918fab4d17aa611bf81f42d5b86abc865514f
This will give operators a tool for manual recovery in case of
emergency.
Change-Id: Ia84beb08b685f59a24f76cb0b6adf518f6e64362
These can be used when renaming a project.
Change-Id: I98cf304914449622f9db48651b83e0744b676498
This removes the filesystem-based keystore in favor of only using
ZooKeeper. Zuul will no longer load missing keys from the filesystem,
nor will it write out decrypted copies of all keys to the filesystem.
This is more secure since it allows sites better control over when and
where secret data are written to disk.
To provide for system backups to aid in disaster recovery in the case
that the ZK data store is lost, two new scheduler commands are added:
* export-keys
* import-keys
These write the password-protected versions of the keys (in fact, a
raw dump of the ZK data) to the filesystem, and read the same data
back in. An administrator can invoke export-keys before performing a
system backup, and run import-keys to restore the data.
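Conceptually, the export is a raw dump of the relevant ZooKeeper subtree and
the import writes it back; a rough sketch with a hypothetical root path, not
the actual implementation:

    # Hypothetical root path; sketch of a raw dump/restore of a ZK subtree.
    import json
    from kazoo.client import KazooClient

    KEY_ROOT = '/keystorage-example'

    def export_keys(client: KazooClient, out_file: str):
        dump = {}
        def walk(path):
            data, _stat = client.get(path)
            dump[path] = data.decode('utf8') if data else ''
            for child in client.get_children(path):
                walk(f'{path}/{child}')
        walk(KEY_ROOT)
        with open(out_file, 'w') as f:
            json.dump(dump, f)

    def import_keys(client: KazooClient, in_file: str):
        with open(in_file) as f:
            dump = json.load(f)
        for path, data in sorted(dump.items()):  # parents before children
            if not client.exists(path):
                client.create(path, data.encode('utf8'), makepath=True)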
A minor doc change recommending the use of ``zuul-scheduler stop`` was
added as well; it is left over from a previous version of this change,
but the update is still warranted.
This also removes the test_keystore test file; key generation is tested
in test_v3, and key usage is tested by tests which have encrypted secrets.
Change-Id: I5e6ea37c94ab73ec6f850591871c4127118414ed
This change prevents the tenant-conf-check from failing when
running without a ZooKeeper service.
Change-Id: Ib4f96268e40afd46eb531f84e0d20751bb985fc3
Currently, the ZooKeeper connection is initialized directly in the cmd
classes like zuul.cmd.scheduler or zuul.cmd.merger and then passed to
the server instance.
Although this makes it easy to reuse a single ZooKeeper connection for
multiple components in the tests, it is not very realistic.
A better approach would be to initialize the connection directly in the
server classes so that each component has its own connection to
ZooKeeper.
Those classes already get all necessary parameters, so we could get rid
of the additional "zk_client" parameter.
Furthermore it would allow us to use a dedicated ZooKeeper connection
for each component in the tests which is more realistic than sharing a
single connection between all components.
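The shape of the refactoring, heavily simplified (not the real classes): each
server builds and owns its own connection from its config rather than
receiving one from the cmd module:

    # Heavily simplified; illustrates ownership of the connection only.
    from kazoo.client import KazooClient

    class ExampleServer:
        def __init__(self, config):
            hosts = config['zookeeper']['hosts']
            self.zk_client = KazooClient(hosts=hosts)

        def start(self):
            self.zk_client.start()

        def stop(self):
            self.zk_client.stop()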
Change-Id: I12260d43be0897321cf47ef0c722ccd74599d43d
The release of pyjwt 2.0.0 changed the behavior of some functions, which
caused errors. Fix the errors, use pyjwt 2.0.0's better handling of JWKS,
and pin the requirement to 2.X to avoid potential future API-breaking changes.
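For reference, the 2.x behavior differences look roughly like this (generic
pyjwt example, not Zuul's auth code):

    # Generic pyjwt 2.x example, not Zuul's auth code.
    import jwt

    token = jwt.encode({'sub': 'user'}, 'secret', algorithm='HS256')
    # Under 2.x, encode() returns str rather than bytes, and decode()
    # requires an explicit list of accepted algorithms:
    claims = jwt.decode(token, 'secret', algorithms=['HS256'])
    # 2.x also provides jwt.PyJWKClient for fetching keys from a JWKS URL,
    # and the requirement can be pinned with e.g. "PyJWT>=2.0.0,<3.0".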
Change-Id: Ibef736e0f635dfaf4477cc2a90a22665da9f1959
Config files are written using file handles which are never closed.
Under Python 3.4 or later, that causes ResourceWarning warnings to be
emitted.
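The usual fix is to let a context manager close the handle; a minimal sketch,
not the actual config-writing code:

    # Minimal sketch; not the actual config-writing code.
    def write_config(path, contents):
        with open(path, 'w') as f:  # closed on exit, even on error
            f.write(contents)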
Change-Id: Ia3c11f61b62b367afe8f588816e3e8837835e835
Since the subprocess is started before the reference timestamp is
created, it can happen that the check for the expiration field fails.
Traceback (most recent call last):
File "/tmp/zuul/tests/unit/test_client.py", line 151, in test_token_generation
(token['exp'], now))
File "/tmp/zuul/.tox/py36/lib/python3.6/site-packages/unittest2/case.py", line 702, in assertTrue
raise self.failureException(msg)
AssertionError: False is not true : (1568016146.9831738, 1568015546.1448617)
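One race-free way to bound the expiration in a test like this is to bracket
the subprocess with two reference timestamps; a sketch only, where LIFETIME
and run_token_command are hypothetical:

    # Sketch only; LIFETIME and run_token_command are hypothetical.
    import time

    LIFETIME = 600  # token lifetime assumed to be configured for the test

    def test_token_generation(self):
        before = time.time()
        token = self.run_token_command()  # spawns the CLI and decodes output
        after = time.time()
        # The token was issued somewhere between the two timestamps, so its
        # expiration must fall in this window regardless of startup delay.
        self.assertTrue(before + LIFETIME <= token['exp'] <= after + LIFETIME,
                        (token['exp'], before, after))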
Change-Id: I9ef56c12ed1be2a6ec168c4a9363125919be44e9
Add the "create-auth-token" subcommand to the zuul CLI; this subcommand
allows an operator to create an authentication token for a user with
customized authorizations.
This requires at least one auth section with a signing key to be specified in
Zuul's configuration file.
This is meant as a way to provide authorizations "manually" on test
deployments, until a proper authorization engine is plugged into Zuul,
in a subsequent patch.
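Roughly what such a token amounts to, as a generic pyjwt illustration (the
claim layout beyond the standard fields is made up, not necessarily Zuul's):

    # Generic illustration; non-standard claim names here are made up.
    import time
    import jwt

    signing_secret = 'changeme'             # from the auth section's key
    claims = {
        'iss': 'zuul-example',              # issuer configured for Zuul
        'aud': 'zuul-example',
        'sub': 'operator',
        'exp': int(time.time()) + 600,      # expiry chosen by the operator
        'example-authz': {'admin': ['tenant-one']},  # customized authorization
    }
    print(jwt.encode(claims, signing_secret, algorithm='HS256'))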
Change-Id: I039e70cd8d5e502795772af0ea2a336c08316f2c
This patch adds a new command, 'tenant-conf-check', to the Zuul client.
It validates the tenant_file by running the schema validation of the file,
and exits -1 if errors have been detected. The command does not use an RPC
call but instead expects to find the tenant_file on the local filesystem.
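A minimal sketch of what such an offline check amounts to: load the file from
disk, run it through a schema, and exit non-zero on errors (the schema below
is a trivial stand-in for the real one):

    # Trivial stand-in schema; illustrates the offline validation flow only.
    import sys
    import voluptuous as vs
    import yaml

    tenant_schema = vs.Schema([{'tenant': dict}])

    def tenant_conf_check(path):
        with open(path) as f:
            data = yaml.safe_load(f)
        try:
            tenant_schema(data)
        except vs.Invalid as e:
            print(f'Validation error: {e}')
            sys.exit(-1)
        print('Tenant configuration is valid.')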
Change-Id: I6582bbc37706971085dac5c3ca3b4c690c515f9e