This command had two problems:
* It would only delete the first 50 buildsets
* Depending on DB configuration, it may have deleted nothing, or it may
have left orphaned data behind.
We did not tell sqlalchemy to cascade delete operations, meaning that
when we deleted the buildset, we didn't delete anything else.
If the database engine enforces foreign keys (InnoDB, PostgreSQL), the
command would have failed. If it does not (MyISAM), it would have deleted
the buildset rows but nothing else.
The tests use MyISAM, so the command ran without error and without deleting
the builds. The tests check that the builds are deleted, but only through
the ORM via a joined load with the buildsets, and since the buildsets
were gone, the builds weren't returned.
To address this shortcoming, the tests now use distinct ORM methods
which return objects without any joins. This would have caught
the error had it been in place before.
Additionally, the delete operation retained the default limit of 50
rows (set in place for the web UI), meaning that when it did run,
it would only delete the most recent 50 matching builds.
We now explicitly set the limit to a user-configurable batch size
(by default, 10,000 builds) so that we keep transaction sizes
manageable and avoid monopolizing database locks. We continue deleting
buildsets in batches as long as any matching buildsets remain. This
should allow users to remove very large amounts of data without
affecting ongoing operations too much.
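As an illustration of the combined fix, here is a minimal sketch of batched,
cascading deletion with SQLAlchemy; the table, column and filter names are
placeholders, not Zuul's actual schema or connection code:

    # Illustrative only: table/column names and the age filter are placeholders.
    from sqlalchemy import Column, ForeignKey, Integer
    from sqlalchemy.orm import Session, declarative_base, relationship

    Base = declarative_base()

    class Buildset(Base):
        __tablename__ = 'example_buildset'
        id = Column(Integer, primary_key=True)
        # Cascading tells the ORM to delete dependent rows with the parent,
        # so builds no longer become orphans (or trip FK constraints).
        builds = relationship('Build', cascade='all, delete-orphan')

    class Build(Base):
        __tablename__ = 'example_build'
        id = Column(Integer, primary_key=True)
        buildset_id = Column(Integer, ForeignKey('example_buildset.id'))

    def delete_buildsets(engine, cutoff_id, batch_size=10000):
        """Delete matching buildsets in batches until none remain."""
        while True:
            with Session(engine) as session:
                batch = (session.query(Buildset)
                         .filter(Buildset.id < cutoff_id)  # placeholder filter
                         .limit(batch_size)
                         .all())
                if not batch:
                    return
                for buildset in batch:
                    session.delete(buildset)  # cascades to the build rows
                session.commit()

Committing after each batch keeps individual transactions small, which is the
point of the configurable batch size described above.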
Change-Id: I4c678b294eeda25589b75ab1ce7c5d0b93a07df3
The delete-pipeline-state command forces a layout update on every
scheduler, but that isn't strictly necessary. While it may be helpful
for some issues, if it really is necessary, the operator can issue
a tenant reconfiguration after performing the delete-pipeline-state.
In most cases, where only the state information itself is causing a
problem, we can omit the layout updates and assume that the state reset
alone is sufficient.
To that end, this change removes the layout state changes from the
delete-pipeline-state command and instead simply empties and recreates
the pipeline state and change list objects. This is very similar to
what happens in the pipeline manager _postConfig call, except in this
case, we have the tenant lock, so we know we can write with impunity,
and we know we are creating objects in ZK from scratch, so we use
direct create calls.
We set the pipeline state's layout uuid to None, which will cause the
first scheduler that comes across it to (assuming its internal layout
is up to date) perform a pipeline reset (which is almost a noop on an
empty pipeline) and update the pipeline state layout to the current
tenant layout state.
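For illustration, the ZooKeeper side of such a reset might look roughly like
the following kazoo sketch; the path layout and payload here are hypothetical
stand-ins, not Zuul's actual object schema:

    # Hypothetical paths and payloads; shown only to illustrate the approach.
    import json
    from kazoo.client import KazooClient

    def reset_pipeline_state(hosts, tenant, pipeline):
        path = f'/zuul-example/{tenant}/pipeline/{pipeline}'  # placeholder path
        client = KazooClient(hosts=hosts)
        client.start()
        try:
            # Empty the existing state entirely...
            if client.exists(path):
                client.delete(path, recursive=True)
            # ...then create fresh state and change-list objects directly.
            # A null layout uuid lets the first up-to-date scheduler perform
            # a (near-noop) pipeline reset and stamp the current layout.
            state = json.dumps({'layout_uuid': None, 'queues': []})
            client.create(path, state.encode('utf8'), makepath=True)
            client.create(path + '/change_list', b'[]')
        finally:
            client.stop()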
Change-Id: I1c503280b516ffa7bbe4cf456d9c900b500e16b0
The delete-pipeline-state command updates the layout state in order
to force schedulers to update their local layout (essentially perform
a local-only reconfiguration). In doing so, it sets the last event
ltime to -1. This is reasonable for initializing a new system, but
in an existing system, when an event arrives at the tenant trigger
event queue it is assigned the last reconfiguration event ltime seen
by that trigger event queue. Later, when a scheduler processes such
a trigger event after the delete-pipeline-state command has run, it
will refuse to handle the event since it arrived much later than
its local layout state.
This must then be corrected manually by the operator by forcing a
tenant reconfiguration. This means that the system essentially suffers
the delay of two sequential reconfigurations before it can proceed.
To correct this, set the last event ltime for the layout state to
the ltime of the layout state itself. This means that once a scheduler
has updated its local layout, it can proceed in processing old events.
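Conceptually the guard and the fix look something like this (the attribute
names below are hypothetical, not Zuul's actual model):

    # Hypothetical attribute names; this only illustrates the comparison.
    def can_process(event, local_layout_state):
        # A scheduler refuses events that reference a newer reconfiguration
        # than the layout state it has loaded.
        return event.reconfig_event_ltime <= local_layout_state.last_event_ltime

    def reset_layout_state(layout_state):
        # Instead of -1, carry the layout state's own ltime forward so that
        # previously queued events pass the check above.
        layout_state.last_event_ltime = layout_state.ltime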
Change-Id: I66e798adbbdd55ff1beb1ecee39c7f5a5351fc4b
Loading config involves significant network operations for each project:
* Loading project keys
* Asking the source for the list of branches for each project
* Retrieving the config file contents from the ZK cache (if present)
* Retrieving the config file contents from git (otherwise)
Only the third item in that list is parallelized currently; the others
are serialized. To parallelize the remainder, use a thread pool executor.
The value of max_workers=4 is chosen because, in practice on OpenDev, it
provides the most significant reduction in startup time, while higher values
make little difference (and could potentially contribute to DoS scenarios
or local thread contention). Observed config priming times for various
worker counts:
1: 282s
2: 181s
4: 144s
8: 146s
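A minimal sketch of the pattern (not the actual loader code; load_project
stands in for the per-project steps listed above):

    # Sketch only; load_project is a stand-in for the real per-project work.
    import concurrent.futures

    def load_project(project):
        # Load keys, query branches, fetch config contents for one project.
        ...

    def load_all_projects(projects, max_workers=4):
        with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [pool.submit(load_project, p) for p in projects]
            for future in concurrent.futures.as_completed(futures):
                future.result()  # surface any exception raised by a worker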
Change-Id: I65472a8af96ed95eb28b88cc623ef103be76a46f
This adds a zuul-admin command which allows operators to delete old
database entries.
Change-Id: I4e277a07394aa4852a563f4c9cdc39b5801ab4ba
We have two CLIs: zuul-client for REST-related operations, which covers
tenant-scoped, workflow-modifying actions such as enqueue, dequeue and
promote; and zuul, which supersedes zuul-client and also covers true admin
operations like ZooKeeper maintenance, config checking and issuing auth tokens.
This is a bit confusing for users and operators, and can lead to code
duplication.
* Rename the zuul CLI to zuul-admin. zuul is still a valid entry point
and will be removed after the next release.
* Print a deprecation warning when invoking the admin CLI as zuul
instead of zuul-admin, and when running the autohold-*, enqueue-*,
dequeue and promote subcommands. These subcommands will need to be
run with zuul-client after the next release.
* Clarify the scopes and deprecations in the documentation.
Change-Id: I90cf6f2be4e4c8180ad0f5e2696b7eaa7380b411
This is intended to aid Zuul developers who are diagnosing a bug
with a running Zuul and who have determined that Zuul may be able to
correct the situation and resume if a pipeline is completely reset.
It is intrusive and not at all guaranteed to work. It may make things
worse. It's basically just a convenience method to avoid firing up
the REPL and issuing Python commands directly. I can't enumerate the
conditions under which it may or may not work. Therefore the documentation
recommends against its use and there is no release note included.
Nevertheless, we may find it useful to have such a command during
a crisis in the future.
Change-Id: Ib637c31ff3ebbb2733a4ad9b903075e7b3dc349c
This is an early preparation step for removing the RPC calls between
zuul-web and the scheduler.
We want to format the status JSON and do the job freezing (job freezing
API) directly in zuul-web without utilising the scheduler via RPC. In
order to make this work, zuul-web must instantiate a ConfigLoader.
Currently this would require a scheduler instance which is not available
in zuul-web, thus we have to make this parameter optional.
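The shape of the change is simply an optional parameter; the signature below
is a simplified illustration, not the real one:

    # Simplified, illustrative signature only.
    class ConfigLoader:
        def __init__(self, connections, zk_client, scheduler=None):
            self.connections = connections
            self.zk_client = zk_client
            # zuul-web passes no scheduler; code paths that need one must
            # check for None before using it.
            self.scheduler = scheduler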
Change-Id: I41214086aaa9d822ab888baf001972d2846528be
It occurred to me that we should also test that the removal of one
project's keys from an org in ZooKeeper does not remove other projects or
the org itself. We only want to remove a project when its keys are removed,
and remove an org only when all of its projects are removed.
Change-Id: I5bb3192785fe8a863b82f7d13494bd330541f0a1
The zuul delete-keys command can leave us with empty org and project
dirs in zookeeper. When this happens the zuul export-keys command
complains about secrets not being present. Address this by checking if
the project dir and org dir should be cleaned up when calling
delete-keys.
Note this happened to OpenDev after renaming all projects from foo/* to
bar/* orphaning the org level portion of the name.
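The cleanup amounts to walking up from the project node and pruning empty
parents; a rough kazoo sketch, with a placeholder path layout rather than the
real keystore schema:

    # Placeholder path layout; illustrates pruning empty parent znodes only.
    from kazoo.client import KazooClient
    from kazoo.exceptions import NoNodeError, NotEmptyError

    def prune_empty_parents(client: KazooClient, project_path: str):
        node = project_path
        while node.count('/') > 1:  # stop before deleting the root
            try:
                if client.get_children(node):
                    break            # other projects/orgs remain; stop here
                client.delete(node)  # empty: remove it and check its parent
            except (NoNodeError, NotEmptyError):
                break
            node = node.rsplit('/', 1)[0]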
Change-Id: I6bba5ea29a752593b76b8e58a0d84615cc639346
Change-Id: Idb2918fab4d17aa611bf81f42d5b86abc865514f
This will give operators a tool for manual recovery in case of
emergency.
Change-Id: Ia84beb08b685f59a24f76cb0b6adf518f6e64362
These can be used when renaming a project.
Change-Id: I98cf304914449622f9db48651b83e0744b676498
This removes the filesystem-based keystore in favor of only using
ZooKeeper. Zuul will no longer load missing keys from the filesystem,
nor will it write out decrypted copies of all keys to the filesystem.
This is more secure since it allows sites better control over when and
where secret data are written to disk.
To provide for system backups to aid in disaster recovery in the case
that the ZK data store is lost, two new scheduler commands are added:
* export-keys
* import-keys
These write the password-protected versions of the keys (in fact, a
raw dump of the ZK data) to the filesystem, and read the same data
back in. An administrator can invoke export-keys before performing a
system backup, and run import-keys to restore the data.
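Conceptually, the export is a raw dump of the relevant ZooKeeper subtree and
the import writes it back; a rough sketch with a hypothetical root path, not
the actual implementation:

    # Hypothetical root path; sketch of a raw dump/restore of a ZK subtree.
    import json
    from kazoo.client import KazooClient

    KEY_ROOT = '/keystorage-example'

    def export_keys(client: KazooClient, out_file: str):
        dump = {}
        def walk(path):
            data, _stat = client.get(path)
            dump[path] = data.decode('utf8') if data else ''
            for child in client.get_children(path):
                walk(f'{path}/{child}')
        walk(KEY_ROOT)
        with open(out_file, 'w') as f:
            json.dump(dump, f)

    def import_keys(client: KazooClient, in_file: str):
        with open(in_file) as f:
            dump = json.load(f)
        for path, data in sorted(dump.items()):  # parents before children
            if not client.exists(path):
                client.create(path, data.encode('utf8'), makepath=True)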
A minor doc change recommending the use of ``zuul-scheduler stop`` was
added as well; it is left over from a previous version of this change,
but the update is still warranted.
This also removes the test_keystore test file; key generation is tested
in test_v3, and key usage is tested by tests which have encrypted secrets.
Change-Id: I5e6ea37c94ab73ec6f850591871c4127118414ed
This change prevents the tenant-conf-check from failing when
running without a ZooKeeper service.
Change-Id: Ib4f96268e40afd46eb531f84e0d20751bb985fc3
Currently, the ZooKeeper connection is initialized directly in the cmd
classes like zuul.cmd.scheduler or zuul.cmd.merger and then passed to
the server instance.
Although this makes it easy to reuse a single ZooKeeper connection for
multiple components in the tests, it is not very realistic.
A better approach would be to initialize the connection directly in the
server classes so that each component has its own connection to
ZooKeeper.
Those classes already get all necessary parameters, so we could get rid
of the additional "zk_client" parameter.
Furthermore it would allow us to use a dedicated ZooKeeper connection
for each component in the tests which is more realistic than sharing a
single connection between all components.
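The shape of the refactoring, heavily simplified (not the real classes): each
server builds and owns its own connection from its config rather than
receiving one from the cmd module:

    # Heavily simplified; illustrates ownership of the connection only.
    from kazoo.client import KazooClient

    class ExampleServer:
        def __init__(self, config):
            hosts = config['zookeeper']['hosts']
            self.zk_client = KazooClient(hosts=hosts)

        def start(self):
            self.zk_client.start()

        def stop(self):
            self.zk_client.stop()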
Change-Id: I12260d43be0897321cf47ef0c722ccd74599d43d
The release of pyjwt 2.0.0 changed the behavior of some functions, which
caused errors. Fix the errors, use pyjwt 2.0.0's better handling of JWKS,
and pin the requirement to 2.X to avoid potential future API-breaking changes.
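For reference, the 2.x behavior differences look roughly like this (generic
pyjwt example, not Zuul's auth code):

    # Generic pyjwt 2.x example, not Zuul's auth code.
    import jwt

    token = jwt.encode({'sub': 'user'}, 'secret', algorithm='HS256')
    # Under 2.x, encode() returns str rather than bytes, and decode()
    # requires an explicit list of accepted algorithms:
    claims = jwt.decode(token, 'secret', algorithms=['HS256'])
    # 2.x also provides jwt.PyJWKClient for fetching keys from a JWKS URL,
    # and the requirement can be pinned with e.g. "PyJWT>=2.0.0,<3.0".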
Change-Id: Ibef736e0f635dfaf4477cc2a90a22665da9f1959
Config files are written using file handles which are never closed.
Under Python 3.4 or later, that causes ResourceWarning warnings to be
emitted.
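The usual fix is to let a context manager close the handle; a minimal sketch,
not the actual config-writing code:

    # Minimal sketch; not the actual config-writing code.
    def write_config(path, contents):
        with open(path, 'w') as f:  # closed on exit, even on error
            f.write(contents)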
Change-Id: Ia3c11f61b62b367afe8f588816e3e8837835e835
Since the subprocess is started before the reference timestamp is
created, it can happen that the check for the expiration field fails.
Traceback (most recent call last):
File "/tmp/zuul/tests/unit/test_client.py", line 151, in test_token_generation
(token['exp'], now))
File "/tmp/zuul/.tox/py36/lib/python3.6/site-packages/unittest2/case.py", line 702, in assertTrue
raise self.failureException(msg)
AssertionError: False is not true : (1568016146.9831738, 1568015546.1448617)
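One race-free way to bound the expiration in a test like this is to bracket
the subprocess with two reference timestamps; a sketch only, where LIFETIME
and run_token_command are hypothetical:

    # Sketch only; LIFETIME and run_token_command are hypothetical.
    import time

    LIFETIME = 600  # token lifetime assumed to be configured for the test

    def test_token_generation(self):
        before = time.time()
        token = self.run_token_command()  # spawns the CLI and decodes output
        after = time.time()
        # The token was issued somewhere between the two timestamps, so its
        # expiration must fall in this window regardless of startup delay.
        self.assertTrue(before + LIFETIME <= token['exp'] <= after + LIFETIME,
                        (token['exp'], before, after))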
Change-Id: I9ef56c12ed1be2a6ec168c4a9363125919be44e9
Add the "create-auth-token" subcommand to the zuul CLI; this subcommand
allows an operator to create an authentication token for a user with
customized authorizations.
This requires at least one auth section with a signing key to be specified in
Zuul's configuration file.
This is meant as a way to provide authorizations "manually" on test
deployments, until a proper authorization engine is plugged into Zuul,
in a subsequent patch.
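Roughly what such a token amounts to, as a generic pyjwt illustration (the
claim layout beyond the standard fields is made up, not necessarily Zuul's):

    # Generic illustration; non-standard claim names here are made up.
    import time
    import jwt

    signing_secret = 'changeme'             # from the auth section's key
    claims = {
        'iss': 'zuul-example',              # issuer configured for Zuul
        'aud': 'zuul-example',
        'sub': 'operator',
        'exp': int(time.time()) + 600,      # expiry chosen by the operator
        'example-authz': {'admin': ['tenant-one']},  # customized authorization
    }
    print(jwt.encode(claims, signing_secret, algorithm='HS256'))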
Change-Id: I039e70cd8d5e502795772af0ea2a336c08316f2c
This patch adds a new command, 'tenant-conf-check', to the Zuul client.
It validates the tenant_file by running the schema validation of the file,
and exits -1 if errors have been detected. The command does not use an RPC
call but instead expects to find the tenant_file on the local filesystem.
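A minimal sketch of what such an offline check amounts to: load the file from
disk, run it through a schema, and exit non-zero on errors (the schema below
is a trivial stand-in for the real one):

    # Trivial stand-in schema; illustrates the offline validation flow only.
    import sys
    import voluptuous as vs
    import yaml

    tenant_schema = vs.Schema([{'tenant': dict}])

    def tenant_conf_check(path):
        with open(path) as f:
            data = yaml.safe_load(f)
        try:
            tenant_schema(data)
        except vs.Invalid as e:
            print(f'Validation error: {e}')
            sys.exit(-1)
        print('Tenant configuration is valid.')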
Change-Id: I6582bbc37706971085dac5c3ca3b4c690c515f9e