delta/openstack/zuul.git - opendev.org: zuul/zuul.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	Refactor merge mode name lookup	James E. Blair	2022-11-10	1	-4/+1
\| \| \| \| \| \|	This is repeated in a few places, centralize it. Change-Id: I7bbed1f5f9faad31affa71ef17fbfc1740c54db8
*	Parallelize some pipeline refresh ops	James E. Blair	2022-11-09	1	-29/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We may be able to speed up pipeline refreshes in cases where there are large numbers of items or jobs/builds by parallelizing ZK reads. Quick refresher: the ZK protocol is async, and kazoo uses a queue to send operations to a single thread which manages IO. We typically call synchronous kazoo client methods which wait for the async result before returning. Since this is all thread-safe, we can attempt to fill the kazoo pipe by having multiple threads call the synchronous kazoo methods. If kazoo is waiting on IO for an earlier call, it will be able to start a later request simultaneously. Quick aside: it would be difficult for us to use the async methods directly since our overall code structure is still ordered and effectively single threaded (we need to load a QueueItem before we can load the BuildSet and the Builds, etc). Thus it makes the most sense for us to retain our ordering by using a ThreadPoolExecutor to run some operations in parallel. This change parallelizes loading QueueItems within a ChangeQueue, and also Builds/Jobs within a BuildSet. These are the points in a pipeline refresh tree which potentially have the largest number of children and could benefit the most from the change, especially if the ZK server has some measurable latency. Change-Id: I0871cc05a2d13e4ddc4ac284bd67e5e3003200ad
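A minimal sketch of the approach described above, assuming a stand-in `load_item` function in place of a real synchronous kazoo read (the function name, paths, and worker count are hypothetical, not taken from the Zuul source). Worker threads issue blocking reads in parallel, while `executor.map` returns the results in input order, so the caller's ordering is retained:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_item(path):
    # Hypothetical stand-in for a synchronous ZooKeeper read; the sleep
    # simulates network latency on a single request.
    time.sleep(0.01)
    return f"data for {path}"


def load_children(paths, max_workers=4):
    # Several threads block on their own reads at the same time, but
    # executor.map yields results in the order of the input paths.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(load_item, paths))


paths = [f"/queue/item/{i}" for i in range(8)]
results = load_children(paths)
```

With eight items and four workers, the simulated latency is paid roughly twice rather than eight times.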
*	Merge "Change merge mode default based on driver"	Zuul	2022-10-27	1	-0/+7
\|\
\| *	Change merge mode default based on driver	James E. Blair	2022-10-13	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The default merge mode is 'merge-resolve' because it has been observed that it more closely matches the behavior of jgit in Gerrit (or, at least it did the last time we looked into this). The other drivers are unlikely to use jgit and more likely to use the default git merge strategy. This change allows the default to differ based on the driver, and changes the default for all non-gerrit drivers to 'merge'. The implementation anticipates that we may want to add more granularity in the future, so the API accepts a project as an argument, and in the future, drivers could provide a per-project default (which they may obtain from the remote code review system). That is not implemented yet. This adds some extra data to the /projects endpoint in the REST api. It is currently not easy (and perhaps not possible) to determine what a project's merge mode is through the api. This change adds a metadata field to the output which will show the resulting value computed from all of the project stanzas. The project stanzas themselves may have null values for the merge modes now, so the web app now protects against that. Change-Id: I9ddb79988ca08aba4662cd82124bd91e49fd053c
* \|	Support authz for read-only web access	James E. Blair	2022-10-25	1	-1/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This updates the web UI to support the requirement for authn/z for read-only access. If authz is required for read access, we will automatically redirect. If we return and still aren't authorized, we will display an "Authorization required" page (rather than continuing and popping up API error notifications). The API methods are updated to send an authorization token whenever one is present. Change-Id: I31c13c943d05819b4122fcbcf2eaf41515c5b1d9
* \|	Set Access-Control-Allow-Origin headers in check_auth tool	James E. Blair	2022-10-25	1	-55/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since we check authorization in every method except info now, set the headers in the check_auth tool instead of the individual methods; that way they are set even in the case of a 401. Change-Id: I397180122e03915694ba6e59b4bd3a743120ee6e
* \|	Add access-rules configuration and documentation	James E. Blair	2022-10-25	1	-135/+284
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This allows configuration of read-only access rules, and corresponding documentation. It wraps every API method in an auth check (other than info endpoints). It exposes information in the info endpoints that the web UI can use to decide whether it should send authentication information for all requests. A later change will update the web UI to use that. Change-Id: I3985c3d0b9f831fd004b2bb010ab621c00486e05
* \|	Add api-root tenant config object	James E. Blair	2022-10-25	1	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to allow for authenticated read-only access to zuul-web, we need to be able to control the authz of the API root. Currently, we can only specify auth info for tenants. But if we want to control access to the tenant list itself, we need to be able to specify auth rules. To that end, add a new "api-root" tenant configuration object which, like tenants themselves, will allow attaching authz rules to it. We don't have any admin-level API endpoints at the root, so this change does not add "admin-rules" to the api-root object, but if we do develop those in the future, it could be added. A later change will add "access-rules" to the api-root in order to allow configuration of authenticated read-only access. This change does add an "authentication-realm" to the api-root object since that already exists for tenants and it will make sense to have that in the future as well. Currently the /info endpoint uses the system default authentication realm, but this will override it if set. In general, the approach here is that the "api-root" object should mirror the "tenant" object for all attributes that make sense. Change-Id: I4efc6fbd64f266e7a10e101db3350837adce371f
* \|	Add check_auth tool to zuul-web	James E. Blair	2022-10-25	1	-191/+170
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Authentication checking in the admin methods of zuul-web is very duplicative. Consolidate all of the auth checks into a cherrypy tool that we can use to decorate methods. This tool also anticipates that we will have read-only checks in the future, but for now, it is still only used for admin checks. This tool also populates some additional parameters (like tenant and auth info) so that we don't need to call "getTenantOrRaise" multiple times in a request. Several methods performed HTTP method checks inside the method which inhibits our ability to wrap an entire method with an auth_check. To resolve this, we now use method conditions on the routes dispatcher. As a convention, I have put the options handling on the "GET" methods since they are most likely to be universal. Change-Id: Id815efd9337cbed621509bb0f914bdb552379bc7
* \|	Merge "Simplify tenant_authorizatons check"	Zuul	2022-10-26	1	-23/+7
\|\ \
\| * \|	Simplify tenant_authorizatons check	James E. Blair	2022-10-06	1	-23/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This method iterates over all tenants but only needs to return information about a single tenant. Simplify the calculation for efficiency. This includes a change in behavior for unknown tenants. Currently, a request to /api/tenant/{name}/authorizations will always succeed even if the tenant does not exist (it will return an authorization entry indicating the user is not an admin of the unknown tenant). This is unnecessary and confusing. It will now return a 404 for the unknown tenant. In the updated unit test, tenant-two was an unknown tenant; its name has been updated to 'unknown' to make that clear. (Since the test asserted that data were returned either way, it is unclear whether the original author of the unit test expected tenant-two to be unknown or known.) Change-Id: I545575fb73ef555b34c207f8a5f2e70935c049aa
* \| \|	Merge "Remove unused /api/user/authorizations REST endpoint"	Zuul	2022-10-25	1	-29/+0
\|\ \ \ \| \|/ /
\| * \|	Remove unused /api/user/authorizations REST endpoint	James E. Blair	2022-10-06	1	-29/+0
\| \| \|	This has not been used for a while and can be removed. This will simplify the authorization code in zuul-web. Change-Id: I0fa6c4fb87672c44d3f97db0be558737b4f102bc
* \| \|	Merge "Rename admin-rule to authorization-rule"	Zuul	2022-10-25	1	-3/+3
\|\ \ \ \| \|/ /
\| * \|	Rename admin-rule to authorization-rule	James E. Blair	2022-10-06	1	-3/+3
\| \|/	This is a preparatory step to add access-control for read-level access to the API and web UI. Because we will likely end up with tenant config that looks like:
\| \|	
\| \|	  - tenant:
\| \|	      name: example
\| \|	      admin-rules: ['my-admin-rule']
\| \|	      access-rules: ['my-read-only-rule']
\| \|	
\| \|	It does not make sense for 'my-read-only-rule' to be defined as:
\| \|	
\| \|	  - admin-rule:
\| \|	      name: read-only-rule
\| \|	
\| \|	In other words, the current nomenclature conflates (new word: nomenconflature) the idea of an abstract authorization rule and what it authorizes. The new name makes it more clear that an authorization-rule can be used to authorize more than just admin access. Change-Id: I44da8060a804bc789720bd207c34d802a52b6975
* \|	Merge "Include skipped builds in database and web ui"	Zuul	2022-10-25	1	-2/+4
\|\ \ \| \|/ \|/\|
\| *	Include skipped builds in database and web ui	James E. Blair	2022-10-06	1	-2/+4
\| \|	We have had an on-and-off relationship with skipped builds in the database. Generally we have attempted to exclude them from the db, but we have occasionally (accidentally?) included them. The status quo is that builds with a result of SKIPPED (as well as several other results which don't come from the executor) are not recorded in the database. With a greater interest in being able to determine which jobs ran or did not run for a change after the fact, this job deliberately adds all builds (whether they touch an executor or not, whether real or not) to the database. This means that anything that could potentially show up on the status page or in a code-review report will be in the database, and can therefore be seen in the web UI. It is still the case that we are not actually interested in seeing a page full of SKIPPED builds when we visit the "Builds" tab in the web ui (which is the principal reason we have not included them in the database so far). To address this, we set the default query in the builds tab to exclude skipped builds (it is easy to add other types of builds to exclude in the future if we wish). If a user then specifies a query filter to include specific results, we drop the exclusion from the query string. This allows for the expected behavior of not showing SKIPPED by default, then as specific results are added to the filter, we show only those, and if the user selects that they want to see SKIPPED, they will then be included. On the buildset page, we add a switch similar to the current "show retried jobs" switch that selects whether skipped builds in a buildset should be displayed (again, it hides them by default). Change-Id: I1835965101299bc7a95c952e99f6b0b095def085
* \|	Correct exit routine in web, merger	James E. Blair	2022-10-05	1	-17/+38
\|/	Change I216b76d6aaf7ebd01fa8cca843f03fd7a3eea16d unified the service stop sequence but omitted changes to zuul-web. Update zuul-web to match and make its sequence more robust. Also remove unnecessary sys.exit calls from the merger. Change-Id: Ifdebc17878aa44d57996e4bdd46e49e6144b406b
*	Merge "Trace received Github events"	Zuul	2022-10-04	1	-1/+5
\|\
\| *	Trace received Github events	Simon Westphahl	2022-09-30	1	-1/+5
\| \|	We'll create a span when zuul-web receives a Github webhook event, which is then linked to the span for the event pre-processing step. The pre-processing span context will be added to the trigger events. Together with Icd240712b86cc22e55fb67f6787a0974d5308043, this completes tracing of the whole chain from receiving a Github event until a change is enqueued. Change-Id: I1734a3a9e44f0ae01f5ed3453f8218945c90db58
* \|	Add semaphores to REST API	James E. Blair	2022-09-07	1	-0/+42
\|/ \| \| \| \| \| \| \| \| \| \| \| \|	This adds information about semaphores to the REST API. It allows for inspection of the known semaphores in a tenant, the current number of jobs holding the semaphore, and information about each holder iff that holder is in the current tenant. Followup changes will add zuul-client and zuul-web support for the API, along with docs and release notes. Change-Id: I6ff57ca8db11add2429eefcc8b560abc9c074f4a
*	Fix links for jobs with special characters	Albin Vass	2022-08-23	1	-0/+2
\| \| \| \|	Change-Id: I12e8a056a2e5cd1bb18c1f24ecd7db55405f0a8c
*	Fix race is test_tenant_add_remove	James E. Blair	2022-07-02	1	-0/+1
\|	Change I101fc42e29f4b376272c417ec536c17f05cb1a60 introduced a race in test_tenant_add_remove. If the test takes longer than the 1 second cache interval, it will succeed; if it takes less than that, it will fail. Correct that by disabling caching for the test. Change-Id: I313e519ab26cdbdad0cc17cd4d3489b49482f0b3
*	Merge "Use attributes instead of a nested dict for web cache"	Zuul	2022-06-28	1	-29/+23
\|\
\| *	Use attributes instead of a nested dict for web cache	James E. Blair	2022-06-27	1	-29/+23
\| \|	This is a code-style only change which uses instance attributes instead of a nested dict to store cache information in zuul-web. This is slightly shorter, easier to read/type, less subject to typos, and allows static analysis (eg pyflakes) to work better. It also was slightly misleading to have the two caches combined into a single variable since they operated very differently (one is a single cache, the other is a set of caches). Change-Id: I29ece7e8a39992724596d2d252ad7b023c50f8a8
* \|	Merge "REST API: cache tenants list"	Zuul	2022-06-28	1	-17/+48
\|\ \ \| \|/
\| *	REST API: cache tenants list	Matthieu Huin	2022-03-24	1	-17/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Queue size computations can be costly on big tenants. Add caching to limit performance impact. Change-Id: I101fc42e29f4b376272c417ec536c17f05cb1a60
* \|	Merge "Add global semaphore support"	Zuul	2022-06-22	1	-1/+2
\|\ \
\| * \|	Add global semaphore support	James E. Blair	2022-05-31	1	-1/+2
\| \| \|	This adds support for global semaphores which can be used by multiple tenants. This supports the use case where they represent real-world resources which operate independently of Zuul tenants. This implements and removes the spec describing the feature. One change from the spec is that the configuration object in the tenant config file is "global-semaphore" rather than "semaphore". This makes it easier to distinguish them in documentation (facilitating easier cross-references and deep links), and may also make it easier for users to understand that they have distinct behaviors. Change-Id: I5f2225a700d8f9bef0399189017f23b3f4caad17
* \| \|	Zuul-web: return 404 when attempting autohold ops on an unknown tenant	Matthieu Huin	2022-06-03	1	-3/+4
\|/ / \| \| \| \| \| \|	Change-Id: I03e78a1774e20af6f6895faa089017cdcf62bb48
* \|	Fix zuul-web layout update on full/tenant-reconfigure	James E. Blair	2022-05-25	1	-34/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When an operator issues a full-reconfigure or tenant-reconfigure command to the scheduler, the branch cache is cleared. It will be fully populated again during the subsequent full-reconfiguration, or eventually populated more slowly over time after a tenant-reconfiguration. But during the time the branch cache is clear, zuul-web will be unable to update layouts since it is incapable of populating the branch cache itself. This produces a short time period of errors during a full-reconfig or a potentially long time period of errors after a tenant-reconfig. To correct this, we now detect whether a layout update in zuul-web is incomplete due to a branch cache error, and retry the update later. In the scheduler, we only clear the projects from the branch cache that are affected by the tenants we are reloading (still all the projects for a full-reconfigure). This limits the time during which zuul-web will be unable to update the layout to only the time that the scheduler spends actually performing a reconfiguration. (Note that in general, the error here is not because zuul-web is loading the layout for the tenant that zuul-scheduler is reconfiguring, but rather it is loading a layout for a tenant which has projects in common with the tenant that is being reconfigured.) Change-Id: I6794da4d2316f7df6ab302c74b3efb5df4ce461a
* \|	Fix background layout updates in zuul-web	James E. Blair	2022-04-13	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change Ia0a524053a110e8e48f709d219651e2ad9c8513d updated the scheduler to more aggressively use the file cache, but did not make the corresponding updates to zuul-web. This means that configuration changes are not being reflected in the web UI until the service restarts. This corrects that and adds a test. Change-Id: Iebdf485105bd02e57fa0d3db5ba162308b640ca0
* \|	Merge "Report gross/total tenant resource usage stats"	Zuul	2022-03-23	1	-1/+1
\|\ \ \| \|/ \|/\|
\| *	Report gross/total tenant resource usage stats	Benjamin Schanzel	2022-03-17	1	-1/+1
\| \|	Export a new statsd gauge with the total resources of a tenant. Currently, we only export resources of in-use nodes. With this, we additionally report the cumulative resources of all of a tenant's nodes (i.e. ready, deleting, ...). This also renames the existing in-use resource stat to distinguish those clearly. Change-Id: I76a8c1212c7e9b476782403d52e4e22c030d1371
* \|	Perform per tenant locking in getStatus	Tobias Henkel	2022-03-09	1	-7/+16
\|/	When retrieving the status json, zuul-web currently holds a global lock. This serializes all status requests regardless of the need to update, which can lead to very slow response times in a loaded multi-tenant environment. This can be improved by locking per tenant and by not blocking when the request can be satisfied from the cache. Change-Id: I9896bedd79761c304066027b1735a517511692ce
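The locking scheme described above can be sketched roughly as follows. This is an illustration only, assuming a hypothetical `StatusCache` class, a `compute` callback, and a time-based expiry; none of these names come from the Zuul source. A fresh cache entry is returned without taking any lock, and a stale entry is refreshed under a lock that belongs to that tenant alone:

```python
import threading
import time


class StatusCache:
    """Hypothetical per-tenant status cache with per-tenant locks."""

    def __init__(self, ttl=1.0):
        self.ttl = ttl
        self._cache = {}   # tenant name -> (expiry time, payload)
        self._locks = {}   # tenant name -> threading.Lock
        self._locks_guard = threading.Lock()

    def _lock_for(self, tenant):
        # The guard is held only long enough to look up or create a lock.
        with self._locks_guard:
            return self._locks.setdefault(tenant, threading.Lock())

    def _fresh(self, tenant):
        entry = self._cache.get(tenant)
        if entry is not None and entry[0] > time.monotonic():
            return entry
        return None

    def get(self, tenant, compute):
        entry = self._fresh(tenant)
        if entry is not None:
            return entry[1]  # served from cache, no lock taken
        with self._lock_for(tenant):
            # Another thread may have refreshed the entry while we waited.
            entry = self._fresh(tenant)
            if entry is not None:
                return entry[1]
            payload = compute(tenant)
            self._cache[tenant] = (time.monotonic() + self.ttl, payload)
            return payload
```

A slow refresh for one tenant then delays only requests for that tenant whose cache entry has expired.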
*	Merge "Cache serialized tenant status"	Zuul	2022-02-24	1	-4/+4
\|\
\| *	Cache serialized tenant status	Tobias Henkel	2022-01-31	1	-4/+4
\| \|	We see a very high cpu load on zuul web. Profiling shows that most of that time is spent during json serialization. The biggest json blob we have is the per-tenant status json. This is currently cached as a data structure so we can filter it by change as well. In order to reduce json serialization effort, also cache a pre-serialized version of this and use it for the tenant status. Change-Id: Idbbcda6bf0835fadf970e4fb43adc300770da8e7
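As a rough illustration of the idea, assuming a hypothetical `CachedStatus` holder and a simplified payload with a flat `changes` list (neither is taken from the Zuul source): the structured data stays available for per-change filtering, while the full tenant status is serialized once when the cache entry is built and the resulting bytes are reused for every request.

```python
import json


class CachedStatus:
    """Hypothetical cache entry holding both forms of the status."""

    def __init__(self, data):
        self.data = data
        # Serialize once, when the cache entry is created.
        self.serialized = json.dumps(data).encode("utf-8")


def tenant_status_body(cached):
    # The full status is returned without serializing again.
    return cached.serialized


def change_status_body(cached, change_id):
    # Filtered responses still need the structured data and are
    # serialized per request, but they are much smaller.
    matches = [c for c in cached.data.get("changes", [])
               if c.get("id") == change_id]
    return json.dumps(matches).encode("utf-8")
```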
* \|	Merge "Don't fail on missing change_queues key in status json"	Zuul	2022-02-23	1	-1/+1
\|\ \
\| * \|	Don't fail on missing change_queues key in status json	Felix Edel	2022-02-23	1	-1/+1
\| \| \|	Since the pipeline state is stored in ZooKeeper, there could be cases where the change_queues key is missing in the status json. This makes API requests fail:
\| \| \|	
\| \| \|	  2022-02-22 17:56:18,390 ERROR cherrypy.error.139989033522128: [22/Feb/2022:17:56:18] HTTP
\| \| \|	  Traceback (most recent call last):
\| \| \|	    File "/opt/zuul/lib/python3.8/site-packages/cherrypy/_cprequest.py", line 638, in respond
\| \| \|	      self._do_respond(path_info)
\| \| \|	    File "/opt/zuul/lib/python3.8/site-packages/cherrypy/_cprequest.py", line 697, in _do_respond
\| \| \|	      response.body = self.handler()
\| \| \|	    File "/opt/zuul/lib/python3.8/site-packages/cherrypy/lib/encoding.py", line 223, in __call__
\| \| \|	      self.body = self.oldhandler(*args, **kwargs)
\| \| \|	    File "/opt/zuul/lib/python3.8/site-packages/cherrypy/lib/jsontools.py", line 59, in json_handler
\| \| \|	      value = cherrypy.serving.request._json_inner_handler(*args, **kwargs)
\| \| \|	    File "/opt/zuul/lib/python3.8/site-packages/cherrypy/_cpdispatch.py", line 54, in __call__
\| \| \|	      return self.callable(*self.args, **self.kwargs)
\| \| \|	    File "/opt/zuul/lib/python3.8/site-packages/zuul/web/__init__.py", line 1050, in status_change
\| \| \|	      return result_filter.filterPayload(payload)
\| \| \|	    File "/opt/zuul/lib/python3.8/site-packages/zuul/web/__init__.py", line 193, in filterPayload
\| \| \|	      for change_queue in pipeline['change_queues']:
\| \| \|	  KeyError: 'change_queues'
\| \| \|	
\| \| \|	Fix this by using a .get() call rather than directly accessing the dictionary key by name. A similar issue was already fixed in [1].
\| \| \|	
\| \| \|	[1]: https://review.opendev.org/c/zuul/zuul/+/829018
\| \| \|	
\| \| \|	Change-Id: I947f58f02c3da7dad35d1fc186c7026800f7cbdd
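The kind of fix described above might look like the following sketch. The payload layout (pipelines containing change queues, which contain heads, which contain items) and the function name are assumptions made for illustration, not a copy of Zuul's filterPayload. The relevant part is the `.get()` call with an empty-list default, which lets the loop skip a pipeline that has no `change_queues` key instead of raising KeyError:

```python
def filter_payload(payload, change_id):
    # Collect every queue item whose id matches change_id.
    matches = []
    for pipeline in payload.get("pipelines", []):
        # A pipeline that has not been processed yet may lack this key;
        # .get() with a default turns that into an empty iteration.
        for change_queue in pipeline.get("change_queues", []):
            for head in change_queue.get("heads", []):
                for item in head:
                    if item.get("id") == change_id:
                        matches.append(item)
    return matches
```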
* \| \|	Add buildset start/end db columns	James E. Blair	2022-02-23	1	-13/+11
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add two columns to the buildset table in the database: the timestamp of the start of the first build and the end of the last build. These are calculated from the builds in the webui buildset page, but they are not available in the buildset listing without performing a table join on the server side. To keep the buildset query simple and fast, this adds the columns to the buildset table (which is a minor data duplication). Return the new values in the rest api. Change-Id: Ie162e414ed5cf09e9dc8f5c92e07b80934592fdf
* \|	Fix removal of tenant on layout update	Simon Westphahl	2022-02-21	1	-1/+14
\| \|	Removed tenants were not correctly handled by the layout update logic in the scheduler and zuul-web. The tenant was only removed from the scheduler that processed the reconfiguration event, but not from other scheduler and web instances that reacted to the layout update. In addition to that, we now also clear the TPCs of the removed tenant. Change-Id: I0b3ff77388c27132a558207316bd42c8646212c6
* \|	Merge "Make a global component registry"	Zuul	2022-02-18	1	-5/+4
\|\ \
\| * \|	Make a global component registry	James E. Blair	2022-02-14	1	-5/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We generally try to avoid global variables, but in this case, it may be helpful to set the component registry as a global variable. We need the component registry to determine the ZK data model API version. It's relatively straightforward to pass it through the zkcontext for zkobjects, but we also may need it in other places where we might alter processing of data we previously got from zk (eg, the semaphore cleanup). Or we might need it in serialize or deserialize methods of non-zkobjects (for example, ChangeKey). To account for all potential future uses, instantiate a global singleton object which holds a registry and use that instead of local-scoped component registry objects. We also add a clear method so that we can be sure unit tests start with clean data. Change-Id: Ib764dbc3a3fe39ad6d70d4807b8035777d727d93
* \| \|	Protect stream handler thread from exceptions	James E. Blair	2022-02-16	1	-35/+56
\| \| \|	Change I6da1c88dfe20eb9e1ada09d7aa741f9024ddfc04 updated the log streaming handler to explicitly close streaming sockets rather than waiting for them to be garbage collected. However, it appears it may be possible now for the unregisterStreamer method to be called after the socket is closed. Perhaps it can happen if the socket is closed due to an error on transmission. The exact mechanism is not clear, but the following errors were observed:
\| \| \|	
\| \| \|	  Exception in thread StreamManager:
\| \| \|	  Traceback (most recent call last):
\| \| \|	    File "/usr/local/lib/python3.8/site-packages/zuul/web/__init__.py", line 1647, in run
\| \| \|	      streamer.handle(event)
\| \| \|	    File "/usr/local/lib/python3.8/site-packages/zuul/web/__init__.py", line 340, in handle
\| \| \|	      self.websocket.send(data, False)
\| \| \|	    File "/usr/local/lib/python3.8/site-packages/ws4py/websocket.py", line 303, in send
\| \| \|	      self._write(m)
\| \| \|	    File "/usr/local/lib/python3.8/site-packages/ws4py/websocket.py", line 285, in _write
\| \| \|	      self.sock.sendall(b)
\| \| \|	  BrokenPipeError: [Errno 32] Broken pipe
\| \| \|	
\| \| \|	  During handling of the above exception, another exception occurred:
\| \| \|	
\| \| \|	  Traceback (most recent call last):
\| \| \|	    File "/usr/local/lib/python3.8/threading.py", line 932, in _bootstrap_inner
\| \| \|	      self.run()
\| \| \|	    File "/usr/local/lib/python3.8/threading.py", line 870, in run
\| \| \|	      self._target(*self._args, **self._kwargs)
\| \| \|	    File "/usr/local/lib/python3.8/site-packages/zuul/web/__init__.py", line 1651, in run
\| \| \|	      self.unregisterStreamer(streamer)
\| \| \|	    File "/usr/local/lib/python3.8/site-packages/zuul/web/__init__.py", line 1675, in unregisterStreamer
\| \| \|	      self.poll.unregister(streamer.finger_socket)
\| \| \|	  ValueError: file descriptor cannot be a negative integer (-1)
\| \| \|	
\| \| \|	While the exact sequence is not clear, the following should both prevent the issues from recurring and help produce accurate logs in the future:
\| \| \|	
\| \| \|	1) Preserve the fileno on the stream handler so that we can unregister it from the poll even after it has been closed. We catch KeyErrors from poll.unregister, so if we attempt to unregister it twice, that's fine. We just can't unregister -1. So if we save the fileno that we use when registering and use the same one when unregistering, we should avoid any attempts to unregister an invalid fileno.
\| \| \|	
\| \| \|	2) Put the entire stream handler run loop in a try/except. This is a pattern we generally use to avoid errors like this killing threads which otherwise should continue running.
\| \| \|	
\| \| \|	Change-Id: I67e67c0e1406cba2d31af28af0dcba302003af81
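A minimal sketch of the first remedy, assuming simplified `Streamer` and `StreamManager` classes that are hypothetical stand-ins for the zuul-web ones (it relies on `select.poll`, which is not available on every platform). The file descriptor is saved when the streamer is created and reused for unregistration, so closing the socket first, or unregistering twice, does not raise:

```python
import select
import socket


class Streamer:
    def __init__(self, sock):
        self.sock = sock
        # Save the descriptor now; sock.fileno() returns -1 after close().
        self.fileno = sock.fileno()


class StreamManager:
    def __init__(self):
        self.poll = select.poll()
        self.streamers = {}

    def register(self, streamer):
        self.streamers[streamer.fileno] = streamer
        self.poll.register(streamer.fileno, select.POLLIN)

    def unregister(self, streamer):
        try:
            # Use the saved descriptor rather than the (possibly closed) socket.
            self.poll.unregister(streamer.fileno)
        except KeyError:
            # Already unregistered; nothing left to do.
            pass
        self.streamers.pop(streamer.fileno, None)
```

The second remedy, wrapping the run loop body in a try/except that logs the exception and continues, is independent of this and is not shown here.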
* \| \|	Don't block tenant list on empty pipeline summary	Simon Westphahl	2022-02-14	1	-1/+1
\|/ /	In case a pipeline was never processed by a scheduler the pipeline summary will be empty. This can block the tenant status page.
\| \|	
\| \|	  2022-02-14 09:01:10,292 ERROR cherrypy.error.140642757326640: [14/Feb/2022:09:01:10] HTTP
\| \|	  Traceback (most recent call last):
\| \|	    File "/opt/zuul/lib/python3.8/site-packages/cherrypy/_cprequest.py", line 638, in respond
\| \|	      self._do_respond(path_info)
\| \|	    File "/opt/zuul/lib/python3.8/site-packages/cherrypy/_cprequest.py", line 697, in _do_respond
\| \|	      response.body = self.handler()
\| \|	    File "/opt/zuul/lib/python3.8/site-packages/cherrypy/lib/encoding.py", line 223, in __call__
\| \|	      self.body = self.oldhandler(*args, **kwargs)
\| \|	    File "/opt/zuul/lib/python3.8/site-packages/cherrypy/lib/jsontools.py", line 59, in json_handler
\| \|	      value = cherrypy.serving.request._json_inner_handler(*args, **kwargs)
\| \|	    File "/opt/zuul/lib/python3.8/site-packages/cherrypy/_cpdispatch.py", line 54, in __call__
\| \|	      return self.callable(*self.args, **self.kwargs)
\| \|	    File "/opt/zuul/lib/python3.8/site-packages/zuul/web/__init__.py", line 924, in tenants
\| \|	      for queue in status["change_queues"]:
\| \|	  KeyError: 'change_queues'
\| \|	
\| \|	Change-Id: Iae63c0628a8bdcd5fcf7aff66e1400ec394a07db
* \|	Add zuul-scheduler tenant-reconfigure	James E. Blair	2022-02-08	1	-8/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a new reconfiguration command which behaves like full-reconfigure but only for a single tenant. This can be useful after connection issues with code hosting systems, or potentially with Zuul cache bugs. Because this is the first command-socket command with an argument, some command-socket infrastructure changes are necessary. Additionally, this includes some minor changes to make the services more consistent around socket commands. Change-Id: Ib695ab8e7ae54790a0a0e4ac04fdad96d60ee0c9
* \|	Fix threadpool.queue metric in zuul-web to report the correct value	Felix Edel	2022-02-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the metric reports the same value as the threadpool.idle metric which feels wrong. Fix this by providing the correct variable to the metric. Change-Id: I92acc1881f5299dc96d41593427cd8d7282b3380
* \|	Merge "Display overall duration in buidset page in zuul web"	Zuul	2022-02-05	1	-4/+13
\|\ \
\| * \|	Display overall duration in buidset page in zuul web	Dong Zhang	2022-01-27	1	-4/+13
\| \| \|	The overall duration is, from a user's (developer's) point of view, how much time elapses from the trigger of the build (e.g. a push, a comment, etc.) until the last build is finished. It also takes into account the time spent waiting in the queue, launching nodes, preparing the nodes, etc. Technically, it measures the time between the event timestamp and the end time of the last build in the build set. This duration reflects the user experience of how much time the user needs to wait. Change-Id: I253d023146c696d0372197e599e0df3c217ef344
* \| \|	Add stats to web server	James E. Blair	2022-02-02	1	-3/+55
\| \| \|	This adds metrics which report the number of thread workers in use as well as the number of requests queued at the start of each request in cherrypy. It also reports the number of streamers currently running. These can help us detect and diagnose problems with the web server. Change-Id: Iadf9479ae84167892ab11ae122f275637c0c6c6f