path: root/releasenotes/notes/pipeline-timing-ea263e6e5939b1aa.yaml
Commit message | Author | Age | Files | Lines
* Add a tenant reconfiguration metric | James E. Blair | 2022-03-03 | 1 | -0/+1
  This allows operators to see when tenant reconfiguration events are processed, and how long they halt pipeline processing.
  Change-Id: I30102b2d51ae98ade194722a7720ddec7eed4dad
* Add even more pipeline processing stats | James E. Blair | 2022-02-28 | 1 | -0/+2
  This adds some timers that can help operators identify the frequency and time spent processing individual pipelines.
  Change-Id: I927e30268b3c3c08b720174a961259f9670f8c4b
* Add more pipeline processing stats | James E. Blair | 2022-02-28 | 1 | -0/+6
  This adds the number of zk objects, nodes, and bytes read and written during each pipeline processing run. This can help Zuul developers ascertain where to optimize performance.
  Change-Id: Ic2592faeb08d6c2a72b99000864c41ada665cd3b
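The read/write accounting described in the commit above can be pictured with a small sketch. This is not Zuul's implementation: `IOStats` and `CountingStore` are hypothetical names, the store is an in-memory dictionary standing in for ZooKeeper, and only objects and bytes are counted (the separate znode count mentioned in the commit is not modelled).

```python
import json
from dataclasses import dataclass


@dataclass
class IOStats:
    # Hypothetical per-run counters; reset at the start of each processing run.
    objects_read: int = 0
    objects_written: int = 0
    bytes_read: int = 0
    bytes_written: int = 0


class CountingStore:
    """In-memory stand-in for a ZooKeeper-backed object store (assumption)."""

    def __init__(self):
        self._data = {}
        self.stats = IOStats()

    def write(self, path, obj):
        # Serialize first so the counter reflects the bytes actually stored.
        raw = json.dumps(obj, sort_keys=True).encode("utf8")
        self._data[path] = raw
        self.stats.objects_written += 1
        self.stats.bytes_written += len(raw)

    def read(self, path):
        raw = self._data[path]
        self.stats.objects_read += 1
        self.stats.bytes_read += len(raw)
        return json.loads(raw)


store = CountingStore()
store.write("/pipeline/check/item1", {"x": 1})
loaded = store.read("/pipeline/check/item1")
```

At the end of a processing run, counters like these could be reported as gauges and then reset, which would let a developer compare data volume across pipelines.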
* Add pipeline timing metrics | James E. Blair | 2022-02-20 | 1 | -0/+16
  This adds several metrics for different phases of processing an item in a pipeline:

    * How long we wait for a response from mergers
    * How long it takes to get or compute a layout
    * How long it takes to freeze jobs
    * How long we wait for node requests to complete
    * How long we wait for an executor to start running a job after the request

  And finally, the total amount of time from the original event until the first job starts. We already report that at the tenant level, this duplicates that for a pipeline-specific metric.

  Several of these would also make sense as job metrics, but since they are mainly intended to diagnose Zuul system performance and not individual jobs, that would be a waste of storage space due to the extremely high cardinality.

  Additionally, two other timing metrics are added: the cumulative time spent reading and writing ZKObject data to ZK during pipeline processing. These can help determine whether more effort should be spent optimizing ZK data transfer.

  In preparing this change, I noticed that python statsd emits floating point values for timing. It's not clear whether this strictly matches the statsd spec, but since it does emit values with that precision, I have removed several int() casts in order to maintain the precision through to the statsd client.

  I also noticed a place where we were writing a monotonic timestamp value in a JSON serialized string to ZK. I do not believe this value is currently being used, therefore there is no further error to correct, however, we should not use time.monotonic() for values that are serialized since the reference clock will be different on different systems.

  Several new attributes are added to the QueueItem and Build classes, but are done so in a way that is backwards compatible, so no model api schema upgrade is needed. The code sites where they are used protect against the null values which will occur in a mixed-version cluster (the components will just not emit these stats in those cases).

  Change-Id: Iaacbef7fa2ed93bfc398a118c5e8cfbc0a67b846
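Three points from the message above (wall-clock timestamps for serialized values, float precision kept for timings, and null guards for attributes missing in a mixed-version cluster) can be illustrated together. This is a minimal sketch under stated assumptions, not Zuul's code: `FakeItem`, its two attributes, `elapsed_ms`, `serialize`, and `deserialize` are hypothetical names chosen for the example.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class FakeItem:
    # Hypothetical stand-in for a queue item; not Zuul's QueueItem.
    # Both fields default to None so data from older writers still loads.
    event_timestamp: Optional[float] = None
    first_job_start_time: Optional[float] = None


def elapsed_ms(start, end):
    """Return elapsed milliseconds as a float, or None if a value is missing."""
    if start is None or end is None:
        # Mixed-version case: the caller simply skips emitting the stat.
        return None
    # No int() cast, so sub-millisecond precision reaches the stats client.
    return (end - start) * 1000


def serialize(item):
    return json.dumps(asdict(item))


def deserialize(data):
    raw = json.loads(data)
    # .get() yields None for keys an older component never wrote.
    return FakeItem(
        event_timestamp=raw.get("event_timestamp"),
        first_job_start_time=raw.get("first_job_start_time"),
    )


# time.time() is used because the value is serialized and may be read on
# another host; time.monotonic() has no shared reference point across systems.
item = FakeItem(event_timestamp=time.time())
item.first_job_start_time = item.event_timestamp + 1.5
restored = deserialize(serialize(item))
new_ms = elapsed_ms(restored.event_timestamp, restored.first_job_start_time)

# Data written by a hypothetical older component lacks the newer attribute.
old = deserialize('{"event_timestamp": 1645000000.0}')
old_ms = elapsed_ms(old.event_timestamp, old.first_job_start_time)
```

Here `new_ms` is approximately 1500.0 and `old_ms` is `None`, so a caller would report the first timing and skip the second.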