| Commit message | Author | Age | Files | Lines |
The new Elasticsearch does not support a custom field type [1].
[1] https://www.elastic.co/guide/en/elasticsearch/reference/7.17/removal-of-types.html#_custom_type_field
Change-Id: I0b154da0a4736c6b7758f9936356d5b7097c35ad
According to the removal-of-types[1] documentation, it is no longer
necessary to specify a document type.
[1] https://www.elastic.co/guide/en/elasticsearch/reference/7.17/removal-of-types.html
Change-Id: I02996ce328a48b5ae6493646abe08ebab31ec962
When a build result arrives for a non-current buildset, skip reporting
since we can no longer create the reference to the buildset. This
avoids the following traceback:
Traceback (most recent call last):
File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2654, in _doBuildCompletedEvent
self.sql.reportBuildEnd(
File "/opt/zuul/lib/python3.10/site-packages/zuul/driver/sql/sqlreporter.py", line 143, in reportBuildEnd
db_build = self._createBuild(db, build)
File "/opt/zuul/lib/python3.10/site-packages/zuul/driver/sql/sqlreporter.py", line 180, in _createBuild
tenant=buildset.item.pipeline.tenant.name, uuid=buildset.uuid)
AttributeError: 'NoneType' object has no attribute 'item'
Change-Id: Iccbe9ab8212fbbfa21cb29b84a17e03ca221d7bd
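A minimal sketch of the kind of guard this implies; the attribute and
function names here are illustrative, not Zuul's actual reporter code:
    def report_build_end(db, build):
        # (names are hypothetical stand-ins for the SQL reporter API)
        buildset = build.build_set
        if buildset is None or buildset.item is None:
            # The buildset is no longer current (e.g. the item was
            # dequeued or superseded), so there is nothing to attach
            # this build result to; skip the database report.
            return None
        tenant = buildset.item.pipeline.tenant.name
        return db.createBuild(build, tenant=tenant, uuid=buildset.uuid)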
When a job already fails during setup we never load the frozen hostvars.
Since the cleanup playbooks depend on those, we can skip the cleanup
runs if the dict is empty.
As we always add "localhost" to the hostlist, the frozen hostvars will
never be empty when loading was successful.
This will get rid of the following exception:
Traceback (most recent call last):
File "/opt/zuul/lib/python3.10/site-packages/zuul/executor/server.py", line 1126, in execute
self._execute()
File "/opt/zuul/lib/python3.10/site-packages/zuul/executor/server.py", line 1493, in _execute
self.runCleanupPlaybooks(success)
File "/opt/zuul/lib/python3.10/site-packages/zuul/executor/server.py", line 1854, in runCleanupPlaybooks
self.runAnsiblePlaybook(
File "/opt/zuul/lib/python3.10/site-packages/zuul/executor/server.py", line 3042, in runAnsiblePlaybook
self.writeInventory(playbook, self.frozen_hostvars)
File "/opt/zuul/lib/python3.10/site-packages/zuul/executor/server.py", line 2551, in writeInventory
inventory = make_inventory_dict(
File "/opt/zuul/lib/python3.10/site-packages/zuul/executor/server.py", line 913, in make_inventory_dict
node_hostvars = hostvars[node['name']].copy()
KeyError: 'node'
Change-Id: I33a6a9ab355482e471e79f3dd5d702589fee04b3
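A rough sketch of the resulting guard, with hypothetical names standing
in for the executor's actual methods:
    def run_cleanup_playbooks(self, success):
        # (method and attribute names are illustrative only)
        if not self.frozen_hostvars:
            # Setup failed before the frozen hostvars were written; the
            # cleanup playbooks depend on them, so skip the cleanup runs.
            return
        for playbook in self.cleanup_playbooks:
            self.run_ansible_playbook(playbook, success=success)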
Change-Id: I0d450d9385b9aaab22d2d87fb47798bf56525f50
This is a follow-on to Ia78ad9e3ec51bc47bf68c9ff38c0fcd16ba2e728 to
use a different loopback address for the local connection to the
Python 2.7 container. This way, we don't have to override the
existing localhost/127.0.0.1 matches that avoid the executor trying to
talk to a zuul_console daemon. These bits are removed.
The comment around the port settings is updated while we're here.
Change-Id: I33b2198baba13ea348052e998b1a5a362c165479
Change Ief366c092e05fb88351782f6d9cd280bfae96237 introduced a bug in
the streaming daemons because it used Python 3.6 features. The
streaming console needs to work on all Ansible-managed nodes, which
includes Python 2.7 nodes (for as long as Ansible supports them).
This introduces a regression test by building roughly the smallest
Python 2.7 container that can be managed by Ansible. We start this
container, modify the test inventory to include it, and then run the
stream tests against it.
The existing testing runs against the "new" console, but also tests
against the console that OpenDev's Zuul starts, to ensure backwards
compatibility. Since this container wasn't started by Zuul it doesn't
have that console, so that testing is skipped for this node.
It might be good to abstract all testing of the console daemons into
separate containers for each Python version Ansible supports on
managed nodes -- that's a bit more work than I want to take on right
now. This should ensure the lower bound, though, and prevent
regressions on older platforms.
Change-Id: Ia78ad9e3ec51bc47bf68c9ff38c0fcd16ba2e728
Change-Id: I2576d0dcec7c8f7bbb76bdd469fd992874742edc
I noticed in some of our testing a construct like
  debug:
    msg: '{{ ansible_version }}'
was actually erroring out; you'll see this in the console output if
you're looking:
Ansible output: b'TASK [Print ansible version msg={{ ansible_version }}] *************************'
Ansible output: b'[WARNING]: Failure using method (v2_runner_on_ok) in callback plugin'
Ansible output: b'(<ansible.plugins.callback.zuul_stream.CallbackModule object at'
Ansible output: b"0x7f502760b490>): 'dict' object has no attribute 'startswith'"
and the job-output.txt will be empty for this task (this is detected
by I9f569a411729f8a067de17d99ef6b9d74fc21543).
This is because the msg value here comes in as a dict, and in several
places we assume it is a string. This changes the places where we
inspect the msg variable to use the standard Ansible way of making a
text string (the to_text function), and ensures the logging function
converts its input to a string.
We test for this with updated tasks in the remote_zuul_stream tests.
It is slightly refactored to do partial matches so we can use the
version strings, which is where we saw the issue.
Change-Id: I6e6ed8dba2ba1fc74e7fc8361e8439ea6139279e
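As a minimal illustration of the normalization involved (to_text is
Ansible's standard helper; the surrounding function is made up):
    from ansible.module_utils._text import to_text

    def log_msg_lines(msg):
        # msg may arrive as a dict, list, bytes or str depending on the
        # task; convert it to text before using string methods on it.
        for line in to_text(msg).splitlines():
            print(line)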
Currently the task in the test playbook
  - hosts: compute1
    tasks:
      - name: Command Not Found
        command: command-not-found
        failed_when: false
is failing in the zuul_stream callback with an exception trying to
fill out the "delta" value in the message here. The result dict
(taken from the new output) shows us why:
2022-08-24 07:19:27.079961 | TASK [Command Not Found]
2022-08-24 07:19:28.578380 | compute1 | ok: ERROR (ignored)
2022-08-24 07:19:28.578622 | compute1 | {
2022-08-24 07:19:28.578672 | compute1 | "failed_when_result": false,
2022-08-24 07:19:28.578700 | compute1 | "msg": "[Errno 2] No such file or directory: b'command-not-found'",
2022-08-24 07:19:28.578726 | compute1 | "rc": 2
2022-08-24 07:19:28.578750 | compute1 | }
i.e. it has no start/stop/delta in the result (it did run and fail, so
you'd think it might ... but this is what Ansible gives us).
This change checks for that case; as mentioned, the output will now
look like the above when the timing information is missing.
This was found by the prior change
I9f569a411729f8a067de17d99ef6b9d74fc21543. This fixes the current
warning, so we invert the test to prevent further regressions.
Change-Id: I106b2bbe626ed5af8ca739d354ba41eca2f08f77
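A small sketch of the fallback, assuming a plain result dict (this is
illustrative, not the actual zuul_stream code):
    def format_timing(result):
        if all(k in result for k in ("start", "end", "delta")):
            return "Runtime: {delta} Start: {start} End: {end}".format(**result)
        # Some failures (such as the command binary not being found)
        # carry no timing information at all; fall back to the message.
        return result.get("msg", "")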
The Ansible version is sometimes used for selecting the correct linter
or for implementing feature switches to make roles/playbooks backward
compatible.
With the split of Ansible into an "ansible" and "ansible-core" package,
the `ansible_version` now contains the version of the core package.
There seems to be no other variable that contains the version of the
"Ansible community" package that Zuul is using.
To support this use case for Ansible 5+, we add the Ansible version to
the job's Zuul vars.
Change-Id: I3f3a3237b8649770a9b7ff488e501a97b646a4c4
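For illustration, a feature switch of the sort this enables might look
like the following; the exact key under the job's Zuul vars is an
assumption here, not a documented name:
    def pick_lint_rules(zuul_vars):
        # Assume the community Ansible version is exposed as a plain
        # "X.Y.Z" string in the job's Zuul vars (key name hypothetical).
        version = zuul_vars.get("ansible_version", "0")
        major = int(version.split(".")[0] or 0)
        return "rules-ansible5" if major >= 5 else "rules-legacy"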
With the default "linear" strategy (and likely others), Ansible will
send the on_task_start callback, and then fork a worker process to
execute that task. Since we spawn a thread in the on_task_start
callback, we can end up emitting a log message in this method while
Ansible is forking. If a forked process inherits a Python file object
(i.e., stdout) that is locked by a thread that doesn't exist in the
fork (i.e., this one), it can deadlock when trying to flush the file
object. To minimize the chances of that happening, we should avoid
using _display outside the main thread.
The Python logging module is supposed to use internal locks which are
automatically acquired and released across a fork. Assuming this is
(still) true and functioning correctly, we should be okay to issue
our Python logging module calls at any time. If there is a fault
in this system, however, it could potentially cause a similar
problem.
If we can convince the Ansible maintainers to lock _display across
forks, we may be able to revert this change in the future.
Change-Id: Ifc6b835c151539e6209284728ccad467bef8be6f
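A sketch of the idea, assuming a callback that holds an Ansible Display
object; the routing logic here is illustrative, not the real plugin:
    import logging
    import threading

    log = logging.getLogger("zuul.ansible.callback")

    def safe_emit(display, message):
        if threading.current_thread() is threading.main_thread():
            # In the main thread _display is safe to use directly.
            display.display(message)
        else:
            # Outside the main thread, avoid _display: a fork taken
            # while this thread holds the underlying file lock could
            # deadlock the child when it flushes the file object.
            log.info(message)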
This adds a config-error pipeline reporter configuration option and
now also reports config errors and merge conflicts to the database
as buildset failures.
The driving use case is that if periodic pipelines encounter config
errors (such as being unable to freeze a job graph), they might send
email if configured to send email on merge conflicts, but otherwise
their results are not reported to the database.
To make this more visible, first we need Zuul pipelines to report
buildset ends to the database in more cases -- currently we typically
only report a buildset end if there are jobs (and so a buildset start),
or in some other special cases. This change adds config errors and
merge conflicts to the set of cases where we report a buildset end.
Because of some shortcuts previously taken, that would end up reporting
a merge conflict message to the database instead of the actual error
message. To resolve this, we add a new config-error reporter action
and adjust the config error reporter handling path to use it instead
of the merge-conflicts action.
Tests of this as well as the merge-conflicts code path are added.
Finally, a small debug aid is added to the GerritReporter so that we
can easily see in the logs which reporter action was used.
Change-Id: I805c26a88675bf15ae9d0d6c8999b178185e4f1f
The report message "This change depends on a change that failed to
merge" (and a similar change for circular dependency bundles) is
famously vague. To help users identify the actual problem, include
URLs for which change(s) caused the problem so that users may more
easily resolve the issue.
Change-Id: Id8b9f8cf2c108703e9209e30bdc9a3933f074652
To avoid issues with outdated Github access tokens in the Git config,
we only update the remote URL on the repo object after the config
update has succeeded.
This also adds a missing repo lock when building the repo state.
Change-Id: I8e1b5b26f03cb75727d2b2e3c9310214a3eac447
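The ordering amounts to something like the following sketch, with
GitPython used for illustration and write_git_config standing in for a
hypothetical helper:
    def update_remote_url(repo, new_url, write_git_config):
        # Write the refreshed token to the Git config first; only if
        # that succeeds do we update the remote URL on the repo object,
        # so a failed update never leaves a stale URL behind.
        write_git_config(new_url)
        repo.remotes.origin.set_url(new_url)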
Change-Id: I12e8a056a2e5cd1bb18c1f24ecd7db55405f0a8c
Sometimes Gerrit events may arrive in batches (for example, an
automated process modifies several related changes nearly
simultaneously). Because of our inbuilt delay (10 seconds by
default), it's possible that in these cases, many or all of the
updates represented by these events will have settled on the Gerrit
server before we even start processing the first event. In these
cases, we don't need to query the same changes multiple times.
Take for example a stack of 10 changes. Someone approves all 10
simultaneously. That would produce (at least) 10 events for Zuul
to process. Each event would cause Zuul to query all 10 changes in
the series (since they are related). That's 100 change queries
(and each change query requires 2 or 3 HTTP queries).
But if we know that all the events arrived before our first set of
change queries, we can reduce that set of 100 queries to 10 by
suppressing any queries after the first.
This change generates a logical timestamp (ltime) immediately
before querying Gerrit for a change, and stores that ltime in the
change cache. Whenever an event arrives for processing with an
ltime later than the query ltime, we assume the change is already
up to date with that event and do not perform another query.
Change-Id: Ib1b9245cc84ab3f5a0624697f4e3fc73bc8e03bd
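A condensed sketch of the idea; the names are illustrative and the real
change cache API differs:
    def get_change(cache, connection, change_key, event_ltime):
        entry = cache.get(change_key)
        if entry is not None and event_ltime <= entry.query_ltime:
            # The cached copy was queried after this event happened, so
            # it already reflects the event; no further query is needed.
            return entry.change
        query_ltime = connection.getLtime()  # taken just before the query
        change = connection.queryChange(change_key)
        cache.set(change_key, change, query_ltime)
        return change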
In zuul_stream.py:v2_playbook_on_task_start() it checks for
"task.loop" and exits if the task is part of a loop.
However the library/command.py override still writes out the console
log despite it never being read. To avoid leaving this file around,
mark a sentinel uuid in the action plugin if the command is part of a
loop. In that case, for simplicity we just write to /dev/null -- that
way no other assumptions in the library/command.py have to change; it
just doesn't leave a file on disk.
This is currently difficult to test, as the zuul_console on the
infrastructure executors leaves /tmp/console-* files behind and we do
not know which of those come from production and which from testing.
After this and the related change
I823156dc2bcae91bd6d9770bd1520aa55ad875b4 are deployed to the
infrastructure executors, we can add a simple and complete test for
the future by just ensuring no /tmp/console-* files are left behind
after testing. I have tested this locally and no longer see files from
loops, which I was seeing before this change.
Change-Id: I4f4660c3c0b0f170561c14940cc159dc43eadc79
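A tiny sketch of the effect; the path and parameter names are made up:
    import os

    def console_log_path(log_uuid, in_loop):
        # (illustrative helper, not the real library/command.py code)
        if in_loop:
            # Loop iterations are never streamed by zuul_stream, so
            # write the output nowhere rather than leaving an unread
            # temporary file on disk.
            return os.devnull
        return "/tmp/console-{}.log".format(log_uuid)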
An executor can have a zone configured and at the same time allow
unzoned jobs. In this case the executor was not counted for the zoned
executor metric (online/accepting).
Change-Id: Ib39947e3403d828b595cf2479e64789e049e63cc
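The accounting fix amounts to something like this sketch; the counter
names are invented:
    def count_executor(stats, zone, allow_unzoned):
        if zone:
            # A zoned executor counts towards its zone even when it
            # also accepts unzoned jobs.
            stats["zoned_online"] = stats.get("zoned_online", 0) + 1
        if allow_unzoned or not zone:
            stats["unzoned_online"] = stats.get("unzoned_online", 0) + 1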
In Ief366c092e05fb88351782f6d9cd280bfae96237 I missed that this runs
in the context of the remote node; meaning that it must support all
the Python versions that might run there. f-strings are not 3.5
compatible.
I'm thinking about how to lint this better (a syntax check run?)
Change-Id: Ia4133b061800791196cd631f2e6836cb77347664
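For example, remote-node code has to stay off f-strings so it still
runs on the oldest managed-node interpreters (the function itself is
illustrative):
    def console_path(log_uuid):
        # f"/tmp/console-{log_uuid}.log" is valid only on Python 3.6+;
        # str.format() also works on the Python 2.7 nodes Ansible manages.
        return "/tmp/console-{}.log".format(log_uuid)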
When using protocol version 1, send a finalise message when streaming
is complete so that the zuul_console daemon can delete the temporary
file.
We test this by inspecting the Ansible console output, which logs a
message with the UUID of the streaming job. We dump the temporary
files on the remote side and make sure a console file for that job
isn't present.
Change-Id: I823156dc2bcae91bd6d9770bd1520aa55ad875b4
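Purely as an illustration of the flow; the real message format is
defined by zuul_console and is not reproduced here:
    def finish_streaming(sock, log_uuid):
        # (hypothetical message format) With protocol version 1, tell
        # the console daemon we are done so it can remove the temporary
        # log file for this command.
        sock.sendall("finalise:{}\n".format(log_uuid).encode("utf-8"))
        sock.close()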
A refresher on how this works, to the best of my knowledge:
1. Firstly, Zuul's Ansible has a library task "zuul_console:" which is
   run against the remote node; this forks a console daemon, listening
   on a default port.
2. We have an action plugin that runs for each task and, if that task
   is a command/shell task, assigns it a unique id.
3. We then override library/command.py (which backs command/shell
   tasks) with a version that forks and runs the process on the target
   node as usual, but also saves the stdout/stderr output to a
   temporary file named after the unique uuid from the step above.
4. At the same time we have the callback plugin zuul_stream.py, which
   Ansible calls as it moves through starting, running and finishing
   the tasks. This looks at the task and, if it has a UUID [2], sends
   a request to the zuul_console [1], which opens the temporary file
   [3] and starts streaming it back.
5. We loop reading this until the connection is closed by [1],
   eventually outputting each line.
In this way, the console log is effectively streamed and saved into
our job output.
We have established that the console [1] may be updated asynchronously
from the command/streaming side [3, 4], in situations such as static
nodes. This poses a problem if we ever want to update either part --
for example, we cannot change the file name that command.py logs to,
because an old zuul_console: will not know to open the new file. You
could imagine other fancy things you might like to do, e.g.
negotiating compression, that would have similar issues.
To provide the flexibility for these types of changes, implement a
simple protocol where the zuul_stream and zuul_console sides exchange
their respective version numbers before sending the log files. This
way they can both decide what operations are compatible both ways.
Luckily the extant protocol, which is really just sending a plain
uuid, can be adapted to this. When an old zuul_console server gets
the protocol request it will just look like an invalid log file, which
zuul_stream can handle and thus assume the remote end doesn't know
about protocols.
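Purely as an illustrative sketch of such a version exchange (not the
actual wire format used by zuul_console and zuul_stream):
    def negotiate_version(sock, my_version):
        # (hypothetical framing) announce our protocol version first
        sock.sendall("protocol:{}\n".format(my_version).encode("utf-8"))
        reply = sock.recv(64).decode("utf-8", "replace").strip()
        if not reply.startswith("protocol:"):
            # An old console daemon treats the request as an unknown
            # log file; fall back to the plain-uuid behaviour.
            return 0
        return min(my_version, int(reply.split(":", 1)[1]))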
This bumps the testing timeout; it seems that the extra calls make for
random failures. The failures are random and not in the same place;
I've run this separately in 850719 several times and not seen any
problems with the higher timeout. This test already has a slightly
higher settle timeout, so I think it must have just been on the edge.
Change-Id: Ief366c092e05fb88351782f6d9cd280bfae96237
Adds the pipeline of the change to the subject format. This makes it
easier to include information about the pipeline (e.g. its name) in
the e-mail subject.
Change-Id: I6ec973635543b4404c125589f23ffd1ba5504c17
This fixes pep8 E275 which wants whitespace after assert and del.
Change-Id: I1f8659f462aa91c3fdf8f7eb8b939b67c0ce9f55
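For reference, E275 flags a missing space after a keyword, e.g. (the
function is a made-up example):
    def check(value):
        # "assert(value is not None)" triggers E275; the spaced forms
        # below do not.
        assert value is not None
        del value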
So that operators can see in aggregate how long merge, files-changes,
and repo-state merge operations take in certain pipelines, add
metrics for the merge operations themselves (these exclude the
overhead of pipeline processing and job dispatching).
Change-Id: I8a707b8453c7c9559d22c627292741972c47c7d7
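The shape of such a metric is roughly the following sketch, using the
statsd timing call with an invented stat name:
    import time

    def timed_merge(statsd, merger, items):
        # (illustrative wrapper; not the real merger code path)
        start = time.monotonic()
        result = merger.merge(items)
        elapsed_ms = (time.monotonic() - start) * 1000
        statsd.timing("zuul.merger.merge_op", elapsed_ms)
        return result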
The pipeline change cache is used to avoid repeated cache lookups
for change dependencies, but is not as robust as the "real" source
change cache at managing the lifetime of change objects. If it is
used to store some change objects in one scheduler which are later
dequeued by a second scheduler, the next time those changes show
up in that pipeline on the first scheduler it may use old change
objects instead of new ones from the ZooKeeper cache.
To fix this, clear the pipeline change cache before we refresh
the pipeline state. This will cause some extra ZK ChangeCache
hits to repopulate the cache, but only in the case of commit
dependencies (the pipeline change cache isn't used for git
dependencies or the changes in the pipeline themselves unless they
are commit dependencies).
Change-Id: I0a20dc972917440d4f3e8bb59295b77c13913a48
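A sketch of the ordering only; the method names are hypothetical:
    def refresh_pipeline_state(manager, context):
        # Drop the per-pipeline change cache first so changes dequeued
        # by another scheduler are re-fetched from the ZooKeeper change
        # cache rather than reusing stale local objects.
        manager.clearPipelineChangeCache()
        manager.pipeline.state.refresh(context)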