When a user sets zuul_console_disabled, we don't need to try to
connect to the streaming daemon; in fact, they may have set it
precisely because they know it won't be running. Check for this and
skip the connection step in that case, thereby avoiding the extraneous
"Waiting on logger" messages and the extra 30-second delay at the end
of each task.
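
A minimal sketch of the guard, assuming the flag arrives via task
vars (the real plugin's variable plumbing differs):

```
import socket

def connect_streaming_daemon(ip, port, task_vars):
    # zuul_console_disabled tells us the daemon is known not to be
    # running; skip the connection attempt, its "Waiting on logger"
    # retries, and the ~30 second timeout.
    if task_vars.get('zuul_console_disabled', False):
        return None
    return socket.create_connection((ip, port), timeout=5)
```
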
Change-Id: I86af231f1ca1c5b54b21daae29387a8798190a58
This is a follow-on to Ia78ad9e3ec51bc47bf68c9ff38c0fcd16ba2e728 that
uses a different loopback address for the local connection to the
Python 2.7 container. This way, we don't have to override the
existing localhost/127.0.0.1 matches that stop the executor from
trying to talk to a zuul_console daemon; those overrides are removed.

The comment around the port settings is updated while we're here.
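
Roughly why a distinct loopback address helps; the addresses and the
skip check below are illustrative, not the plugin's exact code:

```
# The stream callback skips hosts that look like the executor itself.
# Binding the test container to a different loopback address keeps it
# out of that skip list without special-casing.
LOCAL_ADDRESSES = ('localhost', '127.0.0.1', '::1')

def should_stream(host_ip):
    return host_ip not in LOCAL_ADDRESSES

assert not should_stream('127.0.0.1')  # the executor itself: skipped
assert should_stream('127.0.0.2')      # the test container: streamed
```
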
Change-Id: I33b2198baba13ea348052e998b1a5a362c165479
Change Ief366c092e05fb88351782f6d9cd280bfae96237 introduced a bug in
the streaming daemons because it used Python 3.6 features. The
streaming console needs to work on all Ansible-managed nodes, which
includes Python 2.7 nodes (while Ansible supports them).

This introduces a regression test by building about the smallest
Python 2.7 container that can be managed by Ansible. We start this
container, modify the test inventory to include it, and then run the
stream tests against it.

The existing testing runs against the "new" console but also tests
against the console OpenDev's Zuul starts, to ensure backwards
compatibility. Since this container wasn't started by Zuul it doesn't
have that console, so that testing is skipped for this node.

It might be good to abstract all testing of the console daemons into
separate containers for each managed-node Python version Ansible
supports -- that's a bit more work than I want to take on right now.
This should ensure the lower bound, though, and prevent regressions on
older platforms.
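
For illustration, the kind of construct that breaks a Python 2.7
managed node, and a portable replacement (the exact lines in the
original change differ):

```
# Python 3.6+ only; a SyntaxError on a 2.7 node:
#   msg = f'[Zuul] Task exit code: {rc}'
# Portable across 2.7 and 3.x:
def exit_message(rc):
    return '[Zuul] Task exit code: %s' % rc

print(exit_message(0))
```
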
Change-Id: Ia78ad9e3ec51bc47bf68c9ff38c0fcd16ba2e728
I noticed in some of our testing that a construct like

  debug:
    msg: '{{ ansible_version }}'

was actually erroring out; you'll see in the console output, if you're
looking:

  Ansible output: b'TASK [Print ansible version msg={{ ansible_version }}] *************************'
  Ansible output: b'[WARNING]: Failure using method (v2_runner_on_ok) in callback plugin'
  Ansible output: b'(<ansible.plugins.callback.zuul_stream.CallbackModule object at'
  Ansible output: b"0x7f502760b490>): 'dict' object has no attribute 'startswith'"

and the job-output.txt will be empty for this task (this is detected
by I9f569a411729f8a067de17d99ef6b9d74fc21543).

This is because the msg value here comes in as a dict, and in several
places we assume it is a string. This changes the places we inspect
the msg variable to use the standard Ansible way to make a text string
(the to_text function) and ensures the logging function converts its
input to a string.

We test for this with updated tasks in the remote_zuul_stream tests.
They are slightly refactored to do partial matches so we can use the
version strings, which is where we saw the issue.
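
A sketch of the coercion; to_text's default nonstring handling
repr()s non-string input, which is what we want for logging (the real
change touches several call sites):

```
from ansible.module_utils._text import to_text

def loggable_msg(msg):
    # msg may be a dict here (e.g. msg: '{{ ansible_version }}'), so
    # coerce it to text before using string methods like startswith().
    return to_text(msg)
```
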
Change-Id: I6e6ed8dba2ba1fc74e7fc8361e8439ea6139279e
Currently the task in the test playbook

  - hosts: compute1
    tasks:
      - name: Command Not Found
        command: command-not-found
        failed_when: false

is failing in the zuul_stream callback with an exception when trying
to fill out the "delta" value in the message. The result dict (taken
from the new output) shows us why:

  2022-08-24 07:19:27.079961 | TASK [Command Not Found]
  2022-08-24 07:19:28.578380 | compute1 | ok: ERROR (ignored)
  2022-08-24 07:19:28.578622 | compute1 | {
  2022-08-24 07:19:28.578672 | compute1 |   "failed_when_result": false,
  2022-08-24 07:19:28.578700 | compute1 |   "msg": "[Errno 2] No such file or directory: b'command-not-found'",
  2022-08-24 07:19:28.578726 | compute1 |   "rc": 2
  2022-08-24 07:19:28.578750 | compute1 | }

i.e. it has no start/stop/delta in the result (it did run and fail, so
you'd think it might ... but this is what Ansible gives us).

This checks for that path; as mentioned, the output will now look like
the above in this case.

This was found by the prior change
I9f569a411729f8a067de17d99ef6b9d74fc21543. Since this fixes the
current warning, we invert the test to prevent further regressions.
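
A sketch of the defensive check; the key names mirror Ansible command
results, though the real callback formats more than this:

```
def timing_line(result):
    # A command that failed to start carries no timing keys at all,
    # so only emit the runtime when Ansible actually supplied it.
    if all(k in result for k in ('start', 'end', 'delta')):
        return 'Runtime: %s' % result['delta']
    return None

print(timing_line({'msg': 'no such file', 'rc': 2}))  # None
```
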
Change-Id: I106b2bbe626ed5af8ca739d354ba41eca2f08f77
With the default "linear" strategy (and likely others), Ansible will
send the on_task_start callback and then fork a worker process to
execute that task. Since we spawn a thread in the on_task_start
callback, we can end up emitting a log message in this method while
Ansible is forking. If a forked process inherits a Python file object
(i.e., stdout) that is locked by a thread that doesn't exist in the
fork (i.e., this one), it can deadlock when trying to flush the file
object. To minimize the chances of that happening, we should avoid
using _display outside the main thread.

The Python logging module is supposed to use internal locks which are
automatically acquired and released across a fork. Assuming this is
(still) true and functioning correctly, we should be okay to issue
our Python logging module calls at any time. If there is a fault in
this system, however, it has the potential to cause a similar
problem.

If we can convince the Ansible maintainers to lock _display across
forks, we may be able to revert this change in the future.
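
A sketch of the substitution, assuming a logger wired up by the
executor (the logger name is illustrative):

```
import logging

LOG = logging.getLogger('zuul.executor.ansible')

def note_task_start(task_name):
    # Not self._display.display(): _display writes to a stdout object
    # whose lock a forked child can inherit in a locked state, while
    # logging's internal locks are meant to be managed across fork.
    LOG.debug('Task start: %s', task_name)
```
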
Change-Id: Ifc6b835c151539e6209284728ccad467bef8be6f
When using protocol version 1, send a finalise message when streaming
is complete so that the zuul_console daemon can delete the temporary
file.

We test this by inspecting the Ansible console output, which logs a
message with the UUID of the streaming job. We dump the temporary
files on the remote side and make sure a console file for that job
isn't present.
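
A sketch of the finalise step from the streaming side; the wire
framing here is illustrative, not the exact protocol:

```
import socket

def finalise(ip, port, log_uuid):
    # Tell zuul_console which per-task temporary file it may now
    # delete, since streaming for this task is complete.
    s = socket.create_connection((ip, port), timeout=5)
    try:
        s.sendall(('finish %s\n' % log_uuid).encode('utf-8'))
    finally:
        s.close()
```
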
Change-Id: I823156dc2bcae91bd6d9770bd1520aa55ad875b4
A refresher on how this works, to the best of my knowledge:

1. Firstly, Zuul's Ansible has a library task "zuul_console:" which is
   run against the remote node; this forks a console daemon listening
   on a default port.
2. We have an action plugin that runs for each task and, if that task
   is a command/shell task, assigns it a unique id.
3. We then override command/shell tasks with library/command.py, a
   version that forks and runs the process on the target node as
   usual, but also saves the stdout/stderr output to a temporary file
   named for the unique uuid from the step above.
4. At the same time we have the callback plugin zuul_stream.py, which
   Ansible calls as it moves through starting, running and finishing
   the tasks. This looks at the task and, if it has a UUID [2], sends
   a request to the zuul_console [1], which opens the temporary file
   [3] and starts streaming it back.
5. We loop reading this until the connection is closed by [1],
   eventually outputting each line.

In this way, the console log is effectively streamed and saved into
our job output.

We have established that we expect the console [1] to be updated
asynchronously from the command/streaming [3, 4] in situations such as
static nodes. This poses a problem if we ever want to update either
part -- for example, we can not change the file name that command.py
logs to, because an old zuul_console: will not know to open the new
file. You could imagine other fancy things you might like to do,
e.g. negotiate compression, that would have similar issues.

To provide the flexibility for these types of changes, implement a
simple protocol where the zuul_stream and zuul_console sides exchange
their respective version numbers before sending the log files. This
way they can both decide which operations are mutually compatible.

Luckily the extant protocol, which is really just sending a plain
uuid, can be adapted to this. When an old zuul_console server gets
the protocol request, it will just look like an invalid log file,
which zuul_stream can handle and thus assume the remote end doesn't
know about protocols.
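
A sketch of the handshake from the zuul_stream side; the wire strings
are illustrative, not the exact format:

```
def negotiate_version(sock, our_version):
    # An old zuul_console treats the version request as an unknown
    # log file, which we interpret as "protocol 0", i.e. the original
    # bare-uuid scheme.
    sock.sendall(('protocol %d\n' % our_version).encode('utf-8'))
    reply = sock.recv(1024).decode('utf-8').strip()
    try:
        remote_version = int(reply)
    except ValueError:
        return 0  # old console complaining about a missing file
    # Proceed with the highest mutually supported version.
    return min(our_version, remote_version)
```
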
This bumps the testing timeout; it seems that the extra calls make
for random failures. The failures are random and not in the same
place; I've run this separately in 850719 several times and not seen
any problems with the higher timeout. This test already has a
slightly higher settle timeout, so I think it must have just been on
the edge.

Change-Id: Ief366c092e05fb88351782f6d9cd280bfae96237
The Zuul Ansible callback stream plugin assumed that the Ansible loop
var was always called 'item' in the result_dict. You can override
this value (and it is often necessary to do so to avoid collisions)
to something less generic. In those cases we would get errors like:

  b'[WARNING]: Failure using method (v2_runner_item_on_ok) in callback plugin'
  b'(<ansible.plugins.callback.zuul_stream.CallbackModule object at'
  b"0x7fbecc97c910>): 'item'"

and the stream output would not include the info typically logged.

Address this by checking whether ansible_loop_var is in the
result_dict and using that value for the loop var name instead. We
still fall back to 'item' as I'm not sure that ansible_loop_var is
always present.
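
The essence of the fix, in a self-contained sketch (the surrounding
callback context is omitted):

```
def loop_item(result_dict):
    # loop_control can rename the loop variable; honor that name and
    # fall back to 'item' since ansible_loop_var may not be present.
    loop_var = result_dict.get('ansible_loop_var', 'item')
    return result_dict.get(loop_var)

assert loop_item({'item': 'a'}) == 'a'
assert loop_item({'ansible_loop_var': 'zj_pkg', 'zj_pkg': 'b'}) == 'b'
```
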
Change-Id: I408e6d4af632f8097d63c04cbcb611d843086f6c
If a role is applied to a host more than once (via either play roles
or include_roles, but not via an include_role loop), it will have the
same task UUID from Ansible, which means Zuul's command plugin will
write the streaming output to the same filename, and the log
streaming will request the same file. That means the file might look
like this after the second invocation:

  2022-05-19 17:06:23.673625 | one
  2022-05-19 17:06:23.673781 | [Zuul] Task exit code: 0
  2022-05-19 17:06:29.226463 | two
  2022-05-19 17:06:29.226605 | [Zuul] Task exit code: 0

But since we stop reading the log after "Task exit code", the user
would see "one" twice and never see "two".

Here are some potential fixes for this that don't work:

* Accessing the task vars from zuul_stream to store any additional
  information: the callback plugins are not given the task vars.
* Setting the log id on the task args in zuul_stream instead of
  command: the same Task object is used for each host, and therefore
  the command module might see the task object after it has been
  further modified (in other words, nothing host-specific can be set
  on the task object).
* Setting an even more unique uuid than Task._uuid on the Task object
  in zuul_stream and using that in the command module instead of
  Task._uuid: in some rare cases, the actual task Python object may
  be different between the callback and command plugin yet still have
  the same _uuid; therefore the new attribute would be missing.

Instead, a global variable is used to transfer data between
zuul_stream and command. This variable holds a counter for each
task+host combination. Most of the time it will be 1, but if we run
the same task on the same host again, it will increment. Since
Ansible will not run more than one task on a host simultaneously,
there is no race between the counter being incremented in zuul_stream
and used in command. Because Ansible is re-invoked for each playbook,
the memory usage is not a concern.

There may be a fork between zuul_stream and command, but that's fine
as long as we treat the variable as read-only in the command plugin.
It will have the data for our current task+host from the most recent
zuul_stream callback invocation.
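
A sketch of the shared counter; the names are illustrative:

```
# Module-level state shared between zuul_stream and command.
# zuul_stream increments it before Ansible forks the worker; the
# command plugin treats it as read-only after the fork.
ZUUL_LOG_ID_MAP = {}

def increment_log_count(task_uuid, host):
    key = (task_uuid, host)
    ZUUL_LOG_ID_MAP[key] = ZUUL_LOG_ID_MAP.get(key, 0) + 1
    return ZUUL_LOG_ID_MAP[key]

def log_id(task_uuid, host):
    # Distinct per invocation, so repeated roles get distinct files.
    return '%s-%s' % (task_uuid, ZUUL_LOG_ID_MAP.get((task_uuid, host), 1))
```
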
This change also includes a somewhat unrelated change to the test
infrastructure. Because we were not setting the log stream port on
the executor in tests, we were actually relying on the "real" OpenDev
Zuul starting zuul_console on the test nodes rather than the
zuul_console we set up for each specific Ansible version in the
tests. This corrects that and uses the correct zuul_console port, so
that if we make any changes to zuul_console in the future, we will
test the changed version, not the one from the Zuul which actually
runs the tox-remote job.

Change-Id: Ia656db5f3dade52c8dbd0505b24049fe0fff67a5
Async tasks cause zuul_stream.py to hang for 30 seconds before timing
out all log streaming threads, so this change stops zuul_stream.py
from streaming logs for async tasks.

Change-Id: I86098b0788fdd87593e3c68d04953ede7607432b
Change-Id: If76e730673ac80e254c878c46b17181e14b4a5de
`_read_log()` calls `_log_streamline()` when it handles the
`socket.timeout` exception. This change ensures `_log_streamline()`
doesn't raise another exception:

```
Traceback (most recent call last):
  File "/var/lib/zuul/ansible/2.9/zuul/ansible/callback/zuul_stream.py", line 137, in _read_log
    s = socket.create_connection((ip, port), 5)
  File "/usr/lib/python3.6/socket.py", line 724, in create_connection
    raise err
  File "/usr/lib/python3.6/socket.py", line 713, in create_connection
    sock.connect(sa)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/var/lib/zuul/ansible/2.9/zuul/ansible/callback/zuul_stream.py", line 152, in _read_log
    % (ip, port))
  File "/var/lib/zuul/ansible/2.9/zuul/ansible/callback/zuul_stream.py", line 202, in _log_streamline
    ts, ln = line.split(' | ', 1)
ValueError: not enough values to unpack (expected 2, got 1)
```
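
A sketch of the hardening; the helper shape is illustrative:

```
def split_log_line(line):
    # Tolerate lines lacking the "timestamp | message" separator
    # instead of letting split() raise ValueError while we are
    # already handling a socket.timeout.
    if ' | ' in line:
        ts, ln = line.split(' | ', 1)
        return ts, ln
    return None, line  # pass malformed lines through unparsed
```
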
Change-Id: Ifeea230d1285ab62acf11a0ed37e38e508e1dba8
Strangely this doesn't occur on master but did here:
https://review.opendev.org/#/c/727089/3
Change-Id: Ie2ff53d8ed2db03f70bd2ab7789028b5b7a5e91d
Connections via winrm don't support streaming yet, as this needs a
PowerShell-based zuul_stream module.

Change-Id: Ibc0804ec0466fa83e37a4e453028833ce737da8e
When we get a pod from nodepool, this starts a kubectl port-forward
on the pod so that zuul-console and the normal method of streaming
command output will work.
Change-Id: Iae85347c3d8e0a74e330a7b62b513c7b41641383
Story: 2007321
Task: 38832
Depends-On: https://review.opendev.org/709259
Ansible has released 2.9 and now Zuul also supports it.
Change-Id: Iabf2d6278ba8d88e17403a4adae5521eb3e7019b
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
This change prevents an exception in the zuul_stream module when the
msg result is a list instead of a string. This is the case for the
package module trying to install a missing rpm on Ansible 2.8 with
python3.
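
A sketch of the handling; unlike the dict case, a list msg reads best
joined into individual lines before logging:

```
def msg_as_text(msg):
    # The package module can return msg as a list of strings.
    if isinstance(msg, list):
        return '\n'.join(str(m) for m in msg)
    return str(msg)
```
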
Change-Id: I242711ebc0b2dcd1262de38e76efc48554361bd1
Story: 2006830
This change ensures the zuul-tox-remote job tests the unique
zuul_console service instead of the one set up by the job.

Change-Id: If2e2324235a6608985168d606d95adeade36afe0
In some cases, especially on contended nodes with short tasks, it can
happen that the task finishes before the log we want to stream shows
up. In that case we stop the streamers too early, resulting in
discarded log output. This makes the tox-remote tests flaky but could
also happen in production.

This can be fixed by giving the streamers some additional time when
stopping if the log didn't show up yet.
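
A sketch of the grace period; the attribute and interval names are
illustrative:

```
import time

def stop_streamer(streamer, grace=10.0, poll=0.5):
    # Give the remote log file a chance to appear before giving up,
    # so short tasks on contended nodes aren't truncated.
    deadline = time.time() + grace
    while not streamer.log_seen and time.time() < deadline:
        time.sleep(poll)
    streamer.stop()
```
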
Change-Id: If7f786dfcbdac28fccb17d55ce76cb02ae6ad28a
Ansible has released 2.8 and now Zuul also supports it. We've had to
update zuul_console to deal with new task stats, along with
gather_facts now being exposed directly to the job.

Change-Id: Ifa4be7cf408b1f05b0f985fa0c9a5e3947858078
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
We should keep up with recent Ansible versions, so add support for
2.7.

Change-Id: Ic4234784f104690d72cfac71e09ecd170ccce1b5
As a first step towards supporting multiple Ansible versions, we need
tooling to manage Ansible installations. This moves the installation
of Ansible from requirements.txt into Zuul itself. It is called as a
setup hook to install the Ansible versions into
<prefix>/lib/zuul/ansible. Further, this tooling abstracts the
knowledge the executor needs in order to run the correct version of
Ansible.

The actual use of multiple Ansible versions will be done in
follow-ups.

For better maintainability, the Ansible plugins live in
zuul/ansible/base, where plugins can be kept in different versions if
necessary. For each supported Ansible version there is a specific
folder that symlinks to the appropriate plugins.
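
A sketch of how the executor might resolve a per-version installation
under the layout described above (helper name illustrative):

```
import os

def ansible_install_dir(prefix, version):
    return os.path.join(prefix, 'lib', 'zuul', 'ansible', version)

print(ansible_install_dir('/usr/local', '2.7'))
# -> /usr/local/lib/zuul/ansible/2.7
```
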
Change-Id: I5ce1385245c76818777aa34230786a9dbaf723e5
Depends-On: https://review.openstack.org/623927