When a user sets zuul_console_disabled, we don't need to try to
connect to the streaming daemon; in fact, they may have set it
precisely because they know it won't be running. Check for this and
skip the connection step in that case, thereby avoiding the extraneous
"Waiting on logger" messages and the extra 30-second delay at the end
of each task.
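
A minimal sketch of the guard, assuming the flag arrives via task
vars (the real plugin's variable plumbing differs):

```
import socket

def connect_streaming_daemon(ip, port, task_vars):
    # zuul_console_disabled tells us the daemon is known not to be
    # running; skip the connection attempt, its "Waiting on logger"
    # retries, and the ~30 second timeout.
    if task_vars.get('zuul_console_disabled', False):
        return None
    return socket.create_connection((ip, port), timeout=5)
```
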
Change-Id: I86af231f1ca1c5b54b21daae29387a8798190a58
This is a follow-on to Ia78ad9e3ec51bc47bf68c9ff38c0fcd16ba2e728 that
uses a different loopback address for the local connection to the
Python 2.7 container. This way, we don't have to override the
existing localhost/127.0.0.1 matches that stop the executor from
trying to talk to a zuul_console daemon; those overrides are removed.

The comment around the port settings is updated while we're here.
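
Roughly why a distinct loopback address helps; the addresses and the
skip check below are illustrative, not the plugin's exact code:

```
# The stream callback skips hosts that look like the executor itself.
# Binding the test container to a different loopback address keeps it
# out of that skip list without special-casing.
LOCAL_ADDRESSES = ('localhost', '127.0.0.1', '::1')

def should_stream(host_ip):
    return host_ip not in LOCAL_ADDRESSES

assert not should_stream('127.0.0.1')  # the executor itself: skipped
assert should_stream('127.0.0.2')      # the test container: streamed
```
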
Change-Id: I33b2198baba13ea348052e998b1a5a362c165479
Change Ief366c092e05fb88351782f6d9cd280bfae96237 introduced a bug in
the streaming daemons because it used Python 3.6 features. The
streaming console needs to work on all Ansible-managed nodes, which
includes Python 2.7 nodes (while Ansible supports them).

This introduces a regression test by building about the smallest
Python 2.7 container that can be managed by Ansible. We start this
container, modify the test inventory to include it, and then run the
stream tests against it.

The existing testing runs against the "new" console but also tests
against the console OpenDev's Zuul starts, to ensure backwards
compatibility. Since this container wasn't started by Zuul it doesn't
have that console, so that testing is skipped for this node.

It might be good to abstract all testing of the console daemons into
separate containers for each managed-node Python version Ansible
supports -- that's a bit more work than I want to take on right now.
This should ensure the lower bound, though, and prevent regressions on
older platforms.
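
For illustration, the kind of construct that breaks a Python 2.7
managed node, and a portable replacement (the exact lines in the
original change differ):

```
# Python 3.6+ only; a SyntaxError on a 2.7 node:
#   msg = f'[Zuul] Task exit code: {rc}'
# Portable across 2.7 and 3.x:
def exit_message(rc):
    return '[Zuul] Task exit code: %s' % rc

print(exit_message(0))
```
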
Change-Id: Ia78ad9e3ec51bc47bf68c9ff38c0fcd16ba2e728
I noticed in some of our testing that a construct like

  debug:
    msg: '{{ ansible_version }}'

was actually erroring out; you'll see in the console output, if you're
looking:

  Ansible output: b'TASK [Print ansible version msg={{ ansible_version }}] *************************'
  Ansible output: b'[WARNING]: Failure using method (v2_runner_on_ok) in callback plugin'
  Ansible output: b'(<ansible.plugins.callback.zuul_stream.CallbackModule object at'
  Ansible output: b"0x7f502760b490>): 'dict' object has no attribute 'startswith'"

and the job-output.txt will be empty for this task (this is detected
by I9f569a411729f8a067de17d99ef6b9d74fc21543).

This is because the msg value here comes in as a dict, and in several
places we assume it is a string. This changes the places we inspect
the msg variable to use the standard Ansible way to make a text string
(the to_text function) and ensures the logging function converts its
input to a string.

We test for this with updated tasks in the remote_zuul_stream tests.
They are slightly refactored to do partial matches so we can use the
version strings, which is where we saw the issue.
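
A sketch of the coercion; to_text's default nonstring handling
repr()s non-string input, which is what we want for logging (the real
change touches several call sites):

```
from ansible.module_utils._text import to_text

def loggable_msg(msg):
    # msg may be a dict here (e.g. msg: '{{ ansible_version }}'), so
    # coerce it to text before using string methods like startswith().
    return to_text(msg)
```
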
Change-Id: I6e6ed8dba2ba1fc74e7fc8361e8439ea6139279e
Currently the task in the test playbook

  - hosts: compute1
    tasks:
      - name: Command Not Found
        command: command-not-found
        failed_when: false

is failing in the zuul_stream callback with an exception when trying
to fill out the "delta" value in the message. The result dict (taken
from the new output) shows us why:

  2022-08-24 07:19:27.079961 | TASK [Command Not Found]
  2022-08-24 07:19:28.578380 | compute1 | ok: ERROR (ignored)
  2022-08-24 07:19:28.578622 | compute1 | {
  2022-08-24 07:19:28.578672 | compute1 |   "failed_when_result": false,
  2022-08-24 07:19:28.578700 | compute1 |   "msg": "[Errno 2] No such file or directory: b'command-not-found'",
  2022-08-24 07:19:28.578726 | compute1 |   "rc": 2
  2022-08-24 07:19:28.578750 | compute1 | }

i.e. it has no start/stop/delta in the result (it did run and fail, so
you'd think it might ... but this is what Ansible gives us).

This checks for that path; as mentioned, the output will now look like
the above in this case.

This was found by the prior change
I9f569a411729f8a067de17d99ef6b9d74fc21543. Since this fixes the
current warning, we invert the test to prevent further regressions.
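
A sketch of the defensive check; the key names mirror Ansible command
results, though the real callback formats more than this:

```
def timing_line(result):
    # A command that failed to start carries no timing keys at all,
    # so only emit the runtime when Ansible actually supplied it.
    if all(k in result for k in ('start', 'end', 'delta')):
        return 'Runtime: %s' % result['delta']
    return None

print(timing_line({'msg': 'no such file', 'rc': 2}))  # None
```
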
Change-Id: I106b2bbe626ed5af8ca739d354ba41eca2f08f77
With the default "linear" strategy (and likely others), Ansible will
send the on_task_start callback and then fork a worker process to
execute that task. Since we spawn a thread in the on_task_start
callback, we can end up emitting a log message in this method while
Ansible is forking. If a forked process inherits a Python file object
(i.e., stdout) that is locked by a thread that doesn't exist in the
fork (i.e., this one), it can deadlock when trying to flush the file
object. To minimize the chances of that happening, we should avoid
using _display outside the main thread.

The Python logging module is supposed to use internal locks which are
automatically acquired and released across a fork. Assuming this is
(still) true and functioning correctly, we should be okay to issue
our Python logging module calls at any time. If there is a fault in
this system, however, it has the potential to cause a similar
problem.

If we can convince the Ansible maintainers to lock _display across
forks, we may be able to revert this change in the future.
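
A sketch of the substitution, assuming a logger wired up by the
executor (the logger name is illustrative):

```
import logging

LOG = logging.getLogger('zuul.executor.ansible')

def note_task_start(task_name):
    # Not self._display.display(): _display writes to a stdout object
    # whose lock a forked child can inherit in a locked state, while
    # logging's internal locks are meant to be managed across fork.
    LOG.debug('Task start: %s', task_name)
```
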
Change-Id: Ifc6b835c151539e6209284728ccad467bef8be6f
When using protocol version 1, send a finalise message when streaming
is complete so that the zuul_console daemon can delete the temporary
file.

We test this by inspecting the Ansible console output, which logs a
message with the UUID of the streaming job. We dump the temporary
files on the remote side and make sure a console file for that job
isn't present.
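
A sketch of the finalise step from the streaming side; the wire
framing here is illustrative, not the exact protocol:

```
import socket

def finalise(ip, port, log_uuid):
    # Tell zuul_console which per-task temporary file it may now
    # delete, since streaming for this task is complete.
    s = socket.create_connection((ip, port), timeout=5)
    try:
        s.sendall(('finish %s\n' % log_uuid).encode('utf-8'))
    finally:
        s.close()
```
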
Change-Id: I823156dc2bcae91bd6d9770bd1520aa55ad875b4
A refresher on how this works, to the best of my knowledge:

1. Firstly, Zuul's Ansible has a library task "zuul_console:" which is
   run against the remote node; this forks a console daemon listening
   on a default port.
2. We have an action plugin that runs for each task and, if that task
   is a command/shell task, assigns it a unique id.
3. We then override command/shell tasks with library/command.py, a
   version that forks and runs the process on the target node as
   usual, but also saves the stdout/stderr output to a temporary file
   named for the unique uuid from the step above.
4. At the same time we have the callback plugin zuul_stream.py, which
   Ansible calls as it moves through starting, running and finishing
   the tasks. This looks at the task and, if it has a UUID [2], sends
   a request to the zuul_console [1], which opens the temporary file
   [3] and starts streaming it back.
5. We loop reading this until the connection is closed by [1],
   eventually outputting each line.

In this way, the console log is effectively streamed and saved into
our job output.

We have established that we expect the console [1] to be updated
asynchronously from the command/streaming [3, 4] in situations such as
static nodes. This poses a problem if we ever want to update either
part -- for example, we can not change the file name that command.py
logs to, because an old zuul_console: will not know to open the new
file. You could imagine other fancy things you might like to do,
e.g. negotiate compression, that would have similar issues.

To provide the flexibility for these types of changes, implement a
simple protocol where the zuul_stream and zuul_console sides exchange
their respective version numbers before sending the log files. This
way they can both decide which operations are mutually compatible.

Luckily the extant protocol, which is really just sending a plain
uuid, can be adapted to this. When an old zuul_console server gets
the protocol request, it will just look like an invalid log file,
which zuul_stream can handle and thus assume the remote end doesn't
know about protocols.
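
A sketch of the handshake from the zuul_stream side; the wire strings
are illustrative, not the exact format:

```
def negotiate_version(sock, our_version):
    # An old zuul_console treats the version request as an unknown
    # log file, which we interpret as "protocol 0", i.e. the original
    # bare-uuid scheme.
    sock.sendall(('protocol %d\n' % our_version).encode('utf-8'))
    reply = sock.recv(1024).decode('utf-8').strip()
    try:
        remote_version = int(reply)
    except ValueError:
        return 0  # old console complaining about a missing file
    # Proceed with the highest mutually supported version.
    return min(our_version, remote_version)
```
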
This bumps the testing timeout; it seems that the extra calls make
for random failures. The failures are random and not in the same
place; I've run this separately in 850719 several times and not seen
any problems with the higher timeout. This test already has a
slightly higher settle timeout, so I think it must have just been on
the edge.

Change-Id: Ief366c092e05fb88351782f6d9cd280bfae96237
The Zuul Ansible callback stream plugin assumed that the Ansible loop
var was always called 'item' in the result_dict. You can override
this value (and it is often necessary to do so to avoid collisions)
to something less generic. In those cases we would get errors like:

  b'[WARNING]: Failure using method (v2_runner_item_on_ok) in callback plugin'
  b'(<ansible.plugins.callback.zuul_stream.CallbackModule object at'
  b"0x7fbecc97c910>): 'item'"

and the stream output would not include the info typically logged.

Address this by checking whether ansible_loop_var is in the
result_dict and using that value for the loop var name instead. We
still fall back to 'item' as I'm not sure that ansible_loop_var is
always present.
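
The essence of the fix, in a self-contained sketch (the surrounding
callback context is omitted):

```
def loop_item(result_dict):
    # loop_control can rename the loop variable; honor that name and
    # fall back to 'item' since ansible_loop_var may not be present.
    loop_var = result_dict.get('ansible_loop_var', 'item')
    return result_dict.get(loop_var)

assert loop_item({'item': 'a'}) == 'a'
assert loop_item({'ansible_loop_var': 'zj_pkg', 'zj_pkg': 'b'}) == 'b'
```
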
Change-Id: I408e6d4af632f8097d63c04cbcb611d843086f6c
If a role is applied to a host more than once (via either play roles
or include_roles, but not via an include_role loop), it will have the
same task UUID from Ansible, which means Zuul's command plugin will
write the streaming output to the same filename, and the log
streaming will request the same file. That means the file might look
like this after the second invocation:

  2022-05-19 17:06:23.673625 | one
  2022-05-19 17:06:23.673781 | [Zuul] Task exit code: 0
  2022-05-19 17:06:29.226463 | two
  2022-05-19 17:06:29.226605 | [Zuul] Task exit code: 0

But since we stop reading the log after "Task exit code", the user
would see "one" twice and never see "two".

Here are some potential fixes for this that don't work:

* Accessing the task vars from zuul_stream to store any additional
  information: the callback plugins are not given the task vars.
* Setting the log id on the task args in zuul_stream instead of
  command: the same Task object is used for each host, and therefore
  the command module might see the task object after it has been
  further modified (in other words, nothing host-specific can be set
  on the task object).
* Setting an even more unique uuid than Task._uuid on the Task object
  in zuul_stream and using that in the command module instead of
  Task._uuid: in some rare cases, the actual task Python object may
  be different between the callback and command plugin yet still have
  the same _uuid; therefore the new attribute would be missing.

Instead, a global variable is used to transfer data between
zuul_stream and command. This variable holds a counter for each
task+host combination. Most of the time it will be 1, but if we run
the same task on the same host again, it will increment. Since
Ansible will not run more than one task on a host simultaneously,
there is no race between the counter being incremented in zuul_stream
and used in command. Because Ansible is re-invoked for each playbook,
the memory usage is not a concern.

There may be a fork between zuul_stream and command, but that's fine
as long as we treat the variable as read-only in the command plugin.
It will have the data for our current task+host from the most recent
zuul_stream callback invocation.
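
A sketch of the shared counter; the names are illustrative:

```
# Module-level state shared between zuul_stream and command.
# zuul_stream increments it before Ansible forks the worker; the
# command plugin treats it as read-only after the fork.
ZUUL_LOG_ID_MAP = {}

def increment_log_count(task_uuid, host):
    key = (task_uuid, host)
    ZUUL_LOG_ID_MAP[key] = ZUUL_LOG_ID_MAP.get(key, 0) + 1
    return ZUUL_LOG_ID_MAP[key]

def log_id(task_uuid, host):
    # Distinct per invocation, so repeated roles get distinct files.
    return '%s-%s' % (task_uuid, ZUUL_LOG_ID_MAP.get((task_uuid, host), 1))
```
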
This change also includes a somewhat unrelated change to the test
infrastructure. Because we were not setting the log stream port on
the executor in tests, we were actually relying on the "real" OpenDev
Zuul starting zuul_console on the test nodes rather than the
zuul_console we set up for each specific Ansible version in the
tests. This corrects that and uses the correct zuul_console port, so
that if we make any changes to zuul_console in the future, we will
test the changed version, not the one from the Zuul which actually
runs the tox-remote job.

Change-Id: Ia656db5f3dade52c8dbd0505b24049fe0fff67a5
Async tasks cause zuul_stream.py to hang for 30 seconds before timing
out all log streaming threads, so this change stops zuul_stream.py
from streaming logs for async tasks.

Change-Id: I86098b0788fdd87593e3c68d04953ede7607432b
Change-Id: If76e730673ac80e254c878c46b17181e14b4a5de
`_read_log()` calls `_log_streamline()` when it handles the
`socket.timeout` exception. This change ensures `_log_streamline()`
doesn't raise another exception:

```
Traceback (most recent call last):
  File "/var/lib/zuul/ansible/2.9/zuul/ansible/callback/zuul_stream.py", line 137, in _read_log
    s = socket.create_connection((ip, port), 5)
  File "/usr/lib/python3.6/socket.py", line 724, in create_connection
    raise err
  File "/usr/lib/python3.6/socket.py", line 713, in create_connection
    sock.connect(sa)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/var/lib/zuul/ansible/2.9/zuul/ansible/callback/zuul_stream.py", line 152, in _read_log
    % (ip, port))
  File "/var/lib/zuul/ansible/2.9/zuul/ansible/callback/zuul_stream.py", line 202, in _log_streamline
    ts, ln = line.split(' | ', 1)
ValueError: not enough values to unpack (expected 2, got 1)
```
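
A sketch of the hardening; the helper shape is illustrative:

```
def split_log_line(line):
    # Tolerate lines lacking the "timestamp | message" separator
    # instead of letting split() raise ValueError while we are
    # already handling a socket.timeout.
    if ' | ' in line:
        ts, ln = line.split(' | ', 1)
        return ts, ln
    return None, line  # pass malformed lines through unparsed
```
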
Change-Id: Ifeea230d1285ab62acf11a0ed37e38e508e1dba8
Strangely this doesn't occur on master but did here:
https://review.opendev.org/#/c/727089/3
Change-Id: Ie2ff53d8ed2db03f70bd2ab7789028b5b7a5e91d
Connections via winrm don't support streaming yet, as this needs a
PowerShell-based zuul_stream module.

Change-Id: Ibc0804ec0466fa83e37a4e453028833ce737da8e
When we get a pod from nodepool, this starts a kubectl port-forward
on the pod so that zuul-console and the normal method of streaming
command output will work.
Change-Id: Iae85347c3d8e0a74e330a7b62b513c7b41641383
Story: 2007321
Task: 38832
Depends-On: https://review.opendev.org/709259
Ansible has released 2.9 and now Zuul also supports it.
Change-Id: Iabf2d6278ba8d88e17403a4adae5521eb3e7019b
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
This change prevents an exception in the zuul_stream module when the
msg result is a list instead of a string. This is the case for the
package module trying to install a missing rpm on Ansible 2.8 with
python3.
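
A sketch of the handling; unlike the dict case, a list msg reads best
joined into individual lines before logging:

```
def msg_as_text(msg):
    # The package module can return msg as a list of strings.
    if isinstance(msg, list):
        return '\n'.join(str(m) for m in msg)
    return str(msg)
```
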
Change-Id: I242711ebc0b2dcd1262de38e76efc48554361bd1
Story: 2006830
This change ensures the zuul-tox-remote job tests the unique
zuul_console service instead of the one set up by the job.

Change-Id: If2e2324235a6608985168d606d95adeade36afe0
In some cases, especially on contended nodes with short tasks, it can
happen that the task finishes before the log we want to stream shows
up. In that case we stop the streamers too early, resulting in
discarded log output. This makes the tox-remote tests flaky but could
also happen in production.

This can be fixed by giving the streamers some additional time when
stopping if the log didn't show up yet.
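
A sketch of the grace period; the attribute and interval names are
illustrative:

```
import time

def stop_streamer(streamer, grace=10.0, poll=0.5):
    # Give the remote log file a chance to appear before giving up,
    # so short tasks on contended nodes aren't truncated.
    deadline = time.time() + grace
    while not streamer.log_seen and time.time() < deadline:
        time.sleep(poll)
    streamer.stop()
```
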
Change-Id: If7f786dfcbdac28fccb17d55ce76cb02ae6ad28a
Ansible has released 2.8 and now Zuul also supports it. We've had to
update zuul_console to deal with new task stats, along with
gather_facts now being exposed directly to the job.

Change-Id: Ifa4be7cf408b1f05b0f985fa0c9a5e3947858078
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
We should keep up with recent Ansible versions, so add support for
2.7.

Change-Id: Ic4234784f104690d72cfac71e09ecd170ccce1b5
As a first step towards supporting multiple Ansible versions, we need
tooling to manage Ansible installations. This moves the installation
of Ansible from requirements.txt into Zuul itself. It is called as a
setup hook to install the Ansible versions into
<prefix>/lib/zuul/ansible. Further, this tooling abstracts the
knowledge the executor needs in order to run the correct version of
Ansible.

The actual use of multiple Ansible versions will be done in
follow-ups.

For better maintainability, the Ansible plugins live in
zuul/ansible/base, where plugins can be kept in different versions if
necessary. For each supported Ansible version there is a specific
folder that symlinks to the appropriate plugins.
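
A sketch of how the executor might resolve a per-version installation
under the layout described above (helper name illustrative):

```
import os

def ansible_install_dir(prefix, version):
    return os.path.join(prefix, 'lib', 'zuul', 'ansible', version)

print(ansible_install_dir('/usr/local', '2.7'))
# -> /usr/local/lib/zuul/ansible/2.7
```
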
Change-Id: I5ce1385245c76818777aa34230786a9dbaf723e5
Depends-On: https://review.openstack.org/623927