author | Benjamin Schubert <bschubert15@bloomberg.net> | 2020-07-03 12:57:06 +0000 |
---|---|---|
committer | Benjamin Schubert <bschubert15@bloomberg.net> | 2020-12-04 10:36:37 +0000 |
commit | 705d0023f65621b23b6b0828306dc5b4ee094b45 (patch) | |
tree | c021c06427c7e37508c682bc99e06d9eae090975 /src/buildstream/sandbox | |
parent | be88eaec0445ff2d85b73c17a392d0e65620202b (diff) | |
download | buildstream-705d0023f65621b23b6b0828306dc5b4ee094b45.tar.gz |
scheduler.py: Use threads instead of processes for jobs
This changes how the scheduler works and adapts all the code that needs
adapting in order to be able to run in threads instead of in
subprocesses, which helps with Windows support, and will allow some
simplifications in the main pipeline.
This addresses the following issues:
* Fix #810: All CAS calls are now made in the master process, and thus
share the same connection to the cas server
* Fix #93: We don't start as many child processes anymore, so the risk
of starving the machine is much lower
* Fix #911: We now use `forkserver` for starting processes. We also
don't use subprocesses for jobs, so we should be starting fewer
subprocesses
The following high-level changes were made:
* cascache.py: Run the CasCacheUsageMonitor in a thread instead of a
subprocess.
* casdprocessmanager.py: Ensure start and stop of the process are thread
safe.
* job.py: Run the child in a thread instead of a process, and adapt how
we stop a thread, since we can't use signals anymore.
* _multiprocessing.py: Not needed anymore, we are not using `fork()`.
* scheduler.py: Run the scheduler with a threadpool, to run the child
jobs in. Also adapt how our signal handling is done, since we are not
receiving signals from our children anymore, and can't kill them the
same way.
* sandbox: Stop using blocking signals to wait on the process, and use
timeouts all the time.
* messenger.py: Use a thread-local context for the handler, to allow for
multiple parameters in the same process.
* _remote.py: Ensure the start of the connection is thread safe
* _signal.py: Allow blocking entry into the signal context managers
by setting an event. This ensures that no thread runs long-running
code after we have asked the scheduler to pause. This also ensures
that all the signal handlers are thread safe.
* source.py: Change check around saving the source's ref. We are now
running in the same process, and thus the ref will already have been
changed.
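As an illustration of the thread-local handler idea described for messenger.py, here is a minimal sketch. The `Messenger` class, `set_handler`, `message` and `run_job` are hypothetical names chosen for this example and are not BuildStream's actual API; the sketch only shows how `threading.local` lets each job thread keep its own handler within one process.

```python
import threading


class Messenger:
    """Hypothetical stand-in, not BuildStream's actual Messenger class."""

    def __init__(self):
        # Attributes stored on a threading.local are private to each thread.
        self._locals = threading.local()

    def set_handler(self, handler):
        self._locals.handler = handler

    def message(self, text):
        handler = getattr(self._locals, "handler", None)
        if handler is None:
            raise RuntimeError("no handler set for this thread")
        handler(text)


def run_job(messenger, name, results):
    # Each job thread installs its own handler without affecting the others.
    messenger.set_handler(lambda text: results.append((name, text)))
    messenger.message("started")


messenger = Messenger()
results = []
threads = [
    threading.Thread(target=run_job, args=(messenger, name, results))
    for name in ("a", "b")
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

The main thread never calls `set_handler` in this sketch, so calling `messenger.message()` from it raises `RuntimeError`, which shows that handlers set in job threads do not leak across threads.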
Diffstat (limited to 'src/buildstream/sandbox')
-rw-r--r-- | src/buildstream/sandbox/_sandboxbuildboxrun.py | 14 | ||||
-rw-r--r-- | src/buildstream/sandbox/_sandboxremote.py | 4 |
2 files changed, 12 insertions, 6 deletions
diff --git a/src/buildstream/sandbox/_sandboxbuildboxrun.py b/src/buildstream/sandbox/_sandboxbuildboxrun.py
index 3d71b7440..1c187d7fd 100644
--- a/src/buildstream/sandbox/_sandboxbuildboxrun.py
+++ b/src/buildstream/sandbox/_sandboxbuildboxrun.py
@@ -184,17 +184,27 @@ class SandboxBuildBoxRun(SandboxREAPI):
                 try:
                     while True:
                         try:
-                            returncode = process.wait()
+                            # Here, we don't use `process.wait()` directly without a timeout
+                            # This is because, if we were to do that, and the process would never
+                            # output anything, the control would never be given back to the python
+                            # process, which might thus not be able to check for request to
+                            # shutdown, or kill the process.
+                            # We therefore loop with a timeout, to ensure the python process
+                            # can act if it needs.
+                            returncode = process.wait(timeout=1)
                             # If the process exits due to a signal, we
                             # brutally murder it to avoid zombies
                             if returncode < 0:
                                 utils._kill_process_tree(process.pid)
 
+                        except subprocess.TimeoutExpired:
+                            continue
+
                         # Unlike in the bwrap case, here only the main
                         # process seems to receive the SIGINT. We pass
                         # on the signal to the child and then continue
                         # to wait.
-                        except KeyboardInterrupt:
+                        except _signals.TerminateException:
                             process.send_signal(signal.SIGINT)
                             continue
 
diff --git a/src/buildstream/sandbox/_sandboxremote.py b/src/buildstream/sandbox/_sandboxremote.py
index 6cba7d611..2ac159337 100644
--- a/src/buildstream/sandbox/_sandboxremote.py
+++ b/src/buildstream/sandbox/_sandboxremote.py
@@ -26,7 +26,6 @@ from functools import partial
 
 import grpc
 
-from .. import utils
 from ..node import Node
 from .._message import Message, MessageType
 from ._sandboxreapi import SandboxREAPI
@@ -59,9 +58,6 @@ class SandboxRemote(SandboxREAPI):
         if config is None:
             return
 
-        # gRPC doesn't support fork without exec, which is used in the main process.
-        assert not utils._is_main_process()
-
         self.storage_url = config.storage_service["url"]
         self.exec_url = config.exec_service["url"]
 
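The timeout-based wait introduced in `_sandboxbuildboxrun.py` can be illustrated in isolation. The following is a minimal sketch, not the BuildStream implementation: `wait_with_polling`, `should_stop` and `poll_interval` are hypothetical names, and a `threading.Event` stands in for whatever shutdown request the real code checks. It relies only on the standard `subprocess.Popen.wait(timeout=...)` behaviour of raising `subprocess.TimeoutExpired`.

```python
import subprocess
import sys
import threading


def wait_with_polling(process, should_stop, poll_interval=0.1):
    """Wait for `process`, waking up regularly to check `should_stop`.

    Hypothetical helper: a blocking wait() without a timeout would never
    give the calling thread a chance to notice a shutdown request.
    """
    while True:
        try:
            return process.wait(timeout=poll_interval)
        except subprocess.TimeoutExpired:
            # The child is still running; check whether we were asked to stop.
            if should_stop.is_set():
                process.terminate()
                return process.wait()


stop = threading.Event()
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.3)"])
returncode = wait_with_polling(child, stop)
```

A shorter `poll_interval` makes the thread react to a stop request faster, at the cost of more wake-ups; the commit itself uses a one-second timeout.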