author: Benjamin Schubert <bschubert15@bloomberg.net> 2020-07-03 12:57:06 +0000
committer: Benjamin Schubert <bschubert15@bloomberg.net> 2020-12-04 10:36:37 +0000
commit: 705d0023f65621b23b6b0828306dc5b4ee094b45
tree: c021c06427c7e37508c682bc99e06d9eae090975 /src/buildstream/sandbox
parent: be88eaec0445ff2d85b73c17a392d0e65620202b
scheduler.py: Use threads instead of processes for jobs
This changes how the scheduler works and adapts all the code that needs
adapting in order to run in threads instead of in subprocesses, which helps
with Windows support and will allow some simplifications in the main pipeline.

This addresses the following issues:

* Fix #810: All CAS calls are now made in the master process, and thus share the same connection to the CAS server.
* Fix #93: We don't start as many child processes anymore, so the risk of starving the machine is much lower.
* Fix #911: We now use `forkserver` for starting processes. We also don't use subprocesses for jobs, so we should be starting fewer subprocesses.

And the following high-level changes were made:

* cascache.py: Run the CasCacheUsageMonitor in a thread instead of a subprocess.
* casdprocessmanager.py: Ensure start and stop of the process are thread safe.
* job.py: Run the child in a thread instead of a process, and adapt how we stop a thread, since we can't use signals anymore.
* _multiprocessing.py: Not needed anymore, we are not using `fork()`.
* scheduler.py: Run the scheduler with a threadpool, to run the child jobs in. Also adapt how our signal handling is done, since we are not receiving signals from our children anymore, and can't kill them the same way.
* sandbox: Stop using blocking signals to wait on the process, and use timeouts all the time.
* messenger.py: Use a thread-local context for the handler, to allow for multiple parameters in the same process.
* _remote.py: Ensure the start of the connection is thread safe.
* _signal.py: Allow blocking entry into the signal context managers by setting an event. This ensures no thread runs long-running code while we have asked the scheduler to pause. This also ensures all the signal handlers are thread safe.
* source.py: Change the check around saving the source's ref. We are now running in the same process, and thus the ref will already have been changed.
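
To illustrate the "thread-local context for the handler" idea mentioned for messenger.py, here is a minimal sketch using only the Python standard library. It is not BuildStream's code: the names `_state`, `set_handler`, `emit` and `run_job` are hypothetical, and the barrier exists only to make both worker threads hold their handler at the same time.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-thread storage: each thread sees its own `handler` attribute.
_state = threading.local()

# Only for the demonstration: forces both jobs to overlap in time.
_both_started = threading.Barrier(2, timeout=5)


def set_handler(handler):
    """Install a message handler for the calling thread only."""
    _state.handler = handler


def emit(message):
    """Send a message to the handler installed in the calling thread, if any."""
    handler = getattr(_state, "handler", None)
    if handler is not None:
        handler(message)


def run_job(name):
    """Run a fake job that records its own messages."""
    received = []
    set_handler(received.append)
    # Wait until the other job has installed its handler too, so that a
    # process-wide handler would have been overwritten by now.
    _both_started.wait()
    emit(f"{name}: started")
    emit(f"{name}: done")
    return received


with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(run_job, ["job-a", "job-b"]))
```

Each job only receives the messages it emitted itself, even though both jobs run in the same process and call the same `emit()` function.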
Diffstat (limited to 'src/buildstream/sandbox')
-rw-r--r-- src/buildstream/sandbox/_sandboxbuildboxrun.py 14
-rw-r--r-- src/buildstream/sandbox/_sandboxremote.py 4
2 files changed, 12 insertions, 6 deletions
diff --git a/src/buildstream/sandbox/_sandboxbuildboxrun.py b/src/buildstream/sandbox/_sandboxbuildboxrun.py
index 3d71b7440..1c187d7fd 100644
--- a/src/buildstream/sandbox/_sandboxbuildboxrun.py
+++ b/src/buildstream/sandbox/_sandboxbuildboxrun.py
@@ -184,17 +184,27 @@ class SandboxBuildBoxRun(SandboxREAPI):
                 try:
                     while True:
                         try:
-                            returncode = process.wait()
+                            # Here, we don't use `process.wait()` directly without a timeout
+                            # This is because, if we were to do that, and the process would never
+                            # output anything, the control would never be given back to the python
+                            # process, which might thus not be able to check for request to
+                            # shutdown, or kill the process.
+                            # We therefore loop with a timeout, to ensure the python process
+                            # can act if it needs.
+                            returncode = process.wait(timeout=1)
                             # If the process exits due to a signal, we
                             # brutally murder it to avoid zombies
                             if returncode < 0:
                                 utils._kill_process_tree(process.pid)
+                        except subprocess.TimeoutExpired:
+                            continue
+
                         # Unlike in the bwrap case, here only the main
                         # process seems to receive the SIGINT. We pass
                         # on the signal to the child and then continue
                         # to wait.
-                        except KeyboardInterrupt:
+                        except _signals.TerminateException:
                             process.send_signal(signal.SIGINT)
                             continue
diff --git a/src/buildstream/sandbox/_sandboxremote.py b/src/buildstream/sandbox/_sandboxremote.py
index 6cba7d611..2ac159337 100644
--- a/src/buildstream/sandbox/_sandboxremote.py
+++ b/src/buildstream/sandbox/_sandboxremote.py
@@ -26,7 +26,6 @@ from functools import partial
 import grpc
-from .. import utils
 from ..node import Node
 from .._message import Message, MessageType
 from ._sandboxreapi import SandboxREAPI
@@ -59,9 +58,6 @@ class SandboxRemote(SandboxREAPI):
         if config is None:
             return
-        # gRPC doesn't support fork without exec, which is used in the main process.
-        assert not utils._is_main_process()
-
         self.storage_url = config.storage_service["url"]
         self.exec_url = config.exec_service["url"]
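
The timeout-based wait loop from the `_sandboxbuildboxrun.py` hunk can be tried in isolation with the sketch below. It is a simplified illustration, not the BuildStream implementation: `wait_with_polling` and the `stop_requested` event are hypothetical stand-ins for whatever shutdown mechanism the caller uses, and only `subprocess.Popen.wait(timeout=...)` and `subprocess.TimeoutExpired` are taken from the patch itself.

```python
import subprocess
import sys
import threading


def wait_with_polling(process, stop_requested, poll_interval=1.0):
    """Wait for `process`, regaining control every `poll_interval` seconds.

    `stop_requested` is a hypothetical threading.Event; when it is set,
    the child is terminated instead of being waited on indefinitely.
    """
    while True:
        try:
            # Never block forever: a bounded wait lets this thread react
            # to a shutdown request between two attempts.
            return process.wait(timeout=poll_interval)
        except subprocess.TimeoutExpired:
            if stop_requested.is_set():
                process.terminate()
                return process.wait()
            continue


stop = threading.Event()
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.2)"])
returncode = wait_with_polling(child, stop, poll_interval=0.05)
```

The short `poll_interval` is only there to keep the example fast; the patch uses a one-second timeout.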