We created the State object in the core for the purpose of advertising
state to the frontend: the frontend can register callbacks and receive
updates when state changes (implicit invocation in the frontend). State
always belongs to the core, and the frontend can only read it.
When the frontend asks the core to do something, this should always
be done with an explicit function call, and preferably not via the
State object, as this confuses the role of State, which is only a read-only
state advertising desk.
This was broken (implemented backwards) for job retries: instead we had
the frontend telling State "it has been requested that this job be retried!",
and the core registering callbacks on that frontend request. This direction
of implicit invocation should not happen (in fact, the core should never
have to register callbacks on the State object at all).
Summary of changes:
* _stream.py: Change _failure_retry(), which was inexplicably private
  despite being called from the frontend, to an explicit function
  call named "retry_job()".
  Instead of calling into the State object and causing core-side
  callbacks to be triggered, later to be handled by the Scheduler,
  implement the retry directly in the Stream. Since this implementation
  deals only with Queues and State, which already belong directly to
  the Stream object, there is no reason to trouble the Scheduler
  with this.
* _scheduler.py: Remove the callback handling the State "task retry"
  event.
* _state.py: Remove the task retry callback chain completely.
* _frontend/app.py: Call stream.retry_job() instead of
  stream.failure_retry(), now passing along the task's action name
  rather than the task's ID.
This API now assumes that Stream.retry_job() can only be called on
a task which originates from a scheduler Queue, and expects to be
given the action name of the queue in which the given element has
failed and should be retried.
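To make the intended direction of control concrete, here is a minimal sketch with hypothetical names and signatures (not BuildStream's actual API): the core owns State and fans events out to frontend-registered callbacks, while frontend requests travel through explicit core entry points such as retry_job().

```python
# Minimal sketch of the one-way flow; names are illustrative only.

class State:
    """Core-owned, read-only state advertising desk."""

    def __init__(self):
        self._task_failed_callbacks = []

    def register_task_failed_callback(self, callback):
        # Only the *frontend* registers callbacks here; the core never does.
        self._task_failed_callbacks.append(callback)

    def fail_task(self, action_name, element_name):
        # Called by the core when a job fails; fans out to the frontend.
        for cb in self._task_failed_callbacks:
            cb(action_name, element_name)


class Stream:
    """Core object owning the Queues and the State."""

    def __init__(self, state, queues):
        self.state = state
        self._queues = queues

    def retry_job(self, action_name, element_name):
        # Explicit frontend-to-core request: no State involvement and
        # no Scheduler involvement, just the Stream's own Queues.
        queue = next(q for q in self._queues
                     if q.action_name == action_name)
        queue.retry(element_name)
```

The frontend reacts to failures via its registered callback and, on user request, calls stream.retry_job(action_name, element_name) directly.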
|
Now that the only type of messages that go through are messages for the
messenger, we can remove the envelope and only ever handle the
messenger's messages.
|
Since we run in a single process, we do not need this distinction
anymore.
|
This is not needed now that jobs run in the same process; we can just
return the value from the method.
|
This is currently only used by the ElementJob to send back information
about the workspace, which we can now get directly since we run in the
same process.
* elementjob.py: Stop returning the workspace dict. It is
directly available in the main thread.
* queue.py: Use the workspace from the element directly instead of going
through child data.
|
This narrows a race-condition window where a SIGINT received shortly after
restarting the scheduler would cause the scheduler to crash.
|
This changes how the scheduler works and adapts all the code that needs
adapting in order to be able to run in threads instead of in
subprocesses, which helps with Windows support and will allow some
simplifications in the main pipeline.
This addresses the following issues:
* Fix #810: All CAS calls are now made in the master process, and thus
share the same connection to the CAS server.
* Fix #93: We don't start as many child processes anymore, so the risk
of starving the machine is much lower.
* Fix #911: We now use `forkserver` for starting processes. We also
don't use subprocesses for jobs, so we should be starting fewer
subprocesses.
And the following high-level changes were made:
* cascache.py: Run the CasCacheUsageMonitor in a thread instead of a
subprocess.
* casdprocessmanager.py: Ensure start and stop of the process are thread
safe.
* job.py: Run the child in a thread instead of a process, and adapt how we
stop a thread, since we can't use signals anymore.
* _multiprocessing.py: Not needed anymore, since we are no longer using
`fork()`.
* scheduler.py: Run the scheduler with a threadpool to run the child
jobs in. Also adapt how our signal handling is done, since we are not
receiving signals from our children anymore and can't kill them the
same way.
* sandbox: Stop using blocking signals to wait on the process, and use
timeouts everywhere.
* messenger.py: Use a thread-local context for the handler, to allow
different parameters per thread in the same process.
* _remote.py: Ensure the start of the connection is thread safe.
* _signal.py: Allow blocking entry into the signal context managers
by setting an event. This ensures no thread runs long-running
code after we have asked the scheduler to pause. This also ensures all
the signal handlers are thread safe.
* source.py: Change the check around saving the source's ref. We are now
running in the same process, so the ref will already have been
changed.
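As a rough illustration of the new model (names and structure are assumptions, not the actual scheduler code): jobs become thread-pool work items, and pausing is cooperative via an event rather than via signals.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class Scheduler:
    def __init__(self, max_jobs):
        self._executor = ThreadPoolExecutor(max_workers=max_jobs)
        self._resume = threading.Event()
        self._resume.set()  # not paused initially

    def run_job(self, job):
        # Jobs run as threads in the same process: no fork(), no
        # per-child signal handling, and CAS connections are shared.
        return self._executor.submit(self._child_action, job)

    def _child_action(self, job):
        # A thread cannot receive SIGTSTP/SIGTERM the way a subprocess
        # could, so long-running code waits on an event instead.
        self._resume.wait()
        return job.run()

    def pause(self):
        self._resume.clear()

    def resume(self):
        self._resume.set()
```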
|
This enqueue_plan call can take a long time, as in some cases it
triggers a verification of the 'cached' state of sources, which is
expensive.
|
This behavior regressed a while back when introducing the messenger
object in 0026e379 from merge request !1500.
Main behavior change:
- Messages in the master log always appear with the task element's
element name and cache key, even if the element or plugin issuing
the log line is not the primary task element.
- Messages logged in the task-specific log retain the context of the
element names and cache keys which are issuing the log lines.
Changes include:
* _message.py: Added the task element name & key members.
* _messenger.py: Log the element key as well if it is provided.
* _widget.py: Prefer the task name & key when logging, falling back
to the element name & key in case messages are being logged outside
of any ongoing task (main process/context).
* job.py: Unconditionally stamp messages with the task name & key.
Also removed some unused parameters here, clearing up an XXX comment.
* plugin.py: Add a new `_message_kwargs` instance property; it is the
responsibility of the core base class to maintain the base keyword
arguments which are to be used as kwargs for Message() instances
created on behalf of the issuing plugin.
Use these kwargs to construct messages in Plugin.__message() and
pass them along to Messenger.timed_activity().
* element.py: Update the `_message_kwargs` when the cache key is updated.
* tests/frontend/logging.py: Fix test to expect the cache key in the log line.
* tests/frontend/artifact_log.py: Fix test to expect the cache key in the log line.
Fixes #1393
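A sketch of the kwargs-maintenance pattern described for plugin.py and element.py; the Message fields and signatures here are simplified assumptions, not the real definitions.

```python
class Message:
    # Simplified: the real Message carries many more fields.
    def __init__(self, message_type, text, *,
                 element_name=None, element_key=None):
        self.message_type = message_type
        self.text = text
        self.element_name = element_name
        self.element_key = element_key


class Plugin:
    def __init__(self, name):
        # Base kwargs for every Message issued on behalf of this
        # plugin; the core base class keeps them up to date.
        self._message_kwargs = {"element_name": name}

    def __message(self, message_type, text):
        # Construct messages with the maintained base kwargs.
        return Message(message_type, text, **self._message_kwargs)


class Element(Plugin):
    def _update_cache_key(self, key):
        # element.py: keep the message kwargs in sync with the key.
        self._message_kwargs["element_key"] = key
```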
|
Instead of passing around untyped tuples for cache keys, let's have
a clearly typed object for this.
This makes for more readable code, and additionally corrects the
data model's statement of intent: instead of stating that some cache
keys should be displayed as "dim", we now inform the frontend about
whether the cache key is "strict" or not, allowing the frontend to
decide how to display a strict or non-strict key.
This patch does the following:
* types.py: Add _DisplayKey
* element.py: Return a _DisplayKey from Element._get_display_key()
* Other sources: Updated to use the display key object
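As a sketch, the typed key could look like the following; the field names are assumptions, only the "strict" intent is stated by the commit message.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class _DisplayKey:
    full: str     # the complete cache key
    brief: str    # the abbreviated form shown in logs
    strict: bool  # the frontend decides how to render non-strict
                  # keys (e.g. dimmed), rather than being told "dim"
```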
|
This allows the frontend to render pending messages.
|
As per #819, BuildStream should pull missing artifacts by default. The
previous behavior was to only pull missing buildtrees. A top-level
`--no-pull` option can easily be supported in the future.
This change makes it possible to use a single scheduler session (with
concurrent pull and push jobs). This commit also simplifies the code as
it removes the `sched_error_action` emulation, using the regular
frontend code path instead.
|
This allows proper error handling in cases where pushing an uncached
element should result in a failure (bst artifact push) rather than a
skipped job (bst build).
|
Refactored this to remove unneeded complexity in the code base,
as described here:
https://lists.apache.org/thread.html/r4b9517742433f07c79379ba5b67932cfe997c1e64965a9f1a2b613fc%40%3Cdev.buildstream.apache.org%3E
Changes:
* source.py: Added private Source._cache_directory() context manager.
We also move the assertion about nodes which are safe to write to
a bit lower in Source._set_ref(), as this was unnecessarily early.
When tracking a workspace, the ref will be None before and after
tracking; it is not a problem that a workspace's node is a synthetic
one, as tracking will never affect it.
* local plugin: Implement get_unique_key() and stage() using
the new context manager in order to optimize staging and
cache key calculations here.
* workspace plugin: Implement get_unique_key() and stage() using
the new context manager in order to optimize staging and
cache key calculations here.
* trackqueue.py: Drop the special casing based on Source._is_trackable().
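A hypothetical sketch of such a context manager; the real signature, arguments and storage layout are not specified by this message.

```python
import os
from contextlib import contextmanager

class Source:
    def __init__(self, cachedir, unique_key):
        self._cachedir = cachedir
        self._unique_key = unique_key

    @contextmanager
    def _cache_directory(self):
        # Yield the directory holding this source's cached files, so
        # plugins like 'local' and 'workspace' can stage from it and
        # compute unique keys without re-importing files.
        directory = os.path.join(self._cachedir, self._unique_key)
        os.makedirs(directory, exist_ok=True)
        yield directory
```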
|
`State.add_task()` required the job name to be unique in the session.
However, the tuple `(action_name, full_name)` is not guaranteed to be
unique. E.g., multiple `ArtifactElement` objects with the same element
name may participate in a single session. Use a separate task identifier
to fix this.
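A minimal sketch of the fix, assuming a monotonic counter is enough to identify tasks:

```python
import itertools

class State:
    def __init__(self):
        self._task_ids = itertools.count()
        self._tasks = {}

    def add_task(self, action_name, full_name):
        # (action_name, full_name) may collide, e.g. for multiple
        # ArtifactElement objects with the same element name, so key
        # tasks by a unique id instead.
        task_id = next(self._task_ids)
        self._tasks[task_id] = (action_name, full_name)
        return task_id  # callers use this id to update or remove the task

    def remove_task(self, task_id):
        del self._tasks[task_id]
```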
|
We were previously sending custom messages from child jobs to parent
jobs, for example for reporting the cache size.
This is not used anymore by the current implementation; let's remove
it entirely.
|
This reduces the amount of code duplication.
|
Call the relevant scheduler methods directly from the stream.
|
This removes all remaining notifications coming from the scheduler,
and replaces them with callbacks.
|
The stream itself calls the `run` method on the scheduler; we don't
need another indirection.
|
We are calling the scheduler, and the fact that it returns correctly
already tells us this.
|
We are calling the scheduler, and the fact that it returns correctly
already tells us this.
|
Stop using 'Notifications' for retries; the State is what handles the
callbacks required for every change in element status.
|
This moves all implementations of 'start_time' into a single place,
for easier handling and to remove notification round-trips.
|
The State is the interface between the two; there is no need for
multiple round-trips to handle such notifications.
|
The messenger should receive messages directly; we don't need this
indirection.
|
This way we split up the logic of how to load plugins from different
origins into their respective classes.
This commit also:
o Introduces PluginType (which is currently either SOURCE or ELEMENT)
o Reduces the complexity of the PluginFactory constructor
o Kills the loaded_dependencies list and the all_loaded_plugins API,
and replaces both of these with a new list_plugins() API.
Consequently the jobpickler.py from the scheduler, and the
widget.py from the frontend, are updated to use list_plugins().
o Splits up the PluginOrigin implementations into separate files.
Instead of having all PluginOrigin classes in pluginorigin.py, split
it up into one base class and separate files for each implementation,
which is more in line with BuildStream coding style.
This has the unfortunate side effect of adding load_plugin_origin()
into the __init__.py file, because keeping new_from_node() as
a PluginOrigin class method cannot be done without introducing a
cyclic dependency between PluginOrigin and its implementations.
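Sketch of the reshaped factory surface; PluginType and list_plugins() are named by the commit message, everything else is illustrative.

```python
from enum import Enum

class PluginType(Enum):
    SOURCE = "source"
    ELEMENT = "element"


class PluginFactory:
    def __init__(self, plugin_type):
        self._plugin_type = plugin_type
        self._plugins = {}  # kind -> plugin class

    def register(self, kind, plugin):
        self._plugins[kind] = plugin

    def list_plugins(self):
        # Replaces loaded_dependencies and all_loaded_plugins; used by
        # the scheduler's jobpickler.py and the frontend's widget.py.
        yield from self._plugins.items()
```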
|
This reverts commit aa25f6fcf49f0015fae34dfd79b4626a816bf886.
|
This reverts commit 14e32a34f67df754d9146efafe9686bfe6c91e50.
|
real_schedule()
In case queuing jobs results in jobs completing, we need to reset the
schedule handler at the beginning of the function, not after queuing
the jobs.
This fixes the failure to exit the main loop in #1312.
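An illustrative sketch of the ordering fix; names like _sched_handle and _pending_jobs are hypothetical, only _real_schedule comes from the message above.

```python
class Scheduler:
    def __init__(self, loop):
        self.loop = loop  # the asyncio event loop
        self._sched_handle = None

    def _schedule_queue_jobs(self):
        # Coalesce scheduling requests: only one pending pass at a time.
        if self._sched_handle is None:
            self._sched_handle = self.loop.call_soon(self._real_schedule)

    def _real_schedule(self):
        # Reset the handle *before* queuing jobs: starting jobs below
        # can complete jobs synchronously, and those completions must
        # be able to request another scheduling pass. Resetting after
        # queuing would silently drop that request, and the main loop
        # would never exit.
        self._sched_handle = None
        for job in self._pending_jobs():  # hypothetical helper
            job.start()
```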
|
Sending notifications causes potentially large bodies of code to run
in the abstracted frontend codebase; we are not allowed to have
knowledge of the frontend in this code.
Previously, we were adding the job to the active jobs, sending the
notification, and then starting the job. This means that if a BuildStream
frontend implementation crashes, we handle the exception in an inconsistent
state and try to kill jobs which are not running.
In addition to making sure that no code body runs in the danger window
between adjusting the active_jobs list and starting the job, this patch
also adds some fault tolerance and assertions around job termination
so that:
o Job.terminate() and Job.kill() do not crash with a None _process
o Job.start() raises an assertion if called after being terminated
This fixes the infinite-looping aspects of frontend crashes at
job_start() time described in #1312.
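A sketch of the hardened sequence; helper names like _spawn() and _notify_job_started() are hypothetical.

```python
class Job:
    def __init__(self):
        self._process = None
        self._terminated = False

    def start(self):
        # Starting a job after it was terminated is a bug.
        assert not self._terminated, "Job started after termination"
        self._process = self._spawn()  # hypothetical process launcher

    def terminate(self):
        self._terminated = True
        if self._process is not None:  # tolerate never-started jobs
            self._process.terminate()

    def kill(self):
        if self._process is not None:  # likewise, no crash on None
            self._process.kill()


class Scheduler:
    def __init__(self):
        self._active_jobs = []

    def _start_job(self, job):
        # Keep list adjustment and job start adjacent: no frontend
        # notification may run in the window between them.
        self._active_jobs.append(job)
        job.start()
        self._notify_job_started(job)  # frontend callback runs last
```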
|
This is mostly a semantic change which defines how deprecation warnings
are suppressed in a more consistent fashion, by declaring such suppressions
in the plugin origin declarations rather than in the generic element/source
configuration overrides section.
Other side effects of this commit are that the warnings have been enhanced
to include the provenance of where the deprecated plugins are used in
the project, and that the custom deprecation message is now optional and
appears in the message detail string rather than in the primary warning
text, which now simply indicates that the plugin being used is deprecated.
Documentation and test cases are updated.
This fixes #1291
|
So far we were only reporting "No Source plugin registered for kind 'foo'",
without specifying which bst file declared it, with line and column
information; this commit fixes that.
Additionally, this patch stores the provenance on the MetaSource to
allow this to happen for sources.
|
`Sandbox` subclasses use `_signals.terminator()` to gracefully terminate
the running command and cleanup the sandbox. Setting a `SIGTERM` handler
in `job.py` breaks this.
|
After a successful build we know that the artifact is cached. Avoid
querying buildbox-casd and the filesystem.
|
This removes the need to use consistency in Sources, by asking
explicitly whether the source is cached or not.
This introduces a new public method on Source, `is_cached`, that needs
implementing and that should return whether the source has a local
copy or not.
- On fetch, also reset whether the source was cached, or set it as
cached when we know it was.
- Validate the cached source after fetching it.
This doesn't need to be run in the scheduler's process and can be
offloaded to the child, which will allow better multiprocessing.
|
This replaces the _get_consistency method with two methods,
`_has_all_sources_resolved` and `_has_all_sources_cached`, which allow
more fine-grained control over what information is needed.
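A sketch of the split, assuming sources expose an is_resolved()/is_cached() pair (the latter is named by the earlier commit above):

```python
class Element:
    def __init__(self, sources):
        self._sources = sources

    def _has_all_sources_resolved(self):
        # Resolved: every source has a concrete ref.
        return all(source.is_resolved() for source in self._sources)

    def _has_all_sources_cached(self):
        # Cached: a separate, typically more expensive question.
        return all(source.is_cached() for source in self._sources)
```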
|
'_source_cached' is not explicit enough, as it doesn't distinguish
between sources in their respective caches and sources in the global
sourcecache.
|
The default exception handler of the async event loop bypasses
our custom global exception handler. Ensure that isn't the case,
as these exceptions are BUGS and should cause bst to exit.
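loop.set_exception_handler() is the real asyncio hook for this; the handler body below is an illustrative sketch.

```python
import asyncio
import sys

def _handle_exception(loop, context):
    # The handler receives (loop, context); context always carries
    # "message" and, when available, the original "exception".
    exception = context.get("exception", context["message"])
    print(f"BUG: unhandled exception in event loop: {exception}",
          file=sys.stderr)
    loop.stop()
    sys.exit(-1)

loop = asyncio.new_event_loop()
# Route loop exceptions into our own handler so bugs are fatal
# instead of being silently logged by the default handler.
loop.set_exception_handler(_handle_exception)
```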
|
In Python 3.8, `ThreadedChildWatcher` is the default watcher, and it
causes issues with our scheduler. Enforce use of `SafeChildWatcher`.
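These are real asyncio APIs on Unix in the Python 3.8 era (deprecated since 3.12); a minimal sketch of pinning the watcher:

```python
import asyncio

loop = asyncio.new_event_loop()

# Replace the default ThreadedChildWatcher with a SafeChildWatcher,
# which reaps children synchronously from the loop itself.
watcher = asyncio.SafeChildWatcher()
watcher.attach_loop(loop)
asyncio.set_child_watcher(watcher)
```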
|
As we handle subprocess termination by pid with an asyncio child
watcher, the multiprocessing.Process object does not get notified when
the process terminates. And as the child watcher reaps the process, the
pid is no longer valid and the Process object is unable to check whether
the process is dead. This results in Process.close() raising a
ValueError.
Fixes: 9c23ce5c ("job.py: Replace message queue with pipe")
|
A lightweight unidirectional pipe is sufficient to pass messages from
the child job process to its parent.
This also avoids the need to access the private `_reader` instance
variable of `multiprocessing.Queue`.
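A standard-library illustration of the replacement (not BuildStream's actual job code):

```python
import multiprocessing

def child_action(pipe):
    # The child only ever writes; the parent only ever reads.
    pipe.send({"action": "message", "text": "hello from the child job"})
    pipe.close()

if __name__ == "__main__":
    # duplex=False yields a one-way pipe: (read_end, write_end).
    read_end, write_end = multiprocessing.Pipe(duplex=False)
    process = multiprocessing.Process(target=child_action,
                                      args=(write_end,))
    process.start()
    print(read_end.recv())  # messages arrive in the parent
    process.join()
```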
|
Per
https://docs.python.org/3/library/asyncio-policy.html#asyncio.AbstractChildWatcher.add_child_handler,
the callback from a child handler must be thread safe. Not all our
callbacks were. This changes all our callbacks to schedule a call for
the next loop iteration instead of executing it directly.
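A sketch of the pattern using the real add_child_handler() and call_soon_threadsafe() APIs, with illustrative wiring:

```python
import asyncio

def watch_child(loop, pid, on_exit):
    # asyncio.get_child_watcher() is the pre-3.12 Unix API referenced
    # by the linked documentation.
    watcher = asyncio.get_child_watcher()

    def _callback(pid, returncode):
        # May be invoked from another thread; defer the real work to
        # the loop's next iteration instead of running it here.
        loop.call_soon_threadsafe(on_exit, pid, returncode)

    # The watcher calls _callback(pid, returncode) when the child exits.
    watcher.add_child_handler(pid, _callback)
```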
|
The documentation
(https://docs.python.org/3/library/asyncio-policy.html#asyncio.AbstractChildWatcher)
is apparently missing this part, but the code mentions that new
processes should only ever be started inside a `with` block:
https://github.com/python/cpython/blob/99eb70a9eb9493602ff6ad8bb92df4318cf05a3e/Lib/asyncio/unix_events.py#L808
|
We don't need to keep a reference to the watcher; let's remove it.