path: root/src/buildstream/source.py
Commit message | Author | Age | Files | Lines
* scheduler.py: Use threads instead of processes for jobs | Benjamin Schubert | 2020-12-04 | 1 | -6/+3
    This changes how the scheduler works and adapts all the code that needs adapting in order to run in threads instead of in subprocesses, which helps with Windows support and will allow some simplifications in the main pipeline.

    This addresses the following issues:

    * Fix #810: All CAS calls are now made in the master process, and thus share the same connection to the CAS server.
    * Fix #93: We don't start as many child processes anymore, so the risk of starving the machine is much lower.
    * Fix #911: We now use `forkserver` for starting processes. We also don't use subprocesses for jobs, so we should be starting fewer subprocesses.

    And the following high-level changes were made:

    * cascache.py: Run the CasCacheUsageMonitor in a thread instead of a subprocess.
    * casdprocessmanager.py: Ensure start and stop of the process are thread safe.
    * job.py: Run the child in a thread instead of a process, and adapt how we stop a thread, since we can't use signals anymore.
    * _multiprocessing.py: Not needed anymore, we are not using `fork()`.
    * scheduler.py: Run the scheduler with a threadpool, to run the child jobs in. Also adapt how our signal handling is done, since we are not receiving signals from our children anymore, and can't kill them the same way.
    * sandbox: Stop using blocking signals to wait on the process, and use timeouts all the time.
    * messenger.py: Use a thread-local context for the handler, to allow for multiple parameters in the same process.
    * _remote.py: Ensure the start of the connection is thread safe.
    * _signal.py: Allow blocking entry into the signal's context managers by setting an event. This is to ensure no thread runs long-running code while we asked the scheduler to pause. This also ensures all the signal handlers are thread safe.
    * source.py: Change check around saving the source's ref. We are now running in the same process, and thus the ref will already have been changed.
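The thread-local handler mentioned for messenger.py in the commit above can be illustrated roughly as follows. This is a minimal sketch, not BuildStream's code: `set_handler`, `emit` and `run_job` are hypothetical names, and the thread pool only stands in for the scheduler's job threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for per-thread messenger state.
_local = threading.local()


def set_handler(handler):
    # Each thread stores its own handler; other threads never see it.
    _local.handler = handler


def emit(message):
    # Dispatch to the handler registered by the calling thread.
    _local.handler(message)


def run_job(name):
    # A job registers a handler for its own thread, then emits messages.
    collected = []
    set_handler(collected.append)
    emit("start " + name)
    emit("done " + name)
    return collected


# Jobs run in threads of a single process instead of in subprocesses.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(run_job, ["fetch", "build"]))
```

Because the handler lives in `threading.local()`, two jobs running concurrently in the same process do not overwrite each other's handler, which a plain module-level variable would allow.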
* Refactor: Lazily instantiate ProvenanceInformation objects [tristan/lazy-provenance] | Tristan van Berkom | 2020-10-01 | 1 | -2/+1
    As a rule, throughout the codebase we should not be using internal ProvenanceInformation objects in our APIs, but rather Node objects. This is because ProvenanceInformation is generated on the fly from a Node object, and it is needlessly expensive to instantiate one before it is absolutely needed.

    This patch unilaterally fixes the codebase to pass `provenance_node` Node objects around as arguments rather than `provenance` ProvenanceInformation objects.
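The idea of passing a node and only building provenance when it is needed can be sketched as below. The classes are heavily simplified stand-ins, and `check_kind` and the `instances` counter are hypothetical, added only to make the laziness observable.

```python
class ProvenanceInformation:
    # Simplified stand-in: built on demand from a node.
    instances = 0  # hypothetical counter, for illustration only

    def __init__(self, node):
        ProvenanceInformation.instances += 1
        self.text = "{} [line {}]".format(node.filename, node.line)

    def __str__(self):
        return self.text


class Node:
    # Simplified stand-in for a YAML node that knows where it came from.
    def __init__(self, filename, line):
        self.filename = filename
        self.line = line

    def get_provenance(self):
        # Provenance is generated on the fly, only when asked for.
        return ProvenanceInformation(self)


def check_kind(kind, provenance_node):
    # Hypothetical API taking a `provenance_node` rather than a provenance.
    if kind not in ("git", "local"):
        # The provenance object is created only on the error path.
        raise ValueError(
            "{}: unknown kind '{}'".format(provenance_node.get_provenance(), kind)
        )
    return kind


node = Node("element.bst", 12)
check_kind("git", node)
created_on_success = ProvenanceInformation.instances

try:
    check_kind("bogus", node)
except ValueError as e:
    error_text = str(e)
```

On the success path no provenance object is created at all; the cost is paid only when an error message actually needs it.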
* source.py: Remove BST_KEY_REQUIRES_STAGE | Tristan van Berkom | 2020-09-24 | 1 | -64/+48
    Refactored this to remove unneeded complexity in the code base, as described here:

    https://lists.apache.org/thread.html/r4b9517742433f07c79379ba5b67932cfe997c1e64965a9f1a2b613fc%40%3Cdev.buildstream.apache.org%3E

    Changes:

    * source.py: Added private Source._cache_directory() context manager.

      We also move the assertion about nodes which are safe to write to a bit lower in Source._set_ref(), as this was unnecessarily early. When tracking a workspace, the ref is none beforehand and remains none afterwards, so it is not a problem that a workspace's node is a synthetic one, as tracking will never affect it.
    * local plugin: Implement get_unique_key() and stage() using the new context manager in order to optimize staging and cache key calculations here.
    * workspace plugin: Implement get_unique_key() and stage() using the new context manager in order to optimize staging and cache key calculations here.
    * trackqueue.py: No special casing with Source._is_trackable()
* Move handling of the source `directory` configuration to ElementSources [juerg/element-source-cache] | Jürg Billeter | 2020-09-03 | 1 | -80/+15
    The `directory` value determines where a source is staged within the build root of an element, however, it does not directly affect individual sources.

    With this change the sources will individually be cached in CAS independent of the value of `directory`. `ElementSources` will use the value of `directory` when staging all element sources into the build root.

    This results in a cache key change as the `directory` value is moved from the unique key of individual sources to the unique key of `ElementSources`.

    This is in preparation for #1274.
* Add ElementSourcesCache | Jürg Billeter | 2020-09-03 | 1 | -8/+2
    Sources have been cached in CAS individually, except for sources that transform other sources, which have been cached combined with all previous sources of the element. This caching structure may be confusing as sources are specified in the element as a list and this is not a good fit for #1274 where we want to support caching individual sources in a Remote Asset server with a BuildStream-independent URI (especially the `directory` configuration would be problematic).

    This replaces the combined caching of 'previous' sources with an element-level source cache, which caches all sources of an element staged together. Sources that don't depend on previous sources are still cached individually.

    This also makes it possible to add a list of all element sources to the source proto used by the element-level source cache.
* Completely remove MetaElement | Tristan van Berkom | 2020-08-13 | 1 | -2/+1
    This dramatically affects the load process and removes one hoop we had to jump through, which is the creation of the extra intermediate MetaElement objects.

    This allows us to more easily carry state discovered by the Loader over to the Element constructor, as we need not add additional state to the intermediate MetaElement for this. Instead we have the Element initializer understand the LoadElement directly.

    Summary of changes:

    * _loader/metaelement.py: Removed
    * _loader/loadelement.py: Added some attributes previously required on MetaElement
    * _loader/loader.py: Removed _collect_element() and collect_element_no_deps(), removing the process of Loader.load() which translates LoadElements into MetaElements completely.
    * _loader/init.py: Export LoadElement, Dependency and Symbol types, stop exporting MetaElement
    * _loader/metasource.py: Now take the 'first_pass' parameter as an argument
    * _artifactelement.py: Use a virtual LoadElement instead of a virtual MetaElement to instantiate the ArtifactElement objects.
    * _pluginfactory/elementfactory.py: Adjust to now take a LoadElement
    * _project.py: Adjust Project.create_element() to now take a LoadElement, and call the new Element._new_from_load_element() instead of the old Element._new_from_meta() function
    * element.py:
      - Now export Element._new_from_load_element() instead of Element._new_from_meta()
      - Adjust the constructor to do the LoadElement toplevel node parsing instead of expecting members on the MetaElement object
      - Added __load_sources() which parses out and creates MetaSource objects for the sake of instantiating the element's Source objects. Consequently this simplifies the scenario where workspaces are involved.
    * source.py: Adjusted to use the new `first_pass` parameter to MetaSource when creating a duplicate clone.
* element/source: Remove pointless extra variable [tristan/cleanup-pickle] | Tristan van Berkom | 2020-08-10 | 1 | -1/+0
    These __meta_kind members are documented as relevant for pickling, which was removed a short while ago.
* source.py: Validate cache when it's used, not in `_is_cached()` | Jürg Billeter | 2020-08-06 | 1 | -9/+6
    `_is_cached()` is indirectly called by the frontend, which is not optimal for handling per-plugin errors. Instead, call `validate_cache()` right before the cache is used: in fetch jobs and when opening a workspace.
* Completely abolish job pickling. [tristan/nuke-pickle-jobber] | Tristan van Berkom | 2020-06-15 | 1 | -22/+0
* source.py: Allow access to element's variable | Benjamin Schubert | 2020-05-12 | 1 | -1/+8
    This automatically expands the variables from the element into the sources config.
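Expanding element variables into a source's configuration might look roughly like the sketch below. It assumes the `%{name}` variable reference syntax and uses hypothetical helpers (`expand`, `expand_config`) on a flat dictionary; the actual change operates on BuildStream's node types and is not reproduced here.

```python
import re

# Assumption: variables are referenced as %{name} inside string values.
_VARIABLE = re.compile(r"%\{([a-zA-Z][a-zA-Z0-9_-]*)\}")


def expand(value, variables):
    # Replace each %{name} with its value; an undefined name raises KeyError.
    def substitute(match):
        return variables[match.group(1)]

    return _VARIABLE.sub(substitute, value)


def expand_config(config, variables):
    # Hypothetical helper: expand every string value of a flat source config.
    return {key: expand(value, variables) for key, value in config.items()}


element_variables = {"version": "1.2.3"}
source_config = {
    "kind": "tar",
    "url": "https://example.com/foo-%{version}.tar.gz",
}
expanded = expand_config(source_config, element_variables)
```

With this, a source URL can reuse a version declared once on the element instead of repeating it.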
* Update all packages requirements | Benjamin Schubert | 2020-05-11 | 1 | -1/+1
    Also fix linting errors that come with the new version of pylint.
* src: Removing all pre 2.x "Since" documentation annotations. | Tristan Van Berkom | 2020-04-21 | 1 | -29/+4
    This does not make sense to keep in the public API surface documentation. As we are heading towards a release of 2.0, this represents a "reset" in public API, and older annotations only serve to clutter the documentation with information that is not relevant to the reader.

    Everything which is public at the time of the 2.0 release can be considered available "Since: 2.0" implicitly (as this is going to be the starting point of this new stable API).

    It will make sense to start adding these annotations again for any added API in 2.2 and forward.
* exceptions: Expose ErrorDomain, ErrorLoadReason | Thomas Coldrick | 2020-01-23 | 1 | -1/+2
    Plugin tests are already accessing this API, but using imports from private modules. For motivation for this to be exposed publicly, note that ErrorDomain is an argument for most things in runcli.py, and LoadErrorReason may be another.
* source.py: Remove 'get_consistency' completely | Benjamin Schubert | 2020-01-16 | 1 | -48/+3
    This is not needed now that we have 'is_resolved' and 'is_cached'.

    We can therefore drop all calling places and implementations of it.
* source.py: Remove the reliance on consistency to get whether a source is cached | Benjamin Schubert | 2020-01-16 | 1 | -2/+64
    This removes the need to use consistency in Sources, by asking explicitly whether the source is cached or not.

    This introduces a new public method on source: `is_cached` that needs implementation and that should return whether the source has a local copy or not.

    - On fetch, also reset whether the source was cached, or set it as cached when we know it was.
    - Validate the cache's source after fetching it.

    This doesn't need to be run in the scheduler's process and can be offloaded to the child, which will allow better multiprocessing.
* source.py: Add a new 'is_resolved' to get whether a source is resolved. | Benjamin Schubert | 2020-01-16 | 1 | -6/+17
    `get_consistency` is coarse grained and hard to optimize, in addition to not being user-friendly.

    This adds a new `is_resolved` whose default implementation is `get_ref() is not None`, which is true for most sources in BuildStream. Sources for which this is not true can override the method to give a more accurate description.

    Checking for this before looking whether the source is cached can reduce the amount of work necessary in some pipelines, and opens the door for more optimizations and the removal of the source state check.
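The default described above, and an override for a source kind where it does not hold, can be sketched like this. The `Source` class here is a stripped-down stand-in rather than BuildStream's base class, and `AlwaysResolvedSource` is a hypothetical plugin.

```python
class Source:
    # Minimal stand-in for the plugin base class.
    def __init__(self, ref=None):
        self._ref = ref

    def get_ref(self):
        return self._ref

    def is_resolved(self):
        # Default stated in the commit message: resolved once a ref is known.
        return self.get_ref() is not None


class AlwaysResolvedSource(Source):
    # Hypothetical source kind that has no ref at all (for example, files
    # taken straight from the project directory), so it overrides the default.
    def is_resolved(self):
        return True


unresolved = Source()
pinned = Source(ref="0123abcd")
refless = AlwaysResolvedSource()
```

Here `unresolved.is_resolved()` is false while the other two are true, and a caller can skip the potentially more expensive cache check for any source that is not yet resolved.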
* source.py: Introduce methods to query state instead of get_consistency | Benjamin Schubert | 2020-01-16 | 1 | -3/+8
    `get_consistency` is not fine grained enough to ask only for a specific bit of information.

    This introduces methods `is_cached` and `is_resolved` which will be more flexible for refactoring.
* lint: Remove unnecessary list comprehensions | Benjamin Schubert | 2019-12-02 | 1 | -1/+1
    Newer versions of pylint detect when a comprehension is not needed. Let's remove all the ones that are indeed extraneous.
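For illustration, this is the kind of pattern such a lint check targets (pylint reports it as `unnecessary-comprehension`). The variable names are made up and this is not the line changed in source.py.

```python
sources = ("git", "tar", "local")

# Flagged: the comprehension only copies the iterable item by item.
kinds_verbose = [kind for kind in sources]

# Equivalent and more direct.
kinds = list(sources)

# Not flagged: this comprehension transforms each item, so it is needed.
upper = [kind.upper() for kind in sources]
```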
* Fix stacktraces during element loading | Tristan Maat | 2019-11-22 | 1 | -2/+15
    These were caused by unhandled errors from plugins when calling `Source.get_consistency()`.

    This doesn't really solve the problem, since that interface is still used un-wrapped elsewhere, but it enables removing `Element.__schedule_tracking()` and fixes a bug.

    Ultimately we'd like to remove `Source.get_consistency()`, so this isn't too long-term of a problem.
* Reformat code using Black | Chandan Singh | 2019-11-14 | 1 | -84/+85
    As discussed over the mailing list, reformat code using Black. This is a one-off change to reformat all our codebase. Moving forward, we shouldn't expect such blanket reformats. Rather, we expect each change to already comply with the Black formatting style.
* Add _is_trackable() method to Source() | Darius Makovsky | 2019-11-11 | 1 | -0/+10
    This method reports whether the source can be tracked. This would be false for sources advertising BST_KEY_REQUIRES_STAGE.

    Element tracking can be skipped if none of the held sources can be tracked. This is determined by the value of the `Element.__tracking_scheduled` attribute which is set in `Element._schedule_tracking()`. This is set to `True` if at least one source can be tracked.

    Also move some of the tracking handling from `_stream._load` to `_stream.track` where it is more relevant.

    closes #1186
* Remove `commit`ting sources inside `Source()._generate_key` | Darius Makovsky | 2019-11-05 | 1 | -4/+0
    `Stream.shell()` should check that the element's sources are cached before calling the shell. If the sources are not cached, raise a StreamError and recommend a fetch.

    closes #1182
* Replace BST_NO_PRESTAGE_KEY with BST_KEY_REQUIRES_STAGE | Darius Makovsky | 2019-11-04 | 1 | -9/+8
    Correct version number for BST_KEY_REQUIRES_STAGE
* source.py: _get_unique_key in _track | Darius Makovsky | 2019-10-30 | 1 | -2/+7
    Ensure that sources advertising BST_NO_PRESTAGE_KEY have keys after tracking.
* source.py: Add BST_NO_PRESTAGE_KEY | Darius Makovsky | 2019-10-30 | 1 | -5/+43
    Extend Source API

    Add `_stage_into_cas()` private method. Calls `self.stage` on a `CasBasedDirectory`.

    If the source sets BST_NO_PRESTAGE_KEY then the casdir is recreated from a stored digest and imported directly in `_stage`.
* Remove unnecessary parameter in Source._get_unique_key | Darius Makovsky | 2019-10-30 | 1 | -10/+4
* job pickling: plugins don't return their factories | Angelos Evripiotis | 2019-10-25 | 1 | -8/+5
    Remove the need for plugins to find and return the factory they came from. Also take the opportunity to combine source and element pickling into a single 'plugin' pickling path.

    This will make it easier for us to later support pickling plugins from the 'first_pass_config' of projects.
* node.pyx: Make 'strip_node_info' public | Benjamin Schubert | 2019-10-16 | 1 | -2/+2
    'strip_node_info' would be useful for multiple plugins. We should therefore allow users to use it.
* workspace.py: Do not close gRPC channels | Jürg Billeter | 2019-10-15 | 1 | -2/+0
    This is now handled in Context.prepare_fork().
* cascache.py: Rename close_channel() to close_grpc_channels() | Jürg Billeter | 2019-10-15 | 1 | -1/+1
    This aligns the method name with has_open_grpc_channels().
* Defer committing workspace files to cache [traveltissues/1159] | Darius Makovsky | 2019-10-08 | 1 | -0/+6
    Remove XFAIL mark from test_workspace_visible and remove the explicit SourceCache.commit() in the workspace source plugin.

    Allow buildstream to handle the commit logic.

    Add handling for non-cached workspace sources in `source.Source._generate_keys()`.
* Add initial mypy configuration and types | Chandan Singh | 2019-09-02 | 1 | -43/+66
    As a first step, add type hints to variables whose type `mypy` cannot infer automatically. This is the minimal set of type hints that allow running `mypy` without any arguments, and having it not fail.

    We currently ignore C extensions that mypy can't process directly. Later, we can look into generating stubs for such modules (potentially automatically).
* _message.py: Use element_name & element_key instead of unique_id [tpollard/messageobject] | Tom Pollard | 2019-08-08 | 1 | -1/+6
    Adding the element full name and display key into all element related messages removes the need to look up the plugintable via a plugin unique_id just to retrieve the same values for logging and widget frontend display. Relying on plugintable state is also incompatible if the frontend will be running in a different process, as it will exist in multiple states.

    The element full name is now displayed instead of the unique_id, such as in the debugging widget. It is also displayed in place of 'name' (i.e. including any junction prefix) to be more informative.
* Make ChildJobs and friends picklable | Angelos Evripiotis | 2019-07-24 | 1 | -0/+26
    Pave the way toward supporting the 'spawn' method of creating jobs, by adding support for pickling ChildJobs. Introduce a new 'jobpickler' module that provides an entrypoint for this functionality.

    This also makes replays of jobs possible, which has made the debugging of plugins much easier for me.
* source: Cache mirror_directory instead of computing it everytime | Benjamin Schubert | 2019-07-17 | 1 | -5/+10
    This variable is accessed multiple times per run, and computing it can be slow on slow file systems.
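A minimal sketch of this kind of caching follows, assuming the expensive part is a filesystem operation such as creating the directory. The class layout, the `computations` counter and the constructor arguments are hypothetical; only the method name `get_mirror_directory` is borrowed from the Source API.

```python
import os
import tempfile


class Source:
    # Stripped-down stand-in, not BuildStream's Source class.
    def __init__(self, mirror_root, kind):
        self._mirror_root = mirror_root
        self._kind = kind
        self._mirror_directory = None  # filled in on first use
        self.computations = 0  # hypothetical counter, for illustration

    def get_mirror_directory(self):
        if self._mirror_directory is None:
            # Touch the filesystem only once, then reuse the stored path.
            self.computations += 1
            directory = os.path.join(self._mirror_root, self._kind)
            os.makedirs(directory, exist_ok=True)
            self._mirror_directory = directory
        return self._mirror_directory


with tempfile.TemporaryDirectory() as root:
    source = Source(root, "git")
    first = source.get_mirror_directory()
    second = source.get_mirror_directory()
    computed = source.computations
```

The second call returns the stored path without touching the filesystem, so `computed` stays at 1 however often the directory is requested.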
* plugins: Update public documentation to be correct with the new Nodes | Benjamin Schubert | 2019-07-15 | 1 | -3/+3
    We need to update every place where we were passing a yaml 'dict' to now pass a 'MappingNode'
* node: Rename 'copy' to 'clone' | Benjamin Schubert | 2019-07-15 | 1 | -1/+1
    A 'clone' operation carries an implicit understanding that it is expensive, which is not the case for a 'copy' operation, which is more usually a shallow copy.

    Therefore rename it to 'clone'.
* _yaml: Mark 'strip_node_info' as buildstream-private | Benjamin Schubert | 2019-07-15 | 1 | -2/+2
* _yaml: Set 'MappingNode' public-private API | Benjamin Schubert | 2019-07-15 | 1 | -1/+1
    - _composite -> __composite (internal)
    - composite -> _composite (BuildStream private)
    - composite_under -> _composite_under (BuildStream private)
    - get -> _get (internal)
* _yaml: Mark attributes in ProvenanceInformation as Buildstream-private | Benjamin Schubert | 2019-07-15 | 1 | -11/+11
    Users should not need to get access to any of those, and should only need access to the ProvenanceInformation to print it.
* _yaml: Remove 'node_get_provenance' and add 'Node.get_provenance' | Benjamin Schubert | 2019-07-15 | 1 | -3/+3
    This replaces the helper method by adding a 'get_provenance' on the node directly

    - Adapt all call sites
    - Delay getting provenance wherever possible without major refactor
* _yaml: Remove 'node_validate' and replace by 'MappingNode.validate_keys' | Benjamin Schubert | 2019-07-15 | 1 | -1/+1
    - adapt all call sites to use the new API
* _yaml: Move 'node_composite' to a method on 'MappingNode' | Benjamin Schubert | 2019-07-15 | 1 | -1/+1
    - Also take care of node_composite_move in the same way.
    - Adapt all calling places
* _yaml: Move 'node_final_assertions' to 'Node._assert_fully_composited' | Benjamin Schubert | 2019-07-15 | 1 | -1/+1
* _yaml: remove node_sanitize | Benjamin Schubert | 2019-07-15 | 1 | -2/+3
    Some call sites do not need calls to 'node_sanitize' anymore, so the call is removed entirely there.

    Others still used it for convenience, but that does not seem to be the right way to do it for consistency. Those call sites have been changed to call 'Node.strip_node_info()' instead.
* _yaml: Remove 'node_find_target' and replace by 'MappingNode.find' | Benjamin Schubert | 2019-07-15 | 1 | -2/+2
* _yaml: Remove 'key' from node_find_target | Benjamin Schubert | 2019-07-15 | 1 | -1/+3
    - node_find_target with 'key' is only used once in the codebase. We can remove and simplify this function
    - Allow 'MappingNode.get_node()' to be called without any 'expected_types'
* _yaml: Remove 'node_copy' and add 'Node.copy()' | Benjamin Schubert | 2019-07-15 | 1 | -1/+1
    Also adapt every part of the code calling it
* _yaml: add 'get_mapping()' to MappingNode | Benjamin Schubert | 2019-07-15 | 1 | -6/+4
    This allows getting a mapping node from another 'MappingNode', replacing 'node_get(my_mapping, key, type=dict)'.

    Also changes all places where 'node_get' was called like that to use the new API.
* source: rm unused _cache(), __source_cache | Angelos Evripiotis | 2019-07-09 | 1 | -6/+0