Change-Id: I493fced8cf2664283923f6f41097ca991d3fc3de

A bad function prototype meant that the mechanism for handling workers
disconnecting actually caused the controller to crash instead.
Change-Id: I8ceb6ad027ba2481c0c4c335e1760692823c208b

Change-Id: I01a60d4ec187d5fab060f40947d97aa97013f7a7

Currently, jobs may continue running after exec-cancel is sent if the
exec-response takes a while to come back. This commit sets the job's
state to 'failed' as soon as exec-cancel is sent, so that the wait for
the exec-response no longer matters.
Change-Id: I858d9efcba38c81a912cf57aee2bdd8c02cb466b
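
As an illustration of the idea only (not the actual morph code; the
names here are hypothetical), marking the job failed at cancel time
lets a stale exec-response be recognised and dropped:

    class Job(object):
        def __init__(self, job_id):
            self.job_id = job_id
            self.state = 'running'

    class JobTable(object):
        def __init__(self):
            self._jobs = {}

        def add(self, job):
            self._jobs[job.job_id] = job

        def cancel(self, job_id):
            # Mark the job failed *now*, instead of waiting for the
            # worker to acknowledge the cancel with an exec-response.
            self._jobs[job_id].state = 'failed'

        def handle_exec_response(self, job_id):
            job = self._jobs.get(job_id)
            if job is None or job.state == 'failed':
                return  # stale response for a cancelled job; ignore it
            job.state = 'complete'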

This reverts commit 75ef3e9585091b463b60d2981b3b7283a2ea8eab.
It turns out that the JobQueue may need to handle more than one build
of the same artifact at once, as one may be in the process of being
cancelled when another build of the same artifact is requested. So jobs
do need an ID separate from the artifact ID.
Change-Id: Ifa0c06987795a4aebdadbd9927de27919377b0a2
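
A minimal sketch of the distinction (invented names, not the real
JobQueue code): two live jobs can share an artifact cache key, so the
queue cannot key jobs on the artifact alone:

    import itertools

    class Job(object):
        _next_id = itertools.count()

        def __init__(self, artifact_cache_key):
            self.id = next(Job._next_id)                  # unique per job
            self.artifact_cache_key = artifact_cache_key  # shared by rebuilds

    cancelling = Job('3f6479...stage1-binutils-misc')  # being cancelled
    requested = Job('3f6479...stage1-binutils-misc')   # fresh request
    assert cancelling.artifact_cache_key == requested.artifact_cache_key
    assert cancelling.id != requested.id               # still distinguishable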

We no longer serialise whole artifacts, so it doesn't make sense for
code to still refer to serialise-artifact and similar names.
Change-Id: Id4d563a07041bbce77f13ac71dc3f7de39df5e23

Remove an extra job-setting line, since self._current_job no longer
exists in worker_build_scheduler.py.
Change-Id: I8849742587f11f83ebba64f48eaf97fac83e6589

Although in theory a worker should only ever have one job at a time, in
practice this assumption doesn't hold, and that can cause serious
confusion. The worker (implemented in the JsonRouter class) will
actually queue up exec-request messages and run the oldest one first. I
saw a case where, due to a build not being correctly cancelled, the
WorkerConnection.current_job attribute got out of sync with what the
worker was actually building. This led to an error when trying to fetch
the built artifacts, as the controller tried to fetch artifacts for
something that wasn't actually built yet, and everything got stuck.

To prevent this from happening, we either need to remove the
exec-request queue in the worker-daemon process, or make the
WorkerConnection class cope with multiple jobs at once. The latter seems
like the more robust approach, so I have done that.

Another bug this fixes is the issue where, if the 'Computing build
graph' (serialise-artifact) step of a build completes on the controller
while one of its WorkerConnection objects is waiting for artifacts to
be fetched by the shared cache from the worker, the build hangs. This
happened because the WorkerConnection assumed that any HelperResponse
message it saw was the result of its own request, so it would send a
_JobFinished before caching had actually finished if an unrelated
HelperResponse arrived in the meantime. It now checks the request ID of
the HelperResponse before calling the code that now lives in the new
_handle_helper_result_for_job() function.

Change-Id: Ia961f333f9dae77405b58c82c99a56e4c43e1628
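
A rough sketch of both fixes, with hypothetical names (the real code
lives in WorkerConnection): jobs are held in a dict instead of a single
current_job attribute, and a HelperResponse is only acted on if its
request ID matches one of this connection's outstanding cache-fetch
requests:

    class WorkerConnection(object):
        def __init__(self):
            self._active_jobs = {}              # job ID -> job
            self._pending_helper_requests = {}  # helper request ID -> job ID

        def job_started(self, job):
            # Multiple jobs can be tracked at once; there is no single
            # current_job attribute to get out of sync.
            self._active_jobs[job.id] = job

        def handle_helper_response(self, msg):
            job_id = self._pending_helper_requests.pop(msg['id'], None)
            if job_id is None:
                return  # someone else's HelperResponse; not our fetch
            self._handle_helper_result_for_job(job_id, msg)

        def _handle_helper_result_for_job(self, job_id, msg):
            job = self._active_jobs.pop(job_id)
            # ... send _JobFinished or _JobFailed for this job only ...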

Rather than generating IDs for each job, identify jobs by the artifact
they are going to build. Artifact cache IDs need to be unique in any
case.
Change-Id: I37a0277931c45a8fb6e37ae7c2a6a942ae732fdd

This is a bit more comprehensive than the previous approach of using
public instance attributes, and I find it easier to reason about.
Change-Id: I2942ecf53c95e29893dc0982d38aec689ebfa614
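
The commit doesn't show the code it replaces, but if the change is
along the lines of swapping public attributes for read-only properties,
the shape would be something like this purely hypothetical sketch:

    class Job(object):
        def __init__(self, job_id, artifact):
            self._id = job_id
            self._artifact = artifact
            self._initiators = []

        @property
        def id(self):
            return self._id

        @property
        def artifact(self):
            return self._artifact

        @property
        def initiators(self):
            # A copy, so callers can't mutate internal state directly.
            return list(self._initiators)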

The intention is to allow workers to use this class for job tracking,
in addition to the controller.
Change-Id: I355861086764476b383266bab7e850af5e05bc54

Change-Id: Ifdaa92c209a4ca488c4447911bef9b1bf7d61438

We no longer serialise entire artifacts, so the output of
deserialise_artifact is an ArtifactReference. This commit changes the
distbuild code to deal with that rather than with an Artifact.
Change-Id: I79b40d041700a85c25980e3bd70cd34dedd2a113

A JsonMachine object can be set to log all messages that it sends, so
we don't need to handle this in the WorkerConnection class as well.
Change-Id: Idfdc06953363a016708b5dda50c978eb93b1113c

It's good to know which jobs are in progress and which are queued when
reading morph-controller.log.

Old output:

    2015-04-09 10:40:58 DEBUG Current jobs:
    ['3f647933a1effbb128c857225ba77e9aa775d92314ef0acf3e58e084a7248c73.chunk.stage1-binutils-misc',
     'd7279e4179a31d8a3a98c27d5b01ad1bb7387c7fab623fee1086ab68af2784bb.chunk.stage2-fhs-dirs-misc']

New output:

    2015-04-09 10:40:58 DEBUG Current jobs:
    ['3f647933a1effbb128c857225ba77e9aa775d92314ef0acf3e58e084a7248c73.chunk.stage1-binutils-misc (given to worker1:3434)',
     'd7279e4179a31d8a3a98c27d5b01ad1bb7387c7fab623fee1086ab68af2784bb.chunk.stage2-fhs-dirs-misc (given to worker2:3434)']

Change-Id: Ie89e6723b0da5f930813591a3166301fd3966804

Change-Id: I992dc0c1d40f563ade56a833162d409b02be90a0

Reviewed-By: Adam Coldrick <adam.coldrick@codethink.co.uk>
Reviewed-By: Richard Maw <richard.maw@codethink.co.uk>

For a while we have seen an issue where output from build A would end
up in the log file of some other random chunk.

The problem turns out to be that the WorkerConnection class in the
controller-daemon assumes cancellation is instantaneous. If a build was
cancelled, the WorkerConnection would send a cancel message for the job
it was running, and then start a new job. However, the worker-daemon
process would have a backlog of exec-output messages and a delayed
exec-response message from the old job. The controller would receive
these and assume that they were for the new job, without checking the
job ID in the messages. Thus they would be sent to the wrong log file.

To fix this, the WorkerConnection class now tracks jobs by job ID, and
the code should be generally more robust when unexpected messages are
received.
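
In outline (a hedged sketch with simplified names, not the real
message layout), the fix means routing on the job ID carried in each
message rather than on arrival order:

    class WorkerConnection(object):
        def __init__(self):
            self._jobs = {}  # job ID -> job

        def handle_exec_output(self, msg):
            job = self._jobs.get(msg['id'])
            if job is None:
                # Backlogged output from a job that was already
                # cancelled; dropping it keeps it out of the log file
                # of whatever job happens to be running now.
                return
            job.write_to_log(msg.get('stdout', ''))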

The logic to handle a worker disconnecting was broken. The
WorkerConnection object would remove itself from the main loop as soon
as the worker disconnected, but it would not get removed from the list
of available workers that the WorkerBuildQueue maintains. So the
controller would continue sending messages to this dead connection, and
the builds it sent would hang forever waiting for a response.
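
The repaired teardown, sketched with invented collaborator names:
disconnecting must deregister the worker from the queue as well as
from the main loop:

    class WorkerConnection(object):
        def __init__(self, mainloop, build_queue):
            self._mainloop = mainloop
            self._build_queue = build_queue

        def handle_disconnect(self):
            self._mainloop.remove_state_machine(self)
            # The missing half of the old logic: stop the queue from
            # ever offering this dead connection another job.
            self._build_queue.remove_worker(self)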

We always want to warn if we attempt to remove a job that's not
present.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
If a new build request makes a request for an artifact that is currently being
cached then the artifact will be needlessly rebuilt.
To avoid this the new build request should wait for caching to finish.
We rename _ExecStarted, _ExecEnded, _ExecFailed to
_JobStarted, _JobFinished, _JobFailed
and Job's is_building attribute is renamed to running.
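
A minimal sketch of the waiting behaviour, under assumed names: a
request for an artifact that already has a running job just adds its
initiator to that job instead of queueing a rebuild:

    class Job(object):
        def __init__(self, cache_key, initiator_id):
            self.cache_key = cache_key
            self.initiators = [initiator_id]
            self.running = True  # covers both building and caching

    class JobQueue(object):
        def __init__(self):
            self._jobs = {}  # artifact cache key -> job

        def request(self, cache_key, initiator_id):
            job = self._jobs.get(cache_key)
            if job is not None and job.running:
                # Already building or being cached: wait for that
                # result rather than rebuilding needlessly.
                job.initiators.append(initiator_id)
                return job
            job = Job(cache_key, initiator_id)
            self._jobs[cache_key] = job
            return job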

This fixes the bug that causes the distbuild controller to crash when
population of the artifact cache fails.

Reviewed-By: Richard Ipsum <richard.ipsum@codethink.co.uk>
Reviewed-By: Lars Wirzenius <lars.wirzenius@codethink.co.uk>

Users need to be able to see logs of all builds, not just those that
failed.

To cancel jobs cleanly, we need to know when a job has failed.

add_initiator() isn't necessary, given that lists have a remove method.

Put our _exec_response_msg into the WorkerBuildFinished event; it's
essentially the same as _finished_msg, just under a different name.
Get our artifact's cache key from the job.

Now we just get everything from the job object.

The exec_response_msg also needs to be sent to a number of initiators,
so we give it a list of IDs, not just one. The exec_response_msg will
be sent to the controller once the artifacts have been cached
successfully. There's no longer any need to use a route map to retrieve
the ID of the initiator, since this is stored with the job.
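
Sketched with an assumed message shape (the real field names live in
the distbuild protocol code): the job carries every interested
initiator, so the response can fan out without a route map:

    def send_exec_response(job, router):
        msg = {
            'type': 'exec-response',
            'ids': list(job.initiators),  # a list of IDs, not a single one
            # ... exit code, artifact cache key, etc. ...
        }
        # Sent only after the artifacts have been cached successfully.
        for initiator_id in job.initiators:
            router.send_to_initiator(initiator_id, msg)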

msg now contains a list of initiator IDs rather than a single one,
since BuiltOutput needs to be sent to a number of initiators.

Each job is given a unique ID, so we don't need to generate an ID for
each exec request. This means we can remove the use of the route map,
since we can use the job's ID for the exec request.
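
Illustrative only (hypothetical helper names): because a job already
has a unique ID, the exec-request can quote it directly, and replies
can be matched back with a plain dict lookup, with no route map in
between:

    def make_exec_request(job, argv):
        return {
            'type': 'exec-request',
            'id': job.id,  # the job's own ID, not a freshly generated one
            'argv': argv,
        }

    def job_for_reply(jobs_by_id, reply_msg):
        # The worker quotes the same ID back, so matching the reply to
        # its job is a dictionary access.
        return jobs_by_id.get(reply_msg['id'])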

This method no longer works; we will replace it soon.

The name change from BuildFailed to JobFailed etc. was unintentionally
merged into master; undo it.

_job is the job this worker is carrying out; _exec_response_msg will
contain the response the worker sends back to us when it finishes the
build.

We need to be able to send this message to a number of initiators.

Conflicts:
    distbuild/build_controller.py

Reviewed by:
    Lars Wirzenius
    Daniel Silverstone
    Sam Thursfield

body and headers must now be specified for the http-request message.
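
For illustration, an assumed constructor that makes the two fields
impossible to forget; the real message layout is defined by the
distbuild protocol module, so treat these field names as guesses:

    def http_request(request_id, method, url, headers, body):
        # headers and body are now required arguments, even if empty.
        return {
            'type': 'http-request',
            'id': request_id,
            'method': method,
            'url': url,
            'headers': headers,
            'body': body,
        }

    # e.g. a GET with no payload still names both fields explicitly:
    msg = http_request(1, 'GET', '/1.0/artifacts', headers={}, body=None)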
|
| | |
|