Running jobs
============

This chapter contains tests that verify that WEBAPP schedules jobs,
accepts job output, and lets the admin kill running jobs.

Run a job successfully
----------------------

To start with, with an empty run-queue, nothing should be scheduled.

    SCENARIO run a job
    GIVEN a new git repository in CONFGIT
    AND an empty lorry-controller.conf in CONFGIT
    AND lorry-controller.conf in CONFGIT adds lorries *.lorry using prefix upstream
    AND WEBAPP uses CONFGIT as its configuration directory
    AND a running WEBAPP

We stop the queue first.

    WHEN admin makes request POST /1.0/stop-queue

Then make sure we don't get a job when we request one.

    WHEN admin makes request POST /1.0/give-me-job with host=testhost&pid=123
    THEN response has job_id set to null

    WHEN admin makes request GET /1.0/list-running-jobs
    THEN response has running_jobs set to []

Add a Lorry spec to the run-queue, and request a job. We still
shouldn't get a job, since the queue isn't set to run yet.

    GIVEN Lorry file CONFGIT/foo.lorry with {"foo":{"type":"git","url":"git://foo"}}
    WHEN admin makes request POST /1.0/read-configuration
    AND admin makes request POST /1.0/give-me-job with host=testhost&pid=123
    THEN response has job_id set to null

Enable the queue, and off we go.

    WHEN admin makes request POST /1.0/start-queue
    AND admin makes request POST /1.0/give-me-job with host=testhost&pid=123
    THEN response has job_id set to 1
    AND response has path set to "upstream/foo"

    WHEN admin makes request GET /1.0/lorry/upstream/foo
    THEN response has running_job set to 1

    WHEN admin makes request GET /1.0/list-running-jobs
    THEN response has running_jobs set to [1]

Requesting another job should now again return null.

    WHEN admin makes request POST /1.0/give-me-job with host=testhost&pid=123
    THEN response has job_id set to null

Inform WEBAPP the job is finished.

    WHEN MINION makes request POST /1.0/job-update with job_id=1&exit=0
    THEN response has kill_job set to false

    WHEN admin makes request GET /1.0/lorry/upstream/foo
    THEN response has running_job set to null

    WHEN admin makes request GET /1.0/list-running-jobs
    THEN response has running_jobs set to []

Cleanup.

    FINALLY WEBAPP terminates

Limit number of jobs running at the same time
---------------------------------------------

WEBAPP can be told to limit the number of jobs running at the same
time.

Set things up. Note that we have two local Lorry files, so that we
could, in principle, run two jobs at the same time.

    SCENARIO limit concurrent jobs
    GIVEN a new git repository in CONFGIT
    AND an empty lorry-controller.conf in CONFGIT
    AND lorry-controller.conf in CONFGIT adds lorries *.lorry using prefix upstream
    AND Lorry file CONFGIT/foo.lorry with {"foo":{"type":"git","url":"git://foo"}}
    AND Lorry file CONFGIT/bar.lorry with {"bar":{"type":"git","url":"git://bar"}}
    AND WEBAPP uses CONFGIT as its configuration directory
    AND a running WEBAPP
    WHEN admin makes request POST /1.0/read-configuration

Check the current value of the `max_jobs` setting.

    WHEN admin makes request GET /1.0/get-max-jobs
    THEN response has max_jobs set to null

Set the limit to 1.

    WHEN admin makes request POST /1.0/set-max-jobs with max_jobs=1
    THEN response has max_jobs set to 1

    WHEN admin makes request GET /1.0/get-max-jobs
    THEN response has max_jobs set to 1

Get a job. This should succeed.

    WHEN MINION makes request POST /1.0/give-me-job with host=testhost&pid=1
    THEN response has job_id set to 1

Get a second job. This should not succeed.

    WHEN MINION makes request POST /1.0/give-me-job with host=testhost&pid=2
    THEN response has job_id set to null

Finish the first job. Then get a new job. This should succeed.

    WHEN MINION makes request POST /1.0/job-update with job_id=1&exit=0
    AND MINION makes request POST /1.0/give-me-job with host=testhost&pid=2
    THEN response has job_id set to 2

Stop job in the middle
----------------------

We need to be able to stop jobs while they're running as well. We
start by setting up everything so that a job is running, the same way
we did for the successful job scenario.

    SCENARIO stop a job while it's running
    GIVEN a new git repository in CONFGIT
    AND an empty lorry-controller.conf in CONFGIT
    AND lorry-controller.conf in CONFGIT adds lorries *.lorry using prefix upstream
    AND WEBAPP uses CONFGIT as its configuration directory
    AND a running WEBAPP
    AND Lorry file CONFGIT/foo.lorry with {"foo":{"type":"git","url":"git://foo"}}
    WHEN admin makes request POST /1.0/read-configuration
    AND admin makes request POST /1.0/start-queue
    AND admin makes request POST /1.0/give-me-job with host=testhost&pid=123
    THEN response has job_id set to 1
    AND response has path set to "upstream/foo"

Admin will now ask WEBAPP to kill the job. This change only sets a
field in the STATEDB.

    WHEN admin makes request POST /1.0/stop-job with job_id=1
    AND admin makes request GET /1.0/lorry/upstream/foo
    THEN response has kill_job set to true

Now, when MINION updates the job, WEBAPP will tell MINION to kill it.
MINION will do so, and then update the job again.

    WHEN MINION makes request POST /1.0/job-update with job_id=1&exit=no
    THEN response has kill_job set to true

    WHEN MINION makes request POST /1.0/job-update with job_id=1&exit=1

Admin will now see that the job has, indeed, been killed.

    WHEN admin makes request GET /1.0/lorry/upstream/foo
    THEN response has running_job set to null

    WHEN admin makes request GET /1.0/list-running-jobs
    THEN response has running_jobs set to []

Check that the job can be run successfully again. In 2014, we found a
bug where a lorry that had ever been set to be killed would never run
successfully again.

    WHEN admin makes request POST /1.0/give-me-job with host=testhost&pid=123
    THEN response has job_id set to 2
    AND response has path set to "upstream/foo"

    WHEN MINION makes request POST /1.0/job-update with job_id=2&exit=no
    THEN response has kill_job set to false

Cleanup.

    FINALLY WEBAPP terminates

Stop a job that runs too long
-----------------------------

Sometimes a job gets "stuck" and should be killed. The
`lorry-controller.conf` file has an optional `lorry-timeout` field
that sets the timeout, and WEBAPP will tell MINION to kill a job when
it has been running too long.

Some setup. Set the `lorry-timeout` to a known value. It doesn't
matter what it is, since we'll be telling WEBAPP to fake its sense of
time, so that the test suite is not timing sensitive. We wouldn't
want the test suite to fail when running on slow devices.

    SCENARIO stop stuck job
    GIVEN a new git repository in CONFGIT
    AND an empty lorry-controller.conf in CONFGIT
    AND lorry-controller.conf in CONFGIT adds lorries *.lorry using prefix upstream
    AND lorry-controller.conf in CONFGIT has lorry-timeout set to 1 for everything
    AND Lorry file CONFGIT/foo.lorry with {"foo":{"type":"git","url":"git://foo"}}
    AND WEBAPP uses CONFGIT as its configuration directory
    AND a running WEBAPP
    WHEN admin makes request POST /1.0/read-configuration

Pretend it is the start of time.

    WHEN admin makes request POST /1.0/pretend-time with now=0
    AND admin makes request GET /1.0/status
    THEN response has timestamp set to "1970-01-01 00:00:00 UTC"

Start the job.

    WHEN admin makes request POST /1.0/give-me-job with host=testhost&pid=123
    THEN response has job_id set to 1

Check that the job info contains a start time.

    WHEN admin makes request GET /1.0/job/1
    THEN response has job_started set

Pretend it is now much later, or at least later than the timeout
specified.

    WHEN admin makes request POST /1.0/pretend-time with now=2

Pretend to be a MINION that reports an update on the job. WEBAPP
should now be telling us to kill the job.

    WHEN MINION makes request POST /1.0/job-update with job_id=1&exit=no
    THEN response has kill_job set to true

Kill the job, as requested.

    WHEN MINION makes request POST /1.0/job-update with job_id=1&exit=1

Verify that we can run the job successfully after it has been killed
once by timeout. In 2014 we had a bug where this would not happen,
because a lorry that had ever been killed would never run
successfully again.

    WHEN admin makes request POST /1.0/give-me-job with host=testhost&pid=123
    THEN response has job_id set to 2

    WHEN MINION makes request POST /1.0/job-update with job_id=2&exit=no
    THEN response has kill_job set to false

Cleanup.

    FINALLY WEBAPP terminates

Remove a terminated job
-----------------------

WEBAPP doesn't remove jobs automatically; it needs to be told to
remove them.

    SCENARIO remove job

Setup.

    GIVEN a new git repository in CONFGIT
    AND an empty lorry-controller.conf in CONFGIT
    AND lorry-controller.conf in CONFGIT adds lorries *.lorry using prefix upstream
    AND WEBAPP uses CONFGIT as its configuration directory
    AND a running WEBAPP
    GIVEN Lorry file CONFGIT/foo.lorry with {"foo":{"type":"git","url":"git://foo"}}
    WHEN admin makes request POST /1.0/read-configuration

Start job 1.

    WHEN admin makes request POST /1.0/give-me-job with host=testhost&pid=123
    THEN response has job_id set to 1

Try to remove job 1 while it is running. This should fail.

    WHEN admin makes request POST /1.0/remove-job with job_id=1
    THEN response has reason set to "still running"

Finish the job.

    WHEN MINION makes request POST /1.0/job-update with job_id=1&exit=0

    WHEN admin makes request GET /1.0/list-jobs
    THEN response has job_ids set to [1]

Remove it.

    WHEN admin makes request POST /1.0/remove-job with job_id=1
    AND admin makes request GET /1.0/list-jobs
    THEN response has job_ids set to []

Cleanup.

    FINALLY WEBAPP terminates
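
The job behaviour exercised by the scenarios in this chapter can be
summarised in a small in-memory model. This is only a sketch, not
WEBAPP's implementation: `ToyWebapp` and all of its method and field
names are hypothetical. Picking the first idle lorry, treating the
queue as enabled by default, comparing the timeout with a strict `>`,
and the exact shape of the returned dictionaries are assumptions
chosen to agree with the scenarios above.

```python
import itertools


class ToyWebapp:
    """Hypothetical toy model of the job handling the scenarios test."""

    def __init__(self, lorries, timeout=None):
        self.lorries = list(lorries)   # lorry paths, e.g. "upstream/foo"
        self.timeout = timeout         # like lorry-timeout; None = no limit
        self.queue_running = True      # assumption: queue enabled by default
        self.max_jobs = None           # None means no concurrency limit
        self.now = 0                   # faked clock, like pretend-time
        self.jobs = {}                 # job_id -> job record
        self._ids = itertools.count(1)

    def running_jobs(self):
        """Return ids of jobs that have not reported an exit code yet."""
        return [i for i, j in sorted(self.jobs.items()) if j["exit"] is None]

    def give_me_job(self, host, pid):
        """Hand out a job for an idle lorry, or job_id None if not allowed."""
        running = self.running_jobs()
        busy = {self.jobs[i]["path"] for i in running}
        idle = [p for p in self.lorries if p not in busy]
        at_limit = self.max_jobs is not None and len(running) >= self.max_jobs
        if not self.queue_running or at_limit or not idle:
            return {"job_id": None}
        job_id = next(self._ids)
        self.jobs[job_id] = {"path": idle[0], "host": host, "pid": pid,
                             "started": self.now, "exit": None, "kill": False}
        return {"job_id": job_id, "path": idle[0]}

    def stop_job(self, job_id):
        """Only record the kill request; MINION learns of it on update."""
        self.jobs[job_id]["kill"] = True

    def job_update(self, job_id, exit_code):
        """exit_code is "no" while the job runs, else the exit status."""
        job = self.jobs[job_id]
        if exit_code != "no":
            job["exit"] = int(exit_code)
        timed_out = (self.timeout is not None
                     and self.now - job["started"] > self.timeout)
        still_running = job["exit"] is None
        return {"kill_job": still_running and (job["kill"] or timed_out)}

    def remove_job(self, job_id):
        """Refuse to remove a running job; otherwise forget it."""
        if self.jobs[job_id]["exit"] is None:
            return {"reason": "still running"}
        del self.jobs[job_id]
        return {}


# Mirror the "limit concurrent jobs" scenario with two lorries.
app = ToyWebapp(["upstream/foo", "upstream/bar"])
app.max_jobs = 1
first = app.give_me_job("testhost", 1)    # allowed: nothing is running
second = app.give_me_job("testhost", 2)   # refused: limit of 1 reached
app.job_update(first["job_id"], "0")      # first job finishes
third = app.give_me_job("testhost", 2)    # allowed again
```

The model keeps the kill request as a flag on the job record, so, as
in the scenarios, a job is only told to die the next time MINION
reports on it, and a later job for the same lorry starts with a clean
flag.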