author     James E. Blair <jim@acmegating.com>  2021-11-29 14:12:56 -0800
committer  James E. Blair <jim@acmegating.com>  2021-11-29 14:12:56 -0800
commit     0c76985faf10b39e08e01c51683d76105862de36 (patch)
tree       3e28c52dc5d91b705cca9ca5917ae92a39fd59f2 /tests/unit/test_sos.py
parent     d9802c9ffc58608d4d2ff2cd2940fff59fd4e518 (diff)
download   zuul-0c76985faf10b39e08e01c51683d76105862de36.tar.gz
Don't delete pipeline summary objects if they have a syntax error
With most ZKObjects, we delete them from ZK if we are unable to
deserialize the JSON data. The pipeline manager will likely
re-create them in that case, if it is able to.
But the pipeline summary object is a unique case. We read from it
without obtaining a lock, so it's possible (likely even) that a
scheduler is in the middle of writing it out (it's sharded, so it
can be multiple znodes) when a zuul-web reads it. In that case it
was our intention to ignore the error and use the previous data.
However, since the zkobject base class automatically deletes the
object on error, this could result in deleting the summary from ZK
as it's being written. In that case we might continue using cached
data (or have no data if we didn't happen to have read from it
already) for an extended period of time until the pipeline updates
again (and that update could have the same problem).
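
The torn-read failure mode above can be illustrated with a minimal
Python sketch. The shard size and payload here are invented for
illustration; this is not Zuul's actual sharding code:

```python
import json

# A writer splits serialized JSON across several "shards" (znodes).
full = json.dumps({"pipeline": "check", "change_queues": []}).encode()
shards = [full[i:i + 16] for i in range(0, len(full), 16)]

# A lockless reader arriving mid-write may see only a prefix of the
# shards, so the concatenated bytes are not valid JSON.
partial = b"".join(shards[:-1])
try:
    status = json.loads(partial)
except json.JSONDecodeError:
    status = None  # fall back to previously cached data instead
```

Deleting the znode at this point would punish a perfectly healthy
write that simply has not finished yet.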
To avoid this, add a class variable to indicate that the pipeline
summary object should not delete corrupt data. We will assume that
the data is in the process of being written (and even if it is
legitimately corrupt, the resolution is the same regardless: wait for
the scheduler to write it again on the next pipeline pass, which it
always does).
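
The class-variable approach can be sketched as follows. All names
here (ZKObjectSketch, FakeStore, delete_on_error) are illustrative
stand-ins, not Zuul's real API:

```python
import json


class FakeStore:
    """Stand-in for a ZooKeeper client; records deletions (illustrative)."""
    def __init__(self):
        self.deleted = []

    def delete(self, path):
        self.deleted.append(path)


class ZKObjectSketch:
    # Default behavior: corrupt JSON is deleted from the store so the
    # owning component can re-create it.
    delete_on_error = True
    path = '/zuul/some-object'

    def __init__(self):
        self._data = {}

    def refresh(self, raw, store):
        try:
            self._data = json.loads(raw)
        except json.JSONDecodeError:
            if self.delete_on_error:
                store.delete(self.path)
            # Otherwise keep the previously cached self._data; the
            # scheduler rewrites the object on its next pipeline pass.


class PipelineSummarySketch(ZKObjectSketch):
    path = '/zuul/pipeline-summary'
    # Read without a lock, so a parse error probably just means the
    # scheduler is mid-write: never delete, keep the cached data.
    delete_on_error = False


store = FakeStore()
summary = PipelineSummarySketch()
summary.refresh(b'{"status": "ok"}', store)
summary.refresh(b'{"sta', store)  # torn read of a partial shard set
```

After the failed refresh, `summary._data` still holds the last good
status and nothing was deleted from the store.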
Change-Id: I6da8e7e01e0a31bf30520fdf9829b2a2f0559c11
Diffstat (limited to 'tests/unit/test_sos.py')
-rw-r--r--  tests/unit/test_sos.py  47
1 file changed, 47 insertions(+), 0 deletions(-)
diff --git a/tests/unit/test_sos.py b/tests/unit/test_sos.py
index f2dfc00a3..835b550bb 100644
--- a/tests/unit/test_sos.py
+++ b/tests/unit/test_sos.py
@@ -12,6 +12,8 @@
 # License for the specific language governing permissions and limitations
 # under the License.
 
+import zuul.model
+
 from tests.base import iterate_timeout, ZuulTestCase
 
 
@@ -139,3 +141,48 @@ class TestScaleOutScheduler(ZuulTestCase):
             dict(name='project-test1', result='SUCCESS', changes='1,1 2,1'),
             dict(name='project-test2', result='SUCCESS', changes='1,1 2,1'),
         ], ordered=False)
+
+    def test_pipeline_summary(self):
+        # Test that we can deal with a truncated pipeline summary
+        self.executor_server.hold_jobs_in_build = True
+        tenant = self.scheds.first.sched.abide.tenants.get('tenant-one')
+        pipeline = tenant.layout.pipelines['check']
+        context = self.createZKContext()
+
+        def new_summary():
+            summary = zuul.model.PipelineSummary()
+            summary._set(pipeline=pipeline)
+            summary.refresh(context)
+            return summary
+
+        A = self.fake_gerrit.addFakeChange('org/project', 'master', 'A')
+        self.fake_gerrit.addEvent(A.getPatchsetCreatedEvent(1))
+        self.waitUntilSettled()
+
+        # Check we have a good summary
+        summary1 = new_summary()
+        self.assertNotEqual(summary1.status, {})
+        self.assertTrue(context.client.exists(summary1.getPath()))
+
+        # Make a syntax error in the status summary json
+        summary = new_summary()
+        summary._save(context, b'{"foo')
+
+        # With the corrupt data, we should get an empty status but the
+        # path should still exist.
+        summary2 = new_summary()
+        self.assertEqual(summary2.status, {})
+        self.assertTrue(context.client.exists(summary2.getPath()))
+
+        # Our earlier summary object should use its cached data
+        summary1.refresh(context)
+        self.assertNotEqual(summary1.status, {})
+
+        self.executor_server.hold_jobs_in_build = False
+        self.executor_server.release()
+        self.waitUntilSettled()
+
+        # The scheduler should have written a new summary that our
+        # second object can read now.
+        summary2.refresh(context)
+        self.assertNotEqual(summary2.status, {})