author     James E. Blair <jim@acmegating.com>  2021-11-29 14:12:56 -0800
committer  James E. Blair <jim@acmegating.com>  2021-11-29 14:12:56 -0800
commit     0c76985faf10b39e08e01c51683d76105862de36 (patch)
tree       3e28c52dc5d91b705cca9ca5917ae92a39fd59f2 /tests/unit/test_sos.py
parent     d9802c9ffc58608d4d2ff2cd2940fff59fd4e518 (diff)
download   zuul-0c76985faf10b39e08e01c51683d76105862de36.tar.gz
Don't delete pipeline summary objects if they have a syntax error
With most ZKObjects, we delete them from ZK if we are unable to
deserialize the JSON data. The pipeline manager will likely
re-create them in that case, if it is able to.
But the pipeline summary object is a unique case. We read from it
without obtaining a lock, so it's possible (likely even) that a
scheduler is in the middle of writing it out (it's sharded, so it
can be multiple znodes) when a zuul-web reads it. In that case it
was our intention to ignore the error and use the previous data.
However, since the zkobject base class automatically deletes the
object on error, this could result in deleting the summary from ZK
as it's being written. In that case we might continue using cached
data (or have no data if we didn't happen to have read from it
already) for an extended period of time until the pipeline updates
again (and that update could have the same problem).
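
The torn-read failure mode above can be illustrated with a minimal
Python sketch. The shard size and payload here are invented for
illustration; this is not Zuul's actual sharding code:

```python
import json

# A writer splits serialized JSON across several "shards" (znodes).
full = json.dumps({"pipeline": "check", "change_queues": []}).encode()
shards = [full[i:i + 16] for i in range(0, len(full), 16)]

# A lockless reader arriving mid-write may see only a prefix of the
# shards, so the concatenated bytes are not valid JSON.
partial = b"".join(shards[:-1])
try:
    status = json.loads(partial)
except json.JSONDecodeError:
    status = None  # fall back to previously cached data instead
```

Deleting the znode at this point would punish a perfectly healthy
write that simply has not finished yet.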
To avoid this, add a class variable to indicate that the pipeline
summary object should not delete corrupt data. We will assume that
the data is in the process of being written (and even if it is
legitimately corrupt, the resolution is the same regardless: wait for
the scheduler to write it again on the next pipeline pass, which it
always does).
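
The class-variable approach can be sketched as follows. All names
here (ZKObjectSketch, FakeStore, delete_on_error) are illustrative
stand-ins, not Zuul's real API:

```python
import json


class FakeStore:
    """Stand-in for a ZooKeeper client; records deletions (illustrative)."""
    def __init__(self):
        self.deleted = []

    def delete(self, path):
        self.deleted.append(path)


class ZKObjectSketch:
    # Default behavior: corrupt JSON is deleted from the store so the
    # owning component can re-create it.
    delete_on_error = True
    path = '/zuul/some-object'

    def __init__(self):
        self._data = {}

    def refresh(self, raw, store):
        try:
            self._data = json.loads(raw)
        except json.JSONDecodeError:
            if self.delete_on_error:
                store.delete(self.path)
            # Otherwise keep the previously cached self._data; the
            # scheduler rewrites the object on its next pipeline pass.


class PipelineSummarySketch(ZKObjectSketch):
    path = '/zuul/pipeline-summary'
    # Read without a lock, so a parse error probably just means the
    # scheduler is mid-write: never delete, keep the cached data.
    delete_on_error = False


store = FakeStore()
summary = PipelineSummarySketch()
summary.refresh(b'{"status": "ok"}', store)
summary.refresh(b'{"sta', store)  # torn read of a partial shard set
```

After the failed refresh, `summary._data` still holds the last good
status and nothing was deleted from the store.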
Change-Id: I6da8e7e01e0a31bf30520fdf9829b2a2f0559c11
Diffstat (limited to 'tests/unit/test_sos.py')
-rw-r--r--  tests/unit/test_sos.py  47
1 file changed, 47 insertions(+), 0 deletions(-)
diff --git a/tests/unit/test_sos.py b/tests/unit/test_sos.py
index f2dfc00a3..835b550bb 100644
--- a/tests/unit/test_sos.py
+++ b/tests/unit/test_sos.py
@@ -12,6 +12,8 @@
 # License for the specific language governing permissions and limitations
 # under the License.
 
+import zuul.model
+
 from tests.base import iterate_timeout, ZuulTestCase
 
 
@@ -139,3 +141,48 @@ class TestScaleOutScheduler(ZuulTestCase):
             dict(name='project-test1', result='SUCCESS', changes='1,1 2,1'),
             dict(name='project-test2', result='SUCCESS', changes='1,1 2,1'),
         ], ordered=False)
+
+    def test_pipeline_summary(self):
+        # Test that we can deal with a truncated pipeline summary
+        self.executor_server.hold_jobs_in_build = True
+        tenant = self.scheds.first.sched.abide.tenants.get('tenant-one')
+        pipeline = tenant.layout.pipelines['check']
+        context = self.createZKContext()
+
+        def new_summary():
+            summary = zuul.model.PipelineSummary()
+            summary._set(pipeline=pipeline)
+            summary.refresh(context)
+            return summary
+
+        A = self.fake_gerrit.addFakeChange('org/project', 'master', 'A')
+        self.fake_gerrit.addEvent(A.getPatchsetCreatedEvent(1))
+        self.waitUntilSettled()
+
+        # Check we have a good summary
+        summary1 = new_summary()
+        self.assertNotEqual(summary1.status, {})
+        self.assertTrue(context.client.exists(summary1.getPath()))
+
+        # Make a syntax error in the status summary json
+        summary = new_summary()
+        summary._save(context, b'{"foo')
+
+        # With the corrupt data, we should get an empty status but the
+        # path should still exist.
+        summary2 = new_summary()
+        self.assertEqual(summary2.status, {})
+        self.assertTrue(context.client.exists(summary2.getPath()))
+
+        # Our earlier summary object should use its cached data
+        summary1.refresh(context)
+        self.assertNotEqual(summary1.status, {})
+
+        self.executor_server.hold_jobs_in_build = False
+        self.executor_server.release()
+        self.waitUntilSettled()
+
+        # The scheduler should have written a new summary that our
+        # second object can read now.
+        summary2.refresh(context)
+        self.assertNotEqual(summary2.status, {})