Fix race condition in pipeline change list init

Simon Westphahl describes the race condition:

> [The race condition] can occur after a reconfiguration while
> some schedulers are updating their local layout and some
> already start processing pipelines in the same tenant.
>
> In this case the pipeline manager's `_postConfig()` method that
> calls `PipelineChangeList.create(...)` races with the pipeline
> processor updating the change keys.
>
> This leads to two change lists being written as separate
> shards, that can't be correctly loaded, as all shards combined
> are expected to form a single JSON object.
>
> The sequence of events seems to be the following:
> 1. S1: pipeline processor needs to update the change keys
> 2. S1: the shard writer will delete the `change_list` key with the old
>    shards
> 3. S2: configloader calls the `_postConfig()` method
> 4. S2: `PipelineChangeList.create()` notices that the `change_list` node
>    doesn't exist in Zookeeper:
>    https://opendev.org/zuul/zuul/src/branch/master/zuul/model.py#L921
> 6. S2: the shard writer creates the first shard `0000000000`
> 7. S1: the shard writer creates the second shard `0000000001`
>
> The race condition seems to be introduced with
> Ib1e467b5adb907f93bab0de61da84d2efc22e2a7

That change updated the pipeline manager _postConfig method so that
it no longer acquires the pipeline lock when initializing the
pipeline state and change lists.  This greatly reduces potential
delays during reconfiguration, but has, perhaps predictably, led to
the race condition above.

In the commit message for that change, I noted that we might be able
to avoid even more work if we accept some caveats related to error
reporting.  Avoiding that work means performing no writes during
_postConfig, which addresses the root cause of the race condition
(steps 3-6 above.  Ed. note: there is no step 5).

From the commit message:

> We still need to attach new state and change list objects to
> the pipeline in _postConfig (since our pipeline object is new).
> We also should make sure that the objects exist in ZK before we
> leave that method, so that if a new pipeline is created, other
> schedulers will be able to load the (potentially still empty)
> objects from ZK.  As an alternative, we could avoid even this
> work in _postConfig, but then we would have to handle missing
> objects on refresh, and it would not be possible to tell if the
> object was missing due to it being new or due to an error.  To
> avoid masking errors, we keep the current expectation that we
> will create these objects in ZK on the initial reconfiguration.

The current change takes that alternative.  We no longer perform any
ZK write operations on the state and change list objects in
_postConfig.  Instead, inside of the refresh methods, we detect the
cases where they should be newly created and do so at that time.
This happens while holding the pipeline lock, so it is safe against
any simultaneous operation from other components.

There will be "ERROR" level log messages indicating that reading the
state from ZK has failed when these objects are first initialized.
To indicate that this is probably okay, they will now be immediately
followed by "WARNING" level messages explaining that.

Strictly speaking, this particular race should only occur for the
change list object, not the pipeline state, since the race condition
above requires a sharded object and of the two, only the change list
is sharded.  However, to keep the lifecycle of these two objects
matched (and to simplify _postConfig) the same treatment is applied
to both.

Note that change I7fa99cd83a857216321f8d946fd42abd9ec427a3 merged
after Ib1e467b and changed the behavior slightly, introducing the
old_state and old_list arguments.  Curiously, the old_list argument
is effectively unused, so it is removed entirely in this change.
Old_state still has a purpose and is retained.

Change-Id: I519348e7d5d74e675808e990920480fb6e1fb981
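
The sequence of events in the quoted description can be reproduced with a small in-memory model.  This is only an illustrative sketch, not Zuul code: `FakeShardedNode` and `interleaved_writes` are hypothetical names, and the model ignores any compression or encoding the real shard writer applies.  It assumes only what the description states, namely that readers combine all shards and expect a single JSON object.

```python
import json


class FakeShardedNode:
    """Hypothetical in-memory stand-in for a ZK node with ordered shards."""

    def __init__(self):
        self.shards = []

    def exists(self):
        return bool(self.shards)

    def delete_shards(self):
        self.shards.clear()

    def write_shard(self, payload):
        self.shards.append(payload)

    def load(self):
        # Readers concatenate every shard and parse the result as one object.
        return json.loads("".join(self.shards))


def interleaved_writes():
    node = FakeShardedNode()
    node.write_shard(json.dumps({"changes": ["old"]}))

    # Steps 1-2: S1 starts an update and deletes the old shards.
    node.delete_shards()
    # Steps 3-4: S2 sees no node and decides to create an empty change list.
    s2_creates = not node.exists()
    # Step 6: S2 writes the first shard.
    if s2_creates:
        node.write_shard(json.dumps({"changes": []}))
    # Step 7: S1 writes its data, which lands as the second shard.
    node.write_shard(json.dumps({"changes": ["new"]}))
    return node


node = interleaved_writes()
try:
    node.load()
    corrupted = False
except json.JSONDecodeError:
    # Two complete JSON documents back to back are not one JSON object.
    corrupted = True
print("corrupted:", corrupted)
```

If both writers held the same lock for the whole delete-and-write sequence, S2 could never observe the node in its deleted state, which is why moving creation into the locked refresh path removes the race.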
author: James E. Blair <jim@acmegating.com> 2023-02-01 16:13:32 -0800
committer: James E. Blair <jim@acmegating.com> 2023-02-10 15:03:08 -0800
commit: 98dcd51d90972b0a2ba6c6993300158a6d5e7b2d (patch)
tree: 4f7e4713a04ea9d9af532a0d2ac187ac7d3b20ad /zuul/manager/__init__.py
parent: c3334743f6424eee40eab4632b6b5c7654cc17f9 (diff)
download: zuul-98dcd51d90972b0a2ba6c6993300158a6d5e7b2d.tar.gz
1 files changed, 42 insertions, 15 deletions
diff --git a/zuul/manager/__init__.py b/zuul/manager/__init__.py
index 832be780a..e85d5124e 100644
--- a/zuul/manager/__init__.py
+++ b/zuul/manager/__init__.py
@@ -28,6 +28,8 @@ from zuul.model import (
 )
 from zuul.zk.change_cache import ChangeKey
 from zuul.zk.components import COMPONENT_REGISTRY
+from zuul.zk.exceptions import LockException
+from zuul.zk.locks import pipeline_lock
 
 from opentelemetry import trace
 
@@ -95,21 +97,46 @@ class PipelineManager(metaclass=ABCMeta):
     def _postConfig(self):
         layout = self.pipeline.tenant.layout
         self.buildChangeQueues(layout)
-        with self.sched.createZKContext(None, self.log) as ctx,\
-             self.currentContext(ctx):
-            # Make sure we have state and change list objects, and
-            # ensure that they exist in ZK.  We don't hold the
-            # pipeline lock, but if they don't exist, that means they
-            # are new, so no one else will either, so the write on
-            # create is okay.  If they do exist and we have an old
-            # object, we'll just reuse it.  If it does exist and we
-            # don't have an old object, we'll get a new empty one.
-            # Regardless, these will not automatically refresh now, so
-            # they will be out of date until they are refreshed later.
-            self.pipeline.state = PipelineState.create(
-                self.pipeline, layout.uuid, self.pipeline.state)
-            self.pipeline.change_list = PipelineChangeList.create(
-                self.pipeline)
+        # Make sure we have state and change list objects.  We
+        # don't actually ensure they exist in ZK here; these are
+        # just local objects until they are serialized the first
+        # time.  Since we don't hold the pipeline lock, we can't
+        # reliably perform any read or write operations; we just
+        # need to ensure we have in-memory objects to work with
+        # and they will be initialized or loaded on the next
+        # refresh.
+
+        # These will be out of date until they are refreshed later.
+        self.pipeline.state = PipelineState.create(
+            self.pipeline, layout.uuid, self.pipeline.state)
+        self.pipeline.change_list = PipelineChangeList.create(
+            self.pipeline)
+
+        # Now, try to acquire a non-blocking pipeline lock and refresh
+        # them for the side effect of initializing them if necessary.
+        # In the case of a new pipeline, no one else should have a
+        # lock anyway, and this helps us avoid emitting a whole bunch
+        # of errors elsewhere on startup when these objects don't
+        # exist.  If the pipeline already exists and we can't acquire
+        # the lock, that's fine, we're much less likely to encounter
+        # read errors elsewhere in that case anyway.
+        try:
+            with pipeline_lock(
+                    self.sched.zk_client, self.pipeline.tenant.name,
+                    self.pipeline.name, blocking=False) as lock,\
+                    self.sched.createZKContext(lock, self.log) as ctx,\
+                    self.currentContext(ctx):
+                if not self.pipeline.state.exists(ctx):
+                    # We only do this if the pipeline doesn't exist in
+                    # ZK because in that case, this process should be
+                    # fast since it's empty.  If it does exist,
+                    # refreshing it may be slow and since other actors
+                    # won't encounter errors due to its absence, we
+                    # would rather defer the work to later.
+                    self.pipeline.state.refresh(ctx)
+                    self.pipeline.change_list.refresh(ctx)
+        except LockException:
+            pass
 
     def buildChangeQueues(self, layout):
         self.log.debug("Building relative_priority queues")
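
The pattern in the new _postConfig (build local objects without touching the store, then initialize them only if a non-blocking lock attempt succeeds) can be sketched in isolation.  This is a simplified model under stated assumptions, not the Zuul implementation: `FakeStore`, `LazyObject`, and `post_config` are hypothetical names, and a `threading.Lock` stands in for the ZooKeeper pipeline lock.

```python
import threading


class FakeStore:
    """Hypothetical stand-in for ZK: a dict guarded by a single lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.data = {}


class LazyObject:
    """Local object that is only created in the store on refresh."""

    def __init__(self, store, key):
        self.store = store
        self.key = key
        self.value = None  # purely local until the first refresh

    def refresh(self):
        # The caller must hold store.lock, so creating the entry is safe.
        if self.key not in self.store.data:
            self.store.data[self.key] = {}
        self.value = self.store.data[self.key]


def post_config(store, key):
    # No store reads or writes happen while the lock is not held.
    obj = LazyObject(store, key)
    if not store.lock.acquire(blocking=False):
        # Someone else holds the lock; defer initialization to a later
        # refresh instead of waiting or writing unlocked.
        return obj, False
    try:
        if key not in store.data:
            obj.refresh()
    finally:
        store.lock.release()
    return obj, True


store = FakeStore()
obj, initialized = post_config(store, "check")
print(initialized, store.data)
```

When the lock is already held, `post_config` returns an uninitialized local object and leaves the store untouched, which mirrors the `except LockException: pass` branch in the diff.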