authorZuul <zuul@review.opendev.org>2021-11-30 03:48:05 +0000
committerGerrit Code Review <review@openstack.org>2021-11-30 03:48:05 +0000
commit09ca7e01d0d1c1d8ce47edfee821535770ee2a00 (patch)
tree2ed8c218b1e9f378ab39b09ef9f2746c3c57662a /doc
parent7650882d009855a858941996a279decbc0293225 (diff)
parent2836af9c4441b33678a2d6a509769a09e544d91f (diff)
downloadzuul-09ca7e01d0d1c1d8ce47edfee821535770ee2a00.tar.gz
Merge "Add SOS documentation and remove spec"
Diffstat (limited to 'doc')
-rw-r--r--doc/source/reference/developer/specs/index.rst1
-rw-r--r--doc/source/reference/developer/specs/scale-out-scheduler.rst341
-rw-r--r--doc/source/reference/developer/zookeeper.rst104
3 files changed, 103 insertions, 343 deletions
diff --git a/doc/source/reference/developer/specs/index.rst b/doc/source/reference/developer/specs/index.rst
index b61daf06f..d96df0c26 100644
--- a/doc/source/reference/developer/specs/index.rst
+++ b/doc/source/reference/developer/specs/index.rst
@@ -19,7 +19,6 @@ documentation instead.
tenant-scoped-admin-web-API
kubernetes-operator
circular-dependencies
- scale-out-scheduler
zuul-runner
enhanced-regional-executors
tenant-resource-quota
diff --git a/doc/source/reference/developer/specs/scale-out-scheduler.rst b/doc/source/reference/developer/specs/scale-out-scheduler.rst
deleted file mode 100644
index 4bb750cdb..000000000
--- a/doc/source/reference/developer/specs/scale-out-scheduler.rst
+++ /dev/null
@@ -1,341 +0,0 @@
-Scale out scheduler
-===================
-
-.. warning:: This is not authoritative documentation. These features
- are not currently available in Zuul. They may change significantly
- before final implementation, or may never be fully completed.
-
-Zuul has a microservices architecture designed with the goal of having no
-single point of failure. This has not yet been achieved for the
-zuul-scheduler component.
-
-Especially within large Zuul deployments with many long-running jobs, the
-cost of a scheduler crash can be quite high. Currently all in-flight jobs are
-lost in this case and need to be restarted. A scale-out scheduler approach
-can avoid this.
-
-The same problem holds true when updating the scheduler. Currently there is
-no way to upgrade the scheduler without downtime. While the pipeline state
-can be saved and re-enqueued, this still loses all in-flight jobs. Further,
-on a larger deployment the startup of the scheduler can easily take several
-minutes. Having the ability to do zero-downtime upgrades can make updates
-much easier.
-
-Further, having multiple schedulers can facilitate parallel processing of
-several tenants and help reduce global locks within installations with many
-tenants.
-
-In this document we outline an approach towards a Zuul system that is
-completely free of single points of failure. This will be a transition with
-multiple phases.
-
-
-Status quo
-----------
-
-Zuul is an event driven system with several event loops that interact with each
-other:
-
-* Driver event loop: Drivers like GitHub or Gerrit have their own event
-  loops. They perform preprocessing of the received events and add events
-  into the scheduler event loop.
-
-* Scheduler event loop: This event loop processes the pipelines and
- reconfigurations.
-
-All of these event loops currently run within the scheduler process without
-persisting their state. So the path to a scale-out scheduler mainly involves
-making all event loops scale-out capable.
-
-
-
-Target architecture
--------------------
-
-In addition to the event loops mentioned above we need an additional event queue
-per pipeline. This will make it easy to process several pipelines in parallel. A
-new driver event would first be processed in the driver event queue. This will
-add a new event into the scheduler event queue. The scheduler event queue then
-checks which pipeline may be interested in this event according to the tenant
-configuration and layout. Based on this the event is dispatched to all matching
-pipeline queues.
-
-As is the case today, different event types will have different priorities.
-As with node requests, this will be expressed with a prefix.
-
-The event queues will be stored in Zookeeper in the following paths:
-
-* ``/zuul/events/connection/<connection name>/<sequence>``: Event queue of a
- connection
-
-* ``/zuul/events/scheduler-global/<prio>-<sequence>``: Global event queue of
- scheduler
-
-* ``/zuul/events/tenant/<tenant name>/<pipeline>/<prio>-<sequence>``: Pipeline
- event queue
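
A minimal sketch of how such a sequenced event queue could be used with the
kazoo client is shown below; the connection string, node names and JSON
payload format are illustrative assumptions, not part of this spec.

.. code-block:: python

   import json

   from kazoo.client import KazooClient

   client = KazooClient(hosts="localhost:2181")  # assumed connection string
   client.start()

   QUEUE = "/zuul/events/connection/gerrit"  # example connection name

   def enqueue_event(event: dict) -> str:
       # ZooKeeper appends a monotonically increasing suffix to the node
       # name, giving us an ordered queue without extra coordination.
       return client.create(QUEUE + "/event-",
                            json.dumps(event).encode("utf-8"),
                            makepath=True, sequence=True)

   def drain_events():
       # Children are returned unordered; sorting by the sequence suffix
       # restores the enqueue order.
       for name in sorted(client.get_children(QUEUE)):
           path = QUEUE + "/" + name
           data, _stat = client.get(path)
           yield json.loads(data)
           client.delete(path)  # remove the event once it has been handled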
-
-In order to make reconfigurations efficient we also need to store the parsed
-branch config in Zookeeper. This makes it possible to create the current
-layout without the need to ask the mergers multiple times for the
-configuration. This can also be used by zuul-web to keep an up-to-date layout
-for API requests.
-
-We also need to store the pipeline state in Zookeeper. This will be similar to
-the status.json but also needs to contain the frozen jobs and their current
-state.
-
-Further, we need to replace Gearman with Zookeeper as the RPC mechanism to
-the executors. This makes it possible for different schedulers to continue
-pipeline execution smoothly. The jobs will be stored in
-``/zuul/jobs/<tenant>/<sequence>``.
-
-
-Driver event ingestion
-----------------------
-
-Currently the drivers immediately get events from Gerrit or GitHub, process
-them and forward the events to the scheduler event loop. Thus all events are
-currently lost during a downtime of the zuul-scheduler. In order to decouple
-this we can push the raw events into Zookeeper and pop them in the driver
-event loop.
-
-We will split the drivers into an event receiving and an event processing
-component. The event receiving component will store the events in a sequenced
-znode in the path ``/zuul/events/connection/<connection name>/<sequence>``.
-The event receiving part may or may not run within the scheduler context.
-The event processing part will be part of the scheduler context.
-
-There are three types of event receiving mechanisms in Zuul:
-
-* Active event gathering: The connection actively subscribes for events (Gerrit)
- or generates them itself (git, timer, zuul)
-
-* Passive event gathering: The events are sent to zuul from outside (Github
- webhooks)
-
-* Internal event generation: The events are generated within zuul itself and
- typically get injected directly into the scheduler event loop and thus don't
- need to be changed in this phase.
-
-Active and passive event gathering need to be handled slightly differently.
-
-Active event gathering
-~~~~~~~~~~~~~~~~~~~~~~
-
-This is mainly done by the Gerrit driver. We actively maintain a connection to
-the target and receive events. This means that if we have more than one instance
-we need to find a way to handle duplicated events. This type of event gathering
-can run within the scheduler process. Optionally if there is interest we can
-also make it possible to run this as a standalone process.
-
-We can utilize leader election to make sure there is exactly one instance
-receiving the events. This ensures that we don't need to handle duplicated
-events at all. A drawback is that there is a short gap between when the
-current leader stops and when the next leader has started event gathering.
-This could lead to a few missed events. But as this is the simplest approach,
-we can accept this in the initial version.
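
A sketch of the single-receiver approach using kazoo's leader election recipe
follows; the election path, identifier and the ``gather_events`` callback are
hypothetical.

.. code-block:: python

   from kazoo.client import KazooClient
   from kazoo.recipe.election import Election

   client = KazooClient(hosts="localhost:2181")
   client.start()

   def gather_events():
       # Placeholder for the driver's event loop, e.g. a Gerrit
       # stream-events listener.  Runs only while this instance is leader.
       ...

   election = Election(client, "/zuul/election/gerrit-events",
                       identifier="scheduler-1")
   # run() blocks until this instance is elected, then calls the function.
   # If the leader dies, its ephemeral election node disappears and another
   # instance takes over; the hand-over window is where events may be missed.
   election.run(gather_events)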
-
-If there is a need to guarantee that no event is missed during a leadership
-change, the above algorithm can be enhanced later with parallel gathering and
-deduplication strategies. As this is much more complicated, it is out of
-scope of this spec and left to later enhancements.
-
-
-Passive event gathering
-~~~~~~~~~~~~~~~~~~~~~~~
-
-In case of passive event gathering the events are typically sent to Zuul via
-webhooks. These types of events will be received in zuul-web, which stores
-them in Zookeeper. This type of event gathering is used for example by the
-GitHub and Gerrit drivers (other drivers, possibly implemented before this is
-realized, should be checked too). In this case we can have multiple instances
-but each event is still received only once. So we don't need to take special
-care of event deduplication or leader election - multiple instances behind a
-load balancer are safe to use and recommended for such passive event
-gathering.
-
-
-Store unparsed branch config in Zookeeper
------------------------------------------
-
-We need to store the global configuration in zookeeper. However, as zookeeper
-is not designed as a database for large amounts of data, we should store as
-little as possible in it. Thus we only store the per project-branch unparsed
-config in zookeeper. From this, every part of zuul, like the scheduler or
-zuul-web, can quickly recalculate the layout of each tenant and keep it up to
-date by watching for changes in the unparsed project-branch config. We will
-lock the complete global config with one lock and maintain a checkpoint
-version of it. This way each component can watch the config version number
-and react accordingly. Although we lock the complete global config we should
-still store the actual config in distinct nodes per project and branch. This
-is needed because of the 1MB limit per znode in zookeeper. It further makes
-it less expensive to cache the global config in each component as this cache
-will be updated incrementally.
-
-The configs will be stored in the path ``/zuul/config/<project>/<branch>`` per
-branch, segmented in ``/zuul/config/<project>/<branch>/<path-to-config>/<shard>``.
-The ``shard`` is a sequence number and will be used to store files larger
-than 1MB due to the limitation mentioned above.
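
A sketch of the write side of this sharding scheme is shown below; the shard
size and the helper function are illustrative, only the path layout comes
from the spec.

.. code-block:: python

   from kazoo.client import KazooClient

   SHARD_SIZE = 1024 * 1024 - 1024  # stay safely below the 1MB znode limit

   client = KazooClient(hosts="localhost:2181")
   client.start()

   def write_config(project: str, branch: str, path_to_config: str,
                    data: bytes):
       base = f"/zuul/config/{project}/{branch}/{path_to_config}"
       client.ensure_path(base)
       # Drop any old shards before writing the new content.
       for child in client.get_children(base):
           client.delete(f"{base}/{child}")
       # Write the config as a series of sequence znodes, each below 1MB.
       for offset in range(0, len(data), SHARD_SIZE):
           client.create(base + "/shard-",
                         data[offset:offset + SHARD_SIZE],
                         sequence=True)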
-
-
-Store pipeline and tenant state in Zookeeper
---------------------------------------------
-
-The pipeline state is similar to the current status.json. However the frozen
-jobs and their state are needed for seamless continuation of the pipeline
-execution on a different scheduler. Further, this can make it easy to
-generate the status.json directly in zuul-web by inspecting the data in
-Zookeeper.
-Buildsets that are enqueued in a pipeline will be stored in
-``/zuul/tenant/<tenant>/pipeline/<pipeline>/queue/<queue>/<buildset uuid>``.
-
-Each buildset will contain a child znode per job that holds a data structure
-with the frozen job as well as the current state. This will also contain a
-reference to the node request that was used for this job. When the node request
-is fulfilled the pipeline processor creates an execution-request which
-will be locked by an executor before processing the job. The buildset will
-contain a link to the execution request. The executor will accept the referenced
-node request, lock the nodes and run the job. If the job needs to be canceled
-the pipeline processor just pulls the execution-request. The executor will
-notice this, abort the job and return the nodes.
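
The exact layout of the per-job child znodes is not spelled out in the spec;
the following sketch shows one plausible shape, with hypothetical field
names.

.. code-block:: python

   import json
   import uuid

   from kazoo.client import KazooClient

   client = KazooClient(hosts="localhost:2181")
   client.start()

   def store_frozen_job(tenant, pipeline, queue, buildset_uuid,
                        job_name, frozen_job):
       path = (f"/zuul/tenant/{tenant}/pipeline/{pipeline}/queue/{queue}"
               f"/{buildset_uuid}/{job_name}")
       payload = json.dumps({
           "frozen_job": frozen_job,   # fully resolved job definition
           "state": "REQUESTED",       # current state of the build
           "node_request": None,       # filled in once nodes are requested
           "execution_request": None,  # link added when the job is dispatched
       }).encode("utf-8")
       client.create(path, payload, makepath=True)

   store_frozen_job("example-tenant", "check", "integrated",
                    str(uuid.uuid4()), "tox-py38",
                    {"nodeset": "ubuntu-focal"})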
-
-We also need to store tenant state like semaphores in Zookeeper. This will be
-stored in ``/zuul/tenant/<tenant>/semaphores/<name>``.
-
-
-Mandatory SQL connection
-------------------------
-
-Currently the times database is stored on the local filesystem of the scheduler.
-We already have an optional SQL database that holds the needed information. We
-need to be able to rely on this information so we'll make the SQL db mandatory.
-
-Zuul currently supports multiple database connections. At the moment the SQL
-reporters can be configured on pipelines. This should be changed to global
-and tenant-based SQL reporters. When we make the database connection
-mandatory, zuul needs to know which one is the primary database connection.
-If there is only one configured connection it will automatically be the
-primary. If more than one connection is configured, one of them will need to
-be configured as the primary database. Reporters will use the primary
-database in any case.
-
-The primary database can be used to query the last 10 successful build
-times, which then serve as the times database.
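
As an illustration, the times lookup could be a plain SQL query against the
primary connection; the table and column names (``zuul_build``, ``job_name``,
``result``, ``start_time``, ``end_time``) are assumptions based on the
current SQL reporter schema and may differ.

.. code-block:: python

   from sqlalchemy import create_engine, text

   engine = create_engine("postgresql://zuul:secret@db/zuul")  # example DSN

   def estimate_job_duration(job_name: str) -> float:
       query = text("""
           SELECT start_time, end_time
             FROM zuul_build
            WHERE job_name = :job_name AND result = 'SUCCESS'
            ORDER BY end_time DESC
            LIMIT 10
       """)
       with engine.connect() as conn:
           rows = conn.execute(query, {"job_name": job_name}).fetchall()
       durations = [(end - start).total_seconds() for start, end in rows]
       # Fall back to zero (unknown) when the job has no successful history.
       return sum(durations) / len(durations) if durations else 0.0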
-
-
-Executor via Zookeeper
-----------------------
-
-In order to prepare for distributed pipeline execution we need to use Zookeeper
-for scheduling jobs on the executors. This is needed so that any scheduler can
-take over a pipeline execution without having to restart jobs.
-
-As described above the executor will look for builds. These will be stored in
-``/zuul/builds/<sequence>``. The builds will contain all information that is
-needed to run the job. The builds are stored outside of the pipeline itself
-for two reasons. First, the executors should not need to do a deep search
-when looking for new builds to run. Second, this makes it clear that they are
-not subject to the pipeline lock but have their own locks. However the
-buildsets in the pipeline will contain a reference to their builds.
-
-During the lifecycle of a build the executor can update the state on its own,
-but should enqueue result events to the corresponding pipeline event queue as
-pipeline processing relies on build started, paused and finished events. The
-lifecycle will be as follows.
-
-* Build gets created in state REQUESTED
-* Executor locks it and sets the state to RUNNING. It will enqueue a build
- started event to the pipeline event queue.
-* If requested the executor sets the state to PAUSED after the run phase and
- enqueues a build paused event to the pipeline event queue
-* If the build is PAUSED, the pipeline processor can request a resume by
-  adding an empty ``resume`` child node to the build. This way we don't have
-  to update a locked znode while ignoring the lock. The executor will then
-  change the state back to RUNNING and continue the execution.
-* When the build is finished the executor changes the state to COMPLETED,
-  unlocks the build and enqueues a build finished event to the pipeline.
-* If a build should be canceled the pipeline processor adds a ``cancel`` child
- znode that will be recognized by the executor which will act accordingly.
-
-An executor can crash. In this case it will lose the lock. We need to be able
-to recover from this and emit the right event to the pipeline. Such a lost
-build can be detected if it is unlocked while in a state other than REQUESTED
-or COMPLETED. Any executor that sees such a build while looking for new
-builds to execute will lock it and mark it as COMPLETED and failed. It will
-then emit a build completed event so that the pipeline event processor can
-react to this and reschedule the build. There is no special handling needed
-to return the nodes, as in this case the failing executor will also lose its
-lock on the nodes, so they will be deleted or recycled by nodepool
-automatically.
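
A condensed sketch of the executor side of this lifecycle, including the
lost-build recovery path, is shown below; the helper functions and lock
identifiers are hypothetical placeholders.

.. code-block:: python

   import json

   from kazoo.client import KazooClient
   from kazoo.recipe.lock import Lock

   client = KazooClient(hosts="localhost:2181")
   client.start()

   BUILD_ROOT = "/zuul/builds"

   def enqueue_result_event(build, kind):
       # Placeholder: append an event to the pipeline event queue so the
       # scheduler can react to build state changes.
       print(f"build {build.get('uuid')} {kind}")

   def run_job(path, build):
       # Placeholder for actually executing the job.
       pass

   def set_state(path, build, state, result=None):
       build["state"] = state
       if result:
           build["result"] = result
       client.set(path, json.dumps(build).encode("utf-8"))

   def scan_builds():
       for name in sorted(client.get_children(BUILD_ROOT)):
           path = f"{BUILD_ROOT}/{name}"
           data, _stat = client.get(path)
           build = json.loads(data)
           lock = Lock(client, path + "/lock", identifier="executor-1")
           if build["state"] == "REQUESTED" and lock.acquire(blocking=False):
               set_state(path, build, "RUNNING")
               enqueue_result_event(build, "started")
               run_job(path, build)
               set_state(path, build, "COMPLETED")
               lock.release()
               enqueue_result_event(build, "finished")
           elif (build["state"] not in ("REQUESTED", "COMPLETED")
                 and lock.acquire(blocking=False)):
               # The previous executor crashed and lost its lock: mark the
               # build failed so the pipeline can reschedule it.
               set_state(path, build, "COMPLETED", result="FAILURE")
               lock.release()
               enqueue_result_event(build, "finished")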
-
-
-Parallelize pipeline processing
--------------------------------
-
-Once we have the above data in place we can create the per-pipeline event
-queues and the global scheduler event queue in Zookeeper. The global
-scheduler event queue will receive the trigger, management and result events
-that are not tenant specific. The purpose of this queue is to take these
-events and dispatch them to the pipeline queues of the tenants as
-appropriate. This event queue can easily be processed using a locking
-mechanism.
-
-We also have tenant-global events like tenant reconfigurations. These need
-exclusive access to all pipelines in the tenant. So we need a two-layer
-locking approach during pipeline processing. First we need an RW lock at the
-tenant level. This allows it to be locked by all pipeline processors at the
-same time (call them readers, as they don't modify the global tenant state).
-Management events (e.g. tenant reconfiguration) however will take this lock
-exclusively (call them writers, as they modify the global tenant state).
-
-Each pipeline processor will loop over all pipelines that have outstanding
-events. Before processing an event it will first try to lock the tenant. If
-it fails it will continue with pipelines in the next tenant having
-outstanding events. If it gets the tenant lock it will try to lock the
-pipeline. If it fails it will continue with the next pipeline. If it succeeds
-it will process all outstanding events of that pipeline. To prevent
-starvation of tenants we can define a max processing time after which the
-pipeline processor will switch to the next tenant or pipeline even if there
-are outstanding events.
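
A sketch of this two-layer locking with kazoo's shared lock recipes follows;
the lock paths and helper names are illustrative assumptions.

.. code-block:: python

   from kazoo.client import KazooClient
   from kazoo.recipe.lock import Lock, ReadLock, WriteLock

   client = KazooClient(hosts="localhost:2181")
   client.start()

   def drain_pipeline_events(tenant, pipeline):
       pass  # placeholder: process the pipeline's outstanding events

   def process_outstanding_events(tenants):
       for tenant, pipelines in tenants.items():
           # Readers: pipeline processors share the tenant lock.
           tenant_lock = ReadLock(client, f"/zuul/locks/tenant/{tenant}")
           if not tenant_lock.acquire(blocking=False):
               continue  # a writer (e.g. a reconfiguration) holds the lock
           try:
               for pipeline in pipelines:
                   pipeline_lock = Lock(
                       client, f"/zuul/locks/pipeline/{tenant}/{pipeline}")
                   if not pipeline_lock.acquire(blocking=False):
                       continue  # another scheduler is processing it
                   try:
                       drain_pipeline_events(tenant, pipeline)
                   finally:
                       pipeline_lock.release()
           finally:
               tenant_lock.release()

   def reconfigure_tenant(tenant):
       # Writers: management events take the tenant lock exclusively.
       with WriteLock(client, f"/zuul/locks/tenant/{tenant}"):
           ...  # placeholder for the actual reconfiguration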
-
-In order to reduce stalls when doing reconfigurations or tenant
-reconfigurations we can run the pipeline processor in one thread and
-reconfigurations in one or more separate threads. This way a tenant that is
-running a longer reconfiguration won't block other tenants.
-
-
-Zuul-web changes
-----------------
-
-Now zuul-web can be changed to directly use the data in Zookeeper instead of
-asking the scheduler via Gearman.
-
-
-Security considerations
------------------------
-
-When switching the executor job queue to Zookeeper we need to take precautions
-because this will also contain decrypted secrets. In order to secure this
-communication channel we need to make sure that we use authenticated and
-encrypted connections to zookeeper.
-
-* There is already a change that adds Zookeeper auth:
- https://review.openstack.org/619156
-* Kazoo SSL support just has landed: https://github.com/python-zk/kazoo/pull/513
-
-Further we will encrypt every secret that is stored in zookeeper using a
-symmetric cipher with a shared key that is known to all zuul services but not
-zookeeper. This way we can avoid dumping decrypted secrets into the transaction
-log of zookeeper.
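
A minimal sketch of this wrapping, using Fernet as the symmetric cipher (an
illustrative choice, not mandated by the spec), could look like this:

.. code-block:: python

   import json

   from cryptography.fernet import Fernet
   from kazoo.client import KazooClient

   # In a real deployment the shared key would come from the Zuul
   # configuration of every service, and never be given to ZooKeeper.
   shared_key = Fernet.generate_key()
   cipher = Fernet(shared_key)

   client = KazooClient(hosts="localhost:2181")
   client.start()

   def store_secret(path: str, secret: dict):
       token = cipher.encrypt(json.dumps(secret).encode("utf-8"))
       # Only the ciphertext ever reaches ZooKeeper's transaction log.
       client.create(path, token, makepath=True)

   def load_secret(path: str) -> dict:
       token, _stat = client.get(path)
       return json.loads(cipher.decrypt(token))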
-
-
-Roadmap
--------
-
-In order to manage the workload and minimize rebasing efforts, we suggest
-breaking the above into smaller changes. Each such change should then be
-implemented separately.
-
-#. Mandatory SQL connection, definition of primary SQL connection and add SQL
- reporters for tenants
-#. Storing parsed branch config in zookeeper
-#. Storing raw events in zookeeper using drivers or a separate service
-#. Event queue per pipeline
-#. Storing pipeline state and tenant state in zookeeper
-#. Adapt drivers to pop events from zookeeper (split drivers into event
- receiving and event processing components)
-#. Parallel pipeline processing
-#. Switch to using zookeeper instead of gearman
diff --git a/doc/source/reference/developer/zookeeper.rst b/doc/source/reference/developer/zookeeper.rst
index 89b96af00..b0aae0b20 100644
--- a/doc/source/reference/developer/zookeeper.rst
+++ b/doc/source/reference/developer/zookeeper.rst
@@ -1,5 +1,107 @@
+ZooKeeper
+=========
+
+Overview
+--------
+
+Zuul has a microservices architecture designed with the goal of having no
+single point of failure.
+
+Zuul is an event driven system with several event loops that interact
+with each other:
+
+* Driver event loop: Drivers like GitHub or Gerrit have their own event loops.
+ They perform preprocessing of the received events and add events into the
+ scheduler event loop.
+
+* Scheduler event loop: This event loop processes the pipelines and
+ reconfigurations.
+
+Each of these event loops persists data in ZooKeeper so that other
+components can share or resume processing.
+
+A key aspect of scalability is maintaining an event queue per
+pipeline. This makes it easy to process several pipelines in
+parallel. A new driver event is first processed in the driver event
+queue. This adds a new event into the scheduler event queue. The
+scheduler event queue then checks which pipeline may be interested in
+this event according to the tenant configuration and layout. Based on
+this the event is dispatched to all matching pipeline queues.
+
+In order to make reconfigurations efficient we store the parsed branch
+config in Zookeeper. This makes it possible to create the current
+layout without the need to ask the mergers multiple times for the
+configuration. This is used by zuul-web to keep an up-to-date layout
+for API requests.
+
+We store the pipeline state in Zookeeper. This contains the complete
+information about queue items, jobs and builds, as well as a separate
+abbreviated state for quick access by zuul-web for the status page.
+
+Driver Event Ingestion
+----------------------
+
+There are three types of event receiving mechanisms in Zuul:
+
+* Active event gathering: The connection actively listens to events (Gerrit)
+ or generates them itself (git, timer, zuul)
+
+* Passive event gathering: The events are sent to Zuul from outside (GitHub
+ webhooks)
+
+* Internal event generation: The events are generated within Zuul itself and
+ typically get injected directly into the scheduler event loop.
+
+The active event gathering needs to be handled differently from
+passive event gathering.
+
+Active Event Gathering
+~~~~~~~~~~~~~~~~~~~~~~
+
+This is mainly done by the Gerrit driver. We actively maintain a
+connection to the target and receive events. We utilize a leader
+election to make sure there is exactly one instance receiving the
+events.
+
+Passive Event Gathering
+~~~~~~~~~~~~~~~~~~~~~~~
+
+In case of passive event gathering the events are sent to Zuul
+typically via webhooks. These types of events are received in zuul-web
+which then stores them in Zookeeper. This type of event gathering is
+used by GitHub and other drivers. In this case we can have multiple
+instances but still receive only one event so that we don't need to
+take special care of event deduplication or leader election. Multiple
+instances behind a load balancer are safe to use and recommended for
+such passive event gathering.
+
+Configuration Storage
+---------------------
+
+Zookeeper is not designed as a database for large amounts of data,
+so we should store as little as possible in zookeeper. Thus we only
+store the per project-branch unparsed config in zookeeper. From this,
+every part of Zuul, like the scheduler or zuul-web, can quickly
+recalculate the layout of each tenant and keep it up to date by
+watching for changes in the unparsed project-branch-config.
+
+We store the actual config sharded in multiple nodes, and those nodes
+are stored under per project and branch znodes. This is needed because
+of the 1MB limit per znode in zookeeper. It further makes it less
+expensive to cache the global config in each component as this cache
+is updated incrementally.
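
A sketch of reading such a sharded znode back, assuming kazoo and a
hypothetical path, is shown below; the shards only need to be concatenated
in sequence order.

.. code-block:: python

   from kazoo.client import KazooClient

   client = KazooClient(hosts="localhost:2181")
   client.start()

   def read_sharded(path: str) -> bytes:
       # Each shard is a sequence znode; sorting by name restores the
       # original order so the shards can simply be concatenated.
       shards = sorted(client.get_children(path))
       return b"".join(client.get(f"{path}/{name}")[0] for name in shards)

   # Hypothetical example path for one project-branch config file:
   config_yaml = read_sharded("/zuul/config/example-project/master/zuul.yaml")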
+
+Executor and Merger Queues
+--------------------------
+
+The executors and mergers each have an execution queue (and in the
+case of executors, optionally per-zone queues). This makes it easy
+for executors and mergers to simply pick the next job to run without
+needing to inspect the entire pipeline state. The scheduler is
+responsible for submitting job requests as the state changes.
+
Zookeeper Map
-=============
+-------------
This is a reference for object layout in Zookeeper.