Nodepool in Zuul
================

.. warning:: This is not authoritative documentation.  These features
   are not currently available in Zuul.  They may change significantly
   before final implementation, or may never be fully completed.

The following specification describes a plan to move Nodepool's
functionality into Zuul and end development of Nodepool as a separate
application.  This will allow for more node and image related features
as well as simpler maintenance and deployment.

Introduction
------------

Nodepool exists as a distinct application from Zuul largely due to
historical circumstances: it was originally a process for launching
nodes, attaching them to Jenkins, detaching them from Jenkins and
deleting them.  Once Zuul grew its own execution engine, Nodepool
could have been adopted into Zuul at that point, but the existing
loose API meant it was easy to maintain them separately and combining
them wasn't particularly advantageous.

However, now we find ourselves with a very robust framework in Zuul
for dealing with ZooKeeper, multiple components, web services and REST
APIs.  All of these are lagging behind in Nodepool, and it is time to
address that one way or another.  We could of course upgrade
Nodepool's infrastructure to match Zuul's, or even separate out these
frameworks into third-party libraries.  However, there are other
reasons to consider tighter coupling between Zuul and Nodepool, and
these tilt the scales in favor of moving Nodepool functionality into
Zuul.

Designing Nodepool as part of Zuul would allow for more features
related to Zuul's multi-tenancy.  Zuul is quite good at
fault-tolerance as well as scaling, so designing Nodepool around that
could allow for better cooperation between node launchers.  Finally,
as part of Zuul, Nodepool's image lifecycle can be more easily
integrated with Zuul-based workflow.

There are two Nodepool components: nodepool-builder and
nodepool-launcher.  We will address the functionality of each in the
following sections on Image Management and Node Management.

This spec contemplates a new Zuul component to handle image and node
management: zuul-launcher.  Much of the Nodepool configuration will
become Zuul configuration as well.  That is detailed in its own
section, but for now, it's enough to know that the Zuul system as a
whole will know what images and node labels are present in the
configuration.

Image Management
----------------

Part of nodepool-builder's functionality is important to have as a
long-running daemon, and part of what it does would make more sense as
a Zuul job.  By moving the actual image build into a Zuul job, we can
make the activity more visible to users of the system.  It will be
easier for users to test changes to image builds (inasmuch as they can
propose a change and a check job can run on that change to see if the
image builds successfully).  Build history and logs will be visible in
the usual way in the Zuul web interface.

A frequently requested feature is the ability to verify images before
putting them into service.  This is not practical with the current
implementation of Nodepool because of the loose coupling with Zuul.
However, once we are able to include Zuul jobs in the workflow of
image builds, it is easier to incorporate Zuul jobs to validate those
images as well.  This spec includes a mechanism for that.

The parts of nodepool-builder that make sense as a long-running
daemon are the parts dealing with image lifecycles.  Uploading builds
to cloud providers, keeping track of image builds and uploads,
deciding when those images should enter or leave service, and deleting
them are all better done with state management and long-running
processes (we should know -- early versions of Nodepool attempted to
do all of that with Jenkins jobs with limited success).

The sections below describe how we will implement image management in
Zuul.

First, a reminder that using custom images is optional with Zuul.
Many Zuul systems will be able to operate using only stock cloud
provider images.  One of the strengths of nodepool-builder is that it
can build an image for Zuul without relying on any particular cloud
provider images.  A Zuul system whose operator wants to use custom
images will need to bootstrap that process, and under the proposed
system where images are built in Zuul jobs, that would need to be done
using a stock cloud image.  In other words, to bootstrap a system such
as OpenDev from scratch, the operators would need to use a stock cloud
image to run the job that builds the custom image.  Once a custom
image is available, further image builds could be run on either the
stock cloud image or the custom image.  That decision is left to the
operator and involves consideration of fault tolerance and disaster
recovery scenarios.

To build a custom image, an operator will define a fairly typical Zuul
job for each image they would like to produce.  For example, a system
may have one job to build a debian-stable image, a second job for
debian-unstable, a third for ubuntu-focal, and a fourth for
ubuntu-jammy.  Zuul's job inheritance system could be very useful here
to deal with many variations of a similar process.

Currently nodepool-builder will build an image under three
circumstances: 1) the image (or the image in a particular format) is
missing; 2) a user has directly requested a build; 3) on an automatic
interval (typically daily).  To map this into Zuul, we will use Zuul's
existing pipeline functionality, but we will add a new trigger for
case #1.  Case #2 can be handled by a manual Zuul enqueue command, and
case #3 by a periodic pipeline trigger.

Since Zuul knows what images are configured and what their current
states are, it will be able to emit trigger events when it detects
that a new image (or image format) has been added to its
configuration.  In these cases, the `zuul` driver in Zuul will enqueue
an `image-build` trigger event on startup or reconfiguration for every
missing image.  The event will include the image name.  Pipelines will
be configured to trigger on `image-build` events as well as on a timer
trigger.

Jobs will include an extra attribute to indicate that they build a
particular image.  This serves two purposes: first, in the case of an
`image-build` trigger event, it will act as a matcher so that only
jobs matching the image that needs building are run.  Second, it will
allow Zuul to determine which formats are needed for that image (based
on which providers are configured to use it) and include that
information as job data.

The job will be responsible for building the image and uploading the
result to some storage system.  The URLs for each image format built
should be returned to Zuul as artifacts.
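
As a sketch of how a job might do this (assuming the final playbook
uses the existing ``zuul_return`` artifact mechanism; the storage URL
below is hypothetical), the build job could return one artifact per
image format it produced:

.. code-block:: yaml

   - hosts: localhost
     tasks:
       # Report the uploaded files back to Zuul as artifacts so that the
       # reporter described below can find them in the result data.
       - name: Return image artifacts to Zuul
         zuul_return:
           data:
             zuul:
               artifacts:
                 - name: raw image
                   url: https://storage.example.com/new_image.raw
                   metadata:
                     type: zuul_image
                     image_name: debian-unstable
                     format: raw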

Finally, the `zuul` driver reporter will accept parameters which will
tell it to search the result data for these artifact URLs and update
the internal image state accordingly.

An example configuration for a simple single-stage image build:

.. code-block:: yaml

   - pipeline:
       name: image
       trigger:
         zuul:
           events:
             - image-build
         timer:
           time: 0 0 * * *
       success:
         zuul:
           image-built: true
           image-validated: true

   - job:
       name: build-debian-unstable-image
       image-build-name: debian-unstable

This job would run whenever Zuul determines it needs a new
debian-unstable image or daily at midnight.  Once the job completes,
because of the ``image-built: true`` report, it will look for artifact
data like this:

.. code-block:: yaml

   artifacts:
     - name: raw image
       url: https://storage.example.com/new_image.raw
       metadata:
         type: zuul_image
         image_name: debian-unstable
         format: raw
     - name: qcow2 image
       url: https://storage.example.com/new_image.qcow2
       metadata:
         type: zuul_image
         image_name: debian-unstable
         format: qcow2

Zuul will update internal records in ZooKeeper for the image to record
the storage URLs.  The zuul-launcher process will then start
background processes to download the images from the storage system
and upload them to the configured providers (much as nodepool-builder
does now with files on disk).  As a special case, it may detect that
the image files are stored in a location that a provider can access
directly for import and may be able to import directly from the
storage location rather than downloading locally first.

To handle image validation, a flag will be stored for each image
upload indicating whether it has been validated.  The example above
specifies ``image-validated: true`` and therefore Zuul will put the
image into service as soon as all image uploads are complete.
However, if it were false, then Zuul would emit an `image-validate`
event after each upload is complete.  A second pipeline can be
configured to perform image validation.  It can run any number of
jobs, and since Zuul has complete knowledge of image states, it will
supply nodes using the new image upload (which is not yet in service
for normal jobs).  An example of this might look like:

.. code-block:: yaml

   - pipeline:
       name: image-validate
       trigger:
         zuul:
           events:
             - image-validate
       success:
         zuul:
           image-validated: true

   - job:
       name: validate-debian-unstable-image
       image-build-name: debian-unstable
       nodeset:
         nodes:
           - name: node
             label: debian

The label should specify the same image that is being validated.  Its
node request will be made with extra specifications so that it is
fulfilled with a node built from the image under test.  This process
may repeat for each of the providers using that image (normal pipeline
queue deduplication rules may need a special case to allow this).
Once the validation jobs pass, the entry in ZooKeeper will be updated
and the image will go into regular service.

A more specific process definition follows:

After a buildset reports with ``image-built: true``, Zuul will scan
result data and, for each artifact it finds, create an entry in
ZooKeeper at `/zuul/images/<image_name>/<sequence>`.  Zuul will know
not to emit any more `image-build` events for that image at this
point.
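
The precise schema of these records is an implementation detail not
fixed by this spec; purely as an illustration, the record for the
debian-unstable build above might contain something like the following
(the field names are hypothetical):

.. code-block:: yaml

   # Illustrative contents of /zuul/images/debian-unstable/0000000001
   state: ready   # a lifecycle state such as ready or deleting
   artifacts:
     - url: https://storage.example.com/new_image.raw
       format: raw
     - url: https://storage.example.com/new_image.qcow2
       format: qcow2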

For every provider using that image, Zuul will create an entry in
ZooKeeper at
`/zuul/image-uploads/<image_name>/<image_number>/provider/<provider_name>`.
It will set the remote image ID to null and the `image-validated` flag
to whatever was specified in the reporter.

Whenever zuul-launcher observes a new `image-upload` record without an
ID, it will:

* Lock the whole image
* Lock each upload it can handle
* Unlock the image while retaining the upload locks
* Download the artifact (if needed) and upload the image to the provider
* If the upload requires validation, enqueue an `image-validate` zuul
  driver trigger event
* Unlock the upload

The locking sequence is arranged so that a single launcher can perform
multiple uploads from a single artifact download if it has the
opportunity.

Once more than two builds of an image are in service, the oldest is
deleted.  The image ZooKeeper record is set to the `deleting` state.
Zuul-launcher will delete the uploads from the providers.  The `zuul`
driver emits an `image-delete` event with item data for the image
artifact.  This will trigger an image-delete job that can delete the
artifact from the cloud storage.

All of these pipeline definitions should typically be in a single
tenant (but need not be), but the images they build are potentially
available to each tenant that includes the image definition
configuration object (see the Configuration section below).  Any repo
in a tenant with an image build pipeline will be able to cause images
to be built and uploaded to providers.

Snapshot Images
~~~~~~~~~~~~~~~

Nodepool does not currently support snapshot images, but the spec for
the current version of Nodepool does contemplate the possibility of a
snapshot based nodepool-builder process.  Likewise, this spec does not
require us to support snapshot image builds, but in case we want to
add support in the future, we should have a plan for it.

The image build job in Zuul could, instead of running
diskimage-builder, act on the remote node to prepare it for a
snapshot.  A special job attribute could indicate that it is a
snapshot image job, and instead of having the zuul-launcher component
delete the node at the end of the job, it could snapshot the node and
record that information in ZooKeeper.  Unlike an image-build job, an
image-snapshot job would need to run in each provider (similar to how
it is proposed that an image-validate job will run in each provider).
An image-delete job would not be required.


Node Management
---------------

The techniques we have developed for cooperative processing in Zuul
can be applied to the node lifecycle.  This is a good time to make a
significant change to the nodepool protocol.  We can achieve several
long-standing goals:

* Scaling and fault-tolerance: rather than having a 1:N relationship
  of provider:nodepool-launcher, we can have multiple zuul-launcher
  processes, each of which is capable of handling any number of
  providers.

* More intentional request fulfillment: currently almost no
  intelligence goes into selecting which provider will fulfill a given
  node request; by assigning providers intentionally, we can more
  efficiently utilize providers.

* Fulfilling node requests from multiple providers: by designing
  zuul-launcher for cooperative work, we can have nodesets that
  request nodes which are fulfilled by different providers.  Generally
  we should favor the same provider for a set of nodes (since they may
  need to communicate over a LAN), but if that is not feasible,
  allowing multiple providers to fulfill a request will permit
  nodesets with diverse node types (e.g., VM + static, or VM +
  container); see the example below.
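
For instance, a nodeset such as the following sketch (using labels
that appear in the configuration examples later in this document)
could have its nodes fulfilled by a cloud provider and a static node
provider working together:

.. code-block:: yaml

   - nodeset:
       name: mixed-example
       nodes:
         # Could be fulfilled by a cloud provider
         - name: controller
           label: ubuntu
         # Could be fulfilled by a static node provider
         - name: worker
           label: big-static-node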

Each zuul-launcher process will execute a number of processing loops
in series: first a global request processing loop, and then a
processing loop for each provider.  Each one will involve obtaining a
ZooKeeper lock so that only one zuul-launcher process will perform
each function at a time.

Zuul-launcher will need to know about every connection in the system
so that it may have a full copy of the configuration, but operators
may wish to localize launchers to specific clouds.  To support this,
zuul-launcher will take an optional command-line argument to indicate
on which connections it should operate.

Currently a node request as a whole may be declined by providers.  We
will make that more granular and store information about each node in
the request (in other words, individual nodes may be declined by
providers).

All drivers for providers should implement the state machine
interface.  Any state machine information currently stored in memory
in nodepool-launcher will need to move to ZooKeeper so that other
launchers can resume state machine processing.

The individual provider loop will:

* Lock a provider in ZooKeeper (`/zuul/provider/<name>`)
* Iterate over every node assigned to that provider in a `building` state

  * Drive the state machine
  * If successful, update the request
  * If it failed, determine whether the failure is temporary or
    permanent and update the request accordingly
  * If quota is available, unpause the provider (if paused)

The global queue process will:

* Lock the global queue
* Iterate over every pending node request, and every node within that request

  * If all providers have failed the request, clear all temporary failures
  * If all providers have permanently failed the request, return an error
  * Identify providers capable of fulfilling the request
  * Assign nodes to any provider with sufficient quota
  * If no provider has sufficient quota, assign the node to the first
    (highest priority) provider that can fulfill it later and pause
    that provider

Configuration
-------------

The configuration currently handled by Nodepool will be refactored and
added to Zuul's configuration syntax.  It will be loaded directly from
git repos like most Zuul configuration, however it will be
non-speculative (like pipelines and semaphores -- changes must merge
before they take effect).

Information about connecting to a cloud will be added to ``zuul.conf``
as a ``connection`` entry.  The rate limit setting will be moved to
the connection configuration.  Providers will then reference these
connections by name.

Because providers and images reference global (i.e., outside tenant
scope) concepts, ZooKeeper paths for data related to those should
include the canonical name of the repo where these objects are
defined.  For example, a `debian-unstable` image in the
`opendev/images` repo should be stored at
``/zuul/zuul-images/opendev.org%2fopendev%2fimages/``.  This avoids
collisions if different tenants contain different image objects with
the same name.

The actual Zuul config objects will be tenant scoped.  Image
definitions which should be available to a tenant should be included
in that tenant's config.  Again using the OpenDev example, the
hypothetical `opendev/images` repository should be included in every
OpenDev tenant so that all of those images are available.
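
As a sketch, the main tenant configuration might simply include such a
repository in each tenant.  The connection and project names below are
hypothetical, and whether an image repository belongs in
``config-projects`` or ``untrusted-projects`` is left to the operator:

.. code-block:: yaml

   - tenant:
       name: example-tenant
       source:
         opendev:
           untrusted-projects:
             # Image, label, and provider definitions in this repo
             # become available to this tenant.
             - opendev/images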

Within a tenant, image names must be unique (otherwise it is a tenant
configuration error, similar to a job name collision).

The diskimage-builder related configuration items will no longer be
necessary since they will be encoded in Zuul jobs.  This will reduce
the complexity of the configuration significantly.

The provider configuration will change as we take the opportunity to
make it more "Zuul-like".  Instead of a top-level dictionary, we will
use lists.  We will standardize on attributes used across drivers
where possible, as well as attributes which may be located at
different levels of the configuration.

The goals of this reorganization are:

* Allow projects to manage their own image lifecycle (if permitted by
  site administrators).
* Manage access control to labels, images and flavors via standard
  Zuul mechanisms (i.e., whether an item appears within a tenant).
* Reduce repetition and boilerplate for systems with many clouds,
  labels, or images.

The new configuration objects are:

Image
  This represents any kind of image (a Zuul image built by a job as
  described above, or a cloud image).  By using one object to
  represent both, we open the possibility of having a label in one
  provider use a cloud image and in another provider use a Zuul image
  (because the label will reference the image by short-name, which may
  resolve to a different image object in different tenants).  A given
  image object will specify what type it is, and any relevant
  information about it (such as the username to use, etc.).

Flavor
  This is a new abstraction layer to reference instance types across
  different cloud providers.  Much like labels today, these probably
  won't have much information associated with them other than to
  reserve a name for other objects to reference.  For example, a site
  could define a `small` and a `large` flavor.  These would later be
  mapped to specific instance types on clouds.

Label
  Unlike the current Nodepool ``label`` definitions, these labels will
  also specify the image and flavor to use.  These reference the two
  objects above, which means that labels themselves contain the
  high-level definition of what will be provided (e.g., a `large
  ubuntu` node) while the specific mapping of what `large` and
  `ubuntu` mean is left to the more specific configuration levels.

Section
  This looks a lot like the current ``provider`` configuration in
  Nodepool (but also a little bit like a ``pool``).  Several parts of
  the Nodepool configuration (such as separating out availability
  zones from providers into pools) were added as an afterthought, and
  we can take the opportunity to address that here.

  A ``section`` is part of a cloud.  It might be a region (if a cloud
  has regions).  It might be one or more availability zones within a
  region.  A lot of the specifics about images, flavors, subnets,
  etc., will be specified here.  Because a cloud may have many
  sections, we will implement inheritance among sections.

Provider
  This is mostly a mapping of labels to sections and is similar to a
  provider pool in the current Nodepool configuration.
  It exists as a
  separate object so that site administrators can restrict ``section``
  definitions to central repos and allow tenant administrators to
  control their own images and labels by allowing certain projects to
  define providers.

  It mostly consists of a list of labels, but may also include images.

When launching a node, relevant attributes may come from several
sources (the image, flavor, label, section, or provider).  Not all
attributes make sense in all locations, but where we can support them
in multiple locations, the order of application (later items override
earlier ones) will be:

* ``image`` stanza
* ``flavor`` stanza
* ``label`` stanza
* ``section`` stanza (top level)
* ``image`` within ``section``
* ``flavor`` within ``section``
* ``provider`` stanza (top level)
* ``label`` within ``provider``

This reflects that the configuration is built upwards from general and
simple objects toward more specific ones: image, flavor, label,
section, provider.  Generally speaking, inherited scalar values will
override, dicts will merge, and lists will concatenate.

An example configuration follows.  First, some configuration which may
appear in a central project and be shared among multiple tenants:

.. code-block:: yaml

   # Images, flavors, and labels are the building blocks of the
   # configuration.

   - image:
       name: centos-7
       type: zuul
       # Any other image-related info such as:
       # username: ...
       # python-path: ...
       # shell-type: ...
       # A default that can be overridden by a provider:
       # config-drive: true

   - image:
       name: ubuntu
       type: cloud

   - flavor:
       name: large

   - label:
       name: centos-7
       min-ready: 1
       flavor: large
       image: centos-7

   - label:
       name: ubuntu
       flavor: small
       image: ubuntu

   # A section for each cloud+region+az

   - section:
       name: rax-base
       abstract: true
       connection: rackspace
       boot-timeout: 120
       launch-timeout: 600
       key-name: infra-root-keys-2020-05-13
       # The launcher will apply the minimum of the quota reported by the
       # driver (if available) or the values here.
       quota:
         instances: 2000
       subnet: some-subnet
       tags:
         section-info: foo
       # We attach both kinds of images to providers in order to provide
       # image-specific info (like config-drive) or username.
       images:
         - name: centos-7
           config-drive: true
           # This is a Zuul image
         - name: ubuntu
           # This is a cloud image, so the specific cloud image name is required
           image-name: ibm-ubuntu-20-04-3-minimal-amd64-1
           # Other information may be provided
           # username: ...
           # python-path: ...
           # shell-type: ...
       flavors:
         - name: small
           cloud-flavor: "Performance 8G"
         - name: large
           cloud-flavor: "Performance 16G"

   - section:
       name: rax-dfw
       parent: rax-base
       region: 'DFW'
       availability-zones: ["a", "b"]

   # A provider to indicate what labels are available to a tenant from
   # a section.

   - provider:
       name: rax-dfw-main
       section: rax-dfw
       labels:
         - name: centos-7
         - name: ubuntu
       key-name: infra-root-keys-2020-05-13
       tags:
         provider-info: bar

The following configuration might appear in a repo that is only used
in a single tenant:

.. code-block:: yaml

   - image:
       name: devstack
       type: zuul

   - label:
       name: devstack

   - provider:
       name: rax-dfw-devstack
       section: rax-dfw
       # Images can be attached to the provider just as with a section.
       images:
         - name: devstack
           config-drive: true
       labels:
         - name: devstack
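
As an illustration of these merge rules, a centos-7 node launched
through the ``rax-dfw-main`` provider in the first example above would
end up with effective values along these lines (this is an explanatory
sketch of the merged result, not a configuration stanza):

.. code-block:: yaml

   key-name: infra-root-keys-2020-05-13  # scalar from the section (repeated by the provider)
   config-drive: true                    # from the centos-7 image entry within the section
   tags:                                 # dicts merge across levels
     section-info: foo                   # from the section
     provider-info: bar                  # from the provider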

Here is a potential static node configuration:

.. code-block:: yaml

   - label:
       name: big-static-node

   - section:
       name: static-nodes
       connection: null
       nodes:
         - name: static.example.com
           labels:
             - big-static-node
           host-key: ...
           username: zuul

   - provider:
       name: static-provider
       section: static-nodes
       labels:
         - big-static-node

Each of the above stanzas may only appear once in a tenant for a
given name (like pipelines or semaphores, they are singleton objects).
If they appear in more than one branch of a project, the definitions
must be identical; otherwise, or if they appear in more than one repo,
the second definition is an error.  These are meant to be used in
unbranched repos.  Whatever tenants they appear in will be permitted
to access those respective resources.

The purpose of the ``provider`` stanza is to associate labels, images,
and sections.  Much of the configuration related to launching an
instance (including the availability of zuul or cloud images) may be
supplied in the ``provider`` stanza and will apply to any labels
within.  The ``section`` stanza also allows configuration of the same
information except for the labels themselves.  The ``section``
supplies default values and the ``provider`` can override them or add
any missing values.  Images are additive -- any images that appear in
a ``provider`` will augment those that appear in a ``section``.

The result is a modular scheme for configuration, where a single
``section`` instance can be used to set as much information as
possible that applies globally to a provider.  A simple configuration
may then have a single ``provider`` instance to attach labels to that
section.  A more complex installation may define a "standard" provider
that is present in every tenant, and then tenant-specific providers as
well.  These providers will all attach to the same section.

References to sections, images and labels will be internally converted
to canonical repo names to avoid ambiguity.  Under the current
Nodepool system, labels are truly a global object, but under this
proposal, a label short name in one tenant may be different from one
in another.  Therefore the node request will internally specify the
canonical label name instead of the short name.  Users will never use
canonical names, only short names.

For static nodes, there is some repetition to labels: first labels
must be associated with the individual nodes defined on the section,
then the labels must appear again on a provider.  This allows an
operator to define a collection of static nodes centrally on a
section, then include tenant-specific sets of labels in a provider.
For the simple case where all static node labels in a section should
be available in a provider, we could consider adding a flag to the
provider to allow that (e.g., ``include-all-node-labels: true``).
Static nodes themselves are configured on a section with a ``null``
connection (since there is no cloud provider associated with static
nodes).  In this case, the additional ``nodes`` section attribute
becomes available.
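
If such a flag were adopted, the ``static-provider`` stanza from the
example above might be reduced to the following sketch
(``include-all-node-labels`` is only a possible extension, not a
settled part of this proposal):

.. code-block:: yaml

   - provider:
       name: static-provider
       section: static-nodes
       # Hypothetical flag: expose every label defined on the
       # section's static nodes without listing them again.
       include-all-node-labels: true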

Upgrade Process
---------------

Most users of diskimages will need to create new jobs to build these
images.  This proposal also includes significant changes to the node
allocation system which come with operational risks.

To make the transition as minimally disruptive as possible, we will
support both systems in Zuul, and allow for selection of one system or
the other on a per-label and per-tenant basis.

By default, if a nodeset specifies a label that is not defined by a
``label`` object in the tenant, Zuul will use the old system and place
a ZooKeeper request in ``/nodepool``.  If a matching ``label`` is
available in the tenant, the request will use the new system and be
sent to ``/zuul/node-requests``.  Once a tenant has completely
converted, a configuration flag may be set in the tenant configuration
to make Zuul treat nodesets that reference unknown labels as
configuration errors.  A later version of Zuul will remove the
backwards compatibility and make this the standard behavior.

Because each of the systems will have unique metadata, they will not
recognize each other's nodes, and it will appear to each that another
system is using part of their quota.  Nodepool is already designed to
handle this case (at least, to handle it as well as possible).

Library Requirements
--------------------

The new zuul-launcher component will need most of Nodepool's current
dependencies, which will entail adding many third-party cloud provider
interfaces.  As of this writing, these use another 420M of disk space.
Since our primary method of distribution at this point is container
images, if the additional space is a concern, we could restrict the
installation of these dependencies to only the zuul-launcher image.

Diskimage-Builder Testing
-------------------------

The diskimage-builder project team has come to rely on Nodepool in its
testing process.  It uses Nodepool to upload images to a devstack
cloud, launch nodes from those images, and verify that they function.
To aid in continuity of testing in the diskimage-builder project, we
will extract the OpenStack image upload and node launching code into a
simple Python script that can be used in diskimage-builder test jobs
in place of Nodepool.

Work Items
----------

* In existing Nodepool, convert the following drivers to statemachine:
  gce, kubernetes, openshift, openstack (openstack is the only one
  likely to require substantial effort; the others should be trivial)
* Replace Nodepool with an image upload script in diskimage-builder
  test jobs
* Add roles to zuul-jobs to build images using diskimage-builder
* Implement node-related config items in Zuul config and Layout
* Create zuul-launcher executable/component
* Add image-name item data
* Add image-build-name attribute to jobs

  * Include a job matcher based on item image-name
  * Include image format information based on global config

* Add zuul driver pipeline trigger/reporter
* Add image lifecycle manager to zuul-launcher

  * Emit image-build events
  * Emit image-validate events
  * Emit image-delete events

* Add Nodepool driver code to Zuul
* Update zuul-launcher to perform image uploads and deletion
* Implement the node launch global request handler
* Implement the node launch provider handlers
* Update the Zuul nodepool interface to handle both Nodepool and
  zuul-launcher node request queues
* Add a tenant feature flag to switch between them
* Release a minor version of Zuul with support for both
* Remove Nodepool support from Zuul
* Release a major version of Zuul with only zuul-launcher support
* Retire Nodepool