Diffstat (limited to 'doc/rtd/explanation/boot.rst')
-rw-r--r-- | doc/rtd/explanation/boot.rst | 263 |
1 files changed, 263 insertions, 0 deletions
diff --git a/doc/rtd/explanation/boot.rst b/doc/rtd/explanation/boot.rst
new file mode 100644
index 00000000..42ccbc87
--- /dev/null
+++ b/doc/rtd/explanation/boot.rst
@@ -0,0 +1,263 @@
+.. _boot_stages:
+
+Boot stages
+***********
+
+To be able to provide the functionality that it does, ``cloud-init`` must be
+integrated into the boot in a fairly controlled way. There are five
+stages to boot:
+
+1. Generator
+2. Local
+3. Network
+4. Config
+5. Final
+
+.. _boot-Generator:
+
+Generator
+=========
+
+When booting under ``systemd``, a `generator`_ will run that determines if
+``cloud-init.target`` should be included in the boot goals. By default, this
+generator will enable ``cloud-init``. It will not enable ``cloud-init``
+if either:
+
+- The file :file:`/etc/cloud/cloud-init.disabled` exists, or
+- The kernel command line as found in :file:`/proc/cmdline` contains
+  ``cloud-init=disabled``. When running in a container, the kernel command
+  line is not honored, but ``cloud-init`` will read an environment
+  variable named ``KERNEL_CMDLINE`` in its place.
+
+.. note::
+   These mechanisms for disabling ``cloud-init`` at runtime currently only
+   exist in ``systemd``.
+
+.. _boot-Local:
+
+Local
+=====
+
++------------------+----------------------------------------------------------+
+| systemd service  | ``cloud-init-local.service``                             |
++---------+--------+----------------------------------------------------------+
+| runs             | as soon as possible with ``/`` mounted read-write        |
++---------+--------+----------------------------------------------------------+
+| blocks           | as much of boot as possible, *must* block network        |
++---------+--------+----------------------------------------------------------+
+| modules          | none                                                     |
++---------+--------+----------------------------------------------------------+
+
+The purpose of the local stage is to:
+
+ - Locate "local" data sources, and
+ - Apply networking configuration to the system (including "fallback").
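As an illustration of the two disable conditions listed in the Generator section, the decision can be sketched in Python. This is not the generator's actual code: the function name ``cloud_init_enabled`` and its parameters are hypothetical, and matching ``cloud-init=disabled`` as a whitespace-separated token is an assumption (the text only says the command line "contains" it).

```python
import os
from pathlib import Path

DISABLE_FILE = Path("/etc/cloud/cloud-init.disabled")
DISABLE_TOKEN = "cloud-init=disabled"


def cloud_init_enabled(disable_file=DISABLE_FILE, cmdline=None,
                       in_container=False, environ=None):
    """Hypothetical helper: return False if either documented condition holds."""
    environ = os.environ if environ is None else environ

    # Condition 1: the marker file exists.
    if Path(disable_file).exists():
        return False

    # Condition 2: the kernel command line contains the disable token.
    # In a container, KERNEL_CMDLINE from the environment is read instead.
    if cmdline is None:
        if in_container:
            cmdline = environ.get("KERNEL_CMDLINE", "")
        else:
            cmdline = Path("/proc/cmdline").read_text()

    # Assumption: the token is matched as a whole whitespace-separated word.
    return DISABLE_TOKEN not in cmdline.split()
```

Passing ``cmdline`` and ``environ`` explicitly keeps the sketch testable without touching :file:`/proc/cmdline` or the real environment.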
+
+In most cases, this stage does not do much more than that. It finds the
+datasource and determines the network configuration to be used. That
+network configuration can come from:
+
+- **datasource**: Cloud-provided network configuration via metadata.
+- **fallback**: ``Cloud-init``'s fallback networking consists of rendering
+  the equivalent to ``dhcp on eth0``, which was historically the most popular
+  mechanism for network configuration of a guest.
+- **none**: Network configuration can be disabled by writing the file
+  :file:`/etc/cloud/cloud.cfg` with the content:
+  ``network: {config: disabled}``.
+
+If this is an instance's first boot, then the selected network configuration
+is rendered. This includes clearing of all previous (stale) configuration
+including persistent device naming with old MAC addresses.
+
+This stage must block network bring-up or any stale configuration that might
+have already been applied. Otherwise, that could have negative effects such
+as DHCP hooks or broadcast of an old hostname. It would also put the system
+in an odd state to recover from, as it may then have to restart network
+devices.
+
+``Cloud-init`` then exits and expects the continued boot of the operating
+system to bring network configuration up as configured.
+
+.. note::
+   In the past, local datasources have been only those that were available
+   without network (such as 'ConfigDrive'). However, as seen in the recent
+   additions to the :ref:`DigitalOcean datasource<datasource_digital_ocean>`,
+   even data sources that require a network can operate at this stage.
+
+.. _boot-Network:
+
+Network
+=======
+
++------------------+----------------------------------------------------------+
+| systemd service  | ``cloud-init.service``                                   |
++---------+--------+----------------------------------------------------------+
+| runs             | after local stage and configured networking is up        |
++---------+--------+----------------------------------------------------------+
+| blocks           | as much of remaining boot as possible                    |
++---------+--------+----------------------------------------------------------+
+| modules          | *cloud_init_modules* in ``/etc/cloud/cloud.cfg``         |
++---------+--------+----------------------------------------------------------+
+
+This stage requires all configured networking to be online, as it will fully
+process any user data that is found. Here, processing means it will:
+
+- retrieve any ``#include`` or ``#include-once`` (recursively) including
+  http,
+- decompress any compressed content, and
+- run any part-handler found.
+
+This stage runs the ``disk_setup`` and ``mounts`` modules which may partition
+and format disks and configure mount points (such as in :file:`/etc/fstab`).
+Those modules cannot run earlier as they may receive configuration input
+from sources only available via the network. For example, a user may have
+provided user data in a network resource that describes how local mounts
+should be done.
+
+On some clouds, such as Azure, this stage will create filesystems to be
+mounted, including ones that have stale (previous instance) references in
+:file:`/etc/fstab`. As such, entries in :file:`/etc/fstab` other than those
+necessary for cloud-init to run should not be added until after this stage.
+
+A part-handler will run at this stage, as will boothooks including
+cloud-config ``bootcmd``. The user of this functionality has to be aware
+that the system is in the process of booting when their code runs.
+
+.. _boot-Config:
+
+Config
+======
+
++------------------+----------------------------------------------------------+
+| systemd service  | ``cloud-config.service``                                 |
++---------+--------+----------------------------------------------------------+
+| runs             | after network                                            |
++---------+--------+----------------------------------------------------------+
+| blocks           | nothing                                                  |
++---------+--------+----------------------------------------------------------+
+| modules          | *cloud_config_modules* in ``/etc/cloud/cloud.cfg``       |
++---------+--------+----------------------------------------------------------+
+
+This stage runs config modules only. Modules that do not really have an
+effect on other stages of boot are run here, including ``runcmd``.
+
+.. _boot-Final:
+
+Final
+=====
+
++------------------+----------------------------------------------------------+
+| systemd service  | ``cloud-final.service``                                  |
++---------+--------+----------------------------------------------------------+
+| runs             | as final part of boot (traditional "rc.local")           |
++---------+--------+----------------------------------------------------------+
+| blocks           | nothing                                                  |
++---------+--------+----------------------------------------------------------+
+| modules          | *cloud_final_modules* in ``/etc/cloud/cloud.cfg``        |
++---------+--------+----------------------------------------------------------+
+
+This stage runs as late in boot as possible. Any scripts that a user is
+accustomed to running after logging into a system should run correctly here.
+Things that run here include:
+
+- package installations,
+- configuration management plugins (Ansible, Puppet, Chef, salt-minion), and
+- user-defined scripts (i.e., shell scripts passed as user data).
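The four stage tables above can be condensed into one lookup structure, which makes the mapping from stage to ``systemd`` service and module list easier to see. The sketch below is illustrative only: the ``Stage`` type and ``modules_for`` helper are hypothetical names, and ``cloud_cfg`` stands in for an already-parsed :file:`/etc/cloud/cloud.cfg` (assumed here to be a plain dict of lists).

```python
from typing import NamedTuple, Optional


class Stage(NamedTuple):
    name: str
    service: str                # systemd service that runs the stage
    modules_key: Optional[str]  # key in cloud.cfg listing its modules


# Values taken from the stage tables; the generator has no service or modules.
STAGES = (
    Stage("local", "cloud-init-local.service", None),
    Stage("network", "cloud-init.service", "cloud_init_modules"),
    Stage("config", "cloud-config.service", "cloud_config_modules"),
    Stage("final", "cloud-final.service", "cloud_final_modules"),
)


def modules_for(stage_name, cloud_cfg):
    """Hypothetical helper: list the modules a stage would run."""
    stage = next(s for s in STAGES if s.name == stage_name)
    if stage.modules_key is None:
        return []  # the local stage runs no modules
    return list(cloud_cfg.get(stage.modules_key, []))


# Example configuration using only modules named in the text.
example_cfg = {
    "cloud_init_modules": ["disk_setup", "mounts"],
    "cloud_config_modules": ["runcmd"],
}
```

With ``example_cfg``, ``modules_for("network", example_cfg)`` yields the ``disk_setup`` and ``mounts`` modules, while ``runcmd`` only appears for the ``config`` stage.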
+
+For scripts external to ``cloud-init`` looking to wait until ``cloud-init`` is
+finished, the :command:`cloud-init status --wait` subcommand can help block
+external scripts until ``cloud-init`` is done without having to write your own
+``systemd`` unit dependency chains. See :ref:`cli_status` for more info.
+
+.. _boot-First_boot_determination:
+
+First boot determination
+========================
+
+``Cloud-init`` has to determine whether or not the current boot is the first
+boot of a new instance, so that it applies the appropriate configuration. On
+an instance's first boot, it should run all "per-instance" configuration,
+whereas on a subsequent boot it should run only "per-boot" configuration. This
+section describes how ``cloud-init`` performs this determination, as well as
+why it is necessary.
+
+When it runs, ``cloud-init`` stores a cache of its internal state for use
+across stages and boots.
+
+If this cache is present, then ``cloud-init`` has run on this system
+before [#not-present]_. There are two cases where this could occur. Most
+commonly, the instance has been rebooted, and this is a second/subsequent
+boot. Alternatively, the filesystem has been attached to a *new* instance,
+and this is the instance's first boot. The most obvious case where this
+happens is when an instance is launched from an image captured from a
+launched instance.
+
+By default, ``cloud-init`` attempts to determine which case it is running
+in by checking the instance ID in the cache against the instance ID it
+determines at runtime. If they do not match, then this is an instance's
+first boot; otherwise, it's a subsequent boot. Internally, ``cloud-init``
+refers to this behaviour as ``check``.
+
+This behaviour is required for images captured from launched instances to
+behave correctly, and so is the default that generic cloud images ship with.
+However, there are cases where it can cause problems [#problems]_. For these
+cases, ``cloud-init`` has support for modifying its behaviour to trust the
+instance ID that is present in the system unconditionally. This means that
+``cloud-init`` will never detect a new instance when the cache is present,
+and it follows that the only way to cause ``cloud-init`` to detect a new
+instance (and therefore its first boot) is to manually remove
+``cloud-init``'s cache. Internally, this behaviour is referred to as
+``trust``.
+
+To configure which of these behaviours to use, ``cloud-init`` exposes the
+``manual_cache_clean`` configuration option. When ``false`` (the default),
+``cloud-init`` will ``check`` and clean the cache if the instance IDs do
+not match, as discussed above. When ``true``,
+``cloud-init`` will ``trust`` the existing cache (and therefore not clean it).
+
+Manual cache cleaning
+=====================
+
+``Cloud-init`` ships a command for manually cleaning the cache:
+:command:`cloud-init clean`. See :ref:`cli_clean`'s documentation for further
+details.
+
+Reverting ``manual_cache_clean`` setting
+----------------------------------------
+
+Currently there is no support for switching an instance that is launched with
+``manual_cache_clean: true`` from ``trust`` behaviour to ``check`` behaviour,
+other than manually cleaning the cache.
+
+.. warning:: If you want to capture an instance that is currently in ``trust``
+    mode as an image for launching other instances, you **must** manually clean
+    the cache. If you do not do so, then instances launched from the captured
+    image will all detect their first boot as a subsequent boot of the captured
+    instance, and will not apply any per-instance configuration.
+
+    This is a functional issue, but also a potential security one:
+    ``cloud-init`` is responsible for rotating SSH host keys on first boot,
+    and this will not happen on these instances.
+
+.. [#not-present] It follows that if this cache is not present,
+   ``cloud-init`` has not run on this system before, so this is
+   unambiguously this instance's first boot.
+
+.. [#problems] A couple of ways in which this strict reliance on the presence
+   of a datasource has been observed to cause problems:
+
+   - If a cloud's metadata service is flaky and ``cloud-init`` cannot
+     obtain the instance ID locally on that platform, ``cloud-init``'s
+     instance ID determination will sometimes fail to determine the current
+     instance ID, which makes it impossible to determine if this is an
+     instance's first or subsequent boot (`#1885527`_).
+   - If ``cloud-init`` is used to provision a physical appliance or device
+     and an attacker can present a datasource to the device with a different
+     instance ID, then ``cloud-init``'s default behaviour will detect this as
+     an instance's first boot and reset the device using the attacker's
+     configuration (this has been observed with the
+     :ref:`NoCloud datasource<datasource_nocloud>` in `#1879530`_).
+
+.. _generator: https://www.freedesktop.org/software/systemd/man/systemd.generator.html
+.. _#1885527: https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1885527
+.. _#1879530: https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1879530
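The ``check`` and ``trust`` behaviours described under "First boot determination" reduce to a small decision function. The following is a minimal sketch of that logic as the text states it, not ``cloud-init``'s implementation; the function name ``is_first_boot`` and the use of ``None`` to represent a missing cache are assumptions made for illustration.

```python
def is_first_boot(cached_instance_id, current_instance_id,
                  manual_cache_clean=False):
    """Hypothetical helper: decide whether this boot is an instance's first.

    cached_instance_id is None when no cache exists (an assumption of this
    sketch for modelling "cache not present").
    """
    # No cache: cloud-init has not run here before, so this is a first boot.
    if cached_instance_id is None:
        return True

    # "trust": the cached instance ID is accepted unconditionally, so a new
    # instance is never detected while the cache is present.
    if manual_cache_clean:
        return False

    # "check" (the default): compare the cached ID with the runtime ID.
    return cached_instance_id != current_instance_id
```

The ``trust`` branch shows why an image captured from such an instance needs :command:`cloud-init clean` first: with the cache left in place, every instance launched from it takes the ``return False`` path.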