diff options
Diffstat (limited to 'doc/rtd/development/debugging.rst')
-rw-r--r-- | doc/rtd/development/debugging.rst | 325 |
1 files changed, 325 insertions, 0 deletions
diff --git a/doc/rtd/development/debugging.rst b/doc/rtd/development/debugging.rst new file mode 100644 index 00000000..b0a0da35 --- /dev/null +++ b/doc/rtd/development/debugging.rst @@ -0,0 +1,325 @@ +.. _debugging: + +Debugging ``cloud-init`` +************************ + +Overview +======== + +This topic will discuss general approaches for testing and debugging +``cloud-init`` on deployed instances. + +.. _boot_time_analysis: + +Boot time analysis +================== + +:command:`cloud-init analyze` +----------------------------- + +Occasionally, instances don't appear as performant as we would like and +``cloud-init`` packages a simple facility to inspect which operations took the +longest during boot and setup. + +The script :file:`/usr/bin/cloud-init` has an analysis sub-command, +:command:`analyze`, which parses any :file:`cloud-init.log` file into formatted +and sorted events. It allows for detailed analysis of the most costly +``cloud-init`` operations, and to determine the long-pole in ``cloud-init`` +configuration and setup. These subcommands default to reading +:file:`/var/log/cloud-init.log`. + +:command:`analyze show` +^^^^^^^^^^^^^^^^^^^^^^^ + +Parse and organise :file:`cloud-init.log` events by stage and include each +sub-stage granularity with time delta reports. + +.. code-block:: shell-session + + $ cloud-init analyze show -i my-cloud-init.log + +Example output: + +.. code-block:: shell-session + + -- Boot Record 01 -- + The total time elapsed since completing an event is printed after the "@" + character. + The time the event takes is printed after the "+" character. + + Starting stage: modules-config + |`->config-snap_config ran successfully @05.47700s +00.00100s + |`->config-ssh-import-id ran successfully @05.47800s +00.00200s + |`->config-locale ran successfully @05.48000s +00.00100s + ... + + +:command:`analyze dump` +^^^^^^^^^^^^^^^^^^^^^^^ + +Parse :file:`cloud-init.log` into event records and return a list of +dictionaries that can be consumed for other reporting needs. + +.. code-block:: shell-session + + $ cloud-init analyze dump -i my-cloud-init.log + +Example output: + +.. code-block:: + + [ + { + "description": "running config modules", + "event_type": "start", + "name": "modules-config", + "origin": "cloudinit", + "timestamp": 1510807493.0 + },... + +:command:`analyze blame` +^^^^^^^^^^^^^^^^^^^^^^^^ + +Parse :file:`cloud-init.log` into event records and sort them based on the +highest time cost for a quick assessment of areas of ``cloud-init`` that may +need improvement. + +.. code-block:: shell-session + + $ cloud-init analyze blame -i my-cloud-init.log + +Example output: + +.. code-block:: + + -- Boot Record 11 -- + 00.01300s (modules-final/config-scripts-per-boot) + 00.00400s (modules-final/config-final-message) + 00.00100s (modules-final/config-rightscale_userdata) + ... + +:command:`analyze boot` +^^^^^^^^^^^^^^^^^^^^^^^ + +Make subprocess calls to the kernel in order to get relevant pre-``cloud-init`` +timestamps, such as the kernel start, kernel finish boot, and ``cloud-init`` +start. + +.. code-block:: shell-session + + $ cloud-init analyze boot + +Example output: + +.. code-block:: + + -- Most Recent Boot Record -- + Kernel Started at: 2019-06-13 15:59:55.809385 + Kernel ended boot at: 2019-06-13 16:00:00.944740 + Kernel time to boot (seconds): 5.135355 + Cloud-init start: 2019-06-13 16:00:05.738396 + Time between Kernel boot and Cloud-init start (seconds): 4.793656 + +Analyze quickstart - LXC +------------------------ + +To quickly obtain a ``cloud-init`` log, try using :command:``lxc`` on any +Ubuntu system: + +.. code-block:: shell-session + + $ lxc init ubuntu-daily:focal x1 + $ lxc start x1 + $ # Take lxc's cloud-init.log and pipe it to the analyzer + $ lxc file pull x1/var/log/cloud-init.log - | cloud-init analyze dump -i - + $ lxc file pull x1/var/log/cloud-init.log - | \ + python3 -m cloudinit.analyze dump -i - + + +Analyze quickstart - KVM +------------------------ +To quickly analyze a KVM ``cloud-init`` log: + +1. Download the current cloud image + +.. code-block:: shell-session + + $ wget https://cloud-images.ubuntu.com/daily/server/focal/current/focal-server-cloudimg-amd64.img + +2. Create a snapshot image to preserve the original cloud image + +.. code-block:: shell-session + + $ qemu-img create -b focal-server-cloudimg-amd64.img -f qcow2 \ + test-cloudinit.qcow2 + +3. Create a seed image with metadata using :command:`cloud-localds` + +.. code-block:: shell-session + + $ cat > user-data <<EOF + #cloud-config + password: passw0rd + chpasswd: { expire: False } + EOF + $ cloud-localds my-seed.img user-data + +4. Launch your modified VM + +.. code-block:: shell-session + + $ kvm -m 512 -net nic -net user -redir tcp:2222::22 \ + -drive file=test-cloudinit.qcow2,if=virtio,format=qcow2 \ + -drive file=my-seed.img,if=virtio,format=raw + +5. Analyze the boot (:command:`blame`, :command:`dump`, :command:`show`) + +.. code-block:: shell-session + + $ ssh -p 2222 ubuntu@localhost 'cat /var/log/cloud-init.log' | \ + cloud-init analyze blame -i - + + +Running single cloud-config modules +=================================== + +This subcommand is not called by the init system. It can be called manually to +load the configured datasource and run a single cloud-config module once, using +the cached user data and metadata after the instance has booted. Each +cloud-config module has a module ``FREQUENCY`` configured: ``PER_INSTANCE``, +``PER_BOOT``, ``PER_ONCE`` or ``PER_ALWAYS``. When a module is run by +``cloud-init``, it stores a semaphore file in +:file:`/var/lib/cloud/instance/sem/config_<module_name>.<frequency>` which +marks when the module last successfully ran. Presence of this semaphore file +prevents a module from running again if it has already been run. To ensure that +a module is run again, the desired frequency can be overridden via the +command line: + +.. code-block:: shell-session + + $ sudo cloud-init single --name cc_ssh --frequency always + +Example output: + +.. code-block:: + + ... + Generating public/private ed25519 key pair + ... + +Inspect :file:`cloud-init.log` for output of what operations were performed as +a result. + +.. _proposed_sru_testing: + +Stable Release Updates (SRU) testing for ``cloud-init`` +======================================================= + +Once an Ubuntu release is stable (i.e. after it is released), updates for it +must follow a special procedure called a "Stable Release Update" (`SRU`_). + +The ``cloud-init`` project has a specific process it follows when validating +a ``cloud-init`` SRU, documented in the `CloudinitUpdates`_ wiki page. + +Generally an SRU test of ``cloud-init`` performs the following: + + * Install a pre-release version of ``cloud-init`` from the **-proposed** APT + pocket (e.g., **bionic-proposed**). + * Upgrade ``cloud-init`` and attempt a clean run of ``cloud-init`` to assert + that the new version works properly on the specific platform and Ubuntu + series. + * Check for tracebacks or errors in behaviour. + +Manual SRU verification procedure +--------------------------------- + +Below are steps to manually test a pre-release version of ``cloud-init`` +from **-proposed** + +.. note:: + For each Ubuntu SRU, the Ubuntu Server team manually validates the new + version of ``cloud-init`` on these platforms: **Amazon EC2, Azure, GCE, + OpenStack, Oracle, Softlayer (IBM), LXD, KVM** + +1. Launch a VM on your favorite platform, providing this cloud-config + user data and replacing `<YOUR_LAUNCHPAD_USERNAME>` with your username: + +.. code-block:: yaml + + ## template: jinja + #cloud-config + ssh_import_id: [<YOUR_LAUNCHPAD_USERNAME>] + hostname: SRU-worked-{{v1.cloud_name}} + +2. Wait for current ``cloud-init`` to complete, replace ``<YOUR_VM_IP>`` with + the IP address of the VM that you launched in step 1. Be sure to make a + note of the datasource ``cloud-init`` detected in ``--long`` output. You + will need this during step 5, where you will use it to confirm the same + datasource is detected after the upgrade: + +.. code-block:: bash + + CI_VM_IP=<YOUR_VM_IP> + $ ssh ubuntu@$CI_VM_IP -- cloud-init status --wait --long + +3. Set up the **-proposed** pocket on your VM and upgrade to the **-proposed** + ``cloud-init``. To do this, create the following bash script, which will + add the **-proposed** pocket to APT's sources and install ``cloud-init`` + from that pocket: + +.. code-block:: bash + + cat > setup_proposed.sh <<EOF + #/bin/bash + mirror=http://archive.ubuntu.com/ubuntu + echo deb \$mirror \$(lsb_release -sc)-proposed main | tee \ + /etc/apt/sources.list.d/proposed.list + apt-get update -q + apt-get install -qy cloud-init + EOF + +.. code-block:: shell-session + + $ scp setup_proposed.sh ubuntu@$CI_VM_IP:. + $ ssh ubuntu@$CI_VM_IP -- sudo bash setup_proposed.sh + +4. Change hostname, clean ``cloud-init``'s state, and reboot to run + ``cloud-init`` from scratch: + +.. code-block:: shell-session + + $ ssh ubuntu@$CI_VM_IP -- sudo hostname something-else + $ ssh ubuntu@$CI_VM_IP -- sudo cloud-init clean --logs --reboot + +5. Validate **-proposed** ``cloud-init`` came up without error. First, we block + until ``cloud-init`` completes, then verify from ``--long`` that the + datasource is the same as the one picked up from step 1. Errors will show up + in ``--long``: + +.. code-block:: shell-session + + $ ssh ubuntu@$CI_VM_IP -- cloud-init status --wait --long + +Make sure the hostname was set properly to `SRU-worked-<cloud name>`: + +.. code-block:: shell-session + + $ ssh ubuntu@$CI_VM_IP -- hostname + +Then, check for any errors or warnings in ``cloud-init`` logs. If successful, +this will produce no output: + +.. code-block:: shell-session + + $ ssh ubuntu@$CI_VM_IP -- grep Trace "/var/log/cloud-init*" + +6. If you encounter an error during SRU testing: + + * Create a `new cloud-init bug`_ reporting the version of ``cloud-init`` + affected + * Ping upstream ``cloud-init`` on Libera's `#cloud-init IRC channel`_ + +.. _SRU: https://wiki.ubuntu.com/StableReleaseUpdates +.. _CloudinitUpdates: https://wiki.ubuntu.com/CloudinitUpdates +.. _new cloud-init bug: https://bugs.launchpad.net/cloud-init/+filebug +.. _#cloud-init IRC channel: https://kiwiirc.com/nextclient/irc.libera.chat/cloud-init |