summaryrefslogtreecommitdiff
path: root/doc/rtd/development/debugging.rst
blob: c28c68fbe4988c110397b6571f8c507dfb5f1109 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
.. _debugging:

Debugging ``cloud-init``
************************

Overview
========

This topic will discuss general approaches for testing and debugging
``cloud-init`` on deployed instances.

.. _boot_time_analysis:

Boot time analysis
==================

:command:`cloud-init analyze`
-----------------------------

Occasionally, instances don't appear as performant as we would like and
``cloud-init`` packages a simple facility to inspect which operations took the
longest during boot and setup.

The script :file:`/usr/bin/cloud-init` has an analysis sub-command,
:command:`analyze`, which parses any :file:`cloud-init.log` file into formatted
and sorted events. It allows for detailed analysis of the most costly
``cloud-init`` operations, and to determine the long-pole in ``cloud-init``
configuration and setup. These subcommands default to reading
:file:`/var/log/cloud-init.log`.

:command:`analyze show`
^^^^^^^^^^^^^^^^^^^^^^^

Parse and organise :file:`cloud-init.log` events by stage and include each
sub-stage granularity with time delta reports.

.. code-block:: shell-session

    $ cloud-init analyze show -i my-cloud-init.log

Example output:

.. code-block:: shell-session

    -- Boot Record 01 --
    The total time elapsed since completing an event is printed after the "@"
    character.
    The time the event takes is printed after the "+" character.

    Starting stage: modules-config
    |`->config-snap_config ran successfully @05.47700s +00.00100s
    |`->config-ssh-import-id ran successfully @05.47800s +00.00200s
    |`->config-locale ran successfully @05.48000s +00.00100s
    ...


:command:`analyze dump`
^^^^^^^^^^^^^^^^^^^^^^^

Parse :file:`cloud-init.log` into event records and return a list of
dictionaries that can be consumed for other reporting needs.

.. code-block:: shell-session

    $ cloud-init analyze dump -i my-cloud-init.log

Example output:

.. code-block::

    [
     {
      "description": "running config modules",
      "event_type": "start",
      "name": "modules-config",
      "origin": "cloudinit",
      "timestamp": 1510807493.0
     },...

:command:`analyze blame`
^^^^^^^^^^^^^^^^^^^^^^^^

Parse :file:`cloud-init.log` into event records and sort them based on the
highest time cost for a quick assessment of areas of ``cloud-init`` that may
need improvement.

.. code-block:: shell-session

    $ cloud-init analyze blame -i my-cloud-init.log

Example output:

.. code-block::

    -- Boot Record 11 --
         00.01300s (modules-final/config-scripts-per-boot)
         00.00400s (modules-final/config-final-message)
         00.00100s (modules-final/config-rightscale_userdata)
         ...

:command:`analyze boot`
^^^^^^^^^^^^^^^^^^^^^^^

Make subprocess calls to the kernel in order to get relevant pre-``cloud-init``
timestamps, such as the kernel start, kernel finish boot, and ``cloud-init``
start.

.. code-block:: shell-session

    $ cloud-init analyze boot

Example output:

.. code-block::

    -- Most Recent Boot Record --
        Kernel Started at: 2019-06-13 15:59:55.809385
        Kernel ended boot at: 2019-06-13 16:00:00.944740
        Kernel time to boot (seconds): 5.135355
        Cloud-init start: 2019-06-13 16:00:05.738396
        Time between Kernel boot and Cloud-init start (seconds): 4.793656

Analyze quickstart - LXC
------------------------

To quickly obtain a ``cloud-init`` log, try using :command:``lxc`` on any
Ubuntu system:

.. code-block:: shell-session

    $ lxc init ubuntu-daily:focal x1
    $ lxc start x1
    $ # Take lxc's cloud-init.log and pipe it to the analyzer
    $ lxc file pull x1/var/log/cloud-init.log - | cloud-init analyze dump -i -
    $ lxc file pull x1/var/log/cloud-init.log - | \
      python3 -m cloudinit.analyze dump -i -


Analyze quickstart - KVM
------------------------
To quickly analyze a KVM ``cloud-init`` log:

1. Download the current cloud image

.. code-block:: shell-session

    $ wget https://cloud-images.ubuntu.com/daily/server/focal/current/focal-server-cloudimg-amd64.img

2. Create a snapshot image to preserve the original cloud image

.. code-block:: shell-session

    $ qemu-img create -b focal-server-cloudimg-amd64.img -f qcow2 \
    test-cloudinit.qcow2

3. Create a seed image with metadata using :command:`cloud-localds`

.. code-block:: shell-session

    $ cat > user-data <<EOF
      #cloud-config
      password: passw0rd
      chpasswd: { expire: False }
      EOF
    $  cloud-localds my-seed.img user-data

4. Launch your modified VM

.. code-block:: shell-session

    $  kvm -m 512 -net nic -net user -redir tcp:2222::22 \
        -drive file=test-cloudinit.qcow2,if=virtio,format=qcow2 \
        -drive file=my-seed.img,if=virtio,format=raw

5. Analyze the boot (:command:`blame`, :command:`dump`, :command:`show`)

.. code-block:: shell-session

    $ ssh -p 2222 ubuntu@localhost 'cat /var/log/cloud-init.log' | \
       cloud-init analyze blame -i -


Running single cloud-config modules
===================================

This subcommand is not called by the init system. It can be called manually to
load the configured datasource and run a single cloud-config module once, using
the cached user data and metadata after the instance has booted. Each
cloud-config module has a module ``FREQUENCY`` configured: ``PER_INSTANCE``,
``PER_BOOT``, ``PER_ONCE`` or ``PER_ALWAYS``. When a module is run by
``cloud-init``, it stores a semaphore file in
:file:`/var/lib/cloud/instance/sem/config_<module_name>.<frequency>` which
marks when the module last successfully ran. Presence of this semaphore file
prevents a module from running again if it has already been run. To ensure that
a module is run again, the desired frequency can be overridden via the
command line:

.. code-block:: shell-session

   $ sudo cloud-init single --name cc_ssh --frequency always

Example output:

.. code-block::

   ...
   Generating public/private ed25519 key pair
   ...

Inspect :file:`cloud-init.log` for output of what operations were performed as
a result.

.. _proposed_sru_testing:

Stable Release Updates (SRU) testing for ``cloud-init``
=======================================================

Once an Ubuntu release is stable (i.e. after it is released), updates for it
must follow a special procedure called a "Stable Release Update" (`SRU`_).

The ``cloud-init`` project has a specific process it follows when validating
a ``cloud-init`` SRU, documented in the `CloudinitUpdates`_ wiki page.

Generally an SRU test of ``cloud-init`` performs the following:

 * Install a pre-release version of ``cloud-init`` from the **-proposed** APT
   pocket (e.g., **bionic-proposed**).
 * Upgrade ``cloud-init`` and attempt a clean run of ``cloud-init`` to assert
   that the new version works properly on the specific platform and Ubuntu
   series.
 * Check for tracebacks or errors in behaviour.

Manual SRU verification procedure
---------------------------------

Below are steps to manually test a pre-release version of ``cloud-init``
from **-proposed**

.. note::
    For each Ubuntu SRU, the Ubuntu Server team manually validates the new
    version of ``cloud-init`` on these platforms: **Amazon EC2, Azure, GCE,
    OpenStack, Oracle, Softlayer (IBM), LXD, KVM**

1. Launch a VM on your favorite platform, providing this cloud-config
   user data and replacing `<YOUR_LAUNCHPAD_USERNAME>` with your username:

.. code-block:: yaml

    ## template: jinja
    #cloud-config
    ssh_import_id: [<YOUR_LAUNCHPAD_USERNAME>]
    hostname: SRU-worked-{{v1.cloud_name}}

2. Wait for current ``cloud-init`` to complete, replace ``<YOUR_VM_IP>`` with
   the IP address of the VM that you launched in step 1. Be sure to make a
   note of the datasource ``cloud-init`` detected in ``--long`` output. You
   will need this during step 5, where you will use it to confirm the same
   datasource is detected after the upgrade:

.. code-block:: bash

    CI_VM_IP=<YOUR_VM_IP>
    $ ssh ubuntu@$CI_VM_IP -- cloud-init status --wait --long

3. Set up the **-proposed** pocket on your VM and upgrade to the **-proposed**
   ``cloud-init``. To do this, create the following bash script, which will
   add the **-proposed** pocket to APT's sources and install ``cloud-init``
   from that pocket:

.. code-block:: bash

    cat > setup_proposed.sh <<EOF
    #/bin/bash
    mirror=http://archive.ubuntu.com/ubuntu
    echo deb \$mirror \$(lsb_release -sc)-proposed main | tee \
        /etc/apt/sources.list.d/proposed.list
    apt-get update -q
    apt-get install -qy cloud-init
    EOF

.. code-block:: shell-session

    $ scp setup_proposed.sh ubuntu@$CI_VM_IP:.
    $ ssh ubuntu@$CI_VM_IP -- sudo bash setup_proposed.sh

4. Change hostname, clean ``cloud-init``'s state, and reboot to run
   ``cloud-init`` from scratch:

.. code-block:: shell-session

    $ ssh ubuntu@$CI_VM_IP -- sudo hostname something-else
    $ ssh ubuntu@$CI_VM_IP -- sudo cloud-init clean --logs --reboot

5. Validate **-proposed** ``cloud-init`` came up without error. First, we block
   until ``cloud-init`` completes, then verify from ``--long`` that the
   datasource is the same as the one picked up from step 1. Errors will show up
   in ``--long``:

.. code-block:: shell-session

   $ ssh ubuntu@$CI_VM_IP -- cloud-init status --wait --long

Make sure the hostname was set properly to `SRU-worked-<cloud name>`:

.. code-block:: shell-session

   $ ssh ubuntu@$CI_VM_IP -- hostname

Then, check for any errors or warnings in ``cloud-init`` logs. If successful,
this will produce no output:

.. code-block:: shell-session

   $ ssh ubuntu@$CI_VM_IP -- grep Trace "/var/log/cloud-init*"

6. If you encounter an error during SRU testing:

   * Create a `new cloud-init bug`_ reporting the version of ``cloud-init``
     affected
   * Ping upstream ``cloud-init`` on Libera's `#cloud-init IRC channel`_

.. _SRU: https://wiki.ubuntu.com/StableReleaseUpdates
.. _CloudinitUpdates: https://wiki.ubuntu.com/CloudinitUpdates
.. _new cloud-init bug: https://github.com/canonical/cloud-init/issues
.. _#cloud-init IRC channel: https://kiwiirc.com/nextclient/irc.libera.chat/cloud-init