======
Faults
======

This doc explains how to understand what has happened to your API request.

Every HTTP response carries a status code. A 2xx code signifies that the API
call succeeded. However, that is often not the end of the story: it generally
means only that the request to start the operation has been accepted, not that
the action you requested has completed successfully.


Tracking Errors by Request ID
=============================

There are two types of request ID.

.. list-table::
  :header-rows: 1
  :widths: 2,8

  * - Type
    - Description
  * - Local request ID
    - A unique request ID generated locally by each service, so it differs
      between the services (Nova, Cinder, Glance, Neutron, etc.) involved
      in that operation. The format is ``req-`` + UUID (UUID4).
  * - Global request ID
    - A user-specified request ID that is used as a common identifier
      by all services (Nova, Cinder, Glance, Neutron, etc.) involved
      in that operation, so it is the same in every one of them.
      The format is ``req-`` + UUID (UUID4).

It is extremely common for clouds to have an ELK (Elasticsearch, Logstash,
Kibana) infrastructure consuming their logs.
Querying the flow of a single operation across services is only possible if
there is a common identifier across all relevant messages. The global request
ID provides that identifier, which immediately makes existing deployed tooling
better for managing OpenStack.

**Request Header**

In each REST API request, you can specify the global request ID
in the ``X-Openstack-Request-Id`` header, starting from microversion 2.46.
The format must be ``req-`` + UUID (UUID4).
If the value does not follow this format, Nova ignores the global request ID.

Request header example::

  X-Openstack-Request-Id: req-3dccb8c4-08fe-4706-a91d-e843b8fe9ed2
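
As a minimal sketch of the format described above, the following Python
snippet builds a global request ID and checks a candidate value before it is
sent. The helper names are hypothetical, and the check is only an
approximation of the documented ``req-`` + UUID (UUID4) format, not the
validation Nova itself performs.

.. code-block:: python

  import uuid

  HEADER_NAME = "X-Openstack-Request-Id"


  def new_global_request_id():
      """Return a fresh ID in the ``req-`` + UUID4 format."""
      return "req-" + str(uuid.uuid4())


  def looks_like_request_id(value):
      """Check for ``req-`` followed by a canonical UUID4 string."""
      if not value.startswith("req-"):
          return False
      candidate = value[len("req-"):]
      try:
          parsed = uuid.UUID(candidate)
      except ValueError:
          return False
      # Require the canonical hyphenated lowercase form and version 4.
      return str(parsed) == candidate and parsed.version == 4


  # Headers to pass to whichever HTTP client is in use.
  headers = {HEADER_NAME: new_global_request_id()}

Checking the value on the client side is worthwhile because a malformed ID is
ignored rather than rejected, so the mistake would otherwise go unnoticed.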

**Response Header**

For each REST API request, ``X-Compute-Request-Id`` is returned
in the response header.
Starting from microversion 2.46, ``X-Openstack-Request-Id`` is also returned
in the response header.

Both ``X-Compute-Request-Id`` and ``X-Openstack-Request-Id`` carry the local
request ID. The global request ID is not returned.

Response header example::

  X-Compute-Request-Id: req-d7bc29d0-7b99-4aeb-a356-89975043ab5e
  X-Openstack-Request-Id: req-d7bc29d0-7b99-4aeb-a356-89975043ab5e

Server Actions
--------------

Most `server action APIs`_ are asynchronous. Usually the API service does
some minimal work, sends the request off to the ``nova-compute`` service
to complete the action, and returns a 202 response to the client.
The client then polls the API until the operation completes. Completion may
show up as a status change on the server, but at a minimum the client waits
for the server ``OS-EXT-STS:task_state`` field to go to ``null``, indicating
the action has completed either successfully or with an error.
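
The polling described above can be sketched in Python as follows. This is an
illustration only: ``get_server`` is a hypothetical callable that returns the
``server`` object from ``GET /servers/{server_id}`` as a dict, and the
interval and timeout values are arbitrary assumptions.

.. code-block:: python

  import time


  def wait_for_task_state(get_server, interval=2.0, timeout=300.0,
                          sleep=time.sleep):
      """Poll until ``OS-EXT-STS:task_state`` is null, then return the server.

      ``get_server`` is a hypothetical callable supplied by the caller.
      """
      deadline = time.monotonic() + timeout
      while True:
          server = get_server()
          # JSON ``null`` is decoded to ``None`` by Python JSON parsers.
          if server["OS-EXT-STS:task_state"] is None:
              return server
          if time.monotonic() >= deadline:
              raise TimeoutError("server action did not finish in time")
          sleep(interval)

A ``null`` task state only says the action is over, not that it worked, so the
caller still needs to inspect the returned server.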

If a server action fails and the server status changes to ``ERROR``, an
:ref:`instance fault <instance-fault>` will be shown with the server details.

The `os-instance-actions API`_ allows end users to list the outcome of
server actions, referencing the requested action by request ID. This is useful
when an action fails and the server status does not change to ``ERROR``.

To illustrate, consider a server (vm1) created with flavor ``m1.tiny``:

.. code-block:: console

  $ openstack server create --flavor m1.tiny --image cirros-0.4.0-x86_64-disk --wait vm1
  +-----------------------------+-----------------------------------------------------------------+
  | Field                       | Value                                                           |
  +-----------------------------+-----------------------------------------------------------------+
  | OS-DCF:diskConfig           | MANUAL                                                          |
  | OS-EXT-AZ:availability_zone | nova                                                            |
  | OS-EXT-STS:power_state      | Running                                                         |
  | OS-EXT-STS:task_state       | None                                                            |
  | OS-EXT-STS:vm_state         | active                                                          |
  | OS-SRV-USG:launched_at      | 2019-12-02T19:14:48.000000                                      |
  | OS-SRV-USG:terminated_at    | None                                                            |
  | accessIPv4                  |                                                                 |
  | accessIPv6                  |                                                                 |
  | addresses                   | private=10.0.0.60, fda0:e0c4:2764:0:f816:3eff:fe03:806          |
  | adminPass                   | NgascCr3dYo4                                                    |
  | config_drive                |                                                                 |
  | created                     | 2019-12-02T19:14:42Z                                            |
  | flavor                      | m1.tiny (1)                                                     |
  | hostId                      | 22e88bec09a7e33606348fce0abac0ebbbe091a35e29db1498ec4e14        |
  | id                          | 344174b8-34fd-4017-ae29-b9084dcf3861                            |
  | image                       | cirros-0.4.0-x86_64-disk (cce5e6d6-d359-4152-b277-1b4f1871557f) |
  | key_name                    | None                                                            |
  | name                        | vm1                                                             |
  | progress                    | 0                                                               |
  | project_id                  | b22597ea961545f3bde1b2ede0bd5b91                                |
  | properties                  |                                                                 |
  | security_groups             | name='default'                                                  |
  | status                      | ACTIVE                                                          |
  | updated                     | 2019-12-02T19:14:49Z                                            |
  | user_id                     | 046033fb3f824550999752b6525adbac                                |
  | volumes_attached            |                                                                 |
  +-----------------------------+-----------------------------------------------------------------+

The owner of the server then tries to resize the server to flavor ``m1.small``,
which fails because there are no hosts available on which to resize the server:

.. code-block:: console

  $ openstack server resize --flavor m1.small --wait vm1
  Complete

Despite the openstack command saying the operation completed, the server shows
the original ``m1.tiny`` flavor and the status is not ``VERIFY_RESIZE``:

.. code-block:: console

  $ openstack server show vm1 -f value -c status -c flavor
  m1.tiny (1)
  ACTIVE

Since the status is not ``ERROR``, there is no ``fault`` field in the server
details, so we find the details by listing the events for the server:

.. code-block:: console

  $ openstack server event list vm1
  +------------------------------------------+--------------------------------------+--------+----------------------------+
  | Request ID                               | Server ID                            | Action | Start Time                 |
  +------------------------------------------+--------------------------------------+--------+----------------------------+
  | req-ea1b0dfc-3186-42a9-84ff-c4f4fb130fae | 344174b8-34fd-4017-ae29-b9084dcf3861 | resize | 2019-12-02T19:15:35.000000 |
  | req-4cdc4c93-0668-4ae6-98c8-a0a5fcc63d39 | 344174b8-34fd-4017-ae29-b9084dcf3861 | create | 2019-12-02T19:14:42.000000 |
  +------------------------------------------+--------------------------------------+--------+----------------------------+

To see details about the ``resize`` action, we use the Request ID for that
action:

.. code-block:: console

  $ openstack server event show vm1 req-ea1b0dfc-3186-42a9-84ff-c4f4fb130fae
  +---------------+------------------------------------------+
  | Field         | Value                                    |
  +---------------+------------------------------------------+
  | action        | resize                                   |
  | instance_uuid | 344174b8-34fd-4017-ae29-b9084dcf3861     |
  | message       | Error                                    |
  | project_id    | b22597ea961545f3bde1b2ede0bd5b91         |
  | request_id    | req-ea1b0dfc-3186-42a9-84ff-c4f4fb130fae |
  | start_time    | 2019-12-02T19:15:35.000000               |
  | user_id       | 046033fb3f824550999752b6525adbac         |
  +---------------+------------------------------------------+

We see the message is "Error" but are not sure what failed. By default, the
event details for an action are not shown to users without the admin role
before microversion 2.51, so request microversion 2.51 to see the events (the
``events`` field is JSON-formatted here for readability):

.. code-block:: console

  $ openstack --os-compute-api-version 2.51 server event show vm1 req-ea1b0dfc-3186-42a9-84ff-c4f4fb130fae -f json -c events
  {
    "events": [
      {
        "event": "cold_migrate",
        "start_time": "2019-12-02T19:15:35.000000",
        "finish_time": "2019-12-02T19:15:36.000000",
        "result": "Error"
      },
      {
        "event": "conductor_migrate_server",
        "start_time": "2019-12-02T19:15:35.000000",
        "finish_time": "2019-12-02T19:15:36.000000",
        "result": "Error"
      }
    ]
  }
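
When many actions need checking, the same JSON can be filtered by a script.
The following Python sketch, which assumes the ``-f json -c events`` output
shape shown above and uses a hypothetical helper name, keeps only the events
whose ``result`` is ``Error``.

.. code-block:: python

  import json


  def failed_events(action):
      """Return the events whose ``result`` is ``Error``."""
      return [event for event in action.get("events", [])
              if event.get("result") == "Error"]


  # Output of the ``server event show`` command above.
  output = json.loads("""
  {
    "events": [
      {
        "event": "cold_migrate",
        "start_time": "2019-12-02T19:15:35.000000",
        "finish_time": "2019-12-02T19:15:36.000000",
        "result": "Error"
      },
      {
        "event": "conductor_migrate_server",
        "start_time": "2019-12-02T19:15:35.000000",
        "finish_time": "2019-12-02T19:15:36.000000",
        "result": "Error"
      }
    ]
  }
  """)

  for event in failed_events(output):
      print(event["event"], event["finish_time"])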

With the default policy configuration, a user with the admin role can see a
``traceback`` for each failed event, just like with an instance fault:

.. code-block:: console

  $ source openrc admin admin
  $ openstack --os-compute-api-version 2.51 server event show 344174b8-34fd-4017-ae29-b9084dcf3861 req-ea1b0dfc-3186-42a9-84ff-c4f4fb130fae -f json -c events
  {
    "events": [
      {
        "event": "cold_migrate",
        "start_time": "2019-12-02T19:15:35.000000",
        "finish_time": "2019-12-02T19:15:36.000000",
        "result": "Error",
        "traceback": "  File \"/opt/stack/nova/nova/conductor/manager.py\",
        line 301, in migrate_server\n    host_list)\n
        File \"/opt/stack/nova/nova/conductor/manager.py\", line 367, in
        _cold_migrate\n    raise exception.NoValidHost(reason=msg)\n"
      },
      {
        "event": "conductor_migrate_server",
        "start_time": "2019-12-02T19:15:35.000000",
        "finish_time": "2019-12-02T19:15:36.000000",
        "result": "Error",
        "traceback": "  File \"/opt/stack/nova/nova/compute/utils.py\",
        line 1410, in decorated_function\n    return function(self, context,
        *args, **kwargs)\n  File \"/opt/stack/nova/nova/conductor/manager.py\",
        line 301, in migrate_server\n    host_list)\n
        File \"/opt/stack/nova/nova/conductor/manager.py\", line 367, in
        _cold_migrate\n    raise exception.NoValidHost(reason=msg)\n"
      }
    ]
  }

.. _server action APIs: https://docs.openstack.org/api-ref/compute/#servers-run-an-action-servers-action
.. _os-instance-actions API: https://docs.openstack.org/api-ref/compute/#servers-actions-servers-os-instance-actions

Logs
----

All logs on the system, by default, include the global request ID and
the local request ID when available. This allows an administrator to
track the API request processing as it transitions between all the
different nova services or between nova and other component services
called by nova during that request.

When nova services receive the local request IDs of other components in the
``X-Openstack-Request-Id`` header, the local request IDs are output to logs
along with the local request IDs of nova services.

.. tip::

   If a session client is used in the client library, set the
   ``keystoneauth`` log level to ``DEBUG``. If not, set the log level of the
   client library package (for example, ``glanceclient`` or ``cinderclient``)
   to ``DEBUG``.

Sample log output is provided below.
In this example, nova is using local request ID
``req-034279a7-f2dd-40ff-9c93-75768fda494d``,
while neutron is using local request ID
``req-39b315da-e1eb-4ab5-a45b-3f2dbdaba787``::

  Jun 19 09:16:34 devstack-master nova-compute[27857]: DEBUG keystoneauth.session [None req-034279a7-f2dd-40ff-9c93-75768fda494d admin admin] POST call to network for http://10.0.2.15:9696/v2.0/ports used request id req-39b315da-e1eb-4ab5-a45b-3f2dbdaba787 {{(pid=27857) request /usr/local/lib/python2.7/dist-packages/keystoneauth1/session.py:640}}

.. note::

   The local request IDs are useful to make 'call graphs'.
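
One way to collect the edges of such a call graph is to pair the two request
IDs that appear on each ``keystoneauth`` debug line. The Python sketch below
assumes the log line layout of the sample above; the regular expression and
the helper name are illustrative and may need adjusting for other log formats.

.. code-block:: python

  import re

  REQ_ID = r"req-[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}"

  # The caller's ID sits in the bracketed request context; the callee's ID
  # follows the "used request id" text, as in the sample log line above.
  EDGE = re.compile(
      r"\[\S+ (?P<caller>" + REQ_ID + r")[^\]]*\].*"
      r"used request id (?P<callee>" + REQ_ID + r")"
  )


  def call_edge(line):
      """Return (caller_id, callee_id) for a matching log line, else None."""
      match = EDGE.search(line)
      if match is None:
          return None
      return match.group("caller"), match.group("callee")


  sample = (
      "Jun 19 09:16:34 devstack-master nova-compute[27857]: "
      "DEBUG keystoneauth.session "
      "[None req-034279a7-f2dd-40ff-9c93-75768fda494d admin admin] "
      "POST call to network for http://10.0.2.15:9696/v2.0/ports "
      "used request id req-39b315da-e1eb-4ab5-a45b-3f2dbdaba787"
  )
  print(call_edge(sample))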

.. _instance-fault:

Instance Faults
---------------

Nova often adds an instance fault DB entry for an exception that happens
while processing an API request. This entry often includes more
administrator-focused information, such as a stack trace. For a server with
status ``ERROR`` or ``DELETED``, a ``GET /servers/{server_id}`` request will
include a ``fault`` object in the response body for the ``server`` resource.
For example::

  GET https://10.211.2.122/compute/v2.1/servers/c76a7603-95be-4368-87e9-7b9b89fb1d7e
  {
     "server": {
        "id": "c76a7603-95be-4368-87e9-7b9b89fb1d7e",
        "fault": {
           "created": "2018-04-10T13:49:40Z",
           "message": "No valid host was found.",
           "code": 500
        },
        "status": "ERROR",
        ...
     }
  }

Notifications
-------------

In many cases, notifications that describe the error are also emitted.
This is an administrator-focused API that works best when treated as
structured logging.

.. _synchronous_faults:

Synchronous Faults
==================

If an error occurs while processing your API request, you get a non-2xx
API status code. The system also returns additional
information about the fault in the body of the response.


**Example: Fault: JSON response**

.. code::

    {
       "itemNotFound":{
          "code": 404,
          "message":"Aggregate agg_h1 could not be found."
       }
    }

The error ``code`` is returned in the body of the response for convenience.
The ``message`` section returns a human-readable message that is appropriate
for display to the end user. The ``details`` section is optional and may
contain information, such as a stack trace, to assist in tracking
down an error. The ``details`` section might or might not be appropriate for
display to an end user.

The root element of the fault (such as ``computeFault``) might change
depending on the type of error.

For more information on the possible error codes, see:
http://specs.openstack.org/openstack/api-wg/guidelines/http/response-codes.html
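
Because the root element varies, a client is better off not hard-coding its
name. The following Python sketch, with a hypothetical helper name, reads
whichever single root element is present and returns the fields described
above; ``details`` comes back as ``None`` when the optional section is absent.

.. code-block:: python

  import json


  def parse_fault(body):
      """Return (root, code, message, details) from a fault response body."""
      payload = json.loads(body)
      if len(payload) != 1:
          raise ValueError("expected exactly one root element")
      # Take the only key instead of assuming a particular fault name.
      root, fault = next(iter(payload.items()))
      return (root, fault.get("code"), fault.get("message"),
              fault.get("details"))


  body = """
  {
     "itemNotFound": {
        "code": 404,
        "message": "Aggregate agg_h1 could not be found."
     }
  }
  """
  print(parse_fault(body))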

Asynchronous Faults
===================

An error may occur in the background while a server is being built or while a
server is executing an action.

In these cases, the server is usually placed in an ``ERROR`` state. For some
operations, like resize, it is possible that the operation fails but
the instance is gracefully returned to the state it was in before the
operation was attempted. In both of these cases, you should be able to find
out more from the `Server Actions`_ API described above.

When a server is placed into an ``ERROR`` state, a fault is embedded in the
offending server. Note that these asynchronous faults follow the same format
as the synchronous ones. The fault contains an error code, a human-readable
message, and optional details about the error. Asynchronous
faults may also contain a ``created`` timestamp that specifies when the fault
occurred.


**Example: Server in error state: JSON response**

.. code::

    {
        "server": {
            "id": "52415800-8b69-11e0-9b19-734f0000ffff",
            "tenant_id": "1234",
            "user_id": "5678",
            "name": "sample-server",
            "created": "2010-08-10T12:00:00Z",
            "hostId": "e4d909c290d0fb1ca068ffafff22cbd0",
            "status": "ERROR",
            "progress": 66,
            "image" : {
                "id": "52415800-8b69-11e0-9b19-734f6f007777"
            },
            "flavor" : {
                "id": "52415800-8b69-11e0-9b19-734f216543fd"
            },
            "fault" : {
                "code" : 500,
                "created": "2010-08-10T11:59:59Z",
                "message": "No valid host was found. There are not enough hosts available.",
                "details": [snip]
            },
            "links": [
                {
                    "rel": "self",
                    "href": "http://servers.api.openstack.org/v2/1234/servers/52415800-8b69-11e0-9b19-734f000004d2"
                },
                {
                    "rel": "bookmark",
                    "href": "http://servers.api.openstack.org/1234/servers/52415800-8b69-11e0-9b19-734f000004d2"
                }
            ]
        }
    }
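
As a sketch of how a client might surface such a fault, the Python helper
below (a hypothetical name, not part of any client library) summarizes the
``fault`` object of a ``server`` dict and returns ``None`` when the server
carries no fault. The sample data is trimmed from the response above.

.. code-block:: python

  def describe_fault(server):
      """Summarize the ``fault`` embedded in a ``server`` dict, if any."""
      fault = server.get("fault")
      if fault is None:
          return None
      summary = "{} {}".format(fault.get("code"), fault.get("message"))
      # ``created`` is optional, so only append it when present.
      created = fault.get("created")
      if created:
          summary += " (at {})".format(created)
      return summary


  server = {
      "status": "ERROR",
      "fault": {
          "code": 500,
          "created": "2010-08-10T11:59:59Z",
          "message": "No valid host was found. "
                     "There are not enough hosts available.",
      },
  }
  print(describe_fault(server))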