author Dmitry Tantsur <dtantsur@protonmail.com> 2020-12-10 17:51:05 +0100
committer Dmitry Tantsur <dtantsur@protonmail.com> 2020-12-10 17:58:18 +0100
commit 8a2c715a0a6677b2a32f9319c1227591b14bdfa5
tree 11345e03f6b8f6627cf8d0c88d55916aedca5880
parent 42bf964c8cf747a0b6b82243cbf15a3630aa3a6e
Add TLS troubleshooting guide entry
Change-Id: Ied66562bb2475513ddb8c712dedc5f50fc6cad4f
-rw-r--r-- doc/source/admin/troubleshooting.rst 51
1 files changed, 51 insertions, 0 deletions
diff --git a/doc/source/admin/troubleshooting.rst b/doc/source/admin/troubleshooting.rst
index e774bbc3e..2ddd22cfc 100644
--- a/doc/source/admin/troubleshooting.rst
+++ b/doc/source/admin/troubleshooting.rst
@@ -718,3 +718,54 @@ or vendor supplied images. Centos, Ubuntu, Fedora, and Debian all publish
operating system images which do generally include drivers and firmware for
physical hardware. Many of these published "cloud" images, also support
auto-configuration of networking AND population of user keys.
+
+Issues with autoconfigured TLS
+==============================
+
+These issues manifest as an error in the ``ironic-conductor`` logs similar to
+the following (lines are wrapped for readability)::
+
+ ERROR ironic.drivers.modules.agent_client [-]
+ Failed to connect to the agent running on node d7c322f0-0354-4008-92b4-f49fb2201001
+ for invoking command clean.get_clean_steps. Error:
+ HTTPSConnectionPool(host='192.168.123.126', port=9999): Max retries exceeded with url:
+ /v1/commands/?wait=true&agent_token=<token> (Caused by
+ SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),)):
+ requests.exceptions.SSLError: HTTPSConnectionPool(host='192.168.123.126', port=9999):
+ Max retries exceeded with url: /v1/commands/?wait=true&agent_token=<token>
+ (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),))
+
+The Bare Metal service cannot verify its connection to the ramdisk using the TLS
+certificate that the ramdisk provided on its first heartbeat. The certificate is
+stored in ``/var/lib/ironic/certificates/<node UUID>.crt``.
+
+You can try connecting to the ramdisk using the IP address in the log message::
+
+ curl -vL https://<IP address>:9999/v1/commands \
+ --cacert /var/lib/ironic/certificates/<node UUID>.crt
+
+You can get detailed information about the certificate using OpenSSL::
+
+ openssl x509 -text -noout -in /var/lib/ironic/certificates/<node UUID>.crt
+
+Clock skew
+----------
+
+One possible source of the problem is a discrepancy between the hardware
+clock on the node and the time on the machine running the Bare Metal service.
+It can be detected by comparing the ``Not Before`` field in the ``openssl``
+output with the timestamp of a log message.
+
+The recommended solution is to enable NTP support in ironic-python-agent by
+passing the ``ipa-ntp-server`` argument with the address of an NTP server
+reachable by the node.
+
+If this is not possible, you need to ensure that the hardware clock on the
+machine is correct. Keep in mind a potential issue with time zones: the ability
+to store a time zone in hardware is fairly recent and may not be available.
+Since ironic-python-agent is likely operating in UTC, the hardware clock should
+also be set to UTC.
+
+.. note::
+ Microsoft Windows uses local time by default, so a machine that has
+ previously run Windows will likely have the wrong time.