author     Zuul <zuul@review.opendev.org>  2022-07-20 08:23:55 +0000
committer  Gerrit Code Review <review@openstack.org>  2022-07-20 08:23:55 +0000
commit     21b21a5f15bce4871f6af0dc06ebf50d0e9b0967 (patch)
tree       b39fc6a45ecc4aeac290a91a71e2ff99534abe7a
parent     6a1334a0683121dc94390c8a57aec1c3af208398 (diff)
parent     beb7484858d56ef34699895412881945c5507c81 (diff)
Merge "Guard shared device/cluster filesystems"
-rw-r--r--  doc/source/admin/hardware_managers.rst                                               20
-rw-r--r--  doc/source/admin/troubleshooting.rst                                                 96
-rw-r--r--  doc/source/contributor/hardware_managers.rst                                         19
-rw-r--r--  ironic_python_agent/config.py                                                        11
-rw-r--r--  ironic_python_agent/errors.py                                                        19
-rw-r--r--  ironic_python_agent/hardware.py                                                     121
-rw-r--r--  ironic_python_agent/tests/unit/test_hardware.py                                     202
-rw-r--r--  releasenotes/notes/prevent-deletion-of-shared-disk-filesystems-4c17c7666d2fe3bc.yaml  20
8 files changed, 499 insertions, 9 deletions
diff --git a/doc/source/admin/hardware_managers.rst b/doc/source/admin/hardware_managers.rst
index 290fcf0f..e51093f0 100644
--- a/doc/source/admin/hardware_managers.rst
+++ b/doc/source/admin/hardware_managers.rst
@@ -103,3 +103,23 @@ Clean steps
Delete the RAID configuration. This step belongs to the ``raid`` interface
and must be used through the :ironic-doc:`ironic RAID feature
<admin/raid.html>`.
+
+Cleaning safeguards
+-------------------
+
+The stock hardware manager contains a number of safeguards to prevent
+unsafe conditions from occurring.
+
+Shared Disk Cluster Filesystems
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When commonly used shared disk cluster filesystems are detected, cleaning
+processes in the stock hardware manager methods abort prior to destroying
+the contents of the disk.
+
+These filesystems include IBM General Parallel File System (GPFS),
+VMware Virtual Machine File System (VMFS), and Red Hat Global File System
+(GFS2).
+
+For information on troubleshooting and disabling this check,
+see :doc:`/admin/troubleshooting`.
diff --git a/doc/source/admin/troubleshooting.rst b/doc/source/admin/troubleshooting.rst
index 6e1a81c8..9e52f7d7 100644
--- a/doc/source/admin/troubleshooting.rst
+++ b/doc/source/admin/troubleshooting.rst
@@ -24,7 +24,7 @@ image boots. Kernel command line parameters are used to do this.
dynamic-login element example:
-- Add ``sshkey="ssh-rsa BBA1..."`` to pxe_append_params setting in
+- Add ``sshkey="ssh-rsa BBA1..."`` to kernel_append_params setting in
the ``ironic.conf`` file
- Restart the ironic-conductor with the command
``service ironic-conductor restart``
@@ -76,7 +76,7 @@ are used to do this.
dynamic-login element example::
Generate a ENCRYPTED_PASSWORD with the openssl passwd -1 command
- Add rootpwd="$ENCRYPTED_PASSWORD" value on the pxe_append_params setting in /etc/ironic/ironic.conf
+ Add rootpwd="$ENCRYPTED_PASSWORD" value on the kernel_append_params setting in /etc/ironic/ironic.conf
Restart the ironic-conductor with the command service ironic-conductor restart
Users can also be added to DIB built IPA images with the devuser element [1]_
@@ -118,7 +118,7 @@ Set IPA to debug logging
Debug logging can be enabled a several different ways. The easiest way is to
add ``ipa-debug=1`` to the kernel command line. To do this:
-- Append ``ipa-debug=1`` to the pxe_append_params setting in the
+- Append ``ipa-debug=1`` to the kernel_append_params setting in the
``ironic.conf`` file
- Restart the ironic-conductor with the command
``service ironic-conductor restart``
@@ -161,6 +161,96 @@ If the system uses systemd then systemctl can be used to restart the service::
sudo systemctl restart ironic-python-agent.service
+Cleaning halted with ProtectedDeviceError
+=========================================
+
+The IPA service has halted cleaning because one of the block devices within
+or attached to the bare metal node contains a class of filesystem which
+**MAY** cause irreparable harm to a potentially running cluster if
+accidentally removed.
+
+These filesystems *may* be used only for local storage, and as a result may
+be safe to erase. However, a shared block device may be in use, such as a
+device supplied via a Storage Area Network using protocols such as iSCSI or
+FibreChannel, and the Host Bus Adapter (HBA) may not be an entirely
+"detectable" entity given the hardware marketplace and aspects such as
+"SmartNICs" and Converged Network Adapters with specific offload functions
+to support standards like "NVMe over Fabrics" (NVMe-oF).
+
+By default, the agent will prevent these filesystems from being deleted and
+will halt the cleaning process when detected. The cleaning process can be
+re-triggered via Ironic's state machine once one of the documented settings
+has been used to notify the agent that no action is required.
+
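+If cleaning has already failed with this error, one possible way to
+re-trigger it with the ``baremetal`` CLI, after applying one of the settings
+documented below, is to return the node to the manageable state and then
+provide it again (a sketch assuming automated cleaning is enabled)::
+
+  baremetal node manage <node>
+  baremetal node provide <node>
+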
+What filesystems are looked for
+-------------------------------
+
++-------------------------------------------+
+| IBM General Parallel Filesystem |
++-------------------------------------------+
+| Red Hat Global Filesystem 2 |
++-------------------------------------------+
+| VMware Virtual Machine FileSystem (VMFS) |
++-------------------------------------------+
+
+I'm okay with deleting, how do I tell IPA to clean the disk(s)?
+---------------------------------------------------------------
+
+Four potential ways exist to signal this to IPA. Please note, all of these
+options require either access to the node in Ironic's API or the ability to
+modify the Ironic configuration.
+
+Via Ironic
+~~~~~~~~~~
+
+.. note:: This option requires a version of Ironic which is recent enough
+ to understand and explicitly provide this option to the Agent.
+
+Inform Ironic to provide the option to the Agent::
+
+ baremetal node set <node> --driver-info wipe_special_filesystems=True
+
+Via a node's kernel_append_params setting
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This may be set at the node level by utilizing the ``kernel_append_params``
+override setting. Example::
+
+ baremetal node set <node> --driver-info kernel_append_params="ipa-guard-special-filesystems=False"
+
+Alternatively, if you wish to set this only once, you may use
+the ``instance_info`` field, which is wiped upon teardown of the node.
+Example::
+
+ baremetal node set <node> --instance-info kernel_append_params="ipa-guard-special-filesystems=False"
+
+Via Ironic's Boot time PXE parameters (Globally)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Globally, this setting may be passed by modifying the ``ironic.conf``
+configuration file on your cluster and adding the
+``ipa-guard-special-filesystems=False`` string to the
+``[pxe]kernel_append_params`` parameter.
+
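+For example, the relevant section of ``ironic.conf`` might resemble the
+following sketch, with the flag appended to whatever parameters you already
+pass::
+
+  [pxe]
+  kernel_append_params = <existing parameters> ipa-guard-special-filesystems=False
+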
+.. warning::
+ If you're running a multi-conductor deployment, all of your ``ironic.conf``
+ configuration files will need to be updated to match.
+
+Via Ramdisk configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This option requires modifying the ramdisk, and is the most complex, but may
+be advisable if you have a mixed environment cluster where shared clustered
+filesystems may be a concern on some machines, but not others.
+
+.. warning::
+ This requires rebuilding your agent ramdisk, and modifying the embedded
+ configuration file for the ironic-python-agent. If you're confused at all
+ by this statement, this option is not for you.
+
+Edit ``/etc/ironic_python_agent/ironic_python_agent.conf`` and set the
+parameter ``[DEFAULT]guard_special_filesystems`` to ``False``.
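+
+For example, a minimal sketch of the embedded configuration file::
+
+  [DEFAULT]
+  guard_special_filesystems = False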
+
References
==========
diff --git a/doc/source/contributor/hardware_managers.rst b/doc/source/contributor/hardware_managers.rst
index ee3d6357..f0f78f7d 100644
--- a/doc/source/contributor/hardware_managers.rst
+++ b/doc/source/contributor/hardware_managers.rst
@@ -10,6 +10,13 @@ deploying your own hardware manager.
IPA ships with :doc:`GenericHardwareManager </admin/hardware_managers>`, which
implements basic cleaning and deployment methods compatible with most hardware.
+.. warning::
+ Some functionality inherent in the stock hardware manager cleaning methods
+ may be useful in custom hardware managers, but should not automatically be
+ expected to work in custom managers as well. Examples of this are clustered
+ filesystem protections and cleaning method fallback logic. Custom hardware
+ manager maintainers should be mindful when overriding the stock methods.
+
How are methods executed on HardwareManagers?
---------------------------------------------
Methods that modify hardware are dispatched to each hardware manager in
@@ -149,6 +156,18 @@ function with extra parameters.
be set to whichever manager has a higher hardware support level and then
use the higher priority in the case of a tie.
+In some cases, it may be necessary to create a customized cleaning step to
+implement a particular pattern of behavior. Those doing such work may want to
+leverage file system safety checks, which are part of the stock hardware
+managers.
+
+.. code-block:: python
+
+ def custom_erase_devices(self, node, ports):
+ for dev in determine_my_devices_to_erase():
+ hardware.safety_check_block_device(node, dev.name)
+ my_special_cleaning(dev.name)
+
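+If ``safety_check_block_device`` detects one of the guarded filesystems, it
+raises ``ProtectedDeviceError``; letting that exception propagate out of the
+custom step halts cleaning in the same manner as the stock erase methods.
+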
Custom HardwareManagers and Deploying
-------------------------------------
diff --git a/ironic_python_agent/config.py b/ironic_python_agent/config.py
index 8901c45a..9251a3e3 100644
--- a/ironic_python_agent/config.py
+++ b/ironic_python_agent/config.py
@@ -315,6 +315,17 @@ cli_opts = [
min=0, max=99, # 100 is when IPA is booted
help='Priority of the inject_files deploy step (disabled '
'by default), an integer between 1 and .'),
+ cfg.BoolOpt('guard-special-filesystems',
+ default=APARAMS.get('ipa-guard-special-filesystems', True),
+ help='Guard "special" shared device filesystems from '
+ 'cleaning by the stock hardware manager\'s cleaning '
+ 'methods. If one of these filesystems is detected '
+ 'during cleaning, the cleaning process will be aborted '
+ 'and infrastructure operator intervention may be '
+ 'required as this option is intended to prevent '
+ 'cleaning from inadvertently destroying a running '
+ 'cluster which may be visible over a storage fabric '
+ 'such as FibreChannel.'),
]
CONF.register_cli_opts(cli_opts)
diff --git a/ironic_python_agent/errors.py b/ironic_python_agent/errors.py
index 1f97c0e2..a0994207 100644
--- a/ironic_python_agent/errors.py
+++ b/ironic_python_agent/errors.py
@@ -348,3 +348,22 @@ class HeartbeatConnectionError(IronicAPIError):
def __init__(self, details):
super(HeartbeatConnectionError, self).__init__(details)
+
+
+class ProtectedDeviceError(CleaningError):
+ """Error raised when a cleaning is halted due to a protected device."""
+
+ message = 'Protected device located, cleaning aborted.'
+
+ def __init__(self, device, what):
+ details = ('Protected %(what)s located on device %(device)s. '
+ 'This volume or contents may be a shared block device. '
+ 'Please consult your storage administrator, and restart '
+ 'cleaning after either detaching the volumes, or '
+ 'instructing IPA to not protect contents. See Ironic '
+ 'Python Agent documentation for more information.' %
+ {'what': what,
+ 'device': device})
+
+ self.message = details
+ super(CleaningError, self).__init__(details)
diff --git a/ironic_python_agent/hardware.py b/ironic_python_agent/hardware.py
index 0f7e4f82..dfcce6f8 100644
--- a/ironic_python_agent/hardware.py
+++ b/ironic_python_agent/hardware.py
@@ -924,6 +924,10 @@ class HardwareManager(object, metaclass=abc.ABCMeta):
:param node: Ironic node object
:param ports: list of Ironic port objects
+ :raises: ProtectedDeviceError if a device has been identified which
+ may require manual intervention due to its contents and the
+ operational risk involved, as it could also be a sign
+ of an environmental misconfiguration.
:return: a dictionary in the form {device.name: erasure output}
"""
erase_results = {}
@@ -937,6 +941,7 @@ class HardwareManager(object, metaclass=abc.ABCMeta):
thread_pool = ThreadPool(min(max_pool_size, len(block_devices)))
for block_device in block_devices:
params = {'node': node, 'block_device': block_device}
+ safety_check_block_device(node, block_device.name)
erase_results[block_device.name] = thread_pool.apply_async(
dispatch_to_managers, ('erase_block_device',), params)
thread_pool.close()
@@ -1541,6 +1546,10 @@ class GenericHardwareManager(HardwareManager):
:param ports: list of Ironic port objects
:raises BlockDeviceEraseError: when there's an error erasing the
block device
+ :raises: ProtectedDeviceError if a device has been identified which
+ may require manual intervention due to its contents and the
+ operational risk involved, as it could also be a sign
+ of an environmental misconfiguration.
"""
block_devices = self.list_block_devices(include_partitions=True)
# NOTE(coreywright): Reverse sort by device name so a partition (eg
@@ -1549,6 +1558,7 @@ class GenericHardwareManager(HardwareManager):
block_devices.sort(key=lambda dev: dev.name, reverse=True)
erase_errors = {}
for dev in self._list_erasable_devices():
+ safety_check_block_device(node, dev.name)
try:
disk_utils.destroy_disk_metadata(dev.name, node['uuid'])
except processutils.ProcessExecutionError as e:
@@ -1572,6 +1582,10 @@ class GenericHardwareManager(HardwareManager):
:param ports: list of Ironic port objects
:raises BlockDeviceEraseError: when there's an error erasing the
block device
+ :raises: ProtectedDeviceError if a device has been identified which
+ may require manual intervention due to its contents and the
+ operational risk involved, as it could also be a sign
+ of an environmental misconfiguration.
"""
erase_errors = {}
info = node.get('driver_internal_info', {})
@@ -1579,6 +1593,7 @@ class GenericHardwareManager(HardwareManager):
LOG.debug("No erasable devices have been found.")
return
for dev in self._list_erasable_devices():
+ safety_check_block_device(node, dev.name)
try:
if self._is_nvme(dev):
execute_nvme_erase = info.get(
@@ -2957,3 +2972,109 @@ def get_multipath_status():
# as if we directly try and work with the global var, we will be racing
# tests endlessly.
return MULTIPATH_ENABLED
+
+
+def safety_check_block_device(node, device):
+ """Performs safety checking of a block device before destroying.
+
+ In order to guard against destruction of file systems such as
+ shared-disk file systems
+ (https://en.wikipedia.org/wiki/Clustered_file_system#SHARED-DISK)
+ or similar filesystems where multiple distinct computers may have
+ unlocked concurrent IO access to the entire block device or
+ SAN Logical Unit Number, we need to evaluate, and block cleaning
+ from occurring on these filesystems *unless* we have been explicitly
+ configured to do so.
+
+ This is because cleaning is an intentionally destructive operation,
+ and once started against such a device, given the complexities of
+ shared disk clustered filesystems where concurrent access is a design
+ element, in all likelihood the entire cluster can be negatively
+ impacted, and an operator will be forced to recover from snapshots
+ and/or backups of the volume's contents.
+
+ :param node: A node, or cached node object.
+ :param device: String representing the path to the block
+ device to be checked.
+ :raises: ProtectedDeviceError when a device is identified with
+ one of these known clustered filesystems, and the overall
+ settings have not indicated for the agent to skip such
+ safety checks.
+ """
+
+ # NOTE(TheJulia): While this seems super rare, I found out after this
+ # thread of discussion started that I have customers which have done
+ # this and wiped out SAN volumes and their contents unintentionally
+ # as a result of these filesystems not being guarded.
+ # For those not familiar with shared disk clustered filesystems, think
+ # of it as if you were impacting a Ceph cluster, except you are suddenly
+ # removing the underlying disks from the OSD, and the entire cluster
+ # goes down.
+
+ if not CONF.guard_special_filesystems:
+ return
+ di_info = node.get('driver_internal_info', {})
+ if not di_info.get('wipe_special_filesystems', True):
+ return
+ report, _e = il_utils.execute('lsblk', '-Pbia',
+ '-oFSTYPE,UUID,PTUUID,PARTTYPE,PARTUUID',
+ device)
+
+ lines = report.splitlines()
+
+ identified_fs_types = []
+ identified_ids = []
+ for line in lines:
+ # Split into KEY=VAL pairs
+ vals = shlex.split(line)
+ if not vals:
+ continue
+ for key, val in (v.split('=', 1) for v in vals):
+ if key == 'FSTYPE':
+ identified_fs_types.append(val)
+ if key in ['UUID', 'PTUUID', 'PARTTYPE', 'PARTUUID']:
+ identified_ids.append(val)
+ # Ignore block types not specified
+
+ _check_for_special_partitions_filesystems(
+ device,
+ identified_ids,
+ identified_fs_types)
+
+
+def _check_for_special_partitions_filesystems(device, ids, fs_types):
+ """Compare supplied IDs, Types to known items, and raise if found.
+
+ :param device: The block device in use, specifically for logging.
+ :param ids: A list of IDs found to check.
+ :param fs_types: A list of FS types found to check.
+ :raises: ProtectedDeviceError should a partition label or metadata
+ be discovered which suggests a shared disk clustered filesystem
+ has been discovered.
+ """
+
+ guarded_ids = {
+ # Apparently GPFS can use shared volumes...
+ '37AFFC90-EF7D-4E96-91C3-2D7AE055B174': 'IBM GPFS Partition',
+ # Shared volume parallel filesystem
+ 'AA31E02A-400F-11DB-9590-000C2911D1B8': 'VMware VMFS Partition (GPT)',
+ '0xfb': 'VMware VMFS Partition (MBR)',
+ }
+ for key, value in guarded_ids.items():
+ for id_value in ids:
+ if key == id_value:
+ raise errors.ProtectedDeviceError(
+ device=device,
+ what=value)
+
+ guarded_fs_types = {
+ 'gfs2': 'Red Hat Global File System 2',
+ }
+
+ for key, value in guarded_fs_types.items():
+ for fs in fs_types:
+ if key == fs:
+ raise errors.ProtectedDeviceError(
+ device=device,
+ what=value)
diff --git a/ironic_python_agent/tests/unit/test_hardware.py b/ironic_python_agent/tests/unit/test_hardware.py
index a610eb2f..21349050 100644
--- a/ironic_python_agent/tests/unit/test_hardware.py
+++ b/ironic_python_agent/tests/unit/test_hardware.py
@@ -1710,10 +1710,12 @@ class TestGenericHardwareManager(base.IronicAgentTest):
mocked_listdir.assert_has_calls(expected_calls)
mocked_mpath.assert_called_once_with()
+ @mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
@mock.patch.object(hardware, 'ThreadPool', autospec=True)
@mock.patch.object(hardware, 'dispatch_to_managers', autospec=True)
def test_erase_devices_no_parallel_by_default(self, mocked_dispatch,
- mock_threadpool):
+ mock_threadpool,
+ mock_safety_check):
# NOTE(TheJulia): This test was previously more elaborate and
# had a high failure rate on py37 and py38. So instead, lets just
@@ -1732,10 +1734,43 @@ class TestGenericHardwareManager(base.IronicAgentTest):
calls = [mock.call(1)]
self.hardware.erase_devices({}, [])
mock_threadpool.assert_has_calls(calls)
+ mock_safety_check.assert_has_calls([
+ mock.call({}, '/dev/sdj'),
+ mock.call({}, '/dev/hdaa')
+ ])
+
+ @mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
+ @mock.patch.object(hardware, 'ThreadPool', autospec=True)
+ @mock.patch.object(hardware, 'dispatch_to_managers', autospec=True)
+ def test_erase_devices_no_parallel_by_default_protected_device(
+ self, mocked_dispatch,
+ mock_threadpool,
+ mock_safety_check):
+ mock_safety_check.side_effect = errors.ProtectedDeviceError(
+ device='foo',
+ what='bar')
+
+ self.hardware.list_block_devices = mock.Mock()
+
+ self.hardware.list_block_devices.return_value = [
+ hardware.BlockDevice('/dev/sdj', 'big', 1073741824, True),
+ hardware.BlockDevice('/dev/hdaa', 'small', 65535, False),
+ ]
+
+ calls = [mock.call(1)]
+ self.assertRaises(errors.ProtectedDeviceError,
+ self.hardware.erase_devices, {}, [])
+ mock_threadpool.assert_has_calls(calls)
+ mock_safety_check.assert_has_calls([
+ mock.call({}, '/dev/sdj'),
+ ])
+ mock_threadpool.apply_async.assert_not_called()
+ @mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
@mock.patch('multiprocessing.pool.ThreadPool.apply_async', autospec=True)
@mock.patch.object(hardware, 'dispatch_to_managers', autospec=True)
- def test_erase_devices_concurrency(self, mocked_dispatch, mocked_async):
+ def test_erase_devices_concurrency(self, mocked_dispatch, mocked_async,
+ mock_safety_check):
internal_info = self.node['driver_internal_info']
internal_info['disk_erasure_concurrency'] = 10
mocked_dispatch.return_value = 'erased device'
@@ -1760,9 +1795,15 @@ class TestGenericHardwareManager(base.IronicAgentTest):
for dev in (blkdev1, blkdev2)]
mocked_async.assert_has_calls(calls)
self.assertEqual(expected, result)
+ mock_safety_check.assert_has_calls([
+ mock.call(self.node, '/dev/sdj'),
+ mock.call(self.node, '/dev/hdaa'),
+ ])
+ @mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
@mock.patch.object(hardware, 'ThreadPool', autospec=True)
- def test_erase_devices_concurrency_pool_size(self, mocked_pool):
+ def test_erase_devices_concurrency_pool_size(self, mocked_pool,
+ mock_safety_check):
self.hardware.list_block_devices = mock.Mock()
self.hardware.list_block_devices.return_value = [
hardware.BlockDevice('/dev/sdj', 'big', 1073741824, True),
@@ -1782,6 +1823,10 @@ class TestGenericHardwareManager(base.IronicAgentTest):
self.hardware.erase_devices(self.node, [])
mocked_pool.assert_called_with(1)
+ mock_safety_check.assert_has_calls([
+ mock.call(self.node, '/dev/sdj'),
+ mock.call(self.node, '/dev/hdaa'),
+ ])
@mock.patch.object(hardware, 'dispatch_to_managers', autospec=True)
def test_erase_devices_without_disk(self, mocked_dispatch):
@@ -2441,6 +2486,7 @@ class TestGenericHardwareManager(base.IronicAgentTest):
mock.call('/sys/fs/pstore/' + arg) for arg in pstore_entries
])
+ @mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
@mock.patch.object(il_utils, 'execute', autospec=True)
@mock.patch.object(disk_utils, 'destroy_disk_metadata', autospec=True)
@mock.patch.object(hardware.GenericHardwareManager,
@@ -2449,7 +2495,7 @@ class TestGenericHardwareManager(base.IronicAgentTest):
'_list_erasable_devices', autospec=True)
def test_erase_devices_express(
self, mock_list_erasable_devices, mock_nvme_erase,
- mock_destroy_disk_metadata, mock_execute):
+ mock_destroy_disk_metadata, mock_execute, mock_safety_check):
block_devices = [
hardware.BlockDevice('/dev/sda', 'sata', 65535, False),
hardware.BlockDevice('/dev/md0', 'raid-device', 32767, False),
@@ -2466,7 +2512,44 @@ class TestGenericHardwareManager(base.IronicAgentTest):
mock.call('/dev/md0', self.node['uuid'])],
mock_destroy_disk_metadata.call_args_list)
mock_list_erasable_devices.assert_called_with(self.hardware)
+ mock_safety_check.assert_has_calls([
+ mock.call(self.node, '/dev/sda'),
+ mock.call(self.node, '/dev/md0'),
+ mock.call(self.node, '/dev/nvme0n1'),
+ mock.call(self.node, '/dev/nvme1n1')
+ ])
+
+ @mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
+ @mock.patch.object(il_utils, 'execute', autospec=True)
+ @mock.patch.object(disk_utils, 'destroy_disk_metadata', autospec=True)
+ @mock.patch.object(hardware.GenericHardwareManager,
+ '_nvme_erase', autospec=True)
+ @mock.patch.object(hardware.GenericHardwareManager,
+ '_list_erasable_devices', autospec=True)
+ def test_erase_devices_express_stops_on_safety_failure(
+ self, mock_list_erasable_devices, mock_nvme_erase,
+ mock_destroy_disk_metadata, mock_execute, mock_safety_check):
+ mock_safety_check.side_effect = errors.ProtectedDeviceError(
+ device='foo',
+ what='bar')
+ block_devices = [
+ hardware.BlockDevice('/dev/sda', 'sata', 65535, False),
+ hardware.BlockDevice('/dev/md0', 'raid-device', 32767, False),
+ hardware.BlockDevice('/dev/nvme0n1', 'nvme', 32767, False),
+ hardware.BlockDevice('/dev/nvme1n1', 'nvme', 32767, False)
+ ]
+ mock_list_erasable_devices.return_value = list(block_devices)
+
+ self.assertRaises(errors.ProtectedDeviceError,
+ self.hardware.erase_devices_express, self.node, [])
+ mock_nvme_erase.assert_not_called()
+ mock_destroy_disk_metadata.assert_not_called()
+ mock_list_erasable_devices.assert_called_with(self.hardware)
+ mock_safety_check.assert_has_calls([
+ mock.call(self.node, '/dev/sda')
+ ])
+ @mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
@mock.patch.object(il_utils, 'execute', autospec=True)
@mock.patch.object(hardware.GenericHardwareManager,
'_is_virtual_media_device', autospec=True)
@@ -2475,7 +2558,7 @@ class TestGenericHardwareManager(base.IronicAgentTest):
@mock.patch.object(disk_utils, 'destroy_disk_metadata', autospec=True)
def test_erase_devices_metadata(
self, mock_metadata, mock_list_devs, mock__is_vmedia,
- mock_execute):
+ mock_execute, mock_safety_check):
block_devices = [
hardware.BlockDevice('/dev/sr0', 'vmedia', 12345, True),
hardware.BlockDevice('/dev/sdb2', 'raid-member', 32767, False),
@@ -2509,7 +2592,62 @@ class TestGenericHardwareManager(base.IronicAgentTest):
mock.call(self.hardware, block_devices[2]),
mock.call(self.hardware, block_devices[5])],
mock__is_vmedia.call_args_list)
+ mock_safety_check.assert_has_calls([
+ mock.call(self.node, '/dev/sda1'),
+ mock.call(self.node, '/dev/sda'),
+ # This is kind of redundant code/pattern behavior wise
+ # but you never know what someone has done...
+ mock.call(self.node, '/dev/md0')
+ ])
+ @mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
+ @mock.patch.object(il_utils, 'execute', autospec=True)
+ @mock.patch.object(hardware.GenericHardwareManager,
+ '_is_virtual_media_device', autospec=True)
+ @mock.patch.object(hardware.GenericHardwareManager,
+ 'list_block_devices', autospec=True)
+ @mock.patch.object(disk_utils, 'destroy_disk_metadata', autospec=True)
+ def test_erase_devices_metadata_safety_check(
+ self, mock_metadata, mock_list_devs, mock__is_vmedia,
+ mock_execute, mock_safety_check):
+ block_devices = [
+ hardware.BlockDevice('/dev/sr0', 'vmedia', 12345, True),
+ hardware.BlockDevice('/dev/sdb2', 'raid-member', 32767, False),
+ hardware.BlockDevice('/dev/sda', 'small', 65535, False),
+ hardware.BlockDevice('/dev/sda1', '', 32767, False),
+ hardware.BlockDevice('/dev/sda2', 'raid-member', 32767, False),
+ hardware.BlockDevice('/dev/md0', 'raid-device', 32767, False)
+ ]
+ # NOTE(coreywright): Don't return the list, but a copy of it, because
+ # we depend on its elements' order when referencing it later during
+ # verification, but the method under test sorts the list changing it.
+ mock_list_devs.return_value = list(block_devices)
+ mock__is_vmedia.side_effect = lambda _, dev: dev.name == '/dev/sr0'
+ mock_execute.side_effect = [
+ ('sdb2 linux_raid_member host:1 f9978968', ''),
+ ('sda2 linux_raid_member host:1 f9978969', ''),
+ ('sda1', ''), ('sda', ''), ('md0', '')]
+ mock_safety_check.side_effect = [
+ None,
+ errors.ProtectedDeviceError(
+ device='foo',
+ what='bar')
+ ]
+
+ self.assertRaises(errors.ProtectedDeviceError,
+ self.hardware.erase_devices_metadata,
+ self.node, [])
+
+ self.assertEqual([mock.call('/dev/sda1', self.node['uuid'])],
+ mock_metadata.call_args_list)
+ mock_list_devs.assert_called_with(self.hardware,
+ include_partitions=True)
+ mock_safety_check.assert_has_calls([
+ mock.call(self.node, '/dev/sda1'),
+ mock.call(self.node, '/dev/sda'),
+ ])
+
+ @mock.patch.object(hardware, 'safety_check_block_device', autospec=True)
@mock.patch.object(hardware.GenericHardwareManager,
'_is_linux_raid_member', autospec=True)
@mock.patch.object(hardware.GenericHardwareManager,
@@ -2519,7 +2657,7 @@ class TestGenericHardwareManager(base.IronicAgentTest):
@mock.patch.object(disk_utils, 'destroy_disk_metadata', autospec=True)
def test_erase_devices_metadata_error(
self, mock_metadata, mock_list_devs, mock__is_vmedia,
- mock__is_raid_member):
+ mock__is_raid_member, mock_safety_check):
block_devices = [
hardware.BlockDevice('/dev/sda', 'small', 65535, False),
hardware.BlockDevice('/dev/sdb', 'big', 10737418240, True),
@@ -2553,6 +2691,10 @@ class TestGenericHardwareManager(base.IronicAgentTest):
self.assertEqual([mock.call(self.hardware, block_devices[1]),
mock.call(self.hardware, block_devices[0])],
mock__is_vmedia.call_args_list)
+ mock_safety_check.assert_has_calls([
+ mock.call(self.node, '/dev/sdb'),
+ mock.call(self.node, '/dev/sda')
+ ])
@mock.patch.object(il_utils, 'execute', autospec=True)
def test__is_linux_raid_member(self, mocked_execute):
@@ -4991,3 +5133,51 @@ class TestAPIClientSaveAndUse(base.IronicAgentTest):
calls = [mock.call('list_hardware_info'),
mock.call('wait_for_disks')]
mock_dispatch.assert_has_calls(calls)
+
+
+@mock.patch.object(il_utils, 'execute', autospec=True)
+class TestProtectedDiskSafetyChecks(base.IronicAgentTest):
+
+ def test_special_filesystem_guard_not_enabled(self, mock_execute):
+ CONF.set_override('guard_special_filesystems', False)
+ hardware.safety_check_block_device({}, '/dev/foo')
+ mock_execute.assert_not_called()
+
+ def test_special_filesystem_guard_node_indicates_skip(self, mock_execute):
+ node = {
+ 'driver_internal_info': {
+ 'wipe_special_filesystems': False
+ }
+ }
+ mock_execute.return_value = ('', '')
+ hardware.safety_check_block_device(node, '/dev/foo')
+ mock_execute.assert_not_called()
+
+ def test_special_filesystem_guard_enabled_no_results(self, mock_execute):
+ mock_execute.return_value = ('', '')
+ hardware.safety_check_block_device({}, '/dev/foo')
+
+ def test_special_filesystem_guard_raises(self, mock_execute):
+ GFS2 = 'FSTYPE="gfs2"\n'
+ GPFS1 = 'UUID="37AFFC90-EF7D-4E96-91C3-2D7AE055B174"\n'
+ GPFS2 = 'PTUUID="37AFFC90-EF7D-4E96-91C3-2D7AE055B174"\n'
+ GPFS3 = 'PARTTYPE="37AFFC90-EF7D-4E96-91C3-2D7AE055B174"\n'
+ GPFS4 = 'PARTUUID="37AFFC90-EF7D-4E96-91C3-2D7AE055B174"\n'
+ VMFS1 = 'UUID="AA31E02A-400F-11DB-9590-000C2911D1B8"\n'
+ VMFS2 = 'UUID="AA31E02A-400F-11DB-9590-000C2911D1B8"\n'
+ VMFS3 = 'UUID="AA31E02A-400F-11DB-9590-000C2911D1B8"\n'
+ VMFS4 = 'UUID="AA31E02A-400F-11DB-9590-000C2911D1B8"\n'
+ VMFS5 = 'UUID="0xfb"\n'
+ VMFS6 = 'PTUUID="0xfb"\n'
+ VMFS7 = 'PARTTYPE="0xfb"\n'
+ VMFS8 = 'PARTUUID="0xfb"\n'
+
+ expected_failures = [GFS2, GPFS1, GPFS2, GPFS3, GPFS4, VMFS1, VMFS2,
+ VMFS3, VMFS4, VMFS5, VMFS6, VMFS7, VMFS8]
+ for failure in expected_failures:
+ mock_execute.reset_mock()
+ mock_execute.return_value = (failure, '')
+ self.assertRaises(errors.ProtectedDeviceError,
+ hardware.safety_check_block_device,
+ {}, '/dev/foo')
+ self.assertEqual(1, mock_execute.call_count)
diff --git a/releasenotes/notes/prevent-deletion-of-shared-disk-filesystems-4c17c7666d2fe3bc.yaml b/releasenotes/notes/prevent-deletion-of-shared-disk-filesystems-4c17c7666d2fe3bc.yaml
new file mode 100644
index 00000000..c29b3121
--- /dev/null
+++ b/releasenotes/notes/prevent-deletion-of-shared-disk-filesystems-4c17c7666d2fe3bc.yaml
@@ -0,0 +1,20 @@
+---
+fixes:
+ - |
+ Previously when the ``ironic-python-agent`` would undergo erasure of block
+ devices during cleaning, it would automatically attempt to erase the
+ contents of any "Shared Device" clustered filesystems which may be in use
+ by multiple distinct machines over a storage fabric. In particular,
+ IBM GPFS, Red Hat Global File System 2, and VMware Virtual Machine File
+ System (VMFS) are now identified and cleaning is halted. This is important
+ because should an active cluster be using this disk, cleaning could
+ potentially cause the cluster to go down, forcing restoration from backups.
+ Ideally, infrastructure operators should check their environment's storage
+ configuration and un-map any clustered filesystems from being visible to
+ Ironic nodes, unless explicitly needed and expected. Please see the
+ Ironic-Python-Agent troubleshooting documentation for more information.
+issues:
+ - |
+ Logic to guard VMFS filesystems from being destroyed *may* not recognize
+ VMFS extents. Operators with examples of partitioning for extent usage
+ are encouraged to contact the Ironic community.