author     Joe Thornber <ejt@redhat.com>  2018-04-20 10:43:50 -0500
committer  David Teigland <teigland@redhat.com>  2018-04-20 11:03:58 -0500
commit     00f1b208a1bf44665ec97a791355b1fcf525a3a7
tree       6414eaef877cc4e63957729c8d8289debb8af743 /doc
parent     d51429254f4e9b17083af0c554aa7045e5ec08bb
[io paths] Unpick agk's aio stuff
Diffstat (limited to 'doc')
-rw-r--r-- doc/aio_design.txt | 215
1 files changed, 0 insertions, 215 deletions
diff --git a/doc/aio_design.txt b/doc/aio_design.txt
deleted file mode 100644
index c6eb44352..000000000
--- a/doc/aio_design.txt
+++ /dev/null
@@ -1,215 +0,0 @@
-Introducing asynchronous I/O to LVM
-===================================
-
-Issuing I/O asynchronously means instructing the kernel to perform specific
-I/O and return immediately without waiting for it to complete. The data
-is collected from the kernel later.
-
-Advantages
-----------
-
-A1. While waiting for the I/O to happen, the program could perform other
-operations.
-
-A2. When LVM is searching for its Physical Volumes, it issues a small amount of
-I/O to a large number of disks. If this were issued in parallel, the overall
-runtime might be shorter while CPU time should be largely unaffected.
-
-A3. If accessing devices causes more than one timeout, the timeouts can
-elapse in parallel, again reducing the runtime. This applies globally,
-not just while the code is searching for Physical Volumes, so reading,
-writing and committing the metadata may occasionally benefit to some
-extent too, and there are probably maintenance advantages in using the same
-method of I/O throughout the main body of the code.
-
-A4. By introducing a simple callback function mechanism, the conversion can be
-performed largely incrementally by first refactoring and continuing to
-use synchronous I/O with the callbacks performed immediately. This allows the
-callbacks to be introduced without changing the running sequence of the code
-initially. Future projects could refactor some of the calling sites to
-simplify the code structure and even eliminate some of the nesting.
-This allows each part of what might ultimately amount to a large change to be
-introduced and tested independently.
-
-
-Disadvantages
--------------
-
-D1. The resulting code may be more complex with more failure modes to
-handle. Mitigate by thorough auditing and testing, rolling out
-gradually, and offering a simple switch to revert to the old behaviour.
-
-D2. The Linux asynchronous I/O implementation is less mature than
-its synchronous I/O implementation and might expose problems that
-depend on the version of the kernel or library used. Fixes or
-workarounds for some of these might require kernel changes. For
-example, there are suggestions that, despite being nominally asynchronous,
-the system calls can still block in some cases. There might be
-resource dependencies on other processes running on the system that make
-it unsuitable for use while any devices are suspended. Mitigation
-as for D1.
-
-D3. The error handling within callbacks becomes more complicated.
-However, we know that existing call paths can already sometimes discard
-errors, sometimes deliberately, sometimes not, so this aspect is in need
-of a complete review anyway and the new approach will make the error
-handling more transparent. Aim initially for overall behaviour that is
-no worse than that of the existing code, then work on improving it
-later.
-
-D4. The work will take a few weeks to code and test. This leads to a
-significant opportunity cost when compared against other enhancements
-that could be achieved in that time. However, the proof-of-concept work
-performed while writing this design has satisfied me that the work could
-proceed and be committed incrementally as a background task.
-
-
-Observations regarding LVM's I/O Architecture
----------------------------------------------
-
-H1. All device, metadata and config file I/O is constrained to pass through a
-single route in lib/device.
-
-H2. The first step of the analysis was to instrument this code path with
-log_debug messages. I/O is split into the following categories:
-
- "dev signatures",
- "PV labels",
- "VG metadata header",
- "VG metadata content",
- "extra VG metadata header",
- "extra VG metadata content",
- "LVM1 metadata",
- "pool metadata",
- "LV content",
- "logging",
-
-H3. A bounce buffer is used for most I/O.
-
-H4. Most callers finish using the supplied data before any further I/O is
-issued. The few that don't could be converted trivially to do so.
-
-H5. There is one stream of I/O per metadata area on each device.
-
-H6. Some reads fall at offsets close to immediately preceding reads, so it's
-possible to avoid these by caching one "block" per metadata area I/O stream.
-
-H7. Simple analysis suggests a minimum aligned read size of 8k would deliver
-immediate gains from this caching. A larger size might perform worse because
-almost all the time the extra data read would not be used, but this can be
-re-examined and tuned after the code is in place.
-
-
-Proposal
---------
-
-P1. Retain the "single I/O path" but offer an asynchronous option.
-
-P2. Eliminate the bounce buffer in most cases by improving alignment.
-
-P3. Reduce the number of reads by always reading a minimum of an aligned
-8k block.
-
-P4. Eliminate repeated reads by caching the last block read and changing
-the lib/device interface to return a pointer to read-only data within
-this block.
-
-P5. Only perform these interface changes for code on the critical path
-for now by converting other code sites to use wrappers around the new
-interface.
-
-P6. Treat asynchronous I/O as the interface of choice and optimise only
-for this case.
-
-P7. Convert the callers on the critical path to pass callback functions
-to the device layer. These functions will be called later with the
-read-only data, a context pointer and a success/failure indicator.
-Where an existing function performs a sequence of I/O, this has the
-advantage of breaking up the large function into smaller ones and
-wrapping the parameters used into structures. While this might look
-rather messy and ad-hoc in the short term, it's a first step towards
-breaking up confusingly long functions into component parts and wrapping
-the existing long parameter lists into more appropriate structures and
-refactoring these parts of the code.
-
-P8. Limit the resources used by the asynchronous I/O by using two
-tunable parameters, one limiting the number of outstanding I/Os issued
-and another limiting the total amount of memory used.
-
-P9. Provide a fallback option if asynchronous I/O is unavailable by
-sharing the code paths but issuing the I/O synchronously and calling the
-callback immediately.
-
-P10. Only allocate the buffer for the I/O at the point where the I/O is
-about to be issued.
-
-P11. If the thresholds are exceeded, add the request to a simple queue,
-and process it later after some I/O has completed.
-
-
-Future work
------------
-F1. Perform a complete review of the error tracking so that device
-failures are handled and reported more cleanly, extending the existing
-basic error counting mechanism.
-
-F2. Consider whether some of the nested callbacks can be eliminated,
-which would allow for additional simplifications.
-
-F3. Rearrange the contents of the ad-hoc context structs into more
-logical groupings and use them more widely.
-
-F4. Perform wider refactoring of these areas of code.
-
-
-Testing considerations
-----------------------
-T1. The changes touch code on the device path, so a thorough re-test of
-the device layer is required. The new code needs a full audit down
-through the library layer into the kernel to check that all the error
-conditions that are currently implemented (such as EAGAIN) are handled
-sensibly. (LVM's I/O layer needs to remain as solid as we can make it.)
-
-T2. The current test suite provides a reasonably broad range of coverage
-of this area but is far from comprehensive.
-
-
-Acceptance criteria
--------------------
-A1. The current test suite should pass to the same extent as before the
-changes.
-
-A2. When all debugging and logging is disabled, strace -c must show
-improvements, e.g. the expected reduction in the number of reads.
-
-A3. Running a range of commands under valgrind must not reveal any
-new leaks due to the changes.
-
-A4. All new coverity reports from the change must be addressed.
-
-A5. CPU time should be similar to that before, as the same work
-is being done overall, just in a different order.
-
-A6. Tests need to show improved behaviour in targeted areas. For example,
-if several devices are slow and time out, the delays should occur
-in parallel and the elapsed time should be less than before.
-
-
-Release considerations
-----------------------
-R1. Async I/O should be widely available and largely reliable on Linux
-nowadays (even though parts of its interface and implementation remain a
-matter of controversy), so we should try to make its use the default
-wherever it is supported. If certain types of systems have problems, we
-should try to detect those cases and disable it automatically there.
-
-R2. Because the implications of an unexpected problem in the new code
-could be severe for the people affected, the rollout needs to be gentle
-without a deadline to allow us plenty of time to gain confidence in the
-new code. Our own testing will only be able to cover a tiny fraction of
-the different setups our users have, so we need to look out for problems
-caused by this proactively and encourage people to test it on their own
-systems and report back. It must go into the tree near the start of a
-release cycle rather than at the end to provide time for our confidence
-in it to grow.
-
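Proposals P2 and P3 of the deleted aio_design.txt call for reads rounded out to an aligned 8k block (the size suggested in H7). The following is a minimal sketch in C of that rounding, assuming a power-of-two block size. align_request, aligned_window and MIN_READ_SIZE are hypothetical names chosen for illustration, not LVM's actual API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical minimum read unit from H7/P3: 8 KiB. Must be a power of two. */
#define MIN_READ_SIZE ((uint64_t)8192)

/* A read request after rounding out to whole MIN_READ_SIZE blocks. */
struct aligned_window {
	uint64_t start;  /* first byte actually read from the device */
	uint64_t len;    /* bytes actually read; a multiple of MIN_READ_SIZE */
};

/* Round [offset, offset + len) outward to MIN_READ_SIZE boundaries. */
static struct aligned_window align_request(uint64_t offset, uint64_t len)
{
	struct aligned_window w;
	uint64_t mask = MIN_READ_SIZE - 1;
	uint64_t end = offset + len;

	assert((MIN_READ_SIZE & mask) == 0); /* power of two */
	assert(len > 0);

	w.start = offset & ~mask;                 /* round start down */
	w.len = ((end + mask) & ~mask) - w.start; /* round end up */
	return w;
}
```

For example, a 512-byte label read at offset 512 becomes one 8192-byte read at offset 0, and a 500-byte read at offset 8000 straddles a boundary and becomes a 16384-byte read at offset 0. Reads aligned like this are what might let the bounce buffer be dropped (P2), provided the destination buffer is aligned as well.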
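Observations H5 and H6 and proposal P4 of the deleted aio_design.txt describe caching the last block read on each metadata-area I/O stream and returning a pointer to read-only data inside it. A minimal sketch in C, assuming a synchronous block-reader callback and ignoring reads that cross a block boundary. io_stream, stream_read, block_reader_fn and fake_read_block are hypothetical names; fake_read_block only stands in for a real device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 8192u /* hypothetical cache block size, per H7/P3 */

/* Hypothetical reader: fill buf with the BLOCK_SIZE bytes at block_start. 0 = success. */
typedef int (*block_reader_fn)(void *dev, uint64_t block_start, unsigned char *buf);

/* One I/O stream per metadata area (H5), caching the last block read (H6). */
struct io_stream {
	void *dev;
	block_reader_fn read_block;
	int valid;                      /* does data[] hold the block at block_start? */
	uint64_t block_start;
	unsigned char data[BLOCK_SIZE];
	unsigned reads_issued;          /* device reads performed, for illustration */
};

/*
 * Return a read-only pointer to len bytes at offset (P4). The pointer is
 * only guaranteed valid until the next call on this stream (compare H4).
 * Returns NULL on a device error, or if the range crosses a block
 * boundary, which this sketch does not handle.
 */
static const unsigned char *stream_read(struct io_stream *s, uint64_t offset, size_t len)
{
	uint64_t start = offset & ~(uint64_t)(BLOCK_SIZE - 1);

	assert((BLOCK_SIZE & (BLOCK_SIZE - 1)) == 0);
	if (len == 0 || offset - start + len > BLOCK_SIZE)
		return NULL;

	if (!s->valid || s->block_start != start) {
		/* Cache miss: replace the single cached block. */
		s->valid = 0;
		if (s->read_block(s->dev, start, s->data))
			return NULL;
		s->reads_issued++;
		s->block_start = start;
		s->valid = 1;
	}
	return s->data + (offset - start);
}

/* Stand-in device: the byte at offset o has value o & 0xff. */
static int fake_read_block(void *dev, uint64_t block_start, unsigned char *buf)
{
	(void)dev;
	for (size_t i = 0; i < BLOCK_SIZE; i++)
		buf[i] = (unsigned char)((block_start + i) & 0xff);
	return 0;
}
```

Two reads at nearby offsets inside the same 8k block cost one device read, which is the saving H6 anticipates. A single cached block per stream keeps memory use bounded; H7 notes that the block size could be re-tuned later.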
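Proposal P7 of the deleted aio_design.txt has callers pass a callback that later receives the read-only data, a context pointer and a success/failure indicator, while P9 (and A4) keep a synchronous fallback that performs the I/O immediately and calls the callback before returning. A minimal sketch in C of that shape, assuming an in-memory stand-in for a device. io_done_fn, mem_dev, dev_read_sync, label_ctx and label_read_done are hypothetical names. The only LVM-specific detail is the 8-byte "LABELONE" string that LVM2 uses to identify its on-disk label.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* P7: completion callback receives context, success flag and read-only data. */
typedef void (*io_done_fn)(void *context, int success, const void *data, size_t len);

/* Hypothetical in-memory stand-in for a device. */
struct mem_dev {
	const unsigned char *bytes;
	uint64_t size;
};

/*
 * P9 fallback: do the read now and invoke the callback before returning.
 * An asynchronous backend would instead submit the request and call done()
 * later, once the kernel reports completion.
 */
static void dev_read_sync(struct mem_dev *dev, uint64_t offset, size_t len,
			  io_done_fn done, void *context)
{
	if (offset > dev->size || len > dev->size - offset) {
		done(context, 0, NULL, 0);
		return;
	}
	done(context, 1, dev->bytes + offset, len);
}

/* Example caller state: a context struct instead of a long parameter list. */
struct label_ctx {
	int found;
	int error;
};

static void label_read_done(void *context, int success, const void *data, size_t len)
{
	struct label_ctx *ctx = context;

	assert(ctx != NULL);
	if (!success) {
		ctx->error = 1;
		return;
	}
	ctx->found = len >= 8 && !memcmp(data, "LABELONE", 8);
}
```

Because the caller only ever sees the callback, swapping dev_read_sync for an asynchronous implementation would not change the calling code. That is the incremental path described in A4.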
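Proposals P8, P10 and P11 of the deleted aio_design.txt limit asynchronous I/O with two tunables (outstanding I/Os and total memory), allocate each buffer only when its I/O is issued, and queue requests that would exceed the limits until earlier I/O completes. A minimal sketch in C of that bookkeeping, assuming a FIFO wait queue and omitting the actual kernel submission. aio_limits, pending_io, io_engine, engine_submit and engine_complete are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tunables for P8. */
struct aio_limits {
	unsigned max_ios;   /* maximum number of outstanding I/Os */
	size_t max_bytes;   /* maximum buffer memory held by outstanding I/Os */
};

struct pending_io {
	size_t len;
	void *buf;                  /* allocated only when issued (P10) */
	struct pending_io *next;    /* link in the wait queue (P11) */
};

struct io_engine {
	struct aio_limits lim;
	unsigned in_flight;
	size_t bytes_in_flight;
	struct pending_io *queue_head, *queue_tail;
};

/* Within limits? Always allow one I/O when idle so an oversized request cannot stall. */
static int can_issue(const struct io_engine *e, size_t len)
{
	if (e->in_flight == 0)
		return 1;
	return e->in_flight < e->lim.max_ios &&
	       e->bytes_in_flight + len <= e->lim.max_bytes;
}

/* Allocate the buffer and account for the I/O; a real engine would submit it here. */
static int start_io(struct io_engine *e, struct pending_io *io)
{
	if (!(io->buf = malloc(io->len)))
		return 0;
	e->in_flight++;
	e->bytes_in_flight += io->len;
	return 1;
}

/* Returns 1 if issued, 0 if queued, -1 if the buffer allocation failed. */
static int engine_submit(struct io_engine *e, struct pending_io *io)
{
	io->next = NULL;
	if (e->queue_head || !can_issue(e, io->len)) {
		/* Queue behind earlier requests to preserve FIFO order. */
		if (e->queue_tail)
			e->queue_tail->next = io;
		else
			e->queue_head = io;
		e->queue_tail = io;
		return 0;
	}
	return start_io(e, io) ? 1 : -1;
}

/* On completion: release resources, then issue queued requests while limits allow. */
static int engine_complete(struct io_engine *e, struct pending_io *io)
{
	int issued = 0;

	assert(e->in_flight > 0);
	free(io->buf);
	io->buf = NULL;
	e->in_flight--;
	e->bytes_in_flight -= io->len;

	while (e->queue_head && can_issue(e, e->queue_head->len)) {
		struct pending_io *next = e->queue_head;

		if (!start_io(e, next))
			break; /* leave it queued for a later retry */
		e->queue_head = next->next;
		if (!e->queue_head)
			e->queue_tail = NULL;
		issued++;
	}
	return issued;
}
```

With max_ios = 2 and max_bytes = 16384, a third 8k request is queued, then issued as soon as either of the first two completes. The "always allow one when idle" rule is a design choice made for this sketch so that a single request larger than max_bytes cannot wait forever. The original text does not say how that case should be handled.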