MDEV-25404: ssux_lock_low: Introduce a separate writer mutex

Having both readers and writers use a single lock word in futex system
calls caused performance regression compared to SRW_LOCK_DUMMY
(mutex and 2 condition variables).
A contributing factor is that we did not accurately keep track of
the number of waiting threads and thus had to invoke system calls
to wake up any waiting threads.

SUX_LOCK_GENERIC: Renamed from SRW_LOCK_DUMMY. This is the
original implementation, with rw_lock (std::atomic<uint32_t>),
a mutex and two condition variables. Using a separate writer mutex
(as described below) is not possible, because the mutex ownership
in a buf_block_t::lock must be able to transfer from a write submitter
thread to an I/O completion thread, and pthread_mutex_lock() may assume
that the submitter thread is recursively acquiring the mutex that it
already holds, while in reality the I/O completion thread is the real
owner. POSIX does not define an interface for requesting a mutex to
be non-recursive.

On Microsoft Windows, srw_lock_low will remain a simple wrapper of
SRWLOCK. On 32-bit Microsoft Windows, sizeof(SRWLOCK)=4 while
sizeof(srw_lock_low)=8.

On other platforms, srw_lock_low is an alias of ssux_lock_low,
the Simple (non-recursive) Shared/Update/eXclusive lock.

In the futex-based implementation of ssux_lock_low (Linux, OpenBSD,
Microsoft Windows), we shall use a dedicated mutex for exclusive
requests (writer), and have a WRITER flag in the 'readers' lock word
to inform that a writer is holding the lock or waiting for the lock to
be granted. When the WRITER flag is set, all lock requests must acquire
the writer mutex. Normally, shared (S) lock requests simply perform
a compare-and-swap on the 'readers' word.

Update locks are implemented as a combination of writer mutex
and a normal counter in the 'readers' lock word. The conflict between
U and X locks is guaranteed by the writer mutex.
Unlike SUX_LOCK_GENERIC, wr_u_downgrade() will not wake up any pending
rd_lock() waits. They will wait until u_unlock() releases the writer
mutex.

The ssux_lock_low is always wrapped by sux_lock (with a recursion count
of U and X locks), used for dict_index_t::lock and buf_block_t::lock.
Their memory footprint for the futex-based implementation will increase
by sizeof(srw_mutex), or 4 bytes.

This change addresses a performance regression in read-only benchmarks,
such as sysbench oltp_read_only. Also write performance was improved.

On 32-bit Linux and OpenBSD, lock_sys_t::hash_table will allocate
two hash table elements for each srw_lock (14 instead of 15 hash
table cells per 64-byte cache line on IA-32). On Microsoft Windows,
sizeof(SRWLOCK)==sizeof(void*) and there is no change.

Reviewed by: Vladislav Vaintroub
Tested by: Axel Schwenke and Vladislav Vaintroub
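
The protocol described in the message (a writer mutex plus a WRITER flag in the 'readers' word) can be illustrated with a small sketch. This is not the ssux_lock_low code: the class name sketch_ssux_lock is hypothetical, std::mutex stands in for srw_mutex, waits are done by yielding instead of futex calls, and every lock is assumed to be released by the thread that acquired it.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>

// Illustrative sketch only (hypothetical class, not MariaDB code).
class sketch_ssux_lock
{
  static constexpr uint32_t WRITER= 1U << 31;
  std::mutex writer;                 // assumption: stand-in for srw_mutex
  std::atomic<uint32_t> readers{0};  // S/U holder count, plus WRITER flag
public:
  // Fast path for S: compare-and-swap unless WRITER is set.
  bool rd_lock_try()
  {
    uint32_t l= readers.load(std::memory_order_relaxed);
    while (!(l & WRITER))
      if (readers.compare_exchange_weak(l, l + 1,
                                        std::memory_order_acquire,
                                        std::memory_order_relaxed))
        return true;
    return false;
  }
  // Slow path for S: queue behind the writer mutex, then retry.
  void rd_lock()
  {
    while (!rd_lock_try())
    {
      std::lock_guard<std::mutex> g(writer);
      if (rd_lock_try())
        return;
    }
  }
  void rd_unlock() { readers.fetch_sub(1, std::memory_order_release); }

  // U: writer mutex (excludes other U and X) plus one normal reader count.
  void u_lock()
  {
    writer.lock();
    readers.fetch_add(1, std::memory_order_acquire);
  }
  void u_unlock()
  {
    readers.fetch_sub(1, std::memory_order_release);
    writer.unlock();
  }

  // X: writer mutex, then set WRITER and wait for S holders to drain.
  void wr_lock()
  {
    writer.lock();
    readers.fetch_or(WRITER, std::memory_order_acquire);
    while (readers.load(std::memory_order_acquire) != WRITER)
      std::this_thread::yield();
  }
  void wr_unlock()
  {
    readers.store(0, std::memory_order_release);
    writer.unlock();
  }
  // X to U: replace WRITER with a count of 1; the writer mutex stays held,
  // so threads already blocked in rd_lock() keep waiting until u_unlock().
  void wr_u_downgrade() { readers.store(1, std::memory_order_release); }
};
```

In this sketch, a rd_lock_try() issued after wr_u_downgrade() succeeds through the compare-and-swap path, while a thread that was already waiting on the mutex stays blocked until u_unlock(), mirroring the behaviour noted above.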
author: Marko Mäkelä <marko.makela@mariadb.com> 2021-04-19 18:15:49 +0300
committer: Marko Mäkelä <marko.makela@mariadb.com> 2021-04-19 18:15:49 +0300
commit: 8751aa7397b2e698fa0b46ec3e60abb9e2fd7e1b (patch)
tree: bbe06e685d9e352c74892b44e8187a80e4fb7e94 /storage/innobase/include/lock0lock.h
parent: 040c16ab8b7d5e4192a17a72224e89ff14899cd5 (diff)
1 files changed, 12 insertions, 7 deletions
diff --git a/storage/innobase/include/lock0lock.h b/storage/innobase/include/lock0lock.h
index b96f54e03a3..574c3dc1634 100644
--- a/storage/innobase/include/lock0lock.h
+++ b/storage/innobase/include/lock0lock.h
@@ -548,7 +548,7 @@ class lock_sys_t
 
   /** Hash table latch */
   struct hash_latch
-#if defined SRW_LOCK_DUMMY && !defined _WIN32
+#ifdef SUX_LOCK_GENERIC
   : private rw_lock
   {
     /** Wait for an exclusive lock */
@@ -577,15 +577,18 @@ class lock_sys_t
     { return memcmp(this, field_ref_zero, sizeof *this); }
 #endif
   };
-  static_assert(sizeof(hash_latch) <= sizeof(void*), "compatibility");
 
 public:
   struct hash_table
   {
+    /** Number of consecutive array[] elements occupied by a hash_latch */
+    static constexpr size_t LATCH= sizeof(void*) >= sizeof(hash_latch) ? 1 : 2;
+    static_assert(sizeof(hash_latch) <= LATCH * sizeof(void*), "allocation");
+
     /** Number of array[] elements per hash_latch.
-    Must be one less than a power of 2. */
+    Must be LATCH less than a power of 2. */
     static constexpr size_t ELEMENTS_PER_LATCH= CPU_LEVEL1_DCACHE_LINESIZE /
-      sizeof(void*) - 1;
+      sizeof(void*) - LATCH;
 
     /** number of payload elements in array[]. Protected by lock_sys.latch. */
     ulint n_cells;
@@ -608,11 +611,13 @@ public:
     /** @return the index of an array element */
     inline ulint calc_hash(ulint fold) const;
     /** @return raw array index converted to padded index */
-    static ulint pad(ulint h) { return 1 + (h / ELEMENTS_PER_LATCH) + h; }
+    static ulint pad(ulint h)
+    { return LATCH + LATCH * (h / ELEMENTS_PER_LATCH) + h; }
     /** Get a latch. */
     static hash_latch *latch(hash_cell_t *cell)
     {
-      void *l= ut_align_down(cell, (ELEMENTS_PER_LATCH + 1) * sizeof *cell);
+      void *l= ut_align_down(cell, sizeof *cell *
+                             (ELEMENTS_PER_LATCH + LATCH));
       return static_cast<hash_latch*>(l);
     }
     /** Get a hash table cell. */
@@ -646,7 +651,7 @@ private:
   /** Number of shared latches */
   std::atomic<ulint> readers{0};
 #endif
-#if defined SRW_LOCK_DUMMY && !defined _WIN32
+#ifdef SUX_LOCK_GENERIC
 protected:
   /** mutex for hash_latch::wait() */
   pthread_mutex_t hash_mutex;
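
The index arithmetic changed by this diff can be checked in isolation. The following is a sketch under assumptions: the template layout and its parameters PTR, LATCH_SIZE and LINE are hypothetical stand-ins for sizeof(void*), sizeof(hash_latch) and CPU_LEVEL1_DCACHE_LINESIZE, and latch_index() models on array indexes what latch() does with ut_align_down() on addresses.

```cpp
#include <cstddef>

// Hypothetical model of lock_sys_t::hash_table layout (not MariaDB code).
// PTR        ~ sizeof(void*)
// LATCH_SIZE ~ sizeof(hash_latch)
// LINE       ~ CPU_LEVEL1_DCACHE_LINESIZE
template<size_t PTR, size_t LATCH_SIZE, size_t LINE= 64>
struct layout
{
  // Number of consecutive array[] elements occupied by a latch
  static constexpr size_t LATCH= PTR >= LATCH_SIZE ? 1 : 2;
  static_assert(LATCH_SIZE <= LATCH * PTR, "allocation");

  // Payload elements per cache line
  static constexpr size_t ELEMENTS_PER_LATCH= LINE / PTR - LATCH;

  // Raw payload index converted to padded array index
  static constexpr size_t pad(size_t h)
  { return LATCH + LATCH * (h / ELEMENTS_PER_LATCH) + h; }

  // Padded index of the latch that covers a padded payload index
  static constexpr size_t latch_index(size_t padded)
  { return padded - padded % (ELEMENTS_PER_LATCH + LATCH); }
};

// Assumed sizes: 4-byte pointers with an 8-byte latch (32-bit futex build),
// and 8-byte pointers with an 8-byte latch (64-bit build).
using ia32= layout<4, 8>;
using amd64= layout<8, 8>;
```

With these assumed sizes, the 32-bit case yields 14 payload cells per 64-byte line, and payload index 14 lands at padded index 18, behind the two-element latch that starts at index 16.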