path: root/storage
Commit message | Author | Age | Files | Lines
* Merge 10.8 into 10.9Marko Mäkelä2023-02-16143-461/+334
|\
| * Merge 10.6 into 10.8Marko Mäkelä2023-02-1616-105/+115
| |\
| | * MDEV-30638 Deadlock between INSERT and InnoDB non-persistent statistics updateMarko Mäkelä2023-02-167-33/+13
| | |   This is a partial revert of commit 8b6a308e463f937eb8d2498b04967a222c83af90 (MDEV-29883) and a follow-up to the merge commit 394fc71f4fa8f8b1b6d24adfead0ec45121d271e (MDEV-24569).
| | |
| | |   The latching order related to any operation that accesses the allocation metadata of an InnoDB index tree is as follows:
| | |
| | |   1. Acquire dict_index_t::lock in non-shared mode.
| | |   2. Acquire the index root page latch in non-shared mode.
| | |   3. Possibly acquire further index page latches. Unless an exclusive dict_index_t::lock is held, this must follow the root-to-leaf, left-to-right order.
| | |   4. Acquire a *non-shared* fil_space_t::latch.
| | |   5. Acquire latches on the allocation metadata pages.
| | |   6. Possibly allocate and write some pages, or free some pages.
| | |
| | |   btr_get_size_and_reserved(), dict_stats_update_transient_for_index(), dict_stats_analyze_index(): Acquire an exclusive fil_space_t::latch in order to avoid a deadlock in fseg_n_reserved_pages() in case of concurrent access to multiple indexes sharing the same "inode page".
| | |
| | |   fseg_page_is_allocated(): Acquire an exclusive fil_space_t::latch in order to avoid deadlocks. All callers are holding latches on a buffer pool page, or an index, or both. Before commit edbde4a11fd0b6437202f8019a79911441b6fb32 (MDEV-24167) a third mode was available that would not conflict with the shared fil_space_t::latch acquired by ha_innobase::info_low(), i_s_sys_tablespaces_fill_table(), or i_s_tablespaces_encryption_fill_table(). Because those calls should be rather rare, it makes sense to use the simple rw_lock with only shared and exclusive modes.
| | |
| | |   fil_crypt_get_page_throttle(): Avoid invoking fseg_page_is_allocated() on an allocation bitmap page (which can never be freed), to avoid acquiring a shared latch on top of an exclusive one.
| | |
| | |   mtr_t::s_lock_space(), MTR_MEMO_SPACE_S_LOCK: Remove.
| | * MDEV-30134 Assertion failed in buf_page_t::unfix() in buf_pool_t::watch_unset()Marko Mäkelä2023-02-162-50/+33
| | |   buf_pool_t::watch_set(): Always buffer-fix a block if one was found, no matter if it is a watch sentinel or a buffer page. The type of the block descriptor will be rechecked in buf_page_t::watch_unset(). Do not expect the caller to acquire the page hash latch. Starting with commit bd5a6403cace36c6ed428cde62e35adcd3f7e7d0 it is safe to release buf_pool.mutex before acquiring a buf_pool.page_hash latch.
| | |
| | |   buf_page_get_low(): Adjust to the changed buf_pool_t::watch_set(). This simplifies the logic and fixes a bug that was reproduced when using debug builds and the setting innodb_change_buffering_debug=1.
| | * MDEV-30397: MariaDB crash due to DB_FAIL reported for a corrupted pageMarko Mäkelä2023-02-163-4/+8
| | |   buf_read_page_low(): Map the buf_page_t::read_complete() return value DB_FAIL to DB_PAGE_CORRUPTED. The purpose of the DB_FAIL return value is to avoid error log noise when read-ahead brings in an unused page that is typically filled with NUL bytes. If a synchronous read is bringing in a corrupted page where the page frame does not contain the expected tablespace identifier and page number, that must be treated as an attempt to read a corrupted page. The correct error code for this is DB_PAGE_CORRUPTED. The error code DB_FAIL is not handled by row_mysql_handle_errors().
| | |
| | |   This was missed in commit 0b47c126e31cddda1e94588799599e138400bcf8 (MDEV-13542).
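The error-code mapping described for MDEV-30397 can be pictured roughly as follows. This is a minimal sketch, not the actual buf_read_page_low() code: the DbErr enum, the map_read_result() helper and its `synchronous` parameter are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical stand-ins for the InnoDB error codes named in the
// commit message (DB_SUCCESS, DB_FAIL, DB_PAGE_CORRUPTED).
enum class DbErr { Success, Fail, PageCorrupted };

// Assumed helper: translate the result of the read-completion step.
// For a synchronous read, the quiet "Fail" result is promoted to
// "PageCorrupted" so that callers see an error code they handle.
inline DbErr map_read_result(DbErr read_result, bool synchronous) {
  if (synchronous && read_result == DbErr::Fail)
    return DbErr::PageCorrupted;
  return read_result;
}

// Usage: read-ahead keeps the quiet code, a synchronous read does not.
inline void check_mapping() {
  assert(map_read_result(DbErr::Fail, false) == DbErr::Fail);
  assert(map_read_result(DbErr::Fail, true) == DbErr::PageCorrupted);
}
```

The design point is that the quiet code stays available to read-ahead, where an all-NUL page is expected, while a caller that asked for one specific page never receives a code it cannot interpret.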
| | * Merge 10.5 into 10.6Marko Mäkelä2023-02-164-11/+25
| | |\
| | | * MDEV-30657 InnoDB: Not applying UNDO_APPEND due to corruptionMarko Mäkelä2023-02-152-7/+21
| | | |   This almost completely reverts commit acd23da4c2363511aae7d984c24cc6847aa3f19c and retains a safe optimization:
| | | |
| | | |   recv_sys_t::parse(): Remove any old redo log records for the truncated tablespace, to free up memory earlier. If recovery consists of multiple batches, then recv_sys_t::apply() must invoke recv_sys_t::trim() again to avoid wrongly applying old log records to an already truncated undo tablespace.
| | | * Fix S3 engine Coverity hitsAndrew Hutchings2023-02-142-4/+4
| | | |   Very minor hits found by Coverity for the S3 engine.
| | * | Merge 10.5 into 10.6Marko Mäkelä2023-02-143-1/+38
| | |\ \
| | | |/
| | | * MDEV-30552 InnoDB recovery crashes when error handling scenarioThirunarayanan Balathandayuthapani2023-02-142-0/+9
| | | |   - InnoDB fails to reset the after_apply variable before applying the redo log in the last batch during multi-batch recovery.
| | | * MDEV-30551 InnoDB recovery hangs when buffer pool ran out of memoryThirunarayanan Balathandayuthapani2023-02-142-1/+26
| | | |   - During a non-last batch of multi-batch recovery, InnoDB holds log_sys.mutex and preallocates the block, which may initiate a page flush, which may initiate a log flush, which requires acquiring log_sys.mutex again. This leads to an assertion failure. So InnoDB recovery should release log_sys.mutex before preallocating the block.
| * | | MDEV-30426 Assertion !rec_offs_nth_extern(offsets2, n) during bulk insertThirunarayanan Balathandayuthapani2023-02-141-3/+6
| | | |   - cmp_rec_rec_simple() fails to detect a duplicate key error for a bulk insert operation
| * | | Merge 10.6 into 10.8Marko Mäkelä2023-02-10130-375/+235
| |\ \ \
| | |/ /
| | * | Merge 10.5 into 10.6Marko Mäkelä2023-02-10131-375/+235
| | |\ \
| | | |/
| | | * Merge 10.4 into 10.5Marko Mäkelä2023-02-10133-366/+232
| | | |\
| | | | * Apply clang-tidy to remove empty constructors / destructorsVicențiu Ciorbaru2023-02-09132-350/+228
| | | | |   This patch is the result of running
| | | | |   run-clang-tidy -fix -header-filter=.* -checks='-*,modernize-use-equals-default' .
| | | | |
| | | | |   Code style changes have been done on top. This change leads to the following improvements:
| | | | |
| | | | |   1. Binary size reduction.
| | | | |      * For a -DBUILD_CONFIG=mysql_release build, the binary size is reduced by ~400kb.
| | | | |      * A raw -DCMAKE_BUILD_TYPE=Release reduces the binary size by ~1.4kb.
| | | | |   2. The compiler can better understand the intent of the code, which leads to more optimization possibilities.
| | | | |
| | | | |   Additionally, it enabled detecting unused variables that had an empty default constructor but were not explicitly marked as such.
| | | | |
| | | | |   One particular change was required following this patch: in sql/opt_range.cc, result_keys, an unused variable of the template class Bitmap, now correctly triggers an unused variable warning.
| | | | |
| | | | |   Setting the Bitmap template class constructor to default allows the compiler to identify that there are no side effects when instantiating the class. Previously the compiler could not issue the warning, as it could not assume that the default constructor of the Bitmap class (being a template) was a no-op. This prevented the "unused variable" warning.
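The effect of the modernize-use-equals-default rewrite can be seen on a toy type. The sketch below is not taken from the MariaDB tree; the Before and After structs are hypothetical. It relies only on the standard type traits: a constructor written as an empty body is user-provided, so the type is not trivially default constructible, while a constructor defaulted on its first declaration is trivial when the members allow it.

```cpp
#include <cassert>
#include <type_traits>

// What the code looked like before the clang-tidy rewrite: empty but
// user-provided special member functions.
struct Before {
  Before() {}
  ~Before() {}
  int value;
};

// After the rewrite: the same members, explicitly defaulted.
struct After {
  After() = default;
  ~After() = default;
  int value;
};

// The compiler may treat After as trivial, but not Before.
static_assert(!std::is_trivially_default_constructible<Before>::value,
              "an empty user-provided constructor is not trivial");
static_assert(std::is_trivially_default_constructible<After>::value,
              "a defaulted constructor is trivial here");
static_assert(std::is_trivially_destructible<After>::value,
              "a defaulted destructor is trivial here");

// Usage: the same facts, checked at run time.
inline void check_triviality() {
  assert(!std::is_trivially_destructible<Before>::value);
  assert(std::is_trivially_destructible<After>::value);
}
```

Knowing that construction and destruction have no side effects is what allows a compiler to diagnose a variable of such a type as unused, which is the effect the commit message describes for Bitmap.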
| | | | * innodb: cmake - sched_getcpu removed - not usedDaniel Black2023-02-081-6/+0
| | | | |
| | | | * MDEV-30554 RockDB libatomic linking on riscv64Daniel Black2023-02-071-6/+2
| | | | |   The existing storage/rocksdb/CMakeCache.txt defined ATOMIC_EXTRA_LIBS when atomics were required. This was determined by the toplevel configure.cmake test (HAVE_GCC_C11_ATOMICS_WITH_LIBATOMIC).
| | | | |
| | | | |   As build_rocksdb.cmake is included after ATOMIC_EXTRA_LIBS was set, we just need to use it. As such, no riscv64-specific macro is needed in build_rocksdb.cmake.
| | | | |
| | | | |   As highlighted by Gianfranco Costamagna (@LocutusOfBorg) in #2472, overwriting SYSTEM_LIBS was problematic. This is corrected in case SYSTEM_LIBS is changed elsewhere in the future.
| | | | |
| | | | |   Closes #2472.
| | | | * Merge branch '10.3' into 10.4Oleksandr Byelkin2023-01-2814-92/+111
| | | | |\
| | | * | | MDEV-30479 optimization: Invoke recv_sys_t::trim() earlierMarko Mäkelä2023-02-062-16/+10
| | | | | |   recv_sys_t::parse(): Discard old page-level redo log when parsing a TRIM_PAGES record.
| | | | | |
| | | | | |   recv_sys_t::apply(): trim() was invoked in parse() already.
| | | | | |
| | | | | |   recv_sys_t::truncated_undo_spaces[]: Only store the size, no LSN.
| | | * | | MDEV-30479 OPT_PAGE_CHECKSUM mismatch after innodb_undo_log_truncate=ONMarko Mäkelä2023-02-061-1/+1
| | | | | |   page_recv_t::trim(): Do remove log records for mini-transactions that end right at the threshold LSN. This will avoid an inconsistency where a dirty page had been evicted from the buffer pool during undo tablespace truncation, and recovery would attempt to apply log records for which the last available copy in the data file is too new. These changes would be discarded anyway.
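The boundary condition in the page_recv_t::trim() fix can be illustrated with a toy record list. This is a hedged sketch, not the InnoDB data structure: LogRecord, its end_lsn field and the free function trim_log() are hypothetical, and the only point shown is that records ending exactly at the threshold LSN are removed as well (an inclusive comparison).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical redo log record: only the end LSN of the
// mini-transaction that wrote it matters for this sketch.
struct LogRecord {
  std::uint64_t end_lsn;
};

// Remove every record whose mini-transaction ended at or before the
// threshold. The "<=" is the point of the example: a record ending
// exactly at threshold_lsn is dropped too.
inline void trim_log(std::vector<LogRecord> &log,
                     std::uint64_t threshold_lsn) {
  log.erase(std::remove_if(log.begin(), log.end(),
                           [threshold_lsn](const LogRecord &r) {
                             return r.end_lsn <= threshold_lsn;
                           }),
            log.end());
}

// Usage: with records ending at 100, 200 and 300 and a threshold of
// 200, only the record ending at 300 survives.
inline void check_trim() {
  std::vector<LogRecord> log{{100}, {200}, {300}};
  trim_log(log, 200);
  assert(log.size() == 1);
  assert(log[0].end_lsn == 300);
}
```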
| | * | | | Merge branch '10.6.12' into 10.6Oleksandr Byelkin2023-02-061-2/+2
| | |\ \ \ \
| | | * | | | Silence gcc-11 warningsVicențiu Ciorbaru2023-02-031-2/+2
| | | | | | |
* | | | | | | Merge branch '10.8' into 10.9mariadb-10.9.5Oleksandr Byelkin2023-02-012-0/+14
|\ \ \ \ \ \ \
| |/ / / / / /
| * | | | | | Merge branch '10.7' into 10.8mariadb-10.8.7Oleksandr Byelkin2023-02-012-0/+14
| |\ \ \ \ \ \
| | * \ \ \ \ \ Merge branch '10.6' into 10.7mariadb-10.7.810.7Oleksandr Byelkin2023-02-012-0/+14
| | |\ \ \ \ \ \
| | | |/ / / / /
| | | * | | | | MDEV-30527 Assertion !m_freed_pages in mtr_t::start() on DROP TEMPORARY TABLEMarko Mäkelä2023-02-011-0/+13
| | | | | | | |   mtr_t::commit(): Add special handling of innodb_immediate_scrub_data_uncompressed for TEMPORARY TABLE.
| | | | | | | |
| | | | | | | |   This fixes a regression that was caused by commit de4030e4d49805a7ded5c0bfee01cc3fd7623522 (MDEV-30400).
| | | * | | | | MDEV-30524 btr_cur_t::open_leaf() opens non-leaf page in BTR_MODIFY_LEAF modeMarko Mäkelä2023-01-311-0/+1
| | | | | | | |   btr_cur_t::open_leaf(): When we have to reopen the root page in a different mode, ensure that we will actually acquire a latch upfront, instead of using RW_NO_LATCH. This prevents a race condition where the index tree would be split between the time we released the root page S latch and finally acquired a latch in mtr->upgrade_buffer_fix(), actually on a non-leaf root page.
| | | | | | | |
| | | | | | | |   This race condition was introduced in commit 89ec4b53ac4c7568b9c9765fff50d9bec7cf3534 (MDEV-29603).
* | | | | | | | Merge branch '10.8' into 10.9Oleksandr Byelkin2023-01-3125-218/+342
|\ \ \ \ \ \ \ \
| |/ / / / / / /
| * | | | | | | Merge branch '10.7' into 10.8Oleksandr Byelkin2023-01-3125-218/+342
| |\ \ \ \ \ \ \
| | |/ / / / / /
| | * | | | | | Merge branch '10.6' into 10.7Oleksandr Byelkin2023-01-3125-218/+342
| | |\ \ \ \ \ \
| | | |/ / / / /
| | | * | | | | Merge branch '10.5' into 10.6Oleksandr Byelkin2023-01-3124-218/+335
| | | |\ \ \ \ \
| | | | |/ / / /
| | | |/| / / /
| | | | |/ / /
| | | | * | | Merge branch '10.4' into 10.5Oleksandr Byelkin2023-01-2721-201/+331
| | | | |\ \ \
| | | | | * \ \ Merge branch '10.3' into 10.4Oleksandr Byelkin2023-01-2614-92/+111
| | | | | |\ \ \
| | | | | | |/ /
| | | | | |/| /
| | | | | | |/
| | | | | | * Fix connect bson.cpp warningVicențiu Ciorbaru2023-01-201-1/+1
| | | | | | |   The ptyp variable is unused.
| | | | | | * Fix mroonga warning of use-after-freeVicențiu Ciorbaru2023-01-201-2/+2
| | | | | | |
| | | | | | * Minimize unsafe C functions usage - replace strcat() and strcpy() (and ↵Mikhail Chalov2023-01-2013-90/+109
| | | | | | |   strncat() and strncpy()) with custom safe_strcat() and safe_strcpy() functions
| | | | | | |
| | | | | | |   The MariaDB code base uses strcat() and strcpy() in several places. These are known to have memory safety issues and their usage is discouraged. Common security scanners like Flawfinder flag them. In MariaDB we should start using modern and safer variants of these functions.
| | | | | | |
| | | | | | |   This is similar to the memory issue fixes in 19af1890b56c6c147c296479bb6a4ad00fa59dbb and 9de9f105b5cb88249acc39af73d32af337d6fd5f, but now replaces the use of strcat() and strcpy() with the safer options strncat() and strncpy().
| | | | | | |
| | | | | | |   However, add '\0' forcefully to make sure the result string is correct, since for these two functions it is not guaranteed that the new string will be null-terminated.
| | | | | | |
| | | | | | |   Example:
| | | | | | |
| | | | | | |       size_t dest_len = sizeof(g->Message);
| | | | | | |       strncpy(g->Message, "Null json tree", dest_len);
| | | | | | |       strncat(g->Message, ":", sizeof(g->Message) - strlen(g->Message));
| | | | | | |       size_t wrote_sz = strlen(g->Message);
| | | | | | |       size_t cur_len = wrote_sz >= dest_len ? dest_len - 1 : wrote_sz;
| | | | | | |       g->Message[cur_len] = '\0';
| | | | | | |
| | | | | | |   All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services
| | | | | | |
| | | | | | |   -- Reviewer and co-author Vicențiu Ciorbaru <vicentiu@mariadb.org>
| | | | | | |   -- Reviewer additions:
| | | | | | |   * The initial function implementation was flawed. Replaced with a simpler and also correct version.
| | | | | | |   * Simplified code by making use of snprintf instead of chaining strcat.
| | | | | | |   * Simplified code by removing dynamic string construction in the first place and using static strings if possible. See connect storage engine changes.
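The idea behind a bounded, always-terminated copy can be sketched as follows. The log does not show the actual safe_strcpy() or safe_strcat() that the commit added, so this is not their implementation: bounded_copy() and its signature are hypothetical and only illustrate the general technique of copying at most dst_size - 1 bytes, always writing a terminator, and reporting truncation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper, NOT the MariaDB safe_strcpy(): copy src into
// dst without writing more than dst_size bytes, always terminate the
// result, and return true if src had to be truncated.
inline bool bounded_copy(char *dst, std::size_t dst_size, const char *src) {
  if (dst_size == 0)
    return true;  // no room even for the terminator
  const std::size_t src_len = std::strlen(src);
  const std::size_t n = src_len < dst_size - 1 ? src_len : dst_size - 1;
  std::memcpy(dst, src, n);
  dst[n] = '\0';  // terminate explicitly, unlike strncpy()
  return src_len > n;
}

// Usage: a short string fits, a long one is cut but stays terminated.
inline void check_bounded_copy() {
  char buf[8];
  assert(!bounded_copy(buf, sizeof buf, "abc"));
  assert(std::strcmp(buf, "abc") == 0);
  assert(bounded_copy(buf, sizeof buf, "0123456789"));
  assert(std::strcmp(buf, "0123456") == 0);
}
```

Returning a truncation flag lets the caller decide whether a shortened message is acceptable, instead of silently producing an unterminated or clipped string.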
| | | | | * | MDEV-30370 Fixing spider hang when server abortsYuchen Pei2023-01-253-4/+25
| | | | | | |   This is Kentoku's patch for MDEV-22979 (e6e41f04f4e + 22a0097727f), which fixes 30370.
| | | | | | |
| | | | | | |   It changes the wait to a timed wait for the first sts thread, which waits on server start to execute the init queries for spider. It also flips the flag init_command to false when the sts thread is being freed. With these changes the sts thread can check the flag regularly and abort the init_queries when it finds out that init_command is false. This avoids the deadlock that causes the problem in MDEV-30370.
| | | | | | |
| | | | | | |   It also fixes MDEV-22979 for 10.4, but not 10.5. I have not tested higher versions for MDEV-22979.
| | | | | | |
| | | | | | |   A test has also been done on MDEV-29904 to avoid regression, given that MDEV-27233 is a similar problem and its patch caused the regression. The test passes for 10.4-11.0. However, this ad hoc test only works consistently when placed in the main testsuite. We should not place spider tests in the main suite, so we do not include it in this commit. A patch for MDEV-27912 should fix this problem and allow a proper test for MDEV-29904. See the comments in the jira ticket MDEV-30370/29904 for the ad hoc testcase used for this commit.
| | | | | * | MDEV-26541 Make UBSAN builds work with spider again.Yuchen Pei2023-01-204-107/+206
| | | | | | |   When built with ubsan and trying to load the spider plugin, the hidden-visibility compile flag of mysqld causes ha_spider.so to be missing the symbol ha_partition. This commit fixes that, as well as some memcpy null pointer issues when built with ubsan.
| | | | | | |
| | | | | | |   Signed-off-by: Yuchen Pei <yuchen.pei@mariadb.com>
| | | | | * | Merge 10.3 into 10.4Marko Mäkelä2023-01-171-4/+11
| | | | | |\ \
| | | | | | |/
| | | | | | * MDEV-30422 Merge new release of InnoDB 5.7.41 to 10.3Marko Mäkelä2023-01-171-4/+11
| | | | | | |   MySQL 5.7.41 includes one InnoDB change mysql/mysql-server@d2d6b2dd00f709bc528386009150d4bc726e25a0 that seems to be applicable to MariaDB Server 10.3 and 10.4. Even though commit 5b9ee8d8193a8c7a8ebdd35eedcadc3ae78e7fc1 seems to have fixed sporadic failures on our CI systems, it is theoretically possible that another race condition remained.
| | | | | | |
| | | | | | |   buf_flush_page_cleaner_coordinator(): In the final loop, wait also for buf_get_n_pending_read_ios() to reach 0. In this way, if a secondary index leaf page was read into the buffer pool and ibuf_merge_or_delete_for_page() modified that page or some change buffer pages, the flush loop would execute until the buffer pool really is in a clean state.
| | | | | | |
| | | | | | |   This potential data corruption bug does not affect MariaDB Server 10.5 or later, thanks to commit b42294bc6409794bdbd2051b32fa079d81cea61d which removed change buffer merges that are not explicitly requested.
| | | | * | | MDEV-30404: Inconsistent updates of PAGE_MAX_TRX_ID on ROW_FORMAT=COMPRESSED ↵Marko Mäkelä2023-01-261-6/+3
| | | | | | |   pages
| | | | | | |
| | | | | | |   page_copy_rec_list_start(): Do not update the PAGE_MAX_TRX_ID on the compressed copy of the page. The modification is supposed to be logged as part of page_zip_compress() or page_zip_reorganize(). If the page cannot be compressed (due to running out of space), then page_zip_decompress() must be able to roll back the changes.
| | | | | | |
| | | | | | |   This fixes a regression that was introduced in commit 56f6dab1d0e5a464ea49c1e5efb0032a0f5cea3e (MDEV-21174).
| | | | * | | MDEV-23855 fixup: Remove SRV_MASTER_CHECKPOINT_INTERVALMarko Mäkelä2023-01-251-12/+1
| | | | | | |
| | | * | | | MDEV-30429 InnoDB: Failing assertion: stat_value != UINT64_UNDEFINED in ↵Thirunarayanan Balathandayuthapani2023-01-261-0/+7
| | | | | | |   storage/innobase/dict/dict0stats.cc line 3647
| | | | | | |
| | | | | | |   In dict_stats_analyze_index(), InnoDB sets the maximum value for index_stats_t to indicate that the index is under a bulk insert operation. But InnoDB fails to empty the statistics of the table in that case.
* | | | | | | Merge 10.8 into 10.9Marko Mäkelä2023-01-2445-2628/+2974
|\ \ \ \ \ \ \
| |/ / / / / /
| * | | | | | Merge 10.7 into 10.8Marko Mäkelä2023-01-2445-2628/+2974
| |\ \ \ \ \ \ | | |/ / / / /
| | * | | | | Merge 10.6 into 10.7Marko Mäkelä2023-01-2446-2583/+2913
| | |\ \ \ \ \ | | | |/ / / /
| | | * | | | MDEV-30400 Assertion height == btr_page_get_level(...) on INSERTMarko Mäkelä2023-01-2441-2464/+2566
| | | | | | |   This also fixes part of MDEV-29835 Partial server freeze which is caused by violations of the latching order that was defined in https://dev.mysql.com/worklog/task/?id=6326 (WL#6326: InnoDB: fix index->lock contention). Unless the current thread is holding an exclusive dict_index_t::lock, it must acquire page latches in a strict parent-to-child, left-to-right order. Not all cases of MDEV-29835 are fixed yet. Failure to follow the correct latching order will cause deadlocks of threads due to lock order inversion.
| | | | | | |
| | | | | | |   As part of these changes, the BTR_MODIFY_TREE mode is modified so that an Update latch (U a.k.a. SX) will be acquired on the root page, and eXclusive latches (X) will be acquired on all pages leading to the leaf page, as well as any left and right siblings of the pages along the path. The DEBUG_SYNC test innodb.innodb_wl6326 will be removed, because at the time the DEBUG_SYNC point is hit, the thread is actually holding several page latches that will be blocking a concurrent SELECT statement.
| | | | | | |
| | | | | | |   We also remove double bookkeeping that was caused due to excessive information hiding in mtr_t::m_memo. We simply let mtr_t::m_memo store information of latched pages, and ensure that mtr_memo_slot_t::object is never a null pointer. The tree_blocks[] and tree_savepoints[] were redundant.
| | | | | | |
| | | | | | |   buf_page_get_low(): If innodb_change_buffering_debug=1, to avoid a hang, do not try to evict blocks if we are holding a latch on a modified page. The test innodb.innodb-change-buffer-recovery will be removed, because change buffering may no longer be forced by debug injection when the change buffer comprises multiple pages. Remove a debug assertion that could fail when innodb_change_buffering_debug=1 fails to evict a page. For other cases, the assertion is redundant, because we already checked that right after the got_block: label. The test innodb.innodb-change-buffering-recovery will be removed, because due to this change, we will be unable to evict the desired page.
| | | | | | |
| | | | | | |   mtr_t::lock_register(): Register a change of a page latch on an unmodified buffer-fixed block.
| | | | | | |
| | | | | | |   mtr_t::x_latch_at_savepoint(), mtr_t::sx_latch_at_savepoint(): Replaced by the use of mtr_t::upgrade_buffer_fix(), which now also handles RW_S_LATCH.
| | | | | | |
| | | | | | |   mtr_t::set_modified(): For temporary tables, invoke buf_page_t::set_modified() here and not in mtr_t::commit(). We will never set the MTR_MEMO_MODIFY flag on other than persistent data pages, nor set mtr_t::m_modifications when temporary data pages are modified.
| | | | | | |
| | | | | | |   mtr_t::commit(): Only invoke the buf_flush_note_modification() loop if persistent data pages were modified.
| | | | | | |
| | | | | | |   mtr_t::get_already_latched(): Look up a latched page in mtr_t::m_memo. This avoids many redundant entries in mtr_t::m_memo, as well as redundant calls to buf_page_get_gen() for blocks that had already been looked up in a mini-transaction.
| | | | | | |
| | | | | | |   btr_get_latched_root(): Return a pointer to an already latched root page. This replaces btr_root_block_get() in cases where the mini-transaction has already latched the root page.
| | | | | | |
| | | | | | |   btr_page_get_parent(): Fetch a parent page that was already latched in BTR_MODIFY_TREE, by invoking mtr_t::get_already_latched(). If needed, upgrade the root page U latch to X. This avoids bloating mtr_t::m_memo as well as performing redundant buf_pool.page_hash lookups. For non-QUICK CHECK TABLE as well as for B-tree defragmentation, we will invoke btr_cur_search_to_nth_level().
| | | | | | |
| | | | | | |   btr_cur_search_to_nth_level(): This will only be used for non-leaf (level>0) B-tree searches that were formerly named BTR_CONT_SEARCH_TREE or BTR_CONT_MODIFY_TREE. In MDEV-29835, this function could be removed altogether, or retained for the case of CHECK TABLE without QUICK.
| | | | | | |
| | | | | | |   btr_cur_t::left_block: Remove. btr_pcur_move_backward_from_page() can retrieve the left sibling from the end of mtr_t::m_memo.
| | | | | | |
| | | | | | |   btr_cur_t::open_leaf(): Some clean-up.
| | | | | | |
| | | | | | |   btr_cur_t::search_leaf(): Replaces btr_cur_search_to_nth_level() for searches to level=0 (the leaf level). We will never release parent page latches before acquiring leaf page latches. If we need to temporarily release the level=1 page latch in the BTR_SEARCH_PREV or BTR_MODIFY_PREV latch_mode, we will reposition the cursor on the child node pointer so that we will land on the correct leaf page.
| | | | | | |
| | | | | | |   btr_cur_t::pessimistic_search_leaf(): Implement new BTR_MODIFY_TREE latching logic in the case that page splits or merges will be needed. The parent pages (and their siblings) should already be latched on the first dive to the leaf and be present in mtr_t::m_memo; there should be no need for BTR_CONT_MODIFY_TREE. This pre-latching almost suffices; it must be revised in MDEV-29835 and work-arounds removed for cases where mtr_t::get_already_latched() fails to find a block.
| | | | | | |
| | | | | | |   rtr_search_to_nth_level(): A SPATIAL INDEX version of btr_search_to_nth_level() that can search to any level (including the leaf level).
| | | | | | |
| | | | | | |   rtr_search_leaf(), rtr_insert_leaf(): Wrappers for rtr_search_to_nth_level().
| | | | | | |
| | | | | | |   rtr_search(): Replaces rtr_pcur_open().
| | | | | | |
| | | | | | |   rtr_latch_leaves(): Replaces btr_cur_latch_leaves(). Note that unlike in the B-tree code, there is no error handling in case the sibling pages are corrupted.
| | | | | | |
| | | | | | |   rtr_cur_restore_position(): Remove an unused constant parameter.
| | | | | | |
| | | | | | |   btr_pcur_open_on_user_rec(): Remove the constant parameter mode=PAGE_CUR_GE.
| | | | | | |
| | | | | | |   row_ins_clust_index_entry_low(): Use a new mode=BTR_MODIFY_ROOT_AND_LEAF to gain access to the root page when mode!=BTR_MODIFY_TREE, to write the PAGE_ROOT_AUTO_INC.
| | | | | | |
| | | | | | |   BTR_SEARCH_TREE, BTR_CONT_SEARCH_TREE: Remove.
| | | | | | |
| | | | | | |   BTR_CONT_MODIFY_TREE: Note that this is only used by rtr_search_to_nth_level().
| | | | | | |
| | | | | | |   btr_pcur_optimistic_latch_leaves(): Replaces btr_cur_optimistic_latch_leaves().
| | | | | | |
| | | | | | |   ibuf_delete_rec(): Acquire exclusive ibuf.index->lock in order to avoid a deadlock with ibuf_insert_low(BTR_MODIFY_PREV).
| | | | | | |
| | | | | | |   btr_blob_log_check_t(): Acquire a U latch on the root page, so that btr_page_alloc() in btr_store_big_rec_extern_fields() will avoid a deadlock.
| | | | | | |
| | | | | | |   btr_store_big_rec_extern_fields(): Assert that the root page latch is being held.
| | | | | | |
| | | | | | |   Tested by: Matthias Leich
| | | | | | |   Reviewed by: Vladislav Lesin
| | | * | | | MDEV-24623 Replicate bulk insert as table-level exclusive keyDenis Protivensky2023-01-242-1/+50
| | | | | | |   - introduce table key construction function in wsrep service interface
| | | | | | |   - don't add row keys when replicating bulk insert
| | | | | | |   - don't start bulk insert on applier or when transaction is not active
| | | | | | |   - don't start bulk insert on system versioned tables
| | | | | | |   - implement actual bulk insert table-level key replication
| | | | | | |
| | | | | | |   Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
| | | * | | | MDEV-30393 InnoDB: Assertion failure in dict0dict.cc upon ADD FULLTEXT INDEXThirunarayanan Balathandayuthapani2023-01-241-1/+14
| | | | | | |   Problem:
| | | | | | |   ========
| | | | | | |   - InnoDB fails to remove the newly created table or index from the data dictionary and the table cache if the alter fails in the commit phase.
| | | | | | |
| | | | | | |   Solution:
| | | | | | |   ========
| | | | | | |   - InnoDB should restart the transaction to remove the newly created table and index when it fails in the commit phase of an alter operation.
| | | | | | |
| | | | | | |   innodb_fts.misc_debug tests the scenario with the help of the debug point "stats_lock_fail".