summaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAgeFilesLines
* WIP fix MDEV-30936 clang 15.0.7 -fsanitize=memory fails massivelybb-10.6-MDEV-30936Marko Mäkelä2023-03-2711-48/+20
| | | | | FIXME: innodb.full_crc32_import fails due to CRC-32C computation in buf_page_is_corrupted().
* MDEV-26782 InnoDB temporary tablespace: reclaiming of free space does not workMarko Mäkelä2023-03-274-64/+78
| | | | | | | | | | | | | | | | | | The motivation of this change is to allow undo pages for temporary tables to be marked free as often as possible, so that we can avoid buf_pool.LRU eviction of undo pages that contain data that is no longer needed. For temporary tables, no MVCC or purge of history is needed, and reusing cached undo log pages might not help that much. It is possible that this may cause some performance regression due to more frequent allocation and freeing of undo log pages. trx_write_serialisation_history(): Never cache temporary undo log pages. trx_undo_reuse_cached(): Assert that the rollback segment is persistent. trx_undo_assign_low(): Add template<bool is_temp>. Never invoke trx_undo_reuse_cached() for temporary tables.
* Cleanup: MONITOR_EXISTING trx_undo_slots_used, trx_undo_slots_cachedMarko Mäkelä2023-03-276-22/+33
| | | | | | Let us remove explicit updates of MONITOR_NUM_UNDO_SLOT_USED and MONITOR_NUM_UNDO_SLOT_CACHED, and let us compute the rough values from trx_sys.rseg_array[] on demand.
* trx_assign_rseg_low(): Simplify debug instrumentationMarko Mäkelä2023-03-271-14/+7
|
* MDEV-29593 Purge misses a chance to free not-yet-reused undo pagesMarko Mäkelä2023-03-272-117/+110
| | | | | | trx_purge_truncate_rseg_history(): If all other conditions for invoking trx_purge_remove_log_hdr() hold, but the state is TRX_UNDO_CACHED instead of TRX_UNDO_TO_PURGE, detach and free it.
* MDEV-26071: rpl.rpl_perfschema_applier_status_by_worker failed in bb …Andrei2023-03-243-29/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | …with: Test assertion failed Problem: ======= Assertion text: 'Value returned by SSS and PS table for Last_Error_Number should be same.' Assertion condition: '"1146" = "0"' Assertion condition, interpolated: '"1146" = "0"' Assertion result: '0' Analysis: ======== In parallel replication when slave is started the worker pool gets activated and it gets cleared when slave stops. Each time the worker pool gets activated a backup worker pool also gets created to store worker specific perforance schema information in case of errors. On error, all relevant information is copied from rpl_parallel_thread to rli and it gets cleared from thread. Then server waits for all workers to complete their work, during this stage performance schema table specific worker info is stored into the backup pool and finally the actual pool gets cleared. If users query the performance schema table to know the status of workers the information from backup pool will be used. The test simulates ER_NO_SUCH_TABLE error and verifies the worker information in pfs table. Test works fine if execution occurs in following order. Step 1. Error occurred 'worker information is copied to backup pool'. Step 2. handle_slave_sql invokes 'rpl_parallel_resize_pool_if_no_slaves' to deactivate worker pool, it marks the pool->count=0 Step 3. PFS table is queried, since actual pool is deactivated backup pool information is read. If the Step 3 happens prior to Step2 the pool is yet to be deactivated and the actual pool is read, which doesn't have any error details as they were cleared. Hence test ocasionally fails. Fix: === Upon error mark the back pool as being active so that if PFS table is quried since the backup pool is flagged as valid its information will be read, in case it is not flagged regular pool will be read. This work is one of the last pieces created by the late Sujatha Sivakumar.
* MDEV-29545 InnoDB: Can't find record during replace stmtThirunarayanan Balathandayuthapani2023-03-243-1/+49
| | | | | | | | | | | | | | | | Problem: ======== - InnoDB replace statement returns can't find record as result during bulk insert operation. InnoDB returns DB_END_OF_INDEX blindly when bulk transaction is visible to current transaction even though the search tuple is inserted as a part of current replace statement. Solution: ========= row_search_mvcc(): InnoDB should allow the transaction to read all the rows when innodb intends to do any locking on the record even though bulk insert transaction changes are visible to the current transaction
* MDEV-30900 Crash on macOS due to zero-initialized buf_dblwr.write_condMarko Mäkelä2023-03-241-0/+2
| | | | | | | | buf_dblwr_t::init(), buf_dblwr_t::close(): Cover also write_cond, which was added in commit a55b951e6082a4ce9a1f2ed5ee176ea7dbbaf1f2 without explicit initialization. On GNU/Linux, PTHREAD_COND_INITIALIZER is a zero-initializer. That is why the default zero initialization happened to work on that platform.
* rpl_reporting: sprintf -> snprintfDaniel Black2023-03-241-7/+7
| | | | | | This was failing to compile with AppleClang 14.0.0.14000029. Thanks to Arunesh Choudhary for noticing.
* Merge 10.5 into 10.6Marko Mäkelä2023-03-2229-111/+182
|\
| * MDEV-30882 Crash on ROLLBACK in a ROW_FORMAT=COMPRESSED tableMarko Mäkelä2023-03-222-17/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | btr_cur_upd_rec_in_place(): Avoid calling page_zip_write_rec() if we are not modifying any fields that are stored in compressed format. btr_cur_update_in_place_zip_check(): New function to check if a ROW_FORMAT=COMPRESSED record can actually be updated in place. btr_cur_pessimistic_update(): If the BTR_KEEP_POS_FLAG is not set (we are in a ROLLBACK and cannot write any BLOBs), ignore the potential overflow and let page_zip_reorganize() or page_zip_compress() handle it. This avoids a failure when an attempted UPDATE of an NULL column to 0 is rolled back. During the ROLLBACK, we would try to move a non-updated long column to off-page storage in order to avoid a compression failure of the ROW_FORMAT=COMPRESSED page. page_zip_write_trx_id_and_roll_ptr(): Remove an assertion that would fail in row_upd_rec_in_place() because the uncompressed page would already have been modified there. This is a 10.5 version of commit ff3d4395d808b6421d2e0714e10d48c7aa2f3c3a (different because of commit 08ba388713946c03aa591899cd3a446a6202f882).
| * [MDEV-30824] Fix binlog to use 'String' for setting 'character_set_client'Tingyao Nian2023-03-2127-94/+94
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit a923d6f49c1ad6fd3f4d6ec02e444c26e4d1dfa8 disabled numeric setting of character_set_* variables with non-default values: MariaDB [(none)]> set character_set_client=224; ERROR 1115 (42000): Unknown character set: '224' However the corresponding binlog functionality still write numeric values for log event, and this will break binlog replay if the value is not default. Now make the server use 'String' type for 'character_set_client' when generating binlog events Before: /*!\C utf8mb4 *//*!*/; SET @@session.character_set_client=224,@@session.collation_connection=224,@@session.collation_server=33/*!*/; After: /*!\C utf8mb4 *//*!*/; SET @@session.character_set_client=utf8mb4,@@session.collation_connection=33,@@session.collation_server=8/*!*/; Note: prior to the previous commit, setting with '224' or '45' or 'utf8mb4' have the same effect, as they all set the parameter to 'utf8mb4'. All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.
* | MDEV-30400 fixup: Fix the UNIV_ZIP_DEBUG buildMarko Mäkelä2023-03-202-3/+3
| |
* | MDEV-26827 fixup: Remove a bogus assertionMarko Mäkelä2023-03-201-1/+0
| | | | | | | | | | We can have dirty_blocks=0 when buf_flush_page_cleaner() is being woken up to write out or evict pages from the buf_pool.LRU list.
* | MDEV-30870 Undo tablespace name displays wrongly for I_S queriesThirunarayanan Balathandayuthapani2023-03-174-2/+16
| | | | | | | | | | | | - INNODB_SYS_TABLESPACES in information schema should display innodb_undo001, innodb_undo002 etc as tablespace name for undo tablespaces
* | MDEV-29975 InnoDB fails to release savepoint during bulk insertThirunarayanan Balathandayuthapani2023-03-173-1/+34
| | | | | | | | | | | | | | | | - InnoDB does rollback the whole transaction and discards the savepoint when there is a failure happens during bulk insert operation. When server request to release the savepoint, InnoDB should return DB_SUCCESS when it deals with bulk insert operation
* | MDEV-26827 Make page flushing even fasterMarko Mäkelä2023-03-1621-669/+705
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For more convenient monitoring of something that could greatly affect the volume of page writes, we add the status variable Innodb_buffer_pool_pages_split that was previously only available via information_schema.innodb_metrics as "innodb_page_splits". This was suggested by Axel Schwenke. buf_flush_page_count: Replaced with buf_pool.stat.n_pages_written. We protect buf_pool.stat (except n_page_gets) with buf_pool.mutex and remove unnecessary export_vars indirection. buf_pool.flush_list_bytes: Moved from buf_pool.stat.flush_list_bytes. Protected by buf_pool.flush_list_mutex. buf_pool_t::page_cleaner_status: Replaces buf_pool_t::n_flush_LRU_, buf_pool_t::n_flush_list_, and buf_pool_t::page_cleaner_is_idle. Protected by buf_pool.flush_list_mutex. We will exclusively broadcast buf_pool.done_flush_list by the buf_flush_page_cleaner thread, and only wait for it when communicating with buf_flush_page_cleaner. There is no need to keep a count of pending writes by the buf_pool.flush_list processing. A single flag suffices for that. Waits for page write completion can be performed by simply waiting on block->page.lock, or by invoking buf_dblwr.wait_for_page_writes(). buf_LRU_block_free_non_file_page(): Broadcast buf_pool.done_free and set buf_pool.try_LRU_scan when freeing a page. This would be executed also as part of buf_page_write_complete(). buf_page_write_complete(): Do not broadcast buf_pool.done_flush_list, and do not acquire buf_pool.mutex unless buf_pool.LRU eviction is needed. Let buf_dblwr count all writes to persistent pages and broadcast a condition variable when no outstanding writes remain. buf_flush_page_cleaner(): Prioritize LRU flushing and eviction right after "furious flushing" (lsn_limit). Simplify the conditions and reduce the hold time of buf_pool.flush_list_mutex. Refuse to shut down or sleep if buf_pool.ran_out(), that is, LRU eviction is needed. buf_pool_t::page_cleaner_wakeup(): Add the optional parameter for_LRU. buf_LRU_get_free_block(): Protect buf_lru_free_blocks_error_printed with buf_pool.mutex. Invoke buf_pool.page_cleaner_wakeup(true) to to ensure that buf_flush_page_cleaner() will process the LRU flush request. buf_do_LRU_batch(), buf_flush_list(), buf_flush_list_space(): Update buf_pool.stat.n_pages_written when submitting writes (while holding buf_pool.mutex), not when completing them. buf_page_t::flush(), buf_flush_discard_page(): Require that the page U-latch be acquired upfront, and remove buf_page_t::ready_for_flush(). buf_pool_t::delete_from_flush_list(): Remove the parameter "bool clear". buf_flush_page(): Count pending page writes via buf_dblwr. buf_flush_try_neighbors(): Take the block of page_id as a parameter. If the tablespace is dropped before our page has been written out, release the page U-latch. buf_pool_invalidate(): Let the caller ensure that there are no outstanding writes. buf_flush_wait_batch_end(false), buf_flush_wait_batch_end_acquiring_mutex(false): Replaced with buf_dblwr.wait_for_page_writes(). buf_flush_wait_LRU_batch_end(): Replaces buf_flush_wait_batch_end(true). buf_flush_list(): Remove some broadcast of buf_pool.done_flush_list. buf_flush_buffer_pool(): Invoke also buf_dblwr.wait_for_page_writes(). buf_pool_t::io_pending(), buf_pool_t::n_flush_list(): Remove. Outstanding writes are reflected by buf_dblwr.pending_writes(). buf_dblwr_t::init(): New function, to initialize the mutex and the condition variables, but not the backing store. buf_dblwr_t::is_created(): Replaces buf_dblwr_t::is_initialised(). buf_dblwr_t::pending_writes(), buf_dblwr_t::writes_pending: Keeps track of writes of persistent data pages. buf_flush_LRU(): Allow calls while LRU flushing may be in progress in another thread. Tested by Matthias Leich (correctness) and Axel Schwenke (performance)
* | MDEV-26055: Improve adaptive flushingMarko Mäkelä2023-03-165-354/+360
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adaptive flushing is enabled by setting innodb_max_dirty_pages_pct_lwm>0 (not default) and innodb_adaptive_flushing=ON (default). There is also the parameter innodb_adaptive_flushing_lwm (default: 10 per cent of the log capacity). It should enable some adaptive flushing even when innodb_max_dirty_pages_pct_lwm=0. That is not being changed here. This idea was first presented by Inaam Rana several years ago, and I discussed it with Jean-François Gagné at FOSDEM 2023. buf_flush_page_cleaner(): When we are not near the log capacity limit (neither buf_flush_async_lsn nor buf_flush_sync_lsn are set), also try to move clean blocks from the buf_pool.LRU list to buf_pool.free or initiate writes (but not the eviction) of dirty blocks, until the remaining I/O capacity has been consumed. buf_flush_LRU_list_batch(): Add the parameter bool evict, to specify whether dirty least recently used pages (from buf_pool.LRU) should be evicted immediately after they have been written out. Callers outside buf_flush_page_cleaner() will pass evict=true, to retain the existing behaviour. buf_do_LRU_batch(): Add the parameter bool evict. Return counts of evicted and flushed pages. buf_flush_LRU(): Add the parameter bool evict. Assume that the caller holds buf_pool.mutex and will invoke buf_dblwr.flush_buffered_writes() afterwards. buf_flush_list_holding_mutex(): A low-level variant of buf_flush_list() whose caller must hold buf_pool.mutex and invoke buf_dblwr.flush_buffered_writes() afterwards. buf_flush_wait_batch_end_acquiring_mutex(): Remove. It is enough to have buf_flush_wait_batch_end(). page_cleaner_flush_pages_recommendation(): Avoid some floating-point arithmetics. buf_flush_page(), buf_flush_check_neighbor(), buf_flush_check_neighbors(), buf_flush_try_neighbors(): Rename the parameter "bool lru" to "bool evict". buf_free_from_unzip_LRU_list_batch(): Remove the parameter. Only actual page writes will contribute towards the limit. buf_LRU_free_page(): Evict freed pages of temporary tables. buf_pool.done_free: Broadcast whenever a block is freed (and buf_pool.try_LRU_scan is set). buf_pool_t::io_buf_t::reserve(): Retry indefinitely. During the test encryption.innochecksum we easily run out of these buffers for PAGE_COMPRESSED or ENCRYPTED pages. Tested by Matthias Leich and Axel Schwenke
* | MDEV-30357 Performance regression in locking reads from secondary indexesMarko Mäkelä2023-03-164-36/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lock_sec_rec_some_has_impl(): Remove a harmful condition that caused the performance regression and should not have been added in commit b6e41e38720d1e8d33b2abec0d1109615133bc2b in the first place. Locking transactions that have not modified any persistent tables can carry the transaction identifier 0. trx_t::max_inactive_id: A cache for trx_sys_t::find_same_or_older(). The value is not reset on transaction commit so that previous results can be reused for subsequent transactions. The smallest active transaction ID can only increase over time, not decrease. trx_sys_t::find_same_or_older(): Remember the maximum previous id for which rw_trx_hash.iterate() returned false, to avoid redundant iterations. lock_sec_rec_read_check_and_lock(): Add an early return in case we are already holding a covering table lock. lock_rec_convert_impl_to_expl(): Add a template parameter to avoid a redundant run-time check on whether the index is secondary. lock_rec_convert_impl_to_expl_for_trx(): Move some code from lock_rec_convert_impl_to_expl(), to reduce code duplication due to the added template parameter. Reviewed by: Vladislav Lesin Tested by: Matthias Leich
* | MDEV-29835 InnoDB hang on B-tree split or mergeMarko Mäkelä2023-03-167-160/+156
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a follow-up to commit de4030e4d49805a7ded5c0bfee01cc3fd7623522 (MDEV-30400), which fixed some hangs related to B-tree split or merge. btr_root_block_get(): Use and update the root page guess. This is just a minor performance optimization, not affecting correctness. btr_validate_level(): Remove the parameter "lockout", and always acquire an exclusive dict_index_t::lock in CHECK TABLE without QUICK. This is needed in order to avoid latching order violation in btr_page_get_father_node_ptr_for_validate(). btr_cur_need_opposite_intention(): Return true in case btr_cur_compress_recommendation() would hold later during the mini-transaction, or if a page underflow or overflow is possible. If we return true, our caller will escalate to aqcuiring an exclusive dict_index_t::lock, to prevent a latching order violation and deadlock during btr_compress() or btr_page_split_and_insert(). btr_cur_t::search_leaf(), btr_cur_t::open_leaf(): Also invoke btr_cur_need_opposite_intention() on the leaf page. btr_cur_t::open_leaf(): When escalating to exclusive index locking, acquire exclusive latches on all pages as well. innobase_instant_try(): Return an error code if the root page cannot be retrieved. In addition to the normal stress testing with Random Query Generator (RQG) this has been tested with ./mtr --mysqld=--loose-innodb-limit-optimistic-insert-debug=2 but with the injection in btr_cur_optimistic_insert() for non-leaf pages adjusted so that it would use the value 3. (Otherwise, infinite page splits could occur in some mtr tests.) Tested by: Matthias Leich
* | Merge 10.5 into 10.6Marko Mäkelä2023-03-166-21/+101
|\ \ | |/
| * MDEV-30860 Race condition between buffer pool flush and log file deletion in ↵Marko Mäkelä2023-03-161-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mariadb-backup --prepare srv_start(): If we are going to close the log file in mariadb-backup --prepare, call buf_flush_sync() before calling recv_sys.debug_free() to ensure that the log file will not be accessed. This fixes a rather rare failure in the test mariabackup.innodb_force_recovery where buf_flush_page_cleaner() would invoke log_checkpoint_low() because !recv_recovery_is_on() would hold due to the fact that recv_sys.debug_free() had already been called. Then, the log write for the checkpoint would fail because srv_start() had invoked log_sys.log.close_file().
| * MDEV-30775 Performance regression in fil_space_t::try_to_close() introduced ↵Vlad Lesin2023-03-104-16/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | in MDEV-23855 fil_node_open_file_low() tries to close files from the top of fil_system.space_list if the number of opened files is exceeded. It invokes fil_space_t::try_to_close(), which iterates the list searching for the first opened space. Then it just closes the space, leaving it in the same position in fil_system.space_list. On heavy files opening, like during 'SHOW TABLE STATUS ...' execution, if the number of opened files limit is reached, fil_space_t::try_to_close() iterates more and more closed spaces before reaching any opened space for each fil_node_open_file_low() call. What causes performance regression if the number of spaces is big enough. The fix is to keep opened spaces at the top of fil_system.space_list, and move closed files at the end of the list. For this purpose fil_space_t::space_list_last_opened pointer is introduced. It points to the last inserted opened space in fil_space_t::space_list. When space is opened, it's inserted to the position just after the pointer points to in fil_space_t::space_list to preserve the logic, inroduced in MDEV-23855. Any closed space is added to the end of fil_space_t::space_list. As opened spaces are located at the top of fil_space_t::space_list, fil_space_t::try_to_close() finds opened space faster. There can be the case when opened and closed spaces are mixed in fil_space_t::space_list if fil_system.freeze_space_list was set during fil_node_open_file_low() execution. But this should not cause any error, as fil_space_t::try_to_close() still iterates spaces in the list. There is no need in any test case for the fix, as it does not change any functionality, but just fixes performance regression.
* | Merge 10.5 into 10.6Marko Mäkelä2023-03-1015-159/+172
|\ \ | |/
| * MDEV-30819 InnoDB fails to start up after downgrading from MariaDB 11.0Marko Mäkelä2023-03-093-14/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While downgrades are not supported and misguided attempts at it could cause serious corruption especially after commit b07920b634f455c39e3650c6163bec2a8ce0ffe0 it might be useful if InnoDB would start up even after an upgrade to MariaDB Server 11.0 or later had removed the change buffer. innodb_change_buffering_update(): Disallow anything else than innodb_change_buffering=none when the change buffer is corrupted. ibuf_init_at_db_start(): Mention a possible downgrade in the corruption error message. If innodb_change_buffering=none, ignore the error but do not initialize ibuf.index. ibuf_free_excess_pages(), ibuf_contract(), ibuf_merge_space(), ibuf_update_max_tablespace_id(), ibuf_delete_for_discarded_space(), ibuf_print(): Check for !ibuf.index. ibuf_check_bitmap_on_import(): Remove some unnecessary code. This function is only accessing change buffer bitmap pages in a data file that is not attached to the rest of the database. It is not accessing the change buffer tree itself, hence it does not need any additional mutex protection. This has been tested both by starting up MariaDB Server 10.8 on a 11.0 data directory, and by running ./mtr --big-test while ibuf_init_at_db_start() was tweaked to always fail.
| * MDEV-23000: Ensure we get a warning from THD::drop_temporary_table() in case ↵Weijun Huang2023-03-092-7/+8
| | | | | | | | of disk errors
| * move alloca() definition from all *.h files to one new header fileJulius Goryavsky2023-03-079-32/+58
| |
| * MDEV-30567 rec_get_offsets() is not optimalMarko Mäkelä2023-03-061-118/+90
| | | | | | | | | | | | | | | | | | | | rec_init_offsets_comp_ordinary(), rec_init_offsets(), rec_get_offsets_reverse(), rec_get_nth_field_offs_old(): Simplify some bitwise arithmetics to avoid conditional jumps, and add branch prediction hints with the assumption that most variable-length columns are short. Tested by: Matthias Leich
* | Merge 10.5 into 10.6Marko Mäkelä2023-03-061-1/+1
|\ \ | |/
| * Fix GCC 5.3.1 -Wsign-compareMarko Mäkelä2023-03-061-1/+1
| | | | | | | | This fixes up commit 57c526ffb852fb027e25fdc77173d45bdc60b8a2
| * CONC-637 Build fails when specifying -DPLUGIN_AUTH_GSSAPI_CLIENT=OFFbb-10.5-sergSergei Golubchik2023-02-281-0/+0
| |
* | MDEV-30341 Reset check_foreigns, check_unique_secondary variablesThirunarayanan Balathandayuthapani2023-03-022-1/+6
| | | | | | | | | | | | - InnoDB fails to reset the check_foreigns and check_unique_secondary in trx_t::free(), trx_t::commit_cleanup(). This lead to bulk insert in internal innodb fts table operation.
* | Merge 10.5 into 10.6Marko Mäkelä2023-02-2818-102/+168
|\ \ | |/
| * MDEV-30753 Possible corruption due to trx_purge_free_segment()Marko Mäkelä2023-02-282-59/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Starting with commit 0de3be8cfdfc26f5c236eaefe12d03c7b4af22c8 (MDEV-30671), the field TRX_UNDO_NEEDS_PURGE lost its previous meaning. The following scenario is possible: (1) InnoDB is killed at a point of time corresponding to the durable execution of some fseg_free_step_not_header() but not trx_purge_remove_log_hdr(). (2) After restart, the affected pages are allocated for something else. (3) Purge will attempt to access the newly reallocated pages when looking for some old undo log records. trx_purge_free_segment(): Invoke trx_purge_remove_log_hdr() as the first thing, to be safe. If the server is killed, some pages will never be freed. That is the lesser evil. Also, before each mtr.start(), invoke log_free_check() to prevent ib_logfile0 overrun.
| * Added detection of memory overwrite with multi_mallocbb-10.5-montyMonty2023-02-2718-58/+136
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch also fixes some bugs detected by valgrind after this patch: - Not enough copy_func elements was allocated by Create_tmp_table() which causes an memory overwrite in Create_tmp_table::add_fields() I added an ASSERT() to be able to detect this also without valgrind. The bug was that TMP_TABLE_PARAM::copy_fields was not correctly set when calling create_tmp_table(). - Aria::empty_bits is not allocated if there is no varchar/char/blob fields in the table. Fixed code to take this into account. This cannot cause any issues as this is just a memory access into other Aria memory and the content of the memory would not be used. - Aria::last_key_buff was not allocated big enough. This may have caused issues with rtrees and ma_extra(HA_EXTRA_REMEMBER_POS) as they would use the same memory area. - Aria and MyISAM didn't take extended key parts into account, which caused problems when copying rec_per_key from engine to sql level. - Mark asan builds with 'asan' in version strihng to detect these in not_valgrind_build.inc. This is needed to not have main.sp-no-valgrind fail with asan.
* | Merge 10.5 into 10.6Marko Mäkelä2023-02-2716-214/+201
|\ \ | |/
| * MDEV-30671 InnoDB undo log truncation fails to wait for purge of historyMarko Mäkelä2023-02-2416-200/+206
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is not safe to invoke trx_purge_free_segment() or execute innodb_undo_log_truncate=ON before all undo log records in the rollback segment has been processed. A prominent failure that would occur due to premature freeing of undo log pages is that trx_undo_get_undo_rec() would crash when trying to copy an undo log record to fetch the previous version of a record. If trx_undo_get_undo_rec() was not invoked in the unlucky time frame, then the symptom would be that some committed transaction history is never removed. This would be detected by CHECK TABLE...EXTENDED that was impleented in commit ab0190101b0587e0e03b2d75a967050b9a85fd1b. Such a garbage collection leak should be possible even when using innodb_undo_log_truncate=OFF, just involving trx_purge_free_segment(). trx_rseg_t::needs_purge: Change the type from Boolean to a transaction identifier, noting the most recent non-purged transaction, or 0 if everything has been purged. On transaction start, we initialize this to 1 more than the transaction start ID. On recovery, the field may be adjusted to the transaction end ID (TRX_UNDO_TRX_NO) if it is larger. The field TRX_UNDO_NEEDS_PURGE becomes write-only; only some debug assertions that would validate the value. The field reflects the old inaccurate Boolean field trx_rseg_t::needs_purge. trx_undo_mem_create_at_db_start(), trx_undo_lists_init(), trx_rseg_mem_restore(): Remove the parameter max_trx_id. Instead, store the maximum in trx_rseg_t::needs_purge, where trx_rseg_array_init() will find it. trx_purge_free_segment(): Contiguously hold a lock on trx_rseg_t to prevent any concurrent allocation of undo log. trx_purge_truncate_rseg_history(): Only invoke trx_purge_free_segment() if the rollback segment is empty and there are no pending transactions associated with it. trx_purge_truncate_history(): Only proceed with innodb_undo_log_truncate=ON if trx_rseg_t::needs_purge indicates that all history has been purged. Tested by: Matthias Leich
* | MDEV-25984 Assertion `max_doc_id > 0' failed in fts_init_doc_id()Thirunarayanan Balathandayuthapani2023-02-224-5/+63
| | | | | | | | | | | | | | | | | | - rollback_inplace_alter_table() locks the fts internal tables. At the time, insert tries to fetch the doc id from config table, fails to lock the config table and returns doc id as 0. fts_cmp_set_sync_doc_id(): Retry to fetch the doc id again if it encounter DB_LOCK_WAIT_TIMEOUT error
* | MDEV-29871 innodb_fts.fulltext_misc unexpectedly reports a resultThirunarayanan Balathandayuthapani2023-02-212-5/+3
| | | | | | | | | | - match()+0 returns the floating result and converts into integer value and it leads to sporadic failure.
* | MDEV-27701 Race on trx->lock.wait_lock between lock_rec_move() and ↵Vlad Lesin2023-02-204-4/+138
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lock_sys_t::cancel() The initial issue was in assertion failure, which checked the equality of lock to cancel with trx->lock.wait_lock in lock_sys_t::cancel(). If we analyze lock_sys_t::cancel() code from the perspective of trx->lock.wait_lock racing, we won't find the error there, except the cases when we need to reload it after the corresponding latches acquiring. So the fix is just to remove the assertion and reload trx->lock.wait_lock after acquiring necessary latches. Reviewed by: Marko Mäkelä <marko.makela@mariadb.com>
* | Merge 10.5 into 10.6Marko Mäkelä2023-02-162-2/+11
|\ \ | |/
| * MDEV-30552 fixup: Fix the test for non-debugMarko Mäkelä2023-02-162-2/+11
| |
| * Fix clang -Winconsistent-missing-overrideMarko Mäkelä2023-02-161-1/+1
| |
* | MDEV-30638 Deadlock between INSERT and InnoDB non-persistent statistics updateMarko Mäkelä2023-02-167-33/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a partial revert of commit 8b6a308e463f937eb8d2498b04967a222c83af90 (MDEV-29883) and a follow-up to the merge commit 394fc71f4fa8f8b1b6d24adfead0ec45121d271e (MDEV-24569). The latching order related to any operation that accesses the allocation metadata of an InnoDB index tree is as follows: 1. Acquire dict_index_t::lock in non-shared mode. 2. Acquire the index root page latch in non-shared mode. 3. Possibly acquire further index page latches. Unless an exclusive dict_index_t::lock is held, this must follow the root-to-leaf, left-to-right order. 4. Acquire a *non-shared* fil_space_t::latch. 5. Acquire latches on the allocation metadata pages. 6. Possibly allocate and write some pages, or free some pages. btr_get_size_and_reserved(), dict_stats_update_transient_for_index(), dict_stats_analyze_index(): Acquire an exclusive fil_space_t::latch in order to avoid a deadlock in fseg_n_reserved_pages() in case of concurrent access to multiple indexes sharing the same "inode page". fseg_page_is_allocated(): Acquire an exclusive fil_space_t::latch in order to avoid deadlocks. All callers are holding latches on a buffer pool page, or an index, or both. Before commit edbde4a11fd0b6437202f8019a79911441b6fb32 (MDEV-24167) a third mode was available that would not conflict with the shared fil_space_t::latch acquired by ha_innobase::info_low(), i_s_sys_tablespaces_fill_table(), or i_s_tablespaces_encryption_fill_table(). Because those calls should be rather rare, it makes sense to use the simple rw_lock with only shared and exclusive modes. fil_crypt_get_page_throttle(): Avoid invoking fseg_page_is_allocated() on an allocation bitmap page (which can never be freed), to avoid acquiring a shared latch on top of an exclusive one. mtr_t::s_lock_space(), MTR_MEMO_SPACE_S_LOCK: Remove.
* | MDEV-30134 Assertion failed in buf_page_t::unfix() in buf_pool_t::watch_unset()Marko Mäkelä2023-02-162-50/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | buf_pool_t::watch_set(): Always buffer-fix a block if one was found, no matter if it is a watch sentinel or a buffer page. The type of the block descriptor will be rechecked in buf_page_t::watch_unset(). Do not expect the caller to acquire the page hash latch. Starting with commit bd5a6403cace36c6ed428cde62e35adcd3f7e7d0 it is safe to release buf_pool.mutex before acquiring a buf_pool.page_hash latch. buf_page_get_low(): Adjust to the changed buf_pool_t::watch_set(). This simplifies the logic and fixes a bug that was reproduced when using debug builds and the setting innodb_change_buffering_debug=1.
* | MDEV-30397: MariaDB crash due to DB_FAIL reported for a corrupted pageMarko Mäkelä2023-02-163-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | buf_read_page_low(): Map the buf_page_t::read_complete() return value DB_FAIL to DB_PAGE_CORRUPTED. The purpose of the DB_FAIL return value is to avoid error log noise when read-ahead brings in an unused page that is typically filled with NUL bytes. If a synchronous read is bringing in a corrupted page where the page frame does not contain the expected tablespace identifier and page number, that must be treated as an attempt to read a corrupted page. The correct error code for this is DB_PAGE_CORRUPTED. The error code DB_FAIL is not handled by row_mysql_handle_errors(). This was missed in commit 0b47c126e31cddda1e94588799599e138400bcf8 (MDEV-13542).
* | Merge 10.5 into 10.6Marko Mäkelä2023-02-167-16/+120
|\ \ | |/
| * MDEV-30657 InnoDB: Not applying UNDO_APPEND due to corruptionMarko Mäkelä2023-02-152-7/+21
| | | | | | | | | | | | | | | | | | | | | | | | This almost completely reverts commit acd23da4c2363511aae7d984c24cc6847aa3f19c and retains a safe optimization: recv_sys_t::parse(): Remove any old redo log records for the truncated tablespace, to free up memory earlier. If recovery consists of multiple batches, then recv_sys_t::apply() will must invoke recv_sys_t::trim() again to avoid wrongly applying old log records to an already truncated undo tablespace.
| * MDEV-30333 Wrong result with not_null_range_scan and LEFT JOIN with empty tableMonty2023-02-153-5/+95
| | | | | | | | | | | | | | | | | | | | | | | | | | | | There was a bug in JOIN::make_notnull_conds_for_range_scans() when clearing TABLE->tmp_set, which was used to mark fields that could not be null. This function was only used if 'not_null_range_scan=on' is set. The effect was that tmp_set contained a 'random value' and this caused the optimizer to think that some fields could not be null. FLUSH TABLES clears tmp_set and because of this things worked temporarily. Fixed by clearing tmp_set properly.
| * Fix S3 engine Coverity hitsAndrew Hutchings2023-02-142-4/+4
| | | | | | | | Very minor hits found by Coverity for the S3 engine.