summaryrefslogtreecommitdiff
path: root/storage/innobase/row
Commit message (Collapse)AuthorAgeFilesLines
* Merge 10.5 into 10.6bb-10.3-mdev2126510.3-mdev21265Marko Mäkelä2020-11-261-2/+2
|\
| * Cleanup: Fix Intel compiler warnings about sign conversionsMarko Mäkelä2020-11-251-2/+2
| |
* | Cleanup: Use Atomic_relaxed for dict_sys.row_idMarko Mäkelä2020-11-242-25/+4
| |
* | MDEV-24167: Replace dict_operation_lock (dict_sys.latch)Marko Mäkelä2020-11-245-31/+15
| |
* | MDEV-24167: Replace trx_purge_latchMarko Mäkelä2020-11-242-19/+31
| |
* | MDEV-24167: Use lightweight srw_lock for btr_search_latchMarko Mäkelä2020-11-241-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many InnoDB rw-locks unnecessarily depend on the complex InnoDB rw_lock_t implementation that support the SX lock mode as well as recursive acquisition of X or SX locks. One of them is the bunch of adaptive hash index search latches, instrumented as btr_search_latch in PERFORMANCE_SCHEMA. Let us introduce a simpler lock for those in order to reduce overhead. srw_lock: A simple read-write lock that does not support recursion. On Microsoft Windows, this wraps SRWLOCK, only adding runtime overhead if PERFORMANCE_SCHEMA is enabled. On Linux (all architectures), this is implemented with std::atomic<uint32_t> and the futex system call. On other platforms, we will wrap mysql_rwlock_t with zero runtime overhead. The PERFORMANCE_SCHEMA instrumentation differs from InnoDB rw_lock_t in that we will only invoke PSI_RWLOCK_CALL(start_rwlock_wrwait) or PSI_RWLOCK_CALL(start_rwlock_rdwait) if there is an actual conflict.
* | Merge 10.5 into 10.6Marko Mäkelä2020-11-231-7/+13
|\ \ | |/
| * MDEV-24224 Gap lock on delete in 10.5 using READ COMMITTEDbb-10.5-MDEV-24224Marko Mäkelä2020-11-181-7/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When MDEV-19544 (commit 1a6f470464171bd1144e4dd6f169bb4018f2e81a) simplified the initialization of the local variable set_also_gap_locks, an inadvertent change was included. Essentially, all code branches that are executed when set_also_gap_locks hold must also ensure that trx->isolation_level > TRX_ISO_READ_COMMITTED holds. This was being violated in a few code paths. It turns out that there is an even simpler fix: Remove the test of thd_is_select() completely. In that way, the first part of UPDATE or DELETE should work exactly like SELECT...FOR UPDATE. thd_is_select(): Remove.
* | MDEV-22343 Remove SYS_TABLESPACES and SYS_DATAFILESbb-10.6-MDEV-23497Marko Mäkelä2020-11-112-65/+3
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The InnoDB internal tables SYS_TABLESPACES and SYS_DATAFILES as well as the INFORMATION_SCHEMA views INNODB_SYS_TABLESPACES and INNODB_SYS_DATAFILES were introduced in MySQL 5.6 for no good reason in mysql/mysql-server/commit/e9255a22ef16d612a8076bc0b34002bc5a784627 when the InnoDB support for the DATA DIRECTORY attribute was introduced. The file system should be the authoritative source of information on files. Storing information about file system paths in the file system (symlinks, or even the .isl files that were unfortunately chosen as the solution) is sufficient. If information is additionally stored in some hidden tables inside the InnoDB system tablespace, everything unnecessarily becomes more complicated, because more copies of data mean more opportunity for the copies to be out of sync, and because modifying the data in the system tablespace in the desired way might not be possible at all without modifying the InnoDB source code. So, the copy in the system tablespace basically is a redundant, non-authoritative source of information. We will stop creating or accessing the system tables SYS_TABLESPACES and SYS_DATAFILES. We will also remove the view INFORMATION_SCHEMA.INNODB_SYS_DATAFILES along with SYS_DATAFILES. The view INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES will be repurposed to directly reflect fil_system.space_list. The column PAGE_SIZE, which would always contain the value of the GLOBAL read-only variable innodb_page_size, is removed. The column ZIP_PAGE_SIZE, which would actually contain the physical page size of a page, is renamed to PAGE_SIZE. Finally, a new column FILENAME is added, as a replacement of SYS_DATAFILES.PATH. This will also address MDEV-21801 (files that were created before upgrading to MySQL 5.6 or MariaDB 10.0 or later were never registered in SYS_TABLESPACES or SYS_DATAFILES) and MDEV-21801 (information about the system tablespace is not stored in SYS_TABLESPACES or SYS_DATAFILES).
* Merge 10.4 into 10.5Marko Mäkelä2020-10-302-5/+4
|\
| * Merge 10.3 into 10.4Marko Mäkelä2020-10-292-5/+4
| |\
| | * Merge 10.2 into 10.3Marko Mäkelä2020-10-282-4/+4
| | |\
| | | * MDEV-23991 dict_table_stats_lock() has unnecessarily long scopeEugene Kosov2020-10-272-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch removes dict_index_t::stats_latch. Table/index statistics now protected with dict_sys->mutex. That way statistics computation can happen in parallel in several threads and dict_sys->mutex will be locked only for a short period of time. This patch is a joint work with Marko Mäkelä dict_index_t::lock: make mutable which allows to pass const pointer when only lock is touched in an object btr_height_get() btr_get_size(): make index argument const for better type safety btr_estimate_number_of_different_key_vals(): now returns computed values instead of setting fields in dict_index_t directly remove everything related to dict_index_t::stats_latch dict_stats_index_set_n_diff(): now returns computed values instead of setting fields in dict_index_t directly dict_stats_analyze_index(): now returns computed values instead of setting fields in dict_index_t directly Reviewed by: Marko Mäkelä
* | | | MDEV-23855: Shrink fil_space_tMarko Mäkelä2020-10-262-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge n_pending_ios, n_pending_ops to std::atomic<uint32_t> n_pending. Change some more fil_space_t members to uint32_t to reduce the memory footprint. fil_space_t::add(), fil_ibd_create(): Attach the already opened handle to the tablespace, and enforce the fil_system.n_open limit. dict_boot(): Initialize fil_system.max_assigned_id. srv_boot(): Call srv_thread_pool_init() before anything else, so that files should be opened in the correct mode on Windows. fil_ibd_create(): Create the file in OS_FILE_AIO mode, just like fil_node_open_file_low() does it. dict_table_t::is_accessible(): Replaces fil_table_accessible(). Reviewed by: Vladislav Vaintroub
* | | | MDEV-23855: Remove fil_system.LRU and reduce fil_system.mutex contentionMarko Mäkelä2020-10-262-10/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Also fixes MDEV-23929: innodb_flush_neighbors is not being ignored for system tablespace on SSD When the maximum configured number of file is exceeded, InnoDB will close data files. We used to maintain a fil_system.LRU list and a counter fil_node_t::n_pending to achieve this, at the huge cost of multiple fil_system.mutex operations per I/O operation. fil_node_open_file_low(): Implement a FIFO replacement policy: The last opened file will be moved to the end of fil_system.space_list, and files will be closed from the start of the list. However, we will not move tablespaces in fil_system.space_list while i_s_tablespaces_encryption_fill_table() is executing (producing output for INFORMATION_SCHEMA.INNODB_TABLESPACES_ENCRYPTION) because it may cause information of some tablespaces to go missing. We also avoid this in mariabackup --backup because datafiles_iter_next() assumes that the ordering is not changed. IORequest: Fold more parameters to IORequest::type. fil_space_t::io(): Replaces fil_io(). fil_space_t::flush(): Replaces fil_flush(). OS_AIO_IBUF: Remove. We will always issue synchronous reads of the change buffer pages in buf_read_page_low(). We will always ignore some errors for background reads. This should reduce fil_system.mutex contention a little. fil_node_t::complete_write(): Replaces fil_node_t::complete_io(). On both read and write completion, fil_space_t::release_for_io() will have to be called. fil_space_t::io(): Do not acquire fil_system.mutex in the normal code path. xb_delta_open_matching_space(): Do not try to open the system tablespace which was already opened. This fixes a file sharing violation in mariabackup --prepare --incremental. Reviewed by: Vladislav Vaintroub
* | | | Merge 10.4 to 10.5Marko Mäkelä2020-10-225-62/+80
|\ \ \ \ | |/ / /
| * | | Merge 10.3 into 10.4Marko Mäkelä2020-10-225-56/+60
| |\ \ \ | | |/ /
| | * | Merge 10.2 into 10.3Marko Mäkelä2020-10-225-70/+81
| | |\ \ | | | |/
| | | * Merge 10.1 into 10.2Marko Mäkelä2020-10-212-62/+21
| | | |\
| | | | * MDEV-23938: innodb row_search_idx_cond_check handle ICP_ABORTED_BY_USERSergei Petrunia2020-10-161-6/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - row_search_mvcc() should return DB_INTERRUPTED when it got killed. - Add a syncpoint for the ICP check. - Add test coverage for killed-during-ICP-check scenario Backport of MDEV-22761 fixes for ICP from 10.4 commits: * a6f956488c712bef3b13660584d1b905e0c676cc * c03885cd9ceb1ede7f49a9e218022b401b3a1e28 XtraDB was fixed in deb3b9a17498 Reviewer: Daniel Black
| | | | * MDEV-23722 InnoDB: Assertion: result != FTS_INVALID in fts_trx_row_get_new_stateThirunarayanan Balathandayuthapani2020-10-081-35/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Marking of deletion of row in fts index happens twice in self-referential foreign key relation. So while performing referential checks of foreign key, InnoDB can avoid updating of fts index if the foreign key has self-referential relationship. Reviewed-by: Marko Mäkelä
| | | * | Revert MDEV-23484 Rollback unnecessarily acquires dict_operation_lockMarko Mäkelä2020-10-203-23/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In row_undo_ins_remove_clust_rec() and similar places, an assertion !node->trx->dict_operation_lock_mode could fail, because an online ALTER is not allowed to run at the same time while DDL operations on the table are being rolled back. This race condition would be fixed by always acquiring an InnoDB table lock in ha_innobase::prepare_inplace_alter_table() or prepare_inplace_alter_table_dict(), or by ensuring that recovered transactions are protected by MDL that would block concurrent DDL until the rollback has been completed. This reverts commit 1509363970e9cb574005e3af560299c055dda983 and commit 22c4a7512f8dc3f2d2586a856b362ad97ab2bf7d.
| * | | | MDEV-22761: innodb row_search_idx_cond_check handle CHECK_ABORTED_BY_USERbb-10.4-danielblack-MDEV-22761Daniel Black2020-10-141-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Part #3: Two more cases within row_search_mvcc need to handle the CHECK_ABORTED_BY_USER and process this as a DB_INTERRUPTED.
| * | | | MDEV-22761: innodb row_search_idx_cond_check handle CHECK_ABORTED_BY_USERSergei Petrunia2020-10-141-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Part #2: - row_search_mvcc() should return DB_INTERRUPTED when it got - Move the sync point from innodb internals to handler_rowid_filter_check() where other storage engines can use it too - Add a similar syncpoint for the ICP check. - Add a bigger test and test coverage for Rowid Filter with MyISAM - Add test coverage for killed-during-ICP-check scenario
| * | | | MDEV-22761: innodb row_search_idx_cond_check handle CHECK_ABORTED_BY_USERDaniel Black2020-10-141-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | handler_rowid_filter_check can return CHECK_ABORTED_BY_USER. All the functions that call row_search_idx_cond_check handle the CHECK_ABORTED_BY_USER return value. So return it rather than generating an error. This incorrect handling was introduced in MDEV-21794 (8d85715d507d). Reviewer: Marko Mäkelä
* | | | | Cleanup: Make InnoDB page numbers uint32_tMarko Mäkelä2020-10-153-13/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | InnoDB stores a 32-bit page number in page headers and in some data structures, such as FIL_ADDR (consisting of a 32-bit page number and a 16-bit byte offset within a page). For better compile-time error detection and to reduce the memory footprint in some data structures, let us use a uint32_t for the page number, instead of ulint (size_t) which can be 64 bits.
* | | | | MDEV-23399: Performance regression with write workloadsMarko Mäkelä2020-10-158-39/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The buffer pool refactoring in MDEV-15053 and MDEV-22871 shifted the performance bottleneck to the page flushing. The configuration parameters will be changed as follows: innodb_lru_flush_size=32 (new: how many pages to flush on LRU eviction) innodb_lru_scan_depth=1536 (old: 1024) innodb_max_dirty_pages_pct=90 (old: 75) innodb_max_dirty_pages_pct_lwm=75 (old: 0) Note: The parameter innodb_lru_scan_depth will only affect LRU eviction of buffer pool pages when a new page is being allocated. The page cleaner thread will no longer evict any pages. It used to guarantee that some pages will remain free in the buffer pool. Now, we perform that eviction 'on demand' in buf_LRU_get_free_block(). The parameter innodb_lru_scan_depth(srv_LRU_scan_depth) is used as follows: * When the buffer pool is being shrunk in buf_pool_t::withdraw_blocks() * As a buf_pool.free limit in buf_LRU_list_batch() for terminating the flushing that is initiated e.g., by buf_LRU_get_free_block() The parameter also used to serve as an initial limit for unzip_LRU eviction (evicting uncompressed page frames while retaining ROW_FORMAT=COMPRESSED pages), but now we will use a hard-coded limit of 100 or unlimited for invoking buf_LRU_scan_and_free_block(). The status variables will be changed as follows: innodb_buffer_pool_pages_flushed: This includes also the count of innodb_buffer_pool_pages_LRU_flushed and should work reliably, updated one by one in buf_flush_page() to give more real-time statistics. The function buf_flush_stats(), which we are removing, was not called in every code path. For both counters, we will use regular variables that are incremented in a critical section of buf_pool.mutex. Note that show_innodb_vars() directly links to the variables, and reads of the counters will *not* be protected by buf_pool.mutex, so you cannot get a consistent snapshot of both variables. The following INFORMATION_SCHEMA.INNODB_METRICS counters will be removed, because the page cleaner no longer deals with writing or evicting least recently used pages, and because the single-page writes have been removed: * buffer_LRU_batch_flush_avg_time_slot * buffer_LRU_batch_flush_avg_time_thread * buffer_LRU_batch_flush_avg_time_est * buffer_LRU_batch_flush_avg_pass * buffer_LRU_single_flush_scanned * buffer_LRU_single_flush_num_scan * buffer_LRU_single_flush_scanned_per_call When moving to a single buffer pool instance in MDEV-15058, we missed some opportunity to simplify the buf_flush_page_cleaner thread. It was unnecessarily using a mutex and some complex data structures, even though we always have a single page cleaner thread. Furthermore, the buf_flush_page_cleaner thread had separate 'recovery' and 'shutdown' modes where it was waiting to be triggered by some other thread, adding unnecessary latency and potential for hangs in relatively rarely executed startup or shutdown code. The page cleaner was also running two kinds of batches in an interleaved fashion: "LRU flush" (writing out some least recently used pages and evicting them on write completion) and the normal batches that aim to increase the MIN(oldest_modification) in the buffer pool, to help the log checkpoint advance. The buf_pool.flush_list flushing was being blocked by buf_block_t::lock for no good reason. Furthermore, if the FIL_PAGE_LSN of a page is ahead of log_sys.get_flushed_lsn(), that is, what has been persistently written to the redo log, we would trigger a log flush and then resume the page flushing. This would unnecessarily limit the performance of the page cleaner thread and trigger the infamous messages "InnoDB: page_cleaner: 1000ms intended loop took 4450ms. The settings might not be optimal" that were suppressed in commit d1ab89037a518fcffbc50c24e4bd94e4ec33aed0 unless log_warnings>2. Our revised algorithm will make log_sys.get_flushed_lsn() advance at the start of buf_flush_lists(), and then execute a 'best effort' to write out all pages. The flush batches will skip pages that were modified since the log was written, or are are currently exclusively locked. The MDEV-13670 message "page_cleaner: 1000ms intended loop took" message will be removed, because by design, the buf_flush_page_cleaner() should not be blocked during a batch for extended periods of time. We will remove the single-page flushing altogether. Related to this, the debug parameter innodb_doublewrite_batch_size will be removed, because all of the doublewrite buffer will be used for flushing batches. If a page needs to be evicted from the buffer pool and all 100 least recently used pages in the buffer pool have unflushed changes, buf_LRU_get_free_block() will execute buf_flush_lists() to write out and evict innodb_lru_flush_size pages. At most one thread will execute buf_flush_lists() in buf_LRU_get_free_block(); other threads will wait for that LRU flushing batch to finish. To improve concurrency, we will replace the InnoDB ib_mutex_t and os_event_t native mutexes and condition variables in this area of code. Most notably, this means that the buffer pool mutex (buf_pool.mutex) is no longer instrumented via any InnoDB interfaces. It will continue to be instrumented via PERFORMANCE_SCHEMA. For now, both buf_pool.flush_list_mutex and buf_pool.mutex will be declared with MY_MUTEX_INIT_FAST (PTHREAD_MUTEX_ADAPTIVE_NP). The critical sections of buf_pool.flush_list_mutex should be shorter than those for buf_pool.mutex, because in the worst case, they cover a linear scan of buf_pool.flush_list, while the worst case of a critical section of buf_pool.mutex covers a linear scan of the potentially much longer buf_pool.LRU list. mysql_mutex_is_owner(), safe_mutex_is_owner(): New predicate, usable with SAFE_MUTEX. Some InnoDB debug assertions need this predicate instead of mysql_mutex_assert_owner() or mysql_mutex_assert_not_owner(). buf_pool_t::n_flush_LRU, buf_pool_t::n_flush_list: Replaces buf_pool_t::init_flush[] and buf_pool_t::n_flush[]. The number of active flush operations. buf_pool_t::mutex, buf_pool_t::flush_list_mutex: Use mysql_mutex_t instead of ib_mutex_t, to have native mutexes with PERFORMANCE_SCHEMA and SAFE_MUTEX instrumentation. buf_pool_t::done_flush_LRU: Condition variable for !n_flush_LRU. buf_pool_t::done_flush_list: Condition variable for !n_flush_list. buf_pool_t::do_flush_list: Condition variable to wake up the buf_flush_page_cleaner when a log checkpoint needs to be written or the server is being shut down. Replaces buf_flush_event. We will keep using timed waits (the page cleaner thread will wake _at least_ once per second), because the calculations for innodb_adaptive_flushing depend on fixed time intervals. buf_dblwr: Allocate statically, and move all code to member functions. Use a native mutex and condition variable. Remove code to deal with single-page flushing. buf_dblwr_check_block(): Make the check debug-only. We were spending a significant amount of execution time in page_simple_validate_new(). flush_counters_t::unzip_LRU_evicted: Remove. IORequest: Make more members const. FIXME: m_fil_node should be removed. buf_flush_sync_lsn: Protect by std::atomic, not page_cleaner.mutex (which we are removing). page_cleaner_slot_t, page_cleaner_t: Remove many redundant members. pc_request_flush_slot(): Replaces pc_request() and pc_flush_slot(). recv_writer_thread: Remove. Recovery works just fine without it, if we simply invoke buf_flush_sync() at the end of each batch in recv_sys_t::apply(). recv_recovery_from_checkpoint_finish(): Remove. We can simply call recv_sys.debug_free() directly. srv_started_redo: Replaces srv_start_state. SRV_SHUTDOWN_FLUSH_PHASE: Remove. logs_empty_and_mark_files_at_shutdown() can communicate with the normal page cleaner loop via the new function flush_buffer_pool(). buf_flush_remove(): Assert that the calling thread is holding buf_pool.flush_list_mutex. This removes unnecessary mutex operations from buf_flush_remove_pages() and buf_flush_dirty_pages(), which replace buf_LRU_flush_or_remove_pages(). buf_flush_lists(): Renamed from buf_flush_batch(), with simplified interface. Return the number of flushed pages. Clarified comments and renamed min_n to max_n. Identify LRU batch by lsn=0. Merge all the functions buf_flush_start(), buf_flush_batch(), buf_flush_end() directly to this function, which was their only caller, and remove 2 unnecessary buf_pool.mutex release/re-acquisition that we used to perform around the buf_flush_batch() call. At the start, if not all log has been durably written, wait for a background task to do it, or start a new task to do it. This allows the log write to run concurrently with our page flushing batch. Any pages that were skipped due to too recent FIL_PAGE_LSN or due to them being latched by a writer should be flushed during the next batch, unless there are further modifications to those pages. It is possible that a page that we must flush due to small oldest_modification also carries a recent FIL_PAGE_LSN or is being constantly modified. In the worst case, all writers would then end up waiting in log_free_check() to allow the flushing and the checkpoint to complete. buf_do_flush_list_batch(): Clarify comments, and rename min_n to max_n. Cache the last looked up tablespace. If neighbor flushing is not applicable, invoke buf_flush_page() directly, avoiding a page lookup in between. buf_flush_space(): Auxiliary function to look up a tablespace for page flushing. buf_flush_page(): Defer the computation of space->full_crc32(). Never call log_write_up_to(), but instead skip persistent pages whose latest modification (FIL_PAGE_LSN) is newer than the redo log. Also skip pages on which we cannot acquire a shared latch without waiting. buf_flush_try_neighbors(): Do not bother checking buf_fix_count because buf_flush_page() will no longer wait for the page latch. Take the tablespace as a parameter, and only execute this function when innodb_flush_neighbors>0. Avoid repeated calls of page_id_t::fold(). buf_flush_relocate_on_flush_list(): Declare as cold, and push down a condition from the callers. buf_flush_check_neighbor(): Take id.fold() as a parameter. buf_flush_sync(): Ensure that the buf_pool.flush_list is empty, because the flushing batch will skip pages whose modifications have not yet been written to the log or were latched for modification. buf_free_from_unzip_LRU_list_batch(): Remove redundant local variables. buf_flush_LRU_list_batch(): Let the caller buf_do_LRU_batch() initialize the counters, and report n->evicted. Cache the last looked up tablespace. If neighbor flushing is not applicable, invoke buf_flush_page() directly, avoiding a page lookup in between. buf_do_LRU_batch(): Return the number of pages flushed. buf_LRU_free_page(): Only release and re-acquire buf_pool.mutex if adaptive hash index entries are pointing to the block. buf_LRU_get_free_block(): Do not wake up the page cleaner, because it will no longer perform any useful work for us, and we do not want it to compete for I/O while buf_flush_lists(innodb_lru_flush_size, 0) writes out and evicts at most innodb_lru_flush_size pages. (The function buf_do_LRU_batch() may complete after writing fewer pages if more than innodb_lru_scan_depth pages end up in buf_pool.free list.) Eliminate some mutex release-acquire cycles, and wait for the LRU flush batch to complete before rescanning. buf_LRU_check_size_of_non_data_objects(): Simplify the code. buf_page_write_complete(): Remove the parameter evict, and always evict pages that were part of an LRU flush. buf_page_create(): Take a pre-allocated page as a parameter. buf_pool_t::free_block(): Free a pre-allocated block. recv_sys_t::recover_low(), recv_sys_t::apply(): Preallocate the block while not holding recv_sys.mutex. During page allocation, we may initiate a page flush, which in turn may initiate a log flush, which would require acquiring log_sys.mutex, which should always be acquired before recv_sys.mutex in order to avoid deadlocks. Therefore, we must not be holding recv_sys.mutex while allocating a buffer pool block. BtrBulk::logFreeCheck(): Skip a redundant condition. row_undo_step(): Do not invoke srv_inc_activity_count() for every row that is being rolled back. It should suffice to invoke the function in trx_flush_log_if_needed() during trx_t::commit_in_memory() when the rollback completes. sync_check_enable(): Remove. We will enable innodb_sync_debug from the very beginning. Reviewed by: Vladislav Vaintroub
* | | | | Merge branch '10.4' into 10.5Sujatha2020-09-291-2/+8
|\ \ \ \ \ | |/ / / /
| * | | | MDEV-23675 Assertion `pos < table->n_def' fails in dict_table_get_nth_colThirunarayanan Balathandayuthapani2020-09-251-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - During insertion of clustered inde, InnoDB does the check for foreign key constraints. It rebuild the tuple based on foreign column names when there is no foreign index. While rebuilding, InnoDB should ignore the dropped columns.
* | | | | Merge 10.4 into 10.5Marko Mäkelä2020-09-232-26/+33
|\ \ \ \ \ | |/ / / /
| * | | | Merge 10.3 into 10.4Marko Mäkelä2020-09-212-26/+33
| |\ \ \ \ | | |/ / /
| | * | | Merge 10.2 into 10.3Marko Mäkelä2020-09-212-26/+33
| | |\ \ \ | | | |/ /
| | | * | MDEV-20396 Server crashes after DELETE with SEL NULL Foreign key and aNikita Malyavin2020-09-141-23/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | virtual column in index Problem: row_ins_foreign_fill_virtual was unconditionally set virtual fields to NULL even though the field is not a part of a foreign key (but a part of an index) Solution: The new virtual value should be computed with regard to cascade updates.
| | | * | MDEV-18867 Long Time to Stop and StartThirunarayanan Balathandayuthapani2020-09-101-3/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fts_drop_orphaned_tables() takes long time to remove the orphaned FTS tables. In order to reduce the time, do the following: - Traverse fil_system.space_list and construct a set of table_id,index_id of all FTS_*.ibd tablespaces. - Traverse the sys_indexes table and ignore the entry from the above collection if it exist. - Existing elements in the collection can be considered as orphaned fts tables. construct the table name from (table_id,index_id) and invoke fts_drop_tables(). - Removed DICT_TF2_FTS_AUX_HEX_NAME flag usage from upgrade. - is_aux_table() in dict_table_t to check whether the given name is fts auxiliary table fts_space_set_t is a structure to store set of parent table id and index id - Remove unused FTS function in fts0fts.cc - Remove the fulltext index in row_format_redundant test case. Because it deals with the condition that SYS_TABLES does have corrupted entry and valid entry exist in SYS_INDEXES.
* | | | | MDEV-23719: Make lock_sys use page_id_tMarko Mäkelä2020-09-171-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 8ccb3caafb7cba0fca12e89c5c9b67a740364fdd it should be more efficient to use page_id_t rather than two separate variables for tablespace identifier and page number. lock_rec_fold(): Replaced with page_id_t::fold(). lock_rec_hash(): Replaced with lock_sys.hash(page_id). lock_rec_expl_exist_on_page(), lock_rec_get_first_on_page_addr(), lock_rec_get_first_on_page(): Replaced with lock_sys.get_first().
* | | | | Merge 10.4 into 10.5Marko Mäkelä2020-09-091-12/+48
|\ \ \ \ \ | |/ / / /
| * | | | Merge 10.3 into 10.4Marko Mäkelä2020-09-091-12/+51
| |\ \ \ \ | | |/ / /
| | * | | Merge 10.2 into 10.3Marko Mäkelä2020-09-091-12/+51
| | |\ \ \ | | | |/ /
| | | * | MDEV-22924 fixup: Replace C++11 autoMarko Mäkelä2020-09-091-1/+1
| | | | |
| | | * | MDEV-22924 fixup: Replace C++11 nullptrMarko Mäkelä2020-09-091-1/+1
| | | | | | | | | | | | | | | | | | | | Only starting with MariaDB Server 10.4 we may depend on C++11.
| | | * | MDEV-22924 Corruption in MVCC read via secondary indexbb-10.2-MDEV-22924Marko Mäkelä2020-09-071-12/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An unsafe optimization was introduced by commit 2347ffd843b8e4ee9d8eaafab05368435db59ece (MDEV-20301) which is based on mysql/mysql-server@3f3136188f1bd383f77f97823cf6ebd72d5e4d7e or mysql/mysql-server@647a3814a91c3d3bffc70ddff5513398e3f37bd4 in MySQL 8.0.12 or MySQL 8.0.13 (which in turn is based on the contribution in MySQL Bug #84958). Row_sel_get_clust_rec_for_mysql::operator(): In addition to checking that the pointer to the record matches, also check the latest modification of the page (FIL_PAGE_LSN) as well as the page identifier. Only if all three match, it is safe to reuse cached_old_vers. Row_sel_get_clust_rec_for_mysql::check_eq(): Assert that the PRIMARY KEY of the cached old version of the record corresponds to the latest version. We got a test case where CHECK TABLE, UPDATE and purge would be hammering on the same table (with only 6 rows) and a pointer that was originally pointing to a record pk=2 would match a cached_clust_rec that was pointing to a record pk=1. In the diagnosed `rr replay` trace, we would wrongly return an old cached version of the pk=1 record, instead of retrieving the correct version of the pk=2 record. Because of this, CHECK TABLE would fail to count one of the records in a secondary index, and report failure. This bug appears to affect MVCC reads via secondary indexes only. The purge of history in secondary indexes uses a different code path, and so do checks for implicit record locks.
* | | | | Merge 10.4 into 10.5Marko Mäkelä2020-09-045-208/+243
|\ \ \ \ \ | |/ / / /
| * | | | Merge 10.3 into 10.4Marko Mäkelä2020-09-035-200/+243
| |\ \ \ \ | | |/ / /
| | * | | Merge 10.2 into 10.3Marko Mäkelä2020-09-035-199/+243
| | |\ \ \ | | | |/ /
| | | * | MDEV-23470 InnoDB: Failing assertion: cmp < 0 in ↵Thirunarayanan Balathandayuthapani2020-09-021-9/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | row_ins_check_foreign_constraint During insertion of clustered index, InnoDB does the check for foreign key constraints. Problem is that it uses the clustered index entry to search indexes of referenced tables and it could lead to unexpected result when there is no foreign index. Solution: ======== Rebuild the tuple based on foreign column names before searching it on reference index when there is no foreign index.
| | | * | Merge 10.1 into 10.2Marko Mäkelä2020-09-011-13/+20
| | | |\ \ | | | | |/ | | | | | | | | | | This also fixes MDEV-20464.
| | | | * MDEV-23600 Division by 0 in row_search_with_covering_prefixMarko Mäkelä2020-09-011-14/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The InnoDB index fields store bytes, not characters. Remove some unnecessary conversions from characters to bytes. This also fixes MDEV-20422 and the wrong-result bug MDEV-12486.
| | | * | Cleanup: Avoid repeated calls to dict_col_t::is_virtual()Marko Mäkelä2020-09-011-18/+12
| | | | |
| | | * | MDEV-20618 Assertion failed in row_upd_sec_index_entryNikita Malyavin2020-09-013-44/+98
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a proper error handling of innobase_get_computed_value results in row_upd_store_row/row_upd_store_v_row. Also add an assertion in row_vers_build_clust_v_col to fail during row purge. Add one more assertion in row_sel_sec_rec_is_for_clust_rec for possible future catches.
| | | * | MDEV-18366 Crash on SELECT on a table with indexed virtual columnsNikita Malyavin2020-09-015-116/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The problem was in improper error handling behavior in `row_upd_build_difference_binary`: `innobase_free_row_for_vcol` wasn't called. To eliminate this problem in all potential places, a refactoring has been made: * class ib_vcol_row is added. It owns VCOL_STORAGE and heap and maintains it in RAII manner * all innobase_allocate_row_for_vcol/innobase_free_row_for_vcol pairs are substituted with ib_vcol_row usage * row_merge_buf_add is only left untouched because it doesn't own vheap passed as an argument * innobase_allocate_row_for_vcol does not allocate VCOL_STORAGE anymore and accepts it as an argument -- this reduces a number of memory allocations * move rec_printer out of `#ifndef DBUG_OFF` and mark it cold