delta/mariadb-git.git - github.com: MariaDB/server.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	MDEV-21452: Replace all direct use of os_event_t	Marko Mäkelä	2020-12-15	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Let us replace os_event_t with mysql_cond_t, and replace the necessary ib_mutex_t with mysql_mutex_t so that they can be used with condition variables. Also, let us replace polling (os_thread_sleep() or timed waits) with plain mysql_cond_wait() wherever possible. Furthermore, we will use the lightweight srw_mutex for trx_t::mutex, to hopefully reduce contention on lock_sys.mutex. FIXME: Add test coverage of mariabackup --backup --kill-long-queries-timeout
*	Merge 10.5 into 10.6	Marko Mäkelä	2020-12-14	1	-1/+1
\|\
\| *	MDEV-24391 heap-use-after-free in fil_space_t::flush_low()bb-10.5-MDEV-24391	Marko Mäkelä	2020-12-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We observed a race condition that involved two threads executing fil_flush_file_spaces() and one thread executing fil_delete_tablespace(). After one of the fil_flush_file_spaces() observed that space.needs_flush_not_stopping() is set and was releasing the fil_system.mutex, the other fil_flush_file_spaces() would complete the execution of fil_space_t::flush_low() on the same tablespace. Then, fil_delete_tablespace() would destroy the object, because the value of fil_space_t::n_pending did not prevent that. Finally, the fil_flush_file_spaces() would resume execution and invoke fil_space_t::flush_low() on the freed object. This race condition was introduced in commit 118e258aaac5da75a2ac4556201aaea3688fac67 of MDEV-23855. fil_space_t::flush(): Add a template parameter that indicates whether the caller is holding a reference to prevent the tablespace from being freed. buf_dblwr_t::flush_buffered_writes_completed(), row_quiesce_table_start(): Acquire a reference for the duration of the fil_space_t::flush_low() operation. It should be impossible for the object to be freed in these code paths, but we want to satisfy the debug assertions. fil_space_t::flush_low(): Do not increment or decrement the reference count, but instead assert that the caller is holding a reference. fil_space_extend_must_retry(), fil_flush_file_spaces(): Acquire a reference before releasing fil_system.mutex. This is what will fix the race condition.
* \|	Merge 10.5 into 10.6	Marko Mäkelä	2020-12-09	1	-4/+0
\|\ \ \| \|/
\| *	Remove unused DBUG_EXECUTE_IF "ignore_punch_hole"	Marko Mäkelä	2020-12-09	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since commit ea21d630be639317be0dc9d2b72a04f3ef3f9c7b we conditionally define a variable that only plays a role on systems that support hole-punching (explicit creation of sparse files). However, that broke debug builds on such systems. It turns out that the debug_dbug label "ignore_punch_hole" is not at all used in MariaDB server. It would be covered by the MySQL 5.7 test innodb.table_compress. (Note: MariaDB 10.1 implemented page_compressed tables before something comparable appeared in MySQL 5.7.)
* \|	Merge 10.5 into 10.6	Marko Mäkelä	2020-12-09	3	-64/+90
\|\ \ \| \|/
\| *	MDEV-12227 Defer writes to the InnoDB temporary tablespacebb-10.5-MDEV-24369	Marko Mäkelä	2020-12-09	2	-30/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The flushing of the InnoDB temporary tablespace is unnecessarily tied to the write-ahead redo logging and redo log checkpoints, which must be tied to the page writes of persistent tablespaces. Let us simply omit any pages of temporary tables from buf_pool.flush_list. In this way, log checkpoints will never incur any 'collateral damage' of writing out unmodified changes for temporary tables. After this change, pages of the temporary tablespace can only be written out by buf_flush_lists(n_pages,0) as part of LRU eviction. Hopefully, most of the time, that code will never be executed, and instead, the temporary pages will be evicted by buf_release_freed_page() without ever being written back to the temporary tablespace file. This should improve the efficiency of the checkpoint flushing and the buf_flush_page_cleaner thread. Reviewed by: Vladislav Vaintroub
\| *	Fix -Wunused-but-set-variable	Marko Mäkelä	2020-12-08	1	-3/+12
\| \|
\| *	MDEV-24369 Page cleaner sleeps despite innodb_max_dirty_pages_pct_lwm being ↵	Marko Mäkelä	2020-12-08	1	-32/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	exceeded MDEV-24278 improved the page cleaner so that it will no longer wake up once per second on an idle server. However, with innodb_adaptive_flushing (the default) the function page_cleaner_flush_pages_recommendation() could initially return 0 even if there is work to do. af_get_pct_for_dirty(): Remove. Based on a comment here, it appears that an initial intention of innodb_max_dirty_pages_pct_lwm=0.0 (the default value) was to disable something. That ceased to hold in MDEV-23855: the value is a pure threshold; the page cleaner will not perform any work unless the threshold is exceeded. page_cleaner_flush_pages_recommendation(): Add the parameter dirty_blocks to ensure that buf_pool.flush_list will eventually be emptied.
\| *	MDEV-24350 buf_dblwr unnecessarily uses memory-intensive srv_stats countersbb-10.5-MDEV-24350	Marko Mäkelä	2020-12-04	1	-7/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The counters in srv_stats use std::atomic and multiple cache lines per counter. This is an overkill in a case where a critical section already exists in the code. A regular variable will work just fine, with much smaller memory bus impact.
\| *	MDEV-24348 InnoDB shutdown hang with innodb_flush_sync=0	Marko Mäkelä	2020-12-04	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This hang was caused by MDEV-23855, and we failed to fix it in MDEV-24109 (commit 4cbfdeca840098b9ed0d8147d43288c36743a328). When buf_flush_ahead() is invoked soon before server shutdown and the non-default setting innodb_flush_sync=OFF is in effect and the buffer pool contains dirty pages of temporary tables, the page cleaner thread may remain in an infinite loop without completing its work, thus causing the shutdown to hang. buf_flush_page_cleaner(): If the buffer pool contains no unmodified persistent pages, ensure that buf_flush_sync_lsn= 0 will be assigned, so that shutdown will proceed. The test case is not deterministic. On my system, it reproduced the hang with 95% probability when running multiple instances of the test in parallel, and 4% when running single-threaded. Thanks to Eugene Kosov for debugging and testing this.
* \|	MDEV-24142: Remove __FILE__,__LINE__ related to buf_block_t::lock	Marko Mäkelä	2020-12-03	1	-20/+3
\| \|
* \|	MDEV-24142: Remove the LatchDebug interface to rw-locks	Marko Mäkelä	2020-12-03	2	-13/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The latching order checks for rw-locks have not caught many bugs in the past few years and they are greatly complicating the code. Last time the debug checks were useful was in commit 59caf2c3c1fe128d1d2c3a8df9fadd4d25ab7102 (MDEV-13485). The B-tree hang MDEV-14637 was not caught by LatchDebug, because the granularity of the checks is not sufficient to distinguish the levels of non-leaf B-tree pages. The interface was already made dead code by the grandparent commit 03ca6495df31313c96e38834b9a235245e2ae2a8.
* \|	MDEV-24142: Replace InnoDB rw_lock_t with sux_lock	Marko Mäkelä	2020-12-03	7	-212/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	InnoDB buffer pool block and index tree latches depend on a special kind of read-update-write lock that allows reentrant (recursive) acquisition of the 'update' and 'write' locks as well as an upgrade from 'update' lock to 'write' lock. The 'update' lock allows any number of reader locks from other threads, but no concurrent 'update' or 'write' lock. If there were no requirement to support an upgrade from 'update' to 'write', we could compose the lock out of two srw_lock (implemented as any type of native rw-lock, such as SRWLOCK on Microsoft Windows). Removing this requirement is very difficult, so in commit f7e7f487d4b06695f91f6fbeb0396b9d87fc7bbf we implemented an 'update' mode to our srw_lock. Re-entrant or recursive locking is mostly needed when writing or freeing BLOB pages, but also in crash recovery or when merging buffered changes to an index page. The re-entrancy allows us to attach a previously acquired page to a sub-mini-transaction that will be committed before whatever else is holding the page latch. The SUX lock supports Shared ('read'), Update, and eXclusive ('write') locking modes. The S latches are not re-entrant, but a single S latch may be acquired even if the thread already holds an U latch. The idea of the U latch is to allow a write of something that concurrent readers do not care about (such as the contents of BTR_SEG_LEAF, BTR_SEG_TOP and other page allocation metadata structures, or the MDEV-6076 PAGE_ROOT_AUTO_INC). (The PAGE_ROOT_AUTO_INC field is only updated when a dict_table_t for the table exists, and only read when a dict_table_t for the table is being added to dict_sys.) block_lock::u_lock_try(bool for_io=true) is used in buf_flush_page() to allow concurrent readers but no concurrent modifications while the page is being written to the data file. That latch will be released by buf_page_write_complete() in a different thread. Hence, we use the special lock owner value FOR_IO. The index_lock::u_lock() improves concurrency on operations that involve non-leaf index pages. The interface has been cleaned up a little. We will use x_lock_recursive() instead of x_lock() when we know that a lock is already held by the current thread. Similarly, a lock upgrade from U to X is only allowed via u_x_upgrade() or x_lock_upgraded() but not via x_lock(). We will disable the LatchDebug and sync_array interfaces to InnoDB rw-locks. The SEMAPHORES section of SHOW ENGINE INNODB STATUS output will no longer include any information about InnoDB rw-locks, only TTASEventMutex (cmake -DMUTEXTYPE=event) waits. This will make a part of the 'innotop' script dead code. The block_lock buf_block_t::lock will not be covered by any PERFORMANCE_SCHEMA instrumentation. SHOW ENGINE INNODB MUTEX and INFORMATION_SCHEMA.INNODB_MUTEXES will no longer output source code file names or line numbers. The dict_index_t::lock will be identified by index and table names, which should be much more useful. PERFORMANCE_SCHEMA is lumping information about all dict_index_t::lock together as event_name='wait/synch/sxlock/innodb/index_tree_rw_lock'. buf_page_free(): Remove the file,line parameters. The sux_lock will not store such diagnostic information. buf_block_dbg_add_level(): Define as empty macro, to be removed in a subsequent commit. Unless the build was configured with cmake -DPLUGIN_PERFSCHEMA=NO the index_lock dict_index_t::lock will be instrumented via PERFORMANCE_SCHEMA. Similar to commit 1669c8890ca2e9092213626e5b047e58ca8b1e77 we will distinguish lock waits by registering shared_lock,exclusive_lock events instead of try_shared_lock,try_exclusive_lock. Actual 'try' operations will not be instrumented at all. rw_lock_list: Remove. After MDEV-24167, this only covered buf_block_t::lock and dict_index_t::lock. We will output their information by traversing buf_pool or dict_sys.
* \|	Merge 10.5 into 10.6	Marko Mäkelä	2020-11-30	1	-1/+1
\|\ \ \| \|/
\| *	MDEV-24308: Remove some os_thread_ functions	Marko Mäkelä	2020-11-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	os_thread_pf(): Remove. os_thread_eq(), os_thread_yield(), os_thread_get_curr_id(): Define as macros. ut_print_timestamp(), ut_sprintf_timestamp(): Simplify.
* \|	Merge 10.5 into 10.6bb-10.3-mdev21265 10.3-mdev21265	Marko Mäkelä	2020-11-26	2	-7/+33
\|\ \ \| \|/
\| *	MDEV-24280 InnoDB triggers too many independent periodic tasksbb-10.5-MDEV-24280	Marko Mäkelä	2020-11-25	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A side effect of MDEV-16264 is that a large number of threads will be created at server startup, to be destroyed after a minute or two. One source of such thread creation is srv_start_periodic_timer(). InnoDB is creating 3 periodic tasks: srv_master_callback (1Hz) srv_error_monitor_task (1Hz), and srv_monitor_task (0.2Hz). It appears that we can merge srv_error_monitor_task and srv_monitor_task and have them invoked 4 times per minute (every 15 seconds). This will affect our ability to enforce innodb_fatal_semaphore_wait_threshold and some computations around BUF_LRU_STAT_N_INTERVAL. We could remove srv_master_callback along with the DROP TABLE queue at some point of time in the future. We must keep it independent of the innodb_fatal_semaphore_wait_threshold detection, because the background DROP TABLE queue could get stuck due to dict_sys being locked by another thread. For now, srv_master_callback must be invoked once per second, so that innodb_flush_log_at_timeout=1 can work. BUF_LRU_STAT_N_INTERVAL: Reduce the precision and extend the time from 501 second to 415 seconds. srv_error_monitor_timer: Remove. MAX_MUTEX_NOWAIT: Increase from 201 second to 215 seconds. srv_refresh_innodb_monitor_stats(): Avoid a repeated call to time(NULL). Change the interval to less than 60 seconds. srv_monitor(): Renamed from srv_monitor_task. srv_monitor_task(): Renamed from srv_error_monitor_task(). Invoked only once in 15 seconds. Invoke also srv_monitor(). Increase the fatal_cnt threshold from 101 second to 115 seconds. sync_array_print_long_waits_low(): Invoke time(NULL) only once. Remove a bogus message about printouts for 30 seconds. Those printouts were effectively already disabled in MDEV-16264 (commit 5e62b6a5e06eb02cbde1e34e95e26f42d87fce02).
\| *	MDEV-24278 InnoDB page cleaner keeps waking up on idle server	Marko Mäkelä	2020-11-25	1	-3/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The purpose of the InnoDB page cleaner subsystem is to write out modified pages from the buffer pool to data files. When the innodb_max_dirty_pages_pct_lwm is not exceeded or innodb_adaptive_flushing=ON decides not to write out anything, the page cleaner should keep sleeping indefinitely until the state of the system changes: a dirty page is added to the buffer pool such that the page cleaner would no longer be idle. buf_flush_page_cleaner(): Explicitly note when the page cleaner is idle. When that happens, use mysql_cond_wait() instead of mysql_cond_timedwait(). buf_flush_insert_into_flush_list(): Wake up the page cleaner if needed. innodb_max_dirty_pages_pct_update(), innodb_max_dirty_pages_pct_lwm_update(): Wake up the page cleaner just in case. Note: buf_flush_ahead(), buf_flush_wait_flushed() and shutdown are already waking up the page cleaner thread.
\| *	Partially Revert "MDEV-24270: Collect multiple completed events at a time"	Vladislav Vaintroub	2020-11-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This partially reverts commit 6479006e14691ff85072d06682f81b90875e9cb0. Remove the constant tpool::aio::N_PENDING, which has no intrinsic meaning for the tpool.
\| *	MDEV-24270: Collect multiple completed events at a time	Marko Mäkelä	2020-11-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	tpool::aio::N_PENDING: Replaces OS_AIO_N_PENDING_IOS_PER_THREAD. This limits two similar things: the number of outstanding requests that a thread may io_submit(), and the number of completed requests collected at a time by io_getevents().
* \|	MDEV-24167: Use lightweight srw_lock for btr_search_latch	Marko Mäkelä	2020-11-24	1	-7/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Many InnoDB rw-locks unnecessarily depend on the complex InnoDB rw_lock_t implementation that support the SX lock mode as well as recursive acquisition of X or SX locks. One of them is the bunch of adaptive hash index search latches, instrumented as btr_search_latch in PERFORMANCE_SCHEMA. Let us introduce a simpler lock for those in order to reduce overhead. srw_lock: A simple read-write lock that does not support recursion. On Microsoft Windows, this wraps SRWLOCK, only adding runtime overhead if PERFORMANCE_SCHEMA is enabled. On Linux (all architectures), this is implemented with std::atomic<uint32_t> and the futex system call. On other platforms, we will wrap mysql_rwlock_t with zero runtime overhead. The PERFORMANCE_SCHEMA instrumentation differs from InnoDB rw_lock_t in that we will only invoke PSI_RWLOCK_CALL(start_rwlock_wrwait) or PSI_RWLOCK_CALL(start_rwlock_rdwait) if there is an actual conflict.
* \|	Merge 10.5 into 10.6	Marko Mäkelä	2020-11-24	1	-12/+4
\|\ \ \| \|/
\| *	MDEV-24271 rw_lock::read_lock_yield() may cause writer starvation	Marko Mäkelä	2020-11-24	1	-12/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The greedy fetch_add(1) approach of read_trylock() may cause starvation of a waiting write lock request. Let us use a compare-and-swap for the read lock acquisition in order to guarantee the progress of writers.
* \|	Merge 10.5 into 10.6	Marko Mäkelä	2020-11-23	1	-93/+13
\|\ \ \| \|/
\| *	MDEV-24167: Remove PFS instrumentation of buf_block_t	Marko Mäkelä	2020-11-20	1	-82/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We always defined PFS_SKIP_BUFFER_MUTEX_RWLOCK, that is, the latches of the buffer pool blocks were never instrumented in PERFORMANCE_SCHEMA. For some reason, the debug_latch (which enforce proper usage of buffer-fixing in debug builds) was instrumented.
\| *	MDEV-24188 fixup: Simplify the wait loop	Marko Mäkelä	2020-11-17	1	-11/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Starting with commit 7cffb5f6e8a231a041152447be8980ce35d2c9b8 (MDEV-23399) the function buf_flush_page() will first acquire block->lock and only after that invoke set_io_fix(). Before that, it was possible to reach a livelock between buf_page_create() and buf_flush_page(). buf_page_create(): Directly try acquiring the exclusive page latch without checking whether the page is io-fixed or buffer-fixed. (As a matter of fact, the have_x_latch() check is not strictly necessary, because we still support recursive X-latches.) In case of a latch conflict, wait while allowing buf_page_write_complete() to acquire buf_pool.mutex and release the block->lock. An attempt to wait for exclusive block->lock while holding buf_pool.mutex would lead to a hang in the tests parts.part_supported_sql_func_innodb and stress.ddl_innodb, due to a deadlock between buf_page_write_complete() and buf_page_create(). Similarly, in case of an I/O fixed compressed-only ROW_FORMAT=COMPRESSED page, we will sleep before retrying. In both cases, we will sleep for 1ms or until a flush batch is completed.
* \|	Merge branch '10.5' into 10.6	Oleksandr Byelkin	2020-11-14	1	-14/+24
\|\ \ \| \|/
\| *	MDEV-24188: Merge 10.4 into 10.5	Marko Mäkelä	2020-11-13	1	-10/+20
\| \|\
\| \| *	MDEV-24188: Merge 10.3 into 10.4	Marko Mäkelä	2020-11-13	1	-15/+20
\| \| \|\
\| \| \| *	MDEV-24188: Merge 10.2 into 10.3	Marko Mäkelä	2020-11-13	1	-25/+27
\| \| \| \|\
\| \| \| \| *	MDEV-24188 Hang in buf_page_create() after reusing a previously freed page	Marko Mäkelä	2020-11-13	1	-25/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The fix of MDEV-23456 (commit b1009ae5c16697d5eef443cc6a60a74301148c73) introduced a livelock between page flushing and a thread that is executing buf_page_create(). buf_page_create(): If the current mini-transaction is holding an exclusive latch on the page, do not attempt to acquire another one, and do not care about any I/O fix. mtr_t::have_x_latch(): Replaces mtr_t::get_fix_count(). dyn_buf_t::for_each_block(const Functor&) const: A new variant. rw_lock_own(): Add a const qualifier. Reviewed by: Thirunarayanan Balathandayuthapani
\| * \| \| \|	Merge 10.4 into 10.5	Marko Mäkelä	2020-11-13	1	-4/+4
\| \|\ \ \ \ \| \| \|/ / /
\| \| * \| \|	Merge 10.3 into 10.4	Marko Mäkelä	2020-11-12	1	-3/+3
\| \| \|\ \ \ \| \| \| \|/ /
\| \| \| * \|	Merge 10.2 into 10.3	Marko Mäkelä	2020-11-12	1	-3/+3
\| \| \| \|\ \ \| \| \| \| \|/
\| \| \| \| *	MDEV-24182 ibuf_merge_or_delete_for_page() contains dead code	Marko Mäkelä	2020-11-11	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The function ibuf_merge_or_delete_for_page() was always being invoked with update_ibuf_bitmap=true ever since commit cd623508dff53c210154392da6c0f65b7b6bcf4c fixed up something after MDEV-9566. Furthermore, the parameter page_size is never being passed as a null pointer, and therefore it should better be a reference to a constant object.
* \| \| \| \|	Merge 10.5 into 10.6	Marko Mäkelä	2020-11-12	1	-44/+68
\|\ \ \ \ \ \| \|/ / / /
\| * \| \| \|	MDEV-24109 InnoDB hangs with innodb_flush_sync=OFF	Marko Mäkelä	2020-11-04	1	-44/+68
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	MDEV-23855 broke the handling of innodb_flush_sync=OFF. That parameter is supposed to limit the page write rate in case the log capacity is being exceeded and log checkpoints are needed. With this fix, the following should pass: ./mtr --mysqld=--loose-innodb-flush-sync=0 One of our best regression tests for page flushing is encryption.innochecksum. With innodb_page_size=16k and innodb_flush_sync=OFF it would likely hang without this fix. log_sys.last_checkpoint_lsn: Declare as Atomic_relaxed<lsn_t> so that we are allowed to read the value while not holding log_sys.mutex. buf_flush_wait_flushed(): Let the page cleaner perform the flushing also if innodb_flush_sync=OFF. After the page cleaner has completed, perform a checkpoint if it is needed, because buf_flush_sync_for_checkpoint() will not be run if innodb_flush_sync=OFF. buf_flush_ahead(): Simplify the condition. We do not really care whether buf_flush_page_cleaner() is running. buf_flush_page_cleaner(): Evaluate innodb_flush_sync at the low level. If innodb_flush_sync=OFF, rate-limit the batches to innodb_io_capacity_max pages per second. Reviewed by: Vladislav Vaintroub
* \| \| \| \|	Merge 10.5 into 10.6mariadb-10.5.7 bb-10.6-georg 10.6-vatu	Marko Mäkelä	2020-11-03	1	-1/+1
\|\ \ \ \ \ \| \|/ / / /
\| * \| \| \|	MDEV-24101 innodb_random_read_ahead=ON causes hang on DDL or shutdownmariadb-10.5.7	Marko Mäkelä	2020-11-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	buf_read_ahead_random(): Do not leak a tablespace reference. The reference was already acquired in fil_space_t::get(), and we must only check that operations were not stopped. This error was introduced when commit 118e258aaac5da75a2ac4556201aaea3688fac67 merged n_pending_ios, n_pending_ops into a single n_pending. This was not noticed earlier, because innodb_random_read_ahead is OFF by default and our regression tests did not vary that parameter at all.
* \| \| \| \|	Merge 10.5 into 10.6	Marko Mäkelä	2020-11-02	8	-3649/+2152
\|\ \ \ \ \ \| \|/ / / /
\| * \| \| \|	MDEV-24054 Assertion in_LRU_list failed in buf_flush_try_neighbors()	Marko Mäkelä	2020-10-30	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	buf_flush_try_neighbors(): Before invoking buf_page_t::ready_for_flush(), check that the freshly looked up buf_pool.page_hash entry actually is a buffer page and not a buf_pool.watch[] sentinel for purge buffering. This race condition was introduced in MDEV-15053 (commit b1ab211dee599eabd9a5b886fafa3adea29ae041). It is rather hard to hit this bug, because buf_flush_check_neighbors() already checked the condition. The problem exists if buf_pool.watch_set() was invoked for a page in the range after the check in buf_flush_check_neighbor() had been finished.
\| * \| \| \|	Merge 10.4 into 10.5	Marko Mäkelä	2020-10-30	2	-10/+59
\| \|\ \ \ \ \| \| \|/ / /
\| \| * \| \|	Merge 10.3 into 10.4	Marko Mäkelä	2020-10-29	2	-74/+80
\| \| \|\ \ \ \| \| \| \|/ /
\| \| \| * \|	Merge 10.2 into 10.3	Marko Mäkelä	2020-10-28	2	-74/+80
\| \| \| \|\ \ \| \| \| \| \|/
\| \| \| \| *	MDEV-23693 Failing assertion: my_atomic_load32_explicit(&lock->lock_word, ↵	Thirunarayanan Balathandayuthapani	2020-10-27	2	-74/+80
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	MY_MEMORY_ORDER_RELAXED) == X_LOCK_DECR InnoDB frees the block lock during buffer pool shrinking when other thread is yet to release the block lock. While shrinking the buffer pool, InnoDB allows the page to be freed unless it is buffer fixed. In some cases, InnoDB releases the latch after unfixing the block. Fix: ==== - InnoDB should unfix the block after releases the latch. - Add more assertion to check buffer fix while accessing the page. - Introduced block_hint structure to store buf_block_t pointer and allow accessing the buf_block_t pointer only by passing a functor. It returns original buf_block_t* pointer if it is valid or nullptr if the pointer become stale. - Replace buf_block_is_uncompressed() with buf_pool_t::is_block_pointer() This change is motivated by a change in mysql-5.7.32: mysql/mysql-server@46e60de444a8fbd876cc6778a7e64a1d3426a48d Bug #31036301 ASSERTION FAILURE: SYNC0RW.IC:429:LOCK->LOCK_WORD
\| * \| \| \|	MDEV-24053 MSAN use-of-uninitialized-value in ↵	Marko Mäkelä	2020-10-29	1	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tpool::simulated_aio::simulated_aio_callback() Starting with commit ef3f71fa7435f092dfce36d606cf22332218dd8b MemorySanitizer would complain that we are writing uninitialized data via the doublewrite buffer. buf_dblwr_t::add_to_batch(): Zero out any unused part of the doublewrite buffer, for PAGE_COMPRESSED and ROW_FORMAT=COMPRESSED tables. Reviewed by: Eugene Kosov
\| * \| \| \|	MDEV-23855: Use normal mutex for log_sys.mutex, log_sys.flush_order_mutex	Marko Mäkelä	2020-10-26	1	-18/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With an unreasonably small innodb_log_file_size, the page cleaner thread would frequently acquire log_sys.flush_order_mutex and spend a significant portion of CPU time spinning on that mutex when determining the checkpoint LSN.
\| * \| \| \|	MDEV-23855: Implement asynchronous doublewrite	Marko Mäkelä	2020-10-26	2	-35/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Synchronous writes and calls to fdatasync(), fsync() or FlushFileBuffers() would ruin performance. So, let us submit asynchronous writes for the doublewrite buffer. We submit a single request for the likely case that the two doublewrite buffers are contiquous in the system tablespace. buf_dblwr_t::flush_buffered_writes_completed(): The completion callback of buf_dblwr_t::flush_buffered_writes(). os_aio_wait_until_no_pending_writes(): Also wait for doublewrite batches. buf_dblwr_t::element::space: Remove. We can simply use element::request.node->space instead. Reviewed by: Vladislav Vaintroub
\| * \| \| \|	MDEV-23399 fixup: Interleaved doublewrite batches	Marko Mäkelä	2020-10-26	1	-45/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Author: Vladislav Vaintroub