delta/mariadb-git.git - github.com: MariaDB/server.git

	Commit message (Collapse)	Author	Age	Files	Lines
*	Add build on AIX	Etienne Guesnet	2020-12-16	1	-46/+0
\|
*	MDEV-22929 fixup: root_name() clash with clang++ <fstream>	Marko Mäkelä	2020-12-03	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \|	The clang++ -stdlib=libc++ header file <fstream> depends on <filesystem> that defines a member function path::root_name(), which conflicts with the rather unused #define root_name() that had been introduced in commit 7c58e97bf6f80a251046c5b3e7bce826fe058bd6. Because an instrumented -stdlib=libc++ (rather than the default -stdlib=libstdc++) is easier to build for a working -fsanitize=memory (cmake -DWITH_MSAN=ON), let us remove the conflicting #define for now.
*	MDEV-24125: linux large pages - Revert "Fixed centos 6 build failure"	Daniel Black	2020-11-17	1	-6/+0
\| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 6cf8f05fd9deb900a78898576b85753e09feddaa. Original patch assumed that MAP_HUGETLB as consistent across achitectures which isn't the case. Defining it unconditionally broke large pages on every achitecutre where the value differed from x86_64. With the EOL for Centos/RHEL6 announced in 10.5.7, <3.8 linux kernels are no longer supported.
*	Merge 10.4 into 10.5	Marko Mäkelä	2020-11-13	1	-9/+17
\|\
\| *	MDEV-24119 MDL BF-BF Conflict caused by TRUNCATE TABLE	sjaakola	2020-11-11	1	-4/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This PR fixes same issue as MDEV-21577 for TRUNCATE TABLE. MDEV-21577 fixed TOI replication for OPTIMIZE, REPAIR and ALTER TABLE operating on FK child table. It was later found out that also TRUNCATE has similar problem and needs a fix. The actual fix is to do FK parent table lookup before TRUNCATE TOI isolation and append found FK parent table names in certification key list for the write set. PR contains also new test scenario in galera_ddl_fk_conflict test where FK child has two FK parent tables and there are two DML transactions operating on both parent tables. For development convenience, new TO isolation macro was added: WSREP_TO_ISOLATION_BEGIN_IF and WSREP_TO_ISOLATION_BEGIN_ALTER macro was changed to skip the goto statement. Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
\| *	MDEV-21577 MDL BF-BF conflict	sjaakola	2020-11-03	1	-2/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some DDL statements appear to acquire MDL locks for a table referenced by foreign key constraint from the actual affected table of the DDL statement. OPTIMIZE, REPAIR and ALTER TABLE belong to this class of DDL statements. Earlier MariaDB version did not take this in consideration, and appended only affected table in the certification key list in write set. Because of missing certification information, it could happen that e.g. OPTIMIZE table for FK child table could be allowed to apply in parallel with DML operating on the foreign key parent table, and this could lead to unhandled MDL lock conflicts between two high priority appliers (BF). The fix in this patch, changes the TOI replication for OPTIMIZE, REPAIR and ALTER TABLE statements so that before the execution of respective DDL statement, there is foreign key parent search round. This FK parent search contains following steps: * open and lock the affected table (with permissive shared locks) * iterate over foreign key contstraints and collect and array of Fk parent table names * close all tables open for the THD and release MDL locks * do the actual TOI replication with the affected table and FK parent table names as key values The patch contains also new mtr test for verifying that the above mentioned DDL statements replicate without problems when operating on FK child table. The mtr test scenario #1, which can be used to check if some other DDL (on top of OPTIMIZE, REPAIR and ALTER) could cause similar excessive FK parent table locking. Reviewed-by: Aleksey Midenkov <aleksey.midenkov@mariadb.com> Reviewed-by: Jan Lindström <jan.lindstrom@mariadb.com>
* \|	MDEV-24109 InnoDB hangs with innodb_flush_sync=OFF	Marko Mäkelä	2020-11-04	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	MDEV-23855 broke the handling of innodb_flush_sync=OFF. That parameter is supposed to limit the page write rate in case the log capacity is being exceeded and log checkpoints are needed. With this fix, the following should pass: ./mtr --mysqld=--loose-innodb-flush-sync=0 One of our best regression tests for page flushing is encryption.innochecksum. With innodb_page_size=16k and innodb_flush_sync=OFF it would likely hang without this fix. log_sys.last_checkpoint_lsn: Declare as Atomic_relaxed<lsn_t> so that we are allowed to read the value while not holding log_sys.mutex. buf_flush_wait_flushed(): Let the page cleaner perform the flushing also if innodb_flush_sync=OFF. After the page cleaner has completed, perform a checkpoint if it is needed, because buf_flush_sync_for_checkpoint() will not be run if innodb_flush_sync=OFF. buf_flush_ahead(): Simplify the condition. We do not really care whether buf_flush_page_cleaner() is running. buf_flush_page_cleaner(): Evaluate innodb_flush_sync at the low level. If innodb_flush_sync=OFF, rate-limit the batches to innodb_io_capacity_max pages per second. Reviewed by: Vladislav Vaintroub
* \|	Merge 10.4 into 10.5	Marko Mäkelä	2020-11-03	1	-2/+5
\|\ \ \| \|/
\| *	Merge 10.3 into 10.4	Marko Mäkelä	2020-11-03	1	-2/+5
\| \|\
\| \| *	Merge 10.2 into 10.3	Marko Mäkelä	2020-11-02	1	-2/+5
\| \| \|\
\| \| \| *	MDEV-22387: Do not violate __attribute__((nonnull))	Marko Mäkelä	2020-11-02	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This follows up commit commit 94a520ddbe39ae97de1135d98699cf2674e6b77e and commit 7c5519c12d46ead947d341cbdcbb6fbbe4d4fe1b. After these changes, the default test suites on a cmake -DWITH_UBSAN=ON build no longer fail due to passing null pointers as parameters that are declared to never be null, but plenty of other runtime errors remain.
* \| \| \|	Merge 10.4 into 10.5	Marko Mäkelä	2020-10-30	2	-6/+10
\|\ \ \ \ \| \|/ / /
\| * \| \|	Merge 10.3 into 10.4	Marko Mäkelä	2020-10-29	2	-6/+10
\| \|\ \ \ \| \| \|/ /
\| \| * \|	Merge 10.2 into 10.3	Marko Mäkelä	2020-10-28	2	-6/+10
\| \| \|\ \ \| \| \| \|/
\| \| \| *	MDEV-23867: insert... select crash in compute_window_func	Varun Gupta	2020-10-23	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are 2 issues here: Issue #1: memory allocation. An IO_CACHE that uses encryption uses a larger buffer (it needs space for the encrypted data, decrypted data, IO_CACHE_CRYPT struct to describe encryption parameters etc). Issue #2: IO_CACHE::seek_not_done When IO_CACHE objects are cloned, they still share the file descriptor. This means, operation on one IO_CACHE may change the file read position which will confuse other IO_CACHEs using it. The fix of these issues would be: Allocate the buffer to also include the extra size needed for encryption. Perform seek again after one IO_CACHE reads the file.
\| \| \| *	compilation failure with new C/C	Sergei Golubchik	2020-10-23	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	define symbols as C/C does to avoid "macro redefined" warnings
* \| \| \|	Expose utf8mb4_bin charset for plugins	Vicențiu Ciorbaru	2020-10-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Cleanup other linker errors
* \| \| \|	MDEV-23399: Performance regression with write workloads	Marko Mäkelä	2020-10-15	2	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The buffer pool refactoring in MDEV-15053 and MDEV-22871 shifted the performance bottleneck to the page flushing. The configuration parameters will be changed as follows: innodb_lru_flush_size=32 (new: how many pages to flush on LRU eviction) innodb_lru_scan_depth=1536 (old: 1024) innodb_max_dirty_pages_pct=90 (old: 75) innodb_max_dirty_pages_pct_lwm=75 (old: 0) Note: The parameter innodb_lru_scan_depth will only affect LRU eviction of buffer pool pages when a new page is being allocated. The page cleaner thread will no longer evict any pages. It used to guarantee that some pages will remain free in the buffer pool. Now, we perform that eviction 'on demand' in buf_LRU_get_free_block(). The parameter innodb_lru_scan_depth(srv_LRU_scan_depth) is used as follows: * When the buffer pool is being shrunk in buf_pool_t::withdraw_blocks() * As a buf_pool.free limit in buf_LRU_list_batch() for terminating the flushing that is initiated e.g., by buf_LRU_get_free_block() The parameter also used to serve as an initial limit for unzip_LRU eviction (evicting uncompressed page frames while retaining ROW_FORMAT=COMPRESSED pages), but now we will use a hard-coded limit of 100 or unlimited for invoking buf_LRU_scan_and_free_block(). The status variables will be changed as follows: innodb_buffer_pool_pages_flushed: This includes also the count of innodb_buffer_pool_pages_LRU_flushed and should work reliably, updated one by one in buf_flush_page() to give more real-time statistics. The function buf_flush_stats(), which we are removing, was not called in every code path. For both counters, we will use regular variables that are incremented in a critical section of buf_pool.mutex. Note that show_innodb_vars() directly links to the variables, and reads of the counters will not be protected by buf_pool.mutex, so you cannot get a consistent snapshot of both variables. The following INFORMATION_SCHEMA.INNODB_METRICS counters will be removed, because the page cleaner no longer deals with writing or evicting least recently used pages, and because the single-page writes have been removed: * buffer_LRU_batch_flush_avg_time_slot * buffer_LRU_batch_flush_avg_time_thread * buffer_LRU_batch_flush_avg_time_est * buffer_LRU_batch_flush_avg_pass * buffer_LRU_single_flush_scanned * buffer_LRU_single_flush_num_scan * buffer_LRU_single_flush_scanned_per_call When moving to a single buffer pool instance in MDEV-15058, we missed some opportunity to simplify the buf_flush_page_cleaner thread. It was unnecessarily using a mutex and some complex data structures, even though we always have a single page cleaner thread. Furthermore, the buf_flush_page_cleaner thread had separate 'recovery' and 'shutdown' modes where it was waiting to be triggered by some other thread, adding unnecessary latency and potential for hangs in relatively rarely executed startup or shutdown code. The page cleaner was also running two kinds of batches in an interleaved fashion: "LRU flush" (writing out some least recently used pages and evicting them on write completion) and the normal batches that aim to increase the MIN(oldest_modification) in the buffer pool, to help the log checkpoint advance. The buf_pool.flush_list flushing was being blocked by buf_block_t::lock for no good reason. Furthermore, if the FIL_PAGE_LSN of a page is ahead of log_sys.get_flushed_lsn(), that is, what has been persistently written to the redo log, we would trigger a log flush and then resume the page flushing. This would unnecessarily limit the performance of the page cleaner thread and trigger the infamous messages "InnoDB: page_cleaner: 1000ms intended loop took 4450ms. The settings might not be optimal" that were suppressed in commit d1ab89037a518fcffbc50c24e4bd94e4ec33aed0 unless log_warnings>2. Our revised algorithm will make log_sys.get_flushed_lsn() advance at the start of buf_flush_lists(), and then execute a 'best effort' to write out all pages. The flush batches will skip pages that were modified since the log was written, or are are currently exclusively locked. The MDEV-13670 message "page_cleaner: 1000ms intended loop took" message will be removed, because by design, the buf_flush_page_cleaner() should not be blocked during a batch for extended periods of time. We will remove the single-page flushing altogether. Related to this, the debug parameter innodb_doublewrite_batch_size will be removed, because all of the doublewrite buffer will be used for flushing batches. If a page needs to be evicted from the buffer pool and all 100 least recently used pages in the buffer pool have unflushed changes, buf_LRU_get_free_block() will execute buf_flush_lists() to write out and evict innodb_lru_flush_size pages. At most one thread will execute buf_flush_lists() in buf_LRU_get_free_block(); other threads will wait for that LRU flushing batch to finish. To improve concurrency, we will replace the InnoDB ib_mutex_t and os_event_t native mutexes and condition variables in this area of code. Most notably, this means that the buffer pool mutex (buf_pool.mutex) is no longer instrumented via any InnoDB interfaces. It will continue to be instrumented via PERFORMANCE_SCHEMA. For now, both buf_pool.flush_list_mutex and buf_pool.mutex will be declared with MY_MUTEX_INIT_FAST (PTHREAD_MUTEX_ADAPTIVE_NP). The critical sections of buf_pool.flush_list_mutex should be shorter than those for buf_pool.mutex, because in the worst case, they cover a linear scan of buf_pool.flush_list, while the worst case of a critical section of buf_pool.mutex covers a linear scan of the potentially much longer buf_pool.LRU list. mysql_mutex_is_owner(), safe_mutex_is_owner(): New predicate, usable with SAFE_MUTEX. Some InnoDB debug assertions need this predicate instead of mysql_mutex_assert_owner() or mysql_mutex_assert_not_owner(). buf_pool_t::n_flush_LRU, buf_pool_t::n_flush_list: Replaces buf_pool_t::init_flush[] and buf_pool_t::n_flush[]. The number of active flush operations. buf_pool_t::mutex, buf_pool_t::flush_list_mutex: Use mysql_mutex_t instead of ib_mutex_t, to have native mutexes with PERFORMANCE_SCHEMA and SAFE_MUTEX instrumentation. buf_pool_t::done_flush_LRU: Condition variable for !n_flush_LRU. buf_pool_t::done_flush_list: Condition variable for !n_flush_list. buf_pool_t::do_flush_list: Condition variable to wake up the buf_flush_page_cleaner when a log checkpoint needs to be written or the server is being shut down. Replaces buf_flush_event. We will keep using timed waits (the page cleaner thread will wake _at least_ once per second), because the calculations for innodb_adaptive_flushing depend on fixed time intervals. buf_dblwr: Allocate statically, and move all code to member functions. Use a native mutex and condition variable. Remove code to deal with single-page flushing. buf_dblwr_check_block(): Make the check debug-only. We were spending a significant amount of execution time in page_simple_validate_new(). flush_counters_t::unzip_LRU_evicted: Remove. IORequest: Make more members const. FIXME: m_fil_node should be removed. buf_flush_sync_lsn: Protect by std::atomic, not page_cleaner.mutex (which we are removing). page_cleaner_slot_t, page_cleaner_t: Remove many redundant members. pc_request_flush_slot(): Replaces pc_request() and pc_flush_slot(). recv_writer_thread: Remove. Recovery works just fine without it, if we simply invoke buf_flush_sync() at the end of each batch in recv_sys_t::apply(). recv_recovery_from_checkpoint_finish(): Remove. We can simply call recv_sys.debug_free() directly. srv_started_redo: Replaces srv_start_state. SRV_SHUTDOWN_FLUSH_PHASE: Remove. logs_empty_and_mark_files_at_shutdown() can communicate with the normal page cleaner loop via the new function flush_buffer_pool(). buf_flush_remove(): Assert that the calling thread is holding buf_pool.flush_list_mutex. This removes unnecessary mutex operations from buf_flush_remove_pages() and buf_flush_dirty_pages(), which replace buf_LRU_flush_or_remove_pages(). buf_flush_lists(): Renamed from buf_flush_batch(), with simplified interface. Return the number of flushed pages. Clarified comments and renamed min_n to max_n. Identify LRU batch by lsn=0. Merge all the functions buf_flush_start(), buf_flush_batch(), buf_flush_end() directly to this function, which was their only caller, and remove 2 unnecessary buf_pool.mutex release/re-acquisition that we used to perform around the buf_flush_batch() call. At the start, if not all log has been durably written, wait for a background task to do it, or start a new task to do it. This allows the log write to run concurrently with our page flushing batch. Any pages that were skipped due to too recent FIL_PAGE_LSN or due to them being latched by a writer should be flushed during the next batch, unless there are further modifications to those pages. It is possible that a page that we must flush due to small oldest_modification also carries a recent FIL_PAGE_LSN or is being constantly modified. In the worst case, all writers would then end up waiting in log_free_check() to allow the flushing and the checkpoint to complete. buf_do_flush_list_batch(): Clarify comments, and rename min_n to max_n. Cache the last looked up tablespace. If neighbor flushing is not applicable, invoke buf_flush_page() directly, avoiding a page lookup in between. buf_flush_space(): Auxiliary function to look up a tablespace for page flushing. buf_flush_page(): Defer the computation of space->full_crc32(). Never call log_write_up_to(), but instead skip persistent pages whose latest modification (FIL_PAGE_LSN) is newer than the redo log. Also skip pages on which we cannot acquire a shared latch without waiting. buf_flush_try_neighbors(): Do not bother checking buf_fix_count because buf_flush_page() will no longer wait for the page latch. Take the tablespace as a parameter, and only execute this function when innodb_flush_neighbors>0. Avoid repeated calls of page_id_t::fold(). buf_flush_relocate_on_flush_list(): Declare as cold, and push down a condition from the callers. buf_flush_check_neighbor(): Take id.fold() as a parameter. buf_flush_sync(): Ensure that the buf_pool.flush_list is empty, because the flushing batch will skip pages whose modifications have not yet been written to the log or were latched for modification. buf_free_from_unzip_LRU_list_batch(): Remove redundant local variables. buf_flush_LRU_list_batch(): Let the caller buf_do_LRU_batch() initialize the counters, and report n->evicted. Cache the last looked up tablespace. If neighbor flushing is not applicable, invoke buf_flush_page() directly, avoiding a page lookup in between. buf_do_LRU_batch(): Return the number of pages flushed. buf_LRU_free_page(): Only release and re-acquire buf_pool.mutex if adaptive hash index entries are pointing to the block. buf_LRU_get_free_block(): Do not wake up the page cleaner, because it will no longer perform any useful work for us, and we do not want it to compete for I/O while buf_flush_lists(innodb_lru_flush_size, 0) writes out and evicts at most innodb_lru_flush_size pages. (The function buf_do_LRU_batch() may complete after writing fewer pages if more than innodb_lru_scan_depth pages end up in buf_pool.free list.) Eliminate some mutex release-acquire cycles, and wait for the LRU flush batch to complete before rescanning. buf_LRU_check_size_of_non_data_objects(): Simplify the code. buf_page_write_complete(): Remove the parameter evict, and always evict pages that were part of an LRU flush. buf_page_create(): Take a pre-allocated page as a parameter. buf_pool_t::free_block(): Free a pre-allocated block. recv_sys_t::recover_low(), recv_sys_t::apply(): Preallocate the block while not holding recv_sys.mutex. During page allocation, we may initiate a page flush, which in turn may initiate a log flush, which would require acquiring log_sys.mutex, which should always be acquired before recv_sys.mutex in order to avoid deadlocks. Therefore, we must not be holding recv_sys.mutex while allocating a buffer pool block. BtrBulk::logFreeCheck(): Skip a redundant condition. row_undo_step(): Do not invoke srv_inc_activity_count() for every row that is being rolled back. It should suffice to invoke the function in trx_flush_log_if_needed() during trx_t::commit_in_memory() when the rollback completes. sync_check_enable(): Remove. We will enable innodb_sync_debug from the very beginning. Reviewed by: Vladislav Vaintroub
* \| \| \|	Merge 10.4 into 10.5	Marko Mäkelä	2020-09-23	1	-0/+5
\|\ \ \ \ \| \|/ / /
\| * \| \|	Merge 10.3 into 10.4	Marko Mäkelä	2020-09-21	1	-2/+5
\| \|\ \ \ \| \| \|/ /
\| \| * \|	Merge 10.2 into 10.3	Marko Mäkelä	2020-09-21	1	-0/+5
\| \| \|\ \ \| \| \| \|/
\| \| \| *	MDEV-23101 : SIGSEGV in lock_rec_unlock() when Galera is enabled	Jan Lindström	2020-09-10	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove incorrect BF (brute force) handling from lock_rec_has_to_wait_in_queue and move condition to correct callers. Add a function to report BF lock waits and assert if incorrect BF-BF lock wait happens. wsrep_report_bf_lock_wait Add a new function to report BF lock wait. wsrep_assert_no_bf_bf_wait Add a new function to check do we have a BF-BF wait and if we have report this case and assert as it is a bug. lock_rec_has_to_wait Use new wsrep_assert_bf_wait to check BF-BF wait. lock_rec_create_low lock_table_create Use new function to report BF lock waits. lock_rec_insert_by_trx_age lock_grant_and_move_on_page lock_grant_and_move_on_rec Assert that trx is not Galera as VATS is not compatible with Galera. lock_rec_add_to_queue If there is conflicting lock in a queue make sure that transaction is BF. lock_rec_has_to_wait_in_queue Remove incorrect BF handling. If there is conflicting locks in a queue all transactions must wait. lock_rec_dequeue_from_page lock_rec_unlock If there is conflicting lock make sure it is not BF-BF case. lock_rec_queue_validate Add Galera record locking rules comment and use new function to report BF lock waits. All attempts to reproduce the original assertion have been failed. Therefore, there is no test case on this commit.
* \| \| \|	MDEV-19935 Create unified CRC-32 interface	Vladislav Vaintroub	2020-09-17	1	-11/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add CRC32C code to mysys. The x86-64 implementation uses PCMULQDQ in addition to CRC32 instruction after Intel whitepaper, and is ported from rocksdb code. Optimized ARM and POWER CRC32 were already present in mysys.
* \| \| \|	Merge 10.4 into 10.5	Marko Mäkelä	2020-09-04	2	-0/+6
\|\ \ \ \ \| \|/ / /
\| * \| \|	MDEV-23633 fixup: Add missing semicolon	Marko Mäkelä	2020-09-04	1	-1/+1
\| \| \| \|
\| * \| \|	MDEV-23633 MY_RELAX_CPU performs unnecessary compare-and-swap on ARM	Marko Mäkelä	2020-09-04	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This follows up MDEV-14374, which was filed against MariaDB Server 10.3. Back then, on a 48-core Qualcomm Centriq 2400, the performance of delay loops for spinloops was tested both with and without the dummy compare-and-swap operation, and it was decided to keep the dummy operation. On target architectures where nothing special is available (other than x86 (IA-32, AMD64) or POWER), we perform a dummy compare-and-swap operation. This is contrary to the idea of the x86 PAUSE instruction and the __ppc_get_timebase(), which aim to keep the memory bus idle for a while, to allow other cores to better execute code while a spinloop is waiting for something to be changed. On MariaDB Server 10.4 and another implementation of the ARMv8 ISA, omitting the dummy compare-and-swap improved performance by up to 12%. So, let us avoid the dummy compare-and-swap on ARM. For now, we are retaining the dummy compare-and-swap on other ISAs (such as SPARC, MIPS, S390x, RISC-V) because we do not have any performance data for them.
\| * \| \|	Merge 10.3 into 10.4	Marko Mäkelä	2020-09-03	1	-0/+2
\| \|\ \ \ \| \| \|/ /
* \| \| \|	MDEV-23495: Refine Arm64 PMULL runtime check in MariaDB	Yuqi Gu	2020-08-21	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Raspberry Pi 4 supports crc32 but doesn't support pmull (MDEV-23030). The PR #1645 offers a solution to fix this issue. But it does not consider the condition that the target platform does support crc32 but not support PMULL. In this condition, it should leverage the Arm64 crc32 instruction (__crc32c) and just only skip parallel computation (pmull/vmull) rather than skip all hardware crc32 instruction of computation. The PR also removes unnecessary CRC32_ZERO branch in 'crc32c_aarch64' for MariaDB, formats the indent and coding style. Change-Id: I76371a6bd767b4985600e8cca10983d71b7e9459 Signed-off-by: Yuqi Gu <yuqi.gu@arm.com>
* \| \| \|	Added DBUG_PUSH_EMPTY and DBUG_POP_EMPTY to speed up DBUG	Monty	2020-08-20	1	-0/+5
\| \| \| \|
* \| \| \|	After-merge fix of the Windows build	Marko Mäkelä	2020-08-20	1	-6/+8
\| \| \| \|
* \| \| \|	Merge 10.4 into 10.5	Marko Mäkelä	2020-08-20	1	-2/+3
\|\ \ \ \ \| \|/ / /
\| * \| \|	Merge 10.3 into 10.4	Marko Mäkelä	2020-08-20	1	-3/+0
\| \|\ \ \ \| \| \|/ /
\| \| * \|	Merge 10.2 into 10.3	Marko Mäkelä	2020-08-20	1	-3/+0
\| \| \|\ \ \| \| \| \|/
\| \| \| *	Merge 10.1 into 10.2	Marko Mäkelä	2020-08-20	1	-3/+0
\| \| \| \|\
* \| \| \| \	Merge 10.4 into 10.5	Marko Mäkelä	2020-08-14	1	-3/+8
\|\ \ \ \ \ \| \|/ / / /
\| * \| \| \|	Merge 10.3 into 10.4, except MDEV-22543	Marko Mäkelä	2020-08-13	1	-3/+8
\| \|\ \ \ \ \| \| \|/ / / \| \| \| \| \| \| \| \| \| \|	Also, fix GCC -Og -Wmaybe-uninitialized in run_backup_stage()
\| \| * \| \|	Merge 10.2 into 10.3	Marko Mäkelä	2020-08-13	1	-3/+6
\| \| \|\ \ \ \| \| \| \|/ /
\| \| \| * \|	Use DBUG_ASSERT(ptr != NULL) to ease merging to 10.3	Marko Mäkelä	2020-08-12	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In 10.3, DBUG_ASSERT() may expand to something that includes __builtin_expect(), which expects integer arguments, not pointers. To avoid any compiler warnings, let us use an explicit rather than implicit comparison to the null pointer.
\| \| \| * \|	replace assert() with DBUG_ASSERT()	Eugene Kosov	2020-08-12	1	-3/+5
\| \| \| \| \|
\| \| \| * \|	add debug assertion to ilist	Eugene Kosov	2020-08-11	1	-3/+4
\| \| \| \| \|
* \| \| \| \|	Merge 10.4 into 10.5	Marko Mäkelä	2020-08-10	1	-1/+2
\|\ \ \ \ \ \| \|/ / / /
\| * \| \| \|	Merge 10.3 into 10.4	Marko Mäkelä	2020-08-10	1	-1/+2
\| \|\ \ \ \ \| \| \|/ / /
\| \| * \| \|	MDEV-23348 vio_shutdown does not prevent later ReadFile on named pipe	Vladislav Vaintroub	2020-08-03	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce st_vio::shutdown_flag to be checked prior to Read/WriteFile and during wait for async.io to finish.
* \| \| \| \|	Merge 10.4 into 10.5	Marko Mäkelä	2020-08-01	3	-4/+19
\|\ \ \ \ \ \| \|/ / / /
\| * \| \| \|	Merge 10.3 into 10.4	Marko Mäkelä	2020-07-31	1	-2/+0
\| \|\ \ \ \ \| \| \|/ / /
\| \| * \| \|	MDEV-21101 unexpected wait_timeout with pool-of-threads	Vladislav Vaintroub	2020-07-30	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Due to restricted size of the threadpool, execution of client queries can be delayed (queued) for a while. This delay was interpreted as client inactivity, and connection is closed, if client idle time + queue time exceeds wait_timeout. But users did not expect queue time to be included into wait_timeout. This patch changes the behavior. We don't close connection anymore, if there is some unread data present on connection, even if wait_timeout is exceeded. Unread data means that client was not idle, it sent a query, which we did not have time to process yet.
\| * \| \| \|	MDEV-23311 CEILING() and FLOOR() convert temporal input to numbers, unlike ↵	Alexander Barkov	2020-07-28	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ROUND() and TRUNCATE() Fixing functions CEILING and FLOOR to return - TIME for TIME input - DATETIME for DATETIME and TIMESTAMP input
\| * \| \| \|	MDEV-23249: Support aarch64 architecture timer	Tzachi Zidenberg	2020-07-23	1	-2/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	aarch64 timer is available to userspace via arch register. clang's __builtin_readcyclecounter is wrong for aarch64 (reads the PMU cycle counter instead of the archi-timer register), so we don't use it. my_rdtsc unit-test on AWS m6g shows: frequency: 121830845 resolution: 1 overhead: 1 This counter is not strictly increasing, but it is non-decreasing.
\| * \| \| \|	Merge remote-tracking branch 'origin/bb-10.4-MDEV-21910' into 10.4	Julius Goryavsky	2020-07-16	1	-0/+5
\| \|\ \ \ \
\| \| * \| \| \|	MDEV-21910 Deadlock between BF abort and manual KILL command	sjaakola	2020-06-26	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When high priority replication slave applier encounters lock conflict in innodb, it will force the conflicting lock holder transaction (victim) to rollback. This is a must in multi-master sychronous replication model to avoid cluster lock-up. This high priority victim abort (aka "brute force" (BF) abort), is started from innodb lock manager while holding the victim's transaction's (trx) mutex. Depending on the execution state of the victim transaction, it may happen that the BF abort will call for THD::awake() to wake up the victim transaction for the rollback. Now, if BF abort requires THD::awake() to be called, then the applier thread executed locking protocol of: victim trx mutex -> victim THD::LOCK_thd_data If, at the same time another DBMS super user issues KILL command to abort the same victim, it will execute locking protocol of: victim THD::LOCK_thd_data -> victim trx mutex. These two locking protocol acquire mutexes in opposite order, hence unresolvable mutex locking deadlock may occur. The fix in this commit adds THD::wsrep_aborter flag to synchronize who can kill the victim This flag is set both when BF is called for from innodb and by KILL command. Either path of victim killing will bail out if victim's wsrep_killed is already set to avoid mutex conflicts with the other aborter execution. THD::wsrep_aborter records the aborter THD's ID. This is needed to preserve the right to kill the victim from different locations for the same aborter thread. It is also good error logging, to see who is reponsible for the abort. A new test case was added in galera.galera_bf_kill_debug.test for scenario where wsrep applier thread and manual KILL command try to kill same idle victim