path: root/tpool
Commit message | Author | Age | Files | Lines
* MDEV-24313 (2 of 2): Silently ignored innodb_use_native_aio=1 (branch: bb-10.5-MDEV-24313) | Marko Mäkelä | 2020-12-14 | 1 | -1/+1
| In commit 5e62b6a5e06eb02cbde1e34e95e26f42d87fce02 (MDEV-16264) the logic of os_aio_init() was changed so that it will never fail, but instead automatically disable innodb_use_native_aio (which is enabled by default) if the io_setup() system call would fail due to resource limits being exceeded. This is questionable, especially because falling back to simulated AIO may lead to significantly reduced performance.
| srv_n_file_io_threads, srv_n_read_io_threads, srv_n_write_io_threads: Change the data type from ulong to uint.
| os_aio_init(): Remove the parameters, and actually return an error code.
| thread_pool::configure_aio(): Do not silently fall back to simulated AIO.
| Reviewed by: Vladislav Vaintroub
* Simplify clang workarounds. | Vladislav Vaintroub | 2020-12-07 | 1 | -9/+2
|
* MDEV-24295: Fix the non-clang build | Marko Mäkelä | 2020-12-02 | 1 | -0/+3
| Sorry, only tested commit 4174fc1a1bd1f1c29f10264108269bf2e18e2f24 on clang. Other compilers do not define __has_feature().
* MDEV-24295: Fix the WITH_MSAN build | Marko Mäkelä | 2020-12-02 | 1 | -1/+6
| For some reason, commit 5bb5d4ad3a687ac61a9c5f8ffff6dd231f9b581a made clang++-11 unhappy about a constexpr declaration.
* Clarify some comments. | Vladislav Vaintroub | 2020-11-30 | 1 | -5/+25
| - Better explain why my_getevents() is used: so that a blocked io_getevents system call can be interrupted via io_destroy().
| - Fix the comment for MAX_EVENTS in getevent_thread_routine. MAX_EVENTS is a more or less arbitrary constant, chosen such that the events array is big enough to collect multiple simultaneous IO completions, but small enough that it does not blow the thread's stack.
* MDEV-24295 Reduce wakeups by tpool maintenance timer, when server is idle | Vladislav Vaintroub | 2020-11-30 | 1 | -10/+109
| If the maintenance timer does not do much for a prolonged time, it will wake up less frequently: once every 4 seconds instead of once every 0.4 seconds. It will wake up more often if thread creation is throttled, to avoid stalls.
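The period selection described in this commit can be sketched roughly as follows. This is a minimal illustration, not the tpool implementation: the type `maintenance_timer_sketch`, the function `next_period()` and the threshold `IDLE_RUNS_BEFORE_SLOWDOWN` are hypothetical, and only the two periods (0.4 s and 4 s) come from the commit message.

```cpp
#include <cassert>
#include <chrono>

using std::chrono::milliseconds;

// Periods quoted in the commit message.
constexpr milliseconds BUSY_PERIOD{400};
constexpr milliseconds IDLE_PERIOD{4000};
// Assumption: how many consecutive idle runs count as "a prolonged time".
constexpr int IDLE_RUNS_BEFORE_SLOWDOWN = 10;

// Hypothetical state kept between maintenance timer runs.
struct maintenance_timer_sketch
{
  int idle_runs = 0;
  milliseconds period = BUSY_PERIOD;

  // Called at the end of each maintenance run.
  // did_work:  the run found something to do.
  // throttled: thread creation is currently throttled.
  milliseconds next_period(bool did_work, bool throttled)
  {
    assert(idle_runs >= 0);
    if (did_work || throttled)
    {
      // Go back to frequent wakeups, to avoid stalls.
      idle_runs = 0;
      period = BUSY_PERIOD;
    }
    else if (++idle_runs >= IDLE_RUNS_BEFORE_SLOWDOWN)
      period = IDLE_PERIOD;
    return period;
  }
};
```

With these assumed values, an idle server switches to the 4-second period after ten quiet runs, and a single throttled or busy run switches it back to 0.4 seconds.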
* Avoid some DBUG prints from idle server in thread pool | Monty | 2020-11-26 | 1 | -0/+2
|
* MDEV-24270: Clarify some comments | Marko Mäkelä | 2020-11-25 | 1 | -9/+20
|
* Fix misspelling. | Vladislav Vaintroub | 2020-11-25 | 1 | -11/+11
| Kudos to Marko for finding.
* Cleanup. Provide accurate comment on my_getevents(). | Vladislav Vaintroub | 2020-11-25 | 1 | -2/+10
|
* Partially Revert "MDEV-24270: Collect multiple completed events at a time" | Vladislav Vaintroub | 2020-11-25 | 2 | -4/+4
| This partially reverts commit 6479006e14691ff85072d06682f81b90875e9cb0.
| Remove the constant tpool::aio::N_PENDING, which has no intrinsic meaning for the tpool.
* MDEV-24270: Collect multiple completed events at a time | Marko Mäkelä | 2020-11-25 | 2 | -6/+7
| tpool::aio::N_PENDING: Replaces OS_AIO_N_PENDING_IOS_PER_THREAD. This limits two similar things: the number of outstanding requests that a thread may io_submit(), and the number of completed requests collected at a time by io_getevents().
* MDEV-24270 Misuse of io_getevents() causes wake-ups at least twice per second | Marko Mäkelä | 2020-11-25 | 1 | -82/+79
| In the asynchronous I/O interface, InnoDB is invoking io_getevents() with a timeout value of half a second, and requesting exactly 1 event at a time. The reason to have such a short timeout is to facilitate shutdown.
| We can do better: Use an infinite timeout, wait for a larger maximum number of events. On shutdown, we will invoke io_destroy(), which should lead to the io_getevents system call reporting EINVAL.
| my_getevents(): Reimplement the libaio io_getevents() by only invoking the system call. The library implementation would try to elide the system call and return 0 immediately if aio_ring_is_empty() holds. Here, we do want a blocking system call, not 100% CPU usage. Neither do we want aio_ring_is_empty() to trigger SIGSEGV because it is dereferencing some memory that was freed by io_destroy().
* MDEV-16264 fixup: Clean up asynchronous I/O | Marko Mäkelä | 2020-10-26 | 1 | -1/+1
| os_aio_userdata_t: Remove. It was basically duplicating IORequest.
| buf_page_write_complete(): Take only IORequest as a parameter.
| os_aio_func(), pfs_os_aio_func(): Replaced with os_aio() that has no redundant parameters. There is only one caller, so there is no point to pass __FILE__, __LINE__ as a parameter.
* MDEV-15053 Reduce buf_pool_t::mutex contention | Marko Mäkelä | 2020-06-05 | 1 | -1/+1
| User-visible changes: The INFORMATION_SCHEMA views INNODB_BUFFER_PAGE and INNODB_BUFFER_PAGE_LRU will report a dummy value FLUSH_TYPE=0 and will no longer report the PAGE_STATE value READY_FOR_USE.
| We will remove some fields from buf_page_t and move much code to member functions of buf_pool_t and buf_page_t, so that the access rules of data members can be enforced consistently.
| Evicting or adding pages in buf_pool.LRU will remain covered by buf_pool.mutex.
| Evicting or adding pages in buf_pool.page_hash will remain covered by both buf_pool.mutex and the buf_pool.page_hash X-latch.
| After this fix, buf_pool.page_hash lookups can entirely avoid acquiring buf_pool.mutex, only relying on buf_pool.hash_lock_get() S-latch.
| Similarly, buf_flush_check_neighbors() will rely solely on buf_pool.mutex, no buf_pool.page_hash latch at all.
| The buf_pool.mutex is rather contended in I/O heavy benchmarks, especially when the workload does not fit in the buffer pool.
| The first attempt to alleviate the contention was the buf_pool_t::mutex split in commit 4ed7082eefe56b3e97e0edefb3df76dd7ef5e858 which introduced buf_block_t::mutex, which we are now removing.
| Later, multiple instances of buf_pool_t were introduced in commit c18084f71b02ea707c6461353e6cfc15d7553bc6 and recently removed by us in commit 1a6f708ec594ac0ae2dd30db926ab07b100fa24b (MDEV-15058).
| UNIV_BUF_DEBUG: Remove. This option to enable some buffer pool related debugging in otherwise non-debug builds has not been used for years. Instead, we have been using UNIV_DEBUG, which is enabled in CMAKE_BUILD_TYPE=Debug.
| buf_block_t::mutex, buf_pool_t::zip_mutex: Remove. We can mainly rely on std::atomic and the buf_pool.page_hash latches, and in some cases depend on buf_pool.mutex or buf_pool.flush_list_mutex just like before. We must always release buf_block_t::lock before invoking unfix() or io_unfix(), to prevent a glitch where a block that was added to the buf_pool.free list would appear X-latched. See commit c5883debd6ef440a037011c11873b396923e93c5 for how this glitch was finally caught in a debug environment.
| We move some buf_pool_t::page_hash specific code from the ha and hash modules to buf_pool, for improved readability.
| buf_pool_t::close(): Assert that all blocks are clean, except on aborted startup or crash-like shutdown.
| buf_pool_t::validate(): No longer attempt to validate n_flush[] against the number of BUF_IO_WRITE fixed blocks, because buf_page_t::flush_type no longer exists.
| buf_pool_t::watch_set(): Replaces buf_pool_watch_set(). Reduce mutex contention by separating the buf_pool.watch[] allocation and the insert into buf_pool.page_hash.
| buf_pool_t::page_hash_lock<bool exclusive>(): Acquire a buf_pool.page_hash latch. Replaces and extends buf_page_hash_lock_s_confirm() and buf_page_hash_lock_x_confirm().
| buf_pool_t::READ_AHEAD_PAGES: Renamed from BUF_READ_AHEAD_PAGES.
| buf_pool_t::curr_size, old_size, read_ahead_area, n_pend_reads: Use Atomic_counter.
| buf_pool_t::running_out(): Replaces buf_LRU_buf_pool_running_out().
| buf_pool_t::LRU_remove(): Remove a block from the LRU list and return its predecessor. Incorporates buf_LRU_adjust_hp(), which was removed.
| buf_page_get_gen(): Remove a redundant call of fsp_is_system_temporary(), for mode == BUF_GET_IF_IN_POOL_OR_WATCH, which is only used by BTR_DELETE_OP (purge), which is never invoked on temporary tables.
| buf_free_from_unzip_LRU_list_batch(): Avoid redundant assignments.
| buf_LRU_free_from_unzip_LRU_list(): Simplify the loop condition.
| buf_LRU_free_page(): Clarify the function comment.
| buf_flush_check_neighbor(), buf_flush_check_neighbors(): Rewrite the construction of the page hash range. We will hold the buf_pool.mutex for up to buf_pool.read_ahead_area (at most 64) consecutive lookups of buf_pool.page_hash.
| buf_flush_page_and_try_neighbors(): Remove. Merge to its only callers, and remove redundant operations in buf_flush_LRU_list_batch().
| buf_read_ahead_random(), buf_read_ahead_linear(): Rewrite. Do not acquire buf_pool.mutex, and iterate directly with page_id_t.
| ut_2_power_up(): Remove. my_round_up_to_next_power() is inlined and avoids any loops.
| fil_page_get_prev(), fil_page_get_next(), fil_addr_is_null(): Remove.
| buf_flush_page(): Add a fil_space_t* parameter. Minimize the buf_pool.mutex hold time. buf_pool.n_flush[] is no longer updated atomically with the io_fix, and we will protect most buf_block_t fields with buf_block_t::lock. The function buf_flush_write_block_low() is removed and merged here.
| buf_page_init_for_read(): Use static linkage. Initialize the newly allocated block and acquire the exclusive buf_block_t::lock while not holding any mutex.
| IORequest::IORequest(): Remove the body. We only need to invoke set_punch_hole() in buf_flush_page() and nowhere else.
| buf_page_t::flush_type: Remove. Replaced by IORequest::flush_type. This field is only used during a fil_io() call. That function already takes IORequest as a parameter, so that is a better place for the rarely changing field.
| buf_block_t::init(): Replaces buf_page_init().
| buf_page_t::init(): Replaces buf_page_init_low().
| buf_block_t::initialise(): Initialise many fields, but keep the buf_page_t::state(). Both buf_pool_t::validate() and buf_page_optimistic_get() require that buf_page_t::in_file() be protected atomically with buf_page_t::in_page_hash and buf_page_t::in_LRU_list.
| buf_page_optimistic_get(): Now that buf_block_t::mutex no longer exists, we must check buf_page_t::io_fix() after acquiring the buf_pool.page_hash lock, to detect whether buf_page_init_for_read() has been initiated. We will also check the io_fix() before acquiring hash_lock in order to avoid unnecessary computation. The field buf_block_t::modify_clock (protected by buf_block_t::lock) allows buf_page_optimistic_get() to validate the block.
| buf_page_t::real_size: Remove. It was only used while flushing pages of page_compressed tables.
| buf_page_encrypt(): Add an output parameter that allows us to eliminate buf_page_t::real_size. Replace a condition with debug assertion.
| buf_page_should_punch_hole(): Remove.
| buf_dblwr_t::add_to_batch(): Replaces buf_dblwr_add_to_batch(). Add the parameter size (to replace buf_page_t::real_size).
| buf_dblwr_t::write_single_page(): Replaces buf_dblwr_write_single_page(). Add the parameter size (to replace buf_page_t::real_size).
| fil_system_t::detach(): Replaces fil_space_detach(). Ensure that fil_validate() will not be violated even if fil_system.mutex is released and reacquired.
| fil_node_t::complete_io(): Renamed from fil_node_complete_io().
| fil_node_t::close_to_free(): Replaces fil_node_close_to_free(). Avoid invoking fil_node_t::close() because fil_system.n_open has already been decremented in fil_space_t::detach().
| BUF_BLOCK_READY_FOR_USE: Remove. Directly use BUF_BLOCK_MEMORY.
| BUF_BLOCK_ZIP_DIRTY: Remove. Directly use BUF_BLOCK_ZIP_PAGE, and distinguish dirty pages by buf_page_t::oldest_modification().
| BUF_BLOCK_POOL_WATCH: Remove. Use BUF_BLOCK_NOT_USED instead. This state was only being used for buf_page_t that are in buf_pool.watch.
| buf_pool_t::watch[]: Remove pointer indirection.
| buf_page_t::in_flush_list: Remove. It was set if and only if buf_page_t::oldest_modification() is nonzero.
| buf_page_decrypt_after_read(), buf_corrupt_page_release(), buf_page_check_corrupt(): Change the const fil_space_t* parameter to const fil_node_t& so that we can report the correct file name.
| buf_page_monitor(): Declare as an ATTRIBUTE_COLD global function.
| buf_page_io_complete(): Split to buf_page_read_complete() and buf_page_write_complete().
| buf_dblwr_t::in_use: Remove.
| buf_dblwr_t::buf_block_array: Add IORequest::flush_t.
| buf_dblwr_sync_datafiles(): Remove. It was a useless wrapper of os_aio_wait_until_no_pending_writes().
| buf_flush_write_complete(): Declare static, not global. Add the parameter IORequest::flush_t.
| buf_flush_freed_page(): Simplify the code.
| recv_sys_t::flush_lru: Renamed from flush_type and changed to bool.
| fil_read(), fil_write(): Replaced with direct use of fil_io().
| fil_buffering_disabled(): Remove. Check srv_file_flush_method directly.
| fil_mutex_enter_and_prepare_for_io(): Return the resolved fil_space_t* to avoid a duplicated lookup in the caller.
| fil_report_invalid_page_access(): Clean up the parameters.
| fil_io(): Return fil_io_t, which comprises fil_node_t and error code. Always invoke fil_space_t::acquire_for_io() and let either the sync=true caller or fil_aio_callback() invoke fil_space_t::release_for_io().
| fil_aio_callback(): Rewrite to replace buf_page_io_complete().
| fil_check_pending_operations(): Remove a parameter, and remove some redundant lookups.
| fil_node_close_to_free(): Wait for n_pending==0. Because we no longer do an extra lookup of the tablespace between fil_io() and the completion of the operation, we must give fil_node_t::complete_io() a chance to decrement the counter.
| fil_close_tablespace(): Remove unused parameter trx, and document that this is only invoked during the error handling of IMPORT TABLESPACE.
| row_import_discard_changes(): Merged with the only caller, row_import_cleanup(). Do not lock up the data dictionary while invoking fil_close_tablespace().
| logs_empty_and_mark_files_at_shutdown(): Do not invoke fil_close_all_files(), to avoid a !needs_flush assertion failure on fil_node_t::close().
| innodb_shutdown(): Invoke os_aio_free() before fil_close_all_files().
| fil_close_all_files(): Invoke fil_flush_file_spaces() to ensure proper durability.
| thread_pool::unbind(): Fix a crash that would occur on Windows after srv_thread_pool->disable_aio() and os_file_close(). This fix was submitted by Vladislav Vaintroub.
| Thanks to Matthias Leich and Axel Schwenke for extensive testing, Vladislav Vaintroub for helpful comments, and Eugene Kosov for a review.
* MDEV-16264: Eliminate unsafe os_aio_userdata_t type cast | Marko Mäkelä | 2020-03-12 | 1 | -1/+1
|
* Fix compilation error due to type mismatch in tpool_generic.cc | Vicențiu Ciorbaru | 2020-02-13 | 1 | -1/+1
| size_t compared to int
* MDEV-21674 purge_sys.stop() fails to wait for purge workers to complete | Marko Mäkelä | 2020-02-07 | 2 | -6/+34
| Since commit 5e62b6a5e06eb02cbde1e34e95e26f42d87fce02 (MDEV-16264), purge_sys_t::stop() no longer waited for all purge activity to stop.
| This caused problems on FLUSH TABLES...FOR EXPORT because of purge running concurrently with the buffer pool flush. The assertion at the end of buf_flush_dirty_pages() could fail.
| The fix, implemented by Vladislav Vaintroub, aims to eliminate race conditions when stopping or resuming purge:
| waitable_task::disable(): Wait for the task to complete, then replace the task callback function with noop.
| waitable_task::enable(): Restore the original task callback function after disable().
| purge_sys_t::stop(): Invoke purge_coordinator_task.disable().
| purge_sys_t::resume(): Invoke purge_coordinator_task.enable().
| purge_sys_t::running(): Add const qualifier, and clarify the comment. The purge coordinator task will remain active as long as any purge worker task is active.
| purge_worker_callback(): Assert purge_sys.running().
| srv_purge_wakeup(): Merge with the only caller purge_sys_t::resume().
| purge_coordinator_task: Use static linkage.
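The disable()/enable() behaviour described above can be illustrated with a small sketch. This is not the tpool class: `waitable_task_sketch` and its members are hypothetical names, and the use of a mutex plus condition variable to wait for in-flight executions is an assumption made for the illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical, simplified stand-in for a task whose callback can be
// temporarily replaced by a no-op.
class waitable_task_sketch
{
  using callback_t = void (*)(void *);

  std::mutex m_mtx;
  std::condition_variable m_cv;
  callback_t m_func;
  callback_t m_original_func = nullptr;
  void *m_arg;
  int m_running = 0; // executions currently in progress

  static void noop(void *) {}

public:
  waitable_task_sketch(callback_t func, void *arg) : m_func(func), m_arg(arg)
  {
    assert(func != nullptr);
  }

  // Run whichever callback is current: the real one, or noop while disabled.
  void execute()
  {
    callback_t f;
    {
      std::lock_guard<std::mutex> lk(m_mtx);
      f = m_func;
      m_running++;
    }
    f(m_arg);
    std::lock_guard<std::mutex> lk(m_mtx);
    m_running--;
    m_cv.notify_all();
  }

  // Wait for in-flight executions to finish, then swap in the noop callback.
  void disable()
  {
    std::unique_lock<std::mutex> lk(m_mtx);
    m_cv.wait(lk, [this] { return m_running == 0; });
    if (m_func != noop)
    {
      m_original_func = m_func;
      m_func = noop;
    }
  }

  // Restore the callback that disable() saved.
  void enable()
  {
    std::lock_guard<std::mutex> lk(m_mtx);
    if (m_func == noop)
      m_func = m_original_func;
  }
};
```

In this sketch, an execute() that arrives while the task is disabled still runs, but it only runs the no-op, which is how a stop() built on disable() could guarantee that no real work is in progress once it returns.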
* MDEV-21551 : Assertion `m_active_threads.size() >= m_long_tasks_count + m_waiting_task_count' failed | Vladislav Vaintroub | 2020-01-23 | 1 | -2/+3
| Happened when running innodb_fts.sync_ddl
| m_long_tasks_count could be wrongly reset to 0 if m_task_queue is empty.
* MDEV-21551 Fix race condition in thread_pool_generic::wait_begin() | Vladislav Vaintroub | 2020-01-22 | 1 | -2/+14
| While thread_pool_generic::wait_begin() is waiting for the mutex, the current task can be marked long-running. This is done by the periodic maintenance task, which runs in parallel.
| The fix is to recheck is_long_task() after the mutex acquisition.
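The "check, lock, recheck" pattern this commit describes might look like the sketch below. The types `worker_sketch` and `pool_sketch` and all their members are hypothetical stand-ins; in particular, skipping already-waiting tasks in `mark_long_running()` is an assumption made so that each task is counted at most once.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical per-worker state.
struct worker_sketch
{
  std::atomic<bool> long_task{false}; // set by the maintenance task
  bool waiting = false;               // protected by pool_sketch::mtx
};

struct pool_sketch
{
  std::mutex mtx;
  unsigned long_tasks_count = 0;
  unsigned waiting_task_count = 0;

  // What a periodic maintenance task might do, possibly in parallel.
  void mark_long_running(worker_sketch &w)
  {
    std::lock_guard<std::mutex> lk(mtx);
    if (w.waiting) // assumption: a waiting task is not also marked long
      return;
    if (!w.long_task.exchange(true))
      long_tasks_count++;
  }

  void wait_begin(worker_sketch &w)
  {
    if (w.long_task) // cheap check before taking the mutex
      return;
    std::lock_guard<std::mutex> lk(mtx);
    if (w.long_task) // recheck: the flag may have been set while blocked on mtx
      return;
    assert(!w.waiting);
    w.waiting = true;
    waiting_task_count++;
  }

  void wait_end(worker_sketch &w)
  {
    std::lock_guard<std::mutex> lk(mtx);
    if (w.waiting)
    {
      w.waiting = false;
      waiting_task_count--;
    }
  }
};
```

Without the second check, a task marked long-running between the first check and the lock acquisition would be counted both as long-running and as waiting.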
* MDEV-21551: Fix -Wsign-compare | Marko Mäkelä | 2020-01-22 | 1 | -3/+3
| An assertion added in commit c20bf8fd494edd4e4931557395b8a2bdf6cc48ab includes a sign mismatch. Make the affected data members unsigned.
* MDEV-21551 Fix calculation of current concurrency level in maybe_wake_or_create_thread() | Vladislav Vaintroub | 2020-01-22 | 1 | -0/+2
| A task that is being executed could be counted as waiting (after wait_begin() and before wait_end()) or as long-running (the callback runs for a long time).
| If a task is marked both waiting and long-running, then the calculation of current concurrency (# of executing tasks - # of long tasks - # of waiting tasks) is wrong, as the task is counted twice. Thus current concurrency could go negative, but with unsigned arithmetic it will become a huge number.
| As a result, maybe_wake_or_create_thread() would neither wake nor create a thread when it should, which may result in a deadlock.
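The arithmetic problem is easy to reproduce in isolation. The sketch below uses hypothetical names (`task_flags_sketch`, `should_add_thread()` and the target of 4 are all illustrative) and only demonstrates the double counting and the unsigned wraparound; it does not reproduce the actual two-line fix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flags of one executing task.
struct task_flags_sketch
{
  bool waiting;
  bool long_running;
};

// Subtracts both counters, so a task that is waiting AND long-running
// is subtracted twice. The unsigned result can wrap around.
std::size_t concurrency_double_counted(const std::vector<task_flags_sketch> &executing)
{
  std::size_t long_tasks = 0, waiting = 0;
  for (const task_flags_sketch &t : executing)
  {
    long_tasks += t.long_running;
    waiting += t.waiting;
  }
  return executing.size() - long_tasks - waiting;
}

// Excludes each task at most once, so the result cannot go below zero.
std::size_t concurrency_counted_once(const std::vector<task_flags_sketch> &executing)
{
  std::size_t excluded = 0;
  for (const task_flags_sketch &t : executing)
    excluded += (t.waiting || t.long_running);
  assert(excluded <= executing.size());
  return executing.size() - excluded;
}

// Simplified decision: add a thread only while below the target concurrency.
bool should_add_thread(std::size_t current, std::size_t target)
{
  return current < target;
}
```

For a single executing task that is both waiting and long-running, the double-counted value is 1 - 1 - 1, which wraps to the largest `std::size_t`, so the pool appears to be far above any target and no thread is woken or created.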
* tpool - misc fixes | Vladislav Vaintroub | 2020-01-12 | 2 | -6/+7
|
* MDEV-21326 : Address TSAN warnings in tpool. | Vladislav Vaintroub | 2020-01-12 | 3 | -7/+27
| 1. Fix places where data race warnings were relevant. tls_worker_data::m_state should be modified under mutex protection, since both the maintenance timer and the current worker set this flag.
| 2. Suppress warnings that are legitimate, yet harmless. These are, apparently, the dirty reads in waitable_task::get_ref_count() or write_slots->pending_io_count(). Avoiding the race entirely without side effects here is tricky, and the effect of the race is harmless. The worst thing that can happen due to the race is an extra wait notification, under rare circumstances.
* tpool - implement post-task callback (for InnoDB debugging) | Vladislav Vaintroub | 2020-01-12 | 5 | -1/+32
|
* MDEV-16264 - some improvements | Vladislav Vaintroub | 2019-12-09 | 5 | -34/+97
| - Wait notification, tpool_wait_begin/tpool_wait_end, to notify the thread pool that the current thread is going to wait.
| Use it to wait for IOs to complete and also when purge waits for workers.
* MDEV-16264: Minor cleanup | Marko Mäkelä | 2019-12-03 | 3 | -11/+12
| aio_linux::m_max_io_count: Unused data member; remove.
| aiocb::m_ret_len: Declare as the more compatible type size_t. Unfortunately, ssize_t is not available on Microsoft Visual Studio.
* MDEV-16264 - Fix assertion `m_queue.empty() && !m_tasks_running' in tpool::task_group destructor | Vladislav Vaintroub | 2019-11-25 | 1 | -1/+16
| This particular assertion happened when shutting down InnoDB IO. IO shutdown properly waits for all IOs to finish. However, there is a race condition: right after releasing the last IO slot and before decrementing the task count in the group, pending_io_count will be 0, but tasks_running will be 1, leading to the assertion.
| The fix is to make the task_group destructor wait for the last running task to finish.
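A destructor that waits for the last running task could be sketched as below. The commit message does not say which waiting mechanism is used, so the condition variable here is an assumption, and `task_group_sketch` with its members is a hypothetical, much reduced stand-in for tpool::task_group (it has no queue and runs tasks on the calling thread).

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical, reduced task group: it only tracks how many tasks are running.
class task_group_sketch
{
  std::mutex m_mtx;
  std::condition_variable m_cv;
  unsigned m_tasks_running = 0;

public:
  // Execute a task on the calling thread while accounting for it.
  template <typename F> void run(F &&task)
  {
    {
      std::lock_guard<std::mutex> lk(m_mtx);
      m_tasks_running++;
    }
    task();
    std::lock_guard<std::mutex> lk(m_mtx);
    assert(m_tasks_running > 0);
    m_tasks_running--;
    // Notify while still holding the mutex, so that the destructor cannot
    // complete before this thread has finished touching the members.
    m_cv.notify_all();
  }

  unsigned tasks_running()
  {
    std::lock_guard<std::mutex> lk(m_mtx);
    return m_tasks_running;
  }

  // Instead of asserting that nothing is running, wait until that is true.
  ~task_group_sketch()
  {
    std::unique_lock<std::mutex> lk(m_mtx);
    m_cv.wait(lk, [this] { return m_tasks_running == 0; });
  }
};
```

If another thread is still inside run() when the group goes out of scope, the destructor in this sketch blocks until that task has decremented the counter, which closes the window between "last IO slot released" and "task count decremented" described in the commit message.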
* Fix compile error on centos6. it does not like std::this_thread::sleep() | Vladislav Vaintroub | 2019-11-15 | 1 | -9/+3
| Simplify task_group destructor. No tasks must be running or queued into the task group when it is being destroyed.
* MDEV-16264: Fix some white space | Marko Mäkelä | 2019-11-15 | 5 | -48/+31
|
* MDEV-16264: Add threadpool library | Vladislav Vaintroub | 2019-11-15 | 10 | -0/+2317
| The library is capable of
| - asynchronous execution of tasks (and optionally waiting for them)
| - asynchronous file IO. This is implemented using libaio on Linux and completion ports on Windows. Elsewhere, async IO is "simulated", which means worker threads are performing synchronous IO.
| - timers, scheduling work asynchronously at some point in the future. Periodic timers are also implemented.