summaryrefslogtreecommitdiff
path: root/mysql-test/suite/innodb_zip
Commit message (Collapse)AuthorAgeFilesLines
* Merge 10.7 into 10.8Marko Mäkelä2022-11-091-1/+0
|\
| * Merge 10.5 into 10.6Marko Mäkelä2022-11-091-1/+0
| |\
| | * MDEV-22512: Re-enable the tests on 10.5Marko Mäkelä2022-11-081-1/+0
| | |
| * | Merge 10.5 into 10.6Marko Mäkelä2022-11-081-1/+1
| |\ \ | | |/
| | * Merge 10.4 into 10.5Marko Mäkelä2022-11-081-1/+1
| | |\
| | | * Merge 10.3 into 10.4Marko Mäkelä2022-11-081-1/+1
| | | |\
| | | | * MDEV-22512: Disable frequently failing testsMarko Mäkelä2022-11-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | InnoDB crash recovery can run out of memory before commit 50324ce62448f284522ee1e3be24d8f5ba3bf904 in MariaDB Server 10.5. Let us disable some frequently failing recovery tests in earlier versions.
* | | | | Merge 10.7 into 10.8Marko Mäkelä2022-09-219-53/+62
|\ \ \ \ \ | |/ / / /
| * | | | Merge 10.5 into 10.6Marko Mäkelä2022-09-209-53/+62
| |\ \ \ \ | | |/ / /
| | * | | Merge remote-tracking branch 'origin/10.4' into 10.5Alexander Barkov2022-09-149-53/+53
| | |\ \ \ | | | |/ /
| | | * | Merge 10.3 into 10.4Marko Mäkelä2022-09-139-53/+53
| | | |\ \ | | | | |/
| | | | * MDEV-29446 Change SHOW CREATE TABLE to display default collationAlexander Barkov2022-09-129-53/+53
| | | | |
* | | | | Merge 10.7 into 10.8Marko Mäkelä2022-08-241-10/+10
|\ \ \ \ \ | |/ / / /
| * | | | Merge 10.5 into 10.6Marko Mäkelä2022-08-221-10/+10
| |\ \ \ \ | | |/ / /
| | * | | Merge 10.4 into 10.5Marko Mäkelä2022-08-221-10/+10
| | |\ \ \ | | | |/ /
| | | * | Merge 10.3 into 10.4Marko Mäkelä2022-08-221-10/+10
| | | |\ \ | | | | |/
| | | | * MDEV-13013 fixup: Adjust a testMarko Mäkelä2022-08-221-10/+10
| | | | |
* | | | | Merge 10.7 into 10.8Marko Mäkelä2022-07-282-0/+28
|\ \ \ \ \ | |/ / / /
| * | | | MDEV-28950: Add a test caseMarko Mäkelä2022-07-272-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | After reverting commit commit 39f45f6f89ce2fc2db54bb8ab0f6076f923beeec all combinations of this test would crash the server.
* | | | | Merge 10.7 into 10.8Marko Mäkelä2022-06-062-8/+20
|\ \ \ \ \ | |/ / / /
| * | | | MDEV-13542: Crashing on corrupted page is unhelpfulMarko Mäkelä2022-06-062-8/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The approach to handling corruption that was chosen by Oracle in commit 177d8b0c125b841c0650d27d735e3b87509dc286 is not really useful. Not only did it actually fail to prevent InnoDB from crashing, but it is making things worse by blocking attempts to rescue data from or rebuild a partially readable table. We will try to prevent crashes in a different way: by propagating errors up the call stack. We will never mark the clustered index persistently corrupted, so that data recovery may be attempted by reading from the table, or by rebuilding the table. This should also fix MDEV-13680 (crash on btr_page_alloc() failure); it was extensively tested with innodb_file_per_table=0 and a non-autoextend system tablespace. We should now avoid crashes in many cases, such as when a page cannot be read or allocated, or an inconsistency is detected when attempting to update multiple pages. We will not crash on double-free, such as on the recovery of DDL in system tablespace in case something was corrupted. Crashes on corrupted data are still possible. The fault injection mechanism that is introduced in the subsequent commit may help catch more of them. buf_page_import_corrupt_failure: Remove the fault injection, and instead corrupt some pages using Perl code in the tests. btr_cur_pessimistic_insert(): Always reserve extents (except for the change buffer), in order to prevent a subsequent allocation failure. btr_pcur_open_at_rnd_pos(): Merged to the only caller ibuf_merge_pages(). btr_assert_not_corrupted(), btr_corruption_report(): Remove. Similar checks are already part of btr_block_get(). FSEG_MAGIC_N_BYTES: Replaces FSEG_MAGIC_N_VALUE. dict_hdr_get(), trx_rsegf_get_new(), trx_undo_page_get(), trx_undo_page_get_s_latched(): Replaced with error-checking calls. trx_rseg_t::get(mtr_t*): Replaces trx_rsegf_get(). trx_rseg_header_create(): Let the caller update the TRX_SYS page if needed. trx_sys_create_sys_pages(): Merged with trx_sysf_create(). dict_check_tablespaces_and_store_max_id(): Do not access DICT_HDR_MAX_SPACE_ID, because it was already recovered in dict_boot(). Merge dict_check_sys_tables() with this function. dir_pathname(): Replaces os_file_make_new_pathname(). row_undo_ins_remove_sec(): Do not modify the undo page by adding a terminating NUL byte to the record. btr_decryption_failed(): Report decryption failures dict_set_corrupted_by_space(), dict_set_encrypted_by_space(), dict_set_corrupted_index_cache_only(): Remove. dict_set_corrupted(): Remove the constant parameter dict_locked=false. Never flag the clustered index corrupted in SYS_INDEXES, because that would deny further access to the table. It might be possible to repair the table by executing ALTER TABLE or OPTIMIZE TABLE, in case no B-tree leaf page is corrupted. dict_table_skip_corrupt_index(), dict_table_next_uncorrupted_index(), row_purge_skip_uncommitted_virtual_index(): Remove, and refactor the callers to read dict_index_t::type only once. dict_table_is_corrupted(): Remove. dict_index_t::is_btree(): Determine if the index is a valid B-tree. BUF_GET_NO_LATCH, BUF_EVICT_IF_IN_POOL: Remove. UNIV_BTR_DEBUG: Remove. Any inconsistency will no longer trigger assertion failures, but error codes being returned. buf_corrupt_page_release(): Replaced with a direct call to buf_pool.corrupted_evict(). fil_invalid_page_access_msg(): Never crash on an invalid read; let the caller of buf_page_get_gen() decide. btr_pcur_t::restore_position(): Propagate failure status to the caller by returning CORRUPTED. opt_search_plan_for_table(): Simplify the code. row_purge_del_mark(), row_purge_upd_exist_or_extern_func(), row_undo_ins_remove_sec_rec(), row_undo_mod_upd_del_sec(), row_undo_mod_del_mark_sec(): Avoid mem_heap_create()/mem_heap_free() when no secondary indexes exist. row_undo_mod_upd_exist_sec(): Simplify the code. row_upd_clust_step(), dict_load_table_one(): Return DB_TABLE_CORRUPT if the clustered index (and therefore the table) is corrupted, similar to what we do in row_insert_for_mysql(). fut_get_ptr(): Replace with buf_page_get_gen() calls. buf_page_get_gen(): Return nullptr and *err=DB_CORRUPTION if the page is marked as freed. For other modes than BUF_GET_POSSIBLY_FREED or BUF_PEEK_IF_IN_POOL this will trigger a debug assertion failure. For BUF_GET_POSSIBLY_FREED, we will return nullptr for freed pages, so that the callers can be simplified. The purge of transaction history will be a new user of BUF_GET_POSSIBLY_FREED, to avoid crashes on corrupted data. buf_page_get_low(): Never crash on a corrupted page, but simply return nullptr. fseg_page_is_allocated(): Replaces fseg_page_is_free(). fts_drop_common_tables(): Return an error if the transaction was rolled back. fil_space_t::set_corrupted(): Report a tablespace as corrupted if it was not reported already. fil_space_t::io(): Invoke fil_space_t::set_corrupted() to report out-of-bounds page access or other errors. Clean up mtr_t::page_lock() buf_page_get_low(): Validate the page identifier (to check for recently read corrupted pages) after acquiring the page latch. buf_page_t::read_complete(): Flag uninitialized (all-zero) pages with DB_FAIL. Return DB_PAGE_CORRUPTED on page number mismatch. mtr_t::defer_drop_ahi(): Renamed from mtr_defer_drop_ahi(). recv_sys_t::free_corrupted_page(): Only set_corrupt_fs() if any log records exist for the page. We do not mind if read-ahead produces corrupted (or all-zero) pages that were not actually needed during recovery. recv_recover_page(): Return whether the operation succeeded. recv_sys_t::recover_low(): Simplify the logic. Check for recovery error. Thanks to Matthias Leich for testing this extensively and to the authors of https://rr-project.org for making it easy to diagnose and fix any failures that were found during the testing.
* | | | | Merge 10.7 into 10.8Marko Mäkelä2022-03-304-2/+13
|\ \ \ \ \ | |/ / / /
| * | | | MDEV-28181 The innochecksum -w option was inadvertently removedMarko Mäkelä2022-03-284-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 7a4fbb55b02b449a135fe935f624422eaacfdd7c (MDEV-25105) the innochecksum option --write (-w) was removed altogether. It should have been made a Boolean option, so that old data files may be converted to a format that is compatible with innodb_checksum_algorithm=strict_crc32 by executing the following: innochecksum -n -w ibdata* */*.ibd It would be better to use an older-version innochecksum for such a conversion, so that page checksums will be validated before updating the checksum. It never was possible for innochecksum to convert files to the innodb_checksum_algorithm=full_crc32 format that is the default for new InnoDB data files.
* | | | | Merge 10.7 into 10.8Marko Mäkelä2022-02-1717-638/+362
|\ \ \ \ \ | |/ / / /
| * | | | Merge 10.5 into 10.6Marko Mäkelä2022-02-1717-638/+362
| |\ \ \ \ | | |/ / /
| | * | | Merge 10.4 into 10.5Marko Mäkelä2022-02-1717-638/+362
| | |\ \ \ | | | |/ /
| | | * | Merge 10.3 into 10.4Marko Mäkelä2022-02-1717-638/+362
| | | |\ \ | | | | |/
| | | | * Merge 10.2 into 10.3Marko Mäkelä2022-02-1717-638/+362
| | | | |\
| | | | | * MDEV-27634 innodb_zip tests failing on s390xMarko Mäkelä2022-02-1617-638/+362
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some GNU/Linux distributions ship a zlib that is modified to use the s390x DFLTCC instruction. That modification would essentially redefine compressBound(sourceLen) as (sourceLen * 16 + 2308) / 8 + 6. Let us relax the tests for InnoDB ROW_FORMAT=COMPRESSED to cope with such a weaker compression guarantee. create_table_info_t::row_size_is_acceptable(): Remove a bogus debug-only assertion that would fail to hold for the test innodb_zip.bug36169. The function page_zip_empty_size() may indeed return 0.
* | | | | | Merge branch '10.7' into 10.8Oleksandr Byelkin2022-02-042-1/+2
|\ \ \ \ \ \ | |/ / / / /
| * | | | | Merge branch '10.5' into 10.6Oleksandr Byelkin2022-02-032-1/+2
| |\ \ \ \ \ | | |/ / / /
| | * | | | Merge branch '10.4' into 10.5Oleksandr Byelkin2022-02-012-1/+2
| | |\ \ \ \ | | | |/ / /
| | | * | | Merge branch '10.3' into 10.4Oleksandr Byelkin2022-01-302-1/+2
| | | |\ \ \ | | | | |/ /
| | | | * | Merge branch '10.2' into 10.3mariadb-10.3.33Oleksandr Byelkin2022-01-292-1/+2
| | | | |\ \ | | | | | |/
| | | | | * MDEV-27273 Confusing column count in IMPORT TABLESPACE error messagebb-10.2-MDEV-27273-import-column-checkEugene Kosov2022-01-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's misleading to compare and write to user number of columns and fields. Thus, it would be better to remove that check and let use see a subsequent error message about missing or mispaced column. row_import::match_schema(): remove misleading check
| | | | | * MDEV-26870 --skip-symbolic-links does not disallow .isl file creationMarko Mäkelä2022-01-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The InnoDB DATA DIRECTORY attribute is not implemented via symbolic links but something similar, *.isl files that contain the names of data files. InnoDB failed to ignore the DATA DIRECTORY attribute even though the server was started with --skip-symbolic-links. Native ALTER TABLE in InnoDB will retain the DATA DIRECTORY attribute of the table, no matter if the table will be rebuilt or not. Generic ALTER TABLE (with ALGORITHM=COPY) as well as TRUNCATE TABLE will discard the DATA DIRECTORY attribute. All tests have been run with and without the ./mtr option --mysqld=--skip-symbolic-links and some tests that use the InnoDB DATA DIRECTORY attribute have been adjusted for this.
* | | | | | MDEV-14425 Improve the redo log for concurrencyMarko Mäkelä2022-01-212-2/+2
|/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The InnoDB redo log used to be formatted in blocks of 512 bytes. The log blocks were encrypted and the checksum was calculated while holding log_sys.mutex, creating a serious scalability bottleneck. We remove the fixed-size redo log block structure altogether and essentially turn every mini-transaction into a log block of its own. This allows encryption and checksum calculations to be performed on local mtr_t::m_log buffers, before acquiring log_sys.mutex. The mutex only protects a memcpy() of the data to the shared log_sys.buf, as well as the padding of the log, in case the to-be-written part of the log would not end in a block boundary of the underlying storage. For now, the "padding" consists of writing a single NUL byte, to allow recovery and mariadb-backup to detect the end of the circular log faster. Like the previous implementation, we will overwrite the last log block over and over again, until it has been completely filled. It would be possible to write only up to the last completed block (if no more recent write was requested), or to write dummy FILE_CHECKPOINT records to fill the incomplete block, by invoking the currently disabled function log_pad(). This would require adjustments to some logic around log checkpoints, page flushing, and shutdown. An upgrade after a crash of any previous version is not supported. Logically empty log files from a previous version will be upgraded. An attempt to start up InnoDB without a valid ib_logfile0 will be refused. Previously, the redo log used to be created automatically if it was missing. Only with with innodb_force_recovery=6, it is possible to start InnoDB in read-only mode even if the log file does not exist. This allows the contents of a possibly corrupted database to be dumped. Because a prepared backup from an earlier version of mariadb-backup will create a 0-sized log file, we will allow an upgrade from such log files, provided that the FIL_PAGE_FILE_FLUSH_LSN in the system tablespace looks valid. The 512-byte log checkpoint blocks at 0x200 and 0x600 will be replaced with 64-byte log checkpoint blocks at 0x1000 and 0x2000. The start of log records will move from 0x800 to 0x3000. This allows us to use 4096-byte aligned blocks for all I/O in a future revision. We extend the MDEV-12353 redo log record format as follows. (1) Empty mini-transactions or extra NUL bytes will not be allowed. (2) The end-of-minitransaction marker (a NUL byte) will be replaced with a 1-bit sequence number, which will be toggled each time when the circular log file wraps back to the beginning. (3) After the sequence bit, a CRC-32C checksum of all data (excluding the sequence bit) will written. (4) If the log is encrypted, 8 bytes will be written before the checksum and included in it. This is part of the initialization vector (IV) of encrypted log data. (5) File names, page numbers, and checkpoint information will not be encrypted. Only the payload bytes of page-level log will be encrypted. The tablespace ID and page number will form part of the IV. (6) For padding, arbitrary-length FILE_CHECKPOINT records may be written, with all-zero payload, and with the normal end marker and checksum. The minimum size is 7 bytes, or 7+8 with innodb_encrypt_log=ON. In mariadb-backup and in Galera snapshot transfer (SST) scripts, we will no longer remove ib_logfile0 or create an empty ib_logfile0. Server startup will require a valid log file. When resizing the log, we will create a logically empty ib_logfile101 at the current LSN and use an atomic rename to replace ib_logfile0 with it. See the test innodb.log_file_size. Because there is no mandatory padding in the log file, we are able to create a dummy log file as of an arbitrary log sequence number. See the test mariabackup.huge_lsn. The parameter innodb_log_write_ahead_size and the INFORMATION_SCHEMA.INNODB_METRICS counter log_padded will be removed. The minimum value of innodb_log_buffer_size will be increased to 2MiB (because log_sys.buf will replace recv_sys.buf) and the increment adjusted to 4096 bytes (the maximum log block size). The following INFORMATION_SCHEMA.INNODB_METRICS counters will be removed: os_log_fsyncs os_log_pending_fsyncs log_pending_log_flushes log_pending_checkpoint_writes The following status variables will be removed: Innodb_os_log_fsyncs (this is included in Innodb_data_fsyncs) Innodb_os_log_pending_fsyncs (this was limited to at most 1 by design) log_sys.get_block_size(): Return the physical block size of the log file. This is only implemented on Linux and Microsoft Windows for now, and for the power-of-2 block sizes between 64 and 4096 bytes (the minimum and maximum size of a checkpoint block). If the block size is anything else, the traditional 512-byte size will be used via normal file system buffering. If the file system buffers can be bypassed, a message like the following will be issued: InnoDB: File system buffers for log disabled (block size=512 bytes) InnoDB: File system buffers for log disabled (block size=4096 bytes) This has been tested on Linux and Microsoft Windows with both sizes. On Linux, only enable O_DIRECT on the log for innodb_flush_method=O_DSYNC. Tests in 3 different environments where the log is stored in a device with a physical block size of 512 bytes are yielding better throughput without O_DIRECT. This could be due to the fact that in the event the last log block is being overwritten (if multiple transactions would become durable at the same time, and each of will write a small number of bytes to the last log block), it should be faster to re-copy data from log_sys.buf or log_sys.flush_buf to the kernel buffer, to be finally written at fdatasync() time. The parameter innodb_flush_method=O_DSYNC will imply O_DIRECT for data files. This option will enable O_DIRECT on the log file on Linux. It may be unsafe to use when the storage device does not support FUA (Force Unit Access) mode. When the server is compiled WITH_PMEM=ON, we will use memory-mapped I/O for the log file if the log resides on a "mount -o dax" device. We will identify PMEM in a start-up message: InnoDB: log sequence number 0 (memory-mapped); transaction id 3 On Linux, we will also invoke mmap() on any ib_logfile0 that resides in /dev/shm, effectively treating the log file as persistent memory. This should speed up "./mtr --mem" and increase the test coverage of PMEM on non-PMEM hardware. It also allows users to estimate how much the performance would be improved by installing persistent memory. On other tmpfs file systems such as /run, we will not use mmap(). mariadb-backup: Eliminated several variables. We will refer directly to recv_sys and log_sys. backup_wait_for_lsn(): Detect non-progress of xtrabackup_copy_logfile(). In this new log format with arbitrary-sized blocks, we can only detect log file overrun indirectly, by observing that the scanned log sequence number is not advancing. xtrabackup_copy_logfile(): On PMEM, do not modify the sequence bit, because we are not allowed to modify the server's log file, and our memory mapping is read-only. trx_flush_log_if_needed_low(): Do not use the callback on pmem. Using neither flush_lock nor write_lock around PMEM writes seems to yield the best performance. The pmem_persist() calls may still be somewhat slower than the pwrite() and fdatasync() based interface (PMEM mounted without -o dax). recv_sys_t::buf: Remove. We will use log_sys.buf for parsing. recv_sys_t::MTR_SIZE_MAX: Replaces RECV_SCAN_SIZE. recv_sys_t::file_checkpoint: Renamed from mlog_checkpoint_lsn. recv_sys_t, log_sys_t: Removed many data members. recv_sys.lsn: Renamed from recv_sys.recovered_lsn. recv_sys.offset: Renamed from recv_sys.recovered_offset. log_sys.buf_size: Replaces srv_log_buffer_size. recv_buf: A smart pointer that wraps log_sys.buf[recv_sys.offset] when the buffer is being allocated from the memory heap. recv_ring: A smart pointer that wraps a circular log_sys.buf[] that is backed by ib_logfile0. The pointer will wrap from recv_sys.len (log_sys.file_size) to log_sys.START_OFFSET. For the record that wraps around, we may copy file name or record payload data to the auxiliary buffer decrypt_buf in order to have a contiguous block of memory. The maximum size of a record is less than innodb_page_size bytes. recv_sys_t::parse(): Take the smart pointer as a template parameter. Do not temporarily add a trailing NUL byte to FILE_ records, because we are not supposed to modify the memory-mapped log file. (It is attached in read-write mode already during recovery.) recv_sys_t::parse_mtr(): Wrapper for recv_sys_t::parse(). recv_sys_t::parse_pmem(): Like parse_mtr(), but if PREMATURE_EOF would be returned on PMEM, use recv_ring to wrap around the buffer to the start. mtr_t::finish_write(), log_close(): Do not enforce log_sys.max_buf_free on PMEM, because it has no meaning on the mmap-based log. log_sys.write_to_buf: Count writes to log_sys.buf. Replaces srv_stats.log_write_requests and export_vars.innodb_log_write_requests. Protected by log_sys.mutex. Updated consistently in log_close(). Previously, mtr_t::commit() conditionally updated the count, which was inconsistent. log_sys.write_to_log: Count swaps of log_sys.buf and log_sys.flush_buf, for writing to log_sys.log (the ib_logfile0). Replaces srv_stats.log_writes and export_vars.innodb_log_writes. Protected by log_sys.mutex. log_sys.waits: Count waits in append_prepare(). Replaces srv_stats.log_waits and export_vars.innodb_log_waits. recv_recover_page(): Do not unnecessarily acquire log_sys.flush_order_mutex. We are inserting the blocks in arbitary order anyway, to be adjusted in recv_sys.apply(true). We will change the definition of flush_lock and write_lock to avoid potential false sharing. Depending on sizeof(log_sys) and CPU_LEVEL1_DCACHE_LINESIZE, the flush_lock and write_lock could share a cache line with each other or with the last data members of log_sys. Thanks to Matthias Leich for providing https://rr-project.org traces for various failures during the development, and to Thirunarayanan Balathandayuthapani for his help in debugging some of the recovery code. And thanks to the developers of the rr debugger for a tool without which extensive changes to InnoDB would be very challenging to get right. Thanks to Vladislav Vaintroub for useful feedback and to him, Axel Schwenke and Krunal Bauskar for testing the performance.
* | | | | Merge 10.5 into 10.6Marko Mäkelä2021-10-272-2/+2
|\ \ \ \ \ | |/ / / /
| * | | | Merge 10.4 into 10.5st-10.5-mergeMarko Mäkelä2021-10-272-2/+2
| |\ \ \ \ | | |/ / /
| | * | | MDEV-18543 IMPORT TABLESPACE fails after instant DROP COLUMNbb-10.4-MDEV-18543-instant-import-bugsEugene Kosov2021-10-262-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ALTER TABLE IMPORT doesn't properly handle instant alter metadata. This patch makes IMPORT read, parse and apply instant alter metadata at the very beginning of operation. So, cases when source table has some metadata and destination table doesn't have it now works fine. DISCARD already removes instant metadata so importing normal table into instant table worked fine before this patch. decrypt_decompress(): decrypts and decompresses page if needed handle_instant_metadata(): this should be the first thing to read source table. Basically, it applies instant metadata to a destination dict_table_t object. This is the first thing to read FSP flags so all possible checks of it were moved to this function. PageConverter::update_index_page(): it doesn't now read instant metadata. This logic were moved into handle_instant_metadata() row_import::match_flags(): this is a first part row_import::match_schema(). As a separate function it's used by handle_instant_metadata(). fil_space_t::is_full_crc32_compressed(): added convenient function ha_innobase::discard_or_import_tablespace(): do not reload table definition to read instant metadata because handle_instant_metadata() does it better. The reverted code was originally added in 4e7ee166a9c76eb3546356aabfd2dbc759671cd0 ANONYMOUS_VAR: this is a handy thing to use along with make_scope_exit() full_crc32_import.test shows different results, because no dict_table_close() and dict_table_open_on_id() happens. Thus, SHOW CREATE TABLE shows a little bit older table definition.
* | | | | MDEV-4750 follow-up: Reduce disabling innodb_stats_persistentMarko Mäkelä2021-08-316-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This essentially reverts commit 4e89ec6692786bc1cbdce64d43d8e85a5d247dab and only disables InnoDB persistent statistics for tests where it is desirable. By design, InnoDB persistent statistics will not be updated except by ANALYZE TABLE or by STATS_AUTO_RECALC. The internal transactions that update persistent InnoDB statistics in background tasks (with innodb_stats_auto_recalc=ON) may cause nondeterministic query plans or interfere with some tests that deal with other InnoDB internals, such as the purge of transaction history.
* | | | | Merge branch '10.5' into 10.6Oleksandr Byelkin2021-08-022-8/+5
|\ \ \ \ \ | |/ / / /
| * | | | Merge branch '10.4' into 10.5Oleksandr Byelkin2021-07-312-8/+5
| |\ \ \ \ | | |/ / /
| | * | | Merge branch '10.3' into 10.4Oleksandr Byelkin2021-07-312-8/+5
| | |\ \ \ | | | |/ /
| | | * | Merge 10.2 into 10.3Marko Mäkelä2021-07-222-8/+5
| | | |\ \ | | | | |/
| | | | * MDEV-25361 fixup: Fix integer type mismatchMarko Mäkelä2021-07-222-8/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | InnoDB tablespace identifiers and page numbers are 32-bit numbers. Let us use a 32-bit type for them in innochecksum. The changes in commit 1918bdf32cdbd1f190cc4479f4076ee4a467f25d broke the build on 32-bit Windows. Thanks to Vicențiu Ciorbaru for an initial version of this fixup.
* | | | | Merge 10.5 into 10.6Marko Mäkelä2021-06-211-4/+4
|\ \ \ \ \ | |/ / / /
| * | | | Merge 10.4 into 10.5Marko Mäkelä2021-06-211-4/+4
| |\ \ \ \ | | |/ / /
| | * | | Merge 10.3 into 10.4Marko Mäkelä2021-06-211-4/+4
| | |\ \ \ | | | |/ /
| | | * | MDEV-15912: Remove traces of insert_undoMarko Mäkelä2021-06-211-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let us simply refuse an upgrade from earlier versions if the upgrade procedure was not followed. This simplifies the purge, commit, and rollback of transactions. Before upgrading to MariaDB 10.3 or later, a clean shutdown of the server (with innodb_fast_shutdown=1 or 0) is necessary, to ensure that any incomplete transactions are rolled back. The undo log format was changed in MDEV-12288. There is only one persistent undo log for each transaction.