summaryrefslogtreecommitdiff
path: root/storage/innobase/include/log0crypt.h
Commit message (Collapse)AuthorAgeFilesLines
* MDEV-14425 Improve the redo log for concurrencyMarko Mäkelä2022-01-211-35/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The InnoDB redo log used to be formatted in blocks of 512 bytes. The log blocks were encrypted and the checksum was calculated while holding log_sys.mutex, creating a serious scalability bottleneck. We remove the fixed-size redo log block structure altogether and essentially turn every mini-transaction into a log block of its own. This allows encryption and checksum calculations to be performed on local mtr_t::m_log buffers, before acquiring log_sys.mutex. The mutex only protects a memcpy() of the data to the shared log_sys.buf, as well as the padding of the log, in case the to-be-written part of the log would not end in a block boundary of the underlying storage. For now, the "padding" consists of writing a single NUL byte, to allow recovery and mariadb-backup to detect the end of the circular log faster. Like the previous implementation, we will overwrite the last log block over and over again, until it has been completely filled. It would be possible to write only up to the last completed block (if no more recent write was requested), or to write dummy FILE_CHECKPOINT records to fill the incomplete block, by invoking the currently disabled function log_pad(). This would require adjustments to some logic around log checkpoints, page flushing, and shutdown. An upgrade after a crash of any previous version is not supported. Logically empty log files from a previous version will be upgraded. An attempt to start up InnoDB without a valid ib_logfile0 will be refused. Previously, the redo log used to be created automatically if it was missing. Only with with innodb_force_recovery=6, it is possible to start InnoDB in read-only mode even if the log file does not exist. This allows the contents of a possibly corrupted database to be dumped. Because a prepared backup from an earlier version of mariadb-backup will create a 0-sized log file, we will allow an upgrade from such log files, provided that the FIL_PAGE_FILE_FLUSH_LSN in the system tablespace looks valid. The 512-byte log checkpoint blocks at 0x200 and 0x600 will be replaced with 64-byte log checkpoint blocks at 0x1000 and 0x2000. The start of log records will move from 0x800 to 0x3000. This allows us to use 4096-byte aligned blocks for all I/O in a future revision. We extend the MDEV-12353 redo log record format as follows. (1) Empty mini-transactions or extra NUL bytes will not be allowed. (2) The end-of-minitransaction marker (a NUL byte) will be replaced with a 1-bit sequence number, which will be toggled each time when the circular log file wraps back to the beginning. (3) After the sequence bit, a CRC-32C checksum of all data (excluding the sequence bit) will written. (4) If the log is encrypted, 8 bytes will be written before the checksum and included in it. This is part of the initialization vector (IV) of encrypted log data. (5) File names, page numbers, and checkpoint information will not be encrypted. Only the payload bytes of page-level log will be encrypted. The tablespace ID and page number will form part of the IV. (6) For padding, arbitrary-length FILE_CHECKPOINT records may be written, with all-zero payload, and with the normal end marker and checksum. The minimum size is 7 bytes, or 7+8 with innodb_encrypt_log=ON. In mariadb-backup and in Galera snapshot transfer (SST) scripts, we will no longer remove ib_logfile0 or create an empty ib_logfile0. Server startup will require a valid log file. When resizing the log, we will create a logically empty ib_logfile101 at the current LSN and use an atomic rename to replace ib_logfile0 with it. See the test innodb.log_file_size. Because there is no mandatory padding in the log file, we are able to create a dummy log file as of an arbitrary log sequence number. See the test mariabackup.huge_lsn. The parameter innodb_log_write_ahead_size and the INFORMATION_SCHEMA.INNODB_METRICS counter log_padded will be removed. The minimum value of innodb_log_buffer_size will be increased to 2MiB (because log_sys.buf will replace recv_sys.buf) and the increment adjusted to 4096 bytes (the maximum log block size). The following INFORMATION_SCHEMA.INNODB_METRICS counters will be removed: os_log_fsyncs os_log_pending_fsyncs log_pending_log_flushes log_pending_checkpoint_writes The following status variables will be removed: Innodb_os_log_fsyncs (this is included in Innodb_data_fsyncs) Innodb_os_log_pending_fsyncs (this was limited to at most 1 by design) log_sys.get_block_size(): Return the physical block size of the log file. This is only implemented on Linux and Microsoft Windows for now, and for the power-of-2 block sizes between 64 and 4096 bytes (the minimum and maximum size of a checkpoint block). If the block size is anything else, the traditional 512-byte size will be used via normal file system buffering. If the file system buffers can be bypassed, a message like the following will be issued: InnoDB: File system buffers for log disabled (block size=512 bytes) InnoDB: File system buffers for log disabled (block size=4096 bytes) This has been tested on Linux and Microsoft Windows with both sizes. On Linux, only enable O_DIRECT on the log for innodb_flush_method=O_DSYNC. Tests in 3 different environments where the log is stored in a device with a physical block size of 512 bytes are yielding better throughput without O_DIRECT. This could be due to the fact that in the event the last log block is being overwritten (if multiple transactions would become durable at the same time, and each of will write a small number of bytes to the last log block), it should be faster to re-copy data from log_sys.buf or log_sys.flush_buf to the kernel buffer, to be finally written at fdatasync() time. The parameter innodb_flush_method=O_DSYNC will imply O_DIRECT for data files. This option will enable O_DIRECT on the log file on Linux. It may be unsafe to use when the storage device does not support FUA (Force Unit Access) mode. When the server is compiled WITH_PMEM=ON, we will use memory-mapped I/O for the log file if the log resides on a "mount -o dax" device. We will identify PMEM in a start-up message: InnoDB: log sequence number 0 (memory-mapped); transaction id 3 On Linux, we will also invoke mmap() on any ib_logfile0 that resides in /dev/shm, effectively treating the log file as persistent memory. This should speed up "./mtr --mem" and increase the test coverage of PMEM on non-PMEM hardware. It also allows users to estimate how much the performance would be improved by installing persistent memory. On other tmpfs file systems such as /run, we will not use mmap(). mariadb-backup: Eliminated several variables. We will refer directly to recv_sys and log_sys. backup_wait_for_lsn(): Detect non-progress of xtrabackup_copy_logfile(). In this new log format with arbitrary-sized blocks, we can only detect log file overrun indirectly, by observing that the scanned log sequence number is not advancing. xtrabackup_copy_logfile(): On PMEM, do not modify the sequence bit, because we are not allowed to modify the server's log file, and our memory mapping is read-only. trx_flush_log_if_needed_low(): Do not use the callback on pmem. Using neither flush_lock nor write_lock around PMEM writes seems to yield the best performance. The pmem_persist() calls may still be somewhat slower than the pwrite() and fdatasync() based interface (PMEM mounted without -o dax). recv_sys_t::buf: Remove. We will use log_sys.buf for parsing. recv_sys_t::MTR_SIZE_MAX: Replaces RECV_SCAN_SIZE. recv_sys_t::file_checkpoint: Renamed from mlog_checkpoint_lsn. recv_sys_t, log_sys_t: Removed many data members. recv_sys.lsn: Renamed from recv_sys.recovered_lsn. recv_sys.offset: Renamed from recv_sys.recovered_offset. log_sys.buf_size: Replaces srv_log_buffer_size. recv_buf: A smart pointer that wraps log_sys.buf[recv_sys.offset] when the buffer is being allocated from the memory heap. recv_ring: A smart pointer that wraps a circular log_sys.buf[] that is backed by ib_logfile0. The pointer will wrap from recv_sys.len (log_sys.file_size) to log_sys.START_OFFSET. For the record that wraps around, we may copy file name or record payload data to the auxiliary buffer decrypt_buf in order to have a contiguous block of memory. The maximum size of a record is less than innodb_page_size bytes. recv_sys_t::parse(): Take the smart pointer as a template parameter. Do not temporarily add a trailing NUL byte to FILE_ records, because we are not supposed to modify the memory-mapped log file. (It is attached in read-write mode already during recovery.) recv_sys_t::parse_mtr(): Wrapper for recv_sys_t::parse(). recv_sys_t::parse_pmem(): Like parse_mtr(), but if PREMATURE_EOF would be returned on PMEM, use recv_ring to wrap around the buffer to the start. mtr_t::finish_write(), log_close(): Do not enforce log_sys.max_buf_free on PMEM, because it has no meaning on the mmap-based log. log_sys.write_to_buf: Count writes to log_sys.buf. Replaces srv_stats.log_write_requests and export_vars.innodb_log_write_requests. Protected by log_sys.mutex. Updated consistently in log_close(). Previously, mtr_t::commit() conditionally updated the count, which was inconsistent. log_sys.write_to_log: Count swaps of log_sys.buf and log_sys.flush_buf, for writing to log_sys.log (the ib_logfile0). Replaces srv_stats.log_writes and export_vars.innodb_log_writes. Protected by log_sys.mutex. log_sys.waits: Count waits in append_prepare(). Replaces srv_stats.log_waits and export_vars.innodb_log_waits. recv_recover_page(): Do not unnecessarily acquire log_sys.flush_order_mutex. We are inserting the blocks in arbitary order anyway, to be adjusted in recv_sys.apply(true). We will change the definition of flush_lock and write_lock to avoid potential false sharing. Depending on sizeof(log_sys) and CPU_LEVEL1_DCACHE_LINESIZE, the flush_lock and write_lock could share a cache line with each other or with the last data members of log_sys. Thanks to Matthias Leich for providing https://rr-project.org traces for various failures during the development, and to Thirunarayanan Balathandayuthapani for his help in debugging some of the recovery code. And thanks to the developers of the rr debugger for a tool without which extensive changes to InnoDB would be very challenging to get right. Thanks to Vladislav Vaintroub for useful feedback and to him, Axel Schwenke and Krunal Bauskar for testing the performance.
* MDEV-25791: Remove UNIV_INTERNMarko Mäkelä2021-05-271-12/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Back in 2006 or 2007, when MySQL AB and Innobase Oy existed as separately controlled entities (Innobase had been acquired by Oracle Corporation), MySQL 5.1 introduced a storage engine plugin interface and Oracle made use of it by distributing a separate InnoDB Plugin, which would contain some more bug fixes and improvements, compared to the version of InnoDB that was statically linked with the mysqld server that was distributed by MySQL AB. The built-in InnoDB would export global symbols, which would clash with the symbols of the dynamic InnoDB Plugin (which was supposed to override the built-in one when present). The solution to this problem was to declare all global symbols with UNIV_INTERN, so that they would get the GCC function attribute that specifies hidden visibility. Later, in MariaDB Server, something based on Percona XtraDB (a fork of MySQL InnoDB) became the statically linked implementation, and something closer to MySQL InnoDB was available as a dynamic plugin. Starting with version 10.2, MariaDB Server includes only one InnoDB implementation, and hence any reason to have the UNIV_INTERN definition was lost. btr_get_size_and_reserved(): Move to the same compilation unit with the only caller. innodb_set_buf_pool_size(): Remove. Modify innobase_buffer_pool_size directly. fil_crypt_calculate_checksum(): Merge to the only caller. ha_innobase::innobase_reset_autoinc(): Merge to the only caller. thd_query_start_micro(): Remove. Call thd_start_utime() directly.
* Cleanup: log upgrade and encryptionMarko Mäkelä2020-03-071-8/+4
| | | | | | | | | | | | | | | | log_crypt_101_read_checkpoint(), log_crypt_101_read_block(): Declare as ATTRIBUTE_COLD. These are only used when checking that a MariaDB 10.1 encrypted redo log is clean. log_block_calc_checksum_format_0(): Define in the only compilation unit where it is needed. This is only used when reading the checkpoint information from redo logs before MariaDB 10.2.2. crypt_info_t: Declare the byte arrays directly with alignas(). log_crypt(): Use memcpy_aligned instead of reinterpret_cast on integers.
* Cleanup: Remove srv_start_lsnMarko Mäkelä2020-03-021-4/+3
| | | | Most of the time, we can refer to recv_sys.recovered_lsn.
* Merge 10.3 into 10.4Marko Mäkelä2019-07-021-6/+2
|\
| * MDEV-17228 Encrypted temporary tables are not encryptedThirunarayanan Balathandayuthapani2019-06-281-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | - Introduce a new variable called innodb_encrypt_temporary_tables which is a boolean variable. It decides whether to encrypt the temporary tablespace. - Encrypts the temporary tablespace based on full checksum format. - Introduced a new counter to track encrypted and decrypted temporary tablespace pages. - Warnings issued if temporary table creation has conflict value with innodb_encrypt_temporary_tables - Added a new test case which reads and writes the pages from/to temporary tablespace.
* | Merge branch '10.3' into 10.4Oleksandr Byelkin2019-05-191-1/+1
|\ \ | |/
| * Merge 10.1 into 10.2Marko Mäkelä2019-05-131-1/+1
| |\
| | * Merge branch '5.5' into 10.1Vicențiu Ciorbaru2019-05-111-1/+1
| | |
| | * MDEV-14874 innodb_encrypt_log corrupts the log when the LSN crosses 32-bit ↵Marko Mäkelä2018-01-081-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | boundary This bug affects both writing and reading encrypted redo log in MariaDB 10.1, starting from version 10.1.3 which added support for innodb_encrypt_log. That is, InnoDB crash recovery and Mariabackup will sometimes fail when innodb_encrypt_log is used. MariaDB 10.2 or Mariabackup 10.2 or later versions are not affected. log_block_get_start_lsn(): Remove. This function would cause trouble if a log segment that is being read is crossing a 32-bit boundary of the LSN, because this function does not allow the most significant 32 bits of the LSN to change. log_blocks_crypt(), log_encrypt_before_write(), log_decrypt_after_read(): Add the parameter "lsn" for the start LSN of the block. log_blocks_encrypt(): Remove (unused function).
* | | MDEV-12041: innodb_encrypt_log key rotationMarko Mäkelä2018-08-131-5/+14
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This will change the InnoDB encrypted redo log format only. Unencrypted redo log will keep using the MariaDB 10.3 format. In the new encrypted redo log format, 4 additional bytes will be reserved in the redo log block trailer for storing the encryption key version. For performance reasons, the encryption key rotation (checking if the latest encryption key version is being used) is only done at log_checkpoint(). LOG_HEADER_FORMAT_CURRENT: Remove. LOG_HEADER_FORMAT_ENC_10_4: The encrypted 10.4 format. LOG_BLOCK_KEY: The encryption key version field. LOG_BLOCK_TRL_SIZE: Remove. log_t: Add accessors framing_size(), payload_size(), trailer_offset(), to be used instead of referring to LOG_BLOCK_TRL_SIZE. log_crypt_t: An operation passed to log_crypt(). log_crypt(): Perform decryption, encryption, or encryption with key rotation. Return an error if key rotation at decryption fails. On encryption, keep using the previous key if the rotation fails. At startup, old-format encrypted redo log may be written before the redo log is upgraded (rebuilt) to the latest format. log_write_up_to(): Add the parameter rotate_key=false. log_checkpoint(): Invoke log_write_up_to() with rotate_key=true.
* | Merge 10.1 into 10.2Marko Mäkelä2017-09-171-0/+40
|\ \ | |/ | | | | | | | | | | | | | | | | | | This should also fix the MariaDB 10.2.2 bug MDEV-13826 CREATE FULLTEXT INDEX on encrypted table fails. MDEV-12634 FIXME: Modify innodb-index-online, innodb-table-online so that they will write and read merge sort files. InnoDB 5.7 introduced some optimizations to avoid using the files for small tables. Many collation test results have been adjusted for MDEV-10191.
| * MDEV-12634: Uninitialised ROW_MERGE_RESERVE_SIZE bytes written to tem…Jan Lindström2017-09-141-1/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | …porary file Fixed by removing writing key version to start of every block that was encrypted. Instead we will use single key version from log_sys crypt info. After this MDEV also blocks writen to row log are encrypted and blocks read from row log aren decrypted if encryption is configured for the table. innodb_status_variables[], struct srv_stats_t Added status variables for merge block and row log block encryption and decryption amounts. Removed ROW_MERGE_RESERVE_SIZE define. row_merge_fts_doc_tokenize Remove ROW_MERGE_RESERVE_SIZE row_log_t Add index, crypt_tail, crypt_head to be used in case of encryption. row_log_online_op, row_log_table_close_func Before writing a block encrypt it if encryption is enabled row_log_table_apply_ops, row_log_apply_ops After reading a block decrypt it if encryption is enabled row_log_allocate Allocate temporary buffers crypt_head and crypt_tail if needed. row_log_free Free temporary buffers crypt_head and crypt_tail if they exist. row_merge_encrypt_buf, row_merge_decrypt_buf Removed. row_merge_buf_create, row_merge_buf_write Remove ROW_MERGE_RESERVE_SIZE row_merge_build_indexes Allocate temporary buffer used in decryption and encryption if needed. log_tmp_blocks_crypt, log_tmp_block_encrypt, log_temp_block_decrypt New functions used in block encryption and decryption log_tmp_is_encrypted New function to check is encryption enabled. Added test case innodb-rowlog to force creating a row log and verify that operations are done using introduced status variables.
* | MDEV-13318 Crash recovery failure after the server is killed during ↵Marko Mäkelä2017-09-121-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | innodb_encrypt_log startup This fixes several InnoDB bugs related to innodb_encrypt_log and two Mariabackup --backup bugs. log_crypt(): Properly derive the initialization vector from the start LSN of each block. Add a debug assertion. log_crypt_init(): Note that the function should only be used when creating redo log files and that the information is persisted in the checkpoint pages. xtrabackup_copy_log(): Validate data_len. xtrabackup_backup_func(): Always use the chosen checkpoint buffer. log_group_write_buf(), log_write_up_to(): Only log_crypt() the redo log payload, not the padding bytes. innobase_start_or_create_for_mysql(): Do not invoke log_crypt_init() or initiate a redo log checkpoint. recv_find_max_checkpoint(): Return the contents of LOG_CHECKPOINT_NO to xtrabackup_backup_func() in log_sys->next_checkpoint_no.
* | MDEV-11782: Redefine the innodb_encrypt_log formatMarko Mäkelä2017-02-151-78/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Write only one encryption key to the checkpoint page. Use 4 bytes of nonce. Encrypt more of each redo log block, only skipping the 4-byte field LOG_BLOCK_HDR_NO which the initialization vector is derived from. Issue notes, not warning messages for rewriting the redo log files. recv_recovery_from_checkpoint_finish(): Do not generate any redo log, because we must avoid that before rewriting the redo log files, or otherwise a crash during a redo log rewrite (removing or adding encryption) may end up making the database unrecoverable. Instead, do these tasks in innobase_start_or_create_for_mysql(). Issue a firm "Missing MLOG_CHECKPOINT" error message. Remove some unreachable code and duplicated error messages for log corruption. LOG_HEADER_FORMAT_ENCRYPTED: A flag for identifying an encrypted redo log format. log_group_t::is_encrypted(), log_t::is_encrypted(): Determine if the redo log is in encrypted format. recv_find_max_checkpoint(): Interpret LOG_HEADER_FORMAT_ENCRYPTED. srv_prepare_to_delete_redo_log_files(): Display NOTE messages about adding or removing encryption. Do not issue warnings for redo log resizing any more. innobase_start_or_create_for_mysql(): Rebuild the redo logs also when the encryption changes. innodb_log_checksums_func_update(): Always use the CRC-32C checksum if innodb_encrypt_log. If needed, issue a warning that innodb_encrypt_log implies innodb_log_checksums. log_group_write_buf(): Compute the checksum on the encrypted block contents, so that transmission errors or incomplete blocks can be detected without decrypting. Rewrite most of the redo log encryption code. Only remember one encryption key at a time (but remember up to 5 when upgrading from the MariaDB 10.1 format.)
* | MDEV-11782 preparation: Add separate code for validating the 10.1 redo log.Marko Mäkelä2017-02-131-1/+14
|/ | | | | | | | | | | | | | log_crypt_101_read_checkpoint(): Read the encryption information from a MariaDB 10.1 checkpoint page. log_crypt_101_read_block(): Attempt to decrypt a MariaDB 10.1 redo log page. recv_log_format_0_recover(): Only attempt decryption on checksum mismatch. NOTE: With the MariaDB 10.1 innodb_encrypt_log format, we can actually determine from the cleartext portion of the redo log whether the redo log is empty. We do not really have to decrypt the redo log here, if we did not want to determine if the checksum is valid.
* MDEV-9793: getting mysqld crypto key from key version failedJan Lindström2016-03-301-1/+9
| | | | | | Make sure that we read all possible encryption keys from checkpoint and if log block checksum does not match, print all found checkpoint encryption keys.
* MDEV-8410: Changing file-key-management to example-key-management causes ↵Jan Lindström2015-08-081-9/+61
| | | | | | | | | | | | | | | | | | | crash and no real error MDEV-8409: Changing file-key-management-encryption-algorithm causes crash and no real info why Analysis: Both bugs has two different error cases. Firstly, at startup when server reads latest checkpoint but requested key_version, key management plugin or encryption algorithm or method is not found leading corrupted log entry. Secondly, similarly when reading system tablespace if requested key_version, key management plugin or encryption algorithm or method is not found leading buffer pool page corruption. Fix: Firsly, when reading checkpoint at startup check if the log record may be encrypted and if we find that it could be encrypted, print error message and do not start server. Secondly, if page is buffer pool seems corrupted but we find out that there is crypt_info, print additional error message before asserting.
* MDEV-8156: Assertion failure in file log0crypt.cc line 220 on server restartJan Lindström2015-06-181-1/+1
| | | | | Instead of asserting print informative error message to error log and return failure from innodb_init causing the server to shutdown.
* MDEV-8041: InnoDB redo log encryptionJan Lindström2015-05-091-52/+34
| | | | | Merged new version of InnoDB/XtraDB redo log encryption from Google provided by Jonas Oreland.
* Add encryption key id to the API as a distinct conceptSergei Golubchik2015-04-091-1/+1
| | | | which is separate from the encryption key version
* remove now-empty my_aes.{h,cc}Sergei Golubchik2015-04-091-1/+1
| | | | move remaning defines to my_crypt, add MY_ namespace prefix
* renames to follow single consistent naming styleSergei Golubchik2015-04-091-1/+1
| | | | with namespace prefixes
* encryption plugin controls the encryptionSergei Golubchik2015-04-091-0/+2
| | | | | | | | | * no --encryption-algorithm option anymore * encrypt/decrypt methods in the encryption plugin * ecnrypt/decrypt methods in the encryption_km service * file_km plugin has --file-key-management-encryption-algorithm * debug_km always uses aes_cbc * example_km changes between aes_cbc and aes_ecb for different key versions
* encryption keys serviceSergei Golubchik2015-02-101-1/+0
|
* encryption key management plugin apiSergei Golubchik2015-02-101-1/+1
|
* Push for testing of encryptionMonty2015-02-101-0/+85