summaryrefslogtreecommitdiff
path: root/storage/innobase/include/row0upd.h
Commit message (Collapse)AuthorAgeFilesLines
* Merge 10.5 into 10.6Marko Mäkelä2023-01-031-10/+6
|\
| * Merge 10.4 into 10.5Marko Mäkelä2023-01-031-10/+6
| |\
| | * Merge 10.3 into 10.4Marko Mäkelä2023-01-031-10/+6
| | |\
| | | * MDEV-25004 Missing row in FTS_DOC_ID_INDEX during DELETE HISTORYAleksey Midenkov2022-12-271-10/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. In case of system-versioned table add row_end into FTS_DOC_ID index in fts_create_common_tables() and innobase_create_key_defs(). fts_n_uniq() returns 1 or 2 depending on whether the table is system-versioned. After this patch recreate of FTS_DOC_ID index is required for existing system-versioned tables. If you see this message in error log or server warnings: "InnoDB: Table db/t1 contains 2 indexes inside InnoDB, which is different from the number of indexes 1 defined in the MariaDB" use this command to fix the table: ALTER TABLE db.t1 FORCE; 2. Fix duplicate history for secondary unique index like it was done in MDEV-23644 for clustered index (932ec586aad). In case of existing history row which conflicts with currently inseted row we check in row_ins_scan_sec_index_for_duplicate() whether that row was inserted as part of current transaction. In that case we indicate with DB_FOREIGN_DUPLICATE_KEY that new history row is not needed and should be silently skipped. 3. Some parts of MDEV-21138 (7410ff436e9) reverted. Skipping of FTS_DOC_ID index for history rows made problems with purge system. Now this is fixed differently by p.2. 4. wait_all_purged.inc checks that we didn't affect non-history rows so they are deleted and purged correctly. Additional FTS fixes fts_init_get_doc_id(): exclude history rows from max_doc_id calculation. fts_init_get_doc_id() callback is used only for crash recovery. fts_add_doc_by_id(): set max value for row_end field. fts_read_stopword(): stopwords table can be system-versioned too. We now read stopwords only for current data. row_insert_for_mysql(): exclude history rows from doc_id validation. row_merge_read_clustered_index(): exclude history_rows from doc_id processing. fts_load_user_stopword(): for versioned table retrieve row_end field and skip history rows. For non-versioned table we retrieve 'value' field twice (just for uniformity). FTS tests for System Versioning now include maybe_versioning.inc which adds 3 combinations: 'vers' for debug build sets sysvers_force and sysvers_hide. sysvers_force makes every created table system-versioned, sysvers_hide hides WITH SYSTEM VERSIONING for SHOW CREATE. Note: basic.test, stopword.test and versioning.test do not require debug for 'vers' combination. This is controlled by $modify_create_table in maybe_versioning.inc and these tests run WITH SYSTEM VERSIONING explicitly which allows to test 'vers' combination on non-debug builds. 'vers_trx' like 'vers' sets sysvers_force_trx and sysvers_hide. That tests FTS with trx_id-based System Versioning. 'orig' works like before: no System Versioning is added, no debug is required. Upgrade/downgrade test for System Versioning is done by innodb_fts.versioning. It has 2 combinations: 'prepare' makes binaries in std_data (requires old server and OLD_BINDIR). It tests upgrade/downgrade against old server as well. 'upgrade' tests upgrade against binaries in std_data. Cleanups: Removed innodb-fts-stopword.test as it duplicates stopword.test
* | | | MDEV-24402: InnoDB CHECK TABLE ... EXTENDEDMarko Mäkelä2022-10-211-9/+1
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now, the attribute EXTENDED of CHECK TABLE was ignored by InnoDB, and InnoDB only counted the records in each index according to the current read view. Unless the attribute QUICK was specified, the function btr_validate_index() would be invoked to validate the B-tree structure (the sibling and child links between index pages). The EXTENDED check will not only count all index records according to the current read view, but also ensure that any delete-marked records in the clustered index are waiting for the purge of history, and that all secondary index records point to a version of the clustered index record that is waiting for the purge of history. In other words, no index may contain orphan records. Normal MVCC reads and the non-EXTENDED version of CHECK TABLE would ignore these orphans. Unpurged records merely result in warnings (at most one per index), not errors, and no indexes will be flagged as corrupted due to such garbage. It will remain possible to SELECT data from such indexes or tables (which will skip such records) or to rebuild the table to reclaim some space. We introduce purge_sys.end_view that will be (almost) a copy of purge_sys.view at the end of a batch of purging committed transaction history. It is not an exact copy, because if the size of a purge batch is limited by innodb_purge_batch_size, some records that purge_sys.view would allow to be purged will be left over for subsequent batches. The purge_sys.view is relevant in the purge of committed transaction history, to determine if records are safe to remove. The new purge_sys.end_view is relevant in MVCC operations and in CHECK TABLE ... EXTENDED. It tells which undo log records are safe to access (have not been discarded at the end of a purge batch). purge_sys.clone_oldest_view<true>(): In trx_lists_init_at_db_start(), clone the oldest read view similar to purge_sys_t::clone_end_view() so that CHECK TABLE ... EXTENDED will not report bogus failures between InnoDB restart and the completed purge of committed transaction history. purge_sys_t::is_purgeable(): Replaces purge_sys_t::changes_visible() in the case that purge_sys.latch will not be held by the caller. Among other things, this guards access to BLOBs. It is not safe to dereference any BLOBs of a delete-marked purgeable record, because they may have already been freed. purge_sys_t::view_guard::view(): Return a reference to purge_sys.view that will be protected by purge_sys.latch, held by purge_sys_t::view_guard. purge_sys_t::end_view_guard::view(): Return a reference to purge_sys.end_view while it is protected by purge_sys.end_latch. Whenever a thread needs to retrieve an older version of a clustered index record, it will hold a page latch on the clustered index page and potentially also on a secondary index page that points to the clustered index page. If these pages contain purgeable records that would be accessed by a currently running purge batch, the progress of the purge batch would be blocked by the page latches. Hence, it is safe to make a copy of purge_sys.end_view while holding an index page latch, and consult the copy of the view to determine whether a record should already have been purged. btr_validate_index(): Remove a redundant check. row_check_index_match(): Check if a secondary index record and a version of a clustered index record match each other. row_check_index(): Replaces row_scan_index_for_mysql(). Count the records in each index directly, duplicating the relevant logic from row_search_mvcc(). Initialize check_table_extended_view for CHECK ... EXTENDED while holding an index leaf page latch. If we encounter an orphan record, the copy of purge_sys.end_view that we make is safe for visibility checks, and trx_undo_get_undo_rec() will check for the safety to access each undo log record. Should that check fail, we should return DB_MISSING_HISTORY to report a corrupted index. The EXTENDED check tries to match each secondary index record with every available clustered index record version, by duplicating the logic of row_vers_build_for_consistent_read() and invoking trx_undo_prev_version_build() directly. Before invoking row_check_index_match() on delete-marked clustered index record versions, we will consult purge_sys.is_purgeable() in order to avoid accessing freed BLOBs. We will always check that the DB_TRX_ID or PAGE_MAX_TRX_ID does not exceed the global maximum. Orphan secondary index records will be flagged only if everything up to PAGE_MAX_TRX_ID has been purged. We warn also about clustered index records whose nonzero DB_TRX_ID should have been reset in purge or rollback. trx_set_rw_mode(): Move an assertion from ReadView::set_creator_trx_id(). trx_undo_prev_version_build(): Remove two debug-only parameters, and return an error code instead of a Boolean. trx_undo_get_undo_rec(): Return a pointer to the undo log record, or nullptr if one cannot be retrieved. Instead of consulting the purge_sys.view, consult the purge_sys.end_view to determine which records can be accessed. trx_undo_get_rec_if_purgeable(): A variant of trx_undo_get_undo_rec() that will consult purge_sys.view instead of purge_sys.end_view. TRX_UNDO_CHECK_PURGEABILITY: A new parameter to trx_undo_prev_version_build(), passed by row_vers_old_has_index_entry() so that purge_sys.view instead of purge_sys.end_view will be consulted to determine whether a secondary index record may be safely purged. row_upd_changes_disowned_external(): Remove. This should be more expensive than briefly latching purge_sys in trx_undo_prev_version_build() (which may make use of transactional memory). row_sel_reset_old_vers_heap(): New function, split from row_sel_build_prev_vers_for_mysql(). row_sel_build_prev_vers_for_mysql(): Reorder some parameters to simplify the call to row_sel_reset_old_vers_heap(). row_search_for_mysql(): Replaced with direct calls to row_search_mvcc(). sel_node_get_nth_plan(): Define inline in row0sel.h open_step(): Define at the call site, in simplified form. sel_node_reset_cursor(): Merged with the only caller open_step(). --- ReadViewBase::check_trx_id_sanity(): Remove. Let us handle "future" DB_TRX_ID in a more meaningful way: row_sel_clust_sees(): Return DB_SUCCESS if the record is visible, DB_SUCCESS_LOCKED_REC if it is invisible, and DB_CORRUPTION if the DB_TRX_ID is in the future. row_undo_mod_must_purge(), row_undo_mod_clust(): Silently ignore corrupted DB_TRX_ID. We are in ROLLBACK, and we should have noticed that corruption when we were about to modify the record in the first place (leading us to refuse the operation). row_vers_build_for_consistent_read(): Return DB_CORRUPTION if DB_TRX_ID is in the future. Tested by: Matthias Leich Reviewed by: Vladislav Lesin
* | | Merge 10.4 into 10.5Marko Mäkelä2022-10-131-1/+4
|\ \ \ | |/ /
| * | Merge 10.3 into 10.4Marko Mäkelä2022-10-131-1/+4
| |\ \ | | |/
| | * MDEV-29753 An error is wrongly reported during INSERT with vcol indexNikita Malyavin2022-10-121-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | See also commits aa8a31da and 64678c for a Bug #22990029 fix. In this scenario INSERT chose to check if delete unmarking is available for a just deleted record. To build an update vector, it needed to calculate the vcols as well. Since this INSERT was not IGNORE-flagged, recalculation failed. Solutiuon: temporarily set abort_on_warning=true, while calculating the column for delete-unmarked insert.
* | | Merge branch '10.4' into 10.5Oleksandr Byelkin2022-02-011-1/+1
|\ \ \ | |/ /
| * | Merge branch '10.3' into 10.4Oleksandr Byelkin2022-01-301-1/+1
| |\ \ | | |/
| | * Merge branch '10.2' into 10.3mariadb-10.3.33Oleksandr Byelkin2022-01-291-1/+1
| | |\
| | | * MDEV-27494 Rename .ic files to .inlVladislav Vaintroub2022-01-171-1/+1
| | | |
* | | | Merge 10.4 into 10.5Marko Mäkelä2020-08-261-18/+12
|\ \ \ \ | |/ / /
| * | | Merge 10.3 into 10.4Marko Mäkelä2020-08-261-18/+12
| |\ \ \ | | |/ /
| | * | Merge 10.2 into 10.3Marko Mäkelä2020-08-261-18/+12
| | |\ \ | | | |/
| | | * MDEV-23547 InnoDB: Failing assertion: *len in row_upd_ext_fetchMarko Mäkelä2020-08-251-18/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This bug was originally repeated on 10.4 after defining a UNIQUE KEY on a TEXT column, which is implemented by MDEV-371 by creating the index on a hidden virtual column. While row_vers_vc_matches_cluster() is executing in a purge thread to find out if an index entry may be removed in a secondary index that comprises a virtual column, another purge thread may process the undo log record that this check is interested in, and write a null BLOB pointer in that record. This would trip the assertion. To prevent this from occurring, we must propagate the 'missing BLOB' error up the call stack. row_upd_ext_fetch(): Return NULL when the error occurs. row_upd_index_replace_new_col_val(): Return whether the previous version was built successfully. row_upd_index_replace_new_col_vals_index_pos(): Check the error result. Yes, we would intentionally crash on this error if it occurs outside the purge thread. row_upd_index_replace_new_col_vals(): Check for the error condition, and simplify the logic. trx_undo_prev_version_build(): Check for the error condition.
* | | | Merge 10.4 into 10.5Marko Mäkelä2020-07-211-10/+35
|\ \ \ \ | |/ / /
| * | | Merge 10.3 into 10.4Marko Mäkelä2020-07-211-10/+35
| |\ \ \ | | |/ /
| | * | MDEV-20661 Virtual fields are not recalculated on system fields value assignmentAleksey Midenkov2020-07-201-1/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix stale virtual field value in 4 cases: when virtual field depends on row_start/row_end in timestamp/trx_id versioned table. row_start dep is recalculated in vers_update_fields() (SQL and InnoDB layer). row_end dep is recalculated on history row insert.
| | * | MDEV-22061 InnoDB: Assertion of missing row in sec index row_start upon ↵Aleksey Midenkov2020-07-201-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | REPLACE on a system-versioned table make_versioned_helper() appended new update field unconditionally while it should check if this field already exists in update vector. Misc renames to conform versioning prefix. vers_update_fields() name conforms with sql layer TABLE::vers_update_fields().
* | | | Merge 10.4 into 10.5Marko Mäkelä2020-05-051-3/+3
|\ \ \ \ | |/ / /
| * | | Merge 10.3 into 10.4Marko Mäkelä2020-05-051-6/+6
| |\ \ \ | | |/ /
| | * | Merge branch '10.2' into 10.3Oleksandr Byelkin2020-05-041-6/+6
| | |\ \ | | | |/
| | | * MDEV-21595: innodb offset_t rename to rec_offsDaniel Black2020-04-291-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | thanks to: perl -i -pe 's/\boffset_t\b/rec_offs/g' $(git grep -lw offset_t storage/innobase)
* | | | Merge 10.4 into 10.5Marko Mäkelä2020-04-291-10/+0
|\ \ \ \ | |/ / /
| * | | Merge 10.3 into 10.4Marko Mäkelä2020-04-271-11/+1
| |\ \ \ | | |/ /
| | * | Merge 10.2 into 10.3Marko Mäkelä2020-04-271-11/+1
| | |\ \ | | | |/
| | | * Cleanup: Make row_upd_store_row() staticMarko Mäkelä2020-04-241-11/+1
| | | |
* | | | MDEV-21907: InnoDB: Enable -Wconversion on clang and GCCMarko Mäkelä2020-03-121-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The -Wconversion in GCC seems to be stricter than in clang. GCC at least since version 4.4.7 issues truncation warnings for assignments to bitfields, while clang 10 appears to only issue warnings when the sizes in bytes rounded to the nearest integer powers of 2 are different. Before GCC 10.0.0, -Wconversion required more casts and would not allow some operations, such as x<<=1 or x+=1 on a data type that is narrower than int. GCC 5 (but not GCC 4, GCC 6, or any later version) is complaining about x|=y even when x and y are compatible types that are narrower than int. Hence, we must rewrite some x|=y as x=static_cast<byte>(x|y) or similar, or we must disable -Wconversion. In GCC 6 and later, the warning for assigning wider to bitfields that are narrower than 8, 16, or 32 bits can be suppressed by applying a bitwise & with the exact bitmask of the bitfield. For older GCC, we must disable -Wconversion for GCC 4 or 5 in such cases. The bitwise negation operator appears to promote short integers to a wider type, and hence we must add explicit truncation casts around them. Microsoft Visual C does not allow a static_cast to truncate a constant, such as static_cast<byte>(1) truncating int. Hence, we will use the constructor-style cast byte(~1) for such cases. This has been tested at least with GCC 4.8.5, 5.4.0, 7.4.0, 9.2.1, 10.0.0, clang 9.0.1, 10.0.0, and MSVC 14.22.27905 (Microsoft Visual Studio 2019) on 64-bit and 32-bit targets (IA-32, AMD64, POWER 8, POWER 9, ARMv8).
* | | | MDEV-12353: Remove support for crash-upgradebb-10.5-MDEV-12353Marko Mäkelä2020-02-131-13/+0
| | | | | | | | | | | | | | | | | | | | | | | | We tighten some assertions regarding dict_index_t::is_dummy and crash recovery, now that redo log processing will no longer create dummy objects.
* | | | MDEV-12353: Replace MLOG_REC_INSERT,MLOG_COMP_REC_INSERTMarko Mäkelä2020-02-131-26/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | page_mem_alloc_free(), page_dir_set_n_heap(), page_ptr_set_direction(): Merge with the callers. page_direction_reset(), page_direction_increment(), page_zip_dir_insert(), page_zip_write_rec_ext(), page_zip_write_rec(): Add the parameter mtr, and write log. PageBulk::insert(), PageBulk::finish(): Write log for all changes. page_cur_rec_insert(), page_cur_insert_rec_write_log(), page_cur_insert_rec_write_log(): Remove. page_rec_set_next(), page_header_set_field(), page_header_set_ptr(): Remove. Use lower-level operations with or without logging. page_zip_dir_add_slot(): Move to the same compilation unit with its only caller, page_cur_insert_rec_zip(). page_cur_insert_rec_zip(): Mark pieces of code that must be skipped once this task is completed. btr_defragment_chunk(): Before starting a mini-transaction that is writing (a lot), invoke log_free_check(). This should allow the test innodb.innodb_defrag_concurrent to pass with the mtr default_mysqld.cnf setting of innodb_log_file_size=10M. MLOG_BUF_MARGIN: Remove.
* | | | MDEV-12353: Replace DELETE_MARK redo log records with MLOG_WRITE_STRINGMarko Mäkelä2020-02-131-38/+1
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | btr_cur_upd_rec_sys(): Replaces row_upd_rec_sys_fields() and implements redo logging. row_upd_rec_sys_fields_in_recovery(): Remove, and merge to the only remaining caller btr_cur_parse_update_in_place(). btr_cur_del_mark_set_clust_rec_log(), btr_cur_del_mark_set_sec_rec_log(), btr_cur_set_deleted_flag_for_ibuf(): Remove, and replace with btr_rec_set_deleted<bool>(). page_zip_rec_set_deleted(): Add the parameter mtr, and write a MLOG_ZIP_WRITE_STRING record to the log.
* | | Merge 10.3 into 10.4Marko Mäkelä2019-12-131-6/+7
|\ \ \ | |/ / | | | | | | | | | We disable the MDEV-21189 test galera.galera_partition because it times out.
| * | Merge 10.2 into 10.3Marko Mäkelä2019-12-131-6/+7
| |\ \ | | |/
| | * MDEV-20950 Reduce size of record offsetsbb-10.2-MDEV-20950-stack-offsetsEugene Kosov2019-12-131-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | offset_t: this is a type which represents one record offset. It's unsigned short int. a lot of functions: replace ulint with offset_t btr_pcur_restore_position_func(), page_validate(), row_ins_scan_sec_index_for_duplicate(), row_upd_clust_rec_by_insert_inherit_func(), row_vers_impl_x_locked_low(), trx_undo_prev_version_build(): allocate record offsets on the stack instead of waiting for rec_get_offsets() to allocate it from mem_heap_t. So, reducing memory allocations. RECORD_OFFSET, INDEX_OFFSET: now it's less convenient to store pointers in offset_t* array. One pointer occupies now several offset_t. And those constant are start indexes into array to places where to store pointer values REC_OFFS_HEADER_SIZE: adjusted for the new reality REC_OFFS_NORMAL_SIZE: increase size from 100 to 300 which means less heap allocations. And sizeof(offset_t[REC_OFFS_NORMAL_SIZE]) now is 600 bytes which is smaller than previous 800 bytes. REC_OFFS_SEC_INDEX_SIZE: adjusted for the new reality rem0rec.h, rem0rec.ic, rem0rec.cc: various arguments, return values and local variables types were changed to fix numerous integer conversions issues. enum field_type_t: offset types concept was introduces which replaces old offset flags stuff. Like in earlier version, 2 upper bits are used to store offset type. And this enum represents those types. REC_OFFS_SQL_NULL, REC_OFFS_MASK: removed get_type(), set_type(), get_value(), combine(): these are convenience functions to work with offsets and it's types rec_offs_base()[0]: still uses an old scheme with flags REC_OFFS_COMPACT and REC_OFFS_EXTERNAL rec_offs_base()[i]: these have type offset_t now. Two upper bits contains type.
* | | Merge 10.3 into 10.4Eugene Kosov2019-07-261-8/+18
|\ \ \ | |/ /
| * | Cleanups: DELETE HISTORY [MDEV-19814]Aleksey Midenkov2019-07-251-8/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Made make_versioned_*() proxies inline; * Renamed truncate_history to delete_history Part of: MDEV-19814 Server crash in row_upd_del_mark_clust_rec or Assertion `update->n_fields < ulint(table->n_cols + table->n_v_cols)' failed in upd_node_t::make_versioned_helper
* | | Merge branch '10.3' into 10.4Oleksandr Byelkin2019-05-191-1/+1
|\ \ \ | |/ /
| * | Merge 10.2 into 10.3Marko Mäkelä2019-05-141-1/+1
| |\ \ | | |/
| | * Merge 10.1 into 10.2Marko Mäkelä2019-05-131-1/+1
| | |\
| | | * Update FSF addressVicențiu Ciorbaru2019-05-111-1/+1
| | | |
* | | | Merge 10.3 into 10.4Marko Mäkelä2019-01-241-3/+5
|\ \ \ \ | |/ / /
| * | | Merge 10.2 into 10.3Marko Mäkelä2019-01-241-3/+5
| |\ \ \ | | |/ /
| | * | Bug #22990029 GCOLS: INCORRECT BEHAVIOR AFTER DATA INSERTED WITH IGNORE KEYWORDAditya A2019-01-231-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PROBLEM ------- 1. We are inserting a base column entry which causes an invalid value by the function provided to generate virtual column,but we go ahead and insert this due to ignore keyword. 2. We then delete this record, making this record delete marked in innodb. If we try to insert another record with the same pk as the deleted record and if the rec is not purged ,then we try to undelete mark this record and try to build a update vector with previous and updated value and while calculating the value of virtual column we get error from server that we cannot calculate this from base column. Innodb assumes that innobase_get_computed_value() Should always return a valid value for the base column present in the row. The failure of this call was not handled ,so we were crashing. FIX
* | | | Merge 10.3 into 10.4Marko Mäkelä2018-11-301-1/+0
|\ \ \ \ | |/ / /
| * | | Merge 10.2 into 10.3Marko Mäkelä2018-11-301-1/+0
| |\ \ \ | | |/ / | | | | | | | | | | | | | | | | Also, related to MDEV-15522, MDEV-17304, MDEV-17835, remove the Galera xtrabackup tests, because xtrabackup never worked with MariaDB Server 10.3 due to InnoDB redo log format changes.
| | * | Remove some unnecessary InnoDB #includeMarko Mäkelä2018-11-291-2/+0
| | | |
* | | | MDEV-17750: Remove dict_index_get_sys_col_pos()Marko Mäkelä2018-11-161-25/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dict_index_t::db_trx_id(): Return the position of DB_TRX_ID. Only valid for the clustered index. dict_index_t::db_roll_ptr(): Return the position of DB_ROLL_PTR. Only valid for the clustered index. dict_index_get_sys_col_pos(): Remove. This was performing unnecessarily complex computations, which only made sense for DB_ROW_ID, which would exist either as the first field in the clustered index or as the last field in a secondary index (only when a DB_ROW_ID column is materialised). row_sel_store_row_id_to_prebuilt(): Remove, and replace with simpler code. row_upd_index_entry_sys_field(): Remove. btr_cur_log_sys(): Replaces row_upd_write_sys_vals_to_log(). btr_cur_write_sys(): Write DB_TRX_ID,DB_ROLL_PTR to a data tuple.
* | | | MDEV-15662 Instant DROP COLUMN or changing the order of columnsMarko Mäkelä2018-10-191-1/+8
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow ADD COLUMN anywhere in a table, not only adding as the last column. Allow instant DROP COLUMN and instant changing the order of columns. The added columns will always be added last in clustered index records. In new records, instantly dropped columns will be stored as NULL or empty when possible. Information about dropped and reordered columns will be written in a metadata BLOB (mblob), which is stored before the first 'user' field in the hidden metadata record at the start of the clustered index. The presence of mblob is indicated by setting the delete-mark flag in the metadata record. The metadata BLOB stores the number of clustered index fields, followed by an array of column information for each field. For dropped columns, we store the NOT NULL flag, the fixed length, and for variable-length columns, whether the maximum length exceeded 255 bytes. For non-dropped columns, we store the column position. Unlike with MDEV-11369, when a table becomes empty, it cannot be converted back to the canonical format. The reason for this is that other threads may hold cached objects such as row_prebuilt_t::ins_node that could refer to dropped or reordered index fields. For instant DROP COLUMN and ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC, we must store the n_core_null_bytes in the root page, so that the chain of node pointer records can be followed in order to reach the leftmost leaf page where the metadata record is located. If the mblob is present, we will zero-initialize the strings "infimum" and "supremum" in the root page, and use the last byte of "supremum" for storing the number of null bytes (which are allocated but useless on node pointer pages). This is necessary for btr_cur_instant_init_metadata() to be able to navigate to the mblob. If the PRIMARY KEY contains any variable-length column and some nullable columns were instantly dropped, the dict_index_t::n_nullable in the data dictionary could be smaller than it actually is in the non-leaf pages. Because of this, the non-leaf pages could use more bytes for the null flags than the data dictionary expects, and we could be reading the lengths of the variable-length columns from the wrong offset, and thus reading the child page number from wrong place. This is the result of two design mistakes that involve unnecessary storage of data: First, it is nonsense to store any data fields for the leftmost node pointer records, because the comparisons would be resolved by the MIN_REC_FLAG alone. Second, there cannot be any null fields in the clustered index node pointer fields, but we nevertheless reserve space for all the null flags. Limitations (future work): MDEV-17459 Allow instant ALTER TABLE even if FULLTEXT INDEX exists MDEV-17468 Avoid table rebuild on operations on generated columns MDEV-17494 Refuse ALGORITHM=INSTANT when the row size is too large btr_page_reorganize_low(): Preserve any metadata in the root page. Call lock_move_reorganize_page() only after restoring the "infimum" and "supremum" records, to avoid a memcmp() assertion failure. dict_col_t::DROPPED: Magic value for dict_col_t::ind. dict_col_t::clear_instant(): Renamed from dict_col_t::remove_instant(). Do not assert that the column was instantly added, because we sometimes call this unconditionally for all columns. Convert an instantly added column to a "core column". The old name remove_instant() could be mistaken to refer to "instant DROP COLUMN". dict_col_t::is_added(): Rename from dict_col_t::is_instant(). dtype_t::metadata_blob_init(): Initialize the mblob data type. dtuple_t::is_metadata(), dtuple_t::is_alter_metadata(), upd_t::is_metadata(), upd_t::is_alter_metadata(): Check if info_bits refer to a metadata record. dict_table_t::instant: Metadata about dropped or reordered columns. dict_table_t::prepare_instant(): Prepare ha_innobase_inplace_ctx::instant_table for instant ALTER TABLE. innobase_instant_try() will pass this to dict_table_t::instant_column(). On rollback, dict_table_t::rollback_instant() will be called. dict_table_t::instant_column(): Renamed from instant_add_column(). Add the parameter col_map so that columns can be reordered. Copy and adjust v_cols[] as well. dict_table_t::find(): Find an old column based on a new column number. dict_table_t::serialise_columns(), dict_table_t::deserialise_columns(): Convert the mblob. dict_index_t::instant_metadata(): Create the metadata record for instant ALTER TABLE. Invoke dict_table_t::serialise_columns(). dict_index_t::reconstruct_fields(): Invoked by dict_table_t::deserialise_columns(). dict_index_t::clear_instant_alter(): Move the fields for the dropped columns to the end, and sort the surviving index fields in ascending order of column position. ha_innobase::check_if_supported_inplace_alter(): Do not allow adding a FTS_DOC_ID column if a hidden FTS_DOC_ID column exists due to FULLTEXT INDEX. (This always required ALGORITHM=COPY.) instant_alter_column_possible(): Add a parameter for InnoDB table, to check for additional conditions, such as the maximum number of index fields. ha_innobase_inplace_ctx::first_alter_pos: The first column whose position is affected by instant ADD, DROP, or changing the order of columns. innobase_build_col_map(): Skip added virtual columns. prepare_inplace_add_virtual(): Correctly compute num_to_add_vcol. Remove some unnecessary code. Note that the call to innodb_base_col_setup() should be executed later. commit_try_norebuild(): If ctx->is_instant(), let the virtual columns be added or dropped by innobase_instant_try(). innobase_instant_try(): Fill in a zero default value for the hidden column FTS_DOC_ID (to reduce the work needed in MDEV-17459). If any columns were dropped or reordered (or added not last), delete any SYS_COLUMNS records for the following columns, and insert SYS_COLUMNS records for all subsequent stored columns as well as for all virtual columns. If any virtual column is dropped, rewrite all virtual column metadata. Use a shortcut only for adding virtual columns. This is because innobase_drop_virtual_try() assumes that the dropped virtual columns still exist in ctx->old_table. innodb_update_cols(): Renamed from innodb_update_n_cols(). innobase_add_one_virtual(), innobase_insert_sys_virtual(): Change the return type to bool, and invoke my_error() when detecting an error. innodb_insert_sys_columns(): Insert a record into SYS_COLUMNS. Refactored from innobase_add_one_virtual() and innobase_instant_add_col(). innobase_instant_add_col(): Replace the parameter dfield with type. innobase_instant_drop_cols(): Drop matching columns from SYS_COLUMNS and all columns from SYS_VIRTUAL. innobase_add_virtual_try(), innobase_drop_virtual_try(): Let the caller invoke innodb_update_cols(). innobase_rename_column_try(): Skip dropped columns. commit_cache_norebuild(): Update table->fts->doc_col. dict_mem_table_col_rename_low(): Skip dropped columns. trx_undo_rec_get_partial_row(): Skip dropped columns. trx_undo_update_rec_get_update(): Handle the metadata BLOB correctly. trx_undo_page_report_modify(): Avoid out-of-bounds access to record fields. Log metadata records consistently. Apparently, the first fields of a clustered index may be updated in an update_undo vector when the index is ID_IND of SYS_FOREIGN, as part of renaming the table during ALTER TABLE. Normally, updates of the PRIMARY KEY should be logged as delete-mark and an insert. row_undo_mod_parse_undo_rec(), row_purge_parse_undo_rec(): Use trx_undo_metadata. row_undo_mod_clust_low(): On metadata rollback, roll back the root page too. row_undo_mod_clust(): Relax an assertion. The delete-mark flag was repurposed for ALTER TABLE metadata records. row_rec_to_index_entry_impl(): Add the template parameter mblob and the optional parameter info_bits for specifying the desired new info bits. For the metadata tuple, allow conversion between the original format (ADD COLUMN only) and the generic format (with hidden BLOB). Add the optional parameter "pad" to determine whether the tuple should be padded to the index fields (on ALTER TABLE it should), or whether it should remain at its original size (on rollback). row_build_index_entry_low(): Clean up the code, removing redundant variables and conditions. For instantly dropped columns, generate a dummy value that is NULL, the empty string, or a fixed length of NUL bytes, depending on the type of the dropped column. row_upd_clust_rec_by_insert_inherit_func(): On the update of PRIMARY KEY of a record that contained a dropped column whose value was stored externally, we will be inserting a dummy NULL or empty string value to the field of the dropped column. The externally stored column would eventually be dropped when purge removes the delete-marked record for the old PRIMARY KEY value. btr_index_rec_validate(): Recognize the metadata record. btr_discard_only_page_on_level(): Preserve the generic instant ALTER TABLE metadata. btr_set_instant(): Replaces page_set_instant(). This sets a clustered index root page to the appropriate format, or upgrades from the MDEV-11369 instant ADD COLUMN to generic ALTER TABLE format. btr_cur_instant_init_low(): Read and validate the metadata BLOB page before reconstructing the dictionary information based on it. btr_cur_instant_init_metadata(): Do not read any lengths from the metadata record header before reading the BLOB. At this point, we would not actually know how many nullable fields the metadata record contains. btr_cur_instant_root_init(): Initialize n_core_null_bytes in one of two possible ways. btr_cur_trim(): Handle the mblob record. row_metadata_to_tuple(): Convert a metadata record to a data tuple, based on the new info_bits of the metadata record. btr_cur_pessimistic_update(): Invoke row_metadata_to_tuple() if needed. Invoke dtuple_convert_big_rec() for metadata records if the record is too large, or if the mblob is not yet marked as externally stored. btr_cur_optimistic_delete_func(), btr_cur_pessimistic_delete(): When the last user record is deleted, do not delete the generic instant ALTER TABLE metadata record. Only delete MDEV-11369 instant ADD COLUMN metadata records. btr_cur_optimistic_insert(): Avoid unnecessary computation of rec_size. btr_pcur_store_position(): Allow a logically empty page to contain a metadata record for generic ALTER TABLE. REC_INFO_DEFAULT_ROW_ADD: Renamed from REC_INFO_DEFAULT_ROW. This is for the old instant ADD COLUMN (MDEV-11369) only. REC_INFO_DEFAULT_ROW_ALTER: The more generic metadata record, with additional information for dropped or reordered columns. rec_info_bits_valid(): Remove. The only case when this would fail is when the record is the generic ALTER TABLE metadata record. rec_is_alter_metadata(): Check if a record is the metadata record for instant ALTER TABLE (other than ADD COLUMN). NOTE: This function must not be invoked on node pointer records, because the delete-mark flag in those records may be set (it is garbage), and then a debug assertion could fail because index->is_instant() does not necessarily hold. rec_is_add_metadata(): Check if a record is MDEV-11369 ADD COLUMN metadata record (not more generic instant ALTER TABLE). rec_get_converted_size_comp_prefix_low(): Assume that the metadata field will be stored externally. In dtuple_convert_big_rec() during the rec_get_converted_size() call, it would not be there yet. rec_get_converted_size_comp(): Replace status,fields,n_fields with tuple. rec_init_offsets_comp_ordinary(), rec_get_converted_size_comp_prefix_low(), rec_convert_dtuple_to_rec_comp(): Add template<bool mblob = false>. With mblob=true, process a record with a metadata BLOB. rec_copy_prefix_to_buf(): Assert that no fields beyond the key and system columns are being copied. Exclude the metadata BLOB field. rec_convert_dtuple_to_metadata_comp(): Convert an alter metadata tuple into a record. row_upd_index_replace_metadata(): Apply an update vector to an alter_metadata tuple. row_log_allocate(): Replace dict_index_t::is_instant() with a more appropriate condition that ignores dict_table_t::instant. Only a table on which the MDEV-11369 ADD COLUMN was performed can "lose its instantness" when it becomes empty. After instant DROP COLUMN or reordering columns, we cannot simply convert the table to the canonical format, because the data dictionary cache and all possibly existing references to it from other client connection threads would have to be adjusted. row_quiesce_write_index_fields(): Do not crash when the table contains an instantly dropped column. Thanks to Thirunarayanan Balathandayuthapani for discussing the design and implementing an initial prototype of this. Thanks to Matthias Leich for testing.
* | | Fix most -Wsign-conversion in InnoDBMarko Mäkelä2018-04-281-2/+2
| | | | | | | | | | | | Change innodb_buffer_pool_size, innodb_fill_factor to unsigned.