path: root/include/my_base.h
* MDEV-17554 Auto-create new partition for system versioned tables with history partitioned by INTERVAL/LIMIT (Aleksey Midenkov, 2022-05-06; 1 file, -1/+1)

  :: Syntax change ::

  The keyword AUTO enables history partition auto-creation. Examples:

    CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
      PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;
    CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
      PARTITION BY SYSTEM_TIME INTERVAL 1 MONTH
      STARTS '2021-01-01 00:00:00' AUTO PARTITIONS 12;
    CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
      PARTITION BY SYSTEM_TIME LIMIT 1000 AUTO;

  Or with explicit partitions:

    CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
      PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO
      (PARTITION p0 HISTORY, PARTITION pn CURRENT);

  To disable or enable auto-creation one can use ALTER TABLE by adding
  or removing AUTO from the partitioning specification:

    CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
      PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;

    # Disables auto-creation:
    ALTER TABLE t1 PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR;

    # Enables auto-creation:
    ALTER TABLE t1 PARTITION BY SYSTEM_TIME INTERVAL 1 HOUR AUTO;

  If the rest of the partitioning specification is identical to CREATE
  TABLE, no repartitioning will be done (for details see MDEV-27328).

  :: Description ::

  Before executing a history-generating DML command (see the list of
  commands below), add N history partitions, so that N is sufficient
  for the potentially generated history. N > 1 may be required when
  history partitions are switched by INTERVAL and current_timestamp is
  N times further than the interval boundary of the last history
  partition.

  If the last history partition equals or exceeds LIMIT records, a new
  history partition is created and selected as the working partition.
  According to MDEV-28411, partitions cannot be switched (or created)
  while the command is running. Thus LIMIT is not a strict limit, and
  the history partition size must be planned as the LIMIT value plus
  the average amount of history one DML command can generate.

  Auto-creation is implemented by a synchronous
  fast_alter_partition_table() call from the thread of the executed
  DML command, before the command itself is run (by a fallback-and-retry
  mechanism similar to the Discovery feature, see Open_table_context).

  The names for newly added partitions are generated like default
  partition names, with the extension of MDEV-22155 (which avoids name
  clashes by extending the assignment counter to the next free-enough
  gap).

  These DML commands can trigger auto-creation:

    DELETE (including multitable DELETE, excluding DELETE HISTORY)
    UPDATE (including multitable UPDATE)
    REPLACE (including REPLACE .. SELECT)
    INSERT .. ON DUPLICATE KEY UPDATE (including INSERT .. SELECT .. ODKU)
    LOAD DATA .. REPLACE

  :: Bug fixes ::

  MDEV-23642 Locking timeout caused by auto-creation affects original DML

  An auto-creation failure does not fail the original command. The
  reasons for this are:

  - Do not disrupt the main business process (the history is an
    auxiliary service);
  - Consequences are non-fatal (history is not lost, but goes into the
    wrong partition; fixed by a partitioning rebuild);
  - The application has more freedom to fail in this case or not: it
    may read the warning info and find the corresponding error number;
  - While a non-failing command is easy for an application to handle
    and fail, the opposite is hard to handle: there is no automatic
    action to fix a failed command and retry, DBA intervention is
    required, and until then the application is non-functioning.

  MDEV-23639 Auto-create does not work under LOCK TABLES or inside triggers

  Don't do tdc_remove_table() for OT_ADD_HISTORY_PARTITION because it
  is not possible in locked tables mode. LTM_LOCK_TABLES mode (and
  LTM_PRELOCKED_UNDER_LOCK_TABLES) works out of the box, as
  fast_alter_partition_table() can reopen tables via
  locked_tables_list. In LTM_PRELOCKED we reopen and relock the table
  manually.

  :: More fixes ::

  - some_table_marked_for_reopen flag fix: some_table_marked_for_reopen
    affects only the reopen of m_locked_tables, i.e.
    Locked_tables_list::reopen_tables() reopens only tables from
    m_locked_tables.
  - Unused can_recover_from_failed_open() condition: can
    recover_from_failed_open() really be used after
    open_and_process_routine()?

  :: Reviewed by ::

  Sergei Golubchik <serg@mariadb.org>
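  A minimal sketch of the resulting workflow (illustrative only; the
  exact warning contents are not specified in this log):

    CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
      PARTITION BY SYSTEM_TIME LIMIT 1000 AUTO;
    INSERT INTO t1 (x) VALUES (1);
    UPDATE t1 SET x = 2;  -- history-generating DML: missing history
                          -- partitions are added before the command runs
    SHOW WARNINGS;        -- if auto-creation failed, the DML still
                          -- succeeded; the application can read the
                          -- error number here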
* Merge branch '10.5' into 10.6 (Oleksandr Byelkin, 2022-02-03; 1 file, -1/+2)
* Merge branch '10.4' into 10.5 (Oleksandr Byelkin, 2022-02-01; 1 file, -1/+2)
* fix MDEV-27217 (4d5ae2b3258d0d4eb3addd61fdabf49d9a6314e7) (Oleksandr Byelkin, 2022-01-30; 1 file, -2/+3)

  Add the error from later versions to avoid changing HA_ERR_* values
  across versions and in already released versions.
* Merge branch '10.3' into 10.4 (Oleksandr Byelkin, 2022-01-30; 1 file, -1/+2)
* MDEV-27217 DELETE partition selection doesn't work for history partitions (Aleksey Midenkov, 2022-01-13; 1 file, -1/+2)

  LIMIT history switching requires a number of history partitions to be
  marked for read: from the first to the last non-empty one, plus one
  empty. The least we can do is to fail with an error message if the
  needed partition was not marked for read. As this is the handler
  interface, we require a new handler error code to display a
  user-friendly error message. Switching by INTERVAL works
  out-of-the-box with the ER_ROW_DOES_NOT_MATCH_GIVEN_PARTITION_SET
  error.
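  A hypothetical illustration of the case the new error code guards
  (table and partition names are invented here):

    CREATE TABLE t1 (x int) WITH SYSTEM VERSIONING
      PARTITION BY SYSTEM_TIME LIMIT 100
      (PARTITION p0 HISTORY, PARTITION p1 HISTORY, PARTITION pn CURRENT);
    -- If LIMIT switching needs p1 (the empty history partition after
    -- the last non-empty one) but it is excluded from the read set,
    -- the statement now fails with a clear error instead of
    -- misplacing history:
    DELETE FROM t1 PARTITION (p0, pn) WHERE x = 1;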
* Merge 10.5 into 10.6 (Marko Mäkelä, 2021-09-16; 1 file, -1/+2)
* Improve error messages from Aria (Monty, 2021-09-15; 1 file, -1/+2)

  - An error on commit now returns HA_ERR_COMMIT_ERROR instead of
    HA_ERR_INTERNAL_ERROR.
  - If a checkpoint fails, it will now print out where it failed.
* MDEV-25506 (3 of 3): Do not delete .ibd files before commit (Marko Mäkelä, 2021-06-09; 1 file, -1/+1)

  This is a complete rewrite of DROP TABLE, also as part of other DDL,
  such as ALTER TABLE, CREATE TABLE...SELECT, TRUNCATE TABLE.

  The background DROP TABLE queue hack is removed. If a transaction
  needs to drop and create a table by the same name (like TRUNCATE
  TABLE does), it must first rename the table to an internal #sql-ib
  name. No committed version of the data dictionary will include any
  #sql-ib tables, because whenever a transaction renames a table to a
  #sql-ib name, it will also drop that table. Either the rename will be
  rolled back, or the drop will be committed.

  Data files will be unlinked after the transaction has been committed
  and a FILE_RENAME record has been durably written. The file will
  actually be deleted when the detached file handle returned by
  fil_delete_tablespace() is closed, after the latches have been
  released. It is possible that a purge of the delete of the
  SYS_INDEXES record for the clustered index will execute
  fil_delete_tablespace() concurrently with the DDL transaction. In
  that case, the thread that arrives later will wait for the other
  thread to finish.

  HTON_TRUNCATE_REQUIRES_EXCLUSIVE_USE: A new handler flag.
  ha_innobase::truncate() now requires that all other references to the
  table be released in advance. This was implemented by Monty.

  ha_innobase::delete_table(): If CREATE TABLE..SELECT is detected, we
  will "hijack" the current transaction, drop the table in the current
  transaction and commit the current transaction. This essentially
  fixes MDEV-21602. There is a FIXME comment about making the check
  less failure-prone.

  ha_innobase::truncate(), ha_innobase::delete_table(): Implement a
  fast path for temporary tables. We will no longer allow temporary
  tables to use the adaptive hash index.

  dict_table_t::mdl_name: The original table name for the purpose of
  acquiring MDL in purge, to prevent a race condition between a DDL
  transaction that is dropping a table, and purge processing undo log
  records of DML that had executed before the DDL operation. For
  #sql-backup- tables during ALTER TABLE...ALGORITHM=COPY, the
  dict_table_t::mdl_name will differ from dict_table_t::name.

  dict_table_t::parse_name(): Use mdl_name instead of name.

  dict_table_rename_in_cache(): Update mdl_name.

  For the internal FTS_ tables of FULLTEXT INDEX, purge would acquire
  MDL on the FTS_ table name, but not on the main table, and therefore
  it would be able to run concurrently with a DDL transaction that is
  dropping the table. Previously, the DROP TABLE queue hack prevented a
  race between purge and DDL. For now, we introduce
  purge_sys.stop_FTS() to prevent purge from opening any table while a
  DDL transaction that may drop FTS_ tables is in progress. The
  function fts_lock_table(), which will be invoked before the
  dictionary is locked, will wait for purge to release any table
  handles.

  trx_t::drop_table_statistics(): Drop statistics for the table. This
  replaces dict_stats_drop_index(). We will drop or rename persistent
  statistics atomically as part of DDL transactions. On lock conflict
  for dropping statistics, we will fail instantly with
  DB_LOCK_WAIT_TIMEOUT, because we will be holding the exclusive data
  dictionary latch.

  trx_t::commit_cleanup(): Separated from trx_t::commit_in_memory().
  Relax an assertion around fts_commit() and allow DB_LOCK_WAIT_TIMEOUT
  in addition to DB_DUPLICATE_KEY. The call to fts_commit() is entirely
  misplaced here and may obviously break the consistency of
  transactions that affect FULLTEXT INDEX. It needs to be fixed
  separately.

  dict_table_t::n_foreign_key_checks_running: Remove (MDEV-21175). The
  counter was a work-around for missing meta-data locking (MDL) on the
  SQL layer, and not really needed in MariaDB.

  ER_TABLE_IN_FK_CHECK: Replaced with ER_UNUSED_28.

  HA_ERR_TABLE_IN_FK_CHECK: Remove.

  row_ins_check_foreign_constraints(): Do not acquire dict_sys.latch
  either. The SQL-layer MDL will protect us.

  This was reviewed by Thirunarayanan Balathandayuthapani and tested by
  Matthias Leich.
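  For orientation, one user-visible path through the rewritten DDL
  (behaviour as described above, shown only as context):

    CREATE TABLE t (a INT) ENGINE=InnoDB;
    TRUNCATE TABLE t;  -- internally: rename t to a #sql-ib name, create
                       -- the new empty table, commit; the old .ibd file
                       -- is unlinked only after the commit is durable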
* Merge 10.5 into 10.6 (Marko Mäkelä, 2021-03-05; 1 file, -2/+0)
* Merge 10.4 into 10.5 (Marko Mäkelä, 2021-03-05; 1 file, -4/+2)
* Remove unused HA_EXTRA_FAKE_START_STMT (Marko Mäkelä, 2021-03-05; 1 file, -4/+2)

  This is a fixup for commit f06a0b5338694755842a58798bb3a9a40da78bfd.
* MDEV-515 Reduce InnoDB undo logging for insert into empty table (Marko Mäkelä, 2021-01-25; 1 file, -2/+4)

  We implement an idea that was suggested by Michael 'Monty' Widenius
  in October 2017: When InnoDB is inserting into an empty table or
  partition, we can write a single undo log record TRX_UNDO_EMPTY,
  which will cause ROLLBACK to clear the table.

  For this to work, the insert into an empty table or partition must be
  covered by an exclusive table lock that will be held until the
  transaction has been committed or rolled back, or the INSERT
  operation has been rolled back (and the table is empty again), in
  lock_table_x_unlock().

  Clustered index records that are covered by the TRX_UNDO_EMPTY record
  will carry DB_TRX_ID=0 and DB_ROLL_PTR=1<<55, and thus they cannot be
  distinguished from what MDEV-12288 leaves behind after purging the
  history of row-logged operations.

  Concurrent non-locking reads must be adjusted: if the read view was
  created before the INSERT into an empty table, then we must continue
  to imagine that the table is empty, and not try to read any records.
  If the read view was created after the INSERT was committed, then all
  records must be visible normally. To implement this, we introduce the
  field dict_table_t::bulk_trx_id.

  This special handling only applies to the very first INSERT statement
  of a transaction for the empty table or partition. If a subsequent
  statement in the transaction is modifying the initially empty table
  again, we must enable row-level undo logging, so that we will be able
  to roll back to the start of the statement in case of an error (such
  as a duplicate key).

  INSERT IGNORE will continue to use row-level logging and locking,
  because implementing it would require the ability to roll back the
  latest row. Since the undo log that we write only allows us to roll
  back the entire statement, we cannot support INSERT IGNORE. We will
  introduce a handler::extra() parameter HA_EXTRA_IGNORE_INSERT to
  indicate to storage engines that INSERT IGNORE is being executed.

  In many test cases, we add an extra record to the table, so that
  during the 'interesting' part of the test, row-level locking and
  logging will be used.

  Replicas will continue to use row-level logging and locking until
  MDEV-24622 has been addressed. Likewise, this optimization will be
  disabled in Galera cluster until MDEV-24623 enables it.

  dict_table_t::bulk_trx_id: The latest active or committed transaction
  that initiated an insert into an empty table or partition. Protected
  by exclusive table lock and a clustered index leaf page latch.

  ins_node_t::bulk_insert: Whether bulk insert was initiated.

  trx_t::mod_tables: Use C++11 style accessors (emplace instead of
  insert). Unlike earlier, this collection will also cover temporary
  tables.

  trx_mod_table_time_t: Add start_bulk_insert(), end_bulk_insert(),
  is_bulk_insert(), was_bulk_insert().

  trx_undo_report_row_operation(): Before accessing any undo log pages,
  invoke trx->mod_tables.emplace() in order to determine whether undo
  logging was disabled, or whether this is the first INSERT and we are
  supposed to write a TRX_UNDO_EMPTY record.

  row_ins_clust_index_entry_low(): If we are inserting into an empty
  clustered index leaf page, set the ins_node_t::bulk_insert flag for
  the subsequent trx_undo_report_row_operation() call.

  lock_rec_insert_check_and_lock(), lock_prdt_insert_check_and_lock():
  Remove the redundant parameter 'flags' that can be checked in the
  caller.

  btr_cur_ins_lock_and_undo(): Simplify the logic. Correctly write
  DB_TRX_ID,DB_ROLL_PTR after invoking
  trx_undo_report_row_operation().

  trx_mark_sql_stat_end(), ha_innobase::extra(HA_EXTRA_IGNORE_INSERT),
  ha_innobase::external_lock(): Invoke trx_t::end_bulk_insert() so that
  the next statement will not be covered by table-level undo logging.

  ReadView::changes_visible(trx_id_t) const: New accessor for the case
  where the trx_id_t is not read from a potentially corrupted index
  page but directly from memory. In this case, we can skip a sanity
  check.

  row_sel(), row_sel_try_search_shortcut(), row_search_mvcc(),
  row_sel_try_search_shortcut_for_mysql(),
  row_merge_read_clustered_index(): Check dict_table_t::bulk_trx_id.

  row_sel_clust_sees(): Replaces lock_clust_rec_cons_read_sees().

  lock_sec_rec_cons_read_sees(): Replaced with lower-level code.

  btr_root_page_init(): Refactored from btr_create().

  dict_index_t::clear(), dict_table_t::clear(): Empty an index or
  table, for the ROLLBACK of an INSERT operation.

  ROW_T_EMPTY, ROW_OP_EMPTY: Note a concurrent ROLLBACK of an INSERT
  into an empty table.

  This is joint work with Thirunarayanan Balathandayuthapani, who
  created a working prototype. Thanks to Matthias Leich for extensive
  testing.
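  A minimal sketch of the user-visible effect (assuming the built-in
  sequence engine's seq_* tables for bulk data):

    CREATE TABLE t (a INT PRIMARY KEY) ENGINE=InnoDB;
    BEGIN;
    INSERT INTO t SELECT seq FROM seq_1_to_1000000;
      -- first INSERT into the empty table: covered by a single
      -- TRX_UNDO_EMPTY undo record and an exclusive table lock
    ROLLBACK;  -- clears the table instead of undoing a million rows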
* MDEV-19745 BACKUP STAGE BLOCK_DDL hangs on flush sequence table (Monty, 2020-06-14; 1 file, -0/+4)

  The problem was that FLUSH TABLES was trying to read the latest
  sequence state, which conflicted with a running ALTER SEQUENCE.
  Removed the reading of the state when opening a table for FLUSH, as
  it's not needed in this case.

  Other fix:
  - Fixed a potential issue with a concurrently running ALTER SEQUENCE,
    where the later ALTER could potentially read old data.
* Merge 10.4 into 10.5 (Marko Mäkelä, 2020-05-04; 1 file, -1/+7)
* MDEV-22401: Optimizer trace: multi-component range is not printed correctly (Sergei Petrunia, 2020-04-29; 1 file, -1/+7)

  KEY_MULTI_RANGE::range_flag does not have correct flag bits for
  per-endpoint flags (NEAR_MIN, NEAR_MAX, NO_MIN_RANGE, NO_MAX_RANGE).
  It only has bits for flags that describe both endpoints. So:

  - Document this.
  - Switch the optimizer trace to using {start|end}_key.flag values
    instead. This fixes the bug.
  - Switch records_in_column_ranges() to doing that too. (This used to
    work, because KEY_MULTI_RANGE::range_flag had the correct flag
    value for the last key component, and EITS only uses one-component
    pseudo-indexes.)
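  A quick way to observe the corrected output (a sketch; table t1 and
  key parts kp1/kp2 are hypothetical, and the exact trace layout may
  vary by version):

    SET optimizer_trace = 'enabled=on';
    SELECT * FROM t1 WHERE kp1 >= 1 AND kp1 < 10 AND kp2 > 5;
    -- range endpoints in the trace are now rendered from
    -- {start|end}_key.flag, e.g. "1 <= kp1 < 10"
    SELECT trace FROM information_schema.OPTIMIZER_TRACE;
    SET optimizer_trace = 'enabled=off';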
* Updated optimizer costs in multi_range_read_info_const() and sql_select.cc (Monty, 2020-03-27; 1 file, -1/+0)

  - multi_range_read_info_const now uses the new records_in_range
    interface.
  - Added handler::avg_io_cost().
  - Don't calculate avg_io_cost() in get_sweep_read_cost if avg_io_cost
    is not 1.0. In this case we trust the avg_io_cost() from the
    handler.
  - Changed test_quick_select to use TIME_FOR_COMPARE instead of
    TIME_FOR_COMPARE_IDX, to align this with the rest of the code.
  - Fixed a bug when using test_if_cheaper_ordering where we didn't use
    keyread if the index was changed.
  - Fixed a bug where we didn't use index-only read when using
    order-by-index.
  - Added keyread_time() to HEAP. The default keyread_time() was
    optimized for blocks and not suitable for HEAP. The effect was that
    HEAP preferred table scans over ranges for btree indexes.
  - Fixed get_sweep_read_cost() for HEAP tables.
  - Ensure that range and ref have the same cost for simple ranges.
    Added a small cost (MULTI_RANGE_READ_SETUP_COST) to ranges to
    ensure we favor ref over range for simple queries.
  - Fixed that matching_candidates_in_table() uses the same number of
    records as the rest of the optimizer.
  - Added avg_io_cost() to the JT_EQ_REF cost. This helps calculate the
    cost for HEAP and temporary tables better. A few tests changed
    because of this.
  - heap::read_time() and heap::keyread_time() adjusted to not add +1.
    This was to ensure that handler::keyread_time() doesn't give a
    higher cost for heap tables than for normal tables. One effect of
    this is that heap and derived tables stored in heap will prefer key
    access, as this is now regarded as cheap.
  - Changed the cost for index read in sql_select.cc to match
    multi_range_read_info_const(). All index cost calculation is now
    done through one function.
  - 'ref' will now use quick_cost for keys if it exists. This is done
    so that for '=' ranges, 'ref' is preferred over 'range'.
  - scan_time() now takes avg_io_cost() into account.
  - get_delayed_table_estimates() uses block_size and avg_io_cost().
  - Removed the default argument to test_if_order_by_key(); simplifies
    code.
* Added page_range to records_in_range() to improve range statistics (Monty, 2020-03-27; 1 file, -0/+11)

  Prototype change:

    - virtual ha_rows records_in_range(uint inx, key_range *min_key,
    -                                  key_range *max_key)
    + virtual ha_rows records_in_range(uint inx, const key_range *min_key,
    +                                  const key_range *max_key,
    +                                  page_range *res)

  The handler can ignore the page_range parameter. If the handler
  updates the parameter, the optimizer can deduce the following:

  - Whether the previous range's last key is on the same block as the
    next range's first key.
  - Whether the current key range is in one block.
  - We can also assume that the first and last blocks read are cached!
    This can be used for a better calculation of IO seeks when we
    estimate the cost of a range index scan.

  The parameter is fully implemented for MyISAM, Aria and InnoDB. A
  separate patch will update handler::multi_range_read_info_const() to
  take advantage of this change and also remove the double
  records_in_range() calls that are no longer needed.
* move Aria/S3 specific flag into Aria code (Sergei Golubchik, 2019-08-23; 1 file, -1/+0)

  It doesn't belong in include/my_base.h.
* MDEV-20306 Assert when converting encrypted Aria table to S3 (Monty, 2019-08-23; 1 file, -0/+1)

  Changes:
  - maria_create() now uses a bit in the parameter flags to check if
    the table should be encrypted, instead of using
    maria_encrypted_tables.
  - Don't encrypt tables that are to be converted to S3.
  - Added an encrypted flag to ARIA_TABLE_CAPABILITIES.
  - maria_chk --description now prints whether the table is encrypted.
    Other operations are not allowed on encrypted tables.
* Merge 10.4 into 10.5 (Marko Mäkelä, 2019-06-18; 1 file, -0/+1)
* Merge branch '10.3' into 10.4 (Oleksandr Byelkin, 2019-06-14; 1 file, -0/+1)
* Merge branch '10.2' into 10.3 (Oleksandr Byelkin, 2019-06-14; 1 file, -0/+1)
* Merge branch '10.1' into 10.2 (Oleksandr Byelkin, 2019-06-13; 1 file, -0/+1)
* Merge branch '5.5' into 10.1 (Oleksandr Byelkin, 2019-06-12; 1 file, -0/+1)
* MDEV-18479 Complement (Igor Babaev, 2019-05-28; 1 file, -0/+1)

  This patch complements the patch that fixes bug MDEV-18479. It takes
  care of a possible overflow when calculating the estimated number of
  rows in a materialized derived table / view.
* Merge 10.4 into 10.5 (Marko Mäkelä, 2019-05-23; 1 file, -1/+1)
* Merge branch '10.3' into 10.4 (Oleksandr Byelkin, 2019-05-19; 1 file, -1/+1)
* Merge 10.2 into 10.3 (Marko Mäkelä, 2019-05-14; 1 file, -1/+1)
* Merge 10.1 into 10.2 (Marko Mäkelä, 2019-05-13; 1 file, -1/+1)
* Merge branch '5.5' into 10.1 (Vicențiu Ciorbaru, 2019-05-11; 1 file, -1/+1)
* Follow-up to changing FSF address (Vicențiu Ciorbaru, 2019-05-11; 1 file, -1/+1)

  Some places didn't match the previous rules, making the Floor address
  wrong. Additional sed rules:

    sed -i -e 's/Place.*Suite .*, Boston/Street, Fifth Floor, Boston/g'
    sed -i -e 's/Suite .*, Boston/Fifth Floor, Boston/g'
* remove HA_ERR_INFO, use ER_ALTER_INFO (Sergei Golubchik, 2015-12-23; 1 file, -2/+1)
* Don't send error 0 to my_printf_error() (Monty, 2015-12-23; 1 file, -1/+2)

  Fixed by adding HA_ERR_INFO as an informational warning to be used by
  MyISAM. This is used to inform when we create a backup copy of the
  data file. Also improved informational messages when creating backup
  copies of data and index files.
* MDEV-17841 S3 storage engine (Monty, 2019-05-23; 1 file, -1/+1)

  A read-only storage engine that stores its data in (AWS) S3. To store
  data in S3 one can use ALTER TABLE:

    ALTER TABLE table_name ENGINE=S3

  libmarias3 integration done by Sergei Golubchik.
  libmarias3 created by Andrew Hutchings.
* MDEV-371 Unique Index for long columns (Sachin, 2019-02-22; 1 file, -1/+2)

  This patch implements an engine-independent unique hash index.

  Usage: a unique HASH index can be created automatically for a
  blob/varchar/text column whose key length > handler->max_key_length(),
  or it can be specified explicitly.

    Automatic creation:
      CREATE TABLE t1 (a blob unique);
    Explicit creation:
      CREATE TABLE t1 (a int, unique(a) using HASH);

  Internal KEY_PART representations: a long unique key_info has two
  representations. (Let's understand this with an example:
  create table t1(a blob, b blob, unique(a, b)); )

  1. User-given representation: the key_info->key_part array is similar
     to what the user has defined. So in this example it has two
     key_parts (a, b).
  2. Storage engine representation: there is only one key_part and it
     points to HASH_FIELD. This key_part always comes after the
     user-defined key_parts. So:

       User-given representation:
         [a] [b] [hash_key_part]
          ^-- key_info->key_part
       Storage engine representation:
         [a] [b] [hash_key_part]
                  ^-- key_info->key_part

  table->s->key_info will have the user-given representation, while
  table->key_info will have the storage engine representation. The
  representations can be changed into each other by calling the
  re/setup_keyinfo_hash functions.

  Working:

  1. When the user specifies a HASH index or the key length is >
     handler->max_key_length(), one extra vfield is added in
     mysql_prepare_create_table (for each long unique key), and
     key_info->algorithm is set to HA_KEY_ALG_LONG_HASH.
  2. In init_from_binary_frm_image, values for the hash key_part are
     set (like fieldnr, field and flags).
  3. In parse_vcol_defs, HASH_FIELD->vcol_info is created.
     Item_func_hash is used with a list of Item_fields; when an
     explicit length is given by the user, Item_left is used to
     concatenate the Item_field values.
  4. In ha_write_row/ha_update_row, check_duplicate_long_entry_key is
     called, which creates the hash key from table->record[0] and then
     calls ha_index_read_map; if we find a duplicated hash, we compare
     the result field by field.
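  A short demonstration of the behaviour described above (the exact
  error number is assumed to be the usual duplicate-key error):

    CREATE TABLE t1 (a BLOB, UNIQUE KEY (a) USING HASH);
    INSERT INTO t1 VALUES (REPEAT('x', 65536));
    INSERT INTO t1 VALUES (REPEAT('x', 65536));
      -- fails as a duplicate: the hash key matches first, then the
      -- rows are verified field by field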
* Fix a lot of compiler warnings found by -Wunused (Monty, 2018-04-26; 1 file, -1/+1)
* Misc. typos (luz.paz, 2018-04-05; 1 file, -1/+1)

  Found via:

    codespell -i 3 -w --skip="./debian/po" \
      -I ../mariadb-server-word-whitelist.txt \
      ./cmake/ ./debian/ ./Docs/ ./include/ ./man/ ./plugin/ ./strings/
* Merge bb-10.2-ext into 10.3 (Marko Mäkelä, 2018-01-30; 1 file, -2/+8)

  MDEV-11415 Remove excessive undo logging during ALTER TABLE…ALGORITHM=COPY

  Move a test from innodb.rename_table_debug to innodb.alter_copy.

  ha_innobase::extra(HA_EXTRA_BEGIN_ALTER_COPY): Register id-versioned
  tables so that mysql.transaction_registry will be updated, even for
  empty tables that are subjected to ALTER TABLE…ALGORITHM=COPY.
* Merge 10.2 into bb-10.2-ext (Marko Mäkelä, 2018-01-30; 1 file, -1/+7)
* MDEV-11415 Remove excessive undo logging during ALTER TABLE…ALGORITHM=COPY (Marko Mäkelä, 2018-01-30; 1 file, -2/+8)

  If a crash occurs during ALTER TABLE…ALGORITHM=COPY, InnoDB would
  spend a lot of time rolling back writes to the intermediate copy of
  the table. To reduce the amount of busy work done, a work-around was
  introduced in commit fd069e2bb36a3c1c1f26d65dd298b07e6d83ac8b in
  MySQL 4.1.8 and 5.0.2, to commit the transaction after every 10,000
  inserted rows.

  A proper fix would have been to disable the undo logging altogether
  and to simply drop the intermediate copy of the table on subsequent
  server startup. This is what happens in MariaDB 10.3 with
  MDEV-14717, MDEV-14585. In MariaDB 10.2, the intermediate copy of the
  table would be left behind with a name starting with the string #sql.

  This is a backport of a bug fix from MySQL 8.0.0 to MariaDB,
  contributed by jixianliang <271365745@qq.com>.

  Unlike recent MySQL, MariaDB supports ALTER IGNORE. For that
  operation InnoDB must for now keep the undo logging enabled, so that
  the latest row can be rolled back in case of an error. In Galera
  cluster, the LOAD DATA statement will retain the existing behaviour
  and commit the transaction after every 10,000 rows if the parameter
  wsrep_load_data_splitting=ON is set. The logic to do so (the
  wsrep_load_data_split() function and the call
  handler::extra(HA_EXTRA_FAKE_START_STMT)) are joint work by
  Ji Xianliang and Marko Mäkelä.

  The original fix:

    Author: Thirunarayanan Balathandayuthapani
            <thirunarayanan.balathandayuth@oracle.com>
    Date:   Wed Dec 2 16:09:15 2015 +0530

    Bug#17479594 AVOID INTERMEDIATE COMMIT WHILE DOING
    ALTER TABLE ALGORITHM=COPY

    Problem: During ALTER TABLE, we commit and restart the transaction
    for every 10,000 rows, so that the rollback after recovery would
    not take so long.

    Fix: Suppress the undo logging during copy alter operation. If
    fts_index is present then insert directly into fts auxiliary table
    rather than doing at commit time.

    ha_innobase::num_write_row: Remove the variable.

    ha_innobase::write_row(): Remove the hack for committing every
    10000 rows.

    row_lock_table_for_mysql(): Remove the extra 2 parameters.

    lock_get_src_table(), lock_is_table_exclusive(): Remove.

    Reviewed-by: Marko Mäkelä <marko.makela@oracle.com>
    Reviewed-by: Shaohua Wang <shaohua.wang@oracle.com>
    Reviewed-by: Jon Olav Hauglid <jon.hauglid@oracle.com>
* Timestamp-based versioning for InnoDB [closes #209] (Aleksey Midenkov, 2017-12-18; 1 file, -0/+4)

  - Removed integer_fields check
  - Reworked Vers_parse_info::check_sys_fields()
  - Misc renames
  - versioned as vers_sys_type_t
  - Removed versioned_by_sql(), versioned_by_engine(): versioned()
    works as before; versioned(VERS_TIMESTAMP) is versioned_by_sql();
    versioned(VERS_TRX_ID) is versioned_by_engine()
  - create_tmp_table() fix
  - Foreign constraints for timestamp-based
  - Range auto-specifier fix
  - SQL: 1-row partition rotation fix [fixes #260]
  - Fix 'drop system versioning, algorithm=inplace'
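  For reference, a timestamp-versioned table with explicit row-start
  and row-end columns looks like this (standard MariaDB syntax, shown
  only as context for the change):

    CREATE TABLE t1 (
      x INT,
      row_start TIMESTAMP(6) GENERATED ALWAYS AS ROW START,
      row_end   TIMESTAMP(6) GENERATED ALWAYS AS ROW END,
      PERIOD FOR SYSTEM_TIME (row_start, row_end)
    ) WITH SYSTEM VERSIONING;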
* MDEV-10177 Invisible Columns and Invisible Index (Sachin Setiya, 2017-12-15; 1 file, -1/+2)

  Feature definition: this feature adds invisible column functionality
  to the server. There are 4 levels of "invisibility":

  1. Not invisible (NOT_INVISIBLE): normal columns created by the user.
  2. A little bit invisible (USER_DEFINED_INVISIBLE): columns that the
     user has marked invisible. They aren't shown in SELECT * and they
     don't require values in INSERT INTO table VALUES (...). Otherwise
     they behave as normal columns.
  3. More invisible (SYSTEM_INVISIBLE): can be queried explicitly,
     otherwise invisible from everything. Think of the ROWID system
     column. Because they're invisible from ALTER TABLE and from CREATE
     TABLE, they cannot be created or dropped; they're created by the
     system. The user cannot create a column whose name is the same as
     a SYSTEM_INVISIBLE one.
  4. Very invisible (COMPLETELY_INVISIBLE): as above, but cannot be
     queried either. They can only show up in EXPLAIN EXTENDED (might
     be possible for a very invisible indexed virtual column), but
     otherwise they don't exist for the user. If the user creates a
     column with the same name as a COMPLETELY_INVISIBLE one, the
     COMPLETELY_INVISIBLE column is renamed again, so it is completely
     invisible from the user.

  Invisible index (HA_INVISIBLE_KEY): creation of invisible columns
  requires a new type of index that is only visible to the system. The
  user can't see/alter/create/delete this index. If the user creates an
  index with the same name as an invisible index, it will be renamed.

  Syntax details: only a USER_DEFINED_INVISIBLE column can be created
  by the user, by adding the INVISIBLE suffix after the column
  definition:

    CREATE TABLE t1 (a int INVISIBLE, b int);

  Rules: there are some rules/restrictions related to the use of
  invisible columns:

  1. All the columns in a table can't be invisible:
       CREATE TABLE t1 (a int INVISIBLE);                   -- error
       CREATE TABLE t1 (a int INVISIBLE, b int INVISIBLE);  -- error
  2. If you want an invisible column to be NOT NULL, you have to
     supply a default value for the column:
       CREATE TABLE t1 (a int, b int INVISIBLE NOT NULL);   -- error
  3. If you create a view / create a table with SELECT *, this won't
     copy invisible fields, so the newly created view/table won't have
     any invisible columns:
       CREATE TABLE t2 AS SELECT * FROM t1;
         -- t2 won't have t1's invisible column
       CREATE VIEW v1 AS SELECT * FROM t1;
         -- v1 won't have t1's invisible column
  4. Invisibility is not forwarded to the next table in any case of
     CREATE TABLE/VIEW AS SELECT */(a,b,c) FROM table:
       CREATE TABLE t2 AS SELECT a,b,c FROM t1;
         -- t2 will have t1's invisible column (b), but it won't be
         -- invisible in t2
       CREATE VIEW v1 AS SELECT a,b,c FROM t1;
         -- v1 will have t1's invisible column (b), but it won't be
         -- invisible in v1

  Implementation details:

  Parsing: INVISIBLE_SYM is added into vcol_attribute (so it's like a
  unique suffix). It is also added into keyword_sp_not_data_type so
  that a table can have a column named "invisible".

  Implementation detail is given for each modified/created function
  below (some functions are left out as they were self-explanatory;
  m = modified, n = newly created):

  mysql_prepare_create_table (m): extra checks for invisible columns
  are added. Some DBUG_EXECUTE_IF are also added for test cases.

  mysql_prepare_alter_table (m): now this drops all
  COMPLETELY_INVISIBLE columns and HA_INVISIBLE_KEY indexes. Further
  modifications are made to stop drop/change/delete of a
  SYSTEM_INVISIBLE column.

  build_frm_image (m): now this allows incorporating the
  field_visibility status into the frm image. To remain compatible with
  old frms, field_visibility info is only written when any of the
  fields is not NOT_INVISIBLE.

  extra2_write_additional_field_properties (n): this writes the field
  visibility info into the buffer. We first write EXTRA2_FIELD_FLAGS
  into the buffer/frm, then each next char has the field_visibility for
  each field.

  init_from_binary_frm_image (m): now if we find EXTRA2_FIELD_FLAGS, we
  read the next n (n = number of fields) chars and set the
  field_visibility. We also increment
  thd->status_var.feature_invisible_columns. One important thing to
  note: if we find that a key contains a field whose visibility is >
  USER_DEFINED_INVISIBLE, we declare this key an invisible key.
  sql_show.cc is changed accordingly to make SHOW TABLE, SHOW KEYS
  correct.

  mysql_insert (m): if we learn that we are doing an insert like
  INSERT INTO t1 VALUES (1,1); without explicitly specifying columns,
  we check whether we have invisible fields; if yes, we reset the whole
  record. Why? Because first we want hidden columns to get a
  default/null value. Second, auto_increment has the properties "no
  default" and "no null", which violates invisible-column rule 2, and
  because of this it was giving an error. Resetting table->record[0]
  eliminates this issue. For more info, put a breakpoint on
  handler::write_row and see the auto_increment value.

  fill_record (m): we continue the loop if we find an invisible column,
  because it is already reset / will get its value if it has a default.

  Test cases: since we cannot directly add a > USER_DEFINED_INVISIBLE
  column, debug_dbug is used to create it in
  mysql_prepare_create_table.

  Patch credit: Serg Golubchik
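  A compact sketch of the user-level behaviour described above:

    CREATE TABLE t1 (a INT INVISIBLE DEFAULT 0, b INT);
    INSERT INTO t1 VALUES (1);             -- only b is required
    INSERT INTO t1 (a, b) VALUES (2, 2);   -- a can still be set explicitly
    SELECT * FROM t1;                      -- returns only b
    SELECT a, b FROM t1;                   -- a appears when named explicitly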
* Adding multi_range_read support to partitions (Monty, 2017-12-03; 1 file, -0/+5)

  Other things:
  - Cleanup of allocated bitmaps done in open(), which simplifies
    init_partition_bitmaps().
  - Added needed defines in ha_spider.cc to enable new spider code.
  - Fixed some DBUG_PRINT() to be consistent with normal code.
  - Removed end space.
  - The changes in the test cases partition_innodb, partition_range,
    partition_pruning etc. are because partitions can now more exactly
    calculate the number of rows in a range.

  Contains spider patches:
  014,015,023,033,035,037,040,042,044,045,049,050,051,053,059
* Ensure that my_global.h is included first (Michael Widenius, 2017-08-24; 1 file, -1/+0)

  - Added the sql/mariadb.h file that should be included first by files
    in the sql directory, if sql_plugin.h is not used (sql_plugin.h
    adds SHOW variables that must be done before my_global.h is
    included).
  - Removed a lot of includes of my_global.h from include files.
  - Removed includes of some files that my_global.h automatically
    includes.
  - Removed duplicated includes of my_sys.h.
  - Replaced includes of my_config.h with my_global.h.
* MDEV-10139 Support for SEQUENCE objects (Monty, 2017-05-08; 1 file, -1/+3)

  - SETVAL(sequence_name, next_value, is_used, round)
  - ALTER SEQUENCE, including RESTART WITH

  Other things:
  - Added handler::extra() option HA_EXTRA_PREPARE_FOR_ALTER_TABLE to
    signal ha_sequence() that it should allow write_row statements.
  - ALTER ONLINE TABLE now works with SEQUENCEs.
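  A brief sketch of the two additions (return values depend on the
  sequence state, so the comments are indicative only):

    CREATE SEQUENCE s1 START WITH 100 INCREMENT BY 10;
    SELECT NEXT VALUE FOR s1;          -- 100
    SELECT SETVAL(s1, 500);            -- moves the sequence past 500
    ALTER SEQUENCE s1 RESTART WITH 1;  -- newly supported
    SELECT NEXT VALUE FOR s1;          -- 1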
* MDEV-10139 Support for SEQUENCE objects (Monty, 2017-04-07; 1 file, -1/+6)

  Working features:

    CREATE OR REPLACE [TEMPORARY] SEQUENCE [IF NOT EXISTS] name
      [ INCREMENT [ BY | = ] increment ]
      [ MINVALUE [=] minvalue | NO MINVALUE ]
      [ MAXVALUE [=] maxvalue | NO MAXVALUE ]
      [ START [ WITH | = ] start ]
      [ CACHE [=] cache ]
      [ [ NO ] CYCLE ]
      ENGINE=xxx COMMENT=".."
    SELECT NEXT VALUE FOR sequence_name;
    SELECT NEXTVAL(sequence_name);
    SELECT PREVIOUS VALUE FOR sequence_name;
    SELECT LASTVAL(sequence_name);
    SHOW CREATE SEQUENCE sequence_name;
    SHOW CREATE TABLE sequence_name;
    CREATE TABLE sequence-structure ... SEQUENCE=1
    ALTER TABLE sequence RENAME TO sequence2;
    RENAME TABLE sequence TO sequence2;
    DROP [TEMPORARY] SEQUENCE [IF EXISTS] sequence_names

  Missing features:
  - SETVAL(value, sequence_name), to be used with replication.
  - Check replication, including checking that sequence tables are
    marked not transactional.
  - Check that a commit happens for NEXT VALUE that changes table data
    (may already work).
  - ALTER SEQUENCE. ANSI SQL version of SETVAL.
  - Share identical sequence entries to not add things twice to the
    table list.
  - Testing insert/delete/update/truncate/load data.
  - Run and fix the Alibaba sequence tests (part of
    mysql-test/suite/sql_sequence).
  - Write documentation for NEXT VALUE / PREVIOUS VALUE.
  - NEXTVAL in DEFAULT.
  - Ensure that NEXTVAL in DEFAULT uses the database from the base
    table.
  - Two NEXTVAL for the same row should give the same answer.
  - Oracle syntax sequence_table.nextval, without any FOR or FROM.
  - Sequence tables are treated as 'not read constant tables' by
    SELECT; it would be better to have a separate list for sequence
    tables so that SELECT doesn't know about them, except if referred
    to with FROM.

  Other things done:
  - Improved output for safemalloc backtrace.
  - frm_type_enum changed to Table_type.
  - Removed lex->is_view and replaced it with lex->table_type. This
    allows us to more easily check if an item is a view, sequence or
    table.
  - Added the table flag HA_CAN_TABLES_WITHOUT_ROLLBACK, needed for
    handlers that want to support sequences.
  - Added handler calls:
    - engine_name(), to simplify getting the engine name for partition
      and sequences.
    - update_first_row(), to be able to do efficient sequence
      implementations.
  - Made binlog_log_row() global to be able to call it from
    ha_sequence.cc.
  - Added the handler variable row_already_logged, to be able to flag
    that the changed row is already logged to the replication log.
  - Added CF_DB_CHANGE and CF_SCHEMA_CHANGE flags to simplify
    deny_updates_if_read_only_option().
  - Added sp_add_cfetch() to avoid new conflicts in sql_yacc.yy.
  - Moved the code for add_table_options() out from
    sql_show.cc::show_create_table().
  - Added String::append_longlong() and used it in sql_show.cc to
    simplify code.
  - Added an extra option to dd_frm_type() and ha_table_exists() to
    indicate if the table is a sequence. Needed by DROP SEQUENCE to not
    drop a table.
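  Basic usage of the working feature set listed above:

    CREATE SEQUENCE s2 START WITH 1 INCREMENT BY 1 CACHE 10;
    SELECT NEXT VALUE FOR s2;      -- 1
    SELECT NEXTVAL(s2);            -- 2
    SELECT PREVIOUS VALUE FOR s2;  -- 2
    SHOW CREATE SEQUENCE s2;
    DROP SEQUENCE s2;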
* Fix many -Wconversion warnings (Marko Mäkelä, 2017-03-07; 1 file, -106/+106)

  Define my_thread_id as an unsigned type, to avoid mismatch with
  ulonglong. Change some parameters to this type. Use size_t in a few
  more places.

  Declare many flag constants as unsigned to avoid sign mismatch when
  shifting bits or applying the unary ~ operator. When applying the
  unary ~ operator to enum constants, explicitly cast the result to an
  unsigned type, because enum constants can be treated as signed.

  In InnoDB, change the source code line number parameters from ulint
  to an unsigned type. Also, make some InnoDB functions return a
  narrower type (unsigned or uint32_t instead of ulint; bool instead of
  ibool).
* cleanup: fix a comment (Sergei Golubchik, 2016-12-12; 1 file, -1/+1)
* misc after-merge changes (Sergei Golubchik, 2016-09-10; 1 file, -26/+8)

  - remove new InnoDB-specific ER_ and HA_ERR_ codes
  - rename a few old ER_ and HA_ERR_ error messages to be less
    MyISAM-specific
  - remove duplicate enum definitions (durability_properties,
    icp_result)
  - move new mysql-test include files to their owner suite
  - rename xtradb.rdiff files to *-disabled
  - remove a mistakenly committed helper perl module
  - remove the long obsolete handler::ha_statistic_increment() method
  - restore the standard C xid_t structure to not have setters and
    getters
  - remove xid_t::reset that was cleaning too much
  - move MySQL-5.7 ER_ codes where they belong
  - fix innodb to include service_wsrep.h, not internal wsrep headers
  - update tests and results
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * remove new InnoDB-specific ER_ and HA_ERR_ codes * renamed few old ER_ and HA_ERR_ error messages to be less MyISAM-specific * remove duplicate enum definitions (durability_properties, icp_result) * move new mysql-test include files to their owner suite * rename xtradb.rdiff files to *-disabled * remove mistakenly committed helper perl module * remove long obsolete handler::ha_statistic_increment() method * restore the standard C xid_t structure to not have setters and getters * remove xid_t::reset that was cleaning too much * move MySQL-5.7 ER_ codes where they belong * fir innodb to include service_wsrep.h not internal wsrep headers * update tests and results