path: root/sql/ha_partition.cc
Commit message  (Author, Age, Files, Lines)
* MDEV-17556 Assertion `bitmap_is_set_all(&table->s->all_set)' failed  (Nikita Malyavin, 2021-01-08, 1 file, -3/+3)
    The assertion failed in handler::ha_reset upon SELECT under READ UNCOMMITTED from a
    table with an index on a virtual column. This was a debug-only failure, though the
    problem is much wider:
    * MY_BITMAP is a structure containing my_bitmap_map, the latter being a raw bitmap.
    * read_set, write_set and vcol_set of TABLE are pointers to MY_BITMAP.
    * The rest of the MY_BITMAPs are stored in TABLE and TABLE_SHARE.
    * The addresses of the stored MY_BITMAPs, like orig_read_set etc., and sometimes
      all_set and tmp_set, are assigned to those pointers.
    * Sometimes tmp_use_all_columns is used to substitute the raw bitmap directly with
      all_set.bitmap.
    * Sometimes the bitmaps are even modified directly, as in
      TABLE::update_virtual_field(), which calls bitmap_clear_all(&tmp_set).
    The last three bullets, when used together (which is mostly always), make the program
    flow cumbersome and impossible to follow, notwithstanding the errors they cause, like
    this MDEV-17556, where the tmp_set pointer was assigned to read_set, write_set and
    vcol_set, then its bitmap was substituted with all_set.bitmap by a
    dbug_tmp_use_all_columns() call, and then bitmap_clear_all(&tmp_set) was applied on
    top of all that.
    To untangle this knot, one rule should be applied:
    * Never substitute bitmaps!
    This patch is about exactly that. The orig_* and all_set bitmaps are already never
    substituted. The patch changes the following function prototypes:
    * tmp_use_all_columns, dbug_tmp_use_all_columns: accept MY_BITMAP** and return
      MY_BITMAP* instead of my_bitmap_map*
    * tmp_restore_column_map, dbug_tmp_restore_column_maps: accept MY_BITMAP* instead of
      my_bitmap_map*
    These functions now substitute read_set/write_set/vcol_set directly and do not touch
    the underlying bitmaps.
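    A minimal self-contained sketch of the save/substitute/restore pattern the new
    prototypes enforce. The types and function names below are simplified stand-ins for
    the server structures named above, not the actual MariaDB code:

        #include <cstdint>

        // Simplified stand-ins for the server types discussed above.
        struct MY_BITMAP { uint32_t *bitmap; uint32_t n_bits; };
        struct TABLE_LIKE { MY_BITMAP *read_set; MY_BITMAP all_set; };

        // New-style helper: swaps the MY_BITMAP pointer itself and returns the
        // previous pointer, never touching the raw bitmap underneath.
        static MY_BITMAP *tmp_use_all_columns_sketch(TABLE_LIKE *t, MY_BITMAP **map)
        {
          MY_BITMAP *old= *map;
          *map= &t->all_set;          // substitute the pointer, not the bits
          return old;
        }

        static void tmp_restore_column_map_sketch(MY_BITMAP **map, MY_BITMAP *old)
        {
          *map= old;                  // restore the saved pointer
        }

        // Typical call-site shape:
        static void example(TABLE_LIKE *t)
        {
          MY_BITMAP *old= tmp_use_all_columns_sketch(t, &t->read_set);
          // ... safely read any column here ...
          tmp_restore_column_map_sketch(&t->read_set, old);
        }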
* Merge branch '10.1' into 10.2  (Oleksandr Byelkin, 2020-08-02, 1 file, -16/+16)
* Code comment spellfixes  (Ian Gilfillan, 2020-07-22, 1 file, -16/+16)
* assert(a && b); -> assert(a); assert(b);  (Sergei Golubchik, 2020-05-27, 1 file, -7/+10)
* Merge 10.1 into 10.2  (Marko Mäkelä, 2019-05-13, 1 file, -1/+1)
* Merge branch '5.5' into 10.1  (Vicențiu Ciorbaru, 2019-05-11, 1 file, -1/+1)
* Update FSF Address  (Vicențiu Ciorbaru, 2019-05-11, 1 file, -1/+1)
    * Update wrong zip-code
* Merge branch '10.1' into 10.2  (Oleksandr Byelkin, 2019-05-04, 1 file, -3/+8)
* Bug#28573894 ALTER PARTITIONED TABLE ADD AUTO_INCREMENT DIFF RESULT DEPENDING ON ALGORITHM  (Marko Mäkelä, 2019-04-25, 1 file, -3/+8)
    For partitioned tables, ensure that AUTO_INCREMENT values are assigned from the same
    sequence. This is based on the following change in MySQL 5.6.44:

    commit aaba359c13d9200747a609730dafafc3b63cd4d6
    Author: Rahul Malik <rahul.m.malik@oracle.com>
    Date:   Mon Feb 4 13:31:41 2019 +0530

        Bug#28573894 ALTER PARTITIONED TABLE ADD AUTO_INCREMENT DIFF RESULT DEPENDING ON ALGORITHM

        Problem: When a partitioned table is in-place altered to add an auto-increment
        column, its values start over for each partition.
        Analysis: In the case of in-place alter, InnoDB creates a new sequence object for
        each partition. It is default-initialized, so the auto-increment column starts
        over for each partition.
        Fix: Assign the old sequence of the partition to the sequence of the next
        partition so it does not start over.

        RB#21148
        Reviewed by Bin Su <bin.x.su@oracle.com>
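    A hedged sketch of the fix's idea (hypothetical names, not the InnoDB code): instead
    of letting every partition start its own default-initialized sequence, each
    partition's sequence is seeded from the previous one so values keep increasing:

        #include <cstdint>
        #include <vector>

        // Hypothetical stand-in for a per-partition auto-increment sequence.
        struct AutoincSequence { uint64_t next_value= 1; };

        // Carry the sequence forward across partitions instead of restarting it.
        static void chain_sequences(std::vector<AutoincSequence> &parts)
        {
          for (size_t i= 1; i < parts.size(); i++)
            if (parts[i].next_value < parts[i - 1].next_value)
              parts[i].next_value= parts[i - 1].next_value;
        }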
* MDEV-9519: Data corruption will happen on the Galera cluster size change  (Julius Goryavsky, 2019-02-25, 1 file, -19/+25)
    If we have a 2+ node cluster replicating from an async master with binlog_format set
    to STATEMENT, and multi-row inserts are executed on a table with an auto_increment
    column such that the values are generated automatically by MySQL, then the server
    node generates wrong auto_increment values, different from what was generated on the
    async master. The title of MDEV-9519 proposed to ban START SLAVE on Galera if the
    master's binlog_format = STATEMENT and wsrep_auto_increment_control = 1, but the
    problem can be solved without such a restriction.
    The causes and fixes:
    1. We need to improve the processing of auto-increment value changes after a change
       of the cluster size.
    2. If wsrep auto_increment_control is switched on while the node is running, we
       should immediately update the auto_increment_increment and auto_increment_offset
       global variables, without waiting for the next invocation of the
       wsrep_view_handler_cb() callback. In the current version these variables retain
       their initial values if wsrep_auto_increment_control is switched on while the node
       is running, which leads to inconsistent results on different nodes in some
       scenarios.
    3. If wsrep auto_increment_control is switched off while the node is running, we must
       restore the original values of the auto_increment_increment and
       auto_increment_offset global variables, as set by the user. To make this possible,
       we need to add "shadow copies" of these variables (which store the latest values
       set by the user).
    https://jira.mariadb.org/browse/MDEV-9519
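    A hedged sketch of the "shadow copy" idea (hypothetical structure and function names,
    not the wsrep code): remember the user-set values separately, apply cluster-derived
    values while auto-increment control is on, and restore the user's values as soon as
    it is switched off:

        #include <cstdint>

        struct AutoincSettings { uint64_t increment= 1, offset= 1; };

        static AutoincSettings current;      // what the server actually uses
        static AutoincSettings user_shadow;  // latest values set by the user

        static void on_user_set(uint64_t inc, uint64_t off, bool control_on,
                                uint64_t cluster_size, uint64_t node_index)
        {
          user_shadow= {inc, off};           // always remember the user's choice
          current= control_on ? AutoincSettings{cluster_size, node_index + 1}
                              : user_shadow;
        }

        static void on_control_toggled(bool control_on, uint64_t cluster_size,
                                       uint64_t node_index)
        {
          // Apply immediately, without waiting for the next view-change callback;
          // when switched off, fall back to the user's shadow values.
          current= control_on ? AutoincSettings{cluster_size, node_index + 1}
                              : user_shadow;
        }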
* MDEV-9519: Data corruption will happen on the Galera cluster size change  (Julius Goryavsky, 2019-02-26, 1 file, -19/+25)
    (Same commit message as the 2019-02-25 commit above.)
* MDEV-16429: Assertion `!table || (!table->read_set || bitmap_is_set(table->read_set, field_index))' fails upon attempt to update virtual column on partitioned versioned table  (Nikita Malyavin, 2018-12-20, 1 file, -35/+69)
    When using buffered sort in `UPDATE`, keyread is used. In this case,
    `TABLE::update_virtual_field` should be aborted, but it actually is not, because it
    is called not with the top-level handler, but with the one that is actually going to
    access the disk. Here the problem shows up with partitioning, so the solution is to
    recursively mark all the underlying partition handlers for keyread.

    * ha_partition: update keyread state for child partitions

    Closes #800
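    A hedged sketch of the recursive idea (hypothetical classes, not the actual handler
    API): when keyread is toggled on the partition handler, the state is forwarded to
    every underlying partition handler, so the handler that really touches the disk sees
    the same state as the top-level one:

        #include <vector>

        // Hypothetical, simplified handler hierarchy.
        struct handler_sketch
        {
          bool keyread_enabled= false;
          virtual void set_keyread(bool on) { keyread_enabled= on; }
          virtual ~handler_sketch() = default;
        };

        struct partition_handler_sketch : handler_sketch
        {
          std::vector<handler_sketch*> children;   // one handler per partition

          // Propagate keyread so code checking a child handler behaves the same
          // as code checking the top-level partition handler.
          void set_keyread(bool on) override
          {
            handler_sketch::set_keyread(on);
            for (handler_sketch *child : children)
              child->set_keyread(on);
          }
        };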
* Merge 10.1 into 10.2  (Marko Mäkelä, 2018-11-06, 1 file, -2/+7)
* Merge branch '10.0' into 10.1  (Sergei Golubchik, 2018-10-30, 1 file, -2/+7)
* MDEV-14815 - Server crash or AddressSanitizer errors or valgrind warnings in thr_lock / has_old_lock upon FLUSH TABLES  (Sergey Vojtovich, 2018-10-19, 1 file, -2/+7)
    Explicit partition access of a partitioned MEMORY table under LOCK TABLES may cause
    subsequent statements to crash the server, deadlock, or trigger valgrind warnings or
    ASAN errors. Freed memory was being used due to incorrect cleanup. At least MyISAM
    and InnoDB do not seem to be affected, since their THR_LOCK structures do not survive
    FLUSH TABLES. MEMORY keeps table shared data (including THR_LOCK) even if there are
    no open instances.
    There is the partition_info::lock_partitions bitmap, which holds bits of partitions
    allowed to be accessed after pruning. This bitmap is updated for each individual
    statement. This bitmap was abused in ha_partition::store_lock() such that when we
    need to unlock a table locked by LOCK TABLES, only locks for partitions that were
    accessed by the previous statement were released. Eventually FLUSH TABLES frees
    THR_LOCK_DATA objects which are still linked into THR_LOCK lists, and when such a
    THR_LOCK gets reused we end up with a freed-memory access.
    Fixed by using the ha_partition::m_locked_partitions bitmap, similarly to
    ha_partition::external_lock().
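    A hedged sketch of the fix's shape (hypothetical types; only the two bitmap names are
    taken from the message above): when releasing locks taken by LOCK TABLES, iterate the
    partitions recorded at lock time, not the per-statement pruning bitmap:

        #include <bitset>
        #include <cstddef>

        constexpr std::size_t MAX_PARTS= 64;   // illustrative limit only

        struct PartitionLocks
        {
          std::bitset<MAX_PARTS> lock_partitions;     // per-statement pruning result
          std::bitset<MAX_PARTS> m_locked_partitions; // partitions actually locked

          void unlock_partition(std::size_t /*part*/) { /* release THR_LOCK here */ }

          // Wrong: iterating lock_partitions can leave some THR_LOCK_DATA linked in
          // after FLUSH TABLES frees them. Right: release what was really locked.
          void release_all_locks()
          {
            for (std::size_t i= 0; i < MAX_PARTS; i++)
              if (m_locked_partitions.test(i))
                unlock_partition(i);
          }
        };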
* MDEV-16912: Spider Order By column[datatime] limit 5 returns 3 rows  (Jacob Mathew, 2018-09-13, 1 file, -0/+50)
    The problem occurs in 10.2 and earlier releases of MariaDB Server because the
    Partition Engine was not pushing the engine conditions to the underlying storage
    engine of each partition. This caused Spider to return the first 5 rows in the table
    with the data provided by the customer. 2 of the 5 rows did not qualify the WHERE
    clause, so they were removed from the result set by the server.
    To fix the problem, I have back-ported support for engine condition pushdown in the
    Partition Engine from MariaDB Server 10.3.

    Author: Jacob Mathew.
    Reviewer: Kentoku Shiba.
    Cherry-Picked: Commit eb2ca3d on branch bb-10.2-MDEV-16912
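    A hedged sketch of condition pushdown through a partition layer (hypothetical
    interfaces, not the back-ported 10.3 code): the pushed condition is handed to every
    underlying partition handler, and whatever part a child cannot handle is returned as
    a remainder for the server to re-check:

        #include <vector>

        struct Item_sketch {};   // stand-in for a parsed WHERE condition

        struct child_handler_sketch
        {
          // Returns the part of the condition the child could NOT handle
          // (nullptr means it handled everything).
          virtual const Item_sketch *cond_push(const Item_sketch *cond) { return cond; }
          virtual ~child_handler_sketch() = default;
        };

        struct partition_handler_sketch
        {
          std::vector<child_handler_sketch*> parts;

          const Item_sketch *cond_push(const Item_sketch *cond)
          {
            const Item_sketch *remainder= nullptr;
            for (child_handler_sketch *p : parts)
              if (const Item_sketch *r= p->cond_push(cond))
                remainder= r;    // at least one partition could not use it fully
            return remainder;    // server re-evaluates this part of the WHERE
          }
        };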
* MDEV-13089 identifier quoting in partitioning  (Sergei Golubchik, 2018-09-21, 1 file, -4/+2)
    cover ALTER TABLE
* MDEV-16912: Spider Order By column[datatime] limit 5 returns 3 rows  [bb-10.2-MDEV-16912]  (Jacob Mathew, 2018-09-11, 1 file, -0/+50)
    (Same commit message as the 2018-09-13 cherry-pick above, minus the Cherry-Picked
    line.)
* Merge 10.1 into 10.2  (Marko Mäkelä, 2018-06-26, 1 file, -1/+2)
* Merge 10.0 into 10.1  (Marko Mäkelä, 2018-06-26, 1 file, -1/+2)
* MDEV-15242 Poor RBR update performance with partitioned tables  (Andrei Elkin, 2018-06-25, 1 file, -1/+2)
    The observed and described difference in partitioned-engine execution time between
    master and slave was caused by excessive invocation of base_engine::rnd_init, which
    was done also for partitions not involved in the Rows-event operation. The bug's
    slave slowdown therefore scales with the number of partitions. Fixed by applying an
    upstream patch.

    References:
    ----------
    https://bugs.mysql.com/bug.php?id=73648
    Bug#25687813 REPLICATION REGRESSION WITH RBR AND PARTITIONED TABLES
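    A hedged sketch of the performance fix's idea (hypothetical names, not the upstream
    patch): initialize the underlying engine scan only for partitions the Rows event can
    actually touch, instead of calling rnd_init on every partition:

        #include <bitset>
        #include <cstddef>
        #include <vector>

        constexpr std::size_t MAX_PARTS= 64;   // illustrative only

        struct part_engine_sketch { void rnd_init() {} };

        struct partitioned_scan_sketch
        {
          std::vector<part_engine_sketch> parts;
          std::bitset<MAX_PARTS> used_partitions;  // pruning result for the event

          void start_rnd_scan()
          {
            // Only partitions involved in the Rows-event operation pay the
            // rnd_init cost; the others are skipped entirely.
            for (std::size_t i= 0; i < parts.size() && i < MAX_PARTS; i++)
              if (used_partitions.test(i))
                parts[i].rnd_init();
          }
        };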
* MDEV-15626 Assertion on update virtual column in partitioned table  (Sergei Golubchik, 2018-05-10, 1 file, -0/+2)
    table.cc: virtual columns must be computed for INSERT if they're part of the
    partitioning expression. This change broke gcol.gcol_partition_innodb; fix CHECK
    TABLE for partitioned tables and vcols.
    sql_partition.cc: mark prerequisite base columns in full_part_field_set.
    ha_partition.cc: initialize vcol_set accordingly.
* MDEV-11415 Remove excessive undo logging during ALTER TABLE…ALGORITHM=COPY  (Marko Mäkelä, 2018-01-30, 1 file, -1/+5)
    If a crash occurs during ALTER TABLE…ALGORITHM=COPY, InnoDB would spend a lot of time
    rolling back writes to the intermediate copy of the table. To reduce the amount of
    busy work done, a work-around was introduced in commit
    fd069e2bb36a3c1c1f26d65dd298b07e6d83ac8b in MySQL 4.1.8 and 5.0.2, to commit the
    transaction after every 10,000 inserted rows.
    A proper fix would have been to disable the undo logging altogether and to simply
    drop the intermediate copy of the table on subsequent server startup. This is what
    happens in MariaDB 10.3 with MDEV-14717, MDEV-14585. In MariaDB 10.2, the
    intermediate copy of the table would be left behind with a name starting with the
    string #sql.
    This is a backport of a bug fix from MySQL 8.0.0 to MariaDB, contributed by
    jixianliang <271365745@qq.com>.
    Unlike recent MySQL, MariaDB supports ALTER IGNORE. For that operation InnoDB must
    for now keep the undo logging enabled, so that the latest row can be rolled back in
    case of an error. In Galera cluster, the LOAD DATA statement will retain the existing
    behaviour and commit the transaction after every 10,000 rows if the parameter
    wsrep_load_data_splitting=ON is set. The logic to do so (the wsrep_load_data_split()
    function and the call handler::extra(HA_EXTRA_FAKE_START_STMT)) is joint work by
    Ji Xianliang and Marko Mäkelä.

    The original fix:
    Author: Thirunarayanan Balathandayuthapani <thirunarayanan.balathandayuth@oracle.com>
    Date:   Wed Dec 2 16:09:15 2015 +0530

        Bug#17479594 AVOID INTERMEDIATE COMMIT WHILE DOING ALTER TABLE ALGORITHM=COPY

        Problem: During ALTER TABLE, we commit and restart the transaction for every
        10,000 rows, so that the rollback after recovery would not take so long.
        Fix: Suppress the undo logging during the copy alter operation. If fts_index is
        present, then insert directly into the fts auxiliary table rather than doing it
        at commit time.

        ha_innobase::num_write_row: Remove the variable.
        ha_innobase::write_row(): Remove the hack for committing every 10000 rows.
        row_lock_table_for_mysql(): Remove the extra 2 parameters.
        lock_get_src_table(), lock_is_table_exclusive(): Remove.

        Reviewed-by: Marko Mäkelä <marko.makela@oracle.com>
        Reviewed-by: Shaohua Wang <shaohua.wang@oracle.com>
        Reviewed-by: Jon Olav Hauglid <jon.hauglid@oracle.com>
* Merge remote-tracking branch 'origin/10.1' into bb-10.2-vicentiu  (Vicențiu Ciorbaru, 2017-12-28, 1 file, -36/+33)
* MDEV-14026 ALTER TABLE ... DELAY_KEY_WRITE=1 creates table copy for partitioned MyISAM table with DATA DIRECTORY/INDEX DIRECTORY options  (Sergei Golubchik, 2017-12-25, 1 file, -9/+25)
    Set data_file_name and index_file_name in HA_CREATE_INFO before calling
    check_if_incompatible_data().
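    A hedged sketch of the one-line idea (hypothetical types and names, not HA_CREATE_INFO
    itself): fill in the file names before the compatibility check, so the check does not
    see a spurious difference and force a table copy:

        #include <string>

        struct create_info_sketch
        {
          std::string data_file_name;
          std::string index_file_name;
        };

        static bool incompatible_sketch(const create_info_sketch &a,
                                        const create_info_sketch &b)
        {
          // Simplified: a file-name mismatch is reported as incompatible, which
          // is what forced the needless table copy before the fix.
          return a.data_file_name != b.data_file_name ||
                 a.index_file_name != b.index_file_name;
        }

        static bool fast_alter_possible(create_info_sketch info,
                                        const create_info_sketch &current)
        {
          // Fix: populate the names first, then run the compatibility check.
          info.data_file_name= current.data_file_name;
          info.index_file_name= current.index_file_name;
          return !incompatible_sketch(info, current);
        }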
* cleanup: ha_partition::update_create_info  (Sergei Golubchik, 2017-12-25, 1 file, -27/+8)
* Merge branch '10.1' into 10.2  (Sergei Golubchik, 2017-10-24, 1 file, -3/+3)
* Merge branch '10.0' into 10.1  (Sergei Golubchik, 2017-10-22, 1 file, -3/+3)
* Merge branch '5.5' into 10.0  (Sergei Golubchik, 2017-10-18, 1 file, -3/+3)
* Merge branch 'mysql/5.5' into 5.5  (Sergei Golubchik, 2017-10-17, 1 file, -2/+3)
* Bug#26390632: CREATE TABLE CAN CAUSE MYSQL TO EXIT.  (Nisha Gopalakrishnan, 2017-08-23, 1 file, -59/+91)
    Analysis
    ========
    CREATE TABLE of an InnoDB table with a partition name which exceeds the path limit
    can cause the server to exit. During the preparation of the partition name, there was
    no check to identify whether the complete path name for the partition exceeds the
    maximum supported path length, causing the server to exit during subsequent
    processing.

    Fix
    ===
    During the preparation of the partition name, check and report an error if the
    partition path name exceeds the maximum path name limit.

    This is a 5.5 patch.
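    A hedged sketch of the missing check (hypothetical names and limit; the real code
    uses the server's path-length limit such as FN_REFLEN): build the partition file name
    and refuse it when it exceeds the maximum length instead of continuing with it:

        #include <cstdio>
        #include <string>

        constexpr std::size_t MAX_PATH_LEN= 512;   // stand-in for the real limit

        // Returns false (and reports an error) when the composed partition path
        // would exceed the supported path length.
        static bool make_partition_path(std::string &out,
                                        const std::string &table_path,
                                        const std::string &partition_name)
        {
          out= table_path + "#P#" + partition_name;   // naming-convention sketch
          if (out.size() >= MAX_PATH_LEN)
          {
            std::fprintf(stderr, "partition path too long: %zu bytes\n", out.size());
            return false;                             // report instead of exiting
          }
          return true;
        }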
* backport of Bug#17401628  (Mattias Jonsson, 2013-11-20, 1 file, -4/+7)
    revid:mattias.jonsson@oracle.com-20131119103616-u6t82s8cpgp0q3ex

    Use of uninitialized memory in the priority queue used for returning records in
    sorted order. It happens if no previous partition has returned a row since the
    beginning of index_init, an index_read* call returned HA_ERR_KEY_NOT_FOUND for all
    partitions (otherwise the record buffer/priority queue would be initialized), and an
    index_next/prev call where all partitions return HA_ERR_END_OF_FILE.
* Bug#17588348: INDEX MERGE USED ON PARTITIONED TABLE CAN RETURN WRONG RESULT SET  (Aditya A, 2013-11-05, 1 file, -7/+17)
    PROBLEM
    -------
    In ha_partition::cmp_ref() we were only calling the underlying cmp_ref() of the
    storage engine if the records are in the same partition; otherwise we sort by
    partition and return the result. But the index merge intersect algorithm expects the
    sort to be by row-id first and then by partition id.

    FIX
    ---
    Compare the references first using the storage engine's cmp_ref, and then, if the
    references are equal (which only happens if a non-clustered index is used), sort by
    partition id.

    [Approved by Mattiasj #rb3755]
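    A hedged sketch of the new comparison order (hypothetical types, not the real
    handler::cmp_ref signature): compare the storage-engine row references first, and
    only fall back to the partition id when the references compare equal:

        #include <cstdint>
        #include <cstring>

        struct part_ref_sketch
        {
          uint32_t part_id;
          unsigned char engine_ref[8];   // engine-specific row reference
        };

        // Stand-in for the underlying handler's cmp_ref.
        static int engine_cmp_ref(const unsigned char *a, const unsigned char *b)
        {
          return std::memcmp(a, b, 8);
        }

        // Order expected by index-merge intersect: row reference first, then
        // partition id as a tie breaker (only reachable with non-clustered indexes).
        static int partition_cmp_ref(const part_ref_sketch &a, const part_ref_sketch &b)
        {
          if (int cmp= engine_cmp_ref(a.engine_ref, b.engine_ref))
            return cmp;
          return a.part_id < b.part_id ? -1 : (a.part_id > b.part_id ? 1 : 0);
        }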
* Bug #16051817 GOT ERROR 124 FROM STORAGE ENGINE ON DELETE FROM A PARTITIONED TABLE  (Aditya A, 2013-10-21, 1 file, -3/+20)
    PROBLEM
    -------
    The user first disables all the non-unique indexes in the table and then rebuilds one
    partition. During the rebuild the indexes on that particular partition are enabled.
    Now when we issue a query, the optimizer is unaware that the indexes are enabled on
    only one partition, and if the optimizer selects that index, MyISAM thinks that the
    index is not active and gives an error.

    FIX
    ---
    Before rebuilding a partition, check whether non-unique indexes are disabled on the
    partitions. If they are disabled, then after the rebuild disable the index on the
    partition.

    [Approved by Mattiasj #rb3469]
* Bug #11766851 QUERYING I_S.PARTITIONS CHANGES THE CARDINALITY OF THE PARTITIONS  (Aditya A, 2013-07-29, 1 file, -1/+1)
    ANALYSIS
    --------
    Whenever we query I_S.partitions, ha_partition::get_dynamic_partition_info() is
    called, which resets the cardinality according to the number of rows in the last
    partition.

    FIX
    ---
    When we call get_dynamic_partition_info(), avoid passing the flag HA_STATUS_CONST to
    info(), since HA_STATUS_CONST should ideally not be used per partition.

    [Approved by mattiasj rb#2830]
* Bug#16589511: MYSQL_UPGRADE FAILS TO WRITE OUT ENTIRE ALTER TABLE ... ALGORITHM= ... STATEMENT  (Mattias Jonsson, 2013-06-28, 1 file, -31/+45)
    The problem was an intermediate buffer of smaller size, which truncated the ALTER
    statement. Solved by providing the size of the buffer to be allocated through the
    function call, instead of using a one-size-fits-all stack buffer inside the function.
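    A hedged sketch of the fix's shape (hypothetical function and parameters): the caller
    supplies the destination buffer and its size, so the generated ALTER statement is no
    longer truncated by a fixed-size stack buffer inside the function:

        #include <cstdio>
        #include <cstddef>

        // Before: a fixed internal buffer silently truncated long statements.
        // After: the caller passes the buffer and its real size; truncation is
        // detected and reported instead of being silent.
        static int format_alter_statement(char *buf, std::size_t buf_size,
                                          const char *table, const char *algorithm)
        {
          int n= std::snprintf(buf, buf_size,
                               "ALTER TABLE %s ALGORITHM=%s", table, algorithm);
          return (n < 0 || static_cast<std::size_t>(n) >= buf_size) ? -1 : n;
        }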
* Bug#13548704 ALGORITHM USED FOR DROPPING PARTITIONED TABLE CAN LEAD TO INCONSISTENCY  (Aditya A, 2013-06-14, 1 file, -14/+30)
    [Merge from 5.1]
* Bug#13548704 ALGORITHM USED FOR DROPPING PARTITIONED TABLE CAN LEAD TO INCONSISTENCY  (Aditya A, 2013-06-14, 1 file, -14/+30)
    PROBLEM
    -------
    When we drop a partitioned table, we first gather the information about the
    partitions in the table from the table_name.par file and store it in an internal data
    structure. Then we delete this file and the data in the table. If the server crashes
    after deleting the file, then after recovering we cannot access the table. We cannot
    even drop the table, because the drop algorithm requires the par file to read the
    partition information.

    FIX
    ---
    1. We move the part of deleting the par file to after deleting all the table data
       from the storage engine.
    2. During the drop operation, if we detect that the par file is missing, then we
       delete the .frm file, since there is no way of recovering without the par file.

    [Approved by Mattias rb#2576]
* Bug#16274455: CAN NOT ACESS PARTITIONED TABLES WHEN DOWNGRADED FROM 5.6.11 TO 5.6.10  (Mattias Jonsson, 2013-02-14, 1 file, -1/+2)
    The problem was new syntax not being accepted by the previous version. Fixed by
    adding a version comment of /*!50531 around the new syntax, like this in the .frm
    file:
    'PARTITION BY KEY /*!50611 ALGORITHM = 2 */ () PARTITIONS 3'
    and also changing the output from SHOW CREATE TABLE to:
    CREATE TABLE t1 (a INT)
    /*!50100 PARTITION BY KEY */ /*!50611 ALGORITHM = 1 */ /*!50100 () PARTITIONS 3 */
    It will always add the ALGORITHM into the .frm for KEY [sub]partitioned tables, but
    for SHOW CREATE TABLE it will only add it in case it is the non-default
    ALGORITHM = 1.
    Also notice that for 5.5 it will say /*!50531 instead of /*!50611, which will make
    upgrade from 5.5 > 5.5.31 to 5.6 < 5.6.11 fail! If one downgrades a fixed version to
    the same major version (5.5 or 5.6), bug 14521864 will be visible again, but unless
    the .frm is updated, it will work again when upgrading again.
    Also fixed so that the .frm does not get an updated version if a single partition
    check passes.
* Bug#14521864: MYSQL 5.1 TO 5.5 BUGS PARTITIONING  (Mattias Jonsson, 2013-01-30, 1 file, -20/+463)
    Due to an internal change in the server code between 5.1 and 5.5 (wl#2649), the hash
    function used in KEY partitioning changed for numeric and date/time columns (from
    binary hash calculation to character-based hash calculation). Also enum/set changed
    from latin1 ci based hash calculation to binary hash between 5.1 and 5.5
    (bug#11759782). These changes make KEY [sub]partitioned tables on any of the affected
    column types incompatible with 5.5 and above, since the calculation of the partition
    id differs. Also, since InnoDB asserts that a deleted row was previously read
    (positioned), the server asserts on delete of a row that is in the wrong partition.

    The solution for this situation is:

    1) The partitioning engine will check that delete/update will go to the partition the
       row was read from and give an error otherwise, consisting of the row's
       partitioning fields. This will avoid asserts in InnoDB and also alert the user
       that there is a misplaced row. A detailed error message will be given, including
       an entry in the error log consisting of table name, partition and row content (PK
       if it exists, otherwise all partitioning columns).

    2) A new optional syntax for KEY () partitioning in 5.5 is allowed:
       [SUB]PARTITION BY KEY [ALGORITHM = N] (list_of_cols)
       where N = 1 uses the same hashing as 5.1 (numeric/date/time fields use binary
       hashing, ENUM/SET use charset hashing) and N = 2 uses the same hashing as 5.5
       (numeric/date/time fields use charset hashing, ENUM/SET use binary hashing). If
       not set on CREATE/ALTER it will default to 2. This new syntax should probably be
       ignored by NDB.

    3) Since there is a demand for avoiding scanning through the full table, during
       upgrade the ALTER TABLE t PARTITION BY ... command is considered a no-op (only a
       .frm change) if everything except ALGORITHM is the same and ALGORITHM was not set
       before, which allows manually upgrading such a table by something like:
       ALTER TABLE t PARTITION BY KEY ALGORITHM = 1 () or
       ALTER TABLE t PARTITION BY KEY ALGORITHM = 2 ()

    4) Enhanced partitioning with CHECK/REPAIR to also check for/repair misplaced rows.
       (Also works for ALTER TABLE t CHECK/REPAIR PARTITION.)
       CHECK FOR UPGRADE: if the .frm version is < 5.5.3 and uses KEY [sub]partitioning
       and an affected column type, then it will fail with a message:
       KEY () partitioning changed, please run:
       ALTER TABLE `test`.`t1` PARTITION BY KEY ALGORITHM = 1 (a) PARTITIONS 12
       (i.e. the current partitioning clause, with the addition of ALGORITHM = 1)
       CHECK without FOR UPGRADE: if MEDIUM (default) or EXTENDED options are given, scan
       all rows and verify that each is in the correct partition; fail for the first
       misplaced row.
       REPAIR: if default or EXTENDED (i.e. not QUICK/USE_FRM), scan all rows and move
       every misplaced row into its correct partition.

    5) Updated mysqlcheck (called by mysql_upgrade) to handle the new output from CHECK
       FOR UPGRADE, to run the ALTER statement instead of running REPAIR. This will allow
       mysql_upgrade (or CHECK TABLE t FOR UPGRADE) to upgrade a KEY [sub]partitioned
       table that has any affected field type and a .frm version < 5.5.3 to ALGORITHM = 1
       without a rebuild.

    Also notice that if the .frm has a version of >= 5.5.3 and ALGORITHM is not set, it
    is not possible to know if it consists of rows from 5.1 or 5.5! In these cases I
    suggest that the user does:
    (optional) LOCK TABLE t WRITE;
    SHOW CREATE TABLE t;
    (verify that it has no ALGORITHM = N, and to be safe, I would suggest backing up the
    .frm file, to be used if one needs to change to another ALGORITHM = N without needing
    to rebuild/repair)
    ALTER TABLE t <old partitioning clause, but with ALGORITHM = N>;
    which should set the ALGORITHM to N (if the table has rows from 5.1 I would suggest
    N = 1, otherwise N = 2)
    CHECK TABLE t;
    (here one could use the backed up .frm instead, change to a new N, and run CHECK
    again to see if it passes)
    and if there are misplaced rows:
    REPAIR TABLE t;
    (optional) UNLOCK TABLES;
* merge  (Mattias Jonsson, 2012-12-27, 1 file, -0/+1)
* Bug#14845133:  (Mattias Jonsson, 2012-11-13, 1 file, -2/+19)
    The problem is related to the changes made in bug#13025132. get_partition_set can do
    dynamic pruning, which limits the partitions to scan even further. This is not
    accounted for when setting the correct start of the preallocated record buffer used
    in the priority queue, thus leading to the wrong buffer being used (including a wrong
    preset partitioning id connected to that buffer).
    Solution is to fast forward the buffer pointer to point to the correct partition
    record buffer.
* manual merge of bug#14845133 mysql-5.1 -> mysql-5.5  (Mattias Jonsson, 2012-11-13, 1 file, -2/+19)
* Bug#14845133:  (Mattias Jonsson, 2012-11-13, 1 file, -2/+19)
    (Same commit message as the Bug#14845133 entry above.)
* Bug#14495351: CRASH IN HA_PARTITION::HANDLE_UNORDERED_NEXT  (Jon Olav Hauglid, 2012-10-03, 1 file, -1/+1)
    Follow-up patch - fix broken build:
    error: format ‘%u’ expects argument of type ‘unsigned int’, but argument 2 has type
    ‘key_part_map {aka long unsigned int}’ [-Werror=format]
* Bug#14495351: CRASH IN HA_PARTITION::HANDLE_UNORDERED_NEXT  (Mattias Jonsson, 2012-09-10, 1 file, -46/+172)
    The partitioning engine does not implement index_next for partitions which return
    HA_ERR_KEY_NOT_FOUND in index_read_map. If HA_ERR_KEY_NOT_FOUND was returned by a
    partition during index_read_map, that partition would not be included in following
    calls to index_next. If no partition returned a row in index_read_map, then the
    subsequent call to index_next would try to use a non-existing handler (index out of
    bound). Even after fixing the index out of bound, the problem remained if at least
    one partition returned a row. So it is really two connected bugs:
    1) crash due to index out of bound (-1 unsigned);
    2) not including partitions that returned HA_ERR_KEY_NOT_FOUND.
    Fixed by recording the partitions that returned HA_ERR_KEY_NOT_FOUND, and including
    them too when doing handle_ordered_next the first time.
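    A hedged sketch of the bookkeeping added by the fix (hypothetical names and error
    value; only the function and error names come from the message above): remember which
    partitions answered HA_ERR_KEY_NOT_FOUND during index_read_map and include them again
    the first time handle_ordered_next runs:

        #include <bitset>
        #include <cstddef>

        constexpr int HA_ERR_KEY_NOT_FOUND_SK= 120;   // stand-in error code
        constexpr std::size_t MAX_PARTS= 64;

        struct ordered_scan_sketch
        {
          std::bitset<MAX_PARTS> key_not_found;  // partitions to retry on first next
          bool first_ordered_next= true;

          void record_index_read(std::size_t part, int error)
          {
            if (error == HA_ERR_KEY_NOT_FOUND_SK)
              key_not_found.set(part);           // do not drop it from the scan
          }

          bool include_partition_in_next(std::size_t part, bool returned_row)
          {
            // On the first handle_ordered_next call, partitions that had no exact
            // match still take part, so index_next can position them correctly.
            return returned_row ||
                   (first_ordered_next && key_not_found.test(part));
          }
        };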
* merge  (Mattias Jonsson, 2012-08-20, 1 file, -1/+1)
* Bug#13025132 - PARTITIONS USE TOO MUCH MEMORY  (Mattias Jonsson, 2012-08-20, 1 file, -1/+1)
    pre-push fix, removed unused variable.
* merge  (Mattias Jonsson, 2012-08-20, 1 file, -18/+13)
* Bug#13025132 - PARTITIONS USE TOO MUCH MEMORY  (Mattias Jonsson, 2012-08-17, 1 file, -18/+13)
    Additional patch to remove the part_id -> ref_buffer offset. The partitioning id and
    the associated record buffer can be found without having to calculate them. By
    initializing it for each used partition, and then reusing the key-buffer from the
    queue, it is not needed to have such a map.