delta/mariadb-git.git - github.com: MariaDB/server.git

	Commit message	Author	Age	Files	Lines
*	A cleanup for MDEV-30695 Refactor case folding data types in Asian collations	Alexander Barkov	2023-03-03	1	-13/+13
\|	Adding "const" qualifiers to casefold_info_st::page
*	MDEV-30695 Refactor case folding data types in Asian collations	Alexander Barkov	2023-02-21	1	-1064/+1068
\|	This is a non-functional change and should not change the server behavior.
\|	Casefolding information is now stored in items of a new data type MY_CASEFOLD_CHARACTER:
\|	  typedef struct casefold_info_char_t { uint32 toupper; uint32 tolower; } MY_CASEFOLD_CHARACTER;
\|	Before this change, casefolding tables for Asian collations were stored in:
\|	  typedef struct unicase_info_char_st { uint32 toupper; uint32 tolower; uint32 sort; } MY_UNICASE_CHARACTER;
\|	The "sort" member was not used in the code handling Asian collations, it only wasted space (it's only used by Unicode _general_ci and _general_mysql500_ci collations).
\|	Unicode collations (at least UCA and _bin) should also be refactored later, but under terms of a separate task.
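The space saving described in this commit message can be illustrated with a minimal sketch. The two struct layouts are taken from the message, with `uint32` spelled as the standard `uint32_t` (an assumption about its width). The helper `casefold_bytes_saved()` is hypothetical and is not part of the server code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old layout, as quoted in the commit message (uint32 assumed to be uint32_t). */
typedef struct unicase_info_char_st
{
  uint32_t toupper;
  uint32_t tolower;
  uint32_t sort;      /* not used by the Asian collations */
} MY_UNICASE_CHARACTER;

/* New layout without the "sort" member. */
typedef struct casefold_info_char_t
{
  uint32_t toupper;
  uint32_t tolower;
} MY_CASEFOLD_CHARACTER;

/* Hypothetical helper: bytes saved by a casefolding table of n characters. */
static size_t casefold_bytes_saved(size_t n)
{
  const size_t per_char= sizeof(MY_UNICASE_CHARACTER) -
                         sizeof(MY_CASEFOLD_CHARACTER);
  assert(n <= SIZE_MAX / per_char);   /* guard against overflow */
  return n * per_char;
}
```

On typical platforms each entry shrinks by the size of one 32-bit member, so a 256-entry page would be expected to save about 1 KiB.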
*	MDEV-30661 UPPER() returns an empty string for U+0251 in uca1400 collations for utf8	Alexander Barkov	2023-02-17	1	-13/+7
\|	String length growth during upper/lower conversion in Unicode collations depends only on the underlying MY_UNICASE_INFO used in the collation. Maintaining separate members CHARSET_INFO::caseup_multiply and CHARSET_INFO::casedn_multiply duplicated this information and caused bugs like this one (when MY_UNICASE_INFO and case??_multiply went out of sync because of incomplete CHARSET_INFO initialization).
\|	Fix: Changing CHARSET_INFO::caseup_multiply and CHARSET_INFO::casedn_multiply from members to virtual functions. The virtual functions in Unicode collations calculate case conversion growth factors from the MY_UNICASE_INFO. This guarantees that the growth factors are always in sync with the MY_UNICASE_INFO.
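The idea of deriving the growth factor from the case information, instead of storing a second copy that can drift, might be sketched as follows. Every name here (`DEMO_UNICASE_INFO`, `DEMO_CHARSET`, the `demo_*` functions and the two growth fields) is hypothetical; the real MY_UNICASE_INFO and CHARSET_INFO layouts are not shown in this log and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MY_UNICASE_INFO: only what the sketch needs. */
typedef struct demo_unicase_info
{
  unsigned max_upper_growth;  /* assumed worst-case growth factor for UPPER() */
  unsigned max_lower_growth;  /* assumed worst-case growth factor for LOWER() */
} DEMO_UNICASE_INFO;

/* Hypothetical stand-in for CHARSET_INFO: the factors are functions, not members. */
typedef struct demo_charset
{
  const DEMO_UNICASE_INFO *casefold;
  unsigned (*caseup_multiply)(const struct demo_charset *cs);
  unsigned (*casedn_multiply)(const struct demo_charset *cs);
} DEMO_CHARSET;

/* The factor is always read from the case information itself. */
static unsigned demo_caseup_multiply(const DEMO_CHARSET *cs)
{
  assert(cs->casefold != NULL);
  return cs->casefold->max_upper_growth;
}

static unsigned demo_casedn_multiply(const DEMO_CHARSET *cs)
{
  assert(cs->casefold != NULL);
  return cs->casefold->max_lower_growth;
}

/* Buffer sizes needed to hold the converted form of src_len input bytes. */
static size_t demo_upper_buffer_size(const DEMO_CHARSET *cs, size_t src_len)
{
  return src_len * cs->caseup_multiply(cs);
}

static size_t demo_lower_buffer_size(const DEMO_CHARSET *cs, size_t src_len)
{
  return src_len * cs->casedn_multiply(cs);
}
```

Because the factor is computed on demand, there is no separately initialized member that could be left out of sync with the case information.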
*	MDEV-27009 Add UCA-14.0.0 collations	Alexander Barkov	2022-08-10	1	-8/+16
\|	- Added one neutral and 22 tailored (language specific) collations based on Unicode Collation Algorithm version 14.0.0. Collations were added for the Unicode character sets utf8mb3, utf8mb4, ucs2, utf16, utf32. Every tailoring was added with four accent and case sensitivity flag combinations, e.g.:
\|	  * utf8mb4_uca1400_swedish_as_cs
\|	  * utf8mb4_uca1400_swedish_as_ci
\|	  * utf8mb4_uca1400_swedish_ai_cs
\|	  * utf8mb4_uca1400_swedish_ai_ci
\|	  and their _nopad_ variants:
\|	  * utf8mb4_uca1400_swedish_nopad_as_cs
\|	  * utf8mb4_uca1400_swedish_nopad_as_ci
\|	  * utf8mb4_uca1400_swedish_nopad_ai_cs
\|	  * utf8mb4_uca1400_swedish_nopad_ai_ci
\|	- Introducing a concept of contextually typed named collations:
\|	    CREATE DATABASE db1 CHARACTER SET utf8mb4;
\|	    CREATE TABLE db1.t1 (a CHAR(10) COLLATE uca1400_as_ci);
\|	  The idea is that there is no need to specify the character set prefix in the new collation names. It's enough to type just the suffix "uca1400_as_ci". The character set is taken from the context. In the above example script the context character set is utf8mb4, so the CREATE TABLE will make a column with the collation utf8mb4_uca1400_as_ci. Short collation names can be used in any part of the SQL syntax where the COLLATE clause is understood.
\|	- New collations are displayed only one time (without character set combinations) by these statements:
\|	    SELECT * FROM INFORMATION_SCHEMA.COLLATIONS;
\|	    SHOW COLLATION;
\|	  For example, all these collations:
\|	  - utf8mb3_uca1400_swedish_as_ci
\|	  - utf8mb4_uca1400_swedish_as_ci
\|	  - ucs2_uca1400_swedish_as_ci
\|	  - utf16_uca1400_swedish_as_ci
\|	  - utf32_uca1400_swedish_as_ci
\|	  have just one entry in INFORMATION_SCHEMA.COLLATIONS and SHOW COLLATION, with COLLATION_NAME equal to "uca1400_swedish_as_ci", which is the suffix without the character set name:
\|	    SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLLATIONS WHERE COLLATION_NAME LIKE '%uca1400_swedish_as_ci';
\|	    +-----------------------+
\|	    \| COLLATION_NAME        \|
\|	    +-----------------------+
\|	    \| uca1400_swedish_as_ci \|
\|	    +-----------------------+
\|	  Note, the behaviour of old collations did not change. Non-unicode collations (e.g. latin1_swedish_ci) and old UCA-4.0.0 collations (e.g. utf8mb4_unicode_ci) are still displayed with the character set prefix, as before.
\|	- The structure of the table INFORMATION_SCHEMA.COLLATIONS was changed. The NOT NULL constraint was removed from these columns:
\|	  - CHARACTER_SET_NAME
\|	  - ID
\|	  - IS_DEFAULT
\|	  and from the corresponding columns in SHOW COLLATION. For example:
\|	    SELECT COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT FROM INFORMATION_SCHEMA.COLLATIONS WHERE COLLATION_NAME LIKE '%uca1400_swedish_as_ci';
\|	    +-----------------------+--------------------+------+------------+
\|	    \| COLLATION_NAME        \| CHARACTER_SET_NAME \| ID   \| IS_DEFAULT \|
\|	    +-----------------------+--------------------+------+------------+
\|	    \| uca1400_swedish_as_ci \| NULL               \| NULL \| NULL       \|
\|	    +-----------------------+--------------------+------+------------+
\|	  The NULL value in these columns now means that the collation is applicable to multiple character sets. The behaviour of old collations did not change. Make sure your client programs can handle NULL values in these columns.
\|	- The structure of the table INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY was changed. Three new NOT NULL columns were added:
\|	  - FULL_COLLATION_NAME
\|	  - ID
\|	  - IS_DEFAULT
\|	  New collations have multiple entries in COLLATION_CHARACTER_SET_APPLICABILITY. The column COLLATION_NAME contains the collation name without the character set prefix. The column FULL_COLLATION_NAME contains the collation name with the character set prefix. Old collations have the full collation name in both FULL_COLLATION_NAME and COLLATION_NAME.
\|	    SELECT COLLATION_NAME, FULL_COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY WHERE FULL_COLLATION_NAME RLIKE '^(utf8mb4\|latin1).*swedish.*ci$';
\|	    +-----------------------------+-------------------------------------+--------------------+------+------------+
\|	    \| COLLATION_NAME              \| FULL_COLLATION_NAME                 \| CHARACTER_SET_NAME \| ID   \| IS_DEFAULT \|
\|	    +-----------------------------+-------------------------------------+--------------------+------+------------+
\|	    \| latin1_swedish_ci           \| latin1_swedish_ci                   \| latin1             \|    8 \| Yes        \|
\|	    \| latin1_swedish_nopad_ci     \| latin1_swedish_nopad_ci             \| latin1             \| 1032 \|            \|
\|	    \| utf8mb4_swedish_ci          \| utf8mb4_swedish_ci                  \| utf8mb4            \|  232 \|            \|
\|	    \| uca1400_swedish_ai_ci       \| utf8mb4_uca1400_swedish_ai_ci       \| utf8mb4            \| 2368 \|            \|
\|	    \| uca1400_swedish_as_ci       \| utf8mb4_uca1400_swedish_as_ci       \| utf8mb4            \| 2370 \|            \|
\|	    \| uca1400_swedish_nopad_ai_ci \| utf8mb4_uca1400_swedish_nopad_ai_ci \| utf8mb4            \| 2372 \|            \|
\|	    \| uca1400_swedish_nopad_as_ci \| utf8mb4_uca1400_swedish_nopad_as_ci \| utf8mb4            \| 2374 \|            \|
\|	    +-----------------------------+-------------------------------------+--------------------+------+------------+
\|	- Other INFORMATION_SCHEMA queries:
\|	    SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS;
\|	    SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.PARAMETERS;
\|	    SELECT TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES;
\|	    SELECT DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA;
\|	    SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.ROUTINES;
\|	    SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.EVENTS;
\|	    SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.EVENTS;
\|	    SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.ROUTINES;
\|	    SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.ROUTINES;
\|	    SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.TRIGGERS;
\|	    SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.TRIGGERS;
\|	    SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.VIEWS;
\|	  display full collation names, including the character set prefix, for all collations, including new collations. Corresponding SHOW commands also display full collation names in collation related columns:
\|	    SHOW CREATE TABLE t1;
\|	    SHOW CREATE DATABASE db1;
\|	    SHOW TABLE STATUS;
\|	    SHOW CREATE FUNCTION f1;
\|	    SHOW CREATE PROCEDURE p1;
\|	    SHOW CREATE EVENT ev1;
\|	    SHOW CREATE TRIGGER tr1;
\|	    SHOW CREATE VIEW;
\|	  These INFORMATION_SCHEMA queries and SHOW statements may change in the future, to display short collation names.
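The relationship between a short (contextually typed) collation name and its full name, as described in this commit message, can be sketched as a simple prefixing step. The function `demo_full_collation_name()` is hypothetical and only mirrors the naming rule from the message ("utf8mb4" plus "uca1400_as_ci" gives "utf8mb4_uca1400_as_ci"); it says nothing about how the server resolves collations internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
  Hypothetical helper: builds "<charset>_<suffix>" in dst.
  Returns 0 on success, -1 if dst is too small for the full name.
*/
static int demo_full_collation_name(char *dst, size_t dst_size,
                                    const char *charset, const char *suffix)
{
  int n;
  assert(dst != NULL && charset != NULL && suffix != NULL);
  n= snprintf(dst, dst_size, "%s_%s", charset, suffix);
  return (n < 0 || (size_t) n >= dst_size) ? -1 : 0;
}
```

Called with the context character set "utf8mb4" and the short name "uca1400_as_ci", the helper fills the buffer with "utf8mb4_uca1400_as_ci", which is the form the commit message says INFORMATION_SCHEMA.COLUMNS and SHOW CREATE TABLE display.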
*	Merge branch '10.6' into 10.7	Oleksandr Byelkin	2022-02-04	1	-5/+9
\|\
\| *	Merge branch '10.5' into 10.6	Oleksandr Byelkin	2022-02-03	1	-5/+9
\| \|\
\| \| *	Merge branch '10.4' into 10.5	Oleksandr Byelkin	2022-02-01	1	-5/+9
\| \| \|\
\| \| \| *	Merge branch '10.3' into 10.4	Oleksandr Byelkin	2022-01-30	1	-5/+5
\| \| \| \|\
\| \| \| \| *	MDEV-27494 Rename .ic files to .inl	Vladislav Vaintroub	2022-01-17	1	-5/+5
\| \| \| \| \|
\| \| \| * \|	MDEV-25904 New collation functions to compare InnoDB style trimmed NO PAD strings (bb-10.4-bar-MDEV-25904)	Alexander Barkov	2022-01-21	1	-0/+4
\| \| \| \|/
* \| \| \|	Merge 10.6 into 10.7	Marko Mäkelä	2021-09-30	1	-4/+13
\|\ \ \ \ \| \|/ / /
\| * \| \|	MDEV-26669 Add MY_COLLATION_HANDLER functions min_str() and max_str() (bb-10.6-bar-MDEV-26669)	Alexander Barkov	2021-09-27	1	-4/+13
\| \| \| \|
* \| \| \|	MDEV-26572 Improve simple multibyte collation performance on the ASCII range (bb-10.7-bar-MDEV-26572)	Alexander Barkov	2021-09-13	1	-0/+4
\|/ / /
* \| \|	Change CHARSET_INFO character set and collation names to LEX_CSTRING	Monty	2021-05-19	1	-8/+9
\|/ /	This change removed 68 explicit strlen() calls from the code.
\| \|	The following renames were done to ensure we don't use the old names when merging code from earlier releases, as using the new variables for print functions could result in crashes:
\| \|	- charset->csname renamed to charset->cs_name
\| \|	- charset->name renamed to charset->coll_name
\| \|	Almost everything was a mechanical change, except:
\| \|	- Changed to use the new Protocol::store(LEX_CSTRING..) when possible
\| \|	- Changed to use field->store(LEX_CSTRING, CHARSET_INFO) when possible
\| \|	- Changed to use String->append(LEX_CSTRING&) when possible
\| \|	Other things:
\| \|	- There were compiler issues with ensuring that all character set names point to the same string: gcc doesn't allow one to use integer constants when defining global structures (constant char * pointers work fine). To get around this, I declared defines for each character set name length.
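Why a pointer-plus-length string removes strlen() calls can be shown with a small sketch. `DEMO_LEX_CSTRING`, `DEMO_CSTRING()` and `demo_append()` are hypothetical stand-ins written for this example; they are not the server's LEX_CSTRING definition or its String API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for LEX_CSTRING: a pointer plus a precomputed length. */
typedef struct demo_lex_cstring
{
  const char *str;
  size_t length;
} DEMO_LEX_CSTRING;

/* Initializer for string literals: sizeof gives the length at compile time. */
#define DEMO_CSTRING(lit) { (lit), sizeof(lit) - 1 }

/*
  Appends s to dst at offset "used" without calling strlen().
  Returns the new used length, or "used" unchanged if s does not fit.
*/
static size_t demo_append(char *dst, size_t dst_size, size_t used,
                          const DEMO_LEX_CSTRING *s)
{
  assert(used <= dst_size);
  if (s->length > dst_size - used)
    return used;                      /* not enough room: leave dst as is */
  memcpy(dst + used, s->str, s->length);
  return used + s->length;
}
```

The length travels with the pointer, so every consumer that previously had to scan for the terminating zero byte can copy or compare a known number of bytes instead.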
* \|	MDEV-7947 strcmp() takes 0.37% in OLTP RO	Monty	2020-07-23	1	-4/+5
\| \|	This patch ensures that all identical character sets share the same cs->csname. This allows us to replace strcmp() in my_charset_same() with a comparison of pointers. This fixes a long-standing performance issue that could cause a strcmp() for every item sent through the protocol class to the end user.
\| \|	One consequence of this patch is that we don't allow one to add a character definition in the Index.xml file that changes the csname of an existing character set. This is by design, as changing character set names of existing ones is extremely dangerous, especially as some storage engines just record character set numbers.
\| \|	As we now have a hash over the character sets' csname, we can in the future use that for faster access to a specific character set. This could be done by changing the hash to non-unique and using the hash to find the next character set with the same csname.
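The technique described here, sharing one name pointer so that equality becomes a pointer comparison, can be sketched as follows. The commit message mentions a hash over csname; this sketch swaps in a small linear table to stay short. All names (`demo_intern_csname()`, `DEMO_CHARSET_INFO`, `demo_charset_same()`) are hypothetical, and the table stores the caller's pointer, so it assumes the names outlive the table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_NAMES 16

static const char *demo_names[DEMO_MAX_NAMES];
static size_t demo_name_count;

/*
  Returns the canonical pointer for a character set name, registering the
  name on first use. Returns NULL if the table is full.
  strcmp() runs only here, at registration time.
*/
static const char *demo_intern_csname(const char *name)
{
  size_t i;
  assert(name != NULL);
  for (i= 0; i < demo_name_count; i++)
  {
    if (strcmp(demo_names[i], name) == 0)
      return demo_names[i];
  }
  if (demo_name_count == DEMO_MAX_NAMES)
    return NULL;
  demo_names[demo_name_count]= name;
  return demo_names[demo_name_count++];
}

/* Hypothetical stand-in for CHARSET_INFO: only the shared name pointer. */
typedef struct demo_charset_info
{
  const char *csname;
} DEMO_CHARSET_INFO;

/* Same character set if and only if the interned name pointers are equal. */
static int demo_charset_same(const DEMO_CHARSET_INFO *a,
                             const DEMO_CHARSET_INFO *b)
{
  return a->csname == b->csname;
}
```

Once every collation of a character set holds the interned pointer, the per-item check costs one pointer comparison regardless of the name length.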
* \|	MDEV-22043 Special character leads to assertion in my_wc_to_printable_generic on 10.5.2 (debug)	Alexander Barkov	2020-05-09	1	-0/+1
\|/	The code did not take into account that:
\|	- U+005C (backslash) can occupy more than mbminlen characters (e.g. in sjis)
\|	- Some character sets do not have a code for U+005C (e.g. swe7)
\|	Adding a new function my_wc_to_printable into MY_CHARSET_HANDLER to cover all special cases easier.
*	Merge 10.1 into 10.2	Marko Mäkelä	2019-05-13	1	-1/+1
\|\
\| *	Merge branch '5.5' into 10.1	Vicențiu Ciorbaru	2019-05-11	1	-1/+1
\| \|\
\| \| *	Update FSF Address	Vicențiu Ciorbaru	2019-05-11	1	-1/+1
\| \| \|	* Update wrong zip-code
* \| \|	Merge 10.1 into 10.2	Marko Mäkelä	2018-08-02	1	-4/+4
\|\ \ \ \| \|/ /
\| * \|	Merge branch '10.0' into bb-10.1-merge-sanja	Oleksandr Byelkin	2018-07-25	1	-4/+4
\| \|\ \
\| \| * \|	Simplify caseup() and casedn() in charsets	Alexander Barkov	2018-07-19	1	-4/+4
\| \| \| \|	After the MDEV-13118 fix there's no code in the server that wants caseup/casedn to change the argument in place for simple charsets. Let's remove this logic and always return the result in a new string for all charsets, both simple and complex.
\| \| \| \|	1. Removing the optimization that some character sets used in casedn() and caseup(), which allowed (and required) changing the case in place, overwriting the string passed as the "src" argument. Now all CHARSET_INFOs work in the same way: none of them change the source string in place; all of them convert case from the source string to the destination string, leaving the source string untouched.
\| \| \| \|	2. Adding the "const" qualifier to the "char *src" parameter of caseup() and casedn().
\| \| \| \|	3. Removing duplicate implementations in ctype-mb.c. Now the caseup() and casedn() implementations for all CJK character sets internally use the same function my_casefold_mb() (the former my_casefold_mb_varlen()).
\| \| \| \|	4. Removing the "unused" attribute from parameters of some my_case{up\|dn}_xxx() implementations, as the affected parameters are now *used* in the code. Previously these parameters were used only in DBUG_ASSERT().
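The calling convention this commit settles on, a read-only source and a separate destination, might look like the following minimal sketch. `demo_caseup()` is a hypothetical ASCII-only function written for illustration; it is not the server's caseup() implementation and its parameter list is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
  Hypothetical ASCII-only case conversion: reads from src, writes to dst,
  and never modifies src. Returns the number of bytes written.
*/
static size_t demo_caseup(const char *src, size_t src_len,
                          char *dst, size_t dst_len)
{
  size_t i;
  assert(dst_len >= src_len);   /* the caller provides a large enough buffer */
  for (i= 0; i < src_len; i++)
    dst[i]= (char) toupper((unsigned char) src[i]);
  return src_len;
}
```

With `const` on the source pointer, the compiler rejects any implementation that tries to write through it, which makes the "source untouched" rule part of the interface.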
* \| \| \|	MDEV-7769 MY_CHARSET_INFO refactoring# On branch 10.2	Alexander Barkov	2016-10-10	1	-1/+0
\| \| \| \|	Part 3 (final): removing MY_CHARSET_HANDLER::well_formed_len().
* \| \| \|	MDEV-9711 NO PAD collations	Alexander Barkov	2016-09-06	1	-0/+118
\| \| \| \|	Based on the patch from Daniil Medvedev (a Google Summer of Code task)
* \| \| \|	MDEV-6353 my_ismbchar() and my_mbcharlen() refactoring	Alexander Barkov	2016-05-17	1	-7/+0
\| \| \| \|
* \| \| \|	MDEV-9823 LOAD DATA INFILE silently truncates incomplete byte sequences	Alexander Barkov	2016-04-06	1	-0/+1
\| \| \| \|
* \| \| \|	MDEV-9665 Remove cs->cset->ismbchar()	Alexander Barkov	2016-03-16	1	-11/+0
\|/ / /	Using a more powerful cs->cset->charlen() instead.
* \| \|	Adding MY_CHARSET_HANDLER::native_to_mb().	Alexander Barkov	2015-08-14	1	-0/+1
\| \| \|	This is a pre-requisite patch for:
\| \| \|	- MDEV-8433 Make field<'broken-string' use indexes
\| \| \|	- MDEV-8625 Bad result set with ignorable characters when using a prefix key
\| \| \|	- MDEV-8626 Bad result set with contractions when using a prefix key
* \| \|	MDEV-8215 Asian MB3 charsets: compare broken bytes as "greater than any non-broken character"	Alexander Barkov	2015-07-03	1	-5/+40
\| \| \|
* \| \|	MDEV-6566 Different INSERT behaviour on bad bytes with and without character set conversion	Alexander Barkov	2015-03-13	1	-2/+4
\| \| \|
* \| \|	Adding a shared include file ctype-mb.ic and removing a number	Alexander Barkov	2015-03-04	1	-61/+20
\| \| \|	of very similar copies of my_well_formed_len_xxx(), implemented for big5, cp932, euckr, eucjpms, gb2312, gbk, sjis, ujis.
* \| \|	A preparatory patch for MDEV-6566.	Alexander Barkov	2015-03-02	1	-1/+2
\|/ /	Adding a new virtual function MY_CHARSET_HANDLER::copy_abort(). Moving character set specific code into the corresponding implementations (for simple, multi-byte and mbmaxlen>1 character sets).
* \|	MDEV-6776 ujis and eucjpms erroneously accept 0x8EA0 as a valid byte sequence	Alexander Barkov	2014-09-24	1	-9/+8
\| \|
* \|	5.5.39 merge	Sergei Golubchik	2014-08-07	1	-3/+3
\|\ \ \| \|/
\| *	mysql-5.5.39 merge	Sergei Golubchik	2014-08-02	1	-3/+3
\| \|\	~40% bugfixed(*) applied
\| \| \|	~40% bugfixed reverted (incorrect or we're not buggy)
\| \| \|	~20% bugfixed applied, despite us being not buggy
\| \| \|	(*) only changes in the server code, e.g. not cmakefiles
\| \| *	Bug#18850241 WRONG COPYRIGHT HEADER IN SOME STRINGS/CTYPE-* FILES	Erlend Dahl	2014-06-23	1	-1/+2
\| \| \|
* \| \|	MDEV-5163 Merge WEIGHT_STRING function from MySQL-5.6	Alexander Barkov	2013-10-23	1	-1/+3
\| \| \|
* \| \|	MDEV-4928 Merge collation customization improvements	Alexander Barkov	2013-10-02	1	-20/+27
\|/ /	Merging the following MySQL-5.6 changes:
\| \|	- WL#5624: Collation customization improvements http://dev.mysql.com/worklog/task/?id=5624
\| \|	- WL#4013: Unicode german2 collation http://dev.mysql.com/worklog/task/?id=4013
\| \|	- Bug#62429 XML: ExtractValue, UpdateXML max arg length 127 chars http://bugs.mysql.com/bug.php?id=62429 (required by WL#5624)
* \|	mysql-5.5.32 merge	Sergei Golubchik	2013-07-16	1	-2/+2
\|\ \ \| \|/
\| *	Fix for Bug 16395495 - OLD FSF ADDRESS IN GPL HEADER	Murthy Narkedimilli	2013-03-19	1	-2/+2
\| \|
* \|	5.3 merge	Sergei Golubchik	2012-01-13	1	-1/+1
\|\ \
\| * \	Merge with MariaDB 5.1	Michael Widenius	2011-11-24	1	-2/+3
\| \|\ \
\| \| * \	Initial merge with MySQL 5.1 (XtraDB still needs to be merged)	Michael Widenius	2011-11-21	1	-2/+3
\| \| \|\ \	Fixed up copyright messages.
* \| \| \ \	mysql-5.5.18 merge	Sergei Golubchik	2011-11-03	1	-1/+1
\|\ \ \ \ \ \| \| \|_\|_\|/ \| \|/\| \| \|
\| * \| \| \|	Updated/added copyright headers	Kent Boortz	2011-06-30	1	-1/+1
\| \|\ \ \ \ \| \| \| \|_\|/ \| \| \|/\| \|
\| \| * \| \|	Updated/added copyright headers	Kent Boortz	2011-06-30	1	-3/+4
\| \| \|\ \ \
* \| \| \ \ \	merge with 5.3	Sergei Golubchik	2011-10-19	1	-4/+5
\|\ \ \ \ \ \ \| \| \|_\|_\|/ / \| \|/\| \| \| \|	sql/sql_insert.cc: CREATE ... IF NOT EXISTS may do nothing, but it is still not a failure. Don't forget to my_ok it.
\| \| \| \| \| \|	sql/sql_table.cc: small cleanup
\| * \| \| \| \|	Merge with MariaDB 5.1	Michael Widenius	2011-05-03	1	-1/+3
\| \|\ \ \ \ \ \| \| \| \|_\|_\|/ \| \| \|/\| \| \|
\| \| * \| \| \|	Merge with MySQL 5.1.57/58	Michael Widenius	2011-05-02	1	-1/+3
\| \| \|\ \ \ \ \| \| \| \| \|/ / \| \| \| \|/\| \| \| \| \| \| \| \|	Moved some BSD string functions from Unireg
\| * \| \| \| \|	Merge with 5.1 to get in changes from MySQL 5.1.55	Michael Widenius	2011-02-28	1	-3/+2
\| \|\ \ \ \ \ \| \| \|/ / / /