author | Alexander Barkov <bar@mariadb.com> | 2023-03-31 17:20:03 +0400 |
---|---|---|
committer | Alexander Barkov <bar@mariadb.com> | 2023-04-04 12:30:50 +0400 |
commit | 8020b1bd735c686818f1563e2c2317e263d5bd3a (patch) | |
tree | 9280a9d419e60dc409f88138e732c8ff67050c2e /include | |
parent | 0cc1694e9c7481b59d372af7f759bb0bcf552bfa (diff) | |
download | mariadb-git-8020b1bd735c686818f1563e2c2317e263d5bd3a.tar.gz |
MDEV-30034 UNIQUE USING HASH accepts duplicate entries for tricky collations
- Adding a new argument "flag" to MY_COLLATION_HANDLER::strnncollsp_nchars()
and a flag MY_STRNNCOLLSP_NCHARS_EMULATE_TRIMMED_TRAILING_SPACES.
The flag determines whether strnncollsp_nchars() should emulate trailing
spaces that may have been trimmed earlier (e.g. in InnoDB CHAR compression).
This is important for NOPAD collations.
For example, with this input:
- str1= 'a ' (Latin letter a followed by one space)
- str2= 'a  ' (Latin letter a followed by two spaces)
- nchars= 3
if the flag is given, strnncollsp_nchars() will virtually restore
one trailing space to str1 up to nchars (3) characters and compare two
strings as equal:
- str1= 'a  ' (one extra trailing space emulated)
- str2= 'a  ' (as is)
If the flag is not given, strnncollsp_nchars() does not add trailing
virtual spaces, so in case of a NOPAD collation, str1 will be compared
as less than str2 because it is shorter.
- Field_string::cmp_prefix() now passes the new flag.
Field_varstring::cmp_prefix() and Field_blob::cmp_prefix() do
not pass the new flag.
- The branch in cmp_whole_field() in storage/innobase/rem/rem0cmp.cc
(which handles the CHAR data type) now also passes the new flag.
- Fixing UCA collations to respect the new flag.
Other collations may also be affected; however,
I could not produce an SQL script demonstrating the problem.
Other collations will be extended to respect this flag in a separate
patch later.
- Changing the meaning of the last parameter of Field::cmp_prefix()
from "number of bytes" (internal length)
to "number of characters" (user visible length).
The code calling cmp_prefix() from handler.cc was wrong.
After this change, the call in handler.cc became correct.
The code calling cmp_prefix() from key_rec_cmp() in key.cc
was adjusted according to this change.
- Old strnncollsp_nchars() related tests in unittest/strings/strings-t.c
now pass the new flag.
A few new tests were also added, without the flag.
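The change of the last cmp_prefix() parameter from bytes to characters can be illustrated with a minimal sketch. The helper name prefix_bytes_to_chars() is hypothetical and does not appear in the patch; the sketch assumes the caller holds a prefix length in bytes that was sized as (characters * maximum bytes per character), which is an assumption about the call site rather than something stated above.

```c
#include <assert.h>
#include <stddef.h>

/*
  Hypothetical helper, not part of the patch.
  Converts a prefix length given in bytes (internal length) into
  a length in characters (user visible length), assuming the byte
  length was sized as nchars * mbmaxlen.
*/
size_t prefix_bytes_to_chars(size_t prefix_bytes, unsigned mbmaxlen)
{
  assert(mbmaxlen > 0);
  return prefix_bytes / mbmaxlen;
}
```

Under that assumption, a 12-byte prefix in a character set with up to 4 bytes per character corresponds to 3 characters, while in a single-byte character set the two lengths coincide.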
Diffstat (limited to 'include')
-rw-r--r-- | include/m_ctype.h | 25 |
1 files changed, 24 insertions, 1 deletions
diff --git a/include/m_ctype.h b/include/m_ctype.h
index 484cd0a657e..96eea74d5ba 100644
--- a/include/m_ctype.h
+++ b/include/m_ctype.h
@@ -248,6 +248,28 @@ extern MY_UNI_CTYPE my_uni_ctype[256];
 #define MY_STRXFRM_REVERSE_LEVEL6  0x00200000 /* if reverse order for level6 */
 #define MY_STRXFRM_REVERSE_SHIFT      16
 
+/* Flags to strnncollsp_nchars */
+/*
+  MY_STRNNCOLLSP_NCHARS_EMULATE_TRIMMED_TRAILING_SPACES -
+  defines if inside strnncollsp_nchars()
+  short strings should be virtually extended to "nchars"
+  characters by emulating trimmed trailing spaces.
+
+  This flag is needed when comparing packed strings of the CHAR
+  data type, when trailing spaces are trimmed on storage (like in InnoDB),
+  however the actual values (after unpacking) will have those trailing
+  spaces.
+
+  If this flag is passed, strnncollsp_nchars() performs both
+  truncating longer strings and extending shorter strings
+  to exactly "nchars".
+
+  If this flag is not passed, strnncollsp_nchars() only truncates longer
+  strings to "nchars", but does not extend shorter strings to "nchars".
+*/
+#define MY_STRNNCOLLSP_NCHARS_EMULATE_TRIMMED_TRAILING_SPACES 1
+
+
 /*
   Collation IDs for MariaDB that should not conflict with MySQL.
   We reserve 256..511, because MySQL will most likely use this range
@@ -383,7 +405,8 @@ struct my_collation_handler_st
   int     (*strnncollsp_nchars)(CHARSET_INFO *,
                                 const uchar *str1, size_t len1,
                                 const uchar *str2, size_t len2,
-                                size_t nchars);
+                                size_t nchars,
+                                uint flags);
   size_t  (*strnxfrm)(CHARSET_INFO *,
                       uchar *dst, size_t dstlen, uint nweights,
                       const uchar *src, size_t srclen, uint flags);
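The truncate-and-extend behaviour described in the comment above can be sketched with a toy model. This is not the UCA implementation from the patch: it assumes one byte per character, plain byte-order weights, and NOPAD semantics, and the names toy_strnncollsp_nchars() and TOY_EMULATE_TRIMMED_TRAILING_SPACES are hypothetical stand-ins for the real handler method and flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MY_STRNNCOLLSP_NCHARS_EMULATE_TRIMMED_TRAILING_SPACES */
#define TOY_EMULATE_TRIMMED_TRAILING_SPACES 1u

/*
  Toy NOPAD comparison limited to "nchars" characters.
  Assumption: one byte per character, weights are the byte values.
  Longer strings are always truncated to nchars; shorter strings are
  extended with virtual spaces only when the flag is passed.
*/
int toy_strnncollsp_nchars(const unsigned char *s1, size_t len1,
                           const unsigned char *s2, size_t len2,
                           size_t nchars, unsigned flags)
{
  int pad= (flags & TOY_EMULATE_TRIMMED_TRAILING_SPACES) != 0;
  size_t i;
  for (i= 0; i < nchars; i++)
  {
    int have1= i < len1, have2= i < len2;
    int c1, c2;
    if (!have1 && !have2)
      return 0;                  /* both ended within nchars: equal */
    if (!pad && have1 != have2)
      return have1 ? 1 : -1;     /* NOPAD: the shorter string is less */
    c1= have1 ? s1[i] : ' ';     /* emulate a trimmed trailing space */
    c2= have2 ? s2[i] : ' ';
    if (c1 != c2)
      return c1 < c2 ? -1 : 1;
  }
  return 0;                      /* equal within the first nchars */
}

/* Reproduces the 'a' + one space vs 'a' + two spaces case, nchars= 3 */
void demo_checks(void)
{
  const unsigned char str1[]= "a ";
  const unsigned char str2[]= "a  ";
  assert(toy_strnncollsp_nchars(str1, 2, str2, 3, 3,
                                TOY_EMULATE_TRIMMED_TRAILING_SPACES) == 0);
  assert(toy_strnncollsp_nchars(str1, 2, str2, 3, 3, 0) < 0);
}
```

In this model a CHAR-style caller would pass the flag so that a value stored with trimmed spaces compares equal to its unpacked form, while a VARCHAR-style or BLOB-style caller would pass 0 and keep the length difference significant.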