MDEV-25904 New collation functions to compare InnoDB style trimmed NO PAD strings
branch: bb-10.4-bar-MDEV-25904

author: Alexander Barkov <bar@mariadb.com> 2021-09-29 15:13:57 +0400
committer: Alexander Barkov <bar@mariadb.com> 2022-01-21 12:16:07 +0400
commit: b915f79e4e004fde4f6ac8f341afee980e11792b (patch)
tree: 2568032d75c7af9a72c6669b306fda4418b5ed20 /include
parent: db574173d19731f1e5dc75d325f72398afac8d59 (diff)
download: mariadb-git-b915f79e4e004fde4f6ac8f341afee980e11792b.tar.gz
1 files changed, 54 insertions, 0 deletions
diff --git a/include/m_ctype.h b/include/m_ctype.h
index 0f6e6a11666..187c8710929 100644
--- a/include/m_ctype.h
+++ b/include/m_ctype.h
@@ -330,6 +330,60 @@ struct my_collation_handler_st
 		       const uchar *, size_t, const uchar *, size_t, my_bool);
   int     (*strnncollsp)(CHARSET_INFO *,
                          const uchar *, size_t, const uchar *, size_t);
+  /*
+    strnncollsp_nchars() - similar to strnncollsp() but assumes that both
+                           strings were originally CHAR(N) values with the
+                           same N, then were optionally space-padded,
+                           or optionally space-trimmed.
+
+                           In other words, this function compares in the way
+                           if we insert both values into a CHAR(N) column
+                           and then compare the two column values.
+
+    It compares the same amount of characters from the two strings.
+    This is especially important for NOPAD collations.
+
+    If CHAR_LENGTH of the two strings are different,
+    the shorter string is virtually padded with trailing spaces
+    up to CHAR_LENGTH of the longer string, to guarantee that the
+    same amount of characters are compared.
+    This is important if the two CHAR(N) strings are space-trimmed 
+    (e.g. like in InnoDB compact format for CHAR).
+
+    The function compares not more than "nchars" characters only.
+    This can be useful to compare CHAR(N) space-padded strings
+    (when the exact N is known) without having to truncate them before
+    the comparison.
+
+    For example, Field_string stores a "CHAR(3) CHARACTER SET utf8mb4" value
+    of "aaa" as 12 bytes in a record buffer:
+    - 3 bytes of the actual data, followed by
+    - 9 bytes of spaces (just fillers, not real data)
+    The caller can pass nchars=3 to compare CHAR(3) record values.
+    In such case, the comparator won't go inside the 9 bytes of the fillers.
+
+    If N is not known, the caller can pass max(len1,len2) as the "nchars" value
+    (i.e. the maximum of the OCTET_LENGTH of the two strings).
+
+    Notes on complex collations.
+
+    This function counts contraction parts as individual characters.
+    For example, the Czech letter 'ch' (in Czech collations)
+    is ordinarily counted by the "nchars" limit as TWO characters
+    (although it is only one letter).
+    This corresponds to what CHAR(N) does in INSERT.
+
+    If the "nchars" limit tears apart a contraction, only the part fitting
+    into "nchars" characters is used. For example, in case of a Czech collation,
+    the string "ach" with nchars=2 is compared as 'ac': the contraction
+    'ch' is torn apart and the letter 'c' acts as an individual character.
+    This emulates the same comparison result with the scenario when we insert
+    'ach' into a CHAR(2) column and then compare it.
+  */
+  int     (*strnncollsp_nchars)(CHARSET_INFO *,
+                                const uchar *str1, size_t len1,
+                                const uchar *str2, size_t len2,
+                                size_t nchars);
   size_t     (*strnxfrm)(CHARSET_INFO *,
                          uchar *dst, size_t dstlen, uint nweights,
                          const uchar *src, size_t srclen, uint flags);
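
The padding and "nchars" rules described in the comment can be sketched with a toy comparator. This is only an illustration under stated assumptions, not the code added by this commit: it assumes a single-byte character set (one byte is one character) and a plain binary ordering of byte values. The names demo_strnncollsp_nchars, demo_cmp and demo_self_check are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
  Hypothetical sketch, not the MariaDB implementation.
  Assumptions: single-byte character set (one byte is one character)
  and binary ordering of byte values.
*/
int demo_strnncollsp_nchars(const unsigned char *str1, size_t len1,
                            const unsigned char *str2, size_t len2,
                            size_t nchars)
{
  size_t i;
  for (i = 0; i < nchars; i++)
  {
    unsigned char c1, c2;
    /* Both strings are exhausted: only virtual spaces would remain */
    if (i >= len1 && i >= len2)
      return 0;
    /* Past its end, a string is virtually padded with spaces */
    c1 = i < len1 ? str1[i] : ' ';
    c2 = i < len2 ? str2[i] : ' ';
    if (c1 != c2)
      return c1 < c2 ? -1 : 1;
  }
  /* The "nchars" limit was reached: bytes beyond it are never read */
  return 0;
}

/* Hypothetical convenience wrapper for NUL-terminated test strings */
int demo_cmp(const char *a, const char *b, size_t nchars)
{
  return demo_strnncollsp_nchars((const unsigned char *) a, strlen(a),
                                 (const unsigned char *) b, strlen(b),
                                 nchars);
}

void demo_self_check(void)
{
  /* A space-trimmed value equals its space-padded form */
  assert(demo_cmp("a", "a  ", 3) == 0);
  /* With nchars=3 the bytes after the third character are ignored */
  assert(demo_cmp("aaaZZZ", "aaa", 3) == 0);
  /* The shorter string is padded with ' ' (0x20), which is above '\t' (0x09) */
  assert(demo_cmp("a", "a\t", 2) > 0);
  assert(demo_cmp("ab", "ac", 2) < 0);
}
```

The third assertion is the case that matters for NO PAD collations: a comparison without virtual padding would treat the shorter string "a" as smaller, while here it compares as "a " and sorts after "a\t". When N is not known, a caller of this sketch would pass the larger of the two lengths as nchars, as the comment suggests.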
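
The notes on complex collations in the comment can be illustrated with a toy collation that has a single contraction. This is a sketch under stated assumptions and does not reproduce the UCA code in the server: it handles only the letters 'a'..'z' and the space, uses made-up weights, and places 'ch' after 'h' and before 'i' as in the Czech alphabet. All names (toy_letter_weight, toy_next_weight, toy_czech_strnncollsp_nchars, W_SPACE) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { W_SPACE = 0 };  /* made-up weight of the space character */

/* Toy weights: ' ' < 'a' < 'b' < ... with a gap after every letter */
static int toy_letter_weight(unsigned char c)
{
  return c == ' ' ? W_SPACE : 2 * (c - 'a') + 2;
}

/*
  Return the next weight and update the position and the remaining
  character budget. Returns -1 when the "nchars" budget is used up.
*/
static int toy_next_weight(const unsigned char *s, size_t len,
                           size_t *pos, size_t *left)
{
  if (*left == 0)
    return -1;
  if (*pos >= len)
  {
    (*left)--;               /* virtual trailing space */
    return W_SPACE;
  }
  /* 'ch' counts as TWO characters; it is used only if both fit */
  if (s[*pos] == 'c' && *pos + 1 < len && s[*pos + 1] == 'h' && *left >= 2)
  {
    *pos += 2;
    *left -= 2;
    return toy_letter_weight('h') + 1;  /* sorts between 'h' and 'i' */
  }
  (*left)--;
  return toy_letter_weight(s[(*pos)++]);
}

int toy_czech_strnncollsp_nchars(const char *str1, const char *str2,
                                 size_t nchars)
{
  const unsigned char *s1 = (const unsigned char *) str1;
  const unsigned char *s2 = (const unsigned char *) str2;
  size_t len1 = strlen(str1), len2 = strlen(str2);
  size_t pos1 = 0, pos2 = 0, left1 = nchars, left2 = nchars;
  for (;;)
  {
    int w1, w2;
    if (pos1 >= len1 && pos2 >= len2)
      return 0;              /* only virtual spaces remain */
    w1 = toy_next_weight(s1, len1, &pos1, &left1);
    w2 = toy_next_weight(s2, len2, &pos2, &left2);
    if (w1 < 0 && w2 < 0)
      return 0;              /* both budgets are used up */
    if (w1 < 0) w1 = W_SPACE;
    if (w2 < 0) w2 = W_SPACE;
    if (w1 != w2)
      return w1 < w2 ? -1 : 1;
  }
}
```

In this sketch, comparing "ach" with "ac" under nchars=2 gives equality, because the contraction is torn apart and 'c' acts as an individual letter. Comparing "ach" with "ad" gives a negative result for nchars=2 ('c' before 'd') and a positive result for nchars=3 ('ch' after 'd'), which mirrors the CHAR(2) versus CHAR(3) behaviour the comment describes.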