Bug #18359924: INNODB AND MYISAM CORRUPTION ON PREFIX INDEXES

The problem was in the validation of input data for the character blob types. When assigned binary data, these types checked only that the length of the data was a multiple of the minimum character length of the destination charset. Because UTF-8 is a variable-length encoding, its minimum character length is 1, so even byte sequences that are not valid UTF-8 (e.g. a wrong leading byte) were copied verbatim into UTF-8 columns when they came from binary strings or fields. Storing invalid data in string columns had all kinds of ill effects on code that assumes the encoded data are valid to begin with.

Fixed by additionally checking the incoming binary string for validity when assigning it to a non-binary string column. Conversions to charsets with no known "invalid" ranges are not covered by the extra check.
Removed trailing spaces.
Test case added.
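
To illustrate the gap described above, here is a minimal standalone sketch. It is not server code. The names length_only_check and is_well_formed_utf8 are hypothetical, and the validator follows standard UTF-8 (up to 4 bytes per character) rather than any particular server charset implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the old behaviour: only the length is compared
// against the minimum character width of the destination charset.
bool length_only_check(std::size_t len, unsigned mbminlen) {
  assert(mbminlen >= 1);
  return len % mbminlen == 0;
}

// Simplified UTF-8 well-formedness check (illustrative only). Rejects invalid
// lead bytes, truncated sequences, bad continuation bytes, overlong forms,
// surrogates and code points above U+10FFFF.
bool is_well_formed_utf8(const unsigned char *s, std::size_t len) {
  assert(s != nullptr || len == 0);
  static const std::uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
  std::size_t i = 0;
  while (i < len) {
    const unsigned char b = s[i];
    std::size_t n;     // expected sequence length in bytes
    std::uint32_t cp;  // decoded code point
    if (b < 0x80)                { n = 1; cp = b; }
    else if ((b & 0xE0) == 0xC0) { n = 2; cp = b & 0x1F; }
    else if ((b & 0xF0) == 0xE0) { n = 3; cp = b & 0x0F; }
    else if ((b & 0xF8) == 0xF0) { n = 4; cp = b & 0x07; }
    else return false;             // stray continuation or invalid lead byte
    if (len - i < n) return false; // sequence truncated at end of input
    for (std::size_t k = 1; k < n; ++k) {
      if ((s[i + k] & 0xC0) != 0x80) return false; // not a continuation byte
      cp = (cp << 6) | (s[i + k] & 0x3F);
    }
    if (n > 1 && cp < min_cp[n]) return false;      // overlong encoding
    if (cp > 0x10FFFF) return false;                // beyond Unicode range
    if (cp >= 0xD800 && cp <= 0xDFFF) return false; // UTF-16 surrogate
    i += n;
  }
  return true;
}
```

With a minimum character length of 1, the length-only check accepts any byte count, so a buffer such as {0xFF, 0x41} passes it even though 0xFF can never start a UTF-8 sequence. A validity check of the kind sketched here rejects that buffer.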
author: Georgi Kodinov <georgi.kodinov@oracle.com> 2014-04-10 13:18:32 +0300
committer: Georgi Kodinov <georgi.kodinov@oracle.com> 2014-04-10 13:18:32 +0300
commit: 37b9a31a3095dd8f4a15b957f1c4b28fe4fab4ed (patch)
tree: ac8da13ae17e0a391e2727dabc8d8ccac5cc518c /sql/sql_string.cc
parent: 92351c831f7fefcbbd48c7e914225fdc55adad36 (diff)
download: mariadb-git-37b9a31a3095dd8f4a15b957f1c4b28fe4fab4ed.tar.gz
1 files changed, 30 insertions, 0 deletions
diff --git a/sql/sql_string.cc b/sql/sql_string.cc
index 07fc7e4ff1d..b9a9ce92cd6 100644
--- a/sql/sql_string.cc
+++ b/sql/sql_string.cc
@@ -224,6 +224,36 @@ bool String::needs_conversion(uint32 arg_length,
 
 
 /*
+  Checks that the source string can just be copied to the destination string
+  without conversion.
+  Unlike needs_conversion it will require conversion on incoming binary data
+  to ensure the data are verified for validity first.
+
+  @param arg_length   Length of string to copy.
+  @param cs_from      Character set to copy from
+  @param cs_to        Character set to copy to
+
+  @return conversion needed
+*/
+bool String::needs_conversion_on_storage(uint32 arg_length,
+                                         CHARSET_INFO *cs_from,
+                                         CHARSET_INFO *cs_to)
+{
+  uint32 offset;
+  return (needs_conversion(arg_length, cs_from, cs_to, &offset) ||
+          (cs_from == &my_charset_bin &&      /* force conversion when storing a binary string */
+           cs_to != &my_charset_bin &&        /* into a non-binary destination */
+           (                                  /* and any of the following is true :*/
+            cs_to->mbminlen != cs_to->mbmaxlen || /* it's a variable length encoding */
+            cs_to->mbminlen > 2 ||            /* longer than 2 bytes : neither 1 byte nor ucs2 */
+            0 != (arg_length % cs_to->mbmaxlen)
+           )
+          )
+         );
+}
+
+
+/*
   Copy a multi-byte character sets with adding leading zeros.
 
   SYNOPSIS
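
The clause added by this patch can be exercised in isolation with a small sketch. This is an assumption-laden model, not the server code: CharsetSketch is a hypothetical stand-in for CHARSET_INFO that carries only the two width fields plus a flag for "is the binary charset", forces_check_on_storage is a hypothetical name, and the existing needs_conversion() half of the expression is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CHARSET_INFO; only what the new clause reads.
struct CharsetSketch {
  bool is_binary;     // models the comparison against &my_charset_bin
  unsigned mbminlen;  // minimum bytes per character
  unsigned mbmaxlen;  // maximum bytes per character
};

// Mirrors only the clause added in needs_conversion_on_storage():
// binary source, non-binary destination, and the destination is either
// variable length, wider than 2 bytes, or the length is not a whole
// number of characters.
bool forces_check_on_storage(std::uint32_t arg_length,
                             const CharsetSketch &from,
                             const CharsetSketch &to) {
  assert(to.mbminlen >= 1 && to.mbmaxlen >= to.mbminlen);
  if (!from.is_binary || to.is_binary)
    return false;
  return to.mbminlen != to.mbmaxlen ||       // variable-length encoding
         to.mbminlen > 2 ||                  // neither 1-byte nor ucs2
         arg_length % to.mbmaxlen != 0;      // partial trailing character
}
```

Using illustrative width values (1/1 for a latin1-like charset, 1/3 for a utf8-like one, 2/2 for ucs2-like, 4/4 for utf32-like), the sketch forces the check for binary data headed to the utf8-like and utf32-like charsets at any length, skips it for the latin1-like charset, and forces it for the ucs2-like charset only when the byte count is odd.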