From dd0ff30278cd7b24776ccf36a9c0d9171a569750 Mon Sep 17 00:00:00 2001 From: Alexander Barkov Date: Tue, 29 Nov 2016 06:51:12 +0400 Subject: MDEV-11343 LOAD DATA INFILE fails to load data with an escape character followed by a multi-byte character MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Partially backporting MDEV-9874 from 10.2 to 10.0 READ_INFO::read_field() raised the ER_INVALID_CHARACTER_STRING error when reading an escape character followed by a multi-byte character. Raising wellformedness errors in READ_INFO::read_field() was wrong, because the main goal of READ_INFO::read_field() is to *unescape* the data which was presumably escaped using mysql_real_escape_string(), using the same character set with the one specified in "LOAD DATA INFILE ... CHARACTER SET ..." (or assumed by default). During LOAD DATA, multi-byte characters are not always scanned as a single entity! In case of escaped data, parts of a multi-byte character can be scanned on different loop iterations. So the old code erroneously tested welformedness in the middle of a multi-byte character. Moreover, the data after unescaping can go into a BLOB field, not a text field. Wellformedness tests are meaningless in this case. Ater this patch, wellformedness is only checked later, during Field::store(str,length,cs) time. The loop that scans bytes only makes sure to revert the changes made by mysql_real_escape_string(). Note, in some cases users can supply data which did not really go through mysql_real_escape_string() and was escaped by some other means, or was not escaped at all. The file reported in this MDEV contains the string "\ä", which is an example of such improperly escaped data, as - either there should be two backslashes: "\\ä" - or there should be no backslashes at all: "ä" mysql_real_escape_string() could not generate "\ä". --- include/m_ctype.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include') diff --git a/include/m_ctype.h b/include/m_ctype.h index 5994816cbfc..8d9838fdde2 100644 --- a/include/m_ctype.h +++ b/include/m_ctype.h @@ -180,6 +180,9 @@ extern MY_UNI_CTYPE my_uni_ctype[256]; /* A helper macros for "need at least n bytes" */ #define MY_CS_TOOSMALLN(n) (-100-(n)) +#define MY_CS_IS_TOOSMALL(rc) ((rc) >= MY_CS_TOOSMALL6 && (rc) <= MY_CS_TOOSMALL) + + #define MY_SEQ_INTTAIL 1 #define MY_SEQ_SPACES 2 -- cgit v1.2.1