diff options
author | unknown <malff/marcsql@weblab.(none)> | 2007-06-12 15:23:58 -0600 |
---|---|---|
committer | unknown <malff/marcsql@weblab.(none)> | 2007-06-12 15:23:58 -0600 |
commit | fdb35eea0bf6bdb2c02ee12107f5c2c8f4d3b191 (patch) | |
tree | 5827319d574182509c192b85dcb24886e07ced70 /sql | |
parent | 6bc660b0c040db876ae048b6e80c7dd28074be1b (diff) | |
download | mariadb-git-fdb35eea0bf6bdb2c02ee12107f5c2c8f4d3b191.tar.gz |
Bug#25411 (trigger code truncated), PART II
Bug 28127 (Some valid identifiers names are not parsed correctly)
Bug 26302 (MySQL server cuts off trailing "*/" from comments in SP/func)
This patch is the second part of a major cleanup, required to fix
Bug 25411 (trigger code truncated).
The root cause of the issue stems from the function skip_rear_comments,
which was a work around to remove "extra" "*/" characters from the query
text, when parsing a query and reusing the text fragments to represent a
view, trigger, function or stored procedure.
The reason for this work around is that "special comments",
like /*!50002 XXX */, were not parsed properly, so that a query like:
AAA /*!50002 BBB */ CCC
would be seen by the parser as "AAA BBB */ CCC" when the current version
is greater or equal to 5.0.2
The root cause of this stems from how special comments are parsed.
Special comments are really out-of-bound text that appear inside a query,
that affects how the parser behave.
In nature, /*!50002 XXX */ in MySQL is similar to the C concept
of preprocessing :
#if VERSION >= 50002
XXX
#endif
Depending on the current VERSION of the server, either the special comment
should be expanded or it should be ignored, but in all cases the "text" of
the query should be re-written to strip the "/*!50002" and "*/" markers,
which does not belong to the SQL language itself.
Prior to this fix, these markers would leak into :
- the storage format for VIEW,
- the storage format for FUNCTION,
- the storage format for FUNCTION parameters, in mysql.proc (param_list),
- the storage format for PROCEDURE,
- the storage format for PROCEDURE parameters, in mysql.proc (param_list),
- the storage format for TRIGGER,
- the binary log used for replication.
In all cases, not only this cause format corruption, but also provide a vector
for dormant security issues, by allowing to tunnel code that will be activated
after an upgrade.
The proper solution is to deal with special comments strictly during parsing,
when accepting a query from the outside world.
Once a query is parsed and an object is created with a persistant
representation, this object should not arbitrarily mutate after an upgrade.
In short, special comments are a useful but limited feature for MYSQLdump,
when used at an *interface* level to facilitate import/export,
but bloating the server *internal* storage format is *not* the proper way
to deal with configuration management of the user logic.
With this fix:
- the Lex_input_stream class now acts as a comment pre-processor,
and either expands or ignore special comments on the fly.
- MYSQLlex and sql_yacc.yy have been cleaned up to strictly use the
public interface of Lex_input_stream. In particular, how the input stream
accepts or rejects a character is private to Lex_input_stream, and the
internal buffer pointers of that class are strictly private, and should not
be tempered with during parsing.
This caused many changes mostly in sql_lex.cc.
During the code cleanup in case MY_LEX_NUMBER_IDENT,
Bug 28127 (Some valid identifiers names are not parsed correctly)
was found and fixed.
By parsing special comments properly, and removing the function
'skip_rear_comments' [sic],
Bug 26302 (MySQL server cuts off trailing "*/" from comments in SP/func)
has been fixed as well.
sql/event_data_objects.cc:
Cleanup of the code that extracts the query text
sql/sp.cc:
Cleanup of the code that extracts the query text
sql/sp_head.cc:
Cleanup of the code that extracts the query text
sql/sql_trigger.cc:
Cleanup of the code that extracts the query text
sql/sql_view.cc:
Cleanup of the code that extracts the query text
mysql-test/r/comments.result:
Bug#25411 (trigger code truncated)
mysql-test/r/sp.result:
Bug#25411 (trigger code truncated)
Bug 26302 (MySQL server cuts off trailing "*/" from comments in SP/func)
mysql-test/r/trigger.result:
Bug#25411 (trigger code truncated)
mysql-test/r/varbinary.result:
Bug 28127 (Some valid identifiers names are not parsed correctly)
mysql-test/t/comments.test:
Bug#25411 (trigger code truncated)
mysql-test/t/sp.test:
Bug#25411 (trigger code truncated)
Bug 26302 (MySQL server cuts off trailing "*/" from comments in SP/func)
mysql-test/t/trigger.test:
Bug#25411 (trigger code truncated)
mysql-test/t/varbinary.test:
Bug 28127 (Some valid identifiers names are not parsed correctly)
sql/sql_lex.cc:
Implemented comment pre-processing in Lex_input_stream,
major cleanup of the lex/yacc code to not use Lex_input_stream private members.
sql/sql_lex.h:
Implemented comment pre-processing in Lex_input_stream,
major cleanup of the lex/yacc code to not use Lex_input_stream private members.
sql/sql_yacc.yy:
post merge fix : view_check_options must be parsed before signaling the end of the query
Diffstat (limited to 'sql')
-rw-r--r-- | sql/event_data_objects.cc | 56 | ||||
-rw-r--r-- | sql/sp.cc | 12 | ||||
-rw-r--r-- | sql/sp_head.cc | 19 | ||||
-rw-r--r-- | sql/sql_lex.cc | 558 | ||||
-rw-r--r-- | sql/sql_lex.h | 334 | ||||
-rw-r--r-- | sql/sql_trigger.cc | 22 | ||||
-rw-r--r-- | sql/sql_view.cc | 11 | ||||
-rw-r--r-- | sql/sql_yacc.yy | 125 |
8 files changed, 751 insertions, 386 deletions
diff --git a/sql/event_data_objects.cc b/sql/event_data_objects.cc index 138752d5ce5..c95a6c5dd3c 100644 --- a/sql/event_data_objects.cc +++ b/sql/event_data_objects.cc @@ -148,8 +148,7 @@ Event_parse_data::init_name(THD *thd, sp_name *spn) The body is extracted by copying all data between the start of the body set by another method and the current pointer in Lex. - Some questionable removal of characters is done in here, and that part - should be refactored when the parser is smarter. + See related code in sp_head::init_strings(). */ void @@ -160,58 +159,9 @@ Event_parse_data::init_body(THD *thd) /* This method is called from within the parser, from sql_yacc.yy */ DBUG_ASSERT(thd->m_lip != NULL); - DBUG_PRINT("info", ("body: '%s' body_begin: 0x%lx end: 0x%lx", body_begin, - (long) body_begin, (long) thd->m_lip->ptr)); - - body.length= thd->m_lip->ptr - body_begin; - const char *body_end= body_begin + body.length - 1; - - /* Trim nuls or close-comments ('*'+'/') or spaces at the end */ - while (body_begin < body_end) - { - - if ((*body_end == '\0') || - (my_isspace(thd->variables.character_set_client, *body_end))) - { /* consume NULs and meaningless whitespace */ - --body.length; - --body_end; - continue; - } - - /* - consume closing comments - - This is arguably wrong, but it's the best we have until the parser is - changed to be smarter. FIXME PARSER - - See also the sp_head code, where something like this is done also. - - One idea is to keep in the lexer structure the count of the number of - open-comments we've entered, and scan left-to-right looking for a - closing comment IFF the count is greater than zero. - - Another idea is to remove the closing comment-characters wholly in the - parser, since that's where it "removes" the opening characters. - */ - if ((*(body_end - 1) == '*') && (*body_end == '/')) - { - DBUG_PRINT("info", ("consumend one '*" "/' comment in the query '%s'", - body_begin)); - body.length-= 2; - body_end-= 2; - continue; - } - - break; /* none were found, so we have excised all we can. */ - } - - /* the first is always whitespace which I cannot skip in the parser */ - while (my_isspace(thd->variables.character_set_client, *body_begin)) - { - ++body_begin; - --body.length; - } + body.length= thd->m_lip->get_cpp_ptr() - body_begin; body.str= thd->strmake(body_begin, body.length); + trim_whitespace(thd->charset(), & body); DBUG_VOID_RETURN; } diff --git a/sql/sp.cc b/sql/sp.cc index 49e86f9d07e..8fcc80a1504 100644 --- a/sql/sp.cc +++ b/sql/sp.cc @@ -588,10 +588,14 @@ sp_create_routine(THD *thd, int type, sp_head *sp) log_query.append(STRING_WITH_LEN("CREATE ")); append_definer(thd, &log_query, &thd->lex->definer->user, &thd->lex->definer->host); - log_query.append(thd->lex->stmt_definition_begin, - (char *)sp->m_body_begin - - thd->lex->stmt_definition_begin + - sp->m_body.length); + + LEX_STRING stmt_definition; + stmt_definition.str= (char*) thd->lex->stmt_definition_begin; + stmt_definition.length= thd->lex->stmt_definition_end + - thd->lex->stmt_definition_begin; + trim_whitespace(thd->charset(), & stmt_definition); + + log_query.append(stmt_definition.str, stmt_definition.length); /* Such a statement can always go directly to binlog, no trans cache */ thd->binlog_query(THD::MYSQL_QUERY_TYPE, diff --git a/sql/sp_head.cc b/sql/sp_head.cc index 912176191cf..bd38de2dd42 100644 --- a/sql/sp_head.cc +++ b/sql/sp_head.cc @@ -564,18 +564,17 @@ sp_head::init_strings(THD *thd, LEX *lex) m_params.str= strmake_root(root, m_param_begin, m_params.length); } - /* If ptr has overrun end_of_query then end_of_query is the end */ - endp= (lip->ptr > lip->end_of_query ? lip->end_of_query : lip->ptr); - /* - Trim "garbage" at the end. This is sometimes needed with the - "/ * ! VERSION... * /" wrapper in dump files. - */ - endp= skip_rear_comments(thd->charset(), m_body_begin, endp); + endp= lip->get_cpp_ptr(); + lex->stmt_definition_end= endp; m_body.length= endp - m_body_begin; m_body.str= strmake_root(root, m_body_begin, m_body.length); - m_defstr.length= endp - lip->buf; - m_defstr.str= strmake_root(root, lip->buf, m_defstr.length); + trim_whitespace(thd->charset(), & m_body); + + m_defstr.length= endp - lip->get_cpp_buf(); + m_defstr.str= strmake_root(root, lip->get_cpp_buf(), m_defstr.length); + trim_whitespace(thd->charset(), & m_defstr); + DBUG_VOID_RETURN; } @@ -1827,8 +1826,6 @@ sp_head::reset_lex(THD *thd) sublex->trg_table_fields.empty(); sublex->sp_lex_in_use= FALSE; - sublex->in_comment= oldlex->in_comment; - /* Reset type info. */ sublex->charset= NULL; diff --git a/sql/sql_lex.cc b/sql/sql_lex.cc index f169f2d3c4a..1b8d90d51b6 100644 --- a/sql/sql_lex.cc +++ b/sql/sql_lex.cc @@ -31,16 +31,6 @@ sys_var *trg_new_row_fake_var= (sys_var*) 0x01; -/* Macros to look like lex */ - -#define yyGet() ((uchar) *(lip->ptr++)) -#define yyGetLast() ((uchar) lip->ptr[-1]) -#define yyPeek() ((uchar) lip->ptr[0]) -#define yyPeek2() ((uchar) lip->ptr[1]) -#define yyUnget() lip->ptr-- -#define yySkip() lip->ptr++ -#define yyLength() ((uint) (lip->ptr - lip->tok_start)-1) - /* Longest standard keyword name */ #define TOCK_NAME_LENGTH 24 @@ -127,17 +117,24 @@ Lex_input_stream::Lex_input_stream(THD *thd, yylineno(1), yytoklen(0), yylval(NULL), - ptr(buffer), - tok_start(NULL), - tok_end(NULL), - end_of_query(buffer + length), - tok_start_prev(NULL), - buf(buffer), + m_ptr(buffer), + m_tok_start(NULL), + m_tok_end(NULL), + m_end_of_query(buffer + length), + m_tok_start_prev(NULL), + m_buf(buffer), + m_echo(true), + m_cpp_tok_start(NULL), + m_cpp_tok_start_prev(NULL), + m_cpp_tok_end(NULL), next_state(MY_LEX_START), found_semicolon(NULL), ignore_space(test(thd->variables.sql_mode & MODE_IGNORE_SPACE)), - stmt_prepare_mode(FALSE) + stmt_prepare_mode(FALSE), + in_comment(NO_COMMENT) { + m_cpp_buf= (char*) thd->alloc(length + 1); + m_cpp_ptr= m_cpp_buf; } Lex_input_stream::~Lex_input_stream() @@ -192,7 +189,6 @@ void lex_start(THD *thd) lex->parsing_options.reset(); lex->empty_field_list_on_rset= 0; lex->select_lex.select_number= 1; - lex->in_comment=0; lex->length=0; lex->part_info= 0; lex->select_lex.in_sum_expr=0; @@ -261,7 +257,7 @@ void lex_end(LEX *lex) static int find_keyword(Lex_input_stream *lip, uint len, bool function) { - const char *tok= lip->tok_start; + const char *tok= lip->get_tok_start(); SYMBOL *symbol= get_hash_symbol(tok, len, function); if (symbol) @@ -312,9 +308,9 @@ bool is_lex_native_function(const LEX_STRING *name) static LEX_STRING get_token(Lex_input_stream *lip, uint skip, uint length) { LEX_STRING tmp; - yyUnget(); // ptr points now after last token char + lip->yyUnget(); // ptr points now after last token char tmp.length=lip->yytoklen=length; - tmp.str= lip->m_thd->strmake(lip->tok_start + skip, tmp.length); + tmp.str= lip->m_thd->strmake(lip->get_tok_start() + skip, tmp.length); return tmp; } @@ -332,10 +328,10 @@ static LEX_STRING get_quoted_token(Lex_input_stream *lip, LEX_STRING tmp; const char *from, *end; char *to; - yyUnget(); // ptr points now after last token char + lip->yyUnget(); // ptr points now after last token char tmp.length= lip->yytoklen=length; tmp.str=(char*) lip->m_thd->alloc(tmp.length+1); - from= lip->tok_start + skip; + from= lip->get_tok_start() + skip; to= tmp.str; end= to+length; for ( ; to != end; ) @@ -353,23 +349,25 @@ static LEX_STRING get_quoted_token(Lex_input_stream *lip, Fix sometimes to do only one scan of the string */ -static char *get_text(Lex_input_stream *lip) +static char *get_text(Lex_input_stream *lip, int pre_skip, int post_skip) { reg1 uchar c,sep; uint found_escape=0; CHARSET_INFO *cs= lip->m_thd->charset(); - sep= yyGetLast(); // String should end with this - while (lip->ptr != lip->end_of_query) + sep= lip->yyGetLast(); // String should end with this + while (! lip->eof()) { - c = yyGet(); + c= lip->yyGet(); #ifdef USE_MB { int l; if (use_mb(cs) && - (l = my_ismbchar(cs, lip->ptr-1, lip->end_of_query))) { - lip->ptr += l-1; - continue; + (l = my_ismbchar(cs, + lip->get_ptr() -1, + lip->get_end_of_query()))) { + lip->skip_binary(l-1); + continue; } } #endif @@ -377,26 +375,31 @@ static char *get_text(Lex_input_stream *lip) !(lip->m_thd->variables.sql_mode & MODE_NO_BACKSLASH_ESCAPES)) { // Escaped character found_escape=1; - if (lip->ptr == lip->end_of_query) + if (lip->eof()) return 0; - yySkip(); + lip->yySkip(); } else if (c == sep) { - if (c == yyGet()) // Check if two separators in a row + if (c == lip->yyGet()) // Check if two separators in a row { - found_escape=1; // dupplicate. Remember for delete + found_escape=1; // duplicate. Remember for delete continue; } else - yyUnget(); + lip->yyUnget(); /* Found end. Unescape and return string */ const char *str, *end; char *start; - str=lip->tok_start+1; - end=lip->ptr-1; + str= lip->get_tok_start(); + end= lip->get_ptr(); + /* Extract the text from the token */ + str += pre_skip; + end -= post_skip; + DBUG_ASSERT(end >= str); + if (!(start= (char*) lip->m_thd->alloc((uint) (end-str)+1))) return (char*) ""; // Sql_alloc has set error flag if (!found_escape) @@ -581,9 +584,7 @@ int MYSQLlex(void *arg, void *yythd) lip->yylval=yylval; // The global state - lip->tok_start_prev= lip->tok_start; - - lip->tok_start=lip->tok_end=lip->ptr; + lip->start_token(); state=lip->next_state; lip->next_state=MY_LEX_OPERATOR_OR_IDENT; LINT_INIT(c); @@ -592,17 +593,22 @@ int MYSQLlex(void *arg, void *yythd) switch (state) { case MY_LEX_OPERATOR_OR_IDENT: // Next is operator or keyword case MY_LEX_START: // Start of token - // Skip startspace - for (c=yyGet() ; (state_map[c] == MY_LEX_SKIP) ; c= yyGet()) + // Skip starting whitespace + while(state_map[c= lip->yyPeek()] == MY_LEX_SKIP) { if (c == '\n') lip->yylineno++; + + lip->yySkip(); } - lip->tok_start=lip->ptr-1; // Start of real token + + /* Start of real token */ + lip->restart_token(); + c= lip->yyGet(); state= (enum my_lex_states) state_map[c]; break; case MY_LEX_ESCAPE: - if (yyGet() == 'N') + if (lip->yyGet() == 'N') { // Allow \N as shortcut for NULL yylval->lex_str.str=(char*) "\\N"; yylval->lex_str.length=2; @@ -610,40 +616,53 @@ int MYSQLlex(void *arg, void *yythd) } case MY_LEX_CHAR: // Unknown or single char token case MY_LEX_SKIP: // This should not happen - if (c == '-' && yyPeek() == '-' && - (my_isspace(cs,yyPeek2()) || - my_iscntrl(cs,yyPeek2()))) + if (c == '-' && lip->yyPeek() == '-' && + (my_isspace(cs,lip->yyPeekn(1)) || + my_iscntrl(cs,lip->yyPeekn(1)))) { state=MY_LEX_COMMENT; break; } - yylval->lex_str.str=(char*) (lip->ptr=lip->tok_start);// Set to first chr - yylval->lex_str.length=1; - c=yyGet(); + if (c != ')') lip->next_state= MY_LEX_START; // Allow signed numbers + if (c == ',') - lip->tok_start=lip->ptr; // Let tok_start point at next item - /* - Check for a placeholder: it should not precede a possible identifier - because of binlogging: when a placeholder is replaced with - its value in a query for the binlog, the query must stay - grammatically correct. - */ - else if (c == '?' && lip->stmt_prepare_mode && !ident_map[yyPeek()]) + { + /* + Warning: + This is a work around, to make the "remember_name" rule in + sql/sql_yacc.yy work properly. + The problem is that, when parsing "select expr1, expr2", + the code generated by bison executes the *pre* action + remember_name (see select_item) *before* actually parsing the + first token of expr2. + */ + lip->restart_token(); + } + else + { + /* + Check for a placeholder: it should not precede a possible identifier + because of binlogging: when a placeholder is replaced with + its value in a query for the binlog, the query must stay + grammatically correct. + */ + if (c == '?' && lip->stmt_prepare_mode && !ident_map[lip->yyPeek()]) return(PARAM_MARKER); + } + return((int) c); case MY_LEX_IDENT_OR_NCHAR: - if (yyPeek() != '\'') - { + if (lip->yyPeek() != '\'') + { state= MY_LEX_IDENT; break; } /* Found N'string' */ - lip->tok_start++; // Skip N - yySkip(); // Skip ' - if (!(yylval->lex_str.str = get_text(lip))) + lip->yySkip(); // Skip ' + if (!(yylval->lex_str.str = get_text(lip, 2, 1))) { state= MY_LEX_CHAR; // Read char by char break; @@ -652,13 +671,13 @@ int MYSQLlex(void *arg, void *yythd) return(NCHAR_STRING); case MY_LEX_IDENT_OR_HEX: - if (yyPeek() == '\'') + if (lip->yyPeek() == '\'') { // Found x'hex-number' state= MY_LEX_HEX_NUMBER; break; } case MY_LEX_IDENT_OR_BIN: - if (yyPeek() == '\'') + if (lip->yyPeek() == '\'') { // Found b'bin-number' state= MY_LEX_BIN_NUMBER; break; @@ -669,54 +688,58 @@ int MYSQLlex(void *arg, void *yythd) if (use_mb(cs)) { result_state= IDENT_QUOTED; - if (my_mbcharlen(cs, yyGetLast()) > 1) + if (my_mbcharlen(cs, lip->yyGetLast()) > 1) { - int l = my_ismbchar(cs, lip->ptr-1, lip->end_of_query); + int l = my_ismbchar(cs, + lip->get_ptr() -1, + lip->get_end_of_query()); if (l == 0) { state = MY_LEX_CHAR; continue; } - lip->ptr += l - 1; + lip->skip_binary(l - 1); } - while (ident_map[c=yyGet()]) + while (ident_map[c=lip->yyGet()]) { if (my_mbcharlen(cs, c) > 1) { int l; - if ((l = my_ismbchar(cs, lip->ptr-1, lip->end_of_query)) == 0) + if ((l = my_ismbchar(cs, + lip->get_ptr() -1, + lip->get_end_of_query())) == 0) break; - lip->ptr += l-1; + lip->skip_binary(l-1); } } } else #endif { - for (result_state= c; ident_map[c= yyGet()]; result_state|= c); + for (result_state= c; ident_map[c= lip->yyGet()]; result_state|= c); /* If there were non-ASCII characters, mark that we must convert */ result_state= result_state & 0x80 ? IDENT_QUOTED : IDENT; } - length= (uint) (lip->ptr - lip->tok_start)-1; - start= lip->ptr; + length= lip->yyLength(); + start= lip->get_ptr(); if (lip->ignore_space) { /* If we find a space then this can't be an identifier. We notice this below by checking start != lex->ptr. */ - for (; state_map[c] == MY_LEX_SKIP ; c= yyGet()); + for (; state_map[c] == MY_LEX_SKIP ; c= lip->yyGet()); } - if (start == lip->ptr && c == '.' && ident_map[yyPeek()]) + if (start == lip->get_ptr() && c == '.' && ident_map[lip->yyPeek()]) lip->next_state=MY_LEX_IDENT_SEP; else { // '(' must follow directly if function - yyUnget(); - if ((tokval = find_keyword(lip, length,c == '('))) + lip->yyUnget(); + if ((tokval = find_keyword(lip, length, c == '('))) { lip->next_state= MY_LEX_START; // Allow signed numbers return(tokval); // Was keyword } - yySkip(); // next state does a unget + lip->yySkip(); // next state does a unget } yylval->lex_str=get_token(lip, 0, length); @@ -735,16 +758,48 @@ int MYSQLlex(void *arg, void *yythd) return(result_state); // IDENT or IDENT_QUOTED case MY_LEX_IDENT_SEP: // Found ident and now '.' - yylval->lex_str.str=(char*) lip->ptr; - yylval->lex_str.length=1; - c=yyGet(); // should be '.' + yylval->lex_str.str= (char*) lip->get_ptr(); + yylval->lex_str.length= 1; + c= lip->yyGet(); // should be '.' lip->next_state= MY_LEX_IDENT_START;// Next is an ident (not a keyword) - if (!ident_map[yyPeek()]) // Probably ` or " + if (!ident_map[lip->yyPeek()]) // Probably ` or " lip->next_state= MY_LEX_START; return((int) c); case MY_LEX_NUMBER_IDENT: // number or ident which num-start - while (my_isdigit(cs,(c = yyGet()))) ; + if (lip->yyGetLast() == '0') + { + c= lip->yyGet(); + if (c == 'x') + { + while (my_isxdigit(cs,(c = lip->yyGet()))) ; + if ((lip->yyLength() >= 3) && !ident_map[c]) + { + /* skip '0x' */ + yylval->lex_str=get_token(lip, 2, lip->yyLength()-2); + return (HEX_NUM); + } + lip->yyUnget(); + state= MY_LEX_IDENT_START; + break; + } + else if (c == 'b') + { + while ((c= lip->yyGet()) == '0' || c == '1'); + if ((lip->yyLength() >= 3) && !ident_map[c]) + { + /* Skip '0b' */ + yylval->lex_str= get_token(lip, 2, lip->yyLength()-2); + return (BIN_NUM); + } + lip->yyUnget(); + state= MY_LEX_IDENT_START; + break; + } + lip->yyUnget(); + } + + while (my_isdigit(cs, (c = lip->yyGet()))) ; if (!ident_map[c]) { // Can't be identifier state=MY_LEX_INT_OR_REAL; @@ -753,42 +808,18 @@ int MYSQLlex(void *arg, void *yythd) if (c == 'e' || c == 'E') { // The following test is written this way to allow numbers of type 1e1 - if (my_isdigit(cs,yyPeek()) || - (c=(yyGet())) == '+' || c == '-') + if (my_isdigit(cs,lip->yyPeek()) || + (c=(lip->yyGet())) == '+' || c == '-') { // Allow 1E+10 - if (my_isdigit(cs,yyPeek())) // Number must have digit after sign + if (my_isdigit(cs,lip->yyPeek())) // Number must have digit after sign { - yySkip(); - while (my_isdigit(cs,yyGet())) ; - yylval->lex_str=get_token(lip, 0, yyLength()); + lip->yySkip(); + while (my_isdigit(cs,lip->yyGet())) ; + yylval->lex_str=get_token(lip, 0, lip->yyLength()); return(FLOAT_NUM); } } - yyUnget(); /* purecov: inspected */ - } - else if (c == 'x' && (lip->ptr - lip->tok_start) == 2 && - lip->tok_start[0] == '0' ) - { // Varbinary - while (my_isxdigit(cs,(c = yyGet()))) ; - if ((lip->ptr - lip->tok_start) >= 4 && !ident_map[c]) - { - /* skip '0x' */ - yylval->lex_str=get_token(lip, 2, yyLength()-2); - return (HEX_NUM); - } - yyUnget(); - } - else if (c == 'b' && (lip->ptr - lip->tok_start) == 2 && - lip->tok_start[0] == '0' ) - { // b'bin-number' - while (my_isxdigit(cs,(c = yyGet()))) ; - if ((lip->ptr - lip->tok_start) >= 4 && !ident_map[c]) - { - /* Skip '0b' */ - yylval->lex_str= get_token(lip, 2, yyLength()-2); - return (BIN_NUM); - } - yyUnget(); + lip->yyUnget(); } // fall through case MY_LEX_IDENT_START: // We come here after '.' @@ -797,44 +828,46 @@ int MYSQLlex(void *arg, void *yythd) if (use_mb(cs)) { result_state= IDENT_QUOTED; - while (ident_map[c=yyGet()]) + while (ident_map[c=lip->yyGet()]) { if (my_mbcharlen(cs, c) > 1) { int l; - if ((l = my_ismbchar(cs, lip->ptr-1, lip->end_of_query)) == 0) + if ((l = my_ismbchar(cs, + lip->get_ptr() -1, + lip->get_end_of_query())) == 0) break; - lip->ptr += l-1; + lip->skip_binary(l-1); } } } else #endif { - for (result_state=0; ident_map[c= yyGet()]; result_state|= c); + for (result_state=0; ident_map[c= lip->yyGet()]; result_state|= c); /* If there were non-ASCII characters, mark that we must convert */ result_state= result_state & 0x80 ? IDENT_QUOTED : IDENT; } - if (c == '.' && ident_map[yyPeek()]) + if (c == '.' && ident_map[lip->yyPeek()]) lip->next_state=MY_LEX_IDENT_SEP;// Next is '.' - yylval->lex_str= get_token(lip, 0, yyLength()); + yylval->lex_str= get_token(lip, 0, lip->yyLength()); return(result_state); case MY_LEX_USER_VARIABLE_DELIMITER: // Found quote char { uint double_quotes= 0; char quote_char= c; // Used char - while ((c=yyGet())) + while ((c=lip->yyGet())) { int var_length; if ((var_length= my_mbcharlen(cs, c)) == 1) { if (c == quote_char) { - if (yyPeek() != quote_char) + if (lip->yyPeek() != quote_char) break; - c=yyGet(); + c=lip->yyGet(); double_quotes++; continue; } @@ -842,78 +875,78 @@ int MYSQLlex(void *arg, void *yythd) #ifdef USE_MB else if (var_length < 1) break; // Error - lip->ptr+= var_length-1; + lip->skip_binary(var_length-1); #endif } if (double_quotes) yylval->lex_str=get_quoted_token(lip, 1, - yyLength() - double_quotes -1, + lip->yyLength() - double_quotes -1, quote_char); else - yylval->lex_str=get_token(lip, 1, yyLength() -1); + yylval->lex_str=get_token(lip, 1, lip->yyLength() -1); if (c == quote_char) - yySkip(); // Skip end ` + lip->yySkip(); // Skip end ` lip->next_state= MY_LEX_START; return(IDENT_QUOTED); } case MY_LEX_INT_OR_REAL: // Complete int or incomplete real if (c != '.') { // Found complete integer number. - yylval->lex_str=get_token(lip, 0, yyLength()); + yylval->lex_str=get_token(lip, 0, lip->yyLength()); return int_token(yylval->lex_str.str,yylval->lex_str.length); } // fall through case MY_LEX_REAL: // Incomplete real number - while (my_isdigit(cs,c = yyGet())) ; + while (my_isdigit(cs,c = lip->yyGet())) ; if (c == 'e' || c == 'E') { - c = yyGet(); + c = lip->yyGet(); if (c == '-' || c == '+') - c = yyGet(); // Skip sign + c = lip->yyGet(); // Skip sign if (!my_isdigit(cs,c)) { // No digit after sign state= MY_LEX_CHAR; break; } - while (my_isdigit(cs,yyGet())) ; - yylval->lex_str=get_token(lip, 0, yyLength()); + while (my_isdigit(cs,lip->yyGet())) ; + yylval->lex_str=get_token(lip, 0, lip->yyLength()); return(FLOAT_NUM); } - yylval->lex_str=get_token(lip, 0, yyLength()); + yylval->lex_str=get_token(lip, 0, lip->yyLength()); return(DECIMAL_NUM); case MY_LEX_HEX_NUMBER: // Found x'hexstring' - yyGet(); // Skip ' - while (my_isxdigit(cs,(c = yyGet()))) ; - length=(lip->ptr - lip->tok_start); // Length of hexnum+3 - if (!(length & 1) || c != '\'') - { - return(ABORT_SYM); // Illegal hex constant - } - yyGet(); // get_token makes an unget + lip->yySkip(); // Accept opening ' + while (my_isxdigit(cs, (c= lip->yyGet()))) ; + if (c != '\'') + return(ABORT_SYM); // Illegal hex constant + lip->yySkip(); // Accept closing ' + length= lip->yyLength(); // Length of hexnum+3 + if ((length % 2) == 0) + return(ABORT_SYM); // odd number of hex digits yylval->lex_str=get_token(lip, 2, // skip x' length-3); // don't count x' and last ' return (HEX_NUM); case MY_LEX_BIN_NUMBER: // Found b'bin-string' - yyGet(); // Skip ' - while ((c= yyGet()) == '0' || c == '1'); - length= (lip->ptr - lip->tok_start); // Length of bin-num + 3 + lip->yySkip(); // Accept opening ' + while ((c= lip->yyGet()) == '0' || c == '1'); if (c != '\'') - return(ABORT_SYM); // Illegal hex constant - yyGet(); // get_token makes an unget + return(ABORT_SYM); // Illegal hex constant + lip->yySkip(); // Accept closing ' + length= lip->yyLength(); // Length of bin-num + 3 yylval->lex_str= get_token(lip, 2, // skip b' length-3); // don't count b' and last ' return (BIN_NUM); case MY_LEX_CMP_OP: // Incomplete comparison operator - if (state_map[yyPeek()] == MY_LEX_CMP_OP || - state_map[yyPeek()] == MY_LEX_LONG_CMP_OP) - yySkip(); - if ((tokval = find_keyword(lip, (uint) (lip->ptr - lip->tok_start),0))) + if (state_map[lip->yyPeek()] == MY_LEX_CMP_OP || + state_map[lip->yyPeek()] == MY_LEX_LONG_CMP_OP) + lip->yySkip(); + if ((tokval = find_keyword(lip, lip->yyLength() + 1, 0))) { lip->next_state= MY_LEX_START; // Allow signed numbers return(tokval); @@ -922,14 +955,14 @@ int MYSQLlex(void *arg, void *yythd) break; case MY_LEX_LONG_CMP_OP: // Incomplete comparison operator - if (state_map[yyPeek()] == MY_LEX_CMP_OP || - state_map[yyPeek()] == MY_LEX_LONG_CMP_OP) + if (state_map[lip->yyPeek()] == MY_LEX_CMP_OP || + state_map[lip->yyPeek()] == MY_LEX_LONG_CMP_OP) { - yySkip(); - if (state_map[yyPeek()] == MY_LEX_CMP_OP) - yySkip(); + lip->yySkip(); + if (state_map[lip->yyPeek()] == MY_LEX_CMP_OP) + lip->yySkip(); } - if ((tokval = find_keyword(lip, (uint) (lip->ptr - lip->tok_start),0))) + if ((tokval = find_keyword(lip, lip->yyLength() + 1, 0))) { lip->next_state= MY_LEX_START; // Found long op return(tokval); @@ -938,12 +971,12 @@ int MYSQLlex(void *arg, void *yythd) break; case MY_LEX_BOOL: - if (c != yyPeek()) + if (c != lip->yyPeek()) { state=MY_LEX_CHAR; break; } - yySkip(); + lip->yySkip(); tokval = find_keyword(lip,2,0); // Is a bool operator lip->next_state= MY_LEX_START; // Allow signed numbers return(tokval); @@ -956,7 +989,7 @@ int MYSQLlex(void *arg, void *yythd) } /* " used for strings */ case MY_LEX_STRING: // Incomplete text string - if (!(yylval->lex_str.str = get_text(lip))) + if (!(yylval->lex_str.str = get_text(lip, 1, 1))) { state= MY_LEX_CHAR; // Read char by char break; @@ -966,82 +999,138 @@ int MYSQLlex(void *arg, void *yythd) case MY_LEX_COMMENT: // Comment lex->select_lex.options|= OPTION_FOUND_COMMENT; - while ((c = yyGet()) != '\n' && c) ; - yyUnget(); // Safety against eof + while ((c = lip->yyGet()) != '\n' && c) ; + lip->yyUnget(); // Safety against eof state = MY_LEX_START; // Try again break; case MY_LEX_LONG_COMMENT: /* Long C comment? */ - if (yyPeek() != '*') + if (lip->yyPeek() != '*') { state=MY_LEX_CHAR; // Probable division break; } - yySkip(); // Skip '*' lex->select_lex.options|= OPTION_FOUND_COMMENT; - if (yyPeek() == '!') // MySQL command in comment + /* Reject '/' '*', since we might need to turn off the echo */ + lip->yyUnget(); + + if (lip->yyPeekn(2) == '!') { - ulong version=MYSQL_VERSION_ID; - yySkip(); - state=MY_LEX_START; - if (my_isdigit(cs,yyPeek())) - { // Version number - version=strtol((char*) lip->ptr,(char**) &lip->ptr,10); - } - if (version <= MYSQL_VERSION_ID) - { - lex->in_comment=1; - break; - } + lip->in_comment= DISCARD_COMMENT; + /* Accept '/' '*' '!', but do not keep this marker. */ + lip->set_echo(false); + lip->yySkip(); + lip->yySkip(); + lip->yySkip(); + + /* + The special comment format is very strict: + '/' '*' '!', followed by exactly + 2 digits (major), then 3 digits (minor). + */ + char version_str[6]; + version_str[0]= lip->yyPeekn(0); + version_str[1]= lip->yyPeekn(1); + version_str[2]= lip->yyPeekn(2); + version_str[3]= lip->yyPeekn(3); + version_str[4]= lip->yyPeekn(4); + version_str[5]= 0; + if ( my_isdigit(cs, version_str[0]) + && my_isdigit(cs, version_str[1]) + && my_isdigit(cs, version_str[2]) + && my_isdigit(cs, version_str[3]) + && my_isdigit(cs, version_str[4]) + ) + { + ulong version; + version=strtol(version_str, NULL, 10); + + /* Accept 'M' 'M' 'm' 'm' 'm' */ + lip->yySkipn(5); + + if (version <= MYSQL_VERSION_ID) + { + /* Expand the content of the special comment as real code */ + lip->set_echo(true); + state=MY_LEX_START; + break; + } + } + else + { + state=MY_LEX_START; + lip->set_echo(true); + break; + } } - while (lip->ptr != lip->end_of_query && - ((c=yyGet()) != '*' || yyPeek() != '/')) + else { - if (c == '\n') - lip->yylineno++; + lip->in_comment= PRESERVE_COMMENT; + lip->yySkip(); // Accept / + lip->yySkip(); // Accept * } - if (lip->ptr != lip->end_of_query) - yySkip(); // remove last '/' - state = MY_LEX_START; // Try again + + while (! lip->eof() && + ((c=lip->yyGet()) != '*' || lip->yyPeek() != '/')) + { + if (c == '\n') + lip->yylineno++; + } + if (! lip->eof()) + lip->yySkip(); // remove last '/' + state = MY_LEX_START; // Try again + lip->set_echo(true); break; case MY_LEX_END_LONG_COMMENT: - if (lex->in_comment && yyPeek() == '/') + if ((lip->in_comment != NO_COMMENT) && lip->yyPeek() == '/') { - yySkip(); - lex->in_comment=0; - state=MY_LEX_START; + /* Reject '*' '/' */ + lip->yyUnget(); + /* Accept '*' '/', with the proper echo */ + lip->set_echo(lip->in_comment == PRESERVE_COMMENT); + lip->yySkipn(2); + /* And start recording the tokens again */ + lip->set_echo(true); + lip->in_comment=NO_COMMENT; + state=MY_LEX_START; } else state=MY_LEX_CHAR; // Return '*' break; case MY_LEX_SET_VAR: // Check if ':=' - if (yyPeek() != '=') + if (lip->yyPeek() != '=') { state=MY_LEX_CHAR; // Return ':' break; } - yySkip(); + lip->yySkip(); return (SET_VAR); case MY_LEX_SEMICOLON: // optional line terminator - if (yyPeek()) + if (lip->yyPeek()) { if ((thd->client_capabilities & CLIENT_MULTI_STATEMENTS) && !lip->stmt_prepare_mode) { lex->safe_to_cache_query= 0; - lip->found_semicolon= lip->ptr; + lip->found_semicolon= lip->get_ptr(); thd->server_status|= SERVER_MORE_RESULTS_EXISTS; lip->next_state= MY_LEX_END; + lip->set_echo(true); return (END_OF_INPUT); } state= MY_LEX_CHAR; // Return ';' break; } - /* fall true */ + lip->next_state=MY_LEX_END; // Mark for next loop + return(END_OF_INPUT); case MY_LEX_EOL: - if (lip->ptr >= lip->end_of_query) + if (lip->eof()) { - lip->next_state=MY_LEX_END; // Mark for next loop - return(END_OF_INPUT); + lip->yyUnget(); // Reject the last '\0' + lip->set_echo(false); + lip->yySkip(); + lip->set_echo(true); + lip->next_state=MY_LEX_END; // Mark for next loop + return(END_OF_INPUT); } state=MY_LEX_CHAR; break; @@ -1051,16 +1140,16 @@ int MYSQLlex(void *arg, void *yythd) /* Actually real shouldn't start with . but allow them anyhow */ case MY_LEX_REAL_OR_POINT: - if (my_isdigit(cs,yyPeek())) + if (my_isdigit(cs,lip->yyPeek())) state = MY_LEX_REAL; // Real else { state= MY_LEX_IDENT_SEP; // return '.' - yyUnget(); // Put back '.' + lip->yyUnget(); // Put back '.' } break; case MY_LEX_USER_END: // end '@' of user@hostname - switch (state_map[yyPeek()]) { + switch (state_map[lip->yyPeek()]) { case MY_LEX_STRING: case MY_LEX_USER_VARIABLE_DELIMITER: case MY_LEX_STRING_OR_DELIMITER: @@ -1072,20 +1161,20 @@ int MYSQLlex(void *arg, void *yythd) lip->next_state=MY_LEX_HOSTNAME; break; } - yylval->lex_str.str=(char*) lip->ptr; + yylval->lex_str.str=(char*) lip->get_ptr(); yylval->lex_str.length=1; return((int) '@'); case MY_LEX_HOSTNAME: // end '@' of user@hostname - for (c=yyGet() ; + for (c=lip->yyGet() ; my_isalnum(cs,c) || c == '.' || c == '_' || c == '$'; - c= yyGet()) ; - yylval->lex_str=get_token(lip, 0, yyLength()); + c= lip->yyGet()) ; + yylval->lex_str=get_token(lip, 0, lip->yyLength()); return(LEX_HOSTNAME); case MY_LEX_SYSTEM_VAR: - yylval->lex_str.str=(char*) lip->ptr; + yylval->lex_str.str=(char*) lip->get_ptr(); yylval->lex_str.length=1; - yySkip(); // Skip '@' - lip->next_state= (state_map[yyPeek()] == + lip->yySkip(); // Skip '@' + lip->next_state= (state_map[lip->yyPeek()] == MY_LEX_USER_VARIABLE_DELIMITER ? MY_LEX_OPERATOR_OR_IDENT : MY_LEX_IDENT_OR_KEYWORD); @@ -1096,19 +1185,19 @@ int MYSQLlex(void *arg, void *yythd) We should now be able to handle: [(global | local | session) .]variable_name */ - - for (result_state= 0; ident_map[c= yyGet()]; result_state|= c); + + for (result_state= 0; ident_map[c= lip->yyGet()]; result_state|= c); /* If there were non-ASCII characters, mark that we must convert */ result_state= result_state & 0x80 ? IDENT_QUOTED : IDENT; - + if (c == '.') lip->next_state=MY_LEX_IDENT_SEP; - length= (uint) (lip->ptr - lip->tok_start)-1; - if (length == 0) + length= lip->yyLength(); + if (length == 0) return(ABORT_SYM); // Names must be nonempty. if ((tokval= find_keyword(lip, length,0))) { - yyUnget(); // Put back 'c' + lip->yyUnget(); // Put back 'c' return(tokval); // Was keyword } yylval->lex_str=get_token(lip, 0, length); @@ -1149,32 +1238,31 @@ Alter_info::Alter_info(const Alter_info &rhs, MEM_ROOT *mem_root) } -/* - Skip comment in the end of statement. - - SYNOPSIS - skip_rear_comments() - cs character set - begin pointer to the beginning of statement - end pointer to the end of statement - - DESCRIPTION - The function is intended to trim comments at the end of the statement. +void trim_whitespace(CHARSET_INFO *cs, LEX_STRING *str) +{ + /* + TODO: + This code assumes that there are no multi-bytes characters + that can be considered white-space. + */ - RETURN - Pointer to the last non-comment symbol of the statement. -*/ + while ((str->length > 0) && (my_isspace(cs, str->str[0]))) + { + str->length --; + str->str ++; + } -const char *skip_rear_comments(CHARSET_INFO *cs, const char *begin, - const char *end) -{ - while (begin < end && (end[-1] == '*' || - end[-1] == '/' || end[-1] == ';' || - my_isspace(cs, end[-1]))) - end-= 1; - return end; + /* + FIXME: + Also, parsing backward is not safe with multi bytes characters + */ + while ((str->length > 0) && (my_isspace(cs, str->str[str->length-1]))) + { + str->length --; + } } + /* st_select_lex structures initialisations */ diff --git a/sql/sql_lex.h b/sql/sql_lex.h index 2cecf282e7f..7057bf4b5ef 100644 --- a/sql/sql_lex.h +++ b/sql/sql_lex.h @@ -1049,8 +1049,40 @@ struct st_parsing_options /** + The state of the lexical parser, when parsing comments. +*/ +enum enum_comment_state +{ + /** + Not parsing comments. + */ + NO_COMMENT, + /** + Parsing comments that need to be preserved. + Typically, these are user comments '/' '*' ... '*' '/'. + */ + PRESERVE_COMMENT, + /** + Parsing comments that need to be discarded. + Typically, these are special comments '/' '*' '!' ... '*' '/', + or '/' '*' '!' 'M' 'M' 'm' 'm' 'm' ... '*' '/', where the comment + markers should not be expanded. + */ + DISCARD_COMMENT +}; + + +/** This class represents the character input stream consumed during lexical analysis. + In addition to consuming the input stream, this class performs some + comment pre processing, by filtering out out of bound special text + from the query input stream. + Two buffers, with pointers inside each buffers, are maintained in + parallel. The 'raw' buffer is the original query text, which may + contain out-of-bound comments. The 'cpp' (for comments pre processor) + is the pre-processed buffer that contains only the query text that + should be seen once out-of-bound data is removed. */ class Lex_input_stream { @@ -1058,6 +1090,218 @@ public: Lex_input_stream(THD *thd, const char* buff, unsigned int length); ~Lex_input_stream(); + /** + Set the echo mode. + When echo is true, characters parsed from the raw input stream are + preserved. When false, characters parsed are silently ignored. + @param echo the echo mode. + */ + void set_echo(bool echo) + { + m_echo= echo; + } + + /** + Skip binary from the input stream. + @param n number of bytes to accept. + */ + void skip_binary(int n) + { + if (m_echo) + { + memcpy(m_cpp_ptr, m_ptr, n); + m_cpp_ptr += n; + } + m_ptr += n; + } + + /** + Get a character, and advance in the stream. + @return the next character to parse. + */ + char yyGet() + { + char c= *m_ptr++; + if (m_echo) + *m_cpp_ptr++ = c; + return c; + } + + /** + Get the last character accepted. + @return the last character accepted. + */ + char yyGetLast() + { + return m_ptr[-1]; + } + + /** + Look at the next character to parse, but do not accept it. + */ + char yyPeek() + { + return m_ptr[0]; + } + + /** + Look ahead at some character to parse. + @param n offset of the character to look up + */ + char yyPeekn(int n) + { + return m_ptr[n]; + } + + /** + Cancel the effect of the last yyGet() or yySkip(). + Note that the echo mode should not change between calls to yyGet / yySkip + and yyUnget. The caller is responsible for ensuring that. + */ + void yyUnget() + { + m_ptr--; + if (m_echo) + m_cpp_ptr--; + } + + /** + Accept a character, by advancing the input stream. + */ + void yySkip() + { + if (m_echo) + *m_cpp_ptr++ = *m_ptr++; + else + m_ptr++; + } + + /** + Accept multiple characters at once. + @param n the number of characters to accept. + */ + void yySkipn(int n) + { + if (m_echo) + { + memcpy(m_cpp_ptr, m_ptr, n); + m_cpp_ptr += n; + } + m_ptr += n; + } + + /** + End of file indicator for the query text to parse. + @return true if there are no more characters to parse + */ + bool eof() + { + return (m_ptr >= m_end_of_query); + } + + /** + End of file indicator for the query text to parse. + @param n number of characters expected + @return true if there are less than n characters to parse + */ + bool eof(int n) + { + return ((m_ptr + n) >= m_end_of_query); + } + + /** Get the raw query buffer. */ + const char* get_buf() + { + return m_buf; + } + + /** Get the pre-processed query buffer. */ + const char* get_cpp_buf() + { + return m_cpp_buf; + } + + /** Get the end of the raw query buffer. */ + const char* get_end_of_query() + { + return m_end_of_query; + } + + /** Mark the stream position as the start of a new token. */ + void start_token() + { + m_tok_start_prev= m_tok_start; + m_tok_start= m_ptr; + m_tok_end= m_ptr; + + m_cpp_tok_start_prev= m_cpp_tok_start; + m_cpp_tok_start= m_cpp_ptr; + m_cpp_tok_end= m_cpp_ptr; + } + + /** + Adjust the starting position of the current token. + This is used to compensate for starting whitespace. + */ + void restart_token() + { + m_tok_start= m_ptr; + m_cpp_tok_start= m_cpp_ptr; + } + + /** Get the token start position, in the raw buffer. */ + const char* get_tok_start() + { + return m_tok_start; + } + + /** Get the token start position, in the pre-processed buffer. */ + const char* get_cpp_tok_start() + { + return m_cpp_tok_start; + } + + /** Get the token end position, in the raw buffer. */ + const char* get_tok_end() + { + return m_tok_end; + } + + /** Get the token end position, in the pre-processed buffer. */ + const char* get_cpp_tok_end() + { + return m_cpp_tok_end; + } + + /** Get the previous token start position, in the raw buffer. */ + const char* get_tok_start_prev() + { + return m_tok_start_prev; + } + + /** Get the current stream pointer, in the raw buffer. */ + const char* get_ptr() + { + return m_ptr; + } + + /** Get the current stream pointer, in the pre-processed buffer. */ + const char* get_cpp_ptr() + { + return m_cpp_ptr; + } + + /** Get the length of the current token, in the raw buffer. */ + uint yyLength() + { + /* + The assumption is that the lexical analyser is always 1 character ahead, + which the -1 account for. + */ + DBUG_ASSERT(m_ptr > m_tok_start); + return (uint) ((m_ptr - m_tok_start) - 1); + } + /** Current thread. */ THD *m_thd; @@ -1070,37 +1314,74 @@ public: /** Interface with bison, value of the last token parsed. */ LEX_YYSTYPE yylval; - /** Pointer to the current position in the input stream. */ - const char* ptr; +private: + /** Pointer to the current position in the raw input stream. */ + const char* m_ptr; + + /** Starting position of the last token parsed, in the raw buffer. */ + const char* m_tok_start; - /** Starting position of the last token parsed. */ - const char* tok_start; + /** Ending position of the previous token parsed, in the raw buffer. */ + const char* m_tok_end; - /** Ending position of the last token parsed. */ - const char* tok_end; + /** End of the query text in the input stream, in the raw buffer. */ + const char* m_end_of_query; - /** End of the query text in the input stream. */ - const char* end_of_query; + /** Starting position of the previous token parsed, in the raw buffer. */ + const char* m_tok_start_prev; - /** Starting position of the previous token parsed. */ - const char* tok_start_prev; + /** Begining of the query text in the input stream, in the raw buffer. */ + const char* m_buf; - /** Begining of the query text in the input stream. */ - const char* buf; + /** Echo the parsed stream to the pre-processed buffer. */ + bool m_echo; + + /** Pre-processed buffer. */ + char* m_cpp_buf; + + /** Pointer to the current position in the pre-processed input stream. */ + char* m_cpp_ptr; + + /** + Starting position of the last token parsed, + in the pre-processed buffer. + */ + const char* m_cpp_tok_start; + + /** + Starting position of the previous token parsed, + in the pre-procedded buffer. + */ + const char* m_cpp_tok_start_prev; + + /** + Ending position of the previous token parsed, + in the pre-processed buffer. + */ + const char* m_cpp_tok_end; + +public: /** Current state of the lexical analyser. */ enum my_lex_states next_state; - /** Position of ';' in the stream, to delimit multiple queries. */ + /** + Position of ';' in the stream, to delimit multiple queries. + This delimiter is in the raw buffer. + */ const char* found_semicolon; /** SQL_MODE = IGNORE_SPACE. */ bool ignore_space; - /* + + /** TRUE if we're parsing a prepared statement: in this mode we should allow placeholders and disallow multi-statements. */ bool stmt_prepare_mode; + + /** State of the lexical analyser for comments. */ + enum_comment_state in_comment; }; @@ -1138,8 +1419,17 @@ typedef struct st_lex : public Query_tables_list CHARSET_INFO *charset, *underscore_charset; /* store original leaf_tables for INSERT SELECT and PS/SP */ TABLE_LIST *leaf_tables_insert; - /* Position (first character index) of SELECT of CREATE VIEW statement */ - uint create_view_select_start; + + /** Start of SELECT of CREATE VIEW statement */ + const char* create_view_select_start; + /** End of SELECT of CREATE VIEW statement */ + const char* create_view_select_end; + + /** Start of 'ON <table>', in trigger statements. */ + const char* raw_trg_on_table_name_begin; + /** End of 'ON <table>', in trigger statements. */ + const char* raw_trg_on_table_name_end; + /* Partition info structure filled in by PARTITION BY parse part */ partition_info *part_info; @@ -1238,7 +1528,9 @@ typedef struct st_lex : public Query_tables_list uint8 create_view_algorithm; uint8 create_view_check; bool drop_if_exists, drop_temporary, local_file, one_shot_set; - bool in_comment, verbose, no_write_to_binlog; + + bool verbose, no_write_to_binlog; + bool tx_chain, tx_release; /* Special JOIN::prepare mode: changing of query is prohibited. @@ -1302,10 +1594,12 @@ typedef struct st_lex : public Query_tables_list - CREATE FUNCTION (points to "FUNCTION" or "AGGREGATE"); This pointer is required to add possibly omitted DEFINER-clause to the - DDL-statement before dumping it to the binlog. + DDL-statement before dumping it to the binlog. */ const char *stmt_definition_begin; + const char *stmt_definition_end; + /* Pointers to part of LOAD DATA statement that should be rewritten during replication ("LOCAL 'filename' REPLACE INTO" part). @@ -1434,8 +1728,8 @@ extern void lex_free(void); extern void lex_start(THD *thd); extern void lex_end(LEX *lex); extern int MYSQLlex(void *arg, void *yythd); -extern const char *skip_rear_comments(CHARSET_INFO *cs, const char *ubegin, - const char *uend); + +extern void trim_whitespace(CHARSET_INFO *cs, LEX_STRING *str); extern bool is_lex_native_function(const LEX_STRING *name); diff --git a/sql/sql_trigger.cc b/sql/sql_trigger.cc index e15003ab243..f63fcab04bb 100644 --- a/sql/sql_trigger.cc +++ b/sql/sql_trigger.cc @@ -563,10 +563,13 @@ bool Table_triggers_list::create_trigger(THD *thd, TABLE_LIST *tables, append_definer(thd, stmt_query, &definer_user, &definer_host); } - stmt_query->append(thd->lex->stmt_definition_begin, - (char *) thd->lex->sphead->m_body_begin - - thd->lex->stmt_definition_begin + - thd->lex->sphead->m_body.length); + LEX_STRING stmt_definition; + stmt_definition.str= (char*) thd->lex->stmt_definition_begin; + stmt_definition.length= thd->lex->stmt_definition_end + - thd->lex->stmt_definition_begin; + trim_whitespace(thd->charset(), & stmt_definition); + + stmt_query->append(stmt_definition.str, stmt_definition.length); trg_def->str= stmt_query->c_ptr(); trg_def->length= stmt_query->length(); @@ -1032,7 +1035,11 @@ bool Table_triggers_list::check_n_load(THD *thd, const char *db, if (!(on_table_name= (LEX_STRING*) alloc_root(&table->mem_root, sizeof(LEX_STRING)))) goto err_with_lex_cleanup; - *on_table_name= lex.ident; + + on_table_name->str= (char*) lex.raw_trg_on_table_name_begin; + on_table_name->length= lex.raw_trg_on_table_name_end + - lex.raw_trg_on_table_name_begin; + if (triggers->on_table_names_list.push_back(on_table_name, &table->mem_root)) goto err_with_lex_cleanup; @@ -1348,7 +1355,12 @@ Table_triggers_list::change_table_name_in_triggers(THD *thd, /* Construct CREATE TRIGGER statement with new table name. */ buff.length(0); + + /* WARNING: 'on_table_name' is supposed to point inside 'def' */ + DBUG_ASSERT(on_table_name->str > def->str); + DBUG_ASSERT(on_table_name->str < (def->str + def->length)); before_on_len= on_table_name->str - def->str; + buff.append(def->str, before_on_len); buff.append(STRING_WITH_LEN("ON ")); append_identifier(thd, &buff, new_table_name->str, new_table_name->length); diff --git a/sql/sql_view.cc b/sql/sql_view.cc index c1e4006555f..a2d6e080763 100644 --- a/sql/sql_view.cc +++ b/sql/sql_view.cc @@ -690,7 +690,6 @@ static int mysql_register_view(THD *thd, TABLE_LIST *view, char md5[MD5_BUFF_LENGTH]; bool can_be_merged; char dir_buff[FN_REFLEN], path_buff[FN_REFLEN]; - const char *endp; LEX_STRING dir, file, path; int error= 0; DBUG_ENTER("mysql_register_view"); @@ -708,10 +707,12 @@ static int mysql_register_view(THD *thd, TABLE_LIST *view, /* fill structure */ view->query.str= str.c_ptr_safe(); view->query.length= str.length(); - view->source.str= thd->query + thd->lex->create_view_select_start; - endp= view->source.str; - endp= skip_rear_comments(thd->charset(), endp, thd->query + thd->query_length); - view->source.length= endp - view->source.str; + + view->source.str= (char*) thd->lex->create_view_select_start; + view->source.length= (thd->lex->create_view_select_end + - thd->lex->create_view_select_start); + trim_whitespace(thd->charset(), & view->source); + view->file_version= 1; view->calc_md5(md5); view->md5.str= md5; diff --git a/sql/sql_yacc.yy b/sql/sql_yacc.yy index 4ffe8647513..837c1699249 100644 --- a/sql/sql_yacc.yy +++ b/sql/sql_yacc.yy @@ -106,7 +106,7 @@ void my_parse_error(const char *s) THD *thd= current_thd; Lex_input_stream *lip= thd->m_lip; - const char *yytext= lip->tok_start; + const char *yytext= lip->get_tok_start(); /* Push an error into the error stack */ my_printf_error(ER_PARSE_ERROR, ER(ER_PARSE_ERROR), MYF(0), s, (yytext ? yytext : ""), @@ -1872,9 +1872,9 @@ ev_sql_stmt: bzero((char *)&lex->sp_chistics, sizeof(st_sp_chistics)); lex->sphead->m_chistics= &lex->sp_chistics; - lex->sphead->m_body_begin= lip->ptr; + lex->sphead->m_body_begin= lip->get_cpp_ptr(); - lex->event_parse_data->body_begin= lip->ptr; + lex->event_parse_data->body_begin= lip->get_cpp_ptr(); } ev_sql_stmt_inner @@ -1986,6 +1986,7 @@ create_function_tail: LEX *lex= thd->lex; Lex_input_stream *lip= thd->m_lip; sp_head *sp; + const char* tmp_param_begin; /* First check if AGGREGATE was used, in that case it's a @@ -2017,7 +2018,10 @@ create_function_tail: */ $<ulong_num>$= thd->client_capabilities & CLIENT_MULTI_QUERIES; thd->client_capabilities &= ~CLIENT_MULTI_QUERIES; - lex->sphead->m_param_begin= lip->tok_start+1; + + tmp_param_begin= lip->get_cpp_tok_start(); + tmp_param_begin++; + lex->sphead->m_param_begin= tmp_param_begin; } sp_fdparam_list ')' { @@ -2025,7 +2029,7 @@ create_function_tail: LEX *lex= thd->lex; Lex_input_stream *lip= thd->m_lip; - lex->sphead->m_param_end= lip->tok_start; + lex->sphead->m_param_end= lip->get_cpp_tok_start(); } RETURNS_SYM { @@ -2065,7 +2069,7 @@ create_function_tail: Lex_input_stream *lip= thd->m_lip; lex->sphead->m_chistics= &lex->sp_chistics; - lex->sphead->m_body_begin= lip->tok_start; + lex->sphead->m_body_begin= lip->get_cpp_tok_start(); } sp_proc_stmt { @@ -2676,7 +2680,7 @@ sp_proc_stmt_statement: Lex_input_stream *lip= thd->m_lip; lex->sphead->reset_lex(thd); - lex->sphead->m_tmp_query= lip->tok_start; + lex->sphead->m_tmp_query= lip->get_tok_start(); } statement { @@ -2709,9 +2713,9 @@ sp_proc_stmt_statement: lex->tok_end otherwise. */ if (yychar == YYEMPTY) - i->m_query.length= lip->ptr - sp->m_tmp_query; + i->m_query.length= lip->get_ptr() - sp->m_tmp_query; else - i->m_query.length= lip->tok_end - sp->m_tmp_query; + i->m_query.length= lip->get_tok_end() - sp->m_tmp_query; i->m_query.str= strmake_root(thd->mem_root, sp->m_tmp_query, i->m_query.length); @@ -6229,14 +6233,14 @@ remember_name: { THD *thd= YYTHD; Lex_input_stream *lip= thd->m_lip; - $$= (char*) lip->tok_start; + $$= (char*) lip->get_cpp_tok_start(); }; remember_end: { THD *thd= YYTHD; Lex_input_stream *lip= thd->m_lip; - $$=(char*) lip->tok_end; + $$= (char*) lip->get_cpp_tok_end(); }; select_item2: @@ -8010,7 +8014,7 @@ procedure_item: if (add_proc_to_list(thd, $2)) MYSQL_YYABORT; if (!$2->name) - $2->set_name($1,(uint) ((char*) lip->tok_end - $1), + $2->set_name($1,(uint) ((char*) lip->get_tok_end() - $1), thd->charset()); } ; @@ -9111,7 +9115,7 @@ load: LOAD DATA_SYM my_error(ER_SP_BADSTATEMENT, MYF(0), "LOAD DATA"); MYSQL_YYABORT; } - lex->fname_start= lip->ptr; + lex->fname_start= lip->get_ptr(); } load_data {} @@ -9148,7 +9152,7 @@ load_data: THD *thd= YYTHD; LEX *lex= thd->lex; Lex_input_stream *lip= thd->m_lip; - lex->fname_end= lip->ptr; + lex->fname_end= lip->get_ptr(); } TABLE_SYM table_ident { @@ -9337,7 +9341,7 @@ param_marker: my_error(ER_VIEW_SELECT_VARIABLE, MYF(0)); MYSQL_YYABORT; } - item= new Item_param((uint) (lip->tok_start - thd->query)); + item= new Item_param((uint) (lip->get_tok_start() - thd->query)); if (!($$= item) || lex->param_list.push_back(item)) { my_message(ER_OUT_OF_RESOURCES, ER(ER_OUT_OF_RESOURCES), MYF(0)); @@ -9470,7 +9474,7 @@ simple_ident: Item_splocal *splocal; splocal= new Item_splocal($1, spv->offset, spv->type, - lip->tok_start_prev - + lip->get_tok_start_prev() - lex->sphead->m_tmp_query); #ifndef DBUG_OFF if (splocal) @@ -10146,7 +10150,7 @@ option_type_value: lex->option_type=OPT_SESSION; lex->var_list.empty(); lex->one_shot_set= 0; - lex->sphead->m_tmp_query= lip->tok_start; + lex->sphead->m_tmp_query= lip->get_tok_start(); } } ext_option_value @@ -10179,9 +10183,9 @@ option_type_value: lip->tok_end otherwise. */ if (yychar == YYEMPTY) - qbuff.length= lip->ptr - sp->m_tmp_query; + qbuff.length= lip->get_ptr() - sp->m_tmp_query; else - qbuff.length= lip->tok_end - sp->m_tmp_query; + qbuff.length= lip->get_tok_end() - sp->m_tmp_query; if (!(qbuff.str= (char*) alloc_root(thd->mem_root, qbuff.length + 5))) @@ -11362,7 +11366,7 @@ view_tail: if (!lex->select_lex.add_table_to_list(thd, $3, NULL, TL_OPTION_UPDATING)) MYSQL_YYABORT; } - view_list_opt AS view_select view_check_option + view_list_opt AS view_select {} ; @@ -11387,40 +11391,32 @@ view_list: view_select: { + THD *thd= YYTHD; LEX *lex= Lex; + Lex_input_stream *lip= thd->m_lip; lex->parsing_options.allows_variable= FALSE; lex->parsing_options.allows_select_into= FALSE; lex->parsing_options.allows_select_procedure= FALSE; lex->parsing_options.allows_derived= FALSE; - } - view_select_aux + lex->create_view_select_start= lip->get_cpp_ptr(); + } + view_select_aux view_check_option { + THD *thd= YYTHD; LEX *lex= Lex; + Lex_input_stream *lip= thd->m_lip; lex->parsing_options.allows_variable= TRUE; lex->parsing_options.allows_select_into= TRUE; lex->parsing_options.allows_select_procedure= TRUE; lex->parsing_options.allows_derived= TRUE; + lex->create_view_select_end= lip->get_cpp_ptr(); } ; view_select_aux: - SELECT_SYM remember_name select_init2 - { - THD *thd= YYTHD; - LEX *lex= thd->lex; - const char *stmt_beg= (lex->sphead ? - lex->sphead->m_tmp_query : thd->query); - lex->create_view_select_start= $2 - stmt_beg; - } - | '(' remember_name select_paren ')' union_opt - { - THD *thd= YYTHD; - LEX *lex= thd->lex; - const char *stmt_beg= (lex->sphead ? - lex->sphead->m_tmp_query : thd->query); - lex->create_view_select_start= $2 - stmt_beg; - } - ; + SELECT_SYM select_init2 + | '(' select_paren ')' union_opt + ; view_check_option: /* empty */ @@ -11440,9 +11436,31 @@ view_check_option: **************************************************************************/ trigger_tail: - TRIGGER_SYM remember_name sp_name trg_action_time trg_event - ON remember_name table_ident FOR_SYM remember_name EACH_SYM ROW_SYM - { + TRIGGER_SYM + remember_name + sp_name + trg_action_time + trg_event + ON + remember_name /* $7 */ + { /* $8 */ + THD *thd= YYTHD; + LEX *lex= thd->lex; + Lex_input_stream *lip= thd->m_lip; + lex->raw_trg_on_table_name_begin= lip->get_tok_start(); + } + table_ident /* $9 */ + FOR_SYM + remember_name /* $11 */ + { /* $12 */ + THD *thd= YYTHD; + LEX *lex= thd->lex; + Lex_input_stream *lip= thd->m_lip; + lex->raw_trg_on_table_name_end= lip->get_tok_start(); + } + EACH_SYM + ROW_SYM + { /* $15 */ THD *thd= YYTHD; LEX *lex= thd->lex; Lex_input_stream *lip= thd->m_lip; @@ -11461,7 +11479,7 @@ trigger_tail: sp->init_sp_name(thd, $3); lex->stmt_definition_begin= $2; lex->ident.str= $7; - lex->ident.length= $10 - $7; + lex->ident.length= $11 - $7; sp->m_type= TYPE_ENUM_TRIGGER; lex->sphead= sp; @@ -11476,12 +11494,10 @@ trigger_tail: bzero((char *)&lex->sp_chistics, sizeof(st_sp_chistics)); lex->sphead->m_chistics= &lex->sp_chistics; - lex->sphead->m_body_begin= lip->ptr; - while (my_isspace(system_charset_info, lex->sphead->m_body_begin[0])) - ++lex->sphead->m_body_begin; + lex->sphead->m_body_begin= lip->get_cpp_ptr(); } - sp_proc_stmt - { + sp_proc_stmt /* $16 */ + { /* $17 */ LEX *lex= Lex; sp_head *sp= lex->sphead; @@ -11489,7 +11505,7 @@ trigger_tail: sp->init_strings(YYTHD, lex); /* Restore flag if it was cleared above */ - YYTHD->client_capabilities |= $<ulong_num>13; + YYTHD->client_capabilities |= $<ulong_num>15; sp->restore_thd_mem_root(YYTHD); if (sp->is_not_allowed_in_function("trigger")) @@ -11500,7 +11516,7 @@ trigger_tail: sp_proc_stmt alternatives are not saving/restoring LEX, so lex->query_tables can be wiped out. */ - if (!lex->select_lex.add_table_to_list(YYTHD, $8, + if (!lex->select_lex.add_table_to_list(YYTHD, $9, (LEX_STRING*) 0, TL_OPTION_UPDATING, TL_IGNORE)) @@ -11558,8 +11574,11 @@ sp_tail: THD *thd= YYTHD; LEX *lex= thd->lex; Lex_input_stream *lip= thd->m_lip; + const char* tmp_param_begin; - lex->sphead->m_param_begin= lip->tok_start+1; + tmp_param_begin= lip->get_cpp_tok_start(); + tmp_param_begin++; + lex->sphead->m_param_begin= tmp_param_begin; } sp_pdparam_list ')' @@ -11568,7 +11587,7 @@ sp_tail: LEX *lex= thd->lex; Lex_input_stream *lip= thd->m_lip; - lex->sphead->m_param_end= lip->tok_start; + lex->sphead->m_param_end= lip->get_cpp_tok_start(); bzero((char *)&lex->sp_chistics, sizeof(st_sp_chistics)); } sp_c_chistics @@ -11578,7 +11597,7 @@ sp_tail: Lex_input_stream *lip= thd->m_lip; lex->sphead->m_chistics= &lex->sp_chistics; - lex->sphead->m_body_begin= lip->tok_start; + lex->sphead->m_body_begin= lip->get_cpp_tok_start(); } sp_proc_stmt { |