From 41783de54966481dd295aae95a2bfc8f6302a940 Mon Sep 17 00:00:00 2001
From: Sven Sandberg
Date: Tue, 14 Jul 2009 21:31:19 +0200
Subject: BUG#39934: Slave stops for engine that only support row-based logging

General overview:
The logic for switching to row format when binlog_format=MIXED had
numerous flaws. The underlying problem was the lack of a consistent
architecture.

General purpose of this changeset:
This changeset introduces an architecture for switching to row format
when binlog_format=MIXED. It enforces the architecture where it has
to. It leaves some bugs to be fixed later. It adds extensive tests to
verify that unsafe statements work as expected and that appropriate
errors are produced by problems with the selection of binlog format.
It was not practical to split this into smaller pieces of work.

Problem 1:
To determine the logging mode, the code has to take several parameters
into account (namely: (1) the value of binlog_format; (2) the
capabilities of the engines; (3) the type of the current statement:
normal, unsafe, or row injection). These parameters may conflict in
several ways, namely:
 - binlog_format=STATEMENT for a row injection
 - binlog_format=STATEMENT for an unsafe statement
 - binlog_format=STATEMENT for an engine only supporting row logging
 - binlog_format=ROW for an engine only supporting statement logging
 - statement is unsafe and engine does not support row logging
 - row injection in a table that does not support statement logging
 - statement modifies one table that does not support row logging and
   one that does not support statement logging
Several of these conflicts were not detected, or were detected with
an inappropriate error message. The problem of BUG#39934 was that no
appropriate error message was written for the case when an engine
only supporting row logging executed a row injection with
binlog_format=ROW. However, all above cases must be handled.

Fix 1:
Introduce new error codes (sql/share/errmsg.txt).
Ensure that all conditions are detected and handled in
decide_logging_format().

Problem 2:
The binlog format shall be determined once per statement, in
decide_logging_format(). It shall not be changed before or after that.
Before decide_logging_format() is called, all information necessary to
determine the logging format must be available. This principle ensures
that all unsafe statements are handled in a consistent way.

However, this principle is not followed:
thd->set_current_stmt_binlog_row_based_if_mixed() is called in several
places, including from code executing UPDATE..LIMIT,
INSERT..SELECT..LIMIT, DELETE..LIMIT, INSERT DELAYED, and
SET @@binlog_format. After Problem 1 was fixed, that caused
inconsistencies where these unsafe statements would not print the
appropriate warnings or errors for some of the conflicts.

Fix 2:
Remove calls to THD::set_current_stmt_binlog_row_based_if_mixed() from
code executed after decide_logging_format(). Compensate by calling
set_current_stmt_unsafe() at parse time. This way, all unsafe
statements are detected by decide_logging_format().

Problem 3:
INSERT DELAYED is not treated as unsafe: it is logged in statement
format even if binlog_format=MIXED, and no warning is printed even if
binlog_format=STATEMENT. This is BUG#45825.

Fix 3:
Made INSERT DELAYED mark itself as unsafe at parse time. This allows
decide_logging_format() to detect that a warning should be printed or
the binlog_format changed.

Problem 4:
LIMIT clauses were not marked as unsafe when executed inside stored
functions/triggers/views/prepared statements. This is BUG#45785.

Fix 4:
Mark statements containing a LIMIT clause as unsafe at parse time,
instead of at execution time. This allows propagating unsafe-ness to
the view.

mysql-test/extra/rpl_tests/create_recursive_construct.inc:
  Added auxiliary file used by binlog_unsafe.test to create and
  execute recursive constructs
  (functions/procedures/triggers/views/prepared statements).
mysql-test/extra/rpl_tests/rpl_foreign_key.test:
  removed unnecessary set @@session.binlog_format
mysql-test/extra/rpl_tests/rpl_insert_delayed.test:
  Filter out table id from table map events in binlog listing.
  Got rid of $binlog_format_statement.
mysql-test/extra/rpl_tests/rpl_ndb_apply_status.test:
  disable warnings around call to unsafe procedure
mysql-test/include/rpl_udf.inc:
  Disabled warnings for code that generates warnings
  for some binlog formats. That would otherwise cause
  inconsistencies in the result file.
mysql-test/r/mysqldump.result:
  Views are now unsafe if they contain a LIMIT clause.
  That fixed BUG#45831. Due to BUG#45832, a warning is
  printed for the CREATE VIEW statement.
mysql-test/r/sp_trans.result:
  Unsafe statements in stored procedures did not give a warning if
  binlog_format=statement. This is BUG#45824. Now they do, so this
  result file gets a new warning.
mysql-test/suite/binlog/r/binlog_multi_engine.result:
  Error message changed.
mysql-test/suite/binlog/r/binlog_statement_insert_delayed.result:
  INSERT DELAYED didn't generate a warning when binlog_format=STATEMENT.
  That was BUG#45825. Now there is a warning, so result file needs to be
  updated.
mysql-test/suite/binlog/r/binlog_stm_ps.result:
  Changed error message.
mysql-test/suite/binlog/r/binlog_unsafe.result:
  updated result file:
  - error message changed
  - added test for most combinations of unsafe constructs invoked
    from recursive constructs
  - INSERT DELAYED now gives a warning (because BUG#45826 is fixed)
  - INSERT..SELECT..LIMIT now gives a warning from inside recursive
    constructs (because BUG#45785 was fixed)
  - When a recursive construct (e.g., stored proc or function)
    contains more than one statement, at least one of which is
    unsafe, then all statements in the recursive construct give
    warnings. This is a new bug introduced by this changeset.
    It will be addressed in a post-push fix.
mysql-test/suite/binlog/t/binlog_innodb.test:
  Changed error code for innodb updates with READ COMMITTED or
  READ UNCOMMITTED transaction isolation level and
  binlog_format=statement.
mysql-test/suite/binlog/t/binlog_multi_engine.test:
  The error code has changed for statements where more than one
  engine is involved and one of them is self-logging.
mysql-test/suite/binlog/t/binlog_unsafe-master.opt:
  Since binlog_unsafe now tests unsafe-ness of UDF's, we need an extra
  flag in the .opt file.
mysql-test/suite/binlog/t/binlog_unsafe.test:
  - Clarified comment.
  - Rewrote first part of test. Now it tests not only unsafe variables
    and functions, but also unsafe-ness due to INSERT..SELECT..LIMIT,
    INSERT DELAYED, insert into two autoinc columns, use of UDF's, and
    access to log tables in the mysql database.
    Also, in addition to functions, procedures, triggers, and prepared
    statements, it now also tests views; and it constructs recursive
    calls in two levels by combining these recursive constructs.
    Part of the logic is in
    extra/rpl_tests/create_recursive_construct.inc.
  - added tests for all special system variables that should not be
    unsafe.
  - added specific tests for BUG#45785 and BUG#45825
mysql-test/suite/rpl/r/rpl_events.result:
  updated result file
mysql-test/suite/rpl/r/rpl_extraColmaster_innodb.result:
  updated result file
mysql-test/suite/rpl/r/rpl_extraColmaster_myisam.result:
  updated result file
mysql-test/suite/rpl/r/rpl_foreign_key_innodb.result:
  updated result file
mysql-test/suite/rpl/r/rpl_idempotency.result:
  updated result file
mysql-test/suite/rpl/r/rpl_mix_found_rows.result:
  Split rpl_found_rows.test into rpl_mix_found_rows.test (a new file)
  and rpl_stm_found_rows.test (renamed rpl_found_rows.test).
  This file equals the second half of the old rpl_found_rows.result,
  with the following modifications:
  - minor formatting changes
  - additional initialization
mysql-test/suite/rpl/r/rpl_mix_insert_delayed.result:
  Moved out code operating in mixed mode from rpl_stm_insert_delayed
  (into rpl_mix_insert_delayed) and got rid of explicit setting of
  binlog format.
mysql-test/suite/rpl/r/rpl_rbr_to_sbr.result:
  updated result file
mysql-test/suite/rpl/r/rpl_row_idempotency.result:
  Moved the second half of rpl_idempotency.test, which only
  executed in row mode, to rpl_row_idempotency.test. This is
  the new result file.
mysql-test/suite/rpl/r/rpl_row_insert_delayed.result:
  Got rid of unnecessary explicit setting of binlog format.
mysql-test/suite/rpl/r/rpl_stm_found_rows.result:
  Split rpl_found_rows.test into rpl_mix_found_rows.test (a new file)
  and rpl_stm_found_rows.test (renamed rpl_found_rows.test).
  Changes in this file:
  - minor formatting changes
  - warning is now issued for unsafe statements inside procedures
    (since BUG#45824 is fixed)
  - second half of file is moved to rpl_mix_found_rows.result
mysql-test/suite/rpl/r/rpl_stm_insert_delayed.result:
  Moved out code operating in mixed mode from rpl_stm_insert_delayed
  (into rpl_mix_insert_delayed) and got rid of explicit setting of
  binlog format.
mysql-test/suite/rpl/r/rpl_stm_loadfile.result:
  error message changed
mysql-test/suite/rpl/r/rpl_temporary_errors.result:
  updated result file
mysql-test/suite/rpl/r/rpl_udf.result:
  Remove explicit set of binlog format (and triplicate test execution)
  and rely on test system executing the test in all binlog formats.
mysql-test/suite/rpl/t/rpl_bug31076.test:
  Test is only valid in mixed or row mode since it generates row events.
mysql-test/suite/rpl/t/rpl_events.test:
  Removed explicit set of binlog_format and removed duplicate testing.
  Instead, we rely on the test system to try all binlog formats.
mysql-test/suite/rpl/t/rpl_extraColmaster_innodb.test:
  Removed triplicate testing; the test now relies on the test system
  instead. Test is only relevant for row format since statement-based
  replication cannot handle extra columns on master.
mysql-test/suite/rpl/t/rpl_extraColmaster_myisam.test:
  Removed triplicate testing; the test now relies on the test system
  instead. Test is only relevant for row format since statement-based
  replication cannot handle extra columns on master.
mysql-test/suite/rpl/t/rpl_idempotency-slave.opt:
  Removed .opt file to avoid server restarts.
mysql-test/suite/rpl/t/rpl_idempotency.test:
  - Moved out row-only tests to a new test file,
    rpl_row_idempotency.test. rpl_idempotency now only contains
    tests that execute in all binlog_formats.
  - While I was here, also removed .opt file to avoid server
    restarts. The slave_exec_mode is now set inside the test
    instead.
mysql-test/suite/rpl/t/rpl_mix_found_rows.test:
  Split rpl_found_rows.test into rpl_mix_found_rows.test (a new file)
  and rpl_stm_found_rows.test (renamed rpl_found_rows.test).
  This file contains the second half of the original
  rpl_found_rows.test with the following changes:
  - initialization
  - removed SET_BINLOG_FORMAT and added have_binlog_format_mixed.inc
  - minor formatting changes
mysql-test/suite/rpl/t/rpl_mix_insert_delayed.test:
  Moved out code operating in mixed mode from rpl_stm_insert_delayed
  (into rpl_mix_insert_delayed) and got rid of explicit setting of
  binlog format.
mysql-test/suite/rpl/t/rpl_rbr_to_sbr.test:
  Test cannot execute in statement mode, since we no longer
  switch to row format when binlog_format=statement.
  Enforced mixed mode throughout the test.
mysql-test/suite/rpl/t/rpl_row_idempotency.test:
  Moved the second half of rpl_idempotency.test, which only
  executed in row mode, to this new file. We now rely on the
  test system to set binlog format.
mysql-test/suite/rpl/t/rpl_row_insert_delayed.test:
  - Got rid of unnecessary explicit setting of binlog format.
  - extra/rpl_tests/rpl_insert_delayed.test does not need the
    $binlog_format_statement variable any more, so that was
    removed.
mysql-test/suite/rpl/t/rpl_slave_skip.test:
  The test switches binlog_format internally and master generates both
  row and statement events. Hence, the slave must be able to log in
  both statement and row format. Hence test was changed to only
  execute in mixed mode.
mysql-test/suite/rpl/t/rpl_stm_found_rows.test:
  Split rpl_found_rows.test into rpl_mix_found_rows.test (a new file)
  and rpl_stm_found_rows.test (renamed rpl_found_rows.test).
  Changes in this file:
  - minor formatting changes
  - added have_binlog_format_statement and removed SET BINLOG_FORMAT.
  - second half of file is moved to rpl_mix_found_rows.test
  - added cleanup code
mysql-test/suite/rpl/t/rpl_stm_insert_delayed.test:
  Moved out code operating in mixed mode from rpl_stm_insert_delayed
  (into rpl_mix_insert_delayed) and got rid of explicit setting of
  binlog format.
mysql-test/suite/rpl/t/rpl_switch_stm_row_mixed.test:
  The test switches binlog_format internally and master generates both
  row and statement events. Hence, the slave must be able to log in
  both statement and row format. Hence test was changed to only
  execute in mixed mode on slave.
mysql-test/suite/rpl/t/rpl_temporary_errors.test:
  Removed explicit set of binlog format. Instead, the test now only
  executes in row mode.
mysql-test/suite/rpl/t/rpl_udf.test:
  Remove explicit set of binlog format (and triplicate test execution)
  and rely on test system executing the test in all binlog formats.
mysql-test/suite/rpl_ndb/combinations:
  Added combinations file for rpl_ndb.
mysql-test/suite/rpl_ndb/r/rpl_ndb_binlog_format_errors.result:
  new result file
mysql-test/suite/rpl_ndb/r/rpl_ndb_circular_simplex.result:
  updated result file
mysql-test/suite/rpl_ndb/t/rpl_ndb_2innodb.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_2myisam.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_basic.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_binlog_format_errors-master.opt:
  new option file
mysql-test/suite/rpl_ndb/t/rpl_ndb_binlog_format_errors-slave.opt:
  new option file
mysql-test/suite/rpl_ndb/t/rpl_ndb_binlog_format_errors.test:
  New test case to verify all errors and warnings generated by
  decide_logging_format.
mysql-test/suite/rpl_ndb/t/rpl_ndb_blob.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_blob2.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_circular.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_circular_simplex.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode. While
  I was here, also made the test clean up after itself.
mysql-test/suite/rpl_ndb/t/rpl_ndb_commit_afterflush.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_ctype_ucs2_def.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_delete_nowhere.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_do_db.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_do_table.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_func003.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_innodb_trans.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_insert_ignore.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_mixed_engines_transactions.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_multi_update3.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_rep_ignore.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_row_001.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_sp003.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_sp006.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_trig004.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/t/partition_innodb_stmt.test:
  Changed error code for innodb updates with READ COMMITTED or
  READ UNCOMMITTED transaction isolation level and
  binlog_format=statement.
sql/event_db_repository.cc:
  Use member function to read current_stmt_binlog_row_based.
sql/events.cc:
  Use member function to read current_stmt_binlog_row_based.
sql/ha_ndbcluster_binlog.cc:
  reset_current_stmt_binlog_row_based() is not a no-op for the
  ndb_binlog thread any more. Instead, the ndb_binlog thread now
  forces row mode both initially and just after calling
  mysql_parse. (mysql_parse() is the only place where
  reset_current_stmt_binlog_row_based() may be called from the
  ndb_binlog thread, so these are the only two places that need
  to change.)
sql/ha_partition.cc:
  Use member function to read current_stmt_binlog_row_based.
sql/handler.cc:
  Use member function to read current_stmt_binlog_row_based.
sql/item_create.cc:
  Added DBUG_ENTER to some functions, to be able to trace when
  set_stmt_unsafe is called.
sql/log.cc:
  Use member function to read current_stmt_binlog_row_based.
sql/log_event.cc:
  - Moved logic for changing to row format out of do_apply_event
    (and into decide_logging_format).
  - Added @todo comment for post-push cleanup.
sql/log_event_old.cc:
  Move logic for changing to row format out of do_apply_event
  (and into decide_logging_format).
sql/mysql_priv.h:
  Make decide_logging_format() a member of the THD class, for two
  reasons:
  - It is natural from an object-oriented perspective.
  - decide_logging_format() needs to access private members of THD
    (specifically, the new binlog_warning_flags field).
sql/rpl_injector.cc:
  Removed call to set_current_stmt_binlog_row_based().
  From now on, only decide_logging_format is allowed to modify
  current_stmt_binlog_row_based.
  This call is from the ndb_binlog thread, mostly executing code in
  ha_ndbcluster_binlog.cc. This call can be safely removed, because:
  - current_stmt_binlog_row_based is initialized for the ndb_binlog
    thread's THD object when the THD object is created. So we're
    not going to read uninitialized memory.
  - The behavior of ndb_binlog thread does not use the state of the
    current_stmt_binlog_row_based. It is conceivable that the
    ndb_binlog thread would rely on the current_stmt_binlog_format
    in two situations:
    (1) when it calls mysql_parse;
    (2) when it calls THD::binlog_query.
    In case (1), it always clears THD::options&OPTION_BIN_LOG (because
    run_query() in ha_ndbcluster_binlog.cc is only called with
    disable_binlogging = TRUE).
    In case (2), it always uses qtype=STMT_QUERY_TYPE.
sql/set_var.cc:
  Added @todo comment for post-push cleanup.
sql/share/errmsg.txt:
  Added new error messages and clarified ER_BINLOG_UNSAFE_STATEMENT.
sql/sp.cc:
  Added DBUG_ENTER, to be able to trace when set_stmt_unsafe is
  called. Got rid of MYSQL_QUERY_TYPE: it was equivalent to
  STMT_QUERY_TYPE.
sql/sp_head.cc:
  Use member function to read current_stmt_binlog_row_based.
sql/sp_head.h:
  Added DBUG_ENTER, to be able to trace when set_stmt_unsafe is
  called.
sql/sql_acl.cc:
  Got rid of MYSQL_QUERY_TYPE: it was equivalent to STMT_QUERY_TYPE.
sql/sql_base.cc:
  - Made decide_logging_format take care of all logic for deciding
    the logging format, and for determining the related warnings
    and errors. See comment above decide_logging_format for details.
  - Made decide_logging_format a member function of THD, since it
    needs to access private members of THD and since its purpose is
    to update the state of a THD object.
  - Added DBUG_ENTER, to be able to trace when set_stmt_unsafe is
    called.
sql/sql_class.cc:
  - Moved logic for determining unsafe warnings away from
    THD::binlog_query (and into decide_logging_format()). Now, it
    works like this:
    1. decide_logging_format detects that the current statement shall
       produce a warning, if it ever makes it to the binlog
    2. decide_logging_format sets a flag of THD::binlog_warning_flags.
    3. THD::binlog_query reads the flag. If the flag is set, it
       generates a warning.
  - Use member function to read current_stmt_binlog_row_based.
sql/sql_class.h:
  - Added THD::binlog_warning_flags (see sql_class.cc for
    explanation).
  - Made decide_logging_format() and reset_for_next_command() member
    functions of THD (instead of standalone functions). This was
    needed for two reasons: (1) the functions need to access the
    private member THD::binlog_warning_flags; (2) the purpose of
    these functions is to update the state of a THD object, so from
    an object-oriented point of view they should be member
    functions.
  - Encapsulated current_stmt_binlog_row_based, so it is now private
    and can only be accessed from a member function. Also changed
    the data type to an enumeration instead of a bool.
  - Removed MYSQL_QUERY_TYPE, because it was equivalent to
    STMT_QUERY_TYPE anyways.
  - When reset_current_stmt_binlog_row_based was called from the
    ndb_binlog thread, it would behave as a no-op. This special
    case has been removed, and the behavior of
    reset_current_stmt_binlog_row_based does not depend on which
    thread calls it any more. The special case did not serve any
    purpose, since the ndb binlog thread did not take the
    current_stmt_binlog_row_based flag into account anyways.
sql/sql_delete.cc:
  - Moved logic for setting row format for DELETE..LIMIT away from
    mysql_prepare_delete.
    (Instead, we mark the statement as unsafe at parse time
    (sql_yacc.yy) and rely on decide_logging_format() (sql_class.cc)
    to set row format.)
    This is part of the fix for BUG#45831.
  - Use member function to read current_stmt_binlog_row_based.
sql/sql_insert.cc:
  - Removed unnecessary calls to thd->lex->set_stmt_unsafe() and
    thd->set_current_stmt_binlog_row_based_if_mixed() from
    handle_delayed_insert().
    The calls are unnecessary because they have already been made;
    they were made in the constructor of the `di' object.
  - Since decide_logging_format() is now a member function of THD,
    code that calls decide_logging_format() had to be updated.
  - Added DBUG_ENTER call, to be able to trace when set_stmt_unsafe
    is called.
  - Moved call to set_stmt_unsafe() for INSERT..SELECT..LIMIT away
    from mysql_insert_select_prepare() (and into
    decide_logging_format). This is part of the fix for BUG#45831.
  - Use member function to read current_stmt_binlog_row_based.
sql/sql_lex.h:
  - Added the flag BINLOG_STMT_FLAG_ROW_INJECTION to
    enum_binlog_stmt_flag. This was necessary so that a statement
    can identify itself as a row injection.
  - Added appropriate setter and getter functions for the new flag.
  - Added or clarified some comments.
  - Added DBUG_ENTER()
sql/sql_load.cc:
  Use member function to read current_stmt_binlog_row_based.
sql/sql_parse.cc:
  - Made mysql_reset_thd_for_next_command() clear
    thd->binlog_warning_flags.
  - Since thd->binlog_warning_flags is private, it must be set in a
    member function of THD. Hence, moved the body of
    mysql_reset_thd_for_next_command() to the new member function
    THD::reset_thd_for_next_command(), and made
    mysql_reset_thd_for_next_command() call
    THD::reset_thd_for_next_command().
  - Removed confusing comment.
  - Use member function to read current_stmt_binlog_row_based.
sql/sql_repl.cc:
  Use member function to read current_stmt_binlog_row_based.
sql/sql_table.cc:
  Use member function to read current_stmt_binlog_row_based.
sql/sql_udf.cc:
  Use member function to read current_stmt_binlog_row_based.
sql/sql_update.cc:
  Moved logic for setting row format for UPDATE..LIMIT away from
  mysql_prepare_update.
  (Instead, we mark the statement as unsafe at parse time
  (sql_yacc.yy) and rely on decide_logging_format() (sql_class.cc)
  to set row format.)
  This is part of the fix for BUG#45831.
sql/sql_yacc.yy:
  Made INSERT DELAYED, INSERT..SELECT..LIMIT, UPDATE..LIMIT, and
  DELETE..LIMIT mark themselves as unsafe at parse time (instead
  of at execution time).
  This is part of the fixes for BUG#45831 and BUG#45825.
storage/example/ha_example.cc:
  Made exampledb accept inserts. This was needed by the new test case
  rpl_ndb_binlog_format_errors, because it needs an engine that
  is statement-only (and accepts inserts).
storage/example/ha_example.h:
  Made exampledb a statement-only engine instead of a row-only engine.
  No existing test relied on exampledb's row-only capabilities. The
  new test case rpl_ndb_binlog_format_errors needs an engine that is
  statement-only.
storage/innobase/handler/ha_innodb.cc:
  - Changed error code and message given by innodb when
    binlog_format=STATEMENT and transaction isolation level is
    READ COMMITTED or READ UNCOMMITTED.
  - While I was here, also simplified the condition for
    checking when to give the error.
---
 sql/log.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'sql/log.cc')

diff --git a/sql/log.cc b/sql/log.cc
index ee7ee48b42c..ef7d5c75f84 100644
--- a/sql/log.cc
+++ b/sql/log.cc
@@ -3734,7 +3734,7 @@ int THD::binlog_write_table_map(TABLE *table, bool is_trans)
                        table->s->table_map_id));

   /* Pre-conditions */
-  DBUG_ASSERT(current_stmt_binlog_row_based && mysql_bin_log.is_open());
+  DBUG_ASSERT(is_current_stmt_binlog_format_row() && mysql_bin_log.is_open());
   DBUG_ASSERT(table->s->table_map_id != ULONG_MAX);

   Table_map_log_event::flag_set const
@@ -4009,7 +4009,7 @@ bool MYSQL_BIN_LOG::write(Log_event *event_info)
     */
     if (thd)
     {
-      if (!thd->current_stmt_binlog_row_based)
+      if (!thd->is_current_stmt_binlog_format_row())
       {
         if (thd->stmt_depends_on_first_successful_insert_id_in_prev_stmt)
         {
--
cgit v1.2.1


From 19c380aaff1f1f3c0d21ac0c18904c21d7bdce76 Mon Sep 17 00:00:00 2001
From: Alfranio Correia
Date: Tue, 3 Nov 2009 19:02:56 +0000
Subject: WL#2687 WL#5072 BUG#40278 BUG#47175

Non-transactional updates that take place
inside a transaction present problems for logging because they are
visible to other clients before the transaction is committed, and
they are not rolled back even if the transaction is rolled back. It
is not always possible to log correctly in statement format when both
transactional and non-transactional tables are used in the same
transaction.

In the current patch, we ensure that such a scenario is completely
safe under the ROW and MIXED modes.
---
 sql/log.cc | 1041 +++++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 603 insertions(+), 438 deletions(-)

(limited to 'sql/log.cc')

diff --git a/sql/log.cc b/sql/log.cc
index 491030aca34..fa063196e14 100644
--- a/sql/log.cc
+++ b/sql/log.cc
@@ -148,112 +148,155 @@ private:
 };

 /*
-  Helper class to store binary log transaction data.
+  Helper classes to store non-transactional and transactional data
+  before copying it to the binary log.
 */
-class binlog_trx_data {
+class binlog_cache_data
+{
 public:
-  binlog_trx_data()
-    : at_least_one_stmt(0), incident(FALSE), m_pending(0),
-    before_stmt_pos(MY_OFF_T_UNDEF)
+  binlog_cache_data(): m_pending(0), before_stmt_pos (MY_OFF_T_UNDEF),
+  incident(FALSE)
   {
-    trans_log.end_of_file= max_binlog_cache_size;
+    cache_log.end_of_file= max_binlog_cache_size;
   }

-  ~binlog_trx_data()
+  ~binlog_cache_data()
   {
-    DBUG_ASSERT(pending() == NULL);
-    close_cached_file(&trans_log);
+    DBUG_ASSERT(empty());
+    close_cached_file(&cache_log);
   }

-  my_off_t position() const {
-    return my_b_tell(&trans_log);
+  bool empty() const
+  {
+    return pending() == NULL && my_b_tell(&cache_log) == 0;
   }

-  bool empty() const
+  Rows_log_event *pending() const
   {
-    return pending() == NULL && my_b_tell(&trans_log) == 0;
+    return m_pending;
   }

-  /*
-    Truncate the transaction cache to a certain position. This
-    includes deleting the pending event.
-   */
-  void truncate(my_off_t pos)
+  void set_pending(Rows_log_event *const pending)
   {
-    DBUG_PRINT("info", ("truncating to position %lu", (ulong) pos));
-    DBUG_PRINT("info", ("before_stmt_pos=%lu", (ulong) pos));
-    delete pending();
-    set_pending(0);
-    reinit_io_cache(&trans_log, WRITE_CACHE, pos, 0, 0);
-    trans_log.end_of_file= max_binlog_cache_size;
-    if (pos < before_stmt_pos)
-      before_stmt_pos= MY_OFF_T_UNDEF;
+    m_pending= pending;
+  }

-    /*
-      The only valid positions that can be truncated to are at the
-      beginning of a statement. We are relying on this fact to be able
-      to set the at_least_one_stmt flag correctly. In other word, if
-      we are truncating to the beginning of the transaction cache,
-      there will be no statements in the cache, otherwhise, we will
-      have at least one statement in the transaction cache.
-    */
-    at_least_one_stmt= (pos > 0);
+  void set_incident(void)
+  {
+    incident= TRUE;
+  }
+
+  bool has_incident(void)
+  {
+    return(incident);
   }

-  /*
-    Reset the entire contents of the transaction cache, emptying it
-    completely.
-  */
-  void reset() {
-    if (!empty())
-      truncate(0);
-    before_stmt_pos= MY_OFF_T_UNDEF;
+  void reset()
+  {
+    truncate(0);
     incident= FALSE;
-    trans_log.end_of_file= max_binlog_cache_size;
+    before_stmt_pos= MY_OFF_T_UNDEF;
+    cache_log.end_of_file= max_binlog_cache_size;
     DBUG_ASSERT(empty());
   }

-  Rows_log_event *pending() const
+  my_off_t get_byte_position() const
   {
-    return m_pending;
+    return my_b_tell(&cache_log);
   }

-  void set_pending(Rows_log_event *const pending)
+  my_off_t get_prev_position()
   {
-    m_pending= pending;
+    return(before_stmt_pos);
   }

-  IO_CACHE trans_log;                         // The transaction cache
-
-  void set_incident(void)
+  void set_prev_position(my_off_t pos)
   {
-    incident= TRUE;
+    before_stmt_pos= pos;
   }

-  bool has_incident(void)
+  void restore_prev_position()
   {
-    return(incident);
+    truncate(before_stmt_pos);
+  }
+
+  void restore_savepoint(my_off_t pos)
+  {
+    truncate(pos);
+    if (pos < before_stmt_pos)
+      before_stmt_pos= MY_OFF_T_UNDEF;
   }

-  /**
-    Boolean that is true if there is at least one statement in the
-    transaction cache.
+  /*
+    Cache to store data before copying it to the binary log.
   */
-  bool at_least_one_stmt;
-  bool incident;
+  IO_CACHE cache_log;

 private:
   /*
-    Pending binrows event. This event is the event where the rows are
-    currently written.
+    Pending binrows event. This event is the event where the rows are currently
+    written.
   */
   Rows_log_event *m_pending;

-public:
   /*
     Binlog position before the start of the current statement.
   */
   my_off_t before_stmt_pos;
+
+  /*
+    This indicates that some events did not get into the cache and most likely
+    it is corrupted.
+  */
+  bool incident;
+
+  /*
+    It truncates the cache to a certain position. This includes deleting the
+    pending event.
+  */
+  void truncate(my_off_t pos)
+  {
+    DBUG_PRINT("info", ("truncating to position %lu", (ulong) pos));
+    if (pending())
+    {
+      delete pending();
+      set_pending(0);
+    }
+    reinit_io_cache(&cache_log, WRITE_CACHE, pos, 0, 0);
+    cache_log.end_of_file= max_binlog_cache_size;
+  }
+
+  binlog_cache_data& operator=(const binlog_cache_data& info);
+  binlog_cache_data(const binlog_cache_data& info);
+};
+
+class binlog_cache_mngr {
+public:
+  binlog_cache_mngr() {}
+
+  void reset_cache(binlog_cache_data* cache_data)
+  {
+    cache_data->reset();
+  }
+
+  binlog_cache_data* get_binlog_cache_data(bool is_transactional)
+  {
+    return (is_transactional ? &trx_cache : &stmt_cache);
+  }
+
+  IO_CACHE* get_binlog_cache_log(bool is_transactional)
+  {
+    return (is_transactional ? &trx_cache.cache_log : &stmt_cache.cache_log);
+  }
+
+  binlog_cache_data stmt_cache;
+
+  binlog_cache_data trx_cache;
+
+private:
+
+  binlog_cache_mngr& operator=(const binlog_cache_mngr& info);
+  binlog_cache_mngr(const binlog_cache_mngr& info);
 };

 handlerton *binlog_hton;
@@ -1265,26 +1308,6 @@ int LOGGER::set_handlers(uint error_log_printer,
   return 0;
 }

-/**
-  This function checks if a transactional talbe was updated by the
-  current statement.
-
-  @param thd The client thread that executed the current statement.
-  @return
-    @c true if a transactional table was updated, @false otherwise.
-*/
-static bool stmt_has_updated_trans_table(THD *thd)
-{
-  Ha_trx_info *ha_info;
-
-  for (ha_info= thd->transaction.stmt.ha_list; ha_info; ha_info= ha_info->next())
-  {
-    if (ha_info->is_trx_read_write() && ha_info->ht() != binlog_hton)
-      return (TRUE);
-  }
-  return (FALSE);
-}
-
 /*
   Save position of binary log transaction cache.
@@ -1307,10 +1330,10 @@ binlog_trans_log_savepos(THD *thd, my_off_t *pos) DBUG_ASSERT(pos != NULL); if (thd_get_ha_data(thd, binlog_hton) == NULL) thd->binlog_setup_trx_data(); - binlog_trx_data *const trx_data= - (binlog_trx_data*) thd_get_ha_data(thd, binlog_hton); + binlog_cache_mngr *const cache_mngr= + (binlog_cache_mngr*) thd_get_ha_data(thd, binlog_hton); DBUG_ASSERT(mysql_bin_log.is_open()); - *pos= trx_data->position(); + *pos= cache_mngr->trx_cache.get_byte_position(); DBUG_PRINT("return", ("*pos: %lu", (ulong) *pos)); DBUG_VOID_RETURN; } @@ -1341,9 +1364,9 @@ binlog_trans_log_truncate(THD *thd, my_off_t pos) /* Only true if binlog_trans_log_savepos() wasn't called before */ DBUG_ASSERT(pos != ~(my_off_t) 0); - binlog_trx_data *const trx_data= - (binlog_trx_data*) thd_get_ha_data(thd, binlog_hton); - trx_data->truncate(pos); + binlog_cache_mngr *const cache_mngr= + (binlog_cache_mngr*) thd_get_ha_data(thd, binlog_hton); + cache_mngr->trx_cache.restore_savepoint(pos); DBUG_VOID_RETURN; } @@ -1372,115 +1395,127 @@ int binlog_init(void *p) static int binlog_close_connection(handlerton *hton, THD *thd) { - binlog_trx_data *const trx_data= - (binlog_trx_data*) thd_get_ha_data(thd, binlog_hton); - DBUG_ASSERT(trx_data->empty()); + binlog_cache_mngr *const cache_mngr= + (binlog_cache_mngr*) thd_get_ha_data(thd, binlog_hton); + DBUG_ASSERT(cache_mngr->trx_cache.empty() && cache_mngr->stmt_cache.empty()); thd_set_ha_data(thd, binlog_hton, NULL); - trx_data->~binlog_trx_data(); - my_free((uchar*)trx_data, MYF(0)); + cache_mngr->~binlog_cache_mngr(); + my_free((uchar*)cache_mngr, MYF(0)); return 0; } -/* - End a transaction. +/** + This function flushes a transactional cache upon commit/rollback. - SYNOPSIS - binlog_end_trans() + @param thd The thread whose transaction should be flushed + @param cache_mngr Pointer to the cache data to be flushed + @param end_ev The end event either commit/rollback. 
- thd The thread whose transaction should be ended - trx_data Pointer to the transaction data to use - end_ev The end event to use, or NULL - all True if the entire transaction should be ended, false if - only the statement transaction should be ended. + @return + nonzero if an error pops up when flushing the transactional cache. +*/ +static int +binlog_flush_trx_cache(THD *thd, binlog_cache_mngr *cache_mngr, + Log_event *end_ev) +{ + DBUG_ENTER("binlog_flush_trx_cache"); + int error=0; + IO_CACHE *cache_log= &cache_mngr->trx_cache.cache_log; - DESCRIPTION + /* + This function handles transactional changes and as such + this flag equals to true. + */ + bool const is_transactional= TRUE; - End the currently open transaction. The transaction can be either - a real transaction (if 'all' is true) or a statement transaction - (if 'all' is false). + if (thd->binlog_flush_pending_rows_event(TRUE, is_transactional)) + DBUG_RETURN(1); + /* + Doing a commit or a rollback including non-transactional tables, + i.e., ending a transaction where we might write the transaction + cache to the binary log. + + We can always end the statement when ending a transaction since + transactions are not allowed inside stored functions. If they + were, we would have to ensure that we're not ending a statement + inside a stored function. + */ + error= mysql_bin_log.write(thd, &cache_mngr->trx_cache.cache_log, end_ev, + cache_mngr->trx_cache.has_incident()); + cache_mngr->reset_cache(&cache_mngr->trx_cache); - If 'end_ev' is NULL, the transaction is a rollback of only - transactional tables, so the transaction cache will be truncated - to either just before the last opened statement transaction (if - 'all' is false), or reset completely (if 'all' is true). - */ + /* + We need to step the table map version after writing the + transaction cache to disk. 
+ */ + mysql_bin_log.update_table_map_version(); + statistic_increment(binlog_cache_use, &LOCK_status); + if (cache_log->disk_writes != 0) + { + statistic_increment(binlog_cache_disk_use, &LOCK_status); + cache_log->disk_writes= 0; + } + + DBUG_ASSERT(cache_mngr->trx_cache.empty()); + DBUG_RETURN(error); +} + +/** + This function truncates the transactional cache upon committing or rolling + back either a transaction or a statement. + + @param thd The thread whose transaction should be flushed + @param cache_mngr Pointer to the cache data to be flushed + @param all @c true means truncate the transaction, otherwise the + statement must be truncated. + + @return + nonzero if an error pops up when truncating the transactional cache. +*/ static int -binlog_end_trans(THD *thd, binlog_trx_data *trx_data, - Log_event *end_ev, bool all) +binlog_truncate_trx_cache(THD *thd, binlog_cache_mngr *cache_mngr, bool all) { - DBUG_ENTER("binlog_end_trans"); + DBUG_ENTER("binlog_truncate_trx_cache"); int error=0; - IO_CACHE *trans_log= &trx_data->trans_log; - DBUG_PRINT("enter", ("transaction: %s end_ev: 0x%lx", - all ? "all" : "stmt", (long) end_ev)); - DBUG_PRINT("info", ("thd->options={ %s%s}", - FLAGSTR(thd->options, OPTION_NOT_AUTOCOMMIT), - FLAGSTR(thd->options, OPTION_BEGIN))); + /* + This function handles transactional changes and as such this flag + equals to true. + */ + bool const is_transactional= TRUE; + DBUG_PRINT("info", ("thd->options={ %s%s}, transaction: %s", + FLAGSTR(thd->options, OPTION_NOT_AUTOCOMMIT), + FLAGSTR(thd->options, OPTION_BEGIN), + all ? "all" : "stmt")); /* - NULL denotes ROLLBACK with nothing to replicate: i.e., rollback of - only transactional tables. If the transaction contain changes to - any non-transactiona tables, we need write the transaction and log - a ROLLBACK last. + If rolling back an entire transaction or a single statement not + inside a transaction, we reset the transaction cache. 
*/ - if (end_ev != NULL) + thd->binlog_remove_pending_rows_event(TRUE, is_transactional); + if (all || !thd->in_multi_stmt_transaction()) { - if (thd->binlog_flush_pending_rows_event(TRUE)) - DBUG_RETURN(1); - /* - Doing a commit or a rollback including non-transactional tables, - i.e., ending a transaction where we might write the transaction - cache to the binary log. - - We can always end the statement when ending a transaction since - transactions are not allowed inside stored functions. If they - were, we would have to ensure that we're not ending a statement - inside a stored function. - */ - error= mysql_bin_log.write(thd, &trx_data->trans_log, end_ev, - trx_data->has_incident()); - trx_data->reset(); + if (cache_mngr->trx_cache.has_incident()) + error= mysql_bin_log.write_incident(thd, TRUE); - /* - We need to step the table map version after writing the - transaction cache to disk. - */ - mysql_bin_log.update_table_map_version(); - statistic_increment(binlog_cache_use, &LOCK_status); - if (trans_log->disk_writes != 0) - { - statistic_increment(binlog_cache_disk_use, &LOCK_status); - trans_log->disk_writes= 0; - } + cache_mngr->reset_cache(&cache_mngr->trx_cache); + + thd->clear_binlog_table_maps(); } + /* + If rolling back a statement in a transaction, we truncate the + transaction cache to remove the statement. + */ else - { - /* - If rolling back an entire transaction or a single statement not - inside a transaction, we reset the transaction cache. + cache_mngr->trx_cache.restore_prev_position(); - If rolling back a statement in a transaction, we truncate the - transaction cache to remove the statement. 
- */ - thd->binlog_remove_pending_rows_event(TRUE); - if (all || !(thd->options & (OPTION_BEGIN | OPTION_NOT_AUTOCOMMIT))) - { - if (trx_data->has_incident()) - mysql_bin_log.write_incident(thd, TRUE); - trx_data->reset(); - } - else // ...statement - trx_data->truncate(trx_data->before_stmt_pos); - - /* - We need to step the table map version on a rollback to ensure - that a new table map event is generated instead of the one that - was written to the thrown-away transaction cache. - */ - mysql_bin_log.update_table_map_version(); - } + /* + We need to step the table map version on a rollback to ensure that a new + table map event is generated instead of the one that was written to the + thrown-away transaction cache. + */ + mysql_bin_log.update_table_map_version(); - DBUG_ASSERT(thd->binlog_get_pending_rows_event() == NULL); + DBUG_ASSERT(thd->binlog_get_pending_rows_event(is_transactional) == NULL); DBUG_RETURN(error); } @@ -1495,11 +1530,57 @@ static int binlog_prepare(handlerton *hton, THD *thd, bool all) return 0; } +/** + This function flushes the non-transactional to the binary log upon + committing or rolling back a statement. + + @param thd The thread whose transaction should be flushed + @param cache_mngr Pointer to the cache data to be flushed + + @return + nonzero if an error pops up when flushing the non-transactional cache. +*/ +static int +binlog_flush_stmt_cache(THD *thd, binlog_cache_mngr *cache_mngr) +{ + int error= 0; + DBUG_ENTER("binlog_flush_stmt_cache"); + /* + If we are flushing the statement cache, it means that the changes get + through otherwise the cache is empty and this routine should not be called. + */ + DBUG_ASSERT(cache_mngr->stmt_cache.has_incident() == FALSE); + /* + This function handles non-transactional changes and as such this flag equals + to false. 
+ */ + bool const is_transactional= FALSE; + IO_CACHE *cache_log= &cache_mngr->stmt_cache.cache_log; + thd->binlog_flush_pending_rows_event(TRUE, is_transactional); + Query_log_event qev(thd, STRING_WITH_LEN("COMMIT"), TRUE, FALSE, TRUE, 0); + if ((error= mysql_bin_log.write(thd, cache_log, &qev, + cache_mngr->stmt_cache.has_incident()))) + DBUG_RETURN(error); + cache_mngr->reset_cache(&cache_mngr->stmt_cache); + + /* + We need to step the table map version after writing the + transaction cache to disk. + */ + mysql_bin_log.update_table_map_version(); + statistic_increment(binlog_cache_use, &LOCK_status); + if (cache_log->disk_writes != 0) + { + statistic_increment(binlog_cache_disk_use, &LOCK_status); + cache_log->disk_writes= 0; + } + DBUG_RETURN(error); +} + /** This function is called once after each statement. - It has the responsibility to flush the transaction cache to the - binlog file on commits. + It has the responsibility to flush the caches to the binary log on commits. @param hton The binlog handlerton. @param thd The client thread that executes the transaction. 
@@ -1512,54 +1593,53 @@ static int binlog_commit(handlerton *hton, THD *thd, bool all) { int error= 0; DBUG_ENTER("binlog_commit"); - binlog_trx_data *const trx_data= - (binlog_trx_data*) thd_get_ha_data(thd, binlog_hton); + binlog_cache_mngr *const cache_mngr= + (binlog_cache_mngr*) thd_get_ha_data(thd, binlog_hton); + bool const in_transaction= thd->in_multi_stmt_transaction(); + + DBUG_PRINT("debug", + ("all: %d, in_transaction: %s, all.modified_non_trans_table: %s, stmt.modified_non_trans_table: %s", + all, + YESNO(in_transaction), + YESNO(thd->transaction.all.modified_non_trans_table), + YESNO(thd->transaction.stmt.modified_non_trans_table))); + + if (!cache_mngr->stmt_cache.empty()) + { + binlog_flush_stmt_cache(thd, cache_mngr); + } - if (trx_data->empty()) + if (cache_mngr->trx_cache.empty()) { - // we're here because trans_log was flushed in MYSQL_BIN_LOG::log_xid() - trx_data->reset(); + /* + we're here because cache_log was flushed in MYSQL_BIN_LOG::log_xid() + */ + cache_mngr->reset_cache(&cache_mngr->trx_cache); DBUG_RETURN(0); } /* We commit the transaction if: - - We are not in a transaction and committing a statement, or - - - We are in a transaction and a full transaction is committed - - Otherwise, we accumulate the statement + - We are in a transaction and a full transaction is committed. + Otherwise, we accumulate the changes. 
*/ - ulonglong const in_transaction= - thd->options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN); - DBUG_PRINT("debug", - ("all: %d, empty: %s, in_transaction: %s, all.modified_non_trans_table: %s, stmt.modified_non_trans_table: %s", - all, - YESNO(trx_data->empty()), - YESNO(in_transaction), - YESNO(thd->transaction.all.modified_non_trans_table), - YESNO(thd->transaction.stmt.modified_non_trans_table))); if (!in_transaction || all) { - Query_log_event qev(thd, STRING_WITH_LEN("COMMIT"), TRUE, TRUE, 0); - error= binlog_end_trans(thd, trx_data, &qev, all); - goto end; + Query_log_event qev(thd, STRING_WITH_LEN("COMMIT"), TRUE, FALSE, TRUE, 0); + error= binlog_flush_trx_cache(thd, cache_mngr, &qev); } -end: - if (!all) - trx_data->before_stmt_pos = MY_OFF_T_UNDEF; // part of the stmt commit - DBUG_RETURN(error); -} + /* + This is part of the stmt rollback. + */ + if (!all) + cache_mngr->trx_cache.set_prev_position(MY_OFF_T_UNDEF); + DBUG_RETURN(error); + } /** - This function is called when a transaction involving a transactional - table is rolled back. - - It has the responsibility to flush the transaction cache to the - binlog file. However, if the transaction does not involve - non-transactional tables, nothing needs to be logged. + This function is called when a transaction or a statement is rolled back. @param hton The binlog handlerton. @param thd The client thread that executes the transaction. 
@@ -1572,18 +1652,38 @@ static int binlog_rollback(handlerton *hton, THD *thd, bool all) { DBUG_ENTER("binlog_rollback"); int error=0; - binlog_trx_data *const trx_data= - (binlog_trx_data*) thd_get_ha_data(thd, binlog_hton); - - if (trx_data->empty()) { - trx_data->reset(); - DBUG_RETURN(0); - } + binlog_cache_mngr *const cache_mngr= + (binlog_cache_mngr*) thd_get_ha_data(thd, binlog_hton); DBUG_PRINT("debug", ("all: %s, all.modified_non_trans_table: %s, stmt.modified_non_trans_table: %s", YESNO(all), YESNO(thd->transaction.all.modified_non_trans_table), YESNO(thd->transaction.stmt.modified_non_trans_table))); + + /* + If an incident event is set we do not flush the content of the statement + cache because it may be corrupted. + */ + if (cache_mngr->stmt_cache.has_incident()) + { + mysql_bin_log.write_incident(thd, TRUE); + cache_mngr->reset_cache(&cache_mngr->stmt_cache); + } + else if (!cache_mngr->stmt_cache.empty()) + { + binlog_flush_stmt_cache(thd, cache_mngr); + } + + if (cache_mngr->trx_cache.empty()) + { + /* + we're here because cache_log was flushed in MYSQL_BIN_LOG::log_xid() + */ + cache_mngr->reset_cache(&cache_mngr->trx_cache); + DBUG_RETURN(0); + } + + if (mysql_bin_log.check_write_error(thd)) { /* @@ -1594,49 +1694,46 @@ static int binlog_rollback(handlerton *hton, THD *thd, bool all) */ DBUG_ASSERT(!all); /* - We reach this point if either only transactional tables were modified or - the effect of a statement that did not get into the binlog needs to be - rolled back. In the latter case, if a statement changed non-transactional - tables or had the OPTION_KEEP_LOG associated, we write an incident event - to the binlog in order to stop slaves and notify users that some changes - on the master did not get into the binlog and slaves will be inconsistent. - On the other hand, if a statement is transactional, we just safely roll it - back. + We reach this point if the effect of a statement did not properly get into + a cache and need to be rolled back. 
*/ - if ((thd->transaction.stmt.modified_non_trans_table || - (thd->options & OPTION_KEEP_LOG)) && - mysql_bin_log.check_write_error(thd)) - trx_data->set_incident(); - error= binlog_end_trans(thd, trx_data, 0, all); + error= binlog_truncate_trx_cache(thd, cache_mngr, all); } else - { - /* - We flush the cache with a rollback, wrapped in a beging/rollback if: + { + /* + We flush the cache wrapped in a beging/rollback if: . aborting a transcation that modified a non-transactional table or; . aborting a statement that modified both transactional and - non-transctional tables but which is not in the boundaries of any - transaction; + non-transctional tables but which is not in the boundaries of any + transaction; . the OPTION_KEEP_LOG is activate. */ - if ((all && thd->transaction.all.modified_non_trans_table) || + if (thd->variables.binlog_format == BINLOG_FORMAT_STMT && + ((all && thd->transaction.all.modified_non_trans_table) || (!all && thd->transaction.stmt.modified_non_trans_table && - !(thd->options & (OPTION_BEGIN | OPTION_NOT_AUTOCOMMIT))) || - ((thd->options & OPTION_KEEP_LOG))) + !thd->in_multi_stmt_transaction()) || + (thd->options & OPTION_KEEP_LOG))) { - Query_log_event qev(thd, STRING_WITH_LEN("ROLLBACK"), TRUE, TRUE, 0); - error= binlog_end_trans(thd, trx_data, &qev, all); + Query_log_event qev(thd, STRING_WITH_LEN("ROLLBACK"), TRUE, FALSE, TRUE, 0); + error= binlog_flush_trx_cache(thd, cache_mngr, &qev); } /* Otherwise, we simply truncate the cache as there is no change on non-transactional tables as follows. 
*/ - else if ((all && !thd->transaction.all.modified_non_trans_table) || - (!all && !thd->transaction.stmt.modified_non_trans_table)) - error= binlog_end_trans(thd, trx_data, 0, all); + else if (all || (!all && + (!thd->transaction.stmt.modified_non_trans_table || + !thd->in_multi_stmt_transaction() || + thd->variables.binlog_format != BINLOG_FORMAT_STMT))) + error= binlog_truncate_trx_cache(thd, cache_mngr, all); } + + /* + This is part of the stmt rollback. + */ if (!all) - trx_data->before_stmt_pos = MY_OFF_T_UNDEF; // part of the stmt rollback + cache_mngr->trx_cache.set_prev_position(MY_OFF_T_UNDEF); DBUG_RETURN(error); } @@ -1712,7 +1809,8 @@ static int binlog_savepoint_set(handlerton *hton, THD *thd, void *sv) int errcode= query_error_code(thd, thd->killed == THD::NOT_KILLED); int const error= thd->binlog_query(THD::STMT_QUERY_TYPE, - thd->query, thd->query_length, TRUE, FALSE, errcode); + thd->query, thd->query_length, TRUE, FALSE, FALSE, + errcode); DBUG_RETURN(error); } @@ -1731,7 +1829,8 @@ static int binlog_savepoint_rollback(handlerton *hton, THD *thd, void *sv) int errcode= query_error_code(thd, thd->killed == THD::NOT_KILLED); int error= thd->binlog_query(THD::STMT_QUERY_TYPE, - thd->query, thd->query_length, TRUE, FALSE, errcode); + thd->query, thd->query_length, TRUE, FALSE, FALSE, + errcode); DBUG_RETURN(error); } binlog_trans_log_truncate(thd, *(my_off_t*)sv); @@ -3737,27 +3836,67 @@ bool MYSQL_BIN_LOG::is_query_in_union(THD *thd, query_id_t query_id_param) int THD::binlog_setup_trx_data() { DBUG_ENTER("THD::binlog_setup_trx_data"); - binlog_trx_data *trx_data= - (binlog_trx_data*) thd_get_ha_data(this, binlog_hton); + binlog_cache_mngr *cache_mngr= + (binlog_cache_mngr*) thd_get_ha_data(this, binlog_hton); - if (trx_data) + if (cache_mngr) DBUG_RETURN(0); // Already set up - trx_data= (binlog_trx_data*) my_malloc(sizeof(binlog_trx_data), MYF(MY_ZEROFILL)); - if (!trx_data || - open_cached_file(&trx_data->trans_log, mysql_tmpdir, + cache_mngr= 
(binlog_cache_mngr*) my_malloc(sizeof(binlog_cache_mngr), MYF(MY_ZEROFILL)); + if (!cache_mngr || + open_cached_file(&cache_mngr->stmt_cache.cache_log, mysql_tmpdir, + LOG_PREFIX, binlog_cache_size, MYF(MY_WME)) || + open_cached_file(&cache_mngr->trx_cache.cache_log, mysql_tmpdir, LOG_PREFIX, binlog_cache_size, MYF(MY_WME))) { - my_free((uchar*)trx_data, MYF(MY_ALLOW_ZERO_PTR)); + my_free((uchar*)cache_mngr, MYF(MY_ALLOW_ZERO_PTR)); DBUG_RETURN(1); // Didn't manage to set it up } - thd_set_ha_data(this, binlog_hton, trx_data); + thd_set_ha_data(this, binlog_hton, cache_mngr); - trx_data= new (thd_get_ha_data(this, binlog_hton)) binlog_trx_data; + cache_mngr= new (thd_get_ha_data(this, binlog_hton)) binlog_cache_mngr; DBUG_RETURN(0); } +/** + This function checks if a transactional talbe was updated by the + current transaction. + + @param thd The client thread that executed the current statement. + @return + @c true if a transactional table was updated, @false otherwise. +*/ +bool +trans_has_updated_trans_table(THD* thd) +{ + binlog_cache_mngr *const cache_mngr= + (binlog_cache_mngr*) thd_get_ha_data(thd, binlog_hton); + + return (cache_mngr ? my_b_tell (&cache_mngr->trx_cache.cache_log) : 0); +} + +/** + This function checks if a transactional talbe was updated by the + current statement. + + @param thd The client thread that executed the current statement. + @return + @c true if a transactional table was updated, @false otherwise. +*/ +bool +stmt_has_updated_trans_table(THD *thd) +{ + Ha_trx_info *ha_info; + + for (ha_info= thd->transaction.stmt.ha_list; ha_info; ha_info= ha_info->next()) + { + if (ha_info->is_trx_read_write() && ha_info->ht() != binlog_hton) + return (TRUE); + } + return (FALSE); +} + /* Function to start a statement and optionally a transaction for the binary log. @@ -3771,11 +3910,10 @@ int THD::binlog_setup_trx_data() - Start a transaction if not in autocommit mode or if a BEGIN statement has been seen. 
- - Start a statement transaction to allow us to truncate the binary - log. + - Start a statement transaction to allow us to truncate the cache. - Save the currrent binlog position so that we can roll back the - statement by truncating the transaction log. + statement by truncating the cache. We only update the saved position if the old one was undefined, the reason is that there are some cases (e.g., for CREATE-SELECT) @@ -3789,15 +3927,15 @@ int THD::binlog_setup_trx_data() void THD::binlog_start_trans_and_stmt() { - binlog_trx_data *trx_data= (binlog_trx_data*) thd_get_ha_data(this, binlog_hton); + binlog_cache_mngr *cache_mngr= (binlog_cache_mngr*) thd_get_ha_data(this, binlog_hton); DBUG_ENTER("binlog_start_trans_and_stmt"); - DBUG_PRINT("enter", ("trx_data: 0x%lx trx_data->before_stmt_pos: %lu", - (long) trx_data, - (trx_data ? (ulong) trx_data->before_stmt_pos : + DBUG_PRINT("enter", ("cache_mngr: %p cache_mngr->trx_cache.get_prev_position(): %lu", + cache_mngr, + (cache_mngr ? (ulong) cache_mngr->trx_cache.get_prev_position() : (ulong) 0))); - if (trx_data == NULL || - trx_data->before_stmt_pos == MY_OFF_T_UNDEF) + if (cache_mngr == NULL || + cache_mngr->trx_cache.get_prev_position() == MY_OFF_T_UNDEF) { this->binlog_set_stmt_begin(); if (options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN)) @@ -3818,27 +3956,35 @@ THD::binlog_start_trans_and_stmt() } void THD::binlog_set_stmt_begin() { - binlog_trx_data *trx_data= - (binlog_trx_data*) thd_get_ha_data(this, binlog_hton); + binlog_cache_mngr *cache_mngr= + (binlog_cache_mngr*) thd_get_ha_data(this, binlog_hton); /* - The call to binlog_trans_log_savepos() might create the trx_data + The call to binlog_trans_log_savepos() might create the cache_mngr structure, if it didn't exist before, so we save the position into an auto variable and then write it into the transaction - data for the binary log (i.e., trx_data). + data for the binary log (i.e., cache_mngr). 
*/ my_off_t pos= 0; binlog_trans_log_savepos(this, &pos); - trx_data= (binlog_trx_data*) thd_get_ha_data(this, binlog_hton); - trx_data->before_stmt_pos= pos; + cache_mngr= (binlog_cache_mngr*) thd_get_ha_data(this, binlog_hton); + cache_mngr->trx_cache.set_prev_position(pos); } -/* - Write a table map to the binary log. - */ - -int THD::binlog_write_table_map(TABLE *table, bool is_trans) +/** + This function writes a table map to the binary log. + Note that in order to keep the signature uniform with related methods, + we use a redundant parameter to indicate whether a transactional table + was changed or not. + + @param table a pointer to the table. + @param is_transactional @c true indicates a transactional table, + otherwise @c false a non-transactional. + @return + nonzero if an error pops up when writing the table map event. +*/ +int THD::binlog_write_table_map(TABLE *table, bool is_transactional) { int error; DBUG_ENTER("THD::binlog_write_table_map"); @@ -3854,12 +4000,17 @@ int THD::binlog_write_table_map(TABLE *table, bool is_trans) flags= Table_map_log_event::TM_NO_FLAGS; Table_map_log_event - the_event(this, table, table->s->table_map_id, is_trans, flags); + the_event(this, table, table->s->table_map_id, is_transactional, flags); - if (is_trans && binlog_table_maps == 0) + if (binlog_table_maps == 0) binlog_start_trans_and_stmt(); - if ((error= mysql_bin_log.write(&the_event))) + binlog_cache_mngr *const cache_mngr= + (binlog_cache_mngr*) thd_get_ha_data(this, binlog_hton); + + IO_CACHE *file= cache_mngr->get_binlog_cache_log(is_transactional); + + if ((error= the_event.write(file))) DBUG_RETURN(error); binlog_table_maps++; @@ -3867,144 +4018,163 @@ int THD::binlog_write_table_map(TABLE *table, bool is_trans) DBUG_RETURN(0); } +/** + This function retrieves a pending row event from a cache which is + specified through the parameter @c is_transactional. Respectively, when it + is @c true, the pending event is returned from the transactional cache. 
+ Otherwise from the non-transactional cache. + + @param is_transactional @c true indicates a transactional cache, + otherwise @c false a non-transactional. + @return + The row event if any. +*/ Rows_log_event* -THD::binlog_get_pending_rows_event() const +THD::binlog_get_pending_rows_event(bool is_transactional) const { - binlog_trx_data *const trx_data= - (binlog_trx_data*) thd_get_ha_data(this, binlog_hton); + Rows_log_event* rows= NULL; + binlog_cache_mngr *const cache_mngr= + (binlog_cache_mngr*) thd_get_ha_data(this, binlog_hton); + /* - This is less than ideal, but here's the story: If there is no - trx_data, prepare_pending_rows_event() has never been called - (since the trx_data is set up there). In that case, we just return - NULL. + This is less than ideal, but here's the story: If there is no cache_mngr, + prepare_pending_rows_event() has never been called (since the cache_mngr + is set up there). In that case, we just return NULL. */ - return trx_data ? trx_data->pending() : NULL; + if (cache_mngr) + { + binlog_cache_data *cache_data= + cache_mngr->get_binlog_cache_data(is_transactional); + + rows= cache_data->pending(); + } + return (rows); } +/** + This function stores a pending row event into a cache which is specified + through the parameter @c is_transactional. Respectively, when it is @c + true, the pending event is stored into the transactional cache. Otherwise + into the non-transactional cache. + + @param evt a pointer to the row event. + @param is_transactional @c true indicates a transactional cache, + otherwise @c false a non-transactional. 
+*/ void -THD::binlog_set_pending_rows_event(Rows_log_event* ev) +THD::binlog_set_pending_rows_event(Rows_log_event* ev, bool is_transactional) { if (thd_get_ha_data(this, binlog_hton) == NULL) binlog_setup_trx_data(); - binlog_trx_data *const trx_data= - (binlog_trx_data*) thd_get_ha_data(this, binlog_hton); + binlog_cache_mngr *const cache_mngr= + (binlog_cache_mngr*) thd_get_ha_data(this, binlog_hton); + + DBUG_ASSERT(cache_mngr); + + binlog_cache_data *cache_data= + cache_mngr->get_binlog_cache_data(is_transactional); - DBUG_ASSERT(trx_data); - trx_data->set_pending(ev); + cache_data->set_pending(ev); } /** - Remove the pending rows event, discarding any outstanding rows. - - If there is no pending rows event available, this is effectively a + This function removes the pending rows event, discarding any outstanding + rows. If there is no pending rows event available, this is effectively a no-op. - */ + + @param thd a pointer to the user thread. + @param is_transactional @c true indicates a transactional cache, + otherwise @c false a non-transactional. +*/ int -MYSQL_BIN_LOG::remove_pending_rows_event(THD *thd) +MYSQL_BIN_LOG::remove_pending_rows_event(THD *thd, bool is_transactional) { DBUG_ENTER("MYSQL_BIN_LOG::remove_pending_rows_event"); - binlog_trx_data *const trx_data= - (binlog_trx_data*) thd_get_ha_data(thd, binlog_hton); + binlog_cache_mngr *const cache_mngr= + (binlog_cache_mngr*) thd_get_ha_data(thd, binlog_hton); + + DBUG_ASSERT(cache_mngr); - DBUG_ASSERT(trx_data); + binlog_cache_data *cache_data= + cache_mngr->get_binlog_cache_data(is_transactional); - if (Rows_log_event* pending= trx_data->pending()) + if (Rows_log_event* pending= cache_data->pending()) { delete pending; - trx_data->set_pending(NULL); + cache_data->set_pending(NULL); } DBUG_RETURN(0); } /* - Moves the last bunch of rows from the pending Rows event to the binlog - (either cached binlog if transaction, or disk binlog). Sets a new pending - event. 
+ Moves the last bunch of rows from the pending Rows event to a cache (either + transactional cache if is_transaction is @c true, or the non-transactional + cache otherwise. Sets a new pending event. + + @param thd a pointer to the user thread. + @param evt a pointer to the row event. + @param is_transactional @c true indicates a transactional cache, + otherwise @c false a non-transactional. */ int MYSQL_BIN_LOG::flush_and_set_pending_rows_event(THD *thd, - Rows_log_event* event) + Rows_log_event* event, + bool is_transactional) { DBUG_ENTER("MYSQL_BIN_LOG::flush_and_set_pending_rows_event(event)"); DBUG_ASSERT(mysql_bin_log.is_open()); DBUG_PRINT("enter", ("event: 0x%lx", (long) event)); int error= 0; + binlog_cache_mngr *const cache_mngr= + (binlog_cache_mngr*) thd_get_ha_data(thd, binlog_hton); - binlog_trx_data *const trx_data= - (binlog_trx_data*) thd_get_ha_data(thd, binlog_hton); + DBUG_ASSERT(cache_mngr); - DBUG_ASSERT(trx_data); + binlog_cache_data *cache_data= + cache_mngr->get_binlog_cache_data(is_transactional); - DBUG_PRINT("info", ("trx_data->pending(): 0x%lx", (long) trx_data->pending())); + DBUG_PRINT("info", ("cache_mngr->pending(): 0x%lx", (long) cache_data->pending())); - if (Rows_log_event* pending= trx_data->pending()) + if (Rows_log_event* pending= cache_data->pending()) { - IO_CACHE *file= &log_file; + IO_CACHE *file= &cache_data->cache_log; /* - Decide if we should write to the log file directly or to the - transaction log. - */ - if (pending->get_cache_stmt() || my_b_tell(&trx_data->trans_log)) - file= &trx_data->trans_log; - - /* - If we are writing to the log file directly, we could avoid - locking the log. This does not work since we need to step the - m_table_map_version below, and that change has to be protected - by the LOCK_log mutex. - */ - pthread_mutex_lock(&LOCK_log); - - /* - Write pending event to log file or transaction cache + Write pending event to the cache. 
*/ if (pending->write(file)) { - pthread_mutex_unlock(&LOCK_log); set_write_error(thd); + if (check_write_error(thd) && cache_data && + thd->transaction.stmt.modified_non_trans_table) + cache_data->set_incident(); DBUG_RETURN(1); } /* We step the table map version if we are writing an event - representing the end of a statement. We do this regardless of - wheather we write to the transaction cache or to directly to the - file. - - In an ideal world, we could avoid stepping the table map version - if we were writing to a transaction cache, since we could then - reuse the table map that was written earlier in the transaction - cache. This does not work since STMT_END_F implies closing all - table mappings on the slave side. + representing the end of a statement. + In an ideal world, we could avoid stepping the table map version, + since we could then reuse the table map that was written earlier + in the cache. This does not work since STMT_END_F implies closing + all table mappings on the slave side. + TODO: Find a solution so that table maps does not have to be written several times within a transaction. 
- */ + */ if (pending->get_flags(Rows_log_event::STMT_END_F)) ++m_table_map_version; delete pending; - - if (file == &log_file) - { - error= flush_and_sync(0); - if (!error) - { - signal_update(); - rotate_and_purge(RP_LOCK_LOG_IS_ALREADY_LOCKED); - } - } - - pthread_mutex_unlock(&LOCK_log); } - thd->binlog_set_pending_rows_event(event); + thd->binlog_set_pending_rows_event(event, is_transactional); DBUG_RETURN(error); } @@ -4018,6 +4188,7 @@ bool MYSQL_BIN_LOG::write(Log_event *event_info) THD *thd= event_info->thd; bool error= 1; DBUG_ENTER("MYSQL_BIN_LOG::write(Log_event *)"); + binlog_cache_data *cache_data= 0; if (thd->binlog_evt_union.do_union) { @@ -4026,27 +4197,22 @@ bool MYSQL_BIN_LOG::write(Log_event *event_info) We will log the function call to the binary log on function exit */ thd->binlog_evt_union.unioned_events= TRUE; - thd->binlog_evt_union.unioned_events_trans |= event_info->cache_stmt; + thd->binlog_evt_union.unioned_events_trans |= + event_info->use_trans_cache(); DBUG_RETURN(0); } /* - Flush the pending rows event to the transaction cache or to the - log file. Since this function potentially aquire the LOCK_log - mutex, we do this before aquiring the LOCK_log mutex in this - function. - We only end the statement if we are in a top-level statement. If we are inside a stored function, we do not end the statement since this will close all tables on the slave. 
*/ bool const end_stmt= thd->prelocked_mode && thd->lex->requires_prelocking(); - if (thd->binlog_flush_pending_rows_event(end_stmt)) + if (thd->binlog_flush_pending_rows_event(end_stmt, + event_info->use_trans_cache())) DBUG_RETURN(error); - pthread_mutex_lock(&LOCK_log); - /* In most cases this is only called if 'is_open()' is true; in fact this is mostly called if is_open() *was* true a few instructions before, but it @@ -4054,7 +4220,6 @@ bool MYSQL_BIN_LOG::write(Log_event *event_info) */ if (likely(is_open())) { - IO_CACHE *file= &log_file; #ifdef HAVE_REPLICATION /* In the future we need to add to the following if tests like @@ -4064,63 +4229,67 @@ bool MYSQL_BIN_LOG::write(Log_event *event_info) const char *local_db= event_info->get_db(); if ((thd && !(thd->options & OPTION_BIN_LOG)) || (!binlog_filter->db_ok(local_db))) - { - VOID(pthread_mutex_unlock(&LOCK_log)); DBUG_RETURN(0); - } #endif /* HAVE_REPLICATION */ -#if defined(USING_TRANSACTIONS) - /* - Should we write to the binlog cache or to the binlog on disk? - Write to the binlog cache if: - - it is already not empty (meaning we're in a transaction; note that the - present event could be about a non-transactional table, but still we need - to write to the binlog cache in that case to handle updates to mixed - trans/non-trans table types the best possible in binlogging) - - or if the event asks for it (cache_stmt == TRUE). 
- */ - if (opt_using_transactions && thd) +#if defined(USING_TRANSACTIONS) + IO_CACHE *file= NULL; + + if (event_info->use_direct_logging()) + { + file= &log_file; + pthread_mutex_lock(&LOCK_log); + } + else { if (thd->binlog_setup_trx_data()) goto err; - binlog_trx_data *const trx_data= - (binlog_trx_data*) thd_get_ha_data(thd, binlog_hton); - IO_CACHE *trans_log= &trx_data->trans_log; - my_off_t trans_log_pos= my_b_tell(trans_log); - if (event_info->get_cache_stmt() || trans_log_pos != 0 || - stmt_has_updated_trans_table(thd)) + binlog_cache_mngr *const cache_mngr= + (binlog_cache_mngr*) thd_get_ha_data(thd, binlog_hton); + + /* + If we are about to use write rows, we just need to check the type of + the event (either transactional or non-transactional) in order to + choose the cache. + */ + if (thd->is_current_stmt_binlog_format_row()) { - DBUG_PRINT("info", ("Using trans_log: cache: %d, trans_log_pos: %lu", - event_info->get_cache_stmt(), - (ulong) trans_log_pos)); - thd->binlog_start_trans_and_stmt(); - file= trans_log; + file= cache_mngr->get_binlog_cache_log(event_info->use_trans_cache()); + cache_data= cache_mngr->get_binlog_cache_data(event_info->use_trans_cache()); } /* - TODO as Mats suggested, for all the cases above where we write to - trans_log, it sounds unnecessary to lock LOCK_log. We should rather - test first if we want to write to trans_log, and if not, lock - LOCK_log. + However, if we are about to write statements we need to consider other + things. We use the non-transactional cache when: + + . the transactional cache is empty which means that there were no + early statement on behalf of the transaction. + . the respective event is tagged as non-transactional. 
*/ + else if (cache_mngr->trx_cache.empty() && + !event_info->use_trans_cache()) + { + file= &cache_mngr->stmt_cache.cache_log; + cache_data= &cache_mngr->stmt_cache; + } + else + { + file= &cache_mngr->trx_cache.cache_log; + cache_data= &cache_mngr->trx_cache; + } + + thd->binlog_start_trans_and_stmt(); } #endif /* USING_TRANSACTIONS */ DBUG_PRINT("info",("event type: %d",event_info->get_type_code())); /* - No check for auto events flag here - this write method should - never be called if auto-events are enabled - */ - - /* - 1. Write first log events which describe the 'run environment' - of the SQL command - */ + No check for auto events flag here - this write method should + never be called if auto-events are enabled. - /* - If row-based binlogging, Insert_id, Rand and other kind of "setting - context" events are not needed. + Write first log events which describe the 'run environment' + of the SQL command. If row-based binlogging, Insert_id, Rand + and other kind of "setting context" events are not needed. */ if (thd) { @@ -4170,39 +4339,48 @@ bool MYSQL_BIN_LOG::write(Log_event *event_info) } /* - Write the SQL command - */ - - if (event_info->write(file) || + Write the event. 
+ */ + if (event_info->write(file) || DBUG_EVALUATE_IF("injecting_fault_writing", 1, 0)) goto err; - if (file == &log_file) // we are writing to the real log (disk) + error= 0; + +err: + if (event_info->use_direct_logging()) { - bool synced= 0; - if (flush_and_sync(&synced)) - goto err; + if (!error) + { + bool synced; + if ((error= flush_and_sync(&synced))) + goto unlock; - if (RUN_HOOK(binlog_storage, after_flush, - (thd, log_file_name, file->pos_in_file, synced))) { - sql_print_error("Failed to run 'after_flush' hooks"); - goto err; + if ((error= RUN_HOOK(binlog_storage, after_flush, + (thd, log_file_name, file->pos_in_file, synced)))) + { + sql_print_error("Failed to run 'after_flush' hooks"); + goto unlock; + } + signal_update(); + rotate_and_purge(RP_LOCK_LOG_IS_ALREADY_LOCKED); } - - signal_update(); - rotate_and_purge(RP_LOCK_LOG_IS_ALREADY_LOCKED); +unlock: + pthread_mutex_unlock(&LOCK_log); } - error=0; -err: if (error) + { set_write_error(thd); + if (check_write_error(thd) && cache_data && + thd->transaction.stmt.modified_non_trans_table) + cache_data->set_incident(); + } } if (event_info->flags & LOG_EVENT_UPDATE_TABLE_MAP_VERSION_F) ++m_table_map_version; - pthread_mutex_unlock(&LOCK_log); DBUG_RETURN(error); } @@ -4314,7 +4492,7 @@ uint MYSQL_BIN_LOG::next_file_id() write_cache() cache Cache to write to the binary log lock_log True if the LOCK_log mutex should be aquired, false otherwise - sync_log True if the log should be flushed and sync:ed + sync_log True if the log should be flushed and synced DESCRIPTION Write the contents of the cache to the binary log. 
The cache will @@ -4530,9 +4708,6 @@ bool MYSQL_BIN_LOG::write(THD *thd, IO_CACHE *cache, Log_event *commit_event, DBUG_ENTER("MYSQL_BIN_LOG::write(THD *, IO_CACHE *, Log_event *)"); VOID(pthread_mutex_lock(&LOCK_log)); - /* NULL would represent nothing to replicate after ROLLBACK */ - DBUG_ASSERT(commit_event != NULL); - DBUG_ASSERT(is_open()); if (likely(is_open())) // Should always be true { @@ -4547,19 +4722,9 @@ bool MYSQL_BIN_LOG::write(THD *thd, IO_CACHE *cache, Log_event *commit_event, transaction is either a BEGIN..COMMIT block or a single statement in autocommit mode. */ - Query_log_event qinfo(thd, STRING_WITH_LEN("BEGIN"), TRUE, TRUE, 0); - - /* - Now this Query_log_event has artificial log_pos 0. It must be - adjusted to reflect the real position in the log. Not doing it - would confuse the slave: it would prevent this one from - knowing where he is in the master's binlog, which would result - in wrong positions being shown to the user, MASTER_POS_WAIT - undue waiting etc. - */ + Query_log_event qinfo(thd, STRING_WITH_LEN("BEGIN"), TRUE, FALSE, TRUE, 0); if (qinfo.write(&log_file)) goto err; - DBUG_EXECUTE_IF("crash_before_writing_xid", { if ((write_error= write_cache(cache, false, true))) @@ -5657,13 +5822,13 @@ int TC_LOG_BINLOG::log_xid(THD *thd, my_xid xid) { DBUG_ENTER("TC_LOG_BINLOG::log"); Xid_log_event xle(thd, xid); - binlog_trx_data *trx_data= - (binlog_trx_data*) thd_get_ha_data(thd, binlog_hton); + binlog_cache_mngr *cache_mngr= + (binlog_cache_mngr*) thd_get_ha_data(thd, binlog_hton); /* We always commit the entire transaction when writing an XID. Also note that the return value is inverted. 
*/ - DBUG_RETURN(!binlog_end_trans(thd, trx_data, &xle, TRUE)); + DBUG_RETURN(!binlog_flush_trx_cache(thd, cache_mngr, &xle)); } void TC_LOG_BINLOG::unlog(ulong cookie, my_xid xid) -- cgit v1.2.1 From ac647f5a3eb12313f981800ac1fd0c562402abcc Mon Sep 17 00:00:00 2001 From: unknown Date: Thu, 3 Dec 2009 16:59:58 +0800 Subject: WL#5142 FLUSH LOGS should take optional arguments for which log(s) to flush Support for flushing individual logs, so that the user can selectively flush a subset of the server logs. Flush of individual logs is done according to the following syntax: FLUSH log_category LOGS; The syntax is extended so that the user is able to flush a subset of logs: FLUSH [log_category LOGS,]; where log_category is one of: SLOW, ERROR, BINARY, ENGINE, GENERAL, RELAY. mysql-test/suite/rpl/r/rpl_flush_logs.result: Test result for WL#5142. mysql-test/suite/rpl/t/rpl_flush_logs.test: Added a test file to verify that the 'flush individual log' statement works. sql/log.cc: Added two functions to flush the slow and general logs. sql/sql_parse.cc: Added code to flush the logs specified by the option. sql/sql_yacc.yy: Added code to parse the 'flush * log' statement syntax and set its option to Lex->type. --- sql/log.cc | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 48 insertions(+) (limited to 'sql/log.cc') diff --git a/sql/log.cc b/sql/log.cc index 3d863583859..1c95b21f533 100644 --- a/sql/log.cc +++ b/sql/log.cc @@ -965,6 +965,54 @@ bool LOGGER::flush_logs(THD *thd) } +/** + Close and reopen the slow log (with locks). + + @returns FALSE.
+*/ +bool LOGGER::flush_general_log() +{ + /* + Now we lock logger, as nobody should be able to use logging routines while + log tables are closed + */ + logger.lock_exclusive(); + + /* Reopen general log file */ + if (opt_log) + file_log_handler->get_mysql_log()->reopen_file(); + + /* End of log flush */ + logger.unlock(); + + return 0; +} + + /* Log slow query with all enabled log event handlers -- cgit v1.2.1 From 9e980bf79ef0c727c630e79c1bc043c48bc947ee Mon Sep 17 00:00:00 2001 From: Mats Kindahl Date: Tue, 15 Dec 2009 16:11:44 +0100 Subject: BUG#49618: Field length stored incorrectly in binary log for InnoDB The class Field_bit_as_char stores the metadata for the field incorrectly because bytes_in_rec and bit_len are set to (field_length + 7) / 8 and 0 respectively, while Field_bit has the correct values field_length / 8 and field_length % 8. Solved the problem by re-computing the values for the metadata based on the field_length instead of using the bytes_in_rec and bit_len variables. To handle compatibility with old servers, a table map flag was added to indicate that the bit computation is exact. If the flag is clear, the slave computes the number of bytes required to store the bit field and compares that instead, effectively allowing replication *without conversion* from any field length that requires the same number of bytes to store. mysql-test/suite/rpl/t/rpl_typeconv_innodb.test: Adding test to check compatibility for bit field replication when using InnoDB. sql/field.cc: Extending compatible_field_size() with flags from table map to allow fields to check master info. sql/field.h: Extending compatible_field_size() with flags from table map to allow fields to check master info. sql/log.cc: Removing table map flags since they are not used outside table map class. sql/log_event.cc: Removing flags parameter from table map constructor since it is not used and does not have to be exposed.
sql/log_event.h: Adding flag to denote that bit length for bit field type is exact and not potentially rounded to even bytes. sql/rpl_utility.cc: Adding fields to table_def to store table map flags. sql/rpl_utility.h: Removing obsolete comment and adding flags to store table map flags from master. --- sql/log.cc | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) (limited to 'sql/log.cc') diff --git a/sql/log.cc b/sql/log.cc index e8366c47863..f74b3ef858a 100644 --- a/sql/log.cc +++ b/sql/log.cc @@ -3846,11 +3846,8 @@ int THD::binlog_write_table_map(TABLE *table, bool is_trans) DBUG_ASSERT(current_stmt_binlog_row_based && mysql_bin_log.is_open()); DBUG_ASSERT(table->s->table_map_id != ULONG_MAX); - Table_map_log_event::flag_set const - flags= Table_map_log_event::TM_NO_FLAGS; - Table_map_log_event - the_event(this, table, table->s->table_map_id, is_trans, flags); + the_event(this, table, table->s->table_map_id, is_trans); if (is_trans && binlog_table_maps == 0) binlog_start_trans_and_stmt(); -- cgit v1.2.1 From 54b2371e92a723ee9d44c282b3b1c5b7baf50f89 Mon Sep 17 00:00:00 2001 From: Alfranio Correia Date: Tue, 5 Jan 2010 16:55:23 +0000 Subject: BUG#50038 Deadlock on flush logs with concurrent DML and RBR In auto-commit mode, updating both trx and non-trx tables (i.e. issuing a mixed statement) causes the following sequence of events: 1 - "Flush trx changes" (MYSQL_BIN_LOG::write) - T1: 1.1 - mutex_lock (&LOCK_log) 1.2 - mutex_lock (&LOCK_prep_xids) 1.3 - increase prepared_xids 1.4 - mutex_unlock (&LOCK_prep_xids) 1.5 - mutex_unlock (&LOCK_log) 2 - "Flush non-trx changes" (MYSQL_BIN_LOG::write) - T1: 2.1 - mutex_lock (&LOCK_log) 2.2 - mutex_unlock (&LOCK_log)
3 - "unlog" - T1: 3.1 - mutex_lock (&LOCK_prep_xids) 3.2 - decrease prepared_xids 3.3 - pthread_cond_signal(&COND_prep_xids); 3.4 - mutex_unlock (&LOCK_prep_xids) The "FLUSH logs" command produces the following sequence of events: 1 - "FLUSH logs" command (MYSQL_BIN_LOG::new_file_impl) - user thread: 1.1 - mutex_lock (&LOCK_log) 1.2 - mutex_lock (&LOCK_prep_xids) 1.3 - while (prepared_xids) pthread_cond_wait(..., &LOCK_prep_xids); 1.4 - mutex_unlock (&LOCK_prep_xids) 1.5 - mutex_unlock (&LOCK_log) A deadlock arises if T1 flushes the trx changes, and thus increases prepared_xids, but before it can continue and flush the non-trx changes, a user thread calls the "FLUSH logs" command and waits for prepared_xids to drop to zero. T1 then cannot proceed with the call to "Flush non-trx changes" because it blocks on the mutex "LOCK_log", and consequently cannot complete the execution and call unlog to decrease prepared_xids. To fix the problem, we ensure that the non-trx changes are always flushed before the trx changes. Note that if you call "Flush non-trx changes" and a concurrent "FLUSH logs" is issued, the "Flush non-trx changes" may block, but a deadlock will never happen because prepared_xids will eventually get to zero. Bottom line: no other transaction can increase prepared_xids, because it will block on the mutex "LOCK_log" (MYSQL_BIN_LOG::write), and those that already increased prepared_xids will eventually commit and decrease it. --- sql/log.cc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'sql/log.cc') diff --git a/sql/log.cc b/sql/log.cc index 55f08978b20..8781fb03031 100644 --- a/sql/log.cc +++ b/sql/log.cc @@ -5958,7 +5958,8 @@ int TC_LOG_BINLOG::log_xid(THD *thd, my_xid xid) We always commit the entire transaction when writing an XID. Also note that the return value is inverted.
*/ - DBUG_RETURN(!binlog_flush_trx_cache(thd, cache_mngr, &xle)); + DBUG_RETURN(!binlog_flush_stmt_cache(thd, cache_mngr) && + !binlog_flush_trx_cache(thd, cache_mngr, &xle)); } void TC_LOG_BINLOG::unlog(ulong cookie, my_xid xid) -- cgit v1.2.1 From e0e0f9e3d46917fe9b611fc9769e64032c267446 Mon Sep 17 00:00:00 2001 From: Marc Alff Date: Mon, 11 Jan 2010 18:47:27 -0700 Subject: WL#2360 Performance schema Part V: performance schema implementation --- sql/log.cc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'sql/log.cc') diff --git a/sql/log.cc b/sql/log.cc index 3680398f068..7776b6bfbdc 100644 --- a/sql/log.cc +++ b/sql/log.cc @@ -3075,7 +3075,7 @@ bool MYSQL_BIN_LOG::reset_logs(THD* thd) thread. If the transaction involved MyISAM tables, it should go into binlog even on rollback. */ - pthread_mutex_lock(&LOCK_thread_count); + mysql_mutex_lock(&LOCK_thread_count); /* Save variables so that we can reopen the log */ save_name=name; @@ -3168,7 +3168,7 @@ bool MYSQL_BIN_LOG::reset_logs(THD* thd) err: if (error == 1) name= const_cast<char*>(save_name); - pthread_mutex_unlock(&LOCK_thread_count); + mysql_mutex_unlock(&LOCK_thread_count); mysql_mutex_unlock(&LOCK_index); mysql_mutex_unlock(&LOCK_log); DBUG_RETURN(error); -- cgit v1.2.1
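The lock-ordering argument in the BUG#50038 commit message above can be modelled in a few lines. This is a minimal sketch, not the server's code: BinlogModel, its members and run_fixed_order() are hypothetical stand-ins for MYSQL_BIN_LOG, LOCK_log, LOCK_prep_xids and COND_prep_xids, and std::mutex replaces the pthread primitives. It runs the fixed order (non-trx flush, then trx flush, then unlog) against a concurrent "FLUSH logs".

```cpp
// Hypothetical model of the locking described in BUG#50038; the names
// below are illustrative and do not exist in sql/log.cc.
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

struct BinlogModel {
  std::mutex lock_log;                     // stands in for LOCK_log
  std::mutex lock_prep_xids;               // stands in for LOCK_prep_xids
  std::condition_variable cond_prep_xids;  // stands in for COND_prep_xids
  int prepared_xids = 0;

  // "Flush non-trx changes": takes and releases LOCK_log only.
  void flush_stmt_cache() {
    std::lock_guard<std::mutex> log(lock_log);
  }

  // "Flush trx changes": takes LOCK_log, then increases prepared_xids.
  void flush_trx_cache() {
    std::lock_guard<std::mutex> log(lock_log);
    std::lock_guard<std::mutex> prep(lock_prep_xids);
    ++prepared_xids;
  }

  // "unlog": needs only LOCK_prep_xids, never LOCK_log.
  void unlog() {
    std::lock_guard<std::mutex> prep(lock_prep_xids);
    --prepared_xids;
    cond_prep_xids.notify_all();
  }

  // "FLUSH logs": holds LOCK_log while waiting for prepared_xids == 0.
  void flush_logs() {
    std::lock_guard<std::mutex> log(lock_log);
    std::unique_lock<std::mutex> prep(lock_prep_xids);
    cond_prep_xids.wait(prep, [this] { return prepared_xids == 0; });
    assert(prepared_xids == 0);
  }
};

// Runs T1 with the fixed order against a concurrent "FLUSH logs" and
// returns the final value of prepared_xids.
int run_fixed_order() {
  BinlogModel m;
  std::thread t1([&m] {
    m.flush_stmt_cache();  // non-trx changes first (the fix)
    m.flush_trx_cache();   // after this, T1 no longer needs LOCK_log
    m.unlog();
  });
  std::thread user([&m] { m.flush_logs(); });
  t1.join();
  user.join();
  return m.prepared_xids;
}
```

Once T1 has increased prepared_xids it only needs lock_prep_xids, which flush_logs() releases while it waits, so both threads always finish and run_fixed_order() returns 0. Swapping the first two calls in t1 reproduces the hazard from the message: if flush_logs() runs between them, it holds lock_log while waiting and t1 blocks on lock_log before it can reach unlog().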