From 41783de54966481dd295aae95a2bfc8f6302a940 Mon Sep 17 00:00:00 2001
From: Sven Sandberg <sven.sandberg@sun.com>
Date: Tue, 14 Jul 2009 21:31:19 +0200
Subject: BUG#39934: Slave stops for engine that only support row-based logging
 General overview: The logic for switching to row format when
 binlog_format=MIXED had numerous flaws. The underlying problem was the lack
 of a consistent architecture. General purpose of this changeset: This
 changeset introduces an architecture for switching to row format when
 binlog_format=MIXED. It enforces the architecture where it has to. It leaves
 some bugs to be fixed later. It adds extensive tests to verify that unsafe
 statements work as expected and that appropriate errors are produced by
 problems with the selection of binlog format. It was not practical to split
 this into smaller pieces of work.

Problem 1:
To determine the logging mode, the code has to take several parameters
into account (namely: (1) the value of binlog_format; (2) the
capabilities of the engines; (3) the type of the current statement:
normal, unsafe, or row injection). These parameters may conflict in
several ways, namely:
 - binlog_format=STATEMENT for a row injection
 - binlog_format=STATEMENT for an unsafe statement
 - binlog_format=STATEMENT for an engine only supporting row logging
 - binlog_format=ROW for an engine only supporting statement logging
 - statement is unsafe and engine does not support row logging
 - row injection in a table that does not support statement logging
 - statement modifies one table that does not support row logging and
   one that does not support statement logging
Several of these conflicts were not detected, or were detected with
an inappropriate error message. The problem of BUG#39934 was that no
appropriate error message was written for the case when an engine
only supporting row logging executed a row injection with
binlog_format=ROW. However, all above cases must be handled.
Fix 1:
Introduce new error codes (sql/share/errmsg.txt). Ensure that all
conditions are detected and handled in decide_logging_format()

Problem 2:
The binlog format shall be determined once per statement, in
decide_logging_format(). It shall not be changed before or after that.
Before decide_logging_format() is called, all information necessary to
determine the logging format must be available. This principle ensures
that all unsafe statements are handled in a consistent way.
However, this principle is not followed:
thd->set_current_stmt_binlog_row_based_if_mixed() is called in several
places, including from code executing UPDATE..LIMIT,
INSERT..SELECT..LIMIT, DELETE..LIMIT, INSERT DELAYED, and
SET @@binlog_format. After Problem 1 was fixed, that caused
inconsistencies where these unsafe statements would not print the
appropriate warnings or errors for some of the conflicts.
Fix 2:
Remove calls to THD::set_current_stmt_binlog_row_based_if_mixed() from
code executed after decide_logging_format(). Compensate by calling the
set_current_stmt_unsafe() at parse time. This way, all unsafe statements
are detected by decide_logging_format().

Problem 3:
INSERT DELAYED is not unsafe: it is logged in statement format even if
binlog_format=MIXED, and no warning is printed even if
binlog_format=STATEMENT. This is BUG#45825.
Fix 3:
Made INSERT DELAYED set itself to unsafe at parse time. This allows
decide_logging_format() to detect that a warning should be printed or
the binlog_format changed.

Problem 4:
LIMIT clause were not marked as unsafe when executed inside stored
functions/triggers/views/prepared statements. This is
BUG#45785.
Fix 4:
Make statements containing the LIMIT clause marked as unsafe at
parse time, instead of at execution time. This allows propagating
unsafe-ness to the view.


mysql-test/extra/rpl_tests/create_recursive_construct.inc:
  Added auxiliary file used by binlog_unsafe.test to create and
  execute recursive constructs
  (functions/procedures/triggers/views/prepared statements).
mysql-test/extra/rpl_tests/rpl_foreign_key.test:
  removed unnecessary set @@session.binlog_format
mysql-test/extra/rpl_tests/rpl_insert_delayed.test:
  Filter out table id from table map events in binlog listing.
  Got rid of $binlog_format_statement.
mysql-test/extra/rpl_tests/rpl_ndb_apply_status.test:
  disable warnings around call to unsafe procedure
mysql-test/include/rpl_udf.inc:
  Disabled warnings for code that generates warnings
  for some binlog formats. That would otherwise cause
  inconsistencies in the result file.
mysql-test/r/mysqldump.result:
  Views are now unsafe if they contain a LIMIT clause.
  That fixed BUG#45831. Due to BUG#45832, a warning is
  printed for the CREATE VIEW statement.
mysql-test/r/sp_trans.result:
  Unsafe statements in stored procedures did not give a warning if
  binlog_format=statement. This is BUG#45824. Now they do, so this
  result file gets a new warning.
mysql-test/suite/binlog/r/binlog_multi_engine.result:
  Error message changed.
mysql-test/suite/binlog/r/binlog_statement_insert_delayed.result:
  INSERT DELAYED didn't generate a warning when binlog_format=STATEMENT.
  That was BUG#45825. Now there is a warning, so result file needs to be
  updated.
mysql-test/suite/binlog/r/binlog_stm_ps.result:
  Changed error message.
mysql-test/suite/binlog/r/binlog_unsafe.result:
  updated result file:
   - error message changed
   - added test for most combinations of unsafe constructs invoked
     from recursive constructs
   - INSERT DELAYED now gives a warning (because BUG#45826 is fixed)
   - INSERT..SELECT..LIMIT now gives a warning from inside recursive
     constructs (because BUG#45785 was fixed)
   - When a recursive construct (e.g., stored proc or function)
     contains more than one statement, at least one of which is
     unsafe, then all statements in the recursive construct give
     warnings. This is a new bug introduced by this changeset.
     It will be addressed in a post-push fix.
mysql-test/suite/binlog/t/binlog_innodb.test:
  Changed error code for innodb updates with READ COMMITTED or
  READ UNCOMMITTED transaction isolation level and
  binlog_format=statement.
mysql-test/suite/binlog/t/binlog_multi_engine.test:
  The error code has changed for statements where more than one
  engine is involved and one of them is self-logging.
mysql-test/suite/binlog/t/binlog_unsafe-master.opt:
  Since binlog_unsafe now tests unsafe-ness of UDF's, we need an extra
  flag in the .opt file.
mysql-test/suite/binlog/t/binlog_unsafe.test:
   - Clarified comment.
   - Rewrote first part of test. Now it tests not only unsafe variables
     and functions, but also unsafe-ness due to INSERT..SELECT..LIMIT,
     INSERT DELAYED, insert into two autoinc columns, use of UDF's, and
     access to log tables in the mysql database.
     Also, in addition to functions, procedures, triggers, and prepared
     statements, it now also tests views; and it constructs recursive
     calls in two levels by combining these recursive constructs.
     Part of the logic is in extra/rpl_tests/create_recursive_construct.inc.
   - added tests for all special system variables that should not be unsafe.
   - added specific tests for BUG#45785 and BUG#45825
mysql-test/suite/rpl/r/rpl_events.result:
  updated result file
mysql-test/suite/rpl/r/rpl_extraColmaster_innodb.result:
  updated result file
mysql-test/suite/rpl/r/rpl_extraColmaster_myisam.result:
  updated result file
mysql-test/suite/rpl/r/rpl_foreign_key_innodb.result:
  updated result file
mysql-test/suite/rpl/r/rpl_idempotency.result:
  updated result file
mysql-test/suite/rpl/r/rpl_mix_found_rows.result:
  Split rpl_found_rows.test into rpl_mix_found_rows.test (a new file) and
  rpl_stm_found_rows.test (renamed rpl_found_rows.test). This file equals
  the second half of the old rpl_found_rows.result, with the following
  modifications:
   - minor formatting changes
   - additional initialization
mysql-test/suite/rpl/r/rpl_mix_insert_delayed.result:
  Moved out code operating in mixed mode from rpl_stm_insert_delayed
  (into rpl_mix_insert_delayed) and got rid of explicit setting of
  binlog format.
mysql-test/suite/rpl/r/rpl_rbr_to_sbr.result:
  updated result file
mysql-test/suite/rpl/r/rpl_row_idempotency.result:
  Moved the second half of rpl_idempotency.test, which only
  executed in row mode, to rpl_row_idempotency.test. This is
  the new result file.
mysql-test/suite/rpl/r/rpl_row_insert_delayed.result:
  Got rid of unnecessary explicit setting of binlog format.
mysql-test/suite/rpl/r/rpl_stm_found_rows.result:
  Split rpl_found_rows.test into rpl_mix_found_rows.test (a new file) and
  rpl_stm_found_rows.test (renamed rpl_found_rows.test). Changes in
  this file:
   - minor formatting changes
   - warning is now issued for unsafe statements inside procedures
     (since BUG#45824 is fixed)
   - second half of file is moved to rpl_mix_found_rows.result
mysql-test/suite/rpl/r/rpl_stm_insert_delayed.result:
  Moved out code operating in mixed mode from rpl_stm_insert_delayed
  (into rpl_mix_insert_delayed) and got rid of explicit setting of
  binlog format.
mysql-test/suite/rpl/r/rpl_stm_loadfile.result:
  error message changed
mysql-test/suite/rpl/r/rpl_temporary_errors.result:
  updated result file
mysql-test/suite/rpl/r/rpl_udf.result:
  Remove explicit set of binlog format (and triplicate test execution)
  and rely on test system executing the test in all binlog formats.
mysql-test/suite/rpl/t/rpl_bug31076.test:
  Test is only valid in mixed or row mode since it generates row events.
mysql-test/suite/rpl/t/rpl_events.test:
  Removed explicit set of binlog_format and removed duplicate testing.
  Instead, we rely on the test system to try all binlog formats.
mysql-test/suite/rpl/t/rpl_extraColmaster_innodb.test:
  Removed triplicate testing and instead relying on test system.
  Test is only relevant for row format since statement-based replication
  cannot handle extra columns on master.
mysql-test/suite/rpl/t/rpl_extraColmaster_myisam.test:
  Removed triplicate testing and instead relying on test system.
  Test is only relevant for row format since statement-based replication
  cannot handle extra columns on master.
mysql-test/suite/rpl/t/rpl_idempotency-slave.opt:
  Removed .opt file to avoid server restarts.
mysql-test/suite/rpl/t/rpl_idempotency.test:
  - Moved out row-only tests to a new test file, rpl_row_idempotency.test.
    rpl_idempotency now only contains tests that execute in all
    binlog_formats.
  - While I was here, also removed .opt file to avoid server restarts.
    The slave_exec_mode is now set inside the test instead.
mysql-test/suite/rpl/t/rpl_mix_found_rows.test:
  Split rpl_found_rows.test into rpl_mix_found_rows.test (a new file) and
  rpl_stm_found_rows.test (renamed rpl_found_rows.test). This file
  contains the second half of the original rpl_found_rows.test with the
  follwing changes:
   - initialization
   - removed SET_BINLOG_FORMAT and added have_binlog_format_mixed.inc
   - minor formatting changes
mysql-test/suite/rpl/t/rpl_mix_insert_delayed.test:
  Moved out code operating in mixed mode from rpl_stm_insert_delayed
  (into rpl_mix_insert_delayed) and got rid of explicit setting of
  binlog format.
mysql-test/suite/rpl/t/rpl_rbr_to_sbr.test:
  Test cannot execute in statement mode, since we no longer
  switch to row format when binlog_format=statement.
  Enforced mixed mode throughout the test.
mysql-test/suite/rpl/t/rpl_row_idempotency.test:
  Moved the second half of rpl_idempotency.test, which only
  executed in row mode, to this new file. We now rely on the
  test system to set binlog format.
mysql-test/suite/rpl/t/rpl_row_insert_delayed.test:
   - Got rid of unnecessary explicit setting of binlog format.
   - extra/rpl_tests/rpl_insert_delayed.test does not need the
     $binlog_format_statement variable any more, so that was
     removed.
mysql-test/suite/rpl/t/rpl_slave_skip.test:
  The test switches binlog_format internally and master generates both
  row and statement events. Hence, the slave must be able to log in both
  statement and row format. Hence test was changed to only execute in
  mixed mode.
mysql-test/suite/rpl/t/rpl_stm_found_rows.test:
  Split rpl_found_rows.test into rpl_mix_found_rows.test (a new file) and
  rpl_stm_found_rows.test (renamed rpl_found_rows.test). Changes in
  this file:
   - minor formatting changes
   - added have_binlog_format_statement and removed SET BINLOG_FORMAT.
   - second half of file is moved to rpl_mix_found_rows.test
   - added cleanup code
mysql-test/suite/rpl/t/rpl_stm_insert_delayed.test:
  Moved out code operating in mixed mode from rpl_stm_insert_delayed
  (into rpl_mix_insert_delayed) and got rid of explicit setting of
  binlog format.
mysql-test/suite/rpl/t/rpl_switch_stm_row_mixed.test:
  The test switches binlog_format internally and master generates both
  row and statement events. Hence, the slave must be able to log in both
  statement and row format. Hence test was changed to only execute in
  mixed mode on slave.
mysql-test/suite/rpl/t/rpl_temporary_errors.test:
  Removed explicit set of binlog format. Instead, the test now only
  executes in row mode.
mysql-test/suite/rpl/t/rpl_udf.test:
  Remove explicit set of binlog format (and triplicate test execution)
  and rely on test system executing the test in all binlog formats.
mysql-test/suite/rpl_ndb/combinations:
  Added combinations file for rpl_ndb.
mysql-test/suite/rpl_ndb/r/rpl_ndb_binlog_format_errors.result:
  new result file
mysql-test/suite/rpl_ndb/r/rpl_ndb_circular_simplex.result:
  updated result file
mysql-test/suite/rpl_ndb/t/rpl_ndb_2innodb.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_2myisam.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_basic.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_binlog_format_errors-master.opt:
  new option file
mysql-test/suite/rpl_ndb/t/rpl_ndb_binlog_format_errors-slave.opt:
  new option file
mysql-test/suite/rpl_ndb/t/rpl_ndb_binlog_format_errors.test:
  New test case to verify all errors and warnings generated by
  decide_logging_format.
mysql-test/suite/rpl_ndb/t/rpl_ndb_blob.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_blob2.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_circular.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_circular_simplex.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
  While I was here, also made the test clean up after itself.
mysql-test/suite/rpl_ndb/t/rpl_ndb_commit_afterflush.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_ctype_ucs2_def.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_delete_nowhere.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_do_db.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_do_table.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_func003.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_innodb_trans.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_insert_ignore.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_mixed_engines_transactions.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_multi_update3.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_rep_ignore.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_row_001.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_sp003.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_sp006.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/suite/rpl_ndb/t/rpl_ndb_trig004.test:
  The test needs slave to be able to switch to row mode, so the
  test was changed to only execute in mixed and row mode.
mysql-test/t/partition_innodb_stmt.test:
  Changed error code for innodb updates with READ COMMITTED or
  READ UNCOMMITTED transaction isolation level and
  binlog_format=statement.
sql/event_db_repository.cc:
  Use member function to read current_stmt_binlog_row_based.
sql/events.cc:
  Use member function to read current_stmt_binlog_row_based.
sql/ha_ndbcluster_binlog.cc:
  reset_current_stmt_binlog_row_based() is not a no-op for the ndb_binlog
  thread any more. Instead, the ndb_binlog thread now forces row mode both
  initially and just after calling mysql_parse.  (mysql_parse() is the only
  place where reset_current_stmt_binlog_row_based() may be called from
  the ndb_binlog thread, so these are the only two places that need to
  change.)
sql/ha_partition.cc:
  Use member function to read current_stmt_binlog_row_based.
sql/handler.cc:
  Use member function to read current_stmt_binlog_row_based.
sql/item_create.cc:
  Added DBUG_ENTER to some functions, to be able to trace when
  set_stmt_unsafe is called.
sql/log.cc:
  Use member function to read current_stmt_binlog_row_based.
sql/log_event.cc:
   - Moved logic for changing to row format out of do_apply_event (and into
     decide_logging_format).
   - Added @todo comment for post-push cleanup.
sql/log_event_old.cc:
  Move logic for changing to row format out of do_apply_event (and into
  decide_logging_format).
sql/mysql_priv.h:
  Make decide_logging_format() a member of the THD class, for two reasons:
   - It is natural from an object-oriented perspective.
   - decide_logging_format() needs to access private members of THD
     (specifically, the new binlog_warning_flags field).
sql/rpl_injector.cc:
  Removed call to set_current_stmt_binlog_row_based().
  From now on, only decide_logging_fromat is allowed to modify
  current_stmt_binlog_row_based. This call is from the ndb_binlog
  thread, mostly executing code in ha_ndbcluster_binlog.cc.
  This call can be safely removed, because:
   - current_stmt_binlog_row_based is initialized for the ndb_binlog
     thread's THD object when the THD object is created. So we're
     not going to read uninitialized memory.
   - The behavior of ndb_binlog thread does not use the state of the
     current_stmt_binlog_row_based. It is conceivable that the
     ndb_binlog thread would rely on the current_stmt_binlog_format
     in two situations:
      (1) when it calls mysql_parse;
      (2) when it calls THD::binlog_query.
     In case (1), it always clears THD::options&OPTION_BIN_LOG (because
     run_query() in ha_ndbcluster_binlog.cc is only called with
     disable_binlogging = TRUE).
     In case (2), it always uses qtype=STMT_QUERY_TYPE.
sql/set_var.cc:
  Added @todo comment for post-push cleanup.
sql/share/errmsg.txt:
  Added new error messages and clarified ER_BINLOG_UNSAFE_STATEMENT.
sql/sp.cc:
  Added DBUG_ENTER, to be able to trace when set_stmt_unsafe is called.
  Got rid of MYSQL_QUERY_TYPE: it was equivalent to STMT_QUERY_TYPE.
sql/sp_head.cc:
  Use member function to read current_stmt_binlog_row_based.
sql/sp_head.h:
  Added DBUG_ENTER, to be able to trace when set_stmt_unsafe is called.
sql/sql_acl.cc:
  Got rid of MYSQL_QUERY_TYPE: it was equivalent to STMT_QUERY_TYPE.
sql/sql_base.cc:
   - Made decide_logging_format take care of all logic for deciding the
     logging format, and for determining the related warnings and errors.
     See comment above decide_logging_format for details.
   - Made decide_logging_format a member function of THD, since it needs
     to access private members of THD and since its purpose is to update
     the state of a THD object.
   - Added DBUG_ENTER, to be able to trace when set_stmt_unsafe is called.
sql/sql_class.cc:
  - Moved logic for determining unsafe warnings away from THD::binlog_query
    (and into decide_logging_format()). Now, it works like this:
    1. decide_logging_format detects that the current statement shall
       produce a warning, if it ever makes it to the binlog
    2. decide_logging_format sets a flag of THD::binlog_warning_flags.
    3. THD::binlog_query reads the flag. If the flag is set, it generates
       a warning.
  - Use member function to read current_stmt_binlog_row_based.
sql/sql_class.h:
  - Added THD::binlog_warning_flags (see sql_class.cc for explanation).
  - Made decide_logging_format() and reset_for_next_command() member
    functions of THD (instead of standalone functions). This was needed
    for two reasons: (1) the functions need to access the private member
    THD::binlog_warning_flags; (2) the purpose of these functions is to
    update the staet of a THD object, so from an object-oriented point
    of view they should be member functions.
  - Encapsulated current_stmt_binlog_row_based, so it is now private and
    can only be accessed from a member function. Also changed the
    data type to an enumeration instead of a bool.
  - Removed MYSQL_QUERY_TYPE, because it was equivalent to
    STMT_QUERY_TYPE anyways.
  - When reset_current_stmt_binlog_row_based was called from the
    ndb_binlog thread, it would behave as a no-op. This special
    case has been removed, and the behavior of
    reset_current_stmt_binlog_row_based does not depend on which thread
    calls it any more. The special case did not serve any purpose,
    since the ndb binlog thread did not take the
    current_stmt_binlog_row_based flag into account anyways.
sql/sql_delete.cc:
  - Moved logic for setting row format for DELETE..LIMIT away from
    mysql_prepare_delete.
    (Instead, we mark the statement as unsafe at parse time (sql_yacc.yy)
    and rely on decide_logging_format() (sql_class.cc) to set row format.)
    This is part of the fix for BUG#45831.
  - Use member function to read current_stmt_binlog_row_based.
sql/sql_insert.cc:
   - Removed unnecessary calls to thd->lex->set_stmt_unsafe() and
     thd->set_current_stmt_binlog_row_based_if_mixed() from
     handle_delayed_insert(). The calls are unnecessary because they
     have already been made; they were made in the constructor of
     the `di' object.
   - Since decide_logging_format() is now a member function of THD, code
     that calls decide_logging_format() had to be updated.
   - Added DBUG_ENTER call, to be able to trace when set_stmt_unsafe is
     called.
   - Moved call to set_stmt_unsafe() for INSERT..SELECT..LIMIT away from
     mysql_insert_select_prepare() (and into decide_logging_format).
     This is part of the fix for BUG#45831.
   - Use member function to read current_stmt_binlog_row_based.
sql/sql_lex.h:
   - Added the flag BINLOG_STMT_FLAG_ROW_INJECTION to enum_binlog_stmt_flag.
     This was necessary so that a statement can identify itself as a row
     injection.
   - Added appropriate setter and getter functions for the new flag.
   - Added or clarified some comments.
   - Added DBUG_ENTER()
sql/sql_load.cc:
  Use member function to read current_stmt_binlog_row_based.
sql/sql_parse.cc:
   - Made mysql_reset_thd_for_next_command() clear thd->binlog_warning_flags.
   - Since thd->binlog_warning_flags is private, it must be set in a
     member function of THD. Hence, moved the body of
     mysql_reset_thd_for_next_command() to the new member function
     THD::reset_thd_for_next_command(), and made
     mysql_reset_thd_for_next_command() call
     THD::reset_thd_for_next_command().
   - Removed confusing comment.
   - Use member function to read current_stmt_binlog_row_based.
sql/sql_repl.cc:
  Use member function to read current_stmt_binlog_row_based.
sql/sql_table.cc:
  Use member function to read current_stmt_binlog_row_based.
sql/sql_udf.cc:
  Use member function to read current_stmt_binlog_row_based.
sql/sql_update.cc:
  Moved logic for setting row format for UPDATE..LIMIT away from
  mysql_prepare_update.
  (Instead, we mark the statement as unsafe at parse time (sql_yacc.yy)
  and rely on decide_logging_format() (sql_class.cc) to set row format.)
  This is part of the fix for BUG#45831.
sql/sql_yacc.yy:
  Made INSERT DELAYED, INSERT..SELECT..LIMIT, UPDATE..LIMIT, and
  DELETE..LIMIT mark themselves as unsafe at parse time (instead
  of at execution time).
  This is part of the fixes BUG#45831 and BUG#45825.
storage/example/ha_example.cc:
  Made exampledb accept inserts. This was needed by the new test case
  rpl_ndb_binlog_format_errors, because it needs an engine that
  is statement-only (and accepts inserts).
storage/example/ha_example.h:
  Made exampledb a statement-only engine instead of a row-only engine.
  No existing test relied exampledb's row-only capabilities. The new
  test case rpl_ndb_binlog_format_errors needs an engine that is
  statement-only.
storage/innobase/handler/ha_innodb.cc:
  - Changed error error code and message given by innodb when
    binlog_format=STATEMENT and transaction isolation level is
    READ COMMITTED or READ UNCOMMITTED.
  - While I was here, also simplified the condition for
    checking when to give the error.
---
 sql/log.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'sql/log.cc')

diff --git a/sql/log.cc b/sql/log.cc
index ee7ee48b42c..ef7d5c75f84 100644
--- a/sql/log.cc
+++ b/sql/log.cc
@@ -3734,7 +3734,7 @@ int THD::binlog_write_table_map(TABLE *table, bool is_trans)
                        table->s->table_map_id));
 
   /* Pre-conditions */
-  DBUG_ASSERT(current_stmt_binlog_row_based && mysql_bin_log.is_open());
+  DBUG_ASSERT(is_current_stmt_binlog_format_row() && mysql_bin_log.is_open());
   DBUG_ASSERT(table->s->table_map_id != ULONG_MAX);
 
   Table_map_log_event::flag_set const
@@ -4009,7 +4009,7 @@ bool MYSQL_BIN_LOG::write(Log_event *event_info)
     */
     if (thd)
     {
-      if (!thd->current_stmt_binlog_row_based)
+      if (!thd->is_current_stmt_binlog_format_row())
       {
         if (thd->stmt_depends_on_first_successful_insert_id_in_prev_stmt)
         {
-- 
cgit v1.2.1


From 19c380aaff1f1f3c0d21ac0c18904c21d7bdce76 Mon Sep 17 00:00:00 2001
From: Alfranio Correia <alfranio.correia@sun.com>
Date: Tue, 3 Nov 2009 19:02:56 +0000
Subject: WL#2687 WL#5072 BUG#40278 BUG#47175

Non-transactional updates that take place inside a transaction present problems
for logging because they are visible to other clients before the transaction
is committed, and they are not rolled back even if the transaction is rolled
back. It is not always possible to log correctly in statement format when both
transactional and non-transactional tables are used in the same transaction.

In the current patch, we ensure that such scenario is completely safe under the
ROW and MIXED modes.
---
 sql/log.cc | 1041 +++++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 603 insertions(+), 438 deletions(-)

(limited to 'sql/log.cc')

diff --git a/sql/log.cc b/sql/log.cc
index 491030aca34..fa063196e14 100644
--- a/sql/log.cc
+++ b/sql/log.cc
@@ -148,112 +148,155 @@ private:
 };
 
 /*
-  Helper class to store binary log transaction data.
+  Helper classes to store non-transactional and transactional data
+  before copying it to the binary log.
 */
-class binlog_trx_data {
+class binlog_cache_data
+{
 public:
-  binlog_trx_data()
-    : at_least_one_stmt(0), incident(FALSE), m_pending(0),
-    before_stmt_pos(MY_OFF_T_UNDEF)
+  binlog_cache_data(): m_pending(0), before_stmt_pos (MY_OFF_T_UNDEF),
+  incident(FALSE)
   {
-    trans_log.end_of_file= max_binlog_cache_size;
+    cache_log.end_of_file= max_binlog_cache_size;
   }
 
-  ~binlog_trx_data()
+  ~binlog_cache_data()
   {
-    DBUG_ASSERT(pending() == NULL);
-    close_cached_file(&trans_log);
+    DBUG_ASSERT(empty());
+    close_cached_file(&cache_log);
   }
 
-  my_off_t position() const {
-    return my_b_tell(&trans_log);
+  bool empty() const
+  {
+    return pending() == NULL && my_b_tell(&cache_log) == 0;
   }
 
-  bool empty() const
+  Rows_log_event *pending() const
   {
-    return pending() == NULL && my_b_tell(&trans_log) == 0;
+    return m_pending;
   }
 
-  /*
-    Truncate the transaction cache to a certain position. This
-    includes deleting the pending event.
-   */
-  void truncate(my_off_t pos)
+  void set_pending(Rows_log_event *const pending)
   {
-    DBUG_PRINT("info", ("truncating to position %lu", (ulong) pos));
-    DBUG_PRINT("info", ("before_stmt_pos=%lu", (ulong) pos));
-    delete pending();
-    set_pending(0);
-    reinit_io_cache(&trans_log, WRITE_CACHE, pos, 0, 0);
-    trans_log.end_of_file= max_binlog_cache_size;
-    if (pos < before_stmt_pos)
-      before_stmt_pos= MY_OFF_T_UNDEF;
+    m_pending= pending;
+  }
 
-    /*
-      The only valid positions that can be truncated to are at the
-      beginning of a statement. We are relying on this fact to be able
-      to set the at_least_one_stmt flag correctly. In other word, if
-      we are truncating to the beginning of the transaction cache,
-      there will be no statements in the cache, otherwhise, we will
-      have at least one statement in the transaction cache.
-     */
-    at_least_one_stmt= (pos > 0);
+  void set_incident(void)
+  {
+    incident= TRUE;
+  }
+  
+  bool has_incident(void)
+  {
+    return(incident);
   }
 
-  /*
-    Reset the entire contents of the transaction cache, emptying it
-    completely.
-   */
-  void reset() {
-    if (!empty())
-      truncate(0);
-    before_stmt_pos= MY_OFF_T_UNDEF;
+  void reset()
+  {
+    truncate(0);
     incident= FALSE;
-    trans_log.end_of_file= max_binlog_cache_size;
+    before_stmt_pos= MY_OFF_T_UNDEF;
+    cache_log.end_of_file= max_binlog_cache_size;
     DBUG_ASSERT(empty());
   }
 
-  Rows_log_event *pending() const
+  my_off_t get_byte_position() const
   {
-    return m_pending;
+    return my_b_tell(&cache_log);
   }
 
-  void set_pending(Rows_log_event *const pending)
+  my_off_t get_prev_position()
   {
-    m_pending= pending;
+     return(before_stmt_pos);
   }
 
-  IO_CACHE trans_log;                         // The transaction cache
-
-  void set_incident(void)
+  void set_prev_position(my_off_t pos)
   {
-    incident= TRUE;
+     before_stmt_pos= pos;
   }
   
-  bool has_incident(void)
+  void restore_prev_position()
   {
-    return(incident);
+    truncate(before_stmt_pos);
+  }
+
+  void restore_savepoint(my_off_t pos)
+  {
+    truncate(pos);
+    if (pos < before_stmt_pos)
+      before_stmt_pos= MY_OFF_T_UNDEF;
   }
 
-  /**
-    Boolean that is true if there is at least one statement in the
-    transaction cache.
+  /*
+    Cache to store data before copying it to the binary log.
   */
-  bool at_least_one_stmt;
-  bool incident;
+  IO_CACHE cache_log;
 
 private:
   /*
-    Pending binrows event. This event is the event where the rows are
-    currently written.
+    Pending binrows event. This event is the event where the rows are currently
+    written.
    */
   Rows_log_event *m_pending;
 
-public:
   /*
     Binlog position before the start of the current statement.
   */
   my_off_t before_stmt_pos;
+ 
+  /*
+    This indicates that some events did not get into the cache and most likely
+    it is corrupted.
+  */ 
+  bool incident;
+
+  /*
+    It truncates the cache to a certain position. This includes deleting the
+    pending event.
+   */
+  void truncate(my_off_t pos)
+  {
+    DBUG_PRINT("info", ("truncating to position %lu", (ulong) pos));
+    if (pending())
+    {
+      delete pending();
+      set_pending(0);
+    }
+    reinit_io_cache(&cache_log, WRITE_CACHE, pos, 0, 0);
+    cache_log.end_of_file= max_binlog_cache_size;
+  }
+ 
+  binlog_cache_data& operator=(const binlog_cache_data& info);
+  binlog_cache_data(const binlog_cache_data& info);
+};
+
+class binlog_cache_mngr {
+public:
+  binlog_cache_mngr() {}
+
+  void reset_cache(binlog_cache_data* cache_data)
+  {
+    cache_data->reset();
+  }
+
+  binlog_cache_data* get_binlog_cache_data(bool is_transactional)
+  {
+    return (is_transactional ? &trx_cache : &stmt_cache);
+  }
+
+  IO_CACHE* get_binlog_cache_log(bool is_transactional)
+  {
+    return (is_transactional ? &trx_cache.cache_log : &stmt_cache.cache_log);
+  }
+
+  binlog_cache_data stmt_cache;
+
+  binlog_cache_data trx_cache;
+
+private:
+
+  binlog_cache_mngr& operator=(const binlog_cache_mngr& info);
+  binlog_cache_mngr(const binlog_cache_mngr& info);
 };
 
 handlerton *binlog_hton;
@@ -1265,26 +1308,6 @@ int LOGGER::set_handlers(uint error_log_printer,
   return 0;
 }
 
-/** 
-    This function checks if a transactional talbe was updated by the
-    current statement.
-
-    @param thd The client thread that executed the current statement.
-    @return
-      @c true if a transactional table was updated, @false otherwise.
-*/
-static bool stmt_has_updated_trans_table(THD *thd)
-{
-  Ha_trx_info *ha_info;
-
-  for (ha_info= thd->transaction.stmt.ha_list; ha_info; ha_info= ha_info->next())
-  {
-    if (ha_info->is_trx_read_write() && ha_info->ht() != binlog_hton)
-      return (TRUE);
-  }
-  return (FALSE);
-}
-
  /*
   Save position of binary log transaction cache.
 
@@ -1307,10 +1330,10 @@ binlog_trans_log_savepos(THD *thd, my_off_t *pos)
   DBUG_ASSERT(pos != NULL);
   if (thd_get_ha_data(thd, binlog_hton) == NULL)
     thd->binlog_setup_trx_data();
-  binlog_trx_data *const trx_data=
-    (binlog_trx_data*) thd_get_ha_data(thd, binlog_hton);
+  binlog_cache_mngr *const cache_mngr=
+    (binlog_cache_mngr*) thd_get_ha_data(thd, binlog_hton);
   DBUG_ASSERT(mysql_bin_log.is_open());
-  *pos= trx_data->position();
+  *pos= cache_mngr->trx_cache.get_byte_position();
   DBUG_PRINT("return", ("*pos: %lu", (ulong) *pos));
   DBUG_VOID_RETURN;
 }
@@ -1341,9 +1364,9 @@ binlog_trans_log_truncate(THD *thd, my_off_t pos)
   /* Only true if binlog_trans_log_savepos() wasn't called before */
   DBUG_ASSERT(pos != ~(my_off_t) 0);
 
-  binlog_trx_data *const trx_data=
-    (binlog_trx_data*) thd_get_ha_data(thd, binlog_hton);
-  trx_data->truncate(pos);
+  binlog_cache_mngr *const cache_mngr=
+    (binlog_cache_mngr*) thd_get_ha_data(thd, binlog_hton);
+  cache_mngr->trx_cache.restore_savepoint(pos);
   DBUG_VOID_RETURN;
 }
 
@@ -1372,115 +1395,127 @@ int binlog_init(void *p)
 
 static int binlog_close_connection(handlerton *hton, THD *thd)
 {
-  binlog_trx_data *const trx_data=
-    (binlog_trx_data*) thd_get_ha_data(thd, binlog_hton);
-  DBUG_ASSERT(trx_data->empty());
+  binlog_cache_mngr *const cache_mngr=
+    (binlog_cache_mngr*) thd_get_ha_data(thd, binlog_hton);
+  DBUG_ASSERT(cache_mngr->trx_cache.empty() && cache_mngr->stmt_cache.empty());
   thd_set_ha_data(thd, binlog_hton, NULL);
-  trx_data->~binlog_trx_data();
-  my_free((uchar*)trx_data, MYF(0));
+  cache_mngr->~binlog_cache_mngr();
+  my_free((uchar*)cache_mngr, MYF(0));
   return 0;
 }
 
-/*
-  End a transaction.
+/**
+  This function flushes a transactional cache upon commit/rollback.
 
-  SYNOPSIS
-    binlog_end_trans()
+  @param thd        The thread whose transaction should be flushed
+  @param cache_mngr Pointer to the cache data to be flushed
+  @param end_ev     The end event either commit/rollback.
 
-    thd      The thread whose transaction should be ended
-    trx_data Pointer to the transaction data to use
-    end_ev   The end event to use, or NULL
-    all      True if the entire transaction should be ended, false if
-             only the statement transaction should be ended.
+  @return
+    nonzero if an error pops up when flushing the transactional cache.
+*/
+static int
+binlog_flush_trx_cache(THD *thd, binlog_cache_mngr *cache_mngr,
+                       Log_event *end_ev)
+{
+  DBUG_ENTER("binlog_flush_trx_cache");
+  int error=0;
+  IO_CACHE *cache_log= &cache_mngr->trx_cache.cache_log;
 
-  DESCRIPTION
+  /*
+    This function handles transactional changes and as such
+    this flag equals to true.
+  */
+  bool const is_transactional= TRUE;
 
-    End the currently open transaction. The transaction can be either
-    a real transaction (if 'all' is true) or a statement transaction
-    (if 'all' is false).
+  if (thd->binlog_flush_pending_rows_event(TRUE, is_transactional))
+    DBUG_RETURN(1);
+  /*
+    Doing a commit or a rollback including non-transactional tables,
+    i.e., ending a transaction where we might write the transaction
+    cache to the binary log.
+
+    We can always end the statement when ending a transaction since
+    transactions are not allowed inside stored functions. If they
+    were, we would have to ensure that we're not ending a statement
+    inside a stored function.
+  */
+  error= mysql_bin_log.write(thd, &cache_mngr->trx_cache.cache_log, end_ev,
+                             cache_mngr->trx_cache.has_incident());
+  cache_mngr->reset_cache(&cache_mngr->trx_cache);
 
-    If 'end_ev' is NULL, the transaction is a rollback of only
-    transactional tables, so the transaction cache will be truncated
-    to either just before the last opened statement transaction (if
-    'all' is false), or reset completely (if 'all' is true).
- */
+  /*
+    We need to step the table map version after writing the
+    transaction cache to disk.
+  */
+  mysql_bin_log.update_table_map_version();
+  statistic_increment(binlog_cache_use, &LOCK_status);
+  if (cache_log->disk_writes != 0)
+  {
+    statistic_increment(binlog_cache_disk_use, &LOCK_status);
+    cache_log->disk_writes= 0;
+  }
+
+  DBUG_ASSERT(cache_mngr->trx_cache.empty());
+  DBUG_RETURN(error);
+}
+
+/**
+  This function truncates the transactional cache upon committing or rolling
+  back either a transaction or a statement.
+
+  @param thd        The thread whose transaction should be flushed
+  @param cache_mngr Pointer to the cache data to be flushed
+  @param all        @c true means truncate the transaction, otherwise the
+                    statement must be truncated.
+
+  @return
+    nonzero if an error pops up when truncating the transactional cache.
+*/
 static int
-binlog_end_trans(THD *thd, binlog_trx_data *trx_data,
-                 Log_event *end_ev, bool all)
+binlog_truncate_trx_cache(THD *thd, binlog_cache_mngr *cache_mngr, bool all)
 {
-  DBUG_ENTER("binlog_end_trans");
+  DBUG_ENTER("binlog_truncate_trx_cache");
   int error=0;
-  IO_CACHE *trans_log= &trx_data->trans_log;
-  DBUG_PRINT("enter", ("transaction: %s  end_ev: 0x%lx",
-                       all ? "all" : "stmt", (long) end_ev));
-  DBUG_PRINT("info", ("thd->options={ %s%s}",
-                      FLAGSTR(thd->options, OPTION_NOT_AUTOCOMMIT),
-                      FLAGSTR(thd->options, OPTION_BEGIN)));
+  /*
+    This function handles transactional changes and as such this flag
+    equals to true.
+  */
+  bool const is_transactional= TRUE;
 
+  DBUG_PRINT("info", ("thd->options={ %s%s}, transaction: %s",
+                      FLAGSTR(thd->options, OPTION_NOT_AUTOCOMMIT),
+                      FLAGSTR(thd->options, OPTION_BEGIN),
+                      all ? "all" : "stmt"));
   /*
-    NULL denotes ROLLBACK with nothing to replicate: i.e., rollback of
-    only transactional tables.  If the transaction contain changes to
-    any non-transactiona tables, we need write the transaction and log
-    a ROLLBACK last.
+    If rolling back an entire transaction or a single statement not
+    inside a transaction, we reset the transaction cache.
   */
-  if (end_ev != NULL)
+  thd->binlog_remove_pending_rows_event(TRUE, is_transactional);
+  if (all || !thd->in_multi_stmt_transaction())
   {
-    if (thd->binlog_flush_pending_rows_event(TRUE))
-      DBUG_RETURN(1);
-    /*
-      Doing a commit or a rollback including non-transactional tables,
-      i.e., ending a transaction where we might write the transaction
-      cache to the binary log.
-
-      We can always end the statement when ending a transaction since
-      transactions are not allowed inside stored functions.  If they
-      were, we would have to ensure that we're not ending a statement
-      inside a stored function.
-     */
-    error= mysql_bin_log.write(thd, &trx_data->trans_log, end_ev,
-                               trx_data->has_incident());
-    trx_data->reset();
+    if (cache_mngr->trx_cache.has_incident())
+      error= mysql_bin_log.write_incident(thd, TRUE);
 
-    /*
-      We need to step the table map version after writing the
-      transaction cache to disk.
-    */
-    mysql_bin_log.update_table_map_version();
-    statistic_increment(binlog_cache_use, &LOCK_status);
-    if (trans_log->disk_writes != 0)
-    {
-      statistic_increment(binlog_cache_disk_use, &LOCK_status);
-      trans_log->disk_writes= 0;
-    }
+    cache_mngr->reset_cache(&cache_mngr->trx_cache);
+
+    thd->clear_binlog_table_maps();
   }
+  /*
+    If rolling back a statement in a transaction, we truncate the
+    transaction cache to remove the statement.
+  */
   else
-  {
-    /*
-      If rolling back an entire transaction or a single statement not
-      inside a transaction, we reset the transaction cache.
+      cache_mngr->trx_cache.restore_prev_position();
 
-      If rolling back a statement in a transaction, we truncate the
-      transaction cache to remove the statement.
-     */
-    thd->binlog_remove_pending_rows_event(TRUE);
-    if (all || !(thd->options & (OPTION_BEGIN | OPTION_NOT_AUTOCOMMIT)))
-    {
-      if (trx_data->has_incident())
-        mysql_bin_log.write_incident(thd, TRUE);
-      trx_data->reset();
-    }
-    else                                        // ...statement
-      trx_data->truncate(trx_data->before_stmt_pos);
-
-    /*
-      We need to step the table map version on a rollback to ensure
-      that a new table map event is generated instead of the one that
-      was written to the thrown-away transaction cache.
-    */
-    mysql_bin_log.update_table_map_version();
-  }
+  /*
+    We need to step the table map version on a rollback to ensure that a new
+    table map event is generated instead of the one that was written to the
+    thrown-away transaction cache.
+  */
+  mysql_bin_log.update_table_map_version();
 
-  DBUG_ASSERT(thd->binlog_get_pending_rows_event() == NULL);
+  DBUG_ASSERT(thd->binlog_get_pending_rows_event(is_transactional) == NULL);
   DBUG_RETURN(error);
 }
 
@@ -1495,11 +1530,57 @@ static int binlog_prepare(handlerton *hton, THD *thd, bool all)
   return 0;
 }
 
+/**
+  This function flushes the non-transactional to the binary log upon
+  committing or rolling back a statement.
+
+  @param thd        The thread whose transaction should be flushed
+  @param cache_mngr Pointer to the cache data to be flushed
+
+  @return
+    nonzero if an error pops up when flushing the non-transactional cache.
+*/
+static int
+binlog_flush_stmt_cache(THD *thd, binlog_cache_mngr *cache_mngr)
+{
+  int error= 0;
+  DBUG_ENTER("binlog_flush_stmt_cache");
+  /*
+    If we are flushing the statement cache, it means that the changes get
+    through otherwise the cache is empty and this routine should not be called.
+  */
+  DBUG_ASSERT(cache_mngr->stmt_cache.has_incident() == FALSE);
+  /*
+    This function handles non-transactional changes and as such this flag equals
+    to false.
+  */
+  bool const is_transactional= FALSE;
+  IO_CACHE *cache_log= &cache_mngr->stmt_cache.cache_log;
+  thd->binlog_flush_pending_rows_event(TRUE, is_transactional);
+  Query_log_event qev(thd, STRING_WITH_LEN("COMMIT"), TRUE, FALSE, TRUE, 0);
+  if ((error= mysql_bin_log.write(thd, cache_log, &qev,
+                                  cache_mngr->stmt_cache.has_incident())))
+    DBUG_RETURN(error);
+  cache_mngr->reset_cache(&cache_mngr->stmt_cache);
+
+  /*
+    We need to step the table map version after writing the
+    transaction cache to disk.
+  */
+  mysql_bin_log.update_table_map_version();
+  statistic_increment(binlog_cache_use, &LOCK_status);
+  if (cache_log->disk_writes != 0)
+  {
+    statistic_increment(binlog_cache_disk_use, &LOCK_status);
+    cache_log->disk_writes= 0;
+  }
+  DBUG_RETURN(error);
+}
+
 /**
   This function is called once after each statement.
 
-  It has the responsibility to flush the transaction cache to the
-  binlog file on commits.
+  It has the responsibility to flush the caches to the binary log on commits.
 
   @param hton  The binlog handlerton.
   @param thd   The client thread that executes the transaction.
@@ -1512,54 +1593,53 @@ static int binlog_commit(handlerton *hton, THD *thd, bool all)
 {
   int error= 0;
   DBUG_ENTER("binlog_commit");
-  binlog_trx_data *const trx_data=
-    (binlog_trx_data*) thd_get_ha_data(thd, binlog_hton);
+  binlog_cache_mngr *const cache_mngr=
+    (binlog_cache_mngr*) thd_get_ha_data(thd, binlog_hton);
+  bool const in_transaction= thd->in_multi_stmt_transaction();
+
+  DBUG_PRINT("debug",
+             ("all: %d, in_transaction: %s, all.modified_non_trans_table: %s, stmt.modified_non_trans_table: %s",
+              all,
+              YESNO(in_transaction),
+              YESNO(thd->transaction.all.modified_non_trans_table),
+              YESNO(thd->transaction.stmt.modified_non_trans_table)));
+
+  if (!cache_mngr->stmt_cache.empty())
+  {
+    binlog_flush_stmt_cache(thd, cache_mngr);
+  }
 
-  if (trx_data->empty())
+  if (cache_mngr->trx_cache.empty())
   {
-    // we're here because trans_log was flushed in MYSQL_BIN_LOG::log_xid()
-    trx_data->reset();
+    /*
+      we're here because cache_log was flushed in MYSQL_BIN_LOG::log_xid()
+    */
+    cache_mngr->reset_cache(&cache_mngr->trx_cache);
     DBUG_RETURN(0);
   }
 
   /*
     We commit the transaction if:
-
      - We are not in a transaction and committing a statement, or
-
-     - We are in a transaction and a full transaction is committed
-
-    Otherwise, we accumulate the statement
+     - We are in a transaction and a full transaction is committed.
+    Otherwise, we accumulate the changes.
   */
-  ulonglong const in_transaction=
-    thd->options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN);
-  DBUG_PRINT("debug",
-             ("all: %d, empty: %s, in_transaction: %s, all.modified_non_trans_table: %s, stmt.modified_non_trans_table: %s",
-              all,
-              YESNO(trx_data->empty()),
-              YESNO(in_transaction),
-              YESNO(thd->transaction.all.modified_non_trans_table),
-              YESNO(thd->transaction.stmt.modified_non_trans_table)));
   if (!in_transaction || all)
   {
-    Query_log_event qev(thd, STRING_WITH_LEN("COMMIT"), TRUE, TRUE, 0);
-    error= binlog_end_trans(thd, trx_data, &qev, all);
-    goto end;
+    Query_log_event qev(thd, STRING_WITH_LEN("COMMIT"), TRUE, FALSE, TRUE, 0);
+    error= binlog_flush_trx_cache(thd, cache_mngr, &qev);
   }
 
-end:
-  if (!all)
-    trx_data->before_stmt_pos = MY_OFF_T_UNDEF; // part of the stmt commit
-  DBUG_RETURN(error);
-}
+  /*
+    This is part of the stmt rollback.
+  */
+    if (!all)
+    cache_mngr->trx_cache.set_prev_position(MY_OFF_T_UNDEF);
+    DBUG_RETURN(error);
+  }
 
 /**
-  This function is called when a transaction involving a transactional
-  table is rolled back.
-
-  It has the responsibility to flush the transaction cache to the
-  binlog file. However, if the transaction does not involve
-  non-transactional tables, nothing needs to be logged.
+  This function is called when a transaction or a statement is rolled back.
 
   @param hton  The binlog handlerton.
   @param thd   The client thread that executes the transaction.
@@ -1572,18 +1652,38 @@ static int binlog_rollback(handlerton *hton, THD *thd, bool all)
 {
   DBUG_ENTER("binlog_rollback");
   int error=0;
-  binlog_trx_data *const trx_data=
-    (binlog_trx_data*) thd_get_ha_data(thd, binlog_hton);
-
-  if (trx_data->empty()) {
-    trx_data->reset();
-    DBUG_RETURN(0);
-  }
+  binlog_cache_mngr *const cache_mngr=
+    (binlog_cache_mngr*) thd_get_ha_data(thd, binlog_hton);
 
   DBUG_PRINT("debug", ("all: %s, all.modified_non_trans_table: %s, stmt.modified_non_trans_table: %s",
                        YESNO(all),
                        YESNO(thd->transaction.all.modified_non_trans_table),
                        YESNO(thd->transaction.stmt.modified_non_trans_table)));
+
+  /*
+    If an incident event is set we do not flush the content of the statement
+    cache because it may be corrupted.
+  */
+  if (cache_mngr->stmt_cache.has_incident())
+  {
+    mysql_bin_log.write_incident(thd, TRUE);
+    cache_mngr->reset_cache(&cache_mngr->stmt_cache);
+  }
+  else if (!cache_mngr->stmt_cache.empty())
+  {
+    binlog_flush_stmt_cache(thd, cache_mngr);
+  }
+
+  if (cache_mngr->trx_cache.empty()) 
+  {
+    /* 
+      we're here because cache_log was flushed in MYSQL_BIN_LOG::log_xid()
+    */
+    cache_mngr->reset_cache(&cache_mngr->trx_cache);
+    DBUG_RETURN(0);
+  }
+
+
   if (mysql_bin_log.check_write_error(thd))
   {
     /*
@@ -1594,49 +1694,46 @@ static int binlog_rollback(handlerton *hton, THD *thd, bool all)
     */
     DBUG_ASSERT(!all);
     /*
-      We reach this point if either only transactional tables were modified or
-      the effect of a statement that did not get into the binlog needs to be
-      rolled back. In the latter case, if a statement changed non-transactional
-      tables or had the OPTION_KEEP_LOG associated, we write an incident event
-      to the binlog in order to stop slaves and notify users that some changes
-      on the master did not get into the binlog and slaves will be inconsistent.
-      On the other hand, if a statement is transactional, we just safely roll it
-      back.
+      We reach this point if the effect of a statement did not properly get into
+      a cache and need to be rolled back.
     */
-    if ((thd->transaction.stmt.modified_non_trans_table ||
-        (thd->options & OPTION_KEEP_LOG)) &&
-        mysql_bin_log.check_write_error(thd))
-      trx_data->set_incident();
-    error= binlog_end_trans(thd, trx_data, 0, all);
+    error= binlog_truncate_trx_cache(thd, cache_mngr, all);
   }
   else
-  {
-   /*
-      We flush the cache with a rollback, wrapped in a beging/rollback if:
+  {  
+    /*
+      We flush the cache wrapped in a beging/rollback if:
         . aborting a transcation that modified a non-transactional table or;
         . aborting a statement that modified both transactional and
-          non-transctional tables but which is not in the boundaries of any
-          transaction;
+         non-transctional tables but which is not in the boundaries of any
+         transaction;
         . the OPTION_KEEP_LOG is activate.
     */
-    if ((all && thd->transaction.all.modified_non_trans_table) ||
+    if (thd->variables.binlog_format == BINLOG_FORMAT_STMT &&
+        ((all && thd->transaction.all.modified_non_trans_table) ||
         (!all && thd->transaction.stmt.modified_non_trans_table &&
-         !(thd->options & (OPTION_BEGIN | OPTION_NOT_AUTOCOMMIT))) ||
-        ((thd->options & OPTION_KEEP_LOG)))
+        !thd->in_multi_stmt_transaction()) ||
+        (thd->options & OPTION_KEEP_LOG)))
     {
-      Query_log_event qev(thd, STRING_WITH_LEN("ROLLBACK"), TRUE, TRUE, 0);
-      error= binlog_end_trans(thd, trx_data, &qev, all);
+      Query_log_event qev(thd, STRING_WITH_LEN("ROLLBACK"), TRUE, FALSE, TRUE, 0);
+      error= binlog_flush_trx_cache(thd, cache_mngr, &qev);
     }
     /*
       Otherwise, we simply truncate the cache as there is no change on
       non-transactional tables as follows.
     */
-    else if ((all && !thd->transaction.all.modified_non_trans_table) ||
-          (!all && !thd->transaction.stmt.modified_non_trans_table))
-      error= binlog_end_trans(thd, trx_data, 0, all);
+    else if (all || (!all &&
+                     (!thd->transaction.stmt.modified_non_trans_table ||
+                      !thd->in_multi_stmt_transaction() || 
+                      thd->variables.binlog_format != BINLOG_FORMAT_STMT)))
+      error= binlog_truncate_trx_cache(thd, cache_mngr, all);
   }
+
+  /* 
+    This is part of the stmt rollback.
+  */
   if (!all)
-    trx_data->before_stmt_pos = MY_OFF_T_UNDEF; // part of the stmt rollback
+    cache_mngr->trx_cache.set_prev_position(MY_OFF_T_UNDEF); 
   DBUG_RETURN(error);
 }
 
@@ -1712,7 +1809,8 @@ static int binlog_savepoint_set(handlerton *hton, THD *thd, void *sv)
   int errcode= query_error_code(thd, thd->killed == THD::NOT_KILLED);
   int const error=
     thd->binlog_query(THD::STMT_QUERY_TYPE,
-                      thd->query, thd->query_length, TRUE, FALSE, errcode);
+                      thd->query, thd->query_length, TRUE, FALSE, FALSE,
+                      errcode);
   DBUG_RETURN(error);
 }
 
@@ -1731,7 +1829,8 @@ static int binlog_savepoint_rollback(handlerton *hton, THD *thd, void *sv)
     int errcode= query_error_code(thd, thd->killed == THD::NOT_KILLED);
     int error=
       thd->binlog_query(THD::STMT_QUERY_TYPE,
-                        thd->query, thd->query_length, TRUE, FALSE, errcode);
+                        thd->query, thd->query_length, TRUE, FALSE, FALSE,
+                        errcode);
     DBUG_RETURN(error);
   }
   binlog_trans_log_truncate(thd, *(my_off_t*)sv);
@@ -3737,27 +3836,67 @@ bool MYSQL_BIN_LOG::is_query_in_union(THD *thd, query_id_t query_id_param)
 int THD::binlog_setup_trx_data()
 {
   DBUG_ENTER("THD::binlog_setup_trx_data");
-  binlog_trx_data *trx_data=
-    (binlog_trx_data*) thd_get_ha_data(this, binlog_hton);
+  binlog_cache_mngr *cache_mngr=
+    (binlog_cache_mngr*) thd_get_ha_data(this, binlog_hton);
 
-  if (trx_data)
+  if (cache_mngr)
     DBUG_RETURN(0);                             // Already set up
 
-  trx_data= (binlog_trx_data*) my_malloc(sizeof(binlog_trx_data), MYF(MY_ZEROFILL));
-  if (!trx_data ||
-      open_cached_file(&trx_data->trans_log, mysql_tmpdir,
+  cache_mngr= (binlog_cache_mngr*) my_malloc(sizeof(binlog_cache_mngr), MYF(MY_ZEROFILL));
+  if (!cache_mngr ||
+      open_cached_file(&cache_mngr->stmt_cache.cache_log, mysql_tmpdir,
+                       LOG_PREFIX, binlog_cache_size, MYF(MY_WME)) ||
+      open_cached_file(&cache_mngr->trx_cache.cache_log, mysql_tmpdir,
                        LOG_PREFIX, binlog_cache_size, MYF(MY_WME)))
   {
-    my_free((uchar*)trx_data, MYF(MY_ALLOW_ZERO_PTR));
+    my_free((uchar*)cache_mngr, MYF(MY_ALLOW_ZERO_PTR));
     DBUG_RETURN(1);                      // Didn't manage to set it up
   }
-  thd_set_ha_data(this, binlog_hton, trx_data);
+  thd_set_ha_data(this, binlog_hton, cache_mngr);
 
-  trx_data= new (thd_get_ha_data(this, binlog_hton)) binlog_trx_data;
+  cache_mngr= new (thd_get_ha_data(this, binlog_hton)) binlog_cache_mngr;
 
   DBUG_RETURN(0);
 }
 
+/** 
+    This function checks if a transactional talbe was updated by the
+    current transaction.
+
+    @param thd The client thread that executed the current statement.
+    @return
+      @c true if a transactional table was updated, @false otherwise.
+*/
+bool
+trans_has_updated_trans_table(THD* thd)
+{
+  binlog_cache_mngr *const cache_mngr=
+    (binlog_cache_mngr*) thd_get_ha_data(thd, binlog_hton);
+
+  return (cache_mngr ? my_b_tell (&cache_mngr->trx_cache.cache_log) : 0);
+}
+
+/** 
+    This function checks if a transactional talbe was updated by the
+    current statement.
+
+    @param thd The client thread that executed the current statement.
+    @return
+      @c true if a transactional table was updated, @false otherwise.
+*/
+bool
+stmt_has_updated_trans_table(THD *thd)
+{
+  Ha_trx_info *ha_info;
+
+  for (ha_info= thd->transaction.stmt.ha_list; ha_info; ha_info= ha_info->next())
+  {
+    if (ha_info->is_trx_read_write() && ha_info->ht() != binlog_hton)
+      return (TRUE);
+  }
+  return (FALSE);
+}
+
 /*
   Function to start a statement and optionally a transaction for the
   binary log.
@@ -3771,11 +3910,10 @@ int THD::binlog_setup_trx_data()
     - Start a transaction if not in autocommit mode or if a BEGIN
       statement has been seen.
 
-    - Start a statement transaction to allow us to truncate the binary
-      log.
+    - Start a statement transaction to allow us to truncate the cache.
 
     - Save the currrent binlog position so that we can roll back the
-      statement by truncating the transaction log.
+      statement by truncating the cache.
 
       We only update the saved position if the old one was undefined,
       the reason is that there are some cases (e.g., for CREATE-SELECT)
@@ -3789,15 +3927,15 @@ int THD::binlog_setup_trx_data()
 void
 THD::binlog_start_trans_and_stmt()
 {
-  binlog_trx_data *trx_data= (binlog_trx_data*) thd_get_ha_data(this, binlog_hton);
+  binlog_cache_mngr *cache_mngr= (binlog_cache_mngr*) thd_get_ha_data(this, binlog_hton);
   DBUG_ENTER("binlog_start_trans_and_stmt");
-  DBUG_PRINT("enter", ("trx_data: 0x%lx  trx_data->before_stmt_pos: %lu",
-                       (long) trx_data,
-                       (trx_data ? (ulong) trx_data->before_stmt_pos :
+  DBUG_PRINT("enter", ("cache_mngr: %p  cache_mngr->trx_cache.get_prev_position(): %lu",
+                       cache_mngr,
+                       (cache_mngr ? (ulong) cache_mngr->trx_cache.get_prev_position() :
                         (ulong) 0)));
 
-  if (trx_data == NULL ||
-      trx_data->before_stmt_pos == MY_OFF_T_UNDEF)
+  if (cache_mngr == NULL ||
+      cache_mngr->trx_cache.get_prev_position() == MY_OFF_T_UNDEF)
   {
     this->binlog_set_stmt_begin();
     if (options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN))
@@ -3818,27 +3956,35 @@ THD::binlog_start_trans_and_stmt()
 }
 
 void THD::binlog_set_stmt_begin() {
-  binlog_trx_data *trx_data=
-    (binlog_trx_data*) thd_get_ha_data(this, binlog_hton);
+  binlog_cache_mngr *cache_mngr=
+    (binlog_cache_mngr*) thd_get_ha_data(this, binlog_hton);
 
   /*
-    The call to binlog_trans_log_savepos() might create the trx_data
+    The call to binlog_trans_log_savepos() might create the cache_mngr
     structure, if it didn't exist before, so we save the position
     into an auto variable and then write it into the transaction
-    data for the binary log (i.e., trx_data).
+    data for the binary log (i.e., cache_mngr).
   */
   my_off_t pos= 0;
   binlog_trans_log_savepos(this, &pos);
-  trx_data= (binlog_trx_data*) thd_get_ha_data(this, binlog_hton);
-  trx_data->before_stmt_pos= pos;
+  cache_mngr= (binlog_cache_mngr*) thd_get_ha_data(this, binlog_hton);
+  cache_mngr->trx_cache.set_prev_position(pos);
 }
 
 
-/*
-  Write a table map to the binary log.
- */
-
-int THD::binlog_write_table_map(TABLE *table, bool is_trans)
+/**
+  This function writes a table map to the binary log. 
+  Note that in order to keep the signature uniform with related methods,
+  we use a redundant parameter to indicate whether a transactional table
+  was changed or not.
+ 
+  @param table             a pointer to the table.
+  @param is_transactional  @c true indicates a transactional table,
+                           otherwise @c false a non-transactional.
+  @return
+    nonzero if an error pops up when writing the table map event.
+*/
+int THD::binlog_write_table_map(TABLE *table, bool is_transactional)
 {
   int error;
   DBUG_ENTER("THD::binlog_write_table_map");
@@ -3854,12 +4000,17 @@ int THD::binlog_write_table_map(TABLE *table, bool is_trans)
     flags= Table_map_log_event::TM_NO_FLAGS;
 
   Table_map_log_event
-    the_event(this, table, table->s->table_map_id, is_trans, flags);
+    the_event(this, table, table->s->table_map_id, is_transactional, flags);
 
-  if (is_trans && binlog_table_maps == 0)
+  if (binlog_table_maps == 0)
     binlog_start_trans_and_stmt();
 
-  if ((error= mysql_bin_log.write(&the_event)))
+  binlog_cache_mngr *const cache_mngr=
+    (binlog_cache_mngr*) thd_get_ha_data(this, binlog_hton);
+
+  IO_CACHE *file= cache_mngr->get_binlog_cache_log(is_transactional);
+
+  if ((error= the_event.write(file)))
     DBUG_RETURN(error);
 
   binlog_table_maps++;
@@ -3867,144 +4018,163 @@ int THD::binlog_write_table_map(TABLE *table, bool is_trans)
   DBUG_RETURN(0);
 }
 
+/**
+  This function retrieves a pending row event from a cache which is
+  specified through the parameter @c is_transactional. Respectively, when it
+  is @c true, the pending event is returned from the transactional cache.
+  Otherwise from the non-transactional cache.
+
+  @param is_transactional  @c true indicates a transactional cache,
+                           otherwise @c false a non-transactional.
+  @return
+    The row event if any. 
+*/
 Rows_log_event*
-THD::binlog_get_pending_rows_event() const
+THD::binlog_get_pending_rows_event(bool is_transactional) const
 {
-  binlog_trx_data *const trx_data=
-    (binlog_trx_data*) thd_get_ha_data(this, binlog_hton);
+  Rows_log_event* rows= NULL;
+  binlog_cache_mngr *const cache_mngr=
+    (binlog_cache_mngr*) thd_get_ha_data(this, binlog_hton);
+
   /*
-    This is less than ideal, but here's the story: If there is no
-    trx_data, prepare_pending_rows_event() has never been called
-    (since the trx_data is set up there). In that case, we just return
-    NULL.
+    This is less than ideal, but here's the story: If there is no cache_mngr,
+    prepare_pending_rows_event() has never been called (since the cache_mngr
+    is set up there). In that case, we just return NULL.
    */
-  return trx_data ? trx_data->pending() : NULL;
+  if (cache_mngr)
+  {
+    binlog_cache_data *cache_data=
+      cache_mngr->get_binlog_cache_data(is_transactional);
+
+    rows= cache_data->pending();
+  }
+  return (rows);
 }
 
+/**
+  This function stores a pending row event into a cache which is specified
+  through the parameter @c is_transactional. Respectively, when it is @c
+  true, the pending event is stored into the transactional cache. Otherwise
+  into the non-transactional cache.
+
+  @param evt               a pointer to the row event.
+  @param is_transactional  @c true indicates a transactional cache,
+                           otherwise @c false a non-transactional.
+*/
 void
-THD::binlog_set_pending_rows_event(Rows_log_event* ev)
+THD::binlog_set_pending_rows_event(Rows_log_event* ev, bool is_transactional)
 {
   if (thd_get_ha_data(this, binlog_hton) == NULL)
     binlog_setup_trx_data();
 
-  binlog_trx_data *const trx_data=
-    (binlog_trx_data*) thd_get_ha_data(this, binlog_hton);
+  binlog_cache_mngr *const cache_mngr=
+    (binlog_cache_mngr*) thd_get_ha_data(this, binlog_hton);
+
+  DBUG_ASSERT(cache_mngr);
+
+  binlog_cache_data *cache_data=
+    cache_mngr->get_binlog_cache_data(is_transactional);
 
-  DBUG_ASSERT(trx_data);
-  trx_data->set_pending(ev);
+  cache_data->set_pending(ev);
 }
 
 
 /**
-  Remove the pending rows event, discarding any outstanding rows.
-
-  If there is no pending rows event available, this is effectively a
+  This function removes the pending rows event, discarding any outstanding
+  rows. If there is no pending rows event available, this is effectively a
   no-op.
- */
+
+  @param thd               a pointer to the user thread.
+  @param is_transactional  @c true indicates a transactional cache,
+                           otherwise @c false a non-transactional.
+*/
 int
-MYSQL_BIN_LOG::remove_pending_rows_event(THD *thd)
+MYSQL_BIN_LOG::remove_pending_rows_event(THD *thd, bool is_transactional)
 {
   DBUG_ENTER("MYSQL_BIN_LOG::remove_pending_rows_event");
 
-  binlog_trx_data *const trx_data=
-    (binlog_trx_data*) thd_get_ha_data(thd, binlog_hton);
+  binlog_cache_mngr *const cache_mngr=
+    (binlog_cache_mngr*) thd_get_ha_data(thd, binlog_hton);
+
+  DBUG_ASSERT(cache_mngr);
 
-  DBUG_ASSERT(trx_data);
+  binlog_cache_data *cache_data=
+    cache_mngr->get_binlog_cache_data(is_transactional);
 
-  if (Rows_log_event* pending= trx_data->pending())
+  if (Rows_log_event* pending= cache_data->pending())
   {
     delete pending;
-    trx_data->set_pending(NULL);
+    cache_data->set_pending(NULL);
   }
 
   DBUG_RETURN(0);
 }
 
 /*
-  Moves the last bunch of rows from the pending Rows event to the binlog
-  (either cached binlog if transaction, or disk binlog). Sets a new pending
-  event.
+  Moves the last bunch of rows from the pending Rows event to a cache (either
+  transactional cache if is_transaction is @c true, or the non-transactional
+  cache otherwise. Sets a new pending event.
+
+  @param thd               a pointer to the user thread.
+  @param evt               a pointer to the row event.
+  @param is_transactional  @c true indicates a transactional cache,
+                           otherwise @c false a non-transactional.
 */
 int
 MYSQL_BIN_LOG::flush_and_set_pending_rows_event(THD *thd,
-                                                Rows_log_event* event)
+                                                Rows_log_event* event,
+                                                bool is_transactional)
 {
   DBUG_ENTER("MYSQL_BIN_LOG::flush_and_set_pending_rows_event(event)");
   DBUG_ASSERT(mysql_bin_log.is_open());
   DBUG_PRINT("enter", ("event: 0x%lx", (long) event));
 
   int error= 0;
+  binlog_cache_mngr *const cache_mngr=
+    (binlog_cache_mngr*) thd_get_ha_data(thd, binlog_hton);
 
-  binlog_trx_data *const trx_data=
-    (binlog_trx_data*) thd_get_ha_data(thd, binlog_hton);
+  DBUG_ASSERT(cache_mngr);
 
-  DBUG_ASSERT(trx_data);
+  binlog_cache_data *cache_data=
+    cache_mngr->get_binlog_cache_data(is_transactional);
 
-  DBUG_PRINT("info", ("trx_data->pending(): 0x%lx", (long) trx_data->pending()));
+  DBUG_PRINT("info", ("cache_mngr->pending(): 0x%lx", (long) cache_data->pending()));
 
-  if (Rows_log_event* pending= trx_data->pending())
+  if (Rows_log_event* pending= cache_data->pending())
   {
-    IO_CACHE *file= &log_file;
+    IO_CACHE *file= &cache_data->cache_log;
 
     /*
-      Decide if we should write to the log file directly or to the
-      transaction log.
-    */
-    if (pending->get_cache_stmt() || my_b_tell(&trx_data->trans_log))
-      file= &trx_data->trans_log;
-
-    /*
-      If we are writing to the log file directly, we could avoid
-      locking the log. This does not work since we need to step the
-      m_table_map_version below, and that change has to be protected
-      by the LOCK_log mutex.
-    */
-    pthread_mutex_lock(&LOCK_log);
-
-    /*
-      Write pending event to log file or transaction cache
+      Write pending event to the cache.
     */
     if (pending->write(file))
     {
-      pthread_mutex_unlock(&LOCK_log);
       set_write_error(thd);
+      if (check_write_error(thd) && cache_data &&
+          thd->transaction.stmt.modified_non_trans_table)
+        cache_data->set_incident();
       DBUG_RETURN(1);
     }
 
     /*
       We step the table map version if we are writing an event
-      representing the end of a statement.  We do this regardless of
-      wheather we write to the transaction cache or to directly to the
-      file.
-
-      In an ideal world, we could avoid stepping the table map version
-      if we were writing to a transaction cache, since we could then
-      reuse the table map that was written earlier in the transaction
-      cache.  This does not work since STMT_END_F implies closing all
-      table mappings on the slave side.
+      representing the end of a statement.
 
+      In an ideal world, we could avoid stepping the table map version,
+      since we could then reuse the table map that was written earlier
+      in the cache. This does not work since STMT_END_F implies closing
+      all table mappings on the slave side.
+    
       TODO: Find a solution so that table maps does not have to be
       written several times within a transaction.
-     */
+    */
     if (pending->get_flags(Rows_log_event::STMT_END_F))
       ++m_table_map_version;
 
     delete pending;
-
-    if (file == &log_file)
-    {
-      error= flush_and_sync(0);
-      if (!error)
-      {
-        signal_update();
-        rotate_and_purge(RP_LOCK_LOG_IS_ALREADY_LOCKED);
-      }
-    }
-
-    pthread_mutex_unlock(&LOCK_log);
   }
 
-  thd->binlog_set_pending_rows_event(event);
+  thd->binlog_set_pending_rows_event(event, is_transactional);
 
   DBUG_RETURN(error);
 }
@@ -4018,6 +4188,7 @@ bool MYSQL_BIN_LOG::write(Log_event *event_info)
   THD *thd= event_info->thd;
   bool error= 1;
   DBUG_ENTER("MYSQL_BIN_LOG::write(Log_event *)");
+  binlog_cache_data *cache_data= 0;
 
   if (thd->binlog_evt_union.do_union)
   {
@@ -4026,27 +4197,22 @@ bool MYSQL_BIN_LOG::write(Log_event *event_info)
       We will log the function call to the binary log on function exit
     */
     thd->binlog_evt_union.unioned_events= TRUE;
-    thd->binlog_evt_union.unioned_events_trans |= event_info->cache_stmt;
+    thd->binlog_evt_union.unioned_events_trans |=
+      event_info->use_trans_cache();
     DBUG_RETURN(0);
   }
 
   /*
-    Flush the pending rows event to the transaction cache or to the
-    log file.  Since this function potentially aquire the LOCK_log
-    mutex, we do this before aquiring the LOCK_log mutex in this
-    function.
-
     We only end the statement if we are in a top-level statement.  If
     we are inside a stored function, we do not end the statement since
     this will close all tables on the slave.
   */
   bool const end_stmt=
     thd->prelocked_mode && thd->lex->requires_prelocking();
-  if (thd->binlog_flush_pending_rows_event(end_stmt))
+  if (thd->binlog_flush_pending_rows_event(end_stmt,
+                                           event_info->use_trans_cache()))
     DBUG_RETURN(error);
 
-  pthread_mutex_lock(&LOCK_log);
-
   /*
      In most cases this is only called if 'is_open()' is true; in fact this is
      mostly called if is_open() *was* true a few instructions before, but it
@@ -4054,7 +4220,6 @@ bool MYSQL_BIN_LOG::write(Log_event *event_info)
   */
   if (likely(is_open()))
   {
-    IO_CACHE *file= &log_file;
 #ifdef HAVE_REPLICATION
     /*
       In the future we need to add to the following if tests like
@@ -4064,63 +4229,67 @@ bool MYSQL_BIN_LOG::write(Log_event *event_info)
     const char *local_db= event_info->get_db();
     if ((thd && !(thd->options & OPTION_BIN_LOG)) ||
 	(!binlog_filter->db_ok(local_db)))
-    {
-      VOID(pthread_mutex_unlock(&LOCK_log));
       DBUG_RETURN(0);
-    }
 #endif /* HAVE_REPLICATION */
 
-#if defined(USING_TRANSACTIONS) 
-    /*
-      Should we write to the binlog cache or to the binlog on disk?
-      Write to the binlog cache if:
-      - it is already not empty (meaning we're in a transaction; note that the
-     present event could be about a non-transactional table, but still we need
-     to write to the binlog cache in that case to handle updates to mixed
-     trans/non-trans table types the best possible in binlogging)
-      - or if the event asks for it (cache_stmt == TRUE).
-    */
-    if (opt_using_transactions && thd)
+#if defined(USING_TRANSACTIONS)
+    IO_CACHE *file= NULL;
+
+    if (event_info->use_direct_logging())
+    {
+      file= &log_file;
+      pthread_mutex_lock(&LOCK_log);
+    }
+    else
     {
       if (thd->binlog_setup_trx_data())
         goto err;
 
-      binlog_trx_data *const trx_data=
-        (binlog_trx_data*) thd_get_ha_data(thd, binlog_hton);
-      IO_CACHE *trans_log= &trx_data->trans_log;
-      my_off_t trans_log_pos= my_b_tell(trans_log);
-      if (event_info->get_cache_stmt() || trans_log_pos != 0 ||
-          stmt_has_updated_trans_table(thd))
+      binlog_cache_mngr *const cache_mngr=
+        (binlog_cache_mngr*) thd_get_ha_data(thd, binlog_hton);
+
+      /*
+        If we are about to use write rows, we just need to check the type of
+        the event (either transactional or non-transactional) in order to
+        choose the cache.
+      */
+      if (thd->is_current_stmt_binlog_format_row())
       {
-        DBUG_PRINT("info", ("Using trans_log: cache: %d, trans_log_pos: %lu",
-                            event_info->get_cache_stmt(),
-                            (ulong) trans_log_pos));
-        thd->binlog_start_trans_and_stmt();
-        file= trans_log;
+        file= cache_mngr->get_binlog_cache_log(event_info->use_trans_cache());
+        cache_data= cache_mngr->get_binlog_cache_data(event_info->use_trans_cache());
       }
       /*
-        TODO as Mats suggested, for all the cases above where we write to
-        trans_log, it sounds unnecessary to lock LOCK_log. We should rather
-        test first if we want to write to trans_log, and if not, lock
-        LOCK_log.
+        However, if we are about to write statements we need to consider other
+        things. We use the non-transactional cache when:
+
+        . the transactional cache is empty which means that there were no
+         early statement on behalf of the transaction.
+        . the respective event is tagged as non-transactional.
       */
+      else if (cache_mngr->trx_cache.empty() &&
+               !event_info->use_trans_cache())
+      {
+        file= &cache_mngr->stmt_cache.cache_log;
+        cache_data=  &cache_mngr->stmt_cache;
+      }
+      else
+      {
+        file= &cache_mngr->trx_cache.cache_log;
+        cache_data= &cache_mngr->trx_cache;
+      }
+
+      thd->binlog_start_trans_and_stmt();
     }
 #endif /* USING_TRANSACTIONS */
     DBUG_PRINT("info",("event type: %d",event_info->get_type_code()));
 
     /*
-      No check for auto events flag here - this write method should
-      never be called if auto-events are enabled
-    */
-
-    /*
-      1. Write first log events which describe the 'run environment'
-      of the SQL command
-    */
+       No check for auto events flag here - this write method should
+       never be called if auto-events are enabled.
 
-    /*
-      If row-based binlogging, Insert_id, Rand and other kind of "setting
-      context" events are not needed.
+       Write first log events which describe the 'run environment'
+       of the SQL command. If row-based binlogging, Insert_id, Rand
+       and other kind of "setting context" events are not needed.
     */
     if (thd)
     {
@@ -4170,39 +4339,48 @@ bool MYSQL_BIN_LOG::write(Log_event *event_info)
     }
 
     /*
-       Write the SQL command
-     */
-
-    if (event_info->write(file) || 
+      Write the event.
+    */
+    if (event_info->write(file) ||
         DBUG_EVALUATE_IF("injecting_fault_writing", 1, 0))
       goto err;
 
-    if (file == &log_file) // we are writing to the real log (disk)
+    error= 0;
+
+err:
+    if (event_info->use_direct_logging())
     {
-      bool synced= 0;
-      if (flush_and_sync(&synced))
-	goto err;
+      if (!error)
+      {
+        bool synced;
+        if ((error= flush_and_sync(&synced)))
+          goto unlock;
 
-      if (RUN_HOOK(binlog_storage, after_flush,
-                   (thd, log_file_name, file->pos_in_file, synced))) {
-        sql_print_error("Failed to run 'after_flush' hooks");
-        goto err;
+        if ((error= RUN_HOOK(binlog_storage, after_flush,
+                 (thd, log_file_name, file->pos_in_file, synced))))
+        {
+          sql_print_error("Failed to run 'after_flush' hooks");
+          goto unlock;
+        }
+        signal_update();
+        rotate_and_purge(RP_LOCK_LOG_IS_ALREADY_LOCKED);
       }
-
-      signal_update();
-      rotate_and_purge(RP_LOCK_LOG_IS_ALREADY_LOCKED);
+unlock:
+      pthread_mutex_unlock(&LOCK_log);
     }
-    error=0;
 
-err:
     if (error)
+    {
       set_write_error(thd);
+      if (check_write_error(thd) && cache_data &&
+        thd->transaction.stmt.modified_non_trans_table)
+        cache_data->set_incident();
+    }
   }
 
   if (event_info->flags & LOG_EVENT_UPDATE_TABLE_MAP_VERSION_F)
     ++m_table_map_version;
 
-  pthread_mutex_unlock(&LOCK_log);
   DBUG_RETURN(error);
 }
 
@@ -4314,7 +4492,7 @@ uint MYSQL_BIN_LOG::next_file_id()
     write_cache()
     cache    Cache to write to the binary log
     lock_log True if the LOCK_log mutex should be aquired, false otherwise
-    sync_log True if the log should be flushed and sync:ed
+    sync_log True if the log should be flushed and synced
 
   DESCRIPTION
     Write the contents of the cache to the binary log. The cache will
@@ -4530,9 +4708,6 @@ bool MYSQL_BIN_LOG::write(THD *thd, IO_CACHE *cache, Log_event *commit_event,
   DBUG_ENTER("MYSQL_BIN_LOG::write(THD *, IO_CACHE *, Log_event *)");
   VOID(pthread_mutex_lock(&LOCK_log));
 
-  /* NULL would represent nothing to replicate after ROLLBACK */
-  DBUG_ASSERT(commit_event != NULL);
-
   DBUG_ASSERT(is_open());
   if (likely(is_open()))                       // Should always be true
   {
@@ -4547,19 +4722,9 @@ bool MYSQL_BIN_LOG::write(THD *thd, IO_CACHE *cache, Log_event *commit_event,
         transaction is either a BEGIN..COMMIT block or a single
         statement in autocommit mode.
       */
-      Query_log_event qinfo(thd, STRING_WITH_LEN("BEGIN"), TRUE, TRUE, 0);
-
-      /*
-        Now this Query_log_event has artificial log_pos 0. It must be
-        adjusted to reflect the real position in the log. Not doing it
-        would confuse the slave: it would prevent this one from
-        knowing where he is in the master's binlog, which would result
-        in wrong positions being shown to the user, MASTER_POS_WAIT
-        undue waiting etc.
-      */
+      Query_log_event qinfo(thd, STRING_WITH_LEN("BEGIN"), TRUE, FALSE, TRUE, 0);
       if (qinfo.write(&log_file))
         goto err;
-
       DBUG_EXECUTE_IF("crash_before_writing_xid",
                       {
                         if ((write_error= write_cache(cache, false, true)))
@@ -5657,13 +5822,13 @@ int TC_LOG_BINLOG::log_xid(THD *thd, my_xid xid)
 {
   DBUG_ENTER("TC_LOG_BINLOG::log");
   Xid_log_event xle(thd, xid);
-  binlog_trx_data *trx_data=
-    (binlog_trx_data*) thd_get_ha_data(thd, binlog_hton);
+  binlog_cache_mngr *cache_mngr=
+    (binlog_cache_mngr*) thd_get_ha_data(thd, binlog_hton);
   /*
     We always commit the entire transaction when writing an XID. Also
     note that the return value is inverted.
    */
-  DBUG_RETURN(!binlog_end_trans(thd, trx_data, &xle, TRUE));
+  DBUG_RETURN(!binlog_flush_trx_cache(thd, cache_mngr, &xle));
 }
 
 void TC_LOG_BINLOG::unlog(ulong cookie, my_xid xid)
-- 
cgit v1.2.1


From ac647f5a3eb12313f981800ac1fd0c562402abcc Mon Sep 17 00:00:00 2001
From: unknown <Dao-Gang.Qu@sun.com>
Date: Thu, 3 Dec 2009 16:59:58 +0800
Subject: WL#5142 FLUSH LOGS should take optional arguments for which log(s) to
 flush

Support for flushing individual logs, so that the user can
selectively flush a subset of the server logs.

Flush of individual logs is done according to the
following syntax:

  FLUSH <log_category> LOGS;

The syntax is extended so that the user is able to flush a
subset of logs:

  FLUSH [log_category LOGS,];

where log_category is one of:
  SLOW
  ERROR
  BINARY
  ENGINE
  GENERAL
  RELAY.


mysql-test/suite/rpl/r/rpl_flush_logs.result:
  Test result for WL#5142.
mysql-test/suite/rpl/t/rpl_flush_logs.test:
  Added the test file to verify if the 'flush individual log'
  statement works fine.
sql/log.cc:
  Added the two functions to flush slow and general log.
sql/sql_parse.cc:
  Added code to flush specified logs against the option.
sql/sql_yacc.yy:
  Added code to parse the 'flush * log' statement syntax and
  set its option to Lex->type.
---
 sql/log.cc | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

(limited to 'sql/log.cc')

diff --git a/sql/log.cc b/sql/log.cc
index 3d863583859..1c95b21f533 100644
--- a/sql/log.cc
+++ b/sql/log.cc
@@ -965,6 +965,54 @@ bool LOGGER::flush_logs(THD *thd)
 }
 
 
+/**
+  Close and reopen the slow log (with locks).
+  
+  @returns FALSE.
+*/
+bool LOGGER::flush_slow_log()
+{
+  /*
+    Now we lock logger, as nobody should be able to use logging routines while
+    log tables are closed
+  */
+  logger.lock_exclusive();
+
+  /* Reopen slow log file */
+  if (opt_slow_log)
+    file_log_handler->get_mysql_slow_log()->reopen_file();
+
+  /* End of log flush */
+  logger.unlock();
+
+  return 0;
+}
+
+
+/**
+  Close and reopen the general log (with locks).
+
+  @returns FALSE.
+*/
+bool LOGGER::flush_general_log()
+{
+  /*
+    Now we lock logger, as nobody should be able to use logging routines while
+    log tables are closed
+  */
+  logger.lock_exclusive();
+
+  /* Reopen general log file */
+  if (opt_log)
+    file_log_handler->get_mysql_log()->reopen_file();
+
+  /* End of log flush */
+  logger.unlock();
+
+  return 0;
+}
+
+
 /*
   Log slow query with all enabled log event handlers
 
-- 
cgit v1.2.1


From 9e980bf79ef0c727c630e79c1bc043c48bc947ee Mon Sep 17 00:00:00 2001
From: Mats Kindahl <mats@sun.com>
Date: Tue, 15 Dec 2009 16:11:44 +0100
Subject: BUG#49618: Field length stored incorrectly in binary log           
 for InnoDB

The class Field_bit_as_char stores the metadata for the
field incorrecly because bytes_in_rec and bit_len are set
to (field_length + 7 ) / 8 and 0 respectively, while
Field_bit has the correct values field_length / 8 and
field_length % 8.

Solved the problem by re-computing the values for the
metadata based on the field_length instead of using the
bytes_in_rec and bit_len variables.

To handle compatibility with old server, a table map
flag was added to indicate that the bit computation is
exact. If the flag is clear, the slave computes the
number of bytes required to store the bit field and
compares that instead, effectively allowing replication
*without conversion* from any field length that require
the same number of bytes to store.

mysql-test/suite/rpl/t/rpl_typeconv_innodb.test:
  Adding test to check compatibility for bit field
  replication when using InnoDB.
sql/field.cc:
  Extending compatible_field_size() with flags from
  table map to allow fields to check master info.
sql/field.h:
  Extending compatible_field_size() with flags from
  table map to allow fields to check master info.
sql/log.cc:
  Removing table map flags since they are not used
  outside table map class.
sql/log_event.cc:
  Removing flags parameter from table map constructor
  since it is not used and does not have to be exposed.
sql/log_event.h:
  Adding flag to denote that bit length for bit field type
  is exact and not potentially rounded to even bytes.
sql/rpl_utility.cc:
  Adding fields to table_def to store table map flags.
sql/rpl_utility.h:
  Removing obsolete comment and adding flags to store
  table map flags from master.
---
 sql/log.cc | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

(limited to 'sql/log.cc')

diff --git a/sql/log.cc b/sql/log.cc
index e8366c47863..f74b3ef858a 100644
--- a/sql/log.cc
+++ b/sql/log.cc
@@ -3846,11 +3846,8 @@ int THD::binlog_write_table_map(TABLE *table, bool is_trans)
   DBUG_ASSERT(current_stmt_binlog_row_based && mysql_bin_log.is_open());
   DBUG_ASSERT(table->s->table_map_id != ULONG_MAX);
 
-  Table_map_log_event::flag_set const
-    flags= Table_map_log_event::TM_NO_FLAGS;
-
   Table_map_log_event
-    the_event(this, table, table->s->table_map_id, is_trans, flags);
+    the_event(this, table, table->s->table_map_id, is_trans);
 
   if (is_trans && binlog_table_maps == 0)
     binlog_start_trans_and_stmt();
-- 
cgit v1.2.1


From 54b2371e92a723ee9d44c282b3b1c5b7baf50f89 Mon Sep 17 00:00:00 2001
From: Alfranio Correia <alfranio.correia@sun.com>
Date: Tue, 5 Jan 2010 16:55:23 +0000
Subject: BUG#50038 Deadlock on flush logs with concurrent DML and RBR

In auto-commit mode, updating both trx and non-trx tables (i.e. issuing a mixed
statement) causes the following sequence of events:

1 - "Flush trx changes" (MYSQL_BIN_LOG::write) - T1:
  1.1 - mutex_lock (&LOCK_log)
  1.2 - mutex_lock (&LOCK_prep_xids)
  1.3 - increase prepared_xids
  1.4 - mutex_unlock (&LOCK_prep_xids)
  1.5 - mutex_unlock (&LOCK_log)

2 - "Flush non-trx changes" (MYSQL_BIN_LOG::write) - T1:
  2.1 - mutex_lock (&LOCK_log)
  2.2 - mutex_unlock (&LOCK_log)

3. "unlog" - T1
  3.1 - mutex_lock (&LOCK_prep_xids)
  3.2 - decrease prepared xids
  3.3 - pthread_cond_signal(&COND_prep_xids);
  3.4 - mutex_unlock (&LOCK_prep_xids)

The "FLUSH logs" command produces the following sequence of events:

1 - "FLUSH logs" command (MYSQL_BIN_LOG::new_file_impl) - user thread:
  1.1 - mutex_lock (&LOCK_log)
  1.2 - mutex_lock (&LOCK_prep_xids)
  1.3 - while (prepared_xids)  pthread_cond_wait(..., &LOCK_prep_xids);
  1.4 - mutex_unlock (&LOCK_prep_xids)
  1.5 - mutex_unlock (&LOCK_log)

A deadlock will arise if T1 flushes the trx changes and thus increases
prepared_xids but before it is able to continue the execution and flush the
non-trx changes, an user thread calls the "FLUSH logs" command and wait that
the prepared_xids is decreased and gets to zero. However, T1 cannot proceed
with the call to "Flush non-trx changes" because it will block in the mutex
"LOCK_log" and by consequence cannot complete the execution and call the
unlog to decrease the prepared_xids.

To fix the problem, we ensure that the non-trx changes are always flushed
before the trx changes.

Note that if you call "Flush non-trx changes" and a concurrent "FLUSH logs" is
issued, the "Flush non-trx changes" may block, but a deadlock will never happen
because the prepared_xids will eventually get to zero. Bottom line, there will
not be any transaction able to increase the prepared_xids because they will
block in the mutex "LOCK_log" (MYSQL_BIN_LOG::write) and those that increased
the prepared_xids will eventually commit and decrease the prepared_xids.
---
 sql/log.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

(limited to 'sql/log.cc')

diff --git a/sql/log.cc b/sql/log.cc
index 55f08978b20..8781fb03031 100644
--- a/sql/log.cc
+++ b/sql/log.cc
@@ -5958,7 +5958,8 @@ int TC_LOG_BINLOG::log_xid(THD *thd, my_xid xid)
     We always commit the entire transaction when writing an XID. Also
     note that the return value is inverted.
    */
-  DBUG_RETURN(!binlog_flush_trx_cache(thd, cache_mngr, &xle));
+  DBUG_RETURN(!binlog_flush_stmt_cache(thd, cache_mngr) &&
+              !binlog_flush_trx_cache(thd, cache_mngr, &xle));
 }
 
 void TC_LOG_BINLOG::unlog(ulong cookie, my_xid xid)
-- 
cgit v1.2.1


From e0e0f9e3d46917fe9b611fc9769e64032c267446 Mon Sep 17 00:00:00 2001
From: Marc Alff <marc.alff@sun.com>
Date: Mon, 11 Jan 2010 18:47:27 -0700
Subject: WL#2360 Performance schema

Part V: performance schema implementation
---
 sql/log.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'sql/log.cc')

diff --git a/sql/log.cc b/sql/log.cc
index 3680398f068..7776b6bfbdc 100644
--- a/sql/log.cc
+++ b/sql/log.cc
@@ -3075,7 +3075,7 @@ bool MYSQL_BIN_LOG::reset_logs(THD* thd)
     thread. If the transaction involved MyISAM tables, it should go
     into binlog even on rollback.
   */
-  pthread_mutex_lock(&LOCK_thread_count);
+  mysql_mutex_lock(&LOCK_thread_count);
 
   /* Save variables so that we can reopen the log */
   save_name=name;
@@ -3168,7 +3168,7 @@ bool MYSQL_BIN_LOG::reset_logs(THD* thd)
 err:
   if (error == 1)
     name= const_cast<char*>(save_name);
-  pthread_mutex_unlock(&LOCK_thread_count);
+  mysql_mutex_unlock(&LOCK_thread_count);
   mysql_mutex_unlock(&LOCK_index);
   mysql_mutex_unlock(&LOCK_log);
   DBUG_RETURN(error);
-- 
cgit v1.2.1