BUG#55263: assert in check_binlog_magic

The procedure for setting up a fake binary log, which changes the relay log files manually, runs twice when mtr is invoked with --repeat=2. On the second run, however, the SQL thread is not reset and the server is not restarted, so stale relay log IO_CACHE metadata from the first run is left around. As a consequence, when the test runs for the second time, the IO_CACHE for the relay log has a nonzero file position, which triggers the assertion.

We fix this by calling my_b_seek before check_binlog_magic in next_event. This prevents the server from asserting when the SQL thread was reading from a hot log (relay.NNNNN), is stopped, is restarted from a previous cold log (relay.MMMMM), and then reaches the hot log relay.NNNNN again. In that case the read coordinates are not reset to zero but still hold the old values.

Additionally, some comments were added to the source code.
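
The failure mode described above can be illustrated with a small standalone sketch. This is not the server code: FakeCache, fake_seek, fake_read and fake_check_magic are hypothetical stand-ins for IO_CACHE, my_b_seek, the cache read routine and check_binlog_magic, and the precondition that the real function asserts is modelled here as a false return so that the sketch can be exercised without aborting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for an IO_CACHE-like reader: file contents plus a
// read position that survives across "SQL thread restarts".
struct FakeCache {
  std::string data;
  std::size_t pos;
};

// A binary/relay log file starts with the 4-byte magic 0xfe 'b' 'i' 'n'.
static const char kMagic[4] = {'\xfe', 'b', 'i', 'n'};

// Stand-in for my_b_seek: move the read position (clamped to the data size).
void fake_seek(FakeCache &c, std::size_t off) {
  c.pos = off < c.data.size() ? off : c.data.size();
}

// Read up to n bytes from the current position; returns the number read.
std::size_t fake_read(FakeCache &c, char *out, std::size_t n) {
  std::size_t avail = c.data.size() - c.pos;
  std::size_t got = n < avail ? n : avail;
  std::memcpy(out, c.data.data() + c.pos, got);
  c.pos += got;
  return got;
}

// Stand-in for check_binlog_magic. The real function asserts that the read
// position is zero; this sketch returns false instead of asserting.
bool fake_check_magic(FakeCache &c) {
  if (c.pos != 0) return false;  // stale position: the real code would assert
  char buf[4];
  if (fake_read(c, buf, 4) != 4) return false;
  return std::memcmp(buf, kMagic, 4) == 0;
}
```

In this sketch, a first fake_check_magic call on a fresh cache succeeds and leaves the position at 4. A second call on the same cache object fails, which mirrors reaching the same hot log again with old coordinates. Calling fake_seek(c, 0) first makes the check succeed again, which is the shape of the fix in this patch.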
author: Luis Soares <luis.soares@oracle.com> 2010-09-24 10:44:53 +0100
committer: Luis Soares <luis.soares@oracle.com> 2010-09-24 10:44:53 +0100
commit: 66a40d0b8ac73ece279d8a91e3da2aaa9f65ef09 (patch)
tree: cdfa97b221f1127cb222fedd3d491f0eb92b7554 /sql/slave.cc
parent: b288324a13452a5c8430af6eb560ba1ba4b73746 (diff)
download: mariadb-git-66a40d0b8ac73ece279d8a91e3da2aaa9f65ef09.tar.gz
1 files changed, 59 insertions, 5 deletions
diff --git a/sql/slave.cc b/sql/slave.cc
index f1e0962e7e8..bd9017e6318 100644
--- a/sql/slave.cc
+++ b/sql/slave.cc
@@ -4349,12 +4349,66 @@ static Log_event* next_event(Relay_log_info* rli)
         DBUG_ASSERT(rli->cur_log_fd == -1);
 
         /*
-          Read pointer has to be at the start since we are the only
-          reader.
-          We must keep the LOCK_log to read the 4 first bytes, as this is a hot
-          log (same as when we call read_log_event() above: for a hot log we
-          take the mutex).
+           When the SQL thread is [stopped and] (re)started the
+           following may happen:
+
+           1. Log was hot at stop time and remains hot at restart
+
+              SQL thread reads again from hot_log (SQL thread was
+              reading from the active log when it was stopped and the
+              very same log is still active on SQL thread restart).
+
+              In this case, my_b_seek is performed on cur_log, while
+              cur_log points to relay_log.get_log_file();
+
+           2. Log was hot at stop time but got cold before restart
+
+              The log was hot when SQL thread stopped, but it is not
+              anymore when the SQL thread restarts.
+
+              In this case, the SQL thread reopens the log, using
+              cache_buf, ie, cur_log points to &cache_buf, and thence
+              its coordinates are reset.
+
+           3. Log was already cold at stop time
+
+              The log was not hot when the SQL thread stopped, and, of
+              course, it will not be hot when it restarts.
+
+              In this case, the SQL thread opens the cold log again,
+              using cache_buf, ie, cur_log points to &cache_buf, and
+              thence its coordinates are reset.
+
+           4. Log was hot at stop time, DBA changes to previous cold
+              log and restarts SQL thread
+
+              The log was hot when the SQL thread was stopped, but the
+              user changed the coordinates of the SQL thread to
+              restart from a previous cold log.
+
+              In this case, at start time, cur_log points to a cold
+              log, opened using &cache_buf as cache, and coordinates
+              are reset. However, as it moves on to the next logs, it
+              will eventually reach the hot log. If the hot log is the
+              same at the time the SQL thread was stopped, then
+              coordinates were not reset - the cur_log will point to
+              relay_log.get_log_file(), and not a freshly opened
+              IO_CACHE through cache_buf. For this reason we need to
+              deploy a my_b_seek before calling check_binlog_magic at
+              this point of the code (see: BUG#55263 for more
+              details).
+          
+          NOTES: 
+            - We must keep the LOCK_log to read the 4 first bytes, as
+              this is a hot log (same as when we call read_log_event()
+              above: for a hot log we take the mutex).
+
+            - Because of scenario #4 above, we need to have a
+              my_b_seek here. Otherwise, we might hit the assertion
+              inside check_binlog_magic.
         */
+
+        my_b_seek(cur_log, (my_off_t) 0);
         if (check_binlog_magic(cur_log,&errmsg))
         {
           if (!hot_log) pthread_mutex_unlock(log_lock);