author Shishir Jaiswal <shishir.j.jaiswal@oracle.com> 2016-12-22 14:56:02 +0530
committer Shishir Jaiswal <shishir.j.jaiswal@oracle.com> 2016-12-22 14:56:02 +0530
commit e00810b934fd495009c1b8d47446714bdbc0b249
tree 28b25a5fb8a5d1944ec586eaa2fb66983561cf7b
parent 1079066b22815b9c46a6689c93469c3af1fd88ff
Bug#11751149 - TRYING TO START MYSQL WHILE ANOTHER INSTANCE IS STARTING: CONFUSING ERROR

DESCRIPTION
===========
When the MySQL server has processed transactions that are not yet committed and then shuts down abnormally (crash, external kill, etc.), the storage engine must run recovery the next time the server is started (through either mysqld or mysqld_safe). If a second instance of mysqld_safe is started while the first server is in the middle of recovery, the second instance may kill the first one after a moment.

ANALYSIS
========
Inside the "while true" loop there is a check, done after the server stops, for the existence of the pid file, to determine whether the shutdown was normal. If the file is absent, the server exited gracefully and removed it. If the file is present, however, the script simply assumes that it is a leftover of the "current" server. It fails to consider that it could be a valid pid file belonging to another running MySQL server.

More checks are needed in the latter case. The script should extract the PID from the existing file and check whether that process is running. If it is, an older instance of the MySQL server is running and the script should abort.

FIX
===
When the pid file is present after the server stops, check whether the process is alive using @CHECK_PID@, and abort if it is.

Detailed logic:
- The mysqld_safe script quits at startup as soon as it finds an active PID, i.e. a MySQL server that is already running.
- The pid file is created after InnoDB recovery, so in rare cases (when the pid file has not been created yet) more than one server may come up. Even then, the other servers have to wait until the first server releases the InnoDB lock it has acquired. These servers will either time out waiting for the InnoDB lock or, once the wait ends, find that the first server is already running (by reading $pid_file) and abort.
- The core fix is that the status of the MySQL server process (alive or not) is now checked after the server stops running within the loop of "run -> shutdown/kill/abort -> run ...", so that only the script that owns the MySQL server can bring it down if required.

NOTE
====
Removed the deletion of the pid file and socket file at the entry of the loop, because in a race condition a second instance could delete the files created by the first instance. To compensate, these files are now deleted at the end of the loop.

Reverted the changes made in the patch for Bug#16776528. Once this patch is pushed, the concept of mysqld_safe.pid goes away altogether. This was required because the script was deleting another instance's mysqld_safe.pid, which allowed multiple mysqld_safe instances to run in parallel. This patch also fixes Bug#16776528, since the resources are guarded anyway by the InnoDB lock plus our planned 5.7 patch.
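
The liveness check described above can be sketched as a small standalone function. This is an illustration only, not code from the patch: the function name `pid_file_is_live` is hypothetical, and `kill -0` is an assumption standing in for whatever the build substitutes for @CHECK_PID@. The non-numeric guard is also an addition that the patch itself does not make.

```shell
#!/bin/sh
# Hypothetical helper (not part of mysqld_safe): succeeds when the PID
# recorded in the given pid file refers to a process that is alive.
pid_file_is_live() {
  _pf=$1

  # No pid file: a graceful shutdown removes it, so nothing is running.
  test -f "$_pf" || return 1

  PID=`cat "$_pf"`

  # Extra guard (not in the patch): reject empty or non-numeric content
  # so that garbage is never passed to kill.
  case "$PID" in
    ''|*[!0-9]*) return 1 ;;
  esac

  # Assumption: kill -0 plays the role of @CHECK_PID@. Signal 0 delivers
  # nothing; it only reports whether the process can be signalled.
  kill -0 "$PID" > /dev/null 2>&1
}

# Usage mirroring the loop in the patch:
#   if pid_file_is_live "$pid_file"; then
#     echo "A mysqld process with pid=$PID is already running. Aborting!!"
#     exit 1
#   fi
```

Note that a check of this kind only shows that some process with that PID can be signalled. It does not prove that the process is mysqld, and it reports failure for a live process owned by another user.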
-rw-r--r-- scripts/mysqld_safe.sh 42
1 files changed, 28 insertions, 14 deletions
diff --git a/scripts/mysqld_safe.sh b/scripts/mysqld_safe.sh
index a5c87a44e65..5148ecfc888 100644
--- a/scripts/mysqld_safe.sh
+++ b/scripts/mysqld_safe.sh
@@ -790,14 +790,23 @@ then
fi
if [ ! -h "$pid_file" ]; then
rm -f "$pid_file"
+ if test -f "$pid_file"; then
+ log_error "Fatal error: Can't remove the pid file:
+$pid_file.
+Please remove the file manually and start $0 again;
+mysqld daemon not started"
+ exit 1
+ fi
fi
- if test -f "$pid_file"
- then
- log_error "Fatal error: Can't remove the pid file:
-$pid_file
-Please remove it manually and start $0 again;
+ if [ ! -h "$safe_mysql_unix_port" ]; then
+ rm -f "$safe_mysql_unix_port"
+ if test -f "$safe_mysql_unix_port"; then
+ log_error "Fatal error: Can't remove the socket file:
+$safe_mysql_unix_port.
+Please remove the file manually and start $0 again;
mysqld daemon not started"
- exit 1
+ exit 1
+ fi
fi
fi
@@ -841,14 +850,6 @@ have_sleep=1
while true
do
- # Some extra safety
- if [ ! -h "$safe_mysql_unix_port" ]; then
- rm -f "$safe_mysql_unix_port"
- fi
- if [ ! -h "$pid_file" ]; then
- rm -f "$pid_file"
- fi
-
start_time=`date +%M%S`
eval_log_error "$cmd"
@@ -884,6 +885,13 @@ do
if test ! -f "$pid_file" # This is removed if normal shutdown
then
break
+ else # self's mysqld crashed or other's mysqld running
+ PID=`cat "$pid_file"`
+ if @CHECK_PID@
+ then # true when above pid belongs to a running mysqld process
+ log_error "A mysqld process with pid=$PID is already running. Aborting!!"
+ exit 1
+ fi
fi
@@ -941,6 +949,12 @@ do
I=`expr $I + 1`
done
fi
+ if [ ! -h "$pid_file" ]; then
+ rm -f "$pid_file"
+ fi
+ if [ ! -h "$safe_mysql_unix_port" ]; then
+ rm -f "$safe_mysql_unix_port"
+ fi
log_notice "mysqld restarted"
done
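
The "remove unless it is a symlink, then verify" pattern used in the first hunk for both the pid file and the socket file can be factored into one helper. The following is a sketch under stated assumptions: `remove_stale_file` is a hypothetical name, it returns a status instead of calling `exit 1`, and it prints to stderr rather than through the script's `log_error`. It also uses `test -e` where the patch uses `test -f`, because a Unix socket is not a regular file and `test -f` would not report a socket that is still present.

```shell
#!/bin/sh
# Hypothetical helper (not part of mysqld_safe): removes a stale file
# unless it is a symlink, and fails if the file is still there afterwards.
#   $1 = path to remove
#   $2 = short description used in the error message ("pid", "socket")
remove_stale_file() {
  _path=$1
  _kind=$2

  # Symlinks are left untouched, as in the original script.
  if [ ! -h "$_path" ]; then
    rm -f "$_path"
    # test -e (not -f) so that a leftover socket is also detected.
    if test -e "$_path"; then
      echo "Fatal error: Can't remove the $_kind file: $_path" >&2
      return 1
    fi
  fi
  return 0
}

# Usage, with the variable names from the diff:
#   remove_stale_file "$pid_file" "pid" || exit 1
#   remove_stale_file "$safe_mysql_unix_port" "socket" || exit 1
```

Returning a status leaves the decision to abort with the caller, which keeps the helper usable both before the loop (where failure is fatal) and at the end of each iteration (where the patch ignores failures).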