field | value | date
---|---|---
author | Shishir Jaiswal <shishir.j.jaiswal@oracle.com> | 2016-12-22 14:56:02 +0530
committer | Shishir Jaiswal <shishir.j.jaiswal@oracle.com> | 2016-12-22 14:56:02 +0530
commit | e00810b934fd495009c1b8d47446714bdbc0b249 |
tree | 28b25a5fb8a5d1944ec586eaa2fb66983561cf7b /scripts |
parent | 1079066b22815b9c46a6689c93469c3af1fd88ff |
download | mariadb-git-e00810b934fd495009c1b8d47446714bdbc0b249.tar.gz |
Bug#11751149 - TRYING TO START MYSQL WHILE ANOTHER INSTANCE
IS STARTING: CONFUSING ERROR
DESCRIPTION
===========
When the mysql server has processed transactions that are not
yet committed and then shuts down abnormally (due to a crash,
being killed externally, etc.), the storage engine must
perform recovery the next time the mysql server is started
(either through mysqld or through mysqld_safe).
If another instance of mysqld_safe is started while the first
server is in the middle of recovery, the second instance may
end up killing the first one shortly afterwards.
ANALYSIS
========
In the "while true" loop, after the server stops, we check
whether the pid file exists to find out whether it was a
normal shutdown. If the file is absent, the server removed it
during a graceful exit.
However, if the file is present, the script simply assumes
that it is a leftover of the "current" server. It fails to
consider that it could be a valid pid file belonging to
another running mysql server.
We need more checks in the latter case. The script should
extract the PID from the existing file and check whether that
process is running. If it is, an older instance of the mysql
server is running and the script should abort.
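The check described above can be sketched in POSIX shell as follows. `pid_file_process_alive` is a hypothetical helper, not a function in mysqld_safe.sh, and using `kill -0` assumes it is an acceptable stand-in for whatever @CHECK_PID@ expands to on a given platform. One caveat of this choice: `kill -0` also fails when the process exists but belongs to a user we are not allowed to signal, so the sketch would report such a process as not running.

```sh
#!/bin/sh
# Hypothetical helper (not part of mysqld_safe.sh): succeeds (exit 0) when
# the pid file given as $1 exists, holds a plausible PID, and that PID
# belongs to a live process we may signal. Assumes `kill -0` is equivalent
# to @CHECK_PID@.
pid_file_process_alive() {
  # No pid file at all: the previous server removed it on a clean shutdown.
  [ -f "$1" ] || return 1
  PID=`cat "$1"`
  # Reject empty, non-numeric or zero-prefixed contents; "kill -0 0" would
  # test our own process group instead of a single process.
  case "$PID" in
    ''|*[!0-9]*|0*) return 1 ;;
  esac
  # Signal 0 delivers nothing; it only reports whether PID can be signalled.
  kill -0 "$PID" 2>/dev/null
}
```

In mysqld_safe terms, a true result after the server has stopped means the pid file belongs to another, still running mysqld, which is the case in which the script should log an error and exit instead of restarting.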
FIX
===
In that case, check whether the process is alive by adding a
@CHECK_PID@ test, and abort if it is. The detailed logic is
as follows:
- The mysqld_safe script quits right at startup as soon as it
finds an active PID, i.e. a mysql server is already running.
- The PID file is created only after InnoDB recovery, so in
the rare case where the PID file does not exist yet, more
than one server may come up. Even then, the others have to
wait until the first server releases the InnoDB lock it has
acquired. These servers will then either time out waiting
for the InnoDB lock or, after that, find (by reading
$pid_file) that the first server is already running, and
abort.
- The core fix is that, within the loop
"run -> shutdown/kill/abort -> run ...", we now check whether
the mysql server process is alive after the server stops, so
that only the script that owns the mysql server can bring it
down if required.
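The loop described in the last item can be modelled with a short POSIX shell sketch. It is a simplified model under stated assumptions: `supervise`, `run_server` and the `max_restarts` cap are hypothetical (the cap is not part of mysqld_safe and only keeps the sketch from looping forever), `run_server` stands in for `eval_log_error "$cmd"`, and `kill -0` is assumed to be equivalent to @CHECK_PID@.

```sh
#!/bin/sh
# Sketch of the "run -> shutdown/kill/abort -> run ..." loop with the
# post-stop liveness check. All function names here are hypothetical.

# Stand-in for eval_log_error "$cmd": simulates a server that crashes and
# leaves a stale pid file behind. Redefine it to simulate other outcomes.
run_server() {
  sh -c 'exit 0' &
  dead_pid=$!
  wait "$dead_pid"
  echo "$dead_pid" > "$1"
}

# Usage: supervise PID_FILE MAX_RESTARTS
# Returns 0 on normal shutdown, 1 if another live mysqld owns the pid file,
# 2 if the sketch-only restart cap is reached.
supervise() {
  pf=$1
  max_restarts=$2
  restarts=0
  while true
  do
    run_server "$pf"
    if test ! -f "$pf"                # removed by the server on normal shutdown
    then
      return 0
    fi
    PID=`cat "$pf"`
    if kill -0 "$PID" 2>/dev/null     # assumed equivalent of @CHECK_PID@
    then                              # someone else's mysqld is running
      echo "A mysqld process with pid=$PID is already running. Aborting!!" >&2
      return 1
    fi
    # Our own server died: remove its leftovers at the END of the loop,
    # and never delete through a symbolic link.
    if [ ! -h "$pf" ]; then
      rm -f "$pf"
    fi
    restarts=$((restarts + 1))
    if [ "$restarts" -ge "$max_restarts" ]; then
      return 2
    fi
  done
}
```

The property this ordering buys is that the pid file is deleted only after the liveness check has shown that the recorded PID is dead, so a second mysqld_safe leaves alone a file that a running server still owns.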
NOTE
====
Removed the deletion of the pid file and socket file from the
start of the loop, as a second instance could delete these
files created by the first instance in a race condition.
To compensate, these files are now deleted at the end of the
loop.
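The end-of-loop cleanup can be sketched as a small shell function. `remove_stale_runtime_files` is a hypothetical name, not a function in mysqld_safe.sh; the `[ ! -h ... ]` guard is taken from the patch, so a symbolic link placed at the pid-file or socket path is never deleted.

```sh
#!/bin/sh
# Hypothetical end-of-loop cleanup: remove each given runtime file (pid
# file, Unix socket) left behind by our own stopped server, but skip any
# path that is a symbolic link, mirroring the patch's [ ! -h ... ] guard.
remove_stale_runtime_files() {
  for f in "$@"
  do
    if [ ! -h "$f" ]; then
      rm -f "$f"      # -f: a path that is already gone is not an error
    fi
  done
}
```

A call such as `remove_stale_runtime_files "$pid_file" "$safe_mysql_unix_port"` would correspond to the two removals the patch adds just before `log_notice "mysqld restarted"`.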
Reverted the changes made by the patch for Bug#16776528, so
once this patch is pushed, the concept of mysqld_safe.pid
goes away altogether. This was required because the script
was deleting another instance's mysqld_safe.pid, allowing
multiple mysqld_safe instances to run in parallel. This patch
also fixes Bug#16776528, since the resources are guarded
anyway by the InnoDB lock plus our planned 5.7 patch.
Diffstat (limited to 'scripts')
-rw-r--r-- | scripts/mysqld_safe.sh | 42 |
1 files changed, 28 insertions, 14 deletions
diff --git a/scripts/mysqld_safe.sh b/scripts/mysqld_safe.sh
index a5c87a44e65..5148ecfc888 100644
--- a/scripts/mysqld_safe.sh
+++ b/scripts/mysqld_safe.sh
@@ -790,14 +790,23 @@ then
   fi
   if [ ! -h "$pid_file" ]; then
     rm -f "$pid_file"
+    if test -f "$pid_file"; then
+      log_error "Fatal error: Can't remove the pid file:
+$pid_file.
+Please remove the file manually and start $0 again;
+mysqld daemon not started"
+      exit 1
+    fi
   fi
-  if test -f "$pid_file"
-  then
-    log_error "Fatal error: Can't remove the pid file:
-$pid_file
-Please remove it manually and start $0 again;
+  if [ ! -h "$safe_mysql_unix_port" ]; then
+    rm -f "$safe_mysql_unix_port"
+    if test -f "$safe_mysql_unix_port"; then
+      log_error "Fatal error: Can't remove the socket file:
+$safe_mysql_unix_port.
+Please remove the file manually and start $0 again;
 mysqld daemon not started"
-    exit 1
+      exit 1
+    fi
   fi
 fi
 
@@ -841,14 +850,6 @@ have_sleep=1
 
 while true
 do
-  # Some extra safety
-  if [ ! -h "$safe_mysql_unix_port" ]; then
-    rm -f "$safe_mysql_unix_port"
-  fi
-  if [ ! -h "$pid_file" ]; then
-    rm -f "$pid_file"
-  fi
-
   start_time=`date +%M%S`
 
   eval_log_error "$cmd"
@@ -884,6 +885,13 @@ do
   if test ! -f "$pid_file"  # This is removed if normal shutdown
   then
     break
+  else                      # self's mysqld crashed or other's mysqld running
+    PID=`cat "$pid_file"`
+    if @CHECK_PID@
+    then                    # true when above pid belongs to a running mysqld process
+      log_error "A mysqld process with pid=$PID is already running. Aborting!!"
+      exit 1
+    fi
   fi
 
 
@@ -941,6 +949,12 @@ do
       I=`expr $I + 1`
     done
   fi
+  if [ ! -h "$pid_file" ]; then
+    rm -f "$pid_file"
+  fi
+  if [ ! -h "$safe_mysql_unix_port" ]; then
+    rm -f "$safe_mysql_unix_port"
+  fi
   log_notice "mysqld restarted"
 done
 