Applied InnoDB snapshot innodb-5.0-ss2095

Fixes the following bugs:

- Bug #29560: InnoDB >= 5.0.30 hangs on adaptive hash rw-lock 'waiting for an X-lock'
  Fixed a race condition in the rw_lock where an os_event_reset() can overwrite an earlier os_event_set(), triggering an indefinite wait.
  NOTE: The fix for Windows differs from the fix for other platforms.
  NOTE2: This bug was introduced by the scalability fix to sync0arr, which was applied to 5.0 only. Therefore, this fix need not be applied to the 5.1 tree. If we decide to port the scalability fix to 5.1, this fix should be ported as well.

- Bug #32125: Database crash due to ha_innodb.cc:3896: ulint convert_search_mode_to_innobase
  When an unknown find_flag is encountered in convert_search_mode_to_innobase(), do not call assert(0); instead, queue a MySQL error using my_error() and return the error code PAGE_CUR_UNSUPP. Change the functions that call convert_search_mode_to_innobase() to handle that error code by "canceling" execution and returning an appropriate error code further upstream.

Per-file notes (all applied from InnoDB snapshot innodb-5.0-ss2095):

innobase/include/db0err.h, innobase/include/page0cur.h:
  Revision r2091: branches/5.0: Merge r2088 from trunk: Fix Bug#32125 (http://bugs.mysql.com/32125), as described above. Approved by: Heikki

innobase/include/os0sync.h, innobase/include/sync0rw.h, innobase/include/sync0rw.ic, innobase/include/sync0sync.ic, innobase/os/os0sync.c, innobase/srv/srv0srv.c, innobase/sync/sync0arr.c, innobase/sync/sync0rw.c, innobase/sync/sync0sync.c:
  Revision r2082: branches/5.0: bug#29560, as described above. Reviewed by: Heikki

sql/ha_innodb.cc:
  Revision r2091: branches/5.0: Merge r2088 from trunk: Fix Bug#32125, as described above. Approved by: Heikki
  Revision r2095: branches/5.0: Merge r2093 from trunk: convert_search_mode_to_innobase(): Add the missing case label HA_READ_MBR_EQUAL that was forgotten in r2088.
author: unknown <tsmith@ramayana.hindu.god> 2007-11-20 10:53:19 -0700
committer: unknown <tsmith@ramayana.hindu.god> 2007-11-20 10:53:19 -0700
commit: a3dc40e24affdef5c09e4e9a144ada5f9c14680d
tree: 21747ca103bd9a3c41f06ba1d1e27672465dfaaa /innobase
parent: 49934f490a91cad206f68c1cc54e18dd6453f02a
11 files changed, 231 insertions, 40 deletions
diff --git a/innobase/include/db0err.h b/innobase/include/db0err.h
index de5ac44e73f..247c5de67db 100644
--- a/innobase/include/db0err.h
+++ b/innobase/include/db0err.h
@@ -57,6 +57,18 @@ Created 5/24/1996 Heikki Tuuri
 					buffer pool (for big transactions,
 					InnoDB stores the lock structs in the
 					buffer pool) */
+#define DB_FOREIGN_DUPLICATE_KEY 46	/* foreign key constraints
+					activated by the operation would
+					lead to a duplicate key in some
+					table */
+#define DB_TOO_MANY_CONCURRENT_TRXS 47	/* when InnoDB runs out of the
+					preconfigured undo slots, this can
+					only happen when there are too many
+					concurrent transactions */
+#define DB_UNSUPPORTED		48	/* when InnoDB sees any artefact or
+					a feature that it can't recognize or
+					work with e.g., FT indexes created by
+					a later version of the engine. */
 
 /* The following are partial failure codes */
 #define DB_FAIL 		1000
diff --git a/innobase/include/os0sync.h b/innobase/include/os0sync.h
index d27b1676f1b..ef013bd1f2a 100644
--- a/innobase/include/os0sync.h
+++ b/innobase/include/os0sync.h
@@ -112,9 +112,13 @@ os_event_set(
 	os_event_t	event);	/* in: event to set */
 /**************************************************************
 Resets an event semaphore to the nonsignaled state. Waiting threads will
-stop to wait for the event. */
+stop to wait for the event.
+The return value should be passed to os_event_wait_low() if it is desired
+that this thread should not wait in case of an intervening call to
+os_event_set() between this os_event_reset() and the
+os_event_wait_low() call. See comments for os_event_wait_low(). */
 
-void
+ib_longlong
 os_event_reset(
 /*===========*/
 	os_event_t	event);	/* in: event to reset */
@@ -125,16 +129,38 @@ void
 os_event_free(
 /*==========*/
 	os_event_t	event);	/* in: event to free */
+
 /**************************************************************
 Waits for an event object until it is in the signaled state. If
 srv_shutdown_state == SRV_SHUTDOWN_EXIT_THREADS this also exits the
 waiting thread when the event becomes signaled (or immediately if the
-event is already in the signaled state). */
+event is already in the signaled state).
+
+Typically, if the event has been signalled after the os_event_reset()
+we'll return immediately because event->is_set == TRUE.
+There are, however, situations (e.g.: sync_array code) where we may
+lose this information. For example:
+
+thread A calls os_event_reset()
+thread B calls os_event_set()   [event->is_set == TRUE]
+thread C calls os_event_reset() [event->is_set == FALSE]
+thread A calls os_event_wait()  [infinite wait!]
+thread C calls os_event_wait()  [infinite wait!]
+
+Where such a scenario is possible, to avoid infinite wait, the
+value returned by os_event_reset() should be passed in as
+reset_sig_count. */
+
+#define os_event_wait(event) os_event_wait_low((event), 0)
 
 void
-os_event_wait(
-/*==========*/
-	os_event_t	event);	/* in: event to wait */
+os_event_wait_low(
+/*==============*/
+	os_event_t	event,		/* in: event to wait */
+	ib_longlong	reset_sig_count);/* in: zero or the value
+					returned by previous call of
+					os_event_reset(). */
+
 /**************************************************************
 Waits for an event object until it is in the signaled state or
 a timeout is exceeded. In Unix the timeout is always infinite. */
diff --git a/innobase/include/page0cur.h b/innobase/include/page0cur.h
index b03302b0e77..3a76c5e02ba 100644
--- a/innobase/include/page0cur.h
+++ b/innobase/include/page0cur.h
@@ -22,6 +22,7 @@ Created 10/4/1994 Heikki Tuuri
 
 /* Page cursor search modes; the values must be in this order! */
 
+#define	PAGE_CUR_UNSUPP	0
 #define	PAGE_CUR_G	1
 #define	PAGE_CUR_GE	2
 #define	PAGE_CUR_L	3
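The Bug #32125 fix replaces an assertion with a sentinel: PAGE_CUR_UNSUPP is 0, below every valid search mode, so a caller can test for it cheaply and cancel the operation. The ha_innodb.cc side of the change is not part of this diff, so the following C sketch only illustrates the pattern under stated assumptions: the `sketch_*` enum and functions are hypothetical stand-ins for MySQL's find_flag values and the handler code, the three mappings are illustrative, returning DB_UNSUPPORTED from the caller is an assumption, and fprintf() stands in for the my_error() call the real fix uses.

```c
#include <assert.h>
#include <stdio.h>

/* Search-mode values as defined in the page0cur.h hunk above. */
#define PAGE_CUR_UNSUPP 0
#define PAGE_CUR_G      1
#define PAGE_CUR_GE     2
#define PAGE_CUR_L      3

/* Error code as defined in the db0err.h hunk above. */
#define DB_UNSUPPORTED  48

/* Hypothetical stand-ins for MySQL's find_flag values. */
enum sketch_find_flag {
    SKETCH_READ_KEY_OR_NEXT,
    SKETCH_READ_AFTER_KEY,
    SKETCH_READ_BEFORE_KEY,
    SKETCH_READ_MBR_CONTAIN     /* a flag this sketch does not map */
};

/* Maps a find_flag to a page-cursor mode; unknown flags no longer
   trip an assertion but yield the PAGE_CUR_UNSUPP sentinel. */
unsigned long
sketch_convert_search_mode(enum sketch_find_flag flag)
{
    switch (flag) {
    case SKETCH_READ_KEY_OR_NEXT:
        return PAGE_CUR_GE;
    case SKETCH_READ_AFTER_KEY:
        return PAGE_CUR_G;
    case SKETCH_READ_BEFORE_KEY:
        return PAGE_CUR_L;
    default:
        /* The real fix queues a MySQL error with my_error() here. */
        fprintf(stderr, "unsupported find_flag %d\n", (int) flag);
        return PAGE_CUR_UNSUPP;
    }
}

/* Hypothetical caller: cancels the read and passes an error upstream
   instead of crashing. Returns 0 on success (an assumption). */
int
sketch_index_read(enum sketch_find_flag flag)
{
    unsigned long mode = sketch_convert_search_mode(flag);

    if (mode == PAGE_CUR_UNSUPP) {
        return DB_UNSUPPORTED;
    }

    /* Every mode that reaches this point is a real cursor mode. */
    assert(mode >= PAGE_CUR_G && mode <= PAGE_CUR_L);
    /* ... a real handler would position its cursor with `mode` ... */
    return 0;
}
```

Reserving 0 for the sentinel works only because the existing page0cur.h values start at 1, which is why the new define can sit in front of PAGE_CUR_G without renumbering anything.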
diff --git a/innobase/include/sync0rw.h b/innobase/include/sync0rw.h
index 4cd26ba1921..55eaf94bae4 100644
--- a/innobase/include/sync0rw.h
+++ b/innobase/include/sync0rw.h
@@ -418,6 +418,17 @@ field. Then no new readers are allowed in. */
 
 struct rw_lock_struct {
 	os_event_t	event;	/* Used by sync0arr.c for thread queueing */
+
+#ifdef __WIN__
+	os_event_t	wait_ex_event;	/* This windows specific event is
+				used by the thread which has set the
+				lock state to RW_LOCK_WAIT_EX. The
+				rw_lock design guarantees that this
+				thread will be the next one to proceed
+				once the event gets
+				signalled. See LEMMA 2 in sync0sync.c */
+#endif
+
 	ulint	reader_count;	/* Number of readers who have locked this
 				lock in the shared mode */
 	ulint	writer; 	/* This field is set to RW_LOCK_EX if there
diff --git a/innobase/include/sync0rw.ic b/innobase/include/sync0rw.ic
index 31a1ea6562a..5b65b57082f 100644
--- a/innobase/include/sync0rw.ic
+++ b/innobase/include/sync0rw.ic
@@ -382,6 +382,9 @@ rw_lock_s_unlock_func(
 	mutex_exit(mutex);
 
 	if (UNIV_UNLIKELY(sg)) {
+#ifdef __WIN__
+		os_event_set(lock->wait_ex_event);
+#endif
 		os_event_set(lock->event);
 		sync_array_object_signalled(sync_primary_wait_array);
 	}
@@ -463,6 +466,9 @@ rw_lock_x_unlock_func(
 	mutex_exit(&(lock->mutex));
 
 	if (UNIV_UNLIKELY(sg)) {
+#ifdef __WIN__
+		os_event_set(lock->wait_ex_event);
+#endif
 		os_event_set(lock->event);
 		sync_array_object_signalled(sync_primary_wait_array);
 	}
diff --git a/innobase/include/sync0sync.ic b/innobase/include/sync0sync.ic
index e5c6f56d8ba..ae807bbce4a 100644
--- a/innobase/include/sync0sync.ic
+++ b/innobase/include/sync0sync.ic
@@ -207,7 +207,7 @@ mutex_exit(
 	perform the read first, which could leave a waiting
 	thread hanging indefinitely.
 
-	Our current solution call every 10 seconds
+	Our current solution calls every second
 	sync_arr_wake_threads_if_sema_free()
 	to wake up possible hanging threads if
 	they are missed in mutex_signal_object. */
diff --git a/innobase/os/os0sync.c b/innobase/os/os0sync.c
index a3204a7b3e8..59195b03acf 100644
--- a/innobase/os/os0sync.c
+++ b/innobase/os/os0sync.c
@@ -151,7 +151,14 @@ os_event_create(
 	ut_a(0 == pthread_cond_init(&(event->cond_var), NULL));
 #endif
 	event->is_set = FALSE;
-	event->signal_count = 0;
+
+	/* We return this value in os_event_reset(), which can then
+	be passed to os_event_wait_low(). The value of zero
+	is reserved in os_event_wait_low() for the case when the
+	caller does not want to pass any signal_count value. To
+	distinguish between the two cases we initialize signal_count
+	to 1 here. */
+	event->signal_count = 1;
 #endif /* __WIN__ */
 
 	/* The os_sync_mutex can be NULL because during startup an event
@@ -244,13 +251,20 @@ os_event_set(
 
 /**************************************************************
 Resets an event semaphore to the nonsignaled state. Waiting threads will
-stop to wait for the event. */
+stop to wait for the event.
+The return value should be passed to os_event_wait_low() if it is desired
+that this thread should not wait in case of an intervening call to
+os_event_set() between this os_event_reset() and the
+os_event_wait_low() call. See comments for os_event_wait_low(). */
 
-void
+ib_longlong
 os_event_reset(
 /*===========*/
+				/* out: current signal_count. */
 	os_event_t	event)	/* in: event to reset */
 {
+	ib_longlong	ret = 0;
+
 #ifdef __WIN__
 	ut_a(event);
 
@@ -265,9 +279,11 @@ os_event_reset(
 	} else {
 		event->is_set = FALSE;
 	}
+	ret = event->signal_count;
 
 	os_fast_mutex_unlock(&(event->os_mutex));
 #endif
+	return(ret);
 }
 
 /**************************************************************
@@ -335,18 +351,38 @@ os_event_free(
 Waits for an event object until it is in the signaled state. If
 srv_shutdown_state == SRV_SHUTDOWN_EXIT_THREADS this also exits the
 waiting thread when the event becomes signaled (or immediately if the
-event is already in the signaled state). */
+event is already in the signaled state).
+
+Typically, if the event has been signalled after the os_event_reset()
+we'll return immediately because event->is_set == TRUE.
+There are, however, situations (e.g.: sync_array code) where we may
+lose this information. For example:
+
+thread A calls os_event_reset()
+thread B calls os_event_set()   [event->is_set == TRUE]
+thread C calls os_event_reset() [event->is_set == FALSE]
+thread A calls os_event_wait()  [infinite wait!]
+thread C calls os_event_wait()  [infinite wait!]
+
+Where such a scenario is possible, to avoid infinite wait, the
+value returned by os_event_reset() should be passed in as
+reset_sig_count. */
 
 void
-os_event_wait(
-/*==========*/
-	os_event_t	event)	/* in: event to wait */
+os_event_wait_low(
+/*==============*/
+	os_event_t	event,		/* in: event to wait */
+	ib_longlong	reset_sig_count)/* in: zero or the value
+					returned by previous call of
+					os_event_reset(). */
 {
 #ifdef __WIN__
 	DWORD	err;
 
 	ut_a(event);
 
+	UT_NOT_USED(reset_sig_count);
+
 	/* Specify an infinite time limit for waiting */
 	err = WaitForSingleObject(event->handle, INFINITE);
 
@@ -360,7 +396,11 @@ os_event_wait(
 
 	os_fast_mutex_lock(&(event->os_mutex));
 
-	old_signal_count = event->signal_count;
+	if (reset_sig_count) {
+		old_signal_count = reset_sig_count;
+	} else {
+		old_signal_count = event->signal_count;
+	}
 
 	for (;;) {
 		if (event->is_set == TRUE
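On non-Windows platforms the hunks above keep is_set and signal_count under the event's mutex, return signal_count from os_event_reset(), and let os_event_wait_low() use a caller-supplied snapshot instead of reading the count itself. The following POSIX-threads sketch models that contract so the A/B/C interleaving from the comment can be replayed. The `sketch_*` names are hypothetical, and the set and wait bodies are assumptions modelled on the loop condition visible in the hunk, not a copy of InnoDB's code. Compile with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical POSIX model of an os_event_t; not InnoDB's code. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int             is_set;
    long long       signal_count;   /* starts at 1; 0 means "no snapshot" */
} sketch_event_t;

void
sketch_event_init(sketch_event_t* ev)
{
    int rc = pthread_mutex_init(&ev->mutex, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&ev->cond, NULL);
    assert(rc == 0);
    (void) rc;
    ev->is_set = 0;
    ev->signal_count = 1;
}

/* Signal the event; each transition to "set" bumps signal_count. */
void
sketch_event_set(sketch_event_t* ev)
{
    pthread_mutex_lock(&ev->mutex);
    if (!ev->is_set) {
        ev->is_set = 1;
        ev->signal_count++;
        pthread_cond_broadcast(&ev->cond);
    }
    pthread_mutex_unlock(&ev->mutex);
}

/* Clear the event and return the count at the moment of the reset. */
long long
sketch_event_reset(sketch_event_t* ev)
{
    long long ret;

    pthread_mutex_lock(&ev->mutex);
    ev->is_set = 0;
    ret = ev->signal_count;
    pthread_mutex_unlock(&ev->mutex);
    return ret;
}

/* Block until the event is set, or has been set at least once since the
   snapshot; reset_sig_count == 0 means "take the snapshot now". */
void
sketch_event_wait_low(sketch_event_t* ev, long long reset_sig_count)
{
    long long old_signal_count;

    pthread_mutex_lock(&ev->mutex);
    old_signal_count = reset_sig_count ? reset_sig_count : ev->signal_count;
    while (!ev->is_set && ev->signal_count == old_signal_count) {
        pthread_cond_wait(&ev->cond, &ev->mutex);
    }
    pthread_mutex_unlock(&ev->mutex);
}
```

Replaying the scenario: A's reset returns 1, B's set raises the count to 2, and C's reset clears is_set. A's later `sketch_event_wait_low(&ev, 1)` still returns at once because the count has moved past its snapshot; passing 0 instead would snapshot 2 and sleep, which is the hang the old os_event_wait() could hit.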
diff --git a/innobase/srv/srv0srv.c b/innobase/srv/srv0srv.c
index 31cd202e4d2..1227824ef80 100644
--- a/innobase/srv/srv0srv.c
+++ b/innobase/srv/srv0srv.c
@@ -1881,12 +1881,6 @@ loop:
 
 	os_thread_sleep(1000000);
 
-	/* In case mutex_exit is not a memory barrier, it is
-	theoretically possible some threads are left waiting though
-	the semaphore is already released. Wake up those threads: */
-	
-	sync_arr_wake_threads_if_sema_free();
-
 	current_time = time(NULL);
 
 	time_elapsed = difftime(current_time, last_monitor_time);
@@ -2083,9 +2077,15 @@ loop:
 		srv_refresh_innodb_monitor_stats();
 	}
 
+	/* In case mutex_exit is not a memory barrier, it is
+	theoretically possible some threads are left waiting though
+	the semaphore is already released. Wake up those threads: */
+	
+	sync_arr_wake_threads_if_sema_free();
+
 	if (sync_array_print_long_waits()) {
 		fatal_cnt++;
-		if (fatal_cnt > 5) {
+		if (fatal_cnt > 10) {
 
 			fprintf(stderr,
 "InnoDB: Error: semaphore wait has lasted > %lu seconds\n"
@@ -2103,7 +2103,7 @@ loop:
 
 	fflush(stderr);
 
-	os_thread_sleep(2000000);
+	os_thread_sleep(1000000);
 
 	if (srv_shutdown_state < SRV_SHUTDOWN_CLEANUP) {
 
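The error-monitor hunks above halve the loop's sleep from 2 s to 1 s and raise the crash threshold from `fatal_cnt > 5` to `fatal_cnt > 10`. Assuming one long-wait check per iteration, that an iteration lasts roughly its sleep, and that fatal_cnt counts consecutive positive checks (the branch that resets it is not visible in this hunk), the grace period between the first long-wait report and the intentional crash stays about the same. The helper below is a hypothetical illustration of that arithmetic, not part of srv0srv.c.

```c
#include <assert.h>

/* Seconds between the first check that reports a long semaphore wait
   and the check that crashes the server, assuming one check per loop
   iteration of sleep_s seconds and a crash once fatal_cnt > limit. */
unsigned
grace_seconds(unsigned sleep_s, unsigned limit)
{
    /* A zero sleep would make the monitor loop spin. */
    assert(sleep_s > 0);

    /* The crash fires on check number limit + 1, which is `limit`
       loop intervals after check number 1. */
    return limit * sleep_s;
}
```

Under these assumptions `grace_seconds(2, 5)` (old values) and `grace_seconds(1, 10)` (new values) both come to 10 seconds, so the faster loop buys more frequent wake-up sweeps without shortening the grace period.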
diff --git a/innobase/sync/sync0arr.c b/innobase/sync/sync0arr.c
index 64f9310bad3..504a877bcc2 100644
--- a/innobase/sync/sync0arr.c
+++ b/innobase/sync/sync0arr.c
@@ -40,7 +40,15 @@ because we can do with a very small number of OS events,
 say 200. In NT 3.51, allocating events seems to be a quadratic
 algorithm, because 10 000 events are created fast, but
 100 000 events takes a couple of minutes to create.
-*/
+
+As of 5.0.30 the above mentioned design is changed. Since now
+OS can handle millions of wait events efficiently, we no longer
+have this concept of each cell of wait array having one event.
+Instead, now the event that a thread wants to wait on is embedded
+in the wait object (mutex or rw_lock). We still keep the global
+wait array for the sake of diagnostics and also to avoid infinite
+wait. The error_monitor thread scans the global wait array to signal
+any waiting threads who have missed the signal. */
 
 /* A cell where an individual thread may wait suspended
 until a resource is released. The suspending is implemented
@@ -62,6 +70,14 @@ struct sync_cell_struct {
 	ibool		waiting;	/* TRUE if the thread has already
 					called sync_array_event_wait
 					on this cell */
+	ib_longlong	signal_count;	/* We capture the signal_count
+					of the wait_object when we
+					reset the event. This value is
+					then passed on to os_event_wait
+					and we wait only if the event
+					has not been signalled in the
+					period between the reset and
+					wait call. */
 	time_t		reservation_time;/* time when the thread reserved
 					the wait cell */
 };
@@ -216,6 +232,7 @@ sync_array_create(
 		cell = sync_array_get_nth_cell(arr, i);        	
                 cell->wait_object = NULL;
 		cell->waiting = FALSE;
+		cell->signal_count = 0;
 	}
 
 	return(arr);
@@ -282,16 +299,23 @@ sync_array_validate(
 /***********************************************************************
 Puts the cell event in reset state. */
 static
-void
+ib_longlong
 sync_cell_event_reset(
 /*==================*/
+				/* out: value of signal_count
+				at the time of reset. */
 	ulint		type,	/* in: lock type mutex/rw_lock */
 	void*		object) /* in: the rw_lock/mutex object */
 {
 	if (type == SYNC_MUTEX) {
-		os_event_reset(((mutex_t *) object)->event);
+		return(os_event_reset(((mutex_t *) object)->event));
+#ifdef __WIN__
+	} else if (type == RW_LOCK_WAIT_EX) {
+		return(os_event_reset(
+		       ((rw_lock_t *) object)->wait_ex_event));
+#endif
 	} else {
-		os_event_reset(((rw_lock_t *) object)->event);
+		return(os_event_reset(((rw_lock_t *) object)->event));
 	}
 }		
 
@@ -345,8 +369,11 @@ sync_array_reserve_cell(
 
 			sync_array_exit(arr);
 
-			/* Make sure the event is reset */
-			sync_cell_event_reset(type, object);
+			/* Make sure the event is reset and also store
+			the value of signal_count at which the event
+			was reset. */
+			cell->signal_count = sync_cell_event_reset(type,
+								object);
 
 			cell->reservation_time = time(NULL);
 
@@ -388,7 +415,14 @@ sync_array_wait_event(
 
 	if (cell->request_type == SYNC_MUTEX) {
 		event = ((mutex_t*) cell->wait_object)->event;
-	} else {
+#ifdef __WIN__
+	/* On windows if the thread about to wait is the one which
+	has set the state of the rw_lock to RW_LOCK_WAIT_EX, then
+	it waits on a special event i.e.: wait_ex_event. */
+	} else if (cell->request_type == RW_LOCK_WAIT_EX) {
+		event = ((rw_lock_t*) cell->wait_object)->wait_ex_event;
+#endif
+	} else {	
 		event = ((rw_lock_t*) cell->wait_object)->event;
 	}
 
@@ -413,7 +447,7 @@ sync_array_wait_event(
 #endif
         sync_array_exit(arr);
 
-        os_event_wait(event);
+        os_event_wait_low(event, cell->signal_count);
 
         sync_array_free_cell(arr, index);
 }
@@ -457,7 +491,11 @@ sync_array_cell_print(
 #endif /* UNIV_SYNC_DEBUG */
 			(ulong) mutex->waiters);
 
-	} else if (type == RW_LOCK_EX || type == RW_LOCK_SHARED) {
+	} else if (type == RW_LOCK_EX
+#ifdef __WIN__
+		   || type == RW_LOCK_WAIT_EX
+#endif
+		   || type == RW_LOCK_SHARED) {
 
 		fputs(type == RW_LOCK_EX ? "X-lock on" : "S-lock on", file);
 
@@ -638,7 +676,8 @@ sync_array_detect_deadlock(
 
 		return(FALSE); /* No deadlock */
 
-	} else if (cell->request_type == RW_LOCK_EX) {
+	} else if (cell->request_type == RW_LOCK_EX
+		   || cell->request_type == RW_LOCK_WAIT_EX) {
 
 	    lock = cell->wait_object;
 
@@ -734,7 +773,8 @@ sync_arr_cell_can_wake_up(
 			return(TRUE);
 		}
 
-	} else if (cell->request_type == RW_LOCK_EX) {
+	} else if (cell->request_type == RW_LOCK_EX
+		   || cell->request_type == RW_LOCK_WAIT_EX) {
 
 	    	lock = cell->wait_object;
 
@@ -783,6 +823,7 @@ sync_array_free_cell(
 
 	cell->waiting = FALSE;
 	cell->wait_object =  NULL;
+	cell->signal_count = 0;
 
 	ut_a(arr->n_reserved > 0);
 	arr->n_reserved--;
@@ -839,6 +880,14 @@ sync_arr_wake_threads_if_sema_free(void)
 
 					mutex = cell->wait_object;
 					os_event_set(mutex->event);
+#ifdef __WIN__
+				} else if (cell->request_type
+					   == RW_LOCK_WAIT_EX) {
+					rw_lock_t*	lock;
+
+					lock = cell->wait_object;
+					os_event_set(lock->wait_ex_event);
+#endif
 				} else {
 					rw_lock_t*	lock;
 
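In sync0arr.c the fix moves the snapshot into the wait cell: sync_array_reserve_cell() stores the value returned by the reset, sync_array_wait_event() passes it to os_event_wait_low(), and sync_array_free_cell() clears it. The single-threaded C model below is a hypothetical sketch of that lifecycle (the `model_*` names are not InnoDB's); it only tracks whether a wait would go to sleep, which is enough to compare the old zero-snapshot path with the new captured-count path on the A/B/C interleaving.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-threaded model: only the state that decides
   whether a wait would go to sleep. */
typedef struct {
    int       is_set;
    long long signal_count;   /* starts at 1, as in os_event_create() */
} model_event_t;

typedef struct {
    model_event_t* event;         /* the wait object's event */
    long long      signal_count;  /* captured at reservation; 0 when free */
} model_cell_t;

void
model_set(model_event_t* ev)
{
    if (!ev->is_set) {
        ev->is_set = 1;
        ev->signal_count++;
    }
}

long long
model_reset(model_event_t* ev)
{
    ev->is_set = 0;
    return ev->signal_count;
}

/* Like sync_array_reserve_cell(): reset the event, remember the count. */
void
model_reserve(model_cell_t* cell, model_event_t* ev)
{
    cell->event = ev;
    cell->signal_count = model_reset(ev);
}

/* Would os_event_wait_low(event, reset_sig_count) sleep right now? */
int
model_would_block(const model_event_t* ev, long long reset_sig_count)
{
    long long old = reset_sig_count ? reset_sig_count : ev->signal_count;

    return !ev->is_set && ev->signal_count == old;
}

/* Like sync_array_free_cell(): forget the captured count. */
void
model_free(model_cell_t* cell)
{
    assert(cell->event != NULL);   /* freeing an unreserved cell is a bug */
    cell->event = NULL;
    cell->signal_count = 0;
}
```

After A reserves, B sets and C reserves, the old path (snapshot 0) would put A to sleep, while A's captured count of 1 lets it proceed; C, whose snapshot is 2, correctly waits for a later os_event_set() or for the error monitor's sweep.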
diff --git a/innobase/sync/sync0rw.c b/innobase/sync/sync0rw.c
index 629331d6049..337fd3d77fd 100644
--- a/innobase/sync/sync0rw.c
+++ b/innobase/sync/sync0rw.c
@@ -132,6 +132,10 @@ rw_lock_create_func(
 	lock->last_x_line = 0;
 	lock->event = os_event_create(NULL);
 
+#ifdef __WIN__
+	lock->wait_ex_event = os_event_create(NULL);
+#endif
+
 	mutex_enter(&rw_lock_list_mutex);
 	
 	if (UT_LIST_GET_LEN(rw_lock_list) > 0) {
@@ -168,6 +172,10 @@ rw_lock_free(
 	mutex_enter(&rw_lock_list_mutex);
 	os_event_free(lock->event);
 
+#ifdef __WIN__
+	os_event_free(lock->wait_ex_event);
+#endif
+
 	if (UT_LIST_GET_PREV(list, lock)) {
 		ut_a(UT_LIST_GET_PREV(list, lock)->magic_n == RW_LOCK_MAGIC_N);
 	}
@@ -521,7 +529,15 @@ lock_loop:
 	rw_x_system_call_count++;
 
         sync_array_reserve_cell(sync_primary_wait_array,
-				lock, RW_LOCK_EX,
+				lock,
+#ifdef __WIN__
+				/* On windows RW_LOCK_WAIT_EX signifies
+				that this thread should wait on the
+				special wait_ex_event. */
+				(state == RW_LOCK_WAIT_EX)
+				 ? RW_LOCK_WAIT_EX :
+#endif
+				RW_LOCK_EX,
 				file_name, line,
 				&index);
 
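On Windows the fix takes a different route: reset_sig_count is ignored (UT_NOT_USED), and the single writer that moved the lock into the RW_LOCK_WAIT_EX state registers with that request type and sleeps on the private wait_ex_event, which the unlock functions set before the shared event. The C model below is a hypothetical sketch of that routing; in the diff RW_LOCK_WAIT_EX is also a lock state, whereas here it is only a request type, and the enum values are arbitrary.

```c
#include <assert.h>

/* Hypothetical request types named after the diff; values are arbitrary. */
enum model_req {
    MODEL_RW_LOCK_EX = 1,
    MODEL_RW_LOCK_SHARED,
    MODEL_RW_LOCK_WAIT_EX
};

/* 1 = signalled, 0 = reset. */
typedef struct {
    int event_set;          /* stands in for lock->event */
    int wait_ex_event_set;  /* stands in for lock->wait_ex_event */
} model_rw_lock_t;

/* rw_lock_x_lock_func(): a writer that moved the lock into the
   WAIT_EX state registers with the WAIT_EX request type. */
enum model_req
model_x_request_type(int state_is_wait_ex)
{
    return state_is_wait_ex ? MODEL_RW_LOCK_WAIT_EX : MODEL_RW_LOCK_EX;
}

/* sync_array_wait_event(): choose the event this waiter sleeps on. */
int*
model_wait_target(model_rw_lock_t* lock, enum model_req req)
{
    return req == MODEL_RW_LOCK_WAIT_EX
        ? &lock->wait_ex_event_set
        : &lock->event_set;
}

/* sync_cell_event_reset(): reset only the event chosen above. */
void
model_reserve_reset(model_rw_lock_t* lock, enum model_req req)
{
    *model_wait_target(lock, req) = 0;
}

/* rw_lock_{s,x}_unlock_func() with waiters: set both events,
   wait_ex_event first. */
void
model_release_with_waiters(model_rw_lock_t* lock)
{
    lock->wait_ex_event_set = 1;
    lock->event_set = 1;
}
```

Because only the WAIT_EX writer ever resets wait_ex_event, a later reset of lock->event by another waiter cannot erase the release signal the writer is about to wait for, which is the Windows half of LEMMA 2.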
diff --git a/innobase/sync/sync0sync.c b/innobase/sync/sync0sync.c
index 25b7a5588d9..c98e38d5f27 100644
--- a/innobase/sync/sync0sync.c
+++ b/innobase/sync/sync0sync.c
@@ -95,17 +95,47 @@ have happened that the thread which was holding the mutex has just released
 it and did not see the waiters byte set to 1, a case which would lead the
 other thread to an infinite wait.
 
-LEMMA 1: After a thread resets the event of the cell it reserves for waiting
-========
-for a mutex, some thread will eventually call sync_array_signal_object with
-the mutex as an argument. Thus no infinite wait is possible.
+LEMMA 1: After a thread resets the event of a mutex (or rw_lock), some
+=======
+thread will eventually call os_event_set() on that particular event.
+Thus no infinite wait is possible in this case.
 
 Proof:	After making the reservation the thread sets the waiters field in the
 mutex to 1. Then it checks that the mutex is still reserved by some thread,
 or it reserves the mutex for itself. In any case, some thread (which may be
 also some earlier thread, not necessarily the one currently holding the mutex)
 will set the waiters field to 0 in mutex_exit, and then call
-sync_array_signal_object with the mutex as an argument. 
+os_event_set() with the mutex as an argument. 
+Q.E.D.
+
+LEMMA 2: If an os_event_set() call is made after some thread has called
+=======
+the os_event_reset() and before it starts waiting on that event, the call
+will not be lost to the second thread. This is true even if there is an
+intervening call to os_event_reset() by another thread.
+Thus no infinite wait is possible in this case.
+
+Proof (non-windows platforms): os_event_reset() returns a monotonically
+increasing value of signal_count. This value is increased at every
+call of os_event_set(). If thread A has called os_event_reset() followed
+by thread B calling os_event_set() and then some other thread C calling
+os_event_reset(), the is_set flag of the event will be set to FALSE;
+but now if thread A calls os_event_wait_low() with the signal_count
+value returned from the earlier call of os_event_reset(), it will
+return immediately without waiting.
+Q.E.D.
+
+Proof (windows): If there is a writer thread which is forced to wait for
+the lock, it may be able to set the state of rw_lock to RW_LOCK_WAIT_EX.
+The design of rw_lock ensures that there is one and only one thread
+that is able to change the state to RW_LOCK_WAIT_EX and this thread is
+guaranteed to acquire the lock after it is released by the current
+holders and before any other waiter gets the lock.
+On windows this thread waits on a separate event i.e.: wait_ex_event.
+Since only one thread can wait on this event there is no chance
+of this event getting reset before the writer starts waiting on it.
+Therefore, this thread is guaranteed to catch the os_event_set()
+signalled unconditionally at the release of the lock.
 Q.E.D. */
 
 ulint	sync_dummy			= 0;
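LEMMA 1 rests on an ordering handshake: a waiter publishes waiters = 1 and then re-checks the lock, while the releasing thread clears the lock word and then reads waiters. The following C sketch models that handshake with C11 sequentially consistent atomics, which supply the store-then-load ordering that the sync0sync.ic comment worries mutex_exit() might lack. It is a hypothetical model, not InnoDB's mutex code: the names are illustrative, cell reservation and event reset are left out, and `signals` merely counts where os_event_set() would be called.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of the waiters-flag handshake in LEMMA 1. */
typedef struct {
    atomic_int lock_word;   /* 1 while the mutex is held */
    atomic_int waiters;     /* 1 once a thread has announced a wait */
    int        signals;     /* counts where os_event_set() would run */
} model_mutex_t;

void
model_mutex_init(model_mutex_t* m)
{
    atomic_init(&m->lock_word, 0);
    atomic_init(&m->waiters, 0);
    m->signals = 0;
}

int
model_try_enter(model_mutex_t* m)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&m->lock_word, &expected, 1);
}

/* Waiter slow path, after reserving a cell and resetting the event:
   publish waiters = 1, then re-check. Returns 1 if the mutex was
   acquired (no sleep needed), 0 if the thread must wait. */
int
model_prepare_wait(model_mutex_t* m)
{
    atomic_store(&m->waiters, 1);
    return model_try_enter(m);
}

/* mutex_exit(): release first, then look for announced waiters. With
   seq_cst atomics, the waiter's re-check and this load cannot both
   miss each other's store. */
void
model_exit(model_mutex_t* m)
{
    atomic_store(&m->lock_word, 0);
    if (atomic_load(&m->waiters)) {
        atomic_store(&m->waiters, 0);
        m->signals++;
    }
}
```

Either the waiter's re-check finds the mutex free and takes it, or the releaser sees waiters == 1 and signals; with weaker ordering both could miss, which is the residual case the once-per-second sync_arr_wake_threads_if_sema_free() sweep is there to catch.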