diff options
Diffstat (limited to 'src/lock/Design')
-rw-r--r-- | src/lock/Design | 301 |
1 files changed, 301 insertions, 0 deletions
diff --git a/src/lock/Design b/src/lock/Design new file mode 100644 index 00000000..f82bc7e8 --- /dev/null +++ b/src/lock/Design @@ -0,0 +1,301 @@ +Synchronization in the Locking Subsystem + +This is a document that describes how we implemented fine-grain locking +in the lock manager (that is, locking on a hash bucket level instead of +locking the entire region). We found that the increase in concurrency +was not sufficient to warrant the increase in complexity or the additional +cost of performing each lock operation. Therefore, we don't use this +any more. Should we have to do fine-grain locking in a future release, +this would be a reasonable starting point. + +=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= +1. Data structures +=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= + +The lock manager maintains 3 different structures: + +Objects (__db_lockobj): + Describes an object that is locked. When used with DB, this consists + of a __db_ilock (a file identifier and a page number). + +Lockers (__db_locker): + Identifies a specific locker ID and maintains the head of a list of + locks held by a locker (for using during transaction commit/abort). + +Locks (__db_lock): + Describes a particular object lock held on behalf of a particular + locker id. + +Objects and Lockers reference Locks. + +These structures are organized via two synchronized hash tables. Each +hash table consists of two physical arrays: the array of actual hash +buckets and an array of mutexes so we can lock individual buckets, rather +than the whole table. + +One hash table contains Objects and the other hash table contains Lockers. +Objects contain two lists of locks, waiters and holders: holders currently +hold a lock on the Object, waiters are lock waiting to be granted. +Lockers are a single linked list that connects the Locks held on behalf +of the specific locker ID. + +In the diagram below: + +Locker ID #1 holds a lock on Object #1 (L1) and Object #2 (L5), and is +waiting on a lock on Object #1 (L3). + +Locker ID #2 holds a lock on Object #1 (L2) and is waiting on a lock for +Object #2 (L7). + +Locker ID #3 is waiting for a lock on Object #2 (L6). + + OBJECT ----------------------- + HASH | | + ----|------------- | + ________ _______ | | ________ | | + | |-->| O1 |--|---|-->| O2 | | | + |_______| |_____| | | |______| V | + | | W H--->L1->L2 W H--->L5 | holders + |_______| | | | | V + | | ------->L3 \ ------->L6------>L7 waiters + |_______| / \ \ + . . / \ \ + . . | \ \ + . . | \ ----------- + |_______| | -------------- | + | | ____|____ ___|_____ _|______ + |_______| | | | | | | + | | | LID1 | | LID2 | | LID3 | + |_______| |_______| |_______| |______| + ^ ^ ^ + | | | + ___|________________________|________|___ + LOCKER | | | | | | | | | + HASH | | | | | | | | | + | | | | | | | | | + |____|____|____|____|____|____|____|____| + +=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= +2. Synchronization +=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= + +There are four types of mutexes in the subsystem. + +Object mutexes; + These map one-to-one to each bucket in the Object hash table. + Holding a mutex on an Object bucket secures all the Objects in + that bucket as well as the Lock structures linked from those + Objects. All fields in the Locks EXCEPT the Locker links (the + links that attach Locks by Locker ID) are protected by these + mutexes. + +Locker mutexes: + These map one-to-one to each bucket in the Locker hash table. + Holding a mutex on a Locker bucket secures the Locker structures + and the Locker links in the Locks. + +Memory mutex: + This mutex allows calls to allocate/free memory, i.e. calls to + __db_shalloc and __db_shalloc_free, as well as manipulation of + the Object, Locker and Lock free lists. + +Region mutex: + This mutex is currently only used to protect the locker ids. + It may also be needed later to provide exclusive access to + the region for deadlock detection. + +Creating or removing a Lock requires locking both the Object lock and the +Locker lock (and eventually the shalloc lock to return the item to the +free list). + +The locking hierarchy is as follows: + + The Region mutex may never be acquired after any other mutex. + + The Object mutex may be acquired after the Region mutex. + + The Locker mutex may be acquired after the Region and Object + mutexes. + + The Memory mutex may be acquired after any mutex. + +So, if both and Object mutex and a Locker mutex are going to be acquired, +the Object mutex must be acquired first. + +The Memory mutex may be acquired after any other mutex, but no other mutexes +can be acquired once the Memory mutex is held. + +=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= +3. The algorithms: +=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= +The locking subsystem supports four basic operations: + Get a Lock (lock_get) + + Release a Lock (lock_put) + + Release all the Locks on a specific Object (lock_vec) + + Release all the Locks for a specific Locker (lock_vec) + +Get a lock: + Acquire Object bucket mutex. + Acquire Locker bucket mutex. + + Acquire Memory mutex. + If the Object does not exist + Take an Object off the freelist. + If the Locker doesn't exist + Take a Locker off the freelist. + Take a Lock off the free list. + Release Memory mutex. + + Add Lock to the Object list. + Add Lock to the Locker list. + Release Locker bucket mutex + + If the lock cannot be granted + Release Object bucket mutex + Acquire lock mutex (blocks) + + Acquire Object bucket mutex + If lock acquisition did not succeed (e.g, deadlock) + Acquire Locker bucket mutex + If locker should be destroyed + Remove locker from hash table + Acquire Memory mutex + Return locker to free list + Release Memory mutex + Release Locker bucket mutex + + If object should be released + Acquire Memory mutex + Return object to free list + Release Memory mutex + + Release Object bucket mutex + +Release a lock: + Acquire Object bucket mutex. + (Requires that we be able to find the Object hash bucket + without looking inside the Lock itself.) + + If releasing a single lock and the user provided generation number + doesn't match the Lock's generation number, the Lock has been reused + and we return failure. + + Enter lock_put_internal: + if the Lock is still on the Object's lists: + Increment Lock's generation number. + Remove Lock from the Object's list (NULL link fields). + Promote locks for the Object. + + Enter locker_list_removal + Acquire Locker bucket mutex. + If Locker doesn't exist: + Release Locker bucket mutex + Release Object bucket mutex + Return error. + Else if Locker marked as deleted: + dont_release = TRUE + Else + Remove Lock from Locker list. + If Locker has no more locks + Remove Locker from table. + Acquire Memory mutex. + Return Locker to free list + Release Memory mutex + Release Locker bucket mutex. + Exit locker_list_removal + + If (!dont_release) + Acquire Memory mutex + Return Lock to free list + Release Memory mutex + + Exit lock_put_internal + + Release Object bucket mutex + +Release all the Locks on a specific Object (lock_vec, DB_PUT_ALL_OBJ): + + Acquire Object bucket mutex. + + For each lock on the waiter list: + lock_put_internal + For each lock on the holder list: + lock_put_internal + + Release Object bucket mutex. + +Release all the Locks for a specific Locker (lock_vec, DB_PUT_ALL): + + Acquire Locker bucket mutex. + Mark Locker deleted. + Release Locker mutex. + + For each lock on the Locker's list: + Remove from locker's list + (The lock could get put back on the free list in + lock_put and then could get reallocated and the + act of setting its locker links could clobber us.) + Perform "Release a Lock" above: skip locker_list_removal. + + Acquire Locker bucket mutex. + Remove Locker + Release Locker mutex. + + Acquire Memory mutex + Return Locker to free list + Release Memory mutex + +Deadlock detection (lock_detect): + + For each bucket in Object table + Acquire the Object bucket mutex. + create waitsfor + + For each bucket in Object table + Release the Object mutex. + +=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= +FAQ: +=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= +Q: Why do you need generation numbers? +A: If a lock has been released due to a transaction abort (potentially in a + different process), and then lock is released by a thread of control + unaware of the abort, the lock might have potentially been re-allocated + to a different object. The generation numbers detect this problem. + + Note, we assume that reads/writes of lock generation numbers are atomic, + if they are not, it is theoretically possible that a re-allocated lock + could be mistaken for another lock. + +Q: Why is is safe to walk the Locker list without holding any mutexes at + all? +A: Locks are created with both the Object and Locker bucket mutexes held. + Once created, they removed in two ways: + + a) when a specific Lock is released, in which case, the Object and + Locker bucket mutexes are again held, and + + b) when all Locks for a specific Locker Id is released. + + In case b), the Locker bucket mutex is held while the Locker chain is + marked as "destroyed", which blocks any further access to the Locker + chain. Then, each individual Object bucket mutex is acquired when each + individual Lock is removed. + +Q: What are the implications of doing fine grain locking? + +A: Since we no longer globally lock the entire region, lock_vec will no + longer be atomic. We still execute the items in a lock_vec in order, + so things like lock-coupling still work, but you can't make any + guarantees about atomicity. + +Q: How do I configure for FINE_GRAIN locking? + +A: We currently do not support any automatic configuration for FINE_GRAIN + locking. When we do, will need to document that atomicity discussion + listed above (it is bug-report #553). + +Copyright (c) 2010, 2012 Oracle and/or its affiliates. All rights reserved. |