Commit message | Author | Age | Files | Lines
* WT-3329 Visit trees using a tiny fraction of cache. (#3442) [mongodb-3.4.9] | Michael Cahill | 2017-09-07 | 3 | -31/+8
|   For workloads where no tree takes up a large enough fraction of cache, we were using a randomized approach to deciding when eviction should visit trees. That led to slow performance for workloads with uniform updates over thousands of trees.
|   (cherry picked from commit 2f1ec98512010f6c92bf27a41180e3c8704b54c8)
|   Signed-off-by: Alex Gorrod <alexander.gorrod@mongodb.com>
* WT-3438 Don't tune eviction thread count when the count is fixed (#3519) | David Hows | 2017-09-07 | 1 | -0/+7
|   (cherry picked from commit 6173a98979715ed727c432c1a31da64ea8a37048)
|   Signed-off-by: Alex Gorrod <alexander.gorrod@mongodb.com>
* WT-3499 Add a visibility rwlock between transactions and checkpoints. (#3575) | sueloverso | 2017-08-15 | 3 | -1/+33
|   - WT-3499 Add a visibility rwlock between transactions and checkpoints.
|   - Typo
|   - Just acquire/release the lock immediately for synchronization.
|   (cherry picked from commit 80c6cee91faf84c4772a98c40e60d0a6890ccb52)
* WT-3471 Sweep the table cache after schema changes. (#3551) | Michael Cahill | 2017-08-03 | 4 | -0/+40
|   During WT_SESSION::reset, if there has been a schema change (such as a WT_SESSION::drop operation) since the last sweep, do a pass through the table cache and remove any obsolete table handles.
* WT-3373 Access violation due to a bug in internal page splitting (#3478) [mongodb-3.4.8] [mongodb-3.4.7] [mongodb-3.4.6] | Keith Bostic | 2017-06-27 | 1 | -3/+29
|   When acquiring a lock on our parent internal page, we use the WT_REF.home field to reference our parent page. As a child of the parent page, we prevent its eviction, but that's a weak guarantee. If the parent page splits, and our WT_REF were to move with the split, the WT_REF.home field might change underneath us and we could race, and end up attempting to access an evicted page.
|   Set the session page-index generation so if the parent splits, it still can't be evicted.
* WT-3331 Get time into a local variable so we can read and use a consistent time (#3430) | sueloverso | 2017-06-19 | 2 | -8/+21
|
* WT-3219 Make the clang-analyzer job fail when lint is introduced (#3400) | Keith Bostic | 2017-06-19 | 4 | -6/+19
|   Quiet the four remaining clang-analyzer complaints.
* WT-3297 support the gcc/clang -fvisibility=hidden flag (#3404) | Keith Bostic | 2017-06-19 | 9 | -32/+73
|
* WT-3327 Check for system clock ticking backwards (#3427) | sueloverso | 2017-06-19 | 8 | -143/+178
|
* WT-3362 Checkpoints shouldn't block drops. (#3459) | Michael Cahill | 2017-06-19 | 2 | -37/+55
|   Testing has uncovered another case where drops can spin trying to lock a checkpoint handle until a checkpoint completes. This change fixes that in two ways: attempting to lock (but not open) a handle won't spin, and drop will always attempt to lock the live tree before locking any checkpoint handles.
* WT-3369 WT_CURSOR->uri should always match the URI used to open the cursor (#3464) | Don Anderson | 2017-06-19 | 3 | -3/+8
|
* WT-3356 Use atomic reads of rwlocks. (#3454) [mongodb-3.4.5] | Michael Cahill | 2017-06-07 | 1 | -30/+51
|   - WT-3356 Use atomic reads of rwlocks. Previously we had some conditions that checked several fields within a rwlock by indirecting to the live structure. Switch to always doing a read of the full 64-bit value, then using local reads from the copy. Otherwise, we're relying on the compiler and the memory model to order the structure accesses in "code execution order". That could explain assertion failures and/or incorrect behavior with the new rwlock implementation.
|   - Change all waits to 10ms. Previously when stalling waiting to get into the lock we would wait for 1ms, but once queued we waited forever. The former is probably too aggressive (burns too much CPU when we should be able to wait for a notification), and the latter is dangerous if a notification is ever lost (a thread with a ticket may never wake up).
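The "read the full 64-bit value, then use the local copy" idea from WT-3356 can be sketched as follows. This is a minimal illustration using C11 atomics; the `lock_word` layout, the field names, and `rwlock_is_idle` are hypothetical and are not WiredTiger's actual rwlock definition.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: four 16-bit counters packed into one 64-bit word. */
typedef union {
    uint64_t v;                     /* The whole lock word. */
    struct {
        uint16_t current;           /* Ticket now being served. */
        uint16_t next;              /* Next ticket to hand out. */
        uint16_t readers_active;    /* Readers holding the lock. */
        uint16_t readers_queued;    /* Readers waiting. */
    } s;
} lock_word;

typedef struct {
    _Atomic uint64_t word;
} rwlock;

/*
 * Take one atomic snapshot of the lock, then test only the snapshot.
 * Every field compared below comes from the same instant in time, so the
 * result cannot mix an old "current" with a newer "next".
 */
static bool
rwlock_is_idle(rwlock *l)
{
    lock_word snap;

    snap.v = atomic_load_explicit(&l->word, memory_order_acquire);
    return (snap.s.current == snap.s.next && snap.s.readers_active == 0);
}
```

Reading `l->word` field by field instead would issue several separate loads, and another thread could update the lock between them.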
* WT-3354 Fix bugs found by Coverity. (#3451) | Michael Cahill | 2017-06-07 | 4 | -7/+9
|   - WT-3354 Fix bugs found by Coverity.
|   - Two cases where error checking for rwlocks should goto the error label for cleanup.
|   - LSM code not restoring isolation if a checkpoint fails part way through.
|   - Take care with ordering an assertion after a read barrier. We just had an assertion failure on PPC, and from inspection it looks like the read in the assertion could be scheduled before the read that sees the ticket allocated. We have a read barrier in this path to protect against exactly that kind of thing happening to application data; move the assertion after it so our diagnostics are also safe.
* WT-3345 Tune WiredTiger's read/write locks. (#3446) | Michael Cahill | 2017-06-02 | 40 | -316/+565
|   - Add a workload that stresses rwlock performance under various conditions (including `threads >> cores`), tune read and write lock operations to only spin when it is likely to help, and to back off to a condition variable when there is heavy contention.
|   - New rwlock implementation: queue readers and writers separately, don't enforce fairness among readers or if the lock is overwhelmed.
|   - Switch to a spinlock whenever we need to lock a page. Previously we had a read/write lock in the __wt_page structure that was only ever acquired in write mode, plus a spinlock in the page->modify structure. Switch to using the spinlock for everything. One slight downside of this change is that we can no longer precisely determine whether a page is locked based on the status of the spinlock (since another page sharing the same lock could be holding it in the places where we used to check). Since that was only ever used for diagnostic / debugging purposes, I think the benefit of the change outweighs this issue.
|   - Fix a bug where a failure during `__wt_curfile_create` caused a data handle to be released twice. This is caught by the sanity checking assertions in the new read/write lock code.
|   - Split may be holding a page lock when restoring update. Tell the restore code we have the page exclusive and no further locking is required.
|   - Allocate a spinlock for each modified page. Using shared page locks for multiple operations that need to lock a page (including inserts and reconciliation) resulted in self-deadlock when the lookaside table was used. That's because reconciliation held a page lock, then caused inserts to the lookaside table, which acquired the page lock for a page in the lookaside table. With a shared set of page locks, they could both be the same lock. Switch (back?) to allocating a spinlock per modified page. Earlier in this ticket we saved some space in __wt_page, so growing __wt_page_modify is unlikely to be noticeable.
|   - Tweak padding and position of the spinlock in WT_PAGE_MODIFY to claw back some bytes. Move evict_pass_gen to the end of WT_PAGE: on inspection, it should be a cold field relative to the others, which now fit in one x86 cache line.
|   (cherry picked from commit 42daa132f21c1391ae2b2b3d789df85878aca471)
* WT-3293 Don't explicitly mark internal symbols hidden. (#3398) | Alex Gorrod | 2017-06-02 | 4 | -773/+770
|   It messes with external stack decoders (e.g., MongoDB's built-in heap profiling).
|   (cherry picked from commit 96ee1d3f21d434a6c4389a82092f570d211ad608)
* WT-3158 Fix structure layout on Windows. (#3417) [mongodb-3.5.8] | Keith Bostic | 2017-05-16 | 1 | -2/+3
|   Use awk instead of wc to get a count of lines; awk never includes whitespace in the output.
* WT-3158 Fix structure layout on Windows. (#3416) | Michael Cahill | 2017-05-16 | 2 | -5/+7
|   We use a pragma on Windows to force a struct to be packed, but were missing the "end" pragma that restores normal layout. The result was that most structs were being packed, leading to poor performance for workloads (particularly when accessing session structures).
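To illustrate the kind of mistake WT-3158 describes, here is a small sketch of a packing pragma paired with its matching "end" pragma. The `#pragma pack(push, 1)` / `#pragma pack(pop)` form is accepted by MSVC, gcc and clang; whether WiredTiger uses exactly this form is an assumption, and both struct names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Start packing: members of the structures that follow get 1-byte alignment. */
#pragma pack(push, 1)
struct packed_hdr {
    uint8_t  type;
    uint64_t offset;    /* Lands at byte 1, unaligned. */
};
#pragma pack(pop)       /* The "end" pragma: restore the default layout. */

/*
 * Declared after the pop, so it gets normal alignment and padding. If the
 * pop above were missing, this structure would silently be packed as well.
 */
struct session_like {
    uint8_t  flag;
    uint64_t counter;   /* Naturally aligned. */
};
```

Comparing `offsetof(struct session_like, counter)` with `offsetof(struct packed_hdr, offset)` is a quick way to check that the default layout really was restored.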
* WT-3271 Prevent integer overflow in eviction tuning. (#3379) [mongodb-3.5.7] [mongodb-3.5.6] [mongodb-3.4.4] | Michael Cahill | 2017-04-11 | 1 | -17/+19
|   (cherry picked from: 8f371403f0ccfae0188d7e4c2e6d629ade697b13)
* WT-3265 Allow eviction of recently split pages when tree is locked. (#3372) | Michael Cahill | 2017-04-08 | 1 | -1/+6
|   (cherry picked from commit: 84e6ac0e67019bba22af87b99b40bb0bc0e21157)
|   When pages split in WiredTiger, internal pages cannot be evicted immediately because there is a chance that a reader is still looking at an index pointing to the page. We check for this when considering pages for eviction, and assert that we never evict an internal page in an active generation.
|   However, if a page splits and then we try to get exclusive access to the tree (e.g., to verify it), we could fail to evict the tree from cache even though we have guaranteed exclusive access to it.
|   Relax the check on internal pages to allow eviction from trees that are locked exclusive.
* WT-3262 Don't check if the cache is full when accessing metadata. (#3376) | Michael Cahill | 2017-04-08 | 1 | -6/+11
|   Also don't check for a full cache while holding the table lock (we're likely reading the metadata in that case, just being extra careful).
* Merge commit 'adbe2ec' into mongodb-3.6 | Alex Gorrod | 2017-04-06 | 4 | -16/+22
|\
| * WT-3249 Look at slot_state during force while holding lock. (#3365) | sueloverso | 2017-04-04 | 3 | -15/+21
| |   We could race an in-progress switch that set a new, empty active slot but has not yet released the previously active slot and get an incorrect LSN.
| * WT-3254 Fix typo in reconfig string (#3366) | sueloverso | 2017-04-04 | 1 | -1/+1
| |
* | Merge branch 'develop' into mongodb-3.6 | Alex Gorrod | 2017-04-04 | 2 | -35/+19
|\ \
| |/
| * WT-3250 Have one function initializing the WT portion of the spinlock. (#3364) | sueloverso | 2017-04-03 | 2 | -35/+19
| |   Unify spinlock structures.
* | Merge branch 'develop' into mongodb-3.6 | Alex Gorrod | 2017-04-04 | 1 | -0/+3
|\ \
| |/
| * WT-3250 Fix spinlock statistics tracking on Windows. (#3363) | Michael Cahill | 2017-04-03 | 1 | -0/+3
| |   A MongoDB user on Windows noticed the "LSM: application work units currently queued" statistic was changing in a configuration that involved no LSM code. This was caused by a bug in code that tracks time spent in spinlocks incrementing the wrong statistic.
| |   In particular, spinlocks contain fields describing which statistics should be used to track time spent in that spinlock. A value of -1 indicates that the spinlock should not be tracked, but a value of zero is the first statistic in the array for a connection, which happens to be the "LSM: application work units currently queued" statistic. The Windows implementation of spinlocks was not setting these fields to -1, leading to the bug.
| |   This bug was introduced by WT-2955 and also meant that every WiredTiger spinlock on Windows was being timed, which may have negatively impacted Windows performance.
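The failure mode described in WT-3250 (a zeroed field being read as "statistic 0" rather than "not tracked") can be reproduced in a few lines. The following is a sketch only: `spinlock`, `STAT_NONE`, `stats` and both functions are hypothetical stand-ins, not WiredTiger's types.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STAT_NONE (-1)          /* Hypothetical "do not track" marker. */

typedef struct {
    int     locked;
    int16_t stat_usecs_off;     /* Index into stats[], or STAT_NONE. */
} spinlock;

/* stats[0] stands in for an unrelated statistic that happens to be first. */
static int64_t stats[4];

static void
spinlock_init(spinlock *l)
{
    memset(l, 0, sizeof(*l));
    l->stat_usecs_off = STAT_NONE;      /* The step the buggy path skipped. */
}

static void
spinlock_record_wait(spinlock *l, int64_t usecs)
{
    /* A zero index is a valid slot, so only STAT_NONE disables tracking. */
    if (l->stat_usecs_off != STAT_NONE)
        stats[l->stat_usecs_off] += usecs;
}
```

A lock that is merely zero-filled, without the explicit `STAT_NONE` assignment, charges its wait time to `stats[0]`, which matches the symptom reported in the commit message.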
* | Merge branch 'develop' into mongodb-3.6 | Alex Gorrod | 2017-04-01 | 131 | -1909/+2920
|\ \
| |/
| * WT-3243 Reorder log slot release so joins don't wait on IO (#3360) | sueloverso | 2017-03-31 | 7 | -192/+221
| |
| * WT-3190 perform a complete re-tune of eviction workers every 30 seconds. (#3324) | Alexandra (Sasha) Fedorova | 2017-03-30 | 5 | -201/+250
| |   Otherwise the number of workers wouldn't adjust when the workload changed.
| * WT-2439 Enhance reconciliation page layout (#3358) | Keith Bostic | 2017-03-30 | 7 | -479/+591
| |   - Set minimum split pct to 50.
| |   - The leaf-page value dictionary stores cell offsets in the disk image, which implies a dictionary reset any time we hit a boundary or grow the disk image buffer. Recent changes broke that: we weren't resetting the dictionary when the disk image buffer was resized. Instead of clearing the dictionary on buffer resize, switch to using cell offsets in the dictionary instead of cell pointers. It's unlikely to be a big win for many workloads, but it might help some, and it's cleaner than resetting the dictionary more often.
| |     Add a verify of disk images we don't write: the I/O routines verify any image we write, but we need to verify any image we create.
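The reason offsets survive a buffer resize while pointers do not can be shown with a small growable buffer. This is a generic sketch of the technique the WT-2439 message names (cell offsets instead of cell pointers); `image`, `image_append` and `image_at` are hypothetical and unrelated to WiredTiger's reconciliation code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint8_t *mem;       /* Image buffer; realloc may move it. */
    size_t   len, cap;
} image;

/*
 * Append bytes, growing the buffer if needed. Returns the offset of the
 * copy within the image, or (size_t)-1 if allocation fails.
 */
static size_t
image_append(image *img, const void *data, size_t n)
{
    if (img->len + n > img->cap) {
        size_t ncap = (img->cap == 0) ? 16 : img->cap * 2;
        while (ncap < img->len + n)
            ncap *= 2;
        uint8_t *p = realloc(img->mem, ncap);
        if (p == NULL)
            return ((size_t)-1);
        img->mem = p;
        img->cap = ncap;
    }
    size_t off = img->len;
    memcpy(img->mem + off, data, n);
    img->len += n;
    return (off);
}

/* Resolve a saved offset against the buffer's current address. */
static const uint8_t *
image_at(const image *img, size_t off)
{
    return (img->mem + off);
}
```

A dictionary that stores the `size_t` returned by `image_append` stays valid across any number of reallocations, whereas a saved `uint8_t *` would have to be discarded every time the buffer grows.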
| * WT-3155 Remove WT_CONN_SERVER_RUN flag (#3344) | Keith Bostic | 2017-03-29 | 12 | -58/+114
| |   Set WT_CONN_CLOSING earlier in the connection close process (before calling the async close functions). This requires removing the assert in btree handle open that close hasn't yet been called. Add a barrier after setting the connection close flag to ensure the write is flushed.
| |   LSM workers checked both the WT_CONN_SERVER_RUN and WT_LSM_WORKER_RUN flags because the LSM destroy path (__lsm_manager_worker_shutdown) didn't clear the WT_LSM_WORKER_RUN flag. Add that clear, change __lsm_worker to only check WT_LSM_WORKER_RUN.
| |   Previously, the LSM manager checked the WT_CONN_SERVER_RUN flag in the LSM destroy path and connection shutdown waited on the LSM manager to stop and clear WT_CONN_SERVER_LSM. Flip that process: the LSM shutdown path now clears WT_CONN_SERVER_LSM, and the LSM manager stops when it sees WT_CONN_SERVER_LSM is cleared. The LSM manager sets a new flag, WT_LSM_MANAGER_SHUTDOWN, when it's stopped, and the shutdown process waits on that new flag.
| |   Add memory barriers to the thread create and join functions. WiredTiger typically sets (clears) state and expects threads to see the state and start (stop). It's simpler and safer if we imply a barrier in the thread API.
| |   - Rename WT_CONN_LOG_SERVER_RUN to WT_CONN_SERVER_LOG to match the other server flags.
| |   - Once the async and LSM servers have exited, assert no more files are opened.
| |   - Instead of using a barrier to ensure the worker run state isn't cached, declare the structure field volatile. Use a stand-alone structure field instead of a set of flags; it's a simpler "volatile" story.
| |   - In one of two places, when shutting down worker threads, we signalled the condition variable to wake the worker thread. For consistency, remove the signal (we're only sleeping for a 100th of a second, the wake isn't buying us anything).
| |   - Restore the assertion in __open_session() that we're not in the "closing" path. Returning an error is more dangerous: it might cause a thread to panic, and then we have a panic racing with the close.
| |   - A wt_thread_t (POSIX pthread_t) is an opaque type, and can't be assigned to 0 or tested against an integral value portably. Add a bool WT_LSM_WORKER_ARGS.tid_set field instead of assigning or testing the wt_thread_t. We already have a __wt_lsm_start function; add a __wt_lsm_stop function and move the setting/clearing of the WT_LSM_WORKER_ARGS.{running,tid_set} fields into those functions so we ensure the ordering is correct.
| * WT-3208 Don't count page rewrites as eviction making progress. (#3356) | Michael Cahill | 2017-03-29 | 4 | -9/+39
| |
| * WT-3244 Turn off in-memory cache-full checks on the metadata file (#3359) | Keith Bostic | 2017-03-29 | 1 | -0/+8
| |   This avoids metadata operations failing in in-memory configurations.
| * Revert "WT-2439 Improve page layout: keep pages more than half full (#3277)" | Michael Cahill | 2017-03-29 | 7 | -532/+463
| |   This reverts commit 1c41c7735b3529521b7bd34180f80584caee7f59.
| * WT-2439 Improve page layout: keep pages more than half full (#3277) | Sulabh Mahajan | 2017-03-29 | 7 | -463/+532
| |   - Changes `split_pct` to have a minimum of 50%.
| * WT-3238 Java: Fix Cursor.compare and Cursor.equals to return int values. (#3355) | Don Anderson | 2017-03-29 | 4 | -1/+186
| |   Non-zero int values for these functions should not raise exceptions.
| * SERVER-28168 Cannot start or repair mongodb after unexpected shutdown. (#3353) | Keith Bostic | 2017-03-27 | 1 | -14/+20
| |   Panic if there's an error in reading/writing from/to the turtle file, there's no point in continuing. This change avoids user confusion when the turtle file is corrupted or zero'd out by the filesystem.
| * WT-3240 Coverity reports (#3354) | Keith Bostic | 2017-03-27 | 6 | -21/+25
| |   - WT-3240 Coverity reports. Coverity report 1373075: allocated memory is leaked if __wt_snprintf fails.
| |   - Coverity report 1373074: allocated memory is leaked if __wt_snprintf fails.
| |   - Coverity report 1373073: allocated memory is leaked if __wt_snprintf fails.
| |   - Coverity report 1373072: allocated memory is leaked if __wt_snprintf fails.
| |   - Coverity report 1373071: allocated memory is leaked if __wt_snprintf fails.
| |   - Coverity report 1369053: CID 1369053 (#1 of 1): Unused value (UNUSED_VALUE) assigned_pointer: Assigning value from "," to append_comma here, but that stored value is overwritten before it can be used.
| * WT-3207 Use config to determine checkpoint force value. (#3350) | sueloverso | 2017-03-27 | 1 | -1/+5
| |
| * WT-98 Update the current cursor value without a search | Keith Bostic | 2017-03-24 | 1 | -5/+5
| |   Revert "Change LSM WT_CURSOR.{compare,insert,update,remove} to accept an internal key instead of copying the key into WiredTiger-owned memory (in other words, replace WT_CURSOR_NEEDKEY calls with WT_CURSOR_CHECKKEY)."
| |   This reverts commit af2c787.
| * WT-3136 bug fix: WiredTiger doesn't check sprintf calls for error return (#3348) | Keith Bostic | 2017-03-24 | 1 | -1/+1
| |   Fix a typo.
| * WT-3136 bug fix: WiredTiger doesn't check sprintf calls for error return (#3347) | Keith Bostic | 2017-03-24 | 2 | -2/+10
| |   Add a style check for use of the snprintf/vsnprintf calls rather than the WiredTiger library replacements.
| |   Fix a wtperf snprintf call I missed.
| * WT-3136 bug fix: WiredTiger doesn't check sprintf calls for error return (#3340) | Keith Bostic | 2017-03-24 | 84 | -671/+893
| |   - WT-3136 bug fix: WiredTiger doesn't check sprintf calls for error return. Make a pass through the source base to check sprintf, snprintf, vsprintf and vsnprintf calls for errors.
| |   - A WiredTiger key is a uint64_t. Use sizeof(), don't hard-wire buffer sizes into the code.
| |   - More (u_int) vs. (uint64_t) fixes.
| |   - Use CONFIG_APPEND instead of FORMAT_APPEND, it makes more sense.
| |   - Revert part of 4475ae9, there's an explicit allocation of the size of the buffer.
| |   - MSVC complaints:
| |       test\format\config.c(765): warning C4018: '<': signed/unsigned mismatch
| |       test\format\config.c(765): warning C4018: '>': signed/unsigned mismatch
| |   - Change Windows testing shim to correctly use __wt_snprintf
| |   - Change Windows test shim to use the __wt_XXX functions
| |   - MSDN's _vscprintf API returns the number of characters excluding the terminating nul byte, return that value.
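Checking a formatted-print call for both failure modes (an encoding error and silent truncation) can be sketched as below. `checked_snprintf` is a hypothetical helper written for illustration; it is not the signature or behavior of WiredTiger's `__wt_snprintf`, and the choice of `EINVAL` and `ERANGE` as return codes is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format into buf. Returns 0 on success, EINVAL if vsnprintf reports an
 * error, and ERANGE if the output did not fit and was truncated.
 */
static int
checked_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);

    if (n < 0)                  /* Output or encoding error. */
        return (EINVAL);
    if ((size_t)n >= size)      /* Would have needed n + 1 bytes. */
        return (ERANGE);
    return (0);
}
```

Callers then treat the result like any other error return, for example `if ((ret = checked_snprintf(buf, sizeof(buf), "%d", id)) != 0) goto err;`, which also gives a single place to free memory allocated before the call.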
| * WT-98 Update the current cursor value without a search (#3346) | Keith Bostic | 2017-03-24 | 1 | -43/+43
| |   - WT-98 Update the current cursor value without a search. When running in-memory and insert/update fails, we should expect WT_ROLLBACK even when not running inside a transaction.
| |   - Order the operations alphabetically (they were ordered the way they were because of the order in which we used to choose operations, but that's no longer the case).
| * WT-98 Update the current cursor value without a search (#3330) | Keith Bostic | 2017-03-24 | 26 | -365/+663
| |
* | Merge branch 'develop' into mongodb-3.6 | Michael Cahill | 2017-03-24 | 136 | -1570/+2577
|\ \
| |/
| * WT-3228 Remove with overwrite shouldn't return WT_NOTFOUND (#3339) | Keith Bostic | 2017-03-24 | 3 | -26/+59
| |   - Table cursors with overwrite configured wrongly treat not-found as an error, return success instead.
| |   - The LSM code clears WT_CURSTD_KEY_SET on unsuccessful searches, which breaks table cursors with indices doing searches on the set of cursors in order to delete old index keys, because there's no key set when it's time to do the update.
| * WT-3234 Update WiredTiger build for clang 4.0. (#3345) | Keith Bostic | 2017-03-24 | 2 | -18/+4
| |   - Update WiredTiger build for clang 4.0.
| |       ex_all.c:852:7: error: possible misuse of comma operator here [-Werror,-Wcomma]
| |           p1++, p2++;
| |           ^
| |       ex_all.c:852:3: note: cast expression to void to silence warning
| |           p1++, p2++;
| |           ^~~~
| |           (void)( )
| |       1 error generated.
| |   - wtperf.c:2670:4: error: code will never be executed [-Werror,-Wunreachable-code]
| |           pos += (size_t)snprintf(
| |           ^~~
| |       wtperf.c:2669:23: note: silence by adding parentheses to mark code as explicitly dead
| |           if (opts->verbose > 1 && strlen(debug_tconfig) != 0)
| |           ^
| |           /* DISABLES CODE */ ( )
| |       wtperf.c:2630:4: error: code will never be executed [-Werror,-Wunreachable-code]
| |           pos += (size_t)snprintf(
| |           ^~~
| |       wtperf.c:2629:23: note: silence by adding parentheses to mark code as explicitly dead
| |           if (opts->verbose > 1 && strlen(debug_cconfig) != 0)
| |           ^
| |           /* DISABLES CODE */ ( )
| |       2 errors generated.
| * SERVER-28194 Missing WiredTiger.turtle file loses data (#3337) | Keith Bostic | 2017-03-23 | 2 | -15/+13
| |   There's a two-step process on Windows to rename files (including the turtle file): remove the original and then move the replacement into place (a DeleteFileW followed by a MoveFileW). If we crash in the middle (and in SERVER-28194, it looks like there's a weirder failure mode, where the DeleteFileW succeeded, but the file was still there), we can be left without a turtle file, which will lose all of the data in the database.
| |   - Add the MOVEFILE_WRITE_THROUGH flag to the MoveFileEx call. If we somehow end up in a copy-then-delete path, that flag adds a disk flush after the copy phase, so the window of vulnerability is as short as possible.
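A single-call replacement of one file by another, with the write-through flag the SERVER-28194 message mentions, might look like the sketch below. `replace_file` is a hypothetical wrapper; combining `MOVEFILE_REPLACE_EXISTING` with `MOVEFILE_WRITE_THROUGH` is an assumption made for this example, since the commit message only names the latter. On non-Windows builds the sketch falls back to POSIX `rename`, which replaces an existing target atomically.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Move "from" over "to" in one call. Returns 0 on success, -1 on failure. */
static int
replace_file(const char *from, const char *to)
{
#ifdef _WIN32
    /*
     * MOVEFILE_REPLACE_EXISTING avoids a separate delete step;
     * MOVEFILE_WRITE_THROUGH makes the call wait until the move is on disk,
     * including the copy phase of a copy-then-delete move.
     */
    return (MoveFileExA(from, to,
        MOVEFILE_REPLACE_EXISTING | MOVEFILE_WRITE_THROUGH) ? 0 : -1);
#else
    return (rename(from, to) == 0 ? 0 : -1);
#endif
}
```

The point of doing the replacement in one call is that there is no moment at which neither the old nor the new file exists under the target name.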