| Commit message (Collapse) | Author | Age | Files | Lines |
|
| |
For workloads where no tree takes up a large enough fraction of cache,
we were using a randomized approach to deciding when eviction should
visit trees. That led to slow performance for workloads with uniform
updates over thousands of trees.
(cherry picked from commit 2f1ec98512010f6c92bf27a41180e3c8704b54c8)
Signed-off-by: Alex Gorrod <alexander.gorrod@mongodb.com>
|
|
|
|
|
| |
(cherry picked from commit 6173a98979715ed727c432c1a31da64ea8a37048)
Signed-off-by: Alex Gorrod <alexander.gorrod@mongodb.com>
|
| |
* WT-3499 Add a visibility rwlock between transactions and checkpoints.
* Typo
* Just acquire/release the lock immediately for synchronization.
(cherry picked from commit 80c6cee91faf84c4772a98c40e60d0a6890ccb52)
|
|
|
| |
During WT_SESSION::reset, if there has been a schema change (such as a WT_SESSION::drop operation) since the last sweep, do a pass through the table cache and remove any obsolete table handles.
|
| |
When acquiring a lock on our parent internal page, we use the WT_REF.home
field to reference our parent page. As a child of the parent page, we
prevent its eviction, but that's a weak guarantee. If the parent page
splits, and our WT_REF were to move with the split, the WT_REF.home field
might change underneath us and we could race, and end up attempting to
access an evicted page. Set the session page-index generation so if the
parent splits, it still can't be evicted.
|
|
|
|
|
|
| |
time (#3430)
|
|
|
| |
Quiet the four remaining clang-analyzer complaints.
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
Testing has uncovered another case where drops can spin trying to lock a
checkpoint handle until a checkpoint completes. This change fixes that
in two ways: attempting to lock (but not open) a handle won't spin, and
drop will always attempt to lock the live tree before locking any
checkpoint handles.
|
|
|
|
| |
(#3464)
|
| |
* WT-3356 Use atomic reads of rwlocks.
Previously we had some conditions that checked several fields within a rwlock by indirecting to the live structure. Switch to always doing a read of the full 64-bit value, then using local reads from the copy.
Otherwise, we're relying on the compiler and the memory model to order the structure accesses in "code execution order". That could explain assertion failures and/or incorrect behavior with the new rwlock implementation.
* Change all waits to 10ms.
Previously when stalling waiting to get into the lock we would wait for 1ms, but once queued we waited forever. The former is probably too aggressive (burns too much CPU when we should be able to wait for a notification), and the latter is dangerous if a notification is ever lost (a thread with a ticket may never wake up).
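A minimal sketch of the "read the full 64-bit value, then use the local copy" idea, using C11 atomics. The union layout, field names and the `demo_` functions are hypothetical and are not the actual WiredTiger rwlock definition.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock state: every field lives in one 64-bit word. */
typedef union {
    uint64_t v;                     /* Full 64-bit value. */
    struct {
        uint16_t current;           /* Ticket currently being served. */
        uint16_t next;              /* Next ticket to hand out. */
        uint16_t readers_active;    /* Readers holding the lock. */
        uint16_t readers_queued;    /* Readers waiting for the lock. */
    } s;
} demo_lock_state_t;

static_assert(sizeof(demo_lock_state_t) == sizeof(uint64_t),
    "lock state must fit in a single atomic word");

typedef struct {
    _Atomic uint64_t word;          /* The live, shared lock word. */
} demo_rwlock_t;

/* Take one atomic snapshot of the live lock. */
static demo_lock_state_t
demo_rwlock_snapshot(demo_rwlock_t *l)
{
    demo_lock_state_t copy;

    copy.v = atomic_load_explicit(&l->word, memory_order_acquire);
    return (copy);
}

/*
 * Check several fields, but only through the local copy, so the fields
 * are guaranteed to come from the same instant.
 */
static bool
demo_rwlock_idle(demo_rwlock_t *l)
{
    demo_lock_state_t old = demo_rwlock_snapshot(l);

    return (old.s.current == old.s.next &&
        old.s.readers_active == 0 && old.s.readers_queued == 0);
}
```

Checking `l->s.current` and `l->s.next` directly on the shared structure would be two separate loads, and another thread could change the lock between them.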
|
| |
* WT-3354 Fix bugs found by Coverity.
* two cases where error checking for rwlocks should goto the error label for cleanup.
* LSM code not restoring isolation if a checkpoint fails part way through
* Take care with ordering an assertion after a read barrier.
We just had an assertion failure on PPC, and from inspection it looks
like the read in the assertion could be scheduled before the read that
sees the ticket allocated. We have a read barrier in this path to
protect against exactly that kind of thing happening to application
data; move the assertion after it so our diagnostics are also safe.
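A sketch of the ordering issue, assuming a simplified ticket lock written with C11 atomics. The `owner` field and both functions are hypothetical and exist only to give the assertion something to check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    _Atomic uint16_t current;   /* Ticket currently being served. */
    _Atomic uint16_t owner;     /* Written by the granter before it advances current. */
} demo_ticket_lock_t;

/* Granting side: publish the owner, then release the new ticket. */
static void
demo_grant_ticket(demo_ticket_lock_t *l, uint16_t ticket)
{
    atomic_store_explicit(&l->owner, ticket, memory_order_relaxed);
    atomic_store_explicit(&l->current, ticket, memory_order_release);
}

/* Waiting side: spin until our ticket is served. */
static void
demo_wait_for_ticket(demo_ticket_lock_t *l, uint16_t ticket)
{
    while (atomic_load_explicit(&l->current, memory_order_relaxed) != ticket)
        ;

    /* Read barrier: later reads cannot be scheduled before the load above. */
    atomic_thread_fence(memory_order_acquire);

    /*
     * The diagnostic read goes after the barrier. Placed before it, a
     * weakly ordered CPU could satisfy this read early and see stale data.
     */
    assert(atomic_load_explicit(&l->owner, memory_order_relaxed) == ticket);
}
```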
|
| |
* Add a workload that stresses rwlock performance under various conditions (including `threads >> cores`), tune read and write lock operations to only spin when it is likely to help, and to back off to a condition variable when there is heavy contention.
* New rwlock implementation: queue readers and writers separately, don't enforce fairness among readers or if the lock is overwhelmed.
* Switch to a spinlock whenever we need to lock a page.
Previously we had a read/write lock in the __wt_page structure that was only ever acquired in write mode, plus a spinlock in the page->modify structure. Switch to using the spinlock for everything.
One slight downside of this change is that we can no longer precisely determine whether a page is locked based on the status of the spinlock (since another page sharing the same lock could be holding it in the places where we used to check). Since that was only ever used for diagnostic / debugging purposes, I think the benefit of the change outweighs this issue.
* Fix a bug where a failure during `__wt_curfile_create` caused a data handle to be released twice. This is caught by the sanity checking assertions in the new read/write lock code.
* Split may be holding a page lock when restoring update. Tell the restore code we have the page exclusive and no further locking is required.
* Allocate a spinlock for each modified page.
Using shared page locks for multiple operations that need to lock a page (including inserts and reconciliation) resulted in self-deadlock when the lookaside table was used. That's because reconciliation held a page lock, then caused inserts to the lookaside table, which acquired the page lock for a page in the lookaside table. With a shared set of page locks, they could both be the same lock.
Switch (back?) to allocating a spinlock per modified page. Earlier in this ticket we saved some space in __wt_page, so growing __wt_page_modify is unlikely to be noticeable.
* Tweak padding and position of the spinlock in WT_PAGE_MODIFY to claw back some bytes.
Move evict_pass_gen to the end of WT_PAGE: on inspection, it should be a cold field relative to the others, which now fit in one x86 cache line.
(cherry picked from commit 42daa132f21c1391ae2b2b3d789df85878aca471)
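The self-deadlock with shared page locks can be sketched as follows. The pool size, the modulo hash and the function names are assumptions for illustration; the message does not say how pages were mapped to shared locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_LOCKS 16  /* Hypothetical size of the shared lock pool. */

static_assert(DEMO_PAGE_LOCKS > 0, "lock pool must not be empty");

/* Shared scheme: each page borrows one lock from a fixed pool. */
static size_t
demo_shared_lock_slot(uint64_t page_id)
{
    return ((size_t)(page_id % DEMO_PAGE_LOCKS));
}

/*
 * A thread that holds page a's lock and then needs page b's lock blocks
 * on itself if both pages map to the same non-recursive spinlock.
 */
static bool
demo_would_self_deadlock(uint64_t a, uint64_t b)
{
    return (a != b && demo_shared_lock_slot(a) == demo_shared_lock_slot(b));
}
```

With one spinlock per modified page, two distinct pages never share a lock, so reconciliation holding one page's lock cannot collide with an insert into a lookaside page.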
|
|
|
|
|
| |
It messes with external stack decoders (e.g., MongoDB's built-in heap profiling).
(cherry picked from commit 96ee1d3f21d434a6c4389a82092f570d211ad608)
|
|
|
|
| |
Use awk instead of wc to get a count of lines: awk never includes
whitespace in the output.
|
|
|
|
|
|
|
| |
We use a pragma on Windows to force a struct to be packed, but were
missing the "end" pragma that restores normal layout. The result was
that most structs were being packed, leading to poor performance for
workloads (particularly when accessing session structures).
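A small sketch of the begin/end pairing, using the `#pragma pack(push, 1)` / `#pragma pack(pop)` form accepted by MSVC, GCC and clang. The struct names are hypothetical, and the message does not show the exact pragma or macro WiredTiger uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)       /* Begin packed layout. */
struct demo_packed_hdr {
    uint8_t type;
    uint32_t len;           /* No padding before this field. */
};
#pragma pack(pop)           /* The "end" pragma: restore normal layout. */

/* Declared after the pop, so it gets natural alignment again. */
struct demo_normal_hdr {
    uint8_t type;
    uint32_t len;           /* Padded out to its natural alignment. */
};

static_assert(sizeof(struct demo_packed_hdr) < sizeof(struct demo_normal_hdr),
    "packing must only apply to the struct between push and pop");
```

Without the `pop`, every struct declared later in the translation unit would be laid out like `demo_packed_hdr`, with misaligned fields.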
|
|
|
|
| |
(cherry picked from: 8f371403f0ccfae0188d7e4c2e6d629ade697b13)
|
| |
(cherry picked from commit: 84e6ac0e67019bba22af87b99b40bb0bc0e21157)
When pages split in WiredTiger, internal pages cannot be evicted
immediately because there is a chance that a reader is still looking at
an index pointing to the page. We check for this when considering pages
for eviction, and assert that we never evict an internal page in an
active generation.
However, if a page splits and then we try to get exclusive access to
the tree (e.g., to verify it), we could fail to evict the tree from
cache even though we have guaranteed exclusive access to it.
Relax the check on internal pages to allow eviction from trees that are
locked exclusive.
|
|
|
| |
Also don't check for a full cache while holding the table lock (we're likely reading the metadata in that case, just being extra careful).
|
|\ |
|
| |
| |
| |
| |
| |
| | |
We could race an in-progress switch that set a new, empty active slot
but has not yet released the previously active slot and get an
incorrect LSN.
|
| | |
|
|\ \
| |/ |
|
| |
| |
| |
| | |
Unify spinlock structures.
|
|\ \
| |/ |
|
| |
| |
| |
| |
| |
| |
| | |
A MongoDB user on Windows noticed that the "LSM: application work units currently queued" statistic was changing in a configuration that involved no LSM code. This was caused by a bug in the code that tracks time spent in spinlocks, which was incrementing the wrong statistic.
In particular, spinlocks contain fields describing which statistics should be used to track time spent in that spinlock. A value of -1 indicates that the spinlock should not be tracked, but a value of zero is the first statistic in the array for a connection, which happens to be the "LSM: application work units currently queued" statistic. The Windows implementation of spinlocks was not setting these fields to -1, leading to the bug.
This bug was introduced by WT-2955. It also meant that every WiredTiger spinlock on Windows was being timed, which may have negatively impacted Windows performance.
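The failure mode can be sketched like this. All names are hypothetical, and the statistics array stands in for the connection statistics, with slot 0 playing the role of the first statistic.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_STAT_COUNT 4
#define DEMO_STAT_NONE (-1)     /* Sentinel: do not track this spinlock. */

typedef struct {
    int16_t stat_usecs_off;     /* Statistic index, or DEMO_STAT_NONE. */
} demo_spinlock_t;

static int64_t demo_stats[DEMO_STAT_COUNT];

/* Correct initialization: explicitly mark the lock as untracked. */
static void
demo_spin_init(demo_spinlock_t *l)
{
    memset(l, 0, sizeof(*l));
    l->stat_usecs_off = DEMO_STAT_NONE;
}

/* Charge elapsed time to the configured statistic, if there is one. */
static void
demo_spin_account(const demo_spinlock_t *l, int64_t usecs)
{
    if (l->stat_usecs_off == DEMO_STAT_NONE)
        return;
    assert(l->stat_usecs_off < DEMO_STAT_COUNT);
    demo_stats[l->stat_usecs_off] += usecs;
}
```

A lock that is only zero-filled keeps index 0, so its wait time is silently charged to whichever statistic happens to be first.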
|
|\ \
| |/ |
|
| | |
|
| |
| |
| |
| | |
Otherwise the number of workers wouldn't adjust when the workload changed.
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* Set minimum split pct to 50.
* The leaf-page value dictionary stores cell offsets in the disk image, which implies a dictionary reset any time we hit a boundary or grow the disk image buffer. Recent changes broke that: we weren't resetting the dictionary when the disk image buffer was resized.
Instead of clearing the dictionary on buffer resize, switch to using cell offsets in the dictionary instead of cell pointers. It's unlikely to be a big win for many workloads, but it might help some, and it's cleaner than resetting the dictionary more often.
Add a verify of disk images we don't write: the I/O routines verify any image we write, but we need to verify any image we create.
|
| | |
Set WT_CONN_CLOSING earlier in the connection close process (before
calling the async close functions). This requires removing the assert
in btree handle open that close hasn't yet been called. Add a barrier
after setting the connection close flag to ensure the write is flushed.
LSM workers checked both the WT_CONN_SERVER_RUN and WT_LSM_WORKER_RUN
flags because the LSM destroy path (__lsm_manager_worker_shutdown)
didn't clear the WT_LSM_WORKER_RUN flag. Add that clear, and change
__lsm_worker to only check WT_LSM_WORKER_RUN.
Previously, the LSM manager checked the WT_CONN_SERVER_RUN flag in the
LSM destroy path and connection shutdown waited on the LSM manager to
stop and clear WT_CONN_SERVER_LSM. Flip that process: the LSM shutdown
path now clears WT_CONN_SERVER_LSM, and the LSM manager stops when it
sees WT_CONN_SERVER_LSM is cleared. The LSM manager sets a new flag,
WT_LSM_MANAGER_SHUTDOWN, when it's stopped, and the shutdown process
waits on that new flag.
Add memory barriers to the thread create and join functions. WiredTiger
typically sets (clears) state and expects threads to see the state and
start (stop). It's simpler and safer if we imply a barrier in the thread
API.
* Rename WT_CONN_LOG_SERVER_RUN to WT_CONN_SERVER_LOG to match the other
server flags.
* Once the async and LSM servers have exited, assert no more files are
opened.
* Instead of using a barrier to ensure the worker run state isn't cached,
declare the structure field volatile. Use a stand-alone structure field
instead of a set of flags; it's a simpler "volatile" story.
* In one of two places, when shutting down worker threads, we signalled the
condition variable to wake the worker thread. For consistency, remove the
signal (we're only sleeping for a 100th of a second, so the wake isn't
buying us anything).
* Restore the assertion in __open_session() that we're not in the
"closing" path. Returning an error is more dangerous: it might
cause a thread to panic, and then we have a panic racing with the
close.
* A wt_thread_t (POSIX pthread_t) is an opaque type, and can't be assigned
to 0 or tested against an integral value portably. Add a
bool WT_LSM_WORKER_ARGS.tid_set field instead of assigning or testing the
wt_thread_t.
We already have an __wt_lsm_start function, add a __wt_lsm_stop function
and move the setting/clearing of the WT_LSM_WORKER_ARGS.{running,tid_set}
fields into those functions so we ensure the ordering is correct.
|
| | |
|
| |
| |
| | |
This avoids metadata operations failing in in-memory configurations.
|
| |
| |
| |
| | |
This reverts commit 1c41c7735b3529521b7bd34180f80584caee7f59.
|
| |
| |
| | |
* Changes `split_pct` to have a minimum of 50%.
|
| |
| |
| | |
Non-zero int values for these functions should not raise exceptions.
|
| |
| |
| |
| |
| | |
Panic if there's an error in reading/writing from/to the turtle file;
there's no point in continuing. This change avoids user confusion when
the turtle file is corrupted or zeroed out by the filesystem.
|
| | |
* WT-3240 Coverity reports
Coverity report 1373075: allocated memory is leaked if __wt_snprintf
fails.
* Coverity report 1373074: allocated memory is leaked if __wt_snprintf
fails.
* Coverity report 1373073: allocated memory is leaked if __wt_snprintf
fails.
* Coverity report 1373072: allocated memory is leaked if __wt_snprintf
fails.
* Coverity report 1373071: allocated memory is leaked if __wt_snprintf
fails.
* Coverity report 1369053: CID 1369053 (#1 of 1): Unused value
(UNUSED_VALUE) assigned_pointer: Assigning value from "," to
append_comma here, but that stored value is overwritten before
it can be used.
|
| | |
|
| |
| |
| |
| |
| | |
Revert "Change LSM WT_CURSOR.{compare,insert,update,remove} to accept an internal key instead of copying the key into WiredTiger-owned memory (in other words, replace WT_CURSOR_NEEDKEY calls with WT_CURSOR_CHECKKEY)."
This reverts commit af2c787.
|
| |
| |
| | |
Fix a typo.
|
| |
| |
| |
| |
| |
| | |
Add a style check for use of the snprintf/vsnprintf calls rather than
the WiredTiger library replacements.
Fix a wtperf snprintf call I missed.
|
| | |
* WT-3136 bug fix: WiredTiger doesn't check sprintf calls for error return
Make a pass through the source base to check sprintf, snprintf, vsprintf
and vsnprintf calls for errors.
* A WiredTiger key is a uint64_t.
Use sizeof(), don't hard-wire buffer sizes into the code.
* More (u_int) vs. (uint64_t) fixes.
* Use CONFIG_APPEND instead of FORMAT_APPEND, it makes more sense.
* Revert part of 4475ae9, there's an explicit allocation of the size of
the buffer.
* MSVC complaints:
test\format\config.c(765): warning C4018: '<': signed/unsigned mismatch
test\format\config.c(765): warning C4018: '>': signed/unsigned mismatch
* Change Windows testing shim to correctly use __wt_snprintf
* Change Windows test shim to use the __wt_XXX functions
* MSDN's _vscprintf API returns the number of characters excluding the
terminating nul byte; return that value.
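A minimal sketch of a checked formatting call, in the spirit of the pass described above. `demo_snprintf` and `demo_format_key` are hypothetical helpers, not the signature of WiredTiger's `__wt_snprintf`, and the choice of error codes is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * demo_snprintf --
 *     Format into buf; return 0 on success, EINVAL on a formatting
 *     error and ERANGE if the output did not fit.
 */
static int
demo_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int len;

    assert(buf != NULL && fmt != NULL);

    va_start(ap, fmt);
    len = vsnprintf(buf, size, fmt, ap);
    va_end(ap);

    if (len < 0)                /* Encoding or output error. */
        return (EINVAL);
    if ((size_t)len >= size)    /* Output was truncated. */
        return (ERANGE);
    return (0);
}

/* Format a 64-bit key; the caller passes sizeof(buf), not a magic number. */
static int
demo_format_key(char *buf, size_t size, uint64_t key)
{
    return (demo_snprintf(buf, size, "%010" PRIu64, key));
}
```

A caller would write `demo_format_key(keybuf, sizeof(keybuf), keyno)` and check the return value, rather than ignoring the result of a raw `snprintf`.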
|
| | |
* WT-98 Update the current cursor value without a search
When running in-memory and insert/update fails, we should expect
WT_ROLLBACK even when not running inside a transaction.
* Order the operations alphabetically (they were ordered the way they were
because of the order in which we used to choose operations, but that's no
longer the case).
|
| | |
|
|\ \
| |/ |
|
| |
| |
| |
| |
| | |
* Table cursors with overwrite configured wrongly treat not-found as an error, return success instead.
* The LSM code clears WT_CURSTD_KEY_SET on unsuccessful searches, which breaks table cursors with indices doing searches on the set of cursors in order to delete old index keys, because there's no key set when it's time to do the update.
|
| | |
* Update WiredTiger build for clang 4.0.
ex_all.c:852:7: error: possible misuse of comma operator here [-Werror,-Wcomma]
p1++, p2++;
^
ex_all.c:852:3: note: cast expression to void to silence warning
p1++, p2++;
^~~~
(void)( )
1 error generated.
* wtperf.c:2670:4: error: code will never be executed [-Werror,-Wunreachable-code]
pos += (size_t)snprintf(
^~~
wtperf.c:2669:23: note: silence by adding parentheses to mark code as explicitly dead
if (opts->verbose > 1 && strlen(debug_tconfig) != 0)
^
/* DISABLES CODE */ ( )
wtperf.c:2630:4: error: code will never be executed [-Werror,-Wunreachable-code]
pos += (size_t)snprintf(
^~~
wtperf.c:2629:23: note: silence by adding parentheses to mark code as explicitly dead
if (opts->verbose > 1 && strlen(debug_cconfig) != 0)
^
/* DISABLES CODE */ ( )
2 errors generated.
|
| |
| |
| |
| |
| |
| | |
There's a two-step process on Windows to rename files (including the turtle file): remove the original and then move the replacement into place -- a DeleteFileW followed by a MoveFileW. If we crash in the middle (and in SERVER-28194, it looks like there's a weirder failure mode, where the DeleteFileW succeeded, but the file was still there), we can be left without a turtle file, which will lose all of the data in the database.
* Add the MOVEFILE_WRITE_THROUGH flag to the MoveFileEx call. If we somehow end up in a copy-then-delete path, that flag adds a disk flush after the copy phase, so the window of vulnerability is as short as possible.
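A sketch of a single-call replacement, assuming the delete-then-move pair is swapped for one `MoveFileEx` call. `MOVEFILE_REPLACE_EXISTING` is an assumption, since the message only names `MOVEFILE_WRITE_THROUGH`. The ANSI variant `MoveFileExA` is used for brevity, and `demo_replace_file` is a hypothetical helper. The POSIX branch is there only so the sketch builds elsewhere.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/*
 * demo_replace_file --
 *     Move "from" over "to" in one call, so there is no window in which
 *     neither file exists. Returns 0 on success, -1 on failure.
 */
static int
demo_replace_file(const char *from, const char *to)
{
    assert(from != NULL && to != NULL);
#ifdef _WIN32
    /*
     * MOVEFILE_WRITE_THROUGH: if the move is done as a copy and delete,
     * flush to disk before the call returns.
     */
    return (MoveFileExA(from, to,
        MOVEFILE_REPLACE_EXISTING | MOVEFILE_WRITE_THROUGH) ? 0 : -1);
#else
    /* POSIX rename() replaces an existing target atomically. */
    return (rename(from, to) == 0 ? 0 : -1);
#endif
}
```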
|