| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Previously, when evicting a page with modify updates, some of the
modifications could be skipped when calculating the complete value to
write to the new page. This could lead to updates being lost.
|
|
|
|
| |
use lookaside (#3816)
|
| |
|
|
|
|
|
|
| |
In particular, balance primary inserts, overflowing the cache to
use the lookaside table, secondary inserts and secondary reads of the
oplog (assuming the oplog is at least partially stored in the lookaside
table).
|
| |
| |
When an application performs a truncate operation, WiredTiger marks
pages deleted. If such a page is subsequently read with a view earlier
than the truncate, the page is reinstantiated and all records deleted
(as if truncate had taken the slow path).
Such a page cannot be evicted: if the truncate is rolled back, it
expects to find the page and any tombstones so it can roll them all
back. If the page is evicted or split, the rollback will fail.
This change takes two approaches: don't allow checkpoints to queue
pages for urgent eviction, since checkpoints use special rules to
determine whether eviction is permitted. In addition, check for
uncommitted truncate operations before allowing any page to be evicted.
|
|
|
|
| |
It is an optimization that tickles a corner case with checkpoints
and the fast truncate path.
|
|
|
| |
Queue the page for urgent eviction when the application thread is not supposed to evict it itself. This allows pages to be split, which maintains the performance of write operations.
|
|
|
| |
Compaction skips all blocks with associated disk images, but the disk image may simply be a result of not wanting to discard the block's contents from the cache. Only skip blocks without any disk address.
|
| |
|
|
|
| |
When the format 'S' is preceded by a size, the size indicates the maximum number of bytes the string can store, not including the terminating NUL byte. The stored string may therefore have no terminator, so strlen cannot be used to determine its length. Instead, use a custom version of strnlen.
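A minimal sketch of the kind of bounded length scan described above. The name `bounded_strlen` is hypothetical and is not taken from the WiredTiger source; it only illustrates why a size-limited scan is needed when the field may lack a terminating NUL.

```c
#include <assert.h>
#include <stddef.h>

/*
 * bounded_strlen --
 *     Hypothetical helper: return the length of s, examining at most
 *     maxlen bytes. Unlike strlen, it never reads past the field when
 *     the string fills it completely and has no terminating NUL.
 */
static size_t
bounded_strlen(const char *s, size_t maxlen)
{
    size_t i;

    assert(s != NULL);
    for (i = 0; i < maxlen && s[i] != '\0'; ++i)
        ;
    return (i);
}
```

With a format such as "3S", a caller would pass 3 as `maxlen`, so a value that uses all three bytes is measured as 3 without reading a fourth byte.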
|
| |
|
|
|
| |
Enabled by default when snappy compression is available.
|
|
|
| |
We already avoided reviewing update chains looking for obsolete updates when the oldest transaction wasn't moving forward. Also avoid the review if the pinned timestamp isn't moving forward - we won't be able to free any updates anyway so it's wasted work.
|
|
|
| |
In particular, when reconciling insert lists on row store leaf pages, we also need to check for splits. That requires tracking keys with saved updates, even if the keys aren't written to the page.
|
| |
* Allow trimming of obsolete modify updates.
Obsolete update discards look for a contiguous block of obsolete updates
at the end of an update chain. Previously, we reset the search each
time we saw a modify update. That is not what we want: it is fine to
discard obsolete modify updates as long as there is a complete value
somewhere after them in the list.
* Fix handling of aborts.
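The trimming rule above can be sketched as follows. This is a simplified model under stated assumptions, not the WiredTiger implementation: `struct upd`, its fields and `trim_obsolete` are hypothetical, chains run from newest to oldest, and "obsolete" is reduced to a single `visible_all` flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified update record; chains run newest to oldest. */
struct upd {
    struct upd *next; /* next older update */
    bool full_value;  /* true: complete value; false: modify (delta) */
    bool visible_all; /* true: visible to every running reader */
};

/*
 * trim_obsolete --
 *     Detach and return the obsolete tail of the chain, or NULL if
 *     nothing can be discarded.
 */
static struct upd *
trim_obsolete(struct upd *head)
{
    struct upd *first, *tail, *u;

    for (first = NULL, u = head; u != NULL; u = u->next) {
        if (!u->visible_all)
            first = NULL; /* An older update is still needed. */
        else if (first == NULL && u->full_value)
            first = u; /* Newest complete value that can serve as a base. */
        /* A globally visible modify does not reset the search. */
    }
    if (first == NULL)
        return (NULL);

    /* The update we keep must be a complete, globally visible value. */
    assert(first->full_value && first->visible_all);
    tail = first->next;
    first->next = NULL;
    return (tail);
}
```

In this model, everything older than the kept complete value is returned for freeing, including any modify updates in that tail.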
|
| |
There are several behavior changes here to avoid situations where oplog
reads can block when a primary is under severe cache update pressure.
In particular:
* don't block if a scan encounters a large page;
* don't stall reads when the dirty limit of cache is reached;
* mark pages read from lookaside clean if it is safe to do so.
* Revert to including lookaside pages in the dirty limit.
Always check for cache full before starting a transaction, check the
dirty limit at the end of transactions that do updates.
* Don't retry page rewrites until transaction state changes.
We used to have this check for transaction IDs, extend it to also check
when the pinned timestamp hasn't moved forward and don't retry eviction.
|
|
|
|
| |
complete. (#3791)
|
|
|
| |
So they don't try to flush an already flushed chunk.
|
| |
| |
Multiple changes aimed at improving performance and decreasing stalls
when applications keep more history than fits in cache.
Support multiple lookaside sessions / cursors simultaneously (initially 5).
Don't count lookaside pages as part of the dirty content in cache.
Add statistics that indicate the range of pinned timestamps.
Try to further hand-optimize WT_SESSION::transaction_timestamp, since
it is called under a mutex by MongoDB.
Dropping a tree with lookaside entries now causes the entries to be
discarded in the background by the sweep thread, rather than doing a
full pass of the lookaside table for every drop.
|
|
|
|
| |
(#3783)
|
|
|
| |
When a page split is in progress, another thread can trigger a second split of the same page through one of its children. Acquire a page lock on the page undergoing the split to prevent concurrent splits of the same page.
|
|
|
|
|
|
| |
(#3780)
Revert "WT-3555 Streamline open_cursor for simple tables. (#3637)"
This reverts commit 982121f68b336eb58edfbae26300ef46935277b8.
|
| |
* WT-3696 Add checks to ensure session usage is single threaded.
This currently fails with at least test/fops - that will need
to be fixed before the branch is merged.
* Ignore the default session - it's used by connection methods.
Those methods can be called multi-threaded.
* Add comments and ref count to API entry
* Review feedback
* Implement review feedback.
|
| |
* WT-3717 Add a verbose lookaside mode
This required removing the temporary lookaside option, since we are
about to run out of bits in verbose flags.
* I forgot __wt_verbose is a macro.
* Remove temporary verbose config from reconfigure test
* Remove debugging code.
* Reset connection flag to 32 bits
|
| |
* Cleanup pass over test/format options. Configuration options that don't
involve random assignment (C_IGNORE), should only set the min/max range
to which they can be assigned, and if they're strings (C_STRING), they
shouldn't even set those.
* Remove debian packaging support (it was out of date, and there is other
up-to-date Debian packaging), remove the RPM specification, we no longer
build any RPM packages.
* Don't call __conn_dhandle_config_clear() separately from
__conn_dhandle_destroy(), WT_DATA_HANDLE.cfg is memory in
the dhandle, the destroy function should take care of it.
* When debugging a page, dump the page's memory footprint.
* Fix code indentation, remove unreachable return.
* WiredTiger policy is to call a function in higher level code that is a
stub when timestamps are disabled. Remove some unnecessary HAVE_TIMESTAMP
* whitespace
* WT_TXN_TIMESTAMP_FLAG_CHECK no longer used.
* Make declaration order match.
|
| |
Fix the lookaside info saved by reconciliation and how lookaside interacts with checkpoints.
Previously, we tracked whether eviction was successful, and if so,
continued the checkpoint from after the evicted page. That could skip
over pages in some cases (presumably if eviction caused a split).
Instead, simplify the loop to make eviction advisory. If eviction
succeeds, it should leave the reference in a state where checkpoint can
skip over it quickly. If eviction fails, it may still have written the
reference and leave it clean, saving work for checkpoint. Either way,
checkpoint visits every reference in the tree regardless of splits.
|
| |
Also fix reads during checkpoint so the expected read generation is
set, and checkpoint can attempt to clean up after itself.
Add a "lookaside_score" measuring the proportion of unstable updates in
cache (those required for historic reads), use that to determine when
to use the lookaside table rather than waiting for the cache to become
stuck.
Instead of discarding updates as part of restoring a page, include the original value in the update list if the on-page update is a modify. This removes some problematic code that was inconsistent about removing updates. Also, if we need to restore updates earlier than the on-page version, the previous code was incorrect.
Disable checkpoint skipping lookaside pages for now: always visit every page until we are tracking the correct IDs and timestamps to skip properly.
Allow new/old checkpoints to skip most lookaside pages.
Also try lookaside if the cache is stuck.
All trees with lookaside pages must stay dirty.
Eviction has to match checkpoint's durability rules (including
"immediate" visibility for oplog and related tables).
Stick to simple eviction for the metadata table.
|
|
|
| |
This reverts commit cad209bdd902df372cc4bc617b080f849fb90261.
|
| |
| |
* Cleanup pass over test/format options. Configuration options that don't
involve random assignment (C_IGNORE), should only set the min/max range
to which they can be assigned, and if they're strings (C_STRING), they
shouldn't even set those.
* Remove debian packaging support (it was out of date, and there is other
up-to-date Debian packaging), remove the RPM specification, we no longer
build any RPM packages.
* Don't call __conn_dhandle_config_clear() separately from
__conn_dhandle_destroy(), WT_DATA_HANDLE.cfg is memory in
the dhandle, the destroy function should take care of it.
* When debugging a page, dump the page's memory footprint.
* Fix code indentation, remove unreachable return.
* WiredTiger policy is to call a function in higher level code that is a
stub when timestamps are disabled. Remove some unnecessary HAVE_TIMESTAMP
* whitespace
* WT_TXN_TIMESTAMP_FLAG_CHECK no longer used.
* Make declaration order match.
* clang-tidy warning: redundant return statement at the end of a function
with a void return type [readability-redundant-control-flow]
* clang-tidy: warning: redundant cast to the same type
[google-readability-casting]
* clang-tidy: warning: different indentation for 'if' and corresponding
'else' [readability-misleading-indentation]
* clang-tidy: warning: do not use 'else' after 'break'
[readability-else-after-return]
* clang-tidy: warning: Value stored to 'eviction_progress_rate' is never
read [clang-analyzer-deadcode.DeadStores]
* clang-tidy: warning: redundant cast to the same type
[google-readability-casting]
|
|
|
| |
This reverts commit b59b8856c040531ef883b6e68010ff1f47ce1495.
|
| |
Also fix reads during checkpoint so the expected read generation is
set, and checkpoint can attempt to clean up after itself.
Add a "lookaside_score" measuring the proportion of unstable updates in
cache (those required for historic reads), use that to determine when
to use the lookaside table rather than waiting for the cache to become
stuck.
Instead of discarding updates as part of restoring a page, include the original value in the update list if the on-page update is a modify. This removes some problematic code that was inconsistent about removing updates. Also, if we need to restore updates earlier than the on-page version, the previous code was incorrect.
Disable checkpoint skipping lookaside pages for now: always visit every page until we are tracking the correct IDs and timestamps to skip properly.
Ensure all trees with lookaside pages stay dirty.
|
|
|
|
|
|
|
| |
* Don't write uncommitted updates during eviction for checkpoints.
* Since blocks appear immediately in lookaside, retry cursor positioning.
* If checkpoint skips lookaside pages, the tree must stay dirty.
|
|
|
|
|
|
|
|
| |
Cleanup code so that the functions can use the checkpoint global variables
for time instead of passing that information as function arguments.
During a database checkpoint, output verbose progress messages every 20
seconds indicating the cumulative number of pages checkpointed.
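One way to pace such progress messages is sketched below. It assumes times are tracked in whole seconds; `struct ckpt_progress`, `PROGRESS_INTERVAL_SECS` and `progress_due` are hypothetical names used for illustration only, not WiredTiger identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PROGRESS_INTERVAL_SECS 20 /* Interval taken from the message above. */

/* Hypothetical per-checkpoint state kept in one place rather than passed around. */
struct ckpt_progress {
    uint64_t start_secs; /* when the checkpoint started */
    uint64_t msg_count;  /* number of intervals already reported */
};

/*
 * progress_due --
 *     Return true when a new interval has elapsed since the last report,
 *     so the caller should log the cumulative page count.
 */
static bool
progress_due(struct ckpt_progress *p, uint64_t now_secs)
{
    uint64_t intervals;

    assert(now_secs >= p->start_secs);
    intervals = (now_secs - p->start_secs) / PROGRESS_INTERVAL_SECS;
    if (intervals > p->msg_count) {
        p->msg_count = intervals;
        return (true);
    }
    return (false);
}
```

Counting whole intervals, rather than storing the time of the last message, means a late check still produces at most one message per call and the schedule does not drift.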
|
|
|
|
|
|
| |
Change metadata unroll to discard tracked checkpoints.
If we fail to either unroll or apply all tracked operations, panic: the
state is no longer recoverable.
|
|
|
|
|
|
|
|
| |
The default compaction timeout is 1200 seconds, and we occasionally exceed
it. Handle the error and continue.
Periodically check on eviction when compacting in an LSM tree, and quit
if we're stuck.
|
|
|
|
| |
children marked for lookaside reads and that will fail when/if we close
the tree with the page marked dirty.
|
| |
|
| |
|
|
|
|
| |
statistics server (#3746)
|
|
|
|
| |
In 1f7810e I broke the update list handling when restoring in-memory
pages. Revert to the original version.
|
| |
* WT-3629 cache accounting underflow checks and logging
WT_CACHE.bytes_image and WT_CACHE.bytes_inmem can race, only read them once.
* Replace checks for underflow with CAS calls, checking for underflow
before decrementing the value.
* Remove the special-case where we zero out some values, simply skipping
the decrement in the case of underflow should be no worse than setting
the value to zero.
* Revert to using a simple underflow check, and panic if we ever see the
failure.
* Flag an error but don't panic on cache underflow.
If it happens in production, we should be able to keep going.
Abort in diagnostic mode so we capture the failure.
* Fix a missing word in a comment, update the spell file.
* Fix WT_EXABYTE comparison.
* __wt_abort() is a gcc 'no-return' function, don't do anything after
calling it.
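The intermediate approach in the list above, checking for underflow before decrementing with a compare-and-swap, might look like the following sketch. It uses C11 atomics instead of WiredTiger's own atomic wrappers, and `cache_decr_checked` is a hypothetical name. Note that the later bullets say the change ultimately reverted to a simple underflow check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * cache_decr_checked --
 *     Hypothetical helper: subtract v from *counter unless that would
 *     underflow. On underflow, leave the counter unchanged and return
 *     false so the caller can flag the error.
 */
static bool
cache_decr_checked(_Atomic uint64_t *counter, uint64_t v)
{
    uint64_t orig;

    assert(counter != NULL);

    /* Read the shared value once; the CAS refreshes orig if it fails. */
    orig = atomic_load(counter);
    do {
        if (v > orig)
            return (false);
    } while (!atomic_compare_exchange_weak(counter, &orig, orig - v));
    return (true);
}
```

A caller following the final bullets would log an error when this returns false, and abort only in diagnostic builds.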
|
|
|
|
|
|
| |
Add a separate counter of eviction progress: when we do eviction but don't count it as progress, the page doesn't stay in memory.
Update various places that track whether eviction is making progress to use the new counter. In particular, cleanup / rename the eviction thread tuning code and move its state from WT_CONNECTION_IMPL to WT_CACHE.
|
| |
|
| |
|
| |
| |
timestamp (#3747)
* Test format failure with commit timestamp older than oldest timestamp
When validating a timestamp, include the type of timestamp with any
error message, we validate read timestamps as well as commit timestamps.
* Don't update the thread's information until after the commit, otherwise
we could race with the timestamp thread and try to commit a change at a
timestamp earlier than the "oldest" timestamp.
* Rework the timestamp() function so we can't read the global timestamp
counter until after we've read the latest-commit timestamp value from
the running threads.
Rework for clarity, and assert the expected relationship between the
thread timestamp values and the global timestamp counter.
|
| |
|