path: root/src
Commit message | Author | Age | Files | Lines
* WT-3743 In lookaside sweep, check btree IDs are in range. (#3785) | Michael Cahill | 2017-11-13 | 1 | -1/+3
* WT-3715 Lookaside eviction tuning. (#3777) | Michael Cahill | 2017-11-10 | 34 | -242/+783
  Multiple changes aimed at improving performance and decreasing stalls when applications keep more history than fits in cache.
  Support multiple lookaside sessions / cursors simultaneously (initially 5).
  Don't count lookaside pages as part of the dirty content in cache.
  Add statistics that indicate the range of pinned timestamps.
  Try to further hand-optimize WT_SESSION::transaction_timestamp, since it is called under a mutex by MongoDB.
  Dropping a tree with lookaside entries now causes the entries to be discarded in the background by the sweep thread, rather than doing a full pass of the lookaside table for every drop.
* WT-3637 Fix a heap use after free from evicting of a page that just split (#3783) | Vamsi Krishna | 2017-11-10 | 1 | -0/+12
* WT-3710 Get page-level lock to ensure single threaded page-split (#3784) | nehakhatri5 | 2017-11-10 | 1 | -0/+5
  When a page split is in progress, another thread can trigger a second split of the same page through one of its children. Acquire a page lock on the page undergoing the split to prevent a concurrent split of the same page.
* WT-3730 For simple tables, do not use table dhandle after it is released. (#3780) | Don Anderson | 2017-11-08 | 5 | -19/+8
  Revert "WT-3555 Streamline open_cursor for simple tables. (#3637)"
  This reverts commit 982121f68b336eb58edfbae26300ef46935277b8.
* WT-3696 Add checks to ensure session usage is single threaded. (#3760) | Alex Gorrod | 2017-11-06 | 7 | -5/+75
  * WT-3696 Add checks to ensure session usage is single threaded.
    This currently fails with at least test/fops - that will need to be fixed before the branch is merged.
  * Ignore the default session - it's used by connection methods. Those methods can be called multi-threaded.
  * Add comments and ref count to API entry
  * Review feedback
  * Implement review feedback.
* WT-3717 Add a verbose lookaside mode (#3774) | Alex Gorrod | 2017-11-03 | 6 | -80/+102
  * WT-3717 Add a verbose lookaside mode
    This required removing the temporary lookaside option, since we are about to run out of bits in verbose flags.
  * I forgot __wt_verbose is a macro.
  * Remove temporary verbose config from reconfigure test
  * Remove debugging code.
  * Reset connection flag to 32 bits
* WT-3705 Code style fixes, and remove debian packaging (#3771) | Alex Gorrod | 2017-11-02 | 13 | -107/+106
  * Cleanup pass over test/format options. Configuration options that don't involve random assignment (C_IGNORE), should only set the min/max range to which they can be assigned, and if they're strings (C_STRING), they shouldn't even set those.
  * Remove debian packaging support (it was out of date, and there is other up-to-date Debian packaging), remove the RPM specification, we no longer build any RPM packages.
  * Don't call __conn_dhandle_config_clear() separately from __conn_dhandle_destroy(), WT_DATA_HANDLE.cfg is memory in the dhandle, the destroy function should take care of it.
  * When debugging a page, dump the page's memory footprint.
  * Fix code indentation, remove unreachable return.
  * WiredTiger policy is to call a function in higher level code that is a stub when timestamps are disabled. Remove some unnecessary HAVE_TIMESTAMP
  * whitespace
  * WT_TXN_TIMESTAMP_FLAG_CHECK no longer used.
  * Make declaration order match.
* WT-3675 Fix the lookaside interactions with checkpoints. (#3776) | Michael Cahill | 2017-11-02 | 5 | -61/+58
  Fix the lookaside info saved by reconciliation and how lookaside interacts with checkpoints.
  Previously, we tracked whether eviction was successful, and if so, continued the checkpoint from after the evicted page. That could skip over pages in some cases (presumably if eviction caused a split).
  Instead, simplify the loop to make eviction advisory. If eviction succeeds, it should leave the reference in a state where checkpoint can skip over it quickly. If eviction fails, it may still have written the reference and leave it clean, saving work for checkpoint. Either way, checkpoint visits every reference in the tree regardless of splits.
* WT-3652 Skip unnecessary lookaside reads / writes (#3764) | Alex Gorrod | 2017-10-31 | 13 | -412/+487
  Also fix reads during checkpoint so the expected read generation is set, and checkpoint can attempt to clean up after itself.
  Add a "lookaside_score" measuring the proportion of unstable updates in cache (those required for historic reads), use that to determine when to use the lookaside table rather than waiting for the cache to become stuck.
  Instead of discarding updates as part of restoring a page, include the original value in the update list if the on-page update is a modify. This removes some problematic code that was inconsistent about removing updates. Also, if we need to restore updates earlier than the on-page version, the previous code was incorrect.
  Disable checkpoint skipping lookaside pages for now: always visit every page until we are tracking the correct IDs and timestamps to skip properly.
  Allow new/old checkpoints to skip most lookaside pages.
  Also try lookaside if the cache is stuck.
  All trees with lookaside pages must stay dirty.
  Eviction has to match checkpoint's durability rules (including "immediate" visibility for oplog and related tables).
  Stick to simple eviction for the metadata table.
* Revert WT-3705 Full build Friday and lint (#3770) | Alex Gorrod | 2017-10-31 | 15 | -114/+122
  This reverts commit cad209bdd902df372cc4bc617b080f849fb90261.
* WT-3708 PRIu64 format incorrectly specified for size_t (#3768) | Keith Bostic | 2017-10-30 | 1 | -1/+2
* WT-3705 Full build Friday and lint (#3765) | Keith Bostic | 2017-10-30 | 15 | -122/+113
  * Cleanup pass over test/format options. Configuration options that don't involve random assignment (C_IGNORE), should only set the min/max range to which they can be assigned, and if they're strings (C_STRING), they shouldn't even set those.
  * Remove debian packaging support (it was out of date, and there is other up-to-date Debian packaging), remove the RPM specification, we no longer build any RPM packages.
  * Don't call __conn_dhandle_config_clear() separately from __conn_dhandle_destroy(), WT_DATA_HANDLE.cfg is memory in the dhandle, the destroy function should take care of it.
  * When debugging a page, dump the page's memory footprint.
  * Fix code indentation, remove unreachable return.
  * WiredTiger policy is to call a function in higher level code that is a stub when timestamps are disabled. Remove some unnecessary HAVE_TIMESTAMP
  * whitespace
  * WT_TXN_TIMESTAMP_FLAG_CHECK no longer used.
  * Make declaration order match.
  * clang-tidy warning: redundant return statement at the end of a function with a void return type [readability-redundant-control-flow]
  * clang-tidy: warning: redundant cast to the same type [google-readability-casting]
  * clang-tidy: warning: different indentation for 'if' and corresponding 'else' [readability-misleading-indentation]
  * clang-tidy: warning: do not use 'else' after 'break' [readability-else-after-return]
  * clang-tidy: warning: Value stored to 'eviction_progress_rate' is never read [clang-analyzer-deadcode.DeadStores]
  * clang-tidy: warning: redundant cast to the same type [google-readability-casting]
* Revert "WT-3652 Skip unnecessary lookaside reads / writes. (#3744)" (#3763) | Alex Gorrod | 2017-10-27 | 12 | -406/+328
  This reverts commit b59b8856c040531ef883b6e68010ff1f47ce1495.
* WT-3652 Skip unnecessary lookaside reads / writes. (#3744) | Michael Cahill | 2017-10-27 | 12 | -328/+406
  Also fix reads during checkpoint so the expected read generation is set, and checkpoint can attempt to clean up after itself.
  Add a "lookaside_score" measuring the proportion of unstable updates in cache (those required for historic reads), use that to determine when to use the lookaside table rather than waiting for the cache to become stuck.
  Instead of discarding updates as part of restoring a page, include the original value in the update list if the on-page update is a modify. This removes some problematic code that was inconsistent about removing updates. Also, if we need to restore updates earlier than the on-page version, the previous code was incorrect.
  Disable checkpoint skipping lookaside pages for now: always visit every page until we are tracking the correct IDs and timestamps to skip properly.
  Ensure all trees with lookaside pages stay dirty.
* WT-3666 Fix lost updates with lookaside eviction. (#3759) | Michael Cahill | 2017-10-26 | 4 | -29/+85
  * Don't write uncommitted updates during eviction for checkpoints.
  * Since blocks appear immediately in lookaside, retry cursor positioning.
  * If checkpoint skips lookaside pages, the tree must stay dirty.
* WT-3223 Add optional checkpoint progress messages (#3730) | nehakhatri5 | 2017-10-25 | 8 | -105/+164
  Clean up the code so that the functions can use the checkpoint global variables for time instead of passing that information as function arguments.
  During a database checkpoint, output verbose progress messages every 20 seconds indicating the cumulative number of pages checkpointed.
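  A minimal sketch of the rate-limited progress message described above, not the WiredTiger implementation. The struct, the function name, the message text and the use of plain seconds are all assumptions made for illustration; only the 20-second interval and the cumulative page count come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* The commit message says "every 20 seconds"; the macro name is hypothetical. */
#define PROGRESS_INTERVAL_SECS 20

/* Hypothetical checkpoint-wide state, standing in for the "global variables". */
struct ckpt_progress {
    uint64_t start_secs;    /* when the checkpoint started */
    uint64_t last_msg_secs; /* when the last progress message was printed */
    uint64_t pages_written; /* cumulative pages checkpointed */
};

/*
 * Count one checkpointed page. Print a progress message and return true if
 * at least PROGRESS_INTERVAL_SECS have passed since the previous message.
 */
static bool
ckpt_progress_page(struct ckpt_progress *p, uint64_t now_secs)
{
    ++p->pages_written;
    if (now_secs - p->last_msg_secs < PROGRESS_INTERVAL_SECS)
        return (false);

    p->last_msg_secs = now_secs;
    printf("checkpoint: %llu pages written after %llu seconds\n",
        (unsigned long long)p->pages_written,
        (unsigned long long)(now_secs - p->start_secs));
    return (true);
}
```

  Keeping the timers in one shared structure means the per-page path takes only the current time, which is the kind of argument trimming the commit message describes.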
* WT-3680 metadata unroll should discard in-process checkpoints (#3752) | Keith Bostic | 2017-10-25 | 1 | -21/+37
  Change metadata unroll to discard tracked checkpoints.
  If we fail to either unroll or apply all tracked operations, panic: the state is no longer recoverable.
* WT-3677 test/format compaction doesn't handle timeout error return (#3755) | Keith Bostic | 2017-10-25 | 1 | -0/+7
  The default compaction timeout is 1200 seconds, and we occasionally exceed it. Handle the error and continue.
  Periodically check on eviction when compacting in an LSM tree, and quit if we're stuck.
* Don't mark internal pages dirty in order to flush them, they may have (#3756) | Keith Bostic | 2017-10-24 | 1 | -2/+3
  children marked for lookaside reads and that will fail when/if we close the tree with the page marked dirty.
* WT-3683 Allow eviction of clean pages with history when cache is stuck (#3754) | Michael Cahill | 2017-10-24 | 1 | -0/+17
* WT-3681 Don't truncate the last log file in recovery (#3753) | sueloverso | 2017-10-24 | 1 | -7/+7
* WT-3673 Fix a bug where opening the lookaside table can race with the statistics server (#3746) | Keith Bostic | 2017-10-23 | 6 | -48/+28
* WT-3674 wiredtiger-test-spinlock #3916, snapshot isolation failure (#3750) | Keith Bostic | 2017-10-22 | 1 | -2/+2
  In 1f7810e I broke the update list handling when restoring in-memory pages. Revert to the original version.
* WT-3629 cache underflow and logging (#3704) | Keith Bostic | 2017-10-20 | 2 | -53/+28
  * WT-3629 cache accounting underflow checks and logging
    WT_CACHE.bytes_image and WT_CACHE.bytes_inmem can race, only read them once.
  * Replace checks for underflow with CAS calls, checking for underflow before decrementing the value.
  * Remove the special-case where we zero out some values, simply skipping the decrement in the case of underflow should be no worse than setting the value to zero.
  * Revert to using a simple underflow check, and panic if we ever see the failure.
  * Flag an error but don't panic on cache underflow. If it happens in production, we should be able to keep going. Abort in diagnostic mode so we capture the failure.
  * Fix a missing word in a comment, update the spell file.
  * Fix WT_EXABYTE comparison.
  * __wt_abort() is a gcc 'no-return' function, don't do anything after calling it.
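  The final behavior described in this message (check for underflow, skip the decrement, flag an error, abort only in diagnostic builds) can be sketched as below. This is an illustrative, single-threaded sketch under stated assumptions: the function name, the SKETCH_DIAGNOSTIC macro and the message text are hypothetical and are not WiredTiger's, and the sketch does not address the concurrent-read issue the message mentions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Decrement *counter by v unless that would underflow. On underflow, report
 * the problem and leave the counter unchanged; when the (hypothetical)
 * SKETCH_DIAGNOSTIC macro is defined, abort instead so the failure is
 * captured. Returns true if the decrement was applied.
 */
static bool
cache_decr_check(uint64_t *counter, uint64_t v, const char *name)
{
    if (v > *counter) {
        fprintf(stderr, "%s: cache accounting underflow: %llu - %llu\n",
            name, (unsigned long long)*counter, (unsigned long long)v);
#ifdef SKETCH_DIAGNOSTIC
        abort(); /* Does not return, so nothing follows it. */
#else
        return (false);
#endif
    }
    *counter -= v;
    return (true);
}
```

  Skipping the decrement leaves the counter at its last plausible value, which matches the message's reasoning that this is no worse than forcing it to zero.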
* WT-3616 format failed to report a stuck cache (#3745) | Keith Bostic | 2017-10-20 | 29 | -147/+189
  Add a separate counter of eviction progress: when we do eviction but don't count it as progress, the page doesn't stay in memory.
  Update various places that track whether eviction is making progress to use the new counter. In particular, cleanup / rename the eviction thread tuning code and move its state from WT_CONNECTION_IMPL to WT_CACHE.
* WT-3585 Add an API to allow read timestamp to round up to oldest (#3721) | Vamsi Krishna | 2017-10-20 | 3 | -5/+62
* WT-3598 Open cursor should not set transaction error (#3733) | Vamsi Krishna | 2017-10-20 | 2 | -6/+13
* WT-3669 Check for rolled back updates during reconciliation. (#3742) | Michael Cahill | 2017-10-19 | 3 | -2/+16
* WT-3672 Test format failure with commit timestamp older than oldest timestamp (#3747) | Keith Bostic | 2017-10-19 | 3 | -12/+13
  * Test format failure with commit timestamp older than oldest timestamp
    When validating a timestamp, include the type of timestamp with any error message, we validate read timestamps as well as commit timestamps.
  * Don't update the thread's information until after the commit, otherwise we could race with the timestamp thread and try to commit a change at a timestamp earlier than the "oldest" timestamp.
  * Rework the timestamp() function so we can't read the global timestamp counter until after we've read the latest-commit timestamp value from the running threads. Rework for clarity, and assert the expected relationship between the thread timestamp values and the global timestamp counter.
* WT-3643 Set panic on error path if recovery needed. (#3720) | sueloverso | 2017-10-19 | 1 | -1/+8
* WT-3640 Change bytes-read statistic (#3714) | Keith Bostic | 2017-10-19 | 2 | -2/+4
  Reads should include overflow pages and shouldn't include allocated memory chunks associated with the page (for example, WT_REF arrays); writes should be updated atomically.
* WT-3596 Make improvements to timestamp documentation (#3696) | Sulabh Mahajan | 2017-10-19 | 2 | -17/+39
* WT-3662 Write lookaside after reconciliation has succeeded. | Keith Bostic | 2017-10-17 | 1 | -1/+0
* WT-3635 Coverity 1381606 & Friday builds & lint. (#3702) | Keith Bostic | 2017-10-17 | 19 | -70/+94
  * WT-3635 Coverity 1381606 & Friday builds & lint.
    CID 1381606 (#1 of 1): Unused value (UNUSED_VALUE) assigned_value: Assigning value true to modified here, but that stored value is overwritten before it can be used.
  * The checkpoint_lock is no longer used.
  * Typo: no reason to test "*updp != NULL" twice.
  * At different times __evict_review() and __wt_page_can_evict() have returned a boolean when we're doing an in-memory split, or a flags value where the in-memory split information was included. Switch back from returning a flag to returning a boolean: the functions no longer return any other information than if we're doing an in-memory split, and the places where __evict_review() still returns flags that have no meaning to __wt_evict() aren't useful.
  * KNF/whitespace.
  * WT_CONNECTION_IMPL.{las_verb_gen_read,las_verb_gen_write} don't need to be declared volatile, there's no caching issue here.
  * Ignore "static" when sorting function arguments. Display the file name when complaining about illegal types. Complain about assignments in variable declarations. Fix source code where we had assignments in variable declarations.
  * The "start" argument is type 'I', a uint32_t, not an int.
  * __rec_append_orig_value() doesn't need to walk the update list twice, leave the WT_UPDATE reference pointing to the last element and skip the second walk.
  * Fix 41a5923, need to step to the end of the list.
  * WT_CACHE isn't declared volatile, use the appropriate function.
  * Handle static declarations correctly, for some reason, they historically come first.
  * KNF indentation fix. We must always find the on-page update in the list, assert that fact.
  * Fix merge of develop branch.
  * Don't declare first_ts_upd unless HAVE_TIMESTAMPS is #defined, otherwise we'll get static analysis complaints.
  * Fix error in comments about time unit.
* WT-3662 Write lookaside after reconciliation has succeeded. (#3738) | Keith Bostic | 2017-10-16 | 5 | -249/+306
* WT-3663 lookaside records ignored unless a backing disk block written (#3739) | Keith Bostic | 2017-10-16 | 1 | -10/+20
  It's possible to have a chunk that has no disk image or backing address: a lookaside-backed chunk with no entries. Set the lookaside reference even if there's no backing address, overriding the WT_REF_DISK state with WT_REF_LOOKASIDE.
* WT-3660 WiredTiger documentation refers to WT_CURSOR::first. (#3737) | Keith Bostic | 2017-10-13 | 1 | -1/+1
  That method is no longer available, replaced by WT_CURSOR::reset.
* WT-3657 Timestamp and lookaside related automated test failures (#3736) | Keith Bostic | 2017-10-13 | 1 | -7/+17
  Don't include update-associated memory in the split decision unless there are either entries to put on the page or updates that will be associated with the image. Based on no data at all, I'm bounding the total number at 10 before we'll consider update information as part of a split.
* WT-3657 Use saved update size for splits, don't grow raw buffer. (#3735) | Michael Cahill | 2017-10-13 | 1 | -85/+77
  * WT-3657 Use saved update size for splits, don't grow raw buffer.
    Previously, we would split pages once reconciliation saw 100 saved updates. Apart from suffering from the same problems as other naive split approaches (e.g., creating a new page with a single saved update each time a new insert plus eviction happens), the previous approach also doubled the size of the buffer passed to raw compression each time splits were triggered by the saved update count, leading to attempts to allocate unexpectedly large amounts of memory.
  * Don't put an "else" after a "continue", fix the comment, we're no longer checking evict/restore.
  * Turn off raw compression when lookaside is configured on an eviction, I don't trust raw compression with zero-length buffers, that is, chunks that have no entries. Revert minor part of the raw compression changes (we no longer need to move the split_grow label in the raw compression functions).
  * Remove the __rec_split_raw/__rec_split_raw_worker division, it no longer serves a purpose, change to a single __rec_split_raw call.
* WT-3619 Make compaction more aware of checkpoints and eviction. (#3707) | Keith Bostic | 2017-10-13 | 2 | -43/+126
  Otherwise running compaction can cause excessive interruption to other operations - including making them stall for extended periods.
  Compaction blocks checkpoints for an entire file walk (and in the case of many collections in a database, it might block on each collection). Change compaction to acquire/discard WT_BTREE.flush_lock for each page it reviews, ensuring checkpoint wins and can proceed. If compaction can't acquire WT_BTREE.flush_lock, return EBUSY to the driver routine, which waits for checkpoint to complete before starting the next compaction pass.
  Change compaction to complain and give up if there's eviction pressure. In the face of eviction pressure, there's no point in doing compaction, we're just making a bad problem worse.
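  The per-page locking pattern described above can be sketched roughly as follows. This is an assumption-laden illustration, not WiredTiger code: the struct and function are hypothetical, a C11 atomic_flag stands in for WT_BTREE.flush_lock, and "reviewing" a page is reduced to a counter. The eviction-pressure check is left out because the message does not say how it is detected.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tree protected by a flush lock. */
struct tree {
    atomic_flag flush_lock; /* set while checkpoint or compaction holds it */
    size_t npages;          /* pages a compaction pass would walk */
    size_t pages_reviewed;  /* pages compaction has reviewed so far */
};

/*
 * One compaction pass. The lock is taken and released around each page
 * rather than around the whole walk, so a checkpoint waits for at most one
 * page. If the lock is already held, return EBUSY so the caller can wait
 * for the checkpoint before starting another pass.
 */
static int
compact_pass(struct tree *t)
{
    size_t i;

    for (i = 0; i < t->npages; ++i) {
        /* Try-lock: never block behind a running checkpoint. */
        if (atomic_flag_test_and_set(&t->flush_lock))
            return (EBUSY);

        ++t->pages_reviewed; /* placeholder for the per-page work */

        atomic_flag_clear(&t->flush_lock);
    }
    return (0);
}
```

  A driver routine in this sketch would call compact_pass() in a loop and, on EBUSY, wait until the checkpoint has finished before retrying.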
* WT-3655 Don't dirty pages to induce lookaside eviction. (#3732) | Michael Cahill | 2017-10-13 | 1 | -18/+0
  Code was added recently to enable lookaside to be available for some clean pages - but it introduced issues closing handles, where it appeared the tree was dirty.
* WT-3611 Backup comment doesn't match the code. (#3715) | Keith Bostic | 2017-10-13 | 1 | -7/+2
* Revert previous commit | nehakhatri5 | 2017-10-12 | 2 | -21/+17
* Globalize the timers associated with checkpoint | nehakhatri5 | 2017-10-12 | 2 | -17/+21
  This is done to clean up the code so that the functions can use the global variables for time instead of passing that information as function arguments.
* WT-3650 Fix minimum timestamp tracking in lookaside. (#3729) | Michael Cahill | 2017-10-12 | 1 | -5/+7
  Previously, the minimum timestamp was tracked incorrectly, which meant that checkpoints as-of a timestamp would incorrectly skip some pages with lookaside entries.
* WT-3646 Only use lookaside when operations are blocked waiting for cache (#3722) | Michael Cahill | 2017-10-12 | 8 | -71/+153
  * We recently changed eviction to try using the lookaside table sooner. For some workloads, this can lead to poor performance and runaway growth in the lookaside table size. Use lookaside eviction when the cache is "nearly stuck".
  * Never use checkpoint's session for writing to lookaside. Lookaside writes should be committed as soon as possible, not delayed to the end of a checkpoint. Also, we have special visibility rules for the checkpoint transaction and (different) special visibility rules for the lookaside table. They do not play nice together.
  * Improve performance when lookaside eviction is required:
    * fix the optimization for checkpoints so most lookaside pages can be skipped when the stable timestamp is lagging.
    * allow row store leaf pages to split when evicting and no values are visible (currently using a dumb heuristic of splitting as soon as 100 records need updates saved).
    * allow use of lookaside and splits when checkpoints do eviction.
* WT-3646 Don't trigger eviction from checkpoints in write heavy workloads (#3726) | Michael Cahill | 2017-10-12 | 7 | -19/+22
  Previously, checkpoint attempted to push out any page requiring forced eviction. With this change, pages read into cache by checkpoint are marked with a different read generation, so that checkpoint can distinguish between pages it read (from lookaside) and big / hot pages that happen to be in cache when a checkpoint runs.
  Make sure ordinary page read generations fall in the expected range (i.e., are never less than WT_READGEN_START_VALUE unless the page is new or should be evicted soon).
* WT-3645 Allow eviction of lookaside pages as soon as writes commit. (#3725) | Michael Cahill | 2017-10-12 | 6 | -24/+30
* WT-3649 Disable lookaside eviction during close. (#3724) | Michael Cahill | 2017-10-12 | 1 | -0/+7