Commit message [Author, Age, Files, Lines]
* WT-3776 Cursor remove operation unpins page too early (#3825) (mongodb-3.6.1) [Keith Bostic, 2017-12-07, 1, -10/+28]
|   There's trickiness in the page-pinned check. By definition a remove operation leaves a cursor positioned if it's initially positioned. However, if every item on the page is deleted and we unpin the page, eviction might delete the page and our search will re-instantiate an empty page for us. Cursor remove returns not-found whether or not that eviction/deletion happens and it's OK unless cursor-overwrite is configured (which means we return success even if there's no item to delete). In that case, we'll fail when we try to point the cursor at the key on the page to satisfy the positioned requirement. It's arguably safe to simply leave the key initialized in the cursor (as that's all a positioned cursor implies), but it's probably safer to avoid page eviction entirely in the positioned case.
|   (cherry picked from commit 2ac616e61fac1c0e71b47e5d7633c6fbf518fb2f)
* WT-3786 Transactions should read their writes regardless of timestamps. (#3826) [Michael Cahill, 2017-12-07, 2, -0/+22]
|   (cherry picked from commit 2f489ea88e61b00d52dd0000d18841b85f32be27)
* WT-3079 Resume eviction walks per tree. (#3822) [Michael Cahill, 2017-12-07, 3, -48/+80]
|   Record how many pages we want and how many pages we have queued so far in a tree, then resume the walk next iteration. This avoids a single tree with a target larger than the queue size being walked completely before the eviction server moves on to the next tree.
|   (cherry picked from commit fca6a8d71e5cf3b887a0749ae85519c963ba40d1)
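The per-tree bookkeeping described in WT-3079 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not WiredTiger's eviction code: the `tree_walk` structure, its fields and `walk_tree` are hypothetical names, and a "page" is reduced to a position counter.

```c
#include <stddef.h>

/* Hypothetical per-tree walk state; none of these names come from WiredTiger. */
struct tree_walk {
    size_t npages; /* Pages in the tree (assumed nonzero) */
    size_t pos;    /* Position where the previous walk stopped */
    size_t target; /* Pages wanted from this tree */
    size_t queued; /* Pages queued so far toward the target */
};

/*
 * walk_tree --
 *     Queue at most "slots" pages from one tree, resuming from the saved
 *     position. Returns the number of pages queued by this call.
 */
size_t
walk_tree(struct tree_walk *t, size_t slots)
{
    size_t n, want;

    /* How many pages this tree still owes toward its target. */
    want = t->target > t->queued ? t->target - t->queued : 0;

    /* Never take more than the free queue slots in one iteration. */
    n = want < slots ? want : slots;

    /* Remember progress so the next iteration resumes instead of restarting. */
    t->pos = (t->pos + n) % t->npages;
    t->queued += n;
    return (n);
}
```

With a target of 250 pages and 100 free slots per iteration, three calls queue 100, 100 and 50 pages, so the caller is free to visit other trees between calls.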
* Merge branch 'mongodb-3.8' into mongodb-3.6 (mongodb-3.6.0) [Alex Gorrod, 2017-11-29, 11, -71/+126]
|\
| * WT-3773 Fix a bug calculating on-disk images for modify updates. (#3817) [Michael Cahill, 2017-11-29, 6, -26/+36]
| |   Previously, when evicting a page with modify updates, some of the modifications could be skipped when calculating the complete value to write to the new page. This could lead to updates being lost.
| * WT-3763 Revert part of the change that made reconciliation more likely to use lookaside (#3816) [nehakhatri5, 2017-11-29, 1, -9/+8]
| * WT-3763 Disable suffix compression on key with saved updates. (#3814) [Michael Cahill, 2017-11-28, 1, -3/+3]
| |
| * WT-3763 Tune eviction for various MongoDB workloads. (#3804) [Michael Cahill, 2017-11-28, 8, -41/+63]
| |   In particular, balance primary inserts, overflowing the cache to use the lookaside table, secondary inserts and secondary reads of the oplog (assuming the oplog is at least partially stored in the lookaside table).
| * WT-3764 Allow fast eviction of unwanted clean pages. (#3806) [Michael Cahill, 2017-11-28, 4, -7/+17]
| |
| * WT-3765 Prevent eviction of pages being truncated. (#3809) [Michael Cahill, 2017-11-27, 1, -4/+18]
| |   When an application performs a truncate operation, WiredTiger marks pages deleted. If such a page is subsequently read with a view earlier than the truncate, the page is reinstantiated and all records deleted (as if truncate had taken the slow path).
| |   Such a page cannot be evicted: if the truncate is rolled back, it expects to find the page and any tombstones so it can roll them all back. If the page is evicted or split, the rollback will fail.
| |   This change takes two approaches: don't allow checkpoints to queue pages for urgent eviction, since checkpoints use special rules to determine whether eviction is permitted. In addition, check for uncommitted truncate operations before allowing any page to be evicted.
* | Merge branch 'mongodb-3.8' into mongodb-3.6 [Luke Chen, 2017-11-28, 13, -29/+85]
|\ \
| |/
| * WT-3761 Don't immediately evict pages even if they look clean (#3805) [Alex Gorrod, 2017-11-24, 1, -2/+1]
| |   It is an optimization that tickles a corner case with checkpoints and the fast truncate path.
| * WT-3761 Queue pages for urgent eviction on page release (#3803) [nehakhatri5, 2017-11-23, 1, -9/+14]
| |   Queue the page for urgent eviction when the application thread is not supposed to evict it. This allows splitting pages to maintain the performance of write operations.
| * WT-3607 compaction skips all blocks with associated disk images (#3801) [Keith Bostic, 2017-11-23, 1, -1/+1]
| |   Compaction skips all blocks with associated disk images, but the disk image may simply be a result of not wanting to discard the block's contents from the cache. Only skip blocks without any disk address.
| * WT-3762 Add a "force" flag to WT_CONNECTION::set_timestamp. (#3800) [Michael Cahill, 2017-11-23, 7, -11/+40]
| |
| * WT-3658 Fix the string length calculation when size is given (#3786) [nehakhatri5, 2017-11-23, 4, -8/+31]
| |   When the format 'S' is preceded by a size, that size is the maximum number of bytes the string can store, not including the terminating NUL byte. Hence strlen cannot be used to determine the length of the string. Instead, use a custom version of strnlen.
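The length rule in WT-3658 amounts to a bounded scan: stop at the first NUL byte or at the size limit, whichever comes first. A minimal sketch, assuming a helper named `bounded_strnlen` (a hypothetical name, not the function WiredTiger actually added):

```c
#include <stddef.h>

/*
 * bounded_strnlen --
 *     Hypothetical helper: return the length of s while reading no more
 *     than max_len bytes, so a field with no NUL terminator is safe to scan.
 */
size_t
bounded_strnlen(const char *s, size_t max_len)
{
    size_t i;

    /* Stop at the first NUL byte or at the size limit, whichever is first. */
    for (i = 0; i < max_len && s[i] != '\0'; ++i)
        ;
    return (i);
}
```

For a three-byte field holding `x`, `y`, `z` with no terminator, `bounded_strnlen(field, 3)` returns 3 without reading past the field, whereas `strlen` would keep scanning.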
* | Merge branch 'mongodb-3.8' into mongodb-3.6 [Luke Chen, 2017-11-22, 5, -33/+91]
|\ \
| |/
| * WT-3760 Avoid writing overflow values into the lookaside file (#3799) [Vamsi Krishna, 2017-11-22, 2, -6/+7]
| |
| * WT-3758 Turn on snappy compression for lookaside file (#3797) [Vamsi Krishna, 2017-11-22, 1, -0/+7]
| |   Enabled by default when snappy compression is available.
| * WT-3754 Consider timestamps before reviewing update chains for obsolete (#3796) [Sulabh Mahajan, 2017-11-22, 3, -5/+20]
| |   We already avoided reviewing update chains looking for obsolete updates when the oldest transaction wasn't moving forward. Also avoid the review if the pinned timestamp isn't moving forward - we won't be able to free any updates anyway so it's wasted work.
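One way to read the gating rule in WT-3754 is sketched below: remember the oldest transaction ID and pinned timestamp seen at the last review, and skip the next review unless both have advanced. The structure, field names and `obsolete_check_needed` are hypothetical, and representing timestamps as plain 64-bit integers is a simplifying assumption. The real code may track this state differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of the state seen at the last obsolete review. */
struct obsolete_check_state {
    uint64_t last_oldest_id; /* Oldest transaction ID at the last review */
    uint64_t last_pinned_ts; /* Pinned timestamp at the last review */
};

/*
 * obsolete_check_needed --
 *     Return true only if both the oldest transaction ID and the pinned
 *     timestamp moved forward since the last review, recording the new
 *     values when they did.
 */
bool
obsolete_check_needed(
    struct obsolete_check_state *st, uint64_t oldest_id, uint64_t pinned_ts)
{
    /* If either one is stuck, no update can be freed: skip the review. */
    if (oldest_id <= st->last_oldest_id || pinned_ts <= st->last_pinned_ts)
        return (false);

    st->last_oldest_id = oldest_id;
    st->last_pinned_ts = pinned_ts;
    return (true);
}
```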
| * WT-3751 Allow splits when no data is visible. (#3793) [Michael Cahill, 2017-11-21, 1, -21/+55]
| |   In particular, when reconciling insert lists on row store leaf pages, we also need to check for splits. That requires tracking keys with saved updates, even if the keys aren't written to the page.
| * WT-3752 Allow trimming of obsolete modify updates. (#3794) [Michael Cahill, 2017-11-20, 1, -6/+7]
| |   * Allow trimming of obsolete modify updates.
| |     Obsolete update discards look for a contiguous block of obsolete updates at the end of an update chain. Previously, we reset the search each time we saw a modify update. That is not what we want: it is fine to discard obsolete modify updates as long as there is a complete value somewhere after them in the list.
| |   * Fix handling of aborts.
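The trimming rule in WT-3752 can be illustrated with a simplified, single-threaded sketch. Every name here (`struct upd`, `UPD_FULL`, `UPD_MODIFY`, `obsolete_trim`) is hypothetical, "obsolete" is approximated as "transaction ID older than the oldest running ID", and the atomic operations a concurrent implementation would need are omitted.

```c
#include <stddef.h>
#include <stdint.h>

enum upd_type { UPD_FULL, UPD_MODIFY }; /* Complete value vs. delta */

/* Hypothetical update-chain entry, ordered newest to oldest. */
struct upd {
    uint64_t txnid;
    enum upd_type type;
    struct upd *next; /* Next older update */
};

/*
 * obsolete_trim --
 *     Find the trailing run of obsolete updates, keep everything up to and
 *     including the newest complete value in that run, and cut off the rest.
 *     Returns the detached sublist, or NULL if nothing can be discarded.
 */
struct upd *
obsolete_trim(struct upd *head, uint64_t oldest_id)
{
    struct upd *discard, *first, *u;

    first = NULL;
    for (u = head; u != NULL; u = u->next) {
        if (u->txnid >= oldest_id)
            first = NULL; /* Still needed by a reader: restart the run. */
        else if (first == NULL && u->type == UPD_FULL)
            first = u; /* Newest obsolete complete value in this run. */
        /* An obsolete modify update does not reset the search. */
    }

    if (first == NULL || first->next == NULL)
        return (NULL);
    discard = first->next;
    first->next = NULL;
    return (discard);
}
```

Given a chain of transaction IDs 50 (full), 30 (modify), 20 (full), 10 (modify), 5 (full) and an oldest ID of 40, the sketch keeps 50, 30 and 20 and returns the sublist starting at 10.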
* | Merge branch 'mongodb-3.8' into mongodb-3.6 [Alex Gorrod, 2017-11-17, 24, -355/+484]
|\ \
| |/
| * WT-3745 Favor oplog reads when under cache pressure. (#3788) [Michael Cahill, 2017-11-17, 22, -338/+447]
| |   There are several behavior changes here to avoid situations where oplog reads can block when a primary is under severe cache update pressure. In particular:
| |   * don't block if a scan encounters a large page;
| |   * don't stall reads when the dirty limit of cache is reached;
| |   * mark pages read from lookaside clean if it is safe to do so.
| |   * Revert to including lookaside pages in the dirty limit. Always check for cache full before starting a transaction, check the dirty limit at the end of transactions that do updates.
| |   * Don't retry page rewrites until transaction state changes. We used to have this check for transaction IDs, extend it to also check when the pinned timestamp hasn't moved forward and don't retry eviction.
| * WT-3746 Don't busy wait when syncing the log and waiting for writes to complete. (#3791) [sueloverso, 2017-11-16, 1, -1/+3]
| * WT-3444 Fixes to LSM compact and alter interaction (#3787) [sueloverso, 2017-11-14, 1, -16/+34]
| |   So they don't try to flush an already flushed chunk.
* | Merge branch 'develop' into mongodb-3.6 (mongodb-3.7.0) [Luke Chen, 2017-11-13, 1, -1/+3]
|\ \
| |/
| * WT-3743 In lookaside sweep, check btree IDs are in range. (#3785) [Michael Cahill, 2017-11-13, 1, -1/+3]
| |
* | Merge branch 'develop' into mongodb-3.6 [Alex Gorrod, 2017-11-13, 57, -365/+1013]
|\ \
| |/
| * WT-3715 Lookaside eviction tuning. (#3777) [Michael Cahill, 2017-11-10, 36, -245/+790]
| |   Multiple changes aimed at improving performance and decreasing stalls when applications keep more history than fits in cache.
| |   Support multiple lookaside sessions / cursors simultaneously (initially 5).
| |   Don't count lookaside pages as part of the dirty content in cache.
| |   Add statistics that indicate the range of pinned timestamps.
| |   Try to further hand-optimize WT_SESSION::transaction_timestamp, since it is called under a mutex by MongoDB.
| |   Dropping a tree with lookaside entries now causes the entries to be discarded in the background by the sweep thread, rather than doing a full pass of the lookaside table for every drop.
| * WT-3648 Read the thread's timestamp information once. (#3782) [sueloverso, 2017-11-10, 2, -9/+11]
| |
| * WT-3637 Fix a heap use after free from evicting of a page that just split (#3783) [Vamsi Krishna, 2017-11-10, 1, -0/+12]
| * WT-3710 Get page-level lock to ensure single threaded page-split (#3784) [nehakhatri5, 2017-11-10, 1, -0/+5]
| |   When a page split is in progress, another thread can trigger a second split of the same page through one of its children. Acquire a page lock on the page undergoing the split to prevent a concurrent split of the same page.
| * WT-3730 For simple tables, do not use table dhandle after it is released. (#3780) [Don Anderson, 2017-11-08, 5, -19/+8]
| |   Revert "WT-3555 Streamline open_cursor for simple tables. (#3637)"
| |   This reverts commit 982121f68b336eb58edfbae26300ef46935277b8.
| * WT-3696 Add checks to ensure session usage is single threaded. (#3760) [Alex Gorrod, 2017-11-06, 12, -11/+83]
| |   * WT-3696 Add checks to ensure session usage is single threaded.
| |     This currently fails with at least test/fops - that will need to be fixed before the branch is merged.
| |   * Ignore the default session - it's used by connection methods. Those methods can be called multi-threaded.
| |   * Add comments and ref count to API entry
| |   * Review feedback
| |   * Implement review feedback.
| * WT-3717 Add a verbose lookaside mode (#3774) [Alex Gorrod, 2017-11-03, 10, -84/+107]
| |   * WT-3717 Add a verbose lookaside mode
| |     This required removing the temporary lookaside option, since we are about to run out of bits in verbose flags.
| |   * I forgot __wt_verbose is a macro.
| |   * Remove temporary verbose config from reconfigure test
| |   * Remove debugging code.
| |   * Reset connection flag to 32 bits
* | Merge branch 'develop' into mongodb-3.6 [Luke Chen, 2017-11-03, 70, -1032/+1206]
|\ \
| |/
| * WT-3705 Code style fixes, and remove debian packaging (#3771) [Alex Gorrod, 2017-11-02, 40, -322/+113]
| |   * Cleanup pass over test/format options. Configuration options that don't involve random assignment (C_IGNORE), should only set the min/max range to which they can be assigned, and if they're strings (C_STRING), they shouldn't even set those.
| |   * Remove debian packaging support (it was out of date, and there is other up-to-date Debian packaging), remove the RPM specification, we no longer build any RPM packages.
| |   * Don't call __conn_dhandle_config_clear() separately from __conn_dhandle_destroy(), WT_DATA_HANDLE.cfg is memory in the dhandle, the destroy function should take care of it.
| |   * When debugging a page, dump the page's memory footprint.
| |   * Fix code indentation, remove unreachable return.
| |   * WiredTiger policy is to call a function in higher level code that is a stub when timestamps are disabled. Remove some unnecessary HAVE_TIMESTAMP
| |   * whitespace
| |   * WT_TXN_TIMESTAMP_FLAG_CHECK no longer used.
| |   * Make declaration order match.
| * WT-3675 Fix the lookaside interactions with checkpoints. (#3776) [Michael Cahill, 2017-11-02, 5, -61/+58]
| |   Fix the lookaside info saved by reconciliation and how lookaside interacts with checkpoints.
| |   Previously, we tracked whether eviction was successful, and if so, continued the checkpoint from after the evicted page. That could skip over pages in some cases (presumably if eviction caused a split).
| |   Instead, simplify the loop to make eviction advisory. If eviction succeeds, it should leave the reference in a state where checkpoint can skip over it quickly. If eviction fails, it may still have written the reference and leave it clean, saving work for checkpoint. Either way, checkpoint visits every reference in the tree regardless of splits.
| * WT-3713 Make error output more concise. Some refactoring. (#3775) [sueloverso, 2017-11-01, 1, -40/+99]
| |
| * WT-3714 Make the data ranges easier to view. (#3773) [sueloverso, 2017-10-31, 2, -2/+2]
| |
| * WT-3711 Add signal handler to random_abort. (#3772) [sueloverso, 2017-10-31, 2, -6/+28]
| |
| * WT-3652 Skip unnecessary lookaside reads / writes (#3764) [Alex Gorrod, 2017-10-31, 15, -414/+491]
| |   Also fix reads during checkpoint so the expected read generation is set, and checkpoint can attempt to clean up after itself.
| |   Add a "lookaside_score" measuring the proportion of unstable updates in cache (those required for historic reads), use that to determine when to use the lookaside table rather than waiting for the cache to become stuck.
| |   Instead of discarding updates as part of restoring a page, include the original value in the update list if the on-page update is a modify. This removes some problematic code that was inconsistent about removing updates. Also, if we need to restore updates earlier than the on-page version, the previous code was incorrect.
| |   Disable checkpoint skipping lookaside pages for now: always visit every page until we are tracking the correct IDs and timestamps to skip properly.
| |   Allow new/old checkpoints to skip most lookaside pages. Also try lookaside if the cache is stuck.
| |   All trees with lookaside pages must stay dirty.
| |   Eviction has to match checkpoint's durability rules (including "immediate" visibility for oplog and related tables).
| |   Stick to simple eviction for the metadata table.
| * Revert WT-3705 Full build Friday and lint (#3770) [Alex Gorrod, 2017-10-31, 42, -121/+337]
| |   This reverts commit cad209bdd902df372cc4bc617b080f849fb90261.
| * WT-3707 Install signal handler to detect child unexpectedly failing. (#3769) [sueloverso, 2017-10-30, 1, -5/+31]
| |   * WT-3707 Install signal handler to detect child unexpectedly failing.
| |   * Review comments.
| |   * Fix compiler warnings. Publish the timestamp to avoid re-ordering.
| * WT-3708 PRIu64 format incorrectly specified for size_t (#3768) [Keith Bostic, 2017-10-30, 1, -1/+2]
| |
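The WT-3708 log entry does not say how the format string was corrected. As a general C illustration only, the two portable ways to print a `size_t` are the `%zu` conversion or an explicit cast to `uint64_t` paired with `PRIu64`. The function names below are hypothetical.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format a size_t with its own conversion specifier (C99 and later). */
int
format_size(char *buf, size_t buflen, size_t value)
{
    return (snprintf(buf, buflen, "%zu", value));
}

/* Format a size_t through PRIu64: the cast makes the argument type match. */
int
format_size_u64(char *buf, size_t buflen, size_t value)
{
    return (snprintf(buf, buflen, "%" PRIu64, (uint64_t)value));
}
```

Passing a `size_t` directly to a `PRIu64` conversion is undefined behavior on platforms where the two types differ in width, which is why the cast matters in the second form.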
| * WT-3630 Send the test environment to "make check". (#3766) [Michael Cahill, 2017-10-30, 1, -1/+1]
| |   * WT-3630 Send the test environment to "make check". (It also invokes Python to sanitize the build).
| * WT-3705 Full build Friday and lint (#3765) [Keith Bostic, 2017-10-30, 42, -337/+120]
| |   * Cleanup pass over test/format options. Configuration options that don't involve random assignment (C_IGNORE), should only set the min/max range to which they can be assigned, and if they're strings (C_STRING), they shouldn't even set those.
| |   * Remove debian packaging support (it was out of date, and there is other up-to-date Debian packaging), remove the RPM specification, we no longer build any RPM packages.
| |   * Don't call __conn_dhandle_config_clear() separately from __conn_dhandle_destroy(), WT_DATA_HANDLE.cfg is memory in the dhandle, the destroy function should take care of it.
| |   * When debugging a page, dump the page's memory footprint.
| |   * Fix code indentation, remove unreachable return.
| |   * WiredTiger policy is to call a function in higher level code that is a stub when timestamps are disabled. Remove some unnecessary HAVE_TIMESTAMP
| |   * whitespace
| |   * WT_TXN_TIMESTAMP_FLAG_CHECK no longer used.
| |   * Make declaration order match.
| |   * clang-tidy warning: redundant return statement at the end of a function with a void return type [readability-redundant-control-flow]
| |   * clang-tidy: warning: redundant cast to the same type [google-readability-casting]
| |   * clang-tidy: warning: different indentation for 'if' and corresponding 'else' [readability-misleading-indentation]
| |   * clang-tidy: warning: do not use 'else' after 'break' [readability-else-after-return]
| |   * clang-tidy: warning: Value stored to 'eviction_progress_rate' is never read [clang-analyzer-deadcode.DeadStores]
| |   * clang-tidy: warning: redundant cast to the same type [google-readability-casting]
| * Revert "WT-3652 Skip unnecessary lookaside reads / writes. (#3744)" (#3763) [Alex Gorrod, 2017-10-27, 13, -407/+328]
| |   This reverts commit b59b8856c040531ef883b6e68010ff1f47ce1495.
| * WT-3652 Skip unnecessary lookaside reads / writes. (#3744) [Michael Cahill, 2017-10-27, 13, -328/+407]
| |   Also fix reads during checkpoint so the expected read generation is set, and checkpoint can attempt to clean up after itself.
| |   Add a "lookaside_score" measuring the proportion of unstable updates in cache (those required for historic reads), use that to determine when to use the lookaside table rather than waiting for the cache to become stuck.
| |   Instead of discarding updates as part of restoring a page, include the original value in the update list if the on-page update is a modify. This removes some problematic code that was inconsistent about removing updates. Also, if we need to restore updates earlier than the on-page version, the previous code was incorrect.
| |   Disable checkpoint skipping lookaside pages for now: always visit every page until we are tracking the correct IDs and timestamps to skip properly.
| |   Ensure all trees with lookaside pages stay dirty.