Commit message | Author | Age | Files | Lines
* Improve efficiency of couch_jobs:accept/2 for views [make-couch-view-job-accepts-faster] | Nick Vatamaniuc | 2020-06-02 | 3 | -3/+6
| Use the `no_schedule` option to speed up job dequeuing. This optimization allows dequeuing jobs more efficiently if these conditions are met:
|   1) Job IDs start with a random prefix
|   2) No time-based scheduling is used
| Both of those can be true for views: job IDs can be generated such that the signature comes before the db name part, which is what this commit does.
| The way the optimization works is that random IDs are generated in the pending jobs range, then a key selection is used to pick either a job before or after it. That reduces each dequeue attempt to just 1 read instead of reading up to 1000 jobs.
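The dequeue idea described above can be illustrated with a small in-memory sketch. This is not the couch_jobs implementation: a sorted Python list stands in for the pending-jobs key range, and `pick_pending_job` is a hypothetical helper that mimics a single key-selection read by taking the job at or after a random probe key, falling back to the job before it.

```python
import bisect

def pick_pending_job(pending, probe):
    """Pick the pending job at or after `probe`, else the last one before it.

    `pending` is a sorted list of job IDs (an assumption standing in for
    the pending-jobs key range); one bisect replaces a scan of the range.
    """
    if not pending:
        return None
    i = bisect.bisect_left(pending, probe)
    if i < len(pending):
        return pending[i]   # first job at or after the probe key
    return pending[-1]      # probe sorts past the end: take the job before it

# Job IDs start with a (pseudo) random signature prefix, then the db name.
pending = sorted(["1f3a-db1", "7c20-db2", "e901-db3"])
```

Because the IDs start with a random prefix, a random probe spreads concurrent acceptors across the range instead of having them all contend for the first pending job.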
* Handle error:{timeout, _} exception in couch_jobs:accept | Nick Vatamaniuc | 2020-06-02 | 1 | -0/+2
| Under load, the accept loop can blow up with a timeout error from `erlfdb:wait(...)` (https://github.com/apache/couchdb-erlfdb/blob/master/src/erlfdb.erl#L255), so guard against it just like we do for fdb transaction timeout (1031) errors.
* Remove on_commit handler from fabric2_fdb | Nick Vatamaniuc | 2020-06-02 | 2 | -47/+26
| Update db handles right away, as soon as the db version is checked. This ensures concurrent requests will get access to the current handle as soon as possible and may avoid doing extra version checks and re-opens.
* Prevent eviction of newer handles from fabric_server cache | Nick Vatamaniuc | 2020-06-02 | 2 | -9/+70
| Check metadata versions to ensure newer handles are not clobbered. The same thing is done for removal: `maybe_remove/1` removes a handle only if there isn't a newer handle already there.
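A version-guarded cache of this kind can be sketched as follows. This is only an illustration under stated assumptions: handles are plain dicts with an integer `md_version` field, and the class and the `maybe_update` name are hypothetical; only `maybe_remove` is named in the commit message, and its real signature is `maybe_remove/1`.

```python
class HandleCache:
    """Toy cache keyed by db name; each handle carries a metadata version."""

    def __init__(self):
        self._handles = {}

    def maybe_update(self, name, handle):
        # Store the handle unless a newer one is already cached.
        current = self._handles.get(name)
        if current is not None and current["md_version"] > handle["md_version"]:
            return False
        self._handles[name] = handle
        return True

    def maybe_remove(self, name, handle):
        # Remove only if the cached handle is not newer than the caller's.
        current = self._handles.get(name)
        if current is None or current["md_version"] > handle["md_version"]:
            return False
        del self._handles[name]
        return True

    def get(self, name):
        return self._handles.get(name)
```

With this guard, a slow request holding an old handle can neither overwrite nor evict a handle that a faster request has already refreshed.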
* Guard couch_jobs:accept_loop timing out | Nick Vatamaniuc | 2020-05-29 | 1 | -1/+9
| And also against too many conflicts during overload
* Protect couch_jobs activity monitor against timeouts as well | Nick Vatamaniuc | 2020-05-29 | 1 | -3/+3
|
* Fix bad catch statement in couch_jobs activity monitor | Nick Vatamaniuc | 2020-05-29 | 1 | -1/+1
|
* Fix mango erlfdb error catch clause erlfdb -> erlfdb_error | Nick Vatamaniuc | 2020-05-28 | 2 | -5/+6
|
* Don't skip over docs in mango indices on erlfdb errors | Nick Vatamaniuc | 2020-05-28 | 2 | -1/+16
|
* Introduce _bulk_docs max_doc_count limit | Nick Vatamaniuc | 2020-05-27 | 4 | -1/+32
| Let users specify the maximum document count for _bulk_docs requests. If the document count exceeds the maximum, the request returns a 413 HTTP error. This also signals the replicator to try to bisect the _bulk_docs array into smaller batches.
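The bisecting behaviour mentioned above can be sketched roughly like this. It is a simplified model rather than the replicator's code: `post_bulk_docs` is a hypothetical stand-in for the HTTP call, and the `TooLarge` exception stands in for a 413 response.

```python
class TooLarge(Exception):
    """Stands in for an HTTP 413 response (assumption for this sketch)."""

def post_bulk_docs(docs, max_doc_count):
    # Hypothetical server: reject requests with more docs than the limit.
    if len(docs) > max_doc_count:
        raise TooLarge()
    return len(docs)

def send_with_bisect(docs, max_doc_count):
    """Send docs, halving the batch on each 413; return the sizes sent."""
    if not docs:
        return []
    try:
        return [post_bulk_docs(docs, max_doc_count)]
    except TooLarge:
        if len(docs) == 1:
            raise  # a single doc cannot be split any further
        mid = len(docs) // 2
        return (send_with_bisect(docs[:mid], max_doc_count)
                + send_with_bisect(docs[mid:], max_doc_count))
```

For example, ten docs against a limit of three end up being sent as batches of 2, 3, 2 and 3 docs.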
* Lower the default batch size for update_docs to 2.5MB | Nick Vatamaniuc | 2020-05-27 | 2 | -2/+2
| Observed a number of timeouts with the previous default
* Remove erlfdb mock from update_docs/2,3 test | Nick Vatamaniuc | 2020-05-22 | 1 | -14/+0
| In a constrained CI environment transactions could retry multiple times, so we cannot rely on precisely counting erlfdb:transactional/2 calls.
* Improve load handling in couch_jobs and couch_views | Nick Vatamaniuc | 2020-05-21 | 2 | -2/+9
| Increase couch_views job timeout by 20 seconds. This will set a larger jitter when multiple nodes concurrently check and re-enqueue jobs. It would reduce the chance of them bumping into each other and conflicting. If they do conflict in the activity monitor, catch the error and emit an error log.
| We gain some more robustness under load in exchange for a longer timeout before jobs whose workers have suddenly died get re-enqueued.
* Merge pull request #2896 from cloudant/pagination-api-fix-limit | iilyak | 2020-05-21 | 2 | -19/+93
|\
| | Fix handling of limit query parameter
| * Fix handling of limit query parameter | ILYA Khlopotov | 2020-05-20 | 2 | -19/+93
| |
* | Merge pull request #2897 from apache/improve-db-expiration-log | Peng Hui Jiang | 2020-05-21 | 1 | -2/+2
|\ \
| | | Improve log of permanently deleting databases
| * | Improve log of permanently deleting databases [improve-db-expiration-log] | jiangph | 2020-05-21 | 1 | -2/+2
|/ /
* | Bulk docs transaction batching | Nick Vatamaniuc | 2020-05-20 | 5 | -29/+379
|/
|   * Interactive (regular) requests are split into smaller transactions, so larger updates won't fail with either timeout or transaction too large FDB errors.
|   * Non-interactive (replicated) requests can now batch their updates in a few transactions and gain extra performance.
| Batch size is configurable:
| ```
| [fabric]
| update_docs_batch_size = 5000000
| ```
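Splitting an update into size-bounded batches, as described above, might look like the following sketch. It assumes document size is approximated by the length of its JSON encoding, which is a simplification; `batch_by_size` and `max_bytes` are hypothetical names, with `max_bytes` playing the role of `update_docs_batch_size`.

```python
import json

def batch_by_size(docs, max_bytes):
    """Group docs into batches whose encoded size stays under max_bytes.

    A single doc larger than max_bytes still gets a batch of its own.
    """
    batches, current, current_size = [], [], 0
    for doc in docs:
        size = len(json.dumps(doc).encode("utf-8"))
        if current and current_size + size > max_bytes:
            batches.append(current)          # close the batch that is full
            current, current_size = [], 0
        current.append(doc)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Each resulting batch would then be written in its own transaction, so that no single transaction grows past the timeout or size limits.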
* Fix flaky couch_jobs type monitor test | Nick Vatamaniuc | 2020-05-15 | 1 | -2/+36
| Sometimes this test fails on Jenkins but doesn't fail locally. The attempted fix is to simply retry a few times until the number of children in the supervisor matches the expected value. Also extend the timeout to 15 seconds.
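The retry approach described above is a common way to stabilise tests that observe asynchronous state. A minimal sketch, with a hypothetical `wait_until` helper and arbitrary default values:

```python
import time

def wait_until(predicate, retries=5, delay=0.01):
    """Poll `predicate` up to `retries` times, sleeping `delay` seconds
    between attempts; return True as soon as it holds, else False."""
    for _ in range(retries):
        if predicate():
            return True
        time.sleep(delay)
    return False
```

A test would then assert on `wait_until(lambda: child_count() == expected)` instead of checking the count once, where `child_count` is whatever function reads the supervisor's children.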
* Merge pull request #2870 from cloudant/pagination-api-2 | iilyak | 2020-05-15 | 7 | -54/+1683
|\
| | Pagination API
| * Add tests for pagination API | ILYA Khlopotov | 2020-05-15 | 1 | -0/+771
| |
| * Implement pagination API | ILYA Khlopotov | 2020-05-15 | 6 | -45/+600
| |
| * Add tests for legacy API before refactoring | ILYA Khlopotov | 2020-05-15 | 1 | -0/+302
| |
| * Move not_implemented check down to allow testing of validation | ILYA Khlopotov | 2020-05-15 | 1 | -5/+6
| |
| * Fix variable shadowing | ILYA Khlopotov | 2020-05-15 | 1 | -4/+4
|/
* Fix compiler warning | Jay Doane | 2020-05-14 | 1 | -1/+1
|
* Fix a few flaky tests in fabric2_db | Nick Vatamaniuc | 2020-05-13 | 4 | -17/+22
| Add some longer timeouts and fix a race condition in db cleanup tests
| (Thanks to @jdoane for the patch)
* Merge pull request #2857 from apache/background-db-deletion | Peng Hui Jiang | 2020-05-13 | 4 | -3/+358
|\
| | Background database deletion
| * background deletion for soft-deleted database | jiangph | 2020-05-13 | 4 | -3/+358
|/
| Allow a background job to delete soft-deleted databases according to specified criteria to release space. Once a database is hard-deleted, the data can't be fetched back.
| Co-authored-by: Nick Vatamaniuc <vatamane@apache.org>
* Fix couch_views updater_running info result | Nick Vatamaniuc | 2020-05-09 | 3 | -34/+56
| Previously we always returned `false` because the result from `couch_jobs:get_job_state` was expected to be just `Status`, but it is `{ok, Status}`. That part is now explicit, so we account for every possible job state and would fail on a clause match if we get something else there.
| Moved `job_state/2` function to `couch_view_jobs` to avoid duplicating the logic on how to calculate job_id and keep it all in one module.
| Tests were updated to explicitly check for each job state.
* mix format all_docs_test.exs | Garren Smith | 2020-05-08 | 1 | -58/+65
|
* add local_docs to fold_doc with docids | Garren Smith | 2020-05-08 | 4 | -23/+292
|
* Convert aegis key cache to LRU with hard expiration time | Eric Avdey | 2020-05-07 | 4 | -20/+327
|
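An LRU cache with a hard expiration time combines two eviction rules: least recently used entries go first when the cache is full, and every entry is dropped once it reaches a fixed age regardless of how often it is used. The sketch below illustrates that combination only; it is not the aegis key cache, and the class name, parameters and explicit `now` argument are assumptions made to keep the example deterministic.

```python
from collections import OrderedDict

class ExpiringLRU:
    """LRU cache whose entries also expire max_age after insertion."""

    def __init__(self, max_size, max_age):
        self.max_size = max_size
        self.max_age = max_age
        self._items = OrderedDict()  # key -> (value, inserted_at)

    def put(self, key, value, now):
        self._items.pop(key, None)
        self._items[key] = (value, now)
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)  # evict least recently used

    def get(self, key, now):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, inserted_at = entry
        if now - inserted_at >= self.max_age:
            del self._items[key]  # hard expiration: use never extends lifetime
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value
```

The hard limit matters for key material: a frequently used key would otherwise stay cached indefinitely under plain LRU.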
* Merge pull request #2874 from cloudant/enable-exunit | iilyak | 2020-05-07 | 3 | -2/+4
|\
| | Re-enable ExUnit tests
| * Update erlfdb | ILYA Khlopotov | 2020-05-07 | 1 | -1/+1
| |
| * Re-enable ExUnit tests | ILYA Khlopotov | 2020-05-07 | 2 | -1/+3
|/
* add test to make sure type <<"text">> design docs are ignored (#2866) | Tony Sun | 2020-05-05 | 1 | -0/+8
|
* return correct not implemented for reduce | Garren Smith | 2020-05-04 | 1 | -1/+1
|
* Fix list_dbs_info_tx_too_old flaky test | Nick Vatamaniuc | 2020-04-29 | 1 | -1/+1
| On CI, creating 100 dbs in a row was too much to do in 5 seconds, so bump it to 15.
* Fix a flaky fdbcore index test | Nick Vatamaniuc | 2020-04-29 | 1 | -2/+2
|
* Improve robustness of couch expiring cache test | Jay Doane | 2020-04-28 | 2 | -34/+77
| In its current incarnation, the so-called "simple lifecycle" test is prone to numerous failures in the CI system [1], doubtless because it's riddled with race conditions. The original author makes many assumptions about how quickly an (actual, unmocked) FDB instance will respond to a request.
| The primary goal is to stop failing CI builds, while other considerations include: keeping the run time of the test as low as possible, keeping the code coverage high, and documenting the known races.
| Specifically:
|   - Increase the `stale` and `expired` times by a factor of 5 to decrease sensitivity to poor FDB performance.
|   - Change default timer from `erlang:system_time/1` to `os:timestamp` on the assumption that the latter is less prone to warping [2].
|   - Decrease the period of the cache server reaper by half to increase accuracy of eviction time.
|   - Inline and modify the `test_util:wait` code to make the timer explicit, and emphasize that `timer:delay/1` only works with millisecond resolution.
|   - Don't fail the test if it can't get a fresh lookup immediately after insertion, but let it continue on to the next race, at least to the point of expiration and deletion, which continue to be asserted.
|   - Factor `Timeout` and `Interval` to allow declarations near the other hard-coded parameters.
|   - Move cache server `Opts` into `setup/0` and eliminate `start_link/0`.
|   - Double the overall test timeout to 20 seconds.
| This has soaked for hundreds of runs on a 5 year old laptop, but the real test is the CI system. Should this test continue to fail CI builds, additional improvements could include mocking the timer and/or FDB layer to eliminate the variability of an integrated system.
| [1] https://ci-couchdb.apache.org/blue/organizations/jenkins/jenkins-cm1%2FPullRequests/detail/PR-2813/10/pipeline
| [2] http://erlang.org/doc/apps/erts/time_correction.html#terminology
* Re-enable the tx options tests | Nick Vatamaniuc | 2020-04-28 | 2 | -3/+15
| And add an extra level of error checking to erlfdb:set_option, since it could fail if we forget to update the erlfdb dependency or the fdb server version is too old. That operation can fail with an error:badarg, which is exactly how list_to_integer fails, and results in a confusing log message.
* Temporarily disable fabric2_tx_options_tests | Eric Avdey | 2020-04-28 | 1 | -1/+1
|
* Remove etag from changes and _list_dbs | Garren Smith | 2020-04-28 | 2 | -28/+13
|
* Fix mango test suite | Paul J. Davis | 2020-04-27 | 1 | -1/+2
|
* Allow specifying FDB transaction options | Nick Vatamaniuc | 2020-04-27 | 4 | -12/+245
| With the latest erlfdb release v1.1.0 we have the ability to set default transaction options on the database handle. Once set, those are inherited by every transaction started from that handle. Use this feature to give advanced users a way to experiment with various transaction options. Descriptions of those options in the default.ini file have been mostly a copy and paste from the fdb_c_option.g.h file from the client library.
| In addition, specify some safer default values for transaction timeouts (1min) and retry limit (100). These are quite conservative and are basically something less than "infinity". In the future these may be adjusted lower.
* Update erlfdb to v1.1.0 | Nick Vatamaniuc | 2020-04-27 | 1 | -1/+1
| https://github.com/apache/couchdb-erlfdb/releases/tag/v1.1.0
* Add a couch_views test for multiple design documents with the same map | Nick Vatamaniuc | 2020-04-27 | 1 | -8/+80
|
* Merge pull request #2826 from apache/aegis | Robert Newson | 2020-04-27 | 20 | -38/+849
|\
| | Add native encryption support
| * Add native encryption support | Robert Newson | 2020-04-27 | 20 | -38/+849
|/
| A new application, aegis, is introduced to provide strong at-rest protection of CouchDB data (where possible).
| Currently we encrypt the following values (if enabled):
|   1. Document content
|   2. Attachment content
|   3. Index values
| Things not encrypted:
|   1. _all_docs
|   2. _changes
|   3. doc id
|   4. doc rev
|   5. Index keys
|   6. All other metadata
| Co-Authored-By: Eric Avdey <eiri@apache.org>
| Co-Authored-By: Robert Samuel Newson <rnewson@apache.org>