Commit message | Author | Age | Files | Lines
* add multiple indexer active task test [add_active_tasks_fdb] | Tony Sun | 2020-07-23 | 1 | -5/+29
* use filtermap | Tony Sun | 2020-07-23 | 1 | -6/+22
* formatting | Tony Sun | 2020-07-23 | 2 | -3/+2
* add fabric verification to test | Tony Sun | 2020-07-23 | 1 | -3/+7
* scrub extra info from get_active_job_ids | Tony Sun | 2020-07-23 | 2 | -17/+25
* fix test | Tony Sun | 2020-07-23 | 1 | -0/+127
* encapsulate <<"active_task_info">> to fabric | Tony Sun | 2020-07-23 | 4 | -25/+35
* move active_task_info into util | Tony Sun | 2020-07-23 | 2 | -26/+28
* add spec | Tony Sun | 2020-07-23 | 1 | -0/+2
* revert to using job_data | Tony Sun | 2020-07-22 | 6 | -119/+38
* add get_active_jobs to couch_jobs | Tony Sun | 2020-07-21 | 1 | -0/+7
* add active_tasks for view builds using version stamps | Tony Sun | 2020-07-21 | 8 | -12/+162
  Active Tasks requires TotalChanges and ChangesDone to show the progress of
  long-running tasks. That requires count_changes_since to be implemented,
  which unfortunately is not easily done with FoundationDB. This commit
  replaces TotalChanges with the versionstamp plus the number of docs as a
  progress indicator. This can possibly break existing APIs that rely on
  TotalChanges. ChangesDone still exists, but instead of being derived from
  the current changes seq it is simply a count of how many documents were
  written by the updater process.
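  A minimal sketch of the idea, with illustrative names (the real change
  spans several modules): progress is stored in the indexing job's data,
  with the versionstamp standing in for a total-changes count. Only
  `couch_jobs:update/3` is the real API here.

  ```
  %% Sketch only: field names and the caller are stand-ins. Progress is
  %% kept in the job's data so it can be surfaced via _active_tasks.
  report_progress(JTx, Job, JobData, DocsWritten, VersionStamp) ->
      Data = JobData#{
          <<"active_task_info">> => #{
              <<"changes_done">> => DocsWritten,     %% docs written so far
              <<"total_changes">> => VersionStamp    %% versionstamp stand-in
          }
      },
      {ok, _Job1} = couch_jobs:update(JTx, Job, Data).
  ```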
* Merge pull request #2960 from cloudant/add-max_bulk_get_count | iilyak | 2020-06-23 | 4 | -1/+34
  Add max_bulk_get_count configuration option
* Add max_bulk_get_count configuration option | ILYA Khlopotov | 2020-06-22 | 4 | -1/+34
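  A plausible way to set the new option; the `[couchdb]` section and the
  default value shown here are assumptions, so check default.ini for the
  authoritative location:

  ```
  [couchdb]
  ; maximum number of documents accepted in one _bulk_get request (assumed)
  max_bulk_get_count = 10000
  ```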
* Reserve aegis namespace under ?CLUSTER_CONFIG | Eric Avdey | 2020-06-17 | 1 | -0/+4
* add back r and w options | Tony Sun | 2020-06-12 | 1 | -0/+12
* Bump erlfdb to v1.2.2 | Nick Vatamaniuc | 2020-06-12 | 1 | -1/+1
  https://github.com/apache/couchdb-erlfdb/releases/tag/v1.2.2
* Handle transaction and future timeouts in couch_jobs notifiers | Nick Vatamaniuc | 2020-06-10 | 1 | -1/+10
  In an overload scenario, do not let notifiers crash and lose their
  subscribers; instead, make them more robust and let them retry on future
  or transaction timeouts.
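  A minimal sketch of the guard, with illustrative names
  (`check_subscribers/1` is a stand-in for the notifier's periodic work):
  catch both timeout shapes and keep the loop alive.

  ```
  %% Sketch only. 1031 is FDB's transaction_timed_out error code.
  notifier_loop(St) ->
      St1 = try
          check_subscribers(St)
      catch
          error:{timeout, _} -> St;           %% erlfdb:wait/1 future timeout
          error:{erlfdb_error, 1031} -> St    %% FDB transaction timed out
      end,
      notifier_loop(St1).
  ```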
* Split couch_views acceptors and workers | Nick Vatamaniuc | 2020-06-08 | 5 | -22/+309
  Optimize couch_views by using a separate set of acceptors and workers.
  Previously, all `max_workers` were spawned on startup and were waiting to
  accept jobs in parallel. In a setup with a large number of pods, and 100
  workers per pod, that could lead to a lot of conflicts being generated
  when all those workers race to accept the same job at the same time.

  The improvement is to spawn only a limited number of acceptors (5, by
  default), then spawn more after some of them become workers. Also, when
  some workers finish or die with an error, check whether more acceptors
  could be spawned.

  As an example, here is what might happen with `max_acceptors = 5` and
  `max_workers = 100` (`A` and `W` are the current counts of acceptors and
  workers, respectively):

  1. Starting out: `A = 5, W = 0`
  2. After 2 acceptors start running: `A = 3, W = 2`. Immediately, 2 more
     acceptors are spawned: `A = 5, W = 2`
  3. After 95 workers are started: `A = 5, W = 95`
  4. Now if 3 acceptors accept, it would look like: `A = 2, W = 98`, but no
     more acceptors would be started.
  5. If the last 2 acceptors also accept jobs: `A = 0, W = 100`. At this
     point no more indexing jobs can be accepted and started until at least
     one of the workers finishes and exits.
  6. If 1 worker exits: `A = 0, W = 99`, and an acceptor is immediately
     spawned: `A = 1, W = 99`
  7. If all 99 workers exit, it goes back to: `A = 5, W = 0`
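  A minimal sketch of the top-up rule implied by that walkthrough (names
  are illustrative, `start_acceptor/1` is a stand-in): never exceed
  `max_acceptors` pending acceptors, and never let acceptors plus workers
  exceed `max_workers`.

  ```
  %% Sketch only: called on startup and whenever a worker exits or an
  %% acceptor becomes a worker.
  spawn_acceptors(#{acceptors := A, workers := W,
                    max_acceptors := MaxA, max_workers := MaxW} = St) ->
      ToSpawn = max(0, min(MaxA - A, MaxW - (A + W))),
      lists:foldl(fun(_, StAcc) -> start_acceptor(StAcc) end,
                  St, lists:seq(1, ToSpawn)).
  ```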
* Include database uuid in db info result | Nick Vatamaniuc | 2020-06-04 | 3 | -5/+16
  As per the ML [discussion](https://lists.apache.org/thread.html/rb328513fb932e231cf8793f92dd1cc2269044cb73cb43a6662c464a1%40%3Cdev.couchdb.apache.org%3E),
  add a `uuid` field to db info results in order to be able to uniquely
  identify a particular instance of a database. When a database is deleted
  and re-created with the same name, it will return a new `uuid` value.
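  For illustration, a db info response would then carry the new field
  alongside the existing ones (the field values below are made up, and the
  other fields are elided):

  ```
  GET /mydb
  {"db_name": "mydb", "doc_count": 42, ...,
   "uuid": "6eb8a833c1c84c5ca65459aac5ba2bcb"}
  ```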
* Fix couch_jobs accept timeout when no_schedule option is used | Nick Vatamaniuc | 2020-06-03 | 1 | -8/+11
  When waiting to accept jobs with scheduling in use, the timeout is
  limited based on the scheduled-time parameter. When the no_schedule
  option is used, the scheduled-time parameter is always set to 0, so in
  that case we have to special-case the limit to return `infinity`. Later
  on, when we wait for the watch to fire, the actual timeout can still be
  limited by a separate user-specified timeout option; but if the user
  specifies `infinity` there and sets `#{no_schedule => true}`, then we
  should respect that and never return `{error, not_found}` in response.
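  A minimal sketch of the special case (illustrative names; the derivation
  of the non-infinity limit is an assumption):

  ```
  %% Sketch only: with no_schedule the scheduled time no longer bounds the
  %% accept timeout, so the caller's own timeout option is the only limit.
  limit_timeout(#{no_schedule := true}, _MaxSchedTime) ->
      infinity;
  limit_timeout(_Opts, MaxSchedTime) ->
      max(0, MaxSchedTime - erlang:system_time(second)).
  ```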
* Improve efficiency of couch_jobs:accept/2 for views | Nick Vatamaniuc | 2020-06-02 | 3 | -3/+6
  Use the `no_schedule` option to speed up job dequeuing. This optimization
  allows dequeuing jobs more efficiently if these conditions are met:

  1) Job IDs start with a random prefix
  2) No time-based scheduling is used

  Both of these can hold for views: job IDs can be generated such that the
  signature comes before the db name part, which is what this commit does.

  The way the optimization works is that a random ID is generated in the
  pending jobs range, then a key selector is used to pick the job either
  before or after it. That reduces each dequeue attempt to just 1 read
  instead of reading up to 1000 jobs.
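  A minimal sketch of that key-selector trick (the `erlfdb` and
  `erlfdb_key` calls are real API to the best of my knowledge; the
  surrounding names and range handling are illustrative):

  ```
  %% Sketch only: pick a random key inside the pending range, take the
  %% first pending job at or after it, and fall back to the one before it.
  accept_random(Tx, PendingStart, PendingEnd) ->
      Rand = <<PendingStart/binary, (crypto:strong_rand_bytes(8))/binary>>,
      Sel = erlfdb_key:first_greater_or_equal(Rand),
      case erlfdb:wait(erlfdb:get_key(Tx, Sel)) of
          Key when Key >= PendingStart, Key < PendingEnd ->
              {ok, Key};
          _OutOfRange ->
              Sel2 = erlfdb_key:last_less_than(Rand),
              {ok, erlfdb:wait(erlfdb:get_key(Tx, Sel2))}
      end.
  ```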
* Handle error:{timeout, _} exception in couch_jobs:accept | Nick Vatamaniuc | 2020-06-02 | 1 | -0/+2
  Under load, the accept loop can blow up with a timeout error from
  `erlfdb:wait(...)` (https://github.com/apache/couchdb-erlfdb/blob/master/src/erlfdb.erl#L255),
  so guard against it just like we do for FDB transaction timeout (1031)
  errors.
* Remove on_commit handler from fabric2_fdb | Nick Vatamaniuc | 2020-06-02 | 2 | -47/+26
  Update db handles right away, as soon as the db version is checked. This
  ensures concurrent requests get access to the current handle as soon as
  possible and may avoid doing extra version checks and re-opens.
* Prevent eviction of newer handles from fabric_server cache | Nick Vatamaniuc | 2020-06-02 | 2 | -9/+70
  Check metadata versions to ensure newer handles are not clobbered. The
  same thing is done for removal: `maybe_remove/1` removes a handle only if
  there isn't a newer handle already there.
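  A minimal sketch of the version check (illustrative; the real code
  operates on the fabric2_server cache rather than a plain map):

  ```
  %% Sketch only: insert a handle unless the cache already holds one with
  %% a newer metadata version; removal applies the same comparison.
  maybe_cache(#{name := Name, md_version := Ver} = Db, Cache) ->
      case maps:find(Name, Cache) of
          {ok, #{md_version := OldVer}} when OldVer > Ver ->
              Cache;                    %% keep the newer cached handle
          _ ->
              Cache#{Name => Db}        %% insert or replace with newer
      end.
  ```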
* Guard couch_jobs:accept_loop timing out | Nick Vatamaniuc | 2020-05-29 | 1 | -1/+9
  And also against too many conflicts during overload.
* Protect couch_jobs activity monitor against timeouts as well | Nick Vatamaniuc | 2020-05-29 | 1 | -3/+3
* Fix bad catch statement in couch_jobs activity monitor | Nick Vatamaniuc | 2020-05-29 | 1 | -1/+1
* Fix mango erlfdb error catch clause erlfdb -> erlfdb_error | Nick Vatamaniuc | 2020-05-28 | 2 | -5/+6
* Don't skip over docs in mango indices on erlfdb errors | Nick Vatamaniuc | 2020-05-28 | 2 | -1/+16
* Introduce _bulk_docs max_doc_count limit | Nick Vatamaniuc | 2020-05-27 | 4 | -1/+32
  Let users specify the maximum document count for _bulk_docs requests. If
  the document count exceeds the maximum, the request returns a 413 HTTP
  error. That error also signals the replicator to try to bisect the
  _bulk_docs array into smaller batches.
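  A plausible configuration sketch; the section, the option spelling, and
  the default are assumptions taken from the commit title, so check
  default.ini for the shipped names:

  ```
  [couchdb]
  ; maximum number of documents accepted in one _bulk_docs request (assumed)
  max_doc_count = 10000
  ```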
* Lower the default batch size for update_docs to 2.5MB | Nick Vatamaniuc | 2020-05-27 | 2 | -2/+2
  Observed a number of timeouts with the previous default.
* Remove erlfdb mock from update_docs/2,3 test | Nick Vatamaniuc | 2020-05-22 | 1 | -14/+0
  In a constrained CI environment, transactions could retry multiple times,
  so we cannot rely on precisely counting erlfdb:transactional/2 calls.
* Improve load handling in couch_jobs and couch_views | Nick Vatamaniuc | 2020-05-21 | 2 | -2/+9
  Increase the couch_views job timeout by 20 seconds. This sets a larger
  jitter when multiple nodes concurrently check and re-enqueue jobs, and
  reduces the chance of them bumping into each other and conflicting. If
  they do conflict in the activity monitor, catch the error and emit an
  error log. In exchange for a longer delay before jobs whose workers have
  suddenly died get re-enqueued, we gain some more robustness under load.
* Merge pull request #2896 from cloudant/pagination-api-fix-limit | iilyak | 2020-05-21 | 2 | -19/+93
  Fix handling of limit query parameter
* Fix handling of limit query parameter | ILYA Khlopotov | 2020-05-20 | 2 | -19/+93
* Merge pull request #2897 from apache/improve-db-expiration-log | Peng Hui Jiang | 2020-05-21 | 1 | -2/+2
  Improve log of permanently deleting databases
* Improve log of permanently deleting databases [improve-db-expiration-log] | jiangph | 2020-05-21 | 1 | -2/+2
* Bulk docs transaction batching | Nick Vatamaniuc | 2020-05-20 | 5 | -29/+379
  * Interactive (regular) requests are split into smaller transactions, so
    larger updates won't fail with either timeout or transaction-too-large
    FDB errors.

  * Non-interactive (replicated) requests can now batch their updates into
    a few transactions and gain extra performance.

  Batch size is configurable:

  ```
  [fabric]
  update_docs_batch_size = 5000000
  ```
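  A minimal sketch of size-based batching (illustrative; the real splitting
  happens inside the fabric2 update path, and `erlang:external_size/1` is
  only one plausible size estimate):

  ```
  %% Sketch only: split Docs into batches whose approximate size stays
  %% under MaxSize bytes, so each batch fits in one FDB transaction.
  batch_docs(Docs, MaxSize) ->
      batch_docs(Docs, MaxSize, 0, [], []).

  batch_docs([], _Max, _Size, Cur, Acc) ->
      lists:reverse([lists:reverse(Cur) | Acc]);
  batch_docs([Doc | Rest], Max, Size, Cur, Acc) ->
      DocSize = erlang:external_size(Doc),
      case Cur =/= [] andalso Size + DocSize > Max of
          true ->
              batch_docs(Rest, Max, DocSize, [Doc],
                         [lists:reverse(Cur) | Acc]);
          false ->
              batch_docs(Rest, Max, Size + DocSize, [Doc | Cur], Acc)
      end.
  ```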
* Fix flaky couch_jobs type monitor test | Nick Vatamaniuc | 2020-05-15 | 1 | -2/+36
  Sometimes this test fails on Jenkins but doesn't fail locally. The
  attempted fix is to simply retry a few times until the number of children
  in the supervisor reaches the expected value. Also extend the timeout to
  15 seconds.
* Merge pull request #2870 from cloudant/pagination-api-2 | iilyak | 2020-05-15 | 7 | -54/+1683
  Pagination API
* Add tests for pagination API | ILYA Khlopotov | 2020-05-15 | 1 | -0/+771
* Implement pagination API | ILYA Khlopotov | 2020-05-15 | 6 | -45/+600
* Add tests for legacy API before refactoring | ILYA Khlopotov | 2020-05-15 | 1 | -0/+302
* Move not_implemented check down to allow testing of validation | ILYA Khlopotov | 2020-05-15 | 1 | -5/+6
* Fix variable shadowing | ILYA Khlopotov | 2020-05-15 | 1 | -4/+4
* Fix compiler warning | Jay Doane | 2020-05-14 | 1 | -1/+1
* Fix a few flaky tests in fabric2_db | Nick Vatamaniuc | 2020-05-13 | 4 | -17/+22
  Add some longer timeouts and fix a race condition in db cleanup tests.
  (Thanks to @jdoane for the patch.)
* Merge pull request #2857 from apache/background-db-deletion | Peng Hui Jiang | 2020-05-13 | 4 | -3/+358
  Background database deletion
* background deletion for soft-deleted database | jiangph | 2020-05-13 | 4 | -3/+358
  Allow a background job to delete soft-deleted databases according to
  specified criteria in order to release space. Once a database is
  hard-deleted, its data can't be fetched back.

  Co-authored-by: Nick Vatamaniuc <vatamane@apache.org>
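  A minimal sketch of such a loop (all names below are hypothetical; the
  real module runs under a supervisor and reads its check interval and
  retention period from config):

  ```
  %% Sketch only: periodically hard-delete soft-deleted databases that
  %% are older than the retention period.
  expiration_loop(CheckSec, RetentionSec) ->
      timer:sleep(timer:seconds(CheckSec)),
      lists:foreach(fun({DbName, DeletedAt}) ->
          case age_sec(DeletedAt) > RetentionSec of
              true -> hard_delete(DbName, DeletedAt);   %% hypothetical
              false -> ok
          end
      end, list_soft_deleted()),                        %% hypothetical
      expiration_loop(CheckSec, RetentionSec).
  ```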