Active Tasks requires TotalChanges and ChangesDone to show the progress
of long-running tasks. That requires count_changes_since to be
implemented, which unfortunately is not easily done with FoundationDB.
This commit replaces TotalChanges with the versionstamp plus the number
of docs as a progress indicator. This may break existing APIs that rely
on TotalChanges. ChangesDone will still exist, but instead of relying
on the current changes sequence it simply reflects how many documents
were written by the updater process.

Add max_bulk_get_count configuration option
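
A sketch of how the new option might be set; the `[chttpd]` section and the value shown are assumptions for illustration, not taken from the commit:

```
; section and value are illustrative assumptions
[chttpd]
max_bulk_get_count = 10000
```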

https://github.com/apache/couchdb-erlfdb/releases/tag/v1.2.2

In an overload scenario, do not let notifiers crash and lose their
subscribers; instead, make them more robust and let them retry on future
or transaction timeouts.
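
A minimal sketch of that retry behavior, assuming a hypothetical `notify_with_retry/2` wrapper; the error tuples correspond to an `erlfdb:wait/2` timeout and the fdb transaction timeout (1031), but the function and its shape are illustrative, not the actual notifier code:

```
% Illustrative sketch: retry the notifier body on future/transaction
% timeouts instead of crashing and losing subscribers; run unguarded
% on the last attempt.
notify_with_retry(Fun, Retries) when Retries > 0 ->
    try
        Fun()
    catch
        error:{timeout, _Ref} ->          % future (erlfdb:wait) timeout
            notify_with_retry(Fun, Retries - 1);
        error:{erlfdb_error, 1031} ->     % fdb transaction timeout
            notify_with_retry(Fun, Retries - 1)
    end;
notify_with_retry(Fun, 0) ->
    Fun().
```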

Optimize couch_views by using a separate set of acceptors and workers.
Previously, all `max_workers` were spawned on startup and waited to
accept jobs in parallel. In a setup with a large number of pods, and
100 workers per pod, that could lead to a lot of conflicts being
generated when all those workers race to accept the same job at the
same time.
The improvement is to spawn only a limited number of acceptors (5 by
default), then spawn more after some of them become workers. Also,
when some workers finish or die with an error, check whether more
acceptors could be spawned.
As an example, here is what might happen with `max_acceptors = 5` and
`max_workers = 100` (`A` and `W` are the current counts of acceptors
and workers, respectively); a sketch of the accounting follows the list:
1. Starting out:
`A = 5, W = 0`
2. After 2 acceptors start running:
`A = 3, W = 2`
Then immediately 2 more acceptors are spawned:
`A = 5, W = 2`
3. After 95 workers are started:
`A = 5, W = 95`
4. Now if 3 acceptors accept, it would look like:
`A = 2, W = 98`
But no more acceptors would be started.
5. If the last 2 acceptors also accept jobs:
`A = 0, W = 100`
At this point no more indexing jobs can be accepted and started until
at least one of the workers finishes and exits.
6. If 1 worker exits:
`A = 0, W = 99`
An acceptor will be immediately spawned:
`A = 1, W = 99`
7. If all 99 workers exit, it will go back to:
`A = 5, W = 0`
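
Here is that sketch; the map-based state and function names are assumptions for illustration, not the actual couch_views code:

```
% Illustrative accounting only.
-define(MAX_ACCEPTORS, 5).
-define(MAX_WORKERS, 100).

% Spawn acceptors while there is acceptor headroom and total
% (acceptors + workers) capacity left; called on startup and whenever
% an acceptor becomes a worker or a worker exits.
maybe_spawn_acceptors(#{acceptors := A, workers := W} = St) ->
    case A < ?MAX_ACCEPTORS andalso A + W < ?MAX_WORKERS of
        true -> maybe_spawn_acceptors(spawn_acceptor(St));
        false -> St
    end.

spawn_acceptor(#{acceptors := A} = St) ->
    % starting the actual acceptor process is elided
    St#{acceptors := A + 1}.
```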

As per the ML
[discussion](https://lists.apache.org/thread.html/rb328513fb932e231cf8793f92dd1cc2269044cb73cb43a6662c464a1%40%3Cdev.couchdb.apache.org%3E),
add a `uuid` field to db info results in order to uniquely identify a
particular instance of a database. When a database is deleted and
re-created with the same name, it will return a new `uuid` value.
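
An abbreviated, illustrative `GET /{db}` response with the new field; the other fields and the value are placeholders, not taken from the commit:

```
{
    "db_name": "mydb",
    "uuid": "b9efcf11385f8902a52cf99aa1b7ea61",
    ...
}
```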

When waiting to accept jobs with scheduling in use, the timeout is
limited based on the scheduled-time parameter. When the no_schedule
option is used, the scheduled-time parameter is always set to 0, so in
that case we have to special-case the limit to return `infinity`.
Later on, when we wait for the watch to fire, the actual timeout can
still be limited by a separate user-specified timeout option, but if the
user specifies `infinity` there and sets `#{no_schedule => true}`, then
we should respect that and never return `{error, not_found}` in response.
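
A sketch of that special case, assuming a hypothetical `limit_timeout/2` helper; the cap value is arbitrary:

```
-define(MAX_ACCEPT_WAIT_MSEC, 30000). % illustrative cap

% With no_schedule the scheduled-time parameter is always 0, so bypass
% the time-based clamp and let the caller's timeout (possibly
% infinity) apply while waiting on the watch.
limit_timeout(_Timeout, true = _NoSched) ->
    infinity;
limit_timeout(Timeout, false = _NoSched) when is_integer(Timeout) ->
    min(Timeout, ?MAX_ACCEPT_WAIT_MSEC).
```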

Use the `no_schedule` option to speed up job dequeuing. This optimization
allows dequeuing jobs more efficiently if these conditions are met:
1) Job IDs start with a random prefix
2) No time-based scheduling is used
Both of those can be true for views, since job IDs can be generated such
that the signature comes before the db name part, which is what this
commit does.
The way the optimization works is that a random ID is generated in the
pending jobs range, then a key selection is used to pick a job either
before or after it. That reduces each dequeue attempt to just 1 read
instead of reading up to 1000 jobs.
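
A sketch of that single-read dequeue using erlfdb key selectors (`erlfdb_key:first_greater_or_equal/1` and `erlfdb:get_key/2`); the pending-range layout and the wrap-around helper are illustrative assumptions:

```
% Illustrative: pick a pseudo-random pending job with a single
% key-selector read.
pick_pending_job(Tx, Prefix) ->
    Rand = <<Prefix/binary, (crypto:strong_rand_bytes(16))/binary>>,
    Sel = erlfdb_key:first_greater_or_equal(Rand),
    Key = erlfdb:wait(erlfdb:get_key(Tx, Sel)),
    case binary:match(Key, Prefix) of
        {0, _} -> {ok, Key};    % selector landed inside the range
        _ -> wrap_to_range_start(Tx, Prefix)
    end.

% Fell past the end of the pending range: wrap to its first key.
wrap_to_range_start(Tx, Prefix) ->
    Sel = erlfdb_key:first_greater_or_equal(Prefix),
    {ok, erlfdb:wait(erlfdb:get_key(Tx, Sel))}.
```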

Under load, the accept loop can blow up with a timeout error from
[`erlfdb:wait(...)`](https://github.com/apache/couchdb-erlfdb/blob/master/src/erlfdb.erl#L255),
so guard against it just like we do for fdb transaction timeout (1031)
errors.
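
A sketch of the guard, with `do_accept/1` standing in for the real accept logic:

```
% Illustrative only; do_accept/1 is a stand-in.
accept_loop(St) ->
    try
        do_accept(St)
    catch
        % same treatment as the fdb transaction timeout (1031) guard
        error:{timeout, _Ref} -> accept_loop(St);
        error:{erlfdb_error, 1031} -> accept_loop(St)
    end.
```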

Update db handles right away as soon as the db version is checked. This
ensures concurrent requests get access to the current handle as soon as
possible and may avoid doing extra version checks and re-opens.

Check metadata versions to ensure newer handles are not clobbered. The
same thing is done for removal: `maybe_remove/1` removes a handle only
if there isn't a newer handle already there.
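
A sketch of the version check, assuming handles are cached in an ets table keyed by database name; the map fields and function name are illustrative:

```
% Illustrative: cache the handle only if there is no entry with an
% equal or newer metadata version already present; mirrors the
% maybe_remove/1 behavior described above.
maybe_update(Tab, Name, #{md_version := Ver} = Db) ->
    case ets:lookup(Tab, Name) of
        [{Name, #{md_version := Cur}}] when Cur >= Ver ->
            false;
        _ ->
            true = ets:insert(Tab, {Name, Db}),
            true
    end.
```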

Also guard against too many conflicts during overload

Let users specify the maximum document count for _bulk_docs requests. If
the document count exceeds the maximum, a 413 HTTP error is returned.
This also signals the replicator to try to bisect the _bulk_docs array
into smaller batches.
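
As with the bulk-get option earlier in this log, the setting would live in the server config; the section, key name, and default below are assumptions for illustration:

```
; section and value are illustrative assumptions
[chttpd]
max_bulk_docs_count = 10000
```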

Observed a number of timeouts with the previous default

In a constrained CI environment, transactions could retry multiple
times, so we cannot rely on precisely counting `erlfdb:transactional/2`
calls.

Increase the couch_views job timeout by 20 seconds. This sets a larger
jitter when multiple nodes concurrently check and re-enqueue jobs,
reducing the chance of them bumping into each other and conflicting.
If they do conflict in the activity monitor, catch the error and emit an
error log. The longer timeout also makes us more robust under load when
jobs whose workers have suddenly died get re-enqueued.

Fix handling of limit query parameter

Improve logging when permanently deleting databases
|
|/ / |
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Interactive (regular) requests are split into smaller transactions, so
larger updates won't fail with either timeout so or transaction too large
FDB errors.
* Non-interactive (replicated) requests can now batch their updates in a few
transaction and gain extra performance.
Batch size is configurable:
```
[fabric]
update_docs_batch_size = 5000000
```
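
A sketch of how updates might be grouped under such a byte limit; `doc_size/1` and the accumulator shape are assumptions, only the config knob above comes from the commit:

```
% Illustrative: group docs into batches whose accumulated size stays
% under Max bytes.
batch_docs(Docs, Max) ->
    batch_docs(Docs, Max, 0, [], []).

batch_docs([], _Max, _Sz, [], Batches) ->
    lists:reverse(Batches);
batch_docs([], _Max, _Sz, Cur, Batches) ->
    lists:reverse([lists:reverse(Cur) | Batches]);
batch_docs([Doc | Rest], Max, Sz, Cur, Batches) ->
    DocSz = doc_size(Doc),
    if
        Sz + DocSz > Max, Cur =/= [] ->
            batch_docs(Rest, Max, DocSz, [Doc],
                       [lists:reverse(Cur) | Batches]);
        true ->
            batch_docs(Rest, Max, Sz + DocSz, [Doc | Cur], Batches)
    end.

doc_size(Doc) ->
    erlang:external_size(Doc).
```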

Sometimes this test fails on Jenkins but doesn't fail locally. The
attempted fix is to simply retry a few times until the number of
children in the supervisor reaches the expected value. Also extend the
timeout to 15 seconds.
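
A sketch of that retry as a hypothetical test helper which polls the supervisor instead of asserting once:

```
% Hypothetical test helper, not the actual test code.
wait_child_count(Sup, Expected, Retries) when Retries > 0 ->
    case length(supervisor:which_children(Sup)) of
        Expected -> ok;
        _ ->
            timer:sleep(100),
            wait_child_count(Sup, Expected, Retries - 1)
    end;
wait_child_count(_Sup, _Expected, 0) ->
    error(child_count_timeout).
```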

Pagination API

Add some longer timeouts and fix a race condition in db cleanup tests
(Thanks to @jdoane for the patch)

Background database deletion

Allow a background job to delete soft-deleted databases according to
specified criteria in order to release space. Once a database is
hard-deleted, the data can't be fetched back.
Co-authored-by: Nick Vatamaniuc <vatamane@apache.org>