| Commit message | Author | Age | Files | Lines |
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Previously, dbs with N < cluster default N would pollute logs with critical
errors regarding not having enough shards. Instead, use each database's
expected N value to emit custodian reports.
Note: the expected N value is a bit tricky to understand since, with the shard
splitting feature, shard ranges are not guaranteed to exactly match for all
copies. The N value is then defined as the max number of rings which can be
completed with the given set of shards -- complete the ring once, remove the
participating shards, try again, etc. Luckily for us, that function is already
written (`mem3_util:calculate_max_n(Shards)`), so we are just re-using it.
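A rough, greedy sketch of that ring-counting idea (the real implementation is
`mem3_util:calculate_max_n/1`, which handles split ranges more carefully);
shards are modeled here as `{Begin, End}` ranges over the 32-bit keyspace:
```
%% Simplified illustration, not the actual mem3_util code.
max_n(Shards) ->
    max_n(Shards, 0).

max_n(Shards, N) ->
    case complete_ring(Shards, 0, []) of
        {true, Used} -> max_n(Shards -- Used, N + 1);
        false -> N
    end.

%% Greedily walk the keyspace from 0, chaining End + 1 -> Begin,
%% until the whole range is covered once.
complete_ring(_Shards, Next, Used) when Next > 16#ffffffff ->
    {true, Used};
complete_ring(Shards, Next, Used) ->
    case [S || {B, _} = S <- Shards, B =:= Next] of
        [{_, End} = S | _] -> complete_ring(Shards, End + 1, [S | Used]);
        [] -> false
    end.
```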
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Prevent failures like:
mem3_rep: find_source_seq_unknown_node_test...*failed*
in function gen_server:call/2 (gen_server.erl, line 206)
in call from couch_log:log/3 (src/couch_log.erl, line 73)
in call from mem3_rep:find_source_seq_int/5 (src/mem3_rep.erl, line 248)
in call from mem3_rep:'-find_source_seq_unknown_node_test/0-fun-0-'/0 (src/mem3_rep.erl, line 794)
**exit:{noproc,{gen_server,call,
[couch_log_server,
{log,{log_entry,warning,<0.17426.5>,
["mem3_rep",32,102,105,110,100|...],
"--------",
["2021",45,"10",45|...]}}]}}
output:<<"">>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Prevent failures like this from repeated test runs:
mem3_bdu_test:73: mem3_bdu_shard_doc_test_ (t_design_docs_are_not_validated)...*failed*
in function mem3_bdu_test:'-t_design_docs_are_not_validated/1-fun-0-'/1 (test/eunit/mem3_bdu_test.erl, line 206)
in call from mem3_bdu_test:t_design_docs_are_not_validated/1 (test/eunit/mem3_bdu_test.erl, line 206)
in call from eunit_test:run_testfun/1 (eunit_test.erl, line 71)
in call from eunit_proc:run_test/1 (eunit_proc.erl, line 510)
in call from eunit_proc:with_timeout/3 (eunit_proc.erl, line 335)
in call from eunit_proc:handle_test/2 (eunit_proc.erl, line 493)
in call from eunit_proc:tests_inorder/3 (eunit_proc.erl, line 435)
in call from eunit_proc:with_timeout/3 (eunit_proc.erl, line 325)
**error:{assertEqual,[{module,mem3_bdu_test},
{line,206},
{expression,"Code"},
{expected,201},
{value,409}]}
output:<<"">>
|
| |
| |
| |
| |
| | |
Depending on configuration, it is possible for the shards db to be
different than `_dbs`.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Prevent failures like this:
mem3_sync_event_listener:267: should_set_sync_delay...*failed*
in function gen_server:call/3 (gen_server.erl, line 214)
in call from mem3_sync_event_listener:'-should_set_sync_delay/1-fun-1-'/1
(src/mem3_sync_event_listener.erl, line 268)
**exit:{{noproc,{gen_server,call,
[couch_log_server,
{log,{log_entry,notice,<0.31789.5>,
["config",58,32,91,[...]|...],
"--------",
["2021",45,[...]|...]}}]}},
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Our existing logic for handling rewinds in the changes feed addresses
the following cases:
- A node that contributed to a sequence is in maintenance mode
- A shard that contributed to a sequence has been split
This patch adds support for cases where the node that contributed to a
client-supplied sequence is down at the beginning of the request
handling. It reuses the same logic as the maintenance mode case as these
two situations really ought to be handled the same way.
A future improvement would be to unify the "node down" and "shard split"
logic so that we could handle the compound case, e.g. replacing a shard
from a down node with a pair of shards from nodes that cover the same
range.
Fixes #3788
Co-authored-by: Nick Vatamaniuc <vatamane@gmail.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
- Unused functions in `couch_util_tests`
- Unused variables in `couch_prometheus_e2e_tests`
- Unused variable in `dreyfus_blacklist_await_test`
- Deprecated BIF `erlang:now/0` in `dreyfus_purge_test`
- `export_all` flag in dreyfus tests
- Unused variable in `mem3_reshard_test`
|
| |
| |
| |
| |
| |
| | |
The test only checks that we can update the shard doc, so we just verify that.
Apparently, that doesn't mean we can synchronously access the newly created db
info right away, so we just skip that part to avoid a flaky failure.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Noticed this flaky test showing up in a few recent test runs, for
[example](https://ci-couchdb.apache.org/blue/organizations/jenkins/jenkins-cm1%2FPullRequests/detail/PR-3799/1/pipeline).
The test was flaky because we were only waiting for the replication task or
scheduler job to appear in the list, but did not wait until the value of the
task had been updated to the expected value. So the task might have appeared
with only half the docs written (say, 5 instead of 10). Testing the value at
that stage is too early and the test would fail.
To fix the issue, besides waiting on the task/job to appear in the
list, also wait until its `docs_written` value matches the expected
value. By that point `docs_read` should have caught up as well.
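A hypothetical wait helper along those lines, where `GetTaskFun` is assumed to
return the current task (or scheduler job) as a map:
```
%% Poll until the task reports the expected docs_written value,
%% instead of only waiting for the task to appear.
wait_docs_written(GetTaskFun, Expected, TimeoutMsec) ->
    Until = erlang:monotonic_time(millisecond) + TimeoutMsec,
    wait_docs_written_loop(GetTaskFun, Expected, Until).

wait_docs_written_loop(GetTaskFun, Expected, Until) ->
    case GetTaskFun() of
        #{<<"docs_written">> := Expected} = Task ->
            {ok, Task};
        _NotYet ->
            case erlang:monotonic_time(millisecond) < Until of
                true ->
                    timer:sleep(100),
                    wait_docs_written_loop(GetTaskFun, Expected, Until);
                false ->
                    {error, timeout}
            end
    end.
```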
|
| |
| |
| |
| |
| |
| |
| | |
Fetch the libicu base version as well as the collator version. The
base version may be used to determine which libicu library CouchDB is
using. The collator version may be used to debug view behavior in cases
where the collation order has changed from one version to the next.
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This fixes issue: https://github.com/apache/couchdb/issues/3786
In addition, add a few _all_dbs limit tests since we didn't seem to have
any previously to catch such issues. Plus, test some of the corner
cases which should be caught by the BDU and should return a 403 error
code.
|
|/
|
|
|
|
|
|
|
|
|
|
| |
Previously, view reduce collation with keys relied on the keys in the
rows returned from the view shards to exactly match (=:=) the keys
specified in the args. However, in the case when there are multiple
rows which compare equal with the unicode collator, that may not
always be the case.
In that case when the rows are fetched from the row dict by key, they
should be matched using the same collation algorithm as the one used
on the view shards.
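Conceptually the fix looks like the sketch below, assuming a comparator such
as `couch_ejson_compare:less/2` that returns a negative, zero, or positive
integer (the same collation used on the view shards); the helper name is
illustrative:
```
%% Look up a row by a collation-equal key instead of exact (=:=)
%% equality against the keys supplied in the args.
find_collated(Key, KeyDict) ->
    Matches = [V || {K, V} <- dict:to_list(KeyDict),
                    couch_ejson_compare:less(Key, K) =:= 0],
    case Matches of
        [V | _] -> {ok, V};
        [] -> error
    end.
```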
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This means `update_seq` values from `GET $db` and `last_seq` values returned
from `GET $db/_changes?since=now&limit=` will be more resilient to changes feed
rewinds.
Besides, those sequences will now be more consistent and users won't have to
wonder why one opaque sequence works slightly differently than another opaque
update sequence.
Previously, when the sequences were returned only as numeric values, it was
impossible to calculate replacements and change feeds had to always rewind back
to 0 for those ranges. With uuids and epochs in play, it is possible to figure
out that some shards might have moved to new nodes or find internal replication
checkpoints to avoid streaming changes feeds from 0 on those ranges.
Some replication Elixir tests decode update sequences, so those were updated to
handle the new uuid and epoch format as well.
Fixes: https://github.com/apache/couchdb/issues/3787
Co-author: Adam Kocoloski kocolosk@apache.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
couch_icu_driver is only used for binary string comparison in
couch_ejson_compare when expression depth becomes greater than 10.
The logic for string comparison is identical to what couch_ejson_compare uses,
so opt to just use couch_ejson_compare instead of keeping a whole other binary
collation driver around.
To avoid a possible infinite loop if couch_ejson_compare nif fails to
load, throw a nif loading error as is common for nif modules.
To avoid another case of a possible infinite retry from a badarg
generated by max depth, and/or an actual bad ejson term, use a
specific max depth error so we don't have to guess when we catch it
and retry term traversal in Erlang.
There was another undocumented case where badarg was thrown besides max
depth or an invalid arg: when a prop value was compared with
any other supported type. In erlang it would be handled in these
clauses:
```
less_erl({A},{B}) when is_list(A), is_list(B) -> less_props(A,B);
less_erl({A},_) when is_list(A) -> -1;
less_erl(_,{B}) when is_list(B) -> 1.
```
However, in C we threw a badarg for the last two clauses and relied on
erlang to do all the work. This case was a potential performance issue
as well since that is a common comparison for mango where we may
compare keys against the max json object value (<<255,255,255,255>>).
Add a few property tests in order to validate collation behavior. The two main
ones are:
1) Given an expected sort order of some test values, assert that both the
erlang and nif collators would correctly order any of those test values.
2) In general, the nif collator would sort any json the same way as the
erlang one.
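The retry path then becomes explicit rather than guessing at a generic badarg;
a sketch, where `less_nif/2`, `less_erl/2` and the exact error term are
assumptions:
```
%% Fall back to the Erlang comparator only for the dedicated
%% max-depth error, instead of retrying on any badarg.
less_json(A, B) ->
    try
        less_nif(A, B)
    catch
        error:max_depth_error ->
            %% term nested deeper than the NIF will walk:
            %% compare in Erlang instead
            less_erl(A, B)
    end.
```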
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It turns out `infinity` is a valid configuration value for the fabric
request_timeout. While we can pass that to an Erlang `receive` statement, any
arithmetic with it would fail.
To guard against the crash, use the max small int value (60 bits). With enough
shards, due to the exponential nature of the algorithm, we still get a nice
progression from the minimum 100 msec all the way up to the large int value.
This case is illustrated in the test.
Issue: https://github.com/apache/couchdb/issues/3789
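A minimal sketch of the guard, assuming the value is read via `config:get/3`
as a string (names are illustrative, not the exact fabric code); `infinity` is
mapped to the largest 60-bit small integer so the subsequent arithmetic stays
valid:
```
-define(MAX_SMALL_INT, (1 bsl 59) - 1).

request_timeout() ->
    case config:get("fabric", "request_timeout", "60000") of
        "infinity" -> ?MAX_SMALL_INT;
        Msec -> list_to_integer(Msec)
    end.
```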
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
Prevent this race condition:
*** context setup failed ***
**in function couch_replicator_doc_processor:setup/0 (src/couch_replicator_doc_processor.erl, line 872)
**error:{badmatch,{error,{already_started,<0.4946.0>}}}
|
|
|
|
|
|
| |
- Prepend unused variable with underscore
- Add nowarn_export_all compiler option
- Use STACKTRACE macro
|
|
|
|
| |
Delete unused function and remove unused variable.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the previous attempt [1] we improved the logic by spawning workers on the
matching target shards only. However, that wasn't enough as workers might still
reject the passed-in sequence from the old node when asserting ownership
locally on each shard.
Re-use the already existing replacement clause where, after the uuid is
matched, we try harder to find the highest viable sequence. To use the unit
test setup as an example: if the shard moved from node1 to node2 and recorded
epoch `{node2, 10}` on the new node, then a sequence generated on node1 before
the move, for example 12, would only rewind down to 10 when calculated on its
new location on node2, instead of being rewound all the way down to 0.
[1] https://github.com/apache/couchdb/commit/e83935c7f8c3e47b47f07f22ece327f6529d4da0
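Purely as an illustration of the epoch clamping described above (not the real
couch_db code), with epochs given as `[{Node, StartSeq}, ...]`, newest first:
```
%% Seq is the client-supplied sequence, SeqNode the node that stamped it.
highest_viable_seq(Seq, SeqNode, [{SeqNode, Start} | _]) when Seq >= Start ->
    %% the sequence was generated by the current owner of this copy,
    %% so it can be used as-is
    Seq;
highest_viable_seq(Seq, _SeqNode, [{_OwnerNode, Start} | _]) when Seq >= Start ->
    %% e.g. epoch {node2, 10} and a sequence 12 stamped by node1 before
    %% the move: rewind only to 10, not to 0
    Start;
highest_viable_seq(Seq, SeqNode, [_Epoch | Rest]) ->
    highest_viable_seq(Seq, SeqNode, Rest);
highest_viable_seq(_Seq, _SeqNode, []) ->
    0.
```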
|
|
|
|
| |
To include another changelog entry: https://github.com/apache/couchdb-documentation/commit/4f00da0b0cedf63ebf391e43b1a56bb36f7d0f96
|
|
|
| |
Missed file in f85cff669f20cee0a54da7bb8c645dfc4d2de5c9
|
|
|
|
|
|
| |
Based off of the upstream 1.0.9 + CouchDB clone changes
https://github.com/apache/couchdb-jiffy/releases/tag/CouchDB-1.0.9-1
|
| |
|
|
|
|
|
|
| |
Fauxton was failing so we backported the fix from main
https://github.com/apache/couchdb/commit/f85cff669f20cee0a54da7bb8c645dfc4d2de5c9
for it.
|
| |
|
|
|
|
|
| |
These two tests exercise the same assertions as the individual
`sandbox_doc_attachments` test case in chttpd_csp_tests.erl.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We do that by matching the comparator function behavior used during row merging
[1] with the comparison function used when sorting the rows on the view
shards [2]. This goes along with the constraint in the lists:merge/3 docs, which
indicate that the input lists should be sorted according to the same
comparator [3] as the one passed to the lists:merge/3 call.
The stability of returned rows results from the case when both keys match as
equal. Now `lists:merge/3` will favor the element in the existing rows list
instead of always replacing [4] the older matching row with the last arriving
row, since now `less(A, A)` will be `false`, while previously it was `true`.
The fix was found by Adam when discussing issue #3750
https://github.com/apache/couchdb/issues/3750#issuecomment-920947424
Co-author: Adam Kocoloski <kocolosk@apache.org>
[1] https://github.com/apache/couchdb/blob/3.x/src/fabric/src/fabric_view_map.erl#L244-L248
[2] https://github.com/apache/couchdb/blob/3.x/src/couch_mrview/src/couch_mrview_util.erl#L1103-L1107
[3] https://erlang.org/doc/man/lists.html#merge-3
[4] https://github.com/erlang/otp/blob/master/lib/stdlib/src/lists.erl#L2668-L2675
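A sketch of the comparator shape, with rows shown as plain `{Key, Id}` tuples
and `couch_ejson_compare:less/2` standing in for the shard-side collation
(assumptions, not the actual fabric_view_map code):
```
%% Collate keys the same way the view shards do, break ties by id,
%% and return false for equal rows: a strict "less than".
merge_less({KeyA, IdA}, {KeyB, IdB}) ->
    case couch_ejson_compare:less(KeyA, KeyB) of
        0 -> IdA < IdB;
        Cmp -> Cmp < 0
    end.
```
Passing a strict comparator like this to `lists:merge/3` is what makes the
merge keep the row that is already in place when two rows compare as equal,
as described above, since `less(A, A)` is now `false`.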
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Results should now be returned in descending {key, doc_id} order.
The idea is to reverse the key list before sending it to the workers, so they
will emit rows in reverse order. Also, we are using the same reversed list when
building the KeyDict structure on the coordinator. That way the order of the
sent rows and the expected coordinator sorting order will match.
For testing, enhance an existing multi-key Elixir view test to test both
ascending and descending cases and actually check that the rows are in the
correct order each time.
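The core of the change is tiny; a hypothetical helper capturing it:
```
%% Reverse the caller's keys when descending=true so the shard workers
%% and the coordinator's KeyDict agree on the expected row order.
keys_in_send_order(Keys, _Descending = true) -> lists:reverse(Keys);
keys_in_send_order(Keys, _Descending = false) -> Keys.
```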
|
|\
| |
| | |
Port 3286 - Add ability to control which Elixir integration tests to run
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A new `elixir-suite` Makefile target is added. It runs a predefined set of Elixir
integration tests.
The feature is controlled by two files:
- test/elixir/test/config/suite.elixir - contains a list of all available tests
- test/elixir/test/config/skip.elixir - contains a list of tests to skip
In order to update `test/elixir/test/config/suite.elixir` when new tests
are added, one would need to run the following command:
```
MIX_ENV=integration mix suite > test/elixir/test/config/suite.elixir
```
|
|\
| |
| | |
Fix limit0 for views again
|
| |
| |
| |
| |
| |
| |
| |
| | |
The limit=0 clause was introduced in commit 4e0c97bf which added
sorted=false support. It accidentally matches when the user specifies
limit=0 and causes us not to apply the logic that ensures we collect a
{meta, Meta} message from each shard range and then send the
total_rows and offset fields.
|
|/
|
|
| |
This reverts commit a36e7308ab4a2cfead6da64a9f83b7776722382d.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, as soon as one row was returned, we immediately stopped, erroneously
assuming that meta for all ranges had already been received. However, it was
possible that we'd get meta from range 00-7f, then a row from 00-7f before
getting meta from 7f-ff and thus we'd return an empty result.
To fix the issue we simply re-use the already existing limit=0 clause from the
fabric_view:maybe_send_row/1 function which will wait until there is a complete
ring before returning. That relies on updating the counters (the ring) only
with meta responses and not with view rows, so if the ring is complete, we know
it was completed by meta alone.
The other issue with the limit=0 clause was that it wasn't properly acking the
received row. Rows are acked for the sorted=false case below and for the
regular limit>0, sorted=true case in fabric_view:get_next_row/1.
Issue: https://github.com/apache/couchdb/issues/3750
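The overall flow could be sketched like this (purely illustrative, not the
actual fabric_view code; the ring is modeled as a list of
`{Range, pending | done}` tuples):
```
%% With limit = 0, view rows are acked but ignored, and only meta
%% messages advance the ring, so we reply only once every shard range
%% has contributed its meta.
collect_limit0(Ring, AckFun, ReplyFun) ->
    receive
        {meta, Range, _Meta} ->
            Ring1 = lists:keystore(Range, 1, Ring, {Range, done}),
            case lists:all(fun({_R, S}) -> S =:= done end, Ring1) of
                true -> ReplyFun(Ring1);
                false -> collect_limit0(Ring1, AckFun, ReplyFun)
            end;
        {view_row, Range, _Row} ->
            %% ack so the worker keeps streaming, but don't let a row
            %% count toward ring completion
            AckFun(Range),
            collect_limit0(Ring, AckFun, ReplyFun)
    end.
```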
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, if the source db purge sequence > `purge_infos_limit`, shard
splitting would crash with the `{{invalid_start_purge_seq,0},
[{couch_bt_engine,fold_purge_infos,5...` error. That was because purge
sequences were always copied starting from 0. That would only work as long as
the total number of purges stayed below the purge_infos_limit threshold. In
order to correctly gather the purge sequences, the start sequence must be
based off of the actual oldest sequence currently available.
An example of how it should be done is in the `mem3_rpc` module, when loading
purge infos [0], so here we do exactly the same. The `MinSeq - 1` logic is also
evident by inspecting the fold_purge_infos [1] function.
The test sets up the exact scenario as described above: reduces the purge info
limit to 10 then purges 20 documents. By purging more than the limit, we ensure
the starting sequence is now != 0. However, the purge sequence btree is
actually trimmed down during compaction. That is why there are a few extra
helper functions to ensure compaction runs and finishes before shard splitting
starts.
Fixes: https://github.com/apache/couchdb/issues/3738
[0] https://github.com/apache/couchdb/blob/4ea9f1ea1a2078162d0e281948b56469228af3f7/src/mem3/src/mem3_rpc.erl#L206-L207
[1] https://github.com/apache/couchdb/blob/3.x/src/couch/src/couch_bt_engine.erl#L625-L637
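A rough sketch of the change, assuming engine accessors along the lines of
`couch_db:get_oldest_purge_seq/1` and `couch_db:fold_purge_infos/5` (treat the
exact names and arities as assumptions):
```
%% Start folding purge infos from the oldest purge sequence still
%% available, rather than from 0.
copy_purge_infos(SourceDb, Fun, Acc0) ->
    Oldest = couch_db:get_oldest_purge_seq(SourceDb),
    %% fold from the sequence *after* which to start, hence the - 1
    %% (same idea as in mem3_rpc)
    StartSeq = max(0, Oldest - 1),
    couch_db:fold_purge_infos(SourceDb, StartSeq, Fun, Acc0, []).
```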
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, users with low {Q, N} dbs often got the `"No DB shards could be
opened."` error when the cluster is overloaded. The hard-coded 100 msec timeout
was too low to open the few available shards and the whole request would crash
with a 500 error.
Attempt to calculate an optimal timeout value based on the number of shards and
the max fabric request timeout limit.
The sequence of doubling (by default) timeouts forms a geometric progression.
Use the well known closed form formula for the sum [0], and the maximum request
timeout, to calculate the initial timeout. The test case illustrates a few
examples with some default Q and N values.
We don't want the timeout value to be too low, since it takes time to open
shards and we don't want to quickly cycle through a few initial shards and
discard their results. For that reason the minimum initial timeout is clipped
to the previously hard-coded 100 msec timeout. Unlike before, however, this
minimum value can now also be configured.
[0] https://en.wikipedia.org/wiki/Geometric_series
Fixes: https://github.com/apache/couchdb/issues/3733
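A sketch of the calculation under those assumptions (illustrative names;
`Factor` is the doubling factor and `NumShards` the number of workers to cycle
through):
```
%% Pick an initial timeout D such that the doubling sequence
%% D, D*Factor, D*Factor^2, ... over NumShards attempts sums to
%% roughly the max request timeout:
%%   sum = D * (Factor^NumShards - 1) / (Factor - 1)
calculate_initial_timeout(NumShards, MaxTimeout, Factor, MinTimeout) ->
    Sum = (math:pow(Factor, NumShards) - 1) / (Factor - 1),
    Timeout = trunc(MaxTimeout / Sum),
    max(MinTimeout, Timeout).

%% e.g. Q=8, N=3 -> 24 shards, 60000 msec budget, factor 2:
%%   calculate_initial_timeout(24, 60000, 2, 100) -> 100 (clipped)
%% e.g. 2 shards: calculate_initial_timeout(2, 60000, 2, 100) -> 20000
```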
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This introduces CSP settings for attachments and show/list funs and
streamlines the configuration with the existing Fauxton CSP options.
Deprecates the old `[csp] enable` and `[csp] header_value` config
options, but they are honoured going forward.
They are replaced with `[csp] utils_enable` and `[csp] utils_header_value`
respectively. The functionality and default values remain the same.
In addition, these new config options are added, along with their
default values:
```
[csp]
attachments_enable = true
attachments_header_value = sandbox
showlist_enable = true
showlist_header_value = sandbox
```
These add `Content-Security-Policy` headers to all attachment requests
and to all non-JSON show and all list function responses.
Co-authored-by: Nick Vatamaniuc <vatamane@gmail.com>
Co-authored-by: Robert Newson <rnewson@apache.org>
|