Commit message | Author | Date | Files | Lines (-/+)
* fix(auth): straggling s/couch_httpd_auth/chttpd/ (fix/cluster-setup) | Jan Lehnardt | 2021-10-30 | 1 | -1/+1
* fix(setup): straggling s/couch_httpd_auth/chttpd_auth/ closes #3805 | Jan Lehnardt | 2021-10-29 | 1 | -2/+2
* Eliminate custodian false positive errors for dbs with N < default N | Nick Vatamaniuc | 2021-10-28 | 1 | -4/+3
  Previously, dbs with N < cluster default N would pollute logs with critical errors regarding not having enough shards. Instead, use each database's expected N value to emit custodian reports.

  Note: the expected N value is a bit tricky to understand since, with the shard splitting feature, shard ranges are not guaranteed to match exactly for all copies. The N value is then defined as the maximum number of rings which can be completed with the given set of shards: complete the ring once, remove the participating shards, try again, and so on. Lucky for us, that function is already written (`mem3_util:calculate_max_n(Shards)`), so we are just re-using it.
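Illustrative sketch (not part of the commit): a greedy version of the "complete a ring, remove its shards, repeat" idea described above. Shards are assumed to be `{Begin, End}` integer ranges over the 32-bit keyspace; the real `mem3_util:calculate_max_n/1` handles split and overlapping ranges more carefully.

```erlang
%% Count how many complete rings can be formed from a set of shard ranges.
max_n(Shards) ->
    max_n(Shards, 0).

max_n(Shards, N) ->
    case ring_from(0, Shards) of
        {true, Leftover} -> max_n(Leftover, N + 1);
        false -> N
    end.

%% Walk the keyspace from Pos, consuming one range at a time; return the
%% unused ranges so the caller can try to build another ring from them.
ring_from(Pos, Shards) when Pos > 16#FFFFFFFF ->
    {true, Shards};
ring_from(Pos, Shards) ->
    case lists:keytake(Pos, 1, Shards) of
        {value, {Pos, End}, Rest} -> ring_from(End + 1, Rest);
        false -> false
    end.
```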
* Use configured shards db in custodian instead of `"dbs"` | Nick Vatamaniuc | 2021-10-28 | 2 | -2/+3
* Mock `couch_log:warning/2` | Jay Doane | 2021-10-27 | 1 | -5/+22
  Prevent failures like:

    mem3_rep: find_source_seq_unknown_node_test...*failed*
    in function gen_server:call/2 (gen_server.erl, line 206)
      in call from couch_log:log/3 (src/couch_log.erl, line 73)
      in call from mem3_rep:find_source_seq_int/5 (src/mem3_rep.erl, line 248)
      in call from mem3_rep:'-find_source_seq_unknown_node_test/0-fun-0-'/0 (src/mem3_rep.erl, line 794)
    **exit:{noproc,{gen_server,call,
                       [couch_log_server,
                        {log,{log_entry,warning,<0.17426.5>,
                                 ["mem3_rep",32,102,105,110,100|...],
                                 "--------",
                                 ["2021",45,"10",45|...]}}]}}
      output:<<"">>
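Illustrative sketch of the kind of mock that avoids these `noproc` crashes when the couch_log application isn't running during a test. It assumes the meck mocking library; the exact setup/teardown in mem3's test suite may differ.

```erlang
%% Stub couch_log:warning/2 so log calls no longer hit the (not running)
%% couch_log_server gen_server during the test.
setup_log_mock() ->
    meck:new(couch_log, [passthrough]),
    meck:expect(couch_log, warning, fun(_Fmt, _Args) -> ok end).

teardown_log_mock() ->
    meck:unload(couch_log).
```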
* Use unique ddoc id to prevent collisions | Jay Doane | 2021-10-27 | 1 | -2/+4
  Prevent failures like this from repeated test runs:

    mem3_bdu_test:73: mem3_bdu_shard_doc_test_ (t_design_docs_are_not_validated)...*failed*
    in function mem3_bdu_test:'-t_design_docs_are_not_validated/1-fun-0-'/1 (test/eunit/mem3_bdu_test.erl, line 206)
    in call from mem3_bdu_test:t_design_docs_are_not_validated/1 (test/eunit/mem3_bdu_test.erl, line 206)
    in call from eunit_test:run_testfun/1 (eunit_test.erl, line 71)
    in call from eunit_proc:run_test/1 (eunit_proc.erl, line 510)
    in call from eunit_proc:with_timeout/3 (eunit_proc.erl, line 335)
    in call from eunit_proc:handle_test/2 (eunit_proc.erl, line 493)
    in call from eunit_proc:tests_inorder/3 (eunit_proc.erl, line 435)
    in call from eunit_proc:with_timeout/3 (eunit_proc.erl, line 325)
    **error:{assertEqual,[{module,mem3_bdu_test},
                          {line,206},
                          {expression,"Code"},
                          {expected,201},
                          {value,409}]}
      output:<<"">>
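Illustrative sketch of one way to build a per-run design doc id so repeated runs don't 409-conflict on the same document; the helper name and id shape are hypothetical, not the exact ones used in mem3_bdu_test.

```erlang
%% Append a VM-unique suffix so each test run writes a distinct ddoc.
unique_ddoc_id() ->
    Suffix = integer_to_binary(erlang:unique_integer([positive])),
    <<"_design/test-ddoc-", Suffix/binary>>.
```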
* Parameterize shards db | Jay Doane | 2021-10-27 | 1 | -31/+31
  Depending on configuration, it is possible for the shards db to be different than `_dbs`.
* Mock `couch_log` for config application | Jay Doane | 2021-10-27 | 1 | -0/+4
  Prevent failures like this:

    mem3_sync_event_listener:267: should_set_sync_delay...*failed*
    in function gen_server:call/3 (gen_server.erl, line 214)
    in call from mem3_sync_event_listener:'-should_set_sync_delay/1-fun-1-'/1 (src/mem3_sync_event_listener.erl, line 268)
    **exit:{{noproc,{gen_server,call,
                        [couch_log_server,
                         {log,{log_entry,notice,<0.31789.5>,
                                  ["config",58,32,91,[...]|...],
                                  "--------",
                                  ["2021",45,[...]|...]}}]}},

* Minimize rewinds when a node is down (#3792) | Adam Kocoloski | 2021-10-27 | 1 | -17/+59
  Our existing logic for handling rewinds in the changes feed addresses the following cases:

  - A node that contributed to a sequence is in maintenance mode
  - A shard that contributed to a sequence has been split

  This patch adds support for cases where the node that contributed to a client-supplied sequence is down at the beginning of the request handling. It reuses the same logic as the maintenance mode case as these two situations really ought to be handled the same way.

  A future improvement would be to unify the "node down" and "shard split" logic so that we could handle the compound case, e.g. replacing a shard from a down node with a pair of shards from nodes that cover the same range.

  Fixes #3788

  Co-authored-by: Nick Vatamaniuc <vatamane@gmail.com>
* Eliminate eunit compiler warnings | Jay Doane | 2021-10-25 | 6 | -33/+18
  - Unused functions in `couch_util_tests`
  - Unused variables in `couch_prometheus_e2e_tests`
  - Unused variable in `dreyfus_blacklist_await_test`
  - Deprecated BIF `erlang:now/0` in `dreyfus_purge_test`
  - `export_all` flag in dreyfus tests
  - Unused variable in `mem3_reshard_test`
* Fix flaky mem3_bdu test | Nick Vatamaniuc | 2021-10-25 | 1 | -9/+1
  The test only checks that we can update the shard doc, so we just verify that. Apparently, that doesn't mean we can synchronously access the newly created db info right away, so we skip that part to avoid a flaky failure.
* Fix flaky retain_stats replicator test | Nick Vatamaniuc | 2021-10-24 | 1 | -6/+14
  Noticed this flaky test show up in a few recent test runs, for [example](https://ci-couchdb.apache.org/blue/organizations/jenkins/jenkins-cm1%2FPullRequests/detail/PR-3799/1/pipeline).

  The test was flaky because we were only waiting for the replication task or scheduler job to appear in the list, but did not wait until the value of the task had been updated to the expected value. So the task might have appeared with only half the docs written (say, 5 instead of 10). Testing the value at that stage is too early, and the test would fail.

  To fix the issue, besides waiting on the task/job to appear in the list, also wait until its `docs_written` value matches the expected value. By that point `docs_read` should have caught up as well.
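Illustrative sketch of a generic "wait until" helper of the kind used to de-flake such tests; the function name, timeout values, and the `docs_written/1` accessor in the usage comment are hypothetical, as the replicator test suite has its own helpers.

```erlang
%% Poll Check/0 until it returns true, giving up after Timeout msec.
wait_until(_Check, Timeout, _Delay) when Timeout =< 0 ->
    erlang:error(wait_until_timeout);
wait_until(Check, Timeout, Delay) ->
    case Check() of
        true -> ok;
        false ->
            timer:sleep(Delay),
            wait_until(Check, Timeout - Delay, Delay)
    end.

%% Usage sketch: wait until the scheduler job reports all docs written.
%% wait_until(fun() -> docs_written(JobId) =:= ExpectedDocs end, 5000, 100).
```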
* Add libicu version fetching and emit it in the _node/_local/_versions | Nick Vatamaniuc | 2021-10-24 | 4 | -10/+98
  Fetch the libicu base version as well as the collator version. The base version may be used to determine which libicu library CouchDB is using. The collator version may be used to debug view behavior in case the collation order has changed from one version to the next.
* Move custodian VDU to a BDU and fix _all_dbs off-by-one limit bug | Nick Vatamaniuc | 2021-10-24 | 8 | -81/+448
  This fixes issue https://github.com/apache/couchdb/issues/3786

  In addition, add a few _all_dbs limit tests, since we didn't seem to have any previously to catch such issues. Plus, test some of the corner cases which should be caught by the BDU and should return a 403 error code.
* Include shard uuids in db_info update sequences | Nick Vatamaniuc | 2021-10-19 | 4 | -3/+97
  This means `update_seq` values from `GET $db` and `last_seq` values returned from `GET $db/_changes?since=now&limit=` will be more resilient to changes feed rewinds. Besides, those sequences will now be more consistent, and users won't have to wonder why one opaque sequence works slightly differently than another opaque update sequence.

  Previously, when the sequences were returned only as numeric values, it was impossible to calculate replacements, and changes feeds had to always rewind back to 0 for those ranges. With uuids and epochs in play, it is possible to figure out that some shards might have moved to new nodes, or to find internal replication checkpoints, to avoid streaming changes feeds from 0 on those ranges.

  Some replication Elixir tests decode update sequences, so those were updated to handle the new uuid and epoch format as well.

  Fixes: https://github.com/apache/couchdb/issues/3787

  Co-author: Adam Kocoloski kocolosk@apache.org
* Remove couch_icu_driver module | Nick Vatamaniuc | 2021-10-15 | 11 | -343/+390
  couch_icu_driver is only used for binary string comparison in couch_ejson_compare when the expression depth becomes greater than 10. The logic for string comparison is identical to what couch_ejson_compare uses, so opt to just use couch_ejson_compare instead of keeping a whole other binary collation driver around.

  To avoid a possible infinite loop if the couch_ejson_compare nif fails to load, throw a nif loading error, as is common for nif modules.

  To avoid another case of a possible infinite retry from a badarg generated by max depth and/or an actual bad ejson term, use a specific max depth error, so we don't have to guess when we catch it and retry term traversal in Erlang.

  There was another undocumented case when badarg was thrown besides max depth or an invalid arg: when a props value was compared with any other supported type. In Erlang it would be handled in these clauses:

```
less_erl({A},{B}) when is_list(A), is_list(B) -> less_props(A,B);
less_erl({A},_) when is_list(A) -> -1;
less_erl(_,{B}) when is_list(B) -> 1.
```

  However, in C we threw a badarg for the last two clauses and relied on Erlang to do all the work. This case was a potential performance issue as well, since that is a common comparison for mango, where we may compare keys against the max json object value (<<255,255,255,255>>).

  Add a few property tests in order to validate collation behavior. The two main ones are:

  1) Given an expected sort order of some test values, assert that both the Erlang and nif collators correctly order any of those test values.
  2) In general, the nif collator sorts any json the same way as the Erlang one.
* Fix badarith error in get_db_timeout when request timeout = infinity | Nick Vatamaniuc | 2021-10-15 | 1 | -1/+9
  `infinity`, it turns out, is a valid configuration value for the fabric request_timeout. We can pass that to an Erlang `receive` statement, but any arithmetic with it would fail. To guard against the crash, use the max small int value (60 bits). With enough shards, due to the exponential nature of the algorithm, we still get a nice progression from the minimum 100 msec all the way up to the large int value. This case is illustrated in the test.

  Issue: https://github.com/apache/couchdb/issues/3789
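A minimal sketch of the guard described here; the function and macro names are illustrative, not the exact ones in fabric_util.

```erlang
%% On 64-bit Erlang VMs, small integers are 60 bits (sign included), so
%% cap `infinity` at the largest small integer before doing arithmetic.
-define(MAX_SMALL_INT, ((1 bsl 59) - 1)).

timeout_for_arithmetic(infinity) -> ?MAX_SMALL_INT;
timeout_for_arithmetic(Timeout) when is_integer(Timeout) -> Timeout.
```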
* Avoid badmatch for fabric:cleanup_index_files | jiahuili | 2021-10-13 | 1 | -1/+5
* Return empty list from fabric:inactive_index_files/1 when database doesn't exist | jiahuili | 2021-10-12 | 2 | -3/+67
* Kill Pid synchronously | Jay Doane | 2021-10-12 | 1 | -1/+1
  Prevent this race condition:

    *** context setup failed ***
    **in function couch_replicator_doc_processor:setup/0 (src/couch_replicator_doc_processor.erl, line 872)
    **error:{badmatch,{error,{already_started,<0.4946.0>}}}
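For reference, a generic sketch of the standard idiom for killing a process and waiting until it is actually gone before the next setup starts it again; this is not the exact code from the replicator test.

```erlang
%% Monitor, kill, then block until the 'DOWN' message confirms the process
%% has exited, so a subsequent start_link won't hit {already_started, Pid}.
kill_sync(Pid) ->
    Ref = erlang:monitor(process, Pid),
    exit(Pid, kill),
    receive
        {'DOWN', Ref, process, Pid, _Reason} -> ok
    after 5000 ->
        erlang:error(kill_sync_timeout)
    end.
```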
* Eliminate compiler warnings | Jay Doane | 2021-10-12 | 3 | -3/+4
  - Prepend unused variable with underscore
  - Add nowarn_export_all compiler option
  - Use STACKTRACE macro
* Eliminate compiler warnings | Jay Doane | 2021-10-09 | 2 | -4/+1
  Delete unused function and remove unused variable.
* Try harder to avoid a change feed rewind after a shard move | Nick Vatamaniuc | 2021-10-08 | 1 | -8/+24
  In the previous attempt [1] we improved the logic by spawning workers on the matching target shards only. However, that wasn't enough, as workers might still reject the passed-in sequence from the old node when asserting ownership locally on each shard.

  Re-use the already existing replacement clause where, after the uuid is matched, we try harder to find the highest viable sequence. To use the unit test setup as an example: if the shard moved from node1 to node2, and recorded epoch `{node2, 10}` on the new node, then a sequence generated on node1 before the move, for example 12, would rewind down only to 10 when calculated at its new location on node2, instead of being rewound all the way down to 0.

  [1] https://github.com/apache/couchdb/commit/e83935c7f8c3e47b47f07f22ece327f6529d4da0
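A sketch of the clamping idea in the example above; the helper name is hypothetical, and the real logic lives in the uuid/epoch matching code, which handles more cases than this.

```erlang
%% Once the shard's uuid matches, vouch for the client's sequence up to the
%% start of this node's epoch instead of rewinding all the way to 0.
highest_viable_seq(ClientSeq, EpochStartSeq) ->
    erlang:min(ClientSeq, EpochStartSeq).

%% e.g. highest_viable_seq(12, 10) =:= 10
```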
* Bump docs for 3.2.0-RC2 (3.2.0-RC2, 3.2.0) | Nick Vatamaniuc | 2021-10-05 | 1 | -1/+1
  To include another changelog entry: https://github.com/apache/couchdb-documentation/commit/4f00da0b0cedf63ebf391e43b1a56bb36f7d0f96
* Fix Windows makefile for Fauxton (#3777) | Joan Touzet | 2021-10-05 | 1 | -1/+1
  Missed file in f85cff669f20cee0a54da7bb8c645dfc4d2de5c9
* Bump jiffy to CouchDB-1.0.9-1 | Nick Vatamaniuc | 2021-10-05 | 1 | -1/+1
  Based off of the upstream 1.0.9 + CouchDB clone changes: https://github.com/apache/couchdb-jiffy/releases/tag/CouchDB-1.0.9-1
* backport C++ standard settings from SM86 to SM78 | Dave Cottlehuber | 2021-09-30 | 1 | -2/+2
* Bump version to 3.2.0 and update dependencies (3.2.0-RC1) | Nick Vatamaniuc | 2021-09-27 | 4 | -8/+8
  Fauxton was failing, so we backported the fix for it from main: https://github.com/apache/couchdb/commit/f85cff669f20cee0a54da7bb8c645dfc4d2de5c9
* Remove unused variables and extra whitespace | Jay Doane | 2021-09-27 | 1 | -3/+2
* Remove redundant CSP tests | Jay Doane | 2021-09-27 | 1 | -45/+0
  These two tests exercise the same assertions as the individual `sandbox_doc_attachments` test case in chttpd_csp_tests.erl.
* Make view merge row aggregation in fabric stable | Nick Vatamaniuc | 2021-09-24 | 2 | -2/+0
  We do that by matching the comparator function behavior used during row merging [1] with the comparison function used when sorting the rows on view shards [2]. This goes along with the constraint in the lists:merge/3 docs, which indicate that the input lists should be sorted according to the same comparator [3] as the one passed to the lists:merge/3 call.

  The stability of returned rows results from the case when both keys match as equal. Now `lists:merge/3` will favor the element in the existing rows list instead of always replacing [4] the older matching row with the last arriving row, since now `less(A, A)` will be `false`, while previously it was `true`.

  The fix was found by Adam when discussing issue #3750: https://github.com/apache/couchdb/issues/3750#issuecomment-920947424

  Co-author: Adam Kocoloski <kocolosk@apache.org>

  [1] https://github.com/apache/couchdb/blob/3.x/src/fabric/src/fabric_view_map.erl#L244-L248
  [2] https://github.com/apache/couchdb/blob/3.x/src/couch_mrview/src/couch_mrview_util.erl#L1103-L1107
  [3] https://erlang.org/doc/man/lists.html#merge-3
  [4] https://github.com/erlang/otp/blob/master/lib/stdlib/src/lists.erl#L2668-L2675
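For reference, a sketch of the strict {key, doc id} comparator shape this refers to, modeled on the shard-side sort in couch_mrview_util [2]; treat the exact code and the couch_ejson_compare:less/2 return convention as an approximation rather than a verbatim copy.

```erlang
%% Strict comparator: returns false when the two rows compare equal, so
%% lists:merge/3 keeps the already-merged row rather than replacing it.
less_json_ids({JsonA, IdA}, {JsonB, IdB}) ->
    case couch_ejson_compare:less(JsonA, JsonB) of
        0 -> IdA < IdB;
        Result -> Result < 0
    end.
```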
* Properly sort descending=true view results when a key list is provided | Nick Vatamaniuc | 2021-09-22 | 2 | -10/+48
  Results should now be returned in descending {key, doc_id} order. The idea is to reverse the key list before sending it to the workers, so they will emit rows in reverse order. Also, we use the same reversed list when building the KeyDict structure on the coordinator. That way the order of the sent rows and the expected coordinator sorting order will match.

  For testing, enhance an existing multi-key Elixir view test to cover both the ascending and descending cases, and actually check that the rows are in the correct order each time.
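A tiny sketch of the key-reversal step described above; the function name is illustrative.

```erlang
%% For descending=true with an explicit key list, reverse the keys before
%% sending them to the shard workers, and build the coordinator's key dict
%% from the same reversed list so both sides agree on the order.
maybe_reverse_keys(Keys, true) -> lists:reverse(Keys);
maybe_reverse_keys(Keys, false) -> Keys.
```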
* Merge pull request #3752 from cloudant/port-3286 | iilyak | 2021-09-17 | 9 | -21/+1305
  Port 3286 - Add ability to control which Elixir integration tests to run
  * Remove error message on mix test | Bessenyei Balázs Donát | 2021-09-16 | 1 | -3/+0
  * Disable some tests | ILYA Khlopotov | 2021-09-16 | 1 | -0/+4
  * Use elixir-suite | ILYA Khlopotov | 2021-09-16 | 1 | -1/+1
  * Update elixir test suite | ILYA Khlopotov | 2021-09-16 | 1 | -32/+151
  * Load test helpers to prevent crash of test case extractor | ILYA Khlopotov | 2021-09-16 | 1 | -0/+9
  * Fix logic in ensure_exunit_started | ILYA Khlopotov | 2021-09-16 | 1 | -1/+1
  * Add --erlang-config option to dev/run | ILYA Khlopotov | 2021-09-16 | 1 | -1/+8
  * Add ability to control which Elixir integration tests to run | ILYA Khlopotov | 2021-09-16 | 7 | -16/+1164
    A new `elixir-suite` Makefile target is added. It runs a predefined set of elixir integration tests.

    The feature is controlled by two files:
    - test/elixir/test/config/suite.elixir - contains a list of all available tests
    - test/elixir/test/config/skip.elixir - contains a list of tests to skip

    In order to update `test/elixir/test/config/suite.elixir` when new tests are added, run the following command:

```
MIX_ENV=integration mix suite > test/elixir/test/config/suite.elixir
```
* Merge pull request #3754 from apache/fix-limit0-for-views-again | Robert Newson | 2021-09-15 | 1 | -5/+4
  Fix limit0 for views again
  * Restrict the limit=0 clause to the sorted=false case as originally intended | Robert Newson | 2021-09-15 | 1 | -1/+1
    The limit=0 clause was introduced in commit 4e0c97bf which added sorted=false support. It accidentally matches when the user specifies limit=0 and causes us not to apply the logic that ensures we collect a {meta, Meta} message from each shard range and then send the total_rows and offset fields.
  * Revert "Fix meta result for views when limit = 0" | Robert Newson | 2021-09-15 | 1 | -5/+4
    This reverts commit a36e7308ab4a2cfead6da64a9f83b7776722382d.
* Fix meta result for views when limit = 0 | Nick Vatamaniuc | 2021-09-15 | 1 | -4/+5
  Previously, as soon as one row was returned, we immediately stopped, erroneously assuming that meta for all ranges had already been received. However, it was possible that we'd get meta from range 00-7f, then a row from 00-7f, before getting meta from 7f-ff, and thus we'd return an empty result.

  To fix the issue, we simply re-use the already existing limit=0 clause from the fabric_view:maybe_send_row/1 function, which will wait until there is a complete ring before returning. That relies on updating the counters (the ring) only with meta responses and not with view rows, so if the ring is complete, we know we completed it with meta only.

  The other issue with the limit=0 clause was that it wasn't properly ack-ing the received row. Rows are acked for the sorted=false case below, and for the regular limit>0, sorted=true case in fabric_view:get_next_row/1.

  Issue: https://github.com/apache/couchdb/issues/3750
* Fix splitting shards with large purge sequences | Nick Vatamaniuc | 2021-09-10 | 2 | -1/+83
  Previously, if the source db purge sequence > `purge_infos_limit`, shard splitting would crash with the `{{invalid_start_purge_seq,0}, [{couch_bt_engine,fold_purge_infos,5...` error. That was because purge sequences were always copied starting from 0. That would only work as long as the total number of purges stayed below the purge_infos_limit threshold.

  In order to correctly gather the purge sequences, the start sequence must be based off of the actual oldest sequence currently available. An example of how it should be done is in the `mem3_rpc` module, when loading purge infos [0], so here we do exactly the same. The `MinSeq - 1` logic is also evident from inspecting the fold_purge_infos [1] function.

  The test sets up the exact scenario described above: it reduces the purge info limit to 10 and then purges 20 documents. By purging more than the limit, we ensure the starting sequence is no longer 0. However, the purge sequence btree is only actually trimmed down during compaction. That is why there are a few extra helper functions to ensure compaction runs and finishes before shard splitting starts.

  Fixes: https://github.com/apache/couchdb/issues/3738

  [0] https://github.com/apache/couchdb/blob/4ea9f1ea1a2078162d0e281948b56469228af3f7/src/mem3/src/mem3_rpc.erl#L206-L207
  [1] https://github.com/apache/couchdb/blob/3.x/src/couch/src/couch_bt_engine.erl#L625-L637
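A sketch of the start-sequence fix described above; it assumes couch_db accessors along the lines of those used by mem3_rpc in [0], and the names and arities may differ slightly from the real code.

```erlang
%% Start folding purge infos from just before the oldest purge sequence
%% still present in the source shard, rather than from 0, so a db whose
%% purge_seq exceeds purge_infos_limit can still be split.
copy_purge_infos(SourceDb, CopyFun, Acc0) ->
    Oldest = couch_db:get_oldest_purge_seq(SourceDb),
    StartSeq = erlang:max(Oldest - 1, 0),
    couch_db:fold_purge_infos(SourceDb, StartSeq, CopyFun, Acc0, []).
```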
* Remove log line from CSP logic in chttpd_util | Nick Vatamaniuc | 2021-09-10 | 1 | -1/+0
* Remove debug log line from attachments handler | Nick Vatamaniuc | 2021-09-10 | 1 | -1/+0
* Improve fabric_util get_db timeout logic | Nick Vatamaniuc | 2021-09-09 | 2 | -2/+64
  Previously, users with low {Q, N} dbs often got the `"No DB shards could be opened."` error when the cluster is overloaded. The hard-coded 100 msec timeout was too low to open the few available shards, and the whole request would crash with a 500 error.

  Attempt to calculate an optimal timeout value based on the number of shards and the max fabric request timeout limit. The sequence of doubling (by default) timeouts forms a geometric progression. Use the well-known closed form formula for the sum [0], and the maximum request timeout, to calculate the initial timeout. The test case illustrates a few examples with some default Q and N values.

  Because we don't want the timeout value to be too low, since it takes time to open shards, and we don't want to quickly cycle through a few initial shards and discard the results, the minimum initial timeout is clipped to the previously hard-coded 100 msec timeout. Unlike previously, however, this minimum value can now also be configured.

  [0] https://en.wikipedia.org/wiki/Geometric_series

  Fixes: https://github.com/apache/couchdb/issues/3733
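A sketch of the closed-form calculation described above: with a doubling factor, the successive timeouts T0, 2*T0, 4*T0, ... sum to T0*(2^N - 1), so T0 can be derived from the configured maximum and clipped at the minimum. The function shape and argument names here are illustrative, not the exact fabric_util code.

```erlang
%% Geometric series sum: T0 * (Factor^N - 1) / (Factor - 1) =< MaxTimeout.
%% Solve for the initial timeout T0 and never go below the (now
%% configurable) minimum, previously hard-coded to 100 msec.
get_db_timeout(NumShards, Factor, MinTimeout, MaxTimeout)
        when NumShards > 0, Factor > 1 ->
    T0 = MaxTimeout * (Factor - 1) / (math:pow(Factor, NumShards) - 1),
    erlang:max(MinTimeout, trunc(T0)).

%% e.g. get_db_timeout(3, 2, 100, 60000)  -> 8571
%%      get_db_timeout(24, 2, 100, 60000) -> 100
```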
* feat: add more fine-grained CSP support | Jan Lehnardt | 2021-09-08 | 8 | -46/+316
  This introduces CSP settings for attachments and show/list funs and streamlines the configuration with the existing Fauxton CSP options.

  Deprecates the old `[csp] enable` and `[csp] header_value` config options, but they are honoured going forward. They are replaced with `[csp] utils_enable` and `[csp] utils_header_value` respectively. The functionality and default values remain the same.

  In addition, these new config options are added, along with their default values:

```
[csp]
attachments_enable = true
attachments_header_value = sandbox
showlist_enable = true
showlist_header_value = sandbox
```

  These add `Content-Security-Policy` headers to all attachment requests and to all non-JSON show and all list function responses.

  Co-authored-by: Nick Vatamaniuc <vatamane@gmail.com>
  Co-authored-by: Robert Newson <rnewson@apache.org>
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This introduces CSP settings for attachments and show/list funs and streamlines the configuration with the existing Fauxton CSP options. Deprecates the old `[csp] enable` and `[csp] header_value` config options, but they are honoured going forward. They are replaced with `[csp] utils_enable` and `[csp] utils_header_value` respectively. The funcitonality and default values remain the same. In addition, these new config options are added, along with their default values: ``` [csp] attachments_enable = true attachments_header_value = sandbox showlist_enable = true showlist_header_value = sandbox ``` These add `Content-Security-Policy` headers to all attachment requests and to all non-JSON show and all list function responses. Co-authored-by: Nick Vatamaniuc <vatamane@gmail.com> Co-authored-by: Robert Newson <rnewson@apache.org>