* Move custodian VDU to a BDU and fix _all_dbs off-by-one limit bug [replace-custodian-design-doc] (Nick Vatamaniuc, 2021-10-22, 7 files, -81/+358)
| This fixes issue: https://github.com/apache/couchdb/issues/3786 In addition, add a few _all_dbs limit tests, since we didn't seem to have any previously to catch such issues. Plus, test some of the corner cases which should be caught by the BDU and should return a 403 error code.
* Include shard uuids in db_info update sequences (Nick Vatamaniuc, 2021-10-19, 4 files, -3/+97)
| This means `update_seq` values from `GET $db` and `last_seq` values returned from `GET $db/_changes?since=now&limit=` will be more resilient to change feed rewinds. Besides, those sequences will now be more consistent and users won't have to wonder why one opaque sequence works slightly differently than another opaque update sequence. Previously, when the sequences were returned only as numeric values, it was impossible to calculate replacements and change feeds had to always rewind back to 0 for those ranges. With uuids and epochs in play, it is possible to figure out that some shards might have moved to new nodes, or to find internal replication checkpoints, in order to avoid streaming changes feeds from 0 on those ranges. Some replication Elixir tests decode update sequences, so those were updated to handle the new uuid and epoch format as well. Fixes: https://github.com/apache/couchdb/issues/3787 Co-author: Adam Kocoloski kocolosk@apache.org
* Remove couch_icu_driver module (Nick Vatamaniuc, 2021-10-15, 11 files, -343/+390)
| couch_icu_driver is only used for binary string comparison in couch_ejson_compare when the expression depth becomes greater than 10. The logic for string comparison is identical to what couch_ejson_compare uses, so opt to just use couch_ejson_compare instead of keeping a whole other binary collation driver around.
|
| To avoid a possible infinite loop if the couch_ejson_compare NIF fails to load, throw a NIF loading error, as is common for NIF modules. To avoid another possible infinite retry from a badarg generated by max depth and/or an actual bad ejson term, use a specific max depth error so we don't have to guess when we catch it and retry term traversal in Erlang.
|
| There was another undocumented case where badarg was thrown besides max depth or an invalid arg: when a prop value was compared with any other supported type. In Erlang it would be handled in these clauses:
| ```
| less_erl({A},{B}) when is_list(A), is_list(B) -> less_props(A,B);
| less_erl({A},_) when is_list(A) -> -1;
| less_erl(_,{B}) when is_list(B) -> 1.
| ```
| However, in C we threw a badarg for the last two clauses and relied on Erlang to do all the work. This case was a potential performance issue as well, since that is a common comparison for mango where we may compare keys against the max json object value (<<255,255,255,255>>).
|
| Add a few property tests in order to validate collation behavior. The two main ones are: 1) Given an expected sort order of some test values, assert that both the Erlang and NIF collators correctly order any of those test values. 2) In general, the NIF collator sorts any json the same way as the Erlang one.
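Property (1) can be illustrated with a small, self-contained sketch (assumed module and function names, not the actual test code): given values listed in their expected sort order, a collator passed in as a `Less/2` fun should reproduce that order from any shuffle. The real property tests drive both the NIF and the Erlang collators with generated json values rather than a fixed list.
```
-module(collation_prop_sketch).
-export([ordering_agrees/2]).

%% Shuffle the expected order, then check that sorting with the collator
%% under test restores the original order.
ordering_agrees(Less, ExpectedOrder) ->
    Tagged = [{rand:uniform(), E} || E <- ExpectedOrder],
    Shuffled = [E || {_, E} <- lists:sort(Tagged)],
    lists:sort(Less, Shuffled) =:= ExpectedOrder.
```
For example, `collation_prop_sketch:ordering_agrees(fun erlang:'=<'/2, [1, 2, 3])` returns `true`; property (2) would compare the two collators' outputs directly instead of against a fixed expectation.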
* Fix badarith error in get_db_timeout when request timeout = infinity (Nick Vatamaniuc, 2021-10-15, 1 file, -1/+9)
| It turns out `infinity` is a valid configuration value for the fabric request_timeout. We can pass that to an Erlang `receive` statement, but any arithmetic with it would fail. To guard against the crash, use the max small int value (60 bits) instead. With enough shards, due to the exponential nature of the algorithm, we still get a nice progression from the minimum 100 msec all the way up to the large int value. This case is illustrated in the test. Issue: https://github.com/apache/couchdb/issues/3789
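A minimal sketch of the guard, assuming the config key and default shown here (the real fabric code derives the per-shard timeout from more inputs): treat `infinity` as a large integer before doing any arithmetic with it.
```
-module(request_timeout_sketch).
-export([request_timeout_int/0]).

%% 60-bit cap standing in for `infinity', so arithmetic cannot badarith.
-define(INFINITY_AS_INT, ((1 bsl 59) - 1)).

request_timeout_int() ->
    %% config:get/3 returns the raw string value from the ini files
    case config:get("fabric", "request_timeout", "60000") of
        "infinity" -> ?INFINITY_AS_INT;
        Millisec -> list_to_integer(Millisec)
    end.
```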
* Avoid badmatch for fabric:cleanup_index_files (jiahuili, 2021-10-13, 1 file, -1/+5)
|
* Return empty list from fabric:inactive_index_files/1 when database doesn't exist (jiahuili, 2021-10-12, 2 files, -3/+67)
|
* Kill Pid synchronously (Jay Doane, 2021-10-12, 1 file, -1/+1)
| Prevent this race condition:
| ```
| *** context setup failed ***
| **in function couch_replicator_doc_processor:setup/0 (src/couch_replicator_doc_processor.erl, line 872)
| **error:{badmatch,{error,{already_started,<0.4946.0>}}}
| ```
* Eliminate compiler warnings (Jay Doane, 2021-10-12, 3 files, -3/+4)
| - Prepend unused variable with underscore
| - Add nowarn_export_all compiler option
| - Use STACKTRACE macro
* Eliminate compiler warnings (Jay Doane, 2021-10-09, 2 files, -4/+1)
| | | | Delete unused function and remove unused variable.
* Try harder to avoid a change feed rewind after a shard move (Nick Vatamaniuc, 2021-10-08, 1 file, -8/+24)
| In the previous attempt [1] we improved the logic by spawning workers on the matching target shards only. However, that wasn't enough, as workers might still reject the passed-in sequence from the old node when they assert ownership locally on each shard. Re-use the already existing replacement clause where, after the uuid is matched, we try harder to find the highest viable sequence. To use the unit test setup as an example: if the shard moved from node1 to node2, and recorded epoch `{node2, 10}` on the new node, then a sequence generated on node1 before the move, for example 12, would rewind down only to 10 when calculated at its new location on node2, instead of being rewound all the way down to 0. [1] https://github.com/apache/couchdb/commit/e83935c7f8c3e47b47f07f22ece327f6529d4da0
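A simplified sketch of the clamping described above (assumed shapes and names; the real code also matches the shard uuid before consulting the epoch list):
```
-module(epoch_clamp_sketch).
-export([viable_seq/3]).

%% Epochs is a list of {Node, SeqWhenNodeTookOwnership} pairs, newest first.
viable_seq(Seq, Node, [{Node, EpochSeq} | _]) when Seq >= EpochSeq ->
    %% the sequence was generated by the current owner; keep it as is
    Seq;
viable_seq(Seq, _Node, [{_OtherNode, EpochSeq} | _]) when Seq >= EpochSeq ->
    %% clamp to the start of the newer epoch instead of rewinding to 0
    EpochSeq;
viable_seq(Seq, Node, [_ | Rest]) ->
    viable_seq(Seq, Node, Rest);
viable_seq(_Seq, _Node, []) ->
    %% no viable epoch found: full rewind
    0.
```
With the example above, `viable_seq(12, node1, [{node2, 10}, {node1, 0}])` returns `10` rather than `0`.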
* Bump docs for 3.2.0-RC2 [3.2.0-RC2, 3.2.0] (Nick Vatamaniuc, 2021-10-05, 1 file, -1/+1)
| | | | To include another changelog entry: https://github.com/apache/couchdb-documentation/commit/4f00da0b0cedf63ebf391e43b1a56bb36f7d0f96
* Fix Windows makefile for Fauxton (#3777) (Joan Touzet, 2021-10-05, 1 file, -1/+1)
| | | Missed file in f85cff669f20cee0a54da7bb8c645dfc4d2de5c9
* Bump jiffy to CouchDB-1.0.9-1 (Nick Vatamaniuc, 2021-10-05, 1 file, -1/+1)
| | | | | | Based off of the upstream 1.0.9 + CouchDB clone changes https://github.com/apache/couchdb-jiffy/releases/tag/CouchDB-1.0.9-1
* backport C++ standard settings from SM86 to SM78 (Dave Cottlehuber, 2021-09-30, 1 file, -2/+2)
|
* Bump version to 3.2.0 and update dependencies [3.2.0-RC1] (Nick Vatamaniuc, 2021-09-27, 4 files, -8/+8)
| Fauxton was failing, so we backported the fix for it from main: https://github.com/apache/couchdb/commit/f85cff669f20cee0a54da7bb8c645dfc4d2de5c9
* Remove unused variables and extra whitespace (Jay Doane, 2021-09-27, 1 file, -3/+2)
|
* Remove redundant CSP tests (Jay Doane, 2021-09-27, 1 file, -45/+0)
| | | | | These two tests exercise the same assertions as the individual `sandbox_doc_attachments` test case in chttpd_csp_tests.erl.
* Make view merge row aggregation in fabric stable (Nick Vatamaniuc, 2021-09-24, 2 files, -2/+0)
| We do that by matching the comparator function behavior used during row merging [1] with the comparison function used when sorting the rows on view shards [2]. This goes along with the constraint in the lists:merge/3 docs, which indicate that the input lists should be sorted according to the same comparator [3] as the one passed to the lists:merge/3 call. The stability of the returned rows comes from the case when both keys compare as equal. Now `lists:merge/3` will favor the element in the existing rows list instead of always replacing [4] the older matching row with the last arriving row, since now `less(A, A)` will be `false`, while previously it was `true`. The fix was found by Adam when discussing issue #3750: https://github.com/apache/couchdb/issues/3750#issuecomment-920947424 Co-author: Adam Kocoloski <kocolosk@apache.org> [1] https://github.com/apache/couchdb/blob/3.x/src/fabric/src/fabric_view_map.erl#L244-L248 [2] https://github.com/apache/couchdb/blob/3.x/src/couch_mrview/src/couch_mrview_util.erl#L1103-L1107 [3] https://erlang.org/doc/man/lists.html#merge-3 [4] https://github.com/erlang/otp/blob/master/lib/stdlib/src/lists.erl#L2668-L2675
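The effect on stability can be seen with plain `lists:merge/3` (illustrative row shape only; in fabric the comparator is the full key/id collation and the first list holds the newly arrived rows). With a strict comparator, `less(A, A)` is `false`, so a row that compares equal to one already merged ends up after it rather than in front of it:
```
1> Less = fun({KeyA, _}, {KeyB, _}) -> KeyA < KeyB end.  % strict: Less(A, A) =:= false
2> Existing = [{1, old_copy}, {2, b}].
3> New = [{1, new_copy}].
4> lists:merge(Less, New, Existing).
[{1,old_copy},{1,new_copy},{2,b}]
```
With a `=<` style comparator the same call returns `[{1,new_copy},{1,old_copy},{2,b}]`, i.e. the last arriving row jumps ahead of the existing one, which is the instability this change removes.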
* Properly sort descending=true view results when a key list is provided (Nick Vatamaniuc, 2021-09-22, 2 files, -10/+48)
| | | | | | | | | | | | | Results should now be returned in descending {key, doc_id} order. The idea is to reverse the key list before sending it to the workers, so they will emit rows in reverse order. Also, we are using the same reversed list when building the KeyDict structure on the coordinator. That way the order of the sent rows and the expected coordinator sorting order will match. For testing, enhance an existing multi-key Elixir view test to test both ascending and descending cases and actually check that the rows are in the correct order each time.
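A conceptual sketch (plain lists and maps instead of the real fabric structures): reverse the key list once and derive the coordinator's key index from that same reversed list, so the order in which workers emit rows and the order the coordinator expects always agree.
```
-module(keys_order_sketch).
-export([ordered_keys/2, key_index/2]).

%% Keys as sent to the workers: reversed when descending=true.
ordered_keys(Keys, true = _Descending) -> lists:reverse(Keys);
ordered_keys(Keys, false = _Descending) -> Keys.

%% Coordinator-side index built from the same ordered list.
key_index(Keys, Descending) ->
    Ordered = ordered_keys(Keys, Descending),
    maps:from_list(lists:zip(Ordered, lists:seq(0, length(Ordered) - 1))).
```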
* Merge pull request #3752 from cloudant/port-3286 (iilyak, 2021-09-17, 9 files, -21/+1305)
|\   Port 3286 - Add ability to control which Elixir integration tests to run
| * Remove error message on mix test (Bessenyei Balázs Donát, 2021-09-16, 1 file, -3/+0)
| |
| * Disable some tests (ILYA Khlopotov, 2021-09-16, 1 file, -0/+4)
| |
| * Use elixir-suite (ILYA Khlopotov, 2021-09-16, 1 file, -1/+1)
| |
| * Update elixir test suite (ILYA Khlopotov, 2021-09-16, 1 file, -32/+151)
| |
| * Load test helpers to prevent crash of test case extractor (ILYA Khlopotov, 2021-09-16, 1 file, -0/+9)
| |
| * Fix logic in ensure_exunit_started (ILYA Khlopotov, 2021-09-16, 1 file, -1/+1)
| |
| * Add --erlang-config option to dev/run (ILYA Khlopotov, 2021-09-16, 1 file, -1/+8)
| |
| * Add ability to control which Elixir integration tests to run (ILYA Khlopotov, 2021-09-16, 7 files, -16/+1164)
|/   A new `elixir-suite` Makefile target is added. It runs a predefined set of elixir integration tests. The feature is controlled by two files:
|    - test/elixir/test/config/suite.elixir - contains the list of all available tests
|    - test/elixir/test/config/skip.elixir - contains the list of tests to skip
|    In order to update `test/elixir/test/config/suite.elixir` when new tests are added, one would need to run the following command:
|    ```
|    MIX_ENV=integration mix suite > test/elixir/test/config/suite.elixir
|    ```
* Merge pull request #3754 from apache/fix-limit0-for-views-again (Robert Newson, 2021-09-15, 1 file, -5/+4)
|\   Fix limit0 for views again
| * Restrict the limit=0 clause to the sorted=false case as originally intended (Robert Newson, 2021-09-15, 1 file, -1/+1)
| | | | | | | | | | | | | | | | The limit=0 clause was introduced in commit 4e0c97bf which added sorted=false support. It accidentally matches when the user specifies limit=0 and causes us not to apply the logic that ensures we collect a {meta, Meta} message from each shard range and then send the total_rows and offset fields.
| * Revert "Fix meta result for views when limit = 0" (Robert Newson, 2021-09-15, 1 file, -5/+4)
|/   This reverts commit a36e7308ab4a2cfead6da64a9f83b7776722382d.
* Fix meta result for views when limit = 0 (Nick Vatamaniuc, 2021-09-15, 1 file, -4/+5)
| Previously, as soon as one row was returned, we immediately stopped, erroneously assuming that meta for all ranges had already been received. However, it was possible that we'd get meta from range 00-7f, then a row from 00-7f, before getting meta from 7f-ff, and thus we'd return an empty result. To fix the issue we simply re-use the already existing limit=0 clause from the fabric_view:maybe_send_row/1 function, which will wait until there is a complete ring before returning. That relies on the counters (the ring) being updated only on meta returns and not on view rows, so if the ring is complete, we know it was completed with meta only. The other issue with the limit=0 clause was that it wasn't properly ack-ing the received row. Rows are acked for the sorted=false case below, and for the regular limit>0, sorted=true case in fabric_view:get_next_row/1. Issue: https://github.com/apache/couchdb/issues/3750
* Fix splitting shards with large purge sequences (Nick Vatamaniuc, 2021-09-10, 2 files, -1/+83)
| Previously, if the source db purge sequence was > `purge_infos_limit`, shard splitting would crash with the `{{invalid_start_purge_seq,0}, [{couch_bt_engine,fold_purge_infos,5...` error. That was because purge sequences were always copied starting from 0. That would only work as long as the total number of purges stayed below the purge_infos_limit threshold. In order to correctly gather the purge sequences, the start sequence must be based off of the actual oldest sequence currently available. An example of how it should be done is in the `mem3_rpc` module, when loading purge infos [0], so here we do exactly the same. The `MinSeq - 1` logic is also evident from inspecting the fold_purge_infos [1] function. The test sets up the exact scenario described above: it reduces the purge info limit to 10, then purges 20 documents. By purging more than the limit, we ensure the starting sequence is now != 0. However, the purge sequence btree is actually trimmed down only during compaction. That is why there are a few extra helper functions to ensure compaction runs and finishes before shard splitting starts. Fixes: https://github.com/apache/couchdb/issues/3738 [0] https://github.com/apache/couchdb/blob/4ea9f1ea1a2078162d0e281948b56469228af3f7/src/mem3/src/mem3_rpc.erl#L206-L207 [1] https://github.com/apache/couchdb/blob/3.x/src/couch/src/couch_bt_engine.erl#L625-L637
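A minimal sketch of the start-sequence choice, assuming the couch_db accessors named here behave like the ones used in the referenced mem3_rpc code (the real shard-splitting code streams the infos into the target shards rather than into a list):
```
-module(purge_copy_sketch).
-export([load_purge_infos/1]).

load_purge_infos(Db) ->
    %% Start folding just below the oldest purge seq still present,
    %% instead of always starting from 0.
    Oldest = couch_db:get_oldest_purge_seq(Db),
    StartSeq = erlang:max(0, Oldest - 1),
    FoldFun = fun(PurgeInfo, Acc) -> {ok, [PurgeInfo | Acc]} end,
    {ok, Infos} = couch_db:fold_purge_infos(Db, StartSeq, FoldFun, []),
    lists:reverse(Infos).
```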
* Remove log line from CSP logic in chttpd_util (Nick Vatamaniuc, 2021-09-10, 1 file, -1/+0)
|
* Remove debug log line from attachments handler (Nick Vatamaniuc, 2021-09-10, 1 file, -1/+0)
|
* Improve fabric_util get_db timeout logic (Nick Vatamaniuc, 2021-09-09, 2 files, -2/+64)
| Previously, users with low {Q, N} dbs often got the `"No DB shards could be opened."` error when the cluster was overloaded. The hard-coded 100 msec timeout was too low to open the few available shards and the whole request would crash with a 500 error. Attempt to calculate an optimal timeout value based on the number of shards and the max fabric request timeout limit. The sequence of doubling (by default) timeouts forms a geometric progression. Use the well-known closed form formula for the sum [0], and the maximum request timeout, to calculate the initial timeout. The test case illustrates a few examples with some default Q and N values. Because we don't want the timeout value to be too low, since it takes time to open shards, and we don't want to quickly cycle through a few initial shards and discard the results, the minimum initial timeout is clipped to the previously hard-coded 100 msec timeout. Unlike previously, however, this minimum value can now also be configured. [0] https://en.wikipedia.org/wiki/Geometric_series Fixes: https://github.com/apache/couchdb/issues/3733
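The calculation can be sketched as follows (assumed names; the real fabric_util code also makes the doubling factor and the minimum configurable). With a doubling factor the per-shard timeouts form a geometric series, so the first timeout `T0` is chosen from the closed-form sum `T0 + 2*T0 + ... + 2^(N-1)*T0 = T0 * (2^N - 1) =< MaxRequestTimeout`:
```
-module(db_timeout_sketch).
-export([initial_timeout/3]).

%% NumShards is the number of shard copies (Q * N) a request may have to try.
initial_timeout(NumShards, MinTimeout, MaxRequestTimeout) when NumShards > 0 ->
    T0 = MaxRequestTimeout div ((1 bsl NumShards) - 1),
    max(MinTimeout, T0).
```
For example, with Q=2 and N=3 (6 shard copies) and a 60000 msec request timeout, this gives `60000 div 63`, roughly 950 msec, for the first attempt, while databases with a large Q * N bottom out at the 100 msec floor.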
* feat: add more fine-grained CSP support (Jan Lehnardt, 2021-09-08, 8 files, -46/+316)
| This introduces CSP settings for attachments and show/list funs and streamlines the configuration with the existing Fauxton CSP options.
|
| Deprecates the old `[csp] enable` and `[csp] header_value` config options, but they are honoured going forward. They are replaced with `[csp] utils_enable` and `[csp] utils_header_value` respectively. The functionality and default values remain the same.
|
| In addition, these new config options are added, along with their default values:
| ```
| [csp]
| attachments_enable = true
| attachments_header_value = sandbox
| showlist_enable = true
| showlist_header_value = sandbox
| ```
| These add `Content-Security-Policy` headers to all attachment requests, and to all non-JSON show and all list function responses.
|
| Co-authored-by: Nick Vatamaniuc <vatamane@gmail.com>
| Co-authored-by: Robert Newson <rnewson@apache.org>
* feat(couch_file): log file path when a file was truncated from under us (Jan Lehnardt, 2021-09-01, 1 file, -3/+14)
|
* Merge keys from rebar.config (ncshaw, 2021-08-27, 4 files, -3/+16)
|
* Avoid change feed rewinds after shard moves (Nick Vatamaniuc, 2021-08-26, 8 files, -9/+414)
| When shards are moved to new nodes, and the user supplies a change sequence from the old shard map configuration, attempt to match the missing nodes and ranges by inspecting current shard uuids, in order to avoid rewinds. Previously, if a node and range was missing, we randomly picked a node in the appropriate range, so 1/3 of the time we might have hit the exact node, but 2/3 of the time we would end up with a complete changes feed rewind to 0. Unfortunately, this involves a fabric worker scatter-gather operation to all shard copies. This should only happen when we get an old sequence, and we rely on that happening rarely, mostly right after the shards have moved; after that, users would get new sequences from the recent shard map.
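A rough sketch of the matching step (assumed shapes; the actual implementation gathers `{Range, Node, Uuid}` information from live shard copies over a fabric RPC): instead of picking a random copy of the range, keep only the shards whose uuid starts with the uuid recorded in the old sequence entry.
```
-module(uuid_match_sketch).
-export([find_replacements/2]).

%% Shards are {Range, Node, Uuid} tuples; OldUuid is the (possibly
%% truncated) uuid carried in the old change sequence.
find_replacements(OldUuid, Shards) when is_binary(OldUuid) ->
    [S || {_Range, _Node, Uuid} = S <- Shards,
          binary:longest_common_prefix([Uuid, OldUuid]) =:= byte_size(OldUuid)].
```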
* Remove unused fabric_doc_attachments (Nick Vatamaniuc, 2021-08-26, 1 file, -160/+0)
| This module was kept around since 2.2.0 only to facilitate cluster upgrades after we switched the receiver logic to not pass closures around between nodes: https://github.com/apache/couchdb/commit/fe53e437ca5ec9d23aa1b55d7934daced157a9e3
* fix: avoid dropping attachment chunks on quorum writes (Jan Lehnardt, 2021-08-26, 3 files, -18/+18)
| | | | | | | | | | | | | | | | | | | This only applies to databases that have an n > [cluster] n. Our `middleman()` function that proxies attachment streams from the incoming HTTP socket on the coordinating node to the target shard-bearing nodes used the server config to determine whether it should start dropping chunks from the stream. If a database was created with a larger `n`, the `middleman()` function could have started to drop attachment chunks before all attached nodes had a chance to receive it. This fix uses a database’s concrete `n` value rather than the server config default value. Co-Authored-By: James Coglan <jcoglan@gmail.com> Co-Authored-By: Robert Newson <rnewson@apache.org>
* Discard a payload on a delete attachment request (Eric Avdey, 2021-08-26, 2 files, -0/+18)
| While including a payload within a DELETE request is not forbidden by RFC 7231, its presence on a delete attachment request leaves a mochiweb acceptor in a semi-opened state, since mochiweb lazily loads request bodies. This makes the next immediate request to the same acceptor hang until the previous request's receive timeout. This PR adds a step to explicitly "drain" and discard an entity body on a delete attachment request to prevent that.
* Fix fauxton_root templating in bin/couchdb script (Nick Vatamaniuc, 2021-08-20, 1 file, -1/+3)
| The rebar mustache templating engine has a bug when handling the }}} brackets in a case like {...{{var}}}, so we work around the issue by using a separate variable. This is an alternate fix for issue: https://github.com/apache/couchdb/pull/3617
* Update smoosh docs to use rpc:multicall (Russell Branca, 2021-08-20, 1 file, -42/+21)
|
* Disable running ibrowse tests (Nick Vatamaniuc, 2021-08-19, 1 file, -1/+1)
| They used to be disabled before the last major ibrowse upgrade. On macOS and FreeBSD the following test fails periodically:
| ```
| ibrowse_tests: running_server_fixture_test_ (Pipeline too small signals retries)...*failed*
| in function ibrowse_tests:'-small_pipeline/0-fun-5-'/1 (test/ibrowse_tests.erl, line 150)
| in call from ibrowse_tests:small_pipeline/0 (test/ibrowse_tests.erl, line 150)
| **error:{assertEqual,[{module,ibrowse_tests},
|                       {line,150},
|                       {expression,"Counts"},
|                       {expected,"\n\n\n\n\n\n\n\n\n\n"},
|                       {value,"\t\n\n\n\n\t\t\n\n\t"}]}
|   output:<<"Load Balancer Pid : <0.494.0>
| ```
| But it seems to pass more reliably on Linux for some reason. It would be nice to run the tests, of course, but having a passing full platform suite is more important.
* Ensure maybe_close message is sent to correct process (#3700) (Russell Branca, 2021-08-10, 1 file, -1/+1)
|
* Improve handling of + in urls 3.x (ncshaw, 2021-08-10, 7 files, -36/+116)
|
* fix(dev/run): allow -n > 5 (Jan Lehnardt, 2021-08-09, 1 file, -1/+1)
|
* Fix response code for existing att and wrong rev (ncshaw, 2021-07-30, 2 files, -109/+108)
|