Commit message | Author | Age | Files | Lines
* [Jenkins] Try to cleanup hung beam.smp (FreeBSD) (ref: jenkins-cleanup-freebsd) | Joan Touzet | 2020-01-17 | 1 | -0/+1
|
* Merge pull request #2466 from apache/mango_index_consistency_error | Peng Hui Jiang | 2020-01-18 | 1 | -0/+4
|\
| | Handle not_found atom in mango text indexes
| * Handle not_found docs in mango text indexes (refs: archive/mango_index_consistency_error, mango_index_consistency_error) | Will Holley | 2020-01-17 | 1 | -0/+4
|/
| mango_cursor_text:get_json_docs may return a not_found atom instead of a Doc. In this case, we should just ignore the hit instead of attempting to evaluate it against a mango selector.
* Add a few missing settings to the default.ini file | Nick Vatamaniuc | 2020-01-17 | 1 | -0/+17
|
| Some rexi and reshard parameters
|
| Issue: https://github.com/apache/couchdb/issues/2457
* Instrument Mango execution stats | Will Holley | 2020-01-17 | 5 | -10/+40
|
| Adds metrics for mango execution statistics:
|
|   * total docs examined (counter). Note that for non-quorum doc reads this needs to be counted at the shard level and not by the mango query coordinator
|   * total results returned (counter)
|   * query time (histogram)
|   * mango:selector evaluations (may be queries or _changes feeds)
|
| For mrview-based queries which don't use quorum reads (the default), total docs examined should equal mango:selector evaluations. However, there are cases where mango selectors are evaluated outside of query time (e.g. partial indexing, selector-based _changes), which makes it useful to maintain a distinct counter.
* Warn on mango index scan | Will Holley | 2020-01-17 | 5 | -30/+65
|
| Adds a warning to the _find endpoint if the ratio of docs scanned to results returned is higher than a configurable threshold (default 10). This warning was previously generated in Fauxton; moving it to the server side allows us to expose it via _stats as well.
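The threshold check described above can be sketched roughly as follows. This is an illustrative Python sketch, not the Mango code (which is Erlang): the function name `index_scan_warning`, its parameters, the warning wording, and the handling of zero results are all assumptions. Only the default threshold of 10 comes from the commit message.

```python
from typing import Optional

DEFAULT_THRESHOLD = 10  # default ratio stated in the commit message


def index_scan_warning(docs_examined: int, results_returned: int,
                       threshold: float = DEFAULT_THRESHOLD) -> Optional[str]:
    """Return a warning string when too many docs were scanned per result."""
    # Assumption: with zero results, divide by 1 so the ratio stays defined.
    ratio = docs_examined / max(results_returned, 1)
    if ratio > threshold:
        # Hypothetical wording; the real warning text is not shown in this log.
        return (f"{docs_examined} docs examined for "
                f"{results_returned} results returned")
    return None
```

A query that scans 100 docs to return 5 results has a ratio of 20 and would produce a warning under these assumptions; a ratio of exactly 10 would not, because the commit says "higher than".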
* Add mango.query_invalid_index counter | Will Holley | 2020-01-17 | 2 | -0/+6
|
| Adds a metric to expose the number of Mango queries that could not use the index specified in the _find query.
* Refactor Mango warning generation | Will Holley | 2020-01-17 | 3 | -42/+53
|
| Refactor the code generating Mango warnings to (hopefully!) improve readability.
|
| This also moves the metric counter for unindexed queries such that it gets incremented only when no index is used. Previously, it would increment if *any* warning was generated (e.g. when Mango warned that the user-specified index was not found/valid but fell back to a different index).
|
| The warning string returned now also contains all warnings generated for the query, delimited via newlines - not just the first one.
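The two behaviors described above (newline-delimited warnings, and a counter that only moves when no index is used) can be illustrated with a small Python sketch. The function name, the boolean inputs, and both warning strings are hypothetical; the actual Erlang module structure is not shown in this log.

```python
from typing import Optional, Tuple


def collect_warnings(no_index_used: bool,
                     requested_index_rejected: bool) -> Tuple[Optional[str], int]:
    """Return (combined warning string or None, unindexed-counter increment)."""
    warnings = []
    if no_index_used:
        warnings.append("no matching index found")       # hypothetical text
    if requested_index_rejected:
        warnings.append("requested index was not used")  # hypothetical text
    # The counter is bumped only when no index is used, not for any warning.
    increment = 1 if no_index_used else 0
    # All warnings are returned, delimited by newlines.
    return ("\n".join(warnings) or None), increment
```

Under these assumptions, a query that fell back to a different index yields a warning but leaves the unindexed-query counter untouched.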
* fix empty queries (#1783) | Tony Sun | 2020-01-16 | 11 | -16/+164
|
|   * fix empty queries
|
| Mango text indexes currently throw function_clause errors for queries in two situations:
|
|   1) Empty selector
|   2) Operators with empty arrays
|
| We fix this issue and change the behavior as follows:
|
|   1) Any empty selector will fall back on _all_docs (like json indexes).
|   2) $or, $and, $in, $all that specify an empty array will be treated as a no-op when combined with other operators.
|   3) A single no-op query such as {"$or": []} will not be executed and returns nothing, just like json indexes.
|   4) $nin with an empty array will return all docs that match the field, just like json indexes.
|
| Co-authored-by: Jan Lehnardt <jan@apache.org>
| Co-authored-by: Peng Hui Jiang <jiangphcn@apache.org>
* Explicitly disallow SM60 on aarch64 | Joan Touzet | 2020-01-16 | 3 | -2/+12
|
| Includes configure changes and Jenkins setting change.
* Debug mem3_sync_event_listener flakiness | Nick Vatamaniuc | 2020-01-16 | 1 | -0/+4
|
| Noticed that the mem3_sync_event_listener tests still fail intermittently; add a debug log to hopefully find the cause of the failure.
* Fix fabric worker failures for partition requests | Nick Vatamaniuc | 2020-01-16 | 9 | -85/+239
|
| Previously any failed node or rexi worker error resulted in requests failing immediately even though there were available workers to keep handling the request. This was because the progress check function didn't account for the fact that partition requests only use a handful of shards which, by design, do not complete the full ring.
|
| Here we fix both partition info queries and dreyfus search functionality. We follow the pattern from fabric and pass through a set of "ring options" that let the progress function know it is dealing with partitions instead of a full ring.
* Merge pull request #2450 from apache/couchdb-ken-hastings | Peng Hui Jiang | 2020-01-16 | 1 | -1/+3
|\
| | Adjust way to detect presence of hastings for Ken
| * More way to detect presence of hastings for Ken | jiangph | 2020-01-16 | 1 | -1/+3
|/
| After moving ken from https://github.com/apache/couchdb-ken to https://github.com/apache/couchdb/tree/master/src/ken, the directory structure related to ken changed for downstream users, including Cloudant. Add more ways to detect the presence of hastings so that Ken works correctly for geospatial indexes.
* Add SameSite support to auth cookie | Will Holley | 2020-01-15 | 4 | -1/+62
|
| Adds a new configuration field, `couch_httpd_auth.same_site`, which sets the `SameSite` attribute of the CouchDB auth cookie. If no value is set (the default), no `SameSite` attribute is added.
|
| Refs #2221
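As a rough illustration of the behavior described above, the following Python sketch builds a `Set-Cookie` header value and appends `SameSite` only when a value is configured. The function name and the `Path` and `HttpOnly` attributes are placeholders chosen for the example; the real cookie is assembled in CouchDB's Erlang code, which is not shown in this log.

```python
from typing import Optional


def auth_cookie_header(session_value: str,
                       same_site: Optional[str] = None) -> str:
    """Build a Set-Cookie header value for the auth cookie (sketch only)."""
    # Placeholder attributes; the real attribute set may differ.
    parts = [f"AuthSession={session_value}", "Path=/", "HttpOnly"]
    # With no configured value (the default), no SameSite attribute is added.
    if same_site:
        parts.append(f"SameSite={same_site}")
    return "; ".join(parts)
```

Calling `auth_cookie_header("abc")` leaves the attribute out entirely, which mirrors the stated default.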
* mochiweb v2.20 | Will Holley | 2020-01-15 | 1 | -1/+1
|
* Preserve replication job stats when jobs are re-created | Nick Vatamaniuc | 2020-01-14 | 4 | -82/+185
|
| Previously we made sure replication job statistics were preserved when the jobs were started and stopped by the scheduler. However, if a db node restarted or a user re-created the job, replication stats would be reset to 0.
|
| Some statistics like `docs_read` and `docs_written` are perhaps not as critical. However `doc_write_failures` is. That is the indicator that some replication docs have not replicated to the target. Not preserving that statistic meant users could perceive there was a data loss during replication -- data was replicated successfully according to the replication job with no write failures, the user deletes the source database, then some time later notices some of their data is missing.
|
| These statistics were already logged in the checkpoint history and we just had to initialize a stats object from them when a replication job starts. In that initialization code we pick the highest values from either the running scheduler or the checkpointed log. The reason is that the running stats could be higher if, say, the job was stopped suddenly and failed to checkpoint but the scheduler retained the data.
|
| Fixes: #2414
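The "pick the highest values" initialization described above can be sketched in Python as a per-key maximum over two stats mappings. The function name `init_stats` and the use of plain dictionaries are assumptions for illustration; the replicator itself is Erlang and uses its own stats structure.

```python
from typing import Dict


def init_stats(running: Dict[str, int],
               checkpointed: Dict[str, int]) -> Dict[str, int]:
    """Take the highest value per counter from the two sources."""
    keys = set(running) | set(checkpointed)
    # A counter missing from one source is treated as 0 there.
    return {k: max(running.get(k, 0), checkpointed.get(k, 0)) for k in keys}
```

With this approach a job that failed to checkpoint keeps the scheduler's higher counts, and a re-created job keeps the checkpointed counts, so `doc_write_failures` never silently drops back to 0.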
* Properly account for replication stats when splitting bulk docs batches | Nick Vatamaniuc | 2020-01-14 | 1 | -2/+3
|
| Previously, if a batch of bulk docs had to be bisected in order to fit a lower max request size limit on the target, we only counted stats for the second batch. So it was possible that we missed some `doc_write_failures` updates, which can be perceived as a data loss by the customer. So we use the handy-dandy `sum_stats/2` function to sum the return stats from both batches and return that.
|
| Issue: https://github.com/apache/couchdb/issues/2414
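A minimal Python sketch of the fix described above: when a batch is bisected, the stats of both halves are summed instead of returning only the second half's stats. The Python `sum_stats` here is a stand-in for the Erlang `sum_stats/2` named in the commit, and `flush`, `send`, and the doc-count limit `max_docs` are hypothetical (the real code bisects on request size, not doc count).

```python
from typing import Callable, Dict, List

Stats = Dict[str, int]


def sum_stats(a: Stats, b: Stats) -> Stats:
    """Add two stats mappings counter by counter."""
    return {k: a.get(k, 0) + b.get(k, 0) for k in set(a) | set(b)}


def flush(docs: List[str], send: Callable[[List[str]], Stats],
          max_docs: int) -> Stats:
    """Send docs, bisecting oversized batches and summing both halves' stats."""
    if len(docs) > max_docs and len(docs) > 1:
        mid = len(docs) // 2
        first = flush(docs[:mid], send, max_docs)
        second = flush(docs[mid:], send, max_docs)
        # The bug described above amounted to returning only `second` here.
        return sum_stats(first, second)
    return send(docs)
```

Summing at every level of the recursion means write failures in any sub-batch are reflected in the final totals.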
* Enable arm64v8 builds on Jenkins (#2436) | Joan Touzet | 2020-01-14 | 1 | -11/+46
|
* Disable JIT compiler on SpiderMonkey 60 | Paul J. Davis | 2020-01-13 | 1 | -0/+3
|
| We've had a number of segfaults in the `make javascript` test suite. The few times we've been able to get core dumps all appear to indicate something wrong in the JIT compiler. Disabling the JIT compilers appears to prevent these segfaults.
* Eliminate multiple compiler warnings | Joan Touzet | 2020-01-13 | 5 | -7/+4
|
| We now only support OTP 20+, with 19 at a stretch. erlang:now/0 was deprecated in OTP 18, so we can now suppress these warnings:
|
| ```
| /home/joant/couchdb/src/dreyfus/src/dreyfus_index_updater.erl:62: Warning: erlang:now/0: Deprecated BIF. See the "Time and Time Correction in Erlang" chapter of the ERTS User's Guide for more information.
| /home/joant/couchdb/src/dreyfus/src/dreyfus_index_updater.erl:83: Warning: erlang:now/0: Deprecated BIF. See the "Time and Time Correction in Erlang" chapter of the ERTS User's Guide for more information.
| ```
|
| Also, some unused variables were removed:
|
| ```
| /home/joant/couchdb/src/couch/src/couch_bt_engine.erl:997: Warning: variable 'NewSeq' is unused
| /home/joant/couchdb/src/mem3/src/mem3_rep.erl:752: Warning: variable 'TMap' is unused
| /home/joant/couchdb/src/dreyfus/src/dreyfus_httpd.erl:76: Warning: variable 'LimitValue' is unused
| /home/joant/couchdb/src/dreyfus/src/dreyfus_util.erl:345: Warning: variable 'Db' is unused
| ```
|
| PRs to follow in ets_lru, hyper, ibrowse to track the rest of `erlang:now/0` deprecations.
* Improve replicator error reporting | Nick Vatamaniuc | 2020-01-13 | 5 | -31/+329
|
| Previously many HTTP requests failed noisily with `function_clause` errors. Expect some of those failures and handle them better. There are mainly 3 types of improvements:
|
|   1) Error messages are shorter. Instead of `function_clause` with cryptic internal fun names, return a simple marker like `bulk_docs_failed`.
|
|   2) Include the error body if it was returned. HTTP failures besides the error code may contain useful information in the body to help debug the failure.
|
|   3) Do not log or include the stack trace in the message. The error names are enough to identify the place where they are generated, so avoid spamming the user and the logs with them. This is done by using `{shutdown, Error}` tuples to bubble up the error to the replication scheduler.
|
| There is a small but related cleanup of removing source and target monitors. We'd want to handle those errors better; however, they are never triggered since we removed local replication endpoints recently.
|
| Fixes: https://github.com/apache/couchdb/issues/2413
* Happy New Year 2020! (#2443) | Joan Touzet | 2020-01-13 | 2 | -2/+2
|
* Merge pull request #2438 from apache/reset-corrupt-view-index | Robert Newson | 2020-01-11 | 2 | -1/+7
|\
| | Reset a view shard if the signature is wrong
| * Debug mem3 eunit error (ref: reset-corrupt-view-index) | Paul J. Davis | 2020-01-10 | 1 | -1/+1
| |
| * Reset a view shard if the signature is wrong | Robert Newson | 2020-01-10 | 1 | -0/+6
|/
| We encountered a case_clause error when reading the header for a .view file as the response was {ok, {Sig, nil}} where Sig is neither the expected sig nor the pre-upgrade sig (though surely the pre-1.2 goop is not firing anymore).
|
| We now log this specific issue and then proceed as if we found no valid header.
* When shard splitting make sure to reset the targets before any retries | Nick Vatamaniuc | 2020-01-10 | 2 | -15/+5
|
| Previously the target was reset only when the whole job started, but not when the initial copy phase restarted on its own. If that happened, we left the target around so the retry always failed with the `eexist` error.
|
| Target reset has a check to make sure the shards are not in the global shard map, in case someone manually added them, for example. If they are found there the job panics and exits.
* Remove debug logging from test/javascript/run | Paul J. Davis | 2020-01-10 | 1 | -1/+0
|
* Remove EUnit retries on failure | Nick Vatamaniuc | 2020-01-09 | 1 | -10/+1
|
| Since we switched from Travis to Jenkins, let's see how tests run without retries in the new environment.
|
| For reference, retries were introduced in: https://github.com/apache/couchdb/commit/220462a1dd2d921fc4ecba3488f5fedefb75217f
* Use separate requests to write design when replicating | Nick Vatamaniuc | 2020-01-09 | 2 | -12/+23
|
| Design doc writes could fail on the target when replicating with non-admin credentials. Typically the replicator will skip over them and bump the `doc_write_failures` counter. However, that relies on the POST request returning a `200 OK` response. If the authentication scheme is implemented such that the whole request fails if some docs don't have enough permission to be written, then the replication job ends up crashing with an ugly exception and gets stuck retrying forever. In order to accommodate that scenario, write _design docs in their own separate requests, just like we write attachments.
|
| Fixes: #2415
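The request split described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `plan_write_requests` is hypothetical, docs are modeled as plain dictionaries, and design docs are recognized by the `_design/` id prefix. How the Erlang replicator actually issues the requests is not shown in this log.

```python
from typing import Dict, List

Doc = Dict[str, object]


def plan_write_requests(docs: List[Doc]) -> List[List[Doc]]:
    """Group docs into request payloads.

    Regular docs share one bulk request; each design doc gets its own
    request, so a permission failure on one cannot fail the whole batch.
    """
    def is_design(doc: Doc) -> bool:
        return str(doc.get("_id", "")).startswith("_design/")

    regular = [d for d in docs if not is_design(d)]
    design = [d for d in docs if is_design(d)]
    requests: List[List[Doc]] = [regular] if regular else []
    requests.extend([d] for d in design)
    return requests
```

A rejected design doc write then only affects its own request, and the failure can be counted in `doc_write_failures` without crashing the job.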
* Include JavaScript JUnit reports in Jenkins | Paul J. Davis | 2020-01-09 | 2 | -12/+12
|
* Prevent the elision of `jenkins` in log URLs | Paul J. Davis | 2020-01-09 | 1 | -1/+1
|
* Include test reports when uploading logs | Paul J. Davis | 2020-01-09 | 1 | -0/+3
|
* Generate test results on build failures | Paul J. Davis | 2020-01-09 | 3 | -2/+8
|
* Include JavaScript test results in report | Paul J. Davis | 2020-01-09 | 1 | -1/+2
|
* Add a JUnit report to JavaScript tests | Paul J. Davis | 2020-01-09 | 2 | -4/+73
|
* Fix missing parentheses in couchdb.in | Will Holley | 2020-01-09 | 1 | -2/+2
|
| 8e89688 added a syntax error to couchdb.in. This fixes it.
* Address flaky test failure on t_invalid_view/1 | jiangph | 2020-01-09 | 1 | -3/+6
|
| - Start couch_log to make sure that the couch_log_server proc exists and can write logs, instead of getting a noproc error during the test
* Fix chttpd_purge_tests.erl | Paul J. Davis | 2020-01-09 | 1 | -1/+2
|
| Fixes #2424
* fix(#2143): allow env var overrides for js query server config (#2393) | Jan Lehnardt | 2020-01-08 | 2 | -5/+5
|
|   * fix(#2143): allow env var overrides for js query server config
|   * Remove incorrect quotation marks from couchdb.cmd.in
|
| Co-authored-by: Joan Touzet <wohali@apache.org>
* Debug design_docs.js failure | Paul J. Davis | 2020-01-08 | 1 | -2/+4
|
| This test has been failing randomly on Jenkins across multiple PRs. This adds more context to the error that causes the test to fail.
* Log the exit code of couchjs | Paul J. Davis | 2020-01-08 | 1 | -1/+1
|
| Recently we've been seeing the `couchjs` test runner exiting without displaying a traceback of an error. This logs the exit code of the OS process to see if that gives any insight into why it's exiting.
* Fix missing mango execution stats (part 2) | Will Holley | 2020-01-08 | 2 | -52/+23
|
| The previous implementation of Mango execution stats relied on passing the docs_examined count from each shard to the coordinator in the view_row record. This failed to collect the count of documents read which weren't followed by a match (in a given shard). For example, if an index was scanned but no documents were matched, the docs_examined would be 0, when it should be equal to the number of documents in the index.
|
| This commit changes the implementation so that docs examined is passed only when each shard has completed its index scan.
|
| The work is split into 2 commits to support mixed-version cluster upgrades - the previous commit adds the message handlers only so can be safely rolled out without breaking in-flight requests.
* Fix missing mango execution stats (part 1) | Will Holley | 2020-01-08 | 3 | -6/+23
|
| Adds message handlers to mango / all_docs / mrview fabric to receive an execution_stats message.
* Jenkins: update binary platform matrix (#2422) | Joan Touzet | 2020-01-08 | 1 | -4/+7
|
| This PR drops Debian jessie, adds Debian buster, and adds CentOS 8 to the binary platform build matrix on master.
* Uncomment COUCHDB_FAUXTON_DOCROOT for couchdb.cmd (#2416) | Joan Touzet | 2020-01-07 | 1 | -1/+1
|
| Fixes #2404
* Halt on no admin to avoid crash dump (#2417) | Joan Touzet | 2020-01-07 | 1 | -1/+1
|
* Bypass authentication check for /_up (#2411) | Jan Lehnardt | 2020-01-07 | 2 | -0/+6
|
| Add config variable chttpd.require_valid_user_except_for_up defaulting to false.
|
| This will allow various automated health check systems to hit /_up without having to provide a username/password pair when the chttpd.require_valid_user config setting is true. Apparently, many of these health check providers do not even allow supplying creds for such a purpose...
|
| Closes #823
|
| Co-authored-by: Joan Touzet <wohali@users.noreply.github.com>
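The interaction between the two settings described above can be sketched as a small decision function. This Python sketch is illustrative only: the function name `needs_credentials` and its parameters are hypothetical, the two booleans stand in for `chttpd.require_valid_user` and `chttpd.require_valid_user_except_for_up`, and the exact path matching done by the real Erlang handler is an assumption.

```python
def needs_credentials(path: str,
                      require_valid_user: bool,
                      require_valid_user_except_for_up: bool = False) -> bool:
    """Decide whether a request must carry valid credentials (sketch)."""
    if not require_valid_user:
        return False
    # Assumption: the exemption applies to the /_up path only.
    if require_valid_user_except_for_up and path == "/_up":
        return False
    return True
```

With both settings true, a health checker can poll `/_up` anonymously while every other path still requires a valid user. With the new setting at its default of false, behavior is unchanged.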
* Make the rexi:stream2 interface unacked message limit configurable (#2360) | Kyle Snavely | 2020-01-07 | 2 | -2/+4
|
| Also lower the default stream_limit to 5 based on the results of performance testing.
|
| Co-authored-by: Adam Kocoloski <kocolosk@apache.org>
| Co-authored-by: Kyle Snavely <kjsnavely@gmail.com>
* Remove unused batching code from replicator (#2419) | Nick Vatamaniuc | 2020-01-07 | 1 | -33/+9
|
| The `batch_doc(Doc)` code was previously used for local endpoints when flushing docs with attachments. After that code was removed, `remote_doc_handler/2` filters out all docs with attachments before they even get to the doc flusher, so `batch_doc(Doc)` effectively always returns `true`.