Commit message    Author    Age    Files    Lines
* Merge branch 'master' into add-shard-sync-apiadd-shard-sync-apiJoan Touzet2019-01-18103-972/+1252
|\
| * Fix timeout in chttpd_purge_testsPaul J. Davis2019-01-181-11/+25
| |
| * Move Jenkins to use Erlang 19 for initial build step (#1866)Joan Touzet2019-01-181-3/+3
| |
| * Merge pull request #1865 from apache/purge_request_with_101_docidPeng Hui Jiang2019-01-182-1/+22
| |\ | | | | | | Support one purge request with more than 100 docid
| | * Support one purge request with more than 100 docidpurge_request_with_101_docidjiangph2019-01-172-1/+22
| |/ | | | | | | COUCHDB-3226
| * Fix fabric_open_doc_revsPaul J. Davis2019-01-151-25/+91
| | There was a subtle bug when opening specific revisions in fabric_doc_open_revs due to a race condition between updates being applied across a cluster. The underlying cause was the stemming that happens after a document has been updated more than revs_limit times, combined with concurrent reads to a node that had not yet applied the update.
| |
| | To illustrate, let's consider a document A which has a revision history from `{N, RevN}` to `{N+1000, RevN+1000}` (assuming revs_limit is the default 1000). From a single node's perspective, when an update comes in we add the new revision and stem the oldest revision. The document's revisions on that node would then be `{N+1, RevN+1}` to `{N+1001, RevN+1001}`.
| |
| | The bug exists when we attempt to open revisions on a different node that has yet to apply the new update. In this case fabric_doc_open_revs could be called with `{N+1000, RevN+1000}`. This results in a response from fabric_doc_open_revs that includes two different `{ok, Doc}` results instead of the expected single instance. The reason is that one document has revisions `{N+1, RevN+1}` to `{N+1000, RevN+1000}` from the node that has applied the update, while the node without the update responds with revisions `{N, RevN}` to `{N+1000, RevN+1000}`.
| |
| | To rephrase that, a node that has applied an update can end up returning a revision path that contains `revs_limit - 1` revisions while a node without the update returns all `revs_limit` revisions. This slight difference in the path prevented the responses from being properly combined into a single response.
| |
| | This bug has existed for many years. However, read repair effectively prevents it from being a significant issue by immediately fixing the revision history discrepancy. This was discovered due to the recent bug in read repair during a mixed cluster upgrade to a release including clustered purge. In this situation we end up crashing the design document cache, which then leads to all of the design document requests being direct reads, which can end up causing cluster nodes to OOM and die. The conditions require a significant number of design document edits coupled with already significant load on those modified design documents. The most direct example observed was a cluster that had a significant number of filtered replications in and out of the cluster.
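The path mismatch described in this commit message can be reproduced with a small model. The following is a minimal Python sketch, not the fabric code: the helpers `stem` and `path_to` and the list-of-tuples representation are hypothetical, and the history is sized so that the stale node holds exactly `revs_limit` revisions.

```python
REVS_LIMIT = 1000  # default revs_limit mentioned in the commit message


def stem(path, limit=REVS_LIMIT):
    """Keep only the newest `limit` revisions of a linear revision path."""
    return path[-limit:]


def path_to(path, rev):
    """Return the part of `path` from its oldest kept revision up to `rev`."""
    return path[: path.index(rev) + 1]


n = 5  # arbitrary starting position for the illustration
# Linear history of (position, revision id) pairs, exactly REVS_LIMIT long.
history = [(pos, f"rev{pos}") for pos in range(n, n + REVS_LIMIT)]
requested = history[-1]  # the revision a client asks to open

# Node that has NOT applied the newest update yet.
stale_node = stem(history)

# Node that HAS applied the update: one revision added, the oldest one stemmed.
new_rev = (n + REVS_LIMIT, f"rev{n + REVS_LIMIT}")
updated_node = stem(history + [new_rev])

stale_reply = path_to(stale_node, requested)
updated_reply = path_to(updated_node, requested)

print(len(stale_reply), len(updated_reply))
print(stale_reply[-1] == updated_reply[-1], stale_reply == updated_reply)
```

Both replies end at the requested revision, yet the paths differ in length by one, so a comparison of whole paths treats them as two distinct documents.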
| * Improve vm.args template comments (#1861)Joan Touzet2019-01-151-3/+12
| |
| * Fix read repair in a mixed cluster environmentPaul J. Davis2019-01-142-3/+3
| | | | | | | | | | | | This enables backwards compatibility with nodes still running the old version of fabric_rpc when a cluster is upgraded to master. This has no effect once all nodes are upgraded to the latest version.
| * Fix end_time field in /_replicate responseNick Vatamaniuc2019-01-081-2/+2
| | | | | | | | | | | | | | | | | | Previously `end_time` was generated by converting the start_time to universal time, then passing that to `httpd_util:rfc1123_date/1`. However, `rfc1123_date/1` also translates its argument from local to UTC time; that is, it expects its input to be in local time. Fixes #1841
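The double conversion described above is easy to see with a fixed offset. This is a minimal Python sketch, not the replicator code: `LOCAL_OFFSET`, `local_to_utc` and the Python `rfc1123_date` stand-in are hypothetical, and the stand-in only mimics the "expects local time, converts to UTC itself" behaviour stated in the commit message.

```python
from datetime import datetime, timedelta

# Hypothetical local zone for the illustration: UTC-5.
LOCAL_OFFSET = timedelta(hours=-5)


def local_to_utc(local_dt):
    """Convert a naive local timestamp to a naive UTC timestamp."""
    return local_dt - LOCAL_OFFSET


def rfc1123_date(local_dt):
    """Stand-in formatter that expects LOCAL time and converts to UTC itself."""
    return local_to_utc(local_dt).strftime("%a, %d %b %Y %H:%M:%S GMT")


local_time = datetime(2019, 1, 8, 12, 0, 0)

# Old behaviour: convert to UTC first, then hand it to a function that
# converts again, so the offset is applied twice.
buggy = rfc1123_date(local_to_utc(local_time))

# Fixed behaviour: pass the local time straight through.
fixed = rfc1123_date(local_time)

print(buggy)
print(fixed)
```

With a zero offset the two results coincide, which is one reason a bug of this kind can go unnoticed on servers that run in UTC.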
| * Merge pull request #1808 from apache/before_doc_updatePeng Hui Jiang2019-01-058-26/+34
| |\ | | | | | | Update before_doc_update/2 to before_doc_update/3
| | * Update before_doc_update/2 to before_doc_update/3before_doc_updatejiangph2019-01-058-26/+34
| |/ | | | | | | - Pass UpdateType to before_doc_update/3
| * Merge pull request #1831 from apache/intro-cpse_test_purge_seqsPeng Hui Jiang2019-01-032-6/+9
| |\ | | | | | | Re-Introduce cpse_test_purge_seqs
| | * Introduce cpse_test_purge_seqs againjiangph2019-01-032-6/+9
| |/ | | | | | | | | - Re-introduce cpse_test_purge_seqs after fixing the undef issue in cpse_test_purge_seqs:cpse_increment_purge_seq_on_partial_purge/1
| * happy new year (#1838)Jan Lehnardt2018-12-312-2/+2
| |
| * Merge pull request #1833 from cloudant/minimum-erlang-otp-19iilyak2018-12-285-46/+1
| |\ | | | | | | Change minimum supported Erlang version to OTP 19
| | * Change minimum supported Erlang version to OTP 19Jay Doane2018-12-285-46/+1
| |/
| * Remove obsolete travis filesJay Doane2018-12-2710-308/+0
| | | | | | | | | | | | These files were used when their apps had separate repositories, but are obsolete in the "mono repo" since their apps are built together using the top level .travis.yml now.
| * Remove explicit modules list from .app.src filesJay Doane2018-12-276-74/+0
| | | | | | | | | | | | The modules lists in .app files are automatically generated by rebar from .app.src files, so these explicit lists are unnecessary and prone to being out of date.
| * Merge pull request #1798 from cloudant/suppress-compiler-warningsiilyak2018-12-2746-260/+332
| |\ | | | | | | Suppress compiler warnings
| | * Suppress export-related compiler warningsJay Doane2018-12-2722-108/+188
| | | - For export_all warnings: either replace with explicit exports, add nowarn_export_all compiler directives when appropriate, or, in the case of couch_epi_sup, move the test to a dedicated test file and export the function needed for testing.
| | | - For the "function already exported" warning in couch_key_tree_prop_tests, remove the include_lib attribute for eunit.hrl since it already gets imported in triq.hrl
| | * Reduce number of behaviour undefined compiler warningsJay Doane2018-12-272-6/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move couch_event and mem3 modules earlier in the list of SubDirs to suppress behaviour undefined warnings. This has the side effect of running the tests in the new order, which induces failures in couch_index tests. Those failures are related to quorum, and can be traced to mem3 seeds tests leaving a _nodes db containing several node docs in the tmp/data directory, ultimately resulting in badmatch errors e.g. when a test expects 'ok' but gets 'accepted' instead. To prevent test failures, a cleanup function is implemented which deletes any existing "nodes_db" left after test completion.
| | * Suppress misc compiler warningsJay Doane2018-12-273-5/+3
| | | - couch_util_tests.erl:90: Warning: the result of the expression is ignored
| | | - couch_mrview_index_changes_tests.erl:189,196: Warning: a term is constructed, but never used
| | | - couch_replicator_connection_tests.erl:76: Warning: this expression will fail with a 'badarith' exception
| | * Suppress unused function compiler warningsJay Doane2018-12-274-100/+98
| | | - Add unused test cases to test fixture
| | | - Eliminate unreferenced code
| | | - Comment out code that is referenced in commented code only
| | * Suppress crypto and random compiler warningsJay Doane2018-12-275-20/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace deprecated crypto:rand_uniform/2 and 'random' module functions with equivalent couch_rand:uniform/1 calls, or eliminate the offending code entirely if unused. Note that crypto:rand_uniform/2 takes two parameters which have different semantics than the single argument couch_rand:uniform/1. Tests in mem3 are also provided to validate that the random rotation of node lists was converted correctly.
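The semantic difference between the two-argument and one-argument calls can be sketched as follows. This is a minimal Python model, not the Erlang code: it assumes that the two-argument form returns an integer `Lo =< N < Hi` and that the one-argument form returns an integer in `1..N`; the function names `uniform`, `rand_uniform` and `rotate_random` are hypothetical stand-ins.

```python
import random


def uniform(n, rng=random):
    """Stand-in for the single-argument form: integer in 1..n inclusive."""
    return rng.randint(1, n)


def rand_uniform(lo, hi, rng=random):
    """Stand-in for the two-argument form (lo <= result < hi),
    expressed in terms of the single-argument form."""
    return lo + uniform(hi - lo, rng) - 1


def rotate_random(nodes, rng=random):
    """Rotate a node list by a random amount in 0..len(nodes)-1."""
    k = uniform(len(nodes), rng) - 1  # shift 1..n down to 0..n-1
    return nodes[k:] + nodes[:k]


nodes = ["node1", "node2", "node3", "node4"]
print(rand_uniform(3, 7))
print(rotate_random(nodes))
```

The `- 1` adjustments are the part that is easy to get wrong in such a conversion, which is presumably why the commit adds tests for the node-list rotation.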
| | * Suppress unused variable and type compiler warningsJay Doane2018-12-279-12/+8
| | |
| | * Suppress variable exported from 'case' compiler warningsJay Doane2018-12-272-9/+8
| |/
| * Merge pull request #1829 from cloudant/elixir-test-improvementsiilyak2018-12-276-33/+61
| |\ | | | | | | Elixir test improvements
| | * Do not automatically fail tests if quorum conditions unmetJay Doane2018-12-221-3/+3
| | |
| | * Improve all_docs_test robustnessJay Doane2018-12-221-3/+6
| | | Wrap deleted element assertions in retry_until to prevent timing-related failures like:
| | |
| | |   AllDocsTest
| | |     * test All Docs tests (331.1ms)
| | |
| | |     1) test All Docs tests (AllDocsTest)
| | |        test/all_docs_test.exs:15
| | |        Assertion with == failed
| | |        code:  assert length(deleted) == 1
| | |        left:  0
| | |        right: 1
| | |        stacktrace:
| | |          test/all_docs_test.exs:72: (test)
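The idea of wrapping an assertion in a retry helper can be sketched like this. It is a minimal Python illustration, not the Elixir helper used by the test suite: the signature of `retry_until`, its defaults, and the fake `deleted_is_visible` condition are all assumptions.

```python
import time


def retry_until(condition, sleep=0.05, timeout=5.0):
    """Call `condition` until it returns a truthy value or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition was not met before the timeout")
        time.sleep(sleep)


# Fake eventually-consistent check: the deleted row only shows up on the
# third poll, the way a change might lag behind on a busy cluster.
calls = {"n": 0}


def deleted_is_visible():
    calls["n"] += 1
    return calls["n"] >= 3


result = retry_until(deleted_is_visible, sleep=0.001)
print(result, calls["n"])
```

A bare assertion would have failed on the first poll; the retry loop turns a timing-dependent failure into a bounded wait.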
| | * Fix elixir test formattingJay Doane2018-12-225-27/+52
| |/
| | Prior to this, `make elixir` was failing with these errors:
| |
| |   ** (Mix) mix format failed due to --check-formatted.
| |   The following files were not formatted:
| |
| |     * test/security_validation_test.exs
| |     * test/rewrite_test.exs
| |     * test/cluster_with_quorum_test.exs
| |     * test/cluster_without_quorum_test.exs
| |     * test/all_docs_test.exs
| * Remove shim couch_replicator_manager moduleKyle Snavely2018-12-204-89/+15
| |
| * Clean rexi stream workers when coordinator process is killedNick Vatamaniuc2018-12-201-0/+132
| | Sometimes fabric coordinators end up getting brutally terminated [1], and in that case they might never process their `after` clause where their remote rexi workers are killed. Those workers are left lingering around keeping databases active for up to 5 minutes at a time.
| |
| | To prevent that from happening, let coordinators which use streams spawn an auxiliary cleaner process. This process will monitor the main coordinator and if it dies will ensure remote workers are killed, freeing resources immediately. In order not to send 2x the number of kill messages during the normal exit, fabric_util:cleanup() will stop the auxiliary process before continuing.
| |
| | [1] One instance is when the ddoc cache is refreshed: https://github.com/apache/couchdb/blob/master/src/ddoc_cache/src/ddoc_cache_entry.erl#L236
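The cleaner pattern described in this commit message can be modelled with threads. This is a rough Python sketch under stated assumptions, not the Erlang implementation: `Worker`, `run_scenario` and the use of `Thread.join` as a stand-in for a process monitor are all hypothetical, and a thread that returns early stands in for a coordinator that is killed before its cleanup runs.

```python
import threading


class Worker:
    """Stand-in for a remote worker; records who sent it a kill."""

    def __init__(self):
        self.killed_by = []

    def kill(self, sender):
        self.killed_by.append(sender)


def run_scenario(killed_early):
    workers = [Worker(), Worker()]
    stop_cleaner = threading.Event()
    go = threading.Event()

    def coordinator_body():
        go.wait()
        if killed_early:
            return  # simulates brutal termination: cleanup never runs
        # Normal exit: stop the cleaner first, then clean up ourselves,
        # so workers do not receive two kill messages.
        stop_cleaner.set()
        for w in workers:
            w.kill("coordinator")

    coordinator = threading.Thread(target=coordinator_body)
    coordinator.start()

    def cleaner_body():
        coordinator.join()  # plays the role of monitoring the coordinator
        if not stop_cleaner.is_set():
            for w in workers:
                w.kill("cleaner")

    cleaner = threading.Thread(target=cleaner_body)
    cleaner.start()
    go.set()
    cleaner.join()
    return [w.killed_by for w in workers]


print(run_scenario(killed_early=True))
print(run_scenario(killed_early=False))
```

In both scenarios each worker receives exactly one kill; only the sender differs.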
| * Move fabric streams to a fabric_streams moduleNick Vatamaniuc2018-12-207-98/+129
| | | | | | | | | | | | Streams functionality is fairly isolated from the rest of the utils module so move it to its own. This is mostly in preparation to add a streams workers cleaner process.
| * Suppress credo TODO suggests (#1822)Ivan Mironov2018-12-201-2/+5
| |
| * Migrate cluster with(out) quorum js tests as elixir tests (#1812)Juanjo Rodriguez2018-12-194-2/+383
| |
| * Increase timeout on restart in JS/elixir tests to 30s (#1820)Joan Touzet2018-12-192-3/+3
| |
| * Merge pull request #1800 from cloudant/allow-specifying-individual-elixir-testsiilyak2018-12-194-4/+17
| |\ | | | | | | Support specifying individual Elixir tests to run
| | * Merge branch 'master' into allow-specifying-individual-elixir-testsJoan Touzet2018-12-1211-45/+327
| | |\ | | |/ | |/|
| | * Support specifying individual Elixir tests to runILYA Khlopotov2018-12-074-4/+17
| | | Individual tests can be specified as:
| | | - make elixir tests=test/basics_test.exs
| | | - make elixir tests=test/basics_test.exs,test/config_test.exs
| | | - make elixir tests=test/basic_test.exs:17
* | | Add new /{db}/_sync_shards endpoint (admin-only)Joan Touzet2019-01-183-1/+22
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | This server admin-only endpoint forces an n-way sync of all shards across all nodes on which they are hosted. This can be useful for an administrator adding a new node to the cluster, after updating _dbs so that the new node hosts an existing db with content, to force the new node to sync all of that db's shards. Users may want to bump their `[mem3] sync_concurrency` value to a larger figure for the duration of the shards sync. Closes #1807
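A request to the new endpoint might be built as shown below. This is a hedged Python sketch: the commit message does not state the HTTP method, so `POST` is an assumption here, and the base URL, database name, credentials and the helper `build_sync_shards_request` are all hypothetical. The request is only constructed, not sent.

```python
import base64
import urllib.request
from urllib.parse import quote


def build_sync_shards_request(base_url, db, user, password):
    """Build (but do not send) a request to /{db}/_sync_shards.

    Assumption: the endpoint is called with POST and server-admin
    credentials supplied via HTTP basic auth.
    """
    url = f"{base_url.rstrip('/')}/{quote(db, safe='')}/_sync_shards"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical host, database and credentials for the illustration.
req = build_sync_shards_request("http://127.0.0.1:5984", "mydb", "admin", "password")
print(req.get_method(), req.full_url)
# To actually send it against a running cluster:
#   urllib.request.urlopen(req)
```

As the commit message notes, raising `[mem3] sync_concurrency` for the duration of the sync may help on large databases.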
* | Merge pull request #1805 from cloudant/fix-with-haproxyiilyak2018-12-111-1/+1
|\ \ | | | | | | Fix haproxy config file location
| * | Fix haproxy config file locationILYA Khlopotov2018-12-111-1/+1
|/ / | | | | | | | | The problem was introduced in 94eff0d8 during rebase of https://github.com/apache/couchdb/pull/1774
* | Merge pull request #1774 from cloudant/support-more-than-3-nodesiilyak2018-12-112-15/+86
|\ \ | | | | | | Support for more than 3 nodes dev cluster
| * | Support for more than 3 nodes dev clusterILYA Khlopotov2018-12-112-15/+86
|/ /
* | Merge pull request #1796 from cloudant/tests/port-delayed_commits-to-elixirEric Avdey2018-12-104-1/+89
|\ \ | | | | | | Port delayed_commits test to Elixir
| * | Port delayed_commits test to ElixirEric Avdey2018-12-102-1/+32
| | |
| * | Add elixir helper to restart a node or the whole clusterEric Avdey2018-12-102-0/+57
|/ /
* | Merge pull request #1802 from van-mronov/moduledociilyak2018-12-102-0/+4
|\ \ | | | | | | Fix running elixir test suite
| * | Add moduledoc attributeIvan Mironov2018-12-072-0/+4
|/ /
* | Merge pull request #1770 from cloudant/COUCHDB-1384-function-clause-errorPeng Hui Jiang2018-12-082-6/+126
|\ \ | | | | | | Fix function_clause exception on invalid DB security objects