Commit message (Author, Date, Files changed, Lines -/+)
* couch_epi depends on crypto app [couch_epi_crypto_dep] (Robert Newson, 2017-08-09, 1 file, -1/+2)
|
* Allow replicator application to always update replicator docs (Nick Vatamaniuc, 2017-08-03, 1 file, -0/+5)
|
|   Previously, when updating the document with the error or failed states,
|   the document body would not pass the validation function and would
|   consequently crash either the scheduler or the doc processor.
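|
|   The usual way to make such internal updates pass the _replicator VDU is
|   to open the db with a privileged user context. A minimal sketch of that
|   idea (the role list and doc variable are assumptions, not this commit's
|   exact code):
|
|       UserCtx = #user_ctx{roles = [<<"_admin">>, <<"_replicator">>]},
|       {ok, Db} = couch_db:open_int(<<"_replicator">>, [{user_ctx, UserCtx}]),
|       {ok, _NewRev} = couch_db:update_doc(Db, DocWithFailedState, []),
|       ok = couch_db:close(Db).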
* Fix timeouts in couch epi tests (Nick Vatamaniuc, 2017-08-02, 1 file, -7/+8)
|
|   The sleeps there were not enough when run in a constrained test
|   environment. Adjust timeouts to let tests pass even when setting the CPU
|   usage limit down to 1% in my VirtualBox VM. Also switch to using macro
|   defines to make it look slightly cleaner.
|
|   Fixes #731
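|
|   The macro-define pattern mentioned above, roughly (a sketch with
|   hypothetical names, not the commit's code):
|
|       -include_lib("eunit/include/eunit.hrl").
|
|       -define(TIMEOUT_SEC, 15).  % generous enough for throttled CI hosts
|
|       epi_test_() ->
|           % eunit {timeout, ...} wrappers take seconds, not milliseconds
|           {timeout, ?TIMEOUT_SEC, ?_assertEqual(ok, run_epi_check())}.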
* Update advice on the use of -name (and NOT -sname) (Joan Touzet, 2017-08-01, 1 file, -4/+16)
|
|   Closes #729. See the ticket for additional information.
* Fix timeout in couch auth test (Nick Vatamaniuc, 2017-08-01, 1 file, -5/+19)
|
|   The test was racy. Use the test_util:wait/1 function there, just as in
|   other places such as couch_index_compaction_tests.
|
|   Fixes #724
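|
|   The wait-instead-of-sleep idiom, sketched (hedged: lookup_cached_creds/1
|   is a hypothetical helper; test_util:wait/1 is assumed to re-poll the fun
|   while it returns the atom 'wait'):
|
|       Result = test_util:wait(fun() ->
|           case lookup_cached_creds(UserName) of
|               not_found -> wait;   % not cached yet, poll again
|               Creds -> Creds
|           end
|       end),
|       ?assertNotEqual(timeout, Result).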
* Rewrite ddoc_cache to improve performance (Paul J. Davis, 2017-08-01, 29 files, -356/+2436)
|
|   There were a couple of issues with the previous ddoc_cache implementation
|   that made it possible to tip over the ddoc_cache_opener process. First,
|   there were a lot of messages flowing through a single gen_server. And
|   second, the cache relied on periodically evicting entries to ensure that
|   an entry was not cached forever after it had changed on disk.
|
|   The new version makes two important changes. First, entries now have an
|   associated process that manages the cache entry. This process will
|   periodically refresh the entry, and if the entry has changed or no longer
|   exists the process will remove its entry from the cache.
|
|   The second major change is that the cache entry process directly mutates
|   the related ets table entries so that our performance is not dependent
|   on the speed of ets table mutations.
|
|   Using a custom entry that does no work, the cache can now sustain roughly
|   one million operations a second with twenty thousand clients fighting
|   over a cache limited to one thousand items. In production this means that
|   cache performance will likely be rate limited by other factors, like
|   loading design documents from disk.
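|
|   A minimal sketch of such a per-entry refresh process (table and helper
|   names here are hypothetical, not the module's real API):
|
|       entry_loop(Key, Cached, IntervalMs) ->
|           timer:sleep(IntervalMs),
|           case load_ddoc(Key) of                       % hypothetical loader
|               {ok, Cached} ->
|                   entry_loop(Key, Cached, IntervalMs); % unchanged, keep going
|               {ok, NewVal} ->
|                   % mutate the ets row directly, no gen_server round trip
|                   true = ets:insert(ddoc_cache_table, {Key, NewVal}),
|                   entry_loop(Key, NewVal, IntervalMs);
|               {not_found, _} ->
|                   % ddoc is gone: remove our own cache entry and exit
|                   true = ets:delete(ddoc_cache_table, Key),
|                   ok
|           end.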
* Remove duplicated eviction messages (Paul J. Davis, 2017-08-01, 1 file, -10/+1)
|
|   This is an old merge artifact that was sending the event notifications
|   twice per design document update.
* Make replication ID generation more robust. (Nick Vatamaniuc, 2017-07-31, 2 files, -1/+176)
|
|   Replications checkpoint to _local documents identified by replication
|   ids. If replication ids change, replication tasks will not be able to
|   find their previous checkpoints and will rewind their change feeds back
|   to 0. For a large database that could mean reprocessing millions of
|   documents.
|
|   The current version of the replication id generation algorithm hashes
|   the full url of the source and target, their headers (including the
|   authorization ones), and a few other things. This means that when a user
|   changes their password and updates their replication document, the
|   replication ids will change and all the checkpoints will be invalidated.
|
|   Also, it is fairly common to upgrade services from http:// to https://.
|   Replication endpoint URIs then typically just change their scheme part
|   accordingly. However, the scheme is part of the replication ID
|   calculation, so replication ids would then change as well.
|
|   Introduce a more robust replication id generation algorithm which can
|   handle some of those issues. The new algorithm (see the sketch after
|   this list):
|
|   1. Excludes the source and target URI scheme from the replication id
|      calculation. As long as the host and other parts stay the same,
|      changing the scheme has no effect on the replication id.
|
|   2. Ignores inline (specified in the URL) basic authentication passwords.
|
|   3. Ignores the basic authentication password even if provided in the
|      basic authorization headers.
|
|   4. Is insensitive to switching between providing basic authentication
|      credentials inline or in a headers section. However, it includes the
|      username used in the basic auth in the calculation. It is a plausible
|      scenario that http://user1:pass1@a.host.com is really a different
|      database than http://user2:pass@@a.host.com.
|
|   Issue #688
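|
|   A sketch of the kind of endpoint normalization this implies (hedged:
|   helper names and the exact set of fields hashed are assumptions, not
|   the commit's code):
|
|       replication_id(Source, Target) ->
|           Term = {normalize_endpoint(Source), normalize_endpoint(Target)},
|           couch_util:to_hex(erlang:md5(term_to_binary(Term))).
|
|       normalize_endpoint(Url) ->
|           % keep user, host, port, path and query; drop scheme and password
|           {ok, {_Scheme, UserInfo, Host, Port, Path, Query}} =
|               http_uri:parse(Url),
|           User = case string:tokens(UserInfo, ":") of
|               [] -> "";
|               [U | _] -> U
|           end,
|           {User, Host, Port, Path, Query}.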
* Save migrated replicator checkpoint documents immediately (Nick Vatamaniuc, 2017-07-31, 1 file, -2/+14)
|
|   Previously, if the replication id algorithm was updated, the replicator
|   would migrate checkpoint documents but keep them in memory. They would
|   be written to their respective databases only if checkpoints needed to
|   be updated, which doesn't happen unless the source database changes. As
|   a result it was possible for checkpoints to be lost. Here is how it
|   could happen:
|
|   1. Checkpoints were created for the current (3) version of the
|      replication id algorithm. Assume the replication document contains
|      some credentials that look like 'adm:pass', and the computed v3
|      replication id is "3abc...".
|
|   2. The replication id algorithm is updated to version 4. Version 4
|      ignores passwords, such that changing authentication from 'adm:pass'
|      to 'adm:pass2' would not change the replication ids.
|
|   3. Server code is updated with version 4. The replicator looks for
|      checkpoints with the new version 4, which it calculates to be
|      "4def...". It can't find it, so it looks for v3, finds "3abc...",
|      and decides to migrate it. However, the migration only happens in
|      memory. That is, the checkpoint document is updated, but a checkpoint
|      needs to happen for it to be written to disk.
|
|   4. There are no changes to the source db, so no checkpoints are forced
|      to happen.
|
|   5. The user hears that the new replicator version is improved, that
|      passwords shouldn't alter the replication ids, and that all the
|      checkpoints are reused. They update the replication document with
|      their new credentials - adm:pass2.
|
|   6. The updated document with 'adm:pass2' credentials is processed by
|      the replicator. It computes the v4 replication id - "4def...". It's
|      the same as before, since it wasn't affected by the pass -> pass2
|      change. That replication checkpoint document is not found on either
|      the source or the target. The replicator then computes v3 of the id
|      to find the older version. However, v3 is affected by the passwords,
|      so there it computes "3ghi...", which is different from the previous
|      v3, which was "3abc...". It cannot find it. It computes v2 and
|      checks, then v1, and eventually gives up without finding a checkpoint
|      and restarts the change feed from 0 again.
|
|   To fix this, update `find_replication_logs` to also write the migrated
|   replication checkpoint documents to their respective databases as soon
|   as it finds them (see the sketch below).
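|
|   The fix, in shape (heavily hedged: helper names and the #doc usage are
|   assumptions based on the description above, not the commit's code):
|
|       case find_checkpoint(Db, CurrentRepId) of
|           {ok, Doc} ->
|               {ok, Doc};
|           not_found ->
|               {ok, OldDoc} = find_older_checkpoint(Db, Rep),
|               Migrated = OldDoc#doc{id = CurrentRepId},
|               % persist the migrated checkpoint right away, so it survives
|               % even if no new checkpoint is ever triggered
|               {ok, _} = couch_db:update_doc(Db, Migrated, []),
|               {ok, Migrated}
|       end.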
* Increase timeouts in replicator compactor tests (Nick Vatamaniuc, 2017-07-31, 1 file, -4/+4)
|
|   Also decrease the number of rounds from 5 to 3. With:
|
|       VBoxManage bandwidthctl ${VM} set Limit --limit 100K
|
|   it needed over 400 seconds to pass with 5 rounds.
|
|   Fixes #725
* Stop couch_index processes on ddoc update (Mayya Sharipova, 2017-07-31, 11 files, -48/+261)
|
|   Currently, when a ddoc is updated, the couch_index and
|   couch_index_updater processes corresponding to the previous version of
|   the ddoc still exist until all indexing processing initiated by them is
|   done. When the ddoc of a big database is rapidly modified, this puts a
|   lot of unnecessary strain on database resources.
|
|   With this change, when a ddoc is updated:
|
|   * all couch_index processes for the previous version of the ddoc are
|     shut down
|   * all couch_index_updater processes linked to them die as well
|   * all processes waiting for indexing activity to finish (waiters on
|     couch_index:get_status) receive an immediate reply: ddoc_updated.
|     Interactive user requests (view queries) get the response:
|     {404, <<"not_found">>, <<"Design document was updated or deleted.">>}
|
|   Check whether other ddocs use the same couch_index process before
|   closing it on ddoc_updated (see the sketch below):
|
|   1. When opening an index, always add a record {DbName, {DDocId, Sig}} to
|      ?BY_DB.
|   2. On ddoc_updated, check if there are other ddocs in ?BY_DB with the
|      same Sig. If there are none, stop the couch_index processes. If there
|      are others, only remove the {DbName, {DDocId, Sig}} record from
|      ?BY_DB for this ddoc.
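|
|   A sketch of that shared-signature check (hedged: ?BY_DB is the real
|   macro per the text above, but the match pattern and the stop helper are
|   assumptions):
|
|       handle_ddoc_updated(DbName, DDocId, Sig) ->
|           true = ets:delete_object(?BY_DB, {DbName, {DDocId, Sig}}),
|           % does any other ddoc in this db still share the same index Sig?
|           case ets:match_object(?BY_DB, {DbName, {'_', Sig}}) of
|               [] -> stop_index_processes(DbName, Sig);  % hypothetical
|               _StillUsed -> ok
|           end.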
* Bump setup, docs and fauxton dependencies [2.1.0-RC1] [2.1.0] (Joan Touzet, 2017-07-30, 1 file, -3/+3)
|
* Bump image version in Jenkinsfile (Joan Touzet, 2017-07-30, 1 file, -16/+19)
|
* Merge pull request #722 from michellephung/update-fauxton (Michelle Phung, 2017-07-29, 1 file, -1/+1)
|\
| |   Update Rebar file with Fauxton 1.1.13
| * Update Rebar file with Fauxton 1.1.13 (michellephung, 2017-07-29, 1 file, -1/+1)
|/
* Disable flaky 413 replication test (Nick Vatamaniuc, 2017-07-28, 1 file, -14/+19)
|
|   It is possible that sometimes a multipart/related PUT with a doc and an
|   attachment would fail, with the connection being unexpectedly closed
|   before the client (ibrowse) gets to parse the 413 error response. That
|   makes the test flaky, so it is disabled for now.
|
|   Issue #574
* Do not unconditionally retry a request which was closed unexpectedly (Nick Vatamaniuc, 2017-07-28, 1 file, -2/+4)
|
|   In some cases, such as when the replicator flushes a document received
|   from an open_revs response, it explicitly sets the number of retries to
|   0, because the context for that request might not be restartable and the
|   retry should happen at a higher level.
|
|   Issue #574
* bump docs dependency (Joan Touzet, 2017-07-28, 2 files, -1/+2)
|
* Leave the uncommented defaults uncommented (Paul J. Davis, 2017-07-27, 1 file, -2/+2)
|
* Extend the log config option description (Paul J. Davis, 2017-07-27, 1 file, -32/+54)
|
|   Someone asked on Slack/IRC about this, so I figured I'd clean it up a
|   bit to be clearer on how it works.
* Merge pull request #710 from apache/fix/peruser-test (Jan Lehnardt, 2017-07-27, 1 file, -41/+44)
|\
| |   Fix/peruser test
| * fix: use the right values for assertions [fix/peruser-test] (Jan Lehnardt, 2017-07-22, 1 file, -2/+2)
| |
| * fix: return all generator asserts, so they all run (Jan Lehnardt, 2017-07-22, 1 file, -12/+20)
| |
| * chore: remove debugging line (Jan Lehnardt, 2017-07-22, 1 file, -1/+0)
| |
| * Revert "Fix couch_peruser EUnit test" (Jan Lehnardt, 2017-07-22, 1 file, -32/+28)
| |
| |   This reverts commit 4b63ba898562382e48a1899af5efa3cb77bda1d7.
* | Increase various eunit test timeouts (Jay Doane, 2017-07-26, 6 files, -6/+7)
| |
| |   Several eunit tests tend to fail by timing out when run on travis-ci.
| |   This change increases timeouts on the more commonly failing tests, and
| |   improves test robustness.
* | Fix regression test for COUCHDB-1283 (Paul J. Davis, 2017-07-25, 1 file, -3/+14)
| |
| |   This makes sure that we correctly synchronize with the process running
| |   compaction before we perform our desired assertions.
| |
| |   Fixes #701
* | Add soak-javascript target (Paul J. Davis, 2017-07-25, 1 file, -0/+15)
| |
* | Update default.ini with all changes since 2.0 (Joan Touzet, 2017-07-25, 1 file, -2/+20)
| |
* | Strip ?rev off of logfile-uploader's printed URL (Joan Touzet, 2017-07-24, 1 file, -1/+1)
| |
* | Increase timeout in couch's couch_db_mpr_tests module to 30 seconds (Nick Vatamaniuc, 2017-07-24, 1 file, -21/+24)
| |
| |   The previous default timeout of 5 seconds was not enough when running
| |   in an environment where disk access is severely throttled. To add a
| |   timeout, the test function was changed into a test generator. That
| |   also made the `with` construct unnecessary.
| |
| |   Fixes #695
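| |
| |   The plain-test-to-generator change looks roughly like this (a sketch;
| |   the actual test body in couch_db_mpr_tests is elided here):
| |
| |       % before: a plain _test function, capped by eunit's 5s default
| |       couch_db_mpr_test() ->
| |           run_mpr_scenario().
| |
| |       % after: a generator carrying an explicit 30-second timeout
| |       couch_db_mpr_test_() ->
| |           {timeout, 30, ?_test(run_mpr_scenario())}.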
* | Fix link to changelog/whatsnew (Joan Touzet, 2017-07-24, 1 file, -2/+2)
|/
* Merge branch 'master' of https://github.com/apache/couchdb (Nick Vatamaniuc, 2017-07-21, 1 file, -0/+470)
|\
| * Restore Jenkins builds on master (Joan Touzet, 2017-07-21, 1 file, -0/+470)
| |
* | Bump config dep to 1.0.1 (increase timeouts for set and get). (Nick Vatamaniuc, 2017-07-21, 1 file, -1/+1)
|/
|   https://github.com/apache/couchdb-config/pull/16
* Do not persist restart times setting in os_daemons_tests (Joan Touzet, 2017-07-21, 1 file, -1/+1)
|
|   Looks like an oversight in commit 789f75d.
|
|   Closes #703
* bump all deps to tags (Joan Touzet, 2017-07-21, 1 file, -11/+11)
|
* Fix couch_peruser EUnit test (Joan Touzet, 2017-07-21, 1 file, -28/+32)
|
|   The test was repeatedly creating/deleting the exact same DB name, which
|   is a recipe for disaster. Changed to use unique DB names (see the
|   sketch below).
|
|   Closes #705.
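|
|   A simple way to get unique names per run (a sketch; CouchDB's test
|   helpers also offer a ?tempdb() macro for this, which may be what the
|   commit uses):
|
|       unique_db_name() ->
|           Suffix = integer_to_list(erlang:unique_integer([positive])),
|           list_to_binary("peruser_test_db_" ++ Suffix).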
* Improve JS restartServer() support function (Joan Touzet, 2017-07-21, 1 file, -10/+16)
|
|   Previously, we could attempt to restart couch, immediately attempt to
|   see if couch had restarted, and fail if the server wasn't there (pre- or
|   post-restart). This change wraps all attempts to contact couch in
|   restartServer() with try blocks and simplifies the check-if-restarted
|   logic.
|
|   Closes #669. May or may not help with #673.
* Explicitly mention Facebook "BSD+Patents" license in NOTICE per LEGAL-303 (Joan Touzet, 2017-07-21, 1 file, -0/+2)
|
|   Closes #697
* Temporarily disable Jenkins builds (Joan Touzet, 2017-07-21, 1 file, -327/+0)
|
* Increase timeout of some replication tests (Nick Vatamaniuc, 2017-07-20, 2 files, -2/+2)
|
|   Could reproduce issue #633 by limiting disk throughput in a VBox VM
|   instance to about 5KB. Try to increase the timeouts to let it handle
|   such apparent slowdowns.
|
|   Fixes #633
* TMP: Add debug logging for failed assertion (Paul J. Davis, 2017-07-19, 1 file, -0/+7)
|
* Fix cancellation race in replication.js tests (Nick Vatamaniuc, 2017-07-18, 1 file, -1/+1)
|
|   Replication cancellation doesn't immediately update active tasks.
|   Instead, use the new `waitReplicationTaskStop(rep_id)` function to
|   properly wait for the task status.
|
|   Issue #634
* Simplify regression test for COUCHDB-1283 (Paul J. Davis, 2017-07-18, 1 file, -149/+21)
|
|   The previous version of this test relied on trying to bump into the
|   all_dbs_active error from the couch_server LRU. This proved rather
|   difficult to make reliable assertions about. In hindsight, all we really
|   care about is that the compactor holds a monitor against the database;
|   we can then trust that couch_server will not evict anything that is
|   actively monitored.
|
|   Fixes #680
* Remove get_details replicator job gen_server call (Nick Vatamaniuc, 2017-07-18, 2 files, -13/+0)
|
|   This was used from a test only, and it wasn't reliable. Because the
|   replicator job delays initialization, the `State` would be either
|   #rep_state{} or #rep{}. If a replication job hasn't finished
|   initializing, the state would be #rep{}, and a call like get_details,
|   which matches the state against #rep_state{}, would fail with a badmatch
|   error, as seen in issue #686.
|
|   So remove the `get_details` call and let the test rely on task polling,
|   as all other tests do.
* Merge pull request #693 from cloudant/use-stop_sync-in-mem3-test (iilyak, 2017-07-18, 1 file, -2/+2)
|\
| |   Use test_util:stop_config in mem3_util_test
| * Use test_util:stop_config in mem3_util_test (ILYA Khlopotov, 2017-07-18, 1 file, -2/+2)
|/
|   config:stop is asynchronous, which causes test failures with errors
|   like the following:
|
|       {error, {already_started, <0.32662.3>}}
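|
|   A synchronous stop can be built by monitoring the config process before
|   stopping it; a sketch of the idea behind test_util:stop_config (the
|   exact helper internals are assumed, not copied from the commit):
|
|       stop_config_sync(ConfigPid) ->
|           Ref = erlang:monitor(process, ConfigPid),
|           config:stop(),
|           receive
|               {'DOWN', Ref, process, ConfigPid, _Reason} -> ok
|           after 5000 ->
|               error(config_stop_timeout)
|           end.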
* Merge pull request #691 from cloudant/3367-fix-test-case (iilyak, 2017-07-18, 1 file, -17/+17)
|\
| |   3367 fix test case
| * Fix trailing whitespace issues (ILYA Khlopotov, 2017-07-18, 1 file, -2/+2)
| |