* chore: more robust test [branch: 3406/improve-tests] (Jan Lehnardt, 2017-05-14; 1 file changed, -16/+18)
* Merge pull request #516 from cloudant/global-ignore-eunit-subdir (Eric Avdey, 2017-05-11; 1 file changed, -17/+2)
  Ignore .eunit and .rebar for all deps
* Ignore .eunit and .rebar for all deps (Eric Avdey, 2017-05-11; 1 file changed, -17/+2)
  This update makes git ignore the .eunit and .rebar subdirectories for all
  dependencies instead of listing each one individually in the .gitignore file.
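  A minimal sketch of what such a consolidation might look like; the exact
  paths below are assumptions based on the commit description, not the actual
  patterns from the commit:

  ```sh
  # Sketch only: one wildcard pattern per build artifact covers every
  # dependency checkout (the src/ layout is an assumption).
  cat >> .gitignore <<'EOF'
  src/*/.eunit
  src/*/.rebar
  EOF
  ```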
* Test changes_listener dies on mem3_shards shutdown (Jay Doane, 2017-05-10; 1 file changed, -0/+41)
  This adds a test to ensure that the changes_listener process exits when the
  mem3_shards process is shut down.

  COUCHDB-3398
* Expose mem3_shards:get_changes_pid/0 (Jay Doane, 2017-05-10; 1 file changed, -0/+6)
  Simplify getting changes listener pid for testing.
* Mango $allMatch return false for empty list (#511) (garren smith, 2017-05-10; 2 files changed, -4/+16)
  The $allMatch selector returns false for a document with an empty list.
* Increase timeout for JS test harness restartServer fn (Joan Touzet, 2017-05-09; 1 file changed, -1/+1)
* Increase timeout for compression tests (Joan Touzet, 2017-05-09; 1 file changed, -1/+1)
* Handle non-default _replicator dbs in _scheduler/docs endpoint (Nick Vatamaniuc, 2017-05-09; 1 file changed, -24/+81)
  Previously _scheduler/docs assumed only the default _replicator db. To
  provide consistency and to allow disambiguation between a db named
  'db/_replicator' and the document named 'db/_replicator' in the default
  replicator db, access to the single-document API is changed to always
  require the replicator db. That is, `/docid` should now be
  `/_replicator/docid`. Now these kinds of paths are accepted after
  `_scheduler/docs` (usage examples follow this entry):

  * `/` : all docs from default _replicator db
  * `/_replicator` : all docs from default replicator db
  * `/other%2f_replicator` : non-default replicator db, urlencoded
  * `/other/_replicator` : non-default replicator db, unencoded
  * `/other%2f_replicator/docid` : doc from a non-default db, urlencoded
  * `/other/_replicator/docid` : doc from a non-default db, db is unencoded

  Because `_replicator` is not a valid document ID, it's possible to
  unambiguously parse unescaped db paths.

  Issue: #506
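  As a usage illustration of the paths listed above (not taken from the commit
  itself); the host, port and credentials are placeholder assumptions:

  ```sh
  # All docs from the default _replicator db, then the same doc in a
  # non-default replicator db using both encodings (admin:pass and
  # localhost:5984 are placeholders).
  curl -s http://admin:pass@localhost:5984/_scheduler/docs
  curl -s http://admin:pass@localhost:5984/_scheduler/docs/other%2f_replicator/docid
  curl -s http://admin:pass@localhost:5984/_scheduler/docs/other/_replicator/docid
  ```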
* Hibernate couch_stream after each write (Joan Touzet, 2017-05-09; 1 file changed, -1/+1)
  In COUCHDB-1946 Adam Kocoloski investigated a memory explosion resulting
  from replication of databases with large attachments (npm fullfat). He was
  able to stabilize memory usage at a much lower level by hibernating
  couch_stream after each write. While this increases CPU utilization when
  writing attachments, it should help reduce memory utilization.

  This patch is the single change that effected a ~70% reduction in memory. No
  alteration to the spawn of couch_stream to change the fullsweep_after
  setting has been made, in part because this can be adjusted via
  ERL_FULLSWEEP_AFTER at the erl command line if desired.
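  For reference, a hedged sketch of the command-line knob mentioned above; the
  exact invocation is an assumption based on standard ERTS options, not taken
  from the commit:

  ```sh
  # Set the emulator-wide default fullsweep_after via the ERL_FULLSWEEP_AFTER
  # environment variable at startup (illustrative; adjust to your deployment).
  erl -env ERL_FULLSWEEP_AFTER 0
  ```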
* Choose index based on fields match (#469) (garren smith, 2017-05-09; 2 files changed, -13/+128)
  Choose index based on Prefix and FieldRange:
  First choose the index with the lowest difference between its Prefix and the
  FieldRanges. If that is equal, then choose the index with the least number
  of fields in the index. If we still cannot break the tie, then choose
  alphabetically based on ddocId. Return the first element's Index and
  IndexRanges.
* Switch to using Travis containerised builds (Joan Touzet, 2017-05-05; 1 file changed, -11/+48)
* Merge pull request #484 from cloudant/couchdb-3389 (Nick Vatamaniuc, 2017-05-05; 1 file changed, -41/+50)
  Apply random jitter during initial _replicator shard discovery
* Add jittered sleep during replicator shard scanning (Nick Vatamaniuc, 2017-05-05; 1 file changed, -41/+50)
  This brings back previous code:
  https://github.com/apache/couchdb/blob/884cf3e55f77ab1a5f26dc7202ce21771062eae6/src/couch_replicator_manager.erl#L940-L946

  The aim is to avoid a stampede during startup when a potentially large
  number of shards are found and change feeds have to be opened for all of
  them at the same time. The average jitter value starts at 10 msec for the
  first shard, goes up to 1 minute for the 6000th shard, and stays clamped at
  1 minute afterwards. (Note: that is the average; the range is 1 -> 2 *
  average, as this is a uniform random distribution.) Some sample values:

  * 100 shards - 1 second
  * 1000 shards - 10 seconds
  * 6000 shards and higher - 1 minute

  Jira: COUCHDB-3389
* Merge pull request #503 from cloudant/couchdb-3324-fix-badarg (Nick Vatamaniuc, 2017-05-03; 1 file changed, -1/+1)
  Fix `badarg` when querying replicator's _scheduler/docs endpoint
* Fix `badarg` when querying replicator's _scheduler/docs endpoint (Nick Vatamaniuc, 2017-05-03; 1 file changed, -1/+1)
  Jira: COUCHDB-3324
* bypass compact.js flaky comparison (Joan Touzet, 2017-05-03; 1 file changed, -1/+1)
* Fix and re-enable many test cases (Joan Touzet, 2017-05-03; 14 files changed, -252/+135)
* Fix error on race condition in mem3 startup (Joan Touzet, 2017-05-02; 1 file changed, -3/+4)
  During mem3 startup, 2 paths attempt to call `couch_server:create/2` on
  `_dbs`:

  ```
  gen_server:init_it/6 ->
    mem3_shards:init/1 ->
    mem3_shards:get_update_seq/0 ->
    couch_server:create/2
  ```

  and

  ```
  mem3_sync:initial_sync/1 ->
    mem3_shards:fold/2 ->
    couch_server:create/2
  ```

  Normally, the first path completes before the second. If the second path
  finishes first, the first path fails because it does not expect a
  `file_exists` response. This patch makes `mem3_util:ensure_exists/1` more
  robust in the face of a race to create `_dbs`.

  Fixes COUCHDB-3402.
  Approved by @davisp and @iilyak
* Bump docs to include scheduling replicator documentation (Nick Vatamaniuc, 2017-05-02; 1 file changed, -1/+1)
  Jira: COUCHDB-3324
* Update rebar with new fauxton tag (michellephung, 2017-04-30; 1 file changed, -1/+1)
* Merge pull request #312 from robertkowalski/build-thanks (Joan Touzet, 2017-04-30; 1 file changed, -0/+68)
  build: pull authors out of subrepos
* build: pull authors out of subrepos (Robert Kowalski, 2015-04-02; 1 file changed, -0/+68)
  - pulls committer names out of subrepos
  - uses local variables instead of globals
  - makes use of our .mailmap file in our main repo
  - composable: just source it and pipe it into your process

  Usage in `couchdb-build-release.sh`:

  ```sh
  sed -e "/^#.*/d" THANKS.in > $RELDIR/THANKS
  source "build-aux/print-committerlist.sh"
  print_comitter_list >> THANKS;
  ```
* | Revert "Fix error on race condition in mem3 startup"Joan Touzet2017-04-301-6/+1
| | | | | | | | This reverts commit 2689507fc0f4a4a3731df34d3634bb1bcd4afbc3.
* bump version.mk (Joan Touzet, 2017-04-30; 1 file changed, -1/+1)
* Fix error on race condition in mem3 startup (Joan Touzet, 2017-04-30; 1 file changed, -1/+6)
  During mem3 startup, 2 paths attempt to call `couch_server:create/2` on
  `_dbs`:

  ```
  gen_server:init_it/6 ->
    mem3_shards:init/1 ->
    mem3_shards:get_update_seq/0 ->
    couch_server:create/2
  ```

  and

  ```
  mem3_sync:initial_sync/1 ->
    mem3_shards:fold/2 ->
    couch_server:create/2
  ```

  Normally, the first path completes before the second. If the second path
  finishes first, the first path fails because it does not expect a
  `file_exists` response. This patch simply retries `mem3_util:ensure_exists/1`
  once if it gets back a `file_exists` response. Any failures past this point
  are not handled.

  Fixes COUCHDB-3402.
* bump for next release (Joan Touzet, 2017-04-29; 2 files changed, -6/+6)
* snap --> couchdb-pkg repo (Joan Touzet, 2017-04-29; 4 files changed, -71/+0)
* Merge pull request #500 from cloudant/couchdb-3324-windows-makefile-fix (Nick Vatamaniuc, 2017-04-29; 1 file changed, -1/+1)
  Disabling replication startup jitter in Windows makefile
* Disabling replication startup jitter in Windows makefile (Nick Vatamaniuc, 2017-04-29; 1 file changed, -1/+1)
  A similar change has already been made to the *nix Makefile. This keeps the
  JavaScript integration test suite from timing out when running in Travis. We
  don't run Windows tests in Travis, but this should speed things up a bit,
  and it's nice to keep both in sync.

  Jira: COUCHDB-3324
* Merge pull request #470 from apache/63012-scheduler (Nick Vatamaniuc, 2017-04-28; 49 files changed, -2644/+8667)
  Scheduling Replicator
* Add `_scheduler/{jobs,docs}` API endpoints [branch: 63012-scheduler] (Nick Vatamaniuc, 2017-04-28; 5 files changed, -14/+517)
  The `_scheduler/jobs` endpoint provides a view of all replications managed
  by the scheduler. This endpoint includes more information on the replication
  than the `_scheduler/docs` endpoint, including the history of state
  transitions of the replication. This part was implemented by Benjamin
  Bastian.

  The `_scheduler/docs` endpoint provides a view of all replicator docs which
  have been seen by the scheduler. This endpoint includes useful information
  such as the state of the replication and the coordinator node.

  The implementation of `_scheduler/docs` closely mimics `_all_docs` behavior:
  similar pagination, HTTP request processing and fabric / rexi setup. The
  algorithm is roughly as follows:

  * The HTTP endpoint:
    - parses query args like it does for any view query
    - parses states to filter by; states are kept in the `extra` query arg

  * A call is made to couch_replicator_fabric. This is equivalent to
    fabric:all_docs. Here the typical fabric / rexi setup happens.

  * The fabric worker is in `couch_replicator_fabric_rpc:docs/3`. This worker
    is similar to fabric_rpc's all_docs handler. However, it is a bit more
    intricate because it handles both replication documents in a terminal
    state and those which are active.
    - Before emitting, it queries the state of the document to see if it is in
      a terminal state. If it is, it filters it and decides whether it should
      be emitted or not.
    - If the state cannot be determined from the document itself, it tries to
      fetch the active state from the local node's doc processor via a
      key-based lookup. If found, it can likewise filter on state and either
      emit or skip the row.
    - If the document cannot be found in the node's local doc processor ETS
      table, the row is emitted with a doc value of `undecided`. This lets the
      coordinator fetch the state by possibly querying other nodes' doc
      processors.

  * The coordinator then starts handling messages. This also mostly mimics
    all_docs. At this point the most interesting thing is handling `undecided`
    docs. If one is found, then `replicator:active_doc/2` is queried. There,
    only the nodes where the document's shards live are queried. This is
    better than a previous implementation where all nodes were queried all the
    time.

  * The final work happens in `couch_replicator_httpd`, where the emitting
    callback is. There, only the doc is emitted (not keys, rows, values).
    Another thing that happens is that the `Total` value is decremented to
    account for the always-present _design doc.

  Because of this, a bunch of stuff was removed, including an extra view which
  was built and managed by the previous implementation. As a bonus, other
  view-related parameters such as skip and limit seem to work out of the box
  and don't have to be implemented ad hoc.

  Also, most importantly, many thanks to Paul Davis for suggesting this
  approach.

  Jira: COUCHDB-3324
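  A quick usage illustration of the two endpoints described above; the
  endpoint paths come from the commit, while the host, port and credentials
  are placeholder assumptions:

  ```sh
  # List scheduler jobs (including state transition history), then the
  # replicator-doc view (admin:pass and localhost:5984 are placeholders).
  curl -s http://admin:pass@localhost:5984/_scheduler/jobs
  curl -s http://admin:pass@localhost:5984/_scheduler/docs
  ```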
* Stitch scheduling replicator together (Nick Vatamaniuc, 2017-04-28; 27 files changed, -2068/+1246)
  Glue together all the scheduling replicator pieces.

  The scheduler is the main component. It can run a large number of
  replication jobs by switching between them, stopping and starting some
  periodically. Jobs which fail are backed off exponentially. Normal
  (non-continuous) jobs will be allowed to run to completion to preserve their
  current semantics.

  Scheduler behavior can be configured by these configuration options in
  `[replicator]` sections (a configuration sketch follows this entry):

  * `max_jobs` : Number of actively running replications. Making this too high
    could cause performance issues. Making it too low could mean replication
    jobs might not have enough time to make progress before getting
    unscheduled again. This parameter can be adjusted at runtime and will take
    effect during the next rescheduling cycle.

  * `interval` : Scheduling interval in milliseconds. During each reschedule
    cycle the scheduler might start or stop up to `max_churn` jobs.

  * `max_churn` : Maximum number of replications to start and stop during
    rescheduling. This parameter, along with `interval`, defines the rate of
    job replacement. During startup, however, a much larger number of jobs
    could be started (up to max_jobs) in a short period of time.

  Replication jobs are added to the scheduler by the document processor or
  from the `couch_replicator:replicate/2` function when called from the
  `_replicate` HTTP endpoint handler.

  The document processor listens for updates via the couch_multidb_changes
  module and then tries to add replication jobs to the scheduler. Sometimes
  translating a document update into a replication job can fail, either
  permanently (if the document is malformed and missing some expected fields,
  for example) or temporarily if it is a filtered replication and the filter
  cannot be fetched. A failed filter fetch will be retried with an exponential
  backoff.

  couch_replicator_clustering is in charge of monitoring cluster membership
  changes. When membership changes, after a configurable quiet period, a
  rescan will be initiated. The rescan will shuffle replication jobs to make
  sure a replication job is running on only one node.

  A new set of stats was added to introspect scheduler and doc processor
  internals.

  The top replication supervisor structure is `rest_for_one`. This means that
  if a child crashes, all children to the "right" of it will be restarted (if
  the supervisor hierarchy is visualized as an upside-down tree). Clustering,
  connection pool and rate limiter are towards the "left" as they are more
  fundamental; if the clustering child crashes, most other components will be
  restarted. The doc processor and multi-db changes children are towards the
  "right"; if they crash, they can be safely restarted without affecting
  already running replications or components like clustering or the connection
  pool.

  Jira: COUCHDB-3324
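  A minimal sketch of how those `[replicator]` options might be set in a
  node's local.ini; the option names come from the commit message above, while
  the file path and the numeric values are illustrative assumptions, not
  documented defaults:

  ```sh
  # Append an illustrative [replicator] section (path and values are
  # placeholders; tune them for your deployment).
  cat >> /opt/couchdb/etc/local.ini <<'EOF'
  [replicator]
  max_jobs = 500
  interval = 60000
  max_churn = 20
  EOF
  ```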
* Implement replication document processor (Nick Vatamaniuc, 2017-04-28; 3 files changed, -0/+1365)
  The document processor listens for `_replicator` db document updates, parses
  those changes, then tries to add replication jobs to the scheduler.

  Listening for changes happens in the `couch_multidb_changes` module. That
  module is generic and is set up to listen to shards with the `_replicator`
  suffix by `couch_replicator_db_changes`. Updates are then passed to the
  document processor's `process_change/2` function.

  Document replication ID calculation, which can involve fetching filter code
  from the source DB, and addition to the scheduler, is done in a separate
  worker process: `couch_replicator_doc_processor_worker`.

  Previously the couch replicator manager did most of this work. There are a
  few improvements over the previous implementation:

  * Invalid (malformed) replication documents are immediately failed and will
    not be continuously retried.

  * Replication manager message queue backups are unfortunately a common issue
    in production. This is because processing document updates is a serial
    (blocking) operation. Most of that blocking code was moved to separate
    worker processes.

  * Failing filter fetches have an exponential backoff.

  * Replication documents don't have to be deleted first and then re-added in
    order to update the replication. On update, the document processor will
    compare new and previous replication-related document fields and update
    the replication job if those changed. Users can freely update unrelated
    (custom) fields in their replication docs.

  * In the case of filtered replications using custom functions, the document
    processor will periodically check if the filter code on the source has
    changed. Filter code contents are factored into the replication ID
    calculation; if the filter code changes, the replication ID will change as
    well.

  Jira: COUCHDB-3324
* Refactor utils into 3 modules (Nick Vatamaniuc, 2017-04-28; 4 files changed, -482/+1198)
  Over the years utils accumulated a lot of functionality. Clean up a bit by
  separating it into specific modules according to semantics:

  - couch_replicator_docs : Handles reading from and writing to replicator
    dbs. It includes updating state fields, parsing options from documents,
    and making sure the replication VDU design document is in sync.

  - couch_replicator_filters : Fetches and manipulates replication filters.

  - couch_replicator_ids : Calculates replication IDs. Handles versioning and
    pretty-formatting of IDs. Filtered replications using user filter
    functions incorporate a filter code hash into the calculation; in that
    case the couch_replicator_filters module is called to fetch the filter
    from the source.

  Jira: COUCHDB-3324
* AIMD based rate limiter implementation (Nick Vatamaniuc, 2017-04-28; 3 files changed, -0/+413)
  AIMD: additive increase / multiplicative decrease feedback control
  algorithm.
  https://en.wikipedia.org/wiki/Additive_increase/multiplicative_decrease

  This is an algorithm which converges on the available channel capacity. No
  participant knows the capacity a priori, and participants don't communicate
  or know about each other (so they don't coordinate to divide the capacity
  among themselves). A variation of this is used in the TCP congestion control
  algorithm. This is proven to converge, while, for example, additive increase
  / additive decrease or multiplicative increase / multiplicative decrease
  won't.

  A few tweaks were applied to the base control logic:

  * The estimated value is an interval (period) instead of a rate. This is for
    convenience, as users will probably want to know how much to sleep. But
    rate is just 1000 / interval, so it is easy to transform.

  * There is a hard max limit for the estimated period. This is mainly a
    practical concern, as connections sleeping too long will time out and / or
    jobs will waste time sleeping and consume scheduler slots while others
    could be running.

  * There is a time decay component used to handle large pauses between
    updates. In the case of a large update interval, assume (optimistically)
    that some successful requests have been made. Intuitively, the more time
    passes, the less accurate the estimated period probably is.

  * The rate of updates applied to the algorithm is limited. This effectively
    acts as a low-pass filter and makes the algorithm handle spikes and short
    bursts of failures better. This is not a novel idea; some alternative TCP
    control algorithms like Westwood+ do something similar.

  * There is a large downward pressure applied to the increasing interval as
    it approaches the max limit. This is done by tweaking the additive factor
    via a step function. In practice this has the effect of making it a bit
    harder for jobs to cross the maximum backoff threshold, as they would be
    killed and potentially lose intermediate work.

  The main API functions are:

    success(Key) -> IntervalInMilliseconds
    failure(Key) -> IntervalInMilliseconds
    interval(Key) -> IntervalInMilliseconds

  Key is any (hashable by phash2) term, typically something like
  {Method, Url}. The result of the function is the current period value. The
  caller would then presumably choose to sleep for that amount of time before
  or after making requests. The current interval can be read with the
  interval(Key) function.

  The implementation uses ETS tables sharded based on the key, and a periodic
  timer cleans up unused items.

  Jira: COUCHDB-3324
* Share connections between replications (Benjamin Bastian, 2017-04-28; 3 files changed, -85/+354)
  This commit adds functionality to share connections between replications.
  This is to solve two problems:

  - Prior to this commit, each replication would create a pool of connections
    and hold onto those connections as long as the replication existed. This
    was wasteful and caused CouchDB to use many unnecessary connections.

  - When the pool was being terminated, the pool would block while the socket
    was closed. This would cause the entire replication scheduler to block.

  By reusing connections, connections are never closed by clients. They are
  only ever relinquished. This operation is always fast.

  This commit adds an intermediary process which tracks which connection
  processes are being used by which client. It monitors clients and
  connections. If a client or connection crashes, the paired client/connection
  will be terminated. A client can gracefully relinquish ownership of a
  connection. If that happens, the connection will be shared with another
  client. If the connection remains idle for too long, it will be closed.

  Jira: COUCHDB-3324
* Implement multi-db shard change monitoring (Nick Vatamaniuc, 2017-04-28; 1 file changed, -0/+860)
  Monitor shards which match a suffix for creation, deletion, and doc updates.

  To use it, implement the `couch_multidb_changes` behavior and call
  `start_link` with a DbSuffix, with an option to skip design docs
  (`skip_ddocs`). Behavior callback functions will be called when shards are
  created, deleted, found and updated.

  Jira: COUCHDB-3324
* Cluster ownership module implementation (Nick Vatamaniuc, 2017-04-28; 1 file changed, -0/+243)
  This module maintains cluster membership information for replication and
  provides functions to check ownership of replication jobs.

  A cluster membership change is registered only after a configurable
  `cluster_quiet_period` interval has passed since the last node addition or
  removal. This is useful in the case of rolling node reboots in a cluster, in
  order to avoid rescanning for membership changes after every node up and
  down event and instead doing only one rescan at the very end.

  Jira: COUCHDB-3324
* Introduce couch_replicator_scheduler (Robert Newson, 2017-04-28; 4 files changed, -0/+2476)
  The scheduling replicator can run a large number of replication jobs by
  scheduling them. It will periodically stop some jobs and start new ones.
  Jobs that fail will be penalized with an exponential backoff.

  Jira: COUCHDB-3324
* Merge pull request #490 from cloudant/revert-Add-sys_dbs-to-LRU (iilyak, 2017-04-28; 4 files changed, -36/+88)
  Revert add sys dbs to lru
* Adjust reverted code to new couch_lru API (ILYA Khlopotov, 2017-04-27; 1 file changed, -8/+10)
| * | Revert "Add sys_dbs to the LRU"ILYA Khlopotov2017-04-254-32/+74
| | | | | | | | | | | | | | | | | | | | | This reverts commit 92c25a98666e59ca497fa5732a81e771ff52e07d. Conflicts: src/couch/src/couch_lru.erl
| * | Revert "fix compiler and dialyzer warnings"ILYA Khlopotov2017-04-251-10/+18
| | | | | | | | | | | | This reverts commit 2d984b59ed6886a179fd500efe9780a0d0ad62e3.
* bump documentation version (Joan Touzet, 2017-04-27; 1 file changed, -1/+1)
* Merge pull request #483 from apache/feat-couchup (Joan Touzet, 2017-04-25; 1 file changed, -0/+480)
  New couchup 1.x -> 2.x database migration tool
* New couchup 1.x -> 2.x database migration tool [branch: feat-couchup] (Joan Touzet, 2017-04-23; 1 file changed, -0/+480)
  This commit adds a new Python-based database migration tool, couchup. It is
  intended to be used at the command line on the server being upgraded, before
  bringing the node (or cluster) into service.

  couchup provides 4 subcommands to assist in the migration process:

  * list - lists all CouchDB 1.x databases
  * replicate - replicates one or more 1.x databases to CouchDB 2.x
  * rebuild - rebuilds one or more CouchDB 2.x views
  * delete - deletes one or more CouchDB 1.x databases

  A typical workflow for a single-node upgrade process would look like:

  ```sh
  $ couchup list
  $ couchup replicate -a
  $ couchup rebuild -a
  $ couchup delete -a
  ```

  A clustered upgrade process would be the same, but must be preceded by
  setting up all the nodes in the cluster first.

  Various optional arguments provide for admin login/password, overriding
  ports, quiet mode and so on. Of special note is that `couchup rebuild`
  supports an optional flag, `-f`, to filter deleted documents during the
  replication process.

  I struggled some with the naming convention. For those in the know, a '1.x
  database' is a node-local database appearing only on port 5986, and a '2.x
  database' is a clustered database appearing on port 5984, and in raw,
  sharded form on port 5986.
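  Building on the workflow above, a hedged example of the `-f` flag mentioned
  in the commit; combining it with `-a` in one invocation is an assumption,
  not something shown in the commit text:

  ```sh
  # Rebuild all 2.x views while filtering deleted documents as described
  # above (flag combination is illustrative).
  couchup rebuild -a -f
  ```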
* Merge pull request #476 from apache/COUCHDB-3376-fix-mem3-shards (Paul J. Davis, 2017-04-25; 1 file changed, -15/+325)
  Couchdb 3376 fix mem3 shards
* Add unit tests for mem3_shards [branch: COUCHDB-3376-fix-mem3-shards] (Nick Vatamaniuc, 2017-04-24; 1 file changed, -8/+211)
  COUCHDB-3376
* Use a temporary process when caching shard maps (Paul J. Davis, 2017-04-21; 1 file changed, -13/+83)
  This change introduces a new shard_writer process into the mem3_shards
  caching approach. It turns out it's not terribly difficult to create
  thundering herd scenarios that back up the mem3_shards mailbox, and if the Q
  value is large this backup can happen quite quickly.

  This changes things so that we use a temporary process to perform the actual
  `ets:insert/2` call, which keeps the shard map out of mem3_shards' message
  queue. A second optimization is that only a single client will attempt to
  send the shard map to begin with, by checking the existence of the writer
  key using `ets:insert_new/2`.

  COUCHDB-3376