delta/couchdb.git - github.com: apache/couchdb.git

	Commit message	Author	Age	Files	Lines
*	Add tests for shard's split changes feed [shard-split-changes-feed-test]	Eric Avdey	2019-02-28	1	-0/+447
\|
*	Implement resharding HTTP API	Nick Vatamaniuc	2019-02-28	6	-0/+1975
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This implements the API as defined in RFC #1920. The handlers live in the `mem3_reshard_httpd` module, and helpers such as validators live in the `mem3_reshard_httpd_util` module. There are also a number of high-level (HTTP and fabric) API tests that check that shard splitting happens properly, that jobs behave as defined in the RFC, and so on. Issue #1920 Co-authored-by: Eric Avdey <eiri@eiri.ca>
*	Update internal replicator to replicate to multiple targets	Nick Vatamaniuc	2019-02-28	2	-172/+298
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Shard splitting will result in uneven shard copies. Previously the internal replicator knew how to replicate from one shard copy to another, but now it needs to know how to replicate from one source to possibly multiple targets. The main idea is to reuse the same logic and "pick" function as `couch_db_split`. However, to avoid the penalty of calling the custom hash function for every document when there is just a single target, there is a special "1 target" case where the hash function is `undefined`. The internal replicator is also used to perform top-off replication and to replicate the shard map dbs to and from the current node (used in the shard map update logic). For that reason there are a few helper mem3_util and mem3_rpc functions. Issue #1920
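The "1 target" shortcut described in this commit message can be illustrated with a minimal Python sketch. This is not CouchDB's Erlang code: the names `pick_target`, `hash_fun` and the `(begin, end)` range keys are hypothetical, `None` stands in for Erlang's `undefined`, and CRC32 is only an assumed stand-in for whatever hash function the database actually uses.

```python
import zlib
from typing import Callable, Dict, Optional, Tuple

Range = Tuple[int, int]  # inclusive (begin, end) of a hash range (assumed layout)


def crc32_hash(doc_id: str) -> int:
    """Stand-in hash function, used only for illustration."""
    return zlib.crc32(doc_id.encode("utf-8"))


def pick_target(doc_id: str,
                targets: Dict[Range, str],
                hash_fun: Optional[Callable[[str], int]]) -> str:
    """Return the target whose range contains the document's hash."""
    if hash_fun is None:
        # Special "1 target" case: skip hashing entirely.
        if len(targets) != 1:
            raise ValueError("hash_fun may only be None with a single target")
        return next(iter(targets.values()))
    key = hash_fun(doc_id)
    for (begin, end), target in targets.items():
        if begin <= key <= end:
            return target
    raise LookupError(f"no target range covers hash {key:#x}")
```

With a single target the caller passes `None` and no hash is ever computed; with several targets every document ID is hashed and matched against the target ranges.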
*	Implement initial shard splitting data copy	Nick Vatamaniuc	2019-02-28	9	-3/+966
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The first step when a new shard splitting job starts is to do a bulk copy of data from the source to the target. Ideally this should happen as fast as possible, as it could potentially churn through billions of documents. This logic is implemented in the `couch_db_split` module in the main `couch` application. To better understand what happens in `couch_db_split`, think of it as a version of `couch_bt_engine_compactor` that lives just above the couch_db_engine (PSE) interface instead of below it. The first thing the initial data copy does is create the targets. Targets are created based on the source parameters. So if the source uses a specific PSE engine, targets will use the same PSE engine. If the source is partitioned, the targets will use the same partitioned hash function as well. An interesting bit with respect to target creation is that targets are not regular couch_db databases but are closer to a couch_file with a couch_db_updater process linked to them. They are linked directly without going through couch_server. This is done in order to avoid the complexity of handling concurrent updates, handling VDU, and interactive vs non-interactive updates, and of making sure the target doesn't compact while copying happens, doesn't update any LRUs, and doesn't emit `db_updated` events. Those things are not needed, and handling them would make this more fragile. Another way to think of the targets during the initial bulk data copy is as "hidden" or "write-only" dbs. Another notable thing is that `couch_db_split` doesn't know anything about shards and only knows about databases. The input is a source, a map of targets and a caller-provided "picker" function which knows how to pick one of the targets for each given document ID. This will work for both regular dbs and partitioned ones. All of that logic lives inside the pick function and is not embedded in `couch_db_split`. One last point is about handling internal replicator _local checkpoint docs. Those documents are transformed when they are copied such that the old source UUID is replaced with the new target's UUID, since each shard will have its own new UUID. That is done to avoid replications rewinding. Besides those points, the rest is rather boring and it's just "open documents from the source, pick the target, copy the documents to one of the targets, read more documents from the source, etc". Issue #1920 Co-authored-by: Paul J. Davis <davisp@apache.org> Co-authored-by: Eric Avdey <eiri@eiri.ca>
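The "open documents, pick the target, copy, read more" loop from this commit message can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the `couch_db_split` implementation: `split_copy`, `batch_size` and the list-based "databases" are hypothetical stand-ins, and the sketch ignores storage engines, checkpoint documents and everything else the real module handles.

```python
from itertools import islice
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Doc = Tuple[str, dict]  # (document ID, body)
Targets = Dict[Hashable, List[Doc]]


def split_copy(source: Iterable[Doc],
               targets: Targets,
               picker: Callable[[str, Targets], Hashable],
               batch_size: int = 2) -> None:
    """Copy every document from source into exactly one target.

    The function knows nothing about shards; all routing is delegated
    to the caller-provided picker.
    """
    it = iter(source)
    while True:
        batch = list(islice(it, batch_size))  # read more docs from the source
        if not batch:
            break
        for doc_id, body in batch:
            key = picker(doc_id, targets)        # pick the target
            targets[key].append((doc_id, body))  # copy the document
```

Because the routing decision is entirely inside `picker`, the same loop would serve regular and partitioned databases alike; only the picker passed in by the caller changes.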
*	Shard splitting job implementation	Nick Vatamaniuc	2019-02-28	5	-0/+1465
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the implementation of the shard splitting job. The `mem3_reshard` manager spawns `mem3_reshard_job` instances via the `mem3_reshard_job_sup` supervisor. Each job is a gen_server process that starts in `mem3_reshard_job:init/1` with a `#job{}` record instance as the argument. Then the job goes through recovery, so it can handle resuming in cases where the job was interrupted previously and was initialized from a checkpointed state. Checkpointing happens in the `mem3_reshard` manager with the help of the `mem3_reshard_store` module (introduced in a previous commit). After recovery, processing starts in the `switch_state` function. The states are defined as a sequence of atoms in a list in `mem3_reshard.hrl`. In the `switch_state()` function, the state and history are updated in the `#job{}` record, then the `mem3_reshard` manager is asked to checkpoint the new state. The job process waits for the `mem3_reshard` manager to notify it when checkpointing has finished so it can continue processing the new state. That happens when the `do_state` gen_server cast is received. The `do_state` function has state-matching heads for each state. Usually, if there are long-running tasks to be performed, `do_state` will spawn a few workers and perform all the work in there. In the meantime the main job process will simply wait for all the workers to exit. When that happens, it will call `switch_state` to switch to the new state, checkpoint again and so on. Since there are quite a few steps needed to split a shard, some of the helper functions needed are defined in separate modules such as: * mem3_reshard_index : Index discovery and building. * mem3_reshard_dbdoc : Shard map updates. * couch_db_split : Initial (bulk) data copy (added in a separate commit). * mem3_rep : To perform "top-offs" in between some steps. Issue #1920
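The switch-state, checkpoint, do-state cycle described in this commit message can be sketched as a small Python state machine. Everything here is an assumption for illustration: the state names in `STATES` are invented (the real sequence of atoms lives in `mem3_reshard.hrl`), `Manager.checkpoint` only records the state in memory, and the callback stands in for the `do_state` gen_server cast.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical state sequence; the real atoms are defined in mem3_reshard.hrl.
STATES = ["new", "initial_copy", "topoff", "update_shardmap", "completed"]


@dataclass
class Job:
    state: str = STATES[0]
    history: List[str] = field(default_factory=list)


class Manager:
    """Stand-in for the job manager: records each checkpointed state."""

    def __init__(self) -> None:
        self.checkpoints: List[str] = []

    def checkpoint(self, job: Job, resume: Callable[[], None]) -> None:
        self.checkpoints.append(job.state)
        resume()  # notify the job that it may continue


def switch_state(job: Job, manager: Manager, new_state: str) -> None:
    """Record the old state in the history, then checkpoint the new one."""
    job.history.append(job.state)
    job.state = new_state
    manager.checkpoint(job, lambda: do_state(job, manager))


def do_state(job: Job, manager: Manager) -> None:
    """Run the work for the current state, then move to the next one."""
    if job.state == STATES[-1]:
        return  # terminal state: nothing left to do
    # A real job would spawn workers here and wait for them to exit.
    next_state = STATES[STATES.index(job.state) + 1]
    switch_state(job, manager, next_state)
```

The point of the ordering is that a state is always checkpointed before its work begins, so a job restored from a checkpoint resumes at the start of the state it was in when it was interrupted.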
*	Resharding supervisor and job manager	Nick Vatamaniuc	2019-02-28	8	-4/+1136
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Most of the resharding logic lives in the mem3 application under the `mem3_reshard_sup` supervisor. `mem3_reshard_sup` has three children: 1) `mem3_reshard` : The main resharding job manager. 2) `mem3_reshard_job_sup` : A simple-one-for-one supervisor to keep track of individual resharding jobs. 3) `mem3_reshard_dbdoc` : Helper gen_server used to update the shard map. The `mem3_reshard` gen_server is the central point in the resharding logic. It is a job manager which accepts new jobs, monitors jobs when they run, checkpoints their status as they make progress, and knows how to restore their state when a node reboots. Jobs are represented as instances of the `#job{}` record defined in the `mem3_reshard.hrl` header. There is also a global resharding state represented by a `#state{}` record. The `mem3_reshard` gen_server maintains an ets table of "live" `#job{}` records as its gen_server state, represented by `#state{}`. When jobs are checkpointed or a user updates the global resharding state, `mem3_reshard` will use the `mem3_reshard_store` module to persist those updates to `_local/...` documents in the shards database. The idea is to allow jobs to persist across node or application restarts. After a job is added, if the global state is not `stopped`, the `mem3_reshard` manager will ask `mem3_reshard_job_sup` to spawn a new child. That child will be running in a gen_server defined in the `mem3_reshard_job` module (included in subsequent commits). Each child process will periodically ask the `mem3_reshard` manager to checkpoint when it jumps to a new state. `mem3_reshard` checkpoints, then informs the child to continue its work. Issue: #1920
*	Uneven shard copy handling in mem3 and fabric	Nick Vatamaniuc	2019-02-28	8	-158/+838
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The introduction of shard splitting will eliminate the constraint that all document copies are located in shards with the same range boundaries. That assumption was made by default in mem3 and fabric functions that do shard replacement, worker spawning, unpacking `_changes` update sequences and some others. This commit updates those places to handle the case where document copies might be in different shard ranges. A good place to start from is the `mem3_util:get_ring()` function. This function returns a full non-overlapping ring from a set of possibly overlapping shards. This function is used by almost everything else in this commit: 1) It's used when only a single copy of the data is needed, for example in _all_docs or _changes processing. 2) It's used when checking if progress is possible after some nodes died. `get_ring()` returns `[]` when it cannot find a full ring, which is used to indicate that progress is not possible. 3) During shard replacement. This is perhaps the most complicated case. During replacement, besides just finding a possible covering of the ring from the set of shards, it is also desirable to find one that minimizes the number of workers that have to be replaced. A neat trick used here is to provide `get_ring` with a custom sort function, which prioritizes certain shard copies over others. In case of replacements it prioritizes shards for which workers have already spawned. In the default case `get_ring()` will prioritize longer ranges over shorter ones, so for example, to cover the interval [00-ff] with either [00-7f, 80-ff] or [00-ff] shard ranges, it will pick the single [00-ff] range instead of the [00-7f, 80-ff] pair. Issue #1920 Co-authored-by: Paul J. Davis <davisp@apache.org>
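The ring-covering idea described in this commit message can be illustrated with a small Python sketch. It is not a port of `mem3_util:get_ring()`: the tuple representation of ranges, the `sort_key` parameter and the backtracking search are assumptions chosen to show the described behavior (prefer longer ranges by default, accept a custom ordering, return `[]` when no full ring exists).

```python
from typing import Callable, List, Tuple

Range = Tuple[int, int]  # inclusive (begin, end)


def longest_first(r: Range) -> int:
    """Default preference: longer ranges sort before shorter ones."""
    return -(r[1] - r[0])


def get_ring(ranges: List[Range],
             begin: int = 0x00,
             end: int = 0xFF,
             sort_key: Callable[[Range], object] = longest_first) -> List[Range]:
    """Return non-overlapping ranges that exactly cover [begin, end], or []."""
    candidates = sorted(
        (r for r in ranges if r[0] == begin and begin <= r[1] <= end),
        key=sort_key,
    )
    for r in candidates:
        if r[1] == end:
            return [r]
        # Try to cover the remainder; fall back to the next candidate if stuck.
        rest = get_ring(ranges, r[1] + 1, end, sort_key)
        if rest:
            return [r] + rest
    return []
```

For the replacement case, a caller could pass a key such as `lambda r: (r not in already_spawned, longest_first(r))`, where `already_spawned` is a hypothetical set of ranges that already have workers, so that those ranges are tried first and fewer workers need replacing.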
*	Merge pull request #1951 from apache/fail-make-on-eunit-failure	Russell Branca	2019-02-27	1	-1/+1
\|\ \| \| \| \|	Fail make eunit upon eunit app suite failure
\| *	Fail make eunit upon eunit app suite failure [fail-make-on-eunit-failure]	Russell Branca	2019-02-27	1	-1/+1
\|/
*	Merge pull request #1942 from cloudant/update-smoosh-1.0.1	iilyak	2019-02-26	1	-1/+1
\|\ \| \| \| \|	Update smoosh to 1.0.1
\| *	Update smoosh to 1.0.1	ILYA Khlopotov	2019-02-26	1	-1/+1
\|/
*	Merge pull request #1941 from apache/upgrade-ken-1.0.3	Robert Newson	2019-02-25	1	-1/+1
\|\ \| \| \| \|	upgrade ken to 1.0.3
\| *	upgrade ken to 1.0.3	Tony Sun	2019-02-25	1	-1/+1
\|/
*	Merge pull request #1938 from cloudant/update-folsom	iilyak	2019-02-25	1	-1/+1
\|\ \| \| \| \|	Update folsom to support newer erlang
\| *	Update folsom to support newer erlang	ILYA Khlopotov	2019-02-25	1	-1/+1
\|/
*	fixes to elixir tests (#1939)	garren smith	2019-02-25	2	-6/+11
\|
*	Send correct 400 for missing partition with _find (#1936)	garren smith	2019-02-25	2	-0/+28
\| \| \| \| \| \| \| \|	When sending an incorrect partition query, e.g. /parition/_find, send a 400 with the message "Partition must not start with an underscore". This makes it consistent with all the other partition requests.
*	Merge pull request #1933 from cloudant/fix-compilation-warnings	iilyak	2019-02-22	1	-1/+1
\|\ \| \| \| \|	Remove compilation warnings
\| *	Remove compilation warnings	ILYA Khlopotov	2019-02-22	1	-1/+1
\|/
*	Merge pull request #1932 from apache/update-config-dep	Peng Hui Jiang	2019-02-22	1	-1/+1
\|\ \| \| \| \|	Update config dependency to 2.1.6
\| *	Update config dependency to 2.1.6 [update-config-dep]	jiangph	2019-02-22	1	-1/+1
\|/ \| \| \|	This allows features to be kept across config process restarts.
*	Fix elixir tests on Jenkins (#1931)	garren smith	2019-02-21	14	-86/+141
\| \| \| \| \| \| \| \| \|	fix flaky elixir tests on jenkins * Add retry_until to some flaky tests. * Add skip_on_jenkins tag for tests that won't pass with retry_until but pass on Travis. * set jenkins timeout to 90 minutes
*	Merge pull request #1925 from apache/allow-list-for-purge-docid	Peng Hui Jiang	2019-02-21	2	-3/+74
\|\ \| \| \| \|	Support list for docid when using couch_db:purge_docs/3
\| *	Support list for docid when using couch_db:purge_docs/3	jiangph	2019-02-21	2	-3/+74
\|/ \| \| \| \|	The default type for a docid is binary; this change extends couch_db:purge_docs/3 to also accept a list for the docid.
*	Fix elixir tests and add back to make check (#1918)	garren smith	2019-02-15	9	-53/+76
\| \| \| \| \| \|	* Add back elixir tests to make check * Fix reliability of the flaky elixir tests
*	Merge pull request #1642 from cloudant/91984-set-io_priority-for-couch-index-pids	iilyak	2019-02-14	6	-15/+271
\|\ \| \| \| \|	Set io_priority for couch_index pids
\| *	Set io_priority for couch_index pids	ILYA Khlopotov	2019-02-14	6	-15/+271
\|/
*	Merge pull request #1803 from cloudant/configurable-auth-salt	Jay Doane	2019-02-12	1	-0/+66
\|\ \| \| \| \|	Sync admin passwords at cluster setup finish
\| *	Sync admin password hashes at cluster setup finish	Jay Doane	2019-02-12	1	-0/+66
\|/ \| \| \| \| \|	This ensures that admin password hashes are the same on all nodes when passwords are set directly on each node rather than through the coordinator node.
*	Merge pull request #1910 from apache/import-cloudant-ken	Robert Newson	2019-02-07	2	-0/+3
\|\ \| \| \| \|	Import ken from cloudant
\| *	Import ken [import-cloudant-ken]	Robert Newson	2019-02-07	2	-0/+3
\|/
*	Add check for repeated `partition` definitions	Garren Smith	2019-02-07	2	-1/+17
\| \| \| \| \| \| \| \| \| \|	This is a usability improvement. If someone specifies a `partition` value in the query string that is different from the `partition` value in the URL path, it is not clear which value would be used. This allows specifying it in both places as long as the query string matches the URL path, and throws a 400 Bad Request error otherwise. Co-Authored-By: Garren Smith <garren.smith@gmail.com>
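The rule described in this commit message (accept a repeated `partition` only when both values agree, otherwise answer 400) can be sketched in Python. The `BadRequest` class, the `resolve_partition` name and the error text are hypothetical; this is not the actual CouchDB handler code or its real error message.

```python
from typing import Optional


class BadRequest(Exception):
    """Hypothetical error type carrying an HTTP status code."""
    status = 400


def resolve_partition(path_partition: str,
                      query_partition: Optional[str]) -> str:
    """Return the partition to use, rejecting conflicting definitions."""
    if query_partition is not None and query_partition != path_partition:
        # Illustrative message only; the real wording may differ.
        raise BadRequest("conflicting `partition` values in path and query string")
    return path_partition
```

A request that names the partition only in the path, or in both places with the same value, resolves normally; any mismatch is rejected before the query runs.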
*	fix couchup for python3 (#1905)	Clemens Stolle	2019-02-07	1	-1/+3
\|
*	Merge pull request #1904 from apache/import-smoosh	Robert Newson	2019-02-06	8	-1085/+11
\|\ \| \| \| \|	Import smoosh from Cloudant
\| *	remove elixir tests from 'make check' until they are reliable	Robert Newson	2019-02-06	1	-1/+0
\| \|
\| *	run formatting check before time-consuming tests	Robert Newson	2019-02-06	1	-1/+1
\| \|
\| *	Import smoosh from Cloudant	Robert Newson	2019-02-06	7	-1083/+10
\|/ \| \| \|	Remove couch_compaction_daemon and related tests too.
*	Force mix rebar/hex/deps get on make elixir (#1894)	Joan Touzet	2019-02-05	2	-6/+14
\|
*	Merge pull request #1901 from apache/fix-doc-update-invalid-rev-crash	Eric Avdey	2019-02-05	2	-5/+24
\|\ \| \| \| \|	Fix `badarg` crash on an invalid revision for individual doc update
\| *	Fix badarg crash on invalid rev for individual doc update [fix-doc-update-invalid-rev-crash]	Eric Avdey	2019-02-04	2	-5/+24
\|/
*	Fix from_json_obj_validate crash when provided rev isn't a valid hex	Eric Avdey	2019-02-04	2	-3/+16
\|
*	Make from_json_error_cases tests idiomatic	Eric Avdey	2019-02-04	1	-10/+6
\|
*	Merge pull request #1889 from apache/import-cloudant-ioq	Robert Newson	2019-02-01	4	-20/+15
\|\ \| \| \| \|	Import IOQ from Cloudant
\| *	We don't need to verify that erlang:garbage_collect() works [import-cloudant-ioq]	Robert Newson	2019-02-01	1	-16/+1
\| \|
\| *	increase timeout on test	Robert Newson	2019-02-01	1	-1/+1
\| \|
\| *	Run each apps test in a separate process	Robert Newson	2019-02-01	1	-3/+10
\| \| \| \| \| \| \| \|	This reduces non-causal test failures between apps.
\| *	Import IOQ from Cloudant	Robert Newson	2019-02-01	2	-1/+4
\|/
*	add w:3 for lots of docs test (#1893)	garren smith	2019-02-01	1	-1/+1
\|
*	format	Garren Smith	2019-01-31	1	-1/+5
\|
*	Add junit formatter	Garren Smith	2019-01-31	3	-2/+4
\| \| \| \|	Add the junit formatter so that Jenkins can read the elixir tests