| Commit message | Author | Age | Files | Lines |
|
|
|
|
| |
It's the latest version. Otherwise, unexpected updates there break our CI
chain.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`mem3:dbname/1` with a `<<"shards/...">>` binary is called quite a few times,
as seen when profiling with fprof:
https://gist.github.com/nickva/38760462c1545bf55d98f4898ae1983d
In that case `mem3:dbname` is removing the timestamp suffix. However, because
it uses `filename:rootname/1`, which handles cases pertaining to file system
paths and such, it ends up being a bit more expensive than necessary.
To optimize it, assume the name has a timestamp suffix and try to parse that
out first, verifying that it parses as an integer; if that fails, fall back to
using `filename:rootname/1`.
To lower the chance of the timestamp suffix format changing without us
noticing, move the shard suffix generation function from fabric to mem3, so
the generating and stripping functions sit right next to each other (see the
sketch below).
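A minimal sketch of the idea, assuming the `shards/<range>/<name>.<timestamp>`
layout used in the benchmark below; the clause structure and the
`strip_timestamp/1` helper name are illustrative, not the exact patch:
```
%% Illustrative only: parse a trailing ".<integer>" timestamp suffix by
%% hand, falling back to the slower filename:rootname/1 when the suffix
%% is not an integer.
dbname(<<"shards/", _:8/binary, "-", _:8/binary, "/", Rest/binary>>) ->
    strip_timestamp(Rest);
dbname(DbName) ->
    DbName.

strip_timestamp(Name) ->
    case binary:split(Name, <<".">>, [global]) of
        [_NoDot] ->
            Name;
        Parts ->
            Suffix = lists:last(Parts),
            try binary_to_integer(Suffix) of
                _ ->
                    %% Drop the trailing "." and the timestamp suffix
                    binary:part(Name, 0, byte_size(Name) - byte_size(Suffix) - 1)
            catch
                error:badarg ->
                    filename:rootname(Name)
            end
    end.
```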
A quick speed comparison test shows a 6x speedup or so:
```
shard_speed_test() ->
    Shard = <<"shards/80000000-9fffffff/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.1234567890">>,
    shard_speed_check(Shard, 10000).

shard_speed_check(Shard, N) ->
    T0 = erlang:monotonic_time(),
    do_dbname(Shard, N),
    Dt = erlang:monotonic_time() - T0,
    DtUsec = erlang:convert_time_unit(Dt, native, microsecond),
    DtUsec / N.

do_dbname(_, 0) ->
    ok;
do_dbname(Shard, N) ->
    _ = dbname(Shard),
    do_dbname(Shard, N - 1).
```
On main:
```
(node1@127.0.0.1)1> mem3:shard_speed_test().
1.3099
```
With PR:
```
(node1@127.0.0.1)1> mem3:shard_speed_test().
0.1959
```
|
|
|
|
|
|
|
|
|
|
|
| |
* Increase distribution buffer size from 1MB to 32MB. We've been using this
value at Cloudant for years in all the clusters. RabbitMQ defaults to 128MB,
which is even higher. This might speed up busy clusters with lots of
distribution traffic.
* Add a commented-out example of scheduler flags to use in a Docker or Kube
environment with CFS quotas (advice taken from
https://erlangforums.com/t/vm-tuning-guide/1945/3). See the sketch below.
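A minimal vm.args sketch of both changes; `+zdbbl` takes KB (the Erlang
default is 1024 KB, i.e. 1MB), and the scheduler flag values below are
illustrative assumptions for a container limited to two CPUs:
```
# Distribution buffer busy limit in KB (32768 KB = 32MB)
+zdbbl 32768

# Commented-out example scheduler flags for Docker/Kube with CFS quotas
# (illustrative values; tune to the container's CPU quota):
# +S 2:2
# +sbwt none
# +sbwtdcpu none
# +sbwtdio none
```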
|
|
|
|
|
|
|
|
|
|
|
|
| |
Many of the requests aimed outside the scope of the `_index`
endpoint are not handled gracefully but instead trigger an internal
server error. Enhance the index HTTP REST API handler logic to
return proper answers for invalid queries, and supply it with more
exhaustive integration tests.
Provide documentation for the existing `_index/_bulk_delete`
endpoint, as it was missing, and mention that the `_design` prefix
is not needed when deleting indexes (see the example below).
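An illustrative request against the newly documented `_index/_bulk_delete`
endpoint; the database, credentials, and index names are made up, and the
exact response bodies may differ:
```
# Bulk-delete Mango indexes; note the design doc names in "docids" carry no
# "_design/" prefix (names below are illustrative):
curl -X POST 'http://adm:pass@127.0.0.1:5984/db/_index/_bulk_delete' \
     -H 'Content-Type: application/json' \
     -d '{"docids": ["index-ddoc-1", "index-ddoc-2"]}'
```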
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is mostly a diagnostic tool in the spirit of couch_debug. It creates a
database, fills it with some docs, and then tries to read them. It computes
rough expected rates for doc operations: how many docs per second it could
insert, read, get via _all_docs, etc. When the test is done, it deletes the
database. If it crashes, it also deletes the database. If someone brutally
kills it, the subsequent runs will still find old databases and delete them.
To run a benchmark:
```
fabric_bench:go().
```
Pass parameters as a map:
```
fabric_bench:go(#{doc_size=>large, docs=>25000}).
```
To get available options:
```
fabric_bench:opts().
```
|
|\
| |
| | |
add tests for QueryDeserializer
|
|/
|
|
|
|
| |
This class is currently unreachable, but might be a new way
to specify the query in SearchRequest rather than expanding
the string-based query syntax.
|
|\
| |
| | |
add nouveau and java tools to dev container
|
|/ |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Increase the internal replicator's default batch size and batch count. On
systems with slower (remote) disks, or a slower dist protocol, the internal
replicator can easily fall behind during a high rate of bulk_docs ingestion.
For each batch of 100 it had to sync security properties, make an rpc call to
fetch the remote target's sync checkpoint, open handles, fetch revs diff, etc.
If there are changes to sync, it would also incur the commit (fsync) delay as
well. It makes sense to operate on slightly larger batches to increase
performance. I picked 500 as that's the default for the (external) replicator.
It also helps to keep replicating more than one batch once we've brought the
source and target data into the page cache, so I opted to make it do at most
5 batches per job run.
A survey of other batch sizes already in use by the internal replicator:
* Shard splitting uses a batch of 2000 [1].
* "Seed" system dbs replication uses 1000 [2].
There is some danger in creating too large a rev list for highly conflicted
documents. In that case we already have chunking for max revs [3] to keep
everything under 5000 revs per batch.
To be on the safe side, both values are now configurable and can be adjusted
at runtime (see the sketch below).
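Runtime adjustment could look like the following from a remsh shell; the
section and key names here are hypothetical, as the message doesn't spell
them out:
```
%% Hypothetical section/key names -- check the actual patch for the real ones.
config:set("mem3", "sync_batch_size", "500", false).
config:set("mem3", "sync_batch_count", "5", false).
```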
To validate how this affects performance, I used a simple benchmarking utility:
https://gist.github.com/nickva/9a2a3665702a876ec06d3d720aa19b0a
With defaults:
```
fabric_bench:go().
...
*** DB fabric-bench-1683835787725432000 [{q,4},{n,3}] created. Inserting 100000 docs
* Add 100000 docs small, bs=1000 (Hz): 420
--- mem3_sync backlog: 76992
--- mem3_sync backlog: 82792
--- mem3_sync backlog: 107592
... snipped a few minutes of waiting for backlog to clear ...
--- mem3_sync backlog: 1500
--- mem3_sync backlog: 0
...
ok
```
With this PR:
```
(node1@127.0.0.1)3> fabric_bench:go().
...
*** DB fabric-bench-1683834758071419000 [{q,4},{n,3}] created. Inserting 100000 docs
* Add 100000 docs small, bs=1000 (Hz): 600
--- mem3_sync backlog: 0
...
ok
```
The 100000 doc insertion rate improved from 420 docs/sec to 600 docs/sec,
with no minutes-long sync backlog left over.
[1] https://github.com/apache/couchdb/blob/a854625d74a5b3847b99c6f536187723821d0aae/src/mem3/src/mem3_reshard_job.erl#L52
[2] https://github.com/apache/couchdb/blob/a854625d74a5b3847b99c6f536187723821d0aae/src/mem3/src/mem3_rpc.erl#L181
[3] https://github.com/apache/couchdb/blob/a854625d74a5b3847b99c6f536187723821d0aae/src/mem3/src/mem3_rep.erl#L609
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The previous work that introduced the keys-only covering indexes
did not account for the case where the database might be
partitioned. Since partitioned databases use a different format
for their local indexes and the code does not handle that, it
crashes.
When indexes are defined globally for partitioned databases,
there is no problem because the view row does not include
information about the partition, i.e. it is transparent.
Add the missing support for these scenarios and extend the test
suite to cover them as well. The latter required some changes to
the base classes in the integration test suite, as it apparently
completely misses out on running test cases for partitioned
databases.
|
|\
| |
| | |
Nouveau quality plugins
|
| | |
|
|/ |
|
|\
| |
| | |
fix dreyfus after 'Improve nouveau mango integration'
|
|/
|
|
|
| |
dreyfus/clouseau needs the "string" type when indexing, so
make a separate add_default_field_nouveau function.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add another field to the shard-level Mango execution statistics
to keep track of the count of keys that were examined for the
query. Note that this requires changing the way stats are
stored -- an approach similar to that of the view callback
arguments was chosen, which features a map.
The current version supports both the old and new formats. The
coordinator may request results in the new format by adding
`execution_stats_map` to the arguments of the view callback.
Otherwise the old format is used (without the extra field),
which makes it possible to work with older coordinators. Old
workers will automatically ignore this argument and answer in
the old format (see the sketch below).
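A rough sketch of what the two formats imply; the key names and the tuple
shape below are hypothetical, not taken from the patch:
```
%% Old format (illustrative): a fixed tuple of counters, so adding a field
%% breaks pattern matches on older nodes.
OldStats = {execution_stats, 10, 0, 57},

%% New format (illustrative): a map, so new counters can be added without
%% breaking older coordinators; requested by passing `execution_stats_map`
%% in the view callback arguments.
NewStats = #{docs_examined => 10, keys_examined => 42},
maps:get(keys_examined, NewStats, 0).
```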
|
|
|
|
|
|
| |
We've been cherry-picking from main into 3.3.x and 3.2.x, but there were some
changes made on those branches only, so we're bringing them into main.
|
|\
| |
| | |
Improve nouveau mango integration
|
|/
|
|
|
|
| |
1) Fix sorting on strings and numbers
2) Use 'string' type for string fields
3) Use 'text' type for the default field
|
|\
| |
| | |
switch to Gradle
|
|/ |
|
|\
| |
| | |
upgrade nouveau to lucene 9.6.0
|
|/ |
|
|
|
| |
* Remove extra unused variable
|
|\
| |
| | |
Revert "fix(mango): GET invalid path under `_index` should not cause 500
|
|/
|
|
| |
This reverts commit c1195e43c0b55f99892bb5d6b593de178499b969.
|
|\
| |
| | |
remove Content-MD5 header support
|
|/
|
|
| |
Part of a series of changes to expunge MD5 entirely.
|
| |
|
|
|
|
|
|
|
|
|
| |
Use the couch_httpd one, as it would be odd for couch_httpd to call chttpd.
Also fix the test assertion order: the first argument should be the expected
value, the second one should be the value under test [1] (see the example
below).
[1] https://www.erlang.org/doc/apps/eunit/chapter.html#Assert_macros
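For reference, the convention from [1] in a minimal made-up test:
```
-module(assert_order_example).
-include_lib("eunit/include/eunit.hrl").

%% Expected value first, value under test second, so eunit failure
%% output labels the expected and actual values correctly.
addition_test() ->
    ?assertEqual(4, 2 + 2).
```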
|
|
|
|
| |
Avoid leaking checksumming details into couch_bt_engine.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://cyan4973.github.io/xxHash/
It's a reasonable replacement for MD5:
* It's fast: about the speed of memcpy [1].
* It has a 128 bit variant, so its output is the same size as MD5's.
* It's not cryptographic, so it won't require replacing again in a few years.
* It's a single header file, so it's easy to update and build.
We need only the 128 bit variant, so the NIF only implements that API call at
the moment. To avoid blocking the schedulers on large inputs, the NIF will
switch to using dirty CPU schedulers if the input size is greater than 1MB.
Benchmarking on an 8-year-old laptop, a 1MB block can be hashed in about 40-50
microseconds.
As the first use case, replace MD5 in ETag generation.
[1] The speedup compared to MD5:
```
> Payload = crypto:strong_rand_bytes(1024*1024*100).
<<3,24,111,1,194,207,162,224,207,181,240,217,215,218,218,
205,158,34,105,37,113,104,124,155,61,3,179,30,67,...>>
> timer:tc(fun() -> erlang:md5(Payload) end).
{712241,
<<236,134,158,103,156,236,124,91,106,251,186,60,167,244,
30,53>>}
> timer:tc(fun() -> crypto:hash(md5, Payload) end).
{190945,
<<236,134,158,103,156,236,124,91,106,251,186,60,167,244,
30,53>>}
> timer:tc(fun() -> exxhash:xxhash128(Payload) end).
{9952,
<<24,239,152,98,18,100,83,212,174,157,72,241,149,121,161,
122>>}
```
(First element of the tuple is time in microseconds).
|
| |
|
|
|
|
|
|
| |
Add a new report logging mechanism to log a map of key/value pairs (see the
sketch below).
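A hypothetical usage sketch; the exact function name and report shape are
assumptions, not confirmed by this message:
```
%% Hypothetical API shape along the lines of couch_log:report/2.
couch_log:report("request-stats", #{
    method => 'GET',
    path => "/db/_all_docs",
    status => 200
}).
```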
---------
Co-authored-by: ILYA Khlopotov <iilyak@apache.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* FIX NOUVEAU DOCS - MISSING PARAMETER
The Nouveau docs contain guidance on how to code defensively
for handling docs with missing attributes. All of the code
blocks in this section are missing the first parameter,
which indicates the data type to be indexed by Lucene.
* FIX NOUVEAU DOCS - SWAP query= for q=
In some places in the Nouveau API examples, there was a
`query=` parameter when it should be `q=`.
|
| |
|
|
|
|
|
| |
The original text said that something that takes 16 hex digits can be represented with just 4 digits (in a hypothetical base62 encoding).
I believe that was a typo, since 16 hex digits encode an 8-byte sequence: 8 bytes is 64 bits, and base64 carries 6 bits per digit, so it requires ceil(64/6) = 11 digits (without padding).
|
|\
| |
| | |
fix ken_server:nouveau_updated
|
|/ |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can drop the compat nouveau_maps module. Later we can check the code and
see if we can perhaps replace any maps:map/2 calls with maps:foreach/2 (see
the sketch below).
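The difference, in a minimal made-up example: maps:map/2 builds and returns a
new map, while maps:foreach/2 (OTP 24+) only runs the side effect:
```
M = #{a => 1, b => 2},
%% maps:map/2 allocates a whole new map even when only the side effect matters:
_Ignored = maps:map(fun(K, V) -> io:format("~p: ~p~n", [K, V]), V end, M),
%% maps:foreach/2 just iterates and returns ok:
ok = maps:foreach(fun(K, V) -> io:format("~p: ~p~n", [K, V]) end, M).
```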
In smoosh_persist, there is no need to check for file:delete/2. Later we
should probably make the delete in couch_file do the same thing to avoid
going through the file server.
`sha_256_512_supported/0` has been true for a while, but the check had been
broken; the current crypto API is `crypto:mac/3,4`, so we can re-enable these
tests.
ML discussion: https://lists.apache.org/thread/7nxm16os8dl331034v126kb73jmb7j3x
|
|\
| |
| | |
finish partitioned support for nouveau
|
|/ |
|
|\
| |
| | |
remove afterburner
|
|/
|
|
| |
https://github.com/FasterXML/jackson-modules-base/tree/jackson-modules-base-2.13.3/blackbird#readme
|
|\ |
|
|/ |
|
|\
| |
| | |
enable nouveau in CI
|
| | |
|
| | |
|