path: root/src
* Optimize mem3:dbname/1 function (Nick Vatamaniuc, 2023-05-17, 2 files, -3/+53)
`mem3:dbname/1` with a `<<"shard/...">>` binary is called quite a few times, as seen when profiling with fprof: https://gist.github.com/nickva/38760462c1545bf55d98f4898ae1983d

In that case `mem3:dbname` is removing the timestamp suffix. However, because it uses `filename:rootname/1`, which handles cases pertaining to file system paths and such, it ends up being a bit more expensive than necessary.

To optimize it, assume the name has a timestamp suffix and try to parse that out first, verifying it can be parsed into an integer; if that fails, fall back to using `filename:rootname/1`. To lower the chance of the timestamp suffix changing without us noticing, move the shard suffix generation function from fabric to mem3 so the generating and stripping functions are right next to each other.

A quick speed comparison test shows roughly a 6x speedup:

```
shard_speed_test() ->
    Shard = <<"shards/80000000-9fffffff/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.1234567890">>,
    shard_speed_check(Shard, 10000).

shard_speed_check(Shard, N) ->
    T0 = erlang:monotonic_time(),
    do_dbname(Shard, N),
    Dt = erlang:monotonic_time() - T0,
    DtUsec = erlang:convert_time_unit(Dt, native, microsecond),
    DtUsec / N.

do_dbname(_, 0) ->
    ok;
do_dbname(Shard, N) ->
    _ = dbname(Shard),
    do_dbname(Shard, N - 1).
```

On main:

```
(node1@127.0.0.1)1> mem3:shard_speed_test().
1.3099
```

With PR:

```
(node1@127.0.0.1)1> mem3:shard_speed_test().
0.1959
```
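For reference, a rough sketch of the suffix-stripping idea described above; this is illustrative only, not the actual patch, and the function name and clause layout are made up:

```erlang
%% Illustrative sketch only: strip a trailing ".<timestamp>" suffix when it
%% parses as an integer, otherwise fall back to filename:rootname/1.
strip_ts_suffix(Shard) when is_binary(Shard) ->
    case binary:split(Shard, <<".">>, [global]) of
        [_NoDot] ->
            Shard;
        Parts ->
            Suffix = lists:last(Parts),
            PrefixLen = byte_size(Shard) - byte_size(Suffix) - 1,
            try binary_to_integer(Suffix) of
                _ -> binary:part(Shard, 0, PrefixLen)
            catch
                error:badarg -> filename:rootname(Shard)
            end
    end.
```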
* mango: address missing parts of the `_index` API (Gabor Pali, 2023-05-16, 3 files, -21/+151)
Many of the requests aimed outside the scope of the `_index` endpoint are not handled gracefully but trigger an internal server error. Enhance the index HTTP REST API handler logic to return proper answers for invalid queries and supply it with more exhaustive integration tests.

Provide documentation for the existing `_index/_bulk_delete` endpoint as it was missing, and mention that the `_design` prefix is not needed when deleting indexes.
* Add a simple fabric benchmark (Nick Vatamaniuc, 2023-05-13, 2 files, -0/+450)
This is mostly a diagnostic tool in the spirit of couch_debug. It creates a database, fills it with some docs, and then tries to read them. It computes rough expected rates for doc operations: how many docs per second it could insert, read, get via _all_docs, etc.

When the test is done, it deletes the database. If it crashes, it also deletes the database. If someone brutally kills it, the subsequent runs will still find old databases and delete them.

To run a benchmark:

```
fabric_bench:go().
```

Pass parameters as a map:

```
fabric_bench:go(#{doc_size=>large, docs=>25000}).
```

To get available options:

```
fabric_bench:opts()
```
* Speed up internal replicator (Nick Vatamaniuc, 2023-05-11, 1 file, -4/+11)
Increase internal replicator default batch size and batch count.

On systems with slower (remote) disks, or a slower dist protocol, the internal replicator can easily fall behind during a high rate of bulk_docs ingestion. For each batch of 100 it had to sync security properties, make an rpc call to fetch the remote target sync checkpoint, open handles, fetch revs diff, etc. If there are changes to sync, it would also incur the commit (fsync) delay.

It makes sense to operate on slightly larger batches to increase performance. I picked 500 as that's the default for the (external) replicator. It also helps to keep replicating more than one batch once we've brought the source and target data into the page cache, so I opted to make it do at most 5 batches per job run.

A survey of other batch sizes already in use by the internal replicator:

* Shard splitting uses a batch of 2000 [1].
* "Seed" system dbs replication uses 1000 [2].

There is some danger in creating too large a rev list for highly conflicted documents. In that case we already have chunking for max revs [3] to keep everything under 5000 revs per batch.

To be on the safe side, both values are now configurable and can be adjusted at runtime.

To validate how this affects performance I used a simple benchmarking utility: https://gist.github.com/nickva/9a2a3665702a876ec06d3d720aa19b0a

With defaults:

```
fabric_bench:go().
...
*** DB fabric-bench-1683835787725432000 [{q,4},{n,3}] created. Inserting 100000 docs
 * Add 100000 docs small, bs=1000 (Hz): 420
 --- mem3_sync backlog: 76992
 --- mem3_sync backlog: 82792
 --- mem3_sync backlog: 107592
 ... snipped a few minutes of waiting for the backlog to clear ...
 --- mem3_sync backlog: 1500
 --- mem3_sync backlog: 0
...
ok
```

With this PR:

```
(node1@127.0.0.1)3> fabric_bench:go().
...
*** DB fabric-bench-1683834758071419000 [{q,4},{n,3}] created. Inserting 100000 docs
 * Add 100000 docs small, bs=1000 (Hz): 600
 --- mem3_sync backlog: 0
...
ok
```

The 100000 doc insertion rate improved from 420 docs/sec to 600, with no minutes-long sync backlog left over.

[1] https://github.com/apache/couchdb/blob/a854625d74a5b3847b99c6f536187723821d0aae/src/mem3/src/mem3_reshard_job.erl#L52
[2] https://github.com/apache/couchdb/blob/a854625d74a5b3847b99c6f536187723821d0aae/src/mem3/src/mem3_rpc.erl#L181
[3] https://github.com/apache/couchdb/blob/a854625d74a5b3847b99c6f536187723821d0aae/src/mem3/src/mem3_rep.erl#L609
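The commit notes that both values are runtime-configurable. A hypothetical illustration of what tuning them could look like from a remsh on a running node follows; `config:set/3` is CouchDB's standard runtime-configuration call, but the section and key names below are assumptions, not taken from the patch:

```erlang
%% Hypothetical example only: the config section/key names are assumptions.
config:set("mem3", "replicate_batch_size", "500"),
config:set("mem3", "replicate_batch_count", "5").
```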
* fix(mango): covering indexes for partitioned databases (Gabor Pali, 2023-05-11, 4 files, -28/+125)
The previous work that introduced the keys-only covering indexes did not account for the case where the database might be partitioned. Since partitioned databases use a different format for their local indexes and the code does not handle that, it crashes. When indexes are defined globally for partitioned databases, there is no problem because the view row does not include information about the partition, i.e. it is transparent.

Add the missing support for these scenarios and extend the test suite to cover them as well. The latter required some changes to the base classes in the integration test suite, as it apparently misses out completely on running test cases for partitioned databases.
* fix dreyfus after 'Improve nouveau mango integration' [dreyfus-default-field] (Robert Newson, 2023-05-11, 1 file, -5/+14)
dreyfus/clouseau needs the "string" type when indexing, so make a separate add_default_field_nouveau function.
* mango: extend execution statistics with keys examined (#4569) (PÁLI Gábor János, 2023-05-10, 5 files, -28/+219)
Add another field to the shard-level Mango execution statistics to keep track of the count of keys that were examined for the query. Note that this requires changing the way stats are stored: an approach similar to that of the view callback arguments was chosen, which features a map.

The current version supports both the old and new formats. The coordinator may request the results in the new one by adding `execution_stats_map` to the arguments of the view callback. Otherwise the old format is used (without the extra field), which makes it possible to work with older coordinators. Old workers will automatically ignore this argument and answer in the old format.
* Merge changes from 3.3.x into main (Nick Vatamaniuc, 2023-05-10, 1 file, -0/+1)
We've been cherry-picking from main into 3.3.x and 3.2.x, but there were some changes made only on those branches, so we're bringing them into main.
* Improve nouveau mango integration (Robert Newson, 2023-05-10, 4 files, -2/+36)
1) Fix sorting on strings and numbers
2) use 'string' type for string fields
3) use 'text' type for the default field
* switch to Gradle (Robert Newson, 2023-05-10, 1 file, -1/+0)
* upgrade nouveau to lucene 9.6.0 (Robert Newson, 2023-05-09, 1 file, -1/+1)
* Remove extra unused variable (#4577) (Russell Branca, 2023-05-09, 1 file, -1/+1)
* Revert "fix(mango): GET invalid path under `_index` should not cause 500"Robert Newson2023-05-092-7/+3
This reverts commit c1195e43c0b55f99892bb5d6b593de178499b969.
* remove Content-MD5 header support (Robert Newson, 2023-05-09, 5 files, -110/+18)
Part of a series of changes to expunge MD5 entirely.
* Remove duplicate etag generation function (Nick Vatamaniuc, 2023-05-07, 3 files, -6/+5)
Use the couch_httpd one as it would be odd for couch_httpd to call chttpd.

Also fix the test assertion order: the first argument should be the expected value, the second one should be the test value [1].

[1] https://www.erlang.org/doc/apps/eunit/chapter.html#Assert_macros
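For illustration, a minimal EUnit sketch of the argument order convention being fixed here; the module name and values are made up for the example:

```erlang
-module(assert_order_example).
-include_lib("eunit/include/eunit.hrl").

%% ?assertEqual(Expected, Actual): the expected value goes first, so a
%% failing test reports the expected and actual sides the right way round.
etag_quoting_test() ->
    Expected = <<"\"abc123\"">>,
    Actual = iolist_to_binary(["\"", "abc123", "\""]),
    ?assertEqual(Expected, Actual).
```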
* Encapsulate MD5 file checksums bits in couch_file (Nick Vatamaniuc, 2023-05-05, 2 files, -4/+4)
Avoid leaking checksumming details into couch_bt_engine.
* Import xxHash (Nick Vatamaniuc, 2023-05-05, 12 files, -1/+6730)
https://cyan4973.github.io/xxHash/

It's a reasonable replacement for MD5:

* It's fast: about the speed of memcpy [1]
* It has a 128 bit variant, so its output is the same size as MD5's.
* It's not cryptographic, so it won't require replacing again in a few years.
* It's a single header file, so it's easy to update and build.

We need only the 128 bit variant, so the NIF implements only that API call at the moment. To avoid blocking the schedulers on large inputs, the NIF will switch to using dirty CPU schedulers if the input size is greater than 1MB. Benchmarking on an 8 year-old laptop, a 1MB block can be hashed in about 40-50 microseconds.

As the first use case, replace MD5 in ETag generation.

[1] The speedup compared to MD5:

```
> Payload = crypto:strong_rand_bytes(1024*1024*100).
<<3,24,111,1,194,207,162,224,207,181,240,217,215,218,218,
  205,158,34,105,37,113,104,124,155,61,3,179,30,67,...>>
> timer:tc(fun() -> erlang:md5(Payload) end).
{712241,
 <<236,134,158,103,156,236,124,91,106,251,186,60,167,244,30,53>>}
> timer:tc(fun() -> crypto:hash(md5, Payload) end).
{190945,
 <<236,134,158,103,156,236,124,91,106,251,186,60,167,244,30,53>>}
> timer:tc(fun() -> exxhash:xxhash128(Payload) end).
{9952,
 <<24,239,152,98,18,100,83,212,174,157,72,241,149,121,161,122>>}
```

(The first element of each tuple is the time in microseconds.)
* mention flag and new dependencies (Robert Newson, 2023-05-05, 2 files, -0/+12)
* Add report logging (#4483) (Russell Branca, 2023-05-04, 19 files, -29/+251)
Add a new report logging mechanism to log a map of key/value pairs.

Co-authored-by: ILYA Khlopotov <iilyak@apache.org>
* Nouveau doc fixes (#4572) (Glynn Bird, 2023-05-04, 1 file, -9/+9)
* FIX NOUVEAU DOCS - MISSING PARAMETER

  The Nouveau docs contain guidance on how to code defensively for handling docs with missing attributes. All of the code blocks in this section are missing the first parameter, which indicates the data type to be indexed by Lucene.

* FIX NOUVEAU DOCS - SWAP query= for q=

  In some places in the Nouveau API examples, there was a `query=` parameter when it should be `q=`.
* CVE-2023-2626 details doc update (Nick Vatamaniuc, 2023-05-02, 1 file, -5/+39)
* Clarify encoding length in performance.rst (Ruben Laguna, 2023-05-02, 1 file, -2/+2)
The original text said that something that takes 16 hex digits can be represented with just 4 digits (in a hypothetical base62 encoding). I believe that was a typo, since 16 hex digits encode an 8-byte sequence that will require about (8/3)*4 ≈ 11 digits in base64 (without padding).
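For illustration, a quick sanity check of that arithmetic in an Erlang shell (binary:decode_hex/1 needs OTP 24+):

```erlang
%% 16 hex digits decode to 8 bytes:
1> Bin = binary:decode_hex(<<"0123456789ABCDEF">>).
<<1,35,69,103,137,171,205,239>>
2> byte_size(Bin).
8
%% base64 packs 6 bits per character: 12 characters with padding, 11 without.
3> base64:encode(Bin).
<<"ASNFZ4mrze8=">>
4> byte_size(string:trim(base64:encode(Bin), trailing, "=")).
11
```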
* fix ken_server:nouveau_updated [fix-ken-server-nouveau] (Robert Newson, 2023-05-01, 1 file, -1/+1)
* Make Erlang 24 the minimum version (Nick Vatamaniuc, 2023-04-30, 7 files, -160/+19)
We can drop the compat nouveau_maps module. Later we can check the code and see if we can replace any maps:map/2 calls with maps:foreach/2, perhaps.

In smoosh_persist, there is no need to check for file:delete/2. Later we should probably make the delete in couch_file do the same thing to avoid going through the file server.

`sha_256_512_supported/0` has been true for a while, but the check had been broken; the current crypto API is `crypto:mac/3,4`, so we can re-enable these tests.

ML discussion: https://lists.apache.org/thread/7nxm16os8dl331034v126kb73jmb7j3x
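As a small illustration of the maps:map/2 vs maps:foreach/2 point above (a generic example, not code from the patch):

```erlang
%% maps:foreach/2 (OTP 24+) runs a fun purely for its side effects,
%% avoiding the throwaway result map that maps:map/2 would build:
Stats = #{reads => 10, writes => 3},
maps:foreach(fun(K, V) -> io:format("~p: ~p~n", [K, V]) end, Stats),
%% the maps:map/2 equivalent allocates and then discards a new map:
_ = maps:map(fun(K, V) -> io:format("~p: ~p~n", [K, V]), V end, Stats).
```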
* finish partitioned support for nouveau (Robert Newson, 2023-04-29, 4 files, -6/+32)
* OTP 23 support (Robert Newson, 2023-04-27, 4 files, -4/+125)
* Another flaky couch_js fix (Nick Vatamaniuc, 2023-04-26, 1 file, -3/+1)
After the previous fix, the flakiness moved on to the next line. Remove the extra assertion to avoid it generating flaky tests. The main assertion above already checks that we get a crash.
* Noticed the new internal error couchjs test was flaky (Nick Vatamaniuc, 2023-04-26, 1 file, -0/+6)
It's designed to crash and exit, but depending on exactly when it does so, it may generate different errors. Add a few more clauses; hopefully we don't have to remove it completely or comment it out.
* declare dependency on nouveau (Robert Newson, 2023-04-26, 1 file, -1/+2)
* doc(cve): add 2023-26268 placeholder & backport release notes [3.3.2.post1] (Jan Lehnardt, 2023-04-25, 2 files, -0/+65)
* doc(cve): add 2023-26268 placeholder (Jan Lehnardt, 2023-04-25, 1 file, -0/+27)
* Import nouveau (#4291) (Robert Newson, 2023-04-22, 41 files, -33/+3653)
Nouveau - a new (experimental) full-text indexing feature for Apache CouchDB, using Lucene 9. Requires Java 11 or higher (19 is preferred).
* fix(mango): GET invalid path under `_index` should not cause 500 (Gabor Pali, 2023-04-19, 2 files, -3/+7)
Sending GET requests targeting paths under the `/{db}/_index` endpoint, e.g. `/{db}/_index/something`, causes an internal error. Change the endpoint's behavior to gracefully return HTTP 405 "Method Not Allowed" instead, to be consistent with other endpoints.
* mango: refactor (Gabor Pali, 2023-04-18, 1 file, -20/+23)
* mango: fix definition of index coverage (Gabor Pali, 2023-04-18, 3 files, -5/+107)
Covering indexes must provide all the fields that the selector may contain; otherwise the derived documents would get dropped in the "match and extract" phase even if they were matching. Extend the integration tests to check this case as well.
* mango: enhance compositionality of `consider_index_coverage/3` (Gabor Pali, 2023-04-18, 2 files, -34/+46)
Ideally, the effect of this function should be applied at a single spot in the code. When building the base options, the covering index information should be left blank to make it consistent with the rest of the parameters.
* mango: mark fields with the `$exists` operator indexable (Gabor Pali, 2023-04-18, 1 file, -0/+94)
This is required to make index selection work better with covering indexes. The `$exists` operator prescribes the presence of the given field, so if an index has the field, it should be considered, because the field's presence in the index implies the operator holds. Without this change that does not happen, although covering indexes can still work if the index is picked manually.
* mango: add integration tests for keys-only covering indexes (Gabor Pali, 2023-04-18, 1 file, -0/+115)
* _find: mention the `covered` attribute in the `_explain` response (Gabor Pali, 2023-04-18, 1 file, -0/+4)
* mango: add eunit tests (Gabor Pali, 2023-04-18, 2 files, -1/+820)
* mango: increase coverage of the `choose_best_index/1` test (Gabor Pali, 2023-04-18, 1 file, -2/+9)
* mango: add type information for better self-documentation (Gabor Pali, 2023-04-18, 3 files, -8/+88)
* mango: introduce support for covering indexes (Gabor Pali, 2023-04-18, 2 files, -27/+77)
As a performance improvement, shorten the gap between Mango queries and the underlying map-reduce views: try to serve requests without pulling documents from the primary data set, i.e. run the query with `include_docs` set to `false` when there is a chance that it can be "covered" by the chosen index. The rows in the results are then built from the information stored there.

Extend the response on the `_explain` endpoint with a `covered` Boolean attribute that shows whether the query would be covered by the index or not.

Remarks:

- This should be a transparent optimization, without any semantic effect on the queries.
- Because the main purpose of indexes is to store keys and the document identifiers, the change will only work in cases when the selected fields overlap with those. The chance of being covered could be increased by adding more non-key fields to the index, but that is not in scope here.
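A minimal sketch of the coverage idea described above, assuming a query can be covered when every field it needs is either an indexed key field or the document id; the names are illustrative and this is not the actual mango code:

```erlang
%% Illustrative only: a query can be answered from the index alone when all
%% of the fields it needs are available in the index keys or the doc id.
is_covered_sketch(NeededFields, IndexKeyFields) ->
    Available = [<<"_id">> | IndexKeyFields],
    lists:all(fun(F) -> lists:member(F, Available) end, NeededFields).

%% Example: an index on [<<"age">>] covers a query selecting only _id and age:
%% is_covered_sketch([<<"_id">>, <<"age">>], [<<"age">>]) =:= true.
```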
* Remove explicit import (Gabor Pali, 2023-04-18, 2 files, -143/+120)
* Remove limit parameter from ken (Nick Vatamaniuc, 2023-04-17, 2 files, -19/+4)
It's not used anymore. In a test where it was used to test config persistence, replace it with `set_delay`.
* Improve couch_proc_manager (Nick Vatamaniuc, 2023-04-15, 15 files, -546/+1062)
The main improvement is speeding up process lookup. This should result in improved latency for concurrent requests which quickly acquire and release couchjs processes. Testing with concurrent vdu and map/reduce calls showed a 1.6x to 6x performance speedup [1].

Previously, couch_proc_manager linearly searched through all the processes and executed a custom callback function for each to match design doc IDs. Instead, use a separate ets table index for idle processes to avoid scanning assigned processes.

Use a db tag in addition to a ddoc id to quickly find idle processes. This could improve performance, but if that's not the case, allow configuring the tagging scheme to use a db prefix only, or disable the scheme altogether.

Use the new `map_get` ets select guard [2] to perform ddoc id lookups during the ets select traversal without a custom matcher callback. In ordered ets tables, use the partially bound key trick [3]. This helps skip scanning processes using a different query language altogether.

Waiting clients used `os:timestamp/0` as a unique client identifier. It turns out `os:timestamp/0` is not guaranteed to be unique and could result in some clients never getting a response. This bug was most likely the reason the "fifo client order" test had to be commented out. Fix the issue by using a newer monotonic timestamp function, and for uniqueness add the client's gen_server return tag at the end. Uncomment the previously commented out test so it can hopefully run again.

When clients tag a previously untagged process, asynchronously replace the untagged process with a new process. This happens in the background and the client doesn't have to wait for it.

When a ddoc-tagged process cannot be found, before giving up, stop the oldest unused ddoc processes to allow spawning new fresh ones. To avoid doing a linear scan here, keep a separate `?IDLE_ACCESS` index with an ordered list of idle ddoc processes sorted by their last usage time.

When processes are returned to the pool, quickly respond to the client with an early return, instead of forcing them to wait until we re-insert the process back into the idle ets table. This should improve client latency.

If the waiting client list gets long enough that a client waits longer than the gen_server get_proc timeout, do not waste time assigning or spawning a new process for that client, since it has already timed out.

When gathering stats, avoid making gen_server calls, at least for the total number of processes spawned metric. Table sizes can be easily computed with `ets:info(Table, size)` from outside the main process.

In addition to performance improvements, clean up the couch_proc_manager API by forcing all the calls to go through properly exported functions instead of doing direct gen_server calls.

Remove `#proc_int{}` and use only `#proc{}`. The cast to a list/tuple between `#proc_int{}` and `#proc{}` was dangerous and it avoided the compiler checking that we're using the proper fields. Adding an extra field to the record resulted in mis-matched fields being assigned.

To simplify the code a bit, keep the per-language count in an ets table. This helps avoid having to thread the old and updated state everywhere. Everything else was mostly kept in ets tables anyway, so we're staying consistent with that general pattern.

Improve test coverage and convert the tests to use the `?TDEF_FE` macro so there is no need for the awkward `?_test(begin ... end)` construct.

[1] https://gist.github.com/nickva/f088accc958f993235e465b9591e5fac
[2] https://www.erlang.org/doc/apps/erts/match_spec.html
[3] https://www.erlang.org/doc/man/ets.html#table-traversal
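To illustrate the "partially bound key" trick mentioned above (a generic example, not the actual couch_proc_manager code): on an ordered_set table, a match spec whose key pattern binds a prefix lets ets:select/2 skip straight to the matching part of the tree instead of scanning every row.

```erlang
%% Generic illustration: keys are {Language, Counter} tuples in an
%% ordered_set, so binding the language prefix narrows the traversal.
Tab = ets:new(procs, [ordered_set]),
true = ets:insert(Tab, [
    {{<<"erlang">>, 1}, pid_a},
    {{<<"javascript">>, 1}, pid_b},
    {{<<"javascript">>, 2}, pid_c}
]),
%% Select only the javascript entries; the key prefix is partially bound.
JsPids = ets:select(Tab, [{{{<<"javascript">>, '_'}, '$1'}, [], ['$1']}]).
%% JsPids =:= [pid_b, pid_c]
```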
* fix (prometheus): do not emit orphaned HELP/TYPE lines (Will Holley, 2023-04-14, 1 file, -0/+2)
In cases where metrics are optional, prevent `# HELP` and `# TYPE` lines from being emitted if there is no corresponding metric series.
* feat (prometheus): add Erlang distribution stats (Will Holley, 2023-04-14, 1 file, -1/+136)
# Why

The _prometheus endpoint was missing the erlang distribution stats returned by the _system endpoint. This is useful when diagnosing networking issues between couchdb nodes.

# How

Adds a new function `couch_prometheus_server:get_distribution_stats/0`. This gathers the distribution stats in a similar fashion to `chttpd_node:get_distribution_stats/0` but formats them in a more prometheus-friendly way.

Naming convention follows prometheus standards, so the type of the value is appended to the metric name and, where counter types are used, a "_total" suffix is added. For example:

```
couchdb_erlang_distribution_recv_oct_bytes_total{node="node2@127.0.0.1"} 30609
couchdb_erlang_distribution_recv_oct_bytes_total{node="node3@127.0.0.1"} 28392
```
* feat (prometheus): couch_db_updater and couch_file queue stats (Will Holley, 2023-04-14, 3 files, -4/+41)
# What

Adds summary metrics for couch_db_updater and couch_file, the same as returned by the `_system` endpoint. Unlike the other message queue stats, these are returned as a Prometheus summary type across the following metrics, using `couch_db_updater` as an example:

* couchdb_erlang_message_queue_couch_db_updater{quantile="0.5"}
* couchdb_erlang_message_queue_couch_db_updater{quantile="0.9"}
* couchdb_erlang_message_queue_couch_db_updater{quantile="0.99"}
* couchdb_erlang_message_queue_couch_db_updater_sum
* couchdb_erlang_message_queue_couch_db_updater_count

The count metric represents the number of processes and the sum is the total size of all message queues for those processes. In addition, min and max message queue sizes are returned, matching the _system endpoint response:

* couchdb_erlang_message_queue_couch_db_updater_min
* couchdb_erlang_message_queue_couch_db_updater_max

# How

This represents a new type of metric in the prometheus endpoint (the existing `summary` types have all been for latency histograms), so a new utility function `pid_to_prom_summary` is added to format the message queue stats into prometheus metric series.

In `chttpd_node` I've extracted the formatting step from the `db_pid_stats` function to allow for re-use between `chttpd_node` and `couch_prometheus_server`, where the result is formatted differently. `chttpd_node` doesn't seem like the best place to put shared code like this, but neither does there seem to be an obvious place to extract it to as an alternative, so I've left it for now.
* feat (prometheus): include aggregated couch/index message queues (Will Holley, 2023-04-14, 2 files, -44/+23)
In #3860 and #3366 we added sharding to `couch_index_server` and `couch_server`. The `_system` endpoint surfaces a "fake" message queue for each of these containing the aggregated queue size across all shards. This commit adds the same for the `_prometheus` endpoint.

Originally I had thought to just filter out the per-shard queue lengths, as we've not found these to be useful in Cloudant, but I'll leave them in for now for consistency with the `_system` endpoint. Arguably, we should filter in both places if there's agreement that the per-shard queue lengths are just noise.