Commit message (Author, Age, Files, Lines)
* WIP send await time in response header - dreyfus [dreyfus-await-time] (Robert Newson, 2023-04-04, 3 files, -12/+35)
|
* Merge pull request #4507 from apache/prometheus_metrics (Will Holley, 2023-04-03, 6 files, -27/+82)
|\
| |   feat: additional prometheus metrics
| * fix (prometheus): gauge types for metrics that can be decremented (Will Holley, 2023-04-03, 1 file, -0/+13)
| |
| |   Prometheus assumes that metrics with `counter` types are cumulative. This isn't the case in CouchDB / Folsom, which allows counters to be decremented. This changes the type of metrics where we decrement the counter values to `gauge`:
| |
| |   - couchdb_open_databases
| |   - couchdb_couchdb_open_os_files
| |   - couchdb_httpd_clients_requesting_changes
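The counter/gauge distinction described in this commit can be sketched in a few lines of Python. This is an illustration only, not code from `couch_prometheus`: the helper names `metric_type` and `type_line` are hypothetical, and the set simply repeats the three metric names listed in the commit message.

```python
# Hypothetical helpers, not part of CouchDB. Metrics whose underlying
# counters can be decremented must be advertised as gauges, because
# Prometheus treats `counter` values as cumulative.
DECREMENTABLE = {
    "couchdb_open_databases",
    "couchdb_couchdb_open_os_files",
    "couchdb_httpd_clients_requesting_changes",
}


def metric_type(name: str) -> str:
    """Return the Prometheus type to advertise for a counter-like metric."""
    return "gauge" if name in DECREMENTABLE else "counter"


def type_line(name: str) -> str:
    """Render the `# TYPE` annotation for a metric."""
    return f"# TYPE {name} {metric_type(name)}"
```

With this sketch, `type_line("couchdb_open_databases")` yields a `gauge` annotation, while any metric outside the set keeps the `counter` type.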
| * feat (prometheus): membership metric (Will Holley, 2023-04-03, 1 file, -1/+13)
| |
| |   Add a gauge metric `membership` to the `_prometheus` endpoint. The metric has labels:
| |
| |   - `nodes=all_nodes`
| |   - `nodes=cluster_nodes`
| |
| |   matching the fields in the `_membership` endpoint (I think consistency here is more useful than renaming the labels to e.g. expected/actual).
| * feat (prometheus): internal_replication_jobs metric (Will Holley, 2023-03-31, 4 files, -18/+41)
| |
| |   Adds an internal replication backlog metric. In the `_system` endpoint this is called `internal_replication_jobs`, so I've preserved the name, though it appears to represent the backlog of changes.
| |
| |   Adding a dependency on mem3 to `couch_prometheus` requires some changes to the tests and dependency tree:
| |
| |   - `couchdb.app.src` no longer lists a dependency on `couch_prometheus`. I don't know why this was needed previously - it doesn't appear to be required.
| |   - `couch_prometheus` now has dependencies on `couch` and `mem3`. This both ensures that `couch_prometheus` doesn't crash if mem3 isn't running and also resolves a race condition on startup where the `_prometheus` endpoint returns incomplete stats.
| |   - `couch_prometheus:system_stats_test/0` is moved to `couch_prometheus_e2e_tests:t_starts_with_couchdb/0`. It is really an integration test, since it depends on the `_prometheus` endpoint being able to collect data for all the metrics, and it tests only that the metrics names begin with `couchdb_`.
| * feat (prometheus): metrics for individual message queues (Will Holley, 2023-03-31, 3 files, -9/+16)
|/
|   The `_prometheus` endpoint today includes size/min/max metrics across all message queues. This adds a new metric - `erlang_message_queue_size{queue_name="<name>"}` which tracks the size of individual message queues.
|
|   This could replace the previous metrics since those can be derived from the new metric by prometheus, but I've left them in place for compatibility.
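To make the "can be derived" remark concrete, here is a small Python sketch that renders per-queue samples and then recomputes aggregates from them. It is not CouchDB code: `render_queue_sizes` and `aggregate` are hypothetical names, the queue names in the usage note are made up, and treating the aggregate "size" as the sum over all queues is an assumption.

```python
def render_queue_sizes(queues: dict) -> list:
    """Render one sample line per message queue, labelled by queue name."""
    return [
        f'erlang_message_queue_size{{queue_name="{name}"}} {queues[name]}'
        for name in sorted(queues)
    ]


def aggregate(queues: dict) -> dict:
    """Recompute size/min/max from the per-queue values.

    Assumption: the aggregate "size" is the sum over all queues.
    """
    sizes = list(queues.values())
    return {"size": sum(sizes), "min": min(sizes), "max": max(sizes)}
```

For example, with the made-up input `{"couch_server": 3, "couch_file": 0}`, `aggregate` returns a size of 3, a min of 0 and a max of 3, which is the same information the older aggregate metrics carry.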
* docs(hosts): Remove misleading /etc/hosts info (#4506) (Ronny Berndt, 2023-03-30, 1 file, -4/+0)
|
* Treat javascript internal errors as fatal (Nick Vatamaniuc, 2023-03-29, 3 files, -1/+47)
|
|   Spidermonkey sometimes throws an `InternalError` when exceeding memory limits, when normally we'd expect it to crash or exit with a non-0 exit code. Because we trap exceptions and continue emitting rows, it is possible for users' views to randomly miss indexed rows based on whether GC had run or not, or on other internal runtime state which may have been consuming more or less memory until that time.
|
|   To prevent the view from continuing to process documents and randomly dropping emitted rows, depending on memory pressure in the JS runtime at the time, choose to treat Internal errors as fatal.
|
|   After an InternalError is raised we expect the process to exit just like it would during OOM. Add a test to assert this happens.
|
|   Fix https://github.com/apache/couchdb/issues/4504
* Merge pull request #4503 from apache/410-gone (Robert Newson, 2023-03-29, 1 file, -0/+2)
|\
| |   add error_info clause for 410 Gone
| |
| * add error_info clause for 410 Gone (Robert Newson, 2023-03-29, 1 file, -0/+2)
|/
* docs(_find): catch up with the implementation and further fixes (Gabor Pali, 2023-03-27, 1 file, -15/+34)
|
|   - Unify the style of synopsis lines.
|   - Mention the `partitioned` parameter where applicable.
|   - Fix formatting of `warning` in one of the example responses.
|   - Trade the possibly retired `range` attribute for `mrargs` and expand the attributes within `opts` in the response of `_explain`.
* Merge pull request #4495 from apache/add_db_event_crash_test (Robert Newson, 2023-03-24, 1 file, -0/+233)
|\
| |   eunit test to assert ddoc_updated clause doesn't throw
| * Increase index crash test cover a bit (Nick Vatamaniuc, 2023-03-24, 1 file, -59/+164)
| |
| |   Fail index opens in a few different ways and assert async_error is called. Also crash an index process after it's open to test it doesn't take down any index servers.
| * eunit test to assert ddoc_updated clause doesn't throw (Robert Newson, 2023-03-24, 1 file, -0/+128)
|/
|   We pass in a shard name that doesn't exist, causing couch_util:with_db to throw. We assert that we get back {ok, St} and don't crash.
* Add log directory to eunit setup template (#4493) (Ronny Berndt, 2023-03-23, 3 files, -4/+7)
|
|   To log messages of test-runs to a single file, we need to add an absolute path to the file logger.
* Suppress sasl_error_logger output on Windows (#4492) (Ronny Berndt, 2023-03-23, 1 file, -5/+5)
|
|   Correct env variable `ERL_AFLAGS` to suppress `sasl_error_logger` output.
|
|   Suppressing messages like:
|
|   ```
|   =ERROR REPORT==== 22-Mar-2023::22:47:36.788000 ===
|   Error in process <0.998.0> with exit value:
|   {database_does_not_exist,
|       [{mem3_shards,load_shards_from_db,"_users",
|            [{file,"src/mem3_shards.erl"},{line,430}]},
|        {mem3_shards,load_shards_from_disk,1,
|            [{file,"src/mem3_shards.erl"},{line,405}]},
|        {mem3_shards,load_shards_from_disk,2,
|            [{file,"src/mem3_shards.erl"},{line,434}]},
|        {mem3_shards,for_docid,3,[{file,"src/mem3_shards.erl"},{line,100}]},
|        {fabric_doc_open,go,3,[{file,"src/fabric_doc_open.erl"},{line,39}]},
|        {chttpd_auth_cache,ensure_auth_ddoc_exists,2,
|            [{file,"src/chttpd_auth_cache.erl"},{line,214}]},
|        {chttpd_auth_cache,listen_for_changes,1,
|            [{file,"src/chttpd_auth_cache.erl"},{line,160}]}]}
|
|   chttpd_socket_buffer_size_test:51: small_recbuf...[0.006 s] ok
|   =INFO REPORT==== 22-Mar-2023::22:47:36.897000 ===
|       application: chttpd
|       exited: stopped
|       type: temporary
|   =INFO REPORT==== 22-Mar-2023::22:47:36.897000 ===
|       application: fabric
|       exited: stopped
|       type: temporary
|   ```
* Merge pull request #4491 from apache/couch_index_fixes (Robert Newson, 2023-03-22, 3 files, -63/+119)
|\
| |   Couch index fixes
| * track index pids during open and don't crash if they do (Robert Newson, 2023-03-22, 2 files, -14/+33)
| |
| * don't crash in handle_db_event (Robert Newson, 2023-03-22, 1 file, -38/+68)
| |
| * log the original stack trace if Mod:Func throws (Robert Newson, 2023-03-22, 1 file, -7/+18)
| |
| * Revert "catch and log any error from mem3:local_shards" (Robert Newson, 2023-03-22, 1 file, -5/+1)
|/
|   This reverts commit 937ccb6ef84b773882c967a6fa6f4d71df42e4cf.
* fix(doc): reverse definition of `all_nodes` and `cluster_nodes` to match reality (Jan Lehnardt, 2023-03-22, 1 file, -1/+1)
|
* Bump Erlang 24 and 25 in CI (Nick Vatamaniuc, 2023-03-22, 2 files, -2/+2)
|
|   This fixes the aliases leak and prepares for new package updates.
* Merge pull request #4485 from apache/couch_index_crashes (Robert Newson, 2023-03-21, 1 file, -1/+5)
|\
| |   catch and log any error from mem3:local_shards
| |
| * catch and log any error from mem3:local_shards (Robert Newson, 2023-03-21, 1 file, -1/+5)
|/
* docs(typo): Fix server name duplicate (#4484) (Ronny Berndt, 2023-03-21, 1 file, -1/+1)
|
* feat: add type and descriptions to prometheus output (#4475) (Will Holley, 2023-03-20, 2 files, -51/+135)
|
|   The `/_node/_local/_prometheus` endpoint is missing a `TYPE` annotation for `couchdb_httpd_status_codes`. In addition, it contains no `HELP` annotations, which are useful when exploring the metrics, particularly where metrics do not strictly match those returned by the `_stats` or `_system` endpoints.
|
|   This PR adds the missing `TYPE` annotation and adds `HELP` annotations to all metrics. The spec for the prometheus text format is at https://github.com/prometheus/docs/blob/main/content/docs/instrumenting/exposition_formats.md, for reference.
|
|   It also adds additional spacing between the metrics series, making it easier for humans to parse.
|
|   ## couch_prometheus_util:to_prom/3
|
|   `couch_prometheus_util:to_prom/3` is replaced by `couch_prometheus_util:to_prom/4`, which now expects a description alongside the metric name and type.
|
|   ## couch_prometheus_util:couch_to_prom/3
|
|   `couch_prometheus_util:couch_to_prom/3` now extracts the metrics description from the metric metadata returned by `couch_stats`. In some cases, where the metrics are transformed e.g. from multiple metrics to a single metric with a tag, the description is explicitly specified to match the new metric semantics.
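As a rough illustration of what a description-aware renderer produces, the following Python sketch emits `HELP` and `TYPE` annotations ahead of a sample, following the Prometheus text exposition format. It is not a port of `couch_prometheus_util:to_prom/4`: the function name is borrowed for readability only, and the argument order and the example metric in the usage note are assumptions.

```python
def to_prom(name: str, mtype: str, description: str, value) -> str:
    """Render one unlabelled metric with HELP and TYPE annotations.

    The exposition format requires backslashes and newlines in the
    HELP text to be escaped as \\\\ and \\n respectively.
    """
    escaped = description.replace("\\", "\\\\").replace("\n", "\\n")
    return "\n".join(
        [
            f"# HELP {name} {escaped}",
            f"# TYPE {name} {mtype}",
            f"{name} {value}",
        ]
    )
```

Calling `to_prom("couchdb_open_databases", "gauge", "number of open databases", 5)` produces three lines: the `# HELP` annotation, the `# TYPE` annotation, and the sample itself.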
* Migrate configure settings to Windows (#4479) (Ronny Berndt, 2023-03-17, 1 file, -0/+6)
|
|   Migrate the settings `with_proper` and `erlang_md5` to `configure.ps1` to add it to `config.erl`.
* Improve couch_js_tests (Nick Vatamaniuc, 2023-03-16, 1 file, -60/+111)
|
|   There are a few minor improvements:
|
|   - Add more tests to check sandboxing resets, and that docs are "frozen".
|   - Remove the extra `\n` and `"` around function body lines. Erlang can do multi-line binaries just fine. Mark the sections with %erlfmt-ignore so the formatter doesn't complain.
|   - Generalize `should_create_sandbox` test to check for the `not defined` string only. Experimenting with QuickJS noticed that it uses single quotes around `'Object.foo' is not defined` and SM doesn't. So check for `not defined` part only as it's obvious enough what the check is about.
|   - Make sure to return test procs back to the pool. Previously, none of the tests returned the processes back into the pool, and when the tests ended, they were forcibly killed which resulted in log noise that looked like:
|
|   ```
|   erl_child_setup: failed with error 32 on line 265
|   erl_child_setup: failed with error 32 on line 265
|   ...
|   ```
* fix: prometheus counter metric naming (#4474) (Will Holley, 2023-03-14, 2 files, -73/+77)
|
|   `couch_prometheus_util:counter_metric/1` is intended to add a `_total` suffix to the name of metrics with type `counter` if the name does not already end with `_total`. The implementation was previously incorrect, resulting in metrics with names like `document_purges_total_total`.
|
|   This adds basic eunit tests for the failure / success scenarios and fixes the implementation.
|
|   It is a breaking change for consumers of the `_node/_local/_prometheus` endpoint, however, since the following metric is renamed:
|
|   * `couchdb_document_purges_total_total` -> `couchdb_document_purges_total`
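The intended behaviour of the suffix rule fits in one line. The Python sketch below mirrors the rule as described in the commit message; it is not the Erlang implementation, and only the function name is borrowed from it.

```python
def counter_metric(name: str) -> str:
    """Append `_total` to a counter name unless it already ends with it."""
    return name if name.endswith("_total") else name + "_total"
```

The important property is idempotence: applying `counter_metric` twice gives the same result as applying it once, so `document_purges_total` never turns into `document_purges_total_total`.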
* Fix list ordering and indentation in "Search" docs (#4476) (Ronny Berndt, 2023-03-13, 1 file, -46/+44)
|
* Avoid re-compiling filter view functions (Nick Vatamaniuc, 2023-03-13, 3 files, -8/+6)
|
|   The filter view functions feature re-uses the view map function for filtering _changes feeds. Instead of accumulating emitted KVs, it uses a custom emit() function which just toggles a flag. However, in order to use this optimisation, the function is compiled first with the regular emit function, then the function source is queried with a non-portable toSource() method, and re-compiled again with a new sandbox where emit is overridden.
|
|   Instead of reparsing and re-compiling it, pass the sandbox to the compile function and compile filter views with the correct sandbox to start with. Moreover, this helps remove another non-portable function call.
* fix: remove duplicate couchdb_erlang* from _prometheus (Will Holley, 2023-03-13, 2 files, -2/+17)
|
|   The `_node/_local/_prometheus` endpoint was returning duplicate rows for the following metrics:
|
|   ```
|   couchdb_erlang_memory_bytes
|   couchdb_erlang_gc_collections_total
|   couchdb_erlang_gc_words_reclaimed_total
|   couchdb_erlang_context_switches_total
|   couchdb_erlang_reductions_total
|   couchdb_erlang_processes
|   couchdb_erlang_process_limit
|   ```
|
|   Prometheus will gracefully handle the duplication, picking the first entry only, but it bloats the response and can potentially cause unexpected results if there's a significant delay capturing the samples.
|
|   The duplication is caused by a duplicate function call to `get_vm_stats()` in the prometheus endpoint handler. Removing the duplicate call fixes the problem.
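One way to spot this class of problem from the outside is to scan the endpoint's text output for series that appear more than once. The Python sketch below is a hypothetical checker, not part of CouchDB or its test suite; it assumes samples carry no trailing timestamp, so the value is always the last space-separated field.

```python
from collections import Counter


def duplicate_series(text: str) -> list:
    """Return the series (name plus labels) that occur more than once."""
    seen = Counter()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE annotations.
        if not line or line.startswith("#"):
            continue
        # Assumption: no timestamps, so the value is the last field.
        series = line.rsplit(" ", 1)[0]
        seen[series] += 1
    return sorted(s for s, n in seen.items() if n > 1)
```

Run against a response body captured before this fix, the function would be expected to list the `couchdb_erlang_*` series named above; after the fix it should return an empty list.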
* Prepare for Erlang OTP/26 (#4465) (Ronny Berndt, 2023-03-11, 1 file, -1/+1)
|
* Remove stale links from documentation comments (Gabor Pali, 2023-03-10, 1 file, -6/+1)
|
* Modify conflict ruby example (jiahuili, 2023-03-10, 1 file, -42/+57)
|
|   - Add `adm:pass` to DB's URL to prevent unauthorized error
|   - Add `new_edits:false` to the request JSON object to generate conflicts
|   - Code reformatting by Rubyfmt
* mango: correct text index selection for queries with `$regex` (#4458) (PÁLI Gábor János, 2023-03-10, 6 files, -22/+666)
|
|   * mango: Remove unused `op_insert`
|
|   The `op_insert` elements in the abstract representation of the translated Lucene queries do not seem to be produced anywhere in the code. This might have been left over from a while ago, so retire it now.
|
|   * mango: Remove unused directory include
|
|   * mango: Equip text index selection with tests, specs, and docs
|
|   - Add specifications for the important functions that play some role in the text index selection. This would help to understand the implicit contracts around them and the associated data flow.
|   - Introduce `test_utils:as_selector/1` to make it easier to build valid Mango selectors for testing. On the top level, it uses Erlang maps to ensure the structural consistency of the input (selectors are JSON objects that can be considered maps). Maps are then validated and normalized by `jiffy` and Mango's internal normalization rules for selectors for additional correctness, they eventually become embedded JSON objects. This facilitates writing better unit tests that are closer to the real-world use. At the same time, it comes with a dependency on these tools and their misbehavior can cause test failures.
|   - Add unit tests for the major functions that contribute to the index selection logic and boost the test coverage of the `mango_idx_text` and `mango_selector_text` modules. That is important because running integration tests on a higher level requires a working Clouseau instance, which may not always be available. With these unit tests in place, changes in the code can be tracked easily. Also, the test cases can aid the reader to get a better understanding of the assumed behavior.
|   - Explain the purpose of `mango_idx_text:is_usable/3` as this is not trivial to catch at the first sight. Thanks @mikerhodes for providing the input.
|
|   * mango: Refactor index selection tests
|
|   * mango: Correct text index selection for `$regex`
|
|   For the `$regex` operator, text indexes can be overly permissive, which can cause them to be selected even if they could not serve the corresponding query. Rework the interpretation of `$regex` to avoid such problems.
* Bump snappy to CouchDB-1.0.9 (#4464) (Ronny Berndt, 2023-03-10, 1 file, -1/+1)
|
* Fix erlfmt-format on Windows (#4463) (Ronny Berndt, 2023-03-09, 2 files, -4/+4)
|
|   The erlfmt executable likes POSIX paths on Windows too.
* Tweak formatting and style of `_find` API documentation (#4460) (PÁLI Gábor János, 2023-03-09, 1 file, -89/+97)
|
|   While reading the documentation, noticed a couple of problems with the HTML version, of which this change covers:
|
|   - Parameter types for JSON objects are not uniform. The type `json` could ambiguously refer to either objects or arrays and is used randomly.
|   - The `Content-Type` header for response descriptions is not always put in a list, though probably that is the right choice because there can be many.
|   - Indentation of paragraphs and list items is off sometimes which causes rendering mistakes.
|   - Formatting of JSON examples is not uniform by indentation.
|   - Add some missing markup for names, such as operators, to make them stand out from the text for better readability.
|   - Unify style: use colons for the sentences that introduce examples, end sentences with full stops, lose contractions, regularize elaboration of object details for index descriptions and definitions.
|   - Fix the reference to views in design documents when talking about indexes.
|   - Add link for `_find` at the section about the index selection.
* Improve documentation of source code format checks (Gabor Pali, 2023-03-09, 3 files, -8/+17)
|
|   - Add `python-black-update` for `make help`.
|   - In the output of `make help`, differentiate between Erlang and Python source code checks.
|   - Include the use of `black` in the developer documentation.
|   - Hide `erlfmt` commands for the respective targets. This makes the targets consistent with their Windows versions.
* Documentation: Add `adm:pass` to replication endpoint URL (#4457) (Jiahui Li, 2023-03-06, 4 files, -60/+65)
|
|   When using replication, we need to specify a username and password (`adm:pass` / `admin:password`) for the replication endpoint URL, otherwise a 401 unauthorized error will be thrown.
|
|   Co-authored-by: jiahuili <Jiahui.Li@ibm.com>
* Remove duplicate parts of doc note (#4455) (Ronny Berndt, 2023-03-06, 1 file, -2/+1)
|
* Fix flaky elixir users_db_tests (Nick Vatamaniuc, 2023-03-04, 1 file, -2/+6)
|
|   This fails more often on MacOS CI workers [1] but it seems to be a general flaky test as the users auth ddoc is not guaranteed to be inserted synchronously.
|
|   [1] https://github.com/apache/couchdb/issues/4397#issue-1551336429
* Fix flaky LRU test (Nick Vatamaniuc, 2023-03-04, 1 file, -0/+6)
|
|   Try to fix flaky test which was noticed on the MacOS CI worker [1]:
|
|   ```
|   [2023-02-17T07:40:34.978Z] ets_lru_test:285: -test_limits/2-fun-7- (Expire leaves new entries)...ok
|   [2023-02-17T07:40:34.978Z] ets_lru_test:292: -test_limits/2-fun-5- (Entry was expired)...*failed*
|   [2023-02-17T07:40:34.978Z] in function ets_lru_test:'-test_limits/2-fun-5-'/2 (test/ets_lru_test.erl, line 294)
|   ```
|
|   [1] https://github.com/apache/couchdb/issues/4397#issuecomment-1434764920
* Fix bad prometheus section name (Nick Vatamaniuc, 2023-03-04, 3 files, -1/+5)
|
|   Interval was not read from the correct section. The tests had the correct section.
|
|   This should fix the prometheus flaky test [1]. This was apparent on MacOS a bit more. The interval in the test was still 5 seconds instead of 1, so it was right at the edge of timing out (5 seconds is the default eunit timeout).
|
|   [1] https://github.com/apache/couchdb/issues/4397#issuecomment-1425027913
* Only allow POST request for /{db}/_view_cleanup (#4449) (Ronny Berndt, 2023-03-02, 1 file, -2/+4)
|
* This enables configuring FIPS mode at runtime without the need for a custom build. (Nick Vatamaniuc, 2023-02-27, 6 files, -12/+107)
|
|   Issue: #4442
* Use persistent terms for features (Nick Vatamaniuc, 2023-02-27, 1 file, -15/+15)
|
|   This is intended to speed up feature checks.
* Update last_check in file logger record (Rami Alia, 2023-02-26, 1 file, -2/+5)
|