feat: additional prometheus metrics
|
Prometheus assumes that metrics with `counter` types are cumulative.
This isn't the case in CouchDB / Folsom, which allows counters to
be decremented.
This changes the type of metrics where we decrement the counter values
to `gauge`:
- couchdb_open_databases
- couchdb_couchdb_open_os_files
- couchdb_httpd_clients_requesting_changes
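Here is a rough Python illustration of why the type matters. It is not CouchDB code. The `counter_increase` helper is a simplified, hypothetical model of how Prometheus reads a counter: any drop is treated as a reset. Range extrapolation is deliberately left out.

```python
def prom_type(can_decrement: bool) -> str:
    """Pick a Prometheus type for a Folsom-style metric.

    Prometheus expects `counter` values never to go down, so anything that
    can be decremented is exposed as a `gauge` instead.
    """
    return "gauge" if can_decrement else "counter"


def counter_increase(samples: list[float]) -> float:
    """Simplified model of how Prometheus reads a counter's growth.

    Any drop between consecutive samples is treated as a counter reset and
    the new value is counted as fresh growth. Real Prometheus also
    extrapolates to the edges of the query range; that is omitted here.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


# A gauge-like series (e.g. open databases) going 5 -> 3 -> 4 really
# changed by -1, but read as a counter it appears to have grown by 4.
print(counter_increase([5, 3, 4]))
```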
|
Add a gauge metric `membership` to the `_prometheus` endpoint. The metric
has labels:
- `nodes=all_nodes`
- `nodes=cluster_nodes`
matching the fields in the `_membership` endpoint (I think consistency
here is more useful than renaming the labels to e.g. expected/actual).
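The following Python sketch shows how such a gauge could be rendered from a `_membership`-style response, with one sample per `nodes` label value so the two counts sit side by side. Two things are assumptions and are not taken from the implementation: the metric name `couchdb_membership` (the `membership` name plus the `couchdb_` prefix used by the other metrics) and the helper name.

```python
def membership_lines(membership: dict) -> list[str]:
    """Render a membership gauge from a `_membership`-shaped dict."""
    name = "couchdb_membership"  # assumed metric name, see lead-in
    lines = [f"# TYPE {name} gauge"]
    for label in ("all_nodes", "cluster_nodes"):
        count = len(membership.get(label, []))
        lines.append(f'{name}{{nodes="{label}"}} {count}')
    return lines


# Hypothetical node names: three entries in all_nodes, two in cluster_nodes.
example = {
    "all_nodes": ["node1@127.0.0.1", "node2@127.0.0.1", "node3@127.0.0.1"],
    "cluster_nodes": ["node1@127.0.0.1", "node2@127.0.0.1"],
}
print("\n".join(membership_lines(example)))
```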
|
Adds an internal replication backlog metric. In the `_system` endpoint
this is called `internal_replication_jobs`, so I've preserved the name,
though it appears to represent the backlog of changes.
Adding a dependency on mem3 to `couch_prometheus` requires some changes
to the tests and dependency tree:
- `couchdb.app.src` no longer lists a dependency on `couch_prometheus`.
I don't know why this was needed previously - it doesn't appear to be
required.
- `couch_prometheus` now has dependencies on `couch` and `mem3`.
This both ensures that `couch_prometheus` doesn't crash if mem3 isn't
running and also resolves a race condition on startup where the
`_prometheus` endpoint returns incomplete stats.
- `couch_prometheus:system_stats_test/0` is moved to
`couch_prometheus_e2e_tests:t_starts_with_couchdb/0`. It is really
an integration test, since it depends on the `_prometheus` endpoint
being able to collect data for all the metrics, and it tests only
that the metrics names begin with `couchdb_`.
|
The `_prometheus` endpoint currently includes size/min/max metrics
across all message queues. This adds a new metric,
`erlang_message_queue_size{queue_name="<name>"}`, which tracks the
size of individual message queues.
This could replace the previous metrics, since those can be derived from
the new metric by Prometheus, but I've left them in place for
compatibility.
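The aggregates can be recomputed from the per-queue values. This Python sketch assumes the older `size` aggregate is the total across all queues; that assumption is not confirmed by the commit. The queue names in the test are hypothetical.

```python
def aggregate_queues(per_queue: dict[str, int]) -> dict[str, int]:
    """Derive cross-queue aggregates from per-queue message queue sizes.

    Assumes the older aggregate `size` is the sum over all queues, with
    `min`/`max` taken over the same set.
    """
    sizes = list(per_queue.values())
    if not sizes:
        return {"size": 0, "min": 0, "max": 0}
    return {"size": sum(sizes), "min": min(sizes), "max": max(sizes)}
```

Inside Prometheus itself, the equivalent would be the `sum`, `min` and `max` aggregation operators applied to `erlang_message_queue_size`.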
|
SpiderMonkey sometimes throws an `InternalError` when exceeding memory limits,
when normally we'd expect it to crash or exit with a non-zero exit code. Because
we trap exceptions and continue emitting rows, users' views may randomly miss
indexed rows, depending on whether GC had run or not, or on other internal
runtime state which may have been consuming more or less memory up to that
time.
To prevent the view from continuing to process documents and randomly dropping
emitted rows, depending on memory pressure in the JS runtime at the time,
choose to treat `InternalError` as fatal.
After an InternalError is raised we expect the process to exit just like it
would during OOM.
Add a test to assert this happens.
Fix https://github.com/apache/couchdb/issues/4504
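The control flow can be sketched as a Python analogue. The real change lives in the JavaScript query server, not in Python. `InternalError` here is a hypothetical stand-in class, and `map_docs` / `example_map` are illustrative names. Ordinary per-document errors are still trapped, but the internal error propagates.

```python
import sys


class InternalError(Exception):
    """Hypothetical stand-in for SpiderMonkey's InternalError."""


def map_docs(docs, map_fun):
    """Run `map_fun` over each doc and collect the emitted rows.

    Ordinary per-document errors are logged and the doc is skipped, but an
    InternalError is re-raised so the caller (and the process) stops,
    rather than carrying on with rows silently missing.
    """
    rows = []
    for doc in docs:
        try:
            rows.extend(map_fun(doc))
        except InternalError:
            raise  # fatal, treated like an out-of-memory exit
        except Exception as exc:
            print(f"skipping {doc.get('_id')}: {exc}", file=sys.stderr)
    return rows


def example_map(doc):
    if doc.get("broken"):
        raise ValueError("bad field")  # ordinary, non-fatal error
    if doc.get("huge"):
        raise InternalError("simulated memory limit")  # fatal
    return [(doc["_id"], 1)]
```

If the exception is left uncaught at the top level of a standalone Python process, the interpreter exits with a non-zero status. That mirrors the "exit just like OOM" behavior described above.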
|
add error_info clause for 410 Gone
|
- Unify the style of synopsis lines.
- Mention the `partitioned` parameter where applicable.
- Fix formatting of `warning` in one of the example responses.
- Trade the possibly retired `range` attribute for `mrargs` and
expand the attributes within `opts` in the response of
`_explain`.
|
eunit test to assert ddoc_updated clause doesn't throw
|
Fail index opens in a few different ways and assert async_error is called.
Also crash an index process after it's open to test it doesn't take down any
index servers.
|
We pass in a shard name that doesn't exist, causing couch_util:with_db to throw.
We assert that we get back {ok, St} and don't crash.
|
To log messages of test runs to a single file, we need to add
an absolute path to the file logger.
|
Correct the env variable `ERL_AFLAGS` to suppress `sasl_error_logger` output.
This suppresses messages like:
```
=ERROR REPORT==== 22-Mar-2023::22:47:36.788000 ===
Error in process <0.998.0> with exit value:
{database_does_not_exist,
[{mem3_shards,load_shards_from_db,"_users",
[{file,"src/mem3_shards.erl"},{line,430}]},
{mem3_shards,load_shards_from_disk,1,
[{file,"src/mem3_shards.erl"},{line,405}]},
{mem3_shards,load_shards_from_disk,2,
[{file,"src/mem3_shards.erl"},{line,434}]},
{mem3_shards,for_docid,3,[{file,"src/mem3_shards.erl"},{line,100}]},
{fabric_doc_open,go,3,[{file,"src/fabric_doc_open.erl"},{line,39}]},
{chttpd_auth_cache,ensure_auth_ddoc_exists,2,
[{file,"src/chttpd_auth_cache.erl"},{line,214}]},
{chttpd_auth_cache,listen_for_changes,1,
[{file,"src/chttpd_auth_cache.erl"},{line,160}]}]}
chttpd_socket_buffer_size_test:51: small_recbuf...[0.006 s] ok
=INFO REPORT==== 22-Mar-2023::22:47:36.897000 ===
application: chttpd
exited: stopped
type: temporary
=INFO REPORT==== 22-Mar-2023::22:47:36.897000 ===
application: fabric
exited: stopped
type: temporary
```
|
Couch index fixes
|
This reverts commit 937ccb6ef84b773882c967a6fa6f4d71df42e4cf.
|
It fixes an aliases leak and prepares for new package updates.
|
catch and log any error from mem3:local_shards
|
The `/_node/_local/_prometheus` endpoint is missing a `TYPE` annotation for
`couchdb_httpd_status_codes`.
In addition, it contains no `HELP` annotations, which
are useful when exploring the metrics, particularly where
metrics do not strictly match those returned by the `_stats` or
`_system` endpoints.
This PR adds the missing `TYPE` annotation and adds `HELP` annotations
to all metrics.
The spec for the prometheus text format is at
https://github.com/prometheus/docs/blob/main/content/docs/instrumenting/exposition_formats.md,
for reference.
It also adds additional spacing between the metrics series, making it
easier for humans to parse.
## couch_prometheus_util:to_prom/3
`couch_prometheus_util:to_prom/3` is replaced by `couch_prometheus_util:to_prom/4`,
which now expects a description alongside the metric name and type.
## couch_prometheus_util:couch_to_prom/3
`couch_prometheus_util:couch_to_prom/3` now extracts the metrics
description from the metric metadata returned by `couch_stats`.
In some cases, where the metrics are transformed e.g. from multiple
metrics to a single metric with a tag, the description is explicitly
specified to match the new metric semantics.
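To make the output shape concrete, here is a Python sketch of a four-argument renderer: name, type, description and data. It emits `# HELP` and `# TYPE` lines before the samples and applies the escaping the exposition format requires. It is a hypothetical analogue, not `couch_prometheus_util` itself. The description string in the test is made up.

```python
def _escape_help(text: str) -> str:
    # HELP text escapes backslash and line feed.
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def _escape_label(value) -> str:
    # Label values additionally escape double quotes.
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_prom(name: str, metric_type: str, desc: str, samples) -> list[str]:
    """Render one metric family: HELP, TYPE, then its samples.

    `samples` is either a single number or a list of (labels_dict, value)
    pairs.
    """
    lines = [f"# HELP {name} {_escape_help(desc)}", f"# TYPE {name} {metric_type}"]
    if isinstance(samples, (int, float)):
        lines.append(f"{name} {samples}")
    else:
        for labels, value in samples:
            label_str = ",".join(f'{k}="{_escape_label(v)}"' for k, v in labels.items())
            lines.append(f"{name}{{{label_str}}} {value}")
    return lines
```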
|
Migrate the settings `with_proper` and `erlang_md5` to `configure.ps1`
to add them to `config.erl`.
|
There are a few minor improvements:
- Add more tests to check sandboxing resets, and that docs are "frozen".
- Remove the extra `\n` and `"` around function body lines. Erlang can do
multi-line binaries just fine. Mark the sections with %erlfmt-ignore so the
formatter doesn't complain.
- Generalize the `should_create_sandbox` test to check for the `not defined` string
only. While experimenting with QuickJS, we noticed that it uses single quotes
around `'Object.foo' is not defined` and SM doesn't. So check for the `not
defined` part only, as it's obvious enough what the check is about.
- Make sure to return test procs back to the pool. Previously, none of the
tests returned the processes back into the pool, and when the tests ended,
they were forcibly killed which resulted in log noise that looked like:
```
erl_child_setup: failed with error 32 on line 265
erl_child_setup: failed with error 32 on line 265
...
```
|
`couch_prometheus_util:counter_metric/1` is intended to add a
`_total` suffix to the name of metrics with type `counter` if
the name does not already end with `_total`.
The implementation was previously incorrect, resulting in metrics
with names like `document_purges_total_total`.
This adds basic eunit tests for the failure / success scenarios and
fixes the implementation.
It is a breaking change for consumers of the `_node/_local/_prometheus`
endpoint, however, since the following metric is renamed:
* `couchdb_document_purges_total_total` -> `couchdb_document_purges_total`
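The intended rule is small enough to state as a Python sketch. It is not the Erlang fix itself. The key property is that a name already ending in `_total` passes through unchanged, so applying the function twice never produces `_total_total`.

```python
def counter_metric(name: str) -> str:
    """Return `name` with a `_total` suffix, unless it already has one."""
    suffix = "_total"
    return name if name.endswith(suffix) else name + suffix


for n in ("document_purges", "document_purges_total"):
    print(n, "->", counter_metric(n))
```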
|
The filter view functions feature reuses the view map function for filtering _changes
feeds. Instead of accumulating emitted KVs, it uses a custom emit() function
which just toggles a flag. However, in order to use this optimisation, the
function is compiled first with the regular emit function, then the function
source is queried with a non-portable toSource() method, and re-compiled again
with a new sandbox where emit is overridden.
Instead of reparsing and re-compiling it, pass the sandbox to the compile
function and compile filter views with that correct sandbox to start with.
Moreover, this helps remove another non-portable function call.
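The idea is to compile once against whichever sandbox the caller supplies. Here is a Python analogue of that idea; CouchDB does this in its JavaScript query server. With a filter-mode `emit` in place from the start, no source round trip is needed. The convention that the source defines a function named `fun` is hypothetical, as are the helpers `compile_fun` and `make_filter`.

```python
def compile_fun(source: str, sandbox: dict):
    """Compile `source` (which must define `fun`) with `sandbox` as its globals."""
    namespace = dict(sandbox)  # copy so separate compiles don't share state
    exec(source, namespace)
    return namespace["fun"]


def make_filter(source: str):
    """Compile a map function directly in "filter mode".

    The sandbox's emit() just flips a flag instead of accumulating rows;
    a document passes the filter if the map function emitted anything.
    """
    state = {"emitted": False}

    def emit(key, value):
        state["emitted"] = True

    fun = compile_fun(source, {"emit": emit})

    def passes(doc: dict) -> bool:
        state["emitted"] = False  # reset per document
        fun(doc)
        return state["emitted"]

    return passes


EXAMPLE_MAP_SRC = """
def fun(doc):
    if doc.get("type") == "post":
        emit(doc["_id"], None)
"""

passes = make_filter(EXAMPLE_MAP_SRC)
print(passes({"_id": "a", "type": "post"}), passes({"_id": "b", "type": "comment"}))
```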
|
The `_node/_local/_prometheus` endpoint was returning duplicate rows for the
following metrics:
```
couchdb_erlang_memory_bytes
couchdb_erlang_gc_collections_total
couchdb_erlang_gc_words_reclaimed_total
couchdb_erlang_context_switches_total
couchdb_erlang_reductions_total
couchdb_erlang_processes
couchdb_erlang_process_limit
```
Prometheus will gracefully handle the duplication, picking the first
entry only, but it bloats the response and can potentially cause
unexpected results if there's a significant delay capturing the
samples.
The duplication is caused by a duplicate function call to
`get_vm_stats()` in the prometheus endpoint handler. Removing the
duplicate call fixes the problem.
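A test could guard against this kind of regression by scanning the response for repeated series. Here is a minimal Python sketch. It makes two simplifying assumptions: samples carry no timestamps, and label values contain no spaces.

```python
from collections import Counter


def duplicate_series(exposition: str) -> list[str]:
    """Return series keys (name plus labels) that appear more than once.

    Comment lines (# HELP / # TYPE) and blank lines are ignored. The key is
    everything before the last space on a sample line, which assumes no
    trailing timestamps and no spaces inside label values.
    """
    keys = [
        line.rsplit(" ", 1)[0]
        for line in exposition.splitlines()
        if line and not line.startswith("#")
    ]
    return sorted(k for k, n in Counter(keys).items() if n > 1)
```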
|
- Add `adm:pass` to DB's URL to prevent unauthorized error
- Add `new_edits:false` to the request JSON object to generate conflicts
- Code reformatting by Rubyfmt
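With `"new_edits": false`, CouchDB's `_bulk_docs` endpoint stores the supplied `_rev` values as-is instead of generating new ones. Writing two different revisions of the same document therefore leaves sibling leaves, which is a conflict. This Python sketch only builds the request body; the revision strings and the `value` field are made up for illustration.

```python
import json


def conflicting_bulk_docs(doc_id: str, revs: list[str]) -> str:
    """Build a `_bulk_docs` body that writes each rev in `revs` as a sibling."""
    docs = [{"_id": doc_id, "_rev": rev, "value": i} for i, rev in enumerate(revs)]
    return json.dumps({"new_edits": False, "docs": docs})


print(conflicting_bulk_docs("doc1", ["1-abc", "1-def"]))
```

The body would be POSTed to `/<db>/_bulk_docs` on a URL that carries the admin credentials, as described above.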
|
* mango: Remove unused `op_insert`
The `op_insert` elements in the abstract representation of the
translated Lucene queries do not seem to be produced anywhere in
the code. This might have been left over from a while ago, so
retire it now.
* mango: Remove unused directory include
* mango: Equip text index selection with tests, specs, and docs
- Add specifications for the important functions that play some
role in the text index selection. This would help to understand
the implicit contracts around them and the associated data flow.
- Introduce `test_utils:as_selector/1` to make it easier to build
valid Mango selectors for testing. On the top level, it uses
Erlang maps to ensure the structural consistency of the input
(selectors are JSON objects that can be considered maps). Maps
are then validated and normalized by `jiffy` and Mango's internal
normalization rules for selectors for additional correctness,
they eventually become embedded JSON objects. This facilitates
writing better unit tests that are closer to real-world use.
At the same time, it comes with a dependency on these tools and
their misbehavior can cause test failures.
- Add unit tests for the major functions that contribute to the
index selection logic and boost the test coverage of the
`mango_idx_text` and `mango_selector_text` modules. That is
important because running integration tests on a higher level
requires a working Clouseau instance, which may not always be
available. With these unit tests in place, changes in the code
can be tracked easily. Also, the test cases can aid the reader
in getting a better understanding of the assumed behavior.
- Explain the purpose of `mango_idx_text:is_usable/3` as this is
not trivial to grasp at first sight. Thanks @mikerhodes for
providing the input.
* mango: Refactor index selection tests
* mango: Correct text index selection for `$regex`
For the `$regex` operator, text indexes can be overly permissive,
which can cause them to be selected even if they could not
serve the corresponding query. Rework the interpretation of
`$regex` to avoid such problems.
|
The erlfmt executable likes POSIX paths on Windows too.
|
While reading the documentation, I noticed a couple of problems with
the HTML version, which this change addresses:
- Parameter types for JSON objects are not uniform. The type `json`
could ambiguously refer to either objects or arrays and was used
inconsistently.
- The `Content-Type` header for response descriptions is not always
put in a list, though probably that is the right choice because
there can be many.
- Indentation of paragraphs and list items is off sometimes which
causes rendering mistakes.
- Formatting of JSON examples is not uniform by indentation.
- Add missing markup for a couple of names, such as operators,
to make them stand out from the text for better readability.
- Unify style: use colons for the sentences that introduce
examples, end sentences with full stops, lose contractions,
regularize elaboration of object details for index descriptions
and definitions.
- Fix the reference to views in design documents when talking
about indexes.
- Add link for `_find` at the section about the index selection.
|
- Add `python-black-update` for `make help`.
- In the output of `make help`, differentiate between Erlang and
Python source code checks.
- Include the use of `black` in the developer documentation.
- Hide `erlfmt` commands for the respective targets. This makes
the targets consistent with their Windows versions.
|
When using replication, we need to specify a username and password
(`adm:pass` / `admin:password`) for the replication endpoint URL,
otherwise a 401 unauthorized error will be thrown.
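Here is a Python sketch of embedding the credentials into both endpoint URLs of a `_replicate` request body. Percent-encoding is applied so that characters such as `@` or `:` in a password do not break the URL. The helper names are hypothetical, and IPv6 host literals are not handled.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(url: str, user: str, password: str) -> str:
    """Return `url` with percent-encoded basic-auth userinfo in the netloc."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    netloc = f"{quote(user, safe='')}:{quote(password, safe='')}@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def replication_body(source: str, target: str, user: str, password: str) -> dict:
    """Body for POST /_replicate with credentials in both endpoint URLs."""
    return {
        "source": with_credentials(source, user, password),
        "target": with_credentials(target, user, password),
    }
```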
Co-authored-by: jiahuili <Jiahui.Li@ibm.com>
|
This fails more often on MacOS CI workers [1], but it seems to be a generally
flaky test, as the users auth ddoc is not guaranteed to be inserted
synchronously.
[1] https://github.com/apache/couchdb/issues/4397#issue-1551336429
|
Try to fix a flaky test which was noticed on the MacOS CI worker [1]:
```
[2023-02-17T07:40:34.978Z] ets_lru_test:285: -test_limits/2-fun-7- (Expire leaves new entries)...ok
[2023-02-17T07:40:34.978Z] ets_lru_test:292: -test_limits/2-fun-5- (Entry was expired)...*failed*
[2023-02-17T07:40:34.978Z] in function ets_lru_test:'-test_limits/2-fun-5-'/2 (test/ets_lru_test.erl, line 294)
```
[1] https://github.com/apache/couchdb/issues/4397#issuecomment-1434764920
|
Interval was not read from the correct section.
The tests had the correct section.
This should fix the prometheus flaky test [1]. This was apparent on MacOS
a bit more. The interval in the test was still 5 seconds instead of 1,
so it was right at the edge of timing out (5 seconds is the default
eunit timeout).
[1] https://github.com/apache/couchdb/issues/4397#issuecomment-1425027913
|
build.
Issue: #4442
|
This is intended to speed up feature checks.
|
| |
|