| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
| |
Due to a retry loop in erlfdb:transactional, CouchDB might try to send
multiple HTTP responses for a single request, which is clearly an
error.
This PR ensures the second attempt is prevented, closing the TCP
socket instead.
|
|
|
|
|
|
| |
During buggify runs we disable max tx retries by setting it to -1. That's FDB's
documented way of doing it. However, when we re-use that setting to handle
restart_tx logic we don't account for -1, so that's what this PR fixes.
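
A minimal sketch of the kind of check involved, written in Python for
illustration only. The function name and arguments are hypothetical and are
not the actual `fabric2_fdb` code; the only fact taken from the message above
is that a limit of -1 disables the retry limit.

```python
def retries_exhausted(attempts: int, max_retries: int) -> bool:
    """Return True when no further retry should be attempted.

    Assumption for this sketch: a limit of -1 means "no retry limit",
    so it must be special-cased rather than compared numerically.
    """
    if max_retries == -1:
        return False
    return attempts >= max_retries
```

Without the special case, a plain `attempts >= max_retries` comparison would
report the limit as reached on the very first attempt whenever the setting
is -1.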
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add `buggify-elixir-suite` target to run Elixir integration tests
under FoundationDB's client buggify mode [1]. In this mode, the FDB C
client in the `erlfdb` application will periodically throw mostly
retryable errors (`1009`, `1007`, etc). Transaction closures should
properly handle retryable errors without side-effects such as
re-sending response data to the user more than once or attempting to
re-read data from the socket after it was already read once.
In order to avoid false positives, provide a custom .ini settings file
which disables transaction timeouts (`1031` errors). Those are not
retryable by default, as far as the `on_error` callback is
concerned. If we do have timeouts set ( = 60000), it signals the
FoundationDB client that we expect to handle timeouts in buggify mode,
so it starts throwing them [2]. Since we don't handle those everywhere
we get quite a few false positive errors.
The buggify settings are, I believe, the defaults -- 25% chance to activate
an error, and 25% chance of firing the error when the code passes over
that section. In most test runs this should result in a pass, but
sometimes, due to lingering bugs, there will be timeouts, 409
conflicts and other failures so we cannot yet turn this into a
reliable integration test step.
[1] https://apple.github.io/foundationdb/client-testing.html
[2] https://github.com/apple/foundationdb/blob/master/fdbclient/ReadYourWrites.actor.cpp#L1191-L1194
|
|
|
|
|
|
|
| |
Set the `worker_trap_exits = false` setting to ensure our replication worker
pool properly cleans up worker processes.
Ref: https://github.com/apache/couchdb/pull/3208
|
|\
| |
| | |
Handle the case when md5 field is undefined
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Previously the default value for the md5 field was `<<>>`. This value changed to
`undefined` when we switched to using maps instead of Erlang records.
The change broke the `couch_att:to_json/4` function because `base64:encode/1`
cannot handle the atom `undefined`.
The issue happens for inline attachments only, which is why it wasn't discovered
earlier.
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
|\ \
| |/
| |
| |
| | |
cloudant/handle-unknown_eval_api_language-error-message
Add clause for unknown_eval_api_language
|
|/ |
|
|\
| |
| | |
We don't have any couch_stats metrics in fabric
|
| | |
|
|/
|
|
|
|
|
|
|
|
|
|
|
| |
* Drop support for Erlang < 21
The new logger only works in Erlang 21+, so we can start simplifying
the codebase by removing the macro that provides support for retrieving
stack traces the old way.
* Produce structured logging reports
We use the new logger macros to generate log events alongside the
existing couch_log system. This commit does not introduce any handlers
for the new events; that will come later.
|
|\
| |
| | |
fix default values for prometheus templates
|
| |
| |
| |
| |
| |
| |
| | |
couch_prometheus's additional http server should be off by default
and the port should have a default value of 17986 when running
make release
|
|/
|
|
|
|
|
|
|
|
|
|
| |
Previously we installed two of them, one through the
`error_logger:add_report_handler/1` call and another through the
`gen_event:add_sup_handler/3`. The first one was to set up the `logger` handler, but
the second one was still needed so we're notified if the event manager died.
The fix is to do what `error_logger:add_handler/1` does [1] and instead of
calling the `gen_event:add_handler/3`, call `gen_event:add_sup_handler/3`.
[1] https://github.com/erlang/otp/blob/40922798411c2d23ee8a99456f96d6637c62b762/lib/kernel/src/error_logger.erl#L453-L455
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After recent improvements to how retries are handled in `fabric2_fdb`, Elixir
integration tests can often pass when running under "buggify" mode. The chance
of re-sending response data during retries is now much lower, too. However,
there are still some instances of that type of failure when running `_all_dbs`
tests. To trigger it, one would have to run the all_dbs test from basics_test.exs a
few thousand times in a row. The reason for the failure is that retryable
errors might be still thrown during the `LayerPrefix = get_dir(Tx)` call, and
the whole transaction closure would be retried in [2]. When that happens,
user's callback is called twice with `meta` and it sends `[` twice in a row to
the client [3], which is incorrect.
A simple fix is to not call `meta` or `complete` from the transaction context.
That works because we don't pass the transaction object into user's callback
and the user won't be able to run another transaction in the same process
anyway.
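
The idea can be sketched as follows, in Python for illustration only. All
names here (`RetryableError`, `transactional`, `list_dbs`) are hypothetical
stand-ins rather than CouchDB or erlfdb functions, and rows are buffered for
simplicity, which the real code need not do. The point is only that `meta`
and `complete` are invoked outside the closure that may be retried.

```python
class RetryableError(Exception):
    """Hypothetical stand-in for a retryable FDB error."""


def transactional(closure, max_attempts=5):
    """Run `closure`, re-running the whole closure on retryable errors."""
    for _ in range(max_attempts):
        try:
            return closure()
        except RetryableError:
            continue
    raise RuntimeError("retry limit reached")


def list_dbs(callback, fetch_names):
    # Called outside the transaction closure, so it runs exactly once
    # even if the closure below is retried.
    callback("meta")
    names = transactional(fetch_names)
    for name in names:
        callback(("row", name))
    # Also outside the closure, for the same reason.
    callback("complete")
```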
There are already tests which test retriable errors in _all_dbs and _dbs_info,
but they have been updated to only throw retriable errors when rows are emitted,
to match the new behavior.
[0] https://github.com/apache/couchdb/commit/acb43e12fd7fddc6f606246875909f7c7df27324
[1] ```
ERL_ZFLAGS="-erlfdb network_options '[client_buggify_enable, {client_buggify_section_activated_probability, 35}, {client_buggify_section_fired_probability, 35}]'" make elixir tests=test/elixir/test/basics_test.exs:71
```
[2] https://github.com/apache/couchdb/blob/082f8078411aab7d71cc86afca0fe2eff3104b01/src/fabric/src/fabric2_db.erl#L279-L287
[3] https://github.com/apache/couchdb/blob/082f8078411aab7d71cc86afca0fe2eff3104b01/src/chttpd/src/chttpd_misc.erl#L137
|
|
|
|
|
|
|
|
|
|
|
| |
When running integration tests with a "buggified" client [1], sometimes
`fold_range_not_progressing` is triggered since it's possible retriable errors
might be thrown 3 times in a row. Instead of bumping it arbitrarily, since we
already have a retry limit in fabric2_server, start using that.
[1] ```
ERL_ZFLAGS="-erlfdb network_options '[client_buggify_enable, {client_buggify_section_activated_probability, 25}, {client_buggify_section_fired_probability, 25}]'" make elixir tests=test/elixir/test/basics_test.exs
```
|
|
|
|
|
|
|
|
|
|
| |
Previously we deleted the ets handle cache, then bumped the FDB metadata. That
has a race condition if anything else, like fabric2_indexing or other
couch_jobs users, tried to get a jtx() handle before the metadata bump. In that
case, they would have re-inserted a handle with a stale `md_version` and the
test assertions would fail.
The fix is to first bump the md version, then delete the handles.
|
|\
| |
| | |
Create couch lib
|
| | |
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
couch_jobs activity monitor is responsible for checking jobs which have not
been updated often enough by their workers and re-enqueuing them. Previously,
when the number of jobs grew high enough, couch_jobs could either fail to
iterate through all the jobs and time out with a 1007, or try to re-enqueue so
many jobs that the sum of the commit data would end up being larger than
the 10MB FDB limit.
couch_jobs notifier is in charge of notifying subscribers when job state
changes. If the jobs are updated, it would notify if it noticed updates,
otherwise it would notify if jobs switched to a new state (running -> pending,
running -> finished, etc). Previously, if there were too many jobs and/or the
cluster was overloaded, it was possible for the notifier to consistently fail
with timeouts.
To fix both issues introduce batching with the batch size dynamically adjusted
based on load. When consecutive errors occur the batch size will shrink
exponentially down to 1 row per transaction. Then, with each success, the batch
will grow linearly by a fixed amount. This auto-configurable behavior should
provide optimal behavior during overload and during normal operating
conditions.
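
The adjustment rule described above can be sketched like this, in Python for
illustration only. The function name, the halving factor, the growth step and
the upper bound are assumptions of this sketch, not values taken from
couch_jobs.

```python
def next_batch_size(current, succeeded, max_size=1000, grow_by=10):
    """Return the batch size to use for the next transaction.

    On success the batch grows linearly by a fixed amount (capped at
    max_size). On error it shrinks exponentially (halved here), but never
    below 1 row per transaction.
    """
    if succeeded:
        return min(max_size, current + grow_by)
    return max(1, current // 2)
```

With these assumed values, a run of consecutive errors starting from 1000
rows reaches 1 row per transaction after ten failed attempts, while recovery
proceeds 10 rows at a time.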
For tests, since there are already tests which test enqueuing and subscription,
use the same tests but make sure they are run while errors are periodically
generated. That's accomplished with the help of `meck:loop/1` meck return
specification.
|
| |
| |
| |
| |
| |
| | |
This macro can be used to simplify retryable error checks throughout couch_jobs
app. It checks for erlfdb retryable errors (1007, 1009, etc), for the 1031
(`transaction_timed_out`) error and for `{timeout, _}`.
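
A sketch of the check the macro performs, in Python for illustration only.
The `("erlfdb_error", code)` tuple shape and the contents of
`RETRYABLE_CODES` are assumptions of this sketch; only the codes 1007, 1009
and 1031 and the `{timeout, _}` shape are named in the message above.

```python
# Subset of retryable codes named in the text; the real list is longer.
RETRYABLE_CODES = {1007, 1009}
TRANSACTION_TIMED_OUT = 1031


def is_retryable(error):
    """Return True for errors the caller should treat as retryable."""
    if isinstance(error, tuple) and len(error) == 2:
        tag, value = error
        if tag == "timeout":
            # Matches the {timeout, _} shape regardless of the second element.
            return True
        if tag == "erlfdb_error":
            return value in RETRYABLE_CODES or value == TRANSACTION_TIMED_OUT
    return False
```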
|
|/
|
|
|
|
|
|
|
|
|
|
|
| |
Also, add a clause for the variant without a txid part.
Previously, `next_vs/1` could overflow the batch or the txid field. The range
of values for both those is [0..16#FFFF], so the correct check before
incrementing each field should be `< 16#FFFF` instead of `=< 16#FFFF`. Since
we're dealing with bytes and in other places in the file we use 16#FFFF for max
values in the versionstamp fields, switch to hex constants.
The tests were included in the fabric2_changes_fold_tests module as next_vs is
relevant for the _changes feed since_seq calculation.
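
The corrected bounds check can be sketched as follows, in Python for
illustration only. Versionstamps are modeled as plain tuples of
`(id, batch, txid)` or `(id, batch)`; that representation is an assumption of
this sketch, not the Erlang term used in the codebase.

```python
MAX_FIELD = 0xFFFF  # largest value the batch and txid fields can hold


def next_vs(vs):
    """Return the smallest versionstamp greater than `vs`.

    A field is incremented only while it is strictly below MAX_FIELD;
    otherwise it resets to 0 and the carry moves to the next field.
    """
    if len(vs) == 3:
        vs_id, batch, txid = vs
        if txid < MAX_FIELD:
            return (vs_id, batch, txid + 1)
        if batch < MAX_FIELD:
            return (vs_id, batch + 1, 0)
        return (vs_id + 1, 0, 0)
    # Variant without a txid part.
    vs_id, batch = vs
    if batch < MAX_FIELD:
        return (vs_id, batch + 1)
    return (vs_id + 1, 0)
```

With a `<=` comparison instead, a field already at `0xFFFF` would be
incremented to `0x10000`, which no longer fits in the field.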
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We now host our CI containers directly under the Apache Docker Hub
org.
In addition, the newly rebuilt buster-erlang-all image has 4 Erlang
releases in it, corresponding to the latest version available in each
supported major release today:
* 20.3.8.26 (against which our 3.2 binaries will be built)
* 21.3.8.22
* 22.3.4.17
* 23.3.1
This PR changes our PR builds to run against all 4 of these versions.
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces a macro and inserts it everywhere we catch errors
and then generate a stacktrace.
So far the only thing that is a little bit ugly is that in two places,
I had to add a header include dependency on couch_db.erl where those
modules didn’t have any ties to couchdb/* before, alas. I’d be willing
to duplicate the macros in those modules, if we don’t want the include
dependency.
|
|
|
|
|
|
|
|
|
|
| |
* Remove non-existent applications
* Most importantly, start running all the unit tests. This should include 500+ new couch tests.
* Noticed the `local` application was flaky and periodically timing out in CI.
Since it's a transitive dependency of jaeger, let's skip running it for now.
It's a bit in the same category as bcrypt, meck and hyper.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Update the couch_views include paths
* Exclude non-existent applications from setup logic
* Do not run tests against the backdoor port
* Do not run tests checking for non-existent system dbs
* Since "new" couch_att attachment format has changed a bit and encoding is
not `identity` and `md5` is not `<<>>` any longer, some tests had to be
updated to set those explicitly.
|
|
|
|
|
|
|
|
|
|
| |
This is mostly a bulk search and replace to update fabric, couch_views, chttpd,
mango and couch_replicator to use either the new included file or the new
utility functions in couch_views.
The `couch_views_http:transform_row/2` function was brought from the
removed `fabric_view` module. It's used in only one place so it was copied there
directly.
|
|
|
|
|
|
|
|
| |
couch_auth_cache only handles reading server admin credentials from config files and returns the auth design doc (used in chttpd_auth_cache).
Node local `_user` docs logic has been removed. Validation to check
for _conflicts is also not needed as the "docs" proplists created from
the config server admin section don't have conflicts.
|
|
|
|
|
|
|
|
| |
`normalize_dbname/1` is not needed as database names do not have the `.couch`
suffix, and we don't have shard paths any more. For validation, send the
`DbName` to the `fabric2_db_plugin` as both the real DbName and the
"normalized" one. This is mostly to avoid changing the plugin interface for now
and should be eventually updated (in a separate PR).
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* `couch_mrview_util` functions ended up mostly in `couch_views_util`.
* `couch_mrview` validation functions ended up in `couch_views_validate`
module.
* `couch_mrview_http` functions moved to `couch_views_http_util`. The reason
they didn't end up in `couch_views_http` is because a lot of the functions
there have the exact same names as the ones in `couch_views_http_util`.
There is quite a bit of duplication involved but that is left for another
refactoring in the future. The general flow of control goes from chttpd ->
couch_views_http -> couch_views_http_util.
Most of the changes are just copy and paste with the exception of the
`ddoc_to_mrst/2` function. Previously, there were two almost identical copies
-- one in `couch_mrview_util` and another in `couch_views_util`. Both were used
by different parts of the code. The difference was the couch_views one
optionally disabled reduce functions, and replaced their body with the
`disabled` atom, while the one in `couch_mrview` didn't. Trying to unify them
such that only the `couch_views` one is used, resulted in the inability to
write design documents on servers which have custom reduce disabled. That may be
a better behavior, however that should be updated in a separate PR and possibly
a mailing list discussion. So in order to preserve the existing behavior,
couch_eval was updated to not fail in `try_compile` when design documents are
disabled.
Patches to the rest of the code to update the include path and use the new
utility functions will be updated in a separate commit.
|
|
|
|
|
|
| |
Tried to minimize changes and cheated a bit by returning `false` from
`is_text_service_available()`. Also keeping in mind that we'd probably want
this functionality in the future.
|
|
|
|
| |
Unused defines and records are removed
|
|
|
|
| |
It was replaced by fabric2_db_plugin
|
|
|
|
| |
Also remove the with_db/2 function as it's not used any longer
|
| |
|
|
|
|
|
| |
Remove ioq call from `couch_os_process:prompt/2` and
`couch_js_os_process:prompt/2`
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove all the code related to opening and caching of databases.
However, couch_server did a few other things such as:
* Parse and maintain the CouchDB version
* Return the server "uuid" value
* The gen_server was monitoring config updates and hashing admin passwords
when they were updated
It was a 50/50 decision to move that functionality out to other modules
completely or keep it where it is. Since it wasn't just a single thing, and the
overall PR was getting rather large, opted to pare the existing code down to the
minimum, and then later we can do another round of cleanup and find a better
place for those features.
|
|
|
|
|
|
|
|
| |
The main change is to remove `validate_docid/1,2` and use
`fabric2_db:validate_docid/1` instead.
`with_ejson_body` is also not needed as request bodies are parsed
to ejson and fabric2_fdb also deserializes to ejson.
|
|
|
|
|
|
|
| |
Remove shard handling from `couch_flags`.
`couch_db:normalize_dbname/1` call is not necessary as db names are not shards
and do not have the `.couch` extension any more.
|
|
|
|
|
|
| |
Remove logic handling index directories and couch_files. Definitely not a
comprehensive cleanup. We should probably have a separate PR to pick out what
would be useful for `main`.
|