| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
| |
Nouveau - a new (experimental) full-text indexing feature for Apache CouchDB, using Lucene 9. Requires Java 11 or higher (19 is preferred).
|
|
|
|
|
|
|
| |
Sending GET requests targeting paths under the `/{db}/_index`
endpoint, e.g. `/{db}/_index/something`, causes an internal error.
Change the endpoint's behavior to gracefully return HTTP 405
"Method Not Allowed" instead to be consistent with others.
|
| |
|
|
|
|
|
|
|
| |
Covering indexes must provide all the fields that the selector
may contain, otherwise the derived documents would get dropped in
the "match and extract" phase even if they matched. Extend
the integration tests to check this case as well.
|
|
|
|
|
|
|
| |
Ideally, the effect of this function should be applied at a single
spot of the code. When building the base options, covering index
information should be left blank to make it consistent with the
rest of the parameters.
|
|
|
|
|
|
|
|
|
| |
This is required to make index selection work better with covering
indexes. The `$exists` operator prescribes the presence of the
given field, so if an index has the field, the index shall be
considered because the condition is implied to be true. Without this
change that does not happen, although covering indexes can still
work if the index is picked manually.
|
| |
|
| |
|
| |
|
| |
| |
As a performance improvement, shorten the gap between Mango
queries and the underlying map-reduce views: try to serve
requests without pulling documents from the primary data set, i.e.
run the query with `include_docs` set to `false` when there is a
chance that it can be "covered" by the chosen index. The rows in
the results are then built from the information stored there.
Extend the response on the `_explain` endpoint to show, in the
`covered` Boolean attribute, whether the query would be covered
by the index or not.
Remarks:
- This should be a transparent optimization, without any semantic
effect on the queries.
- Because the main purpose of indexes is to store keys and the
document identifiers, the change will only work in cases when
the selected fields overlap with those. The chance of being
covered could be increased by adding more non-key fields to the
index, but that is not in scope here.
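The coverage condition described in the remarks can be illustrated with a minimal Python sketch. It is not the Erlang implementation: the helper name `is_covered` and the flat field-name lists are assumptions made for illustration only. A query counts as covered here when every field it needs is among the index keys or is the document identifier.

```python
def is_covered(index_keys, requested_fields, selector_fields):
    """Return True if the index alone could answer the query.

    Hypothetical helper: all three arguments are flat lists of field names.
    """
    # Assumption: besides its keys, an index row carries only the document id.
    available = set(index_keys) | {"_id"}
    # An empty `fields` list means "return whole documents", which an
    # index cannot provide on its own.
    if not requested_fields:
        return False
    needed = set(requested_fields) | set(selector_fields)
    return needed <= available


print(is_covered(["name", "age"], ["name"], ["age"]))           # True
print(is_covered(["name", "age"], ["name", "email"], ["age"]))  # False
```

The second call is not covered because `email` is stored neither as a key nor as the identifier, so the document itself would have to be read.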
|
| |
| |
* mango: Remove unused `op_insert`
The `op_insert` elements in the abstract representation of the
translated Lucene queries do not seem to be produced anywhere in
the code. They were probably left over from an earlier change, so
retire them.
* mango: Remove unused directory include
* mango: Equip text index selection with tests, specs, and docs
- Add specifications for the important functions that play some
role in the text index selection. This would help to understand
the implicit contracts around them and the associated data flow.
- Introduce `test_utils:as_selector/1` to make it easier to build
valid Mango selectors for testing. On the top level, it uses
Erlang maps to ensure the structural consistency of the input
(selectors are JSON objects that can be considered maps). Maps
are then validated and normalized by `jiffy` and Mango's internal
normalization rules for selectors for additional correctness;
they eventually become embedded JSON objects. This facilitates
writing better unit tests that are closer to real-world use.
At the same time, it comes with a dependency on these tools and
their misbehavior can cause test failures.
- Add unit tests for the major functions that contribute to the
index selection logic and boost the test coverage of the
`mango_idx_text` and `mango_selector_text` modules. That is
important because running integration tests on a higher level
requires a working Clouseau instance, which may not always be
available. With these unit tests in place, changes in the code
can be tracked easily. Also, the test cases can aid the reader
to get a better understanding of the assumed behavior.
- Explain the purpose of `mango_idx_text:is_usable/3` as this is
not trivial to grasp at first sight. Thanks @mikerhodes for
providing the input.
* mango: Refactor index selection tests
* mango: Correct text index selection for `$regex`
For the `$regex` operator, text indexes can be overly permissive,
which can cause them to be selected even if they could not
serve the corresponding query. Rework the interpretation of
`$regex` to avoid such problems.
|
| |
This commit groups a couple of major changes that aim to address
many pain points and make the Mango integration test suites
more accessible.
- The test framework behind the Mango integration test suite
provides a lot of flags that are not currently exposed on
the level of the main `Makefile`. Change this for
greater flexibility.
- Mango's test suite documentation is buried in the source tree,
which is not common for other kinds of tests. To increase its
visibility and unify the style, move the contents of this file
over to the general developer documentation.
- Promote the use of the `mango-test` target instead of setting
up the related machinery manually. The commands recorded in
the original documentation are out of date and only minor
implementation details anyway.
- Retire the explicit control over the activation of Mango
integration tests that require support for text indexes.
Instead learn the availability of this feature from the current
CouchDB instance and run tests based on that. This effectively
makes the activation automated, which could be controlled
implicitly by either hooking up a Clouseau instance or not.
- Running the Mango integration tests does not remove the databases
on their completion, which can inadvertently pollute the local
data store. To avoid this, enforce removal of test databases but
allow it to be disabled on demand.
|
|
|
|
|
|
|
|
|
|
| |
Python 3 uses UTF-8 encoding when reading source files by default,
and UTF-8 itself has become more widely adopted in recent years,
so it makes sense to remove the associated annotations.
At the same time, this helps to unbreak the Unicode key tests where
the Apple logo ('', as a Unicode character) is featured and was
butchered by forcing the ISO-8859-1 encoding on it.
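The breakage can be reproduced in a few lines of Python. The sketch below assumes the Apple logo is the private-use code point U+F8FF, as it is on Apple platforms, and shows that its UTF-8 bytes do not survive being read as ISO-8859-1.

```python
# Assumption: the Apple logo is the private-use code point U+F8FF.
APPLE_LOGO = "\uf8ff"

utf8_bytes = APPLE_LOGO.encode("utf-8")

# Reading the same bytes with the wrong codec yields three unrelated characters.
misread = utf8_bytes.decode("iso-8859-1")

print(len(APPLE_LOGO), len(misread))             # 1 3
print(misread == APPLE_LOGO)                     # False
print(utf8_bytes.decode("utf-8") == APPLE_LOGO)  # True
```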
|
|
|
|
|
|
| |
Text indexes do not support the `$keyMapMatch` operator, so let
the test suite know about this limitation to avoid the related
error.
|
|
|
|
|
|
|
|
|
| |
Comparators are not represented by binary strings in the selection
ranges, captured by the `range/0` type. Although that is how they
are coming from the corresponding parsed JSON object, they are
being translated to specific atoms on the fly.
Noticed by: nickva
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
| |
This commit aims to improve Mango by reducing the data transferred to
the coordinator during query execution. It may reduce memory or CPU use
at the coordinator but that isn't the primary goal.
Currently, when documents are read at the shard level, they are compared
locally at the shard with the selector to ensure they match before they
are sent to the coordinator. This ensures we're not sending documents
across the network that the coordinator immediately discards, saving
bandwidth and coordinator processing. This commit further executes field
projection (`fields` in the query) at the shard level. This should
further save bandwidth, particularly for queries that project few fields
from large documents.
One item of complexity is that a query may request a quorum read of
documents, meaning that we need to do the document read at the
coordinator and not the shard, then perform the `selector` and `fields`
processing there rather than at the shard. To ensure that documents are
processed consistently whether at the shard or coordinator,
match_and_extract_doc/3 is added. There is still one orphan call outside
match_and_extract_doc/3 to extract/2 which supports cluster upgrade and
should later be removed.
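A minimal Python sketch of the idea behind a shared "match, then project" step follows. It is not the Erlang code from this commit: `matches` handles only flat equality selectors, `extract` handles only top-level field names, and all three function names are placeholders chosen for illustration.

```python
def matches(doc, selector):
    # Simplified matcher: every selector key must equal the document value.
    return all(key in doc and doc[key] == value for key, value in selector.items())


def extract(doc, fields):
    # `None` stands for "no projection requested": return the whole document.
    if fields is None:
        return doc
    return {name: doc[name] for name in fields if name in doc}


def match_and_extract(doc, selector, fields):
    # Same code path whether it runs at the shard or at the coordinator.
    if not matches(doc, selector):
        return None
    return extract(doc, fields)


doc = {"_id": "a1", "type": "user", "name": "Ada", "bio": "x" * 1000}
print(match_and_extract(doc, {"type": "user"}, ["_id", "name"]))
print(match_and_extract(doc, {"type": "order"}, ["_id", "name"]))
```

Only the two projected fields leave the matching step in the first call, which is where the bandwidth saving for large documents would come from.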
Shard level processing is already performed in a callback, view_cb/2,
that's passed to fabric's view processing to run for each row in the
view result set. It's used for the shard local selector and fields
processing. To make it clear what arguments are destined for this
callback, the commit encapsulates the arguments, using viewcbargs_new/2
and viewcbargs_get/2.
As we push down more functionality to the shard, the context this
function needs to carry with it will increase, so having a record for it
will be valuable.
Supporting cluster upgrades:
The commit supports shard pushdown for Mango `fields` processing for
situations during rolling cluster upgrades.
In the state where a not-yet-upgraded coordinator is speaking to an
upgraded node, view_cb/2 needs to support being passed just the
`selector` outside of the new viewcbargs record. In this case, the shard
will not process fields, but the coordinator will.
In the situation where the coordinator is upgraded but the shard is not,
we need to send the selector to the shard via `selector` and also
execute the fields projection at the coordinator. Therefore we pass
arguments to view_cb/2 via both `selector` and `callback_args` and have
an apparently spurious field projection (mango_fields:extract/2) in the
code that receives back values from the shard (factored out into
doc_member_and_extract).
Both of these affordances should only need to exist through one minor
version change and be removed thereafter -- if people are jumping
several minor versions of CouchDB in one go, hopefully they are prepared
for a bit of trouble.
Testing upgrade states:
As view_cb is completely separate from the rest of the cursor code,
we can first try out the branch's code using view_cb from `main`, and
then the other way -- the branch's view_cb with the rest of the file
from main. I did both of these tests successfully.
|
|
|
|
|
|
|
| |
I needed to understand the format of arguments to `match/2` when writing
the code to support projecting fields on the shard, so I wrote some code
to figure it out as a test. I figure this may be useful for future
work in this area, so I am pushing it as a commit.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The remainder argument for the `$mod` operator can be zero, while
its documentation suggests otherwise. It actually covers a very
realistic use case where divisibility is expressed.
No related restriction could be identified in the sources
[1], nor does MongoDB forbid this [2]. Tests also seem to exercise this
specific case [3]. Thanks @iilyak for checking on these.
[1] https://github.com/apache/couchdb/blob/adf17140e81d0b74f2b2ecdea48fc4f702832eaf/src/mango/src/mango_selector.erl#L512:L513
[2] https://www.mongodb.com/docs/manual/reference/operator/query/mod/
[3] https://github.com/apache/couchdb/blob/0059b8f90e58e10b199a4b768a06a762d12a30d3/src/mango/test/03-operator-test.py#L58
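As an illustration of the divisibility use case, the following Python sketch evaluates a `$mod` condition with a zero remainder. The function `mod_matches` is a hypothetical stand-in for the selector logic, not CouchDB code, and it assumes integer operands and a non-zero divisor.

```python
def mod_matches(value, divisor, remainder):
    # Hypothetical stand-in for {"$mod": [divisor, remainder]}.
    if not isinstance(value, int) or isinstance(value, bool):
        return False
    if divisor == 0:
        raise ValueError("divisor must be non-zero")
    return value % divisor == remainder


# Selector shape: {"age": {"$mod": [5, 0]}} keeps ages divisible by five.
ages = [18, 20, 25, 31]
print([age for age in ages if mod_matches(age, 5, 0)])  # [20, 25]
```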
|
| |
|
|
|
|
|
| |
Cherry-picked commits from 0156a55012b76adb652c11032596d9801c71665e
Thx @kianmeng
|
| |
|
|
|
|
|
| |
Creating an index with "ddoc":"" or "name":"" should return a 400 Bad Request.
This fixes: https://github.com/apache/couchdb/issues/1472
|
| |
|
|\ |
|
| |
| |
| |
| |
| | |
calling connected() every time causes spurious 503's when clouseau
is temporarily unavailable, which is usually masked by retry logic.
|
|/ |
| |
These exceptions from main were ported over to 3.x
```
--- a/src/chttpd/src/chttpd.erl
+++ b/src/chttpd/src/chttpd.erl
@@ -491,6 +491,7 @@ extract_cookie(#httpd{mochi_req = MochiReq}) ->
end.
%%% end hack
+%% erlfmt-ignore
set_auth_handlers() ->
AuthenticationDefault = "{chttpd_auth, cookie_authentication_handler},
```
```
--- a/src/couch/src/couch_debug.erl
+++ b/src/couch/src/couch_debug.erl
@@ -49,6 +49,7 @@ help() ->
].
-spec help(Function :: atom()) -> ok.
+%% erlfmt-ignore
help(opened_files) ->
```
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces a macro and inserts it everywhere we catch errors
and then generate a stacktrace.
So far the only thing that is a little bit ugly is that in two places,
I had to add a header include dependency on couch_db.erl where those
modules didn’t have any ties to couchdb/* before, alas. I’d be willing
to duplicate the macros in those modules, if we don’t want the include
dependency.
|
| |
|
|
|
|
|
|
|
|
|
| |
When partition_query_limit is set for couch_mrview, it limits how many
docs can be scanned when executing partitioned queries. However, it also
limits Mango's internal doc scans, which leads to documents not being
scanned to fulfill a query. This fixes:
https://github.com/apache/couchdb/issues/2795
Co-authored-by: Joan Touzet <wohali@users.noreply.github.com>
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, in https://github.com/apache/couchdb/pull/1783, the logic
was wrong in relation to how certain operators interacted with empty
arrays. We modify this logic to make it such that:
{"foo":"bar", "bar":{"$in":[]}}
and
{"foo":"bar", "bar":{"$all":[]}}
should return 0 results.
Co-authored-by: Joan Touzet <wohali@users.noreply.github.com>
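The intended semantics for empty arrays can be sketched in Python as follows. The helpers are illustrative only and are not the Mango implementation. Note that an explicit check is needed for `$all`, because a naive "every element is present" test is vacuously true for an empty list.

```python
def in_matches(value, candidates):
    # An empty candidate list can never contain the value.
    return value in candidates


def all_matches(value, required):
    # Guard against vacuous truth: all([]) is True in Python.
    if not required:
        return False
    return isinstance(value, list) and all(item in value for item in required)


docs = [{"foo": "bar", "bar": [1, 2]}, {"foo": "bar", "bar": 3}]
print([d for d in docs if in_matches(d["bar"], [])])   # []
print([d for d in docs if all_matches(d["bar"], [])])  # []
```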
|
| |
* Allows `configure.ps1` to correctly pull and build `rebar` on Windows
* Removes the static declarations in `rebar.config.script` on
specific, pre-determined paths to various includes/libraries
necessary for NIFs and external binaries (expectation is these are
passed in env vars INCLUDE, LIB and LIBPATH)
* fixes the SM60 `couchjs` build by telling `windows.h` not to
redefine min and max as macros through a `#define`
* fixes the `make eunit` target on Windows
* Adds the missing `EXE_LINK_CXX_TEMPLATE` that our rebar doesn't have,
but `enc` has today, which is also causing a failed `couchjs` (C++)
build on Windows
* Causes `make python-black` to correctly cause failure in `make check`
if it finds problems
* fixes Mango tests on Python 3.8 by bumping the hypothesis dependency
* fixes one Elixir test on Windows (incorrect calculation of `now(:ms)`
due to Erlang clock precision difference)
* a little bit of python black cleanup (mango tests)
|
|
|
|
|
|
|
| |
The CouchDB API defines the warning field returned by _find to be
a string (and this is what Fauxton expects). 5d55e289 was missing
a string conversion and returned the warning(s) as an array. This
restores the intended behaviour.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a user issues a range query $lt, $lte, $gt, $gte for text indexes,
the query is translated into a MIN, MAX range query against clouseau.
If not quoted, an error occurs:
{"error":"text_search_error","reason":"Cannot parse
'(a_3astring:[\"\" TO string\\ containing\\ space})'..}
This is because the string is broken up into 3 tokens which the parser
cannot parse. If we add quotes to the string, the range query works
correctly.
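A minimal sketch of the quoting fix, in Python. The helpers `quote` and `range_clause` and their parameters are hypothetical, and the clause layout simply mirrors the `field:[min TO max}` shape visible in the error message above; this is not the code that builds the Clouseau query.

```python
def quote(term):
    # Escape backslashes and double quotes, then wrap the term in quotes
    # so that embedded spaces do not split it into several tokens.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def range_clause(field, low, high, include_low=True, include_high=False):
    # Square brackets mark an inclusive bound, curly braces an exclusive one.
    left = "[" if include_low else "{"
    right = "]" if include_high else "}"
    return f"{field}:{left}{quote(low)} TO {quote(high)}{right}"


print(range_clause("a_3astring", "", "string containing space"))
# a_3astring:["" TO "string containing space"}
```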
|
| |
|
|
|
|
|
|
| |
mango_cursor_text:get_json_docs may return a not_found atom instead
of a Doc. In this case, we should just ignore the hit instead of
attempting to evaluate it against a mango selector.
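The guard can be sketched in Python as below. `NOT_FOUND` stands in for the `not_found` atom and `selector_matches` is a hypothetical, equality-only matcher; neither is part of the actual module.

```python
NOT_FOUND = object()  # stand-in for the Erlang atom `not_found`


def selector_matches(doc, selector):
    # Equality-only matcher, for illustration.
    return all(doc.get(key) == value for key, value in selector.items())


def filter_hits(fetched, selector):
    results = []
    for doc in fetched:
        if doc is NOT_FOUND:
            # The hit refers to a document that is no longer there: skip it.
            continue
        if selector_matches(doc, selector):
            results.append(doc)
    return results


fetched = [{"_id": "a", "kind": "x"}, NOT_FOUND, {"_id": "b", "kind": "y"}]
print(filter_hits(fetched, {"kind": "x"}))  # [{'_id': 'a', 'kind': 'x'}]
```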
|