diff options
Diffstat (limited to 'src/docs/src/ddocs/nouveau.rst')
-rw-r--r-- | src/docs/src/ddocs/nouveau.rst | 692 |
1 files changed, 692 insertions, 0 deletions
diff --git a/src/docs/src/ddocs/nouveau.rst b/src/docs/src/ddocs/nouveau.rst new file mode 100644 index 000000000..7a9e6d217 --- /dev/null +++ b/src/docs/src/ddocs/nouveau.rst @@ -0,0 +1,692 @@ +.. Licensed under the Apache License, Version 2.0 (the "License"); you may not +.. use this file except in compliance with the License. You may obtain a copy of +.. the License at +.. +.. http://www.apache.org/licenses/LICENSE-2.0 +.. +.. Unless required by applicable law or agreed to in writing, software +.. distributed under the License is distributed on an "AS IS" BASIS, WITHOUT +.. WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the +.. License for the specific language governing permissions and limitations under +.. the License. + +.. _ddoc/nouveau: + +======= +Nouveau +======= + +.. warning:: + Nouveau is an experimental feature. Future releases might change how the endpoints + work and might invalidate existing indexes. + +Nouveau indexes enable you to query a database by using the +`Lucene Query Parser Syntax. <https://lucene.apache.org/core/9_5_0/queryparser/ +org/apache/lucene/queryparser/classic/package-summary.html#Overview>`_ +A nouveau index uses one, or multiple, fields from your documents. You can use a nouveau +index to run queries to find documents based on the content they contain. + +.. warning:: + Nouveau cannot function unless it has a functioning Nouveau server. + See :ref:`Nouveau Server Installation <install/nouveau>` for details. + +To create a nouveau index, you add a JavaScript function to a design document in the +database. An index builds after processing one search request or after the server detects +a document update. The ``index`` function takes the following parameters: + +#. Field type - The type of the field, can be ``string``, ``text``, ``double`` + or ``stored``. +#. Field name - The name of the field you want to use when you query the index. + If you set this parameter to ``default``, then this field is queried if no field is + specified in the query syntax. +#. Data that you want to index, for example, ``doc.address.country``. +#. (Optional) The third parameter includes the following field: ``store``. + +By default, a nouveau index response returns 25 rows. The number of hits that are returned +can be changed by using the ``limit`` parameter. Each response includes a ``bookmark`` +field. You can include the value of the ``bookmark`` field in subsequent queries to fetch +results from deeper in the result set. + +*Example design document that defines a nouveau index:* + +.. code-block:: javascript + + { + "_id": "_design/nouveau_example", + "nouveau": { + "animals": { + "index": "function(doc){ ... }" + } + } + } + +A nouveau index will inherit the partitioning type from the ``options.partitioned`` field +of the design document that contains it. + +Index functions +=============== + +Attempting to index by using a data field that does not exist fails. To avoid +this problem, use the appropriate +:ref:`guard clause <ddoc/nouveau/index_guard_clauses>`. + +.. note:: + Your indexing functions operate in a memory-constrained environment + where the document itself forms a part of the memory that is used + in that environment. Your code's stack and document must fit inside this + memory. In other words, a document must be loaded in order to be indexed. + Documents are limited to a maximum size of 64 MB. + +The function that is contained in the index field is a JavaScript function +that is called for each document in the database. +The function takes the document as a parameter, +extracts some data from it, and then calls the function that is defined +in the ``index`` field to index that data. + +The ``index`` function takes four parameters, where the third parameter is optional. + +#. The first parameter is the type of the field. + +#. The second parameter is the name of the field you intend to use + when querying the index, and which is specified in the Lucene + syntax portion of subsequent queries. An example appears in the + following query: + + .. code-block:: javascript + + q=color:red + + The Lucene field name ``color`` is the first parameter of the ``index`` function. + + If the special value ``"default"`` is used when you define the name, + you do not have to specify a field name at query time. + The effect is that the query can be simplified: + + .. code-block:: javascript + + q=red + +#. The third parameter is the data to be indexed. Keep the following information + in mind when you index your data: + + - This data must be only a string, number, or boolean. Other types will cause + an error to be thrown by the index function call. + + - If an error is thrown when running your function, for this reason or others, + the document will not be added to that search index. + +#. The fourth, optional, parameter is a JavaScript object with the following fields: + + *Index function (optional parameter)* + + * **store** - If ``true``, the value is returned in the search result; otherwise, + the value is not returned. Values are ``true`` or ``false``. Default is ``false``. + + .. note:: + + If you do not set the ``store`` parameter, + the index data results for the document are not returned in response to a query. + +*Example search index function:* + +.. code-block:: javascript + + function(doc) { + if (typeof(doc.min_length) == 'number') { + index("double", "min_length", doc.min_length, {"store": true}); + } + if (typeof(doc.diet) == 'string') { + index("string", "diet", doc.diet, {"store": true}); + } + if (typeof(doc.latin_name) == 'string') { + index("string", "latin_name", doc.latin_name, {"store": true}); + } + if (typeof(doc.class) == 'string') { + index("string", "class", doc.class, {"store": true}); + } + } + +.. _ddoc/nouveau/index_guard_clauses: + +Index guard clauses +------------------- + +Runtime errors in the index function cause the document not to be indexed at all. The +most common runtime errors are described below; + +*Example of failing to check whether the indexed value exists:* + +.. warning:: example of bad code +.. code-block:: javascript + + index("min_length", doc.min_length, {"store": true}); + +For documents without a `min_length` value, this index call will +pass ``undefined`` as the value. This will be rejected by nouveau's +validation function and the document will not be indexed. + +*Example of failing to check whether the nested indexed value exists:* + +.. warning:: example of bad code +.. code-block:: javascript + + if (doc.foo.bar) { + index("bar", doc.foo.bar, {"store": true}); + } + +This bad example fails in a different way if ``doc.foo`` doesn't +exist; the evaluation of ``doc.foo.bar`` throws an exception. + +.. code-block:: javascript + + if (doc.foo && typeof(doc.foo) == 'object' && typeof(doc.foo.bar == 'string')) { + index("bar", doc.foo.bar, {"store": true}); + } + +This example correctly checks that ``doc.foo`` is an object and its +``bar`` entry is a string. + +*Example of checking the index value exists but disallowing valid false values:* + +.. warning:: example of bad code +.. code-block:: javascript + + if (doc.min_length) { + index("min_length", doc.min_length, {"store": true}); + } + +We correct the previous mistake so documents without min_length are +indexed (assuming there are other index calls for values that `do` +exist) but we've acccidentally prevented the indexing of the +``min_length`` field if the ``doc.min_length`` happens to be ``0``. + +.. code-block:: javascript + + if (typeof(doc.min_length == 'number')) { + index("min_length", doc.min_length, {"store": true}); + } + +This good example ensures we index any document where ``min_length`` is a number. + +.. _ddoc/nouveau/analyzers: + +Analyzers +========= + +Analyzers convert textual input into ``tokens`` which can be searched +on. Analyzers typically have different rules for how they break up +input into tokens, they might convert all text to lower case, they +might omit whole words (typically words so common they are unlikely to +be useful for searching), they might omit parts of words (removing +``ing`` suffixes in English, for example): + +We expose a large number of Lucene's analyzers. We invent one +ourselves (``simple_asciifolding``); + +* arabic +* armenian +* basque +* bulgarian +* catalan +* chinese +* cjk +* classic +* czech +* danish +* dutch +* email +* english +* finnish +* french +* galician +* german +* hindi +* hungarian +* indonesian +* irish +* italian +* japanese +* keyword +* latvian +* norwegian +* persian +* polish +* portugese +* romanian +* russian +* simple +* simple_asciifolding +* spanish +* standard +* swedish +* thai +* turkish +* whitespace + +*Example analyzer document:* + +.. code-block:: javascript + + { + "_id": "_design/analyzer_example", + "nouveau": { + "INDEX_NAME": { + "index": "function (doc) { ... }", + "default_analyzer": "$ANALYZER_NAME" + } + } + } + +.. _ddoc/nouveau/field-analyzers: + +Field analyzers +---------------- + +You may optionally specify a different analyzer for a specific field. + +*Example of defining different analyzers for different fields:* + +.. code-block:: javascript + + { + "_id": "_design/analyzer_example", + "nouveau": { + "INDEX_NAME": { + "default_analyzer": "english", + "field_analyzers": { + "spanish": "spanish", + "german": "german" + }, + "index": "function (doc) { ... }" + } + } + } + +Testing analyzer tokenization +----------------------------- + +You can test the results of analyzer tokenization by posting sample data to the +``_nouveau_analyze`` endpoint. + +*Example of using HTTP to test the keyword analyzer:* + +.. code-block:: http + + POST /_nouveau_analyze HTTP/1.1 + Content-Type: application/json + {"analyzer":"keyword", "text":"ablanks@renovations.com"} + +*Example of using the command line to test the keyword analyzer:* + +.. code-block:: sh + + curl 'https://$HOST:5984/_nouveau_analyze' -H 'Content-Type: application/json' + -d '{"analyzer":"keyword", "text":"ablanks@renovations.com"}' + +*Result of testing the keyword analyzer:* + +.. code-block:: javascript + + { + "tokens": [ + "ablanks@renovations.com" + ] + } + +*Example of using HTTP to test the standard analyzer:* + +.. code-block:: http + + POST /_nouveau_analyze HTTP/1.1 + Content-Type: application/json + {"analyzer":"standard", "text":"ablanks@renovations.com"} + +*Example of using the command line to test the standard analyzer:* + +.. code-block:: sh + + curl 'https://$HOST:5984/_nouveau_analyze' -H 'Content-Type: application/json' + -d '{"analyzer":"standard", "text":"ablanks@renovations.com"}' + +*Result of testing the standard analyzer:* + +.. code-block:: javascript + + { + "tokens": [ + "ablanks", + "renovations.com" + ] + } + +Queries +======= + +After you create a search index, you can query it. + +- Issue a partition query using: + ``GET /$DATABASE/_partition/$PARTITION_KEY/_design/$DDOC/_nouveau/$INDEX_NAME`` +- Issue a global query using: + ``GET /$DATABASE/_design/$DDOC/_nouveau/$INDEX_NAME`` + +Specify your search by using the ``q`` parameter. + +*Example of using HTTP to query a partitioned index:* + +.. code-block:: http + + GET /$DATABASE/_partition/$PARTITION_KEY/_design/$DDOC/_nouveau/$INDEX_NAME?include_docs=true&query="*:*"&limit=1 HTTP/1.1 + Content-Type: application/json + +*Example of using HTTP to query a global index:* + +.. code-block:: http + + GET /$DATABASE/_design/$DDOC/_nouveau/$INDEX_NAME?include_docs=true&query="*:*"&limit=1 HTTP/1.1 + Content-Type: application/json + +*Example of using the command line to query a partitioned index:* + +.. code-block:: sh + + curl https://$HOST:5984/$DATABASE/_partition/$PARTITION_KEY/_design/$DDOC/ + _nouveau/$INDEX_NAME?include_docs=true\&query="*:*"\&limit=1 \ + +*Example of using the command line to query a global index:* + +.. code-block:: sh + + curl https://$HOST:5984/$DATABASE/_design/$DDOC/_nouveau/$INDEX_NAME? + include_docs=true\&query="*:*"\&limit=1 \ + +.. _ddoc/nouveau/query_parameters: + +Query Parameters +---------------- + +A full list of query parameters can be found in the +:ref:`API Reference <api/ddoc/nouveau>`. + +.. note:: + Do not combine the ``bookmark`` and ``update`` options. These options + constrain the choice of shard replicas to use for the response. When used + together, the options might cause problems when contact is attempted + with replicas that are slow or not available. + +Relevance +--------- + +When more than one result might be returned, it is possible for them to be sorted. By +default, the sorting order is determined by 'relevance'. + +Relevance is measured according to `Apache Lucene Scoring +<https://lucene.apache.org/core/9_5_0/core/org/apache/ +lucene/search/package-summary.html>`_. +As an example, if you search a simple database for the word +``example``, two documents might contain the word. If one document +mentions the word ``example`` 10 times, but the second document +mentions it only twice, then the first document is considered to be +more 'relevant'. + +If you do not provide a ``sort`` parameter, relevance is used by default. The highest +scoring matches are returned first. + +If you provide a ``sort`` parameter, then matches are returned in that order, ignoring +relevance. + +If you want to use a ``sort`` parameter, and also include ordering by relevance in your +search results, use the special fields ``-<score>`` or ``<score>`` within the ``sort`` +parameter. + +POSTing search queries +---------------------- + +Instead of using the ``GET`` HTTP method, you can also use ``POST``. The main advantage of +``POST`` queries is that they can have a request body, so you can specify the request as a +JSON object. Each parameter in the query string of a ``GET`` request corresponds to a +field in the JSON object in the request body. + +*Example of using HTTP to POST a search request:* + +.. code-block:: http + + POST /db/_design/ddoc/_nouveau/searchname HTTP/1.1 + Content-Type: application/json + +*Example of using the command line to POST a search request:* + +.. code-block:: sh + + curl 'https://$HOST:5984/db/_design/ddoc/_nouveau/searchname' -X POST -H 'Content-Type: application/json' -d @search.json + +*Example JSON document that contains a search request:* + +.. code-block:: javascript + + { + "q": "index:my query", + "sort": "foo", + "limit": 3 + } + +Query syntax +============ + +The CouchDB search query syntax is based on the +`Lucene syntax. <https://lucene.apache.org/core/9_5_0/queryparser/org/apache/ +lucene/queryparser/classic/package-summary.html>`_ +Search queries take the form of ``name:value`` unless the name is omitted, in which case +they use the default field, as demonstrated in the following examples: + +*Example search query expressions:* + +.. code-block:: javascript + + // Birds + class:bird + +.. code-block:: text + + // Animals that begin with the letter "l" + l* + +.. code-block:: text + + // Carnivorous birds + class:bird AND diet:carnivore + +.. code-block:: text + + // Herbivores that start with letter "l" + l* AND diet:herbivore + +.. code-block:: text + + // Medium-sized herbivores + min_length:[1 TO 3] AND diet:herbivore + +.. code-block:: text + + // Herbivores that are 2m long or less + diet:herbivore AND min_length:[-Infinity TO 2] + +.. code-block:: text + + // Mammals that are at least 1.5m long + class:mammal AND min_length:[1.5 TO Infinity] + +.. code-block:: text + + // Find "Meles meles" + latin_name:"Meles meles" + +.. code-block:: text + + // Mammals who are herbivore or carnivore + diet:(herbivore OR omnivore) AND class:mammal + +.. code-block:: text + + // Return all results + *:* + +Queries over multiple fields can be logically combined, and groups and fields can be +further grouped. The available logical operators are case-sensitive and are ``AND``, +``+``, ``OR``, ``NOT`` and ``-``. Range queries can run over strings or numbers. + +If you want a fuzzy search, you can run a query with ``~`` to find terms like the search +term. For instance, ``look~`` finds the terms ``book`` and ``took``. + +.. note:: + If the lower and upper bounds of a range query are both strings that + contain only numeric digits, the bounds are treated as numbers not as + strings. For example, if you search by using the query + ``mod_date:["20170101" TO "20171231"]``, the results include documents + for which ``mod_date`` is between the numeric values 20170101 and + 20171231, not between the strings "20170101" and "20171231". + +You can alter the importance of a search term by adding ``^`` and a positive number. This +alteration makes matches containing the term more or less relevant, proportional to the +power of the boost value. The default value is 1, which means no increase or decrease in +the strength of the match. A decimal value of 0 - 1 reduces importance. making the match +strength weaker. A value greater than one increases importance, making the match strength +stronger. + +Wildcard searches are supported, for both single (``?``) and multiple (``*``) character +searches. For example, ``dat?`` would match ``date`` and ``data``, whereas ``dat*`` would +match ``date``, ``data``, ``database``, and ``dates``. Wildcards must come after the +search term. + +Use ``*:*`` to return all results. + +The following characters require escaping if you want to search on them: + +.. code-block:: sh + + + - && || ! ( ) { } [ ] ^ " ~ * ? : \ / + +To escape one of these characters, use a preceding backslash character (``\``). + +The response to a search query contains an ``order`` field for each of the results. The +``order`` field is an array where the first element is the field or fields that are +specified in the ``sort`` parameter. See the +:ref:`sort parameter <api/ddoc/search>`. If no ``sort`` parameter is included +in the query, then the ``order`` field contains the `Lucene relevance score +<https://lucene.apache.org/core/9_5_0/core/org/apache/ +lucene/search/package-summary.html>`_. + +.. _ddoc/nouveau/faceting: + +Faceting +-------- + +Nouveau Search also supports faceted searching, enabling discovery of aggregate +information about matches quickly and easily. You can match all documents by using the +special ``?q=*:*`` query syntax, and use the returned facets to refine your query. + +*Example of search query:* + +.. code-block:: javascript + + function(doc) { + index("string", "type", doc.type); + index("double", "price", doc.price); + } + +To use facets, all the documents in the index must include all the fields that have +faceting enabled. If your documents do not include all the fields, you receive a +``bad_request`` error with the following reason, "The ``field_name`` does not exist." If +each document does not contain all the fields for facets, create separate indexes for each +field. If you do not create separate indexes for each field, you must include only +documents that contain all the fields. Verify that the fields exist in each document by +using a single ``if`` statement. + +*Example if statement to verify that the required fields exist in each document:* + +.. code-block:: javascript + + if (typeof doc.town == "string" && typeof doc.name == "string") { + index("string", "town", doc.town); + index("string", "name", doc.name); + } + +Counts +------ + +.. note:: + The ``counts`` option is only available when making global queries. + +The ``counts`` facet syntax takes a list of fields, and returns the number of query +results for each unique value of each named field. + +.. note:: + The ``count`` operation works only if the indexed values are strings. + The indexed values cannot be mixed types. For example, + if 100 strings are indexed, and one number, + then the index cannot be used for ``count`` operations. + You can check the type by using the ``typeof`` operator, and convert it + by using the ``parseInt``, + ``parseFloat``, or ``.toString()`` functions. + +*Example of a query using the counts facet syntax:* + +.. code-block:: text + + ?q=*:*&counts=["type"] + +*Example response after using of the counts facet syntax:* + +.. code-block:: javascript + + { + "total_rows":100000, + "bookmark":"g...", + "rows":[...], + "counts":{ + "type":{ + "sofa": 10, + "chair": 100, + "lamp": 97 + } + } + } + +Ranges +------ + +.. note:: + The ``ranges`` option is only available when making global queries. + +The ``range`` facet syntax reuses the standard Lucene syntax for ranges to return counts +of results that fit into each specified category. Inclusive range queries are denoted by +brackets (``[``, ``]``). Exclusive range queries are denoted by curly brackets (``{``, +``}``). + +.. note:: + The ``range`` operation works only if the indexed values are numbers. The indexed + values cannot be mixed types. For example, if 100 strings are indexed, and one number, + then the index cannot be used for ``range`` operations. You can check the type by + using the ``typeof`` operator, and convert it by using the ``parseInt``, + ``parseFloat``, or ``.toString()`` functions. + +*Example of a request that uses faceted search for matching ranges:* + +.. code-block:: text + + ?q=*:*&ranges={"price":{"cheap":"[0 TO 100]","expensive":"{100 TO Infinity}"}} + +*Example results after a ranges check on a faceted search:* + +.. code-block:: javascript + + { + "total_rows":100000, + "bookmark":"g...", + "rows":[...], + "ranges": { + "price": { + "expensive": 278682, + "cheap": 257023 + } + } + } |