summaryrefslogtreecommitdiff
path: root/src/mango/README.md
blob: 4c4bb60a672bdb51abd7e371866aa455e3c9f9d7 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
Mango
=====

A MongoDB inspired query language interface for Apache CouchDB.


Motivation
----------

Mango provides a single HTTP API endpoint that accepts JSON bodies via HTTP POST. These bodies provide a set of instructions that will be handled with the results being returned to the client in the same order as they were specified. The general principle of this API is to be simple to implement on the client side while providing users a more natural conversion to Apache CouchDB than would otherwise exist using the standard RESTful HTTP interface that already exists.


Actions
-------

The general API exposes a set of actions that are similar to what MongoDB exposes (although not all of MongoDB's API is supported). These are meant to be loosely and obviously inspired by MongoDB but without too much attention to maintaining the exact behavior.

Each action is specified as a JSON object with a number of keys that affect the behavior. Each action object has at least one field named "action" which must
have a string value indicating the action to be performed. For each action there are zero or more fields that will affect behavior. Some of these fields are required and some are optional.

For convenience, the HTTP API will accept a JSON body that is either a single JSON object which specifies a single action or a JSON array that specifies a list of actions that will then be invoked serially. While multiple commands can be batched into a single HTTP request, there are no guarantees about atomicity or isolation for a batch of commands.

Activating Query on a cluster
--------------------------------------------

Query can be enabled by setting the following config:

```
rpc:multicall(config, set, ["native_query_servers", "query", "{mango_native_proc, start_link, []}"]).
```

HTTP API
========

This API adds a single URI endpoint to the existing CouchDB HTTP API. Creating databases, authentication, Map/Reduce views, etc are all still supported exactly as currently document. No existing behavior is changed.

The endpoint added is for the URL pattern `/dbname/_query` and has the following characteristics:

* The only HTTP method supported is `POST`.
* The request `Content-Type` must be `application/json`.
* The response status code will either be `200`, `4XX`, or `5XX`
* The response `Content-Type` will be `application/json`
* The response `Transfer-Encoding` will be `chunked`.
* The response is a single JSON object or array that matches to the single command or list of commands that exist in the request.

This is intended to be a significantly simpler use of HTTP than the current APIs. This is motivated by the fact that this entire API is aimed at customers who are not as savvy at HTTP or non-relational document stores. Once a customer is comfortable using this API we hope to expose any other "power features" through the existing HTTP API and its adherence to HTTP semantics.


Supported Actions
=================

This is a list of supported actions that Mango understands. For the time being it is limited to the four normal CRUD actions plus one meta action to create indices on the database.

insert
------

Insert a document or documents into the database.

Keys:

* action - "insert"
* docs - The JSON document to insert
* w (optional) (default: 2) - An integer > 0 for the write quorum size

If the provided document or documents do not contain an "\_id" field one will be added using an automatically generated UUID.

It is more performant to specify multiple documents in the "docs" field than it is to specify multiple independent insert actions. Each insert action is submitted as a single bulk update (ie, \_bulk\_docs in CouchDB terminology). This, however, does not make any guarantees on the isolation or atomicity of the bulk operation. It is merely a performance benefit.


find
----

Retrieve documents from the database.

Keys:

* action - "find"
* selector - JSON object following selector syntax, described below
* limit (optional) (default: 25) - integer >= 0, Limit the number of rows returned
* skip (optional) (default: 0) - integer >= 0, Skip the specified number of rows
* sort (optional) (default: []) - JSON array following sort syntax, described below 
* fields (optional) (default: null) - JSON array following the field syntax, described below
* r (optional) (default: 1) - By default a find will return the document that was found when traversing the index. Optionally there can be a quorum read for each document using `r` as the read quorum. This is obviously less performant than using the document local to the index.
* conflicts (optional) (default: false) - boolean, whether or not to include information about any existing conflicts for the document.

The important thing to note about the find command is that it must execute over a generated index. If a selector is provided that cannot be satisfied using an existing index the list of basic indices that could be used will be returned.

For the most part, indices are generated in response to the "create\_index" action (described below) although there are two special indices that can be used as well. The "\_id" is automatically indexed and is similar to every other index. There is also a special "\_seq" index to retrieve documents in the order of their update sequence.

Its also quite possible to generate a query that can't be satisfied by any index. In this case an error will be returned stating that fact. Generally speaking the easiest way to stumble onto this is to attempt to OR two separate fields which would require a complete table scan. In the future I expect to support these more complicated queries using an extended indexing API (which deviates from the current MongoDB model a bit).


update
------

Update an existing document in the database

Keys:

* action - "update"
* selector - JSON object following selector syntax, described below
* update - JSON object following update syntax, described below
* upsert - (optional) (default: false) - boolean, Whether or not to create a new document if the selector does not match any documents in the database
* limit (optional) (default: 1) - integer > 0, How many documents returned from the selector should be modified. Currently has a maximum value of 100
* sort - (optional) (default: []) - JSON array following sort syntax, described below
* r (optional) (default: 1) - integer > 0, read quorum constant
* w (optional) (default: 2) - integer > 0, write quorum constant

Updates are fairly straightforward other than to mention that the selector (like find) must be satisifiable using an existing index.

On the update field, if the provided JSON object has one or more update operator (described below) then the operation is applied onto the existing document (if one exists) else the entire contents are replaced with exactly the value of the `update` field.


delete
------

Remove a document from the database.

Keys:

* action - "delete"
* selector - JSON object following selector syntax, described below
* force (optional) (default: false) - Delete all conflicted versions of the document as well
* limit - (optional) (default: 1) - integer > 0, How many documents to delete from the database. Currently has a maximum value of 100
* sort - (optional) (default: []) - JSON array following sort syntax, described below
* r (optional) (default: 1) - integer > 1, read quorum constant
* w (optional) (default: 2) - integer > 0, write quorum constant

Deletes behave quite similarly to update except they attempt to remove documents from the database. Its important to note that if a document has conflicts it may "appear" that delete's aren't having an effect. This is because the delete operation by default only removes a single revision. Specify `"force":true` if you would like to attempt to delete all live revisions.

If you wish to delete a specific revision of the document, you can specify it in the selector using the special "\_rev" field.


create\_index
-------------

Create an index on the database

Keys:

* action - "create\_index"
* index - JSON array following sort syntax, described below
* type (optional) (default: "json") - string, specifying the index type to create. Currently only "json" indexes are supported but in the future we will provide full-text indexes as well as Geo spatial indexes
* name (optional) - string, optionally specify a name for the index. If a name is not provided one will be automatically generated
* ddoc (optional) - Indexes can be grouped into design documents underneath the hood for efficiency. This is an advanced feature. Don't specify a design document here unless you know the consequences of index invalidation. By default each index is placed in its own separate design document for isolation.

Anytime an operation is required to locate a document in the database it is required that an index must exist that can be used to locate it. By default the only two indices that exist are for the document "\_id" and the special "\_seq" index.

Indices are created in the background. If you attempt to create an index on a large database and then immediately utilize it, the request may block for a considerable amount of time before the request completes.

Indices can specify multiple fields to index simultaneously. This is roughly analogous to a compound index in SQL with the corresponding tradeoffs. For instance, an index may contain the (ordered set of) fields "foo", "bar", and "baz". If a selector specifying "bar" is received, it can not be answered. Although if a selector specifying "foo" and "bar" is received, it can be answered more efficiently than if there were only an index on "foo" and "bar" independently.

NB: while the index allows the ability to specify sort directions these are currently not supported. The sort direction must currently be specified as "asc" in the JSON. [INTERNAL]: This will require that we patch the view engine as well as the cluster coordinators in Fabric to follow the specified sort orders. The concepts are straightforward but the implementation may need some thought to fit into the current shape of things.


list\_indexes
-------------

List the indexes that exist in a given database.

Keys:

* action - "list\_indexes"


delete\_index
-------------

Delete the specified index from the database.

Keys:

* action - "delete\_index"
* name - string, the index to delete
* design\_doc - string, the design doc id from which to delete the index. For auto-generated index names and design docs, you can retrieve this information from the `list\_indexes` action

Indexes require resources to maintain. If you find that an index is no longer necessary then it can be beneficial to remove it from the database.


describe\_selector
------------------

Shows debugging information for a given selector

Keys:

* action - "describe\_selector"
* selector - JSON object in selector syntax, described below
* extended (optional) (default: false) - Show information on what existing indexes could be used with this selector

This is a useful debugging utility that will show how a given selector is normalized before execution as well as information on what indexes could be used to satisfy it.

If `"extended": true` is included then the list of existing indices that could be used for this selector are also returned.



JSON Syntax Descriptions
========================

This API uses a few defined JSON structures for various operations. Here we'll describe each in detail.


Selector Syntax
---------------

The Mango query language is expressed as a JSON object describing documents of interest. Within this structure it is also possible to express conditional logic using specially named fields. This is inspired by and intended to maintain a fairly close parity to the existing MongoDB behavior.

As an example, the simplest selector for Mango might look something like such:

    {"_id": "Paul"}

Which would match the document named "Paul" (if one exists). Extending this example using other fields might look like such:

    {"_id": "Paul", "location": "Boston"}

This would match a document named "Paul" *AND* having a "location" value of "Boston". Seeing as though I'm sitting in my basement in Omaha, this is unlikely.

There are two special syntax elements for the object keys in a selector. The first is that the period (full stop, or simply `.`) character denotes subfields in a document. For instance, here are two equivalent examples:

    {"location": {"city": "Omaha"}}
    {"location.city": "Omaha"}

If the object's key contains the period it could be escaped with backslash, i.e.

    {"location\\.city": "Omaha"}

Note that the double backslash here is necessary to encode an actual single backslash.

The second important syntax element is the use of a dollar sign (`$`) prefix to denote operators. For example:

    {"age": {"$gt": 21}}

In this example, we have created the boolean expression `age > 21`.

There are two core types of operators in the selector syntax: combination operators and condition operators. In general, combination operators contain groups of condition operators. We'll describe the list of each below.

### Implicit Operators

For the most part every operator must be of the form `{"$operator": argument}`. Though there are two implicit operators for selectors.

First, any JSON object that is not the argument to a condition operator is an implicit `$and` operator on each field. For instance, these two examples are identical:

    {"foo": "bar", "baz": true}
    {"$and": [{"foo": {"$eq": "bar"}}, {"baz": {"$eq": true}}]}

And as shown, any field that contains a JSON value that has no operators in it is an equality condition. For instance, these are equivalent:

    {"foo": "bar"}
    {"foo": {"$eq": "bar"}}

And to be clear, these are also equivalent:

    {"foo": {"bar": "baz"}}
    {"foo": {"$eq": {"bar": "baz"}}}

Although, the previous example would actually be normalized internally to this:

    {"foo.bar": {"$eq": "baz"}}


### Combination Operators

These operators are responsible for combining groups of condition operators. Most familiar are the standard boolean operators plus a few extra for working with JSON arrays.

Each of the combining operators take a single argument that is either a condition operator or an array of condition operators.

The list of combining characters:

* "$and" - array argument
* "$or" - array argument
* "$not" - single argument
* "$nor" - array argument
* "$all" - array argument (special operator for array values)
* "$elemMatch" - single argument (special operator for array values)
* "$allMatch" - single argument (special operator for array values)

### Condition Operators

Condition operators are specified on a per field basis and apply to the value indexed for that field. For instance, the basic "$eq" operator matches when the indexed field is equal to its argument. There is currently support for the basic equality and inequality operators as well as a number of meta operators. Some of these operators will accept any JSON argument while some require a specific JSON formatted argument. Each is noted below.

The list of conditional arguments:

(In)equality operators

* "$lt" - any JSON
* "$lte" - any JSON
* "$eq" - any JSON
* "$ne" - any JSON
* "$gte" - any JSON
* "$gt" - any JSON

Object related operators

* "$exists" - boolean, check whether the field exists or not regardless of its value
* "$type" - string, check the document field's type

Array related operators

* "$in" - array of JSON values, the document field must exist in the list provided
* "$nin" - array of JSON values, the document field must not exist in the list provided
* "$size" - integer, special condition to match the length of an array field in a document. Non-array fields cannot match this condition.

Misc related operators

* "$mod" - [Divisor, Remainder], where Divisor and Remainder are both positive integers (ie, greater than 0). Matches documents where (field % Divisor == Remainder) is true. This is false for any non-integer field
* "$regex" - string, a regular expression pattern to match against the document field. Only matches when the field is a string value and matches the supplied matches


Update Syntax
-------------

Need to describe the syntax for update operators.


Sort Syntax
-----------

The sort syntax is a basic array of field name and direction pairs. It looks like such:

    [{field1: dir1} | ...]

Where field1 can be any field (dotted notation is available for sub-document fields) and dir1 can be "asc" or "desc".

Note that it is highly recommended that you specify a single key per object in your sort ordering so that the order is not dependent on the combination of JSON libraries between your application and the internals of Mango's indexing engine.


Fields Syntax
-------------

When retrieving documents from the database you can specify that only a subset of the fields are returned. This allows you to limit your results strictly to the parts of the document that are interesting for the local application logic. The fields returned are specified as an array. Unlike MongoDB only the fields specified are included, there is no automatic inclusion of the "\_id" or other metadata fields when a field list is included.

A trivial example:

    ["foo", "bar", "baz"]


HTTP API
========

Short summary until the full documentation can be brought over.

POST /dbname/\_find
-------------------------

Issue a query.

Request body is a JSON object that has the selector and the various options like limit/skip etc. Or we could post the selector and put the other options into the query string. Though I'd probably prefer to have it all in the body for consistency.

Response is streamed out like a view. 

POST /dbname/\_index
--------------------------

Request body contains the index definition.

Response body is empty and the result is returned as the status code (200 OK -> created, 3something for exists).

GET /dbname/\_index
-------------------------

Request body is empty.

Response body is all of the indexes that are available for use by find.

DELETE /dbname/\_index/ddocid/viewname
--------------------------------------------

Remove the specified index.

Request body is empty.

Response body is empty. The status code gives enough information.