Mango
=====

A MongoDB inspired query language interface for Apache CouchDB.


Motivation
----------

Mango provides a single HTTP API endpoint that accepts JSON bodies via
HTTP POST. Each body provides a set of instructions, and the results are
returned to the client in the same order in which the instructions were
specified. The general principle of this API is to be simple to
implement on the client side while giving users a more natural
transition to Apache CouchDB than the existing standard RESTful HTTP
interface provides.


Actions
-------

The general API exposes a set of actions that are similar to what
MongoDB exposes (although not all of MongoDB's API is
supported). These are meant to be loosely and obviously inspired by
MongoDB but without too much attention to maintaining the exact
behavior.

Each action is specified as a JSON object with a number of keys that
affect the behavior. Each action object has at least one field named
"action" which must have a string value indicating the action to be
performed. For each action there are zero or more fields that will
affect behavior. Some of these fields are required and some are
optional.

For convenience, the HTTP API will accept a JSON body that is either a
single JSON object which specifies a single action or a JSON array
that specifies a list of actions that will then be invoked
serially. While multiple commands can be batched into a single HTTP
request, there are no guarantees about atomicity or isolation for a
batch of commands.
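
As a rough illustration of the two accepted body shapes (one object or an array of objects), here is a small Python sketch that builds a request body. The `build_query_body` helper is hypothetical and not part of Mango; it only checks that every action carries a string "action" field.

```python
import json


def build_query_body(actions):
    """Serialize one action (dict) or a batch (list of dicts) to JSON.

    Hypothetical client helper: it only enforces that every action object
    has a string "action" field, as described above.
    """
    batch = actions if isinstance(actions, list) else [actions]
    for act in batch:
        if not isinstance(act, dict) or not isinstance(act.get("action"), str):
            raise ValueError("each action needs a string 'action' field")
    # A single object stays an object and a list stays a list, so the
    # response shape mirrors the request shape.
    return json.dumps(actions)


single = build_query_body({"action": "find", "selector": {"_id": "Paul"}})
batch = build_query_body([
    {"action": "insert", "docs": [{"name": "Paul"}]},
    {"action": "find", "selector": {"name": "Paul"}},
])
```

Batching is only a convenience: the two actions in `batch` run serially, with no atomicity or isolation between them.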

Activating Query on a cluster
--------------------------------------------

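Query can be enabled across a cluster by setting the following config from an Erlang shell on one of the nodes; `rpc:multicall/3` runs the `config:set/3` call on that node and on every connected node: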

```
rpc:multicall(config, set, ["native_query_servers", "query", "{mango_native_proc, start_link, []}"]).
```

HTTP API
========

This API adds a single URI endpoint to the existing CouchDB HTTP
API. Creating databases, authentication, Map/Reduce views, etc., are all
still supported exactly as currently documented. No existing behavior is
changed.

The endpoint added is for the URL pattern `/dbname/_query` and has the
following characteristics:

* The only HTTP method supported is `POST`.
* The request `Content-Type` must be `application/json`.
* The response status code will be `200`, `4XX`, or `5XX`.
* The response `Content-Type` will be `application/json`.
* The response `Transfer-Encoding` will be `chunked`.
* The response is a single JSON object or array that corresponds to the
  single command or list of commands in the request.

This is intended to be a significantly simpler use of HTTP than the
current APIs. This is motivated by the fact that this entire API is
aimed at customers who are less familiar with HTTP or with
non-relational document stores. Once a customer is comfortable using
this API we hope to expose any other "power features" through the
existing HTTP API and its adherence to HTTP semantics.


Supported Actions
=================

This is a list of supported actions that Mango understands. For the
time being it is limited to the four normal CRUD actions plus a few
meta actions for creating, listing and deleting indices and for
inspecting selectors.

insert
------

Insert a document or documents into the database.

Keys:

* action - "insert"
* docs - The JSON document, or array of JSON documents, to insert
* w (optional) (default: 2) - An integer > 0 for the write quorum size

If a provided document does not contain an "\_id" field, one will be
added using an automatically generated UUID.

It is more performant to specify multiple documents in the "docs"
field than it is to specify multiple independent insert actions. Each
insert action is submitted as a single bulk update (i.e., \_bulk\_docs
in CouchDB terminology). This, however, does not make any guarantees
on the isolation or atomicity of the bulk operation. It is merely a
performance benefit.
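
A minimal Python sketch of the recommended pattern: put all documents in the "docs" field of one insert action instead of issuing one action per document. The `insert_action` helper and its `assign_ids` flag are hypothetical. Pre-assigning `uuid4` ids on the client is shown only as an optional alternative to letting Mango generate them; it does not describe Mango's own UUID scheme.

```python
import uuid


def insert_action(docs, w=2, assign_ids=False):
    """Build one "insert" action carrying every document (a sketch)."""
    if not isinstance(w, int) or w <= 0:
        raise ValueError("w must be an integer > 0")
    prepared = []
    for doc in docs:
        doc = dict(doc)  # copy so the caller's dicts are not mutated
        if assign_ids and "_id" not in doc:
            # Optional client-side id; without it the server generates one.
            doc["_id"] = uuid.uuid4().hex
        prepared.append(doc)
    return {"action": "insert", "docs": prepared, "w": w}


# One action with three documents instead of three separate insert actions.
act = insert_action([{"n": 1}, {"n": 2}, {"_id": "fixed", "n": 3}],
                    assign_ids=True)
```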


find
----

Retrieve documents from the database.

Keys:

* action - "find"
* selector - JSON object following selector syntax, described below
* limit (optional) (default: 25) - integer >= 0, Limit the number of
  rows returned
* skip (optional) (default: 0) - integer >= 0, Skip the specified
  number of rows
* sort (optional) (default: []) - JSON array following sort syntax,
  described below
* fields (optional) (default: null) - JSON array following the field
  syntax, described below
* r (optional) (default: 1) - By default a find will return the
  document that was found when traversing the index. Optionally there
  can be a quorum read for each document using `r` as the read
  quorum. This is obviously less performant than using the document
  local to the index.
* conflicts (optional) (default: false) - boolean, whether or not to
  include information about any existing conflicts for the document.
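
The keys above map naturally onto a small builder. This Python sketch fills in the documented defaults and rejects negative `limit` or `skip` values; `find_action` is a hypothetical helper, and it leaves `fields` out entirely when it is null.

```python
def find_action(selector, limit=25, skip=0, sort=None, fields=None,
                r=1, conflicts=False):
    """Build a "find" action with the defaults listed above (a sketch)."""
    if not isinstance(selector, dict):
        raise ValueError("selector must be a JSON object")
    if not isinstance(limit, int) or limit < 0:
        raise ValueError("limit must be an integer >= 0")
    if not isinstance(skip, int) or skip < 0:
        raise ValueError("skip must be an integer >= 0")
    action = {
        "action": "find",
        "selector": selector,
        "limit": limit,
        "skip": skip,
        "sort": sort if sort is not None else [],
        "r": r,
        "conflicts": conflicts,
    }
    if fields is not None:
        action["fields"] = fields  # omitted means: return whole documents
    return action


query = find_action({"age": {"$gt": 21}}, sort=[{"age": "asc"}],
                    fields=["_id", "age"])
```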

The important thing to note about the find command is that it must
execute over a generated index. If the provided selector cannot be
satisfied using an existing index, the response lists the basic
indices that could be used instead.

For the most part, indices are generated in response to the
"create\_index" action (described below) although there are two
special indices that can be used as well. The "\_id" is automatically
indexed and is similar to every other index. There is also a special
"\_seq" index to retrieve documents in the order of their update
sequence.

It's also quite possible to generate a query that can't be satisfied by
any index. In this case an error will be returned stating that
fact. Generally speaking the easiest way to stumble onto this is to
attempt to OR two separate fields which would require a complete table
scan. In the future I expect to support these more complicated queries
using an extended indexing API (which deviates from the current
MongoDB model a bit).


update
------

Update an existing document in the database.

Keys:

* action - "update"
* selector - JSON object following selector syntax, described below
* update - JSON object following update syntax, described below
* upsert - (optional) (default: false) - boolean, Whether or not to
  create a new document if the selector does not match any documents
  in the database
* limit (optional) (default: 1) - integer > 0, How many documents
  returned from the selector should be modified. Currently has a
  maximum value of 100
* sort - (optional) (default: []) - JSON array following sort syntax,
  described below
* r (optional) (default: 1) - integer > 0, read quorum constant
* w (optional) (default: 2) - integer > 0, write quorum constant

Updates are fairly straightforward other than to mention that the
selector (like find) must be satisfiable using an existing index.

If the object in the `update` field contains one or more update
operators (described below), the operators are applied to the
existing document (if one exists); otherwise the document's entire
contents are replaced with exactly the value of the `update` field.


delete
------

Remove a document from the database.

Keys:

* action - "delete"
* selector - JSON object following selector syntax, described below
* force (optional) (default: false) - Delete all conflicted versions
  of the document as well
* limit - (optional) (default: 1) - integer > 0, How many documents to
  delete from the database. Currently has a maximum value of 100
* sort - (optional) (default: []) - JSON array following sort syntax,
  described below
* r (optional) (default: 1) - integer > 0, read quorum constant
* w (optional) (default: 2) - integer > 0, write quorum constant
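
Update and delete share the same numeric constraints. A hypothetical client-side check in Python might look like this; it only encodes what the key lists state (a required `selector`, an `update` value for updates, a `limit` between 1 and 100) and treats `r` and `w` as positive integers, as the update list states.

```python
MAX_LIMIT = 100  # current maximum for update/delete "limit" per this README


def check_write_action(action):
    """Return a list of problems with an update/delete action (a sketch)."""
    problems = []
    if action.get("action") not in ("update", "delete"):
        problems.append("action must be 'update' or 'delete'")
    if "selector" not in action:
        problems.append("selector is required")
    if action.get("action") == "update" and "update" not in action:
        problems.append("update is required")
    limit = action.get("limit", 1)
    if not isinstance(limit, int) or not 0 < limit <= MAX_LIMIT:
        problems.append(f"limit must be an integer in 1..{MAX_LIMIT}")
    for key, default in (("r", 1), ("w", 2)):
        value = action.get(key, default)
        if not isinstance(value, int) or value <= 0:
            problems.append(f"{key} must be an integer > 0")
    return problems
```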

Deletes behave quite similarly to updates except that they attempt to
remove documents from the database. It's important to note that if a
document has conflicts it may "appear" that deletes aren't having an
effect. This is because the delete operation by default only removes a
single revision. Specify `"force":true` if you would like to attempt
to delete all live revisions.

If you wish to delete a specific revision of the document, you can
specify it in the selector using the special "\_rev" field.


create\_index
-------------

Create an index on the database.

Keys:

* action - "create\_index"
* index - JSON array following sort syntax, described below
* type (optional) (default: "json") - string, specifying the index
  type to create. Currently only "json" indexes are supported but in
  the future we will provide full-text indexes as well as geospatial
  indexes
* name (optional) - string, optionally specify a name for the
  index. If a name is not provided one will be automatically generated
* ddoc (optional) - Indexes can be grouped into design documents
  under the hood for efficiency. This is an advanced
  feature. Don't specify a design document here unless you know the
  consequences of index invalidation. By default each index is placed
  in its own separate design document for isolation.
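
Because the "index" key uses sort syntax, a list of field names can be turned into a create\_index action mechanically. In this hypothetical Python helper every field gets the direction "asc", since other sort directions are not supported yet (see the NB at the end of this section), and only "json" indexes are accepted.

```python
def create_index_action(fields, name=None, ddoc=None, index_type="json"):
    """Build a "create_index" action from a list of field names (a sketch)."""
    if index_type != "json":
        raise ValueError('only "json" indexes are currently supported')
    if not fields:
        raise ValueError("an index needs at least one field")
    action = {
        "action": "create_index",
        # Sort syntax: one single-key object per field, always ascending.
        "index": [{field: "asc"} for field in fields],
        "type": index_type,
    }
    if name is not None:
        action["name"] = name  # otherwise a name is generated automatically
    if ddoc is not None:
        action["ddoc"] = ddoc  # advanced: shares a design doc with others
    return action


idx = create_index_action(["foo", "bar", "baz"], name="foo-bar-baz")
```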

Any operation that needs to locate a document in the database
requires an index that can be used to locate it. By default the only
two indices that exist are for the document "\_id" and the special
"\_seq" index.

Indices are created in the background. If you attempt to create an
index on a large database and then immediately utilize it, the request
may block for a considerable amount of time before the request
completes.

Indices can specify multiple fields to index simultaneously. This is
roughly analogous to a compound index in SQL with the corresponding
tradeoffs. For instance, an index may contain the (ordered set of)
fields "foo", "bar", and "baz". If a selector specifying only "bar" is
received, it cannot be answered. However, if a selector specifying
"foo" and "bar" is received, it can be answered more efficiently than
if there were only separate indexes on "foo" and "bar".
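
The compound-index rule can be modeled in a few lines. This Python sketch is a simplification (it ignores operator types and sorting) and is not Mango's actual index selection code: the hypothetical `usable_prefix_length` counts how many leading index fields the selector constrains, where 0 means the index cannot be used.

```python
def usable_prefix_length(index_fields, selector_fields):
    """Count the leading index fields that the selector constrains."""
    wanted = set(selector_fields)
    covered = 0
    for field in index_fields:
        if field not in wanted:
            break  # a gap in the prefix stops the index from helping further
        covered += 1
    return covered


index = ["foo", "bar", "baz"]
print(usable_prefix_length(index, ["bar"]))         # 0: leading "foo" missing
print(usable_prefix_length(index, ["foo", "bar"]))  # 2: first two fields used
```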

NB: while the index syntax allows sort directions to be specified,
they are currently not supported. The sort direction must currently
be specified as "asc" in the JSON. [INTERNAL]: This will require that
we patch the view engine as well as the cluster coordinators in Fabric
to follow the specified sort orders. The concepts are straightforward
but the implementation may need some thought to fit into the current
shape of things.


list\_indexes
-------------

List the indexes that exist in a given database.

Keys:

* action - "list\_indexes"


delete\_index
-------------

Delete the specified index from the database.

Keys:

* action - "delete\_index"
* name - string, the index to delete
* design\_doc - string, the design doc id from which to delete the
  index. For auto-generated index names and design docs, you can
  retrieve this information from the `list_indexes` action.

Indexes require resources to maintain. If you find that an index is no
longer necessary then it can be beneficial to remove it from the
database.


describe\_selector
------------------

Shows debugging information for a given selector.

Keys:

* action - "describe\_selector"
* selector - JSON object in selector syntax, described below
* extended (optional) (default: false) - Show information on what
  existing indexes could be used with this selector

This is a useful debugging utility that will show how a given selector
is normalized before execution as well as information on what indexes
could be used to satisfy it.

If `"extended": true` is included then the list of existing indices
that could be used for this selector is also returned.



JSON Syntax Descriptions
========================

This API uses a few defined JSON structures for various
operations. Here we'll describe each in detail.


Selector Syntax
---------------

The Mango query language is expressed as a JSON object describing
documents of interest. Within this structure it is also possible to
express conditional logic using specially named fields. This is
inspired by and intended to maintain a fairly close parity to the
existing MongoDB behavior.

As an example, the simplest selector for Mango might look something like this:

    {"_id": "Paul"}

Which would match the document named "Paul" (if one exists). Extending
this example using other fields might look like this:

    {"_id": "Paul", "location": "Boston"}

This would match a document named "Paul" *AND* having a "location"
value of "Boston". Seeing as I'm sitting in my basement in
Omaha, this is unlikely.

There are two special syntax elements for the object keys in a
selector. The first is that the period (full stop, or simply `.`)
character denotes subfields in a document. For instance, here are two
equivalent examples:

    {"location": {"city": "Omaha"}}
    {"location.city": "Omaha"}

If an object key itself contains a period, the period can be escaped with a backslash, e.g.:

    {"location\\.city": "Omaha"}

Note that the double backslash here is necessary to encode an actual
single backslash.
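
To make the dotted-path and escaping rules concrete, here is a Python sketch that splits a selector key into path segments and follows them into a document. Both helpers (`split_field_path`, `get_path`) and the `MISSING` sentinel are hypothetical; note that the Python literal `"location\\.city"` holds the same single backslash as the JSON example above.

```python
MISSING = object()  # stands for a field that is absent from the document


def split_field_path(key):
    """Split a key on unescaped periods; a backslash escapes the next char."""
    parts, current, i = [], [], 0
    while i < len(key):
        ch = key[i]
        if ch == "\\" and i + 1 < len(key):
            current.append(key[i + 1])  # keep the escaped character literally
            i += 2
            continue
        if ch == ".":
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts


def get_path(doc, key):
    """Follow a (possibly dotted) selector key into a nested document."""
    value = doc
    for part in split_field_path(key):
        if not isinstance(value, dict) or part not in value:
            return MISSING
        value = value[part]
    return value
```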

The second important syntax element is the use of a dollar sign (`$`)
prefix to denote operators. For example:

    {"age": {"$gt": 21}}

In this example, we have created the boolean expression `age > 21`.

There are two core types of operators in the selector syntax:
combination operators and condition operators. In general, combination
operators contain groups of condition operators. We'll describe the
list of each below.

### Implicit Operators

For the most part every operator must be of the form `{"$operator":
argument}`. However, there are two implicit operators for selectors.

First, any JSON object that is not the argument to a condition
operator is an implicit `$and` operator on each field. For instance,
these two examples are identical:

    {"foo": "bar", "baz": true}
    {"$and": [{"foo": {"$eq": "bar"}}, {"baz": {"$eq": true}}]}

And as shown, any field that contains a JSON value that has no
operators in it is an equality condition. For instance, these are
equivalent:

    {"foo": "bar"}
    {"foo": {"$eq": "bar"}}

And to be clear, these are also equivalent:

    {"foo": {"bar": "baz"}}
    {"foo": {"$eq": {"bar": "baz"}}}

However, the previous example would actually be normalized internally to this:

    {"foo.bar": {"$eq": "baz"}}
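
The implicit operators can be expressed as a small rewrite. The following Python sketch (hypothetical `normalize` helper) turns plain values into `$eq` conditions and flattens operator-free nested objects into dotted field names. It always wraps the result in one `$and`, passes top-level `$`-prefixed keys through untouched, and assumes field names contain no literal periods; it illustrates the rules above rather than Mango's internal normalizer.

```python
def is_operator(key):
    return key.startswith("$")


def normalize(selector, prefix=""):
    """Rewrite implicit $and/$eq forms into explicit ones (a sketch)."""
    clauses = []
    for key, value in selector.items():
        if not prefix and is_operator(key):
            clauses.append({key: value})  # e.g. a top-level "$or": kept as-is
            continue
        field = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and any(is_operator(k) for k in value):
            clauses.append({field: value})  # explicit operators already
        elif isinstance(value, dict) and value:
            # Operator-free object: recurse, extending the dotted path.
            clauses.extend(normalize(value, field)["$and"])
        else:
            clauses.append({field: {"$eq": value}})  # implicit equality
    return {"$and": clauses}
```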


### Combination Operators

These operators are responsible for combining groups of condition
operators. Most familiar are the standard boolean operators plus a few
extra for working with JSON arrays.

Each of the combination operators takes a single argument that is either
a condition operator or an array of condition operators.

The list of combination operators:

* "$and" - array argument
* "$or" - array argument
* "$not" - single argument
* "$nor" - array argument
* "$all" - array argument (special operator for array values)
* "$elemMatch" - single argument (special operator for array values)
* "$allMatch" - single argument (special operator for array values)
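
The following Python sketch shows how the combination operators compose. To keep the focus on combination logic it only knows the implicit/`$eq` condition and uses Python's `==` rather than CouchDB's collation; `match` is a hypothetical helper, not Mango's evaluator. `$elemMatch` and `$allMatch` apply their argument to each array element, and here `$allMatch` does not match an empty array (a choice made for this sketch, not a documented rule).

```python
def match(value, cond):
    """Return True if value (a document or array element) satisfies cond."""
    if not isinstance(cond, dict):
        return value == cond  # implicit equality
    for key, arg in cond.items():
        if key == "$and":
            ok = all(match(value, c) for c in arg)
        elif key == "$or":
            ok = any(match(value, c) for c in arg)
        elif key == "$nor":
            ok = not any(match(value, c) for c in arg)
        elif key == "$not":
            ok = not match(value, arg)
        elif key == "$eq":
            ok = value == arg
        elif key == "$all":
            ok = isinstance(value, list) and all(a in value for a in arg)
        elif key == "$elemMatch":
            ok = isinstance(value, list) and any(match(v, arg) for v in value)
        elif key == "$allMatch":
            ok = (isinstance(value, list) and len(value) > 0
                  and all(match(v, arg) for v in value))
        elif key.startswith("$"):
            raise ValueError(f"operator not covered by this sketch: {key}")
        else:
            # A plain key names a field of the current object.
            ok = isinstance(value, dict) and key in value and match(value[key], arg)
        if not ok:
            return False
    return True


doc = {"genre": ["Horror", "Comedy"], "year": 2001, "scores": [7, 9]}
```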

### Condition Operators

Condition operators are specified on a per field basis and apply to
the value indexed for that field. For instance, the basic "$eq"
operator matches when the indexed field is equal to its
argument. There is currently support for the basic equality and
inequality operators as well as a number of meta operators. Some of
these operators will accept any JSON argument while some require a
specific JSON formatted argument. Each is noted below.

The list of condition operators:

(In)equality operators

* "$lt" - any JSON
* "$lte" - any JSON
* "$eq" - any JSON
* "$ne" - any JSON
* "$gte" - any JSON
* "$gt" - any JSON

Object related operators

* "$exists" - boolean, check whether the field exists or not
  regardless of its value
* "$type" - string, check the document field's type

Array related operators

* "$in" - array of JSON values, the document field's value must be one
  of the values in the list provided
* "$nin" - array of JSON values, the document field's value must not be
  any of the values in the list provided
* "$size" - integer, special condition to match the length of an array
  field in a document. Non-array fields cannot match this condition.

Miscellaneous operators

* "$mod" - [Divisor, Remainder], where Divisor and Remainder are both
  positive integers (i.e., greater than 0). Matches documents where
  (field % Divisor == Remainder) is true. This is false for any
  non-integer field.
* "$regex" - string, a regular expression pattern to match against the
  document field. Only matches when the field is a string value and
  matches the supplied pattern.
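
Each condition operator is a test on a single field value. This Python sketch (hypothetical `check` helper) implements the list above with stated simplifications: Python comparisons stand in for CouchDB's JSON collation, so comparing different types simply fails; Python's `re.search` stands in for the server's regular expression engine; `$type` is assumed to take the JSON type names "null", "boolean", "number", "string", "array" and "object"; and every operator except `$exists` fails on a missing field.

```python
import re

MISSING = object()  # stands for a field that is absent from the document


def json_type(value):
    """Name the JSON type of a decoded value (assumed $type vocabulary)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: True is an int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


COMPARISONS = {
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
}


def check(value, op, arg):
    """Evaluate one condition operator against a field value (a sketch)."""
    if op == "$exists":
        return (value is not MISSING) == arg
    if value is MISSING:
        return False
    if op in COMPARISONS:
        try:
            return COMPARISONS[op](value, arg)
        except TypeError:
            return False  # e.g. a string compared with a number
    if op == "$eq":
        return value == arg
    if op == "$ne":
        return value != arg
    if op == "$in":
        return value in arg
    if op == "$nin":
        return value not in arg
    if op == "$size":
        return isinstance(value, list) and len(value) == arg
    if op == "$type":
        return json_type(value) == arg
    if op == "$mod":
        divisor, remainder = arg
        # Integer fields only; bool is excluded because Python treats it as int.
        return (isinstance(value, int) and not isinstance(value, bool)
                and value % divisor == remainder)
    if op == "$regex":
        return isinstance(value, str) and re.search(arg, value) is not None
    raise ValueError(f"unknown condition operator: {op}")
```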


Update Syntax
-------------

Need to describe the syntax for update operators.


Sort Syntax
-----------

The sort syntax is a basic array of field name and direction pairs. It
looks like this:

    [{field1: dir1} | ...]

Where field1 can be any field (dotted notation is available for
sub-document fields) and dir1 can be "asc" or "desc".

Note that it is highly recommended that you specify a single key per
object in your sort ordering so that the order is not dependent on the
combination of JSON libraries between your application and the
internals of Mango's indexing engine.
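
Applying a sort array to documents already in memory is a handy way to check the semantics. The Python sketch below (hypothetical `sort_docs` helper) relies on Python's sort being stable: sorting by the last key first and the first key last yields the combined order. It assumes every sorted field is present, top-level, and mutually comparable, whereas CouchDB's real ordering follows its own JSON collation. Unpacking each entry also enforces the one-key-per-object recommendation.

```python
def sort_docs(docs, sort_spec):
    """Order documents by a sort array such as [{"age": "desc"}, {"n": "asc"}]."""
    result = list(docs)
    # Stable sorts compose: apply the least significant key first.
    for entry in reversed(sort_spec):
        (field, direction), = entry.items()  # ValueError unless exactly one key
        if direction not in ("asc", "desc"):
            raise ValueError(f"unknown sort direction: {direction!r}")
        result.sort(key=lambda doc: doc[field], reverse=direction == "desc")
    return result


docs = [{"n": "a", "age": 30}, {"n": "b", "age": 25}, {"n": "c", "age": 30}]
ordered = sort_docs(docs, [{"age": "desc"}, {"n": "asc"}])
print([d["n"] for d in ordered])  # ['a', 'c', 'b']
```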


Fields Syntax
-------------

When retrieving documents from the database you can specify that only
a subset of the fields are returned. This allows you to limit your
results strictly to the parts of the document that are interesting for
the local application logic. The fields returned are specified as an
array. Unlike MongoDB, only the fields specified are included; there is
no automatic inclusion of the "\_id" or other metadata fields when a
field list is included.

A trivial example:

    ["foo", "bar", "baz"]
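
A Python sketch of this projection rule, using a hypothetical `project` helper. It follows the text in adding nothing implicitly, not even `_id`; treating dotted names as nested subfields (as in selectors) and silently skipping missing fields are assumptions of this sketch, and escaped periods are not handled.

```python
def project(doc, fields):
    """Keep only the listed fields of doc (a sketch)."""
    if fields is None:
        return doc  # null field list: return the whole document
    out = {}
    for name in fields:
        parts = name.split(".")
        value = doc
        for part in parts:
            if not isinstance(value, dict) or part not in value:
                break  # field missing: skip it
            value = value[part]
        else:
            # Field found: rebuild only the path leading to it.
            dst = out
            for part in parts[:-1]:
                dst = dst.setdefault(part, {})
            dst[parts[-1]] = value
    return out


doc = {"_id": "Paul", "foo": 1, "bar": {"x": 2, "y": 3}, "baz": 4}
```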


HTTP API
========

Short summary until the full documentation can be brought over.

POST /dbname/\_find
-------------------------

Issue a query.

Request body is a JSON object that has the selector and the various
options like limit/skip etc. Alternatively we could post the selector
and put the other options into the query string, though I'd probably
prefer to have it all in the body for consistency.

Response is streamed out like a view.
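
A hedged Python sketch of building such a request with the standard library, assuming that the selector and all options travel together in the JSON body. `find_request`, the base URL `http://localhost:5984` and the database name are illustrative; authentication is omitted and the request is only constructed, not sent.

```python
import json
import urllib.parse
import urllib.request


def find_request(base_url, dbname, selector, **options):
    """Build (but do not send) a POST /dbname/_find request."""
    body = dict(options, selector=selector)
    # Quote the database name so characters such as "/" are encoded.
    url = f"{base_url.rstrip('/')}/{urllib.parse.quote(dbname, safe='')}/_find"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = find_request("http://localhost:5984", "movies",
                   {"year": {"$gt": 2000}}, limit=10)
```

Passing `req` to `urllib.request.urlopen` would send it to a running server.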

POST /dbname/\_index
--------------------------

Request body contains the index definition.

Response body is empty and the result is returned as the status code
(200 OK -> created, 3something for exists).

GET /dbname/\_index
-------------------------

Request body is empty.

Response body is all of the indexes that are available for use by find.

DELETE /dbname/\_index/ddocid/viewname
--------------------------------------------

Remove the specified index.

Request body is empty.

Response body is empty. The status code gives enough information.