author | Ben Pfaff <blp@ovn.org> | 2017-12-14 11:17:23 -0800 |
---|---|---|
committer | Ben Pfaff <blp@ovn.org> | 2017-12-14 11:21:42 -0800 |
commit | 12b84d50e0324727a24fc5aa378497e1dbe41821 (patch) | |
tree | 6bbc52f2da260a5049b684cb225b70d7b18663c7 /Documentation/ref | |
parent | adb4185d01ce964db1154df07dbd91c0f90539f7 (diff) | |
ovsdb: Improve documentation.
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Justin Pettit <jpettit@ovn.org>
Diffstat (limited to 'Documentation/ref')
-rw-r--r-- | Documentation/ref/index.rst | 3 |
-rw-r--r-- | Documentation/ref/ovsdb-server.7.rst | 394 |
-rw-r--r-- | Documentation/ref/ovsdb.5.rst | 125 |
-rw-r--r-- | Documentation/ref/ovsdb.7.rst | 454 |
4 files changed, 976 insertions, 0 deletions
diff --git a/Documentation/ref/index.rst b/Documentation/ref/index.rst index 3e2f8d5d9..d83b809f5 100644 --- a/Documentation/ref/index.rst +++ b/Documentation/ref/index.rst @@ -41,6 +41,9 @@ time: ovs-test.8 ovs-vlan-test.8 + ovsdb-server.7 + ovsdb.5 + ovsdb.7 The remainder are still in roff format can be found below: diff --git a/Documentation/ref/ovsdb-server.7.rst b/Documentation/ref/ovsdb-server.7.rst new file mode 100644 index 000000000..cc625f601 --- /dev/null +++ b/Documentation/ref/ovsdb-server.7.rst @@ -0,0 +1,394 @@ +.. + Copyright (c) 2017 Nicira, Inc. + + Licensed under the Apache License, Version 2.0 (the "License"); you may + not use this file except in compliance with the License. You may obtain + a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + License for the specific language governing permissions and limitations + under the License. + + Convention for heading levels in Open vSwitch documentation: + + ======= Heading 0 (reserved for the title in a document) + ------- Heading 1 + ~~~~~~~ Heading 2 + +++++++ Heading 3 + ''''''' Heading 4 + + Avoid deeper levels because they do not render well. + +============ +ovsdb-server +============ + +Description +=========== + +``ovsdb-server`` implements the Open vSwitch Database (OVSDB) protocol +specified in RFC 7047. This document provides clarifications for how +``ovsdb-server`` implements the protocol and describes the extensions that it +provides beyond RFC 7047. Numbers in section headings refer to corresponding +sections in RFC 7047. + +3.1 JSON Usage +-------------- + +RFC 4627 says that names within a JSON object should be unique. +The Open vSwitch JSON parser discards all but the last value +for a name that is specified more than once. 
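As a rough illustration of the duplicate-name rule described just above, Python's standard ``json`` module happens to behave the same way: when a name appears more than once in an object, only the last value is kept. This is not the Open vSwitch JSON parser; the helper name ``parse_last_wins`` and the sample text are made up for this sketch.

```python
import json


def parse_last_wins(text):
    """Parse JSON text with Python's json module.

    For a name that is repeated inside one object, json.loads keeps only
    the last value, which mirrors the behavior described for Open vSwitch.
    """
    return json.loads(text)


# The name "port" appears twice; only the final value survives parsing.
doc = parse_last_wins('{"port": 6632, "port": 6640}')
print(doc)
```

A client that relies on this behavior is fragile, so generating objects with unique names remains the safer choice.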
+ +The definition of <error> allows for implementation extensions. +Currently ``ovsdb-server`` uses the following additional ``error`` +strings (which might change in later releases): + +``syntax error`` or ``unknown column`` + The request could not be parsed as an OVSDB request. An additional + ``syntax`` member, whose value is a string that contains JSON, may narrow + down the particular syntax that could not be parsed. + +``internal error`` + The request triggered a bug in ``ovsdb-server``. + +``ovsdb error`` + A map or set contains a duplicate key. + +``permission error`` + The request was denied by the role-based access control extension, + introduced in version 2.8. + +3.2 Schema Format +----------------- + +RFC 7047 requires the ``version`` field in <database-schema>. Current versions +of ``ovsdb-server`` allow it to be omitted (future versions are likely to +require it). + +RFC 7047 allows columns that contain weak references to be immutable. This +raises the issue of the behavior of the weak reference when the rows that it +references are deleted. Since version 2.6, ``ovsdb-server`` forces columns +that contain weak references to be mutable. + +Since version 2.8, the table name ``RBAC_Role`` is used internally by the +role-based access control extension to ``ovsdb-server`` and should not be used +for purposes other than defining mappings of role names to table access +permissions. This table has one row per role name and the following columns: + +``name`` + The role name. + +``permissions`` + A map of table name to a reference to a row in a separate permission table. + +The separate RBAC permission table has one row per access control +configuration and the following columns: + +``name`` + The name of the table to which the row applies. + +``authorization`` + The set of column names and column:key pairs to be compared with the client + ID in order to determine the authorization status of the requested + operation. 
+ +``insert_delete`` + A boolean value, true if authorized insertions and deletions are allowed, + false if no insertions or deletions are allowed. + +``update`` + The set of columns and column:key pairs for which authorized update and + mutate operations should be permitted. + +4 Wire Protocol +--------------- + +The original OVSDB specifications included the following reasons, omitted from +RFC 7047, to operate JSON-RPC directly over a stream instead of over HTTP: + +* JSON-RPC is a peer-to-peer protocol, but HTTP is a client-server protocol, + which is a poor match. Thus, JSON-RPC over HTTP requires the client to + periodically poll the server to receive server requests. + +* HTTP is more complicated than stream connections and doesn't provide any + corresponding advantage. + +* The JSON-RPC specification for HTTP transport is incomplete. + +4.1.3 Transact +-------------- + +Since version 2.8, role-based access controls can be applied to operations +within a transaction that would modify the contents of the database (these +operations include row insert, row delete, column update, and column +mutate). Role-based access controls are applied when the database schema +contains a table with the name ``RBAC_Role`` and the connection on which the +transaction request was received has an associated role name (from the ``role`` +column in the remote connection table). When role-based access controls are +enabled, transactions that are otherwise well-formed may be rejected depending +on the client's role, ID, and the contents of the ``RBAC_Role`` table and +associated permissions table. + +4.1.5 Monitor +------------- + +For backward compatibility, ``ovsdb-server`` currently permits a single +<monitor-request> to be used instead of an array; it is treated as a +single-element array. Future versions of ``ovsdb-server`` might remove this +compatibility feature. 
+ +Because the <json-value> parameter is used to match subsequent update +notifications (see below) to the request, it must be unique among all active +monitors. ``ovsdb-server`` rejects attempt to create two monitors with the +same identifier. + +4.1.12 Monitor_cond +------------------- + +A new monitor method added in Open vSwitch version 2.6. The ``monitor_cond`` +request enables a client to replicate subsets of tables within an OVSDB +database by requesting notifications of changes to rows matching one of the +conditions specified in ``where`` by receiving the specified contents of these +rows when table updates occur. ``monitor_cond`` also allows a more efficient +update notifications by receiving <table-updates2> notifications (described +below). + +The ``monitor`` method described in Section 4.1.5 also applies to +``monitor_cond``, with the following exceptions: + +* RPC request method becomes ``monitor_cond``. + +* Reply result follows <table-updates2>, described in Section 4.1.14. + +* Subsequent changes are sent to the client using the ``update2`` monitor + notification, described in Section 4.1.14 + +* Update notifications are being sent only for rows matching [<condition>*]. + + +The request object has the following members:: + + "method": "monitor_cond" + "params": [<db-name>, <json-value>, <monitor-cond-requests>] + "id": <nonnull-json-value> + +The <json-value> parameter is used to match subsequent update notifications +(see below) to this request. The <monitor-cond-requests> object maps the name +of the table to an array of <monitor-cond-request>. + +Each <monitor-cond-request> is an object with the following members:: + + "columns": [<column>*] optional + "where": [<condition>*] optional + "select": <monitor-select> optional + +The ``columns``, if present, define the columns within the table to be +monitored that match conditions. If not present, all columns are monitored. 
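The ``monitor_cond`` request layout listed above can be sketched in Python as a plain dictionary ready for JSON serialization. This is only an illustration under stated assumptions: the function name ``build_monitor_cond_request``, the monitor identifier ``example-monitor``, and the ``Bridge`` table with its ``name`` column are chosen for the example, and nothing here opens a connection to ``ovsdb-server``.

```python
import json


def build_monitor_cond_request(db_name, monitor_id, table, columns=None,
                               where=None, select=None, request_id=0):
    """Return a monitor_cond JSON-RPC request as a dict (hypothetical helper).

    Optional members are left out when not given, matching the
    "optional" markers in the member list.
    """
    cond_request = {}
    if columns is not None:
        cond_request["columns"] = list(columns)
    if where is not None:
        cond_request["where"] = list(where)
    if select is not None:
        cond_request["select"] = dict(select)
    return {
        "method": "monitor_cond",
        # <monitor-cond-requests> maps a table name to an array of
        # <monitor-cond-request> objects.
        "params": [db_name, monitor_id, {table: [cond_request]}],
        "id": request_id,
    }


# Example values are assumptions made for this sketch.
request = build_monitor_cond_request(
    "Open_vSwitch", "example-monitor", "Bridge",
    columns=["name"],
    where=[["name", "==", "br0"]],
    select={"initial": True, "modify": False},
    request_id=1,
)
print(json.dumps(request, indent=2))
```

Keeping the monitor identifier as an explicit argument makes it easier for a caller to guarantee that it is unique among its active monitors.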
+ +The ``where``, if present, is a JSON array of <condition> and boolean values. +If not present or condition is an empty array, implicit True will be considered +and updates on all rows will be sent. + +<monitor-select> is an object with the following members:: + + "initial": <boolean> optional + "insert": <boolean> optional + "delete": <boolean> optional + "modify": <boolean> optional + +The contents of this object specify how the columns or table are to be +monitored as explained in more detail below. + +The response object has the following members:: + + "result": <table-updates2> + "error": null + "id": same "id" as request + +The <table-updates2> object is described in detail in Section 4.1.14. It +contains the contents of the tables for which initial rows are selected. If no +tables initial contents are requested, then ``result`` is an empty object. + +Subsequently, when changes to a specified table that match one of the +conditions in <monitor-cond-request> are committed, the changes are +automatically sent to the client using the ``update2`` monitor notification +(see Section 4.1.14). This monitoring persists until the JSON-RPC session +terminates or until the client sends a ``monitor_cancel`` JSON-RPC request. + +Each <monitor-cond-request> specifies one or more conditions and the manner in +which the rows that match the conditions are to be monitored. The +circumstances in which an ``update`` notification is sent for a row within the +table are determined by <monitor-select>: + +* If ``initial`` is omitted or true, every row in the original table that + matches one of the conditions is sent as part of the response to the + ``monitor_cond`` request. + +* If ``insert`` is omitted or true, update notifications are sent for rows + newly inserted into the table that match conditions or for rows modified in + the table so that their old version does not match the condition and new + version does. 
+ +* If ``delete`` is omitted or true, update notifications are sent for rows + deleted from the table that match conditions or for rows modified in the + table so that their old version does match the conditions and new version + does not. + +* If ``modify`` is omitted or true, update notifications are sent whenever a + row in the table that matches conditions in both old and new version is + modified. + +Both ``monitor`` and ``monitor_cond`` sessions can exist concurrently. However, +``monitor`` and ``monitor_cond`` shares the same <json-value> parameter space; +it must be unique among all ``monitor`` and ``monitor_cond`` sessions. + +4.1.13 Monitor_cond_change +-------------------------- + +The ``monitor_cond_change`` request enables a client to change an existing +``monitor_cond`` replication of the database by specifying a new condition and +columns for each replicated table. Currently changing the columns set is not +supported. + +The request object has the following members:: + + "method": "monitor_cond_change" + "params": [<json-value>, <json-value>, <monitor-cond-update-requests>] + "id": <nonnull-json-value> + +The <json-value> parameter should have a value of an existing conditional +monitoring session from this client. The second <json-value> in params array is +the requested value for this session. This value is valid only after +``monitor_cond_change`` is committed. A user can use these values to +distinguish between update messages before conditions update and after. The +<monitor-cond-update-requests> object maps the name of the table to an array of +<monitor-cond-update-request>. Monitored tables not included in +<monitor-cond-update-requests> retain their current conditions. + +Each <monitor-cond-update-request> is an object with the following members:: + + "columns": [<column>*] optional + "where": [<condition>*] optional + +The ``columns`` specify a new array of columns to be monitored, although this +feature is not yet supported. 
+ +The ``where`` specify a new array of conditions to be applied to this +monitoring session. + +The response object has the following members:: + + "result": null + "error": null + "id": same "id" as request + +Subsequent <table-updates2> notifications are described in detail in Section +4.1.14 in the RFC. If insert contents are requested by original monitor_cond +request, <table-updates2> will contain rows that match the new condition and do +not match the old condition. If deleted contents are requested by origin +monitor request, <table-updates2> will contain any matched rows by old +condition and not matched by the new condition. + +Changes according to the new conditions are automatically sent to the client +using the ``update2`` monitor notification. An update, if any, as a result of +a condition change, will be sent to the client before the reply to the +``monitor_cond_change`` request. + +4.1.14 Update2 notification +--------------------------- + +The ``update2`` notification is sent by the server to the client to report +changes in tables that are being monitored following a ``monitor_cond`` request +as described above. The notification has the following members:: + + "method": "update2" + "params": [<json-value>, <table-updates2>] + "id": null + +The <json-value> in ``params`` is the same as the value passed as the +<json-value> in ``params`` for the corresponding ``monitor`` request. +<table-updates2> is an object that maps from a table name to a <table-update2>. +A <table-update2> is an object that maps from row's UUID to a <row-update2> +object. A <row-update2> is an object with one of the following members: + +``"initial": <row>`` + present for ``initial`` updates + +``"insert": <row>`` + present for ``insert`` updates + +``"delete": <row>`` + present for ``delete`` updates + +``"modify": <row>"`` + present for ``modify`` updates + +The format of <row> is described in Section 5.1. + +<row> is always a null object for a ``delete`` update. 
In ``initial`` and +``insert`` updates, <row> omits columns whose values equal the default value of +the column type. + +For a ``modify`` update, <row> contains only the columns that are modified. +<row> stores the difference between the old and new value for those columns, as +described below. + +For columns with single value, the difference is the value of the new column. + +The difference between two sets are all elements that only belong to one of the +sets. + +The difference between two maps are all key-value pairs whose keys appears in +only one of the maps, plus the key-value pairs whose keys appear in both maps +but with different values. For the latter elements, <row> includes the value +from the new column. + +Initial views of rows are not presented in update2 notifications, but in the +response object to the ``monitor_cond`` request. The formatting of the +<table-updates2> object, however, is the same in either case. + +4.1.15 Get Server ID +-------------------- + +A new RPC method added in Open vSwitch version 2.7. The request contains the +following members:: + + "method": "get_server_id" + "params": null + "id": <nonnull-json-value> + +The response object contains the following members:: + + "result": "<server_id>" + "error": null + "id": same "id" as request + +<server_id> is JSON string that contains a UUID that uniquely identifies the +running OVSDB server process. A fresh UUID is generated when the process +restarts. + +5.1 Notation +------------ + +For <condition>, RFC 7047 only allows the use of ``!=``, ``==``, ``includes``, +and ``excludes`` operators with set types. Open vSwitch 2.4 and later extend +<condition> to allow the use of ``<``, ``<=``, ``>=``, and ``>`` operators with +columns with type "set of 0 or 1 integer" and "set of 0 or 1 real". These +conditions evaluate to false when the column is empty, and otherwise as +described in RFC 7047 for integer and real types. 
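The ordering-operator extension described above can be sketched as a small Python function. It is a simplified model, not code from ``ovsdb-server``: ``None`` stands in for an empty "set of 0 or 1" column, and the name ``eval_ordering_condition`` is hypothetical.

```python
import operator
from typing import Optional, Union

Number = Union[int, float]

# Only the four ordering operators covered by the extension are modeled.
_ORDERING_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">=": operator.ge,
    ">": operator.gt,
}


def eval_ordering_condition(column_value: Optional[Number], function: str,
                            argument: Number) -> bool:
    """Evaluate an ordering condition on an optional numeric column.

    column_value is None when the column is empty; per the text above,
    the condition is then false regardless of the operator.
    """
    if function not in _ORDERING_OPS:
        raise ValueError("not an ordering operator: %r" % function)
    if column_value is None:
        return False
    return _ORDERING_OPS[function](column_value, argument)


print(eval_ordering_condition(None, "<", 10))
print(eval_ordering_condition(5, "<", 10))
```

Note that an empty column makes both ``<`` and ``>=`` false, so the two are not logical complements for optional columns.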
+ +<condition> is specified in Section 5.1 in the RFC with the following change: A +condition can be either a 3-element JSON array as described in the RFC or a +boolean value. In case of an empty array an implicit true boolean value will be +considered. + +5.2.6 Wait, 5.2.7 Commit, 5.2.9 Comment +--------------------------------------- + +RFC 7047 says that the ``wait``, ``commit``, and ``comment`` operations have no +corresponding result object. This is not true. Instead, when such an +operation is successful, it yields a result object with no members. diff --git a/Documentation/ref/ovsdb.5.rst b/Documentation/ref/ovsdb.5.rst new file mode 100644 index 000000000..f3e50976b --- /dev/null +++ b/Documentation/ref/ovsdb.5.rst @@ -0,0 +1,125 @@ +.. + Copyright (c) 2017 Nicira, Inc. + + Licensed under the Apache License, Version 2.0 (the "License"); you may + not use this file except in compliance with the License. You may obtain + a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + License for the specific language governing permissions and limitations + under the License. + + Convention for heading levels in Open vSwitch documentation: + + ======= Heading 0 (reserved for the title in a document) + ------- Heading 1 + ~~~~~~~ Heading 2 + +++++++ Heading 3 + ''''''' Heading 4 + + Avoid deeper levels because they do not render well. + +===== +ovsdb +===== + +Description +=========== + +OVSDB, the Open vSwitch Database, is a database system whose network +protocol is specified by RFC 7047. The RFC does not specify an on-disk +storage format. This manpage documents the format used by Open vSwitch. + +Most users do not need to be concerned with this specification. Instead, +to manipulate OVSDB files, refer to `ovsdb-tool(1)`. 
For an +introduction to OVSDB as a whole, read `ovsdb(7)`. + +OVSDB files explicitly record changes that are implied by the database schema. +For example, the OVSDB "garbage collection" feature means that when a client +removes the last reference to a garbage-collected row, the database server +automatically removes that row. The database file explicitly records the +deletion of the garbage-collected row, so that the reader does not need to +infer it. + +OVSDB files do not include the values of ephemeral columns. + +Database files are text files encoded in UTF-8 with LF (U+000A) line ends, +organized as append-only series of records. Each record consists of 2 +lines of text. + +The first line in each record has the format ``OVSDB JSON`` *length* *hash*, +where *length* is a positive decimal integer and *hash* is a SHA-1 checksum +expressed as 40 hexadecimal digits. Words in the first line must be separated +by exactly one space. + +The second line must be exactly *length* bytes long (including the LF) and its +SHA-1 checksum (including the LF) must match *hash* exactly. The line's +contents must be a valid JSON object as specified by RFC 4627. Strings in the +JSON object must be valid UTF-8. To ensure that the second line is exactly one +line of text, the OVSDB implementation expresses any LF characters within a +JSON string as ``\n``. For the same reason, and to save space, the OVSDB +implementation does not "pretty print" the JSON object with spaces and LFs. +(The OVSDB implementation tolerates LFs when reading an OVSDB database file, as +long as *length* and *hash* are correct.) + +JSON Notation +------------- + +We use notation from RFC 7047 here to describe the JSON data in records. +In addition to the notation defined there, we add the following: + +<raw-uuid> + A 36-character JSON string that contains a UUID in the format described by + RFC 4122, e.g. 
``"550e8400-e29b-41d4-a716-446655440000"`` + +Standalone Format +----------------- + +The first record in a standalone database contains the JSON schema for the +database, as specified in RFC 7047. Only this record is mandatory (a +standalone file that contains only a schema represents an empty database). + +The second and subsequent records in a standalone database are transaction +records. Each record may have the following optional special members, +which do not have any semantics but are often useful to administrators +looking through a database log with ``ovsdb-tool show-log``: + +``"_date": <integer>`` + The time at which the transaction was committed, as an integer number of + milliseconds since the Unix epoch. Early versions of OVSDB counted seconds + instead of milliseconds; these can be detected by noticing that their + values are less than 2**32. + + OVSDB always writes a ``_date`` member. + +``"_comment": <string>`` + A JSON string that specifies the comment provided in a transaction + ``comment`` operation. If a transaction has multiple ``comment`` + operations, OVSDB concatenates them into a single ``_comment`` member, + separated by a new-line. + + OVSDB only writes a ``_comment`` member if it would be + a nonempty string. + +Each of these records also has one or more additional members, each of which +maps from the name of a database table to a <table-txn>: + +<table-txn> + A JSON object that describes the effects of a transaction on a database + table. Its names are <raw-uuid>s for rows in the table and its values are + <row-txn>s. + +<row-txn> + Either ``null``, which indicates that the transaction deleted this row, or + a JSON object that describes how the transaction inserted or modified the + row, whose names are the names of columns and whose values are <value>s + that give the column's new value. 
+ + For new rows, the OVSDB implementation omits columns whose values have the + default values for their types defined in RFC 7047 section 5.2.1; for + modified rows, the OVSDB implementation omits columns whose values are + unchanged. diff --git a/Documentation/ref/ovsdb.7.rst b/Documentation/ref/ovsdb.7.rst new file mode 100644 index 000000000..1106c63e2 --- /dev/null +++ b/Documentation/ref/ovsdb.7.rst @@ -0,0 +1,454 @@ +.. + Copyright (c) 2017 Nicira, Inc. + + Licensed under the Apache License, Version 2.0 (the "License"); you may + not use this file except in compliance with the License. You may obtain + a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + License for the specific language governing permissions and limitations + under the License. + + Convention for heading levels in Open vSwitch documentation: + + ======= Heading 0 (reserved for the title in a document) + ------- Heading 1 + ~~~~~~~ Heading 2 + +++++++ Heading 3 + ''''''' Heading 4 + + Avoid deeper levels because they do not render well. + +===== +ovsdb +===== + +Description +=========== + +OVSDB, the Open vSwitch Database, is a network database system. Schemas in +OVSDB specify the tables in a database and their columns' types and can +include data, uniqueness, and referential integrity constraints. OVSDB +offers atomic, consistent, isolated, durable transactions. RFC 7047 +specifies the JSON-RPC based protocol that OVSDB clients and servers use to +communicate. + +The OVSDB protocol is well suited for state synchronization because it +allows each client to monitor the contents of a whole database or a subset +of it. 
Whenever a monitored portion of the database changes, the server +tells the client what rows were added or modified (including the new +contents) or deleted. Thus, OVSDB clients can easily keep track of the +newest contents of any part of the database. + +While OVSDB is general-purpose and not particularly specialized for use with +Open vSwitch, Open vSwitch does use it for multiple purposes. The leading use +of OVSDB is for configuring and monitoring ``ovs-vswitchd(8)``, the Open +vSwitch switch daemon, using the schema documented in +``ovs-vswitchd.conf.db(5)``. The Open Virtual Network (OVN) sub-project of OVS +uses two OVSDB schemas, documented in ``ovn-nb(5)`` and ``ovn-sb(5)``. +Finally, Open vSwitch includes the "VTEP" schema, documented in +``vtep(5)`` that many third-party hardware switches support for +configuring VXLAN, although OVS itself does not directly use this schema. + +The OVSDB protocol specification allows independent, interoperable +implementations of OVSDB to be developed. Open vSwitch includes an OVSDB +server implementation named ``ovsdb-server(1)``, which supports several +protocol extensions documented in its manpage, and a basic command-line OVSDB +client named ``ovsdb-client(1)``, as well as OVSDB client libraries for C and +for Python. Open vSwitch documentation often speaks of these OVSDB +implementations in Open vSwitch as simply "OVSDB," even though that is distinct +from the OVSDB protocol; we make the distinction explicit only when it might +otherwise be unclear from the context. + +In addition to these generic OVSDB server and client tools, Open vSwitch +includes tools for working with databases that have specific schemas: +``ovs-vsctl`` works with the ``ovs-vswitchd`` configuration database, +``vtep-ctl`` works with the VTEP database, ``ovn-nbctl`` works with +the OVN Northbound database, and so on. + +RFC 7047 specifies the OVSDB protocol but it does not specify an on-disk +storage format. 
Open vSwitch includes ``ovsdb-tool(1)`` for working with its +own on-disk database formats. The most notable feature of this format is that +``ovsdb-tool(1)`` makes it easy for users to print the transactions that have +changed a database since the last time it was compacted. This feature is often +useful for troubleshooting. + +Schemas +======= + +Schemas in OVSDB have a JSON format that is specified in RFC 7047. They +are often stored in files with an extension ``.ovsschema``. An +on-disk database in OVSDB includes a schema and data, embedding both into a +single file. The Open vSwitch utility ``ovsdb-tool`` has commands +that work with schema files and with the schemas embedded in database +files. + +An Open vSwitch schema has three important identifiers. The first is its +name, which is also the name used in JSON-RPC calls to identify a database +based on that schema. For example, the schema used to configure Open +vSwitch has the name ``Open_vSwitch``. Schema names begin with a +letter or an underscore, followed by any number of letters, underscores, or +digits. The ``ovsdb-tool`` commands ``schema-name`` and +``db-name`` extract the schema name from a schema or database +file, respectively. + +An OVSDB schema also has a version of the form ``x.y.z`` e.g. ``1.2.3``. +Schemas managed within the Open vSwitch project manage version numbering in the +following way (but OVSDB does not mandate this approach). Whenever we change +the database schema in a non-backward compatible way (e.g. when we delete a +column or a table), we increment <x> and set <y> and <z> to 0. When we change +the database schema in a backward compatible way (e.g. when we add a new +column), we increment <y> and set <z> to 0. When we change the database schema +cosmetically (e.g. we reindent its syntax), we increment <z>. The +``ovsdb-tool`` commands ``schema-version`` and ``db-version`` extract the +schema version from a schema or database file, respectively. 
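The version numbering policy in the preceding paragraph can be written down as a short Python sketch. The function name ``bump_schema_version`` and the three change labels are assumptions for illustration; OVSDB itself does not mandate this policy, and no Open vSwitch tool is claimed to work this way.

```python
def bump_schema_version(version, change):
    """Return the next x.y.z version for a given kind of schema change.

    change is one of (labels invented for this sketch):
      "incompatible" - e.g. a column or table was deleted
      "compatible"   - e.g. a new column was added
      "cosmetic"     - e.g. the schema was reindented
    """
    x, y, z = (int(part) for part in version.split("."))
    if change == "incompatible":
        return "%d.0.0" % (x + 1)
    if change == "compatible":
        return "%d.%d.0" % (x, y + 1)
    if change == "cosmetic":
        return "%d.%d.%d" % (x, y, z + 1)
    raise ValueError("unknown kind of change: %r" % change)


for kind in ("incompatible", "compatible", "cosmetic"):
    print(kind, bump_schema_version("1.2.3", kind))
```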
+ +Very old OVSDB schemas do not have a version, but RFC 7047 mandates it. + +An OVSDB schema optionally has a "checksum." RFC 7047 does not specify the use +of the checksum and recommends that clients ignore it. Open vSwitch uses the +checksum to remind developers to update the version: at build time, if the +schema's embedded checksum, ignoring the checksum field itself, does not match +the schema's content, then it fails the build with a recommendation to update +the version and the checksum. Thus, a developer who changes the schema, but +does not update the version, receives an automatic reminder. In practice this +has been an effective way to ensure compliance with the version number policy. +The ``ovsdb-tool`` commands ``schema-cksum`` and ``db-cksum`` extract the +schema checksum from a schema or database file, respectively. + +Service Models +============== + +OVSDB supports two service models for databases: **standalone**, and +**active-backup**. The service models provide different compromises +among consistency and availability. + +RFC 7047, which specifies the OVSDB protocol, does not mandate or specify +any particular service model. + +The following sections describe the individual service models. + +Standalone Database Service Model +--------------------------------- + +A **standalone** database runs a single server. If the server stops running, +the database becomes inaccessible, and if the server's storage is lost or +corrupted, the database's content is lost. This service model is appropriate +when the database controls a process or activity to which it is linked via +"fate-sharing." For example, an OVSDB instance that controls an Open vSwitch +virtual switch daemon, ``ovs-vswitchd``, is a standalone database because a +server failure would take out both the database and the virtual switch. + +To set up a standalone database, use ``ovsdb-tool create`` to +create a database file, then run ``ovsdb-server`` to start the +database service. 
+ +Active-Backup Database Service Model +------------------------------------ + +An **active-backup** database runs two servers (on different hosts). At any +given time, one of the servers is designated with the **active** role and the +other the **backup** role. An active server behaves just like a standalone +server. A backup server makes an OVSDB connection to the active server and +uses it to continuously replicate its content as it changes in real time. +OVSDB clients can connect to either server but only the active server allows +data modification or lock transactions. + +Setup for an active-backup database starts from a working standalone database +service, which is initially the active server. On another node, to set up a +backup server, create a database file with the same schema as the active +server. The initial contents of the database file do not matter, as long as +the schema is correct, so ``ovsdb-tool create`` will work, as will copying the +database file from the active server. Then use +``ovsdb-server --sync-from=<active>`` to start the backup server, where +<active> is an OVSDB connection method (see `Connection Methods`_ below) that +connects to the active server. At that point, the backup server will fetch a +copy of the active database and keep it up-to-date until it is killed. + +When the active server in an active-backup server pair fails, an administrator +can switch the backup server to an active role with the ``ovs-appctl`` command +``ovsdb-server/disconnect-active-ovsdb-server``. Clients then have read/write +access to the now-active server. Of course, administrators are slow to respond +compared to software, so in practice external management software detects the +active server's failure and changes the backup server's role. For example, the +"Integration Guide for Centralized Control" in the Open vSwitch documentation +describes how to use Pacemaker for this purpose in OVN. 
+ +Suppose an active server fails and its backup is promoted to active. If the +failed server is revived, it must be started as a backup server. Otherwise, if +both servers are active, then they may start out of sync, if the database +changed while the server was down, and they will continue to diverge over time. +This also happens if the software managing the database servers cannot reach +the active server and therefore switches the backup to active, but other hosts +can reach both servers. These "split-brain" problems are unsolvable in general +for server pairs. + +Compared to a standalone server, the active-backup service model +somewhat increases availability, at a risk of split-brain. It adds +generally insignificant performance overhead. + +Open vSwitch 2.6 introduced support for the active-backup service model. + +Database Replication +==================== + +OVSDB can layer **replication** on top of any of its service models. +Replication, in this context, means to make, and keep up-to-date, a read-only +copy of the contents of a database (the ``replica``). One use of replication +is to keep an up-to-date backup of a database. A replica used solely for +backup would not need to support clients of its own. A set of replicas that do +serve clients could be used to scale out read access to the primary database. + +A database replica is set up in the same way as a backup server in an +active-backup pair, with the difference that the replica is never promoted to +an active role. + +A database can have multiple replicas. + +Open vSwitch 2.6 introduced support for database replication. + +Connection Methods +================== + +An OVSDB **connection method** is a string that specifies how to make a +JSON-RPC connection between an OVSDB client and server. Connection methods are +part of the Open vSwitch implementation of OVSDB and not specified by RFC 7047. 
+``ovsdb-server`` uses connection methods to specify how it should listen for +connections from clients and ``ovsdb-client`` uses them to specify how it +should connect to a server. Connections in the opposite direction, where +``ovsdb-server`` connects to a client that is configured to listen for an +incoming connection, are also possible. + +Connection methods are classified as **active** or **passive**. An active +connection method makes an outgoing connection to a remote host; a passive +connection method listens for connections from remote hosts. The most common +arrangement is to configure an OVSDB server with passive connection methods and +clients with active ones, but the OVSDB implementation in Open vSwitch supports +the opposite arrangement as well. + +OVSDB supports the following active connection methods: + +ssl:<ip>:<port> + The specified SSL or TLS <port> on the host at the given <ip>. + +tcp:<ip>:<port> + The specified TCP <port> on the host at the given <ip>. + +unix:<file> + On Unix-like systems, connect to the Unix domain server socket named + <file>. + + On Windows, connect to a local named pipe that is represented by a file + created in the path <file> to mimic the behavior of a Unix domain socket. + +OVSDB supports the following passive connection methods: + +pssl:<port>[:<ip>] + Listen on the given TCP <port> for SSL or TLS connections. By default, + connections are not bound to a particular local IP address. Specifying + <ip> accepts only connections made to that local IP address. + +ptcp:<port>[:<ip>] + Listen on the given TCP <port>. By default, connections are not bound to a + particular local IP address. Specifying <ip> accepts only connections + made to that local IP address. + +punix:<file> + On Unix-like systems, listen for connections on the Unix domain socket + named <file>. + + On Windows, listen on a local named pipe, creating a named pipe + <file> to mimic the behavior of a Unix domain socket.
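The active/passive split listed above can be sketched in a few lines of Python. This is an illustrative classifier only, not the parser Open vSwitch uses: the function name ``classify`` and the example addresses are assumptions, and the text after the first colon is returned unparsed.

```python
# Scheme names taken from the two lists above.
ACTIVE_SCHEMES = {"ssl", "tcp", "unix"}
PASSIVE_SCHEMES = {"pssl", "ptcp", "punix"}


def classify(method):
    """Split a connection method into (kind, scheme, rest).

    `kind` is "active" or "passive"; `rest` is everything after the first
    colon, left unparsed.
    """
    scheme, sep, rest = method.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in connection method {method!r}")
    if scheme in ACTIVE_SCHEMES:
        return "active", scheme, rest
    if scheme in PASSIVE_SCHEMES:
        return "passive", scheme, rest
    raise ValueError(f"unknown connection method scheme {scheme!r}")


if __name__ == "__main__":
    # Hypothetical addresses and socket path, for illustration only.
    for example in ("tcp:192.0.2.1:6640", "pssl:6640", "punix:/tmp/db.sock"):
        print(example, "->", classify(example))
```

Keeping the scheme sets in one place makes the usual arrangement easy to check: a server configuration should normally contain passive methods and a client configuration active ones.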
+ +All IP-based connection methods accept IPv4 and IPv6 addresses. To specify an +IPv6 address, wrap it in square brackets, e.g. ``ssl:[::1]:6640``. Passive +IP-based connection methods by default listen for IPv4 connections only; use +``[::]`` as the address to accept both IPv4 and IPv6 connections, +e.g. ``pssl:6640:[::]``. DNS names are not accepted. On Linux, use +``%<device>`` to designate a scope for IPv6 link-level addresses, +e.g. ``ssl:[fe80::1234%eth0]:6653``. + +The <port> may be omitted from connection methods that use a port number. The +default <port> for TCP-based connection methods is 6640, e.g. ``pssl:`` is +equivalent to ``pssl:6640``. In Open vSwitch prior to version 2.4.0, the +default port was 6632. To avoid incompatibility between older and newer +versions, we encourage users to specify a port number. + +The ``ssl`` and ``pssl`` connection methods require additional configuration +through ``--private-key``, ``--certificate``, and ``--ca-cert`` command line +options. Open vSwitch can be built without SSL support, in which case these +connection methods are not supported. + +Database Life Cycle +=================== + +This section describes how to handle various events in the life cycle of +a database using the Open vSwitch implementation of OVSDB. + +Creating a Database +------------------- + +Creating and starting up the service for a new database was covered +separately for each database service model in the `Service +Models`_ section, above. + +Backing Up and Restoring a Database +----------------------------------- + +OVSDB is often used in contexts where the database contents are not +particularly valuable. For example, in many systems, the database for +configuring ``ovs-vswitchd`` is essentially rebuilt from scratch +at boot time. It is not worthwhile to back up these databases. + +When OVSDB is used for valuable data, a backup strategy is worth +considering.
One way is to use database replication, discussed above in +`Database Replication`_, which keeps an online, up-to-date +copy of a database, possibly on a remote system. This works with all OVSDB +service models. + +A more common backup strategy is to periodically take and store a snapshot. +For the standalone and active-backup service models, making a copy of the +database file, e.g. using ``cp``, effectively makes a snapshot, and because +OVSDB database files are append-only, it works even if the database is being +modified when the snapshot takes place. + +To restore from a backup, stop the database server or servers, overwrite +the database file with the backup (e.g. with ``cp``), and then +restart the servers. + +None of these approaches saves and restores data in columns that the schema +designates as ephemeral. This is by design: the designer of a schema only +marks a column as ephemeral if it is acceptable for its data to be lost +when a database server restarts. + +Upgrading or Downgrading a Database +----------------------------------- + +The evolution of a piece of software can require changes to the schemas of the +databases that it uses. For example, new features might require new tables or +new columns in existing tables, or conceptual changes might require a database +to be reorganized in other ways. In some cases, the easiest way to deal with a +change in a database schema is to delete the existing database and start fresh +with the new schema, especially if the data in the database is easy to +reconstruct. But in many other cases, it is better to convert the database +from one schema to another. + +The OVSDB implementation in Open vSwitch has built-in support for some simple +cases of converting a database from one schema to another. This support can +handle changes that add or remove database columns or tables or that eliminate +constraints (for example, changing a column that must have exactly one value +into one that has one or more values).
It can also handle changes that add +constraints or make them stricter, but only if the existing data in the +database satisfies the new constraints (for example, changing a column that has +one or more values into a column with exactly one value, if every row in the +column has exactly one value). The built-in conversion can cause data loss in +obvious ways, for example if the new schema removes tables or columns, or +indirectly, for example by deleting unreferenced rows in tables that the new +schema marks for garbage collection. + +Converting a database can lose data, so it is wise to make a backup beforehand. + +To use OVSDB's built-in support for schema conversion with a standalone or +active-backup database, first stop the database server or servers, then use +``ovsdb-tool convert`` to convert it to the new schema, and then restart the +database server. + +Schema versions and checksums (see Schemas_ above) can give hints about whether +a database needs to be converted to a new schema. If there is any question, +though, the ``needs-conversion`` command on ``ovsdb-tool`` can provide a +definitive answer. + +Working with Database History +----------------------------- + +Both on-disk database formats that OVSDB supports are organized as a stream of +transaction records. Each record describes a change to the database as a list +of rows that were inserted or deleted or modified, along with the details. +Therefore, in normal operation, a database file only grows, as each change +causes another record to be appended at the end. Usually, a user has no need +to understand this file structure. This section covers some exceptions. + +Compacting Databases +-------------------- + +If OVSDB database files were truly append-only, then over time they would grow +without bound. 
To avoid this problem, OVSDB can **compact** a database file, +that is, replace it by a new version that contains only the current database +contents, as if it had been inserted by a single transaction. From time to +time, ``ovsdb-server`` automatically compacts a database that grows much larger +than its minimum size. + +Because ``ovsdb-server`` automatically compacts databases, it is usually not +necessary to compact them manually, but OVSDB still offers a few ways to do it. +First, ``ovsdb-tool compact`` can compact a standalone or active-backup +database that is not currently being served by ``ovsdb-server`` (or otherwise +locked for writing by another process). To compact any database that is +currently being served by ``ovsdb-server``, use ``ovs-appctl`` to send the +``ovsdb-server/compact`` command. Each server in an active-backup database +maintains its database file independently, so to compact all of them, issue +this command separately on each server. + +Viewing History +--------------- + +The ``ovsdb-tool`` utility's ``show-log`` command displays the transaction +records in an OVSDB database file in a human-readable format. By default, it +shows minimal detail, but adding the option ``-m`` once or twice increases the +level of detail. In addition to the transaction data, it shows the time and +date of each transaction and any "comment" added to the transaction by the +client. The comments can be helpful for quickly understanding a transaction; +for example, ``ovs-vsctl`` adds its command line to the transactions that it +makes. + +For active-backup databases, the sequence of transactions in each server's log +will differ, even at points when they reflect the same data. + +Truncating History +------------------ + +It may occasionally be useful to "roll back" a database file to an earlier +point. Because of the organization of OVSDB records, this is easy to do. 
+Start by noting the record number <i> of the first record to delete in +``ovsdb-tool show-log`` output. Each record is two lines of plain text, so +trimming the log is as simple as running ``head -n <j>``, where <j> = 2 * <i>. + +Corruption +---------- + +When ``ovsdb-server`` opens an OVSDB database file, of any kind, it reads as +many transaction records as it can from the file until it reaches the end of +the file or it encounters a corrupted record. At that point it stops reading +and regards the data that it has read to this point as the full contents of the +database file, effectively rolling the database back to an earlier point. + +Each transaction record contains an embedded SHA-1 checksum, which the server +verifies as it reads a database file. It detects corruption when a checksum +fails to verify. Even though SHA-1 is no longer considered secure for use in +cryptography, it is acceptable for this purpose because it is not used to +defend against malicious attackers. + +The first record in a standalone or active-backup database file specifies the +schema. ``ovsdb-server`` will refuse to work with a database whose first +record is corrupted. Delete and recreate such a database, or restore it from a +backup. + +When ``ovsdb-server`` adds records to a database file in which it detected +corruption, it first truncates the file just after the last good record. + +See Also +======== + +RFC 7047, "The Open vSwitch Database Management Protocol." + +Open vSwitch implementations of generic OVSDB functionality: +``ovsdb-server(1)``, ``ovsdb-client(1)``, ``ovsdb-tool(1)``. + +Tools for working with databases that have specific OVSDB schemas: +``ovs-vsctl(8)``, ``vtep-ctl(8)``, ``ovn-nbctl(8)``, ``ovn-sbctl(8)``. + +OVSDB schemas for Open vSwitch and related functionality: +``ovs-vswitchd.conf.db(5)``, ``vtep(5)``, ``ovn-nb(5)``, ``ovn-sb(5)``. |