This schema specifies relations that a VTEP can use to integrate physical ports into logical switches maintained by a network virtualization controller such as NSX.

Glossary:

VTEP
VXLAN Tunnel End Point, an entity which originates and/or terminates VXLAN tunnels.
HSC
Hardware Switch Controller.
NVC
Network Virtualization Controller, e.g. NSX.
VRF
Virtual Routing and Forwarding instance.

Common Column

Some tables contain a column named other_config. This column has the same form and purpose in each place that it appears, so we describe it here to save space later.

other_config: map of string-string pairs

Key-value pairs for configuring rarely used or proprietary features.

Some tables do not have an other_config column because no key-value pairs have yet been defined for them.

Top-level configuration for a hardware VTEP. There must be exactly one record in the table.

The physical switch or switches managed by the VTEP.

When a physical switch integrates support for this VTEP schema, which is expected to be the most common case, this column should point to one record that represents the switch itself. In another possible implementation, a server or a VM presents a VTEP schema front-end interface to one or more physical switches, presumably communicating with those physical switches over a proprietary protocol. In that case, this column would point to one record for each physical switch, and the set might change over time as the front-end server comes to represent a differing set of switches.

These columns primarily configure the database server (ovsdb-server), not the hardware VTEP itself.

Database clients to which the database server should connect or to which it should listen, along with options for how these connections should be configured. See the database connection table below for more information.

The overall purpose of this column is described under Common Column at the beginning of this document.

Configuration for a database connection to an Open vSwitch Database (OVSDB) client.

The database server can initiate and maintain active connections to remote clients. It can also listen for database connections.

Connection method for managers.

The following connection methods are currently supported:

ssl:ip[:port]

The specified SSL port (default: 6640) on the host at the given ip, which must be expressed as an IP address (not a DNS name).

SSL key and certificate configuration happens outside the database.

tcp:ip[:port]

The specified TCP port (default: 6640) on the host at the given ip, which must be expressed as an IP address (not a DNS name).

pssl:[port][:ip]

Listens for SSL connections on the specified TCP port (default: 6640). If ip, which must be expressed as an IP address (not a DNS name), is specified, then connections are restricted to the specified local IP address.

ptcp:[port][:ip]

Listens for connections on the specified TCP port (default: 6640). If ip, which must be expressed as an IP address (not a DNS name), is specified, then connections are restricted to the specified local IP address.
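
The four connection methods above share a simple grammar. The following is a minimal sketch of a parser for it, assuming IPv4 addresses only. The names parse_target and Target are hypothetical helpers introduced for illustration; they are not part of the schema or of any OVSDB library.

```python
import ipaddress
from typing import NamedTuple, Optional

DEFAULT_PORT = 6640  # default port for all four methods, per the text above


class Target(NamedTuple):
    method: str          # "ssl", "tcp", "pssl" or "ptcp"
    ip: Optional[str]    # None means no local address restriction
    port: int
    passive: bool        # True for the listening methods (pssl, ptcp)


def parse_target(target: str) -> Target:
    """Parse a connection method string (illustrative, IPv4 only)."""
    method, sep, rest = target.partition(":")
    if not sep or method not in ("ssl", "tcp", "pssl", "ptcp"):
        raise ValueError(f"unsupported connection method: {target!r}")
    passive = method in ("pssl", "ptcp")
    if passive:
        # pssl:[port][:ip] and ptcp:[port][:ip]
        port_text, _, ip_text = rest.partition(":")
    else:
        # ssl:ip[:port] and tcp:ip[:port]
        ip_text, _, port_text = rest.partition(":")
        if not ip_text:
            raise ValueError("active methods require an IP address")
    if ip_text:
        # Raises ValueError for DNS names, which are not allowed.
        ipaddress.IPv4Address(ip_text)
    port = int(port_text) if port_text else DEFAULT_PORT
    if not 0 < port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return Target(method, ip_text or None, port, passive)
```

For example, parse_target("ptcp:") yields a passive target on port 6640 with no address restriction, while parse_target("tcp:example.com") raises ValueError because a DNS name is not an IP address.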

Maximum number of milliseconds to wait between connection attempts. Default is implementation-specific.

Maximum number of milliseconds of idle time on connection to the client before sending an inactivity probe message. If the Open vSwitch database does not communicate with the client for the specified number of milliseconds, it will send a probe. If a response is not received for the same additional amount of time, the database server assumes the connection has been broken and attempts to reconnect. Default is implementation-specific. A value of 0 disables inactivity probes.

true if currently connected to this manager, false otherwise.

A human-readable description of the last error on the connection to the manager; i.e. strerror(errno). This key will exist only if an error has occurred.

The state of the connection to the manager:

VOID
Connection is disabled.
BACKOFF
Attempting to reconnect at an increasing period.
CONNECTING
Attempting to connect.
ACTIVE
Connected, remote host responsive.
IDLE
Connection is idle. Waiting for response to keep-alive.

These values may change in the future. They are provided only for human consumption.

The amount of time since this manager last successfully connected to the database (in seconds). Value is empty if manager has never successfully connected.

The amount of time since this manager last disconnected from the database (in seconds). Value is empty if manager has never disconnected.

Space-separated list of the names of OVSDB locks that the connection holds. Omitted if the connection does not hold any locks.

Space-separated list of the names of OVSDB locks that the connection is currently waiting to acquire. Omitted if the connection is not waiting for any locks.

Space-separated list of the names of OVSDB locks that the connection has had stolen by another OVSDB client. Omitted if no locks have been stolen from this connection.

When the connection method listens for inbound connections (e.g. ptcp: or pssl:) and more than one connection is actually active, the value is the number of active connections. Otherwise, this key-value pair is omitted.

When multiple connections are active, status columns and key-value pairs (other than this one) report the status of one arbitrarily chosen connection.

Additional configuration for a connection between the manager and the database server.

The Differentiated Service Code Point (DSCP) is specified using 6 bits in the Type of Service (TOS) field in the IP header. DSCP provides a mechanism to classify the network traffic and provide Quality of Service (QoS) on IP networks. The DSCP value specified here is used when establishing the connection between the manager and the database server. If no value is specified, a default value of 48 is chosen. Valid DSCP values must be in the range 0 to 63.
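
Because DSCP occupies the upper 6 bits of the TOS byte, a DSCP value is shifted left by two bits before it is written to a socket. The sketch below illustrates that arithmetic. The function names are hypothetical, and apply_dscp assumes an IPv4 socket on a platform whose Python socket module exposes IP_TOS.

```python
import socket

DEFAULT_DSCP = 48  # default stated in the text above


def dscp_to_tos(dscp: int = DEFAULT_DSCP) -> int:
    """Return the TOS byte that carries the given DSCP value."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0 to 63")
    # DSCP is the upper 6 bits; the lower 2 bits (ECN) are left at zero.
    return dscp << 2


def apply_dscp(sock: socket.socket, dscp: int = DEFAULT_DSCP) -> None:
    """Mark an IPv4 socket with the given DSCP (illustrative only)."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
```

With the default of 48, the resulting TOS byte is 192 (0xC0).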

A physical switch that implements a VTEP.

The physical ports within the switch.

Tunnels created by this switch as instructed by the NVC.

IPv4 or IPv6 addresses at which the switch may be contacted for management purposes.

IPv4 or IPv6 addresses on which the switch may originate or terminate tunnels.

This column is intended to allow a manager to determine the physical switch that terminates the tunnel represented by a physical locator.

Symbolic name for the switch, such as its hostname.

An extended description for the switch, such as its switch login banner.

An entry in this column indicates to the NVC that this switch has encountered a fault. The switch must clear this column when the fault has been cleared.

Indicates that the switch has been unable to process MAC entries requested by the NVC due to lack of table resources.

Indicates that the switch has been unable to create tunnels requested by the NVC due to lack of resources.

Indicates that the switch has been unable to create the logical router interfaces requested by the NVC due to conflicting configurations or a lack of hardware resources.

Indicates that the switch has been unable to create the static routes requested by the NVC due to conflicting configurations or a lack of hardware resources.

Indicates that the switch has been unable to create the logical router requested by the NVC due to conflicting configurations or a lack of hardware resources.

Indicates that the switch does not support logical routing.

Indicates that an error has occurred in the switch but that no more specific information is available.

Indicates that the requested source node replication mode cannot be supported by the physical switch; that is, the physical switch lacks the capability to support source node replication mode. This error occurs when a controller attempts to set source node replication mode for one of the logical switches that the physical switch is keeping context for. An NVC that observes this error should take appropriate action (for example, reverting the logical switch to service node replication mode). It is recommended that an NVC be proactive and test for support of source node replication by creating a test logical switch on VTEP physical switch nodes, trying to change its replication mode to source node, and checking for an error. The NVC could remember this capability per VTEP physical switch. Using mixed replication modes on a given logical switch is not recommended. Service node replication mode is considered a basic requirement since it only requires sending a packet to a single transport node, hence it is not expected that a switch should report that service node mode cannot be supported.

The overall purpose of this column is described under Common Column at the beginning of this document.

A tunnel created by a physical switch.

Tunnel end-point local to the physical switch.

Tunnel end-point remote to the physical switch.

BFD, defined in RFC 5880, allows point to point detection of connectivity failures by occasional transmission of BFD control messages. VTEPs are expected to implement BFD.

BFD operates by regularly transmitting BFD control messages at a rate negotiated independently in each direction. Each endpoint specifies the rate at which it expects to receive control messages, and the rate at which it's willing to transmit them. An endpoint which fails to receive BFD control messages for a period of three times the expected reception interval will signal a connectivity fault. In the case of a unidirectional connectivity issue, the system not receiving BFD control messages will signal the problem to its peer in the messages it transmits.
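
The timing described above can be sketched as follows. This is an illustration only: it assumes that the interval negotiated for one direction is the larger of the receiver's minimum receive interval and the sender's minimum transmit interval, in the manner of RFC 5880, and the function names are hypothetical.

```python
DETECT_MULTIPLIER = 3  # "three times", per the paragraph above


def negotiated_rx_interval_ms(local_min_rx_ms: int, remote_min_tx_ms: int) -> int:
    """Interval at which the local endpoint expects control messages.

    The sender cannot go faster than its own minimum transmit interval,
    and must not go faster than the receiver is willing to accept.
    """
    return max(local_min_rx_ms, remote_min_tx_ms)


def detection_time_ms(local_min_rx_ms: int, remote_min_tx_ms: int) -> int:
    """Silence period after which a connectivity fault is signaled."""
    return DETECT_MULTIPLIER * negotiated_rx_interval_ms(
        local_min_rx_ms, remote_min_tx_ms
    )
```

For instance, a receiver that offers 1000 ms paired with a sender willing to transmit every 100 ms settles on 1000 ms, so a fault is signaled after 3000 ms without control messages.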

A hardware VTEP is expected to use BFD to determine reachability of devices at the end of the tunnels with which it exchanges data. This can enable the VTEP to choose a functioning service node among a set of service nodes providing high availability. It also enables the NVC to report the health status of tunnels.

In many cases the BFD peer of a hardware VTEP will be an Open vSwitch instance. The Open vSwitch implementation of BFD aims to comply faithfully with the requirements put forth in RFC 5880. Open vSwitch does not implement the optional Authentication or ``Echo Mode'' features.

The HSC writes the key-value pairs in this column to specify the local configuration to be used for BFD sessions on this tunnel.

Set to an Ethernet address in the form xx:xx:xx:xx:xx:xx to set the MAC expected as destination for received BFD packets. The default is 00:23:20:00:00:01.

Set to an IPv4 address to set the IP address that is expected as destination for received BFD packets. The default is 169.254.1.0.

This column is the remote counterpart of the local BFD configuration column. The NVC writes the key-value pairs in this column.

Set to an Ethernet address in the form xx:xx:xx:xx:xx:xx to set the destination MAC to be used for transmitted BFD packets. The default is 00:23:20:00:00:01.

Set to an IPv4 address to set the IP address used as destination for transmitted BFD packets. The default is 169.254.1.1.

The NVC sets up key-value pairs in this column to enable and configure BFD.

True to enable BFD on this tunnel. If not specified, BFD will not be enabled by default.

The shortest interval, in milliseconds, at which this BFD session offers to receive BFD control messages. The remote endpoint may choose to send messages at a slower rate. Defaults to 1000.

The shortest interval, in milliseconds, at which this BFD session is willing to transmit BFD control messages. Messages will actually be transmitted at a slower rate if the remote endpoint is not willing to receive as quickly as specified. Defaults to 100.

An alternate receive interval, in milliseconds, that must be greater than or equal to the minimum receive interval. The implementation should switch from the minimum receive interval to this alternate interval when there is no obvious incoming data traffic at the tunnel, to reduce the CPU and bandwidth cost of monitoring an idle tunnel. This feature may be disabled by setting a value of 0. This feature is reset whenever either interval changes.

When true, traffic received on the tunnel is used to indicate the capability of packet I/O. BFD control packets are still transmitted and received. At least one BFD control packet must be received in every period of 100 times the minimum receive interval. Otherwise, even if traffic is received, the forwarding status will be false.

Set to true to notify the remote endpoint that traffic should not be forwarded to this system for some reason other than a connectivity failure on the interface being monitored. The typical underlying reason is ``concatenated path down,'' that is, that connectivity beyond the local system is down. Defaults to false.

Set to true to make BFD accept only control messages with a tunnel key of zero. By default, BFD accepts control messages with any tunnel key.

The VTEP sets key-value pairs in this column to report the status of BFD on this tunnel. When BFD is not enabled, the HSC clears all key-value pairs from this column.

Set to true if the BFD session has been successfully enabled. Set to false if the VTEP cannot support BFD or has insufficient resources to enable BFD on this tunnel. The NVC will disable the BFD monitoring on the other side of the tunnel once this value is set to false.

Reports the state of the BFD session. The BFD session is fully healthy and negotiated if UP.

Reports whether the BFD session believes this tunnel may be used to forward traffic. Typically this means the local session is signaling UP, and the remote system isn't signaling a problem such as concatenated path down.

A diagnostic code specifying the local system's reason for the last change in session state. The error messages are defined in section 4.1 of [RFC 5880].

Reports the state of the remote endpoint's BFD session.

A diagnostic code specifying the remote system's reason for the last change in session state. The error messages are defined in section 4.1 of [RFC 5880].

A short message providing further information about the BFD status (possibly including reasons why BFD could not be enabled).

A port within a physical switch.

Identifies how VLANs on the physical port are bound to logical switches. If, for example, the map contains a (VLAN, logical switch) pair, a packet that arrives on the port in the VLAN is considered to belong to the paired logical switch. A value of zero in the VLAN field means that untagged traffic on the physical port is mapped to the logical switch.
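
The VLAN binding lookup described above can be sketched as a dictionary lookup in which key 0 stands for untagged traffic. The function name and the logical switch names below are hypothetical; a real implementation would reference logical switch records rather than strings.

```python
from typing import Mapping, Optional

UNTAGGED = 0  # VLAN key that denotes untagged traffic, per the text above


def logical_switch_for(
    vlan_bindings: Mapping[int, str], vlan_tag: Optional[int]
) -> Optional[str]:
    """Return the logical switch bound to a packet's VLAN, if any.

    vlan_tag is None for a packet that arrived without a VLAN tag.
    """
    key = UNTAGGED if vlan_tag is None else vlan_tag
    return vlan_bindings.get(key)


# Illustrative bindings for one physical port.
bindings = {0: "ls-untagged", 100: "ls-web"}
```

Here a packet tagged with VLAN 100 belongs to "ls-web", an untagged packet belongs to "ls-untagged", and a packet in VLAN 200 has no binding.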

Attach Access Control Lists (ACLs) to the physical port. The column consists of a map of VLAN tags to ACLs. If the value of the VLAN tag in the map is 0, this means that the ACL is associated with the entire physical port. Non-zero values mean that the ACL is to be applied only on packets carrying that VLAN tag value. Switches will not necessarily support matching on the VLAN tag for all ACLs, and unsupported ACL bindings will cause errors to be reported. The binding of an ACL to a specific VLAN and the binding of an ACL to the entire physical port should not be combined on a single physical port. That is, a mix of zero and non-zero keys in the map is not recommended.

Statistics for VLANs bound to logical switches on the physical port. An implementation that fully supports such statistics would populate this column with a mapping for every VLAN that is bound to a logical switch on this port. An implementation that does not support such statistics or only partially supports them would not populate this column or partially populate it, respectively. A value of zero in the VLAN field refers to untagged traffic on the physical port.

Symbolic name for the port. The name ought to be unique within a given physical switch, but the database is not capable of enforcing this.

An extended description for the port.

An entry in this column indicates to the NVC that the physical port has encountered a fault. The switch must clear this column when the error has been cleared.

Indicates that a VLAN-to-logical-switch mapping requested by the controller could not be instantiated by the switch because of a conflict with local configuration.

Indicates that an error has occurred in associating an ACL with a port.

Indicates that an error has occurred on the port but that no more specific information is available.

The overall purpose of this column is described under Common Column at the beginning of this document.

Reports statistics for the logical switch with which a VLAN on a physical port is associated.

These statistics count only packets to which the binding applies.

Number of packets sent by the logical switch.

Number of bytes in packets sent by the logical switch.

Number of packets received by the logical switch.

Number of bytes in packets received by the logical switch.

A logical Ethernet switch, whose implementation may span physical and virtual media, possibly crossing L3 domains via tunnels; a logical layer-2 domain; an Ethernet broadcast domain.

Tunnel protocols tend to have a field that allows the tunnel to be partitioned into sub-tunnels: VXLAN has a VNI, GRE and STT have a key, CAPWAP has a WSI, and so on. We call these generically ``tunnel keys.'' Given that one needs to use a tunnel key at all, there are at least two reasonable ways to assign their values:

This column is used only in the tunnel key per logical switch model (see above), because only in that model is there a tunnel key associated with a logical switch.

For vxlan_over_ipv4 encapsulation, when the tunnel key per logical switch model is in use, this column is the VXLAN VNI that identifies a logical switch. It must be in the range 0 to 16,777,215.
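
The upper bound of 16,777,215 follows from the 24-bit width of the VNI field. As an illustration, the sketch below validates a VNI and packs it into the 8-byte VXLAN header laid out in RFC 7348. The function name is hypothetical, and building headers is outside the scope of this schema; the example is only meant to show where the range comes from.

```python
import struct

MAX_VNI = 16_777_215  # 2**24 - 1, the largest value a 24-bit field can hold


def vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VXLAN header carrying the given VNI."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError("VNI must be in the range 0 to 16,777,215")
    flags = 0x08  # "I" flag: the VNI field is valid
    # Word 1: flags (8 bits) + reserved (24 bits).
    # Word 2: VNI (24 bits) + reserved (8 bits).
    return struct.pack("!II", flags << 24, vni << 8)
```

A VNI of 5000 (0x001388) therefore appears in bytes 4 to 6 of the header.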

For handling L2 broadcast, multicast and unknown unicast traffic, packets can be sent to all members of a logical switch referenced by a physical switch. There are different modes to replicate the packets. The default mode of replication is to send the traffic to a service node, which can be a hypervisor, server or appliance, and let the service node handle replication to other transport nodes (hypervisors or other VTEP physical switches). This mode is called service node replication. An alternate mode of replication, called source node replication, involves the source node sending to all other transport nodes. Hypervisors are always responsible for doing their own replication for locally attached VMs in both modes. Service node replication mode is the default and considered a basic requirement because it only requires sending the packet to a single transport node.

This optional column defines the replication mode per logical switch. There are two valid values, service_node and source_node. If the column is not set, the replication mode defaults to service_node.

Symbolic name for the logical switch.

An extended description for the logical switch, such as its switch login banner.

The overall purpose of this column is described under Common Column at the beginning of this document.

Mapping of unicast MAC addresses to tunnels (physical locators). This table is written by the HSC, so it contains the MAC addresses that have been learned on physical ports by a VTEP.

A MAC address that has been learned by the VTEP.

The logical switch to which this mapping applies.

The physical locator to be used to reach this MAC address. In this table, the physical locator will be one of the tunnel IP addresses of the appropriate VTEP.

The IP address to which this MAC corresponds. Optional field for the purpose of ARP suppression.

Mapping of unicast MAC addresses to tunnels (physical locators). This table is written by the NVC, so it contains the MAC addresses that the NVC has learned. These include VM MAC addresses, in which case the physical locators will be hypervisor IP addresses. The NVC will also report MACs that it has learned from other HSCs in the network, in which case the physical locators will be tunnel IP addresses of the corresponding VTEPs.

A MAC address that has been learned by the NVC.

The logical switch to which this mapping applies.

The physical locator to be used to reach this MAC address. In this table, the physical locator will be either a hypervisor IP address or a tunnel IP address of another VTEP.

The IP address to which this MAC corresponds. Optional field for the purpose of ARP suppression.

Mapping of multicast MAC addresses to tunnels (physical locators). This table is written by the HSC, so it contains the MAC addresses that have been learned on physical ports by a VTEP. These may be learned by IGMP snooping, for example. This table also specifies how to handle unknown unicast and broadcast packets.

A MAC address that has been learned by the VTEP.

The keyword unknown-dst is used as a special ``Ethernet address'' that indicates the locations to which packets in a logical switch should be sent when their destination addresses do not otherwise appear in the corresponding unicast MAC table (for unicast addresses) or in this table (for multicast addresses).
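
The fallback behavior of unknown-dst can be sketched as a two-step lookup. The sketch assumes the unicast and multicast tables for a single logical switch are modeled as plain dictionaries keyed by lowercase MAC string; the function names and sample addresses are hypothetical.

```python
from typing import Dict, List

UNKNOWN_DST = "unknown-dst"


def is_multicast(mac: str) -> bool:
    """True if the group bit (lowest bit of the first octet) is set."""
    return int(mac.split(":")[0], 16) & 1 == 1


def locate(
    dst_mac: str, ucast: Dict[str, str], mcast: Dict[str, List[str]]
) -> List[str]:
    """Return the locators to which a packet for dst_mac should be sent."""
    dst_mac = dst_mac.lower()
    if is_multicast(dst_mac):
        if dst_mac in mcast:
            return mcast[dst_mac]
    elif dst_mac in ucast:
        return [ucast[dst_mac]]
    # No explicit entry: fall back to the unknown-dst locations, if any.
    return mcast.get(UNKNOWN_DST, [])


ucast_table = {"00:11:22:33:44:55": "10.0.0.1"}
mcast_table = {
    "01:00:5e:00:00:fb": ["10.0.0.2", "10.0.0.3"],
    UNKNOWN_DST: ["10.0.0.9"],
}
```

Broadcast traffic (ff:ff:ff:ff:ff:ff) has the group bit set, so in this sketch it is looked up in the multicast table and, absent an explicit entry, falls through to unknown-dst.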

The logical switch to which this mapping applies.

The physical locator set to be used to reach this MAC address. In this table, the physical locator set will contain one or more tunnel IP addresses of the appropriate VTEP(s).

The IP address to which this MAC corresponds. Optional field for the purpose of ARP suppression.

Mapping of multicast MAC addresses to tunnels (physical locators). This table is written by the NVC, so it contains the MAC addresses that the NVC has learned. This table also specifies how to handle unknown unicast and broadcast packets.

Multicast packet replication may be handled by a service node, in which case the physical locators will be IP addresses of service nodes. If the VTEP supports replication onto multiple tunnels, using source node replication, then this may be used to replicate directly onto VTEP-hypervisor or VTEP-VTEP tunnels.

A MAC address that has been learned by the NVC.

The keyword unknown-dst is used as a special ``Ethernet address'' that indicates the locations to which packets in a logical switch should be sent when their destination addresses do not otherwise appear in the corresponding unicast MAC table (for unicast addresses) or in this table (for multicast addresses).

The logical switch to which this mapping applies.

The physical locator set to be used to reach this MAC address. In this table, the physical locator set will be either a set of service nodes when service node replication is used or the set of transport nodes (defined as hypervisors or VTEPs) participating in the associated logical switch, when source node replication is used. When service node replication is used, the VTEP should send packets to one member of the locator set that is known to be healthy and reachable, which could be determined by BFD. When source node replication is used, the VTEP should send packets to all members of the locator set.

The IP address to which this MAC corresponds. Optional field for the purpose of ARP suppression.
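
The difference between the two replication modes comes down to how many members of the locator set receive a copy. A minimal sketch, assuming locators are represented as IP address strings and that health (for example, from BFD) is supplied as a set; the function name is hypothetical.

```python
from typing import Iterable, List, Optional, Set


def replication_targets(
    mode: Optional[str], locator_set: Iterable[str], healthy: Set[str]
) -> List[str]:
    """Return the locators that should receive a replicated packet."""
    if mode is None:
        mode = "service_node"  # default when the mode is not set
    locators = list(locator_set)
    if mode == "source_node":
        # The source node sends to all members of the locator set.
        return locators
    if mode == "service_node":
        # Send to one member known to be healthy and reachable.
        for locator in locators:
            if locator in healthy:
                return [locator]
        return []
    raise ValueError(f"unknown replication mode: {mode!r}")
```

In service node mode with locators 10.0.0.1 and 10.0.0.2, of which only 10.0.0.2 is healthy, the packet goes to 10.0.0.2 alone; in source node mode it goes to both.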

A logical router, or VRF. A logical router may be connected to one or more logical switches. Subnet addresses and interface addresses may be configured on the interfaces.

Maps from an IPv4 or IPv6 address prefix in CIDR notation to a logical switch. Multiple prefixes may map to the same switch. By writing a 32-bit (or 128-bit for v6) address with a /N prefix length, both the router's interface address and the subnet prefix can be configured. For example, 192.68.1.1/24 creates a /24 subnet for the logical switch attached to the interface and assigns the address 192.68.1.1 to the router interface.

One or more static routes, mapping IP prefixes to next hop IP addresses.

Maps ACLs to logical router interfaces. The router interfaces are indicated using IP address notation, and must be the same interfaces created by the prefix-to-logical-switch mapping. For example, an ACL could be associated with the logical router interface with an address of 192.68.1.1 as defined in the example above.

Symbolic name for the logical router.

An extended description for the logical router.
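
The way a single CIDR string carries both the router interface address and the subnet prefix can be illustrated with Python's standard ipaddress module. The helper name split_binding is hypothetical.

```python
import ipaddress
from typing import Tuple


def split_binding(prefix: str) -> Tuple[str, str]:
    """Split 'address/N' into (interface address, subnet prefix)."""
    interface = ipaddress.ip_interface(prefix)
    return str(interface.ip), str(interface.network)
```

Applied to the example from the text, split_binding("192.68.1.1/24") gives the interface address 192.68.1.1 and the subnet 192.68.1.0/24. The same helper accepts IPv6 input such as "2001:db8::1/64".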

An entry in this column indicates to the NVC that the HSC has encountered a fault in configuring state related to the logical router.

Indicates that an error has occurred in associating an ACL with a logical router port.

Indicates that an error has occurred in configuring the logical router but that no more specific information is available.

The overall purpose of this column is described under Common Column at the beginning of this document.

MAC address to be used when a VTEP issues ARP requests on behalf of a logical router.

A distributed logical router is implemented by a set of VTEPs (both hardware VTEPs and vswitches). In order for a given VTEP to populate the local ARP cache for a logical router, it issues ARP requests with a source MAC address that is unique to the VTEP. A single per-VTEP MAC can be re-used across all logical networks. This table contains the MACs that are used by the VTEPs of a given HSC. The table provides the mapping from MAC to physical locator for each VTEP so that replies to the ARP requests can be sent back to the correct VTEP using the appropriate physical locator.

The source MAC to be used by a given VTEP.

The physical locator to use for replies to ARP requests from this MAC address.

MAC address to be used when a remote VTEP issues ARP requests on behalf of a logical router.

This table is the remote counterpart of the local ARP source table described above. The NVC writes this table to notify the HSC of the MACs that will be used by remote VTEPs when they issue ARP requests on behalf of a distributed logical router.

The source MAC to be used by a given VTEP.

The physical locator to use for replies to ARP requests from this MAC address.

A set of one or more physical locators.

This table exists only because OVSDB does not have a way to express the type ``map from string to one or more records.''

Identifies an endpoint to which logical switch traffic may be encapsulated and forwarded.

The vxlan_over_ipv4 encapsulation, the only encapsulation defined so far, can use either tunnel key model described in the ``Per Logical-Switch Tunnel Key'' section of the logical switch table. When the tunnel key per logical switch model is in use, the tunnel key column in the logical switch table is filled with a VNI and the tunnel key column in this table is empty; in the key-per-tunnel model, the opposite is true. The former model is older, and thus likely to be more widely supported. See the ``Per Logical-Switch Tunnel Key'' section of the logical switch table for further discussion of the model.

The type of tunneling encapsulation.

For vxlan_over_ipv4 encapsulation, the IPv4 address of the VXLAN tunnel endpoint.

We expect that this column could be used for IPv4 or IPv6 addresses in encapsulations to be introduced later.

This column is used only in the key-per-tunnel model, in which a tunnel key is assigned per pair of logical switch and physical locator (see above).

For vxlan_over_ipv4 encapsulation, when the key-per-tunnel model is in use, this column is the VXLAN VNI. It must be in the range 0 to 16,777,215.

Describes the individual entries that comprise an Access Control List.

Each entry in the table is a single rule to match on certain header fields. While there are a large number of fields that can be matched on, most hardware cannot match on arbitrary combinations of fields. It is common to match on either L2 fields (described below in the L2 group of columns) or L3/L4 fields (the L3/L4 group of columns) but not both. The hardware switch controller may log an error if an ACL entry requires it to match on an incompatible mixture of fields.

The sequence number for the ACL entry for the purpose of ordering entries in an ACL. Lower numbered entries are matched before higher numbered entries.

Source MAC address, in the form xx:xx:xx:xx:xx:xx.

Destination MAC address, in the form xx:xx:xx:xx:xx:xx.

Ethertype in hexadecimal, in the form 0xAAAA.

Source IP address, in the form xx.xx.xx.xx for IPv4 or appropriate colon-separated hexadecimal notation for IPv6.

Mask that determines which bits of source_ip to match on, in the form xx.xx.xx.xx for IPv4 or appropriate colon-separated hexadecimal notation for IPv6.

Destination IP address, in the form xx.xx.xx.xx for IPv4 or appropriate colon-separated hexadecimal notation for IPv6.

Mask that determines which bits of dest_ip to match on, in the form xx.xx.xx.xx for IPv4 or appropriate colon-separated hexadecimal notation for IPv6.
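
Masked matching on source or destination addresses amounts to comparing only the bits selected by the mask. A minimal sketch using the standard ipaddress module; the function name is hypothetical, and treating a mismatch of address families as "no match" is an assumption made for this example.

```python
import ipaddress


def ip_matches(packet_ip: str, rule_ip: str, rule_mask: str) -> bool:
    """True if packet_ip equals rule_ip on every bit set in rule_mask."""
    packet = ipaddress.ip_address(packet_ip)
    rule = ipaddress.ip_address(rule_ip)
    mask = ipaddress.ip_address(rule_mask)
    if not (packet.version == rule.version == mask.version):
        return False  # assumption: mixed IPv4/IPv6 never matches
    return (int(packet) & int(mask)) == (int(rule) & int(mask))
```

With a rule address of 10.1.0.0 and a mask of 255.255.0.0, the packet address 10.1.2.3 matches and 10.2.2.3 does not.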

Protocol number in the IPv4 header, or value of the "next header" field in the IPv6 header.

Lower end of the range of source port values. The value specified is included in the range.

Upper end of the range of source port values. The value specified is included in the range.

Lower end of the range of destination port values. The value specified is included in the range.

Upper end of the range of destination port values. The value specified is included in the range.

Integer representing the value of TCP flags to match. For example, the SYN flag is the second least significant bit in the TCP flags. Hence a value of 2 would indicate that the "SYN" flag should be set (assuming an appropriate mask).

Integer representing the mask to apply when matching TCP flags. For example, a value of 2 would imply that the "SYN" flag should be matched and all other flags ignored.
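
The interaction between the flag value and the flag mask can be sketched as a masked comparison. The function name is hypothetical; the bit values for SYN and ACK are those of the TCP header.

```python
SYN = 0x02  # second least significant bit of the TCP flags
ACK = 0x10


def tcp_flags_match(packet_flags: int, value: int, mask: int) -> bool:
    """True if packet_flags agrees with value on every bit set in mask."""
    return (packet_flags & mask) == (value & mask)
```

With value 2 and mask 2, both a bare SYN and a SYN+ACK segment match, because all flags other than SYN are ignored. Widening the mask to SYN | ACK while keeping the value at 2 matches only segments with SYN set and ACK clear.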

ICMP type to be matched.

ICMP code to be matched.

Direction of traffic to match on the specified port, either "ingress" (toward the logical switch or router) or "egress" (leaving the logical switch or router).

Action to take for this rule, either "permit" or "deny".

An entry in this column indicates to the NVC that the ACL could not be configured as requested. The switch must clear this column when the error has been cleared.

Indicates that an ACL entry requested by the controller could not be instantiated by the switch, e.g. because it requires an unsupported combination of fields to be matched.

Indicates that an error has occurred in configuring the ACL entry but no more specific information is available.

Access Control List table. Each ACL is constructed as a set of entries from the ACL entry table. Packets that are not matched by any entry in the ACL are allowed by default.
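
The evaluation order and the default-allow behavior can be sketched as follows. The sketch assumes that the first matching entry, in ascending sequence order, decides the outcome; the schema text states the ordering but not the stop-at-first-match rule, so treat that as an assumption. The Entry type, the matches callable, and the packet dictionary are hypothetical stand-ins for real ACL entry records.

```python
from typing import Callable, Dict, Iterable, NamedTuple


class Entry(NamedTuple):
    sequence: int                      # lower numbers are matched first
    matches: Callable[[Dict], bool]    # stand-in for the match columns
    action: str                        # "permit" or "deny"


def evaluate(entries: Iterable[Entry], packet: Dict) -> str:
    """Return the action an ACL applies to a packet."""
    for entry in sorted(entries, key=lambda e: e.sequence):
        if entry.matches(packet):
            return entry.action  # assumption: first match wins
    return "permit"  # packets matched by no entry are allowed by default


acl = [
    Entry(20, lambda p: p.get("protocol") == 6, "permit"),
    Entry(10, lambda p: p.get("dest_port") == 23, "deny"),
]
```

In this example a TCP packet to port 23 is denied because entry 10 is consulted before entry 20, while a UDP packet to port 53 matches neither entry and is permitted.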

A set of references to entries in the ACL entry table.

A human readable name for the ACL, which may (for example) be displayed on the switch CLI.

An entry in this column indicates to the NVC that the ACL could not be configured as requested. The switch must clear this column when the error has been cleared.

Indicates that an ACL requested by the controller could not be instantiated by the switch, e.g., because it requires an unsupported combination of fields to be matched.

Indicates that an ACL requested by the controller could not be instantiated by the switch due to a shortage of resources (e.g. TCAM space).

Indicates that an error has occurred in configuring the ACL but no more specific information is available.