This schema specifies relations that a VTEP can use to integrate
physical ports into logical switches maintained by a network
virtualization controller such as NSX.
Glossary:
- VTEP: VXLAN Tunnel End Point, an entity that originates and/or terminates
  VXLAN tunnels.
- HSC: Hardware Switch Controller.
- NVC: Network Virtualization Controller, e.g. NSX.
- VRF: Virtual Routing and Forwarding instance.
Common Column

Some tables contain a column named other_config. This column has the same
form and purpose each place that it appears, so we describe it here to save
space later.

other_config: map of string-string pairs
  Key-value pairs for configuring rarely used or proprietary features. Some
  tables do not have an other_config column because no key-value pairs have
  yet been defined for them.
Top-level configuration for a hardware VTEP. There must be
exactly one record in the table.
The physical switch or switches managed by the VTEP.
When a physical switch integrates support for this VTEP schema, which
is expected to be the most common case, this column should point to one
record that represents the switch itself. In another possible
implementation, a server or a VM presents a VTEP schema front-end
interface to one or more physical switches, presumably communicating
with those physical switches over a proprietary protocol. In that case,
this column would point to one record for each physical switch, and the
set might change over time as the front-end server comes to represent a
different set of switches.
These columns primarily configure the database server (ovsdb-server),
not the hardware VTEP itself.
Database clients to which the database server should connect or
to which it should listen, along with options for how these
connections should be configured. See the Manager table for more
information.
The overall purpose of this column is described under Common
Column
at the beginning of this document.
Configuration for a database connection to an Open vSwitch Database
(OVSDB) client.
The database server can initiate and maintain active connections
to remote clients. It can also listen for database connections.
Connection method for managers.
The following connection methods are currently supported:

ssl:ip[:port]
  The specified SSL port (default: 6640) on the host at the given ip,
  which must be expressed as an IP address (not a DNS name). SSL key and
  certificate configuration happens outside the database.

tcp:ip[:port]
  The specified TCP port (default: 6640) on the host at the given ip,
  which must be expressed as an IP address (not a DNS name).

pssl:[port][:ip]
  Listens for SSL connections on the specified TCP port (default: 6640).
  If ip, which must be expressed as an IP address (not a DNS name), is
  specified, then connections are restricted to the specified local IP
  address.

ptcp:[port][:ip]
  Listens for TCP connections on the specified TCP port (default: 6640).
  If ip, which must be expressed as an IP address (not a DNS name), is
  specified, then connections are restricted to the specified local IP
  address.
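The four target formats above can be parsed mechanically. A minimal sketch, assuming IPv4 addresses only (bracketed IPv6 targets would need extra handling); parse_target is a hypothetical helper, not part of any OVSDB library:

```python
# Sketch: parse the connection-method strings described above.
# Assumes IPv4 addresses only (an IPv6 literal contains colons itself).
DEFAULT_PORT = 6640  # default OVSDB port per the text above

def parse_target(target):
    """Parse 'ssl:ip[:port]', 'tcp:ip[:port]', 'pssl:[port][:ip]', or
    'ptcp:[port][:ip]' into a (method, ip, port) tuple."""
    method, _, rest = target.partition(":")
    if method in ("ssl", "tcp"):
        # Active methods: the IP is mandatory, the port is optional.
        ip, _, port = rest.partition(":")
        return method, ip, int(port) if port else DEFAULT_PORT
    if method in ("pssl", "ptcp"):
        # Passive methods: both the port and the local IP are optional.
        port, _, ip = rest.partition(":")
        return method, ip or None, int(port) if port else DEFAULT_PORT
    raise ValueError("unknown connection method: %s" % target)
```

For example, `parse_target("ptcp:")` yields the listen-on-any default, while `parse_target("pssl:6641:127.0.0.1")` restricts the listener to a local address and port.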
Maximum number of milliseconds to wait between connection attempts.
Default is implementation-specific.
Maximum number of milliseconds of idle time on the connection to the
client before sending an inactivity probe message. If the Open
vSwitch database server does not communicate with the client for the
specified interval, it will send a probe. If a response is not
received within the same additional interval, the database server
assumes the connection has been broken and attempts to reconnect.
Default is implementation-specific.
A value of 0 disables inactivity probes.
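The probe-then-reconnect behavior described above can be modeled roughly as follows; this is a simplified sketch of the timing logic, not the actual ovsdb-server implementation:

```python
def probe_action(idle_ms, probe_interval_ms, probe_outstanding):
    """Decide what the server should do given the time since the last
    message from the client. An interval of 0 disables probing.

    idle_ms           -- milliseconds since the client last communicated
    probe_interval_ms -- configured inactivity probe interval
    probe_outstanding -- True if a probe was already sent and is unanswered
    """
    if probe_interval_ms == 0:
        return "none"                 # probing disabled
    if not probe_outstanding:
        # Idle past the interval: send an inactivity probe.
        return "send-probe" if idle_ms >= probe_interval_ms else "none"
    # A probe is outstanding; allow the same interval again for a reply,
    # then assume the connection is broken.
    return "reconnect" if idle_ms >= 2 * probe_interval_ms else "none"
```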
true if currently connected to this manager, false otherwise.
A human-readable description of the last error on the connection
to the manager, i.e. strerror(errno). This key will exist only if an
error has occurred.
The state of the connection to the manager:
- VOID: Connection is disabled.
- BACKOFF: Attempting to reconnect at an increasing period.
- CONNECTING: Attempting to connect.
- ACTIVE: Connected, remote host responsive.
- IDLE: Connection is idle. Waiting for response to keep-alive.
These values may change in the future. They are provided only for
human consumption.
The amount of time since this manager last successfully connected
to the database (in seconds). Value is empty if manager has never
successfully connected.
The amount of time since this manager last disconnected from the
database (in seconds). Value is empty if manager has never
disconnected.
Space-separated list of the names of OVSDB locks that the connection
holds. Omitted if the connection does not hold any locks.
Space-separated list of the names of OVSDB locks that the connection is
currently waiting to acquire. Omitted if the connection is not waiting
for any locks.
Space-separated list of the names of OVSDB locks that the connection
has had stolen by another OVSDB client. Omitted if no locks have been
stolen from this connection.
When target specifies a connection method that listens for inbound
connections (e.g. ptcp: or pssl:) and more than one connection is
actually active, the value is the number of active connections.
Otherwise, this key-value pair is omitted.
When multiple connections are active, status columns and key-value
pairs (other than this one) report the status of one arbitrarily
chosen connection.
Additional configuration for a connection between the manager
and the database server.
The Differentiated Service Code Point (DSCP) is specified using 6 bits
in the Type of Service (TOS) field in the IP header. DSCP provides a
mechanism to classify the network traffic and provide Quality of
Service (QoS) on IP networks.
The DSCP value specified here is used when establishing the
connection between the manager and the database server. If no
value is specified, a default value of 48 is chosen. Valid DSCP
values must be in the range 0 to 63.
A physical switch that implements a VTEP.
The physical ports within the switch.
Tunnels created by this switch as instructed by the NVC.
IPv4 or IPv6 addresses at which the switch may be contacted
for management purposes.
IPv4 or IPv6 addresses on which the switch may originate or
terminate tunnels.
This column is intended to allow a Manager to determine the
Physical_Switch that terminates the tunnel represented by a
Physical_Locator.
Symbolic name for the switch, such as its hostname.
An extended description for the switch, such as its switch login
banner.
An entry in this column indicates to the NVC that this switch
has encountered a fault. The switch must clear this column
when the fault has been cleared.
Indicates that the switch has been unable to process MAC
entries requested by the NVC due to lack of table resources.
Indicates that the switch has been unable to create tunnels
requested by the NVC due to lack of resources.
Indicates that the switch has been unable to create the logical router
interfaces requested by the NVC due to conflicting configurations or a
lack of hardware resources.
Indicates that the switch has been unable to create the static routes
requested by the NVC due to conflicting configurations or a lack of
hardware resources.
Indicates that the switch has been unable to create the logical router
requested by the NVC due to conflicting configurations or a lack of
hardware resources.
Indicates that the switch does not support logical routing.
Indicates that an error has occurred in the switch but that no
more specific information is available.
Indicates that the requested source node replication mode cannot be
supported by the physical switch; this specifically means in this
context that the physical switch lacks the capability to support
source node replication mode. This error occurs when a controller
attempts to set source node replication mode for one of the logical
switches that the physical switch is keeping context for. An NVC
that observes this error should take appropriate action (for example
reverting the logical switch to service node replication mode).
It is recommended that an NVC be proactive and test for support of
source node replication by creating a test logical switch on VTEP
physical switch nodes, attempting to change its replication mode to
source node, and checking for errors. The NVC could remember this
capability per VTEP physical switch. Using mixed replication modes on
a given logical switch is not recommended.
Service node replication mode is considered a basic requirement
since it only requires sending a packet to a single transport node,
hence it is not expected that a switch should report that service
node mode cannot be supported.
The overall purpose of this column is described under Common
Column
at the beginning of this document.
A tunnel created by a Physical_Switch.
Tunnel end-point local to the physical switch.
Tunnel end-point remote to the physical switch.
BFD, defined in RFC 5880, allows point-to-point detection of
connectivity failures by occasional transmission of BFD control
messages. VTEPs are expected to implement BFD.
BFD operates by regularly transmitting BFD control messages at a
rate negotiated independently in each direction. Each endpoint
specifies the rate at which it expects to receive control messages,
and the rate at which it's willing to transmit them. An endpoint
which fails to receive BFD control messages for a period of three
times the expected reception rate will signal a connectivity
fault. In the case of a unidirectional connectivity issue, the
system not receiving BFD control messages will signal the problem
to its peer in the messages it transmits.
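Under these rules, the detection time follows directly from the negotiated intervals. A sketch of the arithmetic (function names are illustrative, not part of any BFD API):

```python
def negotiated_rx_interval_ms(local_min_rx, remote_min_tx):
    """The rate at which an endpoint actually receives control messages
    is bounded by both its own minimum-receive interval and the peer's
    minimum-transmit interval, so the slower of the two governs."""
    return max(local_min_rx, remote_min_tx)

def detection_time_ms(local_min_rx, remote_min_tx, mult=3):
    """Per the text above, a connectivity fault is signaled after `mult`
    (here three) expected-reception intervals pass with no message."""
    return mult * negotiated_rx_interval_ms(local_min_rx, remote_min_tx)
```

For instance, with a local min_rx of 1000 ms and a peer willing to transmit no faster than every 100 ms, a fault is declared after 3000 ms of silence.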
A hardware VTEP is expected to use BFD to determine reachability of
devices at the end of the tunnels with which it exchanges data. This
can enable the VTEP to choose a functioning service node among a set of
service nodes providing high availability. It also enables the NVC to
report the health status of tunnels.
In many cases the BFD peer of a hardware VTEP will be an Open vSwitch
instance. The Open vSwitch implementation of BFD aims to comply
faithfully with the requirements put forth in RFC 5880. Open vSwitch
does not implement the optional Authentication or ``Echo Mode''
features.
The HSC writes the key-value pairs in the bfd_config_local column to
specify the local configurations to be used for BFD sessions on this
tunnel.
Set to an Ethernet address in the form
xx:xx:xx:xx:xx:xx
to set the MAC expected as destination for received BFD packets.
The default is 00:23:20:00:00:01
.
Set to an IPv4 address to set the IP address that is expected as destination
for received BFD packets. The default is 169.254.1.0
.
The bfd_config_remote column is the remote counterpart of the
bfd_config_local column. The NVC writes the key-value pairs in this
column.
Set to an Ethernet address in the form
xx:xx:xx:xx:xx:xx
to set the destination MAC to be used for transmitted BFD packets.
The default is 00:23:20:00:00:01
.
Set to an IPv4 address to set the IP address used as destination
for transmitted BFD packets. The default is 169.254.1.1
.
The NVC sets up key-value pairs in the bfd_params column to enable
and configure BFD.
True to enable BFD on this Tunnel. If not specified, BFD will not be
enabled by default.
The shortest interval, in milliseconds, at which this BFD session
offers to receive BFD control messages. The remote endpoint may
choose to send messages at a slower rate. Defaults to 1000.

The shortest interval, in milliseconds, at which this BFD session is
willing to transmit BFD control messages. Messages will actually be
transmitted at a slower rate if the remote endpoint is not willing to
receive as quickly as specified. Defaults to 100.
An alternate receive interval, in milliseconds, that must be greater
than or equal to min_rx. The implementation should switch from
min_rx to decay_min_rx when there is no obvious incoming data traffic
at the tunnel, to reduce the CPU and bandwidth cost of monitoring an
idle tunnel. This feature may be disabled by setting a value of 0.
This feature is reset whenever decay_min_rx or min_rx changes.
When true, traffic received on the Tunnel is used to indicate the
capability of packet I/O. BFD control packets are still transmitted
and received. At least one BFD control packet must be received every
100 * min_rx amount of time. Otherwise, even if traffic is received,
the forwarding key in bfd_status will be false.
Set to true to notify the remote endpoint that traffic should not be
forwarded to this system for some reason other than a connectivity
failure on the interface being monitored. The typical underlying
reason is ``concatenated path down,'' that is, that connectivity
beyond the local system is down. Defaults to false.
Set to true to make BFD accept only control messages with a tunnel
key of zero. By default, BFD accepts control messages with any
tunnel key.
The VTEP sets key-value pairs in the bfd_status column to report the
status of BFD on this tunnel. When BFD is not enabled, with
bfd_params:enable set to false, the HSC clears all key-value pairs
from bfd_status.
Set to true if the BFD session has been successfully enabled.
Set to false if the VTEP cannot support BFD or has insufficient
resources to enable BFD on this tunnel. The NVC will disable
the BFD monitoring on the other side of the tunnel once this
value is set to false.
Reports the state of the BFD session. The BFD session is fully
healthy and negotiated if UP.
Reports whether the BFD session believes this Tunnel may be used to
forward traffic. Typically this means the local session is signaling
UP, and the remote system isn't signaling a problem such as
concatenated path down.
A diagnostic code specifying the local system's reason for the
last change in session state. The error messages are defined in
section 4.1 of [RFC 5880].
Reports the state of the remote endpoint's BFD session.
A diagnostic code specifying the remote system's reason for the
last change in session state. The error messages are defined in
section 4.1 of [RFC 5880].
A short message providing further information about the BFD status
(possibly including reasons why BFD could not be enabled).
A port within a Physical_Switch.
Identifies how VLANs on the physical port are bound to logical switches.
If, for example, the map contains a (VLAN, logical switch) pair, a packet
that arrives on the port in the VLAN is considered to belong to the
paired logical switch. A value of zero in the VLAN field means
that untagged traffic on the physical port is mapped to the
logical switch.
Attach Access Control Lists (ACLs) to the physical port. The column
consists of a map of VLAN tags to ACLs. If the value of the VLAN tag
in the map is 0, the ACL is associated with the entire physical port.
Non-zero values mean that the ACL is to be applied only to packets
carrying that VLAN tag value. Switches will not necessarily support
matching on the VLAN tag for all ACLs, and unsupported ACL bindings
will cause errors to be reported. The binding of an ACL to a specific
VLAN and the binding of an ACL to the entire physical port should not
be combined on a single physical port; that is, a mix of zero and
non-zero keys in the map is not recommended.
Statistics for VLANs bound to logical switches on the physical port.
An implementation that fully supports such statistics would populate
this column with a mapping for every VLAN that is bound in
vlan_bindings. An implementation that does not support such
statistics, or only partially supports them, would leave this column
empty or partially populated, respectively. A value of zero in the
VLAN field refers to untagged traffic on the physical port.
Symbolic name for the port. The name ought to be unique within a
given Physical_Switch, but the database is not capable of enforcing
this.
An extended description for the port.
An entry in this column indicates to the NVC that the physical port has
encountered a fault. The switch must clear this column when the error
has been cleared.
Indicates that a VLAN-to-logical-switch mapping requested by
the controller could not be instantiated by the switch
because of a conflict with local configuration.
Indicates that an error has occurred in associating an ACL
with a port.
Indicates that an error has occurred on the port but that no
more specific information is available.
The overall purpose of this column is described under Common
Column
at the beginning of this document.
Reports statistics for the Logical_Switch with which a VLAN on a
Physical_Port is associated.
These statistics count only packets to which the binding applies.
Number of packets sent by the Logical_Switch.
Number of bytes in packets sent by the Logical_Switch.
Number of packets received by the Logical_Switch.
Number of bytes in packets received by the Logical_Switch.
A logical Ethernet switch, whose implementation may span physical and
virtual media, possibly crossing L3 domains via tunnels; a logical layer-2
domain; an Ethernet broadcast domain.
Tunnel protocols tend to have a field that allows the tunnel
to be partitioned into sub-tunnels: VXLAN has a VNI, GRE and
STT have a key, CAPWAP has a WSI, and so on. We call these
generically ``tunnel keys.'' Given that one needs to use a
tunnel key at all, there are at least two reasonable ways to
assign their values:
- Per Logical_Switch+Physical_Locator pair. That is, each logical
  switch may be assigned a different tunnel key on every
  Physical_Locator. This model is especially flexible.

  In this model, the tunnel_key column of Physical_Locator carries the
  tunnel key. Therefore, one Physical_Locator record will exist for
  each logical switch carried at a given IP destination.

- Per Logical_Switch. That is, every tunnel associated with a
  particular logical switch carries the same tunnel key, regardless of
  the Physical_Locator to which the tunnel is addressed. This model
  may ease switch implementation because it imposes fewer requirements
  on the hardware datapath.

  In this model, the tunnel_key column of Logical_Switch carries the
  tunnel key. Therefore, one Physical_Locator record will exist for
  each IP destination.
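The difference between the two models can be sketched as follows; this is a toy illustration with made-up switch names, locator IPs, and VNIs, not schema fields:

```python
# Per logical-switch + locator pair: a key for every (switch, destination).
per_pair_keys = {
    ("ls1", "10.0.0.1"): 5001,
    ("ls1", "10.0.0.2"): 5002,  # same switch, different key per locator
    ("ls2", "10.0.0.1"): 5003,
}

# Per logical switch: one key regardless of destination locator.
per_switch_keys = {"ls1": 5001, "ls2": 5002}

def vni_per_pair(ls, locator):
    """Tunnel key depends on both the switch and the destination."""
    return per_pair_keys[(ls, locator)]

def vni_per_switch(ls, locator):
    """Tunnel key depends on the switch alone; locator is ignored."""
    return per_switch_keys[ls]
```

The per-switch model lets the datapath program a single VNI per broadcast domain, which is why it imposes fewer hardware requirements.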
This column is used only in the tunnel key per Logical_Switch model
(see above), because only in that model is there a tunnel key
associated with a logical switch.
For vxlan_over_ipv4 encapsulation, when the tunnel key per
Logical_Switch model is in use, this column is the VXLAN VNI that
identifies a logical switch. It must be in the range 0 to 16,777,215.
For handling L2 broadcast, multicast and unknown unicast traffic,
packets can be sent to all members of a logical switch referenced by
a physical switch. There are different modes to replicate the
packets. The default mode of replication is to send the traffic to
a service node, which can be a hypervisor, server or appliance, and
let the service node handle replication to other transport nodes
(hypervisors or other VTEP physical switches). This mode is called
service node replication. An alternate mode of replication, called
source node replication involves the source node sending to all
other transport nodes. Hypervisors are always responsible for doing
their own replication for locally attached VMs in both modes.
Service node replication mode is the default and considered a
basic requirement because it only requires sending the packet to
a single transport node.
This optional column defines the replication mode per Logical_Switch.
There are two valid values, service_node and source_node. If the
column is not set, the replication mode defaults to service_node.
Symbolic name for the logical switch.
An extended description for the logical switch, such as its switch
login banner.
The overall purpose of this column is described under Common
Column
at the beginning of this document.
Mapping of unicast MAC addresses to tunnels (physical
locators). This table is written by the HSC, so it contains the
MAC addresses that have been learned on physical ports by a
VTEP.
A MAC address that has been learned by the VTEP.
The Logical switch to which this mapping applies.
The physical locator to be used to reach this MAC address. In
this table, the physical locator will be one of the tunnel IP
addresses of the appropriate VTEP.
The IP address to which this MAC corresponds. Optional field for
the purpose of ARP suppression.
Mapping of unicast MAC addresses to tunnels (physical
locators). This table is written by the NVC, so it contains the
MAC addresses that the NVC has learned. These include VM MAC
addresses, in which case the physical locators will be
hypervisor IP addresses. The NVC will also report MACs that it
has learned from other HSCs in the network, in which case the
physical locators will be tunnel IP addresses of the
corresponding VTEPs.
A MAC address that has been learned by the NVC.
The Logical switch to which this mapping applies.
The physical locator to be used to reach this MAC address. In
this table, the physical locator will be either a hypervisor IP
address or a tunnel IP addresses of another VTEP.
The IP address to which this MAC corresponds. Optional field for
the purpose of ARP suppression.
Mapping of multicast MAC addresses to tunnels (physical
locators). This table is written by the HSC, so it contains the
MAC addresses that have been learned on physical ports by a
VTEP. These may be learned by IGMP snooping, for example. This
table also specifies how to handle unknown unicast and broadcast packets.
A MAC address that has been learned by the VTEP.
The keyword unknown-dst is used as a special ``Ethernet address''
that indicates the locations to which packets in a logical switch
whose destination addresses do not otherwise appear in
Ucast_Macs_Local (for unicast addresses) or Mcast_Macs_Local (for
multicast addresses) should be sent.
The Logical switch to which this mapping applies.
The physical locator set to be used to reach this MAC address. In
this table, the physical locator set will contain one or more tunnel
IP addresses of the appropriate VTEP(s).
The IP address to which this MAC corresponds. Optional field for
the purpose of ARP suppression.
Mapping of multicast MAC addresses to tunnels (physical
locators). This table is written by the NVC, so it contains the
MAC addresses that the NVC has learned. This
table also specifies how to handle unknown unicast and broadcast
packets.
Multicast packet replication may be handled by a service node,
in which case the physical locators will be IP addresses of
service nodes. If the VTEP supports replication onto multiple
tunnels, using source node replication, then this may be used to
replicate directly onto VTEP-hypervisor or VTEP-VTEP tunnels.
A MAC address that has been learned by the NVC.
The keyword unknown-dst is used as a special ``Ethernet address''
that indicates the locations to which packets in a logical switch
whose destination addresses do not otherwise appear in
Ucast_Macs_Remote (for unicast addresses) or Mcast_Macs_Remote (for
multicast addresses) should be sent.
The Logical switch to which this mapping applies.
The physical locator set to be used to reach this MAC address. In
this table, the physical locator set will be either a set of service
nodes when service node replication is used or the set of transport
nodes (defined as hypervisors or VTEPs) participating in the associated
logical switch, when source node replication is used. When service node
replication is used, the VTEP should send packets to one member of the
locator set that is known to be healthy and reachable, which could be
determined by BFD. When source node replication is used, the VTEP
should send packets to all members of the locator set.
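The forwarding decision described above can be sketched as follows; this is a simplified model, where the health predicate stands in for BFD-derived reachability:

```python
def tunnel_destinations(locator_set, mode, is_healthy):
    """Pick tunnel destinations for a broadcast/multicast/unknown-unicast
    packet from a physical locator set.

    locator_set -- list of locator addresses from the physical locator set
    mode        -- 'service_node' or 'source_node' replication mode
    is_healthy  -- predicate for locator reachability (e.g. from BFD)
    """
    if mode == "service_node":
        # Send to a single service node known to be healthy and reachable.
        for loc in locator_set:
            if is_healthy(loc):
                return [loc]
        return []  # no healthy service node available
    if mode == "source_node":
        # Source node replication: send to every member of the set.
        return list(locator_set)
    raise ValueError("unknown replication mode: %s" % mode)
```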
The IP address to which this MAC corresponds. Optional field for
the purpose of ARP suppression.
A logical router, or VRF. A logical router may be connected to one or more
logical switches. Subnet addresses and interface addresses may be configured on the
interfaces.
Maps from an IPv4 or IPv6 address prefix in CIDR notation to a
logical switch. Multiple prefixes may map to the same switch. By
writing a 32-bit (or 128-bit for v6) address with a /N prefix
length, both the router's interface address and the subnet
prefix can be configured. For example, 192.68.1.1/24 creates a
/24 subnet for the logical switch attached to the interface and
assigns the address 192.68.1.1 to the router interface.
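The address/prefix split used in this example can be computed with Python's standard ipaddress module; split_interface is a hypothetical helper, not part of the schema:

```python
import ipaddress

def split_interface(cidr):
    """Split an interface address in CIDR notation into the router
    interface address and the subnet prefix it implies."""
    iface = ipaddress.ip_interface(cidr)  # works for IPv4 and IPv6
    return str(iface.ip), str(iface.network)
```

For the example above, `split_interface("192.68.1.1/24")` yields the interface address `192.68.1.1` and the subnet `192.68.1.0/24`.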
One or more static routes, mapping IP prefixes to next hop IP addresses.
Maps ACLs to logical router interfaces. The router interfaces
are indicated using IP address notation, and must be the same
interfaces created in the switch_binding column. For example, an ACL
could be associated with the logical router interface with an address
of 192.68.1.1 as defined in the example above.
Symbolic name for the logical router.
An extended description for the logical router.
An entry in this column indicates to the NVC that the HSC has
encountered a fault in configuring state related to the
logical router.
Indicates that an error has occurred in associating an ACL
with a logical router port.
Indicates that an error has occurred in configuring the
logical router but that no
more specific information is available.
The overall purpose of this column is described under Common
Column
at the beginning of this document.
MAC address to be used when a VTEP issues ARP requests on behalf
of a logical router.
A distributed logical router is implemented by a set of VTEPs
(both hardware VTEPs and vswitches). In order for a given VTEP
to populate the local ARP cache for a logical router, it issues
ARP requests with a source MAC address that is unique to the VTEP. A
single per-VTEP MAC can be re-used across all logical
networks. This table contains the MACs that are used by the
VTEPs of a given HSC. The table provides the mapping from MAC to
physical locator for each VTEP so that replies to the ARP
requests can be sent back to the correct VTEP using the
appropriate physical locator.
The source MAC to be used by a given VTEP.
The Physical_Locator to use for replies to ARP requests from this
MAC address.
MAC address to be used when a remote VTEP issues ARP requests on behalf
of a logical router.
This table is the remote counterpart of Arp_Sources_Local. The NVC
writes this table to notify the HSC of the MACs that will be used by
remote VTEPs when they issue ARP requests on behalf of a distributed
logical router.
The source MAC to be used by a given VTEP.
The Physical_Locator to use for replies to ARP requests from this
MAC address.
A set of one or more Physical_Locators.
This table exists only because OVSDB does not have a way to express
the type ``map from string to one or more Physical_Locator records.''
Identifies an endpoint to which logical switch traffic may be
encapsulated and forwarded.
The vxlan_over_ipv4 encapsulation, the only encapsulation defined so
far, can use either tunnel key model described in the ``Per
Logical-Switch Tunnel Key'' section in the Logical_Switch table. When
the tunnel key per Logical_Switch model is in use, the tunnel_key
column in the Logical_Switch table is filled with a VNI and the
tunnel_key column in this table is empty; in the key-per-tunnel
model, the opposite is true. The former model is older, and thus
likely to be more widely supported. See the ``Per Logical-Switch
Tunnel Key'' section in the Logical_Switch table for further
discussion of the model.
The type of tunneling encapsulation.
For vxlan_over_ipv4
encapsulation, the IPv4 address of the
VXLAN tunnel endpoint.
We expect that this column could be used for IPv4 or IPv6 addresses in
encapsulations to be introduced later.
This column is used only in the tunnel key per
Logical_Switch+Physical_Locator model (see above).
For vxlan_over_ipv4 encapsulation, when the
Logical_Switch+Physical_Locator model is in use, this column is the
VXLAN VNI. It must be in the range 0 to 16,777,215.
Describes the individual entries that comprise an Access Control List.
Each entry in the table is a single rule to match on certain
header fields. While there are a large number of fields that can
be matched on, most hardware cannot match on arbitrary
combinations of fields. It is common to match on either L2
fields (described below in the L2 group of columns) or L3/L4 fields
(the L3/L4 group of columns) but not both. The hardware switch
controller may log an error if an ACL entry requires it to match
on an incompatible mixture of fields.
The sequence number for the ACL entry for the purpose of
ordering entries in an ACL. Lower numbered entries are matched
before higher numbered entries.
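Evaluation order can be sketched as a first-match walk over entries sorted by sequence number. This is a toy model: the matches predicate stands in for the header comparisons defined by the columns below, and unmatched packets are permitted by default:

```python
def evaluate_acl(entries, packet, matches):
    """Return the action ('permit' or 'deny') of the lowest-sequence
    entry that matches the packet; unmatched packets are allowed."""
    for entry in sorted(entries, key=lambda e: e["sequence"]):
        if matches(entry, packet):
            return entry["action"]
    return "permit"  # default-allow for packets no entry matches
```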
Source MAC address, in the form xx:xx:xx:xx:xx:xx.
Destination MAC address, in the form xx:xx:xx:xx:xx:xx.
Ethertype in hexadecimal, in the form 0xAAAA.
Source IP address, in the form
xx.xx.xx.xx for IPv4 or appropriate
colon-separated hexadecimal notation for IPv6.
Mask that determines which bits of source_ip to match on, in the form
xx.xx.xx.xx for IPv4 or appropriate
colon-separated hexadecimal notation for IPv6.
Destination IP address, in the form
xx.xx.xx.xx for IPv4 or appropriate
colon-separated hexadecimal notation for IPv6.
Mask that determines which bits of dest_ip to match on, in the form
xx.xx.xx.xx for IPv4 or appropriate
colon-separated hexadecimal notation for IPv6.
Protocol number in the IPv4 header, or value of the "next
header" field in the IPv6 header.
Lower end of the range of source port values. The value
specified is included in the range.
Upper end of the range of source port values. The value
specified is included in the range.
Lower end of the range of destination port values. The value
specified is included in the range.
Upper end of the range of destination port values. The value
specified is included in the range.
Integer representing the value of TCP flags to match. For
example, the SYN flag is the second least significant bit in
the TCP flags. Hence a value of 2 would indicate that the "SYN"
flag should be set (assuming an appropriate mask).
Integer representing the mask to apply when matching TCP
flags. For example, a value of 2 would imply that the "SYN"
flag should be matched and all other flags ignored.
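The value/mask semantics above amount to a masked equality test. A minimal sketch (constant name is illustrative):

```python
TCP_SYN = 0x02  # second least significant bit of the TCP flags field

def tcp_flags_match(packet_flags, value, mask):
    """An ACL entry matches when the masked packet flags equal the
    configured value; flag bits outside the mask are ignored."""
    return (packet_flags & mask) == (value & mask)
```

With value 2 and mask 2, a SYN+ACK packet (flags 0x12) matches because its SYN bit is set, while a pure ACK (flags 0x10) does not.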
ICMP type to be matched.
ICMP code to be matched.
Direction of traffic to match on the specified port, either
"ingress" (toward the logical switch or router) or "egress"
(leaving the logical switch or router).
Action to take for this rule, either "permit" or "deny".
An entry in this column indicates to the NVC that the ACL
could not be configured as requested. The switch must clear this column when the error
has been cleared.
Indicates that an ACL entry requested by
the controller could not be instantiated by the switch,
e.g. because it requires an unsupported combination of
fields to be matched.
Indicates that an error has occurred in configuring the ACL
entry but no
more specific information is available.
Access Control List table. Each ACL is constructed as a set of
entries from the table. Packets that
are not matched by any entry in the ACL are allowed by default.
A set of references to entries in the table.
A human readable name for the ACL, which may (for example) be displayed on
the switch CLI.
An entry in this column indicates to the NVC that the ACL
could not be configured as requested. The switch must clear this column when the error
has been cleared.
Indicates that an ACL requested by
the controller could not be instantiated by the switch,
e.g., because it requires an unsupported combination of
fields to be matched.
Indicates that an ACL requested by
the controller could not be instantiated by the switch due
to a shortage of resources (e.g. TCAM space).
Indicates that an error has occurred in configuring the ACL
but no
more specific information is available.