path: root/vswitchd/bridge.c
Commit message | Author | Age | Files | Lines
* ofproto-dpif-upcall: Wait for valid hw flow stats before applying min-revalidate-pps. | Eelco Chaudron | 2023-03-15 | 1 | -0/+3
  Depending on the driver implementation, it can take from 0.2 seconds up to 2 seconds before offloaded flow statistics are updated. This is true for both TC and rte_flow-based offloading. This causes a problem with min-revalidate-pps, as old statistic values are used during this period.

  This fix waits for at least 2 seconds, by default, before assuming no packets were received during this period.

  Reviewed-by: Simon Horman <simon.horman@corigine.com>
  Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
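The grace period described in this commit can be sketched as follows. This is only an illustration of the idea, not the code from ofproto-dpif-upcall: the function names, the constant name, and the choice of "time since the flow was added" as the measured interval are all assumptions.

```python
# Default wait taken from the commit message; the constant name is hypothetical.
OFFLOADED_STATS_DELAY_MS = 2000


def counters_usable(offloaded, ms_since_flow_added,
                    delay_ms=OFFLOADED_STATS_DELAY_MS):
    """Return True once a flow's packet counters can feed a pps check.

    Non-offloaded flows report statistics immediately. Offloaded flows
    may report stale values until the driver refreshes them, so their
    counters are ignored until the delay has passed.
    """
    if not offloaded:
        return True
    return ms_since_flow_added >= delay_ms


def looks_idle(offloaded, ms_since_flow_added, packets_seen):
    """Treat a flow as idle only when its counters are usable and zero."""
    return counters_usable(offloaded, ms_since_flow_added) and packets_seen == 0
```

With these assumptions, an offloaded flow that reports zero packets after 500 ms is not yet considered idle, while the same report after 2500 ms is.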
* ipfix: Make template and stats interval configurable. | Adrian Moreno | 2023-02-27 | 1 | -0/+17
  Add options to the IPFIX table to configure the interval to send statistics and template information.

  Reviewed-by: Simon Horman <simon.horman@corigine.com>
  Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* dpctl: Add support to count upcall packets. | wangchuanlei | 2023-01-31 | 1 | -1/+3
  Add support to count upcall packets per port, both successful and failed, which is a better way to see how many packets were upcalled on each interface.

  Acked-by: Eelco Chaudron <echaudro@redhat.com>
  Signed-off-by: wangchuanlei <wangchuanlei@inspur.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* netdev: Assume default link speed to be 10 Gbps instead of 100 Mbps. | Ilya Maximets | 2022-11-30 | 1 | -2/+2
  100 Mbps was a fair assumption 13 years ago. These days 10 Gbps seems like a good value in case no information is available otherwise.

  The change mainly affects QoS, which is currently limited to 100 Mbps if the user didn't specify 'max-rate' and the card doesn't report the speed or OVS doesn't have a predefined enumeration for the speed reported by the NIC. Calculation of the path cost for STP/RSTP is also affected if OVS is unable to determine the link speed.

  Lower link speed adapters are typically good at reporting their speed, so chances for overshoot should be low. But newer high-speed adapters, for which there is no speed enumeration or if there are some other issues, will not suffer that much.

  Acked-by: Mike Pattrick <mkp@redhat.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* vswitchd: Publish per iface received multicast packets. | David Marchand | 2022-11-24 | 1 | -0/+1
  The count of received multicast packets has been computed internally, but not exposed to ovsdb. Fix this.

  Signed-off-by: David Marchand <david.marchand@redhat.com>
  Acked-by: Mike Pattrick <mkp@redhat.com>
  Acked-by: Michael Santana <msantana@redhat.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* xenserver: Remove xenserver. | Greg Rose | 2022-08-15 | 1 | -48/+4
  Remove the current xenserver implementation - it is obsolete and since 3.0 we do not support kernel module builds [1].

  1. https://mail.openvswitch.org/pipermail/ovs-dev/2022-July/395789.html

  [i.maximets] Can be added back if people willing to maintain it are found.

  Signed-off-by: Greg Rose <gvrose8192@gmail.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ofproto/bond: Add knob 'all-members-active'. | Christophe Fontaine | 2022-07-15 | 1 | -0/+3
  This config param allows the delivery of broadcast and multicast packets to the secondary interface of non-lacp bonds, equivalent to the option 'all_slaves_active' for Linux kernel bonds.

  Reported-at: https://bugzilla.redhat.com/1720935
  Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* odp-execute: Add function pointers to odp-execute for different action implementations. | Emma Finn | 2022-07-15 | 1 | -0/+3
  This commit introduces the initial infrastructure required to allow different implementations for OvS actions. The patch introduces action function pointers which allow the user to switch between the different action implementations available. This will allow for more performance and flexibility so the user can choose the action implementation to best suit their use case.

  Signed-off-by: Emma Finn <emma.finn@intel.com>
  Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
  Acked-by: Sunil Pai G <sunil.pai.g@intel.com>
  Acked-by: Eelco Chaudron <echaudro@redhat.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* hmap: use short version of safe loops if possible. | Adrian Moreno | 2022-03-30 | 1 | -28/+28
  Using the SHORT version of the *_SAFE loops makes the code cleaner and less error-prone. So, use the SHORT version and remove the extra variable when possible for hmap and all its derived types.

  In order to be able to use both long and short versions without changing the name of the macro for all the clients, overload the existing name and select the appropriate version depending on the number of arguments.

  Acked-by: Dumitru Ceara <dceara@redhat.com>
  Acked-by: Eelco Chaudron <echaudro@redhat.com>
  Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* list: use short version of safe loops if possible. | Adrian Moreno | 2022-03-30 | 1 | -8/+8
  Using the SHORT version of the *_SAFE loops makes the code cleaner and less error-prone. So, use the SHORT version and remove the extra variable when possible.

  In order to be able to use both long and short versions without changing the name of the macro for all the clients, overload the existing name and select the appropriate version depending on the number of arguments.

  Acked-by: Dumitru Ceara <dceara@redhat.com>
  Acked-by: Eelco Chaudron <echaudro@redhat.com>
  Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* bridge: Fix incorrect configuration of netdev's dpif type. | Ilya Maximets | 2021-12-17 | 1 | -2/+0
  netdev_set_dpif_type() can only be used with a normalized dpif type as an argument, which is a constant static string derived from a type of a dpif_class or a constant string "system". Usage of the same constant string allows the netdev-offload module to compare types by simply comparing pointers.

  OTOH, 'br->ofproto->type' is a dynamic string that:

    a. Can be NULL.
    b. Even if not NULL and equal, can be a different dynamically allocated string.

  Both these qualities break assumptions made by all other modules related to HW offload, breaking the functionality.

  Fix that by moving netdev_set_dpif_type() to dpif.c and calling it with a correct constant string as an argument. The call moved from bridge.c to dpif.c, because we need to have access to the dpif class, but bridge.c should not.

  Not trying to set the dpif_type inside the netdev_ports_insert(), because it's used now outside the offloading context. So, it's cleaner to move the netdev_set_dpif_type() call outside of the netdev-offload module. Additionally removed the redundant call from the netdev_ports_insert() and refactored the function, since it doesn't need an extra argument anymore.

  Fixes: 4f19a78a61c5 ("netdev-vport: Fix userspace tunnel ioctl(SIOCGIFINDEX) info logs.")
  Reported-by: Roi Dayan <roid@nvidia.com>
  Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-December/390117.html
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
  Reviewed-by: Lin Huang <linhuang@ruijie.com.cn>
  Acked-by: Roi Dayan <roid@nvidia.com>
* netdev-vport: Fix userspace tunnel ioctl(SIOCGIFINDEX) info logs. | Lin Huang | 2021-12-08 | 1 | -0/+2
  A userspace tunnel doesn't have a valid device in the kernel. So the get_ifindex() function (ioctl) always gets an error when adding a port, deleting a port or updating a port status.

  The info log is:

    2021-08-29T09:17:39.830Z|00059|netdev_linux|INFO|ioctl(SIOCGIFINDEX) on vxlan_sys_4789 device failed: No such device

  If there are a lot of userspace tunnel ports on a bridge, the iface_refresh_netdev_status() function will spend a lot of time.

  So ignore the userspace tunnel port ioctl(SIOCGIFINDEX) operation, just return -ENODEV.

  Signed-off-by: Lin Huang <linhuang@ruijie.com.cn>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* ovsdb-idl: Add memory report function. | Ilya Maximets | 2021-11-04 | 1 | -0/+2
  Added a new function to return memory usage statistics for database objects inside IDL. The statistics are similar to what ovsdb-server reports. Not counting the _Server database as it should be small, hence it isn't worth adding extra code to the ovsdb-cs module. Can be added later if needed.

  ovs-vswitchd is a user in OVS, but this API will be mostly useful for OVN daemons.

  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
  Acked-by: Han Zhou <hzhou@ovn.org>
  Acked-by: Dumitru Ceara <dceara@redhat.com>
* ovsdb-data: Optimize union of sets. | Ilya Maximets | 2021-09-24 | 1 | -3/+3
  Current algorithm of ovsdb_datum_union looks like this:

    for-each atom in b:
        if not bin_search(a, atom):
            push(a, clone(atom))
    quicksort(a)

  So, the complexity looks like this:

    Nb * log2(Na)  +  Nb      +  (Na + Nb) * log2(Na + Nb)
    comparisons       clones     comparisons
    for search                   for quicksort

  ovsdb_datum_union() is heavily used in database transactions while a new element is added to a set. For example, if a new logical switch port is added to a logical switch in OVN. This is a very common use case where CMS adds one new port to an existing switch that already has, let's say, 100 ports. For this case ovsdb-server will have to perform:

    1 * log2(100)  +  1 clone  +  101 * log2(101)
    comparisons                   comparisons
    for search                    for quicksort
        ~7               1            ~707

  Roughly 714 comparisons of atoms and 1 clone.

  Since binary search can give us the position where the new atom should go (it's the 'low' index after the search completion) for free, the logic can be re-worked like this:

    copied = 0
    for-each atom in b:
        desired_position = bin_search(a, atom)
        push(result, a[ copied : desired_position - 1 ])
        copied = desired_position
        push(result, clone(atom))
    push(result, a[ copied : Na ])
    swap(a, result)

  Complexity of this schema:

    Nb * log2(Na)  +  Nb      +  Na
    comparisons       clones     memory copy
    for search                   on push

  'swap' is just a swap of a few pointers. 'push' is not a 'clone', but a simple memory copy of 'union ovsdb_atom'.

  In general, this schema substitutes the complexity of a quicksort with the complexity of a memory copy of Na atom structures, where we're not even copying the strings that these atoms are pointing to.

  Complexity in the example above goes down from 714 comparisons to 7 comparisons and a memcpy of 100 * sizeof (union ovsdb_atom) bytes.

  General complexity of a memory copy should always be lower than the complexity of a quicksort, especially because these copies are usually performed in bulk, so this new schema should work faster for any input.

  All in all, this change allows executing several times more transactions per second for transactions that add new entries to sets.

  Alternatively, union can be implemented as a linear merge of two sorted arrays, but this will result in O(Na) comparisons, which is more than Nb * log2(Na) in the common case, since Na is usually far bigger than Nb. Linear merge will also mean per-atom memory copies instead of copying in bulk.

  The 'replace' functionality of ovsdb_datum_union() had no users, so it is just removed. But it can easily be added back if needed in the future.

  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
  Acked-by: Han Zhou <hzhou@ovn.org>
  Acked-by: Mark D. Gray <mark.d.gray@redhat.com>
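The reworked union from this commit message can be sketched in Python as below. It is an illustration of the technique only, not the ovsdb_datum_union() code: plain lists stand in for datums, `sorted_union` is a hypothetical name, and the sketch assumes both inputs are already sorted and that `a` has no duplicates. It also skips atoms that are already present, a step the pseudocode leaves implicit.

```python
from bisect import bisect_left


def sorted_union(a, b):
    """Return the sorted union of sorted lists `a` and `b`.

    Each atom of `b` costs one binary search in `a`; the untouched runs
    of `a` are copied in bulk with slices instead of re-sorting.
    """
    result = []
    copied = 0  # number of leading items of `a` already copied to `result`
    for atom in b:
        # `b` is sorted, so the search can start where the last one ended.
        pos = bisect_left(a, atom, copied)
        result.extend(a[copied:pos])  # bulk copy of everything below `atom`
        copied = pos
        if pos < len(a) and a[pos] == atom:
            continue  # already in `a`; it is copied with the next run
        if result and result[-1] == atom:
            continue  # duplicate inside `b`
        result.append(atom)
    result.extend(a[copied:])  # remaining tail of `a`
    return result
```

For the 100-port example, `sorted_union(list(range(0, 200, 2)), [101])` performs a single binary search and two slice copies, and no sort is needed afterwards.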
* bridge: Use correct (legacy) role names in database. | Ben Pfaff | 2021-07-07 | 1 | -2/+2
  The vswitchd database schema requires role names to be "master" or "slave", but this code tried to use "primary" and "secondary".

  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Reported-at: https://github.com/openvswitch/ovs-issues/issues/218
  Tested-at: https://github.com/openvswitch/ovs-issues/issues/218#issuecomment-875374045
  Fixes: 807152a4ddfb ("Use primary/secondary, not master/slave, as names for OpenFlow roles.")
* bridge: fix type mismatch | Yunjian Wang | 2021-07-02 | 1 | -5/+5
  Currently the function ofproto_set_flow_limit() does not check the 'limit' value. It may be negative, which will lead to a large unsigned value. The 'limit' should never be negative, so it's better to just use smap_get_uint() to get it right.

  Also fix ofproto_set_max_idle(), ofproto_set_min_revalidate_pps(), ofproto_set_max_revalidator() and ofproto_set_bundle_idle_timeout() in the same way.

  Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
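The hazard described here can be shown with a small sketch. It models the conversion of a negative configured value to a 32-bit unsigned one, and a defensive parser in the spirit of an unsigned getter. `get_uint` is a hypothetical helper written for this illustration; it is not the smap_get_uint() implementation, and the key name and default in the usage note are only examples.

```python
UINT32_MAX = 0xFFFFFFFF


def as_uint32(value):
    """Model what storing a signed int in a 32-bit unsigned variable does."""
    return value & UINT32_MAX


def get_uint(config, key, default):
    """Return config[key] as an unsigned integer, or `default`.

    Missing keys, negative numbers, non-numeric text and values that do
    not fit in 32 bits all fall back to the default.
    """
    text = config.get(key)
    if text is None or not (text.isascii() and text.isdigit()):
        return default
    value = int(text)
    return value if value <= UINT32_MAX else default
```

Here `as_uint32(-1)` yields 4294967295, which is the "large unsigned value" the commit warns about, while `get_uint({"flow-limit": "-1"}, "flow-limit", 200000)` falls back to the supplied default.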
* bridge: Only an inactivity_probe of 0 should turn off inactivity probes. | Ben Pfaff | 2021-07-02 | 1 | -2/+4
  The documentation for inactivity_probe says this:

    inactivity_probe: optional integer
        Maximum number of milliseconds of idle time on connection to controller before sending an inactivity probe message. If Open vSwitch does not communicate with the controller for the specified number of seconds, it will send a probe. If a response is not received for the same additional amount of time, Open vSwitch assumes the connection has been broken and attempts to reconnect. Default is implementation-specific. A value of 0 disables inactivity probes.

  This means that a value of 0 should disable inactivity probes and any other value should be in milliseconds. The code in bridge.c was actually interpreting it as any value between 0 and 999 disabling inactivity probes. That was surprising when I accidentally configured it to 5 or to 10, not remembering that it was in milliseconds, and disabled them entirely. This fixes the problem.

  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Ilya Maximets <i.maximets@ovn.org>
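The behavior described above is what a truncating millisecond-to-second conversion produces. The sketch below contrasts that with a conversion that rounds up; both function names are hypothetical, and rounding up is an assumption about one way to get the documented behavior, not a quote of the bridge.c change.

```python
def probe_seconds_truncating(inactivity_probe_ms):
    """Integer division: every value from 0 to 999 becomes 0 (disabled)."""
    return inactivity_probe_ms // 1000


def probe_seconds_rounding_up(inactivity_probe_ms):
    """Round up to whole seconds so that only 0 maps to 0 (disabled)."""
    return (inactivity_probe_ms + 999) // 1000
```

With the truncating version a setting of 5 ms silently disables probes; with the rounding version it becomes a 1 second interval and only an explicit 0 disables them.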
* add port-based ingress policing based packet-per-second rate-limiting | Yong Xu | 2021-07-01 | 1 | -2/+4
  OVS has support for using policing to enforce a rate limit in kilobits per second. This is configured using OVSDB. For example:

    $ ovs-vsctl set interface tap0 ingress_policing_rate=1000
    $ ovs-vsctl set interface tap0 ingress_policing_burst=100

  This patch adds a related feature, allowing policing to enforce a rate limit in kilo-packets per second. This is also configured using OVSDB:

    $ ovs-vsctl set interface tap0 ingress_policing_kpkts_rate=1000
    $ ovs-vsctl set interface tap0 ingress_policing_kpkts_burst=100

  The kilo-bit and kilo-packet rate limits may be used separately or in combination.

  Add a separate action for BPS and PPS in the netlink message. Revise the code and change the action result to pipe, to allow traffic to pipe into the second action.

  This patch implements the feature for:

    * OVSDB (northbound API)
    * TC policer when used both with and without TC offload (kernel API)

  Signed-off-by: Yong Xu <yong.xu@corigine.com>
  Signed-off-by: Simon Horman <simon.horman@netronome.com>
* Eliminate use of term "slave" in bond, LACP, and bundle contexts. | Ben Pfaff | 2020-10-21 | 1 | -17/+18
  The new term is "member".

  Most of these changes should not change user-visible behavior. One place where they do is in "ovs-ofctl dump-flows", which will now output "members:..." inside "bundle" actions instead of "slaves:...". I don't expect this to cause real problems in most systems. The old syntax is still supported on input for backward compatibility.

  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
* bond: Fix using uninitialized 'lacp_fallback_ab_cfg' for 'bond-primary'. | Ilya Maximets | 2020-10-17 | 1 | -5/+4
  's->lacp_fallback_ab_cfg' is initialized further down in the code, so we're using it uninitialized to detect if we need to get the 'bond-primary' configuration.

  Found by valgrind:

    Conditional jump or move depends on uninitialised value(s)
       at 0x409114: port_configure_bond (bridge.c:4569)
       by 0x409114: port_configure (bridge.c:1284)
       by 0x40F6E6: bridge_reconfigure (bridge.c:917)
       by 0x411425: bridge_run (bridge.c:3330)
       by 0x406D84: main (ovs-vswitchd.c:127)
     Uninitialised value was created by a stack allocation
       at 0x408C53: port_configure (bridge.c:1190)

  Fix that by moving this code to the point where 'lacp_fallback_ab_cfg' is already initialized. Additionally clarified the behavior of 'bond-primary' in manpages for the fallback to AB case.

  Fixes: b4e50218a0f8 ("bond: Add 'primary' interface concept for active-backup mode.")
  Acked-by: Jeff Squyres <jsquyres@cisco.com>
  Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
* Eliminate "whitelist" and "blacklist" terms. | Ben Pfaff | 2020-10-16 | 1 | -14/+13
  There is one remaining use under datapath. That change should happen upstream in Linux first according to our usual policy.

  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
* Use primary/secondary, not master/slave, as names for OpenFlow roles. | Ben Pfaff | 2020-10-16 | 1 | -4/+4
  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
* bond: Add 'primary' interface concept for active-backup mode. | Jeff Squyres | 2020-07-17 | 1 | -0/+5
  In AB bonding, if the current active slave becomes disabled, a replacement slave is arbitrarily picked from the remaining set of enabled slaves. This commit adds the concept of a "primary" slave: an interface that will always be (or become) the current active slave if it is enabled.

  The rationale for this functionality is to allow the designation of a preferred interface for a given bond. For example:

    1. Bond is created with interfaces p1 (primary) and p2, both enabled.
    2. p1 becomes the current active slave (because it was designated as the primary).
    3. Later, p1 fails/becomes disabled.
    4. p2 is chosen to become the current active slave.
    5. Later, p1 becomes re-enabled.
    6. p1 is chosen to become the current active slave (because it was designated as the primary).

  Note that p1 becomes the active slave once it becomes re-enabled, even if nothing has happened to p2.

  This "primary" concept exists in Linux kernel network interface bonding, but did not previously exist in OVS bonding.

  Only one primary slave interface is supported per bond, and it is only supported for active/backup bonding.

  The primary slave interface is designated via "other_config:bond-primary" when creating a bond.

  Also, while adding tests for the "primary" concept, make a few small improvements to the non-primary AB bonding test.

  Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
  Reviewed-by: Aaron Conole <aconole@redhat.com>
  Tested-by: Greg Rose <gvrose8192@gmail.com>
  Acked-by: Greg Rose <gvrose8192@gmail.com>
  Acked-by: Flavio Leitner <fbl@sysclose.org>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
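The six-step scenario in this commit message can be replayed with a small selection function. This is a sketch under assumptions: `choose_active` and its parameters are hypothetical names, and where the commit says a replacement is "arbitrarily picked", the sketch simply takes the first enabled member in list order.

```python
def choose_active(members, enabled, current, primary=None):
    """Pick the active member of an active-backup bond.

    members: ordered list of member names
    enabled: set of members that are currently usable
    current: the member that is active right now, or None
    primary: the preferred member, or None if no primary is configured
    """
    if primary is not None and primary in enabled:
        return primary      # the primary always wins while it is enabled
    if current in enabled:
        return current      # otherwise keep the current member
    for member in members:  # otherwise fall back to any enabled member
        if member in enabled:
            return member
    return None             # nothing is enabled
```

Replaying the example with `members = ["p1", "p2"]` and `primary="p1"`: both enabled gives p1; with only p2 enabled the result is p2; once p1 is enabled again the result returns to p1 even though p2 is still healthy. Without a primary, the last call would keep p2.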
* bridge: Fix null dereference on ct_timeout_policy record | Yi-Hung Wei | 2020-06-27 | 1 | -2/+4
  According to vswitch.ovsschema, each CT_Zone record may have zero or one associated CT_Timeout_policy. Thus, this patch checks if ovsrec_ct_timeout_policy exists before accessing the record.

  VMWare-BZ: 2585825
  Fixes: 45339539f69d ("ovs-vsctl: Add conntrack zone commands.")
  Fixes: 993cae678bca ("ofproto-dpif: Consume CT_Zone, and CT_Timeout_Policy tables")
  Reported-by: Yang Song <yangsong@vmware.com>
  Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
  Signed-off-by: William Tu <u9012063@gmail.com>
* userspace: Avoid dp_hash recirculation for balance-tcp bond mode. | Vishal Deep Ajmera | 2020-06-22 | 1 | -0/+5
  Problem:

  In OVS, flows with output over a bond interface of type “balance-tcp” get translated by the ofproto layer into "HASH" and "RECIRC" datapath actions. After recirculation, the packet is forwarded to the bond member port based on 8 bits of the datapath hash value computed through dp_hash. This causes performance degradation in the following ways:

    1. The recirculation of the packet implies another lookup of the packet’s flow key in the exact match cache (EMC) and potentially the Megaflow classifier (DPCLS). This is the biggest cost factor.
    2. The recirculated packets have a new “RSS” hash and compete with the original packets for the scarce number of EMC slots. This implies more EMC misses and potentially EMC thrashing causing costly DPCLS lookups.
    3. The 256 extra megaflow entries per bond for dp_hash bond selection put additional load on the revalidation threads.

  Owing to this performance degradation, deployments stick to “balance-slb” bond mode even though it does not do active-active load balancing for VXLAN- and GRE-tunnelled traffic because all tunnel packets have the same source MAC address.

  Proposed optimization:

  This proposal introduces a new load-balancing output action instead of recirculation.

  Maintain one table per bond (could just be an array of uint16's) and program it the same way internal flows are created today for each possible hash value (256 entries) from the ofproto layer. Use this table to load-balance flows as part of output action processing.

  Currently xlate_normal() -> output_normal() -> bond_update_post_recirc_rules() -> bond_may_recirc() and compose_output_action__() generate 'dp_hash(hash_l4(0))' and 'recirc(<RecircID>)' actions. In this case the RecircID identifies the bond. For the recirculated packets the ofproto layer installs megaflow entries that match on RecircID and masked dp_hash and send them to the corresponding output port.

  Instead, we will now generate the action as 'lb_output(<bond id>)'.

  This combines hash computation (only if needed, else re-use the RSS hash) and inline load-balancing over the bond. This action is used *only* for balance-tcp bonds in the userspace datapath (the OVS kernel datapath remains unchanged).

  Example:

  Current scheme, with 8 UDP flows (with random UDP src port):

    flow-dump from pmd on cpu core: 2
    recirc_id(0),in_port(7),<...> actions:hash(hash_l4(0)),recirc(0x1)
    recirc_id(0x1),dp_hash(0xf8e02b7e/0xff),<...> actions:2
    recirc_id(0x1),dp_hash(0xb236c260/0xff),<...> actions:1
    recirc_id(0x1),dp_hash(0x7d89eb18/0xff),<...> actions:1
    recirc_id(0x1),dp_hash(0xa78d75df/0xff),<...> actions:2
    recirc_id(0x1),dp_hash(0xb58d846f/0xff),<...> actions:2
    recirc_id(0x1),dp_hash(0x24534406/0xff),<...> actions:1
    recirc_id(0x1),dp_hash(0x3cf32550/0xff),<...> actions:1

  New scheme: we can do with a single flow entry (for any number of new flows):

    in_port(7),<...> actions:lb_output(1)

  A new CLI has been added to dump the datapath bond cache as given below.

    # ovs-appctl dpif-netdev/bond-show [dp]
    Bond cache:
      bond-id 1 :
        bucket 0 - slave 2
        bucket 1 - slave 1
        bucket 2 - slave 2
        bucket 3 - slave 1

  Co-authored-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com>
  Signed-off-by: Manohar Krishnappa Chidambaraswamy <manukc@gmail.com>
  Signed-off-by: Vishal Deep Ajmera <vishal.deep.ajmera@ericsson.com>
  Tested-by: Matteo Croce <mcroce@redhat.com>
  Tested-by: Adrian Moreno <amorenoz@redhat.com>
  Acked-by: Eelco Chaudron <echaudro@redhat.com>
  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
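The per-bond table described in this commit can be pictured with the sketch below: 256 buckets, indexed by the low 8 bits of the packet hash, each naming a member port. The function names are hypothetical, and the round-robin bucket assignment is an assumption made for illustration; how the ofproto layer actually distributes buckets is not stated here.

```python
BOND_BUCKETS = 256  # one bucket per possible value of the 8-bit masked hash
BOND_MASK = 0xFF


def build_bucket_table(member_ports):
    """Assign every bucket to a member port (round-robin, as an assumption)."""
    if not member_ports:
        raise ValueError("a bond needs at least one member port")
    return [member_ports[i % len(member_ports)] for i in range(BOND_BUCKETS)]


def lb_output(bucket_table, packet_hash):
    """Pick the output port directly from the table, with no recirculation."""
    return bucket_table[packet_hash & BOND_MASK]
```

A packet with hash 0xf8e02b7e lands in bucket 0x7e, so `lb_output(table, 0xf8e02b7e)` is a single array lookup. That lookup replaces the second classifier pass and the up to 256 extra megaflow entries per bond that the recirculation scheme needs.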
* vswitchd: Add serial number configuration. | Kirill A. Kornilov | 2020-01-31 | 1 | -0/+9
  Signed-off-by: Kirill A. Kornilov <kornilov@zelax.ru>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* userspace: Add TCP Segmentation Offload support | Flavio Leitner | 2020-01-17 | 1 | -0/+2
  Abbreviated as TSO, TCP Segmentation Offload is a feature which enables the network stack to delegate the TCP segmentation to the NIC, reducing the per packet CPU overhead.

  A guest using a vhostuser interface with TSO enabled can send TCP packets much bigger than the MTU, which saves CPU cycles normally used to break the packets down to MTU size and to calculate checksums.

  It also saves CPU cycles used to parse multiple packets/headers during the packet processing inside the virtual switch.

  If the destination of the packet is another guest in the same host, then the same big packet can be sent through a vhostuser interface, skipping the segmentation completely. However, if the destination is not local, the NIC hardware is instructed to do the TCP segmentation and checksum calculation.

  It is recommended to check if the NIC hardware supports TSO before enabling the feature, which is off by default. For additional information please check the tso.rst document.

  Signed-off-by: Flavio Leitner <fbl@sysclose.org>
  Tested-by: Ciara Loftus <ciara.loftus.intel.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>
* bridge: Split the column updates of rstp statistics and status. | Krishna Kolakaluri | 2020-01-06 | 1 | -3/+31
  Split the update of the rstp_statistics column and the rstp_status column in the Port table into two different functions. This helps in controlling the number of times the rstp_statistics column is updated with the key "stats-update_interval" in the Open_vSwitch table.

  Signed-off-by: Krishna Kolakaluri <kkolakaluri@plume.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* bridge: Allow manual notifications about interfaces' updates. | Ilya Maximets | 2019-12-18 | 1 | -0/+2
  Sometimes interface updates could happen in a way ifnotifier is not able to catch. For example, some heavy operations (device reset) in netdev-dpdk could require re-applying the bridge configuration.

  For this purpose a new manual notifier is introduced. Its function 'if_notifier_manual_report()' could be called directly by the code that is aware of the changes.

  This new notifier is thread-safe.

  Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
  Acked-by: Eelco Chaudron <echaudro@redhat.com>
* Add offload packets statistics | zhaozhanxu | 2019-12-06 | 1 | -5/+12
  Add the argument '--offload-stats' for the command ovs-appctl bridge/dump-flows to display the offloaded packets statistics.

  The commands display as below.

  Original command:

    ovs-appctl bridge/dump-flows br0
    duration=574s, n_packets=1152, n_bytes=110768, priority=0,actions=NORMAL
    table_id=254, duration=574s, n_packets=0, n_bytes=0, priority=2,recirc_id=0,actions=drop
    table_id=254, duration=574s, n_packets=0, n_bytes=0, priority=0,reg0=0x1,actions=controller(reason=)
    table_id=254, duration=574s, n_packets=0, n_bytes=0, priority=0,reg0=0x2,actions=drop
    table_id=254, duration=574s, n_packets=0, n_bytes=0, priority=0,reg0=0x3,actions=drop

  New command with argument '--offload-stats'. Notice: 'n_offload_packets' is a subset of n_packets and 'n_offload_bytes' is a subset of n_bytes.

    ovs-appctl bridge/dump-flows --offload-stats br0
    duration=582s, n_packets=1152, n_bytes=110768, n_offload_packets=1107, n_offload_bytes=107992, priority=0,actions=NORMAL
    table_id=254, duration=582s, n_packets=0, n_bytes=0, n_offload_packets=0, n_offload_bytes=0, priority=2,recirc_id=0,actions=drop
    table_id=254, duration=582s, n_packets=0, n_bytes=0, n_offload_packets=0, n_offload_bytes=0, priority=0,reg0=0x1,actions=controller(reason=)
    table_id=254, duration=582s, n_packets=0, n_bytes=0, n_offload_packets=0, n_offload_bytes=0, priority=0,reg0=0x2,actions=drop
    table_id=254, duration=582s, n_packets=0, n_bytes=0, n_offload_packets=0, n_offload_bytes=0, priority=0,reg0=0x3,actions=drop

  Signed-off-by: zhaozhanxu <zhaozhanxu@163.com>
  Signed-off-by: Simon Horman <simon.horman@netronome.com>
* ofproto-dpif: Expose datapath capability to ovsdb. | William Tu | 2019-11-21 | 1 | -0/+21
  The patch adds support for fetching the datapath's capabilities from the result of 'check_support()', and writes the supported capabilities to a new database column, called 'capabilities', under the Datapath table.

  To see how it works, run:

    # ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
    # ovs-vsctl -- --id=@m create Datapath datapath_version=0 \
        'ct_zones={}' 'capabilities={}' \
        -- set Open_vSwitch . datapaths:"netdev"=@m

    # ovs-vsctl list-dp-cap netdev
    ufid=true sample_nesting=true clone=true tnl_push_pop=true \
    ct_orig_tuple=true ct_eventmask=true ct_state=true \
    ct_clear=true max_vlan_headers=1 recirc=true ct_label=true \
    max_hash_alg=1 ct_state_nat=true ct_timeout=true \
    ct_mark=true ct_orig_tuple6=true check_pkt_len=true \
    masked_set_action=true max_mpls_depth=3 trunc=true ct_zone=true

  Signed-off-by: William Tu <u9012063@gmail.com>
  Tested-by: Greg Rose <gvrose8192@gmail.com>
  Reviewed-by: Greg Rose <gvrose8192@gmail.com>
  ---
  v5: Add improved documentation from Ben and fix checkpatch error (tab and line 79 char)
  v4: rebase to master
  v3: fix 32-bit build, reported by Greg
      travis: https://travis-ci.org/williamtu/ovs-travis/builds/599276267
  v2: rebase to master
* ofproto-dpif: Consume CT_Zone, and CT_Timeout_Policy tables | Yi-Hung Wei | 2019-09-26 | 1 | -2/+195
  This patch consumes the CT_Zone and CT_Timeout_Policy tables and maintains the zone-based configuration in the vswitchd. Whenever there is a database change, vswitchd will read the datapath, CT_Zone, and CT_Timeout_Policy tables from ovsdb, build an internal snapshot of the database configuration in bridge.c, and push down the change into the ofproto and dpif layers.

  If a new zone-based timeout policy is added, it updates the zone to timeout policy mapping in the per datapath type datapath structure in dpif-backer, and pushes down the timeout policy into the datapath via the dpif interface.

  If a timeout policy is no longer used, for the kernel datapath, vswitchd may not be able to remove it from the datapath immediately since datapath flows can still reference the to-be-deleted timeout policies. Thus, we keep a timeout policy kill list; vswitchd goes back to the list periodically and tries to kill the unused timeout policies.

  Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
  Signed-off-by: Justin Pettit <jpettit@ovn.org>
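The kill-list idea in the last paragraph can be sketched as a deferred-deletion queue. Everything here is hypothetical naming: `TimeoutPolicyReaper` and the `try_delete` callback are stand-ins for whatever vswitchd and the dpif layer actually use, and the callback is assumed to return False while the datapath still references the policy.

```python
class TimeoutPolicyReaper:
    """Retire timeout policies, retrying the ones still in use."""

    def __init__(self, try_delete):
        # try_delete(policy_id) -> bool; True means the datapath removed it.
        self._try_delete = try_delete
        self.kill_list = []

    def retire(self, policy_id):
        """Delete a policy now if possible, otherwise queue it for later."""
        if not self._try_delete(policy_id):
            self.kill_list.append(policy_id)

    def sweep(self):
        """Periodic pass: keep only the policies that still cannot be deleted."""
        self.kill_list = [p for p in self.kill_list if not self._try_delete(p)]
```

A caller would invoke `retire()` when a policy drops out of the configuration and `sweep()` from the periodic run loop, so a policy referenced by live datapath flows is removed once those flows are gone.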
* show "rx_missed_errors" counter in interface statistics | txfh2007 | 2019-09-25 | 1 | -0/+1
  Hi all:

  Currently OVS maintains several statistics counters per interface. The "rx_missed_errors" counter is among them and collects packets not received due to local resource constraints. Many OVS netdevs support collecting this counter, such as netdev-linux, netdev-dpdk, netdev-bsd and so on. But as far as I know, this counter can't be read with the command "ovs-vsctl list interface <int-name>|grep statistics". The root cause I found (I may be wrong) is in "iface_refresh_stats": "rx_missed_errors" is not in the macro IFACE_STATS. So even if this counter is updated by netdev, it wouldn't be read by users. This simple patch tries to solve this problem, many thanks for your kind reminder.

  Signed-off-by: Liu Chang <liuchang@cmss.chinamobile.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* vswitchd: Make packet-in controller queue size configurable | Dumitru Ceara | 2019-09-23 | 1 | -0/+26
  The ofconn packet-in queue for packets that can't be immediately sent on the rconn connection was limited to 100 packets (hardcoded value). While increasing this limit is usually not recommended as it might create buffer bloat and increase latency, in scaled scenarios it is useful if the administrator (or CMS) can adjust the queue size.

  One such situation was noticed while performing scale testing of the OVN IGMP functionality: triggering ~200 simultaneous IGMP reports was causing tail drops on the packet-in queue towards ovn-controller.

  This commit adds the possibility to configure the queue size for:

    - management controller (br-int.mgmt): through the other_config:controller-queue-size column of the Bridge table. This value is limited to 512 as large queues definitely affect latency. If not present the default value of 100 is used. This is done in order to maintain the same default behavior as before the commit.
    - other controllers: through the controller_queue_size column of the Controller table. This value is also limited to 512. If not present the code uses the Bridge:other_config:controller-queue-size configuration.

  Acked-by: Mark Michelson <mmichels@redhat.com>
  Signed-off-by: Dumitru Ceara <dceara@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
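The precedence between the two settings can be summarized in a short sketch. The function and constant names are hypothetical. Clamping an oversized value to 512 is an assumption: the commit only says the value "is limited to 512", and the limit could equally be enforced by rejecting larger values.

```python
DEFAULT_QUEUE_SIZE = 100  # default named in the commit message
MAX_QUEUE_SIZE = 512      # upper limit named in the commit message


def effective_queue_size(controller_queue_size=None, bridge_queue_size=None):
    """Resolve the packet-in queue size for one controller connection.

    controller_queue_size: Controller table value, or None if unset
    bridge_queue_size: Bridge other_config:controller-queue-size, or None
    """
    size = controller_queue_size
    if size is None:
        size = bridge_queue_size   # fall back to the bridge-wide setting
    if size is None:
        return DEFAULT_QUEUE_SIZE  # nothing configured anywhere
    return min(size, MAX_QUEUE_SIZE)
```

For the management controller only the bridge-wide value applies, which corresponds to calling the function with `controller_queue_size=None`.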
* vswitch: ratelimit the device add log (Aaron Conole, 2019-09-23; 1 file, -2/+7)
  It's possible that a port added to the system with certain kinds of invalid parameters will cause the 'could not add' log to be triggered. When this happens, the vswitch run loop can continually re-attempt adding the port. While the parameters remain invalid the vswitch run loop will re-trigger the warning, flooding the syslog.

  This patch adds a simple rate limit to the log.

  Acked-by: William Tu <u9012063@gmail.com>
  Signed-off-by: Aaron Conole <aconole@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
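To illustrate why a rate limit stops the flood, here is a generic token-bucket sketch in Python. It is not the mechanism OVS uses internally; the class name, the rate and the burst values are hypothetical, since the commit message gives none.

```python
class RateLimiter:
    """Generic token bucket: allows `burst` messages at once, then
    refills at `rate_per_minute` tokens per minute."""

    def __init__(self, rate_per_minute, burst):
        self.rate = rate_per_minute / 60.0  # tokens per second
        self.burst = burst
        self.tokens = float(burst)
        self.last = None  # timestamp of the previous call

    def allow(self, now):
        """Return True if a message may be logged at time `now` (seconds)."""
        if self.last is not None:
            elapsed = now - self.last
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def logged_attempts(attempt_times, limiter):
    """Return the attempt times whose 'could not add' warning is emitted."""
    return [t for t in attempt_times if limiter.allow(t)]
```

Each pass of the run loop still retries the port, but only the attempts that find a token in the bucket produce a log line.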
* upcall: Configure datapath min-revalidate-pps through ovs-vsctl. (Vlad Buslov, 2019-08-21; 1 file, -0/+3)
  This patch adds a new configuration option, "min-revalidate-pps", to the Open_vSwitch "other-config" column. This sets the minimum pps that a flow must have in order to be revalidated when the revalidation duration exceeds half of the max-revalidator config variable.

  Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
  Acked-by: Roi Dayan <roid@mellanox.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
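The condition in this message can be written out as a small predicate. This is a Python sketch of the rule as worded, with hypothetical names and units; the message does not say what happens to a flow that fails the check, so the sketch only answers whether the flow qualifies for revalidation.

```python
def qualifies_for_revalidation(flow_pps, min_revalidate_pps,
                               revalidation_duration_ms, max_revalidator_ms):
    """Apply the min-revalidate-pps rule as described in the commit message.

    The threshold only matters once revalidation takes longer than half
    of max-revalidator; below that, every flow qualifies.
    """
    if revalidation_duration_ms <= max_revalidator_ms / 2:
        return True
    return flow_pps >= min_revalidate_pps
```

For example, with max-revalidator at 500 ms and a threshold of 5 pps, a 1 pps flow still qualifies while revalidation takes 100 ms, but not once it takes 300 ms.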
* upcall: Configure datapath max-revalidator through ovs-vsctl. (Vlad Buslov, 2019-08-21; 1 file, -0/+3)
  This patch adds a new configuration option, "max-revalidator", to the Open_vSwitch "other-config" column. This sets the maximum allowed revalidator timeout. The actual timeout value is determined at runtime as the minimum of "max-idle" and "max-revalidator".

  Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
  Acked-by: Roi Dayan <roid@mellanox.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
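As a worked example of the runtime rule, the following Python sketch computes the effective timeout. The function name and the millisecond values are illustrative assumptions, not OVS defaults.

```python
def revalidator_timeout_ms(max_idle_ms, max_revalidator_ms):
    """Effective revalidator timeout: the smaller of the two settings."""
    return min(max_idle_ms, max_revalidator_ms)


# Hypothetical values: whichever setting is lower wins.
examples = [
    (10000, 500),  # max-revalidator is the binding limit
    (200, 500),    # max-idle is the binding limit
]
for max_idle, max_reval in examples:
    print(max_idle, max_reval, "->", revalidator_timeout_ms(max_idle, max_reval))
```

Raising "max-revalidator" therefore has no effect once it exceeds "max-idle".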
* netdev: Split up netdev offloading to separate module. (Ilya Maximets, 2019-06-11; 1 file, -0/+1)
  A new module, 'netdev-offload', is created to manage different flow API implementations. All the generic and provider-independent code is moved there from the 'netdev' module. Flow API providers are further encapsulated.

  The only function that was changed is 'netdev_any_oor'. It now uses an offloading-related hmap instead of the common 'netdev_shash'.

  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Acked-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Roi Dayan <roid@mellanox.com>
* ofproto-dpif-xlate: Add "always" mode to priority tags (Eli Britstein, 2019-05-24; 1 file, -0/+2)
  Configure "if-nonzero" priority tags to retain the 802.1Q header when the VLAN ID is zero, except when both the VLAN ID and priority are zero. Add an "always" configuration option to retain the 802.1Q header in such frames as well.

  Signed-off-by: Eli Britstein <elibr@mellanox.com>
  Reviewed-by: Roi Dayan <roid@mellanox.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
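The difference between the modes for a frame whose VLAN ID is zero can be summarized in a short Python sketch. The function name is hypothetical and this models only the rule as worded in the commit messages on this page ("never", "if-nonzero", "always"); it is not the xlate code.

```python
def keeps_8021q_header(mode, priority):
    """Decide whether a frame with VLAN ID 0 keeps its 802.1Q header.

    mode:     "never", "if-nonzero" or "always"
    priority: the 3-bit PCP value of the frame (0..7)
    """
    if mode == "always":
        return True               # kept even when VLAN ID and priority are both 0
    if mode == "if-nonzero":
        return priority != 0      # dropped only when priority is also 0
    if mode == "never":
        return False
    raise ValueError("unknown priority-tags mode: %r" % mode)
```

The only input on which "if-nonzero" and "always" disagree is a frame with both VLAN ID and priority equal to zero.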
* ofproto-dpif-xlate: Change priority tags from boolean to enum (Eli Britstein, 2019-05-24; 1 file, -2/+6)
  Priority tags is a port configuration that determines how the port treats priority tags, e.g. zero VLAN ID. Change the type from boolean to enum as a pre-step towards introducing additional modes. The new options are "never", equivalent to the previous "false", and "if-nonzero", equivalent to the previous "true". "true" is still supported for backwards compatibility.

  Signed-off-by: Eli Britstein <elibr@mellanox.com>
  Reviewed-by: Roi Dayan <roid@mellanox.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
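A boolean-to-enum migration of this kind usually keeps the old spelling as an alias. The Python sketch below shows the idea; the function name is hypothetical, and mapping any unrecognized or missing value to "never" is an assumption, since the message only states that "true" remains accepted.

```python
def parse_priority_tags(value):
    """Map a configured priority-tags string to one of the enum modes.

    "true" is kept as a backwards-compatible alias for "if-nonzero".
    Assumption: anything else, including a missing value, means "never".
    """
    if value in ("if-nonzero", "true"):
        return "if-nonzero"
    return "never"
```

Existing databases that still store "true" keep their previous behavior without being rewritten.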
* bridge: Propagate patch port pairing errors to db. (Ilya Maximets, 2019-03-26; 1 file, -0/+13)
  Virtual ports like 'patch' ports that are almost fully implemented on the 'ofproto' layer could have statuses internal to 'ofproto' that could not be retrieved from 'netdev' or other layers. For example, in the current implementation there is no way to get the patch port pairing status (i.e. whether it has a usable peer).

  A new 'ofproto-provider' API function, 'vport_get_status', is introduced to cover this gap. It allows the 'bridge' layer to retrieve the current status of ofproto virtual ports and propagate it to the DB. For now we're only interested in pairing errors of 'patch' ports. These are propagated to the 'error' column of the 'Interface' table.

  Ex.:

      $ ovs-vsctl show
      ...
      Bridge "br1"
          ...
          Port "patch1"
              Interface "patch1"
                  type: patch
                  options: {peer="patch0"}
                  error: "No usable peer 'patch0' exists in 'system' datapath."
      Bridge "br0"
          datapath_type: netdev
          ...
          Port "patch0"
              Interface "patch0"
                  type: patch
                  options: {peer="patch1"}
                  error: "No usable peer 'patch1' exists in 'netdev' datapath."

  Acked-by: Eelco Chaudron <echaudro@redhat.com>
  Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* vswitchd: Allow user to configure controllers as "primary" or "service". (Ben Pfaff, 2019-02-05; 1 file, -0/+11)
  Normally it makes sense for an active connection to be primary and a passive connection to be a service connection, but I've run into a corner case where it is better for a passive connection to be a primary connection. This specific case is for use with OFtest, which expects to be a primary controller. However, it also wants to reconnect frequently, which is slow for active connections because of the backoff; by configuring a passive, primary controller, OFtest can reconnect as frequently and as quickly as it wants, making the overall test much faster.

  Acked-by: Justin Pettit <jpettit@ovn.org>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* connmgr: Make treatment of active and passive connections more uniform. (Ben Pfaff, 2019-02-05; 1 file, -1/+1)
  Until now, connmgr has handled active and passive OpenFlow connections in quite different ways. Any active connection, whether it was currently connected or not, was always maintained as an ofconn. Whenever such a connection (re)connected, its settings were cleared. On the other hand, passive connections had a separate listener which created an ofconn when a new connection came in, and these ofconns would be deleted when such a connection was closed. This approach is inelegant and has occasionally led to bugs when reconnection didn't clear all of the state that it should have.

  There's another motivation here. Currently, active connections are always primary controllers and passive connections are always service controllers (as documented in ovs-vswitchd.conf.db(5)). Sometimes it would be useful to have passive primary controllers (maybe active service controllers too but I haven't personally run into that use case). As is, this is difficult to implement because there is so much different code in use between active and passive connections. This commit will make it easier.

  Signed-off-by: Ben Pfaff <blp@ovn.org>
* connmgr: Improve interface for setting controllers. (Ben Pfaff, 2018-10-31; 1 file, -61/+41)
  Using an shash instead of an array simplifies the code for both the caller and the callee. Putting the set of allowed OpenFlow versions into the ofproto_controller data structure also simplifies the overall function interface slightly.

  Tested-by: Yifeng Sun <pkusunyifeng@gmail.com>
  Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* bridge.c: prevent controller connects while flow-restore-wait (Zak Whittington, 2018-10-25; 1 file, -1/+2)
  When force-reload-kmod is used, it shows an error when reinstalling tlvs during "Restoring saved flows" step:

      OFPT_ERROR (xid=0x4): NXTTMFC_ALREADY_MAPPED

  This is caused by a race condition between the restore script, which calls ofctl, and the connected controllers both adding back the same TLVs.

  The restore script already sets flow-restore-wait to true while doing flow restoration, and sets it back to false after it is done, and this patch utilizes that fact to prevent the TLV race. It does this by preventing vswitchd from connecting to controllers in the controller table while it is in a flow-restore-wait state.

  With this patch, when bridge_configure_remotes() calls bridge_get_controllers(), it first checks if flow-restore-wait has been set, and if so, it ignores any controllers in the controller database and sets n_controllers to 0.

  This solution does preserve the management service controller which is added via bridge_ofproto_controller_for_mgmt() after checking whether we should call bridge_get_controllers() (and thus n_controllers is properly set to 1, etc)

  VMware-BZ: 2195377
  Signed-off-by: Zak Whittington <zwhitt.vmware@gmail.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
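The selection logic described in this message can be modeled in a few lines of Python. This is only a sketch of the behavior as described: the function and parameter names are hypothetical, and the real code works on C structures inside bridge_configure_remotes().

```python
def controllers_to_configure(db_controllers, flow_restore_wait, mgmt_controller):
    """Return the controllers vswitchd should set up for a bridge.

    db_controllers:    targets listed in the Controller table
    flow_restore_wait: True while the restore script is replaying flows
    mgmt_controller:   the management service controller, always kept
    """
    # While flows are being restored, ignore the Controller table so that
    # external controllers cannot race with the restore script.
    selected = [] if flow_restore_wait else list(db_controllers)
    # The management controller is added regardless of flow-restore-wait.
    return [mgmt_controller] + selected
```

Once the restore script sets flow-restore-wait back to false, the next reconfiguration picks up the controllers from the database again.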
* Revert "bridge: Fix ovs-appctl qos/show repeated queue information" (Ben Pfaff, 2018-10-03; 1 file, -1/+0)
  This reverts commit 6b4d0211e84a ("bridge: Fix ovs-appctl qos/show repeated queue information"), which is no longer necessary now that commit 65f3c34c7417 ("netdev: Properly clear 'details' when iterating in NETDEV_QOS_FOR_EACH.") has been applied. The former commit fixed a symptom of the root cause fixed by the latter.

  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Eelco Chaudron <echaudro@redhat.com>
  Acked-by: Flavio Leitner <fbl@sysclose.org>
* bridge: Fix ovs-appctl qos/show repeated queue information (Eelco Chaudron, 2018-10-02; 1 file, -0/+1)
  This patch stops qos/show from repeating information from the previous queues. See below an example before and after the fix:

  Before:

      $ ovs-appctl qos/show p5p2
      QoS: p5p2 linux-htb
      max-rate: 2428800

      Default:
        burst: 12512
        min-rate: 12000
        max-rate: 2428800
        tx_packets: 0
        tx_bytes: 0
        tx_errors: 0

      Queue 20:
        burst: 12512
        burst: 12512
        min-rate: 12000
        min-rate: 12000
        max-rate: 607200
        max-rate: 2428800
        tx_packets: 28780
        tx_bytes: 43572920
        tx_errors: 17611

      Queue 10:
        burst: 12512
        burst: 12512
        burst: 12512
        max-rate: 2428800
        max-rate: 607200
        max-rate: 2428800
        min-rate: 12000
        min-rate: 12000
        min-rate: 12000
        tx_packets: 71751
        tx_bytes: 108631014
        tx_errors: 18503

  After:

      $ ovs-appctl qos/show p5p2
      QoS: p5p2 linux-htb
      max-rate: 2428800

      Default:
        burst: 12512
        min-rate: 12000
        max-rate: 2428800
        tx_packets: 0
        tx_bytes: 0
        tx_errors: 0

      Queue 20:
        burst: 12512
        min-rate: 12000
        max-rate: 607200
        tx_packets: 28780
        tx_bytes: 43572920
        tx_errors: 17611

      Queue 10:
        burst: 12512
        min-rate: 12000
        max-rate: 2428800
        tx_packets: 71751
        tx_bytes: 108631014
        tx_errors: 18503

  Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* bridge: Clean leaking netdevs when route is added. (Tiago Lam, 2018-07-10; 1 file, -0/+3)
  When adding a route to a bridge, by executing "$appctl ovs/route/add $IP/$MASK $BR", a reference to the existing netdev is taken and stored in an instantiated ip_dev struct which is then stored in an addr_list list in tnl-ports.c.

  When OvS is signaled to exit, as a result of a "$appctl $OVS_PID exit --cleanup", for example, the bridge takes care of destroying its allocated port and iface structs. While destroying and freeing an iface, the netdev associated with it is also destroyed. However, for this to happen its ref_cnt must be 0. Otherwise the destructor of the netdev (specific to each datapath) won't be called. On the userspace datapath this means a system interface, such as "br0", wouldn't get deleted upon exit of OvS (when a route happens to be associated).

  This was first observed in the "ptap - triangle bridge setup with L2 and L3 GRE tunnels" test, which runs as part of the system userspace testsuite and uses the netdev datapath (as opposed to several tests which use the dummy datapath, where this issue isn't seen). The test would pass every other time and fail the rest of the time because the needed system interfaces (br-p1, br-p2 and br-p3) were already present (from the previous successful run which didn't clean up properly), leading to a failure.

  To fix the leak and clean up the interfaces upon exit, on its final stage before destroying a netdev, in iface_destroy__(), the bridge calls tnl_port_map_delete_ipdev() which takes care of freeing the instantiated ip_dev structs that refer to a specific netdev.

  An extra test is also introduced which verifies that the resources used by OvS netdev datapath have been correctly cleaned up between OVS_TRAFFIC_VSWITCHD_STOP and AT_CLEANUP.

  Signed-off-by: Tiago Lam <tiago.lam@intel.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
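The leak comes down to reference counting, which the following Python model tries to make concrete. It is a simplified sketch, not the OVS implementation: the classes, the `fixed` flag and the single-owner starting count are all illustrative assumptions.

```python
class Netdev:
    """Reference-counted device; the destructor runs only at ref_cnt == 0."""

    def __init__(self, name):
        self.name = name
        self.ref_cnt = 1        # reference held by the bridge's iface
        self.destroyed = False

    def ref(self):
        self.ref_cnt += 1
        return self

    def close(self):
        self.ref_cnt -= 1
        if self.ref_cnt == 0:
            self.destroyed = True   # datapath-specific destructor would run here


class TunnelPortMap:
    """Stands in for the addr_list of ip_dev structs in tnl-ports.c."""

    def __init__(self):
        self.ip_devs = []

    def add_route_dev(self, netdev):
        self.ip_devs.append(netdev.ref())   # route add takes a reference

    def delete_ipdev(self, netdev):
        for dev in [d for d in self.ip_devs if d is netdev]:
            self.ip_devs.remove(dev)
            dev.close()                     # drop the reference held by ip_dev


def iface_destroy(netdev, tnl_map, fixed):
    """Model of iface teardown, with and without the fix from this commit."""
    if fixed:
        tnl_map.delete_ipdev(netdev)
    netdev.close()
```

Without the extra `delete_ipdev` step the count stays at 1 after teardown, so the destructor never runs and the system interface is left behind.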
* treewide: Convert leading tabs to spaces. (Ben Pfaff, 2018-06-11; 1 file, -5/+5)
  It's always been OVS coding style to use spaces rather than tabs for indentation, but some tabs have snuck in over time. This commit converts them to spaces.

  Signed-off-by: Ben Pfaff <blp@ovn.org>
  Acked-by: Justin Pettit <jpettit@ovn.org>
* dpdk: reflect status and version in the database (Aaron Conole, 2018-05-25; 1 file, -0/+5)
  The normal way of retrieving the running DPDK status involves parsing log files and issuing various incantations of ovs-vsctl and ovs-appctl commands to determine whether the rte_eal_init successfully started.

  This commit adds two new records to reflect the dpdk version, and the dpdk initialization status.

  To support this, the other_config:dpdk-init configuration block supports the 'true' and 'try' keywords now, instead of just 'true'.

  Signed-off-by: Aaron Conole <aconole@redhat.com>
  Acked-by: Kevin Traynor <ktraynor@redhat.com>
  Signed-off-by: Ian Stokes <ian.stokes@intel.com>