path: root/ofproto
Commit message  Author  Age  Files  Lines
* util: Expose function nullable_string_is_equal.  Ilya Maximets  2016-07-25  2  -12/+0
| | | | | | | | Implementation of 'nullable_string_is_equal()' moved to util.c and reused inside dpif-netdev. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
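The helper named in this commit compares two strings that may each be NULL. A minimal sketch of such a comparison, written here for illustration and not copied from util.c:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if 'a' and 'b' are both NULL, or are both non-NULL and
 * compare equal as strings.  Illustrative sketch only. */
bool
nullable_string_is_equal(const char *a, const char *b)
{
    return a ? b && !strcmp(a, b) : !b;
}
```

A NULL string and an empty string are treated as different values in this sketch.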
* json: Move from lib to include/openvswitch.  Terry Wilson  2016-07-22  14  -14/+19
| | | | | | | | | | | | | | | To easily allow both in- and out-of-tree building of the Python wrapper for the OVS JSON parser (e.g. w/ pip), move json.h to include/openvswitch. This also requires moving lib/{hmap,shash}.h. Both hmap.h and shash.h were #include-ing "util.h" even though the headers themselves did not use anything from there, but rather from include/openvswitch/util.h. Fixing that required including util.h in several C files mostly due to OVS_NOT_REACHED and things like xmalloc. Signed-off-by: Terry Wilson <twilson@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* tunneling: get skb marking to work properly with tunnels  Ansis Atteka  2016-07-21  1  -1/+2
| | | | | | | | | | | | | | | There are two issues that this patch fixes: 1. it was impossible to set skb mark at all through NXM_NX_PKT_MARK register for tunnel packets; AND 2. ipsec_xxx tunnels would not be marked with the default IPsec mark (broken by d23df9a87 "lib/odp: Use masked set actions."). This patch also adds anti-regression tests to prevent such breakages in the future. Signed-off-by: Ansis Atteka <aatteka@ovn.org> VMware-BZ: #1653178 Acked-by: Jarno Rajahalme <jarno@ovn.org>
* ofproto: Fix consistent hashing  Liran Schour  2016-07-13  1  -3/+2
Hashing is not consistent as long as it uses the index of the bucket in the list: removing or inserting a bucket anywhere but the end of the bucket list changes the indexes of the buckets after it. Use bucket_id for hashing instead.
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Simon Horman <simon.horman@netronome.com>
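Why the bucket ID gives consistency can be sketched as follows. Each bucket is scored from its stable ID and the flow hash, and the highest score wins, so a bucket's position in the list never matters. The mixer mix32() and the function select_bucket() are hypothetical names for this illustration, not the ofproto code:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit mixer standing in for the real hash function. */
static uint32_t
mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x85ebca6bu;
    x ^= x >> 13;
    x *= 0xc2b2ae35u;
    x ^= x >> 16;
    return x;
}

/* Scores every bucket from its ID and 'flow_hash' and returns the ID of the
 * highest scorer.  Returns 0 if the list is empty. */
uint32_t
select_bucket(const uint32_t *bucket_ids, size_t n, uint32_t flow_hash)
{
    uint32_t best_id = 0;
    uint32_t best_score = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t score = mix32(bucket_ids[i] ^ mix32(flow_hash));

        if (i == 0 || score > best_score) {
            best_score = score;
            best_id = bucket_ids[i];
        }
    }
    return best_id;
}
```

With this scheme, removing a bucket from the middle of the list only moves the flows that were mapped to the removed bucket; every other flow keeps its bucket.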
* Increase number of registers to 16.  Justin Pettit  2016-07-12  2  -2/+2
| | | | | | | | With eight 32-bit registers, we can only store two IPv6 addresses, which is pretty tight. Signed-off-by: Justin Pettit <jpettit@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
* ofproto-dpif-mirror: Add mirror snaplen support.  William Tu  2016-07-03  5  -7/+41
| | | | | | | | | | This patch adds a 'snaplen' config for mirroring table. A mirrored packet with size larger than snaplen bytes will be truncated in datapath before sending to the mirror output port. Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/141186839 Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ofproto: Add relaxed group_mod command ADD_OR_MOD  Jan Scheurich  2016-07-02  1  -0/+25
This patch adds support for a new Group Mod command OFPGC_ADD_OR_MOD to OVS for all OpenFlow versions that support groups (OF11 and higher). The new ADD_OR_MOD creates a group that does not yet exist (like ADD) and modifies an existing group (like MODIFY).

Rationale: In OpenFlow 1.x the Group Mod commands OFPGC_ADD and OFPGC_MODIFY have strict semantics: ADD fails if the group exists, while MODIFY fails if the group does not exist. This requires a controller to know the exact state of the switch when programming a group in order not to run the risk of getting an OFP Error message in response. This is hard to achieve and maintain at all times in view of possible switch and controller restarts or other connection losses between switch and controller.

Due to the un-acknowledged nature of the Group Mod message, programming groups safely and efficiently at the same time is virtually impossible: the controller has to either query the existence of the group prior to each Group Mod message, or insert a Barrier Request/Reply after every group to be sure that no Error can be received at a later stage and require a complicated roll-back of any dependent actions taken between the failed Group Mod and the Error.

In the ovs-ofctl command line the ADD_OR_MOD command is made available through the new option --may-create in the mod-group command:

    $ ovs-ofctl -Oopenflow13 del-groups br-int group_id=100

    $ ovs-ofctl -Oopenflow13 mod-group br-int group_id=100,type=indirect,bucket=actions=2
    OFPT_ERROR (OF1.3) (xid=0x2): OFPGMFC_UNKNOWN_GROUP
    OFPT_GROUP_MOD (OF1.3) (xid=0x2):
     MOD group_id=100,type=indirect,bucket=actions=output:2

    $ ovs-ofctl -Oopenflow13 --may-create mod-group br-int group_id=100,type=indirect,bucket=actions=2

    $ ovs-ofctl -Oopenflow13 dump-groups br-int
    OFPST_GROUP_DESC reply (OF1.3) (xid=0x2):
     group_id=100,type=indirect,bucket=actions=output:2

    $ ovs-ofctl -Oopenflow13 --may-create mod-group br-int group_id=100,type=indirect,bucket=actions=3

    $ ovs-ofctl -Oopenflow13 dump-groups br-int
    OFPST_GROUP_DESC reply (OF1.3) (xid=0x2):
     group_id=100,type=indirect,bucket=actions=output:3

Signed-off-by: Jan Scheurich <jan.scheurich at web.de>
Signed-off-by: Ben Pfaff <blp@ovn.org>
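The difference between the three commands can be sketched with a toy group table. The types, the fixed-size table, and the error names below are assumptions made for this illustration; they are not the ofproto data structures:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum group_cmd { GROUP_ADD, GROUP_MODIFY, GROUP_ADD_OR_MOD };
enum group_err { GROUP_OK, GROUP_EXISTS, GROUP_UNKNOWN, GROUP_TABLE_FULL };

#define MAX_GROUPS 8

struct group {
    bool in_use;
    uint32_t id;
    uint32_t out_port;
};

struct group_table {
    struct group groups[MAX_GROUPS];
};

static struct group *
find_group(struct group_table *t, uint32_t id)
{
    for (size_t i = 0; i < MAX_GROUPS; i++) {
        if (t->groups[i].in_use && t->groups[i].id == id) {
            return &t->groups[i];
        }
    }
    return NULL;
}

/* ADD fails on an existing group, MODIFY fails on a missing one, and
 * ADD_OR_MOD succeeds in both cases. */
enum group_err
group_mod(struct group_table *t, enum group_cmd cmd, uint32_t id,
          uint32_t out_port)
{
    struct group *g = find_group(t, id);

    if (g) {
        if (cmd == GROUP_ADD) {
            return GROUP_EXISTS;
        }
        g->out_port = out_port;
        return GROUP_OK;
    }
    if (cmd == GROUP_MODIFY) {
        return GROUP_UNKNOWN;
    }
    for (size_t i = 0; i < MAX_GROUPS; i++) {
        if (!t->groups[i].in_use) {
            t->groups[i] = (struct group) { true, id, out_port };
            return GROUP_OK;
        }
    }
    return GROUP_TABLE_FULL;
}
```

A controller that always sends ADD_OR_MOD does not need to know whether the group survived a switch restart.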
* bfd: Allow setting OAM bit when encapsulated in tunnel.  Jesse Gross  2016-06-29  5  -21/+36
| | | | | | | | | | | | | | | | | | | | | | Some tunnel protocols, such as Geneve, have a bit in the tunnel header to indicate that it is an OAM packet. This means that the packet should be processed as a tunnel control frame and not be passed onto connected links. When BFD is used inside of a tunnel it is often used in this control capacity, so this adds an option to enable marking the outer header when the output port is a tunnel that supports the OAM concept. It is also possible to use tunnels as point-to-point links that are simply carrying BFD as payload, so this is not always turned on. Conceptually, this may also apply to other types of packets locally generated by the switch, most obviously CFM. However, BFD seems to be most commonly used for this type of tunnel monitoring application so this only adds the option to BFD for the time being to avoid unnecessarily adding configuration knobs that might never get used. Signed-off-by: Jesse Gross <jesse@kernel.org> Acked-by: Pravin B Shelar <pshelar@ovn.org>
* util: New function nullable_xstrdup().  Ben Pfaff  2016-06-26  3  -12/+5
| | | | | | It's a pretty common pattern so create a function for it. Signed-off-by: Ben Pfaff <blp@ovn.org>
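The pattern this commit refers to is "duplicate the string unless it is NULL". A sketch of the idea using only the C standard library; xstrdup_sketch() stands in for an allocate-or-abort string copy, and both function names are hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* Copies 's', aborting on allocation failure. */
static char *
xstrdup_sketch(const char *s)
{
    size_t size = strlen(s) + 1;
    char *copy = malloc(size);

    if (!copy) {
        abort();
    }
    memcpy(copy, s, size);
    return copy;
}

/* Returns a copy of 's', or NULL if 's' is NULL.  The caller frees it. */
char *
nullable_xstrdup_sketch(const char *s)
{
    return s ? xstrdup_sketch(s) : NULL;
}
```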
* ipfix: Export user specified virtual observation ID  Wenyu Zhang  2016-06-24  3  -11/+94
In a virtual network, users want more information about the virtual point at which traffic is observed. It should be a string that provides clear information, not a simple integer ID. Introduce "other-config: virtual_obs_id" in IPFIX, which is a string configured by the user. Introduce an enterprise IPFIX entity "virtualObsID" (898) to export the value. The entity is a variable-length string.
Signed-off-by: Wenyu Zhang <wenyuz@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
* Revert "ipfix: Export user specified virtual observation ID".  Ben Pfaff  2016-06-24  3  -94/+11
This reverts commit 337bebe91c94d9d201e28811c469869d32e978ff, which caused a crash in test 1048 "ofproto-dpif - Flow IPFIX sanity check" (now test 1051) with the following backtrace:

    #0 hmap_first_with_hash (hmap=<optimized out>, hmap=<optimized out>, hash=<optimized out>) at ../lib/hmap.h:328
    #1 smap_find__ (smap=0x94, key=key@entry=0x817f7ab "virtual_obs_id", key_len=14, hash=2537071222) at ../lib/smap.c:366
    #2 0x0812b9d7 in smap_get_node (smap=0x9738a276, key=0x817f7ab "virtual_obs_id") at ../lib/smap.c:198
    #3 0x0812ba30 in smap_get (smap=0x94, key=0x817f7ab "virtual_obs_id") at ../lib/smap.c:189
    #4 0x08055a60 in bridge_configure_ipfix (br=<optimized out>) at ../vswitchd/bridge.c:1237
    #5 bridge_reconfigure (ovs_cfg=0x94) at ../vswitchd/bridge.c:666
    #6 0x080568d3 in bridge_run () at ../vswitchd/bridge.c:2972
    #7 0x0804c9dd in main (argc=10, argv=0xffd8b934) at ../vswitchd/ovs-vswitchd.c:112

Signed-off-by: Ben Pfaff <blp@ovn.org>
* ofp-actions: Add truncate action.  William Tu  2016-06-24  4  -0/+140
The patch adds a new action to support packet truncation. The new action is formatted as 'output(port=n,max_len=m)', as output to port n, with packet size being MIN(original_size, m).

One use case is to enable port mirroring to send smaller packets to the destination port so that only useful packet information is mirrored/copied, saving some performance overhead of copying entire packet payload. Example use case is below as well as shown in the testcases:

- Output to port 1 with max_len 100 bytes.
- The output packet size on port 1 will be MIN(original_packet_size, 100).

    # ovs-ofctl add-flow br0 'actions=output(port=1,max_len=100)'

- The scope of max_len is limited to output action itself. The following packet size of output:1 and output:2 will be intact.

    # ovs-ofctl add-flow br0 \
          'actions=output(port=1,max_len=100),output:1,output:2'

- The Datapath actions shows:

    # Datapath actions: trunc(100),1,1,2

Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/140037134
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
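The size rule stated in this message, MIN(original_size, max_len), can be written as a small helper. The function name is hypothetical and this is only a sketch of the arithmetic, not the datapath code:

```c
#include <stddef.h>

/* Size of the packet sent by 'output(port=n,max_len=m)': the original size
 * capped at 'max_len'. */
size_t
truncated_len(size_t original_size, size_t max_len)
{
    return original_size < max_len ? original_size : max_len;
}
```

Because the scope of max_len is a single output action, a later plain output of the same packet still uses the original size rather than the truncated one.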
* ipfix: Export user specified virtual observation ID  Wenyu Zhang  2016-06-24  3  -11/+94
In a virtual network, users want more information about the virtual point at which traffic is observed. It should be a string that provides clear information, not a simple integer ID. Introduce "other-config: virtual_obs_id" in IPFIX, which is a string configured by the user. Introduce an enterprise IPFIX entity "virtualObsID" (898) to export the value. The entity is a variable-length string.
Signed-off-by: Wenyu Zhang <wenyuz@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
* ofproto: Set to revalidate when a new version is available.  Jarno Rajahalme  2016-06-21  3  -26/+15
| | | | | | | | | | | | There is no need to set the revalidate flag after each flow mod separately, as we can do it once after the whole transaction is finished. It is not done at all if the transaction fails. In the successful case this change makes no functional difference, since the revalidation thread is triggered by the main thread only after a bundle transaction has been fully processed. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
* xlate: Fix typo in comment.  Jarno Rajahalme  2016-06-21  1  -1/+1
| | | | Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
* ipfix: Support tunnel information for Flow IPFIX.  Benli Ye  2016-06-17  5  -18/+125
Add support to export tunnel information for flow-based IPFIX.

The original steps to configure flow level IPFIX:

1) Create a new record in Flow_Sample_Collector_Set table:
   'ovs-vsctl -- create Flow_Sample_Collector_Set id=1 bridge="Bridge UUID"'
2) Add IPFIX configuration which is referred by corresponding row in Flow_Sample_Collector_Set table:
   'ovs-vsctl -- set Flow_Sample_Collector_Set "Flow_Sample_Collector_Set UUID" ipfix=@i -- --id=@i create IPFIX targets=\"IP:4739\" obs_domain_id=123 obs_point_id=456 cache_active_timeout=60 cache_max_flows=13'
3) Add sample action to the flows:
   'ovs-ofctl add-flow mybridge in_port=1, actions=sample'('probability=65535,collector_set_id=1, obs_domain_id=123,obs_point_id=456')',output:3'

The NXAST_SAMPLE action was used in step 3. To support exporting tunnel information, this patch adds the NXAST_SAMPLE2 action. With NXAST_SAMPLE2, step 3 should be configured like below:

   'ovs-ofctl add-flow mybridge in_port=1, actions=sample'('probability=65535,collector_set_id=1,obs_domain_id=123, obs_point_id=456,sampling_port=3')',output:3'

'sampling_port' can be equal to the ingress port or one of the egress ports. If the sampling port is equal to the output port and the output port is a tunnel port, OVS_USERSPACE_ATTR_EGRESS_TUN_PORT will be set in the datapath flow sample action. When a flow sample action upcall happens, tunnel information will be retrieved from the datapath and then IPFIX can export egress tunnel port information. If sampling_port=65535 (OFPP_NONE), flow-based IPFIX keeps the same behavior as before.

This patch mainly does four things:

1) Add a new flow sample action NXAST_SAMPLE2 to support exporting tunnel information. The NXAST_SAMPLE2 action has a newly added field 'sampling_port'.
2) Use 'other_config: enable-tunnel-sampling' to enable or disable exporting tunnel information.
3) If 'sampling_port' is equal to the output port and the output port is a tunnel port, the translation of the OpenFlow "sample" action should first emit set(tunnel(...)), then the sample action itself. This makes sure the egress tunnel information can be sampled.
4) Add a test of flow-based IPFIX for tunnel set.

How to test flow-based IPFIX:

1) Set up a test environment with two Linux hosts with Docker supported.
2) Create a Docker container and a GRE tunnel port on each host.
3) Use ovs-docker to add the container on the bridge.
4) Listen on port 4739 on the collector machine and use wireshark to filter 'cflow' packets.
5) Configure flow-based IPFIX:
   - 'ovs-vsctl -- create Flow_Sample_Collector_Set id=1 bridge="Bridge UUID"'
   - 'ovs-vsctl -- set Flow_Sample_Collector_Set "Flow_Sample_Collector_Set UUID" ipfix=@i -- --id=@i create IPFIX \
     targets=\"IP:4739\" cache_active_timeout=60 cache_max_flows=13 \
     other_config:enable-tunnel-sampling=true'
   - 'ovs-ofctl add-flow mybridge in_port=1, actions=sample'('probability=65535,collector_set_id=1,obs_domain_id=123, obs_point_id=456,sampling_port=3')',output:3'
   Note: The in-port is the container port. The output port and sampling_port are both OpenFlow ports and the output port is a GRE tunnel port.
6) Ping from the container whose host enabled flow-based IPFIX.
7) Get the IPFIX template packets and IPFIX information packets.

Signed-off-by: Benli Ye <daniely@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
* ipfix: Bug fix for not sending template packets on 32-bit OS  Benli Ye  2016-06-14  1  -2/+2
'last_template_set_time' in struct dpif_ipfix_exporter is declared as time_t, and time_t is a long int. If we initialize 'last_template_set_time' to TIME_MIN, its value is -2147483648 on a 32-bit OS and -2^63 on a 64-bit OS. This causes a problem on a 32-bit OS when 'last_template_set_time' is compared with an unsigned int variable, because the value is converted to unsigned and the negative value becomes a large positive number. Fix this problem by simply initializing 'last_template_set_time' to 0.
Signed-off-by: Benli Ye <daniely@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: William Tu <u9012063@gmail.com>
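The conversion problem can be reproduced with fixed-width types. The sketch below emulates a 32-bit time_t with int32_t and a 64-bit one with int64_t. The function names and the shape of the comparison are assumptions for illustration, not the expression used in the IPFIX exporter:

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit case: mixing a 32-bit signed value with an unsigned int converts
 * the signed value to unsigned, so INT32_MIN becomes 2147483648. */
bool
template_due_32(int32_t last_set, uint32_t interval, uint32_t now)
{
    return (uint32_t) last_set + interval <= now;
}

/* 64-bit case: the unsigned 32-bit operands fit in the wider signed type,
 * so a very negative 'last_set' stays negative. */
bool
template_due_64(int64_t last_set, uint32_t interval, uint32_t now)
{
    return last_set + (int64_t) interval <= (int64_t) now;
}
```

With last_set at the minimum value, the 32-bit variant reports that no template is due for any realistic current time, while initializing to 0 behaves the same at both widths.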
* ipfix: Add support for exporting ipfix statistics.  Benli Ye  2016-06-14  7  -22/+274
It is useful for users to check the stats of IPFIX. Using IPFIX stats, a user can know how many flows the system can support. The stats can also be used to check the performance of IPFIX.

IPFIX stats are kept per IPFIX exporter. If bridge IPFIX is enabled on the bridge, the whole bridge will have one exporter. For flow IPFIX, the system keeps one exporter per id (column in Flow_Sample_Collector_Set).

1) Add 'ovs-ofctl dump-ipfix-bridge SWITCH' to export IPFIX stats of the bridge which enables bridge IPFIX. The output format:

    NXST_IPFIX_BRIDGE reply (xid=0x2):
      bridge ipfix: flows=0, current flows=0, sampled pkts=0, \
                    ipv4 ok=0, ipv6 ok=0, tx pkts=0
                    pkts errs=0, ipv4 errs=0, ipv6 errs=0, tx errs=0

2) Add 'ovs-ofctl dump-ipfix-flow SWITCH' to export IPFIX stats of the bridge which enables flow IPFIX. The output format:

    NXST_IPFIX_FLOW reply (xid=0x2): 2 ids
      id 1: flows=4, current flows=4, sampled pkts=14, ipv4 ok=13, \
            ipv6 ok=0, tx pkts=0
            pkts errs=0, ipv4 errs=0, ipv6 errs=0, tx errs=0
      id 2: flows=0, current flows=0, sampled pkts=0, ipv4 ok=0, \
            ipv6 ok=0, tx pkts=0
            pkts errs=0, ipv4 errs=0, ipv6 errs=0, tx errs=0

flows: the total number of flow records, including those exported.
current flows: the number of flow records currently cached.
sampled pkts: the count of successfully sampled packets.
ipv4 ok: the count of successfully sampled IPv4 flow packets.
ipv6 ok: the count of successfully sampled IPv6 flow packets.
tx pkts: the count of IPFIX exported packets sent to the collector(s).
pkts errs: the count of packets that failed when sampling, maybe not supported or other error.
ipv4 errs: the count of IPv4 flow packets among the error packets.
ipv6 errs: the count of IPv6 flow packets among the error packets.
tx errs: the count of IPFIX exported packets that failed when sending to the collector(s).

Signed-off-by: Benli Ye <daniely@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
* odp-util: Remove odp_in_port from struct odp_flow_key_parms.  Jesse Gross  2016-06-13  1  -3/+0
When calling odp_flow_key_from_flow (or _mask), the in_port included as part of the flow is ignored and must be explicitly passed as a separate parameter. This is because the assumption was that the flow's version would often be in OFP format, rather than ODP.

However, at this point all flows that are ready for serialization in netlink format already have their in_port properly set to ODP format. As a result, every caller needs to explicitly initialize the extra parameter to the value that is in the flow. This switches to just use the value in the flow to simplify things and avoid the possibility of forgetting to initialize the extra parameter.
Signed-off-by: Jesse Gross <jesse@kernel.org>
Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
* ofproto-dpif-upcall: Translate input port as part of upcall translation.  Jesse Gross  2016-06-13  1  -1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | When we generate wildcards for upcalled flows, the flows and therefore the wildcards, are in OpenFlow format. These are mostly the same but one exception is the input port. We work around this problem by simply performing an exact match on the input port when generating netlink formatted keys. (This does not lose any information in practice because action translation also always exact matches on input port.) While this works fine for kernel based flows, it misses the userspace datapath, which directly consumes the OFP format mask for the input port. The effect of this is that the in_port mask is sometimes only the lower 16 bits of the field. (This is because OFP format is a 16-bit value stored in a 32-bit field. The full width of the field is initialized with an exact match mask but certain operations result in cleaving this down to 16 bits.) In practice this does not cause a problem because datapath port numbers are almost always in the lower 16 bits of the range anyways. This moves the masking of the datapath format field to translation so that all datapaths see the same result. This also makes more sense conceptually as the input port in the flow is also in ODP format at this stage. Signed-off-by: Jesse Gross <jesse@kernel.org> Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
* ofproto-dpif-upcall: Prevent memory leak on log message.  Thadeu Lima de Souza Cascardo  2016-06-08  1  -0/+1
| | | | | | | | | | | When DPIF does not support UFID (like old kernels), it may print this message quite frequently, if using an OVS version that does not include the upstream fix af50de800ecb ("ofproto-dpif-upcall: Pass key to dpif_flow_get()."). Fixes: 64bb477f0568 ("dpif: Minimize memory copy for revalidation.") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: Joe Stringer <joe@ovn.org>
* debian, rhel: Ship ovs shared libraries and header files  Edwin Chiu  2016-06-07  1  -1/+1
| | | | | | | | | | Compile and package ovs shared libraries and create new header package for debian (openvswitch-dev) and rhel (openvswitch-devel). VMware-BZ: #1556299 Signed-off-by: Edwin Chiu <echiu@vmware.com> Co-authored-by: Harold Lim <haroldl@vmware.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ipfix: Bug fix for configuring IPFIX for flows  Benli Ye  2016-06-05  1  -2/+2
There are two kinds of IPFIX: bridge level IPFIX and flow level IPFIX. Now if we only configure flow level IPFIX, even if there is no bridge IPFIX configuration, the datapath flow will contain a sample action for bridge IPFIX. Fix it.

Steps to configure flow level IPFIX:

1) Create a new record in Flow_Sample_Collector_Set table:
   'ovs-vsctl -- create Flow_Sample_Collector_Set id=1 bridge="Bridge UUID"'
2) Add IPFIX configuration which is referred by corresponding row in Flow_Sample_Collector_Set table:
   'ovs-vsctl -- set Flow_Sample_Collector_Set "Flow_Sample_Collector_Set UUID" ipfix=@i -- --id=@i create IPFIX targets=\"IP:4739\" obs_domain_id=123 obs_point_id=456 cache_active_timeout=60 cache_max_flows=13'
3) Add sample action to the flows:
   'ovs-ofctl add-flow mybridge in_port=1, actions=sample'('probability=65535,collector_set_id=1, obs_domain_id=123,obs_point_id=456')',output:LOCAL'

Before this fix, if you only configure flow IPFIX, the datapath flow is:

    id(0),in_port(2),eth_type(0x0806), packets:0, bytes:0, used:never,
    actions:sample(sample=0.0%,actions(userspace(pid=4294960835,
    ipfix(output_port=4294967295)))),sample(sample=100.0%,
    actions(userspace(pid=4294960835,flow_sample(probability=65535,
    collector_set_id=1,obs_domain_id=123,obs_point_id=456)))),
    sample(sample=0.0%,actions(userspace(pid=4294960835,
    ipfix(output_port=1)))),1

The datapath flow should only contain the sample action like below:

    id(0),in_port(2),eth_type(0x0800),ipv4(frag=no), packets:9, bytes:871,
    used:0.656s, actions:sample(sample=100.0%,actions(userspace(pid=4294962911,
    flow_sample(probability=65535,collector_set_id=1,obs_domain_id=123,
    obs_point_id=456)))),1

Signed-off-by: Benli Ye <daniely@vmware.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
* ofproto-dpif: Cache result of time_msec() for rule_expire().  Daniele Di Proietto  2016-06-02  1  -4/+4
| | | | | | | | | | | | | | | | | In the run() function of ofproto-dpif we call rule_expire() for every possible flow that has a timeout and rule_expire() calls time_msec(). Calling time_msec() repeatedly can be pretty expensive, even though most of the time it involves only a vdso call. This commit calls time_msec only once in run(), to reduce the workload. Keeping the flows ordered by expiration in some kind of heap or timing wheel data structure could help make this process more efficient, if rule_expire() turns out to be a bottleneck. VMware-BZ: #1655122 Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Acked-by: Ben Pfaff <blp@ovn.org>
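The optimization described here is to read the clock once and pass the value down. A sketch with a fake clock that counts how often it is read; every name below is hypothetical and only mirrors the shape of run() and rule_expire():

```c
#include <stddef.h>

struct rule {
    long long expires_at_msec;
    int expired;
};

static int clock_reads;        /* Number of times the clock was read. */
static long long fake_now;     /* Value returned by the fake clock. */

static long long
time_msec_sketch(void)
{
    clock_reads++;
    return fake_now;
}

/* Takes 'now' as a parameter instead of reading the clock itself. */
static void
rule_expire_sketch(struct rule *rule, long long now)
{
    if (now >= rule->expires_at_msec) {
        rule->expired = 1;
    }
}

/* Reads the clock once for the whole pass; returns the number of rules
 * that are expired afterwards. */
size_t
run_sketch(struct rule *rules, size_t n)
{
    long long now = time_msec_sketch();
    size_t n_expired = 0;

    for (size_t i = 0; i < n; i++) {
        rule_expire_sketch(&rules[i], now);
        n_expired += rules[i].expired;
    }
    return n_expired;
}
```

All rules in one pass are judged against the same timestamp, which is also slightly more predictable than sampling the clock per rule.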
* xlate: Skip recirculation for output and set actions  Simon Horman  2016-05-27  1  -1/+121
| | | | | | | | | | | | | | | | | | | | | | | | Until 8bf009bf8ab4 ("xlate: Always recirculate after an MPLS POP to a non-MPLS ethertype.") the translation code took some care to only recirculate as a result of a pop_mpls action if necessary. This was implemented using per-action checks and resulted in some maintenance burden. Unfortunately recirculation is a relatively expensive operation and a performance degradation of up to 35% has been observed with the above mentioned patch applied for the arguably common case of: pop_mpls,set(l2 field),output This patch attempts to strike a balance between performance and maintainability by special casing set and output actions such that recirculation may be avoided. This partially reverts the above mentioned commit. In particular most of the C code outside of do_xlate_actions(). Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Jarno Rajahalme <jarno@ovn.org>
* ofproto: update mtu when port is getting removed as well  ak47izatool@gmail.com  2016-05-25  1  -7/+29
When a port is added to an ovs bridge, the bridge's mtu is updated to the minimum mtu of the included ports. But when a port is removed, no such update is performed, which leads to a bug. For example, when the port with the minimum mtu is removed, the bridge's mtu must adapt to the new value, but that does not happen.

How to reproduce the problem:

    $ ovs-vsctl add-br testing
    $ ip link add name gretap11 type gretap local 10.0.0.1 remote 10.0.0.100
    $ ip link add name gretap12 type gretap local 10.0.0.1 remote 10.0.0.200
    $ ip link set dev gretap12 mtu 1600
    $ ovs-vsctl add-port testing gretap11
    $ ovs-vsctl add-port testing gretap12
    $ ip a sh testing
    16: testing: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN group default qlen 1
        link/ether 7a:42:95:00:96:40 brd ff:ff:ff:ff:ff:ff
    $ ovs-vsctl del-port gretap11
    $ ip a sh testing
    16: testing: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN group default qlen 1
        link/ether 7a:42:95:00:96:40 brd ff:ff:ff:ff:ff:ff
    $## as we can see here, 'testing' bridge mtu is stuck, while it must adapt to new '1600' value,
    $## because there is only one port 'gretap12' left, and its mtu is '1600':
    $ ip a sh gretap12
    19: gretap12@NONE: <BROADCAST,MULTICAST> mtu 1600 qdisc noop master ovs-system state DOWN group default qlen 1000
        link/ether b2:c6:1d:9f:be:0d brd ff:ff:ff:ff:ff:ff

This commit fixes the problem: the mtu update is performed on port removal as well.
Signed-off-by: wisd0me <ak47izatool@gmail.com>
Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
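The rule at stake is that the bridge MTU tracks the minimum over the ports currently attached, so it has to be recomputed on removal as well as on addition. A sketch of that recomputation; the function name and the fallback value for a bridge with no ports are assumptions for this illustration:

```c
#include <stddef.h>

/* Returns the smallest MTU among 'port_mtus', or 'fallback' when the bridge
 * has no ports. */
int
bridge_min_mtu(const int *port_mtus, size_t n_ports, int fallback)
{
    int min;

    if (!n_ports) {
        return fallback;
    }
    min = port_mtus[0];
    for (size_t i = 1; i < n_ports; i++) {
        if (port_mtus[i] < min) {
            min = port_mtus[i];
        }
    }
    return min;
}
```

Applied to the transcript above, the port set {1462, 1600} gives 1462, and after gretap11 is removed the set {1600} gives 1600.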
* netdev-native-tnl: Introduce ip_build_header()  Pravin B Shelar  2016-05-23  3  -60/+11
The native tunneling code that builds tunnel headers is spread across two different modules, which makes the code pretty hard to follow. This patch refactors it to move all of that code into the netdev-native-tnl module.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
* upcall: Unregister dpif cbs in udpif_destroy().  Joe Stringer  2016-05-23  1  -0/+3
During udpif_create(), we register callbacks for handling upcalls and purging the datapath; however, in the corresponding udpif_destroy() we never unregistered them. This could potentially lead to dereference of uninitialized memory in the userspace datapath if the main thread destroys the udpif then executes an OpenFlow packet-out.
Fixes: e4e74c3a2b9a ("dpif-netdev: Purge all ukeys when reconfigure pmd.")
Fixes: 623540e4617e ("dpif-netdev: Streamline miss handling.")
Reported-by: William Tu <u9012063@gmail.com>
Signed-off-by: Joe Stringer <joe@ovn.org>
Acked-by: Ben Pfaff <blp@ovn.org>
* ofproto-dpif: Call dpif_poll_threads_set() before dpif_run().  Daniele Di Proietto  2016-05-23  1  -2/+2
| | | | | | | | | | | An upcoming commit will make dpif_poll_threads_set() record the requested configuration and dpif_run() apply it, so it makes sense to change the order. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Mark Kavanagh <mark.b.kavanagh@intel.com>
* hmap: Use struct for hmap_at_position().  Daniele Di Proietto  2016-05-23  1  -5/+3
| | | | | | | | The interface will be more similar to the cmap. Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com> Tested-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Ilya Maximets <i.maximets@samsung.com>
* ofproto-dpif-xlate: Fix IGMP megaflow matching.  Ben Pfaff  2016-05-20  1  -8/+20
IGMP translation wasn't setting enough bits in the wildcards to ensure different packets were handled differently.
Reported-by: "O'Reilly, Darragh" <darragh.oreilly@hpe.com>
Reported-at: http://openvswitch.org/pipermail/discuss/2016-April/021036.html
Signed-off-by: Ben Pfaff <blp@ovn.org>
* dpif: Pass flow parameter to dpif_execute().  Daniele Di Proietto  2016-05-20  2  -0/+11
All the callers of the function already have a copy of the extracted flow in their stack (or a few frames before).

This is useful for different reasons:

* It forces the callers to also call flow_extract() on the packet, which is necessary to initialize the l2,l3,l4 pointers.
* It will be used in the userspace datapath to generate the RSS hash by a following commit.
* It can be used by the userspace connection tracker to avoid extracting the l3 type again.

Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
Acked-by: Ben Pfaff <blp@ovn.org>
* tnl-ports: Handle STT ports.  Pravin B Shelar  2016-05-18  1  -2/+6
| | | | | | | | STT uses TCP port so we need to filter traffic on basis of TCP port numbers. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jesse Gross <jesse@kernel.org>
* tunnel: Add IP ECN related functions.  Pravin B Shelar  2016-05-18  1  -3/+3
Set and get functions for the IP explicit congestion notification flag. These functions will be used by the STT reassembly code.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
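ECN occupies the two low-order bits of the IPv4 TOS byte (RFC 3168), so get and set helpers reduce to masking. The names below are hypothetical and this sketch is not the code added by the commit:

```c
#include <stdint.h>

#define ECN_MASK_SKETCH 0x03   /* Low two bits of the TOS byte. */
#define ECN_CE_SKETCH   0x03   /* "Congestion Experienced" codepoint. */

/* Extracts the ECN codepoint from a TOS byte. */
uint8_t
tos_get_ecn(uint8_t tos)
{
    return tos & ECN_MASK_SKETCH;
}

/* Returns 'tos' with its ECN bits replaced by 'ecn'; the DSCP bits are
 * left untouched. */
uint8_t
tos_set_ecn(uint8_t tos, uint8_t ecn)
{
    return (uint8_t) ((tos & ~ECN_MASK_SKETCH) | (ecn & ECN_MASK_SKETCH));
}
```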
* dpif-netdev: create batch object  Pravin B Shelar  2016-05-18  1  -2/+3
The DPDK datapath operates on batches of packets. To pass a batch of packets around we use a packets array and a count. The next patch needs to associate metadata with each batch of packets, so introduce a batch structure to make handling the metadata easier.
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jesse Gross <jesse@kernel.org>
* ofproto-dpif-xlate: Fix compilation with GCC 4.6.  Ben Pfaff  2016-05-17  1  -1/+1
Without this change, GCC 4.6 reports:

    ofproto/ofproto-dpif-xlate.c: In function ‘xlate_actions’:
    ofproto/ofproto-dpif-xlate.c:5117:27: error: missing initializer
    ofproto/ofproto-dpif-xlate.c:5117:27: error: (near initialization for ‘(anonymous).masks.vlan_tci’)

Reported-by: Joe Stringer <joe@ovn.org>
Reported-at: https://travis-ci.org/openvswitch/ovs/builds/130256491
Signed-off-by: Ben Pfaff <blp@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
* ofproto-dpif-upcall: Fix UFID usage with flow_modify.  Joe Stringer  2016-05-17  1  -1/+1
| | | | | | | | | | | As per the delete_op_init{,__}() functions, the UFID should only be passed down if ukey->ufid_present is set. Otherwise it is possible to request a flow modification only using a UFID in a datapath that doesn't support UFID, which will fail. Fixes: 43b2f131a229 ("ofproto: Allow in-place modifications of datapath flows.") Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
* ofproto-dpif-xlate: Always generate wildcards.  Ben Pfaff  2016-05-14  1  -13/+8
| | | | | | | | | | | | | | | | | | | | | | | | Until now, the flow translation code has tried to avoid constructing a set of wildcards during translation in the cases where it can, because wildcards are large and somewhat expensive. However, this has problems that we hadn't previously realized. Specifically, the generated actions can depend on the constructed wildcards, to decide which bits of a field need to be set in a masked set_field action. This means that in practice translation needs to always construct the wildcards. (It might be possible to avoid masked set_field when we're not constructing wildcards, but this would mean that we'd generate different actions depending on whether wildcards were being constructed, which seems rather confusing at best. Also, the cases in which we don't need wildcards anyway are fairly obscure, meaning that the benefits of avoiding them in those cases are minimal and that it's going to be hard to get test coverage. The latter is probably why we didn't notice this until now.) Reported-by: William Tu <u9012063@gmail.com> Reported-at: http://openvswitch.org/pipermail/dev/2016-April/069219.html Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Ryan Moats <rmoats@us.ibm.com> Tested-by: William Tu <u9012063@gmail.com>
* ofproto-dpif-upcall: Pass key to dpif_flow_get().  Joe Stringer  2016-05-11  1  -1/+2
| | | | | | | | | | | | | | | | | | | | | | | | Windows datapath folks have reported instances where OVS userspace will pass down a flow_get request to the datapath using a UFID even though the datapath has no support for UFIDs. Since commit e672ff9b4d22 ("ofproto-dpif: Restore metadata and registers on recirculation."), if a flow dump provides a flow that userspace isn't aware of, and the flow dump doesn't provide actions for that flow, then userspace will attempt a flow_get using just the UFID. This is because the ofproto-dpif layer doesn't pass the key down to the dpif layer even if it's available. Prior to the above commit, the codepath was only hit if the key was not available, which would have implied UFID support. This assumption is now broken: An empty set of actions could also trigger flow_get, and datapaths without UFID support are free to pass up empty actions lists. Pass down the flow key if available, and don't pass down the UFID if unavailable to be more consistent with the usage of other dpif APIs within this file. Fixes: e672ff9b4d22 ("ofproto-dpif: Restore metadata and registers on recirculation.") Reported-by: Sairam Venugopal <vsairam@vmware.com> Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>
* ofproto-dpif-xlate: fix for group liveness propagation  László Sürü  2016-05-11  1  -1/+1
| According to the OpenFlow v1.3.5 specification, a group is considered live if it has at least one live bucket. (6.5 Group Table Modification Messages: "A group is considered live if a least one of its buckets is live.")
|
| However, the OVS implementation incorrectly reports a group as live when no live bucket is found in the group_is_alive() function of ofproto-dpif-xlate.c. It should instead return true only if a live bucket is found (that is, the lookup result is != NULL).
|
| Signed-off-by: László Sűrű <laszlo.suru@ericsson.com>
| Co-authored-by: Jan Scheurich <jan.scheurich@ericsson.com>
| Signed-off-by: Jan Scheurich <jan.scheurich@ericsson.com>
| Acked-by: Jarno Rajahalme <jarno@ovn.org>
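The liveness rule quoted from the specification can be sketched as a small Python model. It is not the code in ofproto-dpif-xlate.c: `Bucket`, `first_live_bucket`, and this `group_is_alive` are hypothetical stand-ins that only mirror the check the commit describes, with `None` playing the role of NULL.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bucket:
    bucket_id: int
    live: bool


def first_live_bucket(buckets: List[Bucket]) -> Optional[Bucket]:
    """Return the first live bucket, or None when there is none."""
    for bucket in buckets:
        if bucket.live:
            return bucket
    return None


def group_is_alive(buckets: List[Bucket]) -> bool:
    # Live only if a live bucket was found (the "!= NULL" test).
    return first_live_bucket(buckets) is not None
```

The bug described above corresponds to inverting the final comparison, which would report an empty or all-dead group as live.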
* ofproto-dpif: Restore packet metadata when a continuation is resumed.  Numan Siddique  2016-05-10  1  -6/+18
| Recirculations due to NXT_RESUME fail if the packet metadata is not restored prior to packet execution.
|
| Reported-at: http://openvswitch.org/pipermail/dev/2016-May/070723.html
| Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
| Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
* util: Pass 128-bit arguments directly instead of using pointers.  Justin Pettit  2016-05-08  3  -3/+3
| Commit f2d105b5 (ofproto-dpif-xlate: xlate ct_{mark, label} correctly.) introduced the ovs_u128_and() function. It takes ovs_u128 values directly as arguments instead of pointers to them. As this is a somewhat more direct way to deal with 128-bit values, modify the other utility functions to do the same.
|
| Signed-off-by: Justin Pettit <jpettit@ovn.org>
| Acked-by: Joe Stringer <joe@ovn.org>
* cmap: New macro CMAP_INITIALIZER, for initializing an empty cmap.  Ben Pfaff  2016-05-09  3  -30/+8
| Sometimes code is much simpler if we can statically initialize data structures. Until now, this has not been possible for cmap-based data structures, so this commit introduces a CMAP_INITIALIZER macro.
|
| This works by adding a singleton empty cmap_impl that simply forces the first insertion into any cmap that points to it to allocate a real cmap_impl. There could be some risk that rogue code modifies the singleton, so for safety it is also marked 'const' to allow the linker to put it into a read-only page.
|
| This adds a new OVS_ALIGNED_VAR macro with GCC and MSVC implementations. The latter is based on Microsoft webpages, so developers who know Windows might want to scrutinize it.
|
| As examples of the kind of simplification this can make possible, this commit removes an initialization function from ofproto-dpif-rid.c and a call to cmap_init() from tnl-neigh-cache.c. An upcoming commit will add another user.
|
| CC: Jarno Rajahalme <jarno@ovn.org>
| CC: Gurucharan Shetty <guru@ovn.org>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
| Acked-by: Ryan Moats <rmoats@us.ibm.com>
* ofproto-dpif: Do not count resubmit to later tables against limit.  Ben Pfaff  2016-05-09  4  -26/+71
| Open vSwitch must ensure that flow translation takes a finite amount of time. Until now it has implemented this by limiting the depth of recursion. The initial limit, in version 1.0.1, was no recursion at all, and then over the years it has increased to 8 levels, then 16, then 32, and 64 for the last few years. Now reports are coming in that 64 levels are inadequate for some OVN setups. The natural inclination would be to double the limit again to 128 levels.
|
| This commit attempts another approach. Instead of increasing the limit, it reduces the class of resubmits that count against the limit. Since the goal for the depth limit is to prevent an infinite amount of work, it's not necessary to count resubmits that can't lead to infinite work. In particular, a resubmit from a table numbered x to a table y > x cannot do this, because any OpenFlow switch has a finite number of tables. Because in fact a resubmit (or goto_table) from one table to a later table is the most common form of an OpenFlow pipeline, I suspect that this will greatly alleviate the pressure to increase the depth limit.
|
| Reported-by: Guru Shetty <guru@ovn.org>
| Signed-off-by: Ben Pfaff <blp@ovn.org>
| Acked-by: Ryan Moats <rmoats@us.ibm.com>
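The counting rule can be sketched in a few lines of Python. This is a simplified model, not the translation code: `charged_depth` and `within_limit` are hypothetical helpers, the input is just the table visited at each nesting level, and the exact comparison against the limit is an assumption. Only the value 64 and the "y > x is free" rule come from the message above.

```python
MAX_DEPTH = 64  # the limit quoted in the commit message


def charged_depth(tables):
    """Count the resubmits in a nested chain that count against the limit.

    `tables` lists the table visited at each nesting level. A hop from
    table x to a table y > x is free; a hop to the same or an earlier
    table is charged, because only those hops can form a loop.
    """
    return sum(1 for x, y in zip(tables, tables[1:]) if y <= x)


def within_limit(tables, max_depth=MAX_DEPTH):
    return charged_depth(tables) <= max_depth
```

A strictly forward pipeline such as `list(range(200))` is charged nothing however long it is, while a chain that keeps resubmitting to the same table is charged once per hop and eventually hits the limit.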
* ofproto-dpif: Rename "recurse" to "indentation".  Ben Pfaff  2016-05-09  4  -40/+42
| The "recurse" member of struct xlate_in and struct xlate_ctx is used for two purposes: to determine the amount of indentation in "ofproto/trace" output and to limit the depth of recursion. An upcoming commit will separate these tasks, and so in preparation this commit renames "recurse" to "indentation".
|
| Signed-off-by: Ben Pfaff <blp@ovn.org>
| Acked-by: Ryan Moats <rmoats@us.ibm.com>
* Remove "VLAN splinters" feature.  Pravin B Shelar  2016-04-27  7  -493/+4
| The "VLAN splinters" feature works around buggy device drivers in old Linux versions. Support for those old kernels has been dropped, so all supported kernel VLAN drivers should now work fine with the OVS kernel datapath. This patch removes the deprecated feature.
|
| Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
| Acked-by: Ben Pfaff <blp@ovn.org>
* hmap: Add HMAP_FOR_EACH_POP.  Daniele Di Proietto  2016-04-26  6  -21/+14
| Makes popping each member of the hmap a bit easier.
|
| Signed-off-by: Daniele Di Proietto <diproiettod@vmware.com>
| Acked-by: Ben Pfaff <blp@ovn.org>
* ofproto-dpif-xlate: Tidy up ct_mark xlate code.  Joe Stringer  2016-04-22  1  -11/+10
| Make the ct_mark netlink serialization more consistent with the way that ct_label is serialized.
|
| Signed-off-by: Joe Stringer <joe@ovn.org>
| Acked-by: Ben Pfaff <blp@ovn.org>
* ofproto-dpif-xlate: xlate ct_{mark, label} correctly.  Joe Stringer  2016-04-22  1  -11/+16
| When translating multiple ct actions in a row which include modification of ct_mark or ct_labels, these fields could be incorrectly translated into datapath actions, resulting in modification of these fields for entries when the OpenFlow rules didn't actually specify the change.
|
| For instance, the following OpenFlow actions:
|
|     ct(zone=1,commit,exec(set_field(1->ct_mark))),ct(zone=2,table=1),...
|
| Would translate into the datapath actions:
|
|     ct(zone=1,commit,mark=1),ct(zone=2,mark=1),recirc(...),...
|
| This commit fixes the issue by zeroing the wildcards for these fields prior to performing nested actions translation (and restoring afterwards). As such, these fields do not hold both the match and the field modification values at the same time. As a result, the ct_mark and ct_labels don't leak from one ct action to the next.
|
| Fixes: 8e53fe8cf7a1 ("Add connection tracking mark support.")
| Fixes: 9daf23484fb1 ("Add connection tracking label support.")
| Signed-off-by: Joe Stringer <joe@ovn.org>
| Acked-by: Ben Pfaff <blp@ovn.org>
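The save, zero, and restore pattern described in this message can be modelled roughly in Python. This is a sketch under stated assumptions, not the xlate code: `translate_ct`, `CT_FIELDS`, and the dictionary representation of the wildcards are hypothetical, and the nested `exec(...)` actions are reduced to a dict of field values they set.

```python
CT_FIELDS = ("ct_mark", "ct_label")
ALL_ONES = -1  # stand-in for a fully set mask


def translate_ct(wc, nested_sets):
    """Model one ct action.

    wc: field -> wildcard mask accumulated so far (mutated, then restored).
    nested_sets: field -> value written by the nested exec(...) actions.
    Returns the field modifications emitted for this ct action.
    """
    saved = {field: wc[field] for field in CT_FIELDS}
    for field in CT_FIELDS:
        wc[field] = 0                # zero before nested translation
    for field in nested_sets:
        wc[field] = ALL_ONES         # nested action touched this field
    # Only fields touched by *this* action's nested actions are emitted.
    emitted = {f: nested_sets[f] for f in CT_FIELDS if wc[f]}
    for field in CT_FIELDS:
        wc[field] = saved[field]     # restore the match wildcards
    return emitted
```

In this model, a first call with `{"ct_mark": 1}` emits the mark and a following call with no nested sets emits nothing, which matches the intended datapath actions `ct(zone=1,commit,mark=1),ct(zone=2),...`.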
* tunneling: Fix for concomitant IPv4 and IPv6 tunnels  Thadeu Lima de Souza Cascardo  2016-04-21  1  -0/+4
| When using an IPv6 tunnel on the same bridge as an IPv4 tunnel, the flow received from the IPv6 tunnel would have an IPv4 address added to it, causing problems when trying to put or execute the action on the Linux datapath.
|
| Clearing the IPv6 address when we have a valid IPv4 address fixes this problem.
|
| Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
| Signed-off-by: Ben Pfaff <blp@ovn.org>