summaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAgeFilesLines
* odp-util: Stop parse odp actions if nlattr is overflowbranch-2.2Yifeng Sun2019-02-041-0/+4
| | | | | | | | | `encap = nl_msg_start_nested(key, OVS_KEY_ATTR_ENCAP)` ensures that key->size >= (encap + NLA_HDRLEN), so the `if` statement is safe. Reported-at: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=11306 Signed-off-by: Yifeng Sun <pkusunyifeng@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* python: Escape backslashes while formatting logs.Ilya Maximets2019-01-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since python version 3.7 (and some 3.6+ versions) regexp engine changed to treat the wrong escape sequences as errors. Previously, if the replace string had something like '\u0000', '\u' was qualified as a bad escape sequence and treated just as a sequence of characters '\' and 'u'. But know this triggers an error: Traceback (most recent call last): File "/usr/lib/python3.7/sre_parse.py", line 1021, in parse_template this = chr(ESCAPES[this][1]) KeyError: '\\u' From the documentation [1]: Unknown escapes consisting of '\' and an ASCII letter in replacement templates for re.sub() were deprecated in Python 3.5, and will now cause an error. [1] https://docs.python.org/3/whatsnew/3.7.html#api-and-feature-removals We need to escape the backslash by another one to keep regexp engine from errors. In case of '\\u000', '\\' is a valid escape sequence and the 'u' is a simple character. To be 100% safe we need to use 're.escape(replace)', but it escapes too many characters making the logs hard to read. This change fixes Python 3 tests on systems with python 3.7. Should be backward compatible. Reported-by: Ben Pfaff <blp@ovn.org> Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* python: Catch setsockopt exceptions for TCP stream.Ilya Maximets2018-12-201-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | 'sock.setsockopt' could throw exceptions. For example, if non-blocking connection failed before the call: Traceback (most recent call last): File "../.././test-ovsdb.py", line 896, in <module> main(sys.argv) File "../.././test-ovsdb.py", line 891, in main func(*args) File "../.././test-ovsdb.py", line 604, in do_idl ovs.stream.Stream.open(r)) File "/root/git_/ovs/python/ovs/stream.py", line 190, in open error, sock = cls._open(suffix, dscp) File "/root/git_/ovs/python/ovs/stream.py", line 744, in _open sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1) File "/usr/local/lib/python2.7/socket.py", line 228, in meth return getattr(self._sock,name)(*args) socket.error: [Errno 54] Connection reset by peer This fixes tests on FreeBSD. Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* netdev: Properly clear 'details' when iterating in NETDEV_QOS_FOR_EACH.Ben Pfaff2018-10-031-1/+2
| | | | | | | | | | | The function comment for netdev_queue_dump_next() said that it cleared its 'detail' argument, but it didn't actually do that, which meant that details could be incorrectly carried along from one queue to the next. Reported-by: Flavio Leitner <fbl@sysclose.org> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Eelco Chaudron <echaudro@redhat.com> Acked-by: Flavio Leitner <fbl@sysclose.org>
* netdev-linux: Avoid division by 0 if kernel reports bad scheduler data.Ben Pfaff2018-08-201-1/+1
| | | | | | | | | | | | | If the kernel reported a value of 0 for the second value in /proc/net/psched, it would cause a division-by-zero fault in read_psched(). I don't know of a kernel that would actually do that, but it's still better to be safe. Found by clang static analyzer. Reported-by: Bhargava Shastry <bshastry@sect.tu-berlin.de> Signed-off-by: Ben Pfaff <blp@ovn.org> Reviewed-by: Yifeng Sun <pkusunyifeng@gmail.com>
* pcap-file: Fix formatting of log message.Ben Pfaff2018-08-031-1/+1
| | | | | Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Alin Gabriel Serdean <aserdean@ovn.org>
* ovs-tcpundump: fix a conversion issueAaron Conole2018-01-121-1/+1
| | | | | | | | | | | When I tried using ovs-tcpundump, I got the following error message: Traceback (most recent call last): File ./ovs-tcpundump, line 64, in <module> if m is None or int(m.group(1)) == 0: ValueError: invalid literal for int() with base 10: '00a0' Signed-off-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* connmgr: Fix crash when in_band_create() fails.Ben Pfaff2017-07-171-1/+1
| | | | | | | | | | | update_in_band_remotes() created an in-band manager and then tried to work with it without first checking whether creation had succeeded. If it failed, this led to a segfault. Reported-by: Numan Siddique <nusiddiq@redhat.com> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2017-July/335530.html Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
* ovsdb-types: Fix memory leak on error path.Yunjian Wang2017-07-111-0/+1
| | | | | | Fixes: bd76d25d8b3b ("ovsdb: Add simple constraints.") Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ofp-util: fix memory leak in ofputil_pull_ofp11_bucketszhongbaisong2017-07-071-0/+1
| | | | | Signed-off-by: zhongbaisong <zhongbaisong@huawei.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
* ofp-print: Don't abort on unknown reason in role status message.Ben Pfaff2017-07-071-1/+2
| | | | | | | | | | A buggy or malicious switch could send a role status message with a bad reason code, which if printed by OVS would cause it to abort. This fixes the problem. Reported-by: Bhargava Shastry <bshastry@sec.t-labs.tu-berlin.de> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
* rconn: Avoid abort for ill-behaved remote.Ben Pfaff2016-12-231-6/+10
| | | | | | | | | | | | | | | | | | | | If an rconn peer fails to send a hello message, the version number doesn't get set. Later, if the peer delays long enough, the rconn attempts to send an echo request but assert-fails instead because it doesn't know what version to use. This fixes the problem. To reproduce this problem: make sandbox ovs-vsctl add-br br0 ovs-vsctl set-controller br0 ptcp:12345 nc 127.0.0.1 12345 and wait 10 seconds for ovs-vswitchd to die. (Then exit the sandbox.) Reported-by: 张东亚 <fortitude.zhang@gmail.com> Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Justin Pettit <jpettit@ovn.org>
* ofproto-dpif-ipfix: Fix assertion failure for bad configuration.Ben Pfaff2016-12-091-24/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The assertions in dpif_ipfix_set_options() made some bad assumptions about flow exporters. The code that added and removed exporters would add a flow exporter even if it had an invalid configuration ("broken"), but the assertions checked that broken flow exporters were not added. Thus, the when a flow exporter was broken, ovs-vswitchd would crash due to an assertion failure. Here is an example vsctl command that, run in the sandbox, would crash ovs-vswitchd: ovs-vsctl \ -- add-br br0 \ -- --id=@br0 get bridge br0 \ -- --id=@ipfix create ipfix target='["xyzzy"]' \ -- create flow_sample_collector_set id=1 bridge=@br0 ipfix=@ipfix The minimal fix would be to remove the assertions, but this would leave broken flow exporters in place. This commit goes a little farther and actually removes broken flow exporters. This fix pulls code out of an "if" statement to a higher level, so it is a smaller fix when viewed igoring space changes. This bug dates back to the introduction of IPFIX in 2013. VMware-BZ: #1779123 CC: Romain Lenglet <romain.lenglet@berabera.info> Fixes: 29089a540cfa ("Implement IPFIX export") Signed-off-by: Ben Pfaff <blp@ovn.org> Acked-by: Jarno Rajahalme <jarno@ovn.org>
* bond: Use correct type for slave's change_seq.Jarno Rajahalme2015-12-041-1/+1
| | | | | | | | | | | | | | | seq values are 64-bit, and storing them to a 32-bit variable causes the stored value never to match actual seq value after the seq value gets big enough. This is a likely cause of OVS main thread using 100% CPU in a system using bonds after some runtime. VMware-BZ: #1564993 Reported-by: Hiram Bayless <hbayless@vmware.com> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Acked-by: Ben Pfaff <blp@ovn.org>
* debian: place kernel module to satisfy depmod search.Saurabh Mohan2015-11-022-1/+2
| | | | | | | | | | | | On Ubuntu depmod's search priority is configured in /etc/depmod to be updates and then the kernel built-in directory. $ cat /etc/depmod.d/ubuntu.conf search updates ubuntu built-in Thus change the placement of openvswitch.ko under updates/ not kernel/updates. Acked-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: Saurabh Mohan <saurabh@cplanenetworks.com>
* meta-flow: Fix ip_frag handling in mf_set_wild().Jarno Rajahalme2015-08-202-3/+3
| | | | | | | | | The wildcard bits were set when they should have been cleared. Found by inspection. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
* Update my email addressYAMAMOTO Takashi2015-06-031-1/+1
| | | | | Signed-off-by: YAMAMOTO Takashi <yamamoto@midokura.com> Acked-by: Justin Pettit <jpettit@nicira.com>
* json: Fix error message for corner case in json_string_unescape().Ben Pfaff2015-05-292-1/+10
| | | | | | | | | | The error message should not include bytes already copied from the input string. Found by inspection. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Alex Wang <alexw@nicira.com>
* debian: install openvswitch kernel module under "updates" directoryAnsis Atteka2015-05-261-1/+1
| | | | | | | | | | | | | | | | | | | This patch fixes a bug where "modprobe openvswitch" command on Ubuntu distribution would have sometimes tried to load OVS kernel module that shipped together with Linux Kernel, even though one had also installed OVS datapath debian package created with module-assistant. Because of this issue force-reload-kmod command occasionally malfunctioned and failed to load the right kernel module. This bug happened *occasionally* because the default Ubuntu depmod configuration in /etc/depmod.d/ubuntu.conf is set to look for kernel modules first in "updates" directory, then in "ubuntu" directory and then in other directories. If there were two openvswitch.ko modules in "other directories", then modprobe would have loaded kernel module that was nondeterministically listed first by file system. Signed-off-by: Ansis Atteka <aatteka@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
* netdev-bsd: Fix sign extension bug in ifr_flags on FreeBSD.Kevin Lo2015-04-051-2/+4
| | | | | | | | | | | FreeBSD fills the int return value with ifr_flagshigh in the high 16 bits and ifr_flags in the low 16 bits rather than blindly promoting ifr_flags to an int, which will preserve the sign. This commit makes sure the flags returned isn't negative and apply mask 0xffff to flags. Signed-off-by: Kevin Lo <kevlo@FreeBSD.org> Signed-off-by: Ben Pfaff <blp@nicira.com>
* ofproto-dpif-xlate: Roll back group bucket actions after every bucket.Jarno Rajahalme2015-03-242-8/+32
| | | | | | | | | | We used to roll back group bucket changes only for 'all' and 'indirect' group types, but the expected semantics of all group types is that any changes by the group bucket are not visible after the group has been executed. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
* vlog: Logging option '--syslog-target' needs one argument.Gurucharan Shetty2015-01-261-1/+1
| | | | | | | | Without this commit, starting a daemon with just '--syslog-target' causes a segmentation fault. Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
* setup n_upcall_pids for vport_request when destroy all channelsGao feng2015-01-221-0/+1
| | | | | | | | Setup the n_upcall_pids to 1, otherwise the OVS_VPORT_ATTR_UPCALL_PID nlattr will be incorrect. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Thomas Graf <tgraf@noironetworks.com>
* pktbuf: Always initialize '*bufferp' even when 'pb == NULL'.Ben Pfaff2015-01-191-1/+2
| | | | | | | | | | | | Otherwise if a service connection (which does not have buffers) attempts to use buffers, '*bufferp' will be uninitialized, which can cause a segfault in the caller. Found using OFtest configured to use service (active rather than passive) connections. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Thomas Graf <tgraf@noironetworks.com>
* netflow: Fix interpretation of flow_seq.Motonori Shindo2014-11-041-1/+1
| | | | | | | | | 'flow_seq" field in NetFlow v5 header should represent a number of NetFlow flow records exported while it is representing the number of NetFlow packets exported in the current code. This patch fixes this problem. Signed-off-by: Motonori Shindo <motonori@shin.do> Signed-off-by: Ben Pfaff <blp@nicira.com>
* odp-util: Fix segfault in MPLS attribute parsing.Joe Stringer2014-10-311-31/+26
| | | | | | | | | | | | | | | | Just because the ethertype is MPLS, this doesn't mean that the datapath understands and provides OVS_KEY_ATTR_MPLS attributes for the flow. Previously we would check the size of the OVS_KEY_ATTR_MPLS attribute before checking whether the attribute is present. This would cause a segfault in nl_attr_get_size(), usually triggered from a handler thread. This patch brings the MPLS parsing code more in line with the rest of the parse_l2_5_onward() function, by only processing MPLS if the attribute is present. Reported-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
* ofp-print-ofctl: Free group buckets.Simon Horman2014-10-311-0/+2
| | | | | | | Found by inspection using make check-valgrind. Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
* lib/meta-flow: Index correct MPLS lse in mf_is_all_wild().Jarno Rajahalme2014-10-061-2/+2
| | | | | | | Should index the first lse for all parts of the lse (label, TC, BOS). Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
* ovs-vsctl: Correctly exit on errors for non-map types in "remove" command.Ben Pfaff2014-09-022-5/+12
| | | | | | Reported-by: Thomas Graf <tgraf@noironetworks.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Thomas Graf <tgraf@noironetworks.com>
* thread: Use explicit wide type when shifting > 32 bitsThomas Graf2014-08-291-2/+2
| | | | | | | | | Without the explicit wide type, the shift operation may be performed on a int which will result in implementation defined behaviour on a system with more than 32 CPUs. Signed-off-by: Thomas Graf <tgraf@noironetworks.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
* netdev-linux: Cast policer rate to uint64_t to avoid overflowThomas Graf2014-08-291-1/+1
| | | | | | | | | tc_fill_rate() takes a 64bit int, casting kbits_rate from int to uint64_t avoids a possible overflow when translating from kbits to bytes. Signed-off-by: Thomas Graf <tgraf@noironetworks.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
* ovsdb: Fix error leak for negative timeout and invalid until caseThomas Graf2014-08-281-0/+2
| | | | | | | | | | Although the check for negative timeout is present, the error string is overwritten if an invalid "until" is found right after. This leaks an error string and results in not reporting the negative timeout back to the user even though it is encountered first. Signed-off-by: Thomas Graf <tgraf@noironetworks.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
* vtep-ctl: Free error string before return from cmd_remove().Madhu Challa2014-08-271-2/+2
| | | | | | | | | Error string should be freed in all cases. Found by Coverity. Signed-off-by: Madhu Challa <challa@noironetworks.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
* Fix memory leaks in error paths.yinpeijun2014-08-272-0/+2
| | | | | | | Found by Fortify. Signed-off-by: yinpeijun <yinpeijun@huawei.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
* debian: Fix cross build.Ed Swierk2014-08-261-1/+1
| | | | | | | | | | Cross-building openvswitch with debuild -aARCH (or equivalent) fails because the target architecture is not getting passed to configure. Thus binaries like ovs-appctl get built using the build host architecture. Signed-off-by: Ed Swierk <eswierk@skyportsystems.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
* odp-util: Only add recirc_id mask to Netlink message if mask is providedThomas Graf2014-08-261-1/+3
| | | | | | | | | Current unconditional call may result in NULL being passed to nl_msg_put_u32(). Cc: Andy Zhou <azhou@nicira.com> Signed-off-by: Thomas Graf <tgraf@noironetworks.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
* json: Fix leaked nodes in json_hash_object()Thomas Graf2014-08-261-0/+1
| | | | | | | nodes is allocated through shash_sort() but never freed. Signed-off-by: Thomas Graf <tgraf@noironetworks.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
* connmgr: Only send role status messages to OpenFlow 1.4+ controllers.Ben Pfaff2014-07-284-41/+139
| | | | | | | | | | | | Only OpenFlow 1.4 and later support role status messages, but this code tried to send them to all controllers, which caused an assertion failure. Also, add tests to check that role status messages work, and that they don't cause trouble with OF1.2. Reported-by: Anup Khadka <khadka.py@gmail.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Justin Pettit <jpettit@nicira.com>
* connmgr: Demote service controllers too in ofconn_set_role().Ben Pfaff2014-07-281-3/+3
| | | | | | | | | Service controllers can set their roles, so it's necessary to demote them to slaves if another controller becomes master. Unfortunately the 'controllers' hmap only contains primary controllers, so this was omitted. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Justin Pettit <jpettit@nicira.com>
* ovs-ofctl: Add --unixctl command line option.Ben Pfaff2014-07-283-1/+15
| | | | | | | | This matches the option offered by some other Open vSwitch daemons. I intend to use it in tests in an upcoming commit. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Justin Pettit <jpettit@nicira.com>
* Fix two memory leaks.yinpeijun2014-07-283-0/+3
| | | | | | | Found by coverity. Signed-off-by: yinpeijun <yinpeijun@huawei.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
* ovsdb: Don't add ovsdb-server.c to libovsdb.Gurucharan Shetty2014-07-182-1/+1
| | | | | | | | | | Without this change, with shared libraries, VLOG constructor for ovsdb-server would get called twice corrupting the 'vlog_modules' list causing an infinite loop. Signed-off-by: Gurucharan Shetty <gshetty@nicira.com> Reported-by: Gur Stavi <gstavi@mrv.com> Acked-by: Ben Pfaff <blp@nicira.com>
* netlink-socket: Fix handling socket allocation failure in nl_dump_start().Ben Pfaff2014-07-151-7/+6
| | | | | | | | | | | If nl_pool_alloc() failed, then 'dump' was not initialized at all and further use of the dump would access uninitialized data, probably causing a crash. Found by inspection. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Joe Stringer <joestringer@nicira.com>
* dpif-linux: Avoid null dereference if all ports disappear.Ben Pfaff2014-07-151-1/+1
| | | | | | | | | | | | | | | | | | | | | When dpif_linux_refresh_channels() refreshes the set of channels when the number of handlers changes, it destroys all the dpif's channels and sets dpif->uc_array_size to 0. If the port dump later in the function turns up no ports (which generally indicates a bug), then no channels will be allocated and thus dpif->uc_array_size will remain 0 and 'channels' will be null in each handler. This is self-consistent, at least, but dpif_linux_port_get_pid__() was still willing in this situation to try to access element 0 of the set of channels, dereferencing a null pointer. This fixes the problem. I encountered this while looking at a bug that I had introduced during development that caused the port dump to always be empty. It would be difficult to encounter in normal use. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Joe Stringer <joestringer@nicira.com>
* ofp-msgs: Correct code for queue configuration messages in OpenFlow 1.0.Ben Pfaff2014-07-153-4/+5
| | | | | Reported-by: Simon Jouet <simon.jouet@gmail.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
* ofp-util: Fix null pointer dereference in ofputil_pull_buckets().Ben Pfaff2014-07-151-0/+1
| | | | | | | Found by clang-analyzer. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
* tests: Disable glibc memory checking under glibc <= 2.11.Ben Pfaff2014-07-111-4/+18
| | | | | | | | | | | | | | We noticed that the unit tests sometimes fail on XenServer inside glibc's memory checker, in the free_check() function. It turns out that the glibc memory checker in glibc 2.11 and earlier had an internal race that caused false positives in multithreaded programs. This commit avoids the problem by disabling the glibc memory checker in glibc 2.11 and earlier. VMware-BZ: #1267127 Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Gurucharan Shetty <gshetty@nicira.com>
* netlink-socket: Work around kernel Netlink dump thread races.Ben Pfaff2014-07-102-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Linux kernel Netlink implementation has two races that cause problems for processes that attempt to dump a table in a multithreaded manner. The first race is in the structure of the kernel netlink_recv() function. This function pulls a message from the socket queue and, if there is none, reports EAGAIN: skb = skb_recv_datagram(sk, flags, noblock, &err); if (skb == NULL) goto out; Only if a message is successfully read from the socket queue does the function, toward the end, try to queue up a new message to be dumped: if (nlk->cb && atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2) { ret = netlink_dump(sk); if (ret) { sk->sk_err = ret; sk->sk_error_report(sk); } } This means that if thread A reads a message from a dump, then thread B attempts to read one before A queues up the next, B will get EAGAIN. This means that, following EAGAIN, B needs to wait until A returns to userspace before it tries to read the socket again. nl_dump_next() already does this, using 'dump->status_seq' (although the need for it has never been explained clearly, to my knowledge). The second race is more serious. Suppose thread X and thread Y both simultaneously attempt to queue up a new message to be dumped, using the call to netlink_dump() quoted above. netlink_dump() begins with: mutex_lock(nlk->cb_mutex); cb = nlk->cb; if (cb == NULL) { err = -EINVAL; goto errout_skb; } Suppose that X gets cb_mutex first and finds that the dump is complete. It will therefore, toward the end of netlink_dump(), clear nlk->cb to NULL to indicate that no dump is in progress and release the mutex: nlk->cb = NULL; mutex_unlock(nlk->cb_mutex); When Y grabs cb_mutex afterward, it will see that nlk->cb is NULL and return -EINVAL as quoted above. netlink_recv() stuffs -EINVAL in sk_err, but that error is not reported immediately; instead, it is saved for the next read from the socket. Since Open vSwitch maintains a pool of Netlink sockets, that next failure can crop up pretty much anywhere. One of the worst places for it to crop up is in the execution of a later transaction (e.g. in nl_sock_transact_multiple__()), because userspace treats Netlink transactions as idempotent and will re-execute them when socket errors occur. For a transaction that sends a packet, this causes packet duplication, which we actually observed in practice. (ENOBUFS should actually cause transactions to be re-executed in many cases, but EINVAL should not; this is a separate bug in the userspace netlink code.) VMware-BZ: #1283188 Reported-and-tested-by: Alex Wang <alexw@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Alex Wang <alexw@nicira.com>
* netlink-socket: Fix sign of error code.Ben Pfaff2014-07-101-1/+1
| | | | | | | | | Commit 8f20fd98db (netlink-socket: Work around upstream kernel Netlink bug.) got the sign of the error code wrong, so that it reported e.g. -22 for EINVAL to nl_sock_recv__()'s caller, instead of 22. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Alex Wang <alexw@nicira.com>
* datapath/flow_netlink: Fix NDP flow mask validationDaniele Di Proietto2014-07-101-1/+1
| | | | | | | | | | | | | | match_validate() enforce that a mask matching on NDP attributes has also an exact match on ICMPv6 type. The ICMPv6 type, which is 8-bit wide, is stored in the 'tp.src' field of 'struct sw_flow_key', which is 16-bit wide. Therefore, an exact match on ICMPv6 type should only check the first 8 bits. This commit fixes a bug that prevented flows with an exact match on NDP field from being installed Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com>