path: root/lib/netlink-socket.h
Commit message, Author, Age, Files, Lines
* netlink linux: enable listening to all nsids (Flavio Leitner, 2018-03-31, 1 file, -0/+2)
  Internal ports may be moved to another network namespace and when that happens, the vswitch stops receiving netlink notifications. This patch enables the vswitch to listen to all network namespaces that have a nsid assigned in the network namespace where the socket has been opened. It requires kernel 4.2 or newer.

  Signed-off-by: Flavio Leitner <fbl@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
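The socket option behind this commit can be sketched as follows. This is a minimal illustration, not the patch itself: the helper name `listen_all_nsid` is hypothetical, and the fallback `#define`s are only there for builds against older headers.

```c
#include <errno.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Fallbacks for older userspace headers (values from the Linux UAPI). */
#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif
#ifndef NETLINK_LISTEN_ALL_NSID
#define NETLINK_LISTEN_ALL_NSID 8
#endif

/* Hypothetical helper: asks the kernel to deliver notifications from every
 * network namespace that has an nsid assigned in the socket's namespace.
 * Returns 0 on success, otherwise a positive errno value. */
static int
listen_all_nsid(int fd)
{
    int on = 1;

    if (setsockopt(fd, SOL_NETLINK, NETLINK_LISTEN_ALL_NSID,
                   &on, sizeof on) < 0) {
        return errno;
    }
    return 0;
}
```

On kernels older than 4.2 the call is expected to fail, so a caller would likely treat a nonzero return as "feature unavailable" rather than as a fatal error.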
* netlink: provide network namespace id from a msg. (Flavio Leitner, 2018-03-31, 1 file, -1/+1)
  The netlink notification's ancillary data contains the network namespace id (netnsid) needed to identify the device correctly.

  Signed-off-by: Flavio Leitner <fbl@redhat.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
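Reading the netnsid out of the ancillary data might look like the sketch below. It assumes the message was received with `recvmsg()` on a socket that has `NETLINK_LISTEN_ALL_NSID` enabled; the function name `netnsid_from_msghdr` is hypothetical and is not the function this commit changes.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Fallbacks for older userspace headers (values from the Linux UAPI). */
#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif
#ifndef NETLINK_LISTEN_ALL_NSID
#define NETLINK_LISTEN_ALL_NSID 8
#endif

/* Hypothetical helper: scans the control messages attached to 'msg' for the
 * network namespace id.  Returns 1 and stores the id in '*nsid' if one is
 * present, otherwise returns 0 and leaves '*nsid' untouched. */
static int
netnsid_from_msghdr(struct msghdr *msg, int *nsid)
{
    struct cmsghdr *cmsg;

    for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_NETLINK
            && cmsg->cmsg_type == NETLINK_LISTEN_ALL_NSID
            && cmsg->cmsg_len >= CMSG_LEN(sizeof *nsid)) {
            /* Copy rather than cast: CMSG_DATA() may be unaligned. */
            memcpy(nsid, CMSG_DATA(cmsg), sizeof *nsid);
            return 1;
        }
    }
    return 0;
}
```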
* netlink-socket: Reorder elements in nl_dump structure. (Bhanuprakash Bodireddy, 2016-10-17, 1 file, -3/+3)
  By reordering the elements in the nl_dump structure, pad bytes can be reduced, thereby saving a cache line.

  Before: structure size:72, holes:1, sum padbytes:4, cachelines:2
  After:  structure size:64, holes:0, sum padbytes:0, cachelines:1

  Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
  Co-authored-by: Antonio Fischetti <antonio.fischetti@intel.com>
  Signed-off-by: Antonio Fischetti <antonio.fischetti@intel.com>
  Acked-by: Jarno Rajahalme <jarno@ovn.org>
  Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
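The effect of member ordering on padding can be seen in a small sketch. The two structures below are hypothetical stand-ins, not the real `nl_dump` layout; they only show how moving a pointer ahead of two 4-byte members removes a hole.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with a hole: on a typical 64-bit ABI the pointer must
 * be 8-byte aligned, so 4 pad bytes follow 'seq' and 4 more follow
 * 'status'. */
struct dump_padded {
    uint32_t seq;
    void *sock;
    int status;
};

/* Same members, widest first: the two 4-byte members share one 8-byte
 * slot, so no padding is needed on a typical 64-bit ABI. */
struct dump_reordered {
    void *sock;
    uint32_t seq;
    int status;
};

/* Bytes of a structure of size 'struct_size' that hold none of the three
 * members, i.e. the total padding. */
static size_t
pad_bytes(size_t struct_size)
{
    return struct_size - (sizeof(uint32_t) + sizeof(void *) + sizeof(int));
}
```

Comparing `pad_bytes(sizeof(struct dump_padded))` with `pad_bytes(sizeof(struct dump_reordered))` gives the same kind of before/after figure that the commit message reports.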
* Move lib/ofpbuf.h to include/openvswitch directory (Ben Warren, 2016-03-30, 1 file, -1/+1)
  Signed-off-by: Ben Warren <ben@skyportsystems.com>
  Acked-by: Ryan Moats <rmoats@us.ibm.com>
  Signed-off-by: Ben Pfaff <blp@ovn.org>
* nl_sock_fd is not used under MSVC (Alin Serdean, 2015-09-28, 1 file, -0/+2)
  Ifdef out nl_sock_fd to make users aware it is not used.

  Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
  Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
* netlink-socket: Add packet subscribe functionality on Windows. (Nithin Raju, 2014-10-23, 1 file, -0/+13)
  In this patch, we add support in userspace for a packet subscribe API similar to the join/leave MC group API that is used for port events. The kernel code has already been committed.

  Signed-off-by: Nithin Raju <nithin@vmware.com>
  Acked-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
  Signed-off-by: Ben Pfaff <blp@nicira.com>
* netlink-socket: Allow compiling on MSVC even without HAVE_NETLINK. (Alin Serdean, 2014-07-29, 1 file, -0/+2)
  Bypass the compilation error when compiling under MSVC.

  Signed-off-by: Alin Gabriel Serdean <aserdean@cloudbasesolutions.com>
  Signed-off-by: Ben Pfaff <blp@nicira.com>
* netlink-socket: Add conceptual documentation. (Ben Pfaff, 2014-07-29, 1 file, -11/+141)
  Based on a conversation with the VMware Hyper-V team earlier today.

  This commit also changes a couple of functions that were only used with netlink-socket.c into static functions. I couldn't think of a reason for code outside that file to use them.

  Signed-off-by: Ben Pfaff <blp@nicira.com>
* netlink-socket: Simplify multithreaded dumping to match Linux reality. (Ben Pfaff, 2014-07-16, 1 file, -4/+7)
  Commit 0791315e4d (netlink-socket: Work around kernel Netlink dump thread races.) introduced a simple workaround for Linux kernel races in Netlink dumps. However, the code remained more complicated than needed. This commit simplifies it.

  The main reason for complication in the code was 'status_seq' in nl_dump. This member was there to allow a thread to wait for some other thread to refill the socket buffer with another dump message (although we did not understand the reason at the time it was introduced). Now that we know that Netlink dumps properly need to be serialized to work in existing Linux kernels, there's no additional value in having 'status_seq', because serialized recvmsg() calls always refill the socket buffer properly.

  This commit updates nl_msg_next() to clear its buffer argument on error. This is a more convenient interface for the new version of the Netlink dump code. nl_msg_next() doesn't have any other callers.

  Signed-off-by: Ben Pfaff <blp@nicira.com>
* netlink-socket: Work around kernel Netlink dump thread races. (Ben Pfaff, 2014-07-10, 1 file, -0/+2)
  The Linux kernel Netlink implementation has two races that cause problems for processes that attempt to dump a table in a multithreaded manner.

  The first race is in the structure of the kernel netlink_recv() function. This function pulls a message from the socket queue and, if there is none, reports EAGAIN:

      skb = skb_recv_datagram(sk, flags, noblock, &err);
      if (skb == NULL)
          goto out;

  Only if a message is successfully read from the socket queue does the function, toward the end, try to queue up a new message to be dumped:

      if (nlk->cb && atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2) {
          ret = netlink_dump(sk);
          if (ret) {
              sk->sk_err = ret;
              sk->sk_error_report(sk);
          }
      }

  This means that if thread A reads a message from a dump, then thread B attempts to read one before A queues up the next, B will get EAGAIN. This means that, following EAGAIN, B needs to wait until A returns to userspace before it tries to read the socket again. nl_dump_next() already does this, using 'dump->status_seq' (although the need for it has never been explained clearly, to my knowledge).

  The second race is more serious. Suppose thread X and thread Y both simultaneously attempt to queue up a new message to be dumped, using the call to netlink_dump() quoted above. netlink_dump() begins with:

      mutex_lock(nlk->cb_mutex);
      cb = nlk->cb;
      if (cb == NULL) {
          err = -EINVAL;
          goto errout_skb;
      }

  Suppose that X gets cb_mutex first and finds that the dump is complete. It will therefore, toward the end of netlink_dump(), clear nlk->cb to NULL to indicate that no dump is in progress and release the mutex:

      nlk->cb = NULL;
      mutex_unlock(nlk->cb_mutex);

  When Y grabs cb_mutex afterward, it will see that nlk->cb is NULL and return -EINVAL as quoted above. netlink_recv() stuffs -EINVAL in sk_err, but that error is not reported immediately; instead, it is saved for the next read from the socket.

  Since Open vSwitch maintains a pool of Netlink sockets, that next failure can crop up pretty much anywhere. One of the worst places for it to crop up is in the execution of a later transaction (e.g. in nl_sock_transact_multiple__()), because userspace treats Netlink transactions as idempotent and will re-execute them when socket errors occur. For a transaction that sends a packet, this causes packet duplication, which we actually observed in practice. (ENOBUFS should actually cause transactions to be re-executed in many cases, but EINVAL should not; this is a separate bug in the userspace netlink code.)

  VMware-BZ: #1283188
  Reported-and-tested-by: Alex Wang <alexw@nicira.com>
  Signed-off-by: Ben Pfaff <blp@nicira.com>
  Acked-by: Alex Wang <alexw@nicira.com>
* netlink: Make nl_dump_next() thread-safe. (Joe Stringer, 2014-02-27, 1 file, -2/+14)
  This patch modifies 'struct nl_dump' and nl_dump_next() to allow multiple threads to share the same nl_dump. These changes are targeted around synchronizing dump status between multiple callers, and allowing callers to fully process their existing buffers before determining whether to stop fetching flows.

  The 'status' field of 'struct nl_dump' becomes atomic, so that multiple threads may check and/or update it to communicate when there is an error or the netlink dump is finished. The low bit holds whether the final message was seen, while the higher bits hold an errno value.

  nl_dump_next() will now read all messages from the given buffer before checking the shared error status and attempting to fetch more. Multiple threads may call this with the same nl_dump, but must provide independent buffers. As previously, the final dump status can be determined by calling nl_dump_done() from a single thread.

  Signed-off-by: Joe Stringer <joestringer@nicira.com>
  Signed-off-by: Ben Pfaff <blp@nicira.com>
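The status encoding described above (low bit for "final message seen", higher bits for an errno value) can be sketched with C11 atomics. All names here are hypothetical, and the "first recorded status wins" policy in `dump_set_status()` is an assumption made for illustration; the commit message does not say how concurrent updates are resolved.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Packs an errno value and a "final message seen" flag into one word:
 * bit 0 is the flag, the remaining bits hold the errno value. */
static inline unsigned int
status_encode(int error, bool done)
{
    return ((unsigned int) error << 1) | (done ? 1u : 0u);
}

static inline int
status_error(unsigned int status)
{
    return (int) (status >> 1);
}

static inline bool
status_done(unsigned int status)
{
    return (status & 1u) != 0;
}

/* Hypothetical shared dump state; zero means "in progress, no error". */
struct dump_state {
    atomic_uint status;
};

/* Records 'status' only if no status has been recorded yet (assumption:
 * the first thread to report an error or completion wins). */
static void
dump_set_status(struct dump_state *dump, unsigned int status)
{
    unsigned int expected = 0;

    atomic_compare_exchange_strong(&dump->status, &expected, status);
}
```

Packing both pieces of information into one atomic word lets every thread learn "stop, and why" with a single load, without taking a lock.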
* netlink: Remove buffer from 'struct nl_dump'. (Joe Stringer, 2014-02-27, 1 file, -2/+3)
  This patch makes all of the users of 'struct nl_dump' allocate their own buffers to pass down to nl_dump_next(). This paves the way for allowing multithreaded flow dumping.

  Signed-off-by: Joe Stringer <joestringer@nicira.com>
  Signed-off-by: Ben Pfaff <blp@nicira.com>
* netlink: Rename 'dump->seq' to 'dump->nl_seq' (Joe Stringer, 2014-01-23, 1 file, -2/+2)
  An upcoming patch will introduce another, completely unrelated seq to 'struct nl_dump'. Giving this one a better name should reduce confusion.

  Signed-off-by: Joe Stringer <joestringer@nicira.com>
  Signed-off-by: Ben Pfaff <blp@nicira.com>
* datapath: Cleanup netlink compat code. (Pravin B Shelar, 2013-09-06, 1 file, -2/+1)
  This patch removes the genl, netlink, and rtnl compat code and the dpif-linux fallback-id compat code.

  Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
  Acked-by: Jesse Gross <jesse@nicira.com>
* netlink-socket: Make thread-safe. (Ben Pfaff, 2013-07-18, 1 file, -0/+6)
  The uses of vlog in this module are not thread-safe, because vlog itself is not yet thread-safe.

  Signed-off-by: Ben Pfaff <blp@nicira.com>
* netlink-socket: Simplify use of transactions and dumps. (Ben Pfaff, 2013-07-18, 1 file, -2/+7)
  This disentangles "struct nl_dump" from "struct nl_sock", clearing the way to make the use of either one thread-safe in an obviously correct manner.

  Signed-off-by: Ben Pfaff <blp@nicira.com>
* ovs-brcompatd: Fix sending replies to kernel requests. (Ben Pfaff, 2012-07-05, 1 file, -0/+2)
  Commit 7d7447 (netlink: Postpone choosing sequence numbers until send time.) broke ovs-brcompatd because it prevented userspace replies to kernel requests from using the correct sequence numbers. This commit fixes it.

  Atzm Watanabe found the root cause and provided an alternative patch to avoid the problem.

  Reported-by: André Ruß <andre.russ@hybris.com>
  Reported-by: Atzm Watanabe <atzm@stratosphere.co.jp>
  Tested-by: Atzm Watanabe <atzm@stratosphere.co.jp>
  Signed-off-by: Ben Pfaff <blp@nicira.com>
* Global replace of Nicira Networks. (Raju Subramanian, 2012-05-02, 1 file, -1/+1)
  Replaced all instances of Nicira Networks(, Inc) with Nicira, Inc.

  Feature #10593
  Signed-off-by: Raju Subramanian <rsubramanian@nicira.com>
  Signed-off-by: Ben Pfaff <blp@nicira.com>
* netlink-socket: Make caller provide message receive buffers. (Ben Pfaff, 2012-04-18, 1 file, -7/+13)
  Typically an nl_sock client can stack-allocate the buffer for receiving a Netlink message, which provides a performance boost.

  Signed-off-by: Ben Pfaff <blp@nicira.com>
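The caller-provided-buffer pattern can be sketched as below. This is an illustration of the idea only: `recv_into` is a hypothetical helper working on a plain file descriptor, not the library's receive function, whose actual signature is not shown on this page.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: receives one message into a buffer owned by the
 * caller, so the receive path performs no allocation of its own.
 * Returns 0 and stores the message length in '*len' on success, otherwise
 * a positive errno value. */
static int
recv_into(int fd, void *buf, size_t size, size_t *len)
{
    ssize_t n;

    do {
        n = recv(fd, buf, size, 0);
    } while (n < 0 && errno == EINTR);

    if (n < 0) {
        return errno;
    }
    *len = (size_t) n;
    return 0;
}
```

A caller would typically declare something like `char buf[4096];` on its own stack and pass `buf, sizeof buf`, which avoids a heap allocation per received message.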
* netlink-socket: Remove unnecessary #include. (Ben Pfaff, 2012-04-18, 1 file, -1/+0)
  Signed-off-by: Ben Pfaff <blp@nicira.com>
* dpif-linux: Use poll() internally in dpif_linux_recv(). (Ben Pfaff, 2011-11-28, 1 file, -0/+1)
  Using poll() internally in dpif_linux_recv(), instead of relying on the results of the main loop poll() call, brings netperf CRR performance back within 1% of par versus the code base before the poll_fd_woke() optimizations were introduced. It also increases the ovs-benchmark results by about 5% versus that baseline.

  My theory is that this is because the main loop takes long enough that a significant number of packets can arrive during the main loop itself, so this reduces the time before OVS gets to those packets.
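A fresh, non-blocking readiness check of the kind described here might look like this sketch. The helper name `fd_readable_now` is hypothetical; the point is that a zero-timeout `poll()` at receive time reflects packets that arrived after the main loop's own `poll()` returned.

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical helper: asks the kernel right now whether 'fd' has data to
 * read, without blocking.  Returns 1 if readable, 0 if not, or a negative
 * errno value if poll() itself fails. */
static int
fd_readable_now(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n;

    do {
        n = poll(&pfd, 1, 0);       /* Timeout 0: return immediately. */
    } while (n < 0 && errno == EINTR);

    if (n < 0) {
        return -errno;
    }
    return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```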
* Revert "poll-loop: Enable checking whether a FD caused a wakeup." (Ben Pfaff, 2011-11-28, 1 file, -1/+0)
  This reverts commit 1e276d1a10539a8cd97d2ad63c073a9a43f0f1ef.

  The poll_fd_woke() and nl_sock_woke() functions added in that commit are no longer used, so there is no reason to keep them in the tree.
* netlink-socket: New function nl_sock_transact_multiple(). (Ben Pfaff, 2011-10-14, 1 file, -0/+14)
  This will be used in an upcoming commit.
* poll-loop: Enable checking whether a FD caused a wakeup. (Jesse Gross, 2011-09-23, 1 file, -0/+1)
  Each time we run through the poll loop, we check all file descriptors that we were waiting on to see if there is data available. However, this requires a system call, and poll already provides information on which FDs caused the wakeup, so it is inefficient as the number of active FDs grows. This provides a way to check whether a given FD has data.
* netlink: Expose method to get Netlink pid of a socket. (Jesse Gross, 2011-09-23, 1 file, -0/+2)
  In the future, the kernel will use unicast messages instead of multicast to send upcalls. As a result, we need to be able to tell it where to direct the traffic. This adds a function to expose the Netlink pid of a socket so it can be included in messages to the kernel.
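One way to obtain a socket's Netlink pid from the kernel is `getsockname()`, as in the sketch below. The helper name `netlink_sock_pid` is hypothetical, and this page does not show whether the library queries the kernel like this or caches the value when the socket is created.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Hypothetical helper: looks up the Netlink pid (port id) that the kernel
 * associates with 'fd'.  Returns 0 and stores the pid in '*pid' on success,
 * otherwise a positive errno value. */
static int
netlink_sock_pid(int fd, uint32_t *pid)
{
    struct sockaddr_nl addr;
    socklen_t len = sizeof addr;

    memset(&addr, 0, sizeof addr);
    if (getsockname(fd, (struct sockaddr *) &addr, &len) < 0) {
        return errno;
    }
    if (len != sizeof addr || addr.nl_family != AF_NETLINK) {
        return EINVAL;              /* Not a Netlink socket. */
    }
    *pid = addr.nl_pid;
    return 0;
}
```

The resulting value is what a request would carry so that the kernel can address unicast replies or upcalls to this particular socket.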
* dpif-linux: Handle nl_lookup_genl_mcgroup() failures. (Ethan Jackson, 2011-09-16, 1 file, -1/+2)
  The nl_lookup_genl_mcgroup() function can fail on older kernels which do not support the required netlink interface. Before this patch, dpif-linux would refuse to create a datapath when this happened. With this patch, it attempts to use a workaround. If the workaround fails it simply disables the affected features without completely disabling the dpif.
* netlink-socket: New function nl_lookup_genl_mcgroup(). (Ethan Jackson, 2011-09-01, 1 file, -0/+2)
* netlink-socket: Remove unused nl_sock_sendv() function. (Ben Pfaff, 2011-07-27, 1 file, -3/+0)
  This function hasn't been used for ages.
* netlink-socket: Make dumping and doing transactions on same nl_sock safe. (Ben Pfaff, 2011-01-27, 1 file, -0/+1)
  It's not safe to use a single Netlink fd to do multiple operations in a synchronous way. Some of the limitations are fundamental; for example, the kernel only supports a single "dump" operation at a time. Others are limitations imposed by the OVS coding style; for example, our Netlink library is not callback based, so nothing can be done about incoming messages that can't be handled immediately. Regardless, in OVS multicast groups, transactions, and dumps cannot coexist on a single nl_sock.

  This is only mildly irritating at the moment, but it will become much worse later on, when dpif-linux shifts to using Netlink dumps for listing various kinds of datapath entities. When that happens, a dump will be in progress in situations where the dpif-linux client might want to do other operations. For example, it is reasonable for the client to list flows and, in the middle, look up information on vports mentioned in those flows.

  It might be possible to simply ban and avoid such nested operations--I have not even audited the source tree to find out whether we do anything like that already--but that seems like an unnecessary cramp on our coding style. Furthermore, it's difficult to explain and justify without understanding the implementation.

  This patch takes another approach, by improving the Netlink socket library to avoid artificial constraints. When an operation, or a dump, or joining a multicast group would cause a problem, this patch makes the library transparently create a separate Netlink socket. This solves the problem without putting any onerous restrictions on use.

  This commit also slightly simplifies netdev_vport_reset_names(). It had been written to destroy the dump object before the Netlink socket that it used, but this is no longer necessary and doing it in the opposite order saved a few lines of code.

  Reviewed by Ethan Jackson <ethan@nicira.com>.
* netlink-socket: New function for draining the receive buffer. (Ben Pfaff, 2011-01-27, 1 file, -0/+2)
  This will be used in an upcoming patch.

  Reviewed by Justin Pettit.
* netlink-socket: Add functions for joining and leaving multicast groups. (Ben Pfaff, 2011-01-27, 1 file, -4/+5)
  When this library was originally implemented, support for Linux 2.4 was important. The Netlink implementation in Linux only added support for joining and leaving multicast groups after a socket is bound as of Linux 2.6.14, so the library did not support it either.

  But the current version of Open vSwitch targets Linux 2.6.18 and later, so it's fine to add this support now, and this commit does so.

  This will be used more extensively in upcoming commits.

  Reviewed by Justin Pettit.
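Joining and leaving a multicast group on an already-bound socket uses a pair of socket options, sketched below. The helper names `netlink_join_group` and `netlink_leave_group` are hypothetical and are not the functions this commit adds; the fallback `#define` covers older userspace headers.

```c
#include <errno.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Fallback for older userspace headers (value from the Linux UAPI). */
#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif

/* Hypothetical helper: subscribes 'fd' to multicast group number 'group'.
 * Returns 0 on success, otherwise a positive errno value. */
static int
netlink_join_group(int fd, unsigned int group)
{
    if (setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
                   &group, sizeof group) < 0) {
        return errno;
    }
    return 0;
}

/* Hypothetical helper: the inverse of netlink_join_group(). */
static int
netlink_leave_group(int fd, unsigned int group)
{
    if (setsockopt(fd, SOL_NETLINK, NETLINK_DROP_MEMBERSHIP,
                   &group, sizeof group) < 0) {
        return errno;
    }
    return 0;
}
```

Unlike the `nl_groups` bitmask passed at `bind()` time, these options take a group number and can be applied at any point in the socket's lifetime.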
* netlink: Split into generic and Linux-specific parts. (Ben Pfaff, 2010-12-10, 1 file, -0/+78)
  The parts of the netlink module that are related to sockets are Linux-specific, since only Linux has AF_NETLINK sockets. The rest can be built anywhere. This commit breaks them into two modules, and builds the generic one on all platforms.

  Acked-by: Jesse Gross <jesse@nicira.com>