| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
| |
Without this change, with shared libraries, VLOG
constructor for ovsdb-server would get called twice corrupting
the 'vlog_modules' list causing an infinite loop.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Reported-by: Gur Stavi <gstavi@mrv.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
|
|
|
| |
Clarify comments for trie-based subtable skipping.
Perform the cheaper check first.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we only returned the last matching prefix length
encountered during a trie lookup, and skipped subtables that had
prefixes longer than that. This patch changes the trie lookup
functions to return all matching prefix lengths seen, so that all
non-matching prefix lengths can be skipped.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
| |
These stylistic changes makes the following patch a bit simpler.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
|
|
|
|
|
|
|
|
|
|
|
| |
Now that it is clear that struct cls_classifier itself does not
need RCU indirection and pvector is defined in its own header, it
is possible get rid of the indirection from struct classifier to
struct cls_classifier.
Suggested-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following changes were made:
- Since we have two dpdk classes, we should split the initial operations needed
by both classes from the initialization needed by each class.
- The dpdk_ring class does not need an initialization function: it has been
removed. This also prevents many testcase from failing, because
dpdk_ring_class_init() was printing an unexpected log message
(OVS_VSWITCHD_START at tests/ofproto-macros.at:54 check for a specific set of
startup log messages)
- If the user doesn't pass the --dpdk option we do not register the dpdk*
classes
- Do not call VLOG_ERR if there are 0 dpdk ethernet device. OVS can now be used
with dpdk_ring devices.
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
|
|
|
|
|
| |
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
|
|
|
|
|
|
|
|
| |
An upcoming commit will add a caller that needs to format both the key and
the value. That isn't cleanly possible with the current interface.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
|
|
|
|
|
|
|
|
|
| |
The only client for ofproto_controller_info was transforming the array of
pairs into an smap anyway. It's easy for the code that fills in the array
to generate it as an smap directly, and it's also easier to extend later.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
| |
'n_txq' was initialized to 0 and never modified, so pinsched_count_txqlen()
always returned 0. Instead, return the correct number.
This only affected the results of "ovs-appctl memory/show", and only if
controller rate limiting was turned on, so it is not a serious bug.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
|
|
|
|
|
| |
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 0d1cee123a84 (stp: Fix bpdu tx problem in listening state)
makes ovs drop the stp bpdu packets if stp is not enabled on the
input port. However, when pif bridge is used and stp is enabled
on the integration bridge. The flow translation of stp bpdu
packets will go through a level of resubmission which changes
the input port to the corresponding peer port. Since, the
patch port on the pif bridge does not have stp enabled, the
flow translation will drop the bpdu packets.
This commit fixes the issue by making ovs forward stp bpdu packets
on stp-disabled port.
VMware-BZ: #1284695
Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
|
|
|
|
|
|
|
|
|
| |
cmap_next_position() didn't update the node pointer while iterating through a
list of nodes with the same hash.
This commit fixes the bug and improve test-cmap to detect it.
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Shared memory ring patch
This patch enables the client dpdk rings within the netdev-dpdk. It adds
a new dpdk device called dpdkr (other naming suggestions?). This allows
for the use of shared memory to communicate with other dpdk applications,
on the host or within a virtual machine. Instructions for use are in
INSTALL.DPDK.
This has been tested on Intel multi-core platforms and with the client
application within the host.
Signed-off-by: Gerald Rogers <gerald.rogers@intel.com>
Signed-off-by: Maryam Tahhan <maryam.tahhan@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 0791315e4d (netlink-socket: Work around kernel Netlink dump thread
races.) introduced a simple workaround for Linux kernel races in Netlink
dumps. However, the code remained more complicated than needed. This
commit simplifies it.
The main reason for complication in the code was 'status_seq' in nl_dump.
This member was there to allow a thread to wait for some other thread to
refill the socket buffer with another dump message (although we did not
understand the reason at the time it was introduced). Now that we know
that Netlink dumps properly need to be serialized to work in existing
Linux kernels, there's no additional value in having 'status_seq',
because serialized recvmsg() calls always refill the socket buffer
properly.
This commit updates nl_msg_next() to clear its buffer argument on error.
This is a more convenient interface for the new version of the Netlink
dump code. nl_msg_next() doesn't have any other callers.
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
|
| |
This refactors the ovsdb-server main loop into a new function, which allows
to call it from multiple places.
Signed-off-by: Eric Sesterhenn <eric.sesterhenn@lsexperts.de>
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
| |
Suggested-by: Gerald Rogers <gerald.rogers@intel.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The userspace datapath returns RCU-protected actions from flow_get() and
flow_dump_next(). This doesn't cause any trouble for current users of
these functions, but it imposes additional constraints on their use.
This patch makes the dpif documentation more explicit about how the
results of these functions can be used.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
| |
commit 8661af798(timeval: Provide a variation for time/warp command)
provided a variation of time/warp command to prevent multiple
invocations of time/warp. That commit changed only the users in bfd.at
and cfm.at as they used it the most. Since we haven't had any negative
confequences because of the change, add the remaining users.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the 'interfaces' file, if an admin adds the openvswitch interface
in 'auto', ifupdown will try to create OVS interfaces even before
openvswitch has started. In a case like that, assume that the admin
knows what he is doing and try to start openvswitch.
The negatives I see are
1. /usr is NFS mounted, in which case expect the admin
to mount it through initramfs.
2. syslog through network may not be accessible.
This is similar to what rhel does with commit 602453000e28ec10
(rhel: Automatically start openvswitch service before bringing an ovs
interface up or down)
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
|
| |
This is a port of commit d7aab661 ( rhel: Add Patch Port support to
initscripts.) from rhel to debian's ifupdown script.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
| |
This removes a GCC 4.9 warning (unused-but-set-variable)
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
|
| |
Following patch adds NULL check for memory allocated
by kmalloc.
Reported-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
| |
If nl_pool_alloc() failed, then 'dump' was not initialized at all and
further use of the dump would access uninitialized data, probably causing
a crash.
Found by inspection.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
|
|
|
|
|
| |
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When dpif_linux_refresh_channels() refreshes the set of channels when
the number of handlers changes, it destroys all the dpif's channels and
sets dpif->uc_array_size to 0. If the port dump later in the function
turns up no ports (which generally indicates a bug), then no channels will
be allocated and thus dpif->uc_array_size will remain 0 and 'channels' will
be null in each handler. This is self-consistent, at least, but
dpif_linux_port_get_pid__() was still willing in this situation to
try to access element 0 of the set of channels, dereferencing a null
pointer.
This fixes the problem.
I encountered this while looking at a bug that I had introduced during
development that caused the port dump to always be empty. It would be
difficult to encounter in normal use.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Joe Stringer <joestringer@nicira.com>
|
|
|
|
|
| |
Reported-by: Simon Jouet <simon.jouet@gmail.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
|
| |
packages.
Signed-off-by: Kirkland Spector <kspector@salesforce.com>
Acked-by: Andrey Falko <afalko@salesforce.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
| |
Found by clang-analyzer.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
|
|
|
|
|
|
|
|
|
|
| |
These variables were initialized in an outer scope and then immediately
changed in an inner one, so they might as well be farther in.
Found by clang-analyzer.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
collect_rules_strict() and collect_rules_loose() destroy the rule
collections that they create if they return an error, and some of their
callers then go on to destroy them again. This could cause a double-free
in the case where rule_collection_destroy() actually calls free(). That
never happens in the current tree, because free() is only necessary if
malloc() was called and there's a 64-entry stub that none of the current
code in collect_rules_*() can fill up in their error cases. Still, it
seems better to fix the problem.
Found by clang-analyzer.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
|
|
|
|
|
|
|
| |
Found by clang-analyzer.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
| |
If the datapath doesn't dump a flow for some reason, and the current
dump is expected to revalidate all flows in the datapath, then perform
revalidation for those flows by fetching them during the sweep phase.
If revalidation is not required, then leave the flow in the datapath and
don't revalidate it.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
|
|
| |
Change the interface to allow implementations to pass back a buffer, and
allow callers to specify which of actions, mask, and stats they wish to
receive. This will be used in the next commit.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
| |
ovsdb-server's port on Windows does not support the "--run" option.
The two tests skipped in this commit make use of "--run" option to
test ovsdb-server's truncating of corrupt log or bad transaction.
It looks a little tricky to get this test running on Windows without
the "--run" option implemented.
Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
|
|
| |
Split up ovs_flow_cmd_fill_info() to make it easier to cache parts of a
dump reply. This will be used to streamline flow_dump in a future patch.
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We noticed that the unit tests sometimes fail on XenServer inside glibc's
memory checker, in the free_check() function. It turns out that the
glibc memory checker in glibc 2.11 and earlier had an internal race that
caused false positives in multithreaded programs.
This commit avoids the problem by disabling the glibc memory checker in
glibc 2.11 and earlier.
VMware-BZ: #1267127
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Gurucharan Shetty <gshetty@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If megaflows are disabled, the userspace does not send the netlink attribute
OVS_FLOW_ATTR_MASK, and the kernel must create an exact match mask.
sw_flow_mask_set() sets every bytes (in 'range') of the mask to 0xff, even the
bytes that represent padding for struct sw_flow, or the bytes that represent
fields that may not be set during ovs_flow_extract().
This is a problem, because when we extract a flow from a packet,
we do not memset() anymore the struct sw_flow to 0 (since commit 9cef26ac6a71).
This commit gets rid of sw_flow_mask_set() and introduces mask_set_nlattr(),
which operates on the netlink attributes rather than on the mask key. Using
this approach we are sure that only the bytes that the user provided in the
flow are matched.
Also, if the parse_flow_mask_nlattrs() for the mask ENCAP attribute fails, we
now return with an error.
Reported-by: Alex Wang <alexw@nicira.com>
Suggested-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
|
|
|
|
|
|
|
|
|
| |
Following patch enables all available tunnel GSO features for OVS
bridge device so that ovs can use hardware offloads available to
underling device.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Use CRC32 intrinsics for hash computations when building for
X86_64 with SSE4_2.
Add a new hash_words64() and change hash_words() to be inlined when
'n_words' is a compile-time constant.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that all the relevant classifier structures use RCU and internal
mutual exclusion for modifications, we can remove the fat-rwlock and
thus make the classifier lookups lockless.
As the readers are operating concurrently with the writers, a
concurrent reader may or may not see a new rule being added by a
writer, depending on how the concurrent events overlap with each
other. Overall, this is no different from the former locked behavior,
but there the visibility of the new rule only depended on the timing
of the locking functions.
A new rule is first added to the segment indices, so the readers may
find the rule in the indices before the rule is visible in the
subtables 'rules' map. This may result in us losing the opportunity
to quit lookups earlier, resulting in sub-optimal wildcarding. This
will be fixed by forthcoming revalidation always scheduled after flow
table changes.
Similar behavior may happen due to us removing the overlapping rule
(if any) from the indices only after the corresponding new rule has
been added.
The subtable's max priority is updated only after a rule is inserted
to the maps, so the concurrent readers may not see the rule, as the
updated priority ordered subtable list will only be visible after the
subtable's max priority is updated.
Similarly, the classifier's partitions are updated by the caller after
the rule is inserted to the maps, so the readers may keep skipping the
subtable until they see the updated partitions.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
|
|
|
|
|
|
|
|
| |
cls_set_prefix_fields() now synchronizes explicitly with the readers,
waiting them to finish using the old configuration before changing to
the new configuration.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add an internal mutex to struct cls_classifier, and reorganize
classifier internal structures according to the user of each field,
marking the fields that need to be protected by the mutex. This makes
locking requirements easier to track, and may make lookup more memory
efficient.
After this patch there is some double locking, as callers are taking
the fat-rwlock, and we take the mutex internally. A following patch
will remove the classifier fat-rwlock, removing the (double) locking
overhead.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
|
|
|
|
|
|
|
| |
Rename 'nbits' as 'n_bits' to be more consistent with other count-like
fields.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
|
|
|
|
|
|
| |
A following patch will add the first external user.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
|
|
|
|
|
|
|
|
| |
Hide the cursor from the classifier iteration users and move locking to
the iterators. This will make following RCU changes simpler, as the call
sites of the iterators need not be changed at that point.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
|
|
|
|
|
|
|
| |
Use cmap instead of hmap & hindex in classifier.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Linux kernel Netlink implementation has two races that cause problems
for processes that attempt to dump a table in a multithreaded manner.
The first race is in the structure of the kernel netlink_recv() function.
This function pulls a message from the socket queue and, if there is none,
reports EAGAIN:
skb = skb_recv_datagram(sk, flags, noblock, &err);
if (skb == NULL)
goto out;
Only if a message is successfully read from the socket queue does the
function, toward the end, try to queue up a new message to be dumped:
if (nlk->cb && atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf / 2) {
ret = netlink_dump(sk);
if (ret) {
sk->sk_err = ret;
sk->sk_error_report(sk);
}
}
This means that if thread A reads a message from a dump, then thread B
attempts to read one before A queues up the next, B will get EAGAIN. This
means that, following EAGAIN, B needs to wait until A returns to userspace
before it tries to read the socket again. nl_dump_next() already does
this, using 'dump->status_seq' (although the need for it has never been
explained clearly, to my knowledge).
The second race is more serious. Suppose thread X and thread Y both
simultaneously attempt to queue up a new message to be dumped, using the
call to netlink_dump() quoted above. netlink_dump() begins with:
mutex_lock(nlk->cb_mutex);
cb = nlk->cb;
if (cb == NULL) {
err = -EINVAL;
goto errout_skb;
}
Suppose that X gets cb_mutex first and finds that the dump is complete. It
will therefore, toward the end of netlink_dump(), clear nlk->cb to NULL to
indicate that no dump is in progress and release the mutex:
nlk->cb = NULL;
mutex_unlock(nlk->cb_mutex);
When Y grabs cb_mutex afterward, it will see that nlk->cb is NULL and
return -EINVAL as quoted above. netlink_recv() stuffs -EINVAL in sk_err,
but that error is not reported immediately; instead, it is saved for the
next read from the socket. Since Open vSwitch maintains a pool of Netlink
sockets, that next failure can crop up pretty much anywhere. One of the
worst places for it to crop up is in the execution of a later transaction
(e.g. in nl_sock_transact_multiple__()), because userspace treats Netlink
transactions as idempotent and will re-execute them when socket errors
occur. For a transaction that sends a packet, this causes packet
duplication, which we actually observed in practice. (ENOBUFS should
actually cause transactions to be re-executed in many cases, but EINVAL
should not; this is a separate bug in the userspace netlink code.)
VMware-BZ: #1283188
Reported-and-tested-by: Alex Wang <alexw@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
|
|
|
|
|
|
|
|
|
| |
Commit 8f20fd98db (netlink-socket: Work around upstream kernel Netlink
bug.) got the sign of the error code wrong, so that it reported e.g. -22
for EINVAL to nl_sock_recv__()'s caller, instead of 22.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Alex Wang <alexw@nicira.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
skb_clone() NULL check is implemented in do_output(), as past of the
common (fast) path. Refactoring so that NULL check is done in the
slow path, immediately after skb_clone() is called.
Besides optimization, this change also improves code readability by
making the skb_clone() NULL check consistent within OVS datapath
module.
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
|