summaryrefslogtreecommitdiff
path: root/net
Commit message (Collapse)AuthorAgeFilesLines
* net: Use local_bh_disable in netif_rx_ni()Thomas Gleixner2013-09-131-4/+2
| | | | | | | | | | This code triggers the new WARN in __raise_softirq_irqsoff() though it actually looks at the softirq pending bit and calls into the softirq code, but that fits not well with the context related softirq model of RT. It's correct on mainline though, but going through local_bh_disable/enable here is not going to hurt badly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* net: netfilter: Serialize xt_write_recseq sections on RTThomas Gleixner2013-09-131-0/+6
| | | | | | | | | | The netfilter code relies only on the implicit semantics of local_bh_disable() for serializing wt_write_recseq sections. RT breaks that and needs explicit serialization here. Reported-by: Peter LaDow <petela@gocougs.wsu.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org
* net: Use get_cpu_light() in ip_send_unicast_reply()Thomas Gleixner2013-09-131-2/+3
| | | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* net: Another local_irq_disable/kmalloc headacheThomas Gleixner2013-09-131-2/+4
| | | | | | Replace it by a local lock. Though that's pretty inefficient :( Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* net: Use cpu_chill() instead of cpu_relax()Thomas Gleixner2013-09-132-3/+5
| | | | | | | | | Retry loops on RT might loop forever when the modifying side was preempted. Use cpu_chill() instead of cpu_relax() to let the system make progress. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org
* net,RT:REmove preemption disabling in netif_rx()Priyanka Jain2013-09-131-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1)enqueue_to_backlog() (called from netif_rx) should be bind to a particluar CPU. This can be achieved by disabling migration. No need to disable preemption 2)Fixes crash "BUG: scheduling while atomic: ksoftirqd" in case of RT. If preemption is disabled, enqueue_to_backog() is called in atomic context. And if backlog exceeds its count, kfree_skb() is called. But in RT, kfree_skb() might gets scheduled out, so it expects non atomic context. 3)When CONFIG_PREEMPT_RT_FULL is not defined, migrate_enable(), migrate_disable() maps to preempt_enable() and preempt_disable(), so no change in functionality in case of non-RT. -Replace preempt_enable(), preempt_disable() with migrate_enable(), migrate_disable() respectively -Replace get_cpu(), put_cpu() with get_cpu_light(), put_cpu_light() respectively Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com> Acked-by: Rajan Srivastava <Rajan.Srivastava@freescale.com> Cc: <rostedt@goodmis.orgn> Link: http://lkml.kernel.org/r/1337227511-2271-1-git-send-email-Priyanka.Jain@freescale.com Cc: stable-rt@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* softirq: Check preemption after reenabling interruptsThomas Gleixner2013-09-131-0/+6
| | | | | | | | | | | | | | raise_softirq_irqoff() disables interrupts and wakes the softirq daemon, but after reenabling interrupts there is no preemption check, so the execution of the softirq thread might be delayed arbitrarily. In principle we could add that check to local_irq_enable/restore, but that's overkill as the rasie_softirq_irqoff() sections are the only ones which show this behaviour. Reported-by: Carsten Emde <cbe@osadl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org
* net: sysrq via icmpCarsten Emde2013-09-132-0/+37
| | | | | | | | | | | There are (probably rare) situations when a system crashed and the system console becomes unresponsive but the network icmp layer still is alive. Wouldn't it be wonderful, if we then could submit a sysreq command via ping? This patch provides this facility. Please consult the updated documentation Documentation/sysrq.txt for details. Signed-off-by: Carsten Emde <C.Emde@osadl.org>
* net: Avoid livelock in net_tx_action() on RTSteven Rostedt2013-09-131-1/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | qdisc_lock is taken w/o disabling interrupts or bottom halfs. So code holding a qdisc_lock() can be interrupted and softirqs can run on the return of interrupt in !RT. The spin_trylock() in net_tx_action() makes sure, that the softirq does not deadlock. When the lock can't be acquired q is requeued and the NET_TX softirq is raised. That causes the softirq to run over and over. That works in mainline as do_softirq() has a retry loop limit and leaves the softirq processing in the interrupt return path and schedules ksoftirqd. The task which holds qdisc_lock cannot be preempted, so the lock is released and either ksoftirqd or the next softirq in the return from interrupt path can proceed. Though it's a bit strange to actually run MAX_SOFTIRQ_RESTART (10) loops before it decides to bail out even if it's clear in the first iteration :) On RT all softirq processing is done in a FIFO thread and we don't have a loop limit, so ksoftirqd preempts the lock holder forever and unqueues and requeues until the reset button is hit. Due to the forced threading of ksoftirqd on RT we actually cannot deadlock on qdisc_lock because it's a "sleeping lock". So it's safe to replace the spin_trylock() with a spin_lock(). When contended, ksoftirqd is scheduled out and the lock holder can proceed. [ tglx: Massaged changelog and code comments ] Solved-by: Thomas Gleixner <tglx@linuxtronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Carsten Emde <cbe@osadl.org> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Luis Claudio R. Goncalves <lclaudio@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* skbufhead-raw-lock.patchThomas Gleixner2013-09-131-6/+20
| | | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* net-netif_rx_ni-migrate-disable.patchThomas Gleixner2013-09-131-2/+2
| | | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* net-wireless-warn-nort.patchThomas Gleixner2013-09-131-1/+1
| | | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* softirq-thread-do-softirq.patchThomas Gleixner2013-09-131-1/+1
| | | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* net-flip-lock-dep-thingy.patchThomas Gleixner2013-09-131-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ======================================================= [ INFO: possible circular locking dependency detected ] 3.0.0-rc3+ #26 ------------------------------------------------------- ip/1104 is trying to acquire lock: (local_softirq_lock){+.+...}, at: [<ffffffff81056d12>] __local_lock+0x25/0x68 but task is already holding lock: (sk_lock-AF_INET){+.+...}, at: [<ffffffff81433308>] lock_sock+0x10/0x12 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (sk_lock-AF_INET){+.+...}: [<ffffffff810836e5>] lock_acquire+0x103/0x12e [<ffffffff813e2781>] lock_sock_nested+0x82/0x92 [<ffffffff81433308>] lock_sock+0x10/0x12 [<ffffffff81433afa>] tcp_close+0x1b/0x355 [<ffffffff81453c99>] inet_release+0xc3/0xcd [<ffffffff813dff3f>] sock_release+0x1f/0x74 [<ffffffff813dffbb>] sock_close+0x27/0x2b [<ffffffff81129c63>] fput+0x11d/0x1e3 [<ffffffff81126577>] filp_close+0x70/0x7b [<ffffffff8112667a>] sys_close+0xf8/0x13d [<ffffffff814ae882>] system_call_fastpath+0x16/0x1b -> #0 (local_softirq_lock){+.+...}: [<ffffffff81082ecc>] __lock_acquire+0xacc/0xdc8 [<ffffffff810836e5>] lock_acquire+0x103/0x12e [<ffffffff814a7e40>] _raw_spin_lock+0x3b/0x4a [<ffffffff81056d12>] __local_lock+0x25/0x68 [<ffffffff81056d8b>] local_bh_disable+0x36/0x3b [<ffffffff814a7fc4>] _raw_write_lock_bh+0x16/0x4f [<ffffffff81433c38>] tcp_close+0x159/0x355 [<ffffffff81453c99>] inet_release+0xc3/0xcd [<ffffffff813dff3f>] sock_release+0x1f/0x74 [<ffffffff813dffbb>] sock_close+0x27/0x2b [<ffffffff81129c63>] fput+0x11d/0x1e3 [<ffffffff81126577>] filp_close+0x70/0x7b [<ffffffff8112667a>] sys_close+0xf8/0x13d [<ffffffff814ae882>] system_call_fastpath+0x16/0x1b other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(sk_lock-AF_INET); lock(local_softirq_lock); lock(sk_lock-AF_INET); lock(local_softirq_lock); *** DEADLOCK *** 1 lock held by ip/1104: #0: (sk_lock-AF_INET){+.+...}, at: [<ffffffff81433308>] lock_sock+0x10/0x12 stack backtrace: Pid: 1104, comm: ip Not tainted 3.0.0-rc3+ #26 Call Trace: [<ffffffff81081649>] print_circular_bug+0x1f8/0x209 [<ffffffff81082ecc>] __lock_acquire+0xacc/0xdc8 [<ffffffff81056d12>] ? __local_lock+0x25/0x68 [<ffffffff810836e5>] lock_acquire+0x103/0x12e [<ffffffff81056d12>] ? __local_lock+0x25/0x68 [<ffffffff81046c75>] ? get_parent_ip+0x11/0x41 [<ffffffff814a7e40>] _raw_spin_lock+0x3b/0x4a [<ffffffff81056d12>] ? __local_lock+0x25/0x68 [<ffffffff81046c8c>] ? get_parent_ip+0x28/0x41 [<ffffffff81056d12>] __local_lock+0x25/0x68 [<ffffffff81056d8b>] local_bh_disable+0x36/0x3b [<ffffffff81433308>] ? lock_sock+0x10/0x12 [<ffffffff814a7fc4>] _raw_write_lock_bh+0x16/0x4f [<ffffffff81433c38>] tcp_close+0x159/0x355 [<ffffffff81453c99>] inet_release+0xc3/0xcd [<ffffffff813dff3f>] sock_release+0x1f/0x74 [<ffffffff813dffbb>] sock_close+0x27/0x2b [<ffffffff81129c63>] fput+0x11d/0x1e3 [<ffffffff81126577>] filp_close+0x70/0x7b [<ffffffff8112667a>] sys_close+0xf8/0x13d [<ffffffff814ae882>] system_call_fastpath+0x16/0x1b Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* SUNRPC: Fix memory corruption issue on 32-bit highmem systemsTrond Myklebust2013-09-111-3/+6
| | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 347e2233b7667e336d9f671f1a52dfa3f0416e2c ] Some architectures, such as ARM-32 do not return the same base address when you call kmap_atomic() twice on the same page. This causes problems for the memmove() call in the XDR helper routine "_shift_data_right_pages()", since it defeats the detection of overlapping memory ranges, and has been seen to corrupt memory. The fix is to distinguish between the case where we're doing an inter-page copy or not. In the former case of we know that the memory ranges cannot possibly overlap, so we can additionally micro-optimise by replacing memmove() with memcpy(). Reported-by: Mark Young <MYoung@nvidia.com> Reported-by: Matt Craighead <mcraighead@nvidia.com> Cc: Bruce Fields <bfields@fieldses.org> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Matt Craighead <mcraighead@nvidia.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* cfg80211: fix P2P GO interface teardownJohannes Berg2013-09-111-0/+1
| | | | | | | | | | | | | | | | | [ Upstream commit 74418edec915d0f446debebde08d170c7b8ba0ee ] When a P2P GO interface goes down, cfg80211 doesn't properly tear it down, leading to warnings later. Add the GO interface type to the enumeration to tear it down like AP interfaces. Otherwise, we leave it pending and mac80211's state can get very confused, leading to warnings later. Cc: stable@vger.kernel.org Reported-by: Ilan Peer <ilan.peer@intel.com> Tested-by: Ilan Peer <ilan.peer@intel.com> Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* net_sched: info leak in atm_tc_dump_class()Dan Carpenter2013-09-111-0/+1
| | | | | | | | | | | [ Upstream commit 8cb3b9c3642c0263d48f31d525bcee7170eedc20 ] The "pvc" struct has a hole after pvc.sap_family which is not cleared. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* af_key: more info leaks in pfkey messagesDan Carpenter2013-09-111-0/+4
| | | | | | | | | | | | | | [ Upstream commit ff862a4668dd6dba962b1d2d8bd344afa6375683 ] This is inspired by a5cc68f3d6 "af_key: fix info leaks in notify messages". There are some struct members which don't get initialized and could disclose small amounts of private information. Acked-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* net_sched: Fix stack info leak in cbq_dump_wrr().David S. Miller2013-09-111-0/+1
| | | | | | | | | | | | | [ Upstream commit a0db856a95a29efb1c23db55c02d9f0ff4f0db48 ] Make sure the reserved fields, and padding (if any), are fully initialized. Based upon a patch by Dan Carpenter and feedback from Joe Perches. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* ipv6: take rtnl_lock and mark mrt6 table as freed on namespace cleanupHannes Frederic Sowa2013-09-111-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 905a6f96a1b18e490a75f810d733ced93c39b0e5 ] Otherwise we end up dereferencing the already freed net->ipv6.mrt pointer which leads to a panic (from Srivatsa S. Bhat): BUG: unable to handle kernel paging request at ffff882018552020 IP: [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6] PGD 290a067 PUD 207ffe0067 PMD 207ff1d067 PTE 8000002018552060 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: ebtable_nat ebtables nfs fscache nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables nfsd lockd nfs_acl exportfs auth_rpcgss autofs4 sunrpc 8021q garp bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter +ip6_tables ipv6 vfat fat vhost_net macvtap macvlan vhost tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support cdc_ether usbnet mii microcode i2c_i801 i2c_core lpc_ich mfd_core shpchp ioatdma dca mlx4_core be2net wmi acpi_cpufreq mperf ext4 jbd2 mbcache dm_mirror dm_region_hash dm_log dm_mod CPU: 0 PID: 7 Comm: kworker/u33:0 Not tainted 3.11.0-rc1-ea45e-a #4 Hardware name: IBM -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012 Workqueue: netns cleanup_net task: ffff8810393641c0 ti: ffff881039366000 task.ti: ffff881039366000 RIP: 0010:[<ffffffffa0366b02>] [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6] RSP: 0018:ffff881039367bd8 EFLAGS: 00010286 RAX: ffff881039367fd8 RBX: ffff882018552000 RCX: dead000000200200 RDX: 0000000000000000 RSI: ffff881039367b68 RDI: ffff881039367b68 RBP: ffff881039367bf8 R08: ffff881039367b68 R09: 2222222222222222 R10: 2222222222222222 R11: 2222222222222222 R12: ffff882015a7a040 R13: ffff882014eb89c0 R14: ffff8820289e2800 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88103fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff882018552020 CR3: 0000000001c0b000 CR4: 00000000000407f0 Stack: ffff881039367c18 ffff882014eb89c0 ffff882015e28c00 0000000000000000 ffff881039367c18 ffffffffa034d9d1 ffff8820289e2800 ffff882014eb89c0 ffff881039367c58 ffffffff815bdecb ffffffff815bddf2 ffff882014eb89c0 Call Trace: [<ffffffffa034d9d1>] rawv6_close+0x21/0x40 [ipv6] [<ffffffff815bdecb>] inet_release+0xfb/0x220 [<ffffffff815bddf2>] ? inet_release+0x22/0x220 [<ffffffffa032686f>] inet6_release+0x3f/0x50 [ipv6] [<ffffffff8151c1d9>] sock_release+0x29/0xa0 [<ffffffff81525520>] sk_release_kernel+0x30/0x70 [<ffffffffa034f14b>] icmpv6_sk_exit+0x3b/0x80 [ipv6] [<ffffffff8152fff9>] ops_exit_list+0x39/0x60 [<ffffffff815306fb>] cleanup_net+0xfb/0x1a0 [<ffffffff81075e3a>] process_one_work+0x1da/0x610 [<ffffffff81075dc9>] ? process_one_work+0x169/0x610 [<ffffffff81076390>] worker_thread+0x120/0x3a0 [<ffffffff81076270>] ? process_one_work+0x610/0x610 [<ffffffff8107da2e>] kthread+0xee/0x100 [<ffffffff8107d940>] ? __init_kthread_worker+0x70/0x70 [<ffffffff8162a99c>] ret_from_fork+0x7c/0xb0 [<ffffffff8107d940>] ? __init_kthread_worker+0x70/0x70 Code: 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 4c 8b 67 30 49 89 fd e8 db 3c 1e e1 49 8b 9c 24 90 08 00 00 48 85 db 74 06 <4c> 39 6b 20 74 20 bb f3 ff ff ff e8 8e 3c 1e e1 89 d8 4c 8b 65 RIP [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6] RSP <ffff881039367bd8> CR2: ffff882018552020 Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* sysctl net: Keep tcp_syn_retries inside the boundaryMichal Tesar2013-09-111-1/+5
| | | | | | | | | | | [ Upstream commit 651e92716aaae60fc41b9652f54cb6803896e0da ] Limit the min/max value passed to the /proc/sys/net/ipv4/tcp_syn_retries. Signed-off-by: Michal Tesar <mtesar@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* mac80211: fix ethtool stats for non-station interfacesJohannes Berg2013-09-111-0/+2
| | | | | | | | | | | | | | | [ Upstream commit e13bae4f807401729b3f27c7e882a96b8b292809 ] As reported in https://bugzilla.kernel.org/show_bug.cgi?id=60514, the station loop never initialises 'sinfo' and therefore adds up a stack values, leaking stack information (the number of times it adds values is easily obtained another way.) Fix this by initialising the sinfo for each station to add. Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* mac80211: fix duplicate retransmission detectionJohannes Berg2013-09-111-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 6b0f32745dcfba01d7be33acd1b40306c7a914c6 ] The duplicate retransmission detection code in mac80211 erroneously attempts to do the check for every frame, even frames that don't have a sequence control field or that don't use it (QoS-Null frames.) This is problematic because it causes the code to access data beyond the end of the SKB and depending on the data there will drop packets erroneously. Correct the code to not do duplicate detection for such frames. I found this error while testing AP powersave, it lead to retransmitted PS-Poll frames being dropped entirely as the data beyond the end of the SKB was always zero. Cc: stable@vger.kernel.org [all versions] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* nl80211: fix mgmt tx status and testmode reporting for netnsMichal Kazior2013-09-111-2/+5
| | | | | | | | | | | | | | | | | [ Upstream commit a0ec570f4f69c4cb700d743a915096c2c8f56a99 ] These two events were sent to the default network namespace. This caused AP mode in a non-default netns to not work correctly. Mgmt tx status was multicasted to a different (default) netns instead of the one the AP was in. Cc: stable@vger.kernel.org Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* svcrdma: underflow issue in decode_write_list()Dan Carpenter2013-09-111-6/+14
| | | | | | | | | | | | | | | | [ Upstream commit b2781e1021525649c0b33fffd005ef219da33926 ] My static checker marks everything from ntohl() as untrusted and it complains we could have an underflow problem doing: return (u32 *)&ary->wc_array[nchunks]; Also on 32 bit systems the upper bound check could overflow. Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* vlan: fix a race in egress prio managementEric Dumazet2013-09-111-0/+7
| | | | | | | | | | | | | | | | [ Upstream commit 3e3aac497513c669e1c62c71e1d552ea85c1d974 ] egress_priority_map[] hash table updates are protected by rtnl, and we never remove elements until device is dismantled. We have to make sure that before inserting an new element in hash table, all its fields are committed to memory or else another cpu could find corrupt values and crash. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* vlan: mask vlan prio bitsEric Dumazet2013-09-112-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit d4b812dea4a236f729526facf97df1a9d18e191c ] In commit 48cc32d38a52d0b68f91a171a8d00531edc6a46e ("vlan: don't deliver frames for unknown vlans to protocols") Florian made sure we set pkt_type to PACKET_OTHERHOST if the vlan id is set and we could find a vlan device for this particular id. But we also have a problem if prio bits are set. Steinar reported an issue on a router receiving IPv6 frames with a vlan tag of 4000 (id 0, prio 2), and tunneled into a sit device, because skb->vlan_tci is set. Forwarded frame is completely corrupted : We can see (8100:4000) being inserted in the middle of IPv6 source address : 16:48:00.780413 IP6 2001:16d8:8100:4000:ee1c:0:9d9:bc87 > 9f94:4d95:2001:67c:29f4::: ICMP6, unknown icmp6 type (0), length 64 0x0000: 0000 0029 8000 c7c3 7103 0001 a0ae e651 0x0010: 0000 0000 ccce 0b00 0000 0000 1011 1213 0x0020: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 0x0030: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 It seems we are not really ready to properly cope with this right now. We can probably do better in future kernels : vlan_get_ingress_priority() should be a netdev property instead of a per vlan_dev one. For stable kernels, lets clear vlan_tci to fix the bugs. Reported-by: Steinar H. Gunderson <sesse@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* skb: report completion status for zero copy skbsMichael S. Tsirkin2013-09-111-2/+2
| | | | | | | | | | | | | | | | | | | [ Upstream commit e19d6763cc300fcb706bd291b24ac06be71e1ce6 ] Even if skb is marked for zero copy, net core might still decide to copy it later which is somewhat slower than a copy in user context: besides copying the data we need to pin/unpin the pages. Add a parameter reporting such cases through zero copy callback: if this happens a lot, device can take this into account and switch to copying in user context. This patch updates all users but ignores the passed value for now: it will be used by follow-up patches. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* 9p: fix off by one causing access violations and memory corruptionSasha Levin2013-09-111-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 110ecd69a9feea82a152bbf9b12aba57e6396883 ] p9_release_pages() would attempt to dereference one value past the end of pages[]. This would cause the following crashes: [ 6293.171817] BUG: unable to handle kernel paging request at ffff8807c96f3000 [ 6293.174146] IP: [<ffffffff8412793b>] p9_release_pages+0x3b/0x60 [ 6293.176447] PGD 79c5067 PUD 82c1e3067 PMD 82c197067 PTE 80000007c96f3060 [ 6293.180060] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 6293.180060] Modules linked in: [ 6293.180060] CPU: 62 PID: 174043 Comm: modprobe Tainted: G W 3.10.0-next-20130710-sasha #3954 [ 6293.180060] task: ffff8807b803b000 ti: ffff880787dde000 task.ti: ffff880787dde000 [ 6293.180060] RIP: 0010:[<ffffffff8412793b>] [<ffffffff8412793b>] p9_release_pages+0x3b/0x60 [ 6293.214316] RSP: 0000:ffff880787ddfc28 EFLAGS: 00010202 [ 6293.214316] RAX: 0000000000000001 RBX: ffff8807c96f2ff8 RCX: 0000000000000000 [ 6293.222017] RDX: ffff8807b803b000 RSI: 0000000000000001 RDI: ffffea001c7e3d40 [ 6293.222017] RBP: ffff880787ddfc48 R08: 0000000000000000 R09: 0000000000000000 [ 6293.222017] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001 [ 6293.222017] R13: 0000000000000001 R14: ffff8807cc50c070 R15: ffff8807cc50c070 [ 6293.222017] FS: 00007f572641d700(0000) GS:ffff8807f3600000(0000) knlGS:0000000000000000 [ 6293.256784] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 6293.256784] CR2: ffff8807c96f3000 CR3: 00000007c8e81000 CR4: 00000000000006e0 [ 6293.256784] Stack: [ 6293.256784] ffff880787ddfcc8 ffff880787ddfcc8 0000000000000000 ffff880787ddfcc8 [ 6293.256784] ffff880787ddfd48 ffffffff84128be8 ffff880700000002 0000000000000001 [ 6293.256784] ffff8807b803b000 ffff880787ddfce0 0000100000000000 0000000000000000 [ 6293.256784] Call Trace: [ 6293.256784] [<ffffffff84128be8>] p9_virtio_zc_request+0x598/0x630 [ 6293.256784] [<ffffffff8115c610>] ? wake_up_bit+0x40/0x40 [ 6293.256784] [<ffffffff841209b1>] p9_client_zc_rpc+0x111/0x3a0 [ 6293.256784] [<ffffffff81174b78>] ? sched_clock_cpu+0x108/0x120 [ 6293.256784] [<ffffffff84122a21>] p9_client_read+0xe1/0x2c0 [ 6293.256784] [<ffffffff81708a90>] v9fs_file_read+0x90/0xc0 [ 6293.256784] [<ffffffff812bd073>] vfs_read+0xc3/0x130 [ 6293.256784] [<ffffffff811a78bd>] ? trace_hardirqs_on+0xd/0x10 [ 6293.256784] [<ffffffff812bd5a2>] SyS_read+0x62/0xa0 [ 6293.256784] [<ffffffff841a1a00>] tracesys+0xdd/0xe2 [ 6293.256784] Code: 66 90 48 89 fb 41 89 f5 48 8b 3f 48 85 ff 74 29 85 f6 74 25 45 31 e4 66 0f 1f 84 00 00 00 00 00 e8 eb 14 12 fd 41 ff c4 49 63 c4 <48> 8b 3c c3 48 85 ff 74 05 45 39 e5 75 e7 48 83 c4 08 5b 41 5c [ 6293.256784] RIP [<ffffffff8412793b>] p9_release_pages+0x3b/0x60 [ 6293.256784] RSP <ffff880787ddfc28> [ 6293.256784] CR2: ffff8807c96f3000 [ 6293.256784] ---[ end trace 50822ee72cd360fc ]--- Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* ipv6: in case of link failure remove route directly instead of letting it expireHannes Frederic Sowa2013-09-111-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 1eb4f758286884e7566627164bca4c4a16952a83 ] We could end up expiring a route which is part of an ecmp route set. Doing so would invalidate the rt->rt6i_nsiblings calculations and could provoke the following panic: [ 80.144667] ------------[ cut here ]------------ [ 80.145172] kernel BUG at net/ipv6/ip6_fib.c:733! [ 80.145172] invalid opcode: 0000 [#1] SMP [ 80.145172] Modules linked in: 8021q nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables +snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer virtio_balloon snd soundcore i2c_piix4 i2c_core virtio_net virtio_blk [ 80.145172] CPU: 1 PID: 786 Comm: ping6 Not tainted 3.10.0+ #118 [ 80.145172] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 80.145172] task: ffff880117fa0000 ti: ffff880118770000 task.ti: ffff880118770000 [ 80.145172] RIP: 0010:[<ffffffff815f3b5d>] [<ffffffff815f3b5d>] fib6_add+0x75d/0x830 [ 80.145172] RSP: 0018:ffff880118771798 EFLAGS: 00010202 [ 80.145172] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011350e480 [ 80.145172] RDX: ffff88011350e238 RSI: 0000000000000004 RDI: ffff88011350f738 [ 80.145172] RBP: ffff880118771848 R08: ffff880117903280 R09: 0000000000000001 [ 80.145172] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88011350f680 [ 80.145172] R13: ffff880117903280 R14: ffff880118771890 R15: ffff88011350ef90 [ 80.145172] FS: 00007f02b5127740(0000) GS:ffff88011fd00000(0000) knlGS:0000000000000000 [ 80.145172] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 80.145172] CR2: 00007f981322a000 CR3: 00000001181b1000 CR4: 00000000000006e0 [ 80.145172] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 80.145172] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 80.145172] Stack: [ 80.145172] 0000000000000001 ffff880100000000 ffff880100000000 ffff880117903280 [ 80.145172] 0000000000000000 ffff880119a4cf00 0000000000000400 00000000000007fa [ 80.145172] 0000000000000000 0000000000000000 0000000000000000 ffff88011350f680 [ 80.145172] Call Trace: [ 80.145172] [<ffffffff815eeceb>] ? rt6_bind_peer+0x4b/0x90 [ 80.145172] [<ffffffff815ed985>] __ip6_ins_rt+0x45/0x70 [ 80.145172] [<ffffffff815eee35>] ip6_ins_rt+0x35/0x40 [ 80.145172] [<ffffffff815ef1e4>] ip6_pol_route.isra.44+0x3a4/0x4b0 [ 80.145172] [<ffffffff815ef34a>] ip6_pol_route_output+0x2a/0x30 [ 80.145172] [<ffffffff81616077>] fib6_rule_action+0xd7/0x210 [ 80.145172] [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30 [ 80.145172] [<ffffffff81553026>] fib_rules_lookup+0xc6/0x140 [ 80.145172] [<ffffffff81616374>] fib6_rule_lookup+0x44/0x80 [ 80.145172] [<ffffffff815ef320>] ? ip6_pol_route_input+0x30/0x30 [ 80.145172] [<ffffffff815edea3>] ip6_route_output+0x73/0xb0 [ 80.145172] [<ffffffff815dfdf3>] ip6_dst_lookup_tail+0x2c3/0x2e0 [ 80.145172] [<ffffffff813007b1>] ? list_del+0x11/0x40 [ 80.145172] [<ffffffff81082a4c>] ? remove_wait_queue+0x3c/0x50 [ 80.145172] [<ffffffff815dfe4d>] ip6_dst_lookup_flow+0x3d/0xa0 [ 80.145172] [<ffffffff815fda77>] rawv6_sendmsg+0x267/0xc20 [ 80.145172] [<ffffffff815a8a83>] inet_sendmsg+0x63/0xb0 [ 80.145172] [<ffffffff8128eb93>] ? selinux_socket_sendmsg+0x23/0x30 [ 80.145172] [<ffffffff815218d6>] sock_sendmsg+0xa6/0xd0 [ 80.145172] [<ffffffff81524a68>] SYSC_sendto+0x128/0x180 [ 80.145172] [<ffffffff8109825c>] ? update_curr+0xec/0x170 [ 80.145172] [<ffffffff81041d09>] ? kvm_clock_get_cycles+0x9/0x10 [ 80.145172] [<ffffffff810afd1e>] ? __getnstimeofday+0x3e/0xd0 [ 80.145172] [<ffffffff8152509e>] SyS_sendto+0xe/0x10 [ 80.145172] [<ffffffff8164efd9>] system_call_fastpath+0x16/0x1b [ 80.145172] Code: fe ff ff 41 f6 45 2a 06 0f 85 ca fe ff ff 49 8b 7e 08 4c 89 ee e8 94 ef ff ff e9 b9 fe ff ff 48 8b 82 28 05 00 00 e9 01 ff ff ff <0f> 0b 49 8b 54 24 30 0d 00 00 40 00 89 83 14 01 00 00 48 89 53 [ 80.145172] RIP [<ffffffff815f3b5d>] fib6_add+0x75d/0x830 [ 80.145172] RSP <ffff880118771798> [ 80.387413] ---[ end trace 02f20b7a8b81ed95 ]--- [ 80.390154] Kernel panic - not syncing: Fatal exception in interrupt Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* ipv6: ip6_append_data_mtu did not care about pmtudisc and frag_sizeHannes Frederic Sowa2013-09-111-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 75a493e60ac4bbe2e977e7129d6d8cbb0dd236be ] If the socket had an IPV6_MTU value set, ip6_append_data_mtu lost track of this when appending the second frame on a corked socket. This results in the following splat: [37598.993962] ------------[ cut here ]------------ [37598.994008] kernel BUG at net/core/skbuff.c:2064! [37598.994008] invalid opcode: 0000 [#1] SMP [37598.994008] Modules linked in: tcp_lp uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev media vfat fat usb_storage fuse ebtable_nat xt_CHECKSUM bridge stp llc ipt_MASQUERADE nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat +nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi +scsi_transport_iscsi rfcomm bnep iTCO_wdt iTCO_vendor_support snd_hda_codec_conexant arc4 iwldvm mac80211 snd_hda_intel acpi_cpufreq mperf coretemp snd_hda_codec microcode cdc_wdm cdc_acm [37598.994008] snd_hwdep cdc_ether snd_seq snd_seq_device usbnet mii joydev btusb snd_pcm bluetooth i2c_i801 e1000e lpc_ich mfd_core ptp iwlwifi pps_core snd_page_alloc mei cfg80211 snd_timer thinkpad_acpi snd tpm_tis soundcore rfkill tpm tpm_bios vhost_net tun macvtap macvlan kvm_intel kvm uinput binfmt_misc +dm_crypt i915 i2c_algo_bit drm_kms_helper drm i2c_core wmi video [37598.994008] CPU 0 [37598.994008] Pid: 27320, comm: t2 Not tainted 3.9.6-200.fc18.x86_64 #1 LENOVO 27744PG/27744PG [37598.994008] RIP: 0010:[<ffffffff815443a5>] [<ffffffff815443a5>] skb_copy_and_csum_bits+0x325/0x330 [37598.994008] RSP: 0018:ffff88003670da18 EFLAGS: 00010202 [37598.994008] RAX: ffff88018105c018 RBX: 0000000000000004 RCX: 00000000000006c0 [37598.994008] RDX: ffff88018105a6c0 RSI: ffff88018105a000 RDI: ffff8801e1b0aa00 [37598.994008] RBP: ffff88003670da78 R08: 0000000000000000 R09: ffff88018105c040 [37598.994008] R10: ffff8801e1b0aa00 R11: 0000000000000000 R12: 000000000000fff8 [37598.994008] R13: 00000000000004fc R14: 00000000ffff0504 R15: 0000000000000000 [37598.994008] FS: 00007f28eea59740(0000) GS:ffff88023bc00000(0000) knlGS:0000000000000000 [37598.994008] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [37598.994008] CR2: 0000003d935789e0 CR3: 00000000365cb000 CR4: 00000000000407f0 [37598.994008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [37598.994008] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [37598.994008] Process t2 (pid: 27320, threadinfo ffff88003670c000, task ffff88022c162ee0) [37598.994008] Stack: [37598.994008] ffff88022e098a00 ffff88020f973fc0 0000000000000008 00000000000004c8 [37598.994008] ffff88020f973fc0 00000000000004c4 ffff88003670da78 ffff8801e1b0a200 [37598.994008] 0000000000000018 00000000000004c8 ffff88020f973fc0 00000000000004c4 [37598.994008] Call Trace: [37598.994008] [<ffffffff815fc21f>] ip6_append_data+0xccf/0xfe0 [37598.994008] [<ffffffff8158d9f0>] ? ip_copy_metadata+0x1a0/0x1a0 [37598.994008] [<ffffffff81661f66>] ? _raw_spin_lock_bh+0x16/0x40 [37598.994008] [<ffffffff8161548d>] udpv6_sendmsg+0x1ed/0xc10 [37598.994008] [<ffffffff812a2845>] ? sock_has_perm+0x75/0x90 [37598.994008] [<ffffffff815c3693>] inet_sendmsg+0x63/0xb0 [37598.994008] [<ffffffff812a2973>] ? selinux_socket_sendmsg+0x23/0x30 [37598.994008] [<ffffffff8153a450>] sock_sendmsg+0xb0/0xe0 [37598.994008] [<ffffffff810135d1>] ? __switch_to+0x181/0x4a0 [37598.994008] [<ffffffff8153d97d>] sys_sendto+0x12d/0x180 [37598.994008] [<ffffffff810dfb64>] ? __audit_syscall_entry+0x94/0xf0 [37598.994008] [<ffffffff81020ed1>] ? syscall_trace_enter+0x231/0x240 [37598.994008] [<ffffffff8166a7e7>] tracesys+0xdd/0xe2 [37598.994008] Code: fe 07 00 00 48 c7 c7 04 28 a6 81 89 45 a0 4c 89 4d b8 44 89 5d a8 e8 1b ac b1 ff 44 8b 5d a8 4c 8b 4d b8 8b 45 a0 e9 cf fe ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 48 [37598.994008] RIP [<ffffffff815443a5>] skb_copy_and_csum_bits+0x325/0x330 [37598.994008] RSP <ffff88003670da18> [37599.007323] ---[ end trace d69f6a17f8ac8eee ]--- While there, also check if path mtu discovery is activated for this socket. The logic was adapted from ip6_append_data when first writing on the corked socket. This bug was introduced with commit 0c1833797a5a6ec23ea9261d979aa18078720b74 ("ipv6: fix incorrect ipsec fragment"). v2: a) Replace IPV6_PMTU_DISC_DO with IPV6_PMTUDISC_PROBE. b) Don't pass ipv6_pinfo to ip6_append_data_mtu (suggestion by Gao feng, thanks!). c) Change mtu to unsigned int, else we get a warning about non-matching types because of the min()-macro type-check. Acked-by: Gao feng <gaofeng@cn.fujitsu.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* ipv6: call udp_push_pending_frames when uncorking a socket with AF_INET ↵Hannes Frederic Sowa2013-09-112-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pending data [ Upstream commit 8822b64a0fa64a5dd1dfcf837c5b0be83f8c05d1 ] We accidentally call down to ip6_push_pending_frames when uncorking pending AF_INET data on a ipv6 socket. This results in the following splat (from Dave Jones): skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL> ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:126! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth +netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ #37 task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000 RIP: 0010:[<ffffffff816e759c>] [<ffffffff816e759c>] skb_panic+0x63/0x65 RSP: 0018:ffff8801e6431de8 EFLAGS: 00010282 RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006 RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520 RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800 R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800 FS: 00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 Stack: ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4 ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6 ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0 Call Trace: [<ffffffff8159a9aa>] skb_push+0x3a/0x40 [<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0 [<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140 [<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0 [<ffffffff81694660>] ? udplite_getfrag+0x20/0x20 [<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0 [<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0 [<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40 [<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20 [<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0 [<ffffffff816f5d54>] tracesys+0xdd/0xe2 Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 RIP [<ffffffff816e759c>] skb_panic+0x63/0x65 RSP <ffff8801e6431de8> This patch adds a check if the pending data is of address family AF_INET and directly calls udp_push_ending_frames from udp_v6_push_pending_frames if that is the case. This bug was found by Dave Jones with trinity. (Also move the initialization of fl6 below the AF_INET check, even if not strictly necessary.) Cc: Dave Jones <davej@redhat.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* l2tp: add missing .owner to struct pppox_protoWei Yongjun2013-09-111-1/+2
| | | | | | | | | | | [ Upstream commit e1558a93b61962710733dc8c11a2bc765607f1cd ] Add missing .owner of struct pppox_proto. This prevents the module from being removed from underneath its users. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* vti: remove duplicated code to fix a memory leakCong Wang2013-09-111-7/+0
| | | | | | | | | | | | | | | | | | | | [ Upstream commit ab6c7a0a43c2eaafa57583822b619b22637b49c7 ] vti module allocates dev->tstats twice: in vti_fb_tunnel_init() and in vti_tunnel_init(), this lead to a memory leak of dev->tstats. Just remove the duplicated operations in vti_fb_tunnel_init(). (candidate for -stable) Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Saurabh Mohan <saurabh.mohan@vyatta.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* x25: Fix broken locking in ioctl error paths.Dave Jones2013-09-111-7/+8
| | | | | | | | | | | | | | | | | | | [ Upstream commit 4ccb93ce7439b63c31bc7597bfffd13567fa483d ] Two of the x25 ioctl cases have error paths that break out of the function without unlocking the socket, leading to this warning: ================================================ [ BUG: lock held when returning to user space! ] 3.10.0-rc7+ #36 Not tainted ------------------------------------------------ trinity-child2/31407 is leaving the kernel with locks still held! 1 lock held by trinity-child2/31407: #0: (sk_lock-AF_X25){+.+.+.}, at: [<ffffffffa024b6da>] x25_ioctl+0x8a/0x740 [x25] Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* neighbour: fix a race in neigh_destroy()Eric Dumazet2013-09-111-5/+7
| | | | | | | | | | | | | | | | | | | | | [ Upstream commit c9ab4d85de222f3390c67aedc9c18a50e767531e ] There is a race in neighbour code, because neigh_destroy() uses skb_queue_purge(&neigh->arp_queue) without holding neighbour lock, while other parts of the code assume neighbour rwlock is what protects arp_queue Convert all skb_queue_purge() calls to the __skb_queue_purge() variant Use __skb_queue_head_init() instead of skb_queue_head_init() to make clear we do not use arp_queue.lock And hold neigh->lock in neigh_destroy() to close the race. Reported-by: Joe Jin <joe.jin@oracle.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* svcrpc: fix handling of too-short rpc'sJ. Bruce Fields2013-09-111-1/+4
| | | | | | | | | | | | | | | | [ Upstream commit cf3aa02cb4a0c5af5557dd47f15a08a7df33182a ] If we detect that an rpc is too short, we abort and close the connection. Except, there's a bug here: we're leaving sk_datalen nonzero without leaving any pages in the sk_pages array. The most likely result of the inconsistency is a subsequent crash in svc_tcp_clear_pages. Also demote the BUG_ON in svc_tcp_clear_pages to a WARN. Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* Revert "net/core/sock.c: add missing VSOCK string in af_family_*_key_strings"Steven Rostedt (Red Hat)2013-09-041-3/+3
| | | | | | | | | | | | | | | This reverts commit 5ad0f2284194c7227c82ea4197378292fc454d60. Commit "net/core/sock.c: add missing VSOCK string in af_family_*_key_strings" which is upstream commit: 456db6a4d495f40777da6f1f32f62f13026f52db is dependent on a larger change that should not neet to be backported. This commit causes problems by itself, so revert it. Reported-by: Joe Korty <joe.korty@ccur.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* libceph: Fix NULL pointer dereference in auth client codeTyler Hicks2013-08-131-0/+6
| | | | | | | | | | | | | | | | | | | [ Upstream commit 2cb33cac622afde897aa02d3dcd9fbba8bae839e ] A malicious monitor can craft an auth reply message that could cause a NULL function pointer dereference in the client's kernel. To prevent this, the auth_none protocol handler needs an empty ceph_auth_client_ops->build_request() function. CVE-2013-1059 Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Reported-by: Chanam Park <chanam.park@hkpco.kr> Reviewed-by: Seth Arnold <seth.arnold@canonical.com> Reviewed-by: Sage Weil <sage@inktank.com> Cc: stable@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* Bluetooth: Fix crash in l2cap_build_cmd() with small MTUAnderson Lizardo2013-08-131-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 300b962e5244a1ea010df7e88595faa0085b461d ] If a too small MTU value is set with ioctl(HCISETACLMTU) or by a bogus controller, memory corruption happens due to a memcpy() call with negative length. Fix this crash on either incoming or outgoing connections with a MTU smaller than L2CAP_HDR_SIZE + L2CAP_CMD_HDR_SIZE: [ 46.885433] BUG: unable to handle kernel paging request at f56ad000 [ 46.888037] IP: [<c03d94cd>] memcpy+0x1d/0x40 [ 46.888037] *pdpt = 0000000000ac3001 *pde = 00000000373f8067 *pte = 80000000356ad060 [ 46.888037] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC [ 46.888037] Modules linked in: hci_vhci bluetooth virtio_balloon i2c_piix4 uhci_hcd usbcore usb_common [ 46.888037] CPU: 0 PID: 1044 Comm: kworker/u3:0 Not tainted 3.10.0-rc1+ #12 [ 46.888037] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 46.888037] Workqueue: hci0 hci_rx_work [bluetooth] [ 46.888037] task: f59b15b0 ti: f55c4000 task.ti: f55c4000 [ 46.888037] EIP: 0060:[<c03d94cd>] EFLAGS: 00010212 CPU: 0 [ 46.888037] EIP is at memcpy+0x1d/0x40 [ 46.888037] EAX: f56ac1c0 EBX: fffffff8 ECX: 3ffffc6e EDX: f55c5cf2 [ 46.888037] ESI: f55c6b32 EDI: f56ad000 EBP: f55c5c68 ESP: f55c5c5c [ 46.888037] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [ 46.888037] CR0: 8005003b CR2: f56ad000 CR3: 3557d000 CR4: 000006f0 [ 46.888037] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [ 46.888037] DR6: ffff0ff0 DR7: 00000400 [ 46.888037] Stack: [ 46.888037] fffffff8 00000010 00000003 f55c5cac f8c6a54c ffffffff f8c69eb2 00000000 [ 46.888037] f4783cdc f57f0070 f759c590 1001c580 00000003 0200000a 00000000 f5a88560 [ 46.888037] f5ba2600 f5a88560 00000041 00000000 f55c5d90 f8c6f4c7 00000008 f55c5cf2 [ 46.888037] Call Trace: [ 46.888037] [<f8c6a54c>] l2cap_send_cmd+0x1cc/0x230 [bluetooth] [ 46.888037] [<f8c69eb2>] ? l2cap_global_chan_by_psm+0x152/0x1a0 [bluetooth] [ 46.888037] [<f8c6f4c7>] l2cap_connect+0x3f7/0x540 [bluetooth] [ 46.888037] [<c019b37b>] ? trace_hardirqs_off+0xb/0x10 [ 46.888037] [<c01a0ff8>] ? mark_held_locks+0x68/0x110 [ 46.888037] [<c064ad20>] ? mutex_lock_nested+0x280/0x360 [ 46.888037] [<c064b9d9>] ? __mutex_unlock_slowpath+0xa9/0x150 [ 46.888037] [<c01a118c>] ? trace_hardirqs_on_caller+0xec/0x1b0 [ 46.888037] [<c064ad08>] ? mutex_lock_nested+0x268/0x360 [ 46.888037] [<c01a125b>] ? trace_hardirqs_on+0xb/0x10 [ 46.888037] [<f8c72f8d>] l2cap_recv_frame+0xb2d/0x1d30 [bluetooth] [ 46.888037] [<c01a0ff8>] ? mark_held_locks+0x68/0x110 [ 46.888037] [<c064b9d9>] ? __mutex_unlock_slowpath+0xa9/0x150 [ 46.888037] [<c01a118c>] ? trace_hardirqs_on_caller+0xec/0x1b0 [ 46.888037] [<f8c754f1>] l2cap_recv_acldata+0x2a1/0x320 [bluetooth] [ 46.888037] [<f8c491d8>] hci_rx_work+0x518/0x810 [bluetooth] [ 46.888037] [<f8c48df2>] ? hci_rx_work+0x132/0x810 [bluetooth] [ 46.888037] [<c0158979>] process_one_work+0x1a9/0x600 [ 46.888037] [<c01588fb>] ? process_one_work+0x12b/0x600 [ 46.888037] [<c015922e>] ? worker_thread+0x19e/0x320 [ 46.888037] [<c015922e>] ? worker_thread+0x19e/0x320 [ 46.888037] [<c0159187>] worker_thread+0xf7/0x320 [ 46.888037] [<c0159090>] ? rescuer_thread+0x290/0x290 [ 46.888037] [<c01602f8>] kthread+0xa8/0xb0 [ 46.888037] [<c0656777>] ret_from_kernel_thread+0x1b/0x28 [ 46.888037] [<c0160250>] ? flush_kthread_worker+0x120/0x120 [ 46.888037] Code: c3 90 8d 74 26 00 e8 63 fc ff ff eb e8 90 55 89 e5 83 ec 0c 89 5d f4 89 75 f8 89 7d fc 3e 8d 74 26 00 89 cb 89 c7 c1 e9 02 89 d6 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 8b 5d f4 8b 75 f8 8b 7d fc 89 [ 46.888037] EIP: [<c03d94cd>] memcpy+0x1d/0x40 SS:ESP 0068:f55c5c5c [ 46.888037] CR2: 00000000f56ad000 [ 46.888037] ---[ end trace 0217c1f4d78714a9 ]--- Signed-off-by: Anderson Lizardo <anderson.lizardo@openbossa.org> Cc: stable@vger.kernel.org Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* sctp: fully initialize sctp_outq in sctp_outq_initNeil Horman2013-08-131-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit c5c7774d7eb4397891edca9ebdf750ba90977a69 ] In commit 2f94aabd9f6c925d77aecb3ff020f1cc12ed8f86 (refactor sctp_outq_teardown to insure proper re-initalization) we modified sctp_outq_teardown to use sctp_outq_init to fully re-initalize the outq structure. Steve West recently asked me why I removed the q->error = 0 initalization from sctp_outq_teardown. I did so because I was operating under the impression that sctp_outq_init would properly initalize that value for us, but it doesn't. sctp_outq_init operates under the assumption that the outq struct is all 0's (as it is when called from sctp_association_init), but using it in __sctp_outq_teardown violates that assumption. We should do a memset in sctp_outq_init to ensure that the entire structure is in a known state there instead. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: "West, Steve (NSN - US/Fort Worth)" <steve.west@nsn.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: netdev@vger.kernel.org CC: davem@davemloft.net Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* l2tp: Fix sendmsg() return valueGuillaume Nault2013-08-131-1/+1
| | | | | | | | | | | [ Upstream commit a6f79d0f26704214b5b702bbac525cb72997f984 ] PPPoL2TP sockets should comply with the standard send*() return values (i.e. return number of bytes sent instead of 0 upon success). Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* l2tp: Fix PPP header erasure and memory leakGuillaume Nault2013-08-131-2/+2
| | | | | | | | | | | | [ Upstream commit 55b92b7a11690bc377b5d373872a6b650ae88e64 ] Copy user data after PPP framing header. This prevents erasure of the added PPP header and avoids leaking two bytes of uninitialised memory at the end of skb's data buffer. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* packet: packet_getname_spkt: make sure string is always 0-terminatedDaniel Borkmann2013-08-131-3/+2
| | | | | | | | | | | | | | | | [ Upstream commit 2dc85bf323515e59e15dfa858d1472bb25cad0fe ] uaddr->sa_data is exactly of size 14, which is hard-coded here and passed as a size argument to strncpy(). A device name can be of size IFNAMSIZ (== 16), meaning we might leave the destination string unterminated. Thus, use strlcpy() and also sizeof() while we're at it. We need to memset the data area beforehand, since strlcpy does not padd the remaining buffer with zeroes for user space, so that we do not possibly leak anything. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* net: sctp: fix NULL pointer dereference in socket destructionDaniel Borkmann2013-08-131-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 1abd165ed757db1afdefaac0a4bc8a70f97d258c ] While stress testing sctp sockets, I hit the following panic: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: [<ffffffffa0490c4e>] sctp_endpoint_free+0xe/0x40 [sctp] PGD 7cead067 PUD 7ce76067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: sctp(F) libcrc32c(F) [...] CPU: 7 PID: 2950 Comm: acc Tainted: GF 3.10.0-rc2+ #1 Hardware name: Dell Inc. PowerEdge T410/0H19HD, BIOS 1.6.3 02/01/2011 task: ffff88007ce0e0c0 ti: ffff88007b568000 task.ti: ffff88007b568000 RIP: 0010:[<ffffffffa0490c4e>] [<ffffffffa0490c4e>] sctp_endpoint_free+0xe/0x40 [sctp] RSP: 0018:ffff88007b569e08 EFLAGS: 00010292 RAX: 0000000000000000 RBX: ffff88007db78a00 RCX: dead000000200200 RDX: ffffffffa049fdb0 RSI: ffff8800379baf38 RDI: 0000000000000000 RBP: ffff88007b569e18 R08: ffff88007c230da0 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff880077990d00 R14: 0000000000000084 R15: ffff88007db78a00 FS: 00007fc18ab61700(0000) GS:ffff88007fc60000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000020 CR3: 000000007cf9d000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff88007b569e38 ffff88007db78a00 ffff88007b569e38 ffffffffa049fded ffffffff81abf0c0 ffff88007db78a00 ffff88007b569e58 ffffffff8145b60e 0000000000000000 0000000000000000 ffff88007b569eb8 ffffffff814df36e Call Trace: [<ffffffffa049fded>] sctp_destroy_sock+0x3d/0x80 [sctp] [<ffffffff8145b60e>] sk_common_release+0x1e/0xf0 [<ffffffff814df36e>] inet_create+0x2ae/0x350 [<ffffffff81455a6f>] __sock_create+0x11f/0x240 [<ffffffff81455bf0>] sock_create+0x30/0x40 [<ffffffff8145696c>] SyS_socket+0x4c/0xc0 [<ffffffff815403be>] ? do_page_fault+0xe/0x10 [<ffffffff8153cb32>] ? page_fault+0x22/0x30 [<ffffffff81544e02>] system_call_fastpath+0x16/0x1b Code: 0c c9 c3 66 2e 0f 1f 84 00 00 00 00 00 e8 fb fe ff ff c9 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 <48> 8b 47 20 48 89 fb c6 47 1c 01 c6 40 12 07 e8 9e 68 01 00 48 RIP [<ffffffffa0490c4e>] sctp_endpoint_free+0xe/0x40 [sctp] RSP <ffff88007b569e08> CR2: 0000000000000020 ---[ end trace e0d71ec1108c1dd9 ]--- I did not hit this with the lksctp-tools functional tests, but with a small, multi-threaded test program, that heavily allocates, binds, listens and waits in accept on sctp sockets, and then randomly kills some of them (no need for an actual client in this case to hit this). Then, again, allocating, binding, etc, and then killing child processes. This panic then only occurs when ``echo 1 > /proc/sys/net/sctp/auth_enable'' is set. The cause for that is actually very simple: in sctp_endpoint_init() we enter the path of sctp_auth_init_hmacs(). There, we try to allocate our crypto transforms through crypto_alloc_hash(). In our scenario, it then can happen that crypto_alloc_hash() fails with -EINTR from crypto_larval_wait(), thus we bail out and release the socket via sk_common_release(), sctp_destroy_sock() and hit the NULL pointer dereference as soon as we try to access members in the endpoint during sctp_endpoint_free(), since endpoint at that time is still NULL. Now, if we have that case, we do not need to do any cleanup work and just leave the destruction handler. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* ipv6: assign rt6_info to inet6_ifaddr in init_loopbackGao feng2013-08-131-1/+3
| | | | | | | | | | | | | | | | | | [ Upstream commit 534c877928a16ae5f9776436a497109639bf67dc ] Commit 25fb6ca4ed9cad72f14f61629b68dc03c0d9713f "net IPv6 : Fix broken IPv6 routing table after loopback down-up" forgot to assign rt6_info to the inet6_ifaddr. When disable the net device, the rt6_info which allocated in init_loopback will not be destroied in __ipv6_ifa_notify. This will trigger the waring message below [23527.916091] unregister_netdevice: waiting for tap0 to become free. Usage count = 1 Reported-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com> Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* net/core/sock.c: add missing VSOCK string in af_family_*_key_stringsFederico Vaga2013-08-131-3/+3
| | | | | | | | | | | | [ Upstream commit 456db6a4d495f40777da6f1f32f62f13026f52db ] The three arrays of strings: af_family_key_strings, af_family_slock_key_strings and af_family_clock_key_strings have not VSOCK's string Signed-off-by: Federico Vaga <federico.vaga@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* net: Unbreak compat_sys_{send,recv}msgAndy Lutomirski2013-08-132-41/+44
| | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit a7526eb5d06b0084ef12d7b168d008fcf516caab ] I broke them in this commit: commit 1be374a0518a288147c6a7398792583200a67261 Author: Andy Lutomirski <luto@amacapital.net> Date: Wed May 22 14:07:44 2013 -0700 net: Block MSG_CMSG_COMPAT in send(m)msg and recv(m)msg This patch adds __sys_sendmsg and __sys_sendmsg as common helpers that accept MSG_CMSG_COMPAT and blocks MSG_CMSG_COMPAT at the syscall entrypoints. It also reverts some unnecessary checks in sys_socketcall. Apparently I was suffering from underscore blindness the first time around. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Tested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* net: Block MSG_CMSG_COMPAT in send(m)msg and recv(m)msgAndy Lutomirski2013-08-131-2/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 1be374a0518a288147c6a7398792583200a67261 ] To: linux-kernel@vger.kernel.org Cc: x86@kernel.org, trinity@vger.kernel.org, Andy Lutomirski <luto@amacapital.net>, netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net> Subject: [PATCH 5/5] net: Block MSG_CMSG_COMPAT in send(m)msg and recv(m)msg MSG_CMSG_COMPAT is (AFAIK) not intended to be part of the API -- it's a hack that steals a bit to indicate to other networking code that a compat entry was used. So don't allow it from a non-compat syscall. This prevents an oops when running this code: int main() { int s; struct sockaddr_in addr; struct msghdr *hdr; char *highpage = mmap((void*)(TASK_SIZE_MAX - 4096), 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0); if (highpage == MAP_FAILED) err(1, "mmap"); s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP); if (s == -1) err(1, "socket"); addr.sin_family = AF_INET; addr.sin_port = htons(1); addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); if (connect(s, (struct sockaddr*)&addr, sizeof(addr)) != 0) err(1, "connect"); void *evil = highpage + 4096 - COMPAT_MSGHDR_SIZE; printf("Evil address is %p\n", evil); if (syscall(__NR_sendmmsg, s, evil, 1, MSG_CMSG_COMPAT) < 0) err(1, "sendmmsg"); return 0; } Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* ipv4: fix redirect handling for TCP packetsMichal Kubecek2013-08-131-1/+6
| | | | | | | | | | | | | | | | | | | | [ Upstream commit f96ef988cc603487c03a6de07807b06cbe641829 ] Unlike ipv4_redirect() and ipv4_sk_redirect(), ip_do_redirect() doesn't call __build_flow_key() directly but via ip_rt_build_flow_key() wrapper. This leads to __build_flow_key() getting pointer to IPv4 header of the ICMP redirect packet rather than pointer to the embedded IPv4 header of the packet initiating the redirect. As a result, handling of ICMP redirects initiated by TCP packets is broken. Issue was introduced by 4895c771c ("ipv4: Add FIB nexthop exceptions.") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>