Commit message | Author | Age | Files | Lines
* Linux 3.12.74-rt99 REBASE (tags: v3.12.74-rt99-rebase, v3.12-rt-rebase) | Steven Rostedt (VMware) | 2017-06-08 | 1 | -1/+1
* lockdep: Fix compilation error for !CONFIG_MODULES and !CONFIG_SMP | Dan Murphy | 2017-06-07 | 2 | -0/+10
When CONFIG_MODULES is not set then lockdep fails to compile:
|kernel/locking/lockdep.c: In function 'look_up_lock_class':
|kernel/locking/lockdep.c:684:12: error: implicit declaration of function
| '__is_module_percpu_address' [-Werror=implicit-function-declaration]
If CONFIG_MODULES is set but CONFIG_SMP is not, then it compiles but fails to link at the end:
|kernel/locking/lockdep.c:684: undefined reference to `__is_module_percpu_address'
|kernel/built-in.o:(.debug_addr+0x1e674): undefined reference to `__is_module_percpu_address'
This patch adds the function for both cases.
Signed-off-by: Dan Murphy <dmurphy@ti.com>
[bigeasy: merge the two patches from Dan into one, adapt changelog]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
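A minimal sketch of the kind of fallback stubs the changelog describes; the exact placement (header vs. kernel/module.c) and the signature are assumptions based on the text above, not quoted from the patch:

	/* !CONFIG_MODULES: assumed static inline stub in a header */
	static inline bool __is_module_percpu_address(unsigned long addr,
						      unsigned long *can_addr)
	{
		return false;	/* no modules, so nothing to match */
	}

	/* CONFIG_MODULES && !CONFIG_SMP: assumed out-of-line stub */
	bool __is_module_percpu_address(unsigned long addr, unsigned long *can_addr)
	{
		return false;	/* per-CPU data is not duplicated per CPU on UP */
	}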
* rt: Drop the removal of _GPL from rt_mutex_destroy()'s EXPORT_SYMBOL | Sebastian Andrzej Siewior | 2017-06-07 | 1 | -1/+2
What we have now should be enough, the EXPORT_SYMBOL statement for rt_mutex_destroy() is not required.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* lockdep: Handle statically initialized PER_CPU locks proper | Thomas Gleixner | 2017-06-07 | 5 | -35/+67
If a PER_CPU struct which contains a spin_lock is statically initialized via:
  DEFINE_PER_CPU(struct foo, bla) = {
      .lock = __SPIN_LOCK_UNLOCKED(bla.lock)
  };
then lockdep assigns a separate key to each lock because the logic for assigning a key to statically initialized locks is to use the address as the key. With per-CPU locks the address is obviously different on each CPU. That's wrong, because all locks should have the same key.
To solve this the following modifications are required:
1) Extend the is_kernel/module_percpu_addr() functions to hand back the canonical address of the per-CPU address, i.e. the per-CPU address minus the per-CPU offset.
2) Check the lock address with these functions and if the per-CPU check matches use the returned canonical address as the lock key, so all per-CPU locks have the same key.
3) Move the static_obj(key) check into look_up_lock_class() so this check can be avoided for statically initialized per-CPU locks. That's required because the canonical address fails the static_obj(key) check for obvious reasons.
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
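The canonical-address idea from point 1) can be sketched roughly as follows (simplified, conceptual code; the real helpers live in mm/percpu.c and kernel/module.c and handle the area bounds differently):

	/* Sketch: map any CPU's copy of a static per-CPU lock back to one
	 * canonical address so lockdep derives a single key for all copies. */
	static bool percpu_lock_canonical(unsigned long addr, unsigned long *can_addr)
	{
		unsigned int cpu;

		for_each_possible_cpu(cpu) {
			unsigned long start = (unsigned long)__per_cpu_start + per_cpu_offset(cpu);
			unsigned long end   = (unsigned long)__per_cpu_end   + per_cpu_offset(cpu);

			if (addr >= start && addr < end) {
				/* strip this CPU's offset: same result for every copy */
				*can_addr = addr - per_cpu_offset(cpu);
				return true;
			}
		}
		return false;
	}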
* rt: Drop mutex_disable() on !DEBUG configs and the GPL suffix from export symbol | Sebastian Andrzej Siewior | 2017-06-07 | 2 | -2/+6
Alex Goins reported that mutex_destroy() on RT will force a GPL-only symbol which won't link and therefore fail on a non-GPL kernel module. This does not happen on !RT and is a regression on RT which we would like to avoid.
I try the easy thing here and do not use rt_mutex_destroy() if CONFIG_DEBUG_MUTEXES is not enabled. This will still break for the DEBUG configs, so instead of adding a wrapper around rt_mutex_destroy() (which we have for rt_mutex_lock() for instance) I am simply dropping the GPL part from the export.
Reported-by: Alex Goins <agoins@nvidia.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* x86/mm/cpa: avoid wbinvd() for PREEMPT | John Ogness | 2017-06-07 | 1 | -0/+8
Although wbinvd() is faster than flushing many individual pages, it blocks the memory bus for "long" periods of time (>100us), thus directly causing unusually large latencies on all CPUs, regardless of any CPU isolation features that may be active. For 1024 pages, flushing those pages individually can take up to 2200us, but the task remains fully preemptible during that time.
Cc: stable-rt@vger.kernel.org
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* radix-tree: use local locks | Sebastian Andrzej Siewior | 2017-06-07 | 2 | -19/+16
The preload functionality uses per-CPU variables and preempt-disable to ensure that it does not switch CPUs during its usage. This patch adds a local_lock() instead of preempt_disable() for the same purpose and to remain preemptible on -RT.
Cc: stable-rt@vger.kernel.org
Reported-and-debugged-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
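The -RT local-lock pattern being applied here looks roughly like this (a sketch with the preload logic abbreviated; the lock name is an assumption):

	static DEFINE_LOCAL_IRQ_LOCK(radix_tree_preloads_lock);

	int radix_tree_preload(gfp_t gfp_mask)
	{
		struct radix_tree_preload *rtp;

		local_lock(radix_tree_preloads_lock);	/* was: preempt_disable() */
		rtp = this_cpu_ptr(&radix_tree_preloads);
		/* ... top up the per-CPU node cache ... */
		local_unlock(radix_tree_preloads_lock);	/* was: preempt_enable() */
		return 0;
	}

Unlike preempt_disable(), the local lock is a per-CPU rtmutex on -RT, so the section stays preemptible while still excluding other tasks on the same CPU.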
* workqueue: use rcu_readlock() in put_pwq_unlocked() | Sebastian Andrzej Siewior | 2017-06-07 | 1 | -0/+2
The RCU sched protection was changed to RCU only and so all IRQ-off and preempt-off disabled regions were changed to the relevant rcu-read-lock primitives. One was missed and triggered:
|[ BUG: bad unlock balance detected! ]
|4.4.30-rt41 #51 Tainted: G W
|btattach/345 is trying to release lock (
|Unable to handle kernel paging request at virtual address 6b6b6bbb
|Backtrace:
|[<c016b5a0>] (lock_release) from [<c0804844>] (rt_spin_unlock+0x20/0x30)
|[<c0804824>] (rt_spin_unlock) from [<c0138954>] (put_pwq_unlocked+0xa4/0x118)
|[<c01388b0>] (put_pwq_unlocked) from [<c0138b2c>] (destroy_workqueue+0x164/0x1b0)
|[<c01389c8>] (destroy_workqueue) from [<c078e1ac>] (hci_unregister_dev+0x120/0x21c)
|[<c078e08c>] (hci_unregister_dev) from [<c054f658>] (hci_uart_tty_close+0x90/0xbc)
|[<c054f5c8>] (hci_uart_tty_close) from [<c03a2be8>] (tty_ldisc_close+0x50/0x58)
|[<c03a2b98>] (tty_ldisc_close) from [<c03a2cb4>] (tty_ldisc_kill+0x18/0x78)
|[<c03a2c9c>] (tty_ldisc_kill) from [<c03a3528>] (tty_ldisc_release+0x100/0x134)
|[<c03a3428>] (tty_ldisc_release) from [<c039cd68>] (tty_release+0x3bc/0x460)
|[<c039c9ac>] (tty_release) from [<c020cc08>] (__fput+0xe0/0x1b4)
|[<c020cb28>] (__fput) from [<c020cd3c>] (____fput+0x10/0x14)
|[<c020cd2c>] (____fput) from [<c013e0d4>] (task_work_run+0xa4/0xb8)
|[<c013e030>] (task_work_run) from [<c0121754>] (do_exit+0x40c/0x8b0)
|[<c0121348>] (do_exit) from [<c0122ff8>] (do_group_exit+0x54/0xc4)
Cc: stable-rt@vger.kernel.org
Reported-by: John Keeping <john@metanate.com>
Tested-by: John Keeping <john@metanate.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* kbuild: add -fno-PIE | Sebastian Andrzej Siewior | 2017-06-07 | 1 | -1/+1
Debian started to build the gcc with -fPIE by default so the kernel build ends before it starts properly with:
|kernel/bounds.c:1:0: error: code model kernel does not support PIC mode
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* ftrace: Fix trace header alignment | Mike Galbraith | 2017-06-07 | 1 | -11/+11
Line up helper arrows to the right column.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bigeasy: fixup function tracer header]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
* fs/dcache: incremental fixup of the retry routine | Sebastian Andrzej Siewior | 2017-06-07 | 1 | -4/+3
It has been pointed out by tglx that on UP the non-RT task could spin its entire time slice because the lock owner is preempted. This won't happen on !RT. So we fall back to "chill" if cond_resched() did not work.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* fs/dcache: resched/chill only if we make no progress | Sebastian Andrzej Siewior | 2017-06-07 | 1 | -2/+16
Upstream commit 47be61845c77 ("fs/dcache.c: avoid soft-lockup in dput()") changed the condition _when_ cpu_relax() / cond_resched() was invoked. This change was adapted in -RT into mostly the same thing, except that if cond_resched() did nothing we had to do cpu_chill() to force the task off CPU for a tiny little bit in case the task had RT priority and did not want to leave the CPU.
This change resulted in a performance regression (in my testcase the build time on /dev/shm increased from 19min to 24min). The reason is that with this change cpu_chill() was invoked even if dput() made progress (dentry_kill() returned a different dentry), instead of only if we were trying this operation on the same dentry over and over again.
This patch brings back the old behavior: cond_resched() & chill only if we make no progress. A little improvement is to invoke cpu_chill() only if we are an RT task (and avoid the sleep otherwise). Otherwise the scheduler should remove us from the CPU if we make no progress.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* net: add a lock around icmp_sk() | Sebastian Andrzej Siewior | 2017-06-07 | 1 | -0/+8
It looks like the this_cpu_ptr() access in icmp_sk() is protected with local_bh_disable(). To avoid missing serialization in -RT I am adding a local lock here. No crash has been observed; this is just a precaution.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* net: add back the missing serialization in ip_send_unicast_reply() | Sebastian Andrzej Siewior | 2017-06-07 | 1 | -0/+7
Some time ago Sami Pietikäinen reported a crash on -RT in ip_send_unicast_reply() which was later fixed by Nicholas Mc Guire (v3.12.8-rt11). Later (v3.18.8) the code was reworked and I dropped the patch. As it turns out it was a mistake. I have reports that the same crash is possible with a similar backtrace. It seems that vanilla protects access to this_cpu_ptr() via local_bh_disable(). This does not work on -RT since we can have NET_RX and NET_TX running in parallel on the same CPU. This brings back the old locks.
|Unable to handle kernel NULL pointer dereference at virtual address 00000010
|PC is at __ip_make_skb+0x198/0x3e8
|[<c04e39d8>] (__ip_make_skb) from [<c04e3ca8>] (ip_push_pending_frames+0x20/0x40)
|[<c04e3ca8>] (ip_push_pending_frames) from [<c04e3ff0>] (ip_send_unicast_reply+0x210/0x22c)
|[<c04e3ff0>] (ip_send_unicast_reply) from [<c04fbb54>] (tcp_v4_send_reset+0x190/0x1c0)
|[<c04fbb54>] (tcp_v4_send_reset) from [<c04fcc1c>] (tcp_v4_do_rcv+0x22c/0x288)
|[<c04fcc1c>] (tcp_v4_do_rcv) from [<c0474364>] (release_sock+0xb4/0x150)
|[<c0474364>] (release_sock) from [<c04ed904>] (tcp_close+0x240/0x454)
|[<c04ed904>] (tcp_close) from [<c0511408>] (inet_release+0x74/0x7c)
|[<c0511408>] (inet_release) from [<c0470728>] (sock_release+0x30/0xb0)
|[<c0470728>] (sock_release) from [<c0470abc>] (sock_close+0x1c/0x24)
|[<c0470abc>] (sock_close) from [<c0115ec4>] (__fput+0xe8/0x20c)
|[<c0115ec4>] (__fput) from [<c0116050>] (____fput+0x18/0x1c)
|[<c0116050>] (____fput) from [<c0058138>] (task_work_run+0xa4/0xb8)
|[<c0058138>] (task_work_run) from [<c0011478>] (do_work_pending+0xd0/0xe4)
|[<c0011478>] (do_work_pending) from [<c000e740>] (work_pending+0xc/0x20)
|Code: e3530001 8a000001 e3a00040 ea000011 (e5973010)
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* scsi/fcoe: Fix get_cpu()/put_cpu_light() imbalance in fcoe_recv_frame() | Mike Galbraith | 2017-06-07 | 1 | -1/+1
During master->rt merge, I stumbled across the buglet below. Fix get_cpu()/put_cpu_light() imbalance.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* sched: lazy_preempt: avoid a warning in the !RT case | Sebastian Andrzej Siewior | 2017-06-07 | 1 | -1/+1
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* timers: wakeup all timer waiters without holding the base lock | Sebastian Andrzej Siewior | 2017-06-07 | 1 | -1/+1
There should be no need to hold the base lock during the wakeup. There should be no boosting involved, the wakeup list has its own lock so it should be safe to do this without the lock.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* timers: wakeup all timer waiters | Sebastian Andrzej Siewior | 2017-06-07 | 1 | -1/+1
The base lock is dropped during the invocation of the timer. That means it is possible that we have one waiter while timer1 is running and once this one finished, we get another waiter while timer2 is running. Since we wake up only one waiter it is possible that we miss the other one. This will probably heal itself over time because most of the time we complete timers without an active wake up.
To avoid the scenario where we don't wake up all waiters at once, wake_up_all() is used.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* x86: Fix an RT MCE crash | Corey Minyard | 2017-06-07 | 1 | -1/+2
On some x86 systems an MCE interrupt would come in before the kernel was ready for it. Looking at the latest RT code, it has similar (but not quite the same) code, except it adds a bool that tells if MCE handling is initialized. That was required because they had switched to use swork instead of a kernel thread. Here, just checking to see if the thread is NULL is good enough to see if MCE handling is initialized.
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* trace: correct off by one while recording the trace-event | Sebastian Andrzej Siewior | 2017-06-07 | 1 | -0/+3
Trace events like raw_syscalls always show a preempt count of one. The reason is that on PREEMPT kernels rcu_read_lock_sched_notrace() increases the preemption counter and the function recording the counter is called within the RCU section.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[ Changed this to upstream version. See commit e947841c0dce ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* mm: perform lru_add_drain_all() remotely | Luiz Capitulino | 2017-06-07 | 1 | -7/+30
lru_add_drain_all() works by scheduling lru_add_drain_cpu() to run on all CPUs that have non-empty LRU pagevecs and then waiting for the scheduled work to complete. However, workqueue threads may never have the chance to run on a CPU that's running a SCHED_FIFO task. This causes lru_add_drain_all() to block forever.
This commit solves this problem by changing lru_add_drain_all() to drain the LRU pagevecs of remote CPUs. This is done by grabbing swapvec_lock and calling lru_add_drain_cpu().
PS: This is based on an idea and initial implementation by Rik van Riel.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
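Conceptually the remote drain looks like the sketch below; it relies on the local_lock_on() primitive added by the next entry, and the lock name swapvec_lock is assumed to be the -RT local lock guarding the pagevecs:

	/* Sketch: drain a remote CPU's pagevecs from the caller's context */
	static void lru_add_drain_remote(int cpu)
	{
		/* take the remote CPU's instance of the local lock, then drain it */
		local_lock_on(swapvec_lock, cpu);
		lru_add_drain_cpu(cpu);
		local_unlock_on(swapvec_lock, cpu);
	}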
* locallock: add local_lock_on() | Sebastian Andrzej Siewior | 2017-06-07 | 1 | -0/+6
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* arm: lazy preempt: correct resched condition | Sebastian Andrzej Siewior | 2017-06-07 | 1 | -1/+5
If we get out of preempt_schedule_irq() then we check for NEED_RESCHED and call the former function again if set because the preemption counter has to be zero at this point. However the counter for lazy-preempt might not be zero, therefore we have to check the counter before looking at the need_resched_lazy flag.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* kernel/printk: Don't try to print from IRQ/NMI region | Sebastian Andrzej Siewior | 2017-06-07 | 1 | -0/+10
On -RT we try to acquire sleeping locks which might lead to warnings from lockdep or a warn_on() from spin_try_lock() (which is a rtmutex on RT). We don't print in general from an IRQ-off region so we should not try this via console_unblank() / bust_spinlocks() either.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* list_bl: fixup bogus lockdep warning | Josh Cartwright | 2017-06-07 | 1 | -5/+7
At first glance, the use of 'static inline' seems appropriate for INIT_HLIST_BL_HEAD(). However, when a 'static inline' function invocation is inlined by gcc, all callers share any static local data declared within that inline function.
This presents a problem for how lockdep classes are setup. raw_spinlocks, for example, when CONFIG_DEBUG_SPINLOCK:
  # define raw_spin_lock_init(lock)			\
  do {							\
  	static struct lock_class_key __key;		\
  							\
  	__raw_spin_lock_init((lock), #lock, &__key);	\
  } while (0)
When this macro is expanded into a 'static inline' caller, like INIT_HLIST_BL_HEAD():
  static inline INIT_HLIST_BL_HEAD(struct hlist_bl_head *h)
  {
  	h->first = NULL;
  	raw_spin_lock_init(&h->lock);
  }
...the static local lock_class_key object is made a function static. For compilation units which invoke INIT_HLIST_BL_HEAD() more than once, then, all of the invocations share this same static local object. This can lead to some very confusing lockdep splats (example below).
Solve this problem by forcing the INIT_HLIST_BL_HEAD() to be a macro, which prevents the lockdep class object sharing.
  =============================================
  [ INFO: possible recursive locking detected ]
  4.4.4-rt11 #4 Not tainted
  ---------------------------------------------
  kswapd0/59 is trying to acquire lock:
   (&h->lock#2){+.+.-.}, at: mb_cache_shrink_scan
  but task is already holding lock:
   (&h->lock#2){+.+.-.}, at: mb_cache_shrink_scan
  other info that might help us debug this:
   Possible unsafe locking scenario:
         CPU0
         ----
    lock(&h->lock#2);
    lock(&h->lock#2);
   *** DEADLOCK ***
   May be due to missing lock nesting notation
  2 locks held by kswapd0/59:
   #0: (shrinker_rwsem){+.+...}, at: rt_down_read_trylock
   #1: (&h->lock#2){+.+.-.}, at: mb_cache_shrink_scan
Reported-by: Luis Claudio R. Goncalves <lclaudio@uudg.org>
Tested-by: Luis Claudio R. Goncalves <lclaudio@uudg.org>
Signed-off-by: Josh Cartwright <joshc@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
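The macro form ends up looking roughly like the sketch below (the -RT variant, where hlist_bl_head carries a raw spinlock; the !RT variant keeps the plain first = NULL initializer):

	#define INIT_HLIST_BL_HEAD(h)			\
	do {						\
		(h)->first = NULL;			\
		raw_spin_lock_init(&(h)->lock);		\
	} while (0)

As a macro, the expansion (and with it the static lock_class_key) lands in each caller, so every initialization site gets its own lockdep class instead of sharing one.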
* net: dev: always take qdisc's busylock in __dev_xmit_skb() | Sebastian Andrzej Siewior | 2017-06-07 | 1 | -0/+4
The root-lock is dropped before dev_hard_start_xmit() is invoked and after setting the __QDISC___STATE_RUNNING bit. If this task is now pushed away by a task with a higher priority then the task with the higher priority won't be able to submit packets to the NIC directly; instead they will be enqueued into the Qdisc. The NIC will remain idle until the task(s) with higher priority leave the CPU and the task with lower priority gets back and finishes the job.
If we always take the busylock we ensure that the RT task can boost the low-prio task and submit the packet.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
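The change boils down to treating the qdisc as contended unconditionally on RT, along the lines of this sketch of __dev_xmit_skb() (surrounding code abbreviated, exact structure assumed):

	spinlock_t *root_lock = qdisc_lock(q);
	bool contended;

	#ifdef CONFIG_PREEMPT_RT_FULL
		contended = true;		/* always serialize on busylock */
	#else
		contended = qdisc_is_running(q);
	#endif
		if (unlikely(contended))
			spin_lock(&q->busylock);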
* kvm, rt: change async pagefault code locking for PREEMPT_RT | Rik van Riel | 2017-06-07 | 1 | -18/+19
The async pagefault wake code can run from the idle task in exception context, so everything here needs to be made non-preemptible. Conversion to a simple wait queue and raw spinlock does the trick.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Fix probe_wakeup_latency_hist_start() prototype | Mike Galbraith | 2017-06-07 | 1 | -2/+2
Drop 'success' arg from probe_wakeup_latency_hist_start().
Link: http://lkml.kernel.org/r/1457064246.3501.2.camel@gmail.com
Fixes: cf1dd658 ("sched: Introduce the trace_sched_waking tracepoint")
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* kernel: sched: Fix preempt_disable_ip recording for preempt_disable() | Sebastian Andrzej Siewior | 2017-06-07 | 4 | -15/+15
preempt_disable() invokes preempt_count_add() which saves the caller in current->preempt_disable_ip. It uses CALLER_ADDR1 which does not look for its caller but for the parent of the caller, which means we get the correct caller for something like spin_lock() unless the architecture inlines those invocations. It is always wrong for preempt_disable() or local_bh_disable().
This patch adds the function get_parent_ip() which tries CALLER_ADDR0,1,2 if the former is a locking function. This seems to record the preempt_disable() caller properly for preempt_disable() itself as well as for get_cpu_var() or local_bh_disable().
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
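The helper works roughly like this (a sketch close to the upstream get_parent_ip() in kernel/sched/core.c):

	unsigned long get_parent_ip(unsigned long addr)
	{
		if (in_lock_functions(addr)) {
			addr = CALLER_ADDR2;		/* skip the locking wrapper */
			if (in_lock_functions(addr))
				addr = CALLER_ADDR3;	/* nested lock helper, go one level further */
		}
		return addr;
	}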
* rcu/torture: Comment out rcu_bh ops on PREEMPT_RT_FULL | Clark Williams | 2017-06-07 | 1 | -0/+7
RT has dropped support of rcu_bh, comment it out in rcutorture.
Signed-off-by: Clark Williams <williams@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* trace: Use rcuidle version for preemptoff_hist trace point | Yang Shi | 2017-06-07 | 2 | -4/+5
When running -rt kernel with both PREEMPT_OFF_HIST and LOCKDEP enabled, the below error is reported:
[ INFO: suspicious RCU usage. ]
4.4.1-rt6 #1 Not tainted
include/trace/events/hist.h:31 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
no locks held by swapper/0/0.
stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.1-rt6-WR8.0.0.0_standard #1
Stack : 0000000000000006 0000000000000000 ffffffff81ca8c38 ffffffff81c8fc80
        ffffffff811bdd68 ffffffff81cb0000 0000000000000000 ffffffff81cb0000
        0000000000000000 0000000000000000 0000000000000004 0000000000000000
        0000000000000004 ffffffff811bdf50 0000000000000000 ffffffff82b60000
        0000000000000000 ffffffff812897ac ffffffff819f0000 000000000000000b
        ffffffff811be460 ffffffff81b7c588 ffffffff81c8fc80 0000000000000000
        0000000000000000 ffffffff81ec7f88 ffffffff81d70000 ffffffff81b70000
        ffffffff81c90000 ffffffff81c3fb00 ffffffff81c3fc28 ffffffff815e6f98
        0000000000000000 ffffffff81c8fa87 ffffffff81b70958 ffffffff811bf2c4
        0707fe32e8d60ca5 ffffffff81126d60 0000000000000000 0000000000000000
        ...
Call Trace:
[<ffffffff81126d60>] show_stack+0xe8/0x108
[<ffffffff815e6f98>] dump_stack+0x88/0xb0
[<ffffffff8124b88c>] time_hardirqs_off+0x204/0x300
[<ffffffff811aa5dc>] trace_hardirqs_off_caller+0x24/0xe8
[<ffffffff811a4ec4>] cpu_startup_entry+0x39c/0x508
[<ffffffff81d7dc68>] start_kernel+0x584/0x5a0
Replace the regular trace_preemptoff_hist with the rcuidle version to avoid the error.
Signed-off-by: Yang Shi <yang.shi@windriver.com>
Cc: bigeasy@linutronix.de
Cc: rostedt@goodmis.org
Cc: linux-rt-users@vger.kernel.org
Link: http://lkml.kernel.org/r/1456262603-10075-1-git-send-email-yang.shi@windriver.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* sched,rt: __always_inline preemptible_lazy() | Mike Galbraith | 2017-06-07 | 1 | -1/+1
homer: # nm kernel/sched/core.o|grep preemptible_lazy
00000000000000b5 t preemptible_lazy
echo wakeup_rt > current_tracer ==> Welcome to infinity.
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Link: http://lkml.kernel.org/r/1456067490.3771.2.camel@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* kernel/stop_machine: partly revert "stop_machine: Use raw spinlocks" | Sebastian Andrzej Siewior | 2017-06-07 | 1 | -31/+7
With completion using swait and so rawlocks we don't need this anymore. Further, bisect thinks this patch is responsible for:
|BUG: unable to handle kernel NULL pointer dereference at (null)
|IP: [<ffffffff81082123>] sched_cpu_active+0x53/0x70
|PGD 0
|Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
|Dumping ftrace buffer:
| (ftrace buffer empty)
|Modules linked in:
|CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.4.1+ #330
|Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
|task: ffff88013ae64b00 ti: ffff88013ae74000 task.ti: ffff88013ae74000
|RIP: 0010:[<ffffffff81082123>] [<ffffffff81082123>] sched_cpu_active+0x53/0x70
|RSP: 0000:ffff88013ae77eb8 EFLAGS: 00010082
|RAX: 0000000000000001 RBX: ffffffff81c2cf20 RCX: 0000001050fb52fb
|RDX: 0000001050fb52fb RSI: 000000105117ca1e RDI: 00000000001c7723
|RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
|R10: 0000000000000000 R11: 0000000000000001 R12: 00000000ffffffff
|R13: ffffffff81c2cee0 R14: 0000000000000000 R15: 0000000000000001
|FS: 0000000000000000(0000) GS:ffff88013b200000(0000) knlGS:0000000000000000
|CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
|CR2: 0000000000000000 CR3: 0000000001c09000 CR4: 00000000000006e0
|Stack:
| ffffffff810c446d ffff88013ae77f00 ffffffff8107d8dd 000000000000000a
| 0000000000000001 0000000000000000 0000000000000000 0000000000000000
| 0000000000000000 ffff88013ae77f10 ffffffff8107d90e ffff88013ae77f20
|Call Trace:
| [<ffffffff810c446d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
| [<ffffffff8107d8dd>] ? notifier_call_chain+0x5d/0x80
| [<ffffffff8107d90e>] ? __raw_notifier_call_chain+0xe/0x10
| [<ffffffff810598a3>] ? cpu_notify+0x23/0x40
| [<ffffffff8105a7b8>] ? notify_cpu_starting+0x28/0x30
during hotplug. The rawlocks need to remain however.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* kernel: softirq: unlock with irqs on | Sebastian Andrzej Siewior | 2017-06-07 | 1 | -1/+3
We unlock the lock while the interrupts are off. This isn't a problem now but will become one because the migrate_disable() + enable are not symmetrical in regard to the status of interrupts.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* kernel: migrate_disable() do fastpath in atomic & irqs-off | Sebastian Andrzej Siewior | 2017-06-07 | 1 | -2/+2
With interrupts off it makes no sense to do the long path since we can't leave the CPU anyway. Also we might end up in a recursion with lockdep.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* latencyhist: disable jump-labels | Sebastian Andrzej Siewior | 2017-06-07 | 1 | -0/+1
At least on x86 we die a recursive death
|CPU: 3 PID: 585 Comm: bash Not tainted 4.4.1-rt4+ #198
|Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
|task: ffff88007ab4cd00 ti: ffff88007ab94000 task.ti: ffff88007ab94000
|RIP: 0010:[<ffffffff81684870>] [<ffffffff81684870>] int3+0x0/0x10
|RSP: 0018:ffff88013c107fd8 EFLAGS: 00010082
|RAX: ffff88007ab4cd00 RBX: ffffffff8100ceab RCX: 0000000080202001
|RDX: 0000000000000000 RSI: ffffffff8100ceab RDI: ffffffff810c78b2
|RBP: ffff88007ab97c10 R08: ffffffffff57b000 R09: 0000000000000000
|R10: ffff88013bb64790 R11: ffff88007ab4cd68 R12: ffffffff8100ceab
|R13: ffffffff810c78b2 R14: ffffffff810f8158 R15: ffffffff810f9120
|FS: 0000000000000000(0000) GS:ffff88013c100000(0063) knlGS:00000000f74e3940
|CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
|CR2: 0000000008cf6008 CR3: 000000013b169000 CR4: 00000000000006e0
|Call Trace:
| <#DB>
| [<ffffffff810f8158>] ? trace_preempt_off+0x18/0x170
| <<EOE>>
| [<ffffffff81077745>] preempt_count_add+0xa5/0xc0
| [<ffffffff810c78b2>] on_each_cpu+0x22/0x90
| [<ffffffff8100ceab>] text_poke_bp+0x5b/0xc0
| [<ffffffff8100a29c>] arch_jump_label_transform+0x8c/0xf0
| [<ffffffff8111c77c>] __jump_label_update+0x6c/0x80
| [<ffffffff8111c83a>] jump_label_update+0xaa/0xc0
| [<ffffffff8111ca54>] static_key_slow_inc+0x94/0xa0
| [<ffffffff810e0d8d>] tracepoint_probe_register_prio+0x26d/0x2c0
| [<ffffffff810e0df3>] tracepoint_probe_register+0x13/0x20
| [<ffffffff810fca78>] trace_event_reg+0x98/0xd0
| [<ffffffff810fcc8b>] __ftrace_event_enable_disable+0x6b/0x180
| [<ffffffff810fd5b8>] event_enable_write+0x78/0xc0
| [<ffffffff8117a768>] __vfs_write+0x28/0xe0
| [<ffffffff8117b025>] vfs_write+0xa5/0x180
| [<ffffffff8117bb76>] SyS_write+0x46/0xa0
| [<ffffffff81002c91>] do_fast_syscall_32+0xa1/0x1d0
| [<ffffffff81684d57>] sysenter_flags_fixed+0xd/0x17
during
echo 1 > /sys/kernel/debug/tracing/events/hist/preemptirqsoff_hist/enable
Reported-By: Christoph Mathys <eraserix@gmail.com>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* net: provide a way to delegate processing a softirq to ksoftirqd | Sebastian Andrzej Siewior | 2017-06-07 | 3 | -1/+30
If the NET_RX softirq uses up all of its budget it moves the following NAPI invocations into the `ksoftirqd`. On -RT it does not do so. Instead it raises the NET_RX softirq in its current context again.
In order to get closer to mainline's behaviour this patch provides __raise_softirq_irqoff_ksoft() which raises the softirq in ksoftirqd.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
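Its use on the NET_RX path is then roughly the following (a sketch of the softnet_break path in net_rx_action(); the helper is assumed to mark the softirq pending for ksoftirqd and wake that thread instead of re-raising it in the current context):

	/* net_rx_action(), budget exhausted (sketch) */
	sd->time_squeeze++;
	__raise_softirq_irqoff_ksoft(NET_RX_SOFTIRQ);	/* was: __raise_softirq_irqoff() */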
* softirq: split timer softirqs out of ksoftirqd | Sebastian Andrzej Siewior | 2017-06-07 | 1 | -10/+72
The softirqd runs in -RT with SCHED_FIFO (prio 1) and deals mostly with timer wakeup which can not happen in hardirq context. The prio has been raised from the normal SCHED_OTHER so the timer wakeup does not happen too late.
With enough networking load it is possible that the system never goes idle and schedules ksoftirqd and everything else with a higher priority. One of the tasks left behind is one of RCU's threads and so we see stalls and eventually run out of memory.
This patch moves the TIMER and HRTIMER softirqs out of the `ksoftirqd` thread into its own `ktimersoftd`. The former can now run SCHED_OTHER (same as mainline) and the latter at SCHED_FIFO due to the wakeups.
From the networking point of view: The NAPI callback runs after the network interrupt thread completes. If its run time takes too long the NAPI code itself schedules the `ksoftirqd`. Here in the thread it can run at SCHED_OTHER priority and it won't defer RCU anymore.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* preempt-lazy: Add the lazy-preemption check to preempt_schedule() | Sebastian Andrzej Siewior | 2017-06-07 | 1 | -8/+26
Probably in the rebase onto v4.1 this check got moved into the less commonly used preempt_schedule_notrace(). This patch ensures that both functions use it.
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* ptrace: don't open IRQs in ptrace_freeze_traced() too early | Sebastian Andrzej Siewior | 2017-06-07 | 1 | -2/+4
In the non-RT case the spin_lock_irq() here disables interrupts as well as raw_spin_lock_irq(). So in the unlock case the interrupts are enabled too early.
Reported-by: kernel test robot <ying.huang@linux.intel.com>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* sched: Introduce the trace_sched_waking tracepoint | Peter Zijlstra | 2017-06-07 | 4 | -14/+28
Upstream commit fbd705a0c6184580d0e2fbcbd47a37b6e5822511
Mathieu reported that since 317f394160e9 ("sched: Move the second half of ttwu() to the remote cpu") trace_sched_wakeup() can happen out of context of the waker. This is a problem when you want to analyse wakeup paths because it is now very hard to correlate the wakeup event to whoever issued the wakeup.
OTOH trace_sched_wakeup() is issued at the point where we set p->state = TASK_RUNNING, which is right where we hand the task off to the scheduler, so this is an important point when looking at scheduling behaviour; up to here it's been the wakeup path, everything hereafter is due to scheduler policy.
To bridge this gap, introduce a second tracepoint: trace_sched_waking. It is guaranteed to be called in the waker context.
[ Ported to linux-4.1.y-rt kernel by Mathieu Desnoyers. Resolved conflict: try_to_wake_up_local() does not exist in -rt kernel. Removed its instrumentation hunk. ]
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Julien Desfossez <jdesfossez@efficios.com>
CC: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Francis Giraldeau <francis.giraldeau@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/20150609091336.GQ3644@twins.programming.kicks-ass.net
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* irqwork: Move irq safe work to irq context | Thomas Gleixner | 2017-06-07 | 3 | -4/+17
On architectures where arch_irq_work_has_interrupt() returns false, we end up running the irq safe work from the softirq context. That results in a potential deadlock in the scheduler irq work which expects that function to be called with interrupts disabled.
Split the irq_work_tick() function into a hard and soft variant. Call the hard variant from the tick interrupt and add the soft variant to the timer softirq.
Reported-and-tested-by: Yanjiang Jin <yanjiang.jin@windriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* net/core/cpuhotplug: Drain input_pkt_queue lockless | Grygorii Strashko | 2017-06-07 | 1 | -1/+1
I can constantly see below error report with 4.1 RT-kernel on TI ARM dra7-evm if I'm trying to unplug cpu1:
[ 57.737589] CPU1: shutdown
[ 57.767537] BUG: spinlock bad magic on CPU#0, sh/137
[ 57.767546] lock: 0xee994730, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
[ 57.767552] CPU: 0 PID: 137 Comm: sh Not tainted 4.1.10-rt8-01700-g2c38702-dirty #55
[ 57.767555] Hardware name: Generic DRA74X (Flattened Device Tree)
[ 57.767568] [<c001acd0>] (unwind_backtrace) from [<c001534c>] (show_stack+0x20/0x24)
[ 57.767579] [<c001534c>] (show_stack) from [<c075560c>] (dump_stack+0x84/0xa0)
[ 57.767593] [<c075560c>] (dump_stack) from [<c00aca48>] (spin_dump+0x84/0xac)
[ 57.767603] [<c00aca48>] (spin_dump) from [<c00acaa4>] (spin_bug+0x34/0x38)
[ 57.767614] [<c00acaa4>] (spin_bug) from [<c00acc10>] (do_raw_spin_lock+0x168/0x1c0)
[ 57.767624] [<c00acc10>] (do_raw_spin_lock) from [<c075b4cc>] (_raw_spin_lock+0x4c/0x54)
[ 57.767631] [<c075b4cc>] (_raw_spin_lock) from [<c07599fc>] (rt_spin_lock_slowlock+0x5c/0x374)
[ 57.767638] [<c07599fc>] (rt_spin_lock_slowlock) from [<c075bcf4>] (rt_spin_lock+0x38/0x70)
[ 57.767649] [<c075bcf4>] (rt_spin_lock) from [<c06333c0>] (skb_dequeue+0x28/0x7c)
[ 57.767662] [<c06333c0>] (skb_dequeue) from [<c06476ec>] (dev_cpu_callback+0x1b8/0x240)
[ 57.767673] [<c06476ec>] (dev_cpu_callback) from [<c007566c>] (notifier_call_chain+0x3c/0xb4)
The reason is that skb_dequeue is taking skb->lock, but RT changed the core code to use a raw spinlock. The non-raw lock is not initialized on purpose to catch exactly this kind of problem.
Fixes: 91df05da13a6 'net: Use skbufhead with raw lock'
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* net: Make synchronize_rcu_expedited() conditional on !RT_FULL | Josh Cartwright | 2017-06-07 | 1 | -1/+1
While the use of synchronize_rcu_expedited() might make synchronize_net() "faster", it does so at significant cost on RT systems, as expediting a grace period forcibly preempts any high-priority RT tasks (via the stop_machine() mechanism).
Without this change, we can observe a latency spike up to 30us with cyclictest by rapidly unplugging/reestablishing an ethernet link.
Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Josh Cartwright <joshc@ni.com>
Cc: bigeasy@linutronix.de
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20151027123153.GG8245@jcartwri.amer.corp.natinst.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
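The change amounts to gating the expedited path on !PREEMPT_RT_FULL, roughly as in this sketch of synchronize_net() in net/core/dev.c:

	void synchronize_net(void)
	{
		might_sleep();
		if (rtnl_is_locked() && !IS_ENABLED(CONFIG_PREEMPT_RT_FULL))
			synchronize_rcu_expedited();	/* fast, but preempts RT tasks */
		else
			synchronize_rcu();		/* normal grace period on RT */
	}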
* dump stack: don't disable preemption during trace | Sebastian Andrzej Siewior | 2017-06-07 | 1 | -4/+4
I see here large latencies during a stack dump on x86. The preempt_disable() and get_cpu() should forbid moving the task to another CPU during a stack dump and avoid two stack traces in parallel on the same CPU. However a stack trace from a second CPU may still happen in parallel. Also nesting is allowed so a stack trace happens in process-context and we may have another one from IRQ context. With migrate_disable() we keep this code preemptible and allow a second backtrace on the same CPU by another task.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* rtmutex: Use chainwalking control enum | bmouring@ni.com | 2017-06-07 | 1 | -1/+1
In 8930ed80 ("rtmutex: Cleanup deadlock detector debug logic"), chainwalking control enums were introduced to limit the deadlock detection logic. One of the calls to task_blocks_on_rt_mutex was missed when converting to use the enums.
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Brad Mouring <brad.mouring@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* rtmutex: Handle non enqueued waiters gracefully | Thomas Gleixner | 2017-06-07 | 1 | -1/+1
Yimin debugged that in case of a PI wakeup in progress when rt_mutex_start_proxy_lock() calls task_blocks_on_rt_mutex() the latter returns -EAGAIN and in consequence the remove_waiter() call runs into a BUG_ON() because there is nothing to remove.
Guard it with rt_mutex_has_waiters(). This is a quick fix which is easy to backport. The proper fix is to have a central check in remove_waiter() so we can call it unconditionally.
Reported-and-debugged-by: Yimin Deng <yimin11.deng@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
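The quick fix guards the cleanup path roughly like this sketch of rt_mutex_start_proxy_lock()'s error handling (surrounding code abbreviated):

	ret = task_blocks_on_rt_mutex(lock, waiter, task, RT_MUTEX_FULL_CHAINWALK);
	if (unlikely(ret)) {
		/* -EAGAIN from a PI wakeup in progress: the waiter may never
		 * have been enqueued, so only remove it if it actually is. */
		if (rt_mutex_has_waiters(lock))
			remove_waiter(lock, waiter);
	}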
* ARM: smp: Move clear_tasks_mm_cpumask() call to __cpu_die() | Grygorii Strashko | 2017-06-07 | 1 | -2/+3
When running with the RT-kernel (4.1.5-rt5) on TI OMAP dra7-evm and trying to do Suspend to RAM, the following backtrace occurs:
Disabling non-boot CPUs ...
PM: noirq suspend of devices complete after 7.295 msecs
Disabling non-boot CPUs ...
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
in_atomic(): 1, irqs_disabled(): 128, pid: 18, name: migration/1
INFO: lockdep is turned off.
irq event stamp: 122
hardirqs last enabled at (121): [<c06ac0ac>] _raw_spin_unlock_irqrestore+0x88/0x90
hardirqs last disabled at (122): [<c06abed0>] _raw_spin_lock_irq+0x28/0x5c
softirqs last enabled at (0): [<c003d294>] copy_process.part.52+0x410/0x19d8
softirqs last disabled at (0): [< (null)>] (null)
Preemption disabled at: [< (null)>] (null)
CPU: 1 PID: 18 Comm: migration/1 Tainted: G W 4.1.4-rt3-01046-g96ac8da #204
Hardware name: Generic DRA74X (Flattened Device Tree)
[<c0019134>] (unwind_backtrace) from [<c0014774>] (show_stack+0x20/0x24)
[<c0014774>] (show_stack) from [<c06a70f4>] (dump_stack+0x88/0xdc)
[<c06a70f4>] (dump_stack) from [<c006cab8>] (___might_sleep+0x198/0x2a8)
[<c006cab8>] (___might_sleep) from [<c06ac4dc>] (rt_spin_lock+0x30/0x70)
[<c06ac4dc>] (rt_spin_lock) from [<c013f790>] (find_lock_task_mm+0x9c/0x174)
[<c013f790>] (find_lock_task_mm) from [<c00409ac>] (clear_tasks_mm_cpumask+0xb4/0x1ac)
[<c00409ac>] (clear_tasks_mm_cpumask) from [<c00166a4>] (__cpu_disable+0x98/0xbc)
[<c00166a4>] (__cpu_disable) from [<c06a2e8c>] (take_cpu_down+0x1c/0x50)
[<c06a2e8c>] (take_cpu_down) from [<c00f2600>] (multi_cpu_stop+0x11c/0x158)
[<c00f2600>] (multi_cpu_stop) from [<c00f2a9c>] (cpu_stopper_thread+0xc4/0x184)
[<c00f2a9c>] (cpu_stopper_thread) from [<c0069058>] (smpboot_thread_fn+0x18c/0x324)
[<c0069058>] (smpboot_thread_fn) from [<c00649c4>] (kthread+0xe8/0x104)
[<c00649c4>] (kthread) from [<c0010058>] (ret_from_fork+0x14/0x3c)
CPU1: shutdown
PM: Calling sched_clock_suspend+0x0/0x40
PM: Calling timekeeping_suspend+0x0/0x2e0
PM: Calling irq_gc_suspend+0x0/0x68
PM: Calling fw_suspend+0x0/0x2c
PM: Calling cpu_pm_suspend+0x0/0x28
Also, sometimes the system gets stuck right after displaying "Disabling non-boot CPUs ...". The root cause of the above backtrace is task_lock() which takes a sleeping lock on -RT.
To fix the issue, move the clear_tasks_mm_cpumask() call from __cpu_disable() to __cpu_die() which is called on the thread which is asking for a target CPU to be shutdown. In addition, this change restores CPU hotplug functionality on TI OMAP dra7-evm and CPU1 can be unplugged/plugged many times.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: Sekhar Nori <nsekhar@ti.com>
Cc: Austin Schuh <austin@peloton-tech.com>
Cc: <philipp@peloton-tech.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: <bigeasy@linutronix.de>
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/1441995683-30817-1-git-send-email-grygorii.strashko@ti.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* cpufreq: Remove cpufreq_rwsem | Sebastian Andrzej Siewior | 2017-06-07 | 1 | -43/+8
cpufreq_rwsem was introduced in commit 6eed9404ab3c4 ("cpufreq: Use rwsem for protecting critical sections") in order to replace try_module_get() on the cpu-freq driver. That try_module_get() worked well until the refcount was so heavily used that module removal became more or less impossible.
Though when looking at the various (undocumented) protection mechanisms in that code, the randomly sprinkled around cpufreq_rwsem locking sites are superfluous.
The policy, which is acquired in cpufreq_cpu_get() and released in cpufreq_cpu_put(), is sufficiently protected already.
  cpufreq_cpu_get(cpu)
    /* Protects against concurrent driver removal */
    read_lock_irqsave(&cpufreq_driver_lock, flags);
    policy = per_cpu(cpufreq_cpu_data, cpu);
    kobject_get(&policy->kobj);
    read_unlock_irqrestore(&cpufreq_driver_lock, flags);
The reference on the policy serializes versus module unload already:
  cpufreq_unregister_driver()
    subsys_interface_unregister()
      __cpufreq_remove_dev_finish()
        per_cpu(cpufreq_cpu_data) = NULL;
    cpufreq_policy_put_kobj()
If there is a reference held on the policy, i.e. obtained prior to the unregister call, then cpufreq_policy_put_kobj() will wait until that reference is dropped. So once subsys_interface_unregister() returns there is no policy pointer in flight and no new reference can be obtained. So that rwsem protection is useless.
The other usage of cpufreq_rwsem in show()/store() of the sysfs interface is redundant as well because sysfs already does the proper kobject_get()/put() pairs.
That leaves CPU hotplug versus module removal. The current down_write() around the write_lock() in cpufreq_unregister_driver() is silly at best as it protects actually nothing. The trivial solution to this is to prevent hotplug across cpufreq_unregister_driver completely.
[upstream: rafael/linux-pm 454d3a2500a4eb33be85dde3bfba9e5f6b5efadc]
[fixes: "cpufreq_stat_notifier_trans: No policy found" since v4.0-rt]
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* powerpc/kvm: Disable in-kernel MPIC emulation for PREEMPT_RT_FULL | Bogdan Purcareata | 2017-06-07 | 1 | -0/+1
While converting the openpic emulation code to use a raw_spinlock_t enables guests to run on RT, there's still a performance issue. For interrupts sent in directed delivery mode with a multiple CPU mask, the emulated openpic will loop through all of the VCPUs, and for each VCPU, it calls IRQ_check, which will loop through all the pending interrupts for that VCPU. This is done while holding the raw_lock, meaning that in all this time the interrupts and preemption are disabled on the host Linux. A malicious user app can max out both these numbers and cause a DoS.
This temporary fix is sent for two reasons. First is so that users who want to use the in-kernel MPIC emulation are aware of the potential latencies, thus making sure that the hardware MPIC and their usage scenario does not involve interrupts sent in directed delivery mode, and the number of possible pending interrupts is kept small. Secondly, this should incentivize the development of a proper openpic emulation that would be better suited for RT.
Cc: stable-rt@vger.kernel.org
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>