Commit message | Author | Age | Files | Lines
* Linux 3.0.101-rt130 REBASE (tag: v3.0.101-rt130-rebase, branch: v3.0-rt-rebase) | Steven Rostedt (Red Hat) | 2013-10-24 | 1 | -1/+1
* genirq: do not invoke the affinity callback via a workqueue | Sebastian Andrzej Siewior | 2013-10-24 | 2 | -3/+77
Joe Korty reported that __irq_set_affinity_locked() schedules a workqueue while holding a raw lock, which results in a might_sleep() warning. This patch moves the invocation into process context, so that we only wake_up() a process while holding the lock.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
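The shape of the fix, as a minimal sketch rather than the actual patch: hand the notifier call off to a dedicated kernel thread, because wake_up_process() is safe under a raw lock while schedule_work() is not on RT. All names here (affinity_notify_thread, notify_pending, do_affinity_notify) are illustrative.

    static struct task_struct *affinity_notify_thread;  /* illustrative */
    static bool notify_pending;

    static int affinity_notify_fn(void *unused)
    {
        while (!kthread_should_stop()) {
            set_current_state(TASK_INTERRUPTIBLE);
            if (!notify_pending)
                schedule();
            __set_current_state(TASK_RUNNING);
            if (notify_pending) {
                notify_pending = false;
                do_affinity_notify();  /* the work formerly queued */
            }
        }
        return 0;
    }

    /* Called with the raw desc->lock held: nothing here can sleep,
     * we only mark the request and kick the helper thread. */
    static void kick_affinity_notify(void)
    {
        notify_pending = true;
        wake_up_process(affinity_notify_thread);
    }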
* hwlat-detector: Use thread instead of stop machine | Steven Rostedt | 2013-10-24 | 1 | -34/+25
There's no reason to use stop machine to search for hardware latency. Simply disabling interrupts while running the loop will do enough to check if something comes in that wasn't disabled by interrupts being off, which is exactly what stop machine does. Instead of using stop machine, just have the thread disable interrupts while it checks for hardware latency.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
* hwlat-detector: Use trace_clock_local if available | Steven Rostedt | 2013-10-24 | 1 | -9/+25
As ktime_get() calls into the timing code, which does a read_seq(), it may be affected by other CPUs that touch that lock. To remove this dependency, use trace_clock_local(), which is already exported for module use. If CONFIG_TRACING is enabled, use that as the clock, otherwise use ktime_get().
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
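Roughly, the clock selection then becomes a compile-time switch of this shape (a sketch; the wrapper macro names are assumptions):

    #ifdef CONFIG_TRACING
    #define time_type   u64
    #define time_get()  trace_clock_local()   /* no seqlock dependency */
    #else
    #define time_type   ktime_t
    #define time_get()  ktime_get()
    #endif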
* hwlat-detect/trace: Export trace_clock_local for hwlat-detector | Steven Rostedt (Red Hat) | 2013-10-24 | 1 | -0/+1
The hwlat-detector needs a better clock than just ktime_get(), as that can induce its own latencies. The trace clock is perfect for it, but it needs to be exported for use by modules.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* hwlat-detector: Update hwlat_detector to add outer loop detection | Steven Rostedt | 2013-10-24 | 1 | -6/+26
The hwlat_detector reads two timestamps in a row, then reports any gap between those calls. The problem is, it misses everything between the second reading of the time stamp and the first reading of the time stamp in the next loop. That's where most of the time is spent, which means chances are likely that it will miss all hardware latencies. This defeats the purpose. By also testing the first time stamp against the second time stamp of the previous loop (the outer loop), we are more likely to find a latency. Setting the threshold to 1, here's what the report now looks like:

1347415723.0232202770  0  2
1347415725.0234202822  0  2
1347415727.0236202875  0  2
1347415729.0238202928  0  2
1347415731.0240202980  0  2
1347415734.0243203061  0  2
1347415736.0245203113  0  2
1347415738.0247203166  2  0
1347415740.0249203219  0  3
1347415742.0251203272  0  3
1347415743.0252203299  0  3
1347415745.0254203351  0  2
1347415747.0256203404  0  2
1347415749.0258203457  0  2
1347415751.0260203510  0  2
1347415754.0263203589  0  2
1347415756.0265203642  0  2
1347415758.0267203695  0  2
1347415760.0269203748  0  2
1347415762.0271203801  0  2
1347415764.0273203853  2  0

There's some hardware latency that takes 2 microseconds to run.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
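A sketch of the resulting inner/outer sampling (variable names illustrative; time_sub()/time_to_us() stand for the detector's clock helpers, and threshold handling is abbreviated):

    time_type start = time_get();
    time_type t1, t2, last_t2 = 0;
    s64 diff, outer_diff;

    do {
        t1 = time_get();            /* inner window: t1 -> t2 */
        t2 = time_get();

        if (last_t2) {
            /* outer window: previous t2 -> this t1 */
            outer_diff = time_to_us(time_sub(t1, last_t2));
            if (outer_diff > threshold)
                record_sample(outer_diff);
        }
        last_t2 = t2;

        diff = time_to_us(time_sub(t2, t1));
        if (diff > threshold)
            record_sample(diff);
    } while (time_to_us(time_sub(t2, start)) <= sample_width);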
* rt,ntp: Move call to schedule_delayed_work() to helper thread | Steven Rostedt | 2013-10-24 | 1 | -0/+42
The ntp code for notify_cmos_timer() is called from a hard interrupt context. schedule_delayed_work() under PREEMPT_RT_FULL calls spinlocks that have been converted to mutexes, thus calling schedule_delayed_work() from interrupt is not safe. Add a helper thread that does the call to schedule_delayed_work() and wake up that thread instead of calling schedule_delayed_work() directly. This is only for CONFIG_PREEMPT_RT_FULL; otherwise the code still calls schedule_delayed_work() directly in irq context.

Note: There are a few places in the kernel that do this. Perhaps the RT code should have a dedicated thread that does the checks. Just register a notifier on boot up for your check and wake up the thread when needed. This will be a todo.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
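A minimal sketch of such a helper under CONFIG_PREEMPT_RT_FULL (assuming a dedicated kthread; names other than sync_cmos_work are illustrative):

    static struct task_struct *cmos_delay_thread;
    static bool do_cmos_delay;

    static int cmos_delay_fn(void *unused)
    {
        while (!kthread_should_stop()) {
            set_current_state(TASK_INTERRUPTIBLE);
            if (!do_cmos_delay)
                schedule();
            __set_current_state(TASK_RUNNING);
            if (do_cmos_delay) {
                do_cmos_delay = false;
                /* now in process context, sleeping locks are fine */
                schedule_delayed_work(&sync_cmos_work, 0);
            }
        }
        return 0;
    }

    void notify_cmos_timer(void)   /* hard irq context */
    {
        do_cmos_delay = true;
        wake_up_process(cmos_delay_thread);  /* raw-lock safe wakeup */
    }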
* drm/i915: drop trace_i915_gem_ring_dispatch on rt | Sebastian Andrzej Siewior | 2013-10-24 | 1 | -0/+2
This tracepoint is responsible for:

|[<814cc358>] __schedule_bug+0x4d/0x59
|[<814d24cc>] __schedule+0x88c/0x930
|[<814d3b90>] ? _raw_spin_unlock_irqrestore+0x40/0x50
|[<814d3b95>] ? _raw_spin_unlock_irqrestore+0x45/0x50
|[<810b57b5>] ? task_blocks_on_rt_mutex+0x1f5/0x250
|[<814d27d9>] schedule+0x29/0x70
|[<814d3423>] rt_spin_lock_slowlock+0x15b/0x278
|[<814d3786>] rt_spin_lock+0x26/0x30
|[<a00dced9>] gen6_gt_force_wake_get+0x29/0x60 [i915]
|[<a00e183f>] gen6_ring_get_irq+0x5f/0x100 [i915]
|[<a00b2a33>] ftrace_raw_event_i915_gem_ring_dispatch+0xe3/0x100 [i915]
|[<a00ac1b3>] i915_gem_do_execbuffer.isra.13+0xbd3/0x1430 [i915]
|[<810f8943>] ? trace_buffer_unlock_commit+0x43/0x60
|[<8113e8d2>] ? ftrace_raw_event_kmem_alloc+0xd2/0x180
|[<8101d063>] ? native_sched_clock+0x13/0x80
|[<a00acf29>] i915_gem_execbuffer2+0x99/0x280 [i915]
|[<a00114a3>] drm_ioctl+0x4c3/0x570 [drm]
|[<8101d0d9>] ? sched_clock+0x9/0x10
|[<a00ace90>] ? i915_gem_execbuffer+0x480/0x480 [i915]
|[<810f1c18>] ? rb_commit+0x68/0xa0
|[<810f1c6c>] ? ring_buffer_unlock_commit+0x1c/0xa0
|[<81197467>] do_vfs_ioctl+0x97/0x540
|[<81021318>] ? ftrace_raw_event_sys_enter+0xd8/0x130
|[<811979a1>] sys_ioctl+0x91/0xb0
|[<814db931>] tracesys+0xe1/0xe6

Chris Wilson does not like to move i915_trace_irq_get() out of the macro:

|No. This enables the IRQ, as well as making a number of
|very expensively serialised read, unconditionally.

so it is gone now on RT.
Cc: stable-rt@vger.kernel.org
Reported-by: Joakim Hernberg <jbh@alchemy.lu>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
* kernel/hotplug: restore original cpu mask on cpu/down | Sebastian Andrzej Siewior | 2013-10-24 | 1 | -1/+12
If a task which is allowed to run only on CPU X puts CPU Y down, then afterwards it will be allowed on all CPUs except CPU Y once it comes back from the kernel, instead of getting its original mask back. This patch ensures that we don't lose the initial setting unless the CPU the task is running on is the one going down.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
* kernel/cpu: fix cpu down problem if kthread's cpu is going down | Sebastian Andrzej Siewior | 2013-10-24 | 1 | -2/+14
If a kthread is pinned to CPUx and CPUx is going down then we get into trouble:
- first the unplug thread is created
- it will set itself to hp->unplug. As a result, every task that is going to take a lock has to leave the CPU.
- the CPU_DOWN_PREPARE notifiers are started. The worker thread will start a new process for the "high priority worker". Now the kthread would like to take a lock but since it can't leave the CPU it will never complete its task.

We could fire the unplug thread after the notifier, but then the CPU is no longer marked "online" and the unplug thread will run on CPU0, which was fixed before :)

So instead the unplug thread is started and kept waiting until the notifiers complete their work.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
* timers: prepare for full preemption improve | Zhao Hongjiang | 2013-10-24 | 1 | -2/+6
wake_up() should do nothing on non-RT, so we should use wakeup_timer_waiters(); also fix a spelling mistake.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
[bigeasy: s/CONFIG_PREEMPT_RT_BASE/CONFIG_PREEMPT_RT_FULL/]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
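Presumably the helper then has roughly this shape (a sketch; the wait-queue field name is an assumption):

    static inline void wakeup_timer_waiters(struct tvec_base *base)
    {
    #ifdef CONFIG_PREEMPT_RT_FULL
        wake_up(&base->wait_for_running_timer);
    #endif
        /* compiles away to nothing on !RT */
    }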
* list_bl.h: fix it for !SMP && !DEBUG_SPINLOCK | Uwe Kleine-König | 2013-10-24 | 1 | -0/+4
The patch "list_bl.h: make list head locking RT safe" introduced an unconditional

    __set_bit(0, (unsigned long *)b);

in void hlist_bl_lock(struct hlist_bl_head *b). This clobbers the value of b->first. When the value of b->first is retrieved using hlist_bl_first(), the clobbering is undone using

    (unsigned long)h->first & ~LIST_BL_LOCKMASK

and so depends on LIST_BL_LOCKMASK being one. But LIST_BL_LOCKMASK is only one if at least one of CONFIG_SMP and CONFIG_DEBUG_SPINLOCK is defined. Without these, the value returned by hlist_bl_first() has the zeroth bit set, which likely results in a crash.

So only do the clobbering in the cases where LIST_BL_LOCKMASK is one. An alternative would be to always define LIST_BL_LOCKMASK to one with CONFIG_PREEMPT_RT_BASE.
Cc: stable-rt@vger.kernel.org
Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
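A sketch of the corrected lock path (assuming the RT variant takes a raw spinlock embedded in the list head, as the previous patch introduced):

    static inline void hlist_bl_lock(struct hlist_bl_head *b)
    {
    #ifndef CONFIG_PREEMPT_RT_BASE
        bit_spin_lock(0, (unsigned long *)b);
    #else
        raw_spin_lock(&b->lock);
    #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
        /* only clobber bit 0 when hlist_bl_first() masks it out,
         * i.e. when LIST_BL_LOCKMASK is 1 */
        __set_bit(0, (unsigned long *)b);
    #endif
    #endif
    }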
* list_bl.h: make list head locking RT safe | Paul Gortmaker | 2013-10-24 | 1 | -2/+22
As per changes in include/linux/jbd_common.h for avoiding the bit_spin_locks on RT ("fs: jbd/jbd2: Make state lock and journal head lock rt safe"), we do the same thing here. We use the non-atomic __set_bit and __clear_bit inside the scope of the lock to preserve the ability of the existing LIST_DEBUG code to use the zeroth bit in the sanity checks.

As a bit spinlock, we had no lockdep visibility into the usage of the list head locking. Now, if we were to implement it as a standard non-raw spinlock, we would see:

BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
in_atomic(): 1, irqs_disabled(): 0, pid: 122, name: udevd
5 locks held by udevd/122:
#0: (&sb->s_type->i_mutex_key#7/1){+.+.+.}, at: [<ffffffff811967e8>] lock_rename+0xe8/0xf0
#1: (rename_lock){+.+...}, at: [<ffffffff811a277c>] d_move+0x2c/0x60
#2: (&dentry->d_lock){+.+...}, at: [<ffffffff811a0763>] dentry_lock_for_move+0xf3/0x130
#3: (&dentry->d_lock/2){+.+...}, at: [<ffffffff811a0734>] dentry_lock_for_move+0xc4/0x130
#4: (&dentry->d_lock/3){+.+...}, at: [<ffffffff811a0747>] dentry_lock_for_move+0xd7/0x130
Pid: 122, comm: udevd Not tainted 3.4.47-rt62 #7
Call Trace:
[<ffffffff810b9624>] __might_sleep+0x134/0x1f0
[<ffffffff817a24d4>] rt_spin_lock+0x24/0x60
[<ffffffff811a0c4c>] __d_shrink+0x5c/0xa0
[<ffffffff811a1b2d>] __d_drop+0x1d/0x40
[<ffffffff811a24be>] __d_move+0x8e/0x320
[<ffffffff811a278e>] d_move+0x3e/0x60
[<ffffffff81199598>] vfs_rename+0x198/0x4c0
[<ffffffff8119b093>] sys_renameat+0x213/0x240
[<ffffffff817a2de5>] ? _raw_spin_unlock+0x35/0x60
[<ffffffff8107781c>] ? do_page_fault+0x1ec/0x4b0
[<ffffffff817a32ca>] ? retint_swapgs+0xe/0x13
[<ffffffff813eb0e6>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8119b0db>] sys_rename+0x1b/0x20
[<ffffffff817a3b96>] system_call_fastpath+0x1a/0x1f

Since we are only taking the lock during short-lived list operations, let's assume for now that it being raw won't be a significant latency concern.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
* genirq: Set irq thread to RT priority on creation | Ivo Sieben | 2013-10-24 | 1 | -4/+6
When a threaded irq handler is installed, the irq thread is initially created on normal scheduling priority. Only after the irq thread is woken up does it set its priority to RT_FIFO MAX_USER_RT_PRIO/2 itself. This means that interrupts that occur directly after the irq handler is installed will be handled on a normal scheduling priority instead of the realtime priority that one would expect. Fix this by setting the RT priority on creation of the irq_thread.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Ivo Sieben <meltedpianoman@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1370254322-17240-1-git-send-email-meltedpianoman@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
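The fix presumably boils down to assigning the policy right after kthread_create(), along these lines (a sketch of the relevant part of __setup_irq()):

    struct sched_param param = { .sched_priority = MAX_USER_RT_PRIO / 2 };
    struct task_struct *t;

    t = kthread_create(irq_thread, new, "irq/%d-%s", irq, new->name);
    if (IS_ERR(t))
        return PTR_ERR(t);
    /* formerly the thread did this itself after its first wakeup */
    sched_setscheduler_nocheck(t, SCHED_FIFO, &param);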
* x86/mce: fix mce timer interval | Mike Galbraith | 2013-10-24 | 1 | -2/+2
Seems the mce timer fires at the wrong frequency in -rt kernels since roughly forever due to 32 bit overflow. 3.8-rt is also missing a multiplier. Add the missing us -> ns conversion and 32 bit overflow prevention.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bigeasy: use ULL instead of u64 cast]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
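The arithmetic at issue, sketched: the interval is computed in microseconds, the hrtimer wants nanoseconds, and on 32 bit a plain multiply can wrap. Variable names here are illustrative.

    u64 interval_ns = interval_us * 1000ULL;  /* us -> ns; the ULL keeps
                                                 the multiply in 64 bit */
    hrtimer_forward_now(timer, ns_to_ktime(interval_ns));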
* sched/workqueue: Only wake up idle workers if not blocked on sleeping spin lock | Steven Rostedt | 2013-10-24 | 1 | -1/+3
In -rt, most spin_locks() turn into mutexes. One of these spin_lock conversions is performed on the workqueue gcwq->lock. When the idle worker is woken, the first thing it will do is grab that same lock, and it too will block, possibly jumping into the same code, but because nr_running would already be decremented it prevents an infinite loop.

But this is still a waste of CPU cycles, and it doesn't follow the method of mainline, as new workers should only be woken when a worker thread is truly going to sleep, and not just blocked on a spin_lock(). Check the saved_state too before waking up new workers.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
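Conceptually, the scheduler's sleep path then checks saved_state, which on RT holds the task's real sleep state while it blocks on an rtmutex-based spin_lock. A sketch:

    /* in __schedule(), before notifying the workqueue core: */
    if (prev->flags & PF_WQ_WORKER && !prev->saved_state) {
        /* really going to sleep, not merely blocked on a
         * sleeping spin_lock */
        struct task_struct *to_wakeup = wq_worker_sleeping(prev, cpu);
        if (to_wakeup)
            try_to_wake_up_local(to_wakeup);
    }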
* Linux 3.0.89-rt117 REBASE | Steven Rostedt (Red Hat) | 2013-10-24 | 1 | -1/+1
* swap: Use unique local lock name for swap_lock | Steven Rostedt | 2013-10-24 | 1 | -10/+10
From lib/Kconfig.debug on CONFIG_FORCE_WEAK_PER_CPU:

----
s390 and alpha require percpu variables in modules to be defined weak to work around addressing range issue which puts the following two restrictions on percpu variable definitions.

1. percpu symbols must be unique whether static or not
2. percpu variables can't be defined inside a function

To ensure that generic code follows the above rules, this option forces all percpu variables to be defined as weak.
----

The addition of the local IRQ swap_lock in mm/swap.c broke this config, as the name "swap_lock" is used throughout the kernel (just do a "git grep swap_lock" to see), and the new swap_lock is a local lock which defines the swap_lock for per_cpu.

The fix is to rename swap_lock to swapvec_lock, which keeps it unique.
Reported-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* x86/mce: Defer mce wakeups to threads for PREEMPT_RT | Steven Rostedt | 2013-10-24 | 1 | -16/+60
We had a customer report a lockup on a 3.0-rt kernel that had the following backtrace:

[ffff88107fca3e80] rt_spin_lock_slowlock at ffffffff81499113
[ffff88107fca3f40] rt_spin_lock at ffffffff81499a56
[ffff88107fca3f50] __wake_up at ffffffff81043379
[ffff88107fca3f80] mce_notify_irq at ffffffff81017328
[ffff88107fca3f90] intel_threshold_interrupt at ffffffff81019508
[ffff88107fca3fa0] smp_threshold_interrupt at ffffffff81019fc1
[ffff88107fca3fb0] threshold_interrupt at ffffffff814a1853

It actually bugged because the lock was taken by the same owner that already had that lock. What happened was the thread that was setting itself on a wait queue had the lock when an MCE triggered. The MCE interrupt does a wake up on its wait list and grabs the same lock.

NOTE: THIS IS NOT A BUG ON MAINLINE

Sorry for yelling, but as I Cc'd mainline maintainers I want them to know that this is a PREEMPT_RT bug only. I only Cc'd them for advice.

On PREEMPT_RT the wait queue locks are converted from normal "spin_locks" into an rt_mutex (see the rt_spin_lock_slowlock above). These are not to be taken by hard interrupt context. This usually isn't a problem, as almost all interrupts in PREEMPT_RT are converted into schedulable threads. Unfortunately that's not the case with the MCE irq.

As wait queue locks are notorious for long hold times, we can not convert them to raw_spin_locks without causing issues with -rt. But Thomas has created a "simple-wait" structure that uses raw spin locks, which may have been a good fit. Unfortunately, wait queues are not the only issue, as the mce_notify_irq also does a schedule_work(), which grabs the workqueue spin locks that have the exact same issue.

Thus, this patch I'm proposing is to move the actual work of the MCE interrupt into a helper thread that gets woken up on the MCE interrupt and does the work in a schedulable context.

NOTE: THIS PATCH ONLY CHANGES THE BEHAVIOR WHEN PREEMPT_RT IS SET

Oops, sorry for yelling again, but I want to stress that I keep the same behavior of mainline when PREEMPT_RT is not set. Thus, this only changes the MCE behavior when PREEMPT_RT is configured.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bigeasy@linutronix: make mce_notify_work() a proper prototype, use kthread_run()]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
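A sketch of the PREEMPT_RT path (mce_notify_work() and kthread_run() are named by the commit itself; the rest of the helper's shape is an assumption):

    static struct task_struct *mce_notify_helper;

    static int mce_notify_helper_fn(void *unused)
    {
        while (!kthread_should_stop()) {
            set_current_state(TASK_INTERRUPTIBLE);
            schedule();
            if (kthread_should_stop())
                break;
            /* __wake_up() + schedule_work() now run in a
             * schedulable context */
            mce_notify_work();
        }
        return 0;
    }

    static void mce_notify_irq_rt(void)  /* hard irq context */
    {
        /* only a task wakeup here; everything that takes sleeping
         * locks has moved into the helper thread */
        wake_up_process(mce_notify_helper);
    }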
* rcutiny: Use simple waitqueue | Thomas Gleixner | 2013-10-24 | 1 | -4/+5
Simple waitqueues can be handled from interrupt disabled contexts.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* acpi/rt: Convert acpi_gbl_hardware lock back to a raw_spinlock_t | Steven Rostedt | 2013-10-24 | 5 | -7/+21
We hit the following bug with 3.6-rt (two CPUs report the bug at once, so their output is interleaved):

[ 5.898990] BUG: scheduling while atomic: swapper/3/0/0x00000002
[ 5.898991] no locks held by swapper/3/0.
[ 5.898993] Modules linked in:
[ 5.898996] Pid: 0, comm: swapper/3 Not tainted 3.6.11-rt28.19.el6rt.x86_64.debug #1
[ 5.898997] Call Trace:
[ 5.899011] [<ffffffff810804e7>] __schedule_bug+0x67/0x90
[ 5.899028] [<ffffffff81577923>] __schedule+0x793/0x7a0
[ 5.899032] [<ffffffff810b4e40>] ? debug_rt_mutex_print_deadlock+0x50/0x200
[ 5.899034] [<ffffffff81577b89>] schedule+0x29/0x70
[ 5.899036] BUG: scheduling while atomic: swapper/7/0/0x00000002
[ 5.899037] no locks held by swapper/7/0.
[ 5.899039] [<ffffffff81578525>] rt_spin_lock_slowlock+0xe5/0x2f0
[ 5.899040] Modules linked in:
[ 5.899041]
[ 5.899045] [<ffffffff81579a58>] ? _raw_spin_unlock_irqrestore+0x38/0x90
[ 5.899046] Pid: 0, comm: swapper/7 Not tainted 3.6.11-rt28.19.el6rt.x86_64.debug #1
[ 5.899047] Call Trace:
[ 5.899049] [<ffffffff81578bc6>] rt_spin_lock+0x16/0x40
[ 5.899052] [<ffffffff810804e7>] __schedule_bug+0x67/0x90
[ 5.899054] [<ffffffff8157d3f0>] ? notifier_call_chain+0x80/0x80
[ 5.899056] [<ffffffff81577923>] __schedule+0x793/0x7a0
[ 5.899059] [<ffffffff812f2034>] acpi_os_acquire_lock+0x1f/0x23
[ 5.899062] [<ffffffff810b4e40>] ? debug_rt_mutex_print_deadlock+0x50/0x200
[ 5.899068] [<ffffffff8130be64>] acpi_write_bit_register+0x33/0xb0
[ 5.899071] [<ffffffff81577b89>] schedule+0x29/0x70
[ 5.899072] [<ffffffff8130be13>] ? acpi_read_bit_register+0x33/0x51
[ 5.899074] [<ffffffff81578525>] rt_spin_lock_slowlock+0xe5/0x2f0
[ 5.899077] [<ffffffff8131d1fc>] acpi_idle_enter_bm+0x8a/0x28e
[ 5.899079] [<ffffffff81579a58>] ? _raw_spin_unlock_irqrestore+0x38/0x90
[ 5.899081] [<ffffffff8107e5da>] ? this_cpu_load+0x1a/0x30
[ 5.899083] [<ffffffff81578bc6>] rt_spin_lock+0x16/0x40
[ 5.899087] [<ffffffff8144c759>] cpuidle_enter+0x19/0x20
[ 5.899088] [<ffffffff8157d3f0>] ? notifier_call_chain+0x80/0x80
[ 5.899090] [<ffffffff8144c777>] cpuidle_enter_state+0x17/0x50
[ 5.899092] [<ffffffff812f2034>] acpi_os_acquire_lock+0x1f/0x23
[ 5.899094] [<ffffffff8144d1a1>] cpuidle899101] [<ffffffff8130be13>] ?

As the acpi code disables interrupts in acpi_idle_enter_bm, and calls code that grabs the acpi lock, it causes issues, as the lock is currently a sleeping lock in RT.

The lock was converted from a raw to a sleeping lock due to some previous issues, and tests that showed it didn't seem to matter. Unfortunately, it did matter for one of our boxes.

This patch converts the lock back to a raw lock. I've run this code on a few of my own machines, one being my laptop that uses acpi quite extensively. I've been able to suspend and resume without issues.

[ tglx: Made the change exclusive for acpi_gbl_hardware_lock ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: John Kacur <jkacur@gmail.com>
Cc: Clark Williams <clark@redhat.com>
Link: http://lkml.kernel.org/r/1360765565.23152.5.camel@gandalf.local.home
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86/32: Use kmap switch for non highmem as well | Thomas Gleixner | 2013-10-24 | 2 | -2/+4
Even with CONFIG_HIGHMEM=n we need to take care of the "atomic" mappings which are installed via iomap_atomic.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* mm: swap: Initialize local locks early | Thomas Gleixner | 2013-10-24 | 1 | -3/+9
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* sched: Init idle->on_rq in init_idle() | Thomas Gleixner | 2013-10-24 | 1 | -0/+1
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* mmci: Remove bogus local_irq_save() | Thomas Gleixner | 2013-10-24 | 1 | -5/+0
On !RT the interrupt handler runs with interrupts disabled. On RT it runs in a thread, so there is no need to disable interrupts at all.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* drivers-tty-pl011-irq-disable-madness.patch | Thomas Gleixner | 2013-10-24 | 1 | -5/+10
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* sched: Consider pi boosting in setscheduler | Thomas Gleixner | 2013-10-24 | 3 | -8/+48
If a PI boosted task's policy/priority is modified by a setscheduler() call, we unconditionally dequeue and requeue the task if it is on the runqueue, even if the new priority is lower than the current effective boosted priority. This can result in undesired reordering of the priority bucket list.

If the new priority is less than or equal to the current effective priority, we just store the new parameters in the task struct and leave the scheduler class and the runqueue untouched. This is handled when the task deboosts itself. Only if the new priority is higher than the effective boosted priority do we apply the change immediately.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: stable-rt@vger.kernel.org
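Sketched, the early-out in __sched_setscheduler() is roughly the following (treat rt_mutex_check_prio() and __setscheduler_params() as assumed helper names):

    /* If the new priority does not exceed the boosted priority, just
     * store the parameters; the deboost path requeues the task later. */
    if (rt_mutex_check_prio(p, newprio)) {
        __setscheduler_params(p, policy, param->sched_priority);
        task_rq_unlock(rq, p, &flags);
        return 0;
    }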
* sched: Queue RT tasks to head when prio drops | Thomas Gleixner | 2013-10-24 | 1 | -2/+7
The following scenario does not work correctly. The runqueue of CPUx contains two runnable and pinned tasks:

T1: SCHED_FIFO, prio 80
T2: SCHED_FIFO, prio 80

T1 is on the cpu and executes the following syscalls (classic priority ceiling scenario):

sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 90);
...
sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 80);
...

Now T1 gets preempted by T3 (SCHED_FIFO, prio 95). After T3 goes back to sleep the scheduler picks T2. Surprise!

The same happens w/o actual preemption when T1 is forced into the scheduler due to a sporadic NEED_RESCHED event. The scheduler invokes pick_next_task(), which returns T2. So T1 gets preempted and scheduled out.

This happens because sched_setscheduler() dequeues T1 from the prio 90 list and then enqueues it on the tail of the prio 80 list behind T2. This violates the POSIX spec and surprises user space, which relies on the guarantee that SCHED_FIFO tasks are not scheduled out unless they give the CPU up voluntarily or are preempted by a higher priority task. In the latter case the preempted task must get back on the CPU after the preempting task schedules out again.

We fixed a similar issue already in commit 60db48c (sched: Queue a deboosted task to the head of the RT prio queue). The same treatment is necessary for sched_setscheduler(). So enqueue to the head of the prio bucket list if the priority of the task is lowered.

It might be possible that existing user space relies on the current behaviour, but it can be considered highly unlikely due to the corner case nature of the application scenario.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: stable-rt@vger.kernel.org
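The gist of the fix, sketched (ENQUEUE_HEAD per the mainline flag; surrounding code abbreviated):

    /* in __sched_setscheduler(), when putting the task back: a
     * numerically larger prio value means a lower priority, so a
     * priority drop goes to the head of its new bucket */
    if (on_rq)
        enqueue_task(rq, p, oldprio < p->prio ? ENQUEUE_HEAD : 0);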
* sched: Adjust sched_reset_on_fork when nothing else changes | Thomas Gleixner | 2013-10-24 | 1 | -2/+4
If the policy and priority remain unchanged, a possible modification of sched_reset_on_fork gets lost in the early exit path.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: stable-rt@vger.kernel.org
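Presumably the early exit path now stores the flag before bailing out, roughly:

    /* nothing else changes, but don't lose a new reset_on_fork value */
    if (unlikely(policy == p->policy && (!rt_policy(policy) ||
                 param->sched_priority == p->rt_priority))) {
        p->sched_reset_on_fork = reset_on_fork;
        __task_rq_unlock(rq);
        raw_spin_unlock_irqrestore(&p->pi_lock, flags);
        return 0;
    }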
* net: netfilter: Serialize xt_write_recseq sections on RT | Thomas Gleixner | 2013-10-24 | 3 | -0/+17
The netfilter code relies only on the implicit semantics of local_bh_disable() for serializing xt_write_recseq sections. RT breaks that and needs explicit serialization here.
Reported-by: Peter LaDow <petela@gocougs.wsu.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
* rcu: Disable RCU_FAST_NO_HZ on RT | Thomas Gleixner | 2013-10-24 | 1 | -1/+1
This uses a timer_list timer from the irq disabled guts of the idle code. Disable it for now to prevent wreckage.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
* hrtimer: Raise softirq if hrtimer irq stalled | Watanabe | 2013-10-24 | 1 | -5/+4
When the hrtimer stall detection hits, the softirq is not raised.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
* sched: Better debug output for might sleep | Thomas Gleixner | 2013-10-24 | 2 | -2/+25
might_sleep() can tell us where interrupts have been disabled, but we have no idea what disabled preemption. Add some debug infrastructure.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* rt: rwsem/rwlock: lockdep annotations | Thomas Gleixner | 2013-10-24 | 1 | -21/+25
rwlocks and rwsems on RT do not allow multiple readers. Annotate the lockdep acquire functions accordingly.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* mm: page_alloc: Use local_lock_on() instead of plain spinlock | Thomas Gleixner | 2013-10-24 | 2 | -2/+13
The plain spinlock, while sufficient, does not update the local_lock internals. Use a proper local_lock function instead, to ease debugging.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* mm: slab: Fix potential deadlock | Thomas Gleixner | 2013-10-24 | 2 | -8/+10
=============================================
[ INFO: possible recursive locking detected ]
3.6.0-rt1+ #49 Not tainted
---------------------------------------------
swapper/0/1 is trying to acquire lock:
 lock_slab_on+0x72/0x77
but task is already holding lock:
 __local_lock_irq+0x24/0x77

other info that might help us debug this:
 Possible unsafe locking scenario:

   CPU0
   ----
   lock(&per_cpu(slab_lock, __cpu).lock);
   lock(&per_cpu(slab_lock, __cpu).lock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by swapper/0/1:
 kmem_cache_create+0x33/0x89
 __local_lock_irq+0x24/0x77

stack backtrace:
Pid: 1, comm: swapper/0 Not tainted 3.6.0-rt1+ #49
Call Trace:
 __lock_acquire+0x9a4/0xdc4
 ? __local_lock_irq+0x24/0x77
 ? lock_slab_on+0x72/0x77
 lock_acquire+0xc4/0x108
 ? lock_slab_on+0x72/0x77
 ? unlock_slab_on+0x5b/0x5b
 rt_spin_lock+0x36/0x3d
 ? lock_slab_on+0x72/0x77
 ? migrate_disable+0x85/0x93
 lock_slab_on+0x72/0x77
 do_ccupdate_local+0x19/0x44
 slab_on_each_cpu+0x36/0x5a
 do_tune_cpucache+0xc1/0x305
 enable_cpucache+0x8c/0xb5
 setup_cpu_cache+0x28/0x182
 __kmem_cache_create+0x34b/0x380
 ? shmem_mount+0x1a/0x1a
 kmem_cache_create+0x4a/0x89
 ? shmem_mount+0x1a/0x1a
 shmem_init+0x3e/0xd4
 kernel_init+0x11c/0x214
 kernel_thread_helper+0x4/0x10
 ? retint_restore_args+0x13/0x13
 ? start_kernel+0x3bc/0x3bc
 ? gs_change+0x13/0x13

It's not a missing annotation. It's simply wrong code and needs to be fixed. Instead of nesting the local and the remote cpu lock, simply acquire only the remote cpu lock, which is sufficient protection for this procedure.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* softirq: Init softirq local lock after per cpu section is set up | Steven Rostedt | 2013-10-24 | 1 | -1/+1
I discovered this bug when booting 3.4-rt on my powerpc box. It crashed with the following report:

------------[ cut here ]------------
kernel BUG at /work/rt/stable-rt.git/kernel/rtmutex_common.h:75!
Oops: Exception in kernel mode, sig: 5 [#1]
PREEMPT SMP NR_CPUS=64 NUMA PA Semi PWRficient
Modules linked in:
NIP: c0000000004aa03c LR: c0000000004aa01c CTR: c00000000009b2ac
REGS: c00000003e8d7950 TRAP: 0700 Not tainted (3.4.11-test-rt19)
MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI> CR: 24000082 XER: 20000000
SOFTE: 0
TASK = c00000003e8fdcd0[11] 'ksoftirqd/1' THREAD: c00000003e8d4000 CPU: 1
GPR00: 0000000000000001 c00000003e8d7bd0 c000000000d6cbb0 0000000000000000
GPR04: c00000003e8fdcd0 0000000000000000 0000000024004082 c000000000011454
GPR08: 0000000000000000 0000000080000001 c00000003e8fdcd1 0000000000000000
GPR12: 0000000024000084 c00000000fff0280 ffffffffffffffff 000000003ffffad8
GPR16: ffffffffffffffff 000000000072c798 0000000000000060 0000000000000000
GPR20: 0000000000642741 000000000072c858 000000003ffffaf0 0000000000000417
GPR24: 000000000072dcd0 c00000003e7ff990 0000000000000000 0000000000000001
GPR28: 0000000000000000 c000000000792340 c000000000ccec78 c000000001182338
NIP [c0000000004aa03c] .wakeup_next_waiter+0x44/0xb8
LR [c0000000004aa01c] .wakeup_next_waiter+0x24/0xb8
Call Trace:
[c00000003e8d7bd0] [c0000000004aa01c] .wakeup_next_waiter+0x24/0xb8 (unreliable)
[c00000003e8d7c60] [c0000000004a0320] .rt_spin_lock_slowunlock+0x8c/0xe4
[c00000003e8d7ce0] [c0000000004a07cc] .rt_spin_unlock+0x54/0x64
[c00000003e8d7d60] [c0000000000636bc] .__thread_do_softirq+0x130/0x174
[c00000003e8d7df0] [c00000000006379c] .run_ksoftirqd+0x9c/0x1a4
[c00000003e8d7ea0] [c000000000080b68] .kthread+0xa8/0xb4
[c00000003e8d7f90] [c00000000001c2f8] .kernel_thread+0x54/0x70
Instruction dump:
60000000 e86d01c8 38630730 4bff7061 60000000 ebbf0008 7c7c1b78 e81d0040
7fe00278 7c000074 7800d182 68000001 <0b000000> e88d01c8 387d0010 38840738

The rtmutex_common.h:75 is:

    rt_mutex_top_waiter(struct rt_mutex *lock)
    {
        struct rt_mutex_waiter *w;

        w = plist_first_entry(&lock->wait_list, struct rt_mutex_waiter,
                              list_entry);
        BUG_ON(w->lock != lock);

        return w;
    }

Where the waiter->lock is corrupted. I saw various other random bugs that all had to do with the softirq lock and plist. As plist needs to be initialized before it is used, I investigated how this lock is initialized. It's initialized with:

    void __init softirq_early_init(void)
    {
        local_irq_lock_init(local_softirq_lock);
    }

Where:

    do {                                                      \
        int __cpu;                                            \
        for_each_possible_cpu(__cpu)                          \
            spin_lock_init(&per_cpu(lvar, __cpu).lock);       \
    } while (0)

As the softirq lock is a local_irq_lock, which is a per_cpu lock, the initialization is done for all per_cpu versions of the lock. But let's look at where softirq_early_init() is called from. In init/main.c:

    start_kernel()

        /*
         * Interrupts are still disabled. Do necessary setups, then
         * enable them
         */
        softirq_early_init();
        tick_init();
        boot_cpu_init();
        page_address_init();
        printk(KERN_NOTICE "%s", linux_banner);
        setup_arch(&command_line);
        mm_init_owner(&init_mm, &init_task);
        mm_init_cpumask(&init_mm);
        setup_command_line(command_line);
        setup_nr_cpu_ids();
        setup_per_cpu_areas();
        smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */

One of the first things that is called is the initialization of the softirq lock. But if you look further down, we see the per_cpu areas have not been set up yet. Thus initializing a local_irq_lock() before the per_cpu section is set up may not work, as it is initializing the per cpu locks before the per cpu areas exist.

By moving softirq_early_init() right after setup_per_cpu_areas(), the kernel boots fine.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Clark Williams <clark@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Carsten Emde <cbe@osadl.org>
Cc: vomlehn@texas.net
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/1349362924.6755.18.camel@gandalf.local.home
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* fix printk flush of messages | Frank Rowand | 2013-10-24 | 5 | -54/+3
Reverse preempt-rt-allow-immediate-magic-sysrq-output-for-preempt_rt_full.patch

The problem addressed by that patch does not exist after applying console-make-rt-friendly-update.patch
Signed-off-by: Frank Rowand <frank.rowand@am.sony.com>
Link: http://lkml.kernel.org/r/4FB44EF1.9050809@am.sony.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* fix printk flush of messages | Frank Rowand | 2013-10-24 | 1 | -1/+1
Updates console-make-rt-friendly.patch

#ifdef CONFIG_PREEMPT_RT_FULL, printk() output is never flushed by printk() because:

    # some liberties taken in this pseudo-code to make it easier to follow
    printk()
       vprintk()
          raw_spin_lock(&logbuf_lock)
             # increment preempt_count():
             preempt_disable()
          result = console_trylock_for_printk()
             retval = 0
             # lock will always be false, because preempt_count() will be >= 1
             lock = ... && !preempt_count()
             if (lock)
                retval = 1
             return retval
       if (result)
          console_unlock()
             # this is where the printk() output would be flushed

On system boot some printk() output is flushed because register_console() and tty_open() call console_unlock().

This change also fixes the problem that was previously fixed by preempt-rt-allow-immediate-magic-sysrq-output-for-preempt_rt_full.patch
Signed-off-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Frank <Frank_Rowand@sonyusa.com>
Link: http://lkml.kernel.org/r/4FB44FD0.4090800@am.sony.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* cpu/rt: Fix cpu_hotplug variable initialization | Steven Rostedt | 2013-10-24 | 1 | -4/+0
The commit "cpu/rt: Rework cpu down for PREEMPT_RT" changed the double meaning of the cpu_hotplug.lock, where it was a spinlock for RT and a mutex for non-RT, to just a mutex for both. But the initialization of the variable was not updated to reflect this change.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* cpu/rt: Rework cpu down for PREEMPT_RT | Steven Rostedt | 2013-10-24 | 3 | -40/+285
Bringing a CPU down is a pain with the PREEMPT_RT kernel because tasks can be preempted in many more places than in non-RT. In order to handle per_cpu variables, tasks may be pinned to a CPU for a while, and even sleep. But these tasks need to be off the CPU if that CPU is going down.

Several synchronization methods have been tried, but when stressed they failed. This is a new approach.

A sync_tsk thread is still created, and tasks may still block on a lock when the CPU is going down, but how that works is a bit different. When cpu_down() starts, it will create the sync_tsk and wait on it to inform it that the current tasks that are pinned on the CPU are no longer pinned. But new tasks that are about to be pinned will still be allowed to do so at this time.

Then the notifiers are called. Several notifiers will bring down tasks that will enter these locations. Some of these tasks will take locks of other tasks that are on the CPU. If we don't let those other tasks continue, but make them block until CPU down is done, the tasks that the notifiers are waiting on will never complete, as they are waiting for the locks held by the tasks that are blocked. Thus we still let the tasks pin the CPU until the notifiers are done.

After the notifiers run, we then make new tasks entering the pinned CPU sections grab a mutex and wait. This mutex is now a per CPU mutex in the hotplug_pcp descriptor.

To help things along, a new function in the scheduler code is created called migrate_me(). This function will try to migrate the current task off the CPU that is going down, if possible. When the sync_tsk is created, all tasks will then try to migrate off the CPU going down. There are several cases where this won't work, but it helps in most cases.

After the notifiers are called, if a task can't migrate off but enters the pinned CPU sections, it will be forced to wait on the hotplug_pcp mutex until the CPU down is complete. Then the scheduler will force the migration anyway.

Also, I found that THREAD_BOUND needs to also be accounted for in the pinned CPU, and migrate_disable no longer treats such tasks specially. This helps fix issues with ksoftirqd and workqueue that unbind on CPU down.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* perf: Make swevent hrtimer run in irq instead of softirq | Yong Zhang | 2013-10-24 | 1 | -0/+1
Otherwise we get a deadlock like below:

[ 1044.042749] BUG: scheduling while atomic: ksoftirqd/21/141/0x00010003
[ 1044.042752] INFO: lockdep is turned off.
[ 1044.042754] Modules linked in:
[ 1044.042757] Pid: 141, comm: ksoftirqd/21 Tainted: G W 3.4.0-rc2-rt3-23676-ga723175-dirty #29
[ 1044.042759] Call Trace:
[ 1044.042761] <IRQ> [<ffffffff8107d8e5>] __schedule_bug+0x65/0x80
[ 1044.042770] [<ffffffff8168978c>] __schedule+0x83c/0xa70
[ 1044.042775] [<ffffffff8106bdd2>] ? prepare_to_wait+0x32/0xb0
[ 1044.042779] [<ffffffff81689a5e>] schedule+0x2e/0xa0
[ 1044.042782] [<ffffffff81071ebd>] hrtimer_wait_for_timer+0x6d/0xb0
[ 1044.042786] [<ffffffff8106bb30>] ? wake_up_bit+0x40/0x40
[ 1044.042790] [<ffffffff81071f20>] hrtimer_cancel+0x20/0x40
[ 1044.042794] [<ffffffff8111da0c>] perf_swevent_cancel_hrtimer+0x3c/0x50
[ 1044.042798] [<ffffffff8111da31>] task_clock_event_stop+0x11/0x40
[ 1044.042802] [<ffffffff8111da6e>] task_clock_event_del+0xe/0x10
[ 1044.042805] [<ffffffff8111c568>] event_sched_out+0x118/0x1d0
[ 1044.042809] [<ffffffff8111c649>] group_sched_out+0x29/0x90
[ 1044.042813] [<ffffffff8111ed7e>] __perf_event_disable+0x18e/0x200
[ 1044.042817] [<ffffffff8111c343>] remote_function+0x63/0x70
[ 1044.042821] [<ffffffff810b0aae>] generic_smp_call_function_single_interrupt+0xce/0x120
[ 1044.042826] [<ffffffff81022bc7>] smp_call_function_single_interrupt+0x27/0x40
[ 1044.042831] [<ffffffff8168d50c>] call_function_single_interrupt+0x6c/0x80
[ 1044.042833] <EOI> [<ffffffff811275b0>] ? perf_event_overflow+0x20/0x20
[ 1044.042840] [<ffffffff8168b970>] ? _raw_spin_unlock_irq+0x30/0x70
[ 1044.042844] [<ffffffff8168b976>] ? _raw_spin_unlock_irq+0x36/0x70
[ 1044.042848] [<ffffffff810702e2>] run_hrtimer_softirq+0xc2/0x200
[ 1044.042853] [<ffffffff811275b0>] ? perf_event_overflow+0x20/0x20
[ 1044.042857] [<ffffffff81045265>] __do_softirq_common+0xf5/0x3a0
[ 1044.042862] [<ffffffff81045c3d>] __thread_do_softirq+0x15d/0x200
[ 1044.042865] [<ffffffff81045dda>] run_ksoftirqd+0xfa/0x210
[ 1044.042869] [<ffffffff81045ce0>] ? __thread_do_softirq+0x200/0x200
[ 1044.042873] [<ffffffff81045ce0>] ? __thread_do_softirq+0x200/0x200
[ 1044.042877] [<ffffffff8106b596>] kthread+0xb6/0xc0
[ 1044.042881] [<ffffffff8168b97b>] ? _raw_spin_unlock_irq+0x3b/0x70
[ 1044.042886] [<ffffffff8168d994>] kernel_thread_helper+0x4/0x10
[ 1044.042889] [<ffffffff8107d98c>] ? finish_task_switch+0x8c/0x110
[ 1044.042894] [<ffffffff8168b97b>] ? _raw_spin_unlock_irq+0x3b/0x70
[ 1044.042897] [<ffffffff8168bd5d>] ? retint_restore_args+0xe/0xe
[ 1044.042900] [<ffffffff8106b4e0>] ? kthreadd+0x1e0/0x1e0
[ 1044.042902] [<ffffffff8168d990>] ? gs_change+0xb/0xb

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1341476476-5666-1-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* fs, jbd: pull your plug when waiting for space | Mike Galbraith | 2013-10-24 | 1 | -0/+2
With an -rt kernel, and a heavy sync IO load, tasks can jam up on journal locks without unplugging, which can lead to terminal IO starvation. Unplug and schedule when waiting for space.
Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Theodore Tso <tytso@mit.edu>
Link: http://lkml.kernel.org/r/1341812414.7370.73.camel@marge.simpson.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* slab: Prevent local lock deadlock | Thomas Gleixner | 2013-10-24 | 1 | -4/+22
On RT we avoid the cross cpu function calls and take the per cpu local locks instead. The code missed the fact that taking the local lock on the CPU which runs the code must use the proper local lock functions, not a plain spin_lock(). Otherwise it deadlocks later, when trying to acquire the local lock with the proper function.
Reported-and-tested-by: Chris Pringle <chris.pringle@miranda.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
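Roughly (a sketch; lock_slab_on() matches the splat in the "mm: slab: Fix potential deadlock" entry above, and local_lock_irq_on() is assumed to be the RT tree's cross-CPU local-lock variant):

    static void lock_slab_on(unsigned int cpu)
    {
        if (cpu == smp_processor_id())
            local_lock_irq(slab_lock);          /* keeps the local_lock
                                                   state consistent */
        else
            local_lock_irq_on(slab_lock, cpu);  /* remote cpu's lock */
    }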
* Latency histograms: Detect another yet overlooked sharedprio condition | Carsten Emde | 2013-10-24 | 1 | -0/+3
While waiting for an RT process to be woken up, the previous process may go to wait and switch to another one with the same priority which then becomes current. This condition was not correctly recognized and led to erroneously high latency recordings during periods of low CPU load. This patch correctly marks such latencies as sharedprio and prevents them from being recorded as actual system latency.
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* Disable RT_GROUP_SCHED in PREEMPT_RT_FULL | Carsten Emde | 2013-10-24 | 1 | -0/+1
Strange CPU stalls have been observed in RT when RT_GROUP_SCHED was configured. Disable it for now.
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* Latency histograms: Adjust timer, if already elapsed when programmed | Carsten Emde | 2013-10-24 | 2 | -2/+17
Nothing prevents a programmer from calling clock_nanosleep() with an already elapsed wakeup time in absolute time mode, or with a too small delay in relative time mode. Such timers cannot wake up in time and, thus, should be corrected when entered into the missed timers latency histogram (CONFIG_MISSED_TIMERS_HIST). This patch marks such timers and uses a corrected expiration time.
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* Latency histograms: Cope with backwards running local trace clock | Carsten Emde | 2013-10-24 | 2 | -35/+38
Thanks to the wonders of modern technology, the local trace clock can now run backwards. Since this never happened before, the time difference between now and somewhat earlier was expected to never become negative and, thus, was stored in an unsigned integer variable. Nowadays, we need a signed integer to ensure that the value is stored as an underflow in the related histogram. (In cases where this is not a malfunction, bipolar histograms can be used.)

This patch takes care that all latency variables are represented as signed integers and negative numbers are considered as histogram underflows.

In one of the misbehaving processors, switching to the global clock solved the problem:

    echo global >/sys/kernel/debug/tracing/trace_clock

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* mips-remove-smp-reserve-lock.patch | Thomas Gleixner | 2013-10-24 | 1 | -6/+0
Instead of making the lock raw, remove it, as it protects nothing.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* net,RT: Remove preemption disabling in netif_rx() | Priyanka Jain | 2013-10-24 | 1 | -4/+4
1) enqueue_to_backlog() (called from netif_rx()) should be bound to a particular CPU. This can be achieved by disabling migration; there is no need to disable preemption.

2) Fixes crash "BUG: scheduling while atomic: ksoftirqd" in case of RT. If preemption is disabled, enqueue_to_backlog() is called in atomic context, and if the backlog exceeds its count, kfree_skb() is called. But in RT, kfree_skb() might get scheduled out, so it expects a non-atomic context.

3) When CONFIG_PREEMPT_RT_FULL is not defined, migrate_enable() and migrate_disable() map to preempt_enable() and preempt_disable(), so there is no change in functionality in the non-RT case.

- Replace preempt_enable(), preempt_disable() with migrate_enable(), migrate_disable() respectively
- Replace get_cpu(), put_cpu() with get_cpu_light(), put_cpu_light() respectively
Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Acked-by: Rajan Srivastava <Rajan.Srivastava@freescale.com>
Cc: <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1337227511-2271-1-git-send-email-Priyanka.Jain@freescale.com
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
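As point 3) and the replacement list say, the change is mechanical. A sketch of netif_rx() after it (abbreviated, RPS handling omitted; get_cpu_light()/put_cpu_light() are the RT counterparts that disable migration instead of preemption):

    int netif_rx(struct sk_buff *skb)
    {
        unsigned int qtail;
        int ret, cpu;

        cpu = get_cpu_light();  /* was: get_cpu(), i.e. preempt_disable() */
        ret = enqueue_to_backlog(skb, cpu, &qtail);
        put_cpu_light();        /* was: put_cpu(), i.e. preempt_enable() */

        return ret;
    }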