summaryrefslogtreecommitdiff
path: root/kernel
Commit message (Collapse)AuthorAgeFilesLines
* locking/atomic, kref: Add KREF_INIT()Peter Zijlstra2017-01-141-3/+1
| | | | | | | | | | | | | | | | Since we need to change the implementation, stop exposing internals. Provide KREF_INIT() to allow static initialization of struct kref. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locking/ww_mutex: Add kselftests for ww_mutex stressChris Wilson2017-01-141-0/+254
| | | | | | | | | | | | | | Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <dev@mblankhorst.nl> Cc: Nicolai Hähnle <nhaehnle@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161201114711.28697-8-chris@chris-wilson.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locking/ww_mutex: Add kselftests for resolving ww_mutex cyclic deadlocksChris Wilson2017-01-141-0/+115
| | | | | | | | | | | | | | | | | Check that ww_mutexes can detect cyclic deadlocks (generalised ABBA cycles) and resolve them by lock reordering. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <dev@mblankhorst.nl> Cc: Nicolai Hähnle <nhaehnle@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161201114711.28697-7-chris@chris-wilson.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locking/ww_mutex: Add kselftests for ww_mutex ABBA deadlock detectionChris Wilson2017-01-141-0/+98
| | | | | | | | | | | | | | Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <dev@mblankhorst.nl> Cc: Nicolai Hähnle <nhaehnle@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161201114711.28697-6-chris@chris-wilson.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locking/ww_mutex: Add kselftests for ww_mutex AA deadlock detectionChris Wilson2017-01-141-0/+39
| | | | | | | | | | | | | | Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <dev@mblankhorst.nl> Cc: Nicolai Hähnle <nhaehnle@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161201114711.28697-5-chris@chris-wilson.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locking/ww_mutex: Begin kselftests for ww_mutexChris Wilson2017-01-142-0/+141
| | | | | | | | | | | | | | Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <dev@mblankhorst.nl> Cc: Nicolai Hähnle <nhaehnle@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161201114711.28697-4-chris@chris-wilson.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locking/ww_mutex: Add ww_mutex to locktorture testChris Wilson2017-01-141-0/+73
| | | | | | | | | | | | | | | | | | Although ww_mutexes degenerate into mutexes, it would be useful to torture the deadlock handling between multiple ww_mutexes in addition to torturing the regular mutexes. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <dev@mblankhorst.nl> Cc: Nicolai Hähnle <nhaehnle@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161201114711.28697-3-chris@chris-wilson.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locking/mutex: Initialize mutex_waiter::ww_ctx with poison when debuggingNicolai Hähnle2017-01-141-0/+4
| | | | | | | | | | | | | | | | | | | Help catch cases where mutex_lock is used directly on w/w mutexes, which otherwise result in the w/w tasks reading uninitialized data. Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <dev@mblankhorst.nl> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dri-devel@lists.freedesktop.org Link: http://lkml.kernel.org/r/1482346000-9927-12-git-send-email-nhaehnle@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locking/ww_mutex: Optimize ww-mutexes by yielding to other waiters from ↵Nicolai Hähnle2017-01-141-26/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | optimistic spin Lock stealing is less beneficial for w/w mutexes since we may just end up backing off if we stole from a thread with an earlier acquire stamp that already holds another w/w mutex that we also need. So don't spin optimistically unless we are sure that there is no other waiter that might cause us to back off. Median timings taken of a contention-heavy GPU workload: Before: real 0m52.946s user 0m7.272s sys 1m55.964s After: real 0m53.086s user 0m7.360s sys 1m46.204s This particular workload still spends 20%-25% of CPU in mutex_spin_on_owner according to perf, but my attempts to further reduce this spinning based on various heuristics all lead to an increase in measured wall time despite the decrease in sys time. Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <dev@mblankhorst.nl> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dri-devel@lists.freedesktop.org Link: http://lkml.kernel.org/r/1482346000-9927-11-git-send-email-nhaehnle@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locking/ww_mutex: Re-check ww->ctx in the inner optimistic spin loopNicolai Hähnle2017-01-141-20/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the following scenario, thread #1 should back off its attempt to lock ww1 and unlock ww2 (assuming the acquire context stamps are ordered accordingly). Thread #0 Thread #1 --------- --------- successfully lock ww2 set ww1->base.owner attempt to lock ww1 confirm ww1->ctx == NULL enter mutex_spin_on_owner set ww1->ctx What was likely to happen previously is: attempt to lock ww2 refuse to spin because ww2->ctx != NULL schedule() detect thread #0 is off CPU stop optimistic spin return -EDEADLK unlock ww2 wakeup thread #0 lock ww2 Now, we are more likely to see: detect ww1->ctx != NULL stop optimistic spin return -EDEADLK unlock ww2 successfully lock ww2 ... because thread #1 will stop its optimistic spin as soon as possible. The whole scenario is quite unlikely, since it requires thread #1 to get between thread #0 setting the owner and setting the ctx. But since we're idling here anyway, the additional check is basically free. Found by inspection. Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <dev@mblankhorst.nl> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dri-devel@lists.freedesktop.org Link: http://lkml.kernel.org/r/1482346000-9927-10-git-send-email-nhaehnle@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locking/mutex: Improve inliningPeter Zijlstra2017-01-141-41/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | Instead of inlining __mutex_lock_common() 5 times, once for each {state,ww} variant. Reduce this to two, ww and !ww. Then add __always_inline to mutex_optimistic_spin(), so that that will get inlined all 4 remaining times, for all {waiter,ww} variants. text data bss dec hex filename 6301 0 0 6301 189d defconfig-build/kernel/locking/mutex.o 4053 0 0 4053 fd5 defconfig-build/kernel/locking/mutex.o 4257 0 0 4257 10a1 defconfig-build/kernel/locking/mutex.o This reduces total text size and better separates the ww and !ww mutex code generation. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locking/ww_mutex: Optimize ww-mutexes by waking at most one waiter for ↵Nicolai Hähnle2017-01-141-19/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | backoff when acquiring the lock The wait list is sorted by stamp order, and the only waiting task that may have to back off is the first waiter with a context. The regular slow path does not have to wake any other tasks at all, since all other waiters that would have to back off were either woken up when the waiter was added to the list, or detected the condition before they added themselves. Median timings taken of a contention-heavy GPU workload: Without this series: real 0m59.900s user 0m7.516s sys 2m16.076s With changes up to and including this patch: real 0m52.946s user 0m7.272s sys 1m55.964s Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <dev@mblankhorst.nl> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dri-devel@lists.freedesktop.org Link: http://lkml.kernel.org/r/1482346000-9927-9-git-send-email-nhaehnle@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locking/ww_mutex: Notify waiters that have to back off while adding tasks to ↵Nicolai Hähnle2017-01-141-10/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | wait list While adding our task as a waiter, detect if another task should back off because of us. With this patch, we establish the invariant that the wait list contains at most one (sleeping) waiter with ww_ctx->acquired > 0, and this waiter will be the first waiter with a context. Since only waiters with ww_ctx->acquired > 0 have to back off, this allows us to be much more economical with wakeups. Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <dev@mblankhorst.nl> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dri-devel@lists.freedesktop.org Link: http://lkml.kernel.org/r/1482346000-9927-8-git-send-email-nhaehnle@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locking/ww_mutex: Add waiters in stamp orderNicolai Hähnle2017-01-141-7/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add regular waiters in stamp order. Keep adding waiters that have no context in FIFO order and take care not to starve them. While adding our task as a waiter, back off if we detect that there is a waiter with a lower stamp in front of us. Make sure to call lock_contended even when we back off early. For w/w mutexes, being first in the wait list is only stable when taking the lock without a context. Therefore, the purpose of the first flag is split into two: 'first' remains to indicate whether we want to spin optimistically, while 'handoff' indicates that we should be prepared to accept a handoff. For w/w locking with a context, we always accept handoffs after the first schedule(), to handle the following sequence of events: 1. Task #0 unlocks and hands off to Task #2 which is first in line 2. Task #1 adds itself in front of Task #2 3. Task #2 wakes up and must accept the handoff even though it is no longer first in line Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: =?UTF-8?q?Nicolai=20H=C3=A4hnle?= <Nicolai.Haehnle@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <dev@mblankhorst.nl> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dri-devel@lists.freedesktop.org Link: http://lkml.kernel.org/r/1482346000-9927-7-git-send-email-nhaehnle@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locking/ww_mutex: Remove the __ww_mutex_lock*() inline wrappersNicolai Hähnle2017-01-141-8/+8
| | | | | | | | | | | | | | | | | | | | | Keep the documentation in the header file since there is no good place for it in mutex.c: there are two rather different implementations with different EXPORT_SYMBOLs for each function. Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: =?UTF-8?q?Nicolai=20H=C3=A4hnle?= <Nicolai.Haehnle@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <dev@mblankhorst.nl> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dri-devel@lists.freedesktop.org Link: http://lkml.kernel.org/r/1482346000-9927-6-git-send-email-nhaehnle@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locking/ww_mutex: Set use_ww_ctx even when locking without a contextNicolai Hähnle2017-01-141-12/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We will add a new field to struct mutex_waiter. This field must be initialized for all waiters if any waiter uses the ww_use_ctx path. So there is a trade-off: Keep ww_mutex locking without a context on the faster non-use_ww_ctx path, at the cost of adding the initialization to all mutex locks (including non-ww_mutexes), or avoid the additional cost for non-ww_mutex locks, at the cost of adding additional checks to the use_ww_ctx path. We take the latter choice. It may be worth eliminating the users of ww_mutex_lock(lock, NULL), but there are a lot of them. Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <dev@mblankhorst.nl> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dri-devel@lists.freedesktop.org Link: http://lkml.kernel.org/r/1482346000-9927-5-git-send-email-nhaehnle@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locking/ww_mutex: Extract stamp comparison to __ww_mutex_stamp_after()Nicolai Hähnle2017-01-141-2/+8
| | | | | | | | | | | | | | | | | | The function will be re-used in subsequent patches. Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <dev@mblankhorst.nl> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dri-devel@lists.freedesktop.org Link: http://lkml.kernel.org/r/1482346000-9927-4-git-send-email-nhaehnle@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locking/mutex: Fix mutex handoffPeter Zijlstra2017-01-142-58/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While reviewing the ww_mutex patches, I noticed that it was still possible to (incorrectly) succeed for (incorrect) code like: mutex_lock(&a); mutex_lock(&a); This was possible if the second mutex_lock() would block (as expected) but then receive a spurious wakeup. At that point it would find itself at the front of the queue, request a handoff and instantly claim ownership and continue, since owner would point to itself. Avoid this scenario and simplify the code by introducing a third low bit to signal handoff pickup. So once we request handoff, unlock clears the handoff bit and sets the pickup bit along with the new owner. This also removes the need for the .handoff argument to __mutex_trylock(), since that becomes superfluous with PICKUP. In order to guarantee enough low bits, ensure task_struct alignment is at least L1_CACHE_BYTES (which seems a good ideal regardless). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 9d659ae14b54 ("locking/mutex: Add lock handoff to avoid starvation") Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locking/percpu-rwsem: Replace waitqueue with rcuwaitDavidlohr Bueso2017-01-141-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | The use of any kind of wait queue is an overkill for pcpu-rwsems. While one option would be to use the less heavy simple (swait) flavor, this is still too much for what pcpu-rwsems needs. For one, we do not care about any sort of queuing in that the only (rare) time writers (and readers, for that matter) are queued is when trying to acquire the regular contended rw_sem. There cannot be any further queuing as writers are serialized by the rw_sem in the first place. Given that percpu_down_write() must not be called after exit_notify(), we can replace the bulky waitqueue with rcuwait such that a writer can wait for its turn to take the lock. As such, we can avoid the queue handling and locking overhead. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Link: http://lkml.kernel.org/r/1484148146-14210-3-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched/wait, RCU: Introduce rcuwait machineryDavidlohr Bueso2017-01-141-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rcuwait provides support for (single) RCU-safe task wait/wake functionality, with the caveat that it must not be called after exit_notify(), such that we avoid racing with rcu delayed_put_task_struct callbacks, task_struct being rcu unaware in this context -- for which we similarly have task_rcu_dereference() magic, but with different return semantics, which can conflict with the wakeup side. The interfaces are quite straightforward: rcuwait_wait_event() rcuwait_wake_up() More details are in the comments, but it's perhaps worth mentioning at least, that users must provide proper serialization when waiting on a condition, and avoid corrupting a concurrent waiter. Also care must be taken between the task and the condition for when calling the wakeup -- we cannot miss wakeups. When porting users, this is for example, a given when using waitqueues in that everything is done under the q->lock. As such, it can remove sources of non preemptable unbounded work for realtime. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Link: http://lkml.kernel.org/r/1484148146-14210-2-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched/core: Remove set_task_state()Davidlohr Bueso2017-01-145-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a nasty interface and setting the state of a foreign task must not be done. As of the following commit: be628be0956 ("bcache: Make gc wakeup sane, remove set_task_state()") ... everyone in the kernel calls set_task_state() with current, allowing the helper to be removed. However, as the comment indicates, it is still around for those archs where computing current is more expensive than using a pointer, at least in theory. An important arch that is affected is arm64, however this has been addressed now [1] and performance is up to par making no difference with either calls. Of all the callers, if any, it's the locking bits that would care most about this -- ie: we end up passing a tsk pointer to a lot of the lock slowpath, and setting ->state on that. The following numbers are based on two tests: a custom ad-hoc microbenchmark that just measures latencies (for ~65 million calls) between get_task_state() vs get_current_state(). Secondly for a higher overview, an unlink microbenchmark was used, which pounds on a single file with open, close,unlink combos with increasing thread counts (up to 4x ncpus). While the workload is quite unrealistic, it does contend a lot on the inode mutex or now rwsem. [1] https://lkml.kernel.org/r/1483468021-8237-1-git-send-email-mark.rutland@arm.com == 1. x86-64 == Avg runtime set_task_state(): 601 msecs Avg runtime set_current_state(): 552 msecs vanilla dirty Hmean unlink1-processes-2 36089.26 ( 0.00%) 38977.33 ( 8.00%) Hmean unlink1-processes-5 28555.01 ( 0.00%) 29832.55 ( 4.28%) Hmean unlink1-processes-8 37323.75 ( 0.00%) 44974.57 ( 20.50%) Hmean unlink1-processes-12 43571.88 ( 0.00%) 44283.01 ( 1.63%) Hmean unlink1-processes-21 34431.52 ( 0.00%) 38284.45 ( 11.19%) Hmean unlink1-processes-30 34813.26 ( 0.00%) 37975.17 ( 9.08%) Hmean unlink1-processes-48 37048.90 ( 0.00%) 39862.78 ( 7.59%) Hmean unlink1-processes-79 35630.01 ( 0.00%) 36855.30 ( 3.44%) Hmean unlink1-processes-110 36115.85 ( 0.00%) 39843.91 ( 10.32%) Hmean unlink1-processes-141 32546.96 ( 0.00%) 35418.52 ( 8.82%) Hmean unlink1-processes-172 34674.79 ( 0.00%) 36899.21 ( 6.42%) Hmean unlink1-processes-203 37303.11 ( 0.00%) 36393.04 ( -2.44%) Hmean unlink1-processes-224 35712.13 ( 0.00%) 36685.96 ( 2.73%) == 2. ppc64le == Avg runtime set_task_state(): 938 msecs Avg runtime set_current_state: 940 msecs vanilla dirty Hmean unlink1-processes-2 19269.19 ( 0.00%) 30704.50 ( 59.35%) Hmean unlink1-processes-5 20106.15 ( 0.00%) 21804.15 ( 8.45%) Hmean unlink1-processes-8 17496.97 ( 0.00%) 17243.28 ( -1.45%) Hmean unlink1-processes-12 14224.15 ( 0.00%) 17240.21 ( 21.20%) Hmean unlink1-processes-21 14155.66 ( 0.00%) 15681.23 ( 10.78%) Hmean unlink1-processes-30 14450.70 ( 0.00%) 15995.83 ( 10.69%) Hmean unlink1-processes-48 16945.57 ( 0.00%) 16370.42 ( -3.39%) Hmean unlink1-processes-79 15788.39 ( 0.00%) 14639.27 ( -7.28%) Hmean unlink1-processes-110 14268.48 ( 0.00%) 14377.40 ( 0.76%) Hmean unlink1-processes-141 14023.65 ( 0.00%) 16271.69 ( 16.03%) Hmean unlink1-processes-172 13417.62 ( 0.00%) 16067.55 ( 19.75%) Hmean unlink1-processes-203 15293.08 ( 0.00%) 15440.40 ( 0.96%) Hmean unlink1-processes-234 13719.32 ( 0.00%) 16190.74 ( 18.01%) Hmean unlink1-processes-265 16400.97 ( 0.00%) 16115.22 ( -1.74%) Hmean unlink1-processes-296 14388.60 ( 0.00%) 16216.13 ( 12.70%) Hmean unlink1-processes-320 15771.85 ( 0.00%) 15905.96 ( 0.85%) x86-64 (known to be fast for get_current()/this_cpu_read_stable() caching) and ppc64 (with paca) show similar improvements in the unlink microbenches. The small delta for ppc64 (2ms), does not represent the gains on the unlink runs. In the case of x86, there was a decent amount of variation in the latency runs, but always within a 20 to 50ms increase), ppc was more constant. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Cc: mark.rutland@arm.com Link: http://lkml.kernel.org/r/1483479794-14013-5-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* kernel/locking: Compute 'current' directlyDavidlohr Bueso2017-01-144-29/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch effectively replaces the tsk pointer dereference (which is obviously == current), to directly use get_current() macro. This is to make the removal of setting foreign task states smoother and painfully obvious. Performance win on some archs such as x86-64 and ppc64. On a microbenchmark that calls set_task_state() vs set_current_state() and an inode rwsem pounding benchmark doing unlink: == 1. x86-64 == Avg runtime set_task_state(): 601 msecs Avg runtime set_current_state(): 552 msecs vanilla dirty Hmean unlink1-processes-2 36089.26 ( 0.00%) 38977.33 ( 8.00%) Hmean unlink1-processes-5 28555.01 ( 0.00%) 29832.55 ( 4.28%) Hmean unlink1-processes-8 37323.75 ( 0.00%) 44974.57 ( 20.50%) Hmean unlink1-processes-12 43571.88 ( 0.00%) 44283.01 ( 1.63%) Hmean unlink1-processes-21 34431.52 ( 0.00%) 38284.45 ( 11.19%) Hmean unlink1-processes-30 34813.26 ( 0.00%) 37975.17 ( 9.08%) Hmean unlink1-processes-48 37048.90 ( 0.00%) 39862.78 ( 7.59%) Hmean unlink1-processes-79 35630.01 ( 0.00%) 36855.30 ( 3.44%) Hmean unlink1-processes-110 36115.85 ( 0.00%) 39843.91 ( 10.32%) Hmean unlink1-processes-141 32546.96 ( 0.00%) 35418.52 ( 8.82%) Hmean unlink1-processes-172 34674.79 ( 0.00%) 36899.21 ( 6.42%) Hmean unlink1-processes-203 37303.11 ( 0.00%) 36393.04 ( -2.44%) Hmean unlink1-processes-224 35712.13 ( 0.00%) 36685.96 ( 2.73%) == 2. ppc64le == Avg runtime set_task_state(): 938 msecs Avg runtime set_current_state: 940 msecs vanilla dirty Hmean unlink1-processes-2 19269.19 ( 0.00%) 30704.50 ( 59.35%) Hmean unlink1-processes-5 20106.15 ( 0.00%) 21804.15 ( 8.45%) Hmean unlink1-processes-8 17496.97 ( 0.00%) 17243.28 ( -1.45%) Hmean unlink1-processes-12 14224.15 ( 0.00%) 17240.21 ( 21.20%) Hmean unlink1-processes-21 14155.66 ( 0.00%) 15681.23 ( 10.78%) Hmean unlink1-processes-30 14450.70 ( 0.00%) 15995.83 ( 10.69%) Hmean unlink1-processes-48 16945.57 ( 0.00%) 16370.42 ( -3.39%) Hmean unlink1-processes-79 15788.39 ( 0.00%) 14639.27 ( -7.28%) Hmean unlink1-processes-110 14268.48 ( 0.00%) 14377.40 ( 0.76%) Hmean unlink1-processes-141 14023.65 ( 0.00%) 16271.69 ( 16.03%) Hmean unlink1-processes-172 13417.62 ( 0.00%) 16067.55 ( 19.75%) Hmean unlink1-processes-203 15293.08 ( 0.00%) 15440.40 ( 0.96%) Hmean unlink1-processes-234 13719.32 ( 0.00%) 16190.74 ( 18.01%) Hmean unlink1-processes-265 16400.97 ( 0.00%) 16115.22 ( -1.74%) Hmean unlink1-processes-296 14388.60 ( 0.00%) 16216.13 ( 12.70%) Hmean unlink1-processes-320 15771.85 ( 0.00%) 15905.96 ( 0.85%) Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Cc: mark.rutland@arm.com Link: http://lkml.kernel.org/r/1483479794-14013-4-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* kernel/exit: Compute 'current' directlyDavidlohr Bueso2017-01-141-11/+11
| | | | | | | | | | | | | | | | | | | | This patch effectively replaces the tsk pointer dereference (which is obviously == current), to directly use get_current() macro. In this case, do_exit() always passes current to exit_mm(), hence we can simply get rid of the argument. This is also a performance win on some archs such as x86-64 and ppc64 -- arm64 is no longer an issue. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave@stgolabs.net Cc: mark.rutland@arm.com Link: http://lkml.kernel.org/r/1483479794-14013-2-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locking/pvqspinlock: Don't wait if vCPU is preemptedPan Xinhui2017-01-121-1/+1
| | | | | | | | | | | | | | If prev node is not in running state or its vCPU is preempted, we can give up our vCPU slices in pv_wait_node() ASAP. Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: longman@redhat.com Link: http://lkml.kernel.org/r/1484035006-6787-1-git-send-email-xinhui.pan@linux.vnet.ibm.com [ Fixed typos in the changelog, removed ugly linebreak from the code. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
* locking/spinlocks: Remove the unused spin_lock_bh_nested() APIWaiman Long2017-01-121-8/+0
| | | | | | | | | | | | | The spin_lock_bh_nested() API is defined but is not used anywhere in the kernel. So all spin_lock_bh_nested() and related APIs are now removed. Signed-off-by: Waiman Long <longman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1483975612-16447-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* signal: protect SIGNAL_UNKILLABLE from unintentional clearing.Jamie Iles2017-01-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 00cd5c37afd5 ("ptrace: permit ptracing of /sbin/init") we can now trace init processes. init is initially protected with SIGNAL_UNKILLABLE which will prevent fatal signals such as SIGSTOP, but there are a number of paths during tracing where SIGNAL_UNKILLABLE can be implicitly cleared. This can result in init becoming stoppable/killable after tracing. For example, running: while true; do kill -STOP 1; done & strace -p 1 and then stopping strace and the kill loop will result in init being left in state TASK_STOPPED. Sending SIGCONT to init will resume it, but init will now respond to future SIGSTOP signals rather than ignoring them. Make sure that when setting SIGNAL_STOP_CONTINUED/SIGNAL_STOP_STOPPED that we don't clear SIGNAL_UNKILLABLE. Link: http://lkml.kernel.org/r/20170104122017.25047-1-jamie.iles@oracle.com Signed-off-by: Jamie Iles <jamie.iles@oracle.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}Dan Williams2017-01-101-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both arch_add_memory() and arch_remove_memory() expect a single threaded context. For example, arch/x86/mm/init_64.c::kernel_physical_mapping_init() does not hold any locks over this check and branch: if (pgd_val(*pgd)) { pud = (pud_t *)pgd_page_vaddr(*pgd); paddr_last = phys_pud_init(pud, __pa(vaddr), __pa(vaddr_end), page_size_mask); continue; } pud = alloc_low_page(); paddr_last = phys_pud_init(pud, __pa(vaddr), __pa(vaddr_end), page_size_mask); The result is that two threads calling devm_memremap_pages() simultaneously can end up colliding on pgd initialization. This leads to crash signatures like the following where the loser of the race initializes the wrong pgd entry: BUG: unable to handle kernel paging request at ffff888ebfff0000 IP: memcpy_erms+0x6/0x10 PGD 2f8e8fc067 PUD 0 /* <---- Invalid PUD */ Oops: 0000 [#1] SMP DEBUG_PAGEALLOC CPU: 54 PID: 3818 Comm: systemd-udevd Not tainted 4.6.7+ #13 task: ffff882fac290040 ti: ffff882f887a4000 task.ti: ffff882f887a4000 RIP: memcpy_erms+0x6/0x10 [..] Call Trace: ? pmem_do_bvec+0x205/0x370 [nd_pmem] ? blk_queue_enter+0x3a/0x280 pmem_rw_page+0x38/0x80 [nd_pmem] bdev_read_page+0x84/0xb0 Hold the standard memory hotplug mutex over calls to arch_{add,remove}_memory(). Fixes: 41e94a851304 ("add devm_memremap_pages") Link: http://lkml.kernel.org/r/148357647831.9498.12606007370121652979.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* bpf: do not use KMALLOC_SHIFT_MAXMichal Hocko2017-01-102-2/+2
| | | | | | | | | | | | | | | | | | | | | | | Commit 01b3f52157ff ("bpf: fix allocation warnings in bpf maps and integer overflow") has added checks for the maximum allocateable size. It (ab)used KMALLOC_SHIFT_MAX for that purpose. While this is not incorrect it is not very clean because we already have KMALLOC_MAX_SIZE for this very reason so let's change both checks to use KMALLOC_MAX_SIZE instead. The original motivation for using KMALLOC_SHIFT_MAX was to work around an incorrect KMALLOC_MAX_SIZE which could lead to allocation warnings but it is no longer needed since "slab: make sure that KMALLOC_MAX_SIZE will fit into MAX_ORDER". Link: http://lkml.kernel.org/r/20161220130659.16461-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'stable-4.10' of git://git.infradead.org/users/pcmoore/auditLinus Torvalds2017-01-051-4/+14
|\ | | | | | | | | | | | | | | | | | | | | | | | | Pull audit fixes from Paul Moore: "Two small fixes relating to audit's use of fsnotify. The first patch plugs a leak and the second fixes some lock shenanigans. The patches are small and I banged on this for an afternoon with our testsuite and didn't see anything odd" * 'stable-4.10' of git://git.infradead.org/users/pcmoore/audit: audit: Fix sleep in atomic fsnotify: Remove fsnotify_duplicate_mark()
| * audit: Fix sleep in atomicJan Kara2017-01-031-4/+14
| | | | | | | | | | | | | | | | | | | | | | | | Audit tree code was happily adding new notification marks while holding spinlocks. Since fsnotify_add_mark() acquires group->mark_mutex this can lead to sleeping while holding a spinlock, deadlocks due to lock inversion, and probably other fun. Fix the problem by acquiring group->mark_mutex earlier. CC: Paul Moore <paul@paul-moore.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Paul Moore <paul@paul-moore.com>
| * fsnotify: Remove fsnotify_duplicate_mark()Jan Kara2016-12-231-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are only two calls sites of fsnotify_duplicate_mark(). Those are in kernel/audit_tree.c and both are bogus. Vfsmount pointer is unused for audit tree, inode pointer and group gets set in fsnotify_add_mark_locked() later anyway, mask and free_mark are already set in alloc_chunk(). In fact, calling fsnotify_duplicate_mark() is actively harmful because following fsnotify_add_mark_locked() will leak group reference by overwriting the group pointer. So just remove the two calls to fsnotify_duplicate_mark() and the function. Signed-off-by: Jan Kara <jack@suse.cz> [PM: line wrapping to fit in 80 chars] Signed-off-by: Paul Moore <paul@paul-moore.com>
* | smp/hotplug: Undo tglxs brainfartThomas Gleixner2016-12-261-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The attempt to prevent overwriting an active state resulted in a disaster which effectively disables all dynamically allocated hotplug states. Cleanup the mess. Fixes: dc280d936239 ("cpu/hotplug: Prevent overwriting of callbacks") Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2016-12-2524-147/+144
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer type cleanups from Thomas Gleixner: "This series does a tree wide cleanup of types related to timers/timekeeping. - Get rid of cycles_t and use a plain u64. The type is not really helpful and caused more confusion than clarity - Get rid of the ktime union. The union has become useless as we use the scalar nanoseconds storage unconditionally now. The 32bit timespec alike storage got removed due to the Y2038 limitations some time ago. That leaves the odd union access around for no reason. Clean it up. Both changes have been done with coccinelle and a small amount of manual mopping up" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ktime: Get rid of ktime_equal() ktime: Cleanup ktime_set() usage ktime: Get rid of the union clocksource: Use a plain u64 instead of cycle_t
| * | ktime: Cleanup ktime_set() usageThomas Gleixner2016-12-255-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ktime_set(S,N) was required for the timespec storage type and is still useful for situations where a Seconds and Nanoseconds part of a time value needs to be converted. For anything where the Seconds argument is 0, this is pointless and can be replaced with a simple assignment. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
| * | ktime: Get rid of the unionThomas Gleixner2016-12-2513-88/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ktime is a union because the initial implementation stored the time in scalar nanoseconds on 64 bit machine and in a endianess optimized timespec variant for 32bit machines. The Y2038 cleanup removed the timespec variant and switched everything to scalar nanoseconds. The union remained, but become completely pointless. Get rid of the union and just keep ktime_t as simple typedef of type s64. The conversion was done with coccinelle and some manual mopping up. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
| * | clocksource: Use a plain u64 instead of cycle_tThomas Gleixner2016-12-2510-52/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no point in having an extra type for extra confusion. u64 is unambiguous. Conversion was done with the following coccinelle script: @rem@ @@ -typedef u64 cycle_t; @fix@ typedef cycle_t; @@ -cycle_t +u64 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org>
* | | Merge branch 'smp-urgent-for-linus' of ↵Linus Torvalds2016-12-251-184/+51
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP hotplug notifier removal from Thomas Gleixner: "This is the final cleanup of the hotplug notifier infrastructure. The series has been reintgrated in the last two days because there came a new driver using the old infrastructure via the SCSI tree. Summary: - convert the last leftover drivers utilizing notifiers - fixup for a completely broken hotplug user - prevent setup of already used states - removal of the notifiers - treewide cleanup of hotplug state names - consolidation of state space There is a sphinx based documentation pending, but that needs review from the documentation folks" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/armada-xp: Consolidate hotplug state space irqchip/gic: Consolidate hotplug state space coresight/etm3/4x: Consolidate hotplug state space cpu/hotplug: Cleanup state names cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions staging/lustre/libcfs: Convert to hotplug state machine scsi/bnx2i: Convert to hotplug state machine scsi/bnx2fc: Convert to hotplug state machine cpu/hotplug: Prevent overwriting of callbacks x86/msr: Remove bogus cleanup from the error path bus: arm-ccn: Prevent hotplug callback leak perf/x86/intel/cstate: Prevent hotplug callback leak ARM/imx/mmcd: Fix broken cpu hotplug handling scsi: qedi: Convert to hotplug state machine
| * | cpu/hotplug: Remove obsolete cpu hotplug register/unregister functionsThomas Gleixner2016-12-251-138/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | hotcpu_notifier(), cpu_notifier(), __hotcpu_notifier(), __cpu_notifier(), register_hotcpu_notifier(), register_cpu_notifier(), __register_hotcpu_notifier(), __register_cpu_notifier(), unregister_hotcpu_notifier(), unregister_cpu_notifier(), __unregister_hotcpu_notifier(), __unregister_cpu_notifier() are unused now. Remove them and all related code. Remove also the now pointless cpu notifier error injection mechanism. The states can be executed step by step and error rollback is the same as cpu down, so any state transition can be tested w/o requiring the notifier error injection. Some CPU hotplug states are kept as they are (ab)used for hotplug state tracking. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20161221192112.005642358@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | cpu/hotplug: Prevent overwriting of callbacksThomas Gleixner2016-12-251-46/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Developers manage to overwrite states blindly without thought. That's fatal and hard to debug. Add sanity checks to make it fail. This requries to restructure the code so that the dynamic state allocation happens in the same lock protected section as the actual store. Otherwise the previous assignment of 'Reserved' to the name field would trigger the overwrite check. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20161221192111.675234535@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds2016-12-2429-29/+29
|/ / | | | | | | | | | | | | | | | | | | | | | | | | This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2016-12-231-1/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "On the kernel side there's two x86 PMU driver fixes and a uprobes fix, plus on the tooling side there's a number of fixes and some late updates" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) perf sched timehist: Fix invalid period calculation perf sched timehist: Remove hardcoded 'comm_width' check at print_summary perf sched timehist: Enlarge default 'comm_width' perf sched timehist: Honour 'comm_width' when aligning the headers perf/x86: Fix overlap counter scheduling bug perf/x86/pebs: Fix handling of PEBS buffer overflows samples/bpf: Move open_raw_sock to separate header samples/bpf: Remove perf_event_open() declaration samples/bpf: Be consistent with bpf_load_program bpf_insn parameter tools lib bpf: Add bpf_prog_{attach,detach} samples/bpf: Switch over to libbpf perf diff: Do not overwrite valid build id perf annotate: Don't throw error for zero length symbols perf bench futex: Fix lock-pi help string perf trace: Check if MAP_32BIT is defined (again) samples/bpf: Make perf_event_read() static uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint creation samples/bpf: Make samples more libbpf-centric tools lib bpf: Add flags to bpf_create_map() tools lib bpf: use __u32 from linux/types.h ...
| * | uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint ↵Marcin Nowakowski2016-12-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | creation Commit: 72e6ae285a1d ('ARM: 8043/1: uprobes need icache flush after xol write' ... has introduced an arch-specific method to ensure all caches are flushed appropriately after an instruction is written to an XOL page. However, when the XOL area is created and the out-of-line breakpoint instruction is copied, caches are not flushed at all and stale data may be found in icache. Replace a simple copy_to_page() with arch_uprobe_copy_ixol() to allow the arch to ensure all caches are updated accordingly. This change fixes uprobes on MIPS InterAptiv (tested on Creator Ci40). Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Victor Kamensky <victor.kamensky@linaro.org> Cc: linux-mips@linux-mips.org Link: http://lkml.kernel.org/r/1481625657-22850-1-git-send-email-marcin.nowakowski@imgtec.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2016-12-231-0/+3
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull final vfs updates from Al Viro: "Assorted cleanups and fixes all over the place" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: sg_write()/bsg_write() is not fit to be called under KERNEL_DS ufs: fix function declaration for ufs_truncate_blocks fs: exec: apply CLOEXEC before changing dumpable task flags seq_file: reset iterator to first record for zero offset vfs: fix isize/pos/len checks for reflink & dedupe [iov_iter] fix iterate_all_kinds() on empty iterators move aio compat to fs/aio.c reorganize do_make_slave() clone_private_mount() doesn't need to touch namespace_sem remove a bogus claim about namespace_sem being held by callers of mnt_alloc_id()
| * | | move aio compat to fs/aio.cAl Viro2016-12-221-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... and fix the minor buglet in compat io_submit() - native one kills ioctx as cleanup when put_user() fails. Get rid of bogus compat_... in !CONFIG_AIO case, while we are at it - they should simply fail with ENOSYS, same as for native counterparts. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | kcov: make kcov work properly with KASLR enabledAlexander Popov2016-12-201-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Subtract KASLR offset from the kernel addresses reported by kcov. Tested on x86_64 and AArch64 (Hikey LeMaker). Link: http://lkml.kernel.org/r/1481417456-28826-3-git-send-email-alex.popov@linux.com Signed-off-by: Alexander Popov <alex.popov@linux.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Rob Herring <robh@kernel.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Jon Masters <jcm@redhat.com> Cc: David Daney <david.daney@cavium.com> Cc: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Nicolai Stange <nicstange@gmail.com> Cc: James Morse <james.morse@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | ima: on soft reboot, save the measurement listMimi Zohar2016-12-201-0/+4
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The TPM PCRs are only reset on a hard reboot. In order to validate a TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement list of the running kernel must be saved and restored on boot. This patch uses the kexec buffer passing mechanism to pass the serialized IMA binary_runtime_measurements to the next kernel. Link: http://lkml.kernel.org/r/1480554346-29071-7-git-send-email-zohar@linux.vnet.ibm.com Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Dmitry Kasatkin <dmitry.kasatkin@gmail.com> Cc: Andreas Steffen <andreas.steffen@strongswan.org> Cc: Josh Sklar <sklar@linux.vnet.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2016-12-181-0/+3
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "Prevent NULL pointer dereferencing in the tick broadcast code. Old bug, which got unearthed by the hotplug ordering problem" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick/broadcast: Prevent NULL pointer dereference
| * | | tick/broadcast: Prevent NULL pointer dereferenceThomas Gleixner2016-12-151-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a disfunctional timer, e.g. dummy timer, is installed, the tick core tries to setup the broadcast timer. If no broadcast device is installed, the kernel crashes with a NULL pointer dereference in tick_broadcast_setup_oneshot() because the function has no sanity check. Reported-by: Mason <slash.tmp@free.fr> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Cc: Richard Cochran <rcochran@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org>, Cc: Sebastian Frias <sf84@laposte.net> Cc: Thibaud Cornic <thibaud_cornic@sigmadesigns.com> Cc: Robin Murphy <robin.murphy@arm.com> Link: http://lkml.kernel.org/r/1147ef90-7877-e4d2-bb2b-5c4fa8d3144b@free.fr
* | | | Merge branch 'smp-urgent-for-linus' of ↵Linus Torvalds2016-12-181-1/+5
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP hotplug fixes from Thomas Gleixner: "Two fixlets for cpu hotplug: - Fix a subtle ordering problem with the dummy timer. This happened to work before the conversion by chance due to initcall ordering. - Fix the function comment for __cpuhp_setup_state()" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Clarify description of __cpuhp_setup_state() return value clocksource/dummy_timer: Move hotplug callback after the real timers
| * | | | cpu/hotplug: Clarify description of __cpuhp_setup_state() return valueBoris Ostrovsky2016-12-151-1/+5
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When invoked with CPUHP_AP_ONLINE_DYN state __cpuhp_setup_state() is expected to return positive value which is the hotplug state that the routine assigns. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: linux-pm@vger.kernel.org Cc: viresh.kumar@linaro.org Cc: bigeasy@linutronix.de Cc: rjw@rjwysocki.net Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1481814058-4799-2-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>