Commit message    Author    Age    Files    Lines
* Add linux-next specific files for 20160805next-20160805Stephen Rothwell2016-08-055-0/+3886
| | | | Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
* Merge branch 'akpm/master'Stephen Rothwell2016-08-053-57/+85
|\
| * ipc/sem.c: fix complex_count vs. simple op raceManfred Spraul2016-08-052-55/+84
| |
    Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a race:

    sem_lock has a fast path that allows parallel simple operations.
    There are two reasons why a simple operation cannot run in parallel:
     - a non-simple operation is ongoing (sma->sem_perm.lock held)
     - a complex operation is sleeping (sma->complex_count != 0)

    As both facts are stored independently, a thread can bypass the current
    checks by sleeping in the right positions.  See below for more details
    (or kernel bugzilla 105651).

    The patch fixes that by creating one variable (complex_mode) that
    tracks both reasons why parallel operations are not possible.  The
    patch also updates stale documentation regarding the locking.

    With regards to stable kernels: the patch is required for all kernels
    that include commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()")
    (3.10?).  The alternative is to revert the patch that introduced the
    race.  The patch is safe for backporting, i.e. it makes no assumptions
    about memory barriers in spin_unlock_wait().

    Background: here is the race in the current implementation:

    Thread A: (simple op)
     - does the first "sma->complex_count == 0" test

    Thread B: (complex op)
     - does sem_lock(): this includes an array scan.  But the scan can't
       find Thread A, because Thread A does not own sem->lock yet.
     - the thread does the operation, increases complex_count,
       drops sem_lock, sleeps

    Thread A:
     - spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock)
     - sleeps before the complex_count test

    Thread C: (complex op)
     - does sem_lock (no array scan, complex_count==1)
     - wakes up Thread B
     - decrements complex_count

    Thread A:
     - does the complex_count test

    Bug: now both Thread A and Thread C operate on the same array, without
    any synchronization.

    Fixes: 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()")
    Link: http://lkml.kernel.org/r/1469123695-5661-1-git-send-email-manfred@colorfullife.com
    Reported-by: <felixh@informatik.uni-bremen.de>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: <1vier1@web.de>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
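    A minimal sketch of the single-flag idea described above; the structure
    and helper names here are illustrative assumptions, not the actual diff:

        /* "A complex op is running" and "a complex op is sleeping" are
         * folded into one flag, so the simple-op fast path has exactly one
         * condition to test instead of two independently stored facts.
         * (Kernel headers such as <linux/spinlock.h> assumed.)
         */
        struct sem_array_sketch {
                spinlock_t      global_lock;    /* stands in for sem_perm.lock */
                bool            complex_mode;   /* complex op running or sleeping */
        };

        static bool simple_op_may_take_fast_path(struct sem_array_sketch *sma)
        {
                /* Pairs with the barrier/store issued when a complex op
                 * enters complex_mode on the slow path.
                 */
                return !READ_ONCE(sma->complex_mode);
        }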
| * drivers/net/wireless/intel/iwlwifi/dvm/calib.c: simplify min() expressionAndrew Morton2016-08-051-2/+1
|/
    This cast is no longer needed.

    Cc: Johannes Berg <johannes.berg@intel.com>
    Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
    Cc: Intel Linux Wireless <linuxwifi@intel.com>
    Cc: Kalle Valo <kvalo@codeaurora.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* Merge branch 'akpm-current/current'Stephen Rothwell2016-08-0553-385/+888
|\
| * ipc/msg.c: use freezable blocking callYogesh Gaur2016-08-051-1/+2
| |
    Avoid waking up every thread sleeping in a msgrcv call during suspend
    and resume by calling a freezable blocking call.  Previous patches
    modified the freezer to avoid sending wakeups to threads that are
    blocked in freezable blocking calls.

    Ref: https://lkml.org/lkml/2013/5/1/424

    Backtrace:
    [<c03e3924>] (__schedule+0x0/0x5d8) from [<c03e3f88>] (schedule+0x8c/0x90)
    [<c03e3efc>] (schedule+0x0/0x90) from [<c01ef9f8>] (do_msgrcv+0x2e0/0x368)
    [<c01ef718>] (do_msgrcv+0x0/0x368) from [<c01efaac>] (SyS_msgrcv+0x2c/0x38)
    [<c01efa80>] (SyS_msgrcv+0x0/0x38) from [<c001a180>] (ret_fast_syscall+0x0/0x48)
    tPlay0Cb2       R running      0   297   204 0x00000001

    This call was selected to be converted to a freezable call because it
    doesn't hold any locks or release any resources when interrupted that
    might be needed by another freezing task or a kernel driver during
    suspend, and is a common site where idle userspace tasks are blocked.

    Signed-off-by: Yogesh Gaur <yn.gaur@samsung.com>
    Signed-off-by: Manjeet Pawar <manjeet.p@samsung.com>
    Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
    Reviewed-by: Ajeet Yadav <ajeet.y@samsung.com>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * ipc/msg.c: msgsnd: use freezable blocking callManinder Singh2016-08-051-1/+1
| |
    When a task is stuck in the interruptible or uninterruptible state,
    waking it up fails.  If the wakeup fails, the suspend operation fails,
    and every process that was already sent to the freezer at that moment
    gets woken up again.  A more robust implementation would have the
    kernel retry the suspend operation after a fixed interval for a fixed
    number of retries, but, following the approach suggested by Colin
    Cross (https://lkml.org/lkml/2013/5/1/424), the system calls where the
    issue has been seen instead mark the task with PF_FREEZER_SKIP.  While
    testing repeated suspend/resume cycles we hit this problem, hence the
    suggested changes for msgsnd and msgrcv.

    Avoid waking up every thread sleeping in a msgsnd call during suspend
    and resume by calling a freezable blocking call.  Previous patches
    modified the freezer to avoid sending wakeups to threads that are
    blocked in freezable blocking calls.

    This call was selected to be converted to a freezable call because it
    doesn't hold any locks or release any resources when interrupted that
    might be needed by another freezing task or a kernel driver during
    suspend, and is a common site where idle userspace tasks are blocked.

    Signed-off-by: Vaneet Narang <v.narang@samsung.com>
    Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
    Cc: Yogesh Gaur <yn.gaur@samsung.com>
    Cc: Manjeet Pawar <manjeet.p@samsung.com>
    Cc: Ajeet Yadav <ajeet.y@samsung.com>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
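    The conversion itself is tiny; a hedged sketch of the pattern, with the
    surrounding do_msgsnd()/do_msgrcv() context omitted and simplified:

        #include <linux/freezer.h>
        #include <linux/sched.h>

        /* Sleeping in the message queue wait loop (illustrative only). */
        static void msg_wait_sketch(void)
        {
                __set_current_state(TASK_INTERRUPTIBLE);

                /* was: schedule();  The freezable variant marks the task
                 * with PF_FREEZER_SKIP around the sleep, so suspend does
                 * not have to wake it just to freeze it.
                 */
                freezable_schedule();
        }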
| * nvme: use the DMA_ATTR_NO_WARN attributeMauricio Faria de Oliveira2016-08-051-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the DMA_ATTR_NO_WARN attribute for the dma_map_sg() call of the nvme driver that returns BLK_MQ_RQ_QUEUE_BUSY (not for BLK_MQ_RQ_QUEUE_ERROR). Link: http://lkml.kernel.org/r/1470092390-25451-4-git-send-email-mauricfo@linux.vnet.ibm.com Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Jens Axboe <axboe@fb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * powerpc: implement the DMA_ATTR_NO_WARN attributeMauricio Faria de Oliveira2016-08-051-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Add support for the DMA_ATTR_NO_WARN attribute on powerpc iommu code. Link: http://lkml.kernel.org/r/1470092390-25451-3-git-send-email-mauricfo@linux.vnet.ibm.com Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Jens Axboe <axboe@fb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * dma-mapping: introduce the DMA_ATTR_NO_WARN attributeMauricio Faria de Oliveira2016-08-052-0/+22
| |
    Introduce the DMA_ATTR_NO_WARN attribute, and document it.

    Link: http://lkml.kernel.org/r/1470092390-25451-2-git-send-email-mauricfo@linux.vnet.ibm.com
    Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
    Cc: Keith Busch <keith.busch@intel.com>
    Cc: Jens Axboe <axboe@fb.com>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
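    A rough usage sketch of how a driver might pass the new attribute, in
    the spirit of the nvme patch above.  It assumes the unsigned-long dma
    attrs interface and a caller that can retry on failure:

        #include <linux/dma-mapping.h>

        /* Map a scatterlist but suppress the allocator warning on failure,
         * because the caller handles the failure by retrying later.
         */
        static int map_sg_quietly(struct device *dev, struct scatterlist *sg,
                                  int nents)
        {
                int mapped = dma_map_sg_attrs(dev, sg, nents, DMA_TO_DEVICE,
                                              DMA_ATTR_NO_WARN);

                if (!mapped)
                        return -EBUSY;  /* retried later, no warning spam */
                return mapped;
        }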
| * random: remove unused randomize_range()Jason Cooper2016-08-052-20/+0
| | | | | | | | | | | | | | | | | | | | | | | | All call sites for randomize_range have been updated to use the much simpler and more robust randomize_page(). Remove the now unnecessary code. Link: http://lkml.kernel.org/r/20160803233913.32511-8-jason@lakedaemon.net Signed-off-by: Jason Cooper <jason@lakedaemon.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * unicore32: use simpler API for random address requestsJason Cooper2016-08-051-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, all callers to randomize_range() set the length to 0 and calculate end by adding a constant to the start address. We can simplify the API to remove a bunch of needless checks and variables. Use the new randomize_addr(start, range) call to set the requested address. Link: http://lkml.kernel.org/r/20160803233913.32511-7-jason@lakedaemon.net Signed-off-by: Jason Cooper <jason@lakedaemon.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * tile: use simpler API for random address requestsJason Cooper2016-08-051-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, all callers to randomize_range() set the length to 0 and calculate end by adding a constant to the start address. We can simplify the API to remove a bunch of needless checks and variables. Use the new randomize_addr(start, range) call to set the requested address. Link: http://lkml.kernel.org/r/20160803233913.32511-6-jason@lakedaemon.net Signed-off-by: Jason Cooper <jason@lakedaemon.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Chris Metcalf <cmetcalf@mellanox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * arm64: use simpler API for random address requestsJason Cooper2016-08-051-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, all callers to randomize_range() set the length to 0 and calculate end by adding a constant to the start address. We can simplify the API to remove a bunch of needless checks and variables. Use the new randomize_addr(start, range) call to set the requested address. Link: http://lkml.kernel.org/r/20160803233913.32511-5-jason@lakedaemon.net Signed-off-by: Jason Cooper <jason@lakedaemon.net> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: "Russell King - ARM Linux" <linux@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * ARM: use simpler API for random address requestsJason Cooper2016-08-051-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, all callers to randomize_range() set the length to 0 and calculate end by adding a constant to the start address. We can simplify the API to remove a bunch of needless checks and variables. Use the new randomize_addr(start, range) call to set the requested address. Link: http://lkml.kernel.org/r/20160803233913.32511-4-jason@lakedaemon.net Signed-off-by: Jason Cooper <jason@lakedaemon.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: "Russell King - ARM Linux" <linux@arm.linux.org.uk> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * x86: use simpler API for random address requestsJason Cooper2016-08-052-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, all callers to randomize_range() set the length to 0 and calculate end by adding a constant to the start address. We can simplify the API to remove a bunch of needless checks and variables. Use the new randomize_addr(start, range) call to set the requested address. Link: http://lkml.kernel.org/r/20160803233913.32511-3-jason@lakedaemon.net Signed-off-by: Jason Cooper <jason@lakedaemon.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * random: simplify API for random address requestsJason Cooper2016-08-052-0/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To date, all callers of randomize_range() have set the length to 0, and check for a zero return value. For the current callers, the only way to get zero returned is if end <= start. Since they are all adding a constant to the start address, this is unnecessary. We can remove a bunch of needless checks by simplifying the API to do just what everyone wants, return an address between [start, start + range). While we're here, s/get_random_int/get_random_long/. No current call site is adversely affected by get_random_int(), since all current range requests are < UINT_MAX. However, we should match caller expectations to avoid coming up short (ha!) in the future. All current callers to randomize_range() chose to use the start address if randomize_range() failed. Therefore, we simplify things by just returning the start address on error. randomize_range() will be removed once all callers have been converted over to randomize_addr(). Link: http://lkml.kernel.org/r/20160803233913.32511-2-jason@lakedaemon.net Signed-off-by: Jason Cooper <jason@lakedaemon.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Roberts, William C" <william.c.roberts@intel.com> Cc: Yann Droneaud <ydroneaud@opteya.com> Cc: "Russell King - ARM Linux" <linux@arm.linux.org.uk> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Nick Kralevich <nnk@google.com> Cc: Jeffrey Vander Stoep <jeffv@google.com> Cc: Daniel Cashman <dcashman@android.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
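    A minimal sketch of the simplified helper described above; rounding,
    overflow and alignment handling are condensed, and the exact prototype
    should be treated as an assumption:

        #include <linux/mm.h>
        #include <linux/random.h>

        /* Return a random, page-aligned address in [start, start + range),
         * or start itself when no randomization is possible.
         */
        static unsigned long randomize_addr_sketch(unsigned long start,
                                                   unsigned long range)
        {
                if (range == 0 || !PAGE_ALIGNED(start))
                        return start;

                range >>= PAGE_SHIFT;
                if (range == 0)
                        return start;

                return start + ((get_random_long() % range) << PAGE_SHIFT);
        }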
| * kdump, vmcoreinfo: report actual value of phys_baseHATAYAMA Daisuke2016-08-052-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, VMCOREINFO note information reports the virtual address of phys_base that is assigned to symbol phys_base. But this doesn't make sense because to refer to phys_base, it's necessary to get the value of phys_base itself we are now about to refer to. Userland tools related to kdump such as makedumpfile and crash utility so far have made some efforts to calculate phys_base on crash dump formats generated by mechanisms running outside Linux kernel, such as virtual machine hypervisor such as qemu dump, which ordinary users use via virsh dump, or ones implemented on vendor specific firmware. That is, find a kernel data whose virtual and physical addresses are available via its note information and calculate phys_base from it. However, such data structure is not the one prepared for phys_base purpose. There's no guarantee that other crash dump mechanisms include such information that can be used to calculate phys_base similarly. To get VMCOREINFO in vmcore, it's easy to use strings and grep commands like this; VMCOREINFO consists of simple string: $ strings vmcore-3.10.0-121.el7.x86_64 | grep -E ".*VMCOREINFO.*" -A 100 VMCOREINFO OSRELEASE=3.10.0-121.el7.x86_64 PAGESIZE=4096 ... This is also useful to get value of phys_base in kdump 2nd kernel contained in vmcore using the above-mentioned external crash dump mechanism; kdump 2nd kernel is an inherently relocated kernel. This commit doesn't remove VMCOREINFO_SYMBOL(phys_base) line because makedumpfile refers to it and if removing it, old versions makedumpfile doesn't work well. Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Dave Anderson <anderson@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * compat: remove compat_printk()Arnd Bergmann2016-08-053-25/+0
| | | | | | | | | | | | | | | | | | | | | | | | After 7e8e385aaf6e ("x86/compat: Remove sys32_vm86_warning"), this function has become unused, so we can remove it as well. Link: http://lkml.kernel.org/r/20160617142903.3070388-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * lib: Add CRC64 ECMA moduleMarian Chereji2016-08-054-0/+405
| | | | | | | | | | | | | | | | | | | | | | | | | | Add implementation of CRC64 ECMA checksum. We have an IP Acceleration driver for Freescale network processors which is using this CRC64. However, it still needs some work in order for it to become upstreamable. Signed-off-by: Marian Chereji <marian.chereji@freescale.com> Reviewed-by: Varvara Andrei-B21317 <andrei.varvara@freescale.com> Reviewed-by: Fleming Andrew-AFLEMING <AFLEMING@freescale.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * printk: remove unnecessary #ifdef CONFIG_PRINTKAndreas Ziegler2016-08-051-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 874f9c7da9a4 ("printk: create pr_<level> functions") in linux-next, new pr_level defines were added to printk.c. These defines are guarded by an #ifdef CONFIG_PRINTK - however, there is already a surrounding #ifdef CONFIG_PRINTK starting a lot earlier in line 249 which means the newly introduced #ifdef is unnecessary. Let's remove it to avoid confusion. Link: http://lkml.kernel.org/r/1470297129-28049-1-git-send-email-andreas.ziegler@fau.de Signed-off-by: Andreas Ziegler <andreas.ziegler@fau.de> Cc: Joe Perches <joe@perches.com> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * proc: add LSM hook checks to /proc/<tid>/timerslack_nsJohn Stultz2016-08-051-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As requested, this patch checks the existing LSM hooks task_getscheduler/task_setscheduler when reading or modifying the task's timerslack value. Previous versions added new get/settimerslack LSM hooks, but since they checked the same PROCESS__SET/GETSCHED values as existing hooks, it was suggested we just use the existing ones. Link: http://lkml.kernel.org/r/1469132667-17377-2-git-send-email-john.stultz@linaro.org Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Kees Cook <keescook@chromium.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Oren Laadan <orenl@cellrox.com> Cc: Ruchi Kandoi <kandoiruchi@google.com> Cc: Rom Lemarchand <romlem@android.com> Cc: Todd Kjos <tkjos@google.com> Cc: Colin Cross <ccross@android.com> Cc: Nick Kralevich <nnk@google.com> Cc: Dmitry Shmidt <dimitrysh@google.com> Cc: Elliott Hughes <enh@google.com> Cc: James Morris <jmorris@namei.org> Cc: Android Kernel Team <kernel-team@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * proc: relax /proc/<tid>/timerslack_ns capability requirementsJohn Stultz2016-08-051-14/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an interface to allow a task to change another tasks timerslack was first proposed, it was suggested that something greater then CAP_SYS_NICE would be needed, as a task could be delayed further then what normally could be done with nice adjustments. So CAP_SYS_PTRACE was adopted instead for what became the /proc/<tid>/timerslack_ns interface. However, for Android (where this feature originates), giving the system_server CAP_SYS_PTRACE would allow it to observe and modify all tasks memory. This is considered too high a privilege level for only needing to change the timerslack. After some discussion, it was realized that a CAP_SYS_NICE process can set a task as SCHED_FIFO, so they could fork some spinning processes and set them all SCHED_FIFO 99, in effect delaying all other tasks for an infinite amount of time. So as a CAP_SYS_NICE task can already cause trouble for other tasks, using it as a required capability for accessing and modifying /proc/<tid>/timerslack_ns seems sufficient. Thus, this patch loosens the capability requirements to CAP_SYS_NICE and removes CAP_SYS_PTRACE, simplifying some of the code flow as well. This is technically an ABI change, but as the feature just landed in 4.6, I suspect no one is yet using it. Link: http://lkml.kernel.org/r/1469132667-17377-1-git-send-email-john.stultz@linaro.org Signed-off-by: John Stultz <john.stultz@linaro.org> Reviewed-by: Nick Kralevich <nnk@google.com> Acked-by: Serge Hallyn <serge@hallyn.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Kees Cook <keescook@chromium.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Oren Laadan <orenl@cellrox.com> Cc: Ruchi Kandoi <kandoiruchi@google.com> Cc: Rom Lemarchand <romlem@android.com> Cc: Todd Kjos <tkjos@google.com> Cc: Colin Cross <ccross@android.com> Cc: Nick Kralevich <nnk@google.com> Cc: Dmitry Shmidt <dimitrysh@google.com> Cc: Elliott Hughes <enh@google.com> Cc: Android Kernel Team <kernel-team@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
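    A hedged sketch of how the two timerslack patches above fit together in
    the /proc write handler; variable names and error paths are simplified
    assumptions, not the actual fs/proc/base.c code:

        /* Writing /proc/<tid>/timerslack_ns (simplified sketch). */
        static int timerslack_ns_write_sketch(struct inode *inode, u64 slack_ns)
        {
                struct task_struct *p;
                int err = 0;

                p = get_proc_task(inode);
                if (!p)
                        return -ESRCH;

                /* Reuse the existing LSM scheduler hook rather than adding one. */
                err = security_task_setscheduler(p);
                if (err)
                        goto out;

                /* CAP_SYS_NICE suffices; CAP_SYS_PTRACE is no longer required. */
                if (!capable(CAP_SYS_NICE)) {
                        err = -EPERM;
                        goto out;
                }

                task_lock(p);
                p->timer_slack_ns = slack_ns;
                task_unlock(p);
        out:
                put_task_struct(p);
                return err;
        }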
| * mm/vmstat.c: walk the zone in pageblock_nr_pages stepszhong jiang2016-08-051-1/+1
| |
    When walking the zone we can run into holes.  We should not align to
    MAX_ORDER_NR_PAGES, because doing so can skip over normal memory.  In
    addition, pagetypeinfo_showmixedcount_print reflects fragmentation, and
    we want that data to be as accurate as possible, so walk the zone in
    pageblock_nr_pages steps instead.

    Link: http://lkml.kernel.org/r/1469502526-24486-2-git-send-email-zhongjiang@huawei.com
    Signed-off-by: zhong jiang <zhongjiang@huawei.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * mm/page_owner: align with pageblock_nr pageszhong jiang2016-08-051-1/+1
| |
    When pfn_valid(pfn) returns false, pfn should be aligned to
    pageblock_nr_pages rather than MAX_ORDER_NR_PAGES in init_pages_in_zone,
    because the skipped 2M region may contain valid pfns; otherwise the
    count of early allocated pages will not be accurate.

    Link: http://lkml.kernel.org/r/1468938136-24228-1-git-send-email-zhongjiang@huawei.com
    Signed-off-by: zhong jiang <zhongjiang@huawei.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * mm/zsmalloc: add per-class compact trace eventGanesh Mahendran2016-08-052-18/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | add per-class compact trace event to get number of migrated objects and number of freed pages. trace log is like below: bash-3863 [001] .... 141.791366: zs_compact_start: pool zram0 bash-3863 [001] .... 141.791372: zs_compact: class 254: 0 objects migrated, 0 pages freed bash-3863 [001] .... 141.791375: zs_compact: class 202: 0 objects migrated, 0 pages freed bash-3863 [001] .... 141.791385: zs_compact: class 190: 1 objects migrated, 3 pages freed bash-3863 [001] .... 141.791393: zs_compact: class 168: 2 objects migrated, 2 pages freed bash-3863 [001] .... 141.791396: zs_compact: class 151: 0 objects migrated, 0 pages freed bash-3863 [001] .... 141.791407: zs_compact: class 144: 5 objects migrated, 4 pages freed bash-3863 [001] .... 141.791427: zs_compact: class 126: 8 objects migrated, 8 pages freed bash-3863 [001] .... 141.791433: zs_compact: class 111: 1 objects migrated, 4 pages freed bash-3863 [001] .... 141.791459: zs_compact: class 107: 18 objects migrated, 12 pages freed bash-3863 [001] .... 141.791487: zs_compact: class 100: 18 objects migrated, 16 pages freed bash-3863 [001] .... 141.791509: zs_compact: class 94: 18 objects migrated, 9 pages freed bash-3863 [001] .... 141.791560: zs_compact: class 91: 44 objects migrated, 24 pages freed bash-3863 [001] .... 141.791605: zs_compact: class 83: 35 objects migrated, 20 pages freed bash-3863 [001] .... 141.791616: zs_compact: class 76: 8 objects migrated, 4 pages freed bash-3863 [001] .... 141.791644: zs_compact: class 74: 21 objects migrated, 9 pages freed bash-3863 [001] .... 141.791665: zs_compact: class 71: 18 objects migrated, 10 pages freed bash-3863 [001] .... 141.791736: zs_compact: class 67: 0 objects migrated, 0 pages freed bash-3863 [001] .... 141.791763: zs_compact: class 66: 22 objects migrated, 8 pages freed bash-3863 [001] .... 141.791820: zs_compact: class 62: 18 objects migrated, 6 pages freed bash-3863 [001] .... 141.791826: zs_compact: class 58: 1 objects migrated, 4 pages freed bash-3863 [001] .... 141.791829: zs_compact: class 57: 0 objects migrated, 0 pages freed bash-3863 [001] .... 141.791834: zs_compact: class 54: 2 objects migrated, 2 pages freed ... bash-3863 [001] .... 141.791952: zs_compact: class 4: 0 objects migrated, 0 pages freed bash-3863 [001] .... 141.791964: zs_compact: class 3: 14 objects migrated, 1 pages freed bash-3863 [001] .... 141.791966: zs_compact: class 2: 0 objects migrated, 0 pages freed bash-3863 [001] .... 141.791968: zs_compact: class 1: 0 objects migrated, 0 pages freed bash-3863 [001] .... 141.791971: zs_compact: class 0: 0 objects migrated, 0 pages freed bash-3863 [001] .... 141.791973: zs_compact_end: pool zram0: 155 pages compacted Also this patch changes trace_zsmalloc_compact_start[end] to trace_zs_compact_start[end] to keep function naming consistent with others in zsmalloc. Link: http://lkml.kernel.org/r/1467882338-4300-8-git-send-email-opensource.ganesh@gmail.com Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * mm/zsmalloc: add trace events for zs_compactGanesh Mahendran2016-08-052-0/+66
| |
    Currently zsmalloc is widely used on Android devices.  Sometimes we
    want to see how frequently zs_compact() is triggered, how many pages
    are freed by zs_compact(), or which zsmalloc pool is compacted.

    We have backported zs_compact() to our product (kernel 3.18).  It is
    useful for a long-running device, but there is no convenient way to get
    detailed information about zs_compact() for performance optimization:
    how much time zs_compact() took, which pool was compacted, how many
    pages were freed, and so on.  With this information we can see what is
    going on in zs_compact() and correlate free memory with compaction.

    Most of the time users can get brief information from
    trace_mm_shrink_slab_[start|end], but in some scenarios they do not use
    the zsmalloc shrinker and instead trigger compaction manually, so
    adding trace events to zs_compact() is convenient.  We can also include
    zsmalloc-specific information (pool name, total compacted pages, etc.)
    in the trace.

    This patch adds two trace events for zs_compact(); below is the trace log:

    -----------------------------
    root@land:/ # cat /d/tracing/trace
    kswapd0-125 [007] ...1 174.176979: zsmalloc_compact_start: pool zram0
    kswapd0-125 [007] ...1 174.181967: zsmalloc_compact_end: pool zram0: 608 pages compacted(total 1794)
    kswapd0-125 [000] ...1 184.134475: zsmalloc_compact_start: pool zram0
    kswapd0-125 [000] ...1 184.135010: zsmalloc_compact_end: pool zram0: 62 pages compacted(total 1856)
    kswapd0-125 [003] ...1 226.927221: zsmalloc_compact_start: pool zram0
    kswapd0-125 [003] ...1 226.928575: zsmalloc_compact_end: pool zram0: 250 pages compacted(total 2106)
    -----------------------------

    Link: http://lkml.kernel.org/r/1465289804-4913-1-git-send-email-opensource.ganesh@gmail.com
    Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * mm: oom: deduplicate victim selection code for memcg and global oomVladimir Davydov2016-08-054-205/+167
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When selecting an oom victim, we use the same heuristic for both memory cgroup and global oom. The only difference is the scope of tasks to select the victim from. So we could just export an iterator over all memcg tasks and keep all oom related logic in oom_kill.c, but instead we duplicate pieces of it in memcontrol.c reusing some initially private functions of oom_kill.c in order to not duplicate all of it. That looks ugly and error prone, because any modification of select_bad_process should also be propagated to mem_cgroup_out_of_memory. Let's rework this as follows: keep all oom heuristic related code private to oom_kill.c and make oom_kill.c use exported memcg functions when it's really necessary (like in case of iterating over memcg tasks). Link: http://lkml.kernel.org/r/1470056933-7505-1-git-send-email-vdavydov@virtuozzo.com Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * mm: memcontrol: add sanity checks for memcg->id.ref on get/putVladimir Davydov2016-08-051-2/+6
| | | | | | | | | | | | | | | | Link: http://lkml.kernel.org/r/ad702144f24374cbfb3a35b71658a0ae24ba7d84.1470057819.git.vdavydov@virtuozzo.com Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * kernel/watchdog: use nmi registers snapshot in hardlockup handlerKonstantin Khlebnikov2016-08-051-1/+0
| |
    The NMI handler doesn't call set_irq_regs(); it's set only by a normal
    IRQ.  Thus get_irq_regs() returns NULL or a stale register snapshot
    with IP/SP pointing to the code interrupted by the IRQ which was
    interrupted by the NMI.

    NULL isn't a problem: in this case the watchdog calls dump_stack() and
    prints the full stack trace including the NMI.  But if we're stuck in
    an IRQ handler then the NMI watchdog will print a stack trace without
    the IRQ part at all.

    This patch uses the register snapshot passed into the NMI handler as an
    argument: these registers point exactly to the instruction interrupted
    by the NMI.

    Fixes: 55537871ef66 ("kernel/watchdog.c: perform all-CPU backtrace in case of hard lockup")
    Link: http://lkml.kernel.org/r/146771764784.86724.6006627197118544150.stgit@buzz
    Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
    Cc: Jiri Kosina <jkosina@suse.cz>
    Cc: Ulrich Obergfell <uobergfe@redhat.com>
    Cc: Aaron Tomlin <atomlin@redhat.com>
    Cc: <stable@vger.kernel.org>    [4.4+]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
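    The shape of the resulting handler, sketched and heavily simplified
    (the real callback carries more state handling):

        /* Hard-lockup detector callback.  `regs` is the snapshot taken by
         * the NMI itself, so it points at the instruction the NMI
         * interrupted -- no get_irq_regs() needed.
         */
        static void watchdog_overflow_callback(struct perf_event *event,
                                               struct perf_sample_data *data,
                                               struct pt_regs *regs)
        {
                if (!is_hardlockup())
                        return;

                pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n",
                         smp_processor_id());
                if (regs)
                        show_regs(regs);
                else
                        dump_stack();
        }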
| * block: restore /proc/partitions to not display non-partitionable removable devicesJosh Hunt2016-08-051-1/+1
| |
    We found with newer kernels we started seeing the cdrom device showing
    up in /proc/partitions, but it was not there before.

    Looking into this I found that commit d27769ec ("block: add
    GENHD_FL_NO_PART_SCAN") introduces this change in behavior.  It's not
    clear to me from the commit's changelog if this change was intentional
    or not.  This comment still remains:

        /* Don't show non-partitionable removeable devices or empty devices */

    so I've decided to send a patch to restore the behavior of not printing
    unpartitionable removable devices.

    Signed-off-by: Josh Hunt <johunt@akamai.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Kay Sievers <kay.sievers@vrfy.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * kbuild: simpler generation of assembly constantsAlexey Dobriyan2016-08-057-33/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | gcc doesn't really look inside "asm" statements and more or less directly emits it into assembly. So pretend "#define" is CPU instruction. C++ comment can't be used because sparc assembler doesn't understand it. Link: http://lkml.kernel.org/r/20160713173646.GA1910@p183.telecom.by Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Michal Marek <mmarek@suse.cz> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * arm: arch/arm/include/asm/page.h needs personality.hAndrew Morton2016-08-051-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | VM_DATA_DEFAULT_FLAGS uses READ_IMPLIES_EXEC, so page.h should include personality.h to provide this. This fixes no known bugs and can be safely ignored ;) Cc: Russell King <linux@arm.linux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * mm/slab: improve performance of gathering slabinfo statsAruna Ramakrishna2016-08-053-28/+31
| |
    On large systems, when some slab caches grow to millions of objects
    (and many gigabytes), running 'cat /proc/slabinfo' can take up to 1-2
    seconds.  During this time, interrupts are disabled while walking the
    slab lists (slabs_full, slabs_partial, and slabs_free) for each node,
    and this sometimes causes timeouts in other drivers (for instance,
    Infiniband).

    This patch optimizes 'cat /proc/slabinfo' by maintaining a counter for
    the total number of allocated slabs per node, per cache.  This counter
    is updated when a slab is created or destroyed.  This enables us to
    skip traversing the slabs_full list while gathering slabinfo
    statistics, and since slabs_full tends to be the biggest list when the
    cache is large, it results in a dramatic performance improvement.
    Getting slabinfo statistics now only requires walking the slabs_free
    and slabs_partial lists, and those lists are usually much smaller than
    slabs_full.  We tested this after growing the dentry cache to 70GB, and
    the performance improved from 2s to 5ms.

    Link: http://lkml.kernel.org/r/1470337273-6700-1-git-send-email-aruna.ramakrishna@oracle.com
    Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
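    A conceptual sketch of the bookkeeping; the structure, field and helper
    names are assumptions for illustration, not the real mm/slab.c names:

        /* Per-node, per-cache slab count, kept in sync at slab creation and
         * destruction so slabinfo never has to walk the huge slabs_full list.
         */
        struct kmem_cache_node_sketch {
                struct list_head slabs_full;
                struct list_head slabs_partial;
                struct list_head slabs_free;
                unsigned long    total_slabs;   /* hypothetical counter */
        };

        static void slab_created(struct kmem_cache_node_sketch *n)
        {
                n->total_slabs++;       /* caller holds the node's list lock */
        }

        static void slab_destroyed(struct kmem_cache_node_sketch *n)
        {
                n->total_slabs--;
        }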
| * mm: memcontrol: fix memcg id ref counter on swap charge moveVladimir Davydov2016-08-051-6/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 73f576c04b941 swap entries do not pin memcg->css.refcnt directly. Instead, they pin memcg->id.ref. So we should adjust the reference counters accordingly when moving swap charges between cgroups. Fixes: 73f576c04b941 ("mm: memcontrol: fix cgroup creation failure after many small jobs") Link: http://lkml.kernel.org/r/3119b9b4526b18e6afcf55d3b4220437d642b00d.1470057819.git.vdavydov@virtuozzo.com Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * mm: memcontrol: fix swap counter leak on swapout from offline cgroupVladimir Davydov2016-08-051-6/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An offline memory cgroup might have anonymous memory or shmem left charged to it and no swap. Since only swap entries pin the id of an offline cgroup, such a cgroup will have no id and so an attempt to swapout its anon/shmem will not store memory cgroup info in the swap cgroup map. As a result, memcg->swap or memcg->memsw will never get uncharged from it and any of its ascendants. Fix this by always charging swapout to the first ancestor cgroup that hasn't released its id yet. Fixes: 73f576c04b941 ("mm: memcontrol: fix cgroup creation failure after many small jobs") Link: http://lkml.kernel.org/r/01cbe4d1a9fd9bbd42c95e91694d8ed9c9fc2208.1470057819.git.vdavydov@virtuozzo.com Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * arch/powerpc/kernel/fadump.c: register the memory reserved by fadumpSrikar Dronamraju2016-08-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fadump kernel reserves large chunks of memory even before the pages are initialized. This could mean memory that corresponds to several nodes might fall in memblock reserved regions. Kernels compiled with CONFIG_DEFERRED_STRUCT_PAGE_INIT will initialize only certain size memory per node. The certain size takes into account the dentry and inode cache sizes. Currently the cache sizes are calculated based on the total system memory including the reserved memory. However such a kernel when booting the same kernel as fadump kernel will not be able to allocate the required amount of memory to suffice for the dentry and inode caches. This results in crashes like the below on large systems such as 32 TB systems. Dentry cache hash table entries: 536870912 (order: 16, 4294967296 bytes) vmalloc: allocation failure, allocated 4097114112 of 17179934720 bytes swapper/0: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC) CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6-master+ #3 Call Trace: [c00000000108fb10] [c0000000007fac88] dump_stack+0xb0/0xf0 (unreliable) [c00000000108fb50] [c000000000235264] warn_alloc_failed+0x114/0x160 [c00000000108fbf0] [c000000000281484] __vmalloc_node_range+0x304/0x340 [c00000000108fca0] [c00000000028152c] __vmalloc+0x6c/0x90 [c00000000108fd40] [c000000000aecfb0] alloc_large_system_hash+0x1b8/0x2c0 [c00000000108fe00] [c000000000af7240] inode_init+0x94/0xe4 [c00000000108fe80] [c000000000af6fec] vfs_caches_init+0x8c/0x13c [c00000000108ff00] [c000000000ac4014] start_kernel+0x50c/0x578 [c00000000108ff90] [c000000000008c6c] start_here_common+0x20/0xa8 Register the memory reserved by fadump, so that the cache sizes are calculated based on the free memory (i.e Total memory - reserved memory). Link: http://lkml.kernel.org/r/1470330729-6273-2-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Suggested-by: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * mm/page_alloc.c: replace set_dma_reserve to set_memory_reserveSrikar Dronamraju2016-08-053-10/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Expand the scope of the existing dma_reserve to accommodate other memory reserves too. Accordingly rename variable dma_reserve to nr_memory_reserve. set_memory_reserve() also takes a new parameter that helps to identify if the current value needs to be incremented. Link: http://lkml.kernel.org/r/1470330729-6273-1-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Suggested-by: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * mm, oom: fix uninitialized ret in task_will_free_mem()Geert Uytterhoeven2016-08-051-1/+1
| |
    mm/oom_kill.c: In function `task_will_free_mem':
    mm/oom_kill.c:767: warning: `ret' may be used uninitialized in this function

    If __task_will_free_mem() is never called inside the for_each_process()
    loop, ret will not be initialized.

    Fixes: 1af8bb43269563e4 ("mm, oom: fortify task_will_free_mem()")
    Link: http://lkml.kernel.org/r/1470255599-24841-1-git-send-email-geert@linux-m68k.org
    Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
    Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * mm/memblock.c: fix NULL dereference errorzijun_hu2016-08-051-1/+4
| |
    A NULL dereference and a failure to get the type_a->regions[0] info
    occur if the type_b parameter of __next_mem_range_rev() is NULL.

    Fix this by checking before dereferencing, and by initializing idx_b
    to 0.

    The approach was tested by dumping all types of region to the UART,
    via __memblock_dump_all() and via the fixed __next_mem_range_rev()
    separately; the result is okay after checking the logs.

    Link: http://lkml.kernel.org/r/57A0320D.6070102@zoho.com
    Signed-off-by: zijun_hu <zijun_hu@htc.com>
    Tested-by: zijun_hu <zijun_hu@htc.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
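    A minimal sketch of the guard; the real iterator carries much more
    state, and the helper name here is an assumption:

        #include <linux/memblock.h>

        /* Pick out region idx_b of type_b, tolerating type_b == NULL (the
         * caller may not be interested in a second region type at all).
         */
        static struct memblock_region *
        region_or_null(struct memblock_type *type_b, int idx_b)
        {
                if (type_b == NULL)             /* check before dereferencing */
                        return NULL;
                return &type_b->regions[idx_b];
        }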
| * MAINTAINERS: update cgroup's document pathseokhoon.yoon2016-08-051-2/+2
| | | | | | | | | | | | | | | | cgroup's document path is changed to "cgroup-v1". update it. Link: http://lkml.kernel.org/r/1470322507-5161-1-git-send-email-iamyooon@gmail.com Signed-off-by: seokhoon.yoon <iamyooon@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * slub: drop bogus inline for fixup_red_left()Geert Uytterhoeven2016-08-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With m68k-linux-gnu-gcc-4.1: include/linux/slub_def.h:126: warning: `fixup_red_left' declared inline after being called include/linux/slub_def.h:126: warning: previous declaration of `fixup_red_left' was here Commit c146a2b98eb5898e ("mm, kasan: account for object redzone in SLUB's nearest_obj()") made fixup_red_left() global, but forgot to remove the inline keyword. Fixes: c146a2b98eb5898e ("mm, kasan: account for object redzone in SLUB's nearest_obj()") Link: http://lkml.kernel.org/r/1470256262-1586-1-git-send-email-geert@linux-m68k.org Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Alexander Potapenko <glider@google.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * powerpc/fsl_rio: fix a missing error codeDan Carpenter2016-08-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | We should set the error code here rather than incorrectly returning 0. Otherwise static checkers complain. Link: http://lkml.kernel.org/r/20160804053525.GM775@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Alexandre Bounine <alexandre.bounine@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * mm: initialise per_cpu_nodestats for all online pgdats at bootMel Gorman2016-08-051-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Paul Mackerras and Reza Arbab reported that machines with memoryless nodes fail when vmstats are refreshed. Paul reported an oops as follows [ 1.713998] Unable to handle kernel paging request for data at address 0xff7a10000 [ 1.714164] Faulting instruction address: 0xc000000000270cd0 [ 1.714304] Oops: Kernel access of bad area, sig: 11 [#1] [ 1.714414] SMP NR_CPUS=2048 NUMA PowerNV [ 1.714530] Modules linked in: [ 1.714647] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.7.0-kvm+ #118 [ 1.714786] task: c000000ff0680010 task.stack: c000000ff0704000 [ 1.714926] NIP: c000000000270cd0 LR: c000000000270ce8 CTR: 0000000000000000 [ 1.715093] REGS: c000000ff0707900 TRAP: 0300 Not tainted (4.7.0-kvm+) [ 1.715232] MSR: 9000000102009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE,TM[E]> CR: 846b6824 XER: 20000000 [ 1.715748] CFAR: c000000000008768 DAR: 0000000ff7a10000 DSISR: 42000000 SOFTE: 1 GPR00: c000000000270d08 c000000ff0707b80 c0000000011fb200 0000000000000000 GPR04: 0000000000000800 0000000000000000 0000000000000000 0000000000000000 GPR08: ffffffffffffffff 0000000000000000 0000000ff7a10000 c00000000122aae0 GPR12: c000000000a1e440 c00000000fb80000 c00000000000c188 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000000cecad0 GPR24: c000000000d035b8 c000000000d6cd18 c000000000d6cd18 c000001fffa86300 GPR28: 0000000000000000 c000001fffa96300 c000000001230034 c00000000122eb18 [ 1.717484] NIP [c000000000270cd0] refresh_zone_stat_thresholds+0x80/0x240 [ 1.717568] LR [c000000000270ce8] refresh_zone_stat_thresholds+0x98/0x240 [ 1.717648] Call Trace: [ 1.717687] [c000000ff0707b80] [c000000000270d08] refresh_zone_stat_thresholds+0xb8/0x240 (unreliable) Both supplied potential fixes but one potentially misses checks and another had redundant initialisations. This version initialises per_cpu_nodestats on a per-pgdat basis instead of on a per-zone basis. Link: http://lkml.kernel.org/r/20160804092404.GI2799@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Paul Mackerras <paulus@ozlabs.org> Reported-by: Reza Arbab <arbab@linux.vnet.ibm.com> Tested-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * mm/memblock: fix a typo in a commentAlexander Kuleshov2016-08-051-2/+2
| | | | | | | | | | | | | | | | s/accomodate/accommodate Link: http://lkml.kernel.org/r/20160804121824.18100-1-kuleshovmail@gmail.com Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| * mm: disable CONFIG_MEMORY_HOTPLUG when KASAN is enabledzhong jiang2016-08-051-0/+1
| |
    At present, memory online and offline fail when KASAN is enabled.  So
    add a condition to restrict memory hotplug when KASAN is enabled.

    Link: http://lkml.kernel.org/r/1470063651-29519-1-git-send-email-zhongjiang@huawei.com
    Signed-off-by: zhong jiang <zhongjiang@huawei.com>
    Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | Merge remote-tracking branch 'rtc/rtc-next'Stephen Rothwell2016-08-05105-1582/+850
|\ \
| * | rtc: rv8803: Clear V1F when setting the timeBenoît Thébaudeau2016-07-281-1/+1
| | |
    V1F indicates that the time accuracy may have been compromised because
    of a voltage drop (possibly only temporary) below VLOW1, which stops
    the temperature compensation.  When the time is set, the accuracy is
    restored, so V1F should be cleared in order to indicate this and to be
    able to detect the next temperature compensation loss.

    This is the same principle as for V2F, which is cleared when the time
    is set to indicate that the time is no longer invalid and to be able to
    detect the next data loss.

    Signed-off-by: Benoît Thébaudeau <benoit@wsystem.com>
    Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
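    A hedged sketch of the flag handling when the time is written; the
    register and bit names follow the RV-8803 datasheet, but the helper and
    its placement are assumptions, not the driver's actual code:

        #define RV8803_FLAG             0x0E    /* flag register        */
        #define RV8803_FLAG_V1F         BIT(0)  /* temp. compensation lost */
        #define RV8803_FLAG_V2F         BIT(1)  /* data lost            */

        /* After writing the new time, clear both voltage-low flags so the
         * next compensation loss / data loss can be detected again.
         */
        static int rv8803_clear_voltage_flags_sketch(struct i2c_client *client)
        {
                int flags = i2c_smbus_read_byte_data(client, RV8803_FLAG);

                if (flags < 0)
                        return flags;
                return i2c_smbus_write_byte_data(client, RV8803_FLAG,
                                                 flags & ~(RV8803_FLAG_V1F |
                                                           RV8803_FLAG_V2F));
        }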
| * | rtc: rv8803: Stop the clock while setting the timeBenoît Thébaudeau2016-07-281-1/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to the application manual of the RX8900, the RESET bit must be set to 1 to prevent a timer update while setting the time. This also resets the subsecond counter. The application manual of the RV-8803 does not mention such a requirement, and it says that the 100th Seconds register is cleared when writing to the Seconds register, but using the RESET bit for the RV-8803 too should not be an issue and is probably safer. This change also ensures that the RESET bit is initialized properly in all cases. Indeed, all the registers must be initialized if the voltage has been lower than VLOW2 (triggering V2F), but not low enough to trigger a POR. Signed-off-by: Benoît Thébaudeau <benoit@wsystem.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
| * | rtc: rv8803: Always apply the I²C workaroundBenoît Thébaudeau2016-07-281-66/+113
| | |
    The I²C NACK issue of the RV-8803 may occur after any I²C START
    condition, depending on the timings.  Consequently, the workaround must
    be applied for all the I²C transfers.

    This commit abstracts the I²C transfer code into register access
    functions.  This avoids duplicating the I²C workaround everywhere.
    This also avoids the duplication of the code handling the return value
    of i2c_smbus_read_i2c_block_data().  Error messages are issued in case
    of definitive register access failures (if the workaround fails).

    This change also makes the I²C transfer return value checks consistent.

    Signed-off-by: Benoît Thébaudeau <benoit@wsystem.com>
    Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
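    A sketch of what such a register accessor with the built-in retry
    workaround might look like; the function name, retry count and error
    codes checked are assumptions:

        static int rv8803_read_reg_sketch(const struct i2c_client *client, u8 reg)
        {
                int try = 0;
                s32 ret;

                /* The chip may NACK right after a START condition; retry a
                 * few times before treating the failure as definitive.
                 */
                do {
                        ret = i2c_smbus_read_byte_data(client, reg);
                } while ((ret == -ENXIO || ret == -EIO) && ++try < 3);

                if (ret < 0)
                        dev_err(&client->dev,
                                "Unable to read register 0x%02x\n", reg);
                return ret;
        }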