Commit message (Author, Age, Files, Lines)
* drbd: device->ldev is not guaranteed on an D_ATTACHING disk (Philipp Reisner, 2014-07-10, 6 files, -56/+79)
| Some parts of the code assumed that get_ldev_if_state(device, D_ATTACHING) is sufficient to access the ldev member of the device object. That was wrong: ldev may not be there, or might be freed at any time, if the device has a disk state of D_ATTACHING.
|
| bm_rw()
|   Documented that drbd_bm_read() is only called from drbd_adm_attach. drbd_bm_write() is only called when a reference is held, and it is documented that a caller has to hold a reference before calling drbd_bm_write().
| drbd_bm_write_page()
|   Use get_ldev() instead of get_ldev_if_state(device, D_ATTACHING).
| drbd_bmio_set_n_write()
|   No longer use get_ldev_if_state(device, D_ATTACHING). All callers hold a reference to ldev now.
| drbd_bmio_clear_n_write()
|   All callers were holding a reference to ldev anyway. Remove the misleading get_ldev_if_state(device, D_ATTACHING).
| drbd_reconsider_max_bio_size()
|   Removed the get_ldev_if_state(device, D_ATTACHING). All callers now pass a struct drbd_backing_dev* when they have a proper reference, or a NULL pointer. Before this fix, the receiver could trigger a NULL pointer deref when in drbd_reconsider_max_bio_size().
| drbd_bump_write_ordering()
|   Used get_ldev_if_state(device, D_ATTACHING) with the wrong assumption. Remove it, and allow the caller to pass in a struct drbd_backing_dev* when the caller knows that accessing this bdev is safe.
|
| Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
| Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
* drbd: Move write_ordering from connection to resource (Philipp Reisner, 2014-07-10, 5 files, -19/+19)
| | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
* block: virtio-blk: support multi virt queues per virtio-blk device (Ming Lei, 2014-07-01, 1 file, -20/+84)
| First, this patch adds support for more than one virtual queue per virtio-blk device. Second, it maps the virtual queues to blk-mq's hardware queues. With this approach, both scalability and performance can be improved.
|
| Signed-off-by: Ming Lei <ming.lei@canonical.com>
| Acked-by: Michael S. Tsirkin <mst@redhat.com>
| Signed-off-by: Jens Axboe <axboe@fb.com>
* include/uapi/linux/virtio_blk.h: introduce feature of VIRTIO_BLK_F_MQ (Ming Lei, 2014-07-01, 1 file, -0/+5)
| The current virtio-blk spec supports only one virtual queue for transferring data between the VM and the host, and inside the VM every operation on the virtual queue needs to hold one lock, which causes the following problems:
|   - bad scalability
|   - bad throughput
|
| This patch introduces the VIRTIO_BLK_F_MQ feature so that more than one virtual queue can be used by a virtio-blk device, which solves or eases the problems above.
|
| Signed-off-by: Ming Lei <ming.lei@canonical.com>
| Acked-by: Michael S. Tsirkin <mst@redhat.com>
| Signed-off-by: Jens Axboe <axboe@fb.com>
* block SG_IO: add SG_FLAG_Q_AT_HEAD flag (Douglas Gilbert, 2014-07-01, 3 files, -6/+13)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After the SG_IO ioctl was copied into the block layer and later into the bsg driver, subtle differences emerged. One difference is the way injected commands are queued through the block layer (i.e. this is not SCSI device queueing nor SATA NCQ). Summarizing: - SG_IO on block layer device: blk_exec*(at_head=false) - sg device SG_IO: at_head=true - bsg device SG_IO: at_head=true Some time ago Boaz Harrosh introduced a sg v4 flag called BSG_FLAG_Q_AT_TAIL to override the bsg driver default. A recent patch titled: "sg: add SG_FLAG_Q_AT_TAIL flag" allowed the sg driver default to be overridden. This patch allows a SG_IO ioctl sent to a block layer device to have its default overridden. ChangeLog: - introduce SG_FLAG_Q_AT_HEAD flag in sg.h to cause commands that are injected via a block layer device SG_IO ioctl to set at_head=true - make comments clearer about queueing in sg.h since the header is used both by the sg device and block layer device implementations of the SG_IO ioctl. - introduce BSG_FLAG_Q_AT_HEAD in bsg.h for compatibility (it does nothing) and update comments. Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Jens Axboe <axboe@fb.com>
* block: fix SG_[GS]ET_RESERVED_SIZE ioctl when max_sectors is huge (Akinobu Mita, 2014-07-01, 1 file, -4/+11)
| | | | | | | | | | | | | | | SG_GET_RESERVED_SIZE and SG_SET_RESERVED_SIZE ioctls access a reserved buffer in bytes as int type. The value needs to be capped at the request queue's max_sectors. But integer overflow is not correctly handled in the calculation when converting max_sectors from sectors to bytes. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Douglas Gilbert <dgilbert@interlog.com> Cc: linux-scsi@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
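The overflow described in this commit can be illustrated outside the kernel. The following is a minimal user-space sketch, not the patch itself: `sectors_to_bytes_capped()` is a hypothetical helper name, and 512-byte sectors (a shift of 9) are assumed. It clamps the sector count before shifting so that the byte value always fits in an `int`.

```c
#include <limits.h>

/*
 * Hypothetical user-space stand-in, not the kernel function.
 * Converts a sector count to bytes (512-byte sectors assumed) and
 * saturates so that the result is always representable as an int.
 */
static int sectors_to_bytes_capped(unsigned int max_sectors)
{
    /* Largest sector count whose byte size still fits in an int. */
    const unsigned int limit = (unsigned int)INT_MAX >> 9;

    if (max_sectors > limit)
        max_sectors = limit;

    return (int)(max_sectors << 9);
}
```

Clamping before the shift is the important design choice here: shifting first and comparing afterwards would already have wrapped for very large sector counts.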
* block: fix BLKSECTGET ioctl when max_sectors is greater than USHRT_MAX (Akinobu Mita, 2014-07-01, 2 files, -3/+8)
| The BLKSECTGET ioctl stores the request queue's max_sectors through the argument pointer as an unsigned short. If max_sectors is greater than USHRT_MAX, the upper 16 bits are simply discarded. In that case, returning USHRT_MAX is preferable to returning the lower 16 bits of max_sectors.
|
| Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
| Cc: Jens Axboe <axboe@kernel.dk>
| Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
| Cc: Douglas Gilbert <dgilbert@interlog.com>
| Cc: linux-scsi@vger.kernel.org
| Reviewed-by: Christoph Hellwig <hch@lst.de>
| Signed-off-by: Jens Axboe <axboe@fb.com>
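The difference between truncating and saturating can be shown with a small user-space sketch. `clamp_sectors_to_ushort()` is a hypothetical name chosen for illustration; the kernel patch may be structured differently.

```c
#include <limits.h>

/*
 * Hypothetical helper: report a sector limit through a 16-bit field.
 * Values that do not fit are saturated to USHRT_MAX instead of being
 * truncated to their low 16 bits.
 */
static unsigned short clamp_sectors_to_ushort(unsigned int max_sectors)
{
    if (max_sectors > USHRT_MAX)
        return USHRT_MAX;

    return (unsigned short)max_sectors;
}
```

With a plain cast, a limit of 65536 sectors would be reported as 0, which is far more misleading to a caller than the saturated value.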
* block/partitions/efi.c: kerneldoc fixing (Fabian Frederick, 2014-07-01, 1 file, -22/+24)
| | | | | | | | | | Adding function documentation and fixing kerneldoc warnings ('field: description' uniformization). Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jens Axboe <axboe@fb.com>
* block/partitions/msdos.c: code clean-up (Fabian Frederick, 2014-07-01, 1 file, -5/+8)
| | | | | | | | | | | | checkpatch fixing: WARNING: Missing a blank line after declarations WARNING: space prohibited between function name and open parenthesis '(' ERROR: spaces required around that '<' (ctx:VxV) Cc: Jens Axboe <axboe@kernel.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jens Axboe <axboe@fb.com>
* block/partitions/amiga.c: replace nolevel printk by pr_err (Fabian Frederick, 2014-07-01, 1 file, -5/+7)
| | | | | | | | | Also add no prefix pr_fmt to avoid any future default format update Cc: Jens Axboe <axboe@kernel.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jens Axboe <axboe@fb.com>
* block/partitions/aix.c: replace count*size kzalloc by kcalloc (Fabian Frederick, 2014-07-01, 1 file, -1/+1)
| | | | | | | | | kcalloc manages count*sizeof overflow. Cc: Jens Axboe <axboe@kernel.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jens Axboe <axboe@fb.com>
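The kind of count*size overflow check this commit relies on can be sketched in user space. `zalloc_array()` below is a hypothetical analogue written for illustration, not the kernel's kcalloc() implementation.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical user-space analogue of an overflow-checked, zeroing
 * array allocation. Returns NULL if count * size would overflow
 * size_t, instead of silently allocating a too-small buffer.
 */
static void *zalloc_array(size_t count, size_t size)
{
    void *p;

    if (size != 0 && count > SIZE_MAX / size)
        return NULL;            /* count * size would wrap around */

    p = malloc(count * size);
    if (p)
        memset(p, 0, count * size);

    return p;
}
```

An open-coded `malloc(count * size)` has no such guard, so a wrapped product yields a short buffer that later code overruns.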
* bio-integrity: add "bip_max_vcnt" into struct bio_integrity_payload (Gu Zheng, 2014-07-01, 2 files, -9/+4)
| Commit 08778795 ("block: Fix nr_vecs for inline integrity vectors") from Martin introduced the function bip_integrity_vecs() (which gets the useful vectors) to fix the nr_vecs issue for inline integrity vectors reported by David Milburn. However, bip_integrity_vecs() returns the wrong number if the bio is not based on any bio_set for some reason (bio->bi_pool == NULL), because in that case bip_inline_vecs[0] is allocated directly. This patch adds bip_max_vcnt to record the count of vector slots and cleans up bip_integrity_vecs().
|
| Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
| Cc: Martin K. Petersen <martin.petersen@oracle.com>
| Cc: Kent Overstreet <kmo@daterainc.com>
| Signed-off-by: Jens Axboe <axboe@fb.com>
* blk-mq: use percpu_ref for mq usage count (Tejun Heo, 2014-07-01, 2 files, -40/+31)
| | | | | | | | | | | | | | | | | | | | | | | | | Currently, blk-mq uses a percpu_counter to keep track of how many usages are in flight. The percpu_counter is drained while freezing to ensure that no usage is left in-flight after freezing is complete. blk_mq_queue_enter/exit() and blk_mq_[un]freeze_queue() implement this per-cpu gating mechanism. This type of code has relatively high chance of subtle bugs which are extremely difficult to trigger and it's way too hairy to be open coded in blk-mq. percpu_ref can serve the same purpose after the recent changes. This patch replaces the open-coded per-cpu usage counting and draining mechanism with percpu_ref. blk_mq_queue_enter() performs tryget_live on the ref and exit() performs put. blk_mq_freeze_queue() kills the ref and waits until the reference count reaches zero. blk_mq_unfreeze_queue() revives the ref and wakes up the waiters. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Nicholas A. Bellinger <nab@linux-iscsi.org> Cc: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* blk-mq: collapse __blk_mq_drain_queue() into blk_mq_freeze_queue() (Tejun Heo, 2014-07-01, 1 file, -14/+9)
| | | | | | | | | | | | | Keeping __blk_mq_drain_queue() as a separate function doesn't buy us anything and it's gonna be further simplified. Let's flatten it into its caller. This patch doesn't make any functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Nicholas A. Bellinger <nab@linux-iscsi.org> Signed-off-by: Jens Axboe <axboe@fb.com>
* blk-mq: decouble blk-mq freezing from generic bypassing (Tejun Heo, 2014-07-01, 4 files, -13/+9)
| blk_mq freezing is entangled with generic bypassing which bypasses blkcg and io scheduler and lets IO requests fall through the block layer to the drivers in FIFO order. This allows forward progress on IOs with the advanced features disabled so that those features can be configured or altered without worrying about stalling IO which may lead to deadlock through memory allocation.
|
| However, generic bypassing doesn't quite fit blk-mq. blk-mq currently doesn't make use of blkcg or ioscheds and it maps bypassing to freezing, which blocks request processing and drains all the in-flight ones. This causes problems as bypassing assumes that request processing is online. blk-mq works around this by conditionally allowing request processing for the problem case - during queue initialization.
|
| Another oddity is that except for during queue cleanup, bypassing started on the generic side prevents blk-mq from processing new requests but doesn't drain the in-flight ones. This shouldn't break anything but again highlights that something isn't quite right here.
|
| The root cause is conflating blk-mq freezing and generic bypassing which are two different mechanisms. The only intersecting purpose that they serve is during queue cleanup. Let's properly separate blk-mq freezing from generic bypassing and simply use it where necessary.
|
| * request_queue->mq_freeze_depth is added and blk_mq_[un]freeze_queue() now operate on this counter instead of ->bypass_depth. The replacement for QUEUE_FLAG_BYPASS isn't added but the counter is tested directly. This will be further updated by later changes.
|
| * blk_mq_drain_queue() is dropped and "__" prefix is dropped from blk_mq_freeze_queue(). Queue cleanup path now calls blk_mq_freeze_queue() directly.
|
| * blk_queue_enter()'s fast path condition is simplified to simply check @q->mq_freeze_depth. Previously, the condition was
|
|     !blk_queue_dying(q) && (!blk_queue_bypass(q) || !blk_queue_init_done(q))
|
|   mq_freeze_depth is incremented right after dying is set and blk_queue_init_done() exception isn't necessary as blk-mq doesn't start frozen, which only leaves the blk_queue_bypass() test which can be replaced by @q->mq_freeze_depth test.
|
| This change simplifies the code and reduces confusion in the area.
|
| Signed-off-by: Tejun Heo <tj@kernel.org>
| Cc: Jens Axboe <axboe@kernel.dk>
| Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
| Signed-off-by: Jens Axboe <axboe@fb.com>
* block, blk-mq: draining can't be skipped even if bypass_depth was non-zero (Tejun Heo, 2014-07-01, 3 files, -10/+10)
| | | | | | | | | | | | | | | | | | | | | | Currently, both blk_queue_bypass_start() and blk_mq_freeze_queue() skip queue draining if bypass_depth was already above zero. The assumption is that the one which bumped the bypass_depth should have performed draining already; however, there's nothing which prevents a new instance of bypassing/freezing from starting before the previous one finishes draining. The current code may allow the later bypassing/freezing instances to complete while there still are in-flight requests which haven't finished draining. Fix it by draining regardless of bypass_depth. We still skip draining from blk_queue_bypass_start() while the queue is initializing to avoid introducing excessive delays during boot. INIT_DONE setting is moved above the initial blk_queue_bypass_end() so that bypassing attempts can't slip inbetween. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Nicholas A. Bellinger <nab@linux-iscsi.org> Signed-off-by: Jens Axboe <axboe@fb.com>
* blk-mq: fix a memory ordering bug in blk_mq_queue_enter() (Tejun Heo, 2014-07-01, 1 file, -1/+1)
| blk-mq uses a percpu_counter to keep track of how many usages are in flight. The percpu_counter is drained while freezing to ensure that no usage is left in-flight after freezing is complete. blk_mq_queue_enter/exit() and blk_mq_[un]freeze_queue() implement this per-cpu gating mechanism; unfortunately, it contains a subtle bug - smp_wmb() in blk_mq_queue_enter() doesn't prevent the cpu from fetching @q->bypass_depth before incrementing @q->mq_usage_counter and if freezing happens in between the caller can slip through and freezing can be complete while there are active users.
|
| Use smp_mb() instead so that bypass_depth and mq_usage_counter modifications and tests are properly interlocked.
|
| Signed-off-by: Tejun Heo <tj@kernel.org>
| Cc: Jens Axboe <axboe@kernel.dk>
| Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
| Signed-off-by: Jens Axboe <axboe@fb.com>
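The interlock this commit describes (increment the usage count, then test the freeze state, with full ordering between the two) can be modelled in user space with C11 atomics. This is only a sketch under stated assumptions: `struct gate` and all `gate_*()` names are hypothetical, a single shared counter stands in for the per-cpu counter, and sequentially consistent atomics stand in for smp_mb().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical single-counter model of the enter/freeze gate. */
struct gate {
    atomic_int  usage;   /* users currently inside the gate */
    atomic_bool frozen;  /* set while a freeze is in progress */
};

static void gate_init(struct gate *g)
{
    atomic_init(&g->usage, 0);
    atomic_init(&g->frozen, false);
}

/*
 * Enter: publish the usage increment first, then test the freeze flag.
 * The default seq_cst ordering keeps the load from being satisfied
 * before the increment is visible, which is the property the commit
 * says a write-only barrier does not give.
 */
static bool gate_enter(struct gate *g)
{
    atomic_fetch_add(&g->usage, 1);
    if (atomic_load(&g->frozen)) {
        atomic_fetch_sub(&g->usage, 1);  /* back out, gate is closed */
        return false;
    }
    return true;
}

static void gate_exit(struct gate *g)
{
    atomic_fetch_sub(&g->usage, 1);
}

/* Freeze side: close the gate first, then check for remaining users. */
static void gate_freeze_begin(struct gate *g)
{
    atomic_store(&g->frozen, true);
}

static bool gate_drained(struct gate *g)
{
    return atomic_load(&g->usage) == 0;
}

static void gate_unfreeze(struct gate *g)
{
    atomic_store(&g->frozen, false);
}
```

In this model, either the entering thread observes `frozen` and backs out, or the freezing thread observes a non-zero `usage` and keeps waiting; both sides missing each other is the case that the stronger barrier rules out.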
* Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu into for-3.17/core (Jens Axboe, 2014-07-01, 11 files, -822/+834)
|\  Merge the percpu_ref changes from Tejun; he says they are stable now.
| * percpu-refcount: implement percpu_ref_reinit() and percpu_ref_is_zero() (Tejun Heo, 2014-06-28, 2 files, -0/+54)
| | Now that explicit invocation of percpu_ref_exit() is necessary to free the percpu counter, we can implement percpu_ref_reinit() which reinitializes a released percpu_ref. This can be used to implement a scalable gating switch which can be drained and then re-opened without worrying about memory allocation failures.
| |
| | percpu_ref_is_zero() is added to be used in a sanity check in percpu_ref_exit(). As this function will be useful for other purposes too, make it a public interface.
| |
| | v2: Use smp_read_barrier_depends() instead of smp_load_acquire(). We only need data dep barrier and smp_load_acquire() is stronger and heavier on some archs. Spotted by Lai Jiangshan.
| |
| | Signed-off-by: Tejun Heo <tj@kernel.org>
| | Cc: Kent Overstreet <kmo@daterainc.com>
| | Cc: Christoph Lameter <cl@linux-foundation.org>
| | Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
| * percpu-refcount: require percpu_ref to be exited explicitly (Tejun Heo, 2014-06-28, 5 files, -33/+24)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, a percpu_ref undoes percpu_ref_init() automatically by freeing the allocated percpu area when the percpu_ref is killed. While seemingly convenient, this has the following niggles. * It's impossible to re-init a released reference counter without going through re-allocation. * In the similar vein, it's impossible to initialize a percpu_ref count with static percpu variables. * We need and have an explicit destructor anyway for failure paths - percpu_ref_cancel_init(). This patch removes the automatic percpu counter freeing in percpu_ref_kill_rcu() and repurposes percpu_ref_cancel_init() into a generic destructor now named percpu_ref_exit(). percpu_ref_destroy() is considered but it gets confusing with percpu_ref_kill() while "exit" clearly indicates that it's the counterpart of percpu_ref_init(). All percpu_ref_cancel_init() users are updated to invoke percpu_ref_exit() instead and explicit percpu_ref_exit() calls are added to the destruction path of all percpu_ref users. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Benjamin LaHaise <bcrl@kvack.org> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Nicholas A. Bellinger <nab@linux-iscsi.org> Cc: Li Zefan <lizefan@huawei.com>
| * percpu-refcount: use unsigned long for pcpu_count pointer (Tejun Heo, 2014-06-28, 2 files, -8/+7)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | percpu_ref->pcpu_count is a percpu pointer with a status flag in its lowest bit. As such, it always goes through arithmetic operations which is very cumbersome to do on a pointer. It has to be first casted to unsigned long and then back. Let's just make the field unsigned long so that we can skip the first casts. While at it, rename it to pcpu_counter_ptr to clarify that it's a pointer value. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Christoph Lameter <cl@linux-foundation.org>
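Storing a pointer as an integer with a status flag in its lowest bit, as this commit describes, can be sketched in user space as follows. All names here (`struct tagged_ref`, `REF_DEAD`, the `tagged_ref_*()` helpers) are hypothetical, and `uintptr_t` is used where the kernel field is an unsigned long.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define REF_DEAD ((uintptr_t)1)   /* hypothetical status flag in bit 0 */

struct tagged_ref {
    uintptr_t count_ptr;  /* counter address with REF_DEAD in bit 0 */
};

static void tagged_ref_init(struct tagged_ref *r, unsigned int *counter)
{
    /* The scheme relies on alignment leaving bit 0 of the address clear. */
    assert(((uintptr_t)counter & REF_DEAD) == 0);
    r->count_ptr = (uintptr_t)counter;
}

static void tagged_ref_kill(struct tagged_ref *r)
{
    r->count_ptr |= REF_DEAD;
}

static bool tagged_ref_alive(const struct tagged_ref *r)
{
    return (r->count_ptr & REF_DEAD) == 0;
}

/* Mask the flag out and cast back once, only where a pointer is needed. */
static unsigned int *tagged_ref_counter(const struct tagged_ref *r)
{
    return (unsigned int *)(r->count_ptr & ~REF_DEAD);
}
```

Keeping the field as an integer means the flag tests need no casts at all, and only the one accessor that dereferences the counter converts back to a pointer.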
| * percpu-refcount: add helpers for ->percpu_count accesses (Tejun Heo, 2014-06-28, 2 files, -22/+30)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * All four percpu_ref_*() operations implemented in the header file perform the same operation to determine whether the percpu_ref is alive and extract the percpu pointer. Factor out the common logic into __pcpu_ref_alive(). This doesn't change the generated code. * There are a couple places in percpu-refcount.c which masks out PCPU_REF_DEAD to obtain the percpu pointer. Factor it out into pcpu_count_ptr(). * The above changes make the WARN_ON_ONCE() conditional at the top of percpu_ref_kill_and_confirm() the only user of REF_STATUS(). Test PCPU_REF_DEAD directly and remove REF_STATUS(). This patch doesn't introduce any functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Christoph Lameter <cl@linux-foundation.org>
| * percpu-refcount: one bit is enough for REF_STATUS (Tejun Heo, 2014-06-28, 2 files, -4/+2)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | percpu-refcount currently reserves two lowest bits of its percpu pointer to indicate its state; however, only one bit is used for PCPU_REF_DEAD. Simplify it by removing PCPU_STATUS_BITS/MASK and testing PCPU_REF_DEAD directly. This also allows the compiler to choose a more efficient instruction depending on the architecture. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Christoph Lameter <cl@linux-foundation.org>
| * percpu-refcount, aio: use percpu_ref_cancel_init() in ioctx_alloc() (Tejun Heo, 2014-06-28, 1 file, -2/+2)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | ioctx_alloc() reaches inside percpu_ref and directly frees ->pcpu_count in its failure path, which is quite gross. percpu_ref has been providing a proper interface to do this, percpu_ref_cancel_init(), for quite some time now. Let's use that instead. This patch doesn't introduce any behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Benjamin LaHaise <bcrl@kvack.org> Cc: Kent Overstreet <kmo@daterainc.com>
| * workqueue: stronger test in process_one_work() (Lai Jiangshan, 2014-06-19, 1 file, -7/+2)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After the recent changes, when POOL_DISASSOCIATED is cleared, the running worker's local CPU should be the same as pool->cpu without any exception even during cpu-hotplug. Update the sanity check in process_one_work() accordingly. This patch changes "(proposition_A && proposition_B && proposition_C)" to "(proposition_B && proposition_C)", so if the old compound proposition is true, the new one must be true too. so this will not hide any possible bug which can be caught by the old test. tj: Minor updates to the description. CC: Jason J. Herne <jjherne@linux.vnet.ibm.com> CC: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * workqueue: clear POOL_DISASSOCIATED in rebind_workers() (Lai Jiangshan, 2014-06-19, 1 file, -7/+3)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit a9ab775bcadf ("workqueue: directly restore CPU affinity of workers from CPU_ONLINE") moved the pool->lock into rebind_workers() without also moving "pool->flags &= ~POOL_DISASSOCIATED". There is nothing wrong with "pool->flags &= ~POOL_DISASSOCIATED" not being moved together, but there isn't any benefit either. We move it into rebind_workers() and achieve these benefits: 1) Better readability. POOL_DISASSOCIATED is cleared in rebind_workers() as expected. 2) When POOL_DISASSOCIATED is cleared, we can ensure that all the running workers of the pool are on the local CPU (pool->cpu). tj: Cosmetic updates to the code and description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * percpu: Use ALIGN macro instead of hand coding alignment calculation (Christoph Lameter, 2014-06-19, 1 file, -2/+1)
| | | | | | | | | | Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
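For readers unfamiliar with the calculation being replaced, the round-up that an ALIGN-style macro performs can be sketched in user space. `align_up()` is a hypothetical function written for illustration; it is not the kernel macro's definition, and it assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: round x up to the next multiple of a.
 * Only valid when a is a non-zero power of two, because the mask
 * trick depends on a - 1 having all low bits set.
 */
static size_t align_up(size_t x, size_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + (a - 1)) & ~(a - 1);
}
```

Giving the calculation one name avoids repeating the add-and-mask expression at each call site, where an off-by-one in the mask is easy to miss.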
| * percpu: invoke __verify_pcpu_ptr() from the generic part of accessors and operations (Tejun Heo, 2014-06-17, 2 files, -9/+21)
| | __verify_pcpu_ptr() is used to verify that a specified parameter is actually a percpu pointer by percpu accessor and operation implementations. Currently, where it's called isn't clearly defined and we just ensure that it's invoked at least once for all accessors and operations.
| |
| | The lack of clarity on when it should be called isn't nice and given that this is a completely generic issue, there's no reason to make archs worry about it. This patch updates __verify_pcpu_ptr() invocations such that it's always invoked from the final generic wrapper once per access or operation. As this is already the case for {raw|this}_cpu_*() definitions through __pcpu_size_*(), only the {raw|per|this}_cpu_ptr() accessors need to be updated.
| |
| | This change makes it unnecessary for archs to worry about __verify_pcpu_ptr(). x86's arch_raw_cpu_ptr() is updated accordingly.
| |
| | Signed-off-by: Tejun Heo <tj@kernel.org>
| | Cc: Christoph Lameter <cl@linux-foundation.org>
| | Cc: Thomas Gleixner <tglx@linutronix.de>
| | Cc: Ingo Molnar <mingo@redhat.com>
| | Cc: "H. Peter Anvin" <hpa@zytor.com>
| * percpu: preffity percpu header files (Tejun Heo, 2014-06-17, 2 files, -399/+435)
| | percpu macros are difficult to read. It's partly because they're fairly complex but also because they simply lack visual and conventional consistency to an unusual degree. The preceding patches tried to organize macro definitions consistently by their roles. This patch makes the following cosmetic changes to improve overall readability.
| |
| | * Use consistent convention for multi-line macro definitions - "do {" or "({" are now put on their own lines and the line continuing '\' are all put on the same column.
| |
| | * Temp variables used inside macro are consistently given "__" prefix.
| |
| | * When a macro argument is passed to another macro or a function, putting extra parentheses around it doesn't help anything. Don't put them.
| |
| | * _this_cpu_generic_*() are renamed to this_cpu_generic_*() so that they're consistent with raw_cpu_generic_*().
| |
| | * Reorganize raw_cpu_*() and this_cpu_*() definitions so that trivial wrappers are collected in one place after actual operation definitions.
| |
| | * Other misc cleanups including reorganizing comments.
| |
| | All changes in this patch are cosmetic and cause no functional difference.
| |
| | Signed-off-by: Tejun Heo <tj@kernel.org>
| | Acked-by: Christoph Lameter <cl@linux.com>
| * percpu: use raw_cpu_*() to define __this_cpu_*() (Tejun Heo, 2014-06-17, 1 file, -9/+9)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __this_cpu_*() operations are the same as raw_cpu_*() operations except for the added __this_cpu_preempt_check(). Curiously, these were defined using __pcu_size_call_*() instead of being layered on top of raw_cpu_*(). Let's layer them so that __this_cpu_*() are defined in terms of raw_cpu_*(). It's simpler and less error-prone this way. This patch doesn't introduce any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Christoph Lameter <cl@linux.com>
| * percpu: reorder macros in percpu header files (Tejun Heo, 2014-06-17, 2 files, -112/+112)
| | * In include/asm-generic/percpu.h, collect {raw|_this}_cpu_generic*() macros into one place. They were dispersed through {raw|this}_cpu_*_N() definitions and the visual inconsistency was making following the code unnecessarily difficult.
| |
| | * In include/linux/percpu-defs.h, move __verify_pcpu_ptr() later in the file so that it's right above accessor definitions where it's actually used.
| |
| | This is pure reorganization.
| |
| | Signed-off-by: Tejun Heo <tj@kernel.org>
| | Acked-by: Christoph Lameter <cl@linux.com>
| * percpu: move {raw|this}_cpu_*() definitions to include/linux/percpu-defs.h (Tejun Heo, 2014-06-17, 2 files, -208/+209)
| | We're in the process of moving all percpu accessors and operations to include/linux/percpu-defs.h so that they're available to arch headers without having to include full include/linux/percpu.h which may cause cyclic inclusion dependency.
| |
| | This patch moves {raw|this}_cpu_*() definitions from include/linux/percpu.h to include/linux/percpu-defs.h. The code is moved mostly verbatim; however, raw_cpu_*() are placed above this_cpu_*() which is more conventional as the raw operations may be used to define other variants.
| |
| | This is pure reorganization.
| |
| | Signed-off-by: Tejun Heo <tj@kernel.org>
| | Acked-by: Christoph Lameter <cl@linux.com>
| * percpu: move generic {raw|this}_cpu_*_N() definitions to include/asm-generic/percpu.h (Tejun Heo, 2014-06-17, 2 files, -344/+341)
| | {raw|this}_cpu_*_N() operations are expected to be provided by archs and the generic definitions are provided as fallbacks. As such, these firmly belong to include/asm-generic/percpu.h.
| |
| | Move the generic definitions to include/asm-generic/percpu.h. The code is moved mostly verbatim; however, raw_cpu_*_N() are placed above this_cpu_*_N() which is more conventional as the raw operations may be used to define other variants.
| |
| | This is pure reorganization.
| |
| | Signed-off-by: Tejun Heo <tj@kernel.org>
| | Acked-by: Christoph Lameter <cl@linux.com>
| * percpu: only allow sized arch overrides for {raw|this}_cpu_*() ops (Tejun Heo, 2014-06-17, 1 file, -89/+5)
| | Currently, percpu allows two separate methods for overriding {raw|this}_cpu_*() ops - for a given operation, an arch can provide whole replacement or sized sub operations to override specific parts of it. e.g. arch either can provide this_cpu_add() or this_cpu_add_4() to override only the 4 byte operation.
| |
| | While quite flexible at a glance, the dual-overriding scheme complicates the code path for no actual gain. It complicates the already complex operation definitions and if an arch wants to override all sizes, it can easily provide all variants anyway. In fact, no arch is actually making use of whole operation override.
| |
| | Another oddity is that __this_cpu_*() operations are defined in the same way as raw_cpu_*() but ignores full overrides of the raw_cpu_*() and doesn't allow full operation override, so if an arch provides whole overrides for raw_cpu_*() operations __this_cpu_*() ends up using the generic implementations.
| |
| | More importantly, it takes away the layering between arch-specific and generic parts making it impossible for the generic part to implement arch-independent features on top of arch-specific overrides.
| |
| | This patch removes the support for whole operation overrides. As no arch is using it, this doesn't cause any actual difference.
| |
| | Signed-off-by: Tejun Heo <tj@kernel.org>
| | Acked-by: Christoph Lameter <cl@linux.com>
| * percpu: reorganize include/linux/percpu-defs.h (Tejun Heo, 2014-06-17, 1 file, -23/+9)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reorganize for better readability. * Accessor definitions are collected into one place and SMP and UP now define them in the same order. * Definitions are layered when possible - e.g. per_cpu() is now defined in terms of this_cpu_ptr(). * Rather pointless comment dropped. * per_cpu(), __raw_get_cpu_var() and __get_cpu_var() are defined in a way which can be shared between SMP and UP and moved out of CONFIG_SMP blocks. This patch doesn't introduce any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux-foundation.org>
| * percpu: move accessors from include/linux/percpu.h to percpu-defs.h (Tejun Heo, 2014-06-17, 2 files, -37/+32)
| | include/linux/percpu-defs.h is gonna host all accessors and operations so that arch headers can make use of them too without worrying about circular dependency through include/linux/percpu.h.
| |
| | This patch moves the following accessors from include/linux/percpu.h to include/linux/percpu-defs.h.
| |
| | * get/put_cpu_var()
| | * get/put_cpu_ptr()
| | * per_cpu_ptr()
| |
| | This is pure reorganization.
| |
| | Signed-off-by: Tejun Heo <tj@kernel.org>
| | Acked-by: Christoph Lameter <cl@linux.com>
| * percpu: include/asm-generic/percpu.h should contain only arch-overridable parts (Tejun Heo, 2014-06-17, 2 files, -64/+89)
| | The roles of the various percpu header files have become unclear. There are four header files involved.
| |
| |   include/linux/percpu-defs.h
| |   include/linux/percpu.h
| |   include/asm-generic/percpu.h
| |   arch/*/include/asm/percpu.h
| |
| | The original intention for include/asm-generic/percpu.h is providing generic definitions for arch-overridable parts; however, it now hosts various stuff which can't be overridden by archs.
| |
| | Also, include/linux/percpu-defs.h was initially added to contain section and percpu variable definition macros so that arch header files can make use of them without worrying about introducing cyclic inclusion dependency by including include/linux/percpu.h; however, arch headers sometimes need to access percpu variables too and this is one of the reasons why some accessors were implemented in include/linux/asm-generic/percpu.h.
| |
| | Let's clear up the situation by making include/asm-generic/percpu.h contain only arch-overridable parts and moving accessors and operations into include/linux/percpu-defs.h. Note that this patch only moves things from include/asm-generic/percpu.h. include/linux/percpu.h will be taken care of by later patches.
| |
| | This patch moves the following.
| |
| | * SHIFT_PERCPU_PTR() / VERIFY_PERCPU_PTR()
| | * per_cpu()
| | * raw_cpu_ptr()
| | * this_cpu_ptr()
| | * __get_cpu_var()
| | * __raw_get_cpu_var()
| | * __this_cpu_ptr()
| | * PER_CPU_[SHARED_]ALIGNED_SECTION
| | * PER_CPU_[SHARED_]ALIGNED_SECTION
| | * PER_CPU_FIRST_SECTION
| |
| | This patch is pure reorganization.
| |
| | Signed-off-by: Tejun Heo <tj@kernel.org>
| | Acked-by: Christoph Lameter <cl@linux.com>
| * percpu: introduce arch_raw_cpu_ptr()Tejun Heo2014-06-172-3/+10
| | Currently, archs can override raw_cpu_ptr() directly; however, we wanna build a layer of indirection in the generic part of percpu so that we can implement generic features there without affecting archs.
| |
| | Introduce arch_raw_cpu_ptr() which is used to define raw_cpu_ptr() by generic percpu code. The two are identical for now. x86 is currently the only arch which overrides raw_cpu_ptr() and is converted to define arch_raw_cpu_ptr() instead.
| |
| | This doesn't introduce any functional difference.
| |
| | Signed-off-by: Tejun Heo <tj@kernel.org>
| | Cc: Christoph Lameter <cl@linux-foundation.org>
| | Cc: Thomas Gleixner <tglx@linutronix.de>
| | Cc: Ingo Molnar <mingo@redhat.com>
| | Cc: "H. Peter Anvin" <hpa@zytor.com>
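The layering described in this commit message can be sketched in plain userspace C. This is only an illustration under stated assumptions: the macro names arch_raw_cpu_ptr() and raw_cpu_ptr() come from the message, while the array-based "per-CPU area" and every identifier ending in _sketch are hypothetical and do not reflect the kernel's real percpu implementation.

```c
#define NR_CPUS_SKETCH 4

static int current_cpu_sketch;               /* hypothetical "current CPU" id */
static long counter_sketch[NR_CPUS_SKETCH];  /* hypothetical per-CPU variable */

/*
 * An arch header could define arch_raw_cpu_ptr() before this point.
 * If it did not, the generic fallback below is used.
 */
#ifndef arch_raw_cpu_ptr
#define arch_raw_cpu_ptr(ptr) (&(ptr)[current_cpu_sketch])
#endif

/*
 * Generic layer: defined exactly once, in terms of the arch hook.
 * Generic features could later be added here without touching archs.
 */
#define raw_cpu_ptr(ptr) arch_raw_cpu_ptr(ptr)

/* Add delta to the given CPU's slot and return the slot's new value. */
long bump_counter_sketch(int cpu, long delta)
{
    current_cpu_sketch = cpu;
    *raw_cpu_ptr(counter_sketch) += delta;
    return counter_sketch[cpu];
}
```

Callers only ever use raw_cpu_ptr(); an arch that wants a faster lookup would supply its own arch_raw_cpu_ptr() and leave the generic macro alone.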
| * percpu: disallow archs from overriding SHIFT_PERCPU_PTR()Tejun Heo2014-06-171-6/+3
| | It has been about half a decade since all archs started using the dynamic percpu allocator and thus the same SHIFT_PERCPU_PTR() implementation. There's no benefit in overriding SHIFT_PERCPU_PTR() anymore.
| |
| | Remove #ifndef around it to clarify that this is identical regardless of the arch.
| |
| | This patch doesn't cause any functional difference.
| |
| | Signed-off-by: Tejun Heo <tj@kernel.org>
| | Acked-by: Christoph Lameter <cl@linux.com>
* | Linux 3.16-rc3v3.16-rc3Linus Torvalds2014-06-291-1/+1
| |
* | Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-armLinus Torvalds2014-06-296-6/+45
|\ \
| | | Pull ARM fixes from Russell King:
| | |  "Another round of ARM fixes. The largest change here is the L2 changes to work around problems for the Armada 37x/380 devices, where most of the size comes down to comments rather than code.
| | |
| | |   The other significant fix here is for the ptrace code, to ensure that rewritten syscalls work as intended. This was pointed out by Kees Cook, but Will Deacon reworked the patch to be more elegant.
| | |
| | |   The remainder are fairly trivial changes"
| | |
| | | * 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
| | |   ARM: 8087/1: ptrace: reload syscall number after secure_computing() check
| | |   ARM: 8086/1: Set memblock limit for nommu
| | |   ARM: 8085/1: sa1100: collie: add top boot mtd partition
| | |   ARM: 8084/1: sa1100: collie: revert back to cfi_probe
| | |   ARM: 8080/1: mcpm.h: remove unused variable declaration
| | |   ARM: 8076/1: mm: add support for HW coherent systems in PL310 cache
| * | ARM: 8087/1: ptrace: reload syscall number after secure_computing() checkWill Deacon2014-06-291-3/+4
| | | On the syscall tracing path, we call out to secure_computing() to allow seccomp to check the syscall number being attempted. As part of this, a SIGTRAP may be sent to the tracer and the syscall could be re-written by a subsequent SET_SYSCALL ptrace request. Unfortunately, this new syscall is ignored by the current code unless TIF_SYSCALL_TRACE is also set on the current thread.
| | |
| | | This patch slightly reworks the enter path of the syscall tracing code so that we always reload the syscall number from current_thread_info()->syscall after the potential ptrace traps.
| | |
| | | Acked-by: Kees Cook <keescook@chromium.org>
| | | Tested-by: Kees Cook <keescook@chromium.org>
| | | Signed-off-by: Will Deacon <will.deacon@arm.com>
| | | Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
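The control-flow change described above can be modelled in a few lines of userspace C. This is a rough sketch, not the ARM patch itself: every identifier ending in _sketch is hypothetical, the seccomp check is reduced to a stub that lets a "tracer" rewrite the stored syscall number, and the TIF_SYSCALL_TRACE flag is reduced to a plain int parameter.

```c
struct thread_info_sketch {
    int syscall;   /* syscall number that will be dispatched */
};

static struct thread_info_sketch current_ti_sketch;
static int tracer_rewrite_sketch = -1;   /* < 0 means "tracer leaves it alone" */

/* Test hook: pretend a tracer issues a SET_SYSCALL-style rewrite. */
void set_tracer_rewrite_sketch(int scno)
{
    tracer_rewrite_sketch = scno;
}

/* Stand-in for the seccomp check; the tracer may rewrite the syscall here. */
static int secure_computing_sketch(int scno)
{
    (void)scno;
    if (tracer_rewrite_sketch >= 0)
        current_ti_sketch.syscall = tracer_rewrite_sketch;
    return 0;
}

/* Returns the syscall number to dispatch, or -1 if the check rejects it. */
int syscall_trace_enter_sketch(int scno, int traced)
{
    current_ti_sketch.syscall = scno;

    if (secure_computing_sketch(scno) < 0)
        return -1;

    if (traced) {
        /* A ptrace stop would happen here and could also rewrite the number. */
    }

    /* Always reload, whether or not the trace flag was set. */
    return current_ti_sketch.syscall;
}
```

The point of the sketch is the last statement: the return value comes from the stored number rather than from the scno argument, so a rewrite is honoured even when traced is 0.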
| * | ARM: 8086/1: Set memblock limit for nommuLaura Abbott2014-06-291-0/+1
| | | Commit 1c2f87c (ARM: 8025/1: Get rid of meminfo) changed find_limits to use memblock_get_current_limit for calculating the max_low pfn. nommu targets never actually set a limit on memblock though which means memblock_get_current_limit will just return the default value. Set the memblock_limit to be the end of DDR to make sure bounds are calculated correctly.
| | |
| | | Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
| | | Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | ARM: 8085/1: sa1100: collie: add top boot mtd partitionAndrea Adami2014-06-291-0/+5
| | | The CFI mapping is now perfect so we can expose the top block, read only. There isn't much to read, though, just the sharpsl_params values.
| | |
| | | Signed-off-by: Andrea Adami <andrea.adami@gmail.com>
| | | Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | ARM: 8084/1: sa1100: collie: revert back to cfi_probeAndrea Adami2014-06-291-1/+1
| | | Reverts commit d26b17edafc45187c30cae134a5e5429d58ad676
| | | ARM: sa1100: collie.c: fall back to jedec_probe flash detection
| | |
| | | Unfortunately the detection was challenged on the defective unit used for tests: one of the NOR chips did not respond to the CFI query. Moreover that bad device needed extra delays on erase-suspend/resume cycles.
| | |
| | | Tested personally on 3 different units and with feedback of two other users.
| | |
| | | Signed-off-by: Andrea Adami <andrea.adami@gmail.com>
| | | Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | ARM: 8080/1: mcpm.h: remove unused variable declarationNicolas Pitre2014-06-291-2/+0
| | | The sync_phys variable has been replaced by link time computation in mcpm_head.S before the code was submitted upstream.
| | |
| | | Signed-off-by: Nicolas Pitre <nico@linaro.org>
| | | Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | ARM: 8076/1: mm: add support for HW coherent systems in PL310 cacheThomas Petazzoni2014-06-292-0/+34
| | | When a PL310 cache is used on a system that provides hardware coherency, the outer cache sync operation is useless, and can be skipped. Moreover, on some systems, it is harmful as it causes deadlocks between the Marvell coherency mechanism, the Marvell PCIe controller and the Cortex-A9.
| | |
| | | To avoid this, this commit introduces a new Device Tree property 'arm,io-coherent' for the L2 cache controller node, valid only for the PL310 cache. It identifies the usage of the PL310 cache in an I/O coherent configuration. Internally, it makes the driver disable the outer cache sync operation.
| | |
| | | Note that technically speaking, a fully coherent system wouldn't require any of the other .outer_cache operations. However, in practice, when booting secondary CPUs, these are not yet coherent, and therefore a set of cache maintenance operations are necessary at this point. This explains why we keep the other .outer_cache operations and only ->sync is disabled.
| | |
| | | While in theory any write to a PL310 register could cause the deadlock, in practice, disabling ->sync is sufficient to work around the deadlock, since the other cache maintenance operations are only used in very specific situations.
| | |
| | | Contrary to previous versions of this patch, this new version does not simply NULL-ify the ->sync member, because the l2c_init_data structures are now 'const' and therefore cannot be modified, which is a good thing. Therefore, this patch introduces a separate l2c_init_data instance, called of_l2c310_coherent_data.
| | |
| | | Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
| | | Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
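The last paragraph of this message describes a common pattern: when an ops table is const, a variant is expressed as a second const instance rather than by patching a member at run time. The following is a minimal sketch of that pattern, assuming a much-reduced structure layout. Only the idea of a separate coherent instance with no ->sync comes from the message; the struct fields, the selector function and every name ending in _sketch are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct outer_cache_fns_sketch {
    void (*flush_all)(void);
    void (*sync)(void);
};

struct l2c_init_data_sketch {
    const char *type;
    struct outer_cache_fns_sketch outer_cache;
};

static void flush_all_sketch(void) { }
static void sync_sketch(void) { }

/* Regular instance: all operations present. */
static const struct l2c_init_data_sketch l2c310_data_sketch = {
    .type = "L2C-310",
    .outer_cache = { .flush_all = flush_all_sketch, .sync = sync_sketch },
};

/* Coherent instance: same operations, but .sync is left NULL. */
static const struct l2c_init_data_sketch l2c310_coherent_data_sketch = {
    .type = "L2C-310 Coherent",
    .outer_cache = { .flush_all = flush_all_sketch },
};

/* Pick an instance from a flag standing in for the 'arm,io-coherent' property. */
const struct l2c_init_data_sketch *pick_l2c_data_sketch(bool io_coherent)
{
    return io_coherent ? &l2c310_coherent_data_sketch : &l2c310_data_sketch;
}
```

Both instances stay in read-only data, and a caller only has to check whether the sync pointer is NULL before invoking it.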
* | | MAINTAINERS: exceptions for Documentation maintainerRandy Dunlap2014-06-291-0/+3
| | | Note that I don't maintain Documentation/ABI/, Documentation/devicetree/, or the language translation files.
| | |
| | | Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
| | | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Documentation: add section about git to email-clients.txtDan Carpenter2014-06-291-0/+11
| | | These days most people use git to send patches so I have added a section about that.
| | |
| | | Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
| | | Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
| | | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge tag 'spi-v3.16-rc2' of ↵Linus Torvalds2014-06-284-35/+27
|\ \ \
| | | | git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
| | | |
| | | | Pull spi fixes from Mark Brown:
| | | |  "A few driver specific fixes, the biggest one being a fix for the newly added Qualcomm SPI controller driver to make it not use its internal chip select due to hardware bugs, replacing it with GPIOs"
| | | |
| | | | * tag 'spi-v3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
| | | |   spi: qup: Remove chip select function
| | | |   spi: qup: Fix order of spi_register_master
| | | |   spi: sh-sci: fix use-after-free in sh_sci_spi_remove()
| | | |   spi/pxa2xx: fix incorrect SW mode chipselect setting for BayTrail LPSS SPI