summaryrefslogtreecommitdiff
path: root/drivers/nvme/host/nvme.h
Commit message (Collapse)AuthorAgeFilesLines
* nvme: implement non-mdts command limitsKeith Busch2021-04-061-0/+3
| | | | | | | | | | | | | | | | | Commands that access LBA contents without a data transfer between the host historically have not had a spec defined upper limit. The driver set the queue constraints for such commands to the max data transfer size just to be safe, but this artificial constraint frequently limits devices below their capabilities. The NVMe Workgroup ratified TP4040 defines how a controller may advertise their non-MDTS limits. Use these if provided and default to the current constraints if not. Since the Dataset Management command limits are defined in logical blocks, but without a namespace to tell us the logical block size, the code defaults to the safe 512b size. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme: use driver pdu command for passthroughKeith Busch2021-04-021-2/+1
| | | | | | | | | | | | | | | | | | All nvme transport drivers preallocate an nvme command for each request. Assume to use that command for nvme_setup_cmd() instead of requiring drivers pass a pointer to it. All nvme drivers must initialize the generic nvme_request 'cmd' to point to the transport's preallocated nvme_command. The generic nvme_request cmd pointer had previously been used only as a temporary copy for passthrough commands. Since it now points to the command that gets dispatched, passthrough commands must directly set it up prior to executing the request. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme: rename nvme_init_identify()Chaitanya Kulkarni2021-04-021-1/+1
| | | | | | | | | | | | | | This is a prep patch so that we can move the identify data structure related code initialization from nvme_init_identify() into a helper. Rename the function nvmet_init_identify() to nvmet_init_ctrl_finish(). Next patch will move the nvme_id_ctrl related initialization from newly renamed function nvme_init_ctrl_finish() into the nvme_init_identify() helper. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme: use NVME_CTRL_CMIC_ANA macroKanchan Joshi2021-04-021-1/+1
| | | | | | | | Use the proper macro instead of hard-coded value. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme: add 48-bit DMA address quirk for Amazon NVMe controllersFilippo Sironi2021-02-101-0/+6
| | | | | | | | | | | | | | Some Amazon NVMe controllers do not follow the NVMe specification and are limited to 48-bit DMA addresses. Add a quirk to force bounce buffering if needed and limit the IOVA allocation for these devices. This affects all current Amazon NVMe controllers that expose EBS volumes (0x0061, 0x0065, 0x8061) and local instance storage (0xcd00, 0xcd01, 0xcd02). Signed-off-by: Filippo Sironi <sironi@amazon.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme-hwmon: rework to avoid devm allocationHannes Reinecke2021-02-101-0/+8
| | | | | | | | | | | | | | | | The original design to use device-managed resource allocation doesn't really work as the NVMe controller has a vastly different lifetime than the hwmon sysfs attributes, causing warning about duplicate sysfs entries upon reconnection. This patch reworks the hwmon allocation to avoid device-managed resource allocation, and uses the NVMe controller as parent for the sysfs attributes. Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Hannes Reinecke <hare@suse.de> Tested-by: Enzo Matsumiya <ematsumiya@suse.de> Tested-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme: introduce a nvme_host_path_error helperChao Leng2021-02-101-0/+1
| | | | | | | | | | | | | When using nvme native multipathing, if a path related error occurs during ->queue_rq, the request needs to be completed with NVME_SC_HOST_PATH_ERROR so that the request can be failed over. Introduce a helper to complete the command from ->queue_rq in a wait that invokes nvme_complete_rq. Signed-off-by: Chao Leng <lengchao@huawei.com> [hch: renamed, added a return value to clean up the callers a bit] Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme-core: add cancel tagset helpersChao Leng2021-02-021-0/+2
| | | | | | | | Add nvme_cancel_tagset and nvme_cancel_admin_tagset for tear down and reconnection error handling. Signed-off-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme: remove the unused status argument from nvme_trace_bio_completeMax Gurtovoy2021-01-061-4/+2
| | | | | | | | The only used argument in this function is the "req". Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme: unexport functions with no external callerMinwoo Im2021-01-061-3/+0
| | | | | | | | | | | There are no callers for nvme_reset_ctrl_sync() and nvme_alloc_request_qid() so that we keep the symbols exported. Unexport those functions, mark them static and update the header file respectively. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* Merge tag 'for-5.11/drivers-2020-12-14' of git://git.kernel.dk/linux-blockLinus Torvalds2020-12-161-2/+9
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block driver updates from Jens Axboe: "Nothing major in here: - NVMe pull request from Christoph: - nvmet passthrough improvements (Chaitanya Kulkarni) - fcloop error injection support (James Smart) - read-only support for zoned namespaces without Zone Append (Javier González) - improve some error message (Minwoo Im) - reject I/O to offline fabrics namespaces (Victor Gladkov) - PCI queue allocation cleanups (Niklas Schnelle) - remove an unused allocation in nvmet (Amit Engel) - a Kconfig spelling fix (Colin Ian King) - nvme_req_qid simplication (Baolin Wang) - MD pull request from Song: - Fix race condition in md_ioctl() (Dae R. Jeong) - Initialize read_slot properly for raid10 (Kevin Vigor) - Code cleanup (Pankaj Gupta) - md-cluster resync/reshape fix (Zhao Heming) - Move null_blk into its own directory (Damien Le Moal) - null_blk zone and discard improvements (Damien Le Moal) - bcache race fix (Dongsheng Yang) - Set of rnbd fixes/improvements (Gioh Kim, Guoqing Jiang, Jack Wang, Lutz Pogrell, Md Haris Iqbal) - lightnvm NULL pointer deref fix (tangzhenhao) - sr in_interrupt() removal (Sebastian Andrzej Siewior) - FC endpoint security support for s390/dasd (Jan Höppner, Sebastian Ott, Vineeth Vijayan). From the s390 arch guys, arch bits included as it made it easier for them to funnel the feature through the block driver tree. - Follow up fixes (Colin Ian King)" * tag 'for-5.11/drivers-2020-12-14' of git://git.kernel.dk/linux-block: (64 commits) block: drop dead assignments in loop_init() sr: Remove in_interrupt() usage in sr_init_command(). sr: Switch the sector size back to 2048 if sr_read_sector() changed it. cdrom: Reset sector_size back it is not 2048. drivers/lightnvm: fix a null-ptr-deref bug in pblk-core.c null_blk: Move driver into its own directory null_blk: Allow controlling max_hw_sectors limit null_blk: discard zones on reset null_blk: cleanup discard handling null_blk: Improve implicit zone close null_blk: improve zone locking block: Align max_hw_sectors to logical blocksize null_blk: Fail zone append to conventional zones null_blk: Fix zone size initialization bcache: fix race between setting bdev state to none and new write request direct to backing block/rnbd: fix a null pointer dereference on dev->blk_symlink_name block/rnbd-clt: Dynamically alloc buffer for pathname & blk_symlink_name block/rnbd: call kobject_put in the failure path Documentation/ABI/rnbd-srv: add document for force_close block/rnbd-srv: close a mapped device from server side. ...
| * nvme: export zoned namespaces without Zone Append support read-onlyJavier González2020-12-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow ZNS NVMe SSDs to present a read-only namespace when append is not supported, instead of rejecting the namespace directly. This allows (i) the namespace to be used in read-only mode, which is not a problem as the append command only affects the write path, and (ii) to use standard management tools such as nvme-cli to choose a different format or firmware slot that is compatible with the Linux zoned block device. Signed-off-by: Javier González <javier.gonz@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-fabrics: reject I/O to offline deviceVictor Gladkov2020-12-011-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commands get stuck while Host NVMe-oF controller is in reconnect state. The controller enters into reconnect state when it loses connection with the target. It tries to reconnect every 10 seconds (default) until a successful reconnect or until the reconnect time-out is reached. The default reconnect time out is 10 minutes. Applications are expecting commands to complete with success or error within a certain timeout (30 seconds by default). The NVMe host is enforcing that timeout while it is connected, but during reconnect the timeout is not enforced and commands may get stuck for a long period or even forever. To fix this long delay due to the default timeout, introduce new "fast_io_fail_tmo" session parameter. The timeout is measured in seconds from the controller reconnect and any command beyond that timeout is rejected. The new parameter value may be passed during 'connect'. The default value of -1 means no timeout (similar to current behavior). Signed-off-by: Victor Gladkov <victor.gladkov@kioxia.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme: split nvme_alloc_request()Chaitanya Kulkarni2020-12-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now nvme_alloc_request() allocates a request from block layer based on the value of the qid. When qid set to NVME_QID_ANY it used blk_mq_alloc_request() else blk_mq_alloc_request_hctx(). The function nvme_alloc_request() is called from different context, The only place where it uses non NVME_QID_ANY value is for fabrics connect commands :- nvme_submit_sync_cmd() NVME_QID_ANY nvme_features() NVME_QID_ANY nvme_sec_submit() NVME_QID_ANY nvmf_reg_read32() NVME_QID_ANY nvmf_reg_read64() NVME_QID_ANY nvmf_reg_write32() NVME_QID_ANY nvmf_connect_admin_queue() NVME_QID_ANY nvme_submit_user_cmd() NVME_QID_ANY nvme_alloc_request() nvme_keep_alive() NVME_QID_ANY nvme_alloc_request() nvme_timeout() NVME_QID_ANY nvme_alloc_request() nvme_delete_queue() NVME_QID_ANY nvme_alloc_request() nvmet_passthru_execute_cmd() NVME_QID_ANY nvme_alloc_request() nvmf_connect_io_queue() QID __nvme_submit_sync_cmd() nvme_alloc_request() With passthru nvme_alloc_request() now falls into the I/O fast path such that blk_mq_alloc_request_hctx() is never gets called and that adds additional branch check in fast path. Split the nvme_alloc_request() into nvme_alloc_request() and nvme_alloc_request_qid(). Replace each call of the nvme_alloc_request() with NVME_QID_ANY param with a call to newly added nvme_alloc_request() without NVME_QID_ANY. Replace a call to nvme_alloc_request() with QID param with a call to newly added nvme_alloc_request() and nvme_alloc_request_qid() based on the qid value set in the __nvme_submit_sync_cmd(). Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme: use consistent macro name for timeoutChaitanya Kulkarni2020-12-011-1/+1
| | | | | | | | | | | | | | | | This is purely a clenaup patch, add prefix NVME to the ADMIN_TIMEOUT to make consistent with NVME_IO_TIMEOUT. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme: simplify nvme_req_qid()Baolin Wang2020-12-011-1/+2
| | | | | | | | | | | | | | | | Use the request's '->mq_hctx->queue_num' directly to simplify the nvme_req_qid() function. Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* | nvme: directly cache command effects logKeith Busch2020-11-141-6/+0
|/ | | | | | | | | Remove the struct used for tracking known command effects logs in a list. This is now saved in an xarray that doesn't use these elements. Instead, store the log directly instead of the wrapper struct. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme: introduce nvme_sync_io_queuesChao Leng2020-11-031-0/+1
| | | | | | | | | Introduce sync io queues for some scenarios which just only need sync io queues not sync all queues. Signed-off-by: Chao Leng <lengchao@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme: use queuedata for nvme_req_qidKeith Busch2020-10-221-1/+1
| | | | | | | | | | The request's rq_disk isn't set for passthrough IO commands, so tracing uses qid 0 for these which incorrectly decodes as an admin command. Use the request_queue's queuedata instead since that value is always set for the IO queues, and never set for the admin queue. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
* Merge tag 'drivers-5.10-2020-10-12' of git://git.kernel.dk/linux-blockLinus Torvalds2020-10-131-8/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block driver updates from Jens Axboe: "Here are the driver updates for 5.10. A few SCSI updates in here too, in coordination with Martin as they depend on core block changes for the shared tag bitmap. This contains: - NVMe pull requests via Christoph: - fix keep alive timer modification (Amit Engel) - order the PCI ID list more sensibly (Andy Shevchenko) - cleanup the open by controller helper (Chaitanya Kulkarni) - use an xarray for the CSE log lookup (Chaitanya Kulkarni) - support ZNS in nvmet passthrough mode (Chaitanya Kulkarni) - fix nvme_ns_report_zones (Christoph Hellwig) - add a sanity check to nvmet-fc (James Smart) - fix interrupt allocation when too many polled queues are specified (Jeffle Xu) - small nvmet-tcp optimization (Mark Wunderlich) - fix a controller refcount leak on init failure (Chaitanya Kulkarni) - misc cleanups (Chaitanya Kulkarni) - major refactoring of the scanning code (Christoph Hellwig) - MD updates via Song: - Bug fixes in bitmap code, from Zhao Heming - Fix a work queue check, from Guoqing Jiang - Fix raid5 oops with reshape, from Song Liu - Clean up unused code, from Jason Yan - Discard improvements, from Xiao Ni - raid5/6 page offset support, from Yufen Yu - Shared tag bitmap for SCSI/hisi_sas/null_blk (John, Kashyap, Hannes) - null_blk open/active zone limit support (Niklas) - Set of bcache updates (Coly, Dongsheng, Qinglang)" * tag 'drivers-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (78 commits) md/raid5: fix oops during stripe resizing md/bitmap: fix memory leak of temporary bitmap md: fix the checking of wrong work queue md/bitmap: md_bitmap_get_counter returns wrong blocks md/bitmap: md_bitmap_read_sb uses wrong bitmap blocks md/raid0: remove unused function is_io_in_chunk_boundary() nvme-core: remove extra condition for vwc nvme-core: remove extra variable nvme: remove nvme_identify_ns_list nvme: refactor nvme_validate_ns nvme: move nvme_validate_ns nvme: query namespace identifiers before adding the namespace nvme: revalidate zone bitmaps in nvme_update_ns_info nvme: remove nvme_update_formats nvme: update the known admin effects nvme: set the queue limits in nvme_update_ns_info nvme: remove the 0 lba_shift check in nvme_update_ns_info nvme: clean up the check for too large logic block sizes nvme: freeze the queue over ->lba_shift updates nvme: factor out a nvme_configure_metadata helper ...
| * nvme: remove the disk argument to nvme_update_zone_infoChristoph Hellwig2020-10-071-6/+2
| | | | | | | | | | | | | | | | | | | | The queue can trivially be derived from the nvme_ns structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
| * nvme: fix initialization of the zone bitmapsChristoph Hellwig2020-10-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The removal of the ->revalidate_disk method broke the initialization of the zone bitmaps, as nvme_revalidate_disk now never gets called during initialization. Move the zone related code from nvme_revalidate_disk into a new helper in zns.c, and call it from nvme_alloc_ns in addition to nvme_validate_ns to ensure the zone bitmaps are initialized during probe. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
| * nvme: use an xarray to lookup the Commands Supported and Effects logChaitanya Kulkarni2020-09-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using linked list we have to open code the locking, search, and destroy operations with the loops even if data structure doesn't fall into the fast path. One of the main advantage of having XArray to store, search, and remove items is that it handles all the locking by itself, avoids the loops when using linked lists, provides clear API to replace the linked list's search and destroy loops. This patch replaces the ctrl->cel list with XArray and removes :- a. Extra code needed for the linked list for ctrl->cel item management such as nvme_find_cel(). b. Destroy loop in the nvme_free_ctrl(). c. Explicit insertion locking in the nvme_get_effects_log(). Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme: lift the file open code from nvme_ctrl_get_by_pathChaitanya Kulkarni2020-09-271-1/+1
| | | | | | | | | | | | | | | | | | | | Lift opening the file open/close code from nvme_ctrl_get_by_path into the caller, just keeping a simple nvme_ctrl_from_file() helper. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> [hch: refactored a bit, split the bug fixes into a separate prep patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
| * Merge branch 'for-5.10/block' into for-5.10/driversJens Axboe2020-09-241-13/+0
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * for-5.10/block: (140 commits) bdi: replace BDI_CAP_NO_{WRITEBACK,ACCT_DIRTY} with a single flag bdi: invert BDI_CAP_NO_ACCT_WB bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flag mm: use SWP_SYNCHRONOUS_IO more intelligently bdi: remove BDI_CAP_SYNCHRONOUS_IO bdi: remove BDI_CAP_CGROUP_WRITEBACK block: lift setting the readahead size into the block layer md: update the optimal I/O size on reshape bdi: initialize ->ra_pages and ->io_pages in bdi_init aoe: set an optimal I/O size bcache: inherit the optimal I/O size drbd: remove dead code in device_to_statistics fs: remove the unused SB_I_MULTIROOT flag block: mark blkdev_get static PM: mm: cleanup swsusp_swap_check mm: split swap_type_of PM: rewrite is_hibernate_resume_dev to not require an inode mm: cleanup claim_swapfile ocfs2: cleanup o2hb_region_dev_store dasd: cleanup dasd_scan_partitions ...
* | \ Merge tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-blockLinus Torvalds2020-10-131-13/+0
|\ \ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block updates from Jens Axboe: - Series of merge handling cleanups (Baolin, Christoph) - Series of blk-throttle fixes and cleanups (Baolin) - Series cleaning up BDI, seperating the block device from the backing_dev_info (Christoph) - Removal of bdget() as a generic API (Christoph) - Removal of blkdev_get() as a generic API (Christoph) - Cleanup of is-partition checks (Christoph) - Series reworking disk revalidation (Christoph) - Series cleaning up bio flags (Christoph) - bio crypt fixes (Eric) - IO stats inflight tweak (Gabriel) - blk-mq tags fixes (Hannes) - Buffer invalidation fixes (Jan) - Allow soft limits for zone append (Johannes) - Shared tag set improvements (John, Kashyap) - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel) - DM no-wait support (Mike, Konstantin) - Request allocation improvements (Ming) - Allow md/dm/bcache to use IO stat helpers (Song) - Series improving blk-iocost (Tejun) - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang, Xianting, Yang, Yufen, yangerkun) * tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits) block: fix uapi blkzoned.h comments blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue blk-mq: get rid of the dead flush handle code path block: get rid of unnecessary local variable block: fix comment and add lockdep assert blk-mq: use helper function to test hw stopped block: use helper function to test queue register block: remove redundant mq check block: invoke blk_mq_exit_sched no matter whether have .exit_sched percpu_ref: don't refer to ref->data if it isn't allocated block: ratelimit handle_bad_sector() message blk-throttle: Re-use the throtl_set_slice_end() blk-throttle: Open code __throtl_de/enqueue_tg() blk-throttle: Move service tree validation out of the throtl_rb_first() blk-throttle: Move the list operation after list validation blk-throttle: Fix IO hang for a corner case blk-throttle: Avoid tracking latency if low limit is invalid blk-throttle: Avoid getting the current time if tg->last_finish_time is 0 blk-throttle: Remove a meaningless parameter for throtl_downgrade_state() block: Remove redundant 'return' statement ...
| * | nvme: don't call revalidate_disk from nvme_set_queue_dyingChristoph Hellwig2020-09-011-13/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In nvme_set_queue_dying we really just want to ensure the disk and bdev sizes are set to zero. Going through revalidate_disk leads to a somewhat arcance and complex callchain relying on special behavior in a few places. Instead just lift the set_capacity directly to nvme_set_queue_dying, and rename and move the nvme_mpath_update_disk_size helper so that we can use it in nvme_set_queue_dying to propagate the size to the bdev without detours. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block: replace bd_set_size with bd_set_nr_sectorsChristoph Hellwig2020-09-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace bd_set_size with a version that takes the number of sectors instead, as that fits most of the current and future callers much better. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | nvme: return errors for hwmon initKeith Busch2020-09-221-2/+5
| |/ |/| | | | | | | | | | | | | | | | | Initializing the nvme hwmon retrieves a log from the controller. If the controller is broken, we need to return the appropriate error so that subsequent initialization doesn't attempt to continue. Reported-by: Tong Zhang <ztong0001@gmail.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
* | nvme: Revert: Fix controller creation races with teardown flowJames Smart2020-09-081-1/+0
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The indicated patch introduced a barrier in the sysfs_delete attribute for the controller that rejects the request if the controller isn't created. "Created" is defined as at least 1 call to nvme_start_ctrl(). This is problematic in error-injection testing. If an error occurs on the initial attempt to create an association and the controller enters reconnect(s) attempts, the admin cannot delete the controller until either there is a successful association created or ctrl_loss_tmo times out. Where this issue is particularly hurtful is when the "admin" is the nvme-cli, it is performing a connection to a discovery controller, and it is initiated via auto-connect scripts. With the FC transport, if the first connection attempt fails, the controller enters a normal reconnect state but returns control to the cli thread that created the controller. In this scenario, the cli attempts to read the discovery log via ioctl, which fails, causing the cli to see it as an empty log and then proceeds to delete the discovery controller. The delete is rejected and the controller is left live. If the discovery controller reconnect then succeeds, there is no action to delete it, and it sits live doing nothing. Cc: <stable@vger.kernel.org> # v5.7+ Fixes: ce1518139e69 ("nvme: Fix controller creation races with teardown flow") Signed-off-by: James Smart <james.smart@broadcom.com> CC: Israel Rukshin <israelr@mellanox.com> CC: Max Gurtovoy <maxg@mellanox.com> CC: Christoph Hellwig <hch@lst.de> CC: Keith Busch <kbusch@kernel.org> CC: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme: have nvme_wait_freeze_timeout return if it timed outSagi Grimberg2020-08-281-1/+1
| | | | | | | | | | Users can detect if the wait has completed or not and take appropriate actions based on this information (e.g. weather to continue initialization or rather fail and schedule another initialization attempt). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
* nvme: just check the status code type in nvme_is_path_errorChristoph Hellwig2020-08-211-10/+2
| | | | | | | | | | | | | Check the SCT sub-field for a path related status instead of enumerating invididual status code. As of NVMe 1.4 this adds "Internal Path Error" and "Controller Pathing Error" to the list, but it also future proofs for additional status codes added to the category. Suggested-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nvme: refactor command completionChristoph Hellwig2020-08-211-3/+28
| | | | | | | | | | | | Lift all the code to decide the dispostition of a completed command from nvme_complete_rq and nvme_failover_req into a new helper, which returns an emum of the potential actions. nvme_complete_rq then just switches on those and calls the proper helper for the action. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nvme: rename and document nvme_end_requestChristoph Hellwig2020-08-211-1/+7
| | | | | | | | | | | | nvme_end_request is a bit misnamed, as it wraps around the blk_mq_complete_* API. It's semantics also are non-trivial, so give it a more descriptive name and add a comment explaining the semantics. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge tag 'for-5.9/drivers-20200803' of git://git.kernel.dk/linux-blockLinus Torvalds2020-08-051-2/+84
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block driver updates from Jens Axboe: - NVMe: - ZNS support (Aravind, Keith, Matias, Niklas) - Misc cleanups, optimizations, fixes (Baolin, Chaitanya, David, Dongli, Max, Sagi) - null_blk zone capacity support (Aravind) - MD: - raid5/6 fixes (ChangSyun) - Warning fixes (Damien) - raid5 stripe fixes (Guoqing, Song, Yufen) - sysfs deadlock fix (Junxiao) - raid10 deadlock fix (Vitaly) - struct_size conversions (Gustavo) - Set of bcache updates/fixes (Coly) * tag 'for-5.9/drivers-20200803' of git://git.kernel.dk/linux-block: (117 commits) md/raid5: Allow degraded raid6 to do rmw md/raid5: Fix Force reconstruct-write io stuck in degraded raid5 raid5: don't duplicate code for different paths in handle_stripe raid5-cache: hold spinlock instead of mutex in r5c_journal_mode_show md: print errno in super_written md/raid5: remove the redundant setting of STRIPE_HANDLE md: register new md sysfs file 'uuid' read-only md: fix max sectors calculation for super 1.0 nvme-loop: remove extra variable in create ctrl nvme-loop: set ctrl state connecting after init nvme-multipath: do not fall back to __nvme_find_path() for non-optimized paths nvme-multipath: fix logic for non-optimized paths nvme-rdma: fix controller reset hang during traffic nvme-tcp: fix controller reset hang during traffic nvmet: introduce the passthru Kconfig option nvmet: introduce the passthru configfs interface nvmet: Add passthru enable/disable helpers nvmet: add passthru code to process commands nvme: export nvme_find_get_ns() and nvme_put_ns() nvme: introduce nvme_ctrl_get_by_path() ...
| * nvme: export nvme_find_get_ns() and nvme_put_ns()Logan Gunthorpe2020-07-291-0/+2
| | | | | | | | | | | | | | | | | | | | nvme_find_get_ns() and nvme_put_ns() are required by the target passthru code and are exported under the NVME_TARGET_PASSTHRU namespace. Based-on-a-patch-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme: introduce nvme_ctrl_get_by_path()Logan Gunthorpe2020-07-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nvme_ctrl_get_by_path() is analogous to blkdev_get_by_path() except it gets a struct nvme_ctrl from the path to its char dev (/dev/nvme0). It makes use of filp_open() to open the file and uses the private data to obtain a pointer to the struct nvme_ctrl. If the fops of the file do not match, -EINVAL is returned. The purpose of this function is to support NVMe-OF target passthru and is exported under the NVME_TARGET_PASSTHRU namespace. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme: introduce nvme_execute_passthru_rq to call nvme_passthru_[start|end]()Logan Gunthorpe2020-07-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a new nvme_execute_passthru_rq() helper which calls nvme_passthru_[start|end]() around blk_execute_rq(). This ensures all passthru calls (including nvme_submit_io()) will be wrapped appropriately. nvme_execute_passthru_rq() will also be useful for the nvmet passthru code and is exported in the NVME_TARGET_PASSTHRU namespace. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme: create helper function to obtain command effectsLogan Gunthorpe2020-07-291-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Separate the code to obtain command effects from the code to start a passthru request and move the nvme_passthru_start() and nvme_passthru_end() functions up above nvme_submit_user_cmd() in order that they may be used in a new helper a subsequent patch. The new helper function will be necessary for nvmet passthru code to determine if we need to change out of interrupt context to handle the effects. It is exported in the NVME_TARGET_PASSTHRU namespace. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme: fix deadlock in disconnect during scan_work and/or ana_workSagi Grimberg2020-07-291-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A deadlock happens in the following scenario with multipath: 1) scan_work(nvme0) detects a new nsid while nvme0 is an optimized path to it, path nvme1 happens to be inaccessible. 2) Before scan_work is complete nvme0 disconnect is initiated nvme_delete_ctrl_sync() sets nvme0 state to NVME_CTRL_DELETING 3) scan_work(1) attempts to submit IO, but nvme_path_is_optimized() observes nvme0 is not LIVE. Since nvme1 is a possible path IO is requeued and scan_work hangs. -- Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: io_schedule+0x16/0x40 kernel: do_read_cache_page+0x438/0x830 kernel: read_cache_page+0x12/0x20 kernel: read_dev_sector+0x27/0xc0 kernel: read_lba+0xc1/0x220 kernel: efi_partition+0x1e6/0x708 kernel: check_partition+0x154/0x244 kernel: rescan_partitions+0xae/0x280 kernel: __blkdev_get+0x40f/0x560 kernel: blkdev_get+0x3d/0x140 kernel: __device_add_disk+0x388/0x480 kernel: device_add_disk+0x13/0x20 kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core] kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core] kernel: nvme_validate_ns+0x396/0x940 [nvme_core] kernel: nvme_scan_work+0x24f/0x380 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x249/0x400 kernel: kthread+0x104/0x140 -- 4) Delete also hangs in flush_work(ctrl->scan_work) from nvme_remove_namespaces(). Similiarly a deadlock with ana_work may happen: if ana_work has started and calls nvme_mpath_set_live and device_add_disk, it will trigger I/O. When we trigger disconnect I/O will block because our accessible (optimized) path is disconnecting, but the alternate path is inaccessible, so I/O blocks. Then disconnect tries to flush the ana_work and hangs. [ 605.550896] Workqueue: nvme-wq nvme_ana_work [nvme_core] [ 605.552087] Call Trace: [ 605.552683] __schedule+0x2b9/0x6c0 [ 605.553507] schedule+0x42/0xb0 [ 605.554201] io_schedule+0x16/0x40 [ 605.555012] do_read_cache_page+0x438/0x830 [ 605.556925] read_cache_page+0x12/0x20 [ 605.557757] read_dev_sector+0x27/0xc0 [ 605.558587] amiga_partition+0x4d/0x4c5 [ 605.561278] check_partition+0x154/0x244 [ 605.562138] rescan_partitions+0xae/0x280 [ 605.563076] __blkdev_get+0x40f/0x560 [ 605.563830] blkdev_get+0x3d/0x140 [ 605.564500] __device_add_disk+0x388/0x480 [ 605.565316] device_add_disk+0x13/0x20 [ 605.566070] nvme_mpath_set_live+0x5e/0x130 [nvme_core] [ 605.567114] nvme_update_ns_ana_state+0x2c/0x30 [nvme_core] [ 605.568197] nvme_update_ana_state+0xca/0xe0 [nvme_core] [ 605.569360] nvme_parse_ana_log+0xa1/0x180 [nvme_core] [ 605.571385] nvme_read_ana_log+0x76/0x100 [nvme_core] [ 605.572376] nvme_ana_work+0x15/0x20 [nvme_core] [ 605.573330] process_one_work+0x1db/0x380 [ 605.574144] worker_thread+0x4d/0x400 [ 605.574896] kthread+0x104/0x140 [ 605.577205] ret_from_fork+0x35/0x40 [ 605.577955] INFO: task nvme:14044 blocked for more than 120 seconds. [ 605.579239] Tainted: G OE 5.3.5-050305-generic #201910071830 [ 605.580712] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 605.582320] nvme D 0 14044 14043 0x00000000 [ 605.583424] Call Trace: [ 605.583935] __schedule+0x2b9/0x6c0 [ 605.584625] schedule+0x42/0xb0 [ 605.585290] schedule_timeout+0x203/0x2f0 [ 605.588493] wait_for_completion+0xb1/0x120 [ 605.590066] __flush_work+0x123/0x1d0 [ 605.591758] __cancel_work_timer+0x10e/0x190 [ 605.593542] cancel_work_sync+0x10/0x20 [ 605.594347] nvme_mpath_stop+0x2f/0x40 [nvme_core] [ 605.595328] nvme_stop_ctrl+0x12/0x50 [nvme_core] [ 605.596262] nvme_do_delete_ctrl+0x3f/0x90 [nvme_core] [ 605.597333] nvme_sysfs_delete+0x5c/0x70 [nvme_core] [ 605.598320] dev_attr_store+0x17/0x30 Fix this by introducing a new state: NVME_CTRL_DELETE_NOIO, which will indicate the phase of controller deletion where I/O cannot be allowed to access the namespace. NVME_CTRL_DELETING still allows mpath I/O to be issued to the bottom device, and only after we flush the ana_work and scan_work (after nvme_stop_ctrl and nvme_prep_remove_namespaces) we change the state to NVME_CTRL_DELETING_NOIO. Also we prevent ana_work from re-firing by aborting early if we are not LIVE, so we should be safe here. In addition, change the transport drivers to follow the updated state machine. Fixes: 0d0b660f214d ("nvme: add ANA support") Reported-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme: document nvme controller statesSagi Grimberg2020-07-291-0/+14
| | | | | | | | | | | | | | | | We are starting to see some non-trivial states so lets start documenting them. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-core: replace ctrl page size with a macroChaitanya Kulkarni2020-07-291-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | Saving the nvme controller's page size was from a time when the driver tried to use different sized pages, but this value is always set to a constant, and has been this way for some time. Remove the 'page_size' field and replace its usage with the constant value. This also lets the compiler make some micro-optimizations in the io path, and that's always a good thing. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme: support for zoned namespacesKeith Busch2020-07-081-0/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for NVM Express Zoned Namespaces (ZNS) Command Set defined in NVM Express TP4053. Zoned namespaces are discovered based on their Command Set Identifier reported in the namespaces Namespace Identification Descriptor list. A successfully discovered Zoned Namespace will be registered with the block layer as a host managed zoned block device with Zone Append command support. A namespace that does not support append is not supported by the driver. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com> Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Keith Busch <keith.busch@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme: support for multiple Command Sets Supported and Effects log pagesKeith Busch2020-07-081-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Commands Supported and Effects log page was extended with a CSI field that enables the host to query the log page for each command set supported. Retrieve this log page for each command set that an attached namespace supports, and save a pointer to that log in the namespace head. Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Keith Busch <keith.busch@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme: implement multiple I/O Command Set supportNiklas Cassel2020-07-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implements support for multiple I/O Command Sets. NVMe TP 4056 introduces a method to enumerate multiple command sets per namespace. If the command set is exposed, this method for enumeration will be used instead of the traditional method that uses the CC.CSS register command set register for command set identification. For namespaces where the Command Set Identifier is not supported or recognized, the specific namespace will not be created. Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com> Reviewed-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * Merge tag 'v5.8-rc4' into for-5.9/driversJens Axboe2020-07-081-0/+2
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge in 5.8-rc4 for-5.9/block to setup for-5.9/drivers, to provide a clean base and making the life for the NVMe changes easier. Signed-off-by: Jens Axboe <axboe@kernel.dk> * tag 'v5.8-rc4': (732 commits) Linux 5.8-rc4 x86/ldt: use "pr_info_once()" instead of open-coding it badly MIPS: Do not use smp_processor_id() in preemptible code MIPS: Add missing EHB in mtc0 -> mfc0 sequence for DSPen .gitignore: Do not track `defconfig` from `make savedefconfig` io_uring: fix regression with always ignoring signals in io_cqring_wait() x86/ldt: Disable 16-bit segments on Xen PV x86/entry/32: Fix #MC and #DB wiring on x86_32 x86/entry/xen: Route #DB correctly on Xen PV x86/entry, selftests: Further improve user entry sanity checks x86/entry/compat: Clear RAX high bits on Xen PV SYSENTER i2c: mlxcpld: check correct size of maximum RECV_LEN packet i2c: add Kconfig help text for slave mode i2c: slave-eeprom: update documentation i2c: eg20t: Load module automatically if ID matches i2c: designware: platdrv: Set class based on DMI i2c: algo-pca: Add 0x78 as SCL stuck low status for PCA9665 mm/page_alloc: fix documentation error vmalloc: fix the owner argument for the new __vmalloc_node_range callers mm/cma.c: use exact_nid true to fix possible per-numa cma leak ...
* | \ Merge tag 'for-5.9/block-20200802' of git://git.kernel.dk/linux-blockLinus Torvalds2020-08-031-2/+5
|\ \ \ | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull core block updates from Jens Axboe: "Good amount of cleanups and tech debt removals in here, and as a result, the diffstat shows a nice net reduction in code. - Softirq completion cleanups (Christoph) - Stop using ->queuedata (Christoph) - Cleanup bd claiming (Christoph) - Use check_events, moving away from the legacy media change (Christoph) - Use inode i_blkbits consistently (Christoph) - Remove old unused writeback congestion bits (Christoph) - Cleanup/unify submission path (Christoph) - Use bio_uninit consistently, instead of bio_disassociate_blkg (Christoph) - sbitmap cleared bits handling (John) - Request merging blktrace event addition (Jan) - sysfs add/remove race fixes (Luis) - blk-mq tag fixes/optimizations (Ming) - Duplicate words in comments (Randy) - Flush deferral cleanup (Yufen) - IO context locking/retry fixes (John) - struct_size() usage (Gustavo) - blk-iocost fixes (Chengming) - blk-cgroup IO stats fixes (Boris) - Various little fixes" * tag 'for-5.9/block-20200802' of git://git.kernel.dk/linux-block: (135 commits) block: blk-timeout: delete duplicated word block: blk-mq-sched: delete duplicated word block: blk-mq: delete duplicated word block: genhd: delete duplicated words block: elevator: delete duplicated word and fix typos block: bio: delete duplicated words block: bfq-iosched: fix duplicated word iocost_monitor: start from the oldest usage index iocost: Fix check condition of iocg abs_vdebt block: Remove callback typedefs for blk_mq_ops block: Use non _rcu version of list functions for tag_set_list blk-cgroup: show global disk stats in root cgroup io.stat blk-cgroup: make iostat functions visible to stat printing block: improve discard bio alignment in __blkdev_issue_discard() block: change REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL to be odd numbers block: defer flush request no matter whether we have elevator block: make blk_timeout_init() static block: remove retry loop in ioc_release_fn() block: remove unnecessary ioc nested locking block: integrate bd_start_claiming into __blkdev_get ...
| * | block: move ->make_request_fn to struct block_device_operationsChristoph Hellwig2020-07-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The make_request_fn is a little weird in that it sits directly in struct request_queue instead of an operation vector. Replace it with a block_device_operations method called submit_bio (which describes much better what it does). Also remove the request_queue argument to it, as the queue can be derived pretty trivially from the bio. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | nvme: use blk_mq_complete_request_remote to avoid an indirect function callChristoph Hellwig2020-06-241-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | Use the new blk_mq_complete_request_remote helper to avoid an indirect function call in the completion fast path. Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | blk-mq: move failure injection out of blk_mq_complete_requestChristoph Hellwig2020-06-241-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the call to blk_should_fake_timeout out of blk_mq_complete_request and into the drivers, skipping call sites that are obvious error handlers, and remove the now superflous blk_mq_force_complete_rq helper. This ensures we don't keep injecting errors into completions that just terminate the Linux request after the hardware has been reset or the command has been aborted. Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>