*   Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma   (Linus Torvalds, 2016-01-23, 174 files, -4669/+7110)
|\
| |   Pull rdma updates from Doug Ledford:
| |   "Initial roundup of 4.5 merge window patches
| |
| |    - Remove usage of ib_query_device and instead store attributes in ib_device struct
| |    - Move iopoll out of block and into lib, rename to irqpoll, and use in several places in the rdma stack as our new completion queue polling library mechanism. Update the other block drivers that already used iopoll to use the new mechanism too.
| |    - Replace the per-entry GID table locks with a single GID table lock
| |    - IPoIB multicast cleanup
| |    - Cleanups to the IB MR facility
| |    - Add support for 64bit extended IB counters
| |    - Fix for netlink oops while parsing RDMA nl messages
| |    - RoCEv2 support for the core IB code
| |    - mlx4 RoCEv2 support
| |    - mlx5 RoCEv2 support
| |    - Cross Channel support for mlx5
| |    - Timestamp support for mlx5
| |    - Atomic support for mlx5
| |    - Raw QP support for mlx5
| |    - MAINTAINERS update for mlx4/mlx5
| |    - Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates
| |    - Add support for remote invalidate to the iSER driver (pushed through the RDMA tree due to dependencies, acknowledged by nab)
| |    - Update to NFSoRDMA (pushed through the RDMA tree due to dependencies, acknowledged by Bruce)"
| |
| |   * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (169 commits)
| |     IB/mlx5: Unify CQ create flags check
| |     IB/mlx5: Expose Raw Packet QP to user space consumers
| |     {IB, net}/mlx5: Move the modify QP operation table to mlx5_ib
| |     IB/mlx5: Support setting Ethernet priority for Raw Packet QPs
| |     IB/mlx5: Add Raw Packet QP query functionality
| |     IB/mlx5: Add create and destroy functionality for Raw Packet QP
| |     IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types
| |     IB/mlx5: Allocate a Transport Domain for each ucontext
| |     net/mlx5_core: Warn on unsupported events of QP/RQ/SQ
| |     net/mlx5_core: Add RQ and SQ event handling
| |     net/mlx5_core: Export transport objects
| |     IB/mlx5: Expose CQE version to user-space
| |     IB/mlx5: Add CQE version 1 support to user QPs and SRQs
| |     IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext
| |     IB/sa: Fix netlink local service GFP crash
| |     IB/srpt: Remove redundant wc array
| |     IB/qib: Improve ipoib UD performance
| |     IB/mlx4: Advertise RoCE v2 support
| |     IB/mlx4: Create and use another QP1 for RoCEv2
| |     IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
| |     ...
| * IB/mlx5: Unify CQ create flags check   (Leon Romanovsky, 2016-01-21, 2 files, -9/+3)
| |   The create_cq() can receive creation flags, which were used differently by the two commits that added the create_cq extended command and cross-channel support. The merged code ended up accepting no flags at all. This patch unifies the check into one function and one return error code. Fixes: 972ecb821379 ("IB/mlx5: Add create_cq extended command") Fixes: 051f263098a9 ("IB/mlx5: Add driver cross-channel support") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/mlx5: Expose Raw Packet QP to user space consumers   (majd@mellanox.com, 2016-01-21, 1 file, -12/+127)
| |   Added Raw Packet QP modify functionality, which will enable user space consumers to use it. Since Raw Packet QP is built of SQ and RQ sub-objects, Raw Packet QP state changes are implemented by changing the state of the sub-objects. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * {IB, net}/mlx5: Move the modify QP operation table to mlx5_ib   (majd@mellanox.com, 2016-01-21, 3 files, -54/+50)
| |   When modifying a QP, the desired operation was determined in mlx5_core using a transition table that takes the current state and the final state, and returns the desired operation. Since this logic will be used for Raw Packet QP, move the operation table to mlx5_ib. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/mlx5: Support setting Ethernet priority for Raw Packet QPs   (majd@mellanox.com, 2016-01-21, 4 files, -4/+57)
| |   When the user changes the Address Vector (AV) in the modify QP, he provides an SL. This SL should be translated to an Ethernet priority by taking the 3 LSB bits, and the QP's TIS should be modified according to this Ethernet priority. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
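A minimal standalone sketch of the SL-to-priority mapping the message describes (illustrative only; the function name and the test harness are invented, not taken from the mlx5 driver):

    #include <stdint.h>
    #include <stdio.h>

    /* An IB Service Level (SL) is a 4-bit field; the commit maps it to an
     * Ethernet priority (PCP, 3 bits) by keeping the 3 least significant bits. */
    static uint8_t sl_to_eth_prio(uint8_t sl)
    {
        return sl & 0x7;
    }

    int main(void)
    {
        for (uint8_t sl = 0; sl < 16; sl++)
            printf("SL %2u -> Ethernet priority %u\n", sl, sl_to_eth_prio(sl));
        return 0;
    }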
| * IB/mlx5: Add Raw Packet QP query functionality   (majd@mellanox.com, 2016-01-21, 4 files, -23/+205)
| |   Since Raw Packet QP is composed of RQ and SQ, the IB QP's state is derived from the sub-objects. Therefore we need to query each one of the sub-objects, and decide on the IB QP's state. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/mlx5: Add create and destroy functionality for Raw Packet QP   (majd@mellanox.com, 2016-01-21, 3 files, -18/+365)
| |   This patch adds support for Raw Packet QP for the mlx5 device. Raw Packet QP, unlike other QP types, has no matching mlx5_core_qp object; rather, it is built of RQ/SQ/TIR/TIS/TD mlx5_core objects. Since the SQ and RQ work-queue (WQ) buffers are not contiguous as in other QPs, we allocate separate buffers in user space and pass the address of each one of them separately to the kernel. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types   (majd@mellanox.com, 2016-01-21, 3 files, -95/+147)
| |   Extract specific IB QP fields to the mlx5_ib_qp_trans structure. The mlx5_core QP object resides in mlx5_ib_qp_base, which all QP types inherit from. When we need to find an mlx5_ib_qp from an mlx5_core QP (event handling and the like), we use a pointer that resides in mlx5_ib_qp_base. In addition, we delete all redundant fields that weren't used anywhere in the code:
| |    - doorbell_qpn
| |    - sq_max_wqes_per_wr
| |    - sq_spare_wqes
| |   Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/mlx5: Allocate a Transport Domain for each ucontext   (majd@mellanox.com, 2016-01-21, 2 files, -1/+18)
| |   A Transport Domain groups several TIS and TIR objects. By grouping these objects, it defines whether local loopback packets that are sent from the TIS objects in the group are received by the TIR objects in the same group. Allocate a Transport Domain (TD) for each user context, to be used in the future by Raw Packet QP for Self-Loopback Control. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * net/mlx5_core: Warn on unsupported events of QP/RQ/SQ   (majd@mellanox.com, 2016-01-21, 1 file, -0/+52)
| |   When an event arrives on QP/RQ/SQ, check whether it's supported, and print a warning message otherwise. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * net/mlx5_core: Add RQ and SQ event handling   (majd@mellanox.com, 2016-01-21, 5 files, -23/+132)
| |   RQ/SQ will be used to implement IB verbs QPs, so the IB QP affiliated events are also affiliated with SQs and RQs. Since SQ, RQ and QP resource numbers do not share the same namespace, a queue type field was added to the event data to specify the SW object that the event is affiliated with. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * net/mlx5_core: Export transport objects   (majd@mellanox.com, 2016-01-21, 5 files, -10/+20)
| |   To be used by mlx5_ib in the following patches for implementing Raw Packet QP. Add the mlx5_core_ prefix to alloc and dealloc transport_domain since they are exposed now. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/mlx5: Expose CQE version to user-space   (Haggai Abramovsky, 2016-01-21, 2 files, -5/+18)
| |   Per user context, work with the CQE version that both user space and the kernel support. Report this CQE version via the response of the alloc_ucontext command. Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/mlx5: Add CQE version 1 support to user QPs and SRQs   (Haggai Abramovsky, 2016-01-21, 7 files, -21/+114)
| |   Enforce working with CQE version 1 when the user supports CQE version 1 and asked to work this way. If the user still works with CQE version 0, then use the default CQE version to tell the firmware that the user still works in the older mode. After this patch, the kernel still reports CQE version 0. Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
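A small standalone model of the negotiation these two patches describe: each side advertises the highest CQE version it supports, and the context runs with the highest version common to both, falling back to version 0 otherwise. The names and the helper are invented for illustration and are not taken from the mlx5 sources:

    #include <stdint.h>
    #include <stdio.h>

    /* Pick the CQE version a user context will run with. */
    static uint8_t negotiate_cqe_version(uint8_t kernel_max, uint8_t user_max)
    {
        uint8_t v = kernel_max < user_max ? kernel_max : user_max;
        return v >= 1 ? 1 : 0;   /* only versions 0 and 1 exist in this model */
    }

    int main(void)
    {
        printf("old user, new kernel -> CQE v%u\n", negotiate_cqe_version(1, 0));
        printf("new user, new kernel -> CQE v%u\n", negotiate_cqe_version(1, 1));
        return 0;
    }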
| * IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext   (Haggai Abramovsky, 2016-01-21, 1 file, -1/+4)
| |   The wrong buffer size was passed to ib_is_udata_cleared. Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/sa: Fix netlink local service GFP crash   (Kaike Wan, 2016-01-21, 1 file, -2/+6)
| |   The rdma netlink local service registers a handler to handle the RESOLVE response and another handler to handle the SET_TIMEOUT request. The first thing these handlers do is call netlink_capable() to check the access rights of the received skb, to make sure that the sender has root access.
| |   Under normal conditions, such responses and requests will be directly forwarded to the handlers without going through the netlink_dump pathway (see ibnl_rcv_msg() in drivers/infiniband/core/netlink.c). However, a user application could send a RESOLVE request (not response) to the local service, which will fall into the netlink_dump pathway, where a new skb will be created without initializing the control block. This new skb will eventually be forwarded to the local service RESOLVE response handler. Unfortunately, netlink_capable() will cause a general protection fault if the skb's control block is not initialized. This patch addresses the problem by checking the skb first.
| |   Signed-off-by: Kaike Wan <kaike.wan@intel.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
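A kernel-style sketch of the kind of guard the message describes: verify that the skb's netlink control block was actually populated before asking netlink_capable() about it. The handler name is hypothetical and this is not the literal upstream diff:

    #include <linux/errno.h>
    #include <linux/capability.h>
    #include <linux/netlink.h>

    /* Hypothetical response handler: bail out early on an skb whose netlink
     * control block was never initialized (for example one synthesized on the
     * netlink_dump path), instead of letting netlink_capable() dereference it. */
    static int example_resolve_resp_handler(struct sk_buff *skb, struct nlmsghdr *nlh)
    {
        if (!NETLINK_CB(skb).sk || !netlink_capable(skb, CAP_NET_ADMIN))
            return -EPERM;

        /* ... normal response processing would go here ... */
        return 0;
    }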
| * IB/srpt: Remove redundant wc array   (Sagi Grimberg, 2016-01-19, 1 file, -2/+0)
| |   No usage after the conversion to the new CQ API. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/qib: Improve ipoib UD performance   (Mike Marciniszyn, 2016-01-19, 1 file, -3/+8)
| |   Based on profiling, UD performance drops in the case of processes in a single client, due to excess context switches when the progress workqueue is scheduled. This is solved by modifying the heuristic to select direct progress instead of progress scheduled via the workqueue when UD-like situations are detected. Reviewed-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/mlx4: Advertise RoCE v2 support   (Matan Barak, 2016-01-19, 1 file, -3/+9)
| |   Advertise RoCE v2 support in port_immutable attributes according to the hardware's capabilities. This enables the verbs stack to use RoCE v2 mode. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/mlx4: Create and use another QP1 for RoCEv2   (Moni Shoua, 2016-01-19, 2 files, -19/+147)
| |   The mlx4 driver uses a special QP to implement the GSI QP. This kind of QP allows building the InfiniBand headers in software. When the mlx4 hardware builds the packet, it calculates the ICRC and puts it at the end of the payload. However, this ICRC calculation depends on the QP configuration, which is determined when the QP is modified (roce_mode during INIT->RTR). When receiving a packet, the ICRC verification doesn't depend on this configuration. Therefore, two GSI QPs for send (one for each RoCE version) and one GSI QP for receive are required. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers   (Moni Shoua, 2016-01-19, 3 files, -12/+54)
| |   RoCEv2 packets are sent over IP/UDP protocols. The mlx4 driver uses a type of RAW QP to send packets for QP1 and therefore needs to build the network headers below BTH in software. This patch adds the option to build QP1 packets with IP and UDP headers if RoCEv2 is requested. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/mlx4: Enable RoCE v2 when the IB device is added   (Moni Shoua, 2016-01-19, 1 file, -1/+8)
| |   If the hardware supports RoCE v2, we configure the hardware UDP port according to the RoCE v2 Annex when the mlx4_ib device is added. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/mlx4: Support modify_qp for RoCE v2   (Moni Shoua, 2016-01-19, 2 files, -4/+35)
| |   In order to support modify_qp for RoCE v2, we need to set the gid_type in the QP context. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/core: Add definition for the standard RoCE V2 UDP port   (Moni Shoua, 2016-01-19, 1 file, -0/+1)
| |   This will be used in hardware device drivers when building QP or AH contexts. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * net/mlx4_core: Add support for RoCE v2 entropy   (Moni Shoua, 2016-01-19, 2 files, -1/+38)
| |   In RoCE v2 we need to choose a source UDP port; we do so by using entropy over the source and destination QPNs. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
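A standalone illustration of the idea of deriving a UDP source port from the QPN pair. The mixing function and the use of the dynamic port range (49152-65535) are assumptions made for the example; this is not mlx4's actual entropy algorithm:

    #include <stdint.h>
    #include <stdio.h>

    /* Mix the two QPNs and fold the result into the dynamic/private port
     * range, so different QP pairs tend to get different source ports. */
    static uint16_t roce_v2_src_port(uint32_t src_qpn, uint32_t dst_qpn)
    {
        uint32_t h = src_qpn ^ dst_qpn;

        h *= 0x9e3779b1u;   /* simple integer mix to spread nearby QPN pairs */
        h ^= h >> 16;

        return (uint16_t)(0xC000 | (h & 0x3FFF));
    }

    int main(void)
    {
        printf("QPNs 0x12/0x34 -> UDP sport %u\n", roce_v2_src_port(0x12, 0x34));
        printf("QPNs 0x13/0x34 -> UDP sport %u\n", roce_v2_src_port(0x13, 0x34));
        return 0;
    }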
| * net/mlx4_core: Add support for configuring RoCE v2 UDP port   (Moni Shoua, 2016-01-19, 2 files, -1/+16)
| |   In order to support RoCE v2, the hardware needs to be configured to classify certain UDP packets as RoCE v2 packets and pass them through its RoCE pipeline. This patch enables configuring this UDP port. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/mlx4: Add support for setting RoCEv2 gids in hardware   (Moni Shoua, 2016-01-19, 2 files, -4/+62)
| |   To tell the hardware about a GID with type RoCEv2, software needs a new modifier to the SET_PORT command: MLX4_SET_PORT_ROCE_ADDR. This can replace the old method, MLX4_SET_PORT_GID_TABLE, for RoCEv1 GIDs. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * net/mlx4_core: Configure mlx4 hardware for mixed RoCE v1/v2 modes   (Moni Shoua, 2016-01-19, 2 files, -1/+11)
| |   If the hardware supports RoCE v2 (mixed with RoCE v1) mode, we enable it. This is necessary in order to support RoCE v2. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/mlx4: Add gid_type to GID properties   (Moni Shoua, 2016-01-19, 2 files, -4/+14)
| |   The IB core driver adds a type property to struct ib_gid_attr. The mlx4 driver should take that into consideration when modifying or querying the hardware GID table. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * net/mlx4: Query RoCE support   (Moni Shoua, 2016-01-19, 2 files, -3/+30)
| |   Query the RoCE support from firmware using the appropriate firmware commands. Downstream patches will read these capabilities and act accordingly. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * svc_rdma: use local_dma_lkey   (Christoph Hellwig, 2016-01-19, 5 files, -40/+10)
| |   We now always have a per-PD local_dma_lkey available. Make use of that fact in svc_rdma and stop registering our own MR. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Acked-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
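A kernel-style illustration of the pattern (the helper is hypothetical, not the svc_rdma code): with a per-PD local_dma_lkey, a DMA-mapped buffer can be described in an ib_sge directly, with no separately registered MR to track:

    #include <rdma/ib_verbs.h>

    /* Hypothetical helper: fill an SGE for a buffer that is already
     * DMA-mapped, using the PD-wide local DMA lkey. */
    static void example_fill_sge(struct ib_sge *sge, struct ib_pd *pd,
                                 u64 dma_addr, u32 len)
    {
        sge->addr   = dma_addr;
        sge->length = len;
        sge->lkey   = pd->local_dma_lkey;   /* instead of a private MR's lkey */
    }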
| * svcrdma: Add class for RDMA backwards direction transport   (Chuck Lever, 2016-01-19, 8 files, -15/+475)
| |   To support the server-side of an NFSv4.1 backchannel on RDMA connections, add a transport class that enables backward direction messages on an existing forward channel connection. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * svcrdma: Define maximum number of backchannel requests   (Chuck Lever, 2016-01-19, 3 files, -15/+28)
| |   Extra resources for handling backchannel requests have to be pre-allocated when a transport instance is created. Set up additional fields in svcxprt_rdma to track these resources. The max_requests fields are elements of the RPC-over-RDMA protocol, so they should be u32. To ensure that unsigned arithmetic is used everywhere, some other fields in the svcxprt_rdma struct are updated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * svcrdma: Make map_xdr non-static   (Chuck Lever, 2016-01-19, 2 files, -7/+9)
| |   Prerequisite for using map_xdr in the backchannel code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * svcrdma: Remove last two __GFP_NOFAIL call sites   (Chuck Lever, 2016-01-19, 2 files, -2/+7)
| |   Clean up. These functions can otherwise fail, so check for page allocation failures too. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * svcrdma: Add gfp flags to svc_rdma_post_recv()   (Chuck Lever, 2016-01-19, 3 files, -5/+7)
| |   svc_rdma_post_recv() allocates pages for receive buffers on-demand. It uses GFP_KERNEL so the allocator tries hard, and may sleep. But I'm about to add a call to svc_rdma_post_recv() from a function that may not sleep. Since all svc_rdma_post_recv() call sites can tolerate its failure, allow it to fail if the page allocator returns nothing. Longer term, receive buffers, being a finite resource per-connection, should be pre-allocated and re-used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
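A kernel-style sketch of the idea, with an invented function (not the svcrdma code): thread a gfp_t parameter down to the page allocator so callers in atomic context can pass non-sleeping flags, and report allocation failure as an error the caller handles:

    #include <linux/errno.h>
    #include <linux/gfp.h>
    #include <linux/mm.h>

    /* Hypothetical receive-buffer setup: the caller chooses the allocation
     * policy; a failed allocation is returned as -ENOMEM rather than retried. */
    static int example_post_recv(struct page **pagep, gfp_t flags)
    {
        struct page *page = alloc_page(flags);

        if (!page)
            return -ENOMEM;   /* callers are expected to tolerate this */

        *pagep = page;
        /* ... build the recv WR around the page and post it ... */
        return 0;
    }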
| * svcrdma: Remove unused req_map and ctxt kmem_caches   (Chuck Lever, 2016-01-19, 3 files, -42/+1)
| |   Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * svcrdma: Improve allocation of struct svc_rdma_req_map   (Chuck Lever, 2016-01-19, 3 files, -15/+84)
| |   To ensure this allocation cannot fail and will not sleep, pre-allocate the req_map structures per-connection. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * svcrdma: Improve allocation of struct svc_rdma_op_ctxt   (Chuck Lever, 2016-01-19, 2 files, -14/+94)
| |   When the maximum payload size of NFS READ and WRITE was increased by commit cc9a903d915c ("svcrdma: Change maximum server payload back to RPCSVC_MAXPAYLOAD"), the size of struct svc_rdma_op_ctxt increased to over 6KB (on x86_64). That makes allocating one of these from a kmem_cache more likely to fail in situations when system memory is exhausted.
| |   Since I'm about to add a caller where this allocation must always work _and_ it cannot sleep, pre-allocate ctxts for each connection. Another motivation for this change is that NFSv4.x servers are required by specification not to drop NFS requests. Pre-allocating memory resources reduces the likelihood of a drop.
| |   Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
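A sketch of the per-connection pre-allocation pattern described above, with invented names (it is not the svc_rdma_op_ctxt code): contexts are allocated up front at connection setup, then handed out from a spinlock-protected free list, so the hot path never sleeps and never touches the allocator:

    #include <linux/errno.h>
    #include <linux/list.h>
    #include <linux/slab.h>
    #include <linux/spinlock.h>

    struct example_ctxt {
        struct list_head free_list;
        /* ... per-request state would live here ... */
    };

    struct example_conn {
        spinlock_t       ctxt_lock;
        struct list_head ctxt_free;
    };

    /* Called at connection setup time, where sleeping is fine.
     * (Error unwinding of earlier allocations is omitted for brevity.) */
    static int example_prealloc_ctxts(struct example_conn *conn, int n)
    {
        int i;

        spin_lock_init(&conn->ctxt_lock);
        INIT_LIST_HEAD(&conn->ctxt_free);

        for (i = 0; i < n; i++) {
            struct example_ctxt *ctxt = kzalloc(sizeof(*ctxt), GFP_KERNEL);

            if (!ctxt)
                return -ENOMEM;
            list_add(&ctxt->free_list, &conn->ctxt_free);
        }
        return 0;
    }

    /* Hot path: no allocation, no sleeping; NULL only if the pool is empty. */
    static struct example_ctxt *example_get_ctxt(struct example_conn *conn)
    {
        struct example_ctxt *ctxt = NULL;

        spin_lock(&conn->ctxt_lock);
        if (!list_empty(&conn->ctxt_free)) {
            ctxt = list_first_entry(&conn->ctxt_free,
                                    struct example_ctxt, free_list);
            list_del(&ctxt->free_list);
        }
        spin_unlock(&conn->ctxt_lock);
        return ctxt;
    }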
| * svcrdma: Clean up process_context()   (Chuck Lever, 2016-01-19, 1 file, -23/+21)
| |   Be sure the completed ctxt is put in every path. The xprt enqueue can take a while, so put the completed ctxt back in circulation _before_ enqueuing the xprt. Remove/disable debugging. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * svcrdma: Clean up rdma_create_xprt()   (Chuck Lever, 2016-01-19, 1 file, -8/+1)
| |   kzalloc is used here, so setting the atomic fields to zero is unnecessary. sc_ord is set again in handle_connect_req. The other fields are re-initialized in svc_rdma_accept(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/core: Use hop-limit from IP stack for RoCE   (Matan Barak, 2016-01-19, 6 files, -19/+26)
| |   Previously, IPV6_DEFAULT_HOPLIMIT was used as the hop limit value for RoCE. Fix that by taking ip4_dst_hoplimit and ip6_dst_hoplimit as hop limit values. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
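A kernel-style fragment showing the pattern the commit names (the helper is hypothetical; the real change is spread across the address and AH resolution code): take the hop limit from the resolved route's dst_entry rather than hard-coding IPV6_DEFAULT_HOPLIMIT:

    #include <net/route.h>
    #include <net/ip6_route.h>

    /* Hypothetical helper: pick the hop limit for a resolved route. */
    static u8 example_route_hoplimit(struct dst_entry *dst, bool is_ipv4)
    {
        return is_ipv4 ? ip4_dst_hoplimit(dst) : ip6_dst_hoplimit(dst);
    }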
| * IB/core: Rename rdma_addr_find_dmac_by_grh   (Matan Barak, 2016-01-19, 4 files, -17/+19)
| |   rdma_addr_find_dmac_by_grh resolves dmac, vlan_id and if_index, and a downstream patch will also add hop_limit as an output parameter, thus we rename it to rdma_addr_find_l2_eth_by_grh. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/cm: Fix a recently introduced deadlock   (Bart Van Assche, 2016-01-19, 1 file, -2/+2)
| |   ib_send_cm_drep() calls cm_enter_timewait() while holding a spinlock that can be locked from inside an interrupt handler. Hence do not enable interrupts inside cm_enter_timewait() if called with interrupts disabled. This patch fixes e.g. the following deadlock:
| |
| |   =================================
| |   [ INFO: inconsistent lock state ]
| |   4.4.0-rc7+ #1 Tainted: G E
| |   ---------------------------------
| |   inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
| |   swapper/8/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
| |   (&(&cm_id_priv->lock)->rlock){?.+...}, at: [<ffffffffa036eec4>] cm_establish+0x74/0x1b0 [ib_cm]
| |   {HARDIRQ-ON-W} state was registered at:
| |     [<ffffffff810a3c11>] mark_held_locks+0x71/0x90
| |     [<ffffffff810a3e87>] trace_hardirqs_on_caller+0xa7/0x1c0
| |     [<ffffffff810a3fad>] trace_hardirqs_on+0xd/0x10
| |     [<ffffffff8151c40b>] _raw_spin_unlock_irq+0x2b/0x40
| |     [<ffffffffa036ea8e>] cm_enter_timewait+0xae/0x100 [ib_cm]
| |     [<ffffffffa036ff76>] ib_send_cm_drep+0xb6/0x190 [ib_cm]
| |     [<ffffffffa052ed08>] srp_cm_handler+0x128/0x1a0 [ib_srp]
| |     [<ffffffffa0370340>] cm_process_work+0x20/0xf0 [ib_cm]
| |     [<ffffffffa0371335>] cm_dreq_handler+0x135/0x2c0 [ib_cm]
| |     [<ffffffffa03733c5>] cm_work_handler+0x75/0xd0 [ib_cm]
| |     [<ffffffff8107184d>] process_one_work+0x1bd/0x460
| |     [<ffffffff81073148>] worker_thread+0x118/0x420
| |     [<ffffffff81078454>] kthread+0xe4/0x100
| |     [<ffffffff8151cbbf>] ret_from_fork+0x3f/0x70
| |   irq event stamp: 1672286
| |   hardirqs last enabled at (1672283): [<ffffffff81408ec0>] poll_idle+0x10/0x80
| |   hardirqs last disabled at (1672284): [<ffffffff8151d304>] common_interrupt+0x84/0x89
| |   softirqs last enabled at (1672286): [<ffffffff8105b4dc>] _local_bh_enable+0x1c/0x50
| |   softirqs last disabled at (1672285): [<ffffffff8105b697>] irq_enter+0x47/0x70
| |
| |   other info that might help us debug this:
| |    Possible unsafe locking scenario:
| |          CPU0
| |          ----
| |     lock(&(&cm_id_priv->lock)->rlock);
| |     <Interrupt>
| |       lock(&(&cm_id_priv->lock)->rlock);
| |
| |    *** DEADLOCK ***
| |
| |   no locks held by swapper/8/0.
| |
| |   stack backtrace:
| |   CPU: 8 PID: 0 Comm: swapper/8 Tainted: G E 4.4.0-rc7+ #1
| |   Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
| |    ffff88045af5e950 ffff88046e503a88 ffffffff81251c1b 0000000000000007
| |    0000000000000006 0000000000000003 ffff88045af5ddc0 ffff88046e503ad8
| |    ffffffff810a32f4 0000000000000000 0000000000000000 0000000000000001
| |   Call Trace:
| |    <IRQ> [<ffffffff81251c1b>] dump_stack+0x4f/0x74
| |    [<ffffffff810a32f4>] print_usage_bug+0x184/0x190
| |    [<ffffffff810a36e2>] mark_lock_irq+0xf2/0x290
| |    [<ffffffff810a3995>] mark_lock+0x115/0x1b0
| |    [<ffffffff810a3b8c>] mark_irqflags+0x15c/0x170
| |    [<ffffffff810a4fef>] __lock_acquire+0x1ef/0x560
| |    [<ffffffff810a53c2>] lock_acquire+0x62/0x80
| |    [<ffffffff8151bd33>] _raw_spin_lock_irqsave+0x43/0x60
| |    [<ffffffffa036eec4>] cm_establish+0x74/0x1b0 [ib_cm]
| |    [<ffffffffa036f031>] ib_cm_notify+0x31/0x100 [ib_cm]
| |    [<ffffffffa0637f24>] srpt_qp_event+0x54/0xd0 [ib_srpt]
| |    [<ffffffffa0196052>] mlx4_ib_qp_event+0x72/0xc0 [mlx4_ib]
| |    [<ffffffffa00775b9>] mlx4_qp_event+0x69/0xd0 [mlx4_core]
| |    [<ffffffffa006000e>] mlx4_eq_int+0x51e/0xd50 [mlx4_core]
| |    [<ffffffffa006084f>] mlx4_msi_x_interrupt+0xf/0x20 [mlx4_core]
| |    [<ffffffff810b67b0>] handle_irq_event_percpu+0x40/0x110
| |    [<ffffffff810b68bf>] handle_irq_event+0x3f/0x70
| |    [<ffffffff810ba7f9>] handle_edge_irq+0x79/0x120
| |    [<ffffffff81007f3d>] handle_irq+0x5d/0x130
| |    [<ffffffff810071fd>] do_IRQ+0x6d/0x130
| |    [<ffffffff8151d309>] common_interrupt+0x89/0x89
| |    <EOI> [<ffffffff8140895f>] cpuidle_enter_state+0xcf/0x200
| |    [<ffffffff81408aa2>] cpuidle_enter+0x12/0x20
| |    [<ffffffff810990d6>] call_cpuidle+0x36/0x60
| |    [<ffffffff81099163>] cpuidle_idle_call+0x63/0x110
| |    [<ffffffff8109930a>] cpu_idle_loop+0xfa/0x130
| |    [<ffffffff8109934e>] cpu_startup_entry+0xe/0x10
| |    [<ffffffff8103c443>] start_secondary+0x83/0x90
| |
| |   Fixes: commit be4b499323bf ("IB/cm: Do not queue work to a device that's going away") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Erez Shitrit <erezsh@mellanox.com> Cc: Erez Shitrit <erezsh@mellanox.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
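A kernel-style sketch of the locking rule behind the fix (hypothetical names, not the ib_cm diff): a helper that may be reached with interrupts already disabled must use the irqsave/irqrestore spinlock variants so it restores the caller's interrupt state instead of unconditionally re-enabling interrupts:

    #include <linux/spinlock.h>

    struct example_priv {
        spinlock_t lock;
    };

    /* May be called from process context or with interrupts disabled
     * (e.g. from an interrupt handler path), so never use spin_unlock_irq()
     * here: it would re-enable interrupts behind the caller's back. */
    static void example_enter_timewait(struct example_priv *priv)
    {
        unsigned long flags;

        spin_lock_irqsave(&priv->lock, flags);
        /* ... state transition work ... */
        spin_unlock_irqrestore(&priv->lock, flags);
    }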
| * IB/srpt: Fix the RDMA completion handlers   (Bart Van Assche, 2016-01-19, 1 file, -2/+2)
| |   Avoid that the following kernel crash is triggered when processing an RDMA completion:
| |
| |   BUG: unable to handle kernel paging request at 0000000100000198
| |   IP: [<ffffffff810a4ea2>] __lock_acquire+0xa2/0x560
| |   Call Trace:
| |    [<ffffffff810a53c2>] lock_acquire+0x62/0x80
| |    [<ffffffff8151bd33>] _raw_spin_lock_irqsave+0x43/0x60
| |    [<ffffffffa04fd437>] srpt_rdma_read_done+0x57/0x120 [ib_srpt]
| |    [<ffffffffa0144dd3>] __ib_process_cq+0x43/0xc0 [ib_core]
| |    [<ffffffffa0145115>] ib_cq_poll_work+0x25/0x70 [ib_core]
| |    [<ffffffff8107184d>] process_one_work+0x1bd/0x460
| |    [<ffffffff81073148>] worker_thread+0x118/0x420
| |    [<ffffffff81078454>] kthread+0xe4/0x100
| |    [<ffffffff8151cbbf>] ret_from_fork+0x3f/0x70
| |
| |   Fixes: commit 59fae4deaad3 ("IB/srpt: chain RDMA READ/WRITE requests"). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * irq_poll: Fix irq_poll_sched()   (Bart Van Assche, 2016-01-19, 1 file, -1/+1)
| |   The IRQ_POLL_F_SCHED bit is set as long as polling is ongoing. This means that irq_poll_sched() must proceed if this bit has not yet been set. Fixes: commit ea51190c0315 ("irq_poll: fold irq_poll_sched_prep into irq_poll_sched"). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
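An illustrative sketch of the scheduling rule the message states (this models the intended behaviour, it is not the literal one-line fix; the function and bit names are local to the example):

    #include <linux/bitops.h>

    #define EXAMPLE_F_SCHED_BIT   0   /* models IRQ_POLL_F_SCHED */
    #define EXAMPLE_F_DISABLE_BIT 1   /* models the disable flag */

    /* Schedule polling unless it is disabled or already scheduled: bail out
     * only when the SCHED bit was already set, and proceed when it was clear. */
    static void example_poll_sched(unsigned long *state)
    {
        if (test_bit(EXAMPLE_F_DISABLE_BIT, state))
            return;
        if (test_and_set_bit(EXAMPLE_F_SCHED_BIT, state))
            return;   /* someone else already scheduled the poll */

        /* ... queue the poll, e.g. raise the polling softirq ... */
    }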
| * IB/core: Fix dereference before check   (Matan Barak, 2016-01-19, 1 file, -4/+5)
| |   Sparse complains about dereference before check. Fixing this by moving the check before the dereference. Fixes: 200298326b27 ('IB/core: Validate route when we init ah') Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/core: Eliminate sparse false context imbalance warning   (Matan Barak, 2016-01-19, 1 file, -0/+1)
| |   When the write_gid function needs to do a sleepable operation, it unlocks table->rwlock and then relocks it. Sparse complains about a context imbalance. This is safe, as write_gid is always called with table->rwlock held; write_gid protects against simultaneous writes to this GID entry by setting the GID_TABLE_ENTRY_INVALID flag. Fixes: 9c584f049596 ('IB/core: Change per-entry lock in RoCE GID table to one lock') Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/core: sysfs.c: Fix PerfMgt ClassPortInfo handling   (Hal Rosenstock, 2016-01-19, 1 file, -4/+6)
| |   Port number is not part of the ClassPortInfo attribute but is still needed as a parameter when invoking process_mad. To properly handle this attribute, port_num is added as a parameter to get_counter_table, and get_perf_mad was changed not to store port_num in the attribute itself when it's querying the ClassPortInfo attribute. This handles an issue pointed out by Matan Barak <matanb@dev.mellanox.co.il>. Fixes: 145d9c541032 ('IB/core: Display extended counter set if available') Signed-off-by: Hal Rosenstock <hal@mellanox.com> Acked-by: Matan Barak <matanb@mellanox.com> Acked-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * IB/core: Remove set-but-not-used variable from ib_sg_to_pages()   (Bart Van Assche, 2016-01-19, 1 file, -2/+1)
| |   Detected this by building the IB core with W=1. See also patch "IB core: Fix ib_sg_to_pages()" (commit 8f5ba10ed40a). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Leon Romanovsky <leon.romanovsky@mellanox.com> Acked-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>