author     gulams <64251312+gulams@users.noreply.github.com>  2020-04-28 13:03:50 +0530
committer  Gulam Mohamed <gulam.mohamed@oracle.com>  2020-05-04 21:03:39 +0000
commit     dc7560d404857c0540caed2f71f8e7c2e7307ab3
tree       30ad749d4a22aa598862ed0cb5033115f906fce2 /usr/initiator.c
parent     433288fd876a31a83d77cad07419f8da793091ea
Proper disconnect of TCP connection
1. Due to configuration issues, logins from the iSCSI initiator were being rejected by the target.
2. The initiator kept retrying the login.
3. Each time the initiator tried to log in, the host number was incremented by 1.
4. Eventually, the host number reached 65535.
5. During the login process, once the TCP connection is established, the initiator tries to set the host parameters for the network interface if it is not the default interface.
6. While setting these host parameters, it looks up the host based on the host number.
7. The host number in the "iscsi_uevent" structure is a uint32_t. It is passed as an argument to the scsi_host_lookup() function.
8. scsi_host_lookup() takes it as an unsigned short, so when it receives a host number above 65535, the value wraps around and starts from 0 again.
9. As a result, scsi_host_lookup() receives an incorrect host number and returns an error saying the host does not exist in the list.
10. Because of this "host not found" error, open-iscsi retries this particular connection again and again.
11. On each retry, it disconnects and then connects again with the same connection pointer. In other words, it re-opens the connection repeatedly until the 120-second timeout expires.
12. During these 120 seconds, it was observed re-opening the connection 400+ times, disconnecting and connecting each time.
13. After 120 seconds, the connection and session are destroyed.
14. During these connect/disconnect retries within the 120 seconds, whenever the connect succeeds it tries to bind the connection to the session.
15. When it binds the connection and session, the reference count of the socket is incremented.
16. When it disconnects, it tries to close the socket with the close(sockfd) system call.
17. This close() system call enters the kernel but does NOT proceed down to the networking layer to call tcp_close(), which would send the FIN packet to the target.
18. It does not reach tcp_close() because the reference count of the socket is still 1.
19. So the initiator does not send a FIN packet to the target. The target therefore times out and sends its own FIN after the timeout. This happens for every one of the retries (400+).
20. At some point, when such a FIN packet is received by the initiator, the connection has already been destroyed and its memory re-used for some other purpose, hence the panic.

Fix:
==
The fix is to drop the socket's reference count after disconnect by calling stop connection before reconnecting: iscsi_login_eh() now reopens through session_conn_reopen(conn, qtask, STOP_CONN_TERM) instead of calling ep_disconnect() and iscsi_conn_connect() directly.

Corrected the indentation for the change in the function iscsi_login_eh().
Diffstat (limited to 'usr/initiator.c')
-rw-r--r--  usr/initiator.c  6
1 file changed, 1 insertion, 5 deletions
diff --git a/usr/initiator.c b/usr/initiator.c
index a07f9aa..5f4bdca 100644
--- a/usr/initiator.c
+++ b/usr/initiator.c
@@ -711,11 +711,7 @@ static void iscsi_login_eh(struct iscsi_conn *conn, struct queue_task *qtask,
!iscsi_retry_initial_login(conn))
session_conn_shutdown(conn, qtask, err);
else {
- session->reopen_cnt++;
- session->t->template->ep_disconnect(conn);
- if (iscsi_conn_connect(conn, qtask))
- queue_delayed_reopen(qtask,
- ISCSI_CONN_ERR_REOPEN_DELAY);
+ session_conn_reopen(conn, qtask, STOP_CONN_TERM);
}
break;
case R_STAGE_SESSION_REDIRECT: