author     gulams <64251312+gulams@users.noreply.github.com>  2021-08-26 15:35:41 +0530
committer  GitHub <noreply@github.com>  2021-08-26 15:35:41 +0530
commit     6859efdd2245fc2d684ab05883836e9f0f386970 (patch)
tree       8c7ac626ada6cce3cd606311da5f97e17e685212
parent     2a8f9d81d0d6b5094c3fe9c686e2afb2ec27058a (diff)
Handle recv() returning 0 in iscsid_response()
Description:
------------
Because of a problem on the target side or an underlying network issue, the
initiator gets a "connection refused" error when it tries to connect to the
iSCSI target. The error handler then retries the connection every 3 seconds
until the 120-second timeout for the initial login expires. After that, the
connection is terminated and shut down.

Later, when the image installation started to upgrade iscsi-initiator-utils,
systemd first reloaded the daemon and started trying to log in to the iSCSI
devices. This triggered a connection to the target that was down (the one for
which the initial login had failed, as described above). The connection
attempt came from iscsiadm (iscsi.service), which got the connection refused
error, so it was retried. Meanwhile, systemd stopped iscsid in order to update
iscsi-initiator-utils (it was started again later). Since the iscsid daemon
was stopped, the retry logic in iscsid_response() kept polling and went into
an indefinite loop:

    int iscsid_response(int fd, iscsiadm_cmd_e cmd, iscsiadm_rsp_t *rsp,
                        int timeout)
    {
        ...
        ...
        while (len) {
            struct pollfd pfd;

            pfd.fd = fd;
            pfd.events = POLLIN;
            err = poll(&pfd, 1, timeout);    <<< This poll returned err = 1
            if (!err) {
                if (poll_wait)
                    continue;
                return ISCSI_ERR_SESSION_NOT_CONNECTED;
            } else if (err < 0) {
                if (errno == EINTR)
                    continue;
                log_error("got poll error (%d/%d), daemon died?",
                          err, errno);
                return ISCSI_ERR_ISCSID_COMM_ERR;
            } else if (pfd.revents & POLLIN) {    <<< We came here; recv() returned 0
                err = recv(fd, rsp, sizeof(*rsp), MSG_WAITALL);
                if (err < 0) {
                    log_error("read error (%d/%d), daemon died?",
                              err, errno);
                    break;
                }
                len -= err;
                iscsi_err = rsp->err;
            }
        }
        ...
        ...
    }

In the above code, poll() returned 1 (indicating success; the polled fd still
existed, because it is closed only after this while loop) and revents was set
to POLLIN. So we entered the last "else if" block and tried to recv() the
message from the target. Since the target had shut down gracefully (which is
why we were getting connection refused errors), the recv() call returned 0
because no bytes were received. With 0 bytes received, the value of "len" did
not change, so the while loop ran again. The status of pfd did not change
either: nobody was left to change it, and the fd was not yet closed (it is
closed only after the while loop exits). So poll() returned 1 again, we tried
to receive bytes from the target again, and recv() returned 0 again. This
became an infinite loop, and iscsi.service got stuck on iscsiadm. That in turn
blocked the iscsi-initiator-utils update, and with it the image installation.

Fix:
Handle this case by exiting the while loop when recv() returns 0 bytes, which
indicates that the remote target service (the peer) performed an orderly
shutdown.
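The spinning behaviour described above can be reproduced outside open-iscsi. The following is a minimal sketch, assuming a POSIX system: `count_eof_wakeups` is a hypothetical helper that is not part of open-iscsi, and an AF_UNIX socketpair stands in for the real peer. It closes one end without sending anything, then counts how many consecutive iterations see poll() report POLLIN while recv() returns 0.

```c
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not open-iscsi code): close one end of a
 * connected socket pair, then poll()/recv() the other end up to
 * max_iters times. Returns the number of iterations in which poll()
 * reported POLLIN and recv() returned 0, or -1 if setup failed.
 */
int count_eof_wakeups(int max_iters)
{
	int sv[2];
	int hits = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	close(sv[1]);	/* orderly shutdown of the peer, no data sent */

	for (int i = 0; i < max_iters; i++) {
		struct pollfd pfd = { .fd = sv[0], .events = POLLIN };
		char buf[16];

		/* Stop counting as soon as the pattern does not hold. */
		if (poll(&pfd, 1, 1000) != 1)
			break;
		if (!(pfd.revents & POLLIN))
			break;
		if (recv(sv[0], buf, sizeof(buf), MSG_WAITALL) != 0)
			break;
		hits++;	/* readable, yet zero bytes: end of stream */
	}

	close(sv[0]);
	return hits;
}
```

On Linux, `count_eof_wakeups(5)` is expected to return 5: end of stream stays "readable" for as long as the fd is open, so a loop that only makes progress when recv() returns a positive count never terminates.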
-rw-r--r--  usr/iscsid_req.c  2
1 files changed, 1 insertions, 1 deletions
diff --git a/usr/iscsid_req.c b/usr/iscsid_req.c
index a3aba6d..596086e 100644
--- a/usr/iscsid_req.c
+++ b/usr/iscsid_req.c
@@ -165,7 +165,7 @@ int iscsid_response(int fd, iscsiadm_cmd_e cmd, iscsiadm_rsp_t *rsp,
 			return ISCSI_ERR_ISCSID_COMM_ERR;
 		} else if (pfd.revents & POLLIN) {
 			err = recv(fd, rsp, sizeof(*rsp), MSG_WAITALL);
-			if (err < 0) {
+			if (err <= 0) {
 				log_error("read error (%d/%d), daemon died?",
 					  err, errno);
 				break;
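One detail of the one-character change: recv() does not set errno when it returns 0, so the errno value printed by the shared log_error() call may be stale in the end-of-stream case. As an illustration only, here is a minimal sketch of a receive loop that reports end of stream separately from errors. `recv_exact` and the `RECV_*` codes are hypothetical names, not part of open-iscsi, and this is not the change the commit makes.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch only. */
enum { RECV_OK = 0, RECV_EOF = 1, RECV_ERR = 2 };

/*
 * Read exactly len bytes from fd into buf.
 * Returns RECV_OK when all bytes arrived, RECV_EOF when the peer
 * closed the connection first, RECV_ERR on a recv() failure
 * (errno is meaningful only in the RECV_ERR case).
 */
int recv_exact(int fd, void *buf, size_t len)
{
	char *p = buf;

	while (len) {
		ssize_t n = recv(fd, p, len, MSG_WAITALL);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return RECV_ERR;
		}
		if (n == 0)
			return RECV_EOF;	/* orderly shutdown by the peer */
		p += n;
		len -= (size_t)n;
	}
	return RECV_OK;
}
```

A caller could then log "peer closed the connection" for `RECV_EOF` and include errno in the message only for `RECV_ERR`.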