Always detect when a connection is closed behind psycopg2's back.

There's a race condition that only seems to happen over Unix-domain sockets. Sometimes, the closed socket is reported by the kernel to libpq like this (captured with strace): sendto(3, "Q\0\0\0\34select pg_backend_pid()\0", 29, MSG_NOSIGNAL, NULL, 0) = 29 recvfrom(3, "E\0\0\0mSFATAL\0C57P01\0Mterminating "..., 16384, 0, NULL, NULL) = 110 recvfrom(3, 0x12d0330, 16384, 0, 0, 0) = -1 ECONNRESET (Connection reset by peer) That is, psycopg2/libpq sees no error when sending the first query after the connection is closed, but gets an error reading the result. In that case, everything worked fine. But sometimes, the error manifests like this: sendto(3, "Q\0\0\0\34select pg_backend_pid()\0", 29, MSG_NOSIGNAL, NULL, 0) = -1 EPIPE (Broken pipe) recvfrom(3, "E\0\0\0mSFATAL\0C57P01\0Mterminating "..., 16384, 0, NULL, NULL) = 110 recvfrom(3, "", 16274, 0, NULL, NULL) = 0 recvfrom(3, "", 16274, 0, NULL, NULL) = 0 i.e. libpq received an error when sending the query. This manifests as a slightly different exception from a slightly different place. More importantly, in this case connection.closed is left at 0 rather than being set to 2, and that is the bug I'm fixing here. Note that we see almost identical behaviour for sync and async connections, and the fixes are the same. So I added extremely similar test cases. Finally, there is still a bug here: for async connections, we sometimes raise DatabaseError (incorrect) and sometimes raise OperationalError (correct). Will fix that next.
author: Greg Ward <gward@dyn.com> 2016-06-28 17:46:21 -0400
committer: Daniele Varrazzo <daniele.varrazzo@gmail.com> 2017-03-14 12:08:03 +0000
commit: b203a7c775c5f81f59c6b4205c3c33455ceeec3d (patch)
tree: 1d5a4e6419334610498cf5e83769fbf943485275
parent: 3c124a0b87a7bde51519cd0eb98980f04eed1595 (diff)
download: psycopg2-b203a7c775c5f81f59c6b4205c3c33455ceeec3d.tar.gz
2 files changed, 51 insertions, 0 deletions
diff --git a/psycopg/pqpath.c b/psycopg/pqpath.c
index 59cbb5e..222696b 100644
--- a/psycopg/pqpath.c
+++ b/psycopg/pqpath.c
@@ -986,6 +986,9 @@ pq_execute(cursorObject *curs, const char *query, int async, int no_result, int
 
         /* don't let pgres = NULL go to pq_fetch() */
         if (curs->pgres == NULL) {
+            if (CONNECTION_BAD == PQstatus(curs->conn->pgconn)) {
+                curs->conn->closed = 2;
+            }
             pthread_mutex_unlock(&(curs->conn->lock));
             Py_BLOCK_THREADS;
             if (!PyErr_Occurred()) {
@@ -1013,6 +1016,9 @@ pq_execute(cursorObject *curs, const char *query, int async, int no_result, int
 
         CLEARPGRES(curs->pgres);
         if (PQsendQuery(curs->conn->pgconn, query) == 0) {
+            if (CONNECTION_BAD == PQstatus(curs->conn->pgconn)) {
+                curs->conn->closed = 2;
+            }
             pthread_mutex_unlock(&(curs->conn->lock));
             Py_BLOCK_THREADS;
             PyErr_SetString(OperationalError,
diff --git a/tests/test_cursor.py b/tests/test_cursor.py
index 7f3bf21..f9fc66c 100755
--- a/tests/test_cursor.py
+++ b/tests/test_cursor.py
@@ -29,6 +29,8 @@ import psycopg2.extensions
 from testutils import (unittest, ConnectingTestCase, skip_before_postgres,
     skip_if_no_namedtuple, skip_if_no_getrefcount, slow)
 
+import psycopg2.extras
+
 
 class CursorTests(ConnectingTestCase):
 
@@ -537,6 +539,49 @@ class CursorTests(ConnectingTestCase):
             self.assertRaises(exception, cur.callproc, procname, parameter_sequence)
             self.conn.rollback()
 
+    def test_external_close_sync(self):
+        # If a "victim" connection is closed by a "control" connection
+        # behind psycopg2's back, psycopg2 always handles it correctly:
+        # raise OperationalError, set conn.closed to 2. This reproduces
+        # issue #443, a race between control_conn closing victim_conn and
+        # psycopg2 noticing.
+        control_conn = self.conn
+        connect_func = self.connect
+        wait_func = lambda conn: None
+        self._test_external_close(control_conn, connect_func, wait_func)
+
+    def test_external_close_async(self):
+        # Issue #443 is in the async code too. Since the fix is duplicated,
+        # so is the test.
+        control_conn = self.conn
+        connect_func = lambda: self.connect(async=True)
+        wait_func = psycopg2.extras.wait_select
+        self._test_external_close(control_conn, connect_func, wait_func)
+
+    def _test_external_close(self, control_conn, connect_func, wait_func):
+        # The short sleep before using victim_conn the second time makes it
+        # much more likely to lose the race and see the bug. Repeating the
+        # test several times makes it even more likely.
+        for i in range(10):
+            victim_conn = connect_func()
+            wait_func(victim_conn)
+
+            with victim_conn.cursor() as cur:
+                cur.execute('select pg_backend_pid()')
+                wait_func(victim_conn)
+                pid1 = cur.fetchall()[0][0]
+
+            with control_conn.cursor() as cur:
+                cur.execute('select pg_terminate_backend(%s)', (pid1,))
+
+            time.sleep(0.001)
+            with self.assertRaises(psycopg2.DatabaseError):
+                with victim_conn.cursor() as cur:
+                    cur.execute('select 1')
+                    wait_func(victim_conn)
+
+            self.assertEqual(victim_conn.closed, 2)
+
 
 def test_suite():
     return unittest.TestLoader().loadTestsFromName(__name__)
author	Greg Ward <gward@dyn.com>	2016-06-28 17:46:21 -0400
committer	Daniele Varrazzo <daniele.varrazzo@gmail.com>	2017-03-14 12:08:03 +0000
commit	b203a7c775c5f81f59c6b4205c3c33455ceeec3d (patch)
tree	1d5a4e6419334610498cf5e83769fbf943485275
parent	3c124a0b87a7bde51519cd0eb98980f04eed1595 (diff)
download	psycopg2-b203a7c775c5f81f59c6b4205c3c33455ceeec3d.tar.gz