Pre-release cleanup.

author: Eric S. Raymond <esr@thyrsus.com> 2005-06-27 18:31:53 +0000
committer: Eric S. Raymond <esr@thyrsus.com> 2005-06-27 18:31:53 +0000
commit: caedffbd5c387fef901b11a642f83f9a7afa1b5a (patch)
tree: 0cbb7ce5aee95d1361c22291b93a15462cf08902
parent: e7d20f03c99f5d77d3be5411bab5fe56fe6cfb4f (diff)
download: gpsd-caedffbd5c387fef901b11a642f83f9a7afa1b5a.tar.gz
4 files changed, 82 insertions, 11 deletions
diff --git a/TODO b/TODO
index afd38794..6c3c1e8d 100644
--- a/TODO
+++ b/TODO
@@ -4,6 +4,48 @@ will unfold them again.
 
 ** Bugs:
 
+*** Under unknown conditions, a long-running xgps induces a memory leak in the X server
+
+Rob Janssen writes:
+>I have found something that leaks.  Not in our software, but in the X server.
+>But caused by xgps.
+>
+>After 3 copies of xgps ran for a couple of days, I noticed a lot of swap
+>is in use:
+>             total       used       free     shared    buffers     cached
+>Mem:       1035776     992324      43452          0     174580      54756
+>-/+ buffers/cache:     762988     272788
+>Swap:      2104440    1444944     659496
+>
+>This is more than normal for my system.  I noticed it when switching to a
+>virtual screen where mozilla is running.  It took several seconds to
+>redraw.
+>
+>I did a "ps axu" and found these interesting lines:
+>
+>USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
+>root      7004  2.5 33.6 1861128 348460 ?    SL   Jun19 210:46 /usr/X11R6/bi
+>ntp      18011  0.0  0.2  2696 2692 ?        SLs  Jun19   0:00 /usr/sbin/ntp
+>nobody   23105  0.0  0.0  7004  900 ?        S<sl Jun21   0:17 /home/rob/src
+>rob      28724  0.2  0.1  7084 1968 pts/18   S    Jun22   9:03 /home/rob/src
+>rob      28744  0.3  0.1  7084 2040 pts/18   S    Jun22  14:52 /home/rob/src
+>rob      28759  0.2  0.1  7084 1936 pts/18   S    Jun22   9:44 /home/rob/src
+>
+>So the gps programs were not consuming that much memory.
+>I still stopped the 3 xgps programs (the last three in this list) and then
+>the X server showed:
+>root      7004  2.5  2.2 367140 23424 ?      SL   Jun19 210:49 /usr/X11R6/bi
+>
+>So there certainly is a relation here.
+>Unfortunately I have zero knowledge about X programming.  I would guess
+>some kind of session or operation is started and never ended, or something
+>is said to be saved, but apparently it remains related to the specific
+>window because the X server neatly frees it once the program disconnects.
+
+But Rob's xgps memory leak doesn't reproduce on a stock Fedora Core 3
+system.  ESR tested for this in the simplest possible way, by doing
+system("free t") at the end of each handle_input() call.
+
 *** EPH and EPV reports are zeroed too often in the TSIP driver 
 
 There is some bad interaction between the policy code in
@@ -84,16 +126,6 @@ Rob recommends:
 
 I have not yet reproduced this.
 
-*** Possible resource-leak bug, not yet reproduced or confirmed
-
-Wojciech Kazubski <wk@ire.pw.edu.pl> reports: when I connect to gpsd first
-time everything goes fine, and several clients can connect without
-problem but if the last client disconnects, the gpsd does not respond
-to any inquiry. It is living and accepting commands but responding
-with GPSD,P=? or so. And possibly after some time (few hours?) it
-stops responding but the process still looks active (running out of
-resources??).
-
 ** To do:
 
 *** Track error computation
diff --git a/gpsd.c b/gpsd.c
index 24ea130e..d956b146 100644
--- a/gpsd.c
+++ b/gpsd.c
@@ -554,7 +554,7 @@ static int handle_request(int cfd, char *buf, int buflen)
 	    phrase[strlen(phrase)-1] = '\0';
 	    break;
 	case 'L':
-	    (void)snprintf(phrase, sizeof(phrase), ",L=2 " VERSION " abcdefiklmnpqrstuvwxy");	//ghj
+	    (void)snprintf(phrase, sizeof(phrase), ",L=2 " VERSION " abcdefiklmnopqrstuvwxyz");	//ghj
 	    break;
 	case 'M':
 	    if (assign_channel(whoami)==0 && (!whoami->device || whoami->device->gpsdata.fix.mode == MODE_NOT_SEEN))
diff --git a/test/garmin17n.log b/test/garmin17n.log
index 80ed5bdc..49ca0d2a 100644
--- a/test/garmin17n.log
+++ b/test/garmin17n.log
@@ -4,6 +4,7 @@
 # Pause-noted: Y
 # Well-behaved: Y
 # Submitted-by: Wojciech Kazubski <wk@ire.pw.edu.pl>
+# Comment: Only emits GPRMC when it has a fix.
 # Date: 12 Mar 2005
 $GPRMC,093802,A,5213.1439,N,02100.6511,E,000.0,226.0,160305,004.2,E,D*15
 $GPGGA,093802,5213.1439,N,02100.6511,E,2,10,0.9,137.2,M,36.2,M,,*43
diff --git a/www/faq.html b/www/faq.html
index 6a3884d2..99a32582 100644
--- a/www/faq.html
+++ b/www/faq.html
@@ -124,6 +124,44 @@ gaps, to do policy.  What you're seeing as a bug only looks like one
 because <code>xgps</code>, as is proper for a test client, has as
 little policy as possible.</p>
 
+<h1 id='lockup'>My <code>gpsd</code> sometimes stops responding overnight</h1>
+
+<p>At one point in the development of <code>gpsd</code> we got a
+report of the daemon ceasing to respond to queries when run for
+more than a day or so; the user, quite reasonably, suspected some sort
+of resource leak in the daemon.  On the other hand, other users reported
+good operation over much longer periods with the same version of
+the software. That suggests a bug at the level of the user's operating 
+system or local site configuration.</p>
+
+<p>Nevertheless, the possibility of a resource-leak bug alarmed us
+enough that after 2.26 one of us (ESR) built an entire test framework
+for auditing the code's dynamic behavior and used it to apply <a
+href="http://valgrind.org">Valgrind</a>.  You can look at the
+resulting script, valgrind-audit, in the source distribution.  This
+turned up a couple of minor leaks, but nothing sufficient to explain
+the report.</p>
+
+<p>One of our senior developers, Rob Janssen, has seen
+<code>gpsd</code> interact badly with overnight backups, pushing the
+system load average through the roof.  He says: "when you copy many
+gigabytes of data from disk to disk, the [Linux] kernel's buffer
+management goes completely haywire. [...]  I think this is caused both
+by allocation of many buffers for reading files, and by accumulation
+of many dirty buffers that still have to be written.  At some point,
+programs like gpsd (but also all interactive programs and the X
+display manager) come to a complete standstill while the system is
+swapping like mad."</p>
+
+<p>If Rob's analysis is correct, <code>gpsd</code> is a canary in a
+coal mine.  If your <code>gpsd</code> locks up after a long period of
+operation, you should look at your logs and see if you can connect the
+point at which it stopped responding to some kind of resource crisis 
+brought on by lots of I/O activity.</p>
+
+<p>Another thing to try is running <code>gpsd</code> under Valgrind overnight 
+and seeing if it reports any leaks.</p>
+
 <h1 id='why_migrate'>Why this version of <code>gpsd</code>?</h1>
 
 <p>If you have written a <code>gpsd</code>-aware application using one