author     Eric S. Raymond <esr@thyrsus.com>  2005-06-27 18:31:53 +0000
committer  Eric S. Raymond <esr@thyrsus.com>  2005-06-27 18:31:53 +0000
commit     caedffbd5c387fef901b11a642f83f9a7afa1b5a
tree       0cbb7ce5aee95d1361c22291b93a15462cf08902
parent     e7d20f03c99f5d77d3be5411bab5fe56fe6cfb4f
Pre-release cleanup.
-rw-r--r--  TODO                52
-rw-r--r--  gpsd.c               2
-rw-r--r--  test/garmin17n.log   1
-rw-r--r--  www/faq.html        38
4 files changed, 82 insertions, 11 deletions
diff --git a/TODO b/TODO
index afd38794..6c3c1e8d 100644
--- a/TODO
+++ b/TODO
@@ -4,6 +4,48 @@ will unfold them again.
** Bugs:
+*** Under unknown conditions, a long-running xgps induces a memory leak in the X server
+
+Rob Janssen writes:
+>I have found something that leaks. Not in our software, but in the X server.
+>But caused by xgps.
+>
+>After 3 copies of xgps ran for a couple of days, I noticed a lot of swap
+>is in use:
+> total used free shared buffers cached
+>Mem: 1035776 992324 43452 0 174580 54756
+>-/+ buffers/cache: 762988 272788
+>Swap: 2104440 1444944 659496
+>
+>This is more than normal for my system. I noticed it when switching to a
+>virtual screen where mozilla is running. It took several seconds to
+>redraw.
+>
+>I did a "ps axu" and found these interesting lines:
+>
+>USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
+>root 7004 2.5 33.6 1861128 348460 ? SL Jun19 210:46 /usr/X11R6/bi
+>ntp 18011 0.0 0.2 2696 2692 ? SLs Jun19 0:00 /usr/sbin/ntp
+>nobody 23105 0.0 0.0 7004 900 ? S<sl Jun21 0:17 /home/rob/src
+>rob 28724 0.2 0.1 7084 1968 pts/18 S Jun22 9:03 /home/rob/src
+>rob 28744 0.3 0.1 7084 2040 pts/18 S Jun22 14:52 /home/rob/src
+>rob 28759 0.2 0.1 7084 1936 pts/18 S Jun22 9:44 /home/rob/src
+>
+>So the gps programs were not consuming that much memory.
+>I still stopped the 3 xgps programs (the last three in this list) and then
+>the X server showed:
+>root 7004 2.5 2.2 367140 23424 ? SL Jun19 210:49 /usr/X11R6/bi
+>
+>So there certainly is a relation here.
+>Unfortunately I have zero knowledge about X programming. I would guess
+>some kind of session or operation is started and never ended, or something
+>is said to be saved, but apparently it remains related to the specific
+>window because the X server neatly frees it once the program disconnects.
+
+But Rob's xgps memory leak doesn't reproduce on a stock Fedora Core 3
+system. ESR tested for this in the simplest possible way, by doing
+system("free t") at the end of each handle_input() call.
+
*** EPH and EPV reports are zeroed too often in the TSIP driver
There is some bad interaction between the policy code in
@@ -84,16 +126,6 @@ Rob recommends:
I have not yet reproduced this.
-*** Possible resource-leak bug, not yet reproduced or confirmed
-
-Wojciech Kazubski <wk@ire.pw.edu.pl> reports: when I connect to gpsd first
-time everything goes fine, and several clients can connect without
-problem but if the last client disconnects, the gpsd does not respond
-to any inquiry. It is living and accepting commands but responding
-with GPSD,P=? or so. And possibly after some time (few hours?) it
-stops responding but the process still looks active (running out of
-resources??).
-
** To do:
*** Track error computation
diff --git a/gpsd.c b/gpsd.c
index 24ea130e..d956b146 100644
--- a/gpsd.c
+++ b/gpsd.c
@@ -554,7 +554,7 @@ static int handle_request(int cfd, char *buf, int buflen)
phrase[strlen(phrase)-1] = '\0';
break;
case 'L':
- (void)snprintf(phrase, sizeof(phrase), ",L=2 " VERSION " abcdefiklmnpqrstuvwxy"); //ghj
+ (void)snprintf(phrase, sizeof(phrase), ",L=2 " VERSION " abcdefiklmnopqrstuvwxyz"); //ghj
break;
case 'M':
if (assign_channel(whoami)==0 && (!whoami->device || whoami->device->gpsdata.fix.mode == MODE_NOT_SEEN))
diff --git a/test/garmin17n.log b/test/garmin17n.log
index 80ed5bdc..49ca0d2a 100644
--- a/test/garmin17n.log
+++ b/test/garmin17n.log
@@ -4,6 +4,7 @@
# Pause-noted: Y
# Well-behaved: Y
# Submitted-by: Wojciech Kazubski <wk@ire.pw.edu.pl>
+# Comment: Only emits GPRMC when it has a fix.
# Date: 12 Mar 2005
$GPRMC,093802,A,5213.1439,N,02100.6511,E,000.0,226.0,160305,004.2,E,D*15
$GPGGA,093802,5213.1439,N,02100.6511,E,2,10,0.9,137.2,M,36.2,M,,*43
diff --git a/www/faq.html b/www/faq.html
index 6a3884d2..99a32582 100644
--- a/www/faq.html
+++ b/www/faq.html
@@ -124,6 +124,44 @@ gaps, to do policy. What you're seeing as a bug only looks like one
because <code>xgps</code>, as is proper for a test client, has as
little policy as possible.</p>
+<h1 id='lockup'>My <code>gpsd</code> sometimes stops responding overnight</h1>
+
+<p>At one point in the development of <code>gpsd</code> we got a
+report of the daemon ceasing to respond to queries when run for
+more than a day or so; the user, quite reasonably, suspected some sort
+of resource leak in the daemon. On the other hand, other users reported
+good operation over much longer periods with the same version of
+the software. That suggests a bug at the level of the user's operating
+system or local site configuration.</p>
+
+<p>Nevertheless, the possibility of a resource-leak bug alarmed us
+enough that after 2.26 one of us (ESR) built an entire test framework
+for auditing the code's dynamic behavior and used it to apply <a
+href="http://valgrind.org">Valgrind</a>. You can look at the
+resulting script, valgrind-audit, in the source distribution. This
+turned up a couple of minor leaks, but nothing sufficient to explain
+the report.</p>
+
+<p>One of our senior developers, Rob Janssen, has seen
+<code>gpsd</code> interact badly with overnight backups, pushing the
+system load average through the roof. He says: "when you copy many
+gigabytes of data from disk to disk, the [Linux] kernel's buffer
+management goes completely haywire. [...] I think this is caused both
+by allocation of many buffers for reading files, and by accumulation
+of many dirty buffers that still have to be written. At some point,
+programs like gpsd (but also all interactive programs and the X
+display manager) come to a complete standstill while the system is
+swapping like mad."</p>
+
+<p>If Rob's analysis is correct, <code>gpsd</code> is a canary in a
+coal mine. If your <code>gpsd</code> locks up after a long period of
+operation, you should look at your logs and see if you can connect the
+point at which it stopped responding to some kind of resource crisis
+brought on by lots of I/O activity.</p>
+
+<p>Another thing to try is running <code>gpsd</code> under Valgrind overnight
+and seeing if it reports any leaks.</p>
+
<h1 id='why_migrate'>Why this version of <code>gpsd</code>?</h1>
<p>If you have written a <code>gpsd</code>-aware application using one