author | Eric S. Raymond <esr@thyrsus.com> | 2005-06-27 18:31:53 +0000 |
---|---|---|
committer | Eric S. Raymond <esr@thyrsus.com> | 2005-06-27 18:31:53 +0000 |
commit | caedffbd5c387fef901b11a642f83f9a7afa1b5a | |
tree | 0cbb7ce5aee95d1361c22291b93a15462cf08902 | |
parent | e7d20f03c99f5d77d3be5411bab5fe56fe6cfb4f | |
Pre-release cleanup.
-rw-r--r-- | TODO | 52 |
-rw-r--r-- | gpsd.c | 2 |
-rw-r--r-- | test/garmin17n.log | 1 |
-rw-r--r-- | www/faq.html | 38 |
4 files changed, 82 insertions, 11 deletions
diff --git a/TODO b/TODO
--- a/TODO
+++ b/TODO
@@ -4,6 +4,48 @@ will unfold them again.

 ** Bugs:

+*** Under unknown conditions, a long-running xgps induces a memory leak in the X server
+
+Rob Janssen writes:
+>I have found something that leaks. Not in our software, but in the X server.
+>But caused by xgps.
+>
+>After 3 copies of xgps ran for a couple of days, I noticed a lot of swap
+>is in use:
+>             total       used       free     shared    buffers     cached
+>Mem:       1035776     992324      43452          0     174580      54756
+>-/+ buffers/cache:     762988     272788
+>Swap:      2104440    1444944     659496
+>
+>This is more than normal for my system. I noticed it when switching to a
+>virtual screen where mozilla is running. It took several seconds to
+>redraw.
+>
+>I did a "ps axu" and found these interesting lines:
+>
+>USER       PID %CPU %MEM     VSZ    RSS TTY      STAT START   TIME COMMAND
+>root      7004  2.5 33.6 1861128 348460 ?        SL   Jun19 210:46 /usr/X11R6/bi
+>ntp      18011  0.0  0.2    2696   2692 ?        SLs  Jun19   0:00 /usr/sbin/ntp
+>nobody   23105  0.0  0.0    7004    900 ?        S<sl Jun21   0:17 /home/rob/src
+>rob      28724  0.2  0.1    7084   1968 pts/18   S    Jun22   9:03 /home/rob/src
+>rob      28744  0.3  0.1    7084   2040 pts/18   S    Jun22  14:52 /home/rob/src
+>rob      28759  0.2  0.1    7084   1936 pts/18   S    Jun22   9:44 /home/rob/src
+>
+>So the gps programs were not consuming that much memory.
+>I still stopped the 3 xgps programs (the last three in this list) and then
+>the X server showed:
+>root      7004  2.5  2.2  367140  23424 ?        SL   Jun19 210:49 /usr/X11R6/bi
+>
+>So there certainly is a relation here.
+>Unfortunately I have zero knowledge about X programming. I would guess
+>some kind of session or operation is started and never ended, or something
+>is said to be saved, but apparently it remains related to the specific
+>window because the X server neatly frees it once the program disconnects.
+
+But Rob's xgps memory leak doesn't reproduce on a stock Fedora Core 3
+system. ESR tested for this in the simplest possible way, by doing
+system("free t") at the end of each handle_input() call.
+
 *** EPH and EPV reports are zeroed too often in the TSIP driver

 There is some bad interaction between the policy code in
@@ -84,16 +126,6 @@ Rob recommends:

 I have not yet reproduced this.

-*** Possible resource-leak bug, not yet reproduced or confirmed
-
-Wojciech Kazubski <wk@ire.pw.edu.pl> reports: when I connect to gpsd first
-time everything goes fine, and several clients can connect without
-problem but if the last client disconnects, the gpsd does not respond
-to any inquiry. It is living and accepting commands but responding
-with GPSD,P=? or so. And possibly after some time (few hours?) it
-stops responding but the process still looks active (running out of
-resources??).
-
 ** To do:

 *** Track error computation
diff --git a/gpsd.c b/gpsd.c
--- a/gpsd.c
+++ b/gpsd.c
@@ -554,7 +554,7 @@ static int handle_request(int cfd, char *buf, int buflen)
 		phrase[strlen(phrase)-1] = '\0';
 	    break;
 	case 'L':
-	    (void)snprintf(phrase, sizeof(phrase), ",L=2 " VERSION " abcdefiklmnpqrstuvwxy");	//ghj
+	    (void)snprintf(phrase, sizeof(phrase), ",L=2 " VERSION " abcdefiklmnopqrstuvwxyz");	//ghj
 	    break;
 	case 'M':
 	    if (assign_channel(whoami)==0 && (!whoami->device || whoami->device->gpsdata.fix.mode == MODE_NOT_SEEN))
diff --git a/test/garmin17n.log b/test/garmin17n.log
index 80ed5bdc..49ca0d2a 100644
--- a/test/garmin17n.log
+++ b/test/garmin17n.log
@@ -4,6 +4,7 @@
 # Pause-noted: Y
 # Well-behaved: Y
 # Submitted-by: Wojciech Kazubski <wk@ire.pw.edu.pl>
+# Comment: Only emits GPRMC when it has a fix.
 # Date: 12 Mar 2005
 $GPRMC,093802,A,5213.1439,N,02100.6511,E,000.0,226.0,160305,004.2,E,D*15
 $GPGGA,093802,5213.1439,N,02100.6511,E,2,10,0.9,137.2,M,36.2,M,,*43
diff --git a/www/faq.html b/www/faq.html
index 6a3884d2..99a32582 100644
--- a/www/faq.html
+++ b/www/faq.html
@@ -124,6 +124,44 @@ gaps, to do policy.
 What you're seeing as a bug only looks like one because <code>xgps</code>, as
 is proper for a test client, has as little policy as possible.</p>

+<h1 id='lockup'>My <code>gpsd</code> sometimes stops responding overnight</h1>
+
+<p>At one point in the development of <code>gpsd</code> we got a
+report of the daemon ceasing to respond to queries when run for
+more than a day or so; the user, quite reasonably, suspected some sort
+of resource leak in the daemon. On the other hand, other users reported
+good operation over much longer periods with the same version of
+the software. That suggests a bug at the level of the user's operating
+system or local site configuration.</p>
+
+<p>Nevertheless, the possibility of a resource-leak bug alarmed us
+enough that after 2.26 one of us (ESR) built an entire test framework
+for auditing the code's dynamic behavior and used it to apply <a
+href="http://valgrind.org">Valgrind</a>. You can look at the
+resulting script, valgrind-audit, in the source distribution. This
+turned up a couple of minor leaks, but nothing sufficient to explain
+the report.</p>
+
+<p>One of our senior developers, Rob Janssen, has seen
+<code>gpsd</code> interact badly with overnight backups, pushing the
+system load average through the roof. He says: "when you copy many
+gigabytes of data from disk to disk, the [Linux] kernel's buffer
+management goes completely haywire. [...] I think this is caused both
+by allocation of many buffers for reading files, and by accumulation
+of many dirty buffers that still have to be written. At some point,
+programs like gpsd (but also all interactive programs and the X
+display manager) come to a complete standstill while the system is
+swapping like mad."</p></p>
+
+<p>If Rob's analysis is correct, <code>gpsd</code> is a canary in a
+coal mine. If your <code>gpsd</code> locks up after a long period of
+operation, you should look at your logs and see if you can connect the
+point at which it stopped responding to some kind of resource crisis
+brought on by lots of I/O activity.</p>
+
+<p>Another thing to try is running <code>gpsd</code> under Valgrind overnight
+and seeing if it reports any leaks.</p>
+
 <h1 id='why_migrate'>Why this version of <code>gpsd</code>?</h1>

 <p>If you have written a <code>gpsd</code>-aware application using one