Contents

If you're looking for things to hack on, first see the TODO file in the source distribution.

  1. Goals and philosophy of the project
  2. Audience and supported hardware
    1. The time and location service
    2. The testing and tuning tools
    3. The upgrading tools
    4. The GPS/GNSS monitoring tools
  3. Contribution guidelines
    1. Send patches in diff -u or -c format
    2. The license on contributions
    3. Don't add invocation options!
    4. Don't use malloc!
  4. Understanding the code
    1. Debugging
    2. Profiling
    3. Porting to weird machines: endianness, width, and signedness issues.
    4. Architecture and how to hack it
    5. Autoconfiguration
    6. Error modeling
  5. Known trouble spots
    1. The Y2.1K problem and other calendar issues
    2. Hotplug interface problems
    3. Security Issues
  6. Adding new GPS types
    1. Driver architecture
    2. When not to add a driver
    3. Initializing time and date
    4. How drivers are invoked
    5. Where to put the data you get from the GPS
    6. Report errors with a 95% confidence interval
    7. Log files for regression testing
    8. Throughput computation for baud rate changes
  7. The buffering problem
    1. Mapping the design space
      1. Report then clear per packet
      2. Buffer all, report then clear on trigger
      3. Buffer all, report on every packet, clear at start-of-cycle
      4. Buffer all, report on every packet, never clear data
      5. Buffer all, report on every packet, time out old data
    2. There is no perfect option
    3. Combining buffering with interpolation: a speculative design
  8. Designing Ahead
    1. Non-PVT Data
    2. Design Sketch for Version 4 Protocol
  9. Blind alleys
    1. Reporting fix data only once per cycle
    2. Allowing clients to ship arbitrary control strings to a GPS
    3. Using libusb to do USB device discovery
    4. Setting FIFO threshold to 1 to reduce jitter in serial-message times
    5. Stop using a compiled-in UTC-TAI offset
    6. Subsecond polling

Goals and philosophy of the project

If the GPSD project ever needs a slogan, it will be "Hide the ugliness!" GPS technology works, but is baroque, ugly and poorly documented. Our job is to know the grotty details so nobody else has to.

Audience and supported hardware

Our paradigm user, the one we have in mind when making design choices, is running navigational or wardriving software on a Linux laptop or PDA. Some of our developers are actively interested in supporting GPS-with-SBC (Single-Board-Computer) hardware such as is used in balloon telemetry, marine navigation, and aviation.

These two use cases have similar issues in areas like client interfaces and power/duty-cycle management. The one place where they differ substantially is that in the SBC case we generally know in advance what devices will be connected and when. Thus, by designing for the less predictable laptop/PDA environment, we cover both. But it is not by accident that the source code can be built with support for any single GPS type alone compiled in.

While we will support survey-grade GPSes when and if we have that hardware for testing, our focus will probably remain on inexpensive and readily-available consumer-grade GPS hardware, especially GPS mice.

The time and location service

The primary aim of the GPSD project is to support a simple time-and-location service for users and their geographically-aware applications.

A GPS is a device for delivering fourteen numbers: x, y, z, t, vx, vy, vz, and error estimates for each of these seven coordinates. The gpsd daemon's job is to deliver these numbers to user applications with minimum fuss.
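
For illustration, here is that tuple rendered as a C structure. This is a sketch only; gpsd's real fix structure lives in gps.h and is shaped differently:

  /* The fourteen numbers, schematically.  Illustrative only; gpsd's
   * actual fix structure is defined in gps.h. */
  struct pvt_fix {
      double x, y, z, t;          /* position and time */
      double vx, vy, vz;          /* velocity components */
      double ex, ey, ez, et;      /* 95%-confidence error estimates */
      double evx, evy, evz;       /* ...and for the velocities */
  };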

'Minimum fuss' means that the only action the user should have to take to enable location service is to plug in a GPS. The gpsd daemon, and its associated hotplug scripts or local equivalent, is responsible for automatically configuring itself. That includes autobauding, handshaking with the device, determining the correct mode or protocol to use, and issuing any device-specific initializations required.

Features (such as GPS type or serial-parameter switches) that would require the user to perform administrative actions to enable location service will be rejected. GPSes that cannot be autoconfigured will not be supported. 99% of the GPS hardware on the market in 2005 is autoconfigurable, and the design direction of GPS chipsets is such that this percentage will rise rather than fall; we deliberately choose simplicity of interface and zero administration over 100% coverage.

Here is a concrete example of how this principle applies. At least one very low-end GPS chipset does not deliver correct checksums on the packets it ships to a host unless it has a fix. Typically, GPSes do not have a fix when they are plugged in, at the time gpsd must recognize and autoconfigure the device. Thus, supporting this chipset would require that we either (a) disable packet integrity checking in the autoconfiguration code, making detection of other, better-behaved devices unreliable, or (b) add an invocation switch to disable packet integrity checking for that chipset alone. We refuse to do either, and do not support this chipset.

The testing and tuning tools

Another principal goal of the GPSD software is that it be able to demonstrate its own correctness, give technical users good tools for measuring GPS accuracy and diagnosing GPS idiosyncrasies, and provide a test framework for gpsd-using applications.

Accordingly, we support the gpsfake tool that simulates a GPS using recorded or synthetic log data. We support gpsprof, which collects accuracy and latency statistics on GPSes and the GPS+gpsd combination. And we include a comprehensive regression-test suite with the package. These tools are not accidents; they are essential to ensure that the basic GPS-monitoring code is not merely correct but demonstrably correct.

We support sirfmon, a low-level packet monitor and diagnostic tool for the SiRF chipset, which has 80% market share. sirfmon is capable of tuning some device-specific control settings such as the SiRF static-navigation mode. A future direction of the project is to support diagnostic monitoring and tuning for other chipsets.

The upgrading tools

A secondary aim of the GPSD project is to support GPS firmware upgrades under open-source operating systems, freeing GPS users from reliance on closed-source vendor tools and closed-source operating systems.

We have made a first step in that direction with the initial pre-alpha version of gpsflash, currently supporting SiRF chips only. A future direction of the project is to have gpsflash support firmware upgrades for other chipsets.

The GPS/GNSS monitoring tools

Another secondary goal of the project is to provide open-source tools for diagnostic monitoring and accuracy profiling not just of individual GPSes but of the GPS/GNSS network itself. The protocols (such as IS-GPS-200 for satellite downlink and RTCM104 for differential-GPS corrections) are notoriously poorly documented, and open-source tools for interpreting them have in the past been hard to find and only sporadically maintained.

We aim to remedy this. Our design goal is to provide lossless translators between these protocols and readable, documented text-stream formats.

We currently provide a tool for decoding RTCM104 reports on satellite health, almanacs, and pseudorange information from differential-GPS radios and reference stations. A future direction of the project is to support an RTCM104 encoder.

Contribution guidelines

Send patches in diff -u or -c format

We prefer diff -u format, but diff -c is acceptable. Do not send patches in the default (-e or ed) mode, as they are too brittle.

Before shipping a patch, you should go through the following checklist:

  1. If you are introducing a new feature or driver, include a documentation patch.
  2. Use the regression-test suite — "make testregress" — to check that your patch doesn't break the handling of any already-supported GPS.
  3. If you have valgrind(1) on your development system, run valgrind-audit and look out for reports of memory leaks and other dynamic-allocation problems (see the Valgrind website for a description of this tool if you don't already know about it). If you can't run valgrind, tell us that you couldn't do it.
  4. If you have splint(1) on your development system, make sure the patched code displays no warnings when you run 'make splint' (see the Splint website for further description of this tool if you don't already know about it). If you can't run splint, tell us that you couldn't do it.

If you are contributing a driver for a new GPS, please also do the following things:

  1. Send us a representative sample of the GPS output for future regression-testing.
  2. Write a hardware entry describing the GPS for the hardware page.
  3. Read the section on adding new drivers later in this document.

The license on contributions

The GPSD libraries are under the BSD license. Please do not send contributions with GPL attached!

The reason for this policy is to avoid making people nervous about linking the GPSD libraries to applications that may be under other licenses (such as MIT, BSD, AFL, etc.).

Don't add invocation options!

If you send a patch that adds a command-line option to the daemon, it will almost certainly be refused. Ditto for any patch that requires gpsd to parse a dotfile.

One of the major objectives of this project is for gpsd not to require administration — under Linux, at least. It autobauds, it does protocol discovery, and it's activated by the hotplug system. Arranging these things involved quite a lot of work, and we're not willing to lose the zero-configuration property that work gained us.

Instead of adding a command-line option to support whatever feature you had in mind, try to figure out a way that the feature can autoconfigure itself by doing runtime checks. If you're not clever enough to manage that, consider whether your feature control might be implemented with an extension to the gpsd protocol or the control-socket command set.

Here are three specific reasons command-line switches are evil:

(1) Command-line switches are often a lazy programmer's way out of writing correct adaptive logic. This is why we keep rejecting requests for a baud-rate switch and a GPS type switch — the right thing is to make the packet-sniffer work better, and if we relented in our opposition the pressure to get that right would disappear. Suddenly we'd be back to end-users having to fiddle with settings the software ought to figure out for itself, which is unacceptable.

(2) Command-line switches without corresponding protocol commands pin the daemon's behavior for its entire lifespan. Why should the user have to fix a policy at startup time and never get to change his/her mind afterwards? Stupid design...

(3) The command-line switches used for a normal gpsd startup can only be changed by modifying the hotplug script. Requiring end-users to modify hotplug scripts (or anything else in admin space) is a crash landing.

Don't use malloc!

The best way to avoid having dynamic-memory allocation problems is not to use malloc/free at all. The gpsd daemon doesn't (though the client-side code does). Thus, even the longest-running instance can't have memory leaks. The only cost for this turned out to be embedding a PATH_MAX-sized buffer in the gpsd.h structure.

Don't undo this by using malloc/free in a driver or anywhere else.
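
Schematically, the no-malloc pattern looks like this (illustrative declarations, not actual gpsd code):

  #include <limits.h>      /* for PATH_MAX */

  struct device_state {
      /* char *path;          would require malloc/free, and could leak */
      char path[PATH_MAX];   /* embedded buffer: no allocation, no leak */
  };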

Understanding the code

Debugging

For debugging purposes, it may be helpful to configure with --disable-shared. This turns off all the shared-library crud, making it somewhat easier to use gdb.

There is a script called logextract in the distribution that you can use to strip clean NMEA out of the log files produced by gpsd. This can be useful if someone ships you a log that they allege caused gpsd to misbehave.

gpsfake enables you to repeatedly feed a packet sequence to a gpsd instance running as non-root. Watching such a session with gdb should smoke out any repeatable bug pretty quickly.

The parsing of GPGSV sentences in the NMEA driver has been a persistent and nasty trouble spot, causing more buffer overruns and weird secondary damage than all the rest of the code put together. Any time you get a bug report that seems to afflict NMEA devices only, suspicion should focus here.

Profiling

There is a barely-documented Z command in the daemon that will cause it to emit a $ clause on every request. The $ clause contains eight space-separated fields:

  1. An identifying sentence tag.
  2. The character length of the sentence containing the timestamp data.
  3. The timestamp associated with the sentence, in seconds since the Unix epoch (this time is leap-second corrected, like UTC). This timestamp may be zero. If nonzero, it is the base time for the packet.
  4. An offset from the timestamp telling when gpsd believes the transmission of the current packet started (this is actually recorded just before the first read of the new packet). If the sentence timestamp was zero, this offset is a full timestamp and the base time of the packet.
  5. An offset from the base time telling when gpsd received the last bytes of the packet.
  6. An offset from the base time telling when gpsd decoded the data.
  7. An offset from the base time taken just before encoding the response — effectively, when gpsd was polled to transmit the data.
  8. An offset from the base time telling when gpsd transmitted the data.

The Z figures measure components of the latency between the GPS's time measurement and when the sentence data became available to the client. For them to be meaningful, the GPS has to ship timestamps with sub-second precision. SiRF-II and Evermore chipsets ship times with millisecond resolution. Your machine's time reference must also be accurate to subsecond precision; I recommend using ntpd, which will normally give you about 15 microseconds of precision (two orders of magnitude better than GPSes report).

Note that some inaccuracy is introduced into the start- and end-of-packet timestamps by the fact that the last read of a packet may grab a few bytes of the next one.

The distribution includes a Python script, gpsprof, that uses the Z command to collect profiling information from a running gpsd instance. You can use this to measure the latency at each stage — GPS to daemon, daemon to client library — and to estimate the portion of the latency induced by serial transmit time. The gpsprof script creates latency plots using gnuplot(1). It can also report the raw data.

Porting to weird machines: endianness, width, and signedness issues.

The gpsd code is well-tested on 32- and 64-bit IA chips, and also on PPCs. Thus, it's known to work on mainstream chips of either 32 or 64 bits and either big-endian or little-endian representation with IEEE754 floating point.

Handling of NMEA devices should not be sensitive to the machine's internal numeric representations. However, because the binary-protocol drivers have to mine bytes out of the incoming packets and mung them into fixed-width integer quantities, there could potentially be issues on weird machines. The regression test should spot these.

If you are porting to a true 16-bit machine, or something else with an unusual set of data type widths, take a look at bits.h. We've tried to collect all the architecture dependencies there. If splint gives you warnings, it is possible you may need to adjust the -D directives in .splintrc that are used to define away fixed-width typedefs.

(No, we don't know why splint doesn't handle these natively.)
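
The byte-mining itself is done portably by assembling integers one byte at a time instead of casting pointers into the packet buffer; that way host endianness and alignment never matter. A sketch of the technique (the macros in bits.h are in this spirit, though the name here is invented):

  #include <stdint.h>

  /* Extract a signed 16-bit big-endian field starting at buf[off].
   * Byte-at-a-time assembly behaves identically on big- and
   * little-endian hosts and avoids unaligned-access traps. */
  static int16_t get_bes16(const unsigned char *buf, int off)
  {
      return (int16_t)((buf[off] << 8) | buf[off + 1]);
  }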

Architecture and how to hack it

gpsd is not a complicated piece of code. Essentially, it spins in a loop polling for input from one of three sources:

  1. A client making requests over a TCP/IP port.
  2. A set of GPSes, connected via serial or USB devices.
  3. A DGPS server issuing periodic differential-GPS updates.

The daemon only connects to a GPS when clients are connected to it. Otherwise all GPS devices are closed and the daemon is quiescent, but retains fix and timestamp data from the last active period.

All writes to client sockets go through throttled_write(). This code addresses two cases: first, the client has dropped the connection; second, the client is connected but not picking up data, so our buffers are backing up. If we let this continue, the write buffers will fill and the effect will be denial of service to clients that are better behaved.

Our strategy is brutally simple and takes advantage of the fact that GPS data has a short shelf life. If the client doesn't pick it up within a few minutes, it's probably not useful to that client. So if data is backing up to a client, drop that client. That's why we set the client socket to nonblocking.
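
A minimal sketch of the idea (not the actual throttled_write() code; the surrounding client bookkeeping is elided):

  #include <errno.h>
  #include <unistd.h>

  /* Client sockets are made nonblocking at accept time, so a client
   * that stops reading makes write() fail with EAGAIN instead of
   * stalling the daemon. */
  static ssize_t client_write(int fd, const char *buf, size_t len)
  {
      ssize_t status = write(fd, buf, len);
      if (status == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
          /* The client's buffers are backing up; the data is stale
           * anyway, so drop the client. */
          close(fd);   /* real code would also detach the client struct */
          return -1;
      }
      return status;
  }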

GPS input updates an internal data structure which has slots in it for all the data you can get from a GPS. Client commands mine that structure and ship reports up the socket to the client. DGPS data is passed through, raw, to the GPS.

The trickiest part of the code is the handling of input sources in gpsd.c itself. It has to tolerate clients connecting and disconnecting at random times, and the GPS being unplugged and replugged, without leaking file descriptors; it must also arrange for the GPS to be powered up when and only when clients are active.

Autoconfiguration

One of the design goals for gpsd is to be as near zero-configuration as possible. Under most circumstances, it requires neither the GPS type nor the serial-line parameters of the device connected to it to be specified. Presently, here's how the autoconfig works.

  1. At each baud rate, gpsd grabs packets until it sees a well-formed and checksum-verified NMEA packet, a well-formed and checksum-verified packet of one of the binary protocols, or one of the two special trigger strings EARTHA or ASTRAL, or until it fills a long buffer with garbage (in which case it steps to the next baud rate).
  2. If it finds a SiRF packet, it queries the chip for firmware version. If the version is < 231.000 it drops back to SiRF NMEA. We're done.
  3. If it finds a Zodiac binary packet (led with 0xff 0x81), it switches to the Zodiac driver. We're done.
  4. If it finds an Evermore binary packet (led with DLE=0x10 followed by STX=0x02) it switches to Evermore binary protocol. We're done.
  5. If it finds a TSIP binary packet (led with 0x10=DLE), it switches to the TSIP driver. We're done.
  6. If it finds an iTrax binary packet (led with <! ), it switches to the iTrax driver. We're done.
  7. If it finds EARTHA, it selects the Earthmate driver, which then flips the connection to Zodiac binary mode. We're done.
  8. If it finds ASTRAL, it feeds the TripMate on the other end what it wants and goes to TripMate NMEA mode. We're done.
  9. If it finds an NMEA packet, it selects the NMEA driver. This initializes by shipping all vendor-specific initialization strings to the device. The objectives are to enable GSA, disable GLL, and disable VTG. Probe strings go here too, like the one that turns on SiRF debugging output in order to detect SiRF chips.
  10. Now gpsd reads NMEA packets. If it sees a driver trigger string it invokes the matching driver. Presently there is really only one of these: "$Ack Input 105.\r\n", the response to the SiRF probe. On seeing this, gpsd switches from NMEA to SiRF binary mode, probes for firmware version, and either stays in binary or drops back to SiRF NMEA.

The outcome is that we know exactly what we're looking at, without any driver-type or baud-rate options. The hunt loop is sketched below.
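
In outline, that hunt looks something like the following sketch. It is illustrative only; set_speed() and packet_get() stand in for the real termios code and the packet sniffer in packet.c, and MAX_TRIES is an invented garbage-tolerance limit:

  #define MAX_TRIES 10    /* invented garbage-tolerance limit */

  typedef enum { GARBAGE, NMEA, SIRF, ZODIAC, EVERMORE,
                 TSIP, ITRAX, EARTHA, ASTRAL } sniff_t;

  extern void set_speed(int fd, int rate);    /* assumed helper */
  extern sniff_t packet_get(int fd);          /* assumed helper */

  sniff_t hunt(int fd)
  {
      static const int rates[] = {4800, 9600, 19200, 38400, 57600};
      int i, tries;

      for (i = 0; i < (int)(sizeof(rates) / sizeof(rates[0])); i++) {
          set_speed(fd, rates[i]);
          for (tries = 0; tries < MAX_TRIES; tries++) {
              sniff_t seen = packet_get(fd);
              if (seen != GARBAGE)
                  return seen;    /* dispatch to the matching driver */
          }
          /* nothing but garbage at this speed: try the next one */
      }
      return GARBAGE;
  }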

Error modeling

To estimate errors (which we must do if the GPS isn't nice and reports them in meters with a documented confidence interval), we need to multiply an estimate of User Equivalent Range Error (UERE) by the appropriate dilution factor.

The UERE estimate is usually computed as the square root of the sum of the squares of individual error estimates from a physical model. The following is a representative physical error model for satellite range measurements:

From R. B. Langley (1997), "The GPS error budget", GPS World, Vol. 8, No. 3, pp. 51-56:

Atmospheric error — ionosphere 7.0m
Atmospheric error — troposphere 0.7m
Clock and ephemeris error 3.6m
Receiver noise 1.5m
Multipath effect 1.2m

From Hofmann-Wellenhof et al. (1997), "GPS: Theory and Practice", 4th Ed., Springer:

Code range noise (C/A) 0.3m
Code range noise (P-code) 0.03m
Phase range 0.005m

We're assuming these are 2-sigma error ranges. This needs to be checked in the sources. If they're 1-sigma the resulting UEREs need to be doubled.

See the table of confidence-interval conversion factors in the section on reporting errors with a 95% confidence interval, below.

Carl Carter of SiRF says: "Ionospheric error is typically corrected for, at least in large part, by receivers applying the Klobuchar model using data supplied in the navigation message (subframe 4, page 18, Ionospheric and UTC data). As a result, its effect is closer to that of the troposphere, amounting to the residual between real error and corrections."

"Multipath effect is dramatically variable, ranging from near 0 in good conditions (for example, our roof-mounted antenna with few if any multipath sources within any reasonable range) to hundreds of meters in tough conditions like urban canyons. Picking a number to use for that is, at any instant, a guess."

"Using Hoffman-Wellenhoff is fine, but you can't use all 3 values. You need to use one at a time, depending on what you are using for range measurements. For example, our receiver only uses the C/A code, never the P code, so the 0.03 value does not apply. But once we lock onto the carrier phase, we gradually apply that as a smoothing on our C/A code, so we gradually shift from pure C/A code to nearly pure carrier phase. Rather than applying both C/A and carrier phase, you need to determine how long we have been using the carrier smoothing and use a blend of the two."

On Carl's advice we would apply tropospheric error twice, and use the largest Hofmann-Wellenhof figure:

UERE = sqrt(0.7^2 + 0.7^2 + 3.6^2 + 1.5^2 + 1.2^2 + 0.3^2) = 4.2

DGPS corrects for atmospheric distortion, ephemeris error, and satellite/ receiver clock error. Thus:

UERE = sqrt(1.5^2 + 1.2^2 + 0.3^2) = 1.9

which we round up to 2 (95% confidence).

Due to multipath uncertainty, Carl says even this figure is too low and recommends a non-DGPS UERE estimate of 8 (95% confidence). That's what we use.
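
Given a UERE figure, turning a dilution-of-precision value into an error estimate in meters is a single multiplication. A sketch, using the figures above (the constant and function names are invented; gpsd keeps its own in gpsd.h):

  #define UERE_NO_DGPS 8.0    /* meters at 95% confidence, per Carl */
  #define UERE_DGPS    2.0    /* meters at 95% confidence */

  /* Estimated error in meters: dilution factor from the satellite
   * geometry (e.g. hdop for horizontal error) times UERE. */
  double estimate_error(double dop, int dgps_in_use)
  {
      return dop * (dgps_in_use ? UERE_DGPS : UERE_NO_DGPS);
  }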

Known trouble spots

The Y2.1K problem and other calendar issues

Because of limitations in various GPS protocols (e.g., they were designed by fools who weren't looking past the ends of their noses) this code unavoidably includes some assumptions that will turn around and bite on various future dates.

The two specific problems are:

  1. NMEA delivers only two-digit years.
  2. SiRF chips at firmware level 231 deliver only GPS time in binary mode, not leap-second-corrected UTC.

See the timebase.h file for various constants that will need to be tweaked occasionally to cope with these problems.

Note that gpsd does not rely on the system clock in any way. This is so you can use it to set the system clock.

Hotplug interface problems

The hotplug interface works pretty nicely for telling gpsd which device to look at, at least on my FC3/FC4/FC5 Linux machines. The fly in the ointment is that we're using a deprecated version of the interface, the old-style /etc/hotplug version with usermap files.

It is unlikely this interface will be dropped by distro makers any time soon, because it's supporting a bunch of popular USB cameras. Still, it would be nice not to be using a deprecated interface.

There is experimental udev support in the distribution now. Someday this will replace the hotplug stuff.

A more general problem: the hotplug code we have is Linux-specific. OpenBSD (at least) features a hotplug daemon with similar capabilities. We ought to do the right thing there as well.

Security Issues

Between versions 2.16 and 2.20, hotplugging was handled in the most obvious way, by allowing the F command to declare new GPS devices for gpsd to look at. Because gpsd runs as root, this had problems:

  1. A malicious client with non-root access on the host could use F to point gpsd at a spoof GPS that was actually a pty feeding bogus location data.
  2. A malicious client could use repeated probes of a target tty or other device to cause data loss to other users. This is a potential remote exploit! Not too bad if the stolen bytes are from your mouse; it would just get jumpy and jerky. But suppose they're from an actual tty, and sections drop out of a serial data stream you were relying on?

The conclusion was inescapable. Switching among and probing devices that gpsd already knows about can be an unprivileged operation, but editing gpsd's device list must be privileged. Hotplug scripts should be able to do it, but ordinary clients should not.

Adding an authentication mechanism was considered and rejected (can you say "can of big wriggly worms"?). Instead, there is a separate control channel for the daemon, only locally accessible, only recognizing "add device" and "remove device" commands.

The channel is a Unix-domain socket owned by root, so it has file-system protection bits. An intruder would need root permissions to get at it, in which case you'd have much bigger problems than a spoofed GPS.

More generally, gpsd certainly needs to treat command input as untrusted, and for safety's sake should treat GPS data as untrusted too (in particular, this means never assuming that either source won't try to overflow a buffer).

Daemon versions after 2.21 drop privileges after startup, setting UID to "nobody" and GID to whichever group owns the GPS device specified at startup time — or, if it doesn't exist, the system's lowest-numbered TTY device named in PROTO_TTY. It may be necessary to change PROTO_TTY in gpsd.c for non-Linux systems.

Adding new GPS types

This section explains the conventions drivers for new devices should follow.

Driver architecture

Internally, gpsd supports multiple GPS types. All are represented by driver method tables; the main loop knows nothing about the driver methods except when to call them. At any given time one driver is active; by default it's the NMEA one.

To add a new device, populate another driver structure and add it to the null-terminated array in drivers.c.

Unless your driver is a nearly trivial variant on an existing one, it should live in its own C source file named after the driver type. Add it to the libgps_c_sources name list in Makefile.am.

The easiest way to write a driver is probably to copy the driver_proto.c file in the source distribution, change names appropriately, and write the guts of the analyzer and writer functions. Look in gpsutils.c before you do; driver helper functions live there. Also read some existing drivers for clues.

When not to add a driver

It is not necessary to add a driver just because your NMEA GPS wants some funky initialization string. Simply ship the string in the initializer for the default NMEA driver. Because vendor control strings live in vendor-specific namespaces (PSRF for SiRF, PGRM for Garmin, etc.) your initializing control string will almost certainly be ignored by anything not specifically watching for it.

Initializing time and date

Some mode-changing commands have a time field that initializes the GPS clock. If the designers were smart, they included a control bit that lets the GPS retain its clock value (and previous fix, if any) so you can leave those fields empty (sometimes this is called "hot start").

If the GPS-Week/TOW fields are required, as on the Evermore chip, don't just zero them. GPSes do eventually converge on the correct time when they've exchanged handshakes with enough satellites, but the time required for convergence is proportional to how far off the initial value is. So make a point of getting the GPS week right.
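
A sketch of deriving sane initialization values from the host clock. The GPS epoch began at 1980-01-06T00:00:00 UTC (Unix time 315964800), and GPS time runs ahead of UTC by the accumulated leap seconds, so the offset below is an assumption of the kind that belongs in timebase.h:

  #include <time.h>

  #define GPS_EPOCH     315964800  /* Unix time of 1980-01-06T00:00:00Z */
  #define SECS_PER_WEEK 604800
  #define LEAP_SECONDS  14         /* assumed UTC-GPS offset; see timebase.h */

  /* Compute GPS week and time-of-week from the system clock; close
   * enough to warm-start a receiver that requires these fields. */
  void gps_week_tow(int *week, int *tow)
  {
      time_t gps_now = time(NULL) - GPS_EPOCH + LEAP_SECONDS;
      *week = (int)(gps_now / SECS_PER_WEEK);
      *tow  = (int)(gps_now % SECS_PER_WEEK);
  }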

How drivers are invoked

Drivers are invoked in one of three ways: (1) when the NMEA driver notices a trigger string associated with another driver, (2) when the packet state machine in packet.c recognizes a special packet type, or (3) when a probe function returns true during device open.

Each driver may have a trigger string that the NMEA interpreter watches for. When that string is recognized at the start of a line, the interpreter switches to its driver.

When a driver switch takes place, the old driver's wrapup method is called. Then the new driver's initializer method is called.

A good thing to send from the NMEA initializer is probe strings. These are strings which should elicit an identifying response from the GPS that you can use as a trigger string for a native-mode driver.

Don't worry about probe strings messing up GPSes they aren't meant for. In general, all GPSes have rather rigidly defined packet formats with checksums. Thus, for this probe to look legal in a different binary command set, not only would the prefix and any suffix characters have to match, but the checksum algorithm would have to be identical.

Incoming characters from the GPS device are gathered into packets by an elaborate state machine in packet.c. The purpose of this state machine is to let gpsd autobaud and recognize GPS types automatically. The other way for a driver to be invoked is for the state machine to recognize a special packet type associated with the driver.

If you have to add a new packet type to packet.c, add tests for the type to the TESTMAIN code. Also, remember to tell gpsfake how to gather the new packet type so it can handle logs for regression testing. The relevant function in gpsfake is packet_get(). It doesn't have to deal with garbage or verify checksums, as we assume the logfiles will be clean packet sequences.

Probe functions are intended for drivers that don't use the packet getter because they read from a device with special kernel support. See the Garmin binary driver for an example.

Where to put the data you get from the GPS

Your driver should put new data from each incoming packet or sentence in the 'newdata' member of the GPS, and return a validity flag mask telling which members were updated (all float members are initially set to not-a-number). There is driver-independent code responsible for merging that new data into the existing fix. To assist this, the CYCLE_START_SET flag is special. Set it when the driver returns the first timestamped message containing fix data in an update cycle. (This excludes satellite-picture messages and messages about GPS status that don't contain fix data.)

Your packet parser must return field-validity mask bits (using the _SET macros in gps.h), suitable to be put in session->gpsdata.valid. The watcher-mode logic relies on these as its way of knowing what to publish. Also, you must ensure that gpsdata.fix.mode is set properly to indicate fix validity after each message; the framework code relies on this. Finally, you must set gpsdata.status to indicate when DGPS fixes are available, whether through RTCM or WAAS/EGNOS.
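
Schematically, the tail of a packet parser looks like this sketch (the _SET flags and mode value are the ones gps.h describes; the decoded_* locals and the cycle test are invented, and the real decode logic is elided):

  /* Tail of a hypothetical driver's packet parser. */
  gps_mask_t mask = 0;

  session->gpsdata.newdata.latitude  = decoded_lat;
  session->gpsdata.newdata.longitude = decoded_lon;
  mask |= LATLON_SET;

  session->gpsdata.newdata.altitude = decoded_alt;
  mask |= ALTITUDE_SET;

  session->gpsdata.fix.mode = MODE_3D;   /* always report fix validity */
  mask |= MODE_SET;

  if (first_fix_message_of_cycle)        /* see CYCLE_START_SET above */
      mask |= CYCLE_START_SET;

  return mask;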

Your packet parser is also responsible for setting the tag field in the gps_data_t structure. This is the string that will be emitted as the first field of each $ record for profiling. The packet getter will set the sentence-length for you; it will be raw byte length, including both payload and header/trailer bytes.

Note also that all the timestamps your driver puts in the session structure should be UTC (with leap-second corrections), not just Unix seconds since the epoch. The report-generator function for D does not apply a timezone offset.

Report errors with a 95% confidence interval

gpsd drivers are expected to report position error estimates with a 95% confidence interval. A few devices (Garmins and Zodiacs) actually report error estimates. For the rest we have to compute them using an error model.

Here's a table that explains how to convert from various confidence interval units you might see in vendor documentation.

sqrt(alpha)  Probability  Notation
1.00         39.4%        1-sigma or standard ellipse
1.18         50.0%        Circular Error Probable (CEP)
1.414        63.2%        Distance RMS (DRMS)
2.00         86.5%        2-sigma ellipse
2.45         95.0%        95% confidence level
2.818        98.2%        2DRMS
3.00         98.9%        3-sigma ellipse

There are constants in gpsd.h for these factors.
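
For example, a vendor figure quoted as CEP (50%) rescales to the 95% confidence level by dividing out the CEP factor and multiplying by the 95% one. A sketch (factors from the table above; gpsd.h has its own names for them):

  #define CEP_FACTOR 1.18    /* 50% Circular Error Probable */
  #define P95_FACTOR 2.45    /* 95% confidence level */

  /* Convert a vendor CEP(50%) error figure in meters to 95% confidence. */
  double cep_to_95(double cep_meters)
  {
      return cep_meters * (P95_FACTOR / CEP_FACTOR);
  }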

Log files for regression testing

Any time you add support for a new GPS type, you should also send us a representative log for your GPS. This will help ensure that support for your device is never broken in any gpsd release, because we will run the full regression before we ship.

The correct format for a capture file is described in the FAQ entry on reporting bugs.

See the header comment of the gpsfake.py module for more about the logfile format.

An ideal log file for regression testing would include an initial portion during which the GPS has no fix, a portion during which it has a fix but is stationary, and a portion during which it is moving.

Throughput computation for baud rate changes

At low baud rates it is possible to try to push more characters of NMEA through per cycle than the time to transmit will allow. Here are the maxima to use for computation:

GLL 51
GGA 82
VTG 40
RMC 75
GSA 67
GSV 60 (per line, thus 180 for a set of 3)
ZDA 34

The transmit time for a cycle (which must be less than 1 second) is the total character count multiplied by 10 and divided by the baud rate.

A typical budget is GGA, RMC, GSA, 3*GSV = 82+75+67+(3*60) = 404.

When you write a driver that includes the capability to change sampling rates, you must fill in the cycle_chars member with a maximum character length so the daemon framework code will be able to compute when a sample-rate change will work. If you have to estimate this number, err on the high side.
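
The computation itself is trivial; a sketch:

  /* Transmit time in seconds for one report cycle.  Each character
   * costs 10 bit times: one start bit, 8 data bits, one stop bit. */
  double cycle_transmit_time(int total_chars, int baudrate)
  {
      return (double)total_chars * 10.0 / (double)baudrate;
  }

  /* Example: the GGA+RMC+GSA+3*GSV budget of 404 characters takes
   * 404 * 10 / 4800 = 0.84 sec at 4800bps -- it fits in a 1-second
   * cycle, but with almost no headroom. */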

The buffering problem

Considered in the abstract, the cleanest thing for a position/velocity/time oracle to return is a 14-tuple including position components in all four dimensions, velocity in three, and associated error estimates for all seven degrees of freedom. This is what the O message in GPSD protocol attempts to deliver.

If GPS hardware were ideally designed, we'd get exactly one report like this per cycle from our device. That's what we get from the SiRF (packet type 02), Zodiac (packet type 1000), Garmin Binary, and iTalk (NAV_FIX message) protocols. Garmin and Trimble also implement full PVT solutions in a single line of text. These, together, account for a share of the GPS market that is 80% and rising in 2006.

Unfortunately, many GPSes actually deliver their PVT reports as a collection of sentences in NMEA 0183 (or as packets in a vendor binary protocol less well designed than SiRF's), each of which is only a partial report. Here's the most important kind of incompleteness: for historical reasons, NMEA splits 2-D position info and altitude into two different messages (GGA, and GPRMC or GLL), each issued once during the normal 1-second send cycle.

Mapping the design space

For NMEA devices, then (and for devices speaking similarly mal-designed vendor binary protocols), accumulating a complete PVT report requires decisions about the following sorts of actions:

  1. What data will be buffered, and for how long.
  2. When the accumulated data will be shipped to the user.
  3. When to invalidate some or all of the buffered data.

(The when-to-ship question assumes watcher mode is on; if the user queries explicitly, the when-to-ship decision is out of our hands.)

In thinking about these decisions, it's useful to consider the set of events on which an action like "merge new data into PVT buffer" or "clear the PVT data buffer" or "ship report to user" can trigger.

  1. On receipt of any sentence or packet from the GPS.
  2. On receipt of a specified sentence or packet from the GPS.
  3. When the timestamp of a sentence or packet differs from the last timestamp recorded.
  4. When some or all of the PVT data has not been refreshed for a specified number of seconds.

Buffering introduces latency, and that latency can really matter. If the GPS is in a car driving down the highway at 112kph (70mph), a 1-second delay in the buffered data can represent an error of 31 meters (102 feet) in reported position.

In general, buffering would make it easy to retrieve the data you want at the time you want it, but the data would not necessarily be valid at the time of retrieval. Buffering makes life easier for applications that just want to display a position indicator, and harder for perfectionists who worry about the precise location of moving GPSes.

The policy decision about whether you want to be a "perfectionist" or not fundamentally belongs in the client. This isn't to say gpsd could not have different buffering modes to help the client implement its decision, but the modes and their controls would have to be implemented very carefully. Otherwise we'd risk imposing the wrong policy (or, worse, a broken version of a wrong policy) on half the client applications out there.

There are hundreds, even thousands of possible sets of action-to-event bindings. The "right" binding for a particular device depends not only on the protocol it uses but on considerations like how much time latency we are willing to let the buffering policy inflict on a report.

Discussion of possible policies follows. See also the speculation later on about combining buffering with interpolation.

Report then clear per packet

A device like a SiRF-II that reports all its PVT data in a single packet needs no buffering; it should ship to the user on receipt of that packet and then invalidate the PVT buffer right afterwards. (This is a "report then clear per packet" policy.)

But triggering a buffer clear on every packet would do bad things if we're in client-pull mode. We never know when a client might ask for a response. Consider the case of two simultaneously connected clients, one sending queries and the other in watcher mode: if we clear after we ship the O message to the watcher and the other client then queries, it gets nothing in response.

Buffer all, report then clear on trigger

On the other hand, if (say) we knew that an NMEA GPS were always going to end its report cycle with GPGGA, it might make sense to buffer all data until GPGGA appears, ship a report afterwards, and then clear the PVT buffer. This would mean shipping just one report per cycle (good) at the cost of introducing some latency into the reporting of data the GPS sends earlier in the cycle (bad). (This would be "buffer all, report-then-clear on trigger".)

Here's where it gets ugly. We don't know what the user's tolerance for latency is. And, in general, we can't tell when end-of-cycle is happening, because different NMEA devices ship their sentences in different orders. Worse: we can't even count on all send cycles of the same device having the same end sentence, so the naive plan of waiting one cycle to see what the end sentence is won't work. Devices like the Garmin 48 have two different cycle sequences with different start and end sentences.

So we can't actually trigger on end-of-cycle. The only between-cycles transition we can spot more or less reliably is actually start of cycle, by looking to see when the timestamp of a sentence or packet differs from the last timestamp recorded (event 3 above). This will be after the last end-of-cycle by some (possibly large) fraction of a second; in fact, waiting for start-of-cycle to report data from the last one is the worst possible latency hit.

Buffer all, report on every packet, clear at start-of-cycle

Another possible policy is "buffer all, report on every packet, clear at start-of-cycle". This is simple and adds minimum reporting latency to new data, but means that O responses can issue more than once per second with accumulating sets of data that only sum up to a complete report on the last one.

Another advantage of this policy is that when applied to a device like a SiRF-II or Zodiac chipset that ships only one PVT packet per cycle, it collapses to "report then clear per packet".

Here's a disadvantage: the client display, unless it does its own buffering, may flicker annoyingly. The problem is this: suppose we get an altitude in a GGA packet, throw an O response at the client, and display it. This happens to be late in the report cycle. Start of cycle clears the buffer; a GPRMC arrives with no altitude in it. The altitude value in the client display flickers to "not available", and won't be restored until the following GGA.

This is the policy gpsd currently follows.

Buffer all, report on every packet, never clear data

This policy has all the advantages of the previous one and avoids the flicker problem. However, it would mean the user often sees data that is up to one cycle time stale. This might be OK except that it could happen even if the GPS has just lost lock — that is, in the interval between start of cycle and receipt of the sentence whose mode field invalidates the bad data, gpsd would be pretending to know something it doesn't.

GPSes sometimes do this, delivering data from dead-reckoning or interpolation when they've lost lock. This comes up most often with altitude; because of the tall skinny shape of the tetrahedra defined by GPS range data, a GPS can lose 3D lock but still have an altitude guess good enough for it to deliver a 2D fix with confidence. But just because GPSes fudge is no good reason for gpsd to add a layer of prevarication on top of that.

But the conclusive argument against this policy is that, while it can be simulated by buffering data delivered according to a clear-every-cycle policy, the reverse is not true. Under this policy there would be no way to distinguish in gpsd's reports between data valid now and data held over from a previous cycle; on the other hand, under a clear-at-start-of-cycle policy the client can still do whatever buffering and smoothing it wants to.

Buffer all, report on every packet, time out old data

gpsd does not presently keep the sort of per-field ageing data needed to track the age of different PVT fields separately. But it does know how many seconds have elapsed since the last packet receipt — it uses this to tell if the device has dropped offline, by looking for an age greater than the cycle time.

When the device is returning fixes steadily, this policy will look exactly like "buffer all, report on every packet, never clear data", because every piece of data will be refreshed once per cycle. It will have the same sort of prevarication problems as that policy, too. If the device loses lock, the user will see that the PVT data is undefined only when the timeout expires.

Fine-grained timeouts using per-field aging wouldn't change this picture much. They'd mainly be useful for matching the timeout on a piece of data to its "natural" lifetime — usually 1 sec for PVT data and 5 sec for satellite-picture data.

There is no perfect option

Any potential data-management policy would have drawbacks for some devices even if it were implemented perfectly. The more complex policies would have an additional problem; buffering code with complicated flush triggers is notoriously prone to grow bugs near its edge cases.

Thus, gpsd has a serious, gnarly data-management problem at its core. This problem lurks behind many user bug reports and motivates some of the most difficult-to-understand code in the daemon. And when you look hard at the problems posed by the variable sequences of sentences in NMEA devices...it gets even nastier.

It's tempting to think that, if we knew the device type in advance, we could write a state machine adapted to its sentence sequence that would do a perfect job of data management. The trouble with this theory is that we'd need separate state machines for each NMEA dialect. That way lies madness — and an inability to cope gracefully with devices never seen before. Since the zero-configuration design goal means that we can't count on the user or administrator passing device-type information to gpsd in the first place, we avoid this trap.

But that means gpsd has to adapt to what it sees coming down the wire. At least it can use a different policy for each device driver, dispatching once the device type has been identified.

Combining buffering with interpolation: a speculative design

One possible choice (not yet implemented in gpsd or its client libraries) would be to combine buffering with interpolation. Here's a speculative design for a client that does its own extrapolation:

Thread 1: GPS handler. Sets watcher mode. Each time a report is received, it stores that data along with the result of a call to gettimeofday() (so that we have microsecond precision, rather than just seconds from time()). No need to double-buffer any data: just the latest complete O report is sufficient. When thread 1 receives a query from thread 2, it applies a differential correction to the last reported position, based on the last reported velocity and the difference between the stored gettimeofday() time and a new gettimeofday() call.

Thread 2: main application. Driven by whatever events you want it to be. Queries thread 1 whenever it needs an accurate GPS position NOW.

The main problem with this approach is that it requires an onboard clock far more accurate than the GPS's once-per-second reports; in general, we can't assume that a gpsd instance running in a car or boat will have access to ntpd or NIST radio time signals.
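
For concreteness, here is a sketch of the dead-reckoning step thread 1 would perform. The field names and the flat-earth approximation are invented for illustration:

  #include <math.h>
  #include <sys/time.h>

  struct last_report {
      double lat, lon;       /* degrees, from the last O report */
      double speed, track;   /* meters/sec and degrees true, ditto */
      struct timeval when;   /* gettimeofday() at receipt */
  };

  /* Extrapolate position to "now" from the last report's velocity. */
  void extrapolate(const struct last_report *r, double *lat, double *lon)
  {
      struct timeval now;
      double dt, dist, rad;

      gettimeofday(&now, NULL);
      dt = (now.tv_sec - r->when.tv_sec)
         + (now.tv_usec - r->when.tv_usec) / 1e6;
      dist = r->speed * dt;                 /* meters traveled since then */
      rad  = r->track * (M_PI / 180.0);
      /* crude flat-earth update: one degree of latitude ~ 111320 meters */
      *lat = r->lat + dist * cos(rad) / 111320.0;
      *lon = r->lon + dist * sin(rad)
                      / (111320.0 * cos(r->lat * (M_PI / 180.0)));
  }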

Designing Ahead

This section is preliminary design sketches for things we are likely to need to do in the future.

Non-PVT Data

There are lots of other things a sensor wedded to a GPS might report that don't fit the position-velocity-time model of the O report. Depth of water. Temperature of water. Compass heading. Roll. Pitch. Yaw. We've already had requests to handle some of these for NMEA-emitting devices like magnetic compasses (which report heading via a proprietary TNTHTM sentence) and fish finders (which report water depth and temperature via NMEA DPT and MTW sentences).

To cope with this, we could either:

  1. extend the O tuple to report non-PVT kinds of information, or
  2. define a new set of requests and responses for non-PVT data.

The simplest way to attack this problem would be to just start adding fields to the O response for each new kind of data we get asked to handle. The problem with this is that we could pile up fields forever and end up with a monstrously long report most of the fields of which are always '?' and some of which are the moral equivalent of "count of the nose hairs on the nearest warthog".

There's a slightly more civilized possibility. We could add optional private-use fields to the O report, the semantics of which would be allowed to vary by device. So, for example, the Humminbird would return water depth in the first private-use field, while the TNT would return heading. Clients would have to know, based on the device type as revealed by I, how to interpret and label the data in the private-use fields.

If we go for alternative (2), the first consequence is that we actually have to implement a Version 4 protocol. The existing GPSD protocol is about out of namespace.

Design Sketch for Version 4 Protocol

There are almost no more letters left in the namespace of the GPSD version 3 protocol. At time of writing in late 2006 we can have one more command, 'h', and that's it. After that, extending the set will require a new command/response syntax.

Steps towards defining a new syntax have been taken. The senior developers have agreed on a new protocol with a basic sentence syntax something like this:

<request> ::= <introducer> <commands> <newline>

<commands> ::= <command> | <command> <commands>

<command> ::= <command-id>:<arg1>,<arg2>,<arg3>,...<argn>;

Each request shall begin with an introducer character. This is given as '!' here but that could change. The purpose of the introducer is to inform the command parser that extended commands follow. Each request shall end with a newline indication, which may consist of either LF or CR-LF.

All requests shall be composed of US-ASCII characters and shall be no more than 78 characters in length, exclusive of the trailing newline. Each request may consist of one or more semicolon-terminated commands.

Each command shall begin with a command identifier. The command identifier shall begin with an alphabetic character and consist of alphanumeric characters. It shall be followed by a colon. The 52 command identifiers consisting of a single alphabetic character are reserved as equivalents of the Version 3 commands.

The colon may be followed by zero or more comma-separated arguments. Arguments may contain any US-ASCII character other than the introducer character, the colon, the comma, the semicolon, or the hash '#' character (reserved for comments). Trailing zero-length arguments will be ignored; thus, in particular, machine-generated requests may terminate the final argument with a comma without confusing gpsd's parser.

Responses shall have the same format as requests. The response to a command shall have the same identifier as the command. As an exception, commands may have multi-line responses in an SMTP-like format not yet decided; this is to retain the 78-character line limit while allowing the daemon to pass back more data than can fit on a line.
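
As a purely hypothetical illustration of the shape (the command identifiers here are invented, not ones the developers have agreed on):

  !watch:1;device:/dev/ttyUSB0;
  !watch:enabled;device:/dev/ttyUSB0,active;

The first line is a request containing two commands; the second is the matching response, echoing each command identifier with result arguments.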

Blind alleys

Things we've considered doing and rejected.

Reporting fix data only once per cycle

See the discussion of the buffering problem, above. The "Buffer all, report then clear on start-of-cycle" policy would introduce an unpleasant amount of latency. gpsd actually uses the "Buffer all, report on every packet, clear at start-of-cycle" policy.

Allowing clients to ship arbitrary control strings to a GPS

Tempting — it would allow us to do sirfmon-like things with the daemon running — but a bad idea. It would make denial-of-service attacks on applications using the GPS far too easy. For example, suppose the control string were a baud-rate change?

Using libusb to do USB device discovery

There has been some consideration of going to the cross-platform libusb library to do USB device discovery. This would create an external dependency that gpsd doesn't now have, and bring more complexity on board than is probably desirable.

We've chosen instead to rely on the local hotplug system. That way gpsd can concentrate solely on knowing about GPSes.

Setting FIFO threshold to 1 to reduce jitter in serial-message times

When using gpsd as a time reference, one of the things we'd like to do is make the lag in the message path from GPS to gpsd small, with as little jitter as possible, so we can correct for it with a constant offset.

A possibility we considered is to set the FIFO threshold on the serial device UART to 1 using TIOCGSERIAL/TIOCSSERIAL. This would, in effect, disable transmission buffering, increasing lag but decreasing jitter.

But it's almost certainly not worth the work. Rob Janssen, our timekeeping expert, reckons that at 4800bps the UART buffering can cause at most about 15msec of jitter. This is, observably, swamped by other less controllable sources of variation.

Stop using a compiled-in UTC-TAI offset

Instead, from the hotplug script, we could maintain a local offset file:

  1. If there is no local offset file, download the current leap-second offset from IERS or the U.S. Naval Observatory and copy it to a local offset file.
  2. If there is a local offset file, consider it stale after five months and reload it.
  3. gpsd should read the local offset file when it starts up, if it exists.

However, it turns out this is only an issue for EverMore chips. SiRF GPSes can get the offset from the PPS or subframe data; NMEA GPSes don't need it; and the other binary protocols supply it. Looks like it's not worth doing.

Subsecond polling

gpsd relies on the GPS to periodically send PVT reports to it. A few GPSes have the capability to change their cycle time so they can ship reports more often (gpsd 'c' command). These all send in some vendor-binary format; no NMEA GPS I've ever seen allows you to set a cycle time of less than a second, if only because at 4800bps a full PVT report takes just under one second in NMEA.

But most GPSes send PVT reports once a second. At 50km/h (31mi/h) that's 13.8 meters change in position between updates, about the same as the uncertainty of position under typical conditions.

There is, however, a way to sample some GPSes at higher frequency. SiRF chips, and some others, allow you to shut down periodic notifications and poll them for PVT. At 57600bps we could poll an NMEA GPS 16 times a second, and a SiRF one maybe 18 times a second.

Alas, Chris Kuethe reports: "At least on the SiRF 2 and 3 receivers I have, you get one fix per second. I cooked up a test harness to disable as many periodic messages as possible and then poll as quickly as possible, and the receiver would not kick out more than one fix per second. Foo!"

So subsecond polling would be a blind alley for all SiRF devices and all NMEA devices. That's well over 90% of cases and renders it not worth doing.