GPSD has an exceptionally low defect rate. Our first Coverity scan, in March 2007, turned up only 2 errors in over 22KLOC; our second, in May 2012, turned up just 13 errors in 72KLOC, all on rarely-used code paths. Though the software is very widely deployed on multiple platforms, we often go for months between new tracker bugs.
Here's how that's done:
GPSD has around 100 unit tests and regression tests, including sample device output for almost every sensor type we support. We've put a lot of effort into making the tests easy and fast to run so they can be run often. This makes it actively difficult for random code changes to break our device drivers without somebody noticing.
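To give the flavor of this - though not gpsd's actual harness, which replays full capture logs through the decoders - here is a minimal table-driven check in the same spirit: captured sentences paired with the answers they must produce, cheap enough to run on every build. The nmea_checksum() function and the sample table are purely illustrative.

    /*
     * Toy, table-driven check in the same spirit as the regression
     * suite: captured device output paired with the result it must
     * produce.  This is NOT gpsd's actual harness; the checksum
     * routine and samples are illustrative only.
     */
    #include <stdio.h>

    /* XOR checksum over an NMEA sentence body, between '$' and '*'. */
    static unsigned char nmea_checksum(const char *sentence)
    {
        unsigned char sum = 0;
        for (const char *p = sentence + 1; *p != '\0' && *p != '*'; p++)
            sum ^= (unsigned char)*p;
        return sum;
    }

    /* Captured samples and the checksum each must yield. */
    static const struct {
        const char *sample;
        unsigned char expected;
    } cases[] = {
        {"$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47", 0x47},
        {"$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6A", 0x6A},
    };

    int main(void)
    {
        int failures = 0;
        for (size_t i = 0; i < sizeof(cases) / sizeof(cases[0]); i++) {
            unsigned char got = nmea_checksum(cases[i].sample);
            if (got != cases[i].expected) {
                fprintf(stderr, "FAIL: %s (got %02X)\n",
                        cases[i].sample, (unsigned)got);
                failures++;
            }
        }
        return failures ? 1 : 0;
    }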
Which isn't to say those drivers can't be wrong, just that the ways they can be wrong are constrained: either a captured sample fails to represent what the device really emits, or the bug sits on a code path the test harnesses can't exercise at all.
Our first Coverity run turned up only two driver bugs: static buffer overruns in the methods for changing a device's reporting protocol and line speed. They escaped notice because those code paths can't be checked in our test harnesses, only on a live device.
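For illustration only (gpsd's real drivers differ), here is the general shape of that kind of bug and its fix, in a hypothetical set_line_speed() that builds a command into a fixed-size buffer; the "$PXXX" vendor sentence is invented for this sketch.

    /*
     * Illustration only, not gpsd's actual driver code.  The shape of
     * the bug: a command built into a fixed-size buffer with no length
     * check, on a code path (changing protocol or line speed) that the
     * regression harness can't reach because it needs a live device.
     */
    #include <stdio.h>

    static void send_command(const char *buf, size_t len)
    {
        (void)buf;                  /* stand-in for a write to the device */
        (void)len;
    }

    static void set_line_speed(unsigned int speed, const char *framing)
    {
        char msg[16];

        /* BUG (the overrun): sprintf(msg, "$PXXX,%u,%s", speed, framing);
         * can write past the end of msg[] if framing is long enough.    */

        /* FIX: bound the write and refuse to send a truncated command.  */
        int n = snprintf(msg, sizeof(msg), "$PXXX,%u,%s", speed, framing);
        if (n < 0 || (size_t)n >= sizeof(msg))
            return;
        send_command(msg, (size_t)n);
    }

    int main(void)
    {
        set_line_speed(9600, "8N1");
        return 0;
    }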
This is also why Coverity didn't find defects on commonly-used code paths. If there'd been any, the regression tests probably would have smashed them out long ago. A great deal of boring, grubby, finicky work went into getting our test framework in shape, but it has paid off hugely.
We regularly apply Coverity, cppcheck, and scan-build. We haven't yet been able to eliminate all scan-build warnings, but we require the code to audit clean under all the other tools on each release.
We used to use splint, until we found we couldn't reliably replicate the results of splint runs across different Linux distributions. It was also far and away the biggest pain in the ass to use: you have to drop cryptic, cluttery magic comments all over your source to pass hints to splint and suppress its extremely voluminous and picky output. We have retired it in favor of more modern analyzers.
cppcheck is much newer and much less prone to false positives; likewise scan-build. But here's what experience tells us: each of these tools finds overlapping but different sets of bugs. Coverity is, by reputation at least, capable enough that it might dominate one or more of them - but why take chances? Best to use them all and confine the population of undiscovered bugs to as small a fraction of the state space as we can.
We also use valgrind to check for memory leaks, though this is not expected to turn up bugs (and doesn't) due to our no-dynamic-allocation house rule.
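As a sketch of what that house rule looks like in practice - illustrative names, not gpsd's actual data structures - everything lives in fixed-size static storage and "allocation" just claims a free slot:

    /*
     * Sketch of the no-dynamic-allocation rule.  All storage is
     * fixed-size and static, so there is no malloc()/free() traffic
     * to leak or double-free, and nothing for valgrind to report.
     */
    #include <stddef.h>

    #define MAX_DEVICES   4
    #define MAX_PACKET  256

    struct device_slot {
        int    in_use;                   /* slot-claimed flag             */
        char   packet[MAX_PACKET];       /* fixed per-device input buffer */
        size_t packet_len;
    };

    /* One static pool instead of per-connection malloc() calls. */
    static struct device_slot devices[MAX_DEVICES];

    /* "Allocate" by claiming a free slot; NULL means the pool is full. */
    static struct device_slot *device_alloc(void)
    {
        for (int i = 0; i < MAX_DEVICES; i++)
            if (!devices[i].in_use) {
                devices[i].in_use = 1;
                devices[i].packet_len = 0;
                return &devices[i];
            }
        return NULL;
    }

    int main(void)
    {
        struct device_slot *slot = device_alloc();
        return slot != NULL ? 0 : 1;
    }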
Neither magic nor genius is required to get defect densities as low as GPSD's. It's more a matter of sheer bloody-minded persistence - the willingness to do the up-front work required to apply and discipline fault scanners, write test harnesses, and automate your verification process so you can run a truly rigorous validation with the push of a button.
Many more projects could do this than do. And many more projects should.