diff options
Diffstat (limited to 'ext/dbm/sdbm/readme.ms')
-rw-r--r-- | ext/dbm/sdbm/readme.ms | 353 |
1 files changed, 353 insertions, 0 deletions
diff --git a/ext/dbm/sdbm/readme.ms b/ext/dbm/sdbm/readme.ms new file mode 100644 index 0000000000..01ca17ccdf --- /dev/null +++ b/ext/dbm/sdbm/readme.ms @@ -0,0 +1,353 @@ +.\" tbl | readme.ms | [tn]roff -ms | ... +.\" note the "C" (courier) and "CB" fonts: you will probably have to +.\" change these. +.\" $Id: readme.ms,v 1.1 90/12/13 13:09:15 oz Exp Locker: oz $ + +.de P1 +.br +.nr dT 4 +.nf +.ft C +.sp .5 +.nr t \\n(dT*\\w'x'u +.ta 1u*\\ntu 2u*\\ntu 3u*\\ntu 4u*\\ntu 5u*\\ntu 6u*\\ntu 7u*\\ntu 8u*\\ntu 9u*\\ntu 10u*\\ntu 11u*\\ntu 12u*\\ntu 13u*\\ntu 14u*\\ntu +.. +.de P2 +.br +.ft 1 +.br +.sp .5 +.br +.fi +.. +.\" CW uses the typewriter/courier font. +.de CW +\fC\\$1\\fP\\$2 +.. + +.\" Footnote numbering [by Henry Spencer] +.\" <text>\*f for a footnote number.. +.\" .FS +.\" \*F <footnote text> +.\" .FE +.\" +.ds f \\u\\s-2\\n+f\\s+2\\d +.nr f 0 1 +.ds F \\n+F. +.nr F 0 1 + +.ND +.LP +.TL +\fIsdbm\fP \(em Substitute DBM +.br +or +.br +Berkeley \fIndbm\fP for Every UN*X\** Made Simple +.AU +Ozan (oz) Yigit +.AI +The Guild of PD Software Toolmakers +Toronto - Canada +.sp +oz@nexus.yorku.ca +.LP +.FS +UN*X is not a trademark of any (dis)organization. +.FE +.sp 2 +\fIImplementation is the sincerest form of flattery. \(em L. Peter Deutsch\fP +.SH +A The Clone of the \fIndbm\fP library +.PP +The sources accompanying this notice \(em \fIsdbm\fP \(em constitute +the first public release (Dec. 1990) of a complete clone of +the Berkeley UN*X \fIndbm\fP library. The \fIsdbm\fP library is meant to +clone the proven functionality of \fIndbm\fP as closely as possible, +including a few improvements. It is practical, easy to understand, and +compatible. +The \fIsdbm\fP library is not derived from any licensed, proprietary or +copyrighted software. +.PP +The \fIsdbm\fP implementation is based on a 1978 algorithm +[Lar78] by P.-A. (Paul) Larson known as ``Dynamic Hashing''. +In the course of searching for a substitute for \fIndbm\fP, I +prototyped three different external-hashing algorithms [Lar78, Fag79, Lit80] +and ultimately chose Larson's algorithm as a basis of the \fIsdbm\fP +implementation. The Bell Labs +\fIdbm\fP (and therefore \fIndbm\fP) is based on an algorithm invented by +Ken Thompson, [Tho90, Tor87] and predates Larson's work. +.PP +The \fIsdbm\fR programming interface is totally compatible +with \fIndbm\fP and includes a slight improvement in database initialization. +It is also expected to be binary-compatible under most UN*X versions that +support the \fIndbm\fP library. +.PP +The \fIsdbm\fP implementation shares the shortcomings of the \fIndbm\fP +library, as a side effect of various simplifications to the original Larson +algorithm. It does produce \fIholes\fP in the page file as it writes +pages past the end of file. (Larson's paper include a clever solution to +this problem that is a result of using the hash value directly as a block +address.) On the other hand, extensive tests seem to indicate that \fIsdbm\fP +creates fewer holes in general, and the resulting pagefiles are +smaller. The \fIsdbm\fP implementation is also faster than \fIndbm\fP +in database creation. +Unlike the \fIndbm\fP, the \fIsdbm\fP +.CW store +operation will not ``wander away'' trying to split its +data pages to insert a datum that \fIcannot\fP (due to elaborate worst-case +situations) be inserted. (It will fail after a pre-defined number of attempts.) +.SH +Important Compatibility Warning +.PP +The \fIsdbm\fP and \fIndbm\fP +libraries \fIcannot\fP share databases: one cannot read the (dir/pag) +database created by the other. This is due to the differences +between the \fIndbm\fP and \fIsdbm\fP algorithms\**, +.FS +Torek's discussion [Tor87] +indicates that \fIdbm/ndbm\fP implementations use the hash +value to traverse the radix trie differently than \fIsdbm\fP +and as a result, the page indexes are generated in \fIdifferent\fP order. +For more information, send e-mail to the author. +.FE +and the hash functions +used. +It is easy to convert between the \fIdbm/ndbm\fP databases and \fIsdbm\fP +by ignoring the index completely: see +.CW dbd , +.CW dbu +etc. +.R +.LP +.SH +Notice of Intellectual Property +.LP +\fIThe entire\fP sdbm \fIlibrary package, as authored by me,\fP Ozan S. Yigit, +\fIis hereby placed in the public domain.\fP As such, the author is not +responsible for the consequences of use of this software, no matter how +awful, even if they arise from defects in it. There is no expressed or +implied warranty for the \fIsdbm\fP library. +.PP +Since the \fIsdbm\fP +library package is in the public domain, this \fIoriginal\fP +release or any additional public-domain releases of the modified original +cannot possibly (by definition) be withheld from you. Also by definition, +You (singular) have all the rights to this code (including the right to +sell without permission, the right to hoard\** +.FS +You cannot really hoard something that is available to the public at +large, but try if it makes you feel any better. +.FE +and the right to do other icky things as +you see fit) but those rights are also granted to everyone else. +.PP +Please note that all previous distributions of this software contained +a copyright (which is now dropped) to protect its +origins and its current public domain status against any possible claims +and/or challenges. +.SH +Acknowledgments +.PP +Many people have been very helpful and supportive. A partial list would +necessarily include Rayan Zacherissen (who contributed the man page, +and also hacked a MMAP version of \fIsdbm\fP), +Arnold Robbins, Chris Lewis, +Bill Davidsen, Henry Spencer, Geoff Collyer, Rich Salz (who got me started +in the first place), Johannes Ruschein +(who did the minix port) and David Tilbrook. I thank you all. +.SH +Distribution Manifest and Notes +.LP +This distribution of \fIsdbm\fP includes (at least) the following: +.P1 + CHANGES change log + README this file. + biblio a small bibliography on external hashing + dba.c a crude (n/s)dbm page file analyzer + dbd.c a crude (n/s)dbm page file dumper (for conversion) + dbe.1 man page for dbe.c + dbe.c Janick's database editor + dbm.c a dbm library emulation wrapper for ndbm/sdbm + dbm.h header file for the above + dbu.c a crude db management utility + hash.c hashing function + makefile guess. + pair.c page-level routines (posted earlier) + pair.h header file for the above + readme.ms troff source for the README file + sdbm.3 man page + sdbm.c the real thing + sdbm.h header file for the above + tune.h place for tuning & portability thingies + util.c miscellaneous +.P2 +.PP +.CW dbu +is a simple database manipulation program\** that tries to look +.FS +The +.CW dbd , +.CW dba , +.CW dbu +utilities are quick hacks and are not fit for production use. They were +developed late one night, just to test out \fIsdbm\fP, and convert some +databases. +.FE +like Bell Labs' +.CW cbt +utility. It is currently incomplete in functionality. +I use +.CW dbu +to test out the routines: it takes (from stdin) tab separated +key/value pairs for commands like +.CW build +or +.CW insert +or takes keys for +commands like +.CW delete +or +.CW look . +.P1 + dbu <build|creat|look|insert|cat|delete> dbmfile +.P2 +.PP +.CW dba +is a crude analyzer of \fIdbm/sdbm/ndbm\fP +page files. It scans the entire +page file, reporting page level statistics, and totals at the end. +.PP +.CW dbd +is a crude dump program for \fIdbm/ndbm/sdbm\fP +databases. It ignores the +bitmap, and dumps the data pages in sequence. It can be used to create +input for the +.CW dbu +utility. +Note that +.CW dbd +will skip any NULLs in the key and data +fields, thus is unsuitable to convert some peculiar databases that +insist in including the terminating null. +.PP +I have also included a copy of the +.CW dbe +(\fIndbm\fP DataBase Editor) by Janick Bergeron [janick@bnr.ca] for +your pleasure. You may find it more useful than the little +.CW dbu +utility. +.PP +.CW dbm.[ch] +is a \fIdbm\fP library emulation on top of \fIndbm\fP +(and hence suitable for \fIsdbm\fP). Written by Robert Elz. +.PP +The \fIsdbm\fP +library has been around in beta test for quite a long time, and from whatever +little feedback I received (maybe no news is good news), I believe it has been +functioning without any significant problems. I would, of course, appreciate +all fixes and/or improvements. Portability enhancements would especially be +useful. +.SH +Implementation Issues +.PP +Hash functions: +The algorithm behind \fIsdbm\fP implementation needs a good bit-scrambling +hash function to be effective. I ran into a set of constants for a simple +hash function that seem to help \fIsdbm\fP perform better than \fIndbm\fP +for various inputs: +.P1 + /* + * polynomial conversion ignoring overflows + * 65599 nice. 65587 even better. + */ + long + dbm_hash(char *str, int len) { + register unsigned long n = 0; + + while (len--) + n = n * 65599 + *str++; + return n; + } +.P2 +.PP +There may be better hash functions for the purposes of dynamic hashing. +Try your favorite, and check the pagefile. If it contains too many pages +with too many holes, (in relation to this one for example) or if +\fIsdbm\fP +simply stops working (fails after +.CW SPLTMAX +attempts to split) when you feed your +NEWS +.CW history +file to it, you probably do not have a good hashing function. +If you do better (for different types of input), I would like to know +about the function you use. +.PP +Block sizes: It seems (from various tests on a few machines) that a page +file block size +.CW PBLKSIZ +of 1024 is by far the best for performance, but +this also happens to limit the size of a key/value pair. Depending on your +needs, you may wish to increase the page size, and also adjust +.CW PAIRMAX +(the maximum size of a key/value pair allowed: should always be at least +three words smaller than +.CW PBLKSIZ .) +accordingly. The system-wide version of the library +should probably be +configured with 1024 (distribution default), as this appears to be sufficient +for most common uses of \fIsdbm\fP. +.SH +Portability +.PP +This package has been tested in many different UN*Xes even including minix, +and appears to be reasonably portable. This does not mean it will port +easily to non-UN*X systems. +.SH +Notes and Miscellaneous +.PP +The \fIsdbm\fP is not a very complicated package, at least not after you +familiarize yourself with the literature on external hashing. There are +other interesting algorithms in existence that ensure (approximately) +single-read access to a data value associated with any key. These are +directory-less schemes such as \fIlinear hashing\fP [Lit80] (+ Larson +variations), \fIspiral storage\fP [Mar79] or directory schemes such as +\fIextensible hashing\fP [Fag79] by Fagin et al. I do hope these sources +provide a reasonable playground for experimentation with other algorithms. +See the June 1988 issue of ACM Computing Surveys [Enb88] for an +excellent overview of the field. +.PG +.SH +References +.LP +.IP [Lar78] 4m +P.-A. Larson, +``Dynamic Hashing'', \fIBIT\fP, vol. 18, pp. 184-201, 1978. +.IP [Tho90] 4m +Ken Thompson, \fIprivate communication\fP, Nov. 1990 +.IP [Lit80] 4m +W. Litwin, +`` Linear Hashing: A new tool for file and table addressing'', +\fIProceedings of the 6th Conference on Very Large Dabatases (Montreal)\fP, +pp. 212-223, Very Large Database Foundation, Saratoga, Calif., 1980. +.IP [Fag79] 4m +R. Fagin, J. Nievergelt, N. Pippinger, and H. R. Strong, +``Extendible Hashing - A Fast Access Method for Dynamic Files'', +\fIACM Trans. Database Syst.\fP, vol. 4, no.3, pp. 315-344, Sept. 1979. +.IP [Wal84] 4m +Rich Wales, +``Discussion of "dbm" data base system'', \fIUSENET newsgroup unix.wizards\fP, +Jan. 1984. +.IP [Tor87] 4m +Chris Torek, +``Re: dbm.a and ndbm.a archives'', \fIUSENET newsgroup comp.unix\fP, +1987. +.IP [Mar79] 4m +G. N. Martin, +``Spiral Storage: Incrementally Augmentable Hash Addressed Storage'', +\fITechnical Report #27\fP, University of Varwick, Coventry, U.K., 1979. +.IP [Enb88] 4m +R. J. Enbody and H. C. Du, +``Dynamic Hashing Schemes'',\fIACM Computing Surveys\fP, +vol. 20, no. 2, pp. 85-113, June 1988. |