diff options
Diffstat (limited to 'ext/dbm/sdbm/README')
-rw-r--r-- | ext/dbm/sdbm/README | 396 |
1 files changed, 396 insertions, 0 deletions
diff --git a/ext/dbm/sdbm/README b/ext/dbm/sdbm/README new file mode 100644 index 0000000000..cd7312cc57 --- /dev/null +++ b/ext/dbm/sdbm/README @@ -0,0 +1,396 @@ + + + + + + + sdbm - Substitute DBM + or + Berkeley ndbm for Every UN*X[1] Made Simple + + Ozan (oz) Yigit + + The Guild of PD Software Toolmakers + Toronto - Canada + + oz@nexus.yorku.ca + + + +Implementation is the sincerest form of flattery. - L. Peter +Deutsch + +A The Clone of the ndbm library + + The sources accompanying this notice - sdbm - consti- +tute the first public release (Dec. 1990) of a complete +clone of the Berkeley UN*X ndbm library. The sdbm library is +meant to clone the proven functionality of ndbm as closely +as possible, including a few improvements. It is practical, +easy to understand, and compatible. The sdbm library is not +derived from any licensed, proprietary or copyrighted +software. + + The sdbm implementation is based on a 1978 algorithm +[Lar78] by P.-A. (Paul) Larson known as ``Dynamic Hashing''. +In the course of searching for a substitute for ndbm, I pro- +totyped three different external-hashing algorithms [Lar78, +Fag79, Lit80] and ultimately chose Larson's algorithm as a +basis of the sdbm implementation. The Bell Labs dbm (and +therefore ndbm) is based on an algorithm invented by Ken +Thompson, [Tho90, Tor87] and predates Larson's work. + + The sdbm programming interface is totally compatible +with ndbm and includes a slight improvement in database ini- +tialization. It is also expected to be binary-compatible +under most UN*X versions that support the ndbm library. + + The sdbm implementation shares the shortcomings of the +ndbm library, as a side effect of various simplifications to +the original Larson algorithm. It does produce holes in the +page file as it writes pages past the end of file. (Larson's +paper include a clever solution to this problem that is a +result of using the hash value directly as a block address.) +On the other hand, extensive tests seem to indicate that +sdbm creates fewer holes in general, and the resulting page- +files are smaller. The sdbm implementation is also faster +than ndbm in database creation. Unlike the ndbm, the sdbm +_________________________ + + [1] UN*X is not a trademark of any (dis)organization. + + + + + + + + + + - 2 - + + +store operation will not ``wander away'' trying to split its +data pages to insert a datum that cannot (due to elaborate +worst-case situations) be inserted. (It will fail after a +pre-defined number of attempts.) + +Important Compatibility Warning + + The sdbm and ndbm libraries cannot share databases: one +cannot read the (dir/pag) database created by the other. +This is due to the differences between the ndbm and sdbm +algorithms[2], and the hash functions used. It is easy to +convert between the dbm/ndbm databases and sdbm by ignoring +the index completely: see dbd, dbu etc. + + +Notice of Intellectual Property + +The entire sdbm library package, as authored by me, Ozan S. +Yigit, is hereby placed in the public domain. As such, the +author is not responsible for the consequences of use of +this software, no matter how awful, even if they arise from +defects in it. There is no expressed or implied warranty for +the sdbm library. + + Since the sdbm library package is in the public domain, +this original release or any additional public-domain +releases of the modified original cannot possibly (by defin- +ition) be withheld from you. Also by definition, You (singu- +lar) have all the rights to this code (including the right +to sell without permission, the right to hoard[3] and the +right to do other icky things as you see fit) but those +rights are also granted to everyone else. + + Please note that all previous distributions of this +software contained a copyright (which is now dropped) to +protect its origins and its current public domain status +against any possible claims and/or challenges. + +Acknowledgments + + Many people have been very helpful and supportive. A +partial list would necessarily include Rayan Zacherissen +(who contributed the man page, and also hacked a MMAP +_________________________ + + [2] Torek's discussion [Tor87] indicates that +dbm/ndbm implementations use the hash value to traverse +the radix trie differently than sdbm and as a result, +the page indexes are generated in different order. For +more information, send e-mail to the author. + [3] You cannot really hoard something that is avail- +able to the public at large, but try if it makes you +feel any better. + + + + + + + + + + + - 3 - + + +version of sdbm), Arnold Robbins, Chris Lewis, Bill David- +sen, Henry Spencer, Geoff Collyer, Rich Salz (who got me +started in the first place), Johannes Ruschein (who did the +minix port) and David Tilbrook. I thank you all. + +Distribution Manifest and Notes + +This distribution of sdbm includes (at least) the following: + + CHANGES change log + README this file. + biblio a small bibliography on external hashing + dba.c a crude (n/s)dbm page file analyzer + dbd.c a crude (n/s)dbm page file dumper (for conversion) + dbe.1 man page for dbe.c + dbe.c Janick's database editor + dbm.c a dbm library emulation wrapper for ndbm/sdbm + dbm.h header file for the above + dbu.c a crude db management utility + hash.c hashing function + makefile guess. + pair.c page-level routines (posted earlier) + pair.h header file for the above + readme.ms troff source for the README file + sdbm.3 man page + sdbm.c the real thing + sdbm.h header file for the above + tune.h place for tuning & portability thingies + util.c miscellaneous + + dbu is a simple database manipulation program[4] that +tries to look like Bell Labs' cbt utility. It is currently +incomplete in functionality. I use dbu to test out the rou- +tines: it takes (from stdin) tab separated key/value pairs +for commands like build or insert or takes keys for commands +like delete or look. + + dbu <build|creat|look|insert|cat|delete> dbmfile + + dba is a crude analyzer of dbm/sdbm/ndbm page files. It +scans the entire page file, reporting page level statistics, +and totals at the end. + + dbd is a crude dump program for dbm/ndbm/sdbm data- +bases. It ignores the bitmap, and dumps the data pages in +sequence. It can be used to create input for the dbu util- +ity. Note that dbd will skip any NULLs in the key and data +fields, thus is unsuitable to convert some peculiar +_________________________ + + [4] The dbd, dba, dbu utilities are quick hacks and +are not fit for production use. They were developed +late one night, just to test out sdbm, and convert some +databases. + + + + + + + + + + - 4 - + + +databases that insist in including the terminating null. + + I have also included a copy of the dbe (ndbm DataBase +Editor) by Janick Bergeron [janick@bnr.ca] for your pleas- +ure. You may find it more useful than the little dbu util- +ity. + + dbm.[ch] is a dbm library emulation on top of ndbm (and +hence suitable for sdbm). Written by Robert Elz. + + The sdbm library has been around in beta test for quite +a long time, and from whatever little feedback I received +(maybe no news is good news), I believe it has been func- +tioning without any significant problems. I would, of +course, appreciate all fixes and/or improvements. Portabil- +ity enhancements would especially be useful. + +Implementation Issues + + Hash functions: The algorithm behind sdbm implementa- +tion needs a good bit-scrambling hash function to be effec- +tive. I ran into a set of constants for a simple hash func- +tion that seem to help sdbm perform better than ndbm for +various inputs: + + /* + * polynomial conversion ignoring overflows + * 65599 nice. 65587 even better. + */ + long + dbm_hash(char *str, int len) { + register unsigned long n = 0; + + while (len--) + n = n * 65599 + *str++; + return n; + } + + There may be better hash functions for the purposes of +dynamic hashing. Try your favorite, and check the pagefile. +If it contains too many pages with too many holes, (in rela- +tion to this one for example) or if sdbm simply stops work- +ing (fails after SPLTMAX attempts to split) when you feed +your NEWS history file to it, you probably do not have a +good hashing function. If you do better (for different +types of input), I would like to know about the function you +use. + + Block sizes: It seems (from various tests on a few +machines) that a page file block size PBLKSIZ of 1024 is by +far the best for performance, but this also happens to limit +the size of a key/value pair. Depending on your needs, you +may wish to increase the page size, and also adjust PAIRMAX +(the maximum size of a key/value pair allowed: should always + + + + + + + + + + - 5 - + + +be at least three words smaller than PBLKSIZ.) accordingly. +The system-wide version of the library should probably be +configured with 1024 (distribution default), as this appears +to be sufficient for most common uses of sdbm. + +Portability + + This package has been tested in many different UN*Xes +even including minix, and appears to be reasonably portable. +This does not mean it will port easily to non-UN*X systems. + +Notes and Miscellaneous + + The sdbm is not a very complicated package, at least +not after you familiarize yourself with the literature on +external hashing. There are other interesting algorithms in +existence that ensure (approximately) single-read access to +a data value associated with any key. These are directory- +less schemes such as linear hashing [Lit80] (+ Larson varia- +tions), spiral storage [Mar79] or directory schemes such as +extensible hashing [Fag79] by Fagin et al. I do hope these +sources provide a reasonable playground for experimentation +with other algorithms. See the June 1988 issue of ACM Com- +puting Surveys [Enb88] for an excellent overview of the +field. + +References + + +[Lar78] + P.-A. Larson, ``Dynamic Hashing'', BIT, vol. 18, pp. + 184-201, 1978. + +[Tho90] + Ken Thompson, private communication, Nov. 1990 + +[Lit80] + W. Litwin, `` Linear Hashing: A new tool for file and + table addressing'', Proceedings of the 6th Conference on + Very Large Dabatases (Montreal), pp. 212-223, Very + Large Database Foundation, Saratoga, Calif., 1980. + +[Fag79] + R. Fagin, J. Nievergelt, N. Pippinger, and H. R. + Strong, ``Extendible Hashing - A Fast Access Method for + Dynamic Files'', ACM Trans. Database Syst., vol. 4, + no.3, pp. 315-344, Sept. 1979. + +[Wal84] + Rich Wales, ``Discussion of "dbm" data base system'', + USENET newsgroup unix.wizards, Jan. 1984. + +[Tor87] + Chris Torek, ``Re: dbm.a and ndbm.a archives'', + + + + + + + + + + - 6 - + + + USENET newsgroup comp.unix, 1987. + +[Mar79] + G. N. Martin, ``Spiral Storage: Incrementally Augment- + able Hash Addressed Storage'', Technical Report #27, + University of Varwick, Coventry, U.K., 1979. + +[Enb88] + R. J. Enbody and H. C. Du, ``Dynamic Hashing + Schemes'',ACM Computing Surveys, vol. 20, no. 2, pp. + 85-113, June 1988. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + |