.\" tbl | readme.ms | [tn]roff -ms | ... .\" note the "C" (courier) and "CB" fonts: you will probably have to .\" change these. .\" $Id: readme.ms,v 1.1 90/12/13 13:09:15 oz Exp Locker: oz $ .de P1 .br .nr dT 4 .nf .ft C .sp .5 .nr t \\n(dT*\\w'x'u .ta 1u*\\ntu 2u*\\ntu 3u*\\ntu 4u*\\ntu 5u*\\ntu 6u*\\ntu 7u*\\ntu 8u*\\ntu 9u*\\ntu 10u*\\ntu 11u*\\ntu 12u*\\ntu 13u*\\ntu 14u*\\ntu .. .de P2 .br .ft 1 .br .sp .5 .br .fi .. .\" CW uses the typewriter/courier font. .de CW \fC\\$1\\fP\\$2 .. .\" Footnote numbering [by Henry Spencer] .\" \*f for a footnote number.. .\" .FS .\" \*F .\" .FE .\" .ds f \\u\\s-2\\n+f\\s+2\\d .nr f 0 1 .ds F \\n+F. .nr F 0 1 .ND .LP .TL \fIsdbm\fP \(em Substitute DBM .br or .br Berkeley \fIndbm\fP for Every UN*X\** Made Simple .AU Ozan (oz) Yigit .AI The Guild of PD Software Toolmakers Toronto - Canada .sp oz@nexus.yorku.ca .LP .FS UN*X is not a trademark of any (dis)organization. .FE .sp 2 \fIImplementation is the sincerest form of flattery. \(em L. Peter Deutsch\fP .SH A The Clone of the \fIndbm\fP library .PP The sources accompanying this notice \(em \fIsdbm\fP \(em constitute the first public release (Dec. 1990) of a complete clone of the Berkeley UN*X \fIndbm\fP library. The \fIsdbm\fP library is meant to clone the proven functionality of \fIndbm\fP as closely as possible, including a few improvements. It is practical, easy to understand, and compatible. The \fIsdbm\fP library is not derived from any licensed, proprietary or copyrighted software. .PP The \fIsdbm\fP implementation is based on a 1978 algorithm [Lar78] by P.-A. (Paul) Larson known as "Dynamic Hashing". In the course of searching for a substitute for \fIndbm\fP, I prototyped three different external-hashing algorithms [Lar78, Fag79, Lit80] and ultimately chose Larson's algorithm as a basis of the \fIsdbm\fP implementation. The Bell Labs \fIdbm\fP (and therefore \fIndbm\fP) is based on an algorithm invented by Ken Thompson, [Tho90, Tor87] and predates Larson's work. .PP The \fIsdbm\fR programming interface is totally compatible with \fIndbm\fP and includes a slight improvement in database initialization. It is also expected to be binary-compatible under most UN*X versions that support the \fIndbm\fP library. .PP The \fIsdbm\fP implementation shares the shortcomings of the \fIndbm\fP library, as a side effect of various simplifications to the original Larson algorithm. It does produce \fIholes\fP in the page file as it writes pages past the end of file. (Larson's paper include a clever solution to this problem that is a result of using the hash value directly as a block address.) On the other hand, extensive tests seem to indicate that \fIsdbm\fP creates fewer holes in general, and the resulting pagefiles are smaller. The \fIsdbm\fP implementation is also faster than \fIndbm\fP in database creation. Unlike the \fIndbm\fP, the \fIsdbm\fP .CW store operation will not "wander away" trying to split its data pages to insert a datum that \fIcannot\fP (due to elaborate worst-case situations) be inserted. (It will fail after a pre-defined number of attempts.) .SH Important Compatibility Warning .PP The \fIsdbm\fP and \fIndbm\fP libraries \fIcannot\fP share databases: one cannot read the (dir/pag) database created by the other. This is due to the differences between the \fIndbm\fP and \fIsdbm\fP algorithms\**, .FS Torek's discussion [Tor87] indicates that \fIdbm/ndbm\fP implementations use the hash value to traverse the radix trie differently than \fIsdbm\fP and as a result, the page indexes are generated in \fIdifferent\fP order. For more information, send e-mail to the author. .FE and the hash functions used. It is easy to convert between the \fIdbm/ndbm\fP databases and \fIsdbm\fP by ignoring the index completely: see .CW dbd , .CW dbu etc. .R .LP .SH Notice of Intellectual Property .LP \fIThe entire\fP sdbm \fIlibrary package, as authored by me,\fP Ozan S. Yigit, \fIis hereby placed in the public domain.\fP As such, the author is not responsible for the consequences of use of this software, no matter how awful, even if they arise from defects in it. There is no expressed or implied warranty for the \fIsdbm\fP library. .PP Since the \fIsdbm\fP library package is in the public domain, this \fIoriginal\fP release or any additional public-domain releases of the modified original cannot possibly (by definition) be withheld from you. Also by definition, You (singular) have all the rights to this code (including the right to sell without permission, the right to hoard\** .FS You cannot really hoard something that is available to the public at large, but try if it makes you feel any better. .FE and the right to do other icky things as you see fit) but those rights are also granted to everyone else. .PP Please note that all previous distributions of this software contained a copyright (which is now dropped) to protect its origins and its current public domain status against any possible claims and/or challenges. .SH Acknowledgments .PP Many people have been very helpful and supportive. A partial list would necessarily include Rayan Zacherissen (who contributed the man page, and also hacked a MMAP version of \fIsdbm\fP), Arnold Robbins, Chris Lewis, Bill Davidsen, Henry Spencer, Geoff Collyer, Rich Salz (who got me started in the first place), Johannes Ruschein (who did the minix port) and David Tilbrook. I thank you all. .SH Distribution Manifest and Notes .LP This distribution of \fIsdbm\fP includes (at least) the following: .P1 CHANGES change log README this file. biblio a small bibliography on external hashing dba.c a crude (n/s)dbm page file analyzer dbd.c a crude (n/s)dbm page file dumper (for conversion) dbe.1 man page for dbe.c dbe.c Janick's database editor dbm.c a dbm library emulation wrapper for ndbm/sdbm dbm.h header file for the above dbu.c a crude db management utility hash.c hashing function makefile guess. pair.c page-level routines (posted earlier) pair.h header file for the above readme.ms troff source for the README file sdbm.3 man page sdbm.c the real thing sdbm.h header file for the above tune.h place for tuning & portability thingies util.c miscellaneous .P2 .PP .CW dbu is a simple database manipulation program\** that tries to look .FS The .CW dbd , .CW dba , .CW dbu utilities are quick hacks and are not fit for production use. They were developed late one night, just to test out \fIsdbm\fP, and convert some databases. .FE like Bell Labs' .CW cbt utility. It is currently incomplete in functionality. I use .CW dbu to test out the routines: it takes (from stdin) tab separated key/value pairs for commands like .CW build or .CW insert or takes keys for commands like .CW delete or .CW look . .P1 dbu dbmfile .P2 .PP .CW dba is a crude analyzer of \fIdbm/sdbm/ndbm\fP page files. It scans the entire page file, reporting page level statistics, and totals at the end. .PP .CW dbd is a crude dump program for \fIdbm/ndbm/sdbm\fP databases. It ignores the bitmap, and dumps the data pages in sequence. It can be used to create input for the .CW dbu utility. Note that .CW dbd will skip any NULLs in the key and data fields, thus is unsuitable to convert some peculiar databases that insist in including the terminating null. .PP I have also included a copy of the .CW dbe (\fIndbm\fP DataBase Editor) by Janick Bergeron [janick@bnr.ca] for your pleasure. You may find it more useful than the little .CW dbu utility. .PP .CW dbm.[ch] is a \fIdbm\fP library emulation on top of \fIndbm\fP (and hence suitable for \fIsdbm\fP). Written by Robert Elz. .PP The \fIsdbm\fP library has been around in beta test for quite a long time, and from whatever little feedback I received (maybe no news is good news), I believe it has been functioning without any significant problems. I would, of course, appreciate all fixes and/or improvements. Portability enhancements would especially be useful. .SH Implementation Issues .PP Hash functions: The algorithm behind \fIsdbm\fP implementation needs a good bit-scrambling hash function to be effective. I ran into a set of constants for a simple hash function that seem to help \fIsdbm\fP perform better than \fIndbm\fP for various inputs: .P1 /* * polynomial conversion ignoring overflows * 65599 nice. 65587 even better. */ long dbm_hash(char *str, int len) { unsigned long n = 0; while (len--) n = n * 65599 + *str++; return n; } .P2 .PP There may be better hash functions for the purposes of dynamic hashing. Try your favorite, and check the pagefile. If it contains too many pages with too many holes, (in relation to this one for example) or if \fIsdbm\fP simply stops working (fails after .CW SPLTMAX attempts to split) when you feed your NEWS .CW history file to it, you probably do not have a good hashing function. If you do better (for different types of input), I would like to know about the function you use. .PP Block sizes: It seems (from various tests on a few machines) that a page file block size .CW PBLKSIZ of 1024 is by far the best for performance, but this also happens to limit the size of a key/value pair. Depending on your needs, you may wish to increase the page size, and also adjust .CW PAIRMAX (the maximum size of a key/value pair allowed: should always be at least three words smaller than .CW PBLKSIZ .) accordingly. The system-wide version of the library should probably be configured with 1024 (distribution default), as this appears to be sufficient for most common uses of \fIsdbm\fP. .SH Portability .PP This package has been tested in many different UN*Xes even including minix, and appears to be reasonably portable. This does not mean it will port easily to non-UN*X systems. .SH Notes and Miscellaneous .PP The \fIsdbm\fP is not a very complicated package, at least not after you familiarize yourself with the literature on external hashing. There are other interesting algorithms in existence that ensure (approximately) single-read access to a data value associated with any key. These are directory-less schemes such as \fIlinear hashing\fP [Lit80] (+ Larson variations), \fIspiral storage\fP [Mar79] or directory schemes such as \fIextensible hashing\fP [Fag79] by Fagin et al. I do hope these sources provide a reasonable playground for experimentation with other algorithms. See the June 1988 issue of ACM Computing Surveys [Enb88] for an excellent overview of the field. .PG .SH References .LP .IP [Lar78] 4m P.-A. Larson, "Dynamic Hashing", \fIBIT\fP, vol. 18, pp. 184-201, 1978. .IP [Tho90] 4m Ken Thompson, \fIprivate communication\fP, Nov. 1990 .IP [Lit80] 4m W. Litwin, "Linear Hashing: A new tool for file and table addressing", \fIProceedings of the 6th Conference on Very Large Dabatases (Montreal)\fP, pp. 212-223, Very Large Database Foundation, Saratoga, Calif., 1980. .IP [Fag79] 4m R. Fagin, J. Nievergelt, N. Pippinger, and H. R. Strong, "Extendible Hashing - A Fast Access Method for Dynamic Files", \fIACM Trans. Database Syst.\fP, vol. 4, no.3, pp. 315-344, Sept. 1979. .IP [Wal84] 4m Rich Wales, "Discussion of 'dbm' data base system", \fIUSENET newsgroup unix.wizards\fP, Jan. 1984. .IP [Tor87] 4m Chris Torek, "Re: dbm.a and ndbm.a archives", \fIUSENET newsgroup comp.unix\fP, 1987. .IP [Mar79] 4m G. N. Martin, "Spiral Storage: Incrementally Augmentable Hash Addressed Storage", \fITechnical Report #27\fP, University of Varwick, Coventry, U.K., 1979. .IP [Enb88] 4m R. J. Enbody and H. C. Du, "Dynamic Hashing Schemes",\fIACM Computing Surveys\fP, vol. 20, no. 2, pp. 85-113, June 1988.