summaryrefslogtreecommitdiff
path: root/ext/dbm/sdbm/readme.ms
diff options
context:
space:
mode:
Diffstat (limited to 'ext/dbm/sdbm/readme.ms')
-rw-r--r--ext/dbm/sdbm/readme.ms353
1 files changed, 353 insertions, 0 deletions
diff --git a/ext/dbm/sdbm/readme.ms b/ext/dbm/sdbm/readme.ms
new file mode 100644
index 0000000000..01ca17ccdf
--- /dev/null
+++ b/ext/dbm/sdbm/readme.ms
@@ -0,0 +1,353 @@
+.\" tbl | readme.ms | [tn]roff -ms | ...
+.\" note the "C" (courier) and "CB" fonts: you will probably have to
+.\" change these.
+.\" $Id: readme.ms,v 1.1 90/12/13 13:09:15 oz Exp Locker: oz $
+
+.de P1
+.br
+.nr dT 4
+.nf
+.ft C
+.sp .5
+.nr t \\n(dT*\\w'x'u
+.ta 1u*\\ntu 2u*\\ntu 3u*\\ntu 4u*\\ntu 5u*\\ntu 6u*\\ntu 7u*\\ntu 8u*\\ntu 9u*\\ntu 10u*\\ntu 11u*\\ntu 12u*\\ntu 13u*\\ntu 14u*\\ntu
+..
+.de P2
+.br
+.ft 1
+.br
+.sp .5
+.br
+.fi
+..
+.\" CW uses the typewriter/courier font.
+.de CW
+\fC\\$1\\fP\\$2
+..
+
+.\" Footnote numbering [by Henry Spencer]
+.\" <text>\*f for a footnote number..
+.\" .FS
+.\" \*F <footnote text>
+.\" .FE
+.\"
+.ds f \\u\\s-2\\n+f\\s+2\\d
+.nr f 0 1
+.ds F \\n+F.
+.nr F 0 1
+
+.ND
+.LP
+.TL
+\fIsdbm\fP \(em Substitute DBM
+.br
+or
+.br
+Berkeley \fIndbm\fP for Every UN*X\** Made Simple
+.AU
+Ozan (oz) Yigit
+.AI
+The Guild of PD Software Toolmakers
+Toronto - Canada
+.sp
+oz@nexus.yorku.ca
+.LP
+.FS
+UN*X is not a trademark of any (dis)organization.
+.FE
+.sp 2
+\fIImplementation is the sincerest form of flattery. \(em L. Peter Deutsch\fP
+.SH
+A The Clone of the \fIndbm\fP library
+.PP
+The sources accompanying this notice \(em \fIsdbm\fP \(em constitute
+the first public release (Dec. 1990) of a complete clone of
+the Berkeley UN*X \fIndbm\fP library. The \fIsdbm\fP library is meant to
+clone the proven functionality of \fIndbm\fP as closely as possible,
+including a few improvements. It is practical, easy to understand, and
+compatible.
+The \fIsdbm\fP library is not derived from any licensed, proprietary or
+copyrighted software.
+.PP
+The \fIsdbm\fP implementation is based on a 1978 algorithm
+[Lar78] by P.-A. (Paul) Larson known as ``Dynamic Hashing''.
+In the course of searching for a substitute for \fIndbm\fP, I
+prototyped three different external-hashing algorithms [Lar78, Fag79, Lit80]
+and ultimately chose Larson's algorithm as a basis of the \fIsdbm\fP
+implementation. The Bell Labs
+\fIdbm\fP (and therefore \fIndbm\fP) is based on an algorithm invented by
+Ken Thompson, [Tho90, Tor87] and predates Larson's work.
+.PP
+The \fIsdbm\fR programming interface is totally compatible
+with \fIndbm\fP and includes a slight improvement in database initialization.
+It is also expected to be binary-compatible under most UN*X versions that
+support the \fIndbm\fP library.
+.PP
+The \fIsdbm\fP implementation shares the shortcomings of the \fIndbm\fP
+library, as a side effect of various simplifications to the original Larson
+algorithm. It does produce \fIholes\fP in the page file as it writes
+pages past the end of file. (Larson's paper include a clever solution to
+this problem that is a result of using the hash value directly as a block
+address.) On the other hand, extensive tests seem to indicate that \fIsdbm\fP
+creates fewer holes in general, and the resulting pagefiles are
+smaller. The \fIsdbm\fP implementation is also faster than \fIndbm\fP
+in database creation.
+Unlike the \fIndbm\fP, the \fIsdbm\fP
+.CW store
+operation will not ``wander away'' trying to split its
+data pages to insert a datum that \fIcannot\fP (due to elaborate worst-case
+situations) be inserted. (It will fail after a pre-defined number of attempts.)
+.SH
+Important Compatibility Warning
+.PP
+The \fIsdbm\fP and \fIndbm\fP
+libraries \fIcannot\fP share databases: one cannot read the (dir/pag)
+database created by the other. This is due to the differences
+between the \fIndbm\fP and \fIsdbm\fP algorithms\**,
+.FS
+Torek's discussion [Tor87]
+indicates that \fIdbm/ndbm\fP implementations use the hash
+value to traverse the radix trie differently than \fIsdbm\fP
+and as a result, the page indexes are generated in \fIdifferent\fP order.
+For more information, send e-mail to the author.
+.FE
+and the hash functions
+used.
+It is easy to convert between the \fIdbm/ndbm\fP databases and \fIsdbm\fP
+by ignoring the index completely: see
+.CW dbd ,
+.CW dbu
+etc.
+.R
+.LP
+.SH
+Notice of Intellectual Property
+.LP
+\fIThe entire\fP sdbm \fIlibrary package, as authored by me,\fP Ozan S. Yigit,
+\fIis hereby placed in the public domain.\fP As such, the author is not
+responsible for the consequences of use of this software, no matter how
+awful, even if they arise from defects in it. There is no expressed or
+implied warranty for the \fIsdbm\fP library.
+.PP
+Since the \fIsdbm\fP
+library package is in the public domain, this \fIoriginal\fP
+release or any additional public-domain releases of the modified original
+cannot possibly (by definition) be withheld from you. Also by definition,
+You (singular) have all the rights to this code (including the right to
+sell without permission, the right to hoard\**
+.FS
+You cannot really hoard something that is available to the public at
+large, but try if it makes you feel any better.
+.FE
+and the right to do other icky things as
+you see fit) but those rights are also granted to everyone else.
+.PP
+Please note that all previous distributions of this software contained
+a copyright (which is now dropped) to protect its
+origins and its current public domain status against any possible claims
+and/or challenges.
+.SH
+Acknowledgments
+.PP
+Many people have been very helpful and supportive. A partial list would
+necessarily include Rayan Zacherissen (who contributed the man page,
+and also hacked a MMAP version of \fIsdbm\fP),
+Arnold Robbins, Chris Lewis,
+Bill Davidsen, Henry Spencer, Geoff Collyer, Rich Salz (who got me started
+in the first place), Johannes Ruschein
+(who did the minix port) and David Tilbrook. I thank you all.
+.SH
+Distribution Manifest and Notes
+.LP
+This distribution of \fIsdbm\fP includes (at least) the following:
+.P1
+ CHANGES change log
+ README this file.
+ biblio a small bibliography on external hashing
+ dba.c a crude (n/s)dbm page file analyzer
+ dbd.c a crude (n/s)dbm page file dumper (for conversion)
+ dbe.1 man page for dbe.c
+ dbe.c Janick's database editor
+ dbm.c a dbm library emulation wrapper for ndbm/sdbm
+ dbm.h header file for the above
+ dbu.c a crude db management utility
+ hash.c hashing function
+ makefile guess.
+ pair.c page-level routines (posted earlier)
+ pair.h header file for the above
+ readme.ms troff source for the README file
+ sdbm.3 man page
+ sdbm.c the real thing
+ sdbm.h header file for the above
+ tune.h place for tuning & portability thingies
+ util.c miscellaneous
+.P2
+.PP
+.CW dbu
+is a simple database manipulation program\** that tries to look
+.FS
+The
+.CW dbd ,
+.CW dba ,
+.CW dbu
+utilities are quick hacks and are not fit for production use. They were
+developed late one night, just to test out \fIsdbm\fP, and convert some
+databases.
+.FE
+like Bell Labs'
+.CW cbt
+utility. It is currently incomplete in functionality.
+I use
+.CW dbu
+to test out the routines: it takes (from stdin) tab separated
+key/value pairs for commands like
+.CW build
+or
+.CW insert
+or takes keys for
+commands like
+.CW delete
+or
+.CW look .
+.P1
+ dbu <build|creat|look|insert|cat|delete> dbmfile
+.P2
+.PP
+.CW dba
+is a crude analyzer of \fIdbm/sdbm/ndbm\fP
+page files. It scans the entire
+page file, reporting page level statistics, and totals at the end.
+.PP
+.CW dbd
+is a crude dump program for \fIdbm/ndbm/sdbm\fP
+databases. It ignores the
+bitmap, and dumps the data pages in sequence. It can be used to create
+input for the
+.CW dbu
+utility.
+Note that
+.CW dbd
+will skip any NULLs in the key and data
+fields, thus is unsuitable to convert some peculiar databases that
+insist in including the terminating null.
+.PP
+I have also included a copy of the
+.CW dbe
+(\fIndbm\fP DataBase Editor) by Janick Bergeron [janick@bnr.ca] for
+your pleasure. You may find it more useful than the little
+.CW dbu
+utility.
+.PP
+.CW dbm.[ch]
+is a \fIdbm\fP library emulation on top of \fIndbm\fP
+(and hence suitable for \fIsdbm\fP). Written by Robert Elz.
+.PP
+The \fIsdbm\fP
+library has been around in beta test for quite a long time, and from whatever
+little feedback I received (maybe no news is good news), I believe it has been
+functioning without any significant problems. I would, of course, appreciate
+all fixes and/or improvements. Portability enhancements would especially be
+useful.
+.SH
+Implementation Issues
+.PP
+Hash functions:
+The algorithm behind \fIsdbm\fP implementation needs a good bit-scrambling
+hash function to be effective. I ran into a set of constants for a simple
+hash function that seem to help \fIsdbm\fP perform better than \fIndbm\fP
+for various inputs:
+.P1
+ /*
+ * polynomial conversion ignoring overflows
+ * 65599 nice. 65587 even better.
+ */
+ long
+ dbm_hash(char *str, int len) {
+ register unsigned long n = 0;
+
+ while (len--)
+ n = n * 65599 + *str++;
+ return n;
+ }
+.P2
+.PP
+There may be better hash functions for the purposes of dynamic hashing.
+Try your favorite, and check the pagefile. If it contains too many pages
+with too many holes, (in relation to this one for example) or if
+\fIsdbm\fP
+simply stops working (fails after
+.CW SPLTMAX
+attempts to split) when you feed your
+NEWS
+.CW history
+file to it, you probably do not have a good hashing function.
+If you do better (for different types of input), I would like to know
+about the function you use.
+.PP
+Block sizes: It seems (from various tests on a few machines) that a page
+file block size
+.CW PBLKSIZ
+of 1024 is by far the best for performance, but
+this also happens to limit the size of a key/value pair. Depending on your
+needs, you may wish to increase the page size, and also adjust
+.CW PAIRMAX
+(the maximum size of a key/value pair allowed: should always be at least
+three words smaller than
+.CW PBLKSIZ .)
+accordingly. The system-wide version of the library
+should probably be
+configured with 1024 (distribution default), as this appears to be sufficient
+for most common uses of \fIsdbm\fP.
+.SH
+Portability
+.PP
+This package has been tested in many different UN*Xes even including minix,
+and appears to be reasonably portable. This does not mean it will port
+easily to non-UN*X systems.
+.SH
+Notes and Miscellaneous
+.PP
+The \fIsdbm\fP is not a very complicated package, at least not after you
+familiarize yourself with the literature on external hashing. There are
+other interesting algorithms in existence that ensure (approximately)
+single-read access to a data value associated with any key. These are
+directory-less schemes such as \fIlinear hashing\fP [Lit80] (+ Larson
+variations), \fIspiral storage\fP [Mar79] or directory schemes such as
+\fIextensible hashing\fP [Fag79] by Fagin et al. I do hope these sources
+provide a reasonable playground for experimentation with other algorithms.
+See the June 1988 issue of ACM Computing Surveys [Enb88] for an
+excellent overview of the field.
+.PG
+.SH
+References
+.LP
+.IP [Lar78] 4m
+P.-A. Larson,
+``Dynamic Hashing'', \fIBIT\fP, vol. 18, pp. 184-201, 1978.
+.IP [Tho90] 4m
+Ken Thompson, \fIprivate communication\fP, Nov. 1990
+.IP [Lit80] 4m
+W. Litwin,
+`` Linear Hashing: A new tool for file and table addressing'',
+\fIProceedings of the 6th Conference on Very Large Dabatases (Montreal)\fP,
+pp. 212-223, Very Large Database Foundation, Saratoga, Calif., 1980.
+.IP [Fag79] 4m
+R. Fagin, J. Nievergelt, N. Pippinger, and H. R. Strong,
+``Extendible Hashing - A Fast Access Method for Dynamic Files'',
+\fIACM Trans. Database Syst.\fP, vol. 4, no.3, pp. 315-344, Sept. 1979.
+.IP [Wal84] 4m
+Rich Wales,
+``Discussion of "dbm" data base system'', \fIUSENET newsgroup unix.wizards\fP,
+Jan. 1984.
+.IP [Tor87] 4m
+Chris Torek,
+``Re: dbm.a and ndbm.a archives'', \fIUSENET newsgroup comp.unix\fP,
+1987.
+.IP [Mar79] 4m
+G. N. Martin,
+``Spiral Storage: Incrementally Augmentable Hash Addressed Storage'',
+\fITechnical Report #27\fP, University of Varwick, Coventry, U.K., 1979.
+.IP [Enb88] 4m
+R. J. Enbody and H. C. Du,
+``Dynamic Hashing Schemes'',\fIACM Computing Surveys\fP,
+vol. 20, no. 2, pp. 85-113, June 1988.