.\" tbl | readme.ms | [tn]roff -ms | ...
.\" note the "C" (courier) and "CB" fonts: you will probably have to
.\" change these.
.\" $Id: readme.ms,v 1.1 90/12/13 13:09:15 oz Exp Locker: oz $

.de P1
.br
.nr dT 4
.nf
.ft C
.sp .5
.nr t \\n(dT*\\w'x'u
.ta 1u*\\ntu 2u*\\ntu 3u*\\ntu 4u*\\ntu 5u*\\ntu 6u*\\ntu 7u*\\ntu 8u*\\ntu 9u*\\ntu 10u*\\ntu 11u*\\ntu 12u*\\ntu 13u*\\ntu 14u*\\ntu
..
.de P2
.br
.ft 1
.br
.sp .5
.br
.fi
..
.\" CW uses the typewriter/courier font.
.de CW
\fC\\$1\\fP\\$2
..

.\" Footnote numbering [by Henry Spencer]
.\" <text>\*f for a footnote number..
.\" .FS
.\" \*F <footnote text>
.\" .FE
.\"
.ds f \\u\\s-2\\n+f\\s+2\\d
.nr f 0 1
.ds F \\n+F.
.nr F 0 1

.ND
.LP
.TL
\fIsdbm\fP \(em Substitute DBM
.br
or
.br
Berkeley \fIndbm\fP for Every UN*X\** Made Simple
.AU
Ozan (oz) Yigit
.AI
The Guild of PD Software Toolmakers
Toronto - Canada
.sp
oz@nexus.yorku.ca
.LP
.FS
UN*X is not a trademark of any (dis)organization.
.FE
.sp 2
\fIImplementation is the sincerest form of flattery. \(em L. Peter Deutsch\fP
.SH
A Clone of the \fIndbm\fP Library
.PP
The sources accompanying this notice \(em \fIsdbm\fP \(em constitute
the first public release (Dec. 1990) of a complete clone of
the Berkeley UN*X \fIndbm\fP library. The \fIsdbm\fP library is meant to
clone the proven functionality of \fIndbm\fP as closely as possible,
including a few improvements. It is practical, easy to understand, and
compatible.
The \fIsdbm\fP library is not derived from any licensed, proprietary or
copyrighted software.
.PP
The \fIsdbm\fP implementation is based on a 1978 algorithm
[Lar78] by P.-A. (Paul) Larson known as ``Dynamic Hashing''.
In the course of searching for a substitute for \fIndbm\fP, I
prototyped three different external-hashing algorithms [Lar78, Fag79, Lit80]
and ultimately chose Larson's algorithm as a basis of the \fIsdbm\fP
implementation. The Bell Labs
\fIdbm\fP (and therefore \fIndbm\fP) is based on an algorithm invented by
Ken Thompson [Tho90, Tor87], and predates Larson's work.
.PP
The \fIsdbm\fP programming interface is totally compatible
with \fIndbm\fP and includes a slight improvement in database initialization.
It is also expected to be binary-compatible under most UN*X versions that
support the \fIndbm\fP library.
.PP
The \fIsdbm\fP implementation shares the shortcomings of the \fIndbm\fP
library, as a side effect of various simplifications to the original Larson
algorithm. It does produce \fIholes\fP in the page file as it writes
pages past the end of file. (Larson's paper includes a clever solution to
this problem that is a result of using the hash value directly as a block
address.) On the other hand, extensive tests seem to indicate that \fIsdbm\fP
creates fewer holes in general, and the resulting pagefiles are
smaller. The \fIsdbm\fP implementation is also faster than \fIndbm\fP
in database creation.
Unlike \fIndbm\fP, the \fIsdbm\fP
.CW store
operation will not ``wander away'' trying to split its
data pages to insert a datum that \fIcannot\fP (due to elaborate worst-case
situations) be inserted. (It will fail after a pre-defined number of attempts.)
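.PP
The holes mentioned above are easy to observe. The following is a minimal
sketch and is not taken from the \fIsdbm\fP sources: it assumes a POSIX
system, an illustrative block size
.CW BLKSIZ ,
and a hypothetical helper
.CW write_page_at
that writes one block at a given page number. Writing page 4 into an empty
file yields a five-block file whose first four blocks were never written;
on file systems that support sparse files, those blocks occupy no disk space.
.P1
```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLKSIZ 1024	/* illustrative block size, assumed for this sketch */

/*
 * Write one zero-filled block at page number pagno.
 * Returns the resulting file size, or -1 on error.
 */
off_t
write_page_at(int fd, long pagno)
{
	char buf[BLKSIZ];

	memset(buf, 0, sizeof buf);
	if (lseek(fd, (off_t) pagno * BLKSIZ, SEEK_SET) < 0)
		return -1;
	if (write(fd, buf, sizeof buf) != (ssize_t) sizeof buf)
		return -1;
	/* the file now ends right after the block just written */
	return lseek(fd, 0, SEEK_END);
}
```
.P2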
.SH
Important Compatibility Warning
.PP
The \fIsdbm\fP and \fIndbm\fP
libraries \fIcannot\fP share databases: one cannot read the (dir/pag)
database created by the other. This is due to the differences
between the \fIndbm\fP and \fIsdbm\fP algorithms\**, 
.FS
Torek's discussion [Tor87]
indicates that \fIdbm/ndbm\fP implementations use the hash
value to traverse the radix trie differently than \fIsdbm\fP
and as a result, the page indexes are generated in \fIdifferent\fP order.
For more information, send e-mail to the author.
.FE
and the hash functions
used.
It is easy to convert between the \fIdbm/ndbm\fP databases and \fIsdbm\fP
by ignoring the index completely: see
.CW dbd ,
.CW dbu
etc.
.R
.LP
.SH
Notice of Intellectual Property
.LP
\fIThe entire\fP sdbm  \fIlibrary package, as authored by me,\fP Ozan S. Yigit,
\fIis hereby placed in the public domain.\fP As such, the author is not
responsible for the consequences of use of this software, no matter how
awful, even if they arise from defects in it. There is no expressed or
implied warranty for the \fIsdbm\fP library.
.PP
Since the \fIsdbm\fP
library package is in the public domain, this \fIoriginal\fP
release or any additional public-domain releases of the modified original
cannot possibly (by definition) be withheld from you. Also by definition,
You (singular) have all the rights to this code (including the right to
sell without permission, the right to hoard\**
.FS
You cannot really hoard something that is available to the public at
large, but try if it makes you feel any better.
.FE
and the right to do other icky things as
you see fit) but those rights are also granted to everyone else.
.PP
Please note that all previous distributions of this software contained
a copyright (which is now dropped) to protect its
origins and its current public domain status against any possible claims
and/or challenges.
.SH
Acknowledgments
.PP
Many people have been very helpful and supportive.  A partial list would
necessarily include Rayan Zachariassen (who contributed the man page,
and also hacked an MMAP version of \fIsdbm\fP),
Arnold Robbins, Chris Lewis,
Bill Davidsen, Henry Spencer, Geoff Collyer, Rich Salz (who got me started
in the first place), Johannes Ruschein
(who did the minix port) and David Tilbrook. I thank you all.
.SH
Distribution Manifest and Notes
.LP
This distribution of \fIsdbm\fP includes (at least) the following:
.P1
	CHANGES		change log
	README		this file.
	biblio		a small bibliography on external hashing
	dba.c		a crude (n/s)dbm page file analyzer
	dbd.c		a crude (n/s)dbm page file dumper (for conversion)
	dbe.1		man page for dbe.c
	dbe.c		Janick's database editor
	dbm.c		a dbm library emulation wrapper for ndbm/sdbm
	dbm.h		header file for the above
	dbu.c		a crude db management utility
	hash.c		hashing function
	makefile	guess.
	pair.c		page-level routines (posted earlier)
	pair.h		header file for the above
	readme.ms	troff source for the README file
	sdbm.3		man page
	sdbm.c		the real thing
	sdbm.h		header file for the above
	tune.h		place for tuning & portability thingies
	util.c		miscellaneous
.P2
.PP
.CW dbu
is a simple database manipulation program\** that tries to look
.FS
The 
.CW dbd ,
.CW dba ,
.CW dbu
utilities are quick hacks and are not fit for production use. They were
developed late one night, just to test out \fIsdbm\fP, and convert some
databases.
.FE
like Bell Labs'
.CW cbt
utility. It is currently incomplete in functionality.
I use
.CW dbu
to test out the routines: it takes (from stdin) tab-separated
key/value pairs for commands like
.CW build
or
.CW insert
or takes keys for
commands like
.CW delete
or
.CW look .
.P1
	dbu <build|creat|look|insert|cat|delete> dbmfile
.P2
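.PP
The input format can be illustrated with a short sketch. This is not the
code used by
.CW dbu ;
.CW split_pair
is a hypothetical helper, and it assumes that each line holds a key, a single
tab, and a value, with the trailing newline already removed.
.P1
```c
#include <stddef.h>
#include <string.h>

/*
 * Split "key<TAB>value" in place at the first tab.
 * On success, *key and *val point into line and 0 is returned;
 * -1 is returned when the line contains no tab.
 */
int
split_pair(char *line, char **key, char **val)
{
	char *tab = strchr(line, 9);	/* 9 is the ASCII code for tab */

	if (tab == NULL)
		return -1;
	*tab = 0;			/* terminate the key string */
	*key = line;
	*val = tab + 1;
	return 0;
}
```
.P2
A line without a tab is rejected rather than treated as a key with an empty
value; a real tool might reasonably choose otherwise.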
.PP
.CW dba
is a crude analyzer of \fIdbm/sdbm/ndbm\fP
page files. It scans the entire
page file, reporting page level statistics, and totals at the end.
.PP
.CW dbd
is a crude dump program for \fIdbm/ndbm/sdbm\fP
databases. It ignores the
bitmap, and dumps the data pages in sequence. It can be used to create
input for the
.CW dbu 
utility.
Note that
.CW dbd
will skip any NUL bytes in the key and data
fields, and is thus unsuitable for converting some peculiar databases that
insist on including the terminating NUL.
.PP
I have also included a copy of the
.CW dbe
(\fIndbm\fP DataBase Editor) by Janick Bergeron [janick@bnr.ca] for
your pleasure. You may find it more useful than the little
.CW dbu
utility.
.PP
.CW dbm.[ch]
is a \fIdbm\fP library emulation on top of \fIndbm\fP
(and hence suitable for \fIsdbm\fP). Written by Robert Elz.
.PP
The \fIsdbm\fP
library has been around in beta test for quite a long time, and from whatever
little feedback I received (maybe no news is good news), I believe it has been
functioning without any significant problems. I would, of course, appreciate
all fixes and/or improvements. Portability enhancements would especially be
useful.
.SH
Implementation Issues
.PP
Hash functions:
The algorithm behind the \fIsdbm\fP implementation needs a good bit-scrambling
hash function to be effective. I ran into a set of constants for a simple
hash function that seem to help \fIsdbm\fP perform better than \fIndbm\fP
for various inputs:
.P1
	/*
	 * polynomial conversion ignoring overflows
	 * 65599 nice. 65587 even better.
	 */
	long
	dbm_hash(char *str, int len) {
		register unsigned long n = 0;
	
		while (len--)
			n = n * 65599 + *str++;
		return n;
	}
.P2
.PP
There may be better hash functions for the purposes of dynamic hashing.
Try your favorite, and check the pagefile. If it contains too many pages
with too many holes (in relation to this one, for example), or if
\fIsdbm\fP
simply stops working (fails after 
.CW SPLTMAX
attempts to split) when you feed your
NEWS 
.CW history
file to it, you probably do not have a good hashing function.
If you do better (for different types of input), I would like to know
about the function you use.
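.PP
Before building a full database, a candidate multiplier can be given a crude
first look by counting how many buckets the low-order hash bits reach for a
sample of keys. The sketch below is only a rough proxy and is not how
\fIsdbm\fP measures anything; checking the pagefile remains the real test.
The names
.CW poly_hash
and
.CW buckets_used
are hypothetical, and bytes are treated as unsigned, which is an assumption
of this sketch.
.P1
```c
#include <stddef.h>
#include <string.h>

/*
 * Polynomial hash as in the text, with the multiplier made a
 * parameter so that different constants can be compared.
 */
unsigned long
poly_hash(const char *str, size_t len, unsigned long mult)
{
	unsigned long n = 0;

	while (len--)
		n = n * mult + (unsigned char) *str++;
	return n;
}

/*
 * Count how many of the 2^bits buckets selected by the low-order
 * bits of the hash receive at least one of the nkeys keys.
 * bits must be between 0 and 8 in this sketch.
 */
int
buckets_used(const char **keys, int nkeys, int bits, unsigned long mult)
{
	char seen[256] = { 0 };
	unsigned long mask = (1UL << bits) - 1;
	int i, used = 0;

	for (i = 0; i < nkeys; i++) {
		unsigned long h = poly_hash(keys[i], strlen(keys[i]), mult);

		if (!seen[h & mask]) {
			seen[h & mask] = 1;
			used++;
		}
	}
	return used;
}
```
.P2
Calling
.CW buckets_used
on the same key sample with 65599 and then with 65587 gives two counts to
compare; a higher count suggests a more even spread over that sample.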
.PP
Block sizes: It seems (from various tests on a few machines) that a page
file block size
.CW PBLKSIZ
of 1024 is by far the best for performance, but
this also happens to limit the size of a key/value pair. Depending on your
needs, you may wish to increase the page size, and also adjust
.CW PAIRMAX
(the maximum size of a key/value pair allowed: should always be at least
three words smaller than
.CW PBLKSIZ .)
accordingly. The system-wide version of the library
should probably be
configured with 1024 (distribution default), as this appears to be sufficient
for most common uses of \fIsdbm\fP.
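.PP
The relation between the two constants can be written down as a small check.
In this sketch the value given to
.CW PAIRMAX
is an assumption, chosen only to leave some slack below
.CW PBLKSIZ ;
consult
.CW sdbm.h
for the values actually shipped. The helper
.CW pair_fits
is hypothetical.
.P1
```c
#include <stddef.h>

#define PBLKSIZ 1024				/* page size named in the text */
#define PAIRMAX ((size_t) (PBLKSIZ - 16))	/* assumed margin, sketch only */

/*
 * Return 1 if a key/value pair of the given sizes is small enough
 * to be stored, 0 otherwise. Written to avoid overflow in the sum.
 */
int
pair_fits(size_t keylen, size_t vallen)
{
	return keylen <= PAIRMAX && vallen <= PAIRMAX - keylen;
}
```
.P2
If the page size is raised, the limit on a pair rises with it, which is the
adjustment described above.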
.SH
Portability
.PP
This package has been tested on many different UN*Xes, even including minix,
and appears to be reasonably portable. This does not mean it will port
easily to non-UN*X systems.
.SH
Notes and Miscellaneous
.PP
The \fIsdbm\fP library is not a very complicated package, at least not after you
familiarize yourself with the literature on external hashing. There are
other interesting algorithms in existence that ensure (approximately)
single-read access to a data value associated with any key. These are
directory-less schemes such as \fIlinear hashing\fP [Lit80] (+ Larson
variations), \fIspiral storage\fP [Mar79] or directory schemes such as
\fIextendible hashing\fP [Fag79] by Fagin et al. I do hope these sources
provide a reasonable playground for experimentation with other algorithms.
See the June 1988 issue of ACM Computing Surveys [Enb88] for an
excellent overview of the field. 
.PG
.SH
References
.LP
.IP [Lar78] 4m
P.-A. Larson,
``Dynamic Hashing'', \fIBIT\fP, vol. 18, pp. 184-201, 1978.
.IP [Tho90] 4m
Ken Thompson, \fIprivate communication\fP, Nov. 1990.
.IP [Lit80] 4m
W. Litwin,
``Linear Hashing: A new tool for file and table addressing'',
\fIProceedings of the 6th Conference on Very Large Databases (Montreal)\fP,
pp. 212-223, Very Large Database Foundation, Saratoga, Calif., 1980.
.IP [Fag79] 4m
R. Fagin, J. Nievergelt, N. Pippenger, and H. R. Strong,
``Extendible Hashing - A Fast Access Method for Dynamic Files'',
\fIACM Trans. Database Syst.\fP, vol. 4, no. 3, pp. 315-344, Sept. 1979.
.IP [Wal84] 4m
Rich Wales,
``Discussion of "dbm" data base system'', \fIUSENET newsgroup unix.wizards\fP,
Jan. 1984.
.IP [Tor87] 4m
Chris Torek,
``Re: dbm.a and ndbm.a archives'', \fIUSENET newsgroup comp.unix\fP,
1987.
.IP [Mar79] 4m
G. N. Martin,
``Spiral Storage: Incrementally Augmentable Hash Addressed Storage'',
\fITechnical Report #27\fP, University of Warwick, Coventry, U.K., 1979.
.IP [Enb88] 4m
R. J. Enbody and H. C. Du,
``Dynamic Hashing Schemes'', \fIACM Computing Surveys\fP,
vol. 20, no. 2, pp. 85-113, June 1988.