ext/SDBM_File/sdbm/README


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396


                   sdbm - Substitute DBM
                             or
        Berkeley ndbm for Every UN*X[1] Made Simple

                      Ozan (oz) Yigit

            The Guild of PD Software Toolmakers
                      Toronto - Canada

                     oz@nexus.yorku.ca


Implementation is the sincerest form of flattery. - L. Peter
Deutsch

A The Clone of the ndbm library

     The sources accompanying this notice - sdbm  -  consti-
tute  the  first  public  release  (Dec. 1990) of a complete
clone of the Berkeley UN*X ndbm library. The sdbm library is
meant  to  clone the proven functionality of ndbm as closely
as possible, including a few improvements. It is  practical,
easy to understand, and compatible.  The sdbm library is not
derived  from  any  licensed,  proprietary  or   copyrighted
software.

     The sdbm implementation is based on  a  1978  algorithm
[Lar78] by P.-A. (Paul) Larson known as ``Dynamic Hashing''.
In the course of searching for a substitute for ndbm, I pro-
totyped  three different external-hashing algorithms [Lar78,
Fag79, Lit80] and ultimately chose Larson's algorithm  as  a
basis  of  the  sdbm  implementation. The Bell Labs dbm (and
therefore ndbm) is based on an  algorithm  invented  by  Ken
Thompson, [Tho90, Tor87] and predates Larson's work.

     The sdbm programming interface  is  totally  compatible
with ndbm and includes a slight improvement in database ini-
tialization.  It is also expected  to  be  binary-compatible
under most UN*X versions that support the ndbm library.

     The sdbm implementation shares the shortcomings of  the
ndbm library, as a side effect of various simplifications to
the original Larson algorithm. It does produce holes in  the
page file as it writes pages past the end of file. (Larson's
paper include a clever solution to this problem  that  is  a
result of using the hash value directly as a block address.)
On the other hand, extensive tests  seem  to  indicate  that
sdbm creates fewer holes in general, and the resulting page-
files are smaller. The sdbm implementation  is  also  faster
than  ndbm  in database creation.  Unlike the ndbm, the sdbm
_________________________

  [1] UN*X is not a trademark of any (dis)organization.


                           - 2 -


store operation will not ``wander away'' trying to split its
data  pages  to insert a datum that cannot (due to elaborate
worst-case situations) be inserted. (It will  fail  after  a
pre-defined number of attempts.)

Important Compatibility Warning

     The sdbm and ndbm libraries cannot share databases: one
cannot  read  the  (dir/pag)  database created by the other.
This is due to the differences between  the  ndbm  and  sdbm
algorithms[2], and the hash functions used.  It is  easy  to
convert  between the dbm/ndbm databases and sdbm by ignoring
the index completely: see dbd, dbu etc.


Notice of Intellectual Property

The entire sdbm  library package, as authored by me, Ozan S.
Yigit,  is  hereby placed in the public domain. As such, the
author is not responsible for the  consequences  of  use  of
this  software, no matter how awful, even if they arise from
defects in it. There is no expressed or implied warranty for
the sdbm library.

     Since the sdbm library package is in the public domain,
this   original  release  or  any  additional  public-domain
releases of the modified original cannot possibly (by defin-
ition) be withheld from you. Also by definition, You (singu-
lar) have all the rights to this code (including  the  right
to sell without permission, the right to  hoard[3]  and  the
right  to  do  other  icky  things as you see fit) but those
rights are also granted to everyone else.

     Please note that all  previous  distributions  of  this
software  contained  a  copyright  (which is now dropped) to
protect its origins and its  current  public  domain  status
against any possible claims and/or challenges.

Acknowledgments

     Many people have been very helpful and  supportive.   A
partial  list  would  necessarily  include Rayan Zacherissen
(who contributed the  man  page,  and  also  hacked  a  MMAP
_________________________

  [2] Torek's   discussion   [Tor87]   indicates   that
dbm/ndbm implementations use the hash value to traverse
the radix trie differently than sdbm and as  a  result,
the page indexes are generated in different order.  For
more information, send e-mail to the author.
  [3] You  cannot really hoard something that is avail-
able to the public at large, but try if  it  makes  you
feel any better.


                           - 3 -


version of sdbm), Arnold Robbins, Chris Lewis,  Bill  David-
sen,  Henry  Spencer,  Geoff  Collyer, Rich Salz (who got me
started in the first place), Johannes Ruschein (who did  the
minix port) and David Tilbrook. I thank you all.

Distribution Manifest and Notes

This distribution of sdbm includes (at least) the following:

    CHANGES     change log
    README      this file.
    biblio      a small bibliography on external hashing
    dba.c       a crude (n/s)dbm page file analyzer
    dbd.c       a crude (n/s)dbm page file dumper (for conversion)
    dbe.1       man page for dbe.c
    dbe.c       Janick's database editor
    dbm.c       a dbm library emulation wrapper for ndbm/sdbm
    dbm.h       header file for the above
    dbu.c       a crude db management utility
    hash.c      hashing function
    makefile    guess.
    pair.c      page-level routines (posted earlier)
    pair.h      header file for the above
    readme.ms   troff source for the README file
    sdbm.3      man page
    sdbm.c      the real thing
    sdbm.h      header file for the above
    tune.h      place for tuning & portability thingies
    util.c      miscellaneous

     dbu is a simple database manipulation  program[4]  that
tries to look like Bell Labs' cbt utility. It  is  currently
incomplete in functionality.  I use dbu to test out the rou-
tines: it takes (from stdin) tab separated  key/value  pairs
for commands like build or insert or takes keys for commands
like delete or look.

    dbu <build|creat|look|insert|cat|delete> dbmfile

     dba is a crude analyzer of dbm/sdbm/ndbm page files. It
scans the entire page file, reporting page level statistics,
and totals at the end.

     dbd is a crude dump  program  for  dbm/ndbm/sdbm  data-
bases.  It  ignores  the bitmap, and dumps the data pages in
sequence. It can be used to create input for the  dbu  util-
ity.   Note that dbd will skip any NULLs in the key and data
fields,  thus  is  unsuitable  to  convert   some   peculiar
_________________________

  [4] The dbd, dba, dbu utilities are quick  hacks  and
are  not  fit  for  production use. They were developed
late one night, just to test out sdbm, and convert some
databases.


                           - 4 -


databases that insist in including the terminating null.

     I have also included a copy of the dbe  (ndbm  DataBase
Editor)  by  Janick Bergeron [janick@bnr.ca] for your pleas-
ure. You may find it more useful than the little  dbu  util-
ity.

     dbm.[ch] is a dbm library emulation on top of ndbm (and
hence suitable for sdbm). Written by Robert Elz.

     The sdbm library has been around in beta test for quite
a  long  time,  and from whatever little feedback I received
(maybe no news is good news), I believe it  has  been  func-
tioning  without  any  significant  problems.  I  would,  of
course, appreciate all fixes and/or improvements.  Portabil-
ity enhancements would especially be useful.

Implementation Issues

     Hash functions: The algorithm behind  sdbm  implementa-
tion  needs a good bit-scrambling hash function to be effec-
tive. I ran into a set of constants for a simple hash  func-
tion  that  seem  to  help sdbm perform better than ndbm for
various inputs:

    /*
     * polynomial conversion ignoring overflows
     * 65599 nice. 65587 even better.
     */
    long
    dbm_hash(char *str, int len) {
        register unsigned long n = 0;

        while (len--)
            n = n * 65599 + *str++;
        return n;
    }

     There may be better hash functions for the purposes  of
dynamic hashing.  Try your favorite, and check the pagefile.
If it contains too many pages with too many holes, (in rela-
tion  to this one for example) or if sdbm simply stops work-
ing (fails after SPLTMAX attempts to split)  when  you  feed
your  NEWS  history  file  to it, you probably do not have a
good hashing function.  If  you  do  better  (for  different
types of input), I would like to know about the function you
use.

     Block sizes: It seems (from  various  tests  on  a  few
machines)  that a page file block size PBLKSIZ of 1024 is by
far the best for performance, but this also happens to limit
the  size  of a key/value pair. Depending on your needs, you
may wish to increase the page size, and also adjust  PAIRMAX
(the maximum size of a key/value pair allowed: should always


                           - 5 -


be at least three words smaller than PBLKSIZ.)  accordingly.
The  system-wide  version  of the library should probably be
configured with 1024 (distribution default), as this appears
to be sufficient for most common uses of sdbm.

Portability

     This package has been tested in many  different  UN*Xes
even including minix, and appears to be reasonably portable.
This does not mean it will port easily to non-UN*X systems.

Notes and Miscellaneous

     The sdbm is not a very complicated  package,  at  least
not  after  you  familiarize yourself with the literature on
external hashing. There are other interesting algorithms  in
existence  that ensure (approximately) single-read access to
a data value associated with any key. These  are  directory-
less schemes such as linear hashing [Lit80] (+ Larson varia-
tions), spiral storage [Mar79] or directory schemes such  as
extensible  hashing  [Fag79] by Fagin et al. I do hope these
sources provide a reasonable playground for  experimentation
with  other algorithms.  See the June 1988 issue of ACM Com-
puting Surveys [Enb88] for  an  excellent  overview  of  the
field.

References


[Lar78]
    P.-A. Larson, ``Dynamic Hashing'', BIT, vol.   18,   pp.
    184-201, 1978.

[Tho90]
    Ken Thompson, private communication, Nov. 1990

[Lit80]
    W. Litwin, `` Linear Hashing: A new tool  for  file  and
    table addressing'', Proceedings of the 6th Conference on
    Very Large  Dabatases  (Montreal), pp.   212-223,   Very
    Large Database Foundation, Saratoga, Calif., 1980.

[Fag79]
    R. Fagin, J.  Nievergelt,  N.  Pippinger,  and   H.   R.
    Strong,  ``Extendible Hashing - A Fast Access Method for
    Dynamic Files'', ACM  Trans.  Database  Syst.,  vol.  4,
    no.3, pp. 315-344, Sept. 1979.

[Wal84]
    Rich Wales, ``Discussion of "dbm"  data  base  system'',
    USENET newsgroup unix.wizards, Jan. 1984.

[Tor87]
    Chris Torek,  ``Re:   dbm.a   and   ndbm.a   archives'',


                           - 6 -


    USENET newsgroup comp.unix, 1987.

[Mar79]
    G. N. Martin, ``Spiral Storage: Incrementally   Augment-
    able   Hash  Addressed  Storage'', Technical Report #27,
    University of Varwick, Coventry, U.K., 1979.

[Enb88]
    R.  J.  Enbody  and  H.   C.   Du,   ``Dynamic   Hashing
    Schemes'',ACM  Computing  Surveys,  vol.  20, no. 2, pp.
    85-113, June 1988.