From d02b48c63a58ea4367a0e905979f140b7d090f86 Mon Sep 17 00:00:00 2001 From: "Ralf S. Engelschall" Date: Mon, 21 Dec 1998 10:52:47 +0000 Subject: Import of old SSLeay release: SSLeay 0.8.1b --- doc/lhash.doc | 151 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 151 insertions(+) create mode 100644 doc/lhash.doc (limited to 'doc/lhash.doc') diff --git a/doc/lhash.doc b/doc/lhash.doc new file mode 100644 index 0000000000..5a2aeb4b38 --- /dev/null +++ b/doc/lhash.doc @@ -0,0 +1,151 @@ +The LHASH library. + +I wrote this library in 1991 and have since forgotten why I called it lhash. +It implements a hash table from an article I read at the +time from 'Communications of the ACM'. What makes this hash +table different is that as the table fills, the hash table is +increased (or decreased) in size via realloc(). +When a 'resize' is done, instead of all hashes being redistributed over +twice as many 'buckets', one bucket is split. So when an 'expand' is done, +there is only a minimal cost to redistribute some values. Subsequent +inserts will cause more single 'bucket' redistributions but there will +never be a sudden large cost due to redistributing all the 'buckets'. + +The state for a particular hash table is kept in the LHASH structure. +The LHASH structure also records statistics about most aspects of accessing +the hash table. This is mostly a legacy of my writing this library for +the reasons of implementing what looked like a nice algorithm rather than +for a particular software product. + +Internal stuff you probably don't want to know about. +The decision to increase or decrease the hash table size is made depending +on the 'load' of the hash table. The load is the number of items in the +hash table divided by the size of the hash table. The default values are +as follows. If (hash->up_load < load) => expand. +if (hash->down_load > load) => contract. The 'up_load' has a default value of +1 and 'down_load' has a default value of 2. These numbers can be modified +by the application by just playing with the 'up_load' and 'down_load' +variables. The 'load' is kept in a form which is multiplied by 256. So +hash->up_load=8*256; will cause a load of 8 to be set. + +If you are interested in performance the field to watch is +num_comp_calls. The hash library keeps track of the 'hash' value for +each item so when a lookup is done, the 'hashes' are compared, if +there is a match, then a full compare is done, and +hash->num_comp_calls is incremented. If num_comp_calls is not equal +to num_delete plus num_retrieve it means that your hash function is +generating hashes that are the same for different values. It is +probably worth changing your hash function if this is the case because +even if your hash table has 10 items in a 'bucked', it can be searched +with 10 'unsigned long' compares and 10 linked list traverses. This +will be much less expensive that 10 calls to you compare function. + +LHASH *lh_new( +unsigned long (*hash)(), +int (*cmp)()); + This function is used to create a new LHASH structure. It is passed + function pointers that are used to store and retrieve values passed + into the hash table. The 'hash' + function is a hashing function that will return a hashed value of + it's passed structure. 'cmp' is passed 2 parameters, it returns 0 + is they are equal, otherwise, non zero. + If there are any problems (usually malloc failures), NULL is + returned, otherwise a new LHASH structure is returned. The + hash value is normally truncated to a power of 2, so make sure + that your hash function returns well mixed low order bits. + +void lh_free( +LHASH *lh); + This function free()s a LHASH structure. If there is malloced + data in the hash table, it will not be freed. Consider using the + lh_doall function to deallocate any remaining entries in the hash + table. + +char *lh_insert( +LHASH *lh, +char *data); + This function inserts the data pointed to by data into the lh hash + table. If there is already and entry in the hash table entry, the + value being replaced is returned. A NULL is returned if the new + entry does not clash with an entry already in the table (the normal + case) or on a malloc() failure (perhaps I should change this....). + The 'char *data' is exactly what is passed to the hash and + comparison functions specified in lh_new(). + +char *lh_delete( +LHASH *lh, +char *data); + This routine deletes an entry from the hash table. The value being + deleted is returned. NULL is returned if there is no such value in + the hash table. + +char *lh_retrieve( +LHASH *lh, +char *data); + If 'data' is in the hash table it is returned, else NULL is + returned. The way these routines would normally be uses is that a + dummy structure would have key fields populated and then + ret=lh_retrieve(hash,&dummy);. Ret would now be a pointer to a fully + populated structure. + +void lh_doall( +LHASH *lh, +void (*func)(char *a)); + This function will, for every entry in the hash table, call function + 'func' with the data item as parameters. + This function can be quite useful when used as follows. + void cleanup(STUFF *a) + { STUFF_free(a); } + lh_doall(hash,cleanup); + lh_free(hash); + This can be used to free all the entries, lh_free() then + cleans up the 'buckets' that point to nothing. Be careful + when doing this. If you delete entries from the hash table, + in the call back function, the table may decrease in size, + moving item that you are + currently on down lower in the hash table. This could cause + some entries to be skipped. The best solution to this problem + is to set lh->down_load=0 before you start. This will stop + the hash table ever being decreased in size. + +void lh_doall_arg( +LHASH *lh; +void(*func)(char *a,char *arg)); +char *arg; + This function is the same as lh_doall except that the function + called will be passed 'arg' as the second argument. + +unsigned long lh_strhash( +char *c); + This function is a demo string hashing function. Since the LHASH + routines would normally be passed structures, this routine would + not normally be passed to lh_new(), rather it would be used in the + function passed to lh_new(). + +The next three routines print out various statistics about the state of the +passed hash table. These numbers are all kept in the lhash structure. + +void lh_stats( +LHASH *lh, +FILE *out); + This function prints out statistics on the size of the hash table, + how many entries are in it, and the number and result of calls to + the routines in this library. + +void lh_node_stats( +LHASH *lh, +FILE *out); + For each 'bucket' in the hash table, the number of entries is + printed. + +void lh_node_usage_stats( +LHASH *lh, +FILE *out); + This function prints out a short summary of the state of the hash + table. It prints what I call the 'load' and the 'actual load'. + The load is the average number of data items per 'bucket' in the + hash table. The 'actual load' is the average number of items per + 'bucket', but only for buckets which contain entries. So the + 'actual load' is the average number of searches that will need to + find an item in the hash table, while the 'load' is the average number + that will be done to record a miss. -- cgit v1.2.1