author    Bruno Haible <bruno@clisp.org>    2003-05-07 13:35:57 +0000
committer Bruno Haible <bruno@clisp.org>    2003-05-07 13:35:57 +0000
commit    c4b3453ac315891fd9809c58eaea71e98247195b
tree      bd75c0c31f2b96d2c8bd5565683f9ac184cc22c1
parent    7f2d2ba06525d249e84c7907442d2e23f69c2ddd
Regenerated for 3.0.
-rw-r--r--  doc/gperf.1         |  99
-rw-r--r--  doc/gperf.html      | 970
-rw-r--r--  doc/gperf.info      | 956
-rw-r--r--  doc/gperf_1.html    |   6
-rw-r--r--  doc/gperf_10.html   | 154
-rw-r--r--  doc/gperf_2.html    |  17
-rw-r--r--  doc/gperf_3.html    |  19
-rw-r--r--  doc/gperf_4.html    |  12
-rw-r--r--  doc/gperf_5.html    | 453
-rw-r--r--  doc/gperf_6.html    | 318
-rw-r--r--  doc/gperf_7.html    |  24
-rw-r--r--  doc/gperf_8.html    |  23
-rw-r--r--  doc/gperf_9.html    |  91
-rw-r--r--  doc/gperf_toc.html  |  47
14 files changed, 2159 insertions, 1030 deletions
diff --git a/doc/gperf.1 b/doc/gperf.1
index dd425e0..9ed2d09 100644
--- a/doc/gperf.1
+++ b/doc/gperf.1
@@ -1,21 +1,27 @@
-.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.022.
-.TH GPERF "1" "September 2000" "GNU gperf 2.7.2" FSF
+.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.23.
+.TH GPERF "1" "May 2003" "GNU gperf 3.0" FSF
.SH NAME
gperf \- generate a perfect hash function from a key set
.SH SYNOPSIS
.B gperf
[\fIOPTION\fR]... [\fIINPUT-FILE\fR]
.SH DESCRIPTION
-GNU `gperf' generates perfect hash functions.
+GNU 'gperf' generates perfect hash functions.
.PP
If a long option shows an argument as mandatory, then it is mandatory
for the equivalent short option also.
+.SS "Output file location:"
+.HP
+\fB\-\-output\-file\fR=\fIFILE\fR Write output to specified file.
+.PP
+The results are written to standard output if no output file is specified
+or if it is -.
.SS "Input file interpretation:"
.TP
\fB\-e\fR, \fB\-\-delimiters\fR=\fIDELIMITER\-LIST\fR
Allow user to provide a string containing delimiters
used to separate keywords from their attributes.
-Default is ",\en".
+Default is ",".
.TP
\fB\-t\fR, \fB\-\-struct\-type\fR
Allows the user to include a structured type
@@ -23,6 +29,11 @@ declaration for generated code. Any text before %%
is considered part of the type declaration. Key
words and additional fields may follow this, one
group of fields per line.
+.TP
+\fB\-\-ignore\-case\fR
+Consider upper and lower case ASCII characters as
+equivalent. Note that locale dependent case mappings
+are ignored.
.SS "Language for the output code:"
.TP
\fB\-L\fR, \fB\-\-language\fR=\fILANGUAGE\-NAME\fR
@@ -39,21 +50,27 @@ structure.
Initializers for additional components in the keyword
structure.
.TP
-\fB\-H\fR, \fB\-\-hash\-fn\-name\fR=\fINAME\fR
+\fB\-H\fR, \fB\-\-hash\-function\-name\fR=\fINAME\fR
Specify name of generated hash function. Default is
-`hash'.
+\&'hash'.
.TP
-\fB\-N\fR, \fB\-\-lookup\-fn\-name\fR=\fINAME\fR
+\fB\-N\fR, \fB\-\-lookup\-function\-name\fR=\fINAME\fR
Specify name of generated lookup function. Default
-name is `in_word_set'.
+name is 'in_word_set'.
.TP
\fB\-Z\fR, \fB\-\-class\-name\fR=\fINAME\fR
Specify name of generated C++ class. Default name is
-`Perfect_Hash'.
+\&'Perfect_Hash'.
.TP
\fB\-7\fR, \fB\-\-seven\-bit\fR
Assume 7-bit characters.
.TP
+\fB\-l\fR, \fB\-\-compare\-lengths\fR
+Compare key lengths before trying a string
+comparison. This is necessary if the keywords
+contain NUL bytes. It also helps cut down on the
+number of string comparisons made during the lookup.
+.TP
\fB\-c\fR, \fB\-\-compare\-strncmp\fR
Generate comparison code using strncmp rather than
strcmp.
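The -l/--compare-lengths entry above says comparing lengths is necessary when keywords contain NUL bytes. The reason is that strcmp stops at the first NUL, so "a\0b" and "a\0c" compare equal. The following is a hand-written sketch, not gperf output: the names kw_entry, kw_table and lookup_bytes are hypothetical, and a linear scan stands in for gperf's single hash probe. It checks the stored length first and only then calls memcmp, which compares all len bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical keyword entry: an explicit length, so an embedded NUL
   byte is part of the key rather than a terminator. */
struct kw_entry {
    const char *bytes;
    size_t len;
};

static const struct kw_entry kw_table[] = {
    { "a\0b", 3 },   /* contains an embedded NUL byte */
    { "a\0c", 3 },
    { "a",    1 },
};

/* Returns the index of the matching entry, or -1 if none matches.
   Lengths are compared first; only equal-length entries reach memcmp,
   which, unlike strcmp, does not stop at a NUL byte. */
static int lookup_bytes(const char *s, size_t len)
{
    size_t i;
    for (i = 0; i < sizeof kw_table / sizeof kw_table[0]; i++) {
        if (kw_table[i].len == len
            && memcmp(kw_table[i].bytes, s, len) == 0)
            return (int) i;
    }
    return -1;
}
```

With strcmp alone the first two entries would be indistinguishable; with the length check plus memcmp, lookup_bytes("a\0c", 3) finds entry 1.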
@@ -70,14 +87,27 @@ lookup function rather than with defines.
Include the necessary system include file <string.h>
at the beginning of the code.
.TP
-\fB\-G\fR, \fB\-\-global\fR
+\fB\-G\fR, \fB\-\-global\-table\fR
Generate the static table of keywords as a static
global variable, rather than hiding it inside of the
lookup function (which is the default behavior).
.TP
+\fB\-P\fR, \fB\-\-pic\fR
+Optimize the generated table for inclusion in shared
+libraries. This reduces the startup time of programs
+using a shared library containing the generated code.
+.TP
+\fB\-Q\fR, \fB\-\-string\-pool\-name\fR=\fINAME\fR
+Specify name of string pool generated by option \fB\-\-pic\fR.
+Default name is 'stringpool'.
+.TP
+\fB\-\-null\-strings\fR
+Use NULL strings instead of empty strings for empty
+keyword table entries.
+.TP
\fB\-W\fR, \fB\-\-word\-array\-name\fR=\fINAME\fR
Specify name of word list array. Default name is
-`wordlist'.
+\&'wordlist'.
.TP
\fB\-S\fR, \fB\-\-switch\fR=\fICOUNT\fR
Causes the generated C code to use a switch
@@ -99,30 +129,23 @@ defined elsewhere.
.TP
\fB\-k\fR, \fB\-\-key\-positions\fR=\fIKEYS\fR
Select the key positions used in the hash function.
-The allowable choices range between 1-126, inclusive.
+The allowable choices range between 1-255, inclusive.
The positions are separated by commas, ranges may be
used, and key positions may occur in any order.
Also, the meta-character '*' causes the generated
hash function to consider ALL key positions, and $
-indicates the ``final character'' of a key, e.g.,
+indicates the "final character" of a key, e.g.,
$,1,2,4,6-10.
.TP
-\fB\-l\fR, \fB\-\-compare\-strlen\fR
-Compare key lengths before trying a string
-comparison. This helps cut down on the number of
-string comparisons made during the lookup.
-.TP
\fB\-D\fR, \fB\-\-duplicates\fR
Handle keywords that hash to duplicate values. This
is useful for certain highly redundant keyword sets.
.TP
-\fB\-f\fR, \fB\-\-fast\fR=\fIITERATIONS\fR
-Generate the gen-perf.hash function ``fast''. This
-decreases gperf's running time at the cost of
-minimizing generated table size. The numeric
-argument represents the number of times to iterate
-when resolving a collision. `0' means ``iterate by
-the number of keywords''.
+\fB\-m\fR, \fB\-\-multiple\-iterations\fR=\fIITERATIONS\fR
+Perform multiple choices of the \fB\-i\fR and \fB\-j\fR values,
+and choose the best results. This increases the
+running time by a factor of ITERATIONS but does a
+good job minimizing the generated table size.
.TP
\fB\-i\fR, \fB\-\-initial\-asso\fR=\fIN\fR
Provide an initial value for the associate values
@@ -130,7 +153,7 @@ array. Default is 0. Setting this value larger helps
inflate the size of the final table.
.TP
\fB\-j\fR, \fB\-\-jump\fR=\fIJUMP\-VALUE\fR
-Affects the ``jump value'', i.e., how far to advance
+Affects the "jump value", i.e., how far to advance
the associated character value upon collisions. Must
be an odd number, default is 5.
.TP
@@ -138,25 +161,20 @@ be an odd number, default is 5.
Do not include the length of the keyword when
computing the hash function.
.TP
-\fB\-o\fR, \fB\-\-occurrence\-sort\fR
-Reorders input keys by frequency of occurrence of
-the key sets. This should decrease the search time
-dramatically.
-.TP
\fB\-r\fR, \fB\-\-random\fR
Utilizes randomness to initialize the associated
values table.
.TP
\fB\-s\fR, \fB\-\-size\-multiple\fR=\fIN\fR
Affects the size of the generated hash table. The
-numeric argument N indicates ``how many times larger
-or smaller'' the associated value range should be,
+numeric argument N indicates "how many times larger
+or smaller" the associated value range should be,
in relationship to the number of keys, e.g. a value
-of 3 means ``allow the maximum associated value to
+of 3 means "allow the maximum associated value to
be about 3 times larger than the number of input
-keys.'' Conversely, a value of \fB\-3\fR means ``make the
+keys". Conversely, a value of 1/3 means "make the
maximum associated value about 3 times smaller than
-the number of input keys. A larger table should
+the number of input keys". A larger table should
decrease the time required for an unsuccessful
search, at the expense of extra table space. Default
value is 1.
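The -k/--key-positions entry earlier in this diff describes a small argument grammar: positions 1-255, comma-separated, ranges such as 6-10, '*' for all positions and '$' for the final character. A minimal parser sketch under those rules follows. struct keypos_set and parse_key_positions are hypothetical names. Accepting '*' only on its own and rejecting empty items or reversed ranges are assumptions, not gperf's documented behavior.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_KEYPOS 255

/* Hypothetical result of parsing a -k argument. */
struct keypos_set {
    bool all;                  /* '*': use every position */
    bool last;                 /* '$': use the final character */
    bool pos[MAX_KEYPOS + 1];  /* pos[i] set for 1-based position i */
};

/* Parses strings like "$,1,2,4,6-10" or "*". Returns true on success. */
static bool parse_key_positions(const char *arg, struct keypos_set *out)
{
    const char *p = arg;
    int i;

    out->all = false;
    out->last = false;
    for (i = 0; i <= MAX_KEYPOS; i++)
        out->pos[i] = false;

    if (p[0] == '*' && p[1] == '\0') {
        out->all = true;
        return true;
    }
    for (;;) {
        if (*p == '$') {
            out->last = true;
            p++;
        } else if (isdigit((unsigned char) *p)) {
            char *end;
            long lo = strtol(p, &end, 10);
            long hi = lo;
            p = end;
            if (*p == '-') {           /* a range such as 6-10 */
                if (!isdigit((unsigned char) p[1]))
                    return false;
                hi = strtol(p + 1, &end, 10);
                p = end;
            }
            if (lo < 1 || hi > MAX_KEYPOS || lo > hi)
                return false;
            for (; lo <= hi; lo++)
                out->pos[lo] = true;
        } else {
            return false;              /* empty item or stray character */
        }
        if (*p == '\0')
            return true;
        if (*p != ',')
            return false;
        p++;                           /* skip the separating comma */
    }
}
```

For the documented example "$,1,2,4,6-10" this sets last and positions 1, 2, 4 and 6 through 10.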
@@ -171,8 +189,15 @@ Print the gperf version number.
\fB\-d\fR, \fB\-\-debug\fR
Enables the debugging option (produces verbose
output to the standard error).
+.SH AUTHOR
+Written by Douglas C. Schmidt and Bruno Haible.
.SH "REPORTING BUGS"
-Report bugs to <bug-gnu-utils@gnu.org>.
+Report bugs to <bug-gnu-gperf@gnu.org>.
+.SH COPYRIGHT
+Copyright \(co 1989-1998, 2000-2003 Free Software Foundation, Inc.
+.br
+This is free software; see the source for copying conditions. There is NO
+warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
.SH "SEE ALSO"
The full documentation for
.B gperf
diff --git a/doc/gperf.html b/doc/gperf.html
index 17ab5c4..6dde98c 100644
--- a/doc/gperf.html
+++ b/doc/gperf.html
@@ -1,15 +1,16 @@
<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.51
- from gperf.texi on 26 September 2000 -->
+ from gperf.texi on 7 May 2003 -->
<TITLE>Perfect Hash Function Generator</TITLE>
</HEAD>
<BODY>
-<H1>User's Guide to <CODE>gperf</CODE> 2.7.2</H1>
+<H1>User's Guide to <CODE>gperf</CODE> 3.0</H1>
<H2>The GNU Perfect Hash Function Generator</H2>
-<H2>Edition 2.7.2, 26 September 2000</H2>
+<H2>Edition 3.0, 7 May 2003</H2>
<ADDRESS>Douglas C. Schmidt</ADDRESS>
+<ADDRESS>Bruno Haible</ADDRESS>
<P>
<P><HR><P>
<H1>Table of Contents</H1>
@@ -26,26 +27,32 @@
<UL>
<LI><A NAME="TOC8" HREF="gperf.html#SEC8">3.1 Input Format to <CODE>gperf</CODE></A>
<UL>
-<LI><A NAME="TOC9" HREF="gperf.html#SEC9">3.1.1 <CODE>struct</CODE> Declarations and C Code Inclusion</A>
-<LI><A NAME="TOC10" HREF="gperf.html#SEC10">3.1.2 Format for Keyword Entries</A>
-<LI><A NAME="TOC11" HREF="gperf.html#SEC11">3.1.3 Including Additional C Functions</A>
+<LI><A NAME="TOC9" HREF="gperf.html#SEC9">3.1.1 Declarations</A>
+<UL>
+<LI><A NAME="TOC10" HREF="gperf.html#SEC10">3.1.1.1 User-supplied <CODE>struct</CODE></A>
+<LI><A NAME="TOC11" HREF="gperf.html#SEC11">3.1.1.2 Gperf Declarations</A>
+<LI><A NAME="TOC12" HREF="gperf.html#SEC12">3.1.1.3 C Code Inclusion</A>
+</UL>
+<LI><A NAME="TOC13" HREF="gperf.html#SEC13">3.1.2 Format for Keyword Entries</A>
+<LI><A NAME="TOC14" HREF="gperf.html#SEC14">3.1.3 Including Additional C Functions</A>
+<LI><A NAME="TOC15" HREF="gperf.html#SEC15">3.1.4 Where to place directives for GNU <CODE>indent</CODE>.</A>
</UL>
-<LI><A NAME="TOC12" HREF="gperf.html#SEC12">3.2 Output Format for Generated C Code with <CODE>gperf</CODE></A>
-<LI><A NAME="TOC13" HREF="gperf.html#SEC13">3.3 Use of NUL characters</A>
+<LI><A NAME="TOC16" HREF="gperf.html#SEC16">3.2 Output Format for Generated C Code with <CODE>gperf</CODE></A>
+<LI><A NAME="TOC17" HREF="gperf.html#SEC17">3.3 Use of NUL bytes</A>
</UL>
-<LI><A NAME="TOC14" HREF="gperf.html#SEC14">4 Invoking <CODE>gperf</CODE></A>
+<LI><A NAME="TOC18" HREF="gperf.html#SEC18">4 Invoking <CODE>gperf</CODE></A>
<UL>
-<LI><A NAME="TOC15" HREF="gperf.html#SEC15">4.1 Options that affect Interpretation of the Input File</A>
-<LI><A NAME="TOC16" HREF="gperf.html#SEC16">4.2 Options to specify the Language for the Output Code</A>
-<LI><A NAME="TOC17" HREF="gperf.html#SEC17">4.3 Options for fine tuning Details in the Output Code</A>
-<LI><A NAME="TOC18" HREF="gperf.html#SEC18">4.4 Options for changing the Algorithms employed by <CODE>gperf</CODE></A>
-<LI><A NAME="TOC19" HREF="gperf.html#SEC19">4.5 Informative Output</A>
+<LI><A NAME="TOC19" HREF="gperf.html#SEC19">4.1 Specifying the Location of the Output File</A>
+<LI><A NAME="TOC20" HREF="gperf.html#SEC20">4.2 Options that affect Interpretation of the Input File</A>
+<LI><A NAME="TOC21" HREF="gperf.html#SEC21">4.3 Options to specify the Language for the Output Code</A>
+<LI><A NAME="TOC22" HREF="gperf.html#SEC22">4.4 Options for fine tuning Details in the Output Code</A>
+<LI><A NAME="TOC23" HREF="gperf.html#SEC23">4.5 Options for changing the Algorithms employed by <CODE>gperf</CODE></A>
+<LI><A NAME="TOC24" HREF="gperf.html#SEC24">4.6 Informative Output</A>
</UL>
-<LI><A NAME="TOC20" HREF="gperf.html#SEC20">5 Known Bugs and Limitations with <CODE>gperf</CODE></A>
-<LI><A NAME="TOC21" HREF="gperf.html#SEC21">6 Things Still Left to Do</A>
-<LI><A NAME="TOC22" HREF="gperf.html#SEC22">7 Implementation Details of GNU <CODE>gperf</CODE></A>
-<LI><A NAME="TOC23" HREF="gperf.html#SEC23">8 Bibliography</A>
-<LI><A NAME="TOC24" HREF="gperf.html#SEC24">Concept Index</A>
+<LI><A NAME="TOC25" HREF="gperf.html#SEC25">5 Known Bugs and Limitations with <CODE>gperf</CODE></A>
+<LI><A NAME="TOC26" HREF="gperf.html#SEC26">6 Things Still Left to Do</A>
+<LI><A NAME="TOC27" HREF="gperf.html#SEC27">7 Bibliography</A>
+<LI><A NAME="TOC28" HREF="gperf.html#SEC28">Concept Index</A>
</UL>
<P><HR><P>
@@ -504,15 +511,13 @@ Public License instead of this License.
<A NAME="IDX1"></A>
The GNU <CODE>gperf</CODE> perfect hash function generator utility was
-originally written in GNU C++ by Douglas C. Schmidt. It is now also
-available in a highly-portable "old-style" C version. The general
+written in GNU C++ by Douglas C. Schmidt. The general
idea for the perfect hash function generator was inspired by Keith
Bostic's algorithm written in C, and distributed to net.sources around
1984. The current program is a heavily modified, enhanced, and extended
implementation of Keith's basic idea, created at the University of
California, Irvine. Bugs, patches, and suggestions should be reported
-to both <CODE>&#60;bug-gnu-utils@gnu.org&#62;</CODE> and
-<CODE>&#60;gperf-bugs@lists.sourceforge.net&#62;</CODE>.
+to <CODE>&#60;bug-gnu-gperf@gnu.org&#62;</CODE>.
<LI>
@@ -525,8 +530,9 @@ that greatly helped improve the quality and functionality of <CODE>gperf</CODE>.
<LI>
-A testsuite was added by Bruno Haible. He also rewrote the output
-routines for better reliability.
+Bruno Haible enhanced and optimized the search algorithm. He also rewrote
+the input routines and the output routines for better reliability, and
+added a testsuite.
</UL>
@@ -537,8 +543,8 @@ routines for better reliability.
<CODE>gperf</CODE> is a perfect hash function generator written in C++. It
transforms an <VAR>n</VAR> element user-specified keyword set <VAR>W</VAR> into a
perfect hash function <VAR>F</VAR>. <VAR>F</VAR> uniquely maps keywords in
-<VAR>W</VAR> onto the range 0..<VAR>k</VAR>, where <VAR>k</VAR> &#62;= <VAR>n</VAR>. If <VAR>k</VAR>
-= <VAR>n</VAR> then <VAR>F</VAR> is a <EM>minimal</EM> perfect hash function.
+<VAR>W</VAR> onto the range 0..<VAR>k</VAR>, where <VAR>k</VAR> &#62;= <VAR>n-1</VAR>. If <VAR>k</VAR>
+= <VAR>n-1</VAR> then <VAR>F</VAR> is a <EM>minimal</EM> perfect hash function.
<CODE>gperf</CODE> generates a 0..<VAR>k</VAR> element static lookup table and a
pair of C functions. These functions determine whether a given
character string <VAR>s</VAR> occurs in <VAR>W</VAR>, using at most one probe into
@@ -548,11 +554,12 @@ the lookup table.
<P>
<CODE>gperf</CODE> currently generates the reserved keyword recognizer for
lexical analyzers in several production and research compilers and
-language processing tools, including GNU C, GNU C++, GNU Pascal, GNU
-Modula 3, and GNU indent. Complete C++ source code for <CODE>gperf</CODE> is
-available via anonymous ftp from <CODE>ftp://ftp.gnu.org/pub/gnu/gperf/</CODE>.
+language processing tools, including GNU C, GNU C++, GNU Java, GNU Pascal,
+GNU Modula 3, and GNU indent. Complete C++ source code for <CODE>gperf</CODE> is
+available from <CODE>http://ftp.gnu.org/pub/gnu/gperf/</CODE>.
A paper describing <CODE>gperf</CODE>'s design and implementation in greater
-detail is available in the Second USENIX C++ Conference proceedings.
+detail is available in the Second USENIX C++ Conference proceedings
+or from <CODE>http://www.cs.wustl.edu/~schmidt/resume.html</CODE>.
</P>
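The definition earlier in this diff says F maps the n keywords of W uniquely onto 0..k with k >= n-1, and that F is minimal when k = n-1. That property can be checked mechanically for any candidate function. In this sketch the hash function len_minus_two and the keyword sets are hypothetical, chosen so the results are easy to verify by hand.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical candidate F: keyword length minus 2. Only meaningful for
   the small sample keyword sets used with it. */
static unsigned int len_minus_two(const char *s)
{
    return (unsigned int) strlen(s) - 2u;
}

/* True if hash maps the n keywords of w to distinct values in 0..k.
   This sketch supports k < 64 only. */
static bool is_perfect(const char *const *w, size_t n,
                       unsigned int (*hash)(const char *), unsigned int k)
{
    bool seen[64] = { false };
    size_t i;
    if (k >= 64)
        return false;
    for (i = 0; i < n; i++) {
        unsigned int h = hash(w[i]);
        if (h > k || seen[h])
            return false;  /* outside 0..k, or a collision */
        seen[h] = true;
    }
    return true;
}

/* Minimal: perfect, and 0..k has exactly n slots, i.e. k = n-1. */
static bool is_minimal_perfect(const char *const *w, size_t n,
                               unsigned int (*hash)(const char *),
                               unsigned int k)
{
    return n > 0 && k == n - 1 && is_perfect(w, n, hash, k);
}
```

For {"if", "for", "else"} the values are 0, 1 and 2, so the function is minimal perfect with k = 2. With k = 4 it is still perfect but no longer minimal, and adding "do" collides with "if" at 0.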
@@ -566,7 +573,7 @@ detail is available in the Second USENIX C++ Conference proceedings.
A <STRONG>static search structure</STRONG> is an Abstract Data Type with certain
fundamental operations, e.g., <EM>initialize</EM>, <EM>insert</EM>,
and <EM>retrieve</EM>. Conceptually, all insertions occur before any
-retrievals. In practice, <CODE>gperf</CODE> generates a <CODE>static</CODE> array
+retrievals. In practice, <CODE>gperf</CODE> generates a <EM>static</EM> array
containing search set keywords and any associated attributes specified
by the user. Thus, there is essentially no execution-time cost for the
insertions. It is a useful data structure for representing <EM>static
@@ -633,8 +640,8 @@ the drudgery associated with constructing time- and space-efficient
search structures by hand. It has proven a useful and practical tool
for serious programming projects. Output from <CODE>gperf</CODE> is currently
used in several production and research compilers, including GNU C, GNU
-C++, GNU Pascal, and GNU Modula 3. The latter two compilers are not yet
-part of the official GNU distribution. Each compiler utilizes
+C++, GNU Java, GNU Pascal, and GNU Modula 3. The latter two compilers are
+not yet part of the official GNU distribution. Each compiler utilizes
<CODE>gperf</CODE> to automatically generate static search structures that
efficiently identify their respective reserved keywords.
@@ -645,7 +652,7 @@ efficiently identify their respective reserved keywords.
<P>
The perfect hash function generator <CODE>gperf</CODE> reads a set of
-"keywords" from a <STRONG>keyfile</STRONG> (or from the standard input by
+"keywords" from an input file (or from the standard input by
default). It attempts to derive a perfect hashing function that
recognizes a member of the <STRONG>static keyword set</STRONG> with at most a
single probe into the lookup table. If <CODE>gperf</CODE> succeeds in
@@ -668,7 +675,7 @@ somewhat. Actual results depend on your C compiler, of course.
</P>
<P>
-In general, <CODE>gperf</CODE> assigns values to the characters it is using
+In general, <CODE>gperf</CODE> assigns values to the bytes it is using
for hashing until some set of values gives each keyword a unique value.
A helpful heuristic is that the larger the hash value range, the easier
it is for <CODE>gperf</CODE> to find and generate a perfect hash function.
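The paragraph above says gperf assigns values to the bytes it hashes until every keyword gets a unique value. The toy search below illustrates only that idea. It assumes a hash of the form length + asso[first byte] + asso[last byte] and tries every assignment of 0..maxval to the bytes in those positions. That is exponential, and gperf's actual search is considerably more refined. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_HASH 256  /* upper bound on hash values in this toy */

/* Assumed hash shape: length plus the associated values of the first
   and last bytes. Keywords must be non-empty. */
static unsigned int shape_hash(const unsigned int asso[256], const char *s)
{
    size_t len = strlen(s);
    return (unsigned int) len
           + asso[(unsigned char) s[0]]
           + asso[(unsigned char) s[len - 1]];
}

/* True if every keyword gets a different hash value under asso. */
static bool all_distinct(const unsigned int asso[256],
                         const char *const *w, size_t n)
{
    bool seen[MAX_HASH] = { false };
    size_t i;
    for (i = 0; i < n; i++) {
        unsigned int h = shape_hash(asso, w[i]);
        if (h >= MAX_HASH || seen[h])
            return false;
        seen[h] = true;
    }
    return true;
}

/* Tries every assignment of 0..maxval to the bytes that occur in a
   hashed position (odometer order) until all keywords hash uniquely.
   Returns false if no assignment works. */
static bool find_asso_values(const char *const *w, size_t n,
                             unsigned int maxval, unsigned int asso[256])
{
    unsigned char used[256];
    bool present[256] = { false };
    size_t nused = 0, i, j;

    for (i = 0; i < 256; i++)
        asso[i] = 0;
    /* Collect the distinct first/last bytes: only these need values. */
    for (i = 0; i < n; i++) {
        size_t len = strlen(w[i]);
        unsigned char ends[2];
        ends[0] = (unsigned char) w[i][0];
        ends[1] = (unsigned char) w[i][len - 1];
        for (j = 0; j < 2; j++) {
            if (!present[ends[j]]) {
                present[ends[j]] = true;
                used[nused++] = ends[j];
            }
        }
    }
    for (;;) {
        if (all_distinct(asso, w, n))
            return true;
        /* Advance to the next assignment. */
        for (i = 0; i < nused; i++) {
            if (asso[used[i]] < maxval) {
                asso[used[i]]++;
                break;
            }
            asso[used[i]] = 0;
        }
        if (i == nused)
            return false;  /* every assignment has been tried */
    }
}
```

For the keywords do, if, in and on with maxval 3 a solution exists. For instance, asso['n'] = 2, asso['o'] = 1 and 0 elsewhere gives hash values 3, 2, 4 and 5. For ab and ba no assignment can work, because both keywords always receive the same sum.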
@@ -683,7 +690,7 @@ Experimentation is the key to getting the most from <CODE>gperf</CODE>.
<A NAME="IDX5"></A>
<A NAME="IDX6"></A>
<A NAME="IDX7"></A>
-You can control the input keyfile format by varying certain command-line
+You can control the input file format by varying certain command-line
arguments, in particular the <SAMP>`-t'</SAMP> option. The input's appearance
is similar to GNU utilities <CODE>flex</CODE> and <CODE>bison</CODE> (or UNIX
utilities <CODE>lex</CODE> and <CODE>yacc</CODE>). Here's an outline of the general
@@ -700,25 +707,53 @@ functions
</PRE>
<P>
-<EM>Unlike</EM> <CODE>flex</CODE> or <CODE>bison</CODE>, all sections of
-<CODE>gperf</CODE>'s input are optional. The following sections describe the
+<EM>Unlike</EM> <CODE>flex</CODE> or <CODE>bison</CODE>, the declarations section and
+the functions section are optional. The following sections describe the
input format for each section.
</P>
+<P>
+It is possible to omit the declaration section entirely, if the <SAMP>`-t'</SAMP>
+option is not given. In this case the input file begins directly with the
+first keyword line, e.g.:
+
+</P>
+
+<PRE>
+january
+february
+march
+april
+...
+</PRE>
+
-<H3><A NAME="SEC9" HREF="gperf.html#TOC9">3.1.1 <CODE>struct</CODE> Declarations and C Code Inclusion</A></H3>
+<H3><A NAME="SEC9" HREF="gperf.html#TOC9">3.1.1 Declarations</A></H3>
<P>
The keyword input file optionally contains a section for including
-arbitrary C declarations and definitions, as well as provisions for
-providing a user-supplied <CODE>struct</CODE>. If the <SAMP>`-t'</SAMP> option
+arbitrary C declarations and definitions, <CODE>gperf</CODE> declarations that
+act like command-line options, as well as for providing a user-supplied
+<CODE>struct</CODE>.
+
+</P>
+
+
+
+<H4><A NAME="SEC10" HREF="gperf.html#TOC10">3.1.1.1 User-supplied <CODE>struct</CODE></A></H4>
+
+<P>
+If the <SAMP>`-t'</SAMP> option (or, equivalently, the <SAMP>`%struct-type'</SAMP> declaration)
<EM>is</EM> enabled, you <EM>must</EM> provide a C <CODE>struct</CODE> as the last
-component in the declaration section from the keyfile file. The first
-field in this struct must be a <CODE>char *</CODE> or <CODE>const char *</CODE>
-identifier called <SAMP>`name'</SAMP>, although it is possible to modify this
-field's name with the <SAMP>`-K'</SAMP> option described below.
+component in the declaration section from the input file. The first
+field in this struct must be of type <CODE>char *</CODE> or <CODE>const char *</CODE>
+if the <SAMP>`-P'</SAMP> option is not given, or of type <CODE>int</CODE> if the option
+<SAMP>`-P'</SAMP> (or, equivalently, the <SAMP>`%pic'</SAMP> declaration) is enabled.
+This first field must be called <SAMP>`name'</SAMP>, although it is possible to modify
+its name with the <SAMP>`-K'</SAMP> option (or, equivalently, the
+<SAMP>`%define slot-name'</SAMP> declaration) described below.
</P>
<P>
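Under -P (%pic), as described above, the first struct field becomes an int offset into a string pool, and a keyword is recovered with stringpool + o. The sketch below writes that layout by hand and is not gperf's generated code. The struct keyword, its token field, the use of -1 to mark empty slots, and the linear scan in place of a hash probe are all assumptions for illustration. Because the table holds only integers, it contains no pointers for the dynamic linker to relocate at load time, which is presumably where the startup-time saving comes from.

```c
#include <stddef.h>
#include <string.h>

/* One contiguous pool: keywords stored back to back, NUL-terminated.
   Offsets: "if" at 0, "for" at 3, "else" at 7. */
static const char stringpool[] = "if\0" "for\0" "else";

/* User struct under %pic: the first field is an int offset, not a
   char *. In this sketch an offset of -1 marks an empty slot. */
struct keyword {
    int name;   /* offset into stringpool */
    int token;  /* hypothetical attribute */
};

static const struct keyword wordlist[] = {
    {  0, 100 },  /* "if" */
    { -1,   0 },  /* empty slot */
    {  3, 101 },  /* "for" */
    {  7, 102 },  /* "else" */
};

/* Scans the table; generated code would probe one slot via the hash. */
static const struct keyword *find_keyword(const char *s)
{
    size_t i;
    for (i = 0; i < sizeof wordlist / sizeof wordlist[0]; i++) {
        int o = wordlist[i].name;
        if (o >= 0 && strcmp(stringpool + o, s) == 0)
            return &wordlist[i];
    }
    return NULL;
}
```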
@@ -752,9 +787,260 @@ appearing left justified in the first column, as in the UNIX utility
<CODE>lex</CODE>.
</P>
+
+
+<H4><A NAME="SEC11" HREF="gperf.html#TOC11">3.1.1.2 Gperf Declarations</A></H4>
+
+<P>
+The declaration section can contain <CODE>gperf</CODE> declarations. They
+influence the way <CODE>gperf</CODE> works, like command line options do.
+In fact, every such declaration is equivalent to a command line option.
+There are three forms of declarations:
+
+</P>
+
+<OL>
+<LI>
+
+Declarations without argument, like <SAMP>`%compare-lengths'</SAMP>.
+
+<LI>
+
+Declarations with an argument, like <SAMP>`%switch=<VAR>count</VAR>'</SAMP>.
+
+<LI>
+
+Declarations of names of entities in the output file, like
+<SAMP>`%define lookup-function-name <VAR>name</VAR>'</SAMP>.
+</OL>
+
+<P>
+When a declaration is given both in the input file and as a command line
+option, the command-line option's value prevails.
+
+</P>
<P>
+The following <CODE>gperf</CODE> declarations are available.
+
+</P>
+<DL COMPACT>
+
+<DT><SAMP>`%delimiters=<VAR>delimiter-list</VAR>'</SAMP>
+<DD>
<A NAME="IDX9"></A>
+Allows you to provide a string containing delimiters used to
+separate keywords from their attributes. The default is ",". This
+option is essential if you want to use keywords that have embedded
+commas or newlines.
+
+<DT><SAMP>`%struct-type'</SAMP>
+<DD>
<A NAME="IDX10"></A>
+Allows you to include a <CODE>struct</CODE> type declaration for generated
+code; see above for an example.
+
+<DT><SAMP>`%ignore-case'</SAMP>
+<DD>
+<A NAME="IDX11"></A>
+Consider upper and lower case ASCII characters as equivalent. The string
+comparison will use a case insignificant character comparison. Note that
+locale dependent case mappings are ignored.
+
+<DT><SAMP>`%language=<VAR>language-name</VAR>'</SAMP>
+<DD>
+<A NAME="IDX12"></A>
+Instructs <CODE>gperf</CODE> to generate code in the language specified by the
+option's argument. Languages handled are currently:
+
+<DL COMPACT>
+
+<DT><SAMP>`KR-C'</SAMP>
+<DD>
+Old-style K&#38;R C. This language is understood by old-style C compilers and
+ANSI C compilers, but ANSI C compilers may flag warnings (or even errors)
+because of lacking <SAMP>`const'</SAMP>.
+
+<DT><SAMP>`C'</SAMP>
+<DD>
+Common C. This language is understood by ANSI C compilers, and also by
+old-style C compilers, provided that you <CODE>#define const</CODE> to empty
+for compilers which don't know about this keyword.
+
+<DT><SAMP>`ANSI-C'</SAMP>
+<DD>
+ANSI C. This language is understood by ANSI C compilers and C++ compilers.
+
+<DT><SAMP>`C++'</SAMP>
+<DD>
+C++. This language is understood by C++ compilers.
+</DL>
+
+The default is C.
+
+<DT><SAMP>`%define slot-name <VAR>name</VAR>'</SAMP>
+<DD>
+<A NAME="IDX13"></A>
+This declaration is only useful when option <SAMP>`-t'</SAMP> (or, equivalently, the
+<SAMP>`%struct-type'</SAMP> declaration) has been given.
+By default, the program assumes the structure component identifier for
+the keyword is <SAMP>`name'</SAMP>. This option allows an arbitrary choice of
+identifier for this component, although it still must occur as the first
+field in your supplied <CODE>struct</CODE>.
+
+<DT><SAMP>`%define initializer-suffix <VAR>initializers</VAR>'</SAMP>
+<DD>
+<A NAME="IDX14"></A>
+This declaration is only useful when option <SAMP>`-t'</SAMP> (or, equivalently, the
+<SAMP>`%struct-type'</SAMP> declaration) has been given.
+It permits to specify initializers for the structure members following
+<VAR>slot-name</VAR> in empty hash table entries. The list of initializers
+should start with a comma. By default, the emitted code will
+zero-initialize structure members following <VAR>slot-name</VAR>.
+
+<DT><SAMP>`%define hash-function-name <VAR>name</VAR>'</SAMP>
+<DD>
+<A NAME="IDX15"></A>
+Allows you to specify the name for the generated hash function. Default
+name is <SAMP>`hash'</SAMP>. This option permits the use of two hash tables in
+the same file.
+
+<DT><SAMP>`%define lookup-function-name <VAR>name</VAR>'</SAMP>
+<DD>
+<A NAME="IDX16"></A>
+Allows you to specify the name for the generated lookup function.
+Default name is <SAMP>`in_word_set'</SAMP>. This option permits multiple
+generated hash functions to be used in the same application.
+
+<DT><SAMP>`%define class-name <VAR>name</VAR>'</SAMP>
+<DD>
+<A NAME="IDX17"></A>
+This option is only useful when option <SAMP>`-L C++'</SAMP> (or, equivalently,
+the <SAMP>`%language=C++'</SAMP> declaration) has been given. It
+allows you to specify the name of generated C++ class. Default name is
+<CODE>Perfect_Hash</CODE>.
+
+<DT><SAMP>`%7bit'</SAMP>
+<DD>
+<A NAME="IDX18"></A>
+This option specifies that all strings that will be passed as arguments
+to the generated hash function and the generated lookup function will
+solely consist of 7-bit ASCII characters (bytes in the range 0..127).
+(Note that the ANSI C functions <CODE>isalnum</CODE> and <CODE>isgraph</CODE> do
+<EM>not</EM> guarantee that a byte is in this range. Only an explicit
+test like <SAMP>`c &#62;= 'A' &#38;&#38; c &#60;= 'Z''</SAMP> guarantees this.)
+
+<DT><SAMP>`%compare-lengths'</SAMP>
+<DD>
+<A NAME="IDX19"></A>
+Compare keyword lengths before trying a string comparison. This option
+is mandatory for binary comparisons (see section <A HREF="gperf.html#SEC17">3.3 Use of NUL bytes</A>). It also might
+cut down on the number of string comparisons made during the lookup, since
+keywords with different lengths are never compared via <CODE>strcmp</CODE>.
+However, using <SAMP>`%compare-lengths'</SAMP> might greatly increase the size of the
+generated C code if the lookup table range is large (which implies that
+the switch option <SAMP>`-S'</SAMP> or <SAMP>`%switch'</SAMP> is not enabled), since the length
+table contains as many elements as there are entries in the lookup table.
+
+<DT><SAMP>`%compare-strncmp'</SAMP>
+<DD>
+<A NAME="IDX20"></A>
+Generates C code that uses the <CODE>strncmp</CODE> function to perform
+string comparisons. The default action is to use <CODE>strcmp</CODE>.
+
+<DT><SAMP>`%readonly-tables'</SAMP>
+<DD>
+<A NAME="IDX21"></A>
+Makes the contents of all generated lookup tables constant, i.e.,
+"readonly". Many compilers can generate more efficient code for this
+by putting the tables in readonly memory.
+
+<DT><SAMP>`%enum'</SAMP>
+<DD>
+<A NAME="IDX22"></A>
+Define constant values using an enum local to the lookup function rather
+than with #defines. This also means that different lookup functions can
+reside in the same file. Thanks to James Clark <CODE>&#60;jjc@ai.mit.edu&#62;</CODE>.
+
+<DT><SAMP>`%includes'</SAMP>
+<DD>
+<A NAME="IDX23"></A>
+Include the necessary system include file, <CODE>&#60;string.h&#62;</CODE>, at the
+beginning of the code. By default, this is not done; the user must
+include this header file himself to allow compilation of the code.
+
+<DT><SAMP>`%global-table'</SAMP>
+<DD>
+<A NAME="IDX24"></A>
+Generate the static table of keywords as a static global variable,
+rather than hiding it inside of the lookup function (which is the
+default behavior).
+
+<DT><SAMP>`%pic'</SAMP>
+<DD>
+<A NAME="IDX25"></A>
+Optimize the generated table for inclusion in shared libraries. This
+reduces the startup time of programs using a shared library containing
+the generated code. If the <SAMP>`%struct-type'</SAMP> declaration (or,
+equivalently, the option <SAMP>`-t'</SAMP>) is also given, the first field of the
+user-defined struct must be of type <SAMP>`int'</SAMP>, not <SAMP>`char *'</SAMP>, because
+it will contain offsets into the string pool instead of actual strings.
+To convert such an offset to a string, you can use the expression
+<SAMP>`stringpool + <VAR>o</VAR>'</SAMP>, where <VAR>o</VAR> is the offset. The string pool
+name can be changed through the <SAMP>`%define string-pool-name'</SAMP> declaration.
+
+<DT><SAMP>`%define string-pool-name <VAR>name</VAR>'</SAMP>
+<DD>
+<A NAME="IDX26"></A>
+Allows you to specify the name of the generated string pool created by
+the declaration <SAMP>`%pic'</SAMP> (or, equivalently, the option <SAMP>`-P'</SAMP>).
+The default name is <SAMP>`stringpool'</SAMP>. This declaration permits the use of
+two hash tables in the same file, with <SAMP>`%pic'</SAMP> and even when the
+<SAMP>`%global-table'</SAMP> declaration (or, equivalently, the option <SAMP>`-G'</SAMP>)
+is given.
+
+<DT><SAMP>`%null-strings'</SAMP>
+<DD>
+<A NAME="IDX27"></A>
+Use NULL strings instead of empty strings for empty keyword table entries.
+This reduces the startup time of programs using a shared library containing
+the generated code (but not as much as the declaration <SAMP>`%pic'</SAMP>), at the
+expense of one more test-and-branch instruction at run time.
+
+<DT><SAMP>`%define word-array-name <VAR>name</VAR>'</SAMP>
+<DD>
+<A NAME="IDX28"></A>
+Allows you to specify the name for the generated array containing the
+hash table. Default name is <SAMP>`wordlist'</SAMP>. This option permits the
+use of two hash tables in the same file, even when the option <SAMP>`-G'</SAMP>
+(or, equivalently, the <SAMP>`%global-table'</SAMP> declaration) is given.
+
+<DT><SAMP>`%switch=<VAR>count</VAR>'</SAMP>
+<DD>
+<A NAME="IDX29"></A>
+Causes the generated C code to use a <CODE>switch</CODE> statement scheme,
+rather than an array lookup table. This can lead to a reduction in both
+time and space requirements for some input files. The argument to this
+option determines how many <CODE>switch</CODE> statements are generated. A
+value of 1 generates 1 <CODE>switch</CODE> containing all the elements, a
+value of 2 generates 2 tables with 1/2 the elements in each
+<CODE>switch</CODE>, etc. This is useful since many C compilers cannot
+correctly generate code for large <CODE>switch</CODE> statements. This option
+was inspired in part by Keith Bostic's original C program.
+
+<DT><SAMP>`%omit-struct-type'</SAMP>
+<DD>
+<A NAME="IDX30"></A>
+Prevents the transfer of the type declaration to the output file. Use
+this option if the type is already defined elsewhere.
+</DL>
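Two entries in the declaration list above, %ignore-case and %7bit, rely on ASCII-only byte tests rather than locale-dependent functions such as tolower or isalnum. The sketch below illustrates both ideas with hypothetical function names. It folds case only for 'A'..'Z' using an explicit range test (assuming an ASCII execution character set), and it checks that a string honors the %7bit promise.

```c
/* ASCII-only lowercase mapping via an explicit range test, so the
   result does not depend on the current locale (unlike tolower). */
static unsigned char ascii_tolower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char) (c - 'A' + 'a') : c;
}

/* Case-insensitive comparison in the spirit of %ignore-case: only the
   26 ASCII letters are folded; every other byte compares exactly.
   Returns <0, 0 or >0 like strcmp. */
static int ascii_strcasecmp(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *) a;
    const unsigned char *q = (const unsigned char *) b;
    for (;;) {
        unsigned char c1 = ascii_tolower(*p++);
        unsigned char c2 = ascii_tolower(*q++);
        if (c1 != c2 || c1 == '\0')
            return (int) c1 - (int) c2;
    }
}

/* The %7bit promise: every byte of s lies in 0..127. */
static int is_7bit(const char *s)
{
    for (; *s != '\0'; s++)
        if ((unsigned char) *s > 127)
            return 0;
    return 1;
}
```

Note that Latin-1 "\xC4" and "\xE4" (upper and lower case A with diaeresis) are not treated as equal here, matching the statement that locale-dependent case mappings are ignored.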
+
+
+
+<H4><A NAME="SEC12" HREF="gperf.html#TOC12">3.1.1.3 C Code Inclusion</A></H4>
+
+<P>
+<A NAME="IDX31"></A>
+<A NAME="IDX32"></A>
Using a syntax similar to GNU utilities <CODE>flex</CODE> and <CODE>bison</CODE>, it
is possible to directly include C source text and comments verbatim into
the generated output file. This is accomplished by enclosing the region
@@ -778,37 +1064,25 @@ march, 3, 31, 31
...
</PRE>
-<P>
-It is possible to omit the declaration section entirely. In this case
-the keyfile begins directly with the first keyword line, e.g.:
-</P>
-
-<PRE>
-january, 1, 31, 31
-february, 2, 28, 29
-march, 3, 31, 31
-april, 4, 30, 30
-...
-</PRE>
-
-
-<H3><A NAME="SEC10" HREF="gperf.html#TOC10">3.1.2 Format for Keyword Entries</A></H3>
+<H3><A NAME="SEC13" HREF="gperf.html#TOC13">3.1.2 Format for Keyword Entries</A></H3>
<P>
-The second keyfile format section contains lines of keywords and any
+The second input file format section contains lines of keywords and any
associated attributes you might supply. A line beginning with <SAMP>`#'</SAMP>
in the first column is considered a comment. Everything following the
-<SAMP>`#'</SAMP> is ignored, up to and including the following newline.
+<SAMP>`#'</SAMP> is ignored, up to and including the following newline. A line
+beginning with <SAMP>`%'</SAMP> in the first column is an option declaration and
+must not occur within the keywords section.
</P>
<P>
-The first field of each non-comment line is always the key itself. It
+The first field of each non-comment line is always the keyword itself. It
can be given in two ways: as a simple name, i.e., without surrounding
string quotation marks, or as a string enclosed in double-quotes, in
C syntax, possibly with backslash escapes like <CODE>\"</CODE> or <CODE>\234</CODE>
-or <CODE>\xa8</CODE>. In either case, it must start right at the beginning
+or <CODE>\xa8</CODE>. In either case, it must start right at the beginning
of the line, without leading whitespace.
In this context, a "field" is considered to extend up to, but
not include, the first blank, comma, or newline. Here is a simple
@@ -840,14 +1114,15 @@ Additional fields may optionally follow the leading keyword. Fields
should be separated by commas, and terminate at the end of line. What
these fields mean is entirely up to you; they are used to initialize the
elements of the user-defined <CODE>struct</CODE> provided by you in the
-declaration section. If the <SAMP>`-t'</SAMP> option is <EM>not</EM> enabled
+declaration section. If the <SAMP>`-t'</SAMP> option (or, equivalently, the
+<SAMP>`%struct-type'</SAMP> declaration) is <EM>not</EM> enabled,
these fields are simply ignored. All previous examples except the last
one contain keyword attributes.
</P>
-<H3><A NAME="SEC11" HREF="gperf.html#TOC11">3.1.3 Including Additional C Functions</A></H3>
+<H3><A NAME="SEC14" HREF="gperf.html#TOC14">3.1.3 Including Additional C Functions</A></H3>
<P>
The optional third section also corresponds closely with conventions
@@ -860,9 +1135,57 @@ section is valid C.
</P>
-<H2><A NAME="SEC12" HREF="gperf.html#TOC12">3.2 Output Format for Generated C Code with <CODE>gperf</CODE></A></H2>
+<H3><A NAME="SEC15" HREF="gperf.html#TOC15">3.1.4 Where to place directives for GNU <CODE>indent</CODE>.</A></H3>
+
<P>
-<A NAME="IDX11"></A>
+If you want to invoke GNU <CODE>indent</CODE> on a <CODE>gperf</CODE> input file,
+you will see that GNU <CODE>indent</CODE> doesn't understand the <SAMP>`%%'</SAMP>,
+<SAMP>`%{'</SAMP> and <SAMP>`%}'</SAMP> directives that control <CODE>gperf</CODE>'s
+interpretation of the input file. Therefore you have to insert some
+directives for GNU <CODE>indent</CODE>. More precisely, assuming the most
+general input file structure
+
+</P>
+
+<PRE>
+declarations part 1
+%{
+verbatim code
+%}
+declarations part 2
+%%
+keywords
+%%
+functions
+</PRE>
+
+<P>
+you would insert <SAMP>`*INDENT-OFF*'</SAMP> and <SAMP>`*INDENT-ON*'</SAMP> comments
+as follows:
+
+</P>
+
+<PRE>
+/* *INDENT-OFF* */
+declarations part 1
+%{
+/* *INDENT-ON* */
+verbatim code
+/* *INDENT-OFF* */
+%}
+declarations part 2
+%%
+keywords
+%%
+/* *INDENT-ON* */
+functions
+</PRE>
+
+
+
+<H2><A NAME="SEC16" HREF="gperf.html#TOC16">3.2 Output Format for Generated C Code with <CODE>gperf</CODE></A></H2>
+<P>
+<A NAME="IDX33"></A>
</P>
<P>
@@ -877,34 +1200,36 @@ function prototypes are as follows:
<P>
<DL>
<DT><U>Function:</U> unsigned int <B>hash</B> <I>(const char * <VAR>str</VAR>, unsigned int <VAR>len</VAR>)</I>
-<DD><A NAME="IDX12"></A>
+<DD><A NAME="IDX34"></A>
By default, the generated <CODE>hash</CODE> function returns an integer value
-created by adding <VAR>len</VAR> to several user-specified <VAR>str</VAR> key
+created by adding <VAR>len</VAR> to the values that the bytes of <VAR>str</VAR> at several
user-specified positions select from an <STRONG>associated values</STRONG> table stored in a
local static array. The associated values table is constructed
internally by <CODE>gperf</CODE> and later output as a static local C array
-called <SAMP>`hash_table'</SAMP>; its meaning and properties are described below
-(see section <A HREF="gperf.html#SEC22">7 Implementation Details of GNU <CODE>gperf</CODE></A>). The relevant key positions are specified via
-the <SAMP>`-k'</SAMP> option when running <CODE>gperf</CODE>, as detailed in the
-<EM>Options</EM> section below(see section <A HREF="gperf.html#SEC14">4 Invoking <CODE>gperf</CODE></A>).
+called <SAMP>`hash_table'</SAMP>. The relevant selected positions (i.e. indices
+into <VAR>str</VAR>) are specified via the <SAMP>`-k'</SAMP> option when running
+<CODE>gperf</CODE>, as detailed in the <EM>Options</EM> section below (see section <A HREF="gperf.html#SEC18">4 Invoking <CODE>gperf</CODE></A>).
</DL>
</P>
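The hash computation described above can be sketched as follows. This illustrates the scheme under stated assumptions and is not the code <CODE>gperf</CODE> emits. The table name <CODE>asso_values</CODE>, the function <CODE>sketch_hash</CODE> and its parameters are hypothetical. Positions are 1-based, as with <SAMP>`-k'</SAMP>, and a flag stands in for the <SAMP>`$'</SAMP> (final byte) position.

```c
#include <stddef.h>

/* Hypothetical associated values table, one entry per byte value.
   gperf computes these values itself; here they start at 0 and a
   caller may fill in a few entries for experimentation. */
static unsigned int asso_values[256];

/* Sketch of the documented scheme: start from len, then add the associated
   value of the byte found at each selected position.  Positions are 1-based,
   as with -k, and positions beyond the end of str are skipped.  If use_last
   is nonzero, the final byte is also added, as with the '$' position. */
static unsigned int sketch_hash (const char *str, size_t len,
                                 const unsigned int *positions, size_t npos,
                                 int use_last)
{
  unsigned int hval = (unsigned int) len;

  for (size_t i = 0; i < npos; i++)
    if (positions[i] >= 1 && positions[i] <= len)
      hval += asso_values[(unsigned char) str[positions[i] - 1]];

  if (use_last && len > 0)
    hval += asso_values[(unsigned char) str[len - 1]];

  return hval;
}
```

Because out-of-range positions are skipped, short keywords still hash correctly. This matches the description of <SAMP>`-k'</SAMP> later in this manual.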
<P>
<DL>
<DT><U>Function:</U> <B>in_word_set</B> <I>(const char * <VAR>str</VAR>, unsigned int <VAR>len</VAR>)</I>
-<DD><A NAME="IDX13"></A>
+<DD><A NAME="IDX35"></A>
If <VAR>str</VAR> is in the keyword set, returns a pointer to that
-keyword. More exactly, if the option <SAMP>`-t'</SAMP> was given, it returns
-a pointer to the matching keyword's structure. Otherwise it returns
+keyword. More exactly, if the option <SAMP>`-t'</SAMP> (or, equivalently, the
+<SAMP>`%struct-type'</SAMP> declaration) was given, it returns
+a pointer to the matching keyword's structure. Otherwise it returns
<CODE>NULL</CODE>.
</DL>
</P>
<P>
-If the option <SAMP>`-c'</SAMP> is not used, <VAR>str</VAR> must be a NUL terminated
-string of exactly length <VAR>len</VAR>. If <SAMP>`-c'</SAMP> is used, <VAR>str</VAR> must
-simply be an array of <VAR>len</VAR> characters and does not need to be NUL
+If the option <SAMP>`-c'</SAMP> (or, equivalently, the <SAMP>`%compare-strncmp'</SAMP>
+declaration) is not used, <VAR>str</VAR> must be a NUL terminated
+string of exactly length <VAR>len</VAR>. If <SAMP>`-c'</SAMP> (or, equivalently, the
+<SAMP>`%compare-strncmp'</SAMP> declaration) is used, <VAR>str</VAR> must
+simply be an array of <VAR>len</VAR> bytes and does not need to be NUL
terminated.
</P>
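The two calling contracts can be sketched with two hypothetical helpers. They show the contracts only and are not the exact comparison code that <CODE>gperf</CODE> generates. Without <SAMP>`-c'</SAMP>, the caller guarantees NUL termination, so <CODE>strcmp</CODE> is safe. With <SAMP>`-c'</SAMP>, only <VAR>len</VAR> bytes of <VAR>str</VAR> may be read, so the comparison has to be bounded and must also reject keywords of a different length.

```c
#include <string.h>

/* Default contract (no -c): str is NUL-terminated and strlen (str) == len,
   so a plain strcmp may read up to and including the terminator. */
static int matches_nul_terminated (const char *str, const char *keyword)
{
  return strcmp (str, keyword) == 0;
}

/* -c contract: str is an array of len bytes that need not be
   NUL-terminated, so at most len bytes of it may be read.  strncmp
   bounds the read; the length check rejects keywords longer than len
   (and shorter ones, should str contain a NUL byte). */
static int matches_counted (const char *str, size_t len, const char *keyword)
{
  return strlen (keyword) == len && strncmp (str, keyword, len) == 0;
}
```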
@@ -925,7 +1250,7 @@ Make use of the user-defined <CODE>struct</CODE>.
<DD>
<DT><SAMP>`--switch=<VAR>total-switch-statements</VAR>'</SAMP>
<DD>
-<A NAME="IDX14"></A>
+<A NAME="IDX36"></A>
Generate 1 or more C <CODE>switch</CODE> statements rather than use a large
(and potentially sparse) static array. Although the exact time and
space savings of this approach vary according to your C compiler's
@@ -934,9 +1259,11 @@ code.
</DL>
<P>
-If the <SAMP>`-t'</SAMP> and <SAMP>`-S'</SAMP> options are omitted, the default action
-is to generate a <CODE>char *</CODE> array containing the keys, together with
-additional null strings used for padding the array. By experimenting
+If the <SAMP>`-t'</SAMP> and <SAMP>`-S'</SAMP> options (or, equivalently, the
+<SAMP>`%struct-type'</SAMP> and <SAMP>`%switch'</SAMP> declarations) are omitted, the default
+action
+is to generate a <CODE>char *</CODE> array containing the keywords, together with
+additional empty strings used for padding the array. By experimenting
with the various input and output options, and timing the resulting C
code, you can determine the best option choices for different keyword
set characteristics.
@@ -944,60 +1271,84 @@ set characteristics.
</P>
-<H2><A NAME="SEC13" HREF="gperf.html#TOC13">3.3 Use of NUL characters</A></H2>
+<H2><A NAME="SEC17" HREF="gperf.html#TOC17">3.3 Use of NUL bytes</A></H2>
<P>
-<A NAME="IDX15"></A>
+<A NAME="IDX37"></A>
</P>
<P>
By default, the code generated by <CODE>gperf</CODE> operates on zero
-terminated strings, the usual representation of strings in C. This means
-that the keywords in the input file must not contain NUL characters,
+terminated strings, the usual representation of strings in C. This means
+that the keywords in the input file must not contain NUL bytes,
and the <VAR>str</VAR> argument passed to <CODE>hash</CODE> or <CODE>in_word_set</CODE>
must be NUL terminated and have exactly length <VAR>len</VAR>.
</P>
<P>
-If option <SAMP>`-c'</SAMP> is used, then the <VAR>str</VAR> argument does not need
-to be NUL terminated. The code generated by <CODE>gperf</CODE> will only
+If option <SAMP>`-c'</SAMP> (or, equivalently, the <SAMP>`%compare-strncmp'</SAMP>
+declaration) is used, then the <VAR>str</VAR> argument does not need
+to be NUL terminated. The code generated by <CODE>gperf</CODE> will only
access the first <VAR>len</VAR>, not <VAR>len+1</VAR>, bytes starting at <VAR>str</VAR>.
However, the keywords in the input file still must not contain NUL
-characters.
+bytes.
</P>
<P>
-If option <SAMP>`-l'</SAMP> is used, then the hash table performs binary
-comparison. The keywords in the input file may contain NUL characters,
+If option <SAMP>`-l'</SAMP> (or, equivalently, the <SAMP>`%compare-lengths'</SAMP>
+declaration) is used, then the hash table performs binary
+comparison. The keywords in the input file may contain NUL bytes,
written in string syntax as <CODE>\000</CODE> or <CODE>\x00</CODE>, and the code
-generated by <CODE>gperf</CODE> will treat NUL like any other character.
-Also, in this case the <SAMP>`-c'</SAMP> option is ignored.
+generated by <CODE>gperf</CODE> will treat NUL like any other byte.
+Also, in this case the <SAMP>`-c'</SAMP> option (or, equivalently, the
+<SAMP>`%compare-strncmp'</SAMP> declaration) is ignored.
</P>
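The following sketch shows why <SAMP>`-l'</SAMP> is needed once keywords may contain NUL bytes. The names <CODE>wordlist</CODE>, <CODE>lengthtable</CODE> and <CODE>binary_match</CODE> are used here for illustration only. <CODE>strcmp</CODE> would stop at the embedded NUL, so the length has to be stored separately and the bytes compared with <CODE>memcmp</CODE>.

```c
#include <string.h>

/* Hypothetical keyword table.  The second keyword contains a NUL byte,
   which is only permitted with -l; strlen would report its length as 1,
   so the real lengths are kept in a separate table. */
static const char *const wordlist[] = { "ab", "a\0b" };
static const unsigned char lengthtable[] = { 2, 3 };

/* Binary comparison of str (len bytes) against entry i: compare the
   lengths first, then every byte with memcmp, which, unlike strcmp,
   does not stop at a NUL byte. */
static int binary_match (const char *str, size_t len, size_t i)
{
  return len == lengthtable[i] && memcmp (str, wordlist[i], len) == 0;
}
```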
-<H1><A NAME="SEC14" HREF="gperf.html#TOC14">4 Invoking <CODE>gperf</CODE></A></H1>
+<H1><A NAME="SEC18" HREF="gperf.html#TOC18">4 Invoking <CODE>gperf</CODE></A></H1>
<P>
There are <EM>many</EM> options to <CODE>gperf</CODE>. They were added to make
the program more convenient for use with real applications. "On-line"
-help is readily available via the <SAMP>`-h'</SAMP> option. Here is the
+help is readily available via the <SAMP>`--help'</SAMP> option. Here is the
complete list of options.
</P>
-<H2><A NAME="SEC15" HREF="gperf.html#TOC15">4.1 Options that affect Interpretation of the Input File</A></H2>
+<H2><A NAME="SEC19" HREF="gperf.html#TOC19">4.1 Specifying the Location of the Output File</A></H2>
<DL COMPACT>
+<DT><SAMP>`--output-file=<VAR>file</VAR>'</SAMP>
+<DD>
+Allows you to specify the name of the file to which the output is written.
+</DL>
+
+<P>
+The results are written to standard output if no output file is specified
+or if it is <SAMP>`-'</SAMP>.
+
+</P>
+
+
+<H2><A NAME="SEC20" HREF="gperf.html#TOC20">4.2 Options that affect Interpretation of the Input File</A></H2>
+
+<P>
+These options are also available as declarations in the input file
+(see section <A HREF="gperf.html#SEC11">3.1.1.2 Gperf Declarations</A>).
+
+</P>
+<DL COMPACT>
+
<DT><SAMP>`-e <VAR>keyword-delimiter-list</VAR>'</SAMP>
<DD>
<DT><SAMP>`--delimiters=<VAR>keyword-delimiter-list</VAR>'</SAMP>
<DD>
-<A NAME="IDX16"></A>
-Allows the user to provide a string containing delimiters used to
-separate keywords from their attributes. The default is ",\n". This
+<A NAME="IDX38"></A>
+Allows you to provide a string containing delimiters used to
+separate keywords from their attributes. The default is ",". This
option is essential if you want to use keywords that have embedded
commas or newlines. One useful trick is to use -e'TAB', where TAB is
the literal tab character.
@@ -1012,12 +1363,29 @@ part of the type declaration. Keywords and additional fields may follow
this, one group of fields per line. A set of examples for generating
perfect hash tables and functions for Ada, C, C++, Pascal, Modula 2,
Modula 3 and JavaScript reserved words are distributed with this release.
+
+<DT><SAMP>`--ignore-case'</SAMP>
+<DD>
+Consider upper and lower case ASCII characters as equivalent. The string
+comparison will use a case-insensitive character comparison. Note that
+locale dependent case mappings are ignored. This option is therefore not
+suitable if a properly internationalized or locale aware case mapping
+should be used. (For example, in a Turkish locale, the upper case equivalent
+of the lowercase ASCII letter <SAMP>`i'</SAMP> is the non-ASCII character
+<SAMP>`capital i with dot above'</SAMP>.) For this case, it is better to apply
+an uppercase or lowercase conversion on the string before passing it to
+the <CODE>gperf</CODE> generated function.
</DL>
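Here is a minimal sketch of an ASCII-only, locale-independent case-insensitive comparison in the spirit of <SAMP>`--ignore-case'</SAMP>. The function names are hypothetical, and this is not the code <CODE>gperf</CODE> generates. Only <SAMP>`A'</SAMP> through <SAMP>`Z'</SAMP> are folded. Bytes outside ASCII, such as the UTF-8 bytes of a Turkish dotted capital I, are therefore compared exactly.

```c
#include <stddef.h>

/* ASCII-only case folding: only 'A'..'Z' are mapped, so the result does
   not depend on the current locale (unlike tolower). */
static unsigned char ascii_tolower (unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? (unsigned char) (c - 'A' + 'a') : c;
}

/* Compare len bytes, ignoring ASCII case only.  Returns 1 if equal. */
static int ascii_equal_ignore_case (const char *a, const char *b, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if (ascii_tolower ((unsigned char) a[i]) != ascii_tolower ((unsigned char) b[i]))
      return 0;
  return 1;
}
```

For a locale-aware mapping, convert the string before calling the lookup function instead, as the text above recommends.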
-<H2><A NAME="SEC16" HREF="gperf.html#TOC16">4.2 Options to specify the Language for the Output Code</A></H2>
+<H2><A NAME="SEC21" HREF="gperf.html#TOC21">4.3 Options to specify the Language for the Output Code</A></H2>
+
+<P>
+These options are also available as declarations in the input file
+(see section <A HREF="gperf.html#SEC11">3.1.1.2 Gperf Declarations</A>).
+</P>
<DL COMPACT>
<DT><SAMP>`-L <VAR>generated-language-name</VAR>'</SAMP>
@@ -1031,23 +1399,23 @@ option's argument. Languages handled are currently:
<DT><SAMP>`KR-C'</SAMP>
<DD>
-Old-style K&#38;R C. This language is understood by old-style C compilers and
+Old-style K&#38;R C. This language is understood by old-style C compilers and
ANSI C compilers, but ANSI C compilers may flag warnings (or even errors)
because of lacking <SAMP>`const'</SAMP>.
<DT><SAMP>`C'</SAMP>
<DD>
-Common C. This language is understood by ANSI C compilers, and also by
+Common C. This language is understood by ANSI C compilers, and also by
old-style C compilers, provided that you <CODE>#define const</CODE> to empty
for compilers which don't know about this keyword.
<DT><SAMP>`ANSI-C'</SAMP>
<DD>
-ANSI C. This language is understood by ANSI C compilers and C++ compilers.
+ANSI C. This language is understood by ANSI C compilers and C++ compilers.
<DT><SAMP>`C++'</SAMP>
<DD>
-C++. This language is understood by C++ compilers.
+C++. This language is understood by C++ compilers.
</DL>
The default is C.
@@ -1055,26 +1423,32 @@ The default is C.
<DT><SAMP>`-a'</SAMP>
<DD>
This option is supported for compatibility with previous releases of
-<CODE>gperf</CODE>. It does not do anything.
+<CODE>gperf</CODE>. It does not do anything.
<DT><SAMP>`-g'</SAMP>
<DD>
This option is supported for compatibility with previous releases of
-<CODE>gperf</CODE>. It does not do anything.
+<CODE>gperf</CODE>. It does not do anything.
</DL>
-<H2><A NAME="SEC17" HREF="gperf.html#TOC17">4.3 Options for fine tuning Details in the Output Code</A></H2>
+<H2><A NAME="SEC22" HREF="gperf.html#TOC22">4.4 Options for fine tuning Details in the Output Code</A></H2>
+<P>
+Most of these options are also available as declarations in the input file
+(see section <A HREF="gperf.html#SEC11">3.1.1.2 Gperf Declarations</A>).
+
+</P>
<DL COMPACT>
-<DT><SAMP>`-K <VAR>key-name</VAR>'</SAMP>
+<DT><SAMP>`-K <VAR>slot-name</VAR>'</SAMP>
<DD>
-<DT><SAMP>`--slot-name=<VAR>key-name</VAR>'</SAMP>
+<DT><SAMP>`--slot-name=<VAR>slot-name</VAR>'</SAMP>
<DD>
-<A NAME="IDX17"></A>
-This option is only useful when option <SAMP>`-t'</SAMP> has been given.
+<A NAME="IDX39"></A>
+This option is only useful when option <SAMP>`-t'</SAMP> (or, equivalently, the
+<SAMP>`%struct-type'</SAMP> declaration) has been given.
By default, the program assumes the structure component identifier for
the keyword is <SAMP>`name'</SAMP>. This option allows an arbitrary choice of
identifier for this component, although it still must occur as the first
@@ -1084,16 +1458,17 @@ field in your supplied <CODE>struct</CODE>.
<DD>
<DT><SAMP>`--initializer-suffix=<VAR>initializers</VAR>'</SAMP>
<DD>
-<A NAME="IDX18"></A>
-This option is only useful when option <SAMP>`-t'</SAMP> has been given.
+<A NAME="IDX40"></A>
+This option is only useful when option <SAMP>`-t'</SAMP> (or, equivalently, the
+<SAMP>`%struct-type'</SAMP> declaration) has been given.
It lets you specify initializers for the structure members following
-<VAR>key name</VAR> in empty hash table entries. The list of initializers
+<VAR>slot-name</VAR> in empty hash table entries. The list of initializers
should start with a comma. By default, the emitted code will
-zero-initialize structure members following <VAR>key name</VAR>.
+zero-initialize structure members following <VAR>slot-name</VAR>.
<DT><SAMP>`-H <VAR>hash-function-name</VAR>'</SAMP>
<DD>
-<DT><SAMP>`--hash-fn-name=<VAR>hash-function-name</VAR>'</SAMP>
+<DT><SAMP>`--hash-function-name=<VAR>hash-function-name</VAR>'</SAMP>
<DD>
Allows you to specify the name for the generated hash function. Default
name is <SAMP>`hash'</SAMP>. This option permits the use of two hash tables in
@@ -1101,19 +1476,19 @@ the same file.
<DT><SAMP>`-N <VAR>lookup-function-name</VAR>'</SAMP>
<DD>
-<DT><SAMP>`--lookup-fn-name=<VAR>lookup-function-name</VAR>'</SAMP>
+<DT><SAMP>`--lookup-function-name=<VAR>lookup-function-name</VAR>'</SAMP>
<DD>
Allows you to specify the name for the generated lookup function.
-Default name is <SAMP>`in_word_set'</SAMP>. This option permits completely
-automatic generation of perfect hash functions, especially when multiple
-generated hash functions are used in the same application.
+Default name is <SAMP>`in_word_set'</SAMP>. This option permits multiple
+generated hash functions to be used in the same application.
<DT><SAMP>`-Z <VAR>class-name</VAR>'</SAMP>
<DD>
<DT><SAMP>`--class-name=<VAR>class-name</VAR>'</SAMP>
<DD>
-<A NAME="IDX19"></A>
-This option is only useful when option <SAMP>`-L C++'</SAMP> has been given. It
+<A NAME="IDX41"></A>
+This option is only useful when option <SAMP>`-L C++'</SAMP> (or, equivalently,
+the <SAMP>`%language=C++'</SAMP> declaration) has been given. It
allows you to specify the name of generated C++ class. Default name is
<CODE>Perfect_Hash</CODE>.
@@ -1123,12 +1498,25 @@ allows you to specify the name of generated C++ class. Default name is
<DD>
This option specifies that all strings that will be passed as arguments
to the generated hash function and the generated lookup function will
-solely consist of 7-bit ASCII characters (characters in the range 0..127).
+solely consist of 7-bit ASCII characters (bytes in the range 0..127).
(Note that the ANSI C functions <CODE>isalnum</CODE> and <CODE>isgraph</CODE> do
-<EM>not</EM> guarantee that a character is in this range. Only an explicit
+<EM>not</EM> guarantee that a byte is in this range. Only an explicit
test like <SAMP>`c &#62;= 'A' &#38;&#38; c &#60;= 'Z''</SAMP> guarantees this.) This was the
default in versions of <CODE>gperf</CODE> earlier than 2.7; now the default is
-to assume 8-bit characters.
+to support 8-bit and multibyte characters.
+
+<DT><SAMP>`-l'</SAMP>
+<DD>
+<DT><SAMP>`--compare-lengths'</SAMP>
+<DD>
+Compare keyword lengths before trying a string comparison. This option
+is mandatory for binary comparisons (see section <A HREF="gperf.html#SEC17">3.3 Use of NUL bytes</A>). It also might
+cut down on the number of string comparisons made during the lookup, since
+keywords with different lengths are never compared via <CODE>strcmp</CODE>.
+However, using <SAMP>`-l'</SAMP> might greatly increase the size of the
+generated C code if the lookup table range is large (which implies that
+the switch option <SAMP>`-S'</SAMP> or <SAMP>`%switch'</SAMP> is not enabled), since the length
+table contains as many elements as there are entries in the lookup table.
<DT><SAMP>`-c'</SAMP>
<DD>
@@ -1163,35 +1551,66 @@ include this header file himself to allow compilation of the code.
<DT><SAMP>`-G'</SAMP>
<DD>
-<DT><SAMP>`--global'</SAMP>
+<DT><SAMP>`--global-table'</SAMP>
<DD>
Generate the static table of keywords as a static global variable,
rather than hiding it inside of the lookup function (which is the
default behavior).
+<DT><SAMP>`-P'</SAMP>
+<DD>
+<DT><SAMP>`--pic'</SAMP>
+<DD>
+Optimize the generated table for inclusion in shared libraries. This
+reduces the startup time of programs using a shared library containing
+the generated code. If the option <SAMP>`-t'</SAMP> (or, equivalently, the
+<SAMP>`%struct-type'</SAMP> declaration) is also given, the first field of the
+user-defined struct must be of type <SAMP>`int'</SAMP>, not <SAMP>`char *'</SAMP>, because
+it will contain offsets into the string pool instead of actual strings.
+To convert such an offset to a string, you can use the expression
+<SAMP>`stringpool + <VAR>o</VAR>'</SAMP>, where <VAR>o</VAR> is the offset. The string pool
+name can be changed through the option <SAMP>`--string-pool-name'</SAMP>.
+
+<DT><SAMP>`-Q <VAR>string-pool-name</VAR>'</SAMP>
+<DD>
+<DT><SAMP>`--string-pool-name=<VAR>string-pool-name</VAR>'</SAMP>
+<DD>
+Allows you to specify the name of the generated string pool created by
+option <SAMP>`-P'</SAMP>. The default name is <SAMP>`stringpool'</SAMP>. This option
+permits the use of two hash tables in the same file, with <SAMP>`-P'</SAMP> and
+even when the option <SAMP>`-G'</SAMP> (or, equivalently, the <SAMP>`%global-table'</SAMP>
+declaration) is given.
+
+<DT><SAMP>`--null-strings'</SAMP>
+<DD>
+Use NULL strings instead of empty strings for empty keyword table entries.
+This reduces the startup time of programs using a shared library containing
+the generated code (but not as much as option <SAMP>`-P'</SAMP>), at the expense
+of one more test-and-branch instruction at run time.
+
<DT><SAMP>`-W <VAR>hash-table-array-name</VAR>'</SAMP>
<DD>
<DT><SAMP>`--word-array-name=<VAR>hash-table-array-name</VAR>'</SAMP>
<DD>
-<A NAME="IDX20"></A>
+<A NAME="IDX42"></A>
Allows you to specify the name for the generated array containing the
hash table. Default name is <SAMP>`wordlist'</SAMP>. This option permits the
use of two hash tables in the same file, even when the option <SAMP>`-G'</SAMP>
-is given.
+(or, equivalently, the <SAMP>`%global-table'</SAMP> declaration) is given.
<DT><SAMP>`-S <VAR>total-switch-statements</VAR>'</SAMP>
<DD>
<DT><SAMP>`--switch=<VAR>total-switch-statements</VAR>'</SAMP>
<DD>
-<A NAME="IDX21"></A>
+<A NAME="IDX43"></A>
Causes the generated C code to use a <CODE>switch</CODE> statement scheme,
rather than an array lookup table. This can lead to a reduction in both
-time and space requirements for some keyfiles. The argument to this
-option determines how many <CODE>switch</CODE> statements are generated. A
+time and space requirements for some input files. The argument to this
+option determines how many <CODE>switch</CODE> statements are generated. A
value of 1 generates 1 <CODE>switch</CODE> containing all the elements, a
value of 2 generates 2 tables with 1/2 the elements in each
<CODE>switch</CODE>, etc. This is useful since many C compilers cannot
-correctly generate code for large <CODE>switch</CODE> statements. This option
+correctly generate code for large <CODE>switch</CODE> statements. This option
was inspired in part by Keith Bostic's original C program.
<DT><SAMP>`-T'</SAMP>
@@ -1204,92 +1623,66 @@ this option if the type is already defined elsewhere.
<DT><SAMP>`-p'</SAMP>
<DD>
This option is supported for compatibility with previous releases of
-<CODE>gperf</CODE>. It does not do anything.
+<CODE>gperf</CODE>. It does not do anything.
</DL>
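The offset representation used with <SAMP>`-P'</SAMP> can be illustrated with a hand-written sketch. The keyword set, <CODE>struct keyword</CODE>, <CODE>table</CODE>, <CODE>keyword_name</CODE> and <CODE>find_keyword</CODE> are all hypothetical. Only the name <CODE>stringpool</CODE> and the <CODE>stringpool + o</CODE> expression come from the description above, and real <CODE>gperf</CODE> output may lay out its pool differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical string pool: every keyword, NUL-terminated, back to back.
   "if" starts at offset 0, "else" at 3, "while" at 8. */
static const char stringpool[] = "if\0else\0while";

/* Hypothetical user struct for -t together with -P: the first field is
   an int offset into the pool instead of a char * pointer. */
struct keyword { int name; int token; };

static const struct keyword table[] = { { 0, 1 }, { 3, 2 }, { 8, 3 } };

/* Turn an offset back into a string: stringpool + o. */
static const char *keyword_name (const struct keyword *k)
{
  return stringpool + k->name;
}

/* Linear search, only to exercise the offsets; a generated lookup
   function would hash instead. */
static const struct keyword *find_keyword (const char *s)
{
  for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
    if (strcmp (keyword_name (&table[i]), s) == 0)
      return &table[i];
  return NULL;
}
```

The table holds small integers instead of pointers, so a shared library needs no load-time relocations for it. This is the usual reason such a layout shortens startup time.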
-<H2><A NAME="SEC18" HREF="gperf.html#TOC18">4.4 Options for changing the Algorithms employed by <CODE>gperf</CODE></A></H2>
+<H2><A NAME="SEC23" HREF="gperf.html#TOC23">4.5 Options for changing the Algorithms employed by <CODE>gperf</CODE></A></H2>
<DL COMPACT>
-<DT><SAMP>`-k <VAR>keys</VAR>'</SAMP>
+<DT><SAMP>`-k <VAR>selected-byte-positions</VAR>'</SAMP>
<DD>
-<DT><SAMP>`--key-positions=<VAR>keys</VAR>'</SAMP>
+<DT><SAMP>`--key-positions=<VAR>selected-byte-positions</VAR>'</SAMP>
<DD>
-Allows selection of the character key positions used in the keywords'
-hash function. The allowable choices range between 1-126, inclusive.
+Allows selection of the byte positions used in the keywords'
+hash function. The allowable choices range between 1-255, inclusive.
The positions are separated by commas, e.g., <SAMP>`-k 9,4,13,14'</SAMP>;
ranges may be used, e.g., <SAMP>`-k 2-7'</SAMP>; and positions may occur
-in any order. Furthermore, the meta-character '*' causes the generated
-hash function to consider <STRONG>all</STRONG> character positions in each key,
-whereas '$' instructs the hash function to use the "final character"
-of a key (this is the only way to use a character position greater than
-126, incidentally).
+in any order. Furthermore, the wildcard '*' causes the generated
+hash function to consider <STRONG>all</STRONG> byte positions in each keyword,
+whereas '$' instructs the hash function to use the "final byte"
+of a keyword (this is the only way to use a byte position greater than
+255, incidentally).
For instance, the option <SAMP>`-k 1,2,4,6-10,'$''</SAMP> generates a hash
function that considers positions 1,2,4,6,7,8,9,10, plus the last
-character in each key (which may differ for each key, obviously). Keys
-with length less than the indicated key positions work properly, since
-selected key positions exceeding the key length are simply not
+byte in each keyword (which may be at a different position for each
+keyword, obviously). Keywords
+with length less than the indicated byte positions work properly, since
+selected byte positions exceeding the keyword length are simply not
referenced in the hash function.
-<DT><SAMP>`-l'</SAMP>
-<DD>
-<DT><SAMP>`--compare-strlen'</SAMP>
-<DD>
-Compare key lengths before trying a string comparison. This might cut
-down on the number of string comparisons made during the lookup, since
-keys with different lengths are never compared via <CODE>strcmp</CODE>.
-However, using <SAMP>`-l'</SAMP> might greatly increase the size of the
-generated C code if the lookup table range is large (which implies that
-the switch option <SAMP>`-S'</SAMP> is not enabled), since the length table
-contains as many elements as there are entries in the lookup table.
-This option is mandatory for binary comparisons (see section <A HREF="gperf.html#SEC13">3.3 Use of NUL characters</A>).
+This option is not normally needed since version 2.8 of <CODE>gperf</CODE>;
+the default byte positions are computed depending on the keyword set,
+through a search that minimizes the number of byte positions.
<DT><SAMP>`-D'</SAMP>
<DD>
<DT><SAMP>`--duplicates'</SAMP>
<DD>
-<A NAME="IDX22"></A>
-Handle keywords whose key position sets hash to duplicate values.
-Duplicate hash values occur for two reasons:
-
-
-<UL>
-<LI>
-
-Since <CODE>gperf</CODE> does not backtrack it is possible for it to process
-all your input keywords without finding a unique mapping for each word.
-However, frequently only a very small number of duplicates occur, and
-the majority of keys still require one probe into the table.
-
-<LI>
-
-Sometimes a set of keys may have the same names, but possess different
-attributes. With the -D option <CODE>gperf</CODE> treats all these keys as
+<A NAME="IDX44"></A>
+Handle keywords whose selected byte sets hash to duplicate values.
+Duplicate hash values can occur if several keywords have the same name but
+different attributes, or if the selected byte positions are not well
+chosen. With the -D option <CODE>gperf</CODE> treats all these keywords as
part of an equivalence class and generates a perfect hash function with
-multiple comparisons for duplicate keys. It is up to you to completely
+multiple comparisons for duplicate keywords. It is up to you to completely
disambiguate the keywords by modifying the generated C code. However,
<CODE>gperf</CODE> helps you out by organizing the output.
-</UL>
-Option <SAMP>`-D'</SAMP> is extremely useful for certain large or highly
-redundant keyword sets, e.g., assembler instruction opcodes.
Using this option usually means that the generated hash function is no
longer perfect. On the other hand, it permits <CODE>gperf</CODE> to work on
keyword sets that it otherwise could not handle.
-<DT><SAMP>`-f <VAR>iteration-amount</VAR>'</SAMP>
+<DT><SAMP>`-m <VAR>iterations</VAR>'</SAMP>
<DD>
-<DT><SAMP>`--fast=<VAR>iteration-amount</VAR>'</SAMP>
+<DT><SAMP>`--multiple-iterations=<VAR>iterations</VAR>'</SAMP>
<DD>
-Generate the perfect hash function "fast". This decreases
-<CODE>gperf</CODE>'s running time at the cost of minimizing generated
-table-size. The iteration amount represents the number of times to
-iterate when resolving a collision. `0' means iterate by the number of
-keywords. This option is probably most useful when used in conjunction
-with options <SAMP>`-D'</SAMP> and/or <SAMP>`-S'</SAMP> for <EM>large</EM> keyword sets.
+Perform multiple choices of the <SAMP>`-i'</SAMP> and <SAMP>`-j'</SAMP> values, and
+choose the best results. This increases the running time by a factor of
+<VAR>iterations</VAR> but does a good job minimizing the generated table size.
<DT><SAMP>`-i <VAR>initial-value</VAR>'</SAMP>
<DD>
@@ -1298,16 +1691,17 @@ with options <SAMP>`-D'</SAMP> and/or <SAMP>`-S'</SAMP> for <EM>large</EM> keywo
Provides an initial <VAR>value</VAR> for the associated values array. Default
is 0. Increasing the initial value helps inflate the final table size,
possibly leading to more time efficient keyword lookups. Note that this
-option is not particularly useful when <SAMP>`-S'</SAMP> is used. Also,
+option is not particularly useful when <SAMP>`-S'</SAMP> (or, equivalently,
+<SAMP>`%switch'</SAMP>) is used. Also,
<SAMP>`-i'</SAMP> is overridden when the <SAMP>`-r'</SAMP> option is used.
<DT><SAMP>`-j <VAR>jump-value</VAR>'</SAMP>
<DD>
<DT><SAMP>`--jump=<VAR>jump-value</VAR>'</SAMP>
<DD>
-<A NAME="IDX23"></A>
+<A NAME="IDX45"></A>
Affects the "jump value", i.e., how far to advance the associated
-character value upon collisions. <VAR>Jump-value</VAR> is rounded up to an
+byte value upon collisions. <VAR>Jump-value</VAR> is rounded up to an
odd number; the default is 5. If the <VAR>jump-value</VAR> is 0, <CODE>gperf</CODE>
jumps by random amounts.
@@ -1319,24 +1713,6 @@ Instructs the generator not to include the length of a keyword when
computing its hash value. This may save a few assembly instructions in
the generated lookup table.
-<DT><SAMP>`-o'</SAMP>
-<DD>
-<DT><SAMP>`--occurrence-sort'</SAMP>
-<DD>
-Reorders the keywords by sorting the keywords so that frequently
-occuring key position set components appear first. A second reordering
-pass follows so that keys with "already determined values" are placed
-towards the front of the keylist. This may decrease the time required
-to generate a perfect hash function for many keyword sets, and also
-produce more minimal perfect hash functions. The reason for this is
-that the reordering helps prune the search time by handling inevitable
-collisions early in the search process. On the other hand, if the
-number of keywords is <EM>very</EM> large using <SAMP>`-o'</SAMP> may
-<EM>increase</EM> <CODE>gperf</CODE>'s execution time, since collisions will
-begin earlier and continue throughout the remainder of keyword
-processing. See Cichelli's paper from the January 1980 Communications
-of the ACM for details.
-
<DT><SAMP>`-r'</SAMP>
<DD>
<DT><SAMP>`--random'</SAMP>
@@ -1345,8 +1721,7 @@ Utilizes randomness to initialize the associated values table. This
frequently generates solutions faster than using deterministic
initialization (which starts all associated values at 0). Furthermore,
using the randomization option generally increases the size of the
-table. If <CODE>gperf</CODE> has difficultly with a certain keyword set try using
-<SAMP>`-r'</SAMP> or <SAMP>`-D'</SAMP>.
+table.
<DT><SAMP>`-s <VAR>size-multiple</VAR>'</SAMP>
<DD>
@@ -1354,36 +1729,31 @@ table. If <CODE>gperf</CODE> has difficultly with a certain keyword set try usi
<DD>
Affects the size of the generated hash table. The numeric argument for
this option indicates "how many times larger or smaller" the maximum
-associated value range should be, in relationship to the number of keys.
-If the <VAR>size-multiple</VAR> is negative the maximum associated value is
-calculated by <EM>dividing</EM> it into the total number of keys. For
-example, a value of 3 means "allow the maximum associated value to be
-about 3 times larger than the number of input keys".
-
-Conversely, a value of -3 means "allow the maximum associated value to
-be about 3 times smaller than the number of input keys". Negative
-values are useful for limiting the overall size of the generated hash
-table, though this usually increases the number of duplicate hash
-values.
-
-If `generate switch' option <SAMP>`-S'</SAMP> is <EM>not</EM> enabled, the maximum
+associated value range should be, in relationship to the number of keywords.
+It can be written as an integer, a floating-point number or a fraction.
+For example, a value of 3 means "allow the maximum associated value to be
+about 3 times larger than the number of input keywords".
+Conversely, a value of 1/3 means "allow the maximum associated value to
+be about 3 times smaller than the number of input keywords". Values
+smaller than 1 are useful for limiting the overall size of the generated hash
+table, though the option <SAMP>`-m'</SAMP> is better suited for this purpose.
+
+If `generate switch' option <SAMP>`-S'</SAMP> (or, equivalently, <SAMP>`%switch'</SAMP>) is
+<EM>not</EM> enabled, the maximum
associated value influences the static array table size, and a larger
table should decrease the time required for an unsuccessful search, at
the expense of extra table space.
The default value is 1, thus the default maximum associated value about
-the same size as the number of keys (for efficiency, the maximum
+the same size as the number of keywords (for efficiency, the maximum
associated value is always rounded up to a power of 2). The actual
table size may vary somewhat, since this technique is essentially a
-heuristic. In particular, setting this value too high slows down
-<CODE>gperf</CODE>'s runtime, since it must search through a much larger range
-of values. Judicious use of the <SAMP>`-f'</SAMP> option helps alleviate this
-overhead, however.
+heuristic.
</DL>
-<H2><A NAME="SEC19" HREF="gperf.html#TOC19">4.5 Informative Output</A></H2>
+<H2><A NAME="SEC24" HREF="gperf.html#TOC24">4.6 Informative Output</A></H2>
<DL COMPACT>
@@ -1414,7 +1784,7 @@ option is enabled.
-<H1><A NAME="SEC20" HREF="gperf.html#TOC20">5 Known Bugs and Limitations with <CODE>gperf</CODE></A></H1>
+<H1><A NAME="SEC25" HREF="gperf.html#TOC25">5 Known Bugs and Limitations with <CODE>gperf</CODE></A></H1>
<P>
The following are some limitations with the current release of
@@ -1433,16 +1803,6 @@ work efficiently on much larger keyword sets (over 15,000 keywords).
When processing large keyword sets it helps greatly to have over 8 megs
of RAM.
-However, since <CODE>gperf</CODE> does not backtrack no guaranteed solution
-occurs on every run. On the other hand, it is usually easy to obtain a
-solution by varying the option parameters. In particular, try the
-<SAMP>`-r'</SAMP> option, and also try changing the default arguments to the
-<SAMP>`-s'</SAMP> and <SAMP>`-j'</SAMP> options. To <EM>guarantee</EM> a solution, use
-the <SAMP>`-D'</SAMP> and <SAMP>`-S'</SAMP> options, although the final results are not
-likely to be a <EM>perfect</EM> hash function anymore! Finally, use the
-<SAMP>`-f'</SAMP> option if you want <CODE>gperf</CODE> to generate the perfect hash
-function <EM>fast</EM>, with less emphasis on making it minimal.
-
<LI>
The size of the generate static keyword array can get <EM>extremely</EM>
@@ -1451,22 +1811,22 @@ similar. This tends to slow down the compilation of the generated C
code, and <EM>greatly</EM> inflates the object code size. If this
situation occurs, consider using the <SAMP>`-S'</SAMP> option to reduce data
size, potentially increasing keyword recognition time a negligible
-amount. Since many C compilers cannot correctly generated code for
+amount. Since many C compilers cannot correctly generate code for
large switch statements it is important to qualify the <VAR>-S</VAR> option
with an appropriate numerical argument that controls the number of
switch statements generated.
<LI>
-The maximum number of key positions selected for a given key has an
-arbitrary limit of 126. This restriction should be removed, and if
+The maximum number of selected byte positions has an
+arbitrary limit of 255. This restriction should be removed, and if
anyone considers this a problem write me and let me know so I can remove
the constraint.
</UL>
-<H1><A NAME="SEC21" HREF="gperf.html#TOC21">6 Things Still Left to Do</A></H1>
+<H1><A NAME="SEC26" HREF="gperf.html#TOC26">6 Things Still Left to Do</A></H1>
<P>
It should be "relatively" easy to replace the current perfect hash
@@ -1479,19 +1839,10 @@ worthwhile improvements include:
<UL>
<LI>
-Make the algorithm more robust. At present, the program halts with an
-error diagnostic if it can't find a direct solution and the <SAMP>`-D'</SAMP>
-option is not enabled. A more comprehensive, albeit computationally
-expensive, approach would employ backtracking or enable alternative
-options and retry. It's not clear how helpful this would be, in
-general, since most search sets are rather small in practice.
-
-<LI>
-
Another useful extension involves modifying the program to generate
"minimal" perfect hash functions (under certain circumstances, the
current version can be rather extravagant in the generated table size).
-Again, this is mostly of theoretical interest, since a sparse table
+This is mostly of theoretical interest, since a sparse table
often produces faster lookups, and use of the <SAMP>`-S'</SAMP> <CODE>switch</CODE>
option can minimize the data size, at the expense of slightly longer
lookups (note that the gcc compiler generally produces good code for
@@ -1500,40 +1851,30 @@ lookups (note that the gcc compiler generally produces good code for
<LI>
In addition to improving the algorithm, it would also be useful to
-generate a C++ class or Ada package as the code output, in addition to
-the current C routines.
+generate an Ada package as the code output, in addition to the current
+C and C++ routines.
</UL>
-<H1><A NAME="SEC22" HREF="gperf.html#TOC22">7 Implementation Details of GNU <CODE>gperf</CODE></A></H1>
+<H1><A NAME="SEC27" HREF="gperf.html#TOC27">7 Bibliography</A></H1>
<P>
-A paper describing the high-level description of the data structures and
-algorithms used to implement <CODE>gperf</CODE> will soon be available. This
-paper is useful not only from a maintenance and enhancement perspective,
-but also because they demonstrate several clever and useful programming
-techniques, e.g., `Iteration Number' boolean arrays, double
-hashing, a "safe" and efficient method for reading arbitrarily long
-input from a file, and a provably optimal algorithm for simultaneously
-determining both the minimum and maximum elements in a list.
+[1] Chang, C.C.: <I>A Scheme for Constructing Ordered Minimal Perfect
+Hashing Functions</I> Information Sciences 39(1986), 187-195.
</P>
-
-
-
-<H1><A NAME="SEC23" HREF="gperf.html#TOC23">8 Bibliography</A></H1>
-
<P>
-[1] Chang, C.C.: <I>A Scheme for Constructing Ordered Minimal Perfect
-Hashing Functions</I> Information Sciences 39(1986), 187-195.
-
[2] Cichelli, Richard J. <I>Author's Response to "On Cichelli's Minimal Perfect Hash
Functions Method"</I> Communications of the ACM, 23, 12(December 1980), 729.
-
+
+</P>
+<P>
[3] Cichelli, Richard J. <I>Minimal Perfect Hash Functions Made Simple</I>
Communications of the ACM, 23, 1(January 1980), 17-19.
-
+
+</P>
+<P>
[4] Cook, C. R. and Oldehoeft, R.R. <I>A Letter Oriented Minimal
Perfect Hashing Function</I> SIGPLAN Notices, 17, 9(September 1982), 18-27.
@@ -1541,7 +1882,9 @@ Perfect Hashing Function</I> SIGPLAN Notices, 17, 9(September 1982), 18-27.
<P>
[5] Cormack, G. V. and Horspool, R. N. S. and Kaiserwerth, M.
<I>Practical Perfect Hashing</I> Computer Journal, 28, 1(January 1985), 54-58.
-
+
+</P>
+<P>
[6] Jaeschke, G. <I>Reciprocal Hashing: A Method for Generating Minimal
Perfect Hashing Functions</I> Communications of the ACM, 24, 12(December
1981), 829-833.
@@ -1564,44 +1907,71 @@ Second USENIX C++ Conference Proceedings, April 1990.
</P>
<P>
-[10] Sebesta, R.W. and Taylor, M.A. <I>Minimal Perfect Hash Functions
+[10] Schmidt, Douglas C. <I>GPERF: A Perfect Hash Function Generator</I>
+C++ Report, SIGS 10 10 (November/December 1998).
+
+</P>
+<P>
+[11] Sebesta, R.W. and Taylor, M.A. <I>Minimal Perfect Hash Functions
for Reserved Word Lists</I> SIGPLAN Notices, 20, 12(September 1985), 47-53.
</P>
<P>
-[11] Sprugnoli, R. <I>Perfect Hashing Functions: A Single Probe
+[12] Sprugnoli, R. <I>Perfect Hashing Functions: A Single Probe
Retrieving Method for Static Sets</I> Communications of the ACM, 20
11(November 1977), 841-850.
</P>
<P>
-[12] Stallman, Richard M. <I>Using and Porting GNU CC</I> Free Software Foundation,
+[13] Stallman, Richard M. <I>Using and Porting GNU CC</I> Free Software Foundation,
1988.
</P>
<P>
-[13] Stroustrup, Bjarne <I>The C++ Programming Language.</I> Addison-Wesley, 1986.
+[14] Stroustrup, Bjarne <I>The C++ Programming Language.</I> Addison-Wesley, 1986.
</P>
<P>
-[14] Tiemann, Michael D. <I>User's Guide to GNU C++</I> Free Software
+[15] Tiemann, Michael D. <I>User's Guide to GNU C++</I> Free Software
Foundation, 1989.
</P>
-<H1><A NAME="SEC24" HREF="gperf.html#TOC24">Concept Index</A></H1>
+<H1><A NAME="SEC28" HREF="gperf.html#TOC28">Concept Index</A></H1>
<P>
<H2>%</H2>
<DIR>
<LI><A HREF="gperf.html#IDX8"><SAMP>`%%'</SAMP></A>
-<LI><A HREF="gperf.html#IDX9"><SAMP>`%{'</SAMP></A>
-<LI><A HREF="gperf.html#IDX10"><SAMP>`%}'</SAMP></A>
+<LI><A HREF="gperf.html#IDX18"><SAMP>`%7bit'</SAMP></A>
+<LI><A HREF="gperf.html#IDX19"><SAMP>`%compare-lengths'</SAMP></A>
+<LI><A HREF="gperf.html#IDX20"><SAMP>`%compare-strncmp'</SAMP></A>
+<LI><A HREF="gperf.html#IDX17"><SAMP>`%define class-name'</SAMP></A>
+<LI><A HREF="gperf.html#IDX15"><SAMP>`%define hash-function-name'</SAMP></A>
+<LI><A HREF="gperf.html#IDX14"><SAMP>`%define initializer-suffix'</SAMP></A>
+<LI><A HREF="gperf.html#IDX16"><SAMP>`%define lookup-function-name'</SAMP></A>
+<LI><A HREF="gperf.html#IDX13"><SAMP>`%define slot-name'</SAMP></A>
+<LI><A HREF="gperf.html#IDX26"><SAMP>`%define string-pool-name'</SAMP></A>
+<LI><A HREF="gperf.html#IDX28"><SAMP>`%define word-array-name'</SAMP></A>
+<LI><A HREF="gperf.html#IDX9"><SAMP>`%delimiters'</SAMP></A>
+<LI><A HREF="gperf.html#IDX22"><SAMP>`%enum'</SAMP></A>
+<LI><A HREF="gperf.html#IDX24"><SAMP>`%global-table'</SAMP></A>
+<LI><A HREF="gperf.html#IDX11"><SAMP>`%ignore-case'</SAMP></A>
+<LI><A HREF="gperf.html#IDX23"><SAMP>`%includes'</SAMP></A>
+<LI><A HREF="gperf.html#IDX12"><SAMP>`%language'</SAMP></A>
+<LI><A HREF="gperf.html#IDX27"><SAMP>`%null-strings'</SAMP></A>
+<LI><A HREF="gperf.html#IDX30"><SAMP>`%omit-struct-type'</SAMP></A>
+<LI><A HREF="gperf.html#IDX25"><SAMP>`%pic'</SAMP></A>
+<LI><A HREF="gperf.html#IDX21"><SAMP>`%readonly-tables'</SAMP></A>
+<LI><A HREF="gperf.html#IDX10"><SAMP>`%struct-type'</SAMP></A>
+<LI><A HREF="gperf.html#IDX29"><SAMP>`%switch'</SAMP></A>
+<LI><A HREF="gperf.html#IDX31"><SAMP>`%{'</SAMP></A>
+<LI><A HREF="gperf.html#IDX32"><SAMP>`%}'</SAMP></A>
</DIR>
<H2>a</H2>
<DIR>
-<LI><A HREF="gperf.html#IDX20">Array name</A>
+<LI><A HREF="gperf.html#IDX42">Array name</A>
</DIR>
<H2>b</H2>
<DIR>
@@ -1609,13 +1979,13 @@ Foundation, 1989.
</DIR>
<H2>c</H2>
<DIR>
-<LI><A HREF="gperf.html#IDX19">Class name</A>
+<LI><A HREF="gperf.html#IDX41">Class name</A>
</DIR>
<H2>d</H2>
<DIR>
<LI><A HREF="gperf.html#IDX5">Declaration section</A>
-<LI><A HREF="gperf.html#IDX16">Delimiters</A>
-<LI><A HREF="gperf.html#IDX22">Duplicates</A>
+<LI><A HREF="gperf.html#IDX38">Delimiters</A>
+<LI><A HREF="gperf.html#IDX44">Duplicates</A>
</DIR>
<H2>f</H2>
<DIR>
@@ -1624,17 +1994,17 @@ Foundation, 1989.
</DIR>
<H2>h</H2>
<DIR>
-<LI><A HREF="gperf.html#IDX12">hash</A>
-<LI><A HREF="gperf.html#IDX11">hash table</A>
+<LI><A HREF="gperf.html#IDX34">hash</A>
+<LI><A HREF="gperf.html#IDX33">hash table</A>
</DIR>
<H2>i</H2>
<DIR>
-<LI><A HREF="gperf.html#IDX13">in_word_set</A>
-<LI><A HREF="gperf.html#IDX18">Initializers</A>
+<LI><A HREF="gperf.html#IDX35">in_word_set</A>
+<LI><A HREF="gperf.html#IDX40">Initializers</A>
</DIR>
<H2>j</H2>
<DIR>
-<LI><A HREF="gperf.html#IDX23">Jump value</A>
+<LI><A HREF="gperf.html#IDX45">Jump value</A>
</DIR>
<H2>k</H2>
<DIR>
@@ -1646,18 +2016,18 @@ Foundation, 1989.
</DIR>
<H2>n</H2>
<DIR>
-<LI><A HREF="gperf.html#IDX15">NUL</A>
+<LI><A HREF="gperf.html#IDX37">NUL</A>
</DIR>
<H2>s</H2>
<DIR>
-<LI><A HREF="gperf.html#IDX17">Slot name</A>
+<LI><A HREF="gperf.html#IDX39">Slot name</A>
<LI><A HREF="gperf.html#IDX2">Static search structure</A>
-<LI><A HREF="gperf.html#IDX14"><CODE>switch</CODE></A>, <A HREF="gperf.html#IDX21"><CODE>switch</CODE></A>
+<LI><A HREF="gperf.html#IDX36"><CODE>switch</CODE></A>, <A HREF="gperf.html#IDX43"><CODE>switch</CODE></A>
</DIR>
</P>
<P><HR><P>
-This document was generated on 26 September 2000 using the
+This document was generated on 7 May 2003 using the
<A HREF="http://wwwcn.cern.ch/dci/texi2html/">texi2html</A>
translator version 1.51.</P>
</BODY>
diff --git a/doc/gperf.info b/doc/gperf.info
index 526cacc..fda381f 100644
--- a/doc/gperf.info
+++ b/doc/gperf.info
@@ -1,4 +1,4 @@
-This is gperf.info, produced by makeinfo version 4.0 from gperf.texi.
+This is gperf.info, produced by makeinfo version 4.3 from gperf.texi.
INFO-DIR-SECTION Programming Tools
START-INFO-DIR-ENTRY
@@ -6,9 +6,9 @@ START-INFO-DIR-ENTRY
END-INFO-DIR-ENTRY
This file documents the features of the GNU Perfect Hash Function
-Generator 2.7.2.
+Generator 3.0.
- Copyright (C) 1989-2000 Free Software Foundation, Inc.
+ Copyright (C) 1989-2003 Free Software Foundation, Inc.
Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
@@ -42,13 +42,12 @@ bugs.
* Copying:: GNU `gperf' General Public License says
how you can copy and share `gperf'.
* Contributors:: People who have contributed to `gperf'.
-* Motivation:: Static search structures and GNU GPERF.
+* Motivation:: The purpose of `gperf'.
* Search Structures:: Static search structures and GNU `gperf'
* Description:: High-level discussion of how GPERF functions.
* Options:: A description of options to the program.
* Bugs:: Known bugs and limitations with GPERF.
* Projects:: Things still left to do.
-* Implementation:: Implementation Details for GNU GPERF.
* Bibliography:: Material Referenced in this Report.
* Concept Index::
@@ -58,13 +57,20 @@ High-Level Description of GNU `gperf'
* Input Format:: Input Format to `gperf'
* Output Format:: Output Format for Generated C Code with `gperf'
-* Binary Strings:: Use of NUL characters
+* Binary Strings:: Use of NUL bytes
Input Format to `gperf'
-* Declarations:: `struct' Declarations and C Code Inclusion.
+* Declarations:: Declarations.
* Keywords:: Format for Keyword Entries.
* Functions:: Including Additional C Functions.
+* Controls for GNU indent:: Where to place directives for GNU `indent'.
+
+Declarations
+
+* User-supplied Struct:: Specifying keywords with attributes.
+* Gperf Declarations:: Embedding command line options in the input.
+* C Code Inclusion:: Including C declarations and definitions.
Invoking `gperf'
@@ -81,7 +87,6 @@ GNU GENERAL PUBLIC LICENSE
**************************
Version 2, June 1991
-
Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
@@ -140,7 +145,6 @@ patent must be licensed for everyone's free use or not licensed at all.
modification follow.
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
-
0. This License applies to any program or other work which contains a
notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program",
@@ -445,15 +449,13 @@ Contributors to GNU `gperf' Utility
***********************************
* The GNU `gperf' perfect hash function generator utility was
- originally written in GNU C++ by Douglas C. Schmidt. It is now
- also available in a highly-portable "old-style" C version. The
- general idea for the perfect hash function generator was inspired
- by Keith Bostic's algorithm written in C, and distributed to
- net.sources around 1984. The current program is a heavily
- modified, enhanced, and extended implementation of Keith's basic
- idea, created at the University of California, Irvine. Bugs,
- patches, and suggestions should be reported to both
- `<bug-gnu-utils@gnu.org>' and `<gperf-bugs@lists.sourceforge.net>'.
+ written in GNU C++ by Douglas C. Schmidt. The general idea for
+ the perfect hash function generator was inspired by Keith Bostic's
+ algorithm written in C, and distributed to net.sources around
+ 1984. The current program is a heavily modified, enhanced, and
+ extended implementation of Keith's basic idea, created at the
+ University of California, Irvine. Bugs, patches, and suggestions
+ should be reported to `<bug-gnu-gperf@gnu.org>'.
* Special thanks is extended to Michael Tiemann and Doug Lea, for
providing a useful compiler, and for giving me a forum to exhibit
@@ -463,8 +465,9 @@ Contributors to GNU `gperf' Utility
insights that greatly helped improve the quality and functionality
of `gperf'.
- * A testsuite was added by Bruno Haible. He also rewrote the output
- routines for better reliability.
+ * Bruno Haible enhanced and optimized the search algorithm. He also
+ rewrote the input routines and the output routines for better
+ reliability, and added a testsuite.

File: gperf.info, Node: Motivation, Next: Search Structures, Prev: Contributors, Up: Top
@@ -475,18 +478,19 @@ Introduction
`gperf' is a perfect hash function generator written in C++. It
transforms an N element user-specified keyword set W into a perfect
hash function F. F uniquely maps keywords in W onto the range 0..K,
-where K >= N. If K = N then F is a _minimal_ perfect hash function.
+where K >= N-1. If K = N-1 then F is a _minimal_ perfect hash function.
`gperf' generates a 0..K element static lookup table and a pair of C
functions. These functions determine whether a given character string
S occurs in W, using at most one probe into the lookup table.
`gperf' currently generates the reserved keyword recognizer for
lexical analyzers in several production and research compilers and
-language processing tools, including GNU C, GNU C++, GNU Pascal, GNU
-Modula 3, and GNU indent. Complete C++ source code for `gperf' is
-available via anonymous ftp from `ftp://ftp.gnu.org/pub/gnu/gperf/'. A
-paper describing `gperf''s design and implementation in greater detail
-is available in the Second USENIX C++ Conference proceedings.
+language processing tools, including GNU C, GNU C++, GNU Java, GNU
+Pascal, GNU Modula 3, and GNU indent. Complete C++ source code for
+`gperf' is available from `http://ftp.gnu.org/pub/gnu/gperf/'. A paper
+describing `gperf''s design and implementation in greater detail is
+available in the Second USENIX C++ Conference proceedings or from
+`http://www.cs.wustl.edu/~schmidt/resume.html'.

File: gperf.info, Node: Search Structures, Next: Description, Prev: Motivation, Up: Top
@@ -497,7 +501,7 @@ Static search structures and GNU `gperf'
A "static search structure" is an Abstract Data Type with certain
fundamental operations, e.g., _initialize_, _insert_, and _retrieve_.
Conceptually, all insertions occur before any retrievals. In practice,
-`gperf' generates a `static' array containing search set keywords and
+`gperf' generates a _static_ array containing search set keywords and
any associated attributes specified by the user. Thus, there is
essentially no execution-time cost for the insertions. It is a useful
data structure for representing _static search sets_. Static search
@@ -549,10 +553,10 @@ associated with constructing time- and space-efficient search
structures by hand. It has proven a useful and practical tool for
serious programming projects. Output from `gperf' is currently used in
several production and research compilers, including GNU C, GNU C++,
-GNU Pascal, and GNU Modula 3. The latter two compilers are not yet
-part of the official GNU distribution. Each compiler utilizes `gperf'
-to automatically generate static search structures that efficiently
-identify their respective reserved keywords.
+GNU Java, GNU Pascal, and GNU Modula 3. The latter two compilers are
+not yet part of the official GNU distribution. Each compiler utilizes
+`gperf' to automatically generate static search structures that
+efficiently identify their respective reserved keywords.

File: gperf.info, Node: Description, Next: Options, Prev: Search Structures, Up: Top
@@ -564,10 +568,10 @@ High-Level Description of GNU `gperf'
* Input Format:: Input Format to `gperf'
* Output Format:: Output Format for Generated C Code with `gperf'
-* Binary Strings:: Use of NUL characters
+* Binary Strings:: Use of NUL bytes
The perfect hash function generator `gperf' reads a set of
-"keywords" from a "keyfile" (or from the standard input by default).
+"keywords" from an input file (or from the standard input by default).
It attempts to derive a perfect hashing function that recognizes a
member of the "static keyword set" with at most a single probe into the
lookup table. If `gperf' succeeds in generating such a function it
@@ -586,7 +590,7 @@ scheme that minimizes data space storage size. Furthermore, using a C
`switch' may actually speed up the keyword retrieval time somewhat.
Actual results depend on your C compiler, of course.
- In general, `gperf' assigns values to the characters it is using for
+ In general, `gperf' assigns values to the bytes it is using for
hashing until some set of values gives each keyword a unique value. A
helpful heuristic is that the larger the hash value range, the easier
it is for `gperf' to find and generate a perfect hash function.
@@ -598,10 +602,10 @@ File: gperf.info, Node: Input Format, Next: Output Format, Prev: Description,
Input Format to `gperf'
=======================
- You can control the input keyfile format by varying certain
-command-line arguments, in particular the `-t' option. The input's
-appearance is similar to GNU utilities `flex' and `bison' (or UNIX
-utilities `lex' and `yacc'). Here's an outline of the general format:
+ You can control the input file format by varying certain command-line
+arguments, in particular the `-t' option. The input's appearance is
+similar to GNU utilities `flex' and `bison' (or UNIX utilities `lex'
+and `yacc'). Here's an outline of the general format:
declarations
%%
@@ -609,30 +613,58 @@ utilities `lex' and `yacc'). Here's an outline of the general format:
%%
functions
- _Unlike_ `flex' or `bison', all sections of `gperf''s input are
-optional. The following sections describe the input format for each
-section.
+ _Unlike_ `flex' or `bison', the declarations section and the
+functions section are optional. The following sections describe the
+input format for each section.
* Menu:
-* Declarations:: `struct' Declarations and C Code Inclusion.
+* Declarations:: Declarations.
* Keywords:: Format for Keyword Entries.
* Functions:: Including Additional C Functions.
+* Controls for GNU indent:: Where to place directives for GNU `indent'.
+
+ It is possible to omit the declaration section entirely, if the `-t'
+option is not given. In this case the input file begins directly with
+the first keyword line, e.g.:
+
+ january
+ february
+ march
+ april
+ ...

File: gperf.info, Node: Declarations, Next: Keywords, Prev: Input Format, Up: Input Format
-`struct' Declarations and C Code Inclusion
-------------------------------------------
+Declarations
+------------
The keyword input file optionally contains a section for including
-arbitrary C declarations and definitions, as well as provisions for
-providing a user-supplied `struct'. If the `-t' option _is_ enabled,
-you _must_ provide a C `struct' as the last component in the
-declaration section from the keyfile file. The first field in this
-struct must be a `char *' or `const char *' identifier called `name',
-although it is possible to modify this field's name with the `-K'
-option described below.
+arbitrary C declarations and definitions, `gperf' declarations that act
+like command-line options, as well as for providing a user-supplied
+`struct'.
+
+* Menu:
+
+* User-supplied Struct:: Specifying keywords with attributes.
+* Gperf Declarations:: Embedding command line options in the input.
+* C Code Inclusion:: Including C declarations and definitions.
+
+
+File: gperf.info, Node: User-supplied Struct, Next: Gperf Declarations, Prev: Declarations, Up: Declarations
+
+User-supplied `struct'
+......................
+
+ If the `-t' option (or, equivalently, the `%struct-type' declaration)
+_is_ enabled, you _must_ provide a C `struct' as the last component in
+the declaration section from the input file. The first field in this
+struct must be of type `char *' or `const char *' if the `-P' option is
+not given, or of type `int' if the option `-P' (or, equivalently, the
+`%pic' declaration) is enabled. This first field must be called
+`name', although it is possible to modify its name with the `-K' option
+(or, equivalently, the `%define slot-name' declaration) described below.
Here is a simple example, using months of the year and their
attributes as input:
@@ -656,6 +688,200 @@ attributes as input:
other fields are a pair of consecutive percent signs, `%%', appearing
left justified in the first column, as in the UNIX utility `lex'.
+
+File: gperf.info, Node: Gperf Declarations, Next: C Code Inclusion, Prev: User-supplied Struct, Up: Declarations
+
+Gperf Declarations
+..................
+
+ The declaration section can contain `gperf' declarations. They
+influence the way `gperf' works, like command line options do. In
+fact, every such declaration is equivalent to a command line option.
+There are three forms of declarations:
+
+ 1. Declarations without argument, like `%compare-lengths'.
+
+ 2. Declarations with an argument, like `%switch=COUNT'.
+
+ 3. Declarations of names of entities in the output file, like
+ `%define lookup-function-name NAME'.
+
+ When a declaration is given both in the input file and as a command
+line option, the command-line option's value prevails.
+
+ The following `gperf' declarations are available.
+
+`%delimiters=DELIMITER-LIST'
+ Allows you to provide a string containing delimiters used to
+ separate keywords from their attributes. The default is ",". This
+ option is essential if you want to use keywords that have embedded
+ commas or newlines.
+
+`%struct-type'
+ Allows you to include a `struct' type declaration for generated
+ code; see above for an example.
+
+`%ignore-case'
+ Consider upper and lower case ASCII characters as equivalent. The
+ string comparison will use a case insignificant character
+ comparison. Note that locale dependent case mappings are ignored.
+
+`%language=LANGUAGE-NAME'
+ Instructs `gperf' to generate code in the language specified by the
+ option's argument. Languages handled are currently:
+
+ `KR-C'
+ Old-style K&R C. This language is understood by old-style C
+ compilers and ANSI C compilers, but ANSI C compilers may flag
+ warnings (or even errors) because of lacking `const'.
+
+ `C'
+ Common C. This language is understood by ANSI C compilers,
+ and also by old-style C compilers, provided that you `#define
+ const' to empty for compilers which don't know about this
+ keyword.
+
+ `ANSI-C'
+ ANSI C. This language is understood by ANSI C compilers and
+ C++ compilers.
+
+ `C++'
+ C++. This language is understood by C++ compilers.
+
+ The default is C.
+
+`%define slot-name NAME'
+ This declaration is only useful when option `-t' (or,
+ equivalently, the `%struct-type' declaration) has been given. By
+ default, the program assumes the structure component identifier for
+ the keyword is `name'. This option allows an arbitrary choice of
+ identifier for this component, although it still must occur as the
+ first field in your supplied `struct'.
+
+`%define initializer-suffix INITIALIZERS'
+ This declaration is only useful when option `-t' (or,
+ equivalently, the `%struct-type' declaration) has been given. It
+ permits to specify initializers for the structure members following
+ SLOT-NAME in empty hash table entries. The list of initializers
+ should start with a comma. By default, the emitted code will
+ zero-initialize structure members following SLOT-NAME.
+
+`%define hash-function-name NAME'
+ Allows you to specify the name for the generated hash function.
+ Default name is `hash'. This option permits the use of two hash
+ tables in the same file.
+
+`%define lookup-function-name NAME'
+ Allows you to specify the name for the generated lookup function.
+ Default name is `in_word_set'. This option permits multiple
+ generated hash functions to be used in the same application.
+
+`%define class-name NAME'
+ This option is only useful when option `-L C++' (or, equivalently,
+ the `%language=C++' declaration) has been given. It allows you to
+ specify the name of generated C++ class. Default name is
+ `Perfect_Hash'.
+
+`%7bit'
+ This option specifies that all strings that will be passed as
+ arguments to the generated hash function and the generated lookup
+ function will solely consist of 7-bit ASCII characters (bytes in
+ the range 0..127). (Note that the ANSI C functions `isalnum' and
+ `isgraph' do _not_ guarantee that a byte is in this range. Only
+ an explicit test like `c >= 'A' && c <= 'Z'' guarantees this.)
+
+`%compare-lengths'
+ Compare keyword lengths before trying a string comparison. This
+ option is mandatory for binary comparisons (*note Binary
+ Strings::). It also might cut down on the number of string
+ comparisons made during the lookup, since keywords with different
+ lengths are never compared via `strcmp'. However, using
+ `%compare-lengths' might greatly increase the size of the
+ generated C code if the lookup table range is large (which implies
+ that the switch option `-S' or `%switch' is not enabled), since
+ the length table contains as many elements as there are entries in
+ the lookup table.
+
+`%compare-strncmp'
+ Generates C code that uses the `strncmp' function to perform
+ string comparisons. The default action is to use `strcmp'.
+
+`%readonly-tables'
+ Makes the contents of all generated lookup tables constant, i.e.,
+ "readonly". Many compilers can generate more efficient code for
+ this by putting the tables in readonly memory.
+
+`%enum'
+ Define constant values using an enum local to the lookup function
+ rather than with #defines. This also means that different lookup
+ functions can reside in the same file. Thanks to James Clark
+ `<jjc@ai.mit.edu>'.
+
+`%includes'
+ Include the necessary system include file, `<string.h>', at the
+ beginning of the code. By default, this is not done; the user must
+ include this header file himself to allow compilation of the code.
+
+`%global-table'
+ Generate the static table of keywords as a static global variable,
+ rather than hiding it inside of the lookup function (which is the
+ default behavior).
+
+`%pic'
+ Optimize the generated table for inclusion in shared libraries.
+ This reduces the startup time of programs using a shared library
+ containing the generated code. If the `%struct-type' declaration
+ (or, equivalently, the option `-t') is also given, the first field
+ of the user-defined struct must be of type `int', not `char *',
+ because it will contain offsets into the string pool instead of
+ actual strings. To convert such an offset to a string, you can
+ use the expression `stringpool + O', where O is the offset. The
+ string pool name can be changed through the `%define
+ string-pool-name' declaration.
+
+`%define string-pool-name NAME'
+ Allows you to specify the name of the generated string pool
+ created by the declaration `%pic' (or, equivalently, the option
+ `-P'). The default name is `stringpool'. This declaration
+ permits the use of two hash tables in the same file, with `%pic'
+ and even when the `%global-table' declaration (or, equivalently,
+ the option `-G') is given.
+
+`%null-strings'
+ Use NULL strings instead of empty strings for empty keyword table
+ entries. This reduces the startup time of programs using a shared
+ library containing the generated code (but not as much as the
+ declaration `%pic'), at the expense of one more test-and-branch
+ instruction at run time.
+
+`%define word-array-name NAME'
+ Allows you to specify the name for the generated array containing
+ the hash table. The default name is `wordlist'. This declaration permits
+ the use of two hash tables in the same file, even when the option
+ `-G' (or, equivalently, the `%global-table' declaration) is given.
+
+`%switch=COUNT'
+ Causes the generated C code to use a `switch' statement scheme,
+ rather than an array lookup table. This can lead to a reduction
+ in both time and space requirements for some input files. The
+ argument to this declaration determines how many `switch' statements
+ are generated. A value of 1 generates 1 `switch' containing all
+ the elements, a value of 2 generates 2 tables with 1/2 the
+ elements in each `switch', etc. This is useful since many C
+ compilers cannot correctly generate code for large `switch'
+ statements. This option was inspired in part by Keith Bostic's
+ original C program.
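To show the general shape of a `switch' based lookup, here is a toy hand-written sketch, not the code `gperf' emits. It assumes four hypothetical keywords whose lengths all differ, so the length alone happens to be a perfect hash; each `case' yields one candidate that is then confirmed with a string comparison.

```c
#include <string.h>

/* Toy switch-based lookup: the hash value (here simply the length)
   selects a case, and a final strcmp confirms the match. */
static const char *
lookup_switch (const char *str, size_t len)
{
  const char *candidate;

  switch (len)
    {
    case 2: candidate = "if";    break;
    case 3: candidate = "for";   break;
    case 4: candidate = "else";  break;
    case 5: candidate = "while"; break;
    default: return NULL;        /* no keyword has this hash value */
    }
  return strcmp (str, candidate) == 0 ? candidate : NULL;
}
```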
+
+`%omit-struct-type'
+ Prevents the transfer of the type declaration to the output file.
+ Use this declaration if the type is already defined elsewhere.
+
+
+File: gperf.info, Node: C Code Inclusion, Prev: Gperf Declarations, Up: Declarations
+
+C Code Inclusion
+................
+
Using a syntax similar to GNU utilities `flex' and `bison', it is
possible to directly include C source text and comments verbatim into
the generated output file. This is accomplished by enclosing the region
@@ -674,34 +900,28 @@ fragment based on the previous example that illustrates this feature:
march, 3, 31, 31
...
- It is possible to omit the declaration section entirely. In this
-case the keyfile begins directly with the first keyword line, e.g.:
-
- january, 1, 31, 31
- february, 2, 28, 29
- march, 3, 31, 31
- april, 4, 30, 30
- ...
-

File: gperf.info, Node: Keywords, Next: Functions, Prev: Declarations, Up: Input Format
Format for Keyword Entries
--------------------------
- The second keyfile format section contains lines of keywords and any
-associated attributes you might supply. A line beginning with `#' in
-the first column is considered a comment. Everything following the `#'
-is ignored, up to and including the following newline.
-
- The first field of each non-comment line is always the key itself.
-It can be given in two ways: as a simple name, i.e., without surrounding
-string quotation marks, or as a string enclosed in double-quotes, in C
-syntax, possibly with backslash escapes like `\"' or `\234' or `\xa8'.
-In either case, it must start right at the beginning of the line,
-without leading whitespace. In this context, a "field" is considered
-to extend up to, but not include, the first blank, comma, or newline.
-Here is a simple example taken from a partial list of C reserved words:
+ The second input file format section contains lines of keywords and
+any associated attributes you might supply. A line beginning with `#'
+in the first column is considered a comment. Everything following the
+`#' is ignored, up to and including the following newline. A line
+beginning with `%' in the first column is an option declaration and
+must not occur within the keywords section.
+
+ The first field of each non-comment line is always the keyword
+itself. It can be given in two ways: as a simple name, i.e., without
+surrounding string quotation marks, or as a string enclosed in
+double-quotes, in C syntax, possibly with backslash escapes like `\"'
+or `\234' or `\xa8'. In either case, it must start right at the
+beginning of the line, without leading whitespace. In this context, a
+"field" is considered to extend up to, but not include, the first
+blank, comma, or newline. Here is a simple example taken from a
+partial list of C reserved words:
# These are a few C reserved words, see the c.gperf file
# for a complete list of ANSI C reserved words.
@@ -722,12 +942,13 @@ elided if the declaration section is empty.
should be separated by commas, and terminate at the end of line. What
these fields mean is entirely up to you; they are used to initialize the
elements of the user-defined `struct' provided by you in the
-declaration section. If the `-t' option is _not_ enabled these fields
-are simply ignored. All previous examples except the last one contain
-keyword attributes.
+declaration section. If the `-t' option (or, equivalently, the
+`%struct-type' declaration) is _not_ enabled these fields are simply
+ignored. All previous examples except the last one contain keyword
+attributes.
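For instance, with `-t' a keyword line such as `march, 3, 31, 31' supplies the initializers for one struct entry. The sketch below shows that correspondence under the assumption of a hypothetical `struct month' whose first field is the keyword; it ignores the hash ordering and empty padding slots that a real generated table would have.

```c
/* Hypothetical user-defined struct; the keyword must be the first field. */
struct month
{
  const char *name;  /* the keyword itself */
  int number;        /* 2nd field on the keyword line */
  int days;          /* 3rd field */
  int leap_days;     /* 4th field */
};

/* Each line "name, a, b, c" becomes one initializer { "name", a, b, c }. */
static const struct month months[] =
{
  { "january",  1, 31, 31 },
  { "february", 2, 28, 29 },
  { "march",    3, 31, 31 }
};
```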

-File: gperf.info, Node: Functions, Prev: Keywords, Up: Input Format
+File: gperf.info, Node: Functions, Next: Controls for GNU indent, Prev: Keywords, Up: Input Format
Including Additional C Functions
--------------------------------
@@ -740,6 +961,44 @@ responsibility to ensure that the code contained in this section is
valid C.

+File: gperf.info, Node: Controls for GNU indent, Prev: Functions, Up: Input Format
+
+Where to place directives for GNU `indent'.
+-------------------------------------------
+
+ If you want to invoke GNU `indent' on a `gperf' input file, you will
+see that GNU `indent' doesn't understand the `%%', `%{' and `%}'
+directives that control `gperf''s interpretation of the input file.
+Therefore you have to insert some directives for GNU `indent'. More
+precisely, assuming the most general input file structure
+
+ declarations part 1
+ %{
+ verbatim code
+ %}
+ declarations part 2
+ %%
+ keywords
+ %%
+ functions
+
+you would insert `*INDENT-OFF*' and `*INDENT-ON*' comments as follows:
+
+ /* *INDENT-OFF* */
+ declarations part 1
+ %{
+ /* *INDENT-ON* */
+ verbatim code
+ /* *INDENT-OFF* */
+ %}
+ declarations part 2
+ %%
+ keywords
+ %%
+ /* *INDENT-ON* */
+ functions
+
+
File: gperf.info, Node: Output Format, Next: Binary Strings, Prev: Input Format, Up: Description
Output Format for Generated C Code with `gperf'
@@ -754,23 +1013,25 @@ function prototypes are as follows:
- Function: unsigned int hash (const char * STR, unsigned int LEN)
By default, the generated `hash' function returns an integer value
- created by adding LEN to several user-specified STR key positions
+ created by adding LEN to several user-specified STR byte positions
indexed into an "associated values" table stored in a local static
array. The associated values table is constructed internally by
`gperf' and later output as a static local C array called
- `hash_table'; its meaning and properties are described below
- (*note Implementation::). The relevant key positions are specified
- via the `-k' option when running `gperf', as detailed in the
- _Options_ section below(*note Options::).
+ `hash_table'. The relevant selected positions (i.e. indices into
+ STR) are specified via the `-k' option when running `gperf', as
+ detailed in the _Options_ section below (*note Options::).
- Function: in_word_set (const char * STR, unsigned int LEN)
If STR is in the keyword set, returns a pointer to that keyword.
- More exactly, if the option `-t' was given, it returns a pointer
- to the matching keyword's structure. Otherwise it returns `NULL'.
+ More exactly, if the option `-t' (or, equivalently, the
+ `%struct-type' declaration) was given, it returns a pointer to the
+ matching keyword's structure. Otherwise it returns `NULL'.
- If the option `-c' is not used, STR must be a NUL terminated string
-of exactly length LEN. If `-c' is used, STR must simply be an array of
-LEN characters and does not need to be NUL terminated.
+ If the option `-c' (or, equivalently, the `%compare-strncmp'
+declaration) is not used, STR must be a NUL terminated string of
+exactly length LEN. If `-c' (or, equivalently, the `%compare-strncmp'
+declaration) is used, STR must simply be an array of LEN bytes and does
+not need to be NUL terminated.
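The contract for `-c' can be illustrated with a hypothetical helper (not the generated comparison code) that matches a counted, possibly unterminated buffer against a keyword while reading only the first LEN bytes of STR.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical check with -c semantics: str is an array of exactly len
   bytes, not necessarily NUL terminated; only str[0..len-1] is read. */
static int
matches_counted (const char *str, size_t len, const char *keyword)
{
  /* Check the keyword length first, so strncmp never relies on a
     terminator in str. */
  return strlen (keyword) == len && strncmp (str, keyword, len) == 0;
}
```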
The code generated for these two functions is affected by the
following options:
@@ -787,35 +1048,38 @@ following options:
degree of optimization, this method often results in smaller and
faster code.
- If the `-t' and `-S' options are omitted, the default action is to
-generate a `char *' array containing the keys, together with additional
-null strings used for padding the array. By experimenting with the
-various input and output options, and timing the resulting C code, you
-can determine the best option choices for different keyword set
-characteristics.
+ If the `-t' and `-S' options (or, equivalently, the `%struct-type'
+and `%switch' declarations) are omitted, the default action is to
+generate a `char *' array containing the keywords, together with
+additional empty strings used for padding the array. By experimenting
+with the various input and output options, and timing the resulting C
+code, you can determine the best option choices for different keyword
+set characteristics.

File: gperf.info, Node: Binary Strings, Prev: Output Format, Up: Description
-Use of NUL characters
-=====================
+Use of NUL bytes
+================
By default, the code generated by `gperf' operates on zero
-terminated strings, the usual representation of strings in C. This means
-that the keywords in the input file must not contain NUL characters,
+terminated strings, the usual representation of strings in C. This
+means that the keywords in the input file must not contain NUL bytes,
and the STR argument passed to `hash' or `in_word_set' must be NUL
terminated and have exactly length LEN.
- If option `-c' is used, then the STR argument does not need to be
-NUL terminated. The code generated by `gperf' will only access the
-first LEN, not LEN+1, bytes starting at STR. However, the keywords in
-the input file still must not contain NUL characters.
+ If option `-c' (or, equivalently, the `%compare-strncmp'
+declaration) is used, then the STR argument does not need to be NUL
+terminated. The code generated by `gperf' will only access the first
+LEN, not LEN+1, bytes starting at STR. However, the keywords in the
+input file still must not contain NUL bytes.
- If option `-l' is used, then the hash table performs binary
-comparison. The keywords in the input file may contain NUL characters,
-written in string syntax as `\000' or `\x00', and the code generated by
-`gperf' will treat NUL like any other character. Also, in this case
-the `-c' option is ignored.
+ If option `-l' (or, equivalently, the `%compare-lengths'
+declaration) is used, then the hash table performs binary comparison.
+The keywords in the input file may contain NUL bytes, written in string
+syntax as `\000' or `\x00', and the code generated by `gperf' will
+treat NUL like any other byte. Also, in this case the `-c' option (or,
+equivalently, the `%compare-strncmp' declaration) is ignored.
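A binary comparison of the kind described above can be sketched as a length check followed by `memcmp', which, unlike `strcmp', does not stop at a NUL byte. The struct `bin_keyword' and the function `binary_match' are hypothetical names for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical keyword stored with an explicit length, so it may
   contain NUL bytes. */
struct bin_keyword
{
  const char *bytes;
  size_t len;
};

/* Lengths first, then a byte-for-byte comparison of all len bytes. */
static int
binary_match (const char *str, size_t len, const struct bin_keyword *k)
{
  return k->len == len && memcmp (str, k->bytes, len) == 0;
}
```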

File: gperf.info, Node: Options, Next: Bugs, Prev: Description, Up: Top
@@ -825,11 +1089,12 @@ Invoking `gperf'
There are _many_ options to `gperf'. They were added to make the
program more convenient for use with real applications. "On-line" help
-is readily available via the `-h' option. Here is the complete list of
-options.
+is readily available via the `--help' option. Here is the complete
+list of options.
* Menu:
+* Output File:: Specifying the Location of the Output File
* Input Details:: Options that affect Interpretation of the Input File
* Output Language:: Specifying the Language for the Output Code
* Output Details:: Fine tuning Details in the Output Code
@@ -837,18 +1102,34 @@ options.
* Verbosity:: Informative Output

-File: gperf.info, Node: Input Details, Next: Output Language, Prev: Options, Up: Options
+File: gperf.info, Node: Output File, Next: Input Details, Prev: Options, Up: Options
+
+Specifying the Location of the Output File
+==========================================
+
+`--output-file=FILE'
+ Allows you to specify the name of the file to which the output is
+ written.
+
+ The results are written to standard output if no output file is
+specified or if it is `-'.
+
+
+File: gperf.info, Node: Input Details, Next: Output Language, Prev: Output File, Up: Options
Options that affect Interpretation of the Input File
====================================================
+ These options are also available as declarations in the input file
+(*note Gperf Declarations::).
+
`-e KEYWORD-DELIMITER-LIST'
`--delimiters=KEYWORD-DELIMITER-LIST'
- Allows the user to provide a string containing delimiters used to
- separate keywords from their attributes. The default is ",\n".
- This option is essential if you want to use keywords that have
- embedded commas or newlines. One useful trick is to use -e'TAB',
- where TAB is the literal tab character.
+ Allows you to provide a string containing delimiters used to
+ separate keywords from their attributes. The default is ",". This
+ option is essential if you want to use keywords that have embedded
+ commas or newlines. One useful trick is to use -e'TAB', where TAB
+ is the literal tab character.
`-t'
`--struct-type'
@@ -860,44 +1141,59 @@ Options that affect Interpretation of the Input File
Pascal, Modula 2, Modula 3 and JavaScript reserved words are
distributed with this release.
+`--ignore-case'
+ Consider upper and lower case ASCII characters as equivalent. The
+ string comparison will use a case-insensitive character
+ comparison. Note that locale dependent case mappings are ignored.
+ This option is therefore not suitable if a properly
+ internationalized or locale aware case mapping should be used.
+ (For example, in a Turkish locale, the upper case equivalent of
+ the lowercase ASCII letter `i' is the non-ASCII character `capital
+ i with dot above'.) For this case, it is better to apply an
+ uppercase or lowercase conversion on the string before passing it
+ to the `gperf' generated function.
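The ASCII-only folding described above can be sketched as follows. This is a hypothetical helper, not the comparison `gperf' generates: only `A'..`Z' are mapped, so bytes above 127 and locale-specific mappings such as the Turkish dotted capital I are left untouched.

```c
#include <stddef.h>

/* ASCII-only case folding: only 'A'..'Z' are mapped to lower case. */
static unsigned char
ascii_fold (unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? (unsigned char) (c - 'A' + 'a') : c;
}

/* Compares at most n bytes ignoring ASCII case; returns 0 if equal,
   otherwise the difference of the first folded mismatching bytes. */
static int
ascii_casecmp_n (const char *a, const char *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      unsigned char ca = ascii_fold ((unsigned char) a[i]);
      unsigned char cb = ascii_fold ((unsigned char) b[i]);
      if (ca != cb)
        return (int) ca - (int) cb;
      if (ca == '\0')
        break;  /* both strings ended early and are equal */
    }
  return 0;
}
```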
+

File: gperf.info, Node: Output Language, Next: Output Details, Prev: Input Details, Up: Options
Options to specify the Language for the Output Code
===================================================
+ These options are also available as declarations in the input file
+(*note Gperf Declarations::).
+
`-L GENERATED-LANGUAGE-NAME'
`--language=GENERATED-LANGUAGE-NAME'
Instructs `gperf' to generate code in the language specified by the
option's argument. Languages handled are currently:
`KR-C'
- Old-style K&R C. This language is understood by old-style C
+ Old-style K&R C. This language is understood by old-style C
compilers and ANSI C compilers, but ANSI C compilers may flag
warnings (or even errors) because of lacking `const'.
`C'
- Common C. This language is understood by ANSI C compilers,
+ Common C. This language is understood by ANSI C compilers,
and also by old-style C compilers, provided that you `#define
const' to empty for compilers which don't know about this
keyword.
`ANSI-C'
- ANSI C. This language is understood by ANSI C compilers and
+ ANSI C. This language is understood by ANSI C compilers and
C++ compilers.
`C++'
- C++. This language is understood by C++ compilers.
+ C++. This language is understood by C++ compilers.
The default is C.
`-a'
This option is supported for compatibility with previous releases
- of `gperf'. It does not do anything.
+ of `gperf'. It does not do anything.
`-g'
This option is supported for compatibility with previous releases
- of `gperf'. It does not do anything.
+ of `gperf'. It does not do anything.

File: gperf.info, Node: Output Details, Next: Algorithmic Details, Prev: Output Language, Up: Options
@@ -905,51 +1201,68 @@ File: gperf.info, Node: Output Details, Next: Algorithmic Details, Prev: Outp
Options for fine tuning Details in the Output Code
==================================================
-`-K KEY-NAME'
-`--slot-name=KEY-NAME'
- This option is only useful when option `-t' has been given. By
- default, the program assumes the structure component identifier for
- the keyword is `name'. This option allows an arbitrary choice of
- identifier for this component, although it still must occur as the
- first field in your supplied `struct'.
+ Most of these options are also available as declarations in the
+input file (*note Gperf Declarations::).
+
+`-K SLOT-NAME'
+`--slot-name=SLOT-NAME'
+ This option is only useful when option `-t' (or, equivalently, the
+ `%struct-type' declaration) has been given. By default, the
+ program assumes the structure component identifier for the keyword
+ is `name'. This option allows an arbitrary choice of identifier
+ for this component, although it still must occur as the first
+ field in your supplied `struct'.
`-F INITIALIZERS'
`--initializer-suffix=INITIALIZERS'
- This option is only useful when option `-t' has been given. It
- permits to specify initializers for the structure members following
- KEY NAME in empty hash table entries. The list of initializers
- should start with a comma. By default, the emitted code will
- zero-initialize structure members following KEY NAME.
+ This option is only useful when option `-t' (or, equivalently, the
+ `%struct-type' declaration) has been given. It lets you specify
+ initializers for the structure members following SLOT-NAME in
+ empty hash table entries. The list of initializers should start
+ with a comma. By default, the emitted code will zero-initialize
+ structure members following SLOT-NAME.
`-H HASH-FUNCTION-NAME'
-`--hash-fn-name=HASH-FUNCTION-NAME'
+`--hash-function-name=HASH-FUNCTION-NAME'
Allows you to specify the name for the generated hash function.
Default name is `hash'. This option permits the use of two hash
tables in the same file.
`-N LOOKUP-FUNCTION-NAME'
-`--lookup-fn-name=LOOKUP-FUNCTION-NAME'
+`--lookup-function-name=LOOKUP-FUNCTION-NAME'
Allows you to specify the name for the generated lookup function.
- Default name is `in_word_set'. This option permits completely
- automatic generation of perfect hash functions, especially when
- multiple generated hash functions are used in the same application.
+ Default name is `in_word_set'. This option permits multiple
+ generated hash functions to be used in the same application.
`-Z CLASS-NAME'
`--class-name=CLASS-NAME'
- This option is only useful when option `-L C++' has been given. It
- allows you to specify the name of generated C++ class. Default
- name is `Perfect_Hash'.
+ This option is only useful when option `-L C++' (or, equivalently,
+ the `%language=C++' declaration) has been given. It allows you to
+ specify the name of the generated C++ class. The default name is
+ `Perfect_Hash'.
`-7'
`--seven-bit'
This option specifies that all strings that will be passed as
arguments to the generated hash function and the generated lookup
- function will solely consist of 7-bit ASCII characters (characters
- in the range 0..127). (Note that the ANSI C functions `isalnum'
- and `isgraph' do _not_ guarantee that a character is in this
- range. Only an explicit test like `c >= 'A' && c <= 'Z''
- guarantees this.) This was the default in versions of `gperf'
- earlier than 2.7; now the default is to assume 8-bit characters.
+ function will solely consist of 7-bit ASCII characters (bytes in
+ the range 0..127). (Note that the ANSI C functions `isalnum' and
+ `isgraph' do _not_ guarantee that a byte is in this range. Only
+ an explicit test like `c >= 'A' && c <= 'Z'' guarantees this.)
+ This was the default in versions of `gperf' earlier than 2.7; now
+ the default is to support 8-bit and multibyte characters.
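If you use `-7' but cannot be sure your input is pure ASCII, a guard such as this hypothetical helper can check the precondition before calling the lookup function. It tests the byte value explicitly rather than relying on `isalnum' or `isgraph', for the reason given above.

```c
#include <stddef.h>

/* Returns 1 if all len bytes are 7-bit ASCII (0..127). */
static int
is_seven_bit (const char *str, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if ((unsigned char) str[i] > 127)
      return 0;
  return 1;
}
```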
+
+`-l'
+`--compare-lengths'
+ Compare keyword lengths before trying a string comparison. This
+ option is mandatory for binary comparisons (*note Binary
+ Strings::). It also might cut down on the number of string
+ comparisons made during the lookup, since keywords with different
+ lengths are never compared via `strcmp'. However, using `-l'
+ might greatly increase the size of the generated C code if the
+ lookup table range is large (which implies that the switch option
+ `-S' or `%switch' is not enabled), since the length table contains
+ as many elements as there are entries in the lookup table.
`-c'
`--compare-strncmp'
@@ -976,29 +1289,57 @@ Options for fine tuning Details in the Output Code
include this header file himself to allow compilation of the code.
`-G'
-`--global'
+`--global-table'
Generate the static table of keywords as a static global variable,
rather than hiding it inside of the lookup function (which is the
default behavior).
+`-P'
+`--pic'
+ Optimize the generated table for inclusion in shared libraries.
+ This reduces the startup time of programs using a shared library
+ containing the generated code. If the option `-t' (or,
+ equivalently, the `%struct-type' declaration) is also given, the
+ first field of the user-defined struct must be of type `int', not
+ `char *', because it will contain offsets into the string pool
+ instead of actual strings. To convert such an offset to a string,
+ you can use the expression `stringpool + O', where O is the
+ offset. The string pool name can be changed through the option
+ `--string-pool-name'.
+
+`-Q STRING-POOL-NAME'
+`--string-pool-name=STRING-POOL-NAME'
+ Allows you to specify the name of the generated string pool
+ created by option `-P'. The default name is `stringpool'. This
+ option permits the use of two hash tables in the same file, with
+ `-P' and even when the option `-G' (or, equivalently, the
+ `%global-table' declaration) is given.
+
+`--null-strings'
+ Use NULL strings instead of empty strings for empty keyword table
+ entries. This reduces the startup time of programs using a shared
+ library containing the generated code (but not as much as option
+ `-P'), at the expense of one more test-and-branch instruction at
+ run time.
+
`-W HASH-TABLE-ARRAY-NAME'
`--word-array-name=HASH-TABLE-ARRAY-NAME'
Allows you to specify the name for the generated array containing
the hash table. Default name is `wordlist'. This option permits
the use of two hash tables in the same file, even when the option
- `-G' is given.
+ `-G' (or, equivalently, the `%global-table' declaration) is given.
`-S TOTAL-SWITCH-STATEMENTS'
`--switch=TOTAL-SWITCH-STATEMENTS'
Causes the generated C code to use a `switch' statement scheme,
rather than an array lookup table. This can lead to a reduction
- in both time and space requirements for some keyfiles. The
+ in both time and space requirements for some input files. The
argument to this option determines how many `switch' statements
- are generated. A value of 1 generates 1 `switch' containing all
+ are generated. A value of 1 generates 1 `switch' containing all
the elements, a value of 2 generates 2 tables with 1/2 the
elements in each `switch', etc. This is useful since many C
compilers cannot correctly generate code for large `switch'
- statements. This option was inspired in part by Keith Bostic's
+ statements. This option was inspired in part by Keith Bostic's
original C program.
`-T'
@@ -1008,7 +1349,7 @@ Options for fine tuning Details in the Output Code
`-p'
This option is supported for compatibility with previous releases
- of `gperf'. It does not do anything.
+ of `gperf'. It does not do anything.

File: gperf.info, Node: Algorithmic Details, Next: Verbosity, Prev: Output Details, Up: Options
@@ -1016,86 +1357,67 @@ File: gperf.info, Node: Algorithmic Details, Next: Verbosity, Prev: Output De
Options for changing the Algorithms employed by `gperf'
=======================================================
-`-k KEYS'
-`--key-positions=KEYS'
- Allows selection of the character key positions used in the
- keywords' hash function. The allowable choices range between
- 1-126, inclusive. The positions are separated by commas, e.g.,
- `-k 9,4,13,14'; ranges may be used, e.g., `-k 2-7'; and positions
- may occur in any order. Furthermore, the meta-character '*'
- causes the generated hash function to consider *all* character
- positions in each key, whereas '$' instructs the hash function to
- use the "final character" of a key (this is the only way to use a
- character position greater than 126, incidentally).
+`-k SELECTED-BYTE-POSITIONS'
+`--key-positions=SELECTED-BYTE-POSITIONS'
+ Allows selection of the byte positions used in the keywords' hash
+ function. The allowable choices range between 1-255, inclusive.
+ The positions are separated by commas, e.g., `-k 9,4,13,14';
+ ranges may be used, e.g., `-k 2-7'; and positions may occur in any
+ order. Furthermore, the wildcard '*' causes the generated hash
+ function to consider *all* byte positions in each keyword, whereas
+ '$' instructs the hash function to use the "final byte" of a
+ keyword (this is the only way to use a byte position greater than
+ 255, incidentally).
For instance, the option `-k 1,2,4,6-10,'$'' generates a hash
function that considers positions 1,2,4,6,7,8,9,10, plus the last
- character in each key (which may differ for each key, obviously).
- Keys with length less than the indicated key positions work
- properly, since selected key positions exceeding the key length
- are simply not referenced in the hash function.
+ byte in each keyword (which may be at a different position for each
+ keyword, obviously). Keywords with length less than the indicated
+ byte positions work properly, since selected byte positions
+ exceeding the keyword length are simply not referenced in the hash
+ function.
-`-l'
-`--compare-strlen'
- Compare key lengths before trying a string comparison. This might
- cut down on the number of string comparisons made during the
- lookup, since keys with different lengths are never compared via
- `strcmp'. However, using `-l' might greatly increase the size of
- the generated C code if the lookup table range is large (which
- implies that the switch option `-S' is not enabled), since the
- length table contains as many elements as there are entries in the
- lookup table. This option is mandatory for binary comparisons
- (*note Binary Strings::).
+ This option is not normally needed since version 2.8 of `gperf';
+ the default byte positions are computed depending on the keyword
+ set, through a search that minimizes the number of byte positions.
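Combining this with the description of the `hash' function earlier (LEN plus associated values looked up at the selected byte positions), a hash computed with a set of selected positions can be sketched as below. This is an illustrative approximation, not `gperf''s emitted code; the parameter names `assoc_values', `positions' and `use_last_byte' are hypothetical, and `use_last_byte' plays the role of `$'.

```c
#include <stddef.h>

/* Sketch of a gperf-style hash: len plus the associated value of each
   selected byte.  Positions are 1-based; positions beyond len are not
   referenced; use_last_byte adds the final byte, like '$'. */
static unsigned int
sketch_hash (const char *str, size_t len,
             const unsigned int assoc_values[256],
             const int *positions, size_t npositions,
             int use_last_byte)
{
  unsigned int h = (unsigned int) len;

  for (size_t i = 0; i < npositions; i++)
    if (positions[i] >= 1 && (size_t) positions[i] <= len)
      h += assoc_values[(unsigned char) str[positions[i] - 1]];
  if (use_last_byte && len > 0)
    h += assoc_values[(unsigned char) str[len - 1]];
  return h;
}
```

With positions `{1, 2}' and `use_last_byte' set, this corresponds to `-k 1,2,$'; a one-byte keyword simply skips position 2.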
`-D'
`--duplicates'
- Handle keywords whose key position sets hash to duplicate values.
- Duplicate hash values occur for two reasons:
-
- * Since `gperf' does not backtrack it is possible for it to
- process all your input keywords without finding a unique
- mapping for each word. However, frequently only a very small
- number of duplicates occur, and the majority of keys still
- require one probe into the table.
-
- * Sometimes a set of keys may have the same names, but possess
- different attributes. With the -D option `gperf' treats all
- these keys as part of an equivalence class and generates a
- perfect hash function with multiple comparisons for duplicate
- keys. It is up to you to completely disambiguate the
- keywords by modifying the generated C code. However, `gperf'
- helps you out by organizing the output.
-
- Option `-D' is extremely useful for certain large or highly
- redundant keyword sets, e.g., assembler instruction opcodes.
+ Handle keywords whose selected byte sets hash to duplicate values.
+ Duplicate hash values can occur if a set of keywords has the same
+ names, but possesses different attributes, or if the selected byte
+ positions are not well chosen. With the -D option `gperf' treats
+ all these keywords as part of an equivalence class and generates a
+ perfect hash function with multiple comparisons for duplicate
+ keywords. It is up to you to completely disambiguate the keywords
+ by modifying the generated C code. However, `gperf' helps you out
+ by organizing the output.
+
Using this option usually means that the generated hash function
is no longer perfect. On the other hand, it permits `gperf' to
work on keyword sets that it otherwise could not handle.
-`-f ITERATION-AMOUNT'
-`--fast=ITERATION-AMOUNT'
- Generate the perfect hash function "fast". This decreases
- `gperf''s running time at the cost of minimizing generated
- table-size. The iteration amount represents the number of times to
- iterate when resolving a collision. `0' means iterate by the
- number of keywords. This option is probably most useful when used
- in conjunction with options `-D' and/or `-S' for _large_ keyword
- sets.
+`-m ITERATIONS'
+`--multiple-iterations=ITERATIONS'
+ Perform multiple choices of the `-i' and `-j' values, and choose
+ the best results. This increases the running time by a factor of
+ ITERATIONS but does a good job of minimizing the generated table size.
`-i INITIAL-VALUE'
`--initial-asso=INITIAL-VALUE'
Provides an initial VALUE for the associated values array. Default
is 0. Increasing the initial value helps inflate the final table
size, possibly leading to more time efficient keyword lookups.
- Note that this option is not particularly useful when `-S' is
- used. Also, `-i' is overridden when the `-r' option is used.
+ Note that this option is not particularly useful when `-S' (or,
+ equivalently, `%switch') is used. Also, `-i' is overridden when
+ the `-r' option is used.
`-j JUMP-VALUE'
`--jump=JUMP-VALUE'
Affects the "jump value", i.e., how far to advance the associated
- character value upon collisions. JUMP-VALUE is rounded up to an
- odd number, the default is 5. If the JUMP-VALUE is 0 `gperf'
- jumps by random amounts.
+ byte value upon collisions. JUMP-VALUE is rounded up to an odd
+ number, the default is 5. If the JUMP-VALUE is 0 `gperf' jumps by
+ random amounts.
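The text does not say why the jump value is made odd. One plausible reason, assumed here rather than taken from the manual: since the maximum associated value is rounded up to a power of 2, an odd step shares no factor with that range, so repeatedly adding it modulo the range eventually reaches every value. The hypothetical helpers below demonstrate the rounding and that property.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_RANGE 1024  /* hypothetical upper bound for this demo */

/* Round a jump value up to an odd number; 0 is kept to mean "random". */
static unsigned int
normalize_jump (unsigned int jump)
{
  if (jump == 0)
    return 0;
  return (jump % 2 == 0) ? jump + 1 : jump;
}

/* Counts the distinct values reached by starting at 0 and adding jump
   modulo range, range times (requires range <= MAX_RANGE). */
static unsigned int
distinct_values_reached (unsigned int jump, unsigned int range)
{
  bool seen[MAX_RANGE];
  unsigned int v = 0, count = 0;

  memset (seen, 0, sizeof seen);
  for (unsigned int i = 0; i < range; i++)
    {
      if (!seen[v])
        {
          seen[v] = true;
          count++;
        }
      v = (v + jump) % range;
    }
  return count;
}
```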
`-n'
`--no-strlen'
@@ -1103,61 +1425,40 @@ Options for changing the Algorithms employed by `gperf'
computing its hash value. This may save a few assembly
instructions in the generated lookup table.
-`-o'
-`--occurrence-sort'
- Reorders the keywords by sorting the keywords so that frequently
- occuring key position set components appear first. A second
- reordering pass follows so that keys with "already determined
- values" are placed towards the front of the keylist. This may
- decrease the time required to generate a perfect hash function for
- many keyword sets, and also produce more minimal perfect hash
- functions. The reason for this is that the reordering helps prune
- the search time by handling inevitable collisions early in the
- search process. On the other hand, if the number of keywords is
- _very_ large using `-o' may _increase_ `gperf''s execution time,
- since collisions will begin earlier and continue throughout the
- remainder of keyword processing. See Cichelli's paper from the
- January 1980 Communications of the ACM for details.
-
`-r'
`--random'
Utilizes randomness to initialize the associated values table.
This frequently generates solutions faster than using deterministic
initialization (which starts all associated values at 0).
Furthermore, using the randomization option generally increases
- the size of the table. If `gperf' has difficultly with a certain
- keyword set try using `-r' or `-D'.
+ the size of the table.
`-s SIZE-MULTIPLE'
`--size-multiple=SIZE-MULTIPLE'
Affects the size of the generated hash table. The numeric
argument for this option indicates "how many times larger or
smaller" the maximum associated value range should be, in
- relationship to the number of keys. If the SIZE-MULTIPLE is
- negative the maximum associated value is calculated by _dividing_
- it into the total number of keys. For example, a value of 3 means
- "allow the maximum associated value to be about 3 times larger
- than the number of input keys".
-
- Conversely, a value of -3 means "allow the maximum associated
- value to be about 3 times smaller than the number of input keys".
- Negative values are useful for limiting the overall size of the
- generated hash table, though this usually increases the number of
- duplicate hash values.
-
- If `generate switch' option `-S' is _not_ enabled, the maximum
- associated value influences the static array table size, and a
- larger table should decrease the time required for an unsuccessful
- search, at the expense of extra table space.
+ relationship to the number of keywords. It can be written as an
+ integer, a floating-point number or a fraction. For example, a
+ value of 3 means "allow the maximum associated value to be about 3
+ times larger than the number of input keywords". Conversely, a
+ value of 1/3 means "allow the maximum associated value to be about
+ 3 times smaller than the number of input keywords". Values
+ smaller than 1 are useful for limiting the overall size of the
+ generated hash table, though the option `-m' is better suited to this
+ purpose.
+
+ If `generate switch' option `-S' (or, equivalently, `%switch') is
+ _not_ enabled, the maximum associated value influences the static
+ array table size, and a larger table should decrease the time
+ required for an unsuccessful search, at the expense of extra table
+ space.
The default value is 1, thus the default maximum associated value
- about the same size as the number of keys (for efficiency, the
+ is about the same size as the number of keywords (for efficiency, the
maximum associated value is always rounded up to a power of 2).
The actual table size may vary somewhat, since this technique is
- essentially a heuristic. In particular, setting this value too
- high slows down `gperf''s runtime, since it must search through a
- much larger range of values. Judicious use of the `-f' option
- helps alleviate this overhead, however.
+ essentially a heuristic.

File: gperf.info, Node: Verbosity, Prev: Algorithmic Details, Up: Options
@@ -1200,17 +1501,6 @@ Known Bugs and Limitations with `gperf'
15,000 keywords). When processing large keyword sets it helps
greatly to have over 8 megs of RAM.
- However, since `gperf' does not backtrack no guaranteed solution
- occurs on every run. On the other hand, it is usually easy to
- obtain a solution by varying the option parameters. In
- particular, try the `-r' option, and also try changing the default
- arguments to the `-s' and `-j' options. To _guarantee_ a
- solution, use the `-D' and `-S' options, although the final
- results are not likely to be a _perfect_ hash function anymore!
- Finally, use the `-f' option if you want `gperf' to generate the
- perfect hash function _fast_, with less emphasis on making it
- minimal.
-
* The size of the generate static keyword array can get _extremely_
large if the input keyword file is large or if the keywords are
quite similar. This tends to slow down the compilation of the
@@ -1218,17 +1508,17 @@ Known Bugs and Limitations with `gperf'
this situation occurs, consider using the `-S' option to reduce
data size, potentially increasing keyword recognition time a
negligible amount. Since many C compilers cannot correctly
- generated code for large switch statements it is important to
+ generate code for large switch statements it is important to
qualify the -S option with an appropriate numerical argument that
controls the number of switch statements generated.
- * The maximum number of key positions selected for a given key has an
- arbitrary limit of 126. This restriction should be removed, and if
- anyone considers this a problem write me and let me know so I can
- remove the constraint.
+ * The maximum number of selected byte positions has an arbitrary
+ limit of 255. This restriction should be removed, and if anyone
+ considers this a problem write me and let me know so I can remove
+ the constraint.

-File: gperf.info, Node: Projects, Next: Implementation, Prev: Bugs, Up: Top
+File: gperf.info, Node: Projects, Next: Bibliography, Prev: Bugs, Up: Top
Things Still Left to Do
***********************
@@ -1238,45 +1528,22 @@ function algorithm with a more exhaustive approach; the perfect hash
module is essential independent from other program modules. Additional
worthwhile improvements include:
- * Make the algorithm more robust. At present, the program halts
- with an error diagnostic if it can't find a direct solution and
- the `-D' option is not enabled. A more comprehensive, albeit
- computationally expensive, approach would employ backtracking or
- enable alternative options and retry. It's not clear how helpful
- this would be, in general, since most search sets are rather small
- in practice.
-
* Another useful extension involves modifying the program to generate
"minimal" perfect hash functions (under certain circumstances, the
current version can be rather extravagant in the generated table
- size). Again, this is mostly of theoretical interest, since a
- sparse table often produces faster lookups, and use of the `-S'
- `switch' option can minimize the data size, at the expense of
- slightly longer lookups (note that the gcc compiler generally
- produces good code for `switch' statements, reducing the need for
- more complex schemes).
+ size). This is mostly of theoretical interest, since a sparse
+ table often produces faster lookups, and use of the `-S' `switch'
+ option can minimize the data size, at the expense of slightly
+ longer lookups (note that the gcc compiler generally produces good
+ code for `switch' statements, reducing the need for more complex
+ schemes).
* In addition to improving the algorithm, it would also be useful to
- generate a C++ class or Ada package as the code output, in
- addition to the current C routines.
+ generate an Ada package as the code output, in addition to the
+ current C and C++ routines.

-File: gperf.info, Node: Implementation, Next: Bibliography, Prev: Projects, Up: Top
-
-Implementation Details of GNU `gperf'
-*************************************
-
- A paper describing the high-level description of the data structures
-and algorithms used to implement `gperf' will soon be available. This
-paper is useful not only from a maintenance and enhancement perspective,
-but also because they demonstrate several clever and useful programming
-techniques, e.g., `Iteration Number' boolean arrays, double hashing, a
-"safe" and efficient method for reading arbitrarily long input from a
-file, and a provably optimal algorithm for simultaneously determining
-both the minimum and maximum elements in a list.
-
-
-File: gperf.info, Node: Bibliography, Next: Concept Index, Prev: Implementation, Up: Top
+File: gperf.info, Node: Bibliography, Next: Concept Index, Prev: Projects, Up: Top
Bibliography
************
@@ -1311,20 +1578,23 @@ Hash Functions Communications of the ACM, 28, 5(December 1985), 523-532
[9] Schmidt, Douglas C. GPERF: A Perfect Hash Function Generator
Second USENIX C++ Conference Proceedings, April 1990.
- [10] Sebesta, R.W. and Taylor, M.A. Minimal Perfect Hash Functions
+ [10] Schmidt, Douglas C. GPERF: A Perfect Hash Function Generator
+C++ Report, SIGS 10 10 (November/December 1998).
+
+ [11] Sebesta, R.W. and Taylor, M.A. Minimal Perfect Hash Functions
for Reserved Word Lists SIGPLAN Notices, 20, 12(September 1985), 47-53.
- [11] Sprugnoli, R. Perfect Hashing Functions: A Single Probe
+ [12] Sprugnoli, R. Perfect Hashing Functions: A Single Probe
Retrieving Method for Static Sets Communications of the ACM, 20
11(November 1977), 841-850.
- [12] Stallman, Richard M. Using and Porting GNU CC Free Software
+ [13] Stallman, Richard M. Using and Porting GNU CC Free Software
Foundation, 1988.
- [13] Stroustrup, Bjarne The C++ Programming Language.
+ [14] Stroustrup, Bjarne The C++ Programming Language.
Addison-Wesley, 1986.
- [14] Tiemann, Michael D. User's Guide to GNU C++ Free Software
+ [15] Tiemann, Michael D. User's Guide to GNU C++ Free Software
Foundation, 1989.

@@ -1335,9 +1605,31 @@ Concept Index
* Menu:
-* %%: Declarations.
-* %{: Declarations.
-* %}: Declarations.
+* %%: User-supplied Struct.
+* %7bit: Gperf Declarations.
+* %compare-lengths: Gperf Declarations.
+* %compare-strncmp: Gperf Declarations.
+* %define class-name: Gperf Declarations.
+* %define hash-function-name: Gperf Declarations.
+* %define initializer-suffix: Gperf Declarations.
+* %define lookup-function-name: Gperf Declarations.
+* %define slot-name: Gperf Declarations.
+* %define string-pool-name: Gperf Declarations.
+* %define word-array-name: Gperf Declarations.
+* %delimiters: Gperf Declarations.
+* %enum: Gperf Declarations.
+* %global-table: Gperf Declarations.
+* %ignore-case: Gperf Declarations.
+* %includes: Gperf Declarations.
+* %language: Gperf Declarations.
+* %null-strings: Gperf Declarations.
+* %omit-struct-type: Gperf Declarations.
+* %pic: Gperf Declarations.
+* %readonly-tables: Gperf Declarations.
+* %struct-type: Gperf Declarations.
+* %switch: Gperf Declarations.
+* %{: C Code Inclusion.
+* %}: C Code Inclusion.
* Array name: Output Details.
* Bugs: Contributors.
* Class name: Output Details.
@@ -1362,28 +1654,32 @@ Concept Index

Tag Table:
-Node: Top1236
-Node: Copying3130
-Node: Contributors22321
-Node: Motivation23580
-Node: Search Structures24656
-Node: Description28201
-Node: Input Format30102
-Node: Declarations30944
-Node: Keywords33268
-Node: Functions35023
-Node: Output Format35517
-Node: Binary Strings38113
-Node: Options39119
-Node: Input Details39825
-Node: Output Language40890
-Node: Output Details42194
-Node: Algorithmic Details46842
-Node: Verbosity54284
-Node: Bugs54987
-Node: Projects57215
-Node: Implementation58792
-Node: Bibliography59509
-Node: Concept Index61452
+Node: Top1234
+Node: Copying3318
+Node: Contributors22507
+Node: Motivation23700
+Node: Search Structures24828
+Node: Description28383
+Node: Input Format30276
+Node: Declarations31413
+Node: User-supplied Struct31989
+Node: Gperf Declarations33405
+Node: C Code Inclusion41815
+Node: Keywords42644
+Node: Functions44580
+Node: Controls for GNU indent45106
+Node: Output Format46045
+Node: Binary Strings48829
+Node: Options49972
+Node: Output File50757
+Node: Input Details51141
+Node: Output Language52972
+Node: Output Details54383
+Node: Algorithmic Details61298
+Node: Verbosity66547
+Node: Bugs67250
+Node: Projects68842
+Node: Bibliography69970
+Node: Concept Index72026

End Tag Table
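The revised `-s' text describes SIZE-MULTIPLE as an integer, a floating-point number or a fraction that scales the keyword count, with the resulting maximum associated value rounded up to a power of 2. The C sketch below illustrates only that arithmetic. It is not gperf's implementation, whose exact rounding may differ, and the names `parse_size_multiple` and `approx_max_asso_value` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for a SIZE-MULTIPLE argument such as "3", "2.5"
   or "1/3". Returns a negative value for malformed or non-positive input.
   This is an illustration, not gperf's own option parser. */
static double parse_size_multiple(const char *arg)
{
    char *end;
    double value;
    const char *slash;

    assert(arg != NULL);
    slash = strchr(arg, '/');
    if (slash != NULL) {
        /* Fraction: the numerator must end exactly at the slash. */
        double num = strtod(arg, &end);
        double den;
        if (end != slash)
            return -1.0;
        den = strtod(slash + 1, &end);
        if (*end != '\0' || den == 0.0)
            return -1.0;
        value = num / den;
    } else {
        /* Integer or floating-point number. */
        value = strtod(arg, &end);
        if (end == arg || *end != '\0')
            return -1.0;
    }
    return value > 0.0 ? value : -1.0;
}

/* Hypothetical estimate of the maximum associated value: scale the
   keyword count by the multiple, then round up to a power of 2.
   The target is clamped so the doubling loop cannot overflow. */
static unsigned int approx_max_asso_value(unsigned int nkeywords,
                                          double multiple)
{
    double product = (double)nkeywords * multiple;
    unsigned int target;
    unsigned int pow2 = 1;

    if (product < 1.0)
        target = 1;
    else if (product > 2147483648.0)
        target = 2147483648u;
    else
        target = (unsigned int)product;

    while (pow2 < target)
        pow2 <<= 1;
    return pow2;
}
```

For example, with 100 keywords a multiple of `1/3' gives a target of 33, which rounds up to 64, while a multiple of 3 gives 300, which rounds up to 512.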
diff --git a/doc/gperf_1.html b/doc/gperf_1.html
index b47dd53..d5269ce 100644
--- a/doc/gperf_1.html
+++ b/doc/gperf_1.html
@@ -1,12 +1,12 @@
<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.51
- from gperf.texi on 26 September 2000 -->
+ from gperf.texi on 7 May 2003 -->
<TITLE>Perfect Hash Function Generator - GNU GENERAL PUBLIC LICENSE</TITLE>
</HEAD>
<BODY>
-Go to the first, previous, <A HREF="gperf_2.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
+Go to the first, previous, <A HREF="gperf_2.html">next</A>, <A HREF="gperf_10.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
<P><HR><P>
@@ -455,6 +455,6 @@ Public License instead of this License.
</P>
<P><HR><P>
-Go to the first, previous, <A HREF="gperf_2.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
+Go to the first, previous, <A HREF="gperf_2.html">next</A>, <A HREF="gperf_10.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
</BODY>
</HTML>
diff --git a/doc/gperf_10.html b/doc/gperf_10.html
index 8f72c16..0590625 100644
--- a/doc/gperf_10.html
+++ b/doc/gperf_10.html
@@ -1,82 +1,104 @@
<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.51
- from gperf.texi on 26 September 2000 -->
+ from gperf.texi on 7 May 2003 -->
-<TITLE>Perfect Hash Function Generator - 8 Bibliography</TITLE>
+<TITLE>Perfect Hash Function Generator - Concept Index</TITLE>
</HEAD>
<BODY>
-Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_9.html">previous</A>, <A HREF="gperf_11.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
+Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_9.html">previous</A>, next, last section, <A HREF="gperf_toc.html">table of contents</A>.
<P><HR><P>
-<H1><A NAME="SEC23" HREF="gperf_toc.html#TOC23">8 Bibliography</A></H1>
+<H1><A NAME="SEC28" HREF="gperf_toc.html#TOC28">Concept Index</A></H1>
<P>
-[1] Chang, C.C.: <I>A Scheme for Constructing Ordered Minimal Perfect
-Hashing Functions</I> Information Sciences 39(1986), 187-195.
-
-[2] Cichelli, Richard J. <I>Author's Response to "On Cichelli's Minimal Perfect Hash
-Functions Method"</I> Communications of the ACM, 23, 12(December 1980), 729.
-
-[3] Cichelli, Richard J. <I>Minimal Perfect Hash Functions Made Simple</I>
-Communications of the ACM, 23, 1(January 1980), 17-19.
-
-[4] Cook, C. R. and Oldehoeft, R.R. <I>A Letter Oriented Minimal
-Perfect Hashing Function</I> SIGPLAN Notices, 17, 9(September 1982), 18-27.
-
-</P>
-<P>
-[5] Cormack, G. V. and Horspool, R. N. S. and Kaiserwerth, M.
-<I>Practical Perfect Hashing</I> Computer Journal, 28, 1(January 1985), 54-58.
-
-[6] Jaeschke, G. <I>Reciprocal Hashing: A Method for Generating Minimal
-Perfect Hashing Functions</I> Communications of the ACM, 24, 12(December
-1981), 829-833.
-
-</P>
-<P>
-[7] Jaeschke, G. and Osterburg, G. <I>On Cichelli's Minimal Perfect
-Hash Functions Method</I> Communications of the ACM, 23, 12(December 1980),
-728-729.
-
-</P>
-<P>
-[8] Sager, Thomas J. <I>A Polynomial Time Generator for Minimal Perfect
-Hash Functions</I> Communications of the ACM, 28, 5(December 1985), 523-532
-
-</P>
-<P>
-[9] Schmidt, Douglas C. <I>GPERF: A Perfect Hash Function Generator</I>
-Second USENIX C++ Conference Proceedings, April 1990.
-
-</P>
-<P>
-[10] Sebesta, R.W. and Taylor, M.A. <I>Minimal Perfect Hash Functions
-for Reserved Word Lists</I> SIGPLAN Notices, 20, 12(September 1985), 47-53.
-
-</P>
-<P>
-[11] Sprugnoli, R. <I>Perfect Hashing Functions: A Single Probe
-Retrieving Method for Static Sets</I> Communications of the ACM, 20
-11(November 1977), 841-850.
-
-</P>
-<P>
-[12] Stallman, Richard M. <I>Using and Porting GNU CC</I> Free Software Foundation,
-1988.
-
-</P>
-<P>
-[13] Stroustrup, Bjarne <I>The C++ Programming Language.</I> Addison-Wesley, 1986.
-
-</P>
-<P>
-[14] Tiemann, Michael D. <I>User's Guide to GNU C++</I> Free Software
-Foundation, 1989.
+<H2>%</H2>
+<DIR>
+<LI><A HREF="gperf_5.html#IDX8"><SAMP>`%%'</SAMP></A>
+<LI><A HREF="gperf_5.html#IDX18"><SAMP>`%7bit'</SAMP></A>
+<LI><A HREF="gperf_5.html#IDX19"><SAMP>`%compare-lengths'</SAMP></A>
+<LI><A HREF="gperf_5.html#IDX20"><SAMP>`%compare-strncmp'</SAMP></A>
+<LI><A HREF="gperf_5.html#IDX17"><SAMP>`%define class-name'</SAMP></A>
+<LI><A HREF="gperf_5.html#IDX15"><SAMP>`%define hash-function-name'</SAMP></A>
+<LI><A HREF="gperf_5.html#IDX14"><SAMP>`%define initializer-suffix'</SAMP></A>
+<LI><A HREF="gperf_5.html#IDX16"><SAMP>`%define lookup-function-name'</SAMP></A>
+<LI><A HREF="gperf_5.html#IDX13"><SAMP>`%define slot-name'</SAMP></A>
+<LI><A HREF="gperf_5.html#IDX26"><SAMP>`%define string-pool-name'</SAMP></A>
+<LI><A HREF="gperf_5.html#IDX28"><SAMP>`%define word-array-name'</SAMP></A>
+<LI><A HREF="gperf_5.html#IDX9"><SAMP>`%delimiters'</SAMP></A>
+<LI><A HREF="gperf_5.html#IDX22"><SAMP>`%enum'</SAMP></A>
+<LI><A HREF="gperf_5.html#IDX24"><SAMP>`%global-table'</SAMP></A>
+<LI><A HREF="gperf_5.html#IDX11"><SAMP>`%ignore-case'</SAMP></A>
+<LI><A HREF="gperf_5.html#IDX23"><SAMP>`%includes'</SAMP></A>
+<LI><A HREF="gperf_5.html#IDX12"><SAMP>`%language'</SAMP></A>
+<LI><A HREF="gperf_5.html#IDX27"><SAMP>`%null-strings'</SAMP></A>
+<LI><A HREF="gperf_5.html#IDX30"><SAMP>`%omit-struct-type'</SAMP></A>
+<LI><A HREF="gperf_5.html#IDX25"><SAMP>`%pic'</SAMP></A>
+<LI><A HREF="gperf_5.html#IDX21"><SAMP>`%readonly-tables'</SAMP></A>
+<LI><A HREF="gperf_5.html#IDX10"><SAMP>`%struct-type'</SAMP></A>
+<LI><A HREF="gperf_5.html#IDX29"><SAMP>`%switch'</SAMP></A>
+<LI><A HREF="gperf_5.html#IDX31"><SAMP>`%{'</SAMP></A>
+<LI><A HREF="gperf_5.html#IDX32"><SAMP>`%}'</SAMP></A>
+</DIR>
+<H2>a</H2>
+<DIR>
+<LI><A HREF="gperf_6.html#IDX42">Array name</A>
+</DIR>
+<H2>b</H2>
+<DIR>
+<LI><A HREF="gperf_2.html#IDX1">Bugs</A>
+</DIR>
+<H2>c</H2>
+<DIR>
+<LI><A HREF="gperf_6.html#IDX41">Class name</A>
+</DIR>
+<H2>d</H2>
+<DIR>
+<LI><A HREF="gperf_5.html#IDX5">Declaration section</A>
+<LI><A HREF="gperf_6.html#IDX38">Delimiters</A>
+<LI><A HREF="gperf_6.html#IDX44">Duplicates</A>
+</DIR>
+<H2>f</H2>
+<DIR>
+<LI><A HREF="gperf_5.html#IDX4">Format</A>
+<LI><A HREF="gperf_5.html#IDX7">Functions section</A>
+</DIR>
+<H2>h</H2>
+<DIR>
+<LI><A HREF="gperf_5.html#IDX34">hash</A>
+<LI><A HREF="gperf_5.html#IDX33">hash table</A>
+</DIR>
+<H2>i</H2>
+<DIR>
+<LI><A HREF="gperf_5.html#IDX35">in_word_set</A>
+<LI><A HREF="gperf_6.html#IDX40">Initializers</A>
+</DIR>
+<H2>j</H2>
+<DIR>
+<LI><A HREF="gperf_6.html#IDX45">Jump value</A>
+</DIR>
+<H2>k</H2>
+<DIR>
+<LI><A HREF="gperf_5.html#IDX6">Keywords section</A>
+</DIR>
+<H2>m</H2>
+<DIR>
+<LI><A HREF="gperf_4.html#IDX3">Minimal perfect hash functions</A>
+</DIR>
+<H2>n</H2>
+<DIR>
+<LI><A HREF="gperf_5.html#IDX37">NUL</A>
+</DIR>
+<H2>s</H2>
+<DIR>
+<LI><A HREF="gperf_6.html#IDX39">Slot name</A>
+<LI><A HREF="gperf_4.html#IDX2">Static search structure</A>
+<LI><A HREF="gperf_5.html#IDX36"><CODE>switch</CODE></A>, <A HREF="gperf_6.html#IDX43"><CODE>switch</CODE></A>
+</DIR>
</P>
<P><HR><P>
-Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_9.html">previous</A>, <A HREF="gperf_11.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
+Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_9.html">previous</A>, next, last section, <A HREF="gperf_toc.html">table of contents</A>.
</BODY>
</HTML>
diff --git a/doc/gperf_2.html b/doc/gperf_2.html
index b3c3c57..3611e41 100644
--- a/doc/gperf_2.html
+++ b/doc/gperf_2.html
@@ -1,12 +1,12 @@
<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.51
- from gperf.texi on 26 September 2000 -->
+ from gperf.texi on 7 May 2003 -->
<TITLE>Perfect Hash Function Generator - Contributors to GNU gperf Utility</TITLE>
</HEAD>
<BODY>
-Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_1.html">previous</A>, <A HREF="gperf_3.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
+Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_1.html">previous</A>, <A HREF="gperf_3.html">next</A>, <A HREF="gperf_10.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
<P><HR><P>
@@ -18,15 +18,13 @@ Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_1.html">previous</A>,
<A NAME="IDX1"></A>
The GNU <CODE>gperf</CODE> perfect hash function generator utility was
-originally written in GNU C++ by Douglas C. Schmidt. It is now also
-available in a highly-portable "old-style" C version. The general
+written in GNU C++ by Douglas C. Schmidt. The general
idea for the perfect hash function generator was inspired by Keith
Bostic's algorithm written in C, and distributed to net.sources around
1984. The current program is a heavily modified, enhanced, and extended
implementation of Keith's basic idea, created at the University of
California, Irvine. Bugs, patches, and suggestions should be reported
-to both <CODE>&#60;bug-gnu-utils@gnu.org&#62;</CODE> and
-<CODE>&#60;gperf-bugs@lists.sourceforge.net&#62;</CODE>.
+to <CODE>&#60;bug-gnu-gperf@gnu.org&#62;</CODE>.
<LI>
@@ -39,11 +37,12 @@ that greatly helped improve the quality and functionality of <CODE>gperf</CODE>.
<LI>
-A testsuite was added by Bruno Haible. He also rewrote the output
-routines for better reliability.
+Bruno Haible enhanced and optimized the search algorithm. He also rewrote
+the input routines and the output routines for better reliability, and
+added a testsuite.
</UL>
<P><HR><P>
-Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_1.html">previous</A>, <A HREF="gperf_3.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
+Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_1.html">previous</A>, <A HREF="gperf_3.html">next</A>, <A HREF="gperf_10.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
</BODY>
</HTML>
diff --git a/doc/gperf_3.html b/doc/gperf_3.html
index 529b1c7..dda84ab 100644
--- a/doc/gperf_3.html
+++ b/doc/gperf_3.html
@@ -1,12 +1,12 @@
<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.51
- from gperf.texi on 26 September 2000 -->
+ from gperf.texi on 7 May 2003 -->
<TITLE>Perfect Hash Function Generator - 1 Introduction</TITLE>
</HEAD>
<BODY>
-Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_2.html">previous</A>, <A HREF="gperf_4.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
+Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_2.html">previous</A>, <A HREF="gperf_4.html">next</A>, <A HREF="gperf_10.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
<P><HR><P>
@@ -16,8 +16,8 @@ Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_2.html">previous</A>,
<CODE>gperf</CODE> is a perfect hash function generator written in C++. It
transforms an <VAR>n</VAR> element user-specified keyword set <VAR>W</VAR> into a
perfect hash function <VAR>F</VAR>. <VAR>F</VAR> uniquely maps keywords in
-<VAR>W</VAR> onto the range 0..<VAR>k</VAR>, where <VAR>k</VAR> &#62;= <VAR>n</VAR>. If <VAR>k</VAR>
-= <VAR>n</VAR> then <VAR>F</VAR> is a <EM>minimal</EM> perfect hash function.
+<VAR>W</VAR> onto the range 0..<VAR>k</VAR>, where <VAR>k</VAR> &#62;= <VAR>n-1</VAR>. If <VAR>k</VAR>
+= <VAR>n-1</VAR> then <VAR>F</VAR> is a <EM>minimal</EM> perfect hash function.
<CODE>gperf</CODE> generates a 0..<VAR>k</VAR> element static lookup table and a
pair of C functions. These functions determine whether a given
character string <VAR>s</VAR> occurs in <VAR>W</VAR>, using at most one probe into
@@ -27,14 +27,15 @@ the lookup table.
<P>
<CODE>gperf</CODE> currently generates the reserved keyword recognizer for
lexical analyzers in several production and research compilers and
-language processing tools, including GNU C, GNU C++, GNU Pascal, GNU
-Modula 3, and GNU indent. Complete C++ source code for <CODE>gperf</CODE> is
-available via anonymous ftp from <CODE>ftp://ftp.gnu.org/pub/gnu/gperf/</CODE>.
+language processing tools, including GNU C, GNU C++, GNU Java, GNU Pascal,
+GNU Modula 3, and GNU indent. Complete C++ source code for <CODE>gperf</CODE> is
+available from <CODE>http://ftp.gnu.org/pub/gnu/gperf/</CODE>.
A paper describing <CODE>gperf</CODE>'s design and implementation in greater
-detail is available in the Second USENIX C++ Conference proceedings.
+detail is available in the Second USENIX C++ Conference proceedings
+or from <CODE>http://www.cs.wustl.edu/~schmidt/resume.html</CODE>.
</P>
<P><HR><P>
-Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_2.html">previous</A>, <A HREF="gperf_4.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
+Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_2.html">previous</A>, <A HREF="gperf_4.html">next</A>, <A HREF="gperf_10.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
</BODY>
</HTML>
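The corrected introduction states that F maps the n keywords onto 0..k with k >= n-1, and that F is minimal exactly when k = n-1. The hypothetical C helper below (not part of gperf or its output) checks that condition for a list of hash values: any collision means F is not perfect, and n distinct values whose maximum is n-1 must be exactly 0..n-1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum hash_kind { NOT_PERFECT, PERFECT, MINIMAL_PERFECT };

/* Hypothetical checker. hashes[i] is the value F assigns to the i-th of
   n distinct keywords, so the largest value is k. */
static enum hash_kind classify_hash(const unsigned int *hashes, size_t n)
{
    unsigned int k = 0;
    bool *seen;
    size_t i;

    assert(hashes != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (hashes[i] > k)
            k = hashes[i];

    seen = calloc((size_t)k + 1, sizeof *seen);
    if (seen == NULL)
        abort();

    for (i = 0; i < n; i++) {
        if (seen[hashes[i]]) {   /* two keywords share a slot */
            free(seen);
            return NOT_PERFECT;
        }
        seen[hashes[i]] = true;
    }
    free(seen);

    /* n distinct values in 0..n-1 cover the range exactly: k = n-1. */
    return (n > 0 && k == n - 1) ? MINIMAL_PERFECT : PERFECT;
}
```

For instance, the values {2, 0, 3, 1} for four keywords are minimal perfect, whereas {0, 5, 2, 7} are perfect but leave gaps (k = 7).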
diff --git a/doc/gperf_4.html b/doc/gperf_4.html
index 1658d3b..cdd063d 100644
--- a/doc/gperf_4.html
+++ b/doc/gperf_4.html
@@ -1,12 +1,12 @@
<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.51
- from gperf.texi on 26 September 2000 -->
+ from gperf.texi on 7 May 2003 -->
<TITLE>Perfect Hash Function Generator - 2 Static search structures and GNU gperf</TITLE>
</HEAD>
<BODY>
-Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_3.html">previous</A>, <A HREF="gperf_5.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
+Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_3.html">previous</A>, <A HREF="gperf_5.html">next</A>, <A HREF="gperf_10.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
<P><HR><P>
@@ -19,7 +19,7 @@ Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_3.html">previous</A>,
A <STRONG>static search structure</STRONG> is an Abstract Data Type with certain
fundamental operations, e.g., <EM>initialize</EM>, <EM>insert</EM>,
and <EM>retrieve</EM>. Conceptually, all insertions occur before any
-retrievals. In practice, <CODE>gperf</CODE> generates a <CODE>static</CODE> array
+retrievals. In practice, <CODE>gperf</CODE> generates a <EM>static</EM> array
containing search set keywords and any associated attributes specified
by the user. Thus, there is essentially no execution-time cost for the
insertions. It is a useful data structure for representing <EM>static
@@ -86,13 +86,13 @@ the drudgery associated with constructing time- and space-efficient
search structures by hand. It has proven a useful and practical tool
for serious programming projects. Output from <CODE>gperf</CODE> is currently
used in several production and research compilers, including GNU C, GNU
-C++, GNU Pascal, and GNU Modula 3. The latter two compilers are not yet
-part of the official GNU distribution. Each compiler utilizes
+C++, GNU Java, GNU Pascal, and GNU Modula 3. The latter two compilers are
+not yet part of the official GNU distribution. Each compiler utilizes
<CODE>gperf</CODE> to automatically generate static search structures that
efficiently identify their respective reserved keywords.
</P>
<P><HR><P>
-Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_3.html">previous</A>, <A HREF="gperf_5.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
+Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_3.html">previous</A>, <A HREF="gperf_5.html">next</A>, <A HREF="gperf_10.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
</BODY>
</HTML>
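The static search structure described in this section can be pictured as a table that the compiler fills in, consulted with a single probe per lookup. The sketch below hand-writes such a table for a hypothetical four-keyword set. Its hash (keyword length minus 2) is deliberately trivial and works only because the lengths happen to be distinct; real gperf output derives the hash from values assigned to selected bytes of each keyword. The names `keyword_entry`, `toy_hash` and `lookup_keyword` are illustrative, not gperf's generated identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes stored as each keyword's attribute. */
enum token { TOK_IF = 1, TOK_FOR, TOK_ELSE, TOK_WHILE };

struct keyword_entry {
    const char *name;   /* keyword text; first field, as with `-t' */
    enum token token;   /* user-supplied attribute */
};

#define TOY_MIN_LEN 2
#define TOY_MAX_LEN 5

/* Static table indexed by hash value. It is built by the compiler, so
   no insertions happen at run time. */
static const struct keyword_entry wordlist[] = {
    { "if",    TOK_IF    },   /* hash 0 */
    { "for",   TOK_FOR   },   /* hash 1 */
    { "else",  TOK_ELSE  },   /* hash 2 */
    { "while", TOK_WHILE },   /* hash 3 */
};

/* Toy perfect hash: valid only because the four keywords have distinct
   lengths 2..5. */
static unsigned int toy_hash(size_t len)
{
    return (unsigned int)(len - TOY_MIN_LEN);
}

/* Returns the entry for STR (of length LEN, not necessarily
   NUL-terminated) or NULL. One table probe, one comparison. */
static const struct keyword_entry *lookup_keyword(const char *str,
                                                  size_t len)
{
    const struct keyword_entry *entry;

    assert(str != NULL);
    if (len < TOY_MIN_LEN || len > TOY_MAX_LEN)
        return NULL;
    entry = &wordlist[toy_hash(len)];
    /* The slot's keyword has exactly LEN bytes, so memcmp suffices. */
    return memcmp(str, entry->name, len) == 0 ? entry : NULL;
}
```

Because the table has exactly one slot per keyword (hash values 0..3), this toy function is also minimal in the sense of the introduction.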
diff --git a/doc/gperf_5.html b/doc/gperf_5.html
index 010ad4e..af6dbe9 100644
--- a/doc/gperf_5.html
+++ b/doc/gperf_5.html
@@ -1,12 +1,12 @@
<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.51
- from gperf.texi on 26 September 2000 -->
+ from gperf.texi on 7 May 2003 -->
<TITLE>Perfect Hash Function Generator - 3 High-Level Description of GNU gperf</TITLE>
</HEAD>
<BODY>
-Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_4.html">previous</A>, <A HREF="gperf_6.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
+Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_4.html">previous</A>, <A HREF="gperf_6.html">next</A>, <A HREF="gperf_10.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
<P><HR><P>
@@ -14,7 +14,7 @@ Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_4.html">previous</A>,
<P>
The perfect hash function generator <CODE>gperf</CODE> reads a set of
-"keywords" from a <STRONG>keyfile</STRONG> (or from the standard input by
+"keywords" from an input file (or from the standard input by
default). It attempts to derive a perfect hashing function that
recognizes a member of the <STRONG>static keyword set</STRONG> with at most a
single probe into the lookup table. If <CODE>gperf</CODE> succeeds in
@@ -37,7 +37,7 @@ somewhat. Actual results depend on your C compiler, of course.
</P>
<P>
-In general, <CODE>gperf</CODE> assigns values to the characters it is using
+In general, <CODE>gperf</CODE> assigns values to the bytes it is using
for hashing until some set of values gives each keyword a unique value.
A helpful heuristic is that the larger the hash value range, the easier
it is for <CODE>gperf</CODE> to find and generate a perfect hash function.
@@ -52,7 +52,7 @@ Experimentation is the key to getting the most from <CODE>gperf</CODE>.
<A NAME="IDX5"></A>
<A NAME="IDX6"></A>
<A NAME="IDX7"></A>
-You can control the input keyfile format by varying certain command-line
+You can control the input file format by varying certain command-line
arguments, in particular the <SAMP>`-t'</SAMP> option. The input's appearance
is similar to GNU utilities <CODE>flex</CODE> and <CODE>bison</CODE> (or UNIX
utilities <CODE>lex</CODE> and <CODE>yacc</CODE>). Here's an outline of the general
@@ -69,25 +69,53 @@ functions
</PRE>
<P>
-<EM>Unlike</EM> <CODE>flex</CODE> or <CODE>bison</CODE>, all sections of
-<CODE>gperf</CODE>'s input are optional. The following sections describe the
+<EM>Unlike</EM> <CODE>flex</CODE> or <CODE>bison</CODE>, the declarations section and
+the functions section are optional. The following sections describe the
input format for each section.
</P>
+<P>
+It is possible to omit the declaration section entirely, if the <SAMP>`-t'</SAMP>
+option is not given. In this case the input file begins directly with the
+first keyword line, e.g.:
+</P>
-<H3><A NAME="SEC9" HREF="gperf_toc.html#TOC9">3.1.1 <CODE>struct</CODE> Declarations and C Code Inclusion</A></H3>
+<PRE>
+january
+february
+march
+april
+...
+</PRE>
+
+
+
+<H3><A NAME="SEC9" HREF="gperf_toc.html#TOC9">3.1.1 Declarations</A></H3>
<P>
The keyword input file optionally contains a section for including
-arbitrary C declarations and definitions, as well as provisions for
-providing a user-supplied <CODE>struct</CODE>. If the <SAMP>`-t'</SAMP> option
+arbitrary C declarations and definitions, <CODE>gperf</CODE> declarations that
+act like command-line options, as well as for providing a user-supplied
+<CODE>struct</CODE>.
+
+</P>
+
+
+
+<H4><A NAME="SEC10" HREF="gperf_toc.html#TOC10">3.1.1.1 User-supplied <CODE>struct</CODE></A></H4>
+
+<P>
+If the <SAMP>`-t'</SAMP> option (or, equivalently, the <SAMP>`%struct-type'</SAMP> declaration)
<EM>is</EM> enabled, you <EM>must</EM> provide a C <CODE>struct</CODE> as the last
-component in the declaration section from the keyfile file. The first
-field in this struct must be a <CODE>char *</CODE> or <CODE>const char *</CODE>
-identifier called <SAMP>`name'</SAMP>, although it is possible to modify this
-field's name with the <SAMP>`-K'</SAMP> option described below.
+component in the declaration section of the input file. The first
+field in this struct must be of type <CODE>char *</CODE> or <CODE>const char *</CODE>
+if the <SAMP>`-P'</SAMP> option is not given, or of type <CODE>int</CODE> if the option
+<SAMP>`-P'</SAMP> (or, equivalently, the <SAMP>`%pic'</SAMP> declaration) is enabled.
+This first field must be called <SAMP>`name'</SAMP>, although it is possible to modify
+its name with the <SAMP>`-K'</SAMP> option (or, equivalently, the
+<SAMP>`%define slot-name'</SAMP> declaration) described below.
</P>
<P>
@@ -121,9 +149,260 @@ appearing left justified in the first column, as in the UNIX utility
<CODE>lex</CODE>.
</P>
+
+
+<H4><A NAME="SEC11" HREF="gperf_toc.html#TOC11">3.1.1.2 Gperf Declarations</A></H4>
+
+<P>
+The declaration section can contain <CODE>gperf</CODE> declarations. They
+influence the way <CODE>gperf</CODE> works, like command line options do.
+In fact, every such declaration is equivalent to a command line option.
+There are three forms of declarations:
+
+</P>
+
+<OL>
+<LI>
+
+Declarations without argument, like <SAMP>`%compare-lengths'</SAMP>.
+
+<LI>
+
+Declarations with an argument, like <SAMP>`%switch=<VAR>count</VAR>'</SAMP>.
+
+<LI>
+
+Declarations of names of entities in the output file, like
+<SAMP>`%define lookup-function-name <VAR>name</VAR>'</SAMP>.
+</OL>
+
+<P>
+When a declaration is given both in the input file and as a command line
+option, the command-line option's value prevails.
+
+</P>
<P>
+The following <CODE>gperf</CODE> declarations are available.
+
+</P>
+<DL COMPACT>
+
+<DT><SAMP>`%delimiters=<VAR>delimiter-list</VAR>'</SAMP>
+<DD>
<A NAME="IDX9"></A>
+Allows you to provide a string containing delimiters used to
+separate keywords from their attributes. The default is ",". This
+option is essential if you want to use keywords that have embedded
+commas or newlines.
+
+<DT><SAMP>`%struct-type'</SAMP>
+<DD>
<A NAME="IDX10"></A>
+Allows you to include a <CODE>struct</CODE> type declaration for generated
+code; see above for an example.
+
+<DT><SAMP>`%ignore-case'</SAMP>
+<DD>
+<A NAME="IDX11"></A>
+Consider upper and lower case ASCII characters as equivalent. The string
+comparison will use a case-insensitive character comparison. Note that
+locale-dependent case mappings are ignored.
+
+<DT><SAMP>`%language=<VAR>language-name</VAR>'</SAMP>
+<DD>
+<A NAME="IDX12"></A>
+Instructs <CODE>gperf</CODE> to generate code in the language specified by the
+option's argument. Languages handled are currently:
+
+<DL COMPACT>
+
+<DT><SAMP>`KR-C'</SAMP>
+<DD>
+Old-style K&#38;R C. This language is understood by old-style C compilers and
+ANSI C compilers, but ANSI C compilers may flag warnings (or even errors)
+because the generated code does not use <SAMP>`const'</SAMP>.
+
+<DT><SAMP>`C'</SAMP>
+<DD>
+Common C. This language is understood by ANSI C compilers, and also by
+old-style C compilers, provided that you <CODE>#define const</CODE> to empty
+for compilers which don't know about this keyword.
+
+<DT><SAMP>`ANSI-C'</SAMP>
+<DD>
+ANSI C. This language is understood by ANSI C compilers and C++ compilers.
+
+<DT><SAMP>`C++'</SAMP>
+<DD>
+C++. This language is understood by C++ compilers.
+</DL>
+
+The default is C.
+
+<DT><SAMP>`%define slot-name <VAR>name</VAR>'</SAMP>
+<DD>
+<A NAME="IDX13"></A>
+This declaration is only useful when option <SAMP>`-t'</SAMP> (or, equivalently, the
+<SAMP>`%struct-type'</SAMP> declaration) has been given.
+By default, the program assumes the structure component identifier for
+the keyword is <SAMP>`name'</SAMP>. This declaration allows an arbitrary choice of
+identifier for this component, although it still must occur as the first
+field in your supplied <CODE>struct</CODE>.
+
+<DT><SAMP>`%define initializer-suffix <VAR>initializers</VAR>'</SAMP>
+<DD>
+<A NAME="IDX14"></A>
+This declaration is only useful when option <SAMP>`-t'</SAMP> (or, equivalently, the
+<SAMP>`%struct-type'</SAMP> declaration) has been given.
+It lets you specify initializers for the structure members following
+<VAR>slot-name</VAR> in empty hash table entries. The list of initializers
+should start with a comma. By default, the emitted code will
+zero-initialize structure members following <VAR>slot-name</VAR>.
+
+<DT><SAMP>`%define hash-function-name <VAR>name</VAR>'</SAMP>
+<DD>
+<A NAME="IDX15"></A>
+Allows you to specify the name for the generated hash function. Default
+name is <SAMP>`hash'</SAMP>. This declaration permits the use of two hash tables in
+the same file.
+
+<DT><SAMP>`%define lookup-function-name <VAR>name</VAR>'</SAMP>
+<DD>
+<A NAME="IDX16"></A>
+Allows you to specify the name for the generated lookup function.
+Default name is <SAMP>`in_word_set'</SAMP>. This declaration permits multiple
+generated hash functions to be used in the same application.
+
+<DT><SAMP>`%define class-name <VAR>name</VAR>'</SAMP>
+<DD>
+<A NAME="IDX17"></A>
+This declaration is only useful when option <SAMP>`-L C++'</SAMP> (or, equivalently,
+the <SAMP>`%language=C++'</SAMP> declaration) has been given. It
+allows you to specify the name of the generated C++ class. The default name is
+<CODE>Perfect_Hash</CODE>.
+
+<DT><SAMP>`%7bit'</SAMP>
+<DD>
+<A NAME="IDX18"></A>
+This declaration specifies that all strings that will be passed as arguments
+to the generated hash function and the generated lookup function will
+consist solely of 7-bit ASCII characters (bytes in the range 0..127).
+(Note that the ANSI C functions <CODE>isalnum</CODE> and <CODE>isgraph</CODE> do
+<EM>not</EM> guarantee that a byte is in this range. Only an explicit
+test like <SAMP>`c &#62;= 'A' &#38;&#38; c &#60;= 'Z''</SAMP> guarantees this.)
+
+<DT><SAMP>`%compare-lengths'</SAMP>
+<DD>
+<A NAME="IDX19"></A>
+Compare keyword lengths before trying a string comparison. This declaration
+is mandatory for binary comparisons (see section <A HREF="gperf_5.html#SEC17">3.3 Use of NUL bytes</A>). It also might
+cut down on the number of string comparisons made during the lookup, since
+keywords with different lengths are never compared via <CODE>strcmp</CODE>.
+However, using <SAMP>`%compare-lengths'</SAMP> might greatly increase the size of the
+generated C code if the lookup table range is large (which implies that
+the switch option <SAMP>`-S'</SAMP> or <SAMP>`%switch'</SAMP> is not enabled), since the length
+table contains as many elements as there are entries in the lookup table.
+
+<DT><SAMP>`%compare-strncmp'</SAMP>
+<DD>
+<A NAME="IDX20"></A>
+Generates C code that uses the <CODE>strncmp</CODE> function to perform
+string comparisons. The default action is to use <CODE>strcmp</CODE>.
+
+<DT><SAMP>`%readonly-tables'</SAMP>
+<DD>
+<A NAME="IDX21"></A>
+Makes the contents of all generated lookup tables constant, i.e.,
+"readonly". Many compilers can generate more efficient code for this
+by putting the tables in readonly memory.
+
+<DT><SAMP>`%enum'</SAMP>
+<DD>
+<A NAME="IDX22"></A>
+Define constant values using an enum local to the lookup function rather
+than with #defines. This also means that different lookup functions can
+reside in the same file. Thanks to James Clark <CODE>&#60;jjc@ai.mit.edu&#62;</CODE>.
+
+<DT><SAMP>`%includes'</SAMP>
+<DD>
+<A NAME="IDX23"></A>
+Include the necessary system include file, <CODE>&#60;string.h&#62;</CODE>, at the
+beginning of the code. By default, this is not done; you must
+include this header file yourself for the code to compile.
+
+<DT><SAMP>`%global-table'</SAMP>
+<DD>
+<A NAME="IDX24"></A>
+Generate the static table of keywords as a static global variable,
+rather than hiding it inside of the lookup function (which is the
+default behavior).
+
+<DT><SAMP>`%pic'</SAMP>
+<DD>
+<A NAME="IDX25"></A>
+Optimize the generated table for inclusion in shared libraries. This
+reduces the startup time of programs using a shared library containing
+the generated code. If the <SAMP>`%struct-type'</SAMP> declaration (or,
+equivalently, the option <SAMP>`-t'</SAMP>) is also given, the first field of the
+user-defined struct must be of type <SAMP>`int'</SAMP>, not <SAMP>`char *'</SAMP>, because
+it will contain offsets into the string pool instead of actual strings.
+To convert such an offset to a string, you can use the expression
+<SAMP>`stringpool + <VAR>o</VAR>'</SAMP>, where <VAR>o</VAR> is the offset. The string pool
+name can be changed through the <SAMP>`%define string-pool-name'</SAMP> declaration.
+
+<DT><SAMP>`%define string-pool-name <VAR>name</VAR>'</SAMP>
+<DD>
+<A NAME="IDX26"></A>
+Allows you to specify the name of the generated string pool created by
+the declaration <SAMP>`%pic'</SAMP> (or, equivalently, the option <SAMP>`-P'</SAMP>).
+The default name is <SAMP>`stringpool'</SAMP>. This declaration permits the use of
+two hash tables in the same file, with <SAMP>`%pic'</SAMP> and even when the
+<SAMP>`%global-table'</SAMP> declaration (or, equivalently, the option <SAMP>`-G'</SAMP>)
+is given.
+
+<DT><SAMP>`%null-strings'</SAMP>
+<DD>
+<A NAME="IDX27"></A>
+Use NULL strings instead of empty strings for empty keyword table entries.
+This reduces the startup time of programs using a shared library containing
+the generated code (but not as much as the declaration <SAMP>`%pic'</SAMP>), at the
+expense of one more test-and-branch instruction at run time.
+
+<DT><SAMP>`%define word-array-name <VAR>name</VAR>'</SAMP>
+<DD>
+<A NAME="IDX28"></A>
+Allows you to specify the name for the generated array containing the
+hash table. Default name is <SAMP>`wordlist'</SAMP>. This declaration permits the
+use of two hash tables in the same file, even when the option <SAMP>`-G'</SAMP>
+(or, equivalently, the <SAMP>`%global-table'</SAMP> declaration) is given.
+
+<DT><SAMP>`%switch=<VAR>count</VAR>'</SAMP>
+<DD>
+<A NAME="IDX29"></A>
+Causes the generated C code to use a <CODE>switch</CODE> statement scheme,
+rather than an array lookup table. This can lead to a reduction in both
+time and space requirements for some input files. The argument to this
+declaration determines how many <CODE>switch</CODE> statements are generated. A
+value of 1 generates 1 <CODE>switch</CODE> containing all the elements; a
+value of 2 generates 2 <CODE>switch</CODE> statements with half the elements in each;
+and so on. This is useful because many C compilers cannot
+correctly generate code for large <CODE>switch</CODE> statements. This declaration
+was inspired in part by Keith Bostic's original C program.
+
+<DT><SAMP>`%omit-struct-type'</SAMP>
+<DD>
+<A NAME="IDX30"></A>
+Prevents the transfer of the type declaration to the output file. Use
+this declaration if the type is already defined elsewhere.
+</DL>
+
+
+
+<H4><A NAME="SEC12" HREF="gperf_toc.html#TOC12">3.1.1.3 C Code Inclusion</A></H4>
+
+<P>
+<A NAME="IDX31"></A>
+<A NAME="IDX32"></A>
Using a syntax similar to GNU utilities <CODE>flex</CODE> and <CODE>bison</CODE>, it
is possible to directly include C source text and comments verbatim into
the generated output file. This is accomplished by enclosing the region
@@ -147,37 +426,25 @@ march, 3, 31, 31
...
</PRE>
-<P>
-It is possible to omit the declaration section entirely. In this case
-the keyfile begins directly with the first keyword line, e.g.:
-
-</P>
-
-<PRE>
-january, 1, 31, 31
-february, 2, 28, 29
-march, 3, 31, 31
-april, 4, 30, 30
-...
-</PRE>
-
-<H3><A NAME="SEC10" HREF="gperf_toc.html#TOC10">3.1.2 Format for Keyword Entries</A></H3>
+<H3><A NAME="SEC13" HREF="gperf_toc.html#TOC13">3.1.2 Format for Keyword Entries</A></H3>
<P>
-The second keyfile format section contains lines of keywords and any
+The second input file format section contains lines of keywords and any
associated attributes you might supply. A line beginning with <SAMP>`#'</SAMP>
in the first column is considered a comment. Everything following the
-<SAMP>`#'</SAMP> is ignored, up to and including the following newline.
+<SAMP>`#'</SAMP> is ignored, up to and including the following newline. A line
+beginning with <SAMP>`%'</SAMP> in the first column is an option declaration and
+must not occur within the keywords section.
</P>
<P>
-The first field of each non-comment line is always the key itself. It
+The first field of each non-comment line is always the keyword itself. It
can be given in two ways: as a simple name, i.e., without surrounding
string quotation marks, or as a string enclosed in double-quotes, in
C syntax, possibly with backslash escapes like <CODE>\"</CODE> or <CODE>\234</CODE>
-or <CODE>\xa8</CODE>. In either case, it must start right at the beginning
+or <CODE>\xa8</CODE>. In either case, it must start right at the beginning
of the line, without leading whitespace.
In this context, a "field" is considered to extend up to, but
not include, the first blank, comma, or newline. Here is a simple
@@ -209,14 +476,15 @@ Additional fields may optionally follow the leading keyword. Fields
should be separated by commas, and terminate at the end of line. What
these fields mean is entirely up to you; they are used to initialize the
elements of the user-defined <CODE>struct</CODE> provided by you in the
-declaration section. If the <SAMP>`-t'</SAMP> option is <EM>not</EM> enabled
+declaration section. If the <SAMP>`-t'</SAMP> option (or, equivalently, the
+<SAMP>`%struct-type'</SAMP> declaration) is <EM>not</EM> enabled
these fields are simply ignored. All previous examples except the last
one contain keyword attributes.
</P>
-<H3><A NAME="SEC11" HREF="gperf_toc.html#TOC11">3.1.3 Including Additional C Functions</A></H3>
+<H3><A NAME="SEC14" HREF="gperf_toc.html#TOC14">3.1.3 Including Additional C Functions</A></H3>
<P>
The optional third section also corresponds closely with conventions
@@ -229,9 +497,57 @@ section is valid C.
</P>
-<H2><A NAME="SEC12" HREF="gperf_toc.html#TOC12">3.2 Output Format for Generated C Code with <CODE>gperf</CODE></A></H2>
+<H3><A NAME="SEC15" HREF="gperf_toc.html#TOC15">3.1.4 Where to place directives for GNU <CODE>indent</CODE>.</A></H3>
+
<P>
-<A NAME="IDX11"></A>
+If you want to invoke GNU <CODE>indent</CODE> on a <CODE>gperf</CODE> input file,
+you will see that GNU <CODE>indent</CODE> doesn't understand the <SAMP>`%%'</SAMP>,
+<SAMP>`%{'</SAMP> and <SAMP>`%}'</SAMP> directives that control <CODE>gperf</CODE>'s
+interpretation of the input file. Therefore you have to insert some
+directives for GNU <CODE>indent</CODE>. More precisely, assuming the most
+general input file structure
+
+</P>
+
+<PRE>
+declarations part 1
+%{
+verbatim code
+%}
+declarations part 2
+%%
+keywords
+%%
+functions
+</PRE>
+
+<P>
+you would insert <SAMP>`*INDENT-OFF*'</SAMP> and <SAMP>`*INDENT-ON*'</SAMP> comments
+as follows:
+
+</P>
+
+<PRE>
+/* *INDENT-OFF* */
+declarations part 1
+%{
+/* *INDENT-ON* */
+verbatim code
+/* *INDENT-OFF* */
+%}
+declarations part 2
+%%
+keywords
+%%
+/* *INDENT-ON* */
+functions
+</PRE>
+
+
+
+<H2><A NAME="SEC16" HREF="gperf_toc.html#TOC16">3.2 Output Format for Generated C Code with <CODE>gperf</CODE></A></H2>
+<P>
+<A NAME="IDX33"></A>
</P>
<P>
@@ -246,34 +562,36 @@ function prototypes are as follows:
<P>
<DL>
<DT><U>Function:</U> unsigned int <B>hash</B> <I>(const char * <VAR>str</VAR>, unsigned int <VAR>len</VAR>)</I>
-<DD><A NAME="IDX12"></A>
+<DD><A NAME="IDX34"></A>
By default, the generated <CODE>hash</CODE> function returns an integer value
-created by adding <VAR>len</VAR> to several user-specified <VAR>str</VAR> key
+created by adding <VAR>len</VAR> to several user-specified <VAR>str</VAR> byte
positions indexed into an <STRONG>associated values</STRONG> table stored in a
local static array. The associated values table is constructed
internally by <CODE>gperf</CODE> and later output as a static local C array
-called <SAMP>`hash_table'</SAMP>; its meaning and properties are described below
-(see section <A HREF="gperf_9.html#SEC22">7 Implementation Details of GNU <CODE>gperf</CODE></A>). The relevant key positions are specified via
-the <SAMP>`-k'</SAMP> option when running <CODE>gperf</CODE>, as detailed in the
-<EM>Options</EM> section below(see section <A HREF="gperf_6.html#SEC14">4 Invoking <CODE>gperf</CODE></A>).
+called <SAMP>`hash_table'</SAMP>. The relevant selected positions (i.e., indices
+into <VAR>str</VAR>) are specified via the <SAMP>`-k'</SAMP> option when running
+<CODE>gperf</CODE>, as detailed in the <EM>Options</EM> section below (see section <A HREF="gperf_6.html#SEC18">4 Invoking <CODE>gperf</CODE></A>).
</DL>
</P>
<P>
<DL>
<DT><U>Function:</U> <B>in_word_set</B> <I>(const char * <VAR>str</VAR>, unsigned int <VAR>len</VAR>)</I>
-<DD><A NAME="IDX13"></A>
+<DD><A NAME="IDX35"></A>
If <VAR>str</VAR> is in the keyword set, returns a pointer to that
-keyword. More exactly, if the option <SAMP>`-t'</SAMP> was given, it returns
-a pointer to the matching keyword's structure. Otherwise it returns
+keyword. More exactly, if the option <SAMP>`-t'</SAMP> (or, equivalently, the
+<SAMP>`%struct-type'</SAMP> declaration) was given, it returns
+a pointer to the matching keyword's structure. Otherwise it returns
<CODE>NULL</CODE>.
</DL>
</P>
<P>
-If the option <SAMP>`-c'</SAMP> is not used, <VAR>str</VAR> must be a NUL terminated
-string of exactly length <VAR>len</VAR>. If <SAMP>`-c'</SAMP> is used, <VAR>str</VAR> must
-simply be an array of <VAR>len</VAR> characters and does not need to be NUL
+If the option <SAMP>`-c'</SAMP> (or, equivalently, the <SAMP>`%compare-strncmp'</SAMP>
+declaration) is not used, <VAR>str</VAR> must be a NUL terminated
+string of exactly length <VAR>len</VAR>. If <SAMP>`-c'</SAMP> (or, equivalently, the
+<SAMP>`%compare-strncmp'</SAMP> declaration) is used, <VAR>str</VAR> must
+simply be an array of <VAR>len</VAR> bytes and does not need to be NUL
terminated.
</P>
@@ -294,7 +612,7 @@ Make use of the user-defined <CODE>struct</CODE>.
<DD>
<DT><SAMP>`--switch=<VAR>total-switch-statements</VAR>'</SAMP>
<DD>
-<A NAME="IDX14"></A>
+<A NAME="IDX36"></A>
Generate 1 or more C <CODE>switch</CODE> statement rather than use a large,
(and potentially sparse) static array. Although the exact time and
space savings of this approach vary according to your C compiler's
@@ -303,9 +621,11 @@ code.
</DL>
<P>
-If the <SAMP>`-t'</SAMP> and <SAMP>`-S'</SAMP> options are omitted, the default action
-is to generate a <CODE>char *</CODE> array containing the keys, together with
-additional null strings used for padding the array. By experimenting
+If the <SAMP>`-t'</SAMP> and <SAMP>`-S'</SAMP> options (or, equivalently, the
+<SAMP>`%struct-type'</SAMP> and <SAMP>`%switch'</SAMP> declarations) are omitted,
+the default action is to generate a <CODE>char *</CODE> array containing
+the keywords, together with additional empty strings used for padding
+the array. By experimenting
with the various input and output options, and timing the resulting C
code, you can determine the best option choices for different keyword
set characteristics.
@@ -313,36 +633,39 @@ set characteristics.
</P>
-<H2><A NAME="SEC13" HREF="gperf_toc.html#TOC13">3.3 Use of NUL characters</A></H2>
+<H2><A NAME="SEC17" HREF="gperf_toc.html#TOC17">3.3 Use of NUL bytes</A></H2>
<P>
-<A NAME="IDX15"></A>
+<A NAME="IDX37"></A>
</P>
<P>
By default, the code generated by <CODE>gperf</CODE> operates on zero
-terminated strings, the usual representation of strings in C. This means
-that the keywords in the input file must not contain NUL characters,
+terminated strings, the usual representation of strings in C. This means
+that the keywords in the input file must not contain NUL bytes,
and the <VAR>str</VAR> argument passed to <CODE>hash</CODE> or <CODE>in_word_set</CODE>
must be NUL terminated and have exactly length <VAR>len</VAR>.
</P>
<P>
-If option <SAMP>`-c'</SAMP> is used, then the <VAR>str</VAR> argument does not need
-to be NUL terminated. The code generated by <CODE>gperf</CODE> will only
+If option <SAMP>`-c'</SAMP> (or, equivalently, the <SAMP>`%compare-strncmp'</SAMP>
+declaration) is used, then the <VAR>str</VAR> argument does not need
+to be NUL terminated. The code generated by <CODE>gperf</CODE> will only
access the first <VAR>len</VAR>, not <VAR>len+1</VAR>, bytes starting at <VAR>str</VAR>.
However, the keywords in the input file still must not contain NUL
-characters.
+bytes.
</P>
<P>
-If option <SAMP>`-l'</SAMP> is used, then the hash table performs binary
-comparison. The keywords in the input file may contain NUL characters,
+If option <SAMP>`-l'</SAMP> (or, equivalently, the <SAMP>`%compare-lengths'</SAMP>
+declaration) is used, then the hash table performs binary
+comparison. The keywords in the input file may contain NUL bytes,
written in string syntax as <CODE>\000</CODE> or <CODE>\x00</CODE>, and the code
-generated by <CODE>gperf</CODE> will treat NUL like any other character.
-Also, in this case the <SAMP>`-c'</SAMP> option is ignored.
+generated by <CODE>gperf</CODE> will treat NUL like any other byte.
+Also, in this case the <SAMP>`-c'</SAMP> option (or, equivalently, the
+<SAMP>`%compare-strncmp'</SAMP> declaration) is ignored.
</P>
<P><HR><P>
-Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_4.html">previous</A>, <A HREF="gperf_6.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
+Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_4.html">previous</A>, <A HREF="gperf_6.html">next</A>, <A HREF="gperf_10.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
</BODY>
</HTML>
diff --git a/doc/gperf_6.html b/doc/gperf_6.html
index a9cbacc..599910d 100644
--- a/doc/gperf_6.html
+++ b/doc/gperf_6.html
@@ -1,38 +1,59 @@
<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.51
- from gperf.texi on 26 September 2000 -->
+ from gperf.texi on 7 May 2003 -->
<TITLE>Perfect Hash Function Generator - 4 Invoking gperf</TITLE>
</HEAD>
<BODY>
-Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_5.html">previous</A>, <A HREF="gperf_7.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
+Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_5.html">previous</A>, <A HREF="gperf_7.html">next</A>, <A HREF="gperf_10.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
<P><HR><P>
-<H1><A NAME="SEC14" HREF="gperf_toc.html#TOC14">4 Invoking <CODE>gperf</CODE></A></H1>
+<H1><A NAME="SEC18" HREF="gperf_toc.html#TOC18">4 Invoking <CODE>gperf</CODE></A></H1>
<P>
There are <EM>many</EM> options to <CODE>gperf</CODE>. They were added to make
the program more convenient for use with real applications. "On-line"
-help is readily available via the <SAMP>`-h'</SAMP> option. Here is the
+help is readily available via the <SAMP>`--help'</SAMP> option. Here is the
complete list of options.
</P>
-<H2><A NAME="SEC15" HREF="gperf_toc.html#TOC15">4.1 Options that affect Interpretation of the Input File</A></H2>
+<H2><A NAME="SEC19" HREF="gperf_toc.html#TOC19">4.1 Specifying the Location of the Output File</A></H2>
<DL COMPACT>
+<DT><SAMP>`--output-file=<VAR>file</VAR>'</SAMP>
+<DD>
+Allows you to specify the name of the file to which the output is written.
+</DL>
+
+<P>
+The results are written to standard output if no output file is specified
+or if it is <SAMP>`-'</SAMP>.
+
+</P>
+
+
+<H2><A NAME="SEC20" HREF="gperf_toc.html#TOC20">4.2 Options that affect Interpretation of the Input File</A></H2>
+
+<P>
+These options are also available as declarations in the input file
+(see section <A HREF="gperf_5.html#SEC11">3.1.1.2 Gperf Declarations</A>).
+
+</P>
+<DL COMPACT>
+
<DT><SAMP>`-e <VAR>keyword-delimiter-list</VAR>'</SAMP>
<DD>
<DT><SAMP>`--delimiters=<VAR>keyword-delimiter-list</VAR>'</SAMP>
<DD>
-<A NAME="IDX16"></A>
-Allows the user to provide a string containing delimiters used to
-separate keywords from their attributes. The default is ",\n". This
+<A NAME="IDX38"></A>
+Allows you to provide a string containing delimiters used to
+separate keywords from their attributes. The default is ",". This
option is essential if you want to use keywords that have embedded
commas or newlines. One useful trick is to use -e'TAB', where TAB is
the literal tab character.
@@ -47,12 +68,29 @@ part of the type declaration. Keywords and additional fields may follow
this, one group of fields per line. A set of examples for generating
perfect hash tables and functions for Ada, C, C++, Pascal, Modula 2,
Modula 3 and JavaScript reserved words are distributed with this release.
+
+<DT><SAMP>`--ignore-case'</SAMP>
+<DD>
+Consider upper and lower case ASCII characters as equivalent. The string
+comparison will use a case-insensitive character comparison. Note that
+locale-dependent case mappings are ignored. This option is therefore not
+suitable if a properly internationalized or locale-aware case mapping
+should be used. (For example, in a Turkish locale, the upper case equivalent
+of the lowercase ASCII letter <SAMP>`i'</SAMP> is the non-ASCII character
+<SAMP>`capital i with dot above'</SAMP>.) In this case, it is better to apply
+an uppercase or lowercase conversion to the string before passing it to
+the function generated by <CODE>gperf</CODE>.
</DL>
-<H2><A NAME="SEC16" HREF="gperf_toc.html#TOC16">4.2 Options to specify the Language for the Output Code</A></H2>
+<H2><A NAME="SEC21" HREF="gperf_toc.html#TOC21">4.3 Options to specify the Language for the Output Code</A></H2>
+<P>
+These options are also available as declarations in the input file
+(see section <A HREF="gperf_5.html#SEC11">3.1.1.2 Gperf Declarations</A>).
+
+</P>
<DL COMPACT>
<DT><SAMP>`-L <VAR>generated-language-name</VAR>'</SAMP>
@@ -66,23 +104,23 @@ option's argument. Languages handled are currently:
<DT><SAMP>`KR-C'</SAMP>
<DD>
-Old-style K&#38;R C. This language is understood by old-style C compilers and
+Old-style K&#38;R C. This language is understood by old-style C compilers and
ANSI C compilers, but ANSI C compilers may flag warnings (or even errors)
because of lacking <SAMP>`const'</SAMP>.
<DT><SAMP>`C'</SAMP>
<DD>
-Common C. This language is understood by ANSI C compilers, and also by
+Common C. This language is understood by ANSI C compilers, and also by
old-style C compilers, provided that you <CODE>#define const</CODE> to empty
for compilers which don't know about this keyword.
<DT><SAMP>`ANSI-C'</SAMP>
<DD>
-ANSI C. This language is understood by ANSI C compilers and C++ compilers.
+ANSI C. This language is understood by ANSI C compilers and C++ compilers.
<DT><SAMP>`C++'</SAMP>
<DD>
-C++. This language is understood by C++ compilers.
+C++. This language is understood by C++ compilers.
</DL>
The default is C.
@@ -90,26 +128,32 @@ The default is C.
<DT><SAMP>`-a'</SAMP>
<DD>
This option is supported for compatibility with previous releases of
-<CODE>gperf</CODE>. It does not do anything.
+<CODE>gperf</CODE>. It does not do anything.
<DT><SAMP>`-g'</SAMP>
<DD>
This option is supported for compatibility with previous releases of
-<CODE>gperf</CODE>. It does not do anything.
+<CODE>gperf</CODE>. It does not do anything.
</DL>
-<H2><A NAME="SEC17" HREF="gperf_toc.html#TOC17">4.3 Options for fine tuning Details in the Output Code</A></H2>
+<H2><A NAME="SEC22" HREF="gperf_toc.html#TOC22">4.4 Options for fine tuning Details in the Output Code</A></H2>
+<P>
+Most of these options are also available as declarations in the input file
+(see section <A HREF="gperf_5.html#SEC11">3.1.1.2 Gperf Declarations</A>).
+
+</P>
<DL COMPACT>
-<DT><SAMP>`-K <VAR>key-name</VAR>'</SAMP>
+<DT><SAMP>`-K <VAR>slot-name</VAR>'</SAMP>
<DD>
-<DT><SAMP>`--slot-name=<VAR>key-name</VAR>'</SAMP>
+<DT><SAMP>`--slot-name=<VAR>slot-name</VAR>'</SAMP>
<DD>
-<A NAME="IDX17"></A>
-This option is only useful when option <SAMP>`-t'</SAMP> has been given.
+<A NAME="IDX39"></A>
+This option is only useful when option <SAMP>`-t'</SAMP> (or, equivalently, the
+<SAMP>`%struct-type'</SAMP> declaration) has been given.
By default, the program assumes the structure component identifier for
the keyword is <SAMP>`name'</SAMP>. This option allows an arbitrary choice of
identifier for this component, although it still must occur as the first
@@ -119,16 +163,17 @@ field in your supplied <CODE>struct</CODE>.
<DD>
<DT><SAMP>`--initializer-suffix=<VAR>initializers</VAR>'</SAMP>
<DD>
-<A NAME="IDX18"></A>
-This option is only useful when option <SAMP>`-t'</SAMP> has been given.
+<A NAME="IDX40"></A>
+This option is only useful when option <SAMP>`-t'</SAMP> (or, equivalently, the
+<SAMP>`%struct-type'</SAMP> declaration) has been given.
It permits to specify initializers for the structure members following
-<VAR>key name</VAR> in empty hash table entries. The list of initializers
+<VAR>slot-name</VAR> in empty hash table entries. The list of initializers
should start with a comma. By default, the emitted code will
-zero-initialize structure members following <VAR>key name</VAR>.
+zero-initialize structure members following <VAR>slot-name</VAR>.
<DT><SAMP>`-H <VAR>hash-function-name</VAR>'</SAMP>
<DD>
-<DT><SAMP>`--hash-fn-name=<VAR>hash-function-name</VAR>'</SAMP>
+<DT><SAMP>`--hash-function-name=<VAR>hash-function-name</VAR>'</SAMP>
<DD>
Allows you to specify the name for the generated hash function. Default
name is <SAMP>`hash'</SAMP>. This option permits the use of two hash tables in
@@ -136,19 +181,19 @@ the same file.
<DT><SAMP>`-N <VAR>lookup-function-name</VAR>'</SAMP>
<DD>
-<DT><SAMP>`--lookup-fn-name=<VAR>lookup-function-name</VAR>'</SAMP>
+<DT><SAMP>`--lookup-function-name=<VAR>lookup-function-name</VAR>'</SAMP>
<DD>
Allows you to specify the name for the generated lookup function.
-Default name is <SAMP>`in_word_set'</SAMP>. This option permits completely
-automatic generation of perfect hash functions, especially when multiple
-generated hash functions are used in the same application.
+Default name is <SAMP>`in_word_set'</SAMP>. This option permits multiple
+generated hash functions to be used in the same application.
<DT><SAMP>`-Z <VAR>class-name</VAR>'</SAMP>
<DD>
<DT><SAMP>`--class-name=<VAR>class-name</VAR>'</SAMP>
<DD>
-<A NAME="IDX19"></A>
-This option is only useful when option <SAMP>`-L C++'</SAMP> has been given. It
+<A NAME="IDX41"></A>
+This option is only useful when option <SAMP>`-L C++'</SAMP> (or, equivalently,
+the <SAMP>`%language=C++'</SAMP> declaration) has been given. It
allows you to specify the name of generated C++ class. Default name is
<CODE>Perfect_Hash</CODE>.
@@ -158,12 +203,25 @@ allows you to specify the name of generated C++ class. Default name is
<DD>
This option specifies that all strings that will be passed as arguments
to the generated hash function and the generated lookup function will
-solely consist of 7-bit ASCII characters (characters in the range 0..127).
+solely consist of 7-bit ASCII characters (bytes in the range 0..127).
(Note that the ANSI C functions <CODE>isalnum</CODE> and <CODE>isgraph</CODE> do
-<EM>not</EM> guarantee that a character is in this range. Only an explicit
+<EM>not</EM> guarantee that a byte is in this range. Only an explicit
test like <SAMP>`c &#62;= 'A' &#38;&#38; c &#60;= 'Z''</SAMP> guarantees this.) This was the
default in versions of <CODE>gperf</CODE> earlier than 2.7; now the default is
-to assume 8-bit characters.
+to support 8-bit and multibyte characters.
+
+<DT><SAMP>`-l'</SAMP>
+<DD>
+<DT><SAMP>`--compare-lengths'</SAMP>
+<DD>
+Compare keyword lengths before trying a string comparison. This option
+is mandatory for binary comparisons (see section <A HREF="gperf_5.html#SEC17">3.3 Use of NUL bytes</A>). It also might
+cut down on the number of string comparisons made during the lookup, since
+keywords with different lengths are never compared via <CODE>strcmp</CODE>.
+However, using <SAMP>`-l'</SAMP> might greatly increase the size of the
+generated C code if the lookup table range is large (which implies that
+the switch option <SAMP>`-S'</SAMP> or <SAMP>`%switch'</SAMP> is not enabled), since the length
+table contains as many elements as there are entries in the lookup table.
<DT><SAMP>`-c'</SAMP>
<DD>
@@ -198,35 +256,66 @@ include this header file himself to allow compilation of the code.
<DT><SAMP>`-G'</SAMP>
<DD>
-<DT><SAMP>`--global'</SAMP>
+<DT><SAMP>`--global-table'</SAMP>
<DD>
Generate the static table of keywords as a static global variable,
rather than hiding it inside of the lookup function (which is the
default behavior).
+<DT><SAMP>`-P'</SAMP>
+<DD>
+<DT><SAMP>`--pic'</SAMP>
+<DD>
+Optimize the generated table for inclusion in shared libraries. This
+reduces the startup time of programs using a shared library containing
+the generated code. If the option <SAMP>`-t'</SAMP> (or, equivalently, the
+<SAMP>`%struct-type'</SAMP> declaration) is also given, the first field of the
+user-defined struct must be of type <SAMP>`int'</SAMP>, not <SAMP>`char *'</SAMP>, because
+it will contain offsets into the string pool instead of actual strings.
+To convert such an offset to a string, you can use the expression
+<SAMP>`stringpool + <VAR>o</VAR>'</SAMP>, where <VAR>o</VAR> is the offset. The string pool
+name can be changed through the option <SAMP>`--string-pool-name'</SAMP>.
+
+<DT><SAMP>`-Q <VAR>string-pool-name</VAR>'</SAMP>
+<DD>
+<DT><SAMP>`--string-pool-name=<VAR>string-pool-name</VAR>'</SAMP>
+<DD>
+Allows you to specify the name of the generated string pool created by
+option <SAMP>`-P'</SAMP>. The default name is <SAMP>`stringpool'</SAMP>. This option
+permits the use of two hash tables in the same file, with <SAMP>`-P'</SAMP> and
+even when the option <SAMP>`-G'</SAMP> (or, equivalently, the <SAMP>`%global-table'</SAMP>
+declaration) is given.
+
+<DT><SAMP>`--null-strings'</SAMP>
+<DD>
+Use NULL strings instead of empty strings for empty keyword table entries.
+This reduces the startup time of programs using a shared library containing
+the generated code (but not as much as option <SAMP>`-P'</SAMP>), at the expense
+of one more test-and-branch instruction at run time.
+
<DT><SAMP>`-W <VAR>hash-table-array-name</VAR>'</SAMP>
<DD>
<DT><SAMP>`--word-array-name=<VAR>hash-table-array-name</VAR>'</SAMP>
<DD>
-<A NAME="IDX20"></A>
+<A NAME="IDX42"></A>
Allows you to specify the name for the generated array containing the
hash table. Default name is <SAMP>`wordlist'</SAMP>. This option permits the
use of two hash tables in the same file, even when the option <SAMP>`-G'</SAMP>
-is given.
+(or, equivalently, the <SAMP>`%global-table'</SAMP> declaration) is given.
<DT><SAMP>`-S <VAR>total-switch-statements</VAR>'</SAMP>
<DD>
<DT><SAMP>`--switch=<VAR>total-switch-statements</VAR>'</SAMP>
<DD>
-<A NAME="IDX21"></A>
+<A NAME="IDX43"></A>
Causes the generated C code to use a <CODE>switch</CODE> statement scheme,
rather than an array lookup table. This can lead to a reduction in both
-time and space requirements for some keyfiles. The argument to this
-option determines how many <CODE>switch</CODE> statements are generated. A
+time and space requirements for some input files. The argument to this
+option determines how many <CODE>switch</CODE> statements are generated. A
value of 1 generates 1 <CODE>switch</CODE> containing all the elements, a
value of 2 generates 2 tables with 1/2 the elements in each
<CODE>switch</CODE>, etc. This is useful since many C compilers cannot
-correctly generate code for large <CODE>switch</CODE> statements. This option
+correctly generate code for large <CODE>switch</CODE> statements. This option
was inspired in part by Keith Bostic's original C program.
<DT><SAMP>`-T'</SAMP>
@@ -239,92 +328,66 @@ this option if the type is already defined elsewhere.
<DT><SAMP>`-p'</SAMP>
<DD>
This option is supported for compatibility with previous releases of
-<CODE>gperf</CODE>. It does not do anything.
+<CODE>gperf</CODE>. It does not do anything.
</DL>
-<H2><A NAME="SEC18" HREF="gperf_toc.html#TOC18">4.4 Options for changing the Algorithms employed by <CODE>gperf</CODE></A></H2>
+<H2><A NAME="SEC23" HREF="gperf_toc.html#TOC23">4.5 Options for changing the Algorithms employed by <CODE>gperf</CODE></A></H2>
<DL COMPACT>
-<DT><SAMP>`-k <VAR>keys</VAR>'</SAMP>
+<DT><SAMP>`-k <VAR>selected-byte-positions</VAR>'</SAMP>
<DD>
-<DT><SAMP>`--key-positions=<VAR>keys</VAR>'</SAMP>
+<DT><SAMP>`--key-positions=<VAR>selected-byte-positions</VAR>'</SAMP>
<DD>
-Allows selection of the character key positions used in the keywords'
-hash function. The allowable choices range between 1-126, inclusive.
+Allows selection of the byte positions used in the keywords'
+hash function. The allowable choices range from 1 to 255, inclusive.
The positions are separated by commas, e.g., <SAMP>`-k 9,4,13,14'</SAMP>;
ranges may be used, e.g., <SAMP>`-k 2-7'</SAMP>; and positions may occur
-in any order. Furthermore, the meta-character '*' causes the generated
-hash function to consider <STRONG>all</STRONG> character positions in each key,
-whereas '$' instructs the hash function to use the "final character"
-of a key (this is the only way to use a character position greater than
-126, incidentally).
+in any order. Furthermore, the wildcard '*' causes the generated
+hash function to consider <STRONG>all</STRONG> byte positions in each keyword,
+whereas '$' instructs the hash function to use the "final byte"
+of a keyword (this is the only way to use a byte position greater than
+255, incidentally).
For instance, the option <SAMP>`-k 1,2,4,6-10,'$''</SAMP> generates a hash
function that considers positions 1,2,4,6,7,8,9,10, plus the last
-character in each key (which may differ for each key, obviously). Keys
-with length less than the indicated key positions work properly, since
-selected key positions exceeding the key length are simply not
+byte in each keyword (which may be at a different position for each
+keyword, obviously). Keywords
+shorter than some of the indicated byte positions work properly, since
+selected byte positions exceeding the keyword length are simply not
referenced in the hash function.
-<DT><SAMP>`-l'</SAMP>
-<DD>
-<DT><SAMP>`--compare-strlen'</SAMP>
-<DD>
-Compare key lengths before trying a string comparison. This might cut
-down on the number of string comparisons made during the lookup, since
-keys with different lengths are never compared via <CODE>strcmp</CODE>.
-However, using <SAMP>`-l'</SAMP> might greatly increase the size of the
-generated C code if the lookup table range is large (which implies that
-the switch option <SAMP>`-S'</SAMP> is not enabled), since the length table
-contains as many elements as there are entries in the lookup table.
-This option is mandatory for binary comparisons (see section <A HREF="gperf_5.html#SEC13">3.3 Use of NUL characters</A>).
+This option is not normally needed since version 2.8 of <CODE>gperf</CODE>;
+the default byte positions are computed depending on the keyword set,
+through a search that minimizes the number of byte positions.
<DT><SAMP>`-D'</SAMP>
<DD>
<DT><SAMP>`--duplicates'</SAMP>
<DD>
-<A NAME="IDX22"></A>
-Handle keywords whose key position sets hash to duplicate values.
-Duplicate hash values occur for two reasons:
-
-
-<UL>
-<LI>
-
-Since <CODE>gperf</CODE> does not backtrack it is possible for it to process
-all your input keywords without finding a unique mapping for each word.
-However, frequently only a very small number of duplicates occur, and
-the majority of keys still require one probe into the table.
-
-<LI>
-
-Sometimes a set of keys may have the same names, but possess different
-attributes. With the -D option <CODE>gperf</CODE> treats all these keys as
+<A NAME="IDX44"></A>
+Handle keywords whose selected byte sets hash to duplicate values.
+Duplicate hash values can occur if several keywords have the same name but
+different attributes, or if the selected byte positions are not well
+chosen. With the -D option <CODE>gperf</CODE> treats all these keywords as
part of an equivalence class and generates a perfect hash function with
-multiple comparisons for duplicate keys. It is up to you to completely
+multiple comparisons for duplicate keywords. It is up to you to completely
disambiguate the keywords by modifying the generated C code. However,
<CODE>gperf</CODE> helps you out by organizing the output.
-</UL>
-Option <SAMP>`-D'</SAMP> is extremely useful for certain large or highly
-redundant keyword sets, e.g., assembler instruction opcodes.
Using this option usually means that the generated hash function is no
longer perfect. On the other hand, it permits <CODE>gperf</CODE> to work on
keyword sets that it otherwise could not handle.
-<DT><SAMP>`-f <VAR>iteration-amount</VAR>'</SAMP>
+<DT><SAMP>`-m <VAR>iterations</VAR>'</SAMP>
<DD>
-<DT><SAMP>`--fast=<VAR>iteration-amount</VAR>'</SAMP>
+<DT><SAMP>`--multiple-iterations=<VAR>iterations</VAR>'</SAMP>
<DD>
-Generate the perfect hash function "fast". This decreases
-<CODE>gperf</CODE>'s running time at the cost of minimizing generated
-table-size. The iteration amount represents the number of times to
-iterate when resolving a collision. `0' means iterate by the number of
-keywords. This option is probably most useful when used in conjunction
-with options <SAMP>`-D'</SAMP> and/or <SAMP>`-S'</SAMP> for <EM>large</EM> keyword sets.
+Perform multiple choices of the <SAMP>`-i'</SAMP> and <SAMP>`-j'</SAMP> values, and
+choose the best result. This increases the running time by a factor of
+<VAR>iterations</VAR> but does a good job of minimizing the generated table size.
<DT><SAMP>`-i <VAR>initial-value</VAR>'</SAMP>
<DD>
@@ -333,16 +396,17 @@ with options <SAMP>`-D'</SAMP> and/or <SAMP>`-S'</SAMP> for <EM>large</EM> keywo
Provides an initial <VAR>value</VAR> for the associate values array. Default
is 0. Increasing the initial value helps inflate the final table size,
possibly leading to more time efficient keyword lookups. Note that this
-option is not particularly useful when <SAMP>`-S'</SAMP> is used. Also,
+option is not particularly useful when <SAMP>`-S'</SAMP> (or, equivalently,
+<SAMP>`%switch'</SAMP>) is used. Also,
<SAMP>`-i'</SAMP> is overridden when the <SAMP>`-r'</SAMP> option is used.
<DT><SAMP>`-j <VAR>jump-value</VAR>'</SAMP>
<DD>
<DT><SAMP>`--jump=<VAR>jump-value</VAR>'</SAMP>
<DD>
-<A NAME="IDX23"></A>
+<A NAME="IDX45"></A>
Affects the "jump value", i.e., how far to advance the associated
-character value upon collisions. <VAR>Jump-value</VAR> is rounded up to an
+byte value upon collisions. <VAR>Jump-value</VAR> is rounded up to an
odd number, the default is 5. If the <VAR>jump-value</VAR> is 0 <CODE>gperf</CODE>
jumps by random amounts.
@@ -354,24 +418,6 @@ Instructs the generator not to include the length of a keyword when
computing its hash value. This may save a few assembly instructions in
the generated lookup table.
-<DT><SAMP>`-o'</SAMP>
-<DD>
-<DT><SAMP>`--occurrence-sort'</SAMP>
-<DD>
-Reorders the keywords by sorting the keywords so that frequently
-occuring key position set components appear first. A second reordering
-pass follows so that keys with "already determined values" are placed
-towards the front of the keylist. This may decrease the time required
-to generate a perfect hash function for many keyword sets, and also
-produce more minimal perfect hash functions. The reason for this is
-that the reordering helps prune the search time by handling inevitable
-collisions early in the search process. On the other hand, if the
-number of keywords is <EM>very</EM> large using <SAMP>`-o'</SAMP> may
-<EM>increase</EM> <CODE>gperf</CODE>'s execution time, since collisions will
-begin earlier and continue throughout the remainder of keyword
-processing. See Cichelli's paper from the January 1980 Communications
-of the ACM for details.
-
<DT><SAMP>`-r'</SAMP>
<DD>
<DT><SAMP>`--random'</SAMP>
@@ -380,8 +426,7 @@ Utilizes randomness to initialize the associated values table. This
frequently generates solutions faster than using deterministic
initialization (which starts all associated values at 0). Furthermore,
using the randomization option generally increases the size of the
-table. If <CODE>gperf</CODE> has difficultly with a certain keyword set try using
-<SAMP>`-r'</SAMP> or <SAMP>`-D'</SAMP>.
+table.
<DT><SAMP>`-s <VAR>size-multiple</VAR>'</SAMP>
<DD>
@@ -389,36 +434,31 @@ table. If <CODE>gperf</CODE> has difficultly with a certain keyword set try usi
<DD>
Affects the size of the generated hash table. The numeric argument for
this option indicates "how many times larger or smaller" the maximum
-associated value range should be, in relationship to the number of keys.
-If the <VAR>size-multiple</VAR> is negative the maximum associated value is
-calculated by <EM>dividing</EM> it into the total number of keys. For
-example, a value of 3 means "allow the maximum associated value to be
-about 3 times larger than the number of input keys".
-
-Conversely, a value of -3 means "allow the maximum associated value to
-be about 3 times smaller than the number of input keys". Negative
-values are useful for limiting the overall size of the generated hash
-table, though this usually increases the number of duplicate hash
-values.
-
-If `generate switch' option <SAMP>`-S'</SAMP> is <EM>not</EM> enabled, the maximum
+associated value range should be, in relationship to the number of keywords.
+It can be written as an integer, a floating-point number or a fraction.
+For example, a value of 3 means "allow the maximum associated value to be
+about 3 times larger than the number of input keywords".
+Conversely, a value of 1/3 means "allow the maximum associated value to
+be about 3 times smaller than the number of input keywords". Values
+smaller than 1 are useful for limiting the overall size of the generated hash
+table, though the option <SAMP>`-m'</SAMP> is better suited to this purpose.
+
+If `generate switch' option <SAMP>`-S'</SAMP> (or, equivalently, <SAMP>`%switch'</SAMP>) is
+<EM>not</EM> enabled, the maximum
associated value influences the static array table size, and a larger
table should decrease the time required for an unsuccessful search, at
the expense of extra table space.
The default value is 1, thus the default maximum associated value about
-the same size as the number of keys (for efficiency, the maximum
+the same size as the number of keywords (for efficiency, the maximum
associated value is always rounded up to a power of 2). The actual
table size may vary somewhat, since this technique is essentially a
-heuristic. In particular, setting this value too high slows down
-<CODE>gperf</CODE>'s runtime, since it must search through a much larger range
-of values. Judicious use of the <SAMP>`-f'</SAMP> option helps alleviate this
-overhead, however.
+heuristic.
</DL>
-<H2><A NAME="SEC19" HREF="gperf_toc.html#TOC19">4.5 Informative Output</A></H2>
+<H2><A NAME="SEC24" HREF="gperf_toc.html#TOC24">4.6 Informative Output</A></H2>
<DL COMPACT>
@@ -448,6 +488,6 @@ option is enabled.
</DL>
<P><HR><P>
-Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_5.html">previous</A>, <A HREF="gperf_7.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
+Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_5.html">previous</A>, <A HREF="gperf_7.html">next</A>, <A HREF="gperf_10.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
</BODY>
</HTML>
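The -P description above says that the first struct field holds an integer offset into the string pool, and that the string is recovered as `stringpool + o`. The following is a minimal hand-written C sketch of that layout. It is not gperf's actual output, and the names `struct stringpool_t`, `stringpool_contents`, `struct keyword`, `code` and `keyword_name` are hypothetical. A struct of char arrays keeps all keyword texts in one object, so `offsetof` yields each offset as a compile-time constant. Presumably this is also why -P helps shared libraries: plain integers, unlike pointers, need no relocation at load time.

```c
#include <stddef.h>  /* offsetof */
#include <string.h>  /* strcmp, used by callers comparing names */

/* Hypothetical string pool: every keyword text lives inside one object. */
struct stringpool_t
{
  char str_if[sizeof ("if")];
  char str_else[sizeof ("else")];
  char str_while[sizeof ("while")];
};

static const struct stringpool_t stringpool_contents =
{
  "if",
  "else",
  "while"
};

/* View the pool as one array of bytes under the default name. */
#define stringpool ((const char *) &stringpool_contents)

/* With -P, the first field is an int offset instead of a char *. */
struct keyword
{
  int name;   /* offset of the keyword text within stringpool */
  int code;   /* hypothetical user-defined attribute */
};

static const struct keyword wordlist[] =
{
  { (int) offsetof (struct stringpool_t, str_if),    1 },
  { (int) offsetof (struct stringpool_t, str_else),  2 },
  { (int) offsetof (struct stringpool_t, str_while), 3 }
};

/* Convert a stored offset back to a string: stringpool + o. */
static const char *
keyword_name (const struct keyword *kw)
{
  return stringpool + kw->name;
}
```

If -Q were used to rename the pool, only the macro name in this sketch would change. The offsets themselves stay the same.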
diff --git a/doc/gperf_7.html b/doc/gperf_7.html
index 263bff2..084f646 100644
--- a/doc/gperf_7.html
+++ b/doc/gperf_7.html
@@ -1,16 +1,16 @@
<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.51
- from gperf.texi on 26 September 2000 -->
+ from gperf.texi on 7 May 2003 -->
<TITLE>Perfect Hash Function Generator - 5 Known Bugs and Limitations with gperf</TITLE>
</HEAD>
<BODY>
-Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_6.html">previous</A>, <A HREF="gperf_8.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
+Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_6.html">previous</A>, <A HREF="gperf_8.html">next</A>, <A HREF="gperf_10.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
<P><HR><P>
-<H1><A NAME="SEC20" HREF="gperf_toc.html#TOC20">5 Known Bugs and Limitations with <CODE>gperf</CODE></A></H1>
+<H1><A NAME="SEC25" HREF="gperf_toc.html#TOC25">5 Known Bugs and Limitations with <CODE>gperf</CODE></A></H1>
<P>
The following are some limitations with the current release of
@@ -29,16 +29,6 @@ work efficiently on much larger keyword sets (over 15,000 keywords).
When processing large keyword sets it helps greatly to have over 8 megs
of RAM.
-However, since <CODE>gperf</CODE> does not backtrack no guaranteed solution
-occurs on every run. On the other hand, it is usually easy to obtain a
-solution by varying the option parameters. In particular, try the
-<SAMP>`-r'</SAMP> option, and also try changing the default arguments to the
-<SAMP>`-s'</SAMP> and <SAMP>`-j'</SAMP> options. To <EM>guarantee</EM> a solution, use
-the <SAMP>`-D'</SAMP> and <SAMP>`-S'</SAMP> options, although the final results are not
-likely to be a <EM>perfect</EM> hash function anymore! Finally, use the
-<SAMP>`-f'</SAMP> option if you want <CODE>gperf</CODE> to generate the perfect hash
-function <EM>fast</EM>, with less emphasis on making it minimal.
-
<LI>
The size of the generate static keyword array can get <EM>extremely</EM>
@@ -47,20 +37,20 @@ similar. This tends to slow down the compilation of the generated C
code, and <EM>greatly</EM> inflates the object code size. If this
situation occurs, consider using the <SAMP>`-S'</SAMP> option to reduce data
size, potentially increasing keyword recognition time a negligible
-amount. Since many C compilers cannot correctly generated code for
+amount. Since many C compilers cannot correctly generate code for
large switch statements it is important to qualify the <VAR>-S</VAR> option
with an appropriate numerical argument that controls the number of
switch statements generated.
<LI>
-The maximum number of key positions selected for a given key has an
-arbitrary limit of 126. This restriction should be removed, and if
+The maximum number of selected byte positions has an
+arbitrary limit of 255. This restriction should be removed, and if
anyone considers this a problem write me and let me know so I can remove
the constraint.
</UL>
<P><HR><P>
-Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_6.html">previous</A>, <A HREF="gperf_8.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
+Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_6.html">previous</A>, <A HREF="gperf_8.html">next</A>, <A HREF="gperf_10.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
</BODY>
</HTML>
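The -k description in section 4.5 says two things about positions: selected byte positions beyond a keyword's length are simply not referenced, and '$' denotes the final byte. It also says the keyword length is part of the hash unless -n is given. The following is a minimal C sketch of a hash with that behaviour, assuming the selection "-k 1,3,$".

The sketch rests on several assumptions:
- The `asso_values` table is filled with made-up numbers. In real output gperf computes these values itself.
- The function name `sketch_hash` is hypothetical.
- Generated code, as we understand it, tests the length with a fall-through switch. The plain if tests below express the same rule.

```c
#include <stddef.h>  /* size_t */

/* Hypothetical associated values; bytes not listed map to 0.
   gperf would choose these values itself. */
static const unsigned int asso_values[256] =
{
  ['e'] = 5, ['f'] = 2, ['i'] = 7, ['l'] = 11, ['s'] = 3
};

/* Hash for "-k 1,3,$". Positions are 1-based; a selected position
   larger than the keyword length is skipped. */
static unsigned int
sketch_hash (const char *str, size_t len)
{
  unsigned int hval = (unsigned int) len;  /* left out when -n is given */

  if (len >= 3)
    hval += asso_values[(unsigned char) str[2]];        /* position 3 */
  if (len >= 1)
    {
      hval += asso_values[(unsigned char) str[0]];      /* position 1 */
      hval += asso_values[(unsigned char) str[len - 1]]; /* '$': final byte */
    }
  return hval;
}
```

For the two-byte keyword "if", position 3 does not exist, so only the length, the first byte and the final byte contribute. That matches the manual's statement that such keywords "work properly".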
diff --git a/doc/gperf_8.html b/doc/gperf_8.html
index a016c5d..58460aa 100644
--- a/doc/gperf_8.html
+++ b/doc/gperf_8.html
@@ -1,16 +1,16 @@
<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.51
- from gperf.texi on 26 September 2000 -->
+ from gperf.texi on 7 May 2003 -->
<TITLE>Perfect Hash Function Generator - 6 Things Still Left to Do</TITLE>
</HEAD>
<BODY>
-Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_7.html">previous</A>, <A HREF="gperf_9.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
+Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_7.html">previous</A>, <A HREF="gperf_9.html">next</A>, <A HREF="gperf_10.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
<P><HR><P>
-<H1><A NAME="SEC21" HREF="gperf_toc.html#TOC21">6 Things Still Left to Do</A></H1>
+<H1><A NAME="SEC26" HREF="gperf_toc.html#TOC26">6 Things Still Left to Do</A></H1>
<P>
It should be "relatively" easy to replace the current perfect hash
@@ -23,19 +23,10 @@ worthwhile improvements include:
<UL>
<LI>
-Make the algorithm more robust. At present, the program halts with an
-error diagnostic if it can't find a direct solution and the <SAMP>`-D'</SAMP>
-option is not enabled. A more comprehensive, albeit computationally
-expensive, approach would employ backtracking or enable alternative
-options and retry. It's not clear how helpful this would be, in
-general, since most search sets are rather small in practice.
-
-<LI>
-
Another useful extension involves modifying the program to generate
"minimal" perfect hash functions (under certain circumstances, the
current version can be rather extravagant in the generated table size).
-Again, this is mostly of theoretical interest, since a sparse table
+This is mostly of theoretical interest, since a sparse table
often produces faster lookups, and use of the <SAMP>`-S'</SAMP> <CODE>switch</CODE>
option can minimize the data size, at the expense of slightly longer
lookups (note that the gcc compiler generally produces good code for
@@ -44,11 +35,11 @@ lookups (note that the gcc compiler generally produces good code for
<LI>
In addition to improving the algorithm, it would also be useful to
-generate a C++ class or Ada package as the code output, in addition to
-the current C routines.
+generate an Ada package as the code output, in addition to the current
+C and C++ routines.
</UL>
<P><HR><P>
-Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_7.html">previous</A>, <A HREF="gperf_9.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
+Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_7.html">previous</A>, <A HREF="gperf_9.html">next</A>, <A HREF="gperf_10.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
</BODY>
</HTML>
diff --git a/doc/gperf_9.html b/doc/gperf_9.html
index e9c933d..4bc59ce 100644
--- a/doc/gperf_9.html
+++ b/doc/gperf_9.html
@@ -1,30 +1,95 @@
<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.51
- from gperf.texi on 26 September 2000 -->
+ from gperf.texi on 7 May 2003 -->
-<TITLE>Perfect Hash Function Generator - 7 Implementation Details of GNU gperf</TITLE>
+<TITLE>Perfect Hash Function Generator - 7 Bibliography</TITLE>
</HEAD>
<BODY>
-Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_8.html">previous</A>, <A HREF="gperf_10.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
+Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_8.html">previous</A>, <A HREF="gperf_10.html">next</A>, <A HREF="gperf_10.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
<P><HR><P>
-<H1><A NAME="SEC22" HREF="gperf_toc.html#TOC22">7 Implementation Details of GNU <CODE>gperf</CODE></A></H1>
+<H1><A NAME="SEC27" HREF="gperf_toc.html#TOC27">7 Bibliography</A></H1>
<P>
-A paper describing the high-level description of the data structures and
-algorithms used to implement <CODE>gperf</CODE> will soon be available. This
-paper is useful not only from a maintenance and enhancement perspective,
-but also because they demonstrate several clever and useful programming
-techniques, e.g., `Iteration Number' boolean arrays, double
-hashing, a "safe" and efficient method for reading arbitrarily long
-input from a file, and a provably optimal algorithm for simultaneously
-determining both the minimum and maximum elements in a list.
+[1] Chang, C.C. <I>A Scheme for Constructing Ordered Minimal Perfect
+Hashing Functions</I> Information Sciences 39(1986), 187-195.
</P>
+<P>
+[2] Cichelli, Richard J. <I>Author's Response to "On Cichelli's Minimal Perfect Hash
+Functions Method"</I> Communications of the ACM, 23, 12(December 1980), 729.
+
+</P>
+<P>
+[3] Cichelli, Richard J. <I>Minimal Perfect Hash Functions Made Simple</I>
+Communications of the ACM, 23, 1(January 1980), 17-19.
+</P>
+<P>
+[4] Cook, C. R. and Oldehoeft, R.R. <I>A Letter Oriented Minimal
+Perfect Hashing Function</I> SIGPLAN Notices, 17, 9(September 1982), 18-27.
+
+</P>
+<P>
+[5] Cormack, G. V. and Horspool, R. N. S. and Kaiserwerth, M.
+<I>Practical Perfect Hashing</I> Computer Journal, 28, 1(January 1985), 54-58.
+
+</P>
+<P>
+[6] Jaeschke, G. <I>Reciprocal Hashing: A Method for Generating Minimal
+Perfect Hashing Functions</I> Communications of the ACM, 24, 12(December
+1981), 829-833.
+
+</P>
+<P>
+[7] Jaeschke, G. and Osterburg, G. <I>On Cichelli's Minimal Perfect
+Hash Functions Method</I> Communications of the ACM, 23, 12(December 1980),
+728-729.
+
+</P>
+<P>
+[8] Sager, Thomas J. <I>A Polynomial Time Generator for Minimal Perfect
+Hash Functions</I> Communications of the ACM, 28, 5(May 1985), 523-532.
+
+</P>
+<P>
+[9] Schmidt, Douglas C. <I>GPERF: A Perfect Hash Function Generator</I>
+Second USENIX C++ Conference Proceedings, April 1990.
+
+</P>
+<P>
+[10] Schmidt, Douglas C. <I>GPERF: A Perfect Hash Function Generator</I>
+C++ Report, SIGS 10 10 (November/December 1998).
+
+</P>
+<P>
+[11] Sebesta, R.W. and Taylor, M.A. <I>Minimal Perfect Hash Functions
+for Reserved Word Lists</I> SIGPLAN Notices, 20, 12(September 1985), 47-53.
+
+</P>
+<P>
+[12] Sprugnoli, R. <I>Perfect Hashing Functions: A Single Probe
+Retrieving Method for Static Sets</I> Communications of the ACM, 20,
+11(November 1977), 841-850.
+
+</P>
+<P>
+[13] Stallman, Richard M. <I>Using and Porting GNU CC</I> Free Software Foundation,
+1988.
+
+</P>
+<P>
+[14] Stroustrup, Bjarne <I>The C++ Programming Language.</I> Addison-Wesley, 1986.
+
+</P>
+<P>
+[15] Tiemann, Michael D. <I>User's Guide to GNU C++</I> Free Software
+Foundation, 1989.
+
+</P>
<P><HR><P>
-Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_8.html">previous</A>, <A HREF="gperf_10.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
+Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_8.html">previous</A>, <A HREF="gperf_10.html">next</A>, <A HREF="gperf_10.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
</BODY>
</HTML>
diff --git a/doc/gperf_toc.html b/doc/gperf_toc.html
index 0ece534..3541fbb 100644
--- a/doc/gperf_toc.html
+++ b/doc/gperf_toc.html
@@ -1,15 +1,16 @@
<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.51
- from gperf.texi on 26 September 2000 -->
+ from gperf.texi on 7 May 2003 -->
<TITLE>Perfect Hash Function Generator - Table of Contents</TITLE>
</HEAD>
<BODY>
-<H1>User's Guide to <CODE>gperf</CODE> 2.7.2</H1>
+<H1>User's Guide to <CODE>gperf</CODE> 3.0</H1>
<H2>The GNU Perfect Hash Function Generator</H2>
-<H2>Edition 2.7.2, 26 September 2000</H2>
+<H2>Edition 3.0, 7 May 2003</H2>
<ADDRESS>Douglas C. Schmidt</ADDRESS>
+<ADDRESS>Bruno Haible</ADDRESS>
<P>
<P><HR><P>
<UL>
@@ -25,29 +26,35 @@
<UL>
<LI><A NAME="TOC8" HREF="gperf_5.html#SEC8">3.1 Input Format to <CODE>gperf</CODE></A>
<UL>
-<LI><A NAME="TOC9" HREF="gperf_5.html#SEC9">3.1.1 <CODE>struct</CODE> Declarations and C Code Inclusion</A>
-<LI><A NAME="TOC10" HREF="gperf_5.html#SEC10">3.1.2 Format for Keyword Entries</A>
-<LI><A NAME="TOC11" HREF="gperf_5.html#SEC11">3.1.3 Including Additional C Functions</A>
+<LI><A NAME="TOC9" HREF="gperf_5.html#SEC9">3.1.1 Declarations</A>
+<UL>
+<LI><A NAME="TOC10" HREF="gperf_5.html#SEC10">3.1.1.1 User-supplied <CODE>struct</CODE></A>
+<LI><A NAME="TOC11" HREF="gperf_5.html#SEC11">3.1.1.2 Gperf Declarations</A>
+<LI><A NAME="TOC12" HREF="gperf_5.html#SEC12">3.1.1.3 C Code Inclusion</A>
+</UL>
+<LI><A NAME="TOC13" HREF="gperf_5.html#SEC13">3.1.2 Format for Keyword Entries</A>
+<LI><A NAME="TOC14" HREF="gperf_5.html#SEC14">3.1.3 Including Additional C Functions</A>
+<LI><A NAME="TOC15" HREF="gperf_5.html#SEC15">3.1.4 Where to place directives for GNU <CODE>indent</CODE>.</A>
</UL>
-<LI><A NAME="TOC12" HREF="gperf_5.html#SEC12">3.2 Output Format for Generated C Code with <CODE>gperf</CODE></A>
-<LI><A NAME="TOC13" HREF="gperf_5.html#SEC13">3.3 Use of NUL characters</A>
+<LI><A NAME="TOC16" HREF="gperf_5.html#SEC16">3.2 Output Format for Generated C Code with <CODE>gperf</CODE></A>
+<LI><A NAME="TOC17" HREF="gperf_5.html#SEC17">3.3 Use of NUL bytes</A>
</UL>
-<LI><A NAME="TOC14" HREF="gperf_6.html#SEC14">4 Invoking <CODE>gperf</CODE></A>
+<LI><A NAME="TOC18" HREF="gperf_6.html#SEC18">4 Invoking <CODE>gperf</CODE></A>
<UL>
-<LI><A NAME="TOC15" HREF="gperf_6.html#SEC15">4.1 Options that affect Interpretation of the Input File</A>
-<LI><A NAME="TOC16" HREF="gperf_6.html#SEC16">4.2 Options to specify the Language for the Output Code</A>
-<LI><A NAME="TOC17" HREF="gperf_6.html#SEC17">4.3 Options for fine tuning Details in the Output Code</A>
-<LI><A NAME="TOC18" HREF="gperf_6.html#SEC18">4.4 Options for changing the Algorithms employed by <CODE>gperf</CODE></A>
-<LI><A NAME="TOC19" HREF="gperf_6.html#SEC19">4.5 Informative Output</A>
+<LI><A NAME="TOC19" HREF="gperf_6.html#SEC19">4.1 Specifying the Location of the Output File</A>
+<LI><A NAME="TOC20" HREF="gperf_6.html#SEC20">4.2 Options that affect Interpretation of the Input File</A>
+<LI><A NAME="TOC21" HREF="gperf_6.html#SEC21">4.3 Options to specify the Language for the Output Code</A>
+<LI><A NAME="TOC22" HREF="gperf_6.html#SEC22">4.4 Options for fine tuning Details in the Output Code</A>
+<LI><A NAME="TOC23" HREF="gperf_6.html#SEC23">4.5 Options for changing the Algorithms employed by <CODE>gperf</CODE></A>
+<LI><A NAME="TOC24" HREF="gperf_6.html#SEC24">4.6 Informative Output</A>
</UL>
-<LI><A NAME="TOC20" HREF="gperf_7.html#SEC20">5 Known Bugs and Limitations with <CODE>gperf</CODE></A>
-<LI><A NAME="TOC21" HREF="gperf_8.html#SEC21">6 Things Still Left to Do</A>
-<LI><A NAME="TOC22" HREF="gperf_9.html#SEC22">7 Implementation Details of GNU <CODE>gperf</CODE></A>
-<LI><A NAME="TOC23" HREF="gperf_10.html#SEC23">8 Bibliography</A>
-<LI><A NAME="TOC24" HREF="gperf_11.html#SEC24">Concept Index</A>
+<LI><A NAME="TOC25" HREF="gperf_7.html#SEC25">5 Known Bugs and Limitations with <CODE>gperf</CODE></A>
+<LI><A NAME="TOC26" HREF="gperf_8.html#SEC26">6 Things Still Left to Do</A>
+<LI><A NAME="TOC27" HREF="gperf_9.html#SEC27">7 Bibliography</A>
+<LI><A NAME="TOC28" HREF="gperf_10.html#SEC28">Concept Index</A>
</UL>
<P><HR><P>
-This document was generated on 26 September 2000 using the
+This document was generated on 7 May 2003 using the
<A HREF="http://wwwcn.cern.ch/dci/texi2html/">texi2html</A>
translator version 1.51.</P>
</BODY>