diff options
author | Bruno Haible <bruno@clisp.org> | 2000-08-19 06:20:11 +0000 |
---|---|---|
committer | Bruno Haible <bruno@clisp.org> | 2000-08-19 06:20:11 +0000 |
commit | 501ee3e64033e900181aaebd7ae44dbcacccca9d (patch) | |
tree | 2f1dfc500f54d083dba1a0ebb13b0e418631b6d4 /doc/gperf_6.html | |
download | gperf-501ee3e64033e900181aaebd7ae44dbcacccca9d.tar.gz |
Initial revision
Diffstat (limited to 'doc/gperf_6.html')
-rw-r--r-- | doc/gperf_6.html | 292 |
1 files changed, 292 insertions, 0 deletions
diff --git a/doc/gperf_6.html b/doc/gperf_6.html new file mode 100644 index 0000000..8a51b3b --- /dev/null +++ b/doc/gperf_6.html @@ -0,0 +1,292 @@ +<HTML> +<HEAD> +<!-- This HTML file has been created by texi2html 1.51 + from gperf.texi on 15 April 1998 --> + +<TITLE>User's Guide to gperf - 3 High-Level Description of GNU gperf</TITLE> +</HEAD> +<BODY> +Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_5.html">previous</A>, <A HREF="gperf_7.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>. +<P><HR><P> + + +<H1><A NAME="SEC8" HREF="gperf_toc.html#TOC8">3 High-Level Description of GNU <CODE>gperf</CODE></A></H1> + +<P> +The perfect hash function generator <CODE>gperf</CODE> reads a set of +"keywords" from a <STRONG>keyfile</STRONG> (or from the standard input by +default). It attempts to derive a perfect hashing function that +recognizes a member of the <STRONG>static keyword set</STRONG> with at most a +single probe into the lookup table. If <CODE>gperf</CODE> succeeds in +generating such a function it produces a pair of C source code routines +that perform hashing and table lookup recognition. All generated C code +is directed to the standard output. Command-line options described +below allow you to modify the input and output format to <CODE>gperf</CODE>. + +</P> +<P> +By default, <CODE>gperf</CODE> attempts to produce time-efficient code, with +less emphasis on efficient space utilization. However, several options +exist that permit trading-off execution time for storage space and vice +versa. In particular, expanding the generated table size produces a +sparse search structure, generally yielding faster searches. +Conversely, you can direct <CODE>gperf</CODE> to utilize a C <CODE>switch</CODE> +statement scheme that minimizes data space storage size. Furthermore, +using a C <CODE>switch</CODE> may actually speed up the keyword retrieval time +somewhat. Actual results depend on your C compiler, of course. + +</P> +<P> +In general, <CODE>gperf</CODE> assigns values to the characters it is using +for hashing until some set of values gives each keyword a unique value. +A helpful heuristic is that the larger the hash value range, the easier +it is for <CODE>gperf</CODE> to find and generate a perfect hash function. +Experimentation is the key to getting the most from <CODE>gperf</CODE>. + +</P> + + +<H2><A NAME="SEC9" HREF="gperf_toc.html#TOC9">3.1 Input Format to <CODE>gperf</CODE></A></H2> + +<P> +You can control the input keyfile format by varying certain command-line +arguments, in particular the <SAMP>`-t'</SAMP> option. The input's appearance +is similar to GNU utilities <CODE>flex</CODE> and <CODE>bison</CODE> (or UNIX +utilities <CODE>lex</CODE> and <CODE>yacc</CODE>). Here's an outline of the general +format: + +</P> + +<PRE> +declarations +%% +keywords +%% +functions +</PRE> + +<P> +<EM>Unlike</EM> <CODE>flex</CODE> or <CODE>bison</CODE>, all sections of <CODE>gperf</CODE>'s input +are optional. The following sections describe the input format for each +section. + +</P> + + + +<H3><A NAME="SEC10" HREF="gperf_toc.html#TOC10">3.1.1 <CODE>struct</CODE> Declarations and C Code Inclusion</A></H3> + +<P> +The keyword input file optionally contains a section for including +arbitrary C declarations and definitions, as well as provisions for +providing a user-supplied <CODE>struct</CODE>. If the <SAMP>`-t'</SAMP> option +<EM>is</EM> enabled, you <EM>must</EM> provide a C <CODE>struct</CODE> as the last +component in the declaration section from the keyfile file. The first +field in this struct must be a <CODE>char *</CODE> identifier called <SAMP>`name'</SAMP>, +although it is possible to modify this field's name with the <SAMP>`-K'</SAMP> +option described below. + +</P> +<P> +Here is simple example, using months of the year and their attributes as +input: + +</P> + +<PRE> +struct months { char *name; int number; int days; int leap_days; }; +%% +january, 1, 31, 31 +february, 2, 28, 29 +march, 3, 31, 31 +april, 4, 30, 30 +may, 5, 31, 31 +june, 6, 30, 30 +july, 7, 31, 31 +august, 8, 31, 31 +september, 9, 30, 30 +october, 10, 31, 31 +november, 11, 30, 30 +december, 12, 31, 31 +</PRE> + +<P> +Separating the <CODE>struct</CODE> declaration from the list of key words and +other fields are a pair of consecutive percent signs, <CODE>%%</CODE>, +appearing left justified in the first column, as in the UNIX utility +<CODE>lex</CODE>. + +</P> +<P> +Using a syntax similar to GNU utilities <CODE>flex</CODE> and <CODE>bison</CODE>, it +is possible to directly include C source text and comments verbatim into +the generated output file. This is accomplished by enclosing the region +inside left-justified surrounding <CODE>%{</CODE>, <CODE>%}</CODE> pairs. Here is +an input fragment based on the previous example that illustrates this +feature: + +</P> + +<PRE> +%{ +#include <assert.h> +/* This section of code is inserted directly into the output. */ +int return_month_days (struct months *months, int is_leap_year); +%} +struct months { char *name; int number; int days; int leap_days; }; +%% +january, 1, 31, 31 +february, 2, 28, 29 +march, 3, 31, 31 +... +</PRE> + +<P> +It is possible to omit the declaration section entirely. In this case +the keyfile begins directly with the first keyword line, e.g.: + +</P> + +<PRE> +january, 1, 31, 31 +february, 2, 28, 29 +march, 3, 31, 31 +april, 4, 30, 30 +... +</PRE> + + + +<H3><A NAME="SEC11" HREF="gperf_toc.html#TOC11">3.1.2 Format for Keyword Entries</A></H3> + +<P> +The second keyfile format section contains lines of keywords and any +associated attributes you might supply. A line beginning with <SAMP>`#'</SAMP> +in the first column is considered a comment. Everything following the +<SAMP>`#'</SAMP> is ignored, up to and including the following newline. + +</P> +<P> +The first field of each non-comment line is always the key itself. It +should be given as a simple name, i.e., without surrounding +string quotation marks, and be left-justified flush against the first +column. In this context, a "field" is considered to extend up to, but +not include, the first blank, comma, or newline. Here is a simple +example taken from a partial list of C reserved words: + +</P> + +<PRE> +# These are a few C reserved words, see the c.<CODE>gperf</CODE> file +# for a complete list of ANSI C reserved words. +unsigned +sizeof +switch +signed +if +default +for +while +return +</PRE> + +<P> +Note that unlike <CODE>flex</CODE> or <CODE>bison</CODE> the first <CODE>%%</CODE> marker +may be elided if the declaration section is empty. + +</P> +<P> +Additional fields may optionally follow the leading keyword. Fields +should be separated by commas, and terminate at the end of line. What +these fields mean is entirely up to you; they are used to initialize the +elements of the user-defined <CODE>struct</CODE> provided by you in the +declaration section. If the <SAMP>`-t'</SAMP> option is <EM>not</EM> enabled +these fields are simply ignored. All previous examples except the last +one contain keyword attributes. + +</P> + + +<H3><A NAME="SEC12" HREF="gperf_toc.html#TOC12">3.1.3 Including Additional C Functions</A></H3> + +<P> +The optional third section also corresponds closely with conventions +found in <CODE>flex</CODE> and <CODE>bison</CODE>. All text in this section, +starting at the final <CODE>%%</CODE> and extending to the end of the input +file, is included verbatim into the generated output file. Naturally, +it is your responsibility to ensure that the code contained in this +section is valid C. + +</P> + + +<H2><A NAME="SEC13" HREF="gperf_toc.html#TOC13">3.2 Output Format for Generated C Code with <CODE>gperf</CODE></A></H2> + +<P> +Several options control how the generated C code appears on the standard +output. Two C function are generated. They are called <CODE>hash</CODE> and +<CODE>in_word_set</CODE>, although you may modify the name for +<CODE>in_word_set</CODE> with a command-line option. Both functions require +two arguments, a string, <CODE>char *</CODE> <VAR>str</VAR>, and a length +parameter, <CODE>int</CODE> <VAR>len</VAR>. Their default function prototypes are +as follows: + +</P> + +<PRE> +static int hash (char *str, int len); +int in_word_set (char *str, int len); +</PRE> + +<P> +By default, the generated <CODE>hash</CODE> function returns an integer value +created by adding <VAR>len</VAR> to several user-specified <VAR>str</VAR> key +positions indexed into an <STRONG>associated values</STRONG> table stored in a +local static array. The associated values table is constructed +internally by <CODE>gperf</CODE> and later output as a static local C array called +<VAR>hash_table</VAR>; its meaning and properties are described below. +See section <A HREF="gperf_10.html#SEC22">7 Implementation Details of GNU <CODE>gperf</CODE></A>. The relevant key positions are specified via the +<SAMP>`-k'</SAMP> option when running <CODE>gperf</CODE>, as detailed in the <EM>Options</EM> +section below. See section <A HREF="gperf_7.html#SEC14">4 Options to the <CODE>gperf</CODE> Utility</A>. + +</P> +<P> +Two options, <SAMP>`-g'</SAMP> (assume you are compiling with GNU C and its +<CODE>inline</CODE> feature) and <SAMP>`-a'</SAMP> (assume ANSI C-style function +prototypes), alter the content of both the generated <CODE>hash</CODE> and +<CODE>in_word_set</CODE> routines. However, function <CODE>in_word_set</CODE> may +be modified more extensively, in response to your option settings. The +options that affect the <CODE>in_word_set</CODE> structure are: + +</P> + +<UL> +<DL COMPACT> + +<DT><SAMP>`-t'</SAMP> +<DD> +Make use of the user-defined <CODE>struct</CODE>. + +<DT><SAMP>`-S <VAR>total switch statements</VAR>'</SAMP> +<DD> +Generate 1 or more C <CODE>switch</CODE> statement rather than use a large, +(and potentially sparse) static array. Although the exact time and +space savings of this approach vary according to your C compiler's +degree of optimization, this method often results in smaller and faster +code. +</DL> +</UL> + +<P> +If the <SAMP>`-t'</SAMP> and <SAMP>`-S'</SAMP> options are omitted, the +default action is to generate a <CODE>char *</CODE> array containing the keys, +together with additional null strings used for padding the array. By +experimenting with the various input and output options, and timing the +resulting C code, you can determine the best option choices for +different keyword set characteristics. + +</P> +<P><HR><P> +Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_5.html">previous</A>, <A HREF="gperf_7.html">next</A>, <A HREF="gperf_11.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>. +</BODY> +</HTML> |