summaryrefslogtreecommitdiff
path: root/apps/gperf/gperf.texi
diff options
context:
space:
mode:
Diffstat (limited to 'apps/gperf/gperf.texi')
-rw-r--r--apps/gperf/gperf.texi1184
1 files changed, 0 insertions, 1184 deletions
diff --git a/apps/gperf/gperf.texi b/apps/gperf/gperf.texi
deleted file mode 100644
index 649d05f7ec6..00000000000
--- a/apps/gperf/gperf.texi
+++ /dev/null
@@ -1,1184 +0,0 @@
-\input texinfo @c -*-texinfo-*-
-
-@settitle User's Guide to @code{gperf}
-@setfilename gperf.info
-
-@ifinfo
-@format
-START-INFO-DIR-ENTRY
-* Gperf: (gperf). Perfect Hash Function Generator.
-END-INFO-DIR-ENTRY
-@end format
-@end ifinfo
-
-@ifinfo
-This file documents the features of the GNU Perfect Hash Function Generator
-
-Copyright (C) 1989 Free Software Foundation, Inc.
-
-Permission is granted to make and distribute verbatim copies of
-this manual provided the copyright notice and this permission notice
-are preserved on all copies.
-
-@ignore
-Permission is granted to process this file through @TeX{} and print the
-results, provided the printed document carries copying permission
-notice identical to this one except for the removal of this paragraph
-(this paragraph not being relevant to the printed manual).
-
-@end ignore
-
-Permission is granted to copy and distribute modified versions of this
-manual under the conditions for verbatim copying, provided also that the
-section entitled ``GNU General Public License'' is included exactly as
-in the original, and provided that the entire resulting derived work is
-distributed under the terms of a permission notice identical to this one.
-
-Permission is granted to copy and distribute translations of this manual
-into another language, under the above conditions for modified versions,
-except that the section entitled ``GNU @code{gperf} General Public License'' an
-d
-this permission notice may be included in translations approved by the
-Free Software Foundation instead of in the original English.
-@end ifinfo
-
-@setchapternewpage odd
-
-@titlepage
-@center @titlefont{User's Guide}
-@sp 2
-@center @titlefont{for the}
-@sp 2
-@center @titlefont{GNU GPERF Utility}
-@sp 4
-@center Douglas C. Schmidt
-@sp 3
-@center last updated 1 November 1989
-@sp 1
-@center for version 2.0
-@page
-@vskip 0pt plus 1filll
-Copyright @copyright{} 1989 Free Software Foundation, Inc.
-
-
-Permission is granted to make and distribute verbatim copies of
-this manual provided the copyright notice and this permission notice
-are preserved on all copies.
-
-Permission is granted to copy and distribute modified versions of this
-manual under the conditions for verbatim copying, provided also that the
-section entitled ``GNU @code{gperf} General Public License'' is included exactl
-y as
-in the original, and provided that the entire resulting derived work is
-distributed under the terms of a permission notice identical to this one.
-
-Permission is granted to copy and distribute translations of this manual
-into another language, under the above conditions for modified versions,
-except that the section entitled ``GNU @code{gperf} General Public License'' ma
-y be
-included in a translation approved by the author instead of in the original
-English.
-@end titlepage
-
-@ifinfo
-@node Top, Copying, (dir), (dir)
-@ichapter Introduction
-
-This manual documents the GNU @code{gperf} perfect hash function generator
-utility, focusing on its features and how to use them, and how to report
-bugs.
-
-@end ifinfo
-@menu
-* Copying:: GNU @code{gperf} General Public License says
- how you can copy and share @code{gperf}.
-* Contributors:: People who have contributed to @code{gperf}.
-* Motivation:: Static search structures and GNU GPERF.
-* Search Structures:: Static search structures and GNU @code{gperf}
-* Description:: High-level discussion of how GPERF functions.
-* Options:: A description of options to the program.
-* Bugs:: Known bugs and limitations with GPERF.
-* Projects:: Things still left to do.
-* Implementation:: Implementation Details for GNU GPERF.
-* Bibliography:: Material Referenced in this Report.
-
- --- The Detailed Node Listing ---
-
-High-Level Description of GNU @code{gperf}
-
-* Input Format:: Input Format to @code{gperf}
-* Output Format:: Output Format for Generated C Code with @code{gperf}
-
-Input Format to @code{gperf}
-
-* Declarations:: @code{struct} Declarations and C Code Inclusion.
-* Keywords:: Format for Keyword Entries.
-* Functions:: Including Additional C Functions.
-@end menu
-
-@node Copying, Contributors, Top, Top
-@unnumbered GNU GENERAL PUBLIC LICENSE
-@center Version 1, February 1989
-
-@display
-Copyright @copyright{} 1989 Free Software Foundation, Inc.
-675 Mass Ave, Cambridge, MA 02139, USA
-
-Everyone is permitted to copy and distribute verbatim copies
-of this license document, but changing it is not allowed.
-@end display
-
-@unnumberedsec Preamble
-
- The license agreements of most software companies try to keep users
-at the mercy of those companies. By contrast, our General Public
-License is intended to guarantee your freedom to share and change free
-software---to make sure the software is free for all its users. The
-General Public License applies to the Free Software Foundation's
-software and to any other program whose authors commit to using it.
-You can use it for your programs, too.
-
- When we speak of free software, we are referring to freedom, not
-price. Specifically, the General Public License is designed to make
-sure that you have the freedom to give away or sell copies of free
-software, that you receive source code or can get it if you want it,
-that you can change the software or use pieces of it in new free
-programs; and that you know you can do these things.
-
- To protect your rights, we need to make restrictions that forbid
-anyone to deny you these rights or to ask you to surrender the rights.
-These restrictions translate to certain responsibilities for you if you
-distribute copies of the software, or if you modify it.
-
- For example, if you distribute copies of a such a program, whether
-gratis or for a fee, you must give the recipients all the rights that
-you have. You must make sure that they, too, receive or can get the
-source code. And you must tell them their rights.
-
- We protect your rights with two steps: (1) copyright the software, and
-(2) offer you this license which gives you legal permission to copy,
-distribute and/or modify the software.
-
- Also, for each author's protection and ours, we want to make certain
-that everyone understands that there is no warranty for this free
-software. If the software is modified by someone else and passed on, we
-want its recipients to know that what they have is not the original, so
-that any problems introduced by others will not reflect on the original
-authors' reputations.
-
- The precise terms and conditions for copying, distribution and
-modification follow.
-
-@iftex
-@unnumberedsec TERMS AND CONDITIONS
-@end iftex
-@ifinfo
-@center TERMS AND CONDITIONS
-@end ifinfo
-
-@enumerate
-@item
-This License Agreement applies to any program or other work which
-contains a notice placed by the copyright holder saying it may be
-distributed under the terms of this General Public License. The
-``Program'', below, refers to any such program or work, and a ``work based
-on the Program'' means either the Program or any work containing the
-Program or a portion of it, either verbatim or with modifications. Each
-licensee is addressed as ``you''.
-
-@item
-You may copy and distribute verbatim copies of the Program's source
-code as you receive it, in any medium, provided that you conspicuously and
-appropriately publish on each copy an appropriate copyright notice and
-disclaimer of warranty; keep intact all the notices that refer to this
-General Public License and to the absence of any warranty; and give any
-other recipients of the Program a copy of this General Public License
-along with the Program. You may charge a fee for the physical act of
-transferring a copy.
-
-@item
-You may modify your copy or copies of the Program or any portion of
-it, and copy and distribute such modifications under the terms of Paragraph
-1 above, provided that you also do the following:
-
-@itemize @bullet
-@item
-cause the modified files to carry prominent notices stating that
-you changed the files and the date of any change; and
-
-@item
-cause the whole of any work that you distribute or publish, that
-in whole or in part contains the Program or any part thereof, either
-with or without modifications, to be licensed at no charge to all
-third parties under the terms of this General Public License (except
-that you may choose to grant warranty protection to some or all
-third parties, at your option).
-
-@item
-If the modified program normally reads commands interactively when
-run, you must cause it, when started running for such interactive use
-in the simplest and most usual way, to print or display an
-announcement including an appropriate copyright notice and a notice
-that there is no warranty (or else, saying that you provide a
-warranty) and that users may redistribute the program under these
-conditions, and telling the user how to view a copy of this General
-Public License.
-
-@item
-You may charge a fee for the physical act of transferring a
-copy, and you may at your option offer warranty protection in
-exchange for a fee.
-@end itemize
-
-Mere aggregation of another independent work with the Program (or its
-derivative) on a volume of a storage or distribution medium does not bring
-the other work under the scope of these terms.
-
-@item
-You may copy and distribute the Program (or a portion or derivative of
-it, under Paragraph 2) in object code or executable form under the terms of
-Paragraphs 1 and 2 above provided that you also do one of the following:
-
-@itemize @bullet
-@item
-accompany it with the complete corresponding machine-readable
-source code, which must be distributed under the terms of
-Paragraphs 1 and 2 above; or,
-
-@item
-accompany it with a written offer, valid for at least three
-years, to give any third party free (except for a nominal charge
-for the cost of distribution) a complete machine-readable copy of the
-corresponding source code, to be distributed under the terms of
-Paragraphs 1 and 2 above; or,
-
-@item
-accompany it with the information you received as to where the
-corresponding source code may be obtained. (This alternative is
-allowed only for noncommercial distribution and only if you
-received the program in object code or executable form alone.)
-@end itemize
-
-Source code for a work means the preferred form of the work for making
-modifications to it. For an executable file, complete source code means
-all the source code for all modules it contains; but, as a special
-exception, it need not include source code for modules which are standard
-libraries that accompany the operating system on which the executable
-file runs, or for standard header files or definitions files that
-accompany that operating system.
-
-@item
-You may not copy, modify, sublicense, distribute or transfer the
-Program except as expressly provided under this General Public License.
-Any attempt otherwise to copy, modify, sublicense, distribute or transfer
-the Program is void, and will automatically terminate your rights to use
-the Program under this License. However, parties who have received
-copies, or rights to use copies, from you under this General Public
-License will not have their licenses terminated so long as such parties
-remain in full compliance.
-
-@item
-By copying, distributing or modifying the Program (or any work based
-on the Program) you indicate your acceptance of this license to do so,
-and all its terms and conditions.
-
-@item
-Each time you redistribute the Program (or any work based on the
-Program), the recipient automatically receives a license from the original
-licensor to copy, distribute or modify the Program subject to these
-terms and conditions. You may not impose any further restrictions on the
-recipients' exercise of the rights granted herein.
-
-@item
-The Free Software Foundation may publish revised and/or new versions
-of the General Public License from time to time. Such new versions will
-be similar in spirit to the present version, but may differ in detail to
-address new problems or concerns.
-
-Each version is given a distinguishing version number. If the Program
-specifies a version number of the license which applies to it and ``any
-later version'', you have the option of following the terms and conditions
-either of that version or of any later version published by the Free
-Software Foundation. If the Program does not specify a version number of
-the license, you may choose any version ever published by the Free Software
-Foundation.
-
-@item
-If you wish to incorporate parts of the Program into other free
-programs whose distribution conditions are different, write to the author
-to ask for permission. For software which is copyrighted by the Free
-Software Foundation, write to the Free Software Foundation; we sometimes
-make exceptions for this. Our decision will be guided by the two goals
-of preserving the free status of all derivatives of our free software and
-of promoting the sharing and reuse of software generally.
-
-@iftex
-@heading NO WARRANTY
-@end iftex
-@ifinfo
-@center NO WARRANTY
-@end ifinfo
-
-@item
-BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
-FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
-OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
-PROVIDE THE PROGRAM ``AS IS'' WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
-OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
-MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
-TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
-PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
-REPAIR OR CORRECTION.
-
-@item
-IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL
-ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
-REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
-INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES
-ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT
-LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES
-SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE
-WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN
-ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
-@end enumerate
-
-@iftex
-@heading END OF TERMS AND CONDITIONS
-@end iftex
-@ifinfo
-@center END OF TERMS AND CONDITIONS
-@end ifinfo
-
-@page
-@unnumberedsec Appendix: How to Apply These Terms to Your New Programs
-
- If you develop a new program, and you want it to be of the greatest
-possible use to humanity, the best way to achieve this is to make it
-free software which everyone can redistribute and change under these
-terms.
-
- To do so, attach the following notices to the program. It is safest to
-attach them to the start of each source file to most effectively convey
-the exclusion of warranty; and each file should have at least the
-``copyright'' line and a pointer to where the full notice is found.
-
-@smallexample
-@var{one line to give the program's name and a brief idea of what it does.}
-Copyright (C) 19@var{yy} @var{name of author}
-
-This program is free software; you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation; either version 1, or (at your option)
-any later version.
-
-This program is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-GNU General Public License for more details.
-
-You should have received a copy of the GNU General Public License
-along with this program; if not, write to the Free Software
-Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
-@end smallexample
-
-Also add information on how to contact you by electronic and paper mail.
-
-If the program is interactive, make it output a short notice like this
-when it starts in an interactive mode:
-
-@smallexample
-Gnomovision version 69, Copyright (C) 19@var{yy} @var{name of author}
-Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
-This is free software, and you are welcome to redistribute it
-under certain conditions; type `show c' for details.
-@end smallexample
-
-The hypothetical commands `show w' and `show c' should show the
-appropriate parts of the General Public License. Of course, the
-commands you use may be called something other than `show w' and `show
-c'; they could even be mouse-clicks or menu items---whatever suits your
-program.
-
-You should also get your employer (if you work as a programmer) or your
-school, if any, to sign a ``copyright disclaimer'' for the program, if
-necessary. Here a sample; alter the names:
-
-@example
-Yoyodyne, Inc., hereby disclaims all copyright interest in the
-program `Gnomovision' (a program to direct compilers to make passes
-at assemblers) written by James Hacker.
-
-@var{signature of Ty Coon}, 1 April 1989
-Ty Coon, President of Vice
-@end example
-
-That's all there is to it!
-
-@node Contributors, Motivation, Copying, Top
-@unnumbered Contributors to GNU @code{gperf} Utility
-
-@itemize @bullet
-@item
-The GNU @code{gperf} perfect hash function generator utility was
-originally written in GNU C++ by Douglas C. Schmidt. It is now also
-available in a highly-portable ``old-style'' C version. The general
-idea for the perfect hash function generator was inspired by Keith
-Bostic's algorithm written in C, and distributed to net.sources around
-1984. The current program is a heavily modified, enhanced, and extended
-implementation of Keith's basic idea, created at the University of
-California, Irvine. Bugs, patches, and suggestions should be reported
-to schmidt at ics.uci.edu.
-
-@item
-Special thanks is extended to Michael Tiemann and Doug Lea, for
-providing a useful compiler, and for giving me a forum to exhibit my
-creation.
-
-In addition, Adam de Boor and Nels Olson provided many tips and insights
-that greatly helped improve the quality and functionality of @code{gperf}.
-@end itemize
-
-@node Motivation, Search Structures, Contributors, Top
-@chapter Introduction
-
-@code{gperf} is a perfect hash function generator written in C++. It
-transforms an @emph{n} element user-specified keyword set @emph{W} into
-a perfect hash function @emph{F}. @emph{F} uniquely maps keywords in
-@emph{W} onto the range 0..@emph{k}, where @emph{k} >= @emph{n}. If
-@emph{k = n} then @emph{F} is a @emph{minimal} perfect hash function.
-@code{gperf} generates a 0..@emph{k} element static lookup table and a
-pair of C functions. These functions determine whether a given
-character string @emph{s} occurs in @emph{W}, using at most one probe
-into the lookup table.
-
-@code{gperf} currently generates the reserved keyword recognizer for
-lexical analyzers in several production and research compilers and
-language processing tools, including GNU C, GNU C++, GNU Pascal, GNU
-Modula 3, and GNU indent. Complete C++ source code for @code{gperf} is
-available via anonymous ftp from ics.uci.edu. @code{gperf} also is
-distributed along with the GNU libg++ library. A highly portable,
-functionally equivalent K&R C version of @code{gperf} is archived in
-comp.sources.unix, volume 20. Finally, a paper describing
-@code{gperf}'s design and implementation in greater detail is available
-in the Second USENIX C++ Conference proceedings.
-
-@node Search Structures, Description, Motivation, Top
-@chapter Static search structures and GNU @code{gperf}
-
-A @dfn{static search structure} is an Abstract Data Type with certain
-fundamental operations, @emph{e.g.}, @emph{initialize}, @emph{insert},
-and @emph{retrieve}. Conceptually, all insertions occur before any
-retrievals. In practice, @code{gperf} generates a @code{static} array
-containing search set keywords and any associated attributes specified
-by the user. Thus, there is essentially no execution-time cost for the
-insertions. It is a useful data structure for representing @emph{static
-search sets}. Static search sets occur frequently in software system
-applications. Typical static search sets include compiler reserved
-words, assembler instruction opcodes, and built-in shell interpreter
-commands. Search set members, called @dfn{keywords}, are inserted into
-the structure only once, usually during program initialization, and are
-not generally modified at run-time.
-
-Numerous static search structure implementations exist, @emph{e.g.},
-arrays, linked lists, binary search trees, digital search tries, and
-hash tables. Different approaches offer trade-offs between space
-utilization and search time efficiency. For example, an @emph{n} element
-sorted array is space efficient, though the average-case time
-complexity for retrieval operations using binary search is
-proportional to log @emph{n}. Conversely, hash table implementations
-often locate a table entry in constant time, but typically impose
-additional memory overhead and exhibit poor worst case performance.
-
-
-@emph{Minimal perfect hash functions} provide an optimal solution for a
-particular class of static search sets. A minimal perfect hash
-function is defined by two properties:
-
-@itemize @bullet
-@item
-It allows keyword recognition in a static search set using at most
-@emph{one} probe into the hash table. This represents the ``perfect''
-property.
-@item
-The actual memory allocated to store the keywords is precisely large
-enough for the keyword set, and @emph{no larger}. This is the
-``minimal'' property.
-@end itemize
-
-For most applications it is far easier to generate @emph{perfect} hash
-functions than @emph{minimal perfect} hash functions. Moreover,
-non-minimal perfect hash functions frequently execute faster than
-minimal ones in practice. This phenomena occurs since searching a
-sparse keyword table increases the probability of locating a ``null''
-entry, thereby reducing string comparisons. @code{gperf}'s default
-behavior generates @emph{near-minimal} perfect hash functions for
-keyword sets. However, @code{gperf} provides many options that permit
-user control over the degree of minimality and perfection.
-
-Static search sets often exhibit relative stability over time. For
-example, Ada's 63 reserved words have remained constant for nearly a
-decade. It is therefore frequently worthwhile to expend concerted
-effort building an optimal search structure @emph{once}, if it
-subsequently receives heavy use multiple times. @code{gperf} removes
-the drudgery associated with constructing time- and space-efficient
-search structures by hand. It has proven a useful and practical tool
-for serious programming projects. Output from @code{gperf} is currently
-used in several production and research compilers, including GNU C, GNU
-C++, GNU Pascal, and GNU Modula 3. The latter two compilers are not yet
-part of the official GNU distribution. Each compiler utilizes
-@code{gperf} to automatically generate static search structures that
-efficiently identify their respective reserved keywords.
-
-@node Description, Options, Search Structures, Top
-@chapter High-Level Description of GNU @code{gperf}
-
-@menu
-* Input Format:: Input Format to @code{gperf}
-* Output Format:: Output Format for Generated C Code with @code{gperf}
-@end menu
-
-The perfect hash function generator @code{gperf} reads a set of
-``keywords'' from a @dfn{keyfile} (or from the standard input by
-default). It attempts to derive a perfect hashing function that
-recognizes a member of the @dfn{static keyword set} with at most a
-single probe into the lookup table. If @code{gperf} succeeds in
-generating such a function it produces a pair of C source code routines
-that perform hashing and table lookup recognition. All generated C code
-is directed to the standard output. Command-line options described
-below allow you to modify the input and output format to @code{gperf}.
-
-By default, @code{gperf} attempts to produce time-efficient code, with
-less emphasis on efficient space utilization. However, several options
-exist that permit trading-off execution time for storage space and vice
-versa. In particular, expanding the generated table size produces a
-sparse search structure, generally yielding faster searches.
-Conversely, you can direct @code{gperf} to utilize a C @code{switch}
-statement scheme that minimizes data space storage size. Furthermore,
-using a C @code{switch} may actually speed up the keyword retrieval time
-somewhat. Actual results depend on your C compiler, of course.
-
-In general, @code{gperf} assigns values to the characters it is using
-for hashing until some set of values gives each keyword a unique value.
-A helpful heuristic is that the larger the hash value range, the easier
-it is for @code{gperf} to find and generate a perfect hash function.
-Experimentation is the key to getting the most from @code{gperf}.
-
-@node Input Format, Output Format, Description, Description
-@section Input Format to @code{gperf}
-
-You can control the input keyfile format by varying certain command-line
-arguments, in particular the @samp{-t} option. The input's appearance
-is similar to GNU utilities @code{flex} and @code{bison} (or UNIX
-utilities @code{lex} and @code{yacc}). Here's an outline of the general
-format:
-
-@example
-@group
-declarations
-%%
-keywords
-%%
-functions
-@end group
-@end example
-
-@emph{Unlike} @code{flex} or @code{bison}, all sections of @code{gperf}'s input
-are optional. The following sections describe the input format for each
-section.
-
-@menu
-* Declarations:: @code{struct} Declarations and C Code Inclusion.
-* Keywords:: Format for Keyword Entries.
-* Functions:: Including Additional C Functions.
-@end menu
-
-@node Declarations, Keywords, Input Format, Input Format
-@subsection @code{struct} Declarations and C Code Inclusion
-
-The keyword input file optionally contains a section for including
-arbitrary C declarations and definitions, as well as provisions for
-providing a user-supplied @code{struct}. If the @samp{-t} option
-@emph{is} enabled, you @emph{must} provide a C @code{struct} as the last
-component in the declaration section from the keyfile file. The first
-field in this struct must be a @code{char *} identifier called ``name,''
-although it is possible to modify this field's name with the @samp{-K}
-option described below.
-
-Here is simple example, using months of the year and their attributes as
-input:
-
-@example
-@group
-struct months @{ char *name; int number; int days; int leap_days; @};
-%%
-january, 1, 31, 31
-february, 2, 28, 29
-march, 3, 31, 31
-april, 4, 30, 30
-may, 5, 31, 31
-june, 6, 30, 30
-july, 7, 31, 31
-august, 8, 31, 31
-september, 9, 30, 30
-october, 10, 31, 31
-november, 11, 30, 30
-december, 12, 31, 31
-@end group
-@end example
-
-Separating the @code{struct} declaration from the list of key words and
-other fields are a pair of consecutive percent signs, @code{%%},
-appearing left justified in the first column, as in the UNIX utility
-@code{lex}.
-
-Using a syntax similar to GNU utilities @code{flex} and @code{bison}, it
-is possible to directly include C source text and comments verbatim into
-the generated output file. This is accomplished by enclosing the region
-inside left-justified surrounding @code{%@{}, @code{%@}} pairs. Here is
-an input fragment based on the previous example that illustrates this
-feature:
-
-@example
-@group
-%@{
-#include <assert.h>
-/* This section of code is inserted directly into the output. */
-int return_month_days (struct months *months, int is_leap_year);
-%@}
-struct months @{ char *name; int number; int days; int leap_days; @};
-%%
-january, 1, 31, 31
-february, 2, 28, 29
-march, 3, 31, 31
-...
-@end group
-@end example
-
-It is possible to omit the declaration section entirely. In this case
-the keyfile begins directly with the first keyword line, @emph{e.g.}:
-
-@example
-@group
-january, 1, 31, 31
-february, 2, 28, 29
-march, 3, 31, 31
-april, 4, 30, 30
-...
-@end group
-@end example
-
-@node Keywords, Functions, Declarations, Input Format
-@subsection Format for Keyword Entries
-
-The second keyfile format section contains lines of keywords and any
-associated attributes you might supply. A line beginning with @samp{#}
-in the first column is considered a comment. Everything following the
-@samp{#} is ignored, up to and including the following newline.
-
-The first field of each non-comment line is always the key itself. It
-should be given as a simple name, @emph{i.e.}, without surrounding
-string quotation marks, and be left-justified flush against the first
-column. In this context, a ``field'' is considered to extend up to, but
-not include, the first blank, comma, or newline. Here is a simple
-example taken from a partial list of C reserved words:
-
-@example
-@group
-# These are a few C reserved words, see the c.@code{gperf} file
-# for a complete list of ANSI C reserved words.
-unsigned
-sizeof
-switch
-signed
-if
-default
-for
-while
-return
-@end group
-@end example
-
-Note that unlike @code{flex} or @code{bison} the first @code{%%} marker
-may be elided if the declaration section is empty.
-
-Additional fields may optionally follow the leading keyword. Fields
-should be separated by commas, and terminate at the end of line. What
-these fields mean is entirely up to you; they are used to initialize the
-elements of the user-defined @code{struct} provided by you in the
-declaration section. If the @samp{-t} option is @emph{not} enabled
-these fields are simply ignored. All previous examples except the last
-one contain keyword attributes.
-
-@node Functions, , Keywords, Input Format
-@subsection Including Additional C Functions
-
-The optional third section also corresponds closely with conventions
-found in @code{flex} and @code{bison}. All text in this section,
-starting at the final @code{%%} and extending to the end of the input
-file, is included verbatim into the generated output file. Naturally,
-it is your responsibility to ensure that the code contained in this
-section is valid C.
-
-@node Output Format, , Input Format, Description
-@section Output Format for Generated C Code with @code{gperf}
-
-Several options control how the generated C code appears on the standard
-output. Two C function are generated. They are called @code{hash} and
-@code{in_word_set}, although you may modify the name for
-@code{in_word_set} with a command-line option. Both functions require
-two arguments, a string, @code{char *} @var{str}, and a length
-parameter, @code{int} @var{len}. Their default function prototypes are
-as follows:
-
-@example
-@group
-static int hash (char *str, int len);
-int in_word_set (char *str, int len);
-@end group
-@end example
-
-By default, the generated @code{hash} function returns an integer value
-created by adding @var{len} to several user-specified @var{str} key
-positions indexed into an @dfn{associated values} table stored in a
-local static array. The associated values table is constructed
-internally by @code{gperf} and later output as a static local C array called
-@var{hash_table}; its meaning and properties are described below.
-@xref{Implementation}. The relevant key positions are specified via the
-@samp{-k} option when running @code{gperf}, as detailed in the @emph{Options}
-section below. @xref{Options}.
-
-Two options, @samp{-g} (assume you are compiling with GNU C and its
-@code{inline} feature) and @samp{-a} (assume ANSI C-style function
-prototypes), alter the content of both the generated @code{hash} and
-@code{in_word_set} routines. However, function @code{in_word_set} may
-be modified more extensively, in response to your option settings. The
-options that affect the @code{in_word_set} structure are:
-
-@itemize @bullet
-@table @samp
-@item -p
-Have function @code{in_word_set} return a pointer rather than a boolean.
-
-@item -t
-Make use of the user-defined @code{struct}.
-
-@item -S @var{total switch statements}
-Generate 1 or more C @code{switch} statement rather than use a large,
-(and potentially sparse) static array. Although the exact time and
-space savings of this approach vary according to your C compiler's
-degree of optimization, this method often results in smaller and faster
-code.
-@end table
-@end itemize
-
-If the @samp{-t}, @samp{-S}, and @samp{-p} options are omitted the
-default action is to generate a @code{char *} array containing the keys,
-together with additional null strings used for padding the array. By
-experimenting with the various input and output options, and timing the
-resulting C code, you can determine the best option choices for
-different keyword set characteristics.
-
-@node Options, Bugs, Description, Top
-@chapter Options to the @code{gperf} Utility
-
-There are @emph{many} options to @code{gperf}. They were added to make
-the program more convenient for use with real applications. ``On-line''
-help is readily available via the @samp{-h} option. Other options
-include:
-
-@itemize @bullet
-@table @samp
-@item -a
-Generate ANSI Standard C code using function prototypes. The default is
-to use ``classic'' K&R C function declaration syntax.
-
-@item -c
-Generates C code that uses the @code{strncmp} function to perform
-string comparisons. The default action is to use @code{strcmp}.
-
-@item -C
-Makes the contents of all generated lookup tables constant, @emph{i.e.},
-``readonly.'' Many compilers can generate more efficient code for this
-by putting the tables in readonly memory.
-
-@item -d
-Enables the debugging option. This produces verbose diagnostics to
-``standard error'' when @code{gperf} is executing. It is useful both for
-maintaining the program and for determining whether a given set of
-options is actually speeding up the search for a solution. Some useful
-information is dumped at the end of the program when the @samp{-d}
-option is enabled.
-
-@item -D
-Handle keywords whose key position sets hash to duplicate values.
-Duplicate hash values occur for two reasons:
-
-@itemize @bullet
-@item
-Since @code{gperf} does not backtrack it is possible for it to process
-all your input keywords without finding a unique mapping for each word.
-However, frequently only a very small number of duplicates occur, and
-the majority of keys still require one probe into the table.
-@item
-Sometimes a set of keys may have the same names, but possess different
-attributes. With the -D option @code{gperf} treats all these keys as part of
-an equivalence class and generates a perfect hash function with multiple
-comparisons for duplicate keys. It is up to you to completely
-disambiguate the keywords by modifying the generated C code. However,
-@code{gperf} helps you out by organizing the output.
-@end itemize
-
-Option @samp{-D} is extremely useful for certain large or highly
-redundant keyword sets, @emph{i.e.}, assembler instruction opcodes.
-Using this option usually means that the generated hash function is no
-longer perfect. On the other hand, it permits @code{gperf} to work on
-keyword sets that it otherwise could not handle.
-
-@item -e @var{keyword delimiter list}
-Allows the user to provide a string containing delimiters used to
-separate keywords from their attributes. The default is ",\n". This
-option is essential if you want to use keywords that have embedded
-commas or newlines. One useful trick is to use -e'TAB', where TAB is
-the literal tab character.
-
-@item -E
-Define constant values using an enum local to the lookup function rather
-than with #defines. This also means that different lookup functions can
-reside in the same file. Thanks to James Clark (jjc at ai.mit.edu).
-
-@item -f @var{iteration amount}
-Generate the perfect hash function ``fast.'' This decreases @code{gperf}'s
-running time at the cost of minimizing generated table-size. The
-iteration amount represents the number of times to iterate when
-resolving a collision. `0' means `iterate by the number of keywords.
-This option is probably most useful when used in conjunction with options
-@samp{-D} and/or @samp{-S} for @emph{large} keyword sets.
-
-@item -g
-Assume a GNU compiler, @emph{e.g.}, @code{g++} or @code{gcc}. This
-makes all generated routines use the ``inline'' keyword to remove the
-cost of function calls. Note that @samp{-g} does @emph{not} imply
-@samp{-a}, since other non-ANSI C compilers may have provisions for a
-function @code{inline} feature.
-
-@item -G
-Generate the static table of keywords as a static global variable,
-rather than hiding it inside of the lookup function (which is the
-default behavior).
-
-@item -h
-Prints a short summary on the meaning of each program option. Aborts
-further program execution.
-
-@item -H @var{hash function name}
-Allows you to specify the name for the generated hash function. Default
-name is `hash.' This option permits the use of two hash tables in the
-same file.
-
-@item -i @var{initial value}
-Provides an initial @var{value} for the associate values array. Default
-is 0. Increasing the initial value helps inflate the final table size,
-possibly leading to more time efficient keyword lookups. Note that this
-option is not particularly useful when @samp{-S} is used. Also,
-@samp{-i} is overriden when the @samp{-r} option is used.
-
-@item -j @var{jump value}
-Affects the ``jump value,'' @emph{i.e.}, how far to advance the
-associated character value upon collisions. @var{Jump value} is rounded
-up to an odd number, the default is 5. If the @var{jump value} is 0 @code{gper
-f}
-jumps by random amounts.
-
-@item -k @var{keys}
-Allows selection of the character key positions used in the keywords'
-hash function. The allowable choices range between 1-126, inclusive.
-The positions are separated by commas, @emph{e.g.}, @samp{-k 9,4,13,14};
-ranges may be used, @emph{e.g.}, @samp{-k 2-7}; and positions may occur
-in any order. Furthermore, the meta-character '*' causes the generated
-hash function to consider @strong{all} character positions in each key,
-whereas '$' instructs the hash function to use the ``final character''
-of a key (this is the only way to use a character position greater than
-126, incidentally).
-
-For instance, the option @samp{-k 1,2,4,6-10,'$'} generates a hash
-function that considers positions 1,2,4,6,7,8,9,10, plus the last
-character in each key (which may differ for each key, obviously). Keys
-with length less than the indicated key positions work properly, since
-selected key positions exceeding the key length are simply not
-referenced in the hash function.
-
-@item -K @var{key name}
-By default, the program assumes the structure component identifier for
-the keyword is ``name.'' This option allows an arbitrary choice of
-identifier for this component, although it still must occur as the first
-field in your supplied @code{struct}.
-
-@item -l
-Compare key lengths before trying a string comparison. This might cut
-down on the number of string comparisons made during the lookup, since
-keys with different lengths are never compared via @code{strcmp}.
-However, using @samp{-l} might greatly increase the size of the
-generated C code if the lookup table range is large (which implies that
-the switch option @samp{-S} is not enabled), since the length table
-contains as many elements as there are entries in the lookup table.
-
-@item -L @var{generated language name}
-Instructs @code{gperf} to generate code in the language specified by the
-option's argument. Languages handled are currently C++ and C. The
-default is C.
-
-@item -n
-Instructs the generator not to include the length of a keyword when
-computing its hash value. This may save a few assembly instructions in
-the generated lookup table.
-
-@item -N @var{lookup function name}
-Allows you to specify the name for the generated lookup function.
-Default name is `in_word_set.' This option permits completely automatic
-generation of perfect hash functions, especially when multiple generated
-hash functions are used in the same application.
-
-@item -o
-Reorders the keywords by sorting the keywords so that frequently
-occuring key position set components appear first. A second reordering
-pass follows so that keys with ``already determined values'' are placed
-towards the front of the keylist. This may decrease the time required
-to generate a perfect hash function for many keyword sets, and also
-produce more minimal perfect hash functions. The reason for this is
-that the reordering helps prune the search time by handling inevitable
-collisions early in the search process. On the other hand, if the
-number of keywords is @emph{very} large using @samp{-o} may
-@emph{increase} @code{gperf}'s execution time, since collisions will begin
-earlier and continue throughout the remainder of keyword processing.
-See Cichelli's paper from the January 1980 Communications of the ACM for
-details.
-
-@item -p
-Changes the return value of the generated function @code{in_word_set}
-from boolean (@emph{i.e.}, 0 or 1), to either type ``pointer to
-user-defined struct,'' (if the @samp{-t} option is enabled), or simply
-to @code{char *}, if @samp{-t} is not enabled. This option is most
-useful when the @samp{-t} option (allowing user-defined structs) is
-used. For example, it is possible to automatically generate the GNU C
-reserved word lookup routine with the options @samp{-p} and @samp{-t}.
-
-@item -r
-Utilizes randomness to initialize the associated values table. This
-frequently generates solutions faster than using deterministic
-initialization (which starts all associated values at 0). Furthermore,
-using the randomization option generally increases the size of the
-table. If @code{gperf} has difficultly with a certain keyword set try using
-@samp{-r} or @samp{-D}.
-
-@item -s @var{size-multiple}
-Affects the size of the generated hash table. The numeric argument for
-this option indicates ``how many times larger or smaller'' the maximum
-associated value range should be, in relationship to the number of keys.
-If the @var{size-multiple} is negative the maximum associated value is
-calculated by @emph{dividing} it into the total number of keys. For
-example, a value of 3 means ``allow the maximum associated value to be
-about 3 times larger than the number of input keys.''
-
-Conversely, a value of -3 means ``allow the maximum associated value to
-be about 3 times smaller than the number of input keys.'' Negative
-values are useful for limiting the overall size of the generated hash
-table, though this usually increases the number of duplicate hash
-values.
-
-If `generate switch' option @samp{-S} is @emph{not} enabled, the maximum
-associated value influences the static array table size, and a larger
-table should decrease the time required for an unsuccessful search, at
-the expense of extra table space.
-
-The default value is 1, thus the default maximum associated value about
-the same size as the number of keys (for efficiency, the maximum
-associated value is always rounded up to a power of 2). The actual
-table size may vary somewhat, since this technique is essentially a
-heuristic. In particular, setting this value too high slows down
-@code{gperf}'s runtime, since it must search through a much larger range
-of values. Judicious use of the @samp{-f} option helps alleviate this
-overhead, however.
-
-@item -S @var{total switch statements}
-Causes the generated C code to use a @code{switch} statement scheme,
-rather than an array lookup table. This can lead to a reduction in both
-time and space requirements for some keyfiles. The argument to this
-option determines how many @code{switch} statements are generated. A
-value of 1 generates 1 @code{switch} containing all the elements, a
-value of 2 generates 2 tables with 1/2 the elements in each
-@code{switch}, etc. This is useful since many C compilers cannot
-correctly generate code for large @code{switch} statements. This option
-was inspired in part by Keith Bostic's original C program.
-
-@item -t
-Allows you to include a @code{struct} type declaration for generated
-code. Any text before a pair of consecutive %% is consider part of the
-type declaration. Key words and additional fields may follow this, one
-group of fields per line. A set of examples for generating perfect hash
-tables and functions for Ada, C, and G++, Pascal, and Modula 2 and 3
-reserved words are distributed with this release.
-
-@item -T
-Prevents the transfer of the type declaration to the output file. Use
-this option if the type is already defined elsewhere.
-
-@item -v
-Prints out the current version number.
-
-@item -Z @var{class name}
-Allow user to specify name of generated C++ class. Default name is
-@code{Perfect_Hash}.
-@end table
-@end itemize
-
-@node Bugs, Projects, Options, Top
-@chapter Known Bugs and Limitations with @code{gperf}
-
-The following are some limitations with the current release of
-@code{gperf}:
-
-@itemize @bullet
-@item
-The @code{gperf} utility is tuned to execute quickly, and works quickly
-for small to medium size data sets (around 1000 keywords). It is
-extremely useful for maintaining perfect hash functions for compiler
-keyword sets. Several recent enhancements now enable @code{gperf} to
-work efficiently on much larger keyword sets (over 15,000 keywords).
-When processing large keyword sets it helps greatly to have over 8 megs
-of RAM.
-
-However, since @code{gperf} does not backtrack no guaranteed solution
-occurs on every run. On the other hand, it is usually easy to obtain a
-solution by varying the option parameters. In particular, try the
-@samp{-r} option, and also try changing the default arguments to the
-@samp{-s} and @samp{-j} options. To @emph{guarantee} a solution, use
-the @samp{-D} and @samp{-S} options, although the final results are not
-likely to be a @emph{perfect} hash function anymore! Finally, use the
-@samp{-f} option if you want @code{gperf} to generate the perfect hash
-function @emph{fast}, with less emphasis on making it minimal.
-
-@item
-The size of the generate static keyword array can get @emph{extremely}
-large if the input keyword file is large or if the keywords are quite
-similar. This tends to slow down the compilation of the generated C
-code, and @emph{greatly} inflates the object code size. If this
-situation occurs, consider using the @samp{-S} option to reduce data
-size, potentially increasing keyword recognition time a negligible
-amount. Since many C compilers cannot correctly generated code for
-large switch statements it is important to qualify the @var{-S} option
-with an appropriate numerical argument that controls the number of
-switch statements generated.
-
-@item
-The maximum number of key positions selected for a given key has an
-arbitrary limit of 126. This restriction should be removed, and if
-anyone considers this a problem write me and let me know so I can remove
-the constraint.
-
-@item
-The C++ source code only compiles correctly with GNU G++, version 1.36
-(and hopefully later versions). Porting to AT&T cfront would be
-tedious, but possible (and desirable). There is also a K&R C version
-available now. This should compile without change on most BSD systems,
-but may require a bit of work to run on SYSV, since @code{gperf} uses
-@var{alloca} in several places. Send mail to schmidt at ics.uci.edu for
-information.
-@end itemize
-
-@node Projects, Implementation, Bugs, Top
-@chapter Things Still Left to Do
-
-It should be ``relatively'' easy to replace the current perfect hash
-function algorithm with a more exhaustive approach; the perfect hash
-module is essential independent from other program modules. Additional
-worthwhile improvements include:
-
-@itemize @bullet
-@item
-Make the algorithm more robust. At present, the program halts with an
-error diagnostic if it can't find a direct solution and the @samp{-D}
-option is not enabled. A more comprehensive, albeit computationally
-expensive, approach would employ backtracking or enable alternative
-options and retry. It's not clear how helpful this would be, in
-general, since most search sets are rather small in practice.
-
-@item
-Another useful extension involves modifying the program to generate
-``minimal'' perfect hash functions (under certain circumstances, the
-current version can be rather extravagant in the generated table size).
-Again, this is mostly of theoretical interest, since a sparse table
-often produces faster lookups, and use of the @samp{-S} @code{switch}
-option can minimize the data size, at the expense of slightly longer
-lookups (note that the gcc compiler generally produces good code for
-@code{switch} statements, reducing the need for more complex schemes).
-
-@item
-In addition to improving the algorithm, it would also be useful to
-generate a C++ class or Ada package as the code output, in addition to
-the current C routines.
-@end itemize
-
-@node Implementation, Bibliography, Projects, Top
-@chapter Implementation Details of GNU @code{gperf}
-
-A paper describing the high-level description of the data structures and
-algorithms used to implement @code{gperf} will soon be available. This
-paper is useful not only from a maintenance and enhancement perspective,
-but also because they demonstrate several clever and useful programming
-techniques, @emph{e.g.}, `Iteration Number' boolean arrays, double
-hashing, a ``safe'' and efficient method for reading arbitrarily long
-input from a file, and a provably optimal algorithm for simultaneously
-determining both the minimum and maximum elements in a list.
-
-@page
-
-@node Bibliography, , Implementation, Top
-@chapter Bibliography
-
-[1] Chang, C.C.: @i{A Scheme for Constructing Ordered Minimal Perfect
-Hashing Functions} Information Sciences 39(1986), 187-195.
-
-[2] Cichelli, Richard J. @i{Author's Response to ``On Cichelli's Minimal Perfec
-t Hash
-Functions Method''} Communications of the ACM, 23, 12(December 1980), 729.
-
-[3] Cichelli, Richard J. @i{Minimal Perfect Hash Functions Made Simple}
-Communications of the ACM, 23, 1(January 1980), 17-19.
-
-[4] Cook, C. R. and Oldehoeft, R.R. @i{A Letter Oriented Minimal
-Perfect Hashing Function} SIGPLAN Notices, 17, 9(September 1982), 18-27.
-
-[5] Cormack, G. V. and Horspool, R. N. S. and Kaiserwerth, M.
-@i{Practical Perfect Hashing} Computer Journal, 28, 1(January 1985), 54-58.
-
-[6] Jaeschke, G. @i{Reciprocal Hashing: A Method for Generating Minimal
-Perfect Hashing Functions} Communications of the ACM, 24, 12(December
-1981), 829-833.
-
-[7] Jaeschke, G. and Osterburg, G. @i{On Cichelli's Minimal Perfect
-Hash Functions Method} Communications of the ACM, 23, 12(December 1980),
-728-729.
-
-[8] Sager, Thomas J. @i{A Polynomial Time Generator for Minimal Perfect
-Hash Functions} Communications of the ACM, 28, 5(December 1985), 523-532
-
-[9] Schmidt, Douglas C. @i{GPERF: A Perfect Hash Function Generator}
-Second USENIX C++ Conference Proceedings, April 1990.
-
-[10] Sebesta, R.W. and Taylor, M.A. @i{Minimal Perfect Hash Functions
-for Reserved Word Lists} SIGPLAN Notices, 20, 12(September 1985), 47-53.
-
-[11] Sprugnoli, R. @i{Perfect Hashing Functions: A Single Probe
-Retrieving Method for Static Sets} Communications of the ACM, 20
-11(November 1977), 841-850.
-
-[12] Stallman, Richard M. @i{Using and Porting GNU CC} Free Software Foundation,
-1988.
-
-[13] Stroustrup, Bjarne @i{The C++ Programming Language.} Addison-Wesley, 1986.
-
-[14] Tiemann, Michael D. @i{User's Guide to GNU C++} Free Software
-Foundation, 1989.
-
-@contents
-@bye