summaryrefslogtreecommitdiff
path: root/apps/gperf/gperf.info
diff options
context:
space:
mode:
Diffstat (limited to 'apps/gperf/gperf.info')
-rw-r--r--apps/gperf/gperf.info1127
1 files changed, 0 insertions, 1127 deletions
diff --git a/apps/gperf/gperf.info b/apps/gperf/gperf.info
deleted file mode 100644
index a0947230573..00000000000
--- a/apps/gperf/gperf.info
+++ /dev/null
@@ -1,1127 +0,0 @@
-This is Info file gperf.info, produced by Makeinfo-1.55 from the input
-file ./gperf.texi.
-
-START-INFO-DIR-ENTRY
-* Gperf: (gperf). Perfect Hash Function Generator.
-END-INFO-DIR-ENTRY
-
- This file documents the features of the GNU Perfect Hash Function
-Generator
-
- Copyright (C) 1989 Free Software Foundation, Inc.
-
- Permission is granted to make and distribute verbatim copies of this
-manual provided the copyright notice and this permission notice are
-preserved on all copies.
-
- Permission is granted to copy and distribute modified versions of
-this manual under the conditions for verbatim copying, provided also
-that the section entitled "GNU General Public License" is included
-exactly as in the original, and provided that the entire resulting
-derived work is distributed under the terms of a permission notice
-identical to this one.
-
- Permission is granted to copy and distribute translations of this
-manual into another language, under the above conditions for modified
-versions, except that the section entitled "GNU `gperf' General Public
-License" an d this permission notice may be included in translations
-approved by the Free Software Foundation instead of in the original
-English.
-
-
-File: gperf.info, Node: Top, Next: Copying, Prev: (dir), Up: (dir)
-
-Introduction
-************
-
- This manual documents the GNU `gperf' perfect hash function generator
-utility, focusing on its features and how to use them, and how to report
-bugs.
-
-* Menu:
-
-* Copying:: GNU `gperf' General Public License says
- how you can copy and share `gperf'.
-* Contributors:: People who have contributed to `gperf'.
-* Motivation:: Static search structures and GNU GPERF.
-* Search Structures:: Static search structures and GNU `gperf'
-* Description:: High-level discussion of how GPERF functions.
-* Options:: A description of options to the program.
-* Bugs:: Known bugs and limitations with GPERF.
-* Projects:: Things still left to do.
-* Implementation:: Implementation Details for GNU GPERF.
-* Bibliography:: Material Referenced in this Report.
-
- -- The Detailed Node Listing --
-
-High-Level Description of GNU `gperf'
-
-* Input Format:: Input Format to `gperf'
-* Output Format:: Output Format for Generated C Code with `gperf'
-
-Input Format to `gperf'
-
-* Declarations:: `struct' Declarations and C Code Inclusion.
-* Keywords:: Format for Keyword Entries.
-* Functions:: Including Additional C Functions.
-
-
-File: gperf.info, Node: Copying, Next: Contributors, Prev: Top, Up: Top
-
-GNU GENERAL PUBLIC LICENSE
-**************************
-
- Version 1, February 1989
-
- Copyright (C) 1989 Free Software Foundation, Inc.
- 675 Mass Ave, Cambridge, MA 02139, USA
-
- Everyone is permitted to copy and distribute verbatim copies
- of this license document, but changing it is not allowed.
-
-Preamble
-========
-
- The license agreements of most software companies try to keep users
-at the mercy of those companies. By contrast, our General Public
-License is intended to guarantee your freedom to share and change free
-software--to make sure the software is free for all its users. The
-General Public License applies to the Free Software Foundation's
-software and to any other program whose authors commit to using it.
-You can use it for your programs, too.
-
- When we speak of free software, we are referring to freedom, not
-price. Specifically, the General Public License is designed to make
-sure that you have the freedom to give away or sell copies of free
-software, that you receive source code or can get it if you want it,
-that you can change the software or use pieces of it in new free
-programs; and that you know you can do these things.
-
- To protect your rights, we need to make restrictions that forbid
-anyone to deny you these rights or to ask you to surrender the rights.
-These restrictions translate to certain responsibilities for you if you
-distribute copies of the software, or if you modify it.
-
- For example, if you distribute copies of a such a program, whether
-gratis or for a fee, you must give the recipients all the rights that
-you have. You must make sure that they, too, receive or can get the
-source code. And you must tell them their rights.
-
- We protect your rights with two steps: (1) copyright the software,
-and (2) offer you this license which gives you legal permission to copy,
-distribute and/or modify the software.
-
- Also, for each author's protection and ours, we want to make certain
-that everyone understands that there is no warranty for this free
-software. If the software is modified by someone else and passed on, we
-want its recipients to know that what they have is not the original, so
-that any problems introduced by others will not reflect on the original
-authors' reputations.
-
- The precise terms and conditions for copying, distribution and
-modification follow.
-
- TERMS AND CONDITIONS
-
- 1. This License Agreement applies to any program or other work which
- contains a notice placed by the copyright holder saying it may be
- distributed under the terms of this General Public License. The
- "Program", below, refers to any such program or work, and a "work
- based on the Program" means either the Program or any work
- containing the Program or a portion of it, either verbatim or with
- modifications. Each licensee is addressed as "you".
-
- 2. You may copy and distribute verbatim copies of the Program's source
- code as you receive it, in any medium, provided that you
- conspicuously and appropriately publish on each copy an
- appropriate copyright notice and disclaimer of warranty; keep
- intact all the notices that refer to this General Public License
- and to the absence of any warranty; and give any other recipients
- of the Program a copy of this General Public License along with
- the Program. You may charge a fee for the physical act of
- transferring a copy.
-
- 3. You may modify your copy or copies of the Program or any portion of
- it, and copy and distribute such modifications under the terms of
- Paragraph 1 above, provided that you also do the following:
-
- * cause the modified files to carry prominent notices stating
- that you changed the files and the date of any change; and
-
- * cause the whole of any work that you distribute or publish,
- that in whole or in part contains the Program or any part
- thereof, either with or without modifications, to be licensed
- at no charge to all third parties under the terms of this
- General Public License (except that you may choose to grant
- warranty protection to some or all third parties, at your
- option).
-
- * If the modified program normally reads commands interactively
- when run, you must cause it, when started running for such
- interactive use in the simplest and most usual way, to print
- or display an announcement including an appropriate copyright
- notice and a notice that there is no warranty (or else,
- saying that you provide a warranty) and that users may
- redistribute the program under these conditions, and telling
- the user how to view a copy of this General Public License.
-
- * You may charge a fee for the physical act of transferring a
- copy, and you may at your option offer warranty protection in
- exchange for a fee.
-
- Mere aggregation of another independent work with the Program (or
- its derivative) on a volume of a storage or distribution medium
- does not bring the other work under the scope of these terms.
-
- 4. You may copy and distribute the Program (or a portion or
- derivative of it, under Paragraph 2) in object code or executable
- form under the terms of Paragraphs 1 and 2 above provided that you
- also do one of the following:
-
- * accompany it with the complete corresponding machine-readable
- source code, which must be distributed under the terms of
- Paragraphs 1 and 2 above; or,
-
- * accompany it with a written offer, valid for at least three
- years, to give any third party free (except for a nominal
- charge for the cost of distribution) a complete
- machine-readable copy of the corresponding source code, to be
- distributed under the terms of Paragraphs 1 and 2 above; or,
-
- * accompany it with the information you received as to where the
- corresponding source code may be obtained. (This alternative
- is allowed only for noncommercial distribution and only if you
- received the program in object code or executable form alone.)
-
- Source code for a work means the preferred form of the work for
- making modifications to it. For an executable file, complete
- source code means all the source code for all modules it contains;
- but, as a special exception, it need not include source code for
- modules which are standard libraries that accompany the operating
- system on which the executable file runs, or for standard header
- files or definitions files that accompany that operating system.
-
- 5. You may not copy, modify, sublicense, distribute or transfer the
- Program except as expressly provided under this General Public
- License. Any attempt otherwise to copy, modify, sublicense,
- distribute or transfer the Program is void, and will automatically
- terminate your rights to use the Program under this License.
- However, parties who have received copies, or rights to use
- copies, from you under this General Public License will not have
- their licenses terminated so long as such parties remain in full
- compliance.
-
- 6. By copying, distributing or modifying the Program (or any work
- based on the Program) you indicate your acceptance of this license
- to do so, and all its terms and conditions.
-
- 7. Each time you redistribute the Program (or any work based on the
- Program), the recipient automatically receives a license from the
- original licensor to copy, distribute or modify the Program
- subject to these terms and conditions. You may not impose any
- further restrictions on the recipients' exercise of the rights
- granted herein.
-
- 8. The Free Software Foundation may publish revised and/or new
- versions of the General Public License from time to time. Such
- new versions will be similar in spirit to the present version, but
- may differ in detail to address new problems or concerns.
-
- Each version is given a distinguishing version number. If the
- Program specifies a version number of the license which applies to
- it and "any later version", you have the option of following the
- terms and conditions either of that version or of any later
- version published by the Free Software Foundation. If the Program
- does not specify a version number of the license, you may choose
- any version ever published by the Free Software Foundation.
-
- 9. If you wish to incorporate parts of the Program into other free
- programs whose distribution conditions are different, write to the
- author to ask for permission. For software which is copyrighted
- by the Free Software Foundation, write to the Free Software
- Foundation; we sometimes make exceptions for this. Our decision
- will be guided by the two goals of preserving the free status of
- all derivatives of our free software and of promoting the sharing
- and reuse of software generally.
-
- NO WARRANTY
-
- 10. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO
- WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE
- LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
- HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT
- WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT
- NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
- FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE
- QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
- PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY
- SERVICING, REPAIR OR CORRECTION.
-
- 11. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
- WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY
- MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE
- LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL,
- INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR
- INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
- DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU
- OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY
- OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN
- ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
-
- END OF TERMS AND CONDITIONS
-
-Appendix: How to Apply These Terms to Your New Programs
-=======================================================
-
- If you develop a new program, and you want it to be of the greatest
-possible use to humanity, the best way to achieve this is to make it
-free software which everyone can redistribute and change under these
-terms.
-
- To do so, attach the following notices to the program. It is safest
-to attach them to the start of each source file to most effectively
-convey the exclusion of warranty; and each file should have at least the
-"copyright" line and a pointer to where the full notice is found.
-
- ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES.
- Copyright (C) 19YY NAME OF AUTHOR
-
- This program is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation; either version 1, or (at your option)
- any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
-
- You should have received a copy of the GNU General Public License
- along with this program; if not, write to the Free Software
- Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
-
- Also add information on how to contact you by electronic and paper
-mail.
-
- If the program is interactive, make it output a short notice like
-this when it starts in an interactive mode:
-
- Gnomovision version 69, Copyright (C) 19YY NAME OF AUTHOR
- Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
- This is free software, and you are welcome to redistribute it
- under certain conditions; type `show c' for details.
-
- The hypothetical commands `show w' and `show c' should show the
-appropriate parts of the General Public License. Of course, the
-commands you use may be called something other than `show w' and `show
-c'; they could even be mouse-clicks or menu items--whatever suits your
-program.
-
- You should also get your employer (if you work as a programmer) or
-your school, if any, to sign a "copyright disclaimer" for the program,
-if necessary. Here a sample; alter the names:
-
- Yoyodyne, Inc., hereby disclaims all copyright interest in the
- program `Gnomovision' (a program to direct compilers to make passes
- at assemblers) written by James Hacker.
-
- SIGNATURE OF TY COON, 1 April 1989
- Ty Coon, President of Vice
-
- That's all there is to it!
-
-
-File: gperf.info, Node: Contributors, Next: Motivation, Prev: Copying, Up: Top
-
-Contributors to GNU `gperf' Utility
-***********************************
-
- * The GNU `gperf' perfect hash function generator utility was
- originally written in GNU C++ by Douglas C. Schmidt. It is now
- also available in a highly-portable "old-style" C version. The
- general idea for the perfect hash function generator was inspired
- by Keith Bostic's algorithm written in C, and distributed to
- net.sources around 1984. The current program is a heavily
- modified, enhanced, and extended implementation of Keith's basic
- idea, created at the University of California, Irvine. Bugs,
- patches, and suggestions should be reported to schmidt at
- ics.uci.edu.
-
- * Special thanks is extended to Michael Tiemann and Doug Lea, for
- providing a useful compiler, and for giving me a forum to exhibit
- my creation.
-
- In addition, Adam de Boor and Nels Olson provided many tips and
- insights that greatly helped improve the quality and functionality
- of `gperf'.
-
-
-File: gperf.info, Node: Motivation, Next: Search Structures, Prev: Contributors, Up: Top
-
-Introduction
-************
-
- `gperf' is a perfect hash function generator written in C++. It
-transforms an *n* element user-specified keyword set *W* into a perfect
-hash function *F*. *F* uniquely maps keywords in *W* onto the range
-0..*k*, where *k* >= *n*. If *k = n* then *F* is a *minimal* perfect
-hash function. `gperf' generates a 0..*k* element static lookup table
-and a pair of C functions. These functions determine whether a given
-character string *s* occurs in *W*, using at most one probe into the
-lookup table.
-
- `gperf' currently generates the reserved keyword recognizer for
-lexical analyzers in several production and research compilers and
-language processing tools, including GNU C, GNU C++, GNU Pascal, GNU
-Modula 3, and GNU indent. Complete C++ source code for `gperf' is
-available via anonymous ftp from ics.uci.edu. `gperf' also is
-distributed along with the GNU libg++ library. A highly portable,
-functionally equivalent K&R C version of `gperf' is archived in
-comp.sources.unix, volume 20. Finally, a paper describing `gperf''s
-design and implementation in greater detail is available in the Second
-USENIX C++ Conference proceedings.
-
-
-File: gperf.info, Node: Search Structures, Next: Description, Prev: Motivation, Up: Top
-
-Static search structures and GNU `gperf'
-****************************************
-
- A "static search structure" is an Abstract Data Type with certain
-fundamental operations, *e.g.*, *initialize*, *insert*, and *retrieve*.
-Conceptually, all insertions occur before any retrievals. In
-practice, `gperf' generates a `static' array containing search set
-keywords and any associated attributes specified by the user. Thus,
-there is essentially no execution-time cost for the insertions. It is
-a useful data structure for representing *static search sets*. Static
-search sets occur frequently in software system applications. Typical
-static search sets include compiler reserved words, assembler
-instruction opcodes, and built-in shell interpreter commands. Search
-set members, called "keywords", are inserted into the structure only
-once, usually during program initialization, and are not generally
-modified at run-time.
-
- Numerous static search structure implementations exist, *e.g.*,
-arrays, linked lists, binary search trees, digital search tries, and
-hash tables. Different approaches offer trade-offs between space
-utilization and search time efficiency. For example, an *n* element
-sorted array is space efficient, though the average-case time
-complexity for retrieval operations using binary search is proportional
-to log *n*. Conversely, hash table implementations often locate a
-table entry in constant time, but typically impose additional memory
-overhead and exhibit poor worst case performance.
-
- *Minimal perfect hash functions* provide an optimal solution for a
-particular class of static search sets. A minimal perfect hash
-function is defined by two properties:
-
- * It allows keyword recognition in a static search set using at most
- *one* probe into the hash table. This represents the "perfect"
- property.
-
- * The actual memory allocated to store the keywords is precisely
- large enough for the keyword set, and *no larger*. This is the
- "minimal" property.
-
- For most applications it is far easier to generate *perfect* hash
-functions than *minimal perfect* hash functions. Moreover, non-minimal
-perfect hash functions frequently execute faster than minimal ones in
-practice. This phenomena occurs since searching a sparse keyword table
-increases the probability of locating a "null" entry, thereby reducing
-string comparisons. `gperf''s default behavior generates
-*near-minimal* perfect hash functions for keyword sets. However,
-`gperf' provides many options that permit user control over the degree
-of minimality and perfection.
-
- Static search sets often exhibit relative stability over time. For
-example, Ada's 63 reserved words have remained constant for nearly a
-decade. It is therefore frequently worthwhile to expend concerted
-effort building an optimal search structure *once*, if it subsequently
-receives heavy use multiple times. `gperf' removes the drudgery
-associated with constructing time- and space-efficient search
-structures by hand. It has proven a useful and practical tool for
-serious programming projects. Output from `gperf' is currently used in
-several production and research compilers, including GNU C, GNU C++,
-GNU Pascal, and GNU Modula 3. The latter two compilers are not yet
-part of the official GNU distribution. Each compiler utilizes `gperf'
-to automatically generate static search structures that efficiently
-identify their respective reserved keywords.
-
-
-File: gperf.info, Node: Description, Next: Options, Prev: Search Structures, Up: Top
-
-High-Level Description of GNU `gperf'
-*************************************
-
-* Menu:
-
-* Input Format:: Input Format to `gperf'
-* Output Format:: Output Format for Generated C Code with `gperf'
-
- The perfect hash function generator `gperf' reads a set of
-"keywords" from a "keyfile" (or from the standard input by default).
-It attempts to derive a perfect hashing function that recognizes a
-member of the "static keyword set" with at most a single probe into the
-lookup table. If `gperf' succeeds in generating such a function it
-produces a pair of C source code routines that perform hashing and
-table lookup recognition. All generated C code is directed to the
-standard output. Command-line options described below allow you to
-modify the input and output format to `gperf'.
-
- By default, `gperf' attempts to produce time-efficient code, with
-less emphasis on efficient space utilization. However, several options
-exist that permit trading-off execution time for storage space and vice
-versa. In particular, expanding the generated table size produces a
-sparse search structure, generally yielding faster searches.
-Conversely, you can direct `gperf' to utilize a C `switch' statement
-scheme that minimizes data space storage size. Furthermore, using a C
-`switch' may actually speed up the keyword retrieval time somewhat.
-Actual results depend on your C compiler, of course.
-
- In general, `gperf' assigns values to the characters it is using for
-hashing until some set of values gives each keyword a unique value. A
-helpful heuristic is that the larger the hash value range, the easier
-it is for `gperf' to find and generate a perfect hash function.
-Experimentation is the key to getting the most from `gperf'.
-
-
-File: gperf.info, Node: Input Format, Next: Output Format, Prev: Description, Up: Description
-
-Input Format to `gperf'
-=======================
-
- You can control the input keyfile format by varying certain
-command-line arguments, in particular the `-t' option. The input's
-appearance is similar to GNU utilities `flex' and `bison' (or UNIX
-utilities `lex' and `yacc'). Here's an outline of the general format:
-
- declarations
- %%
- keywords
- %%
- functions
-
- *Unlike* `flex' or `bison', all sections of `gperf''s input are
-optional. The following sections describe the input format for each
-section.
-
-* Menu:
-
-* Declarations:: `struct' Declarations and C Code Inclusion.
-* Keywords:: Format for Keyword Entries.
-* Functions:: Including Additional C Functions.
-
-
-File: gperf.info, Node: Declarations, Next: Keywords, Prev: Input Format, Up: Input Format
-
-`struct' Declarations and C Code Inclusion
-------------------------------------------
-
- The keyword input file optionally contains a section for including
-arbitrary C declarations and definitions, as well as provisions for
-providing a user-supplied `struct'. If the `-t' option *is* enabled,
-you *must* provide a C `struct' as the last component in the
-declaration section from the keyfile file. The first field in this
-struct must be a `char *' identifier called "name," although it is
-possible to modify this field's name with the `-K' option described
-below.
-
- Here is simple example, using months of the year and their
-attributes as input:
-
- struct months { char *name; int number; int days; int leap_days; };
- %%
- january, 1, 31, 31
- february, 2, 28, 29
- march, 3, 31, 31
- april, 4, 30, 30
- may, 5, 31, 31
- june, 6, 30, 30
- july, 7, 31, 31
- august, 8, 31, 31
- september, 9, 30, 30
- october, 10, 31, 31
- november, 11, 30, 30
- december, 12, 31, 31
-
- Separating the `struct' declaration from the list of key words and
-other fields are a pair of consecutive percent signs, `%%', appearing
-left justified in the first column, as in the UNIX utility `lex'.
-
- Using a syntax similar to GNU utilities `flex' and `bison', it is
-possible to directly include C source text and comments verbatim into
-the generated output file. This is accomplished by enclosing the region
-inside left-justified surrounding `%{', `%}' pairs. Here is an input
-fragment based on the previous example that illustrates this feature:
-
- %{
- #include <assert.h>
- /* This section of code is inserted directly into the output. */
- int return_month_days (struct months *months, int is_leap_year);
- %}
- struct months { char *name; int number; int days; int leap_days; };
- %%
- january, 1, 31, 31
- february, 2, 28, 29
- march, 3, 31, 31
- ...
-
- It is possible to omit the declaration section entirely. In this
-case the keyfile begins directly with the first keyword line, *e.g.*:
-
- january, 1, 31, 31
- february, 2, 28, 29
- march, 3, 31, 31
- april, 4, 30, 30
- ...
-
-
-File: gperf.info, Node: Keywords, Next: Functions, Prev: Declarations, Up: Input Format
-
-Format for Keyword Entries
---------------------------
-
- The second keyfile format section contains lines of keywords and any
-associated attributes you might supply. A line beginning with `#' in
-the first column is considered a comment. Everything following the `#'
-is ignored, up to and including the following newline.
-
- The first field of each non-comment line is always the key itself.
-It should be given as a simple name, *i.e.*, without surrounding string
-quotation marks, and be left-justified flush against the first column.
-In this context, a "field" is considered to extend up to, but not
-include, the first blank, comma, or newline. Here is a simple example
-taken from a partial list of C reserved words:
-
- # These are a few C reserved words, see the c.`gperf' file
- # for a complete list of ANSI C reserved words.
- unsigned
- sizeof
- switch
- signed
- if
- default
- for
- while
- return
-
- Note that unlike `flex' or `bison' the first `%%' marker may be
-elided if the declaration section is empty.
-
- Additional fields may optionally follow the leading keyword. Fields
-should be separated by commas, and terminate at the end of line. What
-these fields mean is entirely up to you; they are used to initialize the
-elements of the user-defined `struct' provided by you in the
-declaration section. If the `-t' option is *not* enabled these fields
-are simply ignored. All previous examples except the last one contain
-keyword attributes.
-
-
-File: gperf.info, Node: Functions, Prev: Keywords, Up: Input Format
-
-Including Additional C Functions
---------------------------------
-
- The optional third section also corresponds closely with conventions
-found in `flex' and `bison'. All text in this section, starting at the
-final `%%' and extending to the end of the input file, is included
-verbatim into the generated output file. Naturally, it is your
-responsibility to ensure that the code contained in this section is
-valid C.
-
-
-File: gperf.info, Node: Output Format, Prev: Input Format, Up: Description
-
-Output Format for Generated C Code with `gperf'
-===============================================
-
- Several options control how the generated C code appears on the
-standard output. Two C function are generated. They are called `hash'
-and `in_word_set', although you may modify the name for `in_word_set'
-with a command-line option. Both functions require two arguments, a
-string, `char *' STR, and a length parameter, `int' LEN. Their default
-function prototypes are as follows:
-
- static int hash (char *str, int len);
- int in_word_set (char *str, int len);
-
- By default, the generated `hash' function returns an integer value
-created by adding LEN to several user-specified STR key positions
-indexed into an "associated values" table stored in a local static
-array. The associated values table is constructed internally by
-`gperf' and later output as a static local C array called HASH_TABLE;
-its meaning and properties are described below. *Note
-Implementation::. The relevant key positions are specified via the `-k'
-option when running `gperf', as detailed in the *Options* section
-below. *Note Options::.
-
- Two options, `-g' (assume you are compiling with GNU C and its
-`inline' feature) and `-a' (assume ANSI C-style function prototypes),
-alter the content of both the generated `hash' and `in_word_set'
-routines. However, function `in_word_set' may be modified more
-extensively, in response to your option settings. The options that
-affect the `in_word_set' structure are:
-
- `-p'
- Have function `in_word_set' return a pointer rather than a
- boolean.
-
- `-t'
- Make use of the user-defined `struct'.
-
- `-S TOTAL SWITCH STATEMENTS'
- Generate 1 or more C `switch' statement rather than use a
- large, (and potentially sparse) static array. Although the
- exact time and space savings of this approach vary according
- to your C compiler's degree of optimization, this method
- often results in smaller and faster code.
-
- If the `-t', `-S', and `-p' options are omitted the default action
-is to generate a `char *' array containing the keys, together with
-additional null strings used for padding the array. By experimenting
-with the various input and output options, and timing the resulting C
-code, you can determine the best option choices for different keyword
-set characteristics.
-
-
-File: gperf.info, Node: Options, Next: Bugs, Prev: Description, Up: Top
-
-Options to the `gperf' Utility
-******************************
-
- There are *many* options to `gperf'. They were added to make the
-program more convenient for use with real applications. "On-line" help
-is readily available via the `-h' option. Other options include:
-
- `-a'
- Generate ANSI Standard C code using function prototypes. The
- default is to use "classic" K&R C function declaration syntax.
-
- `-c'
- Generates C code that uses the `strncmp' function to perform
- string comparisons. The default action is to use `strcmp'.
-
- `-C'
- Makes the contents of all generated lookup tables constant,
- *i.e.*, "readonly." Many compilers can generate more
- efficient code for this by putting the tables in readonly
- memory.
-
- `-d'
- Enables the debugging option. This produces verbose
- diagnostics to "standard error" when `gperf' is executing.
- It is useful both for maintaining the program and for
- determining whether a given set of options is actually
- speeding up the search for a solution. Some useful
- information is dumped at the end of the program when the `-d'
- option is enabled.
-
- `-D'
- Handle keywords whose key position sets hash to duplicate
- values. Duplicate hash values occur for two reasons:
-
- * Since `gperf' does not backtrack it is possible for it
- to process all your input keywords without finding a
- unique mapping for each word. However, frequently only
- a very small number of duplicates occur, and the
- majority of keys still require one probe into the table.
-
- * Sometimes a set of keys may have the same names, but
- possess different attributes. With the -D option
- `gperf' treats all these keys as part of an equivalence
- class and generates a perfect hash function with multiple
- comparisons for duplicate keys. It is up to you to
- completely disambiguate the keywords by modifying the
- generated C code. However, `gperf' helps you out by
- organizing the output.
-
- Option `-D' is extremely useful for certain large or highly
- redundant keyword sets, *i.e.*, assembler instruction opcodes.
- Using this option usually means that the generated hash
- function is no longer perfect. On the other hand, it permits
- `gperf' to work on keyword sets that it otherwise could not
- handle.
-
- `-e KEYWORD DELIMITER LIST'
- Allows the user to provide a string containing delimiters
- used to separate keywords from their attributes. The default
- is ",\n". This option is essential if you want to use
- keywords that have embedded commas or newlines. One useful
- trick is to use -e'TAB', where TAB is the literal tab
- character.
-
- `-E'
- Define constant values using an enum local to the lookup
- function rather than with #defines. This also means that
- different lookup functions can reside in the same file.
- Thanks to James Clark (jjc at ai.mit.edu).
-
- `-f ITERATION AMOUNT'
- Generate the perfect hash function "fast." This decreases
- `gperf''s running time at the cost of minimizing generated
- table-size. The iteration amount represents the number of
- times to iterate when resolving a collision. `0' means
- `iterate by the number of keywords. This option is probably
- most useful when used in conjunction with options `-D' and/or
- `-S' for *large* keyword sets.
-
- `-g'
- Assume a GNU compiler, *e.g.*, `g++' or `gcc'. This makes
- all generated routines use the "inline" keyword to remove the
- cost of function calls. Note that `-g' does *not* imply
- `-a', since other non-ANSI C compilers may have provisions
- for a function `inline' feature.
-
- `-G'
- Generate the static table of keywords as a static global
- variable, rather than hiding it inside of the lookup function
- (which is the default behavior).
-
- `-h'
- Prints a short summary on the meaning of each program option.
- Aborts further program execution.
-
- `-H HASH FUNCTION NAME'
- Allows you to specify the name for the generated hash
- function. Default name is `hash.' This option permits the
- use of two hash tables in the same file.
-
- `-i INITIAL VALUE'
- Provides an initial VALUE for the associate values array.
- Default is 0. Increasing the initial value helps inflate the
- final table size, possibly leading to more time efficient
- keyword lookups. Note that this option is not particularly
- useful when `-S' is used. Also, `-i' is overriden when the
- `-r' option is used.
-
- `-j JUMP VALUE'
- Affects the "jump value," *i.e.*, how far to advance the
- associated character value upon collisions. JUMP VALUE is
- rounded up to an odd number, the default is 5. If the JUMP
- VALUE is 0 `gper f' jumps by random amounts.
-
- `-k KEYS'
- Allows selection of the character key positions used in the
- keywords' hash function. The allowable choices range between
- 1-126, inclusive. The positions are separated by commas,
- *e.g.*, `-k 9,4,13,14'; ranges may be used, *e.g.*, `-k 2-7';
- and positions may occur in any order. Furthermore, the
- meta-character '*' causes the generated hash function to
- consider *all* character positions in each key, whereas '$'
- instructs the hash function to use the "final character" of a
- key (this is the only way to use a character position greater
- than 126, incidentally).
-
- For instance, the option `-k 1,2,4,6-10,'$'' generates a hash
- function that considers positions 1,2,4,6,7,8,9,10, plus the
- last character in each key (which may differ for each key,
- obviously). Keys with length less than the indicated key
- positions work properly, since selected key positions
- exceeding the key length are simply not referenced in the
- hash function.
-
- `-K KEY NAME'
- By default, the program assumes the structure component
- identifier for the keyword is "name." This option allows an
- arbitrary choice of identifier for this component, although
- it still must occur as the first field in your supplied
- `struct'.
-
- `-l'
- Compare key lengths before trying a string comparison. This
- might cut down on the number of string comparisons made
- during the lookup, since keys with different lengths are
- never compared via `strcmp'. However, using `-l' might
- greatly increase the size of the generated C code if the
- lookup table range is large (which implies that the switch
- option `-S' is not enabled), since the length table contains
- as many elements as there are entries in the lookup table.
-
- `-L GENERATED LANGUAGE NAME'
- Instructs `gperf' to generate code in the language specified
- by the option's argument. Languages handled are currently
- C++ and C. The default is C.
-
- `-n'
- Instructs the generator not to include the length of a
- keyword when computing its hash value. This may save a few
- assembly instructions in the generated lookup table.
-
- `-N LOOKUP FUNCTION NAME'
- Allows you to specify the name for the generated lookup
- function. Default name is `in_word_set.' This option
- permits completely automatic generation of perfect hash
- functions, especially when multiple generated hash functions
- are used in the same application.
-
- `-o'
- Reorders the keywords by sorting the keywords so that
- frequently occuring key position set components appear first.
- A second reordering pass follows so that keys with "already
- determined values" are placed towards the front of the
- keylist. This may decrease the time required to generate a
- perfect hash function for many keyword sets, and also produce
- more minimal perfect hash functions. The reason for this is
- that the reordering helps prune the search time by handling
- inevitable collisions early in the search process. On the
- other hand, if the number of keywords is *very* large using
- `-o' may *increase* `gperf''s execution time, since
- collisions will begin earlier and continue throughout the
- remainder of keyword processing. See Cichelli's paper from
- the January 1980 Communications of the ACM for details.
-
- `-p'
- Changes the return value of the generated function
- `in_word_set' from boolean (*i.e.*, 0 or 1), to either type
- "pointer to user-defined struct," (if the `-t' option is
- enabled), or simply to `char *', if `-t' is not enabled.
- This option is most useful when the `-t' option (allowing
- user-defined structs) is used. For example, it is possible
- to automatically generate the GNU C reserved word lookup
- routine with the options `-p' and `-t'.
-
- `-r'
- Utilizes randomness to initialize the associated values
- table. This frequently generates solutions faster than using
- deterministic initialization (which starts all associated
- values at 0). Furthermore, using the randomization option
- generally increases the size of the table. If `gperf' has
- difficultly with a certain keyword set try using `-r' or `-D'.
-
- `-s SIZE-MULTIPLE'
- Affects the size of the generated hash table. The numeric
- argument for this option indicates "how many times larger or
- smaller" the maximum associated value range should be, in
- relationship to the number of keys. If the SIZE-MULTIPLE is
- negative the maximum associated value is calculated by
- *dividing* it into the total number of keys. For example, a
- value of 3 means "allow the maximum associated value to be
- about 3 times larger than the number of input keys."
-
- Conversely, a value of -3 means "allow the maximum associated
- value to be about 3 times smaller than the number of input
- keys." Negative values are useful for limiting the overall
- size of the generated hash table, though this usually
- increases the number of duplicate hash values.
-
- If `generate switch' option `-S' is *not* enabled, the maximum
- associated value influences the static array table size, and
- a larger table should decrease the time required for an
- unsuccessful search, at the expense of extra table space.
-
- The default value is 1, thus the default maximum associated
- value about the same size as the number of keys (for
- efficiency, the maximum associated value is always rounded up
- to a power of 2). The actual table size may vary somewhat,
- since this technique is essentially a heuristic. In
- particular, setting this value too high slows down `gperf''s
- runtime, since it must search through a much larger range of
- values. Judicious use of the `-f' option helps alleviate this
- overhead, however.
-
- `-S TOTAL SWITCH STATEMENTS'
- Causes the generated C code to use a `switch' statement
- scheme, rather than an array lookup table. This can lead to
- a reduction in both time and space requirements for some
- keyfiles. The argument to this option determines how many
- `switch' statements are generated. A value of 1 generates 1
- `switch' containing all the elements, a value of 2 generates
- 2 tables with 1/2 the elements in each `switch', etc. This
- is useful since many C compilers cannot correctly generate
- code for large `switch' statements. This option was inspired
- in part by Keith Bostic's original C program.
-
- `-t'
- Allows you to include a `struct' type declaration for
- generated code. Any text before a pair of consecutive %% is
- consider part of the type declaration. Key words and
- additional fields may follow this, one group of fields per
- line. A set of examples for generating perfect hash tables
- and functions for Ada, C, and G++, Pascal, and Modula 2 and 3
- reserved words are distributed with this release.
-
- `-T'
- Prevents the transfer of the type declaration to the output
- file. Use this option if the type is already defined
- elsewhere.
-
- `-v'
- Prints out the current version number.
-
- `-Z CLASS NAME'
- Allow user to specify name of generated C++ class. Default
- name is `Perfect_Hash'.
-
-
-File: gperf.info, Node: Bugs, Next: Projects, Prev: Options, Up: Top
-
-Known Bugs and Limitations with `gperf'
-***************************************
-
- The following are some limitations with the current release of
-`gperf':
-
- * The `gperf' utility is tuned to execute quickly, and works quickly
- for small to medium size data sets (around 1000 keywords). It is
- extremely useful for maintaining perfect hash functions for
- compiler keyword sets. Several recent enhancements now enable
- `gperf' to work efficiently on much larger keyword sets (over
- 15,000 keywords). When processing large keyword sets it helps
- greatly to have over 8 megs of RAM.
-
- However, since `gperf' does not backtrack no guaranteed solution
- occurs on every run. On the other hand, it is usually easy to
- obtain a solution by varying the option parameters. In
- particular, try the `-r' option, and also try changing the default
- arguments to the `-s' and `-j' options. To *guarantee* a
- solution, use the `-D' and `-S' options, although the final
- results are not likely to be a *perfect* hash function anymore!
- Finally, use the `-f' option if you want `gperf' to generate the
- perfect hash function *fast*, with less emphasis on making it
- minimal.
-
- * The size of the generate static keyword array can get *extremely*
- large if the input keyword file is large or if the keywords are
- quite similar. This tends to slow down the compilation of the
- generated C code, and *greatly* inflates the object code size. If
- this situation occurs, consider using the `-S' option to reduce
- data size, potentially increasing keyword recognition time a
- negligible amount. Since many C compilers cannot correctly
- generated code for large switch statements it is important to
- qualify the -S option with an appropriate numerical argument that
- controls the number of switch statements generated.
-
- * The maximum number of key positions selected for a given key has an
- arbitrary limit of 126. This restriction should be removed, and if
- anyone considers this a problem write me and let me know so I can
- remove the constraint.
-
- * The C++ source code only compiles correctly with GNU G++, version
- 1.36 (and hopefully later versions). Porting to AT&T cfront would
- be tedious, but possible (and desirable). There is also a K&R C
- version available now. This should compile without change on most
- BSD systems, but may require a bit of work to run on SYSV, since
- `gperf' uses ALLOCA in several places. Send mail to schmidt at
- ics.uci.edu for information.
-
-
-File: gperf.info, Node: Projects, Next: Implementation, Prev: Bugs, Up: Top
-
-Things Still Left to Do
-***********************
-
- It should be "relatively" easy to replace the current perfect hash
-function algorithm with a more exhaustive approach; the perfect hash
-module is essential independent from other program modules. Additional
-worthwhile improvements include:
-
- * Make the algorithm more robust. At present, the program halts
- with an error diagnostic if it can't find a direct solution and
- the `-D' option is not enabled. A more comprehensive, albeit
- computationally expensive, approach would employ backtracking or
- enable alternative options and retry. It's not clear how helpful
- this would be, in general, since most search sets are rather small
- in practice.
-
- * Another useful extension involves modifying the program to generate
- "minimal" perfect hash functions (under certain circumstances, the
- current version can be rather extravagant in the generated table
- size). Again, this is mostly of theoretical interest, since a
- sparse table often produces faster lookups, and use of the `-S'
- `switch' option can minimize the data size, at the expense of
- slightly longer lookups (note that the gcc compiler generally
- produces good code for `switch' statements, reducing the need for
- more complex schemes).
-
- * In addition to improving the algorithm, it would also be useful to
- generate a C++ class or Ada package as the code output, in
- addition to the current C routines.
-
-
-File: gperf.info, Node: Implementation, Next: Bibliography, Prev: Projects, Up: Top
-
-Implementation Details of GNU `gperf'
-*************************************
-
- A paper describing the high-level description of the data structures
-and algorithms used to implement `gperf' will soon be available. This
-paper is useful not only from a maintenance and enhancement perspective,
-but also because they demonstrate several clever and useful programming
-techniques, *e.g.*, `Iteration Number' boolean arrays, double hashing,
-a "safe" and efficient method for reading arbitrarily long input from a
-file, and a provably optimal algorithm for simultaneously determining
-both the minimum and maximum elements in a list.
-
-
-File: gperf.info, Node: Bibliography, Prev: Implementation, Up: Top
-
-Bibliography
-************
-
- [1] Chang, C.C.: A Scheme for Constructing Ordered Minimal Perfect
-Hashing Functions Information Sciences 39(1986), 187-195.
-
- [2] Cichelli, Richard J. Author's Response to "On Cichelli's Minimal
-Perfec t Hash Functions Method" Communications of the ACM, 23,
-12(December 1980), 729.
-
- [3] Cichelli, Richard J. Minimal Perfect Hash Functions Made Simple
-Communications of the ACM, 23, 1(January 1980), 17-19.
-
- [4] Cook, C. R. and Oldehoeft, R.R. A Letter Oriented Minimal
-Perfect Hashing Function SIGPLAN Notices, 17, 9(September 1982), 18-27.
-
- [5] Cormack, G. V. and Horspool, R. N. S. and Kaiserwerth, M.
-Practical Perfect Hashing Computer Journal, 28, 1(January 1985), 54-58.
-
- [6] Jaeschke, G. Reciprocal Hashing: A Method for Generating Minimal
-Perfect Hashing Functions Communications of the ACM, 24, 12(December
-1981), 829-833.
-
- [7] Jaeschke, G. and Osterburg, G. On Cichelli's Minimal Perfect
-Hash Functions Method Communications of the ACM, 23, 12(December 1980),
-728-729.
-
- [8] Sager, Thomas J. A Polynomial Time Generator for Minimal Perfect
-Hash Functions Communications of the ACM, 28, 5(December 1985), 523-532
-
- [9] Schmidt, Douglas C. GPERF: A Perfect Hash Function Generator
-Second USENIX C++ Conference Proceedings, April 1990.
-
- [10] Sebesta, R.W. and Taylor, M.A. Minimal Perfect Hash Functions
-for Reserved Word Lists SIGPLAN Notices, 20, 12(September 1985), 47-53.
-
- [11] Sprugnoli, R. Perfect Hashing Functions: A Single Probe
-Retrieving Method for Static Sets Communications of the ACM, 20
-11(November 1977), 841-850.
-
- [12] Stallman, Richard M. Using and Porting GNU CC Free Software
-Foundation, 1988.
-
- [13] Stroustrup, Bjarne The C++ Programming Language.
-Addison-Wesley, 1986.
-
- [14] Tiemann, Michael D. User's Guide to GNU C++ Free Software
-Foundation, 1989.
-
-
-
-Tag Table:
-Node: Top1218
-Node: Copying2456
-Node: Contributors15759
-Node: Motivation16859
-Node: Search Structures18126
-Node: Description21679
-Node: Input Format23499
-Node: Declarations24294
-Node: Keywords26601
-Node: Functions28192
-Node: Output Format28686
-Node: Options31156
-Node: Bugs44526
-Node: Projects47213
-Node: Implementation48790
-Node: Bibliography49509
-
-End Tag Table