diff options
Diffstat (limited to 'ACE/apps/gperf/gperf.info')
-rw-r--r-- | ACE/apps/gperf/gperf.info | 1129 |
1 files changed, 1129 insertions, 0 deletions
diff --git a/ACE/apps/gperf/gperf.info b/ACE/apps/gperf/gperf.info new file mode 100644 index 00000000000..49793970903 --- /dev/null +++ b/ACE/apps/gperf/gperf.info @@ -0,0 +1,1129 @@ +This is ../../../apps/gperf/gperf.info, produced by makeinfo version +4.3 from ../../../apps/gperf/gperf.texi. + +INFO-DIR-SECTION GNU programming tools +START-INFO-DIR-ENTRY +* Gperf: (gperf). Perfect Hash Function Generator. +END-INFO-DIR-ENTRY + + This file documents the features of the GNU Perfect Hash Function +Generator + + Copyright (C) 1989 Free Software Foundation, Inc. + + Permission is granted to make and distribute verbatim copies of this +manual provided the copyright notice and this permission notice are +preserved on all copies. + + Permission is granted to copy and distribute modified versions of +this manual under the conditions for verbatim copying, provided also +that the section entitled "GNU General Public License" is included +exactly as in the original, and provided that the entire resulting +derived work is distributed under the terms of a permission notice +identical to this one. + + Permission is granted to copy and distribute translations of this +manual into another language, under the above conditions for modified +versions, except that the section entitled "GNU `gperf' General Public +License" an d this permission notice may be included in translations +approved by the Free Software Foundation instead of in the original +English. + + +File: gperf.info, Node: Top, Next: Copying, Prev: (dir), Up: (dir) + +GNU GPERF Utility +***************** + +Introduction +************ + + This manual documents the GNU `gperf' perfect hash function generator +utility, focusing on its features and how to use them, and how to report +bugs. + +* Menu: + +* Copying:: GNU `gperf' General Public License says + how you can copy and share `gperf'. +* Contributors:: People who have contributed to `gperf'. +* Motivation:: Static search structures and GNU GPERF. +* Search Structures:: Static search structures and GNU `gperf' +* Description:: High-level discussion of how GPERF functions. +* Options:: A description of options to the program. +* Bugs:: Known bugs and limitations with GPERF. +* Projects:: Things still left to do. +* Implementation:: Implementation Details for GNU GPERF. +* Bibliography:: Material Referenced in this Report. + + --- The Detailed Node Listing --- + +High-Level Description of GNU `gperf' + +* Input Format:: Input Format to `gperf' +* Output Format:: Output Format for Generated C Code with `gperf' + +Input Format to `gperf' + +* Declarations:: `struct' Declarations and C Code Inclusion. +* Keywords:: Format for Keyword Entries. +* Functions:: Including Additional C Functions. + + +File: gperf.info, Node: Copying, Next: Contributors, Prev: Top, Up: Top + +GNU GENERAL PUBLIC LICENSE +************************** + + Version 1, February 1989 + Copyright (C) 1989 Free Software Foundation, Inc. + 675 Mass Ave, Cambridge, MA 02139, USA + + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + +Preamble +======== + + The license agreements of most software companies try to keep users +at the mercy of those companies. By contrast, our General Public +License is intended to guarantee your freedom to share and change free +software--to make sure the software is free for all its users. The +General Public License applies to the Free Software Foundation's +software and to any other program whose authors commit to using it. +You can use it for your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Specifically, the General Public License is designed to make +sure that you have the freedom to give away or sell copies of free +software, that you receive source code or can get it if you want it, +that you can change the software or use pieces of it in new free +programs; and that you know you can do these things. + + To protect your rights, we need to make restrictions that forbid +anyone to deny you these rights or to ask you to surrender the rights. +These restrictions translate to certain responsibilities for you if you +distribute copies of the software, or if you modify it. + + For example, if you distribute copies of a such a program, whether +gratis or for a fee, you must give the recipients all the rights that +you have. You must make sure that they, too, receive or can get the +source code. And you must tell them their rights. + + We protect your rights with two steps: (1) copyright the software, +and (2) offer you this license which gives you legal permission to copy, +distribute and/or modify the software. + + Also, for each author's protection and ours, we want to make certain +that everyone understands that there is no warranty for this free +software. If the software is modified by someone else and passed on, we +want its recipients to know that what they have is not the original, so +that any problems introduced by others will not reflect on the original +authors' reputations. + + The precise terms and conditions for copying, distribution and +modification follow. + + TERMS AND CONDITIONS + 1. This License Agreement applies to any program or other work which + contains a notice placed by the copyright holder saying it may be + distributed under the terms of this General Public License. The + "Program", below, refers to any such program or work, and a "work + based on the Program" means either the Program or any work + containing the Program or a portion of it, either verbatim or with + modifications. Each licensee is addressed as "you". + + 2. You may copy and distribute verbatim copies of the Program's source + code as you receive it, in any medium, provided that you + conspicuously and appropriately publish on each copy an + appropriate copyright notice and disclaimer of warranty; keep + intact all the notices that refer to this General Public License + and to the absence of any warranty; and give any other recipients + of the Program a copy of this General Public License along with + the Program. You may charge a fee for the physical act of + transferring a copy. + + 3. You may modify your copy or copies of the Program or any portion of + it, and copy and distribute such modifications under the terms of + Paragraph 1 above, provided that you also do the following: + + * cause the modified files to carry prominent notices stating + that you changed the files and the date of any change; and + + * cause the whole of any work that you distribute or publish, + that in whole or in part contains the Program or any part + thereof, either with or without modifications, to be licensed + at no charge to all third parties under the terms of this + General Public License (except that you may choose to grant + warranty protection to some or all third parties, at your + option). + + * If the modified program normally reads commands interactively + when run, you must cause it, when started running for such + interactive use in the simplest and most usual way, to print + or display an announcement including an appropriate copyright + notice and a notice that there is no warranty (or else, + saying that you provide a warranty) and that users may + redistribute the program under these conditions, and telling + the user how to view a copy of this General Public License. + + * You may charge a fee for the physical act of transferring a + copy, and you may at your option offer warranty protection in + exchange for a fee. + + Mere aggregation of another independent work with the Program (or + its derivative) on a volume of a storage or distribution medium + does not bring the other work under the scope of these terms. + + 4. You may copy and distribute the Program (or a portion or + derivative of it, under Paragraph 2) in object code or executable + form under the terms of Paragraphs 1 and 2 above provided that you + also do one of the following: + + * accompany it with the complete corresponding machine-readable + source code, which must be distributed under the terms of + Paragraphs 1 and 2 above; or, + + * accompany it with a written offer, valid for at least three + years, to give any third party free (except for a nominal + charge for the cost of distribution) a complete + machine-readable copy of the corresponding source code, to be + distributed under the terms of Paragraphs 1 and 2 above; or, + + * accompany it with the information you received as to where the + corresponding source code may be obtained. (This alternative + is allowed only for noncommercial distribution and only if you + received the program in object code or executable form alone.) + + Source code for a work means the preferred form of the work for + making modifications to it. For an executable file, complete + source code means all the source code for all modules it contains; + but, as a special exception, it need not include source code for + modules which are standard libraries that accompany the operating + system on which the executable file runs, or for standard header + files or definitions files that accompany that operating system. + + 5. You may not copy, modify, sublicense, distribute or transfer the + Program except as expressly provided under this General Public + License. Any attempt otherwise to copy, modify, sublicense, + distribute or transfer the Program is void, and will automatically + terminate your rights to use the Program under this License. + However, parties who have received copies, or rights to use + copies, from you under this General Public License will not have + their licenses terminated so long as such parties remain in full + compliance. + + 6. By copying, distributing or modifying the Program (or any work + based on the Program) you indicate your acceptance of this license + to do so, and all its terms and conditions. + + 7. Each time you redistribute the Program (or any work based on the + Program), the recipient automatically receives a license from the + original licensor to copy, distribute or modify the Program + subject to these terms and conditions. You may not impose any + further restrictions on the recipients' exercise of the rights + granted herein. + + 8. The Free Software Foundation may publish revised and/or new + versions of the General Public License from time to time. Such + new versions will be similar in spirit to the present version, but + may differ in detail to address new problems or concerns. + + Each version is given a distinguishing version number. If the + Program specifies a version number of the license which applies to + it and "any later version", you have the option of following the + terms and conditions either of that version or of any later + version published by the Free Software Foundation. If the Program + does not specify a version number of the license, you may choose + any version ever published by the Free Software Foundation. + + 9. If you wish to incorporate parts of the Program into other free + programs whose distribution conditions are different, write to the + author to ask for permission. For software which is copyrighted + by the Free Software Foundation, write to the Free Software + Foundation; we sometimes make exceptions for this. Our decision + will be guided by the two goals of preserving the free status of + all derivatives of our free software and of promoting the sharing + and reuse of software generally. + + NO WARRANTY + + 10. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO + WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE + LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT + HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT + WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT + NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND + FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE + QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE + PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY + SERVICING, REPAIR OR CORRECTION. + + 11. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN + WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY + MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE + LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, + INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR + INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF + DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU + OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY + OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN + ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. + + END OF TERMS AND CONDITIONS + +Appendix: How to Apply These Terms to Your New Programs +======================================================= + + If you develop a new program, and you want it to be of the greatest +possible use to humanity, the best way to achieve this is to make it +free software which everyone can redistribute and change under these +terms. + + To do so, attach the following notices to the program. It is safest +to attach them to the start of each source file to most effectively +convey the exclusion of warranty; and each file should have at least the +"copyright" line and a pointer to where the full notice is found. + + ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES. + Copyright (C) 19YY NAME OF AUTHOR + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 1, or (at your option) + any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software + Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + + Also add information on how to contact you by electronic and paper +mail. + + If the program is interactive, make it output a short notice like +this when it starts in an interactive mode: + + Gnomovision version 69, Copyright (C) 19YY NAME OF AUTHOR + Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. + This is free software, and you are welcome to redistribute it + under certain conditions; type `show c' for details. + + The hypothetical commands `show w' and `show c' should show the +appropriate parts of the General Public License. Of course, the +commands you use may be called something other than `show w' and `show +c'; they could even be mouse-clicks or menu items--whatever suits your +program. + + You should also get your employer (if you work as a programmer) or +your school, if any, to sign a "copyright disclaimer" for the program, +if necessary. Here a sample; alter the names: + + Yoyodyne, Inc., hereby disclaims all copyright interest in the + program `Gnomovision' (a program to direct compilers to make passes + at assemblers) written by James Hacker. + + SIGNATURE OF TY COON, 1 April 1989 + Ty Coon, President of Vice + + That's all there is to it! + + +File: gperf.info, Node: Contributors, Next: Motivation, Prev: Copying, Up: Top + +Contributors to GNU `gperf' Utility +*********************************** + + * The GNU `gperf' perfect hash function generator utility was + originally written in GNU C++ by Douglas C. Schmidt. It is now + also available in a highly-portable "old-style" C version. The + general idea for the perfect hash function generator was inspired + by Keith Bostic's algorithm written in C, and distributed to + net.sources around 1984. The current program is a heavily + modified, enhanced, and extended implementation of Keith's basic + idea, created at the University of California, Irvine. Bugs, + patches, and suggestions should be reported to schmidt at + ics.uci.edu. + + * Special thanks is extended to Michael Tiemann and Doug Lea, for + providing a useful compiler, and for giving me a forum to exhibit + my creation. + + In addition, Adam de Boor and Nels Olson provided many tips and + insights that greatly helped improve the quality and functionality + of `gperf'. + + +File: gperf.info, Node: Motivation, Next: Search Structures, Prev: Contributors, Up: Top + +Introduction +************ + + `gperf' is a perfect hash function generator written in C++. It +transforms an _n_ element user-specified keyword set _W_ into a perfect +hash function _F_. _F_ uniquely maps keywords in _W_ onto the range +0.._k_, where _k_ >= _n_. If _k = n_ then _F_ is a _minimal_ perfect +hash function. `gperf' generates a 0.._k_ element static lookup table +and a pair of C functions. These functions determine whether a given +character string _s_ occurs in _W_, using at most one probe into the +lookup table. + + `gperf' currently generates the reserved keyword recognizer for +lexical analyzers in several production and research compilers and +language processing tools, including GNU C, GNU C++, GNU Pascal, GNU +Modula 3, and GNU indent. Complete C++ source code for `gperf' is +available via anonymous ftp from ics.uci.edu. `gperf' also is +distributed along with the GNU libg++ library. A highly portable, +functionally equivalent K&R C version of `gperf' is archived in +comp.sources.unix, volume 20. Finally, a paper describing `gperf''s +design and implementation in greater detail is available in the Second +USENIX C++ Conference proceedings. + + +File: gperf.info, Node: Search Structures, Next: Description, Prev: Motivation, Up: Top + +Static search structures and GNU `gperf' +**************************************** + + A "static search structure" is an Abstract Data Type with certain +fundamental operations, _e.g._, _initialize_, _insert_, and _retrieve_. +Conceptually, all insertions occur before any retrievals. In +practice, `gperf' generates a `static' array containing search set +keywords and any associated attributes specified by the user. Thus, +there is essentially no execution-time cost for the insertions. It is +a useful data structure for representing _static search sets_. Static +search sets occur frequently in software system applications. Typical +static search sets include compiler reserved words, assembler +instruction opcodes, and built-in shell interpreter commands. Search +set members, called "keywords", are inserted into the structure only +once, usually during program initialization, and are not generally +modified at run-time. + + Numerous static search structure implementations exist, _e.g._, +arrays, linked lists, binary search trees, digital search tries, and +hash tables. Different approaches offer trade-offs between space +utilization and search time efficiency. For example, an _n_ element +sorted array is space efficient, though the average-case time +complexity for retrieval operations using binary search is proportional +to log _n_. Conversely, hash table implementations often locate a +table entry in constant time, but typically impose additional memory +overhead and exhibit poor worst case performance. + + _Minimal perfect hash functions_ provide an optimal solution for a +particular class of static search sets. A minimal perfect hash +function is defined by two properties: + + * It allows keyword recognition in a static search set using at most + _one_ probe into the hash table. This represents the "perfect" + property. + + * The actual memory allocated to store the keywords is precisely + large enough for the keyword set, and _no larger_. This is the + "minimal" property. + + For most applications it is far easier to generate _perfect_ hash +functions than _minimal perfect_ hash functions. Moreover, non-minimal +perfect hash functions frequently execute faster than minimal ones in +practice. This phenomena occurs since searching a sparse keyword table +increases the probability of locating a "null" entry, thereby reducing +string comparisons. `gperf''s default behavior generates +_near-minimal_ perfect hash functions for keyword sets. However, +`gperf' provides many options that permit user control over the degree +of minimality and perfection. + + Static search sets often exhibit relative stability over time. For +example, Ada's 63 reserved words have remained constant for nearly a +decade. It is therefore frequently worthwhile to expend concerted +effort building an optimal search structure _once_, if it subsequently +receives heavy use multiple times. `gperf' removes the drudgery +associated with constructing time- and space-efficient search +structures by hand. It has proven a useful and practical tool for +serious programming projects. Output from `gperf' is currently used in +several production and research compilers, including GNU C, GNU C++, +GNU Pascal, and GNU Modula 3. The latter two compilers are not yet +part of the official GNU distribution. Each compiler utilizes `gperf' +to automatically generate static search structures that efficiently +identify their respective reserved keywords. + + +File: gperf.info, Node: Description, Next: Options, Prev: Search Structures, Up: Top + +High-Level Description of GNU `gperf' +************************************* + +* Menu: + +* Input Format:: Input Format to `gperf' +* Output Format:: Output Format for Generated C Code with `gperf' + + The perfect hash function generator `gperf' reads a set of +"keywords" from a "keyfile" (or from the standard input by default). +It attempts to derive a perfect hashing function that recognizes a +member of the "static keyword set" with at most a single probe into the +lookup table. If `gperf' succeeds in generating such a function it +produces a pair of C source code routines that perform hashing and +table lookup recognition. All generated C code is directed to the +standard output. Command-line options described below allow you to +modify the input and output format to `gperf'. + + By default, `gperf' attempts to produce time-efficient code, with +less emphasis on efficient space utilization. However, several options +exist that permit trading-off execution time for storage space and vice +versa. In particular, expanding the generated table size produces a +sparse search structure, generally yielding faster searches. +Conversely, you can direct `gperf' to utilize a C `switch' statement +scheme that minimizes data space storage size. Furthermore, using a C +`switch' may actually speed up the keyword retrieval time somewhat. +Actual results depend on your C compiler, of course. + + In general, `gperf' assigns values to the characters it is using for +hashing until some set of values gives each keyword a unique value. A +helpful heuristic is that the larger the hash value range, the easier +it is for `gperf' to find and generate a perfect hash function. +Experimentation is the key to getting the most from `gperf'. + + +File: gperf.info, Node: Input Format, Next: Output Format, Prev: Description, Up: Description + +Input Format to `gperf' +======================= + + You can control the input keyfile format by varying certain +command-line arguments, in particular the `-t' option. The input's +appearance is similar to GNU utilities `flex' and `bison' (or UNIX +utilities `lex' and `yacc'). Here's an outline of the general format: + + declarations + %% + keywords + %% + functions + + _Unlike_ `flex' or `bison', all sections of `gperf''s input are +optional. The following sections describe the input format for each +section. + +* Menu: + +* Declarations:: `struct' Declarations and C Code Inclusion. +* Keywords:: Format for Keyword Entries. +* Functions:: Including Additional C Functions. + + +File: gperf.info, Node: Declarations, Next: Keywords, Prev: Input Format, Up: Input Format + +`struct' Declarations and C Code Inclusion +------------------------------------------ + + The keyword input file optionally contains a section for including +arbitrary C declarations and definitions, as well as provisions for +providing a user-supplied `struct'. If the `-t' option _is_ enabled, +you _must_ provide a C `struct' as the last component in the +declaration section from the keyfile file. The first field in this +struct must be a `char *' identifier called "name," although it is +possible to modify this field's name with the `-K' option described +below. + + Here is simple example, using months of the year and their +attributes as input: + + struct months { char *name; int number; int days; int leap_days; }; + %% + january, 1, 31, 31 + february, 2, 28, 29 + march, 3, 31, 31 + april, 4, 30, 30 + may, 5, 31, 31 + june, 6, 30, 30 + july, 7, 31, 31 + august, 8, 31, 31 + september, 9, 30, 30 + october, 10, 31, 31 + november, 11, 30, 30 + december, 12, 31, 31 + + Separating the `struct' declaration from the list of key words and +other fields are a pair of consecutive percent signs, `%%', appearing +left justified in the first column, as in the UNIX utility `lex'. + + Using a syntax similar to GNU utilities `flex' and `bison', it is +possible to directly include C source text and comments verbatim into +the generated output file. This is accomplished by enclosing the region +inside left-justified surrounding `%{', `%}' pairs. Here is an input +fragment based on the previous example that illustrates this feature: + + %{ + #include <assert.h> + /* This section of code is inserted directly into the output. */ + int return_month_days (struct months *months, int is_leap_year); + %} + struct months { char *name; int number; int days; int leap_days; }; + %% + january, 1, 31, 31 + february, 2, 28, 29 + march, 3, 31, 31 + ... + + It is possible to omit the declaration section entirely. In this +case the keyfile begins directly with the first keyword line, _e.g._: + + january, 1, 31, 31 + february, 2, 28, 29 + march, 3, 31, 31 + april, 4, 30, 30 + ... + + +File: gperf.info, Node: Keywords, Next: Functions, Prev: Declarations, Up: Input Format + +Format for Keyword Entries +-------------------------- + + The second keyfile format section contains lines of keywords and any +associated attributes you might supply. A line beginning with `#' in +the first column is considered a comment. Everything following the `#' +is ignored, up to and including the following newline. + + The first field of each non-comment line is always the key itself. +It should be given as a simple name, _i.e._, without surrounding string +quotation marks, and be left-justified flush against the first column. +In this context, a "field" is considered to extend up to, but not +include, the first blank, comma, or newline. Here is a simple example +taken from a partial list of C reserved words: + + # These are a few C reserved words, see the c.`gperf' file + # for a complete list of ANSI C reserved words. + unsigned + sizeof + switch + signed + if + default + for + while + return + + Note that unlike `flex' or `bison' the first `%%' marker may be +elided if the declaration section is empty. + + Additional fields may optionally follow the leading keyword. Fields +should be separated by commas, and terminate at the end of line. What +these fields mean is entirely up to you; they are used to initialize the +elements of the user-defined `struct' provided by you in the +declaration section. If the `-t' option is _not_ enabled these fields +are simply ignored. All previous examples except the last one contain +keyword attributes. + + +File: gperf.info, Node: Functions, Prev: Keywords, Up: Input Format + +Including Additional C Functions +-------------------------------- + + The optional third section also corresponds closely with conventions +found in `flex' and `bison'. All text in this section, starting at the +final `%%' and extending to the end of the input file, is included +verbatim into the generated output file. Naturally, it is your +responsibility to ensure that the code contained in this section is +valid C. + + +File: gperf.info, Node: Output Format, Prev: Input Format, Up: Description + +Output Format for Generated C Code with `gperf' +=============================================== + + Several options control how the generated C code appears on the +standard output. Two C function are generated. They are called `hash' +and `in_word_set', although you may modify the name for `in_word_set' +with a command-line option. Both functions require two arguments, a +string, `char *' STR, and a length parameter, `int' LEN. Their default +function prototypes are as follows: + + static int hash (char *str, int len); + int in_word_set (char *str, int len); + + By default, the generated `hash' function returns an integer value +created by adding LEN to several user-specified STR key positions +indexed into an "associated values" table stored in a local static +array. The associated values table is constructed internally by +`gperf' and later output as a static local C array called HASH_TABLE; +its meaning and properties are described below. *Note +Implementation::. The relevant key positions are specified via the `-k' +option when running `gperf', as detailed in the _Options_ section +below. *Note Options::. + + Two options, `-g' (assume you are compiling with GNU C and its +`inline' feature) and `-a' (assume ANSI C-style function prototypes), +alter the content of both the generated `hash' and `in_word_set' +routines. However, function `in_word_set' may be modified more +extensively, in response to your option settings. The options that +affect the `in_word_set' structure are: + + `-p' + Have function `in_word_set' return a pointer rather than a + boolean. + + `-t' + Make use of the user-defined `struct'. + + `-S TOTAL SWITCH STATEMENTS' + Generate 1 or more C `switch' statement rather than use a + large, (and potentially sparse) static array. Although the + exact time and space savings of this approach vary according + to your C compiler's degree of optimization, this method + often results in smaller and faster code. + + If the `-t', `-S', and `-p' options are omitted the default action +is to generate a `char *' array containing the keys, together with +additional null strings used for padding the array. By experimenting +with the various input and output options, and timing the resulting C +code, you can determine the best option choices for different keyword +set characteristics. + + +File: gperf.info, Node: Options, Next: Bugs, Prev: Description, Up: Top + +Options to the `gperf' Utility +****************************** + + There are _many_ options to `gperf'. They were added to make the +program more convenient for use with real applications. "On-line" help +is readily available via the `-h' option. Other options include: + + `-a' + Generate ANSI Standard C code using function prototypes. The + default is to use "classic" K&R C function declaration syntax. + + `-c' + Generates C code that uses the `strncmp' function to perform + string comparisons. The default action is to use `strcmp'. + + `-C' + Makes the contents of all generated lookup tables constant, + _i.e._, "readonly." Many compilers can generate more + efficient code for this by putting the tables in readonly + memory. + + `-d' + Enables the debugging option. This produces verbose + diagnostics to "standard error" when `gperf' is executing. + It is useful both for maintaining the program and for + determining whether a given set of options is actually + speeding up the search for a solution. Some useful + information is dumped at the end of the program when the `-d' + option is enabled. + + `-D' + Handle keywords whose key position sets hash to duplicate + values. Duplicate hash values occur for two reasons: + + * Since `gperf' does not backtrack it is possible for it + to process all your input keywords without finding a + unique mapping for each word. However, frequently only + a very small number of duplicates occur, and the + majority of keys still require one probe into the table. + + * Sometimes a set of keys may have the same names, but + possess different attributes. With the -D option + `gperf' treats all these keys as part of an equivalence + class and generates a perfect hash function with multiple + comparisons for duplicate keys. It is up to you to + completely disambiguate the keywords by modifying the + generated C code. However, `gperf' helps you out by + organizing the output. + + Option `-D' is extremely useful for certain large or highly + redundant keyword sets, _i.e._, assembler instruction opcodes. + Using this option usually means that the generated hash + function is no longer perfect. On the other hand, it permits + `gperf' to work on keyword sets that it otherwise could not + handle. + + `-e KEYWORD DELIMITER LIST' + Allows the user to provide a string containing delimiters + used to separate keywords from their attributes. The default + is ",\n". This option is essential if you want to use + keywords that have embedded commas or newlines. One useful + trick is to use -e'TAB', where TAB is the literal tab + character. + + `-E' + Define constant values using an enum local to the lookup + function rather than with #defines. This also means that + different lookup functions can reside in the same file. + Thanks to James Clark (jjc at ai.mit.edu). + + `-f ITERATION AMOUNT' + Generate the perfect hash function "fast." This decreases + `gperf''s running time at the cost of minimizing generated + table-size. The iteration amount represents the number of + times to iterate when resolving a collision. `0' means + `iterate by the number of keywords. This option is probably + most useful when used in conjunction with options `-D' and/or + `-S' for _large_ keyword sets. + + `-g' + Assume a GNU compiler, _e.g._, `g++' or `gcc'. This makes + all generated routines use the "inline" keyword to remove the + cost of function calls. Note that `-g' does _not_ imply + `-a', since other non-ANSI C compilers may have provisions + for a function `inline' feature. + + `-G' + Generate the static table of keywords as a static global + variable, rather than hiding it inside of the lookup function + (which is the default behavior). + + `-h' + Prints a short summary on the meaning of each program option. + Aborts further program execution. + + `-H HASH FUNCTION NAME' + Allows you to specify the name for the generated hash + function. Default name is `hash.' This option permits the + use of two hash tables in the same file. + + `-i INITIAL VALUE' + Provides an initial VALUE for the associate values array. + Default is 0. Increasing the initial value helps inflate the + final table size, possibly leading to more time efficient + keyword lookups. Note that this option is not particularly + useful when `-S' is used. Also, `-i' is overriden when the + `-r' option is used. + + `-j JUMP VALUE' + Affects the "jump value," _i.e._, how far to advance the + associated character value upon collisions. JUMP VALUE is + rounded up to an odd number, the default is 5. If the JUMP + VALUE is 0 `gper f' jumps by random amounts. + + `-k KEYS' + Allows selection of the character key positions used in the + keywords' hash function. The allowable choices range between + 1-126, inclusive. The positions are separated by commas, + _e.g._, `-k 9,4,13,14'; ranges may be used, _e.g._, `-k 2-7'; + and positions may occur in any order. Furthermore, the + meta-character '*' causes the generated hash function to + consider *all* character positions in each key, whereas '$' + instructs the hash function to use the "final character" of a + key (this is the only way to use a character position greater + than 126, incidentally). + + For instance, the option `-k 1,2,4,6-10,'$'' generates a hash + function that considers positions 1,2,4,6,7,8,9,10, plus the + last character in each key (which may differ for each key, + obviously). Keys with length less than the indicated key + positions work properly, since selected key positions + exceeding the key length are simply not referenced in the + hash function. + + `-K KEY NAME' + By default, the program assumes the structure component + identifier for the keyword is "name." This option allows an + arbitrary choice of identifier for this component, although + it still must occur as the first field in your supplied + `struct'. + + `-l' + Compare key lengths before trying a string comparison. This + might cut down on the number of string comparisons made + during the lookup, since keys with different lengths are + never compared via `strcmp'. However, using `-l' might + greatly increase the size of the generated C code if the + lookup table range is large (which implies that the switch + option `-S' is not enabled), since the length table contains + as many elements as there are entries in the lookup table. + + `-L GENERATED LANGUAGE NAME' + Instructs `gperf' to generate code in the language specified + by the option's argument. Languages handled are currently + C++ and C. The default is C. + + `-n' + Instructs the generator not to include the length of a + keyword when computing its hash value. This may save a few + assembly instructions in the generated lookup table. + + `-N LOOKUP FUNCTION NAME' + Allows you to specify the name for the generated lookup + function. Default name is `in_word_set.' This option + permits completely automatic generation of perfect hash + functions, especially when multiple generated hash functions + are used in the same application. + + `-o' + Reorders the keywords by sorting the keywords so that + frequently occuring key position set components appear first. + A second reordering pass follows so that keys with "already + determined values" are placed towards the front of the + keylist. This may decrease the time required to generate a + perfect hash function for many keyword sets, and also produce + more minimal perfect hash functions. The reason for this is + that the reordering helps prune the search time by handling + inevitable collisions early in the search process. On the + other hand, if the number of keywords is _very_ large using + `-o' may _increase_ `gperf''s execution time, since + collisions will begin earlier and continue throughout the + remainder of keyword processing. See Cichelli's paper from + the January 1980 Communications of the ACM for details. + + `-p' + Changes the return value of the generated function + `in_word_set' from boolean (_i.e._, 0 or 1), to either type + "pointer to user-defined struct," (if the `-t' option is + enabled), or simply to `char *', if `-t' is not enabled. + This option is most useful when the `-t' option (allowing + user-defined structs) is used. For example, it is possible + to automatically generate the GNU C reserved word lookup + routine with the options `-p' and `-t'. + + `-r' + Utilizes randomness to initialize the associated values + table. This frequently generates solutions faster than using + deterministic initialization (which starts all associated + values at 0). Furthermore, using the randomization option + generally increases the size of the table. If `gperf' has + difficultly with a certain keyword set try using `-r' or `-D'. + + `-s SIZE-MULTIPLE' + Affects the size of the generated hash table. The numeric + argument for this option indicates "how many times larger or + smaller" the maximum associated value range should be, in + relationship to the number of keys. If the SIZE-MULTIPLE is + negative the maximum associated value is calculated by + _dividing_ it into the total number of keys. For example, a + value of 3 means "allow the maximum associated value to be + about 3 times larger than the number of input keys." + + Conversely, a value of -3 means "allow the maximum associated + value to be about 3 times smaller than the number of input + keys." Negative values are useful for limiting the overall + size of the generated hash table, though this usually + increases the number of duplicate hash values. + + If `generate switch' option `-S' is _not_ enabled, the maximum + associated value influences the static array table size, and + a larger table should decrease the time required for an + unsuccessful search, at the expense of extra table space. + + The default value is 1, thus the default maximum associated + value about the same size as the number of keys (for + efficiency, the maximum associated value is always rounded up + to a power of 2). The actual table size may vary somewhat, + since this technique is essentially a heuristic. In + particular, setting this value too high slows down `gperf''s + runtime, since it must search through a much larger range of + values. Judicious use of the `-f' option helps alleviate this + overhead, however. + + `-S TOTAL SWITCH STATEMENTS' + Causes the generated C code to use a `switch' statement + scheme, rather than an array lookup table. This can lead to + a reduction in both time and space requirements for some + keyfiles. The argument to this option determines how many + `switch' statements are generated. A value of 1 generates 1 + `switch' containing all the elements, a value of 2 generates + 2 tables with 1/2 the elements in each `switch', etc. This + is useful since many C compilers cannot correctly generate + code for large `switch' statements. This option was inspired + in part by Keith Bostic's original C program. + + `-t' + Allows you to include a `struct' type declaration for + generated code. Any text before a pair of consecutive %% is + consider part of the type declaration. Key words and + additional fields may follow this, one group of fields per + line. A set of examples for generating perfect hash tables + and functions for Ada, C, and G++, Pascal, and Modula 2 and 3 + reserved words are distributed with this release. + + `-T' + Prevents the transfer of the type declaration to the output + file. Use this option if the type is already defined + elsewhere. + + `-v' + Prints out the current version number. + + `-Z CLASS NAME' + Allow user to specify name of generated C++ class. Default + name is `Perfect_Hash'. + + +File: gperf.info, Node: Bugs, Next: Projects, Prev: Options, Up: Top + +Known Bugs and Limitations with `gperf' +*************************************** + + The following are some limitations with the current release of +`gperf': + + * The `gperf' utility is tuned to execute quickly, and works quickly + for small to medium size data sets (around 1000 keywords). It is + extremely useful for maintaining perfect hash functions for + compiler keyword sets. Several recent enhancements now enable + `gperf' to work efficiently on much larger keyword sets (over + 15,000 keywords). When processing large keyword sets it helps + greatly to have over 8 megs of RAM. + + However, since `gperf' does not backtrack no guaranteed solution + occurs on every run. On the other hand, it is usually easy to + obtain a solution by varying the option parameters. In + particular, try the `-r' option, and also try changing the default + arguments to the `-s' and `-j' options. To _guarantee_ a + solution, use the `-D' and `-S' options, although the final + results are not likely to be a _perfect_ hash function anymore! + Finally, use the `-f' option if you want `gperf' to generate the + perfect hash function _fast_, with less emphasis on making it + minimal. + + * The size of the generate static keyword array can get _extremely_ + large if the input keyword file is large or if the keywords are + quite similar. This tends to slow down the compilation of the + generated C code, and _greatly_ inflates the object code size. If + this situation occurs, consider using the `-S' option to reduce + data size, potentially increasing keyword recognition time a + negligible amount. Since many C compilers cannot correctly + generated code for large switch statements it is important to + qualify the -S option with an appropriate numerical argument that + controls the number of switch statements generated. + + * The maximum number of key positions selected for a given key has an + arbitrary limit of 126. This restriction should be removed, and if + anyone considers this a problem write me and let me know so I can + remove the constraint. + + * The C++ source code only compiles correctly with GNU G++, version + 1.36 (and hopefully later versions). Porting to AT&T cfront would + be tedious, but possible (and desirable). There is also a K&R C + version available now. This should compile without change on most + BSD systems, but may require a bit of work to run on SYSV, since + `gperf' uses ALLOCA in several places. Send mail to schmidt at + ics.uci.edu for information. + + +File: gperf.info, Node: Projects, Next: Implementation, Prev: Bugs, Up: Top + +Things Still Left to Do +*********************** + + It should be "relatively" easy to replace the current perfect hash +function algorithm with a more exhaustive approach; the perfect hash +module is essential independent from other program modules. Additional +worthwhile improvements include: + + * Make the algorithm more robust. At present, the program halts + with an error diagnostic if it can't find a direct solution and + the `-D' option is not enabled. A more comprehensive, albeit + computationally expensive, approach would employ backtracking or + enable alternative options and retry. It's not clear how helpful + this would be, in general, since most search sets are rather small + in practice. + + * Another useful extension involves modifying the program to generate + "minimal" perfect hash functions (under certain circumstances, the + current version can be rather extravagant in the generated table + size). Again, this is mostly of theoretical interest, since a + sparse table often produces faster lookups, and use of the `-S' + `switch' option can minimize the data size, at the expense of + slightly longer lookups (note that the gcc compiler generally + produces good code for `switch' statements, reducing the need for + more complex schemes). + + * In addition to improving the algorithm, it would also be useful to + generate a C++ class or Ada package as the code output, in + addition to the current C routines. + + +File: gperf.info, Node: Implementation, Next: Bibliography, Prev: Projects, Up: Top + +Implementation Details of GNU `gperf' +************************************* + + A paper describing the high-level description of the data structures +and algorithms used to implement `gperf' will soon be available. This +paper is useful not only from a maintenance and enhancement perspective, +but also because they demonstrate several clever and useful programming +techniques, _e.g._, `Iteration Number' boolean arrays, double hashing, +a "safe" and efficient method for reading arbitrarily long input from a +file, and a provably optimal algorithm for simultaneously determining +both the minimum and maximum elements in a list. + + +File: gperf.info, Node: Bibliography, Prev: Implementation, Up: Top + +Bibliography +************ + + [1] Chang, C.C.: A Scheme for Constructing Ordered Minimal Perfect +Hashing Functions Information Sciences 39(1986), 187-195. + + [2] Cichelli, Richard J. Author's Response to "On Cichelli's Minimal +Perfec t Hash Functions Method" Communications of the ACM, 23, +12(December 1980), 729. + + [3] Cichelli, Richard J. Minimal Perfect Hash Functions Made Simple +Communications of the ACM, 23, 1(January 1980), 17-19. + + [4] Cook, C. R. and Oldehoeft, R.R. A Letter Oriented Minimal +Perfect Hashing Function SIGPLAN Notices, 17, 9(September 1982), 18-27. + + [5] Cormack, G. V. and Horspool, R. N. S. and Kaiserwerth, M. +Practical Perfect Hashing Computer Journal, 28, 1(January 1985), 54-58. + + [6] Jaeschke, G. Reciprocal Hashing: A Method for Generating Minimal +Perfect Hashing Functions Communications of the ACM, 24, 12(December +1981), 829-833. + + [7] Jaeschke, G. and Osterburg, G. On Cichelli's Minimal Perfect +Hash Functions Method Communications of the ACM, 23, 12(December 1980), +728-729. + + [8] Sager, Thomas J. A Polynomial Time Generator for Minimal Perfect +Hash Functions Communications of the ACM, 28, 5(December 1985), 523-532 + + [9] Schmidt, Douglas C. GPERF: A Perfect Hash Function Generator +Second USENIX C++ Conference Proceedings, April 1990. + + [10] Sebesta, R.W. and Taylor, M.A. Minimal Perfect Hash Functions +for Reserved Word Lists SIGPLAN Notices, 20, 12(September 1985), 47-53. + + [11] Sprugnoli, R. Perfect Hashing Functions: A Single Probe +Retrieving Method for Static Sets Communications of the ACM, 20 +11(November 1977), 841-850. + + [12] Stallman, Richard M. Using and Porting GNU CC Free Software +Foundation, 1988. + + [13] Stroustrup, Bjarne The C++ Programming Language. +Addison-Wesley, 1986. + + [14] Tiemann, Michael D. User's Guide to GNU C++ Free Software +Foundation, 1989. + + + +Tag Table: +Node: Top1277 +Node: Copying2566 +Node: Contributors15867 +Node: Motivation16967 +Node: Search Structures18234 +Node: Description21787 +Node: Input Format23607 +Node: Declarations24402 +Node: Keywords26709 +Node: Functions28300 +Node: Output Format28794 +Node: Options31264 +Node: Bugs44634 +Node: Projects47321 +Node: Implementation48898 +Node: Bibliography49617 + +End Tag Table |