Diffstat (limited to 'doc/gawk.texi')
-rw-r--r-- | doc/gawk.texi | 2214 |
1 files changed, 1261 insertions, 953 deletions
diff --git a/doc/gawk.texi b/doc/gawk.texi index 108b3320..c8fed041 100644 --- a/doc/gawk.texi +++ b/doc/gawk.texi @@ -13,16 +13,16 @@ * awk: (gawk)Invoking gawk. Text scanning and processing. @end direntry -@c @set xref-automatic-section-title +@set xref-automatic-section-title @c The following information should be updated here only! @c This sets the edition of the document, the version of gawk it @c applies to and all the info about who's publishing this edition @c These apply across the board. -@set UPDATE-MONTH February, 2003 +@set UPDATE-MONTH June, 2003 @set VERSION 3.1 -@set PATCHLEVEL 2 +@set PATCHLEVEL 3 @set FSF @@ -232,7 +232,7 @@ Cover art by Etienne Suvasa. @ifnottex @ifnotxml -@node Top, Foreword, (dir), (dir) +@node Top @top General Introduction @c Preface node should come right after the Top @c node, in `unnumbered' sections, then the chapter, `What is gawk'. @@ -436,6 +436,8 @@ particular records in a file and perform operations upon them. some condition is satisfied. * For Statement:: Another looping statement, that provides initialization and increment clauses. +* Switch Statement:: Switch/case evaluation for conditional + execution of statements based on a value. * Break Statement:: Immediately exit the innermost enclosing loop. * Continue Statement:: Skip to the end of the innermost enclosing @@ -532,6 +534,7 @@ particular records in a file and perform operations upon them. transitions. * Rewind Function:: A function for rereading the current file. * File Checking:: Checking that data files are readable. +* Empty Files:: Checking for zero-length files. * Ignoring Assigns:: Treating assignments as file names. * Getopt Function:: A function for processing command-line arguments. @@ -585,10 +588,12 @@ particular records in a file and perform operations upon them. * PC Installation:: Installing and Compiling @command{gawk} on MS-DOS and OS/2. * PC Binary Installation:: Installing a prepared distribution. 
-* PC Compiling:: Compiling @command{gawk} for MS-DOS, Win32, +* PC Compiling:: Compiling @command{gawk} for MS-DOS, Windows32, and OS/2. -* PC Using:: Running @command{gawk} on MS-DOS, Win32 and +* PC Using:: Running @command{gawk} on MS-DOS, Windows32 and OS/2. +* PC Dynamic:: Compiling @command{gawk} for dynamic + libraries. * Cygwin:: Building and running @command{gawk} for Cygwin. * VMS Installation:: Installing @command{gawk} on VMS. @@ -644,17 +649,17 @@ particular records in a file and perform operations upon them. @summarycontents @contents -@node Foreword, Preface, Top, Top +@node Foreword @unnumbered Foreword Arnold Robbins and I are good friends. We were introduced 11 years ago -by circumstances---and our favorite programming language, AWK. +by circumstances---and our favorite programming language, AWK. The circumstances started a couple of years -earlier. I was working at a new job and noticed an unplugged +earlier. I was working at a new job and noticed an unplugged Unix computer sitting in the corner. No one knew how to use it, and neither did I. However, a couple of days later it was running, and -I was @code{root} and the one-and-only user. +I was @code{root} and the one-and-only user. That day, I began the transition from statistician to Unix programmer. On one of many trips to the library or bookstore in search of @@ -684,12 +689,12 @@ any system; my wife uses @command{gawk} on her VMS box.) My Unix system started out unplugged from the wall; it certainly was not plugged into a network. So, oblivious to the existence of @command{gawk} and the Unix community in general, and desiring a new @command{awk}, I wrote -my own, called @command{mawk}. +my own, called @command{mawk}. Before I was finished I knew about @command{gawk}, but it was too late to stop, so I eventually posted -to a @code{comp.sources} newsgroup. +to a @code{comp.sources} newsgroup. 
-A few days after my posting, I got a friendly email +A few days after my posting, I got a friendly email from Arnold introducing himself. He suggested we share design and algorithms and attached a draft of the POSIX standard so @@ -698,7 +703,7 @@ after publication of the AWK book. Frankly, if our roles had been reversed, I would not have been so open and we probably would -have never met. I'm glad we did meet. +have never met. I'm glad we did meet. He is an AWK expert's AWK expert and a genuinely nice person. Arnold contributes significant amounts of his expertise and time to the Free Software Foundation. @@ -715,11 +720,11 @@ a wealth of practical programs that emphasize the power of AWK's basic idioms: data driven control-flow, pattern matching with regular expressions, and associative arrays. -Those looking for something new can try out @command{gawk}'s +Those looking for something new can try out @command{gawk}'s interface to network protocols via special @file{/inet} files. The programs in this book make clear that an AWK program is -typically much smaller and faster to develop than +typically much smaller and faster to develop than a counterpart written in C. Consequently, there is often a payoff to prototype an algorithm or design in AWK to get it running quickly and expose @@ -758,7 +763,7 @@ Michael Brennan Author of @command{mawk} @end display -@node Preface, Getting Started, Foreword, Top +@node Preface @unnumbered Preface @c I saw a comment somewhere that the preface should describe the book itself, @c and the introduction should describe what the book covers. @@ -860,7 +865,7 @@ microcomputers, BeOS, Tandem D20, and VMS. * Acknowledgments:: Acknowledgments. @end menu -@node History, Names, Preface, Preface +@node History @unnumberedsec History of @command{awk} and @command{gawk} @cindex recipe for a programming language @cindex programming language, recipe for @@ -875,7 +880,7 @@ microcomputers, BeOS, Tandem D20, and VMS. 
Blend all parts well using @code{lex} and @code{yacc}. Document minimally and release. -After eight years, add another part @code{egrep} and two +After eight years, add another part @code{egrep} and two more parts C. Document very well and release. @end quotation @@ -919,15 +924,15 @@ wrote the bulk of His code finally became part of the main @command{gawk} distribution with @command{gawk} @value{PVERSION} 3.1. -@xref{Contributors, ,Major Contributors to @command{gawk}}, +@xref{Contributors}, for a complete list of those who made important contributions to @command{gawk}. -@node Names, This Manual, History, Preface +@node Names @section A Rose by Any Other Name @cindex @command{awk}, new vs. old The @command{awk} language has evolved over the years. Full details are -provided in @ref{Language History, ,The Evolution of the @command{awk} Language}. +provided in @ref{Language History}. The language described in this @value{DOCUMENT} is often referred to as ``new @command{awk}'' (@command{nawk}). @@ -959,7 +964,7 @@ that should be available in any complete implementation of POSIX @command{awk}, we simply use the term @command{awk}. When referring to a feature that is specific to the GNU implementation, we use the term @command{gawk}. -@node This Manual, Conventions, Names, Preface +@node This Manual @section Using This Book @cindex @command{awk}, terms describing @@ -1009,24 +1014,24 @@ exposed to @command{awk}, there is a lot of information here that even the @command{awk} expert should find useful. In particular, the description of POSIX @command{awk} and the example programs in -@ref{Library Functions, ,A Library of @command{awk} Functions}, and in -@ref{Sample Programs, ,Practical @command{awk} Programs}, +@ref{Library Functions}, and in +@ref{Sample Programs}, should be of interest. -@ref{Getting Started, ,Getting Started with @command{awk}}, +@ref{Getting Started}, provides the essentials you need to know to begin using @command{awk}. 
-@ref{Regexp, ,Regular Expressions}, +@ref{Regexp}, introduces regular expressions in general, and in particular the flavors supported by POSIX @command{awk} and @command{gawk}. -@ref{Reading Files, , Reading Input Files}, +@ref{Reading Files}, describes how @command{awk} reads your data. It introduces the concepts of records and fields, as well as the @code{getline} command. I/O redirection is first described here. -@ref{Printing, , Printing Output}, +@ref{Printing}, describes how @command{awk} programs can produce output with @code{print} and @code{printf}. @@ -1034,12 +1039,12 @@ describes how @command{awk} programs can produce output with describes expressions, which are the basic building blocks for getting most things done in a program. -@ref{Patterns and Actions, ,Patterns Actions and Variables}, +@ref{Patterns and Actions}, describes how to write patterns for matching records, actions for doing something when a record is matched, and the built-in variables @command{awk} and @command{gawk} use. -@ref{Arrays, ,Arrays in @command{awk}}, +@ref{Arrays}, covers @command{awk}'s one-and-only data structure: associative arrays. Deleting array elements and whole arrays is also described, as well as sorting arrays in @command{gawk}. @@ -1049,47 +1054,47 @@ describes the built-in functions @command{awk} and @command{gawk} provide, as well as how to define your own functions. -@ref{Internationalization, ,Internationalization with @command{gawk}}, +@ref{Internationalization}, describes special features in @command{gawk} for translating program messages into different languages at runtime. -@ref{Advanced Features, ,Advanced Features of @command{gawk}}, +@ref{Advanced Features}, describes a number of @command{gawk}-specific advanced features. Of particular note are the abilities to have two-way communications with another process, perform TCP/IP networking, and profile your @command{awk} programs. 
-@ref{Invoking Gawk, ,Running @command{awk} and @command{gawk}}, +@ref{Invoking Gawk}, describes how to run @command{gawk}, the meaning of its command-line options, and how it finds @command{awk} program source files. -@ref{Library Functions, ,A Library of @command{awk} Functions}, and -@ref{Sample Programs, ,Practical @command{awk} Programs}, +@ref{Library Functions}, and +@ref{Sample Programs}, provide many sample @command{awk} programs. Reading them allows you to see @command{awk} solving real problems. -@ref{Language History, ,The Evolution of the @command{awk} Language}, +@ref{Language History}, describes how the @command{awk} language has evolved since first release to present. It also describes how @command{gawk} has acquired features over time. -@ref{Installation, ,Installing @command{gawk}}, +@ref{Installation}, describes how to get @command{gawk}, how to compile it under Unix, and how to compile and use it on different non-Unix systems. It also describes how to report bugs in @command{gawk} and where to get three other freely available implementations of @command{awk}. -@ref{Notes, ,Implementation Notes}, +@ref{Notes}, describes how to disable @command{gawk}'s extensions, as well as how to contribute new code to @command{gawk}, how to write extension libraries, and some possible future directions for @command{gawk} development. -@ref{Basic Concepts, ,Basic Programming Concepts}, +@ref{Basic Concepts}, provides some very cursory background material for those who are completely unfamiliar with computer programming. Also centralized there is a discussion of some of the issues @@ -1101,12 +1106,12 @@ defines most, if not all, the significant terms used throughout the book. If you find terms that you aren't familiar with, try looking them up here. -@ref{Copying, ,GNU General Public License}, and +@ref{Copying}, and @ref{GNU Free Documentation License}, present the licenses that cover the @command{gawk} source code and this @value{DOCUMENT}, respectively. 
-@node Conventions, Manual History, This Manual, Preface +@node Conventions @section Typographical Conventions @cindex Texinfo @@ -1181,7 +1186,7 @@ As noted by the opening quote, though, any coverage of dark corners is, by definition, something that is incomplete. -@node Manual History, How To Contribute, Conventions, Preface +@node Manual History @unnumberedsec The GNU Project and This Book @cindex FSF (Free Software Foundation) @@ -1208,7 +1213,7 @@ copy of the GPL is included in this @value{DOCUMENT} @end ifnotinfo for your reference -(@pxref{Copying, ,GNU General Public License}). +(@pxref{Copying}). The GPL applies to the C language source code for @command{gawk}. To find out more about the FSF and the GNU Project online, see @uref{http://www.gnu.org, the GNU Project's home page}. @@ -1290,12 +1295,12 @@ Edition @value{EDITION} maintains the basic structure of Edition 1.0, but with significant additional material, reflecting the host of new features in @command{gawk} @value{PVERSION} @value{VERSION}. Of particular note is -@ref{Array Sorting, ,Sorting Array Values and Indices with @command{gawk}}, -@ref{Bitwise Functions, ,Using @command{gawk}'s Bit Manipulation Functions}, -@ref{Internationalization, ,Internationalization with @command{gawk}}, -@ref{Advanced Features, ,Advanced Features of @command{gawk}}, +@ref{Array Sorting}, +@ref{Bitwise Functions}, +@ref{Internationalization}, +@ref{Advanced Features}, and -@ref{Dynamic Extensions, ,Adding New Built-in Functions to @command{gawk}}. +@ref{Dynamic Extensions}. @end ignore @cindex Close, Diane @@ -1318,23 +1323,23 @@ This edition maintains the basic structure of Edition 1.0, but with significant additional material, reflecting the host of new features in @command{gawk} @value{PVERSION} @value{VERSION}. 
Of particular note is -@ref{Array Sorting, ,Sorting Array Values and Indices with @command{gawk}}, +@ref{Array Sorting}, as well as -@ref{Bitwise Functions, ,Using @command{gawk}'s Bit Manipulation Functions}, -@ref{Internationalization, ,Internationalization with @command{gawk}}, +@ref{Bitwise Functions}, +@ref{Internationalization}, and also -@ref{Advanced Features, ,Advanced Features of @command{gawk}}, +@ref{Advanced Features}, and -@ref{Dynamic Extensions, ,Adding New Built-in Functions to @command{gawk}}. +@ref{Dynamic Extensions}. @cite{@value{TITLE}} will undoubtedly continue to evolve. An electronic version comes with the @command{gawk} distribution from the FSF. If you find an error in this @value{DOCUMENT}, please report it! -@xref{Bugs, ,Reporting Problems and Bugs}, for information on submitting +@xref{Bugs}, for information on submitting problem reports electronically, or write to me in care of the publisher. -@node How To Contribute, Acknowledgments, Manual History, Preface +@node How To Contribute @unnumberedsec How to Contribute As the maintainer of GNU @command{awk}, @@ -1348,7 +1353,7 @@ share with the rest of the world, please contact me (@email{arnold@@gnu.org}). Making things available on the Internet helps keep the @command{gawk} distribution down to manageable size. -@node Acknowledgments, , How To Contribute, Preface +@node Acknowledgments @unnumberedsec Acknowledgments The initial draft of @cite{The GAWK Manual} had the following acknowledgments: @@ -1500,37 +1505,37 @@ and @command{gawk}. It contains the following chapters: @itemize @bullet @item -@ref{Getting Started, ,Getting Started with @command{awk}}. +@ref{Getting Started}. @item -@ref{Regexp, ,Regular Expressions}. +@ref{Regexp}. @item -@ref{Reading Files, , Reading Input Files}. +@ref{Reading Files}. @item -@ref{Printing, , Printing Output}. +@ref{Printing}. @item @ref{Expressions}. @item -@ref{Patterns and Actions, ,Patterns Actions and Variables}. +@ref{Patterns and Actions}. 
@item -@ref{Arrays, ,Arrays in @command{awk}}. +@ref{Arrays}. @item @ref{Functions}. @item -@ref{Internationalization, ,Internationalization with @command{gawk}}. +@ref{Internationalization}. @item -@ref{Advanced Features, ,Advanced Features of @command{gawk}}. +@ref{Advanced Features}. @item -@ref{Invoking Gawk, ,Running @command{awk} and @command{gawk}}. +@ref{Invoking Gawk}. @end itemize @page @@ -1539,7 +1544,7 @@ and @command{gawk}. It contains the following chapters: @end iftex @end ignore -@node Getting Started, Regexp, Preface, Top +@node Getting Started @chapter Getting Started with @command{awk} @c @cindex script, definition of @c @cindex rule, definition of @@ -1573,7 +1578,7 @@ When you run @command{awk}, you specify an @command{awk} @dfn{program} that tells @command{awk} what to do. The program consists of a series of @dfn{rules}. (It may also contain @dfn{function definitions}, an advanced feature that we will ignore for now. -@xref{User-defined, ,User-Defined Functions}.) Each rule specifies one +@xref{User-defined}.) Each rule specifies one pattern to search for and one action to perform upon finding the pattern. @@ -1604,7 +1609,7 @@ program looks like this: other things. @end menu -@node Running gawk, Sample Data Files, Getting Started, Getting Started +@node Running gawk @section How to Run @command{awk} Programs @cindex @command{awk} programs, running @@ -1640,7 +1645,7 @@ variations of each. * Quoting:: More discussion of shell quoting issues. @end menu -@node One-shot, Read Terminal, Running gawk, Running gawk +@node One-shot @subsection One-Shot Throwaway @command{awk} Programs Once you are familiar with @command{awk}, you will often type in simple @@ -1672,7 +1677,7 @@ programs from shell scripts, because it avoids the need for a separate file for the @command{awk} program. A self-contained shell script is more reliable because there are no other files to misplace. 
-@ref{Very Simple, ,Some Simple Examples}, +@ref{Very Simple}, @ifnotinfo later in this @value{CHAPTER}, @end ifnotinfo @@ -1696,7 +1701,7 @@ egrep foo @var{files} @dots{} @end example @end ignore -@node Read Terminal, Long, One-shot, Running gawk +@node Read Terminal @subsection Running @command{awk} Without Input Files @cindex standard input @@ -1758,7 +1763,7 @@ What, me worry? @kbd{@value{CTL}-d} @end example -@node Long, Executable Scripts, Read Terminal, Running gawk +@node Long @subsection Running Long Programs @cindex @command{awk} programs, running @@ -1800,7 +1805,7 @@ awk "BEGIN @{ print \"Don't Panic!\" @}" @cindex quoting @noindent This was explained earlier -(@pxref{Read Terminal, ,Running @command{awk} Without Input Files}). +(@pxref{Read Terminal}). Note that you don't usually need single quotes around the @value{FN} that you specify with @option{-f}, because most @value{FN}s don't contain any of the shell's special characters. Notice that in @file{advice}, the @command{awk} @@ -1816,7 +1821,7 @@ you can add the extension @file{.awk} to the @value{FN}. This doesn't affect the execution of the @command{awk} program but it does make ``housekeeping'' easier. -@node Executable Scripts, Comments, Long, Running gawk +@node Executable Scripts @subsection Executable @command{awk} Programs @cindex @command{awk} programs @cindex @code{#} (number sign), @code{#!} (executable scripts) @@ -1891,7 +1896,7 @@ of @command{awk} (such as @file{/bin/awk}), and some put the name of your script (@samp{advice}). Don't rely on the value of @code{ARGV[0]} to provide your script name. -@node Comments, Quoting, Executable Scripts, Running gawk +@node Comments @subsection Comments in @command{awk} Programs @cindex @code{#} (number sign), commenting @cindex number sign (@code{#}), commenting @@ -1925,7 +1930,7 @@ when reading it at a later time. @cindex single quote (@code{'}), vs. apostrophe @cindex @code{'} (single quote), vs. 
apostrophe @strong{Caution:} As mentioned in -@ref{One-shot, ,One-Shot Throwaway @command{awk} Programs}, +@ref{One-shot}, you can enclose small to medium programs in single quotes, in order to keep your shell scripts self-contained. When doing so, @emph{don't} put an apostrophe (i.e., a single quote) into a comment (or anywhere else @@ -1958,7 +1963,7 @@ Putting a backslash before the single quote in @samp{let's} wouldn't help, since backslashes are not special inside single quotes. The next @value{SUBSECTION} describes the shell's quoting rules. -@node Quoting, , Comments, Running gawk +@node Quoting @subsection Shell-Quoting Issues @cindex quoting, rules for @@ -2000,7 +2005,7 @@ The shell does no interpretation of the quoted text, passing it on verbatim to the command. It is @emph{impossible} to embed a single quote inside single-quoted text. Refer back to -@ref{Comments, ,Comments in @command{awk} Programs}, +@ref{Comments}, for an example of what happens if you try. @item @@ -2019,7 +2024,7 @@ Thus, the example seen @ifnotinfo previously @end ifnotinfo -in @ref{Read Terminal, ,Running @command{awk} Without Input Files}, +in @ref{Read Terminal}, is applicable: @example @@ -2050,7 +2055,7 @@ awk -F"" '@var{program}' @var{files} # wrong! @end example @noindent -In the second case, @command{awk} will attempt to use the text of the program +In the second case, @command{awk} will attempt to use the text of the program as the value of @code{FS}, and the first @value{FN} as the text of the program! This results in syntax errors at best, and confusing behavior at worst. @end itemize @@ -2096,7 +2101,7 @@ If you really need both single and double quotes in your @command{awk} program, it is probably best to move it into a separate file, where the shell won't be part of the picture, and you can say what you mean. 
-@node Sample Data Files, Very Simple, Running gawk, Getting Started +@node Sample Data Files @section @value{DDF}s for the Examples @c For gawk >= 3.2, update these data files. No-one has such slow modems! @@ -2182,12 +2187,12 @@ learn in this @value{DOCUMENT}. @cindex Texinfo If you are using the stand-alone version of Info, -see @ref{Extract Program, ,Extracting Programs from Texinfo Source Files}, +see @ref{Extract Program}, for an @command{awk} program that extracts these @value{DF}s from @file{gawk.texi}, the Texinfo source file for this Info file. @end ifinfo -@node Very Simple, Two Rules, Sample Data Files, Getting Started +@node Very Simple @section Some Simple Examples The following command runs a simple @command{awk} program that searches the @@ -2210,7 +2215,7 @@ You will notice that slashes (@samp{/}) surround the string @samp{foo} in the @command{awk} program. The slashes indicate that @samp{foo} is the pattern to search for. This type of pattern is called a @dfn{regular expression}, which is covered in more detail later -(@pxref{Regexp, ,Regular Expressions}). +(@pxref{Regexp}). The pattern is allowed to match parts of words. There are single quotes around the @command{awk} program so that the shell won't @@ -2346,7 +2351,7 @@ If you use the expression @samp{NR % 2 == 1} instead, the program would print the odd-numbered lines. @end itemize -@node Two Rules, More Complex, Very Simple, Getting Started +@node Two Rules @section An Example with Two Rules @cindex @command{awk} programs @@ -2358,8 +2363,8 @@ no actions are run. After processing all the rules that match the line (and perhaps there are none), @command{awk} reads the next line. (However, -@pxref{Next Statement, ,The @code{next} Statement}, -and also @pxref{Nextfile Statement, ,Using @command{gawk}'s @code{nextfile} Statement}). +@pxref{Next Statement}, +and also @pxref{Nextfile Statement}). This continues until the program reaches the end of the file. 
For example, the following @command{awk} program contains two rules: @@ -2403,7 +2408,7 @@ $ awk '/12/ @{ print $0 @} Note how the line beginning with @samp{sabafoo} in @file{BBS-list} was printed twice, once for each rule. -@node More Complex, Statements/Lines, Two Rules, Getting Started +@node More Complex @section A More Complex Example Now that we've mastered some simple tasks, let's look at @@ -2426,7 +2431,7 @@ This command prints the total number of bytes in all the files in the current directory that were last modified in November (of any year). @footnote{In the C shell (@command{csh}), you need to type a semicolon and then a backslash at the end of the first line; see -@ref{Statements/Lines, ,@command{awk} Statements Versus Lines}, for an +@ref{Statements/Lines}, for an explanation. In a POSIX-compliant shell, such as the Bourne shell or @command{bash}, you can type the example as shown. If the command @samp{echo $path} produces an empty output line, you are most likely @@ -2475,13 +2480,13 @@ After the last line of output from @command{ls} has been processed, the In this example, the value of @code{sum} is 80600. These more advanced @command{awk} techniques are covered in later sections -(@pxref{Action Overview, ,Actions}). Before you can move on to more +(@pxref{Action Overview}). Before you can move on to more advanced @command{awk} programming, you have to know how @command{awk} interprets your input and displays your output. By manipulating fields and using @code{print} statements, you can produce some very useful and impressive-looking reports. 
-@node Statements/Lines, Other Features, More Complex, Getting Started +@node Statements/Lines @section @command{awk} Statements Versus Lines @cindex line breaks @cindex newlines @@ -2506,10 +2511,10 @@ symbols and keywords: A newline at any other point is considered the end of the statement.@footnote{The @samp{?} and @samp{:} referred to here is the three-operand conditional expression described in -@ref{Conditional Exp, ,Conditional Expressions}. +@ref{Conditional Exp}. Splitting lines after @samp{?} and @samp{:} is a minor @command{gawk} extension; if @option{--posix} is specified -(@pxref{Options, , Command-Line Options}), then this extension is disabled.} +(@pxref{Options}), then this extension is disabled.} @cindex @code{\} (backslash), continuing lines and @cindex backslash (@code{\}), continuing lines and @@ -2623,7 +2628,7 @@ separated with a semicolon was not in the original @command{awk} language; it was added for consistency with the treatment of statements within an action. -@node Other Features, When, Statements/Lines, Getting Started +@node Other Features @section Other Features of @command{awk} @cindex variables @@ -2640,9 +2645,9 @@ performing bit manipulation, and for runtime string translation. As we develop our presentation of the @command{awk} language, we introduce most of the variables and many of the functions. They are defined systematically in @ref{Built-in Variables}, and -@ref{Built-in, ,Built-in Functions}. +@ref{Built-in}. -@node When, , Other Features, Getting Started +@node When @section When to Use @command{awk} @cindex @command{awk}, uses for @@ -2653,7 +2658,7 @@ statements, and other selection criteria, you can produce much more complex output. The @command{awk} language is very useful for producing reports from large amounts of raw data, such as summarizing information from the output of other utility programs like @command{ls}. -(@xref{More Complex, ,A More Complex Example}.) +(@xref{More Complex}.) 
Programs written with @command{awk} are usually much smaller than they would be in other languages. This makes @command{awk} programs easy to compose and @@ -2680,7 +2685,7 @@ of large programs. Programs in these languages may require more lines of source code than the equivalent @command{awk} programs, but they are easier to maintain and usually run more efficiently. -@node Regexp, Reading Files, Getting Started, Top +@node Regexp @chapter Regular Expressions @cindex regexp, See regular expressions @c STARTOFRANGE regexp @@ -2721,7 +2726,7 @@ regular expressions work, we will present more complicated instances. * Locales:: How the locale affects things. @end menu -@node Regexp Usage, Escape Sequences, Regexp, Regexp +@node Regexp Usage @section How to Use Regular Expressions @cindex regular expressions, as patterns @@ -2761,7 +2766,7 @@ not be the entire current input record. The two operators @samp{~} and @samp{!~} perform regular expression comparisons. Expressions using these operators can be used as patterns, or in @code{if}, @code{while}, @code{for}, and @code{do} statements. -(@xref{Statements, ,Control Statements in Actions}.) +(@xref{Statements}.) For example: @example @@ -2815,7 +2820,7 @@ When a regexp is enclosed in slashes, such as @code{/foo/}, we call it a @dfn{regexp constant}, much like @code{5.27} is a numeric constant and @code{"foo"} is a string constant. -@node Escape Sequences, Regexp Operators, Regexp Usage, Regexp +@node Escape Sequences @section Escape Sequences @cindex escape sequences @@ -2933,11 +2938,11 @@ in order to tell @command{awk} to keep processing the rest of the string. In @command{gawk}, a number of additional two-character sequences that begin with a backslash have special meaning in regexps. -@xref{GNU Regexp Operators, ,@command{gawk}-Specific Regexp Operators}. +@xref{GNU Regexp Operators}. 
In a regexp, a backslash before any character that is not in the previous list and not listed in -@ref{GNU Regexp Operators, ,@command{gawk}-Specific Regexp Operators}, +@ref{GNU Regexp Operators}, means that the next character should be taken literally, even if it would normally be a regexp operator. For example, @code{/a\+b/} matches the three characters @samp{a+b}. @@ -2958,9 +2963,9 @@ as soon as @command{awk} reads your program. @item @command{gawk} processes both regexp constants and dynamic regexps -(@pxref{Computed Regexps, ,Using Dynamic Regexps}), +(@pxref{Computed Regexps}), for the special operators listed in -@ref{GNU Regexp Operators, ,@command{gawk}-Specific Regexp Operators}. +@ref{GNU Regexp Operators}. @item A backslash before any other character means to treat that character @@ -3006,7 +3011,7 @@ In such implementations, typing @code{"a\qc"} is the same as typing Suppose you use an octal or hexadecimal escape to represent a regexp metacharacter. -(See @ref{Regexp Operators, , Regular Expression Operators}.) +(See @ref{Regexp Operators}.) Does @command{awk} treat the character as a literal character or as a regexp operator? @@ -3015,12 +3020,12 @@ Historically, such characters were taken literally. @value{DARKCORNER} However, the POSIX standard indicates that they should be treated as real metacharacters, which is what @command{gawk} does. -In compatibility mode (@pxref{Options, ,Command-Line Options}), +In compatibility mode (@pxref{Options}), @command{gawk} treats the characters represented by octal and hexadecimal escape sequences literally when used in regexp constants. Thus, @code{/a\52b/} is equivalent to @code{/a\*b/}. -@node Regexp Operators, Character Lists, Escape Sequences, Regexp +@node Regexp Operators @section Regular Expression Operators @c STARTOFRANGE regexpo @cindex regular expressions, operators @@ -3093,7 +3098,7 @@ with @samp{A}. 
@c comma before using does NOT do tertiary @cindex POSIX @command{awk}, period (@code{.}), using -In strict POSIX mode (@pxref{Options, ,Command-Line Options}), +In strict POSIX mode (@pxref{Options}), @samp{.} does not match the @sc{nul} character, which is a character with all bits equal to zero. Otherwise, @sc{nul} is just another character. Other versions of @command{awk} @@ -3113,7 +3118,7 @@ the square brackets. For example, @samp{[MVX]} matches any one of the characters @samp{M}, @samp{V}, or @samp{X} in a string. A full discussion of what can be inside the square brackets of a character list is given in -@ref{Character Lists, ,Using Character Lists}. +@ref{Character Lists}. @cindex character lists, complemented @item [^ @dots{}] @@ -3137,7 +3142,7 @@ means it matches any string that starts with @samp{P} or contains a digit. The alternation applies to the largest possible regexps on either side. @cindex @code{()} (parentheses) -@cindex parentheses @code{()} +@cindex parentheses @code{()} @item (@dots{}) Parentheses are used for grouping in regular expressions, as in arithmetic. They can be used to concatenate regular expressions @@ -3218,7 +3223,7 @@ and @command{egrep} consistent with each other. However, because old programs may use @samp{@{} and @samp{@}} in regexp constants, by default @command{gawk} does @emph{not} match interval expressions in regexps. If either @option{--posix} or @option{--re-interval} are specified -(@pxref{Options, , Command-Line Options}), then interval expressions +(@pxref{Options}), then interval expressions are allowed in regexps. For new programs that use @samp{@{} and @samp{@}} in regexp constants, @@ -3244,12 +3249,12 @@ For example, @samp{/+/} matches a literal plus sign. However, many other versio @command{awk} treat such a usage as a syntax error. 
If @command{gawk} is in compatibility mode -(@pxref{Options, ,Command-Line Options}), +(@pxref{Options}), POSIX character classes and interval expressions are not available in regular expressions. @c ENDOFRANGE regexpo -@node Character Lists, GNU Regexp Operators, Regexp Operators, Regexp +@node Character Lists @section Using Character Lists @c STARTOFRANGE charlist @cindex character lists @@ -3423,7 +3428,7 @@ they do not recognize collating symbols or equivalence classes. @c maybe one day ... @c ENDOFRANGE charlist -@node GNU Regexp Operators, Case-sensitivity, Character Lists, Regexp +@node GNU Regexp Operators @section @command{gawk}-Specific Regexp Operators @c This section adapted (long ago) from the regex-0.12 manual @@ -3546,7 +3551,7 @@ lesser of two evils. @cindex regular expressions, @command{gawk}, command-line options @cindex @command{gawk}, command-line options The various command-line options -(@pxref{Options, ,Command-Line Options}) +(@pxref{Options}) control how @command{gawk} interprets characters in regexps: @table @asis @@ -3559,7 +3564,7 @@ GNU regexp operators. @end ifnotinfo @ifnottex GNU regexp operators described -in @ref{Regexp Operators, ,Regular Expression Operators}. +in @ref{Regexp Operators}. @end ifnottex However, interval expressions are not supported. @@ -3584,7 +3589,7 @@ when @option{--posix} is is used.) @c ENDOFRANGE gregexp @c ENDOFRANGE regexpg -@node Case-sensitivity, Leftmost Longest, GNU Regexp Operators, Regexp +@node Case-sensitivity @section Case Sensitivity in Matching @c STARTOFRANGE regexpcs @@ -3605,7 +3610,7 @@ One way to perform a case-insensitive match at a particular point in the program is to convert the data to a single case, using the @code{tolower} or @code{toupper} built-in string functions (which we haven't discussed yet; -@pxref{String Functions, ,String Manipulation Functions}). +@pxref{String Functions}). 
For example: @example @@ -3656,8 +3661,8 @@ thing you can do with @code{IGNORECASE} only is dynamically turn case-sensitivity on or off for all the rules at once. @code{IGNORECASE} can be set on the command line or in a @code{BEGIN} rule -(@pxref{Other Arguments, ,Other Command-Line Arguments}; also -@pxref{Using BEGIN/END, ,Startup and Cleanup Actions}). +(@pxref{Other Arguments}; also +@pxref{Using BEGIN/END}). Setting @code{IGNORECASE} from the command line is a way to make a program case-insensitive without having to edit it. @@ -3677,12 +3682,12 @@ ASCII characters, which also provides a number of characters suitable for use with European languages. The value of @code{IGNORECASE} has no effect if @command{gawk} is in -compatibility mode (@pxref{Options, ,Command-Line Options}). +compatibility mode (@pxref{Options}). Case is always significant in compatibility mode. @c ENDOFRANGE csregexp @c ENDOFRANGE regexpcs -@node Leftmost Longest, Computed Regexps, Case-sensitivity, Regexp +@node Leftmost Longest @section How Much Text Matches? @cindex regular expressions, leftmost longest match @@ -3694,7 +3699,7 @@ echo aaaabcd | awk '@{ sub(/a+/, "<A>"); print @}' @end example This example uses the @code{sub} function (which we haven't discussed yet; -@pxref{String Functions, ,String Manipulation Functions}) +@pxref{String Functions}) to make a change to the input record. Here, the regexp @code{/a+/} indicates ``one or more @samp{a} characters,'' and the replacement text is @samp{<A>}. @@ -3714,14 +3719,14 @@ For simple match/no-match tests, this is not so important. But when doing text matching and substitutions with the @code{match}, @code{sub}, @code{gsub}, and @code{gensub} functions, it is very important. @ifinfo -@xref{String Functions, ,String Manipulation Functions}, +@xref{String Functions}, for more information on these functions. 
@end ifinfo Understanding this principle is also important for regexp-based record -and field splitting (@pxref{Records, ,How Input Is Split into Records}, -and also @pxref{Field Separators, ,Specifying How Fields Are Separated}). +and field splitting (@pxref{Records}, +and also @pxref{Field Separators}). -@node Computed Regexps, Locales, Leftmost Longest, Regexp +@node Computed Regexps @section Using Dynamic Regexps @c STARTOFRANGE dregexp @@ -3837,7 +3842,7 @@ occur often in practice, but it's worth noting for future reference. @c ENDOFRANGE regexpd @c ENDOFRANGE regexp -@node Locales, , Computed Regexps, Regexp +@node Locales @section Where You Are Makes A Difference Modern systems support the notion of @dfn{locales}: a way to tell @@ -3849,7 +3854,7 @@ one particular case. The following example uses the @code{sub} function, which does text replacement -(@pxref{String Functions, , String-Manipulation Functions}). +(@pxref{String Functions}). Here, the intent is to remove trailing uppercase characters: @example @@ -3868,7 +3873,7 @@ before running @command{gawk}, by using the shell statements: @example -LANG=C LC_ALL=C +LANG=C LC_ALL=C export LANG LC_ALL @end example @@ -3877,7 +3882,19 @@ Unix manner, where case distinctions do matter. You may wish to put these statements into your shell startup file, e.g., @file{$HOME/.profile}. -@node Reading Files, Printing, Regexp, Top +Similar considerations apply to other ranges. For example, +@samp{["-/]} is perfectly valid in ASCII, but is not valid in many +Unicode locales, such as @samp{en_US.UTF-8}. (In general, such +ranges should be avoided; either list the characters individually, +or use a POSIX character class such as @samp{[[:punct:]]}.) + +For the normal case of @samp{RS = "\n"}, the locale is largely irrelevant. +For other single byte record separators, using @samp{LC_ALL=C} will give you +much better performance when reading records. 
Otherwise, @command{gawk} has +to make several function calls, @emph{per input character} to find the record +terminator. + +@node Reading Files @chapter Reading Input Files @c STARTOFRANGE infir @@ -3906,7 +3923,7 @@ On rare occasions, you may need to use the @code{getline} command. The @code{getline} command is valuable, both because it can do explicit input from any number of files, and because the files used with it do not have to be named on the @command{awk} command line -(@pxref{Getline, ,Explicit Input with @code{getline}}). +(@pxref{Getline}). @menu * Records:: Controlling how data is split into records. @@ -3920,7 +3937,7 @@ used with it do not have to be named on the @command{awk} command line using the @code{getline} function. @end menu -@node Records, Fields, Reading Files, Reading Files +@node Records @section How Input Is Split into Records @c STARTOFRANGE inspl @@ -3953,13 +3970,13 @@ assigning the character to the built-in variable @code{RS}. Like any other variable, the value of @code{RS} can be changed in the @command{awk} program with the assignment operator, @samp{=} -(@pxref{Assignment Ops, ,Assignment Expressions}). +(@pxref{Assignment Ops}). The new record-separator character should be enclosed in quotation marks, which indicate a string constant. Often the right time to do this is at the beginning of execution, before any input is processed, so that the very first record is read with the proper separator. To do this, use the special @code{BEGIN} pattern -(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}). +(@pxref{BEGIN/END}). For example: @cindex @code{BEGIN} pattern @@ -4012,7 +4029,7 @@ $ awk 'BEGIN @{ RS = "/" @} @noindent Note that the entry for the @samp{camelot} BBS is not split. 
In the original @value{DF} -(@pxref{Sample Data Files, ,@value{DDF}s for the Examples}), +(@pxref{Sample Data Files}), the line looks like this: @example @@ -4031,7 +4048,7 @@ is the original newline in the @value{DF}, not the one added by @cindex separators, for records Another way to change the record separator is on the command line, using the variable-assignment feature -(@pxref{Other Arguments, ,Other Command-Line Arguments}): +(@pxref{Other Arguments}): @example awk '@{ print $0 @}' RS="/" BBS-list @@ -4063,7 +4080,7 @@ The empty string @code{""} (a string without any characters) has a special meaning as the value of @code{RS}. It means that records are separated by one or more blank lines and nothing else. -@xref{Multiple Line, ,Multiple-Line Records}, for more details. +@xref{Multiple Line}, for more details. If you change the value of @code{RS} in the middle of an @command{awk} run, the new value is used to delimit subsequent records, but the record @@ -4083,7 +4100,7 @@ sets the variable @code{RT} to the text in the input that matched When using @command{gawk}, the value of @code{RS} is not limited to a one-character string. It can be any regular expression -(@pxref{Regexp, ,Regular Expressions}). +(@pxref{Regexp}). In general, each record ends at the next string that matches the regular expression; the next record starts at the end of the matching string. 
This general rule is @@ -4107,9 +4124,9 @@ with optional leading and/or trailing whitespace: $ echo record 1 AAAA record 2 BBBB record 3 | > gawk 'BEGIN @{ RS = "\n|( *[[:upper:]]+ *)" @} > @{ print "Record =", $0, "and RT =", RT @}' -@print{} Record = record 1 and RT = AAAA -@print{} Record = record 2 and RT = BBBB -@print{} Record = record 3 and RT = +@print{} Record = record 1 and RT = AAAA +@print{} Record = record 2 and RT = BBBB +@print{} Record = record 3 and RT = @print{} @end example @@ -4117,7 +4134,7 @@ $ echo record 1 AAAA record 2 BBBB record 3 | The final line of output has an extra blank line. This is because the value of @code{RT} is a newline, and the @code{print} statement supplies its own terminating newline. -@xref{Simple Sed, ,A Simple Stream Editor}, for a more useful example +@xref{Simple Sed}, for a more useful example of @code{RS} as a regexp and @code{RT}. If you set @code{RS} to a regular expression that allows optional @@ -4132,7 +4149,7 @@ no guarantee that this will never happen. The use of @code{RS} as a regular expression and the @code{RT} variable are @command{gawk} extensions; they are not available in compatibility mode -(@pxref{Options, ,Command-Line Options}). +(@pxref{Options}). In compatibility mode, only the first character of the value of @code{RS} is used to determine the end of the record. @@ -4177,7 +4194,7 @@ record onto the end of the previous ones. @c ENDOFRANGE inspl @c ENDOFRANGE recspl -@node Fields, Nonconstant Fields, Records, Reading Files +@node Fields @section Examining Fields @cindex examining fields @@ -4257,7 +4274,7 @@ $ awk '$1 ~ /foo/ @{ print $0 @}' BBS-list This example prints each record in the file @file{BBS-list} whose first field contains the string @samp{foo}. The operator @samp{~} is called a @dfn{matching operator} -(@pxref{Regexp Usage, , How to Use Regular Expressions}); +(@pxref{Regexp Usage}); it tests whether a string (here, the field @code{$1}) matches a given regular expression. 
@@ -4274,7 +4291,7 @@ $ awk '/foo/ @{ print $1, $NF @}' BBS-list @end example @c ENDOFRANGE fiex -@node Nonconstant Fields, Changing Fields, Fields, Reading Files +@node Nonconstant Fields @section Nonconstant Field Numbers @cindex fields, numbers @cindex field numbers @@ -4310,7 +4327,7 @@ operator in the field-number expression. This example, then, prints the hours of operation (the fourth field) for every line of the file @file{BBS-list}. (All of the @command{awk} operators are listed, in order of decreasing precedence, in -@ref{Precedence, , Operator Precedence (How Operators Nest)}.) +@ref{Precedence}.) If the field number you compute is zero, you get the entire record. Thus, @samp{$(2-2)} has the same value as @code{$0}. Negative field @@ -4320,13 +4337,13 @@ what happens when you reference a negative field number. @command{gawk} notices this and terminates your program. Other @command{awk} implementations may behave differently.) -As mentioned in @ref{Fields, ,Examining Fields}, +As mentioned in @ref{Fields}, @command{awk} stores the current record's number of fields in the built-in variable @code{NF} (also @pxref{Built-in Variables}). The expression @code{$NF} is not a special feature---it is the direct consequence of evaluating @code{NF} and using its value as a field number. -@node Changing Fields, Field Separators, Nonconstant Fields, Reading Files +@node Changing Fields @section Changing the Contents of a Field @c STARTOFRANGE ficon @@ -4351,7 +4368,7 @@ The program first saves the original value of field three in the variable @code{nboxes}. The @samp{-} sign represents subtraction, so this program reassigns field three, @code{$3}, as the original value of field three minus ten: -@samp{$3 - 10}. (@xref{Arithmetic Ops, ,Arithmetic Operators}.) +@samp{$3 - 10}. (@xref{Arithmetic Ops}.) Then it prints the original and new values for field three. (Someone in the warehouse made a consistent mistake while inventorying the red boxes.) 
@@ -4361,7 +4378,7 @@ as a number; the string of characters must be converted to a number for the computer to do arithmetic on it. The number resulting from the subtraction is converted back to a string of characters that then becomes field three. -@xref{Conversion, ,Conversion of Strings and Numbers}. +@xref{Conversion}. When the value of a field is changed (as perceived by @command{awk}), the text of the input record is recalculated to contain the new field where @@ -4408,7 +4425,7 @@ existing fields. @cindex output field separator, See @code{OFS} variable @cindex field separators, See Also @code{OFS} This recomputation affects and is affected by -@code{NF} (the number of fields; @pxref{Fields, ,Examining Fields}). +@code{NF} (the number of fields; @pxref{Fields}). For example, the value of @code{NF} is set to the number of the highest field you create. The exact format of @code{$0} is also affected by a feature that has not been discussed yet: @@ -4429,9 +4446,9 @@ else @noindent should print @samp{everything is normal}, because @code{NF+1} is certain -to be out of range. (@xref{If Statement, ,The @code{if}-@code{else} Statement}, +to be out of range. (@xref{If Statement}, for more information about @command{awk}'s @code{if-else} statements. -@xref{Typing and Comparison, ,Variable Typing and Comparison Expressions}, +@xref{Typing and Comparison}, for more information about the @samp{!=} operator.) It is important to note that making an assignment to an existing field @@ -4502,10 +4519,10 @@ the fields. Any assignment to @code{$0} causes the record to be reparsed into fields using the @emph{current} value of @code{FS}. This also applies to any built-in function that updates @code{$0}, such as @code{sub} and @code{gsub} -(@pxref{String Functions, ,String-Manipulation Functions}). +(@pxref{String Functions}). 
@c ENDOFRANGE ficon -@node Field Separators, Constant Size, Changing Fields, Reading Files +@node Field Separators @section Specifying How Fields Are Separated @menu @@ -4547,12 +4564,12 @@ the Unix Bourne shell, @command{sh}, or @command{bash}). @cindex @code{FS} variable, changing value of The value of @code{FS} can be changed in the @command{awk} program with the -assignment operator, @samp{=} (@pxref{Assignment Ops, ,Assignment Expressions}). +assignment operator, @samp{=} (@pxref{Assignment Ops}). Often the right time to do this is at the beginning of execution before any input has been processed, so that the very first record is read with the proper separator. To do this, use the special @code{BEGIN} pattern -(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}). +(@pxref{BEGIN/END}). For example, here we set the value of @code{FS} to the string @code{","}: @@ -4612,7 +4629,7 @@ beginning or the end of the line, that too delimits an empty field. The space character is the only single character that does not follow these rules. -@node Regexp Field Splitting, Single Character Fields, Field Separators, Field Separators +@node Regexp Field Splitting @subsection Using Regular Expressions to Separate Fields @c STARTOFRANGE regexpfs @@ -4644,7 +4661,7 @@ For a less trivial example of a regular expression, try using single spaces to separate fields the way single commas are used. @code{FS} can be set to @w{@code{"[@ ]"}} (left bracket, space, right bracket). This regular expression matches a single space and nothing else -(@pxref{Regexp, ,Regular Expressions}). +(@pxref{Regexp}). There is an important difference between the two cases of @samp{FS = @w{" "}} (a single space) and @samp{FS = @w{"[ \t\n]+"}} @@ -4696,7 +4713,7 @@ Finally, the last @code{print} statement prints the new @code{$0}. 
@c ENDOFRANGE regexpfs @c ENDOFRANGE fsregexp -@node Single Character Fields, Command Line Field Separator, Regexp Field Splitting, Field Separators +@node Single Character Fields @subsection Making Each Character a Separate Field @cindex differences in @command{awk} and @command{gawk}, single-character fields @@ -4726,11 +4743,11 @@ In this case, most versions of Unix @command{awk} simply treat the entire record as only having one field. @value{DARKCORNER} In compatibility mode -(@pxref{Options, ,Command-Line Options}), +(@pxref{Options}), if @code{FS} is the null string, then @command{gawk} also behaves this way. -@node Command Line Field Separator, Field Splitting Summary, Single Character Fields, Field Separators +@node Command Line Field Separator @subsection Setting @code{FS} from the Command Line @cindex @code{-F} option @cindex options, command-line @@ -4778,7 +4795,7 @@ a single @samp{\} to use for the field separator. @c @cindex historical features As a special case, in compatibility mode -(@pxref{Options, ,Command-Line Options}), +(@pxref{Options}), if the argument to @option{-F} is @samp{t}, then @code{FS} is set to the TAB character. If you type @samp{-F\t} at the shell, without any quotes, the @samp{\} gets deleted, so @command{awk} @@ -4849,7 +4866,7 @@ the entries for users who have no password: awk -F: '$2 == ""' /etc/passwd @end example -@node Field Splitting Summary, , Command Line Field Separator, Field Separators +@node Field Splitting Summary @subsection Field-Splitting Summary It is important to remember that when you assign a string constant @@ -4935,7 +4952,7 @@ root:nSijPlPhZZwgE:0:0:Root:/: @subheading Advanced Notes: @code{FS} and @code{IGNORECASE} The @code{IGNORECASE} variable -(@pxref{User-modified, ,Built-in Variables That Control @command{awk}}) +(@pxref{User-modified}) affects field splitting @emph{only} when the value of @code{FS} is a regexp. 
It has no effect when @code{FS} is a single character, even if that character is a letter. Thus, in the following code: @@ -4956,7 +4973,7 @@ will take effect. @c ENDOFRANGE fisepr @c ENDOFRANGE fisepg -@node Constant Size, Multiple Line, Field Separators, Reading Files +@node Constant Size @section Reading Fixed-Width Data @ifnotinfo @@ -4985,7 +5002,7 @@ the use of a variable number of spaces and @emph{empty fields are just spaces}. Clearly, @command{awk}'s normal field splitting based on @code{FS} does not work well in this case. Although a portable @command{awk} program can use a series of @code{substr} calls on @code{$0} -(@pxref{String Functions, ,String Manipulation Functions}), +(@pxref{String Functions}), this is awkward and inefficient for a large number of fields. @c comma before specifying is part of tertiary @@ -5076,7 +5093,7 @@ Assigning a value to @code{FS} causes @command{gawk} to use without having to know the current value of @code{FS}. In order to tell which kind of field splitting is in effect, use @code{PROCINFO["FS"]} -(@pxref{Auto-set, ,Built-in Variables That Convey Information}). +(@pxref{Auto-set}). The value is @code{"FS"} if regular field splitting is being used, or it is @code{"FIELDWIDTHS"} if fixed-width field splitting is being used: @@ -5090,10 +5107,10 @@ else This information is useful when writing a function that needs to temporarily change @code{FS} or @code{FIELDWIDTHS}, read some records, and then restore the original settings -(@pxref{Passwd Functions, ,Reading the User Database}, +(@pxref{Passwd Functions}, for an example of such a function). -@node Multiple Line, Getline, Constant Size, Reading Files +@node Multiple Line @section Multiple-Line Records @c STARTOFRANGE recm @@ -5134,7 +5151,7 @@ string @code{"\n\n+"} to @code{RS}. This regexp matches the newline at the end of the record and one or more blank lines after the record. 
In addition, a regular expression always matches the longest possible sequence when there is a choice -(@pxref{Leftmost Longest, ,How Much Text Matches?}). +(@pxref{Leftmost Longest}). So the next record doesn't start until the first nonblank line that follows---no matter how many blank lines appear in a row, they are considered one record separator. @@ -5166,7 +5183,7 @@ to @w{@code{" "}}). This feature can be a problem if you really don't want the newline character to separate fields, because there is no way to prevent it. However, you can work around this by using the @code{split} function to break up the record manually -(@pxref{String Functions, ,String Manipulation Functions}). +(@pxref{String Functions}). If you have a single character field separator, you can work around the special feature in a different way, by making @code{FS} into a regexp for that single character. For example, if the field @@ -5217,15 +5234,15 @@ $ awk -f addrs.awk addresses @print{} Name is: Jane Doe @print{} Address is: 123 Main Street @print{} City and State are: Anywhere, SE 12345-6789 -@print{} +@print{} @print{} Name is: John Smith @print{} Address is: 456 Tree-lined Avenue @print{} City and State are: Smallville, MW 98765-4321 -@print{} +@print{} @dots{} @end example -@xref{Labels Program, ,Printing Mailing Labels}, for a more realistic +@xref{Labels Program}, for a more realistic program that deals with address lists. The following table @@ -5268,7 +5285,7 @@ value specified by @code{RS}. @c ENDOFRANGE imr @c ENDOFRANGE frm -@node Getline, , Multiple Line, Reading Files +@node Getline @section Explicit Input with @code{getline} @c STARTOFRANGE getl @@ -5316,7 +5333,7 @@ represents a shell command. * Getline Summary:: Summary of @code{getline} Variants. 
@end menu -@node Plain Getline, Getline/Variable, Getline, Getline +@node Plain Getline @subsection Using @code{getline} with No Arguments The @code{getline} command can be used without arguments to read input @@ -5370,9 +5387,9 @@ of @code{$0} that triggered the rule that executed @code{getline} is lost. By contrast, the @code{next} statement reads a new record but immediately begins processing it normally, starting with the first -rule in the program. @xref{Next Statement, ,The @code{next} Statement}. +rule in the program. @xref{Next Statement}. -@node Getline/Variable, Getline/File, Plain Getline, Getline +@node Getline/Variable @subsection Using @code{getline} into a Variable @c comma before using is NOT for tertiary @cindex variables, @code{getline} command into, using @@ -5422,7 +5439,7 @@ The @code{getline} command used in this way sets only the variables split into fields, so the values of the fields (including @code{$0}) and the value of @code{NF} do not change. -@node Getline/File, Getline/Variable/File, Getline/Variable, Getline +@node Getline/File @subsection Using @code{getline} from a File @cindex input redirection @@ -5462,10 +5479,8 @@ According to POSIX, @samp{getline < @var{expression}} is ambiguous if because the concatenation operator is not parenthesized. You should write it as @samp{getline < (dir "/" file)} if you want your program to be portable to other @command{awk} implementations. -(It happens that @command{gawk} gets it right, but you should not -rely on this. Parentheses make it easier to read.) -@node Getline/Variable/File, Getline/Pipe, Getline/File, Getline +@node Getline/Variable/File @subsection Using @code{getline} into a Variable from a File @c comma before using is NOT for tertiary @cindex variables, @code{getline} command into, using @@ -5502,16 +5517,16 @@ the @samp{@@include} line. 
The @code{close} function is called to ensure that if two identical @samp{@@include} lines appear in the input, the entire specified file is included twice. -@xref{Close Files And Pipes, ,Closing Input and Output Redirections}. +@xref{Close Files And Pipes}. One deficiency of this program is that it does not process nested @samp{@@include} statements (i.e., @samp{@@include} statements in included files) the way a true macro preprocessor would. -@xref{Igawk Program, ,An Easy Way to Use Library Functions}, for a program +@xref{Igawk Program}, for a program that does handle nested @samp{@@include} statements. -@node Getline/Pipe, Getline/Variable/Pipe, Getline/Variable/File, Getline +@node Getline/Pipe @subsection Using @code{getline} from a Pipe @cindex @code{|} (vertical bar), @code{|} operator (I/O) @@ -5546,7 +5561,7 @@ The @code{close} function is called to ensure that if two identical @samp{@@execute} lines appear in the input, the command is run for each one. @ifnottex -@xref{Close Files And Pipes, ,Closing Input and Output Redirections}. +@xref{Close Files And Pipes}. @end ifnottex @c Exercise!! @c This example is unrealistic, since you could just use system @@ -5593,12 +5608,8 @@ According to POSIX, @samp{@var{expression} | getline} is ambiguous if because the concatenation operator is not parenthesized. You should write it as @samp{(@w{"echo "} "date") | getline} if you want your program to be portable to other @command{awk} implementations. -@ifinfo -(It happens that @command{gawk} gets it right, but you should not -rely on this. Parentheses make it easier to read, anyway.) 
-@end ifinfo -@node Getline/Variable/Pipe, Getline/Coprocess, Getline/Pipe, Getline +@node Getline/Variable/Pipe @subsection Using @code{getline} into a Variable from a Pipe @c comma before using is NOT for tertiary @cindex variables, @code{getline} command into, using @@ -5629,11 +5640,9 @@ According to POSIX, @samp{@var{expression} | getline @var{var}} is ambiguous if because the concatenation operator is not parenthesized. You should write it as @samp{(@w{"echo "} "date") | getline @var{var}} if you want your program to be portable to other @command{awk} implementations. -(It happens that @command{gawk} gets it right, but you should not -rely on this. Parentheses make it easier to read, anyway.) @end ifinfo -@node Getline/Coprocess, Getline/Variable/Coprocess, Getline/Variable/Pipe, Getline +@node Getline/Coprocess @subsection Using @code{getline} from a Coprocess @cindex coprocesses, @code{getline} from @c comma before using is NOT for tertiary @@ -5672,10 +5681,10 @@ and of @code{NF}. Coprocesses are an advanced feature. They are discussed here only because this is the @value{SECTION} on @code{getline}. -@xref{Two-way I/O, ,Two-Way Communications with Another Process}, +@xref{Two-way I/O}, where coprocesses are discussed in more detail. -@node Getline/Variable/Coprocess, Getline Notes, Getline/Coprocess, Getline +@node Getline/Variable/Coprocess @subsection Using @code{getline} into a Variable from a Coprocess @c comma before using is NOT for tertiary @cindex variables, @code{getline} command into, using @@ -5691,11 +5700,11 @@ changed is @var{var}. @ifinfo Coprocesses are an advanced feature. They are discussed here only because this is the @value{SECTION} on @code{getline}. -@xref{Two-way I/O, ,Two-Way Communications with Another Process}, +@xref{Two-way I/O}, where coprocesses are discussed in more detail. 
@end ifinfo -@node Getline Notes, Getline Summary, Getline/Variable/Coprocess, Getline +@node Getline Notes @subsection Points to Remember About @code{getline} Here are some miscellaneous points about @code{getline} that you should bear in mind: @@ -5731,8 +5740,8 @@ causes @command{awk} to set the value of @code{FILENAME}. Normally, @code{FILENAME} does not have a value inside @code{BEGIN} rules, because you have not yet started to process the command-line @value{DF}s. @value{DARKCORNER} -(@xref{BEGIN/END, , The @code{BEGIN} and @code{END} Special Patterns}, -also @pxref{Auto-set, ,Built-in Variables That Convey Information}.) +(@xref{BEGIN/END}, +also @pxref{Auto-set}.) @item Using @code{FILENAME} with @code{getline} @@ -5745,7 +5754,7 @@ probably by accident, and you should reconsider what it is you're trying to accomplish. @end itemize -@node Getline Summary, , Getline Notes, Getline +@node Getline Summary @subsection Summary of @code{getline} Variants @cindex @code{getline} command, variants @@ -5775,7 +5784,7 @@ This is a @command{gawk} extension @c ENDOFRANGE inex @c ENDOFRANGE infir -@node Printing, Expressions, Reading Files, Top +@node Printing @chapter Printing Output @c STARTOFRANGE prnt @@ -5790,9 +5799,9 @@ computing @emph{which} values to print. However, with two exceptions, you cannot specify @emph{how} to print them---how many columns, whether to use exponential notation or not, and so on. (For the exceptions, @pxref{Output Separators}, and -@ref{OFMT, ,Controlling Numeric Output with @code{print}}.) +@ref{OFMT}.) For printing with specifications, you need the @code{printf} statement -(@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing}). +(@pxref{Printf}). @c STARTOFRANGE prnts @cindex @code{print} statement @@ -5816,7 +5825,7 @@ and discusses the @code{close} built-in function. * Close Files And Pipes:: Closing Input and Output Files and Pipes. 
@end menu -@node Print, Print Examples, Printing, Printing +@node Print @section The @code{print} Statement The @code{print} statement is used to produce output with simple, standardized @@ -5832,7 +5841,7 @@ print @var{item1}, @var{item2}, @dots{} The entire list of items may be optionally enclosed in parentheses. The parentheses are necessary if any of the item expressions uses the @samp{>} relational operator; otherwise it could be confused with a redirection -(@pxref{Redirection, ,Redirecting Output of @code{print} and @code{printf}}). +(@pxref{Redirection}). The items to print can be constant strings or numbers, fields of the current record (such as @code{$1}), variables, or any @command{awk} @@ -5850,7 +5859,7 @@ double-quote characters, your text is taken as an @command{awk} expression, and you will probably get an error. Keep in mind that a space is printed between any two items. -@node Print Examples, Output Separators, Print, Printing +@node Print Examples @section Examples of @code{print} Statements Each @code{print} statement makes at least one line of output. However, it @@ -5907,7 +5916,7 @@ example's output makes much sense. A heading line at the beginning would make it clearer. Let's add some headings to our table of months (@code{$1}) and green crates shipped (@code{$2}). We do this using the @code{BEGIN} pattern -(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}) +(@pxref{BEGIN/END}) so that the headings are only printed once: @example @@ -5948,17 +5957,17 @@ Lining up columns this way can get pretty complicated when there are many columns to fix. Counting spaces for two or three columns is simple, but any more than this can take up a lot of time. This is why the @code{printf} statement was -created (@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing}); +created (@pxref{Printf}); one of its specialties is lining up columns of data. 
@cindex line continuations, in @code{print} statement @cindex @code{print} statement, line continuations and @strong{Note:} You can continue either a @code{print} or @code{printf} statement simply by putting a newline after any comma -(@pxref{Statements/Lines, ,@command{awk} Statements Versus Lines}). +(@pxref{Statements/Lines}). @c ENDOFRANGE prnts -@node Output Separators, OFMT, Print Examples, Printing +@node Output Separators @section Output Separators @cindex @code{OFS} variable @@ -5984,11 +5993,11 @@ character. Thus, each @code{print} statement normally makes a separate line. In order to change how output fields and records are separated, assign new values to the variables @code{OFS} and @code{ORS}. The usual place to do this is in the @code{BEGIN} rule -(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}), so +(@pxref{BEGIN/END}), so that it happens before any input is processed. It can also be done with assignments on the command line, before the names of the input files, or using the @option{-v} command-line option -(@pxref{Options, ,Command-Line Options}). +(@pxref{Options}). The following example prints the first and second fields of each input record, separated by a semicolon, with a blank line added after each newline: @@ -6008,9 +6017,9 @@ program by using a new value of @code{OFS}. $ awk 'BEGIN @{ OFS = ";"; ORS = "\n\n" @} > @{ print $1, $2 @}' BBS-list @print{} aardvark;555-5553 -@print{} +@print{} @print{} alpo-net;555-3412 -@print{} +@print{} @print{} barfly;555-7685 @dots{} @end example @@ -6018,7 +6027,7 @@ $ awk 'BEGIN @{ OFS = ";"; ORS = "\n\n" @} If the value of @code{ORS} does not contain a newline, the program's output is run together on a single line. 
-@node OFMT, Printf, Output Separators, Printing +@node OFMT @section Controlling Numeric Output with @code{print} @cindex numeric, output format @c the comma does NOT start a secondary @@ -6027,13 +6036,13 @@ When the @code{print} statement is used to print numeric values, @command{awk} internally converts the number to a string of characters and prints that string. @command{awk} uses the @code{sprintf} function to do this conversion -(@pxref{String Functions, ,String Manipulation Functions}). +(@pxref{String Functions}). For now, it suffices to say that the @code{sprintf} function accepts a @dfn{format specification} that tells it how to format numbers (or strings), and that there are a number of different ways in which numbers can be formatted. The different format specifications are discussed more fully in -@ref{Control Letters, , Format-Control Letters}. +@ref{Control Letters}. @cindex @code{sprintf} function @cindex @code{OFMT} variable @@ -6062,7 +6071,7 @@ According to the POSIX standard, @command{awk}'s behavior is undefined if @code{OFMT} contains anything but a floating-point conversion specification. @value{DARKCORNER} -@node Printf, Redirection, OFMT, Printing +@node Printf @section Using @code{printf} Statements for Fancier Printing @c STARTOFRANGE printfs @@ -6086,7 +6095,7 @@ arguments. * Printf Examples:: Several examples. @end menu -@node Basic Printf, Control Letters, Printf, Printf +@node Basic Printf @subsection Introduction to the @code{printf} Statement @cindex @code{printf} statement, syntax of @@ -6100,7 +6109,7 @@ printf @var{format}, @var{item1}, @var{item2}, @dots{} The entire list of arguments may optionally be enclosed in parentheses. The parentheses are necessary if any of the item expressions use the @samp{>} relational operator; otherwise, it can be confused with a redirection -(@pxref{Redirection, ,Redirecting Output of @code{print} and @code{printf}}). +(@pxref{Redirection}). 
 @cindex format strings
 The difference between @code{printf} and @code{print} is the @var{format}
@@ -6133,7 +6142,7 @@ $ awk 'BEGIN @{
 Here, neither the @samp{+} nor the @samp{OUCH} appear when the message is printed.
-@node Control Letters, Format Modifiers, Basic Printf, Printf
+@node Control Letters
 @subsection Format-Control Letters
 @cindex @code{printf} statement, format-control characters
 @cindex format specifiers, @code{printf} statement
@@ -6214,13 +6223,15 @@ argument and it ignores any modifiers.
 @cindex dark corner, format-control characters
 @cindex @command{gawk}, format-control characters
 @strong{Note:}
-When using the integer format-control letters for values that are outside
-the range of a C @code{long} integer, @command{gawk} switches to the
-@samp{%g} format specifier. Other versions of @command{awk} may print
-invalid values or do something else entirely.
+When using the integer format-control letters for values that are
+outside the range of the widest C integer type, @command{gawk} switches to the
+the @samp{%g} format specifier. If @option{--lint} is provided on the
+command line (@pxref{Options}), @command{gawk}
+warns about this. Other versions of @command{awk} may print invalid
+values or do something else entirely.
 @value{DARKCORNER}
-@node Format Modifiers, Printf Examples, Control Letters, Printf
+@node Format Modifiers
 @subsection Modifiers for @code{printf} Formats
 @c STARTOFRANGE pfm
@@ -6259,7 +6270,7 @@ prints the famous friendly message twice.
 At first glance, this feature doesn't seem to be of much use. It is in fact a @command{gawk} extension, intended for use in translating messages at runtime.
-@xref{Printf Ordering, , Rearranging @code{printf} Arguments},
+@xref{Printf Ordering},
 which describes how and why to use positional specifiers. For now, we will not use them.
@@ -6405,12 +6416,12 @@ C programmers may be used to supplying additional
 modifiers in @code{printf} format strings. These are not valid in @command{awk}.
Most @command{awk} implementations silently ignore these modifiers. If @option{--lint} is provided on the command line -(@pxref{Options, ,Command-Line Options}), +(@pxref{Options}), @command{gawk} warns about their use. If @option{--posix} is supplied, their use is a fatal error. @c ENDOFRANGE pfm -@node Printf Examples, , Format Modifiers, Printf +@node Printf Examples @subsection Examples Using @code{printf} The following is a simple example of @@ -6454,7 +6465,7 @@ after them. The table could be made to look even nicer by adding headings to the tops of the columns. This is done using the @code{BEGIN} pattern -(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}) +(@pxref{BEGIN/END}) so that the headers are only printed once, at the beginning of the @command{awk} program: @@ -6494,10 +6505,10 @@ At this point, it would be a worthwhile exercise to use the @code{printf} statement to line up the headings and table data for the @file{inventory-shipped} example that was covered earlier in the @value{SECTION} on the @code{print} statement -(@pxref{Print, ,The @code{print} Statement}). +(@pxref{Print}). @c ENDOFRANGE printfs -@node Redirection, Special Files, Printf, Printing +@node Redirection @section Redirecting Output of @code{print} and @code{printf} @cindex output redirection @@ -6611,11 +6622,11 @@ close(report) The message is built using string concatenation and saved in the variable @code{m}. It's then sent down the pipeline to the @command{mail} program. (The parentheses group the items to concatenate---see -@ref{Concatenation, ,String Concatenation}.) +@ref{Concatenation}.) The @code{close} function is called here because it's a good idea to close the pipe as soon as all the intended output has been sent to it. -@xref{Close Files And Pipes, ,Closing Input and Output Redirections}, +@xref{Close Files And Pipes}, for more information. 
This example also illustrates the use of a variable to represent @@ -6638,7 +6649,7 @@ but subsidiary to, the @command{awk} program. This feature is a @command{gawk} extension, and is not available in POSIX @command{awk}. -@xref{Two-way I/O, ,Two-Way Communications with Another Process}, +@xref{Two-way I/O}, for a more complete discussion. @end table @@ -6672,7 +6683,7 @@ is only opened once. @cindex @command{gawk}, implementation issues, pipes @ifnotinfo As mentioned earlier -(@pxref{Getline Notes, ,Points About @code{getline} to Remember}), +(@pxref{Getline Notes}), many @end ifnotinfo @ifnottex @@ -6703,14 +6714,14 @@ END @{ close("sh") @} The @code{tolower} function returns its argument string with all uppercase characters converted to lowercase -(@pxref{String Functions, ,String Manipulation Functions}). +(@pxref{String Functions}). The program builds up a list of command lines, using the @command{mv} utility to rename the files. It then sends the list to the shell for execution. @c ENDOFRANGE outre @c ENDOFRANGE reout -@node Special Files, Close Files And Pipes, Redirection, Printing +@node Special Files @section Special @value{FFN}s in @command{gawk} @c STARTOFRANGE gfn @cindex @command{gawk}, @value{FN}s in @@ -6726,7 +6737,7 @@ process-related information, and TCP/IP networking. * Special Caveats:: Things to watch out for. @end menu -@node Special FD, Special Process, Special Files, Special Files +@node Special FD @subsection Special Files for Standard Descriptors @cindex standard input @cindex input, standard @@ -6822,7 +6833,7 @@ It is a common error to omit the quotes, which leads to confusing results. @c Exercise: What does it do? :-) -@node Special Process, Special Network, Special FD, Special Files +@node Special Process @subsection Special Files for Process-Related Information @cindex files, for process information @@ -6831,7 +6842,7 @@ to confusing results. about the running @command{gawk} process. 
Each of these ``files'' provides a single record of information. To read them more than once, they must first be closed with the @code{close} function -(@pxref{Close Files And Pipes, ,Closing Input and Output Redirections}). +(@pxref{Close Files And Pipes}). The @value{FN}s are: @c @cindex @code{/dev/pid} special file @@ -6892,9 +6903,9 @@ in the next release of @command{gawk}. @command{gawk} prints a warning message every time you use one of these files. To obtain process-related information, use the @code{PROCINFO} array. -@xref{Auto-set, ,Built-in Variables That Convey Information}. +@xref{Auto-set}. -@node Special Network, Special Caveats, Special Process, Special Files +@node Special Network @subsection Special Files for Network Communications @cindex networks, support for @cindex TCP/IP, support for @@ -6913,12 +6924,12 @@ and the other fields represent the other essential pieces of information for making a networking connection. These @value{FN}s are used with the @samp{|&} operator for communicating with a coprocess -(@pxref{Two-way I/O, ,Two-Way Communications with Another Process}). +(@pxref{Two-way I/O}). This is an advanced feature, mentioned here only for completeness. Full discussion is delayed until -@ref{TCP/IP Networking, ,Using @command{gawk} for Network Programming}. +@ref{TCP/IP Networking}. -@node Special Caveats, , Special Network, Special Files +@node Special Caveats @subsection Special @value{FFN} Caveats Here is a list of things to bear in mind when using the @@ -6929,7 +6940,7 @@ special @value{FN}s that @command{gawk} provides: @cindex @value{FN}s, in compatibility mode @item Recognition of these special @value{FN}s is disabled if @command{gawk} is in -compatibility mode (@pxref{Options, ,Command-Line Options}). +compatibility mode (@pxref{Options}). @c @cindex automatic warnings @c @cindex warnings, automatic @@ -6969,7 +6980,7 @@ Doing so results in unpredictable behavior. 
 @end itemize
 @c ENDOFRANGE gfn
-@node Close Files And Pipes, , Special Files, Printing
+@node Close Files And Pipes
 @section Closing Input and Output Redirections
 @cindex files, output, See output files
 @c STARTOFRANGE ifc
@@ -6986,7 +6997,7 @@ Doing so results in unpredictable behavior.
 If the same @value{FN} or the same shell command is used with @code{getline} more than once during the execution of an @command{awk} program
-(@pxref{Getline, ,Explicit Input with @code{getline}}),
+(@pxref{Getline}),
 the file is opened (or the command is executed) the first time only. At that time, the first record of input is read from that file or command. The next time the same file or command is used with @code{getline},
@@ -7140,7 +7151,7 @@ The second argument should be a string, with either of the values
 @code{"to"} or @code{"from"}. Case does not matter. As this is an advanced feature, a more complete discussion is delayed until
-@ref{Two-way I/O, ,Two-Way Communications with Another Process},
+@ref{Two-way I/O},
 which discusses it in more detail and gives an example.
 @c fakenode --- for prepinfo
@@ -7183,6 +7194,15 @@ files, respectively.
 This value is zero if the close succeeds, or @minus{}1 if it fails.
+The POSIX standard is very vague; it says that @code{close}
+returns zero on success and non-zero otherwise. In general,
+different implementations vary in what they report when closing
+pipes; thus the return value cannot be used portably.
+@value{DARKCORNER}
+
+@ignore
+@c 4/27/2003: Commenting this out for now, given the above
+@c return of 16-bit value
 The return value for closing a pipeline is particularly useful. It allows you to get the output from a command as well as its exit status.
@@ -7209,13 +7229,14 @@ piping into @code{getline}. For commands piped into
 from @code{print} or @code{printf}, the return value from @code{close} is that of the library's @code{pclose} function.
+@end ignore @c ENDOFRANGE ifc @c ENDOFRANGE ofc @c ENDOFRANGE pc @c ENDOFRANGE cc @c ENDOFRANGE prnt -@node Expressions, Patterns and Actions, Printing, Top +@node Expressions @chapter Expressions @c STARTOFRANGE exps @cindex expressions @@ -7257,7 +7278,7 @@ combinations of these with various operators. * Precedence:: How various operators nest. @end menu -@node Constants, Using Constant Regexps, Expressions, Expressions +@node Constants @section Constant Expressions @cindex constants, types of @@ -7275,7 +7296,7 @@ have different forms, but are stored identically internally. * Regexp Constants:: Regular Expression constants. @end menu -@node Scalar Constants, Nondecimal-numbers, Constants, Constants +@node Scalar Constants @subsection Numeric and String Constants @cindex numeric, constants @@ -7311,7 +7332,7 @@ eight-bit ASCII characters including ASCII @sc{nul} (character code zero). Other @command{awk} implementations may have difficulty with some character codes. -@node Nondecimal-numbers, Regexp Constants, Scalar Constants, Constants +@node Nondecimal-numbers @subsection Octal and Hexadecimal Numbers @cindex octal numbers @cindex hexadecimal numbers @@ -7369,14 +7390,14 @@ are not treated differently; doing so by default would break old programs. (If you really need to do this, use the @option{--non-decimal-data} command-line option; -@pxref{Nondecimal Data, ,Allowing Nondecimal Input Data}.) +@pxref{Nondecimal Data}.) If you have octal or hexadecimal data, you can use the @code{strtonum} function -(@pxref{String Functions, ,String Manipulation Functions}) +(@pxref{String Functions}) to convert the data into a number. Most of the time, you will want to use octal or hexadecimal constants when working with the built-in bit manipulation functions; -see @ref{Bitwise Functions, ,Using @command{gawk}'s Bit Manipulation Functions}, +see @ref{Bitwise Functions}, for more information. 
Unlike some early C implementations, @samp{8} and @samp{9} are not valid @@ -7392,7 +7413,7 @@ $ gawk 'BEGIN @{ print "021 is", 021 ; print 018 @}' @cindex compatibility mode (@command{gawk}), hexadecimal numbers Octal and hexadecimal source code constants are a @command{gawk} extension. If @command{gawk} is in compatibility mode -(@pxref{Options, ,Command-Line Options}), +(@pxref{Options}), they are not available. @c fakenode --- for prepinfo @@ -7412,7 +7433,7 @@ $ gawk 'BEGIN @{ printf "0x11 is <%s>\n", 0x11 @}' @print{} 0x11 is <17> @end example -@node Regexp Constants, , Nondecimal-numbers, Constants +@node Regexp Constants @subsection Regular Expression Constants @c STARTOFRANGE rec @@ -7428,7 +7449,7 @@ matching operators can also match computed or ``dynamic'' regexps (which are just ordinary strings or variables that contain a regexp). @c ENDOFRANGE cnst -@node Using Constant Regexps, Variables, Constants, Expressions +@node Using Constant Regexps @section Using Regular Expression Constants @cindex dark corner, regexp constants @@ -7440,7 +7461,7 @@ When a regexp constant appears by itself, it has the same meaning as if it appeared in a pattern, i.e., @samp{($0 ~ /foo/)} @value{DARKCORNER} -@xref{Expression Patterns, ,Expressions as Patterns}. +@xref{Expression Patterns}. This means that the following two code segments: @example @@ -7501,14 +7522,14 @@ POSIX specification. Constant regular expressions are also used as the first argument for the @code{gensub}, @code{sub}, and @code{gsub} functions, and as the second argument of the @code{match} function -(@pxref{String Functions, ,String Manipulation Functions}). +(@pxref{String Functions}). Modern implementations of @command{awk}, including @command{gawk}, allow the third argument of @code{split} to be a regexp constant, but some older implementations do not. 
@value{DARKCORNER} This can lead to confusion when attempting to use regexp constants as arguments to user-defined functions -(@pxref{User-defined, ,User-Defined Functions}). +(@pxref{User-defined}). For example: @example @@ -7541,7 +7562,7 @@ a parameter to a user-defined function, since passing a truth value in this way is probably not what was intended. @c ENDOFRANGE rec -@node Variables, Conversion, Using Constant Regexps, Expressions +@node Variables @section Variables @cindex variables, user-defined @@ -7558,7 +7579,7 @@ on the @command{awk} command line. advanced method of input. @end menu -@node Using Variables, Assignment Options, Variables, Variables +@node Using Variables @subsection Using Variables in a Program Variables let you give names to values and refer to them later. Variables @@ -7571,7 +7592,7 @@ A variable name is a valid expression by itself; it represents the variable's current value. Variables are given new values with @dfn{assignment operators}, @dfn{increment operators}, and @dfn{decrement operators}. -@xref{Assignment Ops, ,Assignment Expressions}. +@xref{Assignment Ops}. @c NEXT ED: Can also be changed by sub, gsub, split @cindex variables, built-in @@ -7590,7 +7611,7 @@ is zero if converted to a number. There is no need to ``initialize'' each variable explicitly in @command{awk}, which is what you would do in C and in most other traditional languages. -@node Assignment Options, , Using Variables, Variables +@node Assignment Options @subsection Assigning Variables on the Command Line @cindex variables, assigning on command line @c comma before assigning does NOT start tertiary @@ -7598,7 +7619,7 @@ which is what you would do in C and in most other traditional languages. Any @command{awk} variable can be set by including a @dfn{variable assignment} among the arguments on the command line when @command{awk} is invoked -(@pxref{Other Arguments, ,Other Command-Line Arguments}). +(@pxref{Other Arguments}). 
Such an assignment has the following form: @example @@ -7621,7 +7642,7 @@ as in the following: the variable is set at the very beginning, even before the @code{BEGIN} rules are run. The @option{-v} option and its assignment must precede all the @value{FN} arguments, as well as the program text. -(@xref{Options, ,Command-Line Options}, for more information about +(@xref{Options}, for more information about the @option{-v} option.) Otherwise, the variable assignment is performed at a time determined by its position among the input file arguments---after the processing of the @@ -7652,13 +7673,13 @@ $ awk '@{ print $n @}' n=4 inventory-shipped n=2 BBS-list @cindex dark corner, command-line arguments Command-line arguments are made available for explicit examination by the @command{awk} program in the @code{ARGV} array -(@pxref{ARGC and ARGV, ,Using @code{ARGC} and @code{ARGV}}). +(@pxref{ARGC and ARGV}). @command{awk} processes the values of command-line assignments for escape sequences (@pxref{Escape Sequences}). @value{DARKCORNER} -@node Conversion, Arithmetic Ops, Variables, Expressions +@node Conversion @section Conversion of Strings and Numbers @cindex converting, strings to numbers @@ -7700,7 +7721,7 @@ by the @command{awk} built-in variable @code{CONVFMT} (@pxref{Built-in Variables Numbers are converted using the @code{sprintf} function with @code{CONVFMT} as the format specifier -(@pxref{String Functions, ,String Manipulation Functions}). +(@pxref{String Functions}). @code{CONVFMT}'s default value is @code{"%.6g"}, which prints a value with at least six significant digits. For some applications, you might want to @@ -7745,10 +7766,46 @@ However, these semantics for @code{OFMT} are something to keep in mind if you mu port your new style program to older implementations of @command{awk}. We recommend that instead of changing your programs, just port @command{gawk} itself. 
-@xref{Print, ,The @code{print} Statement},
+@xref{Print},
 for more information on the @code{print} statement.
-@node Arithmetic Ops, Concatenation, Conversion, Expressions
+Finally, once again, where you are can matter when it comes to
+converting between numbers and strings. In
+@ref{Locales}, we mentioned that the
+local character set and language (the locale) can affect how @command{gawk} matches
+characters. The locale also affects numeric formats. In particular, for @command{awk}
+programs, it affects the decimal point character. The @code{"C"} locale, and most
+English-language locales, use the period character (@samp{.}) as the decimal point.
+However, many (if not most) European and non-English locales use the comma (@samp{,})
+as the decimal point character.
+
+The POSIX standard says that @command{awk} always uses the period as the decimal
+point when reading the @command{awk} program source code, and for command-line
+variable assignments (@pxref{Other Arguments}).
+However, when interpreting input data, for @code{print} and @code{printf} output,
+and for number to string conversion, the local decimal point character is used.
+As of @value{PVERSION} 3.1.3, @command{gawk} fully complies with this aspect
+of the standard. Here are some examples indicating the difference in behavior,
+on a GNU/Linux system:
+
+@example
+$ gawk 'BEGIN @{ printf "%g\n", 3.1415927 @}'
+@print{} 3.14159
+$ LC_ALL=en_DK gawk 'BEGIN @{ printf "%g\n", 3.1415927 @}'
+@print{} 3,14159
+$ echo 4,321 | gawk '@{ print $1 + 1 @}'
+@print{} 5
+$ echo 4,321 | LC_ALL=en_DK gawk '@{ print $1 + 1 @}'
+@print{} 5,321
+@end example
+
+@noindent
+The @samp{en_DK} locale is for English in Denmark, where the comma acts as
+the decimal point separator. In the normal @code{"C"} locale, @command{gawk}
+treats @samp{4,321} as @samp{4}, while in the Danish locale, it's treated
+as the full number, @samp{4.321}.
+ +@node Arithmetic Ops @section Arithmetic Operators @cindex arithmetic operators @cindex operators, arithmetic @@ -7862,7 +7919,7 @@ The POSIX standard only specifies the use of @samp{^} for exponentiation. For maximum portability, do not use the @samp{**} operator. -@node Concatenation, Assignment Ops, Arithmetic Ops, Expressions +@node Concatenation @section String Concatenation @cindex Kernighan, Brian @quotation @@ -7992,7 +8049,7 @@ As mentioned earlier, when doing concatenation, @emph{parenthesize}. Otherwise, you're never quite sure what you'll get. -@node Assignment Ops, Increment Ops, Concatenation, Expressions +@node Assignment Ops @section Assignment Expressions @c STARTOFRANGE asop @cindex assignment operators @@ -8042,8 +8099,8 @@ a @dfn{side effect}. @cindex operators, assignment The lefthand operand of an assignment need not be a variable (@pxref{Variables}); it can also be a field -(@pxref{Changing Fields, ,Changing the Contents of a Field}) or -an array element (@pxref{Arrays, ,Arrays in @command{awk}}). +(@pxref{Changing Fields}) or +an array element (@pxref{Arrays}). These are all called @dfn{lvalues}, which means they can appear on the lefthand side of an assignment operator. The righthand operand may be any expression; it produces the new value @@ -8149,7 +8206,7 @@ BEGIN @{ The indices of @code{bar} are practically guaranteed to be different, because @code{rand} returns different values each time it is called. (Arrays and the @code{rand} function haven't been covered yet. -@xref{Arrays, ,Arrays in @command{awk}}, +@xref{Arrays}, and see @ref{Numeric Functions}, for more information). This example illustrates an important fact about assignment operators: the lefthand expression is only evaluated @emph{once}. @@ -8269,12 +8326,12 @@ awk '/[=]=/' /dev/null @command{gawk} does not have this problem, nor do the other freely available versions described in -@ref{Other Versions, , Other Freely Available @command{awk} Implementations}. 
+@ref{Other Versions}. @c ENDOFRANGE exas @c ENDOFRANGE opas @c ENDOFRANGE asop -@node Increment Ops, Truth Values, Assignment Ops, Expressions +@node Increment Ops @section Increment and Decrement Operators @c STARTOFRANGE inop @@ -8397,7 +8454,7 @@ You should avoid such things in your own programs. @c ENDOFRANGE opde @c ENDOFRANGE deop -@node Truth Values, Typing and Comparison, Increment Ops, Expressions +@node Truth Values @section True and False in @command{awk} @cindex truth values @cindex logical false/true @@ -8432,7 +8489,7 @@ There is a surprising consequence of the ``nonzero or non-null'' rule: the string constant @code{"0"} is actually true, because it is non-null. @value{DARKCORNER} -@node Typing and Comparison, Boolean Ops, Truth Values, Expressions +@node Typing and Comparison @section Variable Typing and Comparison Expressions @quotation @i{The Guide is definitive. Reality is frequently inaccurate.}@* @@ -8525,15 +8582,15 @@ following symmetric matrix: % \hfil -- infinite glue; has the effect of right-justifying in this case. % # -- replaced by the text (for instance, `STRNUM', in the last row). % \quad -- about the width of an `M'. Just separates the columns. -% +% % The second column (\vrule#) is what generates the vertical rule that % spans table rows. -% +% % The doubled && before the next entry means `repeat the following % template as many times as necessary on each line' -- in our case, twice. -% +% % The template itself, \quad#\hfil, left-justifies with a little space before. -% +% \halign{\strut\hfil#\quad&\vrule#&&\quad#\hfil\cr &&STRING &NUMERIC &STRNUM\cr % The \omit tells TeX to skip inserting the template for this column on @@ -8635,7 +8692,7 @@ True if the array @var{array} has an element with the subscript @var{subscript}. Comparison expressions have the value one if true and zero if false. 
When comparing operands of mixed types, numeric operands are converted to strings using the value of @code{CONVFMT} -(@pxref{Conversion, ,Conversion of Strings and Numbers}). +(@pxref{Conversion}). Strings are compared by comparing the first character of each, then the second character of each, @@ -8730,8 +8787,8 @@ has the value one if @code{x} contains @samp{foo}, such as The righthand operand of the @samp{~} and @samp{!~} operators may be either a regexp constant (@code{/@dots{}/}) or an ordinary expression. In the latter case, the value of the expression as a string is used as a -dynamic regexp (@pxref{Regexp Usage, ,How to Use Regular Expressions}; also -@pxref{Computed Regexps, ,Using Dynamic Regexps}). +dynamic regexp (@pxref{Regexp Usage}; also +@pxref{Computed Regexps}). @cindex @command{awk}, regexp constants and @cindex regexp constants @@ -8746,14 +8803,14 @@ $0 ~ /@var{regexp}/ One special place where @code{/foo/} is @emph{not} an abbreviation for @samp{$0 ~ /foo/} is when it is the righthand operand of @samp{~} or @samp{!~}. -@xref{Using Constant Regexps, ,Using Regular Expression Constants}, +@xref{Using Constant Regexps}, where this is discussed in more detail. @c ENDOFRANGE comex @c ENDOFRANGE excom @c ENDOFRANGE vartypc @c ENDOFRANGE varting -@node Boolean Ops, Conditional Exp, Typing and Comparison, Expressions +@node Boolean Ops @section Boolean Expressions @cindex and Boolean-logic operator @cindex or Boolean-logic operator @@ -8778,7 +8835,7 @@ The terms are equivalent. Boolean expressions can be used wherever comparison and matching expressions can be used. They can be used in @code{if}, @code{while}, @code{do}, and @code{for} statements -(@pxref{Statements, ,Control Statements in Actions}). +(@pxref{Statements}). They have numeric values (one if true, zero if false) that come into play if the result of the Boolean expression is stored in a variable or used in arithmetic. @@ -8830,7 +8887,7 @@ BEGIN @{ if (! 
("HOME" in ENVIRON)) @end example (The @code{in} operator is described in -@ref{Reference to Elements, ,Referring to an Array Element}.) +@ref{Reference to Elements}.) @end table @cindex short-circuit operators @@ -8848,7 +8905,7 @@ its evaluation. Statements that use @samp{&&} or @samp{||} can be continued simply by putting a newline after them. But you cannot put a newline in front of either of these operators without using backslash continuation -(@pxref{Statements/Lines, ,@command{awk} Statements Versus Lines}). +(@pxref{Statements/Lines}). @cindex @code{!} (exclamation point), @code{!} operator @cindex exclamation point (@code{!}), @code{!} operator @@ -8884,7 +8941,7 @@ so we'll leave well enough alone. @cindex @code{next} statement @strong{Note:} The @code{next} statement is discussed in -@ref{Next Statement, ,The @code{next} Statement}. +@ref{Next Statement}. @code{next} tells @command{awk} to skip the rest of the rules, get the next record, and start processing the rules over again at the top. The reason it's there is to avoid printing the bracketing @@ -8892,7 +8949,7 @@ The reason it's there is to avoid printing the bracketing @c ENDOFRANGE exbo @c ENDOFRANGE boex -@node Conditional Exp, Function Calls, Boolean Ops, Expressions +@node Conditional Exp @section Conditional Expressions @cindex conditional expressions @cindex expressions, conditional @@ -8935,7 +8992,7 @@ x == y ? a[i++] : b[i++] This is guaranteed to increment @code{i} exactly once, because each time only one of the two increment expressions is executed and the other is not. -@xref{Arrays, ,Arrays in @command{awk}}, +@xref{Arrays}, for more information about arrays. @cindex differences in @command{awk} and @command{gawk}, line continuations @@ -8946,11 +9003,11 @@ a statement that uses @samp{?:} can be continued simply by putting a newline after either character. 
However, putting a newline in front of either character does not work without using backslash continuation -(@pxref{Statements/Lines, ,@command{awk} Statements Versus Lines}). +(@pxref{Statements/Lines}). If @option{--posix} is specified -(@pxref{Options, , Command-Line Options}), then this extension is disabled. +(@pxref{Options}), then this extension is disabled. -@node Function Calls, Precedence, Conditional Exp, Expressions +@node Function Calls @section Function Calls @cindex function calls @@ -8962,10 +9019,10 @@ example, the function @code{sqrt} computes the square root of a number. @cindex functions, built-in A fixed set of functions are @dfn{built-in}, which means they are available in every @command{awk} program. The @code{sqrt} function is one -of these. @xref{Built-in, ,Built-in Functions}, for a list of built-in +of these. @xref{Built-in}, for a list of built-in functions and their descriptions. In addition, you can define functions for use in your program. -@xref{User-defined, ,User-Defined Functions}, +@xref{User-defined}, for instructions on how to do this. @cindex arguments, in function calls @@ -9004,10 +9061,10 @@ Some of the built-in functions have one or more optional arguments. If those arguments are not supplied, the functions use a reasonable default value. -@xref{Built-in, ,Built-in Functions}, for full details. If arguments +@xref{Built-in}, for full details. If arguments are omitted in calls to user-defined functions, then those arguments are treated as local variables and initialized to the empty string -(@pxref{User-defined, ,User-Defined Functions}). +(@pxref{User-defined}). 
@cindex side effects, function calls Like every other expression, the function call has a value, which is @@ -9029,7 +9086,7 @@ $ awk '@{ print "The square root of", $1, "is", sqrt($1) @}' @kbd{@value{CTL}-d} @end example -@node Precedence, , Function Calls, Expressions +@node Precedence @section Operator Precedence (How Operators Nest) @c STARTOFRANGE prec @cindex precedence @@ -9122,7 +9179,7 @@ Addition, subtraction. @item @r{String Concatenation} No special symbol is used to indicate concatenation. The operands are simply written side by side -(@pxref{Concatenation, ,String Concatenation}). +(@pxref{Concatenation}). @cindex @code{<} (left angle bracket), @code{<} operator @cindex left angle bracket (@code{<}), @code{<} operator @@ -9216,7 +9273,7 @@ For maximum portability, do not use them. @c ENDOFRANGE oppr @c ENDOFRANGE exps -@node Patterns and Actions, Arrays, Expressions, Top +@node Patterns and Actions @chapter Patterns, Actions, and Variables @c STARTOFRANGE pat @cindex patterns @@ -9242,7 +9299,7 @@ building something useful. * Built-in Variables:: Summarizes the built-in variables. @end menu -@node Pattern Overview, Using Shell Variables, Patterns and Actions, Patterns and Actions +@node Pattern Overview @section Pattern Elements @menu @@ -9262,31 +9319,31 @@ The following is a summary of the types of @command{awk} patterns: @item /@var{regular expression}/ A regular expression. It matches when the text of the input record fits the regular expression. -(@xref{Regexp, ,Regular Expressions}.) +(@xref{Regexp}.) @item @var{expression} A single expression. It matches when its value is nonzero (if a number) or non-null (if a string). -(@xref{Expression Patterns, ,Expressions as Patterns}.) +(@xref{Expression Patterns}.) @item @var{pat1}, @var{pat2} A pair of patterns separated by a comma, specifying a range of records. The range includes both the initial record that matches @var{pat1} and the final record that matches @var{pat2}. 
-(@xref{Ranges, ,Specifying Record Ranges with Patterns}.) +(@xref{Ranges}.) @item BEGIN @itemx END Special patterns for you to supply startup or cleanup actions for your @command{awk} program. -(@xref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}.) +(@xref{BEGIN/END}.) @item @var{empty} The empty pattern matches every input record. -(@xref{Empty, ,The Empty Pattern}.) +(@xref{Empty}.) @end table -@node Regexp Patterns, Expression Patterns, Pattern Overview, Pattern Overview +@node Regexp Patterns @subsection Regular Expressions as Patterns @cindex patterns, expressions as @cindex regular expressions, as patterns @@ -9303,7 +9360,7 @@ For example: END @{ print buzzwords, "buzzwords seen" @} @end example -@node Expression Patterns, Ranges, Regexp Patterns, Pattern Overview +@node Expression Patterns @subsection Expressions as Patterns @cindex expressions, as patterns @@ -9319,14 +9376,14 @@ depends on only what has happened so far in the execution of the @cindex comparison expressions, as patterns @cindex patterns, comparison expressions as Comparison expressions, using the comparison operators described in -@ref{Typing and Comparison, ,Variable Typing and Comparison Expressions}, +@ref{Typing and Comparison}, are a very common kind of pattern. Regexp matching and nonmatching are also very common expressions. The left operand of the @samp{~} and @samp{!~} operators is a string. The right operand is either a constant regular expression enclosed in slashes (@code{/@var{regexp}/}), or any expression whose string value is used as a dynamic regular expression -(@pxref{Computed Regexps, , Using Dynamic Regexps}). +(@pxref{Computed Regexps}). The following example prints the second field of each input record whose first field is precisely @samp{foo}: @@ -9410,7 +9467,7 @@ patterns. Likewise, the special patterns @code{BEGIN} and @code{END}, which never match any input record, are not expressions and cannot appear inside Boolean patterns. 
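To make the difference between a comparison pattern and a regexp-matching pattern concrete, here is a small sketch. The sample data and the function names @code{sample}, @code{exact_match}, and @code{regexp_match} are made up for this illustration and do not come from the manual:

```shell
# Made-up sample data: three records, two fields each.
sample() { printf 'foo 1\nfoobar 2\nbar 3\n'; }

# Comparison expression as a pattern: first field is exactly "foo".
exact_match() { sample | awk '$1 == "foo" { print $2 }'; }

# Regexp match as a pattern: first field contains "foo" anywhere.
regexp_match() { sample | awk '$1 ~ /foo/ { print $2 }'; }

exact_match     # prints 1
regexp_match    # prints 1 and 2, one per line
```

The comparison selects only the record whose first field is precisely @samp{foo}, while the @samp{~} operator also selects @samp{foobar}.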
-@node Ranges, BEGIN/END, Expression Patterns, Pattern Overview +@node Ranges @subsection Specifying Record Ranges with Patterns @cindex range patterns @@ -9455,7 +9512,7 @@ the @samp{%} symbol), each on its own line, that should be ignored. A first attempt would be to combine a range pattern that describes the delimited text with the @code{next} statement -(not discussed yet, @pxref{Next Statement, , The @code{next} Statement}). +(not discussed yet, @pxref{Next Statement}). This causes @command{awk} to skip any further processing of the current record and start over again with the next input record. Such a program looks like this: @@ -9499,7 +9556,7 @@ $ echo Yes | gawk '(/1/,/2/) || /Yes/' @error{} gawk: cmd. line:2: ^ unexpected newline @end example -@node BEGIN/END, Empty, Ranges, Pattern Overview +@node BEGIN/END @subsection The @code{BEGIN} and @code{END} Special Patterns @c STARTOFRANGE beg @@ -9520,7 +9577,7 @@ programmers. * I/O And BEGIN/END:: I/O issues in BEGIN/END rules. @end menu -@node Using BEGIN/END, I/O And BEGIN/END, BEGIN/END, BEGIN/END +@node Using BEGIN/END @subsubsection Startup and Cleanup Actions A @code{BEGIN} rule is executed once only, before the first input record @@ -9564,14 +9621,14 @@ in terms of program organization and readability. Multiple @code{BEGIN} and @code{END} rules are useful for writing library functions, because each library file can have its own @code{BEGIN} and/or -@code{END} rule to do its own initialization and/or cleanup. +@code{END} rule to do its own initialization and/or cleanup. The order in which library functions are named on the command line controls the order in which their @code{BEGIN} and @code{END} rules are executed. Therefore, you have to be careful when writing such rules in library files so that the order in which they are executed doesn't matter. -@xref{Options, ,Command-Line Options}, for more information on +@xref{Options}, for more information on using library functions. 
-@xref{Library Functions, ,A Library of @command{awk} Functions}, +@xref{Library Functions}, for a number of useful library functions. If an @command{awk} program has only a @code{BEGIN} rule and no @@ -9582,7 +9639,7 @@ reading and ignoring input until the end of the file was seen.} However, if an no other rules in the program. This is necessary in case the @code{END} rule checks the @code{FNR} and @code{NR} variables. -@node I/O And BEGIN/END, , Using BEGIN/END, BEGIN/END +@node I/O And BEGIN/END @subsubsection Input/Output from @code{BEGIN} and @code{END} Rules @cindex input/output, from @code{BEGIN} and @code{END} @@ -9594,14 +9651,14 @@ there simply is no input record, and therefore no fields, when executing @code{BEGIN} rules. References to @code{$0} and the fields yield a null string or zero, depending upon the context. One way to give @code{$0} a real value is to execute a @code{getline} command -without a variable (@pxref{Getline, ,Explicit Input with @code{getline}}). +without a variable (@pxref{Getline}). Another way is simply to assign a value to @code{$0}. @cindex differences in @command{awk} and @command{gawk}, @code{BEGIN}/@code{END} patterns @cindex POSIX @command{awk}, @code{BEGIN}/@code{END} patterns @cindex @code{print} statement, @code{BEGIN}/@code{END} patterns and @cindex @code{BEGIN} pattern, @code{print} statement and -@cindex @code{END} pattern, @code{print} statement and +@cindex @code{END} pattern, @code{print} statement and The second point is similar to the first but from the other direction. Traditionally, due largely to implementation issues, @code{$0} and @code{NF} were @emph{undefined} inside an @code{END} rule. @@ -9626,17 +9683,17 @@ line is needed in the output, the program should print one explicitly. 
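The earlier remark that a @code{BEGIN} rule can give @code{$0} a real value by assignment can be sketched as follows. The function name @code{begin_fields} and the sample text are assumptions made for the example:

```shell
# Hypothetical demo: give $0 a value inside a BEGIN rule by assignment.
begin_fields() {
    awk 'BEGIN {
             $0 = "alpha beta gamma"   # assigning $0 splits it into fields
             print NF, $2              # field count, then the second field
         }'
}

begin_fields    # prints 3 beta
```

Because the program consists only of a @code{BEGIN} rule, no input is read. The fields come entirely from the assignment.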
@cindex @code{next} statement, @code{BEGIN}/@code{END} patterns and @cindex @code{nextfile} statement, @code{BEGIN}/@code{END} patterns and @cindex @code{BEGIN} pattern, @code{next}/@code{nextfile} statements and -@cindex @code{END} pattern, @code{next}/@code{nextfile} statements and +@cindex @code{END} pattern, @code{next}/@code{nextfile} statements and Finally, the @code{next} and @code{nextfile} statements are not allowed in a @code{BEGIN} rule, because the implicit read-a-record-and-match-against-the-rules loop has not started yet. Similarly, those statements are not valid in an @code{END} rule, since all the input has been read. -(@xref{Next Statement, ,The @code{next} Statement}, and see -@ref{Nextfile Statement, ,Using @command{gawk}'s @code{nextfile} Statement}.) +(@xref{Next Statement}, and see +@ref{Nextfile Statement}.) @c ENDOFRANGE beg @c ENDOFRANGE end -@node Empty, , BEGIN/END, Pattern Overview +@node Empty @subsection The Empty Pattern @cindex empty pattern @@ -9652,7 +9709,7 @@ awk '@{ print $1 @}' BBS-list prints the first field of every record. @c ENDOFRANGE pat -@node Using Shell Variables, Action Overview, Pattern Overview, Patterns and Actions +@node Using Shell Variables @section Using Shell Variables in Programs @cindex shells, variables @cindex @command{awk} programs, shell variables in @@ -9686,15 +9743,15 @@ The second part is single-quoted. Variable substitution via quoting works, but can be potentially messy. It requires a good understanding of the shell's quoting rules -(@pxref{Quoting, ,Shell Quoting Issues}), +(@pxref{Quoting}), and it's often difficult to correctly match up the quotes when reading the program. A better method is to use @command{awk}'s variable assignment feature -(@pxref{Assignment Options, ,Assigning Variables on the Command Line}) +(@pxref{Assignment Options}) to assign the shell variable's value to an @command{awk} variable's value. 
Then use dynamic regexps to match the pattern -(@pxref{Computed Regexps, ,Using Dynamic Regexps}). +(@pxref{Computed Regexps}). The following shows how to redo the previous example using this technique: @@ -9715,7 +9772,7 @@ provides more flexibility, since the variable can be used anywhere inside the program---for printing, as an array subscript, or for any other use---without requiring the quoting tricks at every point in the program. -@node Action Overview, Statements, Using Shell Variables, Patterns and Actions +@node Action Overview @section Actions @c @cindex action, definition of @c @cindex curly braces @@ -9725,7 +9782,7 @@ use---without requiring the quoting tricks at every point in the program. An @command{awk} program or script consists of a series of rules and function definitions interspersed. (Functions are -described later. @xref{User-defined, ,User-Defined Functions}.) +described later. @xref{User-defined}.) A rule contains a pattern and an action, either of which (but not both) may be omitted. The purpose of the @dfn{action} is to tell @command{awk} what to do once a match for the pattern is found. Thus, @@ -9767,13 +9824,13 @@ Call functions or assign values to variables (@pxref{Expressions}). Executing this kind of statement simply computes the value of the expression. This is useful when the expression has side effects -(@pxref{Assignment Ops, ,Assignment Expressions}). +(@pxref{Assignment Ops}). @item Control statements Specify the control flow of @command{awk} programs. The @command{awk} language gives you C-like constructs (@code{if}, @code{for}, @code{while}, and @code{do}) as well as a few -special ones (@pxref{Statements, ,Control Statements in Actions}). +special ones (@pxref{Statements}). @item Compound statements Consist of one or more statements enclosed in @@ -9783,22 +9840,22 @@ or @code{for} statement. @item Input statements Use the @code{getline} command -(@pxref{Getline, ,Explicit Input with @code{getline}}). +(@pxref{Getline}). 
Also supplied in @command{awk} are the @code{next} -statement (@pxref{Next Statement, ,The @code{next} Statement}), +statement (@pxref{Next Statement}), and the @code{nextfile} statement -(@pxref{Nextfile Statement, ,Using @command{gawk}'s @code{nextfile} Statement}). +(@pxref{Nextfile Statement}). @item Output statements Such as @code{print} and @code{printf}. -@xref{Printing, ,Printing Output}. +@xref{Printing}. @item Deletion statements For deleting array elements. -@xref{Delete, ,The @code{delete} Statement}. +@xref{Delete}. @end table -@node Statements, Built-in Variables, Action Overview, Patterns and Actions +@node Statements @section Control Statements in Actions @c STARTOFRANGE csta @cindex control statements @@ -9838,6 +9895,8 @@ newlines or semicolons. condition is satisfied. * For Statement:: Another looping statement, that provides initialization and increment clauses. +* Switch Statement:: Switch/case evaluation for conditional + execution of statements based on a value. * Break Statement:: Immediately exit the innermost enclosing loop. * Continue Statement:: Skip to the end of the innermost enclosing loop. @@ -9846,7 +9905,7 @@ newlines or semicolons. * Exit Statement:: Stop execution of @command{awk}. @end menu -@node If Statement, While Statement, Statements, Statements +@node If Statement @subsection The @code{if}-@code{else} Statement @cindex @code{if} statement @@ -9894,7 +9953,7 @@ it produces a syntax error. Don't actually write programs this way, because a human reader might fail to see the @code{else} if it is not the first thing on its line. -@node While Statement, Do Statement, If Statement, Statements +@node While Statement @subsection The @code{while} Statement @cindex @code{while} statement @cindex loops @@ -9954,7 +10013,7 @@ compound statement or else is very simple. The newline after the open-brace that begins the compound statement is not required either, but the program is harder to read without it. 
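For reference, a @code{while} loop laid out in the recommended style might look like the following sketch. The wrapper function @code{first_three_fields} is hypothetical. It prints the first three fields of its argument, one per line:

```shell
# Hypothetical wrapper: print the first three fields of "$1", one per line.
first_three_fields() {
    printf '%s\n' "$1" | awk '{
        i = 1
        while (i <= 3) {    # the body is a compound statement in braces
            print $i
            i++
        }
    }'
}

first_three_fields 'a b c d'    # prints a, b, and c on separate lines
```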
-@node Do Statement, For Statement, While Statement, Statements +@node Do Statement @subsection The @code{do}-@code{while} Statement @cindex @code{do}-@code{while} statement @@ -9998,7 +10057,7 @@ realistic example, since in this case an ordinary @code{while} would do just as well. This situation reflects actual experience; only occasionally is there a real use for a @code{do} statement. -@node For Statement, Break Statement, Do Statement, Statements +@node For Statement @subsection The @code{for} Statement @cindex @code{for} statement @@ -10076,7 +10135,7 @@ while (@var{condition}) @{ @cindex loops, @code{continue} statements and @noindent The only exception is when the @code{continue} statement -(@pxref{Continue Statement, ,The @code{continue} Statement}) is used +(@pxref{Continue Statement}) is used inside the loop. Changing a @code{for} statement to a @code{while} statement in this way can change the effect of the @code{continue} statement inside the loop. @@ -10098,11 +10157,71 @@ for (i in array) @end example @noindent -@xref{Scanning an Array, ,Scanning All Elements of an Array}, +@xref{Scanning an Array}, for more information on this version of the @code{for} loop. @end ifinfo -@node Break Statement, Continue Statement, For Statement, Statements +@node Switch Statement +@subsection The @code{switch} Statement +@cindex @code{switch} statement +@cindex @code{case} keyword +@cindex @code{default} keyword + +@strong{NOTE:} This @value{SUBSECTION} describes an experimental feature +added in @command{gawk} 3.1.3. It is @emph{not} enabled by default. To +enable it, use the @option{--enable-switch} option to @command{configure} +when @command{gawk} is being configured and built. +@xref{Additional Configuration Options}, +for more information. + +The @code{switch} statement allows the evaluation of an expression and +the execution of statements based on a @code{case} match. Case statements +are checked for a match in the order they are defined. 
If no suitable +@code{case} is found, the @code{default} section is executed, if supplied. The +general form of the @code{switch} statement looks like this: + +@example +switch (@var{expression}) @{ +case @var{value or regular expression}: + @var{case-body} +default: + @var{default-body} +@} +@end example + +The @code{switch} statement works as it does in C. Once a match to a given +case is made, case statement bodies are executed until a @code{break}, +@code{continue}, @code{next}, @code{nextfile} or @code{exit} is encountered, +or the end of the @code{switch} statement itself. For example: + +@example +switch (NR * 2 + 1) @{ +case 3: +case "11": + print NR - 1 + break + +case /2[[:digit:]]+/: + print NR + +default: + print NR + 1 + +case -1: + print NR * -1 +@} +@end example + +Note that if none of the statements specified above halt execution +of a matched @code{case} statement, execution falls through to the +next @code{case} until execution halts. In the above example, for +any case value starting with @samp{2} followed by one or more digits, +the @code{print} statement is executed and then falls through into the +@code{default} section, executing its @code{print} statement. In turn, +the @minus{}1 case will also be executed since the @code{default} does +not halt execution. + +@node Break Statement @subsection The @code{break} Statement @cindex @code{break} statement @cindex loops, exiting @@ -10131,7 +10250,7 @@ immediately @dfn{breaks out} of the containing @code{for} loop. This means that @command{awk} proceeds immediately to the statement following the loop and continues processing. (This is very different from the @code{exit} statement, which stops the entire @command{awk} program. -@xref{Exit Statement, ,The @code{exit} Statement}.) +@xref{Exit Statement}.) 
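The behavior just described can be sketched with a loop that searches for the smallest divisor of a number. This is an illustrative variant rather than the manual's own listing. The function name @code{smallest_divisor} is hypothetical, and the input is assumed to be an integer of 2 or more:

```shell
# Hypothetical demo; assumes "$1" is an integer >= 2.
smallest_divisor() {
    printf '%s\n' "$1" | awk '{
        num = $1
        for (divisor = 2; divisor * divisor <= num; divisor++)
            if (num % divisor == 0)
                break       # leave the for loop; awk itself keeps running
        # Control arrives here after the break or when the loop runs out.
        if (divisor * divisor <= num)
            printf "Smallest divisor of %d is %d\n", num, divisor
        else
            printf "%d is prime\n", num
    }'
}

smallest_divisor 15    # prints Smallest divisor of 15 is 3
smallest_divisor 7     # prints 7 is prime
```

After the @code{break}, execution continues with the @code{if} statement that follows the loop, whereas an @code{exit} at the same point would stop the whole program.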
Th following program illustrates how the @var{condition} of a @code{for} or @code{while} statement could be replaced with a @code{break} inside @@ -10164,18 +10283,18 @@ The @code{break} statement has no meaning when used outside the body of a loop. However, although it was never documented, historical implementations of @command{awk} treated the @code{break} statement outside of a loop as if it were a @code{next} statement -(@pxref{Next Statement, ,The @code{next} Statement}). +(@pxref{Next Statement}). Recent versions of Unix @command{awk} no longer allow this usage. @command{gawk} supports this use of @code{break} only if @option{--traditional} has been specified on the command line -(@pxref{Options, ,Command-Line Options}). +(@pxref{Options}). Otherwise, it is treated as an error, since the POSIX standard specifies that @code{break} should only be used inside the body of a loop. @value{DARKCORNER} -@node Continue Statement, Next Statement, Break Statement, Statements +@node Continue Statement @subsection The @code{continue} Statement @cindex @code{continue} statement @@ -10234,15 +10353,15 @@ a loop. Historical versions of @command{awk} treated a @code{continue} statement outside a loop the same way they treated a @code{break} statement outside a loop: as if it were a @code{next} statement -(@pxref{Next Statement, ,The @code{next} Statement}). +(@pxref{Next Statement}). Recent versions of Unix @command{awk} no longer work this way, and @command{gawk} allows it only if @option{--traditional} is specified on -the command line (@pxref{Options, ,Command-Line Options}). Just like the +the command line (@pxref{Options}). Just like the @code{break} statement, the POSIX standard specifies that @code{continue} should only be used inside the body of a loop. 
@value{DARKCORNER} -@node Next Statement, Nextfile Statement, Continue Statement, Statements +@node Next Statement @subsection The @code{next} Statement @cindex @code{next} statement @@ -10252,7 +10371,7 @@ further rules are executed for the current record, and the rest of the current rule's action isn't executed. Contrast this with the effect of the @code{getline} function -(@pxref{Getline, ,Explicit Input with @code{getline}}). That also causes +(@pxref{Getline}). That also causes @command{awk} to read the next record immediately, but it does not alter the flow of control in any way (i.e., the rest of the current action executes with a new input record). @@ -10284,12 +10403,12 @@ the program's subsequent rules won't see the bad record. The error message is redirected to the standard error output stream, as error messages should be. For more detail see -@ref{Special Files, ,Special @value{FFN}s in @command{gawk}}. +@ref{Special Files}. @c @cindex @command{awk} language, POSIX version @c @cindex @code{next}, inside a user-defined function @cindex @code{BEGIN} pattern, @code{next}/@code{nextfile} statements and -@cindex @code{END} pattern, @code{next}/@code{nextfile} statements and +@cindex @code{END} pattern, @code{next}/@code{nextfile} statements and @cindex POSIX @command{awk}, @code{next}/@code{nextfile} statements and @cindex @code{next} statement, user-defined functions and @cindex functions, user-defined, @code{next}/@code{nextfile} statements and @@ -10299,15 +10418,15 @@ the @code{next} statement is used in a @code{BEGIN} or @code{END} rule. Although POSIX permits it, some other @command{awk} implementations don't allow the @code{next} statement inside function bodies -(@pxref{User-defined, ,User-Defined Functions}). +(@pxref{User-defined}). Just as with any other @code{next} statement, a @code{next} statement inside a function body reads the next record and starts processing it with the first rule in the program. 
If the @code{next} statement causes the end of the input to be reached, then the code in any @code{END} rules is executed. -@xref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}. +@xref{BEGIN/END}. -@node Nextfile Statement, Exit Statement, Next Statement, Statements +@node Nextfile Statement @subsection Using @command{gawk}'s @code{nextfile} Statement @cindex @code{nextfile} statement @cindex differences in @command{awk} and @command{gawk}, @code{next}/@code{nextfile} statements @@ -10321,7 +10440,7 @@ current @value{DF}. The @code{nextfile} statement is a @command{gawk} extension. In most other @command{awk} implementations, or if @command{gawk} is in compatibility mode -(@pxref{Options, ,Command-Line Options}), +(@pxref{Options}), @code{nextfile} is not special. Upon execution of the @code{nextfile} statement, @code{FILENAME} is @@ -10331,7 +10450,7 @@ starts over with the first rule in the program. (@code{ARGIND} hasn't been introduced yet. @xref{Built-in Variables}.) If the @code{nextfile} statement causes the end of the input to be reached, then the code in any @code{END} rules is executed. -@xref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}. +@xref{BEGIN/END}. The @code{nextfile} statement is useful when there are many @value{DF}s to process but it isn't necessary to process every record in every file. @@ -10347,17 +10466,17 @@ opened with redirections. It is not related to the main processing that If it's necessary to use an @command{awk} version that doesn't support @code{nextfile}, see -@ref{Nextfile Function, ,Implementing @code{nextfile} as a Function}, +@ref{Nextfile Function}, for a user-defined function that simulates the @code{nextfile} statement. 
@cindex functions, user-defined, @code{next}/@code{nextfile} statements and @cindex @code{nextfile} statement, user-defined functions and The current version of the Bell Laboratories @command{awk} -(@pxref{Other Versions, ,Other Freely Available @command{awk} Implementations}) +(@pxref{Other Versions}) also supports @code{nextfile}. However, it doesn't allow the @code{nextfile} statement inside function bodies -(@pxref{User-defined, ,User-Defined Functions}). +(@pxref{User-defined}). @command{gawk} does; a @code{nextfile} inside a function body reads the next record and starts processing it with the first rule in the program, just as any other @code{nextfile} statement. @@ -10374,7 +10493,7 @@ inconsistent. When it appeared after @code{next}, @samp{file} was a keyword; otherwise, it was a regular identifier. The old usage is no longer accepted; @samp{next file} generates a syntax error. -@node Exit Statement, , Nextfile Statement, Statements +@node Exit Statement @subsection The @code{exit} Statement @cindex @code{exit} statement @@ -10393,7 +10512,7 @@ program stops processing everything immediately. No input records are read. However, if an @code{END} rule is present, as part of executing the @code{exit} statement, the @code{END} rule is executed -(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}). +(@pxref{BEGIN/END}). If @code{exit} is used as part of an @code{END} rule, it causes the program to stop immediately. @@ -10406,7 +10525,7 @@ In such a case, if you don't want the @code{END} rule to do its job, set a variable to nonzero before the @code{exit} statement and check that variable in the @code{END} rule. -@xref{Assert Function, ,Assertions}, +@xref{Assert Function}, for an example that does this. 
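A minimal sketch of this flag technique follows. The function name @code{sum_nonnegative} and the variable @code{bad} are assumptions made for the example. The program sums its arguments, but stops at the first negative value and keeps the @code{END} rule from printing a partial result:

```shell
# Hypothetical demo: sum the arguments, but give up on a negative value.
sum_nonnegative() {
    printf '%s\n' "$@" | awk '
        $1 < 0 { bad = 1; exit 1 }            # set the flag, then exit
        { sum += $1 }
        END { if (!bad) print "sum:", sum }   # END still runs after exit
    '
}

sum_nonnegative 1 2 3     # prints sum: 6
sum_nonnegative 1 -2 3    # prints nothing
```

The @code{END} rule runs in both cases. Only the check of @code{bad} prevents the second call from printing.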
@cindex dark corner, @code{exit} statement @@ -10439,7 +10558,7 @@ BEGIN @{ @c ENDOFRANGE acs @c ENDOFRANGE accs -@node Built-in Variables, , Statements, Patterns and Actions +@node Built-in Variables @section Built-in Variables @c STARTOFRANGE bvar @cindex built-in variables @@ -10468,7 +10587,7 @@ describing their areas of activity. * ARGC and ARGV:: Ways to use @code{ARGC} and @code{ARGV}. @end menu -@node User-modified, Auto-set, Built-in Variables, Built-in Variables +@node User-modified @subsection Built-in Variables That Control @command{awk} @c STARTOFRANGE bvaru @cindex built-in variables, user-modifiable @@ -10495,15 +10614,15 @@ files should use binary I/O. Any other string value is equivalent to @code{"rw"}, but @command{gawk} generates a warning message. @code{BINMODE} is described in more detail in -@ref{PC Using, ,Using @command{gawk} on PC Operating Systems}. +@ref{PC Using}. @cindex differences in @command{awk} and @command{gawk}, @code{BINMODE} variable This variable is a @command{gawk} extension. In other @command{awk} implementations (except @command{mawk}, -@pxref{Other Versions, , Other Freely Available @command{awk} Implementations}), +@pxref{Other Versions}), or if @command{gawk} is in compatibility mode -(@pxref{Options, ,Command-Line Options}), +(@pxref{Options}), it is not special. @cindex @code{CONVFMT} variable @@ -10512,10 +10631,10 @@ it is not special. @cindex strings, converting, numbers to @item CONVFMT This string controls conversion of numbers to -strings (@pxref{Conversion, ,Conversion of Strings and Numbers}). +strings (@pxref{Conversion}). It works by being passed, in effect, as the first argument to the @code{sprintf} function -(@pxref{String Functions, ,String Manipulation Functions}). +(@pxref{String Functions}). Its default value is @code{"%.6g"}. @code{CONVFMT} was introduced by the POSIX standard. 
@@ -10528,11 +10647,11 @@ This is a space-separated list of columns that tells @command{gawk} how to split input with fixed columnar boundaries. Assigning a value to @code{FIELDWIDTHS} overrides the use of @code{FS} for field splitting. -@xref{Constant Size, ,Reading Fixed-Width Data}, for more information. +@xref{Constant Size}, for more information. @cindex @command{gawk}, @code{FIELDWIDTHS} variable in If @command{gawk} is in compatibility mode -(@pxref{Options, ,Command-Line Options}), then @code{FIELDWIDTHS} +(@pxref{Options}), then @code{FIELDWIDTHS} has no special meaning, and field-splitting operations occur based exclusively on the value of @code{FS}. @@ -10541,7 +10660,7 @@ exclusively on the value of @code{FS}. @cindex field separators @item FS This is the input field separator -(@pxref{Field Separators, ,Specifying How Fields Are Separated}). +(@pxref{Field Separators}). The value is a single-character string or a multi-character regular expression that matches the separations between fields in an input record. If the value is the null string (@code{""}), then each @@ -10585,11 +10704,11 @@ functions, record termination with @code{RS}, and field splitting with However, the value of @code{IGNORECASE} does @emph{not} affect array subscripting and it does not affect field splitting when using a single-character field separator. -@xref{Case-sensitivity, ,Case Sensitivity in Matching}. +@xref{Case-sensitivity}. @cindex @command{gawk}, @code{IGNORECASE} variable in If @command{gawk} is in compatibility mode -(@pxref{Options, ,Command-Line Options}), +(@pxref{Options}), then @code{IGNORECASE} has no special meaning. Thus, string and regexp operations are always case-sensitive. @@ -10599,7 +10718,7 @@ and regexp operations are always case-sensitive. @item LINT # When this variable is true (nonzero or non-null), @command{gawk} behaves as if the @option{--lint} command-line option is in effect. -(@pxref{Options, ,Command-Line Options}). +(@pxref{Options}). 
With a value of @code{"fatal"}, lint warnings become fatal errors. With a value of @code{"invalid"}, only warnings about things that are actually invalid are issued. (This is not fully implemented yet.) @@ -10621,10 +10740,10 @@ of @command{awk} being executed. @cindex strings, converting, numbers to @item OFMT This string controls conversion of numbers to -strings (@pxref{Conversion, ,Conversion of Strings and Numbers}) for +strings (@pxref{Conversion}) for printing with the @code{print} statement. It works by being passed as the first argument to the @code{sprintf} function -(@pxref{String Functions, ,String Manipulation Functions}). +(@pxref{String Functions}). Its default value is @code{"%.6g"}. Earlier versions of @command{awk} also used @code{OFMT} to specify the format for converting numbers to strings in general expressions; this is now done by @code{CONVFMT}. @@ -10656,13 +10775,13 @@ It can also be the null string, in which case records are separated by runs of blank lines. If it is a regexp, records are separated by matches of the regexp in the input text. -(@xref{Records, ,How Input Is Split into Records}.) +(@xref{Records}.) The ability for @code{RS} to be a regular expression is a @command{gawk} extension. In most other @command{awk} implementations, or if @command{gawk} is in compatibility mode -(@pxref{Options, ,Command-Line Options}), +(@pxref{Options}), just the first character of @code{RS}'s value is used. @cindex @code{SUBSEP} variable @@ -10673,7 +10792,7 @@ This is the subscript separator. It has the default value of @code{"\034"} and is used to separate the parts of the indices of a multidimensional array. Thus, the expression @code{@w{foo["A", "B"]}} really accesses @code{foo["A\034B"]} -(@pxref{Multi-dimensional, ,Multidimensional Arrays}). +(@pxref{Multi-dimensional}). 
@cindex @code{TEXTDOMAIN} variable @cindex differences in @command{awk} and @command{gawk}, @code{TEXTDOMAIN} variable @@ -10683,13 +10802,13 @@ This variable is used for internationalization of programs at the @command{awk} level. It sets the default text domain for specially marked string constants in the source text, as well as for the @code{dcgettext}, @code{dcngettext} and @code{bindtextdomain} functions -(@pxref{Internationalization, ,Internationalization with @command{gawk}}). +(@pxref{Internationalization}). The default value of @code{TEXTDOMAIN} is @code{"messages"}. This variable is a @command{gawk} extension. In other @command{awk} implementations, or if @command{gawk} is in compatibility mode -(@pxref{Options, ,Command-Line Options}), +(@pxref{Options}), it is not special. @end table @c ENDOFRANGE bvar @@ -10697,7 +10816,7 @@ it is not special. @c ENDOFRANGE bvaru @c ENDOFRANGE nmbv -@node Auto-set, ARGC and ARGV, User-modified, Built-in Variables +@node Auto-set @subsection Built-in Variables That Convey Information @c STARTOFRANGE bvconi @@ -10716,7 +10835,7 @@ information to your program. The variables that are specific to @item ARGC@r{,} ARGV The command-line arguments available to @command{awk} programs are stored in an array called @code{ARGV}. @code{ARGC} is the number of command-line -arguments present. @xref{Other Arguments, ,Other Command-Line Arguments}. +arguments present. @xref{Other Arguments}. Unlike most @command{awk} arrays, @code{ARGV} is indexed from 0 to @code{ARGC} @minus{} 1. In the following example: @@ -10746,7 +10865,7 @@ method of accessing command-line arguments. The value of @code{ARGV[0]} can vary from system to system. Also, you should note that the program text is @emph{not} included in @code{ARGV}, nor are any of @command{awk}'s command-line options. -@xref{ARGC and ARGV, , Using @code{ARGC} and @code{ARGV}}, for information +@xref{ARGC and ARGV}, for information about how @command{awk} uses these variables. 
@cindex @code{ARGIND} variable @@ -10772,7 +10891,7 @@ next file is opened. This variable is a @command{gawk} extension. In other @command{awk} implementations, or if @command{gawk} is in compatibility mode -(@pxref{Options, ,Command-Line Options}), +(@pxref{Options}), it is not special. @cindex @code{ENVIRON} variable @@ -10789,7 +10908,7 @@ does not affect the environment passed on to any programs that Some operating systems may not have environment variables. On such systems, the @code{ENVIRON} array is empty (except for @w{@code{ENVIRON["AWKPATH"]}}, -@pxref{AWKPATH Variable, ,The @env{AWKPATH} Environment Variable}). +@pxref{AWKPATH Variable}). @cindex @code{ERRNO} variable @cindex differences in @command{awk} and @command{gawk}, @code{ERRNO} variable @@ -10802,7 +10921,7 @@ then @code{ERRNO} contains a string describing the error. This variable is a @command{gawk} extension. In other @command{awk} implementations, or if @command{gawk} is in compatibility mode -(@pxref{Options, ,Command-Line Options}), +(@pxref{Options}), it is not special. @cindex @code{FILENAME} variable @@ -10812,7 +10931,7 @@ The name of the file that @command{awk} is currently reading. When no @value{DF}s are listed on the command line, @command{awk} reads from the standard input and @code{FILENAME} is set to @code{"-"}. @code{FILENAME} is changed each time a new file is read -(@pxref{Reading Files, ,Reading Input Files}). +(@pxref{Reading Files}). Inside a @code{BEGIN} rule, the value of @code{FILENAME} is @code{""}, since there are no input files being processed yet.@footnote{Some early implementations of Unix @command{awk} initialized @@ -10821,7 +10940,7 @@ processed. This behavior was incorrect and should not be relied upon in your programs.} @value{DARKCORNER} Note, though, that using @code{getline} -(@pxref{Getline, ,Explicit Input with @code{getline}}) +(@pxref{Getline}) inside a @code{BEGIN} rule can give @code{FILENAME} a value. 
@@ -10829,14 +10948,14 @@ inside a @code{BEGIN} rule can give @item FNR The current record number in the current file. @code{FNR} is incremented each time a new record is read -(@pxref{Getline, ,Explicit Input with @code{getline}}). It is reinitialized +(@pxref{Getline}). It is reinitialized to zero each time a new input file is started. @cindex @code{NF} variable @item NF The number of fields in the current input record. @code{NF} is set each time a new record is read, when a new field is -created or when @code{$0} changes (@pxref{Fields, ,Examining Fields}). +created or when @code{$0} changes (@pxref{Fields}). Unlike most of the variables described in this @ifnotinfo @@ -10848,13 +10967,13 @@ node, assigning a value to @code{NF} has the potential to affect @command{awk}'s internal workings. In particular, assignments to @code{NF} can be used to create or remove fields from the -current record: @xref{Changing Fields, ,Changing the Contents of a Field}. +current record: @xref{Changing Fields}. @cindex @code{NR} variable @item NR The number of input records @command{awk} has processed since the beginning of the program's execution -(@pxref{Records, ,How Input Is Split into Records}). +(@pxref{Records}). @code{NR} is incremented each time a new record is read. @cindex @code{PROCINFO} array @@ -10897,19 +11016,19 @@ On some systems, there may be elements in the array, @code{"group1"} through @code{"group@var{N}"} for some @var{N}. @var{N} is the number of supplementary groups that the process has. Use the @code{in} operator to test for these elements -(@pxref{Reference to Elements, , Referring to an Array Element}). +(@pxref{Reference to Elements}). This array is a @command{gawk} extension. In other @command{awk} implementations, or if @command{gawk} is in compatibility mode -(@pxref{Options, ,Command-Line Options}), +(@pxref{Options}), it is not special. 
@cindex @code{RLENGTH} variable @item RLENGTH The length of the substring matched by the @code{match} function -(@pxref{String Functions, ,String Manipulation Functions}). +(@pxref{String Functions}). @code{RLENGTH} is set by invoking the @code{match} function. Its value is the length of the matched string, or @minus{}1 if no match is found. @@ -10917,7 +11036,7 @@ is the length of the matched string, or @minus{}1 if no match is found. @item RSTART The start-index in characters of the substring that is matched by the @code{match} function -(@pxref{String Functions, ,String Manipulation Functions}). +(@pxref{String Functions}). @code{RSTART} is set by invoking the @code{match} function. Its value is the position of the string where the matched substring starts, or zero if no match was found. @@ -10931,7 +11050,7 @@ that matched the text denoted by @code{RS}, the record separator. This variable is a @command{gawk} extension. In other @command{awk} implementations, or if @command{gawk} is in compatibility mode -(@pxref{Options, ,Command-Line Options}), +(@pxref{Options}), it is not special. @end table @c ENDOFRANGE bvconi @@ -10965,18 +11084,18 @@ $ echo '1 @noindent Before @code{FNR} was added to the @command{awk} language -(@pxref{V7/SVR3.1, ,Major Changes Between V7 and SVR3.1}), +(@pxref{V7/SVR3.1}), many @command{awk} programs used this feature to track the number of records in a file by resetting @code{NR} to zero when @code{FILENAME} changed. 
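The difference between @code{NR} and @code{FNR} described in this hunk's context can be sketched as follows. This is only an illustration: the helper name @code{fnr_demo} and the files @file{one.txt} and @file{two.txt} are hypothetical, and the sketch assumes a POSIX @command{awk} and @command{mktemp} are available.

```shell
# Hypothetical helper: build two small files, then print FILENAME, FNR, and NR
# for every record read from them.
fnr_demo() {
    dir=$(mktemp -d) || return 1
    printf 'a\nb\n' > "$dir/one.txt"
    printf 'c\n'    > "$dir/two.txt"
    # FNR restarts at 1 for each input file; NR keeps counting across files.
    (cd "$dir" && awk '{ print FILENAME, FNR, NR }' one.txt two.txt)
    rm -rf "$dir"
}
fnr_demo
```

With these inputs the last line printed should be @samp{two.txt 1 3}: the third record overall, but the first of its file.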
-@node ARGC and ARGV, , Auto-set, Built-in Variables +@node ARGC and ARGV @subsection Using @code{ARGC} and @code{ARGV} @cindex @code{ARGC}/@code{ARGV} variables @cindex arguments, command-line @cindex command line, arguments -@ref{Auto-set, ,Built-in Variables That Convey Information}, +@ref{Auto-set}, presented the following program describing the information contained in @code{ARGC} and @code{ARGV}: @@ -10997,7 +11116,7 @@ contains @samp{inventory-shipped}, and @code{ARGV[2]} contains Notice that the @command{awk} program is not entered in @code{ARGV}. The other special command-line options, with their arguments, are also not entered. This includes variable assignments done with the @option{-v} -option (@pxref{Options, ,Command-Line Options}). +option (@pxref{Options}). Normal variable assignments on the command line @emph{are} treated as arguments and do show up in the @code{ARGV} array: @@ -11036,12 +11155,12 @@ special feature, @command{awk} ignores @value{FN}s that have been replaced with the null string. Another option is to use the @code{delete} statement to remove elements from -@code{ARGV} (@pxref{Delete, ,The @code{delete} Statement}). +@code{ARGV} (@pxref{Delete}). All of these actions are typically done in the @code{BEGIN} rule, before actual processing of the input begins. -@xref{Split Program, ,Splitting a Large File into Pieces}, and see -@ref{Tee Program, ,Duplicating Output into Multiple Files}, for examples +@xref{Split Program}, and see +@ref{Tee Program}, for examples of each way of removing elements from @code{ARGV}. The following fragment processes @code{ARGV} in order to examine, and then remove, command-line options: @@ -11090,7 +11209,7 @@ Because @option{-d} is not a valid @command{gawk} option, it and the following @option{-v} are passed on to the @command{awk} program. 
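As a minimal sketch of the rule stated in this section's context (assignments made with @option{-v} are not entered in @code{ARGV}, while ordinary command-line assignments are), consider the following. The helper name @code{argv_demo} and the variables @code{A} and @code{B} are made up for illustration; the index starts at 1 because the value of @code{ARGV[0]} varies from system to system.

```shell
# Hypothetical helper: list ARGV[1] .. ARGV[ARGC-1].
# Only a BEGIN rule is used, so awk never tries to open inventory-shipped.
argv_demo() {
    awk -v A=1 'BEGIN {
        for (i = 1; i < ARGC; i++)
            print i ": " ARGV[i]
    }' B=2 inventory-shipped
}
argv_demo
```

The @samp{A=1} given to @option{-v} does not show up; @samp{B=2} and @samp{inventory-shipped} are reported as elements 1 and 2.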
-@node Arrays, Functions, Patterns and Actions, Top +@node Arrays @chapter Arrays in @command{awk} @c STARTOFRANGE arrs @cindex arrays @@ -11114,7 +11233,7 @@ for sorting an array based on its indices. @cindex namespace issues @command{awk} maintains a single set of names that may be used for naming variables, arrays, and functions -(@pxref{User-defined, ,User-Defined Functions}). +(@pxref{User-defined}). Thus, you cannot have a variable and an array with the same name in the same @command{awk} program. @@ -11137,7 +11256,7 @@ same @command{awk} program. * Array Sorting:: Sorting array values and indices. @end menu -@node Array Intro, Reference to Elements, Arrays, Arrays +@node Array Intro @section Introduction to Arrays The @command{awk} language provides one-dimensional arrays @@ -11269,7 +11388,7 @@ numeric form---thus illustrating that a single array can have both numbers and strings as indices. In fact, array subscripts are always strings; this is discussed in more detail in -@ref{Numeric Array Subscripts, ,Using Numbers to Subscript Arrays}. +@ref{Numeric Array Subscripts}. Here, the number @code{1} isn't double-quoted, since @command{awk} automatically converts it to a string. @@ -11282,14 +11401,14 @@ to retrieve it. When @command{awk} creates an array (e.g., with the @code{split} built-in function), that array's indices are consecutive integers starting at one. -(@xref{String Functions, ,String Manipulation Functions}.) +(@xref{String Functions}.) @command{awk}'s arrays are efficient---the time to access an element is independent of the number of elements in the array. @c ENDOFRANGE arrin @c ENDOFRANGE inarr -@node Reference to Elements, Assigning Elements, Array Intro, Arrays +@node Reference to Elements @section Referring to an Array Element @cindex arrays, elements, referencing @cindex elements in arrays @@ -11312,7 +11431,7 @@ of array @code{foo} at index @samp{4.3}. 
A reference to an array element that has no recorded value yields a value of @code{""}, the null string. This includes elements that have not been assigned any value as well as elements that have been -deleted (@pxref{Delete, ,The @code{delete} Statement}). Such a reference +deleted (@pxref{Delete}). Such a reference automatically creates that array element, with the null string as its value. (In some cases, this is unfortunate, because it might waste memory inside @command{awk}.) @@ -11351,7 +11470,7 @@ if (frequencies[2] != "") print "Subscript 2 is present." @end example -@node Assigning Elements, Array Example, Reference to Elements, Arrays +@node Assigning Elements @section Assigning Array Elements @cindex arrays, elements, assigning @cindex elements in arrays, assigning @@ -11369,7 +11488,7 @@ Array elements can be assigned values just like assigned a value. The expression @var{value} is the value to assign to that element of the array. -@node Array Example, Scanning an Array, Assigning Elements, Arrays +@node Array Example @section Basic Array Example The following program takes a list of lines, each beginning with a line @@ -11437,7 +11556,7 @@ END @{ @} @end example -@node Scanning an Array, Delete, Array Example, Arrays +@node Scanning an Array @section Scanning All Elements of an Array @cindex elements in arrays, scanning @cindex arrays, scanning @@ -11470,7 +11589,7 @@ the word as index. The second rule scans the elements of @code{used} to find all the distinct words that appear in the input. It prints each word that is more than 10 characters long and also prints the number of such words. -@xref{String Functions, ,String Manipulation Functions}, +@xref{String Functions}, for more information on the built-in function @code{length}. @example @@ -11492,7 +11611,7 @@ END @{ @end example @noindent -@xref{Word Sorting, ,Generating Word Usage Counts}, +@xref{Word Sorting}, for a more detailed example of this type. 
@cindex arrays, elements, order of @@ -11505,7 +11624,7 @@ the loop body; it is not predictable whether the @code{for} loop will reach them. Similarly, changing @var{var} inside the loop may produce strange results. It is best to avoid such things. -@node Delete, Numeric Array Subscripts, Scanning an Array, Arrays +@node Delete @section The @code{delete} Statement @cindex @code{delete} statement @cindex deleting elements in arrays @@ -11520,7 +11639,7 @@ delete @var{array}[@var{index}] @end example Once an array element has been deleted, any value the element once -had is no longer available. It is as if the element had never +had is no longer available. It is as if the element had never been referred to or had been given a value. The following is an example of deleting elements in an array: @@ -11555,7 +11674,7 @@ if (4 in foo) @cindex lint checking, array elements It is not an error to delete an element that does not exist. If @option{--lint} is provided on the command line -(@pxref{Options, ,Command-Line Options}), +(@pxref{Options}), @command{gawk} issues a warning message when an element that is not in the array is deleted. @@ -11571,7 +11690,7 @@ delete @var{array} @end example This ability is a @command{gawk} extension; it is not available in -compatibility mode (@pxref{Options, ,Command-Line Options}). +compatibility mode (@pxref{Options}). Using this version of the @code{delete} statement is about three times more efficient than the equivalent loop that deletes each element one @@ -11589,7 +11708,7 @@ split("", array) @c comma before deleting does NOT start a tertiary @cindex @code{split} function, array elements, deleting The @code{split} function -(@pxref{String Functions, ,String Manipulation Functions}) +(@pxref{String Functions}) clears out the target array first. This call asks it to split apart the null string. Because there is no data to split out, the function simply clears the array and then returns. 
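The portable clearing idiom described above can be sketched like this. The helper name @code{clear_demo} and the array contents are hypothetical; the only @command{awk} features assumed are @code{split} and the @code{for (k in a)} loop.

```shell
# Hypothetical helper: fill an array, empty it with split(), count what is left.
clear_demo() {
    awk 'BEGIN {
        a["x"] = 1; a["y"] = 2
        split("", a)        # splitting the null string leaves a with no elements
        n = 0
        for (k in a)        # loop body never runs if the array is empty
            n++
        print n
    }'
}
clear_demo
```

Unlike @samp{delete a}, this form does not depend on a @command{gawk} extension, which is why it is useful in programs meant to run under other @command{awk} implementations.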
@@ -11602,7 +11721,7 @@ delete an array and then use the array's name as a scalar a[1] = 3; delete a; a = 3 @end example -@node Numeric Array Subscripts, Uninitialized Subscripts, Delete, Arrays +@node Numeric Array Subscripts @section Using Numbers to Subscript Arrays @cindex numbers, as array subscripts @@ -11612,7 +11731,7 @@ a[1] = 3; delete a; a = 3 An important aspect about arrays to remember is that @emph{array subscripts are always strings}. When a numeric value is used as a subscript, it is converted to a string value before being used for subscripting -(@pxref{Conversion, ,Conversion of Strings and Numbers}). +(@pxref{Conversion}). This means that the value of the built-in variable @code{CONVFMT} can affect how your program accesses elements of an array. For example: @@ -11640,7 +11759,7 @@ since @code{"12.15"} is a different string from @code{"12.153"}. @cindex converting, during subscripting According to the rules for conversions -(@pxref{Conversion, ,Conversion of Strings and Numbers}), integer +(@pxref{Conversion}), integer values are always converted to strings as integers, no matter what the value of @code{CONVFMT} may happen to be. So the usual case of the following works: @@ -11653,7 +11772,7 @@ for (i = 1; i <= maxsub; i++) The ``integer values always convert to strings as integers'' rule has an additional consequence for array indexing. Octal and hexadecimal constants -(@pxref{Nondecimal-numbers, ,Octal and Hexadecimal Numbers}) +(@pxref{Nondecimal-numbers}) are converted internally into numbers, and their original form is forgotten. This means, for example, that @@ -11668,7 +11787,7 @@ things work as one would expect them to. But it is useful to have a precise knowledge of the actual rules which sometimes can have a subtle effect on your programs. 
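A short sketch of the subscript-conversion rules in this section's context follows. It assumes an @command{awk} that follows the POSIX conversion rules; the helper name @code{convfmt_demo} and the array names are hypothetical.

```shell
# Hypothetical helper: show that a noninteger subscript depends on CONVFMT,
# while an integer subscript does not.
convfmt_demo() {
    awk 'BEGIN {
        xyz = 12.153
        n = 12
        frac[xyz] = 1            # stored under the string "12.153"
        whole[n] = 1             # stored under the string "12"
        CONVFMT = "%2.2f"        # xyz now converts to "12.15"; n still converts to "12"
        print "fractional subscript:", ((xyz in frac) ? "present" : "missing")
        print "integer subscript:", ((n in whole) ? "present" : "missing")
    }'
}
convfmt_demo
```

The fractional lookup fails because @code{"12.15"} is a different string from @code{"12.153"}, whereas the integer lookup is unaffected by the change to @code{CONVFMT}.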
-@node Uninitialized Subscripts, Multi-dimensional, Numeric Array Subscripts, Arrays +@node Uninitialized Subscripts @section Using Uninitialized Variables as Subscripts @c last comma does NOT start a tertiary @@ -11726,9 +11845,9 @@ Even though it is somewhat unusual, the null string @value{DARKCORNER} @command{gawk} warns about the use of the null string as a subscript if @option{--lint} is provided -on the command line (@pxref{Options, ,Command-Line Options}). +on the command line (@pxref{Options}). -@node Multi-dimensional, Multi-scanning, Uninitialized Subscripts, Arrays +@node Multi-dimensional @section Multidimensional Arrays @cindex subscripts in arrays, multidimensional @@ -11744,7 +11863,7 @@ two-dimensional array named @code{grid} is with Multidimensional arrays are supported in @command{awk} through concatenation of indices into one string. @command{awk} converts the indices into strings -(@pxref{Conversion, ,Conversion of Strings and Numbers}) and +(@pxref{Conversion}) and concatenates them together, with a separator between them. This creates a single string that describes the values of the separate indices. The combined string is used as a single index into an ordinary, @@ -11826,7 +11945,7 @@ the program produces the following output: 3 2 1 6 @end example -@node Multi-scanning, Array Sorting, Multi-dimensional, Arrays +@node Multi-scanning @section Scanning Multidimensional Arrays There is no special @code{for} statement for scanning a @@ -11839,9 +11958,9 @@ multidimensional @emph{way of accessing} an array. However, if your program has an array that is always accessed as multidimensional, you can get the effect of scanning it by combining the scanning @code{for} statement -(@pxref{Scanning an Array, ,Scanning All Elements of an Array}) with the +(@pxref{Scanning an Array}) with the built-in @code{split} function -(@pxref{String Functions, ,String Manipulation Functions}). +(@pxref{String Functions}). 
It works in the following manner: @example @@ -11874,7 +11993,7 @@ The result is to set @code{separate[1]} to @code{"1"} and @code{separate[2]} to @code{"foo"}. Presto! The original sequence of separate indices is recovered. -@node Array Sorting, , Multi-scanning, Arrays +@node Array Sorting @section Sorting Array Values and Indices with @command{gawk} @cindex arrays, sorting @@ -11890,7 +12009,7 @@ While this can be educational for exploring different sorting algorithms, usually that's not the point of the program. @command{gawk} provides the built-in @code{asort} and @code{asorti} functions -(@pxref{String Functions, ,String Manipulation Functions}) +(@pxref{String Functions}) for sorting arrays. For example: @example @@ -11906,7 +12025,7 @@ to some number @var{n}, the total number of elements in @code{data}. @code{data[1]} @value{LEQ} @code{data[2]} @value{LEQ} @code{data[3]}, and so on. The comparison of array elements is done using @command{gawk}'s usual comparison rules -(@pxref{Typing and Comparison, ,Variable Typing and Comparison Expressions}). +(@pxref{Typing and Comparison}). @cindex side effects, @code{asort} function An important side effect of calling @code{asort} is that @@ -11983,7 +12102,7 @@ affects sorting for both @code{asort} and @code{asorti}. Caveat Emptor. @c ENDOFRANGE arrs -@node Functions, Internationalization, Arrays, Top +@node Functions @chapter Functions @c STARTOFRANGE funcbi @@ -12006,7 +12125,7 @@ The second half of this @value{CHAPTER} describes these * User-defined:: Describes User-defined functions in detail. @end menu -@node Built-in, User-defined, Functions, Functions +@node Built-in @section Built-in Functions @c 2e: USE TEXINFO-2 FUNCTION DEFINITION STUFF!!!!!!!!!!!!! @@ -12028,7 +12147,7 @@ but are summarized here for your convenience. * I18N Functions:: Functions for string translation. 
@end menu -@node Calling Built-in, Numeric Functions, Built-in, Built-in +@node Calling Built-in @subsection Calling Built-in Functions To call one of @command{awk}'s built-in functions, write the name of @@ -12087,7 +12206,7 @@ and 12. But if the order of evaluation is right to left, @code{i} first becomes 10, then 11, and @code{atan2} is called with the two arguments 11 and 10. -@node Numeric Functions, String Functions, Calling Built-in, Built-in +@node Numeric Functions @subsection Numeric Functions The following list describes all of @@ -12137,7 +12256,7 @@ This returns the arctangent of @code{@var{y} / @var{x}} in radians. @cindex random numbers, @code{rand}/@code{srand} functions This returns a random number. The values of @code{rand} are uniformly distributed between zero and one. -The value is never zero and never one.@footnote{The C version of @code{rand} +The value could be zero but is never one.@footnote{The C version of @code{rand} is known to produce fairly poor sequences of random numbers. However, nothing requires that an @command{awk} implementation use the C @code{rand} to implement the @command{awk} version of @code{rand}. @@ -12214,7 +12333,7 @@ easy to keep track of the seeds in case you need to consistently reproduce sequences of random numbers. @end table -@node String Functions, I/O Functions, Numeric Functions, Built-in +@node String Functions @subsection String-Manipulation Functions The functions in this @value{SECTION} look at or change the text of one or more @@ -12267,9 +12386,9 @@ a[3] = "sac" @end example The @code{asort} function is described in more detail in -@ref{Array Sorting, ,Sorting Array Values and Indices with @command{gawk}}. +@ref{Array Sorting}. @code{asort} is a @command{gawk} extension; it is not available -in compatibility mode (@pxref{Options, ,Command-Line Options}). +in compatibility mode (@pxref{Options}). 
@item asorti(@var{source} @r{[}, @var{dest}@r{]}) # @cindex @code{asorti} function (@command{gawk}) @@ -12281,10 +12400,10 @@ the comparison performed is always a string comparison. (Here too, @code{IGNORECASE} affects the sorting.) The @code{asorti} function is described in more detail in -@ref{Array Sorting, ,Sorting Array Values and Indices with @command{gawk}}. +@ref{Array Sorting}. It was added in @command{gawk} 3.1.2. @code{asorti} is a @command{gawk} extension; it is not available -in compatibility mode (@pxref{Options, ,Command-Line Options}). +in compatibility mode (@pxref{Options}). @item index(@var{in}, @var{find}) @cindex @code{index} function @@ -12336,7 +12455,7 @@ at which that substring begins (one, if it starts at the beginning of The @var{regexp} argument may be either a regexp constant (@samp{/@dots{}/}) or a string constant (@var{"@dots{}"}). In the latter case, the string is treated as a regexp to be matched. -@ref{Computed Regexps, ,Using Dynamic Regexps}, for a +@ref{Computed Regexps}, for a discussion of the difference between the two forms, and the implications for writing your program correctly. @@ -12430,10 +12549,15 @@ $ echo foooobazbarrrrr | @print{} 9 7 @end example +There may not be subscripts for the start and index for every parenthesized +subexpressions, since they may not all have matched text; thus they +should be tested for with the @code{in} operator +(@pxref{Reference to Elements}). + @cindex troubleshooting, @code{match} function The @var{array} argument to @code{match} is a @command{gawk} extension. In compatibility mode -(@pxref{Options, ,Command-Line Options}), +(@pxref{Options}), using a third argument is a fatal error. @item split(@var{string}, @var{array} @r{[}, @var{fieldsep}@r{]}) @@ -12486,7 +12610,7 @@ the third argument to be a regexp constant (@code{/abc/}) as well as a string. @value{DARKCORNER} The POSIX standard allows this as well. 
-@ref{Computed Regexps, ,Using Dynamic Regexps}, for a +@ref{Computed Regexps}, for a discussion of the difference between using a string constant or a regexp constant, and the implications for writing your program correctly. @@ -12495,7 +12619,7 @@ elements in the array @var{array}. If @var{string} is null, the array has no elements. (So this is a portable way to delete an entire array with one statement. -@xref{Delete, ,The @code{delete} Statement}.) +@xref{Delete}.) If @var{string} does not match @var{fieldsep} at all (but is not null), @var{array} has one element only. The value of that element is the original @@ -12505,7 +12629,7 @@ If @var{string} does not match @var{fieldsep} at all (but is not null), @cindex @code{sprintf} function This returns (without printing) the string that @code{printf} would have printed out with the same arguments -(@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing}). +(@pxref{Printf}). For example: @example @@ -12534,11 +12658,11 @@ Using the @code{strtonum} function is @emph{not} the same as adding zero to a string value; the automatic coercion of strings to numbers works only for decimal data, not for octal or hexadecimal.@footnote{Unless you use the @option{--non-decimal-data} option, which isn't recommended. -@xref{Nondecimal Data, ,Allowing Nondecimal Input Data}, for more information.} +@xref{Nondecimal Data}, for more information.} @cindex differences in @command{awk} and @command{gawk}, @code{strtonum} function (@command{gawk}) @code{strtonum} is a @command{gawk} extension; it is not available -in compatibility mode (@pxref{Options, ,Command-Line Options}). +in compatibility mode (@pxref{Options}). @item sub(@var{regexp}, @var{replacement} @r{[}, @var{target}@r{]}) @cindex @code{sub} function @@ -12552,7 +12676,7 @@ The modified string becomes the new value of @var{target}. The @var{regexp} argument may be either a regexp constant (@samp{/@dots{}/}) or a string constant (@var{"@dots{}"}). 
In the latter case, the string is treated as a regexp to be matched. -@ref{Computed Regexps, ,Using Dynamic Regexps}, for a +@ref{Computed Regexps}, for a discussion of the difference between the two forms, and the implications for writing your program correctly. @@ -12605,7 +12729,7 @@ $ awk 'BEGIN @{ @noindent This shows how @samp{&} can represent a nonconstant string and also illustrates the ``leftmost, longest'' rule in regexp matching -(@pxref{Leftmost Longest, ,How Much Text Matches?}). +(@pxref{Leftmost Longest}). The effect of this special character (@samp{&}) can be turned off by putting a backslash before it in the string. As usual, to insert one backslash in @@ -12723,7 +12847,7 @@ If @var{regexp} does not match @var{target}, @code{gensub}'s return value is the original unchanged value of @var{target}. @code{gensub} is a @command{gawk} extension; it is not available -in compatibility mode (@pxref{Options, ,Command-Line Options}). +in compatibility mode (@pxref{Options}). @item substr(@var{string}, @var{start} @r{[}, @var{length}@r{]}) @cindex @code{substr} function @@ -12798,7 +12922,7 @@ Nonalphabetic characters are left unchanged. For example, @code{toupper("MiXeD cAsE 123")} returns @code{"MIXED CASE 123"}. @end table -@node Gory Details, , String Functions, String Functions +@node Gory Details @subsubsection More About @samp{\} and @samp{&} with @code{sub}, @code{gsub}, and @code{gensub} @cindex escape processing, @code{gsub}/@code{gensub}/@code{sub} functions @@ -13051,7 +13175,7 @@ $ echo abc | awk '@{ gsub(/m*/, "X"); print @}' @noindent Although this makes a certain amount of sense, it can be surprising. -@node I/O Functions, Time Functions, String Functions, Built-in +@node I/O Functions @subsection Input/Output Functions The following functions relate to input/output (I/O). @@ -13064,7 +13188,7 @@ Optional parameters are enclosed in square brackets ([ ]): Close the file @var{filename} for input or output. 
Alternatively, the argument may be a shell command that was used for creating a coprocess, or for redirecting to or from a pipe; then the coprocess or pipe is closed. -@xref{Close Files And Pipes, ,Closing Input and Output Redirections}, +@xref{Close Files And Pipes}, for more information. When closing a coprocess, it is occasionally useful to first close @@ -13073,7 +13197,7 @@ by providing a second argument to @code{close}. This second argument should be one of the two string values @code{"to"} or @code{"from"}, indicating which end of the pipe to close. Case in the string does not matter. -@xref{Two-way I/O, ,Two-Way Communications with Another Process}, +@xref{Two-way I/O}, which discusses this feature in more detail and gives an example. @item fflush(@r{[}@var{filename}@r{]}) @@ -13099,7 +13223,7 @@ buffers its output and the @code{fflush} function forces @code{fflush} was added to the Bell Laboratories research version of @command{awk} in 1994; it is not part of the POSIX standard and is not available if @option{--posix} has been specified on the -command line (@pxref{Options, ,Command-Line Options}). +command line (@pxref{Options}). @cindex @command{gawk}, @code{fflush} function in @command{gawk} extends the @code{fflush} function in two ways. The first @@ -13265,7 +13389,7 @@ second print If @command{awk} did not flush its buffers before calling @code{system}, you would see the latter (undesirable) output. -@node Time Functions, Bitwise Functions, I/O Functions, Built-in +@node Time Functions @subsection Using @command{gawk}'s Timestamp Functions @c STARTOFRANGE tst @@ -13371,7 +13495,7 @@ time data coming from an external source, such as a log file. The @code{strftime} function allows you to easily turn a timestamp into human-readable information. 
It is similar in nature to the @code{sprintf} function -(@pxref{String Functions, ,String Manipulation Functions}), +(@pxref{String Functions}), in that it copies nonformat specification characters verbatim to the returned string, while substituting date and time values for format specifications in the @var{format} string. @@ -13522,7 +13646,7 @@ and so on).@footnote{If you don't understand any of this, don't worry about it; these facilities are meant to make it easier to ``internationalize'' programs. Other internationalization features are described in -@ref{Internationalization, ,Internationalization with @command{gawk}}.} +@ref{Internationalization}.} (These facilitate compliance with the POSIX @command{date} utility.) @item %% @@ -13551,7 +13675,7 @@ A public-domain C version of @code{strftime} is supplied with @command{gawk} for systems that are not yet fully standards-compliant. It supports all of the just listed format specifications. If that version is -used to compile @command{gawk} (@pxref{Installation, ,Installing @command{gawk}}), +used to compile @command{gawk} (@pxref{Installation}), then the following additional format specifications are available: @table @code @@ -13634,7 +13758,7 @@ gawk 'BEGIN @{ @c ENDOFRANGE filogtst @c ENDOFRANGE gawtst -@node Bitwise Functions, I18N Functions, Time Functions, Built-in +@node Bitwise Functions @subsection Bit-Manipulation Functions of @command{gawk} @c STARTOFRANGE bit @cindex bitwise, operations @@ -13781,12 +13905,12 @@ Return the value of @var{val}, shifted right by @var{count} bits. @end multitable For all of these functions, first the double-precision floating-point value is -converted to a C @code{unsigned long}, then the bitwise operation is +converted to the widest C unsigned integer type, then the bitwise operation is performed and then the result is converted back into a C @code{double}. (If you don't understand this paragraph, don't worry about it.) 
Here is a user-defined function -(@pxref{User-defined, ,User-Defined Functions}) +(@pxref{User-defined}) that illustrates the use of these functions: @cindex @code{bits2str} user-defined function @@ -13882,7 +14006,7 @@ of 8-bit quantities. This is typical in modern computers. The main code in the @code{BEGIN} rule shows the difference between the decimal and octal values for the same numbers -(@pxref{Nondecimal-numbers, ,Octal and Hexadecimal Numbers}), +(@pxref{Nondecimal-numbers}), and then demonstrates the results of the @code{compl}, @code{lshift}, and @code{rshift} functions. @c ENDOFRANGE bit @@ -13891,7 +14015,7 @@ results of the @code{compl}, @code{lshift}, and @code{rshift} functions. @c ENDOFRANGE xor @c ENDOFRANGE opbit -@node I18N Functions, , Bitwise Functions, Built-in +@node I18N Functions @subsection Using @command{gawk}'s String-Translation Functions @cindex @command{gawk}, string-translation functions @cindex functions, string-translation @@ -13901,7 +14025,7 @@ results of the @code{compl}, @code{lshift}, and @code{rshift} functions. @command{gawk} provides facilities for internationalizing @command{awk} programs. These include the functions described in the following list. The descriptions here are purposely brief. -@xref{Internationalization, ,Internationalization with @command{gawk}}, +@xref{Internationalization}, for the full story. Optional parameters are enclosed in square brackets ([ ]): @@ -13939,7 +14063,7 @@ given @var{domain}. @c ENDOFRANGE funcbi @c ENDOFRANGE bifunc -@node User-defined, , Built-in, Functions +@node User-defined @section User-Defined Functions @c STARTOFRANGE udfunc @@ -13960,7 +14084,7 @@ them, i.e., to tell @command{awk} what they should do. * Dynamic Typing:: How variable types can change at runtime. 
@end menu -@node Definition Syntax, Function Example, User-defined, User-defined +@node Definition Syntax @subsection Function Definition Syntax @c STARTOFRANGE fdef @@ -14053,7 +14177,7 @@ the keyword @code{function} may be abbreviated @code{func}. However, POSIX only specifies the use of the keyword @code{function}. This actually has some practical implications. If @command{gawk} is in POSIX-compatibility mode -(@pxref{Options, ,Command-Line Options}), then the following +(@pxref{Options}), then the following statement does @emph{not} define a function: @example @@ -14074,7 +14198,7 @@ in @command{awk} programs.) To ensure that your @command{awk} programs are portable, always use the keyword @code{function} when defining a function. -@node Function Example, Function Caveats, Definition Syntax, User-defined +@node Function Example @subsection Function Definition Examples Here is an example of a user-defined function, called @code{myprint}, that @@ -14125,7 +14249,7 @@ function delarray(a, i) When working with arrays, it is often necessary to delete all the elements in an array and start over with a new list of elements -(@pxref{Delete, ,The @code{delete} Statement}). +(@pxref{Delete}). Instead of having to repeat this loop everywhere that you need to clear out an array, your program can just call @code{delarray}. @@ -14161,7 +14285,7 @@ $ echo "Don't Panic!" | The C @code{ctime} function takes a timestamp and returns it in a string, formatted in a well-known fashion. 
The following example uses the built-in @code{strftime} function -(@pxref{Time Functions, ,Using @command{gawk}'s Timestamp Functions}) +(@pxref{Time Functions}) to create an @command{awk} version of @code{ctime}: @cindex @code{ctime} user-defined function @@ -14183,7 +14307,7 @@ function ctime(ts, format) @end example @c ENDOFRANGE fdef -@node Function Caveats, Return Statement, Function Example, User-defined +@node Function Caveats @subsection Calling User-Defined Functions @c STARTOFRANGE fudc @@ -14302,18 +14426,18 @@ problem if a program calls an undefined function. @cindex lint checking, undefined functions If @option{--lint} is specified -(@pxref{Options, ,Command-Line Options}), +(@pxref{Options}), @command{gawk} reports calls to undefined functions. @cindex portability, @code{next} statement in user-defined functions Some @command{awk} implementations generate a runtime error if you use the @code{next} statement -(@pxref{Next Statement, , The @code{next} Statement}) +(@pxref{Next Statement}) inside a user-defined function. @command{gawk} does not have this limitation. @c ENDOFRANGE fudc -@node Return Statement, Dynamic Typing, Function Caveats, User-defined +@node Return Statement @subsection The @code{return} Statement @c comma does NOT start a secondary @cindex @code{return} statement, user-defined functions @@ -14404,7 +14528,7 @@ Given the following input: the program reports (predictably) that @code{99385} is the largest number in the array. -@node Dynamic Typing, , Return Statement, User-defined +@node Dynamic Typing @subsection Functions and Their Effects on Variable Typing @command{awk} is a very fluid language. @@ -14432,7 +14556,7 @@ being aware of them. @c ENDOFRANGE udfunc @c ENDOFRANGE funcud -@node Internationalization, Advanced Features, Functions, Top +@node Internationalization @chapter Internationalization with @command{gawk} Once upon a time, computer makers @@ -14467,7 +14591,7 @@ a requirement. 
* Gawk I18N:: @command{gawk} is also internationalized. @end menu -@node I18N and L10N, Explaining gettext, Internationalization, Internationalization +@node I18N and L10N @section Internationalization and Localization @cindex internationalization @@ -14484,7 +14608,7 @@ used for printing error messages, the language used to read responses, and information related to how numerical and monetary values are printed and read. -@node Explaining gettext, Programmer i18n, I18N and L10N, Internationalization +@node Explaining gettext @section GNU @code{gettext} @cindex internationalizing a program @@ -14633,7 +14757,7 @@ so on). This information is accessed via the POSIX character classes in regular expressions, such as @code{/[[:alnum:]]/} -(@pxref{Regexp Operators, ,Regular Expression Operators}). +(@pxref{Regexp Operators}). @cindex monetary information, localization @cindex currency symbols, localization @@ -14669,7 +14793,7 @@ All of the above. (Not too useful in the context of @code{gettext}.) @end table @c ENDOFRANGE gettex -@node Programmer i18n, Translator i18n, Explaining gettext, Internationalization +@node Programmer i18n @section Internationalizing @command{awk} Programs @c STARTOFRANGE inap @cindex @command{awk} programs, internationalizing @@ -14704,7 +14828,7 @@ one of the known locale categories described in the previous @value{SECTION}. @end ifnotinfo @ifinfo -@ref{Explaining gettext, ,GNU @code{gettext}}. +@ref{Explaining gettext}. @end ifinfo You must also supply a text domain. Use @code{TEXTDOMAIN} if you want to use the current domain. @@ -14751,7 +14875,7 @@ outlined in the previous @value{SECTION}, @end ifnotinfo @ifinfo -@ref{Explaining gettext, ,GNU @code{gettext}}, +@ref{Explaining gettext}, @end ifinfo like so: @@ -14761,9 +14885,9 @@ like so: @item Set the variable @code{TEXTDOMAIN} to the text domain of your program. 
This is best done in a @code{BEGIN} rule -(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}), +(@pxref{BEGIN/END}), or it can also be done via the @option{-v} command-line -option (@pxref{Options, ,Command-Line Options}): +option (@pxref{Options}): @example BEGIN @{ @@ -14821,11 +14945,11 @@ BEGIN @{ @end enumerate -@xref{I18N Example, ,A Simple Internationalization Example}, +@xref{I18N Example}, for an example program showing the steps to create and use translations from @command{awk}. -@node Translator i18n, I18N Example, Programmer i18n, Internationalization +@node Translator i18n @section Translating @command{awk} Programs @cindex @code{.po} files @@ -14835,7 +14959,7 @@ and use translations from @command{awk}. Once a program's translatable strings have been marked, they must be extracted to create the initial @file{.po} file. As part of translation, it is often helpful to rearrange the order -in which arguments to @code{printf} are output. +in which arguments to @code{printf} are output. @command{gawk}'s @option{--gen-po} command-line option extracts the messages and is discussed next. @@ -14849,7 +14973,7 @@ is covered. * I18N Portability:: @command{awk}-level portability issues. @end menu -@node String Extraction, Printf Ordering, Translator i18n, Translator i18n +@node String Extraction @subsection Extracting Marked Strings @cindex strings, extracting @c comma does NOT start secondary @@ -14880,18 +15004,18 @@ appear as the first argument to @code{dcgettext} or as the first and second argument to @code{dcngettext}.@footnote{Starting with @code{gettext} version 0.11.5, the @command{xgettext} utility that comes with GNU @code{gettext} can handle @file{.awk} files.} -@xref{I18N Example, ,A Simple Internationalization Example}, +@xref{I18N Example}, for the full list of steps to go through to create and test translations for @command{guide}. 
-@node Printf Ordering, I18N Portability, String Extraction, Translator i18n +@node Printf Ordering @subsection Rearranging @code{printf} Arguments @cindex @code{printf} statement, positional specifiers @c comma does NOT start secondary @cindex positional specifiers, @code{printf} statement Format strings for @code{printf} and @code{sprintf} -(@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing}) +(@pxref{Printf}) present a special problem for translation. Consider the following:@footnote{This example is borrowed from the GNU @code{gettext} manual.} @@ -14978,7 +15102,7 @@ their primary purpose is to help in producing correct translations of format strings into languages different from the one in which the program is first written. -@node I18N Portability, , Printf Ordering, Translator i18n +@node I18N Portability @subsection @command{awk} Portability Issues @cindex portability, internationalization and @@ -15060,7 +15184,7 @@ retrieve the translated string, this should not be a problem in practice. @end itemize @c ENDOFRANGE inap -@node I18N Example, Gawk I18N, Translator i18n, Internationalization +@node I18N Example @section A Simple Internationalization Example Now let's look at a step-by-step example of how to internationalize and @@ -15180,7 +15304,7 @@ $ gawk -f guide.awk If the three replacement functions for @code{dcgettext}, @code{dcngettext} and @code{bindtextdomain} -(@pxref{I18N Portability, ,@command{awk} Portability Issues}) +(@pxref{I18N Portability}) are in a file named @file{libintl.awk}, then we can run @file{guide.awk} unchanged as follows: @@ -15191,7 +15315,7 @@ $ gawk --posix -f guide.awk -f libintl.awk @print{} Pardon me, Zaphod who? @end example -@node Gawk I18N, , I18N Example, Internationalization +@node Gawk I18N @section @command{gawk} Can Speak Your Language As of @value{PVERSION} 3.1, @command{gawk} itself has been internationalized @@ -15222,7 +15346,7 @@ before compiling and installing it. for more information. 
@c ENDOFRANGE inloc -@node Advanced Features, Invoking Gawk, Internationalization, Top +@node Advanced Features @chapter Advanced Features of @command{gawk} @cindex advanced features, network connections, See Also networks, connections @c STARTOFRANGE gawadv @@ -15254,7 +15378,7 @@ of TCP/IP networking and BSD portal files. Finally, @command{gawk} can @dfn{profile} an @command{awk} program, making it possible to tune it for performance. -@ref{Dynamic Extensions, ,Adding New Built-in Functions to @command{gawk}}, +@ref{Dynamic Extensions}, discusses the ability to dynamically add new built-in functions to @command{gawk}. As this feature is still immature and likely to change, its description is relegated to an appendix. @@ -15267,7 +15391,7 @@ its description is relegated to an appendix. * Profiling:: Profiling your @command{awk} programs. @end menu -@node Nondecimal Data, Two-way I/O, Advanced Features, Advanced Features +@node Nondecimal Data @section Allowing Nondecimal Input Data @cindex @code{--non-decimal-data} option @cindex advanced features, @command{gawk}, nondecimal input data @@ -15320,11 +15444,11 @@ facility disabled. If you want it, you must explicitly request it. @emph{Use of this option is not recommended.} It can break old programs very badly. Instead, use the @code{strtonum} function to convert your data -(@pxref{Nondecimal-numbers, ,Octal and Hexadecimal Numbers}). +(@pxref{Nondecimal-numbers}). This makes your programs easier to write and easier to read, and leads to less surprising results. -@node Two-way I/O, TCP/IP Networking, Nondecimal Data, Advanced Features +@node Two-way I/O @section Two-Way Communications with Another Process @cindex Brennan, Michael @cindex programmers, attractiveness of @@ -15440,7 +15564,7 @@ other one to do something. 
It is possible to close just one end of the two-way pipe to a coprocess, by supplying a second argument to the @code{close} function of either @code{"to"} or @code{"from"} -(@pxref{Close Files And Pipes, ,Closing Input and Output Redirections}). +(@pxref{Close Files And Pipes}). These strings tell @command{gawk} to close the end of the pipe that sends data to the process or the end that reads from it, respectively. @@ -15487,7 +15611,7 @@ Beginning with @command{gawk} 3.1.2, you may use Pseudo-ttys (ptys) for two-way communication instead of pipes, if your system supports them. This is done on a per-command basis, by setting a special element in the @code{PROCINFO} array -(@pxref{Auto-set, ,Built-in Variables That Convey Information}), +(@pxref{Auto-set}), like so: @example @@ -15503,7 +15627,7 @@ loss in performance. If your system does not have ptys, or if all the system's ptys are in use, @command{gawk} automatically falls back to using regular pipes. -@node TCP/IP Networking, Portal Files, Two-way I/O, Advanced Features +@node TCP/IP Networking @section Using @command{gawk} for Network Programming @cindex advanced features, @command{gawk}, network programming @cindex networks, programming @@ -15521,7 +15645,7 @@ is busy hung or dead.} In addition to being able to open a two-way pipeline to a coprocess on the same system -(@pxref{Two-way I/O, ,Two-Way Communications with Another Process}), +(@pxref{Two-way I/O}), it is possible to make a two-way connection to another process on another system across an IP networking connection. @@ -15591,7 +15715,7 @@ which comes as part of the @command{gawk} distribution, for a much more complete introduction and discussion, as well as extensive examples. -@node Portal Files, Profiling, TCP/IP Networking, Advanced Features +@node Portal Files @section Using @command{gawk} with BSD Portals @cindex advanced features, @command{gawk}, BSD portals @cindex portal files @@ -15604,7 +15728,7 @@ extensive examples. 
Similar to the @file{/inet} special files, if @command{gawk} is configured with the @option{--enable-portals} option -(@pxref{Quick Installation, , Compiling @command{gawk} for Unix}), +(@pxref{Quick Installation}), then @command{gawk} treats files whose pathnames begin with @code{/p} as 4.4 BSD-style portals. @@ -15616,7 +15740,7 @@ then manages creating the process associated with the portal and the corresponding communications with the portal's process. @c ENDOFRANGE tcpip -@node Profiling, , Portal Files, Advanced Features +@node Profiling @section Profiling Your @command{awk} Programs @c STARTOFRANGE awkp @cindex @command{awk} programs, profiling @@ -15793,7 +15917,7 @@ The counts next to the statements in the body show how many times those statements were executed. @cindex @code{@{@}} (braces), @command{pgawk} program -@cindex braces (@code{@{@}}), @command{pgawk} program +@cindex braces (@code{@{@}}), @command{pgawk} program @item The layout uses ``K&R'' style with tabs. Braces are used everywhere, even when @@ -15921,7 +16045,7 @@ keyboard. The @code{INT} signal is generated by the @c ENDOFRANGE awkp @c ENDOFRANGE proawk -@node Invoking Gawk, Library Functions, Advanced Features, Top +@node Invoking Gawk @chapter Running @command{awk} and @command{gawk} This @value{CHAPTER} covers how to run awk, both POSIX-standard @@ -15948,7 +16072,7 @@ full details. * Known Bugs:: Known Bugs in @command{gawk}. @end menu -@node Command Line, Options, Invoking Gawk, Invoking Gawk +@node Command Line @section Invoking @command{awk} @cindex command line, invoking @command{awk} from @cindex @command{awk}, invoking @@ -15987,7 +16111,7 @@ If @option{--lint} has been specified on the command line, @command{gawk} issues a warning that the program is empty. 
-@node Options, Other Arguments, Command Line, Invoking Gawk +@node Options @section Command-Line Options @c STARTOFRANGE ocl @cindex options, command-line @@ -16022,7 +16146,7 @@ The options and their meanings are as follows: @cindex @code{--field-separator} option @cindex @code{FS} variable, @code{--field-separator} option and Sets the @code{FS} variable to @var{fs} -(@pxref{Field Separators, ,Specifying How Fields Are Separated}). +(@pxref{Field Separators}). @item -f @var{source-file} @itemx --file @var{source-file} @@ -16040,7 +16164,7 @@ instead of in the first non-option argument. Sets the variable @var{var} to the value @var{val} @emph{before} execution of the program begins. Such variable values are available inside the @code{BEGIN} rule -(@pxref{Other Arguments, ,Other Command-Line Arguments}). +(@pxref{Other Arguments}). The @option{-v} option can only set one variable, but it can be used more than once, setting another variable each time, like this: @@ -16110,9 +16234,9 @@ Specifies @dfn{compatibility mode}, in which the GNU extensions to the @command{awk} language are disabled, so that @command{gawk} behaves just like the Bell Laboratories research version of Unix @command{awk}. @option{--traditional} is the preferred form of this option. -@xref{POSIX/GNU, ,Extensions in @command{gawk} Not in POSIX @command{awk}}, +@xref{POSIX/GNU}, which summarizes the extensions. Also see -@ref{Compatibility Mode, ,Downward Compatibility and Debugging}. +@ref{Compatibility Mode}. @item -W copyright @itemx --copyright @@ -16154,7 +16278,7 @@ names like @code{i}, @code{j}, etc.) Analyzes the source program and generates a GNU @code{gettext} Portable Object file on standard output for all string constants that have been marked for translation. -@xref{Internationalization, ,Internationalization with @command{gawk}}, +@xref{Internationalization}, for information about this option. @item -W help @@ -16190,7 +16314,7 @@ actually invalid are issued. 
(This is not fully implemented yet.) @cindex @code{--lint-old} option Warns about constructs that are not available in the original version of @command{awk} from Version 7 Unix -(@pxref{V7/SVR3.1, ,Major Changes Between V7 and SVR3.1}). +(@pxref{V7/SVR3.1}). @item -W non-decimal-data @itemx --non-decimal-data @@ -16200,7 +16324,7 @@ Warns about constructs that are not available in the original version of @cindex octal values, enabling interpretation of Enable automatic interpretation of octal and hexadecimal values in input data -(@pxref{Nondecimal Data, ,Allowing Nondecimal Input Data}). +(@pxref{Nondecimal Data}). @cindex troubleshooting, @code{--non-decimal-data} option @strong{Caution:} This option can severely break old programs. @@ -16229,15 +16353,15 @@ restrictions: @item Newlines do not act as whitespace to separate fields when @code{FS} is equal to a single space -(@pxref{Fields, , Examining Fields}). +(@pxref{Fields}). @item Newlines are not allowed after @samp{?} or @samp{:} -(@pxref{Conditional Exp, ,Conditional Expressions}). +(@pxref{Conditional Exp}). @item The synonym @code{func} for the keyword @code{function} is not -recognized (@pxref{Definition Syntax, ,Function Definition Syntax}). +recognized (@pxref{Definition Syntax}). @cindex @code{*} (asterisk), @code{**} operator @cindex asterisk (@code{*}), @code{**} operator @@ -16249,20 +16373,20 @@ recognized (@pxref{Definition Syntax, ,Function Definition Syntax}). @cindex caret (@code{^}), @code{^=} operator @item The @samp{**} and @samp{**=} operators cannot be used in -place of @samp{^} and @samp{^=} (@pxref{Arithmetic Ops, ,Arithmetic Operators}, -and also @pxref{Assignment Ops, ,Assignment Expressions}). +place of @samp{^} and @samp{^=} (@pxref{Arithmetic Ops}, +and also @pxref{Assignment Ops}). 
@cindex @code{FS} variable, as TAB character @item Specifying @samp{-Ft} on the command-line does not set the value of @code{FS} to be a single TAB character -(@pxref{Field Separators, ,Specifying How Fields Are Separated}). +(@pxref{Field Separators}). @c comma does not start secondary @cindex @code{fflush} function, unsupported @item The @code{fflush} built-in function is not supported -(@pxref{I/O Functions, ,Input/Output Functions}). +(@pxref{I/O Functions}). @end itemize @c @cindex automatic warnings @@ -16278,7 +16402,7 @@ also issues a warning if both options are supplied. @cindex @code{--profile} option @cindex @command{awk} programs, profiling, enabling Enable profiling of @command{awk} programs -(@pxref{Profiling, ,Profiling Your @command{awk} Programs}). +(@pxref{Profiling}). By default, profiles are created in a file named @file{awkprof.out}. The optional @var{file} argument allows you to specify a different @value{FN} for the profile file. @@ -16293,7 +16417,7 @@ call counts for each function. @cindex @code{--re-interval} option @cindex regular expressions, interval expressions and Allows interval expressions -(@pxref{Regexp Operators, , Regular Expression Operators}) +(@pxref{Regexp Operators}) in regexps. Because interval expressions were traditionally not available in @command{awk}, @command{gawk} does not provide them by default. This prevents old @command{awk} @@ -16308,7 +16432,7 @@ code that you enter on the command line. Program source code is taken from the @var{program-text}. This is particularly useful when you have library functions that you want to use from your command-line -programs (@pxref{AWKPATH Variable, ,The @env{AWKPATH} Environment Variable}). +programs (@pxref{AWKPATH Variable}). @item -W version @itemx --version @@ -16320,7 +16444,7 @@ This allows you to determine if your copy of @command{gawk} is up to date with respect to whatever the Free Software Foundation is currently distributing. 
It is also useful for bug reports -(@pxref{Bugs, , Reporting Problems and Bugs}). +(@pxref{Bugs}). @end table As long as program text has been supplied, @@ -16332,7 +16456,7 @@ In compatibility mode, as a special case, if the value of @var{fs} supplied to the @option{-F} option is @samp{t}, then @code{FS} is set to the TAB character (@code{"\t"}). This is true only for @option{--traditional} and not for @option{--posix} -(@pxref{Field Separators, ,Specifying How Fields Are Separated}). +(@pxref{Field Separators}). @cindex @code{-f} option, on command line The @option{-f} option may be used more than once on the command line. @@ -16342,7 +16466,7 @@ useful for creating libraries of @command{awk} functions. These functions can be written once and then retrieved from a standard place, instead of having to be included into each individual program. (As mentioned in -@ref{Definition Syntax, ,Function Definition Syntax}, +@ref{Definition Syntax}, function names must be unique.) Library functions can still be used, even if the program is entered at the terminal, @@ -16357,7 +16481,7 @@ file and command-line @command{awk} programs, @command{gawk} provides the @option{--source} option. This does not require you to pre-empt the standard input for your source code; it allows you to easily mix command-line and library source code -(@pxref{AWKPATH Variable, ,The @env{AWKPATH} Environment Variable}). +(@pxref{AWKPATH Variable}). @cindex @code{--source} option If no @option{-f} or @option{--source} option is specified, then @command{gawk} @@ -16400,7 +16524,7 @@ environments. @c ENDOFRANGE ocl @c ENDOFRANGE clo -@node Other Arguments, AWKPATH Variable, Options, Invoking Gawk +@node Other Arguments @section Other Command-Line Arguments @cindex command line, arguments @cindex arguments, command-line @@ -16411,7 +16535,7 @@ argument that has the form @code{@var{var}=@var{value}}, assigns the value @var{value} to the variable @var{var}---it does not specify a file at all. 
(This was discussed earlier in -@ref{Assignment Options, ,Assigning Variables on the Command Line}.) +@ref{Assignment Options}.) @cindex @code{ARGIND} variable, command-line arguments @cindex @code{ARGC}/@code{ARGV} variables, command-line arguments @@ -16434,7 +16558,7 @@ Therefore, the variables actually receive the given values after all previously specified files have been read. In particular, the values of variables assigned in this fashion are @emph{not} available inside a @code{BEGIN} rule -(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}), +(@pxref{BEGIN/END}), because such rules are run before @command{awk} begins scanning the argument list. @cindex dark corner, escape sequences @@ -16468,7 +16592,7 @@ Given the variable assignment feature, the @option{-F} option for setting the value of @code{FS} is not strictly necessary. It remains for historical compatibility. -@node AWKPATH Variable, Obsolete, Other Arguments, Invoking Gawk +@node AWKPATH Variable @section The @env{AWKPATH} Environment Variable @cindex @env{AWKPATH} environment variable @cindex directories, searching @@ -16507,10 +16631,10 @@ would have to be typed for each file. By using both the @option{--source} and @option{-f} options, your command-line @command{awk} programs can use facilities in @command{awk} library files -(@pxref{Library Functions, , A Library of @command{awk} Functions}). +(@pxref{Library Functions}). Path searching is not done if @command{gawk} is in compatibility mode. This is true for both @option{--traditional} and @option{--posix}. -@xref{Options, ,Command-Line Options}. +@xref{Options}. @strong{Note:} If you want files in the current directory to be found, you must include the current directory in the path, either by including @@ -16534,7 +16658,7 @@ sense: the @env{AWKPATH} environment variable is used to find the program source files. 
Once your program is running, all the files have been found, and @command{gawk} no longer needs to use @env{AWKPATH}. -@node Obsolete, Undocumented, AWKPATH Variable, Invoking Gawk +@node Obsolete @section Obsolete Options and/or Features @cindex features, advanced, See advanced features @@ -16559,12 +16683,12 @@ in @command{gawk} 3.0 but still worked. Starting with @value{PVERSION} 3.1, the two-word usage is no longer accepted. The process-related special files described in -@ref{Special Process, ,Special Files for Process-Related Information}, +@ref{Special Process}, work as described, but are now considered deprecated. @command{gawk} prints a warning message every time they are used. (Use @code{PROCINFO} instead; see -@ref{Auto-set, ,Built-in Variables That Convey Information}.) +@ref{Auto-set}.) They will be removed from the next release of @command{gawk}. @ignore @@ -16573,10 +16697,10 @@ is thus essentially a place holder, in case some option becomes obsolete in a future version of @command{gawk}. @end ignore -@node Undocumented, Known Bugs, Obsolete, Invoking Gawk +@node Undocumented @section Undocumented Options and Features @cindex undocumented features -@cindex features, undocumented +@cindex features, undocumented @cindex Skywalker, Luke @cindex Kenobi, Obi-Wan @cindex Jedi knights @@ -16630,7 +16754,7 @@ awk '@{ sum += $1 @} @end example @noindent -@xref{Statements/Lines, ,@command{awk} Statements Versus Lines}, for a fuller +@xref{Statements/Lines}, for a fuller explanation. You can insert newlines after the @samp{;} in @code{for} loops. @@ -16654,7 +16778,7 @@ verbatim, instead of using the octal equivalent. @end ignore -@node Known Bugs, , Undocumented, Invoking Gawk +@node Known Bugs @section Known Bugs in @command{gawk} @cindex @command{gawk}, debugging @cindex debugging @command{gawk} @@ -16666,7 +16790,7 @@ verbatim, instead of using the octal equivalent. 
@cindex @code{FS} variable, changing value of @item The @option{-F} option for changing the value of @code{FS} -(@pxref{Options, ,Command-Line Options}) +(@pxref{Options}) is not necessary given the command-line variable assignment feature; it remains only for backward compatibility. @@ -16689,10 +16813,10 @@ It contains the following chapters: @itemize @bullet @item -@ref{Library Functions, ,A Library of @command{awk} Functions}. +@ref{Library Functions}. @item -@ref{Sample Programs, ,Practical @command{awk} Programs}. +@ref{Sample Programs}. @end itemize @@ -16702,7 +16826,7 @@ It contains the following chapters: @end iftex @end ignore -@node Library Functions, Sample Programs, Invoking Gawk, Top +@node Library Functions @chapter A Library of @command{awk} Functions @c STARTOFRANGE libf @cindex libraries of @command{awk} functions @@ -16711,7 +16835,7 @@ It contains the following chapters: @c STARTOFRANGE fudlib @cindex functions, user-defined, library of -@ref{User-defined, ,User-Defined Functions}, describes how to write +@ref{User-defined}, describes how to write your own @command{awk} functions. Writing functions is important, because it allows you to encapsulate algorithms and program tasks in a single place. It simplifies programming, making program development more @@ -16719,7 +16843,7 @@ manageable, and making programs more readable. One valuable way to learn a new programming language is to @emph{read} programs in that language. To that end, this @value{CHAPTER} -and @ref{Sample Programs, ,Practical @command{awk} Programs}, +and @ref{Sample Programs}, provide a good-sized body of code for you to read, and hopefully, to learn from. @@ -16730,7 +16854,7 @@ use these functions. The functions are presented here in a progression from simple to complex. 
@cindex Texinfo -@ref{Extract Program, ,Extracting Programs from Texinfo Source Files}, +@ref{Extract Program}, presents a program that you can use to extract the source code for these example library functions and programs from the Texinfo source for this @value{DOCUMENT}. @@ -16739,11 +16863,11 @@ for this @value{DOCUMENT}. If you have written one or more useful, general-purpose @command{awk} functions and would like to contribute them to the author's collection of @command{awk} programs, see -@ref{How To Contribute, ,How to Contribute}, for more information. +@ref{How To Contribute}, for more information. @cindex portability, example programs The programs in this @value{CHAPTER} and in -@ref{Sample Programs, ,Practical @command{awk} Programs}, +@ref{Sample Programs}, freely use features that are @command{gawk}-specific. Rewriting these programs for different implementations of awk is pretty straightforward. @@ -16752,9 +16876,9 @@ Use @samp{| "cat 1>&2"} instead of @samp{> "/dev/stderr"} if your system does not have a @file{/dev/stderr}, or if you cannot use @command{gawk}. A number of programs use @code{nextfile} -(@pxref{Nextfile Statement, ,Using @command{gawk}'s @code{nextfile} Statement}) +(@pxref{Nextfile Statement}) to skip any remaining input in the input file. -@ref{Nextfile Function, ,Implementing @code{nextfile} as a Function}, +@ref{Nextfile Function}, shows you how to write a function that does the same thing. @c 12/2000: Thanks to Nelson Beebe for pointing out the output issue. @@ -16789,7 +16913,7 @@ comparisons use only lowercase letters. * Group Functions:: Functions for getting group information. @end menu -@node Library Names, General Functions, Library Functions, Library Functions +@node Library Names @section Naming Library Function Global Variables @cindex names, arrays/variables @@ -16808,7 +16932,7 @@ a specific function). 
There is no intermediate state analogous to Library functions often need to have global variables that they can use to preserve state information between calls to the function---for example, @code{getopt}'s variable @code{_opti} -(@pxref{Getopt Function, ,Processing Command-Line Options}). +(@pxref{Getopt Function}). Such variables are called @dfn{private}, since the only functions that need to use them are the ones in the library. @@ -16830,7 +16954,7 @@ with the user's program. In addition, several of the library functions use a prefix that helps indicate what function or set of functions use the variables---for example, @code{_pw_byname} in the user database routines -(@pxref{Passwd Functions, ,Reading the User Database}). +(@pxref{Passwd Functions}). This convention is recommended, since it even further decreases the chance of inadvertent conflict among variable names. Note that this convention is used equally well for variable names and for private @@ -16843,7 +16967,7 @@ As a final note on variable naming, if a function makes global variables available for use by a main program, it is a good convention to start that variable's name with a capital letter---for example, @code{getopt}'s @code{Opterr} and @code{Optind} variables -(@pxref{Getopt Function, ,Processing Command-Line Options}). +(@pxref{Getopt Function}). The leading capital letter indicates that it is global, while the fact that the variable name is not all capital letters indicates that the variable is not one of @command{awk}'s built-in variables, such as @code{FS}. @@ -16873,7 +16997,7 @@ A different convention, common in the Tcl community, is to use a single associative array to hold the values needed by the library function(s), or ``package.'' This significantly decreases the number of actual global names in use. 
For example, the functions described in -@ref{Passwd Functions, , Reading the User Database}, +@ref{Passwd Functions}, might have used array elements @code{@w{PW_data["inited"]}}, @code{@w{PW_data["total"]}}, @code{@w{PW_data["count"]}}, and @code{@w{PW_data["awklib"]}}, instead of @code{@w{_pw_inited}}, @code{@w{_pw_awklib}}, @code{@w{_pw_total}}, @@ -16883,7 +17007,7 @@ The conventions presented in this @value{SECTION} are exactly that: conventions. You are not required to write your programs this way---we merely recommend that you do so. -@node General Functions, Data File Management, Library Names, Library Functions +@node General Functions @section General Programming This @value{SECTION} presents a number of functions that are of general @@ -16903,7 +17027,7 @@ programming use. * Gettimeofday Function:: A function to get formatted times. @end menu -@node Nextfile Function, Assert Function, General Functions, General Functions +@node Nextfile Function @subsection Implementing @code{nextfile} as a Function @cindex input files, skipping @@ -16915,7 +17039,7 @@ programming use. @cindex @code{nextfile} statement, implementing @cindex @command{gawk}, @code{nextfile} statement in The @code{nextfile} statement, presented in -@ref{Nextfile Statement, ,Using @command{gawk}'s @code{nextfile} Statement}, +@ref{Nextfile Statement}, is a @command{gawk}-specific extension---it is not available in most other implementations of @command{awk}. This @value{SECTION} shows two versions of a @code{nextfile} function that you can use to simulate @command{gawk}'s @@ -16939,7 +17063,7 @@ a private variable named @code{_abandon_}. If the @value{FN} matches, then the action part of the rule executes a @code{next} statement to go on to the next record. (The use of @samp{_} in the variable name is a convention. It is discussed more fully in -@ref{Library Names, , Naming Library Function Global Variables}.) +@ref{Library Names}.) 
The use of the @code{next} statement effectively creates a loop that reads all the records from the current @value{DF}. @@ -17035,7 +17159,7 @@ computations). @c ENDOFRANGE flibnex @c ENDOFRANGE nexim -@node Assert Function, Round Function, Nextfile Function, General Functions +@node Assert Function @subsection Assertions @c STARTOFRANGE asse @@ -17153,7 +17277,7 @@ to the program calling @code{assert}. Normally, if a program consists of just a @code{BEGIN} rule, the input files and/or standard input are not read. However, now that the program has an @code{END} rule, @command{awk} attempts to read the input @value{DF}s or standard input -(@pxref{Using BEGIN/END, , Startup and Cleanup Actions}), +(@pxref{Using BEGIN/END}), most likely causing the program to hang as it waits for input. @cindex @code{BEGIN} pattern, @code{assert} user-defined function and @@ -17165,7 +17289,7 @@ with an @code{exit} statement. @c ENDOFRANGE flibass @c ENDOFRANGE libfass -@node Round Function, Cliff Random Function, Assert Function, General Functions +@node Round Function @subsection Rounding Numbers @cindex rounding @@ -17177,7 +17301,7 @@ with an @code{exit} statement. @cindex @code{printf} statement, @code{sprintf} function and @cindex @code{sprintf} function, @code{print}/@code{printf} statements and The way @code{printf} and @code{sprintf} -(@pxref{Printf, , Using @code{printf} Statements for Fancier Printing}) +(@pxref{Printf}) perform rounding often depends upon the system's C @code{sprintf} subroutine. 
On many machines, @code{sprintf} rounding is ``unbiased,'' which means it doesn't always round a trailing @samp{.5} up, contrary @@ -17232,7 +17356,7 @@ function round(x, ival, aval, fraction) @c endfile @end example -@node Cliff Random Function, Ordinal Functions, Round Function, General Functions +@node Cliff Random Function @subsection The Cliff Random Number Generator @cindex random numbers, Cliff @cindex Cliff random numbers @@ -17277,7 +17401,7 @@ If the built-in @code{rand} function (@pxref{Numeric Functions}) isn't random enough, you might try using this function instead. -@node Ordinal Functions, Join Function, Cliff Random Function, General Functions +@node Ordinal Functions @subsection Translating Between Characters and Numbers @cindex libraries of @command{awk} functions, character values as numbers @@ -17397,7 +17521,7 @@ written this way initially for ease of development. There is a ``test program'' in a @code{BEGIN} rule, to test the function. It is commented out for production use. -@node Join Function, Gettimeofday Function, Ordinal Functions, General Functions +@node Join Function @subsection Merging an Array into a String @cindex libraries of @command{awk} functions, merging arrays into strings @@ -17408,14 +17532,14 @@ When doing string processing, it is often useful to be able to join all the strings in an array into one long string. The following function, @code{join}, accomplishes this task. It is used later in several of the application programs -(@pxref{Sample Programs, ,Practical @command{awk} Programs}). +(@pxref{Sample Programs}). Good function design is important; this function needs to be general but it should also have a reasonable default behavior. It is called with an array as well as the beginning and ending indices of the elements in the array to be merged. 
This assumes that the array indices are numeric---a reasonable assumption since the array was likely created with @code{split} -(@pxref{String Functions, ,String Manipulation Functions}): +(@pxref{String Functions}): @cindex @code{join} user-defined function @example @@ -17457,7 +17581,7 @@ be nice if @command{awk} had an assignment operator for concatenation. The lack of an explicit operator for concatenation makes string operations more difficult than they really need to be.} -@node Gettimeofday Function, , Join Function, General Functions +@node Gettimeofday Function @subsection Managing the Time of Day @cindex libraries of @command{awk} functions, managing, time @@ -17465,7 +17589,7 @@ more difficult than they really need to be.} @cindex timestamps, formatted @cindex time, managing The @code{systime} and @code{strftime} functions described in -@ref{Time Functions, ,Using @command{gawk}'s Timestamp Functions}, +@ref{Time Functions}, provide the minimum functionality necessary for dealing with the time of day in human readable form. While @code{strftime} is extensive, the control formats are not necessarily easy to remember or intuitively obvious when @@ -17551,13 +17675,13 @@ function gettimeofday(time, ret, now, i) The string indices are easier to use and read than the various formats required by @code{strftime}. The @code{alarm} program presented in -@ref{Alarm Program, ,An Alarm Clock Program}, +@ref{Alarm Program}, uses this function. A more general design for the @code{gettimeofday} function would have allowed the user to supply an optional timestamp value to use instead of the current time. -@node Data File Management, Getopt Function, General Functions, Library Functions +@node Data File Management @section @value{DDF} Management @c STARTOFRANGE dataf @@ -17573,17 +17697,18 @@ command-line @value{DF}s. * Filetrans Function:: A function for handling data file transitions. * Rewind Function:: A function for rereading the current file. 
* File Checking:: Checking that data files are readable. +* Empty Files:: Checking for zero-length files. * Ignoring Assigns:: Treating assignments as file names. @end menu -@node Filetrans Function, Rewind Function, Data File Management, Data File Management +@node Filetrans Function @subsection Noting @value{DDF} Boundaries @cindex files, managing, @value{DF} boundaries @cindex files, initialization and cleanup The @code{BEGIN} and @code{END} rules are each executed exactly once at -the beginning and end of your @command{awk} program, respectively -(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}). +the beginning and end of your @command{awk} program, respectively +(@pxref{BEGIN/END}). We (the @command{gawk} authors) once had a user who mistakenly thought that the @code{BEGIN} rule is executed at the beginning of each @value{DF} and the @code{END} rule is executed at the end of each @value{DF}. When informed @@ -17646,7 +17771,7 @@ again the value of multiple @code{BEGIN} and @code{END} rules should be clear. @cindex @code{beginfile} user-defined function @cindex @code{endfile} user-defined function This version has same problem as the first version of @code{nextfile} -(@pxref{Nextfile Function, ,Implementing @code{nextfile} as a Function}). +(@pxref{Nextfile Function}). If the same @value{DF} occurs twice in a row on the command line, then @code{endfile} and @code{beginfile} are not executed at the end of the first pass and at the beginning of the second pass. @@ -17678,18 +17803,18 @@ END @{ endfile(_filename_) @} @c endfile @end example -@ref{Wc Program, ,Counting Things}, +@ref{Wc Program}, shows how this library function can be used and how it simplifies writing the main program. 
-@node Rewind Function, File Checking, Filetrans Function, Data File Management +@node Rewind Function @subsection Rereading the Current File @cindex files, reading Another request for a new built-in function was for a @code{rewind} function that would make it possible to reread the current file. The requesting user didn't want to have to use @code{getline} -(@pxref{Getline, , Explicit Input with @code{getline}}) +(@pxref{Getline}) inside a loop. However, as long as you are not in the @code{END} rule, it is @@ -17730,7 +17855,7 @@ function rewind( i) @end example This code relies on the @code{ARGIND} variable -(@pxref{Auto-set, ,Built-in Variables That Convey Information}), +(@pxref{Auto-set}), which is specific to @command{gawk}. If you are not using @command{gawk}, you can use ideas presented in @@ -17738,17 +17863,17 @@ If you are not using the previous @value{SECTION} @end ifnotinfo @ifinfo -@ref{Filetrans Function, ,Noting @value{DDF} Boundaries}, +@ref{Filetrans Function}, @end ifinfo to either update @code{ARGIND} on your own or modify this code as appropriate. The @code{rewind} function also relies on the @code{nextfile} keyword -(@pxref{Nextfile Statement, ,Using @command{gawk}'s @code{nextfile} Statement}). -@xref{Nextfile Function, ,Implementing @code{nextfile} as a Function}, +(@pxref{Nextfile Statement}). +@xref{Nextfile Function}, for a function version of @code{nextfile}. -@node File Checking, Ignoring Assigns, Rewind Function, Data File Management +@node File Checking @subsection Checking for Readable @value{DDF}s @cindex troubleshooting, readable @value{DF}s @@ -17797,14 +17922,116 @@ skips the file (since it's no longer in the list). @c This doesn't handle /dev/stdin etc. Not worth the hassle to mention or fix. -@node Ignoring Assigns, , File Checking, Data File Management +@node Empty Files +@subsection Checking For Zero-length Files + +All known @command{awk} implementations silently skip over zero-length files. 
+This is a by-product of @command{awk}'s implicit +read-a-record-and-match-against-the-rules loop: when @command{awk} +tries to read a record from an empty file, it immediately receives an +end of file indication, closes the file, and proceeds on to the next +command-line @value{DF}, @emph{without} executing any user-level +@command{awk} program code. + +Using @command{gawk}'s @code{ARGIND} variable +(@pxref{Built-in Variables}), it is possible to detect when an empty +@value{DF} has been skipped. Similar to the library file presented +in @ref{Filetrans Function}, the following library file calls a function named +@code{zerofile} that the user must provide. The arguments passed are +the @value{FN} and the position in @code{ARGV} where it was found: + +@cindex @code{zerofile.awk} program +@example +@c file eg/lib/zerofile.awk +# zerofile.awk --- library file to process empty input files +@c endfile +@ignore +@c file eg/lib/zerofile.awk +# +# Arnold Robbins, arnold@@gnu.org, Public Domain +# June 2003 + +@c endfile +@end ignore +@c file eg/lib/zerofile.awk +BEGIN @{ Argind = 0 @} + +ARGIND > Argind + 1 @{ + for (Argind++; Argind < ARGIND; Argind++) + zerofile(ARGV[Argind], Argind) +@} + +ARGIND != Argind @{ Argind = ARGIND @} + +END @{ + if (ARGIND > Argind) + for (Argind++; Argind <= ARGIND; Argind++) + zerofile(ARGV[Argind], Argind) +@} +@c endfile +@end example + +The user-level variable @code{Argind} allows the @command{awk} program +to track its progress through @code{ARGV}. Whenever the program detects +that @code{ARGIND} is greater than @samp{Argind + 1}, it means that one or +more empty files were skipped. The action then calls @code{zerofile} for +each such file, incrementing @code{Argind} along the way. + +The @samp{Argind != ARGIND} rule simply keeps @code{Argind} up to date +in the normal case. + +Finally, the @code{END} rule catches the case of any empty files at +the end of the command-line arguments. 
Note that the test in the +condition of the @code{for} loop uses the @samp{<=} operator, +not @code{<}. + +As an exercise, you might consider whether this same problem can +be solved without relying on @command{gawk}'s @code{ARGIND} variable. + +As a second exercise, revise this code to handle the case where +an intervening value in @code{ARGV} is a variable assignment. + +@ignore +# zerofile2.awk --- same thing, portably +BEGIN @{ + ARGIND = Argind = 0 + for (i = 1; i < ARGC; i++) + Fnames[ARGV[i]]++ + +@} +FNR == 1 @{ + while (ARGV[ARGIND] != FILENAME) + ARGIND++ + Seen[FILENAME]++ + if (Seen[FILENAME] == Fnames[FILENAME]) + do + ARGIND++ + while (ARGV[ARGIND] != FILENAME) +@} +ARGIND > Argind + 1 @{ + for (Argind++; Argind < ARGIND; Argind++) + zerofile(ARGV[Argind], Argind) +@} +ARGIND != Argind @{ + Argind = ARGIND +@} +END @{ + if (ARGIND < ARGC - 1) + ARGIND = ARGC - 1 + if (ARGIND > Argind) + for (Argind++; Argind <= ARGIND; Argind++) + zerofile(ARGV[Argind], Argind) +@} +@end ignore + +@node Ignoring Assigns @subsection Treating Assignments as @value{FFN}s @cindex assignments as filenames @cindex filenames, assignments as Occasionally, you might not want @command{awk} to process command-line variable assignments -(@pxref{Assignment Options, ,Assigning Variables on the Command Line}). +(@pxref{Assignment Options}). In particular, if you have @value{FN}s that contain an @samp{=} character, @command{awk} treats the @value{FN} as an assignment, and does not process it. @@ -17848,7 +18075,7 @@ awk -v No_command_assign=1 -f noassign.awk -f yourprog.awk * @end example The function works by looping through the arguments. -It prepends @samp{./} to +It prepends @samp{./} to any argument that matches the form of a variable assignment, turning that argument into a @value{FN}. @@ -17860,7 +18087,7 @@ are left alone. 
@c ENDOFRANGE flibdataf @c ENDOFRANGE libfdataf -@node Getopt Function, Passwd Functions, Data File Management, Library Functions +@node Getopt Function @section Processing Command-Line Options @c STARTOFRANGE libfclo @@ -17877,7 +18104,7 @@ are left alone. Most utilities on POSIX compatible systems take options, or ``switches,'' on the command line that can be used to change the way a program behaves. @command{awk} is an example of such a program -(@pxref{Options, ,Command-Line Options}). +(@pxref{Options}). Often, options take @dfn{arguments}; i.e., data that the program needs to correctly obey the command-line option. For example, @command{awk}'s @option{-F} option requires a string to use as the field separator. @@ -17973,7 +18200,7 @@ main(int argc, char *argv[]) As a side point, @command{gawk} actually uses the GNU @code{getopt_long} function to process both normal and GNU-style long options -(@pxref{Options, ,Command-Line Options}). +(@pxref{Options}). The abstraction provided by @code{getopt} is very useful and is quite handy in @command{awk} programs as well. Following is an @command{awk} @@ -17981,7 +18208,7 @@ version of @code{getopt}. This function highlights one of the greatest weaknesses in @command{awk}, which is that it is very poor at manipulating single characters. Repeated calls to @code{substr} are necessary for accessing individual characters -(@pxref{String Functions, ,String Manipulation Functions}).@footnote{This +(@pxref{String Functions}).@footnote{This function was written before @command{gawk} acquired the ability to split strings into single characters using @code{""} as the separator. We have left it alone, since using @code{substr} is more portable.} @@ -18203,14 +18430,14 @@ In both runs, the first @option{--} terminates the arguments to @command{awk}, so that it does not try to interpret the @option{-a}, etc., as its own options. 
Several of the sample programs presented in -@ref{Sample Programs, ,Practical @command{awk} Programs}, +@ref{Sample Programs}, use @code{getopt} to process their arguments. @c ENDOFRANGE libfclo @c ENDOFRANGE flibclo @c ENDOFRANGE clop @c ENDOFRANGE oclp -@node Passwd Functions, Group Functions, Getopt Function, Library Functions +@node Passwd Functions @section Reading the User Database @c STARTOFRANGE libfudata @@ -18232,7 +18459,7 @@ However, because these are numbers, they do not provide very useful information to the average user. There needs to be some way to find the user information associated with the user and group ID numbers. This @value{SECTION} presents a suite of functions for retrieving information from the -user database. @xref{Group Functions, ,Reading the Group Database}, +user database. @xref{Group Functions}, for a similar suite that retrieves information from the group database. @cindex @code{getpwent} function (C library) @@ -18575,14 +18802,14 @@ once. If you are worried about squeezing every last cycle out of your this is not necessary, since most @command{awk} programs are I/O-bound, and it clutters up the code. -The @command{id} program in @ref{Id Program, ,Printing out User Information}, +The @command{id} program in @ref{Id Program}, uses these functions. @c ENDOFRANGE libfudata @c ENDOFRANGE flibudata @c ENDOFRANGE udatar @c ENDOFRANGE dataur -@node Group Functions, , Passwd Functions, Library Functions +@node Group Functions @section Reading the Group Database @c STARTOFRANGE libfgdata @@ -18602,7 +18829,7 @@ uses these functions. @cindex group file @cindex files, group Much of the discussion presented in -@ref{Passwd Functions, ,Reading the User Database}, +@ref{Passwd Functions}, applies to the group database as well. 
Although there has traditionally been a well-known file (@file{/etc/group}) in a well-known format, the POSIX standard only provides a set of C library routines @@ -18635,7 +18862,7 @@ is as follows: #if HAVE_CONFIG_H #include <config.h> #endif - + #if defined (STDC_HEADERS) #include <stdlib.h> #endif @@ -18818,7 +19045,7 @@ routine, we have chosen to put it in @file{/usr/local/libexec/awk}. You might want it to be in a different directory on your system. These routines follow the same general outline as the user database routines -(@pxref{Passwd Functions, ,Reading the User Database}). +(@pxref{Passwd Functions}). The @code{@w{_gr_inited}} variable is used to ensure that the database is scanned no more than once. The @code{@w{_gr_init}} function first saves @code{FS}, @code{FIELDWIDTHS}, @code{RS}, and @@ -18945,7 +19172,7 @@ Most of the work is in scanning the database and building the various associative arrays. The functions that the user calls are themselves very simple, relying on @command{awk}'s associative arrays to do work. -The @command{id} program in @ref{Id Program, ,Printing out User Information}, +The @command{id} program in @ref{Id Program}, uses these functions. @c ENDOFRANGE libfgdata @c ENDOFRANGE flibgdata @@ -18955,12 +19182,12 @@ uses these functions. @c ENDOFRANGE fudlib @c ENDOFRANGE datagr -@node Sample Programs, Language History, Library Functions, Top +@node Sample Programs @chapter Practical @command{awk} Programs @c STARTOFRANGE awkpex @cindex @command{awk} programs, examples of -@ref{Library Functions, ,A Library of @command{awk} Functions}, +@ref{Library Functions}, presents the idea that reading programs in a language contributes to learning that language. This @value{CHAPTER} continues that theme, presenting a potpourri of @command{awk} programs for your reading @@ -18985,7 +19212,7 @@ ability to do a lot in just a few lines of code. 
@end ifnotinfo Many of these programs use the library functions presented in -@ref{Library Functions, ,A Library of @command{awk} Functions}. +@ref{Library Functions}. @menu * Running Examples:: How to run these examples. @@ -18993,7 +19220,7 @@ Many of these programs use the library functions presented in * Miscellaneous Programs:: Some interesting @command{awk} programs. @end menu -@node Running Examples, Clones, Sample Programs, Sample Programs +@node Running Examples @section Running the Example Programs To run a given program, you would typically do something like this: @@ -19008,7 +19235,7 @@ Here, @var{program} is the name of the @command{awk} program (such as program that start with a @samp{-}, and @var{files} are the actual @value{DF}s. If your system supports the @samp{#!} executable interpreter mechanism -(@pxref{Executable Scripts, , Executable @command{awk} Programs}), +(@pxref{Executable Scripts}), you can instead run your program directly: @example @@ -19021,7 +19248,7 @@ If your @command{awk} is not @command{gawk}, you may instead need to use this: cut.awk -- -c1-8 myfiles > results @end example -@node Clones, Miscellaneous Programs, Running Examples, Sample Programs +@node Clones @section Reinventing Wheels for Fun and Profit @c last comma is part of secondary @c STARTOFRANGE posimawk @@ -19049,7 +19276,7 @@ The programs are presented in alphabetical order. * Wc Program:: The @command{wc} utility. @end menu -@node Cut Program, Egrep Program, Clones, Clones +@node Cut Program @subsection Cutting out Fields and Columns @cindex @command{cut} utility @@ -19095,9 +19322,9 @@ Suppress printing of lines that do not contain the field delimiter. @end table The @command{awk} implementation of @command{cut} uses the @code{getopt} library -function (@pxref{Getopt Function, ,Processing Command-Line Options}) +function (@pxref{Getopt Function}) and the @code{join} library function -(@pxref{Join Function, ,Merging an Array into a String}). 
+(@pxref{Join Function}). The program begins with a comment describing the options, the library functions needed, and a @code{usage} function that prints out a usage @@ -19271,7 +19498,7 @@ function set_fieldlist( n, m, i, j, k, f, g) The @code{set_charlist} function is more complicated than @code{set_fieldlist}. The idea here is to use @command{gawk}'s @code{FIELDWIDTHS} variable -(@pxref{Constant Size, ,Reading Fixed-Width Data}), +(@pxref{Constant Size}), which describes constant-width input. When using a character list, that is exactly what we have. @@ -19368,7 +19595,7 @@ written out between the fields: This version of @command{cut} relies on @command{gawk}'s @code{FIELDWIDTHS} variable to do the character-based cutting. While it is possible in other @command{awk} implementations to use @code{substr} -(@pxref{String Functions, ,String Manipulation Functions}), +(@pxref{String Functions}), it is also extremely painful. The @code{FIELDWIDTHS} variable supplies an elegant solution to the problem of picking the input line apart by characters. @@ -19378,7 +19605,7 @@ of picking the input line apart by characters. @c Exercise: Rewrite using split with "". -@node Egrep Program, Id Program, Cut Program, Clones +@node Egrep Program @subsection Searching for Regular Expressions in Files @c STARTOFRANGE regexps @@ -19390,7 +19617,7 @@ of picking the input line apart by characters. @cindex @command{egrep} utility The @command{egrep} utility searches files for patterns. It uses regular expressions that are almost identical to those available in @command{awk} -(@pxref{Regexp, ,Regular Expressions}). +(@pxref{Regexp}). It is used in the following manner: @example @@ -19432,9 +19659,9 @@ option is to allow patterns that start with a @samp{-}. 
@end table This version uses the @code{getopt} library function -(@pxref{Getopt Function, ,Processing Command-Line Options}) +(@pxref{Getopt Function}) and the file transition library program -(@pxref{Filetrans Function, ,Noting @value{DDF} Boundaries}). +(@pxref{Filetrans Function}). The program begins with a descriptive comment and then a @code{BEGIN} rule that processes the command-line arguments with @code{getopt}. The @option{-i} @@ -19673,7 +19900,7 @@ or not. @c ENDOFRANGE sfregexp @c ENDOFRANGE fsregexp -@node Id Program, Split Program, Egrep Program, Clones +@node Id Program @subsection Printing out User Information @cindex printing, user information @@ -19697,9 +19924,9 @@ individual numbers. Here is a simple version of @command{id} written in @command{awk}. It uses the user database library functions -(@pxref{Passwd Functions, ,Reading the User Database}) +(@pxref{Passwd Functions}) and the group database library functions -(@pxref{Group Functions, ,Reading the Group Database}): +(@pxref{Group Functions}): The program is fairly straightforward. All the work is done in the @code{BEGIN} rule. The user and group ID numbers are obtained from @@ -19814,7 +20041,7 @@ information is printed. Modify this version to accept the same arguments and perform in the same way. @end ignore -@node Split Program, Tee Program, Id Program, Clones +@node Split Program @subsection Splitting a Large File into Pieces @c STARTOFRANGE filspl @@ -19838,7 +20065,7 @@ argument that specifies the @value{FN} prefix. Here is a version of @code{split} in @command{awk}. It uses the @code{ord} and @code{chr} functions presented in -@ref{Ordinal Functions, ,Translating Between Characters and Numbers}. +@ref{Ordinal Functions}. The program first sets its defaults, and then tests to make sure there are not too many arguments. It then looks at each argument in turn. The @@ -19959,7 +20186,7 @@ which isn't true for EBCDIC systems. @c BFD... 
@c ENDOFRANGE filspl -@node Tee Program, Uniq Program, Split Program, Clones +@node Tee Program @subsection Duplicating Output into Multiple Files @c last comma is part of secondary @@ -19989,7 +20216,7 @@ If the first argument is @option{-a}, then the flag variable @code{copy[1]} are deleted. If @code{ARGC} is less than two, then no @value{FN}s were supplied and @code{tee} prints a usage message and exits. Finally, @command{awk} is forced to read the standard input by setting -@code{ARGV[1]} to @code{"-"} and @code{ARGC} to two: +@code{ARGV[1]} to @code{"-"} and @code{ARGC} to two: @c NEXT ED: Add more leading commentary in this program @cindex @code{tee.awk} program @@ -20078,7 +20305,7 @@ END \ @c endfile @end example -@node Uniq Program, Wc Program, Tee Program, Clones +@node Uniq Program @subsection Printing Nonduplicated Lines of Text @c STARTOFRANGE prunt @@ -20132,9 +20359,9 @@ Normally @command{uniq} behaves as if both the @option{-d} and @command{uniq} uses the @code{getopt} library function -(@pxref{Getopt Function, ,Processing Command-Line Options}) +(@pxref{Getopt Function}) and the @code{join} library function -(@pxref{Join Function, ,Merging an Array into a String}). +(@pxref{Join Function}). The program begins with a @code{usage} function and then a brief outline of the options and their meanings in a comment. @@ -20238,7 +20465,7 @@ comparison of @code{last} and @code{$0}. Otherwise, things get more complicated. If fields have to be skipped, each line is broken into an array using @code{split} -(@pxref{String Functions, ,String Manipulation Functions}); +(@pxref{String Functions}); the desired fields are then joined back into a line using @code{join}. The joined lines are stored in @code{clast} and @code{cline}. 
If no fields are skipped, @code{clast} and @code{cline} are set to @@ -20337,7 +20564,7 @@ END @{ @c ENDOFRANGE prunt @c ENDOFRANGE tpul -@node Wc Program, , Uniq Program, Clones +@node Wc Program @subsection Counting Things @c STARTOFRANGE count @@ -20382,9 +20609,9 @@ words (i.e., fields) and counts them, it counts lines (i.e., records), and it can easily tell us how long a line is. This uses the @code{getopt} library function -(@pxref{Getopt Function, ,Processing Command-Line Options}) +(@pxref{Getopt Function}) and the file-transition functions -(@pxref{Filetrans Function, ,Noting @value{DDF} Boundaries}). +(@pxref{Filetrans Function}). This version has one notable difference from traditional versions of @command{wc}: it always prints the counts in the order lines, words, @@ -20461,7 +20688,7 @@ The @code{endfile} function adds the current file's numbers to the running totals of lines, words, and characters.@footnote{@command{wc} can't just use the value of @code{FNR} in @code{endfile}. If you examine the code in -@ref{Filetrans Function, , Noting @value{DDF} Boundaries} +@ref{Filetrans Function} you will see that @code{FNR} has already been reset by the time @code{endfile} is called.} It then prints out those numbers @@ -20533,7 +20760,7 @@ END @{ @c ENDOFRANGE chco @c ENDOFRANGE posimawk -@node Miscellaneous Programs, , Clones, Sample Programs +@node Miscellaneous Programs @section A Grab Bag of @command{awk} Programs This @value{SECTION} is a large ``grab bag'' of miscellaneous programs. @@ -20554,7 +20781,7 @@ We hope you find them both interesting and enjoyable. files. 
@end menu -@node Dupword Program, Alarm Program, Miscellaneous Programs, Miscellaneous Programs +@node Dupword Program @subsection Finding Duplicated Words in a Document @c last comma is part of secondary @@ -20622,7 +20849,7 @@ word, comparing it to the previous one: @c endfile @end example -@node Alarm Program, Translate Program, Dupword Program, Miscellaneous Programs +@node Alarm Program @subsection An Alarm Clock Program @cindex insomnia, cure for @cindex Robbins, Arnold @@ -20642,7 +20869,7 @@ the number of times to repeat the message as well as a delay between repetitions. This program uses the @code{gettimeofday} function from -@ref{Gettimeofday Function, ,Managing the Time of Day}. +@ref{Gettimeofday Function}. All the work is done in the @code{BEGIN} rule. The first part is argument checking and setting of defaults: the delay, the count, and the message to @@ -20750,7 +20977,7 @@ is how long to wait before setting off the alarm: @cindex @command{sleep} utility Finally, the program uses the @code{system} function -(@pxref{I/O Functions, ,Input/Output Functions}) +(@pxref{I/O Functions}) to call the @command{sleep} utility. The @command{sleep} utility simply pauses for the given number of seconds. If the exit status is not zero, the program assumes that @command{sleep} was interrupted and exits. 
If @@ -20780,7 +21007,7 @@ seconds are necessary: @c ENDOFRANGE tialarm @c ENDOFRANGE alaex -@node Translate Program, Labels Program, Alarm Program, Miscellaneous Programs +@node Translate Program @subsection Transliterating Characters @c STARTOFRANGE chtra @@ -20823,7 +21050,7 @@ The @command{translate} program demonstrates one of the few weaknesses of standard @command{awk}: dealing with individual characters is very painful, requiring repeated use of the @code{substr}, @code{index}, and @code{gsub} built-in functions -(@pxref{String Functions, ,String Manipulation Functions}).@footnote{This +(@pxref{String Functions}).@footnote{This program was written before @command{gawk} acquired the ability to split each character in a string into separate array elements.} @c Exercise: How might you use this new feature to simplify the program? @@ -20920,7 +21147,7 @@ function, it is not necessarily efficient, and we (the @command{gawk} authors) started to consider adding a built-in function. However, shortly after writing this program, we learned that the System V Release 4 @command{awk} had added the @code{toupper} and @code{tolower} functions -(@pxref{String Functions, ,String Manipulation Functions}). +(@pxref{String Functions}). These functions handle the vast majority of the cases where character transliteration is necessary, and so we chose to simply add those functions to @command{gawk} as well and then leave well @@ -20932,7 +21159,7 @@ assumes that the ``from'' and ``to'' lists will never change throughout the lifetime of the program. @c ENDOFRANGE chtra -@node Labels Program, Word Sorting, Translate Program, Miscellaneous Programs +@node Labels Program @subsection Printing Mailing Labels @c STARTOFRANGE prml @@ -20955,7 +21182,7 @@ the @code{line} array and printing the page when 20 labels have been read. 
The @code{BEGIN} rule simply sets @code{RS} to the empty string, so that @command{awk} splits records at blank lines -(@pxref{Records, ,How Input Is Split into Records}). +(@pxref{Records}). It sets @code{MAXLINES} to 100, since 100 is the maximum number of lines on the page (20 * 5 = 100). @@ -21055,7 +21282,7 @@ END \ @c ENDOFRANGE prml @c ENDOFRANGE mlprint -@node Word Sorting, History Sorting, Labels Program, Miscellaneous Programs +@node Word Sorting @subsection Generating Word-Usage Counts @c last comma is part of secondary @@ -21088,7 +21315,7 @@ END @{ This program has two rules. The first rule, because it has an empty pattern, is executed for every input line. It uses @command{awk}'s field-accessing mechanism -(@pxref{Fields, ,Examining Fields}) to pick out the individual words from +(@pxref{Fields}) to pick out the individual words from the line, and the built-in variable @code{NF} (@pxref{Built-in Variables}) to know how many fields are available. For each input word, it increments an element of the array @code{freq} to @@ -21187,14 +21414,14 @@ See the general operating system documentation for more information on how to use the @command{sort} program. @c ENDOFRANGE worus -@node History Sorting, Extract Program, Word Sorting, Miscellaneous Programs +@node History Sorting @subsection Removing Duplicates from Unsorted Text @c last comma is part of secondary @c STARTOFRANGE lidu @cindex lines, duplicate, removing The @command{uniq} program -(@pxref{Uniq Program, ,Printing Nonduplicated Lines of Text}), +(@pxref{Uniq Program}), removes duplicate lines from @emph{sorted} data. Suppose, however, you need to remove duplicate lines from a @value{DF} but @@ -21257,7 +21484,7 @@ This works because @code{data[$0]} is incremented each time a line is seen. 
@c ENDOFRANGE lidu -@node Extract Program, Simple Sed, History Sorting, Miscellaneous Programs +@node Extract Program @subsection Extracting Programs from Texinfo Source Files @c STARTOFRANGE texse @@ -21267,13 +21494,13 @@ seen. @cindex files, Texinfo, extracting programs from @ifnotinfo Both this chapter and the previous chapter -(@ref{Library Functions, ,A Library of @command{awk} Functions}) +(@ref{Library Functions}) present a large number of @command{awk} programs. @end ifnotinfo @ifinfo The nodes -@ref{Library Functions, ,A Library of @command{awk} Functions}, -and @ref{Sample Programs, ,Practical @command{awk} Programs}, +@ref{Library Functions}, +and @ref{Sample Programs}, are the top level nodes for a large number of @command{awk} programs. @end ifinfo If you want to experiment with these programs, it is tedious to have to type @@ -21323,14 +21550,14 @@ file and does two things, based on the special comments. Upon seeing @samp{@w{@@c system @dots{}}}, it runs a command, by extracting the command text from the control line and passing it on to the @code{system} function -(@pxref{I/O Functions, ,Input/Output Functions}). +(@pxref{I/O Functions}). Upon seeing @samp{@@c file @var{filename}}, each subsequent line is sent to the file @var{filename}, until @samp{@@c endfile} is encountered. The rules in @file{extract.awk} match either @samp{@@c} or @samp{@@comment} by letting the @samp{omment} part be optional. Lines containing @samp{@@group} and @samp{@@end group} are simply removed. @file{extract.awk} uses the @code{join} library function -(@pxref{Join Function, ,Merging an Array into a String}). +(@pxref{Join Function}). The example programs in the online Texinfo source for @cite{@value{TITLE}} (@file{gawk.texi}) have all been bracketed inside @samp{file} and @@ -21423,7 +21650,7 @@ redirection for printing the contents, keeping open file management simple. The @samp{for} loop does the work. 
It reads lines using @code{getline} -(@pxref{Getline, ,Explicit Input with @code{getline}}). +(@pxref{Getline}). For an unexpected end of file, it calls the @code{@w{unexpected_eof}} function. If the line is an ``endfile'' line, then it breaks out of the loop. @@ -21436,7 +21663,7 @@ symbols, the program can print it directly. Otherwise, each leading @samp{@@} must be stripped off. To remove the @samp{@@} symbols, the line is split into separate elements of the array @code{a}, using the @code{split} function -(@pxref{String Functions, ,String Manipulation Functions}). +(@pxref{String Functions}). The @samp{@@} symbol is used as the separator character. Each element of @code{a} that is empty indicates two successive @samp{@@} symbols in the original line. For each two empty elements (@samp{@@@@} in @@ -21493,7 +21720,7 @@ line. That line is then printed to the output file: An important thing to note is the use of the @samp{>} redirection. Output done with @samp{>} only opens the file once; it stays open and subsequent output is appended to the file -(@pxref{Redirection, , Redirecting Output of @code{print} and @code{printf}}). +(@pxref{Redirection}). This makes it easy to mix program text and explanatory prose for the same sample source file (as has been done here!) without any hassle. The file is only closed when a new data @value{FN} is encountered or at the end of the @@ -21523,7 +21750,7 @@ END @{ @c ENDOFRANGE texse @c ENDOFRANGE fitex -@node Simple Sed, Igawk Program, Extract Program, Miscellaneous Programs +@node Simple Sed @subsection A Simple Stream Editor @cindex @command{sed} utility @@ -21543,7 +21770,7 @@ Here, @samp{s/old/new/g} tells @command{sed} to look for the regexp @samp{old} on each input line and globally replace it with the text @samp{new}, i.e., all the occurrences on a line. This is similar to @command{awk}'s @code{gsub} function -(@pxref{String Functions, ,String Manipulation Functions}). +(@pxref{String Functions}). 
The following program, @file{awksed.awk}, accepts at least two command-line arguments: the pattern to look for and the text to replace it with. Any @@ -21600,7 +21827,7 @@ BEGIN @{ The program relies on @command{gawk}'s ability to have @code{RS} be a regexp, as well as on the setting of @code{RT} to the actual text that terminates the -record (@pxref{Records, ,How Input Is Split into Records}). +record (@pxref{Records}). The idea is to have @code{RS} be the pattern to look for. @command{gawk} automatically sets @code{$0} to the text between matches of the pattern. @@ -21614,14 +21841,14 @@ statement unconditionally prints the replacement text, which is not correct. However, if the file did not end in text that matches @code{RS}, @code{RT} is set to the null string. In this case, we can print @code{$0} using @code{printf} -(@pxref{Printf, ,Using @code{printf} Statements for Fancier Printing}). +(@pxref{Printf}). The @code{BEGIN} rule handles the setup, checking for the right number of arguments and calling @code{usage} if there is a problem. Then it sets @code{RS} and @code{ORS} from the command-line arguments and sets @code{ARGV[1]} and @code{ARGV[2]} to the null string, so that they are not treated as @value{FN}s -(@pxref{ARGC and ARGV, , Using @code{ARGC} and @code{ARGV}}). +(@pxref{ARGC and ARGV}). The @code{usage} function prints an error message and exits. Finally, the single rule handles the printing scheme outlined above, @@ -21648,7 +21875,7 @@ Exercise: what are the advantages and disadvantages of this version versus sed? Others? @end ignore -@node Igawk Program, , Simple Sed, Miscellaneous Programs +@node Igawk Program @subsection An Easy Way to Use Library Functions @c STARTOFRANGE libfex @@ -21662,7 +21889,7 @@ However, using library functions is only easy when writing @command{awk} programs; it is painful when running them, requiring multiple @option{-f} options. 
If @command{gawk} is unavailable, then so too is the @env{AWKPATH} environment variable and the ability to put @command{awk} functions into a -library directory (@pxref{Options, ,Command-Line Options}). +library directory (@pxref{Options}). It would be nice to be able to write programs in the following manner: @example @@ -21877,7 +22104,7 @@ The @command{awk} program to process @samp{@@include} directives is stored in the shell variable @code{expand_prog}. Doing this keeps the shell script readable. The @command{awk} program reads through the user's program, one line at a time, using @code{getline} -(@pxref{Getline, ,Explicit Input with @code{getline}}). The input +(@pxref{Getline}). The input @value{FN}s and @samp{@@include} statements are managed using a stack. As each @samp{@@include} is encountered, the current @value{FN} is ``pushed'' onto the stack and the file named in the @samp{@@include} @@ -21889,7 +22116,7 @@ the first one on the stack. The @code{pathto} function does the work of finding the full path to a file. It simulates @command{gawk}'s behavior when searching the @env{AWKPATH} environment variable -(@pxref{AWKPATH Variable, ,The @env{AWKPATH} Environment Variable}). +(@pxref{AWKPATH Variable}). If a @value{FN} has a @samp{/} in it, no path search is done. Otherwise, the @value{FN} is concatenated with the name of each directory in the path, and an attempt is made to open the generated @value{FN}. @@ -22146,22 +22373,22 @@ It contains the following appendixes: @itemize @bullet @item -@ref{Language History, ,The Evolution of the @command{awk} Language}. +@ref{Language History}. @item -@ref{Installation, ,Installing @command{gawk}}. +@ref{Installation}. @item -@ref{Notes, ,Implementation Notes}. +@ref{Notes}. @item -@ref{Basic Concepts, ,Basic Programming Concepts}. +@ref{Basic Concepts}. @item @ref{Glossary}. @item -@ref{Copying, ,GNU General Public License}. +@ref{Copying}. @item @ref{GNU Free Documentation License}. 
@@ -22173,7 +22400,7 @@ It contains the following appendixes: @end iftex @end ignore -@node Language History, Installation, Sample Programs, Top +@node Language History @appendix The Evolution of the @command{awk} Language This @value{DOCUMENT} describes the GNU implementation of @command{awk}, which follows @@ -22201,7 +22428,7 @@ of the @value{DOCUMENT} where you can find more information. * Contributors:: The major contributors to @command{gawk}. @end menu -@node V7/SVR3.1, SVR4, Language History, Language History +@node V7/SVR3.1 @appendixsec Major Changes Between V7 and SVR3.1 @c STARTOFRANGE gawkv @cindex @command{awk}, versions of @@ -22216,18 +22443,18 @@ cross-references to further details: @itemize @bullet @item The requirement for @samp{;} to separate rules on a line -(@pxref{Statements/Lines, ,@command{awk} Statements Versus Lines}). +(@pxref{Statements/Lines}). @item User-defined functions and the @code{return} statement -(@pxref{User-defined, ,User-Defined Functions}). +(@pxref{User-defined}). @item -The @code{delete} statement (@pxref{Delete, ,The @code{delete} Statement}). +The @code{delete} statement (@pxref{Delete}). @item The @code{do}-@code{while} statement -(@pxref{Do Statement, ,The @code{do}-@code{while} Statement}). +(@pxref{Do Statement}). @item The built-in functions @code{atan2}, @code{cos}, @code{sin}, @code{rand}, and @@ -22235,11 +22462,11 @@ The built-in functions @code{atan2}, @code{cos}, @code{sin}, @code{rand}, and @item The built-in functions @code{gsub}, @code{sub}, and @code{match} -(@pxref{String Functions, ,String Manipulation Functions}). +(@pxref{String Functions}). @item The built-in functions @code{close} and @code{system} -(@pxref{I/O Functions, ,Input/Output Functions}). +(@pxref{I/O Functions}). @item The @code{ARGC}, @code{ARGV}, @code{FNR}, @code{RLENGTH}, @code{RSTART}, @@ -22247,26 +22474,26 @@ and @code{SUBSEP} built-in variables (@pxref{Built-in Variables}). 
@item The conditional expression using the ternary operator @samp{?:} -(@pxref{Conditional Exp, ,Conditional Expressions}). +(@pxref{Conditional Exp}). @item The exponentiation operator @samp{^} -(@pxref{Arithmetic Ops, ,Arithmetic Operators}) and its assignment operator -form @samp{^=} (@pxref{Assignment Ops, ,Assignment Expressions}). +(@pxref{Arithmetic Ops}) and its assignment operator +form @samp{^=} (@pxref{Assignment Ops}). @item C-compatible operator precedence, which breaks some old @command{awk} -programs (@pxref{Precedence, ,Operator Precedence (How Operators Nest)}). +programs (@pxref{Precedence}). @item Regexps as the value of @code{FS} -(@pxref{Field Separators, ,Specifying How Fields Are Separated}) and as the +(@pxref{Field Separators}) and as the third argument to the @code{split} function -(@pxref{String Functions, ,String Manipulation Functions}). +(@pxref{String Functions}). @item Dynamic regexps as operands of the @samp{~} and @samp{!~} operators -(@pxref{Regexp Usage, ,How to Use Regular Expressions}). +(@pxref{Regexp Usage}). @item The escape sequences @samp{\b}, @samp{\f}, and @samp{\r} @@ -22277,19 +22504,19 @@ something you can rely on.) @item Redirection of input for the @code{getline} function -(@pxref{Getline, ,Explicit Input with @code{getline}}). +(@pxref{Getline}). @item Multiple @code{BEGIN} and @code{END} rules -(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}). +(@pxref{BEGIN/END}). @item Multidimensional arrays -(@pxref{Multi-dimensional, ,Multidimensional Arrays}). +(@pxref{Multi-dimensional}). @end itemize @c ENDOFRANGE gawkv1 -@node SVR4, POSIX, V7/SVR3.1, Language History +@node SVR4 @appendixsec Changes Between SVR3.1 and SVR4 @cindex @command{awk}, versions of, changes between SVR3.1 and SVR4 @@ -22303,12 +22530,12 @@ The @code{ENVIRON} variable (@pxref{Built-in Variables}). @item Multiple @option{-f} options on the command line -(@pxref{Options, ,Command-Line Options}). +(@pxref{Options}). 
@c MKS awk @item The @option{-v} option for assigning variables before program execution begins -(@pxref{Options, ,Command-Line Options}). +(@pxref{Options}). @c GNU, Bell Laboratories & MKS together @item @@ -22326,29 +22553,29 @@ A defined return value for the @code{srand} built-in function @item The @code{toupper} and @code{tolower} built-in string functions for case translation -(@pxref{String Functions, ,String Manipulation Functions}). +(@pxref{String Functions}). @item A cleaner specification for the @samp{%c} format-control letter in the @code{printf} function -(@pxref{Control Letters, ,Format-Control Letters}). +(@pxref{Control Letters}). @item The ability to dynamically pass the field width and precision (@code{"%*.*d"}) in the argument list of the @code{printf} function -(@pxref{Control Letters, ,Format-Control Letters}). +(@pxref{Control Letters}). @item The use of regexp constants, such as @code{/foo/}, as expressions, where they are equivalent to using the matching operator, as in @samp{$0 ~ /foo/} -(@pxref{Using Constant Regexps, ,Using Regular Expression Constants}). +(@pxref{Using Constant Regexps}). @item Processing of escape sequences inside command-line variable assignments -(@pxref{Assignment Options, ,Assigning Variables on the Command Line}). +(@pxref{Assignment Options}). @end itemize -@node POSIX, BTL, SVR4, Language History +@node POSIX @appendixsec Changes Between SVR4 and POSIX @command{awk} @cindex @command{awk}, versions of, changes between SVR4 and POSIX @command{awk} @cindex POSIX @command{awk}, changes in @command{awk} versions @@ -22359,15 +22586,15 @@ introduced the following changes into the language: @itemize @bullet @item The use of @option{-W} for implementation-specific options -(@pxref{Options, ,Command-Line Options}). +(@pxref{Options}). @item The use of @code{CONVFMT} for controlling the conversion of numbers -to strings (@pxref{Conversion, ,Conversion of Strings and Numbers}). +to strings (@pxref{Conversion}). 
@item The concept of a numeric string and tighter comparison rules to go -with it (@pxref{Typing and Comparison, ,Variable Typing and Comparison Expressions}). +with it (@pxref{Typing and Comparison}). @item More complete documentation of many of the previously undocumented @@ -22387,33 +22614,33 @@ standard: @item Newlines do not act as whitespace to separate fields when @code{FS} is equal to a single space -(@pxref{Fields, ,Examining Fields}). +(@pxref{Fields}). @item Newlines are not allowed after @samp{?} or @samp{:} -(@pxref{Conditional Exp, ,Conditional Expressions}). +(@pxref{Conditional Exp}). @item The synonym @code{func} for the keyword @code{function} is not -recognized (@pxref{Definition Syntax, ,Function Definition Syntax}). +recognized (@pxref{Definition Syntax}). @item The operators @samp{**} and @samp{**=} cannot be used in -place of @samp{^} and @samp{^=} (@pxref{Arithmetic Ops, ,Arithmetic Operators}, -and @ref{Assignment Ops, ,Assignment Expressions}). +place of @samp{^} and @samp{^=} (@pxref{Arithmetic Ops}, +and @ref{Assignment Ops}). @item Specifying @samp{-Ft} on the command line does not set the value of @code{FS} to be a single TAB character -(@pxref{Field Separators, ,Specifying How Fields Are Separated}). +(@pxref{Field Separators}). @item The @code{fflush} built-in function is not supported -(@pxref{I/O Functions, ,Input/Output Functions}). +(@pxref{I/O Functions}). @end itemize @c ENDOFRANGE gawkv -@node BTL, POSIX/GNU, POSIX, Language History +@node BTL @appendixsec Extensions in the Bell Laboratories @command{awk} @cindex @command{awk}, versions of, See Also Bell Laboratories @command{awk} @@ -22422,7 +22649,7 @@ The @code{fflush} built-in function is not supported @cindex Kernighan, Brian Brian Kernighan, one of the original designers of Unix @command{awk}, has made his version available via his home page -(@pxref{Other Versions, ,Other Freely Available @command{awk} Implementations}). +(@pxref{Other Versions}). 
This @value{SECTION} describes extensions in his version of @command{awk} that are not in POSIX @command{awk}: @@ -22431,23 +22658,23 @@ not in POSIX @command{awk}: The @samp{-mf @var{N}} and @samp{-mr @var{N}} command-line options to set the maximum number of fields and the maximum record size, respectively -(@pxref{Options, ,Command-Line Options}). +(@pxref{Options}). As a side note, his @command{awk} no longer needs these options; it continues to accept them to avoid breaking old programs. @item The @code{fflush} built-in function for flushing buffered output -(@pxref{I/O Functions, ,Input/Output Functions}). +(@pxref{I/O Functions}). @item The @samp{**} and @samp{**=} operators -(@pxref{Arithmetic Ops, ,Arithmetic Operators} +(@pxref{Arithmetic Ops} and -@ref{Assignment Ops, ,Assignment Expressions}). +@ref{Assignment Ops}). @item The use of @code{func} as an abbreviation for @code{function} -(@pxref{Definition Syntax, ,Function Definition Syntax}). +(@pxref{Definition Syntax}). @ignore @item @@ -22469,23 +22696,23 @@ The @samp{\x} escape sequence @item The @file{/dev/stdin}, @file{/dev/stdout}, and @file{/dev/stderr} special files -(@pxref{Special Files, ,Special @value{FFN}s in @command{gawk}}). +(@pxref{Special Files}). @item The ability for @code{FS} and for the third argument to @code{split} to be null strings -(@pxref{Single Character Fields, , Making Each Character a Separate Field}). +(@pxref{Single Character Fields}). @item The @code{nextfile} statement -(@pxref{Nextfile Statement, ,Using @command{gawk}'s @code{nextfile} Statement}). +(@pxref{Nextfile Statement}). @item The ability to delete all of an array at once with @samp{delete @var{array}} -(@pxref{Delete, ,The @code{delete} Statement}). +(@pxref{Delete}). @end itemize -@node POSIX/GNU, Contributors, BTL, Language History +@node POSIX/GNU @appendixsec Extensions in @command{gawk} Not in POSIX @command{awk} @ignore @@ -22506,12 +22733,12 @@ Within each category, be alphabetical. 
@c STARTOFRANGE exgnot @cindex extensions, in @command{gawk}, not in POSIX @command{awk} @c STARTOFRANGE posnot -@cindex POSIX, @command{gawk} extensions not included in +@cindex POSIX, @command{gawk} extensions not included in The GNU implementation, @command{gawk}, adds a large number of features. This @value{SECTION} lists them in the order they were added to @command{gawk}. They can all be disabled with either the @option{--traditional} or @option{--posix} options -(@pxref{Options, ,Command-Line Options}). +(@pxref{Options}). Version 2.10 of @command{gawk} introduced the following features: @@ -22519,16 +22746,16 @@ Version 2.10 of @command{gawk} introduced the following features: @item The @env{AWKPATH} environment variable for specifying a path search for the @option{-f} command-line option -(@pxref{Options, ,Command-Line Options}). +(@pxref{Options}). @item The @code{IGNORECASE} variable and its effects -(@pxref{Case-sensitivity, ,Case Sensitivity in Matching}). +(@pxref{Case-sensitivity}). @item The @file{/dev/stdin}, @file{/dev/stdout}, @file{/dev/stderr} and @file{/dev/fd/@var{N}} special @value{FN}s -(@pxref{Special Files, ,Special @value{FFN}s in @command{gawk}}). +(@pxref{Special Files}). @end itemize Version 2.13 of @command{gawk} introduced the following features: @@ -22536,25 +22763,25 @@ Version 2.13 of @command{gawk} introduced the following features: @itemize @bullet @item The @code{FIELDWIDTHS} variable and its effects -(@pxref{Constant Size, ,Reading Fixed-Width Data}). +(@pxref{Constant Size}). @item The @code{systime} and @code{strftime} built-in functions for obtaining and printing timestamps -(@pxref{Time Functions, ,Using @command{gawk}'s Timestamp Functions}). +(@pxref{Time Functions}). @item The @option{-W lint} option to provide error and portability checking for both the source code and at runtime -(@pxref{Options, ,Command-Line Options}). +(@pxref{Options}). 
@item The @option{-W compat} option to turn off the GNU extensions -(@pxref{Options, ,Command-Line Options}). +(@pxref{Options}). @item The @option{-W posix} option for full POSIX compliance -(@pxref{Options, ,Command-Line Options}). +(@pxref{Options}). @end itemize Version 2.14 of @command{gawk} introduced the following feature: @@ -22562,7 +22789,7 @@ Version 2.14 of @command{gawk} introduced the following feature: @itemize @bullet @item The @code{next file} statement for skipping to the next @value{DF} -(@pxref{Nextfile Statement, ,Using @command{gawk}'s @code{nextfile} Statement}). +(@pxref{Nextfile Statement}). @end itemize Version 2.15 of @command{gawk} introduced the following features: @@ -22580,20 +22807,20 @@ The @code{ERRNO} variable, which contains the system error message when @item The @file{/dev/pid}, @file{/dev/ppid}, @file{/dev/pgrpid}, and @file{/dev/user} @value{FN} interpretation -(@pxref{Special Files, ,Special @value{FFN}s in @command{gawk}}). +(@pxref{Special Files}). @item The ability to delete all of an array at once with @samp{delete @var{array}} -(@pxref{Delete, ,The @code{delete} Statement}). +(@pxref{Delete}). @item The ability to use GNU-style long-named options that start with @option{--} -(@pxref{Options, ,Command-Line Options}). +(@pxref{Options}). @item The @option{--source} option for mixing command-line and library-file source code -(@pxref{Options, ,Command-Line Options}). +(@pxref{Options}). @end itemize Version 3.0 of @command{gawk} introduced the following features: @@ -22602,66 +22829,66 @@ Version 3.0 of @command{gawk} introduced the following features: @item @code{IGNORECASE} changed, now applying to string comparison as well as regexp operations -(@pxref{Case-sensitivity, ,Case Sensitivity in Matching}). +(@pxref{Case-sensitivity}). @item The @code{RT} variable that contains the input text that matched @code{RS} -(@pxref{Records, ,How Input Is Split into Records}). +(@pxref{Records}). 
@item Full support for both POSIX and GNU regexps -(@pxref{Regexp, , Regular Expressions}). +(@pxref{Regexp}). @item The @code{gensub} function for more powerful text manipulation -(@pxref{String Functions, ,String Manipulation Functions}). +(@pxref{String Functions}). @item The @code{strftime} function acquired a default time format, allowing it to be called with no arguments -(@pxref{Time Functions, ,Using @command{gawk}'s Timestamp Functions}). +(@pxref{Time Functions}). @item The ability for @code{FS} and for the third argument to @code{split} to be null strings -(@pxref{Single Character Fields, , Making Each Character a Separate Field}). +(@pxref{Single Character Fields}). @item The ability for @code{RS} to be a regexp -(@pxref{Records, ,How Input Is Split into Records}). +(@pxref{Records}). @item The @code{next file} statement became @code{nextfile} -(@pxref{Nextfile Statement, ,Using @command{gawk}'s @code{nextfile} Statement}). +(@pxref{Nextfile Statement}). @item The @option{--lint-old} option to warn about constructs that are not available in the original Version 7 Unix version of @command{awk} -(@pxref{V7/SVR3.1, ,Major Changes Between V7 and SVR3.1}). +(@pxref{V7/SVR3.1}). @item The @option{-m} option and the @code{fflush} function from the Bell Laboratories research version of @command{awk} -(@pxref{Options, ,Command-Line Options}; also -@pxref{I/O Functions, ,Input/Output Functions}). +(@pxref{Options}; also +@pxref{I/O Functions}). @item The @option{--re-interval} option to provide interval expressions in regexps -(@pxref{Regexp Operators, , Regular Expression Operators}). +(@pxref{Regexp Operators}). @item The @option{--traditional} option was added as a better name for -@option{--compat} (@pxref{Options, ,Command-Line Options}). +@option{--compat} (@pxref{Options}). @item The use of GNU Autoconf to control the configuration process -(@pxref{Quick Installation, , Compiling @command{gawk} for Unix}). +(@pxref{Quick Installation}). 
@item Amiga support -(@pxref{Amiga Installation, ,Installing @command{gawk} on an Amiga}). +(@pxref{Amiga Installation}). @end itemize @@ -22671,7 +22898,7 @@ Version 3.1 of @command{gawk} introduced the following features: @item The @code{BINMODE} special variable for non-POSIX systems, which allows binary I/O for input and/or output files -(@pxref{PC Using, ,Using @command{gawk} on PC Operating Systems}). +(@pxref{PC Using}). @item The @code{LINT} special variable, which dynamically controls lint warnings @@ -22686,53 +22913,53 @@ The @code{TEXTDOMAIN} special variable for setting an application's internationalization text domain (@pxref{Built-in Variables}, and -@ref{Internationalization, ,Internationalization with @command{gawk}}). +@ref{Internationalization}). @item The ability to use octal and hexadecimal constants in @command{awk} program source code -(@pxref{Nondecimal-numbers, ,Octal and Hexadecimal Numbers}). +(@pxref{Nondecimal-numbers}). @item The @samp{|&} operator for two-way I/O to a coprocess -(@pxref{Two-way I/O, ,Two-Way Communications with Another Process}). +(@pxref{Two-way I/O}). @item The @file{/inet} special files for TCP/IP networking using @samp{|&} -(@pxref{TCP/IP Networking, , Using @command{gawk} for Network Programming}). +(@pxref{TCP/IP Networking}). @item The optional second argument to @code{close} that allows closing one end of a two-way pipe to a coprocess -(@pxref{Two-way I/O, ,Two-Way Communications with Another Process}). +(@pxref{Two-way I/O}). @item The optional third argument to the @code{match} function for capturing text-matching subexpressions within a regexp -(@pxref{String Functions, , String Manipulation Functions}). +(@pxref{String Functions}). @item Positional specifiers in @code{printf} formats for making translations easier -(@pxref{Printf Ordering, , Rearranging @code{printf} Arguments}). +(@pxref{Printf Ordering}). 
@item The @code{asort} and @code{asorti} functions for sorting arrays -(@pxref{Array Sorting, ,Sorting Array Values and Indices with @command{gawk}}). +(@pxref{Array Sorting}). @item The @code{bindtextdomain}, @code{dcgettext} and @code{dcngettext} functions for internationalization -(@pxref{Programmer i18n, ,Internationalizing @command{awk} Programs}). +(@pxref{Programmer i18n}). @item The @code{extension} built-in function and the ability to add new built-in functions dynamically -(@pxref{Dynamic Extensions, , Adding New Built-in Functions to @command{gawk}}). +(@pxref{Dynamic Extensions}). @item The @code{mktime} built-in function for creating timestamps -(@pxref{Time Functions, ,Using @command{gawk}'s Timestamp Functions}). +(@pxref{Time Functions}). @item The @@ -22745,57 +22972,57 @@ The and @code{strtonum} built-in functions -(@pxref{Bitwise Functions, ,Using @command{gawk}'s Bit Manipulation Functions}). +(@pxref{Bitwise Functions}). @item @cindex @code{next file} statement The support for @samp{next file} as two words was removed completely -(@pxref{Nextfile Statement, ,Using @command{gawk}'s @code{nextfile} Statement}). +(@pxref{Nextfile Statement}). @item The @option{--dump-variables} option to print a list of all global variables -(@pxref{Options, ,Command-Line Options}). +(@pxref{Options}). @item The @option{--gen-po} command-line option and the use of a leading underscore to mark strings that should be translated -(@pxref{String Extraction, ,Extracting Marked Strings}). +(@pxref{String Extraction}). @item The @option{--non-decimal-data} option to allow non-decimal input data -(@pxref{Nondecimal Data, ,Allowing Nondecimal Input Data}). +(@pxref{Nondecimal Data}). @item The @option{--profile} option and @command{pgawk}, the profiling version of @command{gawk}, for producing execution profiles of @command{awk} programs -(@pxref{Profiling, ,Profiling Your @command{awk} Programs}). +(@pxref{Profiling}). 
@item The @option{--enable-portals} configuration option to enable special treatment of pathnames that begin with @file{/p} as BSD portals -(@pxref{Portal Files, , Using @command{gawk} with BSD Portals}). +(@pxref{Portal Files}). @item The use of GNU Automake to help in standardizing the configuration process -(@pxref{Quick Installation, , Compiling @command{gawk} for Unix}). +(@pxref{Quick Installation}). @item The use of GNU @code{gettext} for @command{gawk}'s own message output -(@pxref{Gawk I18N, ,@command{gawk} Can Speak Your Language}). +(@pxref{Gawk I18N}). @item BeOS support -(@pxref{BeOS Installation, , Installing @command{gawk} on BeOS}). +(@pxref{BeOS Installation}). @item Tandem support -(@pxref{Tandem Installation, ,Installing @command{gawk} on a Tandem}). +(@pxref{Tandem Installation}). @item The Atari port became officially unsupported -(@pxref{Atari Installation, ,Installing @command{gawk} on the Atari ST}). +(@pxref{Atari Installation}). @item The source code now uses new-style function definitions, with @@ -22804,7 +23031,7 @@ The source code now uses new-style function definitions, with @item The @option{--disable-lint} configuration option to disable lint checking at compile time -(@pxref{Additional Configuration Options, , Additional Configuration Options}). +(@pxref{Additional Configuration Options}). @end itemize @@ -22814,7 +23041,7 @@ at compile time @c ENDOFRANGE exgnot @c ENDOFRANGE posnot -@node Contributors, , POSIX/GNU, Language History +@node Contributors @appendixsec Major Contributors to @command{gawk} @cindex @command{gawk}, list of contributors to @quotation @@ -22920,7 +23147,7 @@ currently maintains the MS-DOS port. @item @cindex Grigera, Juan Juan Grigera -maintains the port to Win32 systems. +maintains the port to Windows32 systems. @item @cindex Hankerson, Darrel @@ -22973,6 +23200,13 @@ updated the @command{gawk} port for OS/2. Isamu Hasegawa, of IBM in Japan, contributed support for multibyte characters. 
+@cindex Benzinger, Michael +Michael Benzinger contributed the initial code for @code{switch} statements. + +@cindex McPhee, Patrick +Patrick T.J.@: McPhee contributed the code for dynamic loading in Windows32 +environments. + @item @cindex Robbins, Arnold Arnold Robbins @@ -22980,7 +23214,7 @@ has been working on @command{gawk} since 1988, at first helping David Trueman, and as the primary maintainer since around 1994. @end itemize -@node Installation, Notes, Language History, Top +@node Installation @appendix Installing @command{gawk} @c last two commas are part of see also @@ -22993,7 +23227,7 @@ This appendix provides instructions for installing @command{gawk} on the various platforms that are supported by the developers. The primary developer supports GNU/Linux (and Unix), whereas the other ports are contributed. -@xref{Bugs, , Reporting Problems and Bugs}, +@xref{Bugs}, for the electronic mail addresses of the people who did the respective ports. @@ -23008,7 +23242,7 @@ the respective ports. implementations. @end menu -@node Gawk Distribution, Unix Installation, Installation, Installation +@node Gawk Distribution @appendixsec The @command{gawk} Distribution @cindex source code, @command{gawk} @@ -23022,7 +23256,7 @@ subdirectories. * Distribution contents:: What is in the distribution. @end menu -@node Getting, Extracting, Gawk Distribution, Gawk Distribution +@node Getting @appendixsubsec Getting the @command{gawk} Distribution @c last comma is part of secondary @cindex @command{gawk}, source code, obtaining @@ -23065,7 +23299,7 @@ The up-to-date list of mirror sites is available from Try to use one of the mirrors; they will be less busy, and you can usually find one closer to your site. -@node Extracting, Distribution contents, Getting, Gawk Distribution +@node Extracting @appendixsubsec Extracting the Distribution @command{gawk} is distributed as a @code{tar} file compressed with the GNU Zip program, @code{gzip}. 
@@ -23099,7 +23333,7 @@ If you are not on a Unix system, you need to make other arrangements for getting and extracting the @command{gawk} distribution. You should consult a local expert. -@node Distribution contents, , Extracting, Gawk Distribution +@node Distribution contents @appendixsubsec Contents of the @command{gawk} Distribution @c STARTOFRANGE gawdis @cindex @command{gawk}, distribution @@ -23107,7 +23341,7 @@ a local expert. The @command{gawk} distribution has a number of C source files, documentation files, subdirectories, and files related to the configuration process -(@pxref{Unix Installation, ,Compiling and Installing @command{gawk} on Unix}), +(@pxref{Unix Installation}), as well as several subdirectories related to different non-Unix operating systems: @@ -23198,7 +23432,7 @@ The generated Info file for @item doc/igawk.1 The @command{troff} source for a manual page describing the @command{igawk} program presented in -@ref{Igawk Program, ,An Easy Way to Use Library Functions}. +@ref{Igawk Program}. @item doc/Makefile.in The input file used during the configuration process to generate the @@ -23222,7 +23456,7 @@ the @file{Makefile.in} files used by @command{autoconf} and @itemx m4/* These files and subdirectories are used when configuring @command{gawk} for various Unix systems. They are explained in -@ref{Unix Installation, ,Compiling and Installing @command{gawk} on Unix}. +@ref{Unix Installation}. @item intl/* @itemx po/* @@ -23235,15 +23469,15 @@ contains message translations. @itemx awklib/Makefile.in @itemx awklib/eg/* The @file{awklib} directory contains a copy of @file{extract.awk} -(@pxref{Extract Program, ,Extracting Programs from Texinfo Source Files}), +(@pxref{Extract Program}), which can be used to extract the sample programs from the Texinfo source file for this @value{DOCUMENT}. It also contains a @file{Makefile.in} file, which @command{configure} uses to generate a @file{Makefile}. 
@file{Makefile.am} is used by GNU Automake to create @file{Makefile.in}. The library functions from -@ref{Library Functions, , A Library of @command{awk} Functions}, +@ref{Library Functions}, and the @command{igawk} program from -@ref{Igawk Program, , An Easy Way to Use Library Functions}, +@ref{Igawk Program}, are included as ready-to-use files in the @command{gawk} distribution. They are installed as part of the installation process. The rest of the programs in this @value{DOCUMENT} are available in appropriate @@ -23251,22 +23485,22 @@ subdirectories of @file{awklib/eg}. @item unsupported/atari/* Files needed for building @command{gawk} on an Atari ST -(@pxref{Atari Installation, ,Installing @command{gawk} on the Atari ST}, for details). +(@pxref{Atari Installation}, for details). @item unsupported/tandem/* Files needed for building @command{gawk} on a Tandem -(@pxref{Tandem Installation, ,Installing @command{gawk} on a Tandem}, for details). +(@pxref{Tandem Installation}, for details). @item posix/* Files needed for building @command{gawk} on POSIX-compliant systems. @item pc/* Files needed for building @command{gawk} under MS-DOS, MS Windows and OS/2 -(@pxref{PC Installation, ,Installation on PC Operating Systems}, for details). +(@pxref{PC Installation}, for details). @item vms/* Files needed for building @command{gawk} under VMS -(@pxref{VMS Installation, ,How to Compile and Install @command{gawk} on VMS}, for details). +(@pxref{VMS Installation}, for details). @item test/* A test suite for @@ -23277,7 +23511,7 @@ be confident of a successful port. @end table @c ENDOFRANGE gawdis -@node Unix Installation, Non-Unix Installation, Gawk Distribution, Installation +@node Unix Installation @appendixsec Compiling and Installing @command{gawk} on Unix Usually, you can compile and install @command{gawk} by typing only two @@ -23290,7 +23524,7 @@ to configure @command{gawk} for your system yourself. * Configuration Philosophy:: How it's all supposed to work. 
@end menu -@node Quick Installation, Additional Configuration Options, Unix Installation, Unix Installation +@node Quick Installation @appendixsubsec Compiling @command{gawk} for Unix @c @cindex installation, unix @@ -23352,9 +23586,9 @@ If these steps do not work, or if any of the tests fail, check the files in the @file{README_d} directory to see if you've found a known problem. If the failure is not described there, please send in a bug report -(@pxref{Bugs, ,Reporting Problems and Bugs}.) +(@pxref{Bugs}.) -@node Additional Configuration Options, Configuration Philosophy, Quick Installation, Unix Installation +@node Additional Configuration Options @appendixsubsec Additional Configuration Options @cindex @command{gawk}, configuring, options @c comma is part of primary @@ -23370,7 +23604,14 @@ command line when compiling @command{gawk} from scratch, including: Treat pathnames that begin with @file{/p} as BSD portal files when doing two-way I/O with the @samp{|&} operator -(@pxref{Portal Files, , Using @command{gawk} with BSD Portals}). +(@pxref{Portal Files}). + +@cindex @code{--enable-switch} configuration option +@cindex configuration option, @code{--enable-switch} +@item --enable-switch +Enable the recognition and execution of C-style @code{switch} statements +in @command{awk} programs +(@pxref{Switch Statement}.) @cindex Linux @cindex GNU/Linux @@ -23388,10 +23629,10 @@ All known modern GNU/Linux systems use Glibc 2. Use this option on any other sy @item --disable-lint This option disables all lint checking within @code{gawk}. The @option{--lint} and @option{--lint-old} options -(@pxref{Options, , Command-Line Options}) +(@pxref{Options}) are accepted, but silently do nothing. Similarly, setting the @code{LINT} variable -(@pxref{User-modified, , Built-in Variables That Control @command{awk}}) +(@pxref{User-modified}) has no effect on the running @command{awk} program. 
When used with GCC's automatic dead-code-elimination, this option @@ -23413,7 +23654,7 @@ You should also use this option if @option{--with-included-gettext} doesn't work on your system. @end table -@node Configuration Philosophy, , Additional Configuration Options, Unix Installation +@node Configuration Philosophy @appendixsubsec The Configuration Process @cindex @command{gawk}, configuring @@ -23456,12 +23697,12 @@ It is also possible that the @command{configure} program generated by If you do have a problem, the file @file{configure.in} is the input for @command{autoconf}. You may be able to change this file and generate a new version of @command{configure} that works on your system -(@pxref{Bugs, ,Reporting Problems and Bugs}, +(@pxref{Bugs}, for information on how to report problems in configuring @command{gawk}). The same mechanism may be used to send in updates to @file{configure.in} and/or @file{custom.h}. -@node Non-Unix Installation, Unsupported, Unix Installation, Installation +@node Non-Unix Installation @appendixsec Installation on Other Operating Systems This @value{SECTION} describes how to install @command{gawk} on @@ -23475,7 +23716,7 @@ various non-Unix systems. * VMS Installation:: Installing @command{gawk} on VMS. @end menu -@node Amiga Installation, BeOS Installation, Non-Unix Installation, Non-Unix Installation +@node Amiga Installation @appendixsubsec Installing @command{gawk} on an Amiga @cindex amiga @@ -23511,9 +23752,9 @@ configure -v m68k-amigaos Then run @command{make} and you should be all set! If these steps do not work, please send in a bug report -(@pxref{Bugs, ,Reporting Problems and Bugs}). +(@pxref{Bugs}). 
-@node BeOS Installation, PC Installation, Amiga Installation, Non-Unix Installation +@node BeOS Installation @appendixsubsec Installing @command{gawk} on BeOS @cindex BeOS @cindex installation, beos @@ -23547,12 +23788,12 @@ $ make install BeOS uses @command{bash} as its shell; thus, you use @command{gawk} the same way you would under Unix. If these steps do not work, please send in a bug report -(@pxref{Bugs, ,Reporting Problems and Bugs}). +(@pxref{Bugs}). @c Rewritten by Scott Deifik <scottd@amgen.com> @c and Darrel Hankerson <hankedr@mail.auburn.edu> -@node PC Installation, VMS Installation, BeOS Installation, Non-Unix Installation +@node PC Installation @appendixsubsec Installation on PC Operating Systems @c first comma is part of primary @@ -23561,27 +23802,28 @@ If these steps do not work, please send in a bug report @cindex operating systems, PC, @command{gawk} on, installing This @value{SECTION} covers installation and usage of @command{gawk} on x86 machines running DOS, any version of Windows, or OS/2. -In this @value{SECTION}, the term ``Win32'' +In this @value{SECTION}, the term ``Windows32'' refers to any of Windows-95/98/ME/NT/2000. The limitations of DOS (and DOS shells under Windows or OS/2) has meant that various ``DOS extenders'' are often used with programs such as @command{gawk}. The varying capabilities of Microsoft Windows 3.1 -and Win32 can add to the confusion. For an overview of the +and Windows32 can add to the confusion. For an overview of the considerations, please refer to @file{README_d/README.pc} in the distribution. @menu * PC Binary Installation:: Installing a prepared distribution. -* PC Compiling:: Compiling @command{gawk} for MS-DOS, Win32, +* PC Compiling:: Compiling @command{gawk} for MS-DOS, Windows32, and OS/2. -* PC Using:: Running @command{gawk} on MS-DOS, Win32 and +* PC Dynamic:: Compiling @command{gawk} for dynamic libraries. +* PC Using:: Running @command{gawk} on MS-DOS, Windows32 and OS/2. 
* Cygwin:: Building and running @command{gawk} for Cygwin. @end menu -@node PC Binary Installation, PC Compiling, PC Installation, PC Installation +@node PC Binary Installation @appendixsubsubsec Installing a Prepared Distribution for PC Systems If you have received a binary distribution prepared by the DOS @@ -23601,7 +23843,7 @@ contents. In particular, it may include more than one version of the OS/2 (32 bit, EMX) binary distributions are prepared for the @file{/usr} directory of your preferred drive. Set @env{UNIXROOT} to your installation drive (e.g., @samp{e:}) if you want to install @command{gawk} onto another drive -than the hardcoded default @samp{c:}. Executables appear in @file{/usr/bin}, +than the hardcoded default @samp{c:}. Executables appear in @file{/usr/bin}, libraries under @file{/usr/share/awk}, manual pages under @file{/usr/man}, Texinfo documentation under @file{/usr/info} and NLS files under @file{/usr/share/locale}. If you already have a file @file{/usr/info/dir} from another package @@ -23619,13 +23861,13 @@ set properly. The binary distribution may contain a separate file containing additional or more detailed installation instructions. -@node PC Compiling, PC Using, PC Binary Installation, PC Installation +@node PC Compiling @appendixsubsubsec Compiling @command{gawk} for PC Operating Systems -@command{gawk} can be compiled for MS-DOS, Win32, and OS/2 using the GNU +@command{gawk} can be compiled for MS-DOS, Windows32, and OS/2 using the GNU development tools from DJ Delorie (DJGPP; MS-DOS only) or Eberhard -Mattes (EMX; MS-DOS, Win32 and OS/2). Microsoft Visual C/C++ can be used -to build a Win32 version, and Microsoft C/C++ can be +Mattes (EMX; MS-DOS, Windows32 and OS/2). Microsoft Visual C/C++ can be used +to build a Windows32 version, and Microsoft C/C++ can be used to build 16-bit versions for MS-DOS and OS/2. @c FIXME: (As of @command{gawk} 3.1.2, the MSC version doesn't work. 
However, @@ -23635,7 +23877,7 @@ The file additional notes, and @file{pc/Makefile} contains important information on compilation options. -To build @command{gawk} for MS-DOS, Win32, and OS/2 (16 bit only; for 32 bit +To build @command{gawk} for MS-DOS, Windows32, and OS/2 (16 bit only; for 32 bit (EMX) you can use the @command{configure} script and skip the following paragraphs; for details see below), copy the files in the @file{pc} directory (@emph{except} for @file{ChangeLog}) to the directory with the rest of the @command{gawk} @@ -23643,7 +23885,7 @@ sources. The @file{Makefile} contains a configuration section with comments and may need to be edited in order to work with your @command{make} utility. The @file{Makefile} contains a number of targets for building various MS-DOS, -Win32, and OS/2 versions. A list of targets is printed if the @command{make} +Windows32, and OS/2 versions. A list of targets is printed if the @command{make} command is given without a target. As an example, to build @command{gawk} using the DJGPP tools, enter @samp{make djgpp}. @@ -23732,8 +23974,56 @@ try GNU Make 3.79.1 or later versions. You should find the latest version on @uref{http://www.unixos2.org/sw/pub/binary/make/} or on @uref{ftp://hobbes.nmsu.edu/pub/os2/}. - -@node PC Using, Cygwin, PC Compiling, PC Installation +@node PC Dynamic +@appendixsubsubsec Compiling @command{gawk} For Dynamic Libraries + +@c From README_d/README.pcdynamic +@c 11 June 2003 + +To compile @command{gawk} with dynamic extension support, +uncomment the definitions of @code{DYN_FLAGS}, @code{DYN_EXP}, +@code{DYN_OBJ}, and @code{DYN_MAKEXP} in the configuration section of +the @file{Makefile}. There are two definitions for @code{DYN_MAKEXP}: +pick the one that matches your target. + +To build some of the example extension libraries, @command{cd} to the +extension directory and copy @file{Makefile.pc} to @file{Makefile}. You +can then build using the same two targets. 
To run the example +@command{awk} scripts, you'll need to either change the call to +the @code{extension} function to match the name of the library (for +instance, change @code{"./ordchr.so"} to @code{"ordchr.dll"} or simply +@code{"ordchr"}), or rename the library to match the call (for instance, +rename @file{ordchr.dll} to @file{ordchr.so}). + +If you build @command{gawk.exe} with one compiler but want to build +an extension library with the other, you need to copy the import +library. Visual C uses a library called @file{gawk.lib}, while MinGW uses +a library called @file{libgawk.a}. These files are equivalent and will +interoperate if you give them the correct name. The resulting shared +libraries are also interoperable. + +To create your own extension library, you can use the examples as models, +but you're essentially on your own. Post to @code{comp.lang.awk} or +send electronic mail to @email{ptjm@@interlog.com} if you have problems getting +started. If you need to access functions or variables which are not +exported by @command{gawk.exe}, add them to @file{gawkw32.def} and +rebuild. You should also add @code{ATTRIBUTE_EXPORTED} to the declaration +in @file{awk.h} of any variables you add to @file{gawkw32.def}. + +Note that extension libraries have the name of the @command{awk} +executable embedded in them at link time, so they will work only +with @command{gawk.exe}. In particular, they won't work if you +rename @command{gawk.exe} to @command{awk.exe} or if you try to use +@command{pgawk.exe}. You can perform profiling by temporarily renaming +@command{pgawk.exe} to @command{gawk.exe}. You can resolve this problem +by changing the program name in the definition of @code{DYN_MAKEXP} +for your compiler. + +On Windows32, libraries are sought first in the current directory, then in +the directory containing @command{gawk.exe}, and finally through the +@env{PATH} environment variable. 
+ +@node PC Using @appendixsubsubsec Using @command{gawk} on PC Operating Systems @c STARTOFRANGE opgawx @cindex operating systems, PC, @command{gawk} on @@ -23742,7 +24032,7 @@ version on @uref{http://www.unixos2.org/sw/pub/binary/make/} or on With the exception of the Cygwin environment, the @samp{|&} operator and TCP/IP networking -(@pxref{TCP/IP Networking, , Using @command{gawk} for Network Programming}) +(@pxref{TCP/IP Networking}) are not supported for MS-DOS or MS-Windows. EMX (OS/2 only) does support at least the @samp{|&} operator. @@ -23753,7 +24043,7 @@ at least the @samp{|&} operator. @cindex semicolon (@code{;}), @code{AWKPATH} variable and @cindex @code{AWKPATH} environment variable The OS/2 and MS-DOS versions of @command{gawk} search for program files as -described in @ref{AWKPATH Variable, ,The @env{AWKPATH} Environment Variable}. +described in @ref{AWKPATH Variable}. However, semicolons (rather than colons) separate elements in the @env{AWKPATH} variable. If @env{AWKPATH} is not set or is empty, then the default search path for OS/2 (16 bit) and MS-DOS versions is @@ -23823,7 +24113,7 @@ appropriate @samp{-v BINMODE=@var{N}} option on the command line. changed mid-stream. The name @code{BINMODE} was chosen to match @command{mawk} -(@pxref{Other Versions, , Other Freely Available @command{awk} Implementations}). +(@pxref{Other Versions}). Both @command{mawk} and @command{gawk} handle @code{BINMODE} similarly; however, @command{mawk} adds a @samp{-W BINMODE=@var{N}} option and an environment variable that can set @code{BINMODE}, @code{RS}, and @code{ORS}. The @@ -23870,7 +24160,7 @@ gawk -f binmode1.awk @dots{} With proper quoting, in the first example the setting of @code{RS} can be moved into the @code{BEGIN} rule. 
-@node Cygwin, , PC Using, PC Installation +@node Cygwin @appendixsubsubsec Using @command{gawk} In The Cygwin Environment @command{gawk} can be used ``out of the box'' under Windows if you are @@ -23892,11 +24182,11 @@ step on Cygwin takes considerably longer. However, it does finish, and then the @samp{make} proceeds as usual. @strong{Note:} The @samp{|&} operator and TCP/IP networking -(@pxref{TCP/IP Networking, , Using @command{gawk} for Network Programming}) +(@pxref{TCP/IP Networking}) are fully supported in the Cygwin environment. This is not true for any other environment for MS-DOS or MS-Windows. -@node VMS Installation, , PC Installation, Non-Unix Installation +@node VMS Installation @appendixsubsec How to Compile and Install @command{gawk} on VMS @c based on material from Pat Rankin <rankin@eql.caltech.edu> @@ -23912,7 +24202,7 @@ This @value{SUBSECTION} describes how to compile and install @command{gawk} unde * VMS POSIX:: Alternate instructions for VMS POSIX. @end menu -@node VMS Compilation, VMS Installation Details, VMS Installation, VMS Installation +@node VMS Compilation @appendixsubsubsec Compiling @command{gawk} on VMS To compile @command{gawk} under VMS, there is a @code{DCL} command procedure that @@ -23960,7 +24250,7 @@ No changes to @file{config.h} are needed. @command{gawk} has been tested under VAX/VMS 5.5-1 using VAX C V3.2, and GNU C 1.40 and 2.3. It should work without modifications for VMS V4.6 and up. -@node VMS Installation Details, VMS Running, VMS Compilation, VMS Installation +@node VMS Installation Details @appendixsubsubsec Installing @command{gawk} on VMS To install @command{gawk}, all you need is a ``foreign'' command, which is @@ -24008,7 +24298,7 @@ If, after searching in both directories, the file still is not found, the file search. If @samp{AWK_LIBRARY} is not defined, that portion of the file search fails benignly. 
-@node VMS Running, VMS POSIX, VMS Installation Details, VMS Installation +@node VMS Running @appendixsubsubsec Running @command{gawk} on VMS Command-line parsing and quoting conventions are significantly different @@ -24047,7 +24337,7 @@ of @samp{AWKPATH} is a comma-separated list of directory specifications. When defining it, the value should be quoted so that it retains a single translation and not a multitranslation @code{RMS} searchlist. -@node VMS POSIX, , VMS Running, VMS Installation +@node VMS POSIX @appendixsubsubsec Building and Using @command{gawk} on VMS POSIX Ignore the instructions above, although @file{vms/gawk.hlp} should still @@ -24076,7 +24366,7 @@ Once built, @command{gawk} works like any other shell utility. Unlike the normal VMS port of @command{gawk}, no special command-line manipulation is needed in the VMS POSIX environment. -@node Unsupported, Bugs, Non-Unix Installation, Installation +@node Unsupported @appendixsec Unsupported Operating System Ports This sections describes systems for which @@ -24087,7 +24377,7 @@ the @command{gawk} port is no longer supported. * Tandem Installation:: Installing @command{gawk} on a Tandem. @end menu -@node Atari Installation, Tandem Installation, Unsupported, Unsupported +@node Atari Installation @appendixsubsec Installing @command{gawk} on the Atari ST The Atari port is no longer supported. It is @@ -24106,7 +24396,7 @@ exactly right). In order to use @command{gawk}, you need to have a shell, either text or graphics, that does not map all the characters of a command line to uppercase. Maintaining case distinction in option flags is very -important (@pxref{Options, ,Command-Line Options}). +important (@pxref{Options}). These days this is the default and it may only be a problem for some very old machines. If your system does not preserve the case of option flags, you need to upgrade your tools. Support for I/O @@ -24118,7 +24408,7 @@ from other environments. Pipes are nice to have but not vital. 
* Atari Using:: Running @command{gawk} on Atari. @end menu -@node Atari Compiling, Atari Using, Atari Installation, Atari Installation +@node Atari Compiling @appendixsubsubsec Compiling @command{gawk} on the Atari ST A proper compilation of @command{gawk} sources when @code{sizeof(int)} @@ -24146,7 +24436,7 @@ Some @command{gawk} source code fragments depend on a preprocessor define @samp{atarist}. This basically assumes the TOS environment with @command{gcc}. Modify these sections as appropriate if they are not right for your environment. Also see the remarks about @env{AWKPATH} and @code{envsep} in -@ref{Atari Using, ,Running @command{gawk} on the Atari ST}. +@ref{Atari Using}. As shipped, the sample @file{config.h} claims that the @code{system} function is missing from the libraries, which is not true, and an @@ -24156,7 +24446,7 @@ Depending upon your particular combination of shell and operating system, you might want to change the file to indicate that @code{system} is available. -@node Atari Using, , Atari Compiling, Atari Installation +@node Atari Using @appendixsubsubsec Running @command{gawk} on the Atari ST An executable version of @command{gawk} should be placed, as usual, @@ -24172,7 +24462,7 @@ memory, it is a good idea to put it on a RAM drive. If neither current directory for its temporary files. The ST version of @command{gawk} searches for its program files, as described in -@ref{AWKPATH Variable, ,The @env{AWKPATH} Environment Variable}. +@ref{AWKPATH Variable}. The default value for the @env{AWKPATH} variable is taken from @code{DEFPATH} defined in @file{Makefile}. The sample @command{gcc}/TOS @file{Makefile} for the ST in the distribution sets @code{DEFPATH} to @@ -24209,7 +24499,7 @@ use only backslashes. Also remember that in @command{awk}, backslashes in strings have to be doubled in order to get literal backslashes (@pxref{Escape Sequences}). 
-@node Tandem Installation, , Atari Installation, Unsupported +@node Tandem Installation @appendixsubsec Installing @command{gawk} on a Tandem @cindex tandem @cindex installation, tandem @@ -24221,10 +24511,10 @@ The port's contributor no longer has access to a Tandem system. The Tandem port was done on a Cyclone machine running D20. The port is pretty clean and all facilities seem to work except for the I/O piping facilities -(@pxref{Getline/Pipe, , Using @code{getline} from a Pipe}, -@ref{Getline/Variable/Pipe, ,Using @code{getline} into a Variable from a Pipe}, +(@pxref{Getline/Pipe}, +@ref{Getline/Variable/Pipe}, and -@ref{Redirection, ,Redirecting Output of @code{print} and @code{printf}}), +@ref{Redirection}), which is just too foreign a concept for Tandem. To build a Tandem executable from source, download all of the files so @@ -24246,14 +24536,14 @@ Unix @samp{<} and @samp{>} for file redirection. (Redirection options on @code{getline}, @code{print} etc., are supported.) The @samp{-mr @var{val}} option -(@pxref{Options, ,Command-Line Options}) +(@pxref{Options}) has been ``stolen'' to enable Tandem users to process fixed-length records with no ``end-of-line'' character. That is, @samp{-mr 74} tells @command{gawk} to read the input file as fixed 74-byte records. @c ENDOFRANGE opgawx @c ENDOFRANGE pcgawon -@node Bugs, Other Versions, Unsupported, Installation +@node Bugs @appendixsec Reporting Problems and Bugs @cindex archeologists @quotation @@ -24381,7 +24671,7 @@ report to the @email{bug-gawk@@gnu.org} email list as well. @c ENDOFRANGE dbugg @c ENDOFRANGE tblgawb -@node Other Versions, , Bugs, Installation +@node Other Versions @appendixsec Other Freely Available @command{awk} Implementations @c STARTOFRANGE awkim @cindex @command{awk}, implementations @@ -24428,7 +24718,7 @@ the C compiler from GCC (the GNU Compiler Collection) works quite nicely. 
-@xref{BTL, ,Extensions in the Bell Laboratories @command{awk}}, +@xref{BTL}, for a list of extensions in this @command{awk} that are not in POSIX @command{awk}. @cindex Brennan, Michael @@ -24437,7 +24727,7 @@ for a list of extensions in this @command{awk} that are not in POSIX @command{aw @item @command{mawk} Michael Brennan has written an independent implementation of @command{awk}, called @command{mawk}. It is available under the GPL -(@pxref{Copying, ,GNU General Public License}), +(@pxref{Copying}), just as @command{gawk} is. You can get it via anonymous @command{ftp} to the host @@ -24447,7 +24737,7 @@ Use ``binary'' or ``image'' mode, and retrieve @file{mawk1.3.3.tar.gz} @command{gunzip} may be used to decompress this file. Installation is similar to @command{gawk}'s -(@pxref{Unix Installation, , Compiling and Installing @command{gawk} on Unix}). +(@pxref{Unix Installation}). @cindex extensions, @command{mawk} @command{mawk} has the following extensions that are not in POSIX @command{awk}: @@ -24455,17 +24745,17 @@ is similar to @command{gawk}'s @itemize @bullet @item The @code{fflush} built-in function for flushing buffered output -(@pxref{I/O Functions, ,Input/Output Functions}). +(@pxref{I/O Functions}). @item The @samp{**} and @samp{**=} operators -(@pxref{Arithmetic Ops, ,Arithmetic Operators} +(@pxref{Arithmetic Ops} and also see -@ref{Assignment Ops, ,Assignment Expressions}). +@ref{Assignment Ops}). @item The use of @code{func} as an abbreviation for @code{function} -(@pxref{Definition Syntax, ,Function Definition Syntax}). +(@pxref{Definition Syntax}). @item The @samp{\x} escape sequence @@ -24474,25 +24764,25 @@ The @samp{\x} escape sequence @item The @file{/dev/stdout}, and @file{/dev/stderr} special files -(@pxref{Special Files, ,Special @value{FFN}s in @command{gawk}}). +(@pxref{Special Files}). Use @code{"-"} instead of @code{"/dev/stdin"} with @command{mawk}. 
@item The ability for @code{FS} and for the third argument to @code{split} to be null strings -(@pxref{Single Character Fields, , Making Each Character a Separate Field}). +(@pxref{Single Character Fields}). @item The ability to delete all of an array at once with @samp{delete @var{array}} -(@pxref{Delete, ,The @code{delete} Statement}). +(@pxref{Delete}). @item The ability for @code{RS} to be a regexp -(@pxref{Records, ,How Input Is Split into Records}). +(@pxref{Records}). @item The @code{BINMODE} special variable for non-Unix operating systems -(@pxref{PC Using, ,Using @command{gawk} on PC Operating Systems}). +(@pxref{PC Using}). @end itemize The next version of @command{mawk} will support @code{nextfile}. @@ -24519,7 +24809,7 @@ You can reach Andrew Sumner at @email{andrew@@zbcom.net}. Nelson H.F.@: Beebe at the University of Utah has modified the Bell Labs @command{awk} to provide timing and profiling information. It is different from @command{pgawk} -(@pxref{Profiling, ,Profiling Your @command{awk} Programs}), +(@pxref{Profiling}), in that it uses CPU-based profiling, not line-count profiling. You may find it at either @uref{ftp://ftp.math.utah.edu/pub/pawk/pawk-20020210.tar.gz} @@ -24531,7 +24821,7 @@ or @c ENDOFRANGE ingawk @c ENDOFRANGE awkim -@node Notes, Basic Concepts, Installation, Top +@node Notes @appendix Implementation Notes @c STARTOFRANGE gawii @cindex @command{gawk}, implementation issues @@ -24551,7 +24841,7 @@ maintainers of @command{gawk}. Everything in it applies specifically to * Future Extensions:: New features that may be implemented one day. @end menu -@node Compatibility Mode, Additions, Notes, Notes +@node Compatibility Mode @appendixsec Downward Compatibility and Debugging @cindex @command{gawk}, implementation issues, downward compatibility @cindex @command{gawk}, implementation issues, debugging @@ -24559,7 +24849,7 @@ maintainers of @command{gawk}. 
Everything in it applies specifically to @c first comma is part of primary @cindex implementation issues, @command{gawk}, debugging -@xref{POSIX/GNU, ,Extensions in @command{gawk} Not in POSIX @command{awk}}, +@xref{POSIX/GNU}, for a summary of the GNU extensions to the @command{awk} language and program. All of these features can be turned off by invoking @command{gawk} with the @option{--traditional} option or with the @option{--posix} option. @@ -24577,13 +24867,13 @@ This option is intended only for serious @command{gawk} developers and not for the casual user. It probably has not even been compiled into your version of @command{gawk}, since it slows down execution. -@node Additions, Dynamic Extensions, Compatibility Mode, Notes +@node Additions @appendixsec Making Additions to @command{gawk} If you find that you want to enhance @command{gawk} in a significant fashion, you are perfectly free to do so. That is the point of having free software; the source code is available and you are free to change -it as you want (@pxref{Copying, ,GNU General Public License}). +it as you want (@pxref{Copying}). This @value{SECTION} discusses the ways you might want to change @command{gawk} as well as any considerations you should bear in mind. @@ -24595,7 +24885,7 @@ as well as any considerations you should bear in mind. system. @end menu -@node Adding Code, New Ports, Additions, Additions +@node Adding Code @appendixsubsec Adding New Features @c STARTOFRANGE adfgaw @@ -24613,7 +24903,7 @@ make it possible for me to include your changes: @item Before building the new feature into @command{gawk} itself, consider writing it as an extension module -(@pxref{Dynamic Extensions, ,Adding New Built-in Functions to @command{gawk}}). +(@pxref{Dynamic Extensions}). If that's not possible, continue with the rest of the steps in this list. @item @@ -24621,7 +24911,7 @@ Get the latest version. 
It is much easier for me to integrate changes if they are relative to the most recent distributed version of @command{gawk}. If your version of @command{gawk} is very old, I may not be able to integrate them at all. -(@xref{Getting, ,Getting the @command{gawk} Distribution}, +(@xref{Getting}, for information on getting the latest version of @command{gawk}.) @item @@ -24723,7 +25013,7 @@ those changes in the public domain and submit a signed statement to that effect, or assign the copyright in your changes to the FSF. Both of these actions are easy to do and @emph{many} people have done so already. If you have questions, please contact me -(@pxref{Bugs, , Reporting Problems and Bugs}), +(@pxref{Bugs}), or @email{gnu@@gnu.org}. @cindex Texinfo @@ -24748,7 +25038,7 @@ more compact.) I recommend using the GNU version of @command{diff}. Send the output produced by either run of @command{diff} to me when you submit your changes. -(@xref{Bugs, , Reporting Problems and Bugs}, for the electronic mail +(@xref{Bugs}, for the electronic mail information.) Using this format makes it easy for me to apply your changes to the @@ -24770,7 +25060,7 @@ probably will not. @c ENDOFRANGE gawadf @c ENDOFRANGE fadgaw -@node New Ports, , Adding Code, Additions +@node New Ports @appendixsubsec Porting @command{gawk} to a New Operating System @cindex portability, @command{gawk} @cindex operating systems, porting @command{gawk} to @@ -24783,7 +25073,7 @@ several steps: @item Follow the guidelines in @ifinfo -@ref{Adding Code, ,Adding New Features}, +@ref{Adding Code}, @end ifinfo @ifnotinfo the previous @value{SECTION} @@ -24801,7 +25091,7 @@ If the changes needed for a particular system affect too much of the code, I probably will not accept them. In such a case, you can, of course, distribute your changes on your own, as long as you comply with the GPL -(@pxref{Copying, ,GNU General Public License}). +(@pxref{Copying}). 
@item A number of the files that come with @command{gawk} are maintained by other @@ -24871,7 +25161,7 @@ operating systems' code that is already there. In the code that you supply and maintain, feel free to use a coding style and brace layout that suits your taste. -@node Dynamic Extensions, Future Extensions, Additions, Notes +@node Dynamic Extensions @appendixsec Adding New Built-in Functions to @command{gawk} @cindex Robinson, Will @cindex robot, the @@ -24907,7 +25197,7 @@ upon the next release. * Sample Library:: A example of new functions. @end menu -@node Internals, Sample Library, Dynamic Extensions, Dynamic Extensions +@node Internals @appendixsubsec A Minimal Introduction to @command{gawk} Internals @c STARTOFRANGE gawint @cindex @command{gawk}, internals @@ -25085,7 +25375,9 @@ It is provided as a convenience. An argument that is supposed to be an array needs to be handled with some extra code, in case the array being passed in is actually from a function parameter. -The following boilerplate code shows how to do this: + +In versions of @command{gawk} up to and including 3.1.2, the +following boilerplate code shows how to do this: @smallexample NODE *the_arg; @@ -25109,11 +25401,27 @@ the_arg->type = Node_var_array; assoc_clear(the_arg); @end smallexample +For versions 3.1.3 and later, the internals changed. In particular, +the interface was actually @emph{simplified} drastically. The +following boilerplate code now suffices: + +@smallexample +NODE *the_arg; + +the_arg = get_argument(tree, 2); /* assume need 3rd arg, 0-based */ + +/* force it to be an array: */ +the_arg = get_array(the_arg); + +/* if necessary, clear it: */ +assoc_clear(the_arg); +@end smallexample + Again, you should spend time studying the @command{gawk} internals; don't just blindly copy this code. 
@c ENDOFRANGE gawint -@node Sample Library, , Internals, Dynamic Extensions +@node Sample Library @appendixsubsec Directory and File Operation Built-ins @c comma is part of primary @c STARTOFRANGE chdirg @@ -25140,7 +25448,7 @@ external extension library. * Using Internal File Ops:: How to use an external extension. @end menu -@node Internal File Description, Internal File Ops, Sample Library, Sample Library +@node Internal File Description @appendixsubsubsec Using @code{chdir} and @code{stat} This @value{SECTION} shows how to use the new functions at the @command{awk} @@ -25220,7 +25528,7 @@ be a function of the file's size if the file has holes. The file's last access, modification, and inode update times, respectively. These are numeric timestamps, suitable for formatting with @code{strftime} -(@pxref{Built-in, ,Built-in Functions}). +(@pxref{Built-in}). @item "pmode" The file's ``printable mode.'' This is a string representation of @@ -25263,7 +25571,7 @@ The file is a symbolic link. Several additional elements may be present depending upon the operating system and the type of the file. You can test for them in your @command{awk} program by using the @code{in} operator -(@pxref{Reference to Elements, ,Referring to an Array Element}): +(@pxref{Reference to Elements}): @table @code @item "blksize" @@ -25282,7 +25590,7 @@ represent the numeric device number and the major and minor components of that number, respectively. @end table -@node Internal File Ops, Using Internal File Ops, Internal File Description, Sample Library +@node Internal File Ops @appendixsubsubsec C Code for @code{chdir} and @code{stat} Here is the C code for these extensions. They were written for @@ -25481,7 +25789,7 @@ void *dl; And that's it! As an exercise, consider adding functions to implement system calls such as @code{chown}, @code{chmod}, and @code{umask}. 
-@node Using Internal File Ops, , Internal File Ops, Sample Library +@node Using Internal File Ops @appendixsubsubsec Integrating the Extensions @c last comma is part of secondary @@ -25529,7 +25837,7 @@ BEGIN @{ Here are the results of running the program: @example -$ gawk -f testff.awk +$ gawk -f testff.awk @print{} Info for testff.awk @print{} ret = 0 @print{} data["blksize"] = 4096 @@ -25557,7 +25865,7 @@ $ gawk -f testff.awk @c ENDOFRANGE adfugaw @c ENDOFRANGE fubadgaw -@node Future Extensions, , Dynamic Extensions, Notes +@node Future Extensions @appendixsec Probable Future Extensions @ignore From emory!scalpel.netlabs.com!lwall Tue Oct 31 12:43:17 1995 @@ -25654,7 +25962,7 @@ source code easier to work with: @table @asis @item Loadable module mechanics The current extension mechanism works -(@pxref{Dynamic Extensions, ,Adding New Built-in Functions to @command{gawk}}), +(@pxref{Dynamic Extensions}), but is rather primitive. It requires a fair amount of manual work to create and integrate a loadable module. Nor is the current mechanism as portable as might be desired. @@ -25707,12 +26015,12 @@ now. Finally, the programs in the test suite could use documenting in this @value{DOCUMENT}. -@xref{Additions, ,Making Additions to @command{gawk}}, +@xref{Additions}, if you are interested in tackling any of these projects. @c ENDOFRANGE impis @c ENDOFRANGE gawii -@node Basic Concepts, Glossary, Notes, Top +@node Basic Concepts @appendix Basic Programming Concepts @cindex programming, concepts @c STARTOFRANGE procon @@ -25732,7 +26040,7 @@ other introductory texts that you should refer to instead.) * Floating Point Issues:: Stuff to know about floating-point numbers. 
@end menu -@node Basic High Level, Basic Data Typing, Basic Concepts, Basic Concepts +@node Basic High Level @appendixsec What a Program Does @cindex processing data @@ -25929,7 +26237,7 @@ These are the things you do before actually starting to process data, such as checking arguments, initializing any data you need to work with, and so on. This step corresponds to @command{awk}'s @code{BEGIN} rule -(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}). +(@pxref{BEGIN/END}). If you were baking a cake, this might consist of laying out all the mixing bowls and the baking pan, and making sure you have all the @@ -25942,7 +26250,7 @@ one logical chunk at a time, and processes it as appropriate. In most programming languages, you have to manually manage the reading of data, checking to see if there is more each time you read a chunk. @command{awk}'s pattern-action paradigm -(@pxref{Getting Started, ,Getting Started with @command{awk}}) +(@pxref{Getting Started}) handles the mechanics of this for you. In baking a cake, the processing corresponds to the actual labor: @@ -25953,7 +26261,7 @@ into the oven. Once you've processed all the data, you may have things you need to do before exiting. This step corresponds to @command{awk}'s @code{END} rule -(@pxref{BEGIN/END, ,The @code{BEGIN} and @code{END} Special Patterns}). +(@pxref{BEGIN/END}). After the cake comes out of the oven, you still have to wrap it in plastic wrap to keep anyone from tasting it, as well as wash @@ -25992,7 +26300,7 @@ when those patterns are seen. This @dfn{data-driven} nature of @command{awk} programs usually makes them both easier to write and easier to read. -@node Basic Data Typing, Floating Point Issues, Basic High Level, Basic Concepts +@node Basic Data Typing @appendixsec Data Values in a Computer @cindex variables @@ -26015,7 +26323,7 @@ String values are essentially anything that's not a number, such as a name. 
Strings are sometimes referred to as @dfn{character data}, since they store the individual characters that comprise them. Individual variables, as well as numeric and string variables, are -referred to as @dfn{scalar} values. +referred to as @dfn{scalar} values. Groups of values, such as arrays, are not scalars. @cindex integers @@ -26049,7 +26357,7 @@ exactly. can hold more digits than @dfn{single-precision} floating-point numbers. Floating-point issues are discussed more fully in -@ref{Floating Point Issues, ,Floating-Point Number Caveats}. +@ref{Floating Point Issues}. At the very lowest level, computers store values as groups of binary digits, or @dfn{bits}. Modern computers group bits into groups of eight, called @dfn{bytes}. @@ -26076,7 +26384,7 @@ its right. Each column may contain either a 0 or a 1. Thus, binary 1010 represents 1 times 8, plus 0 times 4, plus 1 times 2, plus 0 times 1, or decimal 10. Octal and hexadecimal are discussed more in -@ref{Nondecimal-numbers, ,Octal and Hexadecimal Numbers}. +@ref{Nondecimal-numbers}. Programs are written in programming languages. Hundreds, if not thousands, of programming languages exist. @@ -26100,7 +26408,7 @@ In 1999, a revised ISO C standard was approved and released. Future versions of @command{gawk} will be as compatible as possible with this standard. -@node Floating Point Issues, , Basic Data Typing, Basic Concepts +@node Floating Point Issues @appendixsec Floating-Point Number Caveats As mentioned earlier, floating-point numbers represent what are called @@ -26121,7 +26429,7 @@ Internally, @command{awk} keeps both the numeric value (double-precision floating-point) and the string value for a variable. Separately, @command{awk} keeps track of what type the variable has -(@pxref{Typing and Comparison, ,Variable Typing and Comparison Expressions}), +(@pxref{Typing and Comparison}), which plays a role in how variables are used in comparisons. 
It is important to note that the string value for a number may not @@ -26212,7 +26520,7 @@ when stored internally, but that they are in fact equal to each other, as well as to ``regular'' zero: @smallexample -$ gawk 'BEGIN @{ mz = -0 ; pz = 0 +$ gawk 'BEGIN @{ mz = -0 ; pz = 0 > printf "-0 = %g, +0 = %g, (-0 == +0) -> %d\n", mz, pz, mz == pz > printf "mz == 0 -> %d, pz == 0 -> %d\n", mz == 0, pz == 0 > @}' @@ -26225,7 +26533,7 @@ that contains negative zero values; the fact that the zero is negative is noted and can affect comparisons. @c ENDOFRANGE procon -@node Glossary, Copying, Basic Concepts, Top +@node Glossary @unnumbered Glossary @table @asis @@ -26233,7 +26541,7 @@ is noted and can affect comparisons. A series of @command{awk} statements attached to a rule. If the rule's pattern matches an input record, @command{awk} executes the rule's action. Actions are always enclosed in curly braces. -(@xref{Action Overview, ,Actions}.) +(@xref{Action Overview}.) @cindex Spencer, Henry @cindex @command{sed} utility @@ -26280,7 +26588,7 @@ Useful for reasoning about how a program is supposed to behave. An @command{awk} expression that changes the value of some @command{awk} variable or data object. An object that you can assign to is called an @dfn{lvalue}. The assigned values are called @dfn{rvalues}. -@xref{Assignment Ops, ,Assignment Expressions}. +@xref{Assignment Ops}. @item Associative Array Arrays in which the indices may be numbers or strings, not just @@ -26321,7 +26629,7 @@ memory objects, or other data. @command{awk} lets you work with floating-point numbers and strings. @command{gawk} lets you manipulate bit values with the built-in functions described in -@ref{Bitwise Functions, ,Using @command{gawk}'s Bit Manipulation Functions}. +@ref{Bitwise Functions}. Computers are often defined by how many bits they use to represent integer values. 
Typical systems are 32-bit systems, but 64-bit systems are @@ -26344,7 +26652,7 @@ numerical, I/O-related, and string computations. Examples are substring of a string). @command{gawk} provides functions for timestamp management, bit manipulation, and runtime string translation. -(@xref{Built-in, ,Built-in Functions}.) +(@xref{Built-in}.) @item Built-in Variable @code{ARGC}, @@ -26431,13 +26739,13 @@ See also ``Interpreter.'' @item Compound Statement A series of @command{awk} statements, enclosed in curly braces. Compound statements may be nested. -(@xref{Statements, ,Control Statements in Actions}.) +(@xref{Statements}.) @item Concatenation Concatenating two strings means sticking them together, one after another, producing a new string. For example, the string @samp{foo} concatenated with the string @samp{bar} gives the string @samp{foobar}. -(@xref{Concatenation, ,String Concatenation}.) +(@xref{Concatenation}.) @item Conditional Expression An expression using the @samp{?:} ternary operator, such as @@ -26445,14 +26753,14 @@ An expression using the @samp{?:} ternary operator, such as @var{expr1} is evaluated; if the result is true, the value of the whole expression is the value of @var{expr2}; otherwise the value is @var{expr3}. In either case, only one of @var{expr2} and @var{expr3} -is evaluated. (@xref{Conditional Exp, ,Conditional Expressions}.) +is evaluated. (@xref{Conditional Exp}.) @item Comparison Expression A relation that is either true or false, such as @samp{(a < b)}. Comparison expressions are used in @code{if}, @code{while}, @code{do}, and @code{for} statements, and in patterns to select which input records to process. -(@xref{Typing and Comparison, ,Variable Typing and Comparison Expressions}.) +(@xref{Typing and Comparison}.) @item Curly Braces The characters @samp{@{} and @samp{@}}. Curly braces are used in @@ -26479,7 +26787,7 @@ are interested in processing, and what to do when that data is seen. 
@item Data Objects These are numbers and strings of characters. Numbers are converted into strings and vice versa, as needed. -(@xref{Conversion, ,Conversion of Strings and Numbers}.) +(@xref{Conversion}.) @item Deadlock The situation in which two communicating processes are each waiting @@ -26495,7 +26803,7 @@ numbers, but operations on them are sometimes more expensive. This is the way A dynamic regular expression is a regular expression written as an ordinary expression. It could be a string constant, such as @code{"foo"}, but it may also be an expression whose value can vary. -(@xref{Computed Regexps, , Using Dynamic Regexps}.) +(@xref{Computed Regexps}.) @item Environment A collection of strings, of the form @var{name@code{=}val}, that each @@ -26530,9 +26838,9 @@ separated by whitespace (or by a separator regexp that you can change by setting the built-in variable @code{FS}). Such pieces are called fields. If the pieces are of fixed length, you can use the built-in variable @code{FIELDWIDTHS} to describe their lengths. -(@xref{Field Separators, ,Specifying How Fields Are Separated}, +(@xref{Field Separators}, and -@ref{Constant Size, ,Reading Fixed-Width Data}.) +@ref{Constant Size}.) @item Flag A variable whose truth value indicates the existence or nonexistence @@ -26548,7 +26856,7 @@ Format strings are used to control the appearance of output in the @code{strftime} and @code{sprintf} functions, and are used in the @code{printf} statement as well. Also, data conversions from numbers to strings are controlled by the format string contained in the built-in variable -@code{CONVFMT}. (@xref{Control Letters, ,Format-Control Letters}.) +@code{CONVFMT}. (@xref{Control Letters}.) @item Free Documentation License This document describes the terms under which this @value{DOCUMENT} @@ -26580,7 +26888,7 @@ The GNU implementation of @command{awk}. 
@cindex GNU General Public License @item General Public License This document describes the terms under which @command{gawk} and its source -code may be distributed. (@xref{Copying, ,GNU General Public License}.) +code may be distributed. (@xref{Copying}.) @item GMT ``Greenwich Mean Time.'' @@ -26623,7 +26931,7 @@ out of a running program. @item Input Record A single chunk of data that is read in by @command{awk}. Usually, an @command{awk} input record consists of one line of text. -(@xref{Records, ,How Input Is Split into Records}.) +(@xref{Records}.) @item Integer A whole number, i.e., a number that does not have a fractional part. @@ -26743,7 +27051,7 @@ rules. A pattern is an arbitrary conditional expression against which input is tested. If the condition is satisfied, the pattern is said to @dfn{match} the input record. A typical pattern might compare the input record against -a regular expression. (@xref{Pattern Overview, ,Pattern Elements}.) +a regular expression. (@xref{Pattern Overview}.) @item POSIX The name for a series of standards @@ -26763,12 +27071,12 @@ without explicit parentheses. Variables and/or functions that are meant for use exclusively by library functions and not for the main @command{awk} program. Special care must be taken when naming such variables and functions. -(@xref{Library Names, , Naming Library Function Global Variables}.) +(@xref{Library Names}.) @item Range (of input lines) A sequence of consecutive lines from the input file(s). A pattern can specify ranges of input lines for @command{awk} to process or it can -specify single lines. (@xref{Pattern Overview, ,Pattern Elements}.) +specify single lines. (@xref{Pattern Overview}.) @item Recursion When a function calls itself, either directly or indirectly. @@ -26782,8 +27090,8 @@ You can redirect the output of the @code{print} and @code{printf} statements to a file or a system command, using the @samp{>}, @samp{>>}, @samp{|}, and @samp{|&} operators. 
You can redirect input to the @code{getline} statement using the @samp{<}, @samp{|}, and @samp{|&} operators. -(@xref{Redirection, ,Redirecting Output of @code{print} and @code{printf}}, -and @ref{Getline, ,Explicit Input with @code{getline}}.) +(@xref{Redirection}, +and @ref{Getline}.) @item Regexp Short for @dfn{regular expression}. A regexp is a pattern that denotes a @@ -26791,7 +27099,7 @@ set of strings, possibly an infinite set. For example, the regexp @samp{R.*xp} matches any string starting with the letter @samp{R} and ending with the letters @samp{xp}. In @command{awk}, regexps are used in patterns and in conditional expressions. Regexps may contain -escape sequences. (@xref{Regexp, ,Regular Expressions}.) +escape sequences. (@xref{Regexp}.) @item Regular Expression See ``regexp.'' @@ -26800,7 +27108,7 @@ See ``regexp.'' A regular expression constant is a regular expression written within slashes, such as @code{/foo/}. This regular expression is chosen when you write the @command{awk} program and cannot be changed during -its execution. (@xref{Regexp Usage, ,How to Use Regular Expressions}.) +its execution. (@xref{Regexp Usage}.) @item Rule A segment of an @command{awk} program that specifies how to process single @@ -26838,13 +27146,13 @@ The nature of the @command{awk} logical operators @samp{&&} and @samp{||}. If the value of the entire expression is determinable from evaluating just the lefthand side of these operators, the righthand side is not evaluated. -(@xref{Boolean Ops, ,Boolean Expressions}.) +(@xref{Boolean Ops}.) @item Side Effect A side effect occurs when an expression has an effect aside from merely producing a value. Assignment expressions, increment and decrement expressions, and function calls have side effects. -(@xref{Assignment Ops, ,Assignment Expressions}.) +(@xref{Assignment Ops}.) @item Single-Precision An internal representation of numbers that can have fractional parts. 
@@ -26859,7 +27167,7 @@ The character generated by hitting the space bar on the keyboard. @item Special File A @value{FN} interpreted internally by @command{gawk}, instead of being handed directly to the underlying operating system---for example, @file{/dev/stderr}. -(@xref{Special Files, ,Special @value{FFN}s in @command{gawk}}.) +(@xref{Special Files}.) @item Stream Editor A program that reads records from an input stream and processes them one @@ -26915,7 +27223,7 @@ A sequence of space, TAB, or newline characters occurring inside an input record or a string. @end table -@node Copying, GNU Free Documentation License, Glossary, Top +@node Copying @unnumbered GNU General Public License @center Version 2, June 1991 @@ -27770,7 +28078,7 @@ to permit their use in free software. @c End: -@node Index, , GNU Free Documentation License, Top +@node Index @unnumbered Index @printindex cp