author Arnold D. Robbins <arnold@skeeve.com> 2020-04-12 20:33:06 +0300
committer Arnold D. Robbins <arnold@skeeve.com> 2020-04-12 20:33:06 +0300
commit 96656f01af15915c943865c8705bc7fc4a9ab436 (patch)
tree 3a10d0864d7b92c145ddd863e0a23bc464e6f217
parent b60b2e33b6fa1727050c1e97662e1cf79ef1652b (diff)
download gawk-96656f01af15915c943865c8705bc7fc4a9ab436.tar.gz
More stuff on CSV files.
-rw-r--r-- doc/ChangeLog 5
-rw-r--r-- doc/gawk.info 1267
-rw-r--r-- doc/gawk.texi 84
-rw-r--r-- doc/gawktexi.in 84
4 files changed, 822 insertions, 618 deletions
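
The documentation added by this commit proposes the 'FPAT' value `([^,]*)|("([^"]|"")+")` for CSV fields that may contain both commas and doubled quotes. The following is a minimal sketch in Python for trying that pattern outside 'gawk'; it is not part of the commit. The names `FIELD`, `split_csv_line`, and `unquote` are hypothetical. Two assumptions are worth stating: Python's `re` takes the first alternative that matches rather than the longest one, so the quoted alternative is listed first here; and the loop walks the record field by field, which only approximates how 'gawk' applies 'FPAT' and expects well-formed input.

```python
import re

# Third FPAT variant from the commit, with the quoted alternative first.
# Assumption: Python's re is first-match, not leftmost-longest like gawk,
# so the order of the alternatives is swapped relative to the FPAT string.
FIELD = re.compile(r'"(?:[^"]|"")+"|[^,]*')


def split_csv_line(line):
    """Split one CSV record into raw fields, keeping the quotes as FPAT does."""
    fields = []
    pos = 0
    while True:
        # Always succeeds, because [^,]* can match the empty string.
        m = FIELD.match(line, pos)
        fields.append(m.group(0))
        pos = m.end()
        # Stop at end of record, or at malformed input not followed by a comma.
        if pos >= len(line) or line[pos] != ',':
            break
        pos += 1  # skip the separating comma
    return fields


def unquote(field):
    """Strip enclosing quotes and collapse doubled quotes (hypothetical helper)."""
    if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
        return field[1:-1].replace('""', '"')
    return field


if __name__ == "__main__":
    for record in ['p,"q,r",s', 'p,"q""r",s', 'p,"q,""r",s', 'p,"",s', 'p,,s']:
        raw = split_csv_line(record)
        print(len(raw), raw, [unquote(f) for f in raw])
```

On the five sample records from the commit, each line should split into three fields, matching the 'NF = 3' results shown in the documentation. The `unquote` step goes one step further than the commit's test program, which prints the fields with their quotes intact.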
diff --git a/doc/ChangeLog b/doc/ChangeLog
index 02dd55d5..0a18a812 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,8 @@
+2020-04-12 Arnold D. Robbins <arnold@skeeve.com>
+
+ * gawktexi.in: Contribution from Manuel Collado related
+ to CSV processing.
+
2020-04-10 Arnold D. Robbins <arnold@skeeve.com>
* gawktexi.in, gawk.1: Fix some spelling errors.
diff --git a/doc/gawk.info b/doc/gawk.info
index 36558c32..21acc797 100644
--- a/doc/gawk.info
+++ b/doc/gawk.info
@@ -204,6 +204,7 @@ in (a) below. A copy of the license is included in the section entitled
* Allowing trailing data:: Capturing optional trailing data.
* Fields with fixed data:: Field values with fixed-width data.
* Splitting By Content:: Defining Fields By Content
+* More CSV:: More on CSV files.
* Testing field creation:: Checking how 'gawk' is
splitting records.
* Multiple Line:: Reading multiline records.
@@ -5476,6 +5477,10 @@ File: gawk.info, Node: Splitting By Content, Next: Testing field creation, Pr
4.7 Defining Fields by Content
==============================
+* Menu:
+
+* More CSV:: More on CSV files.
+
This minor node discusses an advanced feature of 'gawk'. If you are a
novice 'awk' user, you might want to skip it on the first reading.
@@ -5501,8 +5506,9 @@ regular expression describes the contents of each field.
In the case of CSV data as presented here, each field is either
"anything that is not a comma," or "a double quote, anything that is not
-a double quote, and a closing double quote." If written as a regular
-expression constant (*note Regexp::), we would have
+a double quote, and a closing double quote." (There are more
+complicated definitions of CSV data, treated shortly.) If written as a
+regular expression constant (*note Regexp::), we would have
'/([^,]+)|("[^"]+")/'. Writing this as a string requires us to escape
the double quotes, leading to:
@@ -5543,12 +5549,6 @@ would be to remove the quotes when they occur, with something like this:
$i = substr($i, 2, len - 2) # Get text within the two quotes
}
- As with 'FS', the 'IGNORECASE' variable (*note User-modified::)
-affects field splitting with 'FPAT'.
-
- Assigning a value to 'FPAT' overrides field splitting with 'FS' and
-with 'FIELDWIDTHS'.
-
NOTE: Some programs export CSV data that contains embedded newlines
between the double quotes. 'gawk' provides no way to deal with
this. Even though a formal specification for CSV data exists,
@@ -5562,6 +5562,12 @@ contain at least one character. A straightforward modification
FPAT = "([^,]*)|(\"[^\"]+\")"
+ As with 'FS', the 'IGNORECASE' variable (*note User-modified::)
+affects field splitting with 'FPAT'.
+
+ Assigning a value to 'FPAT' overrides field splitting with 'FS' and
+with 'FIELDWIDTHS'.
+
Finally, the 'patsplit()' function makes the same functionality
available for splitting regular strings (*note String Functions::).
@@ -5572,6 +5578,58 @@ years. RFC 4180 (http://www.ietf.org/rfc/rfc4180.txt) standardizes the
most common practices.

+File: gawk.info, Node: More CSV, Up: Splitting By Content
+
+4.7.1 More on CSV Files
+-----------------------
+
+Manuel Collado notes that in addition to commas, a CSV field can also
+contain quotes, which have to be escaped by doubling them. The
+previously described regexps fail to accept quoted fields with both
+commas and quotes inside. He suggests that the simplest 'FPAT'
+expression that recognizes this kind of field is
+'/([^,]*)|("([^"]|"")+")/'. He provides the following input data to test
+these variants:
+
+ p,"q,r",s
+ p,"q""r",s
+ p,"q,""r",s
+ p,"",s
+ p,,s
+
+And here is his test program:
+
+ BEGIN {
+ fp[0] = "([^,]+)|(\"[^\"]+\")"
+ fp[1] = "([^,]*)|(\"[^\"]+\")"
+ fp[2] = "([^,]*)|(\"([^\"]|\"\")+\")"
+ FPAT = fp[fpat+0]
+ }
+
+ {
+ print "<" $0 ">"
+ printf("NF = %s ", NF)
+ for (i = 1; i <= NF; i++) {
+ printf("<%s>", $i)
+ }
+ print ""
+ }
+
+ When run on the third variant, it produces:
+
+ $ gawk -v fpat=2 -f test-csv.awk sample.csv
+ -| <p,"q,r",s>
+ -| NF = 3 <p><"q,r"><s>
+ -| <p,"q""r",s>
+ -| NF = 3 <p><"q""r"><s>
+ -| <p,"q,""r",s>
+ -| NF = 3 <p><"q,""r"><s>
+ -| <p,"",s>
+ -| NF = 3 <p><""><s>
+ -| <p,,s>
+ -| NF = 3 <p><><s>
+
+
File: gawk.info, Node: Testing field creation, Next: Multiple Line, Prev: Splitting By Content, Up: Reading Files
4.8 Checking How 'gawk' Is Splitting Records
@@ -34147,7 +34205,7 @@ Index
* adding, features to gawk: Adding Code. (line 6)
* advanced features, fixed-width data: Constant Size. (line 6)
* advanced features, specifying field content: Splitting By Content.
- (line 9)
+ (line 13)
* advanced features, gawk: Advanced Features. (line 6)
* advanced features, nondecimal input data: Nondecimal Data. (line 6)
* advanced features, processes, communicating with: Two-way I/O.
@@ -35455,7 +35513,7 @@ Index
* forward slash (/), /= operator <1>: Precedence. (line 94)
* forward slash (/), patterns and: Expression Patterns. (line 24)
* FPAT variable: Splitting By Content.
- (line 25)
+ (line 29)
* FPAT variable <1>: User-modified. (line 46)
* frame debugger command: Execution Stack. (line 27)
* Free Documentation License (FDL): GNU Free Documentation License.
@@ -35565,7 +35623,7 @@ Index
* gawk, RT variable in <1>: gawk split records. (line 58)
* gawk, FIELDWIDTHS variable in: Fixed width data. (line 17)
* gawk, FPAT variable in: Splitting By Content.
- (line 25)
+ (line 29)
* gawk, splitting fields and: Testing field creation.
(line 6)
* gawk, RT variable in <2>: Multiple Line. (line 139)
@@ -37254,599 +37312,600 @@ Index

Tag Table:
Node: Top1200
-Node: Foreword344344
-Node: Foreword448786
-Node: Preface50318
-Ref: Preface-Footnote-153177
-Ref: Preface-Footnote-253286
-Ref: Preface-Footnote-353520
-Node: History53662
-Node: Names56014
-Ref: Names-Footnote-157118
-Node: This Manual57265
-Ref: This Manual-Footnote-163904
-Node: Conventions64004
-Node: Manual History66373
-Ref: Manual History-Footnote-169370
-Ref: Manual History-Footnote-269411
-Node: How To Contribute69485
-Node: Acknowledgments70411
-Node: Getting Started75348
-Node: Running gawk77787
-Node: One-shot78977
-Node: Read Terminal80240
-Node: Long82233
-Node: Executable Scripts83746
-Ref: Executable Scripts-Footnote-186379
-Node: Comments86482
-Node: Quoting88966
-Node: DOS Quoting94492
-Node: Sample Data Files96548
-Node: Very Simple99143
-Node: Two Rules104045
-Node: More Complex105930
-Node: Statements/Lines108796
-Ref: Statements/Lines-Footnote-1113280
-Node: Other Features113545
-Node: When114481
-Ref: When-Footnote-1116235
-Node: Intro Summary116300
-Node: Invoking Gawk117184
-Node: Command Line118698
-Node: Options119496
-Ref: Options-Footnote-1137165
-Ref: Options-Footnote-2137396
-Node: Other Arguments137421
-Node: Naming Standard Input140728
-Node: Environment Variables141938
-Node: AWKPATH Variable142496
-Ref: AWKPATH Variable-Footnote-1145908
-Ref: AWKPATH Variable-Footnote-2145942
-Node: AWKLIBPATH Variable146313
-Ref: AWKLIBPATH Variable-Footnote-1148010
-Node: Other Environment Variables148385
-Node: Exit Status152206
-Node: Include Files152883
-Node: Loading Shared Libraries156573
-Node: Obsolete158001
-Node: Undocumented158693
-Node: Invoking Summary158990
-Node: Regexp161831
-Node: Regexp Usage163285
-Node: Escape Sequences165322
-Node: Regexp Operators171563
-Node: Regexp Operator Details172048
-Ref: Regexp Operator Details-Footnote-1178480
-Node: Interval Expressions178627
-Ref: Interval Expressions-Footnote-1180048
-Node: Bracket Expressions180146
-Ref: table-char-classes182622
-Node: Leftmost Longest185948
-Node: Computed Regexps187251
-Node: GNU Regexp Operators190678
-Node: Case-sensitivity194415
-Ref: Case-sensitivity-Footnote-1197281
-Ref: Case-sensitivity-Footnote-2197516
-Node: Regexp Summary197624
-Node: Reading Files199090
-Node: Records201359
-Node: awk split records202434
-Node: gawk split records207709
-Ref: gawk split records-Footnote-1212442
-Node: Fields212479
-Node: Nonconstant Fields215220
-Ref: Nonconstant Fields-Footnote-1217456
-Node: Changing Fields217660
-Node: Field Separators223691
-Node: Default Field Splitting226389
-Node: Regexp Field Splitting227507
-Node: Single Character Fields230860
-Node: Command Line Field Separator231920
-Node: Full Line Fields235138
-Ref: Full Line Fields-Footnote-1236660
-Ref: Full Line Fields-Footnote-2236706
-Node: Field Splitting Summary236807
-Node: Constant Size238881
-Node: Fixed width data239613
-Node: Skipping intervening243080
-Node: Allowing trailing data243878
-Node: Fields with fixed data244915
-Node: Splitting By Content246433
-Ref: Splitting By Content-Footnote-1250083
-Node: Testing field creation250246
-Node: Multiple Line251871
-Node: Getline258148
-Node: Plain Getline260617
-Node: Getline/Variable263190
-Node: Getline/File264341
-Node: Getline/Variable/File265729
-Ref: Getline/Variable/File-Footnote-1267334
-Node: Getline/Pipe267422
-Node: Getline/Variable/Pipe270126
-Node: Getline/Coprocess271261
-Node: Getline/Variable/Coprocess272528
-Node: Getline Notes273270
-Node: Getline Summary276067
-Ref: table-getline-variants276491
-Node: Read Timeout277239
-Ref: Read Timeout-Footnote-1281145
-Node: Retrying Input281203
-Node: Command-line directories282402
-Node: Input Summary283308
-Node: Input Exercises286480
-Node: Printing286914
-Node: Print288748
-Node: Print Examples290205
-Node: Output Separators292985
-Node: OFMT295002
-Node: Printf296358
-Node: Basic Printf297143
-Node: Control Letters298717
-Node: Format Modifiers303881
-Node: Printf Examples309896
-Node: Redirection312382
-Node: Special FD319223
-Ref: Special FD-Footnote-1322391
-Node: Special Files322465
-Node: Other Inherited Files323082
-Node: Special Network324083
-Node: Special Caveats324943
-Node: Close Files And Pipes325892
-Ref: table-close-pipe-return-values332799
-Ref: Close Files And Pipes-Footnote-1333612
-Ref: Close Files And Pipes-Footnote-2333760
-Node: Nonfatal333912
-Node: Output Summary336250
-Node: Output Exercises337472
-Node: Expressions338151
-Node: Values339339
-Node: Constants340017
-Node: Scalar Constants340708
-Ref: Scalar Constants-Footnote-1343232
-Node: Nondecimal-numbers343482
-Node: Regexp Constants346483
-Node: Using Constant Regexps347009
-Node: Standard Regexp Constants347631
-Node: Strong Regexp Constants350819
-Node: Variables353777
-Node: Using Variables354434
-Node: Assignment Options356344
-Node: Conversion358815
-Node: Strings And Numbers359339
-Ref: Strings And Numbers-Footnote-1362402
-Node: Locale influences conversions362511
-Ref: table-locale-affects365269
-Node: All Operators365887
-Node: Arithmetic Ops366516
-Node: Concatenation369022
-Ref: Concatenation-Footnote-1371869
-Node: Assignment Ops371976
-Ref: table-assign-ops376967
-Node: Increment Ops378280
-Node: Truth Values and Conditions381740
-Node: Truth Values382814
-Node: Typing and Comparison383862
-Node: Variable Typing384682
-Ref: Variable Typing-Footnote-1391145
-Ref: Variable Typing-Footnote-2391217
-Node: Comparison Operators391294
-Ref: table-relational-ops391713
-Node: POSIX String Comparison395208
-Ref: POSIX String Comparison-Footnote-1396903
-Ref: POSIX String Comparison-Footnote-2397042
-Node: Boolean Ops397126
-Ref: Boolean Ops-Footnote-1401608
-Node: Conditional Exp401700
-Node: Function Calls403436
-Node: Precedence407313
-Node: Locales410972
-Node: Expressions Summary412604
-Node: Patterns and Actions415177
-Node: Pattern Overview416297
-Node: Regexp Patterns417974
-Node: Expression Patterns418516
-Node: Ranges422297
-Node: BEGIN/END425405
-Node: Using BEGIN/END426166
-Ref: Using BEGIN/END-Footnote-1428902
-Node: I/O And BEGIN/END429008
-Node: BEGINFILE/ENDFILE431322
-Node: Empty434235
-Node: Using Shell Variables434552
-Node: Action Overview436826
-Node: Statements439151
-Node: If Statement440999
-Node: While Statement442494
-Node: Do Statement444522
-Node: For Statement445670
-Node: Switch Statement448841
-Node: Break Statement451227
-Node: Continue Statement453319
-Node: Next Statement455146
-Node: Nextfile Statement457529
-Node: Exit Statement460181
-Node: Built-in Variables462584
-Node: User-modified463717
-Node: Auto-set471484
-Ref: Auto-set-Footnote-1488291
-Ref: Auto-set-Footnote-2488497
-Node: ARGC and ARGV488553
-Node: Pattern Action Summary492766
-Node: Arrays495196
-Node: Array Basics496525
-Node: Array Intro497369
-Ref: figure-array-elements499344
-Ref: Array Intro-Footnote-1502048
-Node: Reference to Elements502176
-Node: Assigning Elements504640
-Node: Array Example505131
-Node: Scanning an Array506890
-Node: Controlling Scanning509912
-Ref: Controlling Scanning-Footnote-1516368
-Node: Numeric Array Subscripts516684
-Node: Uninitialized Subscripts518868
-Node: Delete520487
-Ref: Delete-Footnote-1523239
-Node: Multidimensional523296
-Node: Multiscanning526391
-Node: Arrays of Arrays527982
-Node: Arrays Summary532750
-Node: Functions534843
-Node: Built-in535881
-Node: Calling Built-in536962
-Node: Numeric Functions538958
-Ref: Numeric Functions-Footnote-1542986
-Ref: Numeric Functions-Footnote-2543634
-Ref: Numeric Functions-Footnote-3543682
-Node: String Functions543954
-Ref: String Functions-Footnote-1568138
-Ref: String Functions-Footnote-2568266
-Ref: String Functions-Footnote-3568514
-Node: Gory Details568601
-Ref: table-sub-escapes570392
-Ref: table-sub-proposed571911
-Ref: table-posix-sub573274
-Ref: table-gensub-escapes574815
-Ref: Gory Details-Footnote-1575638
-Node: I/O Functions575792
-Ref: table-system-return-values582260
-Ref: I/O Functions-Footnote-1584340
-Ref: I/O Functions-Footnote-2584488
-Node: Time Functions584608
-Ref: Time Functions-Footnote-1595279
-Ref: Time Functions-Footnote-2595347
-Ref: Time Functions-Footnote-3595505
-Ref: Time Functions-Footnote-4595616
-Ref: Time Functions-Footnote-5595728
-Ref: Time Functions-Footnote-6595955
-Node: Bitwise Functions596221
-Ref: table-bitwise-ops596815
-Ref: Bitwise Functions-Footnote-1602878
-Ref: Bitwise Functions-Footnote-2603051
-Node: Type Functions603242
-Node: I18N Functions606105
-Node: User-defined607756
-Node: Definition Syntax608568
-Ref: Definition Syntax-Footnote-1614255
-Node: Function Example614326
-Ref: Function Example-Footnote-1617248
-Node: Function Calling617270
-Node: Calling A Function617858
-Node: Variable Scope618816
-Node: Pass By Value/Reference621810
-Node: Function Caveats624454
-Ref: Function Caveats-Footnote-1626501
-Node: Return Statement626621
-Node: Dynamic Typing629600
-Node: Indirect Calls630530
-Ref: Indirect Calls-Footnote-1640782
-Node: Functions Summary640910
-Node: Library Functions643615
-Ref: Library Functions-Footnote-1647222
-Ref: Library Functions-Footnote-2647365
-Node: Library Names647536
-Ref: Library Names-Footnote-1651203
-Ref: Library Names-Footnote-2651426
-Node: General Functions651512
-Node: Strtonum Function652615
-Node: Assert Function655637
-Node: Round Function658963
-Node: Cliff Random Function660503
-Node: Ordinal Functions661519
-Ref: Ordinal Functions-Footnote-1664582
-Ref: Ordinal Functions-Footnote-2664834
-Node: Join Function665044
-Ref: Join Function-Footnote-1666814
-Node: Getlocaltime Function667014
-Node: Readfile Function670756
-Node: Shell Quoting672733
-Node: Data File Management674134
-Node: Filetrans Function674766
-Node: Rewind Function678862
-Node: File Checking680771
-Ref: File Checking-Footnote-1682105
-Node: Empty Files682306
-Node: Ignoring Assigns684285
-Node: Getopt Function685835
-Ref: Getopt Function-Footnote-1701047
-Node: Passwd Functions701247
-Ref: Passwd Functions-Footnote-1710086
-Node: Group Functions710174
-Ref: Group Functions-Footnote-1718072
-Node: Walking Arrays718279
-Node: Library Functions Summary721287
-Node: Library Exercises722693
-Node: Sample Programs723158
-Node: Running Examples723928
-Node: Clones724656
-Node: Cut Program725880
-Node: Egrep Program735809
-Ref: Egrep Program-Footnote-1743321
-Node: Id Program743431
-Node: Split Program747111
-Ref: Split Program-Footnote-1750569
-Node: Tee Program750698
-Node: Uniq Program753488
-Node: Wc Program761109
-Ref: Wc Program-Footnote-1765364
-Node: Miscellaneous Programs765458
-Node: Dupword Program766671
-Node: Alarm Program768701
-Node: Translate Program773556
-Ref: Translate Program-Footnote-1778121
-Node: Labels Program778391
-Ref: Labels Program-Footnote-1781742
-Node: Word Sorting781826
-Node: History Sorting785898
-Node: Extract Program788123
-Node: Simple Sed796177
-Node: Igawk Program799251
-Ref: Igawk Program-Footnote-1813582
-Ref: Igawk Program-Footnote-2813784
-Ref: Igawk Program-Footnote-3813906
-Node: Anagram Program814021
-Node: Signature Program817083
-Node: Programs Summary818330
-Node: Programs Exercises819544
-Ref: Programs Exercises-Footnote-1823673
-Node: Advanced Features823764
-Node: Nondecimal Data825754
-Node: Array Sorting827345
-Node: Controlling Array Traversal828045
-Ref: Controlling Array Traversal-Footnote-1836413
-Node: Array Sorting Functions836531
-Ref: Array Sorting Functions-Footnote-1841622
-Node: Two-way I/O841818
-Ref: Two-way I/O-Footnote-1849539
-Ref: Two-way I/O-Footnote-2849726
-Node: TCP/IP Networking849808
-Node: Profiling852926
-Node: Advanced Features Summary861941
-Node: Internationalization863785
-Node: I18N and L10N865265
-Node: Explaining gettext865952
-Ref: Explaining gettext-Footnote-1871844
-Ref: Explaining gettext-Footnote-2872029
-Node: Programmer i18n872194
-Ref: Programmer i18n-Footnote-1877143
-Node: Translator i18n877192
-Node: String Extraction877986
-Ref: String Extraction-Footnote-1879118
-Node: Printf Ordering879204
-Ref: Printf Ordering-Footnote-1881990
-Node: I18N Portability882054
-Ref: I18N Portability-Footnote-1884510
-Node: I18N Example884573
-Ref: I18N Example-Footnote-1887848
-Ref: I18N Example-Footnote-2887921
-Node: Gawk I18N888030
-Node: I18N Summary888679
-Node: Debugger890020
-Node: Debugging891020
-Node: Debugging Concepts891461
-Node: Debugging Terms893270
-Node: Awk Debugging895845
-Ref: Awk Debugging-Footnote-1896790
-Node: Sample Debugging Session896922
-Node: Debugger Invocation897456
-Node: Finding The Bug898842
-Node: List of Debugger Commands905316
-Node: Breakpoint Control906649
-Node: Debugger Execution Control910343
-Node: Viewing And Changing Data913705
-Node: Execution Stack917246
-Node: Debugger Info918883
-Node: Miscellaneous Debugger Commands922954
-Node: Readline Support928016
-Node: Limitations928912
-Node: Debugging Summary931466
-Node: Namespaces932745
-Node: Global Namespace933856
-Node: Qualified Names935254
-Node: Default Namespace936253
-Node: Changing The Namespace936994
-Node: Naming Rules938608
-Node: Internal Name Management940456
-Node: Namespace Example941498
-Node: Namespace And Features944060
-Node: Namespace Summary945495
-Node: Arbitrary Precision Arithmetic946972
-Node: Computer Arithmetic948459
-Ref: table-numeric-ranges952225
-Ref: table-floating-point-ranges952718
-Ref: Computer Arithmetic-Footnote-1953376
-Node: Math Definitions953433
-Ref: table-ieee-formats956749
-Ref: Math Definitions-Footnote-1957352
-Node: MPFR features957457
-Node: FP Math Caution959175
-Ref: FP Math Caution-Footnote-1960247
-Node: Inexactness of computations960616
-Node: Inexact representation961576
-Node: Comparing FP Values962936
-Node: Errors accumulate964177
-Node: Getting Accuracy965610
-Node: Try To Round968320
-Node: Setting precision969219
-Ref: table-predefined-precision-strings969916
-Node: Setting the rounding mode971746
-Ref: table-gawk-rounding-modes972120
-Ref: Setting the rounding mode-Footnote-1976051
-Node: Arbitrary Precision Integers976230
-Ref: Arbitrary Precision Integers-Footnote-1979405
-Node: Checking for MPFR979554
-Node: POSIX Floating Point Problems981028
-Ref: POSIX Floating Point Problems-Footnote-1985313
-Node: Floating point summary985351
-Node: Dynamic Extensions987541
-Node: Extension Intro989094
-Node: Plugin License990360
-Node: Extension Mechanism Outline991157
-Ref: figure-load-extension991596
-Ref: figure-register-new-function993161
-Ref: figure-call-new-function994253
-Node: Extension API Description996315
-Node: Extension API Functions Introduction997957
-Ref: table-api-std-headers999793
-Node: General Data Types1003658
-Ref: General Data Types-Footnote-11012019
-Node: Memory Allocation Functions1012318
-Ref: Memory Allocation Functions-Footnote-11016528
-Node: Constructor Functions1016627
-Node: Registration Functions1020213
-Node: Extension Functions1020898
-Node: Exit Callback Functions1026220
-Node: Extension Version String1027470
-Node: Input Parsers1028133
-Node: Output Wrappers1040854
-Node: Two-way processors1045366
-Node: Printing Messages1047631
-Ref: Printing Messages-Footnote-11048802
-Node: Updating ERRNO1048955
-Node: Requesting Values1049694
-Ref: table-value-types-returned1050431
-Node: Accessing Parameters1051367
-Node: Symbol Table Access1052602
-Node: Symbol table by name1053114
-Ref: Symbol table by name-Footnote-11056138
-Node: Symbol table by cookie1056266
-Ref: Symbol table by cookie-Footnote-11060451
-Node: Cached values1060515
-Ref: Cached values-Footnote-11064051
-Node: Array Manipulation1064204
-Ref: Array Manipulation-Footnote-11065295
-Node: Array Data Types1065332
-Ref: Array Data Types-Footnote-11067990
-Node: Array Functions1068082
-Node: Flattening Arrays1072580
-Node: Creating Arrays1079556
-Node: Redirection API1084323
-Node: Extension API Variables1087156
-Node: Extension Versioning1087867
-Ref: gawk-api-version1088296
-Node: Extension GMP/MPFR Versioning1090027
-Node: Extension API Informational Variables1091655
-Node: Extension API Boilerplate1092728
-Node: Changes from API V11096702
-Node: Finding Extensions1098274
-Node: Extension Example1098833
-Node: Internal File Description1099631
-Node: Internal File Ops1103711
-Ref: Internal File Ops-Footnote-11115061
-Node: Using Internal File Ops1115201
-Ref: Using Internal File Ops-Footnote-11117584
-Node: Extension Samples1117858
-Node: Extension Sample File Functions1119387
-Node: Extension Sample Fnmatch1127036
-Node: Extension Sample Fork1128523
-Node: Extension Sample Inplace1129741
-Node: Extension Sample Ord1133366
-Node: Extension Sample Readdir1134202
-Ref: table-readdir-file-types1135091
-Node: Extension Sample Revout1136158
-Node: Extension Sample Rev2way1136747
-Node: Extension Sample Read write array1137487
-Node: Extension Sample Readfile1139429
-Node: Extension Sample Time1140524
-Node: Extension Sample API Tests1142276
-Node: gawkextlib1142768
-Node: Extension summary1145686
-Node: Extension Exercises1149388
-Node: Language History1150630
-Node: V7/SVR3.11152286
-Node: SVR41154438
-Node: POSIX1155872
-Node: BTL1157253
-Node: POSIX/GNU1157982
-Node: Feature History1163760
-Node: Common Extensions1180079
-Node: Ranges and Locales1181362
-Ref: Ranges and Locales-Footnote-11185978
-Ref: Ranges and Locales-Footnote-21186005
-Ref: Ranges and Locales-Footnote-31186240
-Node: Contributors1186463
-Node: History summary1192460
-Node: Installation1193840
-Node: Gawk Distribution1194784
-Node: Getting1195268
-Node: Extracting1196231
-Node: Distribution contents1197869
-Node: Unix Installation1204349
-Node: Quick Installation1205031
-Node: Shell Startup Files1207445
-Node: Additional Configuration Options1208534
-Node: Configuration Philosophy1210849
-Node: Non-Unix Installation1213218
-Node: PC Installation1213678
-Node: PC Binary Installation1214516
-Node: PC Compiling1214951
-Node: PC Using1216068
-Node: Cygwin1219621
-Node: MSYS1220845
-Node: VMS Installation1221447
-Node: VMS Compilation1222238
-Ref: VMS Compilation-Footnote-11223467
-Node: VMS Dynamic Extensions1223525
-Node: VMS Installation Details1225210
-Node: VMS Running1227463
-Node: VMS GNV1231742
-Node: VMS Old Gawk1232477
-Node: Bugs1232948
-Node: Bug address1233611
-Node: Usenet1236593
-Node: Maintainers1237597
-Node: Other Versions1238782
-Node: Installation summary1245870
-Node: Notes1247079
-Node: Compatibility Mode1247873
-Node: Additions1248655
-Node: Accessing The Source1249580
-Node: Adding Code1251017
-Node: New Ports1257236
-Node: Derived Files1261611
-Ref: Derived Files-Footnote-11267271
-Ref: Derived Files-Footnote-21267306
-Ref: Derived Files-Footnote-31267904
-Node: Future Extensions1268018
-Node: Implementation Limitations1268676
-Node: Extension Design1269859
-Node: Old Extension Problems1271003
-Ref: Old Extension Problems-Footnote-11272521
-Node: Extension New Mechanism Goals1272578
-Ref: Extension New Mechanism Goals-Footnote-11275942
-Node: Extension Other Design Decisions1276131
-Node: Extension Future Growth1278244
-Node: Notes summary1278850
-Node: Basic Concepts1280008
-Node: Basic High Level1280689
-Ref: figure-general-flow1280971
-Ref: figure-process-flow1281656
-Ref: Basic High Level-Footnote-11284957
-Node: Basic Data Typing1285142
-Node: Glossary1288470
-Node: Copying1320355
-Node: GNU Free Documentation License1357898
-Node: Index1383018
+Node: Foreword344403
+Node: Foreword448845
+Node: Preface50377
+Ref: Preface-Footnote-153236
+Ref: Preface-Footnote-253345
+Ref: Preface-Footnote-353579
+Node: History53721
+Node: Names56073
+Ref: Names-Footnote-157177
+Node: This Manual57324
+Ref: This Manual-Footnote-163963
+Node: Conventions64063
+Node: Manual History66432
+Ref: Manual History-Footnote-169429
+Ref: Manual History-Footnote-269470
+Node: How To Contribute69544
+Node: Acknowledgments70470
+Node: Getting Started75407
+Node: Running gawk77846
+Node: One-shot79036
+Node: Read Terminal80299
+Node: Long82292
+Node: Executable Scripts83805
+Ref: Executable Scripts-Footnote-186438
+Node: Comments86541
+Node: Quoting89025
+Node: DOS Quoting94551
+Node: Sample Data Files96607
+Node: Very Simple99202
+Node: Two Rules104104
+Node: More Complex105989
+Node: Statements/Lines108855
+Ref: Statements/Lines-Footnote-1113339
+Node: Other Features113604
+Node: When114540
+Ref: When-Footnote-1116294
+Node: Intro Summary116359
+Node: Invoking Gawk117243
+Node: Command Line118757
+Node: Options119555
+Ref: Options-Footnote-1137224
+Ref: Options-Footnote-2137455
+Node: Other Arguments137480
+Node: Naming Standard Input140787
+Node: Environment Variables141997
+Node: AWKPATH Variable142555
+Ref: AWKPATH Variable-Footnote-1145967
+Ref: AWKPATH Variable-Footnote-2146001
+Node: AWKLIBPATH Variable146372
+Ref: AWKLIBPATH Variable-Footnote-1148069
+Node: Other Environment Variables148444
+Node: Exit Status152265
+Node: Include Files152942
+Node: Loading Shared Libraries156632
+Node: Obsolete158060
+Node: Undocumented158752
+Node: Invoking Summary159049
+Node: Regexp161890
+Node: Regexp Usage163344
+Node: Escape Sequences165381
+Node: Regexp Operators171622
+Node: Regexp Operator Details172107
+Ref: Regexp Operator Details-Footnote-1178539
+Node: Interval Expressions178686
+Ref: Interval Expressions-Footnote-1180107
+Node: Bracket Expressions180205
+Ref: table-char-classes182681
+Node: Leftmost Longest186007
+Node: Computed Regexps187310
+Node: GNU Regexp Operators190737
+Node: Case-sensitivity194474
+Ref: Case-sensitivity-Footnote-1197340
+Ref: Case-sensitivity-Footnote-2197575
+Node: Regexp Summary197683
+Node: Reading Files199149
+Node: Records201418
+Node: awk split records202493
+Node: gawk split records207768
+Ref: gawk split records-Footnote-1212501
+Node: Fields212538
+Node: Nonconstant Fields215279
+Ref: Nonconstant Fields-Footnote-1217515
+Node: Changing Fields217719
+Node: Field Separators223750
+Node: Default Field Splitting226448
+Node: Regexp Field Splitting227566
+Node: Single Character Fields230919
+Node: Command Line Field Separator231979
+Node: Full Line Fields235197
+Ref: Full Line Fields-Footnote-1236719
+Ref: Full Line Fields-Footnote-2236765
+Node: Field Splitting Summary236866
+Node: Constant Size238940
+Node: Fixed width data239672
+Node: Skipping intervening243139
+Node: Allowing trailing data243937
+Node: Fields with fixed data244974
+Node: Splitting By Content246492
+Ref: Splitting By Content-Footnote-1250275
+Node: More CSV250438
+Node: Testing field creation251746
+Node: Multiple Line253371
+Node: Getline259648
+Node: Plain Getline262117
+Node: Getline/Variable264690
+Node: Getline/File265841
+Node: Getline/Variable/File267229
+Ref: Getline/Variable/File-Footnote-1268834
+Node: Getline/Pipe268922
+Node: Getline/Variable/Pipe271626
+Node: Getline/Coprocess272761
+Node: Getline/Variable/Coprocess274028
+Node: Getline Notes274770
+Node: Getline Summary277567
+Ref: table-getline-variants277991
+Node: Read Timeout278739
+Ref: Read Timeout-Footnote-1282645
+Node: Retrying Input282703
+Node: Command-line directories283902
+Node: Input Summary284808
+Node: Input Exercises287980
+Node: Printing288414
+Node: Print290248
+Node: Print Examples291705
+Node: Output Separators294485
+Node: OFMT296502
+Node: Printf297858
+Node: Basic Printf298643
+Node: Control Letters300217
+Node: Format Modifiers305381
+Node: Printf Examples311396
+Node: Redirection313882
+Node: Special FD320723
+Ref: Special FD-Footnote-1323891
+Node: Special Files323965
+Node: Other Inherited Files324582
+Node: Special Network325583
+Node: Special Caveats326443
+Node: Close Files And Pipes327392
+Ref: table-close-pipe-return-values334299
+Ref: Close Files And Pipes-Footnote-1335112
+Ref: Close Files And Pipes-Footnote-2335260
+Node: Nonfatal335412
+Node: Output Summary337750
+Node: Output Exercises338972
+Node: Expressions339651
+Node: Values340839
+Node: Constants341517
+Node: Scalar Constants342208
+Ref: Scalar Constants-Footnote-1344732
+Node: Nondecimal-numbers344982
+Node: Regexp Constants347983
+Node: Using Constant Regexps348509
+Node: Standard Regexp Constants349131
+Node: Strong Regexp Constants352319
+Node: Variables355277
+Node: Using Variables355934
+Node: Assignment Options357844
+Node: Conversion360315
+Node: Strings And Numbers360839
+Ref: Strings And Numbers-Footnote-1363902
+Node: Locale influences conversions364011
+Ref: table-locale-affects366769
+Node: All Operators367387
+Node: Arithmetic Ops368016
+Node: Concatenation370522
+Ref: Concatenation-Footnote-1373369
+Node: Assignment Ops373476
+Ref: table-assign-ops378467
+Node: Increment Ops379780
+Node: Truth Values and Conditions383240
+Node: Truth Values384314
+Node: Typing and Comparison385362
+Node: Variable Typing386182
+Ref: Variable Typing-Footnote-1392645
+Ref: Variable Typing-Footnote-2392717
+Node: Comparison Operators392794
+Ref: table-relational-ops393213
+Node: POSIX String Comparison396708
+Ref: POSIX String Comparison-Footnote-1398403
+Ref: POSIX String Comparison-Footnote-2398542
+Node: Boolean Ops398626
+Ref: Boolean Ops-Footnote-1403108
+Node: Conditional Exp403200
+Node: Function Calls404936
+Node: Precedence408813
+Node: Locales412472
+Node: Expressions Summary414104
+Node: Patterns and Actions416677
+Node: Pattern Overview417797
+Node: Regexp Patterns419474
+Node: Expression Patterns420016
+Node: Ranges423797
+Node: BEGIN/END426905
+Node: Using BEGIN/END427666
+Ref: Using BEGIN/END-Footnote-1430402
+Node: I/O And BEGIN/END430508
+Node: BEGINFILE/ENDFILE432822
+Node: Empty435735
+Node: Using Shell Variables436052
+Node: Action Overview438326
+Node: Statements440651
+Node: If Statement442499
+Node: While Statement443994
+Node: Do Statement446022
+Node: For Statement447170
+Node: Switch Statement450341
+Node: Break Statement452727
+Node: Continue Statement454819
+Node: Next Statement456646
+Node: Nextfile Statement459029
+Node: Exit Statement461681
+Node: Built-in Variables464084
+Node: User-modified465217
+Node: Auto-set472984
+Ref: Auto-set-Footnote-1489791
+Ref: Auto-set-Footnote-2489997
+Node: ARGC and ARGV490053
+Node: Pattern Action Summary494266
+Node: Arrays496696
+Node: Array Basics498025
+Node: Array Intro498869
+Ref: figure-array-elements500844
+Ref: Array Intro-Footnote-1503548
+Node: Reference to Elements503676
+Node: Assigning Elements506140
+Node: Array Example506631
+Node: Scanning an Array508390
+Node: Controlling Scanning511412
+Ref: Controlling Scanning-Footnote-1517868
+Node: Numeric Array Subscripts518184
+Node: Uninitialized Subscripts520368
+Node: Delete521987
+Ref: Delete-Footnote-1524739
+Node: Multidimensional524796
+Node: Multiscanning527891
+Node: Arrays of Arrays529482
+Node: Arrays Summary534250
+Node: Functions536343
+Node: Built-in537381
+Node: Calling Built-in538462
+Node: Numeric Functions540458
+Ref: Numeric Functions-Footnote-1544486
+Ref: Numeric Functions-Footnote-2545134
+Ref: Numeric Functions-Footnote-3545182
+Node: String Functions545454
+Ref: String Functions-Footnote-1569638
+Ref: String Functions-Footnote-2569766
+Ref: String Functions-Footnote-3570014
+Node: Gory Details570101
+Ref: table-sub-escapes571892
+Ref: table-sub-proposed573411
+Ref: table-posix-sub574774
+Ref: table-gensub-escapes576315
+Ref: Gory Details-Footnote-1577138
+Node: I/O Functions577292
+Ref: table-system-return-values583760
+Ref: I/O Functions-Footnote-1585840
+Ref: I/O Functions-Footnote-2585988
+Node: Time Functions586108
+Ref: Time Functions-Footnote-1596779
+Ref: Time Functions-Footnote-2596847
+Ref: Time Functions-Footnote-3597005
+Ref: Time Functions-Footnote-4597116
+Ref: Time Functions-Footnote-5597228
+Ref: Time Functions-Footnote-6597455
+Node: Bitwise Functions597721
+Ref: table-bitwise-ops598315
+Ref: Bitwise Functions-Footnote-1604378
+Ref: Bitwise Functions-Footnote-2604551
+Node: Type Functions604742
+Node: I18N Functions607605
+Node: User-defined609256
+Node: Definition Syntax610068
+Ref: Definition Syntax-Footnote-1615755
+Node: Function Example615826
+Ref: Function Example-Footnote-1618748
+Node: Function Calling618770
+Node: Calling A Function619358
+Node: Variable Scope620316
+Node: Pass By Value/Reference623310
+Node: Function Caveats625954
+Ref: Function Caveats-Footnote-1628001
+Node: Return Statement628121
+Node: Dynamic Typing631100
+Node: Indirect Calls632030
+Ref: Indirect Calls-Footnote-1642282
+Node: Functions Summary642410
+Node: Library Functions645115
+Ref: Library Functions-Footnote-1648722
+Ref: Library Functions-Footnote-2648865
+Node: Library Names649036
+Ref: Library Names-Footnote-1652703
+Ref: Library Names-Footnote-2652926
+Node: General Functions653012
+Node: Strtonum Function654115
+Node: Assert Function657137
+Node: Round Function660463
+Node: Cliff Random Function662003
+Node: Ordinal Functions663019
+Ref: Ordinal Functions-Footnote-1666082
+Ref: Ordinal Functions-Footnote-2666334
+Node: Join Function666544
+Ref: Join Function-Footnote-1668314
+Node: Getlocaltime Function668514
+Node: Readfile Function672256
+Node: Shell Quoting674233
+Node: Data File Management675634
+Node: Filetrans Function676266
+Node: Rewind Function680362
+Node: File Checking682271
+Ref: File Checking-Footnote-1683605
+Node: Empty Files683806
+Node: Ignoring Assigns685785
+Node: Getopt Function687335
+Ref: Getopt Function-Footnote-1702547
+Node: Passwd Functions702747
+Ref: Passwd Functions-Footnote-1711586
+Node: Group Functions711674
+Ref: Group Functions-Footnote-1719572
+Node: Walking Arrays719779
+Node: Library Functions Summary722787
+Node: Library Exercises724193
+Node: Sample Programs724658
+Node: Running Examples725428
+Node: Clones726156
+Node: Cut Program727380
+Node: Egrep Program737309
+Ref: Egrep Program-Footnote-1744821
+Node: Id Program744931
+Node: Split Program748611
+Ref: Split Program-Footnote-1752069
+Node: Tee Program752198
+Node: Uniq Program754988
+Node: Wc Program762609
+Ref: Wc Program-Footnote-1766864
+Node: Miscellaneous Programs766958
+Node: Dupword Program768171
+Node: Alarm Program770201
+Node: Translate Program775056
+Ref: Translate Program-Footnote-1779621
+Node: Labels Program779891
+Ref: Labels Program-Footnote-1783242
+Node: Word Sorting783326
+Node: History Sorting787398
+Node: Extract Program789623
+Node: Simple Sed797677
+Node: Igawk Program800751
+Ref: Igawk Program-Footnote-1815082
+Ref: Igawk Program-Footnote-2815284
+Ref: Igawk Program-Footnote-3815406
+Node: Anagram Program815521
+Node: Signature Program818583
+Node: Programs Summary819830
+Node: Programs Exercises821044
+Ref: Programs Exercises-Footnote-1825173
+Node: Advanced Features825264
+Node: Nondecimal Data827254
+Node: Array Sorting828845
+Node: Controlling Array Traversal829545
+Ref: Controlling Array Traversal-Footnote-1837913
+Node: Array Sorting Functions838031
+Ref: Array Sorting Functions-Footnote-1843122
+Node: Two-way I/O843318
+Ref: Two-way I/O-Footnote-1851039
+Ref: Two-way I/O-Footnote-2851226
+Node: TCP/IP Networking851308
+Node: Profiling854426
+Node: Advanced Features Summary863441
+Node: Internationalization865285
+Node: I18N and L10N866765
+Node: Explaining gettext867452
+Ref: Explaining gettext-Footnote-1873344
+Ref: Explaining gettext-Footnote-2873529
+Node: Programmer i18n873694
+Ref: Programmer i18n-Footnote-1878643
+Node: Translator i18n878692
+Node: String Extraction879486
+Ref: String Extraction-Footnote-1880618
+Node: Printf Ordering880704
+Ref: Printf Ordering-Footnote-1883490
+Node: I18N Portability883554
+Ref: I18N Portability-Footnote-1886010
+Node: I18N Example886073
+Ref: I18N Example-Footnote-1889348
+Ref: I18N Example-Footnote-2889421
+Node: Gawk I18N889530
+Node: I18N Summary890179
+Node: Debugger891520
+Node: Debugging892520
+Node: Debugging Concepts892961
+Node: Debugging Terms894770
+Node: Awk Debugging897345
+Ref: Awk Debugging-Footnote-1898290
+Node: Sample Debugging Session898422
+Node: Debugger Invocation898956
+Node: Finding The Bug900342
+Node: List of Debugger Commands906816
+Node: Breakpoint Control908149
+Node: Debugger Execution Control911843
+Node: Viewing And Changing Data915205
+Node: Execution Stack918746
+Node: Debugger Info920383
+Node: Miscellaneous Debugger Commands924454
+Node: Readline Support929516
+Node: Limitations930412
+Node: Debugging Summary932966
+Node: Namespaces934245
+Node: Global Namespace935356
+Node: Qualified Names936754
+Node: Default Namespace937753
+Node: Changing The Namespace938494
+Node: Naming Rules940108
+Node: Internal Name Management941956
+Node: Namespace Example942998
+Node: Namespace And Features945560
+Node: Namespace Summary946995
+Node: Arbitrary Precision Arithmetic948472
+Node: Computer Arithmetic949959
+Ref: table-numeric-ranges953725
+Ref: table-floating-point-ranges954218
+Ref: Computer Arithmetic-Footnote-1954876
+Node: Math Definitions954933
+Ref: table-ieee-formats958249
+Ref: Math Definitions-Footnote-1958852
+Node: MPFR features958957
+Node: FP Math Caution960675
+Ref: FP Math Caution-Footnote-1961747
+Node: Inexactness of computations962116
+Node: Inexact representation963076
+Node: Comparing FP Values964436
+Node: Errors accumulate965677
+Node: Getting Accuracy967110
+Node: Try To Round969820
+Node: Setting precision970719
+Ref: table-predefined-precision-strings971416
+Node: Setting the rounding mode973246
+Ref: table-gawk-rounding-modes973620
+Ref: Setting the rounding mode-Footnote-1977551
+Node: Arbitrary Precision Integers977730
+Ref: Arbitrary Precision Integers-Footnote-1980905
+Node: Checking for MPFR981054
+Node: POSIX Floating Point Problems982528
+Ref: POSIX Floating Point Problems-Footnote-1986813
+Node: Floating point summary986851
+Node: Dynamic Extensions989041
+Node: Extension Intro990594
+Node: Plugin License991860
+Node: Extension Mechanism Outline992657
+Ref: figure-load-extension993096
+Ref: figure-register-new-function994661
+Ref: figure-call-new-function995753
+Node: Extension API Description997815
+Node: Extension API Functions Introduction999457
+Ref: table-api-std-headers1001293
+Node: General Data Types1005158
+Ref: General Data Types-Footnote-11013519
+Node: Memory Allocation Functions1013818
+Ref: Memory Allocation Functions-Footnote-11018028
+Node: Constructor Functions1018127
+Node: Registration Functions1021713
+Node: Extension Functions1022398
+Node: Exit Callback Functions1027720
+Node: Extension Version String1028970
+Node: Input Parsers1029633
+Node: Output Wrappers1042354
+Node: Two-way processors1046866
+Node: Printing Messages1049131
+Ref: Printing Messages-Footnote-11050302
+Node: Updating ERRNO1050455
+Node: Requesting Values1051194
+Ref: table-value-types-returned1051931
+Node: Accessing Parameters1052867
+Node: Symbol Table Access1054102
+Node: Symbol table by name1054614
+Ref: Symbol table by name-Footnote-11057638
+Node: Symbol table by cookie1057766
+Ref: Symbol table by cookie-Footnote-11061951
+Node: Cached values1062015
+Ref: Cached values-Footnote-11065551
+Node: Array Manipulation1065704
+Ref: Array Manipulation-Footnote-11066795
+Node: Array Data Types1066832
+Ref: Array Data Types-Footnote-11069490
+Node: Array Functions1069582
+Node: Flattening Arrays1074080
+Node: Creating Arrays1081056
+Node: Redirection API1085823
+Node: Extension API Variables1088656
+Node: Extension Versioning1089367
+Ref: gawk-api-version1089796
+Node: Extension GMP/MPFR Versioning1091527
+Node: Extension API Informational Variables1093155
+Node: Extension API Boilerplate1094228
+Node: Changes from API V11098202
+Node: Finding Extensions1099774
+Node: Extension Example1100333
+Node: Internal File Description1101131
+Node: Internal File Ops1105211
+Ref: Internal File Ops-Footnote-11116561
+Node: Using Internal File Ops1116701
+Ref: Using Internal File Ops-Footnote-11119084
+Node: Extension Samples1119358
+Node: Extension Sample File Functions1120887
+Node: Extension Sample Fnmatch1128536
+Node: Extension Sample Fork1130023
+Node: Extension Sample Inplace1131241
+Node: Extension Sample Ord1134866
+Node: Extension Sample Readdir1135702
+Ref: table-readdir-file-types1136591
+Node: Extension Sample Revout1137658
+Node: Extension Sample Rev2way1138247
+Node: Extension Sample Read write array1138987
+Node: Extension Sample Readfile1140929
+Node: Extension Sample Time1142024
+Node: Extension Sample API Tests1143776
+Node: gawkextlib1144268
+Node: Extension summary1147186
+Node: Extension Exercises1150888
+Node: Language History1152130
+Node: V7/SVR3.11153786
+Node: SVR41155938
+Node: POSIX1157372
+Node: BTL1158753
+Node: POSIX/GNU1159482
+Node: Feature History1165260
+Node: Common Extensions1181579
+Node: Ranges and Locales1182862
+Ref: Ranges and Locales-Footnote-11187478
+Ref: Ranges and Locales-Footnote-21187505
+Ref: Ranges and Locales-Footnote-31187740
+Node: Contributors1187963
+Node: History summary1193960
+Node: Installation1195340
+Node: Gawk Distribution1196284
+Node: Getting1196768
+Node: Extracting1197731
+Node: Distribution contents1199369
+Node: Unix Installation1205849
+Node: Quick Installation1206531
+Node: Shell Startup Files1208945
+Node: Additional Configuration Options1210034
+Node: Configuration Philosophy1212349
+Node: Non-Unix Installation1214718
+Node: PC Installation1215178
+Node: PC Binary Installation1216016
+Node: PC Compiling1216451
+Node: PC Using1217568
+Node: Cygwin1221121
+Node: MSYS1222345
+Node: VMS Installation1222947
+Node: VMS Compilation1223738
+Ref: VMS Compilation-Footnote-11224967
+Node: VMS Dynamic Extensions1225025
+Node: VMS Installation Details1226710
+Node: VMS Running1228963
+Node: VMS GNV1233242
+Node: VMS Old Gawk1233977
+Node: Bugs1234448
+Node: Bug address1235111
+Node: Usenet1238093
+Node: Maintainers1239097
+Node: Other Versions1240282
+Node: Installation summary1247370
+Node: Notes1248579
+Node: Compatibility Mode1249373
+Node: Additions1250155
+Node: Accessing The Source1251080
+Node: Adding Code1252517
+Node: New Ports1258736
+Node: Derived Files1263111
+Ref: Derived Files-Footnote-11268771
+Ref: Derived Files-Footnote-21268806
+Ref: Derived Files-Footnote-31269404
+Node: Future Extensions1269518
+Node: Implementation Limitations1270176
+Node: Extension Design1271359
+Node: Old Extension Problems1272503
+Ref: Old Extension Problems-Footnote-11274021
+Node: Extension New Mechanism Goals1274078
+Ref: Extension New Mechanism Goals-Footnote-11277442
+Node: Extension Other Design Decisions1277631
+Node: Extension Future Growth1279744
+Node: Notes summary1280350
+Node: Basic Concepts1281508
+Node: Basic High Level1282189
+Ref: figure-general-flow1282471
+Ref: figure-process-flow1283156
+Ref: Basic High Level-Footnote-11286457
+Node: Basic Data Typing1286642
+Node: Glossary1289970
+Node: Copying1321855
+Node: GNU Free Documentation License1359398
+Node: Index1384518

End Tag Table
diff --git a/doc/gawk.texi b/doc/gawk.texi
index ca99f017..cbc71886 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -577,6 +577,7 @@ particular records in a file and perform operations upon them.
* Allowing trailing data:: Capturing optional trailing data.
* Fields with fixed data:: Field values with fixed-width data.
* Splitting By Content:: Defining Fields By Content
+* More CSV:: More on CSV files.
* Testing field creation:: Checking how @command{gawk} is
splitting records.
* Multiple Line:: Reading multiline records.
@@ -8188,6 +8189,10 @@ four, and @code{$4} has the value @code{"ddd"}.
@node Splitting By Content
@section Defining Fields by Content
+@menu
+* More CSV:: More on CSV files.
+@end menu
+
@c O'Reilly doesn't like it as a note the first thing in the section.
This @value{SECTION} discusses an advanced
feature of @command{gawk}. If you are a novice @command{awk} user,
@@ -8227,7 +8232,9 @@ This regular expression describes the contents of each field.
In the case of CSV data as presented here, each field is either ``anything that
is not a comma,'' or ``a double quote, anything that is not a double quote, and a
-closing double quote.'' If written as a regular expression constant
+closing double quote.'' (There are more complicated definitions of CSV data,
+treated shortly.)
+If written as a regular expression constant
(@pxref{Regexp}),
we would have @code{/([^,]+)|("[^"]+")/}.
Writing this as a string requires us to escape the double quotes, leading to:
@@ -8283,12 +8290,6 @@ if (substr($i, 1, 1) == "\"") @{
@}
@end example
-As with @code{FS}, the @code{IGNORECASE} variable (@pxref{User-modified})
-affects field splitting with @code{FPAT}.
-
-Assigning a value to @code{FPAT} overrides field splitting
-with @code{FS} and with @code{FIELDWIDTHS}.
-
@quotation NOTE
Some programs export CSV data that contains embedded newlines between
the double quotes. @command{gawk} provides no way to deal with this.
@@ -8311,9 +8312,78 @@ FPAT = "([^,]*)|(\"[^\"]+\")"
@c (star in latter part of value) to allow quoted strings to be empty.
@c Per email from Ed Morton <mortoneccc@comcast.net>
+As with @code{FS}, the @code{IGNORECASE} variable (@pxref{User-modified})
+affects field splitting with @code{FPAT}.
+
+Assigning a value to @code{FPAT} overrides field splitting
+with @code{FS} and with @code{FIELDWIDTHS}.
+
Finally, the @code{patsplit()} function makes the same functionality
available for splitting regular strings (@pxref{String Functions}).
+@node More CSV
+@subsection More on CSV Files
+
+Manuel Collado notes that in addition to commas, a CSV field can also
+contain quotes, which have to be escaped by doubling them. The previously
+described regexps fail to accept quoted fields with both commas and
+quotes inside. He suggests that the simplest @code{FPAT} expression that
+recognizes this kind of field is @code{/([^,]*)|("([^"]|"")+")/}. He
+provides the following input data to test these variants:
+
+@example
+@c file eg/misc/sample.csv
+p,"q,r",s
+p,"q""r",s
+p,"q,""r",s
+p,"",s
+p,,s
+@c endfile
+@end example
+
+@noindent
+And here is his test program:
+
+@example
+@c file eg/misc/test-csv.awk
+@group
+BEGIN @{
+ fp[0] = "([^,]+)|(\"[^\"]+\")"
+ fp[1] = "([^,]*)|(\"[^\"]+\")"
+ fp[2] = "([^,]*)|(\"([^\"]|\"\")+\")"
+ FPAT = fp[fpat+0]
+@}
+@end group
+
+@group
+@{
+ print "<" $0 ">"
+ printf("NF = %s ", NF)
+ for (i = 1; i <= NF; i++) @{
+ printf("<%s>", $i)
+ @}
+ print ""
+@}
+@end group
+@c endfile
+@end example
+
+When run on the third variant, it produces:
+
+@example
+$ @kbd{gawk -v fpat=2 -f test-csv.awk sample.csv}
+@print{} <p,"q,r",s>
+@print{} NF = 3 <p><"q,r"><s>
+@print{} <p,"q""r",s>
+@print{} NF = 3 <p><"q""r"><s>
+@print{} <p,"q,""r",s>
+@print{} NF = 3 <p><"q,""r"><s>
+@print{} <p,"",s>
+@print{} NF = 3 <p><""><s>
+@print{} <p,,s>
+@print{} NF = 3 <p><><s>
+@end example
+
@node Testing field creation
@section Checking How @command{gawk} Is Splitting Records
diff --git a/doc/gawktexi.in b/doc/gawktexi.in
index 66cac326..940cd73d 100644
--- a/doc/gawktexi.in
+++ b/doc/gawktexi.in
@@ -572,6 +572,7 @@ particular records in a file and perform operations upon them.
* Allowing trailing data:: Capturing optional trailing data.
* Fields with fixed data:: Field values with fixed-width data.
* Splitting By Content:: Defining Fields By Content
+* More CSV:: More on CSV files.
* Testing field creation:: Checking how @command{gawk} is
splitting records.
* Multiple Line:: Reading multiline records.
@@ -7785,6 +7786,10 @@ four, and @code{$4} has the value @code{"ddd"}.
@node Splitting By Content
@section Defining Fields by Content
+@menu
+* More CSV:: More on CSV files.
+@end menu
+
@c O'Reilly doesn't like it as a note the first thing in the section.
This @value{SECTION} discusses an advanced
feature of @command{gawk}. If you are a novice @command{awk} user,
@@ -7824,7 +7829,9 @@ This regular expression describes the contents of each field.
In the case of CSV data as presented here, each field is either ``anything that
is not a comma,'' or ``a double quote, anything that is not a double quote, and a
-closing double quote.'' If written as a regular expression constant
+closing double quote.'' (There are more complicated definitions of CSV data,
+treated shortly.)
+If written as a regular expression constant
(@pxref{Regexp}),
we would have @code{/([^,]+)|("[^"]+")/}.
Writing this as a string requires us to escape the double quotes, leading to:
@@ -7880,12 +7887,6 @@ if (substr($i, 1, 1) == "\"") @{
@}
@end example
-As with @code{FS}, the @code{IGNORECASE} variable (@pxref{User-modified})
-affects field splitting with @code{FPAT}.
-
-Assigning a value to @code{FPAT} overrides field splitting
-with @code{FS} and with @code{FIELDWIDTHS}.
-
@quotation NOTE
Some programs export CSV data that contains embedded newlines between
the double quotes. @command{gawk} provides no way to deal with this.
@@ -7908,9 +7909,78 @@ FPAT = "([^,]*)|(\"[^\"]+\")"
@c (star in latter part of value) to allow quoted strings to be empty.
@c Per email from Ed Morton <mortoneccc@comcast.net>
+As with @code{FS}, the @code{IGNORECASE} variable (@pxref{User-modified})
+affects field splitting with @code{FPAT}.
+
+Assigning a value to @code{FPAT} overrides field splitting
+with @code{FS} and with @code{FIELDWIDTHS}.
+
Finally, the @code{patsplit()} function makes the same functionality
available for splitting regular strings (@pxref{String Functions}).
+@node More CSV
+@subsection More on CSV Files
+
+Manuel Collado notes that in addition to commas, a CSV field can also
+contain quotes, which have to be escaped by doubling them. The previously
+described regexps fail to accept quoted fields with both commas and
+quotes inside. He suggests that the simplest @code{FPAT} expression that
+recognizes this kind of field is @code{/([^,]*)|("([^"]|"")+")/}. He
+provides the following input data to test these variants:
+
+@example
+@c file eg/misc/sample.csv
+p,"q,r",s
+p,"q""r",s
+p,"q,""r",s
+p,"",s
+p,,s
+@c endfile
+@end example
+
+@noindent
+And here is his test program:
+
+@example
+@c file eg/misc/test-csv.awk
+@group
+BEGIN @{
+ fp[0] = "([^,]+)|(\"[^\"]+\")"
+ fp[1] = "([^,]*)|(\"[^\"]+\")"
+ fp[2] = "([^,]*)|(\"([^\"]|\"\")+\")"
+ FPAT = fp[fpat+0]
+@}
+@end group
+
+@group
+@{
+ print "<" $0 ">"
+ printf("NF = %s ", NF)
+ for (i = 1; i <= NF; i++) @{
+ printf("<%s>", $i)
+ @}
+ print ""
+@}
+@end group
+@c endfile
+@end example
+
+When run on the third variant, it produces:
+
+@example
+$ @kbd{gawk -v fpat=2 -f test-csv.awk sample.csv}
+@print{} <p,"q,r",s>
+@print{} NF = 3 <p><"q,r"><s>
+@print{} <p,"q""r",s>
+@print{} NF = 3 <p><"q""r"><s>
+@print{} <p,"q,""r",s>
+@print{} NF = 3 <p><"q,""r"><s>
+@print{} <p,"",s>
+@print{} NF = 3 <p><""><s>
+@print{} <p,,s>
+@print{} NF = 3 <p><><s>
+@end example
+
@node Testing field creation
@section Checking How @command{gawk} Is Splitting Records