author: Arnold D. Robbins <arnold@skeeve.com> 2023-04-13 16:12:27 +0300
committer: Arnold D. Robbins <arnold@skeeve.com> 2023-04-13 16:12:27 +0300
commit: 2af702cd761cf01f4db700a10cf308459d046f41
tree: 2af83cafd7126626f68fecd8cfec25fc6dd37cdd
parent: 75e692f8d197a987cf79213c73bfcf16e2ec59e1
Finally csv behavior w.r.t. CRs.
-rw-r--r--  ChangeLog            5
-rw-r--r--  doc/ChangeLog        8
-rw-r--r--  doc/gawk.info     1083
-rw-r--r--  doc/gawk.texi       30
-rw-r--r--  doc/gawktexi.in     20
-rw-r--r--  doc/wordlist         2
-rw-r--r--  io.c                17
-rw-r--r--  pc/ChangeLog         4
-rw-r--r--  pc/Makefile.tst      9
-rw-r--r--  test/ChangeLog       6
-rw-r--r--  test/Makefile.am     7
-rw-r--r--  test/Makefile.in    12
-rw-r--r--  test/Maketests       5
-rw-r--r--  test/csvodd.awk     13
-rw-r--r--  test/csvodd.in      25
-rw-r--r--  test/csvodd.ok      44
16 files changed, 719 insertions, 571 deletions
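
For readers skimming the patch, the behavior this commit documents (CR-LF pairs become a plain LF both at the end of a record and when embedded in a quoted field, while standalone CRs are left alone) can be illustrated with a small sketch. This is not the code in io.c (csvscan); it is a hypothetical Python model written only from the ChangeLog and documentation text in this commit, and the function name `split_csv_records` is an assumption made for illustration.

```python
def split_csv_records(text: str) -> list[str]:
    """Hypothetical model of the documented CSV record scan.

    Assumptions (not taken from io.c): a record ends at a newline that is
    not enclosed in double quotes, and every CR that is immediately
    followed by LF is dropped. A CR not followed by LF is kept as data.
    """
    records = []
    current = []
    in_quotes = False
    for i, ch in enumerate(text):
        if ch == '"':
            # A doubled quote ("") toggles twice, so the state is unchanged.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "\r" and i + 1 < len(text) and text[i + 1] == "\n":
            # CR-LF pair: drop the CR; the LF is handled on the next pass.
            continue
        elif ch == "\n" and not in_quotes:
            # Unquoted newline terminates the record.
            records.append("".join(current))
            current = []
        else:
            # Ordinary data, including quoted newlines and standalone CRs.
            current.append(ch)
    if current:
        records.append("".join(current))
    return records


sample = 'a,b\r\n"x\r\ny",z\r\nlone\rcr\n'
print(split_csv_records(sample))
```

In this model the embedded CR-LF inside the quoted field is stored as a single newline, and the lone CR in the last record survives untouched, which matches the wording of the ChangeLog entry in the diff.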
diff --git a/ChangeLog b/ChangeLog
index 9c5945a0..ccfa75f0 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2023-04-13 Andrew J. Schorr <aschorr@telemetry-investments.com>
+
+ * io.c (csvscan): Convert CR-LF pairs to plain LF, both at the
+ end of the line and when embedded. Plain CRs are not touched.
+
2023-04-07 Andrew J. Schorr <aschorr@telemetry-investments.com>
* io.c (csvscan): Instead of stripping all carriage returns in the
diff --git a/doc/ChangeLog b/doc/ChangeLog
index 1a360483..53a0bc45 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,11 @@
+2023-04-13 Arnold D. Robbins <arnold@skeeve.com>
+
+ * gawktexi.in (Comma Separated Fields): Further revise the prose
+ and sidebar for discussion of CR-LF embedded in the record, and
+ as the record terminator. Update the sidebar as well that
+ standalone CRs are not modified.
+ * wordlist: Updated.
+
2023-04-10 Arnold D. Robbins <arnold@skeeve.com>
* gawktexi.in (To CSV Function): Fix a typo in the code.
diff --git a/doc/gawk.info b/doc/gawk.info
index 3325a0c3..056fc2cc 100644
--- a/doc/gawk.info
+++ b/doc/gawk.info
@@ -5304,9 +5304,9 @@ the ‘-k’ or ‘--csv’ options.
comma to appear inside a field (i.e., as data), the field may be quoted
by beginning and ending it with double quotes. In order to allow a
double quote inside a field, the field _must_ be quoted, and two double
-quotes are used to represent an actual double quote. The double quote
-that starts a quoted field must be the first character after the comma.
-*note Table 4.1: table-csv-examples. shows some examples.
+quotes represent an actual double quote. The double quote that starts a
+quoted field must be the first character after the comma. *note Table
+4.1: table-csv-examples. shows some examples.
Input Field Contents
@@ -5323,8 +5323,9 @@ Table 4.1: Examples of CSV data
allowed inside double-quoted fields! In order to deal with such things,
when processing CSV files, ‘gawk’ scans the input data looking for
newlines that are not enclosed in double quotes. Thus, use of the
-‘--csv’ totally overrides normal record processing with ‘RS’ (*note
-Records::).
+‘--csv’ option totally overrides normal record processing with ‘RS’
+(*note Records::), as well as field splitting with any of ‘FS’,
+‘FIELDWIDTHS’, or ‘FPAT’.
Carriage-Return–Line-Feed Line Endings In CSV Files
@@ -5333,9 +5334,13 @@ Records::).
Many CSV files are imported from systems where the line terminator
for text files is a carriage-return–line-feed pair (CR-LF, ‘\r’ followed
-by ‘\n’). For ease of use, when processing CSV files, ‘gawk’ simply
-includes the carriage-return character in the record terminator when it
-occurs immediately prior to a line-feed character in the input.
+by ‘\n’). For ease of use, when processing CSV files, ‘gawk’ converts
+CR-LF pairs into a single newline. That is, the ‘\r’ is removed.
+
+ This occurs only when a CR is paired with an LF; a standalone CR is
+left alone. This behavior is consistent with with Windows systems which
+automatically convert CR-LF in files into a plain LF in memory, and also
+with the commonly available ‘unix2dos’ utility program.
The behavior of the ‘split()’ function (not formally discussed yet,
see *note String Functions::) differs slightly when processing CSV
@@ -38223,7 +38228,7 @@ Index
* Kernighan, Brian, quotes: Conventions. (line 38)
* Kernighan, Brian <1>: Acknowledgments. (line 79)
* Kernighan, Brian, quotes <1>: Comma Separated Fields.
- (line 45)
+ (line 46)
* Kernighan, Brian, quotes <2>: Getline/Pipe. (line 6)
* Kernighan, Brian, quotes <3>: Concatenation. (line 6)
* Kernighan, Brian, quotes <4>: Library Functions. (line 12)
@@ -39171,7 +39176,7 @@ Index
* sidebar, RS = "\0" Is Not Portable: gawk split records. (line 75)
* sidebar, Understanding $0: Changing Fields. (line 135)
* sidebar, Carriage-Return–Line-Feed Line Endings In CSV Files: Comma Separated Fields.
- (line 45)
+ (line 46)
* sidebar, Changing FS Does Not Affect the Fields: Full Line Fields.
(line 14)
* sidebar, FS and IGNORECASE: Field Splitting Summary.
@@ -39735,535 +39740,535 @@ Node: Default Field Splitting242323
Node: Regexp Field Splitting243466
Node: Single Character Fields247295
Node: Comma Separated Fields248384
-Ref: table-csv-examples249803
-Node: Command Line Field Separator251813
-Node: Full Line Fields255199
-Ref: Full Line Fields-Footnote-1256779
-Ref: Full Line Fields-Footnote-2256825
-Node: Field Splitting Summary256933
-Node: Constant Size259367
-Node: Fixed width data260111
-Node: Skipping intervening263630
-Node: Allowing trailing data264432
-Node: Fields with fixed data265497
-Node: Splitting By Content267123
-Ref: Splitting By Content-Footnote-1271392
-Node: More CSV271555
-Node: FS versus FPAT273208
-Node: Testing field creation274417
-Node: Multiple Line276195
-Node: Getline282677
-Node: Plain Getline285263
-Node: Getline/Variable287913
-Node: Getline/File289110
-Node: Getline/Variable/File290558
-Ref: Getline/Variable/File-Footnote-1292203
-Node: Getline/Pipe292299
-Node: Getline/Variable/Pipe295112
-Node: Getline/Coprocess296295
-Node: Getline/Variable/Coprocess297618
-Node: Getline Notes298384
-Node: Getline Summary301345
-Ref: table-getline-variants301789
-Node: Read Timeout302694
-Ref: Read Timeout-Footnote-1306658
-Node: Retrying Input306716
-Node: Command-line directories307983
-Node: Input Summary308921
-Node: Input Exercises312301
-Node: Printing312741
-Node: Print314684
-Node: Print Examples316190
-Node: Output Separators319043
-Node: OFMT321154
-Node: Printf322577
-Node: Basic Printf323382
-Node: Control Letters325018
-Node: Format Modifiers330487
-Node: Printf Examples336773
-Node: Redirection339318
-Node: Special FD346392
-Ref: Special FD-Footnote-1349682
-Node: Special Files349768
-Node: Other Inherited Files350397
-Node: Special Network351462
-Node: Special Caveats352350
-Node: Close Files And Pipes353333
-Ref: Close Files And Pipes-Footnote-1359469
-Node: Close Return Value359625
-Ref: table-close-pipe-return-values360900
-Ref: Close Return Value-Footnote-1361734
-Node: Noflush361890
-Node: Nonfatal363402
-Node: Output Summary365819
-Node: Output Exercises367105
-Node: Expressions367796
-Node: Values368998
-Node: Constants369676
-Node: Scalar Constants370373
-Ref: Scalar Constants-Footnote-1372951
-Ref: Scalar Constants-Footnote-2373201
-Node: Nondecimal-numbers373281
-Node: Regexp Constants376402
-Node: Using Constant Regexps376948
-Node: Standard Regexp Constants377594
-Node: Strong Regexp Constants380894
-Node: Variables384745
-Node: Using Variables385410
-Node: Assignment Options387390
-Node: Conversion389952
-Node: Strings And Numbers390484
-Ref: Strings And Numbers-Footnote-1393703
-Node: Locale influences conversions393812
-Ref: table-locale-affects396662
-Node: All Operators397305
-Node: Arithmetic Ops397946
-Node: Concatenation400776
-Ref: Concatenation-Footnote-1403726
-Node: Assignment Ops403849
-Ref: table-assign-ops408988
-Node: Increment Ops410370
-Node: Truth Values and Conditions413969
-Node: Truth Values415095
-Node: Typing and Comparison416186
-Node: Variable Typing417022
-Ref: Variable Typing-Footnote-1423684
-Ref: Variable Typing-Footnote-2423764
-Node: Comparison Operators423847
-Ref: table-relational-ops424274
-Node: POSIX String Comparison427960
-Ref: POSIX String Comparison-Footnote-1429719
-Ref: POSIX String Comparison-Footnote-2429862
-Node: Boolean Ops429946
-Ref: Boolean Ops-Footnote-1434639
-Node: Conditional Exp434735
-Node: Function Calls436521
-Node: Precedence440471
-Node: Locales444348
-Node: Expressions Summary446030
-Node: Patterns and Actions448693
-Node: Pattern Overview449835
-Node: Regexp Patterns451561
-Node: Expression Patterns452107
-Node: Ranges456016
-Node: BEGIN/END459194
-Node: Using BEGIN/END460007
-Ref: Using BEGIN/END-Footnote-1462917
-Node: I/O And BEGIN/END463027
-Node: BEGINFILE/ENDFILE465508
-Node: Empty468949
-Node: Using Shell Variables469266
-Node: Action Overview471604
-Node: Statements474039
-Node: If Statement475937
-Node: While Statement477506
-Node: Do Statement479594
-Node: For Statement480780
-Node: Switch Statement484137
-Node: Break Statement486688
-Node: Continue Statement488880
-Node: Next Statement490812
-Node: Nextfile Statement493309
-Node: Exit Statement496170
-Node: Built-in Variables498703
-Node: User-modified499882
-Node: Auto-set508093
-Ref: Auto-set-Footnote-1526192
-Ref: Auto-set-Footnote-2526410
-Node: ARGC and ARGV526466
-Node: Pattern Action Summary530905
-Node: Arrays533521
-Node: Array Basics534898
-Node: Array Intro535748
-Ref: figure-array-elements537764
-Ref: Array Intro-Footnote-1540633
-Node: Reference to Elements540765
-Node: Assigning Elements543287
-Node: Array Example543782
-Node: Scanning an Array545751
-Node: Controlling Scanning548848
-Ref: Controlling Scanning-Footnote-1555494
-Node: Numeric Array Subscripts555818
-Node: Uninitialized Subscripts558092
-Node: Delete559771
-Ref: Delete-Footnote-1562585
-Node: Multidimensional562642
-Node: Multiscanning565847
-Node: Arrays of Arrays567519
-Node: Arrays Summary572419
-Node: Functions574608
-Node: Built-in575668
-Node: Calling Built-in576857
-Node: Boolean Functions578904
-Node: Numeric Functions579474
-Ref: Numeric Functions-Footnote-1583667
-Ref: Numeric Functions-Footnote-2584351
-Ref: Numeric Functions-Footnote-3584403
-Node: String Functions584679
-Ref: String Functions-Footnote-1610910
-Ref: String Functions-Footnote-2611044
-Ref: String Functions-Footnote-3611304
-Node: Gory Details611391
-Ref: table-sub-escapes613298
-Ref: table-sub-proposed614944
-Ref: table-posix-sub616454
-Ref: table-gensub-escapes618142
-Ref: Gory Details-Footnote-1619076
-Node: I/O Functions619230
-Ref: table-system-return-values625917
-Ref: I/O Functions-Footnote-1628088
-Ref: I/O Functions-Footnote-2628236
-Node: Time Functions628356
-Ref: Time Functions-Footnote-1639512
-Ref: Time Functions-Footnote-2639588
-Ref: Time Functions-Footnote-3639750
-Ref: Time Functions-Footnote-4639861
-Ref: Time Functions-Footnote-5639979
-Ref: Time Functions-Footnote-6640214
-Node: Bitwise Functions640496
-Ref: table-bitwise-ops641098
-Ref: Bitwise Functions-Footnote-1647352
-Ref: Bitwise Functions-Footnote-2647531
-Node: Type Functions647728
-Node: I18N Functions651321
-Node: User-defined653064
-Node: Definition Syntax653884
-Ref: Definition Syntax-Footnote-1659712
-Node: Function Example659789
-Ref: Function Example-Footnote-1662768
-Node: Function Calling662790
-Node: Calling A Function663384
-Node: Variable Scope664354
-Node: Pass By Value/Reference667408
-Node: Function Caveats670140
-Ref: Function Caveats-Footnote-1672235
-Node: Return Statement672359
-Node: Dynamic Typing675414
-Node: Indirect Calls677806
-Node: Functions Summary688965
-Node: Library Functions691742
-Ref: Library Functions-Footnote-1695290
-Ref: Library Functions-Footnote-2695433
-Node: Library Names695608
-Ref: Library Names-Footnote-1699402
-Ref: Library Names-Footnote-2699629
-Node: General Functions699725
-Node: Strtonum Function700995
-Node: Assert Function704077
-Node: Round Function707529
-Node: Cliff Random Function709107
-Node: Ordinal Functions710140
-Ref: Ordinal Functions-Footnote-1713249
-Ref: Ordinal Functions-Footnote-2713501
-Node: Join Function713715
-Ref: Join Function-Footnote-1715518
-Node: Getlocaltime Function715722
-Node: Readfile Function719496
-Node: Shell Quoting721525
-Node: Isnumeric Function722981
-Node: To CSV Function724417
-Node: Data File Management726493
-Node: Filetrans Function727125
-Node: Rewind Function731419
-Node: File Checking733398
-Ref: File Checking-Footnote-1734770
-Node: Empty Files734977
-Node: Ignoring Assigns737044
-Node: Getopt Function738618
-Ref: Getopt Function-Footnote-1754452
-Node: Passwd Functions754664
-Ref: Passwd Functions-Footnote-1763846
-Node: Group Functions763934
-Ref: Group Functions-Footnote-1772072
-Node: Walking Arrays772285
-Node: Library Functions Summary775333
-Node: Library Exercises776757
-Node: Sample Programs777244
-Node: Running Examples778026
-Node: Clones778778
-Node: Cut Program780050
-Node: Egrep Program790491
-Node: Id Program799808
-Node: Split Program809922
-Ref: Split Program-Footnote-1820157
-Node: Tee Program820344
-Node: Uniq Program823253
-Node: Wc Program831118
-Node: Bytes vs. Characters831513
-Node: Using extensions833115
-Node: wc program833895
-Node: Miscellaneous Programs838901
-Node: Dupword Program840130
-Node: Alarm Program842193
-Node: Translate Program847106
-Ref: Translate Program-Footnote-1851847
-Node: Labels Program852125
-Ref: Labels Program-Footnote-1855566
-Node: Word Sorting855658
-Node: History Sorting859852
-Node: Extract Program862127
-Node: Simple Sed870396
-Node: Igawk Program873612
-Ref: Igawk Program-Footnote-1888859
-Ref: Igawk Program-Footnote-2889065
-Ref: Igawk Program-Footnote-3889195
-Node: Anagram Program889322
-Node: Signature Program892418
-Node: Programs Summary893670
-Node: Programs Exercises894928
-Ref: Programs Exercises-Footnote-1899244
-Node: Advanced Features899330
-Node: Nondecimal Data901824
-Node: Boolean Typed Values903454
-Node: Array Sorting905429
-Node: Controlling Array Traversal906158
-Ref: Controlling Array Traversal-Footnote-1914665
-Node: Array Sorting Functions914787
-Ref: Array Sorting Functions-Footnote-1920906
-Node: Two-way I/O921114
-Ref: Two-way I/O-Footnote-1929109
-Ref: Two-way I/O-Footnote-2929300
-Node: TCP/IP Networking929382
-Node: Profiling932562
-Node: Persistent Memory942272
-Ref: Persistent Memory-Footnote-1951230
-Node: Extension Philosophy951361
-Node: Advanced Features Summary952896
-Node: Internationalization955166
-Node: I18N and L10N956872
-Node: Explaining gettext957567
-Ref: Explaining gettext-Footnote-1963720
-Ref: Explaining gettext-Footnote-2963915
-Node: Programmer i18n964080
-Ref: Programmer i18n-Footnote-1969193
-Node: Translator i18n969242
-Node: String Extraction970078
-Ref: String Extraction-Footnote-1971256
-Node: Printf Ordering971354
-Ref: Printf Ordering-Footnote-1974216
-Node: I18N Portability974284
-Ref: I18N Portability-Footnote-1976858
-Node: I18N Example976929
-Ref: I18N Example-Footnote-1980329
-Ref: I18N Example-Footnote-2980405
-Node: Gawk I18N980522
-Node: I18N Summary981178
-Node: Debugger982579
-Node: Debugging983603
-Node: Debugging Concepts984052
-Node: Debugging Terms985878
-Node: Awk Debugging988491
-Ref: Awk Debugging-Footnote-1989468
-Node: Sample Debugging Session989608
-Node: Debugger Invocation990160
-Node: Finding The Bug991789
-Node: List of Debugger Commands998475
-Node: Breakpoint Control999852
-Node: Debugger Execution Control1003684
-Node: Viewing And Changing Data1007164
-Node: Execution Stack1010902
-Node: Debugger Info1012583
-Node: Miscellaneous Debugger Commands1016882
-Node: Readline Support1022135
-Node: Limitations1023081
-Node: Debugging Summary1025725
-Node: Namespaces1027028
-Node: Global Namespace1028155
-Node: Qualified Names1029600
-Node: Default Namespace1030635
-Node: Changing The Namespace1031410
-Node: Naming Rules1033104
-Node: Internal Name Management1035019
-Node: Namespace Example1036089
-Node: Namespace And Features1038672
-Node: Namespace Summary1040129
-Node: Arbitrary Precision Arithmetic1041642
-Node: Computer Arithmetic1043161
-Ref: table-numeric-ranges1046978
-Ref: table-floating-point-ranges1047476
-Ref: Computer Arithmetic-Footnote-11048135
-Node: Math Definitions1048194
-Ref: table-ieee-formats1051239
-Node: MPFR features1051813
-Node: MPFR On Parole1052266
-Ref: MPFR On Parole-Footnote-11053110
-Node: MPFR Intro1053269
-Node: FP Math Caution1054959
-Ref: FP Math Caution-Footnote-11056033
-Node: Inexactness of computations1056410
-Node: Inexact representation1057441
-Node: Comparing FP Values1058824
-Node: Errors accumulate1060082
-Node: Strange values1061549
-Ref: Strange values-Footnote-11064215
-Node: Getting Accuracy1064320
-Node: Try To Round1067057
-Node: Setting precision1067964
-Ref: table-predefined-precision-strings1068669
-Node: Setting the rounding mode1070554
-Ref: table-gawk-rounding-modes1070936
-Ref: Setting the rounding mode-Footnote-11074994
-Node: Arbitrary Precision Integers1075177
-Ref: Arbitrary Precision Integers-Footnote-11078389
-Node: Checking for MPFR1078545
-Node: POSIX Floating Point Problems1080035
-Ref: POSIX Floating Point Problems-Footnote-11084899
-Node: Floating point summary1084937
-Node: Dynamic Extensions1087201
-Node: Extension Intro1088800
-Node: Plugin License1090108
-Node: Extension Mechanism Outline1090921
-Ref: figure-load-extension1091372
-Ref: figure-register-new-function1092957
-Ref: figure-call-new-function1094067
-Node: Extension API Description1096191
-Node: Extension API Functions Introduction1097920
-Ref: table-api-std-headers1099818
-Node: General Data Types1104282
-Ref: General Data Types-Footnote-11113450
-Node: Memory Allocation Functions1113765
-Ref: Memory Allocation Functions-Footnote-11118490
-Node: Constructor Functions1118589
-Node: API Ownership of MPFR and GMP Values1122494
-Node: Registration Functions1124055
-Node: Extension Functions1124759
-Node: Exit Callback Functions1130335
-Node: Extension Version String1131654
-Node: Input Parsers1132349
-Node: Output Wrappers1146993
-Node: Two-way processors1151841
-Node: Printing Messages1154202
-Ref: Printing Messages-Footnote-11155416
-Node: Updating ERRNO1155571
-Node: Requesting Values1156370
-Ref: table-value-types-returned1157123
-Node: Accessing Parameters1158232
-Node: Symbol Table Access1159516
-Node: Symbol table by name1160032
-Ref: Symbol table by name-Footnote-11163243
-Node: Symbol table by cookie1163375
-Ref: Symbol table by cookie-Footnote-11167656
-Node: Cached values1167720
-Ref: Cached values-Footnote-11171364
-Node: Array Manipulation1171521
-Ref: Array Manipulation-Footnote-11172624
-Node: Array Data Types1172661
-Ref: Array Data Types-Footnote-11175483
-Node: Array Functions1175583
-Node: Flattening Arrays1180612
-Node: Creating Arrays1187664
-Node: Redirection API1192514
-Node: Extension API Variables1195535
-Node: Extension Versioning1196260
-Ref: gawk-api-version1196697
-Node: Extension GMP/MPFR Versioning1198485
-Node: Extension API Informational Variables1200191
-Node: Extension API Boilerplate1201352
-Node: Changes from API V11205488
-Node: Finding Extensions1207122
-Node: Extension Example1207697
-Node: Internal File Description1208521
-Node: Internal File Ops1212845
-Ref: Internal File Ops-Footnote-11224403
-Node: Using Internal File Ops1224551
-Ref: Using Internal File Ops-Footnote-11226982
-Node: Extension Samples1227260
-Node: Extension Sample File Functions1228829
-Node: Extension Sample Fnmatch1236967
-Node: Extension Sample Fork1238562
-Node: Extension Sample Inplace1239838
-Node: Extension Sample Ord1243510
-Node: Extension Sample Readdir1244386
-Ref: table-readdir-file-types1245283
-Node: Extension Sample Revout1246421
-Node: Extension Sample Rev2way1247018
-Node: Extension Sample Read write array1247770
-Node: Extension Sample Readfile1251044
-Node: Extension Sample Time1252175
-Node: Extension Sample API Tests1254465
-Node: gawkextlib1254973
-Node: Extension summary1258009
-Node: Extension Exercises1261867
-Node: Language History1263145
-Node: V7/SVR3.11264859
-Node: SVR41267209
-Node: POSIX1268741
-Node: BTL1270166
-Node: POSIX/GNU1270935
-Node: Feature History1277466
-Node: Common Extensions1297033
-Node: Ranges and Locales1298510
-Ref: Ranges and Locales-Footnote-11303311
-Ref: Ranges and Locales-Footnote-21303338
-Ref: Ranges and Locales-Footnote-31303577
-Node: Contributors1303800
-Node: History summary1310005
-Node: Installation1311451
-Node: Gawk Distribution1312415
-Node: Getting1312907
-Node: Extracting1313906
-Node: Distribution contents1315618
-Node: Unix Installation1323698
-Node: Quick Installation1324520
-Node: Compiling with MPFR1327066
-Node: Shell Startup Files1327772
-Node: Additional Configuration Options1328929
-Node: Configuration Philosophy1331316
-Node: Compiling from Git1333818
-Node: Building the Documentation1334377
-Node: Non-Unix Installation1335789
-Node: PC Installation1336265
-Node: PC Binary Installation1337138
-Node: PC Compiling1338043
-Node: PC Using1339221
-Node: Cygwin1342949
-Node: MSYS1344205
-Node: OpenVMS Installation1344837
-Node: OpenVMS Compilation1345518
-Ref: OpenVMS Compilation-Footnote-11347001
-Node: OpenVMS Dynamic Extensions1347063
-Node: OpenVMS Installation Details1348699
-Node: OpenVMS Running1351134
-Node: OpenVMS GNV1355271
-Node: Bugs1356026
-Node: Bug definition1356950
-Node: Bug address1360601
-Node: Usenet1364192
-Node: Performance bugs1365423
-Node: Asking for help1368441
-Node: Maintainers1370432
-Node: Other Versions1371459
-Node: Installation summary1380391
-Node: Notes1381775
-Node: Compatibility Mode1382585
-Node: Additions1383407
-Node: Accessing The Source1384352
-Node: Adding Code1385887
-Node: New Ports1393023
-Node: Derived Files1397533
-Ref: Derived Files-Footnote-11403380
-Ref: Derived Files-Footnote-21403415
-Ref: Derived Files-Footnote-31404032
-Node: Future Extensions1404146
-Node: Implementation Limitations1404818
-Node: Extension Design1406060
-Node: Old Extension Problems1407224
-Ref: Old Extension Problems-Footnote-11408800
-Node: Extension New Mechanism Goals1408861
-Ref: Extension New Mechanism Goals-Footnote-11412357
-Node: Extension Other Design Decisions1412558
-Node: Extension Future Growth1414757
-Node: Notes summary1415381
-Node: Basic Concepts1416594
-Node: Basic High Level1417279
-Ref: figure-general-flow1417561
-Ref: figure-process-flow1418268
-Ref: Basic High Level-Footnote-11421669
-Node: Basic Data Typing1421858
-Node: Glossary1425276
-Node: Copying1458398
-Node: GNU Free Documentation License1496159
-Node: Index1521482
+Ref: table-csv-examples249792
+Node: Command Line Field Separator252106
+Node: Full Line Fields255492
+Ref: Full Line Fields-Footnote-1257072
+Ref: Full Line Fields-Footnote-2257118
+Node: Field Splitting Summary257226
+Node: Constant Size259660
+Node: Fixed width data260404
+Node: Skipping intervening263923
+Node: Allowing trailing data264725
+Node: Fields with fixed data265790
+Node: Splitting By Content267416
+Ref: Splitting By Content-Footnote-1271685
+Node: More CSV271848
+Node: FS versus FPAT273501
+Node: Testing field creation274710
+Node: Multiple Line276488
+Node: Getline282970
+Node: Plain Getline285556
+Node: Getline/Variable288206
+Node: Getline/File289403
+Node: Getline/Variable/File290851
+Ref: Getline/Variable/File-Footnote-1292496
+Node: Getline/Pipe292592
+Node: Getline/Variable/Pipe295405
+Node: Getline/Coprocess296588
+Node: Getline/Variable/Coprocess297911
+Node: Getline Notes298677
+Node: Getline Summary301638
+Ref: table-getline-variants302082
+Node: Read Timeout302987
+Ref: Read Timeout-Footnote-1306951
+Node: Retrying Input307009
+Node: Command-line directories308276
+Node: Input Summary309214
+Node: Input Exercises312594
+Node: Printing313034
+Node: Print314977
+Node: Print Examples316483
+Node: Output Separators319336
+Node: OFMT321447
+Node: Printf322870
+Node: Basic Printf323675
+Node: Control Letters325311
+Node: Format Modifiers330780
+Node: Printf Examples337066
+Node: Redirection339611
+Node: Special FD346685
+Ref: Special FD-Footnote-1349975
+Node: Special Files350061
+Node: Other Inherited Files350690
+Node: Special Network351755
+Node: Special Caveats352643
+Node: Close Files And Pipes353626
+Ref: Close Files And Pipes-Footnote-1359762
+Node: Close Return Value359918
+Ref: table-close-pipe-return-values361193
+Ref: Close Return Value-Footnote-1362027
+Node: Noflush362183
+Node: Nonfatal363695
+Node: Output Summary366112
+Node: Output Exercises367398
+Node: Expressions368089
+Node: Values369291
+Node: Constants369969
+Node: Scalar Constants370666
+Ref: Scalar Constants-Footnote-1373244
+Ref: Scalar Constants-Footnote-2373494
+Node: Nondecimal-numbers373574
+Node: Regexp Constants376695
+Node: Using Constant Regexps377241
+Node: Standard Regexp Constants377887
+Node: Strong Regexp Constants381187
+Node: Variables385038
+Node: Using Variables385703
+Node: Assignment Options387683
+Node: Conversion390245
+Node: Strings And Numbers390777
+Ref: Strings And Numbers-Footnote-1393996
+Node: Locale influences conversions394105
+Ref: table-locale-affects396955
+Node: All Operators397598
+Node: Arithmetic Ops398239
+Node: Concatenation401069
+Ref: Concatenation-Footnote-1404019
+Node: Assignment Ops404142
+Ref: table-assign-ops409281
+Node: Increment Ops410663
+Node: Truth Values and Conditions414262
+Node: Truth Values415388
+Node: Typing and Comparison416479
+Node: Variable Typing417315
+Ref: Variable Typing-Footnote-1423977
+Ref: Variable Typing-Footnote-2424057
+Node: Comparison Operators424140
+Ref: table-relational-ops424567
+Node: POSIX String Comparison428253
+Ref: POSIX String Comparison-Footnote-1430012
+Ref: POSIX String Comparison-Footnote-2430155
+Node: Boolean Ops430239
+Ref: Boolean Ops-Footnote-1434932
+Node: Conditional Exp435028
+Node: Function Calls436814
+Node: Precedence440764
+Node: Locales444641
+Node: Expressions Summary446323
+Node: Patterns and Actions448986
+Node: Pattern Overview450128
+Node: Regexp Patterns451854
+Node: Expression Patterns452400
+Node: Ranges456309
+Node: BEGIN/END459487
+Node: Using BEGIN/END460300
+Ref: Using BEGIN/END-Footnote-1463210
+Node: I/O And BEGIN/END463320
+Node: BEGINFILE/ENDFILE465801
+Node: Empty469242
+Node: Using Shell Variables469559
+Node: Action Overview471897
+Node: Statements474332
+Node: If Statement476230
+Node: While Statement477799
+Node: Do Statement479887
+Node: For Statement481073
+Node: Switch Statement484430
+Node: Break Statement486981
+Node: Continue Statement489173
+Node: Next Statement491105
+Node: Nextfile Statement493602
+Node: Exit Statement496463
+Node: Built-in Variables498996
+Node: User-modified500175
+Node: Auto-set508386
+Ref: Auto-set-Footnote-1526485
+Ref: Auto-set-Footnote-2526703
+Node: ARGC and ARGV526759
+Node: Pattern Action Summary531198
+Node: Arrays533814
+Node: Array Basics535191
+Node: Array Intro536041
+Ref: figure-array-elements538057
+Ref: Array Intro-Footnote-1540926
+Node: Reference to Elements541058
+Node: Assigning Elements543580
+Node: Array Example544075
+Node: Scanning an Array546044
+Node: Controlling Scanning549141
+Ref: Controlling Scanning-Footnote-1555787
+Node: Numeric Array Subscripts556111
+Node: Uninitialized Subscripts558385
+Node: Delete560064
+Ref: Delete-Footnote-1562878
+Node: Multidimensional562935
+Node: Multiscanning566140
+Node: Arrays of Arrays567812
+Node: Arrays Summary572712
+Node: Functions574901
+Node: Built-in575961
+Node: Calling Built-in577150
+Node: Boolean Functions579197
+Node: Numeric Functions579767
+Ref: Numeric Functions-Footnote-1583960
+Ref: Numeric Functions-Footnote-2584644
+Ref: Numeric Functions-Footnote-3584696
+Node: String Functions584972
+Ref: String Functions-Footnote-1611203
+Ref: String Functions-Footnote-2611337
+Ref: String Functions-Footnote-3611597
+Node: Gory Details611684
+Ref: table-sub-escapes613591
+Ref: table-sub-proposed615237
+Ref: table-posix-sub616747
+Ref: table-gensub-escapes618435
+Ref: Gory Details-Footnote-1619369
+Node: I/O Functions619523
+Ref: table-system-return-values626210
+Ref: I/O Functions-Footnote-1628381
+Ref: I/O Functions-Footnote-2628529
+Node: Time Functions628649
+Ref: Time Functions-Footnote-1639805
+Ref: Time Functions-Footnote-2639881
+Ref: Time Functions-Footnote-3640043
+Ref: Time Functions-Footnote-4640154
+Ref: Time Functions-Footnote-5640272
+Ref: Time Functions-Footnote-6640507
+Node: Bitwise Functions640789
+Ref: table-bitwise-ops641391
+Ref: Bitwise Functions-Footnote-1647645
+Ref: Bitwise Functions-Footnote-2647824
+Node: Type Functions648021
+Node: I18N Functions651614
+Node: User-defined653357
+Node: Definition Syntax654177
+Ref: Definition Syntax-Footnote-1660005
+Node: Function Example660082
+Ref: Function Example-Footnote-1663061
+Node: Function Calling663083
+Node: Calling A Function663677
+Node: Variable Scope664647
+Node: Pass By Value/Reference667701
+Node: Function Caveats670433
+Ref: Function Caveats-Footnote-1672528
+Node: Return Statement672652
+Node: Dynamic Typing675707
+Node: Indirect Calls678099
+Node: Functions Summary689258
+Node: Library Functions692035
+Ref: Library Functions-Footnote-1695583
+Ref: Library Functions-Footnote-2695726
+Node: Library Names695901
+Ref: Library Names-Footnote-1699695
+Ref: Library Names-Footnote-2699922
+Node: General Functions700018
+Node: Strtonum Function701288
+Node: Assert Function704370
+Node: Round Function707822
+Node: Cliff Random Function709400
+Node: Ordinal Functions710433
+Ref: Ordinal Functions-Footnote-1713542
+Ref: Ordinal Functions-Footnote-2713794
+Node: Join Function714008
+Ref: Join Function-Footnote-1715811
+Node: Getlocaltime Function716015
+Node: Readfile Function719789
+Node: Shell Quoting721818
+Node: Isnumeric Function723274
+Node: To CSV Function724710
+Node: Data File Management726786
+Node: Filetrans Function727418
+Node: Rewind Function731712
+Node: File Checking733691
+Ref: File Checking-Footnote-1735063
+Node: Empty Files735270
+Node: Ignoring Assigns737337
+Node: Getopt Function738911
+Ref: Getopt Function-Footnote-1754745
+Node: Passwd Functions754957
+Ref: Passwd Functions-Footnote-1764139
+Node: Group Functions764227
+Ref: Group Functions-Footnote-1772365
+Node: Walking Arrays772578
+Node: Library Functions Summary775626
+Node: Library Exercises777050
+Node: Sample Programs777537
+Node: Running Examples778319
+Node: Clones779071
+Node: Cut Program780343
+Node: Egrep Program790784
+Node: Id Program800101
+Node: Split Program810215
+Ref: Split Program-Footnote-1820450
+Node: Tee Program820637
+Node: Uniq Program823546
+Node: Wc Program831411
+Node: Bytes vs. Characters831806
+Node: Using extensions833408
+Node: wc program834188
+Node: Miscellaneous Programs839194
+Node: Dupword Program840423
+Node: Alarm Program842486
+Node: Translate Program847399
+Ref: Translate Program-Footnote-1852140
+Node: Labels Program852418
+Ref: Labels Program-Footnote-1855859
+Node: Word Sorting855951
+Node: History Sorting860145
+Node: Extract Program862420
+Node: Simple Sed870689
+Node: Igawk Program873905
+Ref: Igawk Program-Footnote-1889152
+Ref: Igawk Program-Footnote-2889358
+Ref: Igawk Program-Footnote-3889488
+Node: Anagram Program889615
+Node: Signature Program892711
+Node: Programs Summary893963
+Node: Programs Exercises895221
+Ref: Programs Exercises-Footnote-1899537
+Node: Advanced Features899623
+Node: Nondecimal Data902117
+Node: Boolean Typed Values903747
+Node: Array Sorting905722
+Node: Controlling Array Traversal906451
+Ref: Controlling Array Traversal-Footnote-1914958
+Node: Array Sorting Functions915080
+Ref: Array Sorting Functions-Footnote-1921199
+Node: Two-way I/O921407
+Ref: Two-way I/O-Footnote-1929402
+Ref: Two-way I/O-Footnote-2929593
+Node: TCP/IP Networking929675
+Node: Profiling932855
+Node: Persistent Memory942565
+Ref: Persistent Memory-Footnote-1951523
+Node: Extension Philosophy951654
+Node: Advanced Features Summary953189
+Node: Internationalization955459
+Node: I18N and L10N957165
+Node: Explaining gettext957860
+Ref: Explaining gettext-Footnote-1964013
+Ref: Explaining gettext-Footnote-2964208
+Node: Programmer i18n964373
+Ref: Programmer i18n-Footnote-1969486
+Node: Translator i18n969535
+Node: String Extraction970371
+Ref: String Extraction-Footnote-1971549
+Node: Printf Ordering971647
+Ref: Printf Ordering-Footnote-1974509
+Node: I18N Portability974577
+Ref: I18N Portability-Footnote-1977151
+Node: I18N Example977222
+Ref: I18N Example-Footnote-1980622
+Ref: I18N Example-Footnote-2980698
+Node: Gawk I18N980815
+Node: I18N Summary981471
+Node: Debugger982872
+Node: Debugging983896
+Node: Debugging Concepts984345
+Node: Debugging Terms986171
+Node: Awk Debugging988784
+Ref: Awk Debugging-Footnote-1989761
+Node: Sample Debugging Session989901
+Node: Debugger Invocation990453
+Node: Finding The Bug992082
+Node: List of Debugger Commands998768
+Node: Breakpoint Control1000145
+Node: Debugger Execution Control1003977
+Node: Viewing And Changing Data1007457
+Node: Execution Stack1011195
+Node: Debugger Info1012876
+Node: Miscellaneous Debugger Commands1017175
+Node: Readline Support1022428
+Node: Limitations1023374
+Node: Debugging Summary1026018
+Node: Namespaces1027321
+Node: Global Namespace1028448
+Node: Qualified Names1029893
+Node: Default Namespace1030928
+Node: Changing The Namespace1031703
+Node: Naming Rules1033397
+Node: Internal Name Management1035312
+Node: Namespace Example1036382
+Node: Namespace And Features1038965
+Node: Namespace Summary1040422
+Node: Arbitrary Precision Arithmetic1041935
+Node: Computer Arithmetic1043454
+Ref: table-numeric-ranges1047271
+Ref: table-floating-point-ranges1047769
+Ref: Computer Arithmetic-Footnote-11048428
+Node: Math Definitions1048487
+Ref: table-ieee-formats1051532
+Node: MPFR features1052106
+Node: MPFR On Parole1052559
+Ref: MPFR On Parole-Footnote-11053403
+Node: MPFR Intro1053562
+Node: FP Math Caution1055252
+Ref: FP Math Caution-Footnote-11056326
+Node: Inexactness of computations1056703
+Node: Inexact representation1057734
+Node: Comparing FP Values1059117
+Node: Errors accumulate1060375
+Node: Strange values1061842
+Ref: Strange values-Footnote-11064508
+Node: Getting Accuracy1064613
+Node: Try To Round1067350
+Node: Setting precision1068257
+Ref: table-predefined-precision-strings1068962
+Node: Setting the rounding mode1070847
+Ref: table-gawk-rounding-modes1071229
+Ref: Setting the rounding mode-Footnote-11075287
+Node: Arbitrary Precision Integers1075470
+Ref: Arbitrary Precision Integers-Footnote-11078682
+Node: Checking for MPFR1078838
+Node: POSIX Floating Point Problems1080328
+Ref: POSIX Floating Point Problems-Footnote-11085192
+Node: Floating point summary1085230
+Node: Dynamic Extensions1087494
+Node: Extension Intro1089093
+Node: Plugin License1090401
+Node: Extension Mechanism Outline1091214
+Ref: figure-load-extension1091665
+Ref: figure-register-new-function1093250
+Ref: figure-call-new-function1094360
+Node: Extension API Description1096484
+Node: Extension API Functions Introduction1098213
+Ref: table-api-std-headers1100111
+Node: General Data Types1104575
+Ref: General Data Types-Footnote-11113743
+Node: Memory Allocation Functions1114058
+Ref: Memory Allocation Functions-Footnote-11118783
+Node: Constructor Functions1118882
+Node: API Ownership of MPFR and GMP Values1122787
+Node: Registration Functions1124348
+Node: Extension Functions1125052
+Node: Exit Callback Functions1130628
+Node: Extension Version String1131947
+Node: Input Parsers1132642
+Node: Output Wrappers1147286
+Node: Two-way processors1152134
+Node: Printing Messages1154495
+Ref: Printing Messages-Footnote-11155709
+Node: Updating ERRNO1155864
+Node: Requesting Values1156663
+Ref: table-value-types-returned1157416
+Node: Accessing Parameters1158525
+Node: Symbol Table Access1159809
+Node: Symbol table by name1160325
+Ref: Symbol table by name-Footnote-11163536
+Node: Symbol table by cookie1163668
+Ref: Symbol table by cookie-Footnote-11167949
+Node: Cached values1168013
+Ref: Cached values-Footnote-11171657
+Node: Array Manipulation1171814
+Ref: Array Manipulation-Footnote-11172917
+Node: Array Data Types1172954
+Ref: Array Data Types-Footnote-11175776
+Node: Array Functions1175876
+Node: Flattening Arrays1180905
+Node: Creating Arrays1187957
+Node: Redirection API1192807
+Node: Extension API Variables1195828
+Node: Extension Versioning1196553
+Ref: gawk-api-version1196990
+Node: Extension GMP/MPFR Versioning1198778
+Node: Extension API Informational Variables1200484
+Node: Extension API Boilerplate1201645
+Node: Changes from API V11205781
+Node: Finding Extensions1207415
+Node: Extension Example1207990
+Node: Internal File Description1208814
+Node: Internal File Ops1213138
+Ref: Internal File Ops-Footnote-11224696
+Node: Using Internal File Ops1224844
+Ref: Using Internal File Ops-Footnote-11227275
+Node: Extension Samples1227553
+Node: Extension Sample File Functions1229122
+Node: Extension Sample Fnmatch1237260
+Node: Extension Sample Fork1238855
+Node: Extension Sample Inplace1240131
+Node: Extension Sample Ord1243803
+Node: Extension Sample Readdir1244679
+Ref: table-readdir-file-types1245576
+Node: Extension Sample Revout1246714
+Node: Extension Sample Rev2way1247311
+Node: Extension Sample Read write array1248063
+Node: Extension Sample Readfile1251337
+Node: Extension Sample Time1252468
+Node: Extension Sample API Tests1254758
+Node: gawkextlib1255266
+Node: Extension summary1258302
+Node: Extension Exercises1262160
+Node: Language History1263438
+Node: V7/SVR3.11265152
+Node: SVR41267502
+Node: POSIX1269034
+Node: BTL1270459
+Node: POSIX/GNU1271228
+Node: Feature History1277759
+Node: Common Extensions1297326
+Node: Ranges and Locales1298803
+Ref: Ranges and Locales-Footnote-11303604
+Ref: Ranges and Locales-Footnote-21303631
+Ref: Ranges and Locales-Footnote-31303870
+Node: Contributors1304093
+Node: History summary1310298
+Node: Installation1311744
+Node: Gawk Distribution1312708
+Node: Getting1313200
+Node: Extracting1314199
+Node: Distribution contents1315911
+Node: Unix Installation1323991
+Node: Quick Installation1324813
+Node: Compiling with MPFR1327359
+Node: Shell Startup Files1328065
+Node: Additional Configuration Options1329222
+Node: Configuration Philosophy1331609
+Node: Compiling from Git1334111
+Node: Building the Documentation1334670
+Node: Non-Unix Installation1336082
+Node: PC Installation1336558
+Node: PC Binary Installation1337431
+Node: PC Compiling1338336
+Node: PC Using1339514
+Node: Cygwin1343242
+Node: MSYS1344498
+Node: OpenVMS Installation1345130
+Node: OpenVMS Compilation1345811
+Ref: OpenVMS Compilation-Footnote-11347294
+Node: OpenVMS Dynamic Extensions1347356
+Node: OpenVMS Installation Details1348992
+Node: OpenVMS Running1351427
+Node: OpenVMS GNV1355564
+Node: Bugs1356319
+Node: Bug definition1357243
+Node: Bug address1360894
+Node: Usenet1364485
+Node: Performance bugs1365716
+Node: Asking for help1368734
+Node: Maintainers1370725
+Node: Other Versions1371752
+Node: Installation summary1380684
+Node: Notes1382068
+Node: Compatibility Mode1382878
+Node: Additions1383700
+Node: Accessing The Source1384645
+Node: Adding Code1386180
+Node: New Ports1393316
+Node: Derived Files1397826
+Ref: Derived Files-Footnote-11403673
+Ref: Derived Files-Footnote-21403708
+Ref: Derived Files-Footnote-31404325
+Node: Future Extensions1404439
+Node: Implementation Limitations1405111
+Node: Extension Design1406353
+Node: Old Extension Problems1407517
+Ref: Old Extension Problems-Footnote-11409093
+Node: Extension New Mechanism Goals1409154
+Ref: Extension New Mechanism Goals-Footnote-11412650
+Node: Extension Other Design Decisions1412851
+Node: Extension Future Growth1415050
+Node: Notes summary1415674
+Node: Basic Concepts1416887
+Node: Basic High Level1417572
+Ref: figure-general-flow1417854
+Ref: figure-process-flow1418561
+Ref: Basic High Level-Footnote-11421962
+Node: Basic Data Typing1422151
+Node: Glossary1425569
+Node: Copying1458691
+Node: GNU Free Documentation License1496452
+Node: Index1521775

End Tag Table
diff --git a/doc/gawk.texi b/doc/gawk.texi
index f617db09..13ba6159 100644
--- a/doc/gawk.texi
+++ b/doc/gawk.texi
@@ -8124,8 +8124,8 @@ To use CSV data, invoke @command{gawk} with either of the
Fields in CSV files are separated by commas. In order to allow a comma
to appear inside a field (i.e., as data), the field may be quoted
by beginning and ending it with double quotes. In order to allow a double
-quote inside a field, the field @emph{must} be quoted, and two double quotes are used
-to represent an actual double quote.
+quote inside a field, the field @emph{must} be quoted, and two double quotes
+represent an actual double quote.
The double quote that starts a quoted field must be the first
character after the comma.
@ref{table-csv-examples} shows some examples.
@@ -8145,8 +8145,10 @@ Additionally, and here's where it gets messy, newlines are also
allowed inside double-quoted fields!
In order to deal with such things, when processing CSV files,
@command{gawk} scans the input data looking for newlines that
-are not enclosed in double quotes. Thus, use of the @option{--csv}
-totally overrides normal record processing with @code{RS} (@pxref{Records}).
+are not enclosed in double quotes. Thus, use of the @option{--csv} option
+totally overrides normal record processing with @code{RS} (@pxref{Records}),
+as well as field splitting with any of @code{FS}, @code{FIELDWIDTHS},
+or @code{FPAT}.
@cindex Kernighan, Brian @subentry quotes
@cindex sidebar @subentry Carriage-Return--Line-Feed Line Endings In CSV Files
@@ -8163,9 +8165,13 @@ totally overrides normal record processing with @code{RS} (@pxref{Records}).
Many CSV files are imported from systems where the line terminator
for text files is a carriage-return--line-feed pair
(CR-LF, @samp{\r} followed by @samp{\n}).
-For ease of use, when processing CSV files, @command{gawk} simply
-includes the carriage-return character in the record terminator
-when it occurs immediately prior to a line-feed character in the input.
+For ease of use, when processing CSV files, @command{gawk} converts
+CR-LF pairs into a single newline. That is, the @samp{\r} is removed.
+
+This occurs only when a CR is paired with an LF; a standalone CR
+is left alone. This behavior is consistent with Windows systems
+which automatically convert CR-LF in files into a plain LF in memory,
+and also with the commonly available @command{unix2dos} utility program.
@docbook
</sidebar>
@@ -8185,9 +8191,13 @@ when it occurs immediately prior to a line-feed character in the input.
Many CSV files are imported from systems where the line terminator
for text files is a carriage-return--line-feed pair
(CR-LF, @samp{\r} followed by @samp{\n}).
-For ease of use, when processing CSV files, @command{gawk} simply
-includes the carriage-return character in the record terminator
-when it occurs immediately prior to a line-feed character in the input.
+For ease of use, when processing CSV files, @command{gawk} converts
+CR-LF pairs into a single newline. That is, the @samp{\r} is removed.
+
+This occurs only when a CR is paired with an LF; a standalone CR
+is left alone. This behavior is consistent with Windows systems
+which automatically convert CR-LF in files into a plain LF in memory,
+and also with the commonly available @command{unix2dos} utility program.
@end cartouche
@end ifnotdocbook
diff --git a/doc/gawktexi.in b/doc/gawktexi.in
index 48e61668..58861621 100644
--- a/doc/gawktexi.in
+++ b/doc/gawktexi.in
@@ -7685,8 +7685,8 @@ To use CSV data, invoke @command{gawk} with either of the
Fields in CSV files are separated by commas. In order to allow a comma
to appear inside a field (i.e., as data), the field may be quoted
by beginning and ending it with double quotes. In order to allow a double
-quote inside a field, the field @emph{must} be quoted, and two double quotes are used
-to represent an actual double quote.
+quote inside a field, the field @emph{must} be quoted, and two double quotes
+represent an actual double quote.
The double quote that starts a quoted field must be the first
character after the comma.
@ref{table-csv-examples} shows some examples.
@@ -7706,8 +7706,10 @@ Additionally, and here's where it gets messy, newlines are also
allowed inside double-quoted fields!
In order to deal with such things, when processing CSV files,
@command{gawk} scans the input data looking for newlines that
-are not enclosed in double quotes. Thus, use of the @option{--csv}
-totally overrides normal record processing with @code{RS} (@pxref{Records}).
+are not enclosed in double quotes. Thus, use of the @option{--csv} option
+totally overrides normal record processing with @code{RS} (@pxref{Records}),
+as well as field splitting with any of @code{FS}, @code{FIELDWIDTHS},
+or @code{FPAT}.
@cindex Kernighan, Brian @subentry quotes
@sidebar Carriage-Return--Line-Feed Line Endings In CSV Files
@@ -7719,9 +7721,13 @@ totally overrides normal record processing with @code{RS} (@pxref{Records}).
Many CSV files are imported from systems where the line terminator
for text files is a carriage-return--line-feed pair
(CR-LF, @samp{\r} followed by @samp{\n}).
-For ease of use, when processing CSV files, @command{gawk} simply
-includes the carriage-return character in the record terminator
-when it occurs immediately prior to a line-feed character in the input.
+For ease of use, when processing CSV files, @command{gawk} converts
+CR-LF pairs into a single newline. That is, the @samp{\r} is removed.
+
+This occurs only when a CR is paired with an LF; a standalone CR
+is left alone. This behavior is consistent with Windows systems
+which automatically convert CR-LF in files into a plain LF in memory,
+and also with the commonly available @command{unix2dos} utility program.
@end sidebar
The behavior of the @code{split()} function (not formally discussed
diff --git a/doc/wordlist b/doc/wordlist
index b37f31e8..8deb26ab 100644
--- a/doc/wordlist
+++ b/doc/wordlist
@@ -1574,6 +1574,7 @@ readline
realdata
realloc
realprogram
+rec
recomputation
redir
reenable
@@ -1762,6 +1763,7 @@ tlines
tm
tmp
tmy
+tocsv
tolower
toolset
toupper
diff --git a/io.c b/io.c
index 4f230ea2..dccf3952 100644
--- a/io.c
+++ b/io.c
@@ -3857,21 +3857,20 @@ csvscan(IOBUF *iop, struct recmatch *recm, SCANSTATE *state)
in_quote = ! in_quote;
bp++;
}
+ if (bp > iop->off && bp[-1] == '\r') {
+ // convert CR-LF to LF by shifting the record
+ memmove(bp - 1, bp, iop->dataend - bp);
+ iop->dataend--;
+ bp--;
+ }
} while (in_quote && bp < iop->dataend && bp++);
/* set len to what we have so far, in case this is all there is */
recm->len = bp - recm->start;
if (bp < iop->dataend) { /* found it in the buffer */
- if (bp > iop->off && bp[-1] == '\r') {
- /* handle CR LF conventional CSV record terminator */
- recm->rt_start = bp - 1;
- recm->rt_len = 2;
- }
- else {
- recm->rt_start = bp;
- recm->rt_len = 1;
- }
+ recm->rt_start = bp;
+ recm->rt_len = 1;
*state = NOSTATE;
return REC_OK;
} else {
diff --git a/pc/ChangeLog b/pc/ChangeLog
index 8ecc67b9..7246d200 100644
--- a/pc/ChangeLog
+++ b/pc/ChangeLog
@@ -1,3 +1,7 @@
+2023-04-13 Arnold D. Robbins <arnold@skeeve.com>
+
+ * Makefile.tst: Regenerated.
+
2023-03-24 Arnold D. Robbins <arnold@skeeve.com>
* Makefile.tst: Regenerated.
diff --git a/pc/Makefile.tst b/pc/Makefile.tst
index bd65e660..7af87c3c 100644
--- a/pc/Makefile.tst
+++ b/pc/Makefile.tst
@@ -190,7 +190,7 @@ GAWK_EXT_TESTS = \
aadelete1 aadelete2 aarray1 aasort aasorti argtest arraysort \
arraysort2 arraytype asortbool backw badargs beginfile1 beginfile2 \
binmode1 charasbytes clos1way clos1way2 clos1way3 clos1way4 \
- clos1way5 clos1way6 colonwarn commas crlf csv1 csv2 csv3 \
+ clos1way5 clos1way6 colonwarn commas crlf csv1 csv2 csv3 csvodd \
dbugeval dbugeval2 \
dbugeval3 dbugeval4 dbugtypedre1 dbugtypedre2 delsub \
devfd devfd1 devfd2 dfacheck1 dumpvars \
@@ -293,7 +293,7 @@ NEED_TRADITIONAL = litoct tradanch rscompat
NEED_PMA = pma
# List of tests that need --csv
-NEED_CSV = csv1 csv2 csv3
+NEED_CSV = csv1 csv2 csv3 csvodd
# Lists of tests that run a shell script
RUN_SHELL = exit fflush localenl modifiers next randtest rtlen rtlen01
@@ -2738,6 +2738,11 @@ csv3:
@-AWKPATH="$(srcdir)" $(AWK) -f $@.awk --csv < "$(srcdir)"/$@.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
@-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@
+csvodd:
+ @echo $@
+ @-AWKPATH="$(srcdir)" $(AWK) -f $@.awk --csv < "$(srcdir)"/$@.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
+ @-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@
+
dbugeval2:
@echo $@
@-AWKPATH="$(srcdir)" $(AWK) -f $@.awk --debug < "$(srcdir)"/$@.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
diff --git a/test/ChangeLog b/test/ChangeLog
index e32cd584..4d5569e7 100644
--- a/test/ChangeLog
+++ b/test/ChangeLog
@@ -1,3 +1,9 @@
+2023-04-13 Manuel Collado <mcollado2011@gmail.com>
+
+ * Makefile.am (EXTRA_DIST): New test: csvodd.
+ (NEED_CSV): Add csvodd.
+ * New files: csvodd.awk, csvodd.in, csvodd.ok:
+
2023-04-07 zhou shuiqing <zhoushuiqing321@outlook.com>
* sort1.awk: Add tests for ind_num_desc and @ind_str_asc.
diff --git a/test/Makefile.am b/test/Makefile.am
index 026db35f..fb1bf7be 100644
--- a/test/Makefile.am
+++ b/test/Makefile.am
@@ -231,6 +231,9 @@ EXTRA_DIST = \
csv3.awk \
csv3.in \
csv3.ok \
+ csvodd.awk \
+ csvodd.in \
+ csvodd.ok \
datanonl.awk \
datanonl.in \
datanonl.ok \
@@ -1513,7 +1516,7 @@ GAWK_EXT_TESTS = \
aadelete1 aadelete2 aarray1 aasort aasorti argtest arraysort \
arraysort2 arraytype asortbool backw badargs beginfile1 beginfile2 \
binmode1 charasbytes clos1way clos1way2 clos1way3 clos1way4 \
- clos1way5 clos1way6 colonwarn commas crlf csv1 csv2 csv3 \
+ clos1way5 clos1way6 colonwarn commas crlf csv1 csv2 csv3 csvodd \
dbugeval dbugeval2 \
dbugeval3 dbugeval4 dbugtypedre1 dbugtypedre2 delsub \
devfd devfd1 devfd2 dfacheck1 dumpvars \
@@ -1616,7 +1619,7 @@ NEED_TRADITIONAL = litoct tradanch rscompat
NEED_PMA = pma
# List of tests that need --csv
-NEED_CSV = csv1 csv2 csv3
+NEED_CSV = csv1 csv2 csv3 csvodd
# Lists of tests that run a shell script
RUN_SHELL = exit fflush localenl modifiers next randtest rtlen rtlen01
diff --git a/test/Makefile.in b/test/Makefile.in
index 4ef4c5df..b6cd07cc 100644
--- a/test/Makefile.in
+++ b/test/Makefile.in
@@ -495,6 +495,9 @@ EXTRA_DIST = \
csv3.awk \
csv3.in \
csv3.ok \
+ csvodd.awk \
+ csvodd.in \
+ csvodd.ok \
datanonl.awk \
datanonl.in \
datanonl.ok \
@@ -1777,7 +1780,7 @@ GAWK_EXT_TESTS = \
aadelete1 aadelete2 aarray1 aasort aasorti argtest arraysort \
arraysort2 arraytype asortbool backw badargs beginfile1 beginfile2 \
binmode1 charasbytes clos1way clos1way2 clos1way3 clos1way4 \
- clos1way5 clos1way6 colonwarn commas crlf csv1 csv2 csv3 \
+ clos1way5 clos1way6 colonwarn commas crlf csv1 csv2 csv3 csvodd \
dbugeval dbugeval2 \
dbugeval3 dbugeval4 dbugtypedre1 dbugtypedre2 delsub \
devfd devfd1 devfd2 dfacheck1 dumpvars \
@@ -1880,7 +1883,7 @@ NEED_TRADITIONAL = litoct tradanch rscompat
NEED_PMA = pma
# List of tests that need --csv
-NEED_CSV = csv1 csv2 csv3
+NEED_CSV = csv1 csv2 csv3 csvodd
# Lists of tests that run a shell script
RUN_SHELL = exit fflush localenl modifiers next randtest rtlen rtlen01
@@ -4508,6 +4511,11 @@ csv3:
@-AWKPATH="$(srcdir)" $(AWK) -f $@.awk --csv < "$(srcdir)"/$@.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
@-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@
+csvodd:
+ @echo $@
+ @-AWKPATH="$(srcdir)" $(AWK) -f $@.awk --csv < "$(srcdir)"/$@.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
+ @-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@
+
dbugeval2:
@echo $@
@-AWKPATH="$(srcdir)" $(AWK) -f $@.awk --debug < "$(srcdir)"/$@.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
diff --git a/test/Maketests b/test/Maketests
index 01db7413..165f143b 100644
--- a/test/Maketests
+++ b/test/Maketests
@@ -1427,6 +1427,11 @@ csv3:
@-AWKPATH="$(srcdir)" $(AWK) -f $@.awk --csv < "$(srcdir)"/$@.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
@-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@
+csvodd:
+ @echo $@
+ @-AWKPATH="$(srcdir)" $(AWK) -f $@.awk --csv < "$(srcdir)"/$@.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
+ @-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@
+
dbugeval2:
@echo $@
@-AWKPATH="$(srcdir)" $(AWK) -f $@.awk --debug < "$(srcdir)"/$@.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
diff --git a/test/csvodd.awk b/test/csvodd.awk
new file mode 100644
index 00000000..8d1948ad
--- /dev/null
+++ b/test/csvodd.awk
@@ -0,0 +1,13 @@
+# Show the string contents. Make begin, end, CR and LF visible.
+function show(str) {
+ gsub("\r", "\\r", str)
+ gsub("\n", "\\n", str)
+ printf("<%s>", str)
+}
+
+# Dump the current record
+{
+ show($0); show(RT); print ""
+ for (k=1; k<=NF; k++) show($k); print ""
+}
+
diff --git a/test/csvodd.in b/test/csvodd.in
new file mode 100644
index 00000000..59d8c207
--- /dev/null
+++ b/test/csvodd.in
@@ -0,0 +1,25 @@
+Normal record
+a,b,c,d
+EOL = CR+LF
+a,b,c,d
+EOL = CR+CR+LF
+a,b,c,d
+Quoted field
+a,"b,c",d
+Quote in quoted field
+a,"b""c",d
+Null fields
+,a,b,,c,d,
+Quoted null field, EOL = CR+LF
+a,b,"",c,d
+Embedded LF
+a,"b
+c",d
+Embedded CR+LF and LF, EOL = CR+LF
+"a
+b","c
+d"
+Embedded plain CR, EOL = LR
+a,b c,d
+No EOL at EOF
+a,b,c,d
\ No newline at end of file
diff --git a/test/csvodd.ok b/test/csvodd.ok
new file mode 100644
index 00000000..cf9c44a4
--- /dev/null
+++ b/test/csvodd.ok
@@ -0,0 +1,44 @@
+<Normal record><\n>
+<Normal record>
+<a,b,c,d><\n>
+<a><b><c><d>
+<EOL = CR+LF><\n>
+<EOL = CR+LF>
+<a,b,c,d><\n>
+<a><b><c><d>
+<EOL = CR+CR+LF><\n>
+<EOL = CR+CR+LF>
+<a,b,c,d><\n>
+<a><b><c><d>
+<Quoted field><\n>
+<Quoted field>
+<a,"b,c",d><\n>
+<a><b,c><d>
+<Quote in quoted field><\n>
+<Quote in quoted field>
+<a,"b""c",d><\n>
+<a><b"c><d>
+<Null fields><\n>
+<Null fields>
+<,a,b,,c,d,><\n>
+<><a><b><><c><d><>
+<Quoted null field, EOL = CR+LF><\n>
+<Quoted null field>< EOL = CR+LF>
+<a,b,"",c,d><\n>
+<a><b><><c><d>
+<Embedded LF><\n>
+<Embedded LF>
+<a,"b\nc",d><\n>
+<a><b\nc><d>
+<Embedded CR+LF and LF, EOL = CR+LF><\n>
+<Embedded CR+LF and LF>< EOL = CR+LF>
+<"a\nb","c\nd"><\n>
+<a\nb><c\nd>
+<Embedded plain CR, EOL = LR><\n>
+<Embedded plain CR>< EOL = LR>
+<a,b\rc,d><\n>
+<a><b\rc><d>
+<No EOL at EOF><\n>
+<No EOL at EOF>
+<a,b,c,d><>
+<a><b><c><d>
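
The ChangeLog entry for io.c describes the new rule as: CR-LF pairs become a plain LF, both at the end of the record and when embedded, while standalone CRs are not touched. The following is a minimal standalone sketch of that rule only, not the csvscan() code itself. The function name crlf_to_lf and its buffer-plus-length interface are assumptions made for illustration; the real code works on the IOBUF in place while it scans for the record terminator.

```c
#include <assert.h>
#include <stddef.h>

/*
 * crlf_to_lf --- hypothetical helper, not part of gawk.
 *
 * Collapse every CR-LF pair in buf[0..len) to a single LF, in place.
 * A CR that is not immediately followed by an LF is kept as data.
 * Returns the new length of the data in buf.
 */
size_t
crlf_to_lf(char *buf, size_t len)
{
	size_t src, dst = 0;

	assert(buf != NULL || len == 0);

	for (src = 0; src < len; src++) {
		/* Drop a CR only when the very next byte is an LF. */
		if (buf[src] == '\r' && src + 1 < len && buf[src + 1] == '\n')
			continue;	/* the LF is copied on the next pass */
		buf[dst++] = buf[src];
	}

	return dst;
}
```

Copying forward with two indices keeps the sketch linear in the buffer size; the patch instead uses memmove() at each pair it finds, which fits its incremental scan of the input buffer.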