-rw-r--r-- ChangeLog 140
-rw-r--r-- NEWS 7
-rw-r--r-- awk.h 6
-rw-r--r-- doc/ChangeLog 10
-rw-r--r-- doc/awkcard.in 8
-rw-r--r-- doc/gawk.1 15
-rw-r--r-- extension/ChangeLog 38
-rw-r--r-- extension/Makefile.am 8
-rw-r--r-- extension/Makefile.in 33
-rw-r--r-- extension/readdir.c 8
-rw-r--r-- extension/readdir_test.c 343
-rw-r--r-- extension/readfile.c 3
-rw-r--r-- extension/revtwoway.c 3
-rw-r--r-- field.c 219
-rw-r--r-- gawkapi.h 38
-rw-r--r-- io.c 24
-rw-r--r-- main.c 17
-rw-r--r-- test/ChangeLog 18
-rw-r--r-- test/Makefile.am 19
-rw-r--r-- test/Makefile.in 23
-rw-r--r-- test/Maketests 2
-rw-r--r-- test/fwtest3.awk 7
-rw-r--r-- test/fwtest3.ok 13
-rw-r--r-- test/fwtest4.awk 1
-rw-r--r-- test/fwtest4.in (renamed from test/fwtest3.in) 0
-rw-r--r-- test/fwtest4.ok 1
26 files changed, 878 insertions, 126 deletions
diff --git a/ChangeLog b/ChangeLog
index 6822c6f4..d740b43a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,13 @@
2017-03-27 Arnold D. Robbins <arnold@skeeve.com>
+ * field.c (parse_field_func_t): New typedef. Used as needed.
+ (fw_parse_field): Edit comment about resetting shift state.
+ (set_parser): Fix leading comment's style and type of argument.
+ (set_FIELDWIDTHS): Improve the fatal error message.
+ * gawkapi.h: Minor edits in some comments.
+
+2017-03-27 Arnold D. Robbins <arnold@skeeve.com>
+
Cause EPIPE errors to stdout to generate a real SIGPIPE.
* awk.h (die_via_sigpipe): New macro.
@@ -26,6 +34,60 @@
* config.sub: Updated again.
+2017-03-22 Andrew J. Schorr <aschorr@telemetry-investments.com>
+
+ * NEWS: Document new PROCINFO["FS"] value of "API".
+
+2017-03-22 Andrew J. Schorr <aschorr@telemetry-investments.com>
+
+ * NEWS: Document new FIELDWIDTHS skip capability and API input parser
+ field parsing enhancement.
+
+2017-03-22 Andrew J. Schorr <aschorr@telemetry-investments.com>
+
+ * gawkapi.h (awk_input_buf_t): Update get_record comment regarding the
+ new field_width argument.
+
+2017-03-21 Andrew J. Schorr <aschorr@telemetry-investments.com>
+
+ * gawkapi.h (awk_fieldwidth_info_t): Define new structure to contain
+ API field parsing info, replacing the previous awk_input_field_info_t
+ array.
+ (awk_fieldwidth_info_size): Define macro to calculate size of the
+ variable-length awk_fieldwidth_info_t structure.
+ (awk_input_buf_t): Update get_record prototype to update the type
+ of the final field_width argument from 'const awk_input_field_info_t **'
+ to 'const awk_fieldwidth_info_t **'.
+ * awk.h (set_record): Change 3rd argument from
+ 'const awk_input_field_info_t *' to 'const awk_fieldwidth_info_t *'.
+ * io.c (inrec, do_getline_redir, do_getline): Change field_width type
+ from 'const awk_input_field_info_t *' to
+ 'const awk_fieldwidth_info_t *'.
+ (get_a_record): Change field_width argument type from
+ 'const awk_input_field_info_t **' to 'const awk_fieldwidth_info_t **'.
+ * field.c (api_parser_override): Define new boolean to track whether
+ API parsing is currently overriding default parsing behavior.
+ (api_fw): Change type from 'const awk_input_field_info_t *'
+ to 'const awk_fieldwidth_info_t *'.
+ (FIELDWIDTHS): Change type from 'int *' to 'awk_fieldwidth_info_t *'.
+ (set_record): Use new boolean api_parser_override to track whether
+ API parsing override is in effect, since we can no longer discern
+ this from the value of parse_field -- FIELDWIDTHS parsing uses the
+ same function.
+ (calc_mbslen): New function to calculate the length of a multi-byte
+ string.
+ (fw_parse_field): Enhance to support the awk_fieldwidth_info_t
+ structure instead of simply using an array of integer field widths.
+ (api_parse_field): Remove function no longer needed since fw_parse_field
+ now supports both FIELDWIDTHS and API parsing.
+ (set_parser): Use api_parser_override instead of comparing parse_field
+ to api_parse_field.
+ (set_FIELDWIDTHS): Enhance to use new awk_fieldwidth_info_t structure
+ and parse new skip prefix for each field.
+ (current_field_sep): Use api_parser_override flag instead of comparing
+ to api_parse_field.
+ (current_field_sep_str): Ditto.
+
2017-03-20 Arnold D. Robbins <arnold@skeeve.com>
Improve handling of EPIPE. Problems reported by
@@ -55,6 +117,84 @@
* configure.ac: Some cleanups.
+2017-03-09 Andrew J. Schorr <aschorr@telemetry-investments.com>
+
+ * gawkapi.h (awk_input_field_info_t): Define new structure to contain
+ API field parsing info.
+ (awk_input_buf_t): Update get_record prototype to use an array of
+ awk_input_field_info_t instead of integers.
+ * awk.h (set_record): Change 3rd argument from 'const int *' to
+ 'const awk_input_field_info_t *'.
+ * field.c (api_fw): Now points to an array of awk_input_field_info_t
+ instead of integers.
+ (set_record): Change 3rd argument to point to an array of
+ awk_input_field_info_t.
+ (api_parse_field): Update parsing logic to use awk_input_field_info_t
+ structures instead of an array of integers.
+ * io.c (inrec, do_getline_redir, do_getline): Change field_width type
+ from 'const int *' to 'const awk_input_field_info_t *'.
+ (get_a_record): Change field_width argument type from 'const int **'
+ to 'const awk_input_field_info_t **'.
+
+2017-03-09 Arnold D. Robbins <arnold@skeeve.com>
+
+ * field.c: Minor style edits.
+
+2017-03-06 Andrew J. Schorr <aschorr@telemetry-investments.com>
+
+ * field.c (normal_parse_field): Renamed from save_parse_field to reflect
+ better its purpose. Added a comment to explain more clearly what's
+ going on.
+ (set_record, set_parser): Rename save_parse_field to normal_parse_field.
+
+2017-03-06 Andrew J. Schorr <aschorr@telemetry-investments.com>
+
+ * gawkapi.h (awk_input_buf_t): Remove field_width array and instead
+ add it as a 6th argument to the get_record function. This should
+ not break existing code, since it's fine to ignore the additional
+ argument. Document the behavior of the field_width argument.
+ * io.c (inrec): Pass pointer to field_width array to get_a_record,
+ and then hand it off to set_record.
+ (do_getline_redir): If not reading into a variable, pass pointer to
+ field_width array to get_a_record and then hand it off to set_record.
+ (do_getline): Ditto.
+ (get_a_record): Add a 4th field_width argument to pass through to
+ the API get_record method.
+
+2017-03-05 Andrew J. Schorr <aschorr@telemetry-investments.com>
+
+ * awk.h (set_record): Add a new argument containing a field-width
+ array returned by an API parser.
+ (field_sep_type): Add new enum value Using_API.
+ (current_field_sep_str): Declare new function.
+ * field.c (save_parse_field): New static variable to save the
+ parse_field value in cases where it's overridden by API parsing.
+ (api_fw): New static variable to hold pointer to API parser fieldwidth
+ array.
+ (set_record): Add new field-width array argument. If present, API
+ parsing will override the default parsing mechanism.
+ (api_parse_field): New field parser using field widths supplied by the
+ API. This is very similar to the existing fw_parse_field function.
+ (get_field): Fix typo in comment.
+ (set_parser): New function to set default parser and check whether
+ there's an API parser override in effect. Update PROCINFO["FS"] if
+ something has changed.
+ (set_FIELDWIDTHS): Use set_parser and stop updating PROCINFO["FS"].
+ (set_FS): Ditto.
+ (set_FPAT): Ditto.
+ (current_field_sep): Return Using_API when using the API field parsing
+ widths.
+ (current_field_sep_str): New function to return the proper string
+ value for PROCINFO["FS"].
+ * gawkapi.h (awk_input_buf_t): Add field_width array to enable the
+ parser get_record function to supply field widths to override the
+ default gawk field parsing mechanism.
+ * io.c (inrec): Pass iop->public.field_width to set_record as the
+ 3rd argument to enable API field parsing overrides.
+ (do_getline_redir, do_getline): Ditto.
+ * main.c (load_procinfo): Use new current_field_sep_str function
+ instead of switching on the return value from current_field_sep.
+
2017-02-23 Arnold D. Robbins <arnold@skeeve.com>
* awk.h (boolval): Return bool instead of int.
diff --git a/NEWS b/NEWS
index 3ddc96e9..2503f5f8 100644
--- a/NEWS
+++ b/NEWS
@@ -104,6 +104,13 @@ Changes from 4.1.x to 4.2.0
argument is present and is non-zero or non-null, the time will be converted
from UTC instead of from the local timezone.
+26. The FIELDWIDTHS parsing syntax has been enhanced to allow specifying
+ how many characters to skip before a field starts.
+
+27. An API input parser now has the ability to override the default field
+ parsing mechanism by specifying the locations of each field in the input
+ record. When this is in effect, PROCINFO["FS"] will be set to "API".
+
Changes from 4.1.3 to 4.1.4
---------------------------
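
NEWS item 26 above describes an optional skip count in front of each FIELDWIDTHS entry. The following is a minimal sketch of how such a list could be parsed, assuming each item is written either as "width" or as "skip:width". The names struct fw_entry and parse_fw_spec are hypothetical and exist only for this illustration; this is not gawk's set_FIELDWIDTHS code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical type for this sketch; not gawk's awk_fieldwidth_info_t. */
struct fw_entry {
	size_t skip;	/* characters to skip before the field starts */
	size_t len;	/* width of the field itself */
};

/*
 * parse_fw_spec --- parse a whitespace-separated list of "width" or
 * "skip:width" items into out[].  Returns the number of entries parsed,
 * or -1 if an item is malformed or more than max items are present.
 */
static int
parse_fw_spec(const char *spec, struct fw_entry *out, int max)
{
	int n = 0;

	assert(spec != NULL && out != NULL);
	for (;;) {
		char *end;
		unsigned long first;

		while (isspace((unsigned char) *spec))
			spec++;
		if (*spec == '\0')
			break;
		if (! isdigit((unsigned char) *spec) || n >= max)
			return -1;
		first = strtoul(spec, &end, 10);
		if (*end == ':') {
			/* "skip:width": the first number is the skip count */
			spec = end + 1;
			if (! isdigit((unsigned char) *spec))
				return -1;
			out[n].skip = first;
			out[n].len = strtoul(spec, &end, 10);
		} else {
			/* plain "width": nothing is skipped */
			out[n].skip = 0;
			out[n].len = first;
		}
		/* each item must be followed by whitespace or end of string */
		if (*end != '\0' && ! isspace((unsigned char) *end))
			return -1;
		spec = end;
		n++;
	}
	return n;
}
```

With this sketch, the string "3 2:5 1:1" yields three entries: a 3-character field, then 2 skipped characters followed by a 5-character field, then 1 skipped character followed by a 1-character field.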
diff --git a/awk.h b/awk.h
index c1e9b4a9..934fe25b 100644
--- a/awk.h
+++ b/awk.h
@@ -1510,7 +1510,7 @@ extern NODE *get_actual_argument(NODE *, int, bool);
#endif
/* field.c */
extern void init_fields(void);
-extern void set_record(const char *buf, int cnt);
+extern void set_record(const char *buf, int cnt, const awk_fieldwidth_info_t *);
extern void reset_record(void);
extern void rebuild_record(void);
extern void set_NF(void);
@@ -1527,9 +1527,11 @@ extern void update_PROCINFO_num(const char *subscript, AWKNUM val);
typedef enum {
Using_FS,
Using_FIELDWIDTHS,
- Using_FPAT
+ Using_FPAT,
+ Using_API
} field_sep_type;
extern field_sep_type current_field_sep(void);
+extern const char *current_field_sep_str(void);
/* gawkapi.c: */
extern gawk_api_t api_impl;
diff --git a/doc/ChangeLog b/doc/ChangeLog
index dd3e1c93..8c7876f4 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -5,6 +5,16 @@
* wordlist2: New file.
* Makefile.am: Revised for new document. Copyright years updated.
+2017-03-22 Andrew J. Schorr <aschorr@telemetry-investments.com>
+
+ * gawk.1: Document new PROCINFO["FS"] value "API".
+
+2017-03-22 Andrew J. Schorr <aschorr@telemetry-investments.com>
+
+ * awkcard.in: Document FIELDWIDTHS enhancement to support an optional
+ field skip prefix.
+ * gawk.1: Ditto.
+
2017-03-17 Arnold D. Robbins <arnold@skeeve.com>
* gawktexi.in: Improve the discussion of quoting on
diff --git a/doc/awkcard.in b/doc/awkcard.in
index 418cc8d9..86aeee2e 100644
--- a/doc/awkcard.in
+++ b/doc/awkcard.in
@@ -556,8 +556,10 @@ fails, or if
\*(FCclose()\*(FR fails.
T}
\*(FCFIELDWIDTHS\fP T{
-Whitespace separated list of field widths. Used
-to parse the input into fields of fixed width,
+Whitespace-separated list of field widths.
+Each field width may optionally be preceded by a colon-separated
+value specifying the number of characters to skip before the field starts.
+Used to parse the input into fields of fixed width,
instead of the value of \*(FCFS\fP.\*(CD
T}
\*(FCFILENAME\fP T{
@@ -1017,6 +1019,8 @@ also affects how fields are split when
variable is set to a space-separated list of numbers, each field is
expected to have a fixed width, and \*(GK
splits up the record using the specified widths.
+Each field width may optionally be preceded by a colon-separated
+value specifying the number of characters to skip before the field starts.
The value of \*(FCFS\fP is ignored.
Assigning a new value to \*(FCFS\fP or \*(FCFPAT\fP
overrides the use of \*(FCFIELDWIDTHS\*(FR.
diff --git a/doc/gawk.1 b/doc/gawk.1
index 2460a686..a4f691d6 100644
--- a/doc/gawk.1
+++ b/doc/gawk.1
@@ -805,10 +805,13 @@ is a regular expression.
.PP
If the
.B FIELDWIDTHS
-variable is set to a space separated list of numbers, each field is
+variable is set to a space-separated list of numbers, each field is
expected to have fixed width, and
.I gawk
-splits up the record using the specified widths. The value of
+splits up the record using the specified widths.
+Each field width may optionally be preceded by a colon-separated
+value specifying the number of characters to skip before the field starts.
+The value of
.B FS
is ignored.
Assigning a new value to
@@ -959,12 +962,14 @@ For non-system errors,
will be zero.
.TP
.B FIELDWIDTHS
-A whitespace separated list of field widths. When set,
+A whitespace-separated list of field widths. When set,
.I gawk
parses the input into fields of fixed width, instead of using the
value of the
.B FS
variable as the field separator.
+Each field width may optionally be preceded by a colon-separated
+value specifying the number of characters to skip before the field starts.
See
.BR Fields ,
above.
@@ -1133,8 +1138,10 @@ is in effect,
\fB"FPAT"\fP if field splitting with
.B FPAT
is in effect,
-or \fB"FIELDWIDTHS"\fP if field splitting with
+\fB"FIELDWIDTHS"\fP if field splitting with
.B FIELDWIDTHS
+is in effect,
+or \fB"API"\fP if API input parser field splitting
is in effect.
.TP
\fBPROCINFO["gid"]\fP
diff --git a/extension/ChangeLog b/extension/ChangeLog
index 9ea2ea9a..d10dc766 100644
--- a/extension/ChangeLog
+++ b/extension/ChangeLog
@@ -4,6 +4,37 @@
wrong argument count error message. Thanks to Dan Neilsen
for the report.
+2017-03-27 Arnold D. Robbins <arnold@skeeve.com>
+
+ * readdir.c: Minor edits.
+ * readdir_test.c: Same minor edits, update copyright year,
+ bump version of extension in case this ever becomes the real one.
+
+2017-03-23 Arnold D. Robbins <arnold@skeeve.com>
+
+ * readdir.c (dir_get_record): Add additional parameter to make types
+ match and remove compiler warning.
+ * readfile.c (readfile_get_record): Ditto.
+ * revtwoway.c (rev2way_get_record): Ditto.
+
+2017-03-21 Andrew J. Schorr <aschorr@telemetry-investments.com>
+
+ * readdir_test.c (open_directory_t): Replace field_width array
+ with new awk_fieldwidth_info_t structure. Wrap it in a union so
+ we can allocate the proper size.
+ (dir_get_record): Update field_width type from
+ 'const awk_input_field_info_t **' to 'const awk_fieldwidth_info_t **'.
+ Update new fieldwidth parsing info appropriately.
+ (dir_take_control_of): Populate new fieldwidth parsing structure
+ with initial values.
+
+2017-03-09 Andrew J. Schorr <aschorr@telemetry-investments.com>
+
+ * readdir_test.c (open_directory_t): Update field_width type from an
+ array of integers to an array of awk_input_field_info_t.
+ (dir_get_record): Ditto.
+ (dir_take_control_of): Ditto.
+
2017-03-07 Andrew J. Schorr <aschorr@telemetry-investments.com>
* Makefile.am (pkgextension_LTLIBRARIES): Remove testext.la, since it
@@ -16,6 +47,13 @@
installed, automake cannot use the final destination directory to
determine -rpath by itself. The value doesn't matter.
+2017-03-06 Andrew J. Schorr <aschorr@telemetry-investments.com>
+
+ * readdir_test.c: Test extension using new get_record field_width
+ parsing feature.
+ * Makefile.am (noinst_LTLIBRARIES): Add readdir_test.la.
+ (readdir_test_la_*): Configure building of new extension library.
+
2017-01-21 Eli Zaretskii <eliz@gnu.org>
* testext.c (getuid) [__MINGW32__]: New function, mirrors what
diff --git a/extension/Makefile.am b/extension/Makefile.am
index 185bc795..6ea16f5d 100644
--- a/extension/Makefile.am
+++ b/extension/Makefile.am
@@ -48,6 +48,7 @@ pkgextension_LTLIBRARIES = \
time.la
noinst_LTLIBRARIES = \
+ readdir_test.la \
testext.la
MY_MODULE_FLAGS = -module -avoid-version -no-undefined
@@ -106,6 +107,13 @@ testext_la_SOURCES = testext.c
testext_la_LDFLAGS = $(MY_MODULE_FLAGS) -rpath /foo
testext_la_LIBADD = $(MY_LIBS)
+# N.B. Because we are not installing readdir_test, we must specify -rpath in
+# LDFLAGS to get automake to build a shared library, since it needs
+# an installation path.
+readdir_test_la_SOURCES = readdir_test.c
+readdir_test_la_LDFLAGS = $(MY_MODULE_FLAGS) -rpath /foo
+readdir_test_la_LIBADD = $(MY_LIBS)
+
install-data-hook:
for i in $(pkgextension_LTLIBRARIES) ; do \
$(RM) $(DESTDIR)$(pkgextensiondir)/$$i ; \
diff --git a/extension/Makefile.in b/extension/Makefile.in
index 6557693a..c0e2676b 100644
--- a/extension/Makefile.in
+++ b/extension/Makefile.in
@@ -199,6 +199,13 @@ readdir_la_OBJECTS = $(am_readdir_la_OBJECTS)
readdir_la_LINK = $(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) \
$(LIBTOOLFLAGS) --mode=link $(CCLD) $(AM_CFLAGS) $(CFLAGS) \
$(readdir_la_LDFLAGS) $(LDFLAGS) -o $@
+readdir_test_la_DEPENDENCIES = $(am__DEPENDENCIES_1)
+am_readdir_test_la_OBJECTS = readdir_test.lo
+readdir_test_la_OBJECTS = $(am_readdir_test_la_OBJECTS)
+readdir_test_la_LINK = $(LIBTOOL) $(AM_V_lt) --tag=CC \
+ $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=link $(CCLD) \
+ $(AM_CFLAGS) $(CFLAGS) $(readdir_test_la_LDFLAGS) $(LDFLAGS) \
+ -o $@
readfile_la_DEPENDENCIES = $(am__DEPENDENCIES_1)
am_readfile_la_OBJECTS = readfile.lo
readfile_la_OBJECTS = $(am_readfile_la_OBJECTS)
@@ -271,14 +278,16 @@ am__v_CCLD_0 = @echo " CCLD " $@;
am__v_CCLD_1 =
SOURCES = $(filefuncs_la_SOURCES) $(fnmatch_la_SOURCES) \
$(fork_la_SOURCES) $(inplace_la_SOURCES) $(ordchr_la_SOURCES) \
- $(readdir_la_SOURCES) $(readfile_la_SOURCES) \
- $(revoutput_la_SOURCES) $(revtwoway_la_SOURCES) \
- $(rwarray_la_SOURCES) $(testext_la_SOURCES) $(time_la_SOURCES)
+ $(readdir_la_SOURCES) $(readdir_test_la_SOURCES) \
+ $(readfile_la_SOURCES) $(revoutput_la_SOURCES) \
+ $(revtwoway_la_SOURCES) $(rwarray_la_SOURCES) \
+ $(testext_la_SOURCES) $(time_la_SOURCES)
DIST_SOURCES = $(filefuncs_la_SOURCES) $(fnmatch_la_SOURCES) \
$(fork_la_SOURCES) $(inplace_la_SOURCES) $(ordchr_la_SOURCES) \
- $(readdir_la_SOURCES) $(readfile_la_SOURCES) \
- $(revoutput_la_SOURCES) $(revtwoway_la_SOURCES) \
- $(rwarray_la_SOURCES) $(testext_la_SOURCES) $(time_la_SOURCES)
+ $(readdir_la_SOURCES) $(readdir_test_la_SOURCES) \
+ $(readfile_la_SOURCES) $(revoutput_la_SOURCES) \
+ $(revtwoway_la_SOURCES) $(rwarray_la_SOURCES) \
+ $(testext_la_SOURCES) $(time_la_SOURCES)
RECURSIVE_TARGETS = all-recursive check-recursive cscopelist-recursive \
ctags-recursive dvi-recursive html-recursive info-recursive \
install-data-recursive install-dvi-recursive \
@@ -520,6 +529,7 @@ pkgextension_LTLIBRARIES = \
time.la
noinst_LTLIBRARIES = \
+ readdir_test.la \
testext.la
MY_MODULE_FLAGS = -module -avoid-version -no-undefined
@@ -567,6 +577,13 @@ time_la_LIBADD = $(MY_LIBS)
testext_la_SOURCES = testext.c
testext_la_LDFLAGS = $(MY_MODULE_FLAGS) -rpath /foo
testext_la_LIBADD = $(MY_LIBS)
+
+# N.B. Because we are not installing readdir_test, we must specify -rpath in
+# LDFLAGS to get automake to build a shared library, since it needs
+# an installation path.
+readdir_test_la_SOURCES = readdir_test.c
+readdir_test_la_LDFLAGS = $(MY_MODULE_FLAGS) -rpath /foo
+readdir_test_la_LIBADD = $(MY_LIBS)
EXTRA_DIST = build-aux/config.rpath \
ChangeLog \
ChangeLog.0 \
@@ -702,6 +719,9 @@ ordchr.la: $(ordchr_la_OBJECTS) $(ordchr_la_DEPENDENCIES) $(EXTRA_ordchr_la_DEPE
readdir.la: $(readdir_la_OBJECTS) $(readdir_la_DEPENDENCIES) $(EXTRA_readdir_la_DEPENDENCIES)
$(AM_V_CCLD)$(readdir_la_LINK) -rpath $(pkgextensiondir) $(readdir_la_OBJECTS) $(readdir_la_LIBADD) $(LIBS)
+readdir_test.la: $(readdir_test_la_OBJECTS) $(readdir_test_la_DEPENDENCIES) $(EXTRA_readdir_test_la_DEPENDENCIES)
+ $(AM_V_CCLD)$(readdir_test_la_LINK) $(readdir_test_la_OBJECTS) $(readdir_test_la_LIBADD) $(LIBS)
+
readfile.la: $(readfile_la_OBJECTS) $(readfile_la_DEPENDENCIES) $(EXTRA_readfile_la_DEPENDENCIES)
$(AM_V_CCLD)$(readfile_la_LINK) -rpath $(pkgextensiondir) $(readfile_la_OBJECTS) $(readfile_la_LIBADD) $(LIBS)
@@ -733,6 +753,7 @@ distclean-compile:
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/inplace.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ordchr.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/readdir.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/readdir_test.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/readfile.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/revoutput.Plo@am__quote@
@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/revtwoway.Plo@am__quote@
diff --git a/extension/readdir.c b/extension/readdir.c
index 39acba68..2e34456e 100644
--- a/extension/readdir.c
+++ b/extension/readdir.c
@@ -51,7 +51,7 @@
#ifdef HAVE_DIRENT_H
#include <dirent.h>
#else
-#error Cannot compile the dirent extension on this system!
+#error Cannot compile the readdir extension on this system!
#endif
#ifdef __MINGW32__
@@ -137,6 +137,7 @@ ftype(struct dirent *entry, const char *dirname)
}
/* get_inode --- get the inode of a file */
+
static long long
get_inode(struct dirent *entry, const char *dirname)
{
@@ -168,7 +169,8 @@ get_inode(struct dirent *entry, const char *dirname)
static int
dir_get_record(char **out, awk_input_buf_t *iobuf, int *errcode,
- char **rt_start, size_t *rt_len)
+ char **rt_start, size_t *rt_len,
+ const awk_fieldwidth_info_t **unused)
{
DIR *dp;
struct dirent *dirent;
@@ -198,7 +200,7 @@ dir_get_record(char **out, awk_input_buf_t *iobuf, int *errcode,
return EOF;
}
- ino = get_inode (dirent, iobuf->name);
+ ino = get_inode(dirent, iobuf->name);
#if __MINGW32__
len = sprintf(the_dir->buf, "%I64u/%s", ino, dirent->d_name);
diff --git a/extension/readdir_test.c b/extension/readdir_test.c
new file mode 100644
index 00000000..6d6ee134
--- /dev/null
+++ b/extension/readdir_test.c
@@ -0,0 +1,343 @@
+/*
+ * readdir.c --- Provide an input parser to read directories
+ *
+ * Arnold Robbins
+ * arnold@skeeve.com
+ * Written 7/2012
+ *
+ * Andrew Schorr and Arnold Robbins: further fixes 8/2012.
+ * Simplified 11/2012.
+ */
+
+/*
+ * Copyright (C) 2012-2014, 2017 the Free Software Foundation, Inc.
+ *
+ * This file is part of GAWK, the GNU implementation of the
+ * AWK Programming Language.
+ *
+ * GAWK is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * GAWK is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA
+ */
+
+#ifdef HAVE_CONFIG_H
+#include <config.h>
+#endif
+
+#define _BSD_SOURCE
+#include <stdio.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#ifdef HAVE_LIMITS_H
+#include <limits.h>
+#endif
+
+#ifdef HAVE_DIRENT_H
+#include <dirent.h>
+#else
+#error Cannot compile the readdir extension on this system!
+#endif
+
+#ifdef __MINGW32__
+#define WIN32_LEAN_AND_MEAN
+#include <windows.h>
+#endif
+
+#include "gawkapi.h"
+
+#include "gawkdirfd.h"
+
+#include "gettext.h"
+#define _(msgid) gettext(msgid)
+#define N_(msgid) msgid
+
+#ifndef PATH_MAX
+#define PATH_MAX 1024 /* a good guess */
+#endif
+
+static const gawk_api_t *api; /* for convenience macros to work */
+static awk_ext_id_t *ext_id;
+static const char *ext_version = "readdir extension: version 2.0";
+
+static awk_bool_t init_readdir(void);
+static awk_bool_t (*init_func)(void) = init_readdir;
+
+int plugin_is_GPL_compatible;
+
+/* data type for the opaque pointer: */
+
+typedef struct open_directory {
+ DIR *dp;
+ char *buf;
+ union {
+ awk_fieldwidth_info_t fw;
+ char buf[awk_fieldwidth_info_size(3)];
+ } u;
+} open_directory_t;
+#define fw u.fw
+
+/* ftype --- return type of file as a single character string */
+
+static const char *
+ftype(struct dirent *entry, const char *dirname)
+{
+#ifdef DT_BLK
+ (void) dirname; /* silence warnings */
+ switch (entry->d_type) {
+ case DT_BLK: return "b";
+ case DT_CHR: return "c";
+ case DT_DIR: return "d";
+ case DT_FIFO: return "p";
+ case DT_LNK: return "l";
+ case DT_REG: return "f";
+ case DT_SOCK: return "s";
+ default:
+ case DT_UNKNOWN: return "u";
+ }
+#else
+ char fname[PATH_MAX];
+ struct stat sbuf;
+
+ strcpy(fname, dirname);
+ strcat(fname, "/");
+ strcat(fname, entry->d_name);
+ if (stat(fname, &sbuf) == 0) {
+ if (S_ISBLK(sbuf.st_mode))
+ return "b";
+ if (S_ISCHR(sbuf.st_mode))
+ return "c";
+ if (S_ISDIR(sbuf.st_mode))
+ return "d";
+ if (S_ISFIFO(sbuf.st_mode))
+ return "p";
+ if (S_ISREG(sbuf.st_mode))
+ return "f";
+#ifdef S_ISLNK
+ if (S_ISLNK(sbuf.st_mode))
+ return "l";
+#endif
+#ifdef S_ISSOCK
+ if (S_ISSOCK(sbuf.st_mode))
+ return "s";
+#endif
+ }
+ return "u";
+#endif
+}
+
+/* get_inode --- get the inode of a file */
+
+static long long
+get_inode(struct dirent *entry, const char *dirname)
+{
+#ifdef __MINGW32__
+ char fname[PATH_MAX];
+ HANDLE fh;
+ BY_HANDLE_FILE_INFORMATION info;
+
+ sprintf(fname, "%s\\%s", dirname, entry->d_name);
+ fh = CreateFile(fname, 0, 0, NULL, OPEN_EXISTING,
+ FILE_FLAG_BACKUP_SEMANTICS, NULL);
+ if (fh == INVALID_HANDLE_VALUE)
+ return 0;
+ if (GetFileInformationByHandle(fh, &info)) {
+ long long inode = info.nFileIndexHigh;
+
+ inode <<= 32;
+ inode += info.nFileIndexLow;
+ return inode;
+ }
+ return 0;
+#else
+ (void) dirname; /* silence warnings */
+ return entry->d_ino;
+#endif
+}
+
+/* dir_get_record --- get one record at a time out of a directory */
+
+static int
+dir_get_record(char **out, awk_input_buf_t *iobuf, int *errcode,
+ char **rt_start, size_t *rt_len,
+ const awk_fieldwidth_info_t **field_width)
+{
+ DIR *dp;
+ struct dirent *dirent;
+ int len, flen;
+ open_directory_t *the_dir;
+ const char *ftstr;
+ unsigned long long ino;
+
+ /*
+ * The caller sets *errcode to 0, so we should set it only if an
+ * error occurs.
+ */
+
+ if (out == NULL || iobuf == NULL || iobuf->opaque == NULL)
+ return EOF;
+
+ the_dir = (open_directory_t *) iobuf->opaque;
+ dp = the_dir->dp;
+
+ /*
+ * Initialize errno, since readdir does not set it to zero on EOF.
+ */
+ errno = 0;
+ dirent = readdir(dp);
+ if (dirent == NULL) {
+ *errcode = errno; /* in case there was an error */
+ return EOF;
+ }
+
+ ino = get_inode(dirent, iobuf->name);
+
+#if __MINGW32__
+ len = sprintf(the_dir->buf, "%I64u", ino);
+#else
+ len = sprintf(the_dir->buf, "%llu", ino);
+#endif
+ the_dir->fw.fields[0].len = len;
+ len += (flen = sprintf(the_dir->buf + len, "/%s", dirent->d_name));
+ the_dir->fw.fields[1].len = flen-1;
+
+ ftstr = ftype(dirent, iobuf->name);
+ len += (flen = sprintf(the_dir->buf + len, "/%s", ftstr));
+ the_dir->fw.fields[2].len = flen-1;
+
+ *out = the_dir->buf;
+
+ *rt_start = NULL;
+ *rt_len = 0; /* set RT to "" */
+ if (field_width)
+ *field_width = & the_dir->fw;
+ return len;
+}
+
+/* dir_close --- close up when done */
+
+static void
+dir_close(awk_input_buf_t *iobuf)
+{
+ open_directory_t *the_dir;
+
+ if (iobuf == NULL || iobuf->opaque == NULL)
+ return;
+
+ the_dir = (open_directory_t *) iobuf->opaque;
+
+ closedir(the_dir->dp);
+ gawk_free(the_dir->buf);
+ gawk_free(the_dir);
+
+ iobuf->fd = -1;
+}
+
+/* dir_can_take_file --- return true if we want the file */
+
+static awk_bool_t
+dir_can_take_file(const awk_input_buf_t *iobuf)
+{
+ if (iobuf == NULL)
+ return awk_false;
+
+ return (iobuf->fd != INVALID_HANDLE && S_ISDIR(iobuf->sbuf.st_mode));
+}
+
+/*
+ * dir_take_control_of --- set up input parser.
+ * We can assume that dir_can_take_file just returned true,
+ * and no state has changed since then.
+ */
+
+static awk_bool_t
+dir_take_control_of(awk_input_buf_t *iobuf)
+{
+ DIR *dp;
+ open_directory_t *the_dir;
+ size_t size;
+
+ errno = 0;
+#ifdef HAVE_FDOPENDIR
+ dp = fdopendir(iobuf->fd);
+#else
+ dp = opendir(iobuf->name);
+ if (dp != NULL)
+ iobuf->fd = dirfd(dp);
+#endif
+ if (dp == NULL) {
+ warning(ext_id, _("dir_take_control_of: opendir/fdopendir failed: %s"),
+ strerror(errno));
+ update_ERRNO_int(errno);
+ return awk_false;
+ }
+
+ emalloc(the_dir, open_directory_t *, sizeof(open_directory_t), "dir_take_control_of");
+ the_dir->dp = dp;
+ /* pre-populate the field_width struct with constant values: */
+ the_dir->fw.use_chars = awk_false;
+ the_dir->fw.nf = 3;
+ the_dir->fw.fields[0].skip = 0; /* no leading space */
+ the_dir->fw.fields[1].skip = 1; /* single '/' separator */
+ the_dir->fw.fields[2].skip = 1; /* single '/' separator */
+ size = sizeof(struct dirent) + 21 /* max digits in inode */ + 2 /* slashes */;
+ emalloc(the_dir->buf, char *, size, "dir_take_control_of");
+
+ iobuf->opaque = the_dir;
+ iobuf->get_record = dir_get_record;
+ iobuf->close_func = dir_close;
+
+ return awk_true;
+}
+
+static awk_input_parser_t readdir_parser = {
+ "readdir",
+ dir_can_take_file,
+ dir_take_control_of,
+ NULL
+};
+
+#ifdef TEST_DUPLICATE
+static awk_input_parser_t readdir_parser2 = {
+ "readdir2",
+ dir_can_take_file,
+ dir_take_control_of,
+ NULL
+};
+#endif
+
+/* init_readdir --- set things ups */
+
+static awk_bool_t
+init_readdir()
+{
+ register_input_parser(& readdir_parser);
+#ifdef TEST_DUPLICATE
+ register_input_parser(& readdir_parser2);
+#endif
+
+ return awk_true;
+}
+
+static awk_ext_func_t func_table[] = {
+ { NULL, NULL, 0, 0, awk_false, NULL }
+};
+
+/* define the dl_load function using the boilerplate macro */
+
+dl_load_func(func_table, readdir, "")
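
The test extension above reports each record as "inode/name/type" together with skip values 0, 1, 1 and per-record field lengths. As a rough illustration of how a consumer could turn such skip/length pairs into fields when use_chars is false (byte counts), here is a self-contained sketch. The types struct field_pos and struct field_span and the function split_by_positions are hypothetical; they are not part of gawkapi.h, and this is not gawk's fw_parse_field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one skip/len entry of the field info. */
struct field_pos {
	size_t skip;	/* bytes to skip before the field */
	size_t len;	/* length of the field in bytes */
};

/* Hypothetical result type: a field as a pointer into the record. */
struct field_span {
	const char *start;
	size_t len;
};

/*
 * split_by_positions --- walk record (reclen bytes); for each entry,
 * skip pos[i].skip bytes and then take pos[i].len bytes as the field.
 * Skips and fields are truncated at the end of the record.
 * Returns the number of fields stored in out[].
 */
static size_t
split_by_positions(const char *record, size_t reclen,
		const struct field_pos *pos, size_t nf,
		struct field_span *out)
{
	const char *scan = record;
	const char *end = record + reclen;
	size_t i;

	assert(record != NULL && pos != NULL && out != NULL);
	for (i = 0; i < nf && scan < end; i++) {
		size_t rest = (size_t) (end - scan);
		size_t skip = pos[i].skip < rest ? pos[i].skip : rest;
		size_t len;

		scan += skip;
		rest -= skip;
		len = pos[i].len < rest ? pos[i].len : rest;
		out[i].start = scan;
		out[i].len = len;
		scan += len;
	}
	return i;
}
```

For the record "12345/foo.txt/f" with positions {0,5}, {1,7}, {1,1}, the sketch produces the fields "12345", "foo.txt" and "f", which mirrors the layout that dir_get_record describes through its field lengths.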
diff --git a/extension/readfile.c b/extension/readfile.c
index b453da21..fb1a376b 100644
--- a/extension/readfile.c
+++ b/extension/readfile.c
@@ -142,7 +142,8 @@ done:
static int
readfile_get_record(char **out, awk_input_buf_t *iobuf, int *errcode,
- char **rt_start, size_t *rt_len)
+ char **rt_start, size_t *rt_len,
+ const awk_fieldwidth_info_t **unused)
{
char *text;
diff --git a/extension/revtwoway.c b/extension/revtwoway.c
index ac4e22cf..84989bfc 100644
--- a/extension/revtwoway.c
+++ b/extension/revtwoway.c
@@ -133,7 +133,8 @@ close_two_proc_data(two_way_proc_data_t *proc_data)
static int
rev2way_get_record(char **out, awk_input_buf_t *iobuf, int *errcode,
- char **rt_start, size_t *rt_len)
+ char **rt_start, size_t *rt_len,
+ const awk_fieldwidth_info_t **unused)
{
int len = 0; /* for now */
two_way_proc_data_t *proc_data;
diff --git a/field.c b/field.c
index 0799fb1b..b5f28c17 100644
--- a/field.c
+++ b/field.c
@@ -38,8 +38,17 @@ is_blank(int c)
typedef void (* Setfunc)(long, char *, long, NODE *);
-static long (*parse_field)(long, char **, int, NODE *,
+/* is the API currently overriding the default parsing mechanism? */
+static bool api_parser_override = false;
+typedef long (*parse_field_func_t)(long, char **, int, NODE *,
Regexp *, Setfunc, NODE *, NODE *, bool);
+static parse_field_func_t parse_field;
+/*
+ * N.B. The normal_parse_field function pointer contains the parse_field value
+ * that should be used except when API field parsing is overriding the default
+ * field parsing mechanism.
+ */
+static parse_field_func_t normal_parse_field;
static long re_parse_field(long, char **, int, NODE *,
Regexp *, Setfunc, NODE *, NODE *, bool);
static long def_parse_field(long, char **, int, NODE *,
@@ -50,6 +59,7 @@ static long sc_parse_field(long, char **, int, NODE *,
Regexp *, Setfunc, NODE *, NODE *, bool);
static long fw_parse_field(long, char **, int, NODE *,
Regexp *, Setfunc, NODE *, NODE *, bool);
+static const awk_fieldwidth_info_t *api_fw = NULL;
static long fpat_parse_field(long, char **, int, NODE *,
Regexp *, Setfunc, NODE *, NODE *, bool);
static void set_element(long num, char * str, long len, NODE *arr);
@@ -64,7 +74,7 @@ static bool resave_fs;
static NODE *save_FS; /* save current value of FS when line is read,
* to be used in deferred parsing
*/
-static int *FIELDWIDTHS = NULL;
+static awk_fieldwidth_info_t *FIELDWIDTHS = NULL;
NODE **fields_arr; /* array of pointers to the field nodes */
bool field0_valid; /* $(>0) has not been changed yet */
@@ -252,7 +262,7 @@ rebuild_record()
* but better correct than fast.
*/
void
-set_record(const char *buf, int cnt)
+set_record(const char *buf, int cnt, const awk_fieldwidth_info_t *fw)
{
NODE *n;
static char *databuf;
@@ -306,6 +316,19 @@ set_record(const char *buf, int cnt)
n->stfmt = STFMT_UNUSED;
n->flags = (STRING|STRCUR|USER_INPUT); /* do not set MALLOC */
fields_arr[0] = n;
+ if (fw != api_fw) {
+ if ((api_fw = fw) != NULL) {
+ if (! api_parser_override) {
+ api_parser_override = true;
+ parse_field = fw_parse_field;
+ update_PROCINFO_str("FS", "API");
+ }
+ } else if (api_parser_override) {
+ api_parser_override = false;
+ parse_field = normal_parse_field;
+ update_PROCINFO_str("FS", current_field_sep_str());
+ }
+ }
#undef INITIAL_SIZE
#undef MAX_SIZE
@@ -691,6 +714,31 @@ sc_parse_field(long up_to, /* parse only up to this field number */
}
/*
+ * calc_mbslen --- calculate the length in bytes of a multi-byte string
+ * containing len characters.
+ */
+
+static size_t
+calc_mbslen(char *scan, char *end, size_t len, mbstate_t *mbs)
+{
+
+ size_t mbclen;
+ char *mbscan = scan;
+
+ while (len-- > 0 && mbscan < end) {
+ mbclen = mbrlen(mbscan, end - mbscan, mbs);
+ if (!(mbclen > 0 && mbclen <= (size_t)(end - mbscan)))
+ /*
+ * We treat it as a singlebyte character. This should
+ * catch error codes 0, (size_t) -1, and (size_t) -2.
+ */
+ mbclen = 1;
+ mbscan += mbclen;
+ }
+ return mbscan - scan;
+}
+
+/*
* fw_parse_field --- field parsing using FIELDWIDTHS spec
*
* This is called from get_field() via (*parse_field)().
@@ -710,53 +758,53 @@ fw_parse_field(long up_to, /* parse only up to this field number */
char *scan = *buf;
long nf = parse_high_water;
char *end = scan + len;
- int nmbc;
- size_t mbclen;
- size_t mbslen;
- size_t lenrest;
- char *mbscan;
+ const awk_fieldwidth_info_t *fw;
mbstate_t mbs;
+ size_t skiplen;
+ size_t flen;
- memset(&mbs, 0, sizeof(mbstate_t));
+ fw = (api_parser_override ? api_fw : FIELDWIDTHS);
if (up_to == UNLIMITED)
nf = 0;
if (len == 0)
return nf;
- for (; nf < up_to && (len = FIELDWIDTHS[nf+1]) != -1; ) {
- if (gawk_mb_cur_max > 1) {
- nmbc = 0;
- mbslen = 0;
- mbscan = scan;
- lenrest = end - scan;
- while (nmbc < len && mbslen < lenrest) {
- mbclen = mbrlen(mbscan, end - mbscan, &mbs);
- if ( mbclen == 1
- || mbclen == (size_t) -1
- || mbclen == (size_t) -2
- || mbclen == 0) {
- /* We treat it as a singlebyte character. */
- mbclen = 1;
- }
- if (mbclen <= end - mbscan) {
- mbscan += mbclen;
- mbslen += mbclen;
- ++nmbc;
- }
- }
- (*set)(++nf, scan, (long) mbslen, n);
- scan += mbslen;
- } else {
- if (len > end - scan)
- len = end - scan;
- (*set)(++nf, scan, (long) len, n);
- scan += len;
+ if (gawk_mb_cur_max > 1 && fw->use_chars) {
+ /*
+ * Reset the shift state for each field, since there might
+ * be who-knows-what kind of stuff in between fields,
+ * and we assume each field starts with a valid (possibly
+ * multibyte) character.
+ */
+ memset(&mbs, 0, sizeof(mbstate_t));
+ while (nf < up_to) {
+ if (nf >= fw->nf) {
+ *buf = end;
+ return nf;
+ }
+ scan += calc_mbslen(scan, end, fw->fields[nf].skip, &mbs);
+ flen = calc_mbslen(scan, end, fw->fields[nf].len, &mbs);
+ (*set)(++nf, scan, (long) flen, n);
+ scan += flen;
+ }
+ } else {
+ while (nf < up_to) {
+ if (nf >= fw->nf) {
+ *buf = end;
+ return nf;
+ }
+ skiplen = fw->fields[nf].skip;
+ if (skiplen > end - scan)
+ skiplen = end - scan;
+ scan += skiplen;
+ flen = fw->fields[nf].len;
+ if (flen > end - scan)
+ flen = end - scan;
+ (*set)(++nf, scan, (long) flen, n);
+ scan += flen;
}
}
- if (len == -1)
- *buf = end;
- else
- *buf = scan;
+ *buf = scan;
return nf;
}
@@ -845,7 +893,7 @@ get_field(long requested, Func_ptr *assign)
if (parse_extent == fields_arr[0]->stptr + fields_arr[0]->stlen)
NF = parse_high_water;
else if (parse_field == fpat_parse_field) {
- /* FPAT parsing is wierd, isolate the special cases */
+ /* FPAT parsing is weird, isolate the special cases */
char *rec_start = fields_arr[0]->stptr;
char *rec_end = fields_arr[0]->stptr + fields_arr[0]->stlen;
@@ -1057,6 +1105,18 @@ do_patsplit(int nargs)
return tmp;
}
+/* set_parser --- update the current (non-API) parser */
+
+static void
+set_parser(parse_field_func_t func)
+{
+ normal_parse_field = func;
+ if (! api_parser_override && parse_field != func) {
+ parse_field = func;
+ update_PROCINFO_str("FS", current_field_sep_str());
+ }
+}
+
/* set_FIELDWIDTHS --- handle an assignment to FIELDWIDTHS */
void
@@ -1078,27 +1138,27 @@ set_FIELDWIDTHS()
return;
/*
- * If changing the way fields are split, obey least-suprise
+ * If changing the way fields are split, obey least-surprise
* semantics, and force $0 to be split totally.
*/
if (fields_arr != NULL)
(void) get_field(UNLIMITED - 1, 0);
- parse_field = fw_parse_field;
+ set_parser(fw_parse_field);
tmp = force_string(FIELDWIDTHS_node->var_value);
scan = tmp->stptr;
- if (FIELDWIDTHS == NULL)
- emalloc(FIELDWIDTHS, int *, fw_alloc * sizeof(int), "set_FIELDWIDTHS");
- FIELDWIDTHS[0] = 0;
- for (i = 1; ; i++) {
+ if (FIELDWIDTHS == NULL) {
+ emalloc(FIELDWIDTHS, awk_fieldwidth_info_t *, awk_fieldwidth_info_size(fw_alloc), "set_FIELDWIDTHS");
+ FIELDWIDTHS->use_chars = true;
+ }
+ FIELDWIDTHS->nf = 0;
+ for (i = 0; ; i++) {
unsigned long int tmp;
- if (i + 1 >= fw_alloc) {
+ if (i >= fw_alloc) {
fw_alloc *= 2;
- erealloc(FIELDWIDTHS, int *, fw_alloc * sizeof(int), "set_FIELDWIDTHS");
+ erealloc(FIELDWIDTHS, awk_fieldwidth_info_t *, awk_fieldwidth_info_size(fw_alloc), "set_FIELDWIDTHS");
}
- /* Initialize value to be end of list */
- FIELDWIDTHS[i] = -1;
/* Ensure that there is no leading `-' sign. Otherwise,
strtoul would accept it and return a bogus result. */
while (is_blank(*scan)) {
@@ -1116,6 +1176,13 @@ set_FIELDWIDTHS()
or a value that is not in the range [1..INT_MAX]. */
errno = 0;
tmp = strtoul(scan, &end, 10);
+ if (errno == 0 && *end == ':' && (0 < tmp && tmp <= INT_MAX)) {
+ FIELDWIDTHS->fields[i].skip = tmp;
+ scan = end + 1;
+ tmp = strtoul(scan, &end, 10);
+ }
+ else
+ FIELDWIDTHS->fields[i].skip = 0;
if (errno != 0
|| (*end != '\0' && ! is_blank(*end))
|| !(0 < tmp && tmp <= INT_MAX)
@@ -1123,7 +1190,8 @@ set_FIELDWIDTHS()
fatal_error = true;
break;
}
- FIELDWIDTHS[i] = tmp;
+ FIELDWIDTHS->fields[i].len = tmp;
+ FIELDWIDTHS->nf = i+1;
scan = end;
/* Skip past any trailing blanks. */
while (is_blank(*scan)) {
@@ -1132,12 +1200,10 @@ set_FIELDWIDTHS()
if (*scan == '\0')
break;
}
- FIELDWIDTHS[i+1] = -1;
- update_PROCINFO_str("FS", "FIELDWIDTHS");
if (fatal_error)
- fatal(_("invalid FIELDWIDTHS value, near `%s'"),
- scan);
+ fatal(_("invalid FIELDWIDTHS value, for field %d, near `%s'"),
+ i + 1, scan);
}
/* set_FS --- handle things when FS is assigned to */
@@ -1205,7 +1271,7 @@ choose_fs_function:
if (! do_traditional && fs->stlen == 0) {
static bool warned = false;
- parse_field = null_parse_field;
+ set_parser(null_parse_field);
if (do_lint && ! warned) {
warned = true;
@@ -1214,10 +1280,10 @@ choose_fs_function:
} else if (fs->stlen > 1) {
if (do_lint_old)
warning(_("old awk does not support regexps as value of `FS'"));
- parse_field = re_parse_field;
+ set_parser(re_parse_field);
} else if (RS_is_null) {
/* we know that fs->stlen <= 1 */
- parse_field = sc_parse_field;
+ set_parser(sc_parse_field);
if (fs->stlen == 1) {
if (fs->stptr[0] == ' ') {
default_FS = true;
@@ -1233,7 +1299,7 @@ choose_fs_function:
}
}
} else {
- parse_field = def_parse_field;
+ set_parser(def_parse_field);
if (fs->stlen == 1) {
if (fs->stptr[0] == ' ')
@@ -1242,7 +1308,7 @@ choose_fs_function:
/* same special case */
strcpy(buf, "[\\\\]");
else
- parse_field = sc_parse_field;
+ set_parser(sc_parse_field);
}
}
if (remake_re) {
@@ -1254,7 +1320,7 @@ choose_fs_function:
FS_re_yes_case = make_regexp(buf, strlen(buf), false, true, true);
FS_re_no_case = make_regexp(buf, strlen(buf), true, true, true);
FS_regexp = (IGNORECASE ? FS_re_no_case : FS_re_yes_case);
- parse_field = re_parse_field;
+ set_parser(re_parse_field);
} else if (parse_field == re_parse_field) {
FS_re_yes_case = make_regexp(fs->stptr, fs->stlen, false, true, true);
FS_re_no_case = make_regexp(fs->stptr, fs->stlen, true, true, true);
@@ -1270,16 +1336,16 @@ choose_fs_function:
*/
if (fs->stlen == 1 && parse_field == re_parse_field)
FS_regexp = FS_re_yes_case;
-
- update_PROCINFO_str("FS", "FS");
}
-/* current_field_sep --- return what field separator is */
+/* current_field_sep --- return the field separator type */
field_sep_type
current_field_sep()
{
- if (parse_field == fw_parse_field)
+ if (api_parser_override)
+ return Using_API;
+ else if (parse_field == fw_parse_field)
return Using_FIELDWIDTHS;
else if (parse_field == fpat_parse_field)
return Using_FPAT;
@@ -1287,6 +1353,21 @@ current_field_sep()
return Using_FS;
}
+/* current_field_sep_str --- return the field separator type as a string */
+
+const char *
+current_field_sep_str()
+{
+ if (api_parser_override)
+ return "API";
+ else if (parse_field == fw_parse_field)
+ return "FIELDWIDTHS";
+ else if (parse_field == fpat_parse_field)
+ return "FPAT";
+ else
+ return "FS";
+}
+
/* update_PROCINFO_str --- update PROCINFO[sub] with string value */
void
@@ -1373,7 +1454,7 @@ set_FPAT()
set_fpat_function:
fpat = force_string(FPAT_node->var_value);
- parse_field = fpat_parse_field;
+ set_parser(fpat_parse_field);
if (remake_re) {
refree(FPAT_re_yes_case);
@@ -1384,8 +1465,6 @@ set_fpat_function:
FPAT_re_no_case = make_regexp(fpat->stptr, fpat->stlen, true, true, true);
FPAT_regexp = (IGNORECASE ? FPAT_re_no_case : FPAT_re_yes_case);
}
-
- update_PROCINFO_str("FS", "FPAT");
}
/*
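The byte-mode branch of fw_parse_field above can be sketched in isolation as follows. This is a minimal illustration and not gawk's code: `struct field_spec`, `struct field_span`, and `split_by_widths` are hypothetical names, and the multibyte (use_chars) path is deliberately left out. It shows only how each skip and length is clamped to the end of the record.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one entry of awk_fieldwidth_info_t.fields[]. */
struct field_spec {
	size_t skip;	/* bytes to skip before the field starts */
	size_t len;	/* length of the field in bytes */
};

/* Hypothetical result type: where a field starts and how long it is. */
struct field_span {
	const char *start;
	size_t len;
};

/*
 * split_by_widths --- byte-mode splitting with skip and len clamped to
 * the end of the record. Fills out[0..nspec-1] and returns the number
 * of fields stored; fields past the end of the record come out empty.
 */
static size_t
split_by_widths(const char *rec, size_t reclen,
		const struct field_spec *spec, size_t nspec,
		struct field_span *out)
{
	const char *scan = rec;
	const char *end = rec + reclen;
	size_t i;

	assert(rec != NULL && spec != NULL && out != NULL);
	for (i = 0; i < nspec; i++) {
		size_t skiplen = spec[i].skip;
		size_t flen = spec[i].len;

		/* never skip past the end of the record */
		if (skiplen > (size_t) (end - scan))
			skiplen = (size_t) (end - scan);
		scan += skiplen;
		/* a short final field is truncated, not an error */
		if (flen > (size_t) (end - scan))
			flen = (size_t) (end - scan);
		out[i].start = scan;
		out[i].len = flen;
		scan += flen;
	}
	return i;
}
```

For the record "abc--defg hi" with the specs {0,3}, {2,4}, {1,10}, the third field is cut short at the end of the record instead of reading past it.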
diff --git a/gawkapi.h b/gawkapi.h
index 5071adce..a8d6279f 100644
--- a/gawkapi.h
+++ b/gawkapi.h
@@ -117,6 +117,32 @@ typedef enum awk_bool {
awk_true
} awk_bool_t; /* we don't use <stdbool.h> on purpose */
+/*
+ * If the input parser would like to specify the field positions in the input
+ * record, it may populate an awk_fieldwidth_info_t structure to indicate
+ * the location of each field. The use_chars boolean controls whether the
+ * field lengths are specified in terms of bytes or potentially multi-byte
+ * characters. Performance will be better if the values are supplied in
+ * terms of bytes. The fields[0].skip value indicates how many bytes (or
+ * characters) to skip before $1, and fields[0].len is the length of $1, etc.
+ */
+
+typedef struct {
+ awk_bool_t use_chars; /* false ==> use bytes */
+ size_t nf;
+ struct awk_field_info {
+ size_t skip; /* amount to skip before field starts */
+ size_t len; /* length of field */
+ } fields[1]; /* actual dimension should be nf */
+} awk_fieldwidth_info_t;
+
+/*
+ * This macro calculates the total struct size needed. This is useful when
+ * calling malloc or realloc.
+ */
+#define awk_fieldwidth_info_size(NF) (sizeof(awk_fieldwidth_info_t) + \
+ (((NF)-1) * sizeof(struct awk_field_info)))
+
/* The information about input files that input parsers need to know: */
typedef struct awk_input {
const char *name; /* filename */
@@ -146,9 +172,19 @@ typedef struct awk_input {
* than zero, gawk will automatically update the ERRNO variable based
* on the value of *errcode (e.g., setting *errcode = errno should do
* the right thing).
+ *
+ * If field_width is non-NULL, then *field_width will be initialized
+ * to NULL, and the function may set it to point to a structure
+ * supplying field width information to override the default
+ * gawk field parsing mechanism. Note that this structure will not
+ * be copied by gawk; it must persist at least until the next call
+ * to get_record or close_func. Note also that field_width will
+ * be NULL when getline is assigning the results to a variable,
+ * since field parsing is not needed in that case.
*/
int (*get_record)(char **out, struct awk_input *iobuf, int *errcode,
- char **rt_start, size_t *rt_len);
+ char **rt_start, size_t *rt_len,
+ const awk_fieldwidth_info_t **field_width);
/*
* No argument prototype on read_func to allow for older systems
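As a rough illustration of the gawkapi.h additions above, an input parser might build its field layout as sketched here. The typedefs and the size macro are copied from this diff (with awk_bool_t reduced to its two values) so that the fragment compiles on its own; `make_fixed_layout` is a hypothetical helper, not part of the API. It assumes byte-based widths and a constant gap between fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Copied from the gawkapi.h hunk so the sketch compiles standalone. */
typedef enum awk_bool { awk_false = 0, awk_true } awk_bool_t;

typedef struct {
	awk_bool_t use_chars;	/* false ==> use bytes */
	size_t nf;
	struct awk_field_info {
		size_t skip;	/* amount to skip before field starts */
		size_t len;	/* length of field */
	} fields[1];		/* actual dimension should be nf */
} awk_fieldwidth_info_t;

#define awk_fieldwidth_info_size(NF) (sizeof(awk_fieldwidth_info_t) + \
			(((NF)-1) * sizeof(struct awk_field_info)))

/*
 * make_fixed_layout --- hypothetical helper: allocate a layout of nf
 * byte-width fields, each after the first preceded by `gap' bytes.
 * Returns NULL if nf is zero or allocation fails; caller frees.
 */
static awk_fieldwidth_info_t *
make_fixed_layout(const size_t *widths, size_t nf, size_t gap)
{
	awk_fieldwidth_info_t *fw;
	size_t i;

	assert(widths != NULL);
	if (nf == 0)
		return NULL;
	fw = malloc(awk_fieldwidth_info_size(nf));
	if (fw == NULL)
		return NULL;
	fw->use_chars = awk_false;	/* bytes: the faster path */
	fw->nf = nf;
	for (i = 0; i < nf; i++) {
		fw->fields[i].skip = (i == 0 ? 0 : gap);
		fw->fields[i].len = widths[i];
	}
	return fw;
}
```

Because gawk does not copy the structure, a get_record implementation that stores such a pointer in *field_width would have to keep it allocated at least until its next get_record or close_func call, as the comment above states.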
diff --git a/io.c b/io.c
index b00f4db4..1ed40aab 100644
--- a/io.c
+++ b/io.c
@@ -287,7 +287,7 @@ static RECVALUE rsrescan(IOBUF *iop, struct recmatch *recm, SCANSTATE *state);
static RECVALUE (*matchrec)(IOBUF *iop, struct recmatch *recm, SCANSTATE *state) = rs1scan;
-static int get_a_record(char **out, IOBUF *iop, int *errcode);
+static int get_a_record(char **out, IOBUF *iop, int *errcode, const awk_fieldwidth_info_t **field_width);
static void free_rp(struct redirect *rp);
@@ -590,13 +590,14 @@ inrec(IOBUF *iop, int *errcode)
char *begin;
int cnt;
bool retval = true;
+ const awk_fieldwidth_info_t *field_width = NULL;
if (at_eof(iop) && no_data_left(iop))
cnt = EOF;
else if ((iop->flag & IOP_CLOSED) != 0)
cnt = EOF;
else
- cnt = get_a_record(& begin, iop, errcode);
+ cnt = get_a_record(& begin, iop, errcode, & field_width);
/* Note that get_a_record may return -2 when I/O would block */
if (cnt < 0) {
@@ -604,7 +605,7 @@ inrec(IOBUF *iop, int *errcode)
} else {
INCREMENT_REC(NR);
INCREMENT_REC(FNR);
- set_record(begin, cnt);
+ set_record(begin, cnt, field_width);
if (*errcode > 0)
retval = false;
}
@@ -2646,6 +2647,7 @@ do_getline_redir(int into_variable, enum redirval redirtype)
NODE *redir_exp = NULL;
NODE **lhs = NULL;
int redir_error = 0;
+ const awk_fieldwidth_info_t *field_width = NULL;
if (into_variable)
lhs = POP_ADDRESS();
@@ -2674,7 +2676,7 @@ do_getline_redir(int into_variable, enum redirval redirtype)
return make_number((AWKNUM) 0.0);
errcode = 0;
- cnt = get_a_record(& s, iop, & errcode);
+ cnt = get_a_record(& s, iop, & errcode, (lhs ? NULL : & field_width));
if (errcode != 0) {
if (! do_traditional && (errcode != -1))
update_ERRNO_int(errcode);
@@ -2696,7 +2698,7 @@ do_getline_redir(int into_variable, enum redirval redirtype)
}
if (lhs == NULL) /* no optional var. */
- set_record(s, cnt);
+ set_record(s, cnt, field_width);
else { /* assignment to variable */
unref(*lhs);
*lhs = make_string(s, cnt);
@@ -2714,6 +2716,7 @@ do_getline(int into_variable, IOBUF *iop)
int cnt = EOF;
char *s = NULL;
int errcode;
+ const awk_fieldwidth_info_t *field_width = NULL;
if (iop == NULL) { /* end of input */
if (into_variable)
@@ -2722,7 +2725,7 @@ do_getline(int into_variable, IOBUF *iop)
}
errcode = 0;
- cnt = get_a_record(& s, iop, & errcode);
+ cnt = get_a_record(& s, iop, & errcode, (into_variable ? NULL : & field_width));
if (errcode != 0) {
if (! do_traditional && (errcode != -1))
update_ERRNO_int(errcode);
@@ -2737,7 +2740,7 @@ do_getline(int into_variable, IOBUF *iop)
INCREMENT_REC(FNR);
if (! into_variable) /* no optional var. */
- set_record(s, cnt);
+ set_record(s, cnt, field_width);
else { /* assignment to variable */
NODE **lhs;
lhs = POP_ADDRESS();
@@ -3681,7 +3684,9 @@ errno_io_retry(void)
static int
get_a_record(char **out, /* pointer to pointer to data */
IOBUF *iop, /* input IOP */
- int *errcode) /* pointer to error variable */
+ int *errcode, /* pointer to error variable */
+ const awk_fieldwidth_info_t **field_width)
+ /* pointer to pointer to field_width info */
{
struct recmatch recm;
SCANSTATE state;
@@ -3700,7 +3705,8 @@ get_a_record(char **out, /* pointer to pointer to data */
char *rt_start;
size_t rt_len;
int rc = iop->public.get_record(out, &iop->public, errcode,
- &rt_start, &rt_len);
+ &rt_start, &rt_len,
+ field_width);
if (rc == EOF)
iop->flag |= IOP_AT_EOF;
else {
diff --git a/main.c b/main.c
index 530d37fd..33b6eba6 100644
--- a/main.c
+++ b/main.c
@@ -1007,22 +1007,7 @@ load_procinfo()
value = getegid();
update_PROCINFO_num("egid", value);
- switch (current_field_sep()) {
- case Using_FIELDWIDTHS:
- update_PROCINFO_str("FS", "FIELDWIDTHS");
- break;
- case Using_FPAT:
- update_PROCINFO_str("FS", "FPAT");
- break;
- case Using_FS:
- update_PROCINFO_str("FS", "FS");
- break;
- default:
- fatal(_("unknown value for field spec: %d\n"),
- current_field_sep());
- break;
- }
-
+ update_PROCINFO_str("FS", current_field_sep_str());
#if defined (HAVE_GETGROUPS) && defined(NGROUPS_MAX) && NGROUPS_MAX > 0
for (i = 0; i < ngroups; i++) {
diff --git a/test/ChangeLog b/test/ChangeLog
index df0ed8fa..d684e73a 100644
--- a/test/ChangeLog
+++ b/test/ChangeLog
@@ -1,3 +1,14 @@
+2017-03-27 Arnold D. Robbins <arnold@skeeve.com>
+
+ * fwtest4: Renamed from fwtest3.
+ * fwtest3: Renamed from fwtest2b.
+ * Makefile.am: Updated.
+
+2017-03-21 Andrew J. Schorr <aschorr@telemetry-investments.com>
+
+ * Makefile.am (fwtest2b): Add new test of enhanced FIELDWIDTHS syntax.
+ * fwtest2b.awk, fwtest2b.ok: New files.
+
2017-03-19 Andrew J. Schorr <aschorr@telemetry-investments.com>
* Makefile.am (argarray): Always copy argarray.in to the local
@@ -5,6 +16,13 @@
$(srcdir) is the current directory.
* argarray.ok: Replace argarray.in with argarray.input.
+2017-03-06 Andrew J. Schorr <aschorr@telemetry-investments.com>
+
+ * Makefile.am (readdir_test): New test to check whether get_record
+ field_width parsing is working by comparing the results from the
+ readdir and readdir_test extensions.
+ (SHLIB_TESTS): Add readdir_test.
+
2017-02-21 Andrew J. Schorr <aschorr@telemetry-investments.com>
* Makefile.am (mktime): New test.
diff --git a/test/Makefile.am b/test/Makefile.am
index a356d63b..b1a97621 100644
--- a/test/Makefile.am
+++ b/test/Makefile.am
@@ -386,8 +386,10 @@ EXTRA_DIST = \
fwtest2.in \
fwtest2.ok \
fwtest3.awk \
- fwtest3.in \
fwtest3.ok \
+ fwtest4.awk \
+ fwtest4.in \
+ fwtest4.ok \
genpot.awk \
genpot.ok \
gensub.awk \
@@ -1221,7 +1223,7 @@ GAWK_EXT_TESTS = \
crlf dbugeval dbugeval2 dbugtypedre1 dbugtypedre2 delsub \
devfd devfd1 devfd2 dumpvars errno exit \
fieldwdth forcenum fpat1 fpat2 fpat3 fpat4 fpat5 fpatnull fsfwfs funlen \
- functab1 functab2 functab3 fwtest fwtest2 fwtest3 \
+ functab1 functab2 functab3 fwtest fwtest2 fwtest3 fwtest4 \
genpot gensub gensub2 gensub3 getlndir gnuops2 gnuops3 gnureops gsubind \
icasefs icasers id igncdym igncfs ignrcas2 ignrcas4 ignrcase \
incdupe incdupe2 incdupe3 incdupe4 incdupe5 incdupe6 incdupe7 \
@@ -1263,7 +1265,7 @@ LOCALE_CHARSET_TESTS = \
SHLIB_TESTS = \
apiterm fnmatch filefuncs fork fork2 fts functab4 getfile inplace1 inplace2 inplace3 \
- ordchr ordchr2 readdir readfile readfile2 revout revtwoway rwarray testext time
+ ordchr ordchr2 readdir readdir_test readfile readfile2 revout revtwoway rwarray testext time
# List of the tests which should be run with --lint option:
NEED_LINT = \
@@ -2187,6 +2189,12 @@ readdir:
-v dirlist=_dirlist -v longlist=_longlist > $@.ok
@-$(CMP) $@.ok _$@ && rm -f $@.ok _$@ _dirlist _longlist
+readdir_test:
+ @echo $@
+ @$(AWK) -lreaddir -F/ '{printf "[%s] [%s] [%s] [%s]\n", $$1, $$2, $$3, $$4}' "$(top_srcdir)" > $@.ok
+ @$(AWK) -lreaddir_test '{printf "[%s] [%s] [%s] [%s]\n", $$1, $$2, $$3, $$4}' "$(top_srcdir)" > _$@
+ @-$(CMP) $@.ok _$@ && rm -f $@.ok _$@
+
fts:
@case `uname` in \
IRIX) \
@@ -2367,6 +2375,11 @@ arrdbg:
@-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@ "$(srcdir)"/$@.ok
# @-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@ "$(srcdir)"/$@.ok || exit 0
+fwtest3:
+ @echo $@
+ @AWKPATH="$(srcdir)" $(AWK) -f $@.awk < "$(srcdir)"/fwtest2.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
+ @-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@
+
# Targets generated for other tests:
include Maketests
diff --git a/test/Makefile.in b/test/Makefile.in
index 8719840b..57f5bf61 100644
--- a/test/Makefile.in
+++ b/test/Makefile.in
@@ -644,8 +644,10 @@ EXTRA_DIST = \
fwtest2.in \
fwtest2.ok \
fwtest3.awk \
- fwtest3.in \
fwtest3.ok \
+ fwtest4.awk \
+ fwtest4.in \
+ fwtest4.ok \
genpot.awk \
genpot.ok \
gensub.awk \
@@ -1478,7 +1480,7 @@ GAWK_EXT_TESTS = \
crlf dbugeval dbugeval2 dbugtypedre1 dbugtypedre2 delsub \
devfd devfd1 devfd2 dumpvars errno exit \
fieldwdth forcenum fpat1 fpat2 fpat3 fpat4 fpat5 fpatnull fsfwfs funlen \
- functab1 functab2 functab3 fwtest fwtest2 fwtest3 \
+ functab1 functab2 functab3 fwtest fwtest2 fwtest3 fwtest4 \
genpot gensub gensub2 gensub3 getlndir gnuops2 gnuops3 gnureops gsubind \
icasefs icasers id igncdym igncfs ignrcas2 ignrcas4 ignrcase \
incdupe incdupe2 incdupe3 incdupe4 incdupe5 incdupe6 incdupe7 \
@@ -1516,7 +1518,7 @@ LOCALE_CHARSET_TESTS = \
SHLIB_TESTS = \
apiterm fnmatch filefuncs fork fork2 fts functab4 getfile inplace1 inplace2 inplace3 \
- ordchr ordchr2 readdir readfile readfile2 revout revtwoway rwarray testext time
+ ordchr ordchr2 readdir readdir_test readfile readfile2 revout revtwoway rwarray testext time
# List of the tests which should be run with --lint option:
@@ -2626,6 +2628,12 @@ readdir:
-v dirlist=_dirlist -v longlist=_longlist > $@.ok
@-$(CMP) $@.ok _$@ && rm -f $@.ok _$@ _dirlist _longlist
+readdir_test:
+ @echo $@
+ @$(AWK) -lreaddir -F/ '{printf "[%s] [%s] [%s] [%s]\n", $$1, $$2, $$3, $$4}' "$(top_srcdir)" > $@.ok
+ @$(AWK) -lreaddir_test '{printf "[%s] [%s] [%s] [%s]\n", $$1, $$2, $$3, $$4}' "$(top_srcdir)" > _$@
+ @-$(CMP) $@.ok _$@ && rm -f $@.ok _$@
+
fts:
@case `uname` in \
IRIX) \
@@ -2803,6 +2811,12 @@ arrdbg:
@echo $@
@$(AWK) -v "okfile=$(srcdir)/$@.ok" -f "$(srcdir)"/$@.awk | grep array_f >_$@
@-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@ "$(srcdir)"/$@.ok
+# @-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@ "$(srcdir)"/$@.ok || exit 0
+
+fwtest3:
+ @echo $@
+ @AWKPATH="$(srcdir)" $(AWK) -f $@.awk < "$(srcdir)"/fwtest2.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
+ @-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@
Gt-dummy:
# file Maketests, generated from Makefile.am by the Gentests program
addcomma:
@@ -3982,7 +3996,7 @@ fwtest2:
@AWKPATH="$(srcdir)" $(AWK) -f $@.awk < "$(srcdir)"/$@.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
@-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@
-fwtest3:
+fwtest4:
@echo $@
@AWKPATH="$(srcdir)" $(AWK) -f $@.awk < "$(srcdir)"/$@.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
@-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@
@@ -4457,7 +4471,6 @@ time:
@-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@
# end of file Maketests
-# @-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@ "$(srcdir)"/$@.ok || exit 0
# Targets generated for other tests:
diff --git a/test/Maketests b/test/Maketests
index d9183c0a..9ff8ef90 100644
--- a/test/Maketests
+++ b/test/Maketests
@@ -1177,7 +1177,7 @@ fwtest2:
@AWKPATH="$(srcdir)" $(AWK) -f $@.awk < "$(srcdir)"/$@.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
@-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@
-fwtest3:
+fwtest4:
@echo $@
@AWKPATH="$(srcdir)" $(AWK) -f $@.awk < "$(srcdir)"/$@.in >_$@ 2>&1 || echo EXIT CODE: $$? >>_$@
@-$(CMP) "$(srcdir)"/$@.ok _$@ && rm -f _$@
diff --git a/test/fwtest3.awk b/test/fwtest3.awk
index d1384eaf..5e96c1aa 100644
--- a/test/fwtest3.awk
+++ b/test/fwtest3.awk
@@ -1 +1,6 @@
-BEGIN { FIELDWIDTHS="5" } { print $1 }
+BEGIN {
+ FIELDWIDTHS = "2:13 2:13 2:13";
+}
+{
+ printf "%s|%s|%s\n", $1, $2, $3
+}
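The "2:13 2:13 2:13" value in this test uses the new skip:len form. A simplified, standalone sketch of how such a specification can be read with strtoul is given below. It is an illustration under assumptions, not the set_FIELDWIDTHS code: `struct field_spec` and `parse_widths` are hypothetical names, only spaces and tabs count as blanks, and errors are reported by returning -1 rather than by a fatal message.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical holder for one parsed "skip:len" or "len" item. */
struct field_spec {
	size_t skip;
	size_t len;
};

/*
 * parse_widths --- parse a blank-separated list of "len" or "skip:len"
 * items into out[0..max-1]. Returns the number of items parsed, or -1
 * if the string is malformed or holds more than max items.
 */
static int
parse_widths(const char *s, struct field_spec *out, int max)
{
	int n = 0;
	char *end;
	unsigned long val;

	assert(s != NULL && out != NULL);
	for (;;) {
		while (*s == ' ' || *s == '\t')
			s++;
		if (*s == '\0')
			break;
		/* reject signs up front: strtoul would accept a leading `-' */
		if (n >= max || ! isdigit((unsigned char) *s))
			return -1;
		errno = 0;
		val = strtoul(s, &end, 10);
		if (errno != 0 || val == 0 || val > INT_MAX)
			return -1;
		out[n].skip = 0;
		if (*end == ':') {
			/* first number was the skip; the length follows */
			out[n].skip = val;
			s = end + 1;
			if (! isdigit((unsigned char) *s))
				return -1;
			val = strtoul(s, &end, 10);
			if (errno != 0 || val == 0 || val > INT_MAX)
				return -1;
		}
		if (*end != '\0' && *end != ' ' && *end != '\t')
			return -1;
		out[n++].len = val;
		s = end;
	}
	return n;
}
```

With this sketch, the test's value parses to three items, each with skip 2 and len 13, and a plain "5" parses to one item with skip 0.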
diff --git a/test/fwtest3.ok b/test/fwtest3.ok
index e56e15bb..f4d28232 100644
--- a/test/fwtest3.ok
+++ b/test/fwtest3.ok
@@ -1 +1,12 @@
-12345
+ 0.4867373206| 1.3206333033|-0.2333178127
+ 0.5668176165| 1.3711756314|-0.2193558040
+ 0.4325251781| 1.3399488722|-0.1568307497
+ 0.4900487563| 1.3295759570|-0.2217392402
+-0.6790064191| 1.2536623801|-0.2955415433
+-0.6311440220| 1.2966579993|-0.2246692210
+-0.7209390351| 1.1783407099|-0.2539408209
+-0.6782473356| 1.2495242556|-0.2811436366
+-0.7062054082| 1.1223820964|-1.1619805834
+-0.6491590119| 1.1248946162|-1.0851579675
+-0.7948856821| 1.1208852325|-1.1259821556
+-0.7102549262| 1.1225121126|-1.1475381286
diff --git a/test/fwtest4.awk b/test/fwtest4.awk
new file mode 100644
index 00000000..d1384eaf
--- /dev/null
+++ b/test/fwtest4.awk
@@ -0,0 +1 @@
+BEGIN { FIELDWIDTHS="5" } { print $1 }
diff --git a/test/fwtest3.in b/test/fwtest4.in
index a32a4347..a32a4347 100644
--- a/test/fwtest3.in
+++ b/test/fwtest4.in
diff --git a/test/fwtest4.ok b/test/fwtest4.ok
new file mode 100644
index 00000000..e56e15bb
--- /dev/null
+++ b/test/fwtest4.ok
@@ -0,0 +1 @@
+12345