author    Arnold D. Robbins <arnold@skeeve.com>  2010-07-02 15:46:31 +0300
committer Arnold D. Robbins <arnold@skeeve.com>  2010-07-02 15:46:31 +0300
commit    3711eedc1b995eb1926c9ffb902d5d796cacf8d0
tree      5642fdee11499774e0b7401f195931cd3a143d18
parent    ec6415f1ba061b2fb78808b7dba3246745a15398

Now at 2.02.
gawk-2.02

-rw-r--r--  COPYING      90
-rw-r--r--  Makefile    153
-rw-r--r--  PROBLEMS     26
-rw-r--r--  README       37
-rw-r--r--  alloca.s    311
-rw-r--r--  awk.h       649
-rw-r--r--  awk.y      1940
-rw-r--r--  awk1.c     1275
-rw-r--r--  awk2.c     2090
-rw-r--r--  awk3.c     1446
-rw-r--r--  awk4.c      402
-rw-r--r--  awk5.c      154
-rw-r--r--  awk6.c      586
-rw-r--r--  awk7.c      552
-rw-r--r--  awk8.c      256
-rw-r--r--  awk9.c      272
-rw-r--r--  debug.c     485
-rw-r--r--  gawk.1     1181
-rw-r--r--  obstack.c   157
-rw-r--r--  obstack.h   204
-rw-r--r--  regex.c     657
-rw-r--r--  regex.h     118
-rw-r--r--  version.c    25
23 files changed, 8761 insertions, 4305 deletions
diff --git a/COPYING b/COPYING
index d0f89166..acf78358 100644
--- a/COPYING
+++ b/COPYING
@@ -1,10 +1,11 @@
GAWK GENERAL PUBLIC LICENSE
- (Clarified 20 March 1987)
+ (Clarified 11 Feb 1988)
- Copyright (C) 1986 Richard M. Stallman
+ Copyright (C) 1988 Richard M. Stallman
Everyone is permitted to copy and distribute verbatim copies
- of this license, but changing it is not allowed.
+ of this license, but changing it is not allowed. You can also
+ use this wording to make the terms for other programs.
The license agreements of most software companies keep you at the
mercy of those companies. By contrast, our general public license is
@@ -37,13 +38,13 @@ allowed to distribute or change GAWK.
COPYING POLICIES
- 1. You may copy and distribute verbatim copies of GAWK source code as
-you receive it, in any medium, provided that you conspicuously and
+ 1. You may copy and distribute verbatim copies of GAWK source code
+as you receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy a valid copyright notice "Copyright
-(C) 1986 Free Software Foundation, Inc." (or with the year updated if
-that is appropriate); keep intact the notices on all files that refer
-to this License Agreement and to the absence of any warranty; and give
-any other recipients of the GAWK program a copy of this License
+(C) 1988 Free Software Foundation, Inc." (or with whatever year is
+appropriate); keep intact the notices on all files that refer to this
+License Agreement and to the absence of any warranty; and give any
+other recipients of the GAWK program a copy of this License
Agreement along with the program. You may charge a distribution fee
for the physical act of transferring a copy.
@@ -55,35 +56,43 @@ Paragraph 1 above, provided that you also do the following:
that you changed the files and the date of any change; and
b) cause the whole of any work that you distribute or publish,
- that in whole or in part contains or is a derivative of GAWK or any
- part thereof, to be licensed at no charge to all third parties on
- terms identical to those contained in this License Agreement
- (except that you may choose to grant more extensive warranty
- protection to third parties, at your option).
+ that in whole or in part contains or is a derivative of GAWK or
+ any part thereof, to be licensed at no charge to all third
+ parties on terms identical to those contained in this License
+ Agreement (except that you may choose to grant more extensive
+ warranty protection to some or all third parties, at your option).
c) You may charge a distribution fee for the physical act of
transferring a copy, and you may at your option offer warranty
protection in exchange for a fee.
- 3. You may copy and distribute GAWK or any portion of it in
-compiled, executable or object code form under the terms of Paragraphs
-1 and 2 above provided that you do the following:
+Mere aggregation of another unrelated program with this program (or its
+derivative) on a volume of a storage or distribution medium does not bring
+the other program under the scope of these terms.
- a) cause each such copy to be accompanied by the
- corresponding machine-readable source code, which must
- be distributed under the terms of Paragraphs 1 and 2 above; or,
+ 3. You may copy and distribute GAWK (or a portion or derivative of it,
+under Paragraph 2) in object code or executable form under the terms of
+Paragraphs 1 and 2 above provided that you also do one of the following:
- b) cause each such copy to be accompanied by a
- written offer, with no time limit, to give any third party
- free (except for a nominal shipping charge) a machine readable
- copy of the corresponding source code, to be distributed
- under the terms of Paragraphs 1 and 2 above; or,
+ a) accompany it with the complete corresponding machine-readable
+ source code, which must be distributed under the terms of
+ Paragraphs 1 and 2 above; or,
- c) in the case of a recipient of GAWK in compiled, executable
- or object code form (without the corresponding source code) you
- shall cause copies you distribute to be accompanied by a copy
- of the written offer of source code which you received along
- with the copy you received.
+ b) accompany it with a written offer, valid for at least three
+ years, to give any third party free (except for a nominal
+ shipping charge) a complete machine-readable copy of the
+ corresponding source code, to be distributed under the terms of
+ Paragraphs 1 and 2 above; or,
+
+ c) accompany it with the information you received as to where the
+ corresponding source code may be obtained. (This alternative is
+ allowed only for noncommercial distribution and only if you
+ received the program in object code or executable form alone.)
+
+For an executable file, complete source code means all the source code for
+all modules it contains; but, as a special exception, it need not include
+source code for modules which are standard libraries that accompany the
+operating system on which the executable file runs.
4. You may not copy, sublicense, distribute or transfer GAWK
except as expressly provided under this License Agreement. Any attempt
@@ -93,16 +102,17 @@ automatically terminated. However, parties who have received computer
software programs from you with this License Agreement will not have
their licenses terminated so long as such parties remain in full compliance.
- 5. If you wish to incorporate parts of GAWK into other free
-programs whose distribution conditions are different, write to the Free
-Software Foundation at 1000 Mass Ave, Cambridge, MA 02138. We have not yet
-worked out a simple rule that can be stated here, but we will often permit
-this. We will be guided by the two goals of preserving the free status of
-all derivatives of our free software and of promoting the sharing and reuse
-of software.
-
-In other words, go ahead and share GAWK, but don't try to stop
-anyone else from sharing it farther. Help stamp out software hoarding!
+ 5. If you wish to incorporate parts of GAWK into other free programs
+whose distribution conditions are different, write to the Free Software
+Foundation at 675 Mass Ave, Cambridge, MA 02139. We have not yet worked
+out a simple rule that can be stated here, but we will often permit this.
+We will be guided by the two goals of preserving the free status of all
+derivatives of our free software and of promoting the sharing and reuse of
+software.
+
+Your comments and suggestions about our licensing policies and our
+software are welcome! Please contact the Free Software Foundation, Inc.,
+675 Mass Ave, Cambridge, MA 02139, or call (617) 876-3296.
NO WARRANTY
diff --git a/Makefile b/Makefile
index e0f050cb..da8feb73 100644
--- a/Makefile
+++ b/Makefile
@@ -1,36 +1,145 @@
-LIB=.
-#CFLAGS=-O -pg -I$(LIB) -DFAST
-CFLAGS=-g -I$(LIB)
-LIBOBJS= $(LIB)/obstack.o $(LIB)/regex.o
-OBJS = awk1.o awk2.o awk3.o awk.tab.o debug.o
-gawk: $(OBJS) $(LIBOBJS)
- $(CC) -o gawk $(CFLAGS) $(OBJS) $(LIBOBJS) -lm
-$(OBJS): awk.h
-awk.tab.c: awk.y
- bison -v awk.y
-# yacc -v awk.y
-# mv y.tab.c awk.tab.c
-lint:
- lint -I$(LIB) awk.tab.c awk1.c awk2.c awk3.c
-clean:
- rm -f gawk *.o core awk.output awk.tab.c
+# Makefile for GNU Awk.
+#
+# Copyright (C) 1988 Free Software Foundation
+# Rewritten by Arnold Robbins, September 1988
+#
+# GAWK is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY. No author or distributor accepts responsibility to anyone for
+# the consequences of using it or for whether it serves any particular
+# purpose or works at all, unless he says so in writing. Refer to the GAWK
+# General Public License for full details.
+#
+# Everyone is granted permission to copy, modify and redistribute GAWK, but
+# only under the conditions described in the GAWK General Public License. A
+# copy of this license is supposed to have been given to you along with GAWK
+# so you can know your rights and responsibilities. It should be in a file
+# named COPYING. Among other things, the copyright notice and this notice
+# must be preserved on all copies.
+#
+# In other words, go ahead and share GAWK, but don't try to stop anyone else
+# from sharing it farther. Help stamp out software hoarding!
+#
+
+# CFLAGS: options to the C compiler
+#
+# -I. so includes of <obstack.h> work. mandatory. (fix?)
+# -O optimize
+# -g include dbx/sdb info
+# -gg include gdb debugging info; only for GCC
+# -pg include new (gmon) profiling info
+# -p include old style profiling info (System V)
+#
+# -Bstatic - For SunOS 4.0, don't use dynamic linking
+# -DUSG - for System V boxen.
+# -DSTRICT - remove anything not in Unix awk. Off by default.
+# -DDEBUG - include debugging code and options
+# -DVPRINTF - system has vprintf and associated routines
+# -DBSD - system needs version of vprintf et al. defined in awk5.c
+# (this is the only use at present, so don't define it if you
+# *have* vprintf et al. in your library)
+#
+INCLUDE= #-I.
+OPTIMIZE= -O
+DEBUG= #-DDEBUG
+DEBUGGER= -g
+PROFILE=#-pg
+SUNOS=# -Bstatic
+SYSV=# -DVPRINTF
+BSD=-DBSD
+
+FLAGS= $(INCLUDE) $(OPTIMIZE) $(SYSV) $(DEBUG) $(BSD)
+CFLAGS = $(FLAGS) $(DEBUGGER) $(SUNOS) $(PROFILE)
+
+SRC = awk1.c awk2.c awk3.c awk4.c awk5.c \
+ awk6.c awk7.c awk8.c awk9.c regex.c version.c #obstack.c
+
+AWKOBJS = awk1.o awk2.o awk3.o awk4.o awk5.o awk6.o awk7.o awk8.o awk9.o \
+ version.o
+ALLOBJS = $(AWKOBJS) awk.tab.o
+
+# Parser to use on grammar -- if you don't have bison use the first one
+#PARSER = yacc
+PARSER = bison
+
+# S5OBJS
+# Set equal to alloca.o if your system is S5 and you don't have
+# alloca. Uncomment the rule below to actually make alloca.o.
+S5OBJS=
+
+# LIBOBJS
+# Stuff that awk uses as library routines, but not in /lib/libc.a.
+LIBOBJS= regex.o $(S5OBJS) #obstack.o
+
+# DOCS
+# Documentation for users
+DOCS= gawk.1
# We don't distribute shar files, but they're useful for mailing.
-SHARS = COPYING Makefile README awk.h awk.tab.c awk.y awk1.c awk2.c awk3.c\
- debug.c obstack.h obstack.c regex.h regex.c
+UPDATES = Makefile awk.h awk.y \
+ $(SRC) regex.h #obstack.h
+
+SHARS = $(DOCS) COPYING README.1.01 README PROBLEMS \
+ $(UPDATES) awk.tab.c\
+ alloca.s
+
+gawk: $(ALLOBJS) $(LIBOBJS)
+ $(CC) -o gawk $(CFLAGS) $(ALLOBJS) $(LIBOBJS) -lm
+
+$(AWKOBJS): awk.h
+
+awk.tab.o: awk.h awk.tab.c
+
+awk.tab.c: awk.y
+ $(PARSER) -v awk.y
+ -mv -f y.tab.c awk.tab.c
+# -if [ $(PARSER) = "yacc" ] ; \
+# then \
+# if cmp -s y.tab.h awk.tab.h ; \
+# then : ; \
+# else \
+# cp y.tab.h awk.tab.h ; \
+# grep '^#.*define' awk.tab.h | \
+# sed 's/^# define \([^ ]*\) [^ ]*$$/ "\1",/' >y.tok.h ; \
+# mv y.tab.c awk.tab.c; \
+# fi; \
+# fi
+
+# Alloca: uncomment this if your system (notably System V boxen)
+# does not have alloca in /lib/libc.a
+#
+#alloca.o: alloca.s
+# /lib/cpp < alloca.s | sed '/^#/d' > t.s
+# as t.s -o alloca.o
+# rm t.s
+
+lint: $(SRC)
+ lint -h $(FLAGS) $(SRC) awk.tab.c
+
+clean:
+ rm -f gawk *.o core awk.output awk.tab.c gmon.out make.out
awk.shar: $(SHARS)
shar -f awk -c $(SHARS)
-
+
awk.tar: $(SHARS)
tar cvf awk.tar $(SHARS)
+updates.tar: $(UPDATES)
+ tar cvf gawk.tar $(UPDATES)
+
awk.tar.Z: awk.tar
- -@mv awk.tar.Z awk.tar.old.Z
compress < awk.tar > awk.tar.Z
+doc: $(DOCS)
+ nroff -man $(DOCS) | col > $(DOCS).out
+
# This command probably won't be useful to the rest of the world, but makes
# life much easier for me.
dist: awk.tar awk.tar.Z
- rcp awk.tar prep:/u2/emacs/awk.tar
- rcp awk.tar.Z prep:/u2/emacs/awk.tar.Z
+
+diff:
+ for i in RCS/*; do rcsdiff -c -b $$i > `basename $$i ,v`.diff; done
+
+update: $(UPDATES)
+ sendup $?
+ touch update
diff --git a/PROBLEMS b/PROBLEMS
new file mode 100644
index 00000000..816ac26c
--- /dev/null
+++ b/PROBLEMS
@@ -0,0 +1,26 @@
+This is a list of known problems in the current version of gawk.
+Hopefully they will all be fixed in the next major release of gawk.
+
+Please keep in mind that this is still beta software and the code
+is still undergoing significant evolution.
+
+1. Memory Management. Gawk has memory leaks. This version (12/23/88) is
+ better than earlier versions, but still not wonderful.
+
+2. Gawk reportedly does not work well with the BSD getopt. The getopt from
+ gnu grep is reported to work fine.
+
+3. The % operator truncates to integer. This will be fixed.
+
+4. \ inside [] in regexps doesn't work like the book says they should.
+ This will also be fixed.
+
+5. %g does not seem to truncate non-significant zeros.
+
+6. No gawk.texinfo; this is being worked on.
+
+7. MS-DOS support. This version does not have it, although support for
+ MSC 5.1 was recently contributed and will be included in the next
+ major release.
+
+Arnold Robbins
diff --git a/README b/README
new file mode 100644
index 00000000..d717d4ab
--- /dev/null
+++ b/README
@@ -0,0 +1,37 @@
+README:
+
+This is GNU Awk 2.00 Beta. It should be functionally equivalent to the
+System V Release 4 awk. (Yes, you read that right.)
+
+**** N O T I C E ****: Although the functionality is reasonably stable
+and we think it relatively bug-free, the code itself is not stable and
+the next release may look quite different. In particular, the memory
+management is in a state of flux and will be greatly changed (and
+improved) in the next release.
+
+Some additional features are under design/discussion with Randall
+Howard at MKS and Brian Kernighan at Bell Labs. Although they are
+documented, they are subject to future change.
+
+The gawk.1 man page is concise but (to the best of our knowledge) complete.
+A gawk.texinfo is in the works. A preliminary draft exists, but has numerous
+technical errors which are being fixed. It also will have to be reorganized.
+Don't look for it for a while. However, the AWK book will do quite well.
+
+Please coordinate changes through David Trueman and/or Arnold Robbins.
+
+David Trueman
+Department of Mathematics, Statistics and Computing Science,
+Dalhousie University, Halifax, Nova Scotia, Canada
+
+UUCP {uunet utai watmath}!dalcs!david
+CDN david@cs.dal.cdn
+INTERNET david%dalcs@uunet.UU.NET
+
+Arnold Robbins
+Emory University Computing Center
+Emory University, Atlanta, GA, 30322, USA
+
+DOMAIN: arnold@emoryu1.cc.emory.edu
+UUCP: { gatech, mtxinu }!emoryu1!arnold
+BITNET: arnold@emoryu1
diff --git a/alloca.s b/alloca.s
new file mode 100644
index 00000000..3fc1b6f2
--- /dev/null
+++ b/alloca.s
@@ -0,0 +1,311 @@
+/* `alloca' standard 4.2 subroutine for 68000's and 16000's and others.
+ Also has _setjmp and _longjmp for pyramids.
+ Copyright (C) 1985, 1986, 1988 Free Software Foundation, Inc.
+
+This file is part of GNU Emacs.
+
+GNU Emacs is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY. No author or distributor
+accepts responsibility to anyone for the consequences of using it
+or for whether it serves any particular purpose or works at all,
+unless he says so in writing. Refer to the GNU Emacs General Public
+License for full details.
+
+Everyone is granted permission to copy, modify and redistribute
+GNU Emacs, but only under the conditions described in the
+GNU Emacs General Public License. A copy of this license is
+supposed to have been given to you along with GNU Emacs so you
+can know your rights and responsibilities. It should be in a
+file named COPYING. Among other things, the copyright notice
+and this notice must be preserved on all copies. */
+
+
+/* Both 68000 systems I have run this on have had broken versions of alloca.
+ Also, I am told that non-berkeley systems do not have it at all.
+ So replace whatever system-provided alloca there may be
+ on all 68000 systems. */
+
+/* #include "config.h" */
+#define m68k
+
+#ifndef HAVE_ALLOCA /* define this to use system's alloca */
+
+#ifndef hp9000s300
+#ifndef m68k
+#ifndef m68000
+#ifndef WICAT
+#ifndef ns16000
+#ifndef sequent
+#ifndef pyramid
+#ifndef ATT3B5
+#ifndef XENIX
+you
+lose!!
+#endif /* XENIX */
+#endif /* ATT3B5 */
+#endif /* pyramid */
+#endif /* sequent */
+#endif /* ns16000 */
+#endif /* WICAT */
+#endif /* m68000 */
+#endif /* m68k */
+#endif /* hp9000s300 */
+
+
+#ifdef hp9000s300
+#ifdef OLD_HP_ASSEMBLER
+ data
+ text
+ globl _alloca
+_alloca
+ move.l (sp)+,a0 ; pop return addr from top of stack
+ move.l (sp)+,d0 ; pop size in bytes from top of stack
+ add.l #ROUND,d0 ; round size up to long word
+ and.l #MASK,d0 ; mask out lower two bits of size
+ sub.l d0,sp ; allocate by moving stack pointer
+ tst.b PROBE(sp) ; stack probe to allocate pages
+ move.l sp,d0 ; return pointer
+ add.l #-4,sp ; new top of stack
+ jmp (a0) ; not a normal return
+MASK equ -4 ; Longword alignment
+ROUND equ 3 ; ditto
+PROBE equ -128 ; safety buffer for C compiler scratch
+ data
+#else /* new hp assembler syntax */
+/*
+ The new compiler does "move.m <registers> (%sp)" to save registers,
+ so we must copy the saved registers when we mung the sp.
+ The old compiler did "move.m <register> <offset>(%a6)", which
+ gave us no trouble
+ */
+ text
+ set PROBE,-128 # safety for C frame temporaries
+ set MAXREG,10 # d2-d7, a2-a5 may have been saved
+ global _alloca
+_alloca:
+ mov.l (%sp)+,%a0 # return addess
+ mov.l (%sp)+,%d0 # number of bytes to allocate
+ mov.l %sp,%a1 # save old sp for register copy
+ mov.l %sp,%d1 # compute new sp
+ sub.l %d0,%d1 # space requested
+ and.l &-4,%d1 # round down to longword
+ sub.l &MAXREG*4,%d1 # space for saving registers
+ mov.l %d1,%sp # save new value of sp
+ tst.b PROBE(%sp) # create pages (sigh)
+ move.w &MAXREG-1,%d0
+copy_regs_loop: /* save caller's saved registers */
+ mov.l (%a1)+,(%sp)+
+ dbra %d0,copy_regs_loop
+ mov.l %sp,%d0 # return value
+ mov.l %d1,%sp
+ add.l &-4,%sp # adjust tos
+ jmp (%a0) # rts
+#endif /* new hp assembler */
+#else
+#ifdef m68k /* SGS assembler totally different */
+ file "alloca.s"
+ global alloca
+alloca:
+ mov.l (%sp)+,%a1 # pop return addr from top of stack
+ mov.l (%sp)+,%d0 # pop size in bytes from top of stack
+ add.l &R%1,%d0 # round size up to long word
+ and.l &-4,%d0 # mask out lower two bits of size
+ sub.l %d0,%sp # allocate by moving stack pointer
+ tst.b P%1(%sp) # stack probe to allocate pages
+ mov.l %sp,%a0 # return pointer as pointer
+ mov.l %sp,%d0 # return pointer as int to avoid disaster
+ add.l &-4,%sp # new top of stack
+ jmp (%a1) # not a normal return
+ set S%1,64 # safety factor for C compiler scratch
+ set R%1,3+S%1 # add to size for rounding
+ set P%1,-132 # probe this far below current top of stack
+
+#else /* not m68k */
+
+#ifdef m68000
+
+#ifdef WICAT
+/*
+ * Registers are saved after the corresponding link so we have to explicitly
+ * move them to the top of the stack where they are expected to be.
+ * Since we do not know how many registers were saved in the calling function
+ * we must assume the maximum possible (d2-d7,a2-a5). Hence, we end up
+ * wasting some space on the stack.
+ *
+ * The large probe (tst.b) attempts to make up for the fact that we have
+ * potentially used up the space that the caller probed for its own needs.
+ */
+ .procss m0
+ .config "68000 1"
+ .module _alloca
+MAXREG: .const 10
+ .sect text
+ .global _alloca
+_alloca:
+ move.l (sp)+,a1 ; pop return address
+ move.l (sp)+,d0 ; pop allocation size
+ move.l sp,d1 ; get current SP value
+ sub.l d0,d1 ; adjust to reflect required size...
+ sub.l #MAXREG*4,d1 ; ...and space needed for registers
+ and.l #-4,d1 ; backup to longword boundry
+ move.l sp,a0 ; save old SP value for register copy
+ move.l d1,sp ; set the new SP value
+ tst.b -4096(sp) ; grab an extra page (to cover caller)
+ move.l a2,d1 ; save callers register
+ move.l sp,a2
+ move.w #MAXREG-1,d0 ; # of longwords to copy
+loop: move.l (a0)+,(a2)+ ; copy registers...
+ dbra d0,loop ; ...til there are no more
+ move.l a2,d0 ; end of register area is addr for new space
+ move.l d1,a2 ; restore saved a2.
+ addq.l #4,sp ; caller will increment sp by 4 after return.
+ move.l d0,a0 ; return value in both a0 and d0.
+ jmp (a1)
+ .end _alloca
+#else
+
+/* Some systems want the _, some do not. Win with both kinds. */
+.globl _alloca
+_alloca:
+.globl alloca
+alloca:
+ movl sp@+,a0
+ movl a7,d0
+ subl sp@,d0
+ andl #~3,d0
+ movl d0,sp
+ tstb sp@(0) /* Make stack pages exist */
+ /* Needed on certain systems
+ that lack true demand paging */
+ addql #4,d0
+ jmp a0@
+
+#endif /* not WICAT */
+#endif /* m68000 */
+#endif /* not m68k */
+#endif /* not hp9000s300 */
+
+#ifdef ns16000
+
+ .text
+ .align 2
+/* Some systems want the _, some do not. Win with both kinds. */
+.globl _alloca
+_alloca:
+.globl alloca
+alloca:
+
+/* Two different assembler syntaxes are used for the same code
+ on different systems. */
+
+#ifdef sequent
+#define IM
+#define REGISTER(x) x
+#else
+#define IM $
+#define REGISTER(x) 0(x)
+#endif
+
+/*
+ * The ns16000 is a little more difficult, need to copy regs.
+ * Also the code assumes direct linkage call sequence (no mod table crap).
+ * We have to copy registers, and therefore waste 32 bytes.
+ *
+ * Stack layout:
+ * new sp -> junk
+ * registers (copy)
+ * r0 -> new data
+ * | (orig retval)
+ * | (orig arg)
+ * old sp -> regs (orig)
+ * local data
+ * fp -> old fp
+ */
+
+ movd tos,r1 /* pop return addr */
+ negd tos,r0 /* pop amount to allocate */
+ sprd sp,r2
+ addd r2,r0
+ bicb IM/**/3,r0 /* 4-byte align */
+ lprd sp,r0
+ adjspb IM/**/36 /* space for regs, +4 for caller to pop */
+ movmd 0(r2),4(sp),IM/**/4 /* copy regs */
+ movmd 0x10(r2),0x14(sp),IM/**/4
+ jump REGISTER(r1) /* funky return */
+#endif /* ns16000 */
+
+#ifdef pyramid
+
+.globl _alloca
+
+_alloca: addw $3,pr0 # add 3 (dec) to first argument
+ bicw $3,pr0 # then clear its last 2 bits
+ subw pr0,sp # subtract from SP the val in PR0
+ andw $-32,sp # keep sp aligned on multiple of 32.
+ movw sp,pr0 # ret. current SP
+ ret
+
+#ifdef PYRAMID_OLD /* This isn't needed in system version 4. */
+.globl __longjmp
+.globl _longjmp
+.globl __setjmp
+.globl _setjmp
+
+__longjmp: jump _longjmp
+__setjmp: jump _setjmp
+#endif
+
+#endif /* pyramid */
+
+#ifdef ATT3B5
+
+ .align 4
+ .globl alloca
+
+alloca:
+ movw %ap, %r8
+ subw2 $9*4, %r8
+ movw 0(%r8), %r1 /* pc */
+ movw 4(%r8), %fp
+ movw 8(%r8), %sp
+ addw2 %r0, %sp /* make room */
+ movw %sp, %r0 /* return value */
+ jmp (%r1) /* continue... */
+
+#endif /* ATT3B5 */
+
+#ifdef XENIX
+
+.386
+
+_TEXT segment dword use32 public 'CODE'
+assume cs:_TEXT
+
+;-------------------------------------------------------------------------
+
+public _alloca
+_alloca proc near
+
+ pop ecx ; return address
+ pop eax ; amount to alloc
+ add eax,3 ; round it to 32-bit boundary
+ and al,11111100B ;
+ mov edx,esp ; current sp in edx
+ sub edx,eax ; lower the stack
+ xchg esp,edx ; start of allocation in esp, old sp in edx
+ mov eax,esp ; return ptr to base in eax
+ push [edx+8] ; save poss. stored reg. values (esi,edi,ebx)
+ push [edx+4] ; on lowered stack
+ push [edx] ;
+ sub esp,4 ; allow for 'add esp, 4'
+ jmp ecx ; jump to return address
+
+_alloca endp
+
+_TEXT ends
+
+end
+
+#endif /* XENIX */
+
+#endif /* not HAVE_ALLOCA */
diff --git a/awk.h b/awk.h
index be792e07..fce57f65 100644
--- a/awk.h
+++ b/awk.h
@@ -1,163 +1,351 @@
/*
- * awk.h -- Definitions for gawk.
+ * awk.h -- Definitions for gawk.
*
- * Copyright (C) 1986 Free Software Foundation
- * Written by Paul Rubin, August 1986
+ * Copyright (C) 1986 Free Software Foundation Written by Paul Rubin, August
+ * 1986
+ *
+ * $Log: awk.h,v $
+ * Revision 1.29 88/12/15 12:52:10 david
+ * casetable made static elsewhere
+ *
+ * Revision 1.28 88/12/14 10:50:21 david
+ * change FREE_TEMP macro to free_temp
+ *
+ * Revision 1.27 88/12/13 22:20:09 david
+ * macro-front-end tree_eval, force_string and force_number
+ *
+ * Revision 1.25 88/12/08 15:57:11 david
+ * added some #ifdef'd out debugging code
+ *
+ * Revision 1.24 88/12/07 19:58:37 david
+ * changes for printing current source file in error messages
+ *
+ * Revision 1.23 88/12/01 15:07:10 david
+ * changes to accomodate source line numbers in error messages
+ *
+ * Revision 1.22 88/11/30 15:14:59 david
+ * FREE_ONE_REFERENCE macro merged inot do_deref()
+ *
+ * Revision 1.21 88/11/29 15:17:01 david
+ * minor movement
+ *
+ * Revision 1.20 88/11/23 21:36:00 david
+ * Arnold: portability addition
+ *
+ * Revision 1.19 88/11/22 15:51:23 david
+ * changed order of elements in NODE decl. for better packing on sparc and
+ * similar machines
+ *
+ * Revision 1.18 88/11/22 13:45:15 david
+ * Arnold: changes for case-insensitive matching
+ *
+ * Revision 1.17 88/11/15 10:15:28 david
+ * Arnold: move a bunch of #include's here
+ *
+ * Revision 1.16 88/11/14 21:50:26 david
+ * Arnold: get sprintf() declaration right; correct STREQ macro
+ *
+ * Revision 1.15 88/11/14 21:24:50 david
+ * added extern decl. for field_num
+ *
+ * Revision 1.14 88/11/03 15:21:03 david
+ * extended flags defines; made force_number safe; added TEMP_FREE define
+ *
+ * Revision 1.13 88/11/01 12:52:18 david
+ * allowed for vprintf code in awk5.c
+ *
+ * Revision 1.12 88/11/01 12:07:27 david
+ * cleanup; additions of external declarations; added variable name to node;
+ * moved flags from sub.val to node proper
+ *
+ * Revision 1.11 88/10/19 21:54:29 david
+ * safe_malloc to be used by obstack_alloc
+ * Node_val to replace other value types (to be done)
+ * emalloc and erealloc macros
+ *
+ * Revision 1.10 88/10/17 19:52:50 david
+ * Arnold: fix cant_happen(); improve VPRINTF; purge FAST
+ *
+ * Revision 1.9 88/10/13 22:02:47 david
+ * added some external declarations to make life easier
+ * #define VPRINTF for portable variable arg list handling
+ *
+ * Revision 1.8 88/10/11 22:19:05 david
+ * added external decl.
+ *
+ * Revision 1.7 88/06/05 22:15:40 david
+ * deleted level member from hashnode structure
+ *
+ * Revision 1.6 88/06/05 22:05:25 david
+ * added cnt member to NODE structure (doesn't add to size, since val member
+ * dominates)
+ *
+ * Revision 1.5 88/05/31 09:29:14 david
+ * expunge Node_local_var
+ *
+ * Revision 1.4 88/05/27 11:04:07 david
+ * changed AWKNUM to double to correspond to nawk
+ *
+ * Revision 1.3 88/05/13 22:07:56 david
+ * moved some defines here from elsewhere
+ *
+ * Revision 1.2 88/05/04 12:17:04 david
+ * make_for_loop() now returns a NODE *
+ *
+ * Revision 1.1 88/04/08 15:15:25 david
+ * Initial revision
+ * Revision 1.6 88/04/08 14:48:25 david changes from Arnold
+ * Robbins
+ *
+ * Revision 1.5 88/03/23 22:17:23 david mostly delinting -- a couple of bug
+ * fixes
+ *
+ * Revision 1.4 88/03/18 21:00:05 david Baseline -- hoefully all the
+ * functionality of the new awk added. Just debugging and tuning to do.
+ *
+ * Revision 1.3 87/11/19 14:34:12 david added a bunch of new Node types added
+ * a new union entry to the expnode structure to accomodate function
+ * parameter names added a level variable to the symbol structure to keep
+ * track of function nesting level
+ *
+ * Revision 1.2 87/10/29 21:48:32 david added Node_in_array NODETYPE
+ *
+ * Revision 1.1 87/10/27 15:23:07 david Initial revision
*
*/
/*
-GAWK is distributed in the hope that it will be useful, but WITHOUT ANY
-WARRANTY. No author or distributor accepts responsibility to anyone
-for the consequences of using it or for whether it serves any
-particular purpose or works at all, unless he says so in writing.
-Refer to the GAWK General Public License for full details.
-
-Everyone is granted permission to copy, modify and redistribute GAWK,
-but only under the conditions described in the GAWK General Public
-License. A copy of this license is supposed to have been given to you
-along with GAWK so you can know your rights and responsibilities. It
-should be in a file named COPYING. Among other things, the copyright
-notice and this notice must be preserved on all copies.
-
-In other words, go ahead and share GAWK, but don't try to stop
-anyone else from sharing it farther. Help stamp out software hoarding!
-*/
+ * GAWK is distributed in the hope that it will be useful, but WITHOUT ANY
+ * WARRANTY. No author or distributor accepts responsibility to anyone for
+ * the consequences of using it or for whether it serves any particular
+ * purpose or works at all, unless he says so in writing. Refer to the GAWK
+ * General Public License for full details.
+ *
+ * Everyone is granted permission to copy, modify and redistribute GAWK, but
+ * only under the conditions described in the GAWK General Public License. A
+ * copy of this license is supposed to have been given to you along with GAWK
+ * so you can know your rights and responsibilities. It should be in a file
+ * named COPYING. Among other things, the copyright notice and this notice
+ * must be preserved on all copies.
+ *
+ * In other words, go ahead and share GAWK, but don't try to stop anyone else
+ * from sharing it farther. Help stamp out software hoarding!
+ */
-#define AWKNUM float
+#define AWKNUM double
+#include <stdio.h>
#include <ctype.h>
+#include <setjmp.h>
+#include <varargs.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <errno.h>
+
+#include "regex.h"
#define is_identchar(c) (isalnum(c) || (c) == '_')
+#ifdef notdef
+#define free do_free /* for debugging */
+#define malloc do_malloc /* for debugging */
+#endif
+
#include "obstack.h"
-#define obstack_chunk_alloc malloc
+#define obstack_chunk_alloc safe_malloc
#define obstack_chunk_free free
-char *malloc(),*realloc();
+char *malloc(), *realloc();
+char *safe_malloc();
void free();
typedef enum {
- /* illegal entry == 0 */
- Node_illegal, /* 0 */
-
- /* binary operators lnode and rnode are the expressions to work on */
- Node_times, /* 1 */
- Node_quotient, /* 2 */
- Node_mod, /* 3 */
- Node_plus, /* 4 */
- Node_minus, /* 5 */
- Node_cond_pair, /* 6: conditional pair (see Node_line_range) jfw */
- Node_subscript, /* 7 */
- Node_concat, /* 8 */
-
- /* unary operators subnode is the expression to work on */
- Node_preincrement, /* 9 */
- Node_predecrement, /* 10 */
- Node_postincrement, /* 11 */
- Node_postdecrement, /* 12 */
- Node_unary_minus, /* 13 */
- Node_field_spec, /* 14 */
-
- /* assignments lnode is the var to assign to, rnode is the exp */
- Node_assign, /* 15 */
- Node_assign_times, /* 16 */
- Node_assign_quotient, /* 17 */
- Node_assign_mod, /* 18 */
- Node_assign_plus, /* 19 */
- Node_assign_minus, /* 20 */
-
- /* boolean binaries lnode and rnode are expressions */
- Node_and, /* 21 */
- Node_or, /* 22 */
-
- /* binary relationals compares lnode and rnode */
- Node_equal, /* 23 */
- Node_notequal, /* 24 */
- Node_less, /* 25 */
- Node_greater, /* 26 */
- Node_leq, /* 27 */
- Node_geq, /* 28 */
-
- /* unary relationals works on subnode */
- Node_not, /* 29 */
-
- /* match ops (binary) work on lnode and rnode ??? */
- Node_match, /* 30 */
- Node_nomatch, /* 31 */
-
- /* data items */
- Node_string, /* 32 has stlen, stptr, and stref */
- Node_temp_string, /* 33 has stlen, stptr, and stref */
- Node_number, /* 34 has numbr */
-
- /* program structures */
- Node_rule_list, /* 35 lnode is a rule, rnode is rest of list */
- Node_rule_node, /* 36 lnode is an conditional, rnode is statement */
- Node_statement_list, /* 37 lnode is a statement, rnode is more list */
- Node_if_branches, /* 38 lnode is to run on true, rnode on false */
- Node_expression_list, /* 39 lnode is an exp, rnode is more list */
-
- /* keywords */
- Node_K_BEGIN, /* 40 no stuff */
- Node_K_END, /* 41 ditto */
- Node_K_if, /* 42 lnode is conditonal, rnode is if_branches */
- Node_K_while, /* 43 lnode is condtional, rnode is stuff to run */
- Node_K_for, /* 44 lnode is for_struct, rnode is stuff to run */
- Node_K_arrayfor, /* 45 lnode is for_struct, rnode is stuff to run */
- Node_K_break, /* 46 no subs */
- Node_K_continue, /* 47 no stuff */
- Node_K_print, /* 48 lnode is exp_list, rnode is redirect */
- Node_K_printf, /* 49 lnode is exp_list, rnode is redirect */
- Node_K_next, /* 59 no subs */
- Node_K_exit, /* 51 subnode is return value, or NULL */
-
- /* I/O redirection for print statements */
- Node_redirect_output, /* 52 subnode is where to redirect */
- Node_redirect_append, /* 53 subnode is where to redirect */
- Node_redirect_pipe, /* 54 subnode is where to redirect */
-
- /* Variables */
- Node_var, /* 55 rnode is value, lnode is array stuff */
- Node_var_array, /* 56 array is ptr to elements, asize num of eles */
-
- /* Builtins subnode is explist to work on, proc is func to call */
- Node_builtin, /* 57 */
-
- /* pattern: conditional ',' conditional ; lnode of Node_line_range is
- * the two conditionals (Node_cond_pair), other word (rnode place) is
- * a flag indicating whether or not this range has been entered.
- * (jfw@eddie.mit.edu)
- */
- Node_line_range, /* 58 */
+ /* illegal entry == 0 */
+ Node_illegal, /* 0 */
+
+ /* binary operators lnode and rnode are the expressions to work on */
+ Node_times, /* 1 */
+ Node_quotient, /* 2 */
+ Node_mod, /* 3 */
+ Node_plus, /* 4 */
+ Node_minus, /* 5 */
+ Node_cond_pair, /* 6: conditional pair (see Node_line_range)
+ * jfw */
+ Node_subscript, /* 7 */
+ Node_concat, /* 8 */
+
+ /* unary operators subnode is the expression to work on */
+ Node_preincrement, /* 9 */
+ Node_predecrement, /* 10 */
+ Node_postincrement, /* 11 */
+ Node_postdecrement, /* 12 */
+ Node_unary_minus, /* 13 */
+ Node_field_spec, /* 14 */
+
+ /* assignments lnode is the var to assign to, rnode is the exp */
+ Node_assign, /* 15 */
+ Node_assign_times, /* 16 */
+ Node_assign_quotient, /* 17 */
+ Node_assign_mod, /* 18 */
+ Node_assign_plus, /* 19 */
+ Node_assign_minus, /* 20 */
+
+ /* boolean binaries lnode and rnode are expressions */
+ Node_and, /* 21 */
+ Node_or, /* 22 */
+
+ /* binary relationals compares lnode and rnode */
+ Node_equal, /* 23 */
+ Node_notequal, /* 24 */
+ Node_less, /* 25 */
+ Node_greater, /* 26 */
+ Node_leq, /* 27 */
+ Node_geq, /* 28 */
+
+ /* unary relationals works on subnode */
+ Node_not, /* 29 */
+
+ /* match ops (binary) work on lnode and rnode ??? */
+ Node_match, /* 30 */
+ Node_nomatch, /* 31 */
+
+ /* data items */
+ Node_string, /* deprecated: 32 has stlen, stptr, and stref */
+ Node_temp_string, /* deprecated: 33 has stlen, stptr, and stref */
+ Node_number, /* deprecated: 34 has numbr */
+
+ /* program structures */
+ Node_rule_list, /* 35 lnode is a rule, rnode is rest of list */
+ Node_rule_node, /* 36 lnode is a conditional, rnode is
+ * statement */
+ Node_statement_list, /* 37 lnode is a statement, rnode is more
+ * list */
+ Node_if_branches, /* 38 lnode is to run on true, rnode on false */
+ Node_expression_list, /* 39 lnode is an exp, rnode is more list */
+ Node_param_list, /* 40 lnode is a variable, rnode is more list */
+
+ /* keywords */
+ Node_K_BEGIN, /* 41 no stuff */
+ Node_K_END, /* 42 ditto */
+ Node_K_if, /* 43 lnode is conditional, rnode is
+ * if_branches */
+ Node_K_while, /* 44 lnode is conditional, rnode is stuff to
+ * run */
+ Node_K_for, /* 45 lnode is for_struct, rnode is stuff to
+ * run */
+ Node_K_arrayfor, /* 46 lnode is for_struct, rnode is stuff to
+ * run */
+ Node_K_break, /* 47 no subs */
+ Node_K_continue, /* 48 no stuff */
+ Node_K_print, /* 49 lnode is exp_list, rnode is redirect */
+ Node_K_printf, /* 50 lnode is exp_list, rnode is redirect */
+ Node_K_next, /* 51 no subs */
+ Node_K_exit, /* 52 subnode is return value, or NULL */
+ Node_K_do, /* 53 lnode is conditional, rnode is stuff to
+ * run */
+ Node_K_return, /* 54 */
+ Node_K_delete, /* 55 */
+
+ /* I/O redirection for print statements */
+ Node_redirect_output, /* 56 subnode is where to redirect */
+ Node_redirect_append, /* 57 subnode is where to redirect */
+ Node_redirect_pipe, /* 58 subnode is where to redirect */
+ Node_redirect_pipein, /* 59 subnode is where to redirect */
+ Node_redirect_input, /* 60 subnode is where to redirect */
+
+ /* Variables */
+ Node_var, /* 61 rnode is value, lnode is array stuff */
+ Node_var_array, /* 62 array is ptr to elements, asize num of
+ * eles */
+
+ /* Builtins subnode is explist to work on, proc is func to call */
+ Node_builtin, /* 63 */
+
+ /*
+ * pattern: conditional ',' conditional ; lnode of Node_line_range
+ * is the two conditionals (Node_cond_pair), other word (rnode place)
+ * is a flag indicating whether or not this range has been entered.
+ * (jfw@eddie.mit.edu)
+ */
+ Node_line_range, /* 64 */
+
+ /*
+ * boolean test of membership in array lnode is string-valued
+ * expression rnode is array name
+ */
+ Node_in_array, /* 65 */
+ Node_K_function, /* 66 lnode is statement list, rnode is
+ * func_params */
+ Node_func, /* 67 lnode is param. list, rnode is
+ * statement list */
+ Node_func_call, /* 68 lnode is name, rnode is expression list */
+ Node_K_getline, /* 69 */
+ Node_sub, /* 70 */
+ Node_gsub, /* 71 */
+ Node_K_match, /* 72 */
+ Node_cond_exp, /* 73 lnode is conditional, rnode is
+ * if_branches */
+ Node_exp, /* 74 */
+ Node_assign_exp, /* 75 */
+ Node_regex, /* 76 */
+ Node_str_num, /* deprecated: 77 both string and numeric
+ * values are valid
+ */
+ Node_val, /* 78 node is a value - type given by bits in
+ * status - to replace Node_string, Node_number,
+ * Node_temp_string and Node_str_num
+ */
+ Node_case_match, /* 79 case independent regexp match */
+ Node_case_nomatch, /* 80 case independent regexp no match */
} NODETYPE;
typedef struct exp_node {
- NODETYPE type;
- union {
- struct {
- struct exp_node *lptr;
- union {
- struct exp_node *rptr;
- struct exp_node *(* pptr)();
- struct re_pattern_buffer *preg;
- struct for_loop_header *hd;
- struct ahash **av;
- int r_ent; /* range entered (jfw) */
- } r;
- } nodep;
- struct {
- struct exp_node **ap;
- int as;
- } ar;
- struct {
- char *sp;
- short slen,sref;
- } str;
- AWKNUM fltnum;
- } sub;
+ NODETYPE type;
+ union {
+ struct {
+ union {
+ struct exp_node *lptr;
+ char *param_name;
+ } l;
+ union {
+ struct exp_node *rptr;
+ struct exp_node *(*pptr) ();
+ struct re_pattern_buffer *preg;
+ struct for_loop_header *hd;
+ struct ahash **av;
+ int r_ent; /* range entered (jfw) */
+ } r;
+ int number;
+ char *name;
+ } nodep;
+ struct {
+ struct exp_node **ap;
+ int as;
+ } ar;
+ struct {
+ char *sp;
+ AWKNUM fltnum; /* this is here for optimal packing of
+ * the structure on many machines
+ */
+ short slen;
+ unsigned char sref;
+ } val;
+ } sub;
+ unsigned char flags;
+# define MEM 0x7
+# define MALLOC 1 /* can be free'd */
+# define TEMP 2 /* should be free'd */
+# define PERM 4 /* can't be free'd */
+# define VAL 0x18
+# define NUM 8
+# define STR 16
} NODE;
-#define lnode sub.nodep.lptr
+#define lnode sub.nodep.l.lptr
#define rnode sub.nodep.r.rptr
+#define varname sub.nodep.name
+#define source_file sub.nodep.name
+#define source_line sub.nodep.number
+#define param_cnt sub.nodep.number
+#define param sub.nodep.l.param_name
#define subnode lnode
#define proc sub.nodep.r.pptr
@@ -166,16 +354,17 @@ typedef struct exp_node {
#define rereg sub.nodep.r.preg
#define forsub lnode
-#define forloop sub.nodep.r.hd
+#define forloop rnode->sub.nodep.r.hd
#define array sub.ar.ap
#define arrsiz sub.ar.as
-#define stptr sub.str.sp
-#define stlen sub.str.slen
-#define stref sub.str.sref
+#define stptr sub.val.sp
+#define stlen sub.val.slen
+#define stref sub.val.sref
+#define valstat flags
-#define numbr sub.fltnum
+#define numbr sub.val.fltnum
#define var_value lnode
#define var_array sub.nodep.r.av
@@ -184,33 +373,43 @@ typedef struct exp_node {
#define triggered sub.nodep.r.r_ent
NODE *newnode(), *dupnode();
-NODE *node(), *snode(), *make_number(), *make_string();
-NODE *mkrangenode(); /* to remove the temptation to use sub.nodep.r.rptr
- * as a boolean flag, or to call node() with a 0 and
- * hope that it will store correctly as an int. (jfw)
- */
-NODE *tmp_string(),*tmp_number();
+NODE *node(), *snode(), *make_number(), *make_string(), *make_name();
+NODE *make_param();
+NODE *mkrangenode(); /* to remove the temptation to use
+ * sub.nodep.r.rptr as a boolean flag, or to
+ * call node() with a 0 and hope that it will
+ * store correctly as an int. (jfw) */
+NODE *tmp_string(), *tmp_number();
NODE *variable(), *append_right();
-NODE *tree_eval();
+NODE *r_tree_eval();
+NODE **get_lhs();
struct re_pattern_buffer *make_regexp();
+extern NODE **stack_ptr;
extern NODE *Nnull_string;
-
-#ifdef FAST
-double atof();
-NODE *strforce();
-#define force_number(x) ((x)->type==Node_number ? (x)->numbr : atof((x)->stptr))
-#define force_string(x) ((x)->type==Node_number ? (strforce(x)) : (x))
-#define tmp_node(ty) (global_tmp=(NODE *)obstack_alloc(&temp_strings,sizeof(NODE)),global_tmp->type=ty)
-#define tmp_number(n) (tmp_node(Node_number),global_tmp->numbr=(n),global_tmp)
-/* #define tmp_string(s,len) (tmp_node(Node_temp_string),global_tmp->stref=1,global_tmp->stlen=len,global_tmp->stptr=(char *)obstack_alloc(&temp_strings,len+1),bcopy(s,global_tmp->stptr,len),global_tmp->stptr[len]='\0',global_tmp) */
-NODE *global_tmp;
+extern NODE *FS_node, *NF_node, *RS_node, *NR_node;
+extern NODE *FILENAME_node, *OFS_node, *ORS_node, *OFMT_node;
+extern NODE *FNR_node, *RLENGTH_node, *RSTART_node, *SUBSEP_node;
+
+extern struct obstack other_stack;
+extern NODE *deref;
+extern NODE **fields_arr;
+extern int sourceline;
+extern char *source;
+
+#ifdef USG
+int sprintf();
#else
-AWKNUM force_number();
-NODE *force_string();
+char *sprintf();
#endif
+char *strcpy(), *strcat();
+
+double atof();
+AWKNUM r_force_number();
+NODE *r_force_string();
+
NODE *expression_value;
@@ -218,38 +417,118 @@ NODE *expression_value;
typedef struct hashnode HASHNODE;
struct hashnode {
- HASHNODE *next;
- char *name;
- int length;
- NODE *value;
+ HASHNODE *next;
+ char *name;
+ int length;
+ NODE *value;
} *variables[HASHSIZE];
typedef struct ahash AHASH;
struct ahash {
AHASH *next;
- NODE *name,
- *symbol,
- *value;
+ NODE *name, *symbol, *value;
};
typedef struct for_loop_header {
- NODE *init;
- NODE *cond;
- NODE *incr;
+ NODE *init;
+ NODE *cond;
+ NODE *incr;
} FOR_LOOP_HEADER;
-FOR_LOOP_HEADER *make_for_loop();
-
-#define ADD_ONE_REFERENCE(s) ++(s)->stref
-#define FREE_ONE_REFERENCE(s) {\
- if(s==Nnull_string) {\
- fprintf(stderr,"Free_Nnull_string %d",(s)->stref);\
- }\
- if (--(s)->stref == 0) {\
- free((char *)((s)->stptr));\
- free((char *)s);\
- }\
-}
-/* #define FREE_ONE_REFERENCE(s) {if (--(s)->stref == 0) {printf("FREE %x\n",s);free((s)->stptr);free(s);}} */
+NODE *make_for_loop();
+
+/* for "for(iggy in foo) {" */
+struct search {
+ int numleft;
+ AHASH **arr_ptr;
+ AHASH *bucket;
+ NODE *symbol;
+ NODE *retval;
+};
+
+struct search *assoc_scan(), *assoc_next();
+
+extern NODE *_t; /* used as temporary in following macro */
+extern NODE *_result;
+#define tree_eval(t) (_result = (_t = (t),(_t) == NULL ? Nnull_string : \
+ ((_t)->type == Node_val ? (_t) : r_tree_eval((_t)))))
+
+#define free_temp(n) if ((n)->flags&TEMP) { deref = (n); do_deref(); } else
+#define free_result() if (_result) free_temp(_result); else
+
+#ifdef USG
+#define index strchr
+#define rindex strrchr
+#define bcmp memcmp
+/* nasty nasty berkeleyism */
+#define _setjmp setjmp
+#define _longjmp longjmp
+#endif
+
+char *index();
+
+/* longjmp return codes, must be nonzero */
+/* Continue means either for loop/while continue, or next input record */
+#define TAG_CONTINUE 1
+/* Break means either for/while break, or stop reading input */
+#define TAG_BREAK 2
+/* Return means return from a function call; leave value in ret_node */
+#define TAG_RETURN 3
+
+/*
+ * the loop_tag_valid variable allows continue/break-out-of-context to be
+ * caught and diagnosed (jfw)
+ */
+#define PUSH_BINDING(stack, x, val) (bcopy ((char *)(x), (char *)(stack), sizeof (jmp_buf)), val++)
+#define RESTORE_BINDING(stack, x, val) (bcopy ((char *)(stack), (char *)(x), sizeof (jmp_buf)), val--)
+
+/* nasty nasty SunOS-ism */
+#ifdef sparc
+#include <alloca.h>
+#endif
+
+extern char *myname;
+void msg();
+void warning();
+void illegal_type();
+void fatal();
+
+#define cant_happen() fatal("line %d, file: %s; bailing out", \
+ __LINE__, __FILE__);
+
+/*
+ * if you don't have vprintf, but you are BSD, the version defined in
+ * awk5.c should do the trick. Otherwise, use this and cross your fingers.
+ */
+#if !defined(VPRINTF) && !defined(BSD)
+#define vfprintf(fp,fmt,arg) _doprnt((fmt), (arg), (fp))
+#endif
+
+extern int errno;
+extern char *sys_errlist[];
+
+#define emalloc(var,ty,x,str) if ((var = (ty) malloc((unsigned)(x))) == NULL)\
+ fatal("%s: %s: can't allocate memory (%s)",\
+ (str), "var", sys_errlist[errno]); else
+#define erealloc(var,ty,x,str) if((var=(ty)realloc(var,(unsigned)(x)))==NULL)\
+ fatal("%s: %s: can't allocate memory (%s)",\
+ (str), "var", sys_errlist[errno]); else
+#ifdef DEBUG
+#define force_number r_force_number
+#define force_string r_force_string
+#else
+#define force_number(n) (_t = (n),(_t->flags & NUM) ? _t->numbr : r_force_number(_t))
+#define force_string(s) (_t = (s),(_t->flags & STR) ? _t : r_force_string(_t))
+#endif
+
+#define STREQ(a,b) (*(a) == *(b) && strcmp((a), (b)) == 0)
+#define HUGE 0x7fffffff
+
+extern int node0_valid;
+extern int field_num;
+extern NODE **get_field();
+#define WHOLELINE (node0_valid ? fields_arr[0] : *get_field(0))
+
+extern int strict;
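Review note on the awk.h hunks above: the new Node_val type keeps a string form and a numeric form in one node and uses the NUM and STR bits of flags to record which forms are currently valid, so the force_number and force_string macros only call the r_ helpers when the wanted form is missing. A minimal standalone sketch of that caching idea follows. It is not gawk's code: the struct, the VAL_ macro names, and every function name are hypothetical, and only the bit values 8 and 16 and the atof-based conversion are taken from the patch. The "%.6g" output format is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Same bit values as NUM and STR in the patch; all other names are hypothetical. */
#define VAL_NUM 8   /* numeric form is valid */
#define VAL_STR 16  /* string form is valid */

typedef struct value {
    char *sp;             /* string form (heap-allocated), or NULL */
    double num;           /* numeric form */
    unsigned char flags;  /* which of the two forms are currently valid */
} value;

/* Duplicate a string on the heap; aborts on allocation failure. */
static char *copy_string(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p == NULL)
        abort();
    strcpy(p, s);
    return p;
}

/* Build a value that starts out with only its string form valid. */
static value make_str(const char *s)
{
    value v = { copy_string(s), 0.0, VAL_STR };
    return v;
}

/* Build a value that starts out with only its numeric form valid. */
static value make_num(double d)
{
    value v = { NULL, d, VAL_NUM };
    return v;
}

/* Return the numeric form, converting from the string at most once. */
static double value_number(value *v)
{
    assert(v != NULL);
    if (!(v->flags & VAL_NUM)) {
        v->num = (v->flags & VAL_STR) ? atof(v->sp) : 0.0;
        v->flags |= VAL_NUM;    /* cache: later calls skip the conversion */
    }
    return v->num;
}

/* Return the string form, formatting the number at most once. */
static const char *value_string(value *v)
{
    assert(v != NULL);
    if (!(v->flags & VAL_STR)) {
        char buf[64];
        snprintf(buf, sizeof buf, "%.6g", v->num);  /* assumed output format */
        v->sp = copy_string(buf);
        v->flags |= VAL_STR;    /* cache: later calls reuse the string */
    }
    return v->sp;
}
```

Calling value_number on a value built with make_str("42abc") yields 42 and leaves both bits set, so a later value_string call returns the original text without reformatting. The sketch never frees sp; the patch tracks ownership separately through the MALLOC, TEMP and PERM bits.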
diff --git a/awk.y b/awk.y
index 180505f6..fb0f0d80 100644
--- a/awk.y
+++ b/awk.y
@@ -1,9 +1,124 @@
-
/*
* gawk -- GNU version of awk
* Copyright (C) 1986 Free Software Foundation
* Written by Paul Rubin, August 1986
*
+ * $Log: awk.y,v $
+ * Revision 1.24 88/12/15 12:52:58 david
+ * changes from Jay to get rid of some reduce/reduce conflicts - some remain
+ *
+ * Revision 1.23 88/12/07 19:59:25 david
+ * changes for incorporating source filename in error messages
+ *
+ * Revision 1.22 88/11/23 21:37:24 david
+ * Arnold: refinements of AWKPATH code
+ *
+ * Revision 1.21 88/11/22 13:46:45 david
+ * Arnold: changes for case-insensitive matching
+ *
+ * Revision 1.20 88/11/15 10:13:37 david
+ * Arnold: allow multiple -f options and search in directories for awk libraries,
+ * directories specified by AWKPATH env. variable; cleanup of comments and
+ * #includes
+ *
+ * Revision 1.19 88/11/14 21:51:30 david
+ * Arnold: added error message for BEGIN or END without any action at all;
+ * unlink temporary source file right after creation so it goes away on bomb
+ *
+ * Revision 1.18 88/10/19 22:00:56 david
+ * generalize (and correct) what pattern can be in pattern {action}; this
+ * introduces quite a few new conflicts that should be checked thoroughly
+ * at some point, but they don't seem to do any harm at first glance
+ * replace malloc with emalloc
+ *
+ * Revision 1.17 88/10/17 19:52:01 david
+ * Arnold: cleanup, purge FAST
+ *
+ * Revision 1.16 88/10/13 22:02:16 david
+ * cleanup of yyerror and other error messages
+ *
+ * Revision 1.15 88/10/06 23:24:57 david
+ * accept var space ++var
+ * accept underscore as first character of a variable name
+ *
+ * Revision 1.14 88/06/13 18:01:46 david
+ * delete \a (change from Arnold)
+ *
+ * Revision 1.13 88/06/08 00:29:42 david
+ * better attempt at keeping track of line numbers
+ * change grammar to properly handle newlines after && or ||
+ *
+ * Revision 1.12 88/06/07 23:39:02 david
+ * little delint
+ *
+ * Revision 1.11 88/06/05 22:17:40 david
+ * make_name() becomes make_param() (again!)
+ * func_level goes away, param_counter makes entrance
+ *
+ * Revision 1.10 88/05/30 09:49:02 david
+ * obstack_free was being called at end of function definition, freeing
+ * memory that might be part of global variables referenced only inside
+ * functions; commented out for now, will have to selectively free later.
+ * cleanup: regexp now returns a NODE *
+ *
+ * Revision 1.9 88/05/27 11:04:53 david
+ * added print[f] '(' ... ')' (optional parentheses)
+ * for some reason want_redirect wasn't getting set for PRINT, so I set it in
+ * yylex()
+ *
+ * Revision 1.8 88/05/26 22:52:14 david
+ * fixed cmd | getline
+ * added compound patterns (they got lost somewhere along the line)
+ * fixed error message in yylex()
+ * added null statement
+ *
+ * Revision 1.7 88/05/13 22:05:29 david
+ * moved BEGIN and END block merging here
+ * BEGIN, END and function defs. are no longer incorporated into main parse tree
+ * fixed command | getline
+ * fixed function install and definition
+ *
+ * Revision 1.6 88/05/09 17:47:50 david
+ * Arnold's coded binary search
+ *
+ * Revision 1.5 88/05/04 12:31:13 david
+ * be a bit more careful about types
+ * make_for_loop() now returns a NODE *
+ * keyword search now uses bsearch() -- need a public domain version of this
+ * added back stuff in yylex() that got lost somewhere along the line
+ * malloc() tokens in yylex() since they were previously just pointers into
+ * current line that got overwritten by the next fgets() -- these need to get
+ * freed at some point
+ * fixed backslash line continuation interaction with CONCAT
+ *
+ * Revision 1.4 88/04/14 17:03:51 david
+ * reinstalled a fix to do with line continuation
+ *
+ * Revision 1.3 88/04/14 14:41:01 david
+ * Arnold's changes to yylex to read program from a file
+ *
+ * Revision 1.5 88/03/18 21:00:07 david
+ * Baseline -- hopefully all the functionality of the new awk added.
+ * Just debugging and tuning to do.
+ *
+ * Revision 1.4 87/11/19 14:37:20 david
+ * added a bunch of new builtin functions
+ * added new rules for getline to provide new functionality
+ * minor cleanup of redirection handling
+ * generalized make_param into make_name
+ *
+ * Revision 1.3 87/11/09 21:22:33 david
+ * added machinery for user-defined functions (including return)
+ * added delete, do-while and system
+ * reformatted and revised grammar to improve error-handling
+ * changes to yyerror to give improved error messages
+ *
+ * Revision 1.2 87/10/29 21:33:28 david
+ * added test for membership in an array, as in: if ("yes" in answers) ...
+ *
+ * Revision 1.1 87/10/27 15:23:21 david
+ * Initial revision
+ *
*/
/*
@@ -26,191 +141,291 @@ anyone else from sharing it farther. Help stamp out software hoarding!
%{
#define YYDEBUG 12
+#define YYIMPROVE
-#include <stdio.h>
#include "awk.h"
- static int yylex ();
-
-
- /*
- * The following variable is used for a very sickening thing.
- * The awk language uses white space as the string concatenation
- * operator, but having a white space token that would have to appear
- * everywhere in all the grammar rules would be unbearable.
- * It turns out we can return CONCAT_OP exactly when there really
- * is one, just from knowing what kinds of other tokens it can appear
- * between (namely, constants, variables, or close parentheses).
- * This is because concatenation has the lowest priority of all
- * operators. want_concat_token is used to remember that something
- * that could be the left side of a concat has just been returned.
- *
- * If anyone knows a cleaner way to do this (don't look at the Un*x
- * code to find one, though), please suggest it.
- */
- static int want_concat_token;
-
- /* Two more horrible kludges. The same comment applies to these two too */
- static int want_regexp; /* lexical scanning kludge */
- static int want_redirect; /* similarly */
- int lineno = 1; /* JF for error msgs */
+static int yylex ();
+
+/*
+ * The following variable is used for a very sickening thing.
+ * The awk language uses white space as the string concatenation
+ * operator, but having a white space token that would have to appear
+ * everywhere in all the grammar rules would be unbearable.
+ * It turns out we can return CONCAT_OP exactly when there really
+ * is one, just from knowing what kinds of other tokens it can appear
+ * between (namely, constants, variables, or close parentheses).
+ * This is because concatenation has the lowest priority of all
+ * operators. want_concat_token is used to remember that something
+ * that could be the left side of a concat has just been returned.
+ *
+ * If anyone knows a cleaner way to do this (don't look at the Un*x
+ * code to find one, though), please suggest it.
+ */
+static int want_concat_token;
+
+/* Two more horrible kludges. The same comment applies to these two too */
+static int want_regexp; /* lexical scanning kludge */
+static int want_redirect; /* similarly */
+int lineno = 1; /* for error msgs */
/* During parsing of a gawk program, the pointer to the next character
is in this variable. */
- char *lexptr; /* JF moved it up here */
- char *lexptr_begin; /* JF for error msgs */
+char *lexptr; /* moved it up here */
+char *lexptr_begin; /* for error msgs */
+char *func_def;
+extern int errcount;
+extern NODE *begin_block;
+extern NODE *end_block;
+extern struct re_pattern_buffer *mk_re_parse();
+extern int param_counter;
+struct re_pattern_buffer *rp;
%}
%union {
- long lval;
- AWKNUM fval;
- NODE *nodeval;
- NODETYPE nodetypeval;
- char *sval;
- NODE *(*ptrval)();
+ long lval;
+ AWKNUM fval;
+ NODE *nodeval;
+ NODETYPE nodetypeval;
+ char *sval;
+ NODE *(*ptrval)();
}
-%type <nodeval> exp start program rule pattern conditional
-%type <nodeval> action variable redirection expression_list
-%type <nodeval> statements statement if_statement
-%type <nodeval> opt_exp v_exp
-%type <nodetypeval> whitespace
+%type <nodeval> function_prologue function_body
+%type <nodeval> exp sub_exp start program rule pattern expression_list
+%type <nodeval> action variable redirection param_list opt_expression_list
+%type <nodeval> statements statement if_statement opt_param_list
+%type <nodeval> opt_exp opt_variable regexp
+%type <nodetypeval> whitespace r_paren
%token <sval> NAME REGEXP YSTRING
%token <lval> ERROR INCDEC
%token <fval> NUMBER
%token <nodetypeval> ASSIGNOP RELOP MATCHOP NEWLINE REDIRECT_OP CONCAT_OP
-%token <nodetypeval> LEX_BEGIN LEX_END LEX_IF LEX_ELSE
-%token <nodetypeval> LEX_WHILE LEX_FOR LEX_BREAK LEX_CONTINUE
-%token <nodetypeval> LEX_PRINT LEX_PRINTF LEX_NEXT LEX_EXIT
-%token LEX_IN
+%token <nodetypeval> LEX_BEGIN LEX_END LEX_IF LEX_ELSE LEX_RETURN LEX_DELETE
+%token <nodetypeval> LEX_WHILE LEX_DO LEX_FOR LEX_BREAK LEX_CONTINUE
+%token <nodetypeval> LEX_PRINT LEX_PRINTF LEX_NEXT LEX_EXIT LEX_FUNCTION
+%token <nodetypeval> LEX_GETLINE LEX_SUB LEX_MATCH
+%token <nodetypeval> LEX_IN
%token <lval> LEX_AND LEX_OR INCREMENT DECREMENT
%token <ptrval> LEX_BUILTIN
/* these are just yylval numbers */
-/* %token <lval> CHAR JF this isn't used anymore */
/* Lowest to highest */
+%right ASSIGNOP
+%right '?' ':'
%left LEX_OR
%left LEX_AND
-%right ASSIGNOP
+%left LEX_IN
+%nonassoc MATCHOP
+%nonassoc RELOP
+%nonassoc REDIRECT_OP
%left CONCAT_OP
%left '+' '-'
%left '*' '/' '%'
%right UNARY
-%nonassoc MATCHOP RELOP
+%right '^'
+%left INCREMENT DECREMENT
+%left '$'
%%
-start : optional_newlines program
+start
+ : opt_newlines program
{ expression_value = $2; }
;
-
-program : rule
- { $$ = node ($1, Node_rule_list,(NODE *) NULL); }
+program
+ : rule
+ {
+ if ($1 != NULL)
+ $$ = node ($1, Node_rule_list,(NODE *) NULL);
+ else
+ $$ = NULL;
+ yyerrok;
+ }
| program rule
/* cons the rule onto the tail of list */
- { $$ = append_right ($1, node($2, Node_rule_list,(NODE *) NULL)); }
+ {
+ if ($2 == NULL)
+ $$ = $1;
+ else if ($1 == NULL)
+ $$ = node($2, Node_rule_list,(NODE *) NULL);
+ else
+ $$ = append_right ($1,
+ node($2, Node_rule_list,(NODE *) NULL));
+ yyerrok;
+ }
+ | error { $$ = NULL; }
+ | program error
;
-rule : pattern action NEWLINE optional_newlines
- { $$ = node ($1, Node_rule_node, $2); }
+rule
+ : LEX_BEGIN action
+ {
+ if (begin_block)
+ append_right (begin_block, node(
+ node((NODE *)NULL, Node_rule_node, $2),
+ Node_rule_list, (NODE *)NULL) );
+ else
+ begin_block = node(node((NODE *)NULL,Node_rule_node,$2),
+ Node_rule_list, (NODE *)NULL);
+ $$ = NULL;
+ yyerrok;
+ }
+ | LEX_END action
+ {
+ if (end_block)
+ append_right (end_block, node(
+ node((NODE *)NULL, Node_rule_node, $2),
+ Node_rule_list, (NODE *)NULL));
+ else
+ end_block = node(node((NODE *)NULL, Node_rule_node, $2),
+ Node_rule_list, (NODE *)NULL);
+ $$ = NULL;
+ yyerrok;
+ }
+ | LEX_BEGIN statement_term
+ {
+ msg ("error near line %d: BEGIN blocks must have an action part", lineno);
+ errcount++;
+ yyerrok;
+ }
+ | LEX_END statement_term
+ {
+ msg ("error near line %d: END blocks must have an action part", lineno);
+ errcount++;
+ yyerrok;
+ }
+ | pattern action
+ { $$ = node ($1, Node_rule_node, $2); yyerrok; }
+ | pattern statement_term
+ { if($1) $$ = node ($1, Node_rule_node, (NODE *)NULL); yyerrok; }
+ | function_prologue function_body
+ {
+ func_install($1, $2);
+ $$ = NULL;
+ yyerrok;
+ }
+ ;
+
+function_prologue
+ : LEX_FUNCTION
+ {
+ param_counter = 0;
+ }
+ NAME whitespace '(' opt_param_list r_paren whitespace
+ {
+ $$ = append_right(make_param($3), $6);
+ }
;
+function_body
+ : l_brace statements r_brace statement_term
+ { $$ = $2; }
+ ;
-pattern : /* empty */
+pattern
+ : /* empty */
{ $$ = NULL; }
- | conditional
+ | sub_exp
{ $$ = $1; }
- | conditional ',' conditional
- { $$ = mkrangenode ( node($1, Node_cond_pair, $3) ); } /*jfw*/
- ;
-
-
-conditional :
- LEX_BEGIN
- { $$ = node ((NODE *)NULL, Node_K_BEGIN,(NODE *) NULL); }
- | LEX_END
- { $$ = node ((NODE *)NULL, Node_K_END,(NODE *) NULL); }
- | '!' conditional %prec UNARY
- { $$ = node ($2, Node_not,(NODE *) NULL); }
- | conditional LEX_AND conditional
+ | regexp
+ {
+ $$ = node(
+ node(make_number((AWKNUM)0),Node_field_spec,(NODE*)NULL),
+ Node_match, $1);
+ }
+ | pattern LEX_AND pattern
{ $$ = node ($1, Node_and, $3); }
- | conditional LEX_OR conditional
+ | pattern LEX_OR pattern
{ $$ = node ($1, Node_or, $3); }
- | '(' conditional ')'
- {
- $$ = $2;
- want_concat_token = 0;
- }
+ | '!' pattern %prec UNARY
+ { $$ = node ($2, Node_not,(NODE *) NULL); }
+ | '(' pattern r_paren
+ { $$ = $2; }
+ | pattern ',' pattern
+ { $$ = mkrangenode ( node($1, Node_cond_pair, $3) ); }
+ ;
- /* In these rules, want_regexp tells yylex that the next thing
+regexp
+ /* In this rule, want_regexp tells yylex that the next thing
is a regexp so it should read up to the closing slash. */
-
- | '/'
+ : '/'
{ ++want_regexp; }
REGEXP '/'
{ want_regexp = 0;
- $$ = node (node (make_number ((AWKNUM)0), Node_field_spec, (NODE *)NULL),
- Node_match, (NODE *)make_regexp ($3));
+ rp = mk_re_parse($3);
+ $$ = node((NODE *)NULL, Node_regex, (NODE *)rp);
}
- | exp MATCHOP '/'
- { ++want_regexp; }
- REGEXP '/'
- { want_regexp = 0;
- $$ = node ($1, $2, (NODE *)make_regexp($5));
- }
- | exp RELOP exp
- { $$ = node ($1, $2, $3); }
- | exp /* JF */
- { $$ = $1; }
;
-
-action : /* empty */
- { $$ = NULL; }
- | '{' whitespace statements '}'
- { $$ = $3; }
+action
+ : l_brace r_brace
+ {
+ /* empty actions are different from missing actions */
+ $$ = node ((NODE *) NULL, Node_illegal, (NODE *) NULL);
+ }
+ | l_brace statements r_brace
+ { $$ = $2 ; }
;
-
-statements : /* EMPTY */
- { $$ = NULL; }
- | statement
+statements
+ : statement
{ $$ = node ($1, Node_statement_list, (NODE *)NULL); }
| statements statement
- { $$ = append_right($1, node( $2, Node_statement_list, (NODE *)NULL)); }
+ {
+ $$ = append_right($1,
+ node( $2, Node_statement_list, (NODE *)NULL));
+ yyerrok;
+ }
+ | error
+ { $$ = NULL; }
+ | statements error
;
-statement_term :
- NEWLINE optional_newlines
- { $<nodetypeval>$ = Node_illegal; }
- | ';' optional_newlines
- { $<nodetypeval>$ = Node_illegal; }
+statement_term
+ : NEWLINE opt_newlines
+ { $<nodetypeval>$ = Node_illegal; want_redirect = 0; }
+ | semi_colon opt_newlines
+ { $<nodetypeval>$ = Node_illegal; want_redirect = 0; }
;
-whitespace :
- /* blank */
- { $$ = Node_illegal; }
- | CONCAT_OP
+whitespace
+ : /* blank */
+ { $<nodetypeval>$ = Node_illegal; }
+ | CONCAT_OP
+ { $<nodetypeval>$ = Node_illegal; }
| NEWLINE
+ { $<nodetypeval>$ = Node_illegal; }
| whitespace CONCAT_OP
+ { $<nodetypeval>$ = Node_illegal; }
| whitespace NEWLINE
+ { $<nodetypeval>$ = Node_illegal; }
;
-statement :
- '{' whitespace statements '}' whitespace
- { $$ = $3; }
+
+statement
+ : semi_colon opt_newlines
+ { $$ = NULL; }
+ | l_brace statements r_brace whitespace
+ { $$ = $2; }
| if_statement
{ $$ = $1; }
- | LEX_WHILE '(' conditional ')' whitespace statement
+ | LEX_WHILE '(' exp r_paren whitespace statement
{ $$ = node ($3, Node_K_while, $6); }
- | LEX_FOR '(' opt_exp ';' conditional ';' opt_exp ')' whitespace statement
+ | LEX_DO whitespace statement LEX_WHILE '(' exp r_paren whitespace
+ { $$ = node ($6, Node_K_do, $3); }
+ | LEX_FOR '(' opt_exp semi_colon exp semi_colon opt_exp r_paren whitespace statement
{ $$ = node ($10, Node_K_for, (NODE *)make_for_loop ($3, $5, $7)); }
- | LEX_FOR '(' opt_exp ';' ';' opt_exp ')' whitespace statement
+ | LEX_FOR '(' opt_exp semi_colon semi_colon opt_exp r_paren whitespace statement
{ $$ = node ($9, Node_K_for, (NODE *)make_for_loop ($3, (NODE *)NULL, $6)); }
- | LEX_FOR '(' NAME CONCAT_OP LEX_IN NAME ')' whitespace statement
- { $$ = node ($9, Node_K_arrayfor, (NODE *)make_for_loop(variable($3), (NODE *)NULL, variable($6))); }
+ | LEX_FOR '(' NAME CONCAT_OP LEX_IN NAME r_paren whitespace statement
+ {
+ $$ = node ($9, Node_K_arrayfor,
+ make_for_loop(variable($3),
+ (NODE *)NULL, variable($6)));
+ }
| LEX_BREAK statement_term
/* for break, maybe we'll have to remember where to break to */
{ $$ = node ((NODE *)NULL, Node_K_break, (NODE *)NULL); }
@@ -219,680 +434,969 @@ statement :
{ $$ = node ((NODE *)NULL, Node_K_continue, (NODE *)NULL); }
| LEX_PRINT
{ ++want_redirect; }
- expression_list redirection statement_term
- {
- want_redirect = 0;
- /* $4->lnode = NULL; */
- $$ = node ($3, Node_K_print, $4);
- }
+ opt_expression_list redirection statement_term
+ { $$ = node ($3, Node_K_print, $4); }
+ | LEX_PRINT '(' opt_expression_list r_paren
+ { ++want_redirect; want_concat_token = 0; }
+ redirection statement_term
+ { $$ = node ($3, Node_K_print, $6); }
| LEX_PRINTF
{ ++want_redirect; }
- expression_list redirection statement_term
- {
- want_redirect = 0;
- /* $4->lnode = NULL; */
- $$ = node ($3, Node_K_printf, $4);
- }
- | LEX_PRINTF '(' expression_list ')'
- { ++want_redirect;
- want_concat_token = 0; }
- redirection statement_term
- {
- want_redirect = 0;
- $$ = node ($3, Node_K_printf, $6);
- }
+ opt_expression_list redirection statement_term
+ { $$ = node ($3, Node_K_printf, $4); }
+ | LEX_PRINTF '(' opt_expression_list r_paren
+ { ++want_redirect; want_concat_token = 0; }
+ redirection statement_term
+ { $$ = node ($3, Node_K_printf, $6); }
| LEX_NEXT statement_term
{ $$ = node ((NODE *)NULL, Node_K_next, (NODE *)NULL); }
- | LEX_EXIT statement_term
- { $$ = node ((NODE *)NULL, Node_K_exit, (NODE *)NULL); }
- | LEX_EXIT '(' exp ')' statement_term
- { $$ = node ($3, Node_K_exit, (NODE *)NULL); }
+ | LEX_EXIT opt_exp statement_term
+ { $$ = node ($2, Node_K_exit, (NODE *)NULL); }
+ | LEX_RETURN opt_exp statement_term
+ { $$ = node ($2, Node_K_return, (NODE *)NULL); }
+ | LEX_DELETE NAME '[' expression_list ']' statement_term
+ { $$ = node (variable($2), Node_K_delete, $4); }
| exp statement_term
{ $$ = $1; }
;
-
-if_statement:
- LEX_IF '(' conditional ')' whitespace statement
+if_statement
+ : LEX_IF '(' exp r_paren whitespace statement
{ $$ = node ($3, Node_K_if,
node ($6, Node_if_branches, (NODE *)NULL)); }
- | LEX_IF '(' conditional ')' whitespace statement
+ | LEX_IF '(' exp r_paren whitespace statement
LEX_ELSE whitespace statement
{ $$ = node ($3, Node_K_if,
node ($6, Node_if_branches, $9)); }
;
-optional_newlines :
- /* empty */
- | optional_newlines NEWLINE
+opt_newlines
+ : /* empty */
+ | opt_newlines NEWLINE
{ $<nodetypeval>$ = Node_illegal; }
;
-redirection :
- /* empty */
- { $$ = NULL; /* node (NULL, Node_redirect_nil, NULL); */ }
- /* | REDIRECT_OP NAME
- { $$ = node ($2, $1, NULL); } */
- | REDIRECT_OP exp
- { $$ = node ($2, $1, (NODE *)NULL); }
+redirection
+ : /* empty */
+ { want_redirect = 0; $$ = NULL; }
+ | REDIRECT_OP
+ { want_redirect = 0; }
+ exp
+ { $$ = node ($3, $1, (NODE *)NULL); }
+ ;
+
+opt_param_list
+ : /* empty */
+ { $$ = NULL; }
+ | param_list
+ /* $$ = $1 */
;
+param_list
+ : NAME
+ {
+ $$ = make_param($1);
+ }
+ | param_list ',' NAME
+ {
+ $$ = append_right($1, make_param($3));
+ yyerrok;
+ }
+ | error
+ { $$ = NULL; }
+ | param_list error
+ | param_list ',' error
+ ;
/* optional expression, as in for loop */
-opt_exp :
+opt_exp
+ : /* empty */
{ $$ = NULL; /* node(NULL, Node_builtin, NULL); */ }
| exp
- { $$ = $1; }
;
-expression_list :
- /* empty */
+opt_expression_list
+ : /* empty */
{ $$ = NULL; }
- | exp
+ | expression_list
+ { $$ = $1; }
+ ;
+
+expression_list
+ : exp
{ $$ = node ($1, Node_expression_list, (NODE *)NULL); }
| expression_list ',' exp
- { $$ = append_right($1, node( $3, Node_expression_list, (NODE *)NULL)); }
+ {
+ $$ = append_right($1,
+ node( $3, Node_expression_list, (NODE *)NULL));
+ yyerrok;
+ }
+ | error
+ { $$ = NULL; }
+ | expression_list error
+ | expression_list error exp
+ | expression_list ',' error
;
-
/* Expressions, not including the comma operator. */
-exp : LEX_BUILTIN '(' expression_list ')'
- { $$ = snode ($3, Node_builtin, $1); }
- | LEX_BUILTIN
- { $$ = snode ((NODE *)NULL, Node_builtin, $1); }
- | '(' exp ')'
- { $$ = $2; }
- | '-' exp %prec UNARY
- { $$ = node ($2, Node_unary_minus, (NODE *)NULL); }
- | INCREMENT variable %prec UNARY
- { $$ = node ($2, Node_preincrement, (NODE *)NULL); }
- | DECREMENT variable %prec UNARY
- { $$ = node ($2, Node_predecrement, (NODE *)NULL); }
- | variable INCREMENT %prec UNARY
- { $$ = node ($1, Node_postincrement, (NODE *)NULL); }
- | variable DECREMENT %prec UNARY
- { $$ = node ($1, Node_postdecrement, (NODE *)NULL); }
- | variable
- { $$ = $1; } /* JF was variable($1) */
- | NUMBER
- { $$ = make_number ($1); }
- | YSTRING
- { $$ = make_string ($1, -1); }
+exp : sub_exp
+ | exp LEX_AND whitespace exp
+ { $$ = node ($1, Node_and, $4); }
+ | exp LEX_OR whitespace exp
+ { $$ = node ($1, Node_or, $4); }
+ | '!' exp %prec UNARY
+ { $$ = node ($2, Node_not,(NODE *) NULL); }
+ | '(' exp r_paren
+ { $$ = $2; }
+ ;
+
+sub_exp : LEX_BUILTIN '(' opt_expression_list r_paren
+ { $$ = snode ($3, Node_builtin, $1); }
+ | LEX_BUILTIN
+ { $$ = snode ((NODE *)NULL, Node_builtin, $1); }
+ | exp MATCHOP regexp
+ { $$ = node ($1, $2, $3); }
+ | exp MATCHOP exp
+ { $$ = node ($1, $2, $3); }
+ | exp CONCAT_OP LEX_IN NAME
+ { $$ = node (variable($4), Node_in_array, $1); }
+ | '(' expression_list r_paren CONCAT_OP LEX_IN NAME
+ { $$ = node (variable($6), Node_in_array, $2); }
+ | LEX_SUB '(' regexp ',' expression_list r_paren
+ { $$ = node($5, $1, $3); }
+ | LEX_SUB '(' exp ',' expression_list r_paren
+ { $$ = node($5, $1, $3); }
+ | LEX_MATCH '(' exp ',' regexp r_paren
+ { $$ = node($3, $1, $5); }
+ | LEX_MATCH '(' exp ',' exp r_paren
+ { $$ = node($3, $1, $5); }
+ | LEX_GETLINE
+ {++want_redirect; }
+ opt_variable redirection
+ {
+ $$ = node ($3, Node_K_getline, $4);
+ }
+ | exp '|' LEX_GETLINE opt_variable
+ {
+ $$ = node ($4, Node_K_getline,
+ node ($1, Node_redirect_pipein, (NODE *)NULL));
+ }
+ | exp RELOP exp
+ { $$ = node ($1, $2, $3); }
+ | exp '?' exp ':' exp
+ { $$ = node($1, Node_cond_exp, node($3, Node_if_branches, $5)); }
+ | NAME '(' opt_expression_list r_paren
+ {
+ $$ = node ($3, Node_func_call, make_string($1, strlen($1)));
+ }
+ | '-' exp %prec UNARY
+ { $$ = node ($2, Node_unary_minus, (NODE *)NULL); }
+ | '+' exp %prec UNARY
+ { $$ = $2; }
+ | INCREMENT variable
+ { $$ = node ($2, Node_preincrement, (NODE *)NULL); }
+ | DECREMENT variable
+ { $$ = node ($2, Node_predecrement, (NODE *)NULL); }
+ | variable INCREMENT
+ { $$ = node ($1, Node_postincrement, (NODE *)NULL); }
+ | variable DECREMENT
+ { $$ = node ($1, Node_postdecrement, (NODE *)NULL); }
+ | variable
+ { $$ = $1; }
+ | NUMBER
+ { $$ = make_number ($1); }
+ | YSTRING
+ { $$ = make_string ($1, -1); }
/* Binary operators in order of decreasing precedence. */
- | exp '*' exp
- { $$ = node ($1, Node_times, $3); }
- | exp '/' exp
- { $$ = node ($1, Node_quotient, $3); }
- | exp '%' exp
- { $$ = node ($1, Node_mod, $3); }
- | exp '+' exp
- { $$ = node ($1, Node_plus, $3); }
- | exp '-' exp
- { $$ = node ($1, Node_minus, $3); }
+ | exp '^' exp
+ { $$ = node ($1, Node_exp, $3); }
+ | exp '*' exp
+ { $$ = node ($1, Node_times, $3); }
+ | exp '/' exp
+ { $$ = node ($1, Node_quotient, $3); }
+ | exp '%' exp
+ { $$ = node ($1, Node_mod, $3); }
+ | exp '+' exp
+ { $$ = node ($1, Node_plus, $3); }
+ | exp '-' exp
+ { $$ = node ($1, Node_minus, $3); }
/* Empty operator. See yylex for disgusting details. */
- | exp CONCAT_OP exp
- { $$ = node ($1, Node_concat, $3); }
- | variable ASSIGNOP exp
- { $$ = node ($1, $2, $3); }
+ | exp CONCAT_OP exp
+ { $$ = node ($1, Node_concat, $3); }
+ | variable ASSIGNOP exp
+ { $$ = node ($1, $2, $3); }
;
-v_exp : LEX_BUILTIN '(' expression_list ')'
- { $$ = snode ($3, Node_builtin, $1); }
- | LEX_BUILTIN
- { $$ = snode ((NODE *)NULL, Node_builtin, $1); }
- | '(' exp ')'
- { $$ = $2; }
- | '-' exp %prec UNARY
- { $$ = node ($2, Node_unary_minus, (NODE *)NULL); }
- | INCREMENT variable %prec UNARY
- { $$ = node ($2, Node_preincrement, (NODE *)NULL); }
- | DECREMENT variable %prec UNARY
- { $$ = node ($2, Node_predecrement, (NODE *)NULL); }
- | variable INCREMENT %prec UNARY
- { $$ = node ($1, Node_postincrement, (NODE *)NULL); }
- | variable DECREMENT %prec UNARY
- { $$ = node ($1, Node_postdecrement, (NODE *)NULL); }
- | variable
- { $$ = $1; } /* JF was variable($1) */
- | NUMBER
- { $$ = make_number ($1); }
- | YSTRING
- { $$ = make_string ($1, -1); }
+opt_variable
+ : /* empty */
+ { $$ = NULL; }
+ | variable
+ ;
-/* Binary operators in order of decreasing precedence. */
- | v_exp '*' exp
- { $$ = node ($1, Node_times, $3); }
- | v_exp '/' exp
- { $$ = node ($1, Node_quotient, $3); }
- | v_exp '%' exp
- { $$ = node ($1, Node_mod, $3); }
- | v_exp '+' exp
- { $$ = node ($1, Node_plus, $3); }
- | v_exp '-' exp
- { $$ = node ($1, Node_minus, $3); }
- /* Empty operator. See yylex for disgusting details. */
- | v_exp CONCAT_OP exp
- { $$ = node ($1, Node_concat, $3); }
+variable
+ : NAME
+ { $$ = variable ($1); }
+ | NAME '[' expression_list ']'
+ { $$ = node (variable($1), Node_subscript, $3); }
+ | '$' exp
+ { $$ = node ($2, Node_field_spec, (NODE *)NULL); }
+ ;
+
+l_brace
+ : '{' whitespace
;
-variable :
- NAME
- { $$ = variable ($1); }
- | NAME '[' exp ']'
- { $$ = node (variable($1), Node_subscript, $3); }
- | '$' v_exp %prec UNARY
- { $$ = node ($2, Node_field_spec, (NODE *)NULL); }
+r_brace
+ : '}' { yyerrok; }
+ ;
+
+r_paren
+ : ')' { $<nodetypeval>$ = Node_illegal; yyerrok; }
+ ;
+
+semi_colon
+ : ';' { yyerrok; }
;
%%
-
struct token {
- char *operator;
- NODETYPE value;
- int class;
- NODE *(*ptr)();
+ char *operator;
+ NODETYPE value;
+ int class;
+ NODE *(*ptr) ();
};
#define NULL 0
NODE *do_exp(), *do_getline(), *do_index(), *do_length(),
*do_sqrt(), *do_log(), *do_sprintf(), *do_substr(),
- *do_split(), *do_int();
+ *do_split(), *do_system(), *do_int(), *do_close(),
+ *do_atan2(), *do_sin(), *do_cos(), *do_rand(),
+ *do_srand(), *do_match();
- /* Special functions for debugging */
-#ifndef FAST
-NODE *do_prvars(), *do_bp();
+/* Special functions for debugging */
+#ifdef DEBUG
+NODE *do_prvars(), *do_bp();
#endif
/* Tokentab is sorted ascii ascending order, so it can be binary searched. */
-/* (later. Right now its just sort of linear search (SLOW!!) */
static struct token tokentab[] = {
- {"BEGIN", Node_illegal, LEX_BEGIN, 0},
- {"END", Node_illegal, LEX_END, 0},
-#ifndef FAST
- {"bp", Node_builtin, LEX_BUILTIN, do_bp},
+ { "BEGIN", Node_illegal, LEX_BEGIN, 0 },
+ { "END", Node_illegal, LEX_END, 0 },
+ { "atan2", Node_builtin, LEX_BUILTIN, do_atan2 },
+#ifdef DEBUG
+ { "bp", Node_builtin, LEX_BUILTIN, do_bp },
#endif
- {"break", Node_K_break, LEX_BREAK, 0},
- {"continue", Node_K_continue, LEX_CONTINUE, 0},
- {"else", Node_illegal, LEX_ELSE, 0},
- {"exit", Node_K_exit, LEX_EXIT, 0},
- {"exp", Node_builtin, LEX_BUILTIN, do_exp},
- {"for", Node_K_for, LEX_FOR, 0},
- {"getline", Node_builtin, LEX_BUILTIN, do_getline},
- {"if", Node_K_if, LEX_IF, 0},
- {"in", Node_illegal, LEX_IN, 0},
- {"index", Node_builtin, LEX_BUILTIN, do_index},
- {"int", Node_builtin, LEX_BUILTIN, do_int},
- {"length", Node_builtin, LEX_BUILTIN, do_length},
- {"log", Node_builtin, LEX_BUILTIN, do_log},
- {"next", Node_K_next, LEX_NEXT, 0},
- {"print", Node_K_print, LEX_PRINT, 0},
- {"printf", Node_K_printf, LEX_PRINTF, 0},
-#ifndef FAST
- {"prvars", Node_builtin, LEX_BUILTIN, do_prvars},
+ { "break", Node_K_break, LEX_BREAK, 0 },
+ { "close", Node_builtin, LEX_BUILTIN, do_close },
+ { "continue", Node_K_continue, LEX_CONTINUE, 0 },
+ { "cos", Node_builtin, LEX_BUILTIN, do_cos },
+ { "delete", Node_K_delete, LEX_DELETE, 0 },
+ { "do", Node_K_do, LEX_DO, 0 },
+ { "else", Node_illegal, LEX_ELSE, 0 },
+ { "exit", Node_K_exit, LEX_EXIT, 0 },
+ { "exp", Node_builtin, LEX_BUILTIN, do_exp },
+ { "for", Node_K_for, LEX_FOR, 0 },
+ { "func", Node_K_function, LEX_FUNCTION, 0 },
+ { "function", Node_K_function, LEX_FUNCTION, 0 },
+ { "getline", Node_K_getline, LEX_GETLINE, 0 },
+ { "gsub", Node_gsub, LEX_SUB, 0 },
+ { "if", Node_K_if, LEX_IF, 0 },
+ { "in", Node_illegal, LEX_IN, 0 },
+ { "index", Node_builtin, LEX_BUILTIN, do_index },
+ { "int", Node_builtin, LEX_BUILTIN, do_int },
+ { "length", Node_builtin, LEX_BUILTIN, do_length },
+ { "log", Node_builtin, LEX_BUILTIN, do_log },
+ { "match", Node_K_match, LEX_MATCH, 0 },
+ { "next", Node_K_next, LEX_NEXT, 0 },
+ { "print", Node_K_print, LEX_PRINT, 0 },
+ { "printf", Node_K_printf, LEX_PRINTF, 0 },
+#ifdef DEBUG
+ { "prvars", Node_builtin, LEX_BUILTIN, do_prvars },
#endif
- {"split", Node_builtin, LEX_BUILTIN, do_split},
- {"sprintf", Node_builtin, LEX_BUILTIN, do_sprintf},
- {"sqrt", Node_builtin, LEX_BUILTIN, do_sqrt},
- {"substr", Node_builtin, LEX_BUILTIN, do_substr},
- {"while", Node_K_while, LEX_WHILE, 0},
- {NULL, Node_illegal, ERROR, 0}
+ { "rand", Node_builtin, LEX_BUILTIN, do_rand },
+ { "return", Node_K_return, LEX_RETURN, 0 },
+ { "sin", Node_builtin, LEX_BUILTIN, do_sin },
+ { "split", Node_builtin, LEX_BUILTIN, do_split },
+ { "sprintf", Node_builtin, LEX_BUILTIN, do_sprintf },
+ { "sqrt", Node_builtin, LEX_BUILTIN, do_sqrt },
+ { "srand", Node_builtin, LEX_BUILTIN, do_srand },
+ { "sub", Node_sub, LEX_SUB, 0 },
+ { "substr", Node_builtin, LEX_BUILTIN, do_substr },
+ { "system", Node_builtin, LEX_BUILTIN, do_system },
+ { "while", Node_K_while, LEX_WHILE, 0 },
};
-/* Read one token, getting characters through lexptr. */
+/* VARARGS0 */
+yyerror(va_alist)
+va_dcl
+{
+ va_list args;
+ char *mesg;
+ char *a1;
+ register char *ptr, *beg;
+ static int list = 0;
+ char *scan;
+
+ errcount++;
+ va_start(args);
+ mesg = va_arg(args, char *);
+ if (mesg || !list) {
+ /* Find the current line in the input file */
+ if (!lexptr) {
+ beg = "(END OF FILE)";
+ ptr = beg + 13;
+ } else {
+ if (*lexptr == '\n' && lexptr != lexptr_begin)
+ --lexptr;
+ for (beg = lexptr; beg != lexptr_begin && *beg != '\n'; --beg)
+ ;
+ /* NL isn't guaranteed */
+ for (ptr = lexptr; *ptr && *ptr != '\n'; ptr++)
+ ;
+ if (beg != lexptr_begin)
+ beg++;
+ }
+ msg("syntax error near line %d:\n%.*s", lineno, ptr - beg, beg);
+ scan = beg;
+ while (scan <= lexptr)
+ if (*scan++ == '\t')
+ putc('\t', stderr);
+ else
+ putc(' ', stderr);
+ putc('^', stderr);
+ putc(' ', stderr);
+ if (mesg) {
+ vfprintf(stderr, mesg, args);
+ va_end(args);
+ putc('\n', stderr);
+ exit(1);
+ } else {
+ a1 = va_arg(args, char *);
+ if (a1) {
+ fputs("expecting: ", stderr);
+ fputs(a1, stderr);
+ list = 1;
+ va_end(args);
+ return;
+ }
+ }
+ va_end(args);
+ return;
+ }
+ a1 = va_arg(args, char *);
+ if (a1) {
+ fputs(" or ", stderr);
+ fputs(a1, stderr);
+ va_end(args);
+ putc('\n', stderr);
+ return;
+ }
+ putc('\n', stderr);
+ list = 0;
+ va_end(args);
+}
+
+/*
+ * Parse a C escape sequence. STRING_PTR points to a variable containing a
+ * pointer to the string to parse. That pointer is updated past the
+ * characters we use. The value of the escape sequence is returned.
+ *
+ * A negative value means the sequence \ newline was seen, which is supposed to
+ * be equivalent to nothing at all.
+ *
+ * If \ is followed by a null character, we return a negative value and leave
+ * the string pointer pointing at the null character.
+ *
+ * If \ is followed by 000, we return 0 and leave the string pointer after the
+ * zeros. A value of 0 does not mean end of string.
+ */
static int
-yylex ()
+parse_escape(string_ptr)
+char **string_ptr;
{
- register int c;
- register int namelen;
- register char *tokstart;
- register struct token *toktab;
- double atof(); /* JF know what happens if you forget this? */
-
-
- static did_newline = 0; /* JF the grammar insists that actions end
- with newlines. This was easier than hacking
- the grammar. */
- int do_concat;
-
- int seen_e = 0; /* These are for numbers */
- int seen_point = 0;
-
- retry:
-
- if(!lexptr)
- return 0;
-
- if (want_regexp) {
- want_regexp = 0;
- /* there is a potential bug if a regexp is followed by an equal sign:
- "/foo/=bar" would result in assign_quotient being returned as the
- next token. Nothing is done about it since it is not valid awk,
- but maybe something should be done anyway. */
-
- tokstart = lexptr;
- while (c = *lexptr++) {
- switch (c) {
- case '\\':
- if (*lexptr++ == '\0') {
- yyerror ("unterminated regexp ends with \\");
- return ERROR;
+ register int c = *(*string_ptr)++;
+
+ switch (c) {
+ case 'b':
+ return '\b';
+ case 'f':
+ return '\f';
+ case 'n':
+ return '\n';
+ case 'r':
+ return '\r';
+ case 't':
+ return '\t';
+ case 'v':
+ return '\v';
+ case '\n':
+ return -2;
+ case 0:
+ (*string_ptr)--;
+ return 0;
+ case '0':
+ case '1':
+ case '2':
+ case '3':
+ case '4':
+ case '5':
+ case '6':
+ case '7':
+ {
+ register int i = c - '0';
+ register int count = 0;
+
+ while (++count < 3) {
+ if ((c = *(*string_ptr)++) >= '0' && c <= '7') {
+ i *= 8;
+ i += c - '0';
+ } else {
+ (*string_ptr)--;
+ break;
+ }
+ }
+ return i;
+ }
+ default:
+ return c;
}
- break;
- case '/': /* end of the regexp */
- lexptr--;
- yylval.sval = tokstart;
- return REGEXP;
- case '\n':
- case '\0':
- yyerror ("unterminated regexp");
- return ERROR;
- }
- }
- }
- do_concat=want_concat_token;
- want_concat_token=0;
-
- if(*lexptr=='\0') {
- lexptr=0;
- return NEWLINE;
- }
-
- /* if lexptr is at white space between two terminal tokens or parens,
- it is a concatenation operator. */
- if(do_concat && (*lexptr==' ' || *lexptr=='\t')) {
- while (*lexptr == ' ' || *lexptr == '\t')
- lexptr++;
- if (isalnum(*lexptr) || *lexptr == '\"' || *lexptr == '('
- || *lexptr == '.' || *lexptr == '$') /* the '.' is for decimal pt */
- return CONCAT_OP;
- }
-
- while (*lexptr == ' ' || *lexptr == '\t')
- lexptr++;
-
- tokstart = lexptr; /* JF */
-
- switch (c = *lexptr++) {
- case 0:
- return 0;
-
- case '\n':
- lineno++;
- return NEWLINE;
-
- case '#': /* it's a comment */
- while (*lexptr != '\n' && *lexptr != '\0')
- lexptr++;
- goto retry;
-
- case '\\':
- if(*lexptr=='\n') {
- lexptr++;
- goto retry;
- } else break;
- case ')':
- case ']':
- ++want_concat_token;
- /* fall through */
- case '(': /* JF these were above, but I don't see why they should turn on concat. . . &*/
- case '[':
-
- case '{':
- case ',': /* JF */
- case '$':
- case ';':
- /* set node type to ILLEGAL because the action should set it to
- the right thing */
- yylval.nodetypeval = Node_illegal;
- return c;
-
- case '*':
- if(*lexptr=='=') {
- yylval.nodetypeval=Node_assign_times;
- lexptr++;
- return ASSIGNOP;
- }
- yylval.nodetypeval=Node_illegal;
- return c;
-
- case '/':
- if(*lexptr=='=') {
- yylval.nodetypeval=Node_assign_quotient;
- lexptr++;
- return ASSIGNOP;
- }
- yylval.nodetypeval=Node_illegal;
- return c;
-
- case '%':
- if(*lexptr=='=') {
- yylval.nodetypeval=Node_assign_mod;
- lexptr++;
- return ASSIGNOP;
- }
- yylval.nodetypeval=Node_illegal;
- return c;
-
- case '+':
- if(*lexptr=='=') {
- yylval.nodetypeval=Node_assign_plus;
- lexptr++;
- return ASSIGNOP;
- }
- if(*lexptr=='+') {
- yylval.nodetypeval=Node_illegal;
- lexptr++;
- return INCREMENT;
- }
- yylval.nodetypeval=Node_illegal;
- return c;
-
- case '!':
- if(*lexptr=='=') {
- yylval.nodetypeval=Node_notequal;
- lexptr++;
- return RELOP;
- }
- if(*lexptr=='~') {
- yylval.nodetypeval=Node_nomatch;
- lexptr++;
- return MATCHOP;
- }
- yylval.nodetypeval=Node_illegal;
- return c;
-
- case '<':
- if(*lexptr=='=') {
- yylval.nodetypeval=Node_leq;
- lexptr++;
- return RELOP;
- }
- yylval.nodetypeval=Node_less;
- return RELOP;
-
- case '=':
- if(*lexptr=='=') {
- yylval.nodetypeval=Node_equal;
- lexptr++;
- return RELOP;
- }
- yylval.nodetypeval=Node_assign;
- return ASSIGNOP;
-
- case '>':
- if(want_redirect) {
- if (*lexptr == '>') {
- yylval.nodetypeval = Node_redirect_append;
- lexptr++;
- } else
- yylval.nodetypeval = Node_redirect_output;
- return REDIRECT_OP;
- }
- if(*lexptr=='=') {
- yylval.nodetypeval=Node_geq;
- lexptr++;
- return RELOP;
- }
- yylval.nodetypeval=Node_greater;
- return RELOP;
-
- case '~':
- yylval.nodetypeval=Node_match;
- return MATCHOP;
-
- case '}': /* JF added did newline stuff. Easier than hacking the grammar */
- if(did_newline) {
- did_newline=0;
- return c;
- }
- did_newline++;
- --lexptr;
- return NEWLINE;
-
- case '"':
- while (*lexptr != '\0') {
- switch (*lexptr++) {
- case '\\':
- if (*lexptr++ != '\0')
- break;
- /* fall through */
- case '\n':
- yyerror ("unterminated string");
- return ERROR;
- case '\"':
- yylval.sval = tokstart + 1; /* JF Skip the doublequote */
- ++want_concat_token;
- return YSTRING;
- }
- }
- return ERROR; /* JF this was one level up, wrong? */
-
- case '-':
- if(*lexptr=='=') {
- yylval.nodetypeval=Node_assign_minus;
- lexptr++;
- return ASSIGNOP;
- }
- if(*lexptr=='-') {
- yylval.nodetypeval=Node_illegal;
- lexptr++;
- return DECREMENT;
- }
- /* JF I think space tab comma and newline are the legal places for
- a UMINUS. Have I missed any? */
- if((!isdigit(*lexptr) && *lexptr!='.') || (lexptr>lexptr_begin+1 &&
- !index(" \t,\n",lexptr[-2]))) {
- /* set node type to ILLEGAL because the action should set it to
- the right thing */
- yylval.nodetypeval = Node_illegal;
- return c;
- }
- /* FALL through into number code */
- case '0':
- case '1':
- case '2':
- case '3':
- case '4':
- case '5':
- case '6':
- case '7':
- case '8':
- case '9':
- case '.':
- /* It's a number */
- if(c=='-') namelen=1;
- else namelen=0;
- for (; (c = tokstart[namelen]) != '\0'; namelen++) {
- switch (c) {
- case '.':
- if (seen_point)
- goto got_number;
- ++seen_point;
- break;
- case 'e':
- case 'E':
- if (seen_e)
- goto got_number;
- ++seen_e;
- if (tokstart[namelen+1] == '-' || tokstart[namelen+1] == '+')
- namelen++;
- break;
- case '0': case '1': case '2': case '3': case '4':
- case '5': case '6': case '7': case '8': case '9':
- break;
- default:
- goto got_number;
- }
- }
-
-got_number:
- lexptr = tokstart + namelen;
- yylval.fval = atof(tokstart);
- ++want_concat_token;
- return NUMBER;
-
- case '&':
- if(*lexptr=='&') {
- yylval.nodetypeval=Node_and;
- lexptr++;
- return LEX_AND;
- }
- return ERROR;
-
- case '|':
- if(want_redirect) {
- lexptr++;
- yylval.nodetypeval = Node_redirect_pipe;
- return REDIRECT_OP;
- }
- if(*lexptr=='|') {
- yylval.nodetypeval=Node_or;
- lexptr++;
- return LEX_OR;
- }
- return ERROR;
- }
-
- if (!isalpha(c)) {
- yyerror ("Invalid char '%c' in expression\n", c);
- return ERROR;
- }
-
- /* its some type of name-type-thing. Find its length */
- for (namelen = 0; is_identchar(tokstart[namelen]); namelen++)
- ;
-
-
- /* See if it is a special token. */
- for (toktab = tokentab; toktab->operator != NULL; toktab++) {
- if(*tokstart==toktab->operator[0] &&
- !strncmp(tokstart,toktab->operator,namelen) &&
- toktab->operator[namelen]=='\0') {
- lexptr=tokstart+namelen;
- if(toktab->class == LEX_BUILTIN)
- yylval.ptrval = toktab->ptr;
- else
- yylval.nodetypeval = toktab->value;
- return toktab->class;
- }
- }
-
- /* It's a name. See how long it is. */
- yylval.sval = tokstart;
- lexptr = tokstart+namelen;
- ++want_concat_token;
- return NAME;
}
-/*VARARGS1*/
-yyerror (mesg,a1,a2,a3,a4,a5,a6,a7,a8)
- char *mesg;
+/*
+ * Read the input and turn it into tokens. Input is now read from a file
+ * instead of from malloc'ed memory. The main program takes a program
+ * passed as a command line argument and writes it to a temp file. Otherwise
+ * the file name is made available in an external variable.
+ */
+
+int curinfile = -1;
+
+static int
+yylex()
{
- register char *ptr,*beg;
-
- /* Find the current line in the input file */
- if(!lexptr) {
- beg="(END OF FILE)";
- ptr=beg+13;
- } else {
- if (*lexptr == '\n' && lexptr!=lexptr_begin)
- --lexptr;
- for (beg = lexptr;beg!=lexptr_begin && *beg != '\n';--beg)
- ;
- for (ptr = lexptr;*ptr && *ptr != '\n';ptr++) /*jfw: NL isn't guaranteed*/
- ;
- if(beg!=lexptr_begin)
- beg++;
- }
- fprintf (stderr, "Error near line %d, '%.*s'\n",lineno, ptr-beg, beg);
- /* figure out line number, etc. later */
- fprintf (stderr, mesg, a1, a2, a3, a4, a5, a6, a7, a8);
- fprintf (stderr,"\n");
- exit (1);
-}
+ register int c;
+ register int namelen;
+ register char *tokstart;
+ register struct token *tokptr;
+ char *tokkey;
+ extern double atof(); /* know what happens if you forget this? */
+ static did_newline = 0; /* the grammar insists that actions end
+ * with newlines. This was easier than
+ * hacking the grammar. */
+ int do_concat;
+ int seen_e = 0; /* These are for numbers */
+ int seen_point = 0;
+ extern char **sourcefile;
+ extern int tempsource, numfiles;
+ extern FILE *pathopen();
+ static int file_opened = 0;
+ static FILE *fin;
+ static char cbuf[BUFSIZ];
+ int low, mid, high;
+ extern int debugging;
+
+ if (! file_opened) {
+ file_opened = 1;
+#ifdef DEBUG
+ if (debugging) {
+ int i;
+
+ for (i = 0; i <= numfiles; i++)
+ fprintf (stderr, "sourcefile[%d] = %s\n", i,
+ sourcefile[i]);
+ }
+#endif
+ nextfile:
+ if ((fin = pathopen (sourcefile[++curinfile])) == NULL)
+ fatal("cannot open `%s' for reading (%s)",
+ sourcefile[curinfile],
+ sys_errlist[errno]);
+ *(lexptr = cbuf) = '\0';
+ /*
+ * immediately unlink the tempfile so that it will
+ * go away cleanly if we bomb.
+ */
+ if (tempsource && curinfile == 0)
+ (void) unlink (sourcefile[curinfile]);
+ }
+
+retry:
+ if (! *lexptr)
+ if (fgets (cbuf, sizeof cbuf, fin) == NULL) {
+ if (fin != NULL)
+ fclose (fin); /* be neat and clean */
+ if (curinfile < numfiles)
+ goto nextfile;
+ return 0;
+ } else
+ lexptr = lexptr_begin = cbuf;
+
+ if (want_regexp) {
+ want_regexp = 0;
+
+ /*
+ * there is a potential bug if a regexp is followed by an
+ * equal sign: "/foo/=bar" would result in assign_quotient
+ * being returned as the next token. Nothing is done about
+ * it since it is not valid awk, but maybe something should
+ * be done anyway.
+ */
+
+ tokstart = lexptr;
+ while (c = *lexptr++) {
+ switch (c) {
+ case '\\':
+ if (*lexptr++ == '\0') {
+ yyerror("unterminated regexp ends with \\");
+ return ERROR;
+ } else if (lexptr[-1] == '\n')
+ goto retry;
+ break;
+ case '/': /* end of the regexp */
+ lexptr--;
+ yylval.sval = tokstart;
+ return REGEXP;
+ case '\n':
+ lineno++;
+ case '\0':
+ yyerror("unterminated regexp");
+ return ERROR;
+ }
+ }
+ }
+ do_concat = want_concat_token;
+ want_concat_token = 0;
-/* Parse a C escape sequence. STRING_PTR points to a variable
- containing a pointer to the string to parse. That pointer
- is updated past the characters we use. The value of the
- escape sequence is returned.
+ if (*lexptr == '\n') {
+ lexptr++;
+ lineno++;
+ return NEWLINE;
+ }
- A negative value means the sequence \ newline was seen,
- which is supposed to be equivalent to nothing at all.
+ /*
+ * if lexptr is at white space between two terminal tokens or parens,
+ * it is a concatenation operator.
+ */
+ if (do_concat && (*lexptr == ' ' || *lexptr == '\t')) {
+ while (*lexptr == ' ' || *lexptr == '\t')
+ lexptr++;
+ if (isalnum(*lexptr) || *lexptr == '_' || *lexptr == '\"' ||
+ *lexptr == '(' || *lexptr == '.' || *lexptr == '$' ||
+ (*lexptr == '+' && *(lexptr+1) == '+') ||
+ (*lexptr == '-' && *(lexptr+1) == '-'))
+ /* the '.' is for decimal pt */
+ return CONCAT_OP;
+ }
+ while (*lexptr == ' ' || *lexptr == '\t')
+ lexptr++;
+
+ tokstart = lexptr;
+
+ switch (c = *lexptr++) {
+ case 0:
+ return 0;
+
+ case '\n':
+ lineno++;
+ return NEWLINE;
+
+ case '#': /* it's a comment */
+ while (*lexptr != '\n' && *lexptr != '\0')
+ lexptr++;
+ goto retry;
+
+ case '\\':
+ if (*lexptr == '\n') {
+ lineno++;
+ lexptr++;
+ want_concat_token = do_concat;
+ goto retry;
+ } else
+ break;
+ case ')':
+ case ']':
+ ++want_concat_token;
+ /* fall through */
+ case '(':
+ case '[':
+ case '$':
+ case ';':
+ case ':':
+ case '?':
+
+ /*
+ * set node type to ILLEGAL because the action should set it
+ * to the right thing
+ */
+ yylval.nodetypeval = Node_illegal;
+ return c;
+
+ case '{':
+ case ',':
+ while (isspace(*lexptr)) {
+ if (*lexptr == '\n')
+ lineno++;
+ lexptr++;
+ }
+ yylval.nodetypeval = Node_illegal;
+ return c;
+
+ case '*':
+ if (*lexptr == '=') {
+ yylval.nodetypeval = Node_assign_times;
+ lexptr++;
+ return ASSIGNOP;
+ } else if (*lexptr == '*') { /* make ** and **= aliases
+ * for ^ and ^= */
+ if (lexptr[1] == '=') {
+ yylval.nodetypeval = Node_assign_exp;
+ lexptr += 2;
+ return ASSIGNOP;
+ } else {
+ yylval.nodetypeval = Node_illegal;
+ lexptr++;
+ return '^';
+ }
+ }
+ yylval.nodetypeval = Node_illegal;
+ return c;
+
+ case '/':
+ if (*lexptr == '=') {
+ yylval.nodetypeval = Node_assign_quotient;
+ lexptr++;
+ return ASSIGNOP;
+ }
+ yylval.nodetypeval = Node_illegal;
+ return c;
+
+ case '%':
+ if (*lexptr == '=') {
+ yylval.nodetypeval = Node_assign_mod;
+ lexptr++;
+ return ASSIGNOP;
+ }
+ yylval.nodetypeval = Node_illegal;
+ return c;
+
+ case '^':
+ if (*lexptr == '=') {
+ yylval.nodetypeval = Node_assign_exp;
+ lexptr++;
+ return ASSIGNOP;
+ }
+ yylval.nodetypeval = Node_illegal;
+ return c;
+
+ case '+':
+ if (*lexptr == '=') {
+ yylval.nodetypeval = Node_assign_plus;
+ lexptr++;
+ return ASSIGNOP;
+ }
+ if (*lexptr == '+') {
+ yylval.nodetypeval = Node_illegal;
+ lexptr++;
+ return INCREMENT;
+ }
+ yylval.nodetypeval = Node_illegal;
+ return c;
+
+ case '!':
+ if (*lexptr == '=') {
+ yylval.nodetypeval = Node_notequal;
+ lexptr++;
+ return RELOP;
+ }
+ if (*lexptr == '~') {
+ yylval.nodetypeval = Node_nomatch;
+ if (! strict && lexptr[1] == '~') {
+ yylval.nodetypeval = Node_case_nomatch;
+ lexptr++;
+ }
+ lexptr++;
+ return MATCHOP;
+ }
+ yylval.nodetypeval = Node_illegal;
+ return c;
- If \ is followed by a null character, we return a negative
- value and leave the string pointer pointing at the null character.
+ case '<':
+ if (want_redirect) {
+ yylval.nodetypeval = Node_redirect_input;
+ return REDIRECT_OP;
+ }
+ if (*lexptr == '=') {
+ yylval.nodetypeval = Node_leq;
+ lexptr++;
+ return RELOP;
+ }
+ yylval.nodetypeval = Node_less;
+ return RELOP;
+
+ case '=':
+ if (*lexptr == '=') {
+ yylval.nodetypeval = Node_equal;
+ lexptr++;
+ return RELOP;
+ }
+ yylval.nodetypeval = Node_assign;
+ return ASSIGNOP;
+
+ case '>':
+ if (want_redirect) {
+ if (*lexptr == '>') {
+ yylval.nodetypeval = Node_redirect_append;
+ lexptr++;
+ } else
+ yylval.nodetypeval = Node_redirect_output;
+ return REDIRECT_OP;
+ }
+ if (*lexptr == '=') {
+ yylval.nodetypeval = Node_geq;
+ lexptr++;
+ return RELOP;
+ }
+ yylval.nodetypeval = Node_greater;
+ return RELOP;
+
+ case '~':
+ yylval.nodetypeval = Node_match;
+ if (! strict && *lexptr == '~') {
+ yylval.nodetypeval = Node_case_match;
+ lexptr++;
+ }
+ return MATCHOP;
+
+ case '}':
+ /*
+ * Added did newline stuff. Easier than
+ * hacking the grammar
+ */
+ if (did_newline) {
+ did_newline = 0;
+ return c;
+ }
+ did_newline++;
+ --lexptr;
+ return NEWLINE;
+
+ case '"':
+ while (*lexptr != '\0') {
+ switch (*lexptr++) {
+ case '\\':
+ if (*lexptr++ != '\0')
+ break;
+ /* fall through */
+ case '\n':
+ yyerror("unterminated string");
+ return ERROR;
+ case '\"':
+ /* Skip the doublequote */
+ yylval.sval = tokstart + 1;
+ ++want_concat_token;
+ return YSTRING;
+ }
+ }
+ return ERROR;
- If \ is followed by 000, we return 0 and leave the string pointer
- after the zeros. A value of 0 does not mean end of string. */
+ case '-':
+ if (*lexptr == '=') {
+ yylval.nodetypeval = Node_assign_minus;
+ lexptr++;
+ return ASSIGNOP;
+ }
+ if (*lexptr == '-') {
+ yylval.nodetypeval = Node_illegal;
+ lexptr++;
+ return DECREMENT;
+ }
-static int
-parse_escape (string_ptr)
- char **string_ptr;
-{
- register int c = *(*string_ptr)++;
- switch (c)
- {
- case 'a':
- return '\a';
- case 'b':
- return '\b';
- case 'e':
- return 033;
- case 'f':
- return '\f';
- case 'n':
- return '\n';
- case 'r':
- return '\r';
- case 't':
- return '\t';
- case 'v':
- return '\v';
- case '\n':
- return -2;
- case 0:
- (*string_ptr)--;
- return 0;
- case '^':
- c = *(*string_ptr)++;
- if (c == '\\')
- c = parse_escape (string_ptr);
- if (c == '?')
- return 0177;
- return (c & 0200) | (c & 037);
-
- case '0':
- case '1':
- case '2':
- case '3':
- case '4':
- case '5':
- case '6':
- case '7':
- {
- register int i = c - '0';
- register int count = 0;
- while (++count < 3)
- {
- if ((c = *(*string_ptr)++) >= '0' && c <= '7')
- {
- i *= 8;
- i += c - '0';
- }
- else
- {
- (*string_ptr)--;
+ /*
+ * It looks like space tab comma and newline are the legal
+ * places for a UMINUS. Have we missed any?
+ */
+ if ((! isdigit(*lexptr) && *lexptr != '.') ||
+ (lexptr > lexptr_begin + 1 &&
+ ! index(" \t,\n", lexptr[-2]))) {
+
+ /*
+ * set node type to ILLEGAL because the action should
+ * set it to the right thing
+ */
+ yylval.nodetypeval = Node_illegal;
+ return c;
+ }
+ /* FALL through into number code */
+ case '0':
+ case '1':
+ case '2':
+ case '3':
+ case '4':
+ case '5':
+ case '6':
+ case '7':
+ case '8':
+ case '9':
+ case '.':
+ /* It's a number */
+ if (c == '-')
+ namelen = 1;
+ else
+ namelen = 0;
+ for (; (c = tokstart[namelen]) != '\0'; namelen++) {
+ switch (c) {
+ case '.':
+ if (seen_point)
+ goto got_number;
+ ++seen_point;
+ break;
+ case 'e':
+ case 'E':
+ if (seen_e)
+ goto got_number;
+ ++seen_e;
+ if (tokstart[namelen + 1] == '-' || tokstart[namelen + 1] == '+')
+ namelen++;
+ break;
+ case '0':
+ case '1':
+ case '2':
+ case '3':
+ case '4':
+ case '5':
+ case '6':
+ case '7':
+ case '8':
+ case '9':
+ break;
+ default:
+ goto got_number;
+ }
+ }
+
+got_number:
+ lexptr = tokstart + namelen;
+ yylval.fval = atof(tokstart);
+ ++want_concat_token;
+ return NUMBER;
+
+ case '&':
+ if (*lexptr == '&') {
+ yylval.nodetypeval = Node_and;
+ lexptr++;
+ return LEX_AND;
+ }
+ return ERROR;
+
+ case '|':
+ if (*lexptr == '|') {
+ yylval.nodetypeval = Node_or;
+ lexptr++;
+ return LEX_OR;
+ } else if (want_redirect) {
+ yylval.nodetypeval = Node_redirect_pipe;
+ return REDIRECT_OP;
+ } else {
+ yylval.nodetypeval = Node_illegal;
+ return c;
+ }
break;
- }
- }
- return i;
- }
- default:
- return c;
- }
+ }
+
+ if (c != '_' && !isalpha(c)) {
+ yyerror("Invalid char '%c' in expression\n", c);
+ return ERROR;
+ }
+
+ /* it's some type of name-type-thing. Find its length */
+ for (namelen = 0; is_identchar(tokstart[namelen]); namelen++)
+ /* null */ ;
+ emalloc(tokkey, char *, namelen+1, "yylex");
+ strncpy (tokkey, tokstart, namelen);
+ tokkey[namelen] = '\0';
+
+ /* See if it is a special token. */
+ low = 0;
+ high = (sizeof (tokentab) / sizeof (tokentab[0])) - 1;
+ while (low <= high) {
+ int i, c;
+
+ mid = (low + high) / 2;
+
+ compare:
+ c = *tokstart - tokentab[mid].operator[0];
+ i = c ? c : strcmp (tokkey, tokentab[mid].operator);
+
+ if (i < 0) { /* token < mid */
+ high = mid - 1;
+ } else if (i > 0) { /* token > mid */
+ low = mid + 1;
+ } else {
+ lexptr = tokstart + namelen;
+ if (tokentab[mid].class == LEX_BUILTIN)
+ yylval.ptrval = tokentab[mid].ptr;
+ else
+ yylval.nodetypeval = tokentab[mid].value;
+ if (tokentab[mid].class == LEX_PRINT)
+ want_redirect++;
+ return tokentab[mid].class;
+ }
+ }
+
+ /* It's a name. See how long it is. */
+ yylval.sval = tokkey;
+ lexptr = tokstart + namelen;
+ ++want_concat_token;
+ return NAME;
+}
+
+#ifndef DEFPATH
+#define DEFPATH ".:/usr/lib/awk:/usr/local/lib/awk"
+#endif
+
+FILE *
+pathopen (file)
+char *file;
+{
+ static char defpath[] = DEFPATH;
+ static char *savepath;
+ static int first = 1;
+ extern char *getenv ();
+ char *awkpath, *cp;
+ char trypath[BUFSIZ];
+ FILE *fp;
+ extern int debugging;
+
+ if (strict)
+ return (fopen (file, "r"));
+
+ if (first) {
+ first = 0;
+ if ((awkpath = getenv ("AWKPATH")) == NULL || ! *awkpath)
+ awkpath = defpath;
+ savepath = awkpath; /* savepath used for restarting */
+ } else
+ awkpath = savepath;
+
+ if (index (file, '/') != NULL) /* some kind of path name, no search */
+ return (fopen (file, "r"));
+
+ do {
+ for (cp = trypath; *awkpath && *awkpath != ':'; )
+ *cp++ = *awkpath++;
+ *cp++ = '/';
+ *cp = '\0'; /* clear left over junk */
+ strcat (cp, file);
+ if ((fp = fopen (trypath, "r")) != NULL)
+ return (fp);
+
+ /* no luck, keep going */
+ if (*awkpath == ':') awkpath++; /* skip colon, but never step past the NUL */
+ } while (*awkpath);
+ return (NULL);
}
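Reviewer sketch, not part of the commit: pathopen() above walks a colon-separated search path (AWKPATH, falling back to DEFPATH) and tries "dir/file" for each component. The helper below shows one way to build each candidate name with a bounds check. It is a minimal illustration under stated assumptions: the name next_candidate and its interface are hypothetical and do not exist in gawk, and it uses ANSI C prototypes rather than the K&R style of the surrounding code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not in gawk): copy the next component of the
 * colon-separated search path `path`, followed by "/" and `file`,
 * into `buf`.  Returns a pointer to the remainder of the path, or
 * NULL when the path is exhausted or the candidate would not fit.
 * An empty component yields "/file", as in the original loop.
 */
static const char *
next_candidate(const char *path, const char *file, char *buf, size_t buflen)
{
	size_t dirlen, filelen;

	assert(file != NULL && buf != NULL);
	if (path == NULL || *path == '\0')
		return NULL;			/* nothing left to try */

	dirlen = strcspn(path, ":");		/* length up to ':' or NUL */
	filelen = strlen(file);
	if (dirlen + 1 + filelen + 1 > buflen)
		return NULL;			/* would overflow buf */

	memcpy(buf, path, dirlen);
	buf[dirlen] = '/';
	memcpy(buf + dirlen + 1, file, filelen + 1);	/* copies the NUL too */

	path += dirlen;
	if (*path == ':')			/* skip the separator only */
		path++;
	return path;
}
```

A caller would loop while the return value is non-NULL and pass each buf to fopen(), stopping at the first success. Because the helper returns NULL both when the path is exhausted and when a name is too long, a real implementation would probably want to tell the two cases apart.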
diff --git a/awk1.c b/awk1.c
index 79963bac..b6b4c7b4 100644
--- a/awk1.c
+++ b/awk1.c
@@ -1,370 +1,460 @@
+
/*
- * awk1 -- Expression tree constructors and main program for gawk.
+ * awk1 -- Expression tree constructors and main program for gawk.
+ *
+ * Copyright (C) 1986 Free Software Foundation Written by Paul Rubin, August
+ * 1986
+ *
+ * $Log: awk1.c,v $
+ * Revision 1.30 88/12/15 12:56:18 david
+ * changes from Jay to compile under gcc and fixing a bug in treatment of
+ * input files
+ *
+ * Revision 1.29 88/12/08 15:57:41 david
+ * *** empty log message ***
+ *
+ * Revision 1.28 88/12/07 20:00:15 david
+ * changes for incorporating source filename into error messages
+ *
+ * Revision 1.27 88/12/01 15:05:26 david
+ * changes to allow source line number printing in error messages
+ *
+ * Revision 1.26 88/11/28 20:12:30 david
+ * unbuffer stdout if compiled with DEBUG
+ *
+ * Revision 1.25 88/11/23 21:39:57 david
+ * Arnold: set strict if invoked as "awk"
+ *
+ * Revision 1.24 88/11/22 13:47:40 david
+ * Arnold: changes for case-insensitive matching
+ *
+ * Revision 1.23 88/11/15 10:18:57 david
+ * Arnold: cleanup; allow multiple -f options; if invoked as awk disable
+ * -v option for compatibility
+ *
+ * Revision 1.22 88/11/14 21:54:27 david
+ * Arnold: cleanup
+ *
+ * Revision 1.21 88/11/03 15:22:22 david
+ * revised flags
+ *
+ * Revision 1.20 88/11/01 11:47:24 david
+ * mostly cleanup and code movement
+ *
+ * Revision 1.19 88/10/19 21:56:00 david
+ * replace malloc with emalloc
+ *
+ * Revision 1.18 88/10/17 20:57:19 david
+ * Arnold: purge FAST
+ *
+ * Revision 1.17 88/10/14 22:11:24 david
+ * patch from Hack to get gcc to work
+ *
+ * Revision 1.16 88/10/13 21:55:08 david
+ * purge FAST; clean up error messages
+ *
+ * Revision 1.15 88/10/06 21:55:20 david
+ * be more careful about I/O errors on exit
+ * Arnold's fixes for command line processing
+ *
+ * Revision 1.14 88/10/06 15:52:24 david
+ * changes from Arnold: use getopt; ifdef -v option; change semantics of = args.
+ *
+ * Revision 1.13 88/09/26 10:16:35 david
+ * cleanup from Arnold
+ *
+ * Revision 1.12 88/09/19 20:38:29 david
+ * added -v option
+ * set FILENAME to "-" if stdin
+ *
+ * Revision 1.11 88/08/09 14:49:23 david
+ * reorganized handling of command-line arguments and files
+ *
+ * Revision 1.10 88/06/13 18:08:25 david
+ * delete \a and -R options
+ * separate exit value from flag indicating that exit has been called
+ * [from Arnold]
+ *
+ * Revision 1.9 88/06/05 22:19:41 david
+ * func_level goes away; param_counter is used to count order of local vars.
+ *
+ * Revision 1.8 88/05/31 09:21:56 david
+ * Arnold's portability changes (vprintf)
+ * fixed handling of function parameter use inside function
+ *
+ * Revision 1.7 88/05/30 09:51:17 david
+ * exit after yyparse() if any errors were encountered
+ * mk_re_parse() parses C escapes in regexps.
+ * do panic() properly with varargs
+ * clean up and simplify pop_var()
+ *
+ * Revision 1.6 88/05/26 22:46:19 david
+ * minor changes: break out separate case for making regular expressions
+ * from parser vs. from strings
+ *
+ * Revision 1.5 88/05/13 22:02:31 david
+ * moved BEGIN and END block merging into parse-phase (sorry Arnold)
+ * if there is only a BEGIN block and nothing else, don't read any files
+ * cleaned up func_install a bit
+ *
+ * Revision 1.4 88/05/04 12:18:28 david
+ * make_for_loop() now returns a NODE *
+ * pop_var() now returns the value of the node being popped, to be used
+ * in func_call() if the variable is an array (call by reference)
+ *
+ * Revision 1.3 88/04/15 13:12:19 david
+ * small fix to arg reading code
+ *
+ * Revision 1.2 88/04/14 14:40:25 david
+ * Arnold's changes to read program from a file
+ *
+ * Revision 1.1 88/04/08 15:14:52 david
+ * Initial revision
+ * Revision 1.6 88/04/08 14:48:30 david changes from
+ * Arnold Robbins
+ *
+ * Revision 1.5 88/03/28 14:13:33 david *** empty log message ***
+ *
+ * Revision 1.4 88/03/23 22:17:33 david mostly delinting -- a couple of bug
+ * fixes
*
- * Copyright (C) 1986 Free Software Foundation
- * Written by Paul Rubin, August 1986
+ * Revision 1.3 88/03/18 21:00:09 david Baseline -- hopefully all the
+ * functionality of the new awk added. Just debugging and tuning to do.
+ *
+ * Revision 1.2 87/11/19 14:40:17 david added support for user-defined
+ * functions
+ *
+ * Revision 1.1 87/10/27 15:23:26 david Initial revision
*
*/
/*
-GAWK is distributed in the hope that it will be useful, but WITHOUT ANY
-WARRANTY. No author or distributor accepts responsibility to anyone
-for the consequences of using it or for whether it serves any
-particular purpose or works at all, unless he says so in writing.
-Refer to the GAWK General Public License for full details.
-
-Everyone is granted permission to copy, modify and redistribute GAWK,
-but only under the conditions described in the GAWK General Public
-License. A copy of this license is supposed to have been given to you
-along with GAWK so you can know your rights and responsibilities. It
-should be in a file named COPYING. Among other things, the copyright
-notice and this notice must be preserved on all copies.
-
-In other words, go ahead and share GAWK, but don't try to stop
-anyone else from sharing it farther. Help stamp out software hoarding!
-*/
-
-#include <stdio.h>
-#include "regex.h"
-#include "awk.h"
+ * GAWK is distributed in the hope that it will be useful, but WITHOUT ANY
+ * WARRANTY. No author or distributor accepts responsibility to anyone for
+ * the consequences of using it or for whether it serves any particular
+ * purpose or works at all, unless he says so in writing. Refer to the GAWK
+ * General Public License for full details.
+ *
+ * Everyone is granted permission to copy, modify and redistribute GAWK, but
+ * only under the conditions described in the GAWK General Public License. A
+ * copy of this license is supposed to have been given to you along with GAWK
+ * so you can know your rights and responsibilities. It should be in a file
+ * named COPYING. Among other things, the copyright notice and this notice
+ * must be preserved on all copies.
+ *
+ * In other words, go ahead and share GAWK, but don't try to stop anyone else
+ * from sharing it farther. Help stamp out software hoarding!
+ */
-/* Temporary nodes are stored here. ob_dummy is a dummy object used to
- keep the obstack library from free()ing up the entire stack. */
-struct obstack temp_strings;
-char *ob_dummy;
+#include "awk.h"
-/* The parse tree and field nodes are stored here. Parse_end is a dummy
- item used to free up unneeded fields without freeing the program being run
+/*
+ * The parse tree and field nodes are stored here. Parse_end is a dummy item
+ * used to free up unneeded fields without freeing the program being run
*/
-struct obstack other_stack;
-char *parse_end;
+int errcount = 0; /* error counter, used by yyerror() */
+int param_counter;
/* The global null string */
NODE *Nnull_string;
/* The special variable that contains the name of the current input file */
extern NODE *FILENAME_node;
+extern NODE *ARGC_node;
+extern NODE *ARGV_node;
/* The name the program was invoked under, for error messages */
char *myname;
-/* A block of gAWK code to be run before running the program */
-NODE *begin_block = 0;
+/* A block of AWK code to be run before running the program */
+NODE *begin_block = 0;
+
+/* A block of AWK code to be run after the last input file */
+NODE *end_block = 0;
-/* A block of gAWK code to be run after the last input file */
-NODE *end_block = 0;
+FILE *input_file; /* Where to read from */
-FILE *input_file; /* Where to read from */
+int exiting = 0; /* Was an "exit" statement executed? */
+int exit_val = 0; /* optional exit value */
-#ifndef FAST
+#ifdef DEBUG
/* non-zero means in debugging is enabled. Probably not very useful */
-int debugging;
+int debugging = 0;
+
#endif
-char *index();
+int tempsource = 0; /* source is in a temp file */
+char **sourcefile = NULL; /* source file name(s) */
+int numfiles = -1; /* how many source files */
+
+int ignorecase = 0; /* global flag for ignoring case */
+int strict = 0; /* turn off gnu extensions */
main(argc, argv)
- int argc;
- char **argv;
+int argc;
+char **argv;
{
- register int i;
- register NODE *tmp;
- char **do_vars;
-#ifndef FAST
+#ifdef DEBUG
/* Print out the parse tree. For debugging */
- register int dotree = 0;
- extern int yydebug;
+ register int dotree = 0;
+ extern int yydebug;
+
+#endif
+ extern char *lexptr;
+ extern char *lexptr_begin;
+ extern char *version_string;
+ extern FILE *nextfile();
+ FILE *fp, *fopen();
+ static char template[] = "/tmp/gawk.XXXXX";
+ char *mktemp ();
+ int c;
+ extern int opterr, optind, getopt();
+ extern char *optarg;
+ char *cp, *rindex();
+ /*
+ * for strict to work, legal options must be first
+ */
+#define EXTENSIONS 4 /* where to clear */
+#ifdef DEBUG
+ char *awk_opts = "F:f:ivdD";
+#else
+ char *awk_opts = "F:f:iv";
#endif
- extern char *lexptr;
- extern char *lexptr_begin;
- FILE *fp,*fopen();
- --argc;
- myname= *argv++;
- if(!argc)
- usage();
+#ifdef DEBUG
+ /*malloc_debug(2);*/
+#endif
+ myname = argv[0];
+ if (argc < 2)
+ usage();
/* Tell the regex routines how they should work. . . */
- re_set_syntax(RE_NO_BK_PARENS|RE_NO_BK_VBAR);
-
- /* Set up the stack for temporary strings */
- obstack_init (&temp_strings);
- ob_dummy=obstack_alloc(&temp_strings,0);
+ (void) re_set_syntax(RE_SYNTAX_AWK);
- /* Set up the other stack for other things */
- obstack_init(&other_stack);
/* initialize the null string */
- Nnull_string = make_string("",0);
- /* This was to keep Nnull_string from ever being free()d It didn't work */
- /* Nnull_string->stref=32000; */
- /* Set up the special variables */
- /* Note that this must be done BEFORE arg parsing else -R and -F
- break horribly */
- init_vars();
-
-
- for(;*argv && **argv=='-';argc--,argv++) {
- switch(argv[0][1]) {
-#ifndef FAST
- case 'd':
- debugging++;
- dotree++;
- break;
-
- case 'D':
- debugging++;
- yydebug=2;
- break;
+ Nnull_string = make_string("", 0);
+ Nnull_string->numbr = 0.0;
+ Nnull_string->type = Node_val;
+ Nnull_string->flags = (PERM|STR|NUM);
+
+ /* Set up the special variables */
+
+ /*
+ * Note that this must be done BEFORE arg parsing else -F
+ * breaks horribly
+ */
+ init_vars();
+
+ /* worst case */
+ emalloc(sourcefile, char **, argc * sizeof(char *), "main");
+
+
+#ifdef STRICT /* strict Unix awk compatibility */
+ strict = 1;
+#else
+ /* if invoked as 'awk', also behave strictly */
+ if ((cp = rindex(myname, '/')) != NULL)
+ cp++;
+ else
+ cp = myname;
+ if (strcmp (cp, "awk") == 0)
+ strict = 1;
#endif
- /* This feature isn't in un*x awk, but might be useful */
- case 'R':
- set_rs(&argv[0][2]);
- break;
-
- case 'F':
- set_fs(&argv[0][2]);
- break;
-
-
- /* It would be better to read the input file in as we parse
- it. Its done this way for hysterical reasons. Feel
- free to fix it. */
- case 'f':
- if(lexptr)
- panic("Can only use one -f option");
- if((fp=fopen(argv[1],"r"))==NULL)
- er_panic(argv[1]);
- else {
- char *curptr;
- int siz,nread;
-
- curptr=lexptr=malloc(2000);
- if(curptr==NULL)
- panic("Memory exhausted"); /* jfw: instead of abort() */
- siz=2000;
- i=siz-1;
- while((nread=fread(curptr,sizeof(char),i,fp)) > 0) {
- curptr+=nread;
- i-=nread;
- if(i==0) {
- lexptr=realloc(lexptr,siz*2);
- if(lexptr==NULL)
- panic("Memory exhausted"); /* jfw: instead of abort() */
- curptr=lexptr+siz-1;
- i=siz;
- siz*=2;
- }
- }
- *curptr='\0';
- fclose(fp);
- }
- argc--;
- argv++;
- break;
-
- case '\0': /* A file */
- break;
-
- default:
- panic("Unknown option %s",argv[0]);
- }
- }
- if (debugging) setbuf(stdout, 0); /* jfw: make debugging easier */
- /* No -f option, use next arg */
- if(!lexptr) {
- if(!argc) usage();
- lexptr= *argv++;
- --argc;
- }
-
- /* Read in the program */
- lexptr_begin=lexptr;
- (void)yyparse ();
-
- /* Anything allocated on the other_stack after here will be freed
- when the next input line is read.
- */
- parse_end=obstack_alloc(&other_stack,0);
-
-#ifndef FAST
- if(dotree)
- print_parse_tree(expression_value);
+
+ if (strict)
+ awk_opts[EXTENSIONS] = '\0';
+
+ while ((c = getopt (argc, argv, awk_opts)) != EOF) {
+ switch (c) {
+#ifdef DEBUG
+ case 'd':
+ debugging++;
+ dotree++;
+ break;
+
+ case 'D':
+ debugging++;
+ yydebug = 2;
+ break;
#endif
- /* Set up the field variables */
- init_fields();
-
- /* Look for BEGIN and END blocks. Only one of each allowed */
- for(tmp=expression_value;tmp;tmp=tmp->rnode) {
- if(!tmp->lnode || !tmp->lnode->lnode)
- continue;
- if(tmp->lnode->lnode->type==Node_K_BEGIN)
- begin_block=tmp->lnode->rnode;
- else if(tmp->lnode->lnode->type==Node_K_END)
- end_block=tmp->lnode->rnode;
- }
- if(begin_block && interpret(begin_block) == 0) exit(0); /* jfw */
- do_vars=argv;
- while(argc>0 && index(*argv,'=')) {
- argv++;
- --argc;
- }
- if(do_vars==argv) do_vars=0;
- if(argc==0) {
- static char *dumb[2]= { "-", 0};
-
- argc=1;
- argv= &dumb[0];
- }
- while(argc--) {
- if(!strcmp(*argv,"-")) {
- input_file=stdin;
- FILENAME_node->var_value=Nnull_string;
- ADD_ONE_REFERENCE(Nnull_string);
- } else {
- extern NODE *deref;
-
- input_file=fopen(*argv,"r");
- /* This should print the error message from errno */
- if(!input_file)
- er_panic(*argv);
- /* This is a kludge. */
- deref=FILENAME_node->var_value;
- do_deref();
- FILENAME_node->var_value=make_string(*argv,strlen(*argv));
- }
- /* This is where it spends all its time. The infamous MAIN LOOP */
- if(inrec()==0) {
- if(do_vars) {
- while(do_vars!=argv && *do_vars) {
- char *cp;
-
- cp=index(*do_vars,'=');
- *cp++='\0';
- variable(*do_vars)->var_value=make_string(cp,strlen(cp));
- do_vars++;
+
+ case 'F':
+ set_fs(optarg);
+ break;
+
+ case 'f':
+ /*
+ * a la MKS awk, allow multiple -f options.
+ * this makes function libraries real easy.
+ * most of the magic is in the scanner.
+ */
+ sourcefile[++numfiles] = optarg;
+ break;
+
+ case 'i':
+ ignorecase = 1;
+ break;
+
+ case 'v':
+ fprintf(stderr, "%s", version_string);
+ break;
+
+ case '?':
+ default:
+ /* getopt will print a message for us */
+ /* S5R4 awk ignores bad options and keeps going */
+ break;
}
- do_vars=0;
}
- do
- obstack_free(&temp_strings, ob_dummy);
- while (interpret(expression_value) && inrec() == 0);
- }
- if(input_file!=stdin) fclose(input_file);
- argv++;
- }
- if(end_block) (void)interpret(end_block);
- exit(0);
-}
+#ifdef DEBUG
+ setbuf(stdout, (char *) NULL); /* make debugging easier */
+#endif
+ /* No -f option, use next arg */
+ /* write to temp file and save sourcefile name */
+ if (numfiles == -1) {
+ int i;
+
+ if (optind > argc - 1) /* no args left */
+ usage();
+ numfiles++;
+ sourcefile[0] = mktemp (template);
+ i = strlen (argv[optind]);
+ if ((fp = fopen (sourcefile[0], "w")) == NULL)
+ fatal("could not save source prog in temp file (%s)",
+ sys_errlist[errno]);
+ if (fwrite (argv[optind], 1, i, fp) != i)
+ fatal(
+ "could not write source program to temp file (%s)",
+ sys_errlist[errno]);
+ if (argv[optind][i-1] != '\n')
+ putc ('\n', fp);
+ (void) fclose (fp);
+ tempsource++;
+ optind++;
+ }
+ init_args(optind, argc, myname, argv);
-/* These exit values are arbitrary */
-/*VARARGS1*/
-panic(str,arg)
-char *str;
-{
- fprintf(stderr,"%s: ",myname);
- fprintf(stderr,str,arg);
- fprintf(stderr,"\n");
- exit(12);
+ /* Read in the program */
+ lexptr_begin = lexptr;
+ if (yyparse() || errcount)
+ exit(1);
+
+#ifdef DEBUG
+ if (dotree)
+ print_parse_tree(expression_value);
+#endif
+ /* Set up the field variables */
+ init_fields();
+
+ if (begin_block)
+ (void) interpret(begin_block);
+ if (!exiting && (expression_value || end_block)) {
+ if(input_file)
+ do_file(input_file);
+ while ((fp = nextfile()) != NULL) {
+ do_file(fp);
+ if (exiting)
+ break;
+ }
+ }
+ if (end_block)
+ (void) interpret(end_block);
+ if (flush_io() != 0 && exit_val == 0)
+ exit_val = 1;
+ if (close_io() != 0 && exit_val == 0)
+ exit_val = 1;
+ exit(exit_val);
}
-er_panic(str)
-char *str;
+do_file(fp)
+FILE *fp;
{
- fprintf(stderr,"%s: ",myname);
- perror(str);
- exit(15);
+ input_file = fp;
+ /* This is where it spends all its time. The infamous MAIN LOOP */
+ if (inrec() == 0) {
+ while (interpret(expression_value) && inrec() == 0)
+ ;
+ }
+ if (fp != stdin)
+ (void) fclose(fp);
}
usage()
{
- fprintf(stderr,"%s: usage: %s {-f progfile | program } [-F{c} -R{c}] file . . .\n",myname,myname);
+#ifdef STRICT
+ char *opt1 = "[ -Ffs ] -f progfile [ -- ]";
+ char *opt2 = "[ -Ffs ] [ -- ] 'program'";
+#else
+ char *opt1 = "[ -v ] [ -Ffs ] -f progfile [ -- ]";
+ char *opt2 = "[ -v ] [ -Ffs ] [ -- ] 'program'";
+#endif
+
+ fprintf(stderr, "usage: %s %s file ...\n %s %s file ...\n",
+ myname, opt1, myname, opt2);
exit(11);
}
-
-/* This allocates a new node of type ty. Note that this node will not go
- away unless freed, so don't use it for tmp storage */
NODE *
-newnode(ty)
-NODETYPE ty;
+node_common(op)
+NODETYPE op;
{
register NODE *r;
-
- r=(NODE *)malloc(sizeof(NODE));
- if(r==NULL)
- abort();
- r->type=ty;
+ extern int lineno;
+ extern int numfiles;
+ extern int tempsource;
+ extern char **sourcefile;
+ extern int curinfile;
+
+ emalloc(r, NODE *, sizeof(NODE), "node_common");
+ r->type = op;
+ r->source_line = lineno;
+ if (numfiles > 1 && !tempsource)
+ r->source_file = sourcefile[curinfile];
+ else
+ r->source_file = NULL;
return r;
}
-
-/* Duplicate a node. (For global strings, "duplicate" means crank up
- the reference count.) This creates global nodes. . .*/
+/*
+ * This allocates a node with defined lnode and rnode.
+ * This should only be used by yyparse+co while reading in the program
+ */
NODE *
-dupnode(n)
-NODE *n;
+node(left, op, right)
+NODE *left, *right;
+NODETYPE op;
{
register NODE *r;
- if(n->type==Node_string) {
- n->stref++;
- return n;
- } else if(n->type==Node_temp_string) {
- r=newnode(Node_string);
- r->stlen=n->stlen;
- r->stref=1;
- r->stptr=malloc(n->stlen+1);
- if(r->stptr==NULL)
- abort();
- bcopy (n->stptr, r->stptr, n->stlen);
- r->stptr[r->stlen]='\0'; /* JF for hackval */
- return r;
- } else {
- r=newnode(Node_illegal);
- *r= *n;
- return r;
- }
-}
-
-/* This allocates a node with defined lnode and rnode. */
-/* This should only be used by yyparse+co while
- reading in the program */
-NODE *
-node (left, op, right)
- NODE *left, *right;
- NODETYPE op;
-{
- register NODE *r;
-
- r = (NODE *)obstack_alloc(&other_stack,sizeof(NODE));
- r->type=op;
- r->lnode = left;
- r->rnode = right;
- return r;
+ r = node_common(op);
+ r->lnode = left;
+ r->rnode = right;
+ return r;
}
-/* This allocates a node with defined subnode and proc */
-/* Otherwise like node() */
+/*
+ * This allocates a node with defined subnode and proc
+ * Otherwise like node()
+ */
NODE *
snode(subn, op, procp)
NODETYPE op;
-NODE *(*procp)();
+NODE *(*procp) ();
NODE *subn;
{
register NODE *r;
- r=(NODE *)obstack_alloc(&other_stack,sizeof(NODE));
- r->type=op;
- r->subnode=subn;
- r->proc=procp;
+ r = node_common(op);
+ r->subnode = subn;
+ r->proc = procp;
return r;
}
-/* (jfw) This allocates a Node_line_range node
- * with defined condpair and zeroes the trigger word
- * to avoid the temptation of assuming that calling
- * 'node( foo, Node_line_range, 0)' will properly initialize 'triggered'.
+/*
+ * This allocates a Node_line_range node with defined condpair and
+ * zeroes the trigger word to avoid the temptation of assuming that calling
+ * 'node( foo, Node_line_range, 0)' will properly initialize 'triggered'.
*/
/* Otherwise like node() */
NODE *
@@ -373,351 +463,378 @@ NODE *cpair;
{
register NODE *r;
- r=(NODE *)obstack_alloc(&other_stack,sizeof(NODE));
- r->type=Node_line_range;
- r->condpair=cpair;
+ emalloc(r, NODE *, sizeof(NODE), "mkrangenode");
+ r->type = Node_line_range;
+ r->condpair = cpair;
r->triggered = 0;
return r;
}
-/* this allocates a node with defined numbr */
-/* This creates global nodes! */
-NODE *
-make_number (x)
- AWKNUM x;
-{
- register NODE *r;
-
- r=newnode(Node_number);
- r->numbr = x;
- return r;
-}
-
-/* This creates temporary nodes. They go away quite quicly, so
- don't use them for anything important */
-#ifndef FAST
-NODE *
-tmp_number(x)
-AWKNUM x;
-{
-#ifdef DONTDEF
- return make_number(x);
-#endif
- NODE *r;
-
- r=(NODE *)obstack_alloc(&temp_strings,sizeof(NODE));
- r->type=Node_number;
- r->numbr=x;
- return r;
-}
-#endif
-
-/* Make a string node. If len==0, the string passed in S is supposed to end
- with a double quote, but have had the beginning double quote
- already stripped off by yylex.
- If LEN!=0, we don't care what s ends with. This creates a global node */
-
-NODE *
-make_string (s,len)
- char *s;
-{
- register NODE *r;
- register char *pf,*pt;
- register int c;
-
- /* the aborts are impossible because yylex is supposed to have
- already checked for unterminated strings */
- if(len==-1) { /* Called from yyparse, find our own len */
-#ifndef FAST
- if (s[-1] != '\"') /* Didn't start with " */
- abort ();
-#endif
-
- for(pf = pt = s; *pf != '\0' && *pf!='\"';) {
- c= *pf++;
- switch(c) {
-#ifndef FAST
- case '\0':
- abort();
-#endif
-
- case '\\':
-#ifndef FAST
- if(*pf=='\0')
- abort();
-#endif
-
- c= *pf++;
- switch(c) {
- case '\\': /* no massagary needed */
- case '\'':
- case '\"':
- break;
- case '0':
- case '1':
- case '2':
- case '3':
- case '4':
- case '5':
- case '6':
- case '7':
- case '8':
- case '9':
- c-='0';
- while(*pf && *pf>='0' && *pf<='7') {
- c=c*8+ *pf++ - '0';
- }
- break;
- case 'b':
- c='\b';
- break;
- case 'f':
- c='\f';
- break;
- case 'n':
- c='\n';
- break;
- case 'r':
- c='\r';
- break;
- case 't':
- c='\t';
- break;
- case 'v':
- c='\v';
- break;
- default:
- *pt++='\\';
- break;
- }
- /* FALL THROUGH */
- default:
- *pt++=c;
- break;
- }
- }
-#ifndef FAST
- if(*pf=='\0')
- abort(); /* JF hit the end of the buf */
-#endif
- len = pt - s; /* JF was p - s - 1 */
- }
-
- r=newnode(Node_string);
- r->stptr=(char *)malloc(len+1);
- if(r->stptr==0)
- abort();
- r->type=Node_string;
- r->stlen=len;
- r->stref=1;
- bcopy (s, r->stptr, len);
- r->stptr[len]='\0'; /* JF a hack */
-
- return r;
-}
-
-/* #ifndef FAST */
-/* This should be a macro for speed, but the C compiler chokes. */
-/* Read the warning under tmp_number */
-NODE *
-tmp_string(s,len)
+struct re_pattern_buffer *
+mk_re_parse(s)
char *s;
{
- register NODE *r;
-
-#ifdef DONTDEF
- return make_string(s,len);
-#endif
- r=(NODE *)obstack_alloc(&temp_strings,sizeof(NODE));
- r->stptr=(char *)obstack_alloc(&temp_strings,len+1);
- r->type=Node_temp_string;
- r->stlen=len;
- r->stref=1;
- bcopy (s, r->stptr, len);
- r->stptr[len]='\0'; /* JF a hack */
-
- return r;
+ register char *src, *dest;
+ int c;
+
+ for (dest = src = s; *src != '\0'; src++) {
+ if (*src == '\\') {
+ c = *++src;
+ switch (c) {
+ case 'b':
+ *dest++ = '\b';
+ break;
+ case 'f':
+ *dest++ = '\f';
+ break;
+ case 'n':
+ *dest++ = '\n';
+ break;
+ case 'r':
+ *dest++ = '\r';
+ break;
+ case 't':
+ *dest++ = '\t';
+ break;
+ case 'v':
+ *dest++ = '\v';
+ break;
+ case '0':
+ case '1':
+ case '2':
+ case '3':
+ case '4':
+ case '5':
+ case '6':
+ case '7':
+ {
+ register int i = c - '0';
+ register int count = 0;
+
+ while (++count < 3) {
+ if ((c = src[1]) >= '0' && c <= '7') {
+ i = i * 8 + (c - '0');
+ src++; /* consume the digit only after peeking at it */
+ } else
+ break;
+ }
+ *dest++ = i;
+ }
+ break;
+ default:
+ *dest++ = '\\';
+ *dest++ = c;
+ break;
+ }
+ } else if (*src == '/')
+ break;
+ else
+ *dest++ = *src;
+ }
+ return make_regexp(tmp_string(s, dest-s));
}
-/* #endif */
/* Generate compiled regular expressions */
struct re_pattern_buffer *
-make_regexp (s)
- char *s;
+make_regexp(s)
+NODE *s;
{
- typedef struct re_pattern_buffer RPAT;
- RPAT *rp;
- char *p, *err;
-
- rp = (RPAT *) obstack_alloc(&other_stack, sizeof (RPAT));
- bzero((char *)rp,sizeof(RPAT));
- rp->buffer = (char *)malloc(8); /* JF I'd obstack allocate it,
- except the regex routines
- try to realloc() it, which fails. */
- /* Note that this means it may never be freed. Someone fix, please? */
-
- rp->allocated = 8;
- rp->fastmap = (char *)obstack_alloc(&other_stack, 256);
-
- for (p = s; *p != '\0'; p++) {
- if (*p == '\\')
- p++;
- else if (*p == '/')
- break;
- }
-#ifndef FAST
- if (*p != '/')
- abort (); /* impossible */
-#endif
-
- /* JF was re_compile_pattern, but that mishandles ( ) and |,
- so I had to write my own front end. Sigh. */
-
- if ((err = re_compile_pattern (s, p - s, rp)) != NULL) {
- fprintf (stderr, "illegal regexp: ");
- yyerror (err); /* fatal */
- }
-
- return rp;
+ struct re_pattern_buffer *rp;
+ char *err;
+
+ emalloc(rp, struct re_pattern_buffer *, sizeof(*rp), "make_regexp");
+ bzero((char *) rp, sizeof(*rp));
+ emalloc(rp->buffer, char *, 8, "make_regexp");
+ rp->allocated = 8;
+ emalloc(rp->fastmap, char *, 256, "make_regexp");
+
+ if ((err = re_compile_pattern(s->stptr, s->stlen, rp)) != NULL)
+ fatal("%s: /%s/", err, s->stptr);
+ free_temp(s);
+ return rp;
}
/* Build a for loop */
-FOR_LOOP_HEADER *
-make_for_loop (init, cond, incr)
- NODE *init, *cond, *incr;
+NODE *
+make_for_loop(init, cond, incr)
+NODE *init, *cond, *incr;
{
- register FOR_LOOP_HEADER *r;
-
- r = (FOR_LOOP_HEADER *)obstack_alloc(&other_stack,sizeof (FOR_LOOP_HEADER));
- r->init = init;
- r->cond = cond;
- r->incr = incr;
- return r;
+ register FOR_LOOP_HEADER *r;
+ NODE *n;
+
+ emalloc(r, FOR_LOOP_HEADER *, sizeof(FOR_LOOP_HEADER), "make_for_loop");
+ emalloc(n, NODE *, sizeof(NODE), "make_for_loop");
+ r->init = init;
+ r->cond = cond;
+ r->incr = incr;
+ n->type = Node_illegal;
+ n->sub.nodep.r.hd = r;
+ return n;
}
/* Name points to a variable name. Make sure its in the symbol table */
NODE *
-variable (name)
- char *name;
+variable(name)
+char *name;
{
- register NODE *r;
- NODE *lookup(), *install();
-
- if ((r = lookup (variables, name)) == NULL) {
- r = install (variables, name, node(Nnull_string, Node_var, (NODE *)NULL));
- /* JF make_number (0.0) is WRONG */
- }
- return r;
+ register NODE *r;
+ NODE *lookup(), *install(), *make_name();
+
+ if ((r = lookup(variables, name)) == NULL)
+ r = install(variables, name,
+ node(Nnull_string, Node_var, (NODE *) NULL));
+ return r;
}
/* Create a special variable */
NODE *
-spc_var (name,value)
+spc_var(name, value)
char *name;
NODE *value;
{
- register NODE *r;
- NODE *lookup(), *install();
+ register NODE *r;
+ NODE *lookup(), *install();
- if ((r = lookup(variables, name)) == NULL)
- r = install (variables, name, node(value, Node_var, (NODE *)NULL));
- return r;
+ if ((r = lookup(variables, name)) == NULL)
+ r = install(variables, name, node(value, Node_var, (NODE *) NULL));
+ return r;
}
-
+
+
/*
* Install a name in the hash table specified, even if it is already there.
- * Name stops with first non alphanumeric.
- * Caller must check against redefinition if that is desired.
+ * Name stops with first non alphanumeric. Caller must check against
+ * redefinition if that is desired.
*/
NODE *
-install (table, name, value)
- HASHNODE **table;
- char *name;
- NODE *value;
+install(table, name, value)
+HASHNODE **table;
+char *name;
+NODE *value;
{
- register HASHNODE *hp;
- register int i, len, bucket;
- register char *p;
-
- len = 0;
- p = name;
- while (is_identchar(*p))
- p++;
- len = p - name;
-
- i = sizeof (HASHNODE) + len + 1;
- hp = (HASHNODE *)obstack_alloc(&other_stack,i);
- bucket = hashf(name, len, HASHSIZE);
- hp->next = table[bucket];
- table[bucket] = hp;
- hp->length = len;
- hp->value = value;
- hp->name = ((char *) hp) + sizeof (HASHNODE);
- hp->length = len;
- bcopy (name, hp->name, len);
- return hp->value;
+ register HASHNODE *hp;
+ register int i, len, bucket;
+ register char *p;
+
+ len = 0;
+ p = name;
+ while (is_identchar(*p))
+ p++;
+ len = p - name;
+
+ i = sizeof(HASHNODE) + len + 1;
+ emalloc(hp, HASHNODE *, i, "install");
+ bucket = hashf(name, len, HASHSIZE);
+ hp->next = table[bucket];
+ table[bucket] = hp;
+ hp->length = len;
+ hp->value = value;
+ hp->name = ((char *) hp) + sizeof(HASHNODE);
+ hp->length = len;
+ bcopy(name, hp->name, len);
+ hp->name[len] = '\0';
+ hp->value->varname = hp->name;
+ return hp->value;
}
/*
* find the most recent hash node for name name (ending with first
- * non-identifier char) installed by install
+ * non-identifier char) installed by install
*/
NODE *
-lookup (table, name)
- HASHNODE **table;
- char *name;
+lookup(table, name)
+HASHNODE **table;
+char *name;
{
- register char *bp;
- register HASHNODE *bucket;
- register int len;
-
- for (bp = name; is_identchar(*bp); bp++)
- ;
- len = bp - name;
- bucket = table[hashf(name, len, HASHSIZE)];
- while (bucket) {
- if (bucket->length == len && strncmp(bucket->name, name, len) == 0)
- return bucket->value;
- bucket = bucket->next;
- }
- return NULL;
+ register char *bp;
+ register HASHNODE *bucket;
+ register int len;
+
+ for (bp = name; is_identchar(*bp); bp++)
+ ;
+ len = bp - name;
+ bucket = table[hashf(name, len, HASHSIZE)];
+ while (bucket) {
+ if (bucket->length == len && strncmp(bucket->name, name, len) == 0)
+ return bucket->value;
+ bucket = bucket->next;
+ }
+ return NULL;
}
#define HASHSTEP(old, c) ((old << 1) + c)
-#define MAKE_POS(v) (v & ~0x80000000) /* make number positive */
+#define MAKE_POS(v) (v & ~0x80000000) /* make number positive */
/*
- * return hash function on name. must be compatible with the one
- * computed a step at a time, elsewhere (JF: Where? I can't find it!)
+ * return hash function on name.
*/
int
hashf(name, len, hashsize)
- register char *name;
- register int len;
- int hashsize;
+register char *name;
+register int len;
+int hashsize;
{
- register int r = 0;
-
- while (len--)
- r = HASHSTEP(r, *name++);
-
- return MAKE_POS(r) % hashsize;
+ register int r = 0;
+
+ while (len--)
+ r = HASHSTEP(r, *name++);
+
+ r = MAKE_POS(r) % hashsize;
+ return r;
}
-/* Add new to the rightmost branch of LIST. This uses n^2 time, but
- doesn't get used enough to make optimizing worth it. . . */
+/*
+ * Add new to the rightmost branch of LIST. This uses n^2 time, but doesn't
+ * get used enough to make optimizing worth it. . .
+ */
/* You don't believe me? Profile it yourself! */
NODE *
-append_right(list,new)
-NODE *list,*new;
+append_right(list, new)
+NODE *list, *new;
{
register NODE *oldlist;
oldlist = list;
- while(list->rnode!=NULL)
- list=list->rnode;
+ while (list->rnode != NULL)
+ list = list->rnode;
list->rnode = new;
return oldlist;
}
+
+/*
+ * check if name is already installed; if so, it is a fatal error, since
+ * the function was previously defined. Otherwise, install name with def
+ * as value.
+ */
+func_install(params, def)
+NODE *params;
+NODE *def;
+{
+ NODE *r;
+ NODE *lookup();
+
+ pop_params(params);
+ r = lookup(variables, params->param);
+ if (r != NULL) {
+ fatal("function name `%s' previously defined", params->param);
+ } else
+ (void) install(variables, params->param,
+ node(params, Node_func, def));
+}
+
+NODE *
+pop_var(name)
+char *name;
+{
+ register char *bp;
+ register HASHNODE *bucket, **save;
+ register int len;
+
+ for (bp = name; is_identchar(*bp); bp++)
+ ;
+ len = bp - name;
+ save = &(variables[hashf(name, len, HASHSIZE)]);
+ bucket = *save;
+ while (bucket) {
+ if (bucket->length == len && strncmp(bucket->name, name, len) == 0) {
+ *save = bucket->next;
+ return bucket->value;
+ }
+ save = &(bucket->next);
+ bucket = bucket->next;
+ }
+ return NULL;
+}
+
+pop_params(params)
+NODE *params;
+{
+ register NODE *np;
+
+ for (np = params; np != NULL; np = np->rnode)
+ pop_var(np->param);
+}
+
+NODE *
+make_name(name, type)
+char *name;
+NODETYPE type;
+{
+ register char *p;
+ register NODE *r;
+ register int len;
+
+ p = name;
+ while (is_identchar(*p))
+ p++;
+ len = p - name;
+ emalloc(r, NODE *, sizeof(NODE), "make_name");
+ emalloc(r->param, char *, len + 1, "make_name");
+ bcopy(name, r->param, len);
+ r->param[len] = '\0';
+ r->rnode = NULL;
+ r->type = type;
+ return (install(variables, name, r));
+}
+
+NODE *make_param(name)
+char *name;
+{
+ NODE *r;
+
+ r = make_name(name, Node_param_list);
+ r->param_cnt = param_counter++;
+ return r;
+}
+
+FILE *
+nextfile()
+{
+ static int i = 1;
+ static int files = 0;
+ char *arg;
+ char *cp;
+ FILE *fp;
+ extern NODE **assoc_lookup();
+
+ for (; i < (int) (ARGC_node->lnode->numbr); i++) {
+ arg = (*assoc_lookup(ARGV_node, tmp_number((AWKNUM) i)))->stptr;
+ if (*arg == '\0')
+ continue;
+ cp = index(arg, '=');
+ if (cp != NULL) {
+ *cp++ = '\0';
+ variable(arg)->var_value = make_string(cp, strlen(cp));
+ } else {
+ extern NODE *deref;
+
+ files++;
+ if (strcmp(arg, "-") == 0)
+ fp = stdin;
+ else
+ fp = fopen(arg, "r");
+ if (fp == NULL)
+ fatal("cannot open file `%s' for reading (%s)",
+ arg, sys_errlist[errno]);
+ /* NOTREACHED */
+ /* This is a kludge. */
+ deref = FILENAME_node->var_value;
+ do_deref();
+ FILENAME_node->var_value =
+ make_string(arg, strlen(arg));
+ FNR_node->var_value->numbr = 0.0;
+ i++;
+ return fp;
+ }
+ }
+ if (files == 0) {
+ files++;
+ /* no args. -- use stdin */
+ /* FILENAME is init'ed to "-" */
+ /* FNR is init'ed to 0 */
+ return stdin;
+ }
+ return NULL;
+}
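Note on the awk1.c hunk above: hashf() folds a name into a bucket index with a
shift-and-add step (HASHSTEP), clears the sign bit (MAKE_POS), and reduces the
result modulo the table size. The following is only a minimal sketch of that
idea in modern C, not the code from this commit. The name sketch_hashf and the
table size 101 used in the example are assumptions made for illustration, and
the sketch uses unsigned arithmetic so that very long names wrap predictably,
which the signed original does not guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative restatement of hashf(): shift-and-add over the first
 * len bytes of name. sketch_hashf is a hypothetical name, not part of gawk.
 */
static unsigned sketch_hashf(const char *name, size_t len, unsigned hashsize)
{
    uint32_t r = 0;

    while (len--)
        r = (r << 1) + (unsigned char) *name++;  /* same step as HASHSTEP(old, c) */

    /* Same effect as MAKE_POS(): clear the top bit, then pick a bucket. */
    return (unsigned) (r & 0x7fffffffu) % hashsize;
}
```

With an assumed table size of 101, sketch_hashf("ab", 2, 101) computes 97, then
2 * 97 + 98 = 292, and 292 % 101 selects bucket 90. Because only the first len
bytes are hashed, a caller that has scanned an identifier out of a longer
string gets the same bucket as for the bare identifier, which is why lookup()
compares the stored length as well as the bytes.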
diff --git a/awk2.c b/awk2.c
index 8f29e312..38a319c6 100644
--- a/awk2.c
+++ b/awk2.c
@@ -1,1129 +1,1139 @@
/*
- * awk2 --- gawk parse tree interpreter
+ * awk2 --- gawk parse tree interpreter
*
- * Copyright (C) 1986 Free Software Foundation
- * Written by Paul Rubin, August 1986
+ * Copyright (C) 1986 Free Software Foundation
+ * Written by Paul Rubin, August 1986
+ *
+ * $Log: awk2.c,v $
+ * Revision 1.40 88/12/15 12:57:31 david
+ * make casetable static
+ *
+ * Revision 1.39 88/12/14 10:50:51 david
+ * dupnode() the return from a function
+ *
+ * Revision 1.38 88/12/13 22:27:04 david
+ * macro-front-end tree_eval and other optimizations
+ *
+ * Revision 1.36 88/12/08 10:51:37 david
+ * small correction to source file code
+ *
+ * Revision 1.35 88/12/07 20:00:35 david
+ * changes for incorporating source filename into error messages
+ *
+ * Revision 1.34 88/12/01 15:04:48 david
+ * cleanup and additions for source line number printing in error messages
+ *
+ * Revision 1.33 88/11/30 15:16:10 david
+ * merge FREE_ONE_REFERENCE into do_deref()
+ * free more in do_deref
+ * in for (i in array) loops, make sure value of i gets freed on each iteration
+ *
+ * Revision 1.32 88/11/29 09:55:04 david
+ * corrections to code that tracks value of NF -- this needs cleanup
+ *
+ * Revision 1.31 88/11/23 21:40:47 david
+ * Arnold: comment cleanup
+ *
+ * Revision 1.30 88/11/22 13:49:09 david
+ * Arnold: changes for case-insensitive matching
+ *
+ * Revision 1.29 88/11/15 10:22:42 david
+ * Arnold: cleanup of comments and #include's
+ *
+ * Revision 1.28 88/11/14 21:55:38 david
+ * Arnold: misc. cleanup and error message on bad regexp
+ *
+ * Revision 1.27 88/11/14 21:26:52 david
+ * update NF on assignment to a field greater than current NF
+ *
+ * Revision 1.26 88/11/03 15:26:20 david
+ * simplify call to in_array(); extensive revision of cmp_nodes and is_a_number
+ *
+ * Revision 1.25 88/11/01 12:11:57 david
+ * DEBUG macro becomes DBG_P; added some debugging code; moved all the
+ * compound assignments (+= etc.) into op_assign()
+ *
+ * Revision 1.24 88/10/25 10:43:05 david
+ * intermediate state: more code movement; Node_string et al. -> Node_val;
+ * add more debugging code; improve cmp_nodes
+ *
+ * Revision 1.22 88/10/19 21:57:41 david
+ * replace malloc and realloc with error checking versions
+ * start to change handling of $0
+ *
+ * Revision 1.21 88/10/17 20:56:13 david
+ * Arnold: better error messages for use of a function in the wrong context
+ *
+ * Revision 1.20 88/10/13 21:56:41 david
+ * cleanup of previous changes
+ * change panic() to fatal()
+ * detect and bomb on function call with space between name and opening (
+ *
+ * Revision 1.19 88/10/11 22:19:20 david
+ * cleanup
+ *
+ * Revision 1.18 88/10/04 21:31:33 david
+ * minor cleanup
+ *
+ * Revision 1.17 88/08/22 14:01:19 david
+ * fix to set_field() from Jay Finlayson
+ *
+ * Revision 1.16 88/08/09 14:51:34 david
+ * removed bad call to obstack_free() -- there is a lot of memory that is
+ * not being properly freed -- this area needs major work
+ * changed semantics in eval_condition -- if(expr) should test true if
+ * expr is a non-null string, even if the numerical value is zero -- counter-
+ * intuitive but that's what's in the book
+ *
+ * Revision 1.15 88/06/13 18:02:58 david
+ * separate exit value from fact that exit has been called [from Arnold]
+ *
+ * Revision 1.14 88/06/07 23:39:48 david
+ * insubstantial changes
+ *
+ * Revision 1.13 88/06/06 11:26:39 david
+ * get rid of some obsolete code
+ * change interface of set_field()
+ *
+ * Revision 1.12 88/06/05 22:21:36 david
+ * local variables are now kept on a stack
+ *
+ * Revision 1.11 88/06/01 22:06:50 david
+ * make sure in cases of Node_param_list that the variable is looked up
+ *
+ * Revision 1.10 88/05/31 09:29:47 david
+ * expunge Node_local_var
+ *
+ * Revision 1.9 88/05/30 09:52:55 david
+ * be prepared for NULL return from make_regexp()
+ * fix fatal() call
+ *
+ * Revision 1.8 88/05/26 22:48:48 david
+ * fixed regexp matching code
+ *
+ * Revision 1.7 88/05/16 21:27:09 david
+ * comment out obstack_free in interpret() -- it is done in do_file() anyway
+ * and was definitely free'ing stuff it shouldn't have
+ * change call of func_call() a bit
+ * allow get_lhs to be called with other Node types -- return 0; used in
+ * do_sub()
+ *
+ * Revision 1.6 88/05/13 22:00:03 david
+ * generalized *_BINDING macros and moved them to awk.h
+ * changes to function calling (mostly elsewhere)
+ * put into use the Node_var_array type
+ *
+ * Revision 1.5 88/05/09 21:22:27 david
+ * finally (I hope) got the code right in assign_number
+ *
+ * Revision 1.4 88/05/04 12:23:30 david
+ * fflush(stdout) on prints if FAST not def'ed
+ * all the assign_* cases were returning the wrong thing
+ * fixed Node_in_array code
+ * code in assign_number was freeing memory it shouldn't have
+ *
+ * Revision 1.3 88/04/15 13:12:38 david
+ * additional error message
+ *
+ * Revision 1.2 88/04/12 16:03:24 david
+ * fixed previously introduced bug: all matches succeeded
+ *
+ * Revision 1.1 88/04/08 15:15:01 david
+ * Initial revision
+ * Revision 1.7 88/04/08 14:48:33 david changes from
+ * Arnold Robbins
+ *
+ * Revision 1.6 88/03/28 14:13:50 david *** empty log message ***
+ *
+ * Revision 1.5 88/03/23 22:17:37 david mostly delinting -- a couple of bug
+ * fixes
+ *
+ * Revision 1.4 88/03/18 21:00:10 david Baseline -- hopefully all the
+ * functionality of the new awk added. Just debugging and tuning to do.
+ *
+ * Revision 1.3 87/11/14 15:16:21 david added user-defined functions with
+ * return and do-while loops
+ *
+ * Revision 1.2 87/10/29 21:45:44 david added support for array membership
+ * test, as in: if ("yes" in answers) ... this involved one more case: for
+ * Node_in_array and rearrangement of the code in assoc_lookup, so that the
+ * element can be located without being created
+ *
+ * Revision 1.1 87/10/27 15:23:28 david Initial revision
*
*/
/*
-GAWK is distributed in the hope that it will be useful, but WITHOUT ANY
-WARRANTY. No author or distributor accepts responsibility to anyone
-for the consequences of using it or for whether it serves any
-particular purpose or works at all, unless he says so in writing.
-Refer to the GAWK General Public License for full details.
-
-Everyone is granted permission to copy, modify and redistribute GAWK,
-but only under the conditions described in the GAWK General Public
-License. A copy of this license is supposed to have been given to you
-along with GAWK so you can know your rights and responsibilities. It
-should be in a file named COPYING. Among other things, the copyright
-notice and this notice must be preserved on all copies.
-
-In other words, go ahead and share GAWK, but don't try to stop
-anyone else from sharing it farther. Help stamp out software hoarding!
-*/
-
-#include <setjmp.h>
-#include <stdio.h>
-
-#ifdef SYSV
-/* nasty nasty berkelixm */
-#define _setjmp setjmp
-#define _longjmp longjmp
-#endif
+ * GAWK is distributed in the hope that it will be useful, but WITHOUT ANY
+ * WARRANTY. No author or distributor accepts responsibility to anyone for
+ * the consequences of using it or for whether it serves any particular
+ * purpose or works at all, unless he says so in writing. Refer to the GAWK
+ * General Public License for full details.
+ *
+ * Everyone is granted permission to copy, modify and redistribute GAWK, but
+ * only under the conditions described in the GAWK General Public License. A
+ * copy of this license is supposed to have been given to you along with GAWK
+ * so you can know your rights and responsibilities. It should be in a file
+ * named COPYING. Among other things, the copyright notice and this notice
+ * must be preserved on all copies.
+ *
+ * In other words, go ahead and share GAWK, but don't try to stop anyone else
+ * from sharing it farther. Help stamp out software hoarding!
+ */
#include "awk.h"
-NODE **get_lhs();
+NODE *_t; /* used as a temporary in macros */
+NODE *_result; /* holds result of tree_eval, for possible freeing */
+NODE *ret_node;
+extern NODE *OFMT_node;
-extern NODE dumb[],*OFMT_node;
-/* BEGIN and END blocks need special handling, because we are handed them
- * as raw Node_statement_lists, not as Node_rule_lists (jfw)
+/*
+ * BEGIN and END blocks need special handling, because we are handed them as
+ * raw Node_statement_lists, not as Node_rule_lists.
*/
extern NODE *begin_block, *end_block;
NODE *do_sprintf();
-extern struct obstack other_stack;
-
-
-#define min(a,b) ((a) < (b) ? (a) : (b))
/* More of that debugging stuff */
-#ifdef FAST
-#define DEBUG(X)
+#ifdef DEBUG
+#define DBG_P(X) print_debug X
#else
-#define DEBUG(X) print_debug X
+#define DBG_P(X)
#endif
-/* longjmp return codes, must be nonzero */
-/* Continue means either for loop/while continue, or next input record */
-#define TAG_CONTINUE 1
-/* Break means either for/while break, or stop reading input */
-#define TAG_BREAK 2
-
-/* the loop_tag_valid variable allows continue/break-out-of-context
- * to be caught and diagnosed (jfw) */
-#define PUSH_BINDING(stack, x) (bcopy ((char *)(x), (char *)(stack), sizeof (jmp_buf)), loop_tag_valid++)
-#define RESTORE_BINDING(stack, x) (bcopy ((char *)(stack), (char *)(x), sizeof (jmp_buf)), loop_tag_valid--)
-
-/* for "for(iggy in foo) {" */
-struct search {
- int numleft;
- AHASH **arr_ptr;
- AHASH *bucket;
- NODE *symbol;
- NODE *retval;
-};
+NODE *func_call();
+extern jmp_buf func_tag;
-struct search *assoc_scan(),*assoc_next();
-/* Tree is a bunch of rules to run.
- Returns zero if it hit an exit() statement */
-interpret (tree)
- NODE *tree;
-{
- register NODE *t; /* temporary */
-
- auto jmp_buf loop_tag_stack; /* shallow binding stack for loop_tag */
- static jmp_buf loop_tag; /* always the current binding */
- static int loop_tag_valid = 0;/* nonzero when loop_tag valid (jfw) */
-
- static jmp_buf rule_tag; /* tag the rule currently being run,
- for NEXT and EXIT statements. It is
- static because there are no nested rules */
-
- register NODE **lhs; /* lhs == Left Hand Side for assigns, etc */
- register struct search *l; /* For array_for */
-
-
- extern struct obstack temp_strings;
- extern char *ob_dummy;
- NODE *do_printf();
-
- /* clean up temporary strings created by evaluating expressions in
- previous recursive calls */
- obstack_free (&temp_strings, ob_dummy);
-
- if(tree == NULL)
- return 1;
- switch (tree->type) {
-#ifndef FAST
- /* Can't run these! */
- case Node_illegal:
- case Node_rule_node:
- case Node_if_branches:
- case Node_expression_list:
- case Node_K_BEGIN:
- case Node_K_END:
- case Node_redirect_output:
- case Node_redirect_append:
- case Node_redirect_pipe:
- case Node_var_array:
- abort();
-#endif
-
- case Node_rule_list:
- for (t = tree; t != NULL; t = t->rnode) {
- switch (_setjmp(rule_tag)) {
- case 0: /* normal non-jump */
- if (eval_condition (t->lnode->lnode)) {
- DEBUG(("Found a rule",t->lnode->rnode));
- if (t->lnode->rnode == NULL) {
- /* special case: pattern with no action is equivalent to
- * an action of {print} (jfw) */
- NODE printnode;
- printnode.type = Node_K_print;
- printnode.lnode = NULL;
- printnode.rnode = NULL;
- hack_print_node(&printnode);
- } else
- (void)interpret (t->lnode->rnode);
- }
- break;
- case TAG_CONTINUE: /* NEXT statement */
- return 1;
- case TAG_BREAK:
- return 0;
- }
- }
- break;
-
- case Node_statement_list:
- /* print_a_node(tree); */
- /* because BEGIN and END do not have Node_rule_list nature, yet can
- * have exits and nexts, we special-case a setjmp of rule_tag here.
- * (jfw)
- */
- if (tree == begin_block || tree == end_block) {
- switch (_setjmp(rule_tag)) {
- case TAG_CONTINUE: /* next */
- panic("unexpected next");
- return 1;
- case TAG_BREAK: return 0;
- }
- }
- for (t = tree; t != NULL; t = t->rnode) {
- DEBUG(("Statements",t->lnode));
- (void)interpret (t->lnode);
- }
- break;
-
- case Node_K_if:
- DEBUG(("IF",tree->lnode));
- if (eval_condition(tree->lnode)) {
- DEBUG(("True",tree->rnode->lnode));
- (void)interpret (tree->rnode->lnode);
- } else {
- DEBUG(("False",tree->rnode->rnode));
- (void)interpret (tree->rnode->rnode);
- }
- break;
-
- case Node_K_while:
- PUSH_BINDING (loop_tag_stack, loop_tag);
-
- DEBUG(("WHILE",tree->lnode));
- while (eval_condition (tree->lnode)) {
- switch (_setjmp (loop_tag)) {
- case 0: /* normal non-jump */
- DEBUG(("DO",tree->rnode));
- (void)interpret (tree->rnode);
- break;
- case TAG_CONTINUE: /* continue statement */
- break;
- case TAG_BREAK: /* break statement */
- RESTORE_BINDING (loop_tag_stack, loop_tag);
- return 1;
-#ifndef FAST
- default:
- abort (); /* never happens */
-#endif
- }
- }
- RESTORE_BINDING (loop_tag_stack, loop_tag);
- break;
-
- case Node_K_for:
- PUSH_BINDING (loop_tag_stack, loop_tag);
-
- DEBUG(("FOR",tree->forloop->init));
- (void)interpret (tree->forloop->init);
-
- DEBUG(("FOR.WHILE",tree->forloop->cond));
- while (eval_condition (tree->forloop->cond)) {
- switch (_setjmp (loop_tag)) {
- case 0: /* normal non-jump */
- DEBUG(("FOR.DO",tree->lnode));
- (void)interpret (tree->lnode);
- /* fall through */
- case TAG_CONTINUE: /* continue statement */
- DEBUG(("FOR.INCR",tree->forloop->incr));
- (void)interpret (tree->forloop->incr);
- break;
- case TAG_BREAK: /* break statement */
- RESTORE_BINDING (loop_tag_stack, loop_tag);
- return 1;
-#ifndef FAST
- default:
- abort (); /* never happens */
-#endif
- }
- }
- RESTORE_BINDING (loop_tag_stack, loop_tag);
- break;
-
- case Node_K_arrayfor:
-#define hakvar forloop->init
-#define arrvar forloop->incr
- PUSH_BINDING(loop_tag_stack, loop_tag);
- DEBUG(("AFOR.VAR",tree->hakvar));
- lhs=get_lhs(tree->hakvar);
- do_deref();
- for(l=assoc_scan(tree->arrvar);l;l=assoc_next(l)) {
- *lhs=dupnode(l->retval);
- DEBUG(("AFOR.NEXTIS",*lhs));
- switch(_setjmp(loop_tag)) {
- case 0:
- DEBUG(("AFOR.DO",tree->lnode));
- (void)interpret(tree->lnode);
- case TAG_CONTINUE:
- break;
-
- case TAG_BREAK:
- RESTORE_BINDING(loop_tag_stack, loop_tag);
- return 1;
-#ifndef FAST
- default:
- abort();
+/*
+ * This table is used by the regexp routines to do case independent
+ * matching. Basically, every ascii character maps to itself, except
+ * uppercase letters map to lower case ones. This table has 256
+ * entries, which may be overkill. Note also that if the system this
+ * is compiled on doesn't use 7-bit ascii, casetable[] should not be
+ * defined to the linker, so gawk should not load.
+ */
+#if 'a' == 97 /* it's ascii */
+static char casetable[] = {
+ '\000', '\001', '\002', '\003', '\004', '\005', '\006', '\007',
+ '\010', '\011', '\012', '\013', '\014', '\015', '\016', '\017',
+ '\020', '\021', '\022', '\023', '\024', '\025', '\026', '\027',
+ '\030', '\031', '\032', '\033', '\034', '\035', '\036', '\037',
+ /* ' ' '!' '"' '#' '$' '%' '&' ''' */
+ '\040', '\041', '\042', '\043', '\044', '\045', '\046', '\047',
+ /* '(' ')' '*' '+' ',' '-' '.' '/' */
+ '\050', '\051', '\052', '\053', '\054', '\055', '\056', '\057',
+ /* '0' '1' '2' '3' '4' '5' '6' '7' */
+ '\060', '\061', '\062', '\063', '\064', '\065', '\066', '\067',
+ /* '8' '9' ':' ';' '<' '=' '>' '?' */
+ '\070', '\071', '\072', '\073', '\074', '\075', '\076', '\077',
+ /* '@' 'A' 'B' 'C' 'D' 'E' 'F' 'G' */
+ '\100', '\141', '\142', '\143', '\144', '\145', '\146', '\147',
+ /* 'H' 'I' 'J' 'K' 'L' 'M' 'N' 'O' */
+ '\150', '\151', '\152', '\153', '\154', '\155', '\156', '\157',
+ /* 'P' 'Q' 'R' 'S' 'T' 'U' 'V' 'W' */
+ '\160', '\161', '\162', '\163', '\164', '\165', '\166', '\167',
+ /* 'X' 'Y' 'Z' '[' '\' ']' '^' '_' */
+ '\170', '\171', '\172', '\133', '\134', '\135', '\136', '\137',
+ /* '`' 'a' 'b' 'c' 'd' 'e' 'f' 'g' */
+ '\140', '\141', '\142', '\143', '\144', '\145', '\146', '\147',
+ /* 'h' 'i' 'j' 'k' 'l' 'm' 'n' 'o' */
+ '\150', '\151', '\152', '\153', '\154', '\155', '\156', '\157',
+ /* 'p' 'q' 'r' 's' 't' 'u' 'v' 'w' */
+ '\160', '\161', '\162', '\163', '\164', '\165', '\166', '\167',
+ /* 'x' 'y' 'z' '{' '|' '}' '~' */
+ '\170', '\171', '\172', '\173', '\174', '\175', '\176', '\177',
+ '\200', '\201', '\202', '\203', '\204', '\205', '\206', '\207',
+ '\210', '\211', '\212', '\213', '\214', '\215', '\216', '\217',
+ '\220', '\221', '\222', '\223', '\224', '\225', '\226', '\227',
+ '\230', '\231', '\232', '\233', '\234', '\235', '\236', '\237',
+ '\240', '\241', '\242', '\243', '\244', '\245', '\246', '\247',
+ '\250', '\251', '\252', '\253', '\254', '\255', '\256', '\257',
+ '\260', '\261', '\262', '\263', '\264', '\265', '\266', '\267',
+ '\270', '\271', '\272', '\273', '\274', '\275', '\276', '\277',
+ '\300', '\301', '\302', '\303', '\304', '\305', '\306', '\307',
+ '\310', '\311', '\312', '\313', '\314', '\315', '\316', '\317',
+ '\320', '\321', '\322', '\323', '\324', '\325', '\326', '\327',
+ '\330', '\331', '\332', '\333', '\334', '\335', '\336', '\337',
+ '\340', '\341', '\342', '\343', '\344', '\345', '\346', '\347',
+ '\350', '\351', '\352', '\353', '\354', '\355', '\356', '\357',
+ '\360', '\361', '\362', '\363', '\364', '\365', '\366', '\367',
+ '\370', '\371', '\372', '\373', '\374', '\375', '\376', '\377',
+};
+#else
+/* You lose. You will need a translation table for your character set. */
#endif
- }
- }
- RESTORE_BINDING(loop_tag_stack, loop_tag);
- break;
-
- case Node_K_break:
- DEBUG(("BREAK",NULL));
- if (loop_tag_valid == 0) /* jfw */
- panic("unexpected break or continue");
- _longjmp (loop_tag, TAG_BREAK);
- break;
-
- case Node_K_continue:
- DEBUG(("CONTINUE",NULL));
- if (loop_tag_valid == 0) /* jfw */
- panic("unexpected break or continue");
- _longjmp (loop_tag, TAG_CONTINUE);
- break;
-
- case Node_K_print:
- DEBUG(("PRINT",tree));
- (void)hack_print_node (tree);
- break;
-
- case Node_K_printf:
- DEBUG(("PRINTF",tree));
- (void)do_printf(tree);
- break;
-
- case Node_K_next:
- DEBUG(("NEXT",NULL));
- _longjmp (rule_tag, TAG_CONTINUE);
- break;
-
- case Node_K_exit:
- /* The unix awk doc says to skip the rest of the input. Does that
- mean after performing all the rules on the current line?
- Unix awk quits immediately, so this does too. */
- /* The UN*X exit can also take an optional arg return code. We don't */
- /* Well, we parse it, but never *DO* it */
- DEBUG(("EXIT",NULL));
- _longjmp (rule_tag, TAG_BREAK);
- break;
-
- default:
- /* Appears to be an expression statement. Throw away the value. */
- DEBUG(("E",NULL));
- (void)tree_eval (tree);
- break;
- }
- return 1;
-}
-/* evaluate a subtree, allocating strings on a temporary stack. */
-/* This used to return a whole NODE, instead of a ptr to one, but that
- led to lots of obnoxious copying. I got rid of it (JF) */
-NODE *
-tree_eval (tree)
- NODE *tree;
+/*
+ * Tree is a bunch of rules to run. Returns zero if it hit an exit()
+ * statement
+ */
+interpret(tree)
+NODE *tree;
{
- register NODE *r, *t1, *t2; /* return value and temporary subtrees */
- register NODE **lhs;
- static AWKNUM x; /* Why are these static? */
- extern struct obstack temp_strings;
-
- if(tree == NULL) {
- DEBUG(("NULL",NULL));
- return Nnull_string;
- }
- switch (tree->type) {
- /* trivial data */
- case Node_string:
- case Node_number:
- DEBUG(("DATA",tree));
- return tree;
-
- /* Builtins */
- case Node_builtin:
- DEBUG(("builtin",tree));
- return ((*tree->proc)(tree->subnode));
-
- /* unary operations */
-
- case Node_var:
- case Node_subscript:
- case Node_field_spec:
- DEBUG(("var_type ref",tree));
- lhs=get_lhs(tree);
- return *lhs;
-
- case Node_preincrement:
- case Node_predecrement:
- DEBUG(("+-X",tree));
- lhs=get_lhs(tree->subnode);
- assign_number(lhs,force_number(*lhs) + (tree->type==Node_preincrement ? 1.0 : -1.0));
- return *lhs;
-
- case Node_postincrement:
- case Node_postdecrement:
- DEBUG(("X+-",tree));
- lhs=get_lhs(tree->subnode);
- x = force_number(*lhs);
- assign_number (lhs, x + (tree->type==Node_postincrement ? 1.0 : -1.0));
- return tmp_number(x);
-
- case Node_unary_minus:
- DEBUG(("UMINUS",tree));
- return tmp_number(-force_number(tree_eval(tree->subnode)));
-
- /* assignments */
- case Node_assign:
- DEBUG(("ASSIGN",tree));
- r = tree_eval (tree->rnode);
- lhs=get_lhs(tree->lnode);
- *lhs= dupnode(r);
- do_deref();
- /* FOO we have to regenerate $0 here! */
- if(tree->lnode->type==Node_field_spec)
- fix_fields();
- return r;
- /* other assignment types are easier because they are numeric */
- case Node_assign_times:
- r = tree_eval (tree->rnode);
- lhs=get_lhs(tree->lnode);
- assign_number(lhs, force_number(*lhs) * force_number(r));
- do_deref();
- return r;
-
- case Node_assign_quotient:
- r = tree_eval (tree->rnode);
- lhs=get_lhs(tree->lnode);
- assign_number(lhs, force_number(*lhs) / force_number(r));
- do_deref();
- return r;
-
- case Node_assign_mod:
- r = tree_eval (tree->rnode);
- lhs=get_lhs(tree->lnode);
- assign_number(lhs, (AWKNUM)(((int) force_number(*lhs)) % ((int) force_number(r))));
- do_deref();
- return r;
-
- case Node_assign_plus:
- r = tree_eval (tree->rnode);
- lhs=get_lhs(tree->lnode);
- assign_number(lhs, force_number(*lhs) + force_number(r));
- do_deref();
- return r;
-
- case Node_assign_minus:
- r = tree_eval (tree->rnode);
- lhs=get_lhs(tree->lnode);
- assign_number(lhs, force_number(*lhs) - force_number(r));
- do_deref();
- return r;
- }
- /* Note that if TREE is invalid, gAWK will probably bomb in one of these
- tree_evals here. */
- /* evaluate subtrees in order to do binary operation, then keep going */
- t1 = tree_eval (tree->lnode);
- t2 = tree_eval (tree->rnode);
-
- switch (tree->type) {
-
- case Node_concat:
- t1=force_string(t1);
- t2=force_string(t2);
-
- r=(NODE *)obstack_alloc(&temp_strings,sizeof(NODE));
- r->type=Node_temp_string;
- r->stlen=t1->stlen+t2->stlen;
- r->stref=1;
- r->stptr=(char *)obstack_alloc(&temp_strings,r->stlen+1);
- bcopy(t1->stptr,r->stptr,t1->stlen);
- bcopy(t2->stptr,r->stptr+t1->stlen,t2->stlen);
- r->stptr[r->stlen]='\0';
- return r;
-
- case Node_times:
- return tmp_number(force_number(t1) * force_number(t2));
-
- case Node_quotient:
- x=force_number(t2);
- if(x==(AWKNUM)0) return tmp_number((AWKNUM)0);
- else return tmp_number(force_number(t1) / x);
-
- case Node_mod:
- x=force_number(t2);
- if(x==(AWKNUM)0) return tmp_number((AWKNUM)0);
- return tmp_number((AWKNUM) /* uggh... */
- (((int) force_number(t1)) % ((int) x)));
-
- case Node_plus:
- return tmp_number(force_number(t1) + force_number(t2));
-
- case Node_minus:
- return tmp_number(force_number(t1) - force_number(t2));
-
-#ifndef FAST
- default:
- fprintf (stderr, "internal error: illegal numeric operation\n");
- abort ();
-#endif
- }
- return 0;
-}
+ register NODE *t; /* temporary */
-/* We can't dereference a variable until after we've given it its new value.
- This variable points to the value we have to free up */
-NODE *deref;
+ auto jmp_buf loop_tag_stack; /* shallow binding stack for loop_tag */
+ static jmp_buf loop_tag;/* always the current binding */
+ static int loop_tag_valid = 0; /* nonzero when loop_tag valid */
-/* This returns a POINTER to a node pointer.
- *get_lhs(ptr) is the current value of the var, or where to store the
- var's new value */
+ static jmp_buf rule_tag;/* tag the rule currently being run, for NEXT
+ * and EXIT statements. It is static because
+ * there are no nested rules */
-NODE **
-get_lhs(ptr)
-NODE *ptr;
-{
- register NODE *subexp;
- register NODE **aptr;
- register int num;
- extern NODE **fields_arr;
- extern f_arr_siz;
- NODE **assoc_lookup();
- extern char f_empty[]; /* jfw */
-
-#ifndef FAST
- if(ptr == NULL)
- abort();
-#endif
- deref = NULL;
- switch(ptr->type) {
- case Node_var:
- deref=ptr->var_value;
- return &(ptr->var_value);
-
- case Node_field_spec:
- num=(int)force_number(tree_eval(ptr->lnode));
- if(num<0) num=0; /* JF what should I do? */
- if(num>f_arr_siz)
- set_field(num,f_empty,0); /* jfw: so blank_strings can be simpler */
- deref = NULL;
- return &fields_arr[num];
-
- case Node_subscript:
- subexp = tree_eval(ptr->rnode);
- aptr=assoc_lookup(ptr->lnode,subexp);
- deref= *aptr;
- return aptr;
- }
-#ifndef FAST
- abort();
- return 0;
-#endif
-}
+ register NODE **lhs; /* lhs == Left Hand Side for assigns, etc */
+ register struct search *l; /* For array_for */
-do_deref()
-{
- if(deref) {
- switch(deref->type) {
- case Node_string:
- if(deref!=Nnull_string)
- FREE_ONE_REFERENCE(deref);
- break;
- case Node_number:
- free((char *)deref);
- break;
-#ifndef FAST
- default:
- abort();
-#endif
- }
- deref = 0;
- }
-}
-/* This makes numeric operations slightly more efficient.
- Just change the value of a numeric node, if possible */
-assign_number (ptr, value)
-NODE **ptr;
-AWKNUM value;
-{
- switch ((*ptr)->type) {
- case Node_string:
- if(*ptr!=Nnull_string)
- FREE_ONE_REFERENCE (*ptr);
- case Node_temp_string: /* jfw: dont crash if we say $2 += 4 */
- *ptr=make_number(value);
- return;
- case Node_number:
- (*ptr)->numbr = value;
- deref=0;
- break;
-#ifndef FAST
- default:
- printf("assign_number nodetype %d\n", (*ptr)->type); /* jfw: add mesg. */
- abort ();
-#endif
- }
-}
+ extern NODE **fields_arr;
+ extern int exiting, exit_val;
+ NODE *do_printf();
+ extern NODE *lookup();
-
-/* Routines to deal with fields */
-#define ORIG_F 30
+ /*
+ * clean up temporary strings created by evaluating expressions in
+ * previous recursive calls
+ */
-NODE **fields_arr;
-NODE *fields_nodes;
-int f_arr_siz;
-char f_empty [] = "";
+ if (tree == NULL)
+ return 1;
+ sourceline = tree->source_line;
+ source = tree->source_file;
+ switch (tree->type) {
+ case Node_rule_list:
+ for (t = tree; t != NULL; t = t->rnode) {
+ tree = t->lnode;
+ switch (_setjmp(rule_tag)) {
+ case 0: /* normal non-jump */
+ if (eval_condition(tree->lnode)) { /* pattern */
+ DBG_P(("Found a rule", tree->rnode));
+ if (tree->rnode == NULL) {
+ /*
+ * special case: pattern with
+ * no action is equivalent to
+ * an action of {print}
+ */
+ NODE printnode;
+
+ printnode.type = Node_K_print;
+ printnode.lnode = NULL;
+ printnode.rnode = NULL;
+ do_print(&printnode);
+ } else if (tree->rnode->type == Node_illegal) {
+ /*
+ * An empty statement
+ * (``{ }'') is different
+ * from a missing statement.
+ * A missing statement is
+ * equal to ``{ print }'' as
+ * above, but an empty
+ * statement is as in C, do
+ * nothing.
+ */
+ } else
+ (void) interpret(t->lnode->rnode);
+ }
+ break;
+ case TAG_CONTINUE: /* NEXT statement */
+ return 1;
+ case TAG_BREAK:
+ return 0;
+ }
+ }
+ break;
-init_fields()
-{
- register NODE **tmp;
- register NODE *xtmp;
-
- f_arr_siz=ORIG_F;
- fields_arr=(NODE **)malloc(ORIG_F * sizeof(NODE *));
- fields_nodes=(NODE *)malloc(ORIG_F * sizeof(NODE));
- tmp= &fields_arr[f_arr_siz];
- xtmp= &fields_nodes[f_arr_siz];
- while(--tmp>= &fields_arr[0]) {
- --xtmp;
- *tmp=xtmp;
- xtmp->type=Node_temp_string;
- xtmp->stlen=0;
- xtmp->stref=1;
- xtmp->stptr=f_empty;
+ case Node_statement_list:
+ /*
+ * because BEGIN and END do not have Node_rule_list nature,
+ * yet can have exits and nexts, we special-case a setjmp of
+ * rule_tag here.
+ */
+ if (tree == begin_block || tree == end_block) {
+ switch (_setjmp(rule_tag)) {
+ case TAG_CONTINUE: /* next */
+ fatal("unexpected \"next\" in %s block",
+ tree == begin_block ? "BEGIN" : "END");
+ return 1;
+ case TAG_BREAK:
+ return 0;
+ }
+ }
+ for (t = tree; t != NULL; t = t->rnode) {
+ DBG_P(("Statements", t->lnode));
+ (void) interpret(t->lnode);
+ }
+ break;
+
+ case Node_K_if:
+ DBG_P(("IF", tree->lnode));
+ if (eval_condition(tree->lnode)) {
+ DBG_P(("True", tree->rnode->lnode));
+ (void) interpret(tree->rnode->lnode);
+ } else {
+ DBG_P(("False", tree->rnode->rnode));
+ (void) interpret(tree->rnode->rnode);
+ }
+ break;
+
+ case Node_K_while:
+ PUSH_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
+
+ DBG_P(("WHILE", tree->lnode));
+ while (eval_condition(tree->lnode)) {
+ switch (_setjmp(loop_tag)) {
+ case 0: /* normal non-jump */
+ DBG_P(("DO", tree->rnode));
+ (void) interpret(tree->rnode);
+ break;
+ case TAG_CONTINUE: /* continue statement */
+ break;
+ case TAG_BREAK: /* break statement */
+ RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
+ return 1;
+ default:
+ cant_happen();
+ }
+ }
+ RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
+ break;
+
+ case Node_K_do:
+ PUSH_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
+
+ do {
+ switch (_setjmp(loop_tag)) {
+ case 0: /* normal non-jump */
+ DBG_P(("DO", tree->rnode));
+ (void) interpret(tree->rnode);
+ break;
+ case TAG_CONTINUE: /* continue statement */
+ break;
+ case TAG_BREAK: /* break statement */
+ RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
+ return 1;
+ default:
+ cant_happen();
+ }
+ DBG_P(("WHILE", tree->lnode));
+ } while (eval_condition(tree->lnode));
+ RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
+ break;
+
+ case Node_K_for:
+ PUSH_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
+
+ DBG_P(("FOR", tree->forloop->init));
+ (void) interpret(tree->forloop->init);
+
+ DBG_P(("FOR.WHILE", tree->forloop->cond));
+ while (eval_condition(tree->forloop->cond)) {
+ switch (_setjmp(loop_tag)) {
+ case 0: /* normal non-jump */
+ DBG_P(("FOR.DO", tree->lnode));
+ (void) interpret(tree->lnode);
+ /* fall through */
+ case TAG_CONTINUE: /* continue statement */
+ DBG_P(("FOR.INCR", tree->forloop->incr));
+ (void) interpret(tree->forloop->incr);
+ break;
+ case TAG_BREAK: /* break statement */
+ RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
+ return 1;
+ default:
+ cant_happen();
+ }
+ }
+ RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
+ break;
+
+ case Node_K_arrayfor:
+#define hakvar forloop->init
+#define arrvar forloop->incr
+ PUSH_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
+ DBG_P(("AFOR.VAR", tree->hakvar));
+ lhs = get_lhs(tree->hakvar);
+ t = tree->arrvar;
+ if (tree->arrvar->type == Node_param_list)
+ t = stack_ptr[tree->arrvar->param_cnt];
+ for (l = assoc_scan(t); l; l = assoc_next(l)) {
+ deref = *lhs;
+ do_deref();
+ *lhs = dupnode(l->retval);
+ if (field_num == 0)
+ set_record(fields_arr[0]->stptr,
+ fields_arr[0]->stlen);
+ else if (field_num > 0) {
+ node0_valid = 0;
+ if (NF_node->var_value->numbr == -1 &&
+ field_num > NF_node->var_value->numbr)
+ assign_number(&(NF_node->var_value),
+ (AWKNUM) field_num);
+ }
+ DBG_P(("AFOR.NEXTIS", *lhs));
+ switch (_setjmp(loop_tag)) {
+ case 0:
+ DBG_P(("AFOR.DO", tree->lnode));
+ (void) interpret(tree->lnode);
+ case TAG_CONTINUE:
+ break;
+
+ case TAG_BREAK:
+ RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
+ field_num = -1;
+ return 1;
+ default:
+ cant_happen();
+ }
+ }
+ field_num = -1;
+ RESTORE_BINDING(loop_tag_stack, loop_tag, loop_tag_valid);
+ break;
+
+ case Node_K_break:
+ DBG_P(("BREAK", NULL));
+ if (loop_tag_valid == 0)
+ fatal("unexpected break");
+ _longjmp(loop_tag, TAG_BREAK);
+ break;
+
+ case Node_K_continue:
+ DBG_P(("CONTINUE", NULL));
+ if (loop_tag_valid == 0)
+ fatal("unexpected continue");
+ _longjmp(loop_tag, TAG_CONTINUE);
+ break;
+
+ case Node_K_print:
+ DBG_P(("PRINT", tree));
+ (void) do_print(tree);
+ break;
+
+ case Node_K_printf:
+ DBG_P(("PRINTF", tree));
+ (void) do_printf(tree);
+ break;
+
+ case Node_K_next:
+ DBG_P(("NEXT", NULL));
+ _longjmp(rule_tag, TAG_CONTINUE);
+ break;
+
+ case Node_K_exit:
+ /*
+ * In A,K,&W, p. 49, it says that an exit statement "...
+ * causes the program to behave as if the end of input had
+ * occurred; no more input is read, and the END actions, if
+ * any are executed." This implies that the rest of the rules
+ * are not done. So we immediately break out of the main loop.
+ */
+ DBG_P(("EXIT", NULL));
+ exiting = 1;
+ if (tree)
+ exit_val = (int) force_number(tree_eval(tree->lnode));
+ free_result();
+ _longjmp(rule_tag, TAG_BREAK);
+ break;
+
+ case Node_K_function:
+ break;
+
+ case Node_K_return:
+ DBG_P(("RETURN", NULL));
+ ret_node = dupnode(tree_eval(tree->lnode));
+ ret_node->flags |= TEMP;
+ _longjmp(func_tag, TAG_RETURN);
+ break;
+
+ default:
+ /*
+ * Appears to be an expression statement. Throw away the
+ * value.
+ */
+ DBG_P(("E", NULL));
+ (void) tree_eval(tree);
+ free_result();
+ break;
}
+ return 1;
}
-blank_fields()
+/* evaluate a subtree, allocating strings on a temporary stack. */
+
+NODE *
+r_tree_eval(tree)
+NODE *tree;
{
- register NODE **tmp;
- extern char *parse_end;
-
- tmp= &fields_arr[f_arr_siz];
- while(--tmp>= &fields_arr[0]) {
- switch(tmp[0]->type) {
- case Node_number:
- free((char *)*tmp);
- *tmp= &fields_nodes[tmp-fields_arr];
- break;
- case Node_string:
- if(*tmp!=Nnull_string)
- FREE_ONE_REFERENCE(*tmp);
- *tmp= &fields_nodes[tmp-fields_arr];
- break;
- case Node_temp_string:
- break;
-#ifndef FAST
- default:
- abort();
-#endif
+ NODE *op_assign();
+ register NODE *r, *t1, *t2; /* return value & temporary subtrees */
+ int i;
+ register NODE **lhs;
+ int di;
+ AWKNUM x;
+ int samecase = 0;
+ extern int ignorecase;
+ struct re_pattern_buffer *rp;
+ extern NODE **fields_arr;
+ extern NODE *do_getline();
+ extern NODE *do_match();
+ extern NODE *do_sub();
+ extern double pow();
+
+ if (tree->type != Node_var)
+ source = tree->source_file;
+ sourceline = tree->source_line;
+ switch (tree->type) {
+ case Node_and:
+ DBG_P(("AND", tree));
+ return tmp_number((AWKNUM) (eval_condition(tree->lnode)
+ && eval_condition(tree->rnode)));
+
+ case Node_or:
+ DBG_P(("OR", tree));
+ return tmp_number((AWKNUM) (eval_condition(tree->lnode)
+ || eval_condition(tree->rnode)));
+
+ case Node_not:
+ DBG_P(("NOT", tree));
+ return tmp_number((AWKNUM) ! eval_condition(tree->lnode));
+
+ /* Builtins */
+ case Node_builtin:
+ DBG_P(("builtin", tree));
+ return ((*tree->proc) (tree->subnode));
+
+ case Node_K_getline:
+ DBG_P(("GETLINE", tree));
+ return (do_getline(tree));
+
+ case Node_in_array:
+ DBG_P(("IN_ARRAY", tree));
+ return tmp_number((AWKNUM) in_array(tree->lnode, tree->rnode));
+
+ case Node_K_match:
+ DBG_P(("MATCH", tree));
+ return do_match(tree);
+
+ case Node_sub:
+ case Node_gsub:
+ DBG_P(("SUB", tree));
+ return do_sub(tree);
+
+ case Node_func_call:
+ DBG_P(("func_call", tree));
+ return func_call(tree->rnode, tree->lnode);
+
+ case Node_K_delete:
+ DBG_P(("DELETE", tree));
+ do_delete(tree->lnode, tree->rnode);
+ return Nnull_string;
+
+ /* unary operations */
+
+ case Node_var:
+ case Node_var_array:
+ case Node_param_list:
+ case Node_subscript:
+ case Node_field_spec:
+ DBG_P(("var_type ref", tree));
+ lhs = get_lhs(tree);
+ field_num = -1;
+ deref = 0;
+ return *lhs;
+
+ case Node_unary_minus:
+ DBG_P(("UMINUS", tree));
+ x = -force_number(tree_eval(tree->subnode));
+ free_result();
+ return tmp_number(x);
+
+ case Node_cond_exp:
+ DBG_P(("?:", tree));
+ if (eval_condition(tree->lnode)) {
+ DBG_P(("True", tree->rnode->lnode));
+ return tree_eval(tree->rnode->lnode);
+ } else {
+ DBG_P(("False", tree->rnode->rnode));
+ return tree_eval(tree->rnode->rnode);
}
- if ((*tmp)->stptr != f_empty) { /* jfw */
- /*Then it was assigned a string with set_field */
- /*out of a private buffer to inrec, so don't free it*/
- (*tmp)->stptr = f_empty;
- (*tmp)->stlen = 0;
- (*tmp)->stref = 1;
+ break;
+
+ case Node_case_match:
+ case Node_case_nomatch:
+ samecase = 1;
+ /* fall through */
+ case Node_match:
+ case Node_nomatch:
+ DBG_P(("ASSIGN_[no]match", tree));
+ t1 = force_string(tree_eval(tree->lnode));
+ if (tree->rnode->type == Node_regex)
+ rp = tree->rnode->rereg;
+ else {
+ rp = make_regexp(force_string(tree_eval(tree->rnode)));
+ if (rp == NULL)
+ cant_happen();
}
- /* *tmp=Nnull_string; */
+ if (! strict && (ignorecase || samecase))
+ rp->translate = casetable;
+ i = re_search(rp, t1->stptr, t1->stlen, 0, t1->stlen,
+ (struct re_registers *) NULL);
+ i = (i == -1) ^ (tree->type == Node_match ||
+ tree->type == Node_case_match);
+ free_temp(t1);
+ return tmp_number((AWKNUM) i);
+
+ case Node_func:
+ fatal("function `%s' called with space between name and (,\n%s",
+ tree->lnode->param,
+ "or used in other expression context");
+
+ /* assignments */
+ case Node_assign:
+ DBG_P(("ASSIGN", tree));
+ r = tree_eval(tree->rnode);
+ lhs = get_lhs(tree->lnode);
+ *lhs = dupnode(r);
+ if (field_num == 0)
+ set_record(fields_arr[0]->stptr, fields_arr[0]->stlen);
+ else if (field_num > 0) {
+ node0_valid = 0;
+ if (NF_node->var_value->numbr == -1 &&
+ field_num > NF_node->var_value->numbr)
+ assign_number(&(NF_node->var_value),
+ (AWKNUM) field_num);
+ }
+ field_num = -1;
+ do_deref();
+ return *lhs;
+
+ /* other assignment types are easier because they are numeric */
+ case Node_preincrement:
+ case Node_predecrement:
+ case Node_postincrement:
+ case Node_postdecrement:
+ case Node_assign_exp:
+ case Node_assign_times:
+ case Node_assign_quotient:
+ case Node_assign_mod:
+ case Node_assign_plus:
+ case Node_assign_minus:
+ return op_assign(tree);
}
- /* Free the strings */
- obstack_free(&other_stack,parse_end);
-}
-/* Danger! Must only be called for fields we know have just been blanked,
- or fields we know don't exist yet. */
-set_field(n,str,len)
-char *str;
-{
- NODE *field_string();
-
- if(n>f_arr_siz) {
- int t;
-
- fields_arr=(NODE **)realloc((char *)fields_arr,(n+1)*sizeof(NODE *));
- fields_nodes=(NODE *)realloc((char *)fields_nodes,(n+1)*sizeof(NODE));
- for(t=f_arr_siz;t<=n;t++) {
- fields_arr[t]= &fields_nodes[t];
- fields_nodes[t].type=Node_temp_string;
- fields_nodes[t].stlen=0;
- fields_nodes[t].stref=1;
- fields_nodes[t].stptr=f_empty;
+ /*
+ * Note that if TREE is invalid, gawk will probably bomb in one of
+ * these tree_evals here.
+ */
+ /* evaluate subtrees in order to do binary operation, then keep going */
+ t1 = tree_eval(tree->lnode);
+ t2 = tree_eval(tree->rnode);
+
+ switch (tree->type) {
+ case Node_concat:
+ DBG_P(("CONCAT", tree));
+ t1 = force_string(t1);
+ t2 = force_string(t2);
+
+ emalloc(r, NODE *, sizeof(NODE), "tree_eval");
+ r->type = Node_val;
+ r->flags = (STR|TEMP);
+ r->stlen = t1->stlen + t2->stlen;
+ r->stref = 1;
+ emalloc(r->stptr, char *, r->stlen + 1, "tree_eval");
+ bcopy(t1->stptr, r->stptr, t1->stlen);
+ bcopy(t2->stptr, r->stptr + t1->stlen, t2->stlen);
+ r->stptr[r->stlen] = '\0';
+ free_temp(t1);
+ free_temp(t2);
+ return r;
+
+ case Node_geq:
+ case Node_leq:
+ case Node_greater:
+ case Node_less:
+ case Node_notequal:
+ case Node_equal:
+ di = cmp_nodes(t1, t2);
+ free_temp(t1);
+ free_temp(t2);
+ switch (tree->type) {
+ case Node_equal:
+ DBG_P(("EQUAL", tree));
+ return tmp_number((AWKNUM) (di == 0));
+ case Node_notequal:
+ DBG_P(("NOT_EQUAL", tree));
+ return tmp_number((AWKNUM) (di != 0));
+ case Node_less:
+ DBG_P(("LESS_THAN", tree));
+ return tmp_number((AWKNUM) (di < 0));
+ case Node_greater:
+ DBG_P(("GREATER_THAN", tree));
+ return tmp_number((AWKNUM) (di > 0));
+ case Node_leq:
+ DBG_P(("LESS_THAN_EQUAL", tree));
+ return tmp_number((AWKNUM) (di <= 0));
+ case Node_geq:
+ DBG_P(("GREATER_THAN_EQUAL", tree));
+ return tmp_number((AWKNUM) (di >= 0));
}
- f_arr_siz=n+1;
+ break;
}
- fields_nodes[n].stlen=len;
- if(n==0) {
- fields_nodes[n].stptr=(char*)obstack_alloc(&other_stack,len+1);
- bcopy(str,fields_nodes[n].stptr,len);
- fields_nodes[n].stptr[len]='\0';
- } else {
- fields_nodes[n].stptr=str;
- str[len]='\0';
+
+ (void) force_number(t1);
+ (void) force_number(t2);
+
+ switch (tree->type) {
+ case Node_exp:
+ DBG_P(("EXPONENT", tree));
+ x = pow((double) t1->numbr, (double) t2->numbr);
+ free_temp(t1);
+ free_temp(t2);
+ return tmp_number(x);
+
+ case Node_times:
+ DBG_P(("MULT", tree));
+ x = t1->numbr * t2->numbr;
+ free_temp(t1);
+ free_temp(t2);
+ return tmp_number(x);
+
+ case Node_quotient:
+ DBG_P(("DIVIDE", tree));
+ x = t2->numbr;
+ free_temp(t2);
+ if (x == (AWKNUM) 0) {
+ free_temp(t1);
+ return tmp_number((AWKNUM) 0);
+ } else {
+ x = t1->numbr / x;
+ free_temp(t1);
+ return tmp_number(x);
+ }
+
+ case Node_mod:
+ DBG_P(("MODULUS", tree));
+ x = t2->numbr;
+ free_temp(t2);
+ if (x == (AWKNUM) 0) {
+ free_temp(t1);
+ return tmp_number((AWKNUM) 0);
+ }
+ x = ((int) t1->numbr) % ((int) x);
+ free_temp(t1);
+ return tmp_number(x);
+
+ case Node_plus:
+ DBG_P(("PLUS", tree));
+ x = t1->numbr + t2->numbr;
+ free_temp(t1);
+ free_temp(t2);
+ return tmp_number(x);
+
+ case Node_minus:
+ DBG_P(("MINUS", tree));
+ x = t1->numbr - t2->numbr;
+ free_temp(t1);
+ free_temp(t2);
+ return tmp_number(x);
+
+ default:
+ fatal("illegal type (%d) in tree_eval", tree->type);
}
+ return 0;
}
-#ifdef DONTDEF
-/* Nodes created with this will go away when the next input line is read */
-NODE *
-field_string(s,len)
-char *s;
+/*
+ * This makes numeric operations slightly more efficient: just change the
+ * value of a numeric node in place, if possible.
+ */
+assign_number(ptr, value)
+NODE **ptr;
+AWKNUM value;
{
- register NODE *r;
-
- r=(NODE *)obstack_alloc(&other_stack,sizeof(NODE));
- r->type=Node_temp_string;
- r->stref=1;
- r->stlen=len;
- r->stptr=(char*)obstack_alloc(&other_stack,len+1);
- bcopy(s,r->stptr,len);
- /* r->stptr=s;
- r->stptr[len]='\0'; */
-
- return r;
-}
-#endif
+ extern NODE *deref;
-/* Someone assigned a value to $(something). Fix up $0 to be right */
-fix_fields()
-{
- register int tlen;
- register NODE *tmp;
- NODE *ofs;
- char *ops;
- register char *cops;
- register NODE **ptr,**maxp;
- extern NODE *OFS_node;
-
- maxp=0;
- tlen=0;
- ofs=force_string(*get_lhs(OFS_node));
- ptr= &fields_arr[f_arr_siz];
- while(--ptr> &fields_arr[0]) {
- tmp=force_string(*ptr);
- tlen+=tmp->stlen;
- if(tmp->stlen && !maxp)
- maxp=ptr;
- }
- if(!maxp) {
- if (fields_arr[0] != fields_nodes)
- FREE_ONE_REFERENCE(fields_arr[0]);
- fields_arr[0]=Nnull_string;
+#ifdef DEBUG
+ if ((*ptr)->type != Node_val)
+ cant_happen();
+#endif
+ if (*ptr == Nnull_string) {
+ *ptr = make_number(value);
+ deref = 0;
return;
}
-
- tlen+=((maxp-fields_arr)-1)*ofs->stlen;
- ops=(char *)malloc(tlen+1);
- cops=ops;
- for(ptr= &fields_arr[1];ptr<=maxp;ptr++) {
- tmp=force_string(*ptr);
- bcopy(tmp->stptr,cops,tmp->stlen);
- cops+=tmp->stlen;
- if(ptr!=maxp) {
- bcopy(ofs->stptr,cops,ofs->stlen);
- cops+=ofs->stlen;
- }
+ if ((*ptr)->stref > 1) {
+ *ptr = make_number(value);
+ return;
}
- tmp=newnode(Node_string);
- tmp->stptr=ops;
- tmp->stlen=tlen;
- tmp->stref=1;
- tmp->stptr[tlen]='\0';
- /* don't free unless it's new */
- if (fields_arr[0] != fields_nodes)
- FREE_ONE_REFERENCE(fields_arr[0]);
- fields_arr[0]=tmp;
+ (*ptr)->numbr = value;
+ (*ptr)->flags |= NUM;
+ (*ptr)->flags &= ~STR;
+ (*ptr)->stref = 0;
+ deref = 0;
}
-
+
/* Is TREE true or false? Returns 0==false, non-zero==true */
int
-eval_condition (tree)
+eval_condition(tree)
NODE *tree;
{
- register int di;
- register NODE *t1,*t2;
-
- if(tree==NULL) /* Null trees are the easiest kinds */
- return 1;
- switch (tree->type) {
- /* Maybe it's easy; check and see. */
- /* BEGIN and END are always false */
- case Node_K_BEGIN:
- return 0;
- break;
-
- case Node_K_END:
- return 0;
- break;
-
- case Node_and:
- return eval_condition (tree->lnode)
- && eval_condition (tree->rnode);
-
- case Node_or:
- return eval_condition (tree->lnode)
- || eval_condition (tree->rnode);
-
- case Node_not:
- return !eval_condition (tree->lnode);
-
- /* Node_line_range is kind of like Node_match, EXCEPT:
- * the lnode field (more properly, the condpair field) is a node of
- * a Node_cond_pair; whether we evaluate the lnode of that node or the
- * rnode depends on the triggered word. More precisely: if we are not
- * yet triggered, we tree_eval the lnode; if that returns true, we set
- * the triggered word. If we are triggered (not ELSE IF, note), we
- * tree_eval the rnode, clear triggered if it succeeds, and perform our
- * action (regardless of success or failure). We want to be able to
- * begin and end on a single input record, so this isn't an ELSE IF, as
- * noted above.
- * This feature was implemented by John Woods, jfw@eddie.mit.edu, during
- * a rainy weekend.
- */
- case Node_line_range:
- if (!tree->triggered)
- if (!eval_condition(tree->condpair->lnode))
+ register NODE *t1;
+ int ret;
+ extern double atof();
+
+ if (tree == NULL) /* Null trees are the easiest kinds */
+ return 1;
+ switch (tree->type) {
+ /* Maybe it's easy; check and see. */
+ /* BEGIN and END are always false */
+ case Node_K_BEGIN:
+ case Node_K_END:
return 0;
- else
- tree->triggered = 1;
- /* Else we are triggered */
- if (eval_condition(tree->condpair->rnode))
- tree->triggered = 0;
- return 1;
- }
-
- /* Could just be J.random expression.
- in which case, null and 0 are false,
- anything else is true */
-
- switch(tree->type) {
- case Node_match:
- case Node_nomatch:
- case Node_equal:
- case Node_notequal:
- case Node_less:
- case Node_greater:
- case Node_leq:
- case Node_geq:
- break;
-
- default: /* This is so 'if(iggy)', etc, will work */
- /* Non-zero and non-empty are true */
- t1=tree_eval(tree);
- switch(t1->type) {
- case Node_number:
- return t1->numbr!=0.0;
- case Node_string:
- case Node_temp_string:
- return t1->stlen!=0;
-#ifndef FAST
- default:
- abort();
-#endif
- }
- }
- /* couldn't fob it off recursively, eval left subtree and
- see if it's a pattern match operation */
-
- t1 = tree_eval (tree->lnode);
-
- if (tree->type == Node_match || tree->type == Node_nomatch) {
- t1=force_string(t1);
- return (re_search (tree->rereg, t1->stptr,
- t1->stlen, 0, t1->stlen,
- NULL) == -1)
- ^ (tree->type == Node_match);
- }
-
- /* still no luck--- eval the right subtree and try binary ops */
-
- t2 = tree_eval (tree->rnode);
-
- di=cmp_nodes(t1,t2);
-
- switch (tree->type) {
- case Node_equal:
- return di == 0;
- case Node_notequal:
- return di != 0;
- case Node_less:
- return di < 0;
- case Node_greater:
- return di > 0;
- case Node_leq:
- return di <= 0;
- case Node_geq:
- return di >= 0;
-#ifndef FAST
- default:
- fprintf(stderr,"Panic: unknown conditonal\n");
- abort ();
+ break;
+
+ /*
+ * Node_line_range is kind of like Node_match, EXCEPT: the
+ * lnode field (more properly, the condpair field) is a node
+ * of a Node_cond_pair; whether we evaluate the lnode of that
+ * node or the rnode depends on the triggered word. More
+ * precisely: if we are not yet triggered, we tree_eval the
+ * lnode; if that returns true, we set the triggered word.
+ * If we are triggered (not ELSE IF, note), we tree_eval the
+ * rnode, clear triggered if it succeeds, and perform our
+ * action (regardless of success or failure). We want to be
+ * able to begin and end on a single input record, so this
+ * isn't an ELSE IF, as noted above.
+ */
+ case Node_line_range:
+ if (!tree->triggered)
+ if (!eval_condition(tree->condpair->lnode))
+ return 0;
+ else
+ tree->triggered = 1;
+ /* Else we are triggered */
+ if (eval_condition(tree->condpair->rnode))
+ tree->triggered = 0;
+ return 1;
+ }
+
+ /*
+ * Could just be a J. Random expression, in which case null and 0 are
+ * false and anything else is true.
+ */
+
+ t1 = tree_eval(tree);
+#ifdef DEBUG
+ if (t1->type != Node_val)
+ cant_happen();
#endif
- }
- return 0;
+ if (t1->flags & STR)
+ ret = t1->stlen != 0;
+ else
+ ret = t1->numbr != 0.0;
+ free_temp(t1);
+ return ret;
}
-/* FOO this doesn't properly compare "12.0" and 12.0 etc */
-/* or "1E1" and 10 etc */
-/* Perhaps someone should fix it. */
-/* Consider it fixed (jfw) */
-
-/* strtod() would have been better, except (1) real awk is needlessly
- * restrictive in what strings it will consider to be numbers, and
- * (2) I couldn't find the public domain version anywhere handy.
+/*
+ * strtod() would have been better, except (1) real awk is needlessly
+ * restrictive in what strings it will consider to be numbers, and (2) I
+ * couldn't find the public domain version anywhere handy.
*/
+static int
is_a_number(str) /* does the string str have pure-numeric syntax? */
char *str; /* don't convert it, assume that atof is better */
{
- if (*str == 0) return 1; /* null string has numeric value of0 */
- /* This is still a bug: in real awk, an explicit "" string
- * is not treated as a number. Perhaps it is only variables
- * that, when empty, are also 0s. This bug-lette here at
- * least lets uninitialized variables to compare equal to
- * zero like they should.
- */
- if (*str == '-') str++;
- if (*str == 0) return 0;
+ if (*str == 0)
+ return 0; /* null string is not equal to 0 */
+
+ if (*str == '-')
+ str++;
+ if (*str == 0)
+ return 0;
/* must be either . or digits (.4 is legal) */
- if (*str != '.' && !isdigit(*str)) return 0;
- while (isdigit(*str)) str++;
+ if (*str != '.' && !isdigit(*str))
+ return 0;
+ while (isdigit(*str))
+ str++;
if (*str == '.') {
str++;
- while (isdigit(*str)) str++;
+ while (isdigit(*str))
+ str++;
}
- /* curiously, real awk DOESN'T consider "1E1" to be equal to 10!
- * Or even equal to 1E1 for that matter! For a laugh, try:
- * awk 'BEGIN {if ("1E1" == 1E1) print "eq"; else print "neq";exit}'
+
+ /*
+ * curiously, real awk DOESN'T consider "1E1" to be equal to 10! Or
+ * even equal to 1E1 for that matter! For a laugh, try:
+ * awk 'BEGIN {if ("1E1" == 1E1) print "eq"; else print "neq"; exit}'
* Since this behavior is QUITE curious, I include the code for the
- * adventurous. One might also feel like skipping leading whitespace
+ * adventurous. One might also feel like skipping leading whitespace
* (awk doesn't) and allowing a leading + (awk doesn't).
+ */
#ifdef Allow_Exponents
if (*str == 'e' || *str == 'E') {
str++;
- if (*str == '+' || *str == '-') str++;
- if (!isdigit(*str)) return 0;
- while (isdigit(*str)) str++;
+ if (*str == '+' || *str == '-')
+ str++;
+ if (!isdigit(*str))
+ return 0;
+ while (isdigit(*str))
+ str++;
}
#endif
- /* if we have digested the whole string, we are successful */
+ /*
+ * if we have digested the whole string, we are
+ * successful
+ */
return (*str == 0);
}
-cmp_nodes(t1,t2)
-NODE *t1,*t2;
-{
- register int di;
- register AWKNUM d;
-
-
- if(t1==t2) {
- return 0;
- }
-#ifndef FAST
- if(!t1 || !t2) {
- abort();
- return t1 ? 1 : -1;
- }
-
-#endif
- if (t1->type == Node_number && t2->type == Node_number) {
- d = t1->numbr - t2->numbr;
- if (d < 0.0)
- return -1;
- if (d > 0.0)
- return 1;
- return 0;
- }
- t1=force_string(t1);
- t2=force_string(t2);
- /* "real" awk treats things as numbers if they both "look" like numbers. */
- if (*t1->stptr && *t2->stptr /* don't allow both to be empty strings(jfw)*/
- && is_a_number(t1->stptr) && is_a_number(t2->stptr)) {
- double atof();
- d = atof(t1->stptr) - atof(t2->stptr);
- if (d < 0.0) return -1;
- if (d > 0.0) return 1;
- return 0;
- }
- di = strncmp (t1->stptr, t2->stptr, min (t1->stlen, t2->stlen));
- if (di == 0)
- di = t1->stlen - t2->stlen;
- if(di>0) return 1;
- if(di<0) return -1;
- return 0;
-}
-
-
-#ifdef DONTDEF
-int primes[] = {31,61,127,257,509,1021,2053,4099,8191,16381};
-#endif
-
-/* routines for associative arrays. SYMBOL is the address of the node
- (or other pointer) being dereferenced. SUBS is a number or string
- used as the subscript. */
-
-/* #define ASSOC_HASHSIZE 1009 /* prime */
-#define ASSOC_HASHSIZE 29
-#define STIR_BITS(n) ((n) << 5 | (((n) >> 27) & 0x1f))
-#define HASHSTEP(old, c) ((old << 1) + c)
-#define MAKE_POS(v) (v & ~0x80000000) /* make number positive */
-
-/* static AHASH *assoc_table[ASSOC_HASHSIZE]; */
-
-
-/* Flush all the values in symbol[] before doing a split() */
-assoc_clear(symbol)
-NODE *symbol;
+int
+cmp_nodes(t1, t2)
+NODE *t1, *t2;
{
- int i;
- AHASH *bucket,*next;
+ AWKNUM d;
- if(symbol->var_array==0)
- return;
- for(i=0;i<ASSOC_HASHSIZE;i++) {
- for(bucket=symbol->var_array[i];bucket;bucket=next) {
- next=bucket->next;
- deref=bucket->name;
- do_deref();
- deref=bucket->value;
- do_deref();
- free((void *)bucket);
+ if (t1 == t2)
+ return 0;
+ if ((t1->flags & NUM)) {
+ if ((t2->flags & NUM))
+ d = t1->numbr - t2->numbr;
+ else if (is_a_number(t2->stptr))
+ d = t1->numbr - force_number(t2);
+ else {
+ t1 = force_string(t1);
+ goto strings;
}
- symbol->var_array[i]=0;
+ if (d == 0.0) /* from profiling, this is most common */
+ return 0;
+ if (d > 0.0)
+ return 1;
+ return -1;
}
-}
-
-/* Find SYMBOL[SUBS] in the assoc array. Install it with value "" if it
- isn't there. */
-/* Returns a pointer ala get_lhs to where its value is stored */
-NODE **
-assoc_lookup (symbol, subs)
-NODE *symbol,
- *subs;
-{
- int hash1 = 0, hashf(), i;
- AHASH *bucket;
- NODETYPE ty;
-
- if(subs->type==Node_number) {
- hash1=(int)subs->numbr;
- ty=Node_number;
- } else {
- ty=Node_string;
- subs=force_string(subs);
- for(i=0;i<subs->stlen;i++)
- hash1=HASHSTEP(hash1,subs->stptr[i]);
-
- /* hash1 ^= (int) STIR_BITS((int)symbol); */
- }
- hash1 = MAKE_POS(STIR_BITS((int)hash1)) % ASSOC_HASHSIZE;
-
- /* this table really should grow dynamically */
- if(symbol->var_array==0) {
- symbol->var_array=(AHASH **)malloc(sizeof(AHASH *)*ASSOC_HASHSIZE);
- for(i=0;i<ASSOC_HASHSIZE;i++) {
- symbol->var_array[i]=0;
- }
- } else {
- for (bucket = symbol->var_array[hash1]; bucket; bucket = bucket->next) {
- if (bucket->name->type!= ty || cmp_nodes(bucket->name,subs))
- continue;
- return &(bucket->value);
- }
- /* Didn't find it on first pass. Try again. */
- for (bucket = symbol->var_array[hash1]; bucket; bucket = bucket->next) {
- if (cmp_nodes(bucket->name,subs))
- continue;
- return &(bucket->value);
- }
- }
- bucket = (AHASH *) malloc(sizeof (AHASH));
- bucket->symbol = symbol;
- bucket->name = dupnode(subs);
- bucket->value = Nnull_string;
- bucket->next = symbol->var_array[hash1];
- symbol->var_array[hash1]=bucket;
- return &(bucket->value);
-}
-
-struct search *
-assoc_scan(symbol)
-NODE *symbol;
-{
- struct search *lookat;
-
- if(!symbol->var_array)
- return 0;
- lookat=(struct search *)obstack_alloc(&other_stack,sizeof(struct search));
- /* lookat->symbol=symbol; */
- lookat->numleft=ASSOC_HASHSIZE;
- lookat->arr_ptr=symbol->var_array;
- lookat->bucket=symbol->var_array[0];
- return assoc_next(lookat);
-}
-
-struct search *
-assoc_next(lookat)
-struct search *lookat;
-{
- for(;lookat->numleft;lookat->numleft--) {
- while(lookat->bucket!=0) {
- lookat->retval=lookat->bucket->name;
- lookat->bucket=lookat->bucket->next;
- return lookat;
+ if ((t2->flags & NUM)) {
+ if (is_a_number(t1->stptr))
+ d = force_number(t1) - t2->numbr;
+ else {
+ t2 = force_string(t2);
+ goto strings;
}
- lookat->bucket= *++(lookat->arr_ptr);
+ if (d == 0.0) /* from profiling, this is most common */
+ return 0;
+ if (d > 0.0)
+ return 1;
+ return -1;
}
- return 0;
+ if (is_a_number(t1->stptr) && is_a_number(t2->stptr)) {
+ /*
+ * following two statements are this way because force_number
+ * is a macro
+ */
+ d = force_number(t1);
+ d = d - force_number(t2);
+ if (d == 0.0) /* from profiling, this is most common */
+ return 0;
+ if (d > 0.0)
+ return 1;
+ return -1;
+ }
+
+strings:
+ return strcmp(t1->stptr, t2->stptr);
}
-
-#ifdef FAST
NODE *
-strforce(n)
-NODE *n;
+op_assign(tree)
+NODE *tree;
{
- extern NODE dumb[],*OFMT_node;
- NODE *do_sprintf();
-
- dumb[1].lnode=n;
- if(OFMT_node->var_value->type!=Node_string)
- panic("Insane value for OFMT detected.");
- return do_sprintf(&dumb[0]);
-}
+ AWKNUM rval, lval;
+ NODE **lhs;
+
+ lhs = get_lhs(tree->lnode);
+ lval = force_number(*lhs);
+
+ switch(tree->type) {
+ case Node_preincrement:
+ case Node_predecrement:
+ DBG_P(("+-X", tree));
+ assign_number(lhs,
+ lval + (tree->type == Node_preincrement ? 1.0 : -1.0));
+ if (field_num == 0)
+ set_record(fields_arr[0]->stptr, fields_arr[0]->stlen);
+ else if (field_num > 0) {
+ node0_valid = 0;
+ if (NF_node->var_value->numbr == -1 &&
+ field_num > NF_node->var_value->numbr)
+ assign_number(&(NF_node->var_value),
+ (AWKNUM) field_num);
+ }
+ field_num = -1;
+ do_deref();
+ return *lhs;
+ break;
+
+ case Node_postincrement:
+ case Node_postdecrement:
+ DBG_P(("X+-", tree));
+ assign_number(lhs,
+ lval + (tree->type == Node_postincrement ? 1.0 : -1.0));
+ if (field_num == 0)
+ set_record(fields_arr[0]->stptr, fields_arr[0]->stlen);
+ else if (field_num > 0) {
+ node0_valid = 0;
+ if (NF_node->var_value->numbr == -1 &&
+ field_num > NF_node->var_value->numbr)
+ assign_number(&(NF_node->var_value),
+ (AWKNUM) field_num);
+ }
+ field_num = -1;
+ do_deref();
+ return tmp_number(lval);
+ }
-#else
-AWKNUM
-force_number (n)
-NODE *n;
-{
- double atof(); /* Forgetting this is bad */
-
- if(n==NULL)
- abort();
- switch (n->type) {
- case Node_number:
- return n->numbr;
- case Node_string:
- case Node_temp_string:
- return atof(n->stptr);
- default:
- abort ();
- }
- return 0.0;
+ rval = force_number(tree_eval(tree->rnode));
+ free_result();
+ switch(tree->type) {
+ case Node_assign_exp:
+ DBG_P(("ASSIGN_exp", tree));
+ assign_number(lhs, (AWKNUM) pow((double) lval, (double) rval));
+ break;
+
+ case Node_assign_times:
+ DBG_P(("ASSIGN_times", tree));
+ assign_number(lhs, lval * rval);
+ break;
+
+ case Node_assign_quotient:
+ DBG_P(("ASSIGN_quotient", tree));
+ assign_number(lhs, lval / rval);
+ break;
+
+ case Node_assign_mod:
+ DBG_P(("ASSIGN_mod", tree));
+ assign_number(lhs, (AWKNUM) (((int) lval) % ((int) rval)));
+ break;
+
+ case Node_assign_plus:
+ DBG_P(("ASSIGN_plus", tree));
+ assign_number(lhs, lval + rval);
+ break;
+
+ case Node_assign_minus:
+ DBG_P(("ASSIGN_minus", tree));
+ assign_number(lhs, lval - rval);
+ break;
+ }
+ if (field_num == 0)
+ set_record(fields_arr[0]->stptr, fields_arr[0]->stlen);
+ else if (field_num > 0) {
+ node0_valid = 0;
+ if (NF_node->var_value->numbr == -1 &&
+ field_num > NF_node->var_value->numbr)
+ assign_number(&(NF_node->var_value),
+ (AWKNUM) field_num);
+ }
+ field_num = -1;
+ do_deref();
+ return *lhs;
}
-NODE *
-force_string(s)
-NODE *s;
-{
- if(s==NULL)
- abort();
- switch(s->type) {
- case Node_string:
- case Node_temp_string:
- return s;
- case Node_number:
- if((*get_lhs(OFMT_node))->type!=Node_string)
- panic("Insane value for OFMT!",0);
- dumb[1].lnode=s;
- return do_sprintf(&dumb[0]);
- default:
- abort();
- }
- return NULL;
-}
-#endif
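Note on the cmp_nodes() rewrite in the awk2.c diff above: when neither operand carries the NUM flag, the new code compares numerically only if both strings pass is_a_number(), and otherwise falls back to strcmp(). The sketch below illustrates that one path under simplifying assumptions. It works on plain C strings rather than NODE values, and the names looks_numeric() and awk_compare() are hypothetical, not part of gawk. It is an illustration of the rule, not the gawk implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for is_a_number(): optional '-', digits, optional
 * fraction.  No exponent, no leading '+' or whitespace, and the empty
 * string is not a number (mirroring the non-Allow_Exponents build).
 */
static int looks_numeric(const char *s)
{
	if (*s == '-')
		s++;
	/* must start with '.' or a digit (".4" is legal) */
	if (*s != '.' && !isdigit((unsigned char) *s))
		return 0;
	while (isdigit((unsigned char) *s))
		s++;
	if (*s == '.') {
		s++;
		while (isdigit((unsigned char) *s))
			s++;
	}
	/* numeric only if the whole string was consumed */
	return *s == '\0';
}

/*
 * Returns <0, 0 or >0, like strcmp().  Assumption: both operands are
 * strings; the NUM-flag branches of cmp_nodes() are not modelled here.
 */
int awk_compare(const char *a, const char *b)
{
	if (looks_numeric(a) && looks_numeric(b)) {
		double d = atof(a) - atof(b);

		return (d > 0.0) - (d < 0.0);
	}
	return strcmp(a, b);
}
```

With these rules "10" sorts after "9" because both look numeric, but "10" sorts before "9a" because the second operand forces a string comparison. "1E1" is not treated as equal to "10", matching the behaviour described in the is_a_number() comment.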
diff --git a/awk3.c b/awk3.c
index 1f58dfae..da4fce6a 100644
--- a/awk3.c
+++ b/awk3.c
@@ -1,113 +1,255 @@
-/* awk3.c -- Builtin functions and various utility procedures
- Copyright (C) 1986,1987 Free Software Foundation
- Written by Jay Fenlason, December 1986
-
+/*
+ * awk3 -- Builtin functions and various utility procedures
+ *
+ * Copyright (C) 1986,1987 Free Software Foundation
+ * Written by Jay Fenlason, December 1986
+ *
+ * $Log: awk3.c,v $
+ * Revision 1.34 88/12/13 22:28:10 david
+ * temporarily #ifdef out flush_io in redirect(); adjust atan2() for
+ * force_number as a macro
+ *
+ * Revision 1.32 88/12/01 15:03:21 david
+ * renamed hack_print_node to do_print (at last!)
+ * moved force_string() up out of print_simple for simplicity
+ *
+ * Revision 1.31 88/11/30 15:17:27 david
+ * free previous value in set_fs
+ *
+ * Revision 1.30 88/11/29 16:24:47 david
+ * fix bug in previous change
+ *
+ * Revision 1.29 88/11/29 15:14:52 david
+ * dynamically manage open files/pipes to allow an arbitrary number of open files
+ * (i.e. when out of file descriptors, close the least recently used file,
+ * saving the current offset; if it is reused, reopen and seek to saved offset)
+ *
+ * Revision 1.28 88/11/28 20:12:53 david
+ * correct previous error in cleanup of do_substr
+ *
+ * Revision 1.27 88/11/23 21:42:13 david
+ * Arnold: change ENV to ENVIRON and a further bug fix for -Ft
+ * ..
+ *
+ * Revision 1.26 88/11/22 13:50:33 david
+ * Arnold: added ENV array and bug fix to -Ft
+ *
+ * Revision 1.25 88/11/15 10:24:08 david
+ * Arnold: cleanup of comments, #include's and obsolete code
+ *
+ * Revision 1.24 88/11/14 21:57:03 david
+ * Arnold: init. FILENAME to "-" and cleanup in do_substr()
+ *
+ * Revision 1.23 88/11/01 12:17:45 david
+ * cleanup and code movement; changes to reflect change to parse_fields()
+ *
+ * Revision 1.22 88/10/19 21:58:43 david
+ * replace malloc and realloc with error checking versions
+ *
+ * Revision 1.21 88/10/17 20:55:31 david
+ * SYSV --> USG
+ *
+ * Revision 1.20 88/10/13 21:59:55 david
+ * purge FAST and cleanup error messages
+ *
+ * Revision 1.19 88/10/06 21:54:28 david
+ * cleaned up I/O handling
+ *
+ * Revision 1.18 88/10/06 15:49:01 david
+ * changes from Arnold: be careful about flushing I/O; warn about error on close;
+ * return seed from srand
+ *
+ * Revision 1.17 88/09/19 20:39:11 david
+ * minor cleanup
+ *
+ * Revision 1.16 88/08/09 14:55:16 david
+ * getline now gets next file properly
+ * stupid bug in do_split() fixed
+ * substr() now works if second arg. is negative (truncated to 0)
+ *
+ * Revision 1.15 88/06/13 18:07:12 david
+ * delete -R option
+ * cleanup of redirection code [from Arnold]
+ *
+ * Revision 1.14 88/06/07 23:41:00 david
+ * some paranoid typecasting plus one bug fix:
+ * in do_getline(), use stdin if input_file is NULL and there is no redirection
+ *
+ * Revision 1.13 88/06/06 21:40:49 david
+ * oops! got a little overenthusiastic on that last merge
+ *
+ * Revision 1.12 88/06/06 11:27:57 david
+ * get rid of some obsolete code
+ * merge parsing of fields for record input and split()
+ *
+ * Revision 1.11 88/06/05 21:00:35 david
+ * flush I/O buffers before calling system (fix from Arnold)
+ *
+ * Revision 1.10 88/06/05 20:59:26 david
+ * local vars. now come off a stack
+ *
+ * Revision 1.9 88/06/01 22:08:24 david
+ * in split(), ensure that if second arg. is a local var. that the value is
+ * looked up
+ *
+ * Revision 1.8 88/05/31 09:30:16 david
+ * Arnold's portability fixes to last change in random() stuff
+ *
+ * Revision 1.7 88/05/30 09:53:49 david
+ * clean up some fatal() calls
+ * de-lint the random number code
+ *
+ * Revision 1.6 88/05/27 11:06:21 david
+ * input_file wasn't getting properly reset after getline
+ *
+ * Revision 1.5 88/05/26 22:49:55 david
+ * fixed error message for redirection
+ *
+ * Revision 1.4 88/05/18 18:20:02 david
+ * fixed case where RS==""; record was including a trailing newline
+ *
+ * Revision 1.3 88/04/13 17:39:26 david
+ * fixed bug in handling of NR and FNR
+ *
+ * Revision 1.2 88/04/12 16:04:02 david
+ * fixed bug: NF at end of record generated one less field than it should have
+ *
+ * Revision 1.1 88/04/08 15:15:07 david
+ * Initial revision
+ * Revision 1.7 88/04/08 15:08:48 david bug fix for file
+ * descriptor handling
+ *
+ * Revision 1.6 88/04/08 14:48:36 david changes from Arnold Robbins
+ *
+ * Revision 1.5 88/03/28 14:13:54 david *** empty log message ***
+ *
+ * Revision 1.4 88/03/23 22:17:41 david mostly delinting -- a couple of bug
+ * fixes
+ *
+ * Revision 1.3 88/03/18 21:00:13 david Baseline -- hopefully all the
+ * functionality of the new awk added. Just debugging and tuning to do.
+ *
+ * Revision 1.2 87/11/19 14:42:31 david expanded functionality for getline
+ * broke out get_a_record() from inrec() so that the former can be used from
+ * do_getline add system() builtin and skeletons for many other new builtins
+ *
+ * Revision 1.1 87/10/27 15:23:33 david Initial revision
+ *
*/
/*
-GAWK is distributed in the hope that it will be useful, but WITHOUT ANY
-WARRANTY. No author or distributor accepts responsibility to anyone
-for the consequences of using it or for whether it serves any
-particular purpose or works at all, unless he says so in writing.
-Refer to the GAWK General Public License for full details.
-
-Everyone is granted permission to copy, modify and redistribute GAWK,
-but only under the conditions described in the GAWK General Public
-License. A copy of this license is supposed to have been given to you
-along with GAWK so you can know your rights and responsibilities. It
-should be in a file named COPYING. Among other things, the copyright
-notice and this notice must be preserved on all copies.
-
-In other words, go ahead and share GAWK, but don't try to stop
-anyone else from sharing it farther. Help stamp out software hoarding!
-*/
-#include <stdio.h>
+ * GAWK is distributed in the hope that it will be useful, but WITHOUT ANY
+ * WARRANTY. No author or distributor accepts responsibility to anyone for
+ * the consequences of using it or for whether it serves any particular
+ * purpose or works at all, unless he says so in writing. Refer to the GAWK
+ * General Public License for full details.
+ *
+ * Everyone is granted permission to copy, modify and redistribute GAWK, but
+ * only under the conditions described in the GAWK General Public License. A
+ * copy of this license is supposed to have been given to you along with GAWK
+ * so you can know your rights and responsibilities. It should be in a file
+ * named COPYING. Among other things, the copyright notice and this notice
+ * must be preserved on all copies.
+ *
+ * In other words, go ahead and share GAWK, but don't try to stop anyone else
+ * from sharing it farther. Help stamp out software hoarding!
+ */
#include "awk.h"
-#include <obstack.h>
-
-extern struct obstack temp_strings;
-
-/* This node is the cannonical null string, used everywhere */
-extern NODE *Nnull_string;
-
-/* These nodes store all the special variables gAWK uses */
-NODE *FS_node, *NF_node, *RS_node, *NR_node;
-NODE *FILENAME_node, *OFS_node, *ORS_node, *OFMT_node;
-
-/* This dumb kludge is used by force_string to turn a floating point
- number into a string */
-NODE dumb[2];
+/* These nodes store all the special variables AWK uses */
+NODE *FS_node, *NF_node, *RS_node, *NR_node;
+NODE *FILENAME_node, *OFS_node, *ORS_node, *OFMT_node;
+NODE *FNR_node, *RLENGTH_node, *RSTART_node, *SUBSEP_node;
+NODE *ENVIRON_node;
-NODE **get_lhs();
-FILE *deal_redirect();
+FILE *redirect();
+/*
+ * structure used to dynamically maintain a linked-list of open files/pipes
+ */
struct redirect {
- int flag; /* JF was NODETYPE */
- NODE *value;
- FILE *fp;
+ int flag;
+# define RED_FILE 1
+# define RED_PIPE 2
+# define RED_READ 4
+# define RED_WRITE 8
+# define RED_APPEND 16
+ char *value;
+ FILE *fp;
+ long offset; /* used for dynamic management of open files */
+ struct redirect *prev;
+ struct redirect *next;
};
-struct redirect reds[20]; /* An arbitrary limit, surely, but there's an
- arbitrary limit on open files, too. So it
- doesn't make much difference, does it? */
-
+struct redirect *red_head = NULL;
-long NR;
-int NF;
-
-/* The next #define tells how you find $0. Its a hack */
-extern NODE **fields_arr;
-#define WHOLELINE fields_arr[0]
-
-/* Set all the special variables to their initial values. Also sets up
- the dumb[] array for force_string */
+/*
+ * Set all the special variables to their initial values.
+ */
init_vars()
{
- NODE *spc_var();
- NODE *do_sprintf();
-
- FS_node=spc_var("FS",make_string(" ",1));
- NF_node=spc_var("NF",make_number(0.0));
- RS_node=spc_var("RS",make_string("\n",1));
- NR_node=spc_var("NR",make_number(0.0));
- FILENAME_node=spc_var("FILENAME",Nnull_string);
- OFS_node=spc_var("OFS",make_string(" ",1));
- ORS_node=spc_var("ORS",make_string("\n",1));
- OFMT_node=spc_var("OFMT",make_string("%.6g",4));
-
- /* This ugly hack is used by force_string
- to fake a call to sprintf */
- dumb[0].type=Node_expression_list;
- dumb[0].lnode=OFMT_node;
- dumb[0].rnode= &dumb[1];
- dumb[1].type=Node_expression_list;
- dumb[1].lnode=(NODE *)0; /* fill in the var here */
- dumb[1].rnode=(NODE *)0;
- reds[0].flag=0; /* Don't depend on uninit data being zero, although it should be */
-}
-
-/* OFMT is special because we don't dare use force_string on it for fear of
- infinite loops. Thus, if it isn't a string, we return the default "%.6g"
- This may or may not be the right thing to do, but its the easiest */
+ NODE *spc_var();
+ NODE *do_sprintf();
+ extern char **environ;
+ char *var, *val;
+ NODE **aptr;
+ int i;
+ extern NODE **assoc_lookup();
+ extern NODE *tmp_string();
+
+ FS_node = spc_var("FS", make_string(" ", 1));
+ NF_node = spc_var("NF", make_number(-1.0));
+ RS_node = spc_var("RS", make_string("\n", 1));
+ NR_node = spc_var("NR", make_number(0.0));
+ FNR_node = spc_var("FNR", make_number(0.0));
+ FILENAME_node = spc_var("FILENAME", make_string("-", 1));
+ OFS_node = spc_var("OFS", make_string(" ", 1));
+ ORS_node = spc_var("ORS", make_string("\n", 1));
+ OFMT_node = spc_var("OFMT", make_string("%.6g", 4));
+ RLENGTH_node = spc_var("RLENGTH", make_number(0.0));
+ RSTART_node = spc_var("RSTART", make_number(0.0));
+ SUBSEP_node = spc_var("SUBSEP", make_string("\034", 1));
+
+ ENVIRON_node = spc_var("ENVIRON", Nnull_string);
+ for (i = 0; environ[i]; i++) {
+ var = environ[i];
+ val = index(var, '=');
+ if (val)
+ *val++ = '\0';
+ else
+ val = "";
+ aptr = assoc_lookup(ENVIRON_node, tmp_string(var, strlen (var)));
+ *aptr = make_string(val, strlen (val));
+ }
+}
+
+/*
+ * OFMT is special because we don't dare use force_string on it for fear of
+ * infinite loops. Thus, if it isn't a string, we return the default "%.6g".
+ * This may or may not be the right thing to do, but it's the easiest.
+ */
/* This routine isn't used! It should be. */
-char *get_ofmt()
+#ifdef notdef
+char *
+get_ofmt()
{
register NODE *tmp;
- tmp= *get_lhs(OFMT_node);
- if(tmp->type!=Node_string || tmp->stlen==0)
+ tmp = *get_lhs(OFMT_node);
+ if ((tmp->type != Node_string && tmp->type != Node_str_num) || tmp->stlen == 0)
return "%.6g";
return tmp->stptr;
}
+#endif
-int
+char *
get_fs()
{
register NODE *tmp;
- tmp=force_string(FS_node->var_value);
- if(tmp->stlen==0) return 0;
- return *(tmp->stptr);
+ tmp = force_string(FS_node->var_value);
+ if (tmp->stlen == 0)
+ return 0;
+ return tmp->stptr;
}
set_fs(str)
@@ -115,38 +257,27 @@ char *str;
{
register NODE **tmp;
- tmp= get_lhs(FS_node);
+ tmp = get_lhs(FS_node);
do_deref();
- /* stupid special case so -F\t works as documented in awk */
- /* even though the shell hands us -Ft. Bleah! (jfw) */
- if (*str == 't') *str == '\t';
- *tmp=make_string(str,1);
-}
-
-set_rs(str)
-char *str;
-{
- register NODE **tmp;
-
- tmp= get_lhs(RS_node);
+ /* stupid special case so -F\t works as documented in awk */
+ /* even though the shell hands us -Ft. Bleah! */
+ if (str[0] == 't' && str[1] == '\0')
+ str[0] = '\t';
+ *tmp = make_string(str, 1);
do_deref();
- /* stupid special case to be consistent with -F (jfw) */
- if (*str == 't') *str == '\t';
- *tmp=make_string(str,1);
}
-
int
get_rs()
{
register NODE *tmp;
- tmp=force_string(RS_node->var_value);
- if(tmp->stlen==0) return 0;
+ tmp = force_string(RS_node->var_value);
+ if (tmp->stlen == 0)
+ return 0;
return *(tmp->stptr);
}
-
/* Builtin functions */
NODE *
do_exp(tree)
@@ -155,53 +286,41 @@ NODE *tree;
NODE *tmp;
double exp();
- get_one(tree,&tmp);
- return tmp_number(exp(force_number(tmp)));
-}
-
-/* JF: I don't know what this should return. */
-/* jfw: 1 if successful or by land, 0 if end of file or by sea */
-NODE *
-do_getline(tree)
-NODE *tree;
-{
- if(inrec() == 0)
- return tmp_number(1.0);
- else
- return tmp_number(0.0);
+ get_one(tree, &tmp);
+ return tmp_number((AWKNUM)exp((double)force_number(tmp)));
}
NODE *
do_index(tree)
NODE *tree;
{
- NODE *s1,*s2;
- register char *p1,*p2;
- register int l1,l2;
-
- get_two(tree,&s1,&s2);
- p1=s1->stptr;
- p2=s2->stptr;
- l1=s1->stlen;
- l2=s2->stlen;
- while(l1) {
- if(!strncmp(p1,p2,l2))
- return tmp_number((AWKNUM)(1+s1->stlen-l1));
+ NODE *s1, *s2;
+ register char *p1, *p2;
+ register int l1, l2;
+
+ get_two(tree, &s1, &s2);
+ p1 = s1->stptr;
+ p2 = s2->stptr;
+ l1 = s1->stlen;
+ l2 = s2->stlen;
+ while (l1) {
+ if (!strncmp(p1, p2, l2))
+ return tmp_number((AWKNUM) (1 + s1->stlen - l1));
l1--;
p1++;
}
- return tmp_number(0.0);
+ return tmp_number((AWKNUM) 0.0);
}
NODE *
do_int(tree)
NODE *tree;
{
- NODE *tmp;
- double floor();
+ NODE *tmp;
+ double floor();
- get_one(tree,&tmp);
- return tmp_number(floor(force_number(tmp)));
+ get_one(tree, &tmp);
+ return tmp_number((AWKNUM)floor((double)force_number(tmp)));
}
NODE *
@@ -210,67 +329,77 @@ NODE *tree;
{
NODE *tmp;
- get_one(tree,&tmp);
- return tmp_number((AWKNUM)(force_string(tmp)->stlen));
+ get_one(tree, &tmp);
+ return tmp_number((AWKNUM) (force_string(tmp)->stlen));
}
NODE *
do_log(tree)
NODE *tree;
{
- NODE *tmp;
+ NODE *tmp;
double log();
- get_one(tree,&tmp);
- return tmp_number(log(force_number(tmp)));
+ get_one(tree, &tmp);
+ return tmp_number((AWKNUM)log((double)force_number(tmp)));
}
-NODE *
+NODE *
do_printf(tree)
NODE *tree;
{
- register FILE *fp;
- NODE *do_sprintf();
+ register FILE *fp;
+ NODE *do_sprintf();
- fp=deal_redirect(tree->rnode);
- print_simple(do_sprintf(tree->lnode),fp);
+ fp = redirect(tree->rnode);
+ print_simple(do_sprintf(tree->lnode), fp);
return Nnull_string;
}
+set_element(num, s, len, n)
+int num;
+char *s;
+int len;
+NODE *n;
+{
+ extern NODE **assoc_lookup();
+
+ *assoc_lookup(n, tmp_number((AWKNUM) (num))) = make_string(s, len);
+}
NODE *
do_split(tree)
NODE *tree;
{
- NODE *t1,*t2,*t3;
- register int splitc;
- register int num,snum,olds;
- register char *ptr,*oldp;
- NODE **assoc_lookup();
+ NODE *t1, *t2, *t3;
+ register char *splitc;
+ char *s;
+ NODE *n;
- if(a_get_three(tree,&t1,&t2,&t3)<3)
- splitc= get_fs();
+ if (a_get_three(tree, &t1, &t2, &t3) < 3)
+ splitc = get_fs();
else
- splitc= *(force_string(t3)->stptr);
- num=0;
- tree=force_string(t1);
- olds=snum=tree->stlen;
- oldp=ptr=tree->stptr;
- assoc_clear(t2);
- while(snum--) {
- if(*ptr++==splitc) {
- *assoc_lookup(t2,make_number((AWKNUM)(++num)))=make_string(oldp,(olds-snum)-1);
- oldp=ptr;
- olds=snum;
- }
- }
- *assoc_lookup(t2,make_number((AWKNUM)(++num)))=make_string(oldp,(olds-snum)-1);
- return tmp_number((AWKNUM)num);
+ splitc = force_string(t3)->stptr;
+
+ n = t2;
+ if (t2->type == Node_param_list)
+ n = stack_ptr[t2->param_cnt];
+ if (n->type != Node_var && n->type != Node_var_array)
+ fatal("second argument of split is not a variable");
+ assoc_clear(n);
+
+ tree = force_string(t1);
+
+ s = tree->stptr;
+ return tmp_number((AWKNUM)
+ parse_fields(HUGE, &s, tree->stlen, splitc, set_element, n));
}
-/* Note that the output buffer cannot be static because sprintf may get called
- recursively by force_string. Hence the wasteful alloca calls */
+/*
+ * Note that the output buffer cannot be static because sprintf may get
+ * called recursively by force_string. Hence the wasteful alloca calls
+ */
/* %e and %f formats are not properly implemented. Someone should fix them */
NODE *
@@ -291,7 +420,7 @@ NODE *tree;
ofre-=(l);\
}
-/* Is there space for something L big in the buffer? */
+ /* Is there space for something L big in the buffer? */
#define chksize(l) if((l)>ofre) {\
char *tmp;\
tmp=(char *)alloca(osiz*2);\
@@ -300,8 +429,11 @@ NODE *tree;
ofre+=osiz;\
osiz*=2;\
}
-/* Get the next arg to be formatted. If we've run out of args, return
- "" (Null string) */
+
+ /*
+ * Get the next arg to be formatted. If we've run out of args,
+ * return "" (Null string)
+ */
#define parse_next_arg() {\
if(!carg) arg= Nnull_string;\
else {\
@@ -311,61 +443,62 @@ NODE *tree;
}
char *obuf;
- int osiz,ofre,olen;
+ int osiz, ofre, olen;
static char chbuf[] = "0123456789abcdef";
- static char sp[] =" ";
- char *s0,*s1;
- int n0;
- NODE *sfmt,*arg;
+ static char sp[] = " ";
+ char *s0, *s1;
+ int n0;
+ NODE *sfmt, *arg;
register NODE *carg;
- long fw,prec,lj,alt,big;
- long *cur;
- long val;
+ long fw, prec, lj, alt, big;
+ long *cur;
+ long val;
unsigned long uval;
- int sgn;
- int base;
- char cpbuf[30]; /* if we have numbers bigger than 30 */
- char *cend= &cpbuf[30]; /* chars, we lose, but seems unlikely */
- char *cp;
- char *fill;
- double tmpval;
- char *pr_str;
-
-
- obuf=(char *)alloca(120);
- osiz=120;
- ofre=osiz;
- olen=0;
- get_one(tree,&sfmt);
- sfmt=force_string(sfmt);
- carg=tree->rnode;
- for(s0=s1=sfmt->stptr,n0=sfmt->stlen;n0-->0;) {
- if(*s1!='%') {
+ int sgn;
+ int base;
+ char cpbuf[30]; /* if we have numbers bigger than 30 */
+ char *cend = &cpbuf[30];/* chars, we lose, but seems unlikely */
+ char *cp;
+ char *fill;
+ double tmpval;
+ char *pr_str;
+ extern char *gcvt();
+
+
+ obuf = (char *) alloca(120);
+ osiz = 120;
+ ofre = osiz;
+ olen = 0;
+ get_one(tree, &sfmt);
+ sfmt = force_string(sfmt);
+ carg = tree->rnode;
+ for (s0 = s1 = sfmt->stptr, n0 = sfmt->stlen; n0-- > 0;) {
+ if (*s1 != '%') {
s1++;
continue;
}
-
- bchunk(s0,s1-s0);
- s0=s1;
- cur= &fw;
- fw=0;
- prec=0;
- lj=alt=big=0;
- fill= sp;
- cp=cend;
+ bchunk(s0, s1 - s0);
+ s0 = s1;
+ cur = &fw;
+ fw = 0;
+ prec = 0;
+ lj = alt = big = 0;
+ fill = sp;
+ cp = cend;
s1++;
- retry:
+retry:
--n0;
- switch(*s1++) {
+ switch (*s1++) {
case '%':
- bchunk("%",1);
- s0=s1;
+ bchunk("%", 1);
+ s0 = s1;
break;
case '0':
- if(fill!=sp || lj) goto lose;
- fill="0"; /* FALL through */
+ if (fill != sp || lj)
+ goto lose;
+ fill = "0"; /* FALL through */
case '1':
case '2':
case '3':
@@ -375,526 +508,681 @@ NODE *tree;
case '7':
case '8':
case '9':
- if(cur==0)
+ if (cur == 0)
goto lose;
- *cur= s1[-1]-'0';
- while(n0>0 && *s1>='0' && *s1<='9') {
+ *cur = s1[-1] - '0';
+ while (n0 > 0 && *s1 >= '0' && *s1 <= '9') {
--n0;
- *cur= *cur * 10 + *s1++ - '0';
+ *cur = *cur * 10 + *s1++ - '0';
}
goto retry;
case '-':
- if(lj || fill!=sp) goto lose;
+ if (lj || fill != sp)
+ goto lose;
lj++;
goto retry;
case '.':
- if(cur!=&fw) goto lose;
- cur= &prec;
+ if (cur != &fw)
+ goto lose;
+ cur = &prec;
goto retry;
case '#':
- if(alt) goto lose;
+ if (alt)
+ goto lose;
alt++;
goto retry;
case 'l':
- if(big) goto lose;
+ if (big)
+ goto lose;
big++;
goto retry;
- case '*':
- if(cur==0) goto lose;
- parse_next_arg();
- *cur=(int)arg;
- goto retry;
case 'c':
parse_next_arg();
- if(arg->type==Node_number) {
- uval=(unsigned long)arg->numbr;
- cpbuf[0]=uval;
- prec=1;
- pr_str=cpbuf;
+ if (arg->flags & NUM) {
+ uval = (unsigned long) arg->numbr;
+ cpbuf[0] = uval;
+ prec = 1;
+ pr_str = cpbuf;
goto dopr_string;
}
- if(!prec || prec>arg->stlen)
- prec=arg->stlen;
- pr_str=cpbuf;
+ if (!prec || prec > arg->stlen)
+ prec = arg->stlen;
+ pr_str = cpbuf;
goto dopr_string;
case 's':
parse_next_arg();
- arg=force_string(arg);
- if(!prec || prec>arg->stlen)
- prec=arg->stlen;
- pr_str=arg->stptr;
-
- dopr_string:
- if(fw>prec && !lj) {
- while(fw>prec) {
- bchunk(sp,1);
+ arg = force_string(arg);
+ if (!prec || prec > arg->stlen)
+ prec = arg->stlen;
+ pr_str = arg->stptr;
+
+ dopr_string:
+ if (fw > prec && !lj) {
+ while (fw > prec) {
+ bchunk(sp, 1);
fw--;
}
}
- bchunk(pr_str,(int)prec);
- if(fw>prec) {
- while(fw>prec) {
- bchunk(sp,1);
+ bchunk(pr_str, (int) prec);
+ if (fw > prec) {
+ while (fw > prec) {
+ bchunk(sp, 1);
fw--;
}
}
- s0=s1;
+ s0 = s1;
break;
case 'd':
parse_next_arg();
- val=(long)force_number(arg);
- if(val<0) {
- sgn=1;
- val= -val;
- } else sgn=0;
+ val = (long) force_number(arg);
+ if (val < 0) {
+ sgn = 1;
+ val = -val;
+ } else
+ sgn = 0;
do {
- *--cp='0'+val%10;
- val/=10;
+ *--cp = '0' + val % 10;
+ val /= 10;
} while (val);
- if(sgn) *--cp='-';
- prec=cend-cp;
- if(fw>prec && !lj) {
- if(fill!=sp && *cp=='-') {
- bchunk(cp,1);
+ if (sgn)
+ *--cp = '-';
+ prec = cend - cp;
+ if (fw > prec && !lj) {
+ if (fill != sp && *cp == '-') {
+ bchunk(cp, 1);
cp++;
prec--;
fw--;
}
- while(fw>prec) {
- bchunk(fill,1);
+ while (fw > prec) {
+ bchunk(fill, 1);
fw--;
}
}
- bchunk(cp,(int)prec);
- if(fw>prec) {
- while(fw>prec) {
- bchunk(fill,1);
+ bchunk(cp, (int) prec);
+ if (fw > prec) {
+ while (fw > prec) {
+ bchunk(fill, 1);
fw--;
}
}
- s0=s1;
+ s0 = s1;
break;
case 'u':
- base=10;
+ base = 10;
goto pr_unsigned;
case 'o':
- base=8;
+ base = 8;
goto pr_unsigned;
case 'x':
- base=16;
+ base = 16;
goto pr_unsigned;
- pr_unsigned:
+ pr_unsigned:
parse_next_arg();
- uval=(unsigned long)force_number(arg);
+ uval = (unsigned long) force_number(arg);
do {
- *--cp=chbuf[uval%base];
- uval/=base;
- } while(uval);
- prec=cend-cp;
- if(fw>prec && !lj) {
- while(fw>prec) {
- bchunk(fill,1);
+ *--cp = chbuf[uval % base];
+ uval /= base;
+ } while (uval);
+ prec = cend - cp;
+ if (fw > prec && !lj) {
+ while (fw > prec) {
+ bchunk(fill, 1);
fw--;
}
}
- bchunk(cp,(int)prec);
- if(fw>prec) {
- while(fw>prec) {
- bchunk(fill,1);
+ bchunk(cp, (int) prec);
+ if (fw > prec) {
+ while (fw > prec) {
+ bchunk(fill, 1);
fw--;
}
}
- s0=s1;
+ s0 = s1;
break;
case 'g':
parse_next_arg();
- tmpval=force_number(arg);
- if(prec==0) prec=13;
- gcvt(tmpval,prec,cpbuf);
- prec=strlen(cpbuf);
- cp=cpbuf;
- if(fw>prec && !lj) {
- if(fill!=sp && *cp=='-') {
- bchunk(cp,1);
+ tmpval = force_number(arg);
+ if (prec == 0)
+ prec = 13;
+ (void) gcvt(tmpval, (int) prec, cpbuf);
+ prec = strlen(cpbuf);
+ cp = cpbuf;
+ if (fw > prec && !lj) {
+ if (fill != sp && *cp == '-') {
+ bchunk(cp, 1);
cp++;
prec--;
} /* Deal with .5 as 0.5 */
- if(fill==sp && *cp=='.') {
+ if (fill == sp && *cp == '.') {
--fw;
- while(--fw>=prec) {
- bchunk(fill,1);
+ while (--fw >= prec) {
+ bchunk(fill, 1);
}
- bchunk("0",1);
- } else
- while(fw-->prec) bchunk(fill,1);
- } else { /* Turn .5 into 0.5 */
- /* FOO */
- if(*cp=='.' && fill==sp) {
- bchunk("0",1);
+ bchunk("0", 1);
+ } else
+ while (fw-- > prec)
+ bchunk(fill, 1);
+ } else {/* Turn .5 into 0.5 */
+ /* FOO */
+ if (*cp == '.' && fill == sp) {
+ bchunk("0", 1);
--fw;
}
}
- bchunk(cp,(int)prec);
- if(fw>prec) while(fw-->prec) bchunk(fill,1);
- s0=s1;
+ bchunk(cp, (int) prec);
+ if (fw > prec)
+ while (fw-- > prec)
+ bchunk(fill, 1);
+ s0 = s1;
break;
- /* JF how to handle these!? */
case 'f':
parse_next_arg();
- tmpval=force_number(arg);
- chksize(fw+prec+5); /* 5==slop */
-/* cp=fcvt(tmpval,prec,&dec,&sgn);
- prec=strlen(cp);
- if(sgn) prec++; */
- cp=cpbuf;
- *cp++='%';
- if(lj) *cp++='-';
- if(fill!=sp) *cp++='0';
- if(prec!=0) {
- strcpy(cp,"*.*f");
- sprintf(obuf+olen,cpbuf,fw,prec,(double)tmpval);
+ tmpval = force_number(arg);
+ chksize(fw + prec + 5); /* 5==slop */
+
+ cp = cpbuf;
+ *cp++ = '%';
+ if (lj)
+ *cp++ = '-';
+ if (fill != sp)
+ *cp++ = '0';
+ if (prec != 0) {
+ (void) strcpy(cp, "*.*f");
+ (void) sprintf(obuf + olen, cpbuf, fw, prec, (double) tmpval);
} else {
- strcpy(cp,"*f");
- sprintf(obuf+olen,cpbuf,fw,(double)tmpval);
+ (void) strcpy(cp, "*f");
+ (void) sprintf(obuf + olen, cpbuf, fw, (double) tmpval);
}
- cp=obuf+olen;
- ofre-=strlen(obuf+olen);
- olen+=strlen(obuf+olen);/* There may be nulls */
- s0=s1;
+ cp = obuf + olen;
+ ofre -= strlen(obuf + olen);
+ olen += strlen(obuf + olen); /* There may be nulls */
+ s0 = s1;
break;
case 'e':
parse_next_arg();
- tmpval=force_number(arg);
- chksize(fw+prec+5); /* 5==slop */
- cp=cpbuf;
- *cp++='%';
- if(lj) *cp++='-';
- if(fill!=sp) *cp++='0';
- if(prec!=0) {
- strcpy(cp,"*.*e");
- sprintf(obuf+olen,cpbuf,fw,prec,(double)tmpval);
+ tmpval = force_number(arg);
+ chksize(fw + prec + 5); /* 5==slop */
+ cp = cpbuf;
+ *cp++ = '%';
+ if (lj)
+ *cp++ = '-';
+ if (fill != sp)
+ *cp++ = '0';
+ if (prec != 0) {
+ (void) strcpy(cp, "*.*e");
+ (void) sprintf(obuf + olen, cpbuf, fw, prec, (double) tmpval);
} else {
- strcpy(cp,"*e");
- sprintf(obuf+olen,cpbuf,fw,(double)tmpval);
+ (void) strcpy(cp, "*e");
+ (void) sprintf(obuf + olen, cpbuf, fw, (double) tmpval);
}
- cp=obuf+olen;
- ofre-=strlen(obuf+olen);
- olen+=strlen(obuf+olen);/* There may be nulls */
- s0=s1;
- break;
+ cp = obuf + olen;
+ ofre -= strlen(obuf + olen);
+ olen += strlen(obuf + olen); /* There may be nulls */
+ s0 = s1;
break;
- /* case 'g':
- parse_next_arg();
- tmpval=force_number(arg);
- if(prec!=0) sprintf(obuf+osiz-ofre,"%*.*g",fw,prec,(double)tmpval);
- else sprintf(obuf+osiz-ofre,"%*g",fw,(double)tmpval);
- ofre-=strlen(obuf+osiz-ofre);
- s0=s1;
- break; */
+
default:
- lose:
+ lose:
break;
}
}
- bchunk(s0,s1-s0);
- return tmp_string(obuf,olen);
+ bchunk(s0, s1 - s0);
+ return tmp_string(obuf, olen);
}
NODE *
do_sqrt(tree)
NODE *tree;
{
- NODE *tmp;
- double sqrt();
+ NODE *tmp;
+ double sqrt();
- get_one(tree,&tmp);
- return tmp_number(sqrt(force_number(tmp)));
+ get_one(tree, &tmp);
+ return tmp_number((AWKNUM)sqrt((double)force_number(tmp)));
}
NODE *
do_substr(tree)
NODE *tree;
{
- NODE *t1,*t2,*t3;
- register int n1,n2;
-
- if(get_three(tree,&t1,&t2,&t3)<3)
- n2=32000;
- else
- n2=(int)force_number(t3);
- n1=(int)force_number(t2)-1;
- tree=force_string(t1);
- if(n1<0 || n1>=tree->stlen || n2<=0)
+ NODE *t1, *t2, *t3;
+ register int index, length;
+
+ length = -1;
+ if (get_three(tree, &t1, &t2, &t3) == 3)
+ length = (int) force_number(t3);
+ index = (int) force_number(t2) - 1;
+ tree = force_string(t1);
+ if (length == -1)
+ length = tree->stlen;
+ if (index < 0)
+ index = 0;
+ if (index >= tree->stlen || length <= 0)
return Nnull_string;
- if(n1+n2>tree->stlen)
- n2=tree->stlen-n1;
- return tmp_string(tree->stptr+n1,n2);
+ if (index + length > tree->stlen)
+ length = tree->stlen - index;
+ return tmp_string(tree->stptr + index, length);
}
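The bounds handling in the new do_substr() can be read in isolation as a small clamping routine. The following is only a sketch: substr_bounds() is a hypothetical helper, not part of gawk, and it uses an explicit have_len flag where the diff uses -1 as the "no third argument" sentinel. The order of the checks follows the added lines above.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not in gawk): given a string of slen characters,
 * a 1-based start and an optional length, return how many characters to
 * copy and store the 0-based offset in *off.  A return of 0 means the
 * result is the null string; *off is left untouched in that case.
 */
int
substr_bounds(int slen, int start, int have_len, int length, int *off)
{
	int index = start - 1;		/* awk positions are 1-based */

	if (!have_len)
		length = slen;		/* no third arg: take the rest */
	if (index < 0)
		index = 0;		/* start before the string: clamp */
	if (index >= slen || length <= 0)
		return 0;
	if (index + length > slen)
		length = slen - index;	/* do not run past the end */
	*off = index;
	return length;
}
```

With a five-character string, a start of 4 and a length of 10 yields offset 3 and a count of 2, which matches the truncation step in the hunk.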
-/* The print command. Its name is historical */
-hack_print_node(tree)
-NODE *tree;
+NODE *
+do_system(tree)
+NODE *tree;
{
- register FILE *fp;
+ NODE *tmp;
+ int ret;
+ extern int flush_io ();
+
+ (void) flush_io (); /* so output is synchronous with gawk's */
+ get_one(tree, &tmp);
+ ret = system(force_string(tmp)->stptr);
+ ret = (ret >> 8) & 0xff;
+ return tmp_number((AWKNUM) ret);
+}
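do_system() above flushes pending output, runs the command, and extracts the exit code with (ret >> 8) & 0xff. On POSIX systems the same decoding is usually spelled with the <sys/wait.h> macros. A minimal sketch, assuming a POSIX environment; run_command() is a hypothetical name and fflush(NULL) stands in for gawk's flush_io():

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper (not in gawk): run cmd through the shell and
 * return its exit code, or -1 if it could not be run or did not exit
 * normally (for example, it was killed by a signal).
 */
int
run_command(const char *cmd)
{
	int ret;

	(void) fflush(NULL);	/* flush all open output streams first */
	ret = system(cmd);
	if (ret == -1)
		return -1;	/* could not start the command */
	if (WIFEXITED(ret))
		return WEXITSTATUS(ret);
	return -1;
}
```

Unlike the shift-and-mask form, this version distinguishes a normal exit from termination by a signal, which is a design choice of the sketch rather than something the diff does.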
-#ifndef FAST
- if(!tree || tree->type != Node_K_print)
- abort();
-#endif
- fp=deal_redirect(tree->rnode);
- tree=tree->lnode;
- if(!tree) tree=WHOLELINE;
- if(tree->type!=Node_expression_list) {
- print_simple(tree,fp);
+/* The print command. Its name is historical */
+do_print(tree)
+NODE *tree;
+{
+ register FILE *fp;
+
+ fp = redirect(tree->rnode);
+ tree = tree->lnode;
+ if (!tree)
+ tree = WHOLELINE;
+ if (tree->type != Node_expression_list) {
+ if (!(tree->flags & STR))
+ cant_happen();
+ print_simple(tree, fp);
} else {
- while(tree) {
- print_simple(tree_eval(tree->lnode),fp);
- tree=tree->rnode;
- if(tree) print_simple(OFS_node->var_value,fp);
+ while (tree) {
+ print_simple(force_string(tree_eval(tree->lnode)), fp);
+ tree = tree->rnode;
+ if (tree)
+ print_simple(OFS_node->var_value, fp);
}
}
- print_simple(ORS_node->var_value,fp);
+ print_simple(ORS_node->var_value, fp);
}
-
-/* Get the arguments to functions. No function cares if you give it
- too many args (they're ignored). Only a few fuctions complain
- about being given too few args. The rest have defaults */
+/*
+ * Get the arguments to functions. No function cares if you give it too many
+ * args (they're ignored). Only a few functions complain about being given
+ * too few args. The rest have defaults.
+ */
-get_one(tree,res)
-NODE *tree,**res;
+get_one(tree, res)
+NODE *tree, **res;
{
- if(!tree) {
- *res= WHOLELINE;
+ if (!tree) {
+ *res = WHOLELINE;
return;
}
-#ifndef FAST
- if(tree->type!=Node_expression_list)
- abort();
-#endif
- *res=tree_eval(tree->lnode);
+ *res = tree_eval(tree->lnode);
}
-get_two(tree,res1,res2)
-NODE *tree,**res1,**res2;
+get_two(tree, res1, res2)
+NODE *tree, **res1, **res2;
{
- if(!tree) {
- *res1= WHOLELINE;
+ if (!tree) {
+ *res1 = WHOLELINE;
return;
}
-#ifndef FAST
- if(tree->type!=Node_expression_list)
- abort();
-#endif
- *res1=tree_eval(tree->lnode);
- if(!tree->rnode)
+ *res1 = tree_eval(tree->lnode);
+ if (!tree->rnode)
return;
- tree=tree->rnode;
-#ifndef FAST
- if(tree->type!=Node_expression_list)
- abort();
-#endif
- *res2=tree_eval(tree->lnode);
+ tree = tree->rnode;
+ *res2 = tree_eval(tree->lnode);
}
-get_three(tree,res1,res2,res3)
-NODE *tree,**res1,**res2,**res3;
+get_three(tree, res1, res2, res3)
+NODE *tree, **res1, **res2, **res3;
{
- if(!tree) {
- *res1= WHOLELINE;
+ if (!tree) {
+ *res1 = WHOLELINE;
return 0;
}
-#ifndef FAST
- if(tree->type!=Node_expression_list)
- abort();
-#endif
- *res1=tree_eval(tree->lnode);
- if(!tree->rnode)
+ *res1 = tree_eval(tree->lnode);
+ if (!tree->rnode)
return 1;
- tree=tree->rnode;
-#ifndef FAST
- if(tree->type!=Node_expression_list)
- abort();
-#endif
- *res2=tree_eval(tree->lnode);
- if(!tree->rnode)
+ tree = tree->rnode;
+ *res2 = tree_eval(tree->lnode);
+ if (!tree->rnode)
return 2;
- tree=tree->rnode;
-#ifndef FAST
- if(tree->type!=Node_expression_list)
- abort();
-#endif
- *res3=tree_eval(tree->lnode);
+ tree = tree->rnode;
+ *res3 = tree_eval(tree->lnode);
return 3;
}
-a_get_three(tree,res1,res2,res3)
-NODE *tree,**res1,**res2,**res3;
+a_get_three(tree, res1, res2, res3)
+NODE *tree, **res1, **res2, **res3;
{
- if(!tree) {
- *res1= WHOLELINE;
+ if (!tree) {
+ *res1 = WHOLELINE;
return 0;
}
-#ifndef FAST
- if(tree->type!=Node_expression_list)
- abort();
-#endif
- *res1=tree_eval(tree->lnode);
- if(!tree->rnode)
+ *res1 = tree_eval(tree->lnode);
+ if (!tree->rnode)
return 1;
- tree=tree->rnode;
-#ifndef FAST
- if(tree->type!=Node_expression_list)
- abort();
-#endif
- *res2=tree->lnode;
- if(!tree->rnode)
+ tree = tree->rnode;
+ *res2 = tree->lnode;
+ if (!tree->rnode)
return 2;
- tree=tree->rnode;
-#ifndef FAST
- if(tree->type!=Node_expression_list)
- abort();
-#endif
- *res3=tree_eval(tree->lnode);
+ tree = tree->rnode;
+ *res3 = tree_eval(tree->lnode);
return 3;
}
-/* FOO this should re-allocate the buffer if it isn't big enough.
- Also, it should do RMS style only-parse-enough stuff. */
-/* This reads in a line from the input file */
-inrec()
-{
- static char *buf,*buf_end;
- static bsz;
- register char *cur;
- register char *tmp;
- register char *ttmp;
- int cnt;
- int tcnt;
- register int c;
- int rs;
- int fs;
- extern FILE *input_file;
- NODE **get_lhs();
-
- rs = get_rs();
- fs = get_fs();
- blank_fields();
- NR++;
- NF=0;
- if(!buf) {
- buf=malloc(128);
- bsz=128;
- buf_end=buf+bsz;
+/* Redirection for printf and print commands */
+FILE *
+redirect(tree)
+NODE *tree;
+{
+ register NODE *tmp;
+ register struct redirect *rp;
+ register char *str;
+ register FILE *fp;
+ FILE *popen();
+ FILE *fopen();
+ int tflag;
+ char *direction = "to";
+
+ if (!tree)
+ return stdout;
+ tflag = 0;
+ switch (tree->type) {
+ case Node_redirect_append:
+ tflag = RED_APPEND;
+ case Node_redirect_output:
+ tflag |= (RED_FILE|RED_WRITE);
+ break;
+ case Node_redirect_pipe:
+ tflag = (RED_PIPE|RED_WRITE);
+ break;
+ case Node_redirect_pipein:
+ tflag = (RED_PIPE|RED_READ);
+ break;
+ case Node_redirect_input:
+ tflag = (RED_FILE|RED_READ);
+ break;
+ default:
+ fatal ("invalid tree type %d in redirect()\n", tree->type);
+ break;
}
- cur=buf;
- cnt=0;
- while ((c=getc(input_file))!=EOF) {
- if((!rs && c=='\n' && cur[-1]=='\n' && cur!=buf) || (c == rs))
+ tmp = force_string(tree_eval(tree->subnode));
+ str = tmp->stptr;
+ for (rp = red_head; rp != NULL; rp = rp->next)
+ if (rp->flag == tflag && strcmp(rp->value, str) == 0)
break;
- *cur++=c;
- cnt++;
- if(cur==buf_end) {
- buf=realloc(buf,bsz*2);
- cur=buf+bsz;
- bsz*=2;
- buf_end=buf+bsz;
- }
+ if (rp == NULL) {
+ emalloc(rp, struct redirect *, sizeof(struct redirect),
+ "redirect");
+ emalloc(str, char *, strlen(tmp->stptr)+1, "redirect");
+ (void) strcpy(str, tmp->stptr);
+ rp->value = str;
+ rp->flag = tflag;
+ rp->offset = 0;
+ rp->fp = NULL;
+ /* maintain list in most-recently-used first order */
+ if (red_head)
+ red_head->prev = rp;
+ rp->prev = NULL;
+ rp->next = red_head;
+ red_head = rp;
}
- *cur='\0';
- set_field(0,buf,cnt);
- assign_number(&(NF_node->var_value),0.0);
- if(c==EOF && cnt==0)
- return 1;
- assign_number(&(NR_node->var_value),1.0+force_number(NR_node->var_value));
- for(tmp=buf;tmp<cur;tmp++) {
- if(fs==' ') {
- while((*tmp==' ' || *tmp=='\t') && tmp<cur)
- tmp++;
- if(tmp>=cur)
- break;
+ while (rp->fp == NULL) {
+ errno = 0;
+ switch (tree->type) {
+ case Node_redirect_output:
+ fp = rp->fp = fopen(str, "w");
+ break;
+ case Node_redirect_append:
+ fp = rp->fp = fopen(str, "a");
+ break;
+ case Node_redirect_pipe:
+ fp = rp->fp = popen(str, "w");
+ break;
+ case Node_redirect_pipein:
+ direction = "from";
+ fp = rp->fp = popen(str, "r");
+ break;
+ case Node_redirect_input:
+ direction = "from";
+ fp = rp->fp = fopen(str, "r");
+ break;
}
- tcnt=0;
- ttmp=tmp;
- if(fs==' ') {
- while(*tmp!=' ' && *tmp!='\t' && tmp<cur) {
- tmp++;
- tcnt++;
- }
- } else {
- while(*tmp!=fs && tmp<cur) {
- tmp++;
- tcnt++;
- }
+ if (fp == NULL) {
+ /* too many files open -- close one and try again */
+ if (errno == ENFILE || errno == EMFILE)
+ close_one();
+ else /* some other reason for failure */
+ fatal("can't redirect %s `%s'\n", direction,
+ str);
}
- set_field(++NF,ttmp,tcnt);
}
- assign_number(&(NF_node->var_value),(AWKNUM)NF);
- return 0;
+ if (rp->offset != 0) { /* this file was previously open */
+ if (fseek(fp, rp->offset, 0) == -1)
+ fatal("can't seek to %ld on `%s'\n", rp->offset, str);
+ }
+#ifdef notdef
+ (void) flush_io(); /* a la SVR4 awk */
+#endif
+ free_temp(tmp);
+ return rp->fp;
}
-/* Redirection for printf and print commands */
-FILE *
-deal_redirect(tree)
-NODE *tree;
+close_one()
{
- register NODE *tmp;
register struct redirect *rp;
- register char *str;
- register FILE *fp;
- FILE *popen();
- int tflag;
+ register struct redirect *rplast;
+
+ /* go to end of list first, to pick up least recently used entry */
+ for (rp = red_head; rp != NULL; rp = rp->next)
+ rplast = rp;
+ /* now work back up through the list */
+ for (rp = rplast; rp != NULL; rp = rp->prev)
+ if (rp->fp && (rp->flag & RED_FILE)) {
+ rp->offset = ftell(rp->fp);
+ if (fclose(rp->fp))
+ warning("close of \"%s\" failed.",
+ rp->value);
+ rp->fp = NULL;
+ break;
+ }
+ if (rp == NULL)
+ /* surely this is the only reason ??? */
+ fatal("too many pipes open");
+}
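The redirect()/close_one() pair above keeps new entries at the head of a doubly linked list and, when the process runs out of file descriptors, walks back from the tail to find an open regular file to close temporarily. The sketch below isolates that selection logic. struct entry, push_front() and pick_victim() are hypothetical names, and the is_open/is_file fields stand in for rp->fp and the RED_FILE flag. Unlike the hunk, it initializes the tail pointer so an empty list is handled.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct redirect, reduced to what matters. */
struct entry {
	const char *name;
	int is_open;		/* corresponds to rp->fp != NULL */
	int is_file;		/* corresponds to rp->flag & RED_FILE */
	struct entry *prev, *next;
};

/* Insert e at the head, so the list stays newest-first. */
void
push_front(struct entry **head, struct entry *e)
{
	e->prev = NULL;
	e->next = *head;
	if (*head)
		(*head)->prev = e;
	*head = e;
}

/*
 * Return the oldest entry that is an open regular file, or NULL if
 * there is none (only pipes are open, or the list is empty).
 */
struct entry *
pick_victim(struct entry *head)
{
	struct entry *e, *last = NULL;

	for (e = head; e != NULL; e = e->next)
		last = e;			/* find the tail */
	for (e = last; e != NULL; e = e->prev)
		if (e->is_open && e->is_file)
			return e;
	return NULL;
}
```

Pipes are skipped because a closed pipe cannot be reopened and repositioned the way a file can with the saved offset.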
+NODE *
+do_close(tree)
+NODE *tree;
+{
+ NODE *tmp;
+ register struct redirect *rp;
- if(!tree) return stdout;
- tflag= (tree->type==Node_redirect_pipe) ? 1 : 2;
- tmp=tree_eval(tree->subnode);
- for(rp=reds;rp->flag!=0 && rp<&reds[20];rp++) { /* That limit again */
- if(rp->flag==tflag && cmp_nodes(rp->value,tmp)==0)
+ tmp = force_string(tree_eval(tree->subnode));
+ for (rp = red_head; rp != NULL; rp = rp->next) {
+ if (strcmp(rp->value, tmp->stptr) == 0)
break;
}
- if(rp==&reds[20]) {
- panic("too many redirections",0);
- return 0;
+ free_temp(tmp);
+ if (rp == NULL) /* no match */
+ return tmp_number((AWKNUM) 0.0);
+ return tmp_number((AWKNUM)close_fp(rp));
+}
+
+int
+close_fp(rp)
+register struct redirect *rp;
+{
+ int status;
+
+ if (rp->flag & RED_PIPE)
+ status = pclose(rp->fp);
+ else
+ status = fclose(rp->fp);
+
+ /* SVR4 awk checks and warns about status of close */
+ if (status)
+ warning("%s close of \"%s\" failed.",
+ (rp->flag & RED_PIPE) ? "pipe" : "file", rp->value);
+ if (rp->prev)
+ rp->prev->next = rp->next;
+ else
+ red_head = rp->next;
+ free(rp->value);
+ free(rp);
+ return status;
+}
+
+int
+flush_io ()
+{
+ register struct redirect *rp;
+ int status = 0;
+
+ if (fflush(stdout)) {
+ warning("error writing standard output.");
+ status++;
}
- if(rp->flag!=0)
- return rp->fp;
- rp->flag=tflag;
- rp->value=dupnode(tmp);
- str=force_string(tmp)->stptr;
- switch(tree->type) {
- case Node_redirect_output:
- fp=rp->fp=fopen(str,"w");
- break;
- case Node_redirect_append:
- fp=rp->fp=fopen(str,"a");
- break;
- case Node_redirect_pipe:
- fp=rp->fp=popen(str,"w");
- break;
+ if (fflush(stderr)) {
+ warning("error writing standard error.");
+ status++;
}
- if(fp==0) panic("can't redirect to '%s'\n",str);
- rp++;
- rp->flag=0;
- return fp;
+ for (rp = red_head; rp != NULL; rp = rp->next)
+ /* flush both files and pipes, what the heck */
+ if ((rp->flag & RED_WRITE) && rp->fp != NULL)
+ if (fflush(rp->fp)) {
+ warning( "%s flush of \"%s\" failed.",
+ (rp->flag & RED_PIPE) ? "pipe" : "file",
+ rp->value);
+ status++;
+ }
+ return status;
}
-print_simple(tree,fp)
+int
+close_io ()
+{
+ register struct redirect *rp;
+ int status = 0;
+
+ for (rp = red_head; rp != NULL; rp = rp->next)
+ if (rp->fp && close_fp(rp))
+ status++;
+ return status;
+}
+
+print_simple(tree, fp)
NODE *tree;
FILE *fp;
{
-#ifndef FAST
- /* Deal with some obscure bugs */
- if(tree==(NODE *)0x55000000) {
- fprintf(fp,"***HUH***");
- return;
- }
- if((int)tree&01) {
- fprintf(fp,"$that's odd$");
- return;
+ if (fwrite(tree->stptr, sizeof(char), tree->stlen, fp) != tree->stlen)
+ warning("fwrite: %s", sys_errlist[errno]);
+ free_temp(tree);
+}
+
+NODE *
+do_atan2(tree)
+NODE *tree;
+{
+ NODE *t1, *t2;
+ extern double atan2();
+
+ get_two(tree, &t1, &t2);
+ (void) force_number(t1);
+ return tmp_number((AWKNUM) atan2((double) t1->numbr,
+ (double) force_number(t2)));
+}
+
+NODE *
+do_sin(tree)
+NODE *tree;
+{
+ NODE *tmp;
+ extern double sin();
+
+ get_one(tree, &tmp);
+ return tmp_number((AWKNUM) sin((double)force_number(tmp)));
+}
+
+NODE *
+do_cos(tree)
+NODE *tree;
+{
+ NODE *tmp;
+ extern double cos();
+
+ get_one(tree, &tmp);
+ return tmp_number((AWKNUM) cos((double)force_number(tmp)));
+}
+
+static int firstrand = 1;
+
+#ifndef USG
+static char state[256];
+extern char *initstate();
+
+#endif
+
+#define MAXLONG 2147483647 /* maximum value for long int */
+
+/* ARGSUSED */
+NODE *
+do_rand(tree)
+NODE *tree;
+{
+#ifdef USG
+ extern long lrand48();
+
+ return tmp_number((AWKNUM) lrand48() / MAXLONG);
+#else
+ extern long random();
+
+ if (firstrand) {
+ (void) initstate((unsigned) 1, state, sizeof state);
+ srandom(1);
+ firstrand = 0;
}
+ return tmp_number((AWKNUM) random() / MAXLONG);
#endif
- tree=force_string(tree);
- fwrite(tree->stptr,sizeof(char),tree->stlen,fp);
}
+NODE *
+do_srand(tree)
+NODE *tree;
+{
+ NODE *tmp;
+ extern long time();
+ static long save_seed = 1;
+ long ret = save_seed; /* SVR4 awk srand returns previous seed */
+
+#ifdef USG
+ extern void srand48();
+
+ if (tree == NULL)
+ srand48(save_seed = time((long *) 0));
+ else {
+ get_one(tree, &tmp);
+ srand48(save_seed = (long) force_number(tmp));
+ }
+#else
+ extern srandom();
+ extern char *setstate();
+
+ if (firstrand)
+ (void) initstate((unsigned) 1, state, sizeof state);
+ else
+ (void) setstate(state);
+
+ if (!tree)
+ srandom((int) (save_seed = time((long *) 0)));
+ else {
+ get_one(tree, &tmp);
+ srandom((int) (save_seed = (long) force_number(tmp)));
+ }
+#endif
+ firstrand = 0;
+ return tmp_number((AWKNUM) ret);
+}
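Review note on the redirection list code in this file: close_io() advances with rp->next after close_fp() has already freed rp, and close_fp() repairs only the prev->next link when it unlinks an entry. The following is a minimal sketch of a traversal that saves the successor first and fixes both links. It is not gawk code: struct node, push, unlink_node and close_all are hypothetical stand-ins for struct redirect and its helpers, and the `open` flag plays the role of rp->fp != NULL.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for gawk's struct redirect; only the links matter. */
struct node {
    struct node *prev;
    struct node *next;
    int open;                   /* plays the role of rp->fp != NULL */
};

static struct node *head = NULL;

/* Push a new node on the front of the list. */
static struct node *push(int open)
{
    struct node *n = malloc(sizeof *n);

    if (n == NULL)
        abort();
    n->prev = NULL;
    n->next = head;
    n->open = open;
    if (head != NULL)
        head->prev = n;
    head = n;
    return n;
}

/* Unlink and free one node, repairing the links of both neighbours. */
static void unlink_node(struct node *n)
{
    if (n->prev != NULL)
        n->prev->next = n->next;
    else
        head = n->next;
    if (n->next != NULL)
        n->next->prev = n->prev;
    free(n);
}

/* Close every open node; returns how many were closed. */
static int close_all(void)
{
    struct node *n, *next;
    int closed = 0;

    for (n = head; n != NULL; n = next) {
        next = n->next;         /* save the successor before n may be freed */
        if (n->open) {
            unlink_node(n);
            closed++;
        }
    }
    return closed;
}
```

Calling push() a few times and then close_all() leaves only the nodes whose `open` flag was 0 on the list, with consistent prev and next pointers.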
diff --git a/awk4.c b/awk4.c
new file mode 100644
index 00000000..f88f9b96
--- /dev/null
+++ b/awk4.c
@@ -0,0 +1,402 @@
+/*
+ * awk4 -- Code for features in new AWK, System V compatibility.
+ *
+ * Copyright (C) 1988 Free Software Foundation
+ * Written by David Trueman, 1988
+ *
+ * $Log: awk4.c,v $
+ * Revision 1.27 88/12/14 10:53:49 david
+ * malloc structures in func_call and free them on return
+ *
+ * Revision 1.26 88/12/13 22:29:15 david
+ * minor change
+ *
+ * Revision 1.25 88/12/08 10:52:01 david
+ * small correction to #ifdef'ing
+ *
+ * Revision 1.24 88/11/30 15:18:21 david
+ * fooling around with memory allocation in func_call() but this new code remains
+ * #ifdef'd out
+ * correction to creating private copy of string in do_sub()
+ *
+ * Revision 1.23 88/11/29 09:55:48 david
+ * corrections to code that tracks value of NF -- this needs cleanup
+ *
+ * Revision 1.22 88/11/28 20:30:10 david
+ * bug fix for do_sub when third arg. not specified
+ *
+ * Revision 1.21 88/11/22 13:51:10 david
+ * Arnold: delinting
+ *
+ * Revision 1.20 88/11/15 10:25:14 david
+ * Arnold: minor cleanup
+ *
+ * Revision 1.19 88/11/14 22:00:09 david
+ * Arnold: error message on bad regexp; correction to handling of RSTART in
+ * match() on failure; return arg handling to previous behaviour: var=val
+ * arg is processed when it is encountered.
+ *
+ * Revision 1.18 88/11/14 21:28:09 david
+ * moved concat_exp(); update NF on assign to field > NF; temporarily aborted
+ * mods. in func_call(); flag misnamed as TMP becomes PERM (don't free)
+ *
+ * Revision 1.17 88/11/01 12:19:37 david
+ * cleanup and substantial changes to do_sub()
+ *
+ * Revision 1.16 88/10/19 21:59:20 david
+ * replace malloc and realloc with error checking versions
+ *
+ * Revision 1.15 88/10/17 20:54:54 david
+ * SYSV --> USG
+ *
+ * Revision 1.14 88/10/13 22:00:44 david
+ * purge FAST and cleanup error messages
+ *
+ * Revision 1.13 88/10/11 09:29:12 david
+ * retrieve parameters from the stack in func_call
+ *
+ * Revision 1.12 88/10/06 21:53:15 david
+ * added FSF copyleft; Arnold's changes to command line processing
+ *
+ *
+ */
+
+/*
+ * GAWK is distributed in the hope that it will be useful, but WITHOUT ANY
+ * WARRANTY. No author or distributor accepts responsibility to anyone for
+ * the consequences of using it or for whether it serves any particular
+ * purpose or works at all, unless he says so in writing. Refer to the GAWK
+ * General Public License for full details.
+ *
+ * Everyone is granted permission to copy, modify and redistribute GAWK, but
+ * only under the conditions described in the GAWK General Public License. A
+ * copy of this license is supposed to have been given to you along with GAWK
+ * so you can know your rights and responsibilities. It should be in a file
+ * named COPYING. Among other things, the copyright notice and this notice
+ * must be preserved on all copies.
+ *
+ * In other words, go ahead and share GAWK, but don't try to stop anyone else
+ * from sharing it farther. Help stamp out software hoarding!
+ */
+
+#include "awk.h"
+
+NODE *ARGC_node, *ARGV_node;
+extern NODE **fields_arr;
+
+jmp_buf func_tag;
+NODE **stack_ptr;
+
+NODE *
+func_call(name, arg_list)
+NODE *name; /* name is a Node_val giving function name */
+NODE *arg_list; /* Node_expression_list of calling args. */
+{
+ register NODE *arg, *argp, *r;
+ NODE *n, *f, *p;
+ jmp_buf func_tag_stack;
+ NODE *ret_node_stack;
+ NODE **local_stack;
+ NODE **sp;
+ static int func_tag_valid = 0;
+ int count;
+ extern NODE *lookup(), *install();
+ extern NODE *pop_var();
+ extern NODE *ret_node;
+
+ /*
+ * retrieve function definition node
+ */
+ f = lookup(variables, name->stptr);
+ if (f->type != Node_func)
+ fatal("function `%s' not defined", name->stptr);
+ /*
+ * mark stack for variables allocated during life of function
+ */
+ count = f->lnode->param_cnt;
+ emalloc(local_stack, NODE **, count * sizeof(NODE *), "func_call");
+ sp = local_stack;
+
+ /*
+ * for each calling arg. add NODE * on stack
+ */
+ for (argp = arg_list; count && argp != NULL; argp = argp->rnode) {
+ arg = argp->lnode;
+ r = newnode(Node_var);
+ /*
+ * call by reference for arrays; see below also
+ */
+ if (arg->type == Node_param_list)
+ arg = stack_ptr[arg->param_cnt];
+ if (arg->type == Node_var_array)
+ *r = *arg;
+ else {
+ n = dupnode(tree_eval(arg));
+ r->lnode = n;
+ r->rnode = (NODE *) NULL;
+ }
+ *sp++ = r;
+ count--;
+ }
+ if (argp != NULL) /* left over calling args. */
+ warning(
+ "function `%s' called with more arguments than declared",
+ name->stptr);
+ /*
+ * add remaining params. on stack with null value
+ */
+ while (count-- > 0) {
+ r = newnode(Node_var);
+ r->lnode = Nnull_string;
+ r->rnode = (NODE *) NULL;
+ *sp++ = r;
+ }
+
+ /*
+ * execute function body, saving context, as a return statement
+ * will longjmp back here
+ */
+ sp = local_stack;
+ local_stack = stack_ptr;
+ stack_ptr = sp;
+ PUSH_BINDING(func_tag_stack, func_tag, func_tag_valid);
+ ret_node_stack = ret_node;
+ if (_setjmp(func_tag) == 0)
+ (void) interpret(f->rnode);
+ r = ret_node;
+ ret_node = ret_node_stack;
+ RESTORE_BINDING(func_tag_stack, func_tag, func_tag_valid);
+ sp = stack_ptr;
+ stack_ptr = local_stack;
+ local_stack = sp;
+
+ /*
+ * here, we pop each parameter and check whether
+ * it was an array. If so, and if the arg. passed in was
+ * a simple variable, then the value should be copied back.
+ * This achieves "call-by-reference" for arrays.
+ */
+ count = f->lnode->param_cnt;
+ for (argp = arg_list; count-- > 0 && argp != NULL; argp = argp->rnode) {
+ arg = argp->lnode;
+ n = *sp++;
+ if (arg->type == Node_var && n->type == Node_var_array) {
+ arg->var_array = n->var_array;
+ arg->type = Node_var_array;
+ }
+ deref = n->lnode;
+ do_deref();
+ free((char *) n);
+ }
+ while (count-- > 0) {
+ n = *sp++;
+ deref = n->lnode;
+ do_deref();
+ free((char *) n);
+ }
+ free((char *) local_stack);
+ return r;
+}
+
+NODE *
+do_match(tree)
+NODE *tree;
+{
+ NODE *t1;
+ int rstart;
+ struct re_registers reregs;
+ struct re_pattern_buffer *rp;
+ extern NODE *RSTART_node, *RLENGTH_node;
+
+ t1 = force_string(tree_eval(tree->lnode));
+ if (tree->rnode->type == Node_regex)
+ rp = tree->rnode->rereg;
+ else {
+ rp = make_regexp(force_string(tree_eval(tree->rnode)));
+ if (rp == NULL)
+ cant_happen();
+ }
+ rstart = re_search(rp, t1->stptr, t1->stlen, 0, t1->stlen, &reregs);
+ free_temp(t1);
+ if (rstart >= 0) {
+ rstart++; /* 1-based indexing */
+ /* RSTART set to rstart below */
+ RLENGTH_node->var_value->numbr =
+ (AWKNUM) (reregs.end[0] - reregs.start[0]);
+ } else {
+ /*
+ * Match failed. Set RSTART to 0, RLENGTH to -1.
+ * Return the value of RSTART.
+ */
+ rstart = 0; /* used as return value */
+ RLENGTH_node->var_value->numbr = -1.0;
+ }
+ RSTART_node->var_value->numbr = (AWKNUM) rstart;
+ return tmp_number((AWKNUM) rstart);
+}
+
+NODE *
+do_sub(tree)
+NODE *tree;
+{
+ register int len;
+ register char *scan;
+ register char *bp, *cp;
+ int search_start = 0;
+ int match_length;
+ int matches = 0;
+ char *buf;
+ int global;
+ struct re_pattern_buffer *rp;
+ NODE *s; /* subst. pattern */
+ NODE *t; /* string to make sub. in; $0 if none given */
+ struct re_registers reregs;
+ unsigned int saveflags;
+ NODE *tmp;
+ NODE **lhs;
+ char *lastbuf;
+
+ global = (tree->type == Node_gsub);
+
+ if (tree->rnode->type == Node_regex)
+ rp = tree->rnode->rereg;
+ else {
+ rp = make_regexp(force_string(tree_eval(tree->rnode)));
+ if (rp == NULL)
+ cant_happen();
+ }
+ tree = tree->lnode;
+ s = force_string(tree_eval(tree->lnode));
+ tree = tree->rnode;
+ deref = 0;
+ if (tree == NULL) {
+ t = WHOLELINE;
+ lhs = &fields_arr[0];
+ field_num = 0;
+ deref = t;
+ } else {
+ t = tree->lnode;
+ lhs = get_lhs(t);
+ t = force_string(tree_eval(t));
+ }
+ /*
+ * create a private copy of the string
+ */
+ if (t->stref > 1 || (t->flags & PERM)) {
+ saveflags = t->flags;
+ t->flags &= ~MALLOC;
+ tmp = dupnode(t);
+ t->flags = saveflags;
+ do_deref();
+ t = tmp;
+ if (lhs)
+ *lhs = tmp;
+ }
+ lastbuf = t->stptr;
+ do {
+ if (re_search(rp, t->stptr, t->stlen, search_start,
+ t->stlen-search_start, &reregs) == -1
+ || reregs.start[0] == reregs.end[0])
+ break;
+ matches++;
+
+ /*
+ * first, make a pass through the sub. pattern, to calculate
+ * the length of the string after substitution
+ */
+ match_length = reregs.end[0] - reregs.start[0];
+ len = t->stlen - match_length;
+ for (scan = s->stptr; scan < s->stptr + s->stlen; scan++)
+ if (*scan == '&')
+ len += match_length;
+ else if (*scan == '\\' && *(scan+1) == '&') {
+ scan++;
+ len++;
+ } else
+ len++;
+ emalloc(buf, char *, len + 1, "do_sub");
+ bp = buf;
+
+ /*
+ * now, create the result, copying in parts of the original
+ * string
+ */
+ for (scan = t->stptr; scan < t->stptr + reregs.start[0]; scan++)
+ *bp++ = *scan;
+ for (scan = s->stptr; scan < s->stptr + s->stlen; scan++)
+ if (*scan == '&')
+ for (cp = t->stptr + reregs.start[0]; cp < t->stptr + reregs.end[0]; cp++)
+ *bp++ = *cp;
+ else if (*scan == '\\' && *(scan+1) == '&') {
+ scan++;
+ *bp++ = *scan;
+ } else
+ *bp++ = *scan;
+ search_start = bp - buf;
+ for (scan = t->stptr + reregs.end[0]; scan < t->stptr + t->stlen; scan++)
+ *bp++ = *scan;
+ *bp = '\0';
+ free(lastbuf);
+ t->stptr = buf;
+ lastbuf = buf;
+ t->stlen = len;
+ } while (global && search_start < t->stlen);
+
+ free_temp(s);
+ if (matches > 0) {
+ if (field_num == 0)
+ set_record(fields_arr[0]->stptr, fields_arr[0]->stlen);
+ else if (field_num > 0) {
+ node0_valid = 0;
+ if (NF_node->var_value->numbr == -1 &&
+ field_num > NF_node->var_value->numbr)
+ assign_number(&(NF_node->var_value),
+ (AWKNUM) field_num);
+ }
+ t->flags &= ~NUM;
+ }
+ field_num = -1;
+ return tmp_number((AWKNUM) matches);
+}
+
+init_args(argc0, argc, argv0, argv)
+int argc0, argc;
+char *argv0;
+char **argv;
+{
+ int i, j;
+ NODE **aptr;
+ extern NODE **assoc_lookup();
+ extern NODE *spc_var();
+ extern NODE *make_string();
+ extern NODE *make_number();
+ extern NODE *tmp_number();
+
+ ARGV_node = spc_var("ARGV", Nnull_string);
+ aptr = assoc_lookup(ARGV_node, tmp_number(0.0));
+ *aptr = make_string(argv0, strlen(argv0));
+ for (i = argc0, j = 1; i < argc; i++) {
+ aptr = assoc_lookup(ARGV_node, tmp_number((AWKNUM) j));
+ *aptr = make_string(argv[i], strlen(argv[i]));
+ j++;
+ }
+ ARGC_node = spc_var("ARGC", make_number((AWKNUM) j));
+}
+
+#ifdef USG
+int
+bcopy (src, dst, length)
+register char *src, *dst;
+register int length;
+{
+ (void) memcpy (dst, src, length);
+}
+
+int
+bzero (b, length)
+register char *b;
+register int length;
+{
+ (void) memset (b, '\0', length);
+}
+#endif
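Review note on do_sub() in this file: the replacement text is processed in two passes, first to compute the length of the result and then to copy it, with an unescaped `&` standing for the matched text and `\&` producing a literal ampersand. The sketch below isolates just that expansion step so it can be read on its own. expand_replacement is a hypothetical helper, not a function in gawk, and it assumes NUL-terminated replacement text, whereas do_sub() works from stptr/stlen pairs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Expand a sub()/gsub() style replacement template.
 * An unescaped '&' stands for the matched text; "\&" yields a literal '&'.
 * Returns a malloc'd NUL-terminated string, or NULL if allocation fails.
 * Hypothetical helper for illustration; not part of gawk.
 */
static char *expand_replacement(const char *repl, const char *match,
                                size_t match_len)
{
    size_t repl_len = strlen(repl);
    size_t len = 0;
    size_t i;
    char *buf, *bp;

    /* Pass 1: compute the length of the expanded text. */
    for (i = 0; i < repl_len; i++) {
        if (repl[i] == '&') {
            len += match_len;
        } else {
            if (repl[i] == '\\' && i + 1 < repl_len && repl[i + 1] == '&')
                i++;            /* skip the backslash, count the '&' */
            len++;
        }
    }

    buf = malloc(len + 1);
    if (buf == NULL)
        return NULL;
    bp = buf;

    /* Pass 2: copy, substituting the matched text for each bare '&'. */
    for (i = 0; i < repl_len; i++) {
        if (repl[i] == '&') {
            memcpy(bp, match, match_len);
            bp += match_len;
        } else {
            if (repl[i] == '\\' && i + 1 < repl_len && repl[i + 1] == '&')
                i++;            /* emit the '&' literally */
            *bp++ = repl[i];
        }
    }
    *bp = '\0';
    return buf;
}
```

Sizing the buffer in a first pass lets a single allocation hold the whole result, which is the same trade-off do_sub() makes with its emalloc of len + 1 bytes.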
diff --git a/awk5.c b/awk5.c
new file mode 100644
index 00000000..185fecb5
--- /dev/null
+++ b/awk5.c
@@ -0,0 +1,154 @@
+/*
+ * routines for error messages
+ *
+ * Copyright (C) 1988 Free Software Foundation
+ *
+ * $Log: awk5.c,v $
+ * Revision 1.10 88/12/08 11:00:07 david
+ * add $Log$
+ *
+ */
+
+/*
+ * GAWK is distributed in the hope that it will be useful, but WITHOUT ANY
+ * WARRANTY. No author or distributor accepts responsibility to anyone for
+ * the consequences of using it or for whether it serves any particular
+ * purpose or works at all, unless he says so in writing. Refer to the GAWK
+ * General Public License for full details.
+ *
+ * Everyone is granted permission to copy, modify and redistribute GAWK, but
+ * only under the conditions described in the GAWK General Public License. A
+ * copy of this license is supposed to have been given to you along with GAWK
+ * so you can know your rights and responsibilities. It should be in a file
+ * named COPYING. Among other things, the copyright notice and this notice
+ * must be preserved on all copies.
+ *
+ * In other words, go ahead and share GAWK, but don't try to stop anyone else
+ * from sharing it farther. Help stamp out software hoarding!
+ */
+
+#include "awk.h"
+
+int sourceline = 0;
+char *source = NULL;
+
+err(s, argp)
+char *s;
+va_list *argp;
+{
+ char *fmt;
+ int line;
+ char *file;
+
+ (void) fprintf(stderr, "%s: %s ", myname, s);
+ fmt = va_arg(*argp, char *);
+ vfprintf(stderr, fmt, *argp);
+ (void) fprintf(stderr, "\n");
+ line = (int) FNR_node->var_value->numbr;
+ if (line)
+ (void) fprintf(stderr, " input line number %d", line);
+ file = FILENAME_node->var_value->stptr;
+ if (file && strcmp(file, "-") != 0)
+ (void) fprintf(stderr, ", file `%s'", file);
+ (void) fprintf(stderr, "\n");
+ if (sourceline)
+ (void) fprintf(stderr, " source line number %d", sourceline);
+ if (source)
+ (void) fprintf(stderr, ", file `%s'", source);
+ (void) fprintf(stderr, "\n");
+}
+
+/*VARARGS0*/
+void
+msg(va_alist)
+va_dcl
+{
+ va_list args;
+
+ va_start(args);
+ err("", &args);
+ va_end(args);
+}
+
+/*VARARGS0*/
+void
+warning(va_alist)
+va_dcl
+{
+ va_list args;
+
+ va_start(args);
+ err("warning:", &args);
+ va_end(args);
+}
+
+/*VARARGS0*/
+void
+fatal(va_alist)
+va_dcl
+{
+ va_list args;
+ extern char *sourcefile;
+
+ va_start(args);
+ err("fatal error:", &args);
+ va_end(args);
+#ifdef DEBUG
+ abort();
+#endif
+ exit(1);
+}
+
+char *
+safe_malloc(size)
+unsigned size;
+{
+ char *ret;
+
+ ret = malloc(size);
+ if (ret == NULL)
+ fatal("safe_malloc: can't allocate memory (%s)",
+ sys_errlist[errno]);
+ return ret;
+}
+
+#if defined(BSD) && !defined(VPRINTF)
+int
+vsprintf(str, fmt, ap)
+ char *str, *fmt;
+ va_list ap;
+{
+ FILE f;
+ int len;
+
+ f._flag = _IOWRT+_IOSTRG;
+ f._ptr = str;
+ f._cnt = 32767;
+ len = _doprnt(fmt, ap, &f);
+ *f._ptr = 0;
+ return (len);
+}
+
+int
+vfprintf(iop, fmt, ap)
+ FILE *iop;
+ char *fmt;
+ va_list ap;
+{
+ int len;
+
+ len = _doprnt(fmt, ap, iop);
+ return (ferror(iop) ? EOF : len);
+}
+
+int
+vprintf(fmt, ap)
+ char *fmt;
+ va_list ap;
+{
+ int len;
+
+ len = _doprnt(fmt, ap, stdout);
+ return (ferror(stdout) ? EOF : len);
+}
+#endif
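Review note on the error routines in this file: msg(), warning() and fatal() use the pre-ANSI <varargs.h> convention (va_alist, va_dcl, and va_start with one argument) and pull the format string out of the argument list inside err(). For readers more familiar with ISO C, here is a rough sketch of the same prefix-then-message layering using <stdarg.h>. The names vformat_msg and format_warning are hypothetical, the program name "gawk" is hard-coded in place of myname, and the text goes to a caller-supplied buffer rather than stderr so that the result can be inspected.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "<prog>: <prefix> <message>" into buf.
 * Returns the length the full text needs (as vsnprintf does), or -1 on
 * error or if the prefix alone does not fit.
 * Hypothetical helper; it mirrors the role of err() in awk5.c.
 */
static int vformat_msg(char *buf, size_t size, const char *prog,
                       const char *prefix, const char *fmt, va_list ap)
{
    int n, m;

    n = snprintf(buf, size, "%s: %s ", prog, prefix);
    if (n < 0 || (size_t) n >= size)
        return -1;
    m = vsnprintf(buf + n, size - (size_t) n, fmt, ap);
    if (m < 0)
        return -1;
    return n + m;
}

/* ISO C counterpart of warning(): the format is a named parameter. */
static int format_warning(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vformat_msg(buf, size, "gawk", "warning:", fmt, ap);
    va_end(ap);
    return n;
}
```

Passing a va_list down to one shared formatter, as err() does with its va_list pointer, keeps the prefix logic in one place for every severity level.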
diff --git a/awk6.c b/awk6.c
new file mode 100644
index 00000000..8c81c5f5
--- /dev/null
+++ b/awk6.c
@@ -0,0 +1,586 @@
+/*
+ * awk6.c -- Various debugging routines
+ *
+ * Copyright (C) 1986 Free Software Foundation
+ * Written by Jay Fenlason, December 1986
+ *
+ * $Log: awk6.c,v $
+ * Revision 1.8 88/11/22 13:51:34 david
+ * Arnold: changes for case-insensitive matching
+ *
+ * Revision 1.7 88/11/15 10:28:08 david
+ * Arnold: minor cleanup
+ *
+ * Revision 1.6 88/11/01 12:20:46 david
+ * small improvements to debugging code
+ *
+ * Revision 1.5 88/10/17 20:53:37 david
+ * purge FAST
+ *
+ * Revision 1.4 88/05/31 09:56:39 david
+ * oops! fix to last change
+ *
+ * Revision 1.3 88/05/31 09:25:48 david
+ * expunge Node_local_var
+ *
+ * Revision 1.2 88/04/15 13:15:47 david
+ * brought slightly up-to-date
+ *
+ * Revision 1.1 88/04/08 15:14:38 david
+ * Initial revision
+ * Revision 1.5 88/04/08 14:48:39 david changes from
+ * Arnold Robbins
+ *
+ * Revision 1.4 88/03/28 14:13:57 david *** empty log message ***
+ *
+ * Revision 1.3 88/03/18 21:00:15 david Baseline -- hopefully all the
+ * functionality of the new awk added. Just debugging and tuning to do.
+ *
+ * Revision 1.2 87/11/19 14:41:07 david trying to keep it up to date with
+ * changes elsewhere ...
+ *
+ * Revision 1.1 87/10/27 15:23:36 david Initial revision
+ *
+ */
+
+/*
+ * GAWK is distributed in the hope that it will be useful, but WITHOUT ANY
+ * WARRANTY. No author or distributor accepts responsibility to anyone for
+ * the consequences of using it or for whether it serves any particular
+ * purpose or works at all, unless he says so in writing. Refer to the GAWK
+ * General Public License for full details.
+ *
+ * Everyone is granted permission to copy, modify and redistribute GAWK, but
+ * only under the conditions described in the GAWK General Public License. A
+ * copy of this license is supposed to have been given to you along with GAWK
+ * so you can know your rights and responsibilities. It should be in a file
+ * named COPYING. Among other things, the copyright notice and this notice
+ * must be preserved on all copies.
+ *
+ * In other words, go ahead and share GAWK, but don't try to stop anyone else
+ * from sharing it farther. Help stamp out software hoarding!
+ */
+#include "awk.h"
+
+#ifdef DEBUG
+
+extern NODE **fields_arr;
+
+
+/* This is all debugging stuff. Ignore it and maybe it'll go away. */
+
+/*
+ * Some of it could be turned into a really cute trace command, if anyone
+ * wants to.
+ */
+char *nnames[] = {
+ "Illegal Node",
+ "Times", "Divide", "Mod", "Plus", "Minus",
+ "Cond-pair", "Subscript", "Concat",
+ "++Pre", "--Pre", "Post++",
+ "Post--", "Uminus", "Field",
+ "Assign", "*=", "/=", "%=",
+ "+=", "-=",
+ "And", "Or",
+ "Equal", "!=", "Less", "Greater", "<=", ">=",
+ "Not",
+ "Match", "Nomatch",
+ "String", "TmpString", "Number",
+ "Rule_list", "Rule_node", "State_list", "If_branches", "Exp_list",
+ "Param_list",
+ "BEGIN", "END", "IF", "WHILE",
+ "FOR",
+ "arrayfor", "BREAK", "CONTINUE", "PRINT", "PRINTF",
+
+ "next", "exit", "DO", "RETURN", "DELETE",
+ "redirect", "Append", "pipe", "Pipe in",
+ "redirect input", "variable", "Varray",
+ "builtin", "Line-range",
+ "In_Array", "FUNCTION", "function def", "function call",
+ "local variable",
+ "getline", "sub", "gsub", "match", "?:",
+ "^", "^=", "/regexp/", "Str_num",
+ "~~", "!~~",
+};
+
+ptree(n)
+{
+ print_parse_tree((NODE *) n);
+}
+
+pt()
+{
+ int x;
+
+ (void) scanf("%x", &x);
+ printf("0x%x\n", x);
+ print_parse_tree((NODE *) x);
+ fflush(stdout);
+}
+
+static depth = 0;
+
+print_parse_tree(ptr)
+NODE *ptr;
+{
+ if (!ptr) {
+ printf("NULL\n");
+ return;
+ }
+ if ((int) (ptr->type) < 0 || (int) (ptr->type) >= sizeof(nnames) / sizeof(nnames[0])) {
+ printf("(0x%x Type %d??)\n", ptr, ptr->type);
+ return;
+ }
+ printf("(%d)%*s", depth, depth, "");
+ switch ((int) ptr->type) {
+ case (int) Node_val:
+ printf("(0x%x Value ", ptr);
+ if (ptr->flags&STR)
+ printf("str: \"%.*s\" ", ptr->stlen, ptr->stptr);
+ if (ptr->flags&NUM)
+ printf("num: %g", ptr->numbr);
+ printf(")\n");
+ return;
+ case (int) Node_var_array:
+ {
+ struct search *l;
+ NODE **assoc_lookup();
+
+ printf("(0x%x Array)\n", ptr);
+ for (l = assoc_scan(ptr); l; l = assoc_next(l)) {
+ printf("\tindex: ");
+ print_parse_tree(l->retval);
+ printf("\tvalue: ");
+ print_parse_tree(*assoc_lookup(ptr, l->retval));
+ printf("\n");
+ }
+ return;
+ }
+ case Node_param_list:
+ printf("(0x%x Local variable %s)\n", ptr, ptr->param);
+ if (ptr->rnode)
+ print_parse_tree(ptr->rnode);
+ return;
+ }
+ if (ptr->lnode)
+ printf("0x%x = left<--", ptr->lnode);
+ printf("(0x%x %s.%d)", ptr, nnames[(int) (ptr->type)], ptr->type);
+ if (ptr->rnode)
+ printf("-->right = 0x%x", ptr->rnode);
+ printf("\n");
+ depth++;
+ if (ptr->lnode)
+ print_parse_tree(ptr->lnode);
+ switch ((int) ptr->type) {
+ case (int) Node_line_range:
+ case (int) Node_match:
+ case (int) Node_nomatch:
+ case (int) Node_case_match:
+ case (int) Node_case_nomatch:
+ break;
+ case (int) Node_builtin:
+ printf("Builtin: %d\n", ptr->proc);
+ break;
+ case (int) Node_K_for:
+ case (int) Node_K_arrayfor:
+ printf("(%s:)\n", nnames[(int) (ptr->type)]);
+ print_parse_tree(ptr->forloop->init);
+ printf("looping:\n");
+ print_parse_tree(ptr->forloop->cond);
+ printf("doing:\n");
+ print_parse_tree(ptr->forloop->incr);
+ break;
+ default:
+ if (ptr->rnode)
+ print_parse_tree(ptr->rnode);
+ break;
+ }
+ --depth;
+}
+
+
+/*
+ * print out all the variables in the world
+ */
+
+dump_vars()
+{
+ register int n;
+ register HASHNODE *buc;
+
+#ifdef notdef
+ printf("Fields:");
+ dump_fields();
+#endif
+ printf("Vars:\n");
+ for (n = 0; n < HASHSIZE; n++) {
+ for (buc = variables[n]; buc; buc = buc->next) {
+ printf("'%.*s': ", buc->length, buc->name);
+ print_parse_tree(buc->value);
+ /* print_parse_tree(buc->value); */
+ }
+ }
+ printf("End\n");
+}
+
+#ifdef notdef
+dump_fields()
+{
+ register NODE **p;
+ register int n;
+
+ printf("%d fields\n", f_arr_siz);
+ for (n = 0, p = &fields_arr[0]; n < f_arr_siz; n++, p++) {
+ printf("$%d is '", n);
+ print_simple(*p, stdout);
+ printf("'\n");
+ }
+}
+#endif
+
+/* VARARGS1 */
+print_debug(str, n)
+char *str;
+{
+ extern int debugging;
+
+ if (debugging)
+ printf("%s:0x%x\n", str, n);
+}
+
+int indent = 0;
+
+print_a_node(ptr)
+NODE *ptr;
+{
+ NODE *p1;
+ char *str, *str2;
+ int n;
+ HASHNODE *buc;
+
+ if (!ptr)
+ return; /* don't print null ptrs */
+ switch (ptr->type) {
+ case Node_val:
+ if (ptr->flags&NUM)
+ printf("%g", ptr->numbr);
+ else
+ printf("\"%.*s\"", ptr->stlen, ptr->stptr);
+ return;
+ case Node_times:
+ str = "*";
+ goto pr_twoop;
+ case Node_quotient:
+ str = "/";
+ goto pr_twoop;
+ case Node_mod:
+ str = "%";
+ goto pr_twoop;
+ case Node_plus:
+ str = "+";
+ goto pr_twoop;
+ case Node_minus:
+ str = "-";
+ goto pr_twoop;
+ case Node_exp:
+ str = "^";
+ goto pr_twoop;
+ case Node_concat:
+ str = " ";
+ goto pr_twoop;
+ case Node_assign:
+ str = "=";
+ goto pr_twoop;
+ case Node_assign_times:
+ str = "*=";
+ goto pr_twoop;
+ case Node_assign_quotient:
+ str = "/=";
+ goto pr_twoop;
+ case Node_assign_mod:
+ str = "%=";
+ goto pr_twoop;
+ case Node_assign_plus:
+ str = "+=";
+ goto pr_twoop;
+ case Node_assign_minus:
+ str = "-=";
+ goto pr_twoop;
+ case Node_assign_exp:
+ str = "^=";
+ goto pr_twoop;
+ case Node_and:
+ str = "&&";
+ goto pr_twoop;
+ case Node_or:
+ str = "||";
+ goto pr_twoop;
+ case Node_equal:
+ str = "==";
+ goto pr_twoop;
+ case Node_notequal:
+ str = "!=";
+ goto pr_twoop;
+ case Node_less:
+ str = "<";
+ goto pr_twoop;
+ case Node_greater:
+ str = ">";
+ goto pr_twoop;
+ case Node_leq:
+ str = "<=";
+ goto pr_twoop;
+ case Node_geq:
+ str = ">=";
+ goto pr_twoop;
+
+pr_twoop:
+ print_a_node(ptr->lnode);
+ printf("%s", str);
+ print_a_node(ptr->rnode);
+ return;
+
+ case Node_not:
+ str = "!";
+ str2 = "";
+ goto pr_oneop;
+ case Node_field_spec:
+ str = "$(";
+ str2 = ")";
+ goto pr_oneop;
+ case Node_postincrement:
+ str = "";
+ str2 = "++";
+ goto pr_oneop;
+ case Node_postdecrement:
+ str = "";
+ str2 = "--";
+ goto pr_oneop;
+ case Node_preincrement:
+ str = "++";
+ str2 = "";
+ goto pr_oneop;
+ case Node_predecrement:
+ str = "--";
+ str2 = "";
+ goto pr_oneop;
+pr_oneop:
+ printf(str);
+ print_a_node(ptr->subnode);
+ printf(str2);
+ return;
+
+ case Node_expression_list:
+ print_a_node(ptr->lnode);
+ if (ptr->rnode) {
+ printf(",");
+ print_a_node(ptr->rnode);
+ }
+ return;
+
+ case Node_var:
+ for (n = 0; n < HASHSIZE; n++) {
+ for (buc = variables[n]; buc; buc = buc->next) {
+ if (buc->value == ptr) {
+ printf("%.*s", buc->length, buc->name);
+ n = HASHSIZE;
+ break;
+ }
+ }
+ }
+ return;
+ case Node_subscript:
+ print_a_node(ptr->lnode);
+ printf("[");
+ print_a_node(ptr->rnode);
+ printf("]");
+ return;
+ case Node_builtin:
+ printf("some_builtin(");
+ print_a_node(ptr->subnode);
+ printf(")");
+ return;
+
+ case Node_statement_list:
+ printf("{\n");
+ indent++;
+ for (n = indent; n; --n)
+ printf(" ");
+ while (ptr) {
+ print_maybe_semi(ptr->lnode);
+ if (ptr->rnode)
+ for (n = indent; n; --n)
+ printf(" ");
+ ptr = ptr->rnode;
+ }
+ --indent;
+ for (n = indent; n; --n)
+ printf(" ");
+ printf("}\n");
+ for (n = indent; n; --n)
+ printf(" ");
+ return;
+
+ case Node_K_if:
+ printf("if(");
+ print_a_node(ptr->lnode);
+ printf(") ");
+ ptr = ptr->rnode;
+ if (ptr->lnode->type == Node_statement_list) {
+ printf("{\n");
+ indent++;
+ for (p1 = ptr->lnode; p1; p1 = p1->rnode) {
+ for (n = indent; n; --n)
+ printf(" ");
+ print_maybe_semi(p1->lnode);
+ }
+ --indent;
+ for (n = indent; n; --n)
+ printf(" ");
+ if (ptr->rnode) {
+ printf("} else ");
+ } else {
+ printf("}\n");
+ return;
+ }
+ } else {
+ print_maybe_semi(ptr->lnode);
+ if (ptr->rnode) {
+ for (n = indent; n; --n)
+ printf(" ");
+ printf("else ");
+ } else
+ return;
+ }
+ if (!ptr->rnode)
+ return;
+ deal_with_curls(ptr->rnode);
+ return;
+
+ case Node_K_while:
+ printf("while(");
+ print_a_node(ptr->lnode);
+ printf(") ");
+ deal_with_curls(ptr->rnode);
+ return;
+
+ case Node_K_do:
+ printf("do ");
+ deal_with_curls(ptr->rnode);
+ printf("while(");
+ print_a_node(ptr->lnode);
+ printf(") ");
+ return;
+
+ case Node_K_for:
+ printf("for(");
+ print_a_node(ptr->forloop->init);
+ printf(";");
+ print_a_node(ptr->forloop->cond);
+ printf(";");
+ print_a_node(ptr->forloop->incr);
+ printf(") ");
+ deal_with_curls(ptr->forsub);
+ return;
+ case Node_K_arrayfor:
+ printf("for(");
+ print_a_node(ptr->forloop->init);
+ printf(" in ");
+ print_a_node(ptr->forloop->incr);
+ printf(") ");
+ deal_with_curls(ptr->forsub);
+ return;
+
+ case Node_K_printf:
+ printf("printf(");
+ print_a_node(ptr->lnode);
+ printf(")");
+ return;
+ case Node_K_print:
+ printf("print(");
+ print_a_node(ptr->lnode);
+ printf(")");
+ return;
+ case Node_K_next:
+ printf("next");
+ return;
+ case Node_K_break:
+ printf("break");
+ return;
+ case Node_K_delete:
+ printf("delete ");
+ print_a_node(ptr->lnode);
+ return;
+ case Node_func:
+ printf("function %s (", ptr->lnode->param);
+ if (ptr->lnode->rnode)
+ print_a_node(ptr->lnode->rnode);
+ printf(")\n");
+ print_a_node(ptr->rnode);
+ return;
+ case Node_param_list:
+ printf("%s", ptr->param);
+ if (ptr->rnode) {
+ printf(", ");
+ print_a_node(ptr->rnode);
+ }
+ return;
+ default:
+ print_parse_tree(ptr);
+ return;
+ }
+}
+
+print_maybe_semi(ptr)
+NODE *ptr;
+{
+ print_a_node(ptr);
+ switch (ptr->type) {
+ case Node_K_if:
+ case Node_K_for:
+ case Node_K_arrayfor:
+ case Node_statement_list:
+ break;
+ default:
+ printf(";\n");
+ break;
+ }
+}
+
+deal_with_curls(ptr)
+NODE *ptr;
+{
+ int n;
+
+ if (ptr->type == Node_statement_list) {
+ printf("{\n");
+ indent++;
+ while (ptr) {
+ for (n = indent; n; --n)
+ printf(" ");
+ print_maybe_semi(ptr->lnode);
+ ptr = ptr->rnode;
+ }
+ --indent;
+ for (n = indent; n; --n)
+ printf(" ");
+ printf("}\n");
+ } else {
+ print_maybe_semi(ptr);
+ }
+}
+
+NODE *
+do_prvars()
+{
+ dump_vars();
+ return Nnull_string;
+}
+
+NODE *
+do_bp()
+{
+ return Nnull_string;
+}
+
+#endif
diff --git a/awk7.c b/awk7.c
new file mode 100644
index 00000000..ac427498
--- /dev/null
+++ b/awk7.c
@@ -0,0 +1,552 @@
+/*
+ * gawk - routines for dealing with record input and fields
+ *
+ * Copyright (C) 1988 Free Software Foundation
+ *
+ */
+
+/*
+ * GAWK is distributed in the hope that it will be useful, but WITHOUT ANY
+ * WARRANTY. No author or distributor accepts responsibility to anyone for
+ * the consequences of using it or for whether it serves any particular
+ * purpose or works at all, unless he says so in writing. Refer to the GAWK
+ * General Public License for full details.
+ *
+ * Everyone is granted permission to copy, modify and redistribute GAWK, but
+ * only under the conditions described in the GAWK General Public License. A
+ * copy of this license is supposed to have been given to you along with GAWK
+ * so you can know your rights and responsibilities. It should be in a file
+ * named COPYING. Among other things, the copyright notice and this notice
+ * must be preserved on all copies.
+ *
+ * In other words, go ahead and share GAWK, but don't try to stop anyone else
+ * from sharing it farther. Help stamp out software hoarding!
+ */
+
+#include "awk.h"
+
+static int getline_redirect = 0;/* "getline <file" being executed */
+static char *line_buf = NULL; /* holds current input line */
+static int line_alloc = 0; /* current allocation for line_buf */
+
+int field_num; /* save number of field in get_lhs */
+char *field_begin;
+NODE **fields_arr; /* array of pointers to the field nodes */
+NODE node0; /* node for $0 which never gets free'd */
+int node0_valid = 1; /* $(>0) has not been changed yet */
+char f_empty[] = "";
+int parse_high_water = 0; /* field number that we have parsed so far */
+char *parse_extent; /* marks where to restart parse of record */
+char *save_fs = " "; /* save current value of FS when line is read,
+ * to be used in deferred parsing
+ */
+static get_a_record();
+
+init_fields()
+{
+ emalloc(fields_arr, NODE **, sizeof(NODE *), "init_fields");
+ node0.type = Node_val;
+ node0.stref = 0;
+ node0.flags = (STR|PERM); /* never free buf */
+ fields_arr[0] = &node0;
+}
+
+/*
+ * Danger! Must only be called for fields we know have just been blanked, or
+ * fields we know don't exist yet.
+ */
+set_field(num, str, len, dummy)
+int num;
+char *str;
+int len;
+NODE *dummy; /* not used -- just to make interface same as set_element */
+{
+ NODE *n;
+ int t;
+
+ erealloc(fields_arr, NODE **, (num + 1) * sizeof(NODE *), "set_field");
+ /* fill in fields that don't exist */
+ for (t = parse_high_water + 1; t < num; t++)
+ fields_arr[t] = Nnull_string;
+ n = make_string(str, len);
+ fields_arr[num] = n;
+ parse_high_water = num;
+}
+
+/* Someone assigned a value to $(something). Fix up $0 to be right */
+rebuild_record()
+{
+ register int tlen;
+ register NODE *tmp;
+ NODE *ofs;
+ char *ops;
+ register char *cops;
+ register NODE **ptr, **maxp;
+ extern NODE *OFS_node;
+
+ maxp = 0;
+ tlen = 0;
+ ofs = force_string(*get_lhs(OFS_node));
+ deref = 0;
+ ptr = &fields_arr[parse_high_water];
+ while (ptr > &fields_arr[0]) {
+ tmp = force_string(*ptr);
+ tlen += tmp->stlen;
+ if (tmp->stlen && !maxp)
+ maxp = ptr;
+ ptr--;
+ }
+ tlen += ((maxp - fields_arr) - 1) * ofs->stlen;
+ emalloc(ops, char *, tlen + 1, "rebuild_record");
+ cops = ops;
+ for (ptr = &fields_arr[1]; ptr <= maxp; ptr++) {
+ tmp = force_string(*ptr);
+ bcopy(tmp->stptr, cops, tmp->stlen);
+ cops += tmp->stlen;
+ if (ptr != maxp) {
+ bcopy(ofs->stptr, cops, ofs->stlen);
+ cops += ofs->stlen;
+ }
+ }
+ tmp = make_string(ops, tlen);
+ deref = fields_arr[0];
+ do_deref();
+ fields_arr[0] = tmp;
+}
+
+
+/*
+ * This reads in a record from the input file
+ */
+int
+inrec()
+{
+ int cnt;
+ int retval = 0;
+
+ cnt = get_a_record(&line_buf, &line_alloc);
+ if (cnt == EOF) {
+ cnt = 0;
+ retval = 1;
+ } else {
+ if (!getline_redirect) {
+ assign_number(&(NR_node->var_value),
+ NR_node->var_value->numbr + 1.0);
+ assign_number(&(FNR_node->var_value),
+ FNR_node->var_value->numbr + 1.0);
+ }
+ }
+ set_record(line_buf, cnt);
+
+ return retval;
+}
+
+/*
+ * set up $0, but defer parsing rest of line until reference is made to $(>0)
+ * or to NF. At that point, parse only as much as necessary.
+ */
+set_record(buf, cnt)
+char *buf;
+int cnt;
+{
+ char *get_fs();
+
+ assign_number(&(NF_node->var_value), (AWKNUM) -1);
+ parse_high_water = 0;
+ node0_valid = 1;
+ if (buf == line_buf) {
+ deref = fields_arr[0];
+ do_deref();
+ save_fs = get_fs();
+ node0.type = Node_val;
+ node0.stptr = buf;
+ node0.stlen = cnt;
+ node0.stref = 1;
+ node0.flags = (STR|PERM); /* never free buf */
+ fields_arr[0] = &node0;
+ }
+}
+
+NODE **
+get_field(num)
+int num;
+{
+ int n;
+
+ /*
+ * if requesting whole line but some other field has been altered,
+ * then the whole line must be rebuilt
+ */
+ if (num == 0 && node0_valid == 0) {
+ /* first, parse remainder of input record */
+ (void) parse_fields(HUGE-1, &parse_extent,
+ fields_arr[0]->stlen - (parse_extent-fields_arr[0]->stptr),
+ save_fs, set_field, (NODE *)NULL);
+ rebuild_record();
+ parse_high_water = 0;
+ return &fields_arr[0];
+ }
+ if (num <= parse_high_water) /* we have already parsed this field */
+ return &fields_arr[num];
+ if (parse_high_water == 0 && num > 0) /* starting at the beginning */
+ parse_extent = fields_arr[0]->stptr;
+ /*
+ * parse up to num fields, calling set_field() for each, and saving
+ * in parse_extent the point where the parse left off
+ */
+ n = parse_fields(num, &parse_extent,
+ fields_arr[0]->stlen - (parse_extent-fields_arr[0]->stptr),
+ save_fs, set_field, (NODE *)NULL);
+ if (num == HUGE-1)
+ num = n;
+ if (n < num) /* requested field number beyond end of record;
+ * set_field will just extend the number of fields,
+ * with empty fields
+ */
+ set_field(num, f_empty, 0, (NODE *) NULL);
+ /*
+ * if we reached the end of the record, set NF to the number of fields
+ * actually parsed. Note that num might actually refer to a field that
+ * is beyond the end of the record, but we won't set NF to that value at
+ * this point, since this may only be a reference to the field and NF
+ * only gets set if the field is assigned to
+ */
+ if (*parse_extent == '\0')
+ assign_number(&(NF_node->var_value), (AWKNUM) n);
+
+ return &fields_arr[num];
+}
+
+/*
+ * this is called both from get_field() and from do_split()
+ */
+int
+parse_fields(up_to, buf, len, fs, set, n)
+int up_to; /* parse only up to this field number */
+char **buf; /* on input: string to parse; on output: point to start next */
+int len;
+char *fs;
+int (*set) (); /* routine to set the value of the parsed field */
+NODE *n;
+{
+ char *s = *buf;
+ char *field;
+ int field_len;
+ char *scan;
+ char *end = s + len;
+ int NF = parse_high_water;
+
+ if (up_to == HUGE)
+ NF = 0;
+ if (*fs && *(fs + 1) != '\0') { /* fs is a regexp */
+ struct re_registers reregs;
+
+ scan = s;
+ while (re_split(scan, end - scan, fs, &reregs) != -1 &&
+ NF < up_to) {
+ (*set)(++NF, scan, reregs.start[0], n);
+ scan += reregs.end[0];
+ }
+ if (NF != up_to && scan <= end) {
+ (*set)(++NF, scan, end - scan, n);
+ scan = end;
+ }
+ *buf = scan;
+ return (NF);
+ }
+ for (scan = s; scan < end && NF < up_to; scan++) {
+ /*
+ * special case: fs is single space, strip leading
+ * whitespace
+ */
+ if (*fs == ' ') {
+ while ((*scan == ' ' || *scan == '\t') && scan < end)
+ scan++;
+ if (scan >= end)
+ break;
+ }
+ field_len = 0;
+ field = scan;
+ if (*fs == ' ')
+ while (*scan != ' ' && *scan != '\t' && scan < end) {
+ scan++;
+ field_len++;
+ }
+ else {
+ while (*scan != *fs && scan < end) {
+ scan++;
+ field_len++;
+ }
+ if (scan == end-1 && *scan == *fs) {
+ (*set)(++NF, field, field_len, n);
+ field = scan;
+ field_len = 0;
+ }
+ }
+ (*set)(++NF, field, field_len, n);
+ if (scan == end)
+ break;
+ }
+ *buf = scan;
+ return NF;
+}
+
+int
+re_split(buf, len, fs, reregs)
+char *buf, *fs;
+int len;
+struct re_registers *reregs;
+{
+ typedef struct re_pattern_buffer RPAT;
+ static RPAT *rp;
+ static char *last_fs = NULL;
+
+ if (last_fs != NULL && strcmp(fs, last_fs) != 0) { /* fs has changed */
+ free(rp->buffer);
+ free(rp->fastmap);
+ free((char *) rp);
+ free(last_fs);
+ last_fs = NULL;
+ }
+ if (last_fs == NULL) { /* first time */
+ emalloc(rp, RPAT *, sizeof(RPAT), "re_split");
+ bzero((char *) rp, sizeof(RPAT));
+ emalloc(rp->buffer, char *, 8, "re_split");
+ rp->allocated = 8;
+ emalloc(rp->fastmap, char *, 256, "re_split");
+ emalloc(last_fs, char *, strlen(fs) + 1, "re_split");
+ (void) strcpy(last_fs, fs);
+ if (re_compile_pattern(fs, strlen(fs), rp) != NULL)
+ fatal("illegal regular expression for FS: `%s'", fs);
+ }
+ return re_search(rp, buf, len, 0, len, reregs);
+}
+
+static int /* count of chars read or EOF */
+get_a_record(bp, sizep)
+char **bp; /* *bp points to beginning of line on return */
+int *sizep; /* *sizep is current allocation of *bp */
+{
+ register char *buf; /* buffer; realloced if necessary */
+ int bsz; /* current buffer size */
+ register char *cur; /* current position in buffer */
+ register char *buf_end; /* end of buffer */
+ register int rs; /* rs is the current record separator */
+ register int c;
+ extern FILE *input_file;
+
+ bsz = *sizep;
+ buf = *bp;
+ if (!buf) {
+ emalloc(buf, char *, 128, "get_a_record");
+ bsz = 128;
+ }
+ rs = get_rs();
+ buf_end = buf + bsz;
+ cur = buf;
+ while ((c = getc(input_file)) != EOF) {
+ if (rs == 0 && c == '\n' && cur != buf && cur[-1] == '\n') {
+ cur--;
+ break;
+ }
+ else if (c == rs)
+ break;
+ *cur++ = c;
+ if (cur == buf_end) {
+ erealloc(buf, char *, bsz * 2, "get_a_record");
+ cur = buf + bsz;
+ bsz *= 2;
+ buf_end = buf + bsz;
+ }
+ }
+ if (rs == 0 && c == EOF && cur != buf && cur[-1] == '\n')
+ cur--;
+ *cur = '\0';
+ *bp = buf;
+ *sizep = bsz;
+ if (c == EOF && cur == buf)
+ return EOF;
+ return cur - buf;
+}
+
+NODE *
+do_getline(tree)
+NODE *tree;
+{
+ FILE *save_fp;
+ FILE *redirect();
+ int cnt;
+ NODE **lhs;
+ extern NODE **get_lhs();
+ extern FILE *input_file;
+ extern FILE *nextfile();
+
+ if (tree->rnode == NULL && (input_file == NULL || feof(input_file))) {
+ input_file = nextfile();
+ if (input_file == NULL)
+ return tmp_number((AWKNUM) 0.0);
+ }
+ save_fp = input_file;
+ if (tree->rnode != NULL) { /* with redirection */
+ input_file = redirect(tree->rnode);
+ getline_redirect++;
+ }
+ if (tree->lnode == NULL) { /* read in $0 */
+ if (inrec() != 0) {
+ input_file = save_fp;
+ getline_redirect = 0;
+ return tmp_number((AWKNUM) 0.0);
+ }
+ } else { /* read in a named variable */
+ char *s = NULL;
+ int n = 0;
+
+ lhs = get_lhs(tree->lnode);
+ cnt = get_a_record(&s, &n);
+ if (!getline_redirect) {
+ assign_number(&(NR_node->var_value),
+ NR_node->var_value->numbr + 1.0);
+ assign_number(&(FNR_node->var_value),
+ FNR_node->var_value->numbr + 1.0);
+ }
+ if (cnt == EOF) {
+ input_file = save_fp;
+ getline_redirect = 0;
+ free(s);
+ return tmp_number((AWKNUM) 0.0);
+ }
+ *lhs = make_string(s, strlen(s));
+ free(s);
+ /* we may have to regenerate $0 here! */
+ if (field_num == 0)
+ set_record(fields_arr[0]->stptr, fields_arr[0]->stlen);
+ else if (field_num > 0) {
+ node0_valid = 0;
+ if (NF_node->var_value->numbr == -1 &&
+ field_num > NF_node->var_value->numbr)
+ assign_number(&(NF_node->var_value),
+ (AWKNUM) field_num);
+ }
+ field_num = -1;
+ do_deref();
+ }
+ getline_redirect = 0;
+ input_file = save_fp;
+ return tmp_number((AWKNUM) 1.0);
+}
+
+/*
+ * We can't dereference a variable until after we've given it its new value.
+ * This variable points to the value we have to free up
+ */
+NODE *deref;
+
+/*
+ * This returns a POINTER to a node pointer. get_lhs(ptr) is the current
+ * value of the var, or where to store the var's new value
+ */
+
+NODE **
+get_lhs(ptr)
+NODE *ptr;
+{
+ register NODE **aptr;
+ NODE *n;
+ NODE **assoc_lookup();
+ extern NODE *concat_exp();
+
+#ifdef DEBUG
+ if (ptr == NULL)
+ cant_happen();
+#endif
+ deref = NULL;
+ field_num = -1;
+ switch (ptr->type) {
+ case Node_var:
+ case Node_var_array:
+ if (ptr == NF_node && (int) NF_node->var_value->numbr == -1)
+ (void) get_field(HUGE-1); /* parse entire record */
+ deref = ptr->var_value;
+#ifdef DEBUG
+ if (deref->type != Node_val)
+ cant_happen();
+ if (deref->flags == 0)
+ cant_happen();
+#endif
+ return &(ptr->var_value);
+
+ case Node_param_list:
+ n = stack_ptr[ptr->param_cnt];
+#ifdef DEBUG
+ deref = n->var_value;
+ if (deref->type != Node_val)
+ cant_happen();
+ if (deref->flags == 0)
+ cant_happen();
+ deref = 0;
+#endif
+ return &(n->var_value);
+
+ case Node_field_spec:
+ field_num = (int) force_number(tree_eval(ptr->lnode));
+ free_result();
+ if (field_num < 0)
+ fatal("attempt to access field %d", field_num);
+ aptr = get_field(field_num);
+ deref = *aptr;
+ return aptr;
+
+ case Node_subscript:
+ n = ptr->lnode;
+ if (n->type == Node_param_list)
+ n = stack_ptr[n->param_cnt];
+ aptr = assoc_lookup(n, concat_exp(ptr->rnode));
+ deref = *aptr;
+#ifdef DEBUG
+ if (deref->type != Node_val)
+ cant_happen();
+ if (deref->flags == 0)
+ cant_happen();
+#endif
+ return aptr;
+ case Node_func:
+ fatal ("`%s' is a function, assignment is not allowed",
+ ptr->lnode->param);
+ }
+ return 0;
+}
+
+do_deref()
+{
+ if (deref == NULL)
+ return;
+ if (deref == Nnull_string) {
+ deref = 0;
+ return;
+ }
+#ifdef DEBUG
+ if (deref->flags == 0)
+ cant_happen();
+#endif
+ if ((deref->flags & MALLOC) || (deref->flags & TEMP)) {
+#ifdef DEBUG
+ if (deref->flags & PERM)
+ cant_happen();
+#endif
+ if (deref->flags & STR) {
+ if (deref->stref > 0 && deref->stref != 255)
+ deref->stref--;
+ if (deref->stref > 0) {
+ deref = 0;
+ return;
+ }
+ free((char *)(deref->stptr));
+ }
+ deref->stptr = NULL;
+ deref->numbr = -1111111.0;
+ deref->flags = 0;
+ deref->type = Node_illegal;
+ free((char *)deref);
+ }
+ deref = 0;
+}
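Reviewer sketch (not part of the patch): the two single-character cases of parse_fields() above can be illustrated in isolation. This is a minimal sketch under stated assumptions: `struct field` and `split_fields` are hypothetical names, fields are recorded as pointer/length pairs rather than NODEs, and the regexp case (FS longer than one character) and the resumable parse_extent logic are left out.

```c
#include <assert.h>   /* only needed for the usage checks */
#include <stddef.h>
#include <string.h>

/* Hypothetical record of one field: where it starts and how long it is. */
struct field {
    const char *start;
    size_t len;
};

/*
 * Split buf[0..len) into at most max fields.
 * fs == ' ' : runs of blanks and tabs separate fields, and leading or
 *             trailing whitespace produces no empty fields.
 * otherwise : every occurrence of fs separates two fields, so empty
 *             fields (including one after a trailing separator) are kept.
 * Returns the number of fields stored in out[].
 */
size_t split_fields(const char *buf, size_t len, char fs,
                    struct field *out, size_t max)
{
    const char *scan = buf;
    const char *end = buf + len;
    size_t nf = 0;

    if (fs == ' ') {
        while (nf < max) {
            /* skip the whitespace in front of the next field */
            while (scan < end && (*scan == ' ' || *scan == '\t'))
                scan++;
            if (scan >= end)
                break;
            out[nf].start = scan;
            while (scan < end && *scan != ' ' && *scan != '\t')
                scan++;
            out[nf].len = (size_t)(scan - out[nf].start);
            nf++;
        }
        return nf;
    }

    if (len == 0)               /* an empty record has no fields */
        return 0;
    while (nf < max) {
        out[nf].start = scan;
        while (scan < end && *scan != fs)
            scan++;
        out[nf].len = (size_t)(scan - out[nf].start);
        nf++;
        if (scan == end)
            break;
        scan++;                 /* step over the separator */
    }
    return nf;
}
```

With this sketch, `"  foo bar\tbaz "` split on `' '` gives three fields, while `"a:b::c"` split on `':'` gives four, the third being empty.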
diff --git a/awk8.c b/awk8.c
new file mode 100644
index 00000000..923c3f79
--- /dev/null
+++ b/awk8.c
@@ -0,0 +1,256 @@
+/*
+ * routines for associative arrays. SYMBOL is the address of the node (or
+ * other pointer) being dereferenced. SUBS is a number or string used as the
+ * subscript.
+ *
+ * Copyright (C) 1988 Free Software Foundation
+ *
+ */
+
+/*
+ * GAWK is distributed in the hope that it will be useful, but WITHOUT ANY
+ * WARRANTY. No author or distributor accepts responsibility to anyone for
+ * the consequences of using it or for whether it serves any particular
+ * purpose or works at all, unless he says so in writing. Refer to the GAWK
+ * General Public License for full details.
+ *
+ * Everyone is granted permission to copy, modify and redistribute GAWK, but
+ * only under the conditions described in the GAWK General Public License. A
+ * copy of this license is supposed to have been given to you along with GAWK
+ * so you can know your rights and responsibilities. It should be in a file
+ * named COPYING. Among other things, the copyright notice and this notice
+ * must be preserved on all copies.
+ *
+ * In other words, go ahead and share GAWK, but don't try to stop anyone else
+ * from sharing it farther. Help stamp out software hoarding!
+ */
+
+#include "awk.h"
+
+#ifdef DONTDEF
+int primes[] = {31, 61, 127, 257, 509, 1021, 2053, 4099, 8191, 16381};
+#endif
+
+#define ASSOC_HASHSIZE 127
+#define STIR_BITS(n) ((n) << 5 | (((n) >> 27) & 0x1f))
+#define HASHSTEP(old, c) ((old << 1) + c)
+#define MAKE_POS(v) (v & ~0x80000000) /* make number positive */
+
+NODE *
+concat_exp(tree)
+NODE *tree;
+{
+ NODE *r;
+ NODE *n;
+ char *s;
+ unsigned char save;
+ unsigned len;
+ int subseplen;
+ char *subsep;
+ extern NODE *SUBSEP_node;
+
+ if (tree->type != Node_expression_list)
+ return force_string(tree_eval(tree));
+ r = force_string(tree_eval(tree->lnode));
+ if (tree->rnode == NULL)
+ return r;
+ subseplen = SUBSEP_node->lnode->stlen;
+ subsep = SUBSEP_node->lnode->stptr;
+ len = r->stlen + subseplen;
+ emalloc(s, char *, len + 1, "concat_exp");
+ (void) strcpy(s, r->stptr);
+ free_temp(r);
+ tree = tree->rnode;
+ while (tree) {
+ (void) strcat(s, subsep);
+ r = force_string(tree_eval(tree->lnode));
+ len += r->stlen + subseplen;
+ erealloc(s, char *, len + 1, "concat_exp");
+ (void) strcat(s, r->stptr);
+ free_temp(r);
+ tree = tree->rnode;
+ }
+ len -= subseplen;
+ r = tmp_string(s, (int) len);
+ free(s);
+ return r;
+}
+
+/* Flush all the values in symbol[] before doing a split() */
+assoc_clear(symbol)
+NODE *symbol;
+{
+ int i;
+ AHASH *bucket, *next;
+
+ if (symbol->var_array == 0)
+ return;
+ for (i = 0; i < ASSOC_HASHSIZE; i++) {
+ for (bucket = symbol->var_array[i]; bucket; bucket = next) {
+ next = bucket->next;
+ deref = bucket->name;
+ do_deref();
+ deref = bucket->value;
+ do_deref();
+ free((char *) bucket);
+ }
+ symbol->var_array[i] = 0;
+ }
+}
+
+/*
+ * calculate the hash function of the string subs, reduced modulo
+ * ASSOC_HASHSIZE so that it can be used directly as a bucket index
+ */
+static int
+hash_calc(subs)
+NODE *subs;
+{
+ register int hash1 = 0, i;
+
+ subs = force_string(subs);
+ for (i = 0; i < subs->stlen; i++)
+ hash1 = HASHSTEP(hash1, subs->stptr[i]);
+
+ hash1 = MAKE_POS(STIR_BITS((int) hash1)) % ASSOC_HASHSIZE;
+ return (hash1);
+}
+
+/*
+ * locate symbol[subs], given hash of subs
+ */
+static AHASH * /* NULL if not found */
+assoc_find(symbol, subs, hash1)
+NODE *symbol, *subs;
+int hash1;
+{
+ register AHASH *bucket;
+
+ for (bucket = symbol->var_array[hash1]; bucket; bucket = bucket->next) {
+ if (cmp_nodes(bucket->name, subs))
+ continue;
+ return bucket;
+ }
+ return NULL;
+}
+
+/*
+ * test whether the array element symbol[subs] exists or not
+ */
+int
+in_array(symbol, subs)
+NODE *symbol, *subs;
+{
+ register int hash1;
+
+ if (symbol->type == Node_param_list)
+ symbol = stack_ptr[symbol->param_cnt];
+ if (symbol->var_array == 0)
+ return 0;
+ subs = concat_exp(subs);
+ hash1 = hash_calc(subs);
+ if (assoc_find(symbol, subs, hash1) == NULL) {
+ free_temp(subs);
+ return 0;
+ } else {
+ free_temp(subs);
+ return 1;
+ }
+}
+
+/*
+ * Find SYMBOL[SUBS] in the assoc array. Install it with value "" if it
+ * isn't there. Returns a pointer ala get_lhs to where its value is stored
+ */
+NODE **
+assoc_lookup(symbol, subs)
+NODE *symbol, *subs;
+{
+ register int hash1 = 0, i;
+ register AHASH *bucket;
+
+ hash1 = hash_calc(subs);
+
+ if (symbol->var_array == 0) { /* this table really should grow
+ * dynamically */
+ emalloc(symbol->var_array, AHASH **, (sizeof(AHASH *) *
+ ASSOC_HASHSIZE), "assoc_lookup");
+ for (i = 0; i < ASSOC_HASHSIZE; i++)
+ symbol->var_array[i] = 0;
+ symbol->type = Node_var_array;
+ } else {
+ bucket = assoc_find(symbol, subs, hash1);
+ if (bucket != NULL) {
+ free_temp(subs);
+ return &(bucket->value);
+ }
+ }
+ emalloc(bucket, AHASH *, sizeof(AHASH), "assoc_lookup");
+ bucket->symbol = symbol;
+ bucket->name = dupnode(subs);
+ bucket->value = Nnull_string;
+ bucket->next = symbol->var_array[hash1];
+ symbol->var_array[hash1] = bucket;
+ return &(bucket->value);
+}
+
+do_delete(symbol, tree)
+NODE *symbol, *tree;
+{
+ register int hash1 = 0;
+ register AHASH *bucket, *last;
+ NODE *subs;
+
+ if (symbol->var_array == 0)
+ return;
+ subs = concat_exp(tree);
+ hash1 = hash_calc(subs);
+
+ last = NULL;
+ for (bucket = symbol->var_array[hash1]; bucket; last = bucket, bucket = bucket->next)
+ if (cmp_nodes(bucket->name, subs) == 0)
+ break;
+ free_temp(subs);
+ if (bucket == NULL)
+ return;
+ if (last)
+ last->next = bucket->next;
+ else
+ symbol->var_array[hash1] = bucket->next;
+ deref = bucket->name;
+ do_deref();
+ deref = bucket->value;
+ do_deref();
+ free((char *) bucket);
+}
+
+struct search *
+assoc_scan(symbol)
+NODE *symbol;
+{
+ struct search *lookat;
+
+ if (!symbol->var_array)
+ return 0;
+ emalloc(lookat, struct search *, sizeof(struct search), "assoc_scan");
+ lookat->numleft = ASSOC_HASHSIZE;
+ lookat->arr_ptr = symbol->var_array;
+ lookat->bucket = symbol->var_array[0];
+ return assoc_next(lookat);
+}
+
+struct search *
+assoc_next(lookat)
+struct search *lookat;
+{
+ for (; lookat->numleft; lookat->numleft--) {
+ while (lookat->bucket != 0) {
+ lookat->retval = lookat->bucket->name;
+ lookat->bucket = lookat->bucket->next;
+ return lookat;
+ }
+ lookat->bucket = *++(lookat->arr_ptr);
+ }
+ free((char *) lookat);
+ return 0;
+}
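Reviewer sketch (not part of the patch): the bucket computation in hash_calc() above combines HASHSTEP, STIR_BITS and MAKE_POS. The standalone version below is an illustration only. `bucket_index` is a hypothetical name, and it uses unsigned 32-bit arithmetic, whereas the patch uses plain `int` and `char`; the two should agree for ASCII keys on a machine with 32-bit int, but that is an assumption, not something the patch states.

```c
#include <assert.h>   /* only needed for the usage checks */
#include <stddef.h>
#include <stdint.h>

#define ASSOC_HASHSIZE 127      /* same table size as in awk8.c */

/* Map a subscript string of the given length to a bucket number. */
unsigned bucket_index(const char *s, size_t len)
{
    uint32_t h = 0;
    size_t i;

    for (i = 0; i < len; i++)               /* HASHSTEP: shift and add */
        h = (h << 1) + (unsigned char) s[i];
    h = (h << 5) | ((h >> 27) & 0x1f);      /* STIR_BITS: rotate left by 5 */
    h &= 0x7fffffffu;                       /* MAKE_POS: clear the sign bit */
    return (unsigned) (h % ASSOC_HASHSIZE);
}
```

Because the result is always reduced modulo ASSOC_HASHSIZE, every key lands in one of 127 chains no matter how many elements the array holds, which is what the "this table really should grow dynamically" remark in assoc_lookup() refers to.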
diff --git a/awk9.c b/awk9.c
new file mode 100644
index 00000000..64e39907
--- /dev/null
+++ b/awk9.c
@@ -0,0 +1,272 @@
+/*
+ * routines for node management
+ *
+ * Copyright (C) 1988 Free Software Foundation
+ *
+ */
+
+/*
+ * GAWK is distributed in the hope that it will be useful, but WITHOUT ANY
+ * WARRANTY. No author or distributor accepts responsibility to anyone for
+ * the consequences of using it or for whether it serves any particular
+ * purpose or works at all, unless he says so in writing. Refer to the GAWK
+ * General Public License for full details.
+ *
+ * Everyone is granted permission to copy, modify and redistribute GAWK, but
+ * only under the conditions described in the GAWK General Public License. A
+ * copy of this license is supposed to have been given to you along with GAWK
+ * so you can know your rights and responsibilities. It should be in a file
+ * named COPYING. Among other things, the copyright notice and this notice
+ * must be preserved on all copies.
+ *
+ * In other words, go ahead and share GAWK, but don't try to stop anyone else
+ * from sharing it farther. Help stamp out software hoarding!
+ */
+
+#include "awk.h"
+
+AWKNUM
+r_force_number(n)
+NODE *n;
+{
+ double atof();
+
+#ifdef DEBUG
+ if (n == NULL)
+ cant_happen();
+ if (n->type != Node_val)
+ cant_happen();
+ if(n->flags == 0)
+ cant_happen();
+ if (n->flags & NUM)
+ return n->numbr;
+#endif
+ n->numbr = (AWKNUM) atof(n->stptr);
+ n->flags |= NUM;
+ return n->numbr;
+}
+
+NODE *
+r_force_string(s)
+NODE *s;
+{
+ char buf[30];
+ char *fmt;
+
+#ifdef DEBUG
+ if (s == NULL)
+ cant_happen();
+ if (s->type != Node_val)
+ cant_happen();
+ if (s->flags & STR)
+ return s;
+ if (!(s->flags & NUM))
+ cant_happen();
+ if (s->stref != 0)
+ cant_happen();
+#endif
+ s->flags |= STR;
+ fmt = OFMT_node->var_value->stptr;
+ /* integral value */
+ if (STREQ(fmt, "%.6g") && (long) s->numbr == s->numbr)
+ fmt = "%.11g";
+ /* should check validity of user supplied OFMT */
+ (void) sprintf(buf, fmt, s->numbr);
+ s->stlen = strlen(buf);
+ s->stref = 1;
+ emalloc(s->stptr, char *, s->stlen + 1, "force_string");
+ memcpy(s->stptr, buf, s->stlen+1);
+ return s;
+}
+
+/*
+ * This allocates a new node of type ty. Note that this node will not go
+ * away unless freed.
+ */
+NODE *
+newnode(ty)
+NODETYPE ty;
+{
+ register NODE *r;
+
+ emalloc(r, NODE *, sizeof(NODE), "newnode");
+ r->type = ty;
+ r->flags = MALLOC;
+ return r;
+}
+
+/*
+ * Duplicate a node. (For global strings, "duplicate" means crank up the
+ * reference count.) This creates global nodes. . .
+ */
+NODE *
+dupnode(n)
+NODE *n;
+{
+ register NODE *r;
+
+ if (n->flags & TEMP) {
+ n->flags &= ~TEMP;
+ n->flags |= MALLOC;
+ return n;
+ }
+ if ((n->flags & (MALLOC|STR)) == (MALLOC|STR)) {
+ if (n->stref < 255)
+ n->stref++;
+ return n;
+ }
+ emalloc(r, NODE *, sizeof(NODE), "dupnode");
+ *r = *n;
+ r->flags &= ~(PERM|TEMP);
+ r->flags |= MALLOC;
+ if (n->type == Node_val && (n->flags & STR)) {
+ r->stref = 1;
+ emalloc(r->stptr, char *, r->stlen + 1, "dupnode");
+ bcopy(n->stptr, r->stptr, r->stlen);
+ r->stptr[r->stlen] = '\0';
+ }
+ return r;
+}
+
+/* this allocates a node with defined numbr */
+/* This creates global nodes! */
+NODE *
+make_number(x)
+AWKNUM x;
+{
+ register NODE *r;
+
+ r = newnode(Node_val);
+ r->numbr = x;
+ r->flags |= NUM;
+ r->stref = 0;
+ return r;
+}
+
+/*
+ * This creates temporary nodes. They go away quite quickly, so don't use
+ * them for anything important
+ */
+NODE *
+tmp_number(x)
+AWKNUM x;
+{
+ NODE *r;
+
+ r = make_number(x);
+ r->flags |= TEMP;
+ return r;
+}
+
+/*
+ * Make a string node. If len==-1, the string passed in S is supposed to end
+ * with a double quote, but have had the beginning double quote already
+ * stripped off by yylex. If LEN!=-1, we don't care what s ends with. This
+ * creates a global node
+ */
+
+NODE *
+make_string(s, len)
+char *s;
+{
+ register NODE *r;
+ register char *pf, *pt;
+ register int c;
+ int count;
+
+ /*
+ * the aborts are impossible because yylex is supposed to have
+ * already checked for unterminated strings
+ */
+ if (len == -1) { /* Called from yyparse, find our own len */
+ for (pf = pt = s; *pf != '\0' && *pf != '\"';) {
+ c = *pf++;
+ switch (c) {
+ case '\0':
+ cant_happen();
+
+ case '\\':
+ if (*pf == '\0')
+ cant_happen();
+ c = *pf++;
+ switch (c) {
+ case '\\': /* no massagary needed */
+ case '\'':
+ case '\"':
+ break;
+ case '0':
+ case '1':
+ case '2':
+ case '3':
+ case '4':
+ case '5':
+ case '6':
+ case '7':
+#ifdef notdef
+ case '8':
+ case '9':
+#endif
+ c -= '0';
+ count = 1;
+ while (*pf && *pf >= '0' && *pf <= '7') {
+ c = c * 8 + *pf++ - '0';
+ if (++count >= 3)
+ break;
+ }
+ break;
+ case 'b':
+ c = '\b';
+ break;
+ case 'f':
+ c = '\f';
+ break;
+ case 'n':
+ c = '\n';
+ break;
+ case 'r':
+ c = '\r';
+ break;
+ case 't':
+ c = '\t';
+ break;
+ case 'v':
+ c = '\v';
+ break;
+ default:
+ *pt++ = '\\';
+ break;
+ }
+ /* FALL THROUGH */
+ default:
+ *pt++ = c;
+ break;
+ }
+ }
+ if (*pf == '\0')
+ cant_happen(); /* hit the end of the buf */
+ len = pt - s;
+ }
+ r = newnode(Node_val);
+ emalloc(r->stptr, char *, len + 1, "make_string");
+ r->stlen = len;
+ r->stref = 1;
+ bcopy(s, r->stptr, len);
+ r->stptr[len] = '\0'; /* a hack */
+ r->flags = (STR|MALLOC);
+
+ return r;
+}
+
+/* This should be a macro for speed, but the C compiler chokes. */
+/* Read the warning under tmp_number */
+NODE *
+tmp_string(s, len)
+char *s;
+int len;
+{
+ register NODE *r;
+
+ r = make_string(s, len);
+ r->flags |= TEMP;
+ return r;
+}
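Reviewer sketch (not part of the patch): dupnode() above and do_deref() in awk7.c share string storage through the stref counter, which stops counting at 255. The sketch below isolates that convention. The helper names are hypothetical, and it assumes stref is an 8-bit unsigned field, which the 255 limit suggests but which this part of the diff does not show.

```c
#include <assert.h>   /* only needed for the usage checks */

/* Take one more reference; the count sticks once it reaches 255. */
void ref_take(unsigned char *ref)
{
    if (*ref < 255)
        (*ref)++;
}

/*
 * Drop one reference.  Returns 1 when the last reference is gone and
 * the string may be freed, 0 otherwise.  A count of 255 is treated as
 * "too many to track" and is never decremented, so such a string is
 * never freed.
 */
int ref_drop(unsigned char *ref)
{
    if (*ref > 0 && *ref != 255)
        (*ref)--;
    return *ref == 0;
}
```

The sticky value trades a possible leak for safety: once the counter has saturated, the real number of references is unknown, so freeing could leave dangling pointers.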
diff --git a/debug.c b/debug.c
deleted file mode 100644
index 59033abf..00000000
--- a/debug.c
+++ /dev/null
@@ -1,485 +0,0 @@
-/*
- Debug.c -- Various debugging routines
-
- Copyright (C) 1986 Free Software Foundation
- Written by Jay Fenlason, December 1986
-
- */
-
-/*
-GAWK is distributed in the hope that it will be useful, but WITHOUT ANY
-WARRANTY. No author or distributor accepts responsibility to anyone
-for the consequences of using it or for whether it serves any
-particular purpose or works at all, unless he says so in writing.
-Refer to the GAWK General Public License for full details.
-
-Everyone is granted permission to copy, modify and redistribute GAWK,
-but only under the conditions described in the GAWK General Public
-License. A copy of this license is supposed to have been given to you
-along with GAWK so you can know your rights and responsibilities. It
-should be in a file named COPYING. Among other things, the copyright
-notice and this notice must be preserved on all copies.
-
-In other words, go ahead and share GAWK, but don't try to stop
-anyone else from sharing it farther. Help stamp out software hoarding!
-*/
-#include "awk.h"
-#include <stdio.h>
-
-#ifndef FAST
-
-extern NODE **fields_arr;
-extern f_arr_siz;
-
-
-/* This is all debugging stuff. Ignore it and maybe it'll go away. */
-
-/* Some of it could be turned into a really cute trace command, if anyone
- wants to. */
-char *nnames[] = {
- "Illegal Node",
- "Times", "Divide", "Mod", "Plus", "Minus",
- "Cond-pair" /* jfw */, "Subscript", "Concat",
- "++Pre", "--Pre", "Post++",
- "Post--", "Uminus", "Field",
- "Assign", "*=", "/=", "%=",
- "+=", "-=",
- "And", "Or",
- "Equal", "!=", "Less", "Greater", "<=", ">=",
- "Not",
- "Match", "Nomatch",
- "String", "TmpString", "Number",
- "Rule_list", "Rule_node", "State_list", "If_branches", "Exp_list",
- "BEGIN", "END", "IF", "WHILE", "FOR",
- "arrayfor", "BREAK", "CONTINUE", "PRINT", "PRINTF",
- "next", "exit", "redirect", "Append",
- "Pipe", "variable", "Varray", "builtin",
- "Line-range" /*jfw*/,
-};
-
-ptree(n)
-{
- print_parse_tree((NODE *)n);
-}
-
-pt()
-{
- int x;
- scanf("%x",&x);
- printf("0x%x\n",x);
- print_parse_tree((NODE *)x);
- fflush(stdout);
-}
-
-static depth = 0;
-print_parse_tree(ptr)
-NODE *ptr;
-{
- register int n;
-
- if(!ptr) {
- printf("NULL\n");
- return;
- }
- if((int)(ptr->type)<0 || (int)(ptr->type)>sizeof(nnames)/sizeof(nnames[0])) {
- printf("(0x%x Type %d??)\n",ptr,ptr->type);
- return;
- }
- printf("(%d)%*s",depth,depth,"");
- switch((int)ptr->type) {
- case (int)Node_string:
- case (int)Node_temp_string:
- printf("(0x%x String \"%.*s\")\n",ptr,ptr->stlen,ptr->stptr);
- return;
- case (int)Node_number:
- printf("(0x%x Number %g)\n",ptr,ptr->numbr);
- return;
- case (int)Node_var_array:
- printf("(0x%x Array of %d)\n",ptr,ptr->arrsiz);
- for(n=0;n<ptr->arrsiz;n++) {
- printf("'");
- print_simple((ptr->array)[n*2],stdout);
- printf("' is '");
- print_simple((ptr->array)[n*2+1],stdout);
- printf("'\n");
- }
- return;
- }
- if(ptr->lnode) printf("0x%x = left<--",ptr->lnode);
- printf("(0x%x %s.%d)",ptr,nnames[(int)(ptr->type)],ptr->type);
- if(ptr->rnode) printf("-->right = 0x%x",ptr->rnode);
- printf("\n");
- depth++;
- if(ptr->lnode)
- print_parse_tree(ptr->lnode);
- switch((int)ptr->type) {
- case (int)Node_line_range: /* jfw */
- case (int)Node_match:
- case (int)Node_nomatch:
- break;
- case (int)Node_builtin:
- printf("Builtin: %d\n",ptr->proc); /* jfw: was \N */
- break;
- case (int)Node_K_for:
- case (int)Node_K_arrayfor:
- printf("(%s:)\n",nnames[(int)(ptr->type)]);
- print_parse_tree(ptr->forloop->init);
- printf("looping:\n");
- print_parse_tree(ptr->forloop->cond);
- printf("doing:\n");
- print_parse_tree(ptr->forloop->incr);
- break;
- default:
- if(ptr->rnode)
- print_parse_tree(ptr->rnode);
- break;
- }
- --depth;
-}
-#endif
-
-#ifndef FAST
-/*
- * print out all the variables in the world
- */
-
-dump_vars()
-{
- register int n;
- register HASHNODE *buc;
-
- printf("Fields:");
- dump_fields();
- printf("Vars:\n");
- for(n=0;n<HASHSIZE;n++) {
- for(buc=variables[n];buc;buc=buc->next) {
- printf("'%.*s': ",buc->length,buc->name);
- print_simple(buc->value->var_value,stdout);
- printf(":");
- print_parse_tree(buc->value->lnode);
- /* print_parse_tree(buc->value); */
- }
- }
- printf("End\n");
-}
-#endif
-
-#ifndef FAST
-dump_fields()
-{
- register NODE **p;
- register int n;
-
- printf("%d fields\n",f_arr_siz);
- for(n=0,p= &fields_arr[0];n<f_arr_siz;n++,p++) {
- printf("$%d is '",n);
- print_simple(*p,stdout);
- printf("'\n");
- }
-}
-#endif
-
-#ifndef FAST
-/*VARARGS1*/
-print_debug(str,n)
-char *str;
-{
- extern int debugging;
-
- if(debugging)
- printf("%s:%d\n",str,n);
-}
-
-int indent = 0;
-
-print_a_node(ptr)
-NODE *ptr;
-{
- NODE *p1;
- char *str,*str2;
- int n;
- HASHNODE *buc;
-
- if(!ptr) return; /* don't print null ptrs */
- switch(ptr->type) {
- case Node_number:
- printf("%g",ptr->numbr);
- return;
- case Node_string:
- printf("\"%.*s\"",ptr->stlen,ptr->stptr);
- return;
- case Node_times:
- str="*";
- goto pr_twoop;
- case Node_quotient:
- str="/";
- goto pr_twoop;
- case Node_mod:
- str="%";
- goto pr_twoop;
- case Node_plus:
- str="+";
- goto pr_twoop;
- case Node_minus:
- str="-";
- goto pr_twoop;
- case Node_concat:
- str=" ";
- goto pr_twoop;
- case Node_assign:
- str="=";
- goto pr_twoop;
- case Node_assign_times:
- str="*=";
- goto pr_twoop;
- case Node_assign_quotient:
- str="/=";
- goto pr_twoop;
- case Node_assign_mod:
- str="%=";
- goto pr_twoop;
- case Node_assign_plus:
- str="+=";
- goto pr_twoop;
- case Node_assign_minus:
- str="-=";
- goto pr_twoop;
- case Node_and:
- str="&&";
- goto pr_twoop;
- case Node_or:
- str="||";
- goto pr_twoop;
- case Node_equal:
- str="==";
- goto pr_twoop;
- case Node_notequal:
- str="!=";
- goto pr_twoop;
- case Node_less:
- str="<";
- goto pr_twoop;
- case Node_greater:
- str=">";
- goto pr_twoop;
- case Node_leq:
- str="<=";
- goto pr_twoop;
- case Node_geq:
- str=">=";
- goto pr_twoop;
-
- pr_twoop:
- print_a_node(ptr->lnode);
- printf("%s",str);
- print_a_node(ptr->rnode);
- return;
-
- case Node_not:
- str="!";
- str2="";
- goto pr_oneop;
- case Node_field_spec:
- str="$(";
- str2=")";
- goto pr_oneop;
- case Node_postincrement:
- str="";
- str2="++";
- goto pr_oneop;
- case Node_postdecrement:
- str="";
- str2="--";
- goto pr_oneop;
- case Node_preincrement:
- str="++";
- str2="";
- goto pr_oneop;
- case Node_predecrement:
- str="--";
- str2="";
- goto pr_oneop;
- pr_oneop:
- printf(str);
- print_a_node(ptr->subnode);
- printf(str2);
- return;
-
- case Node_expression_list:
- print_a_node(ptr->lnode);
- if(ptr->rnode) {
- printf(",");
- print_a_node(ptr->rnode);
- }
- return;
-
- case Node_var:
- for(n=0;n<HASHSIZE;n++) {
- for(buc=variables[n];buc;buc=buc->next) {
- if(buc->value==ptr) {
- printf("%.*s",buc->length,buc->name);
- n=HASHSIZE;
- break;
- }
- }
- }
- return;
- case Node_subscript:
- print_a_node(ptr->lnode);
- printf("[");
- print_a_node(ptr->rnode);
- printf("]");
- return;
- case Node_builtin:
- printf("some_builtin(");
- print_a_node(ptr->subnode);
- printf(")");
- return;
-
- case Node_statement_list:
- printf("{\n");
- indent++;
- for(n=indent;n;--n)
- printf(" ");
- while(ptr) {
- print_maybe_semi(ptr->lnode);
- if(ptr->rnode)
- for(n=indent;n;--n)
- printf(" ");
- ptr=ptr->rnode;
- }
- --indent;
- for(n=indent;n;--n)
- printf(" ");
- printf("}\n");
- for(n=indent;n;--n)
- printf(" ");
- return;
-
- case Node_K_if:
- printf("if(");
- print_a_node(ptr->lnode);
- printf(") ");
- ptr=ptr->rnode;
- if(ptr->lnode->type==Node_statement_list) {
- printf("{\n");
- indent++;
- for(p1=ptr->lnode;p1;p1=p1->rnode) {
- for(n=indent;n;--n)
- printf(" ");
- print_maybe_semi(p1->lnode);
- }
- --indent;
- for(n=indent;n;--n)
- printf(" ");
- if(ptr->rnode) {
- printf("} else ");
- } else {
- printf("}\n");
- return;
- }
- } else {
- print_maybe_semi(ptr->lnode);
- if(ptr->rnode) {
- for(n=indent;n;--n)
- printf(" ");
- printf("else ");
- } else return;
- }
- if(!ptr->rnode) return;
- deal_with_curls(ptr->rnode);
- return;
-
- case Node_K_for:
- printf("for(");
- print_a_node(ptr->forloop->init);
- printf(";");
- print_a_node(ptr->forloop->cond);
- printf(";");
- print_a_node(ptr->forloop->incr);
- printf(") ");
- deal_with_curls(ptr->forsub);
- return;
- case Node_K_arrayfor:
- printf("for(");
- print_a_node(ptr->forloop->init);
- printf(" in ");
- print_a_node(ptr->forloop->incr);
- printf(") ");
- deal_with_curls(ptr->forsub);
- return;
-
- case Node_K_printf:
- printf("printf(");
- print_a_node(ptr->lnode);
- printf(")");
- return;
- case Node_K_print:
- printf("print(");
- print_a_node(ptr->lnode);
- printf(")");
- return;
- case Node_K_next:
- printf("next");
- return;
- case Node_K_break:
- printf("break");
- return;
- default:
- print_parse_tree(ptr);
- return;
- }
-}
-
-print_maybe_semi(ptr)
-NODE *ptr;
-{
- print_a_node(ptr);
- switch(ptr->type) {
- case Node_K_if:
- case Node_K_for:
- case Node_K_arrayfor:
- case Node_statement_list:
- break;
- default:
- printf(";\n");
- break;
- }
-}
-deal_with_curls(ptr)
-NODE *ptr;
-{
- int n;
-
- if(ptr->type==Node_statement_list) {
- printf("{\n");
- indent++;
- while(ptr) {
- for(n=indent;n;--n)
- printf(" ");
- print_maybe_semi(ptr->lnode);
- ptr=ptr->rnode;
- }
- --indent;
- for(n=indent;n;--n)
- printf(" ");
- printf("}\n");
- } else {
- print_maybe_semi(ptr);
- }
-}
-
-NODE *
-do_prvars()
-{
- dump_vars();
- return Nnull_string;
-}
-
-NODE *
-do_bp()
-{
- return Nnull_string;
-}
-
-#endif
diff --git a/gawk.1 b/gawk.1
new file mode 100644
index 00000000..67a76c3b
--- /dev/null
+++ b/gawk.1
@@ -0,0 +1,1181 @@
+.TH GAWK 1 "Free Software Foundation"
+.SH NAME
+gawk \- pattern scanning and processing language
+.SH SYNOPSIS
+.B gawk
+.ig
+[
+.B \-d
+] [
+.B \-D
+] [
+.B \-i
+] [
+.B \-v
+]
+..
+[
+.BI \-F\^ fs
+]
+.B \-f
+.I program-file
+[
+.B \-f
+.I program-file
+\&.\^.\^. ] [
+.B \-\^\-
+] file .\^.\^.
+.br
+.B gawk
+.ig
+[
+.B \-d
+] [
+.B \-D
+] [
+.B \-i
+] [
+.B \-v
+]
+..
+[
+.BI \-F\^ fs
+] [
+.B \-\^\-
+]
+.I program-text
+file .\^.\^.
+.SH DESCRIPTION
+.I Gawk
+is the GNU Project's implementation of the AWK programming language.
+It conforms to the definition and description of the language in
+.IR "The AWK Programming Language" ,
+by Aho, Kernighan, and Weinberger,
+with the additional features defined in the System V Release 4 version
+of \s-1UNIX\s+1
+.IR awk .
+.PP
+The command line consists of options to
+.I gawk
+itself, the AWK program text (if not supplied via the
+.B \-f
+option), and values to be made
+available in the
+.B ARGC
+and
+.B ARGV
+pre-defined AWK variables.
+.PP
+The options that
+.I gawk
+accepts are:
+.TP
+.BI \-F fs
+Use
+.I fs
+for the input field separator (the value of the
+.B FS
+predefined
+variable). For compatibility with \s-1UNIX\s+1
+.IR awk ,
+if
+.I fs
+is ``t'', then
+.B FS
+will be set to the tab character.
+.TP
+.BI \-f " program-file"
+Read the AWK program source from the file
+.IR program-file ,
+instead of from the first command line argument.
+.TP
+.B \-\^\-
+Signal the end of options. This is useful to allow further arguments to the
+AWK program itself to start with a ``\-''.
+This is mainly for consistency with the argument parsing convention used
+by most other System V programs.
+.PP
+Any other options are flagged as illegal, but are otherwise ignored.
+(However, see the
+.B "GNU EXTENSIONS"
+section, below.)
+.PP
+An AWK program consists of a sequence of pattern-action statements
+and optional function definitions.
+.RS
+.PP
+\fIpattern\fB { \fIaction statements\fB }\fR
+.br
+\fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements\fB }\fR
+.RE
+.PP
+.I Gawk
+first reads the program source from the
+.IR program-file (s)
+if specified, or from the first non-option argument on the command line.
+The
+.B \-f
+option may be used multiple times on the command line.
+.I Gawk
+will read the program text as if all the
+.IR program-file s
+had been concatenated together. This is useful for building libraries
+of AWK functions, without having to include them in each new AWK
+program that uses them. To use a library function in a file from a
+program typed in on the command line, specify
+.B /dev/tty
+as one of the
+.IR program-file s,
+type your program, and end it with a
+.B ^D
+(control-d).
+.PP
+.I Gawk
+compiles the program into an internal form,
+and then proceeds to read
+each file named in the
+.B ARGV
+array.
+If there are no files named on the command line,
+.I gawk
+reads the standard input.
+.PP
+If a ``file'' named on the command line has the form
+.IB var = val
+it is treated as a variable assignment. The variable
+.I var
+will be assigned the value
+.IR val .
+This is most useful for dynamically assigning values to the variables
+AWK uses to control how input is broken into fields and records. It
+is also useful for controlling state if multiple passes are needed over
+a single data file.
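+.PP
+As a rough sketch of the multiple-pass case (the file name
+.B data
+and the variable name
+.B pass
+are hypothetical, chosen only for illustration), the following
+counts the records on the first pass and prints each record
+preceded by that count on the second:
+.RS
+.PP
+.ft B
+.nf
+gawk 'pass == 1 { n++ }
+pass == 2 { print n, $0 }' pass=1 data pass=2 data
+.fi
+.ft R
+.RE
+.PP
+Each assignment takes effect when it is reached in the argument list,
+that is, just before the file that follows it is read.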
+.PP
+For each line in the input,
+.I gawk
+tests to see if it matches any
+.I pattern
+in the AWK program.
+For each pattern that the line matches, the associated
+.I action
+is executed.
+.SH VARIABLES AND FIELDS
+AWK variables are dynamic; they come into existence when they are
+first used. Their values are either floating-point numbers or strings,
+depending upon how they are used. AWK also has single dimension
+arrays; multiply dimensioned arrays may be simulated.
+There are several pre-defined variables that AWK sets as a program
+runs; these will be described as needed and summarized below.
+.PP
+As each input line is read,
+.I gawk
+splits the line into
+.IR fields ,
+using the value of the
+.B FS
+variable as the field separator.
+If
+.B FS
+is a single character, fields are separated by that character.
+Otherwise,
+.B FS
+is expected to be a full regular expression.
+In the special case that
+.B FS
+is a single blank, fields are separated
+by runs of blanks and/or tabs.
+.PP
+Each field in the input line may be referenced by its position,
+.BR $1 ,
+.BR $2 ,
+and so on.
+.B $0
+is the whole line. The value of a field may be assigned to as well.
+Fields need not be referenced by constants:
+.RS
+.PP
+.ft B
+n = 5
+.br
+print $n
+.ft R
+.RE
+.PP
+prints the fifth field in the input line.
+The variable
+.B NF
+is set to the total number of fields in the input line.
+.PP
+References to non-existent fields (i.e., fields after
+.BR $NF )
+produce the null string. However, assigning to a non-existent field
+(e.g.,
+.BR "$(NF+2) = 5" )
+will increase the value of
+.BR NF ,
+create any intervening fields with the null string as their value, and
+cause the value of
+.B $0
+to be recomputed, with the fields being separated by the value of
+.BR OFS .
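+.PP
+As a small illustration, assume the input line is ``a b c'' and
+.B OFS
+has its default value. The statement
+.RS
+.PP
+.ft B
+$(NF+2) = "x"
+.ft R
+.RE
+.PP
+then sets
+.B NF
+to 5, creates
+.B $4
+as the null string, and rebuilds
+.B $0
+so that two blanks separate ``c'' from ``x''.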
+.SS Built-in Variables
+.PP
+AWK's built-in variables are:
+.PP
+.RS
+.TP \l'\fBFILENAME\fR'
+.B ARGC
+the number of command line arguments (does not include options to
+.IR gawk ,
+or the program source).
+.TP \l'\fBFILENAME\fR'
+.B ARGV
+array of command line arguments. The array is indexed from
+0 to
+.B ARGC
+\- 1.
+Dynamically changing the contents of
+.B ARGV
+can control the files used for data.
+.TP \l'\fBFILENAME\fR'
+.B ENVIRON
+An array containing the values of the current environment.
+The array is indexed by the environment variables, each element being
+the value of that variable (e.g., \fBENVIRON["HOME"]\fP might be
+.BR /u/arnold ).
+Changing this array does not affect the environment seen by programs which
+.I gawk
+spawns via redirection or the
+.B system
+function.
+.TP \l'\fBFILENAME\fR'
+.B FILENAME
+the name of the current input file.
+If no files are specified on the command line, the value of
+.B FILENAME
+is ``\-''.
+.TP \l'\fBFILENAME\fR'
+.B FNR
+the input record number in the current input file.
+.TP \l'\fBFILENAME\fR'
+.B FS
+the input field separator, a blank by default.
+.TP \l'\fBFILENAME\fR'
+.B NF
+the number of fields in the current input record.
+.TP \l'\fBFILENAME\fR'
+.B NR
+the total number of input records seen so far.
+.TP \l'\fBFILENAME\fR'
+.B OFMT
+the output format for numbers,
+.B %.6g
+by default.
+.TP \l'\fBFILENAME\fR'
+.B OFS
+the output field separator, a blank by default.
+.TP \l'\fBFILENAME\fR'
+.B ORS
+the output record separator, by default a newline.
+.TP \l'\fBFILENAME\fR'
+.B RS
+the input record separator, by default a newline.
+.B RS
+is exceptional in that only the first character of its string
+value is used for separating records. If
+.B RS
+is set to the null string, then records are separated by
+blank lines.
+When
+.B RS
+is set to the null string, then the newline character always acts as
+a field separator, in addition to whatever value
+.B FS
+may have.
+.TP \l'\fBFILENAME\fR'
+.B RSTART
+the index of the first character matched by
+.BR match() ;
+0 if no match.
+.TP \l'\fBFILENAME\fR'
+.B RLENGTH
+the length of the string matched by
+.BR match() ;
+\-1 if no match.
+.TP \l'\fBFILENAME\fR'
+.B SUBSEP
+the character used to separate multiple subscripts in array
+elements, by default \fB"\e034"\fR.
+.RE
+.SS Arrays
+.PP
+Arrays are subscripted with an expression between square brackets
+.RB ( [ " and " ] ).
+If the expression is an expression list
+.RI ( expr ", " expr " ...)"
+then the array subscript is a string consisting of the
+concatenation of the (string) value of each expression,
+separated by the value of the
+.B SUBSEP
+variable.
+This facility is used to simulate multiply dimensioned
+arrays. For example:
+.PP
+.RS
+.ft B
+i = "A" ;\^ j = "B" ;\^ k = "C"
+.br
+x[i,j,k] = "hello, world\en"
+.ft R
+.RE
+.PP
+assigns the string \fB"hello, world\en"\fR to the element of the array
+.B x
+which is indexed by the string \fB"A\e034B\e034C"\fR. All arrays in AWK
+are associative, i.e. indexed by string values.
+.PP
+The special operator
+.B in
+may be used in an
+.B if
+or
+.B while
+statement to see if an array has an index consisting of a particular
+value.
+.PP
+.RS
+.ft B
+.nf
+if (val in array)
+ print array[val]
+.fi
+.ft
+.RE
+.PP
+If the array has multiple subscripts, use
+.BR "(i, j) in array" .
+.PP
+The
+.B in
+construct may also be used in a
+.B for
+loop to iterate over all the elements of an array.
+.PP
+An element may be deleted from an array using the
+.B delete
+statement.
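+.PP
+The constructs above might be combined as in the following sketch,
+which counts how often each value of the first field occurs
+(the array name
+.B count
+and the index ``root'' are arbitrary choices for illustration):
+.RS
+.PP
+.ft B
+.nf
+{ count[$1]++ }
+END {
+    if ("root" in count)         # membership test
+        delete count["root"]     # remove one element
+    for (k in count)             # visit the remaining indices
+        print k, count[k]
+}
+.fi
+.ft R
+.RE
+.PP
+The order in which the
+.B for
+loop visits the indices should not be relied upon.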
+.SS Variable Typing
+.PP
+Variables and fields
+may be (floating point) numbers, or strings, or both. How the
+value of a variable is interpreted depends upon its context. If used in
+a numeric expression, it will be treated as a number; if used as a string,
+it will be treated as a string.
+.PP
+To force a variable to be treated as a number, add 0 to it; to force it
+to be treated as a string, concatenate it with the null string.
+.PP
+The AWK language defines comparisons as being done numerically if
+possible, otherwise one or both operands are converted to strings and
+a string comparison is performed.
+.PP
+Uninitialized variables have the numeric value 0 and the string value ""
+(the null, or empty, string).
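+.PP
+A brief sketch of the two forcing idioms described above (the variable names
+.B n
+and
+.B s
+are illustrative only):
+.RS
+.PP
+.ft B
+.nf
+n = $1 + 0     # treat the first field as a number
+s = $1 ""      # treat the first field as a string
+.fi
+.ft R
+.RE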
+.SH PATTERNS AND ACTIONS
+AWK is a line-oriented language. The pattern comes first, and then the
+action. Action statements are enclosed in
+.B {
+and
+.BR } .
+Either the pattern may be missing, or the action may be missing, but,
+of course, not both. If the pattern is missing, the action will be
+executed for every single line of input.
+A missing action is equivalent to
+.RS
+.PP
+.B "{ print }"
+.RE
+.PP
+which prints the entire line.
+.PP
+Comments begin with the ``#'' character, and continue until the
+end of the line.
+Blank lines may be used to separate statements.
+Normally, a statement ends with a newline; however, this is not the
+case for lines ending in
+a ``,'', ``{'', ``?'', ``:'', ``&&'', or ``||''.
+Lines ending in
+.B do
+or
+.B else
+also have their statements automatically continued on the following line.
+In other cases, a line can be continued by ending it with a ``\e'',
+in which case the newline will be ignored.
+.PP
+Multiple statements may
+be put on one line by separating them with a ``;''.
+This applies to both the statements within the action part of a
+pattern-action pair (the usual case),
+and to the pattern-action statements themselves.
+.SS Patterns
+AWK patterns may be one of the following:
+.PP
+.RS
+.nf
+.B BEGIN
+.B END
+.BI / "regular expression" /
+.I "relational expression"
+.IB pattern " && " pattern
+.IB pattern " || " pattern
+.IB pattern " ? " pattern " : " pattern
+.BI ( pattern )
+.BI ! " pattern"
+.IB pattern1 ", " pattern2
+.fi
+.RE
+.PP
+.B BEGIN
+and
+.B END
+are two special kinds of patterns which are not tested against
+the input.
+The action parts of all
+.B BEGIN
+patterns are merged as if all the statements had
+been written in a single
+.B BEGIN
+block. They are executed before any
+of the input is read. Similarly, all the
+.B END
+blocks are merged,
+and executed when all the input is exhausted (or when an
+.B exit
+statement is executed).
+.B BEGIN
+and
+.B END
+patterns cannot be combined with other patterns in pattern expressions.
+.B BEGIN
+and
+.B END
+patterns cannot have missing action parts.
+.PP
+For
+.BI / "regular expression" /
+patterns, the associated statement is executed for each input line that matches
+the regular expression.
+Regular expressions are the same as those in
+.IR egrep (1),
+and are summarized below.
+.PP
+A
+.I "relational expression"
+may use any of the operators defined below in the section on actions.
+These generally test whether certain fields match certain regular expressions.
+.PP
+The
+.BR && ,
+.BR || ,
+and
+.B !
+operators are logical AND, logical OR, and logical NOT, respectively, as in C.
+They do short-circuit evaluation, also as in C, and are used for combining
+more primitive pattern expressions. As in most languages, parentheses
+may be used to change the order of evaluation.
+.PP
+The
+.B ?\^:
+operator is like the same operator in C. If the first pattern is true
+then the pattern used for testing is the second pattern, otherwise it is
+the third. Only one of the second and third patterns is evaluated.
+.PP
+The
+.IB pattern1 ", " pattern2
+form of an expression is called a range pattern.
+It matches all input lines starting with a line that matches
+.IR pattern1 ,
+and continuing until a line that matches
+.IR pattern2 ,
+inclusive. It does not combine with any other sort of pattern expression.
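+.PP
+For example, a minimal range pattern that prints the second
+through fourth input records might look like this:
+.RS
+.PP
+.ft B
+NR == 2, NR == 4 { print }
+.ft R
+.RE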
+.SS Regular Expressions
+Regular expressions are the extended kind found in
+.IR egrep .
+They are composed of characters as follows:
+.RS
+.TP \l'[^abc...]'
+.I c
+matches the non-metacharacter
+.IR c .
+.TP \l'[^abc...]'
+.I \ec
+matches the literal character
+.IR c .
+.TP \l'[^abc...]'
+.B .
+matches any character except newline.
+.TP \l'[^abc...]'
+.B ^
+matches the beginning of a line or a string.
+.TP \l'[^abc...]'
+.B $
+matches the end of a line or a string.
+.TP \l'[^abc...]'
+.BI [ abc... ]
+character class, matches any of the characters
+.IR abc... .
+.TP \l'[^abc...]'
+.BI [^ abc... ]
+negated character class, matches any character except
+.I abc...
+and newline.
+.TP \l'[^abc...]'
+.IB r1 | r2
+alternation: matches either
+.I r1
+or
+.IR r2 .
+.TP \l'[^abc...]'
+.I r1r2
+concatenation: matches
+.IR r1 ,
+and then
+.IR r2 .
+.TP \l'[^abc...]'
+.IB r +
+matches one or more
+.IR r 's.
+.TP \l'[^abc...]'
+.IB r *
+matches zero or more
+.IR r 's.
+.TP \l'[^abc...]'
+.IB r ?
+matches zero or one
+.IR r 's.
+.TP \l'[^abc...]'
+.BI ( r )
+grouping: matches
+.IR r .
+.RE
+.SS Actions
+Action statements are enclosed in braces,
+.B {
+and
+.BR } .
+Action statements consist of the usual assignment, conditional, and looping
+statements found in most languages. The operators, control statements,
+and input/output statements
+available are patterned after those in C.
+.PP
+The operators in AWK, in order of increasing precedence, are
+.PP
+.RS
+.TP \l'\fB= += \-= *= /= %= ^=\fR'
+.B "= += \-= *= /= %= ^="
+Assignment. Both absolute assignment
+.BI ( var " = " value )
+and operator-assignment (the other forms) are supported.
+.TP \l'\fB= += \-= *= /= %= ^=\fR'
+.B ?:
+The C conditional expression. This has the form
+.IB expr1 " ? " expr2 " : " expr3\c
+\&. If
+.I expr1
+is true, the value of the expression is
+.IR expr2 ,
+otherwise it is
+.IR expr3 .
+Only one of
+.I expr2
+and
+.I expr3
+is evaluated.
+.TP \l'\fB= += \-= *= /= %= ^=\fR'
+.B ||
+logical OR.
+.TP \l'\fB= += \-= *= /= %= ^=\fR'
+.B &&
+logical AND.
+.TP \l'\fB= += \-= *= /= %= ^=\fR'
+.B "~ !~"
+regular expression match, negated match.
+.TP \l'\fB= += \-= *= /= %= ^=\fR'
+.B "< <= > >= != =="
+the regular relational operators.
+.TP \l'\fB= += \-= *= /= %= ^=\fR'
+.I blank
+string concatenation.
+.TP \l'\fB= += \-= *= /= %= ^=\fR'
+.B "+ \-"
+addition and subtraction.
+.TP \l'\fB= += \-= *= /= %= ^=\fR'
+.B "* / %"
+multiplication, division, and modulus.
+.TP \l'\fB= += \-= *= /= %= ^=\fR'
+.B "+ \- !"
+unary plus, unary minus, and logical negation.
+.TP \l'\fB= += \-= *= /= %= ^=\fR'
+.B ^
+exponentiation (\fB**\fR may also be used, and \fB**=\fR for
+the assignment operator).
+.TP \l'\fB= += \-= *= /= %= ^=\fR'
+.B "++ \-\^\-"
+increment and decrement, both prefix and postfix.
+.TP \l'\fB= += \-= *= /= %= ^=\fR'
+.B $
+field reference.
+.RE
+.PP
+The control statements are
+as follows:
+.PP
+.RS
+.nf
+\fBif (\fIcondition\fB) \fIstatement\fR [ \fBelse\fI statement \fR]
+\fBwhile (\fIcondition\fB) \fIstatement \fR
+\fBdo \fIstatement \fBwhile (\fIcondition\fB)\fR
+\fBfor (\fIexpr1\fB; \fIexpr2\fB; \fIexpr3\fB) \fIstatement\fR
+\fBfor (\fIvar \fBin\fI array\fB) \fIstatement\fR
+\fBbreak\fR
+\fBcontinue\fR
+\fBdelete \fIarray\^\fB[\^\fIindex\^\fB]\fR
+\fBexit\fR [ \fIexpression\fR ]
+\fB{ \fIstatements \fB}
+.fi
+.RE
+.PP
+The input/output statements are as follows:
+.PP
+.RS
+.TP \l'\fBprintf \fIfmt, expr-list\fR'
+.BI close( filename )
+close file (or pipe, see below).
+.TP \l'\fBprintf \fIfmt, expr-list\fR'
+.B getline
+set
+.B $0
+from next input record; set
+.BR NF ,
+.BR NR ,
+.BR FNR .
+.TP \l'\fBprintf \fIfmt, expr-list\fR'
+.BI "getline <" file
+set
+.B $0
+from next record of
+.IR file ;
+set
+.BR NF .
+.TP \l'\fBprintf \fIfmt, expr-list\fR'
+.BI getline " var"
+set
+.I var
+from next input record; set
+.BR NR ,
+.BR FNR .
+.TP \l'\fBprintf \fIfmt, expr-list\fR'
+.BI getline " var" " <" file
+set
+.I var
+from next record of
+.IR file .
+.TP \l'\fBprintf \fIfmt, expr-list\fR'
+.B next
+Stop processing the current input record. The next input record
+is read and processing starts over with the first pattern in the
+AWK program. If the end of the input data is reached, the
+.B END
+block(s), if any, are executed.
+.TP \l'\fBprintf \fIfmt, expr-list\fR'
+.B print
+prints the current record.
+.TP \l'\fBprintf \fIfmt, expr-list\fR'
+.BI print " expr-list"
+prints expressions.
+.TP \l'\fBprintf \fIfmt, expr-list\fR'
+.BI print " expr-list" " >" file
+prints expressions on
+.IR file .
+.TP \l'\fBprintf \fIfmt, expr-list\fR'
+.BI printf " fmt, expr-list"
+format and print.
+.TP \l'\fBprintf \fIfmt, expr-list\fR'
+.BI printf " fmt, expr-list" " >" file
+format and print on
+.IR file .
+.TP \l'\fBprintf \fIfmt, expr-list\fR'
+.BI system( cmd-line )
+execute the command
+.IR cmd-line ,
+and return the exit status.
+(This may not be available on
+systems besides \s-1UNIX\s+1 and \s-1GNU\s+1.)
+.RE
+.PP
+Other input/output redirections are also allowed. For
+.B print
+and
+.BR printf ,
+.BI >> file
+appends output to the
+.IR file ,
+while
+.BI | " command"
+writes on a pipe.
+In a similar fashion,
+.IB command " | getline"
+pipes into
+.BR getline .
+.BR Getline
+will return 0 on end of file, and \-1 on an error.
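+.PP
+Because of these return values, a loop that reads from a pipe is
+best written to test for a result greater than zero.
+The following sketch assumes a
+.IR who (1)
+command is available; the variable names are illustrative:
+.RS
+.PP
+.ft B
+.nf
+BEGIN {
+    while (("who" | getline line) > 0)
+        n++
+    close("who")
+    print n + 0, "users logged in"
+}
+.fi
+.ft R
+.RE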
+.PP
+The AWK versions of the
+.B printf
+and
+.B sprintf
+(see below)
+functions accept the following conversion specification formats:
+.RS
+.TP
+.B %c
+An ASCII character.
+.TP
+.B %d
+A decimal number (the integer part).
+.TP
+.B %e
+A floating point number of the form
+.BR [\-]d.ddddddE[+\^\-]dd .
+.TP
+.B %f
+A floating point number of the form
+.BR [\-]ddd.dddddd .
+.TP
+.B %g
+Use
+.B e
+or
+.B f
+conversion, whichever is shorter, with nonsignificant zeros suppressed.
+.TP
+.B %o
+An unsigned octal number (again, an integer).
+.TP
+.B %s
+A character string.
+.TP
+.B %x
+An unsigned hexadecimal number (an integer).
+.TP
+.B %%
+A single
+.B %
+character; no argument is converted.
+.RE
+.PP
+There are optional, additional parameters that may lie between the
+.B %
+and the control letter:
+.RS
+.TP
+.B \-
+The expression should be left-justified within its field.
+.TP
+.I width
+The field should be padded to this width. If
+.I width
+has a leading zero, then the field will be padded with zeros.
+Otherwise it is padded with blanks.
+.TP
+.BI . prec
+A number indicating the maximum width of strings or digits to the right
+of the decimal point.
+.RE
+.PP
+The dynamic
+.I width
+and
+.I prec
+capabilities of the C library
+.B printf
+routines are not supported.
+However, they may be simulated by using
+the AWK concatenation operation to build up
+a format specification dynamically.
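+.PP
+One way to do this is sketched below; the variable names
+.B w
+and
+.B fmt
+are arbitrary:
+.RS
+.PP
+.ft B
+.nf
+{
+    w = 8
+    fmt = "%" w "d\en"     # builds the string "%8d\en"
+    printf fmt, NR
+}
+.fi
+.ft R
+.RE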
+.PP
+AWK has the following pre-defined arithmetic functions:
+.PP
+.RS
+.TP \l'\fBsrand(\fIexpr\fB)\fR'
+.BI atan2( y , " x" )
+returns the arctangent of
+.I y/x
+in radians.
+.TP \l'\fBsrand(\fIexpr\fB)\fR'
+.BI cos( expr )
+returns the cosine of
+.IR expr ,
+which is in radians.
+.TP \l'\fBsrand(\fIexpr\fB)\fR'
+.BI exp( expr )
+the exponential function.
+.TP \l'\fBsrand(\fIexpr\fB)\fR'
+.BI int( expr )
+truncates to integer.
+.TP \l'\fBsrand(\fIexpr\fB)\fR'
+.BI log( expr )
+the natural logarithm function.
+.TP \l'\fBsrand(\fIexpr\fB)\fR'
+.B rand()
+returns a random number between 0 and 1.
+.TP \l'\fBsrand(\fIexpr\fB)\fR'
+.BI sin( expr )
+returns the sine of
+.IR expr ,
+which is in radians.
+.TP \l'\fBsrand(\fIexpr\fB)\fR'
+.BI sqrt( expr )
+the square root function.
+.TP \l'\fBsrand(\fIexpr\fB)\fR'
+.BI srand( expr )
+use
+.I expr
+as a new seed for the random number generator. If no
+.I expr
+is provided, the time of day will be used.
+The return value is the previous seed for the random
+number generator.
+.RE
+.PP
+AWK has the following pre-defined string functions:
+.PP
+.RS
+.TP \l'\fBsprintf(\fIfmt\fB, \fIexpr-list\fB)\fR'
+\fBgsub(\fIr\fB, \fIs\fB, \fIt\fB)\fR
+for each substring matching the regular expression
+.I r
+in the string
+.IR t ,
+substitute the string
+.IR s ,
+and return the number of substitutions.
+If
+.I t
+is not supplied, use
+.BR $0 .
+.TP \l'\fBsprintf(\fIfmt\fB, \fIexpr-list\fB)\fR'
+.BI index( s , " t" )
+returns the index of the string
+.I t
+in the string
+.IR s ,
+or 0 if
+.I t
+is not present.
+.TP \l'\fBsprintf(\fIfmt\fB, \fIexpr-list\fB)\fR'
+.BI length( s )
+returns the length of the string
+.IR s .
+.TP \l'\fBsprintf(\fIfmt\fB, \fIexpr-list\fB)\fR'
+.BI match( s , " r" )
+returns the position in
+.I s
+where the regular expression
+.I r
+occurs, or 0 if
+.I r
+is not present, and sets the values of
+.B RSTART
+and
+.BR RLENGTH .
+.TP \l'\fBsprintf(\fIfmt\fB, \fIexpr-list\fB)\fR'
+\fBsplit(\fIs\fB, \fIa\fB, \fIr\fB)\fR
+splits the string
+.I s
+into the array
+.I a
+on the regular expression
+.IR r ,
+and returns the number of fields. If
+.I r
+is omitted,
+.B FS
+is used instead.
+.TP \l'\fBsprintf(\fIfmt\fB, \fIexpr-list\fB)\fR'
+.BI sprintf( fmt , " expr-list" )
+prints
+.I expr-list
+according to
+.IR fmt ,
+and returns the resulting string.
+.TP \l'\fBsprintf(\fIfmt\fB, \fIexpr-list\fB)\fR'
+\fBsub(\fIr\fB, \fIs\fB, \fIt\fB)\fR
+this is just like
+.BR gsub ,
+but only the first matching substring is replaced.
+.TP \l'\fBsprintf(\fIfmt\fB, \fIexpr-list\fB)\fR'
+\fBsubstr(\fIs\fB, \fIi\fB, \fIn\fB)\fR
+returns the
+.IR n -character
+substring of
+.I s
+starting at
+.IR i .
+If
+.I n
+is omitted, the rest of
+.I s
+is used.
+.RE
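+.PP
+As an illustration of how
+.BR match ,
+.BR RSTART ,
+.BR RLENGTH ,
+and
+.B substr
+work together, this sketch prints the first run of digits found
+in each input record, if there is one:
+.RS
+.PP
+.ft B
+.nf
+{
+    if (match($0, /[0-9]+/))
+        print substr($0, RSTART, RLENGTH)
+}
+.fi
+.ft R
+.RE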
+.PP
+String constants in AWK are sequences of characters enclosed
+between double quotes (\fB"\fR). Within strings, certain
+.I "escape sequences"
+are recognized, as in C. These are:
+.PP
+.RS
+.TP \l'\fB\e\fIddd\fR'
+.B \eb
+backspace.
+.TP \l'\fB\e\fIddd\fR'
+.B \ef
+form-feed.
+.TP \l'\fB\e\fIddd\fR'
+.B \en
+new line.
+.TP \l'\fB\e\fIddd\fR'
+.B \er
+carriage return.
+.TP \l'\fB\e\fIddd\fR'
+.B \et
+horizontal tab.
+.TP \l'\fB\e\fIddd\fR'
+.B \ev
+vertical tab.
+.TP \l'\fB\e\fIddd\fR'
+.BI \e ddd
+The character represented by the 1-, 2-, or 3-digit sequence of octal
+digits. E.g. "\e033" is the ASCII ESC (escape) character.
+.RE
+.SH FUNCTIONS
+Functions in AWK are defined as follows:
+.PP
+.RS
+\fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements \fB}\fR
+.RE
+.PP
+Functions are executed when called from within the action parts of regular
+pattern-action statements. Actual parameters supplied in the function
+call are used to instantiate the formal parameters declared in the function.
+Arrays are passed by reference, other variables are passed by value.
+.PP
+Since functions were not originally part of the AWK language, the provision
+for local variables is rather clumsy: they are declared as extra parameters
+in the parameter list. The convention is to separate local variables from
+real parameters by extra spaces in the parameter list. For example:
+.PP
+.RS
+.ft B
+.nf
+function f(p, q, a, b) { # a & b are local
+ ..... }
+
+/abc/ { ... ; f(1, 2) ; ... }
+.fi
+.ft R
+.RE
+.PP
+The left parenthesis in a function call is required
+to immediately follow the function name,
+without any intervening white space.
+This is to avoid a syntactic ambiguity with the concatenation operator.
+This restriction does not apply to the built-in functions listed above.
+.PP
+Functions may call each other and may be recursive.
+Function parameters used as local variables are initialized
+to the null string and the number zero upon function invocation.
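+.PP
+A short sketch of a recursive function with one local variable
+follows. The names
+.B fact
+and
+.B r
+are illustrative, and the
+.B return
+statement is the one described in the AWK book:
+.RS
+.PP
+.ft B
+.nf
+function fact(n,    r) {    # r is local
+    if (n <= 1)
+        return 1
+    r = n * fact(n - 1)
+    return r
+}
+
+{ print fact($1) }
+.fi
+.ft R
+.RE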
+.PP
+The word
+.B func
+may be used in place of
+.BR function .
+.SH EXAMPLES
+.nf
+Print and sort the login names of all users:
+
+.ft B
+ BEGIN { FS = ":" }
+ { print $1 | "sort" }
+
+.ft R
+Count lines in a file:
+
+.ft B
+ { nlines++ }
+ END { print nlines }
+
+.ft R
+Precede each line by its number in the file:
+
+.ft B
+ { print FNR, $0 }
+
+.ft R
+Concatenate and line number (a variation on a theme):
+
+.ft B
+ { print NR, $0 }
+.ft R
+.SH SEE ALSO
+.IR "The AWK Programming Language" ,
+Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger,
+Addison-Wesley, 1988. ISBN 0-201-07981-X.
+.SH SYSTEM V RELEASE 4 COMPATIBILITY
+A primary goal for
+.I gawk
+is compatibility with the latest version of \s-1UNIX\s+1
+.IR awk .
+To this end,
+.I gawk
+incorporates the following user visible
+features which are not described in the AWK book,
+but are part of
+.I awk
+in System V Release 4.
+.PP
+When processing arguments,
+.I gawk
+uses the special option ``\fB\-\^\-\fP'' to signal the end of
+options, and warns about, but otherwise ignores, undefined options.
+.PP
+The AWK book does not define the return value of
+.BR srand() .
+The System V Release 4 version of \s-1UNIX\s+1
+.I awk
+has it return the seed it was using, to allow keeping track
+of random number sequences. Therefore
+.B srand()
+in
+.I gawk
+also returns its current seed.
+.PP
+The use of multiple
+.B \-f
+options is a new feature, as is the
+.B ENVIRON
+array.
+.SH GNU EXTENSIONS
+.I Gawk
+has some extensions to System V
+.IR awk .
+They are described in this section.
+All features described in this section may change at some time in
+the future, or may go away entirely. They can be disabled either by
+compiling
+.I gawk
+with
+.BR \-DSTRICT ,
+or by invoking
+.I gawk
+with the name
+.IR awk .
+You should not write programs that depend upon them.
+.PP
+The environment variable
+.B AWKPATH
+specifies a search path to use when finding source files named with
+the
+.B \-f
+option. If this variable does not exist, the default path is
+\fB".:/usr/lib/awk:/usr/local/lib/awk"\fR.
+If a file name given to the
+.B \-f
+option contains a ``/'' character, no path search is performed.
+.PP
+Two new relational operators are defined,
+.BR ~~ ,
+and
+.BR !~~ .
+These perform case independent regular expression match and no-match
+operations, respectively.
+.PP
+The AWK book does not define the return value of the
+.B close
+function.
+.IR Gawk\^ 's
+.B close
+returns the value from
+.IR fclose (3),
+or
+.IR pclose (3),
+when closing a file or pipe, respectively.
+.PP
+.I Gawk
+accepts the following additional options:
+.ig
+.TP
+.B \-D
+Turn on general debugging and turn on
+.IR yacc (1)
+or
+.IR bison (1)
+debugging output during program parsing.
+This option should only be of interest to the
+.I gawk
+maintainers, and may not even be compiled into
+.IR gawk .
+.TP
+.B \-d
+Turn on general debugging and print the
+.I gawk
+internal tree as the program is executed.
+This option should only be of interest to the
+.I gawk
+maintainers, and may not even be compiled into
+.IR gawk .
+..
+.TP
+.B \-i
+Ignore case when doing regular expression operations.
+This causes
+.B ~
+and
+.B !~
+to behave like the new operators
+.B ~~
+and
+.BR !~~ ,
+described above.
+.TP
+.B \-v
+Print version information for this particular copy of
+.I gawk
+on the error output.
+This is useful mainly for knowing if the current copy of
+.I gawk
+on your system
+is up to date with respect to whatever the Free Software Foundation
+is distributing.
+.SH BUGS
+The
+.B \-F
+option is not necessary given the command line variable assignment feature;
+it remains only for backwards compatibility.
+.SH AUTHORS
+The original version of \s-1UNIX\s+1
+.I awk
+was designed and implemented by Alfred Aho,
+Peter Weinberger, and Brian Kernighan of AT&T Bell Labs. Brian Kernighan
+continues to maintain and enhance it.
+.PP
+Paul Rubin and Jay Fenlason, with John Woods,
+all of the Free Software Foundation, wrote
+.IR gawk ,
+to be compatible with the original version of
+.I awk
+distributed in Seventh Edition \s-1UNIX\s+1.
+David Trueman of Dalhousie University, with contributions
+from Arnold Robbins at Emory University, made
+.I gawk
+compatible with the new version of \s-1UNIX\s+1
+.IR awk .
+.SH ACKNOWLEDGEMENTS
+Brian Kernighan of Bell Labs
+provided valuable assistance during testing and debugging.
+We thank him.
diff --git a/obstack.c b/obstack.c
deleted file mode 100644
index 66148106..00000000
--- a/obstack.c
+++ /dev/null
@@ -1,157 +0,0 @@
-/* obstack.c - subroutines used implicitly by object stack macros
- Copyright (c) 1986 Free Software Foundation, Inc.
-
- NO WARRANTY
-
- BECAUSE THIS PROGRAM IS LICENSED FREE OF CHARGE, WE PROVIDE ABSOLUTELY
-NO WARRANTY, TO THE EXTENT PERMITTED BY APPLICABLE STATE LAW. EXCEPT
-WHEN OTHERWISE STATED IN WRITING, FREE SOFTWARE FOUNDATION, INC,
-RICHARD M. STALLMAN AND/OR OTHER PARTIES PROVIDE THIS PROGRAM "AS IS"
-WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING,
-BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
-FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY
-AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE
-DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR
-CORRECTION.
-
- IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW WILL RICHARD M.
-STALLMAN, THE FREE SOFTWARE FOUNDATION, INC., AND/OR ANY OTHER PARTY
-WHO MAY MODIFY AND REDISTRIBUTE THIS PROGRAM AS PERMITTED BELOW, BE
-LIABLE TO YOU FOR DAMAGES, INCLUDING ANY LOST PROFITS, LOST MONIES, OR
-OTHER SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
-USE OR INABILITY TO USE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR
-DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY THIRD PARTIES OR
-A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS) THIS
-PROGRAM, EVEN IF YOU HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH
-DAMAGES, OR FOR ANY CLAIM BY ANY OTHER PARTY.
-
- GENERAL PUBLIC LICENSE TO COPY
-
- 1. You may copy and distribute verbatim copies of this source file
-as you receive it, in any medium, provided that you conspicuously and
-appropriately publish on each copy a valid copyright notice "Copyright
-(C) 1986 Free Software Foundation, Inc."; and include following the
-copyright notice a verbatim copy of the above disclaimer of warranty
-and of this License.
-
- 2. You may modify your copy or copies of this source file or
-any portion of it, and copy and distribute such modifications under
-the terms of Paragraph 1 above, provided that you also do the following:
-
- a) cause the modified files to carry prominent notices stating
- that you changed the files and the date of any change; and
-
- b) cause the whole of any work that you distribute or publish,
- that in whole or in part contains or is a derivative of this
- program or any part thereof, to be freely distributed
- and licensed to all third parties on terms identical to those
- contained in this License Agreement (except that you may choose
- to grant more extensive warranty protection to third parties,
- at your option).
-
- 3. You may copy and distribute this program or any portion of it in
-compiled, executable or object code form under the terms of Paragraphs
-1 and 2 above provided that you do the following:
-
- a) cause each such copy to be accompanied by the
- corresponding machine-readable source code, which must
- be distributed under the terms of Paragraphs 1 and 2 above; or,
-
- b) cause each such copy to be accompanied by a
- written offer, with no time limit, to give any third party
- free (except for a nominal shipping charge) a machine readable
- copy of the corresponding source code, to be distributed
- under the terms of Paragraphs 1 and 2 above; or,
-
- c) in the case of a recipient of this program in compiled, executable
- or object code form (without the corresponding source code) you
- shall cause copies you distribute to be accompanied by a copy
- of the written offer of source code which you received along
- with the copy you received.
-
- 4. You may not copy, sublicense, distribute or transfer this program
-except as expressly provided under this License Agreement. Any attempt
-otherwise to copy, sublicense, distribute or transfer this program is void and
-your rights to use the program under this License agreement shall be
-automatically terminated. However, parties who have received computer
-software programs from you with this License Agreement will not have
-their licenses terminated so long as such parties remain in full compliance.
-*/
-
-#include <obstack.h>
-
-void
-_obstack_begin (h, chunkfun)
- struct obstack *h;
- int (*chunkfun) ();
-{
- register _Ll* chunk; /* points to new chunk */
- chunk = h->chunk =
- (_Ll*) (*chunkfun) (h->chunk_size);
- h->next_free = h->object_base = chunk->obstack_l_0;
- h->chunk_limit = chunk->obstack_l_limit
- = (char *) chunk + h->chunk_size;
- chunk->obstack_l_prev = 0;
-}
-
-/* Allocate a new current chunk for the obstack *H
- on the assumption that LENGTH bytes need to be added
- to the current object, or a new object of length LENGTH allocated.
- Copies any partial object from the end of the old chunk
- to the beginning of the new one. */
-
-void
-_obstack_newchunk (h, chunkfun, length)
- struct obstack *h;
- int (*chunkfun) ();
- int length;
-{
- register _Ll* old_chunk = h->chunk;
- register _Ll* new_chunk;
- register long new_size;
- register int obj_size = h->next_free - h->object_base;
-
- /* Compute size for new chunk. */
- new_size = (obj_size + length) << 1;
- if (new_size < h->chunk_size)
- new_size = h->chunk_size;
-
- /* Allocate and initialize the new chunk. */
- new_chunk = h->chunk = (_Ll*) (*chunkfun) (new_size);
- new_chunk->obstack_l_prev = old_chunk;
- new_chunk->obstack_l_limit = h->chunk_limit = (char *) new_chunk + new_size;
-
- /* Move the existing object to the new chunk. */
- bcopy (h->object_base, new_chunk->obstack_l_0, obj_size);
- h->object_base = new_chunk->obstack_l_0;
- h->next_free = h->object_base + obj_size;
- };
-
-void
-_obstack_free (h, freechunkfun, obj)
- struct obstack *h;
- void (*freechunkfun) ();
- char *obj;
-{
- register _Ll* lp; /* below addr of any objects in this chunk */
- register _Ll* plp; /* point to previous chunk if any */
-
- lp = (h)->chunk;
- while (lp != 0 && ((char *)lp > obj || (h)->chunk_limit < obj))
- {
- plp = lp -> obstack_l_prev;
- (*freechunkfun) (lp);
- if(lp==plp)
- plp=0;
- lp = plp;
- }
- if (lp)
- {
- (h)->object_base = (h)->next_free = (char *)(obj);
- (h)->chunk_limit = lp->obstack_l_limit;
- (h)->chunk = lp;
- }
- else if (obj != 0)
- /* obj is not in any of the chunks! */
- abort ();
-}
diff --git a/obstack.h b/obstack.h
index 772a5baa..e69de29b 100644
--- a/obstack.h
+++ b/obstack.h
@@ -1,204 +0,0 @@
-/* obstack.h - object stack macros
- Copyright (c) 1986 Free Software Foundation, Inc.
-
-Summary:
-
-All the apparent functions defined here are macros. The idea
-is that you would use these pre-tested macros to solve a
-very specific set of problems, and they would run fast.
-Caution: no side-effects in arguments please!! They may be
-evaluated MANY times!!
-
-These macros operate a stack of objects. Each object starts life
-small, and may grow to maturity. (Consider building a word syllable
-by syllable.) An object can move while it is growing. Once it has
-been "finished" it never changes address again. So the "top of the
-stack" is typically an immature growing object, while the rest of the
-stack is of mature, fixed size and fixed address objects.
-
-These routines grab large chunks of memory, using a function you
-supply, called `obstack_chunk_alloc'. On occasion, they free chunks,
-by calling `obstack_chunk_free'. You must define them and declare
-them before using any obstack macros.
-
-Each independent stack is represented by a `struct obstack'.
-Each of the obstack macros expects a pointer to such a structure
-as the first argument.
-
-One motivation for this package is the problem of growing char strings
-in symbol tables. Unless you are "fascist pig with a read-only mind"
-[Gosper's immortal quote from HAKMEM item 154, out of context] you
-would not like to put any arbitrary upper limit on the length of your
-symbols.
-
-In practice this often means you will build many short symbols and a
-few long symbols. At the time you are reading a symbol you don't know
-how long it is. One traditional method is to read a symbol into a
-buffer, realloc()ating the buffer every time you try to read a symbol
-that is longer than the buffer. This is beaut, but you still will
-want to copy the symbol from the buffer to a more permanent
-symbol-table entry say about half the time.
-
-With obstacks, you can work differently. Use one obstack for all symbol
-names. As you read a symbol, grow the name in the obstack gradually.
-When the name is complete, finalize it. Then, if the symbol exists already,
-free the newly read name.
-
-The way we do this is to take a large chunk, allocating memory from
-low addresses. When you want to build a symbol in the chunk you just
-add chars above the current "high water mark" in the chunk. When you
-have finished adding chars, because you got to the end of the symbol,
-you know how long the chars are, and you can create a new object.
-Mostly the chars will not burst over the highest address of the chunk,
-because you would typically expect a chunk to be (say) 100 times as
-long as an average object.
-
-In case that isn't clear, when we have enough chars to make up
-the object, THEY ARE ALREADY CONTIGUOUS IN THE CHUNK (guaranteed)
-so we just point to it where it lies. No moving of chars is
-needed and this is the second win: potentially long strings need
-never be explicitly shuffled. Once an object is formed, it does not
-change its address during its lifetime.
-
-When the chars burst over a chunk boundary, we allocate a larger
-chunk, and then copy the partly formed object from the end of the old
-chunk to the beginning of the new larger chunk. We then carry on
-accreting characters to the end of the object as we normally would.
-
-A special macro is provided to add a single char at a time to a
-growing object. This allows the use of register variables, which
-break the ordinary 'growth' macro.
-
-Summary:
- We allocate large chunks.
- We carve out one object at a time from the current chunk.
- Once carved, an object never moves.
- We are free to append data of any size to the currently
- growing object.
- Exactly one object is growing in an obstack at any one time.
- You can run one obstack per control block.
- You may have as many control blocks as you dare.
- Because of the way we do it, you can `unwind' an obstack
- back to a previous state. (You may remove objects much
- as you would with a stack.)
-*/
-
-#ifndef obstackH
-#define obstackH
- /* these #defines keep it brief */
-#define _Ll struct obstack_chunk
-#define _LL (8) /* _L length in chars */
-
-struct obstack_chunk /* Lives at front of each chunk. */
-{
- char *obstack_l_limit; /* 1 past end of this chunk */
- _Ll *obstack_l_prev; /* address of prior chunk or NULL */
- char obstack_l_0[4]; /* objects begin here */
-};
-
-#if 0
-This function, called like malloc but not returning on failure,
-must return a chunk of the size given to it as argument,
-aligned on a boundary of 2**OBSTACK_LOG_DEFAULT_ALIGNMENT bytes.
-
-struct obstack_chunk * obstack_chunk_alloc();
-#endif /* 0 */
-
-struct obstack /* control current object in current chunk */
-{
- long chunk_size; /* preferred size to allocate chunks in */
- _Ll* chunk; /* address of current struct obstack_chunk */
- char *object_base; /* address of object we are building */
- char *next_free; /* where to add next char to current object */
- char *chunk_limit; /* address of char after current chunk */
- int temp; /* Temporary for some macros. */
- int alignment_mask; /* Mask of alignment for each object. */
-};
-
-/* Pointer to beginning of object being allocated or to be allocated next.
- Note that this might not be the final address of the object
- because a new chunk might be needed to hold the final size. */
-
-#define obstack_base(h) ((h)->object_base)
-
-/* Pointer to next byte not yet allocated in current chunk. */
-
-#define obstack_next_free(h) ((h)->next_free)
-
-/* Size of object currently growing */
-
-#define obstack_object_size(h) ((h)->next_free - (h)->object_base)
-
-/* Mask specifying low bits that should be clear in address of an object. */
-
-#define obstack_alignment_mask(h) ((h)->alignment_mask)
-
-#define obstack_init(h) obstack_begin (h, 4096 - 4 - _LL)
-
-#define obstack_begin(h,try_length) \
-((h)->chunk_size = (try_length) + (_LL), \
- (h)->alignment_mask = ((1 << 2) - 1), \
- _obstack_begin ((h), obstack_chunk_alloc))
-
-#define obstack_grow(h,where,length) \
-( (h)->temp = (length), \
- (((h)->next_free + (h)->temp > (h)->chunk_limit) \
- ? _obstack_newchunk ((h), obstack_chunk_alloc, (h)->temp) : 0), \
- bcopy (where, (h)->next_free, (h)->temp), \
- (h)->next_free += (h)->temp)
-
-#define obstack_grow0(h,where,length) \
-( (h)->temp = (length), \
- (((h)->next_free + (h)->temp + 1 > (h)->chunk_limit) \
- ? _obstack_newchunk ((h), obstack_chunk_alloc, (h)->temp + 1) : 0), \
- bcopy (where, (h)->next_free, (h)->temp), \
- (h)->next_free += (h)->temp, \
- *((h)->next_free)++ = 0)
-
-#define obstack_1grow(h,datum) \
-( (((h)->next_free + 1 > (h)->chunk_limit) \
- ? _obstack_newchunk ((h), obstack_chunk_alloc, 1) : 0), \
- *((h)->next_free)++ = (datum))
-
-#define obstack_blank(h,length) \
-( (h)->temp = (length), \
- (((h)->next_free + (h)->temp > (h)->chunk_limit) \
- ? _obstack_newchunk ((h), obstack_chunk_alloc, (h)->temp) : 0), \
- (h)->next_free += (h)->temp)
-
-#define obstack_alloc(h,length) \
- (obstack_blank ((h), (length)), obstack_finish (h))
-
-#define obstack_copy(h,where,length) \
- (obstack_grow ((h), (where), (length)), obstack_finish (h))
-
-#define obstack_copy0(h,where,length) \
- (obstack_grow0 ((h), (where), (length)), obstack_finish (h))
-
-#define obstack_room(h) ((long unsigned int) \
- ((h)->chunk_limit - (h)->next_free))
-
-#define obstack_1grow_fast(h,achar) (*((h)->next_free)++ = achar)
-
-#define obstack_blank_fast(h,n) ((h)->next_free += (n))
-
-#define obstack_finish(h) \
- ((h)->temp = (int) (h)->object_base, \
- (h)->next_free \
- = (char*)((int)((h)->next_free+(h)->alignment_mask) \
- & ~ ((h)->alignment_mask)), \
- (((h)->next_free - (char *)(h)->chunk \
- > (h)->chunk_limit - (char *)(h)->chunk) \
- ? (h)->next_free = (h)->chunk_limit : 0), \
- (h)->object_base = (h)->next_free, \
- (char *) (h)->temp)
-
-#define obstack_free(h,obj) \
-(((h)->temp = (char *)(obj) - (char *) (h)->chunk), \
- (((h)->temp >= 0 && (h)->temp < (h)->chunk_limit - (char *) (h)->chunk)\
- ? (int) ((h)->next_free = (h)->object_base \
- = (h)->temp + (char *) (h)->chunk) \
- : (int) _obstack_free ((h), obstack_chunk_free, \
- (h)->temp + (char *) (h)->chunk)))
-
-#endif /* #ifndef obstackH */
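The growth scheme described in the removed obstack.h comment (append chars above a high-water mark, move the partial object to a larger chunk on overflow, then freeze its address) can be sketched roughly as follows. This is a minimal illustration, not the removed implementation: the `mini_` names are hypothetical, alignment handling and freeing are left out, and it assumes a C99 compiler.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of the obstack idea; every name here is illustrative. */
struct mini_chunk {
    struct mini_chunk *prev;   /* previously allocated chunk, or NULL */
    char *limit;               /* one past the end of this chunk */
    char data[];               /* objects begin here */
};

struct mini_obstack {
    size_t chunk_size;         /* preferred payload size of a chunk */
    struct mini_chunk *chunk;  /* current chunk */
    char *object_base;         /* start of the object being built */
    char *next_free;           /* where the next char goes */
};

static struct mini_chunk *mini_new_chunk(size_t payload, struct mini_chunk *prev)
{
    struct mini_chunk *c = malloc(sizeof *c + payload);
    if (c == NULL)
        abort();               /* the chunk allocator is expected not to return on failure */
    c->prev = prev;
    c->limit = c->data + payload;
    return c;
}

void mini_init(struct mini_obstack *h, size_t chunk_size)
{
    h->chunk_size = chunk_size;
    h->chunk = mini_new_chunk(chunk_size, NULL);
    h->object_base = h->next_free = h->chunk->data;
}

/* Append one char to the growing object, moving it to a larger chunk if full. */
void mini_1grow(struct mini_obstack *h, char c)
{
    if (h->next_free == h->chunk->limit) {
        size_t obj_size = (size_t)(h->next_free - h->object_base);
        size_t payload = (obj_size + 1) * 2;
        if (payload < h->chunk_size)
            payload = h->chunk_size;
        struct mini_chunk *nc = mini_new_chunk(payload, h->chunk);
        memcpy(nc->data, h->object_base, obj_size);  /* carry the partial object over */
        h->chunk = nc;
        h->object_base = nc->data;
        h->next_free = nc->data + obj_size;
    }
    *h->next_free++ = c;
}

/* Freeze the current object and start a new one right after it. */
char *mini_finish(struct mini_obstack *h)
{
    char *obj = h->object_base;
    h->object_base = h->next_free;
    return obj;
}
```

The old chunk is kept (linked through `prev`) because finished objects still live in it; only the unfinished object moves.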
diff --git a/regex.c b/regex.c
index ebfd612e..40118055 100644
--- a/regex.c
+++ b/regex.c
@@ -32,7 +32,8 @@ as you receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy a valid copyright notice "Copyright
(C) 1985 Free Software Foundation, Inc."; and include following the
copyright notice a verbatim copy of the above disclaimer of warranty
-and of this License.
+and of this License. You may charge a distribution fee for the
+physical act of transferring a copy.
2. You may modify your copy or copies of this source file or
any portion of it, and copy and distribute such modifications under
@@ -43,31 +44,42 @@ the terms of Paragraph 1 above, provided that you also do the following:
b) cause the whole of any work that you distribute or publish,
that in whole or in part contains or is a derivative of this
- program or any part thereof, to be freely distributed
- and licensed to all third parties on terms identical to those
- contained in this License Agreement (except that you may choose
- to grant more extensive warranty protection to third parties,
- at your option).
-
- 3. You may copy and distribute this program or any portion of it in
-compiled, executable or object code form under the terms of Paragraphs
-1 and 2 above provided that you do the following:
-
- a) cause each such copy to be accompanied by the
- corresponding machine-readable source code, which must
- be distributed under the terms of Paragraphs 1 and 2 above; or,
-
- b) cause each such copy to be accompanied by a
- written offer, with no time limit, to give any third party
- free (except for a nominal shipping charge) a machine readable
- copy of the corresponding source code, to be distributed
- under the terms of Paragraphs 1 and 2 above; or,
-
- c) in the case of a recipient of this program in compiled, executable
- or object code form (without the corresponding source code) you
- shall cause copies you distribute to be accompanied by a copy
- of the written offer of source code which you received along
- with the copy you received.
+ program or any part thereof, to be licensed at no charge to all
+ third parties on terms identical to those contained in this
+ License Agreement (except that you may choose to grant more extensive
+ warranty protection to some or all third parties, at your option).
+
+ c) You may charge a distribution fee for the physical act of
+ transferring a copy, and you may at your option offer warranty
+ protection in exchange for a fee.
+
+Mere aggregation of another unrelated program with this program (or its
+derivative) on a volume of a storage or distribution medium does not bring
+the other program under the scope of these terms.
+
+ 3. You may copy and distribute this program (or a portion or derivative
+of it, under Paragraph 2) in object code or executable form under the terms
+of Paragraphs 1 and 2 above provided that you also do one of the following:
+
+ a) accompany it with the complete corresponding machine-readable
+ source code, which must be distributed under the terms of
+ Paragraphs 1 and 2 above; or,
+
+ b) accompany it with a written offer, valid for at least three
+ years, to give any third party free (except for a nominal
+ shipping charge) a complete machine-readable copy of the
+ corresponding source code, to be distributed under the terms of
+ Paragraphs 1 and 2 above; or,
+
+ c) accompany it with the information you received as to where the
+ corresponding source code may be obtained. (This alternative is
+ allowed only for noncommercial distribution and only if you
+ received the program in object code or executable form alone.)
+
+For an executable file, complete source code means all the source code for
+all modules it contains; but, as a special exception, it need not include
+source code for modules which are standard libraries that accompany the
+operating system on which the executable file runs.
4. You may not copy, sublicense, distribute or transfer this program
except as expressly provided under this License Agreement. Any attempt
@@ -77,6 +89,14 @@ automatically terminated. However, parties who have received computer
software programs from you with this License Agreement will not have
their licenses terminated so long as such parties remain in full compliance.
+ 5. If you wish to incorporate parts of this program into other free
+programs whose distribution conditions are different, write to the Free
+Software Foundation at 675 Mass Ave, Cambridge, MA 02139. We have not yet
+worked out a simple rule that can be stated here, but we will often permit
+this. We will be guided by the two goals of preserving the free status of
+all derivatives of our free software and of promoting the sharing and reuse of
+software.
+
In other words, you are welcome to use, share and improve this program.
You are forbidden to forbid anyone else to use, share and improve
@@ -88,11 +108,6 @@ what you give them. Help stamp out software-hoarding! */
which reads a pattern, describes how it compiles,
then reads a string and searches for it. */
-/* JF this var has taken on whole new meanings as time goes by. Various bits
-in this int tell how certain pieces of syntax should work */
-
-static int obscure_syntax = 0;
-
#ifdef emacs
/* The `emacs' switch turns on certain special matching commands
@@ -105,6 +120,21 @@ static int obscure_syntax = 0;
#else /* not emacs */
+#ifdef USG
+#define bcopy(s,d,n) memcpy((d),(s),(n))
+#define bcmp(s1,s2,n) memcmp((s1),(s2),(n))
+#define bzero(s,n) memset((s),0,(n))
+#endif
+
+/* Make alloca work the best possible way. */
+#ifdef __GNUC__
+#define alloca __builtin_alloca
+#else
+#ifdef sparc
+#include <alloca.h>
+#endif
+#endif
+
/*
* Define the syntax stuff, so we can do the \<...\> things.
*/
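The USG block added above maps the BSD byte routines onto their System V counterparts; the detail that is easy to miss is the swapped argument order (`bcopy` takes the source first, `memcpy` the destination first). A small sketch, using `sketch_` names so it does not collide with any real `bcopy` on the build system:

```c
#include <string.h>

/* Same mapping as the USG block in the diff, under sketch-only names. */
#define sketch_bcopy(s, d, n)   memcpy((d), (s), (n))
#define sketch_bcmp(s1, s2, n)  memcmp((s1), (s2), (n))
#define sketch_bzero(s, n)      memset((s), 0, (n))

/* Copy a word using the BSD call order: source first, destination second. */
int sketch_roundtrip(void)
{
    char src[6] = "regex";
    char dst[6];

    sketch_bzero(dst, sizeof dst);
    sketch_bcopy(src, dst, sizeof src);
    return sketch_bcmp(src, dst, sizeof src);   /* 0 when the buffers are equal */
}
```

One caveat worth keeping in mind: `memcpy` makes no promise for overlapping buffers, so callers that rely on overlap-safe copying would need `memmove` instead.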
@@ -157,7 +187,7 @@ init_syntax_once ()
#ifndef NFAILURES
#define NFAILURES 80
-#endif NFAILURES
+#endif /* NFAILURES */
/* width of a byte in bits */
@@ -167,20 +197,39 @@ init_syntax_once ()
#define SIGN_EXTEND_CHAR(x) (x)
#endif
-/* compile_pattern takes a regular-expression descriptor string in the user's format
- and converts it into a buffer full of byte commands for matching.
+static int obscure_syntax = 0;
+
+/* Specify the precise syntax of regexp for compilation.
+ This provides for compatibility for various utilities
+ which historically have different, incompatible syntaxes.
+
+ The argument SYNTAX is a bit-mask containing the two bits
+ RE_NO_BK_PARENS and RE_NO_BK_VBAR. */
+
+int
+re_set_syntax (syntax)
+{
+ int ret;
+
+ ret = obscure_syntax;
+ obscure_syntax = syntax;
+ return ret;
+}
+
+/* re_compile_pattern takes a regular-expression string
+ and converts it into a buffer full of byte commands for matching.
- pattern is the address of the pattern string
- size is the length of it.
- bufp is a struct re_pattern_buffer * which points to the info
+ PATTERN is the address of the pattern string
+ SIZE is the length of it.
+ BUFP is a struct re_pattern_buffer * which points to the info
on where to store the byte commands.
This structure contains a char * which points to the
actual space, which should have been obtained with malloc.
- compile_pattern may use realloc to grow the buffer space.
+ re_compile_pattern may use realloc to grow the buffer space.
The number of bytes of commands can be found out by looking in
the struct re_pattern_buffer that bufp pointed to,
- after compile_pattern returns.
+ after re_compile_pattern returns.
*/
#define PATPUSH(ch) (*b++ = (char) (ch))
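Because `re_set_syntax` returns the previous bit-mask, a caller can switch syntax for one compilation and then put the old setting back. The sketch below shows that save/restore pattern with a stand-in global and made-up bit values (`SKETCH_NO_BK_PARENS` and `SKETCH_NO_BK_VBAR` are assumptions for illustration, not the values from regex.h). The grouping rule mirrors how the patch treats `(` versus `\(`.

```c
/* Stand-in for the global that the diff calls obscure_syntax. */
static int sketch_syntax = 0;

/* Hypothetical bit values, chosen only for this sketch. */
enum { SKETCH_NO_BK_PARENS = 1, SKETCH_NO_BK_VBAR = 2 };

/* Same shape as re_set_syntax in the diff: install new bits, return the old ones. */
int sketch_set_syntax(int syntax)
{
    int ret = sketch_syntax;
    sketch_syntax = syntax;
    return ret;
}

/* Does a '(' act as a grouping operator, given whether it was backslashed? */
int sketch_paren_is_group(int backslashed)
{
    if (sketch_syntax & SKETCH_NO_BK_PARENS)
        return !backslashed;   /* plain ( groups, \( is an ordinary char */
    return backslashed;        /* \( groups, plain ( is an ordinary char */
}

/* Evaluate one case under a temporary syntax, then restore the previous one. */
int sketch_with_syntax(int syntax, int backslashed)
{
    int saved = sketch_set_syntax(syntax);
    int result = sketch_paren_is_group(backslashed);
    sketch_set_syntax(saved);
    return result;
}
```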
@@ -216,20 +265,6 @@ init_syntax_once ()
static int store_jump (), insert_jump ();
-/* JF this function is used to compile UN*X style regexps. In particular,
- ( ) and | don't have to be \ed to have a special meaning */
-
-int
-re_set_syntax(syntax)
-{
- int ret;
-
- ret=obscure_syntax;
- obscure_syntax=syntax;
- return ret;
-}
-
-
char *
re_compile_pattern (pattern, size, bufp)
char *pattern;
@@ -325,9 +360,28 @@ re_compile_pattern (pattern, size, bufp)
switch (c)
{
case '$':
+ if (obscure_syntax & RE_TIGHT_VBAR)
+ {
+ if (! (obscure_syntax & RE_CONTEXT_INDEP_OPS) && p != pend)
+ goto normal_char;
+ /* Make operand of last vbar end before this `$'. */
+ if (fixup_jump)
+ store_jump (fixup_jump, jump, b);
+ fixup_jump = 0;
+ PATPUSH (endline);
+ break;
+ }
+
/* $ means succeed if at end of line, but only in special contexts.
- If randonly in the middle of a pattern, it is a normal character. */
- if (p == pend || (*p == '\\' && (p[1] == ')' || p[1] == '|')))
+ If randomly in the middle of a pattern, it is a normal character. */
+ if (p == pend || *p == '\n'
+ || (obscure_syntax & RE_CONTEXT_INDEP_OPS)
+ || (obscure_syntax & RE_NO_BK_PARENS
+ ? *p == ')'
+ : *p == '\\' && p[1] == ')')
+ || (obscure_syntax & RE_NO_BK_VBAR
+ ? *p == '|'
+ : *p == '\\' && p[1] == '|'))
{
PATPUSH (endline);
break;
@@ -336,15 +390,30 @@ re_compile_pattern (pattern, size, bufp)
case '^':
/* ^ means succeed if at beg of line, but only if no preceding pattern. */
- if (laststart) goto normal_char;
- PATPUSH (begline);
+
+ if (laststart && p[-2] != '\n'
+ && ! (obscure_syntax & RE_CONTEXT_INDEP_OPS))
+ goto normal_char;
+ if (obscure_syntax & RE_TIGHT_VBAR)
+ {
+ if (p != pattern + 1
+ && ! (obscure_syntax & RE_CONTEXT_INDEP_OPS))
+ goto normal_char;
+ PATPUSH (begline);
+ begalt = b;
+ }
+ else
+ PATPUSH (begline);
break;
- case '*':
case '+':
case '?':
+ if (obscure_syntax & RE_BK_PLUS_QM)
+ goto normal_char;
+ handle_plus:
+ case '*':
/* If there is no previous pattern, char not special. */
- if (!laststart)
+ if (!laststart && ! (obscure_syntax & RE_CONTEXT_INDEP_OPS))
goto normal_char;
/* If there is a sequence of repetition chars,
collapse it down to equivalent to just one. */
@@ -357,13 +426,36 @@ re_compile_pattern (pattern, size, bufp)
if (p == pend)
break;
PATFETCH (c);
- if (!(c == '*' || c == '+' || c == '?'))
+ if (c == '*')
+ ;
+ else if (!(obscure_syntax & RE_BK_PLUS_QM)
+ && (c == '+' || c == '?'))
+ ;
+ else if ((obscure_syntax & RE_BK_PLUS_QM)
+ && c == '\\')
+ {
+ int c1;
+ PATFETCH (c1);
+ if (!(c1 == '+' || c1 == '?'))
+ {
+ PATUNFETCH;
+ PATUNFETCH;
+ break;
+ }
+ c = c1;
+ }
+ else
{
PATUNFETCH;
break;
}
}
+ /* Star, etc. applied to an empty pattern is equivalent
+ to an empty pattern. */
+ if (!laststart)
+ break;
+
/* Now we know whether 0 matches is allowed,
and whether 2 or more matches is allowed. */
if (many_times_ok)
@@ -391,8 +483,8 @@ re_compile_pattern (pattern, size, bufp)
break;
case '[':
- if (b - bufp->buffer
- > bufp->allocated - 3 - (1 << BYTEWIDTH) / BYTEWIDTH)
+ while (b - bufp->buffer
+ > bufp->allocated - 3 - (1 << BYTEWIDTH) / BYTEWIDTH)
/* Note that EXTEND_BUFFER clobbers c */
EXTEND_BUFFER;
@@ -411,7 +503,7 @@ re_compile_pattern (pattern, size, bufp)
{
PATFETCH (c);
if (c == ']' && p != p1 + 1) break;
- if (*p == '-')
+ if (*p == '-' && p[1] != ']')
{
PATFETCH (c1);
PATFETCH (c1);
@@ -425,86 +517,44 @@ re_compile_pattern (pattern, size, bufp)
}
/* Discard any bitmap bytes that are all 0 at the end of the map.
Decrement the map-length byte too. */
- while (b[-1] > 0 && b[b[-1] - 1] == 0)
+ while ((int) b[-1] > 0 && b[b[-1] - 1] == 0)
b[-1]--;
b += b[-1];
break;
- case '(':
- if(!(obscure_syntax&RE_NO_BK_PARENS)) goto normal_char;
- if (stackp == stacke) goto nesting_too_deep;
- if (regnum < RE_NREGS)
- {
- PATPUSH (start_memory);
- PATPUSH (regnum);
- }
- *stackp++ = b - bufp->buffer;
- *stackp++ = fixup_jump ? fixup_jump - bufp->buffer + 1 : 0;
- *stackp++ = regnum++;
- *stackp++ = begalt - bufp->buffer;
- fixup_jump = 0;
- laststart = 0;
- begalt = b;
- break;
+ case '(':
+ if (! (obscure_syntax & RE_NO_BK_PARENS))
+ goto normal_char;
+ else
+ goto handle_open;
- case ')':
- if(!(obscure_syntax&RE_NO_BK_PARENS)) goto normal_char;
- if (stackp == stackb) goto unmatched_close;
- begalt = *--stackp + bufp->buffer;
- if (fixup_jump)
- store_jump (fixup_jump, jump, b);
- if (stackp[-1] < RE_NREGS)
- {
- PATPUSH (stop_memory);
- PATPUSH (stackp[-1]);
- }
- stackp -= 2;
- fixup_jump = 0;
- if (*stackp)
- fixup_jump = *stackp + bufp->buffer - 1;
- laststart = *--stackp + bufp->buffer;
- break;
+ case ')':
+ if (! (obscure_syntax & RE_NO_BK_PARENS))
+ goto normal_char;
+ else
+ goto handle_close;
- case '|':
- if(!(obscure_syntax&RE_NO_BK_VBAR)) goto normal_char;
- insert_jump (on_failure_jump, begalt, b + 6, b);
- pending_exact = 0;
- b += 3;
- if (fixup_jump)
- store_jump (fixup_jump, jump, b);
- fixup_jump = b;
- b += 3;
- laststart = 0;
- begalt = b;
- break;
+ case '\n':
+ if (! (obscure_syntax & RE_NEWLINE_OR))
+ goto normal_char;
+ else
+ goto handle_bar;
+
+ case '|':
+ if (! (obscure_syntax & RE_NO_BK_VBAR))
+ goto normal_char;
+ else
+ goto handle_bar;
case '\\':
if (p == pend) goto invalid_pattern;
PATFETCH_RAW (c);
switch (c)
{
-#ifdef emacs
- case '=':
- PATPUSH (at_dot);
- break;
-
- case 's':
- laststart = b;
- PATPUSH (syntaxspec);
- PATFETCH (c);
- PATPUSH (syntax_spec_code[c]);
- break;
-
- case 'S':
- laststart = b;
- PATPUSH (notsyntaxspec);
- PATFETCH (c);
- PATPUSH (syntax_spec_code[c]);
- break;
-#endif emacs
-
case '(':
- if(obscure_syntax&RE_NO_BK_PARENS) goto normal_backsl;
+ if (obscure_syntax & RE_NO_BK_PARENS)
+ goto normal_backsl;
+ handle_open:
if (stackp == stacke) goto nesting_too_deep;
if (regnum < RE_NREGS)
{
@@ -521,7 +571,9 @@ re_compile_pattern (pattern, size, bufp)
break;
case ')':
- if(obscure_syntax&RE_NO_BK_PARENS) goto normal_backsl;
+ if (obscure_syntax & RE_NO_BK_PARENS)
+ goto normal_backsl;
+ handle_close:
if (stackp == stackb) goto unmatched_close;
begalt = *--stackp + bufp->buffer;
if (fixup_jump)
@@ -539,7 +591,9 @@ re_compile_pattern (pattern, size, bufp)
break;
case '|':
- if(obscure_syntax&RE_NO_BK_VBAR) goto normal_backsl;
+ if (obscure_syntax & RE_NO_BK_VBAR)
+ goto normal_backsl;
+ handle_bar:
insert_jump (on_failure_jump, begalt, b + 6, b);
pending_exact = 0;
b += 3;
@@ -551,6 +605,26 @@ re_compile_pattern (pattern, size, bufp)
begalt = b;
break;
+#ifdef emacs
+ case '=':
+ PATPUSH (at_dot);
+ break;
+
+ case 's':
+ laststart = b;
+ PATPUSH (syntaxspec);
+ PATFETCH (c);
+ PATPUSH (syntax_spec_code[c]);
+ break;
+
+ case 'S':
+ laststart = b;
+ PATPUSH (notsyntaxspec);
+ PATFETCH (c);
+ PATPUSH (syntax_spec_code[c]);
+ break;
+#endif /* emacs */
+
case 'w':
laststart = b;
PATPUSH (wordchar);
@@ -604,9 +678,15 @@ re_compile_pattern (pattern, size, bufp)
PATPUSH (duplicate);
PATPUSH (c1);
break;
+
+ case '+':
+ case '?':
+ if (obscure_syntax & RE_BK_PLUS_QM)
+ goto handle_plus;
+
default:
normal_backsl:
- /* You might think it wuld be useful for \ to mean
+ /* You might think it would be useful for \ to mean
not to translate; but if we don't translate it
it will never match anything. */
if (translate) c = translate[c];
@@ -618,7 +698,9 @@ re_compile_pattern (pattern, size, bufp)
normal_char:
if (!pending_exact || pending_exact + *pending_exact + 1 != b
|| *pending_exact == 0177 || *p == '*' || *p == '^'
- || *p == '+' || *p == '?')
+ || ((obscure_syntax & RE_BK_PLUS_QM)
+ ? *p == '\\' && (p[1] == '+' || p[1] == '?')
+ : (*p == '+' || *p == '?')))
{
laststart = b;
PATPUSH (exactn);
@@ -833,11 +915,12 @@ re_compile_fastmap (bufp)
break;
case notsyntaxspec:
+ k = *p++;
for (j = 0; j < (1 << BYTEWIDTH); j++)
if (SYNTAX (j) != (enum syntaxcode) k)
fastmap[j] = 1;
break;
-#endif emacs
+#endif /* emacs */
case charset:
for (j = *p++ * BYTEWIDTH - 1; j >= 0; j--)
@@ -891,16 +974,17 @@ re_search (pbufp, string, size, startpos, range, regs)
return re_search_2 (pbufp, 0, 0, string, size, startpos, range, regs, size);
}
-/* Like re_match_2 but tries first a match starting at index `startpos',
- then at startpos + 1, and so on.
- `range' is the number of places to try before giving up.
- If `range' is negative, the starting positions tried are
- startpos, startpos - 1, etc.
- It is up to the caller to make sure that range is not so large
- as to take the starting position outside of the input strings.
+/* Like re_match_2 but tries first a match starting at index STARTPOS,
+ then at STARTPOS + 1, and so on.
+ RANGE is the number of places to try before giving up.
+ If RANGE is negative, the starting positions tried are
+ STARTPOS, STARTPOS - 1, etc.
+ It is up to the caller to make sure that range is not so large
+ as to take the starting position outside of the input strings.
The value returned is the position at which the match was found,
- or -1 if no match was found. */
+ or -1 if no match was found,
+ or -2 if error (such as failure stack overflow). */
int
re_search_2 (pbufp, string1, size1, string2, size2, startpos, range, regs, mstop)
@@ -913,12 +997,24 @@ re_search_2 (pbufp, string1, size1, string2, size2, startpos, range, regs, mstop
int mstop;
{
register char *fastmap = pbufp->fastmap;
- register char *translate = pbufp->translate;
+ register unsigned char *translate = (unsigned char *) pbufp->translate;
int total = size1 + size2;
+ int val;
/* Update the fastmap now if not correct already */
if (fastmap && !pbufp->fastmap_accurate)
re_compile_fastmap (pbufp);
+
+ /* Don't waste time in a long search for a pattern
+ that says it is anchored. */
+ if (pbufp->used > 0 && (enum regexpcode) pbufp->buffer[0] == begbuf
+ && range > 0)
+ {
+ if (startpos > 0)
+ return -1;
+ else
+ range = 1;
+ }
while (1)
{
@@ -933,12 +1029,13 @@ re_search_2 (pbufp, string1, size1, string2, size2, startpos, range, regs, mstop
if (range > 0)
{
register int lim = 0;
- register char *p;
+ register unsigned char *p;
int irange = range;
if (startpos < size1 && startpos + range >= size1)
lim = range - (size1 - startpos);
- p = &(startpos >= size1 ? string2 - size1 : string1)[startpos];
+ p = ((unsigned char *)
+ &(startpos >= size1 ? string2 - size1 : string1)[startpos]);
if (translate)
{
@@ -954,9 +1051,12 @@ re_search_2 (pbufp, string1, size1, string2, size2, startpos, range, regs, mstop
}
else
{
- register char c;
- if (startpos >= size1) c = string2[startpos - size1];
- else c = string1[startpos];
+ register unsigned char c;
+ if (startpos >= size1)
+ c = string2[startpos - size1];
+ else
+ c = string1[startpos];
+ c &= 0xff;
if (translate ? !fastmap[translate[c]] : !fastmap[c])
goto advance;
}
@@ -966,8 +1066,13 @@ re_search_2 (pbufp, string1, size1, string2, size2, startpos, range, regs, mstop
&& fastmap && pbufp->can_be_null == 0)
return -1;
- if (0 <= re_match_2 (pbufp, string1, size1, string2, size2, startpos, regs, mstop))
- return startpos;
+ val = re_match_2 (pbufp, string1, size1, string2, size2, startpos, regs, mstop);
+ if (val == -2)
+ return -2;
+ if (0 <= val)
+ return startpos;
#ifdef C_ALLOCA
alloca (0);
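The three-way return convention documented for the search routine (a position on success, -1 for no match, -2 for an internal error) can be sketched as a search loop that tests the error code before treating the value as a match length. Everything here is hypothetical: `sketch_match_fn` stands in for the matcher, and the stepping logic is only an assumption about how RANGE is consumed, not a copy of `re_search_2`.

```c
#include <string.h>

/* Hypothetical matcher: returns match length, -1 for no match, -2 for an error. */
typedef int (*sketch_match_fn)(const char *text, int size, int pos);

/* Try STARTPOS, then step forward (RANGE > 0) or backward (RANGE < 0). */
int sketch_search(sketch_match_fn match, const char *text, int size,
                  int startpos, int range)
{
    for (;;) {
        int val = match(text, size, startpos);
        if (val == -2)
            return -2;          /* propagate the error before anything else */
        if (val >= 0)
            return startpos;    /* found: report where the match starts */
        if (range == 0)
            return -1;          /* no places left to try */
        if (range > 0) {
            range--;
            startpos++;
        } else {
            range++;
            startpos--;
        }
    }
}

/* Sample matcher: the literal "ab" at POS. */
int sketch_match_ab(const char *text, int size, int pos)
{
    if (pos < 0 || pos + 2 > size)
        return -1;
    return memcmp(text + pos, "ab", 2) == 0 ? 2 : -1;
}

/* Sample matcher that always reports an internal error. */
int sketch_match_error(const char *text, int size, int pos)
{
    (void) text; (void) size; (void) pos;
    return -2;
}
```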
@@ -992,42 +1097,47 @@ re_match (pbufp, string, size, pos, regs)
}
#endif /* emacs */
-/* Match the pattern described by `pbufp'
- against data which is the virtual concatenation of `string1' and `string2'.
- `size1' and `size2' are the sizes of the two data strings.
- Start the match at position `pos'.
- Do not consider matching past the position `mstop'.
+/* Maximum size of failure stack. Beyond this, overflow is an error. */
- If pbufp->fastmap is nonzero, then it had better be up to date.
+int re_max_failures = 2000;
- The reason that the data to match is specified as two components
- which are to be regarded as concatenated
- is so that this function can be used directly on the contents of an Emacs buffer.
+static int bcmp_translate();
+/* Match the pattern described by PBUFP
+ against data which is the virtual concatenation of STRING1 and STRING2.
+ SIZE1 and SIZE2 are the sizes of the two data strings.
+ Start the match at position POS.
+ Do not consider matching past the position MSTOP.
- -1 is returned if there is no match. Otherwise the value is the length
- of the substring which was matched.
-*/
+ If pbufp->fastmap is nonzero, then it had better be up to date.
+
+ The reason that the data to match are specified as two components
+ which are to be regarded as concatenated
+ is so this function can be used directly on the contents of an Emacs buffer.
+
+ -1 is returned if there is no match. -2 is returned if there is
+ an error (such as match stack overflow). Otherwise the value is the length
+ of the substring which was matched. */
int
re_match_2 (pbufp, string1, size1, string2, size2, pos, regs, mstop)
struct re_pattern_buffer *pbufp;
- char *string1, *string2;
+ unsigned char *string1, *string2;
int size1, size2;
int pos;
struct re_registers *regs;
int mstop;
{
- register char *p = pbufp->buffer;
- register char *pend = p + pbufp->used;
+ register unsigned char *p = (unsigned char *) pbufp->buffer;
+ register unsigned char *pend = p + pbufp->used;
/* End of first string */
- char *end1;
+ unsigned char *end1;
/* End of second string */
- char *end2;
+ unsigned char *end2;
/* Pointer just past last char to consider matching */
- char *end_match_1, *end_match_2;
- register char *d, *dend;
+ unsigned char *end_match_1, *end_match_2;
+ register unsigned char *d, *dend;
register int mcnt;
- char *translate = pbufp->translate;
+ unsigned char *translate = (unsigned char *) pbufp->translate;
/* Failure point stack. Each place that can handle a failure further down the line
pushes a failure point on this stack. It consists of two char *'s.
@@ -1037,8 +1147,9 @@ re_match_2 (pbufp, string1, size1, string2, size2, pos, regs, mstop)
If a failure happens and the innermost failure point is dormant,
it discards that failure point and tries the next one. */
- char **stackb = (char **) alloca (2 * NFAILURES * sizeof (char *));
- char **stackp = stackb, **stacke = &stackb[2 * NFAILURES];
+ unsigned char *initial_stack[2 * NFAILURES];
+ unsigned char **stackb = initial_stack;
+ unsigned char **stackp = stackb, **stacke = &stackb[2 * NFAILURES];
/* Information on the "contents" of registers.
These are pointers into the input strings; they record
@@ -1048,14 +1159,12 @@ re_match_2 (pbufp, string1, size1, string2, size2, pos, regs, mstop)
At that point, regstart[regnum] points to the first character in the register,
regend[regnum] points to the first character beyond the end of the register,
- and regstart_segend[regnum] is either the same as regend[regnum]
- or else points to the end of the input string into which regstart[regnum] points.
- The latter case happens when regstart[regnum] is in string1 and
- regend[regnum] is in string2. */
+ regstart_seg1[regnum] is true iff regstart[regnum] points into string1,
+ and regend_seg1[regnum] is true iff regend[regnum] points into string1. */
- char *regstart[RE_NREGS];
- char *regstart_segend[RE_NREGS];
- char *regend[RE_NREGS];
+ unsigned char *regstart[RE_NREGS];
+ unsigned char *regend[RE_NREGS];
+ unsigned char regstart_seg1[RE_NREGS], regend_seg1[RE_NREGS];
/* Set up pointers to ends of strings.
Don't allow the second string to be empty unless both are empty. */
@@ -1081,11 +1190,11 @@ re_match_2 (pbufp, string1, size1, string2, size2, pos, regs, mstop)
end_match_2 = string2 + mstop - size1;
}
- /* Initialize \( and \) text positions to -1
+ /* Initialize \) text positions to -1
to mark ones that no \( or \) has been seen for. */
- for (mcnt = 0; mcnt < sizeof (regstart) / sizeof (*regstart); mcnt++)
- regstart[mcnt] = (char *) -1;
+ for (mcnt = 0; mcnt < sizeof (regend) / sizeof (*regend); mcnt++)
+ regend[mcnt] = (unsigned char *) -1;
/* `p' scans through the pattern as `d' scans through the data.
`dend' is the end of the input string that `d' points within.
@@ -1119,31 +1228,31 @@ re_match_2 (pbufp, string1, size1, string2, size2, pos, regs, mstop)
/* If caller wants register contents data back, convert it to indices */
if (regs)
{
- regend[0] = d;
- regstart[0] = string1;
- for (mcnt = 0; mcnt < RE_NREGS; mcnt++)
+ regs->start[0] = pos;
+ if (dend == end_match_1)
+ regs->end[0] = d - string1;
+ else
+ regs->end[0] = d - string2 + size1;
+ for (mcnt = 1; mcnt < RE_NREGS; mcnt++)
{
- if ((mcnt != 0) && regstart[mcnt] == (char *) -1)
+ if (regend[mcnt] == (unsigned char *) -1)
{
regs->start[mcnt] = -1;
regs->end[mcnt] = -1;
continue;
}
- if (regstart[mcnt] - string1 < 0 ||
- regstart[mcnt] - string1 > size1)
- regs->start[mcnt] = regstart[mcnt] - string2 + size1;
- else
+ if (regstart_seg1[mcnt])
regs->start[mcnt] = regstart[mcnt] - string1;
- if (regend[mcnt] - string1 < 0 ||
- regend[mcnt] - string1 > size1)
- regs->end[mcnt] = regend[mcnt] - string2 + size1;
else
+ regs->start[mcnt] = regstart[mcnt] - string2 + size1;
+ if (regend_seg1[mcnt])
regs->end[mcnt] = regend[mcnt] - string1;
+ else
+ regs->end[mcnt] = regend[mcnt] - string2 + size1;
}
- regs->start[0] = pos;
}
- if (d - string1 >= 0 && d - string1 <= size1)
- return d - string1 - pos;
+ if (dend == end_match_1)
+ return (d - string1 - pos);
else
return d - string2 + size1 - pos;
}
@@ -1164,23 +1273,22 @@ re_match_2 (pbufp, string1, size1, string2, size2, pos, regs, mstop)
case start_memory:
regstart[*p] = d;
- regstart_segend[*p++] = dend;
+ regstart_seg1[*p++] = (dend == end_match_1);
break;
case stop_memory:
regend[*p] = d;
- if (regstart_segend[*p] == dend)
- regstart_segend[*p] = d;
- p++;
+ regend_seg1[*p++] = (dend == end_match_1);
break;
case duplicate:
{
int regno = *p++; /* Get which register to match against */
- register char *d2, *dend2;
+ register unsigned char *d2, *dend2;
d2 = regstart[regno];
- dend2 = regstart_segend[regno];
+ dend2 = ((regstart_seg1[regno] == regend_seg1[regno])
+ ? regend[regno] : end_match_1);
while (1)
{
/* Advance to next segment in register contents, if necessary */
@@ -1222,16 +1330,16 @@ re_match_2 (pbufp, string1, size1, string2, size2, pos, regs, mstop)
/* Nonzero for charset_not */
int not = 0;
register int c;
- if (*(p - 1) == (char) charset_not)
+ if (*(p - 1) == (unsigned char) charset_not)
not = 1;
/* fetch a data character */
PREFETCH;
if (translate)
- c = translate [*(unsigned char *)d];
+ c = translate [*d];
else
- c = *(unsigned char *)d;
+ c = *d;
if (c < *p * BYTEWIDTH
&& p[1 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH)))
@@ -1273,14 +1381,18 @@ re_match_2 (pbufp, string1, size1, string2, size2, pos, regs, mstop)
case on_failure_jump:
if (stackp == stacke)
{
- char **stackx = (char **) alloca (2 * (stacke - stackb) * sizeof (char *));
+ unsigned char **stackx;
+ if (stacke - stackb > re_max_failures * 2)
+ return -2;
+ stackx = (unsigned char **) alloca (2 * (stacke - stackb)
+ * sizeof (char *));
bcopy (stackb, stackx, (stacke - stackb) * sizeof (char *));
- stackp += stackx - stackb;
+ stackp = stackx + (stackp - stackb);
stacke = stackx + 2 * (stacke - stackb);
stackb = stackx;
}
mcnt = *p++ & 0377;
- mcnt += SIGN_EXTEND_CHAR (*p) << 8;
+ mcnt += SIGN_EXTEND_CHAR (*(char *)p) << 8;
p++;
*stackp++ = mcnt + p;
*stackp++ = d;
@@ -1291,37 +1403,46 @@ re_match_2 (pbufp, string1, size1, string2, size2, pos, regs, mstop)
case maybe_finalize_jump:
mcnt = *p++ & 0377;
- mcnt += SIGN_EXTEND_CHAR (*p) << 8;
+ mcnt += SIGN_EXTEND_CHAR (*(char *)p) << 8;
p++;
- /* Compare what follows with the begining of the repeat.
- If we can establish that there is nothing that they would
- both match, we can change to finalize_jump */
- if (p == pend)
- p[-3] = (char) finalize_jump;
- else if (*p == (char) exactn || *p == (char) endline)
- {
- register int c = *p == (char) endline ? '\n' : p[2];
- register char *p1 = p + mcnt;
- /* p1[0] ... p1[2] are an on_failure_jump.
- Examine what follows that */
- if (p1[3] == (char) exactn && p1[5] != c)
- p[-3] = (char) finalize_jump;
- else if (p1[3] == (char) charset || p1[3] == (char) charset_not)
- {
- int not = p1[3] == (char) charset_not;
- if (c < p1[4] * BYTEWIDTH
- && p1[5 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH)))
- not = !not;
- /* not is 1 if c would match */
- /* That means it is not safe to finalize */
- if (!not)
- p[-3] = (char) finalize_jump;
- }
- }
+ {
+ register unsigned char *p2 = p;
+ /* Compare what follows with the beginning of the repeat.
+ If we can establish that there is nothing that they would
+ both match, we can change to finalize_jump */
+ while (p2 != pend
+ && (*p2 == (unsigned char) stop_memory
+ || *p2 == (unsigned char) start_memory))
+ p2++;
+ if (p2 == pend)
+ p[-3] = (unsigned char) finalize_jump;
+ else if (*p2 == (unsigned char) exactn
+ || *p2 == (unsigned char) endline)
+ {
+ register int c = *p2 == (unsigned char) endline ? '\n' : p2[2];
+ register unsigned char *p1 = p + mcnt;
+ /* p1[0] ... p1[2] are an on_failure_jump.
+ Examine what follows that */
+ if (p1[3] == (unsigned char) exactn && p1[5] != c)
+ p[-3] = (unsigned char) finalize_jump;
+ else if (p1[3] == (unsigned char) charset
+ || p1[3] == (unsigned char) charset_not)
+ {
+ int not = p1[3] == (unsigned char) charset_not;
+ if (c < p1[4] * BYTEWIDTH
+ && p1[5 + c / BYTEWIDTH] & (1 << (c % BYTEWIDTH)))
+ not = !not;
+ /* not is 1 if c would match */
+ /* That means it is not safe to finalize */
+ if (!not)
+ p[-3] = (unsigned char) finalize_jump;
+ }
+ }
+ }
p -= 2;
- if (p[-1] != (char) finalize_jump)
+ if (p[-1] != (unsigned char) finalize_jump)
{
- p[-1] = (char) jump;
+ p[-1] = (unsigned char) jump;
goto nofinalize;
}
@@ -1335,16 +1456,18 @@ re_match_2 (pbufp, string1, size1, string2, size2, pos, regs, mstop)
case jump:
nofinalize:
mcnt = *p++ & 0377;
- mcnt += SIGN_EXTEND_CHAR (*p) << 8;
+ mcnt += SIGN_EXTEND_CHAR (*(char *)p) << 8;
p += mcnt + 1; /* The 1 compensates for missing ++ above */
break;
case dummy_failure_jump:
if (stackp == stacke)
{
- char **stackx = (char **) alloca (2 * (stacke - stackb) * sizeof (char *));
+ unsigned char **stackx
+ = (unsigned char **) alloca (2 * (stacke - stackb)
+ * sizeof (char *));
bcopy (stackb, stackx, (stacke - stackb) * sizeof (char *));
- stackp += stackx - stackb;
+ stackp = stackx + (stackp - stackb);
stacke = stackx + 2 * (stacke - stackb);
stackb = stackx;
}
@@ -1357,8 +1480,8 @@ re_match_2 (pbufp, string1, size1, string2, size2, pos, regs, mstop)
|| d == end2 /* Points to end */
|| (d == end1 && size2 == 0)) /* Points to end */
break;
- if ((SYNTAX (((unsigned char *)d)[-1]) == Sword)
- != (SYNTAX (d == end1 ? *(unsigned char *)string2 : *(unsigned char *)d) == Sword))
+ if ((SYNTAX (d[-1]) == Sword)
+ != (SYNTAX (d == end1 ? *string2 : *d) == Sword))
break;
goto fail;
@@ -1367,49 +1490,49 @@ re_match_2 (pbufp, string1, size1, string2, size2, pos, regs, mstop)
|| d == end2 /* Points to end */
|| (d == end1 && size2 == 0)) /* Points to end */
goto fail;
- if ((SYNTAX (((unsigned char *)d)[-1]) == Sword)
- != (SYNTAX (d == end1 ? *(unsigned char *)string2 : *(unsigned char *)d) == Sword))
+ if ((SYNTAX (d[-1]) == Sword)
+ != (SYNTAX (d == end1 ? *string2 : *d) == Sword))
goto fail;
break;
case wordbeg:
if (d == end2 /* Points to end */
|| (d == end1 && size2 == 0) /* Points to end */
- || SYNTAX (*(unsigned char *) (d == end1 ? string2 : d)) != Sword) /* Next char not a letter */
+ || SYNTAX (* (d == end1 ? string2 : d)) != Sword) /* Next char not a letter */
goto fail;
if (d == string1 /* Points to first char */
- || SYNTAX (((unsigned char *)d)[-1]) != Sword) /* prev char not letter */
+ || SYNTAX (d[-1]) != Sword) /* prev char not letter */
break;
goto fail;
case wordend:
if (d == string1 /* Points to first char */
- || SYNTAX (((unsigned char *)d)[-1]) != Sword) /* prev char not letter */
+ || SYNTAX (d[-1]) != Sword) /* prev char not letter */
goto fail;
if (d == end2 /* Points to end */
|| (d == end1 && size2 == 0) /* Points to end */
- || SYNTAX (d == end1 ? *(unsigned char *)string2 : *(unsigned char *)d) != Sword) /* Next char not a letter */
+ || SYNTAX (d == end1 ? *string2 : *d) != Sword) /* Next char not a letter */
break;
goto fail;
#ifdef emacs
case before_dot:
if (((d - string2 <= (unsigned) size2)
- ? d - (char *) bf_p2 : d - (char *) bf_p1)
+ ? d - bf_p2 : d - bf_p1)
<= point)
goto fail;
break;
case at_dot:
if (((d - string2 <= (unsigned) size2)
- ? d - (char *) bf_p2 : d - (char *) bf_p1)
+ ? d - bf_p2 : d - bf_p1)
== point)
goto fail;
break;
case after_dot:
if (((d - string2 <= (unsigned) size2)
- ? d - (char *) bf_p2 : d - (char *) bf_p1)
+ ? d - bf_p2 : d - bf_p1)
>= point)
goto fail;
break;
@@ -1422,7 +1545,7 @@ re_match_2 (pbufp, string1, size1, string2, size2, pos, regs, mstop)
mcnt = *p++;
matchsyntax:
PREFETCH;
- if (SYNTAX (*(unsigned char *)d++) != (enum syntaxcode) mcnt) goto fail;
+ if (SYNTAX (*d++) != (enum syntaxcode) mcnt) goto fail;
break;
case notwordchar:
@@ -1433,19 +1556,19 @@ re_match_2 (pbufp, string1, size1, string2, size2, pos, regs, mstop)
mcnt = *p++;
matchnotsyntax:
PREFETCH;
- if (SYNTAX (*(unsigned char *)d++) == (enum syntaxcode) mcnt) goto fail;
+ if (SYNTAX (*d++) == (enum syntaxcode) mcnt) goto fail;
break;
#else
case wordchar:
PREFETCH;
- if (SYNTAX (*(unsigned char *)d++) == 0) goto fail;
+ if (SYNTAX (*d++) == 0) goto fail;
break;
case notwordchar:
PREFETCH;
- if (SYNTAX (*(unsigned char *)d++) != 0) goto fail;
+ if (SYNTAX (*d++) != 0) goto fail;
break;
-#endif not emacs
+#endif /* not emacs */
case begbuf:
if (d == string1) /* Note, d cannot equal string2 */
@@ -1466,7 +1589,7 @@ re_match_2 (pbufp, string1, size1, string2, size2, pos, regs, mstop)
do
{
PREFETCH;
- if (translate[*(unsigned char *)d++] != *p++) goto fail;
+ if (translate[*d++] != *p++) goto fail;
}
while (--mcnt);
}
@@ -1505,11 +1628,11 @@ re_match_2 (pbufp, string1, size1, string2, size2, pos, regs, mstop)
static int
bcmp_translate (s1, s2, len, translate)
- char *s1, *s2;
+ unsigned char *s1, *s2;
register int len;
- char *translate;
+ unsigned char *translate;
{
- register char *p1 = s1, *p2 = s2;
+ register unsigned char *p1 = s1, *p2 = s2;
while (len)
{
if (translate [*p1++] != translate [*p2++]) return 1;
@@ -1597,22 +1720,28 @@ static char upcase[0400] =
0370, 0371, 0372, 0373, 0374, 0375, 0376, 0377
};
-main ()
+main (argc, argv)
+ int argc;
+ char **argv;
{
char pat[80];
struct re_pattern_buffer buf;
int i;
char c;
char fastmap[(1 << BYTEWIDTH)];
- char *gets();
+
+ /* Allow a command argument to specify the style of syntax. */
+ if (argc > 1)
+ obscure_syntax = atoi (argv[1]);
buf.allocated = 40;
buf.buffer = (char *) malloc (buf.allocated);
buf.fastmap = fastmap;
buf.translate = upcase;
- while (gets(pat))
+ while (1)
{
+ gets (pat);
if (*pat)
{
@@ -1686,4 +1815,4 @@ error (string)
exit (1);
}
-#endif test
+#endif /* test */
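Most of the regex.c hunks above change `char' to `unsigned char' wherever an input byte is used to index a 256-entry table such as `fastmap' or `translate'. The sketch below shows why that matters; it is only an illustration, and `table_lookup' is a hypothetical helper that does not appear in the patch. On platforms where plain `char' is signed, a byte such as 0xE9 would otherwise become a negative index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of regex.c): look up byte I of TEXT
   in a 256-entry TABLE, the role `fastmap' and `translate' play in
   the patch.  */
static int
table_lookup (const unsigned char *table, const char *text, size_t i)
{
  unsigned char c;

  assert (table != NULL && text != NULL);

  /* Convert to unsigned char before indexing, so the index is in
     0..255 even where plain char is signed.  */
  c = (unsigned char) text[i];
  return table[c];
}
```

With a table whose entry 0xE9 is set, looking up the byte 0xE9 finds that entry instead of reading before the start of the table.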
diff --git a/regex.h b/regex.h
index b87b47e6..c9f082df 100644
--- a/regex.h
+++ b/regex.h
@@ -32,7 +32,8 @@ as you receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy a valid copyright notice "Copyright
(C) 1985 Free Software Foundation, Inc."; and include following the
copyright notice a verbatim copy of the above disclaimer of warranty
-and of this License.
+and of this License. You may charge a distribution fee for the
+physical act of transferring a copy.
2. You may modify your copy or copies of this source file or
any portion of it, and copy and distribute such modifications under
@@ -43,31 +44,42 @@ the terms of Paragraph 1 above, provided that you also do the following:
b) cause the whole of any work that you distribute or publish,
that in whole or in part contains or is a derivative of this
- program or any part thereof, to be freely distributed
- and licensed to all third parties on terms identical to those
- contained in this License Agreement (except that you may choose
- to grant more extensive warranty protection to third parties,
- at your option).
-
- 3. You may copy and distribute this program or any portion of it in
-compiled, executable or object code form under the terms of Paragraphs
-1 and 2 above provided that you do the following:
-
- a) cause each such copy to be accompanied by the
- corresponding machine-readable source code, which must
- be distributed under the terms of Paragraphs 1 and 2 above; or,
-
- b) cause each such copy to be accompanied by a
- written offer, with no time limit, to give any third party
- free (except for a nominal shipping charge) a machine readable
- copy of the corresponding source code, to be distributed
- under the terms of Paragraphs 1 and 2 above; or,
-
- c) in the case of a recipient of this program in compiled, executable
- or object code form (without the corresponding source code) you
- shall cause copies you distribute to be accompanied by a copy
- of the written offer of source code which you received along
- with the copy you received.
+ program or any part thereof, to be licensed at no charge to all
+ third parties on terms identical to those contained in this
+ License Agreement (except that you may choose to grant more extensive
+ warranty protection to some or all third parties, at your option).
+
+ c) You may charge a distribution fee for the physical act of
+ transferring a copy, and you may at your option offer warranty
+ protection in exchange for a fee.
+
+Mere aggregation of another unrelated program with this program (or its
+derivative) on a volume of a storage or distribution medium does not bring
+the other program under the scope of these terms.
+
+ 3. You may copy and distribute this program (or a portion or derivative
+of it, under Paragraph 2) in object code or executable form under the terms
+of Paragraphs 1 and 2 above provided that you also do one of the following:
+
+ a) accompany it with the complete corresponding machine-readable
+ source code, which must be distributed under the terms of
+ Paragraphs 1 and 2 above; or,
+
+ b) accompany it with a written offer, valid for at least three
+ years, to give any third party free (except for a nominal
+ shipping charge) a complete machine-readable copy of the
+ corresponding source code, to be distributed under the terms of
+ Paragraphs 1 and 2 above; or,
+
+ c) accompany it with the information you received as to where the
+ corresponding source code may be obtained. (This alternative is
+ allowed only for noncommercial distribution and only if you
+ received the program in object code or executable form alone.)
+
+For an executable file, complete source code means all the source code for
+all modules it contains; but, as a special exception, it need not include
+source code for modules which are standard libraries that accompany the
+operating system on which the executable file runs.
4. You may not copy, sublicense, distribute or transfer this program
except as expressly provided under this License Agreement. Any attempt
@@ -77,23 +89,65 @@ automatically terminated. However, parties who have received computer
software programs from you with this License Agreement will not have
their licenses terminated so long as such parties remain in full compliance.
+ 5. If you wish to incorporate parts of this program into other free
+programs whose distribution conditions are different, write to the Free
+Software Foundation at 675 Mass Ave, Cambridge, MA 02139. We have not yet
+worked out a simple rule that can be stated here, but we will often permit
+this. We will be guided by the two goals of preserving the free status of
+all derivatives of our free software and of promoting the sharing and reuse of
+software.
+
In other words, you are welcome to use, share and improve this program.
You are forbidden to forbid anyone else to use, share and improve
what you give them. Help stamp out software-hoarding! */
+/* Define number of parens for which we record the beginnings and ends.
+ This affects how much space the `struct re_registers' type takes up. */
#ifndef RE_NREGS
#define RE_NREGS 10
#endif
+/* These bits are used in the obscure_syntax variable to choose among
+ alternative regexp syntaxes. */
+
+/* 1 means plain parentheses serve as grouping, and backslash
+ parentheses are needed for literal searching.
+ 0 means backslash-parentheses are grouping, and plain parentheses
+ are for literal searching. */
+#define RE_NO_BK_PARENS 1
+
+/* 1 means plain | serves as the "or"-operator, and \| is a literal.
+ 0 means \| serves as the "or"-operator, and | is a literal. */
+#define RE_NO_BK_VBAR 2
+
+/* 0 means plain + or ? serves as an operator, and \+, \? are literals.
+ 1 means \+, \? are operators and plain +, ? are literals. */
+#define RE_BK_PLUS_QM 4
+
+/* 1 means | binds tighter than ^ or $.
+ 0 means the contrary. */
+#define RE_TIGHT_VBAR 8
+
+/* 1 means treat \n as an _OR operator
+ 0 means treat it as a normal character */
+#define RE_NEWLINE_OR 16
+
+/* 0 means that special characters (such as *, ^, and $) always have
+ their special meaning regardless of the surrounding context.
+ 1 means that special characters may act as normal characters in some
+ contexts. Specifically, this applies to:
+ ^ - only special at the beginning, or after ( or |
+ $ - only special at the end, or before ) or |
+ *, +, ? - only special when not after the beginning, (, or | */
+#define RE_CONTEXT_INDEP_OPS 32
-/* JF for syntax stuff */
-/* To add more variable-syntax features, just use more bits. If we go over 16,
- we probably should make obscure_syntax a long. (JF: Yes, virgina, there
-really are 16 bit machines out there) */
-#define RE_NO_BK_PARENS (1<<0)
-#define RE_NO_BK_VBAR (1<<1)
+/* Now define combinations of bits for the standard possibilities. */
+#define RE_SYNTAX_AWK (RE_NO_BK_PARENS | RE_NO_BK_VBAR | RE_CONTEXT_INDEP_OPS)
+#define RE_SYNTAX_EGREP (RE_SYNTAX_AWK | RE_NEWLINE_OR)
+#define RE_SYNTAX_GREP (RE_BK_PLUS_QM | RE_NEWLINE_OR)
+#define RE_SYNTAX_EMACS 0
/* This data structure is used to represent a compiled pattern. */
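The syntax bits added to regex.h above are meant to be OR-ed together into one value for `obscure_syntax'. The sketch below repeats the definitions from the patch and adds a small hypothetical helper, `syntax_has', to test one bit; the helper is an assumption made for illustration and is not part of regex.h.

```c
#include <assert.h>

/* Bit values and combinations as defined in the regex.h hunk above.  */
#define RE_NO_BK_PARENS 1
#define RE_NO_BK_VBAR 2
#define RE_BK_PLUS_QM 4
#define RE_TIGHT_VBAR 8
#define RE_NEWLINE_OR 16
#define RE_CONTEXT_INDEP_OPS 32

#define RE_SYNTAX_AWK (RE_NO_BK_PARENS | RE_NO_BK_VBAR | RE_CONTEXT_INDEP_OPS)
#define RE_SYNTAX_EGREP (RE_SYNTAX_AWK | RE_NEWLINE_OR)
#define RE_SYNTAX_GREP (RE_BK_PLUS_QM | RE_NEWLINE_OR)
#define RE_SYNTAX_EMACS 0

/* Hypothetical helper: nonzero if BIT is set in the syntax word SYNTAX.  */
static int
syntax_has (int syntax, int bit)
{
  assert (bit != 0);
  return (syntax & bit) != 0;
}
```

Under these definitions RE_SYNTAX_AWK works out to 35, RE_SYNTAX_EGREP to 51 and RE_SYNTAX_GREP to 20. Since the test driver in regex.c reads its first argument with atoi into `obscure_syntax', passing 35 there should select the awk-style syntax.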
diff --git a/version.c b/version.c
new file mode 100644
index 00000000..b6c42731
--- /dev/null
+++ b/version.c
@@ -0,0 +1,25 @@
+char *version_string = "@(#)Gnu Awk (gawk) 2.02beta 23 Dec 1988\n" + 4;
+
+/* 1.02 fixed /= += *= etc to return the new Left Hand Side instead
+ of the Right Hand Side */
+
+/* 1.03 Fixed split() to treat strings of space and tab as FS if
+ the split char is ' '.
+
+ Added -v option to print version number
+
+ Fixed bug that caused rounding when printing large numbers */
+
+/* 2.00beta Incorporated the functionality of the "new" awk as described
+ in the book (reference not handy). Extensively tested, but no
+ doubt still buggy. Badly needs tuning and cleanup, in
+ particular in memory management which is currently almost
+ non-existent. */
+
+ /* JF: Modified to compile under GCC, and fixed a few
+ bugs while I was at it. I hope I didn't add any more.
+ I modified parse.y to reduce the number of reduce/reduce
+ conflicts. There are still a few left. */
+
+/* 2.02 Fixed JF's bugs; improved memory management, still needs
+ lots of work. */
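The `+ 4' in the definition of `version_string' skips the leading "@(#)" marker, which is the pattern the SCCS what(1) utility searches for in binaries. The marker stays in the object file while the program prints only the text after it. The sketch below illustrates the idiom; `tagged_version' and `visible_version' are hypothetical names used here for illustration and do not appear in version.c.

```c
#include <assert.h>
#include <string.h>

/* The full literal, including the what(1) marker, as in version.c.  */
static const char tagged_version[] =
  "@(#)Gnu Awk (gawk) 2.02beta 23 Dec 1988\n";

/* Hypothetical helper: return the part of TAGGED that follows the
   "@(#)" marker, or TAGGED itself if the marker is absent.  */
static const char *
visible_version (const char *tagged)
{
  assert (tagged != NULL);
  return strncmp (tagged, "@(#)", 4) == 0 ? tagged + 4 : tagged;
}
```

Here visible_version (tagged_version) points at the same text that `version_string' holds in the patch, namely the literal with its first four characters skipped.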