author | Eric Blake <ebb9@byu.net> | 2009-01-06 22:03:27 -0700 |
---|---|---|
committer | Eric Blake <ebb9@byu.net> | 2009-01-07 14:11:08 -0700 |
commit | ae9dfa87a514d290fe349710a9f643d52856f4ba (patch) | |
tree | 56561d37876dd6b774b9ce07e608c3df436a9093 | |
parent | e6819ca240b76700f07c31ba157f7795caada02e (diff) | |
download | m4-ae9dfa87a514d290fe349710a9f643d52856f4ba.tar.gz |
Enhance substr to support negative values.
* doc/m4.texinfo (Substr): Document new semantics, and how to
simulate old.
* modules/m4.c (substr): Support negative values.
* NEWS: Document this.
Signed-off-by: Eric Blake <ebb9@byu.net>
(cherry picked from commit e9e4abba45f7e9f368cf497e14bc2ce64b867a02)
-rw-r--r-- | ChangeLog | 8 |
-rw-r--r-- | NEWS | 9 |
-rw-r--r-- | doc/m4.texinfo | 157 |
-rw-r--r-- | modules/m4.c | 46 |
4 files changed, 193 insertions, 27 deletions
@@ -1,3 +1,11 @@
+2009-01-07  Eric Blake  <ebb9@byu.net>
+
+	Enhance substr to support negative values.
+	* doc/m4.texinfo (Substr): Document new semantics, and how to
+	simulate old.
+	* modules/m4.c (substr): Support negative values.
+	* NEWS: Document this.
+
 2009-01-05  Eric Blake  <ebb9@byu.net>

 	Maintainer cleanups.
@@ -1,6 +1,6 @@
 GNU m4 NEWS - History of user-visible changes. -*- outline -*-
-Copyright (C) 1992, 1993, 1994, 1998, 2000, 2001, 2006, 2007, 2008 Free
-Software Foundation, Inc.
+Copyright (C) 1992, 1993, 1994, 1998, 2000, 2001, 2006, 2007, 2008, 2009
+Free Software Foundation, Inc.

 * Noteworthy changes in Version 1.9b (200x-??-??) [beta]
   Released by ????, based on git version 1.9a-*
@@ -242,6 +242,11 @@ promoted to 2.0.
    the current expansion is nested within argument collection of another
    macro. It has also been optimized for faster performance.

+** The `substr' builtin now treats negative arguments as indices relative
+   to the end of the string. The manual gives an
+   example of how to recover M4 1.4.x behavior, as well as an example of
+   simulating the new negative argument semantics with older M4.
+
 ** The `-d'/`--debug' command-line option now understands `-' and `+'
    modifiers, the way the builtin `debugmode' has always done; this allows
    `-d-V' to disable prior debug settings from the command line, similar to
diff --git a/doc/m4.texinfo b/doc/m4.texinfo
index c5e36dd5..6b515cfd 100644
--- a/doc/m4.texinfo
+++ b/doc/m4.texinfo
@@ -46,7 +46,7 @@ This manual (@value{UPDATED}) is for @acronym{GNU} M4 (version
 language.

 Copyright @copyright{} 1989, 1990, 1991, 1992, 1993, 1994, 1998, 1999,
-2000, 2001, 2004, 2005, 2006, 2007, 2008 Free Software Foundation, Inc.
+2000, 2001, 2004, 2005, 2006, 2007, 2008, 2009 Free Software Foundation, Inc.

 @quotation
 Permission is granted to copy, distribute and/or modify this document
@@ -7012,12 +7012,27 @@ regexp(`GNUs not Unix', `\w\(\w+\)$', `POSIX_EXTENDED', `')
 Substrings are extracted with @code{substr}:

 @deffn {Builtin (m4)} substr (@var{string}, @var{from}, @ovar{length})
-Expands to the substring of @var{string}, which starts at index
-@var{from}, and extends for @var{length} characters, or to the end of
-@var{string}, if @var{length} is omitted. The starting index of a
-is always 0. The expansion is empty if there is an error parsing
-@var{from} or @var{length}, if @var{from} is beyond the end of
-@var{string}, or if @var{length} is negative.
+Performs a substring operation on @var{string}. If @var{from} is
+positive, it represents the 0-based index where the substring begins.
+If @var{length} is omitted, the substring ends at the end of
+@var{string}; if it is positive, @var{length} is added to the starting
+index to determine the ending index.
+
+@cindex @acronym{GNU} extensions
+As a @acronym{GNU} extension, if @var{from} is negative, it is added to
+the length of @var{string} to determine the starting index; if it is
+empty, the start of the string is used. Likewise, if @var{length} is
+negative, it is added to the length of @var{string} to determine the
+ending index, and an emtpy @var{length} behaves like an omitted
+@var{length}. It is not an error if either of the resulting indices lie
+outside the string, but the selected substring only contains the bytes
+of @var{string} that overlap the selected indices. If the end point
+lies before the beginning point, the substring chosen is the empty
+string located at the starting index.
+
+The expansion is the selected substring, which may be empty. The
+expansion is empty and a warning issued if @var{from} or @var{length}
+cannot be parsed.

 The macro @code{substr} is recognized only with parameters.
 @end deffn
@@ -7029,15 +7044,137 @@ substr(`gnus, gnats, and armadillos', `6', `5')
 @result{}gnats
 @end example

-Omitting @var{from} evokes a warning, but still produces output.
+Omitting @var{from} evokes a warning, but still produces output. On the
+other hand, selecting a @var{from} or @var{length} that lies beyond
+@var{string} is not a problem.

 @example
 substr(`abc')
 @error{}m4:stdin:1: Warning: substr: too few arguments: 1 < 2
 @result{}abc
-substr(`abc',)
-@error{}m4:stdin:2: Warning: substr: empty string treated as 0
+substr(`abc', `')
 @result{}abc
+substr(`abc', `4')
+@result{}
+substr(`abc', `1', `4')
+@result{}bc
+@end example
+
+Using negative values for @var{from} or @var{length} are @acronym{GNU}
+extensions, useful for accessing a fixed size tail of an
+arbitrary-length string. Prior to M4 1.6, using these values would
+silently result in the empty string. Some other implementations crash
+on negative values, and many treat an explicitly empty @var{length} as
+0, which is different from the omitted @var{length} implying the rest of
+the original @var{string}.
+
+@example
+substr(`abcde', `2', `')
+@result{}cde
+substr(`abcde', `-3')
+@result{}cde
+substr(`abcde', `', `-3')
+@result{}ab
+substr(`abcde', `-6')
+@result{}abcde
+substr(`abcde', `-6', `5')
+@result{}abcd
+substr(`abcde', `-7', `1')
+@result{}
+substr(`abcde', `1', `-2')
+@result{}bc
+substr(`abcde', `-4', `-1')
+@result{}bcd
+substr(`abcde', `4', `-3')
+@result{}
+substr(`abcdefghij', `-09', `08')
+@result{}bcdefghi
+@end example
+
+If backwards compabitility to M4 1.4.x behavior is necessary, the
+following macro is sufficient to do the job (mimicking warnings about
+empty @var{from} or @var{length} or an ignored fourth argument is left
+as an exercise to the reader).
+
+@example
+define(`substr', `ifelse(`$#', `0', ``$0'',
+  eval(`2 < $#')`$3', `1', `',
+  index(`$2$3', `-'), `-1', `builtin(`$0', `$1', `$2', `$3')')')
+@result{}
+substr(`abcde', `3')
+@result{}de
+substr(`abcde', `3', `')
+@result{}
+substr(`abcde', `-1')
+@result{}
+substr(`abcde', `1', `-1')
+@result{}
+substr(`abcde', `2', `1', `C')
+@result{}c
+@end example
+
+On the other hand, it is possible to portably emulate the @acronym{GNU}
+extension of negative @var{from} and @var{length} arguments across all
+@code{m4} implementations, albeit with a lot more overhead. This
+example uses @code{incr} and @code{decr} to normalize @samp{-08} to
+something that a later @code{eval} will treat as a decimal value, rather
+than looking like an invalid octal number, while avoiding using these
+macros on an empty string. The helper macro @code{_substr_normalize} is
+recursive, since it is easier to fix @var{length} after @var{from} has
+been normalized, with the final iteration supplying two non-negative
+arguments to the original builtin, now named @code{_substr}.
+
+@comment options: -daq -t_substr
+@example
+$ @kbd{m4 -daq -t _substr}
+define(`_substr', defn(`substr'))dnl
+define(`substr', `ifelse(`$#', `0', ``$0'',
+  `_$0(`$1', _$0_normalize(len(`$1'),
+    ifelse(`$2', `', `0', `incr(decr(`$2'))'),
+    ifelse(`$3', `', `', `incr(decr(`$3'))')))')')dnl
+define(`_substr_normalize', `ifelse(
+  eval(`$2 < 0 && $1 + $2 >= 0'), `1',
+    `$0(`$1', eval(`$1 + $2'), `$3')',
+  eval(`$2 < 0')`$3', `1', ``0', `$1'',
+  eval(`$2 < 0 && $3 - 0 >= 0 && $1 + $2 + $3 - 0 >= 0'), `1',
+    `$0(`$1', `0', eval(`$1 + $2 + $3 - 0'))',
+  eval(`$2 < 0 && $3 - 0 >= 0'), `1', ``0', `0'',
+  eval(`$2 < 0'), `1', `$0(`$1', `0', `$3')',
+  `$3', `', ``$2', `$1'',
+  eval(`$3 - 0 < 0 && $1 - $2 + $3 - 0 >= 0'), `1',
+    ``$2', eval(`$1 - $2 + $3')',
+  eval(`$3 - 0 < 0'), `1', ``$2', `0'',
+  ``$2', `$3'')')dnl
+substr(`abcde', `2', `')
+@error{}m4trace: -1- _substr(`abcde', `2', `5')
+@result{}cde
+substr(`abcde', `-3')
+@error{}m4trace: -1- _substr(`abcde', `2', `5')
+@result{}cde
+substr(`abcde', `', `-3')
+@error{}m4trace: -1- _substr(`abcde', `0', `2')
+@result{}ab
+substr(`abcde', `-6')
+@error{}m4trace: -1- _substr(`abcde', `0', `5')
+@result{}abcde
+substr(`abcde', `-6', `5')
+@error{}m4trace: -1- _substr(`abcde', `0', `4')
+@result{}abcd
+substr(`abcde', `-7', `1')
+@error{}m4trace: -1- _substr(`abcde', `0', `0')
+@result{}
+substr(`abcde', `1', `-2')
+@error{}m4trace: -1- _substr(`abcde', `1', `2')
+@result{}bc
+substr(`abcde', `-4', `-1')
+@error{}m4trace: -1- _substr(`abcde', `1', `3')
+@result{}bcd
+substr(`abcde', `4', `-3')
+@error{}m4trace: -1- _substr(`abcde', `4', `0')
+@result{}
+substr(`abcdefghij', `-09', `08')
+@error{}m4trace: -1- _substr(`abcdefghij', `1', `8')
+@result{}bcdefghi
 @end example

 @node Translit
diff --git a/modules/m4.c b/modules/m4.c
index f578261a..b09510c8 100644
--- a/modules/m4.c
+++ b/modules/m4.c
@@ -1,6 +1,6 @@
 /* GNU m4 -- A simple macro processor
-   Copyright (C) 2000, 2002, 2003, 2004, 2006, 2007, 2008 Free Software
-   Foundation, Inc.
+   Copyright (C) 2000, 2002, 2003, 2004, 2006, 2007, 2008, 2009 Free
+   Software Foundation, Inc.

    This file is part of GNU M4.

@@ -924,17 +924,19 @@ M4BUILTIN_HANDLER (index)
   m4_shipout_int (obs, retval);
 }

-/* The macro "substr" extracts substrings from the first argument, starting
-   from the index given by the second argument, extending for a length
-   given by the third argument. If the third argument is missing, the
-   substring extends to the end of the first argument. */
+/* The macro "substr" extracts substrings from the first argument,
+   starting from the index given by the second argument, extending for
+   a length given by the third argument. If the third argument is
+   missing or empty, the substring extends to the end of the first
+   argument. As an extension, negative arguments are treated as
+   indices relative to the string length. */
 M4BUILTIN_HANDLER (substr)
 {
   const m4_call_info *me = m4_arg_info (argv);
   const char *str = M4ARG (1);
   int start = 0;
+  int end;
   int length;
-  int avail;

   if (argc <= 2)
     {
@@ -942,19 +944,33 @@ M4BUILTIN_HANDLER (substr)
       return;
     }

-  length = avail = M4ARGLEN (1);
-  if (!m4_numeric_arg (context, me, M4ARG (2), &start))
+  length = M4ARGLEN (1);
+  if (!m4_arg_empty (argv, 2)
+      && !m4_numeric_arg (context, me, M4ARG (2), &start))
     return;
+  if (start < 0)
+    start += length;

-  if (argc >= 4 && !m4_numeric_arg (context, me, M4ARG (3), &length))
-    return;
+  if (m4_arg_empty (argv, 3))
+    end = length;
+  else
+    {
+      if (!m4_numeric_arg (context, me, M4ARG (3), &end))
+        return;
+      if (end < 0)
+        end += length;
+      else
+        end += start;
+    }

-  if (start < 0 || length <= 0 || start >= avail)
+  if (start < 0)
+    start = 0;
+  if (length < end)
+    end = length;
+  if (end <= start)
     return;

-  if (start + length > avail)
-    length = avail - start;
-  obstack_grow (obs, str + start, length);
+  obstack_grow (obs, str + start, end - start);
 }
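The index arithmetic introduced by this patch can be tried outside of m4. The following is a minimal standalone sketch and not the code from `modules/m4.c`: the function name `substr_model`, the convention that a NULL pointer stands for an empty or omitted argument, and the caller-supplied output buffer are all assumptions made for illustration. It also ignores integer overflow for extreme argument values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the patched substr index computation.
   FROM and LEN point to the parsed numeric arguments; a NULL pointer
   models an empty or omitted argument (an assumption of this sketch).
   The selected bytes are copied to OUT (truncated to OUTSIZE - 1) and
   NUL-terminated; the number of bytes copied is returned.  */
size_t
substr_model (const char *str, const int *from, const int *len,
              char *out, size_t outsize)
{
  int length, start, end;
  size_t n;

  assert (str != NULL && out != NULL && outsize > 0);
  length = (int) strlen (str);

  /* An empty FROM means the start of the string; a negative FROM
     counts back from the end of the string.  */
  start = from ? *from : 0;
  if (start < 0)
    start += length;

  /* An empty LEN means "to the end"; a negative LEN is an end index
     relative to the end of the string; otherwise LEN is added to the
     (not yet clamped) starting index.  */
  if (len == NULL)
    end = length;
  else if (*len < 0)
    end = *len + length;
  else
    end = *len + start;

  /* Clamp both indices to the string; an inverted range is empty.  */
  if (start < 0)
    start = 0;
  if (length < end)
    end = length;
  n = end <= start ? 0 : (size_t) (end - start);

  if (n >= outsize)
    n = outsize - 1;
  if (n > 0)
    memcpy (out, str + start, n);
  out[n] = '\0';
  return n;
}
```

As a worked case, calling this sketch on `abcde` with `from = -4` and `len = -1` gives `start = 1` and `end = 4`, selecting `bcd`, which agrees with the `substr(`abcde', `-4', `-1')` example in the manual text of the diff. Note that a positive length is added to the starting index before that index is clamped to 0, which is why `from = -6` with `len = 5` yields `abcd` rather than `abcde`.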