------------------------------------------------------------------------------
-- --
-- GNAT LIBRARY COMPONENTS --
-- --
-- G N A T . S P I T B O L . P A T T E R N S --
-- --
-- S p e c --
-- --
-- Copyright (C) 1997-2013, AdaCore --
-- --
-- GNAT is free software; you can redistribute it and/or modify it under --
-- terms of the GNU General Public License as published by the Free Soft- --
-- ware Foundation; either version 3, or (at your option) any later ver- --
-- sion. GNAT is distributed in the hope that it will be useful, but WITH- --
-- OUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY --
-- or FITNESS FOR A PARTICULAR PURPOSE. --
-- --
-- As a special exception under Section 7 of GPL version 3, you are granted --
-- additional permissions described in the GCC Runtime Library Exception, --
-- version 3.1, as published by the Free Software Foundation. --
-- --
-- You should have received a copy of the GNU General Public License and --
-- a copy of the GCC Runtime Library Exception along with this program; --
-- see the files COPYING3 and COPYING.RUNTIME respectively. If not, see --
-- . --
-- --
-- GNAT was originally developed by the GNAT team at New York University. --
-- Extensive contributions were provided by Ada Core Technologies Inc. --
-- --
------------------------------------------------------------------------------
-- SPITBOL-like pattern construction and matching
-- This child package of GNAT.SPITBOL provides a complete implementation
-- of the SPITBOL-like pattern construction and matching operations. This
-- package is based on Macro-SPITBOL created by Robert Dewar.
------------------------------------------------------------
-- Summary of Pattern Matching Packages in GNAT Hierarchy --
------------------------------------------------------------
-- There are three related packages that perform pattern matching functions.
-- the following is an outline of these packages, to help you determine
-- which is best for your needs.
-- GNAT.Regexp (files g-regexp.ads/g-regexp.adb)
-- This is a simple package providing Unix-style regular expression
-- matching with the restriction that it matches entire strings. It
-- is particularly useful for file name matching, and in particular
-- it provides "globbing patterns" that are useful in implementing
-- unix or DOS style wild card matching for file names.
-- GNAT.Regpat (files g-regpat.ads/g-regpat.adb)
-- This is a more complete implementation of Unix-style regular
-- expressions, copied from the original V7 style regular expression
-- library written in C by Henry Spencer. It is functionally the
-- same as this library, and uses the same internal data structures
-- stored in a binary compatible manner.
-- GNAT.Spitbol.Patterns (files g-spipat.ads/g-spipat.adb)
-- This is a completely general patterm matching package based on the
-- pattern language of SNOBOL4, as implemented in SPITBOL. The pattern
-- language is modeled on context free grammars, with context sensitive
-- extensions that provide full (type 0) computational capabilities.
with Ada.Strings.Maps; use Ada.Strings.Maps;
with Ada.Text_IO; use Ada.Text_IO;
package GNAT.Spitbol.Patterns is
pragma Elaborate_Body;
-------------------------------
-- Pattern Matching Tutorial --
-------------------------------
-- A pattern matching operation (a call to one of the Match subprograms)
-- takes a subject string and a pattern, and optionally a replacement
-- string. The replacement string option is only allowed if the subject
-- is a variable.
-- The pattern is matched against the subject string, and either the
-- match fails, or it succeeds matching a contiguous substring. If a
-- replacement string is specified, then the subject string is modified
-- by replacing the matched substring with the given replacement.
-- Concatenation and Alternation
-- =============================
-- A pattern consists of a series of pattern elements. The pattern is
-- built up using either the concatenation operator:
-- A & B
-- which means match A followed immediately by matching B, or the
-- alternation operator:
-- A or B
-- which means first attempt to match A, and then if that does not
-- succeed, match B.
-- There is full backtracking, which means that if a given pattern
-- element fails to match, then previous alternatives are matched.
-- For example if we have the pattern:
-- (A or B) & (C or D) & (E or F)
-- First we attempt to match A, if that succeeds, then we go on to try
-- to match C, and if that succeeds, we go on to try to match E. If E
-- fails, then we try F. If F fails, then we go back and try matching
-- D instead of C. Let's make this explicit using a specific example,
-- and introducing the simplest kind of pattern element, which is a
-- literal string. The meaning of this pattern element is simply to
-- match the characters that correspond to the string characters. Now
-- let's rewrite the above pattern form with specific string literals
-- as the pattern elements:
-- ("ABC" or "AB") & ("DEF" or "CDE") & ("GH" or "IJ")
-- The following strings will be attempted in sequence:
-- ABC . DEF . GH
-- ABC . DEF . IJ
-- ABC . CDE . GH
-- ABC . CDE . IJ
-- AB . DEF . GH
-- AB . DEF . IJ
-- AB . CDE . GH
-- AB . CDE . IJ
-- Here we use the dot simply to separate the pieces of the string
-- matched by the three separate elements.
-- Moving the Start Point
-- ======================
-- A pattern is not required to match starting at the first character
-- of the string, and is not required to match to the end of the string.
-- The first attempt does indeed attempt to match starting at the first
-- character of the string, trying all the possible alternatives. But
-- if all alternatives fail, then the starting point of the match is
-- moved one character, and all possible alternatives are attempted at
-- the new anchor point.
-- The entire match fails only when every possible starting point has
-- been attempted. As an example, suppose that we had the subject
-- string
-- "ABABCDEIJKL"
-- matched using the pattern in the previous example:
-- ("ABC" or "AB") & ("DEF" or "CDE") & ("GH" or "IJ")
-- would succeed, after two anchor point moves:
-- "ABABCDEIJKL"
-- ^^^^^^^
-- matched
-- section
-- This mode of pattern matching is called the unanchored mode. It is
-- also possible to put the pattern matcher into anchored mode by
-- setting the global variable Anchored_Mode to True. This will cause
-- all subsequent matches to be performed in anchored mode, where the
-- match is required to start at the first character.
-- We will also see later how the effect of an anchored match can be
-- obtained for a single specified anchor point if this is desired.
-- Other Pattern Elements
-- ======================
-- In addition to strings (or single characters), there are many special
-- pattern elements that correspond to special predefined alternations:
-- Arb Matches any string. First it matches the null string, and
-- then on a subsequent failure, matches one character, and
-- then two characters, and so on. It only fails if the
-- entire remaining string is matched.
-- Bal Matches a non-empty string that is parentheses balanced
-- with respect to ordinary () characters. Examples of
-- balanced strings are "ABC", "A((B)C)", and "A(B)C(D)E".
-- Bal matches the shortest possible balanced string on the
-- first attempt, and if there is a subsequent failure,
-- attempts to extend the string.
-- Cancel Immediately aborts the entire pattern match, signalling
-- failure. This is a specialized pattern element, which is
-- useful in conjunction with some of the special pattern
-- elements that have side effects.
-- Fail The null alternation. Matches no possible strings, so it
-- always signals failure. This is a specialized pattern
-- element, which is useful in conjunction with some of the
-- special pattern elements that have side effects.
-- Fence Matches the null string at first, and then if a failure
-- causes alternatives to be sought, aborts the match (like
-- a Cancel). Note that using Fence at the start of a pattern
-- has the same effect as matching in anchored mode.
-- Rest Matches from the current point to the last character in
-- the string. This is a specialized pattern element, which
-- is useful in conjunction with some of the special pattern
-- elements that have side effects.
-- Succeed Repeatedly matches the null string (it is equivalent to
-- the alternation ("" or "" or "" ....). This is a special
-- pattern element, which is useful in conjunction with some
-- of the special pattern elements that have side effects.
-- Pattern Construction Functions
-- ==============================
-- The following functions construct additional pattern elements
-- Any(S) Where S is a string, matches a single character that is
-- any one of the characters in S. Fails if the current
-- character is not one of the given set of characters.
-- Arbno(P) Where P is any pattern, matches any number of instances
-- of the pattern, starting with zero occurrences. It is
-- thus equivalent to ("" or (P & ("" or (P & ("" ....)))).
-- The pattern P may contain any number of pattern elements
-- including the use of alternation and concatenation.
-- Break(S) Where S is a string, matches a string of zero or more
-- characters up to but not including a break character
-- that is one of the characters given in the string S.
-- Can match the null string, but cannot match the last
-- character in the string, since a break character is
-- required to be present.
-- BreakX(S) Where S is a string, behaves exactly like Break(S) when
-- it first matches, but if a string is successfully matched,
-- then a subsequent failure causes an attempt to extend the
-- matched string.
-- Fence(P) Where P is a pattern, attempts to match the pattern P
-- including trying all possible alternatives of P. If none
-- of these alternatives succeeds, then the Fence pattern
-- fails. If one alternative succeeds, then the pattern
-- match proceeds, but on a subsequent failure, no attempt
-- is made to search for alternative matches of P. The
-- pattern P may contain any number of pattern elements
-- including the use of alternation and concatenation.
-- Len(N) Where N is a natural number, matches the given number of
-- characters. For example, Len(10) matches any string that
-- is exactly ten characters long.
-- NotAny(S) Where S is a string, matches a single character that is
-- not one of the characters of S. Fails if the current
-- character is one of the given set of characters.
-- NSpan(S) Where S is a string, matches a string of zero or more
-- characters that is among the characters given in the
-- string. Always matches the longest possible such string.
-- Always succeeds, since it can match the null string.
-- Pos(N) Where N is a natural number, matches the null string
-- if exactly N characters have been matched so far, and
-- otherwise fails.
-- Rpos(N) Where N is a natural number, matches the null string
-- if exactly N characters remain to be matched, and
-- otherwise fails.
-- Rtab(N) Where N is a natural number, matches characters from
-- the current position until exactly N characters remain
-- to be matched in the string. Fails if fewer than N
-- unmatched characters remain in the string.
-- Tab(N) Where N is a natural number, matches characters from
-- the current position until exactly N characters have
-- been matched in all. Fails if more than N characters
-- have already been matched.
-- Span(S) Where S is a string, matches a string of one or more
-- characters that is among the characters given in the
-- string. Always matches the longest possible such string.
-- Fails if the current character is not one of the given
-- set of characters.
-- Recursive Pattern Matching
-- ==========================
-- The plus operator (+P) where P is a pattern variable, creates
-- a recursive pattern that will, at pattern matching time, follow
-- the pointer to obtain the referenced pattern, and then match this
-- pattern. This may be used to construct recursive patterns. Consider
-- for example:
-- P := ("A" or ("B" & (+P)))
-- On the first attempt, this pattern attempts to match the string "A".
-- If this fails, then the alternative matches a "B", followed by an
-- attempt to match P again. This second attempt first attempts to
-- match "A", and so on. The result is a pattern that will match a
-- string of B's followed by a single A.
-- This particular example could simply be written as NSpan('B') & 'A',
-- but the use of recursive patterns in the general case can construct
-- complex patterns which could not otherwise be built.
-- Pattern Assignment Operations
-- =============================
-- In addition to the overall result of a pattern match, which indicates
-- success or failure, it is often useful to be able to keep track of
-- the pieces of the subject string that are matched by individual
-- pattern elements, or subsections of the pattern.
-- The pattern assignment operators allow this capability. The first
-- form is the immediate assignment:
-- P * S
-- Here P is an arbitrary pattern, and S is a variable of type VString
-- that will be set to the substring matched by P. This assignment
-- happens during pattern matching, so if P matches more than once,
-- then the assignment happens more than once.
-- The deferred assignment operation:
-- P ** S
-- avoids these multiple assignments by deferring the assignment to the
-- end of the match. If the entire match is successful, and if the
-- pattern P was part of the successful match, then at the end of the
-- matching operation the assignment to S of the string matching P is
-- performed.
-- The cursor assignment operation:
-- Setcur(N'Access)
-- assigns the current cursor position to the natural variable N. The
-- cursor position is defined as the count of characters that have been
-- matched so far (including any start point moves).
-- Finally the operations * and ** may be used with values of type
-- Text_IO.File_Access. The effect is to do a Put_Line operation of
-- the matched substring. These are particularly useful in debugging
-- pattern matches.
-- Deferred Matching
-- =================
-- The pattern construction functions (such as Len and Any) all permit
-- the use of pointers to natural or string values, or functions that
-- return natural or string values. These forms cause the actual value
-- to be obtained at pattern matching time. This allows interesting
-- possibilities for constructing dynamic patterns as illustrated in
-- the examples section.
-- In addition the (+S) operator may be used where S is a pointer to
-- string or function returning string, with a similar deferred effect.
-- A special use of deferred matching is the construction of predicate
-- functions. The element (+P) where P is an access to a function that
-- returns a Boolean value, causes the function to be called at the
-- time the element is matched. If the function returns True, then the
-- null string is matched, if the function returns False, then failure
-- is signalled and previous alternatives are sought.
-- Deferred Replacement
-- ====================
-- The simple model given for pattern replacement (where the matched
-- substring is replaced by the string given as the third argument to
-- Match) works fine in simple cases, but this approach does not work
-- in the case where the expression used as the replacement string is
-- dependent on values set by the match.
-- For example, suppose we want to find an instance of a parenthesized
-- character, and replace the parentheses with square brackets. At first
-- glance it would seem that:
-- Match (Subject, '(' & Len (1) * Char & ')', '[' & Char & ']');
-- would do the trick, but that does not work, because the third
-- argument to Match gets evaluated too early, before the call to
-- Match, and before the pattern match has had a chance to set Char.
-- To solve this problem we provide the deferred replacement capability.
-- With this approach, which of course is only needed if the pattern
-- involved has side effects, is to do the match in two stages. The
-- call to Match sets a pattern result in a variable of the private
-- type Match_Result, and then a subsequent Replace operation uses
-- this Match_Result object to perform the required replacement.
-- Using this approach, we can now write the above operation properly
-- in a manner that will work:
-- M : Match_Result;
-- ...
-- Match (Subject, '(' & Len (1) * Char & ')', M);
-- Replace (M, '[' & Char & ']');
-- As with other Match cases, there is a function and procedure form
-- of this match call. A call to Replace after a failed match has no
-- effect. Note that Subject should not be modified between the calls.
-- Examples of Pattern Matching
-- ============================
-- First a simple example of the use of pattern replacement to remove
-- a line number from the start of a string. We assume that the line
-- number has the form of a string of decimal digits followed by a
-- period, followed by one or more spaces.
-- Digs : constant Pattern := Span("0123456789");
-- Lnum : constant Pattern := Pos(0) & Digs & '.' & Span(' ');
-- Now to use this pattern we simply do a match with a replacement:
-- Match (Line, Lnum, "");
-- which replaces the line number by the null string. Note that it is
-- also possible to use an Ada.Strings.Maps.Character_Set value as an
-- argument to Span and similar functions, and in particular all the
-- useful constants 'in Ada.Strings.Maps.Constants are available. This
-- means that we could define Digs as:
-- Digs : constant Pattern := Span(Decimal_Digit_Set);
-- The style we use here, of defining constant patterns and then using
-- them is typical. It is possible to build up patterns dynamically,
-- but it is usually more efficient to build them in pieces in advance
-- using constant declarations. Note in particular that although it is
-- possible to construct a pattern directly as an argument for the
-- Match routine, it is much more efficient to preconstruct the pattern
-- as we did in this example.
-- Now let's look at the use of pattern assignment to break a
-- string into sections. Suppose that the input string has two
-- unsigned decimal integers, separated by spaces or a comma,
-- with spaces allowed anywhere. Then we can isolate the two
-- numbers with the following pattern:
-- Num1, Num2 : aliased VString;
-- B : constant Pattern := NSpan(' ');
-- N : constant Pattern := Span("0123456789");
-- T : constant Pattern :=
-- NSpan(' ') & N * Num1 & Span(" ,") & N * Num2;
-- The match operation Match (" 124, 257 ", T) would assign the
-- string 124 to Num1 and the string 257 to Num2.
-- Now let's see how more complex elements can be built from the
-- set of primitive elements. The following pattern matches strings
-- that have the syntax of Ada 95 based literals:
-- Digs : constant Pattern := Span(Decimal_Digit_Set);
-- UDigs : constant Pattern := Digs & Arbno('_' & Digs);
-- Edig : constant Pattern := Span(Hexadecimal_Digit_Set);
-- UEdig : constant Pattern := Edig & Arbno('_' & Edig);
-- Bnum : constant Pattern := Udigs & '#' & UEdig & '#';
-- A match against Bnum will now match the desired strings, e.g.
-- it will match 16#123_abc#, but not a#b#. However, this pattern
-- is not quite complete, since it does not allow colons to replace
-- the pound signs. The following is more complete:
-- Bchar : constant Pattern := Any("#:");
-- Bnum : constant Pattern := Udigs & Bchar & UEdig & Bchar;
-- but that is still not quite right, since it allows # and : to be
-- mixed, and they are supposed to be used consistently. We solve
-- this by using a deferred match.
-- Temp : aliased VString;
-- Bnum : constant Pattern :=
-- Udigs & Bchar * Temp & UEdig & (+Temp)
-- Here the first instance of the base character is stored in Temp, and
-- then later in the pattern we rematch the value that was assigned.
-- For an example of a recursive pattern, let's define a pattern
-- that is like the built in Bal, but the string matched is balanced
-- with respect to square brackets or curly brackets.
-- The language for such strings might be defined in extended BNF as
-- ELEMENT ::=
-- | '[' BALANCED_STRING ']'
-- | '{' BALANCED_STRING '}'
-- BALANCED_STRING ::= ELEMENT {ELEMENT}
-- Here we use {} to indicate zero or more occurrences of a term, as
-- is common practice in extended BNF. Now we can translate the above
-- BNF into recursive patterns as follows:
-- Element, Balanced_String : aliased Pattern;
-- .
-- .
-- .
-- Element := NotAny ("[]{}")
-- or
-- ('[' & (+Balanced_String) & ']')
-- or
-- ('{' & (+Balanced_String) & '}');
-- Balanced_String := Element & Arbno (Element);
-- Note the important use of + here to refer to a pattern not yet
-- defined. Note also that we use assignments precisely because we
-- cannot refer to as yet undeclared variables in initializations.
-- Now that this pattern is constructed, we can use it as though it
-- were a new primitive pattern element, and for example, the match:
-- Match ("xy[ab{cd}]", Balanced_String * Current_Output & Fail);
-- will generate the output:
-- x
-- xy
-- xy[ab{cd}]
-- y
-- y[ab{cd}]
-- [ab{cd}]
-- a
-- ab
-- ab{cd}
-- b
-- b{cd}
-- {cd}
-- c
-- cd
-- d
-- Note that the function of the fail here is simply to force the
-- pattern Balanced_String to match all possible alternatives. Studying
-- the operation of this pattern in detail is highly instructive.
-- Finally we give a rather elaborate example of the use of deferred
-- matching. The following declarations build up a pattern which will
-- find the longest string of decimal digits in the subject string.
-- Max, Cur : VString;
-- Loc : Natural;
-- function GtS return Boolean is
-- begin
-- return Length (Cur) > Length (Max);
-- end GtS;
-- Digit : constant Character_Set := Decimal_Digit_Set;
-- Digs : constant Pattern := Span(Digit);
-- Find : constant Pattern :=
-- "" * Max & Fence & -- initialize Max to null
-- BreakX (Digit) & -- scan looking for digits
-- ((Span(Digit) * Cur & -- assign next string to Cur
-- (+GtS'Unrestricted_Access) & -- check size(Cur) > Size(Max)
-- Setcur(Loc'Access)) -- if so, save location
-- * Max) & -- and assign to Max
-- Fail; -- seek all alternatives
-- As we see from the comments here, complex patterns like this take
-- on aspects of sequential programs. In fact they are sequential
-- programs with general backtracking. In this pattern, we first use
-- a pattern assignment that matches null and assigns it to Max, so
-- that it is initialized for the new match. Now BreakX scans to the
-- next digit. Arb would do here, but BreakX will be more efficient.
-- Once we have found a digit, we scan out the longest string of
-- digits with Span, and assign it to Cur. The deferred call to GtS
-- tests if the string we assigned to Cur is the longest so far. If
-- not, then failure is signalled, and we seek alternatives (this
-- means that BreakX will extend and look for the next digit string).
-- If the call to GtS succeeds then the matched string is assigned
-- as the largest string so far into Max and its location is saved
-- in Loc. Finally Fail forces the match to fail and seek alternatives,
-- so that the entire string is searched.
-- If the pattern Find is matched against a string, the variable Max
-- at the end of the pattern will have the longest string of digits,
-- and Loc will be the starting character location of the string. For
-- example, Match("ab123cd4657ef23", Find) will assign "4657" to Max
-- and 11 to Loc (indicating that the string ends with the eleventh
-- character of the string).
-- Note: the use of Unrestricted_Access to reference GtS will not
-- be needed if GtS is defined at the outer level, but definitely
-- will be necessary if GtS is a nested function (in which case of
-- course the scope of the pattern Find will be restricted to this
-- nested scope, and this cannot be checked, i.e. use of the pattern
-- outside this scope is erroneous). Generally it is a good idea to
-- define patterns and the functions they call at the outer level
-- where possible, to avoid such problems.
-- Correspondence with Pattern Matching in SPITBOL
-- ===============================================
-- Generally the Ada syntax and names correspond closely to SPITBOL
-- syntax for pattern matching construction.
-- The basic pattern construction operators are renamed as follows:
-- Spitbol Ada
-- (space) &
-- | or
-- $ *
-- . **
-- The Ada operators were chosen so that the relative precedences of
-- these operators corresponds to that of the Spitbol operators, but
-- as always, the use of parentheses is advisable to clarify.
-- The pattern construction operators all have similar names except for
-- Spitbol Ada
-- Abort Cancel
-- Rem Rest
-- where we have clashes with Ada reserved names
-- Ada requires the use of 'Access to refer to functions used in the
-- pattern match, and often the use of 'Unrestricted_Access may be
-- necessary to get around the scope restrictions if the functions
-- are not declared at the outer level.
-- The actual pattern matching syntax is modified in Ada as follows:
-- Spitbol Ada
-- X Y Match (X, Y);
-- X Y = Z Match (X, Y, Z);
-- and pattern failure is indicated by returning a Boolean result from
-- the Match function (True for success, False for failure).
-----------------------
-- Type Declarations --
-----------------------
type Pattern is private;
-- Type representing a pattern. This package provides a complete set of
-- operations for constructing patterns that can be used in the pattern
-- matching operations provided.
type Boolean_Func is access function return Boolean;
-- General Boolean function type. When this type is used as a formal
-- parameter type in this package, it indicates a deferred predicate
-- pattern. The function will be called when the pattern element is
-- matched and failure signalled if False is returned.
type Natural_Func is access function return Natural;
-- General Natural function type. When this type is used as a formal
-- parameter type in this package, it indicates a deferred pattern.
-- The function will be called when the pattern element is matched
-- to obtain the currently referenced Natural value.
type VString_Func is access function return VString;
-- General VString function type. When this type is used as a formal
-- parameter type in this package, it indicates a deferred pattern.
-- The function will be called when the pattern element is matched
-- to obtain the currently referenced string value.
subtype PString is String;
-- This subtype is used in the remainder of the package to indicate a
-- formal parameter that is converted to its corresponding pattern,
-- i.e. a pattern that matches the characters of the string.
subtype PChar is Character;
-- Similarly, this subtype is used in the remainder of the package to
-- indicate a formal parameter that is converted to its corresponding
-- pattern, i.e. a pattern that matches this one character.
subtype VString_Var is VString;
subtype Pattern_Var is Pattern;
-- These synonyms are used as formal parameter types to a function where,
-- if the language allowed, we would use in out parameters, but we are
-- not allowed to have in out parameters for functions. Instead we pass
-- actuals which must be variables, and with a bit of trickery in the
-- body, manage to interpret them properly as though they were indeed
-- in out parameters.
pragma Warnings (Off, VString_Var);
pragma Warnings (Off, Pattern_Var);
-- We turn off warnings for these two types so that when variables are used
-- as arguments in this context, warnings about them not being assigned in
-- the source program will be suppressed.
--------------------------------
-- Basic Pattern Construction --
--------------------------------
function "&" (L : Pattern; R : Pattern) return Pattern;
function "&" (L : PString; R : Pattern) return Pattern;
function "&" (L : Pattern; R : PString) return Pattern;
function "&" (L : PChar; R : Pattern) return Pattern;
function "&" (L : Pattern; R : PChar) return Pattern;
-- Pattern concatenation. Matches L followed by R
function "or" (L : Pattern; R : Pattern) return Pattern;
function "or" (L : PString; R : Pattern) return Pattern;
function "or" (L : Pattern; R : PString) return Pattern;
function "or" (L : PString; R : PString) return Pattern;
function "or" (L : PChar; R : Pattern) return Pattern;
function "or" (L : Pattern; R : PChar) return Pattern;
function "or" (L : PChar; R : PChar) return Pattern;
function "or" (L : PString; R : PChar) return Pattern;
function "or" (L : PChar; R : PString) return Pattern;
-- Pattern alternation. Creates a pattern that will first try to match
-- L and then on a subsequent failure, attempts to match R instead.
----------------------------------
-- Pattern Assignment Functions --
----------------------------------
function "*" (P : Pattern; Var : VString_Var) return Pattern;
function "*" (P : PString; Var : VString_Var) return Pattern;
function "*" (P : PChar; Var : VString_Var) return Pattern;
-- Matches P, and if the match succeeds, assigns the matched substring
-- to the given VString variable Var. This assignment happens as soon as
-- the substring is matched, and if the pattern P1 is matched more than
-- once during the course of the match, then the assignment will occur
-- more than once.
function "**" (P : Pattern; Var : VString_Var) return Pattern;
function "**" (P : PString; Var : VString_Var) return Pattern;
function "**" (P : PChar; Var : VString_Var) return Pattern;
-- Like "*" above, except that the assignment happens at most once
-- after the entire match is completed successfully. If the match
-- fails, then no assignment takes place.
----------------------------------
-- Deferred Matching Operations --
----------------------------------
function "+" (Str : VString_Var) return Pattern;
-- Here Str must be a VString variable. This function constructs a
-- pattern which at pattern matching time will access the current
-- value of this variable, and match against these characters.
function "+" (Str : VString_Func) return Pattern;
-- Constructs a pattern which at pattern matching time calls the given
-- function, and then matches against the string or character value
-- that is returned by the call.
function "+" (P : Pattern_Var) return Pattern;
-- Here P must be a Pattern variable. This function constructs a
-- pattern which at pattern matching time will access the current
-- value of this variable, and match against the pattern value.
function "+" (P : Boolean_Func) return Pattern;
-- Constructs a predicate pattern function that at pattern matching time
-- calls the given function. If True is returned, then the pattern matches.
-- If False is returned, then failure is signalled.
--------------------------------
-- Pattern Building Functions --
--------------------------------
function Arb return Pattern;
-- Constructs a pattern that will match any string. On the first attempt,
-- the pattern matches a null string, then on each successive failure, it
-- matches one more character, and only fails if matching the entire rest
-- of the string.
function Arbno (P : Pattern) return Pattern;
function Arbno (P : PString) return Pattern;
function Arbno (P : PChar) return Pattern;
-- Pattern repetition. First matches null, then on a subsequent failure
-- attempts to match an additional instance of the given pattern.
-- Equivalent to (but more efficient than) P & ("" or (P & ("" or ...
function Any (Str : String) return Pattern;
function Any (Str : VString) return Pattern;
function Any (Str : Character) return Pattern;
function Any (Str : Character_Set) return Pattern;
function Any (Str : not null access VString) return Pattern;
function Any (Str : VString_Func) return Pattern;
-- Constructs a pattern that matches a single character that is one of
-- the characters in the given argument. The pattern fails if the current
-- character is not in Str.
function Bal return Pattern;
-- Constructs a pattern that will match any non-empty string that is
-- parentheses balanced with respect to the normal parentheses characters.
-- Attempts to extend the string if a subsequent failure occurs.
function Break (Str : String) return Pattern;
function Break (Str : VString) return Pattern;
function Break (Str : Character) return Pattern;
function Break (Str : Character_Set) return Pattern;
function Break (Str : not null access VString) return Pattern;
function Break (Str : VString_Func) return Pattern;
-- Constructs a pattern that matches a (possibly null) string which
-- is immediately followed by a character in the given argument. This
-- character is not part of the matched string. The pattern fails if
-- the remaining characters to be matched do not include any of the
-- characters in Str.
function BreakX (Str : String) return Pattern;
function BreakX (Str : VString) return Pattern;
function BreakX (Str : Character) return Pattern;
function BreakX (Str : Character_Set) return Pattern;
function BreakX (Str : not null access VString) return Pattern;
function BreakX (Str : VString_Func) return Pattern;
-- Like Break, but the pattern attempts to extend on a failure to find
-- the next occurrence of a character in Str, and only fails when the
-- last such instance causes a failure.
function Cancel return Pattern;
-- Constructs a pattern that immediately aborts the entire match
function Fail return Pattern;
-- Constructs a pattern that always fails
function Fence return Pattern;
-- Constructs a pattern that matches null on the first attempt, and then
-- causes the entire match to be aborted if a subsequent failure occurs.
function Fence (P : Pattern) return Pattern;
-- Constructs a pattern that first matches P. If P fails, then the
-- constructed pattern fails. If P succeeds, then the match proceeds,
-- but if subsequent failure occurs, alternatives in P are not sought.
-- The idea of Fence is that each time the pattern is matched, just
-- one attempt is made to match P, without trying alternatives.
function Len (Count : Natural) return Pattern;
function Len (Count : not null access Natural) return Pattern;
function Len (Count : Natural_Func) return Pattern;
-- Constructs a pattern that matches exactly the given number of
-- characters. The pattern fails if fewer than this number of characters
-- remain to be matched in the string.
function NotAny (Str : String) return Pattern;
function NotAny (Str : VString) return Pattern;
function NotAny (Str : Character) return Pattern;
function NotAny (Str : Character_Set) return Pattern;
function NotAny (Str : not null access VString) return Pattern;
function NotAny (Str : VString_Func) return Pattern;
-- Constructs a pattern that matches a single character that is not
-- one of the characters in the given argument. The pattern Fails if
-- the current character is in Str.
function NSpan (Str : String) return Pattern;
function NSpan (Str : VString) return Pattern;
function NSpan (Str : Character) return Pattern;
function NSpan (Str : Character_Set) return Pattern;
function NSpan (Str : not null access VString) return Pattern;
function NSpan (Str : VString_Func) return Pattern;
-- Constructs a pattern that matches the longest possible string
-- consisting entirely of characters from the given argument. The
-- string may be empty, so this pattern always succeeds.
function Pos (Count : Natural) return Pattern;
function Pos (Count : not null access Natural) return Pattern;
function Pos (Count : Natural_Func) return Pattern;
-- Constructs a pattern that matches the null string if exactly Count
-- characters have already been matched, and otherwise fails.
function Rest return Pattern;
-- Constructs a pattern that always succeeds, matching the remaining
-- unmatched characters in the pattern.
function Rpos (Count : Natural) return Pattern;
function Rpos (Count : not null access Natural) return Pattern;
function Rpos (Count : Natural_Func) return Pattern;
-- Constructs a pattern that matches the null string if exactly Count
-- characters remain to be matched in the string, and otherwise fails.
function Rtab (Count : Natural) return Pattern;
function Rtab (Count : not null access Natural) return Pattern;
function Rtab (Count : Natural_Func) return Pattern;
-- Constructs a pattern that matches from the current location until
-- exactly Count characters remain to be matched in the string. The
-- pattern fails if fewer than Count characters remain to be matched.
function Setcur (Var : not null access Natural) return Pattern;
-- Constructs a pattern that matches the null string, and assigns the
-- current cursor position in the string. This value is the number of
-- characters matched so far. So it is zero at the start of the match.
function Span (Str : String) return Pattern;
function Span (Str : VString) return Pattern;
function Span (Str : Character) return Pattern;
function Span (Str : Character_Set) return Pattern;
function Span (Str : not null access VString) return Pattern;
function Span (Str : VString_Func) return Pattern;
-- Constructs a pattern that matches the longest possible string
-- consisting entirely of characters from the given argument. The
-- string cannot be empty , so the pattern fails if the current
-- character is not one of the characters in Str.
function Succeed return Pattern;
-- Constructs a pattern that succeeds matching null, both on the first
-- attempt, and on any rematch attempt, i.e. it is equivalent to an
-- infinite alternation of null strings.
function Tab (Count : Natural) return Pattern;
function Tab (Count : not null access Natural) return Pattern;
function Tab (Count : Natural_Func) return Pattern;
-- Constructs a pattern that from the current location until Count
-- characters have been matched. The pattern fails if more than Count
-- characters have already been matched.
---------------------------------
-- Pattern Matching Operations --
---------------------------------
-- The Match function performs an actual pattern matching operation.
-- The versions with three parameters perform a match without modifying
-- the subject string and return a Boolean result indicating if the
-- match is successful or not. The Anchor parameter is set to True to
-- obtain an anchored match in which the pattern is required to match
-- the first character of the string. In an unanchored match, which is
-- the default, successive attempts are made to match the given pattern
-- at each character of the subject string until a match succeeds, or
-- until all possibilities have failed.
-- Note that pattern assignment functions in the pattern may generate
-- side effects, so these functions are not necessarily pure.
Anchored_Mode : Boolean := False;
-- This global variable can be set True to cause all subsequent pattern
-- matches to operate in anchored mode. In anchored mode, no attempt is
-- made to move the anchor point, so that if the match succeeds it must
-- succeed starting at the first character. Note that the effect of
-- anchored mode may be achieved in individual pattern matches by using
-- Fence or Pos(0) at the start of the pattern.
Pattern_Stack_Overflow : exception;
-- Exception raised if internal pattern matching stack overflows. This
-- is typically the result of runaway pattern recursion. If there is a
-- genuine case of stack overflow, then either the match must be broken
-- down into simpler steps, or the stack limit must be reset.
Stack_Size : constant Positive := 2000;
-- Size used for internal pattern matching stack. Increase this size if
-- complex patterns cause Pattern_Stack_Overflow to be raised.
-- Simple match functions. The subject is matched against the pattern.
-- Any immediate or deferred assignments or writes are executed, and
-- the returned value indicates whether or not the match succeeded.
function Match
(Subject : VString;
Pat : Pattern) return Boolean;
function Match
(Subject : VString;
Pat : PString) return Boolean;
function Match
(Subject : String;
Pat : Pattern) return Boolean;
function Match
(Subject : String;
Pat : PString) return Boolean;
-- Replacement functions. The subject is matched against the pattern.
-- Any immediate or deferred assignments or writes are executed, and
-- the returned value indicates whether or not the match succeeded.
-- If the match succeeds, then the matched part of the subject string
-- is replaced by the given Replace string.
function Match
(Subject : VString_Var;
Pat : Pattern;
Replace : VString) return Boolean;
function Match
(Subject : VString_Var;
Pat : PString;
Replace : VString) return Boolean;
function Match
(Subject : VString_Var;
Pat : Pattern;
Replace : String) return Boolean;
function Match
(Subject : VString_Var;
Pat : PString;
Replace : String) return Boolean;
-- Simple match procedures. The subject is matched against the pattern.
-- Any immediate or deferred assignments or writes are executed. No
-- indication of success or failure is returned.
procedure Match
(Subject : VString;
Pat : Pattern);
procedure Match
(Subject : VString;
Pat : PString);
procedure Match
(Subject : String;
Pat : Pattern);
procedure Match
(Subject : String;
Pat : PString);
-- Replacement procedures. The subject is matched against the pattern.
-- Any immediate or deferred assignments or writes are executed. No
-- indication of success or failure is returned. If the match succeeds,
-- then the matched part of the subject string is replaced by the given
-- Replace string.
procedure Match
(Subject : in out VString;
Pat : Pattern;
Replace : VString);
procedure Match
(Subject : in out VString;
Pat : PString;
Replace : VString);
procedure Match
(Subject : in out VString;
Pat : Pattern;
Replace : String);
procedure Match
(Subject : in out VString;
Pat : PString;
Replace : String);
-- Deferred Replacement
type Match_Result is private;
-- Type used to record result of pattern match
subtype Match_Result_Var is Match_Result;
-- This synonyms is used as a formal parameter type to a function where,
-- if the language allowed, we would use an in out parameter, but we are
-- not allowed to have in out parameters for functions. Instead we pass
-- actuals which must be variables, and with a bit of trickery in the
-- body, manage to interpret them properly as though they were indeed
-- in out parameters.
function Match
(Subject : VString_Var;
Pat : Pattern;
Result : Match_Result_Var) return Boolean;
procedure Match
(Subject : in out VString;
Pat : Pattern;
Result : out Match_Result);
procedure Replace
(Result : in out Match_Result;
Replace : VString);
-- Given a previous call to Match which set Result, performs a pattern
-- replacement if the match was successful. Has no effect if the match
-- failed. This call should immediately follow the Match call.
------------------------
-- Debugging Routines --
------------------------
-- Debugging pattern matching operations can often be quite complex,
-- since there is no obvious way to trace the progress of the match.
-- The declarations in this section provide some debugging assistance.
Debug_Mode : Boolean := False;
-- This global variable can be set True to generate debugging on all
-- subsequent calls to Match. The debugging output is a full trace of
-- the actions of the pattern matcher, written to Standard_Output. The
-- level of this information is intended to be comprehensible at the
-- abstract level of this package declaration. However, note that the
-- use of this switch often generates large amounts of output.
function "*" (P : Pattern; Fil : File_Access) return Pattern;
function "*" (P : PString; Fil : File_Access) return Pattern;
function "*" (P : PChar; Fil : File_Access) return Pattern;
function "**" (P : Pattern; Fil : File_Access) return Pattern;
function "**" (P : PString; Fil : File_Access) return Pattern;
function "**" (P : PChar; Fil : File_Access) return Pattern;
-- These are similar to the corresponding pattern assignment operations
-- except that instead of setting the value of a variable, the matched
-- substring is written to the appropriate file. This can be useful in
-- following the progress of a match without generating the full amount
-- of information obtained by setting Debug_Mode to True.
Terminal : constant File_Access := Standard_Error;
Output : constant File_Access := Standard_Output;
-- Two handy synonyms for use with the above pattern write operations
-- Finally we have some routines that are useful for determining what
-- patterns are in use, particularly if they are constructed dynamically.
function Image (P : Pattern) return String;
function Image (P : Pattern) return VString;
-- This procedures yield strings that corresponds to the syntax needed
-- to create the given pattern using the functions in this package. The
-- form of this string is such that it could actually be compiled and
-- evaluated to yield the required pattern except for references to
-- variables and functions, which are output using one of the following
-- forms:
--
-- access Natural NP(16#...#)
-- access Pattern PP(16#...#)
-- access VString VP(16#...#)
--
-- Natural_Func NF(16#...#)
-- VString_Func VF(16#...#)
--
-- where 16#...# is the hex representation of the integer address that
-- corresponds to the given access value
procedure Dump (P : Pattern);
-- This procedure writes information about the pattern to Standard_Out.
-- The format of this information is keyed to the internal data structures
-- used to implement patterns. The information provided by Dump is thus
-- more precise than that yielded by Image, but is also a bit more obscure
-- (i.e. it cannot be interpreted solely in terms of this spec, you have
-- to know something about the data structures).
------------------
-- Private Part --
------------------
private
type PE;
-- Pattern element, a pattern is a complex structure of PE's. This type
-- is defined and described in the body of this package.
type PE_Ptr is access all PE;
-- Pattern reference. PE's use PE_Ptr values to reference other PE's
type Pattern is new Controlled with record
Stk : Natural := 0;
-- Maximum number of stack entries required for matching this
-- pattern. See description of pattern history stack in body.
P : PE_Ptr := null;
-- Pointer to initial pattern element for pattern
end record;
pragma Finalize_Storage_Only (Pattern);
procedure Adjust (Object : in out Pattern);
-- Adjust routine used to copy pattern objects
procedure Finalize (Object : in out Pattern);
-- Finalization routine used to release storage allocated for a pattern
type VString_Ptr is access all VString;
type Match_Result is record
Var : VString_Ptr;
-- Pointer to subject string. Set to null if match failed
Start : Natural := 1;
-- Starting index position (1's origin) of matched section of
-- subject string. Only valid if Var is non-null.
Stop : Natural := 0;
-- Ending index position (1's origin) of matched section of
-- subject string. Only valid if Var is non-null.
end record;
pragma Volatile (Match_Result);
-- This ensures that the Result parameter is passed by reference, so
-- that we can play our games with the bogus Match_Result_Var parameter
-- in the function case to treat it as though it were an in out parameter.
end GNAT.Spitbol.Patterns;