Diffstat (limited to 'ext/mbstring/oniguruma/doc/RE')
-rw-r--r-- | ext/mbstring/oniguruma/doc/RE | 224 |
1 files changed, 0 insertions, 224 deletions
diff --git a/ext/mbstring/oniguruma/doc/RE b/ext/mbstring/oniguruma/doc/RE
deleted file mode 100644
index 3527b4556f..0000000000
--- a/ext/mbstring/oniguruma/doc/RE
+++ /dev/null
@@ -1,224 +0,0 @@
-Oniguruma Regular Expressions 2003/07/04
-
-syntax: REG_SYNTAX_RUBY (default)
-
-
-1. Syntax elements
-
-  \       escape
-  |       alternation
-  (...)   group
-  [...]   character class
-
-
-2. Characters
-
-  \t           horizontal tab (0x09)
-  \v           vertical tab   (0x0B)
-  \n           newline        (0x0A)
-  \r           return         (0x0D)
-  \b           back space     (0x08) (* in character class only)
-  \f           form feed      (0x0C)
-  \a           bell           (0x07)
-  \e           escape         (0x1B)
-  \nnn         octal char
-  \xHH         hexadecimal char
-  \x{7HHHHHHH} wide hexadecimal char
-  \cx          control char
-  \C-x         control char
-  \M-x         meta  (x|0x80)
-  \M-\C-x      meta control char
-
-
-3. Character types
-
-  .        any character (except newline)
-  \w       word character (alphanumeric, "_" and multibyte char)
-  \W       non-word char
-  \s       whitespace char (\t, \n, \v, \f, \r, \x20)
-  \S       non-whitespace char
-  \d       digit char
-  \D       non-digit char
-
-
-4. Quantifier
-
-  greedy
-
-    ?       1 or 0 times
-    *       0 or more times
-    +       1 or more times
-    {n,m}   at least n but not more than m times
-    {n,}    at least n times
-    {n}     n times
-
-  reluctant
-
-    ??      1 or 0 times
-    *?      0 or more times
-    +?      1 or more times
-    {n,m}?  at least n but not more than m times
-    {n,}?   at least n times
-
-  possessive (greedy and does not backtrack after repeated)
-
-    ?+      1 or 0 times
-    *+      0 or more times
-    ++      1 or more times
-
-
-5. Anchors
-
-  ^       beginning of the line
-  $       end of the line
-  \b      word boundary
-  \B      not word boundary
-  \A      beginning of string
-  \Z      end of string, or before newline at the end
-  \z      end of string
-  \G      previous end-of-match position
-
-
-6. POSIX character class ([:xxxxx:], negate [:^xxxxx:])
-
-  alnum    alphabet or digit char
-  alpha    alphabet
-  ascii    code value: [0 - 127]
-  blank    \t, \x20
-  cntrl
-  digit    0-9
-  graph
-  lower
-  print
-  punct
-  space    \t, \n, \v, \f, \r, \x20
-  upper
-  xdigit   0-9, a-f, A-F
-
-
-7. Operators in character class
-
-  [...]    group (character class in character class)
-  &&       intersection
-           (lowest precedence operator in character class)
-
-  ex. [a-w&&[^c-g]z] ==> ([a-w] and ([^c-g] or z)) ==> [abh-w]
-
-
-8. Extended expressions
-
-  (?#...)            comment
-  (?imx-imx)         option on/off
-                         i: ignore case
-                         m: multi-line (dot(.) match newline)
-                         x: extended form
-  (?imx-imx:subexp)  option on/off for subexp
-  (?:subexp)         not captured
-  (?=subexp)         look-ahead
-  (?!subexp)         negative look-ahead
-  (?<=subexp)        look-behind
-  (?<!subexp)        negative look-behind
-
-                     Subexp of look-behind must be fixed character length.
-                     But different character length is allowed in top level
-                     alternatives only.
-                     ex. (?<=a|bc) is OK. (?<=aaa(?:b|cd)) is not allowed.
-
-  (?>subexp)         don't backtrack
-  (?<name>subexp)    define named group
-                     (name can not include '>', ')', '\' and NUL character)
-
-
-9. Back reference
-
-  \n          back reference by group number (n >= 1)
-  \k<name>    back reference by group name
-
-
-10. Subexp call ("Tanaka Akira special")
-
-  \g<name>    call by group name
-  \g<n>       call by group number (only if 'n' is not defined as name)
-
-
------------------------------
-11. Original extensions
-
-   + named group     (?<name>...)
-   + named backref   \k<name>
-   + subexp call     \g<name>, \g<group-num>
-
-
-12. Lacked features compare with perl 5.8.0
-
-   + [:word:]
-   + \N{name}
-   + \l,\u,\L,\U, \P, \X, \C
-   + (?{code})
-   + (??{code})
-   + (?(condition)yes-pat|no-pat)
-
-   + \Q...\E   (* This is effective on REG_SYNTAX_PERL and REG_SYNTAX_JAVA)
-
-
-13. Syntax depend options
-
-   + REG_SYNTAX_RUBY (default)
-     (?m): dot(.) match newline
-
-   + REG_SYNTAX_PERL, REG_SYNTAX_JAVA
-     (?s): dot(.) match newline
-     (?m): ^ match after newline, $ match before newline
-
-
-14. Differences with Japanized GNU regex(version 0.12) of Ruby
-
-   + add look behind
-     (?<=fixed-char-length-pattern), (?<!fixed-char-length-pattern)
-     (in negative-look-behind, capture group isn't allowed,
-      shy group(?:) is allowed.)
-   + add possessive quantifier. ?+, *+, ++
-   + add operations in character class. [], &&
-   + add named group and subexp call.
-   + octal or hexadecimal number sequence can be treated as
-     a multibyte code char in char-class, if multibyte encoding is specified.
-     (ex. [\xa1\xa2], [\xa1\xa7-\xa4\xa1])
-   + effect range of isolated option is to next ')'.
-     ex. (?:(?i)a|b) is interpreted as (?:(?i:a|b)), not (?:(?i:a)|b).
-   + isolated option is not transparent to previous pattern.
-     ex. a(?i)* is a syntax error pattern.
-   + allowed incompleted left brace as an usual char.
-     ex. /{/, /({)/, /a{2,3/ etc...
-   + negative POSIX bracket [:^xxxx:] is supported.
-   + POSIX bracket [:ascii:] is added.
-   + repeat of look-ahead is not allowd.
-     ex. /(?=a)*/, /(?!b){5}/
-
-
-14. Problems
-
-   + Invalid first byte in UTF-8 is allowed.
-     (which is the same as GNU regex of Ruby)
-
-      /./u =~ "\xa3"
-
-     Of course, although it is possible to validate,
-     it will become later than now.
-
-   + Zero-length match in infinite repeat stops the repeat,
-     and captured group status isn't checked as stop condition.
-
-      /()*\1/ =~ ""            #=> match
-      /(?:()|())*\1\2/ =~ ""   #=> fail
-
-      /(?:\1a|())*/ =~ "a"     #=> match with ""
-
-   + Ignore case option is not effect to an octal or hexadecimal
-     numbered char, but it becomes effective if it appears in the char class.
-     This doesn't have consistency, though they are the specifications
-     which are the same as GNU regex of Ruby.
-
-      /\x61/i.match("A")       # => nil
-      /[\x61]/i.match("A")     # => match
-
-// END
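
Note on the removed document: several of the constructs it lists (sections 4, 7, 8, 9, 10 and 13) can be tried out with a short script. The following is a minimal sketch, assuming a current Ruby interpreter, whose regex engine descends from Oniguruma; it is not taken from the removed file, the constant names are illustrative only, and behaviour in the 2003 snapshot deleted here may differ.

```ruby
# Sketch only: exercises a few constructs from the removed doc under a
# current Ruby. All constant names are hypothetical, chosen for illustration.

# Section 7: && intersects the left side with everything to its right,
# so [a-w&&[^c-g]z] means [a-w] and ([^c-g] or z).
INTERSECT = /[a-w&&[^c-g]z]/
MATCHED = ("a".."z").select { |c| INTERSECT.match?(c) }.join

# Section 4: a greedy a* gives one "a" back so the trailing a can match;
# the possessive a*+ keeps everything it consumed, so the match fails.
GREEDY_OK     = "aaa".match?(/a*a/)
POSSESSIVE_OK = "aaa".match?(/a*+a/)

# Sections 8 and 9: named group plus back reference by name.
QUOTED = /(?<quote>['"])(?<body>.*?)\k<quote>/
BODY = QUOTED.match(%q{say "hi" now})[:body]

# Section 10: subexp call, here used recursively for balanced parentheses.
BALANCED = /\A(?<paren>\((?:[^()]|\g<paren>)*\))\z/

# Section 13: under the Ruby syntax, (?m) makes dot match newline.
DOT_PLAIN = "a\nb".match?(/a.b/)
DOT_MULTI = "a\nb".match?(/a.b/m)

puts MATCHED                        # letters accepted by the intersection
puts GREEDY_OK, POSSESSIVE_OK
puts BODY
puts BALANCED.match?("(a(b)c)"), BALANCED.match?("(a(b c")
puts DOT_PLAIN, DOT_MULTI
```

The intersection line should report the same set the document derives by hand, [abh-w], which makes it a quick check that && really binds more loosely than the nested class and the trailing z.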