author ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2013-11-25 15:09:21 +0000
committer ph10 <ph10@2f5784b3-3f2a-0410-8824-cb99058d5e15> 2013-11-25 15:09:21 +0000
commit 8af894c1c770bb0a8e7ebf60b9b2a6ed6185a1c4
tree dc71f3083a5c34c493f4ec9186e6627d527bcf0d
parent 3d0a81dbf14c870df3ab19470bee0c29e1999521
download pcre-8af894c1c770bb0a8e7ebf60b9b2a6ed6185a1c4.tar.gz
Clarify handling of \s in documentation; fix VT in pcretest's built-in tables.
git-svn-id: svn://vcs.exim.org/pcre/code/trunk@1405 2f5784b3-3f2a-0410-8824-cb99058d5e15
-rw-r--r-- ChangeLog 8
-rw-r--r-- doc/pcrepattern.3 19
-rw-r--r-- pcretest.c 8
3 files changed, 19 insertions, 16 deletions
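
The default white space set described in this commit can be checked against the C library. The following is a minimal sketch, not code from PCRE: the helper names is_default_space and count_locale_differences are hypothetical and exist only for illustration. It assumes that the six characters listed in the patch (HT, LF, VT, FF, CR, space) are exactly those that isspace() accepts in the "C" locale, which is what the C standard specifies.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper, not part of PCRE: returns 1 if c is one of the six
   default \s characters listed in the patch (HT, LF, VT, FF, CR, space). */
int is_default_space(int c)
{
  return c == 9 || c == 10 || c == 11 || c == 12 || c == 13 || c == 32;
}

/* Hypothetical helper: counts the byte values 0-255 for which isspace() in
   the currently selected locale disagrees with the default list above. */
int count_locale_differences(void)
{
  int c, n = 0;
  for (c = 0; c < 256; c++)
    if ((isspace(c) != 0) != is_default_space(c)) n++;
  return n;
}
```

After setlocale(LC_CTYPE, "C"), count_locale_differences() should return 0. In another locale the count may be non-zero, for example where \xA0 counts as white space or VT does not, which is the variation the revised documentation text describes.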
diff --git a/ChangeLog b/ChangeLog
index 5f0bfee..7e3c875 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -89,9 +89,11 @@ Version 8.34 19-November-2013
options in pcretest are provided to set it. It can also be set by
(*NO_AUTO_POSSESS) at the start of a pattern.
-18. The character VT has been added to the set of characters that match \s and
- are generally treated as white space, following this same change in Perl
- 5.18. There is now no difference between "Perl space" and "POSIX space".
+18. The character VT has been added to the default ("C" locale) set of
+ characters that match \s and are generally treated as white space,
+ following this same change in Perl 5.18. There is now no difference between
+ "Perl space" and "POSIX space". Whether VT is treated as white space in
+ other locales depends on the locale.
19. The code for checking named groups as conditions, either for being set or
for being recursed, has been refactored (this is related to 14 and 15
diff --git a/doc/pcrepattern.3 b/doc/pcrepattern.3
index 7dd951f..8bc83dc 100644
--- a/doc/pcrepattern.3
+++ b/doc/pcrepattern.3
@@ -1,4 +1,4 @@
-.TH PCREPATTERN 3 "12 November 2013" "PCRE 8.34"
+.TH PCREPATTERN 3 "25 November 2013" "PCRE 8.34"
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE REGULAR EXPRESSION DETAILS"
@@ -536,8 +536,9 @@ For compatibility with Perl, \es did not used to match the VT character (code
added VT at release 5.18, and PCRE followed suit at release 8.34. The default
\es characters are now HT (9), LF (10), VT (11), FF (12), CR (13), and space
(32), which are defined as white space in the "C" locale. This list may vary if
-locale-specific matching is taking place; in particular, in some locales the
-"non-breaking space" character (\exA0) is recognized as white space.
+locale-specific matching is taking place. For example, in some locales the
+"non-breaking space" character (\exA0) is recognized as white space, and in
+others the VT character is not.
.P
A "word" character is an underscore or any character that is a letter or digit.
By default, the definition of letters and digits is controlled by PCRE's
@@ -1345,11 +1346,11 @@ are:
xdigit hexadecimal digits
.sp
The default "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13),
-and space (32). If locale-specific matching is taking place, there may be
-additional space characters. "Space" used to be different to \es, which did not
-include VT, for Perl compatibility. However, Perl changed at release 5.18, and
-PCRE followed at release 8.34. "Space" and \es now match the same set of
-characters.
+and space (32). If locale-specific matching is taking place, the list of space
+characters may be different; there may be fewer or more of them. "Space" used
+to be different to \es, which did not include VT, for Perl compatibility.
+However, Perl changed at release 5.18, and PCRE followed at release 8.34.
+"Space" and \es now match the same set of characters.
.P
The name "word" is a Perl extension, and "blank" is a GNU extension from Perl
5.8. Another Perl extension is negation, which is indicated by a ^ character
@@ -3230,6 +3231,6 @@ Cambridge CB2 3QH, England.
.rs
.sp
.nf
-Last updated: 12 November 2013
+Last updated: 25 November 2013
Copyright (c) 1997-2013 University of Cambridge.
.fi
diff --git a/pcretest.c b/pcretest.c
index 5d8363c..8452d2b 100644
--- a/pcretest.c
+++ b/pcretest.c
@@ -1288,7 +1288,7 @@ graph, print, punct, and cntrl. Other classes are built from combinations. */
*/
0x80,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 0- 7 */
- 0x00,0x01,0x01,0x00,0x01,0x01,0x00,0x00, /* 8- 15 */
+ 0x00,0x01,0x01,0x01,0x01,0x01,0x00,0x00, /* 8- 15 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 16- 23 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 24- 31 */
0x01,0x00,0x00,0x00,0x80,0x00,0x00,0x00, /* - ' */
@@ -1320,9 +1320,9 @@ graph, print, punct, and cntrl. Other classes are built from combinations. */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 240-247 */
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00};/* 248-255 */
-/* This is a set of tables that came originally from a Windows user. It seems to
-be at least an approximation of ISO 8859. In particular, there are characters
-greater than 128 that are marked as spaces, letters, etc. */
+/* This is a set of tables that came originally from a Windows user. It seems
+to be at least an approximation of ISO 8859. In particular, there are
+characters greater than 128 that are marked as spaces, letters, etc. */
static const pcre_uint8 tables1[] = {
0,1,2,3,4,5,6,7,