author Jean Abou Samra <jean@abou-samra.fr> 2022-12-11 12:28:02 +0100
committer Arne Babenhauserheide <arne_bab@web.de> 2023-01-17 07:20:10 +0100
commit ff165ec9040cce0aa7a9aac0ca44743b7c1186a3
tree b306c459f4badc278e8005a76070b0699a8c1c33
parent 7d5ab8fa40d0f1b3dfaf4894325ca84cc36a6d31
Doc: clarification on regexes and encodings
* doc/ref/api-regex.texi: make it more obviously clear that regexp matching supports only characters supported by the locale encoding.
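
The behaviour this change documents can be probed with a short Guile sketch. It is an illustration only and not part of the commit: `match-span` is a hypothetical helper name, and the result of the non-ASCII call depends on the locale the process runs under, so no output is claimed for it.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper (not part of Guile): return the matched span as
;; a pair of character indices (start . end), or #f when nothing matches.
(define (match-span pattern str)
  (let ((m (string-match pattern str)))
    (and m (cons (match:start m) (match:end m)))))

;; ASCII-only input can be represented in any locale encoding.
(match-span "b+" "aabbb")

;; Non-ASCII input: the indices count characters, not bytes, when the
;; locale encoding can represent every character.  Under a locale that
;; cannot represent them the result may be surprising, so the call is
;; guarded to keep the sketch from aborting.
(false-if-exception (match-span "b+" "éabbb"))
```

Running the sketch under a UTF-8 locale and again under the C locale is one way to observe the difference described in the new paragraph.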
-rw-r--r-- doc/ref/api-regex.texi 8
1 files changed, 6 insertions, 2 deletions
diff --git a/doc/ref/api-regex.texi b/doc/ref/api-regex.texi
index b14c2b39c..d778f969f 100644
--- a/doc/ref/api-regex.texi
+++ b/doc/ref/api-regex.texi
@@ -57,7 +57,11 @@ locale's encoding, and then passed to the C library's regular expression
routines (@pxref{Regular Expressions,,, libc, The GNU C Library
Reference Manual}). The returned match structures always point to
characters in the strings, not to individual bytes, even in the case of
-multi-byte encodings.
+multi-byte encodings. This ensures that the match structures are
+correct when performing matching with characters that have a multi-byte
+representation in the locale encoding. Note, however, that using
+characters which cannot be represented in the locale encoding can
+lead to surprising results.
@deffn {Scheme Procedure} string-match pattern str [start]
Compile the string @var{pattern} into a regular expression and compare
@@ -325,7 +329,7 @@ example the following is the date example from
@code{string-match} call.
@lisp
-(define date-regex
+(define date-regex
"([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])")
(define s "Date 20020429 12am.")
(regexp-substitute/global #f date-regex s