author Michael Gran <spk121@yahoo.com> 2009-09-04 07:55:05 -0700
committer Michael Gran <spk121@yahoo.com> 2009-09-04 07:55:05 -0700
commit 28cc8dac2f520fa9de29e93dca52e4892b945a3c
tree 73ab90f8f277ca07a208bc7c85d3768a5a59c2f0
parent 18d8fcd43c8ea6b0122453b2d9f7ac10c1f36d6c
Doc updates for Unicode string escapes and port encodings
* NEWS: string and port changes
* doc/ref/api-data.texi: string escapes and string-ci
* doc/ref/api-io.texi: port encoding functions
-rw-r--r-- NEWS 11
-rwxr-xr-x doc/ref/api-data.texi 19
-rw-r--r-- doc/ref/api-io.texi 84
3 files changed, 103 insertions, 11 deletions
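
Reviewer note: the sketch below is not part of the patch. It illustrates how the four port procedures documented in the api-io.texi hunk, together with the new `\uHHHH` string escape from api-data.texi, might be exercised. It assumes a Guile build that includes these changes (1.9.3 or later); the file name `port-demo.txt` and the variable names are hypothetical and chosen only for this example.

```scheme
;; Illustrative sketch only; assumes Guile 1.9.3 or later.
;; "port-demo.txt" is a hypothetical scratch file in the current directory.
(define port (open-output-file "port-demo.txt"))

;; Override the encoding the port inherited when it was created.
(set-port-encoding! port "ISO-8859-1")

;; Ask for a substitute character instead of an error when a
;; character cannot be represented in the port's encoding.
(set-port-conversion-strategy! port 'substitute)

;; Read the settings back while the port is still open.
(define enc (port-encoding port))
(define strategy (port-conversion-strategy port))

;; U+00E9 is representable in ISO-8859-1; U+0100 is not, so the
;; conversion strategy decides what is written in its place.
(display "caf\u00e9 \u0100" port)
(close-port port)

(display enc) (newline)
(display strategy) (newline)
```

Passing `'error` or `'escape` instead of `'substitute` selects the other two strategies described in the patch.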
diff --git a/NEWS b/NEWS
index 955075bfa..a3c4dddc1 100644
--- a/NEWS
+++ b/NEWS
@@ -10,6 +10,17 @@ prerelease, and a full NEWS corresponding to 1.8 -> 2.0.)
Changes in 1.9.3 (since the 1.9.2 prerelease):
+** Ports do transcoding
+
+Ports now have an associated character encoding, and port read/write
+operations do conversion to/from locales automatically. Ports also
+have an associated strategy for how to deal with locale conversion
+failures. Four functions to support this: set-port-encoding!,
+port-encoding, set-port-conversion-strategy!,
+port-conversion-strategy.
+
+** String and SRFI-13 functions can operate on Unicode strings
+
** SRFI-14 char-sets are modified for Unicode
The default char-sets are not longer locale dependent and contain
diff --git a/doc/ref/api-data.texi b/doc/ref/api-data.texi
index 5cbf4b17b..cf0d32113 100755
--- a/doc/ref/api-data.texi
+++ b/doc/ref/api-data.texi
@@ -2690,6 +2690,14 @@ Vertical tab character (ASCII 11).
@item @nicode{\xHH}
Character code given by two hexadecimal digits. For example
@nicode{\x7f} for an ASCII DEL (127).
+
+@item @nicode{\uHHHH}
+Character code given by four hexadecimal digits. For example
+@nicode{\u0100} for a capital A with macron (U+0100).
+
+@item @nicode{\UHHHHHH}
+Character code given by six hexadecimal digits. For example
+@nicode{\U010402}.
@end table
@noindent
@@ -3110,9 +3118,14 @@ The procedures in this section are similar to the character ordering
predicates (@pxref{Characters}), but are defined on character sequences.
The first set is specified in R5RS and has names that end in @code{?}.
-The second set is specified in SRFI-13 and the names have no ending
-@code{?}. The predicates ending in @code{-ci} ignore the character case
-when comparing strings. @xref{Text Collation, the @code{(ice-9
+The second set is specified in SRFI-13 and the names have not ending
+@code{?}.
+
+The predicates ending in @code{-ci} ignore the character case
+when comparing strings. For now, case-insensitive comparison is done
+using the R5RS rules, where every lower-case character that has a
+single character upper-case form is converted to uppercase before
+comparison. See @xref{Text Collation, the @code{(ice-9
i18n)} module}, for locale-dependent string comparison.
@rnindex string=?
diff --git a/doc/ref/api-io.texi b/doc/ref/api-io.texi
index 96cd147f3..83a2fd79c 100644
--- a/doc/ref/api-io.texi
+++ b/doc/ref/api-io.texi
@@ -47,7 +47,7 @@ are two interesting and powerful examples of this technique.
Ports are garbage collected in the usual way (@pxref{Memory
Management}), and will be closed at that time if not already closed.
-In this case any errors occuring in the close will not be reported.
+In this case any errors occurring in the close will not be reported.
Usually a program will want to explicitly close so as to be sure all
its operations have been successful. Of course if a program has
abandoned something due to an error or other condition then closing
@@ -70,6 +70,18 @@ All file access uses the ``LFS'' large file support functions when
available, so files bigger than 2 Gbytes (@math{2^31} bytes) can be
read and written on a 32-bit system.
+Each port has an associated character encoding that controls how bytes
+read from the port are converted to characters and string and controls
+how characters and strings written to the port are converted to bytes.
+When ports are created, they inherit their character encoding from the
+current locale, but, that can be modified after the port is created.
+
+Each port also has an associated conversion strategy: what to do when
+a Guile character can't be converted to the port's encoded character
+representation for output. There are three possible strategies: to
+raise an error, to replace the character with a hex escape, or to
+replace the character with a substitute character.
+
@rnindex input-port?
@deffn {Scheme Procedure} input-port? x
@deffnx {C Function} scm_input_port_p (x)
@@ -93,6 +105,55 @@ Equivalent to @code{(or (input-port? @var{x}) (output-port?
@var{x}))}.
@end deffn
+@deffn {Scheme Procedure} set-port-encoding! port enc
+@deffnx {C Function} scm_set_port_encoding_x (port, enc)
+Sets the character encoding that will be used to interpret all port
+I/O. @var{enc} is a string containing the name of an encoding.
+@end deffn
+
+New ports are created with the encoding appropriate for the current
+locale if @code{setlocale} has been called or ISO-8859-1 otherwise,
+and this procedure can be used to modify that encoding.
+
+@deffn {Scheme Procedure} port-encoding port
+@deffnx {C Function} scm_port_encoding
+Returns, as a string, the character encoding that @var{port} uses to
+interpret its input and output.
+@end deffn
+
+@deffn {Scheme Procedure} set-port-conversion-strategy! port sym
+@deffnx {C Function} scm_set_port_conversion_strategy_x (port, sym)
+Sets the behavior of the interpreter when outputting a character that
+is not representable in the port's current encoding. @var{sym} can be
+either @code{'error}, @code{'substitute}, or @code{'escape}. If it is
+@code{'error}, an error will be thrown when an nonconvertible character
+is encountered. If it is @code{'substitute}, then nonconvertible
+characters will be replaced with approximate characters, or with
+question marks if no approximately correct character is available. If
+it is @code{'escape}, it will appear as a hex escape when output.
+
+If @var{port} is an open port, the conversion error behavior
+is set for that port. If it is @code{#f}, it is set as the
+default behavior for any future ports that get created in
+this thread.
+@end deffn
+
+@deffn {Scheme Procedure} port-conversion-strategy port
+@deffnx {C Function} scm_port_conversion_strategy (port)
+Returns the behavior of the port when outputting a character that is
+not representable in the port's current encoding. It returns the
+symbol @code{error} if unrepresentable characters should cause
+exceptions, @code{substitute} if the port should try to replace
+unrepresentable characters with question marks or approximate
+characters, or @code{escape} if unrepresentable characters should be
+converted to string escapes.
+
+If @var{port} is @code{#f}, then the current default behavior will be
+returned. New ports will have this default behavior when they are
+created.
+@end deffn
+
+
@node Reading
@subsection Reading
@@ -238,7 +299,7 @@ output port if not given.
The output is designed to be machine readable, and can be read back
with @code{read} (@pxref{Reading}). Strings are printed in
-doublequotes, with escapes if necessary, and characters are printed in
+double quotes, with escapes if necessary, and characters are printed in
@samp{#\} notation.
@end deffn
@@ -248,7 +309,7 @@ Send a representation of @var{obj} to @var{port} or to the current
output port if not given.
The output is designed for human readability, it differs from
-@code{write} in that strings are printed without doublequotes and
+@code{write} in that strings are printed without double quotes and
escapes, and characters are printed as per @code{write-char}, not in
@samp{#\} form.
@end deffn
@@ -496,7 +557,7 @@ used. This function is equivalent to:
@end lisp
@end deffn
-Some of the abovementioned I/O functions rely on the following C
+Some of the aforementioned I/O functions rely on the following C
primitives. These will mainly be of interest to people hacking Guile
internals.
@@ -815,11 +876,11 @@ Open @var{filename} for output. Equivalent to
Open @var{filename} for input or output, and call @code{(@var{proc}
port)} with the resulting port. Return the value returned by
@var{proc}. @var{filename} is opened as per @code{open-input-file} or
-@code{open-output-file} respectively, and an error is signalled if it
+@code{open-output-file} respectively, and an error is signaled if it
cannot be opened.
When @var{proc} returns, the port is closed. If @var{proc} does not
-return (eg.@: if it throws an error), then the port might not be
+return (e.g.@: if it throws an error), then the port might not be
closed automatically, though it will be garbage collected in the usual
way if not otherwise referenced.
@end deffn
@@ -834,7 +895,7 @@ setup as respectively the @code{current-input-port},
@code{current-output-port}, or @code{current-error-port}. Return the
value returned by @var{thunk}. @var{filename} is opened as per
@code{open-input-file} or @code{open-output-file} respectively, and an
-error is signalled if it cannot be opened.
+error is signaled if it cannot be opened.
When @var{thunk} returns, the port is closed and the previous setting
of the respective current port is restored.
@@ -891,6 +952,13 @@ Determine whether @var{obj} is a port that is related to a file.
The following allow string ports to be opened by analogy to R4R*
file port facilities:
+With string ports, the port-encoding is treated differently than other
+types of ports. When string ports are created, they do not inherit a
+character encoding from the current locale. They are given a
+default locale that allows them to handle all valid string characters.
+Typically one should not modify a string port's character encoding
+away from its default.
+
@deffn {Scheme Procedure} call-with-output-string proc
@deffnx {C Function} scm_call_with_output_string (proc)
Calls the one-argument procedure @var{proc} with a newly created output
@@ -1409,7 +1477,7 @@ is set.
@node Port Implementation
@subsubsection Port Implementation
-@cindex Port implemenation
+@cindex Port implementation
This section describes how to implement a new port type in C.