author Michael Gran <spk121@yahoo.com> 2009-09-05 10:42:15 -0700
committer Michael Gran <spk121@yahoo.com> 2009-09-05 10:42:15 -0700
commit 8748ffeaa770ed47192f970ef5302a7c7aa7a935 (patch)
tree 99dc3f28337308232e29d9ac462ed86930e2d732
parent 28cc8dac2f520fa9de29e93dca52e4892b945a3c (diff)
Doc updates for character encoding of source code files
* NEWS
* doc/ref/scheme-scripts.texi: doc updates for character encoding of source code
* doc/ref/api-evaluation.texi: doc updates for character encoding of source code
-rw-r--r-- NEWS 12
-rw-r--r-- doc/ref/api-evaluation.texi 70
-rw-r--r-- doc/ref/scheme-scripts.texi 6
3 files changed, 88 insertions, 0 deletions
diff --git a/NEWS b/NEWS
index a3c4dddc1..147d0822a 100644
--- a/NEWS
+++ b/NEWS
@@ -10,6 +10,18 @@ prerelease, and a full NEWS corresponding to 1.8 -> 2.0.)
Changes in 1.9.3 (since the 1.9.2 prerelease):
+** Non-ASCII source code files can be read, but require coding
+ declarations
+
+The default reader now handles source code files for some of the
+non-ASCII character encodings, such as UTF-8. A non-ASCII source file
+should have an encoding declaration near the top of the file. Also,
+there is a new function file-encoding that scans a port for a coding
+declaration.
+
+The pre-1.9.3 reader handled 8-bit clean but otherwise unspecified source
+code. This use is now discouraged.
+
** Ports do transcoding
Ports now have an associated character encoding, and port read/write
diff --git a/doc/ref/api-evaluation.texi b/doc/ref/api-evaluation.texi
index d8412154c..9fc5ef5de 100644
--- a/doc/ref/api-evaluation.texi
+++ b/doc/ref/api-evaluation.texi
@@ -17,6 +17,7 @@ loading, evaluating, and compiling Scheme code at run time.
* Fly Evaluation:: Procedures for on the fly evaluation.
* Compilation:: How to compile Scheme files and procedures.
* Loading:: Loading Scheme code from file.
+* Character Encoding of Source Files:: Loading non-ASCII Scheme code from file.
* Delayed Evaluation:: Postponing evaluation until it is needed.
* Local Evaluation:: Evaluation in a local environment.
* Evaluator Behaviour:: Modifying Guile's evaluator.
@@ -229,6 +230,12 @@ Thus a Guile script often starts like this.
More details on Guile scripting can be found in the scripting section
(@pxref{Guile Scripting}).
+There is one special case where the contents of a comment can actually
+affect the interpretation of code. When a character encoding
+declaration, such as @code{coding: utf-8}, appears in one of the first
+few lines of a source file, it indicates to Guile's default reader
+that this source code file is not ASCII. For details see @ref{Character
+Encoding of Source Files}.
@node Case Sensitivity
@subsubsection Case Sensitivity
@@ -590,6 +597,69 @@ a file to load. By default, @code{%load-extensions} is bound to the
list @code{("" ".scm")}.
@end defvar
+@node Character Encoding of Source Files
+@subsection Character Encoding of Source Files
+
+@cindex primitive-load
+@cindex load
+Scheme source code files are usually encoded in ASCII, but the
+built-in reader can interpret other character encodings. The
+procedure @code{primitive-load}, and by extension the functions that
+call it, such as @code{load}, first scan the top 500 characters of the
+file for a coding declaration.
+
+A coding declaration has the form @code{coding: XXXXXX}, where
+@code{XXXXXX} is the name of a character encoding in which the source
+code file has been encoded. The coding declaration must appear in a
+Scheme comment. It can either be a semicolon-initiated comment or a block
+@code{#!} comment.
+
+The name of the character encoding in the coding declaration is
+typically lowercase and contains only letters, numbers, and
+hyphens. The most common examples of character encodings are
+@code{utf-8} and @code{iso-8859-1}. This allows the coding
+declaration to be compatible with Emacs.
+
+For source code, only a subset of all possible character encodings can
+be interpreted by the built-in source code reader. Only those
+character encodings in which ASCII text appears unmodified can be
+used. This includes @code{UTF-8} and @code{ISO-8859-1} through
+@code{ISO-8859-15}. The multi-byte character encodings @code{UTF-16}
+and @code{UTF-32} may not be used because they are not compatible with
+ASCII.
+
+@cindex read
+@cindex set-port-encoding!
+There might be a scenario in which one would want to read non-ASCII
+code from a port, such as with the function @code{read}, instead of
+with @code{load}. If the port's character encoding is the same as the
+encoding of the code to be read by the port, no other special
+handling is necessary. The port will automatically do the character
+encoding conversion. The functions @code{setlocale} and
+@code{set-port-encoding!} are used to set port encodings.
+
+If a port is used to read code of unknown character encoding, this can
+be accomplished in three steps. First, the character encoding of the
+port should be set to ISO-8859-1 using @code{set-port-encoding!}.
+Then, the procedure @code{file-encoding}, described below, is used to
+scan for a coding declaration when reading from the port. As a side
+effect, it rewinds the port after its scan is complete. After that,
+the port's character encoding should be set to the encoding returned
+by @code{file-encoding}, if any, again by using
+@code{set-port-encoding!}. Then the code can be read as normal.
+
+@deffn {Scheme Procedure} file-encoding port
+@deffnx {C Function} scm_file_encoding port
+Scans the port for an Emacs-like character coding declaration near the
+top of the contents of a port with random-accessible contents. The
+coding declaration is of the form @code{coding: XXXXX} and must appear
+in a Scheme comment.
+
+Returns a string containing the character encoding of the file
+if a declaration was found, or @code{#f} otherwise. The port is
+rewound.
+@end deffn
+
@node Delayed Evaluation
@subsection Delayed Evaluation
diff --git a/doc/ref/scheme-scripts.texi b/doc/ref/scheme-scripts.texi
index e12eee60f..249bc3414 100644
--- a/doc/ref/scheme-scripts.texi
+++ b/doc/ref/scheme-scripts.texi
@@ -64,6 +64,12 @@ operating system never reads this far, but Guile treats this as the end
of the comment begun on the first line by the @samp{#!} characters.
@item
+If this source code file is not ASCII or ISO-8859-1 encoded, a coding
+declaration such as @code{coding: utf-8} should appear in a comment
+somewhere in the first five lines of the file: see @ref{Character
+Encoding of Source Files}.
+
+@item
The rest of the file should be a Scheme program.
@end itemize
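
The three-step procedure that the patched api-evaluation.texi describes for reading code of unknown encoding from a port might look roughly like the sketch below. It uses only the procedures named in the patch (set-port-encoding! and file-encoding) plus standard port operations. The helper name read-forms-with-declared-encoding is hypothetical and is not part of this commit or of Guile. The sketch assumes the file is seekable, since file-encoding rewinds the port after scanning.

```scheme
;; Hypothetical helper, for illustration only: read every form from
;; FILENAME, honouring a "coding:" declaration if one is present.
(define (read-forms-with-declared-encoding filename)
  (let ((port (open-input-file filename)))
    ;; Step 1: start with ISO-8859-1 so that any byte sequence can be
    ;; scanned without decoding errors.
    (set-port-encoding! port "ISO-8859-1")
    ;; Step 2: scan for a coding declaration. As a side effect,
    ;; file-encoding rewinds the port.
    (let ((encoding (file-encoding port)))
      ;; Step 3: switch to the declared encoding, if one was found.
      (if encoding
          (set-port-encoding! port encoding))
      ;; The code can now be read as normal.
      (let loop ((forms '()))
        (let ((form (read port)))
          (if (eof-object? form)
              (begin
                (close-port port)
                (reverse forms))
              (loop (cons form forms))))))))
```

Given a file whose first line is a comment such as ";;; coding: utf-8", the helper would return the list of forms that follow. A file with no declaration is simply read as ISO-8859-1 in this sketch.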