author | Martin Grabmüller <mgrabmue@cs.tu-berlin.de> | 2001-04-24 19:41:48 +0000 |
---|---|---|
committer | Martin Grabmüller <mgrabmue@cs.tu-berlin.de> | 2001-04-24 19:41:48 +0000 |
commit | 612943c6c1ea6a1b84604e6503cee99da1d6351f (patch) | |
tree | 85cfffe0fc26111616fb00b60a2c6fc6d2328e95 /doc/srfi-13-14.texi | |
parent | fafb71de8c0429d5f460216bb556eeeada7b63e4 (diff) | |
* Makefile.am (guile_TEXINFOS): Added srfi-13-14.texi.
* srfi-13-14.texi: New file documenting SRFI-13/14.
* guile.texi (Top): Added the SRFI-13/14 menu entry and @include.
Diffstat (limited to 'doc/srfi-13-14.texi')
-rw-r--r-- | doc/srfi-13-14.texi | 1091 |
1 files changed, 1091 insertions, 0 deletions
diff --git a/doc/srfi-13-14.texi b/doc/srfi-13-14.texi new file mode 100644 index 000000000..3daaf81f3 --- /dev/null +++ b/doc/srfi-13-14.texi @@ -0,0 +1,1091 @@ +@node SRFI-13/14 +@chapter SRFI-13 and SRFI-14 + +This chapter documents the SRFI-13/14 library, which provides the string +utility procedures defined in SRFI-13 and the character-set procedures +defined in SRFI-14 for Guile. + +@menu +* Introduction:: What is this all about? +* Loading SRFI-13/14:: Loading the module into a running Guile. +* String Functions:: Available string processing procedures. +* Character-set Procedures:: Procedures for manipulating character sets. +@end menu + + +@c =================================================================== + +@node Introduction +@section Introduction + +The SRFI-13/14 library is a shared library which provides the procedures +defined in SRFI-13 (string library) and the procedures defined in +SRFI-14 (character-set library). You should also refer to the SRFI +documents, which provide some details I will not document here. + +If you don't know what SRFI means, and what all the numbers are about, +you may want to refer to the SRFI home page at +@url{http://srfi.schemers.org}. + +Note that only the procedures from SRFI-13 are documented here which are +not already contained in Guile. For procedures not documented here +please refer to the relevant chapters in the Guile Reference Manual, for +example the documentation of strings and string procedures (REFFIXME). + +The SRFI-14 procedures are documented completely. + +@menu +* What can be done:: What is possible with SRFI-13/14 +* What cannot be done:: and what is not? +@end menu + + +@c =================================================================== + +@node What can be done +@subsection What can be done + +All of the procedures defined in SRFI-13, which are not already included +in the Guile core library, are implemented in the module @code{(srfi +srfi-13)}. 
Procedures that exist both in Guile and in SRFI-13, but +that SRFI-13 slightly extends, are also implemented in this module, and +their bindings override those in the Guile core. + +All procedures from SRFI-14 (character-set library) are implemented in +the module @code{(srfi srfi-14)}, as well as the standard variables +@code{char-set:letter}, @code{char-set:digit} etc. + + +@c =================================================================== + +@node What cannot be done +@subsection What cannot be done + +The procedures defined in the section @emph{Low-level +procedures} of SRFI-13 for parsing optional string indices, checking +substring specifications and Knuth-Morris-Pratt searching are not +implemented. + +The procedures @code{string-contains} and @code{string-contains-ci} are +not implemented very efficiently at the moment. This will be changed as +soon as possible. + + +@c =================================================================== + +@node Loading SRFI-13/14 +@section Loading SRFI-13/14 + +When Guile is properly installed, the SRFI-13 library can be loaded into +a running Guile by using the @code{(srfi srfi-13)} module. + +@example +$ guile +guile> (use-modules (srfi srfi-13)) +guile> +@end example + +If this step causes any errors, Guile is not properly installed. + +One possible reason is that Guile cannot find either the Scheme module +file @file{srfi-13.scm} or the shared object file +@file{libguile-srfi-srfi-13-14.so}. Make sure that the former is in the +Guile load path and that the latter is either installed in some default +location like @file{/usr/local/lib} or that the directory it was +installed to is in your @code{LTDL_LIBRARY_PATH}. The same applies to +@file{srfi-14.scm}. + +Now you can test whether the SRFI-13 procedures are working by calling +the @code{string-concatenate} procedure. + +@example +guile> (string-concatenate '("Hello" " " "World!")) +"Hello World!" 
+@end example + +The same goes for the SRFI-14 module, of course. + +@example +$ guile +guile> (use-modules (srfi srfi-14)) +guile> (char-set-union (char-set #\f #\o #\o) (string->char-set "bar")) +#<charset @{#\a #\b #\f #\o #\r@}> +guile> +@end example + + +@c =================================================================== + +@node String Functions +@section String Functions + +In this section, we will describe all procedures defined in SRFI-13 +(string library) and implemented by the module @code{(srfi srfi-13)}. + +Except for the procedures in the section @emph{Low-level procedures} of +SRFI-13, all string procedures defined there are implemented completely. + +@menu +* Predicates:: Testing strings. +* SRFI-13 Constructors:: Constructing strings. +* SRFI-13 List/String Conversion:: Conversion from/to character lists. +* SRFI-13 Selection:: Selecting portions from strings. +* SRFI-13 Modification:: Modifying strings in place. +* SRFI-13 Comparison:: Comparing strings. +* Prefixes/Suffixes:: Checking for common pre-/suffixes. +* Searching:: Searching in strings. +* Case Mapping:: Changing the case of strings. +* Reverse/Append:: Append, concatenate and reverse strings. +* Fold/Unfold/Map:: Fold/Unfold/Map over strings. +* Replicate/Rotate:: String replication and rotation. +* Miscellaneous:: Miscellaneous string procedures. +* Filtering/Deleting:: Deleting characters from strings. +@end menu + + +@c =================================================================== + +@node Predicates +@subsection Predicates + +In addition to the primitives @code{string?} and @code{string-null?}, +which are already in the Guile core, the string predicates +@code{string-any} and @code{string-every} are defined by SRFI-13. + +@deffn primitive string-any pred s [start end] +Check if the predicate @var{pred} is true for any character in +the string @var{s}, proceeding from left (index @var{start}) to +right (index @var{end}). 
If @code{string-any} returns true, +the returned true value is the one produced by the first +successful application of @var{pred}. +@end deffn + +@deffn primitive string-every pred s [start end] +Check if the predicate @var{pred} is true for every character +in the string @var{s}, proceeding from left (index @var{start}) +to right (index @var{end}). If @code{string-every} returns +true, the returned true value is the one produced by the final +application of @var{pred} to the last character of @var{s}. +@end deffn + + +@c =================================================================== + +@node SRFI-13 Constructors +@subsection Constructors + +SRFI-13 defines several procedures for constructing new strings. In +addition to @code{make-string} and @code{string} (available in the Guile +core library), the procedure @code{string-tabulate} does exist. + +@deffn primitive string-tabulate proc len +@var{proc} is an integer->char procedure. Construct a string +of size @var{len} by applying @var{proc} to each index to +produce the corresponding string element. The order in which +@var{proc} is applied to the indices is not specified. +@end deffn + + +@c =================================================================== + +@node SRFI-13 List/String Conversion +@subsection List/String Conversion + +The procedure @code{string->list} is extended by SRFI-13, that is why it +is included in @code{(srfi srfi-13)}. The other procedures are new. +The Guile core already contains the procedure @code{list->string} for +converting a list of characters into a string (REFFIXME). + +@deffn primitive string->list str [start end] +Convert the string @var{str} into a list of characters. 
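+
+As a small sketch of the optional indices (assuming @code{(srfi srfi-13)}
+is loaded, and that @var{start} is inclusive and @var{end} exclusive, as
+for the other SRFI-13 procedures):
+
+@lisp
+(string->list "hello")     @result{} (#\h #\e #\l #\l #\o)
+(string->list "hello" 1 3) @result{} (#\e #\l)
+@end lisp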
+@end deffn + +@deffn primitive reverse-list->string chrs +An efficient implementation of @code{(compose list->string +reverse)}: + +@smalllisp +(reverse-list->string '(#\a #\B #\c)) @result{} "cBa" +@end smalllisp +@end deffn + +@deffn primitive string-join ls [delimiter grammar] +Append the strings in the string list @var{ls}, using the string +@var{delimiter} as a delimiter between the elements of @var{ls}. +@var{grammar} is a symbol which specifies how the delimiter is +placed between the strings, and defaults to the symbol +@code{infix}. + +@table @code +@item infix +Insert the separator between list elements. An empty list +will produce an empty string. + +@item strict-infix +Like @code{infix}, but will raise an error if given the empty +list. + +@item suffix +Insert the separator after every list element. + +@item prefix +Insert the separator before each list element. +@end table +@end deffn + + +@c =================================================================== + +@node SRFI-13 Selection +@subsection Selection + +These procedures are called @dfn{selectors}, because they access +information about the string or select pieces of a given string. + +Additional selector procedures are documented in the Strings section +(REFFIXME), like @code{string-length} or @code{string-ref}. + +@code{string-copy} is also available in core Guile, but this version +accepts additional start/end indices. + +@deffn primitive string-copy str [start end] +Return a freshly allocated copy of the string @var{str}. If +given, @var{start} and @var{end} delimit the portion of +@var{str} which is copied. +@end deffn + +@deffn primitive substring/shared str start [end] +Like @code{substring}, but the result may share memory with the +argument @var{str}. +@end deffn + +@deffn primitive string-copy! target tstart s [start end] +Copy the sequence of characters from index range [@var{start}, +@var{end}) in string @var{s} to string @var{target}, beginning +at index @var{tstart}. 
The characters are copied left-to-right +or right-to-left as needed -- the copy is guaranteed to work, +even if @var{target} and @var{s} are the same string. It is an +error if the copy operation runs off the end of the target +string. +@end deffn + +@deffn primitive string-take s n +@deffnx primitive string-take-right s n +Return the @var{n} first/last characters of @var{s}. +@end deffn + +@deffn primitive string-drop s n +@deffnx primitive string-drop-right s n +Return all but the first/last @var{n} characters of @var{s}. +@end deffn + +@deffn primitive string-pad s len [chr start end] +@deffnx primitive string-pad-right s len [chr start end] +Take the characters from @var{start} to @var{end} from the +string @var{s} and return a new string, padded on the left (right) with the +character @var{chr} to length @var{len}. If the selected +substring is longer than @var{len}, it is truncated on the left (right). +@end deffn + +@deffn primitive string-trim s [char_pred start end] +@deffnx primitive string-trim-right s [char_pred start end] +@deffnx primitive string-trim-both s [char_pred start end] +Trim @var{s} by skipping over all characters on the left/right/both +sides of the string that satisfy the parameter @var{char_pred}: + +@itemize @bullet +@item +if it is the character @var{ch}, characters equal to +@var{ch} are trimmed, + +@item +if it is a procedure @var{pred}, characters that +satisfy @var{pred} are trimmed, + +@item +if it is a character set, characters in that set are trimmed. +@end itemize + +If called without a @var{char_pred} argument, all whitespace is +trimmed. +@end deffn + + +@c =================================================================== + +@node SRFI-13 Modification +@subsection Modification + +The procedure @code{string-fill!} extends the R5RS version in that it +accepts optional start/end indices. This binding shadows the procedure +of the same name in the Guile core. 
The second modification procedure +@code{string-set!} is documented in the Strings section (REFFIXME). + +@deffn primitive string-fill! str chr [start end] +Store @var{chr} in every element of the given @var{str}, or only in the +elements between @var{start} and @var{end} if they are given, and +return an unspecified value. +@end deffn + + +@c =================================================================== + +@node SRFI-13 Comparison +@subsection Comparison + +The procedures in this section are used for comparing strings in +different ways. The comparison predicates differ from those in R5RS in +that they do not simply return @code{#t} or @code{#f}: a true result is +the mismatch index. + +@code{string-hash} and @code{string-hash-ci} are for calculating hash +values for strings, useful for implementing fast lookup mechanisms. + +@deffn primitive string-compare s1 s2 proc_lt proc_eq proc_gt [start1 end1 start2 end2] +@deffnx primitive string-compare-ci s1 s2 proc_lt proc_eq proc_gt [start1 end1 start2 end2] +Apply @var{proc_lt}, @var{proc_eq}, @var{proc_gt} to the +mismatch index, depending upon whether @var{s1} is less than, +equal to, or greater than @var{s2}. The mismatch index is the +largest index @var{i} such that for every 0 <= @var{j} < +@var{i}, @var{s1}[@var{j}] = @var{s2}[@var{j}] -- that is, +@var{i} is the first position that does not match. For +@code{string-compare-ci}, the character comparison is done +case-insensitively. +@end deffn + +@deffn primitive string= s1 s2 [start1 end1 start2 end2] +@deffnx primitive string<> s1 s2 [start1 end1 start2 end2] +@deffnx primitive string< s1 s2 [start1 end1 start2 end2] +@deffnx primitive string> s1 s2 [start1 end1 start2 end2] +@deffnx primitive string<= s1 s2 [start1 end1 start2 end2] +@deffnx primitive string>= s1 s2 [start1 end1 start2 end2] +Compare @var{s1} and @var{s2} and return @code{#f} if the predicate +fails. Otherwise, the mismatch index is returned (or @var{end1} in the +case of @code{string=}). 
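+
+For illustration, a sketch of the results one would expect from the
+description above, assuming @code{(srfi srfi-13)} is loaded.  Any result
+other than @code{#f} counts as true in a conditional:
+
+@lisp
+(string< "abc" "abd")  @result{} 2    ; first mismatch is at index 2
+(string< "abd" "abc")  @result{} #f
+(string<> "abc" "abd") @result{} 2
+@end lisp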
+@end deffn + +@deffn primitive string-ci= s1 s2 [start1 end1 start2 end2] +@deffnx primitive string-ci<> s1 s2 [start1 end1 start2 end2] +@deffnx primitive string-ci< s1 s2 [start1 end1 start2 end2] +@deffnx primitive string-ci> s1 s2 [start1 end1 start2 end2] +@deffnx primitive string-ci<= s1 s2 [start1 end1 start2 end2] +@deffnx primitive string-ci>= s1 s2 [start1 end1 start2 end2] +Compare @var{s1} and @var{s2} and return @code{#f} if the predicate +fails. Otherwise, the mismatch index is returned (or @var{end1} in the +case of @code{string-ci=}). These are the case-insensitive variants. +@end deffn + +@deffn primitive string-hash s [bound start end] +@deffnx primitive string-hash-ci s [bound start end] +Return a hash value of the string @var{s} in the range 0 @dots{} +@var{bound} - 1. @code{string-hash-ci} is the case-insensitive variant. +@end deffn + + +@c =================================================================== + +@node Prefixes/Suffixes +@subsection Prefixes/Suffixes + +Using these procedures you can determine whether a given string is a +prefix or suffix of another string or how long a common prefix/suffix +is. + +@deffn primitive string-prefix-length s1 s2 [start1 end1 start2 end2] +@deffnx primitive string-prefix-length-ci s1 s2 [start1 end1 start2 end2] +@deffnx primitive string-suffix-length s1 s2 [start1 end1 start2 end2] +@deffnx primitive string-suffix-length-ci s1 s2 [start1 end1 start2 end2] +Return the length of the longest common prefix/suffix of the two +strings. @code{string-prefix-length-ci} and +@code{string-suffix-length-ci} are the case-insensitive variants. +@end deffn + +@deffn primitive string-prefix? s1 s2 [start1 end1 start2 end2] +@deffnx primitive string-prefix-ci? s1 s2 [start1 end1 start2 end2] +@deffnx primitive string-suffix? s1 s2 [start1 end1 start2 end2] +@deffnx primitive string-suffix-ci? s1 s2 [start1 end1 start2 end2] +Is @var{s1} a prefix/suffix of @var{s2}? 
@code{string-prefix-ci?} and +@code{string-suffix-ci?} are the case-insensitive variants. +@end deffn + + +@c =================================================================== + +@node Searching +@subsection Searching + +Use these procedures to find out whether a string contains a given +character or a given substring, or a character from a set of characters. + +@deffn primitive string-index s char_pred [start end] +@deffnx primitive string-index-right s char_pred [start end] +Search through the string @var{s} from left to right (right to left), +returning the index of the first (last) occurrence of a character which + +@itemize +@item +equals @var{char_pred}, if it is a character, + +@item +satisfies the predicate @var{char_pred}, if it is a +procedure, + +@item +is in the set @var{char_pred}, if it is a character set. +@end itemize +@end deffn + +@deffn primitive string-skip s char_pred [start end] +@deffnx primitive string-skip-right s char_pred [start end] +Search through the string @var{s} from left to right (right to left), +returning the index of the first (last) occurrence of a character which + +@itemize +@item +does not equal @var{char_pred}, if it is a character, + +@item +does not satisfy the predicate @var{char_pred}, if it is +a procedure, + +@item +is not in the set @var{char_pred}, if it is a character set. +@end itemize +@end deffn + +@deffn primitive string-count s char_pred [start end] +Return the number of characters in the string +@var{s} which + +@itemize @bullet +@item +equal @var{char_pred}, if it is a character, + +@item +satisfy the predicate @var{char_pred}, if it is a procedure, + +@item +are in the set @var{char_pred}, if it is a character set. +@end itemize +@end deffn + +@deffn primitive string-contains s1 s2 [start1 end1 start2 end2] +@deffnx primitive string-contains-ci s1 s2 [start1 end1 start2 end2] +Does string @var{s1} contain string @var{s2}? Return the index +in @var{s1} where @var{s2} occurs as a substring, or @code{#f}. 
+The optional start/end indices restrict the operation to the +indicated substrings. + +@code{string-contains-ci} is the case-insensitive variant. +@end deffn + + +@c =================================================================== + +@node Case Mapping +@subsection Alphabetic Case Mapping + +These procedures convert the alphabetic case of strings. They are +similar to the procedures in the Guile core, but are extended to handle +optional start/end indices. + +@deffn primitive string-upcase s [start end] +@deffnx primitive string-upcase! s [start end] +Upcase every character in @var{s}. @code{string-upcase!} is the +side-effecting variant. +@end deffn + +@deffn primitive string-downcase s [start end] +@deffnx primitive string-downcase! s [start end] +Downcase every character in @var{s}. @code{string-downcase!} is the +side--effecting variant. +@end deffn + +@deffn primitive string-titlecase s [start end] +@deffnx primitive string-titlecase! s [start end] +Upcase every first character in every word in @var{s}, downcase the +other characters. @code{string-titlecase!} is the side--effecting +variant. +@end deffn + + +@c =================================================================== + +@node Reverse/Append +@subsection Reverse/Append + +One appending procedure, @code{string-append} is the same in R5RS and in +SRFI-13, so it is not redefined. + +@deffn primitive string-reverse str [start end] +@deffnx primitive string-reverse! str [start end] +Reverse the string @var{str}. The optional arguments +@var{start} and @var{end} delimit the region of @var{str} to +operate on. + +@code{string-reverse!} modifies the argument string and returns an +unspecified value. +@end deffn + +@deffn primitive string-append/shared ls @dots{} +Like @code{string-append}, but the result may share memory +with the argument strings. +@end deffn + +@deffn primitive string-concatenate ls +Append the elements of @var{ls} (which must be strings) +together into a single string. 
Guaranteed to return a freshly +allocated string. +@end deffn + +@deffn primitive string-concatenate/shared ls +Like @code{string-concatenate}, but the result may share memory +with the strings in the list @var{ls}. +@end deffn + +@deffn primitive reverse-string-concatenate ls [final_string end] +Without optional arguments, this procedure is equivalent to + +@smalllisp +(string-concatenate (reverse ls)) +@end smalllisp + +If the optional argument @var{final_string} is specified, it is +consed onto the beginning of @var{ls} before performing the +list-reverse and string-concatenate operations. + +Guaranteed to return a freshly allocated string. +@end deffn + +@deffn primitive reverse-string-concatenate/shared ls [final_string end] +Like @code{reverse-string-concatenate}, but the result may +share memory with the strings in the list @var{ls}. +@end deffn + + +@c =================================================================== + +@node Fold/Unfold/Map +@subsection Fold/Unfold/Map + +@code{string-map}, @code{string-for-each} etc. are for iterating over +the characters a string is composed of. The fold and unfold procedures +are string iterators and constructors. + +@deffn primitive string-map proc s [start end] +@var{proc} is a char->char procedure; it is mapped over +@var{s}. The order in which the procedure is applied to the +string elements is not specified. +@end deffn + +@deffn primitive string-map! proc s [start end] +@var{proc} is a char->char procedure; it is mapped over +@var{s}. The order in which the procedure is applied to the +string elements is not specified. The string @var{s} is +modified in place; the return value is not specified. +@end deffn + +@deffn primitive string-fold kons knil s [start end] +@deffnx primitive string-fold-right kons knil s [start end] +Fold @var{kons} over the characters of @var{s}, with @var{knil} as the +terminating element, from left to right (or right to left, for +@code{string-fold-right}). 
@var{kons} must expect two arguments: the +current character and the result of the previous application of @var{kons}. +@end deffn + +@deffn primitive string-unfold p f g seed [base make_final] +@deffnx primitive string-unfold-right p f g seed [base make_final] +These are the fundamental string constructors. +@itemize +@item @var{g} is used to generate a series of @emph{seed} +values from the initial @var{seed}: @var{seed}, (@var{g} +@var{seed}), (@var{g}^2 @var{seed}), (@var{g}^3 @var{seed}), +@dots{} +@item @var{p} tells us when to stop: generation ends as soon as it +returns true for one of these seed values. +@item @var{f} maps each seed value to the corresponding +character in the result string. These chars are assembled into the +string in a left-to-right (right-to-left) order. +@item @var{base} is the optional initial/leftmost (rightmost) + portion of the constructed string; it defaults to the empty string. +@item @var{make_final} is applied to the terminal seed +value (on which @var{p} returns true) to produce the final/rightmost +(leftmost) portion of the constructed string. It defaults to +@code{(lambda (x) "")}. +@end itemize +@end deffn + +@deffn primitive string-for-each proc s [start end] +@var{proc} is mapped over @var{s} in left-to-right order. The +return value is not specified. +@end deffn + + +@c =================================================================== + +@node Replicate/Rotate +@subsection Replicate/Rotate + +These procedures are special substring procedures, which can also be +used for replicating strings. They are a bit tricky to use, but +consider this code fragment, which replicates the input string +@code{"foo"} as many times as needed for the resulting string to have a +length of six. + +@lisp +(xsubstring "foo" 0 6) +@result{} +"foofoo" +@end lisp + +@deffn primitive xsubstring s from [to start end] +This is the @emph{extended substring} procedure that implements +replicated copying of a substring of some string. 
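+
+A few further calls, following the examples given in the SRFI-13
+document (a sketch, assuming @code{(srfi srfi-13)} is loaded; negative
+indices select from the copies to the left of index 0):
+
+@lisp
+(xsubstring "abcdef" 2 8)  @result{} "cdefab"
+(xsubstring "abcdef" -2 4) @result{} "efabcd"
+(xsubstring "abc" 0 7)     @result{} "abcabca"
+@end lisp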
+ +@var{s} is a string, @var{start} and @var{end} are optional +arguments that demarcate a substring of @var{s}, defaulting to +0 and the length of @var{s}. Replicate this substring up and +down index space, in both the positive and negative directions. +@code{xsubstring} returns the substring of this string +beginning at index @var{from}, and ending at @var{to}, which +defaults to @var{from} + (@var{end} - @var{start}). +@end deffn + +@deffn primitive string-xcopy! target tstart s sfrom [sto start end] +Exactly the same as @code{xsubstring}, but the extracted text +is written into the string @var{target} starting at index +@var{tstart}. The operation is not defined if @code{(eq? +@var{target} @var{s})} or these arguments share storage -- you +cannot copy a string on top of itself. +@end deffn + + +@c =================================================================== + +@node Miscellaneous +@subsection Miscellaneous + +@code{string-replace} is for replacing a portion of a string with +another string and @code{string-tokenize} splits a string into a list of +strings, breaking it up at a specified character. + +@deffn primitive string-replace s1 s2 [start1 end1 start2 end2] +Return the string @var{s1}, but with the characters +@var{start1} @dots{} @var{end1} replaced by the characters +@var{start2} @dots{} @var{end2} from @var{s2}. +@end deffn + +@deffn primitive string-tokenize s [token_char start end] +Split the string @var{s} into a list of substrings, where each +substring is a maximal non-empty contiguous sequence of +characters equal to the character @var{token_char}, or +whitespace, if @var{token_char} is not given. If +@var{token_char} is a character set, it is used for finding the +token borders. 
+@end deffn + + +@c =================================================================== + +@node Filtering/Deleting +@subsection Filtering/Deleting + +@dfn{Filtering} means to remove all characters from a string which do +not match a given criterion; @dfn{deleting} means the opposite. + +@deffn primitive string-filter s char_pred [start end] +Filter the string @var{s}, retaining only those characters that +satisfy the @var{char_pred} argument. If the argument is a +procedure, it is applied to each character as a predicate; if +it is a character, each character is tested for equality with it; and +if it is a character set, each character is tested for membership. +@end deffn + +@deffn primitive string-delete s char_pred [start end] +Filter the string @var{s}, retaining only those characters that +do not satisfy the @var{char_pred} argument. If the argument +is a procedure, it is applied to each character as a predicate; +if it is a character, each character is tested for equality with it; and +if it is a character set, each character is tested for membership. +@end deffn + + +@c =================================================================== + +@node Character-set Procedures +@section Character-set Procedures + +SRFI-14 defines the data type @dfn{character set}, and also defines a +lot of procedures for handling this data type, and a few standard +character sets like whitespace, alphabetic characters and others. + +@menu +* Character Set Data Type:: Description of the character set data type. +* Predicates/Comparison:: Testing character sets. +* Iterating Over Character Sets:: Iterating over the members of a set. +* Creating Character Sets:: Creating new character sets. +* Querying Character Sets:: Extracting information from character sets. +* Character-Set Algebra:: Set-algebra on character sets. +* Standard Character Sets:: Variables containing standard character sets. 
+@end menu + + +@c =================================================================== + +@node Character Set Data Type +@subsection Character Set Data Type + +The data type @dfn{charset} implements sets of characters (REFFIXME). +Because the internal representation of character sets is not visible to +the user, a lot of procedures for handling them are provided. + +Character sets can be created, extended, tested for the membership of a +character and compared to other character sets. + +The Guile implementation of character sets deals with 8-bit characters. +In the standard variables, only the ASCII part of the character range is +really used, so that for example @dfn{Umlaute} and other accented +characters are not considered to be letters. This may change in the +future if Guile gains support for international character sets, so +don't rely on these ``features''. + + +@c =================================================================== + +@node Predicates/Comparison +@subsection Predicates/Comparison + +Use these procedures for testing whether an object is a character set, +or whether several character sets are equal or subsets of each other. +@code{char-set-hash} can be used for calculating a hash value, for +example for use in fast lookup procedures. + +@deffn primitive char-set? obj +Return @code{#t} if @var{obj} is a character set, @code{#f} +otherwise. +@end deffn + +@deffn primitive char-set= cs1 @dots{} +Return @code{#t} if all given character sets are equal. +@end deffn + +@deffn primitive char-set<= cs1 @dots{} +Return @code{#t} if every character set @var{cs}i is a subset +of character set @var{cs}i+1. +@end deffn + +@deffn primitive char-set-hash cs [bound] +Compute a hash value for the character set @var{cs}. If +@var{bound} is given and not @code{#f}, it restricts the +returned value to the range 0 @dots{} @var{bound} - 1. 
+@end deffn + + +@c =================================================================== + +@node Iterating Over Character Sets +@subsection Iterating Over Character Sets + +Character set cursors are a means for iterating over the members of a +character set. After creating a character set cursor with +@code{char-set-cursor}, a cursor can be dereferenced with +@code{char-set-ref} and advanced to the next member with +@code{char-set-cursor-next}. Whether a cursor has moved past the last +element of the set can be checked with @code{end-of-char-set?}. + +Additionally, mapping and (un-)folding procedures for character sets are +provided. + +@deffn primitive char-set-cursor cs +Return a cursor into the character set @var{cs}. +@end deffn + +@deffn primitive char-set-ref cs cursor +Return the character at the current cursor position +@var{cursor} in the character set @var{cs}. It is an error to +pass a cursor for which @code{end-of-char-set?} returns true. +@end deffn + +@deffn primitive char-set-cursor-next cs cursor +Advance the character set cursor @var{cursor} to the next +character in the character set @var{cs}. It is an error if the +cursor given satisfies @code{end-of-char-set?}. +@end deffn + +@deffn primitive end-of-char-set? cursor +Return @code{#t} if @var{cursor} has reached the end of a +character set, @code{#f} otherwise. +@end deffn + +@deffn primitive char-set-fold kons knil cs +Fold the procedure @var{kons} over the character set @var{cs}, +initializing it with @var{knil}. +@end deffn + +@deffn primitive char-set-unfold p f g seed [base_cs] +@deffnx primitive char-set-unfold! p f g seed base_cs +This is a fundamental constructor for character sets. +@itemize +@item @var{g} is used to generate a series of ``seed'' values +from the initial seed: @var{seed}, (@var{g} @var{seed}), +(@var{g}^2 @var{seed}), (@var{g}^3 @var{seed}), @dots{} +@item @var{p} tells us when to stop -- when it returns true +when applied to one of the seed values. 
+@item @var{f} maps each seed value to a character. These +characters are added to the base character set @var{base_cs} to +form the result; @var{base_cs} defaults to the empty set. +@end itemize + +@code{char-set-unfold!} is the side-effecting variant. +@end deffn + +@deffn primitive char-set-for-each proc cs +Apply @var{proc} to every character in the character set +@var{cs}. The return value is not specified. +@end deffn + +@deffn primitive char-set-map proc cs +Map the procedure @var{proc} over every character in @var{cs}. +@var{proc} must be a character -> character procedure. +@end deffn + + +@c =================================================================== + +@node Creating Character Sets +@subsection Creating Character Sets + +New character sets are produced with these procedures. + +@deffn primitive char-set-copy cs +Return a newly allocated character set containing all +characters in @var{cs}. +@end deffn + +@deffn primitive char-set char1 @dots{} +Return a character set containing all given characters. +@end deffn + +@deffn primitive list->char-set char_list [base_cs] +@deffnx primitive list->char-set! char_list base_cs +Convert the character list @var{char_list} to a character set. If +the character set @var{base_cs} is given, the characters in this +set are also included in the result. + +@code{list->char-set!} is the side-effecting variant. +@end deffn + +@deffn primitive string->char-set s [base_cs] +@deffnx primitive string->char-set! s base_cs +Convert the string @var{s} to a character set. If the +character set @var{base_cs} is given, the characters in this +set are also included in the result. + +@code{string->char-set!} is the side-effecting variant. +@end deffn + +@deffn primitive char-set-filter pred cs [base_cs] +@deffnx primitive char-set-filter! pred cs base_cs +Return a character set containing every character from @var{cs} +that satisfies @var{pred}. If provided, the characters +from @var{base_cs} are added to the result. 
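+
+A minimal sketch, assuming @code{(srfi srfi-14)} is loaded.  The results
+are compared with @code{char-set=} rather than printed, since the
+printed form of a character set is not described here:
+
+@lisp
+(char-set= (char-set-filter char-alphabetic? (string->char-set "a1b2"))
+           (char-set #\a #\b))
+@result{} #t
+
+;; With a base set, its members are added to the result.
+(char-set= (char-set-filter char-numeric? (string->char-set "a1b2")
+                            (char-set #\x))
+           (char-set #\1 #\2 #\x))
+@result{} #t
+@end lisp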
+
+@code{char-set-filter!} is the side-effecting variant.
+@end deffn
+
+@deffn primitive ucs-range->char-set lower upper [error? base_cs]
+@deffnx primitive ucs-range->char-set! lower upper error? base_cs
+Return a character set containing all characters whose
+character codes lie in the half-open range
+[@var{lower},@var{upper}).
+
+If @var{error?} is a true value, an error is signalled if the
+specified range contains characters which are not contained in
+the implemented character range. If @var{error?} is @code{#f},
+these characters are silently left out of the resulting
+character set.
+
+The characters in @var{base_cs} are added to the result, if
+given.
+
+@code{ucs-range->char-set!} is the side-effecting variant.
+@end deffn
+
+@deffn procedure ->char-set x
+Coerce @var{x} into a character set. @var{x} may be a string, a
+character or a character set.
+@end deffn
+
+
+@c ===================================================================
+
+@node Querying Character Sets
+@subsection Querying Character Sets
+
+Access the elements and other information of a character set with these
+procedures.
+
+@deffn primitive char-set-size cs
+Return the number of elements in character set @var{cs}.
+@end deffn
+
+@deffn primitive char-set-count pred cs
+Return the number of elements in the character set
+@var{cs} which satisfy the predicate @var{pred}.
+@end deffn
+
+@deffn primitive char-set->list cs
+Return a list containing the elements of the character set
+@var{cs}.
+@end deffn
+
+@deffn primitive char-set->string cs
+Return a string containing the elements of the character set
+@var{cs}. The order in which the characters are placed in the
+string is not defined.
+@end deffn
+
+@deffn primitive char-set-contains? cs char
+Return @code{#t} iff the character @var{char} is contained in the
+character set @var{cs}.
+@end deffn
+
+@deffn primitive char-set-every pred cs
+Return a true value if every character in the character set
+@var{cs} satisfies the predicate @var{pred}.
+@end deffn
+
+@deffn primitive char-set-any pred cs
+Return a true value if any character in the character set
+@var{cs} satisfies the predicate @var{pred}.
+@end deffn
+
+
+@c ===================================================================
+
+@node Character-Set Algebra
+@subsection Character-Set Algebra
+
+Character sets can be manipulated with the common set algebra operations,
+such as union, complement, intersection etc. All of these procedures
+provide side-effecting variants, which modify their character set
+argument(s).
+
+@deffn primitive char-set-adjoin cs char1 @dots{}
+@deffnx primitive char-set-adjoin! cs char1 @dots{}
+Add all character arguments to the first argument, which must
+be a character set.
+@end deffn
+
+@deffn primitive char-set-delete cs char1 @dots{}
+@deffnx primitive char-set-delete! cs char1 @dots{}
+Delete all character arguments from the first argument, which
+must be a character set.
+@end deffn
+
+@deffn primitive char-set-complement cs
+@deffnx primitive char-set-complement! cs
+Return the complement of the character set @var{cs}.
+@end deffn
+
+@deffn primitive char-set-union cs1 @dots{}
+@deffnx primitive char-set-union! cs1 @dots{}
+Return the union of all argument character sets.
+@end deffn
+
+@deffn primitive char-set-intersection cs1 @dots{}
+@deffnx primitive char-set-intersection! cs1 @dots{}
+Return the intersection of all argument character sets.
+@end deffn
+
+@deffn primitive char-set-difference cs1 @dots{}
+@deffnx primitive char-set-difference! cs1 @dots{}
+Return the difference of all argument character sets.
+@end deffn
+
+@deffn primitive char-set-xor cs1 @dots{}
+@deffnx primitive char-set-xor! cs1 @dots{}
+Return the exclusive-or of all argument character sets.
+@end deffn
+
+@deffn primitive char-set-diff+intersection cs1 @dots{}
+@deffnx primitive char-set-diff+intersection! cs1 @dots{}
+Return the difference and the intersection of all argument
+character sets.
+@end deffn
+
+
+@c ===================================================================
+
+@node Standard Character Sets
+@subsection Standard Character Sets
+
+To make the character set data type and its procedures more convenient
+to use, several predefined character set variables exist.
+
+@defvar char-set:lower-case
+All lower-case characters.
+@end defvar
+
+@defvar char-set:upper-case
+All upper-case characters.
+@end defvar
+
+@defvar char-set:title-case
+This is empty, because ASCII has no titlecase characters.
+@end defvar
+
+@defvar char-set:letter
+All letters, that is, the union of @code{char-set:lower-case} and
+@code{char-set:upper-case}.
+@end defvar
+
+@defvar char-set:digit
+All digits.
+@end defvar
+
+@defvar char-set:letter+digit
+The union of @code{char-set:letter} and @code{char-set:digit}.
+@end defvar
+
+@defvar char-set:graphic
+All characters which would put ink on the paper.
+@end defvar
+
+@defvar char-set:printing
+The union of @code{char-set:graphic} and @code{char-set:whitespace}.
+@end defvar
+
+@defvar char-set:whitespace
+All whitespace characters.
+@end defvar
+
+@defvar char-set:blank
+All horizontal whitespace characters, that is @code{#\space} and
+@code{#\tab}.
+@end defvar
+
+@defvar char-set:iso-control
+The ISO control characters with the codes 0--31 and 127.
+@end defvar
+
+@defvar char-set:punctuation
+The characters @code{!"#%&'()*,-./:;?@@[\\]_@{@}}
+@end defvar
+
+@defvar char-set:symbol
+The characters @code{$+<=>^`|~}.
+@end defvar
+
+@defvar char-set:hex-digit
+The hexadecimal digits @code{0123456789abcdefABCDEF}.
+@end defvar
+
+@defvar char-set:ascii
+All ASCII characters.
+@end defvar
+
+@defvar char-set:empty
+The empty character set.
+@end defvar
+
+@defvar char-set:full
+This character set contains all possible characters.
+@end defvar
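+
+As a brief illustration of how the predefined character sets combine
+with the querying and set algebra procedures, the following sketch may
+be helpful. It assumes that the module @code{(srfi srfi-14)} has been
+loaded; the names @code{hex-letters} and @code{distinct-digits} are
+hypothetical and are not part of the library.
+
+@example
+(use-modules (srfi srfi-14))
+
+;; Test a character for membership in a predefined set.
+(char-set-contains? char-set:digit #\5)
+@result{} #t
+
+;; Build a new set with the set algebra procedures: the
+;; hexadecimal digits that are not decimal digits.
+(define hex-letters
+  (char-set-difference char-set:hex-digit char-set:digit))
+
+(char-set-size hex-letters)
+@result{} 12
+
+;; Hypothetical helper: the set of distinct digits occurring in STR.
+(define (distinct-digits str)
+  (char-set-intersection (string->char-set str) char-set:digit))
+@end example
+
+Because character sets are sets, @code{distinct-digits} records each
+digit once, however often it occurs in the string.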