summaryrefslogtreecommitdiff
path: root/stdlib/scanf.mli
diff options
context:
space:
mode:
authorPierre Weis <Pierre.Weis@inria.fr>2011-10-25 13:13:54 +0000
committerPierre Weis <Pierre.Weis@inria.fr>2011-10-25 13:13:54 +0000
commit2a2fcade52c32f339195dc8ed2b2fcad53081da6 (patch)
tree05651f6f429af3277c7e0087a06b9b4837f013c0 /stdlib/scanf.mli
parenta1704c30b11b78367062e63fc6ccb686cc69fe2c (diff)
downloadocaml-2a2fcade52c32f339195dc8ed2b2fcad53081da6.tar.gz
Bug #4380.
git-svn-id: http://caml.inria.fr/svn/ocaml/trunk@11243 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02
Diffstat (limited to 'stdlib/scanf.mli')
-rw-r--r--stdlib/scanf.mli270
1 files changed, 145 insertions, 125 deletions
diff --git a/stdlib/scanf.mli b/stdlib/scanf.mli
index 38cbad8656..8423cb48c8 100644
--- a/stdlib/scanf.mli
+++ b/stdlib/scanf.mli
@@ -262,139 +262,159 @@ val bscanf : Scanning.in_channel -> ('a, 'b, 'c, 'd) scanner;;
(** {7:conversion Conversion specifications in format strings} *)
-(** Conversion specifications consist in the [%] character, followed by
- an optional flag, an optional field width, and followed by one or
- two conversion characters. The conversion characters and their
- meanings are:
-
- - [d]: reads an optionally signed decimal integer.
- - [i]: reads an optionally signed integer
- (usual input conventions for decimal ([0-9]+), hexadecimal
- ([0x[0-9a-f]+] and [0X[0-9A-F]+]), octal ([0o[0-7]+]), and binary
- ([0b[0-1]+]) notations are understood).
- - [u]: reads an unsigned decimal integer.
- - [x] or [X]: reads an unsigned hexadecimal integer ([[0-9a-f]+] or [[0-9A-F]+]).
- - [o]: reads an unsigned octal integer ([[0-7]+]).
- - [s]: reads a string argument that spreads as much as possible, until the
- following bounding condition holds: {ul
- {- a whitespace has been found (see {!Scanf.space}),}
- {- a scanning indication (see scanning {!Scanf.indication}) has been
- encountered,}
- {- the end-of-input has been reached.}}
- Hence, this conversion always succeeds: it returns an empty
- string, if the bounding condition holds when the scan begins.
- - [S]: reads a delimited string argument (delimiters and special
- escaped characters follow the lexical conventions of Caml).
- - [c]: reads a single character. To test the current input character
- without reading it, specify a null field width, i.e. use
- specification [%0c]. Raise [Invalid_argument], if the field width
- specification is greater than 1.
- - [C]: reads a single delimited character (delimiters and special
- escaped characters follow the lexical conventions of Caml).
- - [f], [e], [E], [g], [G]: reads an optionally signed
- floating-point number in decimal notation, in the style [dddd.ddd
- e/E+-dd].
- - [F]: reads a floating point number according to the lexical
- conventions of Caml (hence the decimal point is mandatory if the
- exponent part is not mentioned).
- - [B]: reads a boolean argument ([true] or [false]).
- - [b]: reads a boolean argument (for backward compatibility; do not use
- in new programs).
- - [ld], [li], [lu], [lx], [lX], [lo]: reads an [int32] argument to
- the format specified by the second letter for regular integers.
- - [nd], [ni], [nu], [nx], [nX], [no]: reads a [nativeint] argument to
- the format specified by the second letter for regular integers.
- - [Ld], [Li], [Lu], [Lx], [LX], [Lo]: reads an [int64] argument to
- the format specified by the second letter for regular integers.
- - [\[ range \]]: reads characters that matches one of the characters
- mentioned in the range of characters [range] (or not mentioned in
- it, if the range starts with [^]). Reads a [string] that can be
- empty, if the next input character does not match the range. The set of
- characters from [c1] to [c2] (inclusively) is denoted by [c1-c2].
- Hence, [%\[0-9\]] returns a string representing a decimal number
- or an empty string if no decimal digit is found; similarly,
- [%\[\\048-\\057\\065-\\070\]] returns a string of hexadecimal digits.
- If a closing bracket appears in a range, it must occur as the
- first character of the range (or just after the [^] in case of
- range negation); hence [\[\]\]] matches a [\]] character and
- [\[^\]\]] matches any character that is not [\]].
- - [r]: user-defined reader. Takes the next [ri] formatted input function and
- applies it to the scanning buffer [ib] to read the next argument. The
- input function [ri] must therefore have type [Scanning.in_channel -> 'a] and
- the argument read has type ['a].
- - [\{ fmt %\}]: reads a format string argument.
- The format string read must have the same type as the format string
- specification [fmt].
- For instance, ["%{ %i %}"] reads any format string that can read a value of
- type [int]; hence, if [s] is the string ["fmt:\"number is %u\""], then
- [Scanf.sscanf s "fmt: %{%i%}"] succeeds and returns the format string
- ["number is %u"].
- - [\( fmt %\)]: scanning format substitution.
- Reads a format string and then goes on scanning with the format string
- read, instead of using [fmt].
- The format string read must have the same type as the format string
- specification [fmt] that it replaces.
- For instance, ["%( %i %)"] reads any format string that can read a value
- of type [int].
- Returns the format string read, and the value read using the format
- string read.
- Hence, if [s] is the string ["\"%4d\"1234.00"], then
- [Scanf.sscanf s "%(%i%)" (fun fmt i -> fmt, i)] evaluates to
- [("%4d", 1234)].
- If the special flag [_] is used, the conversion discards the
- format string read and only returns the value read with the format
- string read.
- Hence, if [s] is the string ["\"%4d\"1234.00"], then
- [Scanf.sscanf s "%_(%i%)"] is simply equivalent to
- [Scanf.sscanf "1234.00" "%4d"].
- - [l]: returns the number of lines read so far.
- - [n]: returns the number of characters read so far.
- - [N] or [L]: returns the number of tokens read so far.
- - [!]: matches the end of input condition.
- - [%]: matches one [%] character in the input.
- - [,]: the no-op delimiter for conversion specifications.
-
- Following the [%] character that introduces a conversion, there may be
- the special flag [_]: the conversion that follows occurs as usual,
- but the resulting value is discarded.
- For instance, if [f] is the function [fun i -> i + 1], and [s] is the
- string ["x = 1"], then [Scanf.sscanf s "%_s = %i" f] returns [2].
-
- The field width is composed of an optional integer literal
- indicating the maximal width of the token to read.
- For instance, [%6d] reads an integer, having at most 6 decimal digits;
- [%4f] reads a float with at most 4 characters; and [%8\[\\000-\\255\]]
- returns the next 8 characters (or all the characters still available,
- if fewer than 8 characters are available in the input).
-
- Notes:
-
- - as mentioned above, a [%s] conversion always succeeds, even if there is
- nothing to read in the input: in this case, it simply returns [""].
-
- - in addition to the relevant digits, ['_'] characters may appear
- inside numbers (this is reminiscent to the usual Caml lexical
- conventions). If stricter scanning is desired, use the range
- conversion facility instead of the number conversions.
-
- - the [scanf] facility is not intended for heavy duty lexical
- analysis and parsing. If it appears not expressive enough for your
- needs, several alternative exists: regular expressions (module
- [Str]), stream parsers, [ocamllex]-generated lexers,
- [ocamlyacc]-generated parsers. *)
+(** Conversion specifications have the following form:
+
+ [% \[flags\] \[width\] \[.precision\] type]
+
+ In short, a conversion specification consists in the [%] character,
+ followed by optional modifiers, and a type which is made of one or
+ several characters.
+
+ As a special convention for format strings, the [\@] character introduces
+ an escape for both characters [\@] and [%]: in a format string,
+ [\@\@] and [\@%] are respectively equivalent to the plain characters [\@]
+ and [%].
+
+ The types and their meanings are:
+
+ - [d]: reads an optionally signed decimal integer.
+ - [i]: reads an optionally signed integer
+ (usual input conventions for decimal ([0-9]+), hexadecimal
+ ([0x[0-9a-f]+] and [0X[0-9A-F]+]), octal ([0o[0-7]+]), and binary
+ ([0b[0-1]+]) notations are understood).
+ - [u]: reads an unsigned decimal integer.
+ - [x] or [X]: reads an unsigned hexadecimal integer ([[0-9a-f]+] or [[0-9A-F]+]).
+ - [o]: reads an unsigned octal integer ([[0-7]+]).
+ - [s]: reads a string argument that spreads as much as possible, until
+ the following bounding conditions holds:
+ {ul
+ {- a whitespace has been found (see {!Scanf.space}),}
+ {- a scanning indication has been encountered
+ (see scanning {!Scanf.indication}),}
+ {- the end-of-input has been reached.}
+ }
+ Hence, the [%s] conversion always succeeds: it returns an empty
+ string, if the bounding condition holds when the scan begins.
+ - [S]: reads a delimited string argument (delimiters and special
+ escaped characters follow the lexical conventions of Caml).
+ - [c]: reads a single character. To test the current input character
+ without reading it, specify a null field width, i.e. use
+ specification [%0c]. Raise [Invalid_argument], if the field width
+ specification is greater than 1.
+ - [C]: reads a single delimited character (delimiters and special
+ escaped characters follow the lexical conventions of Caml).
+ - [f], [e], [E], [g], [G]: reads an optionally signed
+ floating-point number in decimal notation, in the style [dddd.ddd
+ e/E+-dd].
+ - [F]: reads a floating point number according to the lexical
+ conventions of Caml (hence the decimal point is mandatory if the
+ exponent part is not mentioned).
+ - [B]: reads a boolean argument ([true] or [false]).
+ - [b]: reads a boolean argument (for backward compatibility; do not use
+ in new programs).
+ - [ld], [li], [lu], [lx], [lX], [lo]: reads an [int32] argument to
+ the format specified by the second letter for regular integers.
+ - [nd], [ni], [nu], [nx], [nX], [no]: reads a [nativeint] argument to
+ the format specified by the second letter for regular integers.
+ - [Ld], [Li], [Lu], [Lx], [LX], [Lo]: reads an [int64] argument to
+ the format specified by the second letter for regular integers.
+ - [\[ range \]]: reads characters that matches one of the characters
+ mentioned in the range of characters [range] (or not mentioned in
+ it, if the range starts with [^]). Reads a [string] that can be
+ empty, if the next input character does not match the range. The set of
+ characters from [c1] to [c2] (inclusively) is denoted by [c1-c2].
+ Hence, [%\[0-9\]] returns a string representing a decimal number
+ or an empty string if no decimal digit is found; similarly,
+ [%\[\\048-\\057\\065-\\070\]] returns a string of hexadecimal digits.
+ If a closing bracket appears in a range, it must occur as the
+ first character of the range (or just after the [^] in case of
+ range negation); hence [\[\]\]] matches a [\]] character and
+ [\[^\]\]] matches any character that is not [\]].
+ - [r]: user-defined reader. Takes the next [ri] formatted input function and
+ applies it to the scanning buffer [ib] to read the next argument. The
+ input function [ri] must therefore have type [Scanning.in_channel -> 'a] and
+ the argument read has type ['a].
+ - [\{ fmt %\}]: reads a format string argument.
+ The format string read must have the same type as the format string
+ specification [fmt].
+ For instance, ["%{ %i %}"] reads any format string that can read a value of
+ type [int]; hence, if [s] is the string ["fmt:\"number is %u\""], then
+ [Scanf.sscanf s "fmt: %{%i%}"] succeeds and returns the format string
+ ["number is %u"].
+ - [\( fmt %\)]: scanning format substitution.
+ Reads a format string and then goes on scanning with the format string
+ read, instead of using [fmt].
+ The format string read must have the same type as the format string
+ specification [fmt] that it replaces.
+ For instance, ["%( %i %)"] reads any format string that can read a value
+ of type [int].
+ Returns the format string read, and the value read using the format
+ string read.
+ Hence, if [s] is the string ["\"%4d\"1234.00"], then
+ [Scanf.sscanf s "%(%i%)" (fun fmt i -> fmt, i)] evaluates to
+ [("%4d", 1234)].
+ If the special flag [_] is used, the conversion discards the
+ format string read and only returns the value read with the format
+ string read.
+ Hence, if [s] is the string ["\"%4d\"1234.00"], then
+ [Scanf.sscanf s "%_(%i%)"] is simply equivalent to
+ [Scanf.sscanf "1234.00" "%4d"].
+ - [l]: returns the number of lines read so far.
+ - [n]: returns the number of characters read so far.
+ - [N] or [L]: returns the number of tokens read so far.
+ - [!]: matches the end of input condition.
+ - [%]: matches one [%] character in the input.
+ - [,]: the no-op delimiter for conversion specifications.
+
+ Following the [%] character that introduces a conversion, there may be
+ the special flag [_]: the conversion that follows occurs as usual,
+ but the resulting value is discarded.
+ For instance, if [f] is the function [fun i -> i + 1], and [s] is the
+ string ["x = 1"], then [Scanf.sscanf s "%_s = %i" f] returns [2].
+
+ The optional [width] is an integer literal indicating the maximal width
+ of the token to read.
+ For instance, [%6d] reads an integer, having at most 6 decimal digits;
+ [%4f] reads a float with at most 4 characters; and [%8\[\\000-\\255\]]
+ returns the next 8 characters (or all the characters still available,
+ if fewer than 8 characters are available in the input).
+
+ The optional [precision] is a dot [.] followed by an integer literal
+ indicating the maximum number of digits that follow the decimal point in
+ the [%f], [%e], and [%E] conversions. For instance, [%.4f] reads a
+ [float] with at most 4 fractional digits.
+
+ Notes:
+
+ - as mentioned above, the [%s] conversion always succeeds, even if there is
+ nothing to read in the input: in this case, it simply returns [""].
+
+ - in addition to the relevant digits, ['_'] characters may appear
+ inside numbers (this is reminiscent to the usual Caml lexical
+ conventions). If stricter scanning is desired, use the range
+ conversion facility instead of the number conversions.
+
+ - the [scanf] facility is not intended for heavy duty lexical
+ analysis and parsing. If it appears not expressive enough for your
+ needs, several alternative exists: regular expressions (module
+ [Str]), stream parsers, [ocamllex]-generated lexers,
+ [ocamlyacc]-generated parsers. *)
(** {7:indication Scanning indications in format strings} *)
(** Scanning indications appear just after the string conversions [%s]
and [%\[ range \]] to delimit the end of the token. A scanning
- indication is introduced by a [@] character, followed by some
- constant character [c]. It means that the string token should end
+ indication is introduced by a [\@] character, followed by some
+ literal character [c]. It means that the string token should end
just before the next matching [c] (which is skipped). If no [c]
character is encountered, the string token spreads as much as
possible. For instance, ["%s@\t"] reads a string up to the next
- tab character or to the end of input. If a scanning
- indication [\@c] does not follow a string conversion, it is treated
- as a plain [c] character.
+ tab character or up to the end of input.
+
+
+ When it does not introduce a scanning indication, the [\@] character
+ introduces an escape for the next character: [\@c] is treated as a plain
+ [c] character.
Note: