diff options
author | Pierre Weis <Pierre.Weis@inria.fr> | 2011-10-25 13:13:54 +0000 |
---|---|---|
committer | Pierre Weis <Pierre.Weis@inria.fr> | 2011-10-25 13:13:54 +0000 |
commit | 2a2fcade52c32f339195dc8ed2b2fcad53081da6 (patch) | |
tree | 05651f6f429af3277c7e0087a06b9b4837f013c0 /stdlib/scanf.mli | |
parent | a1704c30b11b78367062e63fc6ccb686cc69fe2c (diff) | |
download | ocaml-2a2fcade52c32f339195dc8ed2b2fcad53081da6.tar.gz |
Bug #4380.
git-svn-id: http://caml.inria.fr/svn/ocaml/trunk@11243 f963ae5c-01c2-4b8c-9fe0-0dff7051ff02
Diffstat (limited to 'stdlib/scanf.mli')
-rw-r--r-- | stdlib/scanf.mli | 270 |
1 files changed, 145 insertions, 125 deletions
diff --git a/stdlib/scanf.mli b/stdlib/scanf.mli index 38cbad8656..8423cb48c8 100644 --- a/stdlib/scanf.mli +++ b/stdlib/scanf.mli @@ -262,139 +262,159 @@ val bscanf : Scanning.in_channel -> ('a, 'b, 'c, 'd) scanner;; (** {7:conversion Conversion specifications in format strings} *) -(** Conversion specifications consist in the [%] character, followed by - an optional flag, an optional field width, and followed by one or - two conversion characters. The conversion characters and their - meanings are: - - - [d]: reads an optionally signed decimal integer. - - [i]: reads an optionally signed integer - (usual input conventions for decimal ([0-9]+), hexadecimal - ([0x[0-9a-f]+] and [0X[0-9A-F]+]), octal ([0o[0-7]+]), and binary - ([0b[0-1]+]) notations are understood). - - [u]: reads an unsigned decimal integer. - - [x] or [X]: reads an unsigned hexadecimal integer ([[0-9a-f]+] or [[0-9A-F]+]). - - [o]: reads an unsigned octal integer ([[0-7]+]). - - [s]: reads a string argument that spreads as much as possible, until the - following bounding condition holds: {ul - {- a whitespace has been found (see {!Scanf.space}),} - {- a scanning indication (see scanning {!Scanf.indication}) has been - encountered,} - {- the end-of-input has been reached.}} - Hence, this conversion always succeeds: it returns an empty - string, if the bounding condition holds when the scan begins. - - [S]: reads a delimited string argument (delimiters and special - escaped characters follow the lexical conventions of Caml). - - [c]: reads a single character. To test the current input character - without reading it, specify a null field width, i.e. use - specification [%0c]. Raise [Invalid_argument], if the field width - specification is greater than 1. - - [C]: reads a single delimited character (delimiters and special - escaped characters follow the lexical conventions of Caml). - - [f], [e], [E], [g], [G]: reads an optionally signed - floating-point number in decimal notation, in the style [dddd.ddd - e/E+-dd]. - - [F]: reads a floating point number according to the lexical - conventions of Caml (hence the decimal point is mandatory if the - exponent part is not mentioned). - - [B]: reads a boolean argument ([true] or [false]). - - [b]: reads a boolean argument (for backward compatibility; do not use - in new programs). - - [ld], [li], [lu], [lx], [lX], [lo]: reads an [int32] argument to - the format specified by the second letter for regular integers. - - [nd], [ni], [nu], [nx], [nX], [no]: reads a [nativeint] argument to - the format specified by the second letter for regular integers. - - [Ld], [Li], [Lu], [Lx], [LX], [Lo]: reads an [int64] argument to - the format specified by the second letter for regular integers. - - [\[ range \]]: reads characters that matches one of the characters - mentioned in the range of characters [range] (or not mentioned in - it, if the range starts with [^]). Reads a [string] that can be - empty, if the next input character does not match the range. The set of - characters from [c1] to [c2] (inclusively) is denoted by [c1-c2]. - Hence, [%\[0-9\]] returns a string representing a decimal number - or an empty string if no decimal digit is found; similarly, - [%\[\\048-\\057\\065-\\070\]] returns a string of hexadecimal digits. - If a closing bracket appears in a range, it must occur as the - first character of the range (or just after the [^] in case of - range negation); hence [\[\]\]] matches a [\]] character and - [\[^\]\]] matches any character that is not [\]]. - - [r]: user-defined reader. Takes the next [ri] formatted input function and - applies it to the scanning buffer [ib] to read the next argument. The - input function [ri] must therefore have type [Scanning.in_channel -> 'a] and - the argument read has type ['a]. - - [\{ fmt %\}]: reads a format string argument. - The format string read must have the same type as the format string - specification [fmt]. - For instance, ["%{ %i %}"] reads any format string that can read a value of - type [int]; hence, if [s] is the string ["fmt:\"number is %u\""], then - [Scanf.sscanf s "fmt: %{%i%}"] succeeds and returns the format string - ["number is %u"]. - - [\( fmt %\)]: scanning format substitution. - Reads a format string and then goes on scanning with the format string - read, instead of using [fmt]. - The format string read must have the same type as the format string - specification [fmt] that it replaces. - For instance, ["%( %i %)"] reads any format string that can read a value - of type [int]. - Returns the format string read, and the value read using the format - string read. - Hence, if [s] is the string ["\"%4d\"1234.00"], then - [Scanf.sscanf s "%(%i%)" (fun fmt i -> fmt, i)] evaluates to - [("%4d", 1234)]. - If the special flag [_] is used, the conversion discards the - format string read and only returns the value read with the format - string read. - Hence, if [s] is the string ["\"%4d\"1234.00"], then - [Scanf.sscanf s "%_(%i%)"] is simply equivalent to - [Scanf.sscanf "1234.00" "%4d"]. - - [l]: returns the number of lines read so far. - - [n]: returns the number of characters read so far. - - [N] or [L]: returns the number of tokens read so far. - - [!]: matches the end of input condition. - - [%]: matches one [%] character in the input. - - [,]: the no-op delimiter for conversion specifications. - - Following the [%] character that introduces a conversion, there may be - the special flag [_]: the conversion that follows occurs as usual, - but the resulting value is discarded. - For instance, if [f] is the function [fun i -> i + 1], and [s] is the - string ["x = 1"], then [Scanf.sscanf s "%_s = %i" f] returns [2]. - - The field width is composed of an optional integer literal - indicating the maximal width of the token to read. - For instance, [%6d] reads an integer, having at most 6 decimal digits; - [%4f] reads a float with at most 4 characters; and [%8\[\\000-\\255\]] - returns the next 8 characters (or all the characters still available, - if fewer than 8 characters are available in the input). - - Notes: - - - as mentioned above, a [%s] conversion always succeeds, even if there is - nothing to read in the input: in this case, it simply returns [""]. - - - in addition to the relevant digits, ['_'] characters may appear - inside numbers (this is reminiscent to the usual Caml lexical - conventions). If stricter scanning is desired, use the range - conversion facility instead of the number conversions. - - - the [scanf] facility is not intended for heavy duty lexical - analysis and parsing. If it appears not expressive enough for your - needs, several alternative exists: regular expressions (module - [Str]), stream parsers, [ocamllex]-generated lexers, - [ocamlyacc]-generated parsers. *) +(** Conversion specifications have the following form: + + [% \[flags\] \[width\] \[.precision\] type] + + In short, a conversion specification consists in the [%] character, + followed by optional modifiers, and a type which is made of one or + several characters. + + As a special convention for format strings, the [\@] character introduces + an escape for both characters [\@] and [%]: in a format string, + [\@\@] and [\@%] are respectively equivalent to the plain characters [\@] + and [%]. + + The types and their meanings are: + + - [d]: reads an optionally signed decimal integer. + - [i]: reads an optionally signed integer + (usual input conventions for decimal ([0-9]+), hexadecimal + ([0x[0-9a-f]+] and [0X[0-9A-F]+]), octal ([0o[0-7]+]), and binary + ([0b[0-1]+]) notations are understood). + - [u]: reads an unsigned decimal integer. + - [x] or [X]: reads an unsigned hexadecimal integer ([[0-9a-f]+] or [[0-9A-F]+]). + - [o]: reads an unsigned octal integer ([[0-7]+]). + - [s]: reads a string argument that spreads as much as possible, until + the following bounding conditions holds: + {ul + {- a whitespace has been found (see {!Scanf.space}),} + {- a scanning indication has been encountered + (see scanning {!Scanf.indication}),} + {- the end-of-input has been reached.} + } + Hence, the [%s] conversion always succeeds: it returns an empty + string, if the bounding condition holds when the scan begins. + - [S]: reads a delimited string argument (delimiters and special + escaped characters follow the lexical conventions of Caml). + - [c]: reads a single character. To test the current input character + without reading it, specify a null field width, i.e. use + specification [%0c]. Raise [Invalid_argument], if the field width + specification is greater than 1. + - [C]: reads a single delimited character (delimiters and special + escaped characters follow the lexical conventions of Caml). + - [f], [e], [E], [g], [G]: reads an optionally signed + floating-point number in decimal notation, in the style [dddd.ddd + e/E+-dd]. + - [F]: reads a floating point number according to the lexical + conventions of Caml (hence the decimal point is mandatory if the + exponent part is not mentioned). + - [B]: reads a boolean argument ([true] or [false]). + - [b]: reads a boolean argument (for backward compatibility; do not use + in new programs). + - [ld], [li], [lu], [lx], [lX], [lo]: reads an [int32] argument to + the format specified by the second letter for regular integers. + - [nd], [ni], [nu], [nx], [nX], [no]: reads a [nativeint] argument to + the format specified by the second letter for regular integers. + - [Ld], [Li], [Lu], [Lx], [LX], [Lo]: reads an [int64] argument to + the format specified by the second letter for regular integers. + - [\[ range \]]: reads characters that matches one of the characters + mentioned in the range of characters [range] (or not mentioned in + it, if the range starts with [^]). Reads a [string] that can be + empty, if the next input character does not match the range. The set of + characters from [c1] to [c2] (inclusively) is denoted by [c1-c2]. + Hence, [%\[0-9\]] returns a string representing a decimal number + or an empty string if no decimal digit is found; similarly, + [%\[\\048-\\057\\065-\\070\]] returns a string of hexadecimal digits. + If a closing bracket appears in a range, it must occur as the + first character of the range (or just after the [^] in case of + range negation); hence [\[\]\]] matches a [\]] character and + [\[^\]\]] matches any character that is not [\]]. + - [r]: user-defined reader. Takes the next [ri] formatted input function and + applies it to the scanning buffer [ib] to read the next argument. The + input function [ri] must therefore have type [Scanning.in_channel -> 'a] and + the argument read has type ['a]. + - [\{ fmt %\}]: reads a format string argument. + The format string read must have the same type as the format string + specification [fmt]. + For instance, ["%{ %i %}"] reads any format string that can read a value of + type [int]; hence, if [s] is the string ["fmt:\"number is %u\""], then + [Scanf.sscanf s "fmt: %{%i%}"] succeeds and returns the format string + ["number is %u"]. + - [\( fmt %\)]: scanning format substitution. + Reads a format string and then goes on scanning with the format string + read, instead of using [fmt]. + The format string read must have the same type as the format string + specification [fmt] that it replaces. + For instance, ["%( %i %)"] reads any format string that can read a value + of type [int]. + Returns the format string read, and the value read using the format + string read. + Hence, if [s] is the string ["\"%4d\"1234.00"], then + [Scanf.sscanf s "%(%i%)" (fun fmt i -> fmt, i)] evaluates to + [("%4d", 1234)]. + If the special flag [_] is used, the conversion discards the + format string read and only returns the value read with the format + string read. + Hence, if [s] is the string ["\"%4d\"1234.00"], then + [Scanf.sscanf s "%_(%i%)"] is simply equivalent to + [Scanf.sscanf "1234.00" "%4d"]. + - [l]: returns the number of lines read so far. + - [n]: returns the number of characters read so far. + - [N] or [L]: returns the number of tokens read so far. + - [!]: matches the end of input condition. + - [%]: matches one [%] character in the input. + - [,]: the no-op delimiter for conversion specifications. + + Following the [%] character that introduces a conversion, there may be + the special flag [_]: the conversion that follows occurs as usual, + but the resulting value is discarded. + For instance, if [f] is the function [fun i -> i + 1], and [s] is the + string ["x = 1"], then [Scanf.sscanf s "%_s = %i" f] returns [2]. + + The optional [width] is an integer literal indicating the maximal width + of the token to read. + For instance, [%6d] reads an integer, having at most 6 decimal digits; + [%4f] reads a float with at most 4 characters; and [%8\[\\000-\\255\]] + returns the next 8 characters (or all the characters still available, + if fewer than 8 characters are available in the input). + + The optional [precision] is a dot [.] followed by an integer literal + indicating the maximum number of digits that follow the decimal point in + the [%f], [%e], and [%E] conversions. For instance, [%.4f] reads a + [float] with at most 4 fractional digits. + + Notes: + + - as mentioned above, the [%s] conversion always succeeds, even if there is + nothing to read in the input: in this case, it simply returns [""]. + + - in addition to the relevant digits, ['_'] characters may appear + inside numbers (this is reminiscent to the usual Caml lexical + conventions). If stricter scanning is desired, use the range + conversion facility instead of the number conversions. + + - the [scanf] facility is not intended for heavy duty lexical + analysis and parsing. If it appears not expressive enough for your + needs, several alternative exists: regular expressions (module + [Str]), stream parsers, [ocamllex]-generated lexers, + [ocamlyacc]-generated parsers. *) (** {7:indication Scanning indications in format strings} *) (** Scanning indications appear just after the string conversions [%s] and [%\[ range \]] to delimit the end of the token. A scanning - indication is introduced by a [@] character, followed by some - constant character [c]. It means that the string token should end + indication is introduced by a [\@] character, followed by some + literal character [c]. It means that the string token should end just before the next matching [c] (which is skipped). If no [c] character is encountered, the string token spreads as much as possible. For instance, ["%s@\t"] reads a string up to the next - tab character or to the end of input. If a scanning - indication [\@c] does not follow a string conversion, it is treated - as a plain [c] character. + tab character or up to the end of input. + + + When it does not introduce a scanning indication, the [\@] character + introduces an escape for the next character: [\@c] is treated as a plain + [c] character. Note: |