From c7171788844166655de22f8bf356408afb659f77 Mon Sep 17 00:00:00 2001
From: vlefevre
Date: Mon, 26 Apr 2021 12:17:08 +0000
Subject: [doc] Update about "case insensitive" and issue with Turkish locales for "I" / "i".

* mpfr.texi: added "with the rules of the C locale" in the mpfr_strtofr description.
* README.dev: completed information about Turkish locales.

git-svn-id: https://scm.gforge.inria.fr/anonscm/svn/mpfr/trunk@14505 280ebfd0-de03-0410-8827-d642c229c3f4
---
 doc/README.dev | 14 ++++++++++----
 doc/mpfr.texi  |  9 ++++++++-
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/doc/README.dev b/doc/README.dev
index d833c477d..caa8255ac 100644
--- a/doc/README.dev
+++ b/doc/README.dev
@@ -872,10 +872,16 @@ Conversely, do not use locale-dependent functions when the result
 must not depend on the locales. In particular, the alphanumeric
 characters used in number strings (as created by mpfr_get_str) must be
 those of the required characters from the basic character set (see ISO C99
-standard Section 5.2.1 "Character sets"). And tolower(letter) does
-not necessarily return the corresponding lowercase letter from these
-required characters. For instance, tolower('I') returns a dotless 'i'
-in Turkish tr_TR.iso88599 locales.
+standard Section 5.2.1 "Character sets").
+
+Note that in Turkish locales on some systems:
+ * the uppercase version of "i" is "İ" (an "I" with a dot above);
+ * the lowercase version of "I" is "ı" (a dotless "i").
+These characters are available in ISO-8859-9, thus as "char" in the
+tr_TR.iso88599 locale. However, in UTF-8, they are not available as
+(8-bit) "char"; thus toupper('i') gives 'i' and tolower('I') gives 'I'.
+So, when writing code and testing, these two encodings need to be
+considered, as they can give different behaviors.
 
 ===========================================================================
 
diff --git a/doc/mpfr.texi b/doc/mpfr.texi
index 18b23f908..31e141b74 100644
--- a/doc/mpfr.texi
+++ b/doc/mpfr.texi
@@ -1540,7 +1540,8 @@ stops at the character @samp{0}, thus 0 is read.
 Special data (for infinities and NaN) can be @samp{@@inf@@} or
 @samp{@@nan@@(n-char-sequence-opt)}, and if @math{@var{base} @le{} 16}, it can
 also be @samp{infinity}, @samp{inf}, @samp{nan} or
-@samp{nan(n-char-sequence-opt)}, all case insensitive.
+@samp{nan(n-char-sequence-opt)}, all case insensitive with the rules of
+the C locale.
 A @samp{n-char-sequence-opt} is a possibly empty string containing only digits,
 Latin letters and the underscore (0, 1, 2, @dots{}, 9, a, b, @dots{}, z,
 A, B, @dots{}, Z, _). Note: one has an optional sign for all data, even
@@ -1548,6 +1549,12 @@ NaN@.
 For example, @samp{-@@nAn@@(This_Is_Not_17)} is a valid representation for NaN
 in base 17.
 
+@c Note about the "case insensitive with the rules of the C locale":
+@c The reason is that in Turkish locales on some systems, the uppercase
+@c version of "i" is an "I" with a dot above, and the lowercase version
+@c of "I" is a dotless "i". We do not follow these rules here.
+@c See README.dev for additional information.
+
 @end deftypefun
 
 @deftypefun void mpfr_set_nan (mpfr_t @var{x})
--
cgit v1.2.1
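
The README.dev note in this patch says that tolower/toupper can behave differently for "I" / "i" depending on the Turkish locale encoding. One way to stay independent of the locale is to map case using only the letters of the basic character set. The following is a minimal sketch in C under stated assumptions, not the code MPFR actually uses: basic_tolower, basic_prefix_equal and check_in_locale are hypothetical helper names, and the locale name tr_TR.UTF-8 is an assumption (the patch itself only names tr_TR.iso88599).

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper (not an MPFR function): lowercase a character using
   only the 26 Latin letters of the basic character set.  The current
   locale is never consulted, and the table lookup avoids assuming ASCII. */
static int
basic_tolower (int c)
{
  static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
  static const char lower[] = "abcdefghijklmnopqrstuvwxyz";
  size_t i;

  for (i = 0; upper[i] != '\0'; i++)
    if (c == upper[i])
      return lower[i];
  return c;
}

/* Hypothetical helper: return 1 if s starts with keyword, ignoring case
   with the rules of the C locale; keyword must already be lowercase.
   If s ends early, its terminating null byte fails the comparison, so s
   is never read past its end. */
static int
basic_prefix_equal (const char *s, const char *keyword)
{
  for (; *keyword != '\0'; s++, keyword++)
    if (basic_tolower ((unsigned char) *s) != (unsigned char) *keyword)
      return 0;
  return 1;
}

/* Hypothetical test helper: run the checks under the locale called name.
   Returns 0 if that locale is not available on the system, 1 otherwise.
   The "C" locale is restored before returning. */
static int
check_in_locale (const char *name)
{
  if (setlocale (LC_ALL, name) == NULL)
    return 0;
  assert (basic_tolower ('I') == 'i');
  assert (basic_prefix_equal ("INFINITY", "infinity"));
  assert (basic_prefix_equal ("Inf", "inf"));
  setlocale (LC_ALL, "C");
  return 1;
}
```

A test could call check_in_locale ("tr_TR.iso88599") and check_in_locale ("tr_TR.UTF-8") and treat a return value of 0 as "locale not installed, test skipped", so that both encodings mentioned in README.dev are exercised where they exist.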