From 8c4d0fbf4c45df8e86acbb338b154930c5498dc3 Mon Sep 17 00:00:00 2001 From: Bruno Haible Date: Tue, 16 May 2023 02:02:13 +0200 Subject: doc: New chapter "Strings and Characters". * doc/strings.texi: New file. * doc/gnulib.texi (POSIXURL): New variable. (posixheader, posixfunc, func): New macros, from GNU libunistring's documentation. Include strings.texi. (Particular Modules): Don't include c-locale.texi here. * doc/c-locale.texi: Sections become subsections, subsections become subsubsections. * doc/posix-functions/isalnum.texi: Mention c32isalnum. * doc/posix-functions/isalpha.texi: Mention c32isalpha. * doc/posix-functions/isblank.texi: Mention c32isblank. * doc/posix-functions/iscntrl.texi: Mention c32iscntrl. * doc/posix-functions/isdigit.texi: Mention c32isdigit. * doc/posix-functions/isgraph.texi: Mention c32isgraph. * doc/posix-functions/islower.texi: Mention c32islower. * doc/posix-functions/isprint.texi: Mention c32isprint. * doc/posix-functions/ispunct.texi: Mention c32ispunct. * doc/posix-functions/isspace.texi: Mention c32isspace. * doc/posix-functions/isupper.texi: Mention c32isupper. * doc/posix-functions/isxdigit.texi: Mention c32isxdigit. * doc/posix-functions/tolower.texi: Mention alternative APIs. * doc/posix-functions/toupper.texi: Likewise. * doc/posix-functions/towlower.texi: Mention c32tolower. * doc/posix-functions/towupper.texi: Mention c32toupper. * doc/posix-functions/wcswidth.texi: Mention c32swidth. * doc/posix-functions/wcwidth.texi: Mention c32width. 
--- ChangeLog | 30 ++ doc/c-locale.texi | 18 +- doc/gnulib.texi | 43 +- doc/posix-functions/isalnum.texi | 8 +- doc/posix-functions/isalpha.texi | 8 +- doc/posix-functions/isblank.texi | 8 +- doc/posix-functions/iscntrl.texi | 8 +- doc/posix-functions/isdigit.texi | 8 +- doc/posix-functions/isgraph.texi | 8 +- doc/posix-functions/islower.texi | 8 +- doc/posix-functions/isprint.texi | 8 +- doc/posix-functions/ispunct.texi | 8 +- doc/posix-functions/isspace.texi | 8 +- doc/posix-functions/isupper.texi | 8 +- doc/posix-functions/isxdigit.texi | 8 +- doc/posix-functions/tolower.texi | 31 +- doc/posix-functions/toupper.texi | 31 +- doc/posix-functions/towlower.texi | 3 + doc/posix-functions/towupper.texi | 3 + doc/posix-functions/wcswidth.texi | 3 + doc/posix-functions/wcwidth.texi | 3 + doc/strings.texi | 854 ++++++++++++++++++++++++++++++++++++++ 22 files changed, 1086 insertions(+), 29 deletions(-) create mode 100644 doc/strings.texi diff --git a/ChangeLog b/ChangeLog index b894505c2b..ecbc25ef06 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,33 @@ +2023-05-15 Bruno Haible + + doc: New chapter "Strings and Characters". + * doc/strings.texi: New file. + * doc/gnulib.texi (POSIXURL): New variable. + (posixheader, posixfunc, func): New macros, from GNU libunistring's + documentation. + Include strings.texi. + (Particular Modules): Don't include c-locale.texi here. + * doc/c-locale.texi: Sections become subsections, subsections become + subsubsections. + * doc/posix-functions/isalnum.texi: Mention c32isalnum. + * doc/posix-functions/isalpha.texi: Mention c32isalpha. + * doc/posix-functions/isblank.texi: Mention c32isblank. + * doc/posix-functions/iscntrl.texi: Mention c32iscntrl. + * doc/posix-functions/isdigit.texi: Mention c32isdigit. + * doc/posix-functions/isgraph.texi: Mention c32isgraph. + * doc/posix-functions/islower.texi: Mention c32islower. + * doc/posix-functions/isprint.texi: Mention c32isprint. + * doc/posix-functions/ispunct.texi: Mention c32ispunct. 
+ * doc/posix-functions/isspace.texi: Mention c32isspace. + * doc/posix-functions/isupper.texi: Mention c32isupper. + * doc/posix-functions/isxdigit.texi: Mention c32isxdigit. + * doc/posix-functions/tolower.texi: Mention alternative APIs. + * doc/posix-functions/toupper.texi: Likewise. + * doc/posix-functions/towlower.texi: Mention c32tolower. + * doc/posix-functions/towupper.texi: Mention c32toupper. + * doc/posix-functions/wcswidth.texi: Mention c32swidth. + * doc/posix-functions/wcwidth.texi: Mention c32width. + 2023-05-15 Bruno Haible sigsegv: Add tentative support for Hurd/x86_64. diff --git a/doc/c-locale.texi b/doc/c-locale.texi index 63d11384bd..b9f6274873 100644 --- a/doc/c-locale.texi +++ b/doc/c-locale.texi @@ -1,5 +1,5 @@ @node String Functions in C Locale -@section Character and String Functions in C Locale +@subsection Character and String Functions in C Locale The functions in this section are similar to the generic string functions from the standard C library, except that @@ -12,6 +12,8 @@ They are specially optimized for the case where all characters are plain ASCII characters. @end itemize +The functions are provided by the following modules. + @menu * c-ctype:: * c-strcase:: @@ -23,29 +25,29 @@ ASCII characters. 
@end menu @node c-ctype -@subsection c-ctype +@subsubsection c-ctype @include c-ctype.texi @node c-strcase -@subsection c-strcase +@subsubsection c-strcase @include c-strcase.texi @node c-strcaseeq -@subsection c-strcaseeq +@subsubsection c-strcaseeq @include c-strcaseeq.texi @node c-strcasestr -@subsection c-strcasestr +@subsubsection c-strcasestr @include c-strcasestr.texi @node c-strstr -@subsection c-strstr +@subsubsection c-strstr @include c-strstr.texi @node c-strtod -@subsection c-strtod +@subsubsection c-strtod @include c-strtod.texi @node c-strtold -@subsection c-strtold +@subsubsection c-strtold @include c-strtold.texi diff --git a/doc/gnulib.texi b/doc/gnulib.texi index 3af5cb21b2..0f91de5a39 100644 --- a/doc/gnulib.texi +++ b/doc/gnulib.texi @@ -82,6 +82,7 @@ Documentation License''. * Glibc Function Substitutes:: Replacing system functions. * Native Windows Support:: Support for the native Windows platforms. * Multithreading:: Multiple threads of execution. +* Strings and Characters:: Functions for strings and characters. * Particular Modules:: Documentation of individual modules. * Regular expressions:: The regex module. * Build Infrastructure Modules:: Modules that extend the GNU Build System. @@ -91,6 +92,42 @@ Documentation License''. * Index:: @end menu +@c Location of the POSIX specification on the web. +@set POSIXURL http://pubs.opengroup.org/onlinepubs/9699919799 + +@c Macro for referencing a POSIX header. +@ifinfo +@macro posixheader{header} +@code{<\header\>} +@end macro +@end ifinfo +@ifnotinfo +@macro posixheader{header} +@uref{@value{POSIXURL}/basedefs/\header\.html,,@code{<\header\>}} +@end macro +@end ifnotinfo + +@c Macro for referencing a POSIX function. +@c We don't write it as func(), see section "GNU Manuals" of the +@c GNU coding standards. 
+@ifinfo +@macro posixfunc{func} +@code{\func\} +@end macro +@end ifinfo +@ifnotinfo +@macro posixfunc{func} +@uref{@value{POSIXURL}/functions/\func\.html,,@code{\func\}} +@end macro +@end ifnotinfo + +@c Macro for referencing a normal function. +@c We don't write it as func(), see section "GNU Manuals" of the +@c GNU coding standards. +@macro func{func} +@code{\func\} +@end macro + @c This is used at the beginning of four chapters. @macro nosuchmodulenote{thing} The notation ``Gnulib module: ---'' means that Gnulib does not provide a @@ -6896,6 +6933,9 @@ to POSIX that it can be treated like any other Unix-like platform. @include multithread.texi +@include strings.texi + + @node Particular Modules @chapter Particular Modules @@ -6912,7 +6952,6 @@ to POSIX that it can be treated like any other Unix-like platform. * Closed standard fds:: * Handling strings with NUL characters:: * Container data types:: -* String Functions in C Locale:: * Recognizing Option Arguments:: * Quoting:: * progname and getprogname:: @@ -6954,8 +6993,6 @@ to POSIX that it can be treated like any other Unix-like platform. @include containers.texi -@include c-locale.texi - @include argmatch.texi @include quote.texi diff --git a/doc/posix-functions/isalnum.texi b/doc/posix-functions/isalnum.texi index b538d199c1..422b55d193 100644 --- a/doc/posix-functions/isalnum.texi +++ b/doc/posix-functions/isalnum.texi @@ -21,7 +21,7 @@ Portability problems not fixed by Gnulib: Note: This function's behaviour depends on the locale, but does not support the multibyte characters that occur in strings in locales with @code{MB_CUR_MAX > 1} (this includes all the common UTF-8 locales). -There are four alternative APIs: +There are five alternative APIs: @table @code @item c_isalnum @@ -34,6 +34,12 @@ order to use it, you first have to convert from multibyte to wide characters, using the @code{mbrtowc} function. It is provided by the Gnulib module @samp{wctype}. 
+@item c32isalnum +This function operates in a locale dependent way, on 32-bit wide characters. +In order to use it, you first have to convert from multibyte to 32-bit wide +characters, using the @code{mbrtoc32} function. It is provided by the +Gnulib module @samp{c32isalnum}. + @item mb_isalnum This function operates in a locale dependent way, on multibyte characters. It is provided by the Gnulib module @samp{mbchar}. diff --git a/doc/posix-functions/isalpha.texi b/doc/posix-functions/isalpha.texi index 2e4304ea8b..ee1c644a42 100644 --- a/doc/posix-functions/isalpha.texi +++ b/doc/posix-functions/isalpha.texi @@ -21,7 +21,7 @@ Portability problems not fixed by Gnulib: Note: This function's behaviour depends on the locale, but does not support the multibyte characters that occur in strings in locales with @code{MB_CUR_MAX > 1} (this includes all the common UTF-8 locales). -There are four alternative APIs: +There are five alternative APIs: @table @code @item c_isalpha @@ -34,6 +34,12 @@ order to use it, you first have to convert from multibyte to wide characters, using the @code{mbrtowc} function. It is provided by the Gnulib module @samp{wctype}. +@item c32isalpha +This function operates in a locale dependent way, on 32-bit wide characters. +In order to use it, you first have to convert from multibyte to 32-bit wide +characters, using the @code{mbrtoc32} function. It is provided by the +Gnulib module @samp{c32isalpha}. + @item mb_isalpha This function operates in a locale dependent way, on multibyte characters. It is provided by the Gnulib module @samp{mbchar}. 
diff --git a/doc/posix-functions/isblank.texi b/doc/posix-functions/isblank.texi index ab23391ac6..18b09fd903 100644 --- a/doc/posix-functions/isblank.texi +++ b/doc/posix-functions/isblank.texi @@ -24,7 +24,7 @@ Portability problems not fixed by Gnulib: Note: This function's behaviour depends on the locale, but does not support the multibyte characters that occur in strings in locales with @code{MB_CUR_MAX > 1} (this includes all the common UTF-8 locales). -There are four alternative APIs: +There are five alternative APIs: @table @code @item c_isblank @@ -37,6 +37,12 @@ order to use it, you first have to convert from multibyte to wide characters, using the @code{mbrtowc} function. It is provided by the Gnulib module @samp{wctype}. +@item c32isblank +This function operates in a locale dependent way, on 32-bit wide characters. +In order to use it, you first have to convert from multibyte to 32-bit wide +characters, using the @code{mbrtoc32} function. It is provided by the +Gnulib module @samp{c32isblank}. + @item mb_isblank This function operates in a locale dependent way, on multibyte characters. It is provided by the Gnulib module @samp{mbchar}. diff --git a/doc/posix-functions/iscntrl.texi b/doc/posix-functions/iscntrl.texi index 19758ef1da..c6a40314f6 100644 --- a/doc/posix-functions/iscntrl.texi +++ b/doc/posix-functions/iscntrl.texi @@ -21,7 +21,7 @@ Portability problems not fixed by Gnulib: Note: This function's behaviour depends on the locale, but does not support the multibyte characters that occur in strings in locales with @code{MB_CUR_MAX > 1} (this includes all the common UTF-8 locales). -There are four alternative APIs: +There are five alternative APIs: @table @code @item c_iscntrl @@ -34,6 +34,12 @@ order to use it, you first have to convert from multibyte to wide characters, using the @code{mbrtowc} function. It is provided by the Gnulib module @samp{wctype}. 
+@item c32iscntrl +This function operates in a locale dependent way, on 32-bit wide characters. +In order to use it, you first have to convert from multibyte to 32-bit wide +characters, using the @code{mbrtoc32} function. It is provided by the +Gnulib module @samp{c32iscntrl}. + @item mb_iscntrl This function operates in a locale dependent way, on multibyte characters. It is provided by the Gnulib module @samp{mbchar}. diff --git a/doc/posix-functions/isdigit.texi b/doc/posix-functions/isdigit.texi index 2494170827..7d01a500b2 100644 --- a/doc/posix-functions/isdigit.texi +++ b/doc/posix-functions/isdigit.texi @@ -21,7 +21,7 @@ Portability problems not fixed by Gnulib: Note: This function's behaviour depends on the locale, but does not support the multibyte characters that occur in strings in locales with @code{MB_CUR_MAX > 1} (this includes all the common UTF-8 locales). -There are four alternative APIs: +There are five alternative APIs: @table @code @item c_isdigit @@ -34,6 +34,12 @@ order to use it, you first have to convert from multibyte to wide characters, using the @code{mbrtowc} function. It is provided by the Gnulib module @samp{wctype}. +@item c32isdigit +This function operates in a locale dependent way, on 32-bit wide characters. +In order to use it, you first have to convert from multibyte to 32-bit wide +characters, using the @code{mbrtoc32} function. It is provided by the +Gnulib module @samp{c32isdigit}. + @item mb_isdigit This function operates in a locale dependent way, on multibyte characters. It is provided by the Gnulib module @samp{mbchar}. 
diff --git a/doc/posix-functions/isgraph.texi b/doc/posix-functions/isgraph.texi index 01fbc83cb7..a4754dda3b 100644 --- a/doc/posix-functions/isgraph.texi +++ b/doc/posix-functions/isgraph.texi @@ -21,7 +21,7 @@ Portability problems not fixed by Gnulib: Note: This function's behaviour depends on the locale, but does not support the multibyte characters that occur in strings in locales with @code{MB_CUR_MAX > 1} (this includes all the common UTF-8 locales). -There are four alternative APIs: +There are five alternative APIs: @table @code @item c_isgraph @@ -34,6 +34,12 @@ order to use it, you first have to convert from multibyte to wide characters, using the @code{mbrtowc} function. It is provided by the Gnulib module @samp{wctype}. +@item c32isgraph +This function operates in a locale dependent way, on 32-bit wide characters. +In order to use it, you first have to convert from multibyte to 32-bit wide +characters, using the @code{mbrtoc32} function. It is provided by the +Gnulib module @samp{c32isgraph}. + @item mb_isgraph This function operates in a locale dependent way, on multibyte characters. It is provided by the Gnulib module @samp{mbchar}. diff --git a/doc/posix-functions/islower.texi b/doc/posix-functions/islower.texi index 8eba57ae5a..fb3a898ab7 100644 --- a/doc/posix-functions/islower.texi +++ b/doc/posix-functions/islower.texi @@ -21,7 +21,7 @@ Portability problems not fixed by Gnulib: Note: This function's behaviour depends on the locale, but does not support the multibyte characters that occur in strings in locales with @code{MB_CUR_MAX > 1} (this includes all the common UTF-8 locales). -There are four alternative APIs: +There are five alternative APIs: @table @code @item c_islower @@ -34,6 +34,12 @@ order to use it, you first have to convert from multibyte to wide characters, using the @code{mbrtowc} function. It is provided by the Gnulib module @samp{wctype}. 
+@item c32islower +This function operates in a locale dependent way, on 32-bit wide characters. +In order to use it, you first have to convert from multibyte to 32-bit wide +characters, using the @code{mbrtoc32} function. It is provided by the +Gnulib module @samp{c32islower}. + @item mb_islower This function operates in a locale dependent way, on multibyte characters. It is provided by the Gnulib module @samp{mbchar}. diff --git a/doc/posix-functions/isprint.texi b/doc/posix-functions/isprint.texi index e30ddc958d..931776f34c 100644 --- a/doc/posix-functions/isprint.texi +++ b/doc/posix-functions/isprint.texi @@ -21,7 +21,7 @@ Portability problems not fixed by Gnulib: Note: This function's behaviour depends on the locale, but does not support the multibyte characters that occur in strings in locales with @code{MB_CUR_MAX > 1} (this includes all the common UTF-8 locales). -There are four alternative APIs: +There are five alternative APIs: @table @code @item c_isprint @@ -34,6 +34,12 @@ order to use it, you first have to convert from multibyte to wide characters, using the @code{mbrtowc} function. It is provided by the Gnulib module @samp{wctype}. +@item c32isprint +This function operates in a locale dependent way, on 32-bit wide characters. +In order to use it, you first have to convert from multibyte to 32-bit wide +characters, using the @code{mbrtoc32} function. It is provided by the +Gnulib module @samp{c32isprint}. + @item mb_isprint This function operates in a locale dependent way, on multibyte characters. It is provided by the Gnulib module @samp{mbchar}. 
diff --git a/doc/posix-functions/ispunct.texi b/doc/posix-functions/ispunct.texi index 2f245202a2..252b5773f5 100644 --- a/doc/posix-functions/ispunct.texi +++ b/doc/posix-functions/ispunct.texi @@ -21,7 +21,7 @@ Portability problems not fixed by Gnulib: Note: This function's behaviour depends on the locale, but does not support the multibyte characters that occur in strings in locales with @code{MB_CUR_MAX > 1} (this includes all the common UTF-8 locales). -There are four alternative APIs: +There are five alternative APIs: @table @code @item c_ispunct @@ -34,6 +34,12 @@ order to use it, you first have to convert from multibyte to wide characters, using the @code{mbrtowc} function. It is provided by the Gnulib module @samp{wctype}. +@item c32ispunct +This function operates in a locale dependent way, on 32-bit wide characters. +In order to use it, you first have to convert from multibyte to 32-bit wide +characters, using the @code{mbrtoc32} function. It is provided by the +Gnulib module @samp{c32ispunct}. + @item mb_ispunct This function operates in a locale dependent way, on multibyte characters. It is provided by the Gnulib module @samp{mbchar}. diff --git a/doc/posix-functions/isspace.texi b/doc/posix-functions/isspace.texi index c0817ca768..ab7b0b41d8 100644 --- a/doc/posix-functions/isspace.texi +++ b/doc/posix-functions/isspace.texi @@ -21,7 +21,7 @@ Portability problems not fixed by Gnulib: Note: This function's behaviour depends on the locale, but does not support the multibyte characters that occur in strings in locales with @code{MB_CUR_MAX > 1} (this includes all the common UTF-8 locales). -There are four alternative APIs: +There are five alternative APIs: @table @code @item c_isspace @@ -34,6 +34,12 @@ order to use it, you first have to convert from multibyte to wide characters, using the @code{mbrtowc} function. It is provided by the Gnulib module @samp{wctype}. 
+@item c32isspace +This function operates in a locale dependent way, on 32-bit wide characters. +In order to use it, you first have to convert from multibyte to 32-bit wide +characters, using the @code{mbrtoc32} function. It is provided by the +Gnulib module @samp{c32isspace}. + @item mb_isspace This function operates in a locale dependent way, on multibyte characters. It is provided by the Gnulib module @samp{mbchar}. diff --git a/doc/posix-functions/isupper.texi b/doc/posix-functions/isupper.texi index 295e86cf4d..eb2b0ff0b6 100644 --- a/doc/posix-functions/isupper.texi +++ b/doc/posix-functions/isupper.texi @@ -21,7 +21,7 @@ Portability problems not fixed by Gnulib: Note: This function's behaviour depends on the locale, but does not support the multibyte characters that occur in strings in locales with @code{MB_CUR_MAX > 1} (this includes all the common UTF-8 locales). -There are four alternative APIs: +There are five alternative APIs: @table @code @item c_isupper @@ -34,6 +34,12 @@ order to use it, you first have to convert from multibyte to wide characters, using the @code{mbrtowc} function. It is provided by the Gnulib module @samp{wctype}. +@item c32isupper +This function operates in a locale dependent way, on 32-bit wide characters. +In order to use it, you first have to convert from multibyte to 32-bit wide +characters, using the @code{mbrtoc32} function. It is provided by the +Gnulib module @samp{c32isupper}. + @item mb_isupper This function operates in a locale dependent way, on multibyte characters. It is provided by the Gnulib module @samp{mbchar}. 
diff --git a/doc/posix-functions/isxdigit.texi b/doc/posix-functions/isxdigit.texi index e5b78bceaa..5e13b008c4 100644 --- a/doc/posix-functions/isxdigit.texi +++ b/doc/posix-functions/isxdigit.texi @@ -21,7 +21,7 @@ Portability problems not fixed by Gnulib: Note: This function's behaviour depends on the locale, but does not support the multibyte characters that occur in strings in locales with @code{MB_CUR_MAX > 1} (this includes all the common UTF-8 locales). -There are four alternative APIs: +There are five alternative APIs: @table @code @item c_isxdigit @@ -34,6 +34,12 @@ order to use it, you first have to convert from multibyte to wide characters, using the @code{mbrtowc} function. It is provided by the Gnulib module @samp{wctype}. +@item c32isxdigit +This function operates in a locale dependent way, on 32-bit wide characters. +In order to use it, you first have to convert from multibyte to 32-bit wide +characters, using the @code{mbrtoc32} function. It is provided by the +Gnulib module @samp{c32isxdigit}. + @item mb_isxdigit This function operates in a locale dependent way, on multibyte characters. It is provided by the Gnulib module @samp{mbchar}. diff --git a/doc/posix-functions/tolower.texi b/doc/posix-functions/tolower.texi index d8d2bf7f06..9911b7fb0c 100644 --- a/doc/posix-functions/tolower.texi +++ b/doc/posix-functions/tolower.texi @@ -16,7 +16,32 @@ OS X 10.8. Portability problems not fixed by Gnulib: @itemize -@item -On Windows and 32-bit AIX platforms, @code{wchar_t} is a 16-bit type and therefore cannot -accommodate all Unicode characters. @end itemize + +Note: This function's behaviour depends on the locale, but does not support +the multibyte characters that occur in strings in locales with +@code{MB_CUR_MAX > 1} (this includes all the common UTF-8 locales). 
+There are four alternative APIs: + +@table @code +@item c_tolower +This function operates in a locale independent way and returns a different +value than the argument only for uppercase ASCII characters. It is provided +by the Gnulib module @samp{c-ctype}. + +@item towlower +This function operates in a locale dependent way, on wide characters. In +order to use it, you first have to convert from multibyte to wide characters, +using the @code{mbrtowc} function. It is provided by the Gnulib module +@samp{wctype}. + +@item c32tolower +This function operates in a locale dependent way, on 32-bit wide characters. +In order to use it, you first have to convert from multibyte to 32-bit wide +characters, using the @code{mbrtoc32} function. It is provided by the +Gnulib module @samp{c32tolower}. + +@item uc_tolower +This function operates in a locale independent way, on Unicode characters. +It is provided by the Gnulib module @samp{unicase/tolower}. +@end table diff --git a/doc/posix-functions/toupper.texi b/doc/posix-functions/toupper.texi index 36e40c45bc..86272d3ca0 100644 --- a/doc/posix-functions/toupper.texi +++ b/doc/posix-functions/toupper.texi @@ -16,7 +16,32 @@ OS X 10.8. Portability problems not fixed by Gnulib: @itemize -@item -On Windows and 32-bit AIX platforms, @code{wchar_t} is a 16-bit type and therefore cannot -accommodate all Unicode characters. @end itemize + +Note: This function's behaviour depends on the locale, but does not support +the multibyte characters that occur in strings in locales with +@code{MB_CUR_MAX > 1} (this includes all the common UTF-8 locales). +There are four alternative APIs: + +@table @code +@item c_toupper +This function operates in a locale independent way and returns a different +value than the argument only for lowercase ASCII characters. It is provided +by the Gnulib module @samp{c-ctype}. + +@item towupper +This function operates in a locale dependent way, on wide characters. 
In +order to use it, you first have to convert from multibyte to wide characters, +using the @code{mbrtowc} function. It is provided by the Gnulib module +@samp{wctype}. + +@item c32toupper +This function operates in a locale dependent way, on 32-bit wide characters. +In order to use it, you first have to convert from multibyte to 32-bit wide +characters, using the @code{mbrtoc32} function. It is provided by the +Gnulib module @samp{c32toupper}. + +@item uc_toupper +This function operates in a locale independent way, on Unicode characters. +It is provided by the Gnulib module @samp{unicase/toupper}. +@end table diff --git a/doc/posix-functions/towlower.texi b/doc/posix-functions/towlower.texi index a8ef3ce990..b6c51a4571 100644 --- a/doc/posix-functions/towlower.texi +++ b/doc/posix-functions/towlower.texi @@ -23,6 +23,9 @@ Portability problems not fixed by Gnulib: @item On Windows and 32-bit AIX platforms, @code{wchar_t} is a 16-bit type and therefore cannot accommodate all Unicode characters. +However, the Gnulib function @code{c32tolower}, provided by Gnulib module +@code{c32tolower}, operates on 32-bit wide characters and therefore does not +have this limitation. @item This function returns wrong values even for the ASCII characters in a zh_CN.GB18030 locale on some platforms: diff --git a/doc/posix-functions/towupper.texi b/doc/posix-functions/towupper.texi index 902cf16e68..bd29eec9ad 100644 --- a/doc/posix-functions/towupper.texi +++ b/doc/posix-functions/towupper.texi @@ -23,6 +23,9 @@ Portability problems not fixed by Gnulib: @item On Windows and 32-bit AIX platforms, @code{wchar_t} is a 16-bit type and therefore cannot accommodate all Unicode characters. +However, the Gnulib function @code{c32toupper}, provided by Gnulib module +@code{c32toupper}, operates on 32-bit wide characters and therefore does not +have this limitation. 
@item This function returns wrong values even for the ASCII characters in a zh_CN.GB18030 locale on some platforms: diff --git a/doc/posix-functions/wcswidth.texi b/doc/posix-functions/wcswidth.texi index eab3393a60..02049ffdd0 100644 --- a/doc/posix-functions/wcswidth.texi +++ b/doc/posix-functions/wcswidth.texi @@ -18,4 +18,7 @@ Portability problems not fixed by Gnulib: @item On Windows and 32-bit AIX platforms, @code{wchar_t} is a 16-bit type and therefore cannot accommodate all Unicode characters. +However, the Gnulib function @code{c32swidth}, provided by Gnulib module +@code{c32swidth}, operates on 32-bit wide characters and therefore does not +have this limitation. @end itemize diff --git a/doc/posix-functions/wcwidth.texi b/doc/posix-functions/wcwidth.texi index b68cb78015..1ec5f48197 100644 --- a/doc/posix-functions/wcwidth.texi +++ b/doc/posix-functions/wcwidth.texi @@ -29,6 +29,9 @@ Portability problems not fixed by Gnulib: @item On Windows and 32-bit AIX platforms, @code{wchar_t} is a 16-bit type and therefore cannot accommodate all Unicode characters. +However, the Gnulib function @code{c32width}, provided by Gnulib module +@code{c32width}, operates on 32-bit wide characters and therefore does not +have this limitation. @item This function treats zero-width spaces like control characters on some platforms: diff --git a/doc/strings.texi b/doc/strings.texi new file mode 100644 index 0000000000..aa0830f1a5 --- /dev/null +++ b/doc/strings.texi @@ -0,0 +1,854 @@ +@node Strings and Characters +@chapter Strings and Characters + +@c Copyright (C) 2009-2023 Free Software Foundation, Inc. + +@c Permission is granted to copy, distribute and/or modify this document +@c under the terms of the GNU Free Documentation License, Version 1.3 or +@c any later version published by the Free Software Foundation; with no +@c Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A +@c copy of the license is at . + +@c Written by Bruno Haible. 
+ +This chapter describes the APIs for strings and characters, provided by Gnulib. + +@menu +* Strings:: +* Characters:: +@end menu + +@node Strings +@section Strings + +Several representations are possible for strings +in the memory of a running C program. + +@menu +* C strings:: +* Strings with NUL characters:: +* Comparison of string APIs:: +@end menu + +@node C strings +@subsection The C string representation + +The classical representation of a string in C is a sequence of +characters, where each character takes up one or more bytes, followed by +a terminating NUL byte. This representation is used for strings that +are passed by the operating system (in the @code{argv} argument of +@code{main}, for example) and for strings that are passed to the +operating system (in system calls such as @code{open}). The C type to +hold such strings is @samp{char *} or, in places where the string shall +not be modified, @samp{const char *}. There are many C library +functions, standardized by ISO C and POSIX, that assume this +representation of strings. + +A @emph{character encoding}, or @emph{encoding} for short, describes +how the elements of a character set are represented as a sequence of +bytes. For example, in the @code{ASCII} encoding, the UNDERSCORE +character is represented by a single byte, with value 0x5F. As another +example, the COPYRIGHT SIGN character is represented: +@itemize +@item +in the @code{ISO-8859-1} encoding, by the single byte 0xA9, +@item +in the @code{UTF-8} encoding, by the two bytes 0xC2 0xA9, +@item +in the @code{GB18030} encoding, by the four bytes 0x81 0x30 0x84 0x38. +@end itemize + +@noindent +Note: The @samp{char} type may be signed or unsigned, depending on the +platform. When we talk about the "byte 0xA9" we actually mean the +@code{char} object whose value is @code{(char) 0xA9}; we omit the cast +to @code{char} in this documentation, for brevity. + +In POSIX, the character encoding is determined by the locale.
The +locale is an environmental attribute that the user can choose. + +Depending on the encoding, in general, every character is represented by +one or more bytes (up to 4 bytes in practice --- but +use @code{MB_LEN_MAX} instead of the number 4 in the code). +@cindex unibyte locale +@cindex multibyte locale +When every character is represented by only 1 byte, we speak of a +``unibyte locale'', otherwise of a ``multibyte locale''. + +It is important to realize that the majority of Unix installations +nowadays use UTF-8 or GB18030 as locale encoding; therefore, the +majority of users are using multibyte locales. + +Three important facts to remember are: + +@cartouche +@emph{A @samp{char} is a byte, not a character.} +@end cartouche + +As a consequence: +@itemize @bullet +@item +The @posixheader{ctype.h} API, which was designed only with unibyte +encodings in mind, is useless nowadays; it does not work in +multibyte locales. +@item +The @posixfunc{strlen} function does not return the number of characters +in a string. Nor does it return the number of screen columns occupied +by a string after it is output. It merely returns the number of +@emph{bytes} occupied by a string. +@item +Truncating a string, for example, with @posixfunc{strncpy}, can have the +effect of truncating it in the middle of a multibyte character. Such +a string will, when output, have a garbled character at its end, often +represented by a hollow box. +@end itemize + +@cartouche +@emph{Multibyte does not imply UTF-8 encoding.} +@end cartouche + +While UTF-8 is the most common multibyte encoding, GB18030 is in use as +well and will not go away for decades, because it is a Chinese +government standard, last revised in 2022. 
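
The first fact above (a @samp{char} is a byte, not a character) can be illustrated with a minimal sketch in standard C. This is not how Gnulib implements @func{mbslen}; the helper name `count_characters` is hypothetical, and the sketch relies only on the ISO C function `mbrtowc`, which decodes bytes according to the encoding of the current locale.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not a Gnulib function): count the multibyte
   characters in S according to the current locale's encoding.
   Returns (size_t) -1 if S contains an invalid or incomplete
   multibyte sequence.  */
size_t
count_characters (const char *s)
{
  mbstate_t state;
  size_t remaining = strlen (s);   /* number of bytes, not characters */
  size_t count = 0;

  memset (&state, 0, sizeof state);   /* start in the initial shift state */
  while (remaining > 0)
    {
      /* Decode one character; N is the number of bytes it occupies.  */
      size_t n = mbrtowc (NULL, s, remaining, &state);
      if (n == (size_t) -1 || n == (size_t) -2)
        return (size_t) -1;
      if (n == 0)
        break;   /* decoded a NUL character; not expected before the end */
      s += n;
      remaining -= n;
      count++;
    }
  return count;
}
```

After `setlocale (LC_ALL, "")` in a UTF-8 locale, `count_characters` applied to a string holding the two bytes 0xC2 0xA9 should return 1, whereas `strlen` returns 2. Neither value is the number of screen columns, which is a third quantity.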
+ +@cartouche +@emph{Searching for a character in a string is not the same as searching +for a byte in the string.} +@end cartouche + +Take the above example of COPYRIGHT SIGN in the @code{GB18030} encoding: +A byte search will find the bytes @code{'0'} and @code{'8'} in this +string. But a search for the @emph{character} "0" or "8" in the string +"@copyright{}" must, of course, report ``not found''. + +As a consequence: +@itemize @bullet +@item +@posixfunc{strchr} and @posixfunc{strrchr} do not work with multibyte +strings if the locale encoding is GB18030 and the character to be +searched is a digit. +@item +@posixfunc{strstr} does not work with multibyte strings if the locale +encoding is different from UTF-8. +@item +@posixfunc{strcspn}, @posixfunc{strpbrk}, @posixfunc{strspn} cannot work +correctly in multibyte locales: they assume the second argument is a +list of single-byte characters. Even in this simple case, they do not +work with multibyte strings if the locale encoding is GB18030 and one of +the characters to be searched is a digit. +@item +@posixfunc{strsep} and @posixfunc{strtok_r} do not work with multibyte +strings unless all of the delimiter characters are ASCII characters +< 0x30. +@item +The @posixfunc{strcasecmp}, @posixfunc{strncasecmp}, and +@posixfunc{strcasestr} functions do not work with multibyte strings. +@end itemize + +Workarounds can be found in Gnulib, in the form of @code{mbs*} API +functions: +@itemize @bullet +@item +Gnulib has functions @func{mbslen} and @func{mbswidth} that can be used +instead of @posixfunc{strlen} when the number of characters or the +number of screen columns of a string is requested. +@item +Gnulib has functions @func{mbschr} and @func{mbsrrchr} that are like +@posixfunc{strchr} and @posixfunc{strrchr}, but work in multibyte +locales. +@item +Gnulib has a function @func{mbsstr} that is like @posixfunc{strstr}, but +works in multibyte locales. 
+@item +Gnulib has functions @func{mbscspn}, @func{mbspbrk}, @func{mbsspn} that +are like @posixfunc{strcspn}, @posixfunc{strpbrk}, @posixfunc{strspn}, +but work in multibyte locales. +@item +Gnulib has functions @func{mbssep} and @func{mbstok_r} that are like +@posixfunc{strsep} and @posixfunc{strtok_r} but work in multibyte +locales. +@item +Gnulib has functions @func{mbscasecmp}, @func{mbsncasecmp}, +@func{mbspcasecmp}, and @func{mbscasestr} that are like +@posixfunc{strcasecmp}, @posixfunc{strncasecmp}, and +@posixfunc{strcasestr}, but work in multibyte locales. Still, the +function @code{ulc_casecmp} is preferable to these functions. +@end itemize + +Gnulib also provides additional APIs. + +@menu +* Iterating through strings:: +@end menu + +@node Iterating through strings +@subsubsection Iterating through strings + +For complex string processing, the provided string functions may not be +enough, and what you need is a way to iterate through a string while +processing each (possibly multibyte) character in turn. Gnulib provides +two modules for this purpose. Both iterate through the string in +the forward direction. Iteration in the backward direction, that is, from the +string's end to its start, is not provided, as it is too hairy in general. + +@itemize +@item +The @code{mbiter} module. It iterates through a C string whose length +is already known. +@item +The @code{mbuiter} module. It iterates through a C string whose length +is not known a priori. +@end itemize + +The @code{mbuiter} module is suitable when there is a high probability +that only the first few multibyte characters need to be inspected, +whereas the @code{mbiter} module is better if the iteration usually runs +through the entire string. 
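The two modules are Gnulib-specific. To show the general shape of such a loop without depending on Gnulib, here is a minimal sketch in plain ISO C11 that walks a string of known length character by character with @code{mbrtoc32}. The function name @code{count_character} and the policy of skipping one byte after a malformed sequence are assumptions of this example; they do not describe how @code{mbiter} or @code{mbuiter} behave.

```c
#include <assert.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper, for illustration only.
   Count how often the character WANTED occurs among the LEN bytes
   starting at S, decoding them in the current locale's encoding.  */
size_t
count_character (const char *s, size_t len, char32_t wanted)
{
  mbstate_t state;
  size_t count = 0;

  assert (s != NULL);
  memset (&state, 0, sizeof state);

  while (len > 0)
    {
      char32_t c;
      size_t n = mbrtoc32 (&c, s, len, &state);

      if (n == (size_t) -1 || n == (size_t) -2)
        {
          /* Malformed or truncated sequence: skip one byte and
             restart from the initial conversion state.  */
          memset (&state, 0, sizeof state);
          s++;
          len--;
          continue;
        }
      if (c == wanted)
        count++;
      if (n == (size_t) -3)
        continue;               /* no input bytes were consumed */
      if (n == 0)
        n = 1;                  /* a NUL character occupies one byte */
      s += n;
      len -= n;
    }
  return count;
}
```

Because the comparison is made on whole characters, a call such as @code{count_character (s, strlen (s), U'8')} is not fooled by a byte @code{'8'} that is merely part of a multibyte character. This holds provided that the locale's encoding has been selected with @code{setlocale} and that @code{char32_t} values are Unicode code points on the platform.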
+ +@node Strings with NUL characters +@subsection Strings with NUL characters + +The GNU Coding Standards, section +@ifinfo +@ref{Semantics,,Writing Robust Programs,standards}, +@end ifinfo +@ifnotinfo +@url{https://www.gnu.org/prep/standards/html_node/Semantics.html}, +@end ifnotinfo +specifies: +@cartouche +Utilities reading files should not drop NUL characters, or any other +nonprinting characters. +@end cartouche + +When it is a requirement to store NUL characters in strings, a variant +of the C strings is needed. Gnulib offers a ``string descriptor'' type +for this purpose. See @ref{Handling strings with NUL characters}. + +All remarks regarding encodings and multibyte characters in the previous +section apply to string descriptors as well. + +@include c-locale.texi + +@node Comparison of string APIs +@subsection Comparison of string APIs + +This table summarizes the API functions available for strings, in POSIX +and in Gnulib. + +@multitable @columnfractions .17 .17 .17 .17 .16 .16 +@headitem unibyte strings only +@tab assume C locale +@tab multibyte strings +@tab multibyte strings with NULs +@tab wide character strings +@tab 32-bit wide character strings + +@item @code{strlen} +@tab @code{strlen} +@tab @code{mbslen} +@tab @code{string_desc_length} +@tab @code{wcslen} +@tab @code{u32_strlen} + +@item @code{strnlen} +@tab @code{strnlen} +@tab @code{mbsnlen} +@tab -- +@tab @code{wcsnlen} +@tab @code{u32_strnlen}, @code{u32_mbsnlen} + +@item @code{strcmp} +@tab @code{strcmp} +@tab @code{strcmp} +@tab @code{string_desc_cmp} +@tab @code{wcscmp} +@tab @code{u32_strcmp} + +@item @code{strncmp} +@tab @code{strncmp} +@tab @code{strncmp} +@tab -- +@tab @code{wcsncmp} +@tab @code{u32_strncmp} + +@item @code{strcasecmp} +@tab @code{strcasecmp} +@tab @code{mbscasecmp} +@tab -- +@tab @code{wcscasecmp} +@tab @code{u32_casecmp} + +@item @code{strncasecmp} +@tab @code{strncasecmp} +@tab @code{mbsncasecmp}, @code{mbspcasecmp} +@tab -- +@tab @code{wcsncasecmp} +@tab 
@code{u32_casecmp} + +@item @code{strcoll} +@tab @code{strcmp} +@tab @code{strcoll} +@tab -- +@tab @code{wcscoll} +@tab @code{u32_strcoll} + +@item @code{strxfrm} +@tab -- +@tab @code{strxfrm} +@tab -- +@tab @code{wcsxfrm} +@tab -- + +@item @code{strchr} +@tab @code{strchr} +@tab @code{mbschr} +@tab @code{string_desc_index} +@tab @code{wcschr} +@tab @code{u32_strchr} + +@item @code{strrchr} +@tab @code{strrchr} +@tab @code{mbsrchr} +@tab @code{string_desc_last_index} +@tab @code{wcsrchr} +@tab @code{u32_strrchr} + +@item @code{strstr} +@tab @code{strstr} +@tab @code{mbsstr} +@tab @code{string_desc_contains} +@tab @code{wcsstr} +@tab @code{u32_strstr} + +@item @code{strcasestr} +@tab @code{strcasestr} +@tab @code{mbscasestr} +@tab -- +@tab -- +@tab -- + +@item @code{strspn} +@tab @code{strspn} +@tab @code{mbsspn} +@tab -- +@tab @code{wcsspn} +@tab @code{u32_strspn} + +@item @code{strcspn} +@tab @code{strcspn} +@tab @code{mbscspn} +@tab -- +@tab @code{wcscspn} +@tab @code{u32_strcspn} + +@item @code{strpbrk} +@tab @code{strpbrk} +@tab @code{mbspbrk} +@tab -- +@tab @code{wcspbrk} +@tab @code{u32_strpbrk} + +@item @code{strtok_r} +@tab @code{strtok_r} +@tab @code{mbstok_r} +@tab -- +@tab @code{wcstok} +@tab @code{u32_strtok} + +@item @code{strsep} +@tab @code{strsep} +@tab @code{mbssep} +@tab -- +@tab -- +@tab -- + +@item @code{strcpy} +@tab @code{strcpy} +@tab @code{strcpy} +@tab @code{string_desc_copy} +@tab @code{wcscpy} +@tab @code{u32_strcpy} + +@item @code{stpcpy} +@tab @code{stpcpy} +@tab @code{stpcpy} +@tab -- +@tab @code{wcpcpy} +@tab @code{u32_stpcpy} + +@item @code{strncpy} +@tab @code{strncpy} +@tab @code{strncpy} +@tab -- +@tab @code{wcsncpy} +@tab @code{u32_strncpy} + +@item @code{stpncpy} +@tab @code{stpncpy} +@tab @code{stpncpy} +@tab -- +@tab @code{wcpncpy} +@tab @code{u32_stpncpy} + +@item @code{strcat} +@tab @code{strcat} +@tab @code{strcat} +@tab @code{string_desc_concat} +@tab @code{wcscat} +@tab @code{u32_strcat} + +@item @code{strncat} +@tab 
@code{strncat} +@tab @code{strncat} +@tab -- +@tab @code{wcsncat} +@tab @code{u32_strncat} + +@item @code{free} +@tab @code{free} +@tab @code{free} +@tab @code{string_desc_free} +@tab @code{free} +@tab @code{free} + +@item @code{strdup} +@tab @code{strdup} +@tab @code{strdup} +@tab @code{string_desc_copy} +@tab @code{wcsdup} +@tab @code{u32_strdup} + +@item @code{strndup} +@tab @code{strndup} +@tab @code{strndup} +@tab -- +@tab -- +@tab -- + +@item @code{mbswidth} +@tab @code{mbswidth} +@tab @code{mbswidth} +@tab -- +@tab @code{wcswidth} +@tab @code{c32swidth}, @code{u32_strwidth} + +@item @code{strtol} +@tab @code{strtol} +@tab @code{strtol} +@tab -- +@tab -- +@tab -- + +@item @code{strtoul} +@tab @code{strtoul} +@tab @code{strtoul} +@tab -- +@tab -- +@tab -- + +@item @code{strtoll} +@tab @code{strtoll} +@tab @code{strtoll} +@tab -- +@tab -- +@tab -- + +@item @code{strtoull} +@tab @code{strtoull} +@tab @code{strtoull} +@tab -- +@tab -- +@tab -- + +@item @code{strtoimax} +@tab @code{strtoimax} +@tab @code{strtoimax} +@tab -- +@tab @code{wcstoimax} +@tab -- + +@item @code{strtoumax} +@tab @code{strtoumax} +@tab @code{strtoumax} +@tab -- +@tab @code{wcstoumax} +@tab -- + +@item @code{strtof} +@tab -- +@tab @code{strtof} +@tab -- +@tab -- +@tab -- + +@item @code{strtod} +@tab @code{c_strtod} +@tab @code{strtod} +@tab -- +@tab -- +@tab -- + +@item @code{strtold} +@tab @code{c_strtold} +@tab @code{strtold} +@tab -- +@tab -- +@tab -- + +@item @code{strfromf} +@tab -- +@tab @code{strfromf} +@tab -- +@tab -- +@tab -- + +@item @code{strfromd} +@tab -- +@tab @code{strfromd} +@tab -- +@tab -- +@tab -- + +@item @code{strfroml} +@tab -- +@tab @code{strfroml} +@tab -- +@tab -- +@tab -- + +@item -- +@tab -- +@tab -- +@tab -- +@tab @code{mbstowcs} +@tab @code{mbstoc32s} + +@item -- +@tab -- +@tab -- +@tab -- +@tab @code{mbsrtowcs} +@tab @code{mbsrtoc32s} + +@item -- +@tab -- +@tab -- +@tab -- +@tab @code{mbsnrtowcs} +@tab @code{mbsnrtoc32s} + +@item -- +@tab -- +@tab -- +@tab -- 
+@tab @code{wcstombs} +@tab @code{c32stombs} + +@item -- +@tab -- +@tab -- +@tab -- +@tab @code{wcsrtombs} +@tab @code{c32srtombs} + +@item -- +@tab -- +@tab -- +@tab -- +@tab @code{wcsnrtombs} +@tab @code{c32snrtombs} + +@end multitable + +@node Characters +@section Characters + +A @emph{character} is the elementary unit that strings are made of. + +What is a character? ``A character is an element of a character set'' +is sort of a circular definition, but it highlights the fact that it is +not merely a number. Although many characters are visually represented +by a single glyph, there are characters that, for example, have a +different glyph when used at the end of a word than when used inside a +word. A character is also not the minimal rendered text processing +unit; that is a grapheme cluster and in general consists of one or more +characters. If you want to know more about the concept of character and +various concepts associated with characters, refer to the Unicode +standard. + +For the representation in memory of a character, various types have been +in use, and some of them were failures: @code{char} and @code{wchar_t} +were invented for this purpose, but are not the right types. +@code{char32_t} is the right type (successor of @code{wchar_t}); and +@code{mbchar_t} (defined by Gnulib) is an alternative for specific kinds +of processing. + +@menu +* The char type:: +* The wchar_t type:: +* The char32_t type:: +* The mbchar_t type:: +* Comparison of character APIs:: +@end menu + +@node The char type +@subsection The @code{char} type + +The @code{char} type has been in the C language since its beginning in the +1970s, but --- due to its limitation of 256 possible values --- is no +longer an adequate type for storing a character. + +Technically, it is still adequate in unibyte locales. But since most +locales nowadays are multibyte locales, it makes no sense to write a +program that runs only in unibyte locales. 
+ +ISO C and POSIX standardized an API for characters of type @code{char}, +in @code{<ctype.h>}. This API is nowadays useless and obsolete. + +The important lessons to remember are: + +@cartouche +@emph{A @samp{char} is just the elementary storage unit for a string, +not a character.} +@end cartouche + +@cartouche +@emph{Never use @code{<ctype.h>}!} +@end cartouche + +@node The wchar_t type +@subsection The @code{wchar_t} type + +The ISO C and POSIX standard creators made an attempt to overcome the +dead end regarding the @code{char} type. They introduced +@itemize @bullet +@item +a type @samp{wchar_t}, designed to encapsulate a character, +@item +a ``wide string'' type @samp{wchar_t *}, with some API functions +declared in @posixheader{wchar.h}, and +@item +functions declared in @posixheader{wctype.h} that were meant to supplant +the ones in @posixheader{ctype.h}. +@end itemize + +Unfortunately, this API and its implementation have numerous problems: + +@itemize @bullet +@item +On Windows platforms and on AIX in 32-bit mode, @code{wchar_t} is a +16-bit type. This means that it can never accommodate an entire Unicode +character. Either the @code{wchar_t *} strings are limited to +characters in UCS-2 (the ``Basic Multilingual Plane'' of Unicode), or +--- if @code{wchar_t *} strings are encoded in UTF-16 --- a +@code{wchar_t} represents only half of a character in the worst case, +making the @posixheader{wctype.h} functions pointless. + +@item +On Solaris and FreeBSD, the @code{wchar_t} encoding is locale dependent +and undocumented. This means, if you want to know any property of a +@code{wchar_t} character, other than the properties defined by +@posixheader{wctype.h} --- such as whether it's a dash, currency symbol, +paragraph separator, or similar ---, you have to convert it to +@code{char *} encoding first, by use of the function @posixfunc{wctomb}. 
+ +@item +When you read a stream of wide characters, through the functions +@posixfunc{fgetwc} and @posixfunc{fgetws}, and when the input +stream/file is not in the expected encoding, you have no way to +determine the invalid byte sequence and do some corrective action. If +you use these functions, your program becomes ``garbage in - more +garbage out'' or ``garbage in - abort''. +@end itemize + +As a consequence, it is better to use multibyte strings. Such multibyte +strings can bypass limitations of the @code{wchar_t} type, if you use +functions defined in Gnulib and GNU libunistring for text processing. +They can also faithfully transport malformed characters that were +present in the input, without requiring the program to produce garbage +or abort. + +@node The char32_t type +@subsection The @code{char32_t} type + +The ISO C and POSIX standard creators then introduced the +@code{char32_t} type. In ISO C 11, it was conceptually a ``32-bit wide +character'' type. In ISO C 23, its semantics has been further +specified: A @code{char32_t} value is a Unicode code point. + +Thus, the @code{char32_t} type is not affected by the problems that plague +the @code{wchar_t} type. + +The @code{char32_t} type and its API are defined in the @code{<uchar.h>} +header file. + +ISO C and POSIX specify only the basic functions for the @code{char32_t} +type, namely conversion of a single character (@func{mbrtoc32} and +@func{c32rtomb}). For convenience, Gnulib adds API for classification +and case conversion of characters. + +GNU libunistring can also be used on @code{char32_t} values. Since +@code{char32_t} is the same as @code{uint32_t}, all @code{u32_*} +functions of GNU libunistring are applicable to arrays of +@code{char32_t} values. + +On glibc systems, use of the 32-bit wide strings (@code{char32_t[]}) is +exactly as efficient as the use of the older wide strings +(@code{wchar_t[]}). 
This is possible because on glibc, @code{wchar_t} +values have always been 32-bit wide and Unicode code points. +@code{mbrtoc32} is just an alias of @code{mbrtowc}. The Gnulib +@code{*c32*} functions are optimized so that on glibc systems they +immediately redirect to the corresponding @code{*wc*} functions. + +@node The mbchar_t type +@subsection The @code{mbchar_t} type + +Gnulib defines an alternate way to encode a multibyte character: +@code{mbchar_t}. Its main feature is the ability to process a string or +stream with some malformed characters without reporting an error. + +The type @code{mbchar_t}, defined in @code{"mbchar.h"}, holds a +character in both the multibyte and the 32-bit wide character +representation. In case of a malformed character only the multibyte +representation is used. + +@menu +* Reading multibyte strings:: +@end menu + +@node Reading multibyte strings +@subsubsection Reading multibyte strings + +If you want to process (possibly multibyte) characters while reading +them from a @code{FILE *} stream, without reading them into a string +first, the @code{mbfile} module is made for this purpose. + +@node Comparison of character APIs +@subsection Comparison of character APIs + +This table summarizes the API functions available for characters, in +POSIX and in Gnulib. 
+ +@multitable @columnfractions .2 .2 .2 .2 .2 +@headitem unibyte character +@tab assume C locale +@tab wide character +@tab 32-bit wide character +@tab mbchar_t character + +@item @code{== '\0'} +@tab @code{== '\0'} +@tab @code{== L'\0'} +@tab @code{== 0} +@tab @code{mb_isnul} + +@item @code{==} +@tab @code{==} +@tab @code{==} +@tab @code{==} +@tab @code{mb_equal} + +@item @code{isalnum} +@tab @code{c_isalnum} +@tab @code{iswalnum} +@tab @code{c32isalnum} +@tab @code{mb_isalnum} + +@item @code{isalpha} +@tab @code{c_isalpha} +@tab @code{iswalpha} +@tab @code{c32isalpha} +@tab @code{mb_isalpha} + +@item @code{isblank} +@tab @code{c_isblank} +@tab @code{iswblank} +@tab @code{c32isblank} +@tab @code{mb_isblank} + +@item @code{iscntrl} +@tab @code{c_iscntrl} +@tab @code{iswcntrl} +@tab @code{c32iscntrl} +@tab @code{mb_iscntrl} + +@item @code{isdigit} +@tab @code{c_isdigit} +@tab @code{iswdigit} +@tab @code{c32isdigit} +@tab @code{mb_isdigit} + +@item @code{isgraph} +@tab @code{c_isgraph} +@tab @code{iswgraph} +@tab @code{c32isgraph} +@tab @code{mb_isgraph} + +@item @code{islower} +@tab @code{c_islower} +@tab @code{iswlower} +@tab @code{c32islower} +@tab @code{mb_islower} + +@item @code{isprint} +@tab @code{c_isprint} +@tab @code{iswprint} +@tab @code{c32isprint} +@tab @code{mb_isprint} + +@item @code{ispunct} +@tab @code{c_ispunct} +@tab @code{iswpunct} +@tab @code{c32ispunct} +@tab @code{mb_ispunct} + +@item @code{isspace} +@tab @code{c_isspace} +@tab @code{iswspace} +@tab @code{c32isspace} +@tab @code{mb_isspace} + +@item @code{isupper} +@tab @code{c_isupper} +@tab @code{iswupper} +@tab @code{c32isupper} +@tab @code{mb_isupper} + +@item @code{isxdigit} +@tab @code{c_isxdigit} +@tab @code{iswxdigit} +@tab @code{c32isxdigit} +@tab @code{mb_isxdigit} + +@item -- +@tab -- +@tab @code{iswctype} +@tab -- +@tab -- + +@item @code{tolower} +@tab @code{c_tolower} +@tab @code{towlower} +@tab @code{c32tolower} +@tab -- + +@item @code{toupper} +@tab @code{c_toupper} +@tab 
@code{towupper} +@tab @code{c32toupper} +@tab -- + +@item -- +@tab -- +@tab @code{towctrans} +@tab -- +@tab -- + +@item -- +@tab -- +@tab @code{wcwidth} +@tab @code{c32width} +@tab @code{mb_width} + +@end multitable -- cgit v1.2.1