summaryrefslogtreecommitdiff
path: root/doc/strings.texi
diff options
context:
space:
mode:
Diffstat (limited to 'doc/strings.texi')
-rw-r--r--doc/strings.texi854
1 files changed, 854 insertions, 0 deletions
diff --git a/doc/strings.texi b/doc/strings.texi
new file mode 100644
index 0000000000..aa0830f1a5
--- /dev/null
+++ b/doc/strings.texi
@@ -0,0 +1,854 @@
+@node Strings and Characters
+@chapter Strings and Characters
+
+@c Copyright (C) 2009-2023 Free Software Foundation, Inc.
+
+@c Permission is granted to copy, distribute and/or modify this document
+@c under the terms of the GNU Free Documentation License, Version 1.3 or
+@c any later version published by the Free Software Foundation; with no
+@c Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A
+@c copy of the license is at <https://www.gnu.org/licenses/fdl-1.3.en.html>.
+
+@c Written by Bruno Haible.
+
+This chapter describes the APIs for strings and characters, provided by Gnulib.
+
+@menu
+* Strings::
+* Characters::
+@end menu
+
+@node Strings
+@section Strings
+
+Several possible representations exist for the representation of strings
+in memory of a running C program.
+
+@menu
+* C strings::
+* Strings with NUL characters::
+* Comparison of string APIs::
+@end menu
+
+@node C strings
+@subsection The C string representation
+
+The classical representation of a string in C is a sequence of
+characters, where each character takes up one or more bytes, followed by
+a terminating NUL byte. This representation is used for strings that
+are passed by the operating system (in the @code{argv} argument of
+@code{main}, for example) and for strings that are passed to the
+operating system (in system calls such as @code{open}). The C type to
+hold such strings is @samp{char *} or, in places where the string shall
+not be modified, @samp{const char *}. There are many C library
+functions, standardized by ISO C and POSIX, that assume this
+representation of strings.
+
+An @emph{character encoding}, or @emph{encoding} for short, describes
+how the elements of a character set are represented as a sequence of
+bytes. For example, in the @code{ASCII} encoding, the UNDERSCORE
+character is represented by a single byte, with value 0x5F. As another
+example, the COPYRIGHT SIGN character is represented:
+@itemize
+@item
+in the @code{ISO-8859-1} encoding, by the single byte 0xA9,
+@item
+in the @code{UTF-8} encoding, by the two bytes 0xC2 0xA9,
+@item
+in the @code{GB18030} encoding, by the four bytes 0x81 0x30 0x84 0x38.
+@end itemize
+
+@noindent
+Note: The @samp{char} type may be signed or unsigned, depending on the
+platform. When we talk about the "byte 0xA9" we actually mean the
+@code{char} object whose value is @code{(char) 0xA9}; we omit the cast
+to @code{char} in this documentation, for brevity.
+
+In POSIX, the character encoding is determined by the locale. The
+locale is some environmental attribute that the user can choose.
+
+Depending on the encoding, in general, every character is represented by
+one or more bytes (up to 4 bytes in practice --- but
+use @code{MB_LEN_MAX} instead of the number 4 in the code).
+@cindex unibyte locale
+@cindex multibyte locale
+When every character is represented by only 1 byte, we speak of an
+``unibyte locale'', otherwise of a ``multibyte locale''.
+
+It is important to realize that the majority of Unix installations
+nowadays use UTF-8 or GB18030 as locale encoding; therefore, the
+majority of users are using multibyte locales.
+
+Three important facts to remember are:
+
+@cartouche
+@emph{A @samp{char} is a byte, not a character.}
+@end cartouche
+
+As a consequence:
+@itemize @bullet
+@item
+The @posixheader{ctype.h} API, that was designed only with unibyte
+encodings in mind, is useless nowadays; it does not work in
+multibyte locales.
+@item
+The @posixfunc{strlen} function does not return the number of characters
+in a string. Nor does it return the number of screen columns occupied
+by a string after it is output. It merely returns the number of
+@emph{bytes} occupied by a string.
+@item
+Truncating a string, for example, with @posixfunc{strncpy}, can have the
+effect of truncating it in the middle of a multibyte character. Such
+a string will, when output, have a garbled character at its end, often
+represented by a hollow box.
+@end itemize
+
+@cartouche
+@emph{Multibyte does not imply UTF-8 encoding.}
+@end cartouche
+
+While UTF-8 is the most common multibyte encoding, GB18030 is there as
+well and will not go away within decades, because it is a Chinese
+government standard, last revised in 2022.
+
+@cartouche
+@emph{Searching for a character in a string is not the same as searching
+for a byte in the string.}
+@end cartouche
+
+Take the above example of COPYRIGHT SIGN in the @code{GB18030} encoding:
+A byte search will find the bytes @code{'0'} and @code{'8'} in this
+string. But a search for the @emph{character} "0" or "8" in the string
+"@copyright{}" must, of course, report ``not found''.
+
+As a consequence:
+@itemize @bullet
+@item
+@posixfunc{strchr} and @posixfunc{strrchr} do not work with multibyte
+strings if the locale encoding is GB18030 and the character to be
+searched is a digit.
+@item
+@posixfunc{strstr} does not work with multibyte strings if the locale
+encoding is different from UTF-8.
+@item
+@posixfunc{strcspn}, @posixfunc{strpbrk}, @posixfunc{strspn} cannot work
+correctly in multibyte locales: they assume the second argument is a
+list of single-byte characters. Even in this simple case, they do not
+work with multibyte strings if the locale encoding is GB18030 and one of
+the characters to be searched is a digit.
+@item
+@posixfunc{strsep} and @posixfunc{strtok_r} do not work with multibyte
+strings unless all of the delimiter characters are ASCII characters
+< 0x30.
+@item
+The @posixfunc{strcasecmp}, @posixfunc{strncasecmp}, and
+@posixfunc{strcasestr} functions do not work with multibyte strings.
+@end itemize
+
+Workarounds can be found in Gnulib, in the form of @code{mbs*} API
+functions:
+@itemize @bullet
+@item
+Gnulib has functions @func{mbslen} and @func{mbswidth} that can be used
+instead of @posixfunc{strlen} when the number of characters or the
+number of screen columns of a string is requested.
+@item
+Gnulib has functions @func{mbschr} and @func{mbsrrchr} that are like
+@posixfunc{strchr} and @posixfunc{strrchr}, but work in multibyte
+locales.
+@item
+Gnulib has a function @func{mbsstr} that is like @posixfunc{strstr}, but
+works in multibyte locales.
+@item
+Gnulib has functions @func{mbscspn}, @func{mbspbrk}, @func{mbsspn} that
+are like @posixfunc{strcspn}, @posixfunc{strpbrk}, @posixfunc{strspn},
+but work in multibyte locales.
+@item
+Gnulib has functions @func{mbssep} and @func{mbstok_r} that are like
+@posixfunc{strsep} and @posixfunc{strtok_r} but work in multibyte
+locales.
+@item
+Gnulib has functions @func{mbscasecmp}, @func{mbsncasecmp},
+@func{mbspcasecmp}, and @func{mbscasestr} that are like
+@posixfunc{strcasecmp}, @posixfunc{strncasecmp}, and
+@posixfunc{strcasestr}, but work in multibyte locales. Still, the
+function @code{ulc_casecmp} is preferable to these functions.
+@end itemize
+
+Gnulib also has additional API.
+
+@menu
+* Iterating through strings::
+@end menu
+
+@node Iterating through strings
+@subsubsection Iterating through strings
+
+For complex string processing, the provided strings functions may not be
+enough, and what you need is a way to iterate through a string while
+processing each (possibly multibyte) character in turn. Gnulib provides
+two modules for this purpose. Both iterate through the string in
+forward direction. Iteration in backward direction, that is, from the
+string's end to start, is not provided, as it is too hairy in general.
+
+@itemize
+@item
+The @code{mbiter} module. It iterates through a C string whose length
+is already known.
+@item
+The @code{mbuiter} module. It iterates through a C string whose length
+is not a-priori known.
+@end itemize
+
+The @code{mbuiter} module is suitable when there is a high probability
+that only the first few multibyte characters need to be inspected.
+Whereas the @code{mbiter} module is better if usually the iteration runs
+through the entire string.
+
+@node Strings with NUL characters
+@subsection Strings with NUL characters
+
+The GNU Coding Standards, section
+@ifinfo
+@ref{Semantics,,Writing Robust Programs,standards},
+@end ifinfo
+@ifnotinfo
+@url{https://www.gnu.org/prep/standards/html_node/Semantics.html},
+@end ifnotinfo
+specifies:
+@cartouche
+Utilities reading files should not drop NUL characters, or any other
+nonprinting characters.
+@end cartouche
+
+When it is a requirement to store NUL characters in strings, a variant
+of the C strings is needed. Gnulib offers a ``string descriptor'' type
+for this purpose. See @ref{Handling strings with NUL characters}.
+
+All remarks regarding encodings and multibyte characters in the previous
+section apply to string descriptors as well.
+
+@include c-locale.texi
+
+@node Comparison of string APIs
+@subsection Comparison of string APIs
+
+This table summarizes the API functions available for strings, in POSIX
+and in Gnulib.
+
+@multitable @columnfractions .17 .17 .17 .17 .16 .16
+@headitem unibyte strings only
+@tab assume C locale
+@tab multibyte strings
+@tab multibyte strings with NULs
+@tab wide character strings
+@tab 32-bit wide character strings
+
+@item @code{strlen}
+@tab @code{strlen}
+@tab @code{mbslen}
+@tab @code{string_desc_length}
+@tab @code{wcslen}
+@tab @code{u32_strlen}
+
+@item @code{strnlen}
+@tab @code{strnlen}
+@tab @code{mbsnlen}
+@tab --
+@tab @code{wcsnlen}
+@tab @code{u32_strnlen}, @code{u32_mbsnlen}
+
+@item @code{strcmp}
+@tab @code{strcmp}
+@tab @code{strcmp}
+@tab @code{string_desc_cmp}
+@tab @code{wcscmp}
+@tab @code{u32_strcmp}
+
+@item @code{strncmp}
+@tab @code{strncmp}
+@tab @code{strncmp}
+@tab --
+@tab @code{wcsncmp}
+@tab @code{u32_strncmp}
+
+@item @code{strcasecmp}
+@tab @code{strcasecmp}
+@tab @code{mbscasecmp}
+@tab --
+@tab @code{wcscasecmp}
+@tab @code{u32_casecmp}
+
+@item @code{strncasecmp}
+@tab @code{strncasecmp}
+@tab @code{mbsncasecmp}, @code{mbspcasecmp}
+@tab --
+@tab @code{wcsncasecmp}
+@tab @code{u32_casecmp}
+
+@item @code{strcoll}
+@tab @code{strcmp}
+@tab @code{strcoll}
+@tab --
+@tab @code{wcscoll}
+@tab @code{u32_strcoll}
+
+@item @code{strxfrm}
+@tab --
+@tab @code{strxfrm}
+@tab --
+@tab @code{wcsxfrm}
+@tab --
+
+@item @code{strchr}
+@tab @code{strchr}
+@tab @code{mbschr}
+@tab @code{string_desc_index}
+@tab @code{wcschr}
+@tab @code{u32_strchr}
+
+@item @code{strrchr}
+@tab @code{strrchr}
+@tab @code{mbsrchr}
+@tab @code{string_desc_last_index}
+@tab @code{wcsrchr}
+@tab @code{u32_strrchr}
+
+@item @code{strstr}
+@tab @code{strstr}
+@tab @code{mbsstr}
+@tab @code{string_desc_contains}
+@tab @code{wcsstr}
+@tab @code{u32_strstr}
+
+@item @code{strcasestr}
+@tab @code{strcasestr}
+@tab @code{mbscasestr}
+@tab --
+@tab --
+@tab --
+
+@item @code{strspn}
+@tab @code{strspn}
+@tab @code{mbsspn}
+@tab --
+@tab @code{wcsspn}
+@tab @code{u32_strspn}
+
+@item @code{strcspn}
+@tab @code{strcspn}
+@tab @code{mbscspn}
+@tab --
+@tab @code{wcscspn}
+@tab @code{u32_strcspn}
+
+@item @code{strpbrk}
+@tab @code{strpbrk}
+@tab @code{mbspbrk}
+@tab --
+@tab @code{wcspbrk}
+@tab @code{u32_strpbrk}
+
+@item @code{strtok_r}
+@tab @code{strtok_r}
+@tab @code{mbstok_r}
+@tab --
+@tab @code{wcstok}
+@tab @code{u32_strtok}
+
+@item @code{strsep}
+@tab @code{strsep}
+@tab @code{mbssep}
+@tab --
+@tab --
+@tab --
+
+@item @code{strcpy}
+@tab @code{strcpy}
+@tab @code{strcpy}
+@tab @code{string_desc_copy}
+@tab @code{wcscpy}
+@tab @code{u32_strcpy}
+
+@item @code{stpcpy}
+@tab @code{stpcpy}
+@tab @code{stpcpy}
+@tab --
+@tab @code{wcpcpy}
+@tab @code{u32_stpcpy}
+
+@item @code{strncpy}
+@tab @code{strncpy}
+@tab @code{strncpy}
+@tab --
+@tab @code{wcsncpy}
+@tab @code{u32_strncpy}
+
+@item @code{stpncpy}
+@tab @code{stpncpy}
+@tab @code{stpncpy}
+@tab --
+@tab @code{wcpncpy}
+@tab @code{u32_stpncpy}
+
+@item @code{strcat}
+@tab @code{strcat}
+@tab @code{strcat}
+@tab @code{string_desc_concat}
+@tab @code{wcscat}
+@tab @code{u32_strcat}
+
+@item @code{strncat}
+@tab @code{strncat}
+@tab @code{strncat}
+@tab --
+@tab @code{wcsncat}
+@tab @code{u32_strncat}
+
+@item @code{free}
+@tab @code{free}
+@tab @code{free}
+@tab @code{string_desc_free}
+@tab @code{free}
+@tab @code{free}
+
+@item @code{strdup}
+@tab @code{strdup}
+@tab @code{strdup}
+@tab @code{string_desc_copy}
+@tab @code{wcsdup}
+@tab @code{u32_strdup}
+
+@item @code{strndup}
+@tab @code{strndup}
+@tab @code{strndup}
+@tab --
+@tab --
+@tab --
+
+@item @code{mbswidth}
+@tab @code{mbswidth}
+@tab @code{mbswidth}
+@tab --
+@tab @code{wcswidth}
+@tab @code{c32swidth}, @code{u32_strwidth}
+
+@item @code{strtol}
+@tab @code{strtol}
+@tab @code{strtol}
+@tab --
+@tab --
+@tab --
+
+@item @code{strtoul}
+@tab @code{strtoul}
+@tab @code{strtoul}
+@tab --
+@tab --
+@tab --
+
+@item @code{strtoll}
+@tab @code{strtoll}
+@tab @code{strtoll}
+@tab --
+@tab --
+@tab --
+
+@item @code{strtoull}
+@tab @code{strtoull}
+@tab @code{strtoull}
+@tab --
+@tab --
+@tab --
+
+@item @code{strtoimax}
+@tab @code{strtoimax}
+@tab @code{strtoimax}
+@tab --
+@tab @code{wcstoimax}
+@tab --
+
+@item @code{strtoumax}
+@tab @code{strtoumax}
+@tab @code{strtoumax}
+@tab --
+@tab @code{wcstoumax}
+@tab --
+
+@item @code{strtof}
+@tab --
+@tab @code{strtof}
+@tab --
+@tab --
+@tab --
+
+@item @code{strtod}
+@tab @code{c_strtod}
+@tab @code{strtod}
+@tab --
+@tab --
+@tab --
+
+@item @code{strtold}
+@tab @code{c_strtold}
+@tab @code{strtold}
+@tab --
+@tab --
+@tab --
+
+@item @code{strfromf}
+@tab --
+@tab @code{strfromf}
+@tab --
+@tab --
+@tab --
+
+@item @code{strfromd}
+@tab --
+@tab @code{strfromd}
+@tab --
+@tab --
+@tab --
+
+@item @code{strfroml}
+@tab --
+@tab @code{strfroml}
+@tab --
+@tab --
+@tab --
+
+@item --
+@tab --
+@tab --
+@tab --
+@tab @code{mbstowcs}
+@tab @code{mbstoc32s}
+
+@item --
+@tab --
+@tab --
+@tab --
+@tab @code{mbsrtowcs}
+@tab @code{mbsrtoc32s}
+
+@item --
+@tab --
+@tab --
+@tab --
+@tab @code{mbsnrtowcs}
+@tab @code{mbsnrtoc32s}
+
+@item --
+@tab --
+@tab --
+@tab --
+@tab @code{wcstombs}
+@tab @code{c32stombs}
+
+@item --
+@tab --
+@tab --
+@tab --
+@tab @code{wcsrtombs}
+@tab @code{c32srtombs}
+
+@item --
+@tab --
+@tab --
+@tab --
+@tab @code{wcsnrtombs}
+@tab @code{c32snrtombs}
+
+@end multitable
+
+@node Characters
+@section Characters
+
+A @emph{character} is the elementary unit that strings are made of.
+
+What is a character? ``A character is an element of a character set''
+is sort of a circular definition, but it highlights the fact that it is
+not merely a number. Although many characters are visually represented
+by a single glyph, there are characters that, for example, have a
+different glyph when used at the end of a word than when used inside a
+word. A character is also not the minimal rendered text processing
+unit; that is a grapheme cluster and in general consists of one or more
+characters. If you want to know more about the concept of character and
+various concepts associated with characters, refer to the Unicode
+standard.
+
+For the representation in memory of a character, various types have been
+in use, and some of them were failures: @code{char} and @code{wchar_t}
+were invented for this purpose, but are not the right types.
+@code{char32_t} is the right type (successor of @code{wchar_t}); and
+@code{mbchar_t} (defined by Gnulib) is an alternative for specific kinds
+of processing.
+
+@menu
+* The char type::
+* The wchar_t type::
+* The char32_t type::
+* The mbchar_t type::
+* Comparison of character APIs::
+@end menu
+
+@node The char type
+@subsection The @code{char} type
+
+The @code{char} type is in the C language since the beginning in the
+1970ies, but --- due to its limitation of 256 possible values --- is no
+longer the adequate type for storing a character.
+
+Technically, it is still adequate in unibyte locales. But since most
+locales nowadays are multibyte locales, it makes no sense to write a
+program that runs only in unibyte locales.
+
+ISO C and POSIX standardized an API for characters of type @code{char},
+in @code{<ctype.h>}. This API is nowadays useless and obsolete.
+
+The important lessons to remember are:
+
+@cartouche
+@emph{A @samp{char} is just the elementary storage unit for a string,
+not a character.}
+@end cartouche
+
+@cartouche
+@emph{Never use @code{<ctype.h>}!}
+@end cartouche
+
+@node The wchar_t type
+@subsection The @code{wchar_t} type
+
+The ISO C and POSIX standard creators made an attempt to overcome the
+dead end regarding the @code{char} type. They introduced
+@itemize @bullet
+@item
+a type @samp{wchar_t}, designed to encapsulate a character,
+@item
+a ``wide string'' type @samp{wchar_t *}, with some API functions
+declared in @posixheader{wchar.h}, and
+@item
+functions declared in @posixheader{wctype.h} that were meant to supplant
+the ones in @posixheader{ctype.h}.
+@end itemize
+
+Unfortunately, this API and its implementation has numerous problems:
+
+@itemize @bullet
+@item
+On Windows platforms and on AIX in 32-bit mode, @code{wchar_t} is a
+16-bit type. This means that it can never accommodate an entire Unicode
+character. Either the @code{wchar_t *} strings are limited to
+characters in UCS-2 (the ``Basic Multilingual Plane'' of Unicode), or
+--- if @code{wchar_t *} strings are encoded in UTF-16 --- a
+@code{wchar_t} represents only half of a character in the worst case,
+making the @posixheader{wctype.h} functions pointless.
+
+@item
+On Solaris and FreeBSD, the @code{wchar_t} encoding is locale dependent
+and undocumented. This means, if you want to know any property of a
+@code{wchar_t} character, other than the properties defined by
+@posixheader{wctype.h} --- such as whether it's a dash, currency symbol,
+paragraph separator, or similar ---, you have to convert it to
+@code{char *} encoding first, by use of the function @posixfunc{wctomb}.
+
+@item
+When you read a stream of wide characters, through the functions
+@posixfunc{fgetwc} and @posixfunc{fgetws}, and when the input
+stream/file is not in the expected encoding, you have no way to
+determine the invalid byte sequence and do some corrective action. If
+you use these functions, your program becomes ``garbage in - more
+garbage out'' or ``garbage in - abort''.
+@end itemize
+
+As a consequence, it is better to use multibyte strings. Such multibyte
+strings can bypass limitations of the @code{wchar_t} type, if you use
+functions defined in Gnulib and GNU libunistring for text processing.
+They can also faithfully transport malformed characters that were
+present in the input, without requiring the program to produce garbage
+or abort.
+
+@node The char32_t type
+@subsection The @code{char32_t} type
+
+The ISO C and POSIX standard creators then introduced the
+@code{char32_t} type. In ISO C 11, it was conceptually a ``32-bit wide
+character'' type. In ISO C 23, its semantics has been further
+specified: A @code{char32_t} value is a Unicode code point.
+
+Thus, the @code{char32_t} type is not affected the problems that plague
+the @code{wchar_t} type.
+
+The @code{char32_t} type and its API are defined in the @code{<uchar.h>}
+header file.
+
+ISO C and POSIX specify only the basic functions for the @code{char32_t}
+type, namely conversion of a single character (@func{mbrtoc32} and
+@func{c32rtomb}). For convenience, Gnulib adds API for classification
+and case conversion of characters.
+
+GNU libunistring can also be used on @code{char32_t} values. Since
+@code{char32_t} is the same as @code{uint32_t}, all @code{u32_*}
+functions of GNU libunistring are applicable to arrays of
+@code{char32_t} values.
+
+On glibc systems, use of the 32-bit wide strings (@code{char32_t[]}) is
+exactly as efficient as the use of the older wide strings
+(@code{wchar_t[]}). This is possible because on glibc, @code{wchar_t}
+values already always were 32-bit and Unicode code points.
+@code{mbrtoc32} is just an alias of @code{mbrtowc}. The Gnulib
+@code{*c32*} functions are optimized so that on glibc systems they
+immediately redirect to the corresponding @code{*wc*} functions.
+
+@node The mbchar_t type
+@subsection The @code{mbchar_t} type
+
+Gnulib defines an alternate way to encode a multibyte character:
+@code{mbchar_t}. Its main feature is the ability to process a string or
+stream with some malformed characters without reporting an error.
+
+The type @code{mbchar_t}, defined in @code{"mbchar.h"}, holds a
+character in both the multibyte and the 32-bit wide character
+representation. In case of a malformed character only the multibyte
+representation is used.
+
+@menu
+* Reading multibyte strings::
+@end menu
+
+@node Reading multibyte strings
+@subsubsection Reading multibyte strings
+
+If you want to process (possibly multibyte) characters while reading
+them from a @code{FILE *} stream, without reading them into a string
+first, the @code{mbfile} module is made for this purpose.
+
+@node Comparison of character APIs
+@subsection Comparison of character APIs
+
+This table summarizes the API functions available for characters, in
+POSIX and in Gnulib.
+
+@multitable @columnfractions .2 .2 .2 .2 .2
+@headitem unibyte character
+@tab assume C locale
+@tab wide character
+@tab 32-bit wide character
+@tab mbchar_t character
+
+@item @code{== '\0'}
+@tab @code{== '\0'}
+@tab @code{== L'\0'}
+@tab @code{== 0}
+@tab @code{mb_isnul}
+
+@item @code{==}
+@tab @code{==}
+@tab @code{==}
+@tab @code{==}
+@tab @code{mb_equal}
+
+@item @code{isalnum}
+@tab @code{c_isalnum}
+@tab @code{iswalnum}
+@tab @code{c32isalnum}
+@tab @code{mb_isalnum}
+
+@item @code{isalpha}
+@tab @code{c_isalpha}
+@tab @code{iswalpha}
+@tab @code{c32isalpha}
+@tab @code{mb_isalpha}
+
+@item @code{isblank}
+@tab @code{c_isblank}
+@tab @code{iswblank}
+@tab @code{c32isblank}
+@tab @code{mb_isblank}
+
+@item @code{iscntrl}
+@tab @code{c_iscntrl}
+@tab @code{iswcntrl}
+@tab @code{c32iscntrl}
+@tab @code{mb_iscntrl}
+
+@item @code{isdigit}
+@tab @code{c_isdigit}
+@tab @code{iswdigit}
+@tab @code{c32isdigit}
+@tab @code{mb_isdigit}
+
+@item @code{isgraph}
+@tab @code{c_isgraph}
+@tab @code{iswgraph}
+@tab @code{c32isgraph}
+@tab @code{mb_isgraph}
+
+@item @code{islower}
+@tab @code{c_islower}
+@tab @code{iswlower}
+@tab @code{c32islower}
+@tab @code{mb_islower}
+
+@item @code{isprint}
+@tab @code{c_isprint}
+@tab @code{iswprint}
+@tab @code{c32isprint}
+@tab @code{mb_isprint}
+
+@item @code{ispunct}
+@tab @code{c_ispunct}
+@tab @code{iswpunct}
+@tab @code{c32ispunct}
+@tab @code{mb_ispunct}
+
+@item @code{isspace}
+@tab @code{c_isspace}
+@tab @code{iswspace}
+@tab @code{c32isspace}
+@tab @code{mb_isspace}
+
+@item @code{isupper}
+@tab @code{c_isupper}
+@tab @code{iswupper}
+@tab @code{c32isupper}
+@tab @code{mb_isupper}
+
+@item @code{isxdigit}
+@tab @code{c_isxdigit}
+@tab @code{iswxdigit}
+@tab @code{c32isxdigit}
+@tab @code{mb_isxdigit}
+
+@item --
+@tab --
+@tab @code{iswctype}
+@tab --
+@tab --
+
+@item @code{tolower}
+@tab @code{c_tolower}
+@tab @code{towlower}
+@tab @code{c32tolower}
+@tab --
+
+@item @code{toupper}
+@tab @code{c_toupper}
+@tab @code{towupper}
+@tab @code{c32toupper}
+@tab --
+
+@item --
+@tab --
+@tab @code{towctrans}
+@tab --
+@tab --
+
+@item --
+@tab --
+@tab @code{wcwidth}
+@tab @code{c32width}
+@tab @code{mb_width}
+
+@end multitable