manual: Enhance documentation of the <ctype.h> functions
Checks
Context | Check | Description
redhat-pt-bot/TryBot-apply_patch | success | Patch applied to master at the time it was sent
redhat-pt-bot/TryBot-32bit | success | Build for i686
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 | success | Testing passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 | success | Testing passed
linaro-tcwg-bot/tcwg_glibc_build--master-arm | success | Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm | success | Testing passed
Commit Message
Describe the problems with signed characters, and the glibc extension
to deal with most of them. Mention that the is* functions return
zero for the special argument EOF.
---
manual/ctype.texi | 32 ++++++++++++++++++++++++--------
1 file changed, 24 insertions(+), 8 deletions(-)
base-commit: 388ae538ddcb05c7d8966147b488a5f6e481656e
Comments
On 6/15/23 13:18, Florian Weimer via Libc-alpha wrote:
> Describe the problems with signed characters, and the glibc extension
> to deal with most of them. Mention that the is* functions return
> zero for the special argument EOF.
LGTM.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
> ---
> manual/ctype.texi | 32 ++++++++++++++++++++++++--------
> 1 file changed, 24 insertions(+), 8 deletions(-)
>
> diff --git a/manual/ctype.texi b/manual/ctype.texi
> index 88e3523dc4..d09249c6cf 100644
> --- a/manual/ctype.texi
> +++ b/manual/ctype.texi
> @@ -40,21 +40,37 @@ one set works on @code{char} type characters, the other one on
>
> This section explains the library functions for classifying characters.
> For example, @code{isalpha} is the function to test for an alphabetic
> -character. It takes one argument, the character to test, and returns a
> -nonzero integer if the character is alphabetic, and zero otherwise. You
> -would use it like this:
> +character. It takes one argument, the character to test as an
> +@code{unsigned char} value, and returns a nonzero integer if the
OK. Clarifies that the input should be cast to 'unsigned char'.
> +character is alphabetic, and zero otherwise. You would use it like
> +this:
>
> @smallexample
> -if (isalpha (c))
> +if (isalpha ((unsigned char) c))
OK. Clarifies that the is* functions take an unsigned char value, and that
the argument needs to be cast on machines where char may be signed.
> printf ("The character `%c' is alphabetic.\n", c);
> @end smallexample
>
> Each of the functions in this section tests for membership in a
> particular class of characters; each has a name starting with @samp{is}.
> -Each of them takes one argument, which is a character to test, and
> -returns an @code{int} which is treated as a boolean value. The
> -character argument is passed as an @code{int}, and it may be the
> -constant value @code{EOF} instead of a real character.
> +Each of them takes one argument, which is a character to test. The
> +character argument must be in the value range of @code{unsigned char} (0
> +to 255 for @theglibc{}). On a machine where the @code{char} type is
> +signed, it may be necessary to cast the argument to @code{unsigned
> +char}, or mask it with @samp{& 0xff}. (On @code{unsigned char}
OK.
> +machines, this step is harmless, so portable code should always perform
> +it.) The @samp{is} functions return an @code{int} which is treated as a
> +boolean value.
> +
> +All @samp{is} functions accept the special value @code{EOF} and return
> +zero. (Note that @code{EOF} must not be cast to @code{unsigned char}
> +for this to work.)
> +
> +As an extension, @theglibc{} accepts signed @code{char} values as
> +@samp{is} functions arguments in the range -128 to -2, and returns the
> +result for the corresponding unsigned character. However, as there
> +might be an actual character corresponding to the @code{EOF} integer
> +constant, doing so may introduce bugs, and it is recommended to apply
> +the conversion to the unsigned character range as appropriate.
OK. Agreed. That makes complete sense and is good to document.
>
> The attributes of any given character can vary between locales.
> @xref{Locales}, for more information on locales.
>
> base-commit: 388ae538ddcb05c7d8966147b488a5f6e481656e
>
@@ -40,21 +40,37 @@ one set works on @code{char} type characters, the other one on
This section explains the library functions for classifying characters.
For example, @code{isalpha} is the function to test for an alphabetic
-character. It takes one argument, the character to test, and returns a
-nonzero integer if the character is alphabetic, and zero otherwise. You
-would use it like this:
+character. It takes one argument, the character to test as an
+@code{unsigned char} value, and returns a nonzero integer if the
+character is alphabetic, and zero otherwise. You would use it like
+this:
@smallexample
-if (isalpha (c))
+if (isalpha ((unsigned char) c))
printf ("The character `%c' is alphabetic.\n", c);
@end smallexample
Each of the functions in this section tests for membership in a
particular class of characters; each has a name starting with @samp{is}.
-Each of them takes one argument, which is a character to test, and
-returns an @code{int} which is treated as a boolean value. The
-character argument is passed as an @code{int}, and it may be the
-constant value @code{EOF} instead of a real character.
+Each of them takes one argument, which is a character to test. The
+character argument must be in the value range of @code{unsigned char} (0
+to 255 for @theglibc{}). On a machine where the @code{char} type is
+signed, it may be necessary to cast the argument to @code{unsigned
+char}, or mask it with @samp{& 0xff}. (On @code{unsigned char}
+machines, this step is harmless, so portable code should always perform
+it.) The @samp{is} functions return an @code{int} which is treated as a
+boolean value.
+
+All @samp{is} functions accept the special value @code{EOF} and return
+zero. (Note that @code{EOF} must not be cast to @code{unsigned char}
+for this to work.)
+
+As an extension, @theglibc{} accepts signed @code{char} values as
+@samp{is} functions arguments in the range -128 to -2, and returns the
+result for the corresponding unsigned character. However, as there
+might be an actual character corresponding to the @code{EOF} integer
+constant, doing so may introduce bugs, and it is recommended to apply
+the conversion to the unsigned character range as appropriate.
The attributes of any given character can vary between locales.
@xref{Locales}, for more information on locales.
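
The casting rules described in the patch can be sketched in C as follows. This is a minimal illustration and not part of the patch: `count_alpha` and `count_alpha_stream` are hypothetical helper names chosen for this example. The first helper casts each `char` to `unsigned char` before classification; the second passes the result of `getc` through unchanged, since that value is already either in the `unsigned char` range or equal to `EOF`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: count the alphabetic bytes in a NUL-terminated
   string.  The cast keeps the argument in the unsigned char range even
   on machines where plain char is signed.  */
size_t
count_alpha (const char *s)
{
  size_t n = 0;
  assert (s != NULL);
  for (; *s != '\0'; s++)
    if (isalpha ((unsigned char) *s))
      n++;
  return n;
}

/* Hypothetical helper: count the alphabetic characters read from a
   stream.  getc returns an unsigned char value converted to int, or
   EOF, so its result is passed on without a cast.  Casting it would
   turn EOF into an ordinary character value.  */
size_t
count_alpha_stream (FILE *fp)
{
  size_t n = 0;
  int c;
  assert (fp != NULL);
  while ((c = getc (fp)) != EOF)
    if (isalpha (c))
      n++;
  return n;
}
```

Calling `count_alpha ("abc123")` counts only the three letters in the default "C" locale; results for bytes above 127 depend on the locale in effect.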