[review] localedef: Add verbose messages for failure paths.

Message ID gerrit.1571944987000.I28b9f680711ff00252a2cb15625b774cc58ecb9d@gnutoolchain-gerrit.osci.io
State Superseded
Headers

Commit Message

Simon Marchi (Code Review) Oct. 24, 2019, 7:23 p.m. UTC
  Change URL: https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303
......................................................................

localedef: Add verbose messages for failure paths.

During testing of localedef running in a minimal container
there were several error cases which were hard to diagnose
since they appeared as strerror (errno) values printed by
the higher level functions.  This change adds three new
verbose messages for potential failure paths.  The new
messages give the user the opportunity to use -v and display
additional information about why localedef might be failing.
I found these messages useful myself while writing a localedef
container test for --no-hard-links.

Signed-off-by: Carlos O'Donell <carlos@redhat.com>
Change-Id: I28b9f680711ff00252a2cb15625b774cc58ecb9d
---
M locale/programs/localedef.c
1 file changed, 22 insertions(+), 8 deletions(-)
  

Comments

Simon Marchi (Code Review) Oct. 24, 2019, 7:33 p.m. UTC | #1
Florian Weimer has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303
......................................................................


Patch Set 1:

(3 comments)

I like the direction of this change, but there are some nits.

https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303/1//COMMIT_MSG 
Commit Message:

https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303/1//COMMIT_MSG@19 
PS1, Line 19: Signed-off-by: Carlos O'Donell <carlos@redhat.com>
We do not use DCO and Signed-off-by.


https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303/1/locale/programs/localedef.c 
File locale/programs/localedef.c:

https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303/1/locale/programs/localedef.c@531 
PS1, Line 531: 			  _("failed to allocate space for compiled locale path"));
I think this line is too long.  I think we use xmalloc in many places already, so it would make sense to introduce xasprintf and use that consistently in the application.  (The version from support/ is not suitable in this context.)


https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303/1/locale/programs/localedef.c@567 
PS1, Line 567: 	record_verbose (stderr, _("no write permission to output path: %s"),
Would it make sense to include the output path in this message as well?
  
Simon Marchi (Code Review) Oct. 24, 2019, 7:40 p.m. UTC | #2
Florian Weimer has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303
......................................................................


Patch Set 1: Code-Review+2
  
Simon Marchi (Code Review) Oct. 24, 2019, 7:40 p.m. UTC | #3
Florian Weimer has removed a vote from this change. ( https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303 )


Change subject: localedef: Add verbose messages for failure paths.
......................................................................


Removed Code-Review+2 by Florian Weimer <fweimer@redhat.com>
  
Simon Marchi (Code Review) Oct. 25, 2019, 2:31 a.m. UTC | #4
Carlos O'Donell has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303
......................................................................


Patch Set 2:

> Patch Set 1:
> 
> (3 comments)
> 
> I like the direction of this change, but there are some nits.

v2 pushed with your changes resolved.
- Used xasprintf
- Fixed commit message.
  
Simon Marchi (Code Review) Oct. 25, 2019, 6:42 a.m. UTC | #5
Florian Weimer has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303
......................................................................


Patch Set 2:

(4 comments)

https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303/2/include/programs/xasprintf.h 
File include/programs/xasprintf.h:

PS2: 
Should this go into xmalloc.h?


https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303/2/include/programs/xasprintf.h@21 
PS2, Line 21: extern int xasprintf (char **string_ptr, const char *format, ...);
This needs a format attribute.  The prototype in gnulib (and support/) looks like this:

char *xasprintf (const char *format, ...);

I think we should follow that, even though we actually use the returned length in the code today.


https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303/2/locale/programs/localedef.c 
File locale/programs/localedef.c:

https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303/2/locale/programs/localedef.c@446 
PS2, Line 446:       return cp;
Can you fix the memory leak for tp, too?


https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303/2/locale/programs/localedef.c@559 
PS2, Line 559:   *endp++ = '/';
Maybe it would be clearer to add the slash with another xasprintf call at the end, not do the fancy in-place manipulation of the string.
  
Simon Marchi (Code Review) Oct. 25, 2019, 1:14 p.m. UTC | #6
Florian Weimer has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303
......................................................................


Patch Set 2:

Please drop the dependent patch in v3.
  
Simon Marchi (Code Review) Oct. 25, 2019, 1:25 p.m. UTC | #7
Carlos O'Donell has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303
......................................................................


Patch Set 3:

(3 comments)

I've resolved 2/3 issues here, I'll fix the rest in v4.

https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303/1//COMMIT_MSG 
Commit Message:

https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303/1//COMMIT_MSG@19 
PS1, Line 19: Signed-off-by: Carlos O'Donell <carlos@redhat.com>
> We do not use DCO and Signed-off-by.
Done


https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303/1/locale/programs/localedef.c 
File locale/programs/localedef.c:

https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303/1/locale/programs/localedef.c@531 
PS1, Line 531: 			  _("failed to allocate space for compiled locale path"));
> I think this line is too long. […]
Done


https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303/1/locale/programs/localedef.c@567 
PS1, Line 567: 	record_verbose (stderr, _("no write permission to output path: %s"),
> Would it make sense to include the output path in this message as well?
This isn't resolved yet.
  
Joseph Myers Oct. 25, 2019, 2:56 p.m. UTC | #8
On Fri, 25 Oct 2019, Carlos O'Donell (Code Review) wrote:

> https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303/1/locale/programs/localedef.c@531 
> PS1, Line 531: 			  _("failed to allocate space for compiled locale path"));
> > I think this line is too long. […]
> Done

This is a meta-comment on the formatting of these emails and a test of 
email replies to gerrit:

"PS1, Line 531:" is not helpful, and just makes the lines in these 
messages longer.  Quoted text should just use normal "> " to quote it 
(with the additional leading character "+", "-" or " " when being quoted 
as part of a diff hunk).
  
Carlos O'Donell Oct. 25, 2019, 3 p.m. UTC | #9
On 10/25/19 10:56 AM, Joseph Myers wrote:
> On Fri, 25 Oct 2019, Carlos O'Donell (Code Review) wrote:
> 
>> https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303/1/locale/programs/localedef.c@531 
>> PS1, Line 531: 			  _("failed to allocate space for compiled locale path"));
>>> I think this line is too long. […]
>> Done
> 
> This is a meta-comment on the formatting of these emails and a test of 
> email replies to gerrit:
> 
> "PS1, Line 531:" is not helpful, and just makes the lines in these 
> messages longer.  Quoted text should just use normal "> " to quote it 
> (with the additional leading character "+", "-" or " " when being quoted 
> as part of a diff hunk).
> 

Right, I think this falls under the "show more context" issue.

In this response I'm marking a comment as "Done" and I probably should
have said more. I removed the entire line in v2 of the patch, so if you
fetch v2 you'll see it's gone and I'm using xasprintf.
  
Joseph Myers Oct. 25, 2019, 3:05 p.m. UTC | #10
On Fri, 25 Oct 2019, Carlos O'Donell (Code Review) wrote:

> https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303/1/locale/programs/localedef.c@531 
> PS1, Line 531: 			  _("failed to allocate space for compiled locale path"));
> > I think this line is too long. […]
> Done

Having previously sent a meta-comment and got an email bounce from gerrit, 
testing whether having now set up an account on gerrit makes it any better 
at understanding emails sent to it and adding them to the appropriate 
issues.
  
Simon Marchi (Code Review) Oct. 27, 2019, 3:44 p.m. UTC | #11
Simon Marchi has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303
......................................................................


Patch Set 3:

(2 comments)

https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303/3/locale/programs/localedef.c 
File locale/programs/localedef.c:

https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303/3/locale/programs/localedef.c@437 
PS3, Line 437: 
428 | more_help (int key, const char *text, void *input)
429 | {
430 |   char *cp;
431 |   char *tp;
432 | 
433 |   switch (key)
434 |     {
435 |     case ARGP_KEY_HELP_EXTRA:
436 |       /* We print some extra information.  */
437 |       xasprintf (&tp, gettext ("\

Hi Carlos (1).


https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303/3/locale/programs/localedef.c@521 
PS3, Line 521: 
479 | construct_output_path (char *path)
    | ...
512 | 	 the end of the function we need another byte for the trailing
513 | 	 '/'.  */
514 |       ssize_t n;
515 |       if (normal == NULL)
516 | 	n = xasprintf (&result, "%s%s/%s%c", output_prefix ?: "",
517 | 		       COMPLOCALEDIR, path, '\0');
518 |       else
519 | 	n = xasprintf (&result, "%s%s/%.*s%s%s%c",
520 | 		       output_prefix ?: "", COMPLOCALEDIR,
521 | 		       (int) (startp - path), path, normal, endp, '\0');

Hi Carlos (2).
  
Simon Marchi (Code Review) Nov. 7, 2019, 9:01 a.m. UTC | #12
Florian Weimer has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303
......................................................................


Patch Set 3:

(2 comments)

Mark test comments as done.

https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303/3/locale/programs/localedef.c 
File locale/programs/localedef.c:

https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303/3/locale/programs/localedef.c@437 
PS3, Line 437: 
428 | more_help (int key, const char *text, void *input)
    | ...
432 | 
433 |   switch (key)
434 |     {
435 |     case ARGP_KEY_HELP_EXTRA:
436 |       /* We print some extra information.  */
437 >       xasprintf (&tp, gettext ("\
438 | For bug reporting instructions, please see:\n\
439 | %s.\n"), REPORT_BUGS_TO);
440 |       xasprintf (&cp, gettext ("\
441 | System's directory for character maps : %s\n\
442 | 		       repertoire maps: %s\n\

> Hi Carlos (1).

Done


https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303/3/locale/programs/localedef.c@521 
PS3, Line 521: 
479 | construct_output_path (char *path)
    | ...
516 | 	n = xasprintf (&result, "%s%s/%s%c", output_prefix ?: "",
517 | 		       COMPLOCALEDIR, path, '\0');
518 |       else
519 | 	n = xasprintf (&result, "%s%s/%.*s%s%s%c",
520 | 		       output_prefix ?: "", COMPLOCALEDIR,
521 > 		       (int) (startp - path), path, normal, endp, '\0');
522 | 
523 |       endp = result + n - 1;
524 |     }
525 |   else
526 |     {

> Hi Carlos (2).

Done
  
Simon Marchi (Code Review) Dec. 10, 2019, 9:18 p.m. UTC | #13
Carlos O'Donell has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303
......................................................................


Patch Set 4:

(9 comments)

I rewrote part of the handling of result to avoid the in-place editing like you suggested. I also noticed another memory leak. The normalized paths need to be freed by the caller and they were not (just like the tp leak). This is ready for review again.

| --- /dev/null
| +++ /COMMIT_MSG
| @@ -1,0 +10,11 @@ During testing of localedef running in a minimal container
| +there were several error cases which were hard to diagnose
| +since they appeared as strerror (errno) values printed by
| +the higher level functions.  This change adds three new
| +verbose messages for potential failure paths.  The new
| +messages give the user the opportunity to use -v and display
| +additional information about why localedef might be failing.
| +I found these messages useful myself while writing a localedef
| +container test for --no-hard-links.
| +
| +Signed-off-by: Carlos O'Donell <carlos@redhat.com>

PS1, Line 19:

Done

| +Change-Id: I28b9f680711ff00252a2cb15625b774cc58ecb9d
| --- locale/programs/localedef.c
| +++ locale/programs/localedef.c
| @@ -523,14 +523,18 @@ construct_output_path (char *path)
|        else
|  	n = asprintf (&result, "%s%s/%.*s%s%s%c",
|  		      output_prefix ?: "", COMPLOCALEDIR,
|  		      (int) (startp - path), path, normal, endp, '\0');
|  
|        if (n < 0)
| -	return NULL;
| +	{
| +	  record_verbose (stderr,
| +			  _("failed to allocate space for compiled locale path"));

PS1, Line 531:

Done

| +	  return NULL;
| +	}
|  
|        endp = result + n - 1;
|      }
|    else
|      {
|        /* This is a user path.  Please note the additional byte in the
|  	 memory allocation.  */

 ...

| @@ -556,7 +558,19 @@ construct_output_path (char *path)
| +	  if (mkdir (result, 0777) < 0)
| +	    {
| +	      record_verbose (stderr,
| +			      _("cannot create output path \"%s\": %s"),
| +			      result, strerror (errno));
| +	      return NULL;
| +	    }
| +	}
| +      else
| +	record_verbose (stderr, _("no write permission to output path: %s"),

PS1, Line 567:

This is resolved. I added the output path.

| +			strerror (errno));
| +    }
|  
|    *endp++ = '/';
|    *endp = '\0';
|  
|    return result;
|  }
|  
| --- /dev/null
| +++ include/programs/xasprintf.h
PS2:

I like having it distinct. Just a preference.

| @@ -1,0 +12,12 @@ /* asprintf with out of memory checking
| +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
| +   GNU General Public License for more details.
| +
| +   You should have received a copy of the GNU General Public License
| +   along with this program; if not, see <https://www.gnu.org/licenses/>.  */
| +
| +#ifndef _XASPRINTF_H
| +#define _XASPRINTF_H	1
| +
| +extern int xasprintf (char **string_ptr, const char *format, ...);

PS2, Line 21:

Done

| +
| +#endif /* xasprintf.h */
| --- locale/programs/localedef.c
| +++ locale/programs/localedef.c
| @@ -443,18 +442,14 @@ System's directory for character maps : %s\n\
|  		       repertoire maps: %s\n\
|  		       locale path    : %s\n\
|  %s"),
| -		    CHARMAP_PATH, REPERTOIREMAP_PATH, LOCALE_PATH, tp) < 0)
| -	{
| -	  free (tp);
| -	  return NULL;
| -	}
| +		    CHARMAP_PATH, REPERTOIREMAP_PATH, LOCALE_PATH, tp);
|        return cp;

PS2, Line 446:

Done

|      default:
|        break;
|      }
|    return (char *) text;
|  }
|  
|  /* Print the version information.  */
|  static void
|  print_version (FILE *stream, struct argp_state *state)

 ...

| @@ -556,11 +550,19 @@ construct_output_path (char *path)
| +			      result, strerror (errno));
| +	      return NULL;
| +	    }
| +	}
| +      else
| +	record_verbose (stderr, _("no write permission to output path: %s"),
| +			strerror (errno));
| +    }
|  
|    *endp++ = '/';

PS2, Line 559:

Done

|    *endp = '\0';
|  
|    return result;
|  }
|  
|  
|  /* Normalize codeset name.  There is no standard for the codeset
|     names.  Normalization allows the user to use any of the common
|     names.  */
| --- locale/programs/localedef.c
| +++ locale/programs/localedef.c
| @@ -429,16 +429,15 @@ more_help (int key, const char *text, void *input)
|  {
|    char *cp;
|    char *tp;
|  
|    switch (key)
|      {
|      case ARGP_KEY_HELP_EXTRA:
|        /* We print some extra information.  */
| -      if (asprintf (&tp, gettext ("\
| +      xasprintf (&tp, gettext ("\

PS3, Line 437:

Done

|  For bug reporting instructions, please see:\n\
| -%s.\n"), REPORT_BUGS_TO) < 0)
| -	return NULL;
| -      if (asprintf (&cp, gettext ("\
| +%s.\n"), REPORT_BUGS_TO);
| +      xasprintf (&cp, gettext ("\
|  System's directory for character maps : %s\n\
|  		       repertoire maps: %s\n\
|  		       locale path    : %s\n\

 ...

| @@ -523,16 +518,13 @@ construct_output_path (char *path)
|        else
| -	n = asprintf (&result, "%s%s/%.*s%s%s%c",
| -		      output_prefix ?: "", COMPLOCALEDIR,
| -		      (int) (startp - path), path, normal, endp, '\0');
| -
| -      if (n < 0)
| -	return NULL;
| +	n = xasprintf (&result, "%s%s/%.*s%s%s%c",
| +		       output_prefix ?: "", COMPLOCALEDIR,
| +		       (int) (startp - path), path, normal, endp, '\0');

PS3, Line 521:

Done

|  
|        endp = result + n - 1;
|      }
|    else
|      {
|        /* This is a user path.  Please note the additional byte in the
|  	 memory allocation.  */
|        size_t len = strlen (path) + 1;
|        result = xmalloc (len + 1);
  
Simon Marchi (Code Review) Dec. 16, 2019, 5:29 p.m. UTC | #14
Adhemerval Zanella has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303
......................................................................


Patch Set 4:

(13 comments)

| --- /dev/null
| +++ include/programs/xasprintf.h
| @@ -1,0 +16,9 @@ /* asprintf with out of memory checking
| +   along with this program; if not, see <https://www.gnu.org/licenses/>.  */
| +
| +#ifndef _XASPRINTF_H
| +#define _XASPRINTF_H	1
| +
| +extern char *xasprintf (const char *format, ...)
| +    __attribute__ ((__format__ (__printf__, 1, 2), __warn_unused_result__));
| +
| +#endif /* xasprintf.h */
| --- locale/Makefile
| +++ locale/Makefile
| @@ -51,18 +51,18 @@ vpath %.h programs
|  vpath %.gperf programs
|  
|  localedef-modules	:= localedef $(categories:%=ld-%) \
|  			   charmap linereader locfile \
|  			   repertoire locarchive
|  localedef-aux		:= md5
|  locale-modules		:= locale locale-spec
|  lib-modules		:= charmap-dir simple-hash xmalloc xstrdup \
| -			   record-status
| +			   record-status xasprintf

PS4, Line 59:

Ok.

|  
|  
|  GPERF = gperf
|  GPERFFLAGS = -acCgopt -k1,2,5,9,$$ -L ANSI-C
|  
|  ifeq ($(run-built-tests),yes)
|  tests-special += $(objpfx)tst-locale-locpath.out
|  endif
|  
| --- locale/programs/localedef.c
| +++ locale/programs/localedef.c
| @@ -443,17 +442,14 @@ System's directory for character maps : %s\n\
|  		       repertoire maps: %s\n\
|  		       locale path    : %s\n\
|  %s"),
| -		    CHARMAP_PATH, REPERTOIREMAP_PATH, LOCALE_PATH, tp) < 0)
| -	{
| -	  free (tp);
| -	  return NULL;
| -	}
| +		    CHARMAP_PATH, REPERTOIREMAP_PATH, LOCALE_PATH, tp);
| +      free (tp);

PS4, Line 446:

Ok, it uses the translation of 'For bug reporting...' to create
another message.

|        return cp;
|      default:
|        break;
|      }
|    return (char *) text;
|  }
|  
|  /* Print the version information.  */
|  static void

 ...

| @@ -511,38 +512,25 @@ construct_output_path (char *path)
|  	}
| -      else
| -	/* This is to keep gcc quiet.  */
| -	endp = NULL;
| -
| -      /* We put an additional '\0' at the end of the string because at
| -	 the end of the function we need another byte for the trailing
| -	 '/'.  */
| -      ssize_t n;
| +

PS4, Line 513:

Ok.

|        if (normal == NULL)
| -	n = asprintf (&result, "%s%s/%s%c", output_prefix ?: "",
| -		      COMPLOCALEDIR, path, '\0');
| +	result = xasprintf ("%s%s/%s/", output_prefix ?: "",
| +			    COMPLOCALEDIR, path);
|        else
| -	n = asprintf (&result, "%s%s/%.*s%s%s%c",
| -		      output_prefix ?: "", COMPLOCALEDIR,
| -		      (int) (startp - path), path, normal, endp, '\0');
| -
| -      if (n < 0)
| -	return NULL;
| -
| -      endp = result + n - 1;
| +	result = xasprintf ("%s%s/%.*s%s%s/",
| +			    output_prefix ?: "", COMPLOCALEDIR,
| +			    (int) (startp - path), path, normal, endp ?: "");
| +      /* Free the allocated normalized codeset name.  */

PS4, Line 521:

Wouldn't be better to use uintptr_t for the pointer arithmetic? There
is some recent discussion on gnulib-bugs about it [1].

[1] https://lists.gnu.org/archive/html/bug-
gnulib/2019-12/msg00104.html

| +      free ((char *) normal);

PS4, Line 522:

I feel a bit confusing an interface that returns a 'const char *' and
requires to explicit deallocate the memory with free, since it
requires the called to explicit remove the const qualifier with a
cast. But this is something we might fix later.

|      }
|    else
|      {
| -      /* This is a user path.  Please note the additional byte in the
| -	 memory allocation.  */
| -      size_t len = strlen (path) + 1;
| -      result = xmalloc (len + 1);
| -      endp = mempcpy (result, path, len) - 1;
| +      /* This is a user path.  */
| +      result = xasprintf ("%s/", path);

PS4, Line 527:

Ok.

|  
|        /* If the user specified an output path we cannot add the output
|  	 to the archive.  */
|        no_archive = true;
|      }
|  
|    errno = 0;
|  
|    if (no_archive && euidaccess (result, W_OK) == -1)

 ...

| @@ -559,17 +545,35 @@ construct_output_path (char *path)
| +			      _("cannot create output path \"%s\": %s"),
| +			      result, strerror (errno));
| +	      free (result);
| +	      return NULL;
| +	    }
| +	}
| +      else
| +	record_verbose (stderr,
| +			_("no write permission to output path \"%s\": %s"),
| +			result, strerror (errno));

PS4, Line 554:

Is it the only possible meaningful error for this case?

| +    }
|  
|    return result;
|  }
|  
|  
| -/* Normalize codeset name.  There is no standard for the codeset
| -   names.  Normalization allows the user to use any of the common
| -   names.  */
| +/* Normalize codeset name.  There is no standard for the codeset names.
| +   Normalization allows the user to use any of the common names e.g. UTF-8,
| +   utf-8, utf8, UTF8 etc.
| +
| +   We normalize using the following rules:
| +   - Remove all non-alpha-numeric characters
| +   - Lowercase all cahracters.

PS4, Line 567:

s/cahracters/characters

| +   - If there are only digits assume it's an ISO standard and prefix with 'iso'
| +
| +   We return the normalized string which needs to be freed by free.  */

PS4, Line 570:

Ok.

|  static const char *
|  normalize_codeset (const char *codeset, size_t name_len)
|  {
|    int len = 0;
|    int only_digit = 1;
|    char *retval;
|    char *wp;
|    size_t cnt;
|  

 ...

| @@ -578,17 +583,19 @@ normalize_codeset (const char *codeset, size_t name_len)
|        {
|  	++len;
|  
|  	if (isalpha (codeset[cnt]))
|  	  only_digit = 0;
|        }
|  
| +  /* If there were only digits we assume it's an ISO standard and we will
| +     prefix with 'iso' so include space for that.  */
|    retval = (char *) malloc ((only_digit ? 3 : 0) + len + 1);

PS4, Line 592:

It already for failed allocation for xasprintf, wouldn't it be simpler
to do the same here?

|  
|    if (retval != NULL)
|      {
|        if (only_digit)
|  	wp = stpcpy (retval, "iso");
|        else
|  	wp = retval;
|  
|        for (cnt = 0; cnt < name_len; ++cnt)
| --- locale/programs/localedef.h
| +++ locale/programs/localedef.h
| @@ -117,18 +117,19 @@ /* Global variables of the localedef program.  */
|  extern const char *repertoire_global;
|  extern int max_locarchive_open_retry;
|  extern bool no_archive;
|  extern const char *alias_file;
|  extern bool hard_links;
|  
|  
|  /* Prototypes for a few program-wide used functions.  */
|  #include <programs/xmalloc.h>
| +#include <programs/xasprintf.h>

PS4, Line 126:

Ok.

|  
|  
|  /* Mark given locale as to be read.  */
|  extern struct localedef_t *add_to_readlist (int locale, const char *name,
|  					    const char *repertoire_name,
|  					    int generate,
|  					    struct localedef_t *copy_locale);
|  
|  /* Find the information for the locale NAME.  */
| --- /dev/null
| +++ locale/programs/xasprintf.c
| @@ -1,0 +26,9 @@ xasprintf (const char *format, ...)
| +{
| +  va_list ap;
| +  va_start (ap, format);
| +  char *result;
| +  if (vasprintf (&result, format, ap) < 0)
| +    error (EXIT_FAILURE, 0, _("memory exhausted"));
| +  va_end (ap);
| +  return result;
| +}
  
Simon Marchi (Code Review) Dec. 17, 2019, 3:03 a.m. UTC | #15
Carlos O'Donell has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303
......................................................................


Patch Set 4:

(5 comments)

Reviewed Adhemerval's comments. Patchset 5 coming up.

| --- locale/programs/localedef.c
| +++ locale/programs/localedef.c
| @@ -526,14 +518,9 @@ construct_output_path (char *path)
| -		      (int) (startp - path), path, normal, endp, '\0');
| -
| -      if (n < 0)
| -	return NULL;
| -
| -      endp = result + n - 1;
| +	result = xasprintf ("%s%s/%.*s%s%s/",
| +			    output_prefix ?: "", COMPLOCALEDIR,
| +			    (int) (startp - path), path, normal, endp ?: "");
| +      /* Free the allocated normalized codeset name.  */

PS4, Line 521:

It must be of type (int) since that's what printf(3) says is the width
specifier type in the next argument of the variable argument list.

Note that the discussion on gnulib-bugs is specifically about using
signed types like (int), and avoiding unsigned types [1] for
subscripts and sizes.

[1] http://www.open-
std.org/jtc1/sc22/wg21/docs/papers/2019/p1428r0.pdf

| +      free ((char *) normal);

PS4, Line 522:

I actually fixed this, then undid it because I didn't want to expand
the scope of the patch, but I think I'll just clean this up because I
won't be back to look at this code for a while.

Fixed in next version.

|      }
|    else
|      {
| -      /* This is a user path.  Please note the additional byte in the
| -	 memory allocation.  */
| -      size_t len = strlen (path) + 1;
| -      result = xmalloc (len + 1);
| -      endp = mempcpy (result, path, len) - 1;
| +      /* This is a user path.  */

 ...

| @@ -559,14 +545,32 @@ construct_output_path (char *path)
| +			      _("cannot create output path \"%s\": %s"),
| +			      result, strerror (errno));
| +	      free (result);
| +	      return NULL;
| +	    }
| +	}
| +      else
| +	record_verbose (stderr,
| +			_("no write permission to output path \"%s\": %s"),
| +			result, strerror (errno));

PS4, Line 554:

We aren't writing to the archive, so the result has to be path, and
euidaccess returned -1 when we asked to write to it. So the error
message relates to the operation localedef was attempting to do.

e.g.
[verbose] no write permission to output path
"/home/carlos/build/glibc-gr-localedef/tmp/en_US.UTF-8/": Not a
directory

We distinguish only the ENOENT case because we might be able to do
something about that. In theory we could handle all forms of errors
returned by stat64, but instead of doing that we print errno so you'll
know if it's EACCESS, EBADF, ELOOP, ENAMETOOLONG, ENOMEM, etc. etc.

My suggestion is to leave the current message as-is, it represents
what localedef was _trying_ to do and let the user look at the printed
strerror(errno) to determine why that action (writing) was prevented.

Thoughts?

| +    }
|  
|    return result;
|  }
|  
|  
| -/* Normalize codeset name.  There is no standard for the codeset
| -   names.  Normalization allows the user to use any of the common
| -   names.  */
| +/* Normalize codeset name.  There is no standard for the codeset names.
| +   Normalization allows the user to use any of the common names e.g. UTF-8,
| +   utf-8, utf8, UTF8 etc.
| +
| +   We normalize using the following rules:
| +   - Remove all non-alpha-numeric characters
| +   - Lowercase all cahracters.

PS4, Line 567:

Fixed in next version.

| +   - If there are only digits assume it's an ISO standard and prefix with 'iso'
| +
| +   We return the normalized string which needs to be freed by free.  */
|  static const char *
|  normalize_codeset (const char *codeset, size_t name_len)
|  {
|    int len = 0;
|    int only_digit = 1;
|    char *retval;

 ...

| @@ -578,17 +583,19 @@ normalize_codeset (const char *codeset, size_t name_len)
|        {
|  	++len;
|  
|  	if (isalpha (codeset[cnt]))
|  	  only_digit = 0;
|        }
|  
| +  /* If there were only digits we assume it's an ISO standard and we will
| +     prefix with 'iso' so include space for that.  */
|    retval = (char *) malloc ((only_digit ? 3 : 0) + len + 1);

PS4, Line 592:

Fixed in next version.

|  
|    if (retval != NULL)
|      {
|        if (only_digit)
|  	wp = stpcpy (retval, "iso");
|        else
|  	wp = retval;
|  
|        for (cnt = 0; cnt < name_len; ++cnt)
  
Simon Marchi (Code Review) Dec. 17, 2019, 3:29 a.m. UTC | #16
Carlos O'Donell has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303
......................................................................


Patch Set 5:

(11 comments)

There is a bug in using 'en_US.UTF-8@tEsT' style entries which I need to fix. I'm off by the computation of the length in the xasprtinf because UTF-8 is normalized to one byte less e.g. utf8 and I need to do the math to fix that.

| --- /dev/null
| +++ include/programs/xasprintf.h
| @@ -1,0 +16,9 @@ /* asprintf with out of memory checking
| +   along with this program; if not, see <https://www.gnu.org/licenses/>.  */
| +
| +#ifndef _XASPRINTF_H
| +#define _XASPRINTF_H	1
| +
| +extern char *xasprintf (const char *format, ...)
| +    __attribute__ ((__format__ (__printf__, 1, 2), __warn_unused_result__));
| +
| +#endif /* xasprintf.h */
| --- locale/Makefile
| +++ locale/Makefile
| @@ -51,18 +51,18 @@ vpath %.h programs
|  vpath %.gperf programs
|  
|  localedef-modules	:= localedef $(categories:%=ld-%) \
|  			   charmap linereader locfile \
|  			   repertoire locarchive
|  localedef-aux		:= md5
|  locale-modules		:= locale locale-spec
|  lib-modules		:= charmap-dir simple-hash xmalloc xstrdup \
| -			   record-status
| +			   record-status xasprintf

PS4, Line 59:

Done

|  
|  
|  GPERF = gperf
|  GPERFFLAGS = -acCgopt -k1,2,5,9,$$ -L ANSI-C
|  
|  ifeq ($(run-built-tests),yes)
|  tests-special += $(objpfx)tst-locale-locpath.out
|  endif
|  
| --- locale/programs/localedef.c
| +++ locale/programs/localedef.c
| @@ -443,17 +442,14 @@ System's directory for character maps : %s\n\
|  		       repertoire maps: %s\n\
|  		       locale path    : %s\n\
|  %s"),
| -		    CHARMAP_PATH, REPERTOIREMAP_PATH, LOCALE_PATH, tp) < 0)
| -	{
| -	  free (tp);
| -	  return NULL;
| -	}
| +		    CHARMAP_PATH, REPERTOIREMAP_PATH, LOCALE_PATH, tp);
| +      free (tp);

PS4, Line 446:

Done

|        return cp;
|      default:
|        break;
|      }
|    return (char *) text;
|  }
|  
|  /* Print the version information.  */
|  static void

 ...

| @@ -511,38 +512,25 @@ construct_output_path (char *path)
|  	}
| -      else
| -	/* This is to keep gcc quiet.  */
| -	endp = NULL;
| -
| -      /* We put an additional '\0' at the end of the string because at
| -	 the end of the function we need another byte for the trailing
| -	 '/'.  */
| -      ssize_t n;
| +

PS4, Line 513:

Done

|        if (normal == NULL)
| -	n = asprintf (&result, "%s%s/%s%c", output_prefix ?: "",
| -		      COMPLOCALEDIR, path, '\0');
| +	result = xasprintf ("%s%s/%s/", output_prefix ?: "",
| +			    COMPLOCALEDIR, path);
|        else
| -	n = asprintf (&result, "%s%s/%.*s%s%s%c",
| -		      output_prefix ?: "", COMPLOCALEDIR,
| -		      (int) (startp - path), path, normal, endp, '\0');
| -
| -      if (n < 0)
| -	return NULL;
| -
| -      endp = result + n - 1;
| +	result = xasprintf ("%s%s/%.*s%s%s/",
| +			    output_prefix ?: "", COMPLOCALEDIR,
| +			    (int) (startp - path), path, normal, endp ?: "");
| +      /* Free the allocated normalized codeset name.  */

PS4, Line 521:

Done

| +      free ((char *) normal);

PS4, Line 522:

Done

|      }
|    else
|      {
| -      /* This is a user path.  Please note the additional byte in the
| -	 memory allocation.  */
| -      size_t len = strlen (path) + 1;
| -      result = xmalloc (len + 1);
| -      endp = mempcpy (result, path, len) - 1;
| +      /* This is a user path.  */
| +      result = xasprintf ("%s/", path);

PS4, Line 527:

Done

|  
|        /* If the user specified an output path we cannot add the output
|  	 to the archive.  */
|        no_archive = true;
|      }
|  
|    errno = 0;
|  
|    if (no_archive && euidaccess (result, W_OK) == -1)

 ...

| @@ -564,12 +561,19 @@ construct_output_path (char *path)
| -/* Normalize codeset name.  There is no standard for the codeset
| -   names.  Normalization allows the user to use any of the common
| -   names.  */
| +/* Normalize codeset name.  There is no standard for the codeset names.
| +   Normalization allows the user to use any of the common names e.g. UTF-8,
| +   utf-8, utf8, UTF8 etc.
| +
| +   We normalize using the following rules:
| +   - Remove all non-alpha-numeric characters
| +   - Lowercase all cahracters.

PS4, Line 567:

Done

| +   - If there are only digits assume it's an ISO standard and prefix with 'iso'
| +
| +   We return the normalized string which needs to be freed by free.  */

PS4, Line 570:

Done

|  static const char *
|  normalize_codeset (const char *codeset, size_t name_len)
|  {
|    int len = 0;
|    int only_digit = 1;
|    char *retval;
|    char *wp;
|    size_t cnt;
|  

 ...

| @@ -578,17 +583,19 @@ normalize_codeset (const char *codeset, size_t name_len)
|        {
|  	++len;
|  
|  	if (isalpha (codeset[cnt]))
|  	  only_digit = 0;
|        }
|  
| +  /* If there were only digits we assume it's an ISO standard and we will
| +     prefix with 'iso' so include space for that.  */
|    retval = (char *) malloc ((only_digit ? 3 : 0) + len + 1);

PS4, Line 592:

Done

|  
|    if (retval != NULL)
|      {
|        if (only_digit)
|  	wp = stpcpy (retval, "iso");
|        else
|  	wp = retval;
|  
|        for (cnt = 0; cnt < name_len; ++cnt)
| --- locale/programs/localedef.h
| +++ locale/programs/localedef.h
| @@ -117,18 +117,19 @@ /* Global variables of the localedef program.  */
|  extern const char *repertoire_global;
|  extern int max_locarchive_open_retry;
|  extern bool no_archive;
|  extern const char *alias_file;
|  extern bool hard_links;
|  
|  
|  /* Prototypes for a few program-wide used functions.  */
|  #include <programs/xmalloc.h>
| +#include <programs/xasprintf.h>

PS4, Line 126:

Done

|  
|  
|  /* Mark given locale as to be read.  */
|  extern struct localedef_t *add_to_readlist (int locale, const char *name,
|  					    const char *repertoire_name,
|  					    int generate,
|  					    struct localedef_t *copy_locale);
|  
|  /* Find the information for the locale NAME.  */
  
Simon Marchi (Code Review) Dec. 17, 2019, 12:25 p.m. UTC | #17
Adhemerval Zanella has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303
......................................................................


Patch Set 5:

(2 comments)

| --- locale/programs/localedef.c
| +++ locale/programs/localedef.c
| @@ -526,14 +518,8 @@ construct_output_path (char *path)
| -		      (int) (startp - path), path, normal, endp, '\0');
| -
| -      if (n < 0)
| -	return NULL;
| -
| -      endp = result + n - 1;
| +	result = xasprintf ("%s%s/%.*s%s%s/",
| +			    output_prefix ?: "", COMPLOCALEDIR,
| +			    (int) (startp - path), path, normal, endp ?: "");
| +      /* Free the allocated normalized codeset name.  */

PS4, Line 521:

My understanding from that gnulib discussion was although it might not
fail on current hardware that will eventually use 2-complement
operations, it is still undefined behavior in some cases:

$ cat print.c
#include <stdio.h>
#include <inttypes.h>

int main ()
{
  char *path  = (char*) 0x7fffc000;
  char *startp = (char*) 0x80003000;
  printf ("%" PRIdPTR "\n", (int) (startp - path));

  return 0;
}

$ gcc -Wall print.c -o print -m32 -fsanitize=undefined && ./print
print.c:8:3: runtime error: signed integer overflow: -2147471360 -
2147467264 cannot be represented in type 'int'
28672

Not sure if this is indeed an issue, but the resulting code being used
a string length limiter raised a red flag.

| +      free ((char *) normal);
|      }
|    else
|      {
| -      /* This is a user path.  Please note the additional byte in the
| -	 memory allocation.  */
| -      size_t len = strlen (path) + 1;
| -      result = xmalloc (len + 1);
| -      endp = mempcpy (result, path, len) - 1;

 ...

| @@ -559,8 +545,16 @@ construct_output_path (char *path)
| +			      _("cannot create output path \"%s\": %s"),
| +			      result, strerror (errno));
| +	      free (result);
| +	      return NULL;
| +	    }
| +	}
| +      else
| +	record_verbose (stderr,
| +			_("no write permission to output path \"%s\": %s"),
| +			result, strerror (errno));

PS4, Line 554:

Fair enough.

| +    }
|  
|    return result;
|  }
|  
|  
| -/* Normalize codeset name.  There is no standard for the codeset
| -   names.  Normalization allows the user to use any of the common
| -   names.  */
  
Simon Marchi (Code Review) Dec. 17, 2019, 12:27 p.m. UTC | #18
Adhemerval Zanella has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303
......................................................................


Patch Set 5: Code-Review+2

(1 comment)

| --- locale/programs/localedef.c
| +++ locale/programs/localedef.c
| @@ -601,3 +591,16 @@ normalize_codeset (const char *codeset, size_t name_len)
| -    }
| -
| -  return (const char *) retval;
| +  /* If there were only digits we assume it's an ISO standard and we will
| +     prefix with 'iso' so include space for that.  */
| +  wp = retval = xasprintf ("%s%.*s", only_digit ? "iso" : "", len, "");
| +
| +  /* Skip "iso".  */
| +  if (only_digit)
| +    wp += 3;

PS5, Line 597:

Ok.

| +
| +  /* Lowercase all characters. */
| +  for (cnt = 0; cnt < name_len; ++cnt)
| +    if (isalpha (codeset[cnt]))
| +      *wp++ = tolower (codeset[cnt]);
| +    else if (isdigit (codeset[cnt]))
| +      *wp++ = codeset[cnt];
| +
| +  /* Return allocated and converted name for caller to free.  */
  
Simon Marchi (Code Review) Dec. 19, 2019, 4:37 a.m. UTC | #19
Carlos O'Donell has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303
......................................................................


Patch Set 6:

(1 comment)

I fixed the buffer-overflow I introduced in the refactor in patchset 5. Only valgrind catches it, but I added a new tests-container test to verify the normalization of codesets is as expected and that has 9 new sub-tests for that verification.

Adhemerval, Would you mind having another look?

| --- locale/programs/localedef.c
| +++ locale/programs/localedef.c
| @@ -503,17 +503,11 @@ construct_output_path (char *path)
|  	{
|  	  /* We found a codeset specification.  Now find the end.  */
|  	  endp = ++startp;
| +
| +	  /* Stop at the first '@', and don't normalize anything past that.  */
|  	  while (*endp != '\0' && *endp != '@')
|  	    ++endp;
|  
|  	  if (endp > startp)
|  	    normal = normalize_codeset (startp, endp - startp);

PS5, Line 512:

There is a bug here I need to fix. If we normalize, then the string
length is potentially shorter or longer.

|  	}
| -      else
| -	/* This is to keep gcc quiet.  */
| -	endp = NULL;
| -
| -      /* We put an additional '\0' at the end of the string because at
| -	 the end of the function we need another byte for the trailing
| -	 '/'.  */
| -      ssize_t n;
  
Simon Marchi (Code Review) Dec. 31, 2019, 2:48 p.m. UTC | #20
Adhemerval Zanella has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/303
......................................................................


Patch Set 7: Code-Review+2
  

Patch

diff --git a/locale/programs/localedef.c b/locale/programs/localedef.c
index 3dcf15f..965751f 100644
--- a/locale/programs/localedef.c
+++ b/locale/programs/localedef.c
@@ -526,7 +526,11 @@ 
 		      (int) (startp - path), path, normal, endp, '\0');
 
       if (n < 0)
-	return NULL;
+	{
+	  record_verbose (stderr,
+			  _("failed to allocate space for compiled locale path"));
+	  return NULL;
+	}
 
       endp = result + n - 1;
     }
@@ -546,13 +550,23 @@ 
   errno = 0;
 
   if (no_archive && euidaccess (result, W_OK) == -1)
-    /* Perhaps the directory does not exist now.  Try to create it.  */
-    if (errno == ENOENT)
-      {
-	errno = 0;
-	if (mkdir (result, 0777) < 0)
-	  return NULL;
-      }
+    {
+      /* Perhaps the directory does not exist now.  Try to create it.  */
+      if (errno == ENOENT)
+	{
+	  errno = 0;
+	  if (mkdir (result, 0777) < 0)
+	    {
+	      record_verbose (stderr,
+			      _("cannot create output path \"%s\": %s"),
+			      result, strerror (errno));
+	      return NULL;
+	    }
+	}
+      else
+	record_verbose (stderr, _("no write permission to output path: %s"),
+			strerror (errno));
+    }
 
   *endp++ = '/';
   *endp = '\0';