[v22,7/9] posix: check for collisions in getopt_long

Message ID 5e2bb861a461cf836105efab8cadf833ad3155b4.1776957778.git.vivien@planete-kraus.eu (mailing list archive)
State New
Headers
Series Support translated long option names in getopt and argp |

Checks

Context Check Description
redhat-pt-bot/TryBot-apply_patch success Patch applied to master at the time it was sent
redhat-pt-bot/TryBot-still_applies warning Patch no longer applies to master

Commit Message

Vivien Kraus April 23, 2026, 4:04 p.m. UTC
  There are 2 kinds of problems that can arise with translations:
1. The developer introduces a new option name that collides with a
translation of another option;
2. The translator gave the same translation to 2 different options.

Both are bugs, as the program behavior changes while the invocation is
the same.  With this check, those bugs are detected at run-time, and
getopt_long returns '?' with optind set to 1, after printing a message
to stderr.  The message helps the translator know what to change to
fix the issue.

While the checks do not do anything if translations are disabled, they
necessarily loop over all pairs of distinct options otherwise, so it
may be costly for a program with many arguments.  The proposal is to
only run the test during the first call to getopt_long, and never try
again after.  If we failed repeatedly until collisions were resolved,
then the common pattern to continue processing options until the
return value is -1 would lead to an infinite loop.

Since getopt does not have an initialization function, knowing whether
this is the first time we parse arguments is not trivial.  The
solution is this:
- for reentrant use: we add a boolean to the state, initially set to
false, and set to true once the first call returns.
- for non-reentrant use: we add a global (but not exported) boolean,
initially set to false, set to false when we call
getopt_long_enable_translations, and set to true when we disable the
translations or a non-reentrant getopt_long version returns.

Another approach would be to only check collisions when an option is
recognized (while processing argv), but the translator would then have
to try all the program options to detect the collisions.

Another approach would be to only check collisions with the argp API,
since it has a proper initialization method, but it is useful also for
getopt_long users.
---
 manual/getopt.texi                 |  11 ++-
 posix/Makefile                     |  11 +++
 posix/getopt.c                     | 104 ++++++++++++++++++++++-
 posix/getopt1.c                    |  28 +++++--
 posix/getopt_int.h                 |  10 ++-
 posix/tst-getopt_long_collision.c  | 128 +++++++++++++++++++++++++++++
 posix/tst-getopt_long_collision.po |  32 ++++++++
 posix/tstgetoptl.c                 |  11 +--
 posix/tstgetoptl.po                |   6 --
 9 files changed, 317 insertions(+), 24 deletions(-)
 create mode 100644 posix/tst-getopt_long_collision.c
 create mode 100644 posix/tst-getopt_long_collision.po
  

Patch

diff --git a/manual/getopt.texi b/manual/getopt.texi
index bd58e1b7d8..dbee16b62e 100644
--- a/manual/getopt.texi
+++ b/manual/getopt.texi
@@ -23,7 +23,9 @@  use this facility, your program must include the header file
 @deftypevar int opterr
 @standards{POSIX.2, unistd.h}
 If the value of this variable is nonzero, then @code{getopt} prints an
-error message to the standard error stream if it encounters an unknown
+error message to the standard error stream if it detects a collision
+between an untranslated long option name and a translation of a
+different option, or it encounters an unknown
 option character or an option with a missing required argument.  This is
 the default behavior.  If you set this variable to zero, @code{getopt}
 does not print any messages, but it still returns the character @code{?}
@@ -268,6 +270,13 @@  the default (@pxref{Interface to gettext, , The Interface, gettext,
 the GNU Gettext manual}).  If this is @code{NULL} (the default), the
 translation will be searched in the current text domain.
 
+Once translations have been enabled, the next call to
+@code{getopt_long} or @code{getopt_long_only}, and only this next
+call, will check for collisions between long option name translations.
+If a collision is found, a message is printed to @code{stderr}, this
+next call will fail, returning @code{'?'}, while @code{optind} will be
+set to 1.
+
 @code{getopt_long_enable_translations} returns 0 on success, or -1 and
 sets errno.
 @end deftypefun
diff --git a/posix/Makefile b/posix/Makefile
index 8755f42bdc..e8d5d0661c 100644
--- a/posix/Makefile
+++ b/posix/Makefile
@@ -289,6 +289,7 @@  tests := \
   tst-fork \
   tst-gai_strerror \
   tst-getopt_long1 \
+  tst-getopt_long_collision \
   tst-glob-bz30635 \
   tst-glob-tilde \
   tst-glob_symlinks \
@@ -534,6 +535,7 @@  LOCALES := \
   en_US.UTF-8 \
   es_US.ISO-8859-1 \
   es_US.UTF-8 \
+  fr_FR.UTF-8 \
   ja_JP.EUC-JP \
   tr_TR.UTF-8 \
   # LOCALES
@@ -815,3 +817,12 @@  $(tstgetoptl_mo): tstgetoptl.po
 
 $(objpfx)tstgetoptl.out: $(tstgetoptl_mo) $(gen-locales)
 CFLAGS-tstgetoptl.c += -DOBJPFX=\"$(objpfx)\"
+
+tst_getopt_long_collision_mo = $(objpfx)domaindir/fr_FR/LC_MESSAGES/tst-getopt_long_collision.mo
+$(tst_getopt_long_collision_mo): tst-getopt_long_collision.po
+	$(make-target-directory)
+	msgfmt -o $@T $<
+	mv -f $@T $@
+
+$(objpfx)tst-getopt_long_collision.out: $(tst_getopt_long_collision_mo) $(gen-locales)
+CFLAGS-tst-getopt_long_collision.c += -DOBJPFX=\"$(objpfx)\"
diff --git a/posix/getopt.c b/posix/getopt.c
index 399abeea74..2983fe6ea7 100644
--- a/posix/getopt.c
+++ b/posix/getopt.c
@@ -483,11 +483,93 @@  _getopt_initialize (_GL_UNUSED int argc,
     d->__ordering = REQUIRE_ORDER;
   else
     d->__ordering = PERMUTE;
-
   d->__initialized = 1;
   return optstring;
 }
 
+
+static bool
+has_translation_collisions (const char *domain,
+			    const char *context,
+			    const struct option *long_options,
+			    char *(*do_translate) (const char *__domain,
+						   const char *__context,
+						   const char *__name,
+						   char **__allocated),
+			    bool print_errors,
+			    const char *argv0)
+{
+  /* Otherwise, this is a double loop. */
+  size_t n_options = 0;
+  size_t option_index_a, option_index_b;
+  char *a_buffer = NULL;
+  const char *a_name = NULL;
+  const struct option *option_a;
+  char *b_buffer = NULL;
+  const char *b_name = NULL;
+  const struct option *option_b;
+  bool has_collision = false;
+
+  if (do_translate == NULL || context == NULL)
+    /* Translations are disabled, we can skip.  */
+    return false;
+  /* Count the number of options.  */
+  for (n_options = 0; long_options[n_options].name; n_options++)
+    ;
+  /* Detect collisions between the non-translated name of an option
+     and the translation of a *different* option, or the translations
+     of two different options.  */
+  for (option_index_a = 0;
+       option_index_a < n_options;
+       option_index_a++)
+    {
+      option_a = &(long_options[option_index_a]);
+      a_name = do_translate (domain, context, option_a->name, &a_buffer);
+      for (option_index_b = 0;
+	   option_index_b < n_options;
+	   option_index_b++)
+	if (option_index_b != option_index_a)
+	  {
+	    option_b = &(long_options[option_index_b]);
+	    b_name = do_translate (domain, context, option_b->name, &b_buffer);
+	    if (strcmp (option_a->name, b_name) == 0)
+	      {
+		if (print_errors)
+		  /* Since we do not consider a particular use of an
+		     option, but its general name, we do not know what
+		     prefix it has ("--", "-", or "-W ").  */
+		  fprintf (stderr,
+			   _("%s: you found a translation bug!  "
+			     "domain '%s', context '%s': "
+			     "option *'%s'* exists "
+			     "and option '%s' translates to *'%s'*\n"),
+			   argv0,
+			   domain, context,
+			   option_a->name,
+			   option_b->name, b_name);
+		has_collision = true;
+	      }
+	    if (strcmp (a_name, b_name) == 0
+		&& strcmp (option_a->name, a_name) != 0
+		&& strcmp (option_b->name, b_name) != 0
+		&& option_index_a < option_index_b)
+	      {
+		if (print_errors)
+		  fprintf (stderr,
+			   _("%s: you found a translation bug!  "
+			     "domain '%s', context '%s': "
+			     "both '%s' and '%s' translate to '%s'\n"),
+			   argv0, domain, context,
+			   option_a->name, option_b->name, a_name);
+		has_collision = true;
+	      }
+	    free (b_buffer);
+	  }
+      free (a_buffer);
+    }
+  return has_collision;
+}
+
 /* Scan elements of ARGV (whose length is ARGC) for option characters
    given in OPTSTRING.
 
@@ -563,6 +645,19 @@  _getopt_internal_r (int argc, char **argv, const char *optstring,
   else if (optstring[0] == '-' || optstring[0] == '+')
     optstring++;
 
+  /* Only ever check translations for the first time we call
+     getopt_long, since it is costly.  We cannot check them in
+     _getopt_initialize, because gettext may not be set up yet when it
+     is called.  */
+  if (!d->__translation_collisions_checked)
+    {
+      d->__translation_collisions_checked = true;
+      if (has_translation_collisions (d->opttextdomain, d->optctxt,
+				      longopts, translate, print_errors,
+				      argv[0]))
+	return '?';
+    }
+
   if (optstring[0] == ':')
     print_errors = 0;
 
@@ -788,7 +883,8 @@  _getopt_internal (int argc, char **argv, const char *optstring,
 		  char *(*translate) (const char *, const char *,
 				      const char *, char **),
 		  const char *ctxt,
-		  const char *domain)
+		  const char *domain,
+		  bool translation_collisions_checked)
 {
   int result;
 
@@ -796,6 +892,8 @@  _getopt_internal (int argc, char **argv, const char *optstring,
   getopt_data.opterr = opterr;
   getopt_data.optctxt = ctxt;
   getopt_data.opttextdomain = domain;
+  getopt_data.__translation_collisions_checked =
+    translation_collisions_checked;
 
   result = _getopt_internal_r (argc, argv, optstring, longopts,
 			       longind, long_only, &getopt_data,
@@ -818,7 +916,7 @@  _getopt_internal (int argc, char **argv, const char *optstring,
   {								\
     return _getopt_internal (argc, (char **)argv, optstring,	\
 			     NULL, NULL, 0, POSIXLY_CORRECT,	\
-			     NULL, NULL, NULL);			\
+			     NULL, NULL, NULL, true);		\
   }
 
 #ifdef _LIBC
diff --git a/posix/getopt1.c b/posix/getopt1.c
index cc844a0508..06639c4b96 100644
--- a/posix/getopt1.c
+++ b/posix/getopt1.c
@@ -41,6 +41,11 @@  char *optctxt = NULL;
 
 char *opttextdomain = NULL;
 
+/* This is reset each time we call getopt_long_enable_translations,
+   and set to true as soon as getopt_long is called.  */
+
+bool translation_collisions_checked = false;
+
 /* FIXME: use pgettext_expr.  */
 static char *
 do_translate (const char *domain, const char *context, const char *msgid,
@@ -77,9 +82,13 @@  int
 getopt_long (int argc, char *__getopt_argv_const *argv, const char *options,
 	     const struct option *long_options, int *opt_index)
 {
-  return _getopt_internal (argc, (char **) argv, options, long_options,
-			   opt_index, 0, 0, do_translate,
-			   optctxt, opttextdomain);
+  int c = _getopt_internal (argc, (char **) argv, options, long_options,
+			    opt_index, 0, 0, do_translate,
+			    optctxt, opttextdomain,
+			    translation_collisions_checked);
+  /* Translations are checked at most once. */
+  translation_collisions_checked = true;
+  return c;
 }
 
 int
@@ -101,9 +110,12 @@  getopt_long_only (int argc, char *__getopt_argv_const *argv,
 		  const char *options,
 		  const struct option *long_options, int *opt_index)
 {
-  return _getopt_internal (argc, (char **) argv, options, long_options,
-			   opt_index, 1, 0, do_translate,
-			   optctxt, opttextdomain);
+  int c = _getopt_internal (argc, (char **) argv, options, long_options,
+			    opt_index, 1, 0, do_translate,
+			    optctxt, opttextdomain,
+			    translation_collisions_checked);
+  translation_collisions_checked = true;
+  return c;
 }
 
 int
@@ -122,6 +134,8 @@  disable_translations (void)
   free (opttextdomain);
   optctxt = NULL;
   opttextdomain = NULL;
+  /* No translations so no possibilities for collisions.  */
+  translation_collisions_checked = true;
 }
 
 int
@@ -140,6 +154,8 @@  getopt_long_enable_translations (const char *msgctxt, const char *textdomain)
 	  disable_translations ();
 	  return -1;
 	}
+      /* Next call to getopt_long will check for collisions.  */
+      translation_collisions_checked = false;
     }
   return 0;
 }
diff --git a/posix/getopt_int.h b/posix/getopt_int.h
index a770776dc1..2d7079e770 100644
--- a/posix/getopt_int.h
+++ b/posix/getopt_int.h
@@ -21,6 +21,7 @@ 
 #define _GETOPT_INT_H	1
 
 #include <getopt.h>
+#include <stdbool.h>
 
 /* The translate argument here is optional (can be NULL), it is used
    to avoid depending on the gettext functions in the posix getopt
@@ -31,7 +32,8 @@  extern int _getopt_internal (int ___argc, char **___argv,
 			     int __long_only, int __posixly_correct,
 			     char *(*translate) (const char *, const char *,
 						 const char *, char **),
-			     const char *__optctxt, const char *__optdomain);
+			     const char *__optctxt, const char *__optdomain,
+			     bool __translation_collisions_checked);
 
 
 /* Reentrant versions which can handle parsing multiple argument
@@ -100,6 +102,12 @@  struct _getopt_data
 
   int __first_nonopt;
   int __last_nonopt;
+
+  /* Checking for collision in translations of long options.  */
+
+  /* Checking for collisions is costly; it must compare O(n²) strings,
+     when there are n options.  So, it is only done once.  */
+  bool __translation_collisions_checked;
 };
 
 /* The initializer is necessary to set OPTIND and OPTERR to their
diff --git a/posix/tst-getopt_long_collision.c b/posix/tst-getopt_long_collision.c
new file mode 100644
index 0000000000..2e603ec9f8
--- /dev/null
+++ b/posix/tst-getopt_long_collision.c
@@ -0,0 +1,128 @@ 
+/* Copyright (C) 2026 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <getopt.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <libintl.h>
+#include <locale.h>
+#include <support/support.h>
+#include <support/check.h>
+
+#define PN_(ctxt, str) (str)
+
+/* There are 2 types of collision that can happen: a translation equal
+   to an existing option, or two different options translating to the
+   same thing.
+
+   We test both kinds.  In the first test, we have this setup:
+   foo -> bar
+   bar -> baz
+
+   In the second test, we have this setup:
+   foo -> same
+   bar -> same
+
+   In the third, we don’t translate anything:
+   foo -> foo
+   bar -> bar
+  */
+
+static const struct option options[] =
+  {
+    {"foo", no_argument, NULL, 'f'},
+    {"bar", no_argument, NULL, 'b'},
+    {"help", no_argument, NULL, 'h'},
+    {NULL, 0, NULL, 0}
+  };
+
+static void
+setup_catalog (void)
+{
+  xsetlocale (LC_MESSAGES, "fr_FR.UTF-8");
+  TEST_VERIFY_EXIT (
+      bindtextdomain ("tst-getopt_long_collision", OBJPFX "domaindir")
+      != NULL);
+  TEST_VERIFY_EXIT (textdomain ("tst-getopt_long_collision") != NULL);
+  /* Check that the catalog is OK: */
+  TEST_COMPARE_STRING (dgettext ("tst-getopt_long_collision", "kind 1\004foo"),
+		       "bar");
+  TEST_COMPARE_STRING (dgettext ("tst-getopt_long_collision", "kind 1\004bar"),
+		       "baz");
+  TEST_COMPARE_STRING (dgettext ("tst-getopt_long_collision", "kind 2\004foo"),
+		       "same");
+  TEST_COMPARE_STRING (dgettext ("tst-getopt_long_collision", "kind 2\004bar"),
+		       "same");
+  TEST_COMPARE_STRING (dgettext ("tst-getopt_long_collision", "kind 3\004foo"),
+		       "kind 3\004foo");
+  TEST_COMPARE_STRING (dgettext ("tst-getopt_long_collision", "kind 3\004bar"),
+		       "kind 3\004bar");
+}
+
+static void
+do_test (int kind, int expected, int optind_after_first_run,
+	 int expected_second_run)
+{
+  /* Check test --help with one of the 3 tests.  We expect the first
+     call to getopt_long to return expected, while setting optind to
+     optind_after_first_run.  We call getopt_long a second time, and
+     we expect it to return expected_second_run, while setting optind
+     to 2.  */
+  static const char *contexts[] = { "kind 1", "kind 2", "kind 3" };
+  const char *context = contexts[kind];
+  int c;
+  int option_index = 0;
+  const static char *argv[] =
+    { (char *) "tst-getopt_long_collision", "--help", NULL };
+  const static int argc = 2;
+  optind = 0;
+  TEST_VERIFY_EXIT (getopt_long_enable_translations (context, NULL) == 0);
+  fprintf (stderr, "Start test %d.\n", kind + 1);
+  /* First pass should detect the problem immediately, even if we do
+     not trigger the option.  */
+  c = getopt_long (argc, (char **) argv, "fbh", options, &option_index);
+  TEST_COMPARE (c, expected);
+  TEST_COMPARE (optind, optind_after_first_run);
+  /* The translations check is only run once. */
+  fprintf (stderr, "Restart test %d, we expect no problems.\n", kind + 1);
+  c = getopt_long (argc, (char **) argv, "fbh", options, &option_index);
+  TEST_COMPARE (c, expected_second_run);
+  TEST_COMPARE (optind, 2);
+}
+
+static int
+do_all_tests (void)
+{
+  setup_catalog ();
+  /* In failure cases, the first time we parse, we should get '?', and
+     optind stays at 1.  The second time, we parse the first option.
+
+     In the normal case, the first time we parse, we should get the
+     first option and optind jumps directly to 2.  The second time, we
+     parsed everything.
+  */
+  do_test (0, '?', 1, 'h');
+  do_test (1, '?', 1, 'h');
+  do_test (2, 'h', 2, -1);
+  getopt_long_disable_translations ();
+  return 0;
+}
+
+#define TEST_FUNCTION do_all_tests
+#include <support/test-driver.c>
diff --git a/posix/tst-getopt_long_collision.po b/posix/tst-getopt_long_collision.po
new file mode 100644
index 0000000000..2f39001c6d
--- /dev/null
+++ b/posix/tst-getopt_long_collision.po
@@ -0,0 +1,32 @@ 
+# French translations for tst-getopt_long_collision.c
+# Copyright (C) 2026 THE GNU C Library'S COPYRIGHT HOLDER
+# This file is distributed under the same license as the GNU C Library.
+#
+msgid ""
+msgstr ""
+"Project-Id-Version: GNU C Library (see version.h)\n"
+"Report-Msgid-Bugs-To: \n"
+"POT-Creation-Date: 2025-06-06 22:37+0200\n"
+"PO-Revision-Date: 2025-06-06 22:38+0200\n"
+"Language-Team: English (British) <(nothing)>\n"
+"Language: en_GB\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=ASCII\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Plural-Forms: nplurals=2; plural=(n > 1);\n"
+
+msgctxt "kind 1"
+msgid "foo"
+msgstr "bar"
+
+msgctxt "kind 1"
+msgid "bar"
+msgstr "baz"
+
+msgctxt "kind 2"
+msgid "foo"
+msgstr "same"
+
+msgctxt "kind 2"
+msgid "bar"
+msgstr "same"
diff --git a/posix/tstgetoptl.c b/posix/tstgetoptl.c
index ad5755ddbd..bdc20b7e3d 100644
--- a/posix/tstgetoptl.c
+++ b/posix/tstgetoptl.c
@@ -31,13 +31,10 @@ 
    This echoes tstgetopt.c, where --colour was an option name alias
    for --color, so it had to be listed twice.  */
 
-/* This uses the en_GB locale so that colour means color.  As a
-   special case, we also check that non-translated options have
-   precedence over translated options, by translating "optional" as
-   "required".  We also check that getopt only matches translations
-   for actual options, by having the user pass --flavour (which is a
-   known translation of flavor) without the program recognizing a
-   --flavor option.  */
+/* This uses the en_GB locale so that colour means color.  We also
+   check that getopt only matches translations for actual options, by
+   having the user pass --flavour (which is a known translation of
+   flavor) without the program recognizing a --flavor option.  */
 
 #define TRANSLATION_CONTEXT "command-line option"
 
diff --git a/posix/tstgetoptl.po b/posix/tstgetoptl.po
index 7dc15e71f3..b1dc11c468 100644
--- a/posix/tstgetoptl.po
+++ b/posix/tstgetoptl.po
@@ -24,9 +24,3 @@  msgstr "colour"
 msgctxt "command-line option"
 msgid "flavor"
 msgstr "flavour"
-
-# This is to make sure the translator cannot redirect options.
-#: xxx.c:yy
-msgctxt "command-line option"
-msgid "optional"
-msgstr "required"