libstdc++: Expose translation related context in format_contexts.

Message ID 20260407081500.88664-1-tkaminsk@redhat.com
State New
Series libstdc++: Expose translation related context in format_contexts.

Commit Message

Tomasz Kaminski April 7, 2026, 8:14 a.m. UTC
This patch adds a _M_api member to basic_format_context and
basic_format_parse_context, that represents the information about
the TU in which the call was compiled:
* _M_ver represents the C++ standard in which the TU was compiled,
* _M_literal_unicode is true when the TU was compiled with a Unicode
  literal encoding,
* _M_literal_enc is reserved for storing the text_encoding::id value
  for the literal encoding, currently set to zero.
These values are populated by the __current_api<_CharT>() function.
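
For illustration only, the shape of such a context can be sketched in
standalone C++. The names in namespace `sketch` (`ApiCtx`, `current_api`,
`literal_is_utf8`) are hypothetical stand-ins, not the libstdc++ internals,
and the UTF-8 probe is an assumption that replaces
__unicode::__literal_encoding_is_unicode:

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

enum class Version : unsigned char { Api2020, Api2023, Api2026 };

// Hypothetical analogue of _Api_ctx: one byte for the standard version,
// a flag for the literal encoding, and a reserved 32-bit field.
struct ApiCtx {
  Version ver;
  unsigned unused : 23;
  unsigned literal_unicode : 1;
  std::int_least32_t literal_enc;  // reserved, left at zero
};

// Assumption: treat the ordinary literal encoding as Unicode when
// U+00E9 is encoded as the two UTF-8 bytes 0xC3 0xA9.
constexpr bool literal_is_utf8() {
  constexpr char s[] = "\u00E9";
  return sizeof(s) == 3
      && static_cast<unsigned char>(s[0]) == 0xC3
      && static_cast<unsigned char>(s[1]) == 0xA9;
}

// Evaluated in the caller's TU, so it records that TU's settings.
constexpr ApiCtx current_api() {
  ApiCtx api{};
#if __cplusplus > 202302L
  api.ver = Version::Api2026;
#elif __cplusplus > 202002L
  api.ver = Version::Api2023;
#else
  api.ver = Version::Api2020;
#endif
  api.literal_unicode = literal_is_utf8();
  return api;
}

}  // namespace sketch
```

Because the function is constexpr and defined in a header, each TU that
calls it captures its own -std= and literal-encoding settings.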

This would allow formatter instantiations compiled in a different
TU (for example as part of libstdc++.so) to properly handle:
* multi-byte characters used as fill in format-spec, which
  are supported only if the literal encoding is Unicode,
* '?' as a format flag for strings and characters, which is only
  supported since C++23,
* escaping of string parameters, which depends on the literal
  encoding.
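
As a rough sketch of how a formatter compiled elsewhere might consult the
caller's context for the first two points (this patch does not change any
formatter, so the helpers `string_type_allowed` and `fill_allowed` below are
purely hypothetical):

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

enum class Version : unsigned char { Api2020, Api2023, Api2026 };

// Reduced, hypothetical view of the caller's TU properties.
struct ApiCtx {
  Version ver;
  bool literal_unicode;
};

// 's' is always a valid string presentation type; the debug type '?'
// is accepted only when the caller was compiled as C++23 or later.
constexpr bool string_type_allowed(ApiCtx api, char type) {
  if (type == 's')
    return true;
  if (type == '?')
    return api.ver >= Version::Api2023;
  return false;
}

// A single-byte fill is always fine; a multi-byte fill (up to four
// UTF-8 code units) is accepted only for a Unicode literal encoding.
constexpr bool fill_allowed(ApiCtx api, std::size_t fill_bytes) {
  return fill_bytes == 1 || (api.literal_unicode && fill_bytes <= 4);
}

}  // namespace sketch
```

The decision depends only on the context object passed in, not on how the
TU containing these functions was compiled.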

To further aid the above, a new __do_vformat_to overload is extracted.
This overload accepts a format_context& that encodes the TU-specific
properties, and can be exported from libstdc++.

This patch on purpose does not modify the formatters' code, and only
adds new members, as adding them later would be an ABI break.

libstdc++-v3/ChangeLog:

	* include/std/format (__format::_Api_ctx, __format::__current_api):
	Define.
	(basic_format_parse_context::_M_api): Define.
	(basic_format_parse_context::basic_format_parse_context):
	Provide (basic_string_view, size_t) constructor only in C++20.
	Define new internal private constructor accepting _Api_ctx.
	(basic_format_context::_M_api): Define.
	(basic_format_context::basic_format_context): Add additional
	_Api_ctx parameter.
	(_Scanner::_Scanner): Add additional _Api_ctx parameter,
	and forward it to basic_format_parse_context.
	(_Formatting_scanner::_Formatting_scanner): Propagate
	_M_api from basic_format_context.
	(_Checking_scanner::_Checking_scanner): Use __format::__current_api()
	to initialize API.
	(__format::__do_vformat_to): Extract overload accepting
	basic_format_context.
---
I have realized that exporting the vformat specializations correctly requires
much bigger code changes than I am comfortable making this late in stage 4,
as we will need to make the code independent of TU-specific properties (like
encodings). This patch instead adds context members to basic_format_context
and basic_format_parse_context that would allow doing so in the future.

Tested all *format* tests on x86_64-linux. OK for trunk when all tests
pass?

 libstdc++-v3/include/std/format | 209 ++++++++++++++++++++------------
 1 file changed, 133 insertions(+), 76 deletions(-)
  

Comments

Jonathan Wakely April 7, 2026, 9:49 a.m. UTC | #1
On Tue, 7 Apr 2026, 09:15 Tomasz Kamiński, <tkaminsk@redhat.com> wrote:

> This patch adds a _M_api member to basic_format_context and
> basic_format_parse_context, that represents the information about
> the TU in which the call was compiled:
> * _M_ver represents the C++ standard in which TU was compiled,
> * _M_literal_unicode is true when TU was compiled with Unicode
>   literal encoding,
> * _M_literal_enc is reserved for storing text_encoding::id value
>   for literal encoding, currently set to zero.
> These values are then populated by the __current_api<_CharT>() function.
>
> This would allow the formatter instantiations compiled in different
> TU (for example as part of libstdc++.so) to properly handle:
> * multi-byte fill-characters used as fill in format-spec, that
>   are supported only if literal encoding is Unicode,
> * '?' as format flags for string and characters, that is only
>   supported since C++23,
> * escaping of the string parameters, that depends on the literal
>   encoding.
>
> To further aid the above, a new __do_vformat_to overload is extracted.
> This overload accepts a format_context& that encodes the TU-specific
> properties, and can be exported from libstdc++.
>
> This patch on purpose does not modify the formatters code, and only
> adds new members, as adding them later would be ABI break.
>
> libstdc++-v3/ChangeLog:
>
>         * include/std/format (__format::_Api_ctx, __format::__current_api):
>         Define.
>         (basic_format_parse_context::_M_api): Define.
>         (basic_format_parse_context::basic_format_parse_context):
>         Provide (basic_string_view, size_t) constructor only in C++20.
>         Define new internal private constructor accepting _Api_ctx.
>         (basic_format_context::_M_api): Define.
>         (basic_format_context::basic_format_context): Add additional
>         _Api_ctx parameter.
>         (_Scanner::_Scanner): Add additional _Api_ctx parameter,
>         and forward it to basic_format_parse_context.
>         (_Formatting_scanner::_Formatting_scanner): Propagate
>         _M_api from basic_format_context.
>         (_Checking_scanner::_Checking_scanner): Use __format::__current_api()
>         to initialize API.
>         (__format::__do_vformat_to): Extract overload accepting
>         basic_format_context.
> ---
> I have realized that exporting the vformat specializations correctly requires
> much bigger code changes than I am comfortable making this late in stage 4,
> as we will need to make the code independent of TU-specific properties (like
> encodings). This patch instead adds context members to basic_format_context
> and basic_format_parse_context that would allow doing so in the future.
>

An alternative would be to have an inline dispatching function that decides
whether the current TU matches what's in the library (where that will be
the common case) and only uses the explicit instantiations if it matches.
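
A minimal sketch of that dispatching idea, with made-up names and plain
functions standing in for the exported explicit instantiation and the
locally instantiated template (nothing here is existing libstdc++ API, and
the values in `library_props` are assumed):

```cpp
#include <cassert>
#include <string>
#include <string_view>

namespace sketch {

// TU-specific properties that influence formatting behaviour.
struct TuProps {
  long std_version;      // value of __cplusplus in that TU
  bool literal_unicode;  // ordinary literals use a Unicode encoding
};

constexpr bool operator==(TuProps a, TuProps b) {
  return a.std_version == b.std_version
      && a.literal_unicode == b.literal_unicode;
}

// Assumed settings of the library build (hypothetical values).
inline constexpr TuProps library_props{202002L, true};

// Stand-in for an explicit instantiation exported from the library.
inline std::string vformat_exported(std::string_view fmt) {
  return "exported:" + std::string(fmt);
}

// Stand-in for code instantiated implicitly in the caller's TU.
inline std::string vformat_local(std::string_view fmt) {
  return "local:" + std::string(fmt);
}

// Inline dispatcher: use the library copy only when the caller's
// properties match the ones the library was built with.
inline std::string vformat_dispatch(TuProps current, std::string_view fmt) {
  return current == library_props ? vformat_exported(fmt)
                                  : vformat_local(fmt);
}

}  // namespace sketch
```

In a real header the `current` argument would be computed in the calling
TU; it is a parameter here only so that both branches can be exercised.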

I'm not sure this is really a problem I care about solving. If you try to
mix incompatible literal encodings in one program you shouldn't expect
sensible results for code that is sensitive to the literal encoding.

When mixing C++20 and C++23, the C++20 TUs should use the explicit
instantiation which is right for C++20, and C++23 TUs will use an implicit
instantiation of the C++23 definition.

Is there really a problem?

If we can capture the API level without adding any overhead, I suppose
that's acceptable.

If we store the text encoding, what are we going to do with it? Use iconv
to convert the fill character on the fly? To what output encoding?




> Tested all *format* test on x86_64-linux. OK for trunk when all test
> passes?
>
>  libstdc++-v3/include/std/format | 209 ++++++++++++++++++++------------
>  1 file changed, 133 insertions(+), 76 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
> index eca5bd213aa..97d1ecb3ed6 100644
> --- a/libstdc++-v3/include/std/format
> +++ b/libstdc++-v3/include/std/format
> @@ -140,6 +140,37 @@ namespace __format
>        template<typename, typename...> friend struct std::basic_format_string;
>      };
>
> +  // Exposed via basic_format_parse_context, defines the TU specific information
> +  // like encoding and standard version.
> +  struct _Api_ctx
> +  {
> +    enum class _Version : unsigned char
> +    { _Api_2020, _Api_2023, _Api_2026 };
> +
> +    _Version _M_ver;
> +    unsigned _M_unused : 23;
> +    unsigned _M_literal_unicode : 1;
> +    __INT_LEAST32_TYPE__ _M_literal_enc;
> +  };
> +  using enum _Api_ctx::_Version;
> +
> +  template<typename _CharT>
> +    constexpr _Api_ctx
> +    __current_api()
>

Should this be always inline?

> +    {
> +      _Api_ctx __api{};
> +#if __cplusplus > 202302L
> +      __api._M_ver = _Api_2026;
> +#elif __cplusplus > 202002L
> +      __api._M_ver = _Api_2023;
> +#else
> +      __api._M_ver = _Api_2020;
> +#endif
> +      __api._M_literal_unicode
> +       = __unicode::__literal_encoding_is_unicode<_CharT>();
> +      return __api;
> +    }
> +
>  } // namespace __format
>  /// @endcond
>
> @@ -274,7 +305,7 @@ namespace __format
>    { __throw_format_error("format error: failed to parse format-spec"); }
>
>    template<typename _CharT> class _Scanner;
> -
> +  template<typename _Out, typename _CharT> class _Formatting_scanner;
>  } // namespace __format
>    /// @endcond
>
> @@ -408,23 +439,34 @@ namespace __format
>        // This must not be constexpr.
>        static void __invalid_dynamic_spec(const char*);
>
> -      friend __format::_Scanner<_CharT>;
> -#endif
> -
> +#else
>        // This constructor should only be used by the implementation.
>        constexpr explicit
>        basic_format_parse_context(basic_string_view<_CharT> __fmt,
>                                  size_t __num_args) noexcept
>        : _M_begin(__fmt.begin()), _M_end(__fmt.end()), _M_num_args(__num_args)
>        { }
> +#endif
>
>      private:
> +      // This constructor should only be used by the implementation.
> +      constexpr explicit
> +      basic_format_parse_context(__format::_Api_ctx __api,
> +                                basic_string_view<_CharT> __fmt,
> +                                size_t __num_args) noexcept
> +      : _M_api(__api), _M_begin(__fmt.begin()), _M_end(__fmt.end())
> +      , _M_num_args(__num_args)
> +      { }
> +
> +      __format::_Api_ctx _M_api = __format::__current_api<_CharT>();
>

What guarantees this will be initialized by a call to the right version?

Doesn't putting this member first add a lot of wasted padding due to
alignment?


>        iterator _M_begin;
>        iterator _M_end;
>        enum _Indexing { _Unknown, _Manual, _Auto };
>        _Indexing _M_indexing = _Unknown;
>

We already have padding bytes here (and could guarantee that by giving a
fixed underlying type to _Indexing)
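
To make the layout point concrete, here is a small comparison of the two
arrangements. Both structs are simplified, hypothetical stand-ins for
basic_format_parse_context (pointer iterators, invented member names), and
the actual sizes depend on the target ABI:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

// Hypothetical analogue of _Api_ctx from the patch.
struct ApiCtx {
  unsigned char ver;
  unsigned unused : 23;
  unsigned literal_unicode : 1;
  std::int_least32_t literal_enc;
};

// Layout as proposed: the context comes first and the indexing enum
// keeps its default underlying type.
struct CtxFirst {
  ApiCtx api;
  const char* begin;
  const char* end;
  enum Indexing { Unknown, Manual, Auto } indexing;
  std::size_t next_arg_id;
  std::size_t num_args;
};

// Alternative: give the enum a one-byte underlying type and place the
// new fields right after it, where padding would otherwise sit.
struct CtxPacked {
  const char* begin;
  const char* end;
  enum Indexing : unsigned char { Unknown, Manual, Auto } indexing;
  unsigned char ver;
  unsigned char literal_unicode;
  std::int_least32_t literal_enc;
  std::size_t next_arg_id;
  std::size_t num_args;
};

}  // namespace sketch
```

Comparing `sizeof(sketch::CtxFirst)` with `sizeof(sketch::CtxPacked)` on the
target of interest shows whether reusing the existing padding saves space.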

>        size_t _M_next_arg_id = 0;
>        size_t _M_num_args = 0;
> +
> +      friend __format::_Scanner<_CharT>;
>      };
>
>  /// @cond undocumented
> @@ -4927,18 +4969,21 @@ namespace __format
>      {
>        static_assert( output_iterator<_Out, const _CharT&> );
>
> +      __format::_Api_ctx  _M_api;
>        basic_format_args<basic_format_context> _M_args;
>        _Out _M_out;
>        __format::_Optional_locale _M_loc;
>
> -      basic_format_context(basic_format_args<basic_format_context> __args,
> +      basic_format_context(__format::_Api_ctx __api,
> +                          basic_format_args<basic_format_context> __args,
>                            _Out __out)
> -      : _M_args(__args), _M_out(std::move(__out))
> +      : _M_api(__api), _M_args(__args), _M_out(std::move(__out))
>        { }
>
> -      basic_format_context(basic_format_args<basic_format_context> __args,
> +      basic_format_context(__format::_Api_ctx __api,
> +                          basic_format_args<basic_format_context> __args,
>                            _Out __out, const std::locale& __loc)
> -      : _M_args(__args), _M_out(std::move(__out)), _M_loc(__loc)
> +      : _M_api(__api), _M_args(__args), _M_out(std::move(__out)), _M_loc(__loc)
>        { }
>
>        // _GLIBCXX_RESOLVE_LIB_DEFECTS
> @@ -4954,6 +4999,7 @@ namespace __format
>                                   const locale*);
>
>        friend __format::__formatter_chrono<_CharT>;
> +      friend __format::_Formatting_scanner<_Out, _CharT>;
>
>      public:
>        ~basic_format_context() = default;
> @@ -4998,8 +5044,9 @@ namespace __format
>        } _M_pc;
>
>        constexpr explicit
> -      _Scanner(basic_string_view<_CharT> __str, size_t __nargs = (size_t)-1)
> -      : _M_pc(__str, __nargs)
> +      _Scanner(_Api_ctx __api, basic_string_view<_CharT> __str,
> +              size_t __nargs = (size_t)-1)
> +      : _M_pc(__api, __str, __nargs)
>        { }
>
>        constexpr iterator begin() const noexcept { return _M_pc.begin(); }
> @@ -5115,7 +5162,7 @@ namespace __format
>      public:
>        _Formatting_scanner(basic_format_context<_Out, _CharT>& __fc,
>                           basic_string_view<_CharT> __str)
> -      : _Scanner<_CharT>(__str), _M_fc(__fc)
> +      : _Scanner<_CharT>(__fc._M_api, __str), _M_fc(__fc)
>        { }
>
>      private:
> @@ -5176,7 +5223,8 @@ namespace __format
>      public:
>        consteval
>        _Checking_scanner(basic_string_view<_CharT> __str)
> -      : _Scanner<_CharT>(__str, sizeof...(_Args))
> +      : _Scanner<_CharT>(__format::__current_api<_CharT>(),
>

This is consteval so should use the right version for the current TU.

> +                        __str, sizeof...(_Args))
>        {
>  #if __cpp_lib_format >= 202305L
>         this->_M_pc._M_types = _M_types.data();
> @@ -5219,82 +5267,91 @@ namespace __format
>  #endif
>      };
>
> -  template<typename _Out, typename _CharT, typename _Context>
> -    inline _Out
> -    __do_vformat_to(_Out __out, basic_string_view<_CharT> __fmt,
> -                   const basic_format_args<_Context>& __args,
> -                   const locale* __loc)
> +  template<typename _CharT>
> +    _Sink_iter<_CharT>
> +    __do_vformat_to(_Sink_iter<_CharT> __out, basic_string_view<_CharT> __fmt,
> +                   __format_context<_CharT>& __ctx)
>      {
> -      if constexpr (is_same_v<_Out, _Sink_iter<_CharT>>)
> -       {
> -         if constexpr (is_same_v<_CharT, char>)
> -           // Fast path for "{}" format strings and simple format arg types.
> -           if (__fmt.size() == 2 && __fmt[0] == '{' && __fmt[1] == '}')
> -             {
> -               bool __done = false;
> -               __format::__visit_format_arg([&](auto& __arg) {
> -                 using _Tp = remove_cvref_t<decltype(__arg)>;
> -                 if constexpr (is_same_v<_Tp, bool>)
> +      if constexpr (is_same_v<_CharT, char>)
> +       // Fast path for "{}" format strings and simple format arg types.
> +       if (__fmt.size() == 2 && __fmt[0] == '{' && __fmt[1] == '}')
> +         {
> +           bool __done = false;
> +           __format::__visit_format_arg([&](auto& __arg) {
> +             using _Tp = remove_cvref_t<decltype(__arg)>;
> +             if constexpr (is_same_v<_Tp, bool>)
> +               {
> +                 size_t __len = 4 + !__arg;
> +                 const char* __chars[] = { "false", "true" };
> +                 if (auto __res = __out._M_reserve(__len))
>                     {
> -                     size_t __len = 4 + !__arg;
> -                     const char* __chars[] = { "false", "true" };
> -                     if (auto __res = __out._M_reserve(__len))
> -                       {
> -                         __builtin_memcpy(__res.get(), __chars[__arg], __len);
> -                         __res._M_bump(__len);
> -                         __done = true;
> -                       }
> +                     __builtin_memcpy(__res.get(), __chars[__arg], __len);
> +                     __res._M_bump(__len);
> +                     __done = true;
>                     }
> -                 else if constexpr (is_same_v<_Tp, char>)
> +               }
> +             else if constexpr (is_same_v<_Tp, char>)
> +               {
> +                 if (auto __res = __out._M_reserve(1))
>                     {
> -                     if (auto __res = __out._M_reserve(1))
> -                       {
> -                         *__res.get() = __arg;
> -                         __res._M_bump(1);
> -                         __done = true;
> -                       }
> +                     *__res.get() = __arg;
> +                     __res._M_bump(1);
> +                     __done = true;
>                     }
> -                 else if constexpr (is_integral_v<_Tp>)
> +               }
> +             else if constexpr (is_integral_v<_Tp>)
> +               {
> +                 make_unsigned_t<_Tp> __uval;
> +                 const bool __neg = __arg < 0;
> +                 if (__neg)
> +                   __uval = make_unsigned_t<_Tp>(~__arg) + 1u;
> +                 else
> +                   __uval = __arg;
> +                 const auto __n = __detail::__to_chars_len(__uval);
> +                 if (auto __res = __out._M_reserve(__n + __neg))
>                     {
> -                     make_unsigned_t<_Tp> __uval;
> -                     const bool __neg = __arg < 0;
> -                     if (__neg)
> -                       __uval = make_unsigned_t<_Tp>(~__arg) + 1u;
> -                     else
> -                       __uval = __arg;
> -                     const auto __n = __detail::__to_chars_len(__uval);
> -                     if (auto __res = __out._M_reserve(__n + __neg))
> -                       {
> -                         auto __ptr = __res.get();
> -                         *__ptr = '-';
> -                         __detail::__to_chars_10_impl(__ptr + (int)__neg, __n,
> -                                                      __uval);
> -                         __res._M_bump(__n + __neg);
> -                         __done = true;
> -                       }
> +                     auto __ptr = __res.get();
> +                     *__ptr = '-';
> +                     __detail::__to_chars_10_impl(__ptr + (int)__neg, __n,
> +                                                  __uval);
> +                     __res._M_bump(__n + __neg);
> +                     __done = true;
>                     }
> -                 else if constexpr (is_convertible_v<_Tp, string_view>)
> +               }
> +             else if constexpr (is_convertible_v<_Tp, string_view>)
> +               {
> +                 string_view __sv = __arg;
> +                 if (auto __res = __out._M_reserve(__sv.size()))
>                     {
> -                     string_view __sv = __arg;
> -                     if (auto __res = __out._M_reserve(__sv.size()))
> -                       {
> -                         __builtin_memcpy(__res.get(), __sv.data(), __sv.size());
> -                         __res._M_bump(__sv.size());
> -                         __done = true;
> -                       }
> +                     __builtin_memcpy(__res.get(), __sv.data(), __sv.size());
> +                     __res._M_bump(__sv.size());
> +                     __done = true;
>                     }
> -               }, __args.get(0));
> +               }
> +           }, __ctx.arg(0));
>
> -               if (__done)
> -                 return __out;
> -             }
> +           if (__done)
> +             return __out;
> +         }
>
> +      _Formatting_scanner<_Sink_iter<_CharT>, _CharT> __scanner(__ctx, __fmt);
> +      __scanner._M_scan();
> +      return __out;
> +    }
> +
> +  template<typename _Out, typename _CharT, typename _Context>
> +    _Out
> +    __do_vformat_to(_Out __out, basic_string_view<_CharT> __fmt,
> +                   const basic_format_args<_Context>& __args,
> +                   const locale* __loc)
> +    {
> +      if constexpr (is_same_v<_Out, _Sink_iter<_CharT>>)
> +       {
> +         const auto __api = __format::__current_api<_CharT>();
>           auto __ctx = __loc == nullptr
> -                        ? _Context(__args, __out)
> -                        : _Context(__args, __out, *__loc);
> -         _Formatting_scanner<_Sink_iter<_CharT>, _CharT> __scanner(__ctx, __fmt);
> -         __scanner._M_scan();
> -         return __out;
> +                    ? _Context(__api, __args, __out)
> +                    : _Context(__api, __args, __out, *__loc);
> +         return __do_vformat_to(std::move(__out), __fmt, __ctx);
>         }
>        else if constexpr (__contiguous_char_iter<_CharT, _Out>)
>         {
> --
> 2.53.0
>
>
  
Tomasz Kaminski April 7, 2026, 1:29 p.m. UTC | #2
On Tue, Apr 7, 2026 at 3:00 PM Jonathan Wakely <jwakely.gcc@gmail.com>
wrote:

>
>
> On Tue, 7 Apr 2026, 09:15 Tomasz Kamiński, <tkaminsk@redhat.com> wrote:
>
>> This patch adds a _M_api member to basic_format_context and
>> basic_format_parse_context, that represents the information about
>> the TU in which the call was compiled:
>> * _M_ver represents the C++ standard in which TU was compiled,
>> * _M_literal_unicode is true when TU was compiled with Unicode
>>   literal encoding,
>> * _M_literal_enc is reserved for storing text_encoding::id value
>>   for literal encoding, currently set to zero.
>> These values are then populated by the __current_api<_CharT>() function.
>>
>> This would allow the formatter instantiations compiled in different
>> TU (for example as part of libstdc++.so) to properly handle:
>> * multi-byte fill-characters used as fill in format-spec, that
>>   are supported only if literal encoding is Unicode,
>> * '?' as format flags for string and characters, that is only
>>   supported since C++23,
>> * escaping of the string parameters, that depends on the literal
>>   encoding.
>>
>> To further aid the above, a new __do_vformat_to overload is extracted.
>> This overload accepts a format_context& that encodes the TU-specific
>> properties, and can be exported from libstdc++.
>>
>> This patch on purpose does not modify the formatters code, and only
>> adds new members, as adding them later would be ABI break.
>>
>> libstdc++-v3/ChangeLog:
>>
>>         * include/std/format (__format::_Api_ctx, __format::__current_api):
>>         Define.
>>         (basic_format_parse_context::_M_api): Define.
>>         (basic_format_parse_context::basic_format_parse_context):
>>         Provide (basic_string_view, size_t) constructor only in C++20.
>>         Define new internal private constructor accepting _Api_ctx.
>>         (basic_format_context::_M_api): Define.
>>         (basic_format_context::basic_format_context): Add additional
>>         _Api_ctx parameter.
>>         (_Scanner::_Scanner): Add additional _Api_ctx parameter,
>>         and forward it to basic_format_parse_context.
>>         (_Formatting_scanner::_Formatting_scanner): Propagate
>>         _M_api from basic_format_context.
>>         (_Checking_scanner::_Checking_scanner): Use __format::__current_api()
>>         to initialize API.
>>         (__format::__do_vformat_to): Extract overload accepting
>>         basic_format_context.
>> ---
>> I have realized that exporting the vformat specializations correctly requires
>> much bigger code changes than I am comfortable making this late in stage 4,
>> as we will need to make the code independent of TU-specific properties (like
>> encodings). This patch instead adds context members to basic_format_context
>> and basic_format_parse_context that would allow doing so in the future.
>>
>
> An alternative would be to have an inline dispatching function that
> decides whether the current TU matches what's in the library (where that
> will be the common case) and only uses the explicit instantiations if it
> matches.
>
That seems reasonable for the Unicode encoding, but it does not solve the
problem of multiple standards.

>
> I'm not sure this is really a problem I care about solving. If you try to
> mix incompatible literal encodings in one program you shouldn't expect
> sensible results for code that is sensitive to the literal encoding.
>
> When mixing C++20 and C++23, the C++20 TUs should use the explicit
> instantiation which is right for C++20, and C++23 TUs will use an implicit
> instantiation of the C++23 definition.
>
This works only on the surface: C++20 will use __vformat_impl_20 and
C++23 will use __vformat_impl_23, defined in the format-inst-20 and
format-inst-23 source files.
Both of these files will instantiate `__formatter_str` under the same name.
When they are combined into `libstdc++.so`, the linker keeps only one
definition, so one of the standards will get incorrect behavior. To avoid
this problem we would need to ABI tag each used formatter in some manner and
apply that tag virally to any formatter referenced from these functions.
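
To illustrate the naming problem and the kind of fix described above, here
is a simplified sketch. It uses separate namespaces rather than an ABI tag
attribute to give the two variants distinct symbol names, and all
identifiers in `sketch` are hypothetical:

```cpp
#include <cassert>

namespace sketch {

// If both variants were named sketch::formatter_str, two object files
// built with different -std= options would define the same symbol with
// different behaviour, and the linker would keep only one of them.
// Distinct names let both definitions coexist in one library.
namespace v20 {
  struct formatter_str {
    // C++20 rules: only the 's' presentation type.
    static constexpr bool accepts(char type) { return type == 's'; }
  };
}

namespace v23 {
  struct formatter_str {
    // C++23 rules: additionally accept the debug type '?'.
    static constexpr bool accepts(char type) {
      return type == 's' || type == '?';
    }
  };
}

// Each entry point refers to the formatter that matches its standard.
inline bool vformat_impl_20(char type) {
  return v20::formatter_str::accepts(type);
}

inline bool vformat_impl_23(char type) {
  return v23::formatter_str::accepts(type);
}

}  // namespace sketch
```

The cost is the viral part: every helper reachable from these entry points
needs the same treatment, otherwise the shared name problem reappears one
level down.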


>
> Is there really a problem?
>
> If we can capture the API level without adding any overhead, I suppose
> that's acceptable.
>
> If we store the text encoding, what are we going to do with it? Use iconv
> to convert the fill character on the fly? To what output encoding?
>
This is for the future, in case we want to handle string escaping for
non-Unicode encodings better than by giving them the same behavior as ASCII.
We cannot add any additional fields to basic_format_parse_context and
basic_format_context in the future; this is why I am reserving space for it.

// Also could you take a look at:
https://gcc.gnu.org/pipermail/libstdc++/2026-April/066030.html


>
>
>
>
>> Tested all *format* test on x86_64-linux. OK for trunk when all test
>> passes?
>>
>>  libstdc++-v3/include/std/format | 209 ++++++++++++++++++++------------
>>  1 file changed, 133 insertions(+), 76 deletions(-)
>>
>> diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
>> index eca5bd213aa..97d1ecb3ed6 100644
>> --- a/libstdc++-v3/include/std/format
>> +++ b/libstdc++-v3/include/std/format
>> @@ -140,6 +140,37 @@ namespace __format
>>        template<typename, typename...> friend struct
>> std::basic_format_string;
>>      };
>>
>> +  // Exposed via basic_format_parse_context, defines the TU specific information
>> +  // like encoding and standard version.
>> +  struct _Api_ctx
>> +  {
>> +    enum class _Version : unsigned char
>> +    { _Api_2020, _Api_2023, _Api_2026 };
>> +
>> +    _Version _M_ver;
>> +    unsigned _M_unused : 23;
>> +    unsigned _M_literal_unicode : 1;
>> +    __INT_LEAST32_TYPE__ _M_literal_enc;
>> +  };
>> +  using enum _Api_ctx::_Version;
>> +
>> +  template<typename _CharT>
>> +    constexpr _Api_ctx
>> +    __current_api()
>>
>
> Should this be always inline?
>
> +    {
>> +      _Api_ctx __api{};
>> +#if __cplusplus > 202302L
>> +      __api._M_ver = _Api_2026;
>> +#elif __cplusplus > 202002L
>> +      __api._M_ver = _Api_2023;
>> +#else
>> +      __api._M_ver = _Api_2020;
>> +#endif
>> +      __api._M_literal_unicode
>> +       = __unicode::__literal_encoding_is_unicode<_CharT>();
>> +      return __api;
>> +    }
>> +
>>  } // namespace __format
>>  /// @endcond
>>
>> @@ -274,7 +305,7 @@ namespace __format
>>    { __throw_format_error("format error: failed to parse format-spec"); }
>>
>>    template<typename _CharT> class _Scanner;
>> -
>> +  template<typename _Out, typename _CharT> class _Formatting_scanner;
>>  } // namespace __format
>>    /// @endcond
>>
>> @@ -408,23 +439,34 @@ namespace __format
>>        // This must not be constexpr.
>>        static void __invalid_dynamic_spec(const char*);
>>
>> -      friend __format::_Scanner<_CharT>;
>> -#endif
>> -
>> +#else
>>        // This constructor should only be used by the implementation.
>>        constexpr explicit
>>        basic_format_parse_context(basic_string_view<_CharT> __fmt,
>>                                  size_t __num_args) noexcept
>>        : _M_begin(__fmt.begin()), _M_end(__fmt.end()), _M_num_args(__num_args)
>>        { }
>> +#endif
>>
>>      private:
>> +      // This constructor should only be used by the implementation.
>> +      constexpr explicit
>> +      basic_format_parse_context(__format::_Api_ctx __api,
>> +                                basic_string_view<_CharT> __fmt,
>> +                                size_t __num_args) noexcept
>> +      : _M_api(__api), _M_begin(__fmt.begin()), _M_end(__fmt.end())
>> +      , _M_num_args(__num_args)
>> +      { }
>> +
>> +      __format::_Api_ctx _M_api = __format::__current_api<_CharT>();
>>
>
> What guarantees this will be initialized by a call to the right version?
>
This is only used by the basic_format_parse_context(string) constructor, which
is mostly used for user-defined formatters. We may want to define these
constructors as always inline.

>
> Doesn't putting this member first add a lot of wasted padding due to
> alignment?
>
I do not think the size of basic_format_parse_context and basic_format_context
is relevant to anybody. But the struct is 64 bits, so it should not add extra
padding.

>
>
>        iterator _M_begin;
>>        iterator _M_end;
>>        enum _Indexing { _Unknown, _Manual, _Auto };
>>        _Indexing _M_indexing = _Unknown;
>>
>
> We already have padding bytes here (and could guarantee that by giving a
> fixed underlying type to _Indexing)
>
>        size_t _M_next_arg_id = 0;
>>        size_t _M_num_args = 0;
>> +
>> +      friend __format::_Scanner<_CharT>;
>>      };
>>
>>  /// @cond undocumented
>> @@ -4927,18 +4969,21 @@ namespace __format
>>      {
>>        static_assert( output_iterator<_Out, const _CharT&> );
>>
>> +      __format::_Api_ctx  _M_api;
>>        basic_format_args<basic_format_context> _M_args;
>>        _Out _M_out;
>>        __format::_Optional_locale _M_loc;
>>
>> -      basic_format_context(basic_format_args<basic_format_context>
>> __args,
>> +      basic_format_context(__format::_Api_ctx __api,
>> +                          basic_format_args<basic_format_context> __args,
>>                            _Out __out)
>> -      : _M_args(__args), _M_out(std::move(__out))
>> +      : _M_api(__api), _M_args(__args), _M_out(std::move(__out))
>>        { }
>>
>> -      basic_format_context(basic_format_args<basic_format_context>
>> __args,
>> +      basic_format_context(__format::_Api_ctx __api,
>> +                          basic_format_args<basic_format_context> __args,
>>                            _Out __out, const std::locale& __loc)
>> -      : _M_args(__args), _M_out(std::move(__out)), _M_loc(__loc)
>> +      : _M_api(__api), _M_args(__args), _M_out(std::move(__out)), _M_loc(__loc)
>>        { }
>>
>>        // _GLIBCXX_RESOLVE_LIB_DEFECTS
>> @@ -4954,6 +4999,7 @@ namespace __format
>>                                   const locale*);
>>
>>        friend __format::__formatter_chrono<_CharT>;
>> +      friend __format::_Formatting_scanner<_Out, _CharT>;
>>
>>      public:
>>        ~basic_format_context() = default;
>> @@ -4998,8 +5044,9 @@ namespace __format
>>        } _M_pc;
>>
>>        constexpr explicit
>> -      _Scanner(basic_string_view<_CharT> __str, size_t __nargs = (size_t)-1)
>> -      : _M_pc(__str, __nargs)
>> +      _Scanner(_Api_ctx __api, basic_string_view<_CharT> __str,
>> +              size_t __nargs = (size_t)-1)
>> +      : _M_pc(__api, __str, __nargs)
>>        { }
>>
>>        constexpr iterator begin() const noexcept { return _M_pc.begin(); }
>> @@ -5115,7 +5162,7 @@ namespace __format
>>      public:
>>        _Formatting_scanner(basic_format_context<_Out, _CharT>& __fc,
>>                           basic_string_view<_CharT> __str)
>> -      : _Scanner<_CharT>(__str), _M_fc(__fc)
>> +      : _Scanner<_CharT>(__fc._M_api, __str), _M_fc(__fc)
>>        { }
>>
>>      private:
>> @@ -5176,7 +5223,8 @@ namespace __format
>>      public:
>>        consteval
>>        _Checking_scanner(basic_string_view<_CharT> __str)
>> -      : _Scanner<_CharT>(__str, sizeof...(_Args))
>> +      : _Scanner<_CharT>(__format::__current_api<_CharT>(),
>>
>
> This is consteval so should use the right version for the current TU.
>
> +                        __str, sizeof...(_Args))
>>        {
>>  #if __cpp_lib_format >= 202305L
>>         this->_M_pc._M_types = _M_types.data();
>> @@ -5219,82 +5267,91 @@ namespace __format
>>  #endif
>>      };
>>
>> -  template<typename _Out, typename _CharT, typename _Context>
>> -    inline _Out
>> -    __do_vformat_to(_Out __out, basic_string_view<_CharT> __fmt,
>> -                   const basic_format_args<_Context>& __args,
>> -                   const locale* __loc)
>> +  template<typename _CharT>
>> +    _Sink_iter<_CharT>
>> +    __do_vformat_to(_Sink_iter<_CharT> __out, basic_string_view<_CharT> __fmt,
>> +                   __format_context<_CharT>& __ctx)
>>      {
>> -      if constexpr (is_same_v<_Out, _Sink_iter<_CharT>>)
>> -       {
>> -         if constexpr (is_same_v<_CharT, char>)
>> -           // Fast path for "{}" format strings and simple format arg
>> types.
>> -           if (__fmt.size() == 2 && __fmt[0] == '{' && __fmt[1] == '}')
>> -             {
>> -               bool __done = false;
>> -               __format::__visit_format_arg([&](auto& __arg) {
>> -                 using _Tp = remove_cvref_t<decltype(__arg)>;
>> -                 if constexpr (is_same_v<_Tp, bool>)
>> +      if constexpr (is_same_v<_CharT, char>)
>> +       // Fast path for "{}" format strings and simple format arg types.
>> +       if (__fmt.size() == 2 && __fmt[0] == '{' && __fmt[1] == '}')
>> +         {
>> +           bool __done = false;
>> +           __format::__visit_format_arg([&](auto& __arg) {
>> +             using _Tp = remove_cvref_t<decltype(__arg)>;
>> +             if constexpr (is_same_v<_Tp, bool>)
>> +               {
>> +                 size_t __len = 4 + !__arg;
>> +                 const char* __chars[] = { "false", "true" };
>> +                 if (auto __res = __out._M_reserve(__len))
>>                     {
>> -                     size_t __len = 4 + !__arg;
>> -                     const char* __chars[] = { "false", "true" };
>> -                     if (auto __res = __out._M_reserve(__len))
>> -                       {
>> -                         __builtin_memcpy(__res.get(), __chars[__arg],
>> __len);
>> -                         __res._M_bump(__len);
>> -                         __done = true;
>> -                       }
>> +                     __builtin_memcpy(__res.get(), __chars[__arg],
>> __len);
>> +                     __res._M_bump(__len);
>> +                     __done = true;
>>                     }
>> -                 else if constexpr (is_same_v<_Tp, char>)
>> +               }
>> +             else if constexpr (is_same_v<_Tp, char>)
>> +               {
>> +                 if (auto __res = __out._M_reserve(1))
>>                     {
>> -                     if (auto __res = __out._M_reserve(1))
>> -                       {
>> -                         *__res.get() = __arg;
>> -                         __res._M_bump(1);
>> -                         __done = true;
>> -                       }
>> +                     *__res.get() = __arg;
>> +                     __res._M_bump(1);
>> +                     __done = true;
>>                     }
>> -                 else if constexpr (is_integral_v<_Tp>)
>> +               }
>> +             else if constexpr (is_integral_v<_Tp>)
>> +               {
>> +                 make_unsigned_t<_Tp> __uval;
>> +                 const bool __neg = __arg < 0;
>> +                 if (__neg)
>> +                   __uval = make_unsigned_t<_Tp>(~__arg) + 1u;
>> +                 else
>> +                   __uval = __arg;
>> +                 const auto __n = __detail::__to_chars_len(__uval);
>> +                 if (auto __res = __out._M_reserve(__n + __neg))
>>                     {
>> -                     make_unsigned_t<_Tp> __uval;
>> -                     const bool __neg = __arg < 0;
>> -                     if (__neg)
>> -                       __uval = make_unsigned_t<_Tp>(~__arg) + 1u;
>> -                     else
>> -                       __uval = __arg;
>> -                     const auto __n = __detail::__to_chars_len(__uval);
>> -                     if (auto __res = __out._M_reserve(__n + __neg))
>> -                       {
>> -                         auto __ptr = __res.get();
>> -                         *__ptr = '-';
>> -                         __detail::__to_chars_10_impl(__ptr +
>> (int)__neg, __n,
>> -                                                      __uval);
>> -                         __res._M_bump(__n + __neg);
>> -                         __done = true;
>> -                       }
>> +                     auto __ptr = __res.get();
>> +                     *__ptr = '-';
>> +                     __detail::__to_chars_10_impl(__ptr + (int)__neg,
>> __n,
>> +                                                  __uval);
>> +                     __res._M_bump(__n + __neg);
>> +                     __done = true;
>>                     }
>> -                 else if constexpr (is_convertible_v<_Tp, string_view>)
>> +               }
>> +             else if constexpr (is_convertible_v<_Tp, string_view>)
>> +               {
>> +                 string_view __sv = __arg;
>> +                 if (auto __res = __out._M_reserve(__sv.size()))
>>                     {
>> -                     string_view __sv = __arg;
>> -                     if (auto __res = __out._M_reserve(__sv.size()))
>> -                       {
>> -                         __builtin_memcpy(__res.get(), __sv.data(),
>> __sv.size());
>> -                         __res._M_bump(__sv.size());
>> -                         __done = true;
>> -                       }
>> +                     __builtin_memcpy(__res.get(), __sv.data(),
>> __sv.size());
>> +                     __res._M_bump(__sv.size());
>> +                     __done = true;
>>                     }
>> -               }, __args.get(0));
>> +               }
>> +           }, __ctx.arg(0));
>>
>> -               if (__done)
>> -                 return __out;
>> -             }
>> +           if (__done)
>> +             return __out;
>> +         }
>>
>> +      _Formatting_scanner<_Sink_iter<_CharT>, _CharT> __scanner(__ctx,
>> __fmt);
>> +      __scanner._M_scan();
>> +      return __out;
>> +    }
>> +
>> +  template<typename _Out, typename _CharT, typename _Context>
>> +    _Out
>> +    __do_vformat_to(_Out __out, basic_string_view<_CharT> __fmt,
>> +                   const basic_format_args<_Context>& __args,
>> +                   const locale* __loc)
>> +    {
>> +      if constexpr (is_same_v<_Out, _Sink_iter<_CharT>>)
>> +       {
>> +         const auto __api = __format::__current_api<_CharT>();
>>           auto __ctx = __loc == nullptr
>> -                        ? _Context(__args, __out)
>> -                        : _Context(__args, __out, *__loc);
>> -         _Formatting_scanner<_Sink_iter<_CharT>, _CharT>
>> __scanner(__ctx, __fmt);
>> -         __scanner._M_scan();
>> -         return __out;
>> +                    ? _Context(__api, __args, __out)
>> +                    : _Context(__api, __args, __out, *__loc);
>> +         return __do_vformat_to(std::move(__out), __fmt, __ctx);
>>         }
>>        else if constexpr (__contiguous_char_iter<_CharT, _Out>)
>>         {
>> --
>> 2.53.0
>>
>>
  
Jonathan Wakely April 7, 2026, 1:54 p.m. UTC | #3
On Tue, 7 Apr 2026, 14:30 Tomasz Kaminski, <tkaminsk@redhat.com> wrote:

>
>
> On Tue, Apr 7, 2026 at 3:00 PM Jonathan Wakely <jwakely.gcc@gmail.com>
> wrote:
>
>>
>>
>> On Tue, 7 Apr 2026, 09:15 Tomasz Kamiński, <tkaminsk@redhat.com> wrote:
>>
>>> This patch adds a _M_api member to basic_format_context and
>>> basic_format_parse_context, that represents the information about
>>> the TU in which the call was compiled:
>>> * _M_ver represents the C++ standard in which TU was compiled,
>>> * _M_literal_unicode is true when TU was compiled with Unicode
>>>   literal encoding,
>>> * _M_literal_enc is reserved for storing text_encoding::id value
>>>   for literal encoding, currently set to zero.
>>> These values are then populated by the __current_api<_CharT>() function.
>>>
>>> This would allow the formatter instantiations compiled in different
>>> TU (for example as part of libstdc++.so) to properly handle:
>>> * multi-byte fill-characters used as fill in format-spec, that
>>>   are supported only if literal encoding is Unicode,
>>> * '?' as format flags for string and characters, that is only
>>>   supported since C++23,
>>> * escaping of the string parameters, that depends on the literal
>>>   encoding.
>>>
>>> To further aid the above, a new __do_vformat_to overload is extracted.
>>> This overload accepts a format_context& that encodes the TU-specific
>>> properties, and can be exported from libstdc++.
>>>
>>> This patch on purpose does not modify the formatters' code, and only
>>> adds new members, as adding them later would be an ABI break.
>>>
>>> libstdc++-v3/ChangeLog:
>>>
>>>         * include/std/format (__format::_Api_ctx,
>>> __format::__current_api):
>>>         Define.
>>>         (basic_format_parse_context::_M_api): Define.
>>>         (basic_format_parse_context::basic_format_parse_context):
>>>         Provide (basic_string_view, size_t) constructor only in C++20.
>>>         Define new internal private constructor accepting _Api_ctx.
>>>         (basic_format_context::_M_api): Define.
>>>         (basic_format_context::basic_format_context): Add additional
>>>         _Api_ctx parameter.
>>>         (_Scanner::_Scanner): Add additional _Api_ctx parameter,
>>>         and forward it to basic_format_parse_context.
>>>         (_Formatting_scanner::_Formatting_scanner): Propagate
>>>         _M_api from basic_format_context.
>>>         (_Checking_scanner::_Checking_scanner): Use
>>> __format::__current_api()
>>>         to initialize API.
>>>         (__format::__do_vformat_to): Extract overload accepting
>>>         basic_format_context.
>>> ---
>>> I have realized that exporting the vformat specializations correctly
>>> requires much bigger code changes than I am comfortable making this
>>> late in stage 4, as we will need to make the code independent of
>>> TU-specific properties (like encodings). This patch instead adds context
>>> members to basic_format_context and basic_format_parse_context that
>>> would allow doing so in the future.
>>>
>>
>> An alternative would be to have an inline dispatching function that
>> decides whether the current TU matches what's in the library (where that
>> will be the common case) and only uses the explicit instantiations if it
>> matches.
>>
> Seems reasonable for Unicode encoding, but does not solve the problem of
> multiple standards.
>
>>
>> I'm not sure this is really a problem I care about solving. If you try to
>> mix incompatible literal encodings in one program you shouldn't expect
>> sensible results for code that is sensitive to the literal encoding.
>>
>> When mixing C++20 and C++23, the C++20 TUs should use the explicit
>> instantiation which is right for C++20, and C++23 TUs will use an implicit
>> instantiation of the C++23 definition.
>>
> This works only on surface level, C++20 will use __vformat_impl_20 and
> C++23 will use __vformat_impl_23, defined in format-inst-20, format-inst-23
> source files.
>

Once we stabilise C++23 I think we should only instantiate the format
functions for C++23. Will anybody really care if they use a dynamic format
string in C++20 code and don't get an exception for using a C++23 format
specifier?


> Both of these files will instantiate `__formatter_str` under the same name.
> When they are combined into `libstdc++.so`, the linker will pick one, and
> one of the standards will get incorrect behavior.
>

Fine, the older standard could get the "wrong" behaviour (where wrong just
means supporting C++23 features when called from C++20 TUs).


> To avoid this problem we will need to ABI tag each used formatter in some
> manner and apply that tag virally to any formatter referenced from these
> functions.
>
>
>>
>> Is there really a problem?
>>
>> If we can capture the API level without adding any overhead, I suppose
>> that's acceptable.
>>
>> If we store the text encoding, what are we going to do with it? Use iconv
>> to convert the fill character on the fly? To what output encoding?
>>
> This is for the future, if we want to handle string escaping for non-Unicode
> encodings better than by giving them the same behavior as ASCII.
> We can add any additional fields to basic_format_parse_context and
> basic_format_context in the future; this is why I am reserving space for it.
>
> // Also could you take a look at:
> https://gcc.gnu.org/pipermail/libstdc++/2026-April/066030.html
>
>
>>
>>
>>
>>
>>> Tested all *format* tests on x86_64-linux. OK for trunk when all tests
>>> pass?
>>>
>>>  libstdc++-v3/include/std/format | 209 ++++++++++++++++++++------------
>>>  1 file changed, 133 insertions(+), 76 deletions(-)
>>>
>>> diff --git a/libstdc++-v3/include/std/format
>>> b/libstdc++-v3/include/std/format
>>> index eca5bd213aa..97d1ecb3ed6 100644
>>> --- a/libstdc++-v3/include/std/format
>>> +++ b/libstdc++-v3/include/std/format
>>> @@ -140,6 +140,37 @@ namespace __format
>>>        template<typename, typename...> friend struct
>>> std::basic_format_string;
>>>      };
>>>
>>> +  // Exposed via basic_format_parse_context, defines the TU specific
>>> information
>>> +  // like encoding and standard version.
>>> +  struct _Api_ctx
>>> +  {
>>> +    enum class _Version : unsigned char
>>> +    { _Api_2020, _Api_2023, _Api_2026 };
>>> +
>>> +    _Version _M_ver;
>>> +    unsigned _M_unused : 23;
>>> +    unsigned _M_literal_unicode : 1;
>>> +    __INT_LEAST32_TYPE__ _M_literal_enc;
>>> +  };
>>> +  using enum _Api_ctx::_Version;
>>> +
>>> +  template<typename _CharT>
>>> +    constexpr _Api_ctx
>>> +    __current_api()
>>>
>>
>> Should this be always inline?
>>
>> +    {
>>> +      _Api_ctx __api{};
>>> +#if __cplusplus > 202302L
>>> +      __api._M_ver = _Api_2026;
>>> +#elif __cplusplus > 202002L
>>> +      __api._M_ver = _Api_2023;
>>> +#else
>>> +      __api._M_ver = _Api_2020;
>>> +#endif
>>> +      __api._M_literal_unicode
>>> +       = __unicode::__literal_encoding_is_unicode<_CharT>();
>>> +      return __api;
>>> +    }
>>> +
>>>  } // namespace __format
>>>  /// @endcond
>>>
>>> @@ -274,7 +305,7 @@ namespace __format
>>>    { __throw_format_error("format error: failed to parse format-spec"); }
>>>
>>>    template<typename _CharT> class _Scanner;
>>> -
>>> +  template<typename _Out, typename _CharT> class _Formatting_scanner;
>>>  } // namespace __format
>>>    /// @endcond
>>>
>>> @@ -408,23 +439,34 @@ namespace __format
>>>        // This must not be constexpr.
>>>        static void __invalid_dynamic_spec(const char*);
>>>
>>> -      friend __format::_Scanner<_CharT>;
>>> -#endif
>>> -
>>> +#else
>>>        // This constructor should only be used by the implementation.
>>>        constexpr explicit
>>>        basic_format_parse_context(basic_string_view<_CharT> __fmt,
>>>                                  size_t __num_args) noexcept
>>>        : _M_begin(__fmt.begin()), _M_end(__fmt.end()),
>>> _M_num_args(__num_args)
>>>        { }
>>> +#endif
>>>
>>>      private:
>>> +      // This constructor should only be used by the implementation.
>>> +      constexpr explicit
>>> +      basic_format_parse_context(__format::_Api_ctx __api,
>>> +                                basic_string_view<_CharT> __fmt,
>>> +                                size_t __num_args) noexcept
>>> +      : _M_api(__api), _M_begin(__fmt.begin()), _M_end(__fmt.end())
>>> +      , _M_num_args(__num_args)
>>> +      { }
>>> +
>>> +      __format::_Api_ctx _M_api = __format::__current_api<_CharT>();
>>>
>>
>> What guarantees this will be initialized by a call to the right version?
>>
> This is only used by the basic_format_parse_context(string) constructor,
> which is mostly used for user-defined formatters. We may want to define
> these constructors as always inline.
>
>>
>> Doesn't putting this member first add a lot of wasted padding due to
>> alignment?
>>
> I do not think the size of basic_format_parse_context and
> basic_format_context is relevant to anybody. But the struct is 64 bits,
> so it should not add extra padding.
>

I don't understand how adding a new byte before the first iterator member
doesn't introduce sizeof(void*)-1 bytes of padding.

Why don't we just give _Indexing a fixed underlying type of unsigned char
and then put the API version after that?
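
For illustration, the layout question above can be checked with a small
stand-alone sketch. All type names below (ApiCtx, ParseCtxOriginal,
ParseCtxApiFirst, ParseCtxVersionInPadding, extra_bytes) are hypothetical
stand-ins that only mirror the shape of the members in the patch; they are not
the libstdc++ definitions, and sizes and padding are implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in mirroring the shape of _Api_ctx from the patch.
struct ApiCtx
{
  enum class Version : unsigned char { Api2020, Api2023, Api2026 };
  Version ver;
  unsigned unused : 23;
  unsigned literal_unicode : 1;
  std::int_least32_t literal_enc;
};

// Shape of the parse context members before the patch.
struct ParseCtxOriginal
{
  const char* begin;
  const char* end;
  enum Indexing { Unknown, Manual, Auto };
  Indexing indexing;
  std::size_t next_arg_id;
  std::size_t num_args;
};

// Shape proposed in the patch: the whole API struct comes first.
struct ParseCtxApiFirst
{
  ApiCtx api;
  const char* begin;
  const char* end;
  enum Indexing { Unknown, Manual, Auto };
  Indexing indexing;
  std::size_t next_arg_id;
  std::size_t num_args;
};

// Shape suggested in review: one-byte Indexing, then the version byte,
// so that both can share the space that precedes next_arg_id.
struct ParseCtxVersionInPadding
{
  const char* begin;
  const char* end;
  enum Indexing : unsigned char { Unknown, Manual, Auto };
  Indexing indexing;
  ApiCtx::Version ver;
  std::size_t next_arg_id;
  std::size_t num_args;
};

// Size difference, in bytes, relative to the pre-patch shape.
template<typename Ctx>
constexpr std::ptrdiff_t extra_bytes()
{
  return static_cast<std::ptrdiff_t>(sizeof(Ctx))
	 - static_cast<std::ptrdiff_t>(sizeof(ParseCtxOriginal));
}
```

Printing sizeof(ApiCtx), extra_bytes<ParseCtxApiFirst>() and
extra_bytes<ParseCtxVersionInPadding>() on the target of interest shows
whether the leading member costs a full extra word or only fills existing
padding. Note that the last variant stores only the version byte, not the
reserved encoding field.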




>>
>>        iterator _M_begin;
>>>        iterator _M_end;
>>>        enum _Indexing { _Unknown, _Manual, _Auto };
>>>        _Indexing _M_indexing = _Unknown;
>>>
>>
>> We already have padding bytes here (and could guarantee that by giving a
>> fixed underlying type to _Indexing)
>>
>>        size_t _M_next_arg_id = 0;
>>>        size_t _M_num_args = 0;
>>> +
>>> +      friend __format::_Scanner<_CharT>;
>>>      };
>>>
>>>  /// @cond undocumented
>>> @@ -4927,18 +4969,21 @@ namespace __format
>>>      {
>>>        static_assert( output_iterator<_Out, const _CharT&> );
>>>
>>> +      __format::_Api_ctx  _M_api;
>>>        basic_format_args<basic_format_context> _M_args;
>>>        _Out _M_out;
>>>        __format::_Optional_locale _M_loc;
>>>
>>> -      basic_format_context(basic_format_args<basic_format_context>
>>> __args,
>>> +      basic_format_context(__format::_Api_ctx __api,
>>> +                          basic_format_args<basic_format_context>
>>> __args,
>>>                            _Out __out)
>>> -      : _M_args(__args), _M_out(std::move(__out))
>>> +      : _M_api(__api), _M_args(__args), _M_out(std::move(__out))
>>>        { }
>>>
>>> -      basic_format_context(basic_format_args<basic_format_context>
>>> __args,
>>> +      basic_format_context(__format::_Api_ctx __api,
>>> +                          basic_format_args<basic_format_context>
>>> __args,
>>>                            _Out __out, const std::locale& __loc)
>>> -      : _M_args(__args), _M_out(std::move(__out)), _M_loc(__loc)
>>> +      : _M_api(__api), _M_args(__args),
>>> _M_out(std::move(__out)), _M_loc(__loc)
>>>        { }
>>>
>>>        // _GLIBCXX_RESOLVE_LIB_DEFECTS
>>> @@ -4954,6 +4999,7 @@ namespace __format
>>>                                   const locale*);
>>>
>>>        friend __format::__formatter_chrono<_CharT>;
>>> +      friend __format::_Formatting_scanner<_Out, _CharT>;
>>>
>>>      public:
>>>        ~basic_format_context() = default;
>>> @@ -4998,8 +5044,9 @@ namespace __format
>>>        } _M_pc;
>>>
>>>        constexpr explicit
>>> -      _Scanner(basic_string_view<_CharT> __str, size_t __nargs =
>>> (size_t)-1)
>>> -      : _M_pc(__str, __nargs)
>>> +      _Scanner(_Api_ctx __api, basic_string_view<_CharT> __str,
>>> +              size_t __nargs = (size_t)-1)
>>> +      : _M_pc(__api, __str, __nargs)
>>>        { }
>>>
>>>        constexpr iterator begin() const noexcept { return _M_pc.begin();
>>> }
>>> @@ -5115,7 +5162,7 @@ namespace __format
>>>      public:
>>>        _Formatting_scanner(basic_format_context<_Out, _CharT>& __fc,
>>>                           basic_string_view<_CharT> __str)
>>> -      : _Scanner<_CharT>(__str), _M_fc(__fc)
>>> +      : _Scanner<_CharT>(__fc._M_api, __str), _M_fc(__fc)
>>>        { }
>>>
>>>      private:
>>> @@ -5176,7 +5223,8 @@ namespace __format
>>>      public:
>>>        consteval
>>>        _Checking_scanner(basic_string_view<_CharT> __str)
>>> -      : _Scanner<_CharT>(__str, sizeof...(_Args))
>>> +      : _Scanner<_CharT>(__format::__current_api<_CharT>(),
>>>
>>
>> This is consteval so should use the right version for the current TU.
>>
>> +                        __str, sizeof...(_Args))
>>>        {
>>>  #if __cpp_lib_format >= 202305L
>>>         this->_M_pc._M_types = _M_types.data();
  
Tomasz Kaminski April 7, 2026, 2:08 p.m. UTC | #4
On Tue, Apr 7, 2026 at 3:54 PM Jonathan Wakely <jwakely.gcc@gmail.com>
wrote:

>
>
> On Tue, 7 Apr 2026, 14:30 Tomasz Kaminski, <tkaminsk@redhat.com> wrote:
>
>>
>>
>> On Tue, Apr 7, 2026 at 3:00 PM Jonathan Wakely <jwakely.gcc@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, 7 Apr 2026, 09:15 Tomasz Kamiński, <tkaminsk@redhat.com> wrote:
>>>
>>>> This patch adds a _M_api member to basic_format_context and
>>>> basic_format_parse_context, that represents the information about
>>>> the TU in which the call was compiled:
>>>> * _M_ver represents the C++ standard in which TU was compiled,
>>>> * _M_literal_unicode is true when TU was compiled with Unicode
>>>>   literal encoding,
>>>> * _M_literal_enc is reserved for storing text_encoding::id value
>>>>   for literal encoding, currently set to zero.
>>>> These values are then populated by the __current_api<_CharT>() function.
>>>>
>>>> This would allow the formatter instantiations compiled in different
>>>> TU (for example as part of libstdc++.so) to properly handle:
>>>> * multi-byte fill-characters used as fill in format-spec, that
>>>>   are supported only if literal encoding is Unicode,
>>>> * '?' as format flags for string and characters, that is only
>>>>   supported since C++23,
>>>> * escaping of the string parameters, that depends on the literal
>>>>   encoding.
>>>>
>>>> To further aid the above, a new __do_vformat_to overload is extracted.
>>>> This overload accepts a format_context& that encodes the TU-specific
>>>> properties, and can be exported from libstdc++.
>>>>
>>>> This patch on purpose does not modify the formatters' code, and only
>>>> adds new members, as adding them later would be an ABI break.
>>>>
>>>> libstdc++-v3/ChangeLog:
>>>>
>>>>         * include/std/format (__format::_Api_ctx,
>>>> __format::__current_api):
>>>>         Define.
>>>>         (basic_format_parse_context::_M_api): Define.
>>>>         (basic_format_parse_context::basic_format_parse_context):
>>>>         Provide (basic_string_view, size_t) constructor only in C++20.
>>>>         Define new internal private constructor accepting _Api_ctx.
>>>>         (basic_format_context::_M_api): Define.
>>>>         (basic_format_context::basic_format_context): Add additional
>>>>         _Api_ctx parameter.
>>>>         (_Scanner::_Scanner): Add additional _Api_ctx parameter,
>>>>         and forward it to basic_format_parse_context.
>>>>         (_Formatting_scanner::_Formatting_scanner): Propagate
>>>>         _M_api from basic_format_context.
>>>>         (_Checking_scanner::_Checking_scanner): Use
>>>> __format::__current_api()
>>>>         to initialize API.
>>>>         (__format::__do_vformat_to): Extract overload accepting
>>>>         basic_format_context.
>>>> ---
>>>> I have realized that exporting the vformat specializations correctly
>>>> requires much bigger code changes than I am comfortable making this
>>>> late in stage 4, as we will need to make the code independent of
>>>> TU-specific properties (like encodings). This patch instead adds context
>>>> members to basic_format_context and basic_format_parse_context that
>>>> would allow doing so in the future.
>>>>
>>>
>>> An alternative would be to have an inline dispatching function that
>>> decides whether the current TU matches what's in the library (where that
>>> will be the common case) and only uses the explicit instantiations if it
>>> matches.
>>>
>> Seems reasonable for Unicode encoding, but does not solve the problem of
>> multiple standards.
>>
>>>
>>> I'm not sure this is really a problem I care about solving. If you try
>>> to mix incompatible literal encodings in one program you shouldn't expect
>>> sensible results for code that is sensitive to the literal encoding.
>>>
>>> When mixing C++20 and C++23, the C++20 TUs should use the explicit
>>> instantiation which is right for C++20, and C++23 TUs will use an implicit
>>> instantiation of the C++23 definition.
>>>
>> This works only on surface level, C++20 will use __vformat_impl_20 and
>> C++23 will use __vformat_impl_23, defined in format-inst-20, format-inst-23
>> source files.
>>
>
> Once we stabilise C++23 I think we should only instantiate the format
> functions for C++23. Will anybody really care if they use a dynamic format
> string in C++20 code and don't get an exception for using a C++23 format
> specifier?
>
Yes, the fact that a format string is accepted by `vformat` but rejected by
`format` seems very surprising to me, but if we accept that, we have a much
simpler problem to solve. If we plan to accept all C++23 specifiers for
basic types as an extension in C++20 mode once C++23 support is stabilized,
then that would sound much more intuitive to me.

>
>> Both of these files will instantiate `__formatter_str` under the same
>> name. When they are combined into `libstdc++.so`, the linker will pick one,
>> and one of the standards will get incorrect behavior.
>>
>
> Fine, the older standard could get the "wrong" behaviour (where wrong
> just means supporting C++23 features when called from C++20 TUs).
>
I considered that unacceptable, and we even have test cases checking for
exactly that. The failures of these tests were the reason
I started exploring alternatives.
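
To make the intended use of the API version concrete, here is a minimal
sketch of a single, separately compiled parsing routine that accepts or
rejects the '?' flag depending on the version carried in the context. The
names ApiVersion, ApiCtx, parse_debug_flag and accepts are hypothetical and
the spec handling is deliberately simplistic (it only looks at a trailing
'?'); this is not the libstdc++ formatter code, which the patch leaves
unchanged.

```cpp
#include <cassert>
#include <stdexcept>
#include <string_view>

// Hypothetical stand-ins for the version and context from the patch.
enum class ApiVersion : unsigned char { Api2020, Api2023, Api2026 };

struct ApiCtx
{
  ApiVersion ver;
  bool literal_unicode;
};

// Returns true if the spec requests debug ('?') output.
// Throws if '?' is used by a caller whose TU was compiled before C++23.
inline bool
parse_debug_flag(ApiCtx api, std::string_view spec)
{
  if (spec.empty() || spec.back() != '?')
    return false;
  if (api.ver < ApiVersion::Api2023)
    throw std::invalid_argument("format error: '?' requires C++23");
  return true;
}

// Convenience wrapper: reports whether the spec is valid for the caller.
inline bool
accepts(ApiCtx api, std::string_view spec)
{
  try
    {
      parse_debug_flag(api, spec);
      return true;
    }
  catch (const std::invalid_argument&)
    {
      return false;
    }
}
```

With this shape, one compiled copy of the routine can serve callers of both
standards: accepts(ApiCtx{ApiVersion::Api2020, true}, "?") is rejected while
the same spec with ApiVersion::Api2023 is accepted.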

>
>> To avoid this problem we will need to ABI tag each used formatter in some
>> manner and apply that tag virally to any formatter referenced from these
>> functions.
>>
>>
>>>
>>> Is there really a problem?
>>>
>>> If we can capture the API level without adding any overhead, I suppose
>>> that's acceptable.
>>>
>>> If we store the text encoding, what are we going to do with it? Use
>>> iconv to convert the fill character on the fly? To what output encoding?
>>>
>> This is for the future, if we want to handle string escaping for non-Unicode
>> encodings better than by giving them the same behavior as ASCII.
>> We can add any additional fields to basic_format_parse_context and
>> basic_format_context in the future; this is why I am reserving space for it.
>>
>> // Also could you take a look at:
>> https://gcc.gnu.org/pipermail/libstdc++/2026-April/066030.html
>>
>>
>>>
>>>
>>>
>>>
>>>> Tested all *format* tests on x86_64-linux. OK for trunk when all tests
>>>> pass?
>>>>
>>>>  libstdc++-v3/include/std/format | 209 ++++++++++++++++++++------------
>>>>  1 file changed, 133 insertions(+), 76 deletions(-)
>>>>
>>>> diff --git a/libstdc++-v3/include/std/format
>>>> b/libstdc++-v3/include/std/format
>>>> index eca5bd213aa..97d1ecb3ed6 100644
>>>> --- a/libstdc++-v3/include/std/format
>>>> +++ b/libstdc++-v3/include/std/format
>>>> @@ -140,6 +140,37 @@ namespace __format
>>>>        template<typename, typename...> friend struct
>>>> std::basic_format_string;
>>>>      };
>>>>
>>>> +  // Exposed via basic_format_parse_context, defines the TU specific
>>>> information
>>>> +  // like encoding and standard version.
>>>> +  struct _Api_ctx
>>>> +  {
>>>> +    enum class _Version : unsigned char
>>>> +    { _Api_2020, _Api_2023, _Api_2026 };
>>>> +
>>>> +    _Version _M_ver;
>>>> +    unsigned _M_unused : 23;
>>>> +    unsigned _M_literal_unicode : 1;
>>>> +    __INT_LEAST32_TYPE__ _M_literal_enc;
>>>> +  };
>>>> +  using enum _Api_ctx::_Version;
>>>> +
>>>> +  template<typename _CharT>
>>>> +    constexpr _Api_ctx
>>>> +    __current_api()
>>>>
>>>
>>> Should this be always inline?
>>>
>>> +    {
>>>> +      _Api_ctx __api{};
>>>> +#if __cplusplus > 202302L
>>>> +      __api._M_ver = _Api_2026;
>>>> +#elif __cplusplus > 202002L
>>>> +      __api._M_ver = _Api_2023;
>>>> +#else
>>>> +      __api._M_ver = _Api_2020;
>>>> +#endif
>>>> +      __api._M_literal_unicode
>>>> +       = __unicode::__literal_encoding_is_unicode<_CharT>();
>>>> +      return __api;
>>>> +    }
>>>> +
>>>>  } // namespace __format
>>>>  /// @endcond
>>>>
>>>> @@ -274,7 +305,7 @@ namespace __format
>>>>    { __throw_format_error("format error: failed to parse format-spec");
>>>> }
>>>>
>>>>    template<typename _CharT> class _Scanner;
>>>> -
>>>> +  template<typename _Out, typename _CharT> class _Formatting_scanner;
>>>>  } // namespace __format
>>>>    /// @endcond
>>>>
>>>> @@ -408,23 +439,34 @@ namespace __format
>>>>        // This must not be constexpr.
>>>>        static void __invalid_dynamic_spec(const char*);
>>>>
>>>> -      friend __format::_Scanner<_CharT>;
>>>> -#endif
>>>> -
>>>> +#else
>>>>        // This constructor should only be used by the implementation.
>>>>        constexpr explicit
>>>>        basic_format_parse_context(basic_string_view<_CharT> __fmt,
>>>>                                  size_t __num_args) noexcept
>>>>        : _M_begin(__fmt.begin()), _M_end(__fmt.end()),
>>>> _M_num_args(__num_args)
>>>>        { }
>>>> +#endif
>>>>
>>>>      private:
>>>> +      // This constructor should only be used by the implementation.
>>>> +      constexpr explicit
>>>> +      basic_format_parse_context(__format::_Api_ctx __api,
>>>> +                                basic_string_view<_CharT> __fmt,
>>>> +                                size_t __num_args) noexcept
>>>> +      : _M_api(__api), _M_begin(__fmt.begin()), _M_end(__fmt.end())
>>>> +      , _M_num_args(__num_args)
>>>> +      { }
>>>> +
>>>> +      __format::_Api_ctx _M_api = __format::__current_api<_CharT>();
>>>>
>>>
>>> What guarantees this will be initialized by a call to the right version?
>>>
>> This is only used by the basic_format_parse_context(string) constructor,
>> which is mostly used for user-defined formatters. We may want to define
>> this constructor as always inline.
>>
>>>
>>> Doesn't putting this member first add a lot of wasted padding due to
>>> alignment?
>>>
>> I do not think the size of basic_format_parse_context and
>> basic_format_context is relevant to anybody. But the struct is 64 bits,
>> so it should not add extra padding.
>>
>
> I don't understand how adding a new byte before the first iterator member
> doesn't introduce sizeof(void*)-1 bytes of padding.
>
_Api_ctx is an 8-byte struct that also holds the literal encoding
information, and I consider having spare bytes there a feature, not
a drawback.
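
The size argument can be checked with a small sketch. The types below are hypothetical mirrors of the proposed layout (`ApiCtx`, `ParseCtxSketch` and `padding_after_api` are illustrative names, and plain pointers stand in for the iterators). Bit-field layout is implementation-defined, so the numbers depend on the ABI; the 8 bytes mentioned above are what I would expect from GCC on x86_64-linux, not something this sketch guarantees elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the proposed _Api_ctx members.
struct ApiCtx
{
  enum class Version : unsigned char { api_2020, api_2023, api_2026 };

  Version ver;                    // one byte
  unsigned unused : 23;           // spare bits reserved for later use
  unsigned literal_unicode : 1;   // literal encoding is Unicode
  std::int_least32_t literal_enc; // reserved for a text_encoding::id value
};

// Simplified stand-in for basic_format_parse_context with the API block
// placed first, as in the patch.
struct ParseCtxSketch
{
  ApiCtx api;
  const char* begin;
  const char* end;
  unsigned indexing;
  std::size_t next_arg_id;
  std::size_t num_args;
};

// Bytes of padding between the API block and the first pointer member.
inline std::size_t padding_after_api()
{
  return offsetof(ParseCtxSketch, begin) - sizeof(ApiCtx);
}
```

If `sizeof(ApiCtx)` is a multiple of the pointer alignment, `padding_after_api()` returns zero, which is the case argued above. A one-byte member in the same position would instead leave `sizeof(void*) - 1` bytes of padding.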

>
> Why don't we just give _Indexing a fixed underlying type of unsigned char
> and then put the API version after that?
>
>
>
>
>>>
>>>        iterator _M_begin;
>>>>        iterator _M_end;
>>>>        enum _Indexing { _Unknown, _Manual, _Auto };
>>>>        _Indexing _M_indexing = _Unknown;
>>>>
>>>
>>> We already have padding bytes here (and could guarantee that by giving a
>>> fixed underlying type to _Indexing)
>>>
>>>        size_t _M_next_arg_id = 0;
>>>>        size_t _M_num_args = 0;
>>>> +
>>>> +      friend __format::_Scanner<_CharT>;
>>>>      };
>>>>
>>>>  /// @cond undocumented
>>>> @@ -4927,18 +4969,21 @@ namespace __format
>>>>      {
>>>>        static_assert( output_iterator<_Out, const _CharT&> );
>>>>
>>>> +      __format::_Api_ctx  _M_api;
>>>>        basic_format_args<basic_format_context> _M_args;
>>>>        _Out _M_out;
>>>>        __format::_Optional_locale _M_loc;
>>>>
>>>> -      basic_format_context(basic_format_args<basic_format_context>
>>>> __args,
>>>> +      basic_format_context(__format::_Api_ctx __api,
>>>> +                          basic_format_args<basic_format_context>
>>>> __args,
>>>>                            _Out __out)
>>>> -      : _M_args(__args), _M_out(std::move(__out))
>>>> +      : _M_api(__api), _M_args(__args), _M_out(std::move(__out))
>>>>        { }
>>>>
>>>> -      basic_format_context(basic_format_args<basic_format_context>
>>>> __args,
>>>> +      basic_format_context(__format::_Api_ctx __api,
>>>> +                          basic_format_args<basic_format_context>
>>>> __args,
>>>>                            _Out __out, const std::locale& __loc)
>>>> -      : _M_args(__args), _M_out(std::move(__out)), _M_loc(__loc)
>>>> +      : _M_api(__api), _M_args(__args),
>>>> _M_out(std::move(__out)), _M_loc(__loc)
>>>>        { }
>>>>
>>>>        // _GLIBCXX_RESOLVE_LIB_DEFECTS
>>>> @@ -4954,6 +4999,7 @@ namespace __format
>>>>                                   const locale*);
>>>>
>>>>        friend __format::__formatter_chrono<_CharT>;
>>>> +      friend __format::_Formatting_scanner<_Out, _CharT>;
>>>>
>>>>      public:
>>>>        ~basic_format_context() = default;
>>>> @@ -4998,8 +5044,9 @@ namespace __format
>>>>        } _M_pc;
>>>>
>>>>        constexpr explicit
>>>> -      _Scanner(basic_string_view<_CharT> __str, size_t __nargs =
>>>> (size_t)-1)
>>>> -      : _M_pc(__str, __nargs)
>>>> +      _Scanner(_Api_ctx __api, basic_string_view<_CharT> __str,
>>>> +              size_t __nargs = (size_t)-1)
>>>> +      : _M_pc(__api, __str, __nargs)
>>>>        { }
>>>>
>>>>        constexpr iterator begin() const noexcept { return
>>>> _M_pc.begin(); }
>>>> @@ -5115,7 +5162,7 @@ namespace __format
>>>>      public:
>>>>        _Formatting_scanner(basic_format_context<_Out, _CharT>& __fc,
>>>>                           basic_string_view<_CharT> __str)
>>>> -      : _Scanner<_CharT>(__str), _M_fc(__fc)
>>>> +      : _Scanner<_CharT>(__fc._M_api, __str), _M_fc(__fc)
>>>>        { }
>>>>
>>>>      private:
>>>> @@ -5176,7 +5223,8 @@ namespace __format
>>>>      public:
>>>>        consteval
>>>>        _Checking_scanner(basic_string_view<_CharT> __str)
>>>> -      : _Scanner<_CharT>(__str, sizeof...(_Args))
>>>> +      : _Scanner<_CharT>(__format::__current_api<_CharT>(),
>>>>
>>>
>>> This is consteval so should use the right version for the current TU.
>>>
>>> +                        __str, sizeof...(_Args))
>>>>        {
>>>>  #if __cpp_lib_format >= 202305L
>>>>         this->_M_pc._M_types = _M_types.data();
>>>> @@ -5219,82 +5267,91 @@ namespace __format
>>>>  #endif
>>>>      };
>>>>
>>>> -  template<typename _Out, typename _CharT, typename _Context>
>>>> -    inline _Out
>>>> -    __do_vformat_to(_Out __out, basic_string_view<_CharT> __fmt,
>>>> -                   const basic_format_args<_Context>& __args,
>>>> -                   const locale* __loc)
>>>> +  template<typename _CharT>
>>>> +    _Sink_iter<_CharT>
>>>> +    __do_vformat_to(_Sink_iter<_CharT> __out,
>>>> basic_string_view<_CharT> __fmt,
>>>> +                   __format_context<_CharT>& __ctx)
>>>>      {
>>>> -      if constexpr (is_same_v<_Out, _Sink_iter<_CharT>>)
>>>> -       {
>>>> -         if constexpr (is_same_v<_CharT, char>)
>>>> -           // Fast path for "{}" format strings and simple format arg
>>>> types.
>>>> -           if (__fmt.size() == 2 && __fmt[0] == '{' && __fmt[1] == '}')
>>>> -             {
>>>> -               bool __done = false;
>>>> -               __format::__visit_format_arg([&](auto& __arg) {
>>>> -                 using _Tp = remove_cvref_t<decltype(__arg)>;
>>>> -                 if constexpr (is_same_v<_Tp, bool>)
>>>> +      if constexpr (is_same_v<_CharT, char>)
>>>> +       // Fast path for "{}" format strings and simple format arg
>>>> types.
>>>> +       if (__fmt.size() == 2 && __fmt[0] == '{' && __fmt[1] == '}')
>>>> +         {
>>>> +           bool __done = false;
>>>> +           __format::__visit_format_arg([&](auto& __arg) {
>>>> +             using _Tp = remove_cvref_t<decltype(__arg)>;
>>>> +             if constexpr (is_same_v<_Tp, bool>)
>>>> +               {
>>>> +                 size_t __len = 4 + !__arg;
>>>> +                 const char* __chars[] = { "false", "true" };
>>>> +                 if (auto __res = __out._M_reserve(__len))
>>>>                     {
>>>> -                     size_t __len = 4 + !__arg;
>>>> -                     const char* __chars[] = { "false", "true" };
>>>> -                     if (auto __res = __out._M_reserve(__len))
>>>> -                       {
>>>> -                         __builtin_memcpy(__res.get(), __chars[__arg],
>>>> __len);
>>>> -                         __res._M_bump(__len);
>>>> -                         __done = true;
>>>> -                       }
>>>> +                     __builtin_memcpy(__res.get(), __chars[__arg],
>>>> __len);
>>>> +                     __res._M_bump(__len);
>>>> +                     __done = true;
>>>>                     }
>>>> -                 else if constexpr (is_same_v<_Tp, char>)
>>>> +               }
>>>> +             else if constexpr (is_same_v<_Tp, char>)
>>>> +               {
>>>> +                 if (auto __res = __out._M_reserve(1))
>>>>                     {
>>>> -                     if (auto __res = __out._M_reserve(1))
>>>> -                       {
>>>> -                         *__res.get() = __arg;
>>>> -                         __res._M_bump(1);
>>>> -                         __done = true;
>>>> -                       }
>>>> +                     *__res.get() = __arg;
>>>> +                     __res._M_bump(1);
>>>> +                     __done = true;
>>>>                     }
>>>> -                 else if constexpr (is_integral_v<_Tp>)
>>>> +               }
>>>> +             else if constexpr (is_integral_v<_Tp>)
>>>> +               {
>>>> +                 make_unsigned_t<_Tp> __uval;
>>>> +                 const bool __neg = __arg < 0;
>>>> +                 if (__neg)
>>>> +                   __uval = make_unsigned_t<_Tp>(~__arg) + 1u;
>>>> +                 else
>>>> +                   __uval = __arg;
>>>> +                 const auto __n = __detail::__to_chars_len(__uval);
>>>> +                 if (auto __res = __out._M_reserve(__n + __neg))
>>>>                     {
>>>> -                     make_unsigned_t<_Tp> __uval;
>>>> -                     const bool __neg = __arg < 0;
>>>> -                     if (__neg)
>>>> -                       __uval = make_unsigned_t<_Tp>(~__arg) + 1u;
>>>> -                     else
>>>> -                       __uval = __arg;
>>>> -                     const auto __n = __detail::__to_chars_len(__uval);
>>>> -                     if (auto __res = __out._M_reserve(__n + __neg))
>>>> -                       {
>>>> -                         auto __ptr = __res.get();
>>>> -                         *__ptr = '-';
>>>> -                         __detail::__to_chars_10_impl(__ptr +
>>>> (int)__neg, __n,
>>>> -                                                      __uval);
>>>> -                         __res._M_bump(__n + __neg);
>>>> -                         __done = true;
>>>> -                       }
>>>> +                     auto __ptr = __res.get();
>>>> +                     *__ptr = '-';
>>>> +                     __detail::__to_chars_10_impl(__ptr + (int)__neg,
>>>> __n,
>>>> +                                                  __uval);
>>>> +                     __res._M_bump(__n + __neg);
>>>> +                     __done = true;
>>>>                     }
>>>> -                 else if constexpr (is_convertible_v<_Tp, string_view>)
>>>> +               }
>>>> +             else if constexpr (is_convertible_v<_Tp, string_view>)
>>>> +               {
>>>> +                 string_view __sv = __arg;
>>>> +                 if (auto __res = __out._M_reserve(__sv.size()))
>>>>                     {
>>>> -                     string_view __sv = __arg;
>>>> -                     if (auto __res = __out._M_reserve(__sv.size()))
>>>> -                       {
>>>> -                         __builtin_memcpy(__res.get(), __sv.data(),
>>>> __sv.size());
>>>> -                         __res._M_bump(__sv.size());
>>>> -                         __done = true;
>>>> -                       }
>>>> +                     __builtin_memcpy(__res.get(), __sv.data(),
>>>> __sv.size());
>>>> +                     __res._M_bump(__sv.size());
>>>> +                     __done = true;
>>>>                     }
>>>> -               }, __args.get(0));
>>>> +               }
>>>> +           }, __ctx.arg(0));
>>>>
>>>> -               if (__done)
>>>> -                 return __out;
>>>> -             }
>>>> +           if (__done)
>>>> +             return __out;
>>>> +         }
>>>>
>>>> +      _Formatting_scanner<_Sink_iter<_CharT>, _CharT> __scanner(__ctx,
>>>> __fmt);
>>>> +      __scanner._M_scan();
>>>> +      return __out;
>>>> +    }
>>>> +
>>>> +  template<typename _Out, typename _CharT, typename _Context>
>>>> +    _Out
>>>> +    __do_vformat_to(_Out __out, basic_string_view<_CharT> __fmt,
>>>> +                   const basic_format_args<_Context>& __args,
>>>> +                   const locale* __loc)
>>>> +    {
>>>> +      if constexpr (is_same_v<_Out, _Sink_iter<_CharT>>)
>>>> +       {
>>>> +         const auto __api = __format::__current_api<_CharT>();
>>>>           auto __ctx = __loc == nullptr
>>>> -                        ? _Context(__args, __out)
>>>> -                        : _Context(__args, __out, *__loc);
>>>> -         _Formatting_scanner<_Sink_iter<_CharT>, _CharT>
>>>> __scanner(__ctx, __fmt);
>>>> -         __scanner._M_scan();
>>>> -         return __out;
>>>> +                    ? _Context(__api, __args, __out)
>>>> +                    : _Context(__api, __args, __out, *__loc);
>>>> +         return __do_vformat_to(std::move(__out), __fmt, __ctx);
>>>>         }
>>>>        else if constexpr (__contiguous_char_iter<_CharT, _Out>)
>>>>         {
>>>> --
>>>> 2.53.0
>>>>
>>>>
  
Tomasz Kaminski April 7, 2026, 2:30 p.m. UTC | #5
Do you want me to provide an alternative patch that exports the
definition for __do_vformat only
for TUs that use a Unicode literal encoding?
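
For reference, the dispatching idea could look roughly like the sketch below. Everything in it is hypothetical: `exported_vformat` stands in for an instantiation exported from libstdc++.so, `header_vformat` for the implicit instantiation in the user's TU, and the boolean parameters for the result of a literal-encoding query in the library and in the current TU.

```cpp
#include <cassert>
#include <string>

// Hypothetical: the literal encoding the shared library was built with.
constexpr bool library_literal_is_unicode = true;

// Hypothetical stand-in for an instantiation exported from the library.
inline std::string exported_vformat(const std::string& fmt)
{ return "exported:" + fmt; }

// Hypothetical stand-in for the implicit instantiation in the user's TU.
inline std::string header_vformat(const std::string& fmt)
{ return "inline:" + fmt; }

// Inline dispatcher: use the exported code only when the TU's literal
// encoding matches the one the library was compiled with.
template<bool TuLiteralIsUnicode>
  inline std::string
  dispatch_vformat(const std::string& fmt)
  {
    if (TuLiteralIsUnicode == library_literal_is_unicode)
      return exported_vformat(fmt);
    return header_vformat(fmt);
  }
```

This covers only the encoding mismatch. Mixing standards within one program would still need a separate answer, as discussed in the quoted thread.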

On Tue, Apr 7, 2026 at 4:08 PM Tomasz Kaminski <tkaminsk@redhat.com> wrote:

>
>
> On Tue, Apr 7, 2026 at 3:54 PM Jonathan Wakely <jwakely.gcc@gmail.com>
> wrote:
>
>>
>>
>> On Tue, 7 Apr 2026, 14:30 Tomasz Kaminski, <tkaminsk@redhat.com> wrote:
>>
>>>
>>>
>>> On Tue, Apr 7, 2026 at 3:00 PM Jonathan Wakely <jwakely.gcc@gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, 7 Apr 2026, 09:15 Tomasz Kamiński, <tkaminsk@redhat.com> wrote:
>>>>
>>>>> This patch adds a _M_api member to basic_format_context and
>>>>> basic_format_parse_context that represents information about
>>>>> the TU in which the call was compiled:
>>>>> * _M_ver represents the C++ standard with which the TU was compiled,
>>>>> * _M_literal_unicode is true when the TU was compiled with a Unicode
>>>>>   literal encoding,
>>>>> * _M_literal_enc is reserved for storing the text_encoding::id value
>>>>>   of the literal encoding, currently set to zero.
>>>>> These values are populated by the __current_api<_CharT>() functions.
>>>>>
>>>>> This would allow formatter instantiations compiled in a different
>>>>> TU (for example as part of libstdc++.so) to properly handle:
>>>>> * multi-byte fill characters in a format-spec, which are
>>>>>   supported only if the literal encoding is Unicode,
>>>>> * '?' as a format flag for strings and characters, which is only
>>>>>   supported since C++23,
>>>>> * escaping of string arguments, which depends on the literal
>>>>>   encoding.
>>>>>
>>>>> To further aid the above, a new __do_vformat_to overload is extracted.
>>>>> This overload accepts a format_context& that encodes the TU-specific
>>>>> properties, and can be exported from libstdc++.
>>>>>
>>>>> This patch on purpose does not modify the formatters' code, and only
>>>>> adds new members, as adding them later would be an ABI break.
>>>>>
>>>>> libstdc++-v3/ChangeLog:
>>>>>
>>>>>         * include/std/format (__format::_Api_ctx,
>>>>> __format::__current_api):
>>>>>         Define.
>>>>>         (basic_format_parse_context::_M_api): Define.
>>>>>         (basic_format_parse_context::basic_format_parse_context):
>>>>>         Provide (basic_string_view, size_t) constructor only in C++20.
>>>>>         Define new internal private constructor accepting _Api_ctx.
>>>>>         (basic_format_context::_M_api): Define.
>>>>>         (basic_format_context::basic_format_context): Add additional
>>>>>         _Api_ctx parameter.
>>>>>         (_Scanner::_Scanner): Add additional _Api_ctx parameter,
>>>>>         and forward it to basic_format_parse_context.
>>>>>         (_Formatting_scanner::_Formatting_scanner): Propagate
>>>>>         _M_api from basic_format_context.
>>>>>         (_Checking_scanner::_Checking_scanner): Use
>>>>> __format::__current_api()
>>>>>         to initialize API.
>>>>>         (__format::__do_vformat_to): Extract overload accepting
>>>>>         basic_format_context.
>>>>> ---
>>>>> I have realized that exporting the vformat specializations correctly
>>>>> requires much bigger code changes than I am comfortable making this late
>>>>> in stage 4, as we will need to make the code independent of TU-specific
>>>>> properties (like encodings). This patch instead adds context members to
>>>>> basic_format_context and basic_format_parse_context that would allow
>>>>> doing so in the future.
>>>>>
>>>>
>>>> An alternative would be to have an inline dispatching function that
>>>> decides whether the current TU matches what's in the library (where that
>>>> will be the common case) and only uses the explicit instantiations if it
>>>> matches.
>>>>
>>> Seems reasonable for the Unicode encoding, but does not solve the
>>> multiple-standards problem.
>>>
>>>>
>>>> I'm not sure this is really a problem I care about solving. If you try
>>>> to mix incompatible literal encodings in one program you shouldn't expect
>>>> sensible results for code that is sensitive to the literal encoding.
>>>>
>>>> When mixing C++20 and C++23, the C++20 TUs should use the explicit
>>>> instantiation which is right for C++20, and C++23 TUs will use an implicit
>>>> instantiation of the C++23 definition.
>>>>
>>> This works only on the surface: C++20 will use __vformat_impl_20 and
>>> C++23 will use __vformat_impl_23, defined in the format-inst-20 and
>>> format-inst-23 source files.
>>>
>>
>> Once we stabilise C++23 I think we should only instantiate the format
>> functions for C++23. Will anybody really care if they use a dynamic format
>> string in C++20 code and don't get an exception for using a C++23 format
>> specifier?
>>
> Yes. A format string that is accepted by `vformat` but rejected by
> `format` seems very surprising to me, but if we accept that, we have
> a much simpler problem to solve. If we plan to accept all C++23
> specifiers for basic types as an extension in C++20 mode once C++23
> is stabilized, that would be much more intuitive to me.
>
>>
>>> Both of these files will instantiate `__formatter_str` under the same
>>> name. When they are combined into `libstdc++.so`, the linker will pick
>>> one, and one of the standards will get incorrect behavior.
>>>
>>
>> Fine, the older standard could get the "wrong"  behaviour (where wrong
>> just means supporting C++23 features when called from C++20 TUs).
>>
> I considered that unacceptable, and we even have test cases that check
> for it. The failures of these tests were the reason
> I started exploring alternatives.
>
>>
>>> To avoid this problem we will need to ABI tag each used formatter in some
>>> manner and apply that tag virally to any formatter referenced from these
>>> functions.
>>>
>>>
>>>>
>>>> Is there really a problem?
>>>>
>>>> If we can capture the API level without adding any overhead, I suppose
>>>> that's acceptable.
>>>>
>>>> If we store the text encoding, what are we going to do with it? Use
>>>> iconv to convert the fill character on the fly? To what output encoding?
>>>>
>>> This is for the future, if we want to handle string escaping for non-Unicode
>>> encodings better than by treating them the same as ASCII.
>>> We can add any additional fields to basic_format_parse_context and
>>> basic_format_context in the future; this is why I am reserving space for it.
>>>
>>> // Also could you take a look at:
>>> https://gcc.gnu.org/pipermail/libstdc++/2026-April/066030.html
>>>
>>>
>>>>
>>>>
>>>>
>>>>
>>>>> Tested all *format* tests on x86_64-linux. OK for trunk when all tests
>>>>> pass?
>>>>>
>>>>>  libstdc++-v3/include/std/format | 209 ++++++++++++++++++++------------
>>>>>  1 file changed, 133 insertions(+), 76 deletions(-)
>>>>>
>>>>> diff --git a/libstdc++-v3/include/std/format
>>>>> b/libstdc++-v3/include/std/format
>>>>> index eca5bd213aa..97d1ecb3ed6 100644
>>>>> --- a/libstdc++-v3/include/std/format
>>>>> +++ b/libstdc++-v3/include/std/format
>>>>> @@ -140,6 +140,37 @@ namespace __format
>>>>>        template<typename, typename...> friend struct
>>>>> std::basic_format_string;
>>>>>      };
>>>>>
>>>>> +  // Exposed via basic_format_parse_context, defines the TU specific
>>>>> information
>>>>> +  // like encoding and standard version.
>>>>> +  struct _Api_ctx
>>>>> +  {
>>>>> +    enum class _Version : unsigned char
>>>>> +    { _Api_2020, _Api_2023, _Api_2026 };
>>>>> +
>>>>> +    _Version _M_ver;
>>>>> +    unsigned _M_unused : 23;
>>>>> +    unsigned _M_literal_unicode : 1;
>>>>> +    __INT_LEAST32_TYPE__ _M_literal_enc;
>>>>> +  };
>>>>> +  using enum _Api_ctx::_Version;
>>>>> +
>>>>> +  template<typename _CharT>
>>>>> +    constexpr _Api_ctx
>>>>> +    __current_api()
>>>>>
>>>>
>>>> Should this be always inline?
>>>>
>>>> +    {
>>>>> +      _Api_ctx __api{};
>>>>> +#if __cplusplus > 202302L
>>>>> +      __api._M_ver = _Api_2026;
>>>>> +#elif __cplusplus > 202002L
>>>>> +      __api._M_ver = _Api_2023;
>>>>> +#else
>>>>> +      __api._M_ver = _Api_2020;
>>>>> +#endif
>>>>> +      __api._M_literal_unicode
>>>>> +       = __unicode::__literal_encoding_is_unicode<_CharT>();
>>>>> +      return __api;
>>>>> +    }
>>>>> +
>>>>>  } // namespace __format
>>>>>  /// @endcond
>>>>>
>>>>> @@ -274,7 +305,7 @@ namespace __format
>>>>>    { __throw_format_error("format error: failed to parse
>>>>> format-spec"); }
>>>>>
>>>>>    template<typename _CharT> class _Scanner;
>>>>> -
>>>>> +  template<typename _Out, typename _CharT> class _Formatting_scanner;
>>>>>  } // namespace __format
>>>>>    /// @endcond
>>>>>
>>>>> @@ -408,23 +439,34 @@ namespace __format
>>>>>        // This must not be constexpr.
>>>>>        static void __invalid_dynamic_spec(const char*);
>>>>>
>>>>> -      friend __format::_Scanner<_CharT>;
>>>>> -#endif
>>>>> -
>>>>> +#else
>>>>>        // This constructor should only be used by the implementation.
>>>>>        constexpr explicit
>>>>>        basic_format_parse_context(basic_string_view<_CharT> __fmt,
>>>>>                                  size_t __num_args) noexcept
>>>>>        : _M_begin(__fmt.begin()), _M_end(__fmt.end()),
>>>>> _M_num_args(__num_args)
>>>>>        { }
>>>>> +#endif
>>>>>
>>>>>      private:
>>>>> +      // This constructor should only be used by the implementation.
>>>>> +      constexpr explicit
>>>>> +      basic_format_parse_context(__format::_Api_ctx __api,
>>>>> +                                basic_string_view<_CharT> __fmt,
>>>>> +                                size_t __num_args) noexcept
>>>>> +      : _M_api(__api), _M_begin(__fmt.begin()), _M_end(__fmt.end())
>>>>> +      , _M_num_args(__num_args)
>>>>> +      { }
>>>>> +
>>>>> +      __format::_Api_ctx _M_api = __format::__current_api<_CharT>();
>>>>>
>>>>
>>>> What guarantees this will be initialized by a call to the right version?
>>>>
>>> This is only used by the basic_format_parse_context(string) constructor,
>>> which is mostly used for user-defined formatters. We may want to define
>>> this constructor as always inline.
>>>
>>>>
>>>> Doesn't putting this member first add a lot of wasted padding due to
>>>> alignment?
>>>>
>>> I do not think the size of basic_format_parse_context and
>>> basic_format_context is relevant to anybody. But the struct is 64 bits,
>>> so it should not add extra padding.
>>>
>>
>> I don't understand how adding a new byte before the first iterator member
>> doesn't introduce sizeof(void*)-1 bytes of padding.
>>
> _Api_ctx is an 8-byte struct that also holds the literal encoding
> information, and I consider having spare bytes there a feature, not
> a drawback.
>
>>
>> Why don't we just give _Indexing a fixed underlying type of unsigned char
>> and then put the API version after that?
>>
>>
>>
>>
>>>>
>>>>        iterator _M_begin;
>>>>>        iterator _M_end;
>>>>>        enum _Indexing { _Unknown, _Manual, _Auto };
>>>>>        _Indexing _M_indexing = _Unknown;
>>>>>
>>>>
>>>> We already have padding bytes here (and could guarantee that by giving
>>>> a fixed underlying type to _Indexing)
>>>>
>>>>        size_t _M_next_arg_id = 0;
>>>>>        size_t _M_num_args = 0;
>>>>> +
>>>>> +      friend __format::_Scanner<_CharT>;
>>>>>      };
>>>>>
>>>>>  /// @cond undocumented
>>>>> @@ -4927,18 +4969,21 @@ namespace __format
>>>>>      {
>>>>>        static_assert( output_iterator<_Out, const _CharT&> );
>>>>>
>>>>> +      __format::_Api_ctx  _M_api;
>>>>>        basic_format_args<basic_format_context> _M_args;
>>>>>        _Out _M_out;
>>>>>        __format::_Optional_locale _M_loc;
>>>>>
>>>>> -      basic_format_context(basic_format_args<basic_format_context>
>>>>> __args,
>>>>> +      basic_format_context(__format::_Api_ctx __api,
>>>>> +                          basic_format_args<basic_format_context>
>>>>> __args,
>>>>>                            _Out __out)
>>>>> -      : _M_args(__args), _M_out(std::move(__out))
>>>>> +      : _M_api(__api), _M_args(__args), _M_out(std::move(__out))
>>>>>        { }
>>>>>
>>>>> -      basic_format_context(basic_format_args<basic_format_context>
>>>>> __args,
>>>>> +      basic_format_context(__format::_Api_ctx __api,
>>>>> +                          basic_format_args<basic_format_context>
>>>>> __args,
>>>>>                            _Out __out, const std::locale& __loc)
>>>>> -      : _M_args(__args), _M_out(std::move(__out)), _M_loc(__loc)
>>>>> +      : _M_api(__api), _M_args(__args),
>>>>> _M_out(std::move(__out)), _M_loc(__loc)
>>>>>        { }
>>>>>
>>>>>        // _GLIBCXX_RESOLVE_LIB_DEFECTS
>>>>> @@ -4954,6 +4999,7 @@ namespace __format
>>>>>                                   const locale*);
>>>>>
>>>>>        friend __format::__formatter_chrono<_CharT>;
>>>>> +      friend __format::_Formatting_scanner<_Out, _CharT>;
>>>>>
>>>>>      public:
>>>>>        ~basic_format_context() = default;
>>>>> @@ -4998,8 +5044,9 @@ namespace __format
>>>>>        } _M_pc;
>>>>>
>>>>>        constexpr explicit
>>>>> -      _Scanner(basic_string_view<_CharT> __str, size_t __nargs =
>>>>> (size_t)-1)
>>>>> -      : _M_pc(__str, __nargs)
>>>>> +      _Scanner(_Api_ctx __api, basic_string_view<_CharT> __str,
>>>>> +              size_t __nargs = (size_t)-1)
>>>>> +      : _M_pc(__api, __str, __nargs)
>>>>>        { }
>>>>>
>>>>>        constexpr iterator begin() const noexcept { return
>>>>> _M_pc.begin(); }
>>>>> @@ -5115,7 +5162,7 @@ namespace __format
>>>>>      public:
>>>>>        _Formatting_scanner(basic_format_context<_Out, _CharT>& __fc,
>>>>>                           basic_string_view<_CharT> __str)
>>>>> -      : _Scanner<_CharT>(__str), _M_fc(__fc)
>>>>> +      : _Scanner<_CharT>(__fc._M_api, __str), _M_fc(__fc)
>>>>>        { }
>>>>>
>>>>>      private:
>>>>> @@ -5176,7 +5223,8 @@ namespace __format
>>>>>      public:
>>>>>        consteval
>>>>>        _Checking_scanner(basic_string_view<_CharT> __str)
>>>>> -      : _Scanner<_CharT>(__str, sizeof...(_Args))
>>>>> +      : _Scanner<_CharT>(__format::__current_api<_CharT>(),
>>>>>
>>>>
>>>> This is consteval so should use the right version for the current TU.
>>>>
>>>> +                        __str, sizeof...(_Args))
>>>>>        {
>>>>>  #if __cpp_lib_format >= 202305L
>>>>>         this->_M_pc._M_types = _M_types.data();
>>>>> @@ -5219,82 +5267,91 @@ namespace __format
>>>>>  #endif
>>>>>      };
>>>>>
>>>>> -  template<typename _Out, typename _CharT, typename _Context>
>>>>> -    inline _Out
>>>>> -    __do_vformat_to(_Out __out, basic_string_view<_CharT> __fmt,
>>>>> -                   const basic_format_args<_Context>& __args,
>>>>> -                   const locale* __loc)
>>>>> +  template<typename _CharT>
>>>>> +    _Sink_iter<_CharT>
>>>>> +    __do_vformat_to(_Sink_iter<_CharT> __out,
>>>>> basic_string_view<_CharT> __fmt,
>>>>> +                   __format_context<_CharT>& __ctx)
>>>>>      {
>>>>> -      if constexpr (is_same_v<_Out, _Sink_iter<_CharT>>)
>>>>> -       {
>>>>> -         if constexpr (is_same_v<_CharT, char>)
>>>>> -           // Fast path for "{}" format strings and simple format arg
>>>>> types.
>>>>> -           if (__fmt.size() == 2 && __fmt[0] == '{' && __fmt[1] ==
>>>>> '}')
>>>>> -             {
>>>>> -               bool __done = false;
>>>>> -               __format::__visit_format_arg([&](auto& __arg) {
>>>>> -                 using _Tp = remove_cvref_t<decltype(__arg)>;
>>>>> -                 if constexpr (is_same_v<_Tp, bool>)
>>>>> +      if constexpr (is_same_v<_CharT, char>)
>>>>> +       // Fast path for "{}" format strings and simple format arg
>>>>> types.
>>>>> +       if (__fmt.size() == 2 && __fmt[0] == '{' && __fmt[1] == '}')
>>>>> +         {
>>>>> +           bool __done = false;
>>>>> +           __format::__visit_format_arg([&](auto& __arg) {
>>>>> +             using _Tp = remove_cvref_t<decltype(__arg)>;
>>>>> +             if constexpr (is_same_v<_Tp, bool>)
>>>>> +               {
>>>>> +                 size_t __len = 4 + !__arg;
>>>>> +                 const char* __chars[] = { "false", "true" };
>>>>> +                 if (auto __res = __out._M_reserve(__len))
>>>>>                     {
>>>>> -                     size_t __len = 4 + !__arg;
>>>>> -                     const char* __chars[] = { "false", "true" };
>>>>> -                     if (auto __res = __out._M_reserve(__len))
>>>>> -                       {
>>>>> -                         __builtin_memcpy(__res.get(),
>>>>> __chars[__arg], __len);
>>>>> -                         __res._M_bump(__len);
>>>>> -                         __done = true;
>>>>> -                       }
>>>>> +                     __builtin_memcpy(__res.get(), __chars[__arg],
>>>>> __len);
>>>>> +                     __res._M_bump(__len);
>>>>> +                     __done = true;
>>>>>                     }
>>>>> -                 else if constexpr (is_same_v<_Tp, char>)
>>>>> +               }
>>>>> +             else if constexpr (is_same_v<_Tp, char>)
>>>>> +               {
>>>>> +                 if (auto __res = __out._M_reserve(1))
>>>>>                     {
>>>>> -                     if (auto __res = __out._M_reserve(1))
>>>>> -                       {
>>>>> -                         *__res.get() = __arg;
>>>>> -                         __res._M_bump(1);
>>>>> -                         __done = true;
>>>>> -                       }
>>>>> +                     *__res.get() = __arg;
>>>>> +                     __res._M_bump(1);
>>>>> +                     __done = true;
>>>>>                     }
>>>>> -                 else if constexpr (is_integral_v<_Tp>)
>>>>> +               }
>>>>> +             else if constexpr (is_integral_v<_Tp>)
>>>>> +               {
>>>>> +                 make_unsigned_t<_Tp> __uval;
>>>>> +                 const bool __neg = __arg < 0;
>>>>> +                 if (__neg)
>>>>> +                   __uval = make_unsigned_t<_Tp>(~__arg) + 1u;
>>>>> +                 else
>>>>> +                   __uval = __arg;
>>>>> +                 const auto __n = __detail::__to_chars_len(__uval);
>>>>> +                 if (auto __res = __out._M_reserve(__n + __neg))
>>>>>                     {
>>>>> -                     make_unsigned_t<_Tp> __uval;
>>>>> -                     const bool __neg = __arg < 0;
>>>>> -                     if (__neg)
>>>>> -                       __uval = make_unsigned_t<_Tp>(~__arg) + 1u;
>>>>> -                     else
>>>>> -                       __uval = __arg;
>>>>> -                     const auto __n =
>>>>> __detail::__to_chars_len(__uval);
>>>>> -                     if (auto __res = __out._M_reserve(__n + __neg))
>>>>> -                       {
>>>>> -                         auto __ptr = __res.get();
>>>>> -                         *__ptr = '-';
>>>>> -                         __detail::__to_chars_10_impl(__ptr +
>>>>> (int)__neg, __n,
>>>>> -                                                      __uval);
>>>>> -                         __res._M_bump(__n + __neg);
>>>>> -                         __done = true;
>>>>> -                       }
>>>>> +                     auto __ptr = __res.get();
>>>>> +                     *__ptr = '-';
>>>>> +                     __detail::__to_chars_10_impl(__ptr + (int)__neg,
>>>>> __n,
>>>>> +                                                  __uval);
>>>>> +                     __res._M_bump(__n + __neg);
>>>>> +                     __done = true;
>>>>>                     }
>>>>> -                 else if constexpr (is_convertible_v<_Tp,
>>>>> string_view>)
>>>>> +               }
>>>>> +             else if constexpr (is_convertible_v<_Tp, string_view>)
>>>>> +               {
>>>>> +                 string_view __sv = __arg;
>>>>> +                 if (auto __res = __out._M_reserve(__sv.size()))
>>>>>                     {
>>>>> -                     string_view __sv = __arg;
>>>>> -                     if (auto __res = __out._M_reserve(__sv.size()))
>>>>> -                       {
>>>>> -                         __builtin_memcpy(__res.get(), __sv.data(),
>>>>> __sv.size());
>>>>> -                         __res._M_bump(__sv.size());
>>>>> -                         __done = true;
>>>>> -                       }
>>>>> +                     __builtin_memcpy(__res.get(), __sv.data(),
>>>>> __sv.size());
>>>>> +                     __res._M_bump(__sv.size());
>>>>> +                     __done = true;
>>>>>                     }
>>>>> -               }, __args.get(0));
>>>>> +               }
>>>>> +           }, __ctx.arg(0));
>>>>>
>>>>> -               if (__done)
>>>>> -                 return __out;
>>>>> -             }
>>>>> +           if (__done)
>>>>> +             return __out;
>>>>> +         }
>>>>>
>>>>> +      _Formatting_scanner<_Sink_iter<_CharT>, _CharT>
>>>>> __scanner(__ctx, __fmt);
>>>>> +      __scanner._M_scan();
>>>>> +      return __out;
>>>>> +    }
>>>>> +
>>>>> +  template<typename _Out, typename _CharT, typename _Context>
>>>>> +    _Out
>>>>> +    __do_vformat_to(_Out __out, basic_string_view<_CharT> __fmt,
>>>>> +                   const basic_format_args<_Context>& __args,
>>>>> +                   const locale* __loc)
>>>>> +    {
>>>>> +      if constexpr (is_same_v<_Out, _Sink_iter<_CharT>>)
>>>>> +       {
>>>>> +         const auto __api = __format::__current_api<_CharT>();
>>>>>           auto __ctx = __loc == nullptr
>>>>> -                        ? _Context(__args, __out)
>>>>> -                        : _Context(__args, __out, *__loc);
>>>>> -         _Formatting_scanner<_Sink_iter<_CharT>, _CharT>
>>>>> __scanner(__ctx, __fmt);
>>>>> -         __scanner._M_scan();
>>>>> -         return __out;
>>>>> +                    ? _Context(__api, __args, __out)
>>>>> +                    : _Context(__api, __args, __out, *__loc);
>>>>> +         return __do_vformat_to(std::move(__out), __fmt, __ctx);
>>>>>         }
>>>>>        else if constexpr (__contiguous_char_iter<_CharT, _Out>)
>>>>>         {
>>>>> --
>>>>> 2.53.0
>>>>>
>>>>>
  
Tomasz Kaminski April 7, 2026, 2:36 p.m. UTC | #6
To give context to the question below: I think I need some guidance and a
decision on whether we find the drawbacks we discussed acceptable and just
want a simpler implementation.

On Tue, Apr 7, 2026 at 4:30 PM Tomasz Kaminski <tkaminsk@redhat.com> wrote:

> Do you want me to provide an alternative patch, that exports the
> definition for __do_vformat only
> for TUs that use unicode literal encoding?
>
I plan to do that by having something like:
+  template<typename _CharT,
+           bool = __unicode::__literal_is_unicode<_CharT>()>
+    _Sink_iter<_CharT>
+    __do_vformat_to(_Sink_iter<_CharT> __out,
+                   basic_string_view<_CharT> __fmt,
+                   __format_context<_CharT>& __ctx)

And then providing extern definitions only for the <char, true> and
<wchar_t, true> specializations of the above (and not introducing any
runtime _Api_ctx).

And then, once we call C++23 stable, we will export the C++23 version of
the above and remove the C++20 one.
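
To make the shape of that plan concrete, here is a rough standalone sketch.
It is not the libstdc++ code: literal_is_unicode and do_vformat are
hypothetical stand-ins for __unicode::__literal_is_unicode and
__do_vformat_to, and the encoding check only recognises UTF-8 for char (the
wide-character branch is an assumption). The point is that the defaulted
bool makes the caller's literal encoding part of the specialization, so only
the <char, true> specialization needs an extern declaration and a single
exported definition.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for the library's literal-encoding check.
// For char it inspects the bytes of U+00E9, which is C3 A9 in UTF-8.
template<typename CharT>
constexpr bool literal_is_unicode()
{
  if constexpr (sizeof(CharT) == 1)
    {
      constexpr char e_acute[] = "\u00e9";
      return sizeof(e_acute) == 3
             && static_cast<unsigned char>(e_acute[0]) == 0xC3
             && static_cast<unsigned char>(e_acute[1]) == 0xA9;
    }
  else
    return true; // assumption: wide literals are UTF-16 or UTF-32
}

// Hypothetical analogue of __do_vformat_to. The defaulted bool is computed
// in the calling TU, so TUs with different literal encodings name
// different specializations and cannot share one definition by accident.
template<typename CharT, bool Unicode = literal_is_unicode<CharT>()>
std::basic_string<CharT>
do_vformat(std::basic_string_view<CharT> fmt)
{
  std::basic_string<CharT> out(fmt);
  out += Unicode ? CharT('U') : CharT('N'); // tag shows which variant ran
  return out;
}

// What the header would carry: Unicode TUs use the library's copy
// instead of instantiating their own.
extern template std::string do_vformat<char, true>(std::string_view);

// What a single library source file would carry. It is placed in the same
// file here only so that the sketch links on its own.
template std::string do_vformat<char, true>(std::string_view);
```

A TU compiled with a non-Unicode literal encoding would name
do_vformat<char, false>, for which no extern declaration exists, so it is
implicitly instantiated in that TU and never picks up the exported
definition.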



> On Tue, Apr 7, 2026 at 4:08 PM Tomasz Kaminski <tkaminsk@redhat.com>
> wrote:
>
>>
>>
>> On Tue, Apr 7, 2026 at 3:54 PM Jonathan Wakely <jwakely.gcc@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, 7 Apr 2026, 14:30 Tomasz Kaminski, <tkaminsk@redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Apr 7, 2026 at 3:00 PM Jonathan Wakely <jwakely.gcc@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, 7 Apr 2026, 09:15 Tomasz Kamiński, <tkaminsk@redhat.com>
>>>>> wrote:
>>>>>
>>>>>> This patch adds a _M_api member to basic_format_context and
>>>>>> basic_format_parse_context, that represents the information about
>>>>>> the TU in which the call was compiled:
>>>>>> * _M_ver represents the C++ standard in which TU was compiled,
>>>>>> * _M_literal_unicode is true when TU was compiled with Unicode
>>>>>>   literal encoding,
>>>>>> * _M_literal_enc is reserved for storing text_encoding::id value
>>>>>>   for literal encoding, currently set to zero.
>>>>>> These values are then populated by __current_api<_CharT>() functions.
>>>>>>
>>>>>> This would allow the formatter instantiations compiled in different
>>>>>> TU (for example as part of libstdc++.so) to properly handle:
>>>>>> * multi-byte fill-characters used as fill in format-spec, that
>>>>>>   are supported only if literal encoding is Unicode,
>>>>>> * '?' as format flags for string and characters, that is only
>>>>>>   supported since C++23,
>>>>>> * escaping of the string parameters, that depends on the literal
>>>>>>   encoding.
>>>>>>
>>>>>> To further aid the above, a new __do_vformat_to overload is
>>>>>> extracted.
>>>>>> This overload accepts a format_context& that encodes the TU-specific
>>>>>> properties, and can be exported from libstdc++.
>>>>>>
>>>>>> This patch on purpose does not modify the formatters code, and only
>>>>>> adds new members, as adding them later would be an ABI break.
>>>>>>
>>>>>> libstdc++-v3/ChangeLog:
>>>>>>
>>>>>>         * include/std/format (__format::_Api_ctx,
>>>>>> __format::__current_api):
>>>>>>         Define.
>>>>>>         (basic_format_parse_context::_M_api): Define.
>>>>>>         (basic_format_parse_context::basic_format_parse_context):
>>>>>>         Provide (basic_string_view, size_t) constructor only in C++20.
>>>>>>         Define new internal private constructor accepting _Api_ctx.
>>>>>>         (basic_format_context::_M_api): Define.
>>>>>>         (basic_format_context::basic_format_context): Add additional
>>>>>>         _Api_ctx parameter.
>>>>>>         (_Scanner::_Scanner): Add additional _Api_ctx parameter,
>>>>>>         and forward it to basic_format_parse_context.
>>>>>>         (_Formatting_scanner::_Formatting_scanner): Propagate
>>>>>>         _M_api from basic_format_context.
>>>>>>         (_Checking_scanner::_Checking_scanner): Use
>>>>>> __format::__current_api()
>>>>>>         to initialize API.
>>>>>>         (__format::__do_vformat_to): Extract overload accepting
>>>>>>         basic_format_context.
>>>>>> ---
>>>>>> I have realized that exporting the vformat specializations correctly
>>>>>> requires much bigger code changes than I am comfortable making this
>>>>>> late in stage 4, as we will need to make the code independent of
>>>>>> TU-specific properties (like encodings). This patch instead adds
>>>>>> context members to basic_format_context and
>>>>>> basic_format_parse_context that would allow doing so in the future.
>>>>>>
>>>>>
>>>>> An alternative would be to have an inline dispatching function that
>>>>> decides whether the current TU matches what's in the library (where that
>>>>> will be the common case) and only uses the explicit instantiations if it
>>>>> matches.
>>>>>
>>>> Seems reasonable for Unicode encoding, but does not solve the case of
>>>> multiple standards.
>>>>
>>>>>
>>>>> I'm not sure this is really a problem I care about solving. If you try
>>>>> to mix incompatible literal encodings in one program you shouldn't expect
>>>>> sensible results for code that is sensitive to the literal encoding.
>>>>>
>>>>> When mixing C++20 and C++23, the C++20 TUs should use the explicit
>>>>> instantiation which is right for C++20, and C++23 TUs will use an implicit
>>>>> instantiation of the C++23 definition.
>>>>>
>>>> This works only on surface level, C++20 will use __vformat_impl_20 and
>>>> C++23 will use __vformat_impl_23, defined in format-inst-20, format-inst-23
>>>> source files.
>>>>
>>>
>>> Once we stabilise C++23 I think we should only instantiate the format
>>> functions for C++23. Will anybody really care if they use a dynamic format
>>> string in C++20 code and don't get an exception for using a C++23 format
>>> specifier?
>>>
>> Yes, the fact that the format specifier string is accepted by `vformat`
>> but rejected by `format` seems very surprising to me, but if we accepted
>> that we have a much simpler problem to solve. If we plan to accept all
>> C++23 specifiers for basic types as extension in C++20 mode, once
>> it will be stabilized, then that would sound much more intuitive for me.
>>
>>>
>>> Both of these files will instantiate `__formatter_str` under the same
>>>> name. When they are combined into `libstdc++.so`, the linker picks one,
>>>> and one of the standards will get incorrect behavior.
>>>>
>>>
>>> Fine, the older standard could get the "wrong"  behaviour (where wrong
>>> just means supporting C++23 features when called from C++20 TUs).
>>>
>> I considered that unacceptable, and we even have test cases checking
>> whether that is the case. The failures of these tests were the reason
>> I started exploring alternatives.
>>
>>>
>>> To avoid this problem we will need to ABI tag each used formatter in
>>>> some manner and apply that
>>>> tag virally to any formatter referenced from these functions.
>>>>
>>>>
>>>>>
>>>>> Is there really a problem?
>>>>>
>>>>> If we can capture the API level without adding any overhead, I suppose
>>>>> that's acceptable.
>>>>>
>>>>> If we store the text encoding, what are we going to do with it? Use
>>>>> iconv to convert the fill character on the fly? To what output encoding?
>>>>>
>>>> This is for the future, if we want to handle string escaping for
>>>> non-Unicode encodings better than giving them the same behavior as ASCII.
>>>> We can add any additional fields to basic_format_parse_context and
>>>> basic_format_context in the future; this is why I am reserving space for it.
>>>>
>>>> // Also could you take a look at:
>>>> https://gcc.gnu.org/pipermail/libstdc++/2026-April/066030.html
>>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Tested all *format* tests on x86_64-linux. OK for trunk when all tests
>>>>>> pass?
>>>>>>
>>>>>>  libstdc++-v3/include/std/format | 209
>>>>>> ++++++++++++++++++++------------
>>>>>>  1 file changed, 133 insertions(+), 76 deletions(-)
>>>>>>
>>>>>> diff --git a/libstdc++-v3/include/std/format
>>>>>> b/libstdc++-v3/include/std/format
>>>>>> index eca5bd213aa..97d1ecb3ed6 100644
>>>>>> --- a/libstdc++-v3/include/std/format
>>>>>> +++ b/libstdc++-v3/include/std/format
>>>>>> @@ -140,6 +140,37 @@ namespace __format
>>>>>>        template<typename, typename...> friend struct
>>>>>> std::basic_format_string;
>>>>>>      };
>>>>>>
>>>>>> +  // Exposed via basic_format_parse_context, defines the TU specific
>>>>>> information
>>>>>> +  // like encoding and standard version.
>>>>>> +  struct _Api_ctx
>>>>>> +  {
>>>>>> +    enum class _Version : unsigned char
>>>>>> +    { _Api_2020, _Api_2023, _Api_2026 };
>>>>>> +
>>>>>> +    _Version _M_ver;
>>>>>> +    unsigned _M_unused : 23;
>>>>>> +    unsigned _M_literal_unicode : 1;
>>>>>> +    __INT_LEAST32_TYPE__ _M_literal_enc;
>>>>>> +  };
>>>>>> +  using enum _Api_ctx::_Version;
>>>>>> +
>>>>>> +  template<typename _CharT>
>>>>>> +    constexpr _Api_ctx
>>>>>> +    __current_api()
>>>>>>
>>>>>
>>>>> Should this be always inline?
>>>>>
>>>>> +    {
>>>>>> +      _Api_ctx __api{};
>>>>>> +#if __cplusplus > 202302L
>>>>>> +      __api._M_ver = _Api_2026;
>>>>>> +#elif __cplusplus > 202002L
>>>>>> +      __api._M_ver = _Api_2023;
>>>>>> +#else
>>>>>> +      __api._M_ver = _Api_2020;
>>>>>> +#endif
>>>>>> +      __api._M_literal_unicode
>>>>>> +       = __unicode::__literal_encoding_is_unicode<_CharT>();
>>>>>> +      return __api;
>>>>>> +    }
>>>>>> +
>>>>>>  } // namespace __format
>>>>>>  /// @endcond
>>>>>>
>>>>>> @@ -274,7 +305,7 @@ namespace __format
>>>>>>    { __throw_format_error("format error: failed to parse
>>>>>> format-spec"); }
>>>>>>
>>>>>>    template<typename _CharT> class _Scanner;
>>>>>> -
>>>>>> +  template<typename _Out, typename _CharT> class _Formatting_scanner;
>>>>>>  } // namespace __format
>>>>>>    /// @endcond
>>>>>>
>>>>>> @@ -408,23 +439,34 @@ namespace __format
>>>>>>        // This must not be constexpr.
>>>>>>        static void __invalid_dynamic_spec(const char*);
>>>>>>
>>>>>> -      friend __format::_Scanner<_CharT>;
>>>>>> -#endif
>>>>>> -
>>>>>> +#else
>>>>>>        // This constructor should only be used by the implementation.
>>>>>>        constexpr explicit
>>>>>>        basic_format_parse_context(basic_string_view<_CharT> __fmt,
>>>>>>                                  size_t __num_args) noexcept
>>>>>>        : _M_begin(__fmt.begin()), _M_end(__fmt.end()),
>>>>>> _M_num_args(__num_args)
>>>>>>        { }
>>>>>> +#endif
>>>>>>
>>>>>>      private:
>>>>>> +      // This constructor should only be used by the implementation.
>>>>>> +      constexpr explicit
>>>>>> +      basic_format_parse_context(__format::_Api_ctx __api,
>>>>>> +                                basic_string_view<_CharT> __fmt,
>>>>>> +                                size_t __num_args) noexcept
>>>>>> +      : _M_api(__api), _M_begin(__fmt.begin()), _M_end(__fmt.end())
>>>>>> +      , _M_num_args(__num_args)
>>>>>> +      { }
>>>>>> +
>>>>>> +      __format::_Api_ctx _M_api = __format::__current_api<_CharT>();
>>>>>>
>>>>>
>>>>> What guarantees this will be initialized by a call to the right
>>>>> version?
>>>>>
>>>> This is only used by the basic_format_parse_context(string) constructor,
>>>> which is mostly used for user-defined formatters. We may want to define
>>>> this constructor as always inline.
>>>>
>>>>>
>>>>> Doesn't putting this member first add a lot of wasted padding due to
>>>>> alignment?
>>>>>
>>>> I do not think the size of basic_format_parse_context and
>>>> basic_format_context is relevant to anybody. But the struct is 64 bits,
>>>> so it should not add extra padding.
>>>>
>>>
>>> I don't understand how adding a new byte before the first iterator
>>> member doesn't introduce sizeof(void*)-1 bytes of padding.
>>>
>> _Api_ctx is an 8-byte struct with the literal encoding information. And I
>> considered having spare bytes there as a feature, and not
>> a drawback.
>>
>>>
>>> Why don't we just give _Indexing a fixed underlying type of unsigned
>>> char and then put the API version after that?
>>>
>>>
>>>
>>>
>>>>>
>>>>>        iterator _M_begin;
>>>>>>        iterator _M_end;
>>>>>>        enum _Indexing { _Unknown, _Manual, _Auto };
>>>>>>        _Indexing _M_indexing = _Unknown;
>>>>>>
>>>>>
>>>>> We already have padding bytes here (and could guarantee that by giving
>>>>> a fixed underlying type to _Indexing)
>>>>>
>>>>>        size_t _M_next_arg_id = 0;
>>>>>>        size_t _M_num_args = 0;
>>>>>> +
>>>>>> +      friend __format::_Scanner<_CharT>;
>>>>>>      };
>>>>>>
>>>>>>  /// @cond undocumented
>>>>>> @@ -4927,18 +4969,21 @@ namespace __format
>>>>>>      {
>>>>>>        static_assert( output_iterator<_Out, const _CharT&> );
>>>>>>
>>>>>> +      __format::_Api_ctx  _M_api;
>>>>>>        basic_format_args<basic_format_context> _M_args;
>>>>>>        _Out _M_out;
>>>>>>        __format::_Optional_locale _M_loc;
>>>>>>
>>>>>> -      basic_format_context(basic_format_args<basic_format_context>
>>>>>> __args,
>>>>>> +      basic_format_context(__format::_Api_ctx __api,
>>>>>> +                          basic_format_args<basic_format_context>
>>>>>> __args,
>>>>>>                            _Out __out)
>>>>>> -      : _M_args(__args), _M_out(std::move(__out))
>>>>>> +      : _M_api(__api), _M_args(__args), _M_out(std::move(__out))
>>>>>>        { }
>>>>>>
>>>>>> -      basic_format_context(basic_format_args<basic_format_context>
>>>>>> __args,
>>>>>> +      basic_format_context(__format::_Api_ctx __api,
>>>>>> +                          basic_format_args<basic_format_context>
>>>>>> __args,
>>>>>>                            _Out __out, const std::locale& __loc)
>>>>>> -      : _M_args(__args), _M_out(std::move(__out)), _M_loc(__loc)
>>>>>> +      : _M_api(__api), _M_args(__args),
>>>>>> _M_out(std::move(__out)), _M_loc(__loc)
>>>>>>        { }
>>>>>>
>>>>>>        // _GLIBCXX_RESOLVE_LIB_DEFECTS
>>>>>> @@ -4954,6 +4999,7 @@ namespace __format
>>>>>>                                   const locale*);
>>>>>>
>>>>>>        friend __format::__formatter_chrono<_CharT>;
>>>>>> +      friend __format::_Formatting_scanner<_Out, _CharT>;
>>>>>>
>>>>>>      public:
>>>>>>        ~basic_format_context() = default;
>>>>>> @@ -4998,8 +5044,9 @@ namespace __format
>>>>>>        } _M_pc;
>>>>>>
>>>>>>        constexpr explicit
>>>>>> -      _Scanner(basic_string_view<_CharT> __str, size_t __nargs =
>>>>>> (size_t)-1)
>>>>>> -      : _M_pc(__str, __nargs)
>>>>>> +      _Scanner(_Api_ctx __api, basic_string_view<_CharT> __str,
>>>>>> +              size_t __nargs = (size_t)-1)
>>>>>> +      : _M_pc(__api, __str, __nargs)
>>>>>>        { }
>>>>>>
>>>>>>        constexpr iterator begin() const noexcept { return
>>>>>> _M_pc.begin(); }
>>>>>> @@ -5115,7 +5162,7 @@ namespace __format
>>>>>>      public:
>>>>>>        _Formatting_scanner(basic_format_context<_Out, _CharT>& __fc,
>>>>>>                           basic_string_view<_CharT> __str)
>>>>>> -      : _Scanner<_CharT>(__str), _M_fc(__fc)
>>>>>> +      : _Scanner<_CharT>(__fc._M_api, __str), _M_fc(__fc)
>>>>>>        { }
>>>>>>
>>>>>>      private:
>>>>>> @@ -5176,7 +5223,8 @@ namespace __format
>>>>>>      public:
>>>>>>        consteval
>>>>>>        _Checking_scanner(basic_string_view<_CharT> __str)
>>>>>> -      : _Scanner<_CharT>(__str, sizeof...(_Args))
>>>>>> +      : _Scanner<_CharT>(__format::__current_api<_CharT>(),
>>>>>>
>>>>>
>>>>> This is consteval so should use the right version for the current TU.
>>>>>
>>>>> +                        __str, sizeof...(_Args))
>>>>>>        {
>>>>>>  #if __cpp_lib_format >= 202305L
>>>>>>         this->_M_pc._M_types = _M_types.data();
>>>>>> @@ -5219,82 +5267,91 @@ namespace __format
>>>>>>  #endif
>>>>>>      };
>>>>>>
>>>>>> -  template<typename _Out, typename _CharT, typename _Context>
>>>>>> -    inline _Out
>>>>>> -    __do_vformat_to(_Out __out, basic_string_view<_CharT> __fmt,
>>>>>> -                   const basic_format_args<_Context>& __args,
>>>>>> -                   const locale* __loc)
>>>>>> +  template<typename _CharT>
>>>>>> +    _Sink_iter<_CharT>
>>>>>> +    __do_vformat_to(_Sink_iter<_CharT> __out,
>>>>>> basic_string_view<_CharT> __fmt,
>>>>>> +                   __format_context<_CharT>& __ctx)
>>>>>>      {
>>>>>> -      if constexpr (is_same_v<_Out, _Sink_iter<_CharT>>)
>>>>>> -       {
>>>>>> -         if constexpr (is_same_v<_CharT, char>)
>>>>>> -           // Fast path for "{}" format strings and simple format
>>>>>> arg types.
>>>>>> -           if (__fmt.size() == 2 && __fmt[0] == '{' && __fmt[1] ==
>>>>>> '}')
>>>>>> -             {
>>>>>> -               bool __done = false;
>>>>>> -               __format::__visit_format_arg([&](auto& __arg) {
>>>>>> -                 using _Tp = remove_cvref_t<decltype(__arg)>;
>>>>>> -                 if constexpr (is_same_v<_Tp, bool>)
>>>>>> +      if constexpr (is_same_v<_CharT, char>)
>>>>>> +       // Fast path for "{}" format strings and simple format arg
>>>>>> types.
>>>>>> +       if (__fmt.size() == 2 && __fmt[0] == '{' && __fmt[1] == '}')
>>>>>> +         {
>>>>>> +           bool __done = false;
>>>>>> +           __format::__visit_format_arg([&](auto& __arg) {
>>>>>> +             using _Tp = remove_cvref_t<decltype(__arg)>;
>>>>>> +             if constexpr (is_same_v<_Tp, bool>)
>>>>>> +               {
>>>>>> +                 size_t __len = 4 + !__arg;
>>>>>> +                 const char* __chars[] = { "false", "true" };
>>>>>> +                 if (auto __res = __out._M_reserve(__len))
>>>>>>                     {
>>>>>> -                     size_t __len = 4 + !__arg;
>>>>>> -                     const char* __chars[] = { "false", "true" };
>>>>>> -                     if (auto __res = __out._M_reserve(__len))
>>>>>> -                       {
>>>>>> -                         __builtin_memcpy(__res.get(),
>>>>>> __chars[__arg], __len);
>>>>>> -                         __res._M_bump(__len);
>>>>>> -                         __done = true;
>>>>>> -                       }
>>>>>> +                     __builtin_memcpy(__res.get(), __chars[__arg],
>>>>>> __len);
>>>>>> +                     __res._M_bump(__len);
>>>>>> +                     __done = true;
>>>>>>                     }
>>>>>> -                 else if constexpr (is_same_v<_Tp, char>)
>>>>>> +               }
>>>>>> +             else if constexpr (is_same_v<_Tp, char>)
>>>>>> +               {
>>>>>> +                 if (auto __res = __out._M_reserve(1))
>>>>>>                     {
>>>>>> -                     if (auto __res = __out._M_reserve(1))
>>>>>> -                       {
>>>>>> -                         *__res.get() = __arg;
>>>>>> -                         __res._M_bump(1);
>>>>>> -                         __done = true;
>>>>>> -                       }
>>>>>> +                     *__res.get() = __arg;
>>>>>> +                     __res._M_bump(1);
>>>>>> +                     __done = true;
>>>>>>                     }
>>>>>> -                 else if constexpr (is_integral_v<_Tp>)
>>>>>> +               }
>>>>>> +             else if constexpr (is_integral_v<_Tp>)
>>>>>> +               {
>>>>>> +                 make_unsigned_t<_Tp> __uval;
>>>>>> +                 const bool __neg = __arg < 0;
>>>>>> +                 if (__neg)
>>>>>> +                   __uval = make_unsigned_t<_Tp>(~__arg) + 1u;
>>>>>> +                 else
>>>>>> +                   __uval = __arg;
>>>>>> +                 const auto __n = __detail::__to_chars_len(__uval);
>>>>>> +                 if (auto __res = __out._M_reserve(__n + __neg))
>>>>>>                     {
>>>>>> -                     make_unsigned_t<_Tp> __uval;
>>>>>> -                     const bool __neg = __arg < 0;
>>>>>> -                     if (__neg)
>>>>>> -                       __uval = make_unsigned_t<_Tp>(~__arg) + 1u;
>>>>>> -                     else
>>>>>> -                       __uval = __arg;
>>>>>> -                     const auto __n =
>>>>>> __detail::__to_chars_len(__uval);
>>>>>> -                     if (auto __res = __out._M_reserve(__n + __neg))
>>>>>> -                       {
>>>>>> -                         auto __ptr = __res.get();
>>>>>> -                         *__ptr = '-';
>>>>>> -                         __detail::__to_chars_10_impl(__ptr +
>>>>>> (int)__neg, __n,
>>>>>> -                                                      __uval);
>>>>>> -                         __res._M_bump(__n + __neg);
>>>>>> -                         __done = true;
>>>>>> -                       }
>>>>>> +                     auto __ptr = __res.get();
>>>>>> +                     *__ptr = '-';
>>>>>> +                     __detail::__to_chars_10_impl(__ptr +
>>>>>> (int)__neg, __n,
>>>>>> +                                                  __uval);
>>>>>> +                     __res._M_bump(__n + __neg);
>>>>>> +                     __done = true;
>>>>>>                     }
>>>>>> -                 else if constexpr (is_convertible_v<_Tp,
>>>>>> string_view>)
>>>>>> +               }
>>>>>> +             else if constexpr (is_convertible_v<_Tp, string_view>)
>>>>>> +               {
>>>>>> +                 string_view __sv = __arg;
>>>>>> +                 if (auto __res = __out._M_reserve(__sv.size()))
>>>>>>                     {
>>>>>> -                     string_view __sv = __arg;
>>>>>> -                     if (auto __res = __out._M_reserve(__sv.size()))
>>>>>> -                       {
>>>>>> -                         __builtin_memcpy(__res.get(), __sv.data(),
>>>>>> __sv.size());
>>>>>> -                         __res._M_bump(__sv.size());
>>>>>> -                         __done = true;
>>>>>> -                       }
>>>>>> +                     __builtin_memcpy(__res.get(), __sv.data(),
>>>>>> __sv.size());
>>>>>> +                     __res._M_bump(__sv.size());
>>>>>> +                     __done = true;
>>>>>>                     }
>>>>>> -               }, __args.get(0));
>>>>>> +               }
>>>>>> +           }, __ctx.arg(0));
>>>>>>
>>>>>> -               if (__done)
>>>>>> -                 return __out;
>>>>>> -             }
>>>>>> +           if (__done)
>>>>>> +             return __out;
>>>>>> +         }
>>>>>>
>>>>>> +      _Formatting_scanner<_Sink_iter<_CharT>, _CharT>
>>>>>> __scanner(__ctx, __fmt);
>>>>>> +      __scanner._M_scan();
>>>>>> +      return __out;
>>>>>> +    }
>>>>>> +
>>>>>> +  template<typename _Out, typename _CharT, typename _Context>
>>>>>> +    _Out
>>>>>> +    __do_vformat_to(_Out __out, basic_string_view<_CharT> __fmt,
>>>>>> +                   const basic_format_args<_Context>& __args,
>>>>>> +                   const locale* __loc)
>>>>>> +    {
>>>>>> +      if constexpr (is_same_v<_Out, _Sink_iter<_CharT>>)
>>>>>> +       {
>>>>>> +         const auto __api = __format::__current_api<_CharT>();
>>>>>>           auto __ctx = __loc == nullptr
>>>>>> -                        ? _Context(__args, __out)
>>>>>> -                        : _Context(__args, __out, *__loc);
>>>>>> -         _Formatting_scanner<_Sink_iter<_CharT>, _CharT>
>>>>>> __scanner(__ctx, __fmt);
>>>>>> -         __scanner._M_scan();
>>>>>> -         return __out;
>>>>>> +                    ? _Context(__api, __args, __out)
>>>>>> +                    : _Context(__api, __args, __out, *__loc);
>>>>>> +         return __do_vformat_to(std::move(__out), __fmt, __ctx);
>>>>>>         }
>>>>>>        else if constexpr (__contiguous_char_iter<_CharT, _Out>)
>>>>>>         {
>>>>>> --
>>>>>> 2.53.0
>>>>>>
>>>>>>
  
Jonathan Wakely April 7, 2026, 3:47 p.m. UTC | #7
On Tue, 7 Apr 2026, 15:09 Tomasz Kaminski, <tkaminsk@redhat.com> wrote:

>
>
> On Tue, Apr 7, 2026 at 3:54 PM Jonathan Wakely <jwakely.gcc@gmail.com>
> wrote:
>
>>
>>
>> On Tue, 7 Apr 2026, 14:30 Tomasz Kaminski, <tkaminsk@redhat.com> wrote:
>>
>>>
>>>
>>> On Tue, Apr 7, 2026 at 3:00 PM Jonathan Wakely <jwakely.gcc@gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, 7 Apr 2026, 09:15 Tomasz Kamiński, <tkaminsk@redhat.com> wrote:
>>>>
>>>>> This patch adds a _M_api member to basic_format_context and
>>>>> basic_format_parse_context, that represents the information about
>>>>> the TU in which the call was compiled:
>>>>> * _M_ver represents the C++ standard in which TU was compiled,
>>>>> * _M_literal_unicode is true when TU was compiled with Unicode
>>>>>   literal encoding,
>>>>> * _M_literal_enc is reserved for storing text_encoding::id value
>>>>>   for literal encoding, currently set to zero.
>>>>> These values are then populated by __current_api<_CharT>() functions.
>>>>>
>>>>> This would allow the formatter instantiations compiled in different
>>>>> TU (for example as part of libstdc++.so) to properly handle:
>>>>> * multi-byte fill-characters used as fill in format-spec, that
>>>>>   are supported only if literal encoding is Unicode,
>>>>> * '?' as format flags for string and characters, that is only
>>>>>   supported since C++23,
>>>>> * escaping of the string parameters, that depends on the literal
>>>>>   encoding.
>>>>>
>>>>> To further aid the above, a new __do_vformat_to overload is extracted.
>>>>> This overload accepts a format_context& that encodes the TU-specific
>>>>> properties, and can be exported from libstdc++.
>>>>>
>>>>> This patch on purpose does not modify the formatters code, and only
>>>>> adds new members, as adding them later would be an ABI break.
>>>>>
>>>>> libstdc++-v3/ChangeLog:
>>>>>
>>>>>         * include/std/format (__format::_Api_ctx,
>>>>> __format::__current_api):
>>>>>         Define.
>>>>>         (basic_format_parse_context::_M_api): Define.
>>>>>         (basic_format_parse_context::basic_format_parse_context):
>>>>>         Provide (basic_string_view, size_t) constructor only in C++20.
>>>>>         Define new internal private constructor accepting _Api_ctx.
>>>>>         (basic_format_context::_M_api): Define.
>>>>>         (basic_format_context::basic_format_context): Add additional
>>>>>         _Api_ctx parameter.
>>>>>         (_Scanner::_Scanner): Add additional _Api_ctx parameter,
>>>>>         and forward it to basic_format_parse_context.
>>>>>         (_Formatting_scanner::_Formatting_scanner): Propagate
>>>>>         _M_api from basic_format_context.
>>>>>         (_Checking_scanner::_Checking_scanner): Use
>>>>> __format::__current_api()
>>>>>         to initialize API.
>>>>>         (__format::__do_vformat_to): Extract overload accepting
>>>>>         basic_format_context.
>>>>> ---
>>>>> I have realized that exporting the vformat specializations correctly
>>>>> requires much bigger code changes than I am comfortable making this
>>>>> late in stage 4, as we will need to make the code independent of
>>>>> TU-specific properties (like encodings). This patch instead adds
>>>>> context members to basic_format_context and
>>>>> basic_format_parse_context that would allow doing so in the future.
>>>>>
>>>>
>>>> An alternative would be to have an inline dispatching function that
>>>> decides whether the current TU matches what's in the library (where that
>>>> will be the common case) and only uses the explicit instantiations if it
>>>> matches.
>>>>
>>> Seems reasonable for the Unicode encoding, but does not solve the
>>> multiple-standards case.
>>>
>>>>
>>>> I'm not sure this is really a problem I care about solving. If you try
>>>> to mix incompatible literal encodings in one program you shouldn't expect
>>>> sensible results for code that is sensitive to the literal encoding.
>>>>
>>>> When mixing C++20 and C++23, the C++20 TUs should use the explicit
>>>> instantiation which is right for C++20, and C++23 TUs will use an implicit
>>>> instantiation of the C++23 definition.
>>>>
>>> This works only on the surface: C++20 will use __vformat_impl_20 and
>>> C++23 will use __vformat_impl_23, defined in the format-inst-20 and
>>> format-inst-23 source files.
>>>
>>
>> Once we stabilise C++23 I think we should only instantiate the format
>> functions for C++23. Will anybody really care if they use a dynamic format
>> string in C++20 code and don't get an exception for using a C++23 format
>> specifier?
>>
> Yes, the fact that a format specifier string is accepted by `vformat`
> but rejected by `format` seems very surprising to me, but if we accept
> that, we have a much simpler problem to solve. If we plan to accept all
> C++23 specifiers for basic types as an extension in C++20 mode once
> C++23 is stabilized, then that would sound much more intuitive to me.
>
>>
>> Both of these files will instantiate `__formatter_str` under the same
>>> name. When they are combined into `libstdc++.so`, the linker picks one,
>>> and one of the standards will get incorrect behavior.
>>>
>>
>> Fine, the older standard could get the "wrong" behaviour (where wrong
>> just means supporting C++23 features when called from C++20 TUs).
>>
> I considered that unacceptable, and we even have test cases checking
> whether that is the case. The failures of these tests were the reason
> I started exploring alternatives.
>
>>
>> To avoid this problem we will need to ABI tag each used formatter in some
>>> manner and apply that
>>> tag virally to any formatter referenced from these functions.
>>>
>>>
>>>>
>>>> Is there really a problem?
>>>>
>>>> If we can capture the API level without adding any overhead, I suppose
>>>> that's acceptable.
>>>>
>>>> If we store the text encoding, what are we going to do with it? Use
>>>> iconv to convert the fill character on the fly? To what output encoding?
>>>>
>>> This is for the future, if we want to handle string escaping for
>>> non-Unicode encodings better than giving them the same behavior as ASCII.
>>> We can add any additional fields to basic_format_parse_context and
>>> basic_format_context in the future; this is why I am reserving space for it.
>>>
>>> // Also could you take a look at:
>>> https://gcc.gnu.org/pipermail/libstdc++/2026-April/066030.html
>>>
>>>
>>>>
>>>>
>>>>
>>>>
>>>>> Tested all *format* tests on x86_64-linux. OK for trunk when all tests
>>>>> pass?
>>>>>
>>>>>  libstdc++-v3/include/std/format | 209 ++++++++++++++++++++------------
>>>>>  1 file changed, 133 insertions(+), 76 deletions(-)
>>>>>
>>>>> diff --git a/libstdc++-v3/include/std/format
>>>>> b/libstdc++-v3/include/std/format
>>>>> index eca5bd213aa..97d1ecb3ed6 100644
>>>>> --- a/libstdc++-v3/include/std/format
>>>>> +++ b/libstdc++-v3/include/std/format
>>>>> @@ -140,6 +140,37 @@ namespace __format
>>>>>        template<typename, typename...> friend struct
>>>>> std::basic_format_string;
>>>>>      };
>>>>>
>>>>> +  // Exposed via basic_format_parse_context, defines the TU specific
>>>>> information
>>>>> +  // like encoding and standard version.
>>>>> +  struct _Api_ctx
>>>>> +  {
>>>>> +    enum class _Version : unsigned char
>>>>> +    { _Api_2020, _Api_2023, _Api_2026 };
>>>>> +
>>>>> +    _Version _M_ver;
>>>>> +    unsigned _M_unused : 23;
>>>>> +    unsigned _M_literal_unicode : 1;
>>>>> +    __INT_LEAST32_TYPE__ _M_literal_enc;
>>>>> +  };
>>>>> +  using enum _Api_ctx::_Version;
>>>>> +
>>>>> +  template<typename _CharT>
>>>>> +    constexpr _Api_ctx
>>>>> +    __current_api()
>>>>>
>>>>
>>>> Should this be always inline?
>>>>
>>>> +    {
>>>>> +      _Api_ctx __api{};
>>>>> +#if __cplusplus > 202302L
>>>>> +      __api._M_ver = _Api_2026;
>>>>> +#elif __cplusplus > 202002L
>>>>> +      __api._M_ver = _Api_2023;
>>>>> +#else
>>>>> +      __api._M_ver = _Api_2020;
>>>>> +#endif
>>>>> +      __api._M_literal_unicode
>>>>> +       = __unicode::__literal_encoding_is_unicode<_CharT>();
>>>>> +      return __api;
>>>>> +    }
>>>>> +
>>>>>  } // namespace __format
>>>>>  /// @endcond
>>>>>
>>>>> @@ -274,7 +305,7 @@ namespace __format
>>>>>    { __throw_format_error("format error: failed to parse
>>>>> format-spec"); }
>>>>>
>>>>>    template<typename _CharT> class _Scanner;
>>>>> -
>>>>> +  template<typename _Out, typename _CharT> class _Formatting_scanner;
>>>>>  } // namespace __format
>>>>>    /// @endcond
>>>>>
>>>>> @@ -408,23 +439,34 @@ namespace __format
>>>>>        // This must not be constexpr.
>>>>>        static void __invalid_dynamic_spec(const char*);
>>>>>
>>>>> -      friend __format::_Scanner<_CharT>;
>>>>> -#endif
>>>>> -
>>>>> +#else
>>>>>        // This constructor should only be used by the implementation.
>>>>>        constexpr explicit
>>>>>        basic_format_parse_context(basic_string_view<_CharT> __fmt,
>>>>>                                  size_t __num_args) noexcept
>>>>>        : _M_begin(__fmt.begin()), _M_end(__fmt.end()),
>>>>> _M_num_args(__num_args)
>>>>>        { }
>>>>> +#endif
>>>>>
>>>>>      private:
>>>>> +      // This constructor should only be used by the implementation.
>>>>> +      constexpr explicit
>>>>> +      basic_format_parse_context(__format::_Api_ctx __api,
>>>>> +                                basic_string_view<_CharT> __fmt,
>>>>> +                                size_t __num_args) noexcept
>>>>> +      : _M_api(__api), _M_begin(__fmt.begin()), _M_end(__fmt.end())
>>>>> +      , _M_num_args(__num_args)
>>>>> +      { }
>>>>> +
>>>>> +      __format::_Api_ctx _M_api = __format::__current_api<_CharT>();
>>>>>
>>>>
>>>> What guarantees this will be initialized by a call to the right version?
>>>>
>>> This is only used by the basic_format_parse_context(string) constructor,
>>> which is mostly used for user-defined formatters. We may want to define
>>> these constructors as always inline.
>>>
>>>>
>>>> Doesn't putting this member first add a lot of wasted padding due to
>>>> alignment?
>>>>
>>> I do not think the size of basic_format_parse_context and
>>> basic_format_context is relevant to anybody. But the struct is 64 bits,
>>> so it should not add extra padding.
>>>
>>
>> I don't understand how adding a new byte before the first iterator member
>> doesn't introduce sizeof(void*)-1 bytes of padding.
>>
> _Api_ctx is an 8-byte struct with the literal encoding information. And I
> considered having spare bytes there a feature, not a drawback.
>

Oh, I thought this was just storing the enum, not the struct. Doh.

That explains why I was confused.
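
For reference, the padding question can be checked with a small standalone
sketch. Everything below is a hypothetical stand-in (ApiCtx, ParseCtxApiFirst,
ParseCtxEnumFirst and the gap_* helpers are not libstdc++ names); it only
mirrors the member order from the patch and compares "whole struct first"
against "one-byte enum first". The exact sizes depend on the target ABI, so
treat the numbers as something to verify rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for __format::_Api_ctx; field names are illustrative.
struct ApiCtx
{
  enum class Version : unsigned char { api2020, api2023, api2026 };

  Version ver;
  unsigned unused : 23;
  unsigned literal_unicode : 1;
  std::int_least32_t literal_enc;
};

// Hypothetical stand-in for the leading members of the parse context,
// with the whole struct placed first, as in the patch.
struct ParseCtxApiFirst
{
  ApiCtx api;
  const char* begin;
  const char* end;
};

// The layout the review comment initially assumed: only the enum comes first.
struct ParseCtxEnumFirst
{
  ApiCtx::Version ver;
  const char* begin;
  const char* end;
};

// Padding bytes between the first member and `begin` in each layout.
constexpr std::size_t gap_api_first()
{ return offsetof(ParseCtxApiFirst, begin) - sizeof(ApiCtx); }

constexpr std::size_t gap_enum_first()
{ return offsetof(ParseCtxEnumFirst, begin) - sizeof(ApiCtx::Version); }
```

When sizeof(ApiCtx) is 8 and pointers are 8-byte aligned, gap_api_first()
comes out as zero, while gap_enum_first() is alignof(const char*) - 1, which
is the sizeof(void*)-1 figure mentioned above.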

Thinking more about the questions in the follow up mails....



>> Why don't we just give _Indexing a fixed underlying type of unsigned char
>> and then put the API version after that?
>>
>>
>>
>>
>>>>
>>>>        iterator _M_begin;
>>>>>        iterator _M_end;
>>>>>        enum _Indexing { _Unknown, _Manual, _Auto };
>>>>>        _Indexing _M_indexing = _Unknown;
>>>>>
>>>>
>>>> We already have padding bytes here (and could guarantee that by giving
>>>> a fixed underlying type to _Indexing)
>>>>
>>>>        size_t _M_next_arg_id = 0;
>>>>>        size_t _M_num_args = 0;
>>>>> +
>>>>> +      friend __format::_Scanner<_CharT>;
>>>>>      };
>>>>>
>>>>>  /// @cond undocumented
>>>>> @@ -4927,18 +4969,21 @@ namespace __format
>>>>>      {
>>>>>        static_assert( output_iterator<_Out, const _CharT&> );
>>>>>
>>>>> +      __format::_Api_ctx  _M_api;
>>>>>        basic_format_args<basic_format_context> _M_args;
>>>>>        _Out _M_out;
>>>>>        __format::_Optional_locale _M_loc;
>>>>>
>>>>> -      basic_format_context(basic_format_args<basic_format_context>
>>>>> __args,
>>>>> +      basic_format_context(__format::_Api_ctx __api,
>>>>> +                          basic_format_args<basic_format_context>
>>>>> __args,
>>>>>                            _Out __out)
>>>>> -      : _M_args(__args), _M_out(std::move(__out))
>>>>> +      : _M_api(__api), _M_args(__args), _M_out(std::move(__out))
>>>>>        { }
>>>>>
>>>>> -      basic_format_context(basic_format_args<basic_format_context>
>>>>> __args,
>>>>> +      basic_format_context(__format::_Api_ctx __api,
>>>>> +                          basic_format_args<basic_format_context>
>>>>> __args,
>>>>>                            _Out __out, const std::locale& __loc)
>>>>> -      : _M_args(__args), _M_out(std::move(__out)), _M_loc(__loc)
>>>>> +      : _M_api(__api), _M_args(__args),
>>>>> _M_out(std::move(__out)), _M_loc(__loc)
>>>>>        { }
>>>>>
>>>>>        // _GLIBCXX_RESOLVE_LIB_DEFECTS
>>>>> @@ -4954,6 +4999,7 @@ namespace __format
>>>>>                                   const locale*);
>>>>>
>>>>>        friend __format::__formatter_chrono<_CharT>;
>>>>> +      friend __format::_Formatting_scanner<_Out, _CharT>;
>>>>>
>>>>>      public:
>>>>>        ~basic_format_context() = default;
>>>>> @@ -4998,8 +5044,9 @@ namespace __format
>>>>>        } _M_pc;
>>>>>
>>>>>        constexpr explicit
>>>>> -      _Scanner(basic_string_view<_CharT> __str, size_t __nargs =
>>>>> (size_t)-1)
>>>>> -      : _M_pc(__str, __nargs)
>>>>> +      _Scanner(_Api_ctx __api, basic_string_view<_CharT> __str,
>>>>> +              size_t __nargs = (size_t)-1)
>>>>> +      : _M_pc(__api, __str, __nargs)
>>>>>        { }
>>>>>
>>>>>        constexpr iterator begin() const noexcept { return
>>>>> _M_pc.begin(); }
>>>>> @@ -5115,7 +5162,7 @@ namespace __format
>>>>>      public:
>>>>>        _Formatting_scanner(basic_format_context<_Out, _CharT>& __fc,
>>>>>                           basic_string_view<_CharT> __str)
>>>>> -      : _Scanner<_CharT>(__str), _M_fc(__fc)
>>>>> +      : _Scanner<_CharT>(__fc._M_api, __str), _M_fc(__fc)
>>>>>        { }
>>>>>
>>>>>      private:
>>>>> @@ -5176,7 +5223,8 @@ namespace __format
>>>>>      public:
>>>>>        consteval
>>>>>        _Checking_scanner(basic_string_view<_CharT> __str)
>>>>> -      : _Scanner<_CharT>(__str, sizeof...(_Args))
>>>>> +      : _Scanner<_CharT>(__format::__current_api<_CharT>(),
>>>>>
>>>>
>>>> This is consteval so should use the right version for the current TU.
>>>>
>>>> +                        __str, sizeof...(_Args))
>>>>>        {
>>>>>  #if __cpp_lib_format >= 202305L
>>>>>         this->_M_pc._M_types = _M_types.data();
>>>>> @@ -5219,82 +5267,91 @@ namespace __format
>>>>>  #endif
>>>>>      };
>>>>>
>>>>> -  template<typename _Out, typename _CharT, typename _Context>
>>>>> -    inline _Out
>>>>> -    __do_vformat_to(_Out __out, basic_string_view<_CharT> __fmt,
>>>>> -                   const basic_format_args<_Context>& __args,
>>>>> -                   const locale* __loc)
>>>>> +  template<typename _CharT>
>>>>> +    _Sink_iter<_CharT>
>>>>> +    __do_vformat_to(_Sink_iter<_CharT> __out,
>>>>> basic_string_view<_CharT> __fmt,
>>>>> +                   __format_context<_CharT>& __ctx)
>>>>>      {
>>>>> -      if constexpr (is_same_v<_Out, _Sink_iter<_CharT>>)
>>>>> -       {
>>>>> -         if constexpr (is_same_v<_CharT, char>)
>>>>> -           // Fast path for "{}" format strings and simple format arg
>>>>> types.
>>>>> -           if (__fmt.size() == 2 && __fmt[0] == '{' && __fmt[1] ==
>>>>> '}')
>>>>> -             {
>>>>> -               bool __done = false;
>>>>> -               __format::__visit_format_arg([&](auto& __arg) {
>>>>> -                 using _Tp = remove_cvref_t<decltype(__arg)>;
>>>>> -                 if constexpr (is_same_v<_Tp, bool>)
>>>>> +      if constexpr (is_same_v<_CharT, char>)
>>>>> +       // Fast path for "{}" format strings and simple format arg
>>>>> types.
>>>>> +       if (__fmt.size() == 2 && __fmt[0] == '{' && __fmt[1] == '}')
>>>>> +         {
>>>>> +           bool __done = false;
>>>>> +           __format::__visit_format_arg([&](auto& __arg) {
>>>>> +             using _Tp = remove_cvref_t<decltype(__arg)>;
>>>>> +             if constexpr (is_same_v<_Tp, bool>)
>>>>> +               {
>>>>> +                 size_t __len = 4 + !__arg;
>>>>> +                 const char* __chars[] = { "false", "true" };
>>>>> +                 if (auto __res = __out._M_reserve(__len))
>>>>>                     {
>>>>> -                     size_t __len = 4 + !__arg;
>>>>> -                     const char* __chars[] = { "false", "true" };
>>>>> -                     if (auto __res = __out._M_reserve(__len))
>>>>> -                       {
>>>>> -                         __builtin_memcpy(__res.get(),
>>>>> __chars[__arg], __len);
>>>>> -                         __res._M_bump(__len);
>>>>> -                         __done = true;
>>>>> -                       }
>>>>> +                     __builtin_memcpy(__res.get(), __chars[__arg],
>>>>> __len);
>>>>> +                     __res._M_bump(__len);
>>>>> +                     __done = true;
>>>>>                     }
>>>>> -                 else if constexpr (is_same_v<_Tp, char>)
>>>>> +               }
>>>>> +             else if constexpr (is_same_v<_Tp, char>)
>>>>> +               {
>>>>> +                 if (auto __res = __out._M_reserve(1))
>>>>>                     {
>>>>> -                     if (auto __res = __out._M_reserve(1))
>>>>> -                       {
>>>>> -                         *__res.get() = __arg;
>>>>> -                         __res._M_bump(1);
>>>>> -                         __done = true;
>>>>> -                       }
>>>>> +                     *__res.get() = __arg;
>>>>> +                     __res._M_bump(1);
>>>>> +                     __done = true;
>>>>>                     }
>>>>> -                 else if constexpr (is_integral_v<_Tp>)
>>>>> +               }
>>>>> +             else if constexpr (is_integral_v<_Tp>)
>>>>> +               {
>>>>> +                 make_unsigned_t<_Tp> __uval;
>>>>> +                 const bool __neg = __arg < 0;
>>>>> +                 if (__neg)
>>>>> +                   __uval = make_unsigned_t<_Tp>(~__arg) + 1u;
>>>>> +                 else
>>>>> +                   __uval = __arg;
>>>>> +                 const auto __n = __detail::__to_chars_len(__uval);
>>>>> +                 if (auto __res = __out._M_reserve(__n + __neg))
>>>>>                     {
>>>>> -                     make_unsigned_t<_Tp> __uval;
>>>>> -                     const bool __neg = __arg < 0;
>>>>> -                     if (__neg)
>>>>> -                       __uval = make_unsigned_t<_Tp>(~__arg) + 1u;
>>>>> -                     else
>>>>> -                       __uval = __arg;
>>>>> -                     const auto __n =
>>>>> __detail::__to_chars_len(__uval);
>>>>> -                     if (auto __res = __out._M_reserve(__n + __neg))
>>>>> -                       {
>>>>> -                         auto __ptr = __res.get();
>>>>> -                         *__ptr = '-';
>>>>> -                         __detail::__to_chars_10_impl(__ptr +
>>>>> (int)__neg, __n,
>>>>> -                                                      __uval);
>>>>> -                         __res._M_bump(__n + __neg);
>>>>> -                         __done = true;
>>>>> -                       }
>>>>> +                     auto __ptr = __res.get();
>>>>> +                     *__ptr = '-';
>>>>> +                     __detail::__to_chars_10_impl(__ptr + (int)__neg,
>>>>> __n,
>>>>> +                                                  __uval);
>>>>> +                     __res._M_bump(__n + __neg);
>>>>> +                     __done = true;
>>>>>                     }
>>>>> -                 else if constexpr (is_convertible_v<_Tp,
>>>>> string_view>)
>>>>> +               }
>>>>> +             else if constexpr (is_convertible_v<_Tp, string_view>)
>>>>> +               {
>>>>> +                 string_view __sv = __arg;
>>>>> +                 if (auto __res = __out._M_reserve(__sv.size()))
>>>>>                     {
>>>>> -                     string_view __sv = __arg;
>>>>> -                     if (auto __res = __out._M_reserve(__sv.size()))
>>>>> -                       {
>>>>> -                         __builtin_memcpy(__res.get(), __sv.data(),
>>>>> __sv.size());
>>>>> -                         __res._M_bump(__sv.size());
>>>>> -                         __done = true;
>>>>> -                       }
>>>>> +                     __builtin_memcpy(__res.get(), __sv.data(),
>>>>> __sv.size());
>>>>> +                     __res._M_bump(__sv.size());
>>>>> +                     __done = true;
>>>>>                     }
>>>>> -               }, __args.get(0));
>>>>> +               }
>>>>> +           }, __ctx.arg(0));
>>>>>
>>>>> -               if (__done)
>>>>> -                 return __out;
>>>>> -             }
>>>>> +           if (__done)
>>>>> +             return __out;
>>>>> +         }
>>>>>
>>>>> +      _Formatting_scanner<_Sink_iter<_CharT>, _CharT>
>>>>> __scanner(__ctx, __fmt);
>>>>> +      __scanner._M_scan();
>>>>> +      return __out;
>>>>> +    }
>>>>> +
>>>>> +  template<typename _Out, typename _CharT, typename _Context>
>>>>> +    _Out
>>>>> +    __do_vformat_to(_Out __out, basic_string_view<_CharT> __fmt,
>>>>> +                   const basic_format_args<_Context>& __args,
>>>>> +                   const locale* __loc)
>>>>> +    {
>>>>> +      if constexpr (is_same_v<_Out, _Sink_iter<_CharT>>)
>>>>> +       {
>>>>> +         const auto __api = __format::__current_api<_CharT>();
>>>>>           auto __ctx = __loc == nullptr
>>>>> -                        ? _Context(__args, __out)
>>>>> -                        : _Context(__args, __out, *__loc);
>>>>> -         _Formatting_scanner<_Sink_iter<_CharT>, _CharT>
>>>>> __scanner(__ctx, __fmt);
>>>>> -         __scanner._M_scan();
>>>>> -         return __out;
>>>>> +                    ? _Context(__api, __args, __out)
>>>>> +                    : _Context(__api, __args, __out, *__loc);
>>>>> +         return __do_vformat_to(std::move(__out), __fmt, __ctx);
>>>>>         }
>>>>>        else if constexpr (__contiguous_char_iter<_CharT, _Out>)
>>>>>         {
>>>>> --
>>>>> 2.53.0
>>>>>
>>>>>
  
Tomasz Kaminski April 7, 2026, 4:33 p.m. UTC | #8
On Tue, Apr 7, 2026 at 3:29 PM Tomasz Kaminski <tkaminsk@redhat.com> wrote:

>
>
> On Tue, Apr 7, 2026 at 3:00 PM Jonathan Wakely <jwakely.gcc@gmail.com>
> wrote:
>
>>
>>
>> On Tue, 7 Apr 2026, 09:15 Tomasz Kamiński, <tkaminsk@redhat.com> wrote:
>>
>>> This patch adds a _M_api member to basic_format_context and
>>> basic_format_parse_context, that represents the information about
>>> the TU in which the call was compiled:
>>> * _M_ver represents the C++ standard in which TU was compiled,
>>> * _M_literal_unicode is true when TU was compiled with Unicode
>>>   literal encoding,
>>> * _M_literal_enc is reserved for storing text_encoding::id value
>>>   for literal encoding, currently set to zero.
>>> These values are then populated by __current_api<_CharT>() functions.
>>>
>>> This would allow the formatter instantiations compiled in different
>>> TU (for example as part of libstdc++.so) to properly handle:
>>> * multi-byte fill-characters used as fill in format-spec, that
>>>   are supported only if literal encoding is Unicode,
>>> * '?' as format flags for string and characters, that is only
>>>   supported since C++23,
>>> * escaping of the string parameters, that depends on the literal
>>>   encoding.
>>>
>>> To further aid the above, a new __do_vformat_to overload is extracted.
>>> This overload accepts a format_context& that encodes the TU-specific
>>> properties, and can be exported from libstdc++.
>>>
>>> This patch intentionally does not modify the formatter code, and only
>>> adds new members, as adding them later would be an ABI break.
>>>
>>> libstdc++-v3/ChangeLog:
>>>
>>>         * include/std/format (__format::_Api_ctx,
>>> __format::__current_api):
>>>         Define.
>>>         (basic_format_parse_context::_M_api): Define.
>>>         (basic_format_parse_context::basic_format_parse_context):
>>>         Provide (basic_string_view, size_t) constructor only in C++20.
>>>         Define new internal private constructor accepting _Api_ctx.
>>>         (basic_format_context::_M_api): Define.
>>>         (basic_format_context::basic_format_context): Add additional
>>>         _Api_ctx parameter.
>>>         (_Scanner::_Scanner): Add additional _Api_ctx parameter,
>>>         and forward it to basic_format_parse_context.
>>>         (_Formatting_scanner::_Formatting_scanner): Propagate
>>>         _M_api from basic_format_context.
>>>         (_Checking_scanner::_Checking_scanner): Use
>>> __format::__current_api()
>>>         to initialize API.
>>>         (__format::__do_vformat_to): Extract overload accepting
>>>         basic_format_context.
>>> ---
>>> I have realized that exporting the vformat specializations correctly
>>> requires much bigger code changes than I am comfortable making this
>>> late in stage 4, as we will need to make the code independent of
>>> TU-specific properties (like encodings). This patch instead adds
>>> context members to basic_format_context and
>>> basic_format_parse_context that would allow doing so in the future.
>>>
>>
>> An alternative would be to have an inline dispatching function that
>> decides whether the current TU matches what's in the library (where that
>> will be the common case) and only uses the explicit instantiations if it
>> matches.
>>
> Seems reasonable for the Unicode encoding, but does not solve the
> multiple-standards case.
>
>>
>> I'm not sure this is really a problem I care about solving. If you try to
>> mix incompatible literal encodings in one program you shouldn't expect
>> sensible results for code that is sensitive to the literal encoding.
>>
>> When mixing C++20 and C++23, the C++20 TUs should use the explicit
>> instantiation which is right for C++20, and C++23 TUs will use an implicit
>> instantiation of the C++23 definition.
>>
> This works only on the surface: C++20 will use __vformat_impl_20 and
> C++23 will use __vformat_impl_23, defined in the format-inst-20 and
> format-inst-23 source files.
> Both of these files will instantiate `__formatter_str` under the same
> name. When they are combined into `libstdc++.so`, the linker picks one,
> and one of the standards will get incorrect behavior. To avoid this
> problem we will need to ABI tag each used formatter in some manner and
> apply that tag virally to any formatter referenced from these functions.
>
>
>>
>> Is there really a problem?
>>
>> If we can capture the API level without adding any overhead, I suppose
>> that's acceptable.
>>
>> If we store the text encoding, what are we going to do with it? Use iconv
>> to convert the fill character on the fly? To what output encoding?
>>
> This is for the future, if we want to handle string escaping for
> non-Unicode encodings better than giving them the same behavior as ASCII.
> We can add any additional fields to basic_format_parse_context and
> basic_format_context in the future; this is why I am reserving space for it.
>
> // Also could you take a look at:
> https://gcc.gnu.org/pipermail/libstdc++/2026-April/066030.html
>
>
>>
>>
>>
>>
>>> Tested all *format* tests on x86_64-linux. OK for trunk when all tests
>>> pass?
>>>
>>>  libstdc++-v3/include/std/format | 209 ++++++++++++++++++++------------
>>>  1 file changed, 133 insertions(+), 76 deletions(-)
>>>
>>> diff --git a/libstdc++-v3/include/std/format
>>> b/libstdc++-v3/include/std/format
>>> index eca5bd213aa..97d1ecb3ed6 100644
>>> --- a/libstdc++-v3/include/std/format
>>> +++ b/libstdc++-v3/include/std/format
>>> @@ -140,6 +140,37 @@ namespace __format
>>>        template<typename, typename...> friend struct
>>> std::basic_format_string;
>>>      };
>>>
>>> +  // Exposed via basic_format_parse_context, defines the TU specific
>>> information
>>> +  // like encoding and standard version.
>>> +  struct _Api_ctx
>>> +  {
>>> +    enum class _Version : unsigned char
>>> +    { _Api_2020, _Api_2023, _Api_2026 };
>>> +
>>> +    _Version _M_ver;
>>> +    unsigned _M_unused : 23;
>>> +    unsigned _M_literal_unicode : 1;
>>> +    __INT_LEAST32_TYPE__ _M_literal_enc;
>>> +  };
>>> +  using enum _Api_ctx::_Version;
>>> +
>>> +  template<typename _CharT>
>>> +    constexpr _Api_ctx
>>> +    __current_api()
>>>
>>
>> Should this be always inline?
>>
>> +    {
>>> +      _Api_ctx __api{};
>>> +#if __cplusplus > 202302L
>>> +      __api._M_ver = _Api_2026;
>>> +#elif __cplusplus > 202002L
>>> +      __api._M_ver = _Api_2023;
>>> +#else
>>> +      __api._M_ver = _Api_2020;
>>> +#endif
>>> +      __api._M_literal_unicode
>>> +       = __unicode::__literal_encoding_is_unicode<_CharT>();
>>> +      return __api;
>>> +    }
>>> +
>>>  } // namespace __format
>>>  /// @endcond
>>>
>>> @@ -274,7 +305,7 @@ namespace __format
>>>    { __throw_format_error("format error: failed to parse format-spec"); }
>>>
>>>    template<typename _CharT> class _Scanner;
>>> -
>>> +  template<typename _Out, typename _CharT> class _Formatting_scanner;
>>>  } // namespace __format
>>>    /// @endcond
>>>
>>> @@ -408,23 +439,34 @@ namespace __format
>>>        // This must not be constexpr.
>>>        static void __invalid_dynamic_spec(const char*);
>>>
>>> -      friend __format::_Scanner<_CharT>;
>>> -#endif
>>> -
>>> +#else
>>>        // This constructor should only be used by the implementation.
>>>        constexpr explicit
>>>        basic_format_parse_context(basic_string_view<_CharT> __fmt,
>>>                                  size_t __num_args) noexcept
>>>        : _M_begin(__fmt.begin()), _M_end(__fmt.end()),
>>> _M_num_args(__num_args)
>>>        { }
>>> +#endif
>>>
>>>      private:
>>> +      // This constructor should only be used by the implementation.
>>> +      constexpr explicit
>>> +      basic_format_parse_context(__format::_Api_ctx __api,
>>> +                                basic_string_view<_CharT> __fmt,
>>> +                                size_t __num_args) noexcept
>>> +      : _M_api(__api), _M_begin(__fmt.begin()), _M_end(__fmt.end())
>>> +      , _M_num_args(__num_args)
>>> +      { }
>>> +
>>> +      __format::_Api_ctx _M_api = __format::__current_api<_CharT>();
>>>
>>
>> What guarantees this will be initialized by a call to the right version?
>>
> This is only used by the basic_format_parse_context(string) constructor,
> which is mostly used for user-defined formatters. We may want to define
> these constructors as always inline.
>
>
If we go in this direction, I will add __api as the last defaulted
parameter of the user-facing constructors, which will guarantee that it is
always populated on the TU side.
(Not doing that now, until we decide how we want to proceed.)
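
A minimal sketch of that defaulted-parameter idea, under stated assumptions:
ApiVersion, ApiCtx, current_api and ParseContext are hypothetical names used
only for illustration, not the libstdc++ ones, and the real constructor
signature may end up looking different. The point it relies on is that a
default argument is evaluated at each call site, so the value is computed
where the user code is compiled and not where the constructor body happens
to be emitted.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-ins; not the libstdc++ definitions.
enum class ApiVersion : unsigned char { api2020, api2023, api2026 };

struct ApiCtx
{
  ApiVersion ver;
};

// Derives the API level from the language mode of the TU that compiles it.
// In a real multi-TU build the helper itself would also need care, which is
// why the thread asks whether it should be always inline.
constexpr ApiCtx current_api() noexcept
{
#if __cplusplus > 202302L
  return ApiCtx{ApiVersion::api2026};
#elif __cplusplus > 202002L
  return ApiCtx{ApiVersion::api2023};
#else
  return ApiCtx{ApiVersion::api2020};
#endif
}

class ParseContext
{
public:
  // The default argument is evaluated by the caller on every call that
  // omits it, so the stored value reflects the calling code.
  constexpr explicit
  ParseContext(std::string_view fmt, ApiCtx api = current_api()) noexcept
  : api_(api), fmt_(fmt)
  { }

  constexpr ApiCtx api() const noexcept { return api_; }
  constexpr std::string_view format() const noexcept { return fmt_; }

private:
  ApiCtx api_;
  std::string_view fmt_;
};
```

User code that writes ParseContext pc{"{}"} picks up its own API level, while
library internals can still pass an explicit ApiCtx taken from the
format context.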


>
>> Doesn't putting this member first add a lot of wasted padding due to
>> alignment?
>>
> I do not think the size of basic_format_parse_context and
> basic_format_context is relevant to anybody. But the struct is 64 bits,
> so it should not add extra padding.
>
>>
>>
>>        iterator _M_begin;
>>>        iterator _M_end;
>>>        enum _Indexing { _Unknown, _Manual, _Auto };
>>>        _Indexing _M_indexing = _Unknown;
>>>
>>
>> We already have padding bytes here (and could guarantee that by giving a
>> fixed underlying type to _Indexing)
>>
>>        size_t _M_next_arg_id = 0;
>>>        size_t _M_num_args = 0;
>>> +
>>> +      friend __format::_Scanner<_CharT>;
>>>      };
>>>
>>>  /// @cond undocumented
>>> @@ -4927,18 +4969,21 @@ namespace __format
>>>      {
>>>        static_assert( output_iterator<_Out, const _CharT&> );
>>>
>>> +      __format::_Api_ctx  _M_api;
>>>        basic_format_args<basic_format_context> _M_args;
>>>        _Out _M_out;
>>>        __format::_Optional_locale _M_loc;
>>>
>>> -      basic_format_context(basic_format_args<basic_format_context>
>>> __args,
>>> +      basic_format_context(__format::_Api_ctx __api,
>>> +                          basic_format_args<basic_format_context>
>>> __args,
>>>                            _Out __out)
>>> -      : _M_args(__args), _M_out(std::move(__out))
>>> +      : _M_api(__api), _M_args(__args), _M_out(std::move(__out))
>>>        { }
>>>
>>> -      basic_format_context(basic_format_args<basic_format_context>
>>> __args,
>>> +      basic_format_context(__format::_Api_ctx __api,
>>> +                          basic_format_args<basic_format_context>
>>> __args,
>>>                            _Out __out, const std::locale& __loc)
>>> -      : _M_args(__args), _M_out(std::move(__out)), _M_loc(__loc)
>>> +      : _M_api(__api), _M_args(__args),
>>> _M_out(std::move(__out)), _M_loc(__loc)
>>>        { }
>>>
>>>        // _GLIBCXX_RESOLVE_LIB_DEFECTS
>>> @@ -4954,6 +4999,7 @@ namespace __format
>>>                                   const locale*);
>>>
>>>        friend __format::__formatter_chrono<_CharT>;
>>> +      friend __format::_Formatting_scanner<_Out, _CharT>;
>>>
>>>      public:
>>>        ~basic_format_context() = default;
>>> @@ -4998,8 +5044,9 @@ namespace __format
>>>        } _M_pc;
>>>
>>>        constexpr explicit
>>> -      _Scanner(basic_string_view<_CharT> __str, size_t __nargs =
>>> (size_t)-1)
>>> -      : _M_pc(__str, __nargs)
>>> +      _Scanner(_Api_ctx __api, basic_string_view<_CharT> __str,
>>> +              size_t __nargs = (size_t)-1)
>>> +      : _M_pc(__api, __str, __nargs)
>>>        { }
>>>
>>>        constexpr iterator begin() const noexcept { return _M_pc.begin();
>>> }
>>> @@ -5115,7 +5162,7 @@ namespace __format
>>>      public:
>>>        _Formatting_scanner(basic_format_context<_Out, _CharT>& __fc,
>>>                           basic_string_view<_CharT> __str)
>>> -      : _Scanner<_CharT>(__str), _M_fc(__fc)
>>> +      : _Scanner<_CharT>(__fc._M_api, __str), _M_fc(__fc)
>>>        { }
>>>
>>>      private:
>>> @@ -5176,7 +5223,8 @@ namespace __format
>>>      public:
>>>        consteval
>>>        _Checking_scanner(basic_string_view<_CharT> __str)
>>> -      : _Scanner<_CharT>(__str, sizeof...(_Args))
>>> +      : _Scanner<_CharT>(__format::__current_api<_CharT>(),
>>>
>>
>> This is consteval so should use the right version for the current TU.
>>
>> +                        __str, sizeof...(_Args))
>>>        {
>>>  #if __cpp_lib_format >= 202305L
>>>         this->_M_pc._M_types = _M_types.data();
>>> @@ -5219,82 +5267,91 @@ namespace __format
>>>  #endif
>>>      };
>>>
>>> -  template<typename _Out, typename _CharT, typename _Context>
>>> -    inline _Out
>>> -    __do_vformat_to(_Out __out, basic_string_view<_CharT> __fmt,
>>> -                   const basic_format_args<_Context>& __args,
>>> -                   const locale* __loc)
>>> +  template<typename _CharT>
>>> +    _Sink_iter<_CharT>
>>> +    __do_vformat_to(_Sink_iter<_CharT> __out, basic_string_view<_CharT>
>>> __fmt,
>>> +                   __format_context<_CharT>& __ctx)
>>>      {
>>> -      if constexpr (is_same_v<_Out, _Sink_iter<_CharT>>)
>>> -       {
>>> -         if constexpr (is_same_v<_CharT, char>)
>>> -           // Fast path for "{}" format strings and simple format arg
>>> types.
>>> -           if (__fmt.size() == 2 && __fmt[0] == '{' && __fmt[1] == '}')
>>> -             {
>>> -               bool __done = false;
>>> -               __format::__visit_format_arg([&](auto& __arg) {
>>> -                 using _Tp = remove_cvref_t<decltype(__arg)>;
>>> -                 if constexpr (is_same_v<_Tp, bool>)
>>> +      if constexpr (is_same_v<_CharT, char>)
>>> +       // Fast path for "{}" format strings and simple format arg types.
>>> +       if (__fmt.size() == 2 && __fmt[0] == '{' && __fmt[1] == '}')
>>> +         {
>>> +           bool __done = false;
>>> +           __format::__visit_format_arg([&](auto& __arg) {
>>> +             using _Tp = remove_cvref_t<decltype(__arg)>;
>>> +             if constexpr (is_same_v<_Tp, bool>)
>>> +               {
>>> +                 size_t __len = 4 + !__arg;
>>> +                 const char* __chars[] = { "false", "true" };
>>> +                 if (auto __res = __out._M_reserve(__len))
>>>                     {
>>> -                     size_t __len = 4 + !__arg;
>>> -                     const char* __chars[] = { "false", "true" };
>>> -                     if (auto __res = __out._M_reserve(__len))
>>> -                       {
>>> -                         __builtin_memcpy(__res.get(), __chars[__arg],
>>> __len);
>>> -                         __res._M_bump(__len);
>>> -                         __done = true;
>>> -                       }
>>> +                     __builtin_memcpy(__res.get(), __chars[__arg],
>>> __len);
>>> +                     __res._M_bump(__len);
>>> +                     __done = true;
>>>                     }
>>> -                 else if constexpr (is_same_v<_Tp, char>)
>>> +               }
>>> +             else if constexpr (is_same_v<_Tp, char>)
>>> +               {
>>> +                 if (auto __res = __out._M_reserve(1))
>>>                     {
>>> -                     if (auto __res = __out._M_reserve(1))
>>> -                       {
>>> -                         *__res.get() = __arg;
>>> -                         __res._M_bump(1);
>>> -                         __done = true;
>>> -                       }
>>> +                     *__res.get() = __arg;
>>> +                     __res._M_bump(1);
>>> +                     __done = true;
>>>                     }
>>> -                 else if constexpr (is_integral_v<_Tp>)
>>> +               }
>>> +             else if constexpr (is_integral_v<_Tp>)
>>> +               {
>>> +                 make_unsigned_t<_Tp> __uval;
>>> +                 const bool __neg = __arg < 0;
>>> +                 if (__neg)
>>> +                   __uval = make_unsigned_t<_Tp>(~__arg) + 1u;
>>> +                 else
>>> +                   __uval = __arg;
>>> +                 const auto __n = __detail::__to_chars_len(__uval);
>>> +                 if (auto __res = __out._M_reserve(__n + __neg))
>>>                     {
>>> -                     make_unsigned_t<_Tp> __uval;
>>> -                     const bool __neg = __arg < 0;
>>> -                     if (__neg)
>>> -                       __uval = make_unsigned_t<_Tp>(~__arg) + 1u;
>>> -                     else
>>> -                       __uval = __arg;
>>> -                     const auto __n = __detail::__to_chars_len(__uval);
>>> -                     if (auto __res = __out._M_reserve(__n + __neg))
>>> -                       {
>>> -                         auto __ptr = __res.get();
>>> -                         *__ptr = '-';
>>> -                         __detail::__to_chars_10_impl(__ptr +
>>> (int)__neg, __n,
>>> -                                                      __uval);
>>> -                         __res._M_bump(__n + __neg);
>>> -                         __done = true;
>>> -                       }
>>> +                     auto __ptr = __res.get();
>>> +                     *__ptr = '-';
>>> +                     __detail::__to_chars_10_impl(__ptr + (int)__neg,
>>> __n,
>>> +                                                  __uval);
>>> +                     __res._M_bump(__n + __neg);
>>> +                     __done = true;
>>>                     }
>>> -                 else if constexpr (is_convertible_v<_Tp, string_view>)
>>> +               }
>>> +             else if constexpr (is_convertible_v<_Tp, string_view>)
>>> +               {
>>> +                 string_view __sv = __arg;
>>> +                 if (auto __res = __out._M_reserve(__sv.size()))
>>>                     {
>>> -                     string_view __sv = __arg;
>>> -                     if (auto __res = __out._M_reserve(__sv.size()))
>>> -                       {
>>> -                         __builtin_memcpy(__res.get(), __sv.data(),
>>> __sv.size());
>>> -                         __res._M_bump(__sv.size());
>>> -                         __done = true;
>>> -                       }
>>> +                     __builtin_memcpy(__res.get(), __sv.data(),
>>> __sv.size());
>>> +                     __res._M_bump(__sv.size());
>>> +                     __done = true;
>>>                     }
>>> -               }, __args.get(0));
>>> +               }
>>> +           }, __ctx.arg(0));
>>>
>>> -               if (__done)
>>> -                 return __out;
>>> -             }
>>> +           if (__done)
>>> +             return __out;
>>> +         }
>>>
>>> +      _Formatting_scanner<_Sink_iter<_CharT>, _CharT> __scanner(__ctx,
>>> __fmt);
>>> +      __scanner._M_scan();
>>> +      return __out;
>>> +    }
>>> +
>>> +  template<typename _Out, typename _CharT, typename _Context>
>>> +    _Out
>>> +    __do_vformat_to(_Out __out, basic_string_view<_CharT> __fmt,
>>> +                   const basic_format_args<_Context>& __args,
>>> +                   const locale* __loc)
>>> +    {
>>> +      if constexpr (is_same_v<_Out, _Sink_iter<_CharT>>)
>>> +       {
>>> +         const auto __api = __format::__current_api<_CharT>();
>>>           auto __ctx = __loc == nullptr
>>> -                        ? _Context(__args, __out)
>>> -                        : _Context(__args, __out, *__loc);
>>> -         _Formatting_scanner<_Sink_iter<_CharT>, _CharT>
>>> __scanner(__ctx, __fmt);
>>> -         __scanner._M_scan();
>>> -         return __out;
>>> +                    ? _Context(__api, __args, __out)
>>> +                    : _Context(__api, __args, __out, *__loc);
>>> +         return __do_vformat_to(std::move(__out), __fmt, __ctx);
>>>         }
>>>        else if constexpr (__contiguous_char_iter<_CharT, _Out>)
>>>         {
>>> --
>>> 2.53.0
>>>
>>>
  
Tomasz Kaminski April 9, 2026, 8:11 a.m. UTC | #9
On Tue, Apr 7, 2026 at 5:47 PM Jonathan Wakely <jwakely.gcc@gmail.com>
wrote:

>
>
> On Tue, 7 Apr 2026, 15:09 Tomasz Kaminski, <tkaminsk@redhat.com> wrote:
>
>>
>>
>> On Tue, Apr 7, 2026 at 3:54 PM Jonathan Wakely <jwakely.gcc@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, 7 Apr 2026, 14:30 Tomasz Kaminski, <tkaminsk@redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Apr 7, 2026 at 3:00 PM Jonathan Wakely <jwakely.gcc@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, 7 Apr 2026, 09:15 Tomasz Kamiński, <tkaminsk@redhat.com>
>>>>> wrote:
>>>>>
>>>>>> This patch adds a _M_api member to basic_format_context and
>>>>>> basic_format_parse_context, that represents the information about
>>>>>> the TU in which the call was compiled:
>>>>>> * _M_ver represents the C++ standard in which TU was compiled,
>>>>>> * _M_literal_unicode is true when TU was compiled with Unicode
>>>>>>   literal encoding,
>>>>>> * _M_literal_enc is reserved for storing text_encoding::id value
>>>>>>   for literal encoding, currently set to zero.
>>>>>> These values are then populated by the __current_api<_CharT>() function.
>>>>>>
>>>>>> This would allow the formatter instantiations compiled in different
>>>>>> TU (for example as part of libstdc++.so) to properly handle:
>>>>>> * multi-byte fill-characters used as fill in format-spec, that
>>>>>>   are supported only if literal encoding is Unicode,
>>>>>> * '?' as format flags for string and characters, that is only
>>>>>>   supported since C++23,
>>>>>> * escaping of the string parameters, that depends on the literal
>>>>>>   encoding.
>>>>>>
>>>>>> To further aid the above, a new __do_vformat_to overload is
>>>>>> extracted.
>>>>>> This overload accepts a format_context& that encodes the TU-specific
>>>>>> properties, and can be exported from libstdc++.
>>>>>>
>>>>>> This patch on purpose does not modify the formatters code, and only
>>>>>> adds new members, as adding them later would be ABI break.
>>>>>>
>>>>>> libstdc++-v3/ChangeLog:
>>>>>>
>>>>>>         * include/std/format (__format::_Api_ctx,
>>>>>> __format::__current_api):
>>>>>>         Define.
>>>>>>         (basic_format_parse_context::_M_api): Define.
>>>>>>         (basic_format_parse_context::basic_format_parse_context):
>>>>>>         Provide (basic_string_view, size_t) constructor only in C++20.
>>>>>>         Define new internal private constructor accepting _Api_ctx.
>>>>>>         (basic_format_context::_M_api): Define.
>>>>>>         (basic_format_context::basic_format_context): Add additional
>>>>>>         _Api_ctx parameter.
>>>>>>         (_Scanner::_Scanner): Add additional _Api_ctx parameter,
>>>>>>         and forward it to basic_format_parse_context.
>>>>>>         (_Formatting_scanner::_Formatting_scanner): Propagate
>>>>>>         _M_api from basic_format_context.
>>>>>>         (_Checking_scanner::_Checking_scanner): Use
>>>>>> __format::__current_api()
>>>>>>         to initialize API.
>>>>>>         (__format::__do_vformat_to): Extract overload accepting
>>>>>>         basic_format_context.
>>>>>> ---
>>>>>> I have realized that exporting the vformat specializations correctly
>>>>>> requires much bigger code changes than I am comfortable making this
>>>>>> late in stage 4, as we will need to make the code independent of
>>>>>> TU-specific properties (like encodings). This patch instead adds
>>>>>> context members to basic_format_context and
>>>>>> basic_format_parse_context that would allow doing so in the future.
>>>>>>
>>>>>
>>>>> An alternative would be to have an inline dispatching function that
>>>>> decides whether the current TU matches what's in the library (where that
>>>>> will be the common case) and only uses the explicit instantiations if it
>>>>> matches.
>>>>>
>>>> Seems reasonable for Unicode encoding, but does not solve the case of
>>>> multiple standards.
>>>>
>>>>>
>>>>> I'm not sure this is really a problem I care about solving. If you try
>>>>> to mix incompatible literal encodings in one program you shouldn't expect
>>>>> sensible results for code that is sensitive to the literal encoding.
>>>>>
>>>>> When mixing C++20 and C++23, the C++20 TUs should use the explicit
>>>>> instantiation which is right for C++20, and C++23 TUs will use an implicit
>>>>> instantiation of the C++23 definition.
>>>>>
>>>> This works only on the surface: C++20 will use __vformat_impl_20 and
>>>> C++23 will use __vformat_impl_23, defined in the format-inst-20 and
>>>> format-inst-23 source files.
>>>>
>>>
>>> Once we stabilise C++23 I think we should only instantiate the format
>>> functions for C++23. Will anybody really care if they use a dynamic format
>>> string in C++20 code and don't get an exception for using a C++23 format
>>> specifier?
>>>
>> Yes, the fact that a format string is accepted by `vformat`
>> but rejected by `format` seems very surprising to me, but if we accepted
>> that, we would have a much simpler problem to solve. If we plan to accept
>> all C++23 specifiers for basic types as an extension in C++20 mode once
>> it is stabilized, then that would sound much more intuitive to me.
>>
>>>
>>>> Both of these files will instantiate `__formatter_str` under the same
>>>> name. When they are combined into `libstdc++.so`, the linker will pick
>>>> one, and one of the standards will get incorrect behavior.
>>>>
>>>
>>> Fine, the older standard could get the "wrong"  behaviour (where wrong
>>> just means supporting C++23 features when called from C++20 TUs).
>>>
>> I considered that unacceptable, and we even have test cases checking
>> for that. The failures of these tests were the reason
>> I started exploring alternatives.
>>
>>>
>>>> To avoid this problem we will need to ABI-tag each used formatter in
>>>> some manner and apply that
>>>> tag virally to any formatter referenced from these functions.
>>>>
>>>>
>>>>>
>>>>> Is there really a problem?
>>>>>
>>>>> If we can capture the API level without adding any overhead, I suppose
>>>>> that's acceptable.
>>>>>
>>>>> If we store the text encoding, what are we going to do with it? Use
>>>>> iconv to convert the fill character on the fly? To what output encoding?
>>>>>
>>>> This is for the future, if we want to handle string escaping for
>>>> non-Unicode encodings better than by treating them the same as ASCII.
>>>> We cannot add any additional fields to basic_format_parse_context and
>>>> basic_format_context in the future; this is why I am reserving space for it.
>>>>
>>>> // Also could you take a look at:
>>>> https://gcc.gnu.org/pipermail/libstdc++/2026-April/066030.html
>>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Tested all *format* tests on x86_64-linux. OK for trunk when all tests
>>>>>> pass?
>>>>>>
>>>>>>  libstdc++-v3/include/std/format | 209
>>>>>> ++++++++++++++++++++------------
>>>>>>  1 file changed, 133 insertions(+), 76 deletions(-)
>>>>>>
>>>>>> diff --git a/libstdc++-v3/include/std/format
>>>>>> b/libstdc++-v3/include/std/format
>>>>>> index eca5bd213aa..97d1ecb3ed6 100644
>>>>>> --- a/libstdc++-v3/include/std/format
>>>>>> +++ b/libstdc++-v3/include/std/format
>>>>>> @@ -140,6 +140,37 @@ namespace __format
>>>>>>        template<typename, typename...> friend struct
>>>>>> std::basic_format_string;
>>>>>>      };
>>>>>>
>>>>>> +  // Exposed via basic_format_parse_context, defines the TU specific
>>>>>> information
>>>>>> +  // like encoding and standard version.
>>>>>> +  struct _Api_ctx
>>>>>> +  {
>>>>>> +    enum class _Version : unsigned char
>>>>>> +    { _Api_2020, _Api_2023, _Api_2026 };
>>>>>> +
>>>>>> +    _Version _M_ver;
>>>>>> +    unsigned _M_unused : 23;
>>>>>> +    unsigned _M_literal_unicode : 1;
>>>>>> +    __INT_LEAST32_TYPE__ _M_literal_enc;
>>>>>> +  };
>>>>>> +  using enum _Api_ctx::_Version;
>>>>>> +
>>>>>> +  template<typename _CharT>
>>>>>> +    constexpr _Api_ctx
>>>>>> +    __current_api()
>>>>>>
>>>>>
>>>>> Should this be always inline?
>>>>>
>>>>> +    {
>>>>>> +      _Api_ctx __api{};
>>>>>> +#if __cplusplus > 202302L
>>>>>> +      __api._M_ver = _Api_2026;
>>>>>> +#elif __cplusplus > 202002L
>>>>>> +      __api._M_ver = _Api_2023;
>>>>>> +#else
>>>>>> +      __api._M_ver = _Api_2020;
>>>>>> +#endif
>>>>>> +      __api._M_literal_unicode
>>>>>> +       = __unicode::__literal_encoding_is_unicode<_CharT>();
>>>>>> +      return __api;
>>>>>> +    }
>>>>>> +
>>>>>>  } // namespace __format
>>>>>>  /// @endcond
>>>>>>
>>>>>> @@ -274,7 +305,7 @@ namespace __format
>>>>>>    { __throw_format_error("format error: failed to parse
>>>>>> format-spec"); }
>>>>>>
>>>>>>    template<typename _CharT> class _Scanner;
>>>>>> -
>>>>>> +  template<typename _Out, typename _CharT> class _Formatting_scanner;
>>>>>>  } // namespace __format
>>>>>>    /// @endcond
>>>>>>
>>>>>> @@ -408,23 +439,34 @@ namespace __format
>>>>>>        // This must not be constexpr.
>>>>>>        static void __invalid_dynamic_spec(const char*);
>>>>>>
>>>>>> -      friend __format::_Scanner<_CharT>;
>>>>>> -#endif
>>>>>> -
>>>>>> +#else
>>>>>>        // This constructor should only be used by the implementation.
>>>>>>        constexpr explicit
>>>>>>        basic_format_parse_context(basic_string_view<_CharT> __fmt,
>>>>>>                                  size_t __num_args) noexcept
>>>>>>        : _M_begin(__fmt.begin()), _M_end(__fmt.end()),
>>>>>> _M_num_args(__num_args)
>>>>>>        { }
>>>>>> +#endif
>>>>>>
>>>>>>      private:
>>>>>> +      // This constructor should only be used by the implementation.
>>>>>> +      constexpr explicit
>>>>>> +      basic_format_parse_context(__format::_Api_ctx __api,
>>>>>> +                                basic_string_view<_CharT> __fmt,
>>>>>> +                                size_t __num_args) noexcept
>>>>>> +      : _M_api(__api), _M_begin(__fmt.begin()), _M_end(__fmt.end())
>>>>>> +      , _M_num_args(__num_args)
>>>>>> +      { }
>>>>>> +
>>>>>> +      __format::_Api_ctx _M_api = __format::__current_api<_CharT>();
>>>>>>
>>>>>
>>>>> What guarantees this will be initialized by a call to the right
>>>>> version?
>>>>>
>>>> This is only used by the basic_format_parse_context(string) constructor,
>>>> which is mostly used
>>>> for user-defined formatters. We may want to define these constructors as
>>>> always inline.
>>>>
>>>>>
>>>>> Doesn't putting this member first add a lot of wasted padding due to
>>>>> alignment?
>>>>>
>>>> I do not think the size of basic_format_parse_context and
>>>> basic_format_context is relevant to
>>>> anybody. But the struct is 64 bits, so it should not add extra padding.
>>>>
>>>
>>> I don't understand how adding a new byte before the first iterator
>>> member doesn't introduce sizeof(void*)-1 bytes of padding.
>>>
>> _Api_ctx is an 8-byte struct with the literal encoding information. And I
>> considered having spare bytes there a feature, and not
>> a drawback.
>>
>
> Oh, I thought this was just storing the enum, not the struct. Doh.
>
> That explains why I was confused.
>
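
To make the layout point above concrete, here is a minimal sketch that
mirrors the shape of _Api_ctx and of a parse context that stores it first.
ApiCtx, ParseCtx and padding_after_api are hypothetical stand-ins, not the
libstdc++ definitions, and the sizes assume an Itanium-ABI target such as
x86_64 GNU/Linux, where the bit-fields share the first 32-bit unit with the
one-byte enum.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for __format::_Api_ctx; names are illustrative only.
struct ApiCtx
{
  enum class Version : unsigned char { api_2020, api_2023, api_2026 };

  Version version;                 // one byte
  unsigned unused : 23;            // packed after the enum on Itanium-ABI targets
  unsigned literal_unicode : 1;
  std::int_least32_t literal_enc;  // reserved, occupies the second 32-bit unit
};

// Hypothetical stand-in for a parse context with the new member placed first.
struct ParseCtx
{
  ApiCtx api;
  const char* begin;
  const char* end;
};

// Number of padding bytes between the api member and the first iterator.
constexpr std::size_t
padding_after_api()
{ return offsetof(ParseCtx, begin) - sizeof(ApiCtx); }
```

With that layout I would expect sizeof(ApiCtx) to be 8 and
padding_after_api() to be 0 on such targets; other ABIs may lay out the
bit-fields differently.
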
> Thinking more about the questions in the follow up mails....
>
I think allowing C++23 format-string specifiers as an extension in C++20
mode may be a better choice, as it will help us avoid
all the problems related to linking TUs compiled for different standards.
However, I think we should make that a proper
extension, and accept them for both constexpr and runtime format strings.

For sure, we do not want to end up in a situation where ? is accepted if
you are using the Unicode literal encoding in the TU
(and so using the exported vformat_to definition), but not if you are using
a different encoding. We could add some
if consteval checks to apply the extension only at runtime, but I do not see
value in that.

In short, I am in favour of this policy: let's accept format-string
specifiers as an extension once the given standard mode
becomes stable. (On purpose limiting this to format-string specifiers, and
not functions like set_debug_format.)
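
As a rough illustration of that policy (not the libstdc++ code), the
decision whether to accept the '?' flag could look like the sketch below.
ApiVersion, debug_flag_accepted_strict and debug_flag_accepted_as_extension
are hypothetical names, and the cxx23_stable flag is an assumption standing
in for however the library would record that C++23 support is stable.

```cpp
// Hypothetical mirror of _Api_ctx::_Version; not the libstdc++ definition.
enum class ApiVersion : unsigned char { api_2020, api_2023, api_2026 };

// Strict rule: '?' is accepted only when the calling TU is C++23 or later.
constexpr bool
debug_flag_accepted_strict(ApiVersion caller)
{ return caller >= ApiVersion::api_2023; }

// Proposed policy: once C++23 support is stable, accept '?' as an extension
// in every mode, so the answer no longer depends on the calling TU.
constexpr bool
debug_flag_accepted_as_extension(ApiVersion caller, bool cxx23_stable)
{ return cxx23_stable || debug_flag_accepted_strict(caller); }
```

Under the extension rule the same check gives the same answer for the
compile-time and the runtime path, which is the property I care about here.
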


>
>
>>> Why don't we just give _Indexing a fixed underlying type of unsigned
>>> char and then put the API version after that?
>>>
>>>
>>>
>>>
>>>>>
>>>>>        iterator _M_begin;
>>>>>>        iterator _M_end;
>>>>>>        enum _Indexing { _Unknown, _Manual, _Auto };
>>>>>>        _Indexing _M_indexing = _Unknown;
>>>>>>
>>>>>
>>>>> We already have padding bytes here (and could guarantee that by giving
>>>>> a fixed underlying type to _Indexing)
>>>>>
>>>>>        size_t _M_next_arg_id = 0;
>>>>>>        size_t _M_num_args = 0;
>>>>>> +
>>>>>> +      friend __format::_Scanner<_CharT>;
>>>>>>      };
>>>>>>
>>>>>>  /// @cond undocumented
>>>>>> @@ -4927,18 +4969,21 @@ namespace __format
>>>>>>      {
>>>>>>        static_assert( output_iterator<_Out, const _CharT&> );
>>>>>>
>>>>>> +      __format::_Api_ctx  _M_api;
>>>>>>        basic_format_args<basic_format_context> _M_args;
>>>>>>        _Out _M_out;
>>>>>>        __format::_Optional_locale _M_loc;
>>>>>>
>>>>>> -      basic_format_context(basic_format_args<basic_format_context>
>>>>>> __args,
>>>>>> +      basic_format_context(__format::_Api_ctx __api,
>>>>>> +                          basic_format_args<basic_format_context>
>>>>>> __args,
>>>>>>                            _Out __out)
>>>>>> -      : _M_args(__args), _M_out(std::move(__out))
>>>>>> +      : _M_api(__api), _M_args(__args), _M_out(std::move(__out))
>>>>>>        { }
>>>>>>
>>>>>> -      basic_format_context(basic_format_args<basic_format_context>
>>>>>> __args,
>>>>>> +      basic_format_context(__format::_Api_ctx __api,
>>>>>> +                          basic_format_args<basic_format_context>
>>>>>> __args,
>>>>>>                            _Out __out, const std::locale& __loc)
>>>>>> -      : _M_args(__args), _M_out(std::move(__out)), _M_loc(__loc)
>>>>>> +      : _M_api(__api), _M_args(__args),
>>>>>> _M_out(std::move(__out)), _M_loc(__loc)
>>>>>>        { }
>>>>>>
>>>>>>        // _GLIBCXX_RESOLVE_LIB_DEFECTS
>>>>>> @@ -4954,6 +4999,7 @@ namespace __format
>>>>>>                                   const locale*);
>>>>>>
>>>>>>        friend __format::__formatter_chrono<_CharT>;
>>>>>> +      friend __format::_Formatting_scanner<_Out, _CharT>;
>>>>>>
>>>>>>      public:
>>>>>>        ~basic_format_context() = default;
>>>>>> @@ -4998,8 +5044,9 @@ namespace __format
>>>>>>        } _M_pc;
>>>>>>
>>>>>>        constexpr explicit
>>>>>> -      _Scanner(basic_string_view<_CharT> __str, size_t __nargs =
>>>>>> (size_t)-1)
>>>>>> -      : _M_pc(__str, __nargs)
>>>>>> +      _Scanner(_Api_ctx __api, basic_string_view<_CharT> __str,
>>>>>> +              size_t __nargs = (size_t)-1)
>>>>>> +      : _M_pc(__api, __str, __nargs)
>>>>>>        { }
>>>>>>
>>>>>>        constexpr iterator begin() const noexcept { return
>>>>>> _M_pc.begin(); }
>>>>>> @@ -5115,7 +5162,7 @@ namespace __format
>>>>>>      public:
>>>>>>        _Formatting_scanner(basic_format_context<_Out, _CharT>& __fc,
>>>>>>                           basic_string_view<_CharT> __str)
>>>>>> -      : _Scanner<_CharT>(__str), _M_fc(__fc)
>>>>>> +      : _Scanner<_CharT>(__fc._M_api, __str), _M_fc(__fc)
>>>>>>        { }
>>>>>>
>>>>>>      private:
>>>>>> @@ -5176,7 +5223,8 @@ namespace __format
>>>>>>      public:
>>>>>>        consteval
>>>>>>        _Checking_scanner(basic_string_view<_CharT> __str)
>>>>>> -      : _Scanner<_CharT>(__str, sizeof...(_Args))
>>>>>> +      : _Scanner<_CharT>(__format::__current_api<_CharT>(),
>>>>>>
>>>>>
>>>>> This is consteval so should use the right version for the current TU.
>>>>>
>>>>> +                        __str, sizeof...(_Args))
>>>>>>        {
>>>>>>  #if __cpp_lib_format >= 202305L
>>>>>>         this->_M_pc._M_types = _M_types.data();
>>>>>> @@ -5219,82 +5267,91 @@ namespace __format
>>>>>>  #endif
>>>>>>      };
>>>>>>
>>>>>> -  template<typename _Out, typename _CharT, typename _Context>
>>>>>> -    inline _Out
>>>>>> -    __do_vformat_to(_Out __out, basic_string_view<_CharT> __fmt,
>>>>>> -                   const basic_format_args<_Context>& __args,
>>>>>> -                   const locale* __loc)
>>>>>> +  template<typename _CharT>
>>>>>> +    _Sink_iter<_CharT>
>>>>>> +    __do_vformat_to(_Sink_iter<_CharT> __out,
>>>>>> basic_string_view<_CharT> __fmt,
>>>>>> +                   __format_context<_CharT>& __ctx)
>>>>>>      {
>>>>>> -      if constexpr (is_same_v<_Out, _Sink_iter<_CharT>>)
>>>>>> -       {
>>>>>> -         if constexpr (is_same_v<_CharT, char>)
>>>>>> -           // Fast path for "{}" format strings and simple format
>>>>>> arg types.
>>>>>> -           if (__fmt.size() == 2 && __fmt[0] == '{' && __fmt[1] ==
>>>>>> '}')
>>>>>> -             {
>>>>>> -               bool __done = false;
>>>>>> -               __format::__visit_format_arg([&](auto& __arg) {
>>>>>> -                 using _Tp = remove_cvref_t<decltype(__arg)>;
>>>>>> -                 if constexpr (is_same_v<_Tp, bool>)
>>>>>> +      if constexpr (is_same_v<_CharT, char>)
>>>>>> +       // Fast path for "{}" format strings and simple format arg
>>>>>> types.
>>>>>> +       if (__fmt.size() == 2 && __fmt[0] == '{' && __fmt[1] == '}')
>>>>>> +         {
>>>>>> +           bool __done = false;
>>>>>> +           __format::__visit_format_arg([&](auto& __arg) {
>>>>>> +             using _Tp = remove_cvref_t<decltype(__arg)>;
>>>>>> +             if constexpr (is_same_v<_Tp, bool>)
>>>>>> +               {
>>>>>> +                 size_t __len = 4 + !__arg;
>>>>>> +                 const char* __chars[] = { "false", "true" };
>>>>>> +                 if (auto __res = __out._M_reserve(__len))
>>>>>>                     {
>>>>>> -                     size_t __len = 4 + !__arg;
>>>>>> -                     const char* __chars[] = { "false", "true" };
>>>>>> -                     if (auto __res = __out._M_reserve(__len))
>>>>>> -                       {
>>>>>> -                         __builtin_memcpy(__res.get(),
>>>>>> __chars[__arg], __len);
>>>>>> -                         __res._M_bump(__len);
>>>>>> -                         __done = true;
>>>>>> -                       }
>>>>>> +                     __builtin_memcpy(__res.get(), __chars[__arg],
>>>>>> __len);
>>>>>> +                     __res._M_bump(__len);
>>>>>> +                     __done = true;
>>>>>>                     }
>>>>>> -                 else if constexpr (is_same_v<_Tp, char>)
>>>>>> +               }
>>>>>> +             else if constexpr (is_same_v<_Tp, char>)
>>>>>> +               {
>>>>>> +                 if (auto __res = __out._M_reserve(1))
>>>>>>                     {
>>>>>> -                     if (auto __res = __out._M_reserve(1))
>>>>>> -                       {
>>>>>> -                         *__res.get() = __arg;
>>>>>> -                         __res._M_bump(1);
>>>>>> -                         __done = true;
>>>>>> -                       }
>>>>>> +                     *__res.get() = __arg;
>>>>>> +                     __res._M_bump(1);
>>>>>> +                     __done = true;
>>>>>>                     }
>>>>>> -                 else if constexpr (is_integral_v<_Tp>)
>>>>>> +               }
>>>>>> +             else if constexpr (is_integral_v<_Tp>)
>>>>>> +               {
>>>>>> +                 make_unsigned_t<_Tp> __uval;
>>>>>> +                 const bool __neg = __arg < 0;
>>>>>> +                 if (__neg)
>>>>>> +                   __uval = make_unsigned_t<_Tp>(~__arg) + 1u;
>>>>>> +                 else
>>>>>> +                   __uval = __arg;
>>>>>> +                 const auto __n = __detail::__to_chars_len(__uval);
>>>>>> +                 if (auto __res = __out._M_reserve(__n + __neg))
>>>>>>                     {
>>>>>> -                     make_unsigned_t<_Tp> __uval;
>>>>>> -                     const bool __neg = __arg < 0;
>>>>>> -                     if (__neg)
>>>>>> -                       __uval = make_unsigned_t<_Tp>(~__arg) + 1u;
>>>>>> -                     else
>>>>>> -                       __uval = __arg;
>>>>>> -                     const auto __n =
>>>>>> __detail::__to_chars_len(__uval);
>>>>>> -                     if (auto __res = __out._M_reserve(__n + __neg))
>>>>>> -                       {
>>>>>> -                         auto __ptr = __res.get();
>>>>>> -                         *__ptr = '-';
>>>>>> -                         __detail::__to_chars_10_impl(__ptr +
>>>>>> (int)__neg, __n,
>>>>>> -                                                      __uval);
>>>>>> -                         __res._M_bump(__n + __neg);
>>>>>> -                         __done = true;
>>>>>> -                       }
>>>>>> +                     auto __ptr = __res.get();
>>>>>> +                     *__ptr = '-';
>>>>>> +                     __detail::__to_chars_10_impl(__ptr +
>>>>>> (int)__neg, __n,
>>>>>> +                                                  __uval);
>>>>>> +                     __res._M_bump(__n + __neg);
>>>>>> +                     __done = true;
>>>>>>                     }
>>>>>> -                 else if constexpr (is_convertible_v<_Tp,
>>>>>> string_view>)
>>>>>> +               }
>>>>>> +             else if constexpr (is_convertible_v<_Tp, string_view>)
>>>>>> +               {
>>>>>> +                 string_view __sv = __arg;
>>>>>> +                 if (auto __res = __out._M_reserve(__sv.size()))
>>>>>>                     {
>>>>>> -                     string_view __sv = __arg;
>>>>>> -                     if (auto __res = __out._M_reserve(__sv.size()))
>>>>>> -                       {
>>>>>> -                         __builtin_memcpy(__res.get(), __sv.data(),
>>>>>> __sv.size());
>>>>>> -                         __res._M_bump(__sv.size());
>>>>>> -                         __done = true;
>>>>>> -                       }
>>>>>> +                     __builtin_memcpy(__res.get(), __sv.data(),
>>>>>> __sv.size());
>>>>>> +                     __res._M_bump(__sv.size());
>>>>>> +                     __done = true;
>>>>>>                     }
>>>>>> -               }, __args.get(0));
>>>>>> +               }
>>>>>> +           }, __ctx.arg(0));
>>>>>>
>>>>>> -               if (__done)
>>>>>> -                 return __out;
>>>>>> -             }
>>>>>> +           if (__done)
>>>>>> +             return __out;
>>>>>> +         }
>>>>>>
>>>>>> +      _Formatting_scanner<_Sink_iter<_CharT>, _CharT>
>>>>>> __scanner(__ctx, __fmt);
>>>>>> +      __scanner._M_scan();
>>>>>> +      return __out;
>>>>>> +    }
>>>>>> +
>>>>>> +  template<typename _Out, typename _CharT, typename _Context>
>>>>>> +    _Out
>>>>>> +    __do_vformat_to(_Out __out, basic_string_view<_CharT> __fmt,
>>>>>> +                   const basic_format_args<_Context>& __args,
>>>>>> +                   const locale* __loc)
>>>>>> +    {
>>>>>> +      if constexpr (is_same_v<_Out, _Sink_iter<_CharT>>)
>>>>>> +       {
>>>>>> +         const auto __api = __format::__current_api<_CharT>();
>>>>>>           auto __ctx = __loc == nullptr
>>>>>> -                        ? _Context(__args, __out)
>>>>>> -                        : _Context(__args, __out, *__loc);
>>>>>> -         _Formatting_scanner<_Sink_iter<_CharT>, _CharT>
>>>>>> __scanner(__ctx, __fmt);
>>>>>> -         __scanner._M_scan();
>>>>>> -         return __out;
>>>>>> +                    ? _Context(__api, __args, __out)
>>>>>> +                    : _Context(__api, __args, __out, *__loc);
>>>>>> +         return __do_vformat_to(std::move(__out), __fmt, __ctx);
>>>>>>         }
>>>>>>        else if constexpr (__contiguous_char_iter<_CharT, _Out>)
>>>>>>         {
>>>>>> --
>>>>>> 2.53.0
>>>>>>
>>>>>>
  
Tomasz Kaminski April 16, 2026, 9:55 a.m. UTC | #10
On Thu, Apr 9, 2026 at 10:11 AM Tomasz Kaminski <tkaminsk@redhat.com> wrote:

>
>
> On Tue, Apr 7, 2026 at 5:47 PM Jonathan Wakely <jwakely.gcc@gmail.com>
> wrote:
>
>>
>>
>> On Tue, 7 Apr 2026, 15:09 Tomasz Kaminski, <tkaminsk@redhat.com> wrote:
>>
>>>
>>>
>>> On Tue, Apr 7, 2026 at 3:54 PM Jonathan Wakely <jwakely.gcc@gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, 7 Apr 2026, 14:30 Tomasz Kaminski, <tkaminsk@redhat.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Apr 7, 2026 at 3:00 PM Jonathan Wakely <jwakely.gcc@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, 7 Apr 2026, 09:15 Tomasz Kamiński, <tkaminsk@redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>> This patch adds a _M_api member to basic_format_context and
>>>>>>> basic_format_parse_context, that represents the information about
>>>>>>> the TU in which the call was compiled:
>>>>>>> * _M_ver represents the C++ standard in which TU was compiled,
>>>>>>> * _M_literal_unicode is true when TU was compiled with Unicode
>>>>>>>   literal encoding,
>>>>>>> * _M_literal_enc is reserved for storing text_encoding::id value
>>>>>>>   for literal encoding, currently set to zero.
>>>>>>> These values are then populated by the __current_api<_CharT>() function.
>>>>>>>
>>>>>>> This would allow the formatter instantiations compiled in different
>>>>>>> TU (for example as part of libstdc++.so) to properly handle:
>>>>>>> * multi-byte fill-characters used as fill in format-spec, that
>>>>>>>   are supported only if literal encoding is Unicode,
>>>>>>> * '?' as format flags for string and characters, that is only
>>>>>>>   supported since C++23,
>>>>>>> * escaping of the string parameters, that depends on the literal
>>>>>>>   encoding.
>>>>>>>
>>>>>>> To further aid the above, a new __do_vformat_to overload is
>>>>>>> extracted.
>>>>>>> This overload accepts a format_context& that encodes the TU-specific
>>>>>>> properties, and can be exported from libstdc++.
>>>>>>>
>>>>>>> This patch on purpose does not modify the formatters code, and only
>>>>>>> adds new members, as adding them later would be ABI break.
>>>>>>>
>>>>>>> libstdc++-v3/ChangeLog:
>>>>>>>
>>>>>>>         * include/std/format (__format::_Api_ctx,
>>>>>>> __format::__current_api):
>>>>>>>         Define.
>>>>>>>         (basic_format_parse_context::_M_api): Define.
>>>>>>>         (basic_format_parse_context::basic_format_parse_context):
>>>>>>>         Provide (basic_string_view, size_t) constructor only in
>>>>>>> C++20.
>>>>>>>         Define new internal private constructor accepting _Api_ctx.
>>>>>>>         (basic_format_context::_M_api): Define.
>>>>>>>         (basic_format_context::basic_format_context): Add additional
>>>>>>>         _Api_ctx parameter.
>>>>>>>         (_Scanner::_Scanner): Add additional _Api_ctx parameter,
>>>>>>>         and forward it to basic_format_parse_context.
>>>>>>>         (_Formatting_scanner::_Formatting_scanner): Propagate
>>>>>>>         _M_api from basic_format_context.
>>>>>>>         (_Checking_scanner::_Checking_scanner): Use
>>>>>>> __format::__current_api()
>>>>>>>         to initialize API.
>>>>>>>         (__format::__do_vformat_to): Extract overload accepting
>>>>>>>         basic_format_context.
>>>>>>> ---
>>>>>>> I have realized that exporting the vformat specializations correctly
>>>>>>> requires much bigger code changes than I am comfortable making this
>>>>>>> late in stage 4, as we would need to make the code independent of
>>>>>>> TU-specific properties (like encodings). This patch instead adds
>>>>>>> context members to basic_format_context and
>>>>>>> basic_format_parse_context that would allow doing so in the future.
>>>>>>>
>>>>>>
>>>>>> An alternative would be to have an inline dispatching function that
>>>>>> decides whether the current TU matches what's in the library (where that
>>>>>> will be the common case) and only uses the explicit instantiations if it
>>>>>> matches.
>>>>>>
>>>>> Seems reasonable for Unicode encoding, but does not solve the problem of
>>>>> multiple standards.
>>>>>
>>>>>>
>>>>>> I'm not sure this is really a problem I care about solving. If you
>>>>>> try to mix incompatible literal encodings in one program you shouldn't
>>>>>> expect sensible results for code that is sensitive to the literal encoding.
>>>>>>
>>>>>> When mixing C++20 and C++23, the C++20 TUs should use the explicit
>>>>>> instantiation which is right for C++20, and C++23 TUs will use an implicit
>>>>>> instantiation of the C++23 definition.
>>>>>>
>>>>> This works only on the surface: C++20 will use __vformat_impl_20 and
>>>>> C++23 will use __vformat_impl_23, defined in the format-inst-20 and
>>>>> format-inst-23 source files.
>>>>>
>>>>
>>>> Once we stabilise C++23 I think we should only instantiate the format
>>>> functions for C++23. Will anybody really care if they use a dynamic format
>>>> string in C++20 code and don't get an exception for using a C++23 format
>>>> specifier?
>>>>
>>> Yes, a format string being accepted by `vformat` but rejected by
>>> `format` seems very surprising to me, but if we accept that, we have a
>>> much simpler problem to solve. If we plan to accept all C++23 specifiers
>>> for basic types as an extension in C++20 mode once it is stabilized,
>>> then that would seem much more intuitive to me.
>>>
>>>>
>>>> Both of these files will instantiate `__formatter_str` under the same
>>>>> name. When they are combined into `libstdc++.so`, the linker will pick
>>>>> one of them, and the other standard will get incorrect behavior.
>>>>>
>>>>
>>>> Fine, the older standard could get the "wrong"  behaviour (where wrong
>>>> just means supporting C++23 features when called from C++20 TUs).
>>>>
>>> I considered that unacceptable, and we even have test cases checking
>>> for exactly that. The failures in these tests were the reason
>>> I started exploring alternatives.
>>>
>>>>
>>>> To avoid this problem we will need to ABI tag each used formatter in
>>>>> some manner and apply that
>>>>> tag virally to any formatter referenced from these functions.
>>>>>
>>>>>
>>>>>>
>>>>>> Is there really a problem?
>>>>>>
>>>>>> If we can capture the API level without adding any overhead, I
>>>>>> suppose that's acceptable.
>>>>>>
>>>>>> If we store the text encoding, what are we going to do with it? Use
>>>>>> iconv to convert the fill character on the fly? To what output encoding?
>>>>>>
>>>>> This is for the future, if we want to handle string escaping for
>>>>> non-Unicode encodings better than treating them the same as ASCII.
>>>>> We can add any additional fields to basic_format_parse_context and
>>>>> basic_format_context in the future; this is why I am reserving space for it.
>>>>>
>>>>> // Also could you take a look at:
>>>>> https://gcc.gnu.org/pipermail/libstdc++/2026-April/066030.html
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Tested all *format* tests on x86_64-linux. OK for trunk when all tests
>>>>>>> pass?
>>>>>>>
>>>>>>>  libstdc++-v3/include/std/format | 209
>>>>>>> ++++++++++++++++++++------------
>>>>>>>  1 file changed, 133 insertions(+), 76 deletions(-)
>>>>>>>
>>>>>>> diff --git a/libstdc++-v3/include/std/format
>>>>>>> b/libstdc++-v3/include/std/format
>>>>>>> index eca5bd213aa..97d1ecb3ed6 100644
>>>>>>> --- a/libstdc++-v3/include/std/format
>>>>>>> +++ b/libstdc++-v3/include/std/format
>>>>>>> @@ -140,6 +140,37 @@ namespace __format
>>>>>>>        template<typename, typename...> friend struct
>>>>>>> std::basic_format_string;
>>>>>>>      };
>>>>>>>
>>>>>>> +  // Exposed via basic_format_parse_context, defines the TU
>>>>>>> specific information
>>>>>>> +  // like encoding and standard version.
>>>>>>> +  struct _Api_ctx
>>>>>>> +  {
>>>>>>> +    enum class _Version : unsigned char
>>>>>>> +    { _Api_2020, _Api_2023, _Api_2026 };
>>>>>>> +
>>>>>>> +    _Version _M_ver;
>>>>>>> +    unsigned _M_unused : 23;
>>>>>>> +    unsigned _M_literal_unicode : 1;
>>>>>>> +    __INT_LEAST32_TYPE__ _M_literal_enc;
>>>>>>> +  };
>>>>>>> +  using enum _Api_ctx::_Version;
>>>>>>> +
>>>>>>> +  template<typename _CharT>
>>>>>>> +    constexpr _Api_ctx
>>>>>>> +    __current_api()
>>>>>>>
>>>>>>
>>>>>> Should this be always inline?
>>>>>>
>>>>>> +    {
>>>>>>> +      _Api_ctx __api{};
>>>>>>> +#if __cplusplus > 202302L
>>>>>>> +      __api._M_ver = _Api_2026;
>>>>>>> +#elif __cplusplus > 202002L
>>>>>>> +      __api._M_ver = _Api_2023;
>>>>>>> +#else
>>>>>>> +      __api._M_ver = _Api_2020;
>>>>>>> +#endif
>>>>>>> +      __api._M_literal_unicode
>>>>>>> +       = __unicode::__literal_encoding_is_unicode<_CharT>();
>>>>>>> +      return __api;
>>>>>>> +    }
>>>>>>> +
>>>>>>>  } // namespace __format
>>>>>>>  /// @endcond
>>>>>>>
>>>>>>> @@ -274,7 +305,7 @@ namespace __format
>>>>>>>    { __throw_format_error("format error: failed to parse
>>>>>>> format-spec"); }
>>>>>>>
>>>>>>>    template<typename _CharT> class _Scanner;
>>>>>>> -
>>>>>>> +  template<typename _Out, typename _CharT> class
>>>>>>> _Formatting_scanner;
>>>>>>>  } // namespace __format
>>>>>>>    /// @endcond
>>>>>>>
>>>>>>> @@ -408,23 +439,34 @@ namespace __format
>>>>>>>        // This must not be constexpr.
>>>>>>>        static void __invalid_dynamic_spec(const char*);
>>>>>>>
>>>>>>> -      friend __format::_Scanner<_CharT>;
>>>>>>> -#endif
>>>>>>> -
>>>>>>> +#else
>>>>>>>        // This constructor should only be used by the implementation.
>>>>>>>        constexpr explicit
>>>>>>>        basic_format_parse_context(basic_string_view<_CharT> __fmt,
>>>>>>>                                  size_t __num_args) noexcept
>>>>>>>        : _M_begin(__fmt.begin()), _M_end(__fmt.end()),
>>>>>>> _M_num_args(__num_args)
>>>>>>>        { }
>>>>>>> +#endif
>>>>>>>
>>>>>>>      private:
>>>>>>> +      // This constructor should only be used by the implementation.
>>>>>>> +      constexpr explicit
>>>>>>> +      basic_format_parse_context(__format::_Api_ctx __api,
>>>>>>> +                                basic_string_view<_CharT> __fmt,
>>>>>>> +                                size_t __num_args) noexcept
>>>>>>> +      : _M_api(__api), _M_begin(__fmt.begin()), _M_end(__fmt.end())
>>>>>>> +      , _M_num_args(__num_args)
>>>>>>> +      { }
>>>>>>> +
>>>>>>> +      __format::_Api_ctx _M_api = __format::__current_api<_CharT>();
>>>>>>>
>>>>>>
>>>>>> What guarantees this will be initialized by a call to the right
>>>>>> version?
>>>>>>
>>>>> This is only used by the basic_format_parse_context(string) constructor,
>>>>> which is mostly used for user-defined formatters. We may want to define
>>>>> these constructors as always inline.
>>>>>
>>>>>>
>>>>>> Doesn't putting this member first add a lot of wasted padding due to
>>>>>> alignment?
>>>>>>
>>>>> I do not think the size of basic_format_parse_context and
>>>>> basic_format_context is relevant to anybody. But the struct is 64 bits,
>>>>> so it should not add extra padding.
>>>>>
>>>>
>>>> I don't understand how adding a new byte before the first iterator
>>>> member doesn't introduce sizeof(void*)-1 bytes of padding.
>>>>
>>> _Api_ctx is an 8-byte struct that also holds the literal encoding
>>> information. And I considered having spare bytes there a feature, not
>>> a drawback.
>>>
>>
>> Oh, I thought this was just storing the enum, not the struct. Doh.
>>
>> That explains why I was confused.
>>
>> Thinking more about the questions in the follow up mails....
>>
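To make the padding point concrete, here is a minimal sketch of a struct with the same member layout as `_Api_ctx` in the patch. The names `ApiCtx`, `Version`, `ParseContextLike` and `padding_after_api` are hypothetical stand-ins, and the sizes mentioned below assume a target using the Itanium C++ ABI (for example x86_64 GNU/Linux with GCC); other ABIs may lay out the bit-fields differently.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the _Api_ctx member layout from the patch.
struct ApiCtx
{
  enum class Version : unsigned char { api2020, api2023, api2026 };

  Version ver;                    // one byte
  unsigned unused : 23;           // reserved bits
  unsigned literal_unicode : 1;   // set when the literal encoding is Unicode
  std::int_least32_t literal_enc; // reserved for a text_encoding::id value
};

// Hypothetical stand-in for a context whose next members are pointers.
struct ParseContextLike
{
  ApiCtx api;
  const char* begin;
  const char* end;
};

// Bytes of padding between the api member and the first pointer.
constexpr std::size_t padding_after_api()
{ return offsetof(ParseContextLike, begin) - sizeof(ApiCtx); }
```

Under the stated ABI assumption the enum and both bit-fields share the first four bytes, so `sizeof(ApiCtx)` is 8 and `padding_after_api()` is 0. Storing only the one-byte enum in front of the iterators would instead leave the `sizeof(void*)-1` bytes of padding asked about above.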
> We decided to go with the simpler model from
https://gcc.gnu.org/pipermail/libstdc++/2026-April/066170.html.

> I think allowing C++23 format-string features as an extension in C++20
> mode may be the better choice, as it helps us avoid all the problems
> related to linking TUs compiled for different standards. However, I think
> we should make that a proper extension, accepted for both constexpr and
> runtime format strings.
>
> We certainly do not want to end up in a situation where `?` is accepted
> if the TU uses a Unicode literal encoding (and so uses the exported
> vformat_to definition), but not if it uses a different encoding. We could
> add some `if consteval` checks to apply the extension only at runtime,
> but I do not see value in that.
>
> In short, I am in favour of this policy: format-string specifiers are
> accepted as an extension in older modes once the given standard mode
> becomes stable. (I am limiting this on purpose to format-string
> specifiers, and not functions like set_debug_format.)
>
>
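The two policies discussed above can be sketched as follows. This is only an illustration under assumed names: `ApiVersion`, `Policy` and `accepts_debug_flag` are hypothetical and do not exist in libstdc++. Under the strict policy a shared parser would have to consult the API version recorded in the context; under the extension policy `?` is accepted regardless of the standard the calling TU was compiled with.

```cpp
// Hypothetical names for illustration only; not libstdc++ API.
enum class ApiVersion : unsigned char { api2020, api2023, api2026 };
enum class Policy { strict, extension };

// Decide whether the '?' (debug) flag is accepted for strings and characters.
constexpr bool accepts_debug_flag(ApiVersion tu_version, Policy policy)
{
  if (policy == Policy::extension)
    return true;                            // accepted in every standard mode
  return tu_version >= ApiVersion::api2023; // '?' is a C++23 feature
}
```

With the extension policy the result no longer depends on `tu_version`, which is what lets a single exported definition serve both C++20 and C++23 TUs.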
>>
>>
>>>> Why don't we just give _Indexing a fixed underlying type of unsigned
>>>> char and then put the API version after that?
>>>>
>>>>
>>>>
>>>>
>>>>>>
>>>>>>        iterator _M_begin;
>>>>>>>        iterator _M_end;
>>>>>>>        enum _Indexing { _Unknown, _Manual, _Auto };
>>>>>>>        _Indexing _M_indexing = _Unknown;
>>>>>>>
>>>>>>
>>>>>> We already have padding bytes here (and could guarantee that by
>>>>>> giving a fixed underlying type to _Indexing)
>>>>>>
>>>>>>        size_t _M_next_arg_id = 0;
>>>>>>>        size_t _M_num_args = 0;
>>>>>>> +
>>>>>>> +      friend __format::_Scanner<_CharT>;
>>>>>>>      };
>>>>>>>
>>>>>>>  /// @cond undocumented
>>>>>>> @@ -4927,18 +4969,21 @@ namespace __format
>>>>>>>      {
>>>>>>>        static_assert( output_iterator<_Out, const _CharT&> );
>>>>>>>
>>>>>>> +      __format::_Api_ctx  _M_api;
>>>>>>>        basic_format_args<basic_format_context> _M_args;
>>>>>>>        _Out _M_out;
>>>>>>>        __format::_Optional_locale _M_loc;
>>>>>>>
>>>>>>> -      basic_format_context(basic_format_args<basic_format_context>
>>>>>>> __args,
>>>>>>> +      basic_format_context(__format::_Api_ctx __api,
>>>>>>> +                          basic_format_args<basic_format_context>
>>>>>>> __args,
>>>>>>>                            _Out __out)
>>>>>>> -      : _M_args(__args), _M_out(std::move(__out))
>>>>>>> +      : _M_api(__api), _M_args(__args), _M_out(std::move(__out))
>>>>>>>        { }
>>>>>>>
>>>>>>> -      basic_format_context(basic_format_args<basic_format_context>
>>>>>>> __args,
>>>>>>> +      basic_format_context(__format::_Api_ctx __api,
>>>>>>> +                          basic_format_args<basic_format_context>
>>>>>>> __args,
>>>>>>>                            _Out __out, const std::locale& __loc)
>>>>>>> -      : _M_args(__args), _M_out(std::move(__out)), _M_loc(__loc)
>>>>>>> +      : _M_api(__api), _M_args(__args),
>>>>>>> _M_out(std::move(__out)), _M_loc(__loc)
>>>>>>>        { }
>>>>>>>
>>>>>>>        // _GLIBCXX_RESOLVE_LIB_DEFECTS
>>>>>>> @@ -4954,6 +4999,7 @@ namespace __format
>>>>>>>                                   const locale*);
>>>>>>>
>>>>>>>        friend __format::__formatter_chrono<_CharT>;
>>>>>>> +      friend __format::_Formatting_scanner<_Out, _CharT>;
>>>>>>>
>>>>>>>      public:
>>>>>>>        ~basic_format_context() = default;
>>>>>>> @@ -4998,8 +5044,9 @@ namespace __format
>>>>>>>        } _M_pc;
>>>>>>>
>>>>>>>        constexpr explicit
>>>>>>> -      _Scanner(basic_string_view<_CharT> __str, size_t __nargs =
>>>>>>> (size_t)-1)
>>>>>>> -      : _M_pc(__str, __nargs)
>>>>>>> +      _Scanner(_Api_ctx __api, basic_string_view<_CharT> __str,
>>>>>>> +              size_t __nargs = (size_t)-1)
>>>>>>> +      : _M_pc(__api, __str, __nargs)
>>>>>>>        { }
>>>>>>>
>>>>>>>        constexpr iterator begin() const noexcept { return
>>>>>>> _M_pc.begin(); }
>>>>>>> @@ -5115,7 +5162,7 @@ namespace __format
>>>>>>>      public:
>>>>>>>        _Formatting_scanner(basic_format_context<_Out, _CharT>& __fc,
>>>>>>>                           basic_string_view<_CharT> __str)
>>>>>>> -      : _Scanner<_CharT>(__str), _M_fc(__fc)
>>>>>>> +      : _Scanner<_CharT>(__fc._M_api, __str), _M_fc(__fc)
>>>>>>>        { }
>>>>>>>
>>>>>>>      private:
>>>>>>> @@ -5176,7 +5223,8 @@ namespace __format
>>>>>>>      public:
>>>>>>>        consteval
>>>>>>>        _Checking_scanner(basic_string_view<_CharT> __str)
>>>>>>> -      : _Scanner<_CharT>(__str, sizeof...(_Args))
>>>>>>> +      : _Scanner<_CharT>(__format::__current_api<_CharT>(),
>>>>>>>
>>>>>>
>>>>>> This is consteval so should use the right version for the current TU.
>>>>>>
>>>>>> +                        __str, sizeof...(_Args))
>>>>>>>        {
>>>>>>>  #if __cpp_lib_format >= 202305L
>>>>>>>         this->_M_pc._M_types = _M_types.data();
>>>>>>> @@ -5219,82 +5267,91 @@ namespace __format
>>>>>>>  #endif
>>>>>>>      };
>>>>>>>
>>>>>>> -  template<typename _Out, typename _CharT, typename _Context>
>>>>>>> -    inline _Out
>>>>>>> -    __do_vformat_to(_Out __out, basic_string_view<_CharT> __fmt,
>>>>>>> -                   const basic_format_args<_Context>& __args,
>>>>>>> -                   const locale* __loc)
>>>>>>> +  template<typename _CharT>
>>>>>>> +    _Sink_iter<_CharT>
>>>>>>> +    __do_vformat_to(_Sink_iter<_CharT> __out,
>>>>>>> basic_string_view<_CharT> __fmt,
>>>>>>> +                   __format_context<_CharT>& __ctx)
>>>>>>>      {
>>>>>>> -      if constexpr (is_same_v<_Out, _Sink_iter<_CharT>>)
>>>>>>> -       {
>>>>>>> -         if constexpr (is_same_v<_CharT, char>)
>>>>>>> -           // Fast path for "{}" format strings and simple format
>>>>>>> arg types.
>>>>>>> -           if (__fmt.size() == 2 && __fmt[0] == '{' && __fmt[1] ==
>>>>>>> '}')
>>>>>>> -             {
>>>>>>> -               bool __done = false;
>>>>>>> -               __format::__visit_format_arg([&](auto& __arg) {
>>>>>>> -                 using _Tp = remove_cvref_t<decltype(__arg)>;
>>>>>>> -                 if constexpr (is_same_v<_Tp, bool>)
>>>>>>> +      if constexpr (is_same_v<_CharT, char>)
>>>>>>> +       // Fast path for "{}" format strings and simple format arg
>>>>>>> types.
>>>>>>> +       if (__fmt.size() == 2 && __fmt[0] == '{' && __fmt[1] == '}')
>>>>>>> +         {
>>>>>>> +           bool __done = false;
>>>>>>> +           __format::__visit_format_arg([&](auto& __arg) {
>>>>>>> +             using _Tp = remove_cvref_t<decltype(__arg)>;
>>>>>>> +             if constexpr (is_same_v<_Tp, bool>)
>>>>>>> +               {
>>>>>>> +                 size_t __len = 4 + !__arg;
>>>>>>> +                 const char* __chars[] = { "false", "true" };
>>>>>>> +                 if (auto __res = __out._M_reserve(__len))
>>>>>>>                     {
>>>>>>> -                     size_t __len = 4 + !__arg;
>>>>>>> -                     const char* __chars[] = { "false", "true" };
>>>>>>> -                     if (auto __res = __out._M_reserve(__len))
>>>>>>> -                       {
>>>>>>> -                         __builtin_memcpy(__res.get(),
>>>>>>> __chars[__arg], __len);
>>>>>>> -                         __res._M_bump(__len);
>>>>>>> -                         __done = true;
>>>>>>> -                       }
>>>>>>> +                     __builtin_memcpy(__res.get(), __chars[__arg],
>>>>>>> __len);
>>>>>>> +                     __res._M_bump(__len);
>>>>>>> +                     __done = true;
>>>>>>>                     }
>>>>>>> -                 else if constexpr (is_same_v<_Tp, char>)
>>>>>>> +               }
>>>>>>> +             else if constexpr (is_same_v<_Tp, char>)
>>>>>>> +               {
>>>>>>> +                 if (auto __res = __out._M_reserve(1))
>>>>>>>                     {
>>>>>>> -                     if (auto __res = __out._M_reserve(1))
>>>>>>> -                       {
>>>>>>> -                         *__res.get() = __arg;
>>>>>>> -                         __res._M_bump(1);
>>>>>>> -                         __done = true;
>>>>>>> -                       }
>>>>>>> +                     *__res.get() = __arg;
>>>>>>> +                     __res._M_bump(1);
>>>>>>> +                     __done = true;
>>>>>>>                     }
>>>>>>> -                 else if constexpr (is_integral_v<_Tp>)
>>>>>>> +               }
>>>>>>> +             else if constexpr (is_integral_v<_Tp>)
>>>>>>> +               {
>>>>>>> +                 make_unsigned_t<_Tp> __uval;
>>>>>>> +                 const bool __neg = __arg < 0;
>>>>>>> +                 if (__neg)
>>>>>>> +                   __uval = make_unsigned_t<_Tp>(~__arg) + 1u;
>>>>>>> +                 else
>>>>>>> +                   __uval = __arg;
>>>>>>> +                 const auto __n = __detail::__to_chars_len(__uval);
>>>>>>> +                 if (auto __res = __out._M_reserve(__n + __neg))
>>>>>>>                     {
>>>>>>> -                     make_unsigned_t<_Tp> __uval;
>>>>>>> -                     const bool __neg = __arg < 0;
>>>>>>> -                     if (__neg)
>>>>>>> -                       __uval = make_unsigned_t<_Tp>(~__arg) + 1u;
>>>>>>> -                     else
>>>>>>> -                       __uval = __arg;
>>>>>>> -                     const auto __n =
>>>>>>> __detail::__to_chars_len(__uval);
>>>>>>> -                     if (auto __res = __out._M_reserve(__n + __neg))
>>>>>>> -                       {
>>>>>>> -                         auto __ptr = __res.get();
>>>>>>> -                         *__ptr = '-';
>>>>>>> -                         __detail::__to_chars_10_impl(__ptr +
>>>>>>> (int)__neg, __n,
>>>>>>> -                                                      __uval);
>>>>>>> -                         __res._M_bump(__n + __neg);
>>>>>>> -                         __done = true;
>>>>>>> -                       }
>>>>>>> +                     auto __ptr = __res.get();
>>>>>>> +                     *__ptr = '-';
>>>>>>> +                     __detail::__to_chars_10_impl(__ptr +
>>>>>>> (int)__neg, __n,
>>>>>>> +                                                  __uval);
>>>>>>> +                     __res._M_bump(__n + __neg);
>>>>>>> +                     __done = true;
>>>>>>>                     }
>>>>>>> -                 else if constexpr (is_convertible_v<_Tp,
>>>>>>> string_view>)
>>>>>>> +               }
>>>>>>> +             else if constexpr (is_convertible_v<_Tp, string_view>)
>>>>>>> +               {
>>>>>>> +                 string_view __sv = __arg;
>>>>>>> +                 if (auto __res = __out._M_reserve(__sv.size()))
>>>>>>>                     {
>>>>>>> -                     string_view __sv = __arg;
>>>>>>> -                     if (auto __res = __out._M_reserve(__sv.size()))
>>>>>>> -                       {
>>>>>>> -                         __builtin_memcpy(__res.get(), __sv.data(),
>>>>>>> __sv.size());
>>>>>>> -                         __res._M_bump(__sv.size());
>>>>>>> -                         __done = true;
>>>>>>> -                       }
>>>>>>> +                     __builtin_memcpy(__res.get(), __sv.data(),
>>>>>>> __sv.size());
>>>>>>> +                     __res._M_bump(__sv.size());
>>>>>>> +                     __done = true;
>>>>>>>                     }
>>>>>>> -               }, __args.get(0));
>>>>>>> +               }
>>>>>>> +           }, __ctx.arg(0));
>>>>>>>
>>>>>>> -               if (__done)
>>>>>>> -                 return __out;
>>>>>>> -             }
>>>>>>> +           if (__done)
>>>>>>> +             return __out;
>>>>>>> +         }
>>>>>>>
>>>>>>> +      _Formatting_scanner<_Sink_iter<_CharT>, _CharT>
>>>>>>> __scanner(__ctx, __fmt);
>>>>>>> +      __scanner._M_scan();
>>>>>>> +      return __out;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  template<typename _Out, typename _CharT, typename _Context>
>>>>>>> +    _Out
>>>>>>> +    __do_vformat_to(_Out __out, basic_string_view<_CharT> __fmt,
>>>>>>> +                   const basic_format_args<_Context>& __args,
>>>>>>> +                   const locale* __loc)
>>>>>>> +    {
>>>>>>> +      if constexpr (is_same_v<_Out, _Sink_iter<_CharT>>)
>>>>>>> +       {
>>>>>>> +         const auto __api = __format::__current_api<_CharT>();
>>>>>>>           auto __ctx = __loc == nullptr
>>>>>>> -                        ? _Context(__args, __out)
>>>>>>> -                        : _Context(__args, __out, *__loc);
>>>>>>> -         _Formatting_scanner<_Sink_iter<_CharT>, _CharT>
>>>>>>> __scanner(__ctx, __fmt);
>>>>>>> -         __scanner._M_scan();
>>>>>>> -         return __out;
>>>>>>> +                    ? _Context(__api, __args, __out)
>>>>>>> +                    : _Context(__api, __args, __out, *__loc);
>>>>>>> +         return __do_vformat_to(std::move(__out), __fmt, __ctx);
>>>>>>>         }
>>>>>>>        else if constexpr (__contiguous_char_iter<_CharT, _Out>)
>>>>>>>         {
>>>>>>> --
>>>>>>> 2.53.0
>>>>>>>
>>>>>>>
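For reference, the version selection done by `__current_api` can be sketched outside the library as follows. `api_from_cplusplus` is a hypothetical helper that takes the macro value as a parameter so that each branch can be checked in a single TU; the thresholds are the ones used in the patch. The selection relies on the predefined macro being spelled `__cplusplus` exactly: an identifier that is not a defined macro evaluates to 0 inside `#if`, so a misspelling would silently select the oldest branch rather than fail to compile.

```cpp
// Hypothetical stand-alone sketch; not the libstdc++ implementation.
enum class ApiVersion : unsigned char { api2020, api2023, api2026 };

// Map a __cplusplus value to an API version, using the patch's thresholds.
constexpr ApiVersion api_from_cplusplus(long cplusplus)
{
  return cplusplus > 202302L ? ApiVersion::api2026
       : cplusplus > 202002L ? ApiVersion::api2023
       : ApiVersion::api2020;
}

// Version for the TU that is being compiled right now.
constexpr ApiVersion current_api_version()
{ return api_from_cplusplus(__cplusplus); }

// An undefined identifier is replaced by 0 in #if, so the first branch is
// not taken unless something defines this (deliberately made-up) name.
#if not_a_defined_macro_for_this_demo > 202002L
constexpr bool misspelled_check_taken = true;
#else
constexpr bool misspelled_check_taken = false;
#endif
```

Compiling with `-Wundef` makes GCC warn about undefined identifiers in `#if` conditions, which is one way to catch this class of mistake.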
  

Patch

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index eca5bd213aa..97d1ecb3ed6 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -140,6 +140,37 @@  namespace __format
       template<typename, typename...> friend struct std::basic_format_string;
     };
 
+  // Exposed via basic_format_parse_context, defines the TU specific information
+  // like encoding and standard version.
+  struct _Api_ctx
+  {
+    enum class _Version : unsigned char
+    { _Api_2020, _Api_2023, _Api_2026 };
+
+    _Version _M_ver;
+    unsigned _M_unused : 23;
+    unsigned _M_literal_unicode : 1;
+    __INT_LEAST32_TYPE__ _M_literal_enc;
+  };
+  using enum _Api_ctx::_Version;
+
+  template<typename _CharT>
+    constexpr _Api_ctx
+    __current_api()
+    {
+      _Api_ctx __api{};
+#if __cplusplus > 202302L
+      __api._M_ver = _Api_2026;
+#elif __cplusplus > 202002L
+      __api._M_ver = _Api_2023;
+#else
+      __api._M_ver = _Api_2020;
+#endif
+      __api._M_literal_unicode
+	= __unicode::__literal_encoding_is_unicode<_CharT>();
+      return __api;
+    }
+
 } // namespace __format
 /// @endcond
 
@@ -274,7 +305,7 @@  namespace __format
   { __throw_format_error("format error: failed to parse format-spec"); }
 
   template<typename _CharT> class _Scanner;
-
+  template<typename _Out, typename _CharT> class _Formatting_scanner;
 } // namespace __format
   /// @endcond
 
@@ -408,23 +439,34 @@  namespace __format
       // This must not be constexpr.
       static void __invalid_dynamic_spec(const char*);
 
-      friend __format::_Scanner<_CharT>;
-#endif
-
+#else
       // This constructor should only be used by the implementation.
       constexpr explicit
       basic_format_parse_context(basic_string_view<_CharT> __fmt,
 				 size_t __num_args) noexcept
       : _M_begin(__fmt.begin()), _M_end(__fmt.end()), _M_num_args(__num_args)
       { }
+#endif
 
     private:
+      // This constructor should only be used by the implementation.
+      constexpr explicit
+      basic_format_parse_context(__format::_Api_ctx __api,
+				 basic_string_view<_CharT> __fmt,
+				 size_t __num_args) noexcept
+      : _M_api(__api), _M_begin(__fmt.begin()), _M_end(__fmt.end())
+      , _M_num_args(__num_args)
+      { }
+
+      __format::_Api_ctx _M_api = __format::__current_api<_CharT>();
       iterator _M_begin;
       iterator _M_end;
       enum _Indexing { _Unknown, _Manual, _Auto };
       _Indexing _M_indexing = _Unknown;
       size_t _M_next_arg_id = 0;
       size_t _M_num_args = 0;
+
+      friend __format::_Scanner<_CharT>;
     };
 
 /// @cond undocumented
@@ -4927,18 +4969,21 @@  namespace __format
     {
       static_assert( output_iterator<_Out, const _CharT&> );
 
+      __format::_Api_ctx  _M_api;
       basic_format_args<basic_format_context> _M_args;
       _Out _M_out;
       __format::_Optional_locale _M_loc;
 
-      basic_format_context(basic_format_args<basic_format_context> __args,
+      basic_format_context(__format::_Api_ctx __api,
+			   basic_format_args<basic_format_context> __args,
 			   _Out __out)
-      : _M_args(__args), _M_out(std::move(__out))
+      : _M_api(__api), _M_args(__args), _M_out(std::move(__out))
       { }
 
-      basic_format_context(basic_format_args<basic_format_context> __args,
+      basic_format_context(__format::_Api_ctx __api,
+			   basic_format_args<basic_format_context> __args,
 			   _Out __out, const std::locale& __loc)
-      : _M_args(__args), _M_out(std::move(__out)), _M_loc(__loc)
+      : _M_api(__api), _M_args(__args),	_M_out(std::move(__out)), _M_loc(__loc)
       { }
 
       // _GLIBCXX_RESOLVE_LIB_DEFECTS
@@ -4954,6 +4999,7 @@  namespace __format
 				  const locale*);
 
       friend __format::__formatter_chrono<_CharT>;
+      friend __format::_Formatting_scanner<_Out, _CharT>;
 
     public:
       ~basic_format_context() = default;
@@ -4998,8 +5044,9 @@  namespace __format
       } _M_pc;
 
       constexpr explicit
-      _Scanner(basic_string_view<_CharT> __str, size_t __nargs = (size_t)-1)
-      : _M_pc(__str, __nargs)
+      _Scanner(_Api_ctx __api, basic_string_view<_CharT> __str,
+	       size_t __nargs = (size_t)-1)
+      : _M_pc(__api, __str, __nargs)
       { }
 
       constexpr iterator begin() const noexcept { return _M_pc.begin(); }
@@ -5115,7 +5162,7 @@  namespace __format
     public:
       _Formatting_scanner(basic_format_context<_Out, _CharT>& __fc,
 			  basic_string_view<_CharT> __str)
-      : _Scanner<_CharT>(__str), _M_fc(__fc)
+      : _Scanner<_CharT>(__fc._M_api, __str), _M_fc(__fc)
       { }
 
     private:
@@ -5176,7 +5223,8 @@  namespace __format
     public:
       consteval
       _Checking_scanner(basic_string_view<_CharT> __str)
-      : _Scanner<_CharT>(__str, sizeof...(_Args))
+      : _Scanner<_CharT>(__format::__current_api<_CharT>(),
+		         __str, sizeof...(_Args))
       {
 #if __cpp_lib_format >= 202305L
 	this->_M_pc._M_types = _M_types.data();
@@ -5219,82 +5267,91 @@  namespace __format
 #endif
     };
 
-  template<typename _Out, typename _CharT, typename _Context>
-    inline _Out
-    __do_vformat_to(_Out __out, basic_string_view<_CharT> __fmt,
-		    const basic_format_args<_Context>& __args,
-		    const locale* __loc)
+  template<typename _CharT>
+    _Sink_iter<_CharT>
+    __do_vformat_to(_Sink_iter<_CharT> __out, basic_string_view<_CharT> __fmt,
+		    __format_context<_CharT>& __ctx)
     {
-      if constexpr (is_same_v<_Out, _Sink_iter<_CharT>>)
-	{
-	  if constexpr (is_same_v<_CharT, char>)
-	    // Fast path for "{}" format strings and simple format arg types.
-	    if (__fmt.size() == 2 && __fmt[0] == '{' && __fmt[1] == '}')
-	      {
-		bool __done = false;
-		__format::__visit_format_arg([&](auto& __arg) {
-		  using _Tp = remove_cvref_t<decltype(__arg)>;
-		  if constexpr (is_same_v<_Tp, bool>)
+      if constexpr (is_same_v<_CharT, char>)
+	// Fast path for "{}" format strings and simple format arg types.
+	if (__fmt.size() == 2 && __fmt[0] == '{' && __fmt[1] == '}')
+	  {
+	    bool __done = false;
+	    __format::__visit_format_arg([&](auto& __arg) {
+	      using _Tp = remove_cvref_t<decltype(__arg)>;
+	      if constexpr (is_same_v<_Tp, bool>)
+		{
+		  size_t __len = 4 + !__arg;
+		  const char* __chars[] = { "false", "true" };
+		  if (auto __res = __out._M_reserve(__len))
 		    {
-		      size_t __len = 4 + !__arg;
-		      const char* __chars[] = { "false", "true" };
-		      if (auto __res = __out._M_reserve(__len))
-			{
-			  __builtin_memcpy(__res.get(), __chars[__arg], __len);
-			  __res._M_bump(__len);
-			  __done = true;
-			}
+		      __builtin_memcpy(__res.get(), __chars[__arg], __len);
+		      __res._M_bump(__len);
+		      __done = true;
 		    }
-		  else if constexpr (is_same_v<_Tp, char>)
+		}
+	      else if constexpr (is_same_v<_Tp, char>)
+		{
+		  if (auto __res = __out._M_reserve(1))
 		    {
-		      if (auto __res = __out._M_reserve(1))
-			{
-			  *__res.get() = __arg;
-			  __res._M_bump(1);
-			  __done = true;
-			}
+		      *__res.get() = __arg;
+		      __res._M_bump(1);
+		      __done = true;
 		    }
-		  else if constexpr (is_integral_v<_Tp>)
+		}
+	      else if constexpr (is_integral_v<_Tp>)
+		{
+		  make_unsigned_t<_Tp> __uval;
+		  const bool __neg = __arg < 0;
+		  if (__neg)
+		    __uval = make_unsigned_t<_Tp>(~__arg) + 1u;
+		  else
+		    __uval = __arg;
+		  const auto __n = __detail::__to_chars_len(__uval);
+		  if (auto __res = __out._M_reserve(__n + __neg))
 		    {
-		      make_unsigned_t<_Tp> __uval;
-		      const bool __neg = __arg < 0;
-		      if (__neg)
-			__uval = make_unsigned_t<_Tp>(~__arg) + 1u;
-		      else
-			__uval = __arg;
-		      const auto __n = __detail::__to_chars_len(__uval);
-		      if (auto __res = __out._M_reserve(__n + __neg))
-			{
-			  auto __ptr = __res.get();
-			  *__ptr = '-';
-			  __detail::__to_chars_10_impl(__ptr + (int)__neg, __n,
-						       __uval);
-			  __res._M_bump(__n + __neg);
-			  __done = true;
-			}
+		      auto __ptr = __res.get();
+		      *__ptr = '-';
+		      __detail::__to_chars_10_impl(__ptr + (int)__neg, __n,
+						   __uval);
+		      __res._M_bump(__n + __neg);
+		      __done = true;
 		    }
-		  else if constexpr (is_convertible_v<_Tp, string_view>)
+		}
+	      else if constexpr (is_convertible_v<_Tp, string_view>)
+		{
+		  string_view __sv = __arg;
+		  if (auto __res = __out._M_reserve(__sv.size()))
 		    {
-		      string_view __sv = __arg;
-		      if (auto __res = __out._M_reserve(__sv.size()))
-			{
-			  __builtin_memcpy(__res.get(), __sv.data(), __sv.size());
-			  __res._M_bump(__sv.size());
-			  __done = true;
-			}
+		      __builtin_memcpy(__res.get(), __sv.data(), __sv.size());
+		      __res._M_bump(__sv.size());
+		      __done = true;
 		    }
-		}, __args.get(0));
+		}
+	    }, __ctx.arg(0));
 
-		if (__done)
-		  return __out;
-	      }
+	    if (__done)
+	      return __out;
+	  }
 
+      _Formatting_scanner<_Sink_iter<_CharT>, _CharT> __scanner(__ctx, __fmt);
+      __scanner._M_scan();
+      return __out;
+    }
+
+  template<typename _Out, typename _CharT, typename _Context>
+    _Out
+    __do_vformat_to(_Out __out, basic_string_view<_CharT> __fmt,
+		    const basic_format_args<_Context>& __args,
+		    const locale* __loc)
+    {
+      if constexpr (is_same_v<_Out, _Sink_iter<_CharT>>)
+	{
+	  const auto __api = __format::__current_api<_CharT>();
 	  auto __ctx = __loc == nullptr
-			 ? _Context(__args, __out)
-			 : _Context(__args, __out, *__loc);
-	  _Formatting_scanner<_Sink_iter<_CharT>, _CharT> __scanner(__ctx, __fmt);
-	  __scanner._M_scan();
-	  return __out;
+		     ? _Context(__api, __args, __out)
+		     : _Context(__api, __args, __out, *__loc);
+	  return __do_vformat_to(std::move(__out), __fmt, __ctx);
 	}
       else if constexpr (__contiguous_char_iter<_CharT, _Out>)
 	{