gcov: Fix -fprofile-update=atomic

Message ID 20221209135609.55159-1-sebastian.huber@embedded-brains.de
State New
Headers
Series gcov: Fix -fprofile-update=atomic |

Commit Message

Sebastian Huber Dec. 9, 2022, 1:56 p.m. UTC
  The code coverage support uses counters to determine which edges in the control
flow graph were executed.  If a counter overflows, then the code coverage
information is invalid.  Therefore the counter type should be a 64-bit integer.
In multithreaded applications, it is important that the counter increments are
atomic.  This is not the case by default.  The user can enable atomic counter
increments through the -fprofile-update=atomic and
-fprofile-update=prefer-atomic options.

If the hardware supports 64-bit atomic operations, then everything is fine.  If
not and -fprofile-update=prefer-atomic was chosen by the user, then non-atomic
counter increments will be used.  However, if the hardware does not support the
required atomic operations and -fprofile-atomic=update was chosen by the user,
then a warning was issued and as a forced fall-back to non-atomic operations
was done.  This is probably not what a user wants.  There is still hardware on
the market which does not have atomic operations and is used for multithreaded
applications.  A user which selects -fprofile-update=atomic wants consistent
code coverage data and not random data.

This patch removes the fall-back to non-atomic operations for
-fprofile-update=atomic.  If atomic operations in hardware are not available,
then a library call to libatomic is emitted.  To mitigate potential performance
issues an optimization for systems which only support 32-bit atomic operations
is provided.  Here, the edge counter increments are done like this:

  low = __atomic_add_fetch_4 (&counter.low, 1, MEMMODEL_RELAXED);
  high_inc = low == 0 ? 1 : 0;
  __atomic_add_fetch_4 (&counter.high, high_inc, MEMMODEL_RELAXED);

gcc/ChangeLog:

	* tree-profile.cc (split_atomic_increment): New.
	(gimple_gen_edge_profiler): Split the atomic edge counter increment in
	two 32-bit atomic operations if necessary.
	(tree_profiling): Remove profile update warning and fall-back.  Set
	split_atomic_increment if necessary.
---
 gcc/tree-profile.cc | 81 +++++++++++++++++++++++++++++++++------------
 1 file changed, 59 insertions(+), 22 deletions(-)
  

Comments

Richard Biener Dec. 13, 2022, 2:30 p.m. UTC | #1
On Fri, Dec 9, 2022 at 2:56 PM Sebastian Huber
<sebastian.huber@embedded-brains.de> wrote:
>
> The code coverage support uses counters to determine which edges in the control
> flow graph were executed.  If a counter overflows, then the code coverage
> information is invalid.  Therefore the counter type should be a 64-bit integer.
> In multithreaded applications, it is important that the counter increments are
> atomic.  This is not the case by default.  The user can enable atomic counter
> increments through the -fprofile-update=atomic and
> -fprofile-update=prefer-atomic options.
>
> If the hardware supports 64-bit atomic operations, then everything is fine.  If
> not and -fprofile-update=prefer-atomic was chosen by the user, then non-atomic
> counter increments will be used.  However, if the hardware does not support the
> required atomic operations and -fprofile-atomic=update was chosen by the user,
> then a warning was issued and as a forced fall-back to non-atomic operations
> was done.  This is probably not what a user wants.  There is still hardware on
> the market which does not have atomic operations and is used for multithreaded
> applications.  A user which selects -fprofile-update=atomic wants consistent
> code coverage data and not random data.
>
> This patch removes the fall-back to non-atomic operations for
> -fprofile-update=atomic.  If atomic operations in hardware are not available,
> then a library call to libatomic is emitted.  To mitigate potential performance
> issues an optimization for systems which only support 32-bit atomic operations
> is provided.  Here, the edge counter increments are done like this:
>
>   low = __atomic_add_fetch_4 (&counter.low, 1, MEMMODEL_RELAXED);
>   high_inc = low == 0 ? 1 : 0;
>   __atomic_add_fetch_4 (&counter.high, high_inc, MEMMODEL_RELAXED);

You check for compare_and_swapsi and the old code checks
TYPE_PRECISION (gcov_type_node) > 32 to determine whether 8 byte or 4 byte
gcov_type is used.  But you do not seem to handle the case where
neither SImode nor DImode atomic operations are available?  Can we instead
do

  if (gcov_type_size == 4)
    can_support_atomic4
      = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi;
  else if (gcov_type_size == 8)
    can_support_atomic8
      = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi;

  if (flag_profile_update == PROFILE_UPDATE_ATOMIC
      && !can_support_atomic4 && !can_support_atomic8)
    {
      warning (0, "target does not support atomic profile update, "
               "single mode is selected");
      flag_profile_update = PROFILE_UPDATE_SINGLE;
    }

thus retain the diagnostic & fallback for this case?  The existing
code also suggests
there might be targets with HImode or TImode counters?

In another mail you mentioned that for gen_time_profiler this doesn't
work but its
code relies on flag_profile_update as well.  So do we need to split the flag
somehow, or continue using the PROFILE_UPDATE_SINGLE fallback when
we are doing more than just edge profiling?

Thanks,
Richard.

> gcc/ChangeLog:
>
>         * tree-profile.cc (split_atomic_increment): New.
>         (gimple_gen_edge_profiler): Split the atomic edge counter increment in
>         two 32-bit atomic operations if necessary.
>         (tree_profiling): Remove profile update warning and fall-back.  Set
>         split_atomic_increment if necessary.
> ---
>  gcc/tree-profile.cc | 81 +++++++++++++++++++++++++++++++++------------
>  1 file changed, 59 insertions(+), 22 deletions(-)
>
> diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> index 2beb49241f2..1d326dde59a 100644
> --- a/gcc/tree-profile.cc
> +++ b/gcc/tree-profile.cc
> @@ -73,6 +73,17 @@ static GTY(()) tree ic_tuple_var;
>  static GTY(()) tree ic_tuple_counters_field;
>  static GTY(()) tree ic_tuple_callee_field;
>
> +/* If the user selected atomic profile counter updates
> +   (-fprofile-update=atomic), then the counter updates will be done atomically.
> +   Ideally, this is done through atomic operations in hardware.  If the
> +   hardware supports only 32-bit atomic increments and gcov_type_node is a
> +   64-bit integer type, then for the profile edge counters the increment is
> +   performed through two separate 32-bit atomic increments.  This case is
> +   indicated by the split_atomic_increment variable begin true.  If the
> +   hardware does not support atomic operations at all, then a library call to
> +   libatomic is emitted.  */
> +static bool split_atomic_increment;
> +
>  /* Do initialization work for the edge profiler.  */
>
>  /* Add code:
> @@ -242,30 +253,59 @@ gimple_init_gcov_profiler (void)
>  void
>  gimple_gen_edge_profiler (int edgeno, edge e)
>  {
> -  tree one;
> -
> -  one = build_int_cst (gcov_type_node, 1);
> +  const char *name = "PROF_edge_counter";
> +  tree ref = tree_coverage_counter_ref (GCOV_COUNTER_ARCS, edgeno);
> +  tree one = build_int_cst (gcov_type_node, 1);
>
>    if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
>      {
> -      /* __atomic_fetch_add (&counter, 1, MEMMODEL_RELAXED); */
> -      tree addr = tree_coverage_counter_addr (GCOV_COUNTER_ARCS, edgeno);
> -      tree f = builtin_decl_explicit (TYPE_PRECISION (gcov_type_node) > 32
> -                                     ? BUILT_IN_ATOMIC_FETCH_ADD_8:
> -                                     BUILT_IN_ATOMIC_FETCH_ADD_4);
> -      gcall *stmt = gimple_build_call (f, 3, addr, one,
> -                                      build_int_cst (integer_type_node,
> -                                                     MEMMODEL_RELAXED));
> -      gsi_insert_on_edge (e, stmt);
> +      tree addr = build_fold_addr_expr (ref);
> +      tree relaxed = build_int_cst (integer_type_node, MEMMODEL_RELAXED);
> +      if (!split_atomic_increment)
> +       {
> +         /* __atomic_fetch_add (&counter, 1, MEMMODEL_RELAXED); */
> +         tree f = builtin_decl_explicit (TYPE_PRECISION (gcov_type_node) > 32
> +                                         ? BUILT_IN_ATOMIC_FETCH_ADD_8:
> +                                         BUILT_IN_ATOMIC_FETCH_ADD_4);
> +         gcall *stmt = gimple_build_call (f, 3, addr, one, relaxed);
> +         gsi_insert_on_edge (e, stmt);
> +       }
> +      else
> +       {
> +         /* low = __atomic_add_fetch_4 (addr, 1, MEMMODEL_RELAXED);
> +            high_inc = low == 0 ? 1 : 0;
> +            __atomic_add_fetch_4 (addr_high, high_inc, MEMMODEL_RELAXED); */
> +         tree zero32 = build_zero_cst (uint32_type_node);
> +         tree one32 = build_one_cst (uint32_type_node);
> +         tree addr_high = make_temp_ssa_name (TREE_TYPE (addr), NULL, name);
> +         gimple *stmt = gimple_build_assign (addr_high, POINTER_PLUS_EXPR,
> +                                             addr,
> +                                             build_int_cst (size_type_node,
> +                                                            4));
> +         gsi_insert_on_edge (e, stmt);
> +         if (WORDS_BIG_ENDIAN)
> +           std::swap (addr, addr_high);
> +         tree f = builtin_decl_explicit (BUILT_IN_ATOMIC_ADD_FETCH_4);
> +         stmt = gimple_build_call (f, 3, addr, one, relaxed);
> +         tree low = make_temp_ssa_name (uint32_type_node, NULL, name);
> +         gimple_call_set_lhs (stmt, low);
> +         gsi_insert_on_edge (e, stmt);
> +         tree is_zero = make_temp_ssa_name (boolean_type_node, NULL, name);
> +         stmt = gimple_build_assign (is_zero, EQ_EXPR, low, zero32);
> +         gsi_insert_on_edge (e, stmt);
> +         tree high_inc = make_temp_ssa_name (uint32_type_node, NULL, name);
> +         stmt = gimple_build_assign (high_inc, COND_EXPR, is_zero, one32,
> +                                     zero32);
> +         gsi_insert_on_edge (e, stmt);
> +         stmt = gimple_build_call (f, 3, addr_high, high_inc, relaxed);
> +         gsi_insert_on_edge (e, stmt);
> +       }
>      }
>    else
>      {
> -      tree ref = tree_coverage_counter_ref (GCOV_COUNTER_ARCS, edgeno);
> -      tree gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node,
> -                                                  NULL, "PROF_edge_counter");
> +      tree gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node, NULL, name);
>        gassign *stmt1 = gimple_build_assign (gcov_type_tmp_var, ref);
> -      gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node,
> -                                             NULL, "PROF_edge_counter");
> +      gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node, NULL, name);
>        gassign *stmt2 = gimple_build_assign (gcov_type_tmp_var, PLUS_EXPR,
>                                             gimple_assign_lhs (stmt1), one);
>        gassign *stmt3 = gimple_build_assign (unshare_expr (ref),
> @@ -710,11 +750,8 @@ tree_profiling (void)
>
>    if (flag_profile_update == PROFILE_UPDATE_ATOMIC
>        && !can_support_atomic)
> -    {
> -      warning (0, "target does not support atomic profile update, "
> -              "single mode is selected");
> -      flag_profile_update = PROFILE_UPDATE_SINGLE;
> -    }
> +    split_atomic_increment = HAVE_sync_compare_and_swapsi
> +      || HAVE_atomic_compare_and_swapsi;
>    else if (flag_profile_update == PROFILE_UPDATE_PREFER_ATOMIC)
>      flag_profile_update = can_support_atomic
>        ? PROFILE_UPDATE_ATOMIC : PROFILE_UPDATE_SINGLE;
> --
> 2.35.3
>
  
Sebastian Huber Dec. 15, 2022, 8:34 a.m. UTC | #2
On 13/12/2022 15:30, Richard Biener wrote:
> On Fri, Dec 9, 2022 at 2:56 PM Sebastian Huber
> <sebastian.huber@embedded-brains.de>  wrote:
>> The code coverage support uses counters to determine which edges in the control
>> flow graph were executed.  If a counter overflows, then the code coverage
>> information is invalid.  Therefore the counter type should be a 64-bit integer.
>> In multithreaded applications, it is important that the counter increments are
>> atomic.  This is not the case by default.  The user can enable atomic counter
>> increments through the -fprofile-update=atomic and
>> -fprofile-update=prefer-atomic options.
>>
>> If the hardware supports 64-bit atomic operations, then everything is fine.  If
>> not and -fprofile-update=prefer-atomic was chosen by the user, then non-atomic
>> counter increments will be used.  However, if the hardware does not support the
>> required atomic operations and -fprofile-atomic=update was chosen by the user,
>> then a warning was issued and as a forced fall-back to non-atomic operations
>> was done.  This is probably not what a user wants.  There is still hardware on
>> the market which does not have atomic operations and is used for multithreaded
>> applications.  A user which selects -fprofile-update=atomic wants consistent
>> code coverage data and not random data.
>>
>> This patch removes the fall-back to non-atomic operations for
>> -fprofile-update=atomic.  If atomic operations in hardware are not available,
>> then a library call to libatomic is emitted.  To mitigate potential performance
>> issues an optimization for systems which only support 32-bit atomic operations
>> is provided.  Here, the edge counter increments are done like this:
>>
>>    low = __atomic_add_fetch_4 (&counter.low, 1, MEMMODEL_RELAXED);
>>    high_inc = low == 0 ? 1 : 0;
>>    __atomic_add_fetch_4 (&counter.high, high_inc, MEMMODEL_RELAXED);
> You check for compare_and_swapsi and the old code checks
> TYPE_PRECISION (gcov_type_node) > 32 to determine whether 8 byte or 4 byte
> gcov_type is used.  But you do not seem to handle the case where
> neither SImode nor DImode atomic operations are available?  Can we instead
> do
> 
>    if (gcov_type_size == 4)
>      can_support_atomic4
>        = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi;
>    else if (gcov_type_size == 8)
>      can_support_atomic8
>        = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi;
> 
>    if (flag_profile_update == PROFILE_UPDATE_ATOMIC
>        && !can_support_atomic4 && !can_support_atomic8)
>      {
>        warning (0, "target does not support atomic profile update, "
>                 "single mode is selected");
>        flag_profile_update = PROFILE_UPDATE_SINGLE;
>      }
> 
> thus retain the diagnostic & fallback for this case?  

No, if you select -fprofile-update=atomic, then the updates shall be 
atomic from my point of view. If a fallback is acceptable, then you can 
use -fprofile-update=prefer-atomic. Using the fallback in 
-fprofile-update=atomic is a bug which prevents the use of gcov for 
multi-threaded applications on the lower end targets which do not have 
atomic operations in hardware.

> The existing
> code also suggests
> there might be targets with HImode or TImode counters?

Sorry, I don't know.

> 
> In another mail you mentioned that for gen_time_profiler this doesn't
> work but its
> code relies on flag_profile_update as well.  So do we need to split the flag
> somehow, or continue using the PROFILE_UPDATE_SINGLE fallback when
> we are doing more than just edge profiling?

If atomic operations are not available in hardware, then with this patch 
calls to libatomic are emitted. In gen_time_profiler() this is not an 
issue at all, since the atomic increment is only done if counters[0] == 
0 (if I read the code correctly).

For example for

void f(void) {}

we get on riscv:

gcc --coverage -O2 -S -o - test.c -fprofile-update=atomic

         lui     a4,%hi(__gcov0.f)
         li      a3,1
         addi    a4,a4,%lo(__gcov0.f)
         amoadd.w a5,a3,0(a4)
         lui     a4,%hi(__gcov0.f+4)
         addi    a5,a5,1
         seqz    a5,a5
         addi    a4,a4,%lo(__gcov0.f+4)
         amoadd.w zero,a5,0(a4)
         ret

gcc --coverage -O2 -S -o - test.c -fprofile-update=atomic -mbig-endian

         lui     a4,%hi(__gcov0.f+4)
         li      a3,1
         addi    a4,a4,%lo(__gcov0.f+4)
         amoadd.w a5,a3,0(a4)
         lui     a4,%hi(__gcov0.f)
         addi    a5,a5,1
         seqz    a5,a5
         addi    a4,a4,%lo(__gcov0.f)
         amoadd.w zero,a5,0(a4)
         ret

gcc --coverage -O2 -S -o - test.c -fprofile-update=atomic -march=rv64g 
-mabi=lp64

         lui     a5,%hi(__gcov0.f)
         li      a4,1
         addi    a5,a5,%lo(__gcov0.f)
         amoadd.d zero,a4,0(a5)
         ret

gcc --coverage -O2 -S -o - test.c -fprofile-update=atomic -march=rv32id

         lui     a0,%hi(__gcov0.f)
         li      a3,0
         li      a1,1
         li      a2,0
         addi    a0,a0,%lo(__gcov0.f)
         tail    __atomic_fetch_add_8

gcc --coverage -O2 -S -o - test.c -fprofile-update=prefer-atomic 
-march=rv32id

         lui     a4,%hi(__gcov0.f)
         lw      a5,%lo(__gcov0.f)(a4)
         lw      a2,%lo(__gcov0.f+4)(a4)
         addi    a3,a5,1
         sltu    a5,a3,a5
         add     a5,a5,a2
         sw      a3,%lo(__gcov0.f)(a4)
         sw      a5,%lo(__gcov0.f+4)(a4)
         ret
  
Richard Biener Dec. 16, 2022, 9:47 a.m. UTC | #3
On Thu, Dec 15, 2022 at 9:34 AM Sebastian Huber
<sebastian.huber@embedded-brains.de> wrote:
>
> On 13/12/2022 15:30, Richard Biener wrote:
> > On Fri, Dec 9, 2022 at 2:56 PM Sebastian Huber
> > <sebastian.huber@embedded-brains.de>  wrote:
> >> The code coverage support uses counters to determine which edges in the control
> >> flow graph were executed.  If a counter overflows, then the code coverage
> >> information is invalid.  Therefore the counter type should be a 64-bit integer.
> >> In multithreaded applications, it is important that the counter increments are
> >> atomic.  This is not the case by default.  The user can enable atomic counter
> >> increments through the -fprofile-update=atomic and
> >> -fprofile-update=prefer-atomic options.
> >>
> >> If the hardware supports 64-bit atomic operations, then everything is fine.  If
> >> not and -fprofile-update=prefer-atomic was chosen by the user, then non-atomic
> >> counter increments will be used.  However, if the hardware does not support the
> >> required atomic operations and -fprofile-atomic=update was chosen by the user,
> >> then a warning was issued and as a forced fall-back to non-atomic operations
> >> was done.  This is probably not what a user wants.  There is still hardware on
> >> the market which does not have atomic operations and is used for multithreaded
> >> applications.  A user which selects -fprofile-update=atomic wants consistent
> >> code coverage data and not random data.
> >>
> >> This patch removes the fall-back to non-atomic operations for
> >> -fprofile-update=atomic.  If atomic operations in hardware are not available,
> >> then a library call to libatomic is emitted.  To mitigate potential performance
> >> issues an optimization for systems which only support 32-bit atomic operations
> >> is provided.  Here, the edge counter increments are done like this:
> >>
> >>    low = __atomic_add_fetch_4 (&counter.low, 1, MEMMODEL_RELAXED);
> >>    high_inc = low == 0 ? 1 : 0;
> >>    __atomic_add_fetch_4 (&counter.high, high_inc, MEMMODEL_RELAXED);
> > You check for compare_and_swapsi and the old code checks
> > TYPE_PRECISION (gcov_type_node) > 32 to determine whether 8 byte or 4 byte
> > gcov_type is used.  But you do not seem to handle the case where
> > neither SImode nor DImode atomic operations are available?  Can we instead
> > do
> >
> >    if (gcov_type_size == 4)
> >      can_support_atomic4
> >        = HAVE_sync_compare_and_swapsi || HAVE_atomic_compare_and_swapsi;
> >    else if (gcov_type_size == 8)
> >      can_support_atomic8
> >        = HAVE_sync_compare_and_swapdi || HAVE_atomic_compare_and_swapdi;
> >
> >    if (flag_profile_update == PROFILE_UPDATE_ATOMIC
> >        && !can_support_atomic4 && !can_support_atomic8)
> >      {
> >        warning (0, "target does not support atomic profile update, "
> >                 "single mode is selected");
> >        flag_profile_update = PROFILE_UPDATE_SINGLE;
> >      }
> >
> > thus retain the diagnostic & fallback for this case?
>
> No, if you select -fprofile-update=atomic, then the updates shall be
> atomic from my point of view. If a fallback is acceptable, then you can
> use -fprofile-update=prefer-atomic. Using the fallback in
> -fprofile-update=atomic is a bug which prevents the use of gcov for
> multi-threaded applications on the lower end targets which do not have
> atomic operations in hardware.

Ah, OK.  So the fallback there is libatomic calls as well then?  Note
not all targets support libatomic, for those the failure mode is likely
a link error (which might be fine, but we eventually might want to
amend documentation to indicate this failure mode).

> > The existing
> > code also suggests
> > there might be targets with HImode or TImode counters?
>
> Sorry, I don't know.

can you conditionalize the split_atomic_increment init on
gcov_type_size == 8 please?

The patch is missing adjusments to the -fprofile-update documentation.
I think the prefer-atomic vs. atomic needs more explanation with this
change.

otherwise the patch looks OK to me.

Thanks,
Richard.

> >
> > In another mail you mentioned that for gen_time_profiler this doesn't
> > work but its
> > code relies on flag_profile_update as well.  So do we need to split the flag
> > somehow, or continue using the PROFILE_UPDATE_SINGLE fallback when
> > we are doing more than just edge profiling?
>
> If atomic operations are not available in hardware, then with this patch
> calls to libatomic are emitted. In gen_time_profiler() this is not an
> issue at all, since the atomic increment is only done if counters[0] ==
> 0 (if I read the code correctly).
>
> For example for
>
> void f(void) {}
>
> we get on riscv:
>
> gcc --coverage -O2 -S -o - test.c -fprofile-update=atomic
>
>          lui     a4,%hi(__gcov0.f)
>          li      a3,1
>          addi    a4,a4,%lo(__gcov0.f)
>          amoadd.w a5,a3,0(a4)
>          lui     a4,%hi(__gcov0.f+4)
>          addi    a5,a5,1
>          seqz    a5,a5
>          addi    a4,a4,%lo(__gcov0.f+4)
>          amoadd.w zero,a5,0(a4)
>          ret
>
> gcc --coverage -O2 -S -o - test.c -fprofile-update=atomic -mbig-endian
>
>          lui     a4,%hi(__gcov0.f+4)
>          li      a3,1
>          addi    a4,a4,%lo(__gcov0.f+4)
>          amoadd.w a5,a3,0(a4)
>          lui     a4,%hi(__gcov0.f)
>          addi    a5,a5,1
>          seqz    a5,a5
>          addi    a4,a4,%lo(__gcov0.f)
>          amoadd.w zero,a5,0(a4)
>          ret
>
> gcc --coverage -O2 -S -o - test.c -fprofile-update=atomic -march=rv64g
> -mabi=lp64
>
>          lui     a5,%hi(__gcov0.f)
>          li      a4,1
>          addi    a5,a5,%lo(__gcov0.f)
>          amoadd.d zero,a4,0(a5)
>          ret
>
> gcc --coverage -O2 -S -o - test.c -fprofile-update=atomic -march=rv32id
>
>          lui     a0,%hi(__gcov0.f)
>          li      a3,0
>          li      a1,1
>          li      a2,0
>          addi    a0,a0,%lo(__gcov0.f)
>          tail    __atomic_fetch_add_8
>
> gcc --coverage -O2 -S -o - test.c -fprofile-update=prefer-atomic
> -march=rv32id
>
>          lui     a4,%hi(__gcov0.f)
>          lw      a5,%lo(__gcov0.f)(a4)
>          lw      a2,%lo(__gcov0.f+4)(a4)
>          addi    a3,a5,1
>          sltu    a5,a3,a5
>          add     a5,a5,a2
>          sw      a3,%lo(__gcov0.f)(a4)
>          sw      a5,%lo(__gcov0.f+4)(a4)
>          ret
>
> --
> embedded brains GmbH
> Herr Sebastian HUBER
> Dornierstr. 4
> 82178 Puchheim
> Germany
> email: sebastian.huber@embedded-brains.de
> phone: +49-89-18 94 741 - 16
> fax:   +49-89-18 94 741 - 08
>
> Registergericht: Amtsgericht München
> Registernummer: HRB 157899
> Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
> Unsere Datenschutzerklärung finden Sie hier:
> https://embedded-brains.de/datenschutzerklaerung/
  
Sebastian Huber Dec. 16, 2022, 10:39 a.m. UTC | #4
On 16.12.22 10:47, Richard Biener wrote:
>> No, if you select -fprofile-update=atomic, then the updates shall be
>> atomic from my point of view. If a fallback is acceptable, then you can
>> use -fprofile-update=prefer-atomic. Using the fallback in
>> -fprofile-update=atomic is a bug which prevents the use of gcov for
>> multi-threaded applications on the lower end targets which do not have
>> atomic operations in hardware.
> Ah, OK.  So the fallback there is libatomic calls as well then?  Note
> not all targets support libatomic, for those the failure mode is likely
> a link error (which might be fine, but we eventually might want to
> amend documentation to indicate this failure mode).

It seems these library calls caused issues in the past:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77466

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77378

One option could be to emit calls to a new libgcov function:

__gcov_inc_counter(counter) -> updated value

This function could use a __gthread_mutex_t mutex for updates. This ends 
up probably with quite a bad performance.
  
Richard Biener Dec. 16, 2022, 12:09 p.m. UTC | #5
On Fri, Dec 16, 2022 at 11:39 AM Sebastian Huber
<sebastian.huber@embedded-brains.de> wrote:
>
> On 16.12.22 10:47, Richard Biener wrote:
> >> No, if you select -fprofile-update=atomic, then the updates shall be
> >> atomic from my point of view. If a fallback is acceptable, then you can
> >> use -fprofile-update=prefer-atomic. Using the fallback in
> >> -fprofile-update=atomic is a bug which prevents the use of gcov for
> >> multi-threaded applications on the lower end targets which do not have
> >> atomic operations in hardware.
> > Ah, OK.  So the fallback there is libatomic calls as well then?  Note
> > not all targets support libatomic, for those the failure mode is likely
> > a link error (which might be fine, but we eventually might want to
> > amend documentation to indicate this failure mode).
>
> It seems these library calls caused issues in the past:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77466
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77378

Hmm, those are testsuite-isms in some way but of course
users would run into the same issue, needing explicit
-latomic (where available).  I suppose target specs could
automatically add that for -fprofile-update=atomic but this
would need to be specified at link time as well then.

> One option could be to emit calls to a new libgcov function:
>
> __gcov_inc_counter(counter) -> updated value
>
> This function could use a __gthread_mutex_t mutex for updates. This ends
> up probably with quite a bad performance.

But that's eventually what libatomic will do as well as fallback.

I don't have a good idea here.

Do you have to explicitely link -latomic on RISCV?

Richard.

> --
> embedded brains GmbH
> Herr Sebastian HUBER
> Dornierstr. 4
> 82178 Puchheim
> Germany
> email: sebastian.huber@embedded-brains.de
> phone: +49-89-18 94 741 - 16
> fax:   +49-89-18 94 741 - 08
>
> Registergericht: Amtsgericht München
> Registernummer: HRB 157899
> Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
> Unsere Datenschutzerklärung finden Sie hier:
> https://embedded-brains.de/datenschutzerklaerung/
  
Sebastian Huber Dec. 16, 2022, 1:01 p.m. UTC | #6
On 16.12.22 13:09, Richard Biener wrote:
> On Fri, Dec 16, 2022 at 11:39 AM Sebastian Huber
> <sebastian.huber@embedded-brains.de>  wrote:
>> On 16.12.22 10:47, Richard Biener wrote:
>>>> No, if you select -fprofile-update=atomic, then the updates shall be
>>>> atomic from my point of view. If a fallback is acceptable, then you can
>>>> use -fprofile-update=prefer-atomic. Using the fallback in
>>>> -fprofile-update=atomic is a bug which prevents the use of gcov for
>>>> multi-threaded applications on the lower end targets which do not have
>>>> atomic operations in hardware.
>>> Ah, OK.  So the fallback there is libatomic calls as well then?  Note
>>> not all targets support libatomic, for those the failure mode is likely
>>> a link error (which might be fine, but we eventually might want to
>>> amend documentation to indicate this failure mode).
>> It seems these library calls caused issues in the past:
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77466
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77378
> Hmm, those are testsuite-isms in some way but of course
> users would run into the same issue, needing explicit
> -latomic (where available).  I suppose target specs could
> automatically add that for -fprofile-update=atomic but this
> would need to be specified at link time as well then.

Why can't we provide libatomic for all targets? Is gthr.h not always 
available?

> 
>> One option could be to emit calls to a new libgcov function:
>>
>> __gcov_inc_counter(counter) -> updated value
>>
>> This function could use a __gthread_mutex_t mutex for updates. This ends
>> up probably with quite a bad performance.
> But that's eventually what libatomic will do as well as fallback.

For libatomic, hosts can implement a protect_start() and protect_end() 
function. On RTEMS, this is implemented like this:

#include <machine/_libatomic.h>

static inline UWORD
protect_start (void *ptr)
{
   return _Libatomic_Protect_start (ptr);
}

static inline void
protect_end (void *ptr, UWORD isr_level)
{
   _Libatomic_Protect_end (ptr, isr_level);
}

On single-core systems, this is mapped to interrupt disable/enable. This 
is quite efficient (compared to a mutex).

> 
> I don't have a good idea here.
> 
> Do you have to explicitely link -latomic on RISCV?

For RTEMS, libatomic is always added to the linker command line:

gcc/config/rtems.h:"%{qrtems:--start-group -lrtemsbsp -lrtemscpu 
-latomic -lc -lgcc --end-group}"

For riscv, there seems to be also a linux special case:

gcc/config/riscv/linux.h:  " %{pthread:" LD_AS_NEEDED_OPTION " -latomic 
" LD_NO_AS_NEEDED_OPTION "}"
gcc/config/riscv/linux.h:#define LIB_SPEC GNU_USER_TARGET_LIB_SPEC " 
-latomic "
  

Patch

diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
index 2beb49241f2..1d326dde59a 100644
--- a/gcc/tree-profile.cc
+++ b/gcc/tree-profile.cc
@@ -73,6 +73,17 @@  static GTY(()) tree ic_tuple_var;
 static GTY(()) tree ic_tuple_counters_field;
 static GTY(()) tree ic_tuple_callee_field;
 
+/* If the user selected atomic profile counter updates
+   (-fprofile-update=atomic), then the counter updates will be done atomically.
+   Ideally, this is done through atomic operations in hardware.  If the
+   hardware supports only 32-bit atomic increments and gcov_type_node is a
+   64-bit integer type, then for the profile edge counters the increment is
+   performed through two separate 32-bit atomic increments.  This case is
+   indicated by the split_atomic_increment variable begin true.  If the
+   hardware does not support atomic operations at all, then a library call to
+   libatomic is emitted.  */
+static bool split_atomic_increment;
+
 /* Do initialization work for the edge profiler.  */
 
 /* Add code:
@@ -242,30 +253,59 @@  gimple_init_gcov_profiler (void)
 void
 gimple_gen_edge_profiler (int edgeno, edge e)
 {
-  tree one;
-
-  one = build_int_cst (gcov_type_node, 1);
+  const char *name = "PROF_edge_counter";
+  tree ref = tree_coverage_counter_ref (GCOV_COUNTER_ARCS, edgeno);
+  tree one = build_int_cst (gcov_type_node, 1);
 
   if (flag_profile_update == PROFILE_UPDATE_ATOMIC)
     {
-      /* __atomic_fetch_add (&counter, 1, MEMMODEL_RELAXED); */
-      tree addr = tree_coverage_counter_addr (GCOV_COUNTER_ARCS, edgeno);
-      tree f = builtin_decl_explicit (TYPE_PRECISION (gcov_type_node) > 32
-				      ? BUILT_IN_ATOMIC_FETCH_ADD_8:
-				      BUILT_IN_ATOMIC_FETCH_ADD_4);
-      gcall *stmt = gimple_build_call (f, 3, addr, one,
-				       build_int_cst (integer_type_node,
-						      MEMMODEL_RELAXED));
-      gsi_insert_on_edge (e, stmt);
+      tree addr = build_fold_addr_expr (ref);
+      tree relaxed = build_int_cst (integer_type_node, MEMMODEL_RELAXED);
+      if (!split_atomic_increment)
+	{
+	  /* __atomic_fetch_add (&counter, 1, MEMMODEL_RELAXED); */
+	  tree f = builtin_decl_explicit (TYPE_PRECISION (gcov_type_node) > 32
+					  ? BUILT_IN_ATOMIC_FETCH_ADD_8:
+					  BUILT_IN_ATOMIC_FETCH_ADD_4);
+	  gcall *stmt = gimple_build_call (f, 3, addr, one, relaxed);
+	  gsi_insert_on_edge (e, stmt);
+	}
+      else
+	{
+	  /* low = __atomic_add_fetch_4 (addr, 1, MEMMODEL_RELAXED);
+	     high_inc = low == 0 ? 1 : 0;
+	     __atomic_add_fetch_4 (addr_high, high_inc, MEMMODEL_RELAXED); */
+	  tree zero32 = build_zero_cst (uint32_type_node);
+	  tree one32 = build_one_cst (uint32_type_node);
+	  tree addr_high = make_temp_ssa_name (TREE_TYPE (addr), NULL, name);
+	  gimple *stmt = gimple_build_assign (addr_high, POINTER_PLUS_EXPR,
+					      addr,
+					      build_int_cst (size_type_node,
+							     4));
+	  gsi_insert_on_edge (e, stmt);
+	  if (WORDS_BIG_ENDIAN)
+	    std::swap (addr, addr_high);
+	  tree f = builtin_decl_explicit (BUILT_IN_ATOMIC_ADD_FETCH_4);
+	  stmt = gimple_build_call (f, 3, addr, one, relaxed);
+	  tree low = make_temp_ssa_name (uint32_type_node, NULL, name);
+	  gimple_call_set_lhs (stmt, low);
+	  gsi_insert_on_edge (e, stmt);
+	  tree is_zero = make_temp_ssa_name (boolean_type_node, NULL, name);
+	  stmt = gimple_build_assign (is_zero, EQ_EXPR, low, zero32);
+	  gsi_insert_on_edge (e, stmt);
+	  tree high_inc = make_temp_ssa_name (uint32_type_node, NULL, name);
+	  stmt = gimple_build_assign (high_inc, COND_EXPR, is_zero, one32,
+				      zero32);
+	  gsi_insert_on_edge (e, stmt);
+	  stmt = gimple_build_call (f, 3, addr_high, high_inc, relaxed);
+	  gsi_insert_on_edge (e, stmt);
+	}
     }
   else
     {
-      tree ref = tree_coverage_counter_ref (GCOV_COUNTER_ARCS, edgeno);
-      tree gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node,
-						   NULL, "PROF_edge_counter");
+      tree gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node, NULL, name);
       gassign *stmt1 = gimple_build_assign (gcov_type_tmp_var, ref);
-      gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node,
-					      NULL, "PROF_edge_counter");
+      gcov_type_tmp_var = make_temp_ssa_name (gcov_type_node, NULL, name);
       gassign *stmt2 = gimple_build_assign (gcov_type_tmp_var, PLUS_EXPR,
 					    gimple_assign_lhs (stmt1), one);
       gassign *stmt3 = gimple_build_assign (unshare_expr (ref),
@@ -710,11 +750,8 @@  tree_profiling (void)
 
   if (flag_profile_update == PROFILE_UPDATE_ATOMIC
       && !can_support_atomic)
-    {
-      warning (0, "target does not support atomic profile update, "
-	       "single mode is selected");
-      flag_profile_update = PROFILE_UPDATE_SINGLE;
-    }
+    split_atomic_increment = HAVE_sync_compare_and_swapsi
+      || HAVE_atomic_compare_and_swapsi;
   else if (flag_profile_update == PROFILE_UPDATE_PREFER_ATOMIC)
     flag_profile_update = can_support_atomic
       ? PROFILE_UPDATE_ATOMIC : PROFILE_UPDATE_SINGLE;