gcov: Split atomic bitwise-or for some targets
Checks

Context                                                                          | Check   | Description
rivoscibot/toolchain-ci-rivos-lint                                               | warning | Lint failed
rivoscibot/toolchain-ci-rivos-apply-patch                                        | success | Patch applied
rivoscibot/toolchain-ci-rivos-build--newlib-rv64gcv-lp64d-multilib               | success | Build passed
rivoscibot/toolchain-ci-rivos-build--linux-rv64gcv-lp64d-multilib                | success | Build passed
rivoscibot/toolchain-ci-rivos-build--linux-rv64gc_zba_zbb_zbc_zbs-lp64d-multilib | success | Build passed
rivoscibot/toolchain-ci-rivos-test                                               | success | Testing passed
Commit Message
There are targets that only offer 32-bit atomic operations (for
example 32-bit RISC-V). For these targets, split the 64-bit atomic
bitwise-or operation into two 32-bit operations.
For this test case
int a(int i);
int b(int i);
int f(int i)
{
if (i) {
return a(i);
} else {
return b(i);
}
}
with options
-O2 -fprofile-update=atomic -fcondition-coverage
the code generated for 64-bit vs. 32-bit RISC-V looks like:
addi a5,a5,%lo(.LANCHOR0)
beq a0,zero,.L2
li a4,1
- amoor.d zero,a4,0(a5)
- addi a5,a5,8
- amoor.d zero,zero,0(a5)
+ amoor.w zero,a4,0(a5)
+ addi a4,a5,4
+ amoor.w zero,zero,0(a4)
+ addi a4,a5,8
+ amoor.w zero,zero,0(a4)
+ addi a5,a5,12
+ amoor.w zero,zero,0(a5)
tail a
.L2:
- amoor.d zero,zero,0(a5)
+ amoor.w zero,zero,0(a5)
+ addi a4,a5,4
+ amoor.w zero,zero,0(a4)
li a4,1
- addi a5,a5,8
- amoor.d zero,a4,0(a5)
+ addi a3,a5,8
+ amoor.w zero,a4,0(a3)
+ addi a5,a5,12
+ amoor.w zero,zero,0(a5)
tail b
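At the C level, the transformation shown in the assembly corresponds roughly to the following sketch. The helper name is made up for illustration; the patch itself emits GIMPLE calls to __atomic_fetch_or_4.

```c
#include <stdint.h>

/* Sketch of the counter split: bitwise-or VALUE into the 64-bit counter
   at COUNTER using two relaxed 32-bit atomic operations.  Assumes a
   little-endian word layout (low word at offset 0); the patch swaps the
   two addresses when WORDS_BIG_ENDIAN is set.  */
void
split_ior_64 (uint64_t *counter, uint64_t value)
{
  uint32_t *words = (uint32_t *) counter;
  uint32_t low = (uint32_t) value;           /* low 32 bits of VALUE */
  uint32_t high = (uint32_t) (value >> 32);  /* high 32 bits of VALUE */
  __atomic_fetch_or (&words[0], low, __ATOMIC_RELAXED);
  __atomic_fetch_or (&words[1], high, __ATOMIC_RELAXED);
}
```

The pair of updates is not atomic as a whole, but for condition-coverage bitmaps that is acceptable: bits are only ever set, never cleared, so no concurrent update can be lost.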
Unrelated to this patch: even with -O2, the compiler generates
no-operations like
amoor.d zero,zero,0(a5)
and
amoor.w zero,zero,0(a5)
Would it be possible to filter these out in instrument_decisions()?
gcc/ChangeLog:
* tree-profile.cc (split_update_decision_counter): New.
(instrument_decisions): Use counter_update to determine which
atomic operations are available. Use
split_update_decision_counter() if 64-bit atomic operations can
be split up into two 32-bit atomic operations.
---
gcc/tree-profile.cc | 73 +++++++++++++++++++++++++++++++++++++++++----
1 file changed, 67 insertions(+), 6 deletions(-)
Comments
On 12/7/2025 5:11 AM, Sebastian Huber wrote:
> There are targets, which only offer 32-bit atomic operations (for
> example 32-bit RISC-V). For these targets, split the 64-bit atomic
> bitwise-or operation into two parts.
>
> For this test case
>
> int a(int i);
> int b(int i);
>
> int f(int i)
> {
> if (i) {
> return a(i);
> } else {
> return b(i);
> }
> }
>
> with options
>
> -O2 -fprofile-update=atomic -fcondition-coverage
>
> the code generation to 64-bit vs. 32-bit RISC-V looks like:
>
> addi a5,a5,%lo(.LANCHOR0)
> beq a0,zero,.L2
> li a4,1
> - amoor.d zero,a4,0(a5)
> - addi a5,a5,8
> - amoor.d zero,zero,0(a5)
> + amoor.w zero,a4,0(a5)
> + addi a4,a5,4
> + amoor.w zero,zero,0(a4)
> + addi a4,a5,8
> + amoor.w zero,zero,0(a4)
> + addi a5,a5,12
> + amoor.w zero,zero,0(a5)
> tail a
> .L2:
> - amoor.d zero,zero,0(a5)
> + amoor.w zero,zero,0(a5)
> + addi a4,a5,4
> + amoor.w zero,zero,0(a4)
> li a4,1
> - addi a5,a5,8
> - amoor.d zero,a4,0(a5)
> + addi a3,a5,8
> + amoor.w zero,a4,0(a3)
> + addi a5,a5,12
> + amoor.w zero,zero,0(a5)
> tail b
>
> Not related to this patch, even with -O2 the compiler generates
> no-operations like
>
> amoor.d zero,zero,0(a5)
>
> and
>
> amoor.w zero,zero,0(a5)
>
> Would this be possible to filter out in instrument_decisions()?
I'd bet this might be reasonably optimized in either gimple or RTL
without major work. Though someone would have to read up on semantics
-- are we allowed to drop atomics like that?
>
> gcc/ChangeLog:
>
> * tree-profile.cc (split_update_decision_counter): New.
> (instrument_decisions): Use counter_update to determine which
> atomic operations are available. Use
> split_update_decision_counter() if 64-bit atomic operations can
> be split up into two 32-bit atomic operations.
I was originally thinking that splitting these down in the optimizers or
target files would make more sense. But this looks like a fairly
practical solution pending some real optimization work around atomics
(which I think we do need; I've seen unused relaxed loads showing up in
profiles from jemalloc). But until then....
> +
> + /* Get the high 32-bit of the counter */
> + tree shift_32 = build_int_cst (integer_type_node, 32);
> + tree counter_high_64 = make_temp_ssa_name (gcov_type_node, NULL,
> + "PROF_decision");
> + gassign *assign3 = gimple_build_assign (counter_high_64, LSHIFT_EXPR,
> + counter, shift_32);
Doesn't the type of shift_32 need to match the type of the object being
shifted? Or do we have loose requirements around type checking operands
for this case (where the shift count is often in a smaller precision
than the object being shifted).
Do we need to worry about logical vs arithmetic shifts here? COUNTER's
type is going to drive that decision, so we just need to make sure it's
sensible.
>
>
> @@ -1157,6 +1213,11 @@ instrument_decisions (array_slice<basic_block> expr, size_t condno,
> next[k], relaxed);
> gsi_insert_on_edge (e, flush);
> }
> + else if (use_atomic_split)
> + {
> + split_update_decision_counter (e, ref, next[k],
> + atomic_ior_32, relaxed);
> + }
Consider dropping the extraneous curlys. That function seems to be
formatted without regard to our formatting conventions, so I'm not going
to ask that you adjust indentation on this little hunk since it mirrors
nearby code.
Jeff
----- On Dec 27, 2025, at 0:43, Jeff Law jeffreyalaw@gmail.com wrote:
> On 12/7/2025 5:11 AM, Sebastian Huber wrote:
[...]
>> +
>> + /* Get the high 32-bit of the counter */
>> + tree shift_32 = build_int_cst (integer_type_node, 32);
>> + tree counter_high_64 = make_temp_ssa_name (gcov_type_node, NULL,
>> + "PROF_decision");
>> + gassign *assign3 = gimple_build_assign (counter_high_64, LSHIFT_EXPR,
>> + counter, shift_32);
>
> Doesn't the type of shift_32 need to match the type of the object being
> shifted? Or do we have loose requirements around type checking operands
> for this case (where the shift count is often in a smaller precision
> than the object being shifted).
This is my attempt to write something like this:
int shift_32 = 32;
gcov_type_node counter_high_64 = counter >> shift_32;
>
> Do we need to worry about logical vs arithmetic shifts here? COUNTER's
> type is going to drive that decision, so we just need to make sure it's
> sensible.
We have
tree
get_gcov_type (void)
{
scalar_int_mode mode
= smallest_int_mode_for_size
(LONG_LONG_TYPE_SIZE > 32 ? 64 : 32).require ();
return lang_hooks.types.type_for_mode (mode, false);
}
So, the gcov_type_node is probably a signed type.
With
gassign *assign4 = gimple_build_assign (counter_high_32, NOP_EXPR,
counter_high_64);
does it matter if it is a logical or arithmetic shift? I am sorry, but I don't really know what I am doing here. I tinkered this together by looking at examples in the code.
>
>
>>
>>
>> @@ -1157,6 +1213,11 @@ instrument_decisions (array_slice<basic_block> expr,
>> size_t condno,
>> next[k], relaxed);
>> gsi_insert_on_edge (e, flush);
>> }
>> + else if (use_atomic_split)
>> + {
>> + split_update_decision_counter (e, ref, next[k],
>> + atomic_ior_32, relaxed);
>> + }
>
> Consider dropping the extraneous curlys. That function seems to be
> formatted without regard to our formatting conventions, so I'm not going
> to ask that you adjust indention on this little hunk since it mirrors
> nearby code.
Ok, I adjusted the patch.
----- On Dec 28, 2025, at 14:26, Sebastian Huber sebastian.huber@embedded-brains.de wrote:
> ----- On Dec 27, 2025, at 0:43, Jeff Law jeffreyalaw@gmail.com wrote:
>
>> On 12/7/2025 5:11 AM, Sebastian Huber wrote:
> [...]
>>> +
>>> + /* Get the high 32-bit of the counter */
>>> + tree shift_32 = build_int_cst (integer_type_node, 32);
>>> + tree counter_high_64 = make_temp_ssa_name (gcov_type_node, NULL,
>>> + "PROF_decision");
>>> + gassign *assign3 = gimple_build_assign (counter_high_64, LSHIFT_EXPR,
>>> + counter, shift_32);
>>
>> Doesn't the type of shift_32 need to match the type of the object being
>> shifted? Or do we have loose requirements around type checking operands
>> for this case (where the shift count is often in a smaller precision
>> than the object being shifted).
>
> This is my attempt to write something like this:
>
> int shift_32 = 32;
> gcov_type_node counter_high_64 = counter >> shift_32;
Oh, it looks like I confused left and right. This should be an RSHIFT_EXPR:
gassign *assign3 = gimple_build_assign (counter_high_64, RSHIFT_EXPR,
counter, shift_32);
>
>>
>> Do we need to worry about logical vs arithmetic shifts here? COUNTER's
>> type is going to drive that decision, so we just need to make sure it's
>> sensible.
>
> We have
>
> tree
> get_gcov_type (void)
> {
> scalar_int_mode mode
> = smallest_int_mode_for_size
> (LONG_LONG_TYPE_SIZE > 32 ? 64 : 32).require ();
> return lang_hooks.types.type_for_mode (mode, false);
> }
>
> So, the gcov_type_node is probably a signed type.
>
> With
>
> gassign *assign4 = gimple_build_assign (counter_high_32, NOP_EXPR,
> counter_high_64);
>
> does it matter if it is a logical or arithmetic shift? I am sorry, but I don't
> really know what I am doing here. I tinkered this together by looking at
> examples in the code.
>
>>
>>
>>>
>>>
>>> @@ -1157,6 +1213,11 @@ instrument_decisions (array_slice<basic_block> expr,
>>> size_t condno,
>>> next[k], relaxed);
>>> gsi_insert_on_edge (e, flush);
>>> }
>>> + else if (use_atomic_split)
>>> + {
>>> + split_update_decision_counter (e, ref, next[k],
>>> + atomic_ior_32, relaxed);
>>> + }
>>
>> Consider dropping the extraneous curlys. That function seems to be
>> formatted without regard to our formatting conventions, so I'm not going
>> to ask that you adjust indention on this little hunk since it mirrors
>> nearby code.
>
> Ok, I adjusted the patch.
----- On Dec 28, 2025, at 14:34, Sebastian Huber sebastian.huber@embedded-brains.de wrote:
> ----- On Dec 28, 2025, at 14:26, Sebastian Huber
> sebastian.huber@embedded-brains.de wrote:
>
>> ----- On Dec 27, 2025, at 0:43, Jeff Law jeffreyalaw@gmail.com wrote:
>>
>>> On 12/7/2025 5:11 AM, Sebastian Huber wrote:
>> [...]
>>>> +
>>>> + /* Get the high 32-bit of the counter */
>>>> + tree shift_32 = build_int_cst (integer_type_node, 32);
>>>> + tree counter_high_64 = make_temp_ssa_name (gcov_type_node, NULL,
>>>> + "PROF_decision");
>>>> + gassign *assign3 = gimple_build_assign (counter_high_64, LSHIFT_EXPR,
>>>> + counter, shift_32);
>>>
>>> Doesn't the type of shift_32 need to match the type of the object being
>>> shifted? Or do we have loose requirements around type checking operands
>>> for this case (where the shift count is often in a smaller precision
>>> than the object being shifted).
>>
>> This is my attempt to write something like this:
>>
>> int shift_32 = 32;
>> gcov_type_node counter_high_64 = counter >> shift_32;
>
> Oh, it looks like I confused left and right. This should be an RSHIFT_EXPR:
>
> gassign *assign3 = gimple_build_assign (counter_high_64, RSHIFT_EXPR,
> counter, shift_32);
I used this test case to double check that the shifting is now correct:
int a(void);
int b(void);
int c(int);
int f(int *i)
{
if (c(i[0]) || c(i[1]) || c(i[2]) || c(i[3]) || c(i[4]) ||
c(i[5]) || c(i[6]) || c(i[7]) || c(i[8]) || c(i[9]) ||
c(i[10]) || c(i[11]) || c(i[12]) || c(i[13]) || c(i[14]) ||
c(i[15]) || c(i[16]) || c(i[17]) || c(i[18]) || c(i[19]) ||
c(i[20]) || c(i[21]) || c(i[22]) || c(i[23]) || c(i[24]) ||
c(i[25]) || c(i[26]) || c(i[27]) || c(i[28]) || c(i[29]) ||
c(i[30]) || c(i[31]) || c(i[32]) || c(i[33]) || c(i[34]) ||
c(i[35]) || c(i[36]) || c(i[37]) || c(i[38]) || c(i[39])) {
return a();
} else {
return b();
}
}
Interestingly, GCC now reuses the "amoor.w zero,zero" operations (see "j .L46").
.type f, @function
f:
addi sp,sp,-16
sw s0,8(sp)
mv s0,a0
lw a0,0(a0)
sw ra,12(sp)
call c
bne a0,zero,.L49
lw a0,4(s0)
call c
beq a0,zero,.L4
lui a5,%hi(.LANCHOR0)
addi a5,a5,%lo(.LANCHOR0)
li a4,2
.L44:
amoor.w zero,a4,0(a5)
addi a4,a5,4
.L46:
amoor.w zero,zero,0(a4)
addi a4,a5,8
amoor.w zero,zero,0(a4)
addi a5,a5,12
amoor.w zero,zero,0(a5)
.L3:
lw s0,8(sp)
lw ra,12(sp)
addi sp,sp,16
tail a
.L4:
lw a0,8(s0)
call c
beq a0,zero,.L5
lui a5,%hi(.LANCHOR0)
addi a5,a5,%lo(.LANCHOR0)
li a4,4
amoor.w zero,a4,0(a5)
add a4,a5,a4
j .L46
GCC reloads the .LANCHOR0 address about 40 times. It should probably do this only once and keep it in a non-volatile register.
Once the counter exceeds 32 bits, we get this code:
.L34:
lw a0,128(s0)
call c
beq a0,zero,.L35
lui a5,%hi(.LANCHOR0)
addi a5,a5,%lo(.LANCHOR0)
amoor.w zero,zero,0(a5)
li a4,1
.L45:
addi a3,a5,4
.L47:
amoor.w zero,a4,0(a3)
addi a4,a5,8
amoor.w zero,zero,0(a4)
addi a5,a5,12
amoor.w zero,zero,0(a5)
j .L3
This is the corresponding 64-bit code:
.L34:
lw a0,128(s0)
call c
beq a0,zero,.L35
lui a5,%hi(.LANCHOR0)
li a4,1
addi a5,a5,%lo(.LANCHOR0)
slli a4,a4,32
amoor.d zero,a4,0(a5)
addi a5,a5,8
amoor.d zero,zero,0(a5)
j .L3
On 12/28/2025 6:26 AM, Sebastian Huber wrote:
> ----- On Dec 27, 2025, at 0:43, Jeff Law jeffreyalaw@gmail.com wrote:
>
>> On 12/7/2025 5:11 AM, Sebastian Huber wrote:
> [...]
>>> +
>>> + /* Get the high 32-bit of the counter */
>>> + tree shift_32 = build_int_cst (integer_type_node, 32);
>>> + tree counter_high_64 = make_temp_ssa_name (gcov_type_node, NULL,
>>> + "PROF_decision");
>>> + gassign *assign3 = gimple_build_assign (counter_high_64, LSHIFT_EXPR,
>>> + counter, shift_32);
>> Doesn't the type of shift_32 need to match the type of the object being
>> shifted? Or do we have loose requirements around type checking operands
>> for this case (where the shift count is often in a smaller precision
>> than the object being shifted).
> This is my attempt to write something like this:
>
> int shift_32 = 32;
> gcov_type_node counter_high_64 = counter >> shift_32;
So I went into the tree checking code and we do indeed have looser
checks for the shift/rotate cases; essentially we allow the shift/rotate
count to be any integral type or vector of integrals. So we're OK with
a constant node like you're using.
> tree
> get_gcov_type (void)
> {
> scalar_int_mode mode
> = smallest_int_mode_for_size
> (LONG_LONG_TYPE_SIZE > 32 ? 64 : 32).require ();
> return lang_hooks.types.type_for_mode (mode, false);
> }
>
> So, the gcov_type_node is probably a signed type.
That was my conclusion as well.
>
> With
>
> gassign *assign4 = gimple_build_assign (counter_high_32, NOP_EXPR,
> counter_high_64);
>
> does it matter if it is a logical or arithmetic shift? I am sorry, but I don't really know what I am doing here. I tinkered this together by looking at examples in the code.
No worries at all. I'm not familiar with the gcov code, so we're both
just kind of slogging through it.
So the assignment above will just convert the types and as I think
through it, the type of the shift isn't going to matter because you
shift the upper 32 bits into the low 32 bit positions. The upper 32
bits will be copies of the original sign bit. But then we use the
nop-conversion to drop those upper 32 bits anyway. So it shouldn't
really matter if they were zeros or copies of the sign bit because we
never use them.
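That reasoning can be checked in plain C (hypothetical helper names, not the GIMPLE the patch builds): whether the 64-bit shift sign-extends or zero-extends, the truncating conversion discards exactly the bits where the two results could differ.

```c
#include <stdint.h>

/* The patch's sequence: a right shift of the signed 64-bit counter
   (arithmetic on typical targets; shifting a negative value is
   implementation-defined in C), then a truncating conversion like the
   patch's NOP_EXPR.  */
uint32_t
high_word_arithmetic (int64_t counter)
{
  int64_t shifted = counter >> 32;  /* upper bits become sign copies */
  return (uint32_t) shifted;        /* truncation drops those copies */
}

/* The same extraction via a logical (zero-extending) shift.  */
uint32_t
high_word_logical (int64_t counter)
{
  uint64_t shifted = (uint64_t) counter >> 32;  /* upper bits become zeros */
  return (uint32_t) shifted;
}
```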
>> Consider dropping the extraneous curlys. That function seems to be
>> formatted without regard to our formatting conventions, so I'm not going
>> to ask that you adjust indention on this little hunk since it mirrors
>> nearby code.
> Ok, I adjusted the patch.
Thanks. I'll take another looksie, but we're probably good to go after
working through this stuff a bit on this thread.
jeff
On 12/28/2025 7:03 PM, Sebastian Huber wrote:
> I used this test case to double check that the shifting is now correct:
>
> int a(void);
> int b(void);
> int c(int);
> int f(int *i)
> {
> if (c(i[0]) || c(i[1]) || c(i[2]) || c(i[3]) || c(i[4]) ||
> c(i[5]) || c(i[6]) || c(i[7]) || c(i[8]) || c(i[9]) ||
> c(i[10]) || c(i[11]) || c(i[12]) || c(i[13]) || c(i[14]) ||
> c(i[15]) || c(i[16]) || c(i[17]) || c(i[18]) || c(i[19]) ||
> c(i[20]) || c(i[21]) || c(i[22]) || c(i[23]) || c(i[24]) ||
> c(i[25]) || c(i[26]) || c(i[27]) || c(i[28]) || c(i[29]) ||
> c(i[30]) || c(i[31]) || c(i[32]) || c(i[33]) || c(i[34]) ||
> c(i[35]) || c(i[36]) || c(i[37]) || c(i[38]) || c(i[39])) {
> return a();
> } else {
> return b();
> }
> }
>
> Interestingly, GCC now reuses the "amoor.w zero,zero" operations (see "j .L46").
Right. That's not a huge surprise to me. If we look at the gimple we see:
> ;; basic block 3, loop depth 0
> ;; pred: 2
> __atomic_fetch_or_4 (&__gcov8.f[0], 1, 0);
> __atomic_fetch_or_4 (&MEM <long long int> [(void *)&__gcov8.f + 4B],
> 0, 0);
> __atomic_fetch_or_4 (&__gcov8.f[1], 0, 0);
> __atomic_fetch_or_4 (&MEM <long long int> [(void *)&__gcov8.f +
> 12B], 0, 0);
> _8 = a (i_3(D)); [tail call]
> goto <bb 5>; [100.00%]
> ;; succ: 5
>
> ;; basic block 4, loop depth 0
> ;; pred: 2
> __atomic_fetch_or_4 (&__gcov8.f[0], 0, 0);
> __atomic_fetch_or_4 (&MEM <long long int> [(void *)&__gcov8.f + 4B],
> 0, 0);
> __atomic_fetch_or_4 (&__gcov8.f[1], 1, 0);
> __atomic_fetch_or_4 (&MEM <long long int> [(void *)&__gcov8.f +
> 12B], 0, 0);
> _6 = b (0); [tail call]
So to improve the code you need to recognize the atomic_fetch_or_4 where
the object is IOR'd with the constant 0 as a nop and remove those
statements (or not emit them to begin with). In general our optimizers
don't do a whole lot with atomics right now.
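For reference, the redundant updates in the dump look like this at the C level: a relaxed fetch-or with a zero mask changes nothing, and since its result is discarded it only has the effect of a relaxed load. Whether GCC may legally delete such a call is the open question raised earlier in the thread.

```c
#include <stdint.h>

/* A no-op counter update as emitted by the instrumentation: OR-ing in
   a zero mask with relaxed ordering modifies the object's value in no
   way and provides no synchronization.  */
void
noop_update (uint32_t *counter)
{
  __atomic_fetch_or (counter, 0u, __ATOMIC_RELAXED);
}
```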
I think your change is missing a check somewhere. When I compile your
test I initially get "target does not support atomic profile update,
single mode is selected", but then it still does the atomic path.
Before your patch, it just used the non-atomic updates. So it appears
something isn't quite right yet.
jeff
----- On Dec 29, 2025, at 20:08, Jeffrey Law jeffrey.law@oss.qualcomm.com wrote:
[...]
> I think your change is missing a check somewhere. When I compile your
> test I initially get "target does not support atomic profile update,
> single mode is selected", but then it still does the atomic path.
> Before your patch is just used the non-atomic updates. So it appears
> something isn't quite right yet.
Yes, it seems the counter update mode selection was wrong for PROFILE_UPDATE_ATOMIC. There should be a dedicated warning if COUNTER_UPDATE_ATOMIC_PARTIAL is selected. Please have a look at this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2025-December/704607.html
If the target doesn't support libatomic, then some atomic operations cannot be carried out. If 32-bit atomic operations are available, we can at least do the atomic increments and the bit-field updates. This case is indicated by COUNTER_UPDATE_ATOMIC_PARTIAL.
@@ -1006,6 +1006,57 @@ resolve_counters (vec<counters>& cands)
}
+/* At edge E, update the decision counter referenced by REF with the
+ COUNTER. Generate two separate 32-bit atomic bitwise-or operations
+ specified by ATOMIC_IOR_32 in the RELAXED memory order. */
+static void
+split_update_decision_counter (edge e, tree ref, tree counter, tree
+ atomic_ior_32, tree relaxed)
+{
+ gimple_stmt_iterator gsi = gsi_last (PENDING_STMT (e));
+ ref = unshare_expr (ref);
+
+ /* Get the low and high address of the referenced counter */
+ tree addr_low = build_addr (ref);
+ tree addr_high = make_temp_ssa_name (TREE_TYPE (addr_low), NULL,
+ "PROF_decision");
+ tree four = build_int_cst (size_type_node, 4);
+ gassign *assign1 = gimple_build_assign (addr_high, POINTER_PLUS_EXPR,
+ addr_low, four);
+ gsi_insert_after (&gsi, assign1, GSI_NEW_STMT);
+ if (WORDS_BIG_ENDIAN)
+ std::swap (addr_low, addr_high);
+
+ /* Get the low 32-bit of the counter */
+ tree counter_low_32 = make_temp_ssa_name (uint32_type_node, NULL,
+ "PROF_decision");
+ gassign *assign2 = gimple_build_assign (counter_low_32, NOP_EXPR, counter);
+ gsi_insert_after (&gsi, assign2, GSI_NEW_STMT);
+
+ /* Get the high 32-bit of the counter */
+ tree shift_32 = build_int_cst (integer_type_node, 32);
+ tree counter_high_64 = make_temp_ssa_name (gcov_type_node, NULL,
+ "PROF_decision");
+ gassign *assign3 = gimple_build_assign (counter_high_64, LSHIFT_EXPR,
+ counter, shift_32);
+ gsi_insert_after (&gsi, assign3, GSI_NEW_STMT);
+ tree counter_high_32 = make_temp_ssa_name (uint32_type_node, NULL,
+ "PROF_decision");
+ gassign *assign4 = gimple_build_assign (counter_high_32, NOP_EXPR,
+ counter_high_64);
+ gsi_insert_after (&gsi, assign4, GSI_NEW_STMT);
+
+ /* Atomically bitwise-or the low 32-bit counter parts */
+ gcall *call1 = gimple_build_call (atomic_ior_32, 3, addr_low,
+ counter_low_32, relaxed);
+ gsi_insert_after (&gsi, call1, GSI_NEW_STMT);
+
+ /* Atomically bitwise-or the high 32-bit counter parts */
+ gcall *call2 = gimple_build_call (atomic_ior_32, 3, addr_high,
+ counter_high_32, relaxed);
+ gsi_insert_after (&gsi, call2, GSI_NEW_STMT);
+}
+
/* Add instrumentation to a decision subgraph. EXPR should be the
(topologically sorted) block of nodes returned by cov_blocks, MAPS the
bitmaps returned by cov_maps, and MASKS the block of bitsets returned by
@@ -1108,11 +1159,16 @@ instrument_decisions (array_slice<basic_block> expr, size_t condno,
gcc_assert (xi == bitmap_count_bits (core));
const tree relaxed = build_int_cst (integer_type_node, MEMMODEL_RELAXED);
- const bool atomic = flag_profile_update == PROFILE_UPDATE_ATOMIC;
- const tree atomic_ior = builtin_decl_explicit
- (TYPE_PRECISION (gcov_type_node) > 32
- ? BUILT_IN_ATOMIC_FETCH_OR_8
- : BUILT_IN_ATOMIC_FETCH_OR_4);
+ const bool use_atomic_builtin =
+ counter_update == COUNTER_UPDATE_ATOMIC_BUILTIN;
+ const bool use_atomic_split =
+ counter_update == COUNTER_UPDATE_ATOMIC_SPLIT ||
+ counter_update == COUNTER_UPDATE_ATOMIC_PARTIAL;
+ const tree atomic_ior_32 =
+ builtin_decl_explicit (BUILT_IN_ATOMIC_FETCH_OR_4);
+ const tree atomic_ior = TYPE_PRECISION (gcov_type_node) > 32 ?
+ builtin_decl_explicit (BUILT_IN_ATOMIC_FETCH_OR_8) :
+ atomic_ior_32;
/* Flush to the gcov accumulators. */
for (const basic_block b : expr)
@@ -1149,7 +1205,7 @@ instrument_decisions (array_slice<basic_block> expr, size_t condno,
{
tree ref = tree_coverage_counter_ref (GCOV_COUNTER_CONDS,
2*condno + k);
- if (atomic)
+ if (use_atomic_builtin)
{
ref = unshare_expr (ref);
gcall *flush = gimple_build_call (atomic_ior, 3,
@@ -1157,6 +1213,11 @@ instrument_decisions (array_slice<basic_block> expr, size_t condno,
next[k], relaxed);
gsi_insert_on_edge (e, flush);
}
+ else if (use_atomic_split)
+ {
+ split_update_decision_counter (e, ref, next[k],
+ atomic_ior_32, relaxed);
+ }
else
{
tree get = emit_assign (e, ref);