LoongArch: Change jumptable's register constraint to 'q' [PR110136]

Message ID 20230607023147.1602812-1-chenglulu@loongson.cn
State New
Headers
Series LoongArch: Change jumptable's register constraint to 'q' [PR110136] |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gcc_build--master-arm success Testing passed
linaro-tcwg-bot/tcwg_gcc_build--master-aarch64 success Testing passed

Commit Message

Lulu Cheng June 7, 2023, 2:31 a.m. UTC
  If the $ra register is modified during the jump to the jump table, the hardware
branch prediction function will be broken, resulting in a significant increase
in the branch false prediction rate and affecting performance.

gcc/ChangeLog:

	* config/loongarch/loongarch.md: Change register constraint to 'q'.
---
 gcc/config/loongarch/loongarch.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
  

Comments

WANG Xuerui June 7, 2023, 3:26 a.m. UTC | #1
Hi,

On 2023/6/7 10:31, Lulu Cheng wrote:
> If the $ra register is modified during the jump to the jump table, the hardware
> branch prediction function will be broken, resulting in a significant increase
> in the branch false prediction rate and affecting performance.

Thanks for the insight! This is the kind of improvement that will 
probably become a lot harder to even *sight* without uarch details 
available.

However, I think it's better to also include a minimized test case to 
ensure the compiled code doesn't regress. (Comparison of relevant 
statistics, e.g. output of perf stat, would be even nicer to have!)

> 
> gcc/ChangeLog:
> 
> 	* config/loongarch/loongarch.md: Change register constraint to 'q'.
> ---
>   gcc/config/loongarch/loongarch.md | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
> index 816a943d155..f9b64173104 100644
> --- a/gcc/config/loongarch/loongarch.md
> +++ b/gcc/config/loongarch/loongarch.md
> @@ -2926,9 +2926,11 @@ (define_expand "tablejump"
>     DONE;
>   })
>   
> +;; Jump to the jump table Avoid using the $r1 register to prevent
> +;; affecting hardware branch prediction.
>   (define_insn "@tablejump<mode>"
>     [(set (pc)
> -	(match_operand:P 0 "register_operand" "r"))
> +	(match_operand:P 0 "register_operand" "q"))
>      (use (label_ref (match_operand 1 "" "")))]
>     ""
>     "jr\t%0"
  
Lulu Cheng June 7, 2023, 3:36 a.m. UTC | #2
在 2023/6/7 上午11:26, WANG Xuerui 写道:
> Hi,
>
> On 2023/6/7 10:31, Lulu Cheng wrote:
>> If the $ra register is modified during the jump to the jump table, 
>> the hardware
>> branch prediction function will be broken, resulting in a significant 
>> increase
>> in the branch false prediction rate and affecting performance.
>
> Thanks for the insight! This is the kind of improvement that will 
> probably become a lot harder to even *sight* without uarch details 
> available.
>
> However, I think it's better to also include a minimized test case to 
> ensure the compiled code doesn't regress. (Comparison of relevant 
> statistics, e.g. output of perf stat, would be even nicer to have!)
>
There was no way I could find a small test case that would replicate 
this problem. This occurs when compiling spec2006 400.perlbench. And it 
only appears when '-flto' is added.:-(

But I paid for reproducible programs and compilation methods under the 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110136.

>>
>> gcc/ChangeLog:
>>
>>     * config/loongarch/loongarch.md: Change register constraint to 'q'.
>> ---
>>   gcc/config/loongarch/loongarch.md | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/config/loongarch/loongarch.md 
>> b/gcc/config/loongarch/loongarch.md
>> index 816a943d155..f9b64173104 100644
>> --- a/gcc/config/loongarch/loongarch.md
>> +++ b/gcc/config/loongarch/loongarch.md
>> @@ -2926,9 +2926,11 @@ (define_expand "tablejump"
>>     DONE;
>>   })
>>   +;; Jump to the jump table Avoid using the $r1 register to prevent
>> +;; affecting hardware branch prediction.
>>   (define_insn "@tablejump<mode>"
>>     [(set (pc)
>> -    (match_operand:P 0 "register_operand" "r"))
>> +    (match_operand:P 0 "register_operand" "q"))
>>      (use (label_ref (match_operand 1 "" "")))]
>>     ""
>>     "jr\t%0"
  
WANG Xuerui June 7, 2023, 7:37 a.m. UTC | #3
On 2023/6/7 11:36, Lulu Cheng wrote:
> 
> 在 2023/6/7 上午11:26, WANG Xuerui 写道:
>> Hi,
>>
>> On 2023/6/7 10:31, Lulu Cheng wrote:
>>> If the $ra register is modified during the jump to the jump table, 
>>> the hardware
>>> branch prediction function will be broken, resulting in a significant 
>>> increase
>>> in the branch false prediction rate and affecting performance.
>>
>> Thanks for the insight! This is the kind of improvement that will 
>> probably become a lot harder to even *sight* without uarch details 
>> available.
>>
>> However, I think it's better to also include a minimized test case to 
>> ensure the compiled code doesn't regress. (Comparison of relevant 
>> statistics, e.g. output of perf stat, would be even nicer to have!)
>>
> There was no way I could find a small test case that would replicate 
> this problem. This occurs when compiling spec2006 400.perlbench. And it 
> only appears when '-flto' is added.:-(
> 
> But I paid for reproducible programs and compilation methods under the 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110136.

Ah okay. I missed the bug number in the title initially.

After reading through the context, I think the reason for avoiding $ra 
(in general) should be that the micro-architecture unconditionally 
treats a "jr $ra" as "return from subroutine", hence doing "jr $ra" 
would interfere with *both* subroutine return prediction *and* the more 
general indirect branch prediction; am I right? This could be more 
informative than merely saying "HW branch prediction will be broken".

> 
>>>
>>> gcc/ChangeLog:
>>>
>>>     * config/loongarch/loongarch.md: Change register constraint to 'q'.
>>> ---
>>>   gcc/config/loongarch/loongarch.md | 4 +++-
>>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/gcc/config/loongarch/loongarch.md 
>>> b/gcc/config/loongarch/loongarch.md
>>> index 816a943d155..f9b64173104 100644
>>> --- a/gcc/config/loongarch/loongarch.md
>>> +++ b/gcc/config/loongarch/loongarch.md
>>> @@ -2926,9 +2926,11 @@ (define_expand "tablejump"
>>>     DONE;
>>>   })
>>>   +;; Jump to the jump table Avoid using the $r1 register to prevent
>>> +;; affecting hardware branch prediction.
>>>   (define_insn "@tablejump<mode>"
>>>     [(set (pc)
>>> -    (match_operand:P 0 "register_operand" "r"))
>>> +    (match_operand:P 0 "register_operand" "q"))
>>>      (use (label_ref (match_operand 1 "" "")))]
>>>     ""
>>>     "jr\t%0"
>
  
Lulu Cheng June 7, 2023, 7:51 a.m. UTC | #4
在 2023/6/7 下午3:37, WANG Xuerui 写道:
> On 2023/6/7 11:36, Lulu Cheng wrote:
>>
>> 在 2023/6/7 上午11:26, WANG Xuerui 写道:
>>> Hi,
>>>
>>> On 2023/6/7 10:31, Lulu Cheng wrote:
>>>> If the $ra register is modified during the jump to the jump table, 
>>>> the hardware
>>>> branch prediction function will be broken, resulting in a 
>>>> significant increase
>>>> in the branch false prediction rate and affecting performance.
>>>
>>> Thanks for the insight! This is the kind of improvement that will 
>>> probably become a lot harder to even *sight* without uarch details 
>>> available.
>>>
>>> However, I think it's better to also include a minimized test case 
>>> to ensure the compiled code doesn't regress. (Comparison of relevant 
>>> statistics, e.g. output of perf stat, would be even nicer to have!)
>>>
>> There was no way I could find a small test case that would replicate 
>> this problem. This occurs when compiling spec2006 400.perlbench. And 
>> it only appears when '-flto' is added.:-(
>>
>> But I paid for reproducible programs and compilation methods under 
>> the https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110136.
>
> Ah okay. I missed the bug number in the title initially.
>
> After reading through the context, I think the reason for avoiding $ra 
> (in general) should be that the micro-architecture unconditionally 
> treats a "jr $ra" as "return from subroutine", hence doing "jr $ra" 
> would interfere with *both* subroutine return prediction *and* the 
> more general indirect branch prediction; am I right? This could be 
> more informative than merely saying "HW branch prediction will be 
> broken".
>
Exactly what you described. :-)

I'll change the description.

Thank you very much!

>>
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>>     * config/loongarch/loongarch.md: Change register constraint to 
>>>> 'q'.
>>>> ---
>>>>   gcc/config/loongarch/loongarch.md | 4 +++-
>>>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/gcc/config/loongarch/loongarch.md 
>>>> b/gcc/config/loongarch/loongarch.md
>>>> index 816a943d155..f9b64173104 100644
>>>> --- a/gcc/config/loongarch/loongarch.md
>>>> +++ b/gcc/config/loongarch/loongarch.md
>>>> @@ -2926,9 +2926,11 @@ (define_expand "tablejump"
>>>>     DONE;
>>>>   })
>>>>   +;; Jump to the jump table Avoid using the $r1 register to prevent
>>>> +;; affecting hardware branch prediction.
>>>>   (define_insn "@tablejump<mode>"
>>>>     [(set (pc)
>>>> -    (match_operand:P 0 "register_operand" "r"))
>>>> +    (match_operand:P 0 "register_operand" "q"))
>>>>      (use (label_ref (match_operand 1 "" "")))]
>>>>     ""
>>>>     "jr\t%0"
>>
  

Patch

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 816a943d155..f9b64173104 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2926,9 +2926,11 @@  (define_expand "tablejump"
   DONE;
 })
 
+;; Jump to the jump table Avoid using the $r1 register to prevent
+;; affecting hardware branch prediction.
 (define_insn "@tablejump<mode>"
   [(set (pc)
-	(match_operand:P 0 "register_operand" "r"))
+	(match_operand:P 0 "register_operand" "q"))
    (use (label_ref (match_operand 1 "" "")))]
   ""
   "jr\t%0"