diff mbox series

[1/1] Fix bit-position comparison

Message ID	20220727034437.154625-2-juzhe.zhong@rivai.ai
State	Rejected
Headers	DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org D992D3857C52 From: juzhe.zhong@rivai.ai To: gcc-patches@gcc.gnu.org Subject: [PATCH 1/1] Fix bit-position comparison Date: Wed, 27 Jul 2022 11:44:37 +0800 Message-Id: <20220727034437.154625-2-juzhe.zhong@rivai.ai> In-Reply-To: <20220727034437.154625-1-juzhe.zhong@rivai.ai> References: <20220727034437.154625-1-juzhe.zhong@rivai.ai> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Feedback-ID: bizesmtp:rivai.ai:qybgforeign:qybgforeign3 Precedence: list Cc: jakub@redhat.com, richard.earnshaw@arm.com, rguenther@suse.de, richard.sandiford@arm.com, gnu@the-meissners.org, jlaw@tachyum.com, zhongjuzhe <juzhe.zhong@rivai.ai>, davem@redhat.com, kenner@nyu.edu, ian@airs.com, wilson@tuliptree.org, joseph@codesourcery.com Errors-To: gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+patchwork=sourceware.org@gcc.gnu.org>
Series	middle-end: Fix bit position comparison \| [0/1] middle-end: Fix bit position comparison [1/1] Fix bit-position comparison

Commit Message

juzhe.zhong@rivai.ai July 27, 2022, 3:44 a.m. UTC

  From: zhongjuzhe <juzhe.zhong@rivai.ai>

gcc/ChangeLog:

        * expr.cc (expand_assignment): Change GET_MODE_PRECISION to GET_MODE_BITSIZE

---
 gcc/expr.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Richard Biener July 27, 2022, 6:46 a.m. UTC | #1

On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:

> From: zhongjuzhe <juzhe.zhong@rivai.ai>
> 
> gcc/ChangeLog:
> 
>         * expr.cc (expand_assignment): Change GET_MODE_PRECISION to GET_MODE_BITSIZE
> 
> ---
>  gcc/expr.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 80bb1b8a4c5..ac2b3c07df6 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -5574,7 +5574,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
>  	 code contains an out-of-bounds access to a small array.  */
>        if (!MEM_P (to_rtx)
>  	  && GET_MODE (to_rtx) != BLKmode
> -	  && known_ge (bitpos, GET_MODE_PRECISION (GET_MODE (to_rtx))))
> +	  && known_ge (bitpos, GET_MODE_BITSIZE (GET_MODE (to_rtx))))

I think this has the chance to go wrong with regard to endianess.
Consider to_rtx with 32bit mode size but 12bit mode precision.  bitpos
is relative to the start of to_rtx as if it were in memory and bitsize
determines the contiguous region affected.  But since we are actually
storing into a non-memory endianess comes into play.

So no, I don't think the patch is correct, it would be way more
complicated to get the desired improvement.

Richard.

>  	{
>  	  expand_normal (from);
>  	  result = NULL;
>

juzhe.zhong@rivai.ai July 27, 2022, 7:09 a.m. UTC | #2

Thank you so much for the fast reply. Ok, it is true that I didn't think about it carefully. Can you help me with the following the issue?

For RVV (RISC-V 'V' Extension), we have full vector type 'vint8m1_t' (LMUL = 1) and fractional vector type 'vint8mf2_t' (LMUL = 1/2).
Because in the ISA, we don't have whole register load/store for fractional vector. I reference the LLVM implementation and I adjust BITSIZE of 
fractional vector same as full vector (It will confuse GCC the bytesize of fractional vector and consider the spill size of a fractional vector is same as LMUL = 1) 
so that I can use whole register load/store directly during the register spilling. (Even though it will enlarge the spill size). According to the machine_mode definition,
The machine_mode PRECISION is calculate by component size which is different from BITSIZE

Now, here is the question. For array type: vint8mf2x4_t,  if I want to access vint8mf2x4_t[2], because the PRECISION and BITSIZE are different. Because bitops is calculated by
bitsize and compare to precision in the codes that the patch mentioned. It will make a out-of-bounds access to small array.

 Can you help me with this? This is important for the following RVV upstream support. Thanks.

juzhe.zhong@rivai.ai

From: Richard Biener
Date: 2022-07-27 14:46
To: zhongjuzhe
CC: gcc-patches; richard.earnshaw; jakub; kenner; jlaw; gnu; jason; davem; joseph; richard.sandiford; bernds_cb1; ian; wilson
Subject: Re: [PATCH 1/1] Fix bit-position comparison
On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:

> From: zhongjuzhe <juzhe.zhong@rivai.ai>
> 
> gcc/ChangeLog:
> 
>         * expr.cc (expand_assignment): Change GET_MODE_PRECISION to GET_MODE_BITSIZE
> 
> ---
>  gcc/expr.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 80bb1b8a4c5..ac2b3c07df6 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -5574,7 +5574,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
>  code contains an out-of-bounds access to a small array.  */
>        if (!MEM_P (to_rtx)
>    && GET_MODE (to_rtx) != BLKmode
> -   && known_ge (bitpos, GET_MODE_PRECISION (GET_MODE (to_rtx))))
> +   && known_ge (bitpos, GET_MODE_BITSIZE (GET_MODE (to_rtx))))

I think this has the chance to go wrong with regard to endianess.
Consider to_rtx with 32bit mode size but 12bit mode precision.  bitpos
is relative to the start of to_rtx as if it were in memory and bitsize
determines the contiguous region affected.  But since we are actually
storing into a non-memory endianess comes into play.

So no, I don't think the patch is correct, it would be way more
complicated to get the desired improvement.

Richard.

>  {
>    expand_normal (from);
>    result = NULL;
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

Richard Biener July 27, 2022, 7:35 a.m. UTC | #3

On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:

> Thank you so much for the fast reply. Ok, it is true that I didn't think about it carefully. Can you help me with the following the issue?
> 
> For RVV (RISC-V 'V' Extension), we have full vector type 'vint8m1_t' (LMUL = 1) and fractional vector type 'vint8mf2_t' (LMUL = 1/2).

Can you explain in terms of GCCs generic vectors what vint8m1_t and
vint8mf2_t are?

> Because in the ISA, we don't have whole register load/store for fractional vector. I reference the LLVM implementation and I adjust BITSIZE of 
> fractional vector same as full vector (It will confuse GCC the bytesize of fractional vector and consider the spill size of a fractional vector is same as LMUL = 1) 
> so that I can use whole register load/store directly during the register spilling. (Even though it will enlarge the spill size). According to the machine_mode definition,
> The machine_mode PRECISION is calculate by component size which is different from BITSIZE
> 
> Now, here is the question. For array type: vint8mf2x4_t,  if I want to access vint8mf2x4_t[2], because the PRECISION and BITSIZE are different. Because bitops is calculated by
> bitsize and compare to precision in the codes that the patch mentioned. It will make a out-of-bounds access to small array.
> 
>  Can you help me with this? This is important for the following RVV upstream support. Thanks.

So you have that vint8mf2_t type and registers + instructions to operate 
on them but no way to load/store?  How do you implement

vint8mf2_t foo (vint8mf2_t *p)
{
  return *p;
}

?

> 
> 
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2022-07-27 14:46
> To: zhongjuzhe
> CC: gcc-patches; richard.earnshaw; jakub; kenner; jlaw; gnu; jason; davem; joseph; richard.sandiford; bernds_cb1; ian; wilson
> Subject: Re: [PATCH 1/1] Fix bit-position comparison
> On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:
>  
> > From: zhongjuzhe <juzhe.zhong@rivai.ai>
> > 
> > gcc/ChangeLog:
> > 
> >         * expr.cc (expand_assignment): Change GET_MODE_PRECISION to GET_MODE_BITSIZE
> > 
> > ---
> >  gcc/expr.cc | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/gcc/expr.cc b/gcc/expr.cc
> > index 80bb1b8a4c5..ac2b3c07df6 100644
> > --- a/gcc/expr.cc
> > +++ b/gcc/expr.cc
> > @@ -5574,7 +5574,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
> >  code contains an out-of-bounds access to a small array.  */
> >        if (!MEM_P (to_rtx)
> >    && GET_MODE (to_rtx) != BLKmode
> > -   && known_ge (bitpos, GET_MODE_PRECISION (GET_MODE (to_rtx))))
> > +   && known_ge (bitpos, GET_MODE_BITSIZE (GET_MODE (to_rtx))))
>  
> I think this has the chance to go wrong with regard to endianess.
> Consider to_rtx with 32bit mode size but 12bit mode precision.  bitpos
> is relative to the start of to_rtx as if it were in memory and bitsize
> determines the contiguous region affected.  But since we are actually
> storing into a non-memory endianess comes into play.
>  
> So no, I don't think the patch is correct, it would be way more
> complicated to get the desired improvement.
>  
> Richard.
>  
> >  {
> >    expand_normal (from);
> >    result = NULL;
> > 
>  
>

juzhe.zhong@rivai.ai July 27, 2022, 8 a.m. UTC | #4

Let's take look at these 2 cases: https://godbolt.org/z/zP16frPnb. 
In RVV, we have vle8 and vsetvli to specify loading vint8mf2 (vsetvli a1, zero + vle8.v). You can see it in foo function.  In this case we don't need to confuse compiler the size of vint8mf2.
However, The second case is I write assembly to occupy the vector register to generate register spilling cases. You can see LLVM implementation:  First use vsetvli + vle8.v load (vint8mf2_t) data from the base pointer, then spill the vector to memory using vs1r (this is the whole register store which store the whole vector to the memory and then use vl1r load the 
whole register and finally return it.  In LLVM implementation, it insert vsetvli instructions for RVV instruction using LLVM PASS before RA (register allocation).  So after RA, compiler generate the spilling loads/stores. We can either choose to use vsetvli + vle/vse (load/store fractional vector) or vl1r/vs1r (load/store whole vector which enlarge the spill size).

Because implementing insert vsetvli PASS after RA (register allocation) is so complicated, LLVM choose to use vl1r/vs1r.
Frankly, I implement RVV GCC reference to LLVM. So that why I want to define the machine_mode for `vint8mf2` with smaller element-size but same byte-size from `vint8m1'.

Thank you for your reply.  

juzhe.zhong@rivai.ai

From: Richard Biener
Date: 2022-07-27 15:35
To: juzhe.zhong@rivai.ai
CC: gcc-patches
Subject: Re: Re: [PATCH 1/1] Fix bit-position comparison
On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:

> Thank you so much for the fast reply. Ok, it is true that I didn't think about it carefully. Can you help me with the following the issue?
> 
> For RVV (RISC-V 'V' Extension), we have full vector type 'vint8m1_t' (LMUL = 1) and fractional vector type 'vint8mf2_t' (LMUL = 1/2).

Can you explain in terms of GCCs generic vectors what vint8m1_t and
vint8mf2_t are?

> Because in the ISA, we don't have whole register load/store for fractional vector. I reference the LLVM implementation and I adjust BITSIZE of 
> fractional vector same as full vector (It will confuse GCC the bytesize of fractional vector and consider the spill size of a fractional vector is same as LMUL = 1) 
> so that I can use whole register load/store directly during the register spilling. (Even though it will enlarge the spill size). According to the machine_mode definition,
> The machine_mode PRECISION is calculate by component size which is different from BITSIZE
> 
> Now, here is the question. For array type: vint8mf2x4_t,  if I want to access vint8mf2x4_t[2], because the PRECISION and BITSIZE are different. Because bitops is calculated by
> bitsize and compare to precision in the codes that the patch mentioned. It will make a out-of-bounds access to small array.
> 
>  Can you help me with this? This is important for the following RVV upstream support. Thanks.

So you have that vint8mf2_t type and registers + instructions to operate 
on them but no way to load/store?  How do you implement

vint8mf2_t foo (vint8mf2_t *p)
{
  return *p;
}

?

> 
> 
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2022-07-27 14:46
> To: zhongjuzhe
> CC: gcc-patches; richard.earnshaw; jakub; kenner; jlaw; gnu; jason; davem; joseph; richard.sandiford; bernds_cb1; ian; wilson
> Subject: Re: [PATCH 1/1] Fix bit-position comparison
> On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:
>  
> > From: zhongjuzhe <juzhe.zhong@rivai.ai>
> > 
> > gcc/ChangeLog:
> > 
> >         * expr.cc (expand_assignment): Change GET_MODE_PRECISION to GET_MODE_BITSIZE
> > 
> > ---
> >  gcc/expr.cc | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/gcc/expr.cc b/gcc/expr.cc
> > index 80bb1b8a4c5..ac2b3c07df6 100644
> > --- a/gcc/expr.cc
> > +++ b/gcc/expr.cc
> > @@ -5574,7 +5574,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
> >  code contains an out-of-bounds access to a small array.  */
> >        if (!MEM_P (to_rtx)
> >    && GET_MODE (to_rtx) != BLKmode
> > -   && known_ge (bitpos, GET_MODE_PRECISION (GET_MODE (to_rtx))))
> > +   && known_ge (bitpos, GET_MODE_BITSIZE (GET_MODE (to_rtx))))
>  
> I think this has the chance to go wrong with regard to endianess.
> Consider to_rtx with 32bit mode size but 12bit mode precision.  bitpos
> is relative to the start of to_rtx as if it were in memory and bitsize
> determines the contiguous region affected.  But since we are actually
> storing into a non-memory endianess comes into play.
>  
> So no, I don't think the patch is correct, it would be way more
> complicated to get the desired improvement.
>  
> Richard.
>  
> >  {
> >    expand_normal (from);
> >    result = NULL;
> > 
>  
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

Richard Biener July 27, 2022, 8:12 a.m. UTC | #5

On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:

> Let's take look at these 2 cases: https://godbolt.org/z/zP16frPnb. In 
> RVV, we have vle8 and vsetvli to specify loading vint8mf2 (vsetvli a1, 
> zero + vle8.v). You can see it in foo function.  In this case we don't 
> need to confuse compiler the size of vint8mf2. However, The second case 
> is I write assembly to occupy the vector register to generate register 
> spilling cases. You can see LLVM implementation:  First use vsetvli + 
> vle8.v load (vint8mf2_t) data from the base pointer, then spill the 
> vector to memory using vs1r (this is the whole register store which 
> store the whole vector to the memory and then use vl1r load the whole 
> register and finally return it.  In LLVM implementation, it insert 
> vsetvli instructions for RVV instruction using LLVM PASS before RA 
> (register allocation).  So after RA, compiler generate the spilling 
> loads/stores. We can either choose to use vsetvli + vle/vse (load/store 
> fractional vector) or vl1r/vs1r (load/store whole vector which enlarge 
> the spill size).
> 
> Because implementing insert vsetvli PASS after RA (register allocation) 
> is so complicated, LLVM choose to use vl1r/vs1r. Frankly, I implement 
> RVV GCC reference to LLVM. So that why I want to define the machine_mode 
> for `vint8mf2` with smaller element-size but same byte-size from 
> `vint8m1'.

The LLVM strathegy might not be the best one for GCC.  I still don't
quite understand what issue you face with the code you try to patch.
How does the machmode.def portion of the port look like for
vint8m1 and vint8mf2?  What are the corresponding modes?

> Thank you for your reply.  
> 
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2022-07-27 15:35
> To: juzhe.zhong@rivai.ai
> CC: gcc-patches
> Subject: Re: Re: [PATCH 1/1] Fix bit-position comparison
> On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:
>  
> > Thank you so much for the fast reply. Ok, it is true that I didn't think about it carefully. Can you help me with the following the issue?
> > 
> > For RVV (RISC-V 'V' Extension), we have full vector type 'vint8m1_t' (LMUL = 1) and fractional vector type 'vint8mf2_t' (LMUL = 1/2).
>  
> Can you explain in terms of GCCs generic vectors what vint8m1_t and
> vint8mf2_t are?
>  
> > Because in the ISA, we don't have whole register load/store for fractional vector. I reference the LLVM implementation and I adjust BITSIZE of 
> > fractional vector same as full vector (It will confuse GCC the bytesize of fractional vector and consider the spill size of a fractional vector is same as LMUL = 1) 
> > so that I can use whole register load/store directly during the register spilling. (Even though it will enlarge the spill size). According to the machine_mode definition,
> > The machine_mode PRECISION is calculate by component size which is different from BITSIZE
> > 
> > Now, here is the question. For array type: vint8mf2x4_t,  if I want to access vint8mf2x4_t[2], because the PRECISION and BITSIZE are different. Because bitops is calculated by
> > bitsize and compare to precision in the codes that the patch mentioned. It will make a out-of-bounds access to small array.
> > 
> >  Can you help me with this? This is important for the following RVV upstream support. Thanks.
>  
> So you have that vint8mf2_t type and registers + instructions to operate 
> on them but no way to load/store?  How do you implement
>  
> vint8mf2_t foo (vint8mf2_t *p)
> {
>   return *p;
> }
>  
> ?
>  
> > 
> > 
> > 
> > 
> > juzhe.zhong@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2022-07-27 14:46
> > To: zhongjuzhe
> > CC: gcc-patches; richard.earnshaw; jakub; kenner; jlaw; gnu; jason; davem; joseph; richard.sandiford; bernds_cb1; ian; wilson
> > Subject: Re: [PATCH 1/1] Fix bit-position comparison
> > On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:
> >  
> > > From: zhongjuzhe <juzhe.zhong@rivai.ai>
> > > 
> > > gcc/ChangeLog:
> > > 
> > >         * expr.cc (expand_assignment): Change GET_MODE_PRECISION to GET_MODE_BITSIZE
> > > 
> > > ---
> > >  gcc/expr.cc | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/gcc/expr.cc b/gcc/expr.cc
> > > index 80bb1b8a4c5..ac2b3c07df6 100644
> > > --- a/gcc/expr.cc
> > > +++ b/gcc/expr.cc
> > > @@ -5574,7 +5574,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
> > >  code contains an out-of-bounds access to a small array.  */
> > >        if (!MEM_P (to_rtx)
> > >    && GET_MODE (to_rtx) != BLKmode
> > > -   && known_ge (bitpos, GET_MODE_PRECISION (GET_MODE (to_rtx))))
> > > +   && known_ge (bitpos, GET_MODE_BITSIZE (GET_MODE (to_rtx))))
> >  
> > I think this has the chance to go wrong with regard to endianess.
> > Consider to_rtx with 32bit mode size but 12bit mode precision.  bitpos
> > is relative to the start of to_rtx as if it were in memory and bitsize
> > determines the contiguous region affected.  But since we are actually
> > storing into a non-memory endianess comes into play.
> >  
> > So no, I don't think the patch is correct, it would be way more
> > complicated to get the desired improvement.
> >  
> > Richard.
> >  
> > >  {
> > >    expand_normal (from);
> > >    result = NULL;
> > > 
> >  
> > 
>  
>

juzhe.zhong@rivai.ai July 27, 2022, 8:26 a.m. UTC | #6

For vint8m1_t: 
   VECTOR_MODES_WITH_PREFIX (VNx, INT, 16, 0)
ADJUST_NUNITS (VNx16QI, riscv_vector_chunks * 8);
ADJUST_BYTES (VNx16QI, riscv_vector_chunks * 8);
For vint8mf2_t: 
   VECTOR_MODES_WITH_PREFIX (VNx, INT, 8, 0)
ADJUST_NUNITS (VNx8QI, riscv_vector_chunks * 4);
ADJUST_BYTES (VNx16QI, riscv_vector_chunks * 8);

riscv_vector_chunks is a compile-time unknown poly value, I use ADJUST_BYTES to specify these 2 machine_modes 
with same bytesize but different element number.
This way can tell GCC that vint8mf2_t has half element nunber of vint8m1_t but occupy the same size during the register spilling.

May be we can add a new function that call ADJUST_PRECISION that I can adjust these two precision?
I am not sure. I am look forward another solution to deal with this issuse. Thank you so much.



juzhe.zhong@rivai.ai
 
From: Richard Biener
Date: 2022-07-27 16:12
To: juzhe.zhong@rivai.ai
CC: gcc-patches
Subject: Re: Re: [PATCH 1/1] Fix bit-position comparison
On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:
 
> Let's take look at these 2 cases: https://godbolt.org/z/zP16frPnb. In 
> RVV, we have vle8 and vsetvli to specify loading vint8mf2 (vsetvli a1, 
> zero + vle8.v). You can see it in foo function.  In this case we don't 
> need to confuse compiler the size of vint8mf2. However, The second case 
> is I write assembly to occupy the vector register to generate register 
> spilling cases. You can see LLVM implementation:  First use vsetvli + 
> vle8.v load (vint8mf2_t) data from the base pointer, then spill the 
> vector to memory using vs1r (this is the whole register store which 
> store the whole vector to the memory and then use vl1r load the whole 
> register and finally return it.  In LLVM implementation, it insert 
> vsetvli instructions for RVV instruction using LLVM PASS before RA 
> (register allocation).  So after RA, compiler generate the spilling 
> loads/stores. We can either choose to use vsetvli + vle/vse (load/store 
> fractional vector) or vl1r/vs1r (load/store whole vector which enlarge 
> the spill size).
> 
> Because implementing insert vsetvli PASS after RA (register allocation) 
> is so complicated, LLVM choose to use vl1r/vs1r. Frankly, I implement 
> RVV GCC reference to LLVM. So that why I want to define the machine_mode 
> for `vint8mf2` with smaller element-size but same byte-size from 
> `vint8m1'.
 
The LLVM strathegy might not be the best one for GCC.  I still don't
quite understand what issue you face with the code you try to patch.
How does the machmode.def portion of the port look like for
vint8m1 and vint8mf2?  What are the corresponding modes?
 
> Thank you for your reply.  
> 
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2022-07-27 15:35
> To: juzhe.zhong@rivai.ai
> CC: gcc-patches
> Subject: Re: Re: [PATCH 1/1] Fix bit-position comparison
> On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:
>  
> > Thank you so much for the fast reply. Ok, it is true that I didn't think about it carefully. Can you help me with the following the issue?
> > 
> > For RVV (RISC-V 'V' Extension), we have full vector type 'vint8m1_t' (LMUL = 1) and fractional vector type 'vint8mf2_t' (LMUL = 1/2).
>  
> Can you explain in terms of GCCs generic vectors what vint8m1_t and
> vint8mf2_t are?
>  
> > Because in the ISA, we don't have whole register load/store for fractional vector. I reference the LLVM implementation and I adjust BITSIZE of 
> > fractional vector same as full vector (It will confuse GCC the bytesize of fractional vector and consider the spill size of a fractional vector is same as LMUL = 1) 
> > so that I can use whole register load/store directly during the register spilling. (Even though it will enlarge the spill size). According to the machine_mode definition,
> > The machine_mode PRECISION is calculate by component size which is different from BITSIZE
> > 
> > Now, here is the question. For array type: vint8mf2x4_t,  if I want to access vint8mf2x4_t[2], because the PRECISION and BITSIZE are different. Because bitops is calculated by
> > bitsize and compare to precision in the codes that the patch mentioned. It will make a out-of-bounds access to small array.
> > 
> >  Can you help me with this? This is important for the following RVV upstream support. Thanks.
>  
> So you have that vint8mf2_t type and registers + instructions to operate 
> on them but no way to load/store?  How do you implement
>  
> vint8mf2_t foo (vint8mf2_t *p)
> {
>   return *p;
> }
>  
> ?
>  
> > 
> > 
> > 
> > 
> > juzhe.zhong@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2022-07-27 14:46
> > To: zhongjuzhe
> > CC: gcc-patches; richard.earnshaw; jakub; kenner; jlaw; gnu; jason; davem; joseph; richard.sandiford; bernds_cb1; ian; wilson
> > Subject: Re: [PATCH 1/1] Fix bit-position comparison
> > On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:
> >  
> > > From: zhongjuzhe <juzhe.zhong@rivai.ai>
> > > 
> > > gcc/ChangeLog:
> > > 
> > >         * expr.cc (expand_assignment): Change GET_MODE_PRECISION to GET_MODE_BITSIZE
> > > 
> > > ---
> > >  gcc/expr.cc | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/gcc/expr.cc b/gcc/expr.cc
> > > index 80bb1b8a4c5..ac2b3c07df6 100644
> > > --- a/gcc/expr.cc
> > > +++ b/gcc/expr.cc
> > > @@ -5574,7 +5574,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
> > >  code contains an out-of-bounds access to a small array.  */
> > >        if (!MEM_P (to_rtx)
> > >    && GET_MODE (to_rtx) != BLKmode
> > > -   && known_ge (bitpos, GET_MODE_PRECISION (GET_MODE (to_rtx))))
> > > +   && known_ge (bitpos, GET_MODE_BITSIZE (GET_MODE (to_rtx))))
> >  
> > I think this has the chance to go wrong with regard to endianess.
> > Consider to_rtx with 32bit mode size but 12bit mode precision.  bitpos
> > is relative to the start of to_rtx as if it were in memory and bitsize
> > determines the contiguous region affected.  But since we are actually
> > storing into a non-memory endianess comes into play.
> >  
> > So no, I don't think the patch is correct, it would be way more
> > complicated to get the desired improvement.
> >  
> > Richard.
> >  
> > >  {
> > >    expand_normal (from);
> > >    result = NULL;
> > > 
> >  
> > 
>  
> 
 
-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

Richard Sandiford July 27, 2022, 12:47 p.m. UTC | #7

Richard Biener <rguenther@suse.de> writes:
> On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:
>
>> From: zhongjuzhe <juzhe.zhong@rivai.ai>
>> 
>> gcc/ChangeLog:
>> 
>>         * expr.cc (expand_assignment): Change GET_MODE_PRECISION to GET_MODE_BITSIZE
>> 
>> ---
>>  gcc/expr.cc | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> index 80bb1b8a4c5..ac2b3c07df6 100644
>> --- a/gcc/expr.cc
>> +++ b/gcc/expr.cc
>> @@ -5574,7 +5574,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
>>  	 code contains an out-of-bounds access to a small array.  */
>>        if (!MEM_P (to_rtx)
>>  	  && GET_MODE (to_rtx) != BLKmode
>> -	  && known_ge (bitpos, GET_MODE_PRECISION (GET_MODE (to_rtx))))
>> +	  && known_ge (bitpos, GET_MODE_BITSIZE (GET_MODE (to_rtx))))
>
> I think this has the chance to go wrong with regard to endianess.
> Consider to_rtx with 32bit mode size but 12bit mode precision.  bitpos
> is relative to the start of to_rtx as if it were in memory and bitsize
> determines the contiguous region affected.  But since we are actually
> storing into a non-memory endianess comes into play.

Not sure I follow the code well enough to comment, but: this condition
is saying when we can drop the assignment on the floor and just expand
the rhs (for side effects).  From that point of view, using bitsize
should be more conservative than using precision, since bitsize is
always >= precision.

Thanks,
Richard

>
> So no, I don't think the patch is correct, it would be way more
> complicated to get the desired improvement.
>
> Richard.
>
>>  	{
>>  	  expand_normal (from);
>>  	  result = NULL;
>>

Richard Biener July 27, 2022, 12:56 p.m. UTC | #8

On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:

> For vint8m1_t: 
>    VECTOR_MODES_WITH_PREFIX (VNx, INT, 16, 0)
> ADJUST_NUNITS (VNx16QI, riscv_vector_chunks * 8);
> ADJUST_BYTES (VNx16QI, riscv_vector_chunks * 8);
> For vint8mf2_t: 
>    VECTOR_MODES_WITH_PREFIX (VNx, INT, 8, 0)
> ADJUST_NUNITS (VNx8QI, riscv_vector_chunks * 4);
> ADJUST_BYTES (VNx16QI, riscv_vector_chunks * 8);

^^^

ADJUST_BYTES (VNx8QI, riscv_vector_chunks * 8);

probably.  As said, I'm not sure this is a good idea just for the sake
of spill slots.  Maybe there's a mode_for_spill one could use and
spill a paradoxical subreg of that mode ... (don't search for
mode_for_spill, I just made that up).

Maybe you can somehow forbit spilling of vint8mf2_t and instead
define spill_class for those so they spill first to vint8m1_t
and then those get spilled to the stack?

> riscv_vector_chunks is a compile-time unknown poly value, I use ADJUST_BYTES to specify these 2 machine_modes 
> with same bytesize but different element number.
> This way can tell GCC that vint8mf2_t has half element nunber of vint8m1_t but occupy the same size during the register spilling.
> 
> May be we can add a new function that call ADJUST_PRECISION that I can adjust these two precision?
> I am not sure. I am look forward another solution to deal with this issuse. Thank you so much.
> 
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2022-07-27 16:12
> To: juzhe.zhong@rivai.ai
> CC: gcc-patches
> Subject: Re: Re: [PATCH 1/1] Fix bit-position comparison
> On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:
>  
> > Let's take look at these 2 cases: https://godbolt.org/z/zP16frPnb. In 
> > RVV, we have vle8 and vsetvli to specify loading vint8mf2 (vsetvli a1, 
> > zero + vle8.v). You can see it in foo function.  In this case we don't 
> > need to confuse compiler the size of vint8mf2. However, The second case 
> > is I write assembly to occupy the vector register to generate register 
> > spilling cases. You can see LLVM implementation:  First use vsetvli + 
> > vle8.v load (vint8mf2_t) data from the base pointer, then spill the 
> > vector to memory using vs1r (this is the whole register store which 
> > store the whole vector to the memory and then use vl1r load the whole 
> > register and finally return it.  In LLVM implementation, it insert 
> > vsetvli instructions for RVV instruction using LLVM PASS before RA 
> > (register allocation).  So after RA, compiler generate the spilling 
> > loads/stores. We can either choose to use vsetvli + vle/vse (load/store 
> > fractional vector) or vl1r/vs1r (load/store whole vector which enlarge 
> > the spill size).
> > 
> > Because implementing insert vsetvli PASS after RA (register allocation) 
> > is so complicated, LLVM choose to use vl1r/vs1r. Frankly, I implement 
> > RVV GCC reference to LLVM. So that why I want to define the machine_mode 
> > for `vint8mf2` with smaller element-size but same byte-size from 
> > `vint8m1'.
>  
> The LLVM strathegy might not be the best one for GCC.  I still don't
> quite understand what issue you face with the code you try to patch.
> How does the machmode.def portion of the port look like for
> vint8m1 and vint8mf2?  What are the corresponding modes?
>  
> > Thank you for your reply.  
> > 
> > 
> > 
> > juzhe.zhong@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2022-07-27 15:35
> > To: juzhe.zhong@rivai.ai
> > CC: gcc-patches
> > Subject: Re: Re: [PATCH 1/1] Fix bit-position comparison
> > On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:
> >  
> > > Thank you so much for the fast reply. Ok, it is true that I didn't think about it carefully. Can you help me with the following the issue?
> > > 
> > > For RVV (RISC-V 'V' Extension), we have full vector type 'vint8m1_t' (LMUL = 1) and fractional vector type 'vint8mf2_t' (LMUL = 1/2).
> >  
> > Can you explain in terms of GCCs generic vectors what vint8m1_t and
> > vint8mf2_t are?
> >  
> > > Because in the ISA, we don't have whole register load/store for fractional vector. I reference the LLVM implementation and I adjust BITSIZE of 
> > > fractional vector same as full vector (It will confuse GCC the bytesize of fractional vector and consider the spill size of a fractional vector is same as LMUL = 1) 
> > > so that I can use whole register load/store directly during the register spilling. (Even though it will enlarge the spill size). According to the machine_mode definition,
> > > The machine_mode PRECISION is calculate by component size which is different from BITSIZE
> > > 
> > > Now, here is the question. For array type: vint8mf2x4_t,  if I want to access vint8mf2x4_t[2], because the PRECISION and BITSIZE are different. Because bitops is calculated by
> > > bitsize and compare to precision in the codes that the patch mentioned. It will make a out-of-bounds access to small array.
> > > 
> > >  Can you help me with this? This is important for the following RVV upstream support. Thanks.
> >  
> > So you have that vint8mf2_t type and registers + instructions to operate 
> > on them but no way to load/store?  How do you implement
> >  
> > vint8mf2_t foo (vint8mf2_t *p)
> > {
> >   return *p;
> > }
> >  
> > ?
> >  
> > > 
> > > 
> > > 
> > > 
> > > juzhe.zhong@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2022-07-27 14:46
> > > To: zhongjuzhe
> > > CC: gcc-patches; richard.earnshaw; jakub; kenner; jlaw; gnu; jason; davem; joseph; richard.sandiford; bernds_cb1; ian; wilson
> > > Subject: Re: [PATCH 1/1] Fix bit-position comparison
> > > On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:
> > >  
> > > > From: zhongjuzhe <juzhe.zhong@rivai.ai>
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > >         * expr.cc (expand_assignment): Change GET_MODE_PRECISION to GET_MODE_BITSIZE
> > > > 
> > > > ---
> > > >  gcc/expr.cc | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/gcc/expr.cc b/gcc/expr.cc
> > > > index 80bb1b8a4c5..ac2b3c07df6 100644
> > > > --- a/gcc/expr.cc
> > > > +++ b/gcc/expr.cc
> > > > @@ -5574,7 +5574,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
> > > >  code contains an out-of-bounds access to a small array.  */
> > > >        if (!MEM_P (to_rtx)
> > > >    && GET_MODE (to_rtx) != BLKmode
> > > > -   && known_ge (bitpos, GET_MODE_PRECISION (GET_MODE (to_rtx))))
> > > > +   && known_ge (bitpos, GET_MODE_BITSIZE (GET_MODE (to_rtx))))
> > >  
> > > I think this has the chance to go wrong with regard to endianess.
> > > Consider to_rtx with 32bit mode size but 12bit mode precision.  bitpos
> > > is relative to the start of to_rtx as if it were in memory and bitsize
> > > determines the contiguous region affected.  But since we are actually
> > > storing into a non-memory endianess comes into play.
> > >  
> > > So no, I don't think the patch is correct, it would be way more
> > > complicated to get the desired improvement.
> > >  
> > > Richard.
> > >  
> > > >  {
> > > >    expand_normal (from);
> > > >    result = NULL;
> > > > 
> > >  
> > > 
> >  
> > 
>  
>

juzhe.zhong@rivai.ai July 27, 2022, 1:12 p.m. UTC | #9

After consideration. Maybe I can try another solution. I aggree with you that it is not good idea that fake the BYTESIZE of vint8mf2_t and let GCC think it is a full vector.  I think the best way is let GCC understand the real size of 'vint8mf2_t' as a half of a vector and then RISC-V backend insert vsetvli after RA (register allocation) so that it will not enlarge the spill size. Thank you so much.



juzhe.zhong@rivai.ai
 
From: Richard Biener
Date: 2022-07-27 20:56
To: juzhe.zhong@rivai.ai
CC: gcc-patches; vmakarov
Subject: Re: Re: [PATCH 1/1] Fix bit-position comparison
On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:
 
> For vint8m1_t: 
>    VECTOR_MODES_WITH_PREFIX (VNx, INT, 16, 0)
> ADJUST_NUNITS (VNx16QI, riscv_vector_chunks * 8);
> ADJUST_BYTES (VNx16QI, riscv_vector_chunks * 8);
> For vint8mf2_t: 
>    VECTOR_MODES_WITH_PREFIX (VNx, INT, 8, 0)
> ADJUST_NUNITS (VNx8QI, riscv_vector_chunks * 4);
> ADJUST_BYTES (VNx16QI, riscv_vector_chunks * 8);
 
^^^
 
ADJUST_BYTES (VNx8QI, riscv_vector_chunks * 8);
 
probably.  As said, I'm not sure this is a good idea just for the sake
of spill slots.  Maybe there's a mode_for_spill one could use and
spill a paradoxical subreg of that mode ... (don't search for
mode_for_spill, I just made that up).
 
Maybe you can somehow forbit spilling of vint8mf2_t and instead
define spill_class for those so they spill first to vint8m1_t
and then those get spilled to the stack?
 
> riscv_vector_chunks is a compile-time unknown poly value, I use ADJUST_BYTES to specify these 2 machine_modes 
> with same bytesize but different element number.
> This way can tell GCC that vint8mf2_t has half element nunber of vint8m1_t but occupy the same size during the register spilling.
> 
> May be we can add a new function that call ADJUST_PRECISION that I can adjust these two precision?
> I am not sure. I am look forward another solution to deal with this issuse. Thank you so much.
> 
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2022-07-27 16:12
> To: juzhe.zhong@rivai.ai
> CC: gcc-patches
> Subject: Re: Re: [PATCH 1/1] Fix bit-position comparison
> On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:
>  
> > Let's take look at these 2 cases: https://godbolt.org/z/zP16frPnb. In 
> > RVV, we have vle8 and vsetvli to specify loading vint8mf2 (vsetvli a1, 
> > zero + vle8.v). You can see it in foo function.  In this case we don't 
> > need to confuse compiler the size of vint8mf2. However, The second case 
> > is I write assembly to occupy the vector register to generate register 
> > spilling cases. You can see LLVM implementation:  First use vsetvli + 
> > vle8.v load (vint8mf2_t) data from the base pointer, then spill the 
> > vector to memory using vs1r (this is the whole register store which 
> > store the whole vector to the memory and then use vl1r load the whole 
> > register and finally return it.  In LLVM implementation, it insert 
> > vsetvli instructions for RVV instruction using LLVM PASS before RA 
> > (register allocation).  So after RA, compiler generate the spilling 
> > loads/stores. We can either choose to use vsetvli + vle/vse (load/store 
> > fractional vector) or vl1r/vs1r (load/store whole vector which enlarge 
> > the spill size).
> > 
> > Because implementing insert vsetvli PASS after RA (register allocation) 
> > is so complicated, LLVM choose to use vl1r/vs1r. Frankly, I implement 
> > RVV GCC reference to LLVM. So that why I want to define the machine_mode 
> > for `vint8mf2` with smaller element-size but same byte-size from 
> > `vint8m1'.
>  
> The LLVM strathegy might not be the best one for GCC.  I still don't
> quite understand what issue you face with the code you try to patch.
> How does the machmode.def portion of the port look like for
> vint8m1 and vint8mf2?  What are the corresponding modes?
>  
> > Thank you for your reply.  
> > 
> > 
> > 
> > juzhe.zhong@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2022-07-27 15:35
> > To: juzhe.zhong@rivai.ai
> > CC: gcc-patches
> > Subject: Re: Re: [PATCH 1/1] Fix bit-position comparison
> > On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:
> >  
> > > Thank you so much for the fast reply. Ok, it is true that I didn't think about it carefully. Can you help me with the following the issue?
> > > 
> > > For RVV (RISC-V 'V' Extension), we have full vector type 'vint8m1_t' (LMUL = 1) and fractional vector type 'vint8mf2_t' (LMUL = 1/2).
> >  
> > Can you explain in terms of GCCs generic vectors what vint8m1_t and
> > vint8mf2_t are?
> >  
> > > Because in the ISA, we don't have whole register load/store for fractional vector. I reference the LLVM implementation and I adjust BITSIZE of 
> > > fractional vector same as full vector (It will confuse GCC the bytesize of fractional vector and consider the spill size of a fractional vector is same as LMUL = 1) 
> > > so that I can use whole register load/store directly during the register spilling. (Even though it will enlarge the spill size). According to the machine_mode definition,
> > > The machine_mode PRECISION is calculate by component size which is different from BITSIZE
> > > 
> > > Now, here is the question. For array type: vint8mf2x4_t,  if I want to access vint8mf2x4_t[2], because the PRECISION and BITSIZE are different. Because bitops is calculated by
> > > bitsize and compare to precision in the codes that the patch mentioned. It will make a out-of-bounds access to small array.
> > > 
> > >  Can you help me with this? This is important for the following RVV upstream support. Thanks.
> >  
> > So you have that vint8mf2_t type and registers + instructions to operate 
> > on them but no way to load/store?  How do you implement
> >  
> > vint8mf2_t foo (vint8mf2_t *p)
> > {
> >   return *p;
> > }
> >  
> > ?
> >  
> > > 
> > > 
> > > 
> > > 
> > > juzhe.zhong@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2022-07-27 14:46
> > > To: zhongjuzhe
> > > CC: gcc-patches; richard.earnshaw; jakub; kenner; jlaw; gnu; jason; davem; joseph; richard.sandiford; bernds_cb1; ian; wilson
> > > Subject: Re: [PATCH 1/1] Fix bit-position comparison
> > > On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:
> > >  
> > > > From: zhongjuzhe <juzhe.zhong@rivai.ai>
> > > > 
> > > > gcc/ChangeLog:
> > > > 
> > > >         * expr.cc (expand_assignment): Change GET_MODE_PRECISION to GET_MODE_BITSIZE
> > > > 
> > > > ---
> > > >  gcc/expr.cc | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/gcc/expr.cc b/gcc/expr.cc
> > > > index 80bb1b8a4c5..ac2b3c07df6 100644
> > > > --- a/gcc/expr.cc
> > > > +++ b/gcc/expr.cc
> > > > @@ -5574,7 +5574,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
> > > >  code contains an out-of-bounds access to a small array.  */
> > > >        if (!MEM_P (to_rtx)
> > > >    && GET_MODE (to_rtx) != BLKmode
> > > > -   && known_ge (bitpos, GET_MODE_PRECISION (GET_MODE (to_rtx))))
> > > > +   && known_ge (bitpos, GET_MODE_BITSIZE (GET_MODE (to_rtx))))
> > >  
> > > I think this has the chance to go wrong with regard to endianess.
> > > Consider to_rtx with 32bit mode size but 12bit mode precision.  bitpos
> > > is relative to the start of to_rtx as if it were in memory and bitsize
> > > determines the contiguous region affected.  But since we are actually
> > > storing into a non-memory endianess comes into play.
> > >  
> > > So no, I don't think the patch is correct, it would be way more
> > > complicated to get the desired improvement.
> > >  
> > > Richard.
> > >  
> > > >  {
> > > >    expand_normal (from);
> > > >    result = NULL;
> > > > 
> > >  
> > > 
> >  
> > 
>  
> 
 
-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

juzhe.zhong@rivai.ai July 27, 2022, 1:20 p.m. UTC | #10

Thank you for your reply. I am gonna try another to implement the fractional vector spilling of RVV in RISC-V backend. 
If this patch is really having a bad impact on other targets. I think this patch may needs to be abandon.




juzhe.zhong@rivai.ai
 
From: Richard Sandiford
Date: 2022-07-27 20:47
To: Richard Biener
CC: zhongjuzhe; gcc-patches; richard.earnshaw; jakub; kenner; jlaw; gnu; jason; davem; joseph; bernds_cb1; ian; wilson
Subject: Re: [PATCH 1/1] Fix bit-position comparison
Richard Biener <rguenther@suse.de> writes:
> On Wed, 27 Jul 2022, juzhe.zhong@rivai.ai wrote:
>
>> From: zhongjuzhe <juzhe.zhong@rivai.ai>
>> 
>> gcc/ChangeLog:
>> 
>>         * expr.cc (expand_assignment): Change GET_MODE_PRECISION to GET_MODE_BITSIZE
>> 
>> ---
>>  gcc/expr.cc | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> index 80bb1b8a4c5..ac2b3c07df6 100644
>> --- a/gcc/expr.cc
>> +++ b/gcc/expr.cc
>> @@ -5574,7 +5574,7 @@ expand_assignment (tree to, tree from, bool nontemporal)
>>  code contains an out-of-bounds access to a small array.  */
>>        if (!MEM_P (to_rtx)
>>    && GET_MODE (to_rtx) != BLKmode
>> -   && known_ge (bitpos, GET_MODE_PRECISION (GET_MODE (to_rtx))))
>> +   && known_ge (bitpos, GET_MODE_BITSIZE (GET_MODE (to_rtx))))
>
> I think this has the chance to go wrong with regard to endianess.
> Consider to_rtx with 32bit mode size but 12bit mode precision.  bitpos
> is relative to the start of to_rtx as if it were in memory and bitsize
> determines the contiguous region affected.  But since we are actually
> storing into a non-memory endianess comes into play.
 
Not sure I follow the code well enough to comment, but: this condition
is saying when we can drop the assignment on the floor and just expand
the rhs (for side effects).  From that point of view, using bitsize
should be more conservative than using precision, since bitsize is
always >= precision.
 
Thanks,
Richard
 
>
> So no, I don't think the patch is correct, it would be way more
> complicated to get the desired improvement.
>
> Richard.
>
>>  {
>>    expand_normal (from);
>>    result = NULL;
>>

diff mbox series

Patch

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 80bb1b8a4c5..ac2b3c07df6 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -5574,7 +5574,7 @@  expand_assignment (tree to, tree from, bool nontemporal)
 	 code contains an out-of-bounds access to a small array.  */
       if (!MEM_P (to_rtx)
 	  && GET_MODE (to_rtx) != BLKmode
-	  && known_ge (bitpos, GET_MODE_PRECISION (GET_MODE (to_rtx))))
+	  && known_ge (bitpos, GET_MODE_BITSIZE (GET_MODE (to_rtx))))
 	{
 	  expand_normal (from);
 	  result = NULL;