x86-64: fix ZMM register state tracking

Message ID 5B8FD8B302000078001E5940@prv1-mh.provo.novell.com
State New, archived
Headers

Commit Message

Jan Beulich Sept. 5, 2018, 1:22 p.m. UTC
  The three AVX512 state components are entirely independent - one being
in its "init state" has no implication whatsoever on either of the other
two. Fully separate X86_XSTATE_ZMM_H and X86_XSTATE_ZMM handling, to
prevent upper halves of the upper 16 ZMM registers to display as if they
were zero (when they aren't) after e.g. VZEROALL/VZEROUPPER.

gdb/
2018-09-05  Jan Beulich  <jbeulich@suse.com>

	* i387-tdep.c (i387_supply_xsave): Split handling of
	X86_XSTATE_ZMM_H and X86_XSTATE_ZMM.
	(i387_collect_xsave): Likewise.
  

Comments

Simon Marchi Sept. 7, 2018, 11:12 p.m. UTC | #1
Hi Jan,

Would it be possible to update or create a test to exercise that?
arch-specific tests are in testsuite/gdb.arch.

I have some comments about the form.  Tim, could you or somebody
else from Intel take a look at the functional side of the changes?

On 2018-09-05 02:22 PM, Jan Beulich wrote:
> The three AVX512 state components are entirely independent - one being
> in its "init state" has no implication whatsoever on either of the other
> two. Fully separate X86_XSTATE_ZMM_H and X86_XSTATE_ZMM handling, to
> prevent upper halves of the upper 16 ZMM registers to display as if they
> were zero (when they aren't) after e.g. VZEROALL/VZEROUPPER.
> 
> gdb/
> 2018-09-05  Jan Beulich  <jbeulich@suse.com>
> 
> 	* i387-tdep.c (i387_supply_xsave): Split handling of
> 	X86_XSTATE_ZMM_H and X86_XSTATE_ZMM.
> 	(i387_collect_xsave): Likewise.
> 
> --- a/gdb/i387-tdep.c
> +++ b/gdb/i387-tdep.c
> @@ -923,7 +923,8 @@ i387_supply_xsave (struct regcache *regc
>    enum bfd_endian byte_order = gdbarch_byte_order (gdbarch);
>    struct gdbarch_tdep *tdep = gdbarch_tdep (gdbarch);
>    const gdb_byte *regs = (const gdb_byte *) xsave;
> -  int i;
> +  int i, zmm_endlo_regnum = I387_ZMM0H_REGNUM (tdep)
> +			    + std::min (tdep->num_zmm_regs, 16);

The GNU standard requires to parenthesis when breaking lines:

  int i, zmm_endlo_regnum = (I387_ZMM0H_REGNUM (tdep)
			     + std::min (tdep->num_zmm_regs, 16));

Simon
  
Jan Beulich Sept. 10, 2018, 6:25 a.m. UTC | #2
>>> Simon Marchi <simon.marchi@ericsson.com> 09/08/18 1:13 AM >>>
>Would it be possible to update or create a test to exercise that?
>arch-specific tests are in testsuite/gdb.arch.

I'm sure it would be possible, but while I was happy to invest the time to
fix the actual bug (because it affects work I'm doing), I'm afraid I don't have
the time to learn how gdb test cases are to be constructed (I'm familiar
only with the binutils / gas side of things).


>On 2018-09-05 02:22 PM, Jan Beulich wrote:
>> --- a/gdb/i387-tdep.c
>> +++ b/gdb/i387-tdep.c
>> @@ -923,7 +923,8 @@ i387_supply_xsave (struct regcache *regc
>>    enum bfd_endian byte_order = gdbarch_byte_order (gdbarch);
>>    struct gdbarch_tdep *tdep = gdbarch_tdep (gdbarch);
>>    const gdb_byte *regs = (const gdb_byte *) xsave;
>> -  int i;
>> +  int i, zmm_endlo_regnum = I387_ZMM0H_REGNUM (tdep)
>> +			    + std::min (tdep->num_zmm_regs, 16);
>
>The GNU standard requires to parenthesis when breaking lines:
>
>int i, zmm_endlo_regnum = (I387_ZMM0H_REGNUM (tdep)
>+ std::min (tdep->num_zmm_regs, 16));

This is easy enough to fix (perhaps even without the need to send a v2,
but just before/while committing?); I don't think this is a rule being
enforced on the binutils side, so I simply wasn't aware.

Jan
  
Simon Marchi Sept. 10, 2018, 1:01 p.m. UTC | #3
On 2018-09-10 07:25, Jan Beulich wrote:
>>>> Simon Marchi <simon.marchi@ericsson.com> 09/08/18 1:13 AM >>>
>> Would it be possible to update or create a test to exercise that?
>> arch-specific tests are in testsuite/gdb.arch.
> 
> I'm sure it would be possible, but while I was happy to invest the time 
> to
> fix the actual bug (because it affects work I'm doing), I'm afraid I 
> don't have
> the time to learn how gdb test cases are to be constructed (I'm 
> familiar
> only with the binutils / gas side of things).

I understand.  If you can provide:

  - a minimal source file (C + assembly in this case, I suppose)
  - GDB commands to reproduce the bug
  - actual and expected output

I can take care of turning it in a GDB test case.

>> On 2018-09-05 02:22 PM, Jan Beulich wrote:
>>> --- a/gdb/i387-tdep.c
>>> +++ b/gdb/i387-tdep.c
>>> @@ -923,7 +923,8 @@ i387_supply_xsave (struct regcache *regc
>>>    enum bfd_endian byte_order = gdbarch_byte_order (gdbarch);
>>>    struct gdbarch_tdep *tdep = gdbarch_tdep (gdbarch);
>>>    const gdb_byte *regs = (const gdb_byte *) xsave;
>>> -  int i;
>>> +  int i, zmm_endlo_regnum = I387_ZMM0H_REGNUM (tdep)
>>> +			    + std::min (tdep->num_zmm_regs, 16);
>> 
>> The GNU standard requires to parenthesis when breaking lines:
>> 
>> int i, zmm_endlo_regnum = (I387_ZMM0H_REGNUM (tdep)
>> + std::min (tdep->num_zmm_regs, 16));
> 
> This is easy enough to fix (perhaps even without the need to send a v2,
> but just before/while committing?); I don't think this is a rule being
> enforced on the binutils side, so I simply wasn't aware.

Yeah, no problem, I can do it.  It's suggested in the GNU coding 
standards, so I think it would apply to binutils too:

   https://www.gnu.org/prep/standards/html_node/Formatting.html

Search for "Adding a set of parentheses".

Simon
  
Pedro Alves Sept. 11, 2018, 10:34 a.m. UTC | #4
On 09/05/2018 02:22 PM, Jan Beulich wrote:
> The three AVX512 state components are entirely independent - one being
> in its "init state" has no implication whatsoever on either of the other
> two. Fully separate X86_XSTATE_ZMM_H and X86_XSTATE_ZMM handling, to
> prevent upper halves of the upper 16 ZMM registers to display as if they
> were zero (when they aren't) after e.g. VZEROALL/VZEROUPPER.
> 
> gdb/
> 2018-09-05  Jan Beulich  <jbeulich@suse.com>
> 
> 	* i387-tdep.c (i387_supply_xsave): Split handling of
> 	X86_XSTATE_ZMM_H and X86_XSTATE_ZMM.
> 	(i387_collect_xsave): Likewise.

Does gdb/gdbserver/i387-fp.c need similar treatment?

Thanks,
Pedro Alves
  
Jan Beulich Sept. 11, 2018, 11:34 a.m. UTC | #5
>>> On 11.09.18 at 12:34, <palves@redhat.com> wrote:
> On 09/05/2018 02:22 PM, Jan Beulich wrote:
>> The three AVX512 state components are entirely independent - one being
>> in its "init state" has no implication whatsoever on either of the other
>> two. Fully separate X86_XSTATE_ZMM_H and X86_XSTATE_ZMM handling, to
>> prevent upper halves of the upper 16 ZMM registers to display as if they
>> were zero (when they aren't) after e.g. VZEROALL/VZEROUPPER.
>> 
>> gdb/
>> 2018-09-05  Jan Beulich  <jbeulich@suse.com>
>> 
>> 	* i387-tdep.c (i387_supply_xsave): Split handling of
>> 	X86_XSTATE_ZMM_H and X86_XSTATE_ZMM.
>> 	(i387_collect_xsave): Likewise.
> 
> Does gdb/gdbserver/i387-fp.c need similar treatment?

Not afaics - there's no place where both flags would be tested at
the same time (other than was the case here before this patch).

Jan
  
Jan Beulich Sept. 18, 2018, 1:37 p.m. UTC | #6
>>> On 10.09.18 at 15:01, <simon.marchi@polymtl.ca> wrote:
> On 2018-09-10 07:25, Jan Beulich wrote:
>>>>> Simon Marchi <simon.marchi@ericsson.com> 09/08/18 1:13 AM >>>
>>> Would it be possible to update or create a test to exercise that?
>>> arch-specific tests are in testsuite/gdb.arch.
>> 
>> I'm sure it would be possible, but while I was happy to invest the time 
>> to
>> fix the actual bug (because it affects work I'm doing), I'm afraid I 
>> don't have
>> the time to learn how gdb test cases are to be constructed (I'm 
>> familiar
>> only with the binutils / gas side of things).
> 
> I understand.  If you can provide:
> 
>   - a minimal source file (C + assembly in this case, I suppose)
>   - GDB commands to reproduce the bug
>   - actual and expected output

Attached. vzero.s is the source file used (no C file needed). gdb.log
is a transcript of a session with a broken gdb (the one installed on
the system), while gdb.txt is a transcript for the fixed one that I've
built myself.

> I can take care of turning it in a GDB test case.

Thanks.

Jan
[jbeulich@dus-dev-sles15v ~]$ gcc -Wall -W -o vzero{,.s} && gdbx ./vzero
GNU gdb (GDB) 8.1.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./vzero...done.
(gdb) break main
Breakpoint 1 at 0x400497
(gdb) r
Starting program: /home/jbeulich/vzero 

Breakpoint 1, 0x0000000000400497 in main ()
(gdb) disp/i $pc
2: x/i $pc
=> 0x400497 <main>:	vpternlogd zmm0,zmm0,zmm0,0xff
(gdb) si
0x000000000040049e in main ()
2: x/i $pc
=> 0x40049e <main+7>:	vpternlogd zmm16,zmm0,zmm0,0xff
(gdb) 
0x00000000004004a5 in main ()
2: x/i $pc
=> 0x4004a5 <main+14>:	vzeroupper 
(gdb) print $zmm0.v16_int32
$1 = {-1 <repeats 16 times>}
(gdb) print $zmm16.v16_int32
$2 = {-1 <repeats 16 times>}
(gdb) si
0x00000000004004a8 in main ()
2: x/i $pc
=> 0x4004a8 <main+17>:	xor    eax,eax
(gdb) print $zmm16.v16_int32
$3 = {-1 <repeats 16 times>}
(gdb) q
A debugging session is active.

	Inferior 1 [process 56857] will be killed.

Quit anyway? (y or n) y
  
Metzger, Markus T Sept. 24, 2018, 5:19 p.m. UTC | #7
Hello Jan,

> The three AVX512 state components are entirely independent - one being in its "init
> state" has no implication whatsoever on either of the other two. Fully separate
> X86_XSTATE_ZMM_H and X86_XSTATE_ZMM handling, to prevent upper halves of
> the upper 16 ZMM registers to display as if they were zero (when they aren't) after
> e.g. VZEROALL/VZEROUPPER.
> 
> gdb/
> 2018-09-05  Jan Beulich  <jbeulich@suse.com>
> 
> 	* i387-tdep.c (i387_supply_xsave): Split handling of
> 	X86_XSTATE_ZMM_H and X86_XSTATE_ZMM.
> 	(i387_collect_xsave): Likewise.
> 
> --- a/gdb/i387-tdep.c
> +++ b/gdb/i387-tdep.c
> @@ -923,7 +923,8 @@ i387_supply_xsave (struct regcache *regc
>    enum bfd_endian byte_order = gdbarch_byte_order (gdbarch);
>    struct gdbarch_tdep *tdep = gdbarch_tdep (gdbarch);
>    const gdb_byte *regs = (const gdb_byte *) xsave;
> -  int i;
> +  int i, zmm_endlo_regnum = I387_ZMM0H_REGNUM (tdep)
> +			    + std::min (tdep->num_zmm_regs, 16);

It would be nice to comment on this magic 16 and the min operation.
It's how XSAVE organizes things but it isn't entirely intuitive.


>    ULONGEST clear_bv;
>    static const gdb_byte zero[I386_MAX_REGISTER_SIZE] = { 0 };
>    enum
> @@ -1002,7 +1003,8 @@ i387_supply_xsave (struct regcache *regc
>        return;
> 
>      case avx512_zmm_h:
> -      if ((clear_bv & (X86_XSTATE_ZMM_H | X86_XSTATE_ZMM)))
> +      if ((clear_bv & (regnum < zmm_endlo_regnum ? X86_XSTATE_ZMM_H
> +						 : X86_XSTATE_ZMM)))

A comment that XSAVE stores the lower 16 registers in a different place
than the higher 16 registers and also guards them by different XCR0 bits
would be nice.

We hid the different places behind those XSAVE_AVX512_ZMM_H_ADDR
macros but there's nothing similar for the guard bits.  Maybe add macros
for the guard check, as well?


>  	regcache->raw_supply (regnum, zero);
>        else
>  	regcache->raw_supply (regnum,
> @@ -1080,21 +1082,17 @@ i387_supply_xsave (struct regcache *regc
>  	    }
>  	}
> 
> -      /* Handle the upper ZMM registers.  */
> -      if ((tdep->xcr0 & (X86_XSTATE_ZMM_H | X86_XSTATE_ZMM)))
> +      /* Handle the upper halves of the low 8/16 ZMM registers.  */
> +      if ((tdep->xcr0 & X86_XSTATE_ZMM_H))
>  	{
> -	  if ((clear_bv & (X86_XSTATE_ZMM_H | X86_XSTATE_ZMM)))
> +	  if ((clear_bv & X86_XSTATE_ZMM_H))
>  	    {
> -	      for (i = I387_ZMM0H_REGNUM (tdep);
> -		   i < I387_ZMMENDH_REGNUM (tdep);
> -		   i++)
> +	      for (i = I387_ZMM0H_REGNUM (tdep); i < zmm_endlo_regnum; i++)
>  		regcache->raw_supply (i, zero);
>  	    }
>  	  else
>  	    {
> -	      for (i = I387_ZMM0H_REGNUM (tdep);
> -		   i < I387_ZMMENDH_REGNUM (tdep);
> -		   i++)
> +	      for (i = I387_ZMM0H_REGNUM (tdep); i < zmm_endlo_regnum; i++)
>  		regcache->raw_supply (i,
>  				      XSAVE_AVX512_ZMM_H_ADDR (tdep, regs, i));
>  	    }
> @@ -1119,11 +1117,13 @@ i387_supply_xsave (struct regcache *regc
>  	    }
>  	}
> 
> -      /* Handle the YMM_AVX512 registers.  */
> +      /* Handle the upper 16 ZMM/YMM/XMM registers (if any).  */
>        if ((tdep->xcr0 & X86_XSTATE_ZMM))
>  	{
>  	  if ((clear_bv & X86_XSTATE_ZMM))
>  	    {
> +	      for (i = zmm_endlo_regnum; i < I387_ZMMENDH_REGNUM (tdep); i++)
> +		regcache->raw_supply (i, zero);
>  	      for (i = I387_YMM16H_REGNUM (tdep);
>  		   i < I387_YMMH_AVX512_END_REGNUM (tdep);
>  		   i++)
> @@ -1135,6 +1135,9 @@ i387_supply_xsave (struct regcache *regc
>  	    }
>  	  else
>  	    {
> +	      for (i = zmm_endlo_regnum; i < I387_ZMMENDH_REGNUM (tdep); i++)
> +		regcache->raw_supply (i,
> +				      XSAVE_AVX512_ZMM_H_ADDR (tdep, regs, i));

This covers the upper halves of zmm16 to zmm31.  Looking at the function it looks
like the lower halves are covered in separate cases avx512_ymmh_avx512 and
avx512_xmmh_avx512.  Maybe reflect this in the comment?  It currently suggests
that it handles the entire upper 16 registers.


>  	      for (i = I387_YMM16H_REGNUM (tdep);
>  		   i < I387_YMMH_AVX512_END_REGNUM (tdep);
>  		   i++)
> @@ -1340,7 +1343,8 @@ i387_collect_xsave (const struct regcach
>    gdb_byte *p, *regs = (gdb_byte *) xsave;
>    gdb_byte raw[I386_MAX_REGISTER_SIZE];
>    ULONGEST initial_xstate_bv, clear_bv, xstate_bv = 0;
> -  unsigned int i;
> +  unsigned int i, zmm_endlo_regnum = I387_ZMM0H_REGNUM (tdep)
> +				     + std::min (tdep->num_zmm_regs, 16);
>    enum
>      {
>        x87_ctrl_or_mxcsr = 0x1,
> @@ -1441,9 +1445,8 @@ i387_collect_xsave (const struct regcach
>  	     i < I387_MPXEND_REGNUM (tdep); i++)
>  	  memset (XSAVE_MPX_ADDR (tdep, regs, i), 0, 8);
> 
> -      if ((clear_bv & (X86_XSTATE_ZMM_H | X86_XSTATE_ZMM)))
> -	for (i = I387_ZMM0H_REGNUM (tdep);
> -	     i < I387_ZMMENDH_REGNUM (tdep); i++)
> +      if ((clear_bv & X86_XSTATE_ZMM_H))
> +	for (i = I387_ZMM0H_REGNUM (tdep); i < zmm_endlo_regnum; i++)
>  	  memset (XSAVE_AVX512_ZMM_H_ADDR (tdep, regs, i), 0, 32);
> 
>        if ((clear_bv & X86_XSTATE_K))
> @@ -1453,6 +1456,8 @@ i387_collect_xsave (const struct regcach
> 
>        if ((clear_bv & X86_XSTATE_ZMM))
>  	{
> +	  for (i = zmm_endlo_regnum; i < I387_ZMMENDH_REGNUM (tdep); i++)
> +	    memset (XSAVE_AVX512_ZMM_H_ADDR (tdep, regs, i), 0, 32);
>  	  for (i = I387_YMM16H_REGNUM (tdep);
>  	       i < I387_YMMH_AVX512_END_REGNUM (tdep); i++)
>  	    memset (XSAVE_YMM_AVX512_ADDR (tdep, regs, i), 0, 16);

Looks OK to me.

Regards,
Markus.

Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Christian Lamprechter
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928
  
Jan Beulich Sept. 25, 2018, 7:46 a.m. UTC | #8
>>> On 24.09.18 at 19:19, <markus.t.metzger@intel.com> wrote:
>> --- a/gdb/i387-tdep.c
>> +++ b/gdb/i387-tdep.c
>> @@ -923,7 +923,8 @@ i387_supply_xsave (struct regcache *regc
>>    enum bfd_endian byte_order = gdbarch_byte_order (gdbarch);
>>    struct gdbarch_tdep *tdep = gdbarch_tdep (gdbarch);
>>    const gdb_byte *regs = (const gdb_byte *) xsave;
>> -  int i;
>> +  int i, zmm_endlo_regnum = I387_ZMM0H_REGNUM (tdep)
>> +			    + std::min (tdep->num_zmm_regs, 16);
> 
> It would be nice to comment on this magic 16 and the min operation.
> It's how XSAVE organizes things but it isn't entirely intuitive.

Okay, I can accept the desire for a (new) comment here, albeit it pretty
much goes along the lines of what I say further down.

>> @@ -1002,7 +1003,8 @@ i387_supply_xsave (struct regcache *regc
>>        return;
>> 
>>      case avx512_zmm_h:
>> -      if ((clear_bv & (X86_XSTATE_ZMM_H | X86_XSTATE_ZMM)))
>> +      if ((clear_bv & (regnum < zmm_endlo_regnum ? X86_XSTATE_ZMM_H
>> +						 : X86_XSTATE_ZMM)))
> 
> A comment that XSAVE stores the lower 16 registers in a different place
> than the higher 16 registers and also guards them by different XCR0 bits
> would be nice.

Such a comment would have been appropriate already prior to my patch.
I have to admit that I'm not inclined to invest time in improving
pre-existing (lack of) commentary, the more that how to exactly word
this might be controversial. If that's really a requirement, then I'd
abandon the patch (perhaps for someone else to pick up), maintaining it
locally until the bug has been fixed upstream by whatever means. This
is even more so that the extra time I'll have to put into playing with
Simon's test case is also nothing I really have time for (but I'll try to
find the time nevertheless).

> We hid the different places behind those XSAVE_AVX512_ZMM_H_ADDR
> macros but there's nothing similar for the guard bits.  Maybe add macros
> for the guard check, as well?

Same here - nothing that this patch really changes.

>> @@ -1135,6 +1135,9 @@ i387_supply_xsave (struct regcache *regc
>>  	    }
>>  	  else
>>  	    {
>> +	      for (i = zmm_endlo_regnum; i < I387_ZMMENDH_REGNUM (tdep); i++)
>> +		regcache->raw_supply (i,
>> +				      XSAVE_AVX512_ZMM_H_ADDR (tdep, regs, i));
> 
> This covers the upper halves of zmm16 to zmm31.  Looking at the function it looks
> like the lower halves are covered in separate cases avx512_ymmh_avx512 and
> avx512_xmmh_avx512.  Maybe reflect this in the comment?  It currently suggests
> that it handles the entire upper 16 registers.

Same here - the split among code regions has been there before (also
for e.g. the YMM register lower and upper halves).

Irrespective of my mostly negative responses thanks for your
comments, Jan
  
Jan Beulich Oct. 29, 2018, 10:31 a.m. UTC | #9
>>> On 10.10.18 at 17:12,  wrote:
> The three AVX512 state components are entirely independent - one being
> in its "init state" has no implication whatsoever on either of the other
> two. Fully separate X86_XSTATE_ZMM_H and X86_XSTATE_ZMM handling, to
> prevent upper halves of the upper 16 ZMM registers to display as if they
> were zero (when they aren't) after e.g. VZEROALL/VZEROUPPER.
> 
> gdb/
> 2018-10-10  Jan Beulich  <jbeulich@suse.com>
> 
> 	* i387-tdep.c (i387_supply_xsave): Split handling of
> 	X86_XSTATE_ZMM_H and X86_XSTATE_ZMM.
> 	(i387_collect_xsave): Likewise.
> 
> gdb/testsuite/
> 2018-10-10  Simon Marchi <simon.marchi@polymtl.ca>
> 
> 	* testsuite/gdb.arch/i386-avx512.c,
> 	testsuite/gdb.arch/i386-avx512.exp: Add 7th test.
> 
> ---
> v2: Attach comments to zmm_endlo_regnum declarations. Add testcase
>     provided by Simon.
> 
> --- a/gdb/i387-tdep.c
> +++ b/gdb/i387-tdep.c
> @@ -924,6 +924,12 @@ i387_supply_xsave (struct regcache *regc
>    struct gdbarch_tdep *tdep = gdbarch_tdep (gdbarch);
>    const gdb_byte *regs = (const gdb_byte *) xsave;
>    int i;
> +  /* In 64-bit mode the split between "low" and "high" ZMM registers is at
> +     ZMM16.  Outside of 64-bit mode there are no "high" ZMM registers at 
> all.
> +     Precalculate the number to be used for the split point, with the all
> +     registers in the "low" portion outside of 64-bit mode.  */
> +  unsigned int zmm_endlo_regnum = I387_ZMM0H_REGNUM (tdep)
> +				  + std::min (tdep->num_zmm_regs, 16);
>    ULONGEST clear_bv;
>    static const gdb_byte zero[I386_MAX_REGISTER_SIZE] = { 0 };
>    enum
> @@ -1002,7 +1008,8 @@ i387_supply_xsave (struct regcache *regc
>        return;
>  
>      case avx512_zmm_h:
> -      if ((clear_bv & (X86_XSTATE_ZMM_H | X86_XSTATE_ZMM)))
> +      if ((clear_bv & (regnum < zmm_endlo_regnum ? X86_XSTATE_ZMM_H
> +						 : X86_XSTATE_ZMM)))
>  	regcache->raw_supply (regnum, zero);
>        else
>  	regcache->raw_supply (regnum,
> @@ -1080,21 +1087,17 @@ i387_supply_xsave (struct regcache *regc
>  	    }
>  	}
>  
> -      /* Handle the upper ZMM registers.  */
> -      if ((tdep->xcr0 & (X86_XSTATE_ZMM_H | X86_XSTATE_ZMM)))
> +      /* Handle the upper halves of the low 8/16 ZMM registers.  */
> +      if ((tdep->xcr0 & X86_XSTATE_ZMM_H))
>  	{
> -	  if ((clear_bv & (X86_XSTATE_ZMM_H | X86_XSTATE_ZMM)))
> +	  if ((clear_bv & X86_XSTATE_ZMM_H))
>  	    {
> -	      for (i = I387_ZMM0H_REGNUM (tdep);
> -		   i < I387_ZMMENDH_REGNUM (tdep);
> -		   i++)
> +	      for (i = I387_ZMM0H_REGNUM (tdep); i < zmm_endlo_regnum; i++)
>  		regcache->raw_supply (i, zero);
>  	    }
>  	  else
>  	    {
> -	      for (i = I387_ZMM0H_REGNUM (tdep);
> -		   i < I387_ZMMENDH_REGNUM (tdep);
> -		   i++)
> +	      for (i = I387_ZMM0H_REGNUM (tdep); i < zmm_endlo_regnum; i++)
>  		regcache->raw_supply (i,
>  				      XSAVE_AVX512_ZMM_H_ADDR (tdep, regs, i));
>  	    }
> @@ -1119,11 +1122,13 @@ i387_supply_xsave (struct regcache *regc
>  	    }
>  	}
>  
> -      /* Handle the YMM_AVX512 registers.  */
> +      /* Handle the upper 16 ZMM/YMM/XMM registers (if any).  */
>        if ((tdep->xcr0 & X86_XSTATE_ZMM))
>  	{
>  	  if ((clear_bv & X86_XSTATE_ZMM))
>  	    {
> +	      for (i = zmm_endlo_regnum; i < I387_ZMMENDH_REGNUM (tdep); i++)
> +		regcache->raw_supply (i, zero);
>  	      for (i = I387_YMM16H_REGNUM (tdep);
>  		   i < I387_YMMH_AVX512_END_REGNUM (tdep);
>  		   i++)
> @@ -1135,6 +1140,9 @@ i387_supply_xsave (struct regcache *regc
>  	    }
>  	  else
>  	    {
> +	      for (i = zmm_endlo_regnum; i < I387_ZMMENDH_REGNUM (tdep); i++)
> +		regcache->raw_supply (i,
> +				      XSAVE_AVX512_ZMM_H_ADDR (tdep, regs, i));
>  	      for (i = I387_YMM16H_REGNUM (tdep);
>  		   i < I387_YMMH_AVX512_END_REGNUM (tdep);
>  		   i++)
> @@ -1341,6 +1349,9 @@ i387_collect_xsave (const struct regcach
>    gdb_byte raw[I386_MAX_REGISTER_SIZE];
>    ULONGEST initial_xstate_bv, clear_bv, xstate_bv = 0;
>    unsigned int i;
> +  /* See the comment in i387_supply_xsave().  */
> +  unsigned int zmm_endlo_regnum = I387_ZMM0H_REGNUM (tdep)
> +				  + std::min (tdep->num_zmm_regs, 16);
>    enum
>      {
>        x87_ctrl_or_mxcsr = 0x1,
> @@ -1441,9 +1452,8 @@ i387_collect_xsave (const struct regcach
>  	     i < I387_MPXEND_REGNUM (tdep); i++)
>  	  memset (XSAVE_MPX_ADDR (tdep, regs, i), 0, 8);
>  
> -      if ((clear_bv & (X86_XSTATE_ZMM_H | X86_XSTATE_ZMM)))
> -	for (i = I387_ZMM0H_REGNUM (tdep);
> -	     i < I387_ZMMENDH_REGNUM (tdep); i++)
> +      if ((clear_bv & X86_XSTATE_ZMM_H))
> +	for (i = I387_ZMM0H_REGNUM (tdep); i < zmm_endlo_regnum; i++)
>  	  memset (XSAVE_AVX512_ZMM_H_ADDR (tdep, regs, i), 0, 32);
>  
>        if ((clear_bv & X86_XSTATE_K))
> @@ -1453,6 +1463,8 @@ i387_collect_xsave (const struct regcach
>  
>        if ((clear_bv & X86_XSTATE_ZMM))
>  	{
> +	  for (i = zmm_endlo_regnum; i < I387_ZMMENDH_REGNUM (tdep); i++)
> +	    memset (XSAVE_AVX512_ZMM_H_ADDR (tdep, regs, i), 0, 32);
>  	  for (i = I387_YMM16H_REGNUM (tdep);
>  	       i < I387_YMMH_AVX512_END_REGNUM (tdep); i++)
>  	    memset (XSAVE_YMM_AVX512_ADDR (tdep, regs, i), 0, 16);
> --- a/gdb/testsuite/gdb.arch/i386-avx512.c
> +++ b/gdb/testsuite/gdb.arch/i386-avx512.c
> @@ -249,6 +249,13 @@ main (int argc, char **argv)
>  	 move back to array and check values.  */
>        move_zmm_data_to_memory ();
>        asm ("nop"); /* sixth breakpoint here  */
> +
> +      asm ("vpternlogd $0xff, %zmm0, %zmm0, %zmm0");
> +#ifdef __x86_64__
> +      asm ("vpternlogd $0xff, %zmm0, %zmm0, %zmm16");
> +#endif
> +      asm ("vzeroupper");
> +      asm ("nop"); /* seventh breakpoint here  */
>      }
>  
>    return 0;
> --- a/gdb/testsuite/gdb.arch/i386-avx512.exp
> +++ b/gdb/testsuite/gdb.arch/i386-avx512.exp
> @@ -174,3 +174,13 @@ for { set r 0 } { $r < $nr_regs } { incr
>          ".. = \\{f = \\{[expr $r + 30], [expr $r.125 + 30], [expr $r.25 + 
> 20], [expr $r.375 + 20], [expr $r.5 + 10], [expr $r.625 + 10], [expr $r.75 + 
> 10], [expr $r.875 + 10]\\}\\}.*" \
>          "check contents of zmm_data\[$r\] after writing XMM regs"
>  }
> +
> +gdb_test "break [gdb_get_line_number "seventh breakpoint here"]" \
> +    "Breakpoint .* at .*i386-avx512.c.*" \
> +    "set seventh breakpoint in main"
> +gdb_continue_to_breakpoint "continue to seventh breakpoint in main"
> +gdb_test "print \$zmm0.v16_int32" "= {-1, -1, -1, -1, 0 <repeats 12 
> times>}"
> +
> +if { $nr_regs >= 16 } {
> +    gdb_test "print \$zmm16.v16_int32" "= {-1 <repeats 16 times>}"
> +}
> 
> 
> 
>
  
Simon Marchi Oct. 31, 2018, 2 p.m. UTC | #10
On 2018-10-29 06:31, Jan Beulich wrote:
>>>> On 10.10.18 at 17:12,  wrote:
>> The three AVX512 state components are entirely independent - one being
>> in its "init state" has no implication whatsoever on either of the 
>> other
>> two. Fully separate X86_XSTATE_ZMM_H and X86_XSTATE_ZMM handling, to
>> prevent upper halves of the upper 16 ZMM registers to display as if 
>> they
>> were zero (when they aren't) after e.g. VZEROALL/VZEROUPPER.
>> 
>> gdb/
>> 2018-10-10  Jan Beulich  <jbeulich@suse.com>
>> 
>> 	* i387-tdep.c (i387_supply_xsave): Split handling of
>> 	X86_XSTATE_ZMM_H and X86_XSTATE_ZMM.
>> 	(i387_collect_xsave): Likewise.
>> 
>> gdb/testsuite/
>> 2018-10-10  Simon Marchi <simon.marchi@polymtl.ca>
>> 
>> 	* testsuite/gdb.arch/i386-avx512.c,
>> 	testsuite/gdb.arch/i386-avx512.exp: Add 7th test.
>> 
>> ---
>> v2: Attach comments to zmm_endlo_regnum declarations. Add testcase
>>     provided by Simon.

The testcase obviously LGTM.  I will let Markus approve the other 
changes.

Simon
  
Metzger, Markus T Nov. 7, 2018, 9:07 a.m. UTC | #11
> On 2018-10-29 06:31, Jan Beulich wrote:
> >>>> On 10.10.18 at 17:12,  wrote:
> >> The three AVX512 state components are entirely independent - one
> >> being in its "init state" has no implication whatsoever on either of
> >> the other two. Fully separate X86_XSTATE_ZMM_H and X86_XSTATE_ZMM
> >> handling, to prevent upper halves of the upper 16 ZMM registers to
> >> display as if they were zero (when they aren't) after e.g.
> >> VZEROALL/VZEROUPPER.
> >>
> >> gdb/
> >> 2018-10-10  Jan Beulich  <jbeulich@suse.com>
> >>
> >> 	* i387-tdep.c (i387_supply_xsave): Split handling of
> >> 	X86_XSTATE_ZMM_H and X86_XSTATE_ZMM.
> >> 	(i387_collect_xsave): Likewise.
> >>
> >> gdb/testsuite/
> >> 2018-10-10  Simon Marchi <simon.marchi@polymtl.ca>
> >>
> >> 	* testsuite/gdb.arch/i386-avx512.c,
> >> 	testsuite/gdb.arch/i386-avx512.exp: Add 7th test.
> >>
> >> ---
> >> v2: Attach comments to zmm_endlo_regnum declarations. Add testcase
> >>     provided by Simon.
> 
> The testcase obviously LGTM.  I will let Markus approve the other changes.

The code already looked good to me in v1.  Thanks for adding comments.

Markus.
Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Christian Lamprechter
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928
  
Jan Beulich Nov. 7, 2018, 9:11 a.m. UTC | #12
>>> On 07.11.18 at 10:07, <markus.t.metzger@intel.com> wrote:
>>  On 2018-10-29 06:31, Jan Beulich wrote:
>> >>>> On 10.10.18 at 17:12,  wrote:
>> >> The three AVX512 state components are entirely independent - one
>> >> being in its "init state" has no implication whatsoever on either of
>> >> the other two. Fully separate X86_XSTATE_ZMM_H and X86_XSTATE_ZMM
>> >> handling, to prevent upper halves of the upper 16 ZMM registers to
>> >> display as if they were zero (when they aren't) after e.g.
>> >> VZEROALL/VZEROUPPER.
>> >>
>> >> gdb/
>> >> 2018-10-10  Jan Beulich  <jbeulich@suse.com>
>> >>
>> >> 	* i387-tdep.c (i387_supply_xsave): Split handling of
>> >> 	X86_XSTATE_ZMM_H and X86_XSTATE_ZMM.
>> >> 	(i387_collect_xsave): Likewise.
>> >>
>> >> gdb/testsuite/
>> >> 2018-10-10  Simon Marchi <simon.marchi@polymtl.ca>
>> >>
>> >> 	* testsuite/gdb.arch/i386-avx512.c,
>> >> 	testsuite/gdb.arch/i386-avx512.exp: Add 7th test.
>> >>
>> >> ---
>> >> v2: Attach comments to zmm_endlo_regnum declarations. Add testcase
>> >>     provided by Simon.
>> 
>> The testcase obviously LGTM.  I will let Markus approve the other changes.
> 
> The code already looked good to me in v1.  Thanks for adding comments.

So can I translate this into an ack for me to commit the change?
Or else, who would be the one to give the go-ahead?

Jan
  
Metzger, Markus T Nov. 7, 2018, 1:18 p.m. UTC | #13
> >>> On 07.11.18 at 10:07, <markus.t.metzger@intel.com> wrote:
> >>  On 2018-10-29 06:31, Jan Beulich wrote:
> >> >>>> On 10.10.18 at 17:12,  wrote:
> >> >> The three AVX512 state components are entirely independent - one
> >> >> being in its "init state" has no implication whatsoever on either
> >> >> of the other two. Fully separate X86_XSTATE_ZMM_H and
> >> >> X86_XSTATE_ZMM handling, to prevent upper halves of the upper 16
> >> >> ZMM registers to display as if they were zero (when they aren't) after e.g.
> >> >> VZEROALL/VZEROUPPER.
> >> >>
> >> >> gdb/
> >> >> 2018-10-10  Jan Beulich  <jbeulich@suse.com>
> >> >>
> >> >> 	* i387-tdep.c (i387_supply_xsave): Split handling of
> >> >> 	X86_XSTATE_ZMM_H and X86_XSTATE_ZMM.
> >> >> 	(i387_collect_xsave): Likewise.
> >> >>
> >> >> gdb/testsuite/
> >> >> 2018-10-10  Simon Marchi <simon.marchi@polymtl.ca>
> >> >>
> >> >> 	* testsuite/gdb.arch/i386-avx512.c,
> >> >> 	testsuite/gdb.arch/i386-avx512.exp: Add 7th test.
> >> >>
> >> >> ---
> >> >> v2: Attach comments to zmm_endlo_regnum declarations. Add testcase
> >> >>     provided by Simon.
> >>
> >> The testcase obviously LGTM.  I will let Markus approve the other changes.
> >
> > The code already looked good to me in v1.  Thanks for adding comments.
> 
> So can I translate this into an ack for me to commit the change?
> Or else, who would be the one to give the go-ahead?

Simon can approve your patch.

IIRC Pedro had a question regarding gdbserver.  From a first look, it seems to get the
feature bits right but does not distinguish 32-bit and 64-bit mode regarding the number
of available registers.

Did you run the new test you added also in remote configuration?

Markus.

Intel Deutschland GmbH
Registered Address: Am Campeon 10-12, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Christian Lamprechter
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928
  
Jan Beulich Nov. 7, 2018, 1:25 p.m. UTC | #14
>>> On 07.11.18 at 14:18, <markus.t.metzger@intel.com> wrote:
>> >>> On 07.11.18 at 10:07, <markus.t.metzger@intel.com> wrote:
>> >>  On 2018-10-29 06:31, Jan Beulich wrote:
>> >> >>>> On 10.10.18 at 17:12,  wrote:
>> >> >> The three AVX512 state components are entirely independent - one
>> >> >> being in its "init state" has no implication whatsoever on either
>> >> >> of the other two. Fully separate X86_XSTATE_ZMM_H and
>> >> >> X86_XSTATE_ZMM handling, to prevent upper halves of the upper 16
>> >> >> ZMM registers to display as if they were zero (when they aren't) after 
> e.g.
>> >> >> VZEROALL/VZEROUPPER.
>> >> >>
>> >> >> gdb/
>> >> >> 2018-10-10  Jan Beulich  <jbeulich@suse.com>
>> >> >>
>> >> >> 	* i387-tdep.c (i387_supply_xsave): Split handling of
>> >> >> 	X86_XSTATE_ZMM_H and X86_XSTATE_ZMM.
>> >> >> 	(i387_collect_xsave): Likewise.
>> >> >>
>> >> >> gdb/testsuite/
>> >> >> 2018-10-10  Simon Marchi <simon.marchi@polymtl.ca>
>> >> >>
>> >> >> 	* testsuite/gdb.arch/i386-avx512.c,
>> >> >> 	testsuite/gdb.arch/i386-avx512.exp: Add 7th test.
>> >> >>
>> >> >> ---
>> >> >> v2: Attach comments to zmm_endlo_regnum declarations. Add testcase
>> >> >>     provided by Simon.
>> >>
>> >> The testcase obviously LGTM.  I will let Markus approve the other changes.
>> >
>> > The code already looked good to me in v1.  Thanks for adding comments.
>> 
>> So can I translate this into an ack for me to commit the change?
>> Or else, who would be the one to give the go-ahead?
> 
> Simon can approve your patch.
> 
> IIRC Pedro had a question regarding gdbserver.  From a first look, it seems to get the
> feature bits right but does not distinguish 32-bit and 64-bit mode regarding the number
> of available registers.

If there was a problem there, it was my understanding that this can
(and should) be fixed in a separate patch, by someone able to test
this.

> Did you run the new test you added also in remote configuration?

No, and I also have no idea how I would go about doing so.

Jan
  
Simon Marchi Nov. 7, 2018, 2:18 p.m. UTC | #15
On 2018-11-07 8:25 a.m., Jan Beulich wrote:
>>> So can I translate this into an ack for me to commit the change?

>>> Or else, who would be the one to give the go-ahead?

>>

>> Simon can approve your patch.


If Markus is happy with the comments, it's all fine with me, please push (I see
you are listed in MAINTAINERS, so I assume you have push access).

>>

>> IIRC Pedro had a question regarding gdbserver.  From a first look, it seems to get the

>> feature bits right but does not distinguish 32-bit and 64-bit mode regarding the number

>> of available registers.

> 

> If there was a problem there, it was my understanding that this can

> (and should) be fixed in a separate patch, by someone able to test

> this.


Well, you are able to test it (see commands below) :).  But indeed, I think we
should push this patch, which already improves GDB.  It would be nice to at least
get an idea of the status of the test when ran with gdbserver.  Since access to
this hardware is quite limited, it would be appreciated if you could give it a
quick shot and report the results.

>> Did you run the new test you added also in remote configuration?

> 

> No, and I also have no idea how I would go about doing so.


Run:

$ make check TESTS="gdb.arch/i386-avx512.exp" RUNTESTFLAGS="--target_board=native-gdbserver"
$ make check TESTS="gdb.arch/i386-avx512.exp" RUNTESTFLAGS="--target_board=native-extended-gdbserver"

in the gdb/ build directory.

Thanks,

Simon
  
Jan Beulich Nov. 8, 2018, 11:15 a.m. UTC | #16
>>> On 07.11.18 at 15:18, <simon.marchi@ericsson.com> wrote:
> On 2018-11-07 8:25 a.m., Jan Beulich wrote:
>>>> So can I translate this into an ack for me to commit the change?
>>>> Or else, who would be the one to give the go-ahead?
>>>
>>> Simon can approve your patch.
> 
> If Markus is happy with the comments, it's all fine with me, please push (I see
> you are listed in MAINTAINERS, so I assume you have push access).

Done.

>>> IIRC Pedro had a question regarding gdbserver.  From a first look, it seems to get the
>>> feature bits right but does not distinguish 32-bit and 64-bit mode regarding the number
>>> of available registers.
>> 
>> If there was a problem there, it was my understanding that this can
>> (and should) be fixed in a separate patch, by someone able to test
>> this.
> 
> Well, you are able to test it (see commands below) :).  But indeed, I think we
> should push this patch, which already improves GDB.  It would be nice to at least
> get an idea of the status of the test when ran with gdbserver.  Since access to
> this hardware is quite limited, it would be appreciated if you could give it a
> quick shot and report the results.
> 
>>> Did you run the new test you added also in remote configuration?
>> 
>> No, and I also have no idea how I would go about doing so.
> 
> Run:
> 
> $ make check TESTS="gdb.arch/i386-avx512.exp" RUNTESTFLAGS="--target_board=native-gdbserver"

I'm afraid there's more broken here than just this one test.
Log file attached. I hope you can make sense of it.

> $ make check TESTS="gdb.arch/i386-avx512.exp" RUNTESTFLAGS="--target_board=native-extended-gdbserver"

This produces effectively the same (bad) results.

Jan
  
Simon Marchi Nov. 8, 2018, 4:23 p.m. UTC | #17
On 2018-11-08 6:15 a.m., Jan Beulich wrote:
> I'm afraid there's more broken here than just this one test.
> Log file attached. I hope you can make sense of it.
> 
>> $ make check TESTS="gdb.arch/i386-avx512.exp" RUNTESTFLAGS="--target_board=native-extended-gdbserver"
> 
> This produces effectively the same (bad) results.

Thanks I opened:

https://sourceware.org/bugzilla/show_bug.cgi?id=23873

Simon
  

Patch

--- a/gdb/i387-tdep.c
+++ b/gdb/i387-tdep.c
@@ -923,7 +923,8 @@  i387_supply_xsave (struct regcache *regc
   enum bfd_endian byte_order = gdbarch_byte_order (gdbarch);
   struct gdbarch_tdep *tdep = gdbarch_tdep (gdbarch);
   const gdb_byte *regs = (const gdb_byte *) xsave;
-  int i;
+  int i, zmm_endlo_regnum = I387_ZMM0H_REGNUM (tdep)
+			    + std::min (tdep->num_zmm_regs, 16);
   ULONGEST clear_bv;
   static const gdb_byte zero[I386_MAX_REGISTER_SIZE] = { 0 };
   enum
@@ -1002,7 +1003,8 @@  i387_supply_xsave (struct regcache *regc
       return;
 
     case avx512_zmm_h:
-      if ((clear_bv & (X86_XSTATE_ZMM_H | X86_XSTATE_ZMM)))
+      if ((clear_bv & (regnum < zmm_endlo_regnum ? X86_XSTATE_ZMM_H
+						 : X86_XSTATE_ZMM)))
 	regcache->raw_supply (regnum, zero);
       else
 	regcache->raw_supply (regnum,
@@ -1080,21 +1082,17 @@  i387_supply_xsave (struct regcache *regc
 	    }
 	}
 
-      /* Handle the upper ZMM registers.  */
-      if ((tdep->xcr0 & (X86_XSTATE_ZMM_H | X86_XSTATE_ZMM)))
+      /* Handle the upper halves of the low 8/16 ZMM registers.  */
+      if ((tdep->xcr0 & X86_XSTATE_ZMM_H))
 	{
-	  if ((clear_bv & (X86_XSTATE_ZMM_H | X86_XSTATE_ZMM)))
+	  if ((clear_bv & X86_XSTATE_ZMM_H))
 	    {
-	      for (i = I387_ZMM0H_REGNUM (tdep);
-		   i < I387_ZMMENDH_REGNUM (tdep);
-		   i++)
+	      for (i = I387_ZMM0H_REGNUM (tdep); i < zmm_endlo_regnum; i++)
 		regcache->raw_supply (i, zero);
 	    }
 	  else
 	    {
-	      for (i = I387_ZMM0H_REGNUM (tdep);
-		   i < I387_ZMMENDH_REGNUM (tdep);
-		   i++)
+	      for (i = I387_ZMM0H_REGNUM (tdep); i < zmm_endlo_regnum; i++)
 		regcache->raw_supply (i,
 				      XSAVE_AVX512_ZMM_H_ADDR (tdep, regs, i));
 	    }
@@ -1119,11 +1117,13 @@  i387_supply_xsave (struct regcache *regc
 	    }
 	}
 
-      /* Handle the YMM_AVX512 registers.  */
+      /* Handle the upper 16 ZMM/YMM/XMM registers (if any).  */
       if ((tdep->xcr0 & X86_XSTATE_ZMM))
 	{
 	  if ((clear_bv & X86_XSTATE_ZMM))
 	    {
+	      for (i = zmm_endlo_regnum; i < I387_ZMMENDH_REGNUM (tdep); i++)
+		regcache->raw_supply (i, zero);
 	      for (i = I387_YMM16H_REGNUM (tdep);
 		   i < I387_YMMH_AVX512_END_REGNUM (tdep);
 		   i++)
@@ -1135,6 +1135,9 @@  i387_supply_xsave (struct regcache *regc
 	    }
 	  else
 	    {
+	      for (i = zmm_endlo_regnum; i < I387_ZMMENDH_REGNUM (tdep); i++)
+		regcache->raw_supply (i,
+				      XSAVE_AVX512_ZMM_H_ADDR (tdep, regs, i));
 	      for (i = I387_YMM16H_REGNUM (tdep);
 		   i < I387_YMMH_AVX512_END_REGNUM (tdep);
 		   i++)
@@ -1340,7 +1343,8 @@  i387_collect_xsave (const struct regcach
   gdb_byte *p, *regs = (gdb_byte *) xsave;
   gdb_byte raw[I386_MAX_REGISTER_SIZE];
   ULONGEST initial_xstate_bv, clear_bv, xstate_bv = 0;
-  unsigned int i;
+  unsigned int i, zmm_endlo_regnum = I387_ZMM0H_REGNUM (tdep)
+				     + std::min (tdep->num_zmm_regs, 16);
   enum
     {
       x87_ctrl_or_mxcsr = 0x1,
@@ -1441,9 +1445,8 @@  i387_collect_xsave (const struct regcach
 	     i < I387_MPXEND_REGNUM (tdep); i++)
 	  memset (XSAVE_MPX_ADDR (tdep, regs, i), 0, 8);
 
-      if ((clear_bv & (X86_XSTATE_ZMM_H | X86_XSTATE_ZMM)))
-	for (i = I387_ZMM0H_REGNUM (tdep);
-	     i < I387_ZMMENDH_REGNUM (tdep); i++)
+      if ((clear_bv & X86_XSTATE_ZMM_H))
+	for (i = I387_ZMM0H_REGNUM (tdep); i < zmm_endlo_regnum; i++)
 	  memset (XSAVE_AVX512_ZMM_H_ADDR (tdep, regs, i), 0, 32);
 
       if ((clear_bv & X86_XSTATE_K))
@@ -1453,6 +1456,8 @@  i387_collect_xsave (const struct regcach
 
       if ((clear_bv & X86_XSTATE_ZMM))
 	{
+	  for (i = zmm_endlo_regnum; i < I387_ZMMENDH_REGNUM (tdep); i++)
+	    memset (XSAVE_AVX512_ZMM_H_ADDR (tdep, regs, i), 0, 32);
 	  for (i = I387_YMM16H_REGNUM (tdep);
 	       i < I387_YMMH_AVX512_END_REGNUM (tdep); i++)
 	    memset (XSAVE_YMM_AVX512_ADDR (tdep, regs, i), 0, 16);