Message ID | CAK8P3a2=rVvvW8bBcrPm3nXNNP56Ygxchu4ThD2ZgZ_cVcd8wQ@mail.gmail.com |
---|---|
State | New, archived |
Headers |
Received: (qmail 74290 invoked by alias); 8 Aug 2018 16:01:54 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <libc-alpha.sourceware.org> List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 74180 invoked by uid 89); 8 Aug 2018 16:01:54 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-22.6 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_NUMSUBJECT, RCVD_IN_DNSWL_NONE, SPF_PASS, UNSUBSCRIBE_BODY autolearn=ham version=3.3.2 spammy=Hx-languages-length:2464, b_h, H*Ad:U*neko X-HELO: mail-qk0-f178.google.com MIME-Version: 1.0 References: <alpine.LRH.2.02.1808021242320.31834@file01.intranet.prod.int.rdu2.redhat.com> <CAHCPf3tFGqkYEcWNN4LaWThw_rVqT316pzLv6T7RfxwO-eZ0EA@mail.gmail.com> <alpine.LRH.2.02.1808030212340.17672@file01.intranet.prod.int.rdu2.redhat.com> <CAKv+Gu8DeuksZhk1g3q_msSKV_hSY_2e1uzVten9-oGO3j9Sqg@mail.gmail.com> <20180803094129.GB17798@arm.com> <alpine.LRH.2.02.1808031235410.31584@file01.intranet.prod.int.rdu2.redhat.com> <20180808113927.GA24736@iMac.local> <alpine.LRH.2.02.1808081011110.9997@file01.intranet.prod.int.rdu2.redhat.com> <b1a0dac5-3cf2-1a7f-ac7f-649126eb7873@arm.com> <20180808151444.GF24736@iMac.local> In-Reply-To: <20180808151444.GF24736@iMac.local> From: Arnd Bergmann <arnd@arndb.de> Date: Wed, 8 Aug 2018 18:01:31 +0200 Message-ID: <CAK8P3a2=rVvvW8bBcrPm3nXNNP56Ygxchu4ThD2ZgZ_cVcd8wQ@mail.gmail.com> Subject: Re: framebuffer corruption due to overlapping stp instructions on arm64 To: Catalin Marinas <catalin.marinas@arm.com> Cc: Richard.Earnshaw@arm.com, Mikulas Patocka <mpatocka@redhat.com>, Thomas Petazzoni <thomas.petazzoni@free-electrons.com>, Joao Pinto <Joao.Pinto@synopsys.com>, GNU C Library <libc-alpha@sourceware.org>, Ard Biesheuvel <ard.biesheuvel@linaro.org>, Jingoo Han <jingoohan1@gmail.com>, Will Deacon <will.deacon@arm.com>, Russell King - ARM Linux <linux@armlinux.org.uk>, Linux Kernel Mailing List <linux-kernel@vger.kernel.org>, neko@bakuhatsu.net, linux-pci <linux-pci@vger.kernel.org>, Linux ARM <linux-arm-kernel@lists.infradead.org> Content-Type: text/plain; charset="UTF-8" |
Commit Message
Arnd Bergmann
Aug. 8, 2018, 4:01 p.m. UTC
On Wed, Aug 8, 2018 at 5:15 PM Catalin Marinas <catalin.marinas@arm.com> wrote: > > On Wed, Aug 08, 2018 at 04:01:12PM +0100, Richard Earnshaw wrote: > > On 08/08/18 15:12, Mikulas Patocka wrote: > > > On Wed, 8 Aug 2018, Catalin Marinas wrote: > > >> On Fri, Aug 03, 2018 at 01:09:02PM -0400, Mikulas Patocka wrote: > - failing to write a few bytes > - writing a few bytes that were written 16 bytes before > - writing a few bytes that were written 16 bytes after > > > The overlapping writes in memcpy never write different values to the > > same location, so I still feel this must be some sort of HW issue, not a > > SW one. > > So do I (my interpretation is that it combines or rather skips some of > the writes to the same 16-byte address as it ignores the data strobes). Maybe it just always writes to the wrong location, 16 bytes apart for one of the stp instructions. Since we are usually dealing with a pair of overlapping 'stp', both unaligned, that could explain both the missing bytes (we write data to the wrong place, but overwrite it with the correct data right away) and the extra copy (we write it to the wrong place, but then write the correct data to the correct place as well). This sounds a bit like what the original ARM CPUs did on unaligned memory access, where a single aligned 4-byte location was accessed, but the bytes swapped around. There may be a few more things worth trying out or analysing from the recorded past failures to understand more about how it goes wrong: - For which data lengths does it fail? Having two overlapping unaligned stp is something that only happens for 16..96 byte memcpy. - What if we use a pair of str instructions instead of an stp in a modified memcpy? Does it now write to still write to the wrong place 16 bytes away, just 8 bytes away, or correctly? - Does it change in any way if we do the overlapping writes in the reverse order? E.g. for the 16..64 byte case: Arnd
Comments
On Wed, 8 Aug 2018, Arnd Bergmann wrote: > On Wed, Aug 8, 2018 at 5:15 PM Catalin Marinas <catalin.marinas@arm.com> wrote: > > > > On Wed, Aug 08, 2018 at 04:01:12PM +0100, Richard Earnshaw wrote: > > > On 08/08/18 15:12, Mikulas Patocka wrote: > > > > On Wed, 8 Aug 2018, Catalin Marinas wrote: > > > >> On Fri, Aug 03, 2018 at 01:09:02PM -0400, Mikulas Patocka wrote: > > - failing to write a few bytes > > - writing a few bytes that were written 16 bytes before > > - writing a few bytes that were written 16 bytes after > > > > > The overlapping writes in memcpy never write different values to the > > > same location, so I still feel this must be some sort of HW issue, not a > > > SW one. > > > > So do I (my interpretation is that it combines or rather skips some of > > the writes to the same 16-byte address as it ignores the data strobes). > > Maybe it just always writes to the wrong location, 16 bytes apart for one of > the stp instructions. Since we are usually dealing with a pair of overlapping > 'stp', both unaligned, that could explain both the missing bytes (we write > data to the wrong place, but overwrite it with the correct data right away) > and the extra copy (we write it to the wrong place, but then write the correct > data to the correct place as well). > > This sounds a bit like what the original ARM CPUs did on unaligned > memory access, where a single aligned 4-byte location was accessed, > but the bytes swapped around. > > There may be a few more things worth trying out or analysing from > the recorded past failures to understand more about how it goes > wrong: > > - For which data lengths does it fail? Having two overlapping > unaligned stp is something that only happens for 16..96 byte > memcpy. If you want to research the corruptions in detail, I uploaded a file containing 7k corruptions here: http://people.redhat.com/~mpatocka/testcases/arm-pcie-corruption/ > - What if we use a pair of str instructions instead of an stp in > a modified memcpy? Does it now write to still write to the > wrong place 16 bytes away, just 8 bytes away, or correctly? I replaced all stp instructions with str and it didn't have effect on corruptions. Either a few bytes is omitted, or a value that belongs 16 bytes before or after is written. > - Does it change in any way if we do the overlapping writes > in the reverse order? E.g. for the 16..64 byte case: > > diff --git a/sysdeps/aarch64/memcpy.S b/sysdeps/aarch64/memcpy.S > index 7e1163e6a0..09d0160bdf 100644 > --- a/sysdeps/aarch64/memcpy.S > +++ b/sysdeps/aarch64/memcpy.S > @@ -102,11 +102,11 @@ ENTRY (MEMCPY) > tbz tmp1, 5, 1f > ldp B_l, B_h, [src, 16] > ldp C_l, C_h, [srcend, -32] > - stp B_l, B_h, [dstin, 16] > stp C_l, C_h, [dstend, -32] > + stp B_l, B_h, [dstin, 16] > 1: > - stp A_l, A_h, [dstin] > stp D_l, D_h, [dstend, -16] > + stp A_l, A_h, [dstin] > ret > > .p2align 4 > > Arnd After reordering them, I observe only omitted writes, there are no longer misdirected writes: http://people.redhat.com/~mpatocka/testcases/arm-pcie-corruption/reorder-test/ Mikulas
On Wed, Aug 8, 2018 at 8:25 PM Mikulas Patocka <mpatocka@redhat.com> wrote: > On Wed, 8 Aug 2018, Arnd Bergmann wrote: > > > On Wed, Aug 8, 2018 at 5:15 PM Catalin Marinas <catalin.marinas@arm.com> wrote: > > > > > > On Wed, Aug 08, 2018 at 04:01:12PM +0100, Richard Earnshaw wrote: > > > > On 08/08/18 15:12, Mikulas Patocka wrote: > > > > > On Wed, 8 Aug 2018, Catalin Marinas wrote: > > > > >> On Fri, Aug 03, 2018 at 01:09:02PM -0400, Mikulas Patocka wrote: > > > - failing to write a few bytes > > > - writing a few bytes that were written 16 bytes before > > > - writing a few bytes that were written 16 bytes after > > > > > > > The overlapping writes in memcpy never write different values to the > > > > same location, so I still feel this must be some sort of HW issue, not a > > > > SW one. > > > > > > So do I (my interpretation is that it combines or rather skips some of > > > the writes to the same 16-byte address as it ignores the data strobes). > > > > Maybe it just always writes to the wrong location, 16 bytes apart for one of > > the stp instructions. Since we are usually dealing with a pair of overlapping > > 'stp', both unaligned, that could explain both the missing bytes (we write > > data to the wrong place, but overwrite it with the correct data right away) > > and the extra copy (we write it to the wrong place, but then write the correct > > data to the correct place as well). > > > > This sounds a bit like what the original ARM CPUs did on unaligned > > memory access, where a single aligned 4-byte location was accessed, > > but the bytes swapped around. > > > > There may be a few more things worth trying out or analysing from > > the recorded past failures to understand more about how it goes > > wrong: > > > > - For which data lengths does it fail? Having two overlapping > > unaligned stp is something that only happens for 16..96 byte > > memcpy. > > If you want to research the corruptions in detail, I uploaded a file > containing 7k corruptions here: > http://people.redhat.com/~mpatocka/testcases/arm-pcie-corruption/ Nice! I already found a couple of things: - Failure to copy always happens at the *end* of a 16 byte aligned physical address, it misses between 1 and 6 bytes, never 7 or more, and it's more likely to be fewer bytes that are affected. 279 7 389 6 484 5 683 4 741 3 836 2 946 1 - The first byte that fails to get copied is always 16 bytes after the memcpy target. Since we only observe it at the end of the 16 byte range, it means this happens specifically for addresses ending in 0x9 (7 bytes missed) to 0xf (1 byte missed). - Out of 7445 corruptions, 4358 were of the kind that misses a copy at the end of a 16-byte area, they were for copies between 41 and 64 bytes, more to the larger end of the scale (note that with your test program, smaller memcpys happen more frequenly than larger ones). 47 0x29 36 0x2a 47 0x2b 23 0x2c 29 0x2d 31 0x2e 36 0x2f 46 0x30 45 0x31 51 0x32 62 0x33 64 0x34 77 0x35 91 0x36 90 0x37 100 0x38 100 0x39 209 0x3a 279 0x3b 366 0x3c 498 0x3d 602 0x3e 682 0x3f 747 0x40 - All corruption with data copied to the wrong place happened for copies between 33 and 47 bytes, mostly to the smaller end of the scale: 391 0x21 360 0x22 319 0x23 273 0x24 273 0x25 241 0x26 224 0x27 221 0x28 231 0x29 208 0x2a 163 0x2b 86 0x2c 63 0x2d 33 0x2e 1 0x2f - One common (but not the only, still investigating) case for data getting written to the wrong place is: * corruption starts 16 bytes after the memcpy start * corrupt bytes are the same as the bytes written to the start * start address ends in 0x1 through 0x7 * length of corruption is at most memcpy length- 32, always between 1 and 7. Arnd
On Wed, Aug 8, 2018 at 11:51 PM Arnd Bergmann <arnd@arndb.de> wrote: > I already found a couple of things: > > - Failure to copy always happens at the *end* of a 16 byte aligned > physical address, it misses between 1 and 6 bytes, never 7 or more, > and it's more likely to be fewer bytes that are affected. > > - The first byte that fails to get copied is always 16 bytes after the > memcpy target. Since we only observe it at the end of the 16 byte > range, it means this happens specifically for addresses ending in > 0x9 (7 bytes missed) to 0xf (1 byte missed). > > - Out of 7445 corruptions, 4358 were of the kind that misses a copy at the > end of a 16-byte area, they were for copies between 41 and 64 bytes, > more to the larger end of the scale (note that with your test program, > smaller memcpys happen more frequenly than larger ones). Thinking about it some more, this scenario can be explained by a read-modify-write logic gone wrong somewhere in the hardware, leading to the original bytes being written back after we write the correct data. The code path we hit most commonly in glibc is like this one: // offset = 0xd, could be 0x9..0xf + n*0x10 // length = 0x3f, could be 0x29..0x3f memcpy(map + 0xd, data + 0xd, 0x3f); stp B_l, B_h, [dstin, 16] # offset 0x1d stp C_l, C_h, [dstend, -32] # offset 0x2c stp A_l, A_h, [dstin] # offset 0x0d stp D_l, D_h, [dstend, -16] # offset 0x3c The corruption here always appears in bytes 0x1d..0x1f. A theory that matches this corruption is that the stores for B, C and D get combined into write transaction of length 0x2f, spanning bytes 0x1d..0x4b in the map. This may prefetch either 8 bytes at 0x18 or 16 bytes at 0x10 into a temporary HW buffer, which gets modified with the correct data for 0x1d..0x1f before writing back that prefetched data. The key here is the write of A to offset 0x0d..0x1c. This also prefetches the data at 0x18..0x1f, and modifies the bytes ..1c in it. When this is prefetched before the first write, but written back after it, offsets 0x1d..0x1f have the original data again! Variations that trigger the same thing include the modified sequence: stp C_l, C_h, [dstend, -32] # offset 0x2c stp B_l, B_h, [dstin, 16] # offset 0x1d stp D_l, D_h, [dstend, -16] # offset 0x3c stp A_l, A_h, [dstin] # offset 0x0d and the special case for 64 byte memcpy that uses a completely different sequence, either (original, corruption is common for 64 byte) stp A_l, A_h, [dstin] # offset 0x0d stp B_l, B_h, [dstin, 16] # offset 0x1d stp C_l, C_h, [dstin, 32] # offset 0x2d stp D_l, D_h, [dstin, 48] # offset 0x3d stp E_l, E_h, [dstend, -32] # offset 0x2d again stp F_l, F_h, [dstend, -16] # offset 0x3d again or (patched libc, corruption happens very rarely for 64 byte compared to other sizes) stp E_l, E_h, [dstend, -32] # offset 0x2d stp F_l, F_h, [dstend, -16] # offset 0x3d stp A_l, A_h, [dstin] # offset 0x0d stp B_l, B_h, [dstin, 16] # offset 0x1d stp C_l, C_h, [dstin, 32] # offset 0x2d again stp D_l, D_h, [dstin, 48] # offset 0x3d again The corruption for both also happens at 0x1d..0x1f, which unfortunately is not easily explained by the theory above, but maybe my glibc sources are slightly different from the ones that were used on the system. > - All corruption with data copied to the wrong place happened for copies > between 33 and 47 bytes, mostly to the smaller end of the scale: > 391 0x21 > 360 0x22 ... > 33 0x2e > 1 0x2f > > - One common (but not the only, still investigating) case for data getting > written to the wrong place is: > * corruption starts 16 bytes after the memcpy start > * corrupt bytes are the same as the bytes written to the start > * start address ends in 0x1 through 0x7 > * length of corruption is at most memcpy length- 32, always > between 1 and 7. This is only observed with the original sequence (B, C, A, D) in glibc, and only when C overlaps with both A and B. A typical example would be // offset = 0x02, can be [0x01..0x07,0x09..0x0f] + n*0x10 // length = 0x23, could be 0x21..0x2f memcpy(map + 0x2, data + 0x2, 0x23); stp B_l, B_h, [dstin, 16] # offset 0x22 stp C_l, C_h, [dstend, -32] # offset 0x15 stp A_l, A_h, [dstin] # offset 0x12 stp D_l, D_h, [dstend, -16] # offset 0x25 In this example, bytes 0x22..0x24 incorrectly contain the data that was written to bytes 0x12..0x14. I would guess that only the stores to C and D get combined here, so we actually have three separate store transactions rather than the two in the first example. Each of the three stores touches data in the 0x20..0x2f range, and these are the transactions that might happen on them, assuming there is a read-modify-write logic somewhere: B1: prefetch 0x20..0x21, modify 0x22..0x2f, store 0x20 CD1: modify 0x20..0x24 A1: prefetch 0x10.0x11, modify 0x12..0x1f, store 0x10 A2: prefetch 0x22..0x2f, modify 0x20..0x21, store 0x20 CD2: modify 0x25..0x2f, store The observation is that data from the A1 stage at offset 0x12..0x14 ends up in the CD buffer, which I can't yet explain simply by doing steps in the wrong order, it still requires something to also confuse two buffers. I've also shown that the length of the corruption strictly depends on the start and end pointer values, and put it up in a spreadsheet at [1]. The case of writing the data to the wrong place happens exactly when A and C are within the same 16-byte aligned range, while writing back the old data (or not writing at all) happens exactly when C writes to the same 16-byte range as the end of B, but doesn't overlap with A. Also, if either pointer is 8-byte aligned, everything is fine. Arnd [1] https://docs.google.com/spreadsheets/d/1zlDMNAgF--5n0zQmfV3JBzkhdSrUNtwSXIHZH-fqRio/edit#gid=0
diff --git a/sysdeps/aarch64/memcpy.S b/sysdeps/aarch64/memcpy.S index 7e1163e6a0..09d0160bdf 100644 --- a/sysdeps/aarch64/memcpy.S +++ b/sysdeps/aarch64/memcpy.S @@ -102,11 +102,11 @@ ENTRY (MEMCPY) tbz tmp1, 5, 1f ldp B_l, B_h, [src, 16] ldp C_l, C_h, [srcend, -32] - stp B_l, B_h, [dstin, 16] stp C_l, C_h, [dstend, -32] + stp B_l, B_h, [dstin, 16] 1: - stp A_l, A_h, [dstin] stp D_l, D_h, [dstend, -16] + stp A_l, A_h, [dstin] ret .p2align 4