Message ID | mvm1se4romh.fsf@suse.de |
---|---|
State | Committed |
Commit | 9aaaab7c6e4176e61c59b0a63c6ba906d875dc0e |
Headers |
Received: (qmail 117122 invoked by alias); 22 May 2018 10:06:21 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <libc-alpha.sourceware.org> List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 116991 invoked by uid 89); 22 May 2018 10:06:19 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_SHORT, SPF_PASS autolearn=ham version=3.3.2 spammy= X-HELO: mx2.suse.de From: Andreas Schwab <schwab@suse.de> To: libc-alpha@sourceware.org Subject: [PATCH] Don't write beyond destination in __mempcpy_avx512_no_vzeroupper (bug 23196) X-Yow: I can't decide which WRONG TURN to make first!! I wonder if BOB GUCCIONE has these problems! Date: Tue, 22 May 2018 12:06:14 +0200 Message-ID: <mvm1se4romh.fsf@suse.de> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain |
Commit Message
Andreas Schwab
May 22, 2018, 10:06 a.m. UTC
[BZ #23196] CVE-2018-11237 * sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S (L(preloop_large)): Save initial destination pointer in %r11 and use it instead of %rax after the loop. * string/test-mempcpy.c (MIN_PAGE_SIZE): Define. --- string/test-mempcpy.c | 1 + sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S | 5 +++-- 2 files changed, 4 insertions(+), 2 deletions(-)
Comments
On Tue, May 22, 2018 at 3:06 AM, Andreas Schwab <schwab@suse.de> wrote: > [BZ #23196] > CVE-2018-11237 > * sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S > (L(preloop_large)): Save initial destination pointer in %r11 and > use it instead of %rax after the loop. > * string/test-mempcpy.c (MIN_PAGE_SIZE): Define. Please include your analysis in commit message. > string/test-mempcpy.c | 1 + > sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S | 5 +++-- > 2 files changed, 4 insertions(+), 2 deletions(-) > > diff --git a/string/test-mempcpy.c b/string/test-mempcpy.c > index c08fba895e..d98ecdd2d9 100644 > --- a/string/test-mempcpy.c > +++ b/string/test-mempcpy.c > @@ -18,6 +18,7 @@ > <http://www.gnu.org/licenses/>. */ > > #define MEMCPY_RESULT(dst, len) (dst) + (len) > +#define MIN_PAGE_SIZE 131072 > #define TEST_MAIN > #define TEST_NAME "mempcpy" > #include "test-string.h" The modified test does't fail on Skylake server with unchanged memmove-avx512-no-vzeroupper.S. Can you modify the test so that it fails with the original memmove-avx512-no-vzeroupper.S on Skylake server? > diff --git a/sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S b/sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S > index 23c0f7a9ed..effc3ac2de 100644 > --- a/sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S > +++ b/sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S > @@ -336,6 +336,7 @@ L(preloop_large): > vmovups (%rsi), %zmm4 > vmovups 0x40(%rsi), %zmm5 > > + mov %rdi, %r11 > /* Align destination for access with non-temporal stores in the loop. */ > mov %rdi, %r8 > and $-0x80, %rdi > @@ -366,8 +367,8 @@ L(gobble_256bytes_nt_loop): > cmp $256, %rdx > ja L(gobble_256bytes_nt_loop) > sfence > - vmovups %zmm4, (%rax) > - vmovups %zmm5, 0x40(%rax) > + vmovups %zmm4, (%r11) > + vmovups %zmm5, 0x40(%r11) > jmp L(check) > > L(preloop_large_bkw): memmove-vec-unaligned-erms.S supports no vzeroupper: #ifndef VZEROUPPER # if VEC_SIZE > 16 # define VZEROUPPER vzeroupper # else # define VZEROUPPER # endif #endif Should it be used instead?
On Mai 22 2018, "H.J. Lu" <hjl.tools@gmail.com> wrote: > The modified test does't fail on Skylake server with unchanged > memmove-avx512-no-vzeroupper.S. Can you modify the test > so that it fails with the original memmove-avx512-no-vzeroupper.S > on Skylake server? How to do that? I have tested it on a Xeon Phi (whateve that means) where it failed without the fix. > memmove-vec-unaligned-erms.S supports no vzeroupper: > > #ifndef VZEROUPPER > # if VEC_SIZE > 16 > # define VZEROUPPER vzeroupper > # else > # define VZEROUPPER > # endif > #endif > > Should it be used instead? I don't understand what that has to do with the bug in question. Andreas.
On Tue, May 22, 2018 at 5:17 AM, Andreas Schwab <schwab@suse.de> wrote: > On Mai 22 2018, "H.J. Lu" <hjl.tools@gmail.com> wrote: > >> The modified test does't fail on Skylake server with unchanged >> memmove-avx512-no-vzeroupper.S. Can you modify the test >> so that it fails with the original memmove-avx512-no-vzeroupper.S >> on Skylake server? > > How to do that? I have tested it on a Xeon Phi (whateve that means) > where it failed without the fix. __mempcpy_avx512_no_vzeroupper is also tested on Skylake server: [hjl@gnu-skx-1 build-x86_64-linux]$ ./string/test-mempcpy simple_mempcpy __mempcpy_avx512_no_vzeroupper __mempcpy_avx512_unaligned __mempcpy_avx512_unaligned_erms __mempcpy_avx_unaligned__mempcpy_avx_unaligned_erms __mempcpy_ssse3_back __mempcpy_ssse3 __mempcpy_sse2_unaligned __mempcpy_sse2_unaligned_erms __mempcpy_erms [hjl@gnu-skx-1 build-x86_64-linux]$ It should also fail without the fix. >> memmove-vec-unaligned-erms.S supports no vzeroupper: >> >> #ifndef VZEROUPPER >> # if VEC_SIZE > 16 >> # define VZEROUPPER vzeroupper >> # else >> # define VZEROUPPER >> # endif >> #endif >> >> Should it be used instead? > > I don't understand what that has to do with the bug in question. __mempcpy_avx512_no_vzeroupper can be implemented with memmove-vec-unaligned-erms.S. It doesn't have the issue in memmove-avx512-no-vzeroupper.S.
On Mai 22 2018, "H.J. Lu" <hjl.tools@gmail.com> wrote: > It should also fail without the fix. Again, tell me how to do that. > __mempcpy_avx512_no_vzeroupper can be implemented with > memmove-vec-unaligned-erms.S. In which way? There is no definition of __mempcpy_avx512_no_vzeroupper in memmove-vec-unaligned-erms.S. Andreas.
On 05/22/2018 02:17 PM, Andreas Schwab wrote: > On Mai 22 2018, "H.J. Lu"<hjl.tools@gmail.com> wrote: > >> The modified test does't fail on Skylake server with unchanged >> memmove-avx512-no-vzeroupper.S. Can you modify the test >> so that it fails with the original memmove-avx512-no-vzeroupper.S >> on Skylake server? > How to do that? I have tested it on a Xeon Phi (whateve that means) > where it failed without the fix. It has to do with copy sizes and relation to the cache size. I had to adjust your reproducer to make it fail on KNL, basically increasing the copy size by a factor of 10. Thanks, Florian
On Tue, May 22, 2018 at 5:38 AM, Andreas Schwab <schwab@suse.de> wrote: > On Mai 22 2018, "H.J. Lu" <hjl.tools@gmail.com> wrote: > >> It should also fail without the fix. > > Again, tell me how to do that. You need to trigger this condition on all AVX512 machines. >> __mempcpy_avx512_no_vzeroupper can be implemented with >> memmove-vec-unaligned-erms.S. > > In which way? There is no definition of __mempcpy_avx512_no_vzeroupper > in memmove-vec-unaligned-erms.S. > memmove-vec-unaligned-erms.S is a template. It can be used to implement AVX512 memmove without vzeroupper.
On Mai 22 2018, "H.J. Lu" <hjl.tools@gmail.com> wrote: > On Tue, May 22, 2018 at 5:38 AM, Andreas Schwab <schwab@suse.de> wrote: >> On Mai 22 2018, "H.J. Lu" <hjl.tools@gmail.com> wrote: >> >>> It should also fail without the fix. >> >> Again, tell me how to do that. > > You need to trigger this condition on all AVX512 machines. Again, tell me how to do that. > memmove-vec-unaligned-erms.S is a template. It can be used to > implement AVX512 memmove without vzeroupper. How is that relevant? Andreas.
On Mai 22 2018, Florian Weimer <fweimer@redhat.com> wrote: > It has to do with copy sizes and relation to the cache size. I had to > adjust your reproducer to make it fail on KNL, basically increasing the > copy size by a factor of 10. What is the architectural maximum cache size? Andreas.
On 05/22/2018 03:21 PM, Andreas Schwab wrote: > On Mai 22 2018, Florian Weimer <fweimer@redhat.com> wrote: > >> It has to do with copy sizes and relation to the cache size. I had to >> adjust your reproducer to make it fail on KNL, basically increasing the >> copy size by a factor of 10. > > What is the architectural maximum cache size? My test machine had an L2 cache size of 1 MiB. I think the cut-over happens at half of that (at least for our backported version), which is why #define N 976999 in your reproducer word for me (along with -O0 -fno-builtin). Thanks, Florian
On Mai 22 2018, Florian Weimer <fweimer@redhat.com> wrote:
> My test machine had an L2 cache size of 1 MiB.
Then you should be safe, a 64K mempcpy is enough to trigger it.
Andreas.
On 05/22/2018 04:12 PM, Andreas Schwab wrote: > On Mai 22 2018, Florian Weimer <fweimer@redhat.com> wrote: > >> My test machine had an L2 cache size of 1 MiB. > > Then you should be safe, a 64K mempcpy is enough to trigger it. Our memcpy has: L(512bytesormore): #ifdef SHARED_CACHE_SIZE_HALF mov $SHARED_CACHE_SIZE_HALF, %r8 #else mov __x86_64_shared_cache_size_half(%rip), %r8 #endif cmp %r8, %rdx jae L(preloop_large) cmp $1024, %rdx ja L(1024bytesormore) And with the original reproducer, we wouldn't hit preloop_large, where the bug is. Thanks, Florian
On Mai 22 2018, Florian Weimer <fweimer@redhat.com> wrote: > And with the original reproducer, we wouldn't hit preloop_large, where the > bug is. Worksforme. $ string/test-mempcpy simple_mempcpy __mempcpy_avx512_no_vzeroupper __mempcpy_avx512_unaligned __mempcpy_avx512_unaligned_erms __mempcpy_avx_unaligned__mempcpy_avx_unaligned_erms __mempcpy_ssse3_back __mempcpy_ssse3 __mempcpy_sse2_unaligned __mempcpy_sse2_unaligned_erms __mempcpy_erms /mnt/src/libc/test/string/test-mempcpy: Wrong result in function __mempcpy_avx512_no_vzeroupper dst 0x7f7cf79f6000 "" src 0x7f7cf7a36000 "▒/F]t������,CZq������)@Wn������&=Tk������ #:Qh����� 7Ne|�����4Kby�����▒1H_v�����" len 65536 /mnt/src/libc/test/string/test-mempcpy: Wrong result in function __mempcpy_avx512_no_vzeroupper dst 0x7f7cf79f6000 "" src 0x7f7cf7a36010 "▒/F]t������,CZq������)@Wn������&=Tk������ #:Qh����� 7Ne|�����4Kby�����▒1H_v�����" len 65536 /mnt/src/libc/test/string/test-mempcpy: Wrong result in function __mempcpy_avx512_no_vzeroupper dst 0x7f7cf79f6010 "" src 0x7f7cf7a36000 "▒/F]t������,CZq������)@Wn������&=Tk������ #:Qh����� 7Ne|�����4Kby�����▒1H_v�����" len 65536 /mnt/src/libc/test/string/test-mempcpy: Wrong result in function __mempcpy_avx512_no_vzeroupper dst 0x7f7cf79f6010 "" src 0x7f7cf7a36010 "▒/F]t������,CZq������)@Wn������&=Tk������ #:Qh����� 7Ne|�����4Kby�����▒1H_v�����" len 65536 Andreas.
On 05/22/2018 05:06 PM, Andreas Schwab wrote: > On Mai 22 2018, Florian Weimer<fweimer@redhat.com> wrote: > >> And with the original reproducer, we wouldn't hit preloop_large, where the >> bug is. > Worksforme. > > $ string/test-mempcpy I don't doubt that. Our CPUs probably have different cache sizes, or there are differences in how different glibc versions compute the effective cache size. Thanks, Florian
On Mai 22 2018, Florian Weimer <fweimer@redhat.com> wrote: > On 05/22/2018 05:06 PM, Andreas Schwab wrote: >> On Mai 22 2018, Florian Weimer<fweimer@redhat.com> wrote: >> >>> And with the original reproducer, we wouldn't hit preloop_large, where the >>> bug is. >> Worksforme. >> >> $ string/test-mempcpy > > I don't doubt that. Our CPUs probably have different cache sizes, Same cache size. Andreas.
diff --git a/string/test-mempcpy.c b/string/test-mempcpy.c index c08fba895e..d98ecdd2d9 100644 --- a/string/test-mempcpy.c +++ b/string/test-mempcpy.c @@ -18,6 +18,7 @@ <http://www.gnu.org/licenses/>. */ #define MEMCPY_RESULT(dst, len) (dst) + (len) +#define MIN_PAGE_SIZE 131072 #define TEST_MAIN #define TEST_NAME "mempcpy" #include "test-string.h" diff --git a/sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S b/sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S index 23c0f7a9ed..effc3ac2de 100644 --- a/sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S +++ b/sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S @@ -336,6 +336,7 @@ L(preloop_large): vmovups (%rsi), %zmm4 vmovups 0x40(%rsi), %zmm5 + mov %rdi, %r11 /* Align destination for access with non-temporal stores in the loop. */ mov %rdi, %r8 and $-0x80, %rdi @@ -366,8 +367,8 @@ L(gobble_256bytes_nt_loop): cmp $256, %rdx ja L(gobble_256bytes_nt_loop) sfence - vmovups %zmm4, (%rax) - vmovups %zmm5, 0x40(%rax) + vmovups %zmm4, (%r11) + vmovups %zmm5, 0x40(%r11) jmp L(check) L(preloop_large_bkw):