locale: align _nl_C_LC_CTYPE_class and _nl_C_LC_CTYPE_class32 arrays to uint16_t and uint32_t respectively
Message ID | 20210315184211.4124573-1-yuanzi@google.com |
---|---|
State | Superseded |
Headers |
Return-Path: <libc-alpha-bounces@sourceware.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 0C2B93858C27; Mon, 15 Mar 2021 18:42:20 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 0C2B93858C27 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1615833740; bh=G6zJqBoiLJbQAM9mVEbw7fiwoFk8jaWM+WVAHDm2xiU=; h=Date:Subject:To:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=xIIp3zEjmrPhrsGC8/IfEJQw8OkqRQc0C7r4lo1E5d3Hx9xsQCHgYalQswgKDmQRU GzrFqE5/9KyNoYcgBgAyByhbciUX1RBlSi5Xbs4rFRSOH6pSHcsEPEp/7zbZI6r3hc 2J+P+a9f6LkhBErwCVI3bzekKN1sx55kqNUp67tU= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-qk1-x74a.google.com (mail-qk1-x74a.google.com [IPv6:2607:f8b0:4864:20::74a]) by sourceware.org (Postfix) with ESMTPS id 1B9193858D29 for <libc-alpha@sourceware.org>; Mon, 15 Mar 2021 18:42:16 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 1B9193858D29 Received: by mail-qk1-x74a.google.com with SMTP id g18so25159087qki.15 for <libc-alpha@sourceware.org>; Mon, 15 Mar 2021 11:42:16 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:message-id:mime-version:subject:from:to:cc :content-transfer-encoding; bh=G6zJqBoiLJbQAM9mVEbw7fiwoFk8jaWM+WVAHDm2xiU=; b=AV1+8tdtnIcOUWMApRU8mQAfaYf/1KpLtGoBdbwyTGS60FmktIT6t4/9y5aMzM80/Q PE6ebp0WLGBQI4LXPkids4U94O7U1S/pGLuT4aTSizow8a5UyFyD4eGf2rIKuzbNNihr lH3rUB9rcdu8OZDYpGMCfCmNzFH9tLsWSOqHsdGfILf9bN5jUZgtLO45DlNu26Ie01aA 7ISfrYlXxjAcDo5Il02K7OC+t8PonzoCWbxyEqN/nw8C2JEDqrAonPLYU/NEGEUhN6JR g/dfJzPwLLx6xawgCLhWgv7j/8ouUTOdn1vvYJmpkuzIG3F+E9MmjqscusBWOR1l59xa GmLQ== X-Gm-Message-State: AOAM533kgKTrdFtLF3xucxC03JPbkwL7WrOgrcjE90wvPFw36glUhLWe wwITj0dRvkqqAoxXm8mnHFzK0I/eauMoLUI3a0fvaI30ZEs0nI5qWi01sKx2kFjLt7x2SsBbbZt vQE5wzAq3Y4V5hwwYdS+Xq9XxEwBApivz9dWTPiCfz6UEr+iMYJNanwhuhSemwtm1gA== X-Google-Smtp-Source: ABdhPJzVwR0uXSruWHX0ARV49R99L99CS2ixSc/+vT1rclgXNJr8NNjL0ZkeGh8CBWmT5Rpz4tDDfKjkNO8= X-Received: from yuanzi.svl.corp.google.com ([2620:15c:2ce:200:6858:7514:943:c39d]) (user=yuanzi job=sendgmr) by 2002:ad4:5614:: with SMTP id ca20mr12164128qvb.37.1615833735038; Mon, 15 Mar 2021 11:42:15 -0700 (PDT) Date: Mon, 15 Mar 2021 11:42:11 -0700 Message-Id: <20210315184211.4124573-1-yuanzi@google.com> Mime-Version: 1.0 X-Mailer: git-send-email 2.31.0.rc2.261.g7f71774620-goog Subject: [PATCH] locale: align _nl_C_LC_CTYPE_class and _nl_C_LC_CTYPE_class32 arrays to uint16_t and uint32_t respectively To: libc-alpha@sourceware.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-21.8 required=5.0 tests=BAYES_00, DKIMWL_WL_MED, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org> List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe> List-Archive: <https://sourceware.org/pipermail/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help> List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=subscribe> From: Lirong Yuan via Libc-alpha <libc-alpha@sourceware.org> Reply-To: Lirong Yuan <yuanzi@google.com> Cc: Lirong Yuan <yuanzi@google.com>, scw@google.com Errors-To: libc-alpha-bounces@sourceware.org Sender: "Libc-alpha" <libc-alpha-bounces@sourceware.org> |
Series |
locale: align _nl_C_LC_CTYPE_class and _nl_C_LC_CTYPE_class32 arrays to uint16_t and uint32_t respectively
|
|
Commit Message
Lirong Yuan
March 15, 2021, 6:42 p.m. UTC
steps to reproduce the problem: compile a program that uses ctype functions such as “isspace” for aarch64 with UBSan flag “-fsanitize=undefined” and run it on x86_64 machines with qemu user mode emulation.
observed behavior: UndefinedBehaviorSanitizer reports misaligned-pointer-use in the program.
solution: align the arrays defined in locale/C-ctype.c with correct data types as defined in ctype/ctype.h.
test suite regressions: none.
Signed-off-by: Lirong Yuan <yuanzi@google.com>
---
locale/C-ctype.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
Comments
On 3/15/21 2:42 PM, Lirong Yuan via Libc-alpha wrote: > steps to reproduce the problem: compile a program that uses ctype functions such as “isspace” for aarch64 with UBSan flag “-fsanitize=undefined” and run it on x86_64 machines with qemu user mode emulation. Szabolcs, Do you have any input on this? > observed behavior: UndefinedBehaviorSanitizer reports misaligned-pointer-use in the program. Yes, the char array could be misaligned with respect to a 16-bit value, and should be aligned to the type that is expected from the interface e.g. ctype/ctype.h: 91 # define __isctype_f(type) \ 92 __extern_inline int \ 93 is##type (int __c) __THROW \ 94 { \ 95 return (*__ctype_b_loc ())[(int) (__c)] & (unsigned short int) _IS##type; \ 96 } 97 #endif include/ctype.h: 38 CTYPE_EXTERN_INLINE const uint16_t ** __attribute__ ((const)) 39 __ctype_b_loc (void) 40 { 41 return __libc_tsd_address (const uint16_t *, CTYPE_B); 42 } So we expect a uint16_t type and the respective alignment. My expectation is that normally aarch64 simply handles the unaligned load without any problems, but that it would be "better" if it were 16-bit aligned? Is this the *only* case of misaligned pointers? > solution: align the arrays defined in locale/C-ctype.c with correct data types as defined in ctype/ctype.h. > > test suite regressions: none. This looks technically correct, and the C locale is builtin so the layout itself should be able to change without any problems. I'd like to hear comments from Arm about this before accepting. > Signed-off-by: Lirong Yuan <yuanzi@google.com> We don't use DSOs in glibc, we assign copyright to the FSF, so this line would be normally removed, and you as the git author remains. See: https://sourceware.org/glibc/wiki/Contribution%20checklist You are covered by the Google copyright assignment so everything is accepted. > --- > locale/C-ctype.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/locale/C-ctype.c b/locale/C-ctype.c > index bffdbedad0..da2c8cc33c 100644 > --- a/locale/C-ctype.c > +++ b/locale/C-ctype.c > @@ -18,6 +18,7 @@ > > #include "localeinfo.h" > #include <endian.h> > +#include <stdalign.h> OK. > #include <stdint.h> > > #include "C-translit.h" > @@ -30,7 +31,7 @@ > In the `_nl_C_LC_CTYPE_class' array the value for EOF (== -1) > is set to always return 0 and the conversion arrays return EOF. */ > > -const char _nl_C_LC_CTYPE_class[768] attribute_hidden = > +alignas(uint16_t) const char _nl_C_LC_CTYPE_class[768] attribute_hidden = OK. Used directly by __ctype_b_loc. > /* 0x80 */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" > /* 0x86 */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" > /* 0x8c */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" > @@ -96,7 +97,7 @@ const char _nl_C_LC_CTYPE_class[768] attribute_hidden = > /* 0xf4 */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" > /* 0xfa */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" > ; > -const char _nl_C_LC_CTYPE_class32[1024] attribute_hidden = > +alignas(uint32_t) const char _nl_C_LC_CTYPE_class32[1024] attribute_hidden = OK. Might be exposed via nl_langinfo and is internally uint32_t (thought directly exposed __ctype32_b should not exist for aarch64 (verified not on abilist)). > /* 0x00 */ "\000\000\002\000" "\000\000\002\000" "\000\000\002\000" > /* 0x03 */ "\000\000\002\000" "\000\000\002\000" "\000\000\002\000" > /* 0x06 */ "\000\000\002\000" "\000\000\002\000" "\000\000\002\000" >
The 03/15/2021 21:44, Carlos O'Donell wrote: > On 3/15/21 2:42 PM, Lirong Yuan via Libc-alpha wrote: > > steps to reproduce the problem: compile a program that uses ctype functions such as “isspace” for aarch64 with UBSan flag “-fsanitize=undefined” and run it on x86_64 machines with qemu user mode emulation. > > Szabolcs, > > Do you have any input on this? > > > observed behavior: UndefinedBehaviorSanitizer reports misaligned-pointer-use in the program. > > Yes, the char array could be misaligned with respect to a 16-bit value, > and should be aligned to the type that is expected from the interface e.g. using char[] as uint16_t[] is aliasing violation. and in principle alignas on the definition does not fix this, but in practice that's the only abi visible aspect of the wrong type. i'm not sure why ubsanitizer cares about alignment specifically on aarch64, unaligned load should work.
On Mon, Mar 15, 2021 at 6:45 PM Carlos O'Donell <carlos@redhat.com> wrote: > My expectation is that normally aarch64 simply handles the unaligned load > without any problems, > but that it would be "better" if it were 16-bit aligned? > Is this the *only* case of misaligned pointers? Yes, this is the only case reported by UBSan. > Signed-off-by: Lirong Yuan <yuanzi@google.com> > We don't use DSOs in glibc, we assign copyright to the FSF, so this line > would > be normally removed, and you as the git author remains. Thanks for the explanation! I will send an updated patch without "Signed-off-by" if the current approach looks good. :) On Tue, Mar 16, 2021 at 7:28 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote: > The 03/15/2021 21:44, Carlos O'Donell wrote: > > On 3/15/21 2:42 PM, Lirong Yuan via Libc-alpha wrote: > > > steps to reproduce the problem: compile a program that uses ctype > functions such as “isspace” for aarch64 with UBSan flag > “-fsanitize=undefined” and run it on x86_64 machines with qemu user mode > emulation. > > > > Szabolcs, > > > > Do you have any input on this? > > > > > observed behavior: UndefinedBehaviorSanitizer reports > misaligned-pointer-use in the program. > > > > Yes, the char array could be misaligned with respect to a 16-bit value, > > and should be aligned to the type that is expected from the interface > e.g. > > using char[] as uint16_t[] is aliasing violation. and in principle > alignas on the definition does not fix this, but in practice that's > the only abi visible aspect of the wrong type. > Alternatively, we can define _nl_C_LC_CTYPE_class and _nl_C_LC_CTYPE_class32 arrays directly as uint16_t and uint32_t arrays, like _nl_C_LC_CTYPE_toupper array: https://code.woboq.org/userspace/glibc/locale/C-ctype.c.html#_nl_C_LC_CTYPE_toupper Though the conversion may be error-prune and require more test cases... It would seem that using alignas is an approach that's both technically correct and less likely to cause havoc. > i'm not sure why ubsanitizer cares about alignment specifically on > aarch64, unaligned load should work. > Yes, the code works fine in practice on aarch64. The ubsan alignment is a check for misaligned rather than unaligned. It's almost always worth fixing since this can cause subtle and hard to track down failures that more often manifest on other architectures. Thanks, Lirong
On 16/03/2021 16:05, Lirong Yuan via Libc-alpha wrote: > On Mon, Mar 15, 2021 at 6:45 PM Carlos O'Donell <carlos@redhat.com> wrote: > >> My expectation is that normally aarch64 simply handles the unaligned load >> without any problems, >> but that it would be "better" if it were 16-bit aligned? >> Is this the *only* case of misaligned pointers? > > > Yes, this is the only case reported by UBSan. > >> Signed-off-by: Lirong Yuan <yuanzi@google.com> >> We don't use DSOs in glibc, we assign copyright to the FSF, so this line >> would >> be normally removed, and you as the git author remains. > > > Thanks for the explanation! I will send an updated patch without > "Signed-off-by" if the current approach looks good. :) > > On Tue, Mar 16, 2021 at 7:28 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote: > >> The 03/15/2021 21:44, Carlos O'Donell wrote: >>> On 3/15/21 2:42 PM, Lirong Yuan via Libc-alpha wrote: >>>> steps to reproduce the problem: compile a program that uses ctype >> functions such as “isspace” for aarch64 with UBSan flag >> “-fsanitize=undefined” and run it on x86_64 machines with qemu user mode >> emulation. >>> >>> Szabolcs, >>> >>> Do you have any input on this? >>> >>>> observed behavior: UndefinedBehaviorSanitizer reports >> misaligned-pointer-use in the program. >>> >>> Yes, the char array could be misaligned with respect to a 16-bit value, >>> and should be aligned to the type that is expected from the interface >> e.g. >> >> using char[] as uint16_t[] is aliasing violation. and in principle >> alignas on the definition does not fix this, but in practice that's >> the only abi visible aspect of the wrong type. >> > > Alternatively, we can define _nl_C_LC_CTYPE_class and > _nl_C_LC_CTYPE_class32 arrays directly as uint16_t and uint32_t arrays, > like _nl_C_LC_CTYPE_toupper array: > https://code.woboq.org/userspace/glibc/locale/C-ctype.c.html#_nl_C_LC_CTYPE_toupper > Though the conversion may be error-prune and require more test cases... > It would seem that using alignas is an approach that's both technically > correct and less likely to cause havoc. Could you check if using the expected types yields any regression? All their usages are using explicit cast to the expected types, so I am can't see why they have declared as char at first place. > > >> i'm not sure why ubsanitizer cares about alignment specifically on >> aarch64, unaligned load should work. >> > > Yes, the code works fine in practice on aarch64. The ubsan alignment is a > check for misaligned rather than unaligned. It's almost always worth fixing > since this can cause subtle and hard to track down failures that more often > manifest on other architectures. I would expect that it this is really accessed in an unaligned manner it would blow in some architectures (sparc and some arm and mips environments). Not sure why we haven't see any issues on such architectures.
On Mär 16 2021, Adhemerval Zanella via Libc-alpha wrote: > I would expect that it this is really accessed in an unaligned manner it > would blow in some architectures (sparc and some arm and mips environments). > Not sure why we haven't see any issues on such architectures. It is very unlikely that the arrays are placed at odd addresses, since they are bundled together with aligned data. Andreas.
On 3/16/21 3:47 PM, Adhemerval Zanella via Libc-alpha wrote: > > > On 16/03/2021 16:05, Lirong Yuan via Libc-alpha wrote: >> On Mon, Mar 15, 2021 at 6:45 PM Carlos O'Donell <carlos@redhat.com> wrote: >> >>> My expectation is that normally aarch64 simply handles the unaligned load >>> without any problems, >>> but that it would be "better" if it were 16-bit aligned? >>> Is this the *only* case of misaligned pointers? >> >> >> Yes, this is the only case reported by UBSan. >> >>> Signed-off-by: Lirong Yuan <yuanzi@google.com> >>> We don't use DSOs in glibc, we assign copyright to the FSF, so this line >>> would >>> be normally removed, and you as the git author remains. >> >> >> Thanks for the explanation! I will send an updated patch without >> "Signed-off-by" if the current approach looks good. :) >> >> On Tue, Mar 16, 2021 at 7:28 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote: >> >>> The 03/15/2021 21:44, Carlos O'Donell wrote: >>>> On 3/15/21 2:42 PM, Lirong Yuan via Libc-alpha wrote: >>>>> steps to reproduce the problem: compile a program that uses ctype >>> functions such as “isspace” for aarch64 with UBSan flag >>> “-fsanitize=undefined” and run it on x86_64 machines with qemu user mode >>> emulation. >>>> >>>> Szabolcs, >>>> >>>> Do you have any input on this? >>>> >>>>> observed behavior: UndefinedBehaviorSanitizer reports >>> misaligned-pointer-use in the program. >>>> >>>> Yes, the char array could be misaligned with respect to a 16-bit value, >>>> and should be aligned to the type that is expected from the interface >>> e.g. >>> >>> using char[] as uint16_t[] is aliasing violation. and in principle >>> alignas on the definition does not fix this, but in practice that's >>> the only abi visible aspect of the wrong type. >>> >> >> Alternatively, we can define _nl_C_LC_CTYPE_class and >> _nl_C_LC_CTYPE_class32 arrays directly as uint16_t and uint32_t arrays, >> like _nl_C_LC_CTYPE_toupper array: >> https://code.woboq.org/userspace/glibc/locale/C-ctype.c.html#_nl_C_LC_CTYPE_toupper >> Though the conversion may be error-prune and require more test cases... >> It would seem that using alignas is an approach that's both technically >> correct and less likely to cause havoc. > > Could you check if using the expected types yields any regression? > All their usages are using explicit cast to the expected types, so > I am can't see why they have declared as char at first place. It might work, but it: * changes the ABI of binary data exposed via nl_langinfo()? - Pointers to the array are returned. * changes the on-disk layout and invalidates all system binary locales? - Something similar happened for ALTMON and we just rebuilt everything. - It isn't entirely clear to me what our guarantee of binary compatibility is for compiled locales. Is this a concern? >>> i'm not sure why ubsanitizer cares about alignment specifically on >>> aarch64, unaligned load should work. >>> >> >> Yes, the code works fine in practice on aarch64. The ubsan alignment is a >> check for misaligned rather than unaligned. It's almost always worth fixing >> since this can cause subtle and hard to track down failures that more often >> manifest on other architectures. > > I would expect that it this is really accessed in an unaligned manner it > would blow in some architectures (sparc and some arm and mips environments). > Not sure why we haven't see any issues on such architectures. AFAICT the structure is *not* misaligned on x86_64 and it has to do with the vagaries of the compiler and linker you use and the alignment of global variables. On x86_64 the structures are all over-aligned. I expect the same is true for all other arches. Perhaps Google used lld with an experimental glibc patch and they have more tightly aligned object layout. e.g. readelf -a -W /lib64/libc.so.6 | grep _nl_C_LC_CTYPE_class 23087: 0000000000176ec0 1024 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class32 23125: 00000000001761a0 72 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_alpha 23699: 0000000000176020 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_print 23960: 0000000000175e40 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_alnum 24152: 0000000000176200 72 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_lower 24208: 00000000001760e0 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_xdigit 24547: 0000000000175fc0 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_graph 24569: 0000000000175f60 68 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_blank 24582: 0000000000175ea0 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_punct 24764: 0000000000176140 68 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_digit 24883: 0000000000175f00 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_cntrl 24991: 00000000001772c0 768 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class 25344: 0000000000176260 72 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_upper 25416: 0000000000176080 68 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_space Aligned on 16-byte boundaries. On AArch64 I see this: eu-readelf -a -W lib64/libc.so.6 | grep _nl_C_LC_CTYPE_class 34906: 000000000011eea8 1024 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class32 34945: 000000000011e1d0 72 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_alpha 35470: 000000000011e0a0 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_print 35712: 000000000011df18 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_alnum 35889: 000000000011e218 72 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_lower 35944: 000000000011e138 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_xdigit 36246: 000000000011e050 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_graph 36268: 000000000011e008 68 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_blank 36284: 000000000011df68 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_punct 36455: 000000000011e188 68 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_digit 36571: 000000000011dfb8 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_cntrl 36670: 000000000011f2a8 768 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class 36993: 000000000011e260 72 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_upper 37061: 000000000011e0f0 68 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_space Aligned on 8-byte boundaries. But that alignment could go lower and still be valid and cause unaligned accesses.
On 16/03/2021 18:05, Carlos O'Donell wrote: > On 3/16/21 3:47 PM, Adhemerval Zanella via Libc-alpha wrote: >> >> >> On 16/03/2021 16:05, Lirong Yuan via Libc-alpha wrote: >>> On Mon, Mar 15, 2021 at 6:45 PM Carlos O'Donell <carlos@redhat.com> wrote: >>> >>>> My expectation is that normally aarch64 simply handles the unaligned load >>>> without any problems, >>>> but that it would be "better" if it were 16-bit aligned? >>>> Is this the *only* case of misaligned pointers? >>> >>> >>> Yes, this is the only case reported by UBSan. >>> >>>> Signed-off-by: Lirong Yuan <yuanzi@google.com> >>>> We don't use DSOs in glibc, we assign copyright to the FSF, so this line >>>> would >>>> be normally removed, and you as the git author remains. >>> >>> >>> Thanks for the explanation! I will send an updated patch without >>> "Signed-off-by" if the current approach looks good. :) >>> >>> On Tue, Mar 16, 2021 at 7:28 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote: >>> >>>> The 03/15/2021 21:44, Carlos O'Donell wrote: >>>>> On 3/15/21 2:42 PM, Lirong Yuan via Libc-alpha wrote: >>>>>> steps to reproduce the problem: compile a program that uses ctype >>>> functions such as “isspace” for aarch64 with UBSan flag >>>> “-fsanitize=undefined” and run it on x86_64 machines with qemu user mode >>>> emulation. >>>>> >>>>> Szabolcs, >>>>> >>>>> Do you have any input on this? >>>>> >>>>>> observed behavior: UndefinedBehaviorSanitizer reports >>>> misaligned-pointer-use in the program. >>>>> >>>>> Yes, the char array could be misaligned with respect to a 16-bit value, >>>>> and should be aligned to the type that is expected from the interface >>>> e.g. >>>> >>>> using char[] as uint16_t[] is aliasing violation. and in principle >>>> alignas on the definition does not fix this, but in practice that's >>>> the only abi visible aspect of the wrong type. >>>> >>> >>> Alternatively, we can define _nl_C_LC_CTYPE_class and >>> _nl_C_LC_CTYPE_class32 arrays directly as uint16_t and uint32_t arrays, >>> like _nl_C_LC_CTYPE_toupper array: >>> https://code.woboq.org/userspace/glibc/locale/C-ctype.c.html#_nl_C_LC_CTYPE_toupper >>> Though the conversion may be error-prune and require more test cases... >>> It would seem that using alignas is an approach that's both technically >>> correct and less likely to cause havoc. >> >> Could you check if using the expected types yields any regression? >> All their usages are using explicit cast to the expected types, so >> I am can't see why they have declared as char at first place. > > It might work, but it: > > * changes the ABI of binary data exposed via nl_langinfo()? > - Pointers to the array are returned. > > * changes the on-disk layout and invalidates all system binary locales? > - Something similar happened for ALTMON and we just rebuilt everything. > - It isn't entirely clear to me what our guarantee of binary compatibility > is for compiled locales. > > Is this a concern? > But wouldn't the array alignment change cause this very changes as well? The ABI of binary data exposed is already exported as 'unsigned short', so not sure if will change anything here. I am not sure about on-disk layout and I don't recall the ALTMON change specifically. >>>> i'm not sure why ubsanitizer cares about alignment specifically on >>>> aarch64, unaligned load should work. >>>> >>> >>> Yes, the code works fine in practice on aarch64. The ubsan alignment is a >>> check for misaligned rather than unaligned. It's almost always worth fixing >>> since this can cause subtle and hard to track down failures that more often >>> manifest on other architectures. >> >> I would expect that it this is really accessed in an unaligned manner it >> would blow in some architectures (sparc and some arm and mips environments). >> Not sure why we haven't see any issues on such architectures. > > AFAICT the structure is *not* misaligned on x86_64 and it has to do with the > vagaries of the compiler and linker you use and the alignment of global > variables. On x86_64 the structures are all over-aligned. I expect the same > is true for all other arches. Perhaps Google used lld with an experimental > glibc patch and they have more tightly aligned object layout. So were it the case we got lucky that linker is laying out the data structures in the expected alignment? > > e.g. > readelf -a -W /lib64/libc.so.6 | grep _nl_C_LC_CTYPE_class > 23087: 0000000000176ec0 1024 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class32 > 23125: 00000000001761a0 72 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_alpha > 23699: 0000000000176020 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_print > 23960: 0000000000175e40 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_alnum > 24152: 0000000000176200 72 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_lower > 24208: 00000000001760e0 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_xdigit > 24547: 0000000000175fc0 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_graph > 24569: 0000000000175f60 68 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_blank > 24582: 0000000000175ea0 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_punct > 24764: 0000000000176140 68 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_digit > 24883: 0000000000175f00 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_cntrl > 24991: 00000000001772c0 768 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class > 25344: 0000000000176260 72 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_upper > 25416: 0000000000176080 68 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_space > > Aligned on 16-byte boundaries. > > On AArch64 I see this: > eu-readelf -a -W lib64/libc.so.6 | grep _nl_C_LC_CTYPE_class > 34906: 000000000011eea8 1024 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class32 > 34945: 000000000011e1d0 72 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_alpha > 35470: 000000000011e0a0 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_print > 35712: 000000000011df18 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_alnum > 35889: 000000000011e218 72 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_lower > 35944: 000000000011e138 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_xdigit > 36246: 000000000011e050 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_graph > 36268: 000000000011e008 68 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_blank > 36284: 000000000011df68 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_punct > 36455: 000000000011e188 68 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_digit > 36571: 000000000011dfb8 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_cntrl > 36670: 000000000011f2a8 768 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class > 36993: 000000000011e260 72 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_upper > 37061: 000000000011e0f0 68 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_space > > Aligned on 8-byte boundaries. > > But that alignment could go lower and still be valid and cause unaligned accesses. >
On Tue, Mar 16, 2021 at 12:47 PM Adhemerval Zanella < adhemerval.zanella@linaro.org> wrote: > Could you check if using the expected types yields any regression? > All their usages are using explicit cast to the expected types, so > I am can't see why they have declared as char at first place. Took a while to get it right but checked that using the expected types did not yield any regression. :) On Wed, Mar 17, 2021 at 4:34 AM Adhemerval Zanella < adhemerval.zanella@linaro.org> wrote: > But wouldn't the array alignment change cause this very changes as well? > The ABI of binary data exposed is already exported as 'unsigned short', > so not sure if will change anything here. I am not sure about on-disk > layout and I don't recall the ALTMON change specifically. Layout seemed to change for the newly built libraries. Clean lib: 6982: 0000000000025f65 768 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class 6983: 0000000000026265 1024 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class32 6984: 0000000000027588 76 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_alnum 6985: 00000000000272f8 72 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_alpha 6986: 00000000000274ac 68 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_blank 6987: 00000000000274f0 76 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_cntrl 6988: 0000000000027340 68 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_digit 6989: 0000000000027460 76 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_graph 6990: 00000000000272b0 72 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_lower 6991: 0000000000027414 76 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_print 6992: 000000000002753c 76 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_punct 6993: 00000000000273d0 68 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_space 6994: 0000000000027268 72 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_upper 6995: 0000000000027384 76 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_xdigit New lib built using alignas: 5662: 000000000002ac5e 768 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class 5663: 000000000002af60 1024 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class32 5666: 000000000002bf60 72 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_upper 5667: 000000000002bfa8 72 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_lower 5668: 000000000002bff0 72 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_alpha 5669: 000000000002c038 68 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_digit 5670: 000000000002c07c 76 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_xdigit 5671: 000000000002c0c8 68 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_space 5672: 000000000002c10c 76 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_print 5673: 000000000002c158 76 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_graph 5674: 000000000002c1a4 68 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_blank 5675: 000000000002c1e8 76 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_cntrl 5676: 000000000002c234 76 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_punct 5677: 000000000002c280 76 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_alnum New lib built using expected types: 5662: 000000000002ac5e 768 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class 5663: 000000000002af60 1024 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class32 5666: 000000000002bf60 72 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_upper 5667: 000000000002bfa8 72 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_lower 5668: 000000000002bff0 72 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_alpha 5669: 000000000002c038 68 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_digit 5670: 000000000002c07c 76 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_xdigit 5671: 000000000002c0c8 68 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_space 5672: 000000000002c10c 76 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_print 5673: 000000000002c158 76 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_graph 5674: 000000000002c1a4 68 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_blank 5675: 000000000002c1e8 76 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_cntrl 5676: 000000000002c234 76 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_punct 5677: 000000000002c280 76 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class_alnum So were it the case we got lucky that linker is laying out the data > structures > in the expected alignment? Yeah, sort of like a race condition that can only be discovered under certain circumstances. In this case, the condition is running an arm program compiled with ubsan under qemu user mode emulation on a x86 machine. Regards, Lirong On Wed, Mar 17, 2021 at 4:34 AM Adhemerval Zanella < adhemerval.zanella@linaro.org> wrote: > > > On 16/03/2021 18:05, Carlos O'Donell wrote: > > On 3/16/21 3:47 PM, Adhemerval Zanella via Libc-alpha wrote: > >> > >> > >> On 16/03/2021 16:05, Lirong Yuan via Libc-alpha wrote: > >>> On Mon, Mar 15, 2021 at 6:45 PM Carlos O'Donell <carlos@redhat.com> > wrote: > >>> > >>>> My expectation is that normally aarch64 simply handles the unaligned > load > >>>> without any problems, > >>>> but that it would be "better" if it were 16-bit aligned? > >>>> Is this the *only* case of misaligned pointers? > >>> > >>> > >>> Yes, this is the only case reported by UBSan. > >>> > >>>> Signed-off-by: Lirong Yuan <yuanzi@google.com> > >>>> We don't use DSOs in glibc, we assign copyright to the FSF, so this > line > >>>> would > >>>> be normally removed, and you as the git author remains. > >>> > >>> > >>> Thanks for the explanation! I will send an updated patch without > >>> "Signed-off-by" if the current approach looks good. :) > >>> > >>> On Tue, Mar 16, 2021 at 7:28 AM Szabolcs Nagy <szabolcs.nagy@arm.com> > wrote: > >>> > >>>> The 03/15/2021 21:44, Carlos O'Donell wrote: > >>>>> On 3/15/21 2:42 PM, Lirong Yuan via Libc-alpha wrote: > >>>>>> steps to reproduce the problem: compile a program that uses ctype > >>>> functions such as “isspace” for aarch64 with UBSan flag > >>>> “-fsanitize=undefined” and run it on x86_64 machines with qemu user > mode > >>>> emulation. > >>>>> > >>>>> Szabolcs, > >>>>> > >>>>> Do you have any input on this? > >>>>> > >>>>>> observed behavior: UndefinedBehaviorSanitizer reports > >>>> misaligned-pointer-use in the program. > >>>>> > >>>>> Yes, the char array could be misaligned with respect to a 16-bit > value, > >>>>> and should be aligned to the type that is expected from the interface > >>>> e.g. > >>>> > >>>> using char[] as uint16_t[] is aliasing violation. and in principle > >>>> alignas on the definition does not fix this, but in practice that's > >>>> the only abi visible aspect of the wrong type. > >>>> > >>> > >>> Alternatively, we can define _nl_C_LC_CTYPE_class and > >>> _nl_C_LC_CTYPE_class32 arrays directly as uint16_t and uint32_t arrays, > >>> like _nl_C_LC_CTYPE_toupper array: > >>> > https://code.woboq.org/userspace/glibc/locale/C-ctype.c.html#_nl_C_LC_CTYPE_toupper > >>> Though the conversion may be error-prune and require more test cases... > >>> It would seem that using alignas is an approach that's both technically > >>> correct and less likely to cause havoc. > >> > >> Could you check if using the expected types yields any regression? > >> All their usages are using explicit cast to the expected types, so > >> I am can't see why they have declared as char at first place. > > > > It might work, but it: > > > > * changes the ABI of binary data exposed via nl_langinfo()? > > - Pointers to the array are returned. > > > > * changes the on-disk layout and invalidates all system binary locales? > > - Something similar happened for ALTMON and we just rebuilt everything. > > - It isn't entirely clear to me what our guarantee of binary > compatibility > > is for compiled locales. > > > > Is this a concern? > > > > But wouldn't the array alignment change cause this very changes as well? > The ABI of binary data exposed is already exported as 'unsigned short', > so not sure if will change anything here. I am not sure about on-disk > layout and I don't recall the ALTMON change specifically. > > >>>> i'm not sure why ubsanitizer cares about alignment specifically on > >>>> aarch64, unaligned load should work. > >>>> > >>> > >>> Yes, the code works fine in practice on aarch64. The ubsan alignment > is a > >>> check for misaligned rather than unaligned. It's almost always worth > fixing > >>> since this can cause subtle and hard to track down failures that more > often > >>> manifest on other architectures. > >> > >> I would expect that it this is really accessed in an unaligned manner it > >> would blow in some architectures (sparc and some arm and mips > environments). > >> Not sure why we haven't see any issues on such architectures. > > > > AFAICT the structure is *not* misaligned on x86_64 and it has to do with > the > > vagaries of the compiler and linker you use and the alignment of global > > variables. On x86_64 the structures are all over-aligned. I expect the > same > > is true for all other arches. Perhaps Google used lld with an > experimental > > glibc patch and they have more tightly aligned object layout. > > So were it the case we got lucky that linker is laying out the data > structures > in the expected alignment? > > > > > e.g. > > readelf -a -W /lib64/libc.so.6 | grep _nl_C_LC_CTYPE_class > > 23087: 0000000000176ec0 1024 OBJECT LOCAL DEFAULT 17 > _nl_C_LC_CTYPE_class32 > > 23125: 00000000001761a0 72 OBJECT LOCAL DEFAULT 17 > _nl_C_LC_CTYPE_class_alpha > > 23699: 0000000000176020 76 OBJECT LOCAL DEFAULT 17 > _nl_C_LC_CTYPE_class_print > > 23960: 0000000000175e40 76 OBJECT LOCAL DEFAULT 17 > _nl_C_LC_CTYPE_class_alnum > > 24152: 0000000000176200 72 OBJECT LOCAL DEFAULT 17 > _nl_C_LC_CTYPE_class_lower > > 24208: 00000000001760e0 76 OBJECT LOCAL DEFAULT 17 > _nl_C_LC_CTYPE_class_xdigit > > 24547: 0000000000175fc0 76 OBJECT LOCAL DEFAULT 17 > _nl_C_LC_CTYPE_class_graph > > 24569: 0000000000175f60 68 OBJECT LOCAL DEFAULT 17 > _nl_C_LC_CTYPE_class_blank > > 24582: 0000000000175ea0 76 OBJECT LOCAL DEFAULT 17 > _nl_C_LC_CTYPE_class_punct > > 24764: 0000000000176140 68 OBJECT LOCAL DEFAULT 17 > _nl_C_LC_CTYPE_class_digit > > 24883: 0000000000175f00 76 OBJECT LOCAL DEFAULT 17 > _nl_C_LC_CTYPE_class_cntrl > > 24991: 00000000001772c0 768 OBJECT LOCAL DEFAULT 17 > _nl_C_LC_CTYPE_class > > 25344: 0000000000176260 72 OBJECT LOCAL DEFAULT 17 > _nl_C_LC_CTYPE_class_upper > > 25416: 0000000000176080 68 OBJECT LOCAL DEFAULT 17 > _nl_C_LC_CTYPE_class_space > > > > Aligned on 16-byte boundaries. > > > > On AArch64 I see this: > > eu-readelf -a -W lib64/libc.so.6 | grep _nl_C_LC_CTYPE_class > > 34906: 000000000011eea8 1024 OBJECT LOCAL DEFAULT 16 > _nl_C_LC_CTYPE_class32 > > 34945: 000000000011e1d0 72 OBJECT LOCAL DEFAULT 16 > _nl_C_LC_CTYPE_class_alpha > > 35470: 000000000011e0a0 76 OBJECT LOCAL DEFAULT 16 > _nl_C_LC_CTYPE_class_print > > 35712: 000000000011df18 76 OBJECT LOCAL DEFAULT 16 > _nl_C_LC_CTYPE_class_alnum > > 35889: 000000000011e218 72 OBJECT LOCAL DEFAULT 16 > _nl_C_LC_CTYPE_class_lower > > 35944: 000000000011e138 76 OBJECT LOCAL DEFAULT 16 > _nl_C_LC_CTYPE_class_xdigit > > 36246: 000000000011e050 76 OBJECT LOCAL DEFAULT 16 > _nl_C_LC_CTYPE_class_graph > > 36268: 000000000011e008 68 OBJECT LOCAL DEFAULT 16 > _nl_C_LC_CTYPE_class_blank > > 36284: 000000000011df68 76 OBJECT LOCAL DEFAULT 16 > _nl_C_LC_CTYPE_class_punct > > 36455: 000000000011e188 68 OBJECT LOCAL DEFAULT 16 > _nl_C_LC_CTYPE_class_digit > > 36571: 000000000011dfb8 76 OBJECT LOCAL DEFAULT 16 > _nl_C_LC_CTYPE_class_cntrl > > 36670: 000000000011f2a8 768 OBJECT LOCAL DEFAULT 16 > _nl_C_LC_CTYPE_class > > 36993: 000000000011e260 72 OBJECT LOCAL DEFAULT 16 > _nl_C_LC_CTYPE_class_upper > > 37061: 000000000011e0f0 68 OBJECT LOCAL DEFAULT 16 > _nl_C_LC_CTYPE_class_space > > > > Aligned on 8-byte boundaries. > > > > But that alignment could go lower and still be valid and cause unaligned > accesses. > > >
Friendly Ping~ Also sent the v2 patch for using the expected types: http://patchwork.sourceware.org/project/glibc/patch/20210330172518.184058-1-yuanzi@google.com/ Regards, Lirong On Fri, Mar 19, 2021 at 11:31 AM Lirong Yuan <yuanzi@google.com> wrote: > On Tue, Mar 16, 2021 at 12:47 PM Adhemerval Zanella < > adhemerval.zanella@linaro.org> wrote: > >> Could you check if using the expected types yields any regression? >> All their usages are using explicit cast to the expected types, so >> I am can't see why they have declared as char at first place. > > > Took a while to get it right but checked that using the expected types did > not yield any regression. :) > > On Wed, Mar 17, 2021 at 4:34 AM Adhemerval Zanella < > adhemerval.zanella@linaro.org> wrote: > >> But wouldn't the array alignment change cause this very changes as well? >> The ABI of binary data exposed is already exported as 'unsigned short', >> so not sure if will change anything here. I am not sure about on-disk >> layout and I don't recall the ALTMON change specifically. > > > Layout seemed to change for the newly built libraries. Clean lib: 6982: > 0000000000025f65 768 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class > 6983: 0000000000026265 1024 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class32 > 6984: 0000000000027588 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_alnum > 6985: 00000000000272f8 72 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_alpha > 6986: 00000000000274ac 68 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_blank > 6987: 00000000000274f0 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_cntrl > 6988: 0000000000027340 68 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_digit > 6989: 0000000000027460 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_graph > 6990: 00000000000272b0 72 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_lower > 6991: 0000000000027414 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_print > 6992: 000000000002753c 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_punct > 6993: 00000000000273d0 68 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_space > 6994: 0000000000027268 72 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_upper > 6995: 0000000000027384 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_xdigit > > New lib built using alignas: 5662: 000000000002ac5e 768 OBJECT LOCAL > HIDDEN 12 _nl_C_LC_CTYPE_class > 5663: 000000000002af60 1024 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class32 > 5666: 000000000002bf60 72 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_upper > 5667: 000000000002bfa8 72 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_lower > 5668: 000000000002bff0 72 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_alpha > 5669: 000000000002c038 68 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_digit > 5670: 000000000002c07c 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_xdigit > 5671: 000000000002c0c8 68 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_space > 5672: 000000000002c10c 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_print > 5673: 000000000002c158 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_graph > 5674: 000000000002c1a4 68 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_blank > 5675: 000000000002c1e8 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_cntrl > 5676: 000000000002c234 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_punct > 5677: 000000000002c280 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_alnum > > New lib built using expected types: 5662: 000000000002ac5e 768 OBJECT > LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class > 5663: 000000000002af60 1024 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class32 > 5666: 000000000002bf60 72 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_upper > 5667: 000000000002bfa8 72 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_lower > 5668: 000000000002bff0 72 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_alpha > 5669: 000000000002c038 68 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_digit > 5670: 000000000002c07c 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_xdigit > 5671: 000000000002c0c8 68 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_space > 5672: 000000000002c10c 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_print > 5673: 000000000002c158 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_graph > 5674: 000000000002c1a4 68 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_blank > 5675: 000000000002c1e8 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_cntrl > 5676: 000000000002c234 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_punct > 5677: 000000000002c280 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_alnum > > So were it the case we got lucky that linker is laying out the data >> structures >> in the expected alignment? > > > Yeah, sort of like a race condition that can only be discovered under > certain circumstances. > In this case, the condition is running an arm program compiled with ubsan > under qemu user mode emulation on a x86 machine. > > Regards, > Lirong > > On Wed, Mar 17, 2021 at 4:34 AM Adhemerval Zanella < > adhemerval.zanella@linaro.org> wrote: > >> >> >> On 16/03/2021 18:05, Carlos O'Donell wrote: >> > On 3/16/21 3:47 PM, Adhemerval Zanella via Libc-alpha wrote: >> >> >> >> >> >> On 16/03/2021 16:05, Lirong Yuan via Libc-alpha wrote: >> >>> On Mon, Mar 15, 2021 at 6:45 PM Carlos O'Donell <carlos@redhat.com> >> wrote: >> >>> >> >>>> My expectation is that normally aarch64 simply handles the unaligned >> load >> >>>> without any problems, >> >>>> but that it would be "better" if it were 16-bit aligned? >> >>>> Is this the *only* case of misaligned pointers? >> >>> >> >>> >> >>> Yes, this is the only case reported by UBSan. >> >>> >> >>>> Signed-off-by: Lirong Yuan <yuanzi@google.com> >> >>>> We don't use DSOs in glibc, we assign copyright to the FSF, so this >> line >> >>>> would >> >>>> be normally removed, and you as the git author remains. >> >>> >> >>> >> >>> Thanks for the explanation! I will send an updated patch without >> >>> "Signed-off-by" if the current approach looks good. :) >> >>> >> >>> On Tue, Mar 16, 2021 at 7:28 AM Szabolcs Nagy <szabolcs.nagy@arm.com> >> wrote: >> >>> >> >>>> The 03/15/2021 21:44, Carlos O'Donell wrote: >> >>>>> On 3/15/21 2:42 PM, Lirong Yuan via Libc-alpha wrote: >> >>>>>> steps to reproduce the problem: compile a program that uses ctype >> >>>> functions such as “isspace” for aarch64 with UBSan flag >> >>>> “-fsanitize=undefined” and run it on x86_64 machines with qemu user >> mode >> >>>> emulation. >> >>>>> >> >>>>> Szabolcs, >> >>>>> >> >>>>> Do you have any input on this? >> >>>>> >> >>>>>> observed behavior: UndefinedBehaviorSanitizer reports >> >>>> misaligned-pointer-use in the program. >> >>>>> >> >>>>> Yes, the char array could be misaligned with respect to a 16-bit >> value, >> >>>>> and should be aligned to the type that is expected from the >> interface >> >>>> e.g. >> >>>> >> >>>> using char[] as uint16_t[] is aliasing violation. and in principle >> >>>> alignas on the definition does not fix this, but in practice that's >> >>>> the only abi visible aspect of the wrong type. >> >>>> >> >>> >> >>> Alternatively, we can define _nl_C_LC_CTYPE_class and >> >>> _nl_C_LC_CTYPE_class32 arrays directly as uint16_t and uint32_t >> arrays, >> >>> like _nl_C_LC_CTYPE_toupper array: >> >>> >> https://code.woboq.org/userspace/glibc/locale/C-ctype.c.html#_nl_C_LC_CTYPE_toupper >> >>> Though the conversion may be error-prune and require more test >> cases... >> >>> It would seem that using alignas is an approach that's both >> technically >> >>> correct and less likely to cause havoc. >> >> >> >> Could you check if using the expected types yields any regression? >> >> All their usages are using explicit cast to the expected types, so >> >> I am can't see why they have declared as char at first place. >> > >> > It might work, but it: >> > >> > * changes the ABI of binary data exposed via nl_langinfo()? >> > - Pointers to the array are returned. >> > >> > * changes the on-disk layout and invalidates all system binary locales? >> > - Something similar happened for ALTMON and we just rebuilt >> everything. >> > - It isn't entirely clear to me what our guarantee of binary >> compatibility >> > is for compiled locales. >> > >> > Is this a concern? >> > >> >> But wouldn't the array alignment change cause this very changes as well? >> The ABI of binary data exposed is already exported as 'unsigned short', >> so not sure if will change anything here. I am not sure about on-disk >> layout and I don't recall the ALTMON change specifically. >> >> >>>> i'm not sure why ubsanitizer cares about alignment specifically on >> >>>> aarch64, unaligned load should work. >> >>>> >> >>> >> >>> Yes, the code works fine in practice on aarch64. The ubsan alignment >> is a >> >>> check for misaligned rather than unaligned. It's almost always worth >> fixing >> >>> since this can cause subtle and hard to track down failures that more >> often >> >>> manifest on other architectures. >> >> >> >> I would expect that it this is really accessed in an unaligned manner >> it >> >> would blow in some architectures (sparc and some arm and mips >> environments). >> >> Not sure why we haven't see any issues on such architectures. >> > >> > AFAICT the structure is *not* misaligned on x86_64 and it has to do >> with the >> > vagaries of the compiler and linker you use and the alignment of global >> > variables. On x86_64 the structures are all over-aligned. I expect the >> same >> > is true for all other arches. Perhaps Google used lld with an >> experimental >> > glibc patch and they have more tightly aligned object layout. >> >> So were it the case we got lucky that linker is laying out the data >> structures >> in the expected alignment? >> >> > >> > e.g. >> > readelf -a -W /lib64/libc.so.6 | grep _nl_C_LC_CTYPE_class >> > 23087: 0000000000176ec0 1024 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class32 >> > 23125: 00000000001761a0 72 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class_alpha >> > 23699: 0000000000176020 76 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class_print >> > 23960: 0000000000175e40 76 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class_alnum >> > 24152: 0000000000176200 72 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class_lower >> > 24208: 00000000001760e0 76 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class_xdigit >> > 24547: 0000000000175fc0 76 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class_graph >> > 24569: 0000000000175f60 68 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class_blank >> > 24582: 0000000000175ea0 76 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class_punct >> > 24764: 0000000000176140 68 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class_digit >> > 24883: 0000000000175f00 76 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class_cntrl >> > 24991: 00000000001772c0 768 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class >> > 25344: 0000000000176260 72 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class_upper >> > 25416: 0000000000176080 68 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class_space >> > >> > Aligned on 16-byte boundaries. >> > >> > On AArch64 I see this: >> > eu-readelf -a -W lib64/libc.so.6 | grep _nl_C_LC_CTYPE_class >> > 34906: 000000000011eea8 1024 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class32 >> > 34945: 000000000011e1d0 72 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class_alpha >> > 35470: 000000000011e0a0 76 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class_print >> > 35712: 000000000011df18 76 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class_alnum >> > 35889: 000000000011e218 72 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class_lower >> > 35944: 000000000011e138 76 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class_xdigit >> > 36246: 000000000011e050 76 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class_graph >> > 36268: 000000000011e008 68 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class_blank >> > 36284: 000000000011df68 76 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class_punct >> > 36455: 000000000011e188 68 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class_digit >> > 36571: 000000000011dfb8 76 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class_cntrl >> > 36670: 000000000011f2a8 768 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class >> > 36993: 000000000011e260 72 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class_upper >> > 37061: 000000000011e0f0 68 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class_space >> > >> > Aligned on 8-byte boundaries. >> > >> > But that alignment could go lower and still be valid and cause >> unaligned accesses. >> > >> >
diff --git a/locale/C-ctype.c b/locale/C-ctype.c index bffdbedad0..da2c8cc33c 100644 --- a/locale/C-ctype.c +++ b/locale/C-ctype.c @@ -18,6 +18,7 @@ #include "localeinfo.h" #include <endian.h> +#include <stdalign.h> #include <stdint.h> #include "C-translit.h" @@ -30,7 +31,7 @@ In the `_nl_C_LC_CTYPE_class' array the value for EOF (== -1) is set to always return 0 and the conversion arrays return EOF. */ -const char _nl_C_LC_CTYPE_class[768] attribute_hidden = +alignas(uint16_t) const char _nl_C_LC_CTYPE_class[768] attribute_hidden = /* 0x80 */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" /* 0x86 */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" /* 0x8c */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" @@ -96,7 +97,7 @@ const char _nl_C_LC_CTYPE_class[768] attribute_hidden = /* 0xf4 */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" /* 0xfa */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" ; -const char _nl_C_LC_CTYPE_class32[1024] attribute_hidden = +alignas(uint32_t) const char _nl_C_LC_CTYPE_class32[1024] attribute_hidden = /* 0x00 */ "\000\000\002\000" "\000\000\002\000" "\000\000\002\000" /* 0x03 */ "\000\000\002\000" "\000\000\002\000" "\000\000\002\000" /* 0x06 */ "\000\000\002\000" "\000\000\002\000" "\000\000\002\000"