Message ID | 1443360385-20079-1-git-send-email-koriakin@0x04.net |
---|---|
State | Changes Requested, archived |
Headers |
Received: (qmail 88075 invoked by alias); 27 Sep 2015 13:26:36 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <libc-alpha.sourceware.org> List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 88059 invoked by uid 89); 27 Sep 2015 13:26:35 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.4 required=5.0 tests=AWL, BAYES_00, SPF_PASS, T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 X-HELO: xyzzy.0x04.net From: =?UTF-8?q?Marcin=20Ko=C5=9Bcielnicki?= <koriakin@0x04.net> To: libc-alpha@sourceware.org Cc: =?UTF-8?q?Marcin=20Ko=C5=9Bcielnicki?= <koriakin@0x04.net> Subject: [PATCH][BZ 18960] setlocale.c: Mark *_used symbols as unaligned. Date: Sun, 27 Sep 2015 15:26:25 +0200 Message-Id: <1443360385-20079-1-git-send-email-koriakin@0x04.net> |
Commit Message
Marcin Kościelnicki
Sept. 27, 2015, 1:26 p.m. UTC
This ensures that compiler doesn't get the values of these symbols using instructions that have alignment requirements (eg. larl on s390). --- locale/setlocale.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Comments
On 09/27/2015 03:26 PM, Marcin Kościelnicki wrote: > This ensures that compiler doesn't get the values of these symbols > using instructions that have alignment requirements (eg. larl on s390). The commit message should perhaps say “the address of these variables” because the trigger for this issue is in C code, so it makes sense to use C terms. > --- > locale/setlocale.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/locale/setlocale.c b/locale/setlocale.c > index ead030d..028496d 100644 > --- a/locale/setlocale.c > +++ b/locale/setlocale.c > @@ -35,7 +35,7 @@ > Also use a weak reference for the _nl_current_CATEGORY thread variable. */ > > # define DEFINE_CATEGORY(category, category_name, items, a) \ > - extern char _nl_current_##category##_used; \ > + extern char _nl_current_##category##_used __attribute__((__aligned__(1))); \ This is a very gray area as far as GCC is concerned. This side effect of attributed “aligned” is not documented, and I'm not sure if we can rely on it. It's a bit like making a non-weak function symbol zero. Maybe the better approach would be to change 1 to 8 or 16.
On 09/28/2015 09:16 AM, Florian Weimer wrote: > On 09/27/2015 03:26 PM, Marcin Kościelnicki wrote: >> This ensures that compiler doesn't get the values of these symbols >> using instructions that have alignment requirements (eg. larl on s390). > > The commit message should perhaps say “the address of these variables” > because the trigger for this issue is in C code, so it makes sense to > use C terms. > >> --- >> locale/setlocale.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/locale/setlocale.c b/locale/setlocale.c >> index ead030d..028496d 100644 >> --- a/locale/setlocale.c >> +++ b/locale/setlocale.c >> @@ -35,7 +35,7 @@ >> Also use a weak reference for the _nl_current_CATEGORY thread variable. */ >> >> # define DEFINE_CATEGORY(category, category_name, items, a) \ >> - extern char _nl_current_##category##_used; \ >> + extern char _nl_current_##category##_used __attribute__((__aligned__(1))); \ > > This is a very gray area as far as GCC is concerned. This side effect > of attributed “aligned” is not documented, and I'm not sure if we can > rely on it. It's a bit like making a non-weak function symbol zero. > > Maybe the better approach would be to change 1 to 8 or 16. Agreed. See locale/localeinfo.h and adjust the asm to use 16, and see if that fixes the problem. If it does, you'll need a large comment there explaining why 16 is important and should not be changed. I'm surpised there aren't other instances where this doesn't cause problems, but there are indeed few of these cases where data addresses are used, and AFAICT only one which is type 'char', this one. The other instance is _dl_rtld_map which has a larger alignment. A function address would have all the requirements for alignment that you would need to avoid this problem. Cheers, Carlos.
On Sun, Sep 27, 2015 at 03:26:25PM +0200, Marcin Kościelnicki wrote: > This ensures that compiler doesn't get the values of these symbols > using instructions that have alignment requirements (eg. larl on s390). > --- > locale/setlocale.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/locale/setlocale.c b/locale/setlocale.c > index ead030d..028496d 100644 > --- a/locale/setlocale.c > +++ b/locale/setlocale.c > @@ -35,7 +35,7 @@ > Also use a weak reference for the _nl_current_CATEGORY thread variable. */ > > # define DEFINE_CATEGORY(category, category_name, items, a) \ > - extern char _nl_current_##category##_used; \ > + extern char _nl_current_##category##_used __attribute__((__aligned__(1))); \ > weak_extern (_nl_current_##category##_used) \ > weak_extern (_nl_current_##category) > # include "categories.def" Can you explain how/why you think this is needed? char has no alignment requirements (inherently) so as far as I can tell, the compiler may not make any alignment assumptions about an extern object of type char. Rich
On 09/28/2015 04:18 PM, Rich Felker wrote: > Can you explain how/why you think this is needed? char has no > alignment requirements (inherently) so as far as I can tell, the > compiler may not make any alignment assumptions about an extern object > of type char. s390(x) expects all top-level objects in the data segment to be aligned to at least 2. As far as I can tell, this is not explicitly mentioned in the psABI supplement, but it is heavily implied by .align directives and use of the lalr instruction.
Rich Felker <dalias@libc.org> writes: > Can you explain how/why you think this is needed? char has no > alignment requirements (inherently) so as far as I can tell, the > compiler may not make any alignment assumptions about an extern object > of type char. See DATA_ABI_ALIGNMENT in the gccint manual. This is an ABI property. Andreas.
On Mon, Sep 28, 2015 at 04:52:10PM +0200, Florian Weimer wrote: > On 09/28/2015 04:18 PM, Rich Felker wrote: > > > Can you explain how/why you think this is needed? char has no > > alignment requirements (inherently) so as far as I can tell, the > > compiler may not make any alignment assumptions about an extern object > > of type char. > > s390(x) expects all top-level objects in the data segment to be aligned > to at least 2. As far as I can tell, this is not explicitly mentioned > in the psABI supplement, but it is heavily implied by .align directives > and use of the lalr instruction. So reading chars via a pointer or array is slower than reading a single non-array char object? Uhg...what an awful ISA. Thanks for the explanation, though. Rich
On 28/09/15 17:05, Rich Felker wrote: > On Mon, Sep 28, 2015 at 04:52:10PM +0200, Florian Weimer wrote: >> On 09/28/2015 04:18 PM, Rich Felker wrote: >> >>> Can you explain how/why you think this is needed? char has no >>> alignment requirements (inherently) so as far as I can tell, the >>> compiler may not make any alignment assumptions about an extern object >>> of type char. >> >> s390(x) expects all top-level objects in the data segment to be aligned >> to at least 2. As far as I can tell, this is not explicitly mentioned >> in the psABI supplement, but it is heavily implied by .align directives >> and use of the lalr instruction. > > So reading chars via a pointer or array is slower than reading a > single non-array char object? Uhg...what an awful ISA. Thanks for the > explanation, though. > > Rich > It's not about access speed, it's about loading the address of a symbol. Eg. extern char x[10]; read x[0]: larl %r2, x lg %r2, 0(%r2) read x[1]: larl %r2, x lg %r2, 1(%r2) If address of x is not aligned to 2 bytes, larl won't work and you have to stuff the address into a literal pool and read it from there.
On 28 Sep 2015 16:52, Florian Weimer wrote: > On 09/28/2015 04:18 PM, Rich Felker wrote: > > Can you explain how/why you think this is needed? char has no > > alignment requirements (inherently) so as far as I can tell, the > > compiler may not make any alignment assumptions about an extern object > > of type char. > > s390(x) expects all top-level objects in the data segment to be aligned > to at least 2. As far as I can tell, this is not explicitly mentioned > in the psABI supplement, but it is heavily implied by .align directives > and use of the lalr instruction. how does that bubble in here though ? does locale/localeinfo.h need to update _NL_CURRENT_DEFINE to use helpers from libc-symbols.h ? this seems to be patching a single symptom point rather than addressing the source. -mike
On 28/09/15 16:17, Carlos O'Donell wrote: > On 09/28/2015 09:16 AM, Florian Weimer wrote: >> On 09/27/2015 03:26 PM, Marcin Kościelnicki wrote: >>> This ensures that compiler doesn't get the values of these symbols >>> using instructions that have alignment requirements (eg. larl on s390). >> >> The commit message should perhaps say “the address of these variables” >> because the trigger for this issue is in C code, so it makes sense to >> use C terms. >> >>> --- >>> locale/setlocale.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/locale/setlocale.c b/locale/setlocale.c >>> index ead030d..028496d 100644 >>> --- a/locale/setlocale.c >>> +++ b/locale/setlocale.c >>> @@ -35,7 +35,7 @@ >>> Also use a weak reference for the _nl_current_CATEGORY thread variable. */ >>> >>> # define DEFINE_CATEGORY(category, category_name, items, a) \ >>> - extern char _nl_current_##category##_used; \ >>> + extern char _nl_current_##category##_used __attribute__((__aligned__(1))); \ >> >> This is a very gray area as far as GCC is concerned. This side effect >> of attributed “aligned” is not documented, and I'm not sure if we can >> rely on it. It's a bit like making a non-weak function symbol zero. FWIW, it seems to be quite purposeful, but s390 specific: https://github.com/gcc-mirror/gcc/blob/gcc_5_2_0_release/gcc/config/s390/s390.c#L11798 >> >> Maybe the better approach would be to change 1 to 8 or 16. > > Agreed. > > See locale/localeinfo.h and adjust the asm to use 16, and see if that fixes > the problem. If it does, you'll need a large comment there explaining why > 16 is important and should not be changed. Yeah, that's another way to fix the problem. I picked the aligned attribute since I also had to fix a similiar problem in gold testsuite, which actually relied on symbols having unaligned values. But "use 16" seems just fine for glibc. > > I'm surpised there aren't other instances where this doesn't cause problems, > but there are indeed few of these cases where data addresses are used, and > AFAICT only one which is type 'char', this one. The other instance is _dl_rtld_map > which has a larger alignment. There's one major problem: GNU ld does not emit any diagnostics about misaligned larl target, it just happily forces the LSB to 0. So there may be a lot of problems that noone noticed. As for _dl_rtld_map, that'd mean miscalculating _end or _edata by 1 if they're unaligned (which is likely quite rare). Sounds rather hard to notice. > > A function address would have all the requirements for alignment that you would > need to avoid this problem. Sorry, I don't get what you're talking about. > > Cheers, > Carlos. > > >
On 09/28/2015 05:17 PM, Mike Frysinger wrote: > On 28 Sep 2015 16:52, Florian Weimer wrote: >> On 09/28/2015 04:18 PM, Rich Felker wrote: >>> Can you explain how/why you think this is needed? char has no >>> alignment requirements (inherently) so as far as I can tell, the >>> compiler may not make any alignment assumptions about an extern object >>> of type char. >> >> s390(x) expects all top-level objects in the data segment to be aligned >> to at least 2. As far as I can tell, this is not explicitly mentioned >> in the psABI supplement, but it is heavily implied by .align directives >> and use of the lalr instruction. > > how does that bubble in here though ? The value of a weak symbol is used as flag, to determine that a particular piece of code has been linked in. If the object file is pulled in by other code, the symbol is defined with an absolute address. > does locale/localeinfo.h need to update > _NL_CURRENT_DEFINE to use helpers from libc-symbols.h ? this seems to be > patching a single symptom point rather than addressing the source. I'm not sure if this occurs in other places.
On 09/28/2015 11:22 AM, Florian Weimer wrote: > On 09/28/2015 05:17 PM, Mike Frysinger wrote: >> On 28 Sep 2015 16:52, Florian Weimer wrote: >>> On 09/28/2015 04:18 PM, Rich Felker wrote: >>>> Can you explain how/why you think this is needed? char has no >>>> alignment requirements (inherently) so as far as I can tell, the >>>> compiler may not make any alignment assumptions about an extern object >>>> of type char. >>> >>> s390(x) expects all top-level objects in the data segment to be aligned >>> to at least 2. As far as I can tell, this is not explicitly mentioned >>> in the psABI supplement, but it is heavily implied by .align directives >>> and use of the lalr instruction. >> >> how does that bubble in here though ? > > The value of a weak symbol is used as flag, to determine that a > particular piece of code has been linked in. If the object file is > pulled in by other code, the symbol is defined with an absolute address. > >> does locale/localeinfo.h need to update >> _NL_CURRENT_DEFINE to use helpers from libc-symbols.h ? this seems to be >> patching a single symptom point rather than addressing the source. > > I'm not sure if this occurs in other places. As I said earlier I did a quick review and this is likely the only case I could find. I encourage others to look though. In some sense I agree with Mike, that there could be some kind of macro to prevent this from ever happening again. That all weak-ref-check idioms for data addresses should go through a macro that prevents s390 from failing and enforces a sane symbol-like value (if such a macro doesn't already exist). It seems a bit like overkill right now though. I think just making the value 16 with a comment will work. Cheers, Carlos.
On 09/28/2015 11:20 AM, Marcin Kościelnicki wrote: >> A function address would have all the requirements for alignment that you would >> need to avoid this problem. > > Sorry, I don't get what you're talking about. This is an artificial address of 1, which is why it is violating assumptions in the s390x implementation. If the address had been that of a function, we would not have seen this failure. We might add a way to create valid addresses from flags and vice versa using a macro from libc-symbols.h as Mike suggests. Totally untested: sysdeps/generic/libc-symbols.h /* On a generic machine any symbol address is valid. */ #define FLAGS_TO_SYMADDR(val) (val) #define SYMADDR_TO_FLAGS(val) (val) #include_next <libc-symbols.h> sysdeps/s390/libc-symbols.h /* On s390 a symbol address must be aligned to X. Therefore if we hide flags in a symbol address we loose the lowest bits for flags. */ #define FLAGS_TO_SYMADDR(val) (val << X) #define SYMADDR_TO_FLAGS(val) (val >> X) #include_next <libc-symbols.h> locale/localeinfo.h: /* This is used in lc-CATEGORY.c to define _nl_current_CATEGORY. */ #define _NL_CURRENT_DEFINE(category) \ __thread struct __locale_data *const *_nl_current_##category \ attribute_hidden = &_nl_global_locale.__locales[category]; \ asm (".globl " __SYMBOL_PREFIX "_nl_current_" #category "_used\n" \ _NL_CURRENT_DEFINE_ABS (_nl_current_##category##_used, FLAGS_TO_SYMADDR (1))); ... locale/uselocale.c: # define DEFINE_CATEGORY(category, category_name, items, a) \ { \ extern char _nl_current_##category##_used; \ weak_extern (_nl_current_##category##_used) \ weak_extern (_nl_current_##category) \ if (SYMADDR_TO_FLAGS(&_nl_current_##category##_used) != 0) \ _nl_current_##category = &locobj->__locales[category]; \ } Does that make sense? Cheers, Carlos.
On Mon, Sep 28, 2015 at 11:28:07AM -0400, Carlos O'Donell wrote: > On 09/28/2015 11:22 AM, Florian Weimer wrote: > > On 09/28/2015 05:17 PM, Mike Frysinger wrote: > >> On 28 Sep 2015 16:52, Florian Weimer wrote: > >>> On 09/28/2015 04:18 PM, Rich Felker wrote: > >>>> Can you explain how/why you think this is needed? char has no > >>>> alignment requirements (inherently) so as far as I can tell, the > >>>> compiler may not make any alignment assumptions about an extern object > >>>> of type char. > >>> > >>> s390(x) expects all top-level objects in the data segment to be aligned > >>> to at least 2. As far as I can tell, this is not explicitly mentioned > >>> in the psABI supplement, but it is heavily implied by .align directives > >>> and use of the lalr instruction. > >> > >> how does that bubble in here though ? > > > > The value of a weak symbol is used as flag, to determine that a > > particular piece of code has been linked in. If the object file is > > pulled in by other code, the symbol is defined with an absolute address. > > > >> does locale/localeinfo.h need to update > >> _NL_CURRENT_DEFINE to use helpers from libc-symbols.h ? this seems to be > >> patching a single symptom point rather than addressing the source. > > > > I'm not sure if this occurs in other places. > > As I said earlier I did a quick review and this is likely the only case > I could find. I encourage others to look though. > > In some sense I agree with Mike, that there could be some kind of macro > to prevent this from ever happening again. That all weak-ref-check idioms > for data addresses should go through a macro that prevents s390 from > failing and enforces a sane symbol-like value (if such a macro doesn't > already exist). > > It seems a bit like overkill right now though. I think just making the > value 16 with a comment will work. Is there a reason this hack with absolute definitions is being used rather than just defining actual objects for the symbols to resolve to (or aliasing them to objects that already exist)? This approach would not require any ABI-specific knowledge. Rich
On 09/28/2015 02:22 PM, Rich Felker wrote: >> It seems a bit like overkill right now though. I think just making the >> value 16 with a comment will work. > > Is there a reason this hack with absolute definitions is being used > rather than just defining actual objects for the symbols to resolve to > (or aliasing them to objects that already exist)? This approach would > not require any ABI-specific knowledge. Yes, exactly, I was just wondering this myself. I think indeed we can just do away with the micro-optimization of using the address as a flag and just allocate an object with a known value. Keep in mind this is only for static linking, since that's the only time NL_CURRENT_INDIRECT is defined. In the dynamic linking case everything is loaded in the libc.so DSO at load time, there is no way to avoid the overhead of handling all of the categories. Cheers, Carlos.
diff --git a/locale/setlocale.c b/locale/setlocale.c index ead030d..028496d 100644 --- a/locale/setlocale.c +++ b/locale/setlocale.c @@ -35,7 +35,7 @@ Also use a weak reference for the _nl_current_CATEGORY thread variable. */ # define DEFINE_CATEGORY(category, category_name, items, a) \ - extern char _nl_current_##category##_used; \ + extern char _nl_current_##category##_used __attribute__((__aligned__(1))); \ weak_extern (_nl_current_##category##_used) \ weak_extern (_nl_current_##category) # include "categories.def"