Message ID | CAMe9rOp7c1rdM6VLU11KTOXRsCKNPLeA+1SifWNtsTxYw2X2fw@mail.gmail.com |
---|---|
State | Committed |
Headers | From: "H.J. Lu" <hjl.tools@gmail.com>; To: "Carlos O'Donell" <carlos@redhat.com>; Cc: GNU C Library <libc-alpha@sourceware.org>; Date: Sun, 2 Aug 2015 10:23:07 -0700; Subject: Re: glibc 2.22 -- Final testing for 32-bit x86 failing?; In-Reply-To: <55BE3C67.8010408@redhat.com>; References: <55BD1D92.6040406@redhat.com> <CAMe9rOr3fOZK_PUt-pEqivGi2h3qmSaQaKLipN7LG8VgB45D7A@mail.gmail.com> <55BE3C67.8010408@redhat.com> |
Commit Message
H.J. Lu
Aug. 2, 2015, 5:23 p.m. UTC
On Sun, Aug 2, 2015 at 8:51 AM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 08/01/2015 04:03 PM, H.J. Lu wrote:
>> On Sat, Aug 1, 2015 at 12:27 PM, Carlos O'Donell <carlos@redhat.com> wrote:
>>> Community,
>>>
>>> Has anyone else done 32-bit x86 testing and had it work? Top of master
>>> is showing some kind of problem with memory corruption.
>>>
>>> memory clobbered past end of allocated block
>>>
>>> Program received signal SIGABRT, Aborted.
>>> 0xf7ffdc10 in __kernel_vsyscall ()
>>> (gdb) bt
>>> #0 0xf7ffdc10 in __kernel_vsyscall ()
>>> #1 0xf7e7548f in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
>>> #2 0xf7e76ec0 in __GI_abort () at abort.c:89
>>> #3 0xf7eb4f99 in __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0xf7fbaf3f "%s") at ../sysdeps/posix/libc_fatal.c:175
>>> #4 0xf7eb4fd4 in __GI___libc_fatal (message=0xf7fbe128 "memory clobbered past end of allocated block\n")
>>>     at ../sysdeps/posix/libc_fatal.c:186
>>> #5 0xf7ec2570 in mabort (status=MCHECK_TAIL) at mcheck.c:362
>>> #6 0xf7ec264b in checkhdr (hdr=hdr@entry=0x5657a018) at mcheck.c:113
>>> #7 0xf7ec2c18 in checkhdr (hdr=0x5657a018) at mcheck.c:185
>>> #8 freehook (ptr=0x5657a030, caller=0xf7e6c86b <_nl_load_locale_from_archive+1419>) at mcheck.c:186
>>> #9 0xf7ec0395 in __GI___libc_free (mem=0x5657a030) at malloc.c:2936
>>> #10 0xf7e6c86b in _nl_load_locale_from_archive (category=category@entry=5, namep=namep@entry=0xffffcd9c) at loadarchive.c:190
>>> #11 0xf7e6b730 in _nl_find_locale (locale_path=0x0, locale_path_len=0, category=category@entry=5, name=name@entry=0xffffcd9c)
>>>     at findlocale.c:154
>>> #12 0xf7e6ae51 in __GI_setlocale (category=5, locale=0x80882e4 "") at setlocale.c:417
>>> #13 0x0804a428 in ?? ()
>>> #14 0xf7e60480 in __libc_start_main (main=0x804a3d0, argc=2, argv=0xffffcfc0, init=0x8088200, fini=0x8088270,
>>>     rtld_fini=0x56565ab0 <_dl_fini>, stack_end=0xffffcfbc) at libc-start.c:289
>>> #15 0x0804abb7 in ?? ()
>>> (gdb)
>>>
>>> Has anyone seen this? This appears to be new. I'll see if I can track
>>> this down to an environment change in my build box (Fedora 21).
>>
>> I only saw
>>
>> FAIL: math/test-float
>>
>> on Fedora 22.
>
> Could you do me a favour and run valgrind on your build of localedef?
>
> valgrind --leak-check=full --show-leak-kinds=all $PWD/elf/ld.so --library-path $PWD:$PWD/elf ./locale/localedef --list-archive
>
> The most important thing I see is this:
>
> ==20823== Warning: set address range perms: large range [0x809a000, 0x2809a000) (noaccess)
> ==20823== Invalid read of size 4
> ==20823==    at 0x496BEF9: _nl_archive_subfreeres (in /home/carlos/scratch/build/glibc-pristine-i686/libc.so)
> ==20823==    by 0x496BC43: free_mem (in /home/carlos/scratch/build/glibc-pristine-i686/libc.so)
> ==20823==    by 0x496C3A1: __libc_freeres (in /home/carlos/scratch/build/glibc-pristine-i686/libc.so)
> ==20823==    by 0x4801528: _vgnU_freeres (in /usr/lib/valgrind/vgpreload_core-x86-linux.so)
> ==20823==    by 0x48D0CDD: _Exit (_exit.S:29)
> ==20823==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
> ==20823==
> ==20823==
> ==20823== Process terminating with default action of signal 11 (SIGSEGV)
> ==20823==  Access not within mapped region at address 0x10
> ==20823==    at 0x496BEF9: _nl_archive_subfreeres (in /home/carlos/scratch/build/glibc-pristine-i686/libc.so)
> ==20823==    by 0x496BC43: free_mem (in /home/carlos/scratch/build/glibc-pristine-i686/libc.so)
> ==20823==    by 0x496C3A1: __libc_freeres (in /home/carlos/scratch/build/glibc-pristine-i686/libc.so)
> ==20823==    by 0x4801528: _vgnU_freeres (in /usr/lib/valgrind/vgpreload_core-x86-linux.so)
> ==20823==    by 0x48D0CDD: _Exit (_exit.S:29)
> ==20823==  If you believe this happened as a result of a stack
> ==20823==  overflow in your program's main thread (unlikely but
> ==20823==  possible), you can try to increase the size of the
> ==20823==  main thread stack using the --main-stacksize= flag.
> ==20823==  The main thread stack size used in this run was 8388608.
>
> This is not actually where it fails in glibc, which detects the failure
> earlier via malloc checking.
>
> c.

I can reproduce it on both ia32 and x86-64.

struct __locale_data *
internal_function
_nl_load_locale_from_archive (int category, const char **namep)
{

has

  for (cnt = 0; cnt < __LC_LAST; ++cnt)
    if (cnt != LC_ALL)
      {
        lia->data[cnt] = _nl_intern_locale_data (cnt,
                                                 results[cnt].addr,
                                                 results[cnt].len);
        if (__glibc_likely (lia->data[cnt] != NULL))
          {
            /* _nl_intern_locale_data leaves us these fields to initialize.  */
            lia->data[cnt]->alloc = ld_archive;
            lia->data[cnt]->name = lia->name;

            /* We do this instead of bumping the count each time we return
               this data because the mappings stay around forever anyway
               and we might as well hold on to a little more memory and not
               have to rebuild it on the next lookup of the same thing.
               If we were to maintain the usage_count normally and let the
               structures be freed, we would have to remove the elements
               from archloaded too.  */
            lia->data[cnt]->usage_count = UNDELETABLE;
          }
      }

lia->data[cnt] can be NULL, which happens with en_US.UTF-8 and
LC_COLLATE. _nl_archive_subfreeres, however, dereferences every
non-LC_ALL entry of dead->data without checking for NULL. This won't
happen if glibc is configured with --enable-hardcoded-path-in-tests,
which I have been using.

This patch fixes it by skipping NULL entries in _nl_archive_subfreeres.
Comments
On 08/02/2015 01:23 PM, H.J. Lu wrote:
> [...]
> lia->data[cnt] can be NULL, which happens with en_US.UTF-8 and
> LC_COLLATE. But this won't happen if glibc is configured with
> --enable-hardcoded-path-in-tests, which I have been using.
>
> This patch fixes it.

Your patch does fix this problem, but it doesn't solve the issue I'm
seeing on F21. I only see that issue on F21, not on RHEL7, so it likely
indicates a compiler<->glibc interaction. Given that I don't see it on a
better-tested platform with more stable tools, I'm going to ignore this
as a blocker.

Your patch looks good though, and should go in for 2.23.

Cheers,
Carlos.
diff --git a/locale/loadarchive.c b/locale/loadarchive.c
index ce5c210..3e18cf0 100644
--- a/locale/loadarchive.c
+++ b/locale/loadarchive.c
@@ -515,7 +515,7 @@ _nl_archive_subfreeres (void)
       free (dead->name);
       for (category = 0; category < __LC_LAST; ++category)
-        if (category != LC_ALL)
+        if (category != LC_ALL && dead->data[category] != NULL)
           {
             /* _nl_unload_locale just does this free for the archive case. */
             if (dead->data[category]->private.cleanup)