Message ID | 1457456791-17402-1-git-send-email-aurelien@aurel32.net |
---|---|
State | Changes Requested, archived |
Headers |
Received: (qmail 49756 invoked by alias); 8 Mar 2016 17:06:47 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <libc-alpha.sourceware.org> List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 49729 invoked by uid 89); 8 Mar 2016 17:06:46 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=1.8 required=5.0 tests=BAYES_50, KAM_LAZY_DOMAIN_SECURITY, RP_MATCHES_RCVD autolearn=no version=3.3.2 spammy=U*aurelien, sk:aurelie, aurelien@aurel32.net, D*aurel32.net X-HELO: hall.aurel32.net From: Aurelien Jarno <aurelien@aurel32.net> To: libc-alpha@sourceware.org Cc: Aurelien Jarno <aurelien@aurel32.net> Subject: [PATCH] nscd_stat.c: make the build reproducible Date: Tue, 8 Mar 2016 18:06:31 +0100 Message-Id: <1457456791-17402-1-git-send-email-aurelien@aurel32.net> |
Commit Message
Aurelien Jarno
March 8, 2016, 5:06 p.m. UTC
nscd_stat.c uses the __DATE__ and __TIME__ macros to make sure the client and the server use the same format. This prevents reproducible builds and fails to build with -Werror=date-time. In addition another build of the same file a bit later does not necessary imply a change in the format. Instead compute a checksum of the file in the Makefile and pass it to the preprocessor with the -D option. Use the md5sum command limited to 20 characters to avoid changing the structure size. Changelog: * math/nscd (CPPFLAGS-nscd_stat.c): Append -DNSCD_STAT_C_CHECKSUM=... with the checksum of nscd_stat.c computed with md5sum. * nscd/nscd_stat.c (compilation): Use NSCD_STAT_C_CHECKSUM instead of __DATE__ " " __TIME__. --- ChangeLog | 7 +++++++ nscd/Makefile | 4 ++++ nscd/nscd_stat.c | 2 +- 3 files changed, 12 insertions(+), 1 deletion(-)
Comments
On 08 Mar 2016 18:06, Aurelien Jarno wrote: > nscd_stat.c uses the __DATE__ and __TIME__ macros to make sure the > client and the server use the same format. This prevents reproducible > builds and fails to build with -Werror=date-time. In addition another > build of the same file a bit later does not necessary imply a change > in the format. > > Instead compute a checksum of the file in the Makefile and pass it to > the preprocessor with the -D option. Use the md5sum command limited to > 20 characters to avoid changing the structure size. this doesn't take into consideration the HAVE_SELINUX knob ... -mike
On 2016-03-08 18:37, Mike Frysinger wrote: > On 08 Mar 2016 18:06, Aurelien Jarno wrote: > > nscd_stat.c uses the __DATE__ and __TIME__ macros to make sure the > > client and the server use the same format. This prevents reproducible > > builds and fails to build with -Werror=date-time. In addition another > > build of the same file a bit later does not necessary imply a change > > in the format. > > > > Instead compute a checksum of the file in the Makefile and pass it to > > the preprocessor with the -D option. Use the md5sum command limited to > > 20 characters to avoid changing the structure size. > > this doesn't take into consideration the HAVE_SELINUX knob ... Correct. Back to the design board, I'll try to provide a new solution. Don't hesitate to tell if you have ideas.
On 09 Mar 2016 08:54, Aurelien Jarno wrote: > On 2016-03-08 18:37, Mike Frysinger wrote: > > On 08 Mar 2016 18:06, Aurelien Jarno wrote: > > > nscd_stat.c uses the __DATE__ and __TIME__ macros to make sure the > > > client and the server use the same format. This prevents reproducible > > > builds and fails to build with -Werror=date-time. In addition another > > > build of the same file a bit later does not necessary imply a change > > > in the format. > > > > > > Instead compute a checksum of the file in the Makefile and pass it to > > > the preprocessor with the -D option. Use the md5sum command limited to > > > 20 characters to avoid changing the structure size. > > > > this doesn't take into consideration the HAVE_SELINUX knob ... > > Correct. Back to the design board, I'll try to provide a new solution. > Don't hesitate to tell if you have ideas. mix in the value of $(have-selinux), and put a comment above the struct saying that the Makefile needs to be updated whenever the struct changes. although i think this also doesn't take into account other differences (alignment/sizes). this is a struct that is transmitted over the unix socket ? so if i launch a 32bit x86 app on a system with a 64bit x86_64 nscd, it won't work. would it be so terrible to properly marshall this data ? -mike
On 03/09/2016 05:30 PM, Mike Frysinger wrote:
> would it be so terrible to properly marshall this data ?
Ximin Luo and I discussed this and I wonder if it is possible to read
out the libc.so.6 build ID if it is present. It should indirectly call
all the layout dependencies and be reasonably easy to access because it
is in an allocated section (and we might want to print it from an
libc.so.6 invocation, too).
We still need the time-based approach if the build ID is not available,
but I expect most distributions will have something like it.
The Debian bug is:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=783210
(Also Cc:ed)
Florian
Florian Weimer <fweimer@redhat.com> skribis: > We still need the time-based approach if the build ID is not > available, but I expect most distributions will have something like > it. FWIW in Guix we solve it by filling the ‘compilation’ array with a substring of the installation prefix¹. Since the installation prefix is something like /gnu/store/5fx3vscv9pqjr1k0vyaqnpqlvvzl8rff-glibc-2.22, which comprises a hash of all the source, build scripts, and dependencies used to build it, we know that it uniquely identifies the result of this specific glibc build. The build ID should be a good approximation of this. Ludo’. ¹ http://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/base.scm#n603
On 28 Jul 2016 15:15, Florian Weimer wrote: > On 03/09/2016 05:30 PM, Mike Frysinger wrote: > > would it be so terrible to properly marshall this data ? > > Ximin Luo and I discussed this and I wonder if it is possible to read > out the libc.so.6 build ID if it is present. It should indirectly call > all the layout dependencies and be reasonably easy to access because it > is in an allocated section (and we might want to print it from an > libc.so.6 invocation, too). > > We still need the time-based approach if the build ID is not available, > but I expect most distributions will have something like it. > > The Debian bug is: > > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=783210 > > (Also Cc:ed) agreed that build-id should be an acceptable replacement for what the code is doing today, but in order to pull that off, i guess you'd have to have to do a configure test to see if build-id is active ? if you leave the logic to runtime, you'd still need to include the datetime stamp in the object which would still make the build unreproducible. this also doesn't really cover the quoted idea of marshalling the data between client & server :). -mike
Mike Frysinger: > On 28 Jul 2016 15:15, Florian Weimer wrote: >> On 03/09/2016 05:30 PM, Mike Frysinger wrote: >>> would it be so terrible to properly marshall this data ? >> >> Ximin Luo and I discussed this and I wonder if it is possible to read >> out the libc.so.6 build ID if it is present. It should indirectly call >> all the layout dependencies and be reasonably easy to access because it >> is in an allocated section (and we might want to print it from an >> libc.so.6 invocation, too). >> >> We still need the time-based approach if the build ID is not available, >> but I expect most distributions will have something like it. >> >> The Debian bug is: >> >> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=783210 >> >> (Also Cc:ed) > > agreed that build-id should be an acceptable replacement for what the > code is doing today, but in order to pull that off, i guess you'd have > to have to do a configure test to see if build-id is active ? if you > leave the logic to runtime, you'd still need to include the datetime > stamp in the object which would still make the build unreproducible. > > this also doesn't really cover the quoted idea of marshalling the data > between client & server :). > -mike > Hi all, I've written a small program that prints out the Build IDs of all the objects that are dynamically linked to it, plus itself. It works well, although I'm not a C expert so I don't know if it is portable enough. For example, I hard-code some >>2 <<2s in there, along with a uint8_t - I didn't see a corresponding ElfW(xxx) type in elf.h Another downside is it needs to be linked against libdl, which I think is not the case currently with nscd. I'm not sure if this carries extra security risk or whatever. An alternative would be to detect the build-id *at build time* and then monkey-patch it into the binary itself. What do you all think? How shall I proceed? X
Ximin Luo: > Mike Frysinger: >> On 28 Jul 2016 15:15, Florian Weimer wrote: >>> On 03/09/2016 05:30 PM, Mike Frysinger wrote: >>>> would it be so terrible to properly marshall this data ? >>> >>> Ximin Luo and I discussed this and I wonder if it is possible to read >>> out the libc.so.6 build ID if it is present. It should indirectly call >>> all the layout dependencies and be reasonably easy to access because it >>> is in an allocated section (and we might want to print it from an >>> libc.so.6 invocation, too). >>> >>> We still need the time-based approach if the build ID is not available, >>> but I expect most distributions will have something like it. >>> >>> The Debian bug is: >>> >>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=783210 >>> >>> (Also Cc:ed) >> >> agreed that build-id should be an acceptable replacement for what the >> code is doing today, but in order to pull that off, i guess you'd have >> to have to do a configure test to see if build-id is active ? if you >> leave the logic to runtime, you'd still need to include the datetime >> stamp in the object which would still make the build unreproducible. >> >> this also doesn't really cover the quoted idea of marshalling the data >> between client & server :). >> -mike >> > > Hi all, > > I've written a small program that prints out the Build IDs of all the objects that are dynamically linked to it, plus itself. > > It works well, although I'm not a C expert so I don't know if it is portable enough. For example, I hard-code some >>2 <<2s in there, along with a uint8_t - I didn't see a corresponding ElfW(xxx) type in elf.h > > Another downside is it needs to be linked against libdl, which I think is not the case currently with nscd. I'm not sure if this carries extra security risk or whatever. > Oh! Actually it doesn't need to be linked against libdl. That was from an earlier version of the code where I was using dlinfo instead of dl_iterate_phdr. But this latter function doesn't need extra libs. :) > An alternative would be to detect the build-id *at build time* and then monkey-patch it into the binary itself. > > What do you all think? How shall I proceed? > X
diff --git a/ChangeLog b/ChangeLog index 289d578..8f0b24f 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,10 @@ +2016-03-08 Aurelien Jarno <aurelien@aurel32.net> + + * math/nscd (CPPFLAGS-nscd_stat.c): Append -DNSCD_STAT_C_CHECKSUM=... + with the checksum of nscd_stat.c computed with md5sum. + * nscd/nscd_stat.c (compilation): Use NSCD_STAT_C_CHECKSUM instead of + __DATE__ " " __TIME__. + 2016-03-08 H.J. Lu <hongjiu.lu@intel.com> [BZ #19783] diff --git a/nscd/Makefile b/nscd/Makefile index 50bad32..2678c6b 100644 --- a/nscd/Makefile +++ b/nscd/Makefile @@ -90,6 +90,10 @@ ifeq (yesyes,$(have-fpie)$(build-shared)) LDFLAGS-nscd = -Wl,-z,now endif +# Compute a checksum of the file to verify that the sender and the +# receiver are using the same format. +CPPFLAGS-nscd_stat.c += -DNSCD_STAT_C_CHECKSUM='"$(shell md5sum $< | cut -c1-20)"' + # Set libof-nscd. cpp-srcs-left := $(nscd-modules) lib := nscd diff --git a/nscd/nscd_stat.c b/nscd/nscd_stat.c index f34c352..fa294ca 100644 --- a/nscd/nscd_stat.c +++ b/nscd/nscd_stat.c @@ -37,7 +37,7 @@ /* We use this to make sure the receiver is the same. */ -static const char compilation[21] = __DATE__ " " __TIME__; +static const char compilation[21] = NSCD_STAT_C_CHECKSUM; /* Statistic data for one database. */ struct dbstat