nscd_stat.c: make the build reproducible

Message ID 1457456791-17402-1-git-send-email-aurelien@aurel32.net
State Changes Requested, archived
Headers

Commit Message

Aurelien Jarno March 8, 2016, 5:06 p.m. UTC
  nscd_stat.c uses the __DATE__ and __TIME__ macros to make sure the
client and the server use the same format. This prevents reproducible
builds and fails to build with -Werror=date-time. In addition another
build of the same file a bit later does not necessary imply a change
in the format.

Instead compute a checksum of the file in the Makefile and pass it to
the preprocessor with the -D option. Use the md5sum command limited to
20 characters to avoid changing the structure size.

Changelog:
	* math/nscd (CPPFLAGS-nscd_stat.c): Append -DNSCD_STAT_C_CHECKSUM=...
	with the checksum of nscd_stat.c computed with md5sum.
	* nscd/nscd_stat.c (compilation): Use NSCD_STAT_C_CHECKSUM instead of
	__DATE__ " " __TIME__.
---
 ChangeLog        | 7 +++++++
 nscd/Makefile    | 4 ++++
 nscd/nscd_stat.c | 2 +-
 3 files changed, 12 insertions(+), 1 deletion(-)
  

Comments

Mike Frysinger March 8, 2016, 11:37 p.m. UTC | #1
On 08 Mar 2016 18:06, Aurelien Jarno wrote:
> nscd_stat.c uses the __DATE__ and __TIME__ macros to make sure the
> client and the server use the same format. This prevents reproducible
> builds and fails to build with -Werror=date-time. In addition another
> build of the same file a bit later does not necessary imply a change
> in the format.
> 
> Instead compute a checksum of the file in the Makefile and pass it to
> the preprocessor with the -D option. Use the md5sum command limited to
> 20 characters to avoid changing the structure size.

this doesn't take into consideration the HAVE_SELINUX knob ...
-mike
  
Aurelien Jarno March 9, 2016, 7:54 a.m. UTC | #2
On 2016-03-08 18:37, Mike Frysinger wrote:
> On 08 Mar 2016 18:06, Aurelien Jarno wrote:
> > nscd_stat.c uses the __DATE__ and __TIME__ macros to make sure the
> > client and the server use the same format. This prevents reproducible
> > builds and fails to build with -Werror=date-time. In addition another
> > build of the same file a bit later does not necessary imply a change
> > in the format.
> > 
> > Instead compute a checksum of the file in the Makefile and pass it to
> > the preprocessor with the -D option. Use the md5sum command limited to
> > 20 characters to avoid changing the structure size.
> 
> this doesn't take into consideration the HAVE_SELINUX knob ...

Correct. Back to the design board, I'll try to provide a new solution.
Don't hesitate to tell if you have ideas.
  
Mike Frysinger March 9, 2016, 10:30 p.m. UTC | #3
On 09 Mar 2016 08:54, Aurelien Jarno wrote:
> On 2016-03-08 18:37, Mike Frysinger wrote:
> > On 08 Mar 2016 18:06, Aurelien Jarno wrote:
> > > nscd_stat.c uses the __DATE__ and __TIME__ macros to make sure the
> > > client and the server use the same format. This prevents reproducible
> > > builds and fails to build with -Werror=date-time. In addition another
> > > build of the same file a bit later does not necessary imply a change
> > > in the format.
> > > 
> > > Instead compute a checksum of the file in the Makefile and pass it to
> > > the preprocessor with the -D option. Use the md5sum command limited to
> > > 20 characters to avoid changing the structure size.
> > 
> > this doesn't take into consideration the HAVE_SELINUX knob ...
> 
> Correct. Back to the design board, I'll try to provide a new solution.
> Don't hesitate to tell if you have ideas.

mix in the value of $(have-selinux), and put a comment above the struct
saying that the Makefile needs to be updated whenever the struct changes.

although i think this also doesn't take into account other differences
(alignment/sizes).  this is a struct that is transmitted over the unix
socket ?  so if i launch a 32bit x86 app on a system with a 64bit x86_64
nscd, it won't work.

would it be so terrible to properly marshall this data ?
-mike
  
Florian Weimer July 28, 2016, 7:15 p.m. UTC | #4
On 03/09/2016 05:30 PM, Mike Frysinger wrote:

> would it be so terrible to properly marshall this data ?

Ximin Luo and I discussed this and I wonder if it is possible to read 
out the libc.so.6 build ID if it is present.  It should indirectly call 
all the layout dependencies and be reasonably easy to access because it 
is in an allocated section (and we might want to print it from an 
libc.so.6 invocation, too).

We still need the time-based approach if the build ID is not available, 
but I expect most distributions will have something like it.

The Debian bug is:

   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=783210

(Also Cc:ed)

Florian
  
Ludovic Courtès July 29, 2016, 11:51 a.m. UTC | #5
Florian Weimer <fweimer@redhat.com> skribis:

> We still need the time-based approach if the build ID is not
> available, but I expect most distributions will have something like
> it.

FWIW in Guix we solve it by filling the ‘compilation’ array with a
substring of the installation prefix¹.

Since the installation prefix is something like
/gnu/store/5fx3vscv9pqjr1k0vyaqnpqlvvzl8rff-glibc-2.22, which comprises
a hash of all the source, build scripts, and dependencies used to build
it, we know that it uniquely identifies the result of this specific
glibc build.

The build ID should be a good approximation of this.

Ludo’.

¹ http://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/base.scm#n603
  
Mike Frysinger Aug. 1, 2016, 4:52 a.m. UTC | #6
On 28 Jul 2016 15:15, Florian Weimer wrote:
> On 03/09/2016 05:30 PM, Mike Frysinger wrote:
> > would it be so terrible to properly marshall this data ?
> 
> Ximin Luo and I discussed this and I wonder if it is possible to read 
> out the libc.so.6 build ID if it is present.  It should indirectly call 
> all the layout dependencies and be reasonably easy to access because it 
> is in an allocated section (and we might want to print it from an 
> libc.so.6 invocation, too).
> 
> We still need the time-based approach if the build ID is not available, 
> but I expect most distributions will have something like it.
> 
> The Debian bug is:
> 
>    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=783210
> 
> (Also Cc:ed)

agreed that build-id should be an acceptable replacement for what the
code is doing today, but in order to pull that off, i guess you'd have
to have to do a configure test to see if build-id is active ?  if you
leave the logic to runtime, you'd still need to include the datetime
stamp in the object which would still make the build unreproducible.

this also doesn't really cover the quoted idea of marshalling the data
between client & server :).
-mike
  
Ximin Luo Nov. 4, 2016, 5:53 p.m. UTC | #7
Mike Frysinger:
> On 28 Jul 2016 15:15, Florian Weimer wrote:
>> On 03/09/2016 05:30 PM, Mike Frysinger wrote:
>>> would it be so terrible to properly marshall this data ?
>>
>> Ximin Luo and I discussed this and I wonder if it is possible to read 
>> out the libc.so.6 build ID if it is present.  It should indirectly call 
>> all the layout dependencies and be reasonably easy to access because it 
>> is in an allocated section (and we might want to print it from an 
>> libc.so.6 invocation, too).
>>
>> We still need the time-based approach if the build ID is not available, 
>> but I expect most distributions will have something like it.
>>
>> The Debian bug is:
>>
>>    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=783210
>>
>> (Also Cc:ed)
> 
> agreed that build-id should be an acceptable replacement for what the
> code is doing today, but in order to pull that off, i guess you'd have
> to have to do a configure test to see if build-id is active ?  if you
> leave the logic to runtime, you'd still need to include the datetime
> stamp in the object which would still make the build unreproducible.
> 
> this also doesn't really cover the quoted idea of marshalling the data
> between client & server :).
> -mike
> 

Hi all,

I've written a small program that prints out the Build IDs of all the objects that are dynamically linked to it, plus itself.

It works well, although I'm not a C expert so I don't know if it is portable enough. For example, I hard-code some >>2 <<2s in there, along with a uint8_t - I didn't see a corresponding ElfW(xxx) type in elf.h

Another downside is it needs to be linked against libdl, which I think is not the case currently with nscd. I'm not sure if this carries extra security risk or whatever.

An alternative would be to detect the build-id *at build time* and then monkey-patch it into the binary itself.

What do you all think? How shall I proceed?

X
  
Ximin Luo Nov. 4, 2016, 6:14 p.m. UTC | #8
Ximin Luo:
> Mike Frysinger:
>> On 28 Jul 2016 15:15, Florian Weimer wrote:
>>> On 03/09/2016 05:30 PM, Mike Frysinger wrote:
>>>> would it be so terrible to properly marshall this data ?
>>>
>>> Ximin Luo and I discussed this and I wonder if it is possible to read 
>>> out the libc.so.6 build ID if it is present.  It should indirectly call 
>>> all the layout dependencies and be reasonably easy to access because it 
>>> is in an allocated section (and we might want to print it from an 
>>> libc.so.6 invocation, too).
>>>
>>> We still need the time-based approach if the build ID is not available, 
>>> but I expect most distributions will have something like it.
>>>
>>> The Debian bug is:
>>>
>>>    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=783210
>>>
>>> (Also Cc:ed)
>>
>> agreed that build-id should be an acceptable replacement for what the
>> code is doing today, but in order to pull that off, i guess you'd have
>> to have to do a configure test to see if build-id is active ?  if you
>> leave the logic to runtime, you'd still need to include the datetime
>> stamp in the object which would still make the build unreproducible.
>>
>> this also doesn't really cover the quoted idea of marshalling the data
>> between client & server :).
>> -mike
>>
> 
> Hi all,
> 
> I've written a small program that prints out the Build IDs of all the objects that are dynamically linked to it, plus itself.
> 
> It works well, although I'm not a C expert so I don't know if it is portable enough. For example, I hard-code some >>2 <<2s in there, along with a uint8_t - I didn't see a corresponding ElfW(xxx) type in elf.h
> 
> Another downside is it needs to be linked against libdl, which I think is not the case currently with nscd. I'm not sure if this carries extra security risk or whatever.
> 

Oh! Actually it doesn't need to be linked against libdl. That was from an earlier version of the code where I was using dlinfo instead of dl_iterate_phdr. But this latter function doesn't need extra libs. :)

> An alternative would be to detect the build-id *at build time* and then monkey-patch it into the binary itself.
> 
> What do you all think? How shall I proceed?
> 

X
  

Patch

diff --git a/ChangeLog b/ChangeLog
index 289d578..8f0b24f 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@ 
+2016-03-08  Aurelien Jarno  <aurelien@aurel32.net>
+
+	* math/nscd (CPPFLAGS-nscd_stat.c): Append -DNSCD_STAT_C_CHECKSUM=...
+	with the checksum of nscd_stat.c computed with md5sum.
+	* nscd/nscd_stat.c (compilation): Use NSCD_STAT_C_CHECKSUM instead of
+	__DATE__ " " __TIME__.
+
 2016-03-08  H.J. Lu  <hongjiu.lu@intel.com>
 
 	[BZ #19783]
diff --git a/nscd/Makefile b/nscd/Makefile
index 50bad32..2678c6b 100644
--- a/nscd/Makefile
+++ b/nscd/Makefile
@@ -90,6 +90,10 @@  ifeq (yesyes,$(have-fpie)$(build-shared))
 LDFLAGS-nscd = -Wl,-z,now
 endif
 
+# Compute a checksum of the file to verify that the sender and the
+# receiver are using the same format.
+CPPFLAGS-nscd_stat.c += -DNSCD_STAT_C_CHECKSUM='"$(shell md5sum $< | cut -c1-20)"'
+
 # Set libof-nscd.
 cpp-srcs-left := $(nscd-modules)
 lib := nscd
diff --git a/nscd/nscd_stat.c b/nscd/nscd_stat.c
index f34c352..fa294ca 100644
--- a/nscd/nscd_stat.c
+++ b/nscd/nscd_stat.c
@@ -37,7 +37,7 @@ 
 
 
 /* We use this to make sure the receiver is the same.  */
-static const char compilation[21] = __DATE__ " " __TIME__;
+static const char compilation[21] = NSCD_STAT_C_CHECKSUM;
 
 /* Statistic data for one database.  */
 struct dbstat