Take latest of archive and file mtime

Message ID 6ijvptsms2h2jhdutin4eh3ze3o5g5aardgtkgeplqjiveuvhl@mcslxdj77xgo
State Changes Requested
Delegated to: Frank Eigler
Series Take latest of archive and file mtime

Commit Message

Lluís Batlle i Rossell Feb. 27, 2025, 3:26 p.m. UTC
  attached
From dad01d11ce8390f1c32fa39963d6aeb17897a9c5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Llu=C3=ADs=20Batlle=20i=20Rossell?= <viric@viric.name>
Date: Thu, 27 Feb 2025 14:16:41 +0100
Subject: [PATCH] Take latest of archive and file mtime
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Distributions like Yocto update the RPMs but they set all files inside to
a fixed timestamp, so that internal timestamp doesn't tell if files
changed.

Signed-off-by: Lluís Batlle i Rossell <viric@viric.name>
---
 debuginfod/debuginfod.cxx | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
  

Comments

Frank Ch. Eigler March 3, 2025, 11:23 p.m. UTC | #1
Hi -

> [...]
> Distributions like Yocto update the RPMs but they set all files inside to
> a fixed timestamp, so that internal timestamp doesn't tell if files
> changed.
> [...]

Can you please elaborate on your explanation?  I'm afraid I can't quite see
why this yocto behaviour should change anything, or why substituting one
mtime for another should fix it.


- FChE
  
Lluís Batlle i Rossell March 4, 2025, 8:23 a.m. UTC | #2
On Mon, Mar 03, 2025 at 06:23:55PM -0500, Frank Ch. Eigler wrote:
> Hi -
> 
> > [...]
> > Distributions like Yocto update the RPMs but they set all files inside to
> > a fixed timestamp, so that internal timestamp doesn't tell if files
> > changed.
> > [...]
> 
> Can you please elaborate on your explanation?  I'm afraid I can't quite see
> why this yocto behaviour should change anything, or why substituting one
> mtime for another should fix it.

Sure. The Yocto distribution builds hundreds of packages (RPMs, debs, ...)
for the OS, and they contain the binaries, debug info and sources. That
makes it ideal for serving the whole OS with debuginfod:
https://docs.yoctoproject.org/dev-manual/debugging.html#using-the-debuginfod-server-method

To achieve binary reproducibility, the Yocto build recipes set a fixed
timestamp on all files going into the package archives (rpm, deb, ...). So
when debuginfod indexes the files in the RPMs, it records that fixed
timestamp.

Every time you rebuild some packages, the package files are regenerated with
the new contents. The new package files then have a newer mtime, but the
files inside the archive (ELF, source) keep the same fixed timestamp as
before. When debuginfod traverses them (without the patch I propose), it
does not update the database, because it considers the files inside the
packages unmodified.

Yocto tries to guess a good "fixed timestamp" from multiple sources of
information, but all of them come from the package's own source.

https://git.yoctoproject.org/poky/plain/meta/lib/oe/reproducible.py

So if a package's source changes, Yocto picks a newer fixed mtime. But if
only a dependency was updated (like openssl), that does not produce a newer
fixed mtime for the files. That's when debuginfod ends up with a completely
outdated database, fooled by the fixed mtimes chosen by Yocto.
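To make this failure mode concrete, here is a simplified sketch. It assumes the fixed timestamp is just the newest mtime among the package's own source files; the real logic in reproducible.py weighs several sources of information, and this collapses them into one rule. The `Input` type and `fixed_timestamp` function are hypothetical illustrations, not Yocto code.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of one build input: a path, its mtime (seconds since
// the epoch), and whether it belongs to the package's own source tree.
struct Input {
  std::string path;
  int64_t mtime;
  bool from_package_source;  // false for dependencies such as openssl
};

// Simplified stand-in for choosing the "fixed timestamp": the newest mtime
// among the package's own source files only. Dependencies are ignored on
// purpose, mirroring the behaviour described above.
int64_t fixed_timestamp(const std::vector<Input>& inputs) {
  int64_t newest = 0;
  for (const auto& in : inputs)
    if (in.from_package_source)
      newest = std::max(newest, in.mtime);
  return newest;
}
```

Under this model, bumping a dependency leaves the fixed timestamp untouched, while editing the package's own source moves it forward.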

That's the problem I'm facing: before my patch, the only thing I could do
was to delete the debuginfod sqlite database and reindex everything every
time I rebuilt a package in Yocto, so that debuginfod would serve the new
debuginfo correctly.

Regards,
Lluís.
  
Frank Ch. Eigler March 13, 2025, 5:43 p.m. UTC | #3
Hi -

> [...]
> Every time you rebuild some packages, the package files are rebuilt with the
> new contents. When this happens, the new package files will have a newer
> mtime, but the files inside the archive (elf, source) will have the same
> fixed timestamp as before. 

Do I understand this part correctly: that yocto package-file
timestamps are normal (reflect their actual unique-ish creation time),
but the timestamps of constituent files are synthetic (and may be
backdated / duplicate)?

> And when debuginfod will traverse them (without the patch I
> propose), it will not update the database because it will find
> consider that the files inside the packages were not modified. [...]

That's not how debuginfod works though.  It decides to reanalyze
archives based on the archive mtime, not the constituent file mtime.
Your patch only affects the "_r_seekable.mtime" column, which I
believe is not used for any sort of caching/invalidation type logic.
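The rescan rule described here can be modelled as follows. This is a hypothetical in-memory sketch, not debuginfod's implementation (which keeps its scan state in the sqlite database); the `ArchiveIndex` class and its methods are invented for illustration, and "rescan when the recorded mtime differs" is an assumed simplification.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the scanner's record of already-indexed
// archives, keyed by archive path and storing the archive mtime seen at
// the last scan.
class ArchiveIndex {
 public:
  // True if the archive must be (re)analyzed: it was never seen, or its
  // mtime differs from the recorded one. Entry mtimes inside the archive
  // play no part in this decision.
  bool needs_scan(const std::string& archive_path, int64_t archive_mtime) const {
    auto it = scanned_.find(archive_path);
    return it == scanned_.end() || it->second != archive_mtime;
  }

  // Record that the archive was analyzed at the given mtime.
  void mark_scanned(const std::string& archive_path, int64_t archive_mtime) {
    scanned_[archive_path] = archive_mtime;
  }

 private:
  std::map<std::string, int64_t> scanned_;
};
```

In this model, a rebuilt .rpm with a fresh mtime is rescanned even if every entry inside still carries the same fixed timestamp.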

> [...]  That's the problem I'm facing where, before my patch, the
> only thing I could do was to remove the sqlite debuginfod database
> and index all again every time I rebuild a package in the yocto
> distribution, to have debuginfod serving the new debuginfo
> correctly.

Yeah, that shouldn't be necessary.

Maybe we could make this discussion more concrete by having you show
us an actual example: two different yocto package versions, with
detailed timestamp/content listings, ingested one at a time into an
otherwise empty debuginfod database, followed by a database dump after
both scans complete.

- FChE
  
Lluís Batlle i Rossell March 17, 2025, 4:54 p.m. UTC | #4
On Thu, Mar 13, 2025 at 01:43:25PM -0400, Frank Ch. Eigler wrote:
> Hi -
> 
> > [...]
> > Every time you rebuild some packages, the package files are rebuilt with the
> > new contents. When this happens, the new package files will have a newer
> > mtime, but the files inside the archive (elf, source) will have the same
> > fixed timestamp as before. 
> 
> Do I understand this part correctly: that yocto package-file
> filestamps are normal (reflect their actual unique-ish creation time),
> but the timestamps of constituent files are synthetic (and may be
> backdated / duplicate)?

Exactly. The .rpm file has a proper mtime reflecting its creation, but
the mtimes of the files inside are fake.

> > And when debuginfod will traverse them (without the patch I
> > propose), it will not update the database because it will find
> > consider that the files inside the packages were not modified. [...]
> 
> That's not how debuginfod works though.  It decides to reanalyze
> archives based on the archive mtime, not the constituent file mtime.
> Your patch only affects the "_r_seekable.mtime" column, which I
> believe is not used for any sort of caching/invalidation type logic.

Hmm, now that I review it, I agree.
I do see, though, that _r_seekable.mtime is sent over HTTP as Last-Modified.
Maybe it is the client that gets confused? I had the impression that the
patch fixed all my problems.
I will have to review this again in detail. In my patch, I simply wanted
to avoid trusting the in-archive file timestamp for anything.
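For reference, a Last-Modified header carries an HTTP-date (IMF-fixdate) derived from a Unix mtime. The sketch below shows that conversion; it is not debuginfod's code, and the `http_date` name is hypothetical. Whether debuginfod fills this header from the in-archive mtime is the observation made in this message, not something the sketch establishes.

```cpp
#include <cstdint>
#include <ctime>
#include <string>

// Format a Unix mtime as an HTTP-date (IMF-fixdate), e.g.
// "Fri, 07 Jan 2022 00:00:00 GMT". Uses POSIX gmtime_r; strftime's %a and
// %b yield English names only in the default "C" locale.
std::string http_date(int64_t mtime) {
  std::time_t t = static_cast<std::time_t>(mtime);
  std::tm tm_utc{};
  if (gmtime_r(&t, &tm_utc) == nullptr) return {};
  char buf[64];
  std::size_t n = std::strftime(buf, sizeof buf, "%a, %d %b %Y %H:%M:%S GMT", &tm_utc);
  return std::string(buf, n);
}
```

If the header is indeed taken from the in-archive mtime, two rebuilds whose entries share a fixed timestamp would advertise the identical Last-Modified value, which is one way a client-side cache could be misled.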

> Maybe we could make this discussion more concrete by having you show
> us an actual example.  Two different yocto package versions, with
> detailed timestamp/content listings, ingested into an otherwise empty
> debuginfod database one at a time, and doing a database dump after
> both completed scan operations.

Here is an example:
$ ls -l less-600-r0.cortexa78ae.rpm
-rw-r--r-- 1 lbatlle 1000 121980 Mar 14 02:04 less-600-r0.cortexa78ae.rpm
$ rpm2cpio less-600-r0.cortexa78ae.rpm |cpio -v -t
drwxr-xr-x   1 root     root            0 Jan  7  2022 ./usr
drwxr-xr-x   1 root     root            0 Jan  7  2022 ./usr/bin
-rwxr-xr-x   1 root     root       223592 Jan  7  2022 ./usr/bin/less.less
-rwxr-xr-x   1 root     root        10232 Jan  7  2022 ./usr/bin/lessecho
-rwxr-xr-x   1 root     root        19840 Jan  7  2022 ./usr/bin/lesskey
497 blocks

If I modify the recipe and build again, I get the same timestamps inside.
For example, I removed one of the patches to 'less' and rebuilt it.

$ ls -l less-600-r0.cortexa78ae.rpm
-rw-r--r-- 1 lbatlle 1000 121837 Mar 17 17:24 less-600-r0.cortexa78ae.rpm
$ rpm2cpio less-600-r0.cortexa78ae.rpm |cpio -v -t
drwxr-xr-x   1 root     root            0 Jan  7  2022 ./usr
drwxr-xr-x   1 root     root            0 Jan  7  2022 ./usr/bin
-rwxr-xr-x   1 root     root       223592 Jan  7  2022 ./usr/bin/less.less
-rwxr-xr-x   1 root     root        10232 Jan  7  2022 ./usr/bin/lessecho
-rwxr-xr-x   1 root     root        19840 Jan  7  2022 ./usr/bin/lesskey
497 blocks

Regards,
Lluís.
  

Patch

diff --git a/debuginfod/debuginfod.cxx b/debuginfod/debuginfod.cxx
index 0edd57cb..4a65e57d 100644
--- a/debuginfod/debuginfod.cxx
+++ b/debuginfod/debuginfod.cxx
@@ -4725,7 +4725,9 @@  archive_classify (const string& rps, string& archive_extension, int64_t archivei
                   .bind(2, fileid)
                   .bind(3, seekable_size)
                   .bind(4, seekable_offset)
-                  .bind(5, seekable_mtime)
+                  // Distros like yocto reset timestamp in archives
+                  // Pick the most recent mtime, archive vs entry
+                  .bind(5, seekable_mtime > mtime ? seekable_mtime : mtime)
                   .step_ok_done();
             }
           else // potential source - sdef record
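The patched `.bind(5, ...)` argument simply takes the newer of two timestamps. Below it is factored into a standalone helper using `std::max`, which is equivalent to the ternary in the patch. The `effective_mtime` name is hypothetical, and the mapping (that `seekable_mtime` is the archive entry's timestamp and `mtime` the archive file's) is assumed from the patch comment.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper equivalent to the patched expression
//   seekable_mtime > mtime ? seekable_mtime : mtime
// entry_mtime: timestamp stored inside the archive (fixed by Yocto).
// archive_mtime: mtime of the .rpm/.deb file itself.
// The newer value wins, so a rebuilt archive whose entries carry an older
// fixed timestamp reports the archive's rebuild time instead.
int64_t effective_mtime(int64_t entry_mtime, int64_t archive_mtime) {
  return std::max(entry_mtime, archive_mtime);
}
```

With the `less` listing above, the entries dated Jan 7 2022 would be recorded with the archive's later rebuild time, while an entry newer than its archive keeps its own timestamp.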