nscd time_t size mismatch problem

Message ID Y1g/C4pinQ1tutC4@hatter.bewilderbeest.net
State Superseded
Series nscd time_t size mismatch problem

Checks

Context                  Check  Description
dj/TryBot-apply_patch    fail   Patch failed to apply to master at the time it was sent
dj/TryBot-32bit          fail   Patch series failed to apply

Commit Message

Zev Weiss Oct. 25, 2022, 7:54 p.m. UTC
  Hello glibc devs,

We've recently been seeing some misbehavior from nscd in OpenBMC.  It
manifests as lots of log messages like:

     disabled inotify-based monitoring for file /passwd': No such file or directory
     stat failed for file /passwd'; will try again later: No such file or directory
     disabled inotify-based monitoring for file /group': No such file or directory
     stat failed for file /group'; will try again later: No such file or directory
     disabled inotify-based monitoring for file /hosts': No such file or directory
     stat failed for file /hosts'; will try again later: No such file or directory
     disabled inotify-based monitoring for file /resolv.conf': No such file or directory
     stat failed for file /resolv.conf'; will try again later: No such file or directory

and so forth.  I initially assumed it was a configure-time --sysconfdir 
mixup, but after digging into it I found that it actually stems from a 
time_t size mismatch (this is a 32-bit ARM gnueabi target):

     $ gdb -batch -ex 'pt time_t' -ex 'p sizeof(time_t)' time/time.o
     type = long
     $1 = 4
     $ gdb -batch -ex 'pt time_t' -ex 'p sizeof(time_t)' nscd/nscd.o
     type = long long
     $1 = 8

The confusing log messages are thus just the result of the coincidence 
that sizeof(long long) - sizeof(long) == strlen("/etc"), which causes 
the disagreement in the layout of struct traced_file to make it look 
like the 'fname' member just had its directory prefix chopped off.

In the discussion of the bug in the OpenBMC issue tracker [0], Wayne
Tung (CCed) came up with the patch below.  It does seem to solve the
immediate problem, but if I'm understanding things right it does so by
just reverting nscd to a 32-bit time_t, so I'd expect it wouldn't be
considered the "right" fix.  However, I don't presently know enough
about the 32/64-bit time_t transition and the ensuing compatibility
concerns to know what the right fix really is.  Should nscd perhaps be
using __time64_t or something instead of time_t?


Thanks,
Zev Weiss

[0] https://github.com/openbmc/openbmc/issues/3881

 From 0fda9faf757abd4f5469e6d9207499e97f79a663 Mon Sep 17 00:00:00 2001
From: Wayne Tung <wayne.tung@ui.com>
Date: Thu, 13 Oct 2022 13:10:21 +0800
Subject: [PATCH] Use 32 bits time_t for ncsd

---
  Makeconfig | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

--
  

Comments

Adhemerval Zanella Netto Oct. 25, 2022, 9:13 p.m. UTC | #1
On 25/10/22 16:54, Zev Weiss via Libc-alpha wrote:
> Hello glibc devs,
> 
> We've recently been seeing some misbehavior from nscd in OpenBMC.  It
> manifests as lots of log messages like:
> 
>     disabled inotify-based monitoring for file /passwd': No such file or directory
>     stat failed for file /passwd'; will try again later: No such file or directory
>     disabled inotify-based monitoring for file /group': No such file or directory
>     stat failed for file /group'; will try again later: No such file or directory
>     disabled inotify-based monitoring for file /hosts': No such file or directory
>     stat failed for file /hosts'; will try again later: No such file or directory
>     disabled inotify-based monitoring for file /resolv.conf': No such file or directory
>     stat failed for file /resolv.conf'; will try again later: No such file or directory
> 
> and so forth.  I initially assumed it was a configure-time --sysconfdir mixup, but after digging into it I found that it actually stems from a time_t size mismatch (this is a 32-bit ARM gnueabi target):
> 
>     $ gdb -batch -ex 'pt time_t' -ex 'p sizeof(time_t)' time/time.o
>     type = long
>     $1 = 4
>     $ gdb -batch -ex 'pt time_t' -ex 'p sizeof(time_t)' nscd/nscd.o
>     type = long long
>     $1 = 8
> 
> The confusing log messages are thus just the result of the coincidence that sizeof(long long) - sizeof(long) == strlen("/etc"), which causes the disagreement in the layout of struct traced_file to make it look like the 'fname' member just had its directory prefix chopped off.
> 
> In the discussion of the bug in the OpenBMC issue tracker [0], Wayne Tung (CCed) came up with the patch below.  It does seem to solve the immediate problem, but if I'm understanding things right it does so by just reverting nscd to a 32-bit time_t, so I'd expect it wouldn't be considered the "right" fix.  However, I don't presently know enough about the 32/64-bit time_t transition and the ensuing compatibility concerns to know what the right fix really is.  Should nscd perhaps be using __time64_t or something instead of time_t?

Reverting to a 32-bit time_t only postpones a potential failure to y2038;
we really should move everything to a 64-bit time_t.  Could you check
whether the following patch fixes it?

The issue is that we do build the nss modules with a 64-bit time_t; however,
some features are built into libc.so itself, and in such cases we need to
explicitly use the internal __time64_t type.

diff --git a/nscd/nscd.h b/nscd/nscd.h
index 368091aef8..f15321585b 100644
--- a/nscd/nscd.h
+++ b/nscd/nscd.h
@@ -65,7 +65,7 @@ typedef enum
 struct traced_file
 {
   /* Tracks the last modified time of the traced file.  */
-  time_t mtime;
+  __time64_t mtime;
   /* Support multiple registered files per database.  */
   struct traced_file *next;
   int call_res_init;

  
Zev Weiss Oct. 26, 2022, 1:04 a.m. UTC | #2
On Tue, Oct 25, 2022 at 02:13:23PM PDT, Adhemerval Zanella Netto wrote:
>
>Reverting to a 32-bit time_t only postpones a potential failure to y2038;
>we really should move everything to a 64-bit time_t.  Could you check
>whether the following patch fixes it?
>
>The issue is that we do build the nss modules with a 64-bit time_t; however,
>some features are built into libc.so itself, and in such cases we need to
>explicitly use the internal __time64_t type.
>
>diff --git a/nscd/nscd.h b/nscd/nscd.h
>index 368091aef8..f15321585b 100644
>--- a/nscd/nscd.h
>+++ b/nscd/nscd.h
>@@ -65,7 +65,7 @@ typedef enum
> struct traced_file
> {
>   /* Tracks the last modified time of the traced file.  */
>-  time_t mtime;
>+  __time64_t mtime;
>   /* Support multiple registered files per database.  */
>   struct traced_file *next;
>   int call_res_init;
>

Ah, great -- after testing that out I can confirm that it appears to fix 
the problem.  Thanks!

Also, after sending that email I discovered that there's an existing 
bugzilla issue for this same problem 
(https://sourceware.org/bugzilla/show_bug.cgi?id=29402), so that can 
presumably be closed once a fix is committed.


Zev
  
Adhemerval Zanella Netto Oct. 26, 2022, 11:31 a.m. UTC | #3
On 25/10/22 22:04, Zev Weiss wrote:
> On Tue, Oct 25, 2022 at 02:13:23PM PDT, Adhemerval Zanella Netto wrote:
>>
>> Reverting to a 32-bit time_t only postpones a potential failure to y2038;
>> we really should move everything to a 64-bit time_t.  Could you check
>> whether the following patch fixes it?
>>
>> The issue is that we do build the nss modules with a 64-bit time_t; however,
>> some features are built into libc.so itself, and in such cases we need to
>> explicitly use the internal __time64_t type.
>>
>> diff --git a/nscd/nscd.h b/nscd/nscd.h
>> index 368091aef8..f15321585b 100644
>> --- a/nscd/nscd.h
>> +++ b/nscd/nscd.h
>> @@ -65,7 +65,7 @@ typedef enum
>> struct traced_file
>> {
>>   /* Tracks the last modified time of the traced file.  */
>> -  time_t mtime;
>> +  __time64_t mtime;
>>   /* Support multiple registered files per database.  */
>>   struct traced_file *next;
>>   int call_res_init;
>>
> 
> Ah, great -- after testing that out I can confirm that it appears to fix the problem.  Thanks!
> 
> Also, after sending that email I discovered that there's an existing bugzilla issue for this same problem (https://sourceware.org/bugzilla/show_bug.cgi?id=29402), so that can presumably be closed once a fix is committed.

I will send a proper patch, and I will check whether there are any other
instances of time_t that are used internally like this.
  

Patch

diff --git a/Makeconfig b/Makeconfig
index 47db08d6ae..f78f7cc74a 100644
--- a/Makeconfig
+++ b/Makeconfig
@@ -869,7 +869,7 @@  endif
  +extra-math-flags = $(if $(filter libm,$(in-module)),-fno-math-errno,-fmath-errno)

  # Use 64 bit time_t support for installed programs
-installed-modules = nonlib nscd lddlibc4 ldconfig locale_programs \
+installed-modules = nonlib lddlibc4 ldconfig locale_programs \
             iconvprogs libnss_files libnss_compat libnss_db libnss_hesiod \
             libutil libpcprofile libSegFault
  +extra-time-flags = $(if $(filter $(installed-modules),\