nscd: Skip unusable entries in first pass in prune_cache (bug 30800)

Message ID 87o7iry6k6.fsf@oldenburg.str.redhat.com
State Committed
Commit c00b984fcd53f679ca2dafcd1aee2c89836e6e73

Checks

Context                                           Check    Description
redhat-pt-bot/TryBot-apply_patch                  success  Patch applied to master at the time it was sent
redhat-pt-bot/TryBot-32bit                        success  Build for i686
linaro-tcwg-bot/tcwg_glibc_check--master-aarch64  success  Testing passed
linaro-tcwg-bot/tcwg_glibc_build--master-aarch64  success  Testing passed
linaro-tcwg-bot/tcwg_glibc_build--master-arm      success  Testing passed
linaro-tcwg-bot/tcwg_glibc_check--master-arm      success  Testing passed

Commit Message

Florian Weimer Aug. 28, 2023, 7:22 a.m. UTC
Previously, if an entry was marked unusable for any reason, but had
not timed out yet, the assert would trigger.

One way to get into such a state is if a data change is detected
during re-validation of an entry.  This causes the entry to be marked
as not usable.  If nscd exits soon after that, then the clock jumps
backwards, and nscd is restarted, the cache re-validation run after
startup triggers the removed assert.
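
As a rough illustration of the state described above, the following
sketch models only the two fields involved.  The struct entry type and
the old_assert_would_fail helper are hypothetical names introduced for
this example; they are not part of nscd.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical, reduced stand-in for the two struct datahead fields
   that matter here.  */
struct entry
{
  time_t timeout;
  bool usable;
};

/* Return true if the pre-patch code would have reached the else
   branch (entry not timed out) with an unusable entry, which is the
   case where assert (dh->usable) failed.  */
static bool
old_assert_would_fail (const struct entry *e, time_t now)
{
  return !(e->timeout < now) && !e->usable;
}
```

With an unusable entry whose timeout is 1000, the check is false for
now == 1100 (the entry has timed out) but true once the clock has
jumped back to now == 900.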

The change is more complicated than just the removal of the assert
because entries marked as not usable should be garbage-collected in
the second pass.  To make this happen, it is necessary to update some
book-keeping data.

Tested on x86_64-linux-gnu with a problematic /var/db/nscd/passwd file
that triggered the assert before.  The assert is gone, and unusable
entries are pruned as expected.
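
The control flow after the change can be sketched roughly as follows.
This is a simplified model, not the nscd code: struct entry, struct
scan_result and first_pass are hypothetical names, mark is indexed per
entry here (the real code marks hash buckets), and reloading is not
modelled, so timed-out usable entries are simply invalidated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical, reduced stand-in for struct datahead.  */
struct entry
{
  time_t timeout;
  bool usable;
};

/* Hypothetical summary of what the first pass computes.  */
struct scan_result
{
  bool any;             /* Some entry needs garbage collection.  */
  time_t next_timeout;  /* Earliest timeout among live entries.  */
};

/* Simplified first pass.  MARK[i] is set when entry I has to be
   visited again in the second pass.  */
static struct scan_result
first_pass (struct entry *entries, bool *mark, size_t n, time_t now,
	    time_t max_timeout)
{
  assert (entries != NULL && mark != NULL);

  struct scan_result r = { false, max_timeout };
  for (size_t i = 0; i < n; ++i)
    {
      struct entry *e = &entries[i];
      if (e->timeout < now || !e->usable)
	{
	  /* Timed out or already unusable: record it for the
	     second pass.  */
	  mark[i] = true;

	  /* Only usable entries are processed further (here:
	     invalidated instead of reloaded).  */
	  if (e->usable)
	    e->usable = false;

	  /* Updated for every marked entry, usable or not.  */
	  r.any |= !e->usable;
	}
      else if (e->timeout < r.next_timeout)
	/* Entry has not timed out and is usable.  */
	r.next_timeout = e->timeout;
    }
  return r;
}
```

In this model, an unusable entry whose timeout still lies in the
future is marked and sets any, so a later garbage-collection pass
would pick it up, and it no longer contributes to next_timeout.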

---
 nscd/cache.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)


base-commit: 87ced255bdf2681f5bf6c89d7121e59f6f342161
  

Comments

DJ Delorie Aug. 29, 2023, 4:14 a.m. UTC | #1
Florian Weimer via Libc-alpha <libc-alpha@sourceware.org> writes:
> Previously, if an entry was marked unusable for any reason, but had
> not timed out yet, the assert would trigger.

So, the old way assumed that any entry that hadn't timed out, must be
usable, which turns out to be technically false.

> The change is more complicated than just the removal of the assert
> because entries marked as not usable should be garbage-collected in
> the second pass.  To make this happen, it is necessary to update some
> book-keeping data.

which is the mark[cnt] = true line

> -	  /* Check whether the entry timed out.  */
> +	  /* Check whether the entry timed out.  Timed out entries
> +	     will be revalidated.  For unusable records, it is still
> +	     necessary to record that the bucket needs to be scanned
> +	     again below.  */
> -	  if (dh->timeout < now)
> +	  if (dh->timeout < now || !dh->usable)

This controls the block with mark[] in it; so now we also mark unusable
entries as needing to be checked.

> -	      if (runp->first)
> +	      if (runp->first && dh->usable)

But some of the code only works for the (expected) usable entries.

>  		    {
>  		      /* Remove the value.  */
>  		      dh->usable = false;
> -
> -		      /* We definitely have some garbage entries now.  */
> -		      any = true;

We move this outside the "&& usable" check, below.  Makes sense, as it's
at the same scope as setting mark[].

>  		    }
>  		  else
>  		    {
> @@ -413,18 +413,15 @@ prune_cache (struct database_dyn *table, time_t now, int fd)
>  
>  		      time_t timeout = readdfcts[runp->type] (table, runp, dh);
>  		      next_timeout = MIN (next_timeout, timeout);
> -
> -		      /* If the entry has been replaced, we might need
> -			 cleanup.  */
> -		      any |= !dh->usable;

Here too.

>  		    }
>  		}
> +
> +	      /* If the entry has been replaced, we might need cleanup.  */
> +	      any |= !dh->usable;

We move them here so the entry is always re-scanned.

>  	    }
>  	  else
> -	    {
> -	      assert (dh->usable);
> -	      next_timeout = MIN (next_timeout, dh->timeout);
> -	    }
> +	    /* Entry has not timed out and is usable.  */
> +	    next_timeout = MIN (next_timeout, dh->timeout);

This else used to be "not timed out", now it's "not timed out and still
usable" so the assert is no longer required.

This all makes sense to me!  LGTM.

Reviewed-by: DJ Delorie <dj@redhat.com>
  
Florian Weimer Aug. 29, 2023, 9:51 a.m. UTC | #2
* DJ Delorie:

> This else used to be "not timed out", now it's "not timed out and still
> usable" so the assert is no longer required.
>
> This all makes sense to me!  LGTM.
>
> Reviewed-by: DJ Delorie <dj@redhat.com>

Thank you for your review.  Pushed.

Florian
  

Patch

diff --git a/nscd/cache.c b/nscd/cache.c
index b4b54d82bb..336ff548cb 100644
--- a/nscd/cache.c
+++ b/nscd/cache.c
@@ -370,8 +370,11 @@  prune_cache (struct database_dyn *table, time_t now, int fd)
 		       serv2str[runp->type], str, dh->timeout);
 	    }
 
-	  /* Check whether the entry timed out.  */
-	  if (dh->timeout < now)
+	  /* Check whether the entry timed out.  Timed out entries
+	     will be revalidated.  For unusable records, it is still
+	     necessary to record that the bucket needs to be scanned
+	     again below.  */
+	  if (dh->timeout < now || !dh->usable)
 	    {
 	      /* This hash bucket could contain entries which need to
 		 be looked at.  */
@@ -383,7 +386,7 @@  prune_cache (struct database_dyn *table, time_t now, int fd)
 	      /* We only have to look at the data of the first entries
 		 since the count information is kept in the data part
 		 which is shared.  */
-	      if (runp->first)
+	      if (runp->first && dh->usable)
 		{
 
 		  /* At this point there are two choices: we reload the
@@ -399,9 +402,6 @@  prune_cache (struct database_dyn *table, time_t now, int fd)
 		    {
 		      /* Remove the value.  */
 		      dh->usable = false;
-
-		      /* We definitely have some garbage entries now.  */
-		      any = true;
 		    }
 		  else
 		    {
@@ -413,18 +413,15 @@  prune_cache (struct database_dyn *table, time_t now, int fd)
 
 		      time_t timeout = readdfcts[runp->type] (table, runp, dh);
 		      next_timeout = MIN (next_timeout, timeout);
-
-		      /* If the entry has been replaced, we might need
-			 cleanup.  */
-		      any |= !dh->usable;
 		    }
 		}
+
+	      /* If the entry has been replaced, we might need cleanup.  */
+	      any |= !dh->usable;
 	    }
 	  else
-	    {
-	      assert (dh->usable);
-	      next_timeout = MIN (next_timeout, dh->timeout);
-	    }
+	    /* Entry has not timed out and is usable.  */
+	    next_timeout = MIN (next_timeout, dh->timeout);
 
 	  run = runp->next;
 	}