Fix nscd assertion failure in gc (bug 19755)

Message ID mvmtwh36bx5.fsf@hawking.suse.de
State Committed

Commit Message

Andreas Schwab June 8, 2016, 1:35 p.m. UTC
  If a GETxxBYyy request (for passwd or group) is running in parallel to
an INVALIDATE request (for the same database) then in a particular order
of events the garbage collector is not properly marking all used memory
and fails an assertion:

   GETGRBYNAME (root)
Haven't found "root" in group cache!
add new entry "root" of type GETGRBYNAME for group to cache (first)
handle_request: request received (Version = 2) from PID 7413
   INVALIDATE (group)
pruning group cache; time 9223372036854775807
considering GETGRBYNAME entry "root", timeout 1456763027
add new entry "0" of type GETGRBYGID for group to cache
remove GETGRBYNAME entry "root"
nscd: mem.c:403: gc: Assertion `next_data == &he_data[db->head->nentries]' failed.

Here the first call to cache_add added the GETGRBYNAME entry, which is
immediately marked for collection by prune_cache.  Then the GETGRBYGID
entry is added which shares the data packet with the first entry and
therefore is marked as !first, while the marking loop in prune_cache has
already finished.  When the garbage collector runs, it only considers
references by entries marked as first, missing the reference by the
secondary entry.
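
To make the window concrete, here is a small stand-alone model of the
problem (illustrative only; the names below are invented and this is
not the actual mem.c/cache.c code).  Like nscd's gc, the mark pass
only follows entries flagged as first, so a data block whose owning
entry has already been marked dead by prune_cache is never counted,
even though a live secondary entry still references it:

  #include <assert.h>
  #include <stdbool.h>
  #include <stddef.h>

  struct cache_entry
  {
    int data_block;   /* index of the shared result packet */
    bool first;       /* true only for the owning entry */
    bool dead;        /* set by the pruning pass */
  };

  /* Count data blocks reachable through live "first" entries.  */
  static size_t
  count_marked_blocks (const struct cache_entry *e, size_t n,
                       bool *marked, size_t nblocks)
  {
    size_t found = 0;
    for (size_t i = 0; i < nblocks; ++i)
      marked[i] = false;
    for (size_t i = 0; i < n; ++i)
      /* A live secondary (!first) entry whose owning sibling is
         already dead leaves its shared block unseen.  */
      if (!e[i].dead && e[i].first && !marked[e[i].data_block])
        {
          marked[e[i].data_block] = true;
          ++found;
        }
    return found;
  }

  int
  main (void)
  {
    /* GETGRBYNAME entry added first (owns block 0); prune_cache runs
       and marks it dead; then the GETGRBYGID entry is added, sharing
       block 0 with first == false.  */
    struct cache_entry entries[2] = {
      { .data_block = 0, .first = true,  .dead = true  },
      { .data_block = 0, .first = false, .dead = false },
    };
    bool marked[1];

    /* Block 0 is still referenced, yet the mark pass finds no live
       block -- the kind of bookkeeping mismatch behind the mem.c
       assertion quoted above.  */
    assert (count_marked_blocks (entries, 2, marked, 1) == 0);
    return 0;
  }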

The only way to fix that is to prevent prune_cache from running while the
two related entries are added.
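
In code terms the change, condensed from the hunks below (cache_addgr
shown; cache_addpw gets the identical treatment), allocates the shared
dataset under db->prune_run_lock whenever a fresh entry is added in
auto-propagate mode and drops the lock again on each exit path:

      if (he == NULL)
        {
          /* Prevent an INVALIDATE request from pruning the data between
             the two calls to cache_add.  */
          if (db->propagate)
            pthread_mutex_lock (&db->prune_run_lock);
          dataset = (struct dataset *) mempool_alloc (db, total + n, 1);
        }

      if (dataset == NULL)
        {
          /* Falling back to temporary memory; nothing is added to the
             cache, so the lock is released right away.  */
          if (he == NULL && db->propagate)
            pthread_mutex_unlock (&db->prune_run_lock);
          ...
        }
      ...
        out:
          pthread_rwlock_unlock (&db->lock);
          if (he == NULL && db->propagate)
            pthread_mutex_unlock (&db->prune_run_lock);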

	[BZ #19755]
	* nscd/pwdcache.c (cache_addpw): Lock prune_run_lock while adding
	new entries in auto-propagate mode.
	* nscd/grpcache.c (cache_addgr): Likewise.
---
 nscd/grpcache.c | 13 ++++++++++++-
 nscd/pwdcache.c | 13 ++++++++++++-
 2 files changed, 24 insertions(+), 2 deletions(-)

Comments

Carlos O'Donell June 8, 2016, 7:58 p.m. UTC | #1
On 06/08/2016 09:35 AM, Andreas Schwab wrote:
> If a GETxxBYyy request (for passwd or group) is running in parallel to
> an INVALIDATE request (for the same database) then in a particular order
> of events the garbage collector is not properly marking all used memory
> and fails an assertion:
> 
>    GETGRBYNAME (root)
> Haven't found "root" in group cache!
> add new entry "root" of type GETGRBYNAME for group to cache (first)
> handle_request: request received (Version = 2) from PID 7413
>    INVALIDATE (group)
> pruning group cache; time 9223372036854775807
> considering GETGRBYNAME entry "root", timeout 1456763027
> add new entry "0" of type GETGRBYGID for group to cache
> remove GETGRBYNAME entry "root"
> nscd: mem.c:403: gc: Assertion `next_data == &he_data[db->head->nentries]' failed.
> 
> Here the first call to cache_add added the GETGRBYNAME entry, which is
> immediately marked for collection by prune_cache.  Then the GETGRBYGID
> entry is added which shares the data packet with the first entry and
> therefore is marked as !first, while the marking loop in prune_cache has
> already finished.  When the garbage collector runs, it only considers
> references by entries marked as first, missing the reference by the
> secondary entry.
> 
> The only way to fix that is to prevent prune_cache from running while the
> two related entries are added.
> 
> 	[BZ #19755]
> 	* nscd/pwdcache.c (cache_addpw): Lock prune_run_lock while adding
> 	new entries in auto-propagate mode.
> 	* nscd/grpcache.c (cache_addgr): Likewise.

This looks good to me.

> ---
>  nscd/grpcache.c | 13 ++++++++++++-
>  nscd/pwdcache.c | 13 ++++++++++++-
>  2 files changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/nscd/grpcache.c b/nscd/grpcache.c
> index 3831170..8b9b13d 100644
> --- a/nscd/grpcache.c
> +++ b/nscd/grpcache.c
> @@ -205,10 +205,19 @@ cache_addgr (struct database_dyn *db, int fd, request_header *req,
>        dataset = NULL;
>  
>        if (he == NULL)
> -	dataset = (struct dataset *) mempool_alloc (db, total + n, 1);
> +	{
> +	  /* Prevent an INVALIDATE request from pruning the data between
> +	     the two calls to cache_add.  */
> +	  if (db->propagate)
> +	    pthread_mutex_lock (&db->prune_run_lock);

OK, using prune_run_lock to prevent concurrent changes.

> +	  dataset = (struct dataset *) mempool_alloc (db, total + n, 1);
> +	}
>  
>        if (dataset == NULL)
>  	{
> +	  if (he == NULL && db->propagate)
> +	    pthread_mutex_unlock (&db->prune_run_lock);

OK. Unlock if we're using alloca since we won't add these entries to the
database.

> +
>  	  /* We cannot permanently add the result in the moment.  But
>  	     we can provide the result as is.  Store the data in some
>  	     temporary memory.  */
> @@ -396,6 +405,8 @@ cache_addgr (struct database_dyn *db, int fd, request_header *req,
>  
>  	out:
>  	  pthread_rwlock_unlock (&db->lock);
> +	  if (he == NULL && db->propagate)
> +	    pthread_mutex_unlock (&db->prune_run_lock);

OK. Unlock in the !alloca_used path.

>  	}
>      }
>  
> diff --git a/nscd/pwdcache.c b/nscd/pwdcache.c
> index 6dd6746..5ef8485 100644
> --- a/nscd/pwdcache.c
> +++ b/nscd/pwdcache.c
> @@ -198,10 +198,19 @@ cache_addpw (struct database_dyn *db, int fd, request_header *req,
>        dataset = NULL;
>  
>        if (he == NULL)
> -	dataset = (struct dataset *) mempool_alloc (db, total + n, 1);
> +	{
> +	  /* Prevent an INVALIDATE request from pruning the data between
> +	     the two calls to cache_add.  */
> +	  if (db->propagate)
> +	    pthread_mutex_lock (&db->prune_run_lock);
> +	  dataset = (struct dataset *) mempool_alloc (db, total + n, 1);

OK.

> +	}
>  
>        if (dataset == NULL)
>  	{
> +	  if (he == NULL && db->propagate)
> +	    pthread_mutex_unlock (&db->prune_run_lock);

OK.

> +
>  	  /* We cannot permanently add the result in the moment.  But
>  	     we can provide the result as is.  Store the data in some
>  	     temporary memory.  */
> @@ -374,6 +383,8 @@ cache_addpw (struct database_dyn *db, int fd, request_header *req,
>  
>  	out:
>  	  pthread_rwlock_unlock (&db->lock);
> +	  if (he == NULL && db->propagate)
> +	    pthread_mutex_unlock (&db->prune_run_lock);

OK.

>  	}
>      }
>  
>

Patch

diff --git a/nscd/grpcache.c b/nscd/grpcache.c
index 3831170..8b9b13d 100644
--- a/nscd/grpcache.c
+++ b/nscd/grpcache.c
@@ -205,10 +205,19 @@  cache_addgr (struct database_dyn *db, int fd, request_header *req,
       dataset = NULL;
 
       if (he == NULL)
-	dataset = (struct dataset *) mempool_alloc (db, total + n, 1);
+	{
+	  /* Prevent an INVALIDATE request from pruning the data between
+	     the two calls to cache_add.  */
+	  if (db->propagate)
+	    pthread_mutex_lock (&db->prune_run_lock);
+	  dataset = (struct dataset *) mempool_alloc (db, total + n, 1);
+	}
 
       if (dataset == NULL)
 	{
+	  if (he == NULL && db->propagate)
+	    pthread_mutex_unlock (&db->prune_run_lock);
+
 	  /* We cannot permanently add the result in the moment.  But
 	     we can provide the result as is.  Store the data in some
 	     temporary memory.  */
@@ -396,6 +405,8 @@  cache_addgr (struct database_dyn *db, int fd, request_header *req,
 
 	out:
 	  pthread_rwlock_unlock (&db->lock);
+	  if (he == NULL && db->propagate)
+	    pthread_mutex_unlock (&db->prune_run_lock);
 	}
     }
 
diff --git a/nscd/pwdcache.c b/nscd/pwdcache.c
index 6dd6746..5ef8485 100644
--- a/nscd/pwdcache.c
+++ b/nscd/pwdcache.c
@@ -198,10 +198,19 @@  cache_addpw (struct database_dyn *db, int fd, request_header *req,
       dataset = NULL;
 
       if (he == NULL)
-	dataset = (struct dataset *) mempool_alloc (db, total + n, 1);
+	{
+	  /* Prevent an INVALIDATE request from pruning the data between
+	     the two calls to cache_add.  */
+	  if (db->propagate)
+	    pthread_mutex_lock (&db->prune_run_lock);
+	  dataset = (struct dataset *) mempool_alloc (db, total + n, 1);
+	}
 
       if (dataset == NULL)
 	{
+	  if (he == NULL && db->propagate)
+	    pthread_mutex_unlock (&db->prune_run_lock);
+
 	  /* We cannot permanently add the result in the moment.  But
 	     we can provide the result as is.  Store the data in some
 	     temporary memory.  */
@@ -374,6 +383,8 @@  cache_addpw (struct database_dyn *db, int fd, request_header *req,
 
 	out:
 	  pthread_rwlock_unlock (&db->lock);
+	  if (he == NULL && db->propagate)
+	    pthread_mutex_unlock (&db->prune_run_lock);
 	}
     }