Fix for bz22161: ncsd: avoid dangling lock in netgroup cache timeout code

Message ID xnd16lt1vd.fsf@greed.delorie.com
State New, archived
Headers

Commit Message

DJ Delorie Sept. 20, 2017, 10:57 p.m. UTC
  Patch for https://sourceware.org/bugzilla/show_bug.cgi?id=22161

 "From the bz: in nscd/netgroupcache.c in addinnetgrX() we call
  mempool_alloc(..., 1) which takes a lock on the database.  If we
  exit via the "bump timeout" clause, the lock is not released."

This patch adds an unlock if mempool_alloc actually took a lock, in
the case where we return early because the timeout hasn't timed out.

Because the dangling lock is a read lock, queries to the database
continue to work.  The cache prune thread eventually becomes
deadlocked, and queries stop seeing updates (i.e. they return stale
data forever).

Original patch by Al Heisner via https://bugzilla.redhat.com/show_bug.cgi?id=1277672

2017-09-20  DJ Delorie  <dj@redhat.com>

	* nscd/netgroupcache.c (addinnetgrX): Release read lock after
	bumping timeout values.
  

Comments

Jonathan Nieder Sept. 20, 2017, 11:07 p.m. UTC | #1
DJ Delorie wrote:

> Patch for https://sourceware.org/bugzilla/show_bug.cgi?id=22161
>
>  "From the bz: in nscd/netgroupcache.c in addinnetgrX() we call
>   mempool_alloc(..., 1) which takes a lock on the database.  If we
>   exit via the "bump timeout" clause, the lock is not released."
>
> This patch adds an unlock if mempool_alloc actually took a lock, in
> the case where we return early because the timeout hasn't timed out.
>
> Because the dangling lock is a read lock, queries to the database
> continue to work.  The cache prune thread eventually becomes
> deadlocked, and queries stop seeing updates (i.e. they return stale
> data forever).
>
> Original patch by Al Heisner via https://bugzilla.redhat.com/show_bug.cgi?id=1277672
>
> 2017-09-20  DJ Delorie  <dj@redhat.com>
>
> 	* nscd/netgroupcache.c (addinnetgrX): Release read lock after
> 	bumping timeout values.

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thank you.

> diff --git a/nscd/netgroupcache.c b/nscd/netgroupcache.c
> index cd0c3ea..3ca96f8 100644
> --- a/nscd/netgroupcache.c
> +++ b/nscd/netgroupcache.c
> @@ -584,6 +584,8 @@ addinnetgrX (struct database_dyn *db, int fd, request_header *req,
>        dh->timeout = timeout;
>        dh->ttl = dataset->head.ttl;
>        ++dh->nreloads;
> +      if (cacheable)
> +        pthread_rwlock_unlock (&db->lock);
>        return timeout;
>      }
>
  
Andreas Schwab Sept. 21, 2017, 7:08 a.m. UTC | #2
On Sep 20 2017, DJ Delorie <dj@redhat.com> wrote:

> Patch for https://sourceware.org/bugzilla/show_bug.cgi?id=22161
>
>  "From the bz: in nscd/netgroupcache.c in addinnetgrX() we call
>   mempool_alloc(..., 1) which takes a lock on the database.  If we
>   exit via the "bump timeout" clause, the lock is not released."
>
> This patch adds an unlock if mempool_alloc actually took a lock, in
> the case where we return early because the timeout hasn't timed out.

What about the uses of mempool_alloc in the other *cache.c
implementations.  Are they ok?

Andreas.
  
Paul A. Clarke Sept. 21, 2017, 2:30 p.m. UTC | #3
just a nit, but your subject line misspells "nscd".

PC
  

Patch

diff --git a/nscd/netgroupcache.c b/nscd/netgroupcache.c
index cd0c3ea..3ca96f8 100644
--- a/nscd/netgroupcache.c
+++ b/nscd/netgroupcache.c
@@ -584,6 +584,8 @@  addinnetgrX (struct database_dyn *db, int fd, request_header *req,
       dh->timeout = timeout;
       dh->ttl = dataset->head.ttl;
       ++dh->nreloads;
+      if (cacheable)
+        pthread_rwlock_unlock (&db->lock);
       return timeout;
     }