Patchwork Fix for bz22161: ncsd: avoid dangling lock in netgroup cache timeout code

login
register
mail settings
Submitter DJ Delorie
Date Sept. 20, 2017, 10:57 p.m.
Message ID <xnd16lt1vd.fsf@greed.delorie.com>
Download mbox | patch
Permalink /patch/22982/
State New
Headers show

Comments

DJ Delorie - Sept. 20, 2017, 10:57 p.m.
Patch for https://sourceware.org/bugzilla/show_bug.cgi?id=22161

 "From the bz: in nscd/netgroupcache.c in addinnetgrX() we call
  mempool_alloc(..., 1) which takes a lock on the database.  If we
  exit via the "bump timeout" clause, the lock is not released."

This patch adds an unlock if mempool_alloc actually took a lock, in
the case where we return early because the timeout hasn't timed out.

Because the dangling lock is a read lock, queries to the database
continue to work.  The cache prune thread eventually becomes
deadlocked, and queries stop seeing updates (i.e. they return stale
data forever).

Original patch by Al Heisner via https://bugzilla.redhat.com/show_bug.cgi?id=1277672

2017-09-20  DJ Delorie  <dj@redhat.com>

	* nscd/netgroupcache.c (addinnetgrX): Release read lock after
	bumping timeout values.
Jonathan Nieder - Sept. 20, 2017, 11:07 p.m.
DJ Delorie wrote:

> Patch for https://sourceware.org/bugzilla/show_bug.cgi?id=22161
>
>  "From the bz: in nscd/netgroupcache.c in addinnetgrX() we call
>   mempool_alloc(..., 1) which takes a lock on the database.  If we
>   exit via the "bump timeout" clause, the lock is not released."
>
> This patch adds an unlock if mempool_alloc actually took a lock, in
> the case where we return early because the timeout hasn't timed out.
>
> Because the dangling lock is a read lock, queries to the database
> continue to work.  The cache prune thread eventually becomes
> deadlocked, and queries stop seeing updates (i.e. they return stale
> data forever).
>
> Original patch by Al Heisner via https://bugzilla.redhat.com/show_bug.cgi?id=1277672
>
> 2017-09-20  DJ Delorie  <dj@redhat.com>
>
> 	* nscd/netgroupcache.c (addinnetgrX): Release read lock after
> 	bumping timeout values.

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Thank you.

> diff --git a/nscd/netgroupcache.c b/nscd/netgroupcache.c
> index cd0c3ea..3ca96f8 100644
> --- a/nscd/netgroupcache.c
> +++ b/nscd/netgroupcache.c
> @@ -584,6 +584,8 @@ addinnetgrX (struct database_dyn *db, int fd, request_header *req,
>        dh->timeout = timeout;
>        dh->ttl = dataset->head.ttl;
>        ++dh->nreloads;
> +      if (cacheable)
> +        pthread_rwlock_unlock (&db->lock);
>        return timeout;
>      }
>
Andreas Schwab - Sept. 21, 2017, 7:08 a.m.
On Sep 20 2017, DJ Delorie <dj@redhat.com> wrote:

> Patch for https://sourceware.org/bugzilla/show_bug.cgi?id=22161
>
>  "From the bz: in nscd/netgroupcache.c in addinnetgrX() we call
>   mempool_alloc(..., 1) which takes a lock on the database.  If we
>   exit via the "bump timeout" clause, the lock is not released."
>
> This patch adds an unlock if mempool_alloc actually took a lock, in
> the case where we return early because the timeout hasn't timed out.

What about the uses of mempool_alloc in the other *cache.c
implementations.  Are they ok?

Andreas.
Paul Clarke - Sept. 21, 2017, 2:30 p.m.
just a nit, but your subject line misspells "nscd".

PC

Patch

diff --git a/nscd/netgroupcache.c b/nscd/netgroupcache.c
index cd0c3ea..3ca96f8 100644
--- a/nscd/netgroupcache.c
+++ b/nscd/netgroupcache.c
@@ -584,6 +584,8 @@  addinnetgrX (struct database_dyn *db, int fd, request_header *req,
       dh->timeout = timeout;
       dh->ttl = dataset->head.ttl;
       ++dh->nreloads;
+      if (cacheable)
+        pthread_rwlock_unlock (&db->lock);
       return timeout;
     }