Fix for bz22161: ncsd: avoid dangling lock in netgroup cache timeout code
Commit Message
Patch for https://sourceware.org/bugzilla/show_bug.cgi?id=22161
"From the bz: in nscd/netgroupcache.c in addinnetgrX() we call
mempool_alloc(..., 1) which takes a lock on the database. If we
exit via the "bump timeout" clause, the lock is not released."
This patch adds an unlock if mempool_alloc actually took a lock, in
the case where we return early because the timeout hasn't timed out.
Because the dangling lock is a read lock, queries to the database
continue to work. The cache prune thread eventually becomes
deadlocked, and queries stop seeing updates (i.e. they return stale
data forever).
Original patch by Al Heisner via https://bugzilla.redhat.com/show_bug.cgi?id=1277672
2017-09-20 DJ Delorie <dj@redhat.com>
* nscd/netgroupcache.c (addinnetgrX): Release read lock after
bumping timeout values.
Comments
DJ Delorie wrote:
> Patch for https://sourceware.org/bugzilla/show_bug.cgi?id=22161
>
> "From the bz: in nscd/netgroupcache.c in addinnetgrX() we call
> mempool_alloc(..., 1) which takes a lock on the database. If we
> exit via the "bump timeout" clause, the lock is not released."
>
> This patch adds an unlock if mempool_alloc actually took a lock, in
> the case where we return early because the timeout hasn't timed out.
>
> Because the dangling lock is a read lock, queries to the database
> continue to work. The cache prune thread eventually becomes
> deadlocked, and queries stop seeing updates (i.e. they return stale
> data forever).
>
> Original patch by Al Heisner via https://bugzilla.redhat.com/show_bug.cgi?id=1277672
>
> 2017-09-20 DJ Delorie <dj@redhat.com>
>
> * nscd/netgroupcache.c (addinnetgrX): Release read lock after
> bumping timeout values.
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Thank you.
> diff --git a/nscd/netgroupcache.c b/nscd/netgroupcache.c
> index cd0c3ea..3ca96f8 100644
> --- a/nscd/netgroupcache.c
> +++ b/nscd/netgroupcache.c
> @@ -584,6 +584,8 @@ addinnetgrX (struct database_dyn *db, int fd, request_header *req,
> dh->timeout = timeout;
> dh->ttl = dataset->head.ttl;
> ++dh->nreloads;
> + if (cacheable)
> + pthread_rwlock_unlock (&db->lock);
> return timeout;
> }
>
On Sep 20 2017, DJ Delorie <dj@redhat.com> wrote:
> Patch for https://sourceware.org/bugzilla/show_bug.cgi?id=22161
>
> "From the bz: in nscd/netgroupcache.c in addinnetgrX() we call
> mempool_alloc(..., 1) which takes a lock on the database. If we
> exit via the "bump timeout" clause, the lock is not released."
>
> This patch adds an unlock if mempool_alloc actually took a lock, in
> the case where we return early because the timeout hasn't timed out.
What about the uses of mempool_alloc in the other *cache.c
implementations. Are they ok?
Andreas.
just a nit, but your subject line misspells "nscd".
PC
@@ -584,6 +584,8 @@ addinnetgrX (struct database_dyn *db, int fd, request_header *req,
dh->timeout = timeout;
dh->ttl = dataset->head.ttl;
++dh->nreloads;
+ if (cacheable)
+ pthread_rwlock_unlock (&db->lock);
return timeout;
}