nscd: bump GC cycle during cache pruning (bug 26130)

Message ID mvmeeqd3hrd.fsf@suse.de
State Committed
Headers
Series nscd: bump GC cycle during cache pruning (bug 26130) |

Commit Message

Andreas Schwab June 17, 2020, 2:10 p.m. UTC
  While nscd prunes a cache it becomes inconsistent temporarily, which is
visible to clients if that cache is shared.  Bump the GC cycle counter so
that the clients notice the modification window.
---
 nscd/cache.c | 9 +++++++++
 1 file changed, 9 insertions(+)
  

Comments

Florian Weimer June 29, 2020, 8:27 a.m. UTC | #1
* Andreas Schwab:

> While nscd prunes a cache it becomes inconsistent temporarily, which is
> visible to clients if that cache is shared.  Bump the GC cycle counter so
> that the clients notice the modification window.
> ---
>  nscd/cache.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/nscd/cache.c b/nscd/cache.c
> index 94163d9ce3..2ceac94c23 100644
> --- a/nscd/cache.c
> +++ b/nscd/cache.c
> @@ -453,6 +453,11 @@ prune_cache (struct database_dyn *table, time_t now, int fd)
>  	  pthread_rwlock_wrlock (&table->lock);
>  	}
>  
> +      /* Now we start modifying the data.  Make sure all readers of the
> +	 data are aware of this and temporarily don't use the data.  */
> +      ++table->head->gc_cycle;
> +      assert ((table->head->gc_cycle & 1) == 1);
> +
>        while (first <= last)
>  	{
>  	  if (mark[first])
> @@ -493,6 +498,10 @@ prune_cache (struct database_dyn *table, time_t now, int fd)
>  	  ++first;
>  	}
>  
> +      /* Now we are done modifying the data.  */
> +      ++table->head->gc_cycle;
> +      assert ((table->head->gc_cycle & 1) == 0);
> +
>        /* It's all done.  */
>        pthread_rwlock_unlock (&table->lock);

I think this needs barriers after and before the increments.

Thanks,
Florian
  
Andreas Schwab June 29, 2020, 10:35 a.m. UTC | #2
On Jun 29 2020, Florian Weimer wrote:

> I think this needs barriers after and before the increments.

Why doesn't gc need those barriers?

Andreas.
  
Florian Weimer June 29, 2020, 10:48 a.m. UTC | #3
* Andreas Schwab:

> On Jun 29 2020, Florian Weimer wrote:
>
>> I think this needs barriers after and before the increments.
>
> Why doesn't gc need those barriers?

I think it needs them as well.  I thought we had them there. 8-(

Thanks,
Florian
  
Andreas Schwab June 29, 2020, 2:09 p.m. UTC | #4
On Jun 29 2020, Florian Weimer via Libc-alpha wrote:

> * Andreas Schwab:
>
>> On Jun 29 2020, Florian Weimer wrote:
>>
>>> I think this needs barriers after and before the increments.
>>
>> Why doesn't gc need those barriers?
>
> I think it needs them as well.  I thought we had them there. 8-(

Should they use atomic_increment?

Andreas.
  
Carlos O'Donell June 29, 2020, 2:14 p.m. UTC | #5
On 6/29/20 10:09 AM, Andreas Schwab wrote:
> On Jun 29 2020, Florian Weimer via Libc-alpha wrote:
> 
>> * Andreas Schwab:
>>
>>> On Jun 29 2020, Florian Weimer wrote:
>>>
>>>> I think this needs barriers after and before the increments.
>>>
>>> Why doesn't gc need those barriers?
>>
>> I think it needs them as well.  I thought we had them there. 8-(
> 
> Should they use atomic_increment?

There are many concurrency problems with the cache, and we do see
these issues with downstream customers. Torvald reviewed this at
one point and produced a patch that I've attached to bug 25888.

Our latest attempt to fix this is here:
https://sourceware.org/bugzilla/show_bug.cgi?id=25888
https://sourceware.org/bugzilla/attachment.cgi?id=12493
  
Florian Weimer June 29, 2020, 2:16 p.m. UTC | #6
* Andreas Schwab:

> On Jun 29 2020, Florian Weimer via Libc-alpha wrote:
>
>> * Andreas Schwab:
>>
>>> On Jun 29 2020, Florian Weimer wrote:
>>>
>>>> I think this needs barriers after and before the increments.
>>>
>>> Why doesn't gc need those barriers?
>>
>> I think it needs them as well.  I thought we had them there. 8-(
>
> Should they use atomic_increment?

Yes, that as well.  I don't know if the legacy atomic_increment function
implies a barrier.  I think we need the equivalent of __atomic_fetch_add
(or __atomic_add_fetch) with __ATOMIC_ACQ_REL, and our <atomic.h> does
not have this, and we are not supposed to use the compiler built-ins.

atomic_fetch_add_relaxed together with atomic_full_barrier should work,
though.

Thanks,
Florian
  
Andreas Schwab June 29, 2020, 2:20 p.m. UTC | #7
On Jun 29 2020, Carlos O'Donell via Libc-alpha wrote:

> https://sourceware.org/bugzilla/attachment.cgi?id=12493

That doesn't fix this bug.

Andreas.
  

Patch

diff --git a/nscd/cache.c b/nscd/cache.c
index 94163d9ce3..2ceac94c23 100644
--- a/nscd/cache.c
+++ b/nscd/cache.c
@@ -453,6 +453,11 @@  prune_cache (struct database_dyn *table, time_t now, int fd)
 	  pthread_rwlock_wrlock (&table->lock);
 	}
 
+      /* Now we start modifying the data.  Make sure all readers of the
+	 data are aware of this and temporarily don't use the data.  */
+      ++table->head->gc_cycle;
+      assert ((table->head->gc_cycle & 1) == 1);
+
       while (first <= last)
 	{
 	  if (mark[first])
@@ -493,6 +498,10 @@  prune_cache (struct database_dyn *table, time_t now, int fd)
 	  ++first;
 	}
 
+      /* Now we are done modifying the data.  */
+      ++table->head->gc_cycle;
+      assert ((table->head->gc_cycle & 1) == 0);
+
       /* It's all done.  */
       pthread_rwlock_unlock (&table->lock);