[v2,14/23] nscd: Consistant snapshot with client function __nscd_read_from_cache
Checks
| Context | Check | Description |
| redhat-pt-bot/TryBot-apply_patch | success | Patch applied to master at the time it was sent |
| linaro-tcwg-bot/tcwg_glibc_build--master-arm | success | Build passed |
| linaro-tcwg-bot/tcwg_glibc_build--master-aarch64 | success | Build passed |
| linaro-tcwg-bot/tcwg_glibc_check--master-aarch64 | fail | Test failed |
| linaro-tcwg-bot/tcwg_glibc_check--master-arm | fail | Test failed |
Commit Message
Reuse the same software TM protocol that we use for _dl_find_object,
with a minor tweak: for newly added records, we rely on loads
carrying a dependency on the next pointer. This means the GC cycle
versioning is only needed for cache pruning and actual GC, which move
data around or make it available for subsequent allocation.
---
nscd/cache.c | 41 +++++++++++---
nscd/mem.c | 7 +--
nscd/nscd-client.h | 40 ++++++++++++-
nscd/nscd.h | 4 ++
nscd/nscd_helper.c | 137 +++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 214 insertions(+), 15 deletions(-)
Comments
On 3/20/26 4:42 PM, Florian Weimer wrote:
Subject typo s/Consistant/Consistent/g (Claude Code v2.1.81 Sonnet 4.5)
It took 2 rounds of review with Claude Code, narrowing the attention and
context to get it to understand the code in question. While it can review
narrower contexts, it doesn't have the working attention to remember key
design details for nscd. As such it can only help me with localized audits
to improve code quality.
> Reuse the same software TM protocol that we use for _dl_find_object,
> with a minor tweak: for newly added records, we rely on loads
> carrying a dependency on the next pointer. This means the GC cycle
> versioning is only needed for cache pruning and actual GC, which move
> data around or make it available for subsequent allocation.
Looks really good. I'm happy to see the reuse of seqlock-style strategies
we used in _dl_find_object!
LGTM. You can keep my RB if you fix the subject line typo, and if you want
to remove the redundant release.
We can look at the acquire optimization another day.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
> ---
> nscd/cache.c | 41 +++++++++++---
> nscd/mem.c | 7 +--
> nscd/nscd-client.h | 40 ++++++++++++-
> nscd/nscd.h | 4 ++
> nscd/nscd_helper.c | 137 +++++++++++++++++++++++++++++++++++++++++++++
> 5 files changed, 214 insertions(+), 15 deletions(-)
>
> diff --git a/nscd/cache.c b/nscd/cache.c
> index 3f8be16281..7e133f5bae 100644
> --- a/nscd/cache.c
> +++ b/nscd/cache.c
> @@ -176,8 +176,10 @@ cache_add (int type, const void *key, size_t len, struct datahead *packet,
> newp->packet = (char *) packet - table->data;
> assert ((newp->packet & BLOCK_ALIGN_M1) == 0);
>
> - /* Put the new entry in the first position. */
> - /* TODO Review concurrency. Use atomic_exchange_release. */
> + /* Put the new entry in the first position.
> +
> + The load in __nscd_read_from_cache carries a dependency
> + on the release store from the CAS below. */
> newp->next = atomic_load_relaxed (&table->head->array[hash]);
OK. Relaxed MO.
> while (!atomic_compare_exchange_weak_release (&table->head->array[hash],
> (ref_t *) &newp->next,
> @@ -231,6 +233,30 @@ cache_add (int type, const void *key, size_t len, struct datahead *packet,
> return 0;
> }
>
> +/* This follows the software TM protocol in dl-find_object.c
> + (_dlfo_mappings_begin_update and _dlfo_mappings_end_update). Also
> + see __nscd_cache_read_start_version and __nscd_cache_read_success
> + in nscd_helper.c. */
> +
> +void
> +__nscd_begin_gc (struct database_dyn *table)
> +{
> + uint32_t gc_cycle = atomic_load_relaxed (&table->head->gc_cycle) + 1;
> + assert ((gc_cycle & 1) == 1);
> + atomic_store_relaxed (&table->head->gc_cycle, gc_cycle);
> + atomic_thread_fence_release ();
OK. Release MO. Low bit is 1 (odd cycle). Always start on odd cycle.
> +}
> +
> +void
> +__nscd_end_gc (struct database_dyn *table)
> +{
> + atomic_thread_fence_release ();
OK. Release MO. Low bit is 0 (even cycle). Always end even.
> + uint32_t gc_cycle = atomic_load_relaxed (&table->head->gc_cycle) + 1;
> + assert ((gc_cycle & 1) == 0);
> + atomic_store_relaxed (&table->head->gc_cycle, gc_cycle);
> + atomic_thread_fence_release ();
Redundant release? The release above synchronizes-with the gc_cycle read and
ensures all writes in the update are seen when looking at this version
of gc_cycle. The subsequent relaxed update is fine: either we see it and
we know we're done, or we don't and we figure we're still in the gc
cycle; either way is OK?
Note: Also noted by Claude Code v2.1.81 Sonnet 4.5 (narrower context). Reviewed.
> +}
> +
> /* Walk through the table and remove all entries which lifetime ended.
>
> We have a problem here. To actually remove the entries we must get
> @@ -448,10 +474,10 @@ prune_cache (struct database_dyn *table, time_t now, int fd)
> pthread_rwlock_wrlock (&table->lock);
> }
>
> - /* Now we start modifying the data. Make sure all readers of the
> - data are aware of this and temporarily don't use the data. */
> - atomic_fetch_add_relaxed (&table->head->gc_cycle, 1);
> - assert ((table->head->gc_cycle & 1) == 1);
> + /* Now we start modifying the data. Make sure all readers of
> + the data are aware of this and temporarily don't use the
> + data. */
> + __nscd_begin_gc (table);
OK.
>
> while (first <= last)
> {
> @@ -494,8 +520,7 @@ prune_cache (struct database_dyn *table, time_t now, int fd)
> }
>
> /* Now we are done modifying the data. */
> - atomic_fetch_add_relaxed (&table->head->gc_cycle, 1);
> - assert ((table->head->gc_cycle & 1) == 0);
> + __nscd_end_gc (table);
OK.
>
> /* It's all done. */
> pthread_rwlock_unlock (&table->lock);
> diff --git a/nscd/mem.c b/nscd/mem.c
> index 4f1350b3fd..14a2ab855a 100644
> --- a/nscd/mem.c
> +++ b/nscd/mem.c
> @@ -259,9 +259,7 @@ gc (struct database_dyn *db)
>
> /* Now we start modifying the data. Make sure all readers of the
> data are aware of this and temporarily don't use the data. */
> - atomic_fetch_add_relaxed (&db->head->gc_cycle, 1);
> - assert ((db->head->gc_cycle & 1) == 1);
> -
> + __nscd_begin_gc (db);
OK.
>
> /* We do not perform the move operations right away since the
> he_data array is not sorted by the address of the data. */
> @@ -485,8 +483,7 @@ gc (struct database_dyn *db)
>
>
> /* Now we are done modifying the data. */
> - atomic_fetch_add_relaxed (&db->head->gc_cycle, 1);
> - assert ((db->head->gc_cycle & 1) == 0);
> + __nscd_end_gc (db);
OK.
>
> /* We are done. */
> out:
> diff --git a/nscd/nscd-client.h b/nscd/nscd-client.h
> index 10284e8ef2..9761750951 100644
> --- a/nscd/nscd-client.h
> +++ b/nscd/nscd-client.h
> @@ -210,8 +210,8 @@ typedef uint64_t nscd_time_t;
> /* Head of record in data part of database. */
> struct datahead
> {
> - nscd_ssize_t allocsize; /* Allocated Bytes. */
> - nscd_ssize_t recsize; /* Size of the record. */
> + nscd_ssize_t allocsize; /* Allocated bytes (including this header). */
> + nscd_ssize_t recsize; /* Size of the record (data part). */
> nscd_time_t timeout; /* Time when this entry becomes invalid. */
> uint8_t notfound; /* Nonzero if data has not been found. */
> uint8_t nreloads; /* Reloads without use. */
> @@ -389,6 +389,26 @@ ssize_t __nscd_read_from_socket (const char *key, size_t keylen,
> struct scratch_buffer *response)
> attribute_hidden;
>
> +/* Use the mapping MAPPED to look up KEY of size KEYLEN. Grow
> + *RESPONSE as necessary.
> +
> + On success, the number of bytes in RESPONSE->data is returned.
> + RESPONSE->data points to the payload data from the mapping
> + (after struct datahead).
> +
> + A return value of zero means that the mapping was processed
> + successfully, but the key was not found in it.
> +
> + On failure, a negative error code is return: -EINPROGRESS means
s/return/returned/g
> + that a garbage collection is in progress, and the socket should be
> + used instead. -ENOMEM means that local (malloc) memory allocation
> + failed; this should be reported up to the caller. -EMSGSIZE denotes
> + a corrupted mapping. */
> +ssize_t
> +__nscd_read_from_cache (const char *key, size_t keylen, request_type type,
> + struct scratch_buffer *response,
> + const struct mapped_database *mapped) attribute_hidden;
> +
> /* Acquire reference to the mapping for DB (see <nscd-dbtype.h>). On
> success, return a pointer to the mapping descriptor, and lock the
> mapping.
> @@ -414,6 +434,22 @@ bool __nscd_map_ref_retry_or_drop (struct mapped_database **mapped,
> int *gc_cycle, int *nretries, int retval)
> attribute_hidden;
>
> +
> +/* Like __nscd_map_ref_retry_or_drop, but without comparing gc_cycle
> + against the original value. Instead, the retry decision is based
> + on *RET: the values -EINPROGRESS and -EMSGSIZE indicate that
> + another attempt is not needed. Other negative error codes are used
> + to set errno (to the negated/positive value) and do not result in a
> + retry, and *RET is set to -1.
> +
> + If there is no retry, return false, and unlock the mapping
> + if *USE_MAPPING.
> +
> + Must be called after __nscd_get_map_ref. */
> +bool __nscd_map_retry (struct mapped_database *mapped,
> + bool *use_mapping, int *nretries,
> + ssize_t *ret) attribute_hidden;
> +
> /* Search the mapped database. */
> extern struct datahead *__nscd_cache_search (request_type type,
> const char *key,
> diff --git a/nscd/nscd.h b/nscd/nscd.h
> index fffcb7a719..db9cd24c13 100644
> --- a/nscd/nscd.h
> +++ b/nscd/nscd.h
> @@ -262,6 +262,10 @@ extern void send_stats (int fd, struct database_dyn dbs[lastdb]);
> extern int receive_print_stats (void) __attribute__ ((__noreturn__));
>
> /* cache.c */
> +/* Software TM protocol. Used in gc and prune_cache. */
> +void __nscd_begin_gc (struct database_dyn *) attribute_hidden;
> +void __nscd_end_gc (struct database_dyn *) attribute_hidden;
> +
> extern struct datahead *cache_search (request_type, const void *key,
> size_t len, struct database_dyn *table,
> uid_t owner);
> diff --git a/nscd/nscd_helper.c b/nscd/nscd_helper.c
> index ed2d8d09da..88bfb463c2 100644
> --- a/nscd/nscd_helper.c
> +++ b/nscd/nscd_helper.c
> @@ -513,6 +513,32 @@ __nscd_map_ref_retry_or_drop (struct mapped_database **mapped,
> return false;
> }
>
> +bool
> +__nscd_map_retry (struct mapped_database *mapped,
> + bool *use_mapping, int *nretries,
> + ssize_t *ret)
> +{
> + if (*use_mapping && (*ret == -EINPROGRESS || *ret == -EMSGSIZE))
> + {
> + if (++*nretries == 5)
> + {
> + __libc_rwlock_unlock (mapped->lock);
> + *use_mapping = false;
> + }
> + return true;
> + }
> +
> + if (*use_mapping)
> + __libc_rwlock_unlock (mapped->lock);
> +
> + if (*ret < 0)
> + {
> + __set_errno (-*ret);
> + *ret = -1;
> + }
> + return false;
> +}
> +
> /* Using sizeof (hashentry) is not always correct to determine the size of
> the data structure as found in the nscd cache. The program could be
> a 64-bit process and nscd could be a 32-bit process. In this case
> @@ -580,6 +606,117 @@ __nscd_cache_search (request_type type, const char *key, size_t keylen,
> return NULL;
> }
>
> +/* TM version at the start of the read operation. See
> +_dlfo_read_start_version. */
> +static inline uint32_t
> +__nscd_cache_read_start_version (const struct mapped_database *mapped)
> +{
> + return atomic_load_acquire (&mapped->head->gc_cycle);
OK. Acquire MO on the gc_cycle synchronizes-with either the __nscd_begin_gc
or __nscd_end_gc (both are thread fence release).
> +}
> +
> +/* Return true if the read was successful, given the start version.
> + See _dlfo_read_success. */
> +static inline bool
> +__nscd_cache_read_success (const struct mapped_database *mapped,
> + uint32_t start_version)
> +{
> + atomic_thread_fence_acquire ();
> + return __nscd_cache_read_start_version (mapped) == start_version;
OK. Acquire MO, then Acquire MO on gc_cycle to see if it was the same as the
start_version.
Note that this is an Acquire MO fence followed by an Acquire MO load of gc_cycle,
which could be optimized with a store release of gc_cycle (instead of the thread
fence release). I think this is fine; I doubt anyone is looking at the performance
that closely, and it is easier to reason about. It requires an atomic add, though,
and that has architectural constraints, which is probably why the simpler
sequence was chosen for sequence locks.
Note: Also noted by Claude Code v2.1.81 Sonnet 4.5.
> +}
> +
> +
> +ssize_t
> +__nscd_read_from_cache (const char *key, size_t keylen, request_type type,
> + struct scratch_buffer *response,
> + const struct mapped_database *mapped)
> +{
> + if (keylen > MAXKEYLEN)
> + return -ENAMETOOLONG;
> +
> + unsigned long int hash = __nss_hash (key, keylen) % mapped->head->module;
> +
> + retry:;
OK. Restart here if a GC cycle started or completed (seqlock-style).
> + struct concurrent_buffer cb = cb_create (mapped->data, mapped->datasize);
> + uint32_t start_version = __nscd_cache_read_start_version (mapped);
OK. Acquire MO fence. Acquire MO read of gc_cycle. Either we see
gc_cycle as even and we know we aren't in a GC phase and can process data, or
we see we are in a GC cycle (odd) and need to wait.
> + if (start_version & 1)
> + /* Garbage collection is in progress. */
> + return -EINPROGRESS;
OK. Agreed. Needs an outer loop to handle waiting (patch 15).
> +
> + ref_t trail = mapped->head->array[hash];
> + ref_t work = trail;
> + size_t loop_cnt = mapped->datasize / (MINIMUM_HASHENTRY_SIZE
> + + offsetof (struct datahead, data) / 2);
> + int tick = 0;
> +
> + while (work != ENDREF)
> + {
> + /* First compare type and key. */
> + if (type == cb_field (&cb, work, struct hashentry, type)
> + && keylen == cb_field (&cb, work, struct hashentry, len))
> + {
> + ref_t here_key = cb_field (&cb, work, struct hashentry, key);
> + if (cb_memeq (&cb, here_key, key, keylen))
> + {
> + /* Data found. Now validate the data reference. */
> + ref_t here_packet = cb_field (&cb, work,
> + struct hashentry, packet);
> + if ((here_packet & (__alignof__ (struct datahead) - 1)) == 0
> + && cb_field (&cb, here_packet, struct datahead, usable))
> + {
> + size_t recsize = cb_field (&cb, here_packet,
> + struct datahead, recsize);
> + /* Intermediate validation: we must verify that the
> + size is genuine before passing it to malloc. */
> + if (!__nscd_cache_read_success (mapped, start_version))
> + /* This will bail out if a GC cycle has started. */
OK. The recsize matters here and forcing the acquire here synchronizes with
the possible gc_cycle write.
> + goto retry;
> + /* Include struct datahead part and data part, not
> + the full allocation (with unused padding at end). */
> + size_t data_offset = offsetof (struct datahead, data);
> + size_t entrysize = data_offset + recsize;
> + if (!scratch_buffer_set_array_size (response, recsize, 1))
> + return -ENOMEM;
> + if (!cb_available (&cb, here_packet, entrysize))
> + /* The entry size was derived from validated data,
> + so if there is not enough data available, the
> + mapping is corrupted. */
> + return -EMSGSIZE;
> + memcpy (response->data,
> + mapped->data + here_packet + data_offset,
> + recsize);
OK. This copies the data out, which should be consistent if we validate again
after the copy and the pseudo-seqlock has not been observed to change gc_cycle.
OK. There is in principle an ABA problem in gc_cycle, but the time for that is very
large and the scenario highly unlikely, e.g. gc_cycle at 0, 2^32 gc cycles, gc_cycle at 0 again,
and we get old data.
> + /* Validate again after copying, to make sure we
> + produced a consistent snapshot. */
> + if (!__nscd_cache_read_success (mapped, start_version))
> + goto retry;
OK.
> + return recsize;
> + }
> + }
> + }
> +
> + work = cb_field (&cb, work, struct hashentry, next);
> + if (!__nscd_cache_read_success (mapped, start_version))
> + goto retry;
> + if (cb_has_failed (&cb))
> + /* Mapping is corrupted because data is inconsistent, but
> + validated. */
> + return -EMSGSIZE;
> +
> + /* Prevent endless loops. This should never happen but perhaps
> + the database got corrupted, accidentally or deliberately.
> + Use both a loop count and a Floyd-style cycle detector
> + (work is the hare, and trail is the tortoise). */
> + if (work == trail || loop_cnt-- == 0)
> + break;
> + if (tick)
> + trail = cb_field (&cb, trail, struct hashentry, next);
> + tick = 1 - tick;
> + }
> +
> + if (!__nscd_cache_read_success (mapped, start_version))
> + goto retry;
OK.
> + /* Negative result. */
> + return 0;
> +}
>
> /* Create a socket connected to a name. */
> int