[v2,1/5] mips: Do not malloc on getdents64 fallback

Message ID 30395e41-d6c9-4fce-ca7a-933d55900fc7@linaro.org
State Dropped
Headers

Commit Message

Adhemerval Zanella Sept. 2, 2019, 5:38 p.m. UTC
  On 02/09/2019 09:59, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>>> The choice of size needs a comment.  I think the largest possible
>>> practical length of the d_name member are 255 Unicode characters in the
>>> BMP, in UTF-8 encoding, so d_name is 766 bytes long, plus 10 bytes from
>>> the header, for 776 bytes total.  (NAME_MAX is not a constant on Linux
>>> in reality.)
>>
>> I picked the buffer as an arbitrary value, what about:
>>
>>   /* The largest possible practical length of the d_name member are 255
>>      Unicode characters in UTF-8 encoding, so d_name is 766 bytes long, plus
>>      18 (mips64) / 10 (mips64n32) bytes from header, for total of 784 (mips64)
>>      / 776 (mips64n32) bytes total.  Ensure that the minimum size hold at
>>      least one entry.  */
>>   enum { KBUF_SIZE = 1024 };
> 
> “holds”
> 
> Looks good.
> 
>>>>    struct kernel_dirent
>>>> +  {
>>>> +    unsigned long d_ino;
>>>> +    unsigned long d_off;
>>>> +    unsigned short int d_reclen;
>>>> +    char d_name[1];
>>>> +  } kbuf[KBUF_SIZE / sizeof (struct kernel_dirent)];
>>>> +  size_t kbuf_size = nbytes < KBUF_SIZE ? nbytes : KBUF_SIZE;
>>>
>>> I would define kbuf as a char array, and perhaps leave out the d_name
>>> member in struct kernel_dirent.  You can copy out the struct
>>> kernel_dirent using memcpy, which GCC should optimize away.
>>
>> I defined the buffer as 'struct kernel_dirent' to make it easier to align
>> for the required fields.  It allows simplify the access on the loop to
>> avoid memcpy calls.
> 
> But the code is invalid C as a result of this.  We do not compile glibc
> with -fno-strict-aliasing, after all.

Right, we indeed do some pointer arithmetic to get the kdp value. I change
to use char buffer plus memcpy to obtain the fields.

> 
>>>>    struct dirent64 *dp = (struct dirent64 *) buf;
>>>> +
>>>> +  size_t nb = 0;
>>>> +  off64_t last_offset = -1;
>>>> +
>>>> +  ssize_t r;
>>>> +  while ((r = INLINE_SYSCALL_CALL (getdents, fd, kbuf, kbuf_size)) > 0)
>>>>      {
>>>
>>> Sorry, I don't see how the outer loop is exited.  I think we should
>>> remove it because it does not seem necessary.
>>
>> We still need to handle the cases where NBYTES are larger than the temporary
>> buffer, because it might require multiple getdents calls.
> 
> Why?  The application (or readdir) will call us again to get more entries.

As a simple optimization to avoid it, but I think we can move to a simplified
version.

> 
>>> Is the length really correct, though?  I'd expect it to grow by the
>>> additional size of the d_ino and d_off members.  I think it would be
>>> best recompute it from scratch, using the actual length of d_name.
>>
>> It it because you are referencing to an older patch version, checking on
>> mips64-n32 I adjusted to:
>>
>>   const size_t size_diff = (offsetof (struct dirent64, d_name)
>>                            - offsetof (struct kernel_dirent, d_name));
>>   [...]
>>                size_t new_reclen = ALIGN_UP (kdp->d_reclen + size_diff,
>>                                         _Alignof (struct dirent64));
>>   [...]
> 
> Okay, this needs a comment that this is a conservative approximation
> (some of size_diff might fit into the existing padding for alignment).

Ack.

> 
>>>> +	  if (nb + new_reclen > nbytes)
>>>>  	    {
>>>> +		/* The new entry will overflow the input buffer, rewind to
>>>> +		   last obtained entry and return.  */
>>>> +	       __lseek64 (fd, last_offset, SEEK_SET);
>>>
>>> I don't think last_offset is guaranteed to have been set with a proper
>>> offset at this point.  Given that d_name is essentially of unbounded
>>> length, even expanding the first entry can cause failure.
>>>
>>> Maybe it's possible to avoid this corner case by limiting the amount of
>>> data being read so that we know that the application-supplied buffer is
>>> always large enough for any possible expansion.  I think the worse-case
>>> growth is for lengths 5 to 8, from 20 bytes to 32 bytes.  So perhaps we
>>> should divide the buffer size by 1.6 and use that?
>>
>> For this case I really think we just need to return an error to user:
>>
>>             if (nb + new_reclen > nbytes)
>>             {   
>> 		/* Entry is too large for the static buffer.  */
> 
> Fixed-size buffer, it's not static. 8-)

Ack.

> 
>> 		if (last_offset == -1)
>> 		  {
>> 		    __set_errno (EINVAL);
>> 		    return -1;
>> 		  }
>>                 /* The new entry will overflow the input buffer, rewind to
>>                    last obtained entry and return.  */
>>                __lseek64 (fd, last_offset, SEEK_SET);
>>                goto out;
>>             }
>>
>> Again I see this fallback code as a best-effort since we are emulating the
>> syscall with additional restraints.  Most of time glibc tries to play smart
>> emulating a syscall ended in a lot of headaches...
> 
> I don't disagree.
> 
> Which error code does the kernel return if no entry can be read at all?
> We should mirror that.

It returns -1/EINVAL.

fs/readdir.c
177         if (reclen > buf->count)                                                                    
178                 return -EINVAL;

> 
>>>> +	       goto out;
>>>>  	    }
>>>> +	  nb += new_reclen;
>>>>  
>>>> +	  dp->d_ino = kdp->d_ino;
>>>> +	  dp->d_off = last_offset = kdp->d_off;
>>>> +	  dp->d_reclen = new_reclen;
>>>> +	  dp->d_type = *((char *) kdp + kdp->d_reclen - 1);
>>>
>>> I think instead of reading through kdp, you should use char *s and
>>> memcpy, to avoid the aliasing violation, as discussed above.  Likewise
>>> for writing to dp.
>>
>> I think if we proper setting the buffer alignment there is no need to do it.
>> Also the problem of using memcpy here is for mips64n32 the size is *not* 
>> equal for dp and kdp, each would require an extra step as:
>>
>>    {
>>      typeof (kdp->d_ino) kino;
>>      memcpy (&kino, &kdp_d->ino, sizeof (kino));
>>      typeof (dp->d_ino) dino = kino;
>>      memcpy (&dp->d_ino, &kino, sizeof (dino));
>>    }
> 
> I think that's just the price of writing correct C.  It's also what the
> kernel does.

Ack.

> 
> I don't even think there's a requirement that the byte buffer passed to
> getdents64 has any kind of alignment.
> 
> Thanks,
> Florian
> 

Updated patch below.

--

This patch changes how the fallback getdents64 implementation calls
non-LFS getdents by replacing the scratch_buffer with static buffer
plus a loop on getdents calls.  This avoids the potential malloc
call on scratch_buffer_set_array_size for large input buffer size
at the cost of more getdents syscalls.

It also adds a small optimization for older kernels, where the first
ENOSYS failure for getdents64 disable subsequent calls.

Check the dirent tests on a mips64-linux-gnu with getdents64 code
disabled.

	* sysdeps/unix/sysv/linux/mips/mips64/getdents64.c (__getdents64):
	Add small optimization for older kernel to avoid issuing
	__NR_getdents64 on each call and replace scratch_buffer usage with
	a static allocated buffer.
---
 .../unix/sysv/linux/mips/mips64/getdents64.c  | 133 ++++++++++--------
 1 file changed, 76 insertions(+), 57 deletions(-)
  

Comments

Adhemerval Zanella Oct. 7, 2019, 5:49 p.m. UTC | #1
Ping.

On 02/09/2019 14:38, Adhemerval Zanella wrote:
> 
> 
> On 02/09/2019 09:59, Florian Weimer wrote:
>> * Adhemerval Zanella:
>>
>>>> The choice of size needs a comment.  I think the largest possible
>>>> practical length of the d_name member are 255 Unicode characters in the
>>>> BMP, in UTF-8 encoding, so d_name is 766 bytes long, plus 10 bytes from
>>>> the header, for 776 bytes total.  (NAME_MAX is not a constant on Linux
>>>> in reality.)
>>>
>>> I picked the buffer as an arbitrary value, what about:
>>>
>>>   /* The largest possible practical length of the d_name member are 255
>>>      Unicode characters in UTF-8 encoding, so d_name is 766 bytes long, plus
>>>      18 (mips64) / 10 (mips64n32) bytes from header, for total of 784 (mips64)
>>>      / 776 (mips64n32) bytes total.  Ensure that the minimum size hold at
>>>      least one entry.  */
>>>   enum { KBUF_SIZE = 1024 };
>>
>> “holds”
>>
>> Looks good.
>>
>>>>>    struct kernel_dirent
>>>>> +  {
>>>>> +    unsigned long d_ino;
>>>>> +    unsigned long d_off;
>>>>> +    unsigned short int d_reclen;
>>>>> +    char d_name[1];
>>>>> +  } kbuf[KBUF_SIZE / sizeof (struct kernel_dirent)];
>>>>> +  size_t kbuf_size = nbytes < KBUF_SIZE ? nbytes : KBUF_SIZE;
>>>>
>>>> I would define kbuf as a char array, and perhaps leave out the d_name
>>>> member in struct kernel_dirent.  You can copy out the struct
>>>> kernel_dirent using memcpy, which GCC should optimize away.
>>>
>>> I defined the buffer as 'struct kernel_dirent' to make it easier to align
>>> for the required fields.  It allows simplify the access on the loop to
>>> avoid memcpy calls.
>>
>> But the code is invalid C as a result of this.  We do not compile glibc
>> with -fno-strict-aliasing, after all.
> 
> Right, we indeed do some pointer arithmetic to get the kdp value. I change
> to use char buffer plus memcpy to obtain the fields.
> 
>>
>>>>>    struct dirent64 *dp = (struct dirent64 *) buf;
>>>>> +
>>>>> +  size_t nb = 0;
>>>>> +  off64_t last_offset = -1;
>>>>> +
>>>>> +  ssize_t r;
>>>>> +  while ((r = INLINE_SYSCALL_CALL (getdents, fd, kbuf, kbuf_size)) > 0)
>>>>>      {
>>>>
>>>> Sorry, I don't see how the outer loop is exited.  I think we should
>>>> remove it because it does not seem necessary.
>>>
>>> We still need to handle the cases where NBYTES are larger than the temporary
>>> buffer, because it might require multiple getdents calls.
>>
>> Why?  The application (or readdir) will call us again to get more entries.
> 
> As a simple optimization to avoid it, but I think we can move to a simplified
> version.
> 
>>
>>>> Is the length really correct, though?  I'd expect it to grow by the
>>>> additional size of the d_ino and d_off members.  I think it would be
>>>> best recompute it from scratch, using the actual length of d_name.
>>>
>>> It it because you are referencing to an older patch version, checking on
>>> mips64-n32 I adjusted to:
>>>
>>>   const size_t size_diff = (offsetof (struct dirent64, d_name)
>>>                            - offsetof (struct kernel_dirent, d_name));
>>>   [...]
>>>                size_t new_reclen = ALIGN_UP (kdp->d_reclen + size_diff,
>>>                                         _Alignof (struct dirent64));
>>>   [...]
>>
>> Okay, this needs a comment that this is a conservative approximation
>> (some of size_diff might fit into the existing padding for alignment).
> 
> Ack.
> 
>>
>>>>> +	  if (nb + new_reclen > nbytes)
>>>>>  	    {
>>>>> +		/* The new entry will overflow the input buffer, rewind to
>>>>> +		   last obtained entry and return.  */
>>>>> +	       __lseek64 (fd, last_offset, SEEK_SET);
>>>>
>>>> I don't think last_offset is guaranteed to have been set with a proper
>>>> offset at this point.  Given that d_name is essentially of unbounded
>>>> length, even expanding the first entry can cause failure.
>>>>
>>>> Maybe it's possible to avoid this corner case by limiting the amount of
>>>> data being read so that we know that the application-supplied buffer is
>>>> always large enough for any possible expansion.  I think the worse-case
>>>> growth is for lengths 5 to 8, from 20 bytes to 32 bytes.  So perhaps we
>>>> should divide the buffer size by 1.6 and use that?
>>>
>>> For this case I really think we just need to return an error to user:
>>>
>>>             if (nb + new_reclen > nbytes)
>>>             {   
>>> 		/* Entry is too large for the static buffer.  */
>>
>> Fixed-size buffer, it's not static. 8-)
> 
> Ack.
> 
>>
>>> 		if (last_offset == -1)
>>> 		  {
>>> 		    __set_errno (EINVAL);
>>> 		    return -1;
>>> 		  }
>>>                 /* The new entry will overflow the input buffer, rewind to
>>>                    last obtained entry and return.  */
>>>                __lseek64 (fd, last_offset, SEEK_SET);
>>>                goto out;
>>>             }
>>>
>>> Again I see this fallback code as a best-effort since we are emulating the
>>> syscall with additional restraints.  Most of time glibc tries to play smart
>>> emulating a syscall ended in a lot of headaches...
>>
>> I don't disagree.
>>
>> Which error code does the kernel return if no entry can be read at all?
>> We should mirror that.
> 
> It returns -1/EINVAL.
> 
> fs/readdir.c
> 177         if (reclen > buf->count)                                                                    
> 178                 return -EINVAL;
> 
>>
>>>>> +	       goto out;
>>>>>  	    }
>>>>> +	  nb += new_reclen;
>>>>>  
>>>>> +	  dp->d_ino = kdp->d_ino;
>>>>> +	  dp->d_off = last_offset = kdp->d_off;
>>>>> +	  dp->d_reclen = new_reclen;
>>>>> +	  dp->d_type = *((char *) kdp + kdp->d_reclen - 1);
>>>>
>>>> I think instead of reading through kdp, you should use char *s and
>>>> memcpy, to avoid the aliasing violation, as discussed above.  Likewise
>>>> for writing to dp.
>>>
>>> I think if we proper setting the buffer alignment there is no need to do it.
>>> Also the problem of using memcpy here is for mips64n32 the size is *not* 
>>> equal for dp and kdp, each would require an extra step as:
>>>
>>>    {
>>>      typeof (kdp->d_ino) kino;
>>>      memcpy (&kino, &kdp_d->ino, sizeof (kino));
>>>      typeof (dp->d_ino) dino = kino;
>>>      memcpy (&dp->d_ino, &kino, sizeof (dino));
>>>    }
>>
>> I think that's just the price of writing correct C.  It's also what the
>> kernel does.
> 
> Ack.
> 
>>
>> I don't even think there's a requirement that the byte buffer passed to
>> getdents64 has any kind of alignment.
>>
>> Thanks,
>> Florian
>>
> 
> Updated patch below.
> 
> --
> 
> This patch changes how the fallback getdents64 implementation calls
> non-LFS getdents by replacing the scratch_buffer with static buffer
> plus a loop on getdents calls.  This avoids the potential malloc
> call on scratch_buffer_set_array_size for large input buffer size
> at the cost of more getdents syscalls.
> 
> It also adds a small optimization for older kernels, where the first
> ENOSYS failure for getdents64 disable subsequent calls.
> 
> Check the dirent tests on a mips64-linux-gnu with getdents64 code
> disabled.
> 
> 	* sysdeps/unix/sysv/linux/mips/mips64/getdents64.c (__getdents64):
> 	Add small optimization for older kernel to avoid issuing
> 	__NR_getdents64 on each call and replace scratch_buffer usage with
> 	a static allocated buffer.
> ---
>  .../unix/sysv/linux/mips/mips64/getdents64.c  | 133 ++++++++++--------
>  1 file changed, 76 insertions(+), 57 deletions(-)
> 
> diff --git a/sysdeps/unix/sysv/linux/mips/mips64/getdents64.c b/sysdeps/unix/sysv/linux/mips/mips64/getdents64.c
> index 8bf3abb0e0..02e15a0b2e 100644
> --- a/sysdeps/unix/sysv/linux/mips/mips64/getdents64.c
> +++ b/sysdeps/unix/sysv/linux/mips/mips64/getdents64.c
> @@ -22,88 +22,108 @@
>  #include <assert.h>
>  #include <sys/param.h>
>  #include <unistd.h>
> -#include <scratch_buffer.h>
>  #include <limits.h>
>  
> +#include <include/libc-pointer-arith.h>
> +
>  ssize_t
> -__getdents64 (int fd, void *buf0, size_t nbytes)
> +__getdents64 (int fd, void *buf, size_t nbytes)
>  {
> -  char *buf = buf0;
> -
>    /* The system call takes an unsigned int argument, and some length
>       checks in the kernel use an int type.  */
>    if (nbytes > INT_MAX)
>      nbytes = INT_MAX;
>  
>  #ifdef __NR_getdents64
> -  ssize_t ret = INLINE_SYSCALL_CALL (getdents64, fd, buf, nbytes);
> -  if (ret != -1)
> -    return ret;
> +  static int getdents64_supported = true;
> +  if (atomic_load_relaxed (&getdents64_supported))
> +    {
> +      ssize_t ret = INLINE_SYSCALL_CALL (getdents64, fd, buf, nbytes);
> +      if (ret >= 0 || errno != ENOSYS)
> +	return ret;
> +
> +      atomic_store_relaxed (&getdents64_supported, false);
> +    }
>  #endif
>  
>    /* Unfortunately getdents64 was only wire-up for MIPS n64 on Linux 3.10.
> -     If syscall is not available it need to fallback to non-LFS one.  */
> +     If the syscall is not available it need to fallback to the non-LFS one.
> +     Also to avoid an unbounded allocation through VLA/alloca or malloc (which
> +     would make the syscall non async-signal-safe) it uses a limited buffer.
> +     This is sub-optimal for large NBYTES, however this is a fallback
> +     mechanism to emulate a syscall that kernel should provide.   */
>  
>    struct kernel_dirent
> -    {
> -      unsigned long d_ino;
> -      unsigned long d_off;
> -      unsigned short int d_reclen;
> -      char d_name[256];
> -    };
> +  {
> +#if _MIPS_SIM == _ABI64
> +    uint64_t d_ino;
> +    uint64_t d_off;
> +#else
> +    uint32_t d_ino;
> +    uint32_t d_off;
> +#endif
> +    unsigned short int d_reclen;
> +    char d_name[1];
> +  };
> +
> +  /* The largest possible practical length of the d_name member are 255
> +     Unicode characters in UTF-8 encoding, so d_name is 766 bytes long, plus
> +     18 (mips64) / 10 (mips64n32) bytes from header, for total of 784 (mips64)
> +     / 776 (mips64n32) bytes total.  Ensure that the minimum size holds at
> +     least one entry.  */
> +  enum { KBUF_SIZE = 1024 };
> +  char kbuf[KBUF_SIZE];
> +  size_t kbuf_size = nbytes < KBUF_SIZE ? nbytes : KBUF_SIZE;
>  
>    const size_t size_diff = (offsetof (struct dirent64, d_name)
> -			   - offsetof (struct kernel_dirent, d_name));
> +                           - offsetof (struct kernel_dirent, d_name));
>  
> -  size_t red_nbytes = MIN (nbytes
> -			   - ((nbytes / (offsetof (struct dirent64, d_name)
> -					 + 14)) * size_diff),
> -			   nbytes - size_diff);
> +  struct dirent64 *dp = (struct dirent64 *) buf;
>  
> -  struct scratch_buffer tmpbuf;
> -  scratch_buffer_init (&tmpbuf);
> -  if (!scratch_buffer_set_array_size (&tmpbuf, red_nbytes, sizeof (uint8_t)))
> -    INLINE_SYSCALL_ERROR_RETURN_VALUE (ENOMEM);
> +  size_t nb = 0;
> +  off64_t last_offset = -1;
>  
> -  struct kernel_dirent *skdp, *kdp;
> -  skdp = kdp = tmpbuf.data;
> +  ssize_t r = INLINE_SYSCALL_CALL (getdents, fd, kbuf, kbuf_size);
> +  if (r <= 0)
> +    return r;
>  
> -  ssize_t retval = INLINE_SYSCALL_CALL (getdents, fd, kdp, red_nbytes);
> -  if (retval == -1)
> -    {
> -      scratch_buffer_free (&tmpbuf);
> -      return -1;
> -    }
> +  struct kernel_dirent *skdp, *kdp;
> +  skdp = kdp = (struct kernel_dirent *) kbuf;
>  
> -  off64_t last_offset = -1;
> -  struct dirent64 *dp = (struct dirent64 *) buf;
> -  while ((char *) kdp < (char *) skdp + retval)
> +  while ((char *) kdp < (char *) skdp + r)
>      {
> -      const size_t alignment = _Alignof (struct dirent64);
> -      /* Since kdp->d_reclen is already aligned for the kernel structure
> -	 this may compute a value that is bigger than necessary.  */
> -      size_t new_reclen = ((kdp->d_reclen + size_diff + alignment - 1)
> -			   & ~(alignment - 1));
> -      if ((char *) dp + new_reclen > buf + nbytes)
> -        {
> -	  /* Our heuristic failed.  We read too many entries.  Reset
> -	     the stream.  */
> -	  assert (last_offset != -1);
> -	  __lseek64 (fd, last_offset, SEEK_SET);
> -
> -	  if ((char *) dp == buf)
> +      /* This is a conservative approximation, since some of size_diff might
> +	 fit into the existing padding for alignment.  */
> +      size_t new_reclen = ALIGN_UP (kdp->d_reclen + size_diff,
> +				    _Alignof (struct dirent64));
> +      if (nb + new_reclen > nbytes)
> +	{
> +	  /* Entry is too large for the fixed-size buffer.  */
> +	  if (last_offset == -1)
>  	    {
> -	      scratch_buffer_free (&tmpbuf);
> -	      return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL);
> +	      __set_errno (EINVAL);
> +	      return -1;
>  	    }
>  
> -	  break;
> +	  /* The new entry will overflow the input buffer, rewind to last
> +	     obtained entry and return.  */
> +	  __lseek64 (fd, last_offset, SEEK_SET);
> +	  return (char *) dp - (char *) buf;
>  	}
> -
> -      last_offset = kdp->d_off;
> -      dp->d_ino = kdp->d_ino;
> -      dp->d_off = kdp->d_off;
> -      dp->d_reclen = new_reclen;
> +      nb += new_reclen;
> +
> +#define copy_field(dst, src)			\
> +  ({						\
> +     typeof (src) _src;				\
> +     memcpy (&_src, &(src), sizeof (src));	\
> +     typeof (dst) _dst = _src;			\
> +     memcpy (&(dst), &_dst, sizeof (dst));	\
> +  })
> +
> +      copy_field (dp->d_ino, kdp->d_ino);
> +      copy_field (dp->d_off, kdp->d_off);
> +      copy_field (last_offset, kdp->d_off);
> +      copy_field (dp->d_reclen, new_reclen);
>        dp->d_type = *((char *) kdp + kdp->d_reclen - 1);
>        memcpy (dp->d_name, kdp->d_name,
>  	      kdp->d_reclen - offsetof (struct kernel_dirent, d_name));
> @@ -112,8 +132,7 @@ __getdents64 (int fd, void *buf0, size_t nbytes)
>        kdp = (struct kernel_dirent *) (((char *) kdp) + kdp->d_reclen);
>      }
>  
> -  scratch_buffer_free (&tmpbuf);
> -  return (char *) dp - buf;
> +  return (char *) dp - (char *) buf;
>  }
>  libc_hidden_def (__getdents64)
>  weak_alias (__getdents64, getdents64)
>
  
Florian Weimer Oct. 7, 2019, 6:29 p.m. UTC | #2
* Adhemerval Zanella:

> +#define copy_field(dst, src)			\
> +  ({						\
> +     typeof (src) _src;				\
> +     memcpy (&_src, &(src), sizeof (src));	\
> +     typeof (dst) _dst = _src;			\
> +     memcpy (&(dst), &_dst, sizeof (dst));	\
> +  })
> +
> +      copy_field (dp->d_ino, kdp->d_ino);
> +      copy_field (dp->d_off, kdp->d_off);
> +      copy_field (last_offset, kdp->d_off);
> +      copy_field (dp->d_reclen, new_reclen);
>        dp->d_type = *((char *) kdp + kdp->d_reclen - 1);

I believe this still asserts the dynamic type of *dp, which is not what
we want.  The truly portable way probably involves using offsetof and
not -> dereferencing. 8-(

Considering that, I would probably drop copy_field, compile the file
with -fno-strict-aliasing, and add a comment to the (now plain
assignments) that this is okay due to -fno-strict-aliasing.

But this is really up to you, I do not want to discuss this patch to
death.

Thanks,
Florian
  

Patch

diff --git a/sysdeps/unix/sysv/linux/mips/mips64/getdents64.c b/sysdeps/unix/sysv/linux/mips/mips64/getdents64.c
index 8bf3abb0e0..02e15a0b2e 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/getdents64.c
+++ b/sysdeps/unix/sysv/linux/mips/mips64/getdents64.c
@@ -22,88 +22,108 @@ 
 #include <assert.h>
 #include <sys/param.h>
 #include <unistd.h>
-#include <scratch_buffer.h>
 #include <limits.h>
 
+#include <include/libc-pointer-arith.h>
+
 ssize_t
-__getdents64 (int fd, void *buf0, size_t nbytes)
+__getdents64 (int fd, void *buf, size_t nbytes)
 {
-  char *buf = buf0;
-
   /* The system call takes an unsigned int argument, and some length
      checks in the kernel use an int type.  */
   if (nbytes > INT_MAX)
     nbytes = INT_MAX;
 
 #ifdef __NR_getdents64
-  ssize_t ret = INLINE_SYSCALL_CALL (getdents64, fd, buf, nbytes);
-  if (ret != -1)
-    return ret;
+  static int getdents64_supported = true;
+  if (atomic_load_relaxed (&getdents64_supported))
+    {
+      ssize_t ret = INLINE_SYSCALL_CALL (getdents64, fd, buf, nbytes);
+      if (ret >= 0 || errno != ENOSYS)
+	return ret;
+
+      atomic_store_relaxed (&getdents64_supported, false);
+    }
 #endif
 
   /* Unfortunately getdents64 was only wire-up for MIPS n64 on Linux 3.10.
-     If syscall is not available it need to fallback to non-LFS one.  */
+     If the syscall is not available it need to fallback to the non-LFS one.
+     Also to avoid an unbounded allocation through VLA/alloca or malloc (which
+     would make the syscall non async-signal-safe) it uses a limited buffer.
+     This is sub-optimal for large NBYTES, however this is a fallback
+     mechanism to emulate a syscall that kernel should provide.   */
 
   struct kernel_dirent
-    {
-      unsigned long d_ino;
-      unsigned long d_off;
-      unsigned short int d_reclen;
-      char d_name[256];
-    };
+  {
+#if _MIPS_SIM == _ABI64
+    uint64_t d_ino;
+    uint64_t d_off;
+#else
+    uint32_t d_ino;
+    uint32_t d_off;
+#endif
+    unsigned short int d_reclen;
+    char d_name[1];
+  };
+
+  /* The largest possible practical length of the d_name member are 255
+     Unicode characters in UTF-8 encoding, so d_name is 766 bytes long, plus
+     18 (mips64) / 10 (mips64n32) bytes from header, for total of 784 (mips64)
+     / 776 (mips64n32) bytes total.  Ensure that the minimum size holds at
+     least one entry.  */
+  enum { KBUF_SIZE = 1024 };
+  char kbuf[KBUF_SIZE];
+  size_t kbuf_size = nbytes < KBUF_SIZE ? nbytes : KBUF_SIZE;
 
   const size_t size_diff = (offsetof (struct dirent64, d_name)
-			   - offsetof (struct kernel_dirent, d_name));
+                           - offsetof (struct kernel_dirent, d_name));
 
-  size_t red_nbytes = MIN (nbytes
-			   - ((nbytes / (offsetof (struct dirent64, d_name)
-					 + 14)) * size_diff),
-			   nbytes - size_diff);
+  struct dirent64 *dp = (struct dirent64 *) buf;
 
-  struct scratch_buffer tmpbuf;
-  scratch_buffer_init (&tmpbuf);
-  if (!scratch_buffer_set_array_size (&tmpbuf, red_nbytes, sizeof (uint8_t)))
-    INLINE_SYSCALL_ERROR_RETURN_VALUE (ENOMEM);
+  size_t nb = 0;
+  off64_t last_offset = -1;
 
-  struct kernel_dirent *skdp, *kdp;
-  skdp = kdp = tmpbuf.data;
+  ssize_t r = INLINE_SYSCALL_CALL (getdents, fd, kbuf, kbuf_size);
+  if (r <= 0)
+    return r;
 
-  ssize_t retval = INLINE_SYSCALL_CALL (getdents, fd, kdp, red_nbytes);
-  if (retval == -1)
-    {
-      scratch_buffer_free (&tmpbuf);
-      return -1;
-    }
+  struct kernel_dirent *skdp, *kdp;
+  skdp = kdp = (struct kernel_dirent *) kbuf;
 
-  off64_t last_offset = -1;
-  struct dirent64 *dp = (struct dirent64 *) buf;
-  while ((char *) kdp < (char *) skdp + retval)
+  while ((char *) kdp < (char *) skdp + r)
     {
-      const size_t alignment = _Alignof (struct dirent64);
-      /* Since kdp->d_reclen is already aligned for the kernel structure
-	 this may compute a value that is bigger than necessary.  */
-      size_t new_reclen = ((kdp->d_reclen + size_diff + alignment - 1)
-			   & ~(alignment - 1));
-      if ((char *) dp + new_reclen > buf + nbytes)
-        {
-	  /* Our heuristic failed.  We read too many entries.  Reset
-	     the stream.  */
-	  assert (last_offset != -1);
-	  __lseek64 (fd, last_offset, SEEK_SET);
-
-	  if ((char *) dp == buf)
+      /* This is a conservative approximation, since some of size_diff might
+	 fit into the existing padding for alignment.  */
+      size_t new_reclen = ALIGN_UP (kdp->d_reclen + size_diff,
+				    _Alignof (struct dirent64));
+      if (nb + new_reclen > nbytes)
+	{
+	  /* Entry is too large for the fixed-size buffer.  */
+	  if (last_offset == -1)
 	    {
-	      scratch_buffer_free (&tmpbuf);
-	      return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL);
+	      __set_errno (EINVAL);
+	      return -1;
 	    }
 
-	  break;
+	  /* The new entry will overflow the input buffer, rewind to last
+	     obtained entry and return.  */
+	  __lseek64 (fd, last_offset, SEEK_SET);
+	  return (char *) dp - (char *) buf;
 	}
-
-      last_offset = kdp->d_off;
-      dp->d_ino = kdp->d_ino;
-      dp->d_off = kdp->d_off;
-      dp->d_reclen = new_reclen;
+      nb += new_reclen;
+
+#define copy_field(dst, src)			\
+  ({						\
+     typeof (src) _src;				\
+     memcpy (&_src, &(src), sizeof (src));	\
+     typeof (dst) _dst = _src;			\
+     memcpy (&(dst), &_dst, sizeof (dst));	\
+  })
+
+      copy_field (dp->d_ino, kdp->d_ino);
+      copy_field (dp->d_off, kdp->d_off);
+      copy_field (last_offset, kdp->d_off);
+      copy_field (dp->d_reclen, new_reclen);
       dp->d_type = *((char *) kdp + kdp->d_reclen - 1);
       memcpy (dp->d_name, kdp->d_name,
 	      kdp->d_reclen - offsetof (struct kernel_dirent, d_name));
@@ -112,8 +132,7 @@  __getdents64 (int fd, void *buf0, size_t nbytes)
       kdp = (struct kernel_dirent *) (((char *) kdp) + kdp->d_reclen);
     }
 
-  scratch_buffer_free (&tmpbuf);
-  return (char *) dp - buf;
+  return (char *) dp - (char *) buf;
 }
 libc_hidden_def (__getdents64)
 weak_alias (__getdents64, getdents64)