[v2,1/5] mips: Do not malloc on getdents64 fallback

  On 02/09/2019 09:59, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>>> The choice of size needs a comment.  I think the largest possible
>>> practical length of the d_name member are 255 Unicode characters in the
>>> BMP, in UTF-8 encoding, so d_name is 766 bytes long, plus 10 bytes from
>>> the header, for 776 bytes total.  (NAME_MAX is not a constant on Linux
>>> in reality.)
>>
>> I picked the buffer as an arbitrary value, what about:
>>
>>   /* The largest possible practical length of the d_name member are 255
>>      Unicode characters in UTF-8 encoding, so d_name is 766 bytes long, plus
>>      18 (mips64) / 10 (mips64n32) bytes from header, for total of 784 (mips64)
>>      / 776 (mips64n32) bytes total.  Ensure that the minimum size hold at
>>      least one entry.  */
>>   enum { KBUF_SIZE = 1024 };
> 
> “holds”
> 
> Looks good.
> 
>>>>    struct kernel_dirent
>>>> +  {
>>>> +    unsigned long d_ino;
>>>> +    unsigned long d_off;
>>>> +    unsigned short int d_reclen;
>>>> +    char d_name[1];
>>>> +  } kbuf[KBUF_SIZE / sizeof (struct kernel_dirent)];
>>>> +  size_t kbuf_size = nbytes < KBUF_SIZE ? nbytes : KBUF_SIZE;
>>>
>>> I would define kbuf as a char array, and perhaps leave out the d_name
>>> member in struct kernel_dirent.  You can copy out the struct
>>> kernel_dirent using memcpy, which GCC should optimize away.
>>
>> I defined the buffer as 'struct kernel_dirent' to make it easier to align
>> for the required fields.  It allows simplify the access on the loop to
>> avoid memcpy calls.
> 
> But the code is invalid C as a result of this.  We do not compile glibc
> with -fno-strict-aliasing, after all.

Right, we indeed do some pointer arithmetic to get the kdp value. I change
to use char buffer plus memcpy to obtain the fields.

> 
>>>>    struct dirent64 *dp = (struct dirent64 *) buf;
>>>> +
>>>> +  size_t nb = 0;
>>>> +  off64_t last_offset = -1;
>>>> +
>>>> +  ssize_t r;
>>>> +  while ((r = INLINE_SYSCALL_CALL (getdents, fd, kbuf, kbuf_size)) > 0)
>>>>      {
>>>
>>> Sorry, I don't see how the outer loop is exited.  I think we should
>>> remove it because it does not seem necessary.
>>
>> We still need to handle the cases where NBYTES are larger than the temporary
>> buffer, because it might require multiple getdents calls.
> 
> Why?  The application (or readdir) will call us again to get more entries.

As a simple optimization to avoid it, but I think we can move to a simplified
version.

> 
>>> Is the length really correct, though?  I'd expect it to grow by the
>>> additional size of the d_ino and d_off members.  I think it would be
>>> best recompute it from scratch, using the actual length of d_name.
>>
>> It it because you are referencing to an older patch version, checking on
>> mips64-n32 I adjusted to:
>>
>>   const size_t size_diff = (offsetof (struct dirent64, d_name)
>>                            - offsetof (struct kernel_dirent, d_name));
>>   [...]
>>                size_t new_reclen = ALIGN_UP (kdp->d_reclen + size_diff,
>>                                         _Alignof (struct dirent64));
>>   [...]
> 
> Okay, this needs a comment that this is a conservative approximation
> (some of size_diff might fit into the existing padding for alignment).

Ack.

> 
>>>> +	  if (nb + new_reclen > nbytes)
>>>>  	    {
>>>> +		/* The new entry will overflow the input buffer, rewind to
>>>> +		   last obtained entry and return.  */
>>>> +	       __lseek64 (fd, last_offset, SEEK_SET);
>>>
>>> I don't think last_offset is guaranteed to have been set with a proper
>>> offset at this point.  Given that d_name is essentially of unbounded
>>> length, even expanding the first entry can cause failure.
>>>
>>> Maybe it's possible to avoid this corner case by limiting the amount of
>>> data being read so that we know that the application-supplied buffer is
>>> always large enough for any possible expansion.  I think the worse-case
>>> growth is for lengths 5 to 8, from 20 bytes to 32 bytes.  So perhaps we
>>> should divide the buffer size by 1.6 and use that?
>>
>> For this case I really think we just need to return an error to user:
>>
>>             if (nb + new_reclen > nbytes)
>>             {   
>> 		/* Entry is too large for the static buffer.  */
> 
> Fixed-size buffer, it's not static. 8-)

Ack.

> 
>> 		if (last_offset == -1)
>> 		  {
>> 		    __set_errno (EINVAL);
>> 		    return -1;
>> 		  }
>>                 /* The new entry will overflow the input buffer, rewind to
>>                    last obtained entry and return.  */
>>                __lseek64 (fd, last_offset, SEEK_SET);
>>                goto out;
>>             }
>>
>> Again I see this fallback code as a best-effort since we are emulating the
>> syscall with additional restraints.  Most of time glibc tries to play smart
>> emulating a syscall ended in a lot of headaches...
> 
> I don't disagree.
> 
> Which error code does the kernel return if no entry can be read at all?
> We should mirror that.

It returns -1/EINVAL.

fs/readdir.c
177         if (reclen > buf->count)                                                                    
178                 return -EINVAL;

> 
>>>> +	       goto out;
>>>>  	    }
>>>> +	  nb += new_reclen;
>>>>  
>>>> +	  dp->d_ino = kdp->d_ino;
>>>> +	  dp->d_off = last_offset = kdp->d_off;
>>>> +	  dp->d_reclen = new_reclen;
>>>> +	  dp->d_type = *((char *) kdp + kdp->d_reclen - 1);
>>>
>>> I think instead of reading through kdp, you should use char *s and
>>> memcpy, to avoid the aliasing violation, as discussed above.  Likewise
>>> for writing to dp.
>>
>> I think if we proper setting the buffer alignment there is no need to do it.
>> Also the problem of using memcpy here is for mips64n32 the size is *not* 
>> equal for dp and kdp, each would require an extra step as:
>>
>>    {
>>      typeof (kdp->d_ino) kino;
>>      memcpy (&kino, &kdp_d->ino, sizeof (kino));
>>      typeof (dp->d_ino) dino = kino;
>>      memcpy (&dp->d_ino, &kino, sizeof (dino));
>>    }
> 
> I think that's just the price of writing correct C.  It's also what the
> kernel does.

Ack.

> 
> I don't even think there's a requirement that the byte buffer passed to
> getdents64 has any kind of alignment.
> 
> Thanks,
> Florian
> 

Updated patch below.

--

This patch changes how the fallback getdents64 implementation calls
non-LFS getdents by replacing the scratch_buffer with static buffer
plus a loop on getdents calls.  This avoids the potential malloc
call on scratch_buffer_set_array_size for large input buffer size
at the cost of more getdents syscalls.

It also adds a small optimization for older kernels, where the first
ENOSYS failure for getdents64 disable subsequent calls.

Check the dirent tests on a mips64-linux-gnu with getdents64 code
disabled.

	* sysdeps/unix/sysv/linux/mips/mips64/getdents64.c (__getdents64):
	Add small optimization for older kernel to avoid issuing
	__NR_getdents64 on each call and replace scratch_buffer usage with
	a static allocated buffer.
---
 .../unix/sysv/linux/mips/mips64/getdents64.c  | 133 ++++++++++--------
 1 file changed, 76 insertions(+), 57 deletions(-)

[v2,1/5] mips: Do not malloc on getdents64 fallback

Commit Message

Comments

Patch