Submitter | Adhemerval Zanella Netto |
---|---|
Date | Sept. 2, 2019, 5:38 p.m. |
Message ID | <30395e41-d6c9-4fce-ca7a-933d55900fc7@linaro.org> |
Download | mbox | patch |
Permalink | /patch/34373/ |
State | New |
Headers | show |
Comments
Ping. On 02/09/2019 14:38, Adhemerval Zanella wrote: > > > On 02/09/2019 09:59, Florian Weimer wrote: >> * Adhemerval Zanella: >> >>>> The choice of size needs a comment. I think the largest possible >>>> practical length of the d_name member are 255 Unicode characters in the >>>> BMP, in UTF-8 encoding, so d_name is 766 bytes long, plus 10 bytes from >>>> the header, for 776 bytes total. (NAME_MAX is not a constant on Linux >>>> in reality.) >>> >>> I picked the buffer as an arbitrary value, what about: >>> >>> /* The largest possible practical length of the d_name member are 255 >>> Unicode characters in UTF-8 encoding, so d_name is 766 bytes long, plus >>> 18 (mips64) / 10 (mips64n32) bytes from header, for total of 784 (mips64) >>> / 776 (mips64n32) bytes total. Ensure that the minimum size hold at >>> least one entry. */ >>> enum { KBUF_SIZE = 1024 }; >> >> “holds” >> >> Looks good. >> >>>>> struct kernel_dirent >>>>> + { >>>>> + unsigned long d_ino; >>>>> + unsigned long d_off; >>>>> + unsigned short int d_reclen; >>>>> + char d_name[1]; >>>>> + } kbuf[KBUF_SIZE / sizeof (struct kernel_dirent)]; >>>>> + size_t kbuf_size = nbytes < KBUF_SIZE ? nbytes : KBUF_SIZE; >>>> >>>> I would define kbuf as a char array, and perhaps leave out the d_name >>>> member in struct kernel_dirent. You can copy out the struct >>>> kernel_dirent using memcpy, which GCC should optimize away. >>> >>> I defined the buffer as 'struct kernel_dirent' to make it easier to align >>> for the required fields. It allows simplify the access on the loop to >>> avoid memcpy calls. >> >> But the code is invalid C as a result of this. We do not compile glibc >> with -fno-strict-aliasing, after all. > > Right, we indeed do some pointer arithmetic to get the kdp value. I change > to use char buffer plus memcpy to obtain the fields. > >> >>>>> struct dirent64 *dp = (struct dirent64 *) buf; >>>>> + >>>>> + size_t nb = 0; >>>>> + off64_t last_offset = -1; >>>>> + >>>>> + ssize_t r; >>>>> + while ((r = INLINE_SYSCALL_CALL (getdents, fd, kbuf, kbuf_size)) > 0) >>>>> { >>>> >>>> Sorry, I don't see how the outer loop is exited. I think we should >>>> remove it because it does not seem necessary. >>> >>> We still need to handle the cases where NBYTES are larger than the temporary >>> buffer, because it might require multiple getdents calls. >> >> Why? The application (or readdir) will call us again to get more entries. > > As a simple optimization to avoid it, but I think we can move to a simplified > version. > >> >>>> Is the length really correct, though? I'd expect it to grow by the >>>> additional size of the d_ino and d_off members. I think it would be >>>> best recompute it from scratch, using the actual length of d_name. >>> >>> It it because you are referencing to an older patch version, checking on >>> mips64-n32 I adjusted to: >>> >>> const size_t size_diff = (offsetof (struct dirent64, d_name) >>> - offsetof (struct kernel_dirent, d_name)); >>> [...] >>> size_t new_reclen = ALIGN_UP (kdp->d_reclen + size_diff, >>> _Alignof (struct dirent64)); >>> [...] >> >> Okay, this needs a comment that this is a conservative approximation >> (some of size_diff might fit into the existing padding for alignment). > > Ack. > >> >>>>> + if (nb + new_reclen > nbytes) >>>>> { >>>>> + /* The new entry will overflow the input buffer, rewind to >>>>> + last obtained entry and return. */ >>>>> + __lseek64 (fd, last_offset, SEEK_SET); >>>> >>>> I don't think last_offset is guaranteed to have been set with a proper >>>> offset at this point. Given that d_name is essentially of unbounded >>>> length, even expanding the first entry can cause failure. >>>> >>>> Maybe it's possible to avoid this corner case by limiting the amount of >>>> data being read so that we know that the application-supplied buffer is >>>> always large enough for any possible expansion. I think the worse-case >>>> growth is for lengths 5 to 8, from 20 bytes to 32 bytes. So perhaps we >>>> should divide the buffer size by 1.6 and use that? >>> >>> For this case I really think we just need to return an error to user: >>> >>> if (nb + new_reclen > nbytes) >>> { >>> /* Entry is too large for the static buffer. */ >> >> Fixed-size buffer, it's not static. 8-) > > Ack. > >> >>> if (last_offset == -1) >>> { >>> __set_errno (EINVAL); >>> return -1; >>> } >>> /* The new entry will overflow the input buffer, rewind to >>> last obtained entry and return. */ >>> __lseek64 (fd, last_offset, SEEK_SET); >>> goto out; >>> } >>> >>> Again I see this fallback code as a best-effort since we are emulating the >>> syscall with additional restraints. Most of time glibc tries to play smart >>> emulating a syscall ended in a lot of headaches... >> >> I don't disagree. >> >> Which error code does the kernel return if no entry can be read at all? >> We should mirror that. > > It returns -1/EINVAL. > > fs/readdir.c > 177 if (reclen > buf->count) > 178 return -EINVAL; > >> >>>>> + goto out; >>>>> } >>>>> + nb += new_reclen; >>>>> >>>>> + dp->d_ino = kdp->d_ino; >>>>> + dp->d_off = last_offset = kdp->d_off; >>>>> + dp->d_reclen = new_reclen; >>>>> + dp->d_type = *((char *) kdp + kdp->d_reclen - 1); >>>> >>>> I think instead of reading through kdp, you should use char *s and >>>> memcpy, to avoid the aliasing violation, as discussed above. Likewise >>>> for writing to dp. >>> >>> I think if we proper setting the buffer alignment there is no need to do it. >>> Also the problem of using memcpy here is for mips64n32 the size is *not* >>> equal for dp and kdp, each would require an extra step as: >>> >>> { >>> typeof (kdp->d_ino) kino; >>> memcpy (&kino, &kdp_d->ino, sizeof (kino)); >>> typeof (dp->d_ino) dino = kino; >>> memcpy (&dp->d_ino, &kino, sizeof (dino)); >>> } >> >> I think that's just the price of writing correct C. It's also what the >> kernel does. > > Ack. > >> >> I don't even think there's a requirement that the byte buffer passed to >> getdents64 has any kind of alignment. >> >> Thanks, >> Florian >> > > Updated patch below. > > -- > > This patch changes how the fallback getdents64 implementation calls > non-LFS getdents by replacing the scratch_buffer with static buffer > plus a loop on getdents calls. This avoids the potential malloc > call on scratch_buffer_set_array_size for large input buffer size > at the cost of more getdents syscalls. > > It also adds a small optimization for older kernels, where the first > ENOSYS failure for getdents64 disable subsequent calls. > > Check the dirent tests on a mips64-linux-gnu with getdents64 code > disabled. > > * sysdeps/unix/sysv/linux/mips/mips64/getdents64.c (__getdents64): > Add small optimization for older kernel to avoid issuing > __NR_getdents64 on each call and replace scratch_buffer usage with > a static allocated buffer. > --- > .../unix/sysv/linux/mips/mips64/getdents64.c | 133 ++++++++++-------- > 1 file changed, 76 insertions(+), 57 deletions(-) > > diff --git a/sysdeps/unix/sysv/linux/mips/mips64/getdents64.c b/sysdeps/unix/sysv/linux/mips/mips64/getdents64.c > index 8bf3abb0e0..02e15a0b2e 100644 > --- a/sysdeps/unix/sysv/linux/mips/mips64/getdents64.c > +++ b/sysdeps/unix/sysv/linux/mips/mips64/getdents64.c > @@ -22,88 +22,108 @@ > #include <assert.h> > #include <sys/param.h> > #include <unistd.h> > -#include <scratch_buffer.h> > #include <limits.h> > > +#include <include/libc-pointer-arith.h> > + > ssize_t > -__getdents64 (int fd, void *buf0, size_t nbytes) > +__getdents64 (int fd, void *buf, size_t nbytes) > { > - char *buf = buf0; > - > /* The system call takes an unsigned int argument, and some length > checks in the kernel use an int type. */ > if (nbytes > INT_MAX) > nbytes = INT_MAX; > > #ifdef __NR_getdents64 > - ssize_t ret = INLINE_SYSCALL_CALL (getdents64, fd, buf, nbytes); > - if (ret != -1) > - return ret; > + static int getdents64_supported = true; > + if (atomic_load_relaxed (&getdents64_supported)) > + { > + ssize_t ret = INLINE_SYSCALL_CALL (getdents64, fd, buf, nbytes); > + if (ret >= 0 || errno != ENOSYS) > + return ret; > + > + atomic_store_relaxed (&getdents64_supported, false); > + } > #endif > > /* Unfortunately getdents64 was only wire-up for MIPS n64 on Linux 3.10. > - If syscall is not available it need to fallback to non-LFS one. */ > + If the syscall is not available it need to fallback to the non-LFS one. > + Also to avoid an unbounded allocation through VLA/alloca or malloc (which > + would make the syscall non async-signal-safe) it uses a limited buffer. > + This is sub-optimal for large NBYTES, however this is a fallback > + mechanism to emulate a syscall that kernel should provide. */ > > struct kernel_dirent > - { > - unsigned long d_ino; > - unsigned long d_off; > - unsigned short int d_reclen; > - char d_name[256]; > - }; > + { > +#if _MIPS_SIM == _ABI64 > + uint64_t d_ino; > + uint64_t d_off; > +#else > + uint32_t d_ino; > + uint32_t d_off; > +#endif > + unsigned short int d_reclen; > + char d_name[1]; > + }; > + > + /* The largest possible practical length of the d_name member are 255 > + Unicode characters in UTF-8 encoding, so d_name is 766 bytes long, plus > + 18 (mips64) / 10 (mips64n32) bytes from header, for total of 784 (mips64) > + / 776 (mips64n32) bytes total. Ensure that the minimum size holds at > + least one entry. */ > + enum { KBUF_SIZE = 1024 }; > + char kbuf[KBUF_SIZE]; > + size_t kbuf_size = nbytes < KBUF_SIZE ? nbytes : KBUF_SIZE; > > const size_t size_diff = (offsetof (struct dirent64, d_name) > - - offsetof (struct kernel_dirent, d_name)); > + - offsetof (struct kernel_dirent, d_name)); > > - size_t red_nbytes = MIN (nbytes > - - ((nbytes / (offsetof (struct dirent64, d_name) > - + 14)) * size_diff), > - nbytes - size_diff); > + struct dirent64 *dp = (struct dirent64 *) buf; > > - struct scratch_buffer tmpbuf; > - scratch_buffer_init (&tmpbuf); > - if (!scratch_buffer_set_array_size (&tmpbuf, red_nbytes, sizeof (uint8_t))) > - INLINE_SYSCALL_ERROR_RETURN_VALUE (ENOMEM); > + size_t nb = 0; > + off64_t last_offset = -1; > > - struct kernel_dirent *skdp, *kdp; > - skdp = kdp = tmpbuf.data; > + ssize_t r = INLINE_SYSCALL_CALL (getdents, fd, kbuf, kbuf_size); > + if (r <= 0) > + return r; > > - ssize_t retval = INLINE_SYSCALL_CALL (getdents, fd, kdp, red_nbytes); > - if (retval == -1) > - { > - scratch_buffer_free (&tmpbuf); > - return -1; > - } > + struct kernel_dirent *skdp, *kdp; > + skdp = kdp = (struct kernel_dirent *) kbuf; > > - off64_t last_offset = -1; > - struct dirent64 *dp = (struct dirent64 *) buf; > - while ((char *) kdp < (char *) skdp + retval) > + while ((char *) kdp < (char *) skdp + r) > { > - const size_t alignment = _Alignof (struct dirent64); > - /* Since kdp->d_reclen is already aligned for the kernel structure > - this may compute a value that is bigger than necessary. */ > - size_t new_reclen = ((kdp->d_reclen + size_diff + alignment - 1) > - & ~(alignment - 1)); > - if ((char *) dp + new_reclen > buf + nbytes) > - { > - /* Our heuristic failed. We read too many entries. Reset > - the stream. */ > - assert (last_offset != -1); > - __lseek64 (fd, last_offset, SEEK_SET); > - > - if ((char *) dp == buf) > + /* This is a conservative approximation, since some of size_diff might > + fit into the existing padding for alignment. */ > + size_t new_reclen = ALIGN_UP (kdp->d_reclen + size_diff, > + _Alignof (struct dirent64)); > + if (nb + new_reclen > nbytes) > + { > + /* Entry is too large for the fixed-size buffer. */ > + if (last_offset == -1) > { > - scratch_buffer_free (&tmpbuf); > - return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL); > + __set_errno (EINVAL); > + return -1; > } > > - break; > + /* The new entry will overflow the input buffer, rewind to last > + obtained entry and return. */ > + __lseek64 (fd, last_offset, SEEK_SET); > + return (char *) dp - (char *) buf; > } > - > - last_offset = kdp->d_off; > - dp->d_ino = kdp->d_ino; > - dp->d_off = kdp->d_off; > - dp->d_reclen = new_reclen; > + nb += new_reclen; > + > +#define copy_field(dst, src) \ > + ({ \ > + typeof (src) _src; \ > + memcpy (&_src, &(src), sizeof (src)); \ > + typeof (dst) _dst = _src; \ > + memcpy (&(dst), &_dst, sizeof (dst)); \ > + }) > + > + copy_field (dp->d_ino, kdp->d_ino); > + copy_field (dp->d_off, kdp->d_off); > + copy_field (last_offset, kdp->d_off); > + copy_field (dp->d_reclen, new_reclen); > dp->d_type = *((char *) kdp + kdp->d_reclen - 1); > memcpy (dp->d_name, kdp->d_name, > kdp->d_reclen - offsetof (struct kernel_dirent, d_name)); > @@ -112,8 +132,7 @@ __getdents64 (int fd, void *buf0, size_t nbytes) > kdp = (struct kernel_dirent *) (((char *) kdp) + kdp->d_reclen); > } > > - scratch_buffer_free (&tmpbuf); > - return (char *) dp - buf; > + return (char *) dp - (char *) buf; > } > libc_hidden_def (__getdents64) > weak_alias (__getdents64, getdents64) >
* Adhemerval Zanella: > +#define copy_field(dst, src) \ > + ({ \ > + typeof (src) _src; \ > + memcpy (&_src, &(src), sizeof (src)); \ > + typeof (dst) _dst = _src; \ > + memcpy (&(dst), &_dst, sizeof (dst)); \ > + }) > + > + copy_field (dp->d_ino, kdp->d_ino); > + copy_field (dp->d_off, kdp->d_off); > + copy_field (last_offset, kdp->d_off); > + copy_field (dp->d_reclen, new_reclen); > dp->d_type = *((char *) kdp + kdp->d_reclen - 1); I believe this still asserts the dynamic type of *dp, which is not what we want. The truly portable way probably involves using offsetof and not -> dereferencing. 8-( Considering that, I would probably drop copy_field, compile the file with -fno-strict-aliasing, and add a comment to the (now plain assignments) that this is okay due to -fno-strict-aliasing. But this is really up to you, I do not want to discuss this patch to death. Thanks, Florian
Patch
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/getdents64.c b/sysdeps/unix/sysv/linux/mips/mips64/getdents64.c index 8bf3abb0e0..02e15a0b2e 100644 --- a/sysdeps/unix/sysv/linux/mips/mips64/getdents64.c +++ b/sysdeps/unix/sysv/linux/mips/mips64/getdents64.c @@ -22,88 +22,108 @@ #include <assert.h> #include <sys/param.h> #include <unistd.h> -#include <scratch_buffer.h> #include <limits.h> +#include <include/libc-pointer-arith.h> + ssize_t -__getdents64 (int fd, void *buf0, size_t nbytes) +__getdents64 (int fd, void *buf, size_t nbytes) { - char *buf = buf0; - /* The system call takes an unsigned int argument, and some length checks in the kernel use an int type. */ if (nbytes > INT_MAX) nbytes = INT_MAX; #ifdef __NR_getdents64 - ssize_t ret = INLINE_SYSCALL_CALL (getdents64, fd, buf, nbytes); - if (ret != -1) - return ret; + static int getdents64_supported = true; + if (atomic_load_relaxed (&getdents64_supported)) + { + ssize_t ret = INLINE_SYSCALL_CALL (getdents64, fd, buf, nbytes); + if (ret >= 0 || errno != ENOSYS) + return ret; + + atomic_store_relaxed (&getdents64_supported, false); + } #endif /* Unfortunately getdents64 was only wire-up for MIPS n64 on Linux 3.10. - If syscall is not available it need to fallback to non-LFS one. */ + If the syscall is not available it need to fallback to the non-LFS one. + Also to avoid an unbounded allocation through VLA/alloca or malloc (which + would make the syscall non async-signal-safe) it uses a limited buffer. + This is sub-optimal for large NBYTES, however this is a fallback + mechanism to emulate a syscall that kernel should provide. */ struct kernel_dirent - { - unsigned long d_ino; - unsigned long d_off; - unsigned short int d_reclen; - char d_name[256]; - }; + { +#if _MIPS_SIM == _ABI64 + uint64_t d_ino; + uint64_t d_off; +#else + uint32_t d_ino; + uint32_t d_off; +#endif + unsigned short int d_reclen; + char d_name[1]; + }; + + /* The largest possible practical length of the d_name member are 255 + Unicode characters in UTF-8 encoding, so d_name is 766 bytes long, plus + 18 (mips64) / 10 (mips64n32) bytes from header, for total of 784 (mips64) + / 776 (mips64n32) bytes total. Ensure that the minimum size holds at + least one entry. */ + enum { KBUF_SIZE = 1024 }; + char kbuf[KBUF_SIZE]; + size_t kbuf_size = nbytes < KBUF_SIZE ? nbytes : KBUF_SIZE; const size_t size_diff = (offsetof (struct dirent64, d_name) - - offsetof (struct kernel_dirent, d_name)); + - offsetof (struct kernel_dirent, d_name)); - size_t red_nbytes = MIN (nbytes - - ((nbytes / (offsetof (struct dirent64, d_name) - + 14)) * size_diff), - nbytes - size_diff); + struct dirent64 *dp = (struct dirent64 *) buf; - struct scratch_buffer tmpbuf; - scratch_buffer_init (&tmpbuf); - if (!scratch_buffer_set_array_size (&tmpbuf, red_nbytes, sizeof (uint8_t))) - INLINE_SYSCALL_ERROR_RETURN_VALUE (ENOMEM); + size_t nb = 0; + off64_t last_offset = -1; - struct kernel_dirent *skdp, *kdp; - skdp = kdp = tmpbuf.data; + ssize_t r = INLINE_SYSCALL_CALL (getdents, fd, kbuf, kbuf_size); + if (r <= 0) + return r; - ssize_t retval = INLINE_SYSCALL_CALL (getdents, fd, kdp, red_nbytes); - if (retval == -1) - { - scratch_buffer_free (&tmpbuf); - return -1; - } + struct kernel_dirent *skdp, *kdp; + skdp = kdp = (struct kernel_dirent *) kbuf; - off64_t last_offset = -1; - struct dirent64 *dp = (struct dirent64 *) buf; - while ((char *) kdp < (char *) skdp + retval) + while ((char *) kdp < (char *) skdp + r) { - const size_t alignment = _Alignof (struct dirent64); - /* Since kdp->d_reclen is already aligned for the kernel structure - this may compute a value that is bigger than necessary. */ - size_t new_reclen = ((kdp->d_reclen + size_diff + alignment - 1) - & ~(alignment - 1)); - if ((char *) dp + new_reclen > buf + nbytes) - { - /* Our heuristic failed. We read too many entries. Reset - the stream. */ - assert (last_offset != -1); - __lseek64 (fd, last_offset, SEEK_SET); - - if ((char *) dp == buf) + /* This is a conservative approximation, since some of size_diff might + fit into the existing padding for alignment. */ + size_t new_reclen = ALIGN_UP (kdp->d_reclen + size_diff, + _Alignof (struct dirent64)); + if (nb + new_reclen > nbytes) + { + /* Entry is too large for the fixed-size buffer. */ + if (last_offset == -1) { - scratch_buffer_free (&tmpbuf); - return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL); + __set_errno (EINVAL); + return -1; } - break; + /* The new entry will overflow the input buffer, rewind to last + obtained entry and return. */ + __lseek64 (fd, last_offset, SEEK_SET); + return (char *) dp - (char *) buf; } - - last_offset = kdp->d_off; - dp->d_ino = kdp->d_ino; - dp->d_off = kdp->d_off; - dp->d_reclen = new_reclen; + nb += new_reclen; + +#define copy_field(dst, src) \ + ({ \ + typeof (src) _src; \ + memcpy (&_src, &(src), sizeof (src)); \ + typeof (dst) _dst = _src; \ + memcpy (&(dst), &_dst, sizeof (dst)); \ + }) + + copy_field (dp->d_ino, kdp->d_ino); + copy_field (dp->d_off, kdp->d_off); + copy_field (last_offset, kdp->d_off); + copy_field (dp->d_reclen, new_reclen); dp->d_type = *((char *) kdp + kdp->d_reclen - 1); memcpy (dp->d_name, kdp->d_name, kdp->d_reclen - offsetof (struct kernel_dirent, d_name)); @@ -112,8 +132,7 @@ __getdents64 (int fd, void *buf0, size_t nbytes) kdp = (struct kernel_dirent *) (((char *) kdp) + kdp->d_reclen); } - scratch_buffer_free (&tmpbuf); - return (char *) dp - buf; + return (char *) dp - (char *) buf; } libc_hidden_def (__getdents64) weak_alias (__getdents64, getdents64)