From patchwork Mon Sep 2 17:38:42 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 34373 Received: (qmail 64584 invoked by alias); 2 Sep 2019 17:38:50 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 64560 invoked by uid 89); 2 Sep 2019 17:38:50 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-23.2 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE, SPF_PASS autolearn=ham version=3.3.1 spammy=ack, H*i:sk:87zhjmv, H*f:sk:87zhjmv, price X-HELO: mail-qk1-f194.google.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=to:cc:references:from:openpgp:autocrypt:subject:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=vwghkPrXASZ5iTBTqKbKnYX7wL3Ejn5T9ESzDVTECCg=; b=YeSKkzNRzK0HHehWslSt40wRI9KgnMEaUEZJb9jEPk/MLKyPWDaI3GuabuGGG3rxQb a6QM0fvGvIZyQ+AbYKfvL8PpayXwZ42/jY7EzIG+oMx8DptUDSmObPVeSSo9pkjyALNm CUcjoMv+qPWVQqEk7KsoRhmm0fKGxjWPOnNMlrIM0wVESWw8pmqyOZ7UNKimFn1LTm1E lZciewJx4FSbiY2PKahSpNdBpKgarj9o2SYNtrQu+mgTRW435leG7S/CfZ6Lcq53IfMR TWSRx/lEjJ2AojvleDsu4ifneTifjfXinX57Q273ojNNcVObk0F6o1hHLJwKXAO5suOI dA9g== Return-Path: To: Florian Weimer Cc: libc-alpha@sourceware.org References: <20190731183136.21545-1-adhemerval.zanella@linaro.org> <87k1av6kuh.fsf@oldenburg2.str.redhat.com> <49dc9440-d9a5-112f-9ec6-b62f4c0bcdf2@linaro.org> <87zhjmvoow.fsf@oldenburg2.str.redhat.com> From: Adhemerval Zanella Openpgp: preference=signencrypt Subject: Re: [PATCH v2 1/5] mips: Do not malloc on getdents64 fallback Message-ID: <30395e41-d6c9-4fce-ca7a-933d55900fc7@linaro.org> Date: Mon, 2 Sep 2019 14:38:42 -0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0 MIME-Version: 1.0 In-Reply-To: <87zhjmvoow.fsf@oldenburg2.str.redhat.com> On 02/09/2019 09:59, Florian Weimer wrote: > * Adhemerval Zanella: > >>> The choice of size needs a comment. I think the largest possible >>> practical length of the d_name member are 255 Unicode characters in the >>> BMP, in UTF-8 encoding, so d_name is 766 bytes long, plus 10 bytes from >>> the header, for 776 bytes total. (NAME_MAX is not a constant on Linux >>> in reality.) >> >> I picked the buffer as an arbitrary value, what about: >> >> /* The largest possible practical length of the d_name member are 255 >> Unicode characters in UTF-8 encoding, so d_name is 766 bytes long, plus >> 18 (mips64) / 10 (mips64n32) bytes from header, for total of 784 (mips64) >> / 776 (mips64n32) bytes total. Ensure that the minimum size hold at >> least one entry. */ >> enum { KBUF_SIZE = 1024 }; > > “holds” > > Looks good. > >>>> struct kernel_dirent >>>> + { >>>> + unsigned long d_ino; >>>> + unsigned long d_off; >>>> + unsigned short int d_reclen; >>>> + char d_name[1]; >>>> + } kbuf[KBUF_SIZE / sizeof (struct kernel_dirent)]; >>>> + size_t kbuf_size = nbytes < KBUF_SIZE ? nbytes : KBUF_SIZE; >>> >>> I would define kbuf as a char array, and perhaps leave out the d_name >>> member in struct kernel_dirent. You can copy out the struct >>> kernel_dirent using memcpy, which GCC should optimize away. >> >> I defined the buffer as 'struct kernel_dirent' to make it easier to align >> for the required fields. It allows simplify the access on the loop to >> avoid memcpy calls. > > But the code is invalid C as a result of this. We do not compile glibc > with -fno-strict-aliasing, after all. Right, we indeed do some pointer arithmetic to get the kdp value. I change to use char buffer plus memcpy to obtain the fields. > >>>> struct dirent64 *dp = (struct dirent64 *) buf; >>>> + >>>> + size_t nb = 0; >>>> + off64_t last_offset = -1; >>>> + >>>> + ssize_t r; >>>> + while ((r = INLINE_SYSCALL_CALL (getdents, fd, kbuf, kbuf_size)) > 0) >>>> { >>> >>> Sorry, I don't see how the outer loop is exited. I think we should >>> remove it because it does not seem necessary. >> >> We still need to handle the cases where NBYTES are larger than the temporary >> buffer, because it might require multiple getdents calls. > > Why? The application (or readdir) will call us again to get more entries. As a simple optimization to avoid it, but I think we can move to a simplified version. > >>> Is the length really correct, though? I'd expect it to grow by the >>> additional size of the d_ino and d_off members. I think it would be >>> best recompute it from scratch, using the actual length of d_name. >> >> It it because you are referencing to an older patch version, checking on >> mips64-n32 I adjusted to: >> >> const size_t size_diff = (offsetof (struct dirent64, d_name) >> - offsetof (struct kernel_dirent, d_name)); >> [...] >> size_t new_reclen = ALIGN_UP (kdp->d_reclen + size_diff, >> _Alignof (struct dirent64)); >> [...] > > Okay, this needs a comment that this is a conservative approximation > (some of size_diff might fit into the existing padding for alignment). Ack. > >>>> + if (nb + new_reclen > nbytes) >>>> { >>>> + /* The new entry will overflow the input buffer, rewind to >>>> + last obtained entry and return. */ >>>> + __lseek64 (fd, last_offset, SEEK_SET); >>> >>> I don't think last_offset is guaranteed to have been set with a proper >>> offset at this point. Given that d_name is essentially of unbounded >>> length, even expanding the first entry can cause failure. >>> >>> Maybe it's possible to avoid this corner case by limiting the amount of >>> data being read so that we know that the application-supplied buffer is >>> always large enough for any possible expansion. I think the worse-case >>> growth is for lengths 5 to 8, from 20 bytes to 32 bytes. So perhaps we >>> should divide the buffer size by 1.6 and use that? >> >> For this case I really think we just need to return an error to user: >> >> if (nb + new_reclen > nbytes) >> { >> /* Entry is too large for the static buffer. */ > > Fixed-size buffer, it's not static. 8-) Ack. > >> if (last_offset == -1) >> { >> __set_errno (EINVAL); >> return -1; >> } >> /* The new entry will overflow the input buffer, rewind to >> last obtained entry and return. */ >> __lseek64 (fd, last_offset, SEEK_SET); >> goto out; >> } >> >> Again I see this fallback code as a best-effort since we are emulating the >> syscall with additional restraints. Most of time glibc tries to play smart >> emulating a syscall ended in a lot of headaches... > > I don't disagree. > > Which error code does the kernel return if no entry can be read at all? > We should mirror that. It returns -1/EINVAL. fs/readdir.c 177 if (reclen > buf->count) 178 return -EINVAL; > >>>> + goto out; >>>> } >>>> + nb += new_reclen; >>>> >>>> + dp->d_ino = kdp->d_ino; >>>> + dp->d_off = last_offset = kdp->d_off; >>>> + dp->d_reclen = new_reclen; >>>> + dp->d_type = *((char *) kdp + kdp->d_reclen - 1); >>> >>> I think instead of reading through kdp, you should use char *s and >>> memcpy, to avoid the aliasing violation, as discussed above. Likewise >>> for writing to dp. >> >> I think if we proper setting the buffer alignment there is no need to do it. >> Also the problem of using memcpy here is for mips64n32 the size is *not* >> equal for dp and kdp, each would require an extra step as: >> >> { >> typeof (kdp->d_ino) kino; >> memcpy (&kino, &kdp_d->ino, sizeof (kino)); >> typeof (dp->d_ino) dino = kino; >> memcpy (&dp->d_ino, &kino, sizeof (dino)); >> } > > I think that's just the price of writing correct C. It's also what the > kernel does. Ack. > > I don't even think there's a requirement that the byte buffer passed to > getdents64 has any kind of alignment. > > Thanks, > Florian > Updated patch below. --- This patch changes how the fallback getdents64 implementation calls non-LFS getdents by replacing the scratch_buffer with static buffer plus a loop on getdents calls. This avoids the potential malloc call on scratch_buffer_set_array_size for large input buffer size at the cost of more getdents syscalls. It also adds a small optimization for older kernels, where the first ENOSYS failure for getdents64 disable subsequent calls. Check the dirent tests on a mips64-linux-gnu with getdents64 code disabled. * sysdeps/unix/sysv/linux/mips/mips64/getdents64.c (__getdents64): Add small optimization for older kernel to avoid issuing __NR_getdents64 on each call and replace scratch_buffer usage with a static allocated buffer. --- .../unix/sysv/linux/mips/mips64/getdents64.c | 133 ++++++++++-------- 1 file changed, 76 insertions(+), 57 deletions(-) diff --git a/sysdeps/unix/sysv/linux/mips/mips64/getdents64.c b/sysdeps/unix/sysv/linux/mips/mips64/getdents64.c index 8bf3abb0e0..02e15a0b2e 100644 --- a/sysdeps/unix/sysv/linux/mips/mips64/getdents64.c +++ b/sysdeps/unix/sysv/linux/mips/mips64/getdents64.c @@ -22,88 +22,108 @@ #include #include #include -#include #include +#include + ssize_t -__getdents64 (int fd, void *buf0, size_t nbytes) +__getdents64 (int fd, void *buf, size_t nbytes) { - char *buf = buf0; - /* The system call takes an unsigned int argument, and some length checks in the kernel use an int type. */ if (nbytes > INT_MAX) nbytes = INT_MAX; #ifdef __NR_getdents64 - ssize_t ret = INLINE_SYSCALL_CALL (getdents64, fd, buf, nbytes); - if (ret != -1) - return ret; + static int getdents64_supported = true; + if (atomic_load_relaxed (&getdents64_supported)) + { + ssize_t ret = INLINE_SYSCALL_CALL (getdents64, fd, buf, nbytes); + if (ret >= 0 || errno != ENOSYS) + return ret; + + atomic_store_relaxed (&getdents64_supported, false); + } #endif /* Unfortunately getdents64 was only wire-up for MIPS n64 on Linux 3.10. - If syscall is not available it need to fallback to non-LFS one. */ + If the syscall is not available it need to fallback to the non-LFS one. + Also to avoid an unbounded allocation through VLA/alloca or malloc (which + would make the syscall non async-signal-safe) it uses a limited buffer. + This is sub-optimal for large NBYTES, however this is a fallback + mechanism to emulate a syscall that kernel should provide. */ struct kernel_dirent - { - unsigned long d_ino; - unsigned long d_off; - unsigned short int d_reclen; - char d_name[256]; - }; + { +#if _MIPS_SIM == _ABI64 + uint64_t d_ino; + uint64_t d_off; +#else + uint32_t d_ino; + uint32_t d_off; +#endif + unsigned short int d_reclen; + char d_name[1]; + }; + + /* The largest possible practical length of the d_name member are 255 + Unicode characters in UTF-8 encoding, so d_name is 766 bytes long, plus + 18 (mips64) / 10 (mips64n32) bytes from header, for total of 784 (mips64) + / 776 (mips64n32) bytes total. Ensure that the minimum size holds at + least one entry. */ + enum { KBUF_SIZE = 1024 }; + char kbuf[KBUF_SIZE]; + size_t kbuf_size = nbytes < KBUF_SIZE ? nbytes : KBUF_SIZE; const size_t size_diff = (offsetof (struct dirent64, d_name) - - offsetof (struct kernel_dirent, d_name)); + - offsetof (struct kernel_dirent, d_name)); - size_t red_nbytes = MIN (nbytes - - ((nbytes / (offsetof (struct dirent64, d_name) - + 14)) * size_diff), - nbytes - size_diff); + struct dirent64 *dp = (struct dirent64 *) buf; - struct scratch_buffer tmpbuf; - scratch_buffer_init (&tmpbuf); - if (!scratch_buffer_set_array_size (&tmpbuf, red_nbytes, sizeof (uint8_t))) - INLINE_SYSCALL_ERROR_RETURN_VALUE (ENOMEM); + size_t nb = 0; + off64_t last_offset = -1; - struct kernel_dirent *skdp, *kdp; - skdp = kdp = tmpbuf.data; + ssize_t r = INLINE_SYSCALL_CALL (getdents, fd, kbuf, kbuf_size); + if (r <= 0) + return r; - ssize_t retval = INLINE_SYSCALL_CALL (getdents, fd, kdp, red_nbytes); - if (retval == -1) - { - scratch_buffer_free (&tmpbuf); - return -1; - } + struct kernel_dirent *skdp, *kdp; + skdp = kdp = (struct kernel_dirent *) kbuf; - off64_t last_offset = -1; - struct dirent64 *dp = (struct dirent64 *) buf; - while ((char *) kdp < (char *) skdp + retval) + while ((char *) kdp < (char *) skdp + r) { - const size_t alignment = _Alignof (struct dirent64); - /* Since kdp->d_reclen is already aligned for the kernel structure - this may compute a value that is bigger than necessary. */ - size_t new_reclen = ((kdp->d_reclen + size_diff + alignment - 1) - & ~(alignment - 1)); - if ((char *) dp + new_reclen > buf + nbytes) - { - /* Our heuristic failed. We read too many entries. Reset - the stream. */ - assert (last_offset != -1); - __lseek64 (fd, last_offset, SEEK_SET); - - if ((char *) dp == buf) + /* This is a conservative approximation, since some of size_diff might + fit into the existing padding for alignment. */ + size_t new_reclen = ALIGN_UP (kdp->d_reclen + size_diff, + _Alignof (struct dirent64)); + if (nb + new_reclen > nbytes) + { + /* Entry is too large for the fixed-size buffer. */ + if (last_offset == -1) { - scratch_buffer_free (&tmpbuf); - return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL); + __set_errno (EINVAL); + return -1; } - break; + /* The new entry will overflow the input buffer, rewind to last + obtained entry and return. */ + __lseek64 (fd, last_offset, SEEK_SET); + return (char *) dp - (char *) buf; } - - last_offset = kdp->d_off; - dp->d_ino = kdp->d_ino; - dp->d_off = kdp->d_off; - dp->d_reclen = new_reclen; + nb += new_reclen; + +#define copy_field(dst, src) \ + ({ \ + typeof (src) _src; \ + memcpy (&_src, &(src), sizeof (src)); \ + typeof (dst) _dst = _src; \ + memcpy (&(dst), &_dst, sizeof (dst)); \ + }) + + copy_field (dp->d_ino, kdp->d_ino); + copy_field (dp->d_off, kdp->d_off); + copy_field (last_offset, kdp->d_off); + copy_field (dp->d_reclen, new_reclen); dp->d_type = *((char *) kdp + kdp->d_reclen - 1); memcpy (dp->d_name, kdp->d_name, kdp->d_reclen - offsetof (struct kernel_dirent, d_name)); @@ -112,8 +132,7 @@ __getdents64 (int fd, void *buf0, size_t nbytes) kdp = (struct kernel_dirent *) (((char *) kdp) + kdp->d_reclen); } - scratch_buffer_free (&tmpbuf); - return (char *) dp - buf; + return (char *) dp - (char *) buf; } libc_hidden_def (__getdents64) weak_alias (__getdents64, getdents64)