Message ID | 5657C135.1090807@redhat.com |
---|---|
State | Dropped |
Headers |
Received: (qmail 69694 invoked by alias); 27 Nov 2015 02:34:34 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: <libc-alpha.sourceware.org> List-Unsubscribe: <mailto:libc-alpha-unsubscribe-##L=##H@sourceware.org> List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org> List-Archive: <http://sourceware.org/ml/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs> Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 69179 invoked by uid 89); 27 Nov 2015 02:34:33 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.8 required=5.0 tests=AWL, BAYES_00, RP_MATCHES_RCVD, SPF_HELO_PASS autolearn=ham version=3.3.2 X-HELO: mx1.redhat.com To: GNU C Library <libc-alpha@sourceware.org> From: "Carlos O'Donell" <carlos@redhat.com> Subject: [RFC] Calling realloc vs. mallo/memcpy/free and _IO_vasprintf. X-Enigmail-Draft-Status: N1110 Message-ID: <5657C135.1090807@redhat.com> Date: Thu, 26 Nov 2015 21:34:29 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit |
Commit Message
Carlos O'Donell
Nov. 27, 2015, 2:34 a.m. UTC
While answering a question on libc-help about asprintf I happened to look at _IO_vasprintf and noticed some code which tries to decide if it should do a realloc vs. malloc/memcpy/free. In this particular case we should always be shrinking the buffer. For example _IO_vfprintf may have expanded the stream buffer but now no longer needs the extra space. If we are growing the buffer then something is very very wrong since that means _IO_vpfrintf will have failed due to lack of memory, and we handled that case earlier in _IO_vasprintf. So again, if we are only shrinking the buffer, then realloc has to be the most optimal operation in all cases. It would seem to me that an allocation shrink like this would almost never result in an allocation move, but rather malloc would just split the allocation and reuse the remainder for other malloc calls. So why the micro-optimization? Is my assumption of always shrinking the buffer wrong? If my assumption is right, and my logic is right, wouldn't the following patch be a natural cleanup? --- Cheers, Carlos.
Comments
On Thu, Nov 26, 2015 at 09:34:29PM -0500, Carlos O'Donell wrote: > While answering a question on libc-help about asprintf I happened > to look at _IO_vasprintf and noticed some code which tries to > decide if it should do a realloc vs. malloc/memcpy/free. > > In this particular case we should always be shrinking the buffer. > For example _IO_vfprintf may have expanded the stream buffer but > now no longer needs the extra space. If we are growing the buffer > then something is very very wrong since that means _IO_vpfrintf > will have failed due to lack of memory, and we handled that case > earlier in _IO_vasprintf. > > So again, if we are only shrinking the buffer, then realloc has > to be the most optimal operation in all cases. It would seem to > me that an allocation shrink like this would almost never result > in an allocation move, but rather malloc would just split the > allocation and reuse the remainder for other malloc calls. > > So why the micro-optimization? > > Is my assumption of always shrinking the buffer wrong? > > If my assumption is right, and my logic is right, wouldn't the > following patch be a natural cleanup? Why would malloc+memcpy+free be faster than a realloc, which does the same thing internally, but without adding the extra PLT and avoiding memcpy for smaller copies? I'd say just drop that bit and stick to realloc. Siddhesh
On 11/26/2015 10:14 PM, Siddhesh Poyarekar wrote: > On Thu, Nov 26, 2015 at 09:34:29PM -0500, Carlos O'Donell wrote: >> While answering a question on libc-help about asprintf I happened >> to look at _IO_vasprintf and noticed some code which tries to >> decide if it should do a realloc vs. malloc/memcpy/free. >> >> In this particular case we should always be shrinking the buffer. >> For example _IO_vfprintf may have expanded the stream buffer but >> now no longer needs the extra space. If we are growing the buffer >> then something is very very wrong since that means _IO_vpfrintf >> will have failed due to lack of memory, and we handled that case >> earlier in _IO_vasprintf. >> >> So again, if we are only shrinking the buffer, then realloc has >> to be the most optimal operation in all cases. It would seem to >> me that an allocation shrink like this would almost never result >> in an allocation move, but rather malloc would just split the >> allocation and reuse the remainder for other malloc calls. >> >> So why the micro-optimization? >> >> Is my assumption of always shrinking the buffer wrong? >> >> If my assumption is right, and my logic is right, wouldn't the >> following patch be a natural cleanup? > > Why would malloc+memcpy+free be faster than a realloc, which does the > same thing internally, but without adding the extra PLT and avoiding > memcpy for smaller copies? I'd say just drop that bit and stick to > realloc. Agreed. It appears to me that _int_realloc always splits the allocated region, and frees the remainder if >= MINSIZE (non-mmaped case), which is a faster operation. In the mmaped case we call mremap which will shrink the mapping, which is a syscall, and in that case *might* be slower tahn malloc/memcpy/free due to the cost of entering the kernel and the work needing to be done. However, I would argue that malloc/memcpy/free in the mean performance case is likely to be a lot more work than just realloc. Cheers, Carlos.
On 11/26/2015 10:14 PM, Siddhesh Poyarekar wrote: > On Thu, Nov 26, 2015 at 09:34:29PM -0500, Carlos O'Donell wrote: >> While answering a question on libc-help about asprintf I happened >> to look at _IO_vasprintf and noticed some code which tries to >> decide if it should do a realloc vs. malloc/memcpy/free. >> >> In this particular case we should always be shrinking the buffer. >> For example _IO_vfprintf may have expanded the stream buffer but >> now no longer needs the extra space. If we are growing the buffer >> then something is very very wrong since that means _IO_vpfrintf >> will have failed due to lack of memory, and we handled that case >> earlier in _IO_vasprintf. >> >> So again, if we are only shrinking the buffer, then realloc has >> to be the most optimal operation in all cases. It would seem to >> me that an allocation shrink like this would almost never result >> in an allocation move, but rather malloc would just split the >> allocation and reuse the remainder for other malloc calls. >> >> So why the micro-optimization? >> >> Is my assumption of always shrinking the buffer wrong? >> >> If my assumption is right, and my logic is right, wouldn't the >> following patch be a natural cleanup? > > Why would malloc+memcpy+free be faster than a realloc, which does the > same thing internally, but without adding the extra PLT and avoiding > memcpy for smaller copies? I'd say just drop that bit and stick to > realloc. Just to verify my assertion about the buffer always being shrunk, I added: /* The amount the stream has allocated is always >= what we needed. */ assert ((sf._sbf._f._IO_write_end - sf._sbf._f._IO_write_base) >= needed); And it triggers in the testsuite: (gdb) p sf._sbf._f._IO_write_end $2 = 0x9c5054 "" (gdb) p sf._sbf._f._IO_write_base $3 = 0x9c4ff0 "fma_downward (0x3.", '0' <repeats 12 times>, "1ffcp+10132, -0x2.", '0' <repeats 15 times>, "4p+4432, 0x1.80e000000003fffep+14516)" (gdb) p needed $4 = 101 (gdb) p sf._sbf._f._IO_write_end - sf._sbf._f._IO_write_base $5 = 100 (gdb) It's a case where the initial working buffer is exactly equal to what's needed minus the NULL, and because we have to return a null-terminated string we must grow the buffer by 1 byte. I still don't see how realloc is going to be slower since it's just one call, and there *might* be enough space in this chunk or the next to grow by one byte and that avoids the memcpy. Cheers, Carlos.
On Fri, Nov 27, 2015 at 12:38:45AM -0500, Carlos O'Donell wrote: > I still don't see how realloc is going to be slower since it's just > one call, and there *might* be enough space in this chunk or the next > to grow by one byte and that avoids the memcpy. Agreed, in fact even in the mremap case you mentioned, the malloc+memcpy+free ought to be slower because you end up calling two syscalls (mmap followed by munmap) instead of just mremap in case of realloc. I don't think there is a situation where realloc calls mremap and the malloc+memcpy+free combination allocates on the heap. Siddhesh
On 11/27/2015 12:43 AM, Siddhesh Poyarekar wrote: > On Fri, Nov 27, 2015 at 12:38:45AM -0500, Carlos O'Donell wrote: >> I still don't see how realloc is going to be slower since it's just >> one call, and there *might* be enough space in this chunk or the next >> to grow by one byte and that avoids the memcpy. > > Agreed, in fact even in the mremap case you mentioned, the > malloc+memcpy+free ought to be slower because you end up calling two > syscalls (mmap followed by munmap) instead of just mremap in case of > realloc. I don't think there is a situation where realloc calls > mremap and the malloc+memcpy+free combination allocates on the heap. The *only* interesting case is: What if we don't realloc a too-large buffer? What if we kept the too-large buffer, null-terminated the string, and returned that to the caller? You are trading off size of allocated buffer versus realloc cost. My reading of vfprintf.c shows that we malloc rather conservatively to grow the working buffer as we memcpy (xsputn) the results into place. Therefore the working buffer shouldn't be that much larger than what's needed, and that means we might just get away with avoiding the realloc in the case the buffer is more than 1 byte (for NULL) larger than required? So in 2000 (15 years ago), Ulrich added this code, it previously did an unconditional realloc as we suggest today. The comment was: commit 4100c5a0e62c461adf2757a163dafbbc02ca3fc4 Author: Ulrich Drepper <drepper@redhat.com> Date: Tue Apr 11 17:53:41 2000 +0000 (_IO_vasprintf): Try to avoid memory fragmentation by allocating new memory at the end instead of reallocating. I agree that realloc has the potential to fragment the arena, but we have no tools to measure this. If we are going to micro-optimize to reduce heap fragmentation we are going to need a lot more instrumentation in malloc. Thoughts? Cheers, Carlos.
On Thu, Nov 26, 2015 at 09:34:29PM -0500, Carlos O'Donell wrote: > While answering a question on libc-help about asprintf I happened > to look at _IO_vasprintf and noticed some code which tries to > decide if it should do a realloc vs. malloc/memcpy/free. > > In this particular case we should always be shrinking the buffer. > For example _IO_vfprintf may have expanded the stream buffer but > now no longer needs the extra space. If we are growing the buffer > then something is very very wrong since that means _IO_vpfrintf > will have failed due to lack of memory, and we handled that case > earlier in _IO_vasprintf. > > So again, if we are only shrinking the buffer, then realloc has > to be the most optimal operation in all cases. It would seem to > me that an allocation shrink like this would almost never result > in an allocation move, but rather malloc would just split the > allocation and reuse the remainder for other malloc calls. > > So why the micro-optimization? > > Is my assumption of always shrinking the buffer wrong? I suspect the aim is not optimization but rather to prevent fragmentation from leaving a small allocation behind in an area carved out of large free space. Whether the code is effective at doing this, I'm not sure. Rich
On 11/27/2015 10:45 AM, Rich Felker wrote: > On Thu, Nov 26, 2015 at 09:34:29PM -0500, Carlos O'Donell wrote: >> While answering a question on libc-help about asprintf I happened >> to look at _IO_vasprintf and noticed some code which tries to >> decide if it should do a realloc vs. malloc/memcpy/free. >> >> In this particular case we should always be shrinking the buffer. >> For example _IO_vfprintf may have expanded the stream buffer but >> now no longer needs the extra space. If we are growing the buffer >> then something is very very wrong since that means _IO_vpfrintf >> will have failed due to lack of memory, and we handled that case >> earlier in _IO_vasprintf. >> >> So again, if we are only shrinking the buffer, then realloc has >> to be the most optimal operation in all cases. It would seem to >> me that an allocation shrink like this would almost never result >> in an allocation move, but rather malloc would just split the >> allocation and reuse the remainder for other malloc calls. >> >> So why the micro-optimization? >> >> Is my assumption of always shrinking the buffer wrong? > > I suspect the aim is not optimization but rather to prevent > fragmentation from leaving a small allocation behind in an area carved > out of large free space. Whether the code is effective at doing this, > I'm not sure. As I mention here in the final patch: https://www.sourceware.org/ml/libc-alpha/2015-11/msg00607.html My preference is for realloc to be doing this kind of work by making a decision to split or move based on some kind of fragmentation estimate (which it carries as part of the arena information perhaps). I don't know that the caller is really the best place to put this heuristic. Thoughts? Cheers, Carlos.
On Fri, Nov 27, 2015 at 11:07:12AM -0500, Carlos O'Donell wrote: > On 11/27/2015 10:45 AM, Rich Felker wrote: > > On Thu, Nov 26, 2015 at 09:34:29PM -0500, Carlos O'Donell wrote: > >> While answering a question on libc-help about asprintf I happened > >> to look at _IO_vasprintf and noticed some code which tries to > >> decide if it should do a realloc vs. malloc/memcpy/free. > >> > >> In this particular case we should always be shrinking the buffer. > >> For example _IO_vfprintf may have expanded the stream buffer but > >> now no longer needs the extra space. If we are growing the buffer > >> then something is very very wrong since that means _IO_vpfrintf > >> will have failed due to lack of memory, and we handled that case > >> earlier in _IO_vasprintf. > >> > >> So again, if we are only shrinking the buffer, then realloc has > >> to be the most optimal operation in all cases. It would seem to > >> me that an allocation shrink like this would almost never result > >> in an allocation move, but rather malloc would just split the > >> allocation and reuse the remainder for other malloc calls. > >> > >> So why the micro-optimization? > >> > >> Is my assumption of always shrinking the buffer wrong? > > > > I suspect the aim is not optimization but rather to prevent > > fragmentation from leaving a small allocation behind in an area carved > > out of large free space. Whether the code is effective at doing this, > > I'm not sure. > > As I mention here in the final patch: > https://www.sourceware.org/ml/libc-alpha/2015-11/msg00607.html > > My preference is for realloc to be doing this kind of work by making > a decision to split or move based on some kind of fragmentation > estimate (which it carries as part of the arena information perhaps). > > I don't know that the caller is really the best place to put this > heuristic. > > Thoughts? I'm not clear on which is best. The caller has more knowledge that the allocator implementation fundamentally cannot have, like knowing whether it will pass the object to realloc again to enlarge it again, etc. and there's also the issue that even if glibc's realloc is smart about this, replacement allocator implementations (which glibc allows) may not be. Personally, I try to avoid using realloc for any purposes except enlarging existing allocations either to their now-known final size or as part of an exponential growth strategy that bounds the number of memcpy operations that might be needed. Rich
diff --git a/libio/vasprintf.c b/libio/vasprintf.c index 61cdfdd..fff8c54 100644 --- a/libio/vasprintf.c +++ b/libio/vasprintf.c @@ -62,24 +62,13 @@ _IO_vasprintf (char **result_ptr, const char *format, _IO_va_list args) free (sf._sbf._f._IO_buf_base); return ret; } - /* Only use realloc if the size we need is of the same (binary) - order of magnitude then the memory we allocated. */ needed = sf._sbf._f._IO_write_ptr - sf._sbf._f._IO_write_base + 1; allocated = sf._sbf._f._IO_write_end - sf._sbf._f._IO_write_base; - if ((allocated >> 1) <= needed) - *result_ptr = (char *) realloc (sf._sbf._f._IO_buf_base, needed); - else - { - *result_ptr = (char *) malloc (needed); - if (*result_ptr != NULL) - { - memcpy (*result_ptr, sf._sbf._f._IO_buf_base, needed - 1); - free (sf._sbf._f._IO_buf_base); - } - else - /* We have no choice, use the buffer we already have. */ - *result_ptr = (char *) realloc (sf._sbf._f._IO_buf_base, needed); - } + /* We are either shrinking the buffer or doing nothing, otherwise + _IO_vfprintf would have failed itself being unable to grow the + buffer to the needed size. Therefore it's always fastest here to + call realloc. */ + *result_ptr = (char *) realloc (sf._sbf._f._IO_buf_base, needed); if (*result_ptr == NULL) *result_ptr = sf._sbf._f._IO_buf_base; (*result_ptr)[needed - 1] = '\0';