Patchwork Introduce gdb::byte_vector, add allocator that default-initializes

login
register
mail settings
Submitter Pedro Alves
Date June 12, 2017, 5:07 p.m.
Message ID <1497287225-16542-1-git-send-email-palves@redhat.com>
Download mbox | patch
Permalink /patch/20962/
State New
Headers show

Comments

Pedro Alves - June 12, 2017, 5:07 p.m.
In some cases we've been replacing heap-allocated gdb_byte buffers
managed with xmalloc/make_cleanup(xfree) with gdb::vector<gdb_byte>.
That usually pessimizes the code a little bit because std::vector
value-initializes elements (which for gdb_byte means
zero-initialization), while if you're creating a temporary buffer,
you're most certaintly going to fill it in with some data.  An
alternative is to use

  unique_ptr<gdb_byte[]> buf (new gdb_byte[size]);

but it looks like that's not very popular.

Recently, a use of obstacks in dwarf2read.c was replaced with
std::vector<gdb_byte> and that as well introduced a pessimization for
always memsetting the buffer when it's garanteed that the zeros will
be overwritten immediately.  (see dwarf2read.c change in this patch to
find it.)

So here's a different take at addressing this issue "by design":

#1 - Introduce default_init_allocator<T>

I.e., a custom allocator that does default construction using default
initialization, meaning, no more zero initialization.  That's the
default_init_allocation<T> class added in this patch.

See "Notes" at
<http://en.cppreference.com/w/cpp/container/vector/resize>.

#2 - Introduce def_vector<T>

I.e., a convenience typedef, because typing the allocator is annoying:

  using def_vector<T> = std::vector<T, gdb::default_init_allocator<T>>;

#3 - Introduce byte_vector

Because gdb_byte vectors will be the common thing, add a convenience
"byte_vector" typedef:

  using byte_vector = def_vector<gdb_byte>;

which is really the same as:

  std::vector<gdb_byte, gdb::default_init_allocator<gdb_byte>>;

The intent then is to make "gdb::byte_vector" be the go-to for dynamic
byte buffers.  So the less friction, the better.

#4 - Adjust current code to use it.

To set the example going forward.  Replace std::vector uses and also
unique_ptr<byte[]> uses.

One nice thing is that with this allocator, for changes like these:

  -std::unique_ptr<byte[]> buf (new gdb_byte[some_size]);
  +gdb::byte_vector buf (some_size);
   fill_with_data (buf.data (), buf.size ());

the generated code is the exact same as before.  I.e., the compiler
de-structures the vector and gets rid of the unused "reserved vs size"
related fields.

The other nice thing is that it's easier to write
  gdb::byte_vector buf (size);
than
  std::unique_ptr<gdb_byte[]> buf (new gdb_byte[size]);
or even (C++14):
  auto buf = std::make_unique<gdb_byte[]> (size); // zero-initializes...

#5 - Suggest s/std::vector<gdb_byte>/gdb::byte_vector/ going forward.

Note that this patch actually fixes a couple of bugs where the current
code is incorrectly using "std::vector::reserve(new_size)" and then
accessing the vector's internal buffer beyond that vector's size: see
dwarf2loc.c and charset.c.  That's undefined behavior and may trigger
debug mode assertion failures.  With default_init_allocator,
"resize()" behaves like "reserve()" performance wise, in that it
leaves new elements with unspecified values, but, it does that safely
without triggering undefined behavior.

gdb/ChangeLog:
2017-06-12  Pedro Alves  <palves@redhat.com>

	* ada-lang.c: Include "common/byte-vector.h".
	(ada_value_primitive_packed_val): Use gdb::byte_vector.
	* charset.c (wchar_iterator::iterate): Resize the vector instead
	of reserving it.
	* common/byte-vector.h: Include "common/def-vector.h".
	(wchar_iterator::m_out): Now a gdb::def_vector<gdb_wchar_t>.
	* cli/cli-dump.c: Include "common/byte-vector.h".
	(dump_memory_to_file, restore_binary_file): Use gdb::byte_vector.
	* common/byte-vector.h: New file.
	* common/def-vector.h: New file.
	* common/default-init-alloc.h: New file.
	* dwarf2loc.c: Include "common/byte-vector.h".
	(read_pieced_value): Use gdb::byte_vector, and resize the vector
	instead of reserving it.
	* dwarf2read.c: Include "common/byte-vector.h".
	(data_buf::m_vec): Now a gdb::byte_vector.
	* gdb_regex.c: Include "common/def-vector.h".
	(compiled_regex::compiled_regex): Use gdb::def_vector<char>.
	* mi/mi-main.c: Include "common/byte-vector.h".
	(mi_cmd_data_read_memory): Use gdb::byte_vector.
---
 gdb/ada-lang.c                  | 16 +++++-----
 gdb/charset.c                   |  2 +-
 gdb/charset.h                   |  4 +--
 gdb/cli/cli-dump.c              | 16 +++++-----
 gdb/common/byte-vector.h        | 62 ++++++++++++++++++++++++++++++++++++++
 gdb/common/def-vector.h         | 36 ++++++++++++++++++++++
 gdb/common/default-init-alloc.h | 67 +++++++++++++++++++++++++++++++++++++++++
 gdb/dwarf2loc.c                 |  7 +++--
 gdb/dwarf2read.c                |  3 +-
 gdb/gdb_regex.c                 |  7 +++--
 gdb/mi/mi-main.c                |  5 +--
 11 files changed, 197 insertions(+), 28 deletions(-)
 create mode 100644 gdb/common/byte-vector.h
 create mode 100644 gdb/common/def-vector.h
 create mode 100644 gdb/common/default-init-alloc.h
Simon Marchi - June 12, 2017, 7:52 p.m.
On 2017-06-12 19:07, Pedro Alves wrote:
> In some cases we've been replacing heap-allocated gdb_byte buffers
> managed with xmalloc/make_cleanup(xfree) with gdb::vector<gdb_byte>.
> That usually pessimizes the code a little bit because std::vector
> value-initializes elements (which for gdb_byte means
> zero-initialization), while if you're creating a temporary buffer,
> you're most certaintly going to fill it in with some data.  An
> alternative is to use
> 
>   unique_ptr<gdb_byte[]> buf (new gdb_byte[size]);
> 
> but it looks like that's not very popular.
> 
> Recently, a use of obstacks in dwarf2read.c was replaced with
> std::vector<gdb_byte> and that as well introduced a pessimization for
> always memsetting the buffer when it's garanteed that the zeros will
> be overwritten immediately.  (see dwarf2read.c change in this patch to
> find it.)
> 
> So here's a different take at addressing this issue "by design":
> 
> #1 - Introduce default_init_allocator<T>
> 
> I.e., a custom allocator that does default construction using default
> initialization, meaning, no more zero initialization.  That's the
> default_init_allocation<T> class added in this patch.
> 
> See "Notes" at
> <http://en.cppreference.com/w/cpp/container/vector/resize>.
> 
> #2 - Introduce def_vector<T>
> 
> I.e., a convenience typedef, because typing the allocator is annoying:
> 
>   using def_vector<T> = std::vector<T, gdb::default_init_allocator<T>>;
> 
> #3 - Introduce byte_vector
> 
> Because gdb_byte vectors will be the common thing, add a convenience
> "byte_vector" typedef:
> 
>   using byte_vector = def_vector<gdb_byte>;
> 
> which is really the same as:
> 
>   std::vector<gdb_byte, gdb::default_init_allocator<gdb_byte>>;
> 
> The intent then is to make "gdb::byte_vector" be the go-to for dynamic
> byte buffers.  So the less friction, the better.
> 
> #4 - Adjust current code to use it.
> 
> To set the example going forward.  Replace std::vector uses and also
> unique_ptr<byte[]> uses.
> 
> One nice thing is that with this allocator, for changes like these:
> 
>   -std::unique_ptr<byte[]> buf (new gdb_byte[some_size]);
>   +gdb::byte_vector buf (some_size);
>    fill_with_data (buf.data (), buf.size ());
> 
> the generated code is the exact same as before.  I.e., the compiler
> de-structures the vector and gets rid of the unused "reserved vs size"
> related fields.

How come?

> The other nice thing is that it's easier to write
>   gdb::byte_vector buf (size);
> than
>   std::unique_ptr<gdb_byte[]> buf (new gdb_byte[size]);
> or even (C++14):
>   auto buf = std::make_unique<gdb_byte[]> (size); // 
> zero-initializes...
> 
> #5 - Suggest s/std::vector<gdb_byte>/gdb::byte_vector/ going forward.
> 
> Note that this patch actually fixes a couple of bugs where the current
> code is incorrectly using "std::vector::reserve(new_size)" and then
> accessing the vector's internal buffer beyond that vector's size: see
> dwarf2loc.c and charset.c.  That's undefined behavior and may trigger
> debug mode assertion failures.  With default_init_allocator,
> "resize()" behaves like "reserve()" performance wise, in that it
> leaves new elements with unspecified values, but, it does that safely
> without triggering undefined behavior.

I don't really understand default-init-alloc.h, but the rest of the 
patch looks good to me.

Simon
Pedro Alves - June 13, 2017, 11:04 p.m.
On 06/12/2017 08:52 PM, Simon Marchi wrote:
> On 2017-06-12 19:07, Pedro Alves wrote:

>> One nice thing is that with this allocator, for changes like these:
>>
>>   -std::unique_ptr<byte[]> buf (new gdb_byte[some_size]);
>>   +gdb::byte_vector buf (some_size);
>>    fill_with_data (buf.data (), buf.size ());
>>
>> the generated code is the exact same as before.  I.e., the compiler
>> de-structures the vector and gets rid of the unused "reserved vs size"
>> related fields.
> 
> How come?

I mean that even though std::vector is a struct/aggregate, the
compiler does "scalar replacement of aggregates", constant
propagation, etc., and then all that's left in the generated code is a
pointer to the buffer returned by the new call that "operator new()"
returns.  Even though sizeof(buf) is 24 on libstdc++ there are no
traces of a real "buf" tuple in memory.

E.g., with:

extern void fill_with_data (gdb_byte *buf, size_t len);

void foo_uniq ()
{
  std::unique_ptr<gdb_byte[]> buf (new gdb_byte [100]);

  fill_with_data (buf.get (), 100);
}

void foo_def_vec ()
{
  gdb::byte_vector buf (100);

  fill_with_data (buf.data (), 100);
}

We get, with gcc 7, -O2:

Dump of assembler code for function foo_uniq():
   0x00000000005ac320 <+0>:	push   %rbp
   0x00000000005ac321 <+1>:	push   %rbx
   0x00000000005ac322 <+2>:	mov    $0x64,%edi
   0x00000000005ac327 <+7>:	sub    $0x8,%rsp
   0x00000000005ac32b <+11>:	callq  0x6218c0 <operator new[](unsigned long)>
   0x00000000005ac330 <+16>:	mov    $0x64,%esi
   0x00000000005ac335 <+21>:	mov    %rax,%rdi
   0x00000000005ac338 <+24>:	mov    %rax,%rbx
   0x00000000005ac33b <+27>:	callq  0x600820 <fill_with_data(unsigned char*, unsigned long)>
   0x00000000005ac340 <+32>:	add    $0x8,%rsp
   0x00000000005ac344 <+36>:	mov    %rbx,%rdi
   0x00000000005ac347 <+39>:	pop    %rbx
   0x00000000005ac348 <+40>:	pop    %rbp
   0x00000000005ac349 <+41>:	jmpq   0x80904a <operator delete[](void*)>
   0x00000000005ac34e <+46>:	mov    %rax,%rbp
   0x00000000005ac351 <+49>:	mov    %rbx,%rdi
   0x00000000005ac354 <+52>:	callq  0x80904a <operator delete[](void*)>
   0x00000000005ac359 <+57>:	mov    %rbp,%rdi
   0x00000000005ac35c <+60>:	callq  0x828dbb <_Unwind_Resume>
End of assembler dump.
Dump of assembler code for function foo_def_vec():
   0x00000000005ac370 <+0>:	push   %rbp
   0x00000000005ac371 <+1>:	push   %rbx
   0x00000000005ac372 <+2>:	mov    $0x64,%edi
   0x00000000005ac377 <+7>:	sub    $0x8,%rsp
   0x00000000005ac37b <+11>:	callq  0x6217b0 <operator new(unsigned long)>
   0x00000000005ac380 <+16>:	mov    $0x64,%esi
   0x00000000005ac385 <+21>:	mov    %rax,%rdi
   0x00000000005ac388 <+24>:	mov    %rax,%rbx
   0x00000000005ac38b <+27>:	callq  0x600820 <fill_with_data(unsigned char*, unsigned long)>
   0x00000000005ac390 <+32>:	add    $0x8,%rsp
   0x00000000005ac394 <+36>:	mov    %rbx,%rdi
   0x00000000005ac397 <+39>:	pop    %rbx
   0x00000000005ac398 <+40>:	pop    %rbp
   0x00000000005ac399 <+41>:	jmpq   0x80938a <operator delete(void*)>
   0x00000000005ac39e <+46>:	mov    %rax,%rbp
   0x00000000005ac3a1 <+49>:	mov    %rbx,%rdi
   0x00000000005ac3a4 <+52>:	callq  0x80938a <operator delete(void*)>
   0x00000000005ac3a9 <+57>:	mov    %rbp,%rdi
   0x00000000005ac3ac <+60>:	callq  0x828dbb <_Unwind_Resume>
End of assembler dump.

The only difference is that one called operator new, while the
other called operator new[].

> I don't really understand default-init-alloc.h, but the rest of the
> patch looks good to me.

The important thing to understand is that std containers never
call operator new/delete directly, and don't call ctors/dtors directly
either.  Instead there's a level of (usually compile-time) indirection
via an "allocator".

  http://en.cppreference.com/w/cpp/memory/allocator

(be sure to set "standard revision to C++11" to hide c++03 cruft, btw.)

There's another level of indirection here that I'll ignore for
simplicity - std::allocator_traits.

So when std::vector is constructed with a size or resized, the
allocator's allocate method is called to allocate a raw block of
contiguous memory for all the vector's elements.  gdb::default_init_alloc
doesn't override  the (provide an) "allocate" method, so we end up in the
default memory allocation via "operator new (size_t)".  Note this "new" call
returns a raw memory block, not constructed elements.  We still need
to run ctors to give life to the elements.

To run the ctor of each of the elements, std::vector calls
the allocator's "construct" method.  You can imagine it as having
this signature:

 template< class U, class... Args >
 void construct( U* p, Args&&... args );

Note that's a variadic template method.  When a vector is
created with an initial size, or is resized, and you don't specify
the value new element should have, i.e., overload (3) at:

 http://en.cppreference.com/w/cpp/container/vector/vector

the new elements must be default constructed.  I.e.,
their default constructor (i.e., ctor with no arguments) must
be called.  std::vector does that by calling the allocator's
construct method above passing it no "args" after "p".  Since
gdb::default_init_allocator has a "construct" overload like this:

 template< class U >
 void construct( U* p );

that's picked as the right overload to call, because it's 
considered a better match than a variadic template.

Here's that gdb::default_init_allocator method in full:

+  /* .. and provide an override/overload for the case of default
+     construction (i.e., no arguments).  This is where we construct
+     with default-init.  */
+  template <typename U>
+  void construct (U *ptr)
+    noexcept (std::is_nothrow_default_constructible<U>::value)
+  {
+    ::new ((void *) ptr) U; /* default-init */
+  }

Ignore the "noexpect" bit.

That "new" call with an argument passed before the type
is called a "placement-new".  That's how you run the ctor
of U on pre-existing memory.  In this case "*ptr".

That new expression does default-initialization because it doesn't
have "()" after the U.  If it had, like this:

   ::new ((void *) ptr) U ();

then that'd value-initialize *ptr.  For non-trival types, it's
the same thing.  But for scalar types, default-initialization
does nothing, and value-initialization memsets to 0.

Value-initialization would be what what the:

 template< class U, class... Args >
 void construct( U* p, Args&&... args );

"overload" would do, if we'd let it.  (Again, "quotes" because
I'm ignoring allocator_traits for simplicity.)

So what happens if you resize the vector with an explicit value, like

  buf.resize(new_size, 0xff);

?

In that case you're calling overload (2) at :
  http://en.cppreference.com/w/cpp/container/vector/vector

and then the default_init_allocator's "construct" method that
takes no arguments beyond the object's pointer is not picked
by overload resolution, so the generic one is picked:

 template< class U, class... Args >
 void construct( U* p, Args&&... args );

and that one constructs *P with

   ::new ((void *) ptr) U (args...);

which expanding "args..." ends up being:

   ::new ((void *) ptr) U (0xff);

You see here why by default the you end up with
value-initialization.  If "args..." expanded to nothing,
you'd get:

   ::new ((void *) ptr) U ();

See more here:

 http://en.cppreference.com/w/cpp/concept/Allocator
 http://en.cppreference.com/w/cpp/memory/allocator_traits
 http://en.cppreference.com/w/cpp/memory/allocator
 http://en.cppreference.com/w/cpp/memory/allocator_traits/construct
 http://en.cppreference.com/w/cpp/memory/allocator/construct
 http://en.cppreference.com/w/cpp/language/new

I hope that helps.  I think the best way to get a good grasp
on this is to just step through the std::vector code all the
way to the allocator.

Thanks,
Pedro Alves
Simon Marchi - June 14, 2017, 8:39 a.m.
On 2017-06-14 01:04, Pedro Alves wrote:
> On 06/12/2017 08:52 PM, Simon Marchi wrote:
>> On 2017-06-12 19:07, Pedro Alves wrote:
> 
>>> One nice thing is that with this allocator, for changes like these:
>>> 
>>>   -std::unique_ptr<byte[]> buf (new gdb_byte[some_size]);
>>>   +gdb::byte_vector buf (some_size);
>>>    fill_with_data (buf.data (), buf.size ());
>>> 
>>> the generated code is the exact same as before.  I.e., the compiler
>>> de-structures the vector and gets rid of the unused "reserved vs 
>>> size"
>>> related fields.
>> 
>> How come?
> 
> I mean that even though std::vector is a struct/aggregate, the
> compiler does "scalar replacement of aggregates", constant
> propagation, etc., and then all that's left in the generated code is a
> pointer to the buffer returned by the new call that "operator new()"
> returns.  Even though sizeof(buf) is 24 on libstdc++ there are no
> traces of a real "buf" tuple in memory.
> 
> E.g., with:
> 
> extern void fill_with_data (gdb_byte *buf, size_t len);
> 
> void foo_uniq ()
> {
>   std::unique_ptr<gdb_byte[]> buf (new gdb_byte [100]);
> 
>   fill_with_data (buf.get (), 100);
> }
> 
> void foo_def_vec ()
> {
>   gdb::byte_vector buf (100);
> 
>   fill_with_data (buf.data (), 100);
> }
> 
> We get, with gcc 7, -O2:
> 
> Dump of assembler code for function foo_uniq():
>    0x00000000005ac320 <+0>:	push   %rbp
>    0x00000000005ac321 <+1>:	push   %rbx
>    0x00000000005ac322 <+2>:	mov    $0x64,%edi
>    0x00000000005ac327 <+7>:	sub    $0x8,%rsp
>    0x00000000005ac32b <+11>:	callq  0x6218c0 <operator new[](unsigned 
> long)>
>    0x00000000005ac330 <+16>:	mov    $0x64,%esi
>    0x00000000005ac335 <+21>:	mov    %rax,%rdi
>    0x00000000005ac338 <+24>:	mov    %rax,%rbx
>    0x00000000005ac33b <+27>:	callq  0x600820 <fill_with_data(unsigned
> char*, unsigned long)>
>    0x00000000005ac340 <+32>:	add    $0x8,%rsp
>    0x00000000005ac344 <+36>:	mov    %rbx,%rdi
>    0x00000000005ac347 <+39>:	pop    %rbx
>    0x00000000005ac348 <+40>:	pop    %rbp
>    0x00000000005ac349 <+41>:	jmpq   0x80904a <operator delete[](void*)>
>    0x00000000005ac34e <+46>:	mov    %rax,%rbp
>    0x00000000005ac351 <+49>:	mov    %rbx,%rdi
>    0x00000000005ac354 <+52>:	callq  0x80904a <operator delete[](void*)>
>    0x00000000005ac359 <+57>:	mov    %rbp,%rdi
>    0x00000000005ac35c <+60>:	callq  0x828dbb <_Unwind_Resume>
> End of assembler dump.
> Dump of assembler code for function foo_def_vec():
>    0x00000000005ac370 <+0>:	push   %rbp
>    0x00000000005ac371 <+1>:	push   %rbx
>    0x00000000005ac372 <+2>:	mov    $0x64,%edi
>    0x00000000005ac377 <+7>:	sub    $0x8,%rsp
>    0x00000000005ac37b <+11>:	callq  0x6217b0 <operator new(unsigned 
> long)>
>    0x00000000005ac380 <+16>:	mov    $0x64,%esi
>    0x00000000005ac385 <+21>:	mov    %rax,%rdi
>    0x00000000005ac388 <+24>:	mov    %rax,%rbx
>    0x00000000005ac38b <+27>:	callq  0x600820 <fill_with_data(unsigned
> char*, unsigned long)>
>    0x00000000005ac390 <+32>:	add    $0x8,%rsp
>    0x00000000005ac394 <+36>:	mov    %rbx,%rdi
>    0x00000000005ac397 <+39>:	pop    %rbx
>    0x00000000005ac398 <+40>:	pop    %rbp
>    0x00000000005ac399 <+41>:	jmpq   0x80938a <operator delete(void*)>
>    0x00000000005ac39e <+46>:	mov    %rax,%rbp
>    0x00000000005ac3a1 <+49>:	mov    %rbx,%rdi
>    0x00000000005ac3a4 <+52>:	callq  0x80938a <operator delete(void*)>
>    0x00000000005ac3a9 <+57>:	mov    %rbp,%rdi
>    0x00000000005ac3ac <+60>:	callq  0x828dbb <_Unwind_Resume>
> End of assembler dump.
> 
> The only difference is that one called operator new, while the
> other called operator new[].
> 
>> I don't really understand default-init-alloc.h, but the rest of the
>> patch looks good to me.
> 
> The important thing to understand is that std containers never
> call operator new/delete directly, and don't call ctors/dtors directly
> either.  Instead there's a level of (usually compile-time) indirection
> via an "allocator".
> 
>   http://en.cppreference.com/w/cpp/memory/allocator
> 
> (be sure to set "standard revision to C++11" to hide c++03 cruft, btw.)
> 
> There's another level of indirection here that I'll ignore for
> simplicity - std::allocator_traits.
> 
> So when std::vector is constructed with a size or resized, the
> allocator's allocate method is called to allocate a raw block of
> contiguous memory for all the vector's elements.  
> gdb::default_init_alloc
> doesn't override  the (provide an) "allocate" method, so we end up in 
> the
> default memory allocation via "operator new (size_t)".  Note this "new" 
> call
> returns a raw memory block, not constructed elements.  We still need
> to run ctors to give life to the elements.
> 
> To run the ctor of each of the elements, std::vector calls
> the allocator's "construct" method.  You can imagine it as having
> this signature:
> 
>  template< class U, class... Args >
>  void construct( U* p, Args&&... args );
> 
> Note that's a variadic template method.  When a vector is
> created with an initial size, or is resized, and you don't specify
> the value new element should have, i.e., overload (3) at:
> 
>  http://en.cppreference.com/w/cpp/container/vector/vector
> 
> the new elements must be default constructed.  I.e.,
> their default constructor (i.e., ctor with no arguments) must
> be called.  std::vector does that by calling the allocator's
> construct method above passing it no "args" after "p".  Since
> gdb::default_init_allocator has a "construct" overload like this:
> 
>  template< class U >
>  void construct( U* p );
> 
> that's picked as the right overload to call, because it's
> considered a better match than a variadic template.
> 
> Here's that gdb::default_init_allocator method in full:
> 
> +  /* .. and provide an override/overload for the case of default
> +     construction (i.e., no arguments).  This is where we construct
> +     with default-init.  */
> +  template <typename U>
> +  void construct (U *ptr)
> +    noexcept (std::is_nothrow_default_constructible<U>::value)
> +  {
> +    ::new ((void *) ptr) U; /* default-init */
> +  }
> 
> Ignore the "noexpect" bit.
> 
> That "new" call with an argument passed before the type
> is called a "placement-new".  That's how you run the ctor
> of U on pre-existing memory.  In this case "*ptr".
> 
> That new expression does default-initialization because it doesn't
> have "()" after the U.  If it had, like this:
> 
>    ::new ((void *) ptr) U ();
> 
> then that'd value-initialize *ptr.  For non-trival types, it's
> the same thing.  But for scalar types, default-initialization
> does nothing, and value-initialization memsets to 0.
> 
> Value-initialization would be what what the:
> 
>  template< class U, class... Args >
>  void construct( U* p, Args&&... args );
> 
> "overload" would do, if we'd let it.  (Again, "quotes" because
> I'm ignoring allocator_traits for simplicity.)
> 
> So what happens if you resize the vector with an explicit value, like
> 
>   buf.resize(new_size, 0xff);
> 
> ?
> 
> In that case you're calling overload (2) at :
>   http://en.cppreference.com/w/cpp/container/vector/vector
> 
> and then the default_init_allocator's "construct" method that
> takes no arguments beyond the object's pointer is not picked
> by overload resolution, so the generic one is picked:
> 
>  template< class U, class... Args >
>  void construct( U* p, Args&&... args );
> 
> and that one constructs *P with
> 
>    ::new ((void *) ptr) U (args...);
> 
> which expanding "args..." ends up being:
> 
>    ::new ((void *) ptr) U (0xff);
> 
> You see here why by default the you end up with
> value-initialization.  If "args..." expanded to nothing,
> you'd get:
> 
>    ::new ((void *) ptr) U ();
> 
> See more here:
> 
>  http://en.cppreference.com/w/cpp/concept/Allocator
>  http://en.cppreference.com/w/cpp/memory/allocator_traits
>  http://en.cppreference.com/w/cpp/memory/allocator
>  http://en.cppreference.com/w/cpp/memory/allocator_traits/construct
>  http://en.cppreference.com/w/cpp/memory/allocator/construct
>  http://en.cppreference.com/w/cpp/language/new
> 
> I hope that helps.  I think the best way to get a good grasp
> on this is to just step through the std::vector code all the
> way to the allocator.
> 
> Thanks,
> Pedro Alves

Thanks for the detailed explanation.  It's all very logical, but it's 
also full of small details essential to really understand what's 
happening.

Simon
Yao Qi - June 23, 2017, 10:52 a.m.
Pedro Alves <palves@redhat.com> writes:

Hi Pedro,
I happen to read this code when I am writing a stack allocator,

> +  /* Override rebind.  */
> +  template<typename U>
> +  struct rebind
> +  {
> +    /* A couple helpers just to make it a bit more readable.  */
> +    typedef std::allocator_traits<A> traits_;
> +    typedef typename traits_::template rebind_alloc<U> alloc_;
> +
> +    /* This is what we're after.  */
> +    typedef default_init_allocator<U, alloc_> other;

Can we just replace "alloc_" with "A"?  alloc_ is a typedef, which is
"typename std::allocator_traits<A>::template rebind_alloc<U>".  Then, it
is "typename A::rebind<U>::other".  Then, it is "A<U>", so the member
"other" can be "typedef default_init_allocator<U, A> other;".
Pedro Alves - June 23, 2017, 11:51 a.m.
On 06/23/2017 11:52 AM, Yao Qi wrote:
> Pedro Alves <palves@redhat.com> writes:
> 
> Hi Pedro,
> I happen to read this code when I am writing a stack allocator,
> 
>> +  /* Override rebind.  */
>> +  template<typename U>
>> +  struct rebind
>> +  {
>> +    /* A couple helpers just to make it a bit more readable.  */
>> +    typedef std::allocator_traits<A> traits_;
>> +    typedef typename traits_::template rebind_alloc<U> alloc_;
>> +
>> +    /* This is what we're after.  */
>> +    typedef default_init_allocator<U, alloc_> other;
> 
> Can we just replace "alloc_" with "A"?  alloc_ is a typedef, which is
> "typename std::allocator_traits<A>::template rebind_alloc<U>".  Then, it
> is "typename A::rebind<U>::other".  

First, this direct substitution is not guaranteed.  We don't know
whether "A::rebind<U>::other" exists, since it is optional:

 http://en.cppreference.com/w/cpp/concept/Allocator

We're not supposed to access properties/fields/methods of allocators
directly; instead, we need to indirect via std::allocator_traits, which 
provides defaults:

 http://en.cppreference.com/w/cpp/memory/allocator_traits

 "rebind_alloc<T>" => Alloc::rebind<T>::other if present,
                      otherwise Alloc<T, Args> if this Alloc is Alloc<U, Args>

> Then, it is "A<U>", so the member
> "other" can be "typedef default_init_allocator<U, A> other;".

Not really.  "A<U>" is not A.  "A<U>" wouldn't compile, either.

When A is "std::allocator<T>", "other" for U must be:

  typedef default_init_allocator<U, std::allocator<U>> other;

Not:

  typedef default_init_allocator<U, std::allocator<T>> other;



In more detail, we have:

 template<typename T, typename A = std::allocator<T>>
 class default_init_allocator : public A

The point of "other" is to return a default_init_allocator
for U.  IOW, above, we need to replace both "T"s with "U".

 template<typename U, typename B = std::allocator<U>>
 class default_init_allocator : public B

("A" replaced with "B" too, to avoid confusion, since
it ends up being a different type.)


So:

  default_init_allocator<T>::rebind<U>::other

should result in

  default_init_allocator<U, std::allocator<U>>

while with what you're saying, it'd result in:

  default_init_allocator<U, std::allocator<T>>


IOW, say that T is int, and U is char.  Then,

  default_init_allocator<int>::rebind<char>::other

should be:

  default_init_allocator<char, std::allocator<char>>

not:

  default_init_allocator<char, std::allocator<int>>

Hope that helps,
Pedro Alves
Yao Qi - June 23, 2017, 1:03 p.m.
Pedro Alves <palves@redhat.com> writes:

> Hope that helps,

Yes, thanks for the explanation.

Patch

diff --git a/gdb/ada-lang.c b/gdb/ada-lang.c
index 57c670e..ea60df2 100644
--- a/gdb/ada-lang.c
+++ b/gdb/ada-lang.c
@@ -61,6 +61,7 @@ 
 #include "arch-utils.h"
 #include "cli/cli-utils.h"
 #include "common/function-view.h"
+#include "common/byte-vector.h"
 
 /* Define whether or not the C operator '/' truncates towards zero for
    differently signed operands (truncation direction is undefined in C).
@@ -2567,8 +2568,7 @@  ada_value_primitive_packed_val (struct value *obj, const gdb_byte *valaddr,
   gdb_byte *unpacked;
   const int is_scalar = is_scalar_type (type);
   const int is_big_endian = gdbarch_bits_big_endian (get_type_arch (type));
-  std::unique_ptr<gdb_byte[]> staging;
-  int staging_len = 0;
+  gdb::byte_vector staging;
 
   type = ada_check_typedef (type);
 
@@ -2586,14 +2586,14 @@  ada_value_primitive_packed_val (struct value *obj, const gdb_byte *valaddr,
 	 packed, and therefore maybe not at a byte boundary.  So, what
 	 we do, is unpack the data into a byte-aligned buffer, and then
 	 use that buffer as our object's value for resolving the type.  */
-      staging_len = (bit_size + HOST_CHAR_BIT - 1) / HOST_CHAR_BIT;
-      staging.reset (new gdb_byte[staging_len]);
+      int staging_len = (bit_size + HOST_CHAR_BIT - 1) / HOST_CHAR_BIT;
+      staging.resize (staging_len);
 
       ada_unpack_from_contents (src, bit_offset, bit_size,
-			        staging.get (), staging_len,
+			        staging.data (), staging.size (),
 				is_big_endian, has_negatives (type),
 				is_scalar);
-      type = resolve_dynamic_type (type, staging.get (), 0);
+      type = resolve_dynamic_type (type, staging.data (), 0);
       if (TYPE_LENGTH (type) < (bit_size + HOST_CHAR_BIT - 1) / HOST_CHAR_BIT)
 	{
 	  /* This happens when the length of the object is dynamic,
@@ -2656,12 +2656,12 @@  ada_value_primitive_packed_val (struct value *obj, const gdb_byte *valaddr,
       return v;
     }
 
-  if (staging != NULL && staging_len == TYPE_LENGTH (type))
+  if (staging.size () == TYPE_LENGTH (type))
     {
       /* Small short-cut: If we've unpacked the data into a buffer
 	 of the same size as TYPE's length, then we can reuse that,
 	 instead of doing the unpacking again.  */
-      memcpy (unpacked, staging.get (), staging_len);
+      memcpy (unpacked, staging.data (), staging.size ());
     }
   else
     ada_unpack_from_contents (src, bit_offset, bit_size,
diff --git a/gdb/charset.c b/gdb/charset.c
index f55e482..dbe46a4 100644
--- a/gdb/charset.c
+++ b/gdb/charset.c
@@ -673,7 +673,7 @@  wchar_iterator::iterate (enum wchar_iterate_result *out_result,
 
 	      ++out_request;
 	      if (out_request > m_out.size ())
-		m_out.reserve (out_request);
+		m_out.resize (out_request);
 	      continue;
 
 	    case EINVAL:
diff --git a/gdb/charset.h b/gdb/charset.h
index 51180e3..821974a 100644
--- a/gdb/charset.h
+++ b/gdb/charset.h
@@ -19,7 +19,7 @@ 
 #ifndef CHARSET_H
 #define CHARSET_H
 
-#include <vector>
+#include "common/def-vector.h"
 
 /* If the target program uses a different character set than the host,
    GDB has some support for translating between the two; GDB converts
@@ -144,7 +144,7 @@  class wchar_iterator
   size_t m_width;
 
   /* The output buffer.  */
-  std::vector<gdb_wchar_t> m_out;
+  gdb::def_vector<gdb_wchar_t> m_out;
 };
 
 
diff --git a/gdb/cli/cli-dump.c b/gdb/cli/cli-dump.c
index 213622d..d6d4aab 100644
--- a/gdb/cli/cli-dump.c
+++ b/gdb/cli/cli-dump.c
@@ -31,7 +31,7 @@ 
 #include "cli/cli-utils.h"
 #include "gdb_bfd.h"
 #include "filestuff.h"
-
+#include "common/byte-vector.h"
 
 static const char *
 scan_expression_with_cleanup (const char **cmd, const char *def)
@@ -230,17 +230,17 @@  dump_memory_to_file (const char *cmd, const char *mode, const char *file_format)
 
   /* FIXME: Should use read_memory_partial() and a magic blocking
      value.  */
-  std::unique_ptr<gdb_byte[]> buf (new gdb_byte[count]);
-  read_memory (lo, buf.get (), count);
+  gdb::byte_vector buf (count);
+  read_memory (lo, buf.data (), count);
   
   /* Have everything.  Open/write the data.  */
   if (file_format == NULL || strcmp (file_format, "binary") == 0)
     {
-      dump_binary_file (filename, mode, buf.get (), count);
+      dump_binary_file (filename, mode, buf.data (), count);
     }
   else
     {
-      dump_bfd_file (filename, mode, file_format, lo, buf.get (), count);
+      dump_bfd_file (filename, mode, file_format, lo, buf.data (), count);
     }
 
   do_cleanups (old_cleanups);
@@ -545,13 +545,13 @@  restore_binary_file (const char *filename, struct callback_data *data)
     perror_with_name (filename);
 
   /* Now allocate a buffer and read the file contents.  */
-  std::unique_ptr<gdb_byte[]> buf (new gdb_byte[len]);
-  if (fread (buf.get (), 1, len, file) != len)
+  gdb::byte_vector buf (len);
+  if (fread (buf.data (), 1, len, file) != len)
     perror_with_name (filename);
 
   /* Now write the buffer into target memory.  */
   len = target_write_memory (data->load_start + data->load_offset,
-			     buf.get (), len);
+			     buf.data (), len);
   if (len != 0)
     warning (_("restore: memory write failed (%s)."), safe_strerror (len));
   do_cleanups (cleanup);
diff --git a/gdb/common/byte-vector.h b/gdb/common/byte-vector.h
new file mode 100644
index 0000000..c17b14d
--- /dev/null
+++ b/gdb/common/byte-vector.h
@@ -0,0 +1,62 @@ 
+/* Copyright (C) 2017 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#ifndef COMMON_BYTE_VECTOR_H
+#define COMMON_BYTE_VECTOR_H
+
+#include "common/def-vector.h"
+
+namespace gdb {
+
+/* byte_vector is a gdb_byte std::vector with a custom allocator that
+   unlike std::vector<gdb_byte> does not zero-initialize new elements
+   by default when the vector is created/resized.  This is what you
+   usually want when working with byte buffers, since if you're
+   creating or growing a buffer you'll most surely want to fill it in
+   with data, in which case zero-initialization would be a
+   pessimization.  For example:
+
+     gdb::byte_vector buf (some_large_size);
+     fill_with_data (buf.data (), buf.size ());
+
+   On the odd case you do need zero initialization, then you can still
+   call the overloads that specify an explicit value, like:
+
+     gdb::byte_vector buf (some_initial_size, 0);
+     buf.resize (a_bigger_size, 0);
+
+   (Or use std::vector<gdb_byte> instead.)
+
+   Note that unlike std::vector<gdb_byte>, function local
+   gdb::byte_vector objects constructed with an initial size like:
+
+     gdb::byte_vector buf (some_size);
+     fill_with_data (buf.data (), buf.size ());
+
+   usually compile down to the exact same as:
+
+     std::unique_ptr<byte[]> buf (new gdb_byte[some_size]);
+     fill_with_data (buf.get (), some_size);
+
+   with the former having the advantage of being a bit more readable,
+   and providing the whole std::vector API, if you end up needing it.
+*/
+using byte_vector = gdb::def_vector<gdb_byte>;
+
+} /* namespace gdb */
+
+#endif /* COMMON_DEF_VECTOR_H */
diff --git a/gdb/common/def-vector.h b/gdb/common/def-vector.h
new file mode 100644
index 0000000..ab9331f
--- /dev/null
+++ b/gdb/common/def-vector.h
@@ -0,0 +1,36 @@ 
+/* Copyright (C) 2017 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#ifndef COMMON_DEF_VECTOR_H
+#define COMMON_DEF_VECTOR_H
+
+#include <vector>
+#include "common/default-init-alloc.h"
+
+namespace gdb {
+
+/* A vector that uses an allocator that default constructs using
+   default-initialization rather than value-initialization.  The idea
+   is to use this when you don't want zero-initialization of elements
+   of vectors of trivial types.  E.g., byte buffers.  */
+
+template<typename T> using def_vector
+  = std::vector<T, gdb::default_init_allocator<T>>;
+
+} /* namespace gdb */
+
+#endif /* COMMON_DEF_VECTOR_H */
diff --git a/gdb/common/default-init-alloc.h b/gdb/common/default-init-alloc.h
new file mode 100644
index 0000000..4fb852f
--- /dev/null
+++ b/gdb/common/default-init-alloc.h
@@ -0,0 +1,67 @@ 
+/* Copyright (C) 2017 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#ifndef COMMON_DEFAULT_INIT_ALLOC_H
+#define COMMON_DEFAULT_INIT_ALLOC_H
+
+namespace gdb {
+
+/* An allocator that default constructs using default-initialization
+   rather than value-initialization.  The idea is to use this when you
+   don't want to default construct elements of containers of trivial
+   types using zero-initialization.  */
+
+/* Mostly as implementation convenience, this is implemented as an
+   adapter that given an allocator A, overrides 'A::construct()'.  'A'
+   defaults to std::allocator<T>.  */
+
+template<typename T, typename A = std::allocator<T>>
+class default_init_allocator : public A
+{
+public:
+  /* Pull in A's ctors.  */
+  using A::A;
+
+  /* Override rebind.  */
+  template<typename U>
+  struct rebind
+  {
+    /* A couple helpers just to make it a bit more readable.  */
+    typedef std::allocator_traits<A> traits_;
+    typedef typename traits_::template rebind_alloc<U> alloc_;
+
+    /* This is what we're after.  */
+    typedef default_init_allocator<U, alloc_> other;
+  };
+
+  /* Make the base allocator's construct method(s) visible.  */
+  using A::construct;
+
+  /* .. and provide an override/overload for the case of default
+     construction (i.e., no arguments).  This is where we construct
+     with default-init.  */
+  template <typename U>
+  void construct (U *ptr)
+    noexcept (std::is_nothrow_default_constructible<U>::value)
+  {
+    ::new ((void *) ptr) U; /* default-init */
+  }
+};
+
+} /* namespace gdb */
+
+#endif /* COMMON_DEFAULT_INIT_ALLOC_H */
diff --git a/gdb/dwarf2loc.c b/gdb/dwarf2loc.c
index 127167d..698c91f 100644
--- a/gdb/dwarf2loc.c
+++ b/gdb/dwarf2loc.c
@@ -43,6 +43,7 @@ 
 #include <vector>
 #include <unordered_set>
 #include "common/underlying.h"
+#include "common/byte-vector.h"
 
 extern int dwarf_always_disassemble;
 
@@ -1767,7 +1768,7 @@  read_pieced_value (struct value *v)
     = (struct piece_closure *) value_computed_closure (v);
   size_t type_len;
   size_t buffer_size = 0;
-  std::vector<gdb_byte> buffer;
+  gdb::byte_vector buffer;
   int bits_big_endian
     = gdbarch_bits_big_endian (get_type_arch (value_type (v)));
 
@@ -1821,7 +1822,7 @@  read_pieced_value (struct value *v)
       if (buffer_size < this_size)
 	{
 	  buffer_size = this_size;
-	  buffer.reserve (buffer_size);
+	  buffer.resize (buffer_size);
 	}
       intermediate_buffer = buffer.data ();
 
@@ -1992,7 +1993,7 @@  write_pieced_value (struct value *to, struct value *from)
 	  if (buffer_size < this_size)
 	    {
 	      buffer_size = this_size;
-	      buffer.reserve (buffer_size);
+	      buffer.resize (buffer_size);
 	    }
 	  source_buffer = buffer.data ();
 	  need_bitwise = 1;
diff --git a/gdb/dwarf2read.c b/gdb/dwarf2read.c
index 3f872b7..abe14b2 100644
--- a/gdb/dwarf2read.c
+++ b/gdb/dwarf2read.c
@@ -73,6 +73,7 @@ 
 #include "common/function-view.h"
 #include "common/gdb_optional.h"
 #include "common/underlying.h"
+#include "common/byte-vector.h"
 
 #include <fcntl.h>
 #include <sys/types.h>
@@ -23245,7 +23246,7 @@  private:
     return &*m_vec.end () - size;
   }
 
-  std::vector<gdb_byte> m_vec;
+  gdb::byte_vector m_vec;
 };
 
 /* An entry in the symbol table.  */
diff --git a/gdb/gdb_regex.c b/gdb/gdb_regex.c
index 2e376e3..55c3980 100644
--- a/gdb/gdb_regex.c
+++ b/gdb/gdb_regex.c
@@ -17,6 +17,7 @@ 
 
 #include "defs.h"
 #include "gdb_regex.h"
+#include "common/def-vector.h"
 
 compiled_regex::compiled_regex (const char *regex, int cflags,
 				const char *message)
@@ -28,10 +29,10 @@  compiled_regex::compiled_regex (const char *regex, int cflags,
   if (code != 0)
     {
       size_t length = regerror (code, &m_pattern, NULL, 0);
-      std::unique_ptr<char[]> err (new char[length]);
+      gdb::def_vector<char> err (length);
 
-      regerror (code, &m_pattern, err.get (), length);
-      error (("%s: %s"), message, err.get ());
+      regerror (code, &m_pattern, err.data (), length);
+      error (("%s: %s"), message, err.data ());
     }
 }
 
diff --git a/gdb/mi/mi-main.c b/gdb/mi/mi-main.c
index 755fbab..53289bb 100644
--- a/gdb/mi/mi-main.c
+++ b/gdb/mi/mi-main.c
@@ -55,6 +55,7 @@ 
 #include "gdbcmd.h"
 #include "observer.h"
 #include "common/gdb_optional.h"
+#include "common/byte-vector.h"
 
 #include <ctype.h>
 #include "run-time-clock.h"
@@ -1466,12 +1467,12 @@  mi_cmd_data_read_memory (const char *command, char **argv, int argc)
   /* Create a buffer and read it in.  */
   total_bytes = word_size * nr_rows * nr_cols;
 
-  std::unique_ptr<gdb_byte[]> mbuf (new gdb_byte[total_bytes]);
+  gdb::byte_vector mbuf (total_bytes);
 
   /* Dispatch memory reads to the topmost target, not the flattened
      current_target.  */
   nr_bytes = target_read (current_target.beneath,
-			  TARGET_OBJECT_MEMORY, NULL, mbuf.get (),
+			  TARGET_OBJECT_MEMORY, NULL, mbuf.data (),
 			  addr, total_bytes);
   if (nr_bytes <= 0)
     error (_("Unable to read memory."));