Introduce gdb::byte_vector, add allocator that default-initializes

Message ID 1497287225-16542-1-git-send-email-palves@redhat.com
State New, archived
Headers

Commit Message

Pedro Alves June 12, 2017, 5:07 p.m. UTC
  In some cases we've been replacing heap-allocated gdb_byte buffers
managed with xmalloc/make_cleanup(xfree) with gdb::vector<gdb_byte>.
That usually pessimizes the code a little bit because std::vector
value-initializes elements (which for gdb_byte means
zero-initialization), while if you're creating a temporary buffer,
you're most certaintly going to fill it in with some data.  An
alternative is to use

  unique_ptr<gdb_byte[]> buf (new gdb_byte[size]);

but it looks like that's not very popular.

Recently, a use of obstacks in dwarf2read.c was replaced with
std::vector<gdb_byte> and that as well introduced a pessimization for
always memsetting the buffer when it's garanteed that the zeros will
be overwritten immediately.  (see dwarf2read.c change in this patch to
find it.)

So here's a different take at addressing this issue "by design":

#1 - Introduce default_init_allocator<T>

I.e., a custom allocator that does default construction using default
initialization, meaning, no more zero initialization.  That's the
default_init_allocation<T> class added in this patch.

See "Notes" at
<http://en.cppreference.com/w/cpp/container/vector/resize>.

#2 - Introduce def_vector<T>

I.e., a convenience typedef, because typing the allocator is annoying:

  using def_vector<T> = std::vector<T, gdb::default_init_allocator<T>>;

#3 - Introduce byte_vector

Because gdb_byte vectors will be the common thing, add a convenience
"byte_vector" typedef:

  using byte_vector = def_vector<gdb_byte>;

which is really the same as:

  std::vector<gdb_byte, gdb::default_init_allocator<gdb_byte>>;

The intent then is to make "gdb::byte_vector" be the go-to for dynamic
byte buffers.  So the less friction, the better.

#4 - Adjust current code to use it.

To set the example going forward.  Replace std::vector uses and also
unique_ptr<byte[]> uses.

One nice thing is that with this allocator, for changes like these:

  -std::unique_ptr<byte[]> buf (new gdb_byte[some_size]);
  +gdb::byte_vector buf (some_size);
   fill_with_data (buf.data (), buf.size ());

the generated code is the exact same as before.  I.e., the compiler
de-structures the vector and gets rid of the unused "reserved vs size"
related fields.

The other nice thing is that it's easier to write
  gdb::byte_vector buf (size);
than
  std::unique_ptr<gdb_byte[]> buf (new gdb_byte[size]);
or even (C++14):
  auto buf = std::make_unique<gdb_byte[]> (size); // zero-initializes...

#5 - Suggest s/std::vector<gdb_byte>/gdb::byte_vector/ going forward.

Note that this patch actually fixes a couple of bugs where the current
code is incorrectly using "std::vector::reserve(new_size)" and then
accessing the vector's internal buffer beyond that vector's size: see
dwarf2loc.c and charset.c.  That's undefined behavior and may trigger
debug mode assertion failures.  With default_init_allocator,
"resize()" behaves like "reserve()" performance wise, in that it
leaves new elements with unspecified values, but, it does that safely
without triggering undefined behavior.

gdb/ChangeLog:
2017-06-12  Pedro Alves  <palves@redhat.com>

	* ada-lang.c: Include "common/byte-vector.h".
	(ada_value_primitive_packed_val): Use gdb::byte_vector.
	* charset.c (wchar_iterator::iterate): Resize the vector instead
	of reserving it.
	* common/byte-vector.h: Include "common/def-vector.h".
	(wchar_iterator::m_out): Now a gdb::def_vector<gdb_wchar_t>.
	* cli/cli-dump.c: Include "common/byte-vector.h".
	(dump_memory_to_file, restore_binary_file): Use gdb::byte_vector.
	* common/byte-vector.h: New file.
	* common/def-vector.h: New file.
	* common/default-init-alloc.h: New file.
	* dwarf2loc.c: Include "common/byte-vector.h".
	(read_pieced_value): Use gdb::byte_vector, and resize the vector
	instead of reserving it.
	* dwarf2read.c: Include "common/byte-vector.h".
	(data_buf::m_vec): Now a gdb::byte_vector.
	* gdb_regex.c: Include "common/def-vector.h".
	(compiled_regex::compiled_regex): Use gdb::def_vector<char>.
	* mi/mi-main.c: Include "common/byte-vector.h".
	(mi_cmd_data_read_memory): Use gdb::byte_vector.
---
 gdb/ada-lang.c                  | 16 +++++-----
 gdb/charset.c                   |  2 +-
 gdb/charset.h                   |  4 +--
 gdb/cli/cli-dump.c              | 16 +++++-----
 gdb/common/byte-vector.h        | 62 ++++++++++++++++++++++++++++++++++++++
 gdb/common/def-vector.h         | 36 ++++++++++++++++++++++
 gdb/common/default-init-alloc.h | 67 +++++++++++++++++++++++++++++++++++++++++
 gdb/dwarf2loc.c                 |  7 +++--
 gdb/dwarf2read.c                |  3 +-
 gdb/gdb_regex.c                 |  7 +++--
 gdb/mi/mi-main.c                |  5 +--
 11 files changed, 197 insertions(+), 28 deletions(-)
 create mode 100644 gdb/common/byte-vector.h
 create mode 100644 gdb/common/def-vector.h
 create mode 100644 gdb/common/default-init-alloc.h
  

Comments

Simon Marchi June 12, 2017, 7:52 p.m. UTC | #1
On 2017-06-12 19:07, Pedro Alves wrote:
> In some cases we've been replacing heap-allocated gdb_byte buffers
> managed with xmalloc/make_cleanup(xfree) with gdb::vector<gdb_byte>.
> That usually pessimizes the code a little bit because std::vector
> value-initializes elements (which for gdb_byte means
> zero-initialization), while if you're creating a temporary buffer,
> you're most certaintly going to fill it in with some data.  An
> alternative is to use
> 
>   unique_ptr<gdb_byte[]> buf (new gdb_byte[size]);
> 
> but it looks like that's not very popular.
> 
> Recently, a use of obstacks in dwarf2read.c was replaced with
> std::vector<gdb_byte> and that as well introduced a pessimization for
> always memsetting the buffer when it's garanteed that the zeros will
> be overwritten immediately.  (see dwarf2read.c change in this patch to
> find it.)
> 
> So here's a different take at addressing this issue "by design":
> 
> #1 - Introduce default_init_allocator<T>
> 
> I.e., a custom allocator that does default construction using default
> initialization, meaning, no more zero initialization.  That's the
> default_init_allocation<T> class added in this patch.
> 
> See "Notes" at
> <http://en.cppreference.com/w/cpp/container/vector/resize>.
> 
> #2 - Introduce def_vector<T>
> 
> I.e., a convenience typedef, because typing the allocator is annoying:
> 
>   using def_vector<T> = std::vector<T, gdb::default_init_allocator<T>>;
> 
> #3 - Introduce byte_vector
> 
> Because gdb_byte vectors will be the common thing, add a convenience
> "byte_vector" typedef:
> 
>   using byte_vector = def_vector<gdb_byte>;
> 
> which is really the same as:
> 
>   std::vector<gdb_byte, gdb::default_init_allocator<gdb_byte>>;
> 
> The intent then is to make "gdb::byte_vector" be the go-to for dynamic
> byte buffers.  So the less friction, the better.
> 
> #4 - Adjust current code to use it.
> 
> To set the example going forward.  Replace std::vector uses and also
> unique_ptr<byte[]> uses.
> 
> One nice thing is that with this allocator, for changes like these:
> 
>   -std::unique_ptr<byte[]> buf (new gdb_byte[some_size]);
>   +gdb::byte_vector buf (some_size);
>    fill_with_data (buf.data (), buf.size ());
> 
> the generated code is the exact same as before.  I.e., the compiler
> de-structures the vector and gets rid of the unused "reserved vs size"
> related fields.

How come?

> The other nice thing is that it's easier to write
>   gdb::byte_vector buf (size);
> than
>   std::unique_ptr<gdb_byte[]> buf (new gdb_byte[size]);
> or even (C++14):
>   auto buf = std::make_unique<gdb_byte[]> (size); // 
> zero-initializes...
> 
> #5 - Suggest s/std::vector<gdb_byte>/gdb::byte_vector/ going forward.
> 
> Note that this patch actually fixes a couple of bugs where the current
> code is incorrectly using "std::vector::reserve(new_size)" and then
> accessing the vector's internal buffer beyond that vector's size: see
> dwarf2loc.c and charset.c.  That's undefined behavior and may trigger
> debug mode assertion failures.  With default_init_allocator,
> "resize()" behaves like "reserve()" performance wise, in that it
> leaves new elements with unspecified values, but, it does that safely
> without triggering undefined behavior.

I don't really understand default-init-alloc.h, but the rest of the 
patch looks good to me.

Simon
  
Pedro Alves June 13, 2017, 11:04 p.m. UTC | #2
On 06/12/2017 08:52 PM, Simon Marchi wrote:
> On 2017-06-12 19:07, Pedro Alves wrote:

>> One nice thing is that with this allocator, for changes like these:
>>
>>   -std::unique_ptr<byte[]> buf (new gdb_byte[some_size]);
>>   +gdb::byte_vector buf (some_size);
>>    fill_with_data (buf.data (), buf.size ());
>>
>> the generated code is the exact same as before.  I.e., the compiler
>> de-structures the vector and gets rid of the unused "reserved vs size"
>> related fields.
> 
> How come?

I mean that even though std::vector is a struct/aggregate, the
compiler does "scalar replacement of aggregates", constant
propagation, etc., and then all that's left in the generated code is a
pointer to the buffer returned by the new call that "operator new()"
returns.  Even though sizeof(buf) is 24 on libstdc++ there are no
traces of a real "buf" tuple in memory.

E.g., with:

extern void fill_with_data (gdb_byte *buf, size_t len);

void foo_uniq ()
{
  std::unique_ptr<gdb_byte[]> buf (new gdb_byte [100]);

  fill_with_data (buf.get (), 100);
}

void foo_def_vec ()
{
  gdb::byte_vector buf (100);

  fill_with_data (buf.data (), 100);
}

We get, with gcc 7, -O2:

Dump of assembler code for function foo_uniq():
   0x00000000005ac320 <+0>:	push   %rbp
   0x00000000005ac321 <+1>:	push   %rbx
   0x00000000005ac322 <+2>:	mov    $0x64,%edi
   0x00000000005ac327 <+7>:	sub    $0x8,%rsp
   0x00000000005ac32b <+11>:	callq  0x6218c0 <operator new[](unsigned long)>
   0x00000000005ac330 <+16>:	mov    $0x64,%esi
   0x00000000005ac335 <+21>:	mov    %rax,%rdi
   0x00000000005ac338 <+24>:	mov    %rax,%rbx
   0x00000000005ac33b <+27>:	callq  0x600820 <fill_with_data(unsigned char*, unsigned long)>
   0x00000000005ac340 <+32>:	add    $0x8,%rsp
   0x00000000005ac344 <+36>:	mov    %rbx,%rdi
   0x00000000005ac347 <+39>:	pop    %rbx
   0x00000000005ac348 <+40>:	pop    %rbp
   0x00000000005ac349 <+41>:	jmpq   0x80904a <operator delete[](void*)>
   0x00000000005ac34e <+46>:	mov    %rax,%rbp
   0x00000000005ac351 <+49>:	mov    %rbx,%rdi
   0x00000000005ac354 <+52>:	callq  0x80904a <operator delete[](void*)>
   0x00000000005ac359 <+57>:	mov    %rbp,%rdi
   0x00000000005ac35c <+60>:	callq  0x828dbb <_Unwind_Resume>
End of assembler dump.
Dump of assembler code for function foo_def_vec():
   0x00000000005ac370 <+0>:	push   %rbp
   0x00000000005ac371 <+1>:	push   %rbx
   0x00000000005ac372 <+2>:	mov    $0x64,%edi
   0x00000000005ac377 <+7>:	sub    $0x8,%rsp
   0x00000000005ac37b <+11>:	callq  0x6217b0 <operator new(unsigned long)>
   0x00000000005ac380 <+16>:	mov    $0x64,%esi
   0x00000000005ac385 <+21>:	mov    %rax,%rdi
   0x00000000005ac388 <+24>:	mov    %rax,%rbx
   0x00000000005ac38b <+27>:	callq  0x600820 <fill_with_data(unsigned char*, unsigned long)>
   0x00000000005ac390 <+32>:	add    $0x8,%rsp
   0x00000000005ac394 <+36>:	mov    %rbx,%rdi
   0x00000000005ac397 <+39>:	pop    %rbx
   0x00000000005ac398 <+40>:	pop    %rbp
   0x00000000005ac399 <+41>:	jmpq   0x80938a <operator delete(void*)>
   0x00000000005ac39e <+46>:	mov    %rax,%rbp
   0x00000000005ac3a1 <+49>:	mov    %rbx,%rdi
   0x00000000005ac3a4 <+52>:	callq  0x80938a <operator delete(void*)>
   0x00000000005ac3a9 <+57>:	mov    %rbp,%rdi
   0x00000000005ac3ac <+60>:	callq  0x828dbb <_Unwind_Resume>
End of assembler dump.

The only difference is that one called operator new, while the
other called operator new[].

> I don't really understand default-init-alloc.h, but the rest of the
> patch looks good to me.

The important thing to understand is that std containers never
call operator new/delete directly, and don't call ctors/dtors directly
either.  Instead there's a level of (usually compile-time) indirection
via an "allocator".

  http://en.cppreference.com/w/cpp/memory/allocator

(be sure to set "standard revision to C++11" to hide c++03 cruft, btw.)

There's another level of indirection here that I'll ignore for
simplicity - std::allocator_traits.

So when std::vector is constructed with a size or resized, the
allocator's allocate method is called to allocate a raw block of
contiguous memory for all the vector's elements.  gdb::default_init_alloc
doesn't override  the (provide an) "allocate" method, so we end up in the
default memory allocation via "operator new (size_t)".  Note this "new" call
returns a raw memory block, not constructed elements.  We still need
to run ctors to give life to the elements.

To run the ctor of each of the elements, std::vector calls
the allocator's "construct" method.  You can imagine it as having
this signature:

 template< class U, class... Args >
 void construct( U* p, Args&&... args );

Note that's a variadic template method.  When a vector is
created with an initial size, or is resized, and you don't specify
the value new element should have, i.e., overload (3) at:

 http://en.cppreference.com/w/cpp/container/vector/vector

the new elements must be default constructed.  I.e.,
their default constructor (i.e., ctor with no arguments) must
be called.  std::vector does that by calling the allocator's
construct method above passing it no "args" after "p".  Since
gdb::default_init_allocator has a "construct" overload like this:

 template< class U >
 void construct( U* p );

that's picked as the right overload to call, because it's 
considered a better match than a variadic template.

Here's that gdb::default_init_allocator method in full:

+  /* .. and provide an override/overload for the case of default
+     construction (i.e., no arguments).  This is where we construct
+     with default-init.  */
+  template <typename U>
+  void construct (U *ptr)
+    noexcept (std::is_nothrow_default_constructible<U>::value)
+  {
+    ::new ((void *) ptr) U; /* default-init */
+  }

Ignore the "noexpect" bit.

That "new" call with an argument passed before the type
is called a "placement-new".  That's how you run the ctor
of U on pre-existing memory.  In this case "*ptr".

That new expression does default-initialization because it doesn't
have "()" after the U.  If it had, like this:

   ::new ((void *) ptr) U ();

then that'd value-initialize *ptr.  For non-trival types, it's
the same thing.  But for scalar types, default-initialization
does nothing, and value-initialization memsets to 0.

Value-initialization would be what what the:

 template< class U, class... Args >
 void construct( U* p, Args&&... args );

"overload" would do, if we'd let it.  (Again, "quotes" because
I'm ignoring allocator_traits for simplicity.)

So what happens if you resize the vector with an explicit value, like

  buf.resize(new_size, 0xff);

?

In that case you're calling overload (2) at :
  http://en.cppreference.com/w/cpp/container/vector/vector

and then the default_init_allocator's "construct" method that
takes no arguments beyond the object's pointer is not picked
by overload resolution, so the generic one is picked:

 template< class U, class... Args >
 void construct( U* p, Args&&... args );

and that one constructs *P with

   ::new ((void *) ptr) U (args...);

which expanding "args..." ends up being:

   ::new ((void *) ptr) U (0xff);

You see here why by default the you end up with
value-initialization.  If "args..." expanded to nothing,
you'd get:

   ::new ((void *) ptr) U ();

See more here:

 http://en.cppreference.com/w/cpp/concept/Allocator
 http://en.cppreference.com/w/cpp/memory/allocator_traits
 http://en.cppreference.com/w/cpp/memory/allocator
 http://en.cppreference.com/w/cpp/memory/allocator_traits/construct
 http://en.cppreference.com/w/cpp/memory/allocator/construct
 http://en.cppreference.com/w/cpp/language/new

I hope that helps.  I think the best way to get a good grasp
on this is to just step through the std::vector code all the
way to the allocator.

Thanks,
Pedro Alves
  
Simon Marchi June 14, 2017, 8:39 a.m. UTC | #3
On 2017-06-14 01:04, Pedro Alves wrote:
> On 06/12/2017 08:52 PM, Simon Marchi wrote:
>> On 2017-06-12 19:07, Pedro Alves wrote:
> 
>>> One nice thing is that with this allocator, for changes like these:
>>> 
>>>   -std::unique_ptr<byte[]> buf (new gdb_byte[some_size]);
>>>   +gdb::byte_vector buf (some_size);
>>>    fill_with_data (buf.data (), buf.size ());
>>> 
>>> the generated code is the exact same as before.  I.e., the compiler
>>> de-structures the vector and gets rid of the unused "reserved vs 
>>> size"
>>> related fields.
>> 
>> How come?
> 
> I mean that even though std::vector is a struct/aggregate, the
> compiler does "scalar replacement of aggregates", constant
> propagation, etc., and then all that's left in the generated code is a
> pointer to the buffer returned by the new call that "operator new()"
> returns.  Even though sizeof(buf) is 24 on libstdc++ there are no
> traces of a real "buf" tuple in memory.
> 
> E.g., with:
> 
> extern void fill_with_data (gdb_byte *buf, size_t len);
> 
> void foo_uniq ()
> {
>   std::unique_ptr<gdb_byte[]> buf (new gdb_byte [100]);
> 
>   fill_with_data (buf.get (), 100);
> }
> 
> void foo_def_vec ()
> {
>   gdb::byte_vector buf (100);
> 
>   fill_with_data (buf.data (), 100);
> }
> 
> We get, with gcc 7, -O2:
> 
> Dump of assembler code for function foo_uniq():
>    0x00000000005ac320 <+0>:	push   %rbp
>    0x00000000005ac321 <+1>:	push   %rbx
>    0x00000000005ac322 <+2>:	mov    $0x64,%edi
>    0x00000000005ac327 <+7>:	sub    $0x8,%rsp
>    0x00000000005ac32b <+11>:	callq  0x6218c0 <operator new[](unsigned 
> long)>
>    0x00000000005ac330 <+16>:	mov    $0x64,%esi
>    0x00000000005ac335 <+21>:	mov    %rax,%rdi
>    0x00000000005ac338 <+24>:	mov    %rax,%rbx
>    0x00000000005ac33b <+27>:	callq  0x600820 <fill_with_data(unsigned
> char*, unsigned long)>
>    0x00000000005ac340 <+32>:	add    $0x8,%rsp
>    0x00000000005ac344 <+36>:	mov    %rbx,%rdi
>    0x00000000005ac347 <+39>:	pop    %rbx
>    0x00000000005ac348 <+40>:	pop    %rbp
>    0x00000000005ac349 <+41>:	jmpq   0x80904a <operator delete[](void*)>
>    0x00000000005ac34e <+46>:	mov    %rax,%rbp
>    0x00000000005ac351 <+49>:	mov    %rbx,%rdi
>    0x00000000005ac354 <+52>:	callq  0x80904a <operator delete[](void*)>
>    0x00000000005ac359 <+57>:	mov    %rbp,%rdi
>    0x00000000005ac35c <+60>:	callq  0x828dbb <_Unwind_Resume>
> End of assembler dump.
> Dump of assembler code for function foo_def_vec():
>    0x00000000005ac370 <+0>:	push   %rbp
>    0x00000000005ac371 <+1>:	push   %rbx
>    0x00000000005ac372 <+2>:	mov    $0x64,%edi
>    0x00000000005ac377 <+7>:	sub    $0x8,%rsp
>    0x00000000005ac37b <+11>:	callq  0x6217b0 <operator new(unsigned 
> long)>
>    0x00000000005ac380 <+16>:	mov    $0x64,%esi
>    0x00000000005ac385 <+21>:	mov    %rax,%rdi
>    0x00000000005ac388 <+24>:	mov    %rax,%rbx
>    0x00000000005ac38b <+27>:	callq  0x600820 <fill_with_data(unsigned
> char*, unsigned long)>
>    0x00000000005ac390 <+32>:	add    $0x8,%rsp
>    0x00000000005ac394 <+36>:	mov    %rbx,%rdi
>    0x00000000005ac397 <+39>:	pop    %rbx
>    0x00000000005ac398 <+40>:	pop    %rbp
>    0x00000000005ac399 <+41>:	jmpq   0x80938a <operator delete(void*)>
>    0x00000000005ac39e <+46>:	mov    %rax,%rbp
>    0x00000000005ac3a1 <+49>:	mov    %rbx,%rdi
>    0x00000000005ac3a4 <+52>:	callq  0x80938a <operator delete(void*)>
>    0x00000000005ac3a9 <+57>:	mov    %rbp,%rdi
>    0x00000000005ac3ac <+60>:	callq  0x828dbb <_Unwind_Resume>
> End of assembler dump.
> 
> The only difference is that one called operator new, while the
> other called operator new[].
> 
>> I don't really understand default-init-alloc.h, but the rest of the
>> patch looks good to me.
> 
> The important thing to understand is that std containers never
> call operator new/delete directly, and don't call ctors/dtors directly
> either.  Instead there's a level of (usually compile-time) indirection
> via an "allocator".
> 
>   http://en.cppreference.com/w/cpp/memory/allocator
> 
> (be sure to set "standard revision to C++11" to hide c++03 cruft, btw.)
> 
> There's another level of indirection here that I'll ignore for
> simplicity - std::allocator_traits.
> 
> So when std::vector is constructed with a size or resized, the
> allocator's allocate method is called to allocate a raw block of
> contiguous memory for all the vector's elements.  
> gdb::default_init_alloc
> doesn't override  the (provide an) "allocate" method, so we end up in 
> the
> default memory allocation via "operator new (size_t)".  Note this "new" 
> call
> returns a raw memory block, not constructed elements.  We still need
> to run ctors to give life to the elements.
> 
> To run the ctor of each of the elements, std::vector calls
> the allocator's "construct" method.  You can imagine it as having
> this signature:
> 
>  template< class U, class... Args >
>  void construct( U* p, Args&&... args );
> 
> Note that's a variadic template method.  When a vector is
> created with an initial size, or is resized, and you don't specify
> the value new element should have, i.e., overload (3) at:
> 
>  http://en.cppreference.com/w/cpp/container/vector/vector
> 
> the new elements must be default constructed.  I.e.,
> their default constructor (i.e., ctor with no arguments) must
> be called.  std::vector does that by calling the allocator's
> construct method above passing it no "args" after "p".  Since
> gdb::default_init_allocator has a "construct" overload like this:
> 
>  template< class U >
>  void construct( U* p );
> 
> that's picked as the right overload to call, because it's
> considered a better match than a variadic template.
> 
> Here's that gdb::default_init_allocator method in full:
> 
> +  /* .. and provide an override/overload for the case of default
> +     construction (i.e., no arguments).  This is where we construct
> +     with default-init.  */
> +  template <typename U>
> +  void construct (U *ptr)
> +    noexcept (std::is_nothrow_default_constructible<U>::value)
> +  {
> +    ::new ((void *) ptr) U; /* default-init */
> +  }
> 
> Ignore the "noexpect" bit.
> 
> That "new" call with an argument passed before the type
> is called a "placement-new".  That's how you run the ctor
> of U on pre-existing memory.  In this case "*ptr".
> 
> That new expression does default-initialization because it doesn't
> have "()" after the U.  If it had, like this:
> 
>    ::new ((void *) ptr) U ();
> 
> then that'd value-initialize *ptr.  For non-trival types, it's
> the same thing.  But for scalar types, default-initialization
> does nothing, and value-initialization memsets to 0.
> 
> Value-initialization would be what what the:
> 
>  template< class U, class... Args >
>  void construct( U* p, Args&&... args );
> 
> "overload" would do, if we'd let it.  (Again, "quotes" because
> I'm ignoring allocator_traits for simplicity.)
> 
> So what happens if you resize the vector with an explicit value, like
> 
>   buf.resize(new_size, 0xff);
> 
> ?
> 
> In that case you're calling overload (2) at :
>   http://en.cppreference.com/w/cpp/container/vector/vector
> 
> and then the default_init_allocator's "construct" method that
> takes no arguments beyond the object's pointer is not picked
> by overload resolution, so the generic one is picked:
> 
>  template< class U, class... Args >
>  void construct( U* p, Args&&... args );
> 
> and that one constructs *P with
> 
>    ::new ((void *) ptr) U (args...);
> 
> which expanding "args..." ends up being:
> 
>    ::new ((void *) ptr) U (0xff);
> 
> You see here why by default the you end up with
> value-initialization.  If "args..." expanded to nothing,
> you'd get:
> 
>    ::new ((void *) ptr) U ();
> 
> See more here:
> 
>  http://en.cppreference.com/w/cpp/concept/Allocator
>  http://en.cppreference.com/w/cpp/memory/allocator_traits
>  http://en.cppreference.com/w/cpp/memory/allocator
>  http://en.cppreference.com/w/cpp/memory/allocator_traits/construct
>  http://en.cppreference.com/w/cpp/memory/allocator/construct
>  http://en.cppreference.com/w/cpp/language/new
> 
> I hope that helps.  I think the best way to get a good grasp
> on this is to just step through the std::vector code all the
> way to the allocator.
> 
> Thanks,
> Pedro Alves

Thanks for the detailed explanation.  It's all very logical, but it's 
also full of small details essential to really understand what's 
happening.

Simon
  
Yao Qi June 23, 2017, 10:52 a.m. UTC | #4
Pedro Alves <palves@redhat.com> writes:

Hi Pedro,
I happen to read this code when I am writing a stack allocator,

> +  /* Override rebind.  */
> +  template<typename U>
> +  struct rebind
> +  {
> +    /* A couple helpers just to make it a bit more readable.  */
> +    typedef std::allocator_traits<A> traits_;
> +    typedef typename traits_::template rebind_alloc<U> alloc_;
> +
> +    /* This is what we're after.  */
> +    typedef default_init_allocator<U, alloc_> other;

Can we just replace "alloc_" with "A"?  alloc_ is a typedef, which is
"typename std::allocator_traits<A>::template rebind_alloc<U>".  Then, it
is "typename A::rebind<U>::other".  Then, it is "A<U>", so the member
"other" can be "typedef default_init_allocator<U, A> other;".
  
Pedro Alves June 23, 2017, 11:51 a.m. UTC | #5
On 06/23/2017 11:52 AM, Yao Qi wrote:
> Pedro Alves <palves@redhat.com> writes:
> 
> Hi Pedro,
> I happen to read this code when I am writing a stack allocator,
> 
>> +  /* Override rebind.  */
>> +  template<typename U>
>> +  struct rebind
>> +  {
>> +    /* A couple helpers just to make it a bit more readable.  */
>> +    typedef std::allocator_traits<A> traits_;
>> +    typedef typename traits_::template rebind_alloc<U> alloc_;
>> +
>> +    /* This is what we're after.  */
>> +    typedef default_init_allocator<U, alloc_> other;
> 
> Can we just replace "alloc_" with "A"?  alloc_ is a typedef, which is
> "typename std::allocator_traits<A>::template rebind_alloc<U>".  Then, it
> is "typename A::rebind<U>::other".  

First, this direct substitution is not guaranteed.  We don't know
whether "A::rebind<U>::other" exists, since it is optional:

 http://en.cppreference.com/w/cpp/concept/Allocator

We're not supposed to access properties/fields/methods of allocators
directly; instead, we need to indirect via std::allocator_traits, which 
provides defaults:

 http://en.cppreference.com/w/cpp/memory/allocator_traits

 "rebind_alloc<T>" => Alloc::rebind<T>::other if present,
                      otherwise Alloc<T, Args> if this Alloc is Alloc<U, Args>

> Then, it is "A<U>", so the member
> "other" can be "typedef default_init_allocator<U, A> other;".

Not really.  "A<U>" is not A.  "A<U>" wouldn't compile, either.

When A is "std::allocator<T>", "other" for U must be:

  typedef default_init_allocator<U, std::allocator<U>> other;

Not:

  typedef default_init_allocator<U, std::allocator<T>> other;



In more detail, we have:

 template<typename T, typename A = std::allocator<T>>
 class default_init_allocator : public A

The point of "other" is to return a default_init_allocator
for U.  IOW, above, we need to replace both "T"s with "U".

 template<typename U, typename B = std::allocator<U>>
 class default_init_allocator : public B

("A" replaced with "B" too, to avoid confusion, since
it ends up being a different type.)


So:

  default_init_allocator<T>::rebind<U>::other

should result in

  default_init_allocator<U, std::allocator<U>>

while with what you're saying, it'd result in:

  default_init_allocator<U, std::allocator<T>>


IOW, say that T is int, and U is char.  Then,

  default_init_allocator<int>::rebind<char>::other

should be:

  default_init_allocator<char, std::allocator<char>>

not:

  default_init_allocator<char, std::allocator<int>>

Hope that helps,
Pedro Alves
  
Yao Qi June 23, 2017, 1:03 p.m. UTC | #6
Pedro Alves <palves@redhat.com> writes:

> Hope that helps,

Yes, thanks for the explanation.
  

Patch

diff --git a/gdb/ada-lang.c b/gdb/ada-lang.c
index 57c670e..ea60df2 100644
--- a/gdb/ada-lang.c
+++ b/gdb/ada-lang.c
@@ -61,6 +61,7 @@ 
 #include "arch-utils.h"
 #include "cli/cli-utils.h"
 #include "common/function-view.h"
+#include "common/byte-vector.h"
 
 /* Define whether or not the C operator '/' truncates towards zero for
    differently signed operands (truncation direction is undefined in C).
@@ -2567,8 +2568,7 @@  ada_value_primitive_packed_val (struct value *obj, const gdb_byte *valaddr,
   gdb_byte *unpacked;
   const int is_scalar = is_scalar_type (type);
   const int is_big_endian = gdbarch_bits_big_endian (get_type_arch (type));
-  std::unique_ptr<gdb_byte[]> staging;
-  int staging_len = 0;
+  gdb::byte_vector staging;
 
   type = ada_check_typedef (type);
 
@@ -2586,14 +2586,14 @@  ada_value_primitive_packed_val (struct value *obj, const gdb_byte *valaddr,
 	 packed, and therefore maybe not at a byte boundary.  So, what
 	 we do, is unpack the data into a byte-aligned buffer, and then
 	 use that buffer as our object's value for resolving the type.  */
-      staging_len = (bit_size + HOST_CHAR_BIT - 1) / HOST_CHAR_BIT;
-      staging.reset (new gdb_byte[staging_len]);
+      int staging_len = (bit_size + HOST_CHAR_BIT - 1) / HOST_CHAR_BIT;
+      staging.resize (staging_len);
 
       ada_unpack_from_contents (src, bit_offset, bit_size,
-			        staging.get (), staging_len,
+			        staging.data (), staging.size (),
 				is_big_endian, has_negatives (type),
 				is_scalar);
-      type = resolve_dynamic_type (type, staging.get (), 0);
+      type = resolve_dynamic_type (type, staging.data (), 0);
       if (TYPE_LENGTH (type) < (bit_size + HOST_CHAR_BIT - 1) / HOST_CHAR_BIT)
 	{
 	  /* This happens when the length of the object is dynamic,
@@ -2656,12 +2656,12 @@  ada_value_primitive_packed_val (struct value *obj, const gdb_byte *valaddr,
       return v;
     }
 
-  if (staging != NULL && staging_len == TYPE_LENGTH (type))
+  if (staging.size () == TYPE_LENGTH (type))
     {
       /* Small short-cut: If we've unpacked the data into a buffer
 	 of the same size as TYPE's length, then we can reuse that,
 	 instead of doing the unpacking again.  */
-      memcpy (unpacked, staging.get (), staging_len);
+      memcpy (unpacked, staging.data (), staging.size ());
     }
   else
     ada_unpack_from_contents (src, bit_offset, bit_size,
diff --git a/gdb/charset.c b/gdb/charset.c
index f55e482..dbe46a4 100644
--- a/gdb/charset.c
+++ b/gdb/charset.c
@@ -673,7 +673,7 @@  wchar_iterator::iterate (enum wchar_iterate_result *out_result,
 
 	      ++out_request;
 	      if (out_request > m_out.size ())
-		m_out.reserve (out_request);
+		m_out.resize (out_request);
 	      continue;
 
 	    case EINVAL:
diff --git a/gdb/charset.h b/gdb/charset.h
index 51180e3..821974a 100644
--- a/gdb/charset.h
+++ b/gdb/charset.h
@@ -19,7 +19,7 @@ 
 #ifndef CHARSET_H
 #define CHARSET_H
 
-#include <vector>
+#include "common/def-vector.h"
 
 /* If the target program uses a different character set than the host,
    GDB has some support for translating between the two; GDB converts
@@ -144,7 +144,7 @@  class wchar_iterator
   size_t m_width;
 
   /* The output buffer.  */
-  std::vector<gdb_wchar_t> m_out;
+  gdb::def_vector<gdb_wchar_t> m_out;
 };
 
 
diff --git a/gdb/cli/cli-dump.c b/gdb/cli/cli-dump.c
index 213622d..d6d4aab 100644
--- a/gdb/cli/cli-dump.c
+++ b/gdb/cli/cli-dump.c
@@ -31,7 +31,7 @@ 
 #include "cli/cli-utils.h"
 #include "gdb_bfd.h"
 #include "filestuff.h"
-
+#include "common/byte-vector.h"
 
 static const char *
 scan_expression_with_cleanup (const char **cmd, const char *def)
@@ -230,17 +230,17 @@  dump_memory_to_file (const char *cmd, const char *mode, const char *file_format)
 
   /* FIXME: Should use read_memory_partial() and a magic blocking
      value.  */
-  std::unique_ptr<gdb_byte[]> buf (new gdb_byte[count]);
-  read_memory (lo, buf.get (), count);
+  gdb::byte_vector buf (count);
+  read_memory (lo, buf.data (), count);
   
   /* Have everything.  Open/write the data.  */
   if (file_format == NULL || strcmp (file_format, "binary") == 0)
     {
-      dump_binary_file (filename, mode, buf.get (), count);
+      dump_binary_file (filename, mode, buf.data (), count);
     }
   else
     {
-      dump_bfd_file (filename, mode, file_format, lo, buf.get (), count);
+      dump_bfd_file (filename, mode, file_format, lo, buf.data (), count);
     }
 
   do_cleanups (old_cleanups);
@@ -545,13 +545,13 @@  restore_binary_file (const char *filename, struct callback_data *data)
     perror_with_name (filename);
 
   /* Now allocate a buffer and read the file contents.  */
-  std::unique_ptr<gdb_byte[]> buf (new gdb_byte[len]);
-  if (fread (buf.get (), 1, len, file) != len)
+  gdb::byte_vector buf (len);
+  if (fread (buf.data (), 1, len, file) != len)
     perror_with_name (filename);
 
   /* Now write the buffer into target memory.  */
   len = target_write_memory (data->load_start + data->load_offset,
-			     buf.get (), len);
+			     buf.data (), len);
   if (len != 0)
     warning (_("restore: memory write failed (%s)."), safe_strerror (len));
   do_cleanups (cleanup);
diff --git a/gdb/common/byte-vector.h b/gdb/common/byte-vector.h
new file mode 100644
index 0000000..c17b14d
--- /dev/null
+++ b/gdb/common/byte-vector.h
@@ -0,0 +1,62 @@ 
+/* Copyright (C) 2017 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#ifndef COMMON_BYTE_VECTOR_H
+#define COMMON_BYTE_VECTOR_H
+
+#include "common/def-vector.h"
+
+namespace gdb {
+
+/* byte_vector is a gdb_byte std::vector with a custom allocator that
+   unlike std::vector<gdb_byte> does not zero-initialize new elements
+   by default when the vector is created/resized.  This is what you
+   usually want when working with byte buffers, since if you're
+   creating or growing a buffer you'll most surely want to fill it in
+   with data, in which case zero-initialization would be a
+   pessimization.  For example:
+
+     gdb::byte_vector buf (some_large_size);
+     fill_with_data (buf.data (), buf.size ());
+
+   On the odd case you do need zero initialization, then you can still
+   call the overloads that specify an explicit value, like:
+
+     gdb::byte_vector buf (some_initial_size, 0);
+     buf.resize (a_bigger_size, 0);
+
+   (Or use std::vector<gdb_byte> instead.)
+
+   Note that unlike std::vector<gdb_byte>, function local
+   gdb::byte_vector objects constructed with an initial size like:
+
+     gdb::byte_vector buf (some_size);
+     fill_with_data (buf.data (), buf.size ());
+
+   usually compile down to the exact same as:
+
+     std::unique_ptr<byte[]> buf (new gdb_byte[some_size]);
+     fill_with_data (buf.get (), some_size);
+
+   with the former having the advantage of being a bit more readable,
+   and providing the whole std::vector API, if you end up needing it.
+*/
+using byte_vector = gdb::def_vector<gdb_byte>;
+
+} /* namespace gdb */
+
+#endif /* COMMON_DEF_VECTOR_H */
diff --git a/gdb/common/def-vector.h b/gdb/common/def-vector.h
new file mode 100644
index 0000000..ab9331f
--- /dev/null
+++ b/gdb/common/def-vector.h
@@ -0,0 +1,36 @@ 
+/* Copyright (C) 2017 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#ifndef COMMON_DEF_VECTOR_H
+#define COMMON_DEF_VECTOR_H
+
+#include <vector>
+#include "common/default-init-alloc.h"
+
+namespace gdb {
+
+/* A vector that uses an allocator that default constructs using
+   default-initialization rather than value-initialization.  The idea
+   is to use this when you don't want zero-initialization of elements
+   of vectors of trivial types.  E.g., byte buffers.  */
+
+template<typename T> using def_vector
+  = std::vector<T, gdb::default_init_allocator<T>>;
+
+} /* namespace gdb */
+
+#endif /* COMMON_DEF_VECTOR_H */
diff --git a/gdb/common/default-init-alloc.h b/gdb/common/default-init-alloc.h
new file mode 100644
index 0000000..4fb852f
--- /dev/null
+++ b/gdb/common/default-init-alloc.h
@@ -0,0 +1,67 @@ 
+/* Copyright (C) 2017 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#ifndef COMMON_DEFAULT_INIT_ALLOC_H
+#define COMMON_DEFAULT_INIT_ALLOC_H
+
+namespace gdb {
+
+/* An allocator that default constructs using default-initialization
+   rather than value-initialization.  The idea is to use this when you
+   don't want to default construct elements of containers of trivial
+   types using zero-initialization.  */
+
+/* Mostly as implementation convenience, this is implemented as an
+   adapter that given an allocator A, overrides 'A::construct()'.  'A'
+   defaults to std::allocator<T>.  */
+
+template<typename T, typename A = std::allocator<T>>
+class default_init_allocator : public A
+{
+public:
+  /* Pull in A's ctors.  */
+  using A::A;
+
+  /* Override rebind.  */
+  template<typename U>
+  struct rebind
+  {
+    /* A couple helpers just to make it a bit more readable.  */
+    typedef std::allocator_traits<A> traits_;
+    typedef typename traits_::template rebind_alloc<U> alloc_;
+
+    /* This is what we're after.  */
+    typedef default_init_allocator<U, alloc_> other;
+  };
+
+  /* Make the base allocator's construct method(s) visible.  */
+  using A::construct;
+
+  /* .. and provide an override/overload for the case of default
+     construction (i.e., no arguments).  This is where we construct
+     with default-init.  */
+  template <typename U>
+  void construct (U *ptr)
+    noexcept (std::is_nothrow_default_constructible<U>::value)
+  {
+    ::new ((void *) ptr) U; /* default-init */
+  }
+};
+
+} /* namespace gdb */
+
+#endif /* COMMON_DEFAULT_INIT_ALLOC_H */
diff --git a/gdb/dwarf2loc.c b/gdb/dwarf2loc.c
index 127167d..698c91f 100644
--- a/gdb/dwarf2loc.c
+++ b/gdb/dwarf2loc.c
@@ -43,6 +43,7 @@ 
 #include <vector>
 #include <unordered_set>
 #include "common/underlying.h"
+#include "common/byte-vector.h"
 
 extern int dwarf_always_disassemble;
 
@@ -1767,7 +1768,7 @@  read_pieced_value (struct value *v)
     = (struct piece_closure *) value_computed_closure (v);
   size_t type_len;
   size_t buffer_size = 0;
-  std::vector<gdb_byte> buffer;
+  gdb::byte_vector buffer;
   int bits_big_endian
     = gdbarch_bits_big_endian (get_type_arch (value_type (v)));
 
@@ -1821,7 +1822,7 @@  read_pieced_value (struct value *v)
       if (buffer_size < this_size)
 	{
 	  buffer_size = this_size;
-	  buffer.reserve (buffer_size);
+	  buffer.resize (buffer_size);
 	}
       intermediate_buffer = buffer.data ();
 
@@ -1992,7 +1993,7 @@  write_pieced_value (struct value *to, struct value *from)
 	  if (buffer_size < this_size)
 	    {
 	      buffer_size = this_size;
-	      buffer.reserve (buffer_size);
+	      buffer.resize (buffer_size);
 	    }
 	  source_buffer = buffer.data ();
 	  need_bitwise = 1;
diff --git a/gdb/dwarf2read.c b/gdb/dwarf2read.c
index 3f872b7..abe14b2 100644
--- a/gdb/dwarf2read.c
+++ b/gdb/dwarf2read.c
@@ -73,6 +73,7 @@ 
 #include "common/function-view.h"
 #include "common/gdb_optional.h"
 #include "common/underlying.h"
+#include "common/byte-vector.h"
 
 #include <fcntl.h>
 #include <sys/types.h>
@@ -23245,7 +23246,7 @@  private:
     return &*m_vec.end () - size;
   }
 
-  std::vector<gdb_byte> m_vec;
+  gdb::byte_vector m_vec;
 };
 
 /* An entry in the symbol table.  */
diff --git a/gdb/gdb_regex.c b/gdb/gdb_regex.c
index 2e376e3..55c3980 100644
--- a/gdb/gdb_regex.c
+++ b/gdb/gdb_regex.c
@@ -17,6 +17,7 @@ 
 
 #include "defs.h"
 #include "gdb_regex.h"
+#include "common/def-vector.h"
 
 compiled_regex::compiled_regex (const char *regex, int cflags,
 				const char *message)
@@ -28,10 +29,10 @@  compiled_regex::compiled_regex (const char *regex, int cflags,
   if (code != 0)
     {
       size_t length = regerror (code, &m_pattern, NULL, 0);
-      std::unique_ptr<char[]> err (new char[length]);
+      gdb::def_vector<char> err (length);
 
-      regerror (code, &m_pattern, err.get (), length);
-      error (("%s: %s"), message, err.get ());
+      regerror (code, &m_pattern, err.data (), length);
+      error (("%s: %s"), message, err.data ());
     }
 }
 
diff --git a/gdb/mi/mi-main.c b/gdb/mi/mi-main.c
index 755fbab..53289bb 100644
--- a/gdb/mi/mi-main.c
+++ b/gdb/mi/mi-main.c
@@ -55,6 +55,7 @@ 
 #include "gdbcmd.h"
 #include "observer.h"
 #include "common/gdb_optional.h"
+#include "common/byte-vector.h"
 
 #include <ctype.h>
 #include "run-time-clock.h"
@@ -1466,12 +1467,12 @@  mi_cmd_data_read_memory (const char *command, char **argv, int argc)
   /* Create a buffer and read it in.  */
   total_bytes = word_size * nr_rows * nr_cols;
 
-  std::unique_ptr<gdb_byte[]> mbuf (new gdb_byte[total_bytes]);
+  gdb::byte_vector mbuf (total_bytes);
 
   /* Dispatch memory reads to the topmost target, not the flattened
      current_target.  */
   nr_bytes = target_read (current_target.beneath,
-			  TARGET_OBJECT_MEMORY, NULL, mbuf.get (),
+			  TARGET_OBJECT_MEMORY, NULL, mbuf.data (),
 			  addr, total_bytes);
   if (nr_bytes <= 0)
     error (_("Unable to read memory."));