[RESEND] reader context: do not reuse current corpus and corpus_group

Message ID 20200430215256.19135-1-maennich@google.com
State Rejected

Commit Message

Matthias Männich April 30, 2020, 9:52 p.m. UTC
  libabigail's readers (abg-reader and abg-dwarf-reader) currently spare
some allocations by reusing the reader context's existing current corpus
and current corpus group. When building a corpus_group's vector of
corpora, reusing the shared_ptr referring to a corpus means we are
modifying the corpus data of a previously read corpus. As a user of the
read*corpus functions, that isn't entirely transparent and when storing
corpora like in the vector above, we might introduce subtle bugs.

Fix this by explicitly creating new corpus / corpus group instances when
reading from elf/dwarf or xml.

	* src/abg-dwarf-reader.cc (read_debug_info_into_corpus): always
	instantiate a new corpus instance.
	* src/abg-reader.cc (read_corpus_from_input): Likewise.
	(read_corpus_group_from_input): always instantiate a new corpus
	group instance.

Signed-off-by: Matthias Maennich <maennich@google.com>
---
 src/abg-dwarf-reader.cc | 11 ++++-----
 src/abg-reader.cc       | 49 +++++++++++++++++------------------------
 2 files changed, 24 insertions(+), 36 deletions(-)
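
To illustrate the aliasing problem described in the commit message, here is a minimal, self-contained C++ sketch. It does not use libabigail: `Corpus`, `ReadContext`, `read_reusing`, `read_fresh` and `collect_paths` are hypothetical stand-ins, and the real readers do considerably more work. The sketch only shows why handing out the context's existing shared_ptr can change a corpus that a caller has already stored.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; not libabigail's classes.
struct Corpus {
  std::string path;
};
using CorpusSptr = std::shared_ptr<Corpus>;

struct ReadContext {
  CorpusSptr current;  // survives from one read to the next
};

// Pre-patch pattern: allocate only if the context has no corpus yet.
CorpusSptr read_reusing(ReadContext& ctxt, const std::string& path) {
  if (!ctxt.current)
    ctxt.current = std::make_shared<Corpus>();
  ctxt.current->path = path;  // overwrites the result of an earlier read
  return ctxt.current;
}

// Patched pattern: every read gets its own instance.
CorpusSptr read_fresh(ReadContext& ctxt, const std::string& path) {
  ctxt.current = std::make_shared<Corpus>();
  ctxt.current->path = path;
  return ctxt.current;
}

// Reads two inputs through `read`, stores the results in a vector (as a
// corpus group would), and returns the paths seen through that vector.
template <typename Reader>
std::vector<std::string> collect_paths(Reader read) {
  ReadContext ctxt;
  std::vector<CorpusSptr> corpora;
  corpora.push_back(read(ctxt, "vmlinux"));
  corpora.push_back(read(ctxt, "module.ko"));

  std::vector<std::string> paths;
  for (const auto& c : corpora)
    paths.push_back(c->path);
  return paths;
}
```

With these stand-ins, `collect_paths(read_reusing)` reports the second path twice, because both vector entries point at the same object, while `collect_paths(read_fresh)` reports two distinct paths.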
  

Comments

Giuliano Procida May 1, 2020, 2:23 p.m. UTC | #1
Hi.

Does this change have any impact on incomplete (forward-declared) type
differences?
Does it significantly impact performance on large inputs?

Otherwise, this change makes sense and
Reviewed-by: Giuliano Procida <gprocida@google.com>

More generally, do you have a good understanding of what in the
context is generic to the tool run and what is specific to a file
being read and can they be separated cleanly?

Giuliano.

  
Matthias Männich May 4, 2020, 1:21 p.m. UTC | #2
On Fri, May 01, 2020 at 03:23:05PM +0100, Giuliano Procida wrote:
>Hi.
>
>Does this change have any impact on incomplete (forward-declared) type
>differences?
>Does it significantly impact performance on large inputs?

This was a prerequisite change for sorting the corpus vector, but we
have since replaced that approach by sorting the corpora much earlier.
Still, this is a useful change for others coming here and hitting an
issue with the reuse (as I did).

>
>Otherwise, this change makes sense and
>Reviewed-by: Giuliano Procida <gprocida@google.com>
>
>More generally, do you have a good understanding of what in the
>context is generic to the tool run and what is specific to a file
>being read and can they be separated cleanly?

There is more opportunity to separate things a bit (also for potential
multithreaded execution). As it stands, the read_context is a very
central object in the DWARF reader that likely contains too much data
and carries too much responsibility. I am working on splitting out the
symtab reader part to address this concern (and also to make it
testable individually).

Cheers,
Matthias
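
The separation discussed in this reply could look roughly like the following sketch. All type and member names (`RunState`, `FileState`, `Reader`) are invented for illustration and are not part of libabigail; which data belongs on which side is exactly the open question in the thread.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// All names below are hypothetical; they do not mirror libabigail's API.

// State that is generic to the whole tool run and shared by all reads.
struct RunState {
  bool load_all_types = false;
  int files_read = 0;
};

// State that belongs to exactly one input file; created anew per read.
struct FileState {
  std::string path;
  std::unordered_map<std::string, int> symbol_index;
};

class Reader {
 public:
  explicit Reader(std::shared_ptr<RunState> run) : run_(std::move(run)) {}

  // Each call returns an independent FileState, so results from earlier
  // reads cannot be modified by later ones.
  std::unique_ptr<FileState> read(const std::string& path) {
    std::unique_ptr<FileState> file(new FileState);
    file->path = path;
    ++run_->files_read;  // only the run-wide state is shared
    return file;
  }

 private:
  std::shared_ptr<RunState> run_;
};
```

Because the per-file state is owned by the caller, several `Reader` calls could in principle be handed to different threads, provided access to `RunState` is synchronised; that synchronisation is not shown here.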

  
Dodji Seketeli May 4, 2020, 2:42 p.m. UTC | #3
Hello Matthias, Giuliano,

Matthias Maennich <maennich@google.com> writes:

> libabigail's readers (abg-reader and abg-dwarf-reader) currently spare
> some allocations by reusing the reader context's existing current corpus
> and current corpus group. When building a corpus_group's vector of
> corpora, reusing the shared_ptr referring to a corpus means we are
> modifying the corpus data of a previously read corpus. As a user of the
> read*corpus functions, that isn't entirely transparent and when storing
> corpora like in the vector above, we might introduce subtle bugs.

[...]

Giuliano Procida <gprocida@google.com> writes:

> Does this change have any impact on incomplete (forward-declared) type
> differences?
> Does it significantly impact performance on large inputs?

Right.  That is my concern as well.  We avoid instantiating a new
DWARF reader context every time for performance reasons, especially
when analysing a kernel with lots of kernel modules.  Examples of
such kernels would be any of the known enterprise kernels, I guess.


That being said, a lot of things have happened on the front of
performance for kernels with lots of modules, so maybe this particular
optimization is not useful anymore, I am not sure.

In any case, we should measure this first before we know if this can get
in.

I hope this makes sense.

Cheers,
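
One way to approach the measurement asked for in this reply is a small timing harness like the sketch below. It is a toy under stated assumptions: `ToyCorpus`, `fill_reused`, `fill_fresh` and `time_reads` are hypothetical, and a meaningful answer would need runs of the real tools on a kernel with many modules rather than a micro-benchmark.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a corpus; not a libabigail type.
struct ToyCorpus {
  std::vector<int> symbols;
};

// Variant A: keep one instance in `slot` and overwrite its contents.
void fill_reused(std::shared_ptr<ToyCorpus>& slot, std::size_t n) {
  if (!slot)
    slot = std::make_shared<ToyCorpus>();
  slot->symbols.assign(n, 0);
}

// Variant B: allocate a new instance on every read.
void fill_fresh(std::shared_ptr<ToyCorpus>& slot, std::size_t n) {
  slot = std::make_shared<ToyCorpus>();
  slot->symbols.assign(n, 0);
}

// Calls `read_one` exactly `rounds` times and returns the elapsed
// wall-clock time in seconds, measured with a monotonic clock.
template <typename F>
double time_reads(std::size_t rounds, F read_one) {
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < rounds; ++i)
    read_one();
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}
```

A comparison would call `time_reads(rounds, [&] { fill_reused(slot, n); })` and the same with `fill_fresh`, ideally several times each and with optimisation settings matching a release build, before drawing any conclusion.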
  
Matthias Männich May 4, 2020, 6:03 p.m. UTC | #4
Hi Dodji,

On Mon, May 04, 2020 at 04:42:02PM +0200, Dodji Seketeli wrote:
>Hello Matthias, Giuliano,
>
>Matthias Maennich <maennich@google.com> writes:
>
>> libabigail's readers (abg-reader and abg-dwarf-reader) currently spare
>> some allocations by reusing the reader context's existing current corpus
>> and current corpus group. When building a corpus_group's vector of
>> corpora, reusing the shared_ptr referring to a corpus means we are
>> modifying the corpus data of a previously read corpus. As a user of the
>> read*corpus functions, that isn't entirely transparent and when storing
>> corpora like in the vector above, we might introduce subtle bugs.
>
>[...]
>
>Giuliano Procida <gprocida@google.com> writes:
>
>> Does this change have any impact on incomplete (forward-declared) type
>> differences?
>> Does it significantly impact performance on large inputs?
>
>Right.  That is my concern as well.  We avoid instantiating a new
>DWARF reader context every time for performance reasons, especially
>when analysing a kernel with lots of kernel modules.  Examples of
>such kernels would be any of the known enterprise kernels, I guess.
>
>
>That being said, a lot of things have happened on the front of
>performance for kernels with lots of modules, so maybe this particular
>optimization is not useful anymore, I am not sure.
>
>In any case, we should measure this first before we know if this can get
>in.
>
>I hope this makes sense.

I agree. It makes sense to measure this in a more diverse context. The
Android Kernels were running with this patch for a while, but they were
not at all representative. Since this was a prerequisite patch for a now
obsolete patch, I am also ok to just drop it. I will also drop it from
mm-next then.

Cheers,
Matthias

  

Patch

diff --git a/src/abg-dwarf-reader.cc b/src/abg-dwarf-reader.cc
index 850281ad1ce5..d99d25bfb3b9 100644
--- a/src/abg-dwarf-reader.cc
+++ b/src/abg-dwarf-reader.cc
@@ -16068,13 +16068,10 @@  read_debug_info_into_corpus(read_context& ctxt)
 {
   ctxt.clear_per_corpus_data();
 
-  if (!ctxt.current_corpus())
-    {
-      corpus_sptr corp (new corpus(ctxt.env(), ctxt.elf_path()));
-      ctxt.current_corpus(corp);
-      if (!ctxt.env())
-	ctxt.env(corp->get_environment());
-    }
+  corpus_sptr corp(new corpus(ctxt.env(), ctxt.elf_path()));
+  ctxt.current_corpus(corp);
+  if (!ctxt.env())
+    ctxt.env(corp->get_environment());
 
   // First set some mundane properties of the corpus gathered from
   // ELF.
diff --git a/src/abg-reader.cc b/src/abg-reader.cc
index 255a200f2a25..0be45ec59a39 100644
--- a/src/abg-reader.cc
+++ b/src/abg-reader.cc
@@ -25,6 +25,7 @@ 
 /// ABI Instrumentation file in libabigail native XML format.  This
 /// native XML format is named "abixml".
 
+#include "abg-fwd.h"
 #include "config.h"
 #include <cstring>
 #include <cstdlib>
@@ -1899,17 +1900,14 @@  read_corpus_from_input(read_context& ctxt)
 				       BAD_CAST("abi-corpus")))
 	return nil;
 
-      if (!ctxt.get_corpus())
-	{
-	  corpus_sptr c(new corpus(ctxt.get_environment(), ""));
-	  ctxt.set_corpus(c);
-	}
+      corpus_sptr corp(new corpus(ctxt.get_environment(), ""));
+      ctxt.set_corpus(corp);
 
       if (!ctxt.get_corpus_group())
 	ctxt.clear_per_corpus_data();
 
-      corpus& corp = *ctxt.get_corpus();
-      ctxt.set_exported_decls_builder(corp.get_exported_decls_builder().get());
+      ctxt.set_exported_decls_builder(
+	  corp->get_exported_decls_builder().get());
 
       xml::xml_char_sptr path_str = XML_READER_GET_ATTRIBUTE(reader, "path");
       string path;
@@ -1917,13 +1915,13 @@  read_corpus_from_input(read_context& ctxt)
       if (path_str)
 	{
 	  path = reinterpret_cast<char*>(path_str.get());
-	  corp.set_path(path);
+	  corp->set_path(path);
 	}
 
       xml::xml_char_sptr architecture_str =
 	XML_READER_GET_ATTRIBUTE(reader, "architecture");
       if (architecture_str)
-	corp.set_architecture_name
+	corp->set_architecture_name
 	  (reinterpret_cast<char*>(architecture_str.get()));
 
       xml::xml_char_sptr soname_str =
@@ -1933,7 +1931,7 @@  read_corpus_from_input(read_context& ctxt)
       if (soname_str)
 	{
 	  soname = reinterpret_cast<char*>(soname_str.get());
-	  corp.set_soname(soname);
+	  corp->set_soname(soname);
 	}
 
       // Apply suppression specifications here to honour:
@@ -1955,32 +1953,29 @@  read_corpus_from_input(read_context& ctxt)
     }
   else
     {
-      if (!ctxt.get_corpus())
-	{
-	  corpus_sptr c(new corpus(ctxt.get_environment(), ""));
-	  ctxt.set_corpus(c);
-	}
+      corpus_sptr corp(new corpus(ctxt.get_environment(), ""));
+      ctxt.set_corpus(corp);
 
       if (!ctxt.get_corpus_group())
 	ctxt.clear_per_corpus_data();
 
-      corpus& corp = *ctxt.get_corpus();
-      ctxt.set_exported_decls_builder(corp.get_exported_decls_builder().get());
+      ctxt.set_exported_decls_builder(
+	  corp->get_exported_decls_builder().get());
 
       xml::xml_char_sptr path_str = XML_NODE_GET_ATTRIBUTE(node, "path");
       if (path_str)
-	corp.set_path(reinterpret_cast<char*>(path_str.get()));
+	corp->set_path(reinterpret_cast<char*>(path_str.get()));
 
       xml::xml_char_sptr architecture_str =
 	XML_NODE_GET_ATTRIBUTE(node, "architecture");
       if (architecture_str)
-	corp.set_architecture_name
+	corp->set_architecture_name
 	  (reinterpret_cast<char*>(architecture_str.get()));
 
       xml::xml_char_sptr soname_str =
 	XML_NODE_GET_ATTRIBUTE(node, "soname");
       if (soname_str)
-	corp.set_soname(reinterpret_cast<char*>(soname_str.get()));
+	corp->set_soname(reinterpret_cast<char*>(soname_str.get()));
     }
 
   if (!node->children)
@@ -2093,14 +2088,10 @@  read_corpus_group_from_input(read_context& ctxt)
 				   BAD_CAST("abi-corpus-group")))
     return nil;
 
-  if (!ctxt.get_corpus_group())
-    {
-      corpus_group_sptr g(new corpus_group(ctxt.get_environment(),
-					   ctxt.get_path()));
-      ctxt.set_corpus_group(g);
-    }
+  corpus_group_sptr group(
+      new corpus_group(ctxt.get_environment(), ctxt.get_path()));
+  ctxt.set_corpus_group(group);
 
-  corpus_group_sptr group = ctxt.get_corpus_group();
   xml::xml_char_sptr path_str = XML_READER_GET_ATTRIBUTE(reader, "path");
   if (path_str)
     group->set_path(reinterpret_cast<char*>(path_str.get()));
@@ -2115,11 +2106,11 @@  read_corpus_group_from_input(read_context& ctxt)
 
   corpus_sptr corp;
   while ((corp = read_corpus_from_input(ctxt)))
-    ctxt.get_corpus_group()->add_corpus(corp);
+    group->add_corpus(corp);
 
   xmlTextReaderNext(reader.get());
 
-  return ctxt.get_corpus_group();
+  return group;
 }
 
 /// De-serialize an ABI corpus group from an input XML document which