mbox series

[0/3] Add an option to give finer-grained control of offset reporting.

Message ID	20200504162439.74028-1-gprocida@google.com
Headers	From: gprocida@google.com (Giuliano Procida) Date: Mon, 4 May 2020 17:24:36 +0100 Subject: [PATCH 0/3] Add an option to give finer-grained control of offset reporting. Message-ID: <20200504162439.74028-1-gprocida@google.com>
Series	Add an option to give finer-grained control of offset reporting. \| [0/3] Add an option to give finer-grained control of offset reporting. [1/3] abidiff.cc: tidy using directives and declarations [2/3] Allow offset changes to be considered harmless. [3/3] Add abidiff --offset-changes-are-harmless tests.

Message

Giuliano Procida May 4, 2020, 4:24 p.m. UTC

  This is from an internal request.

I'm pretty happy with the core changes.  The new command line option
and plumbing are what I think deserve more scrutiny.  Do we want a
more generic way of manipulating the change categories?  This change
takes us up to 3 flags.

Regards,
Giuliano.

Giuliano Procida (3):
  abidiff.cc: tidy using directives and declarations
  Allow offset changes to be considered harmless.
  Add abidiff --offset-changes-are-harmless tests.

 include/abg-comparison.h                      |  35 ++++++----
 src/abg-comp-filter.cc                        |  14 ++--
 src/abg-comparison.cc                         |  15 +++-
 src/abg-reporter-priv.cc                      |  17 +++--
 tests/data/Makefile.am                        |   8 +++
 .../test-offset-harm-report0.txt              |  15 ++++
 .../test-offset-harm-report1.txt              |  20 ++++++
 .../test-offset-harm-report2.txt              |   3 +
 .../test-offset-harm-report3.txt              |  14 ++++
 .../test-abidiff-exit/test-offset-harm-v0.c   |   7 ++
 .../test-abidiff-exit/test-offset-harm-v0.o   | Bin 0 -> 2872 bytes
 .../test-abidiff-exit/test-offset-harm-v1.c   |   7 ++
 .../test-abidiff-exit/test-offset-harm-v1.o   | Bin 0 -> 2864 bytes
 tests/test-abidiff-exit.cc                    |  36 ++++++++++
 tools/abidiff.cc                              |  64 +++++++++++-------
 15 files changed, 202 insertions(+), 53 deletions(-)
 create mode 100644 tests/data/test-abidiff-exit/test-offset-harm-report0.txt
 create mode 100644 tests/data/test-abidiff-exit/test-offset-harm-report1.txt
 create mode 100644 tests/data/test-abidiff-exit/test-offset-harm-report2.txt
 create mode 100644 tests/data/test-abidiff-exit/test-offset-harm-report3.txt
 create mode 100644 tests/data/test-abidiff-exit/test-offset-harm-v0.c
 create mode 100644 tests/data/test-abidiff-exit/test-offset-harm-v0.o
 create mode 100644 tests/data/test-abidiff-exit/test-offset-harm-v1.c
 create mode 100644 tests/data/test-abidiff-exit/test-offset-harm-v1.o

Comments

Dodji Seketeli May 13, 2020, 12:06 p.m. UTC | #1

Matthias Maennich <maennich@google.com> a écrit:

> As such this is a reporting issue as each single member offset change is
> in fact an ABI breakage, and actually never harmless.

Aggreed.  Though, as I was saying in
https://sourceware.org/pipermail/libabigail/2020q2/002229.html, we could
avoid emitting them only when they are a consequence of an ABI change
that we've already emitted, just like ...


> What appears to me as a more effective way of addressing this would be
> to collapse subsequent offset changes. Instead of dropping the reports
> of those completely, how about we add an option that changes the report
> from
>
> 'struct task_struct at sched.h:635:1' changed:
>   type size hasn't changed
>   1 data member insertion:
>     'unsigned int task_struct::in_ubsan', at offset 16704 (in bits) at sched.h:1006:1
>   there are data member changes:
>     'void* task_struct::journal_info' offset changed from 16704 to 16768 (in bits) (by +64 bits)
>     'bio_list* task_struct::bio_list' offset changed from 16768 to 16832 (in bits) (by +64 bits)
>     'blk_plug* task_struct::plug' offset changed from 16832 to 16896 (in bits) (by +64 bits)
>     'reclaim_state* task_struct::reclaim_state' offset changed from 16896 to 16960 (in bits) (by +64 bits)
>     'backing_dev_info* task_struct::backing_dev_info' offset changed from 16960 to 17024 (in bits) (by +64 bits)
>     'io_context* task_struct::io_context' offset changed from 17024 to 17088 (in bits) (by +64 bits)
>     'capture_control* task_struct::capture_control' offset changed from 17088 to 17152 (in bits) (by +64 bits)
>     'unsigned long int task_struct::ptrace_message' offset changed from 17152 to 17216 (in bits) (by +64 bits)
>     'kernel_siginfo_t* task_struct::last_siginfo' offset changed from 17216 to 17280 (in bits) (by +64 bits)
>
> to
>
> 'struct task_struct at sched.h:635:1' changed:
>   type size hasn't changed
>   1 data member insertion:
>     'unsigned int task_struct::in_ubsan', at offset 16704 (in bits) at sched.h:1006:1
>
>   offset of 9 consecutive data members changed by +64bits (journal_info .. last_siginfo)

... here!

And here, I totally agree with you that this would be an interesting way
to report it.  But we could also completely do away with reporting about
them because they are redundant, in a sense.  Note that there is already
a pass that flags redundant changes and avoid emitting them, so in a
sense, this new particular case would not be unheard of, so to speak.


> While looking at that, we likely can also reduce the name we print for
> each data member:
>
> 'struct task_struct at sched.h:635:1' changed:
>   type size hasn't changed
>   1 data member insertion:
>     'unsigned int task_struct::in_ubsan', at offset 16704 (in bits) at sched.h:1006:1
 
So you would emit a qualified data member name here ...

>   there are data member changes:
>     'journal_info' offset changed from 16704 to 16768 (in bits) (by +64 bits)
>     'bio_list' offset changed from 16768 to 16832 (in bits) (by +64 bits)
>     'plug' offset changed from 16832 to 16896 (in bits) (by +64 bits)
>     'reclaim_state' offset changed from 16896 to 16960 (in bits) (by +64 bits)
>     'backing_dev_info' offset changed from 16960 to 17024 (in bits) (by +64 bits)
>     'io_context' offset changed from 17024 to 17088 (in bits) (by +64 bits)
>     'capture_control' offset changed from 17088 to 17152 (in bits) (by +64 bits)
>     'ptrace_message' offset changed from 17152 to 17216 (in bits) (by +64 bits)
>     'last_siginfo' offset changed from 17216 to 17280 (in bits) (by +64 bits)

but not here?

Also, would it make sense to emit those qualified data member names when
we are analyzying a C++ program?  Or do you think it's more legible to
emit always emit non-qualified data member names?

I know I repeat myself, but if we are to work on "tasks", it'd be nice
to file them as enhancement reports so that we discuss them at "one
place" and so that we can track them.  I can go ahead and file these if
you guys want :-)

Cheers,

Matthias Männich May 13, 2020, 7:38 p.m. UTC | #2

Hi Dodji!

On Wed, May 13, 2020 at 02:06:32PM +0200, Dodji Seketeli wrote:
>Matthias Maennich <maennich@google.com> a écrit:
>
>> As such this is a reporting issue as each single member offset change is
>> in fact an ABI breakage, and actually never harmless.
>
>Aggreed.  Though, as I was saying in
>https://sourceware.org/pipermail/libabigail/2020q2/002229.html, we could
>avoid emitting them only when they are a consequence of an ABI change
>that we've already emitted, just like ...
>
>
>> What appears to me as a more effective way of addressing this would be
>> to collapse subsequent offset changes. Instead of dropping the reports
>> of those completely, how about we add an option that changes the report
>> from
>>
>> 'struct task_struct at sched.h:635:1' changed:
>>   type size hasn't changed
>>   1 data member insertion:
>>     'unsigned int task_struct::in_ubsan', at offset 16704 (in bits) at sched.h:1006:1
>>   there are data member changes:
>>     'void* task_struct::journal_info' offset changed from 16704 to 16768 (in bits) (by +64 bits)
>>     'bio_list* task_struct::bio_list' offset changed from 16768 to 16832 (in bits) (by +64 bits)
>>     'blk_plug* task_struct::plug' offset changed from 16832 to 16896 (in bits) (by +64 bits)
>>     'reclaim_state* task_struct::reclaim_state' offset changed from 16896 to 16960 (in bits) (by +64 bits)
>>     'backing_dev_info* task_struct::backing_dev_info' offset changed from 16960 to 17024 (in bits) (by +64 bits)
>>     'io_context* task_struct::io_context' offset changed from 17024 to 17088 (in bits) (by +64 bits)
>>     'capture_control* task_struct::capture_control' offset changed from 17088 to 17152 (in bits) (by +64 bits)
>>     'unsigned long int task_struct::ptrace_message' offset changed from 17152 to 17216 (in bits) (by +64 bits)
>>     'kernel_siginfo_t* task_struct::last_siginfo' offset changed from 17216 to 17280 (in bits) (by +64 bits)
>>
>> to
>>
>> 'struct task_struct at sched.h:635:1' changed:
>>   type size hasn't changed
>>   1 data member insertion:
>>     'unsigned int task_struct::in_ubsan', at offset 16704 (in bits) at sched.h:1006:1
>>
>>   offset of 9 consecutive data members changed by +64bits (journal_info .. last_siginfo)
>
>... here!
>
>And here, I totally agree with you that this would be an interesting way
>to report it.  But we could also completely do away with reporting about
>them because they are redundant, in a sense.  Note that there is already
>a pass that flags redundant changes and avoid emitting them, so in a
>sense, this new particular case would not be unheard of, so to speak.

I think I would like the one line of mention as I suggested. Omitting it
conditionally introduces more logic and likely we confuse users. I also
do not see them as redundant, just obvious and therefore a short form is
enough to transmit this information. Does this make sense?

>
>
>> While looking at that, we likely can also reduce the name we print for
>> each data member:
>>
>> 'struct task_struct at sched.h:635:1' changed:
>>   type size hasn't changed
>>   1 data member insertion:
>>     'unsigned int task_struct::in_ubsan', at offset 16704 (in bits) at sched.h:1006:1
>
>So you would emit a qualified data member name here ...
>
>>   there are data member changes:
>>     'journal_info' offset changed from 16704 to 16768 (in bits) (by +64 bits)
>>     'bio_list' offset changed from 16768 to 16832 (in bits) (by +64 bits)
>>     'plug' offset changed from 16832 to 16896 (in bits) (by +64 bits)
>>     'reclaim_state' offset changed from 16896 to 16960 (in bits) (by +64 bits)
>>     'backing_dev_info' offset changed from 16960 to 17024 (in bits) (by +64 bits)
>>     'io_context' offset changed from 17024 to 17088 (in bits) (by +64 bits)
>>     'capture_control' offset changed from 17088 to 17152 (in bits) (by +64 bits)
>>     'ptrace_message' offset changed from 17152 to 17216 (in bits) (by +64 bits)
>>     'last_siginfo' offset changed from 17216 to 17280 (in bits) (by +64 bits)
>
>but not here?

Oh, that was an oversight. I meant to apply the simplification in all
places. I.e. 'in_ubsan' only.

>
>Also, would it make sense to emit those qualified data member names when
>we are analyzying a C++ program?  Or do you think it's more legible to
>emit always emit non-qualified data member names?

I think this very much applies to C structs as well. In fact, the above
is a C example. I think we should use the non-qualified version if we
are in a nested block where the rest of the qualified name is already
emitted. In the above example, we are in the block of 'task_struct'
differences, hence we can strip that part away from
task_struct::in_ubsan'.

>
>I know I repeat myself, but if we are to work on "tasks", it'd be nice
>to file them as enhancement reports so that we discuss them at "one
>place" and so that we can track them.  I can go ahead and file these if
>you guys want :-)

Understood. And we will happily file them. I guess, what started with
fixes here and there, became more frequent and more involved
contributions without adjusting the process properly. Sorry about that.

Cheers,
Matthias

>
>Cheers,
>
>-- 
>		Dodji

Dodji Seketeli May 14, 2020, 8:35 a.m. UTC | #3

Matthias Maennich <maennich@google.com> a écrit:

[...]

> I think I would like the one line of mention as I suggested. Omitting it
> conditionally introduces more logic and likely we confuse users.

Either way, we'll have to introdroce more logic anyway, so I am not
concerned about that, if, of course that is useful.

I guess what I wanted to say is that the one line mention seems like a
good idea because that example is simple.  If you have more than one
non-contiguous addition/removal/changes of data members in the struct,
then I am not sure the one liner mention is still that "relevant".  Or
maybe in that case we can avoir the one liner mention and emit the
in-extenso message as we have today?  What do you think?

> I also do not see them as redundant, just obvious and therefore a
> short form is enough to transmit this information.

Hmmh, yes, I think you are right.  The offset changes are not redundant
per se.  They just happen to be caused by the change, but they don't
necessary have to be.  Point taken.

[...]


>>Also, would it make sense to emit those qualified data member names when
>>we are analyzying a C++ program?  Or do you think it's more legible to
>>emit always emit non-qualified data member names?
>
> I think this very much applies to C structs as well. In fact, the above
> is a C example.

Yeah, I got that.  I really meant my question.  I'll re-phrase it
differently.  Do you think we should avoid removing the
full-qualification when we are analyzing c++ programs?  Or do do you
think what you are proposing for C should apply for C++ programs
equally?  I would think yes, but I am not sure.

> I think we should use the non-qualified version if we
> are in a nested block where the rest of the qualified name is already
> emitted. In the above example, we are in the block of 'task_struct'
> differences, hence we can strip that part away from
> task_struct::in_ubsan'.

This does make sense to me.

If we agree that we should change this behaviour, then maybe we should
create an enhance request for this.  Shall we?

Thanks,

Cheers,

Giuliano Procida May 14, 2020, 12:39 p.m. UTC | #4

I'll just chime in...

I concur that the initial change I proposed is dangerous in the sense that
it allows abidiff to be used in a fashion that will not find certain ABI
differences. Even if the mechanism makes sense, perhaps the policy does
not. However, it's already possible to say --harmless --no-harmful.

The splitting of SIZE/OFFSET categories may have some value for libabigail
itself. I think SIZE would still be overloaded, including alignment and
some other miscellaneous things.

As for a concrete proposal, if the two of you feel you have reached a
consensus, could one of you paste it into the issue tracker?

Now, taking an even bigger step back, the source of this request is to
further compress reports for human consumption. This means removing
redundant information but it also means finding smaller diff descriptions
(while still being legible and having meaningful context). We're
considering ways of presenting different levels of details to our users,
you might consider these abidiff options as making a hierarchy:

--redundant
--no-redundant
--leaf-changes-only --impacted-interfaces
--leaf-changes-only --impacted-interface-counts (this doesn't exist)
--leaf-changes-only
--leaf-changes-only --summarise-offset-changes (this doesn't exist)
--leaf-changes-only --offset-changes-are-harmless (was this request)

At the lowest level here, we'd only see root causes of ABI differences -
useful for people who know there's some difference and just want to track
it down.
At the top level we get the full impact, with the change graph expanded
into a tree as far as possible.

It's quite expensive (in terms of run-time) to generate multiple reports
(Matthias, we should fork and wait if we need to do this) with different
variations. It's also (as the present example demonstrates) not ideal to
add every variation to abidiff itself. So we are doing some things with
post-processing. It might be straightforward to do this for offsets, for
example. If there were an XML format for the diffs as well... :-)

I hope this helps fill in a bit more of the background.

Regards,
Giuliano.

On Thu, 14 May 2020 at 09:35, Dodji Seketeli <dodji@seketeli.org> wrote:

> Matthias Maennich <maennich@google.com> a écrit:
>
> [...]
>
> > I think I would like the one line of mention as I suggested. Omitting it
> > conditionally introduces more logic and likely we confuse users.
>
> Either way, we'll have to introdroce more logic anyway, so I am not
> concerned about that, if, of course that is useful.
>
> I guess what I wanted to say is that the one line mention seems like a
> good idea because that example is simple.  If you have more than one
> non-contiguous addition/removal/changes of data members in the struct,
> then I am not sure the one liner mention is still that "relevant".  Or
> maybe in that case we can avoir the one liner mention and emit the
> in-extenso message as we have today?  What do you think?
>
> > I also do not see them as redundant, just obvious and therefore a
> > short form is enough to transmit this information.
>
> Hmmh, yes, I think you are right.  The offset changes are not redundant
> per se.  They just happen to be caused by the change, but they don't
> necessary have to be.  Point taken.
>
> [...]
>
>
> >>Also, would it make sense to emit those qualified data member names when
> >>we are analyzying a C++ program?  Or do you think it's more legible to
> >>emit always emit non-qualified data member names?
> >
> > I think this very much applies to C structs as well. In fact, the above
> > is a C example.
>
> Yeah, I got that.  I really meant my question.  I'll re-phrase it
> differently.  Do you think we should avoid removing the
> full-qualification when we are analyzing c++ programs?  Or do do you
> think what you are proposing for C should apply for C++ programs
> equally?  I would think yes, but I am not sure.
>
> > I think we should use the non-qualified version if we
> > are in a nested block where the rest of the qualified name is already
> > emitted. In the above example, we are in the block of 'task_struct'
> > differences, hence we can strip that part away from
> > task_struct::in_ubsan'.
>
> This does make sense to me.
>
> If we agree that we should change this behaviour, then maybe we should
> create an enhance request for this.  Shall we?
>
> Thanks,
>
> Cheers,
>
> --
>                 Dodji
>

Matthias Männich May 18, 2020, 8:09 p.m. UTC | #5

On Thu, May 14, 2020 at 10:35:22AM +0200, Dodji Seketeli wrote:
>Matthias Maennich <maennich@google.com> a écrit:
>
>[...]
>
>> I think I would like the one line of mention as I suggested. Omitting it
>> conditionally introduces more logic and likely we confuse users.
>
>Either way, we'll have to introdroce more logic anyway, so I am not
>concerned about that, if, of course that is useful.
>
>I guess what I wanted to say is that the one line mention seems like a
>good idea because that example is simple.  If you have more than one
>non-contiguous addition/removal/changes of data members in the struct,
>then I am not sure the one liner mention is still that "relevant".  Or
>maybe in that case we can avoir the one liner mention and emit the
>in-extenso message as we have today?  What do you think?
>
>> I also do not see them as redundant, just obvious and therefore a
>> short form is enough to transmit this information.
>
>Hmmh, yes, I think you are right.  The offset changes are not redundant
>per se.  They just happen to be caused by the change, but they don't
>necessary have to be.  Point taken.
>
>[...]
>
>
>>>Also, would it make sense to emit those qualified data member names when
>>>we are analyzying a C++ program?  Or do you think it's more legible to
>>>emit always emit non-qualified data member names?
>>
>> I think this very much applies to C structs as well. In fact, the above
>> is a C example.
>
>Yeah, I got that.  I really meant my question.  I'll re-phrase it
>differently.  Do you think we should avoid removing the
>full-qualification when we are analyzing c++ programs?  Or do do you
>think what you are proposing for C should apply for C++ programs
>equally?  I would think yes, but I am not sure.
>
>> I think we should use the non-qualified version if we
>> are in a nested block where the rest of the qualified name is already
>> emitted. In the above example, we are in the block of 'task_struct'
>> differences, hence we can strip that part away from
>> task_struct::in_ubsan'.
>
>This does make sense to me.
>
>If we agree that we should change this behaviour, then maybe we should
>create an enhance request for this.  Shall we?

I think we do agree :-)

I opened Bug 26012 and 26013 to track those.

Cheers,
Matthias

>
>Thanks,
>
>Cheers,
>
>-- 
>		Dodji

Matthias Männich May 18, 2020, 8:16 p.m. UTC | #6

On Thu, May 14, 2020 at 01:39:48PM +0100, Giuliano Procida wrote:
>I'll just chime in...
>
>I concur that the initial change I proposed is dangerous in the sense that
>it allows abidiff to be used in a fashion that will not find certain ABI
>differences. Even if the mechanism makes sense, perhaps the policy does
>not. However, it's already possible to say --harmless --no-harmful.
>
>The splitting of SIZE/OFFSET categories may have some value for libabigail
>itself. I think SIZE would still be overloaded, including alignment and
>some other miscellaneous things.
>
>As for a concrete proposal, if the two of you feel you have reached a
>consensus, could one of you paste it into the issue tracker?
>
>Now, taking an even bigger step back, the source of this request is to
>further compress reports for human consumption. This means removing
>redundant information but it also means finding smaller diff descriptions
>(while still being legible and having meaningful context). We're
>considering ways of presenting different levels of details to our users,
>you might consider these abidiff options as making a hierarchy:
>
>--redundant
>--no-redundant
>--leaf-changes-only --impacted-interfaces
>--leaf-changes-only --impacted-interface-counts (this doesn't exist)
>--leaf-changes-only
>--leaf-changes-only --summarise-offset-changes (this doesn't exist)
>--leaf-changes-only --offset-changes-are-harmless (was this request)
>
>At the lowest level here, we'd only see root causes of ABI differences -
>useful for people who know there's some difference and just want to track
>it down.
>At the top level we get the full impact, with the change graph expanded
>into a tree as far as possible.
>
>It's quite expensive (in terms of run-time) to generate multiple reports
>(Matthias, we should fork and wait if we need to do this) with different
>variations. It's also (as the present example demonstrates) not ideal to
>add every variation to abidiff itself. So we are doing some things with
>post-processing. It might be straightforward to do this for offsets, for
>example. If there were an XML format for the diffs as well... :-)

I agree that we can solve the problem either in abigail and use
downstream more CPU to generate multiple reports at the same time or do
some postprocessing of the most verbose report. I like the idea of an
intermediate (XML?) format for the diff, actually. That would allow all
sorts of reports and different flags without touching libabigail's core
diffing.

Yet, my main motivation here to chime in was to point out that we should
improve the default for lossless compression and for human consumption.
For more detail, we can add verbosity flags. Maybe even like you
suggested above. But the first target group should be developers that
need to fix an actual breakage.

Cheers,
Matthias

>
>I hope this helps fill in a bit more of the background.
>
>Regards,
>Giuliano.
>
>On Thu, 14 May 2020 at 09:35, Dodji Seketeli <dodji@seketeli.org> wrote:
>
>> Matthias Maennich <maennich@google.com> a écrit:
>>
>> [...]
>>
>> > I think I would like the one line of mention as I suggested. Omitting it
>> > conditionally introduces more logic and likely we confuse users.
>>
>> Either way, we'll have to introdroce more logic anyway, so I am not
>> concerned about that, if, of course that is useful.
>>
>> I guess what I wanted to say is that the one line mention seems like a
>> good idea because that example is simple.  If you have more than one
>> non-contiguous addition/removal/changes of data members in the struct,
>> then I am not sure the one liner mention is still that "relevant".  Or
>> maybe in that case we can avoir the one liner mention and emit the
>> in-extenso message as we have today?  What do you think?
>>
>> > I also do not see them as redundant, just obvious and therefore a
>> > short form is enough to transmit this information.
>>
>> Hmmh, yes, I think you are right.  The offset changes are not redundant
>> per se.  They just happen to be caused by the change, but they don't
>> necessary have to be.  Point taken.
>>
>> [...]
>>
>>
>> >>Also, would it make sense to emit those qualified data member names when
>> >>we are analyzying a C++ program?  Or do you think it's more legible to
>> >>emit always emit non-qualified data member names?
>> >
>> > I think this very much applies to C structs as well. In fact, the above
>> > is a C example.
>>
>> Yeah, I got that.  I really meant my question.  I'll re-phrase it
>> differently.  Do you think we should avoid removing the
>> full-qualification when we are analyzing c++ programs?  Or do do you
>> think what you are proposing for C should apply for C++ programs
>> equally?  I would think yes, but I am not sure.
>>
>> > I think we should use the non-qualified version if we
>> > are in a nested block where the rest of the qualified name is already
>> > emitted. In the above example, we are in the block of 'task_struct'
>> > differences, hence we can strip that part away from
>> > task_struct::in_ubsan'.
>>
>> This does make sense to me.
>>
>> If we agree that we should change this behaviour, then maybe we should
>> create an enhance request for this.  Shall we?
>>
>> Thanks,
>>
>> Cheers,
>>
>> --
>>                 Dodji
>>

Dodji Seketeli May 20, 2020, 7:57 a.m. UTC | #7

Giuliano Procida <gprocida@google.com> a écrit:

[...]

> It's quite expensive (in terms of run-time) to generate multiple reports
> (Matthias, we should fork and wait if we need to do this) with different
> variations. It's also (as the present example demonstrates) not ideal to
> add every variation to abidiff itself. So we are doing some things with
> post-processing. 

[...]

> If there were an XML format for the diffs as well... :-)

Actually, I have been thinking about that at some point.  My thinking
was more around a textual serialization format for the diff reports.  I
think it's roughly equals what you mean by "XML format for the diffs".
I went ahead and started to design an XML schema to define the format.
The schema is in the relax-ng format and can be read at
https://sourceware.org/git/?p=libabigail.git;a=blob;f=doc/manuals/ifac-schema.rng;h=e0676f6a233770ab09f296db6cfd2fb5421ed60f;hb=9105467c9020ebbbea689c056e681b54287db201.

The (very light) introduction to that schema can be read at
https://sourceware.org/git/?p=libabigail.git;a=blob;f=doc/manuals/libabigail-xml-report.rst;h=0e690167b6bbe8b8629c33baece62f4e43810af2;hb=9105467c9020ebbbea689c056e681b54287db201.

The idea was to be able to emit a report in an abstract format so that
other tools could come up with html (or whatever interactive format)
reports as they'd see fit.  I wanted to be able to share that abstract
change reporting format with other tools like e.g, the
"abi-compliance-checker" tool, but the author didn't seem interested at
the time.

That was ~ 5 years ago.  I moved to other more pressing matters since
then.

> It might be straightforward to do this for offsets, for example.

We can do whatever we want with post-processing, granted, so you could
do the offsets change report formating there.  Nevertheless, I think
this is such a fundamental thing that abidiff ought to be able to do it
anyway.

Cheers,