Message ID | 20200504162439.74028-1-gprocida@google.com |
---|---|
Headers |
From: gprocida@google.com (Giuliano Procida) Date: Mon, 4 May 2020 17:24:36 +0100 Subject: [PATCH 0/3] Add an option to give finer-grained control of offset reporting. Message-ID: <20200504162439.74028-1-gprocida@google.com> |
Series |
Add an option to give finer-grained control of offset reporting.
|
|
Message
Giuliano Procida
May 4, 2020, 4:24 p.m. UTC
This is from an internal request. I'm pretty happy with the core changes. The new command line option and plumbing are what I think deserve more scrutiny. Do we want a more generic way of manipulating the change categories? This change takes us up to 3 flags. Regards, Giuliano. Giuliano Procida (3): abidiff.cc: tidy using directives and declarations Allow offset changes to be considered harmless. Add abidiff --offset-changes-are-harmless tests. include/abg-comparison.h | 35 ++++++---- src/abg-comp-filter.cc | 14 ++-- src/abg-comparison.cc | 15 +++- src/abg-reporter-priv.cc | 17 +++-- tests/data/Makefile.am | 8 +++ .../test-offset-harm-report0.txt | 15 ++++ .../test-offset-harm-report1.txt | 20 ++++++ .../test-offset-harm-report2.txt | 3 + .../test-offset-harm-report3.txt | 14 ++++ .../test-abidiff-exit/test-offset-harm-v0.c | 7 ++ .../test-abidiff-exit/test-offset-harm-v0.o | Bin 0 -> 2872 bytes .../test-abidiff-exit/test-offset-harm-v1.c | 7 ++ .../test-abidiff-exit/test-offset-harm-v1.o | Bin 0 -> 2864 bytes tests/test-abidiff-exit.cc | 36 ++++++++++ tools/abidiff.cc | 64 +++++++++++------- 15 files changed, 202 insertions(+), 53 deletions(-) create mode 100644 tests/data/test-abidiff-exit/test-offset-harm-report0.txt create mode 100644 tests/data/test-abidiff-exit/test-offset-harm-report1.txt create mode 100644 tests/data/test-abidiff-exit/test-offset-harm-report2.txt create mode 100644 tests/data/test-abidiff-exit/test-offset-harm-report3.txt create mode 100644 tests/data/test-abidiff-exit/test-offset-harm-v0.c create mode 100644 tests/data/test-abidiff-exit/test-offset-harm-v0.o create mode 100644 tests/data/test-abidiff-exit/test-offset-harm-v1.c create mode 100644 tests/data/test-abidiff-exit/test-offset-harm-v1.o
Comments
Matthias Maennich <maennich@google.com> a écrit: > As such this is a reporting issue as each single member offset change is > in fact an ABI breakage, and actually never harmless. Aggreed. Though, as I was saying in https://sourceware.org/pipermail/libabigail/2020q2/002229.html, we could avoid emitting them only when they are a consequence of an ABI change that we've already emitted, just like ... > What appears to me as a more effective way of addressing this would be > to collapse subsequent offset changes. Instead of dropping the reports > of those completely, how about we add an option that changes the report > from > > 'struct task_struct at sched.h:635:1' changed: > type size hasn't changed > 1 data member insertion: > 'unsigned int task_struct::in_ubsan', at offset 16704 (in bits) at sched.h:1006:1 > there are data member changes: > 'void* task_struct::journal_info' offset changed from 16704 to 16768 (in bits) (by +64 bits) > 'bio_list* task_struct::bio_list' offset changed from 16768 to 16832 (in bits) (by +64 bits) > 'blk_plug* task_struct::plug' offset changed from 16832 to 16896 (in bits) (by +64 bits) > 'reclaim_state* task_struct::reclaim_state' offset changed from 16896 to 16960 (in bits) (by +64 bits) > 'backing_dev_info* task_struct::backing_dev_info' offset changed from 16960 to 17024 (in bits) (by +64 bits) > 'io_context* task_struct::io_context' offset changed from 17024 to 17088 (in bits) (by +64 bits) > 'capture_control* task_struct::capture_control' offset changed from 17088 to 17152 (in bits) (by +64 bits) > 'unsigned long int task_struct::ptrace_message' offset changed from 17152 to 17216 (in bits) (by +64 bits) > 'kernel_siginfo_t* task_struct::last_siginfo' offset changed from 17216 to 17280 (in bits) (by +64 bits) > > to > > 'struct task_struct at sched.h:635:1' changed: > type size hasn't changed > 1 data member insertion: > 'unsigned int task_struct::in_ubsan', at offset 16704 (in bits) at sched.h:1006:1 > > offset of 9 consecutive data members changed by +64bits (journal_info .. last_siginfo) ... here! And here, I totally agree with you that this would be an interesting way to report it. But we could also completely do away with reporting about them because they are redundant, in a sense. Note that there is already a pass that flags redundant changes and avoid emitting them, so in a sense, this new particular case would not be unheard of, so to speak. > While looking at that, we likely can also reduce the name we print for > each data member: > > 'struct task_struct at sched.h:635:1' changed: > type size hasn't changed > 1 data member insertion: > 'unsigned int task_struct::in_ubsan', at offset 16704 (in bits) at sched.h:1006:1 So you would emit a qualified data member name here ... > there are data member changes: > 'journal_info' offset changed from 16704 to 16768 (in bits) (by +64 bits) > 'bio_list' offset changed from 16768 to 16832 (in bits) (by +64 bits) > 'plug' offset changed from 16832 to 16896 (in bits) (by +64 bits) > 'reclaim_state' offset changed from 16896 to 16960 (in bits) (by +64 bits) > 'backing_dev_info' offset changed from 16960 to 17024 (in bits) (by +64 bits) > 'io_context' offset changed from 17024 to 17088 (in bits) (by +64 bits) > 'capture_control' offset changed from 17088 to 17152 (in bits) (by +64 bits) > 'ptrace_message' offset changed from 17152 to 17216 (in bits) (by +64 bits) > 'last_siginfo' offset changed from 17216 to 17280 (in bits) (by +64 bits) but not here? Also, would it make sense to emit those qualified data member names when we are analyzying a C++ program? Or do you think it's more legible to emit always emit non-qualified data member names? I know I repeat myself, but if we are to work on "tasks", it'd be nice to file them as enhancement reports so that we discuss them at "one place" and so that we can track them. I can go ahead and file these if you guys want :-) Cheers,
Hi Dodji! On Wed, May 13, 2020 at 02:06:32PM +0200, Dodji Seketeli wrote: >Matthias Maennich <maennich@google.com> a écrit: > >> As such this is a reporting issue as each single member offset change is >> in fact an ABI breakage, and actually never harmless. > >Aggreed. Though, as I was saying in >https://sourceware.org/pipermail/libabigail/2020q2/002229.html, we could >avoid emitting them only when they are a consequence of an ABI change >that we've already emitted, just like ... > > >> What appears to me as a more effective way of addressing this would be >> to collapse subsequent offset changes. Instead of dropping the reports >> of those completely, how about we add an option that changes the report >> from >> >> 'struct task_struct at sched.h:635:1' changed: >> type size hasn't changed >> 1 data member insertion: >> 'unsigned int task_struct::in_ubsan', at offset 16704 (in bits) at sched.h:1006:1 >> there are data member changes: >> 'void* task_struct::journal_info' offset changed from 16704 to 16768 (in bits) (by +64 bits) >> 'bio_list* task_struct::bio_list' offset changed from 16768 to 16832 (in bits) (by +64 bits) >> 'blk_plug* task_struct::plug' offset changed from 16832 to 16896 (in bits) (by +64 bits) >> 'reclaim_state* task_struct::reclaim_state' offset changed from 16896 to 16960 (in bits) (by +64 bits) >> 'backing_dev_info* task_struct::backing_dev_info' offset changed from 16960 to 17024 (in bits) (by +64 bits) >> 'io_context* task_struct::io_context' offset changed from 17024 to 17088 (in bits) (by +64 bits) >> 'capture_control* task_struct::capture_control' offset changed from 17088 to 17152 (in bits) (by +64 bits) >> 'unsigned long int task_struct::ptrace_message' offset changed from 17152 to 17216 (in bits) (by +64 bits) >> 'kernel_siginfo_t* task_struct::last_siginfo' offset changed from 17216 to 17280 (in bits) (by +64 bits) >> >> to >> >> 'struct task_struct at sched.h:635:1' changed: >> type size hasn't changed >> 1 data member insertion: >> 'unsigned int task_struct::in_ubsan', at offset 16704 (in bits) at sched.h:1006:1 >> >> offset of 9 consecutive data members changed by +64bits (journal_info .. last_siginfo) > >... here! > >And here, I totally agree with you that this would be an interesting way >to report it. But we could also completely do away with reporting about >them because they are redundant, in a sense. Note that there is already >a pass that flags redundant changes and avoid emitting them, so in a >sense, this new particular case would not be unheard of, so to speak. I think I would like the one line of mention as I suggested. Omitting it conditionally introduces more logic and likely we confuse users. I also do not see them as redundant, just obvious and therefore a short form is enough to transmit this information. Does this make sense? > > >> While looking at that, we likely can also reduce the name we print for >> each data member: >> >> 'struct task_struct at sched.h:635:1' changed: >> type size hasn't changed >> 1 data member insertion: >> 'unsigned int task_struct::in_ubsan', at offset 16704 (in bits) at sched.h:1006:1 > >So you would emit a qualified data member name here ... > >> there are data member changes: >> 'journal_info' offset changed from 16704 to 16768 (in bits) (by +64 bits) >> 'bio_list' offset changed from 16768 to 16832 (in bits) (by +64 bits) >> 'plug' offset changed from 16832 to 16896 (in bits) (by +64 bits) >> 'reclaim_state' offset changed from 16896 to 16960 (in bits) (by +64 bits) >> 'backing_dev_info' offset changed from 16960 to 17024 (in bits) (by +64 bits) >> 'io_context' offset changed from 17024 to 17088 (in bits) (by +64 bits) >> 'capture_control' offset changed from 17088 to 17152 (in bits) (by +64 bits) >> 'ptrace_message' offset changed from 17152 to 17216 (in bits) (by +64 bits) >> 'last_siginfo' offset changed from 17216 to 17280 (in bits) (by +64 bits) > >but not here? Oh, that was an oversight. I meant to apply the simplification in all places. I.e. 'in_ubsan' only. > >Also, would it make sense to emit those qualified data member names when >we are analyzying a C++ program? Or do you think it's more legible to >emit always emit non-qualified data member names? I think this very much applies to C structs as well. In fact, the above is a C example. I think we should use the non-qualified version if we are in a nested block where the rest of the qualified name is already emitted. In the above example, we are in the block of 'task_struct' differences, hence we can strip that part away from task_struct::in_ubsan'. > >I know I repeat myself, but if we are to work on "tasks", it'd be nice >to file them as enhancement reports so that we discuss them at "one >place" and so that we can track them. I can go ahead and file these if >you guys want :-) Understood. And we will happily file them. I guess, what started with fixes here and there, became more frequent and more involved contributions without adjusting the process properly. Sorry about that. Cheers, Matthias > >Cheers, > >-- > Dodji
Matthias Maennich <maennich@google.com> a écrit: [...] > I think I would like the one line of mention as I suggested. Omitting it > conditionally introduces more logic and likely we confuse users. Either way, we'll have to introdroce more logic anyway, so I am not concerned about that, if, of course that is useful. I guess what I wanted to say is that the one line mention seems like a good idea because that example is simple. If you have more than one non-contiguous addition/removal/changes of data members in the struct, then I am not sure the one liner mention is still that "relevant". Or maybe in that case we can avoir the one liner mention and emit the in-extenso message as we have today? What do you think? > I also do not see them as redundant, just obvious and therefore a > short form is enough to transmit this information. Hmmh, yes, I think you are right. The offset changes are not redundant per se. They just happen to be caused by the change, but they don't necessary have to be. Point taken. [...] >>Also, would it make sense to emit those qualified data member names when >>we are analyzying a C++ program? Or do you think it's more legible to >>emit always emit non-qualified data member names? > > I think this very much applies to C structs as well. In fact, the above > is a C example. Yeah, I got that. I really meant my question. I'll re-phrase it differently. Do you think we should avoid removing the full-qualification when we are analyzing c++ programs? Or do do you think what you are proposing for C should apply for C++ programs equally? I would think yes, but I am not sure. > I think we should use the non-qualified version if we > are in a nested block where the rest of the qualified name is already > emitted. In the above example, we are in the block of 'task_struct' > differences, hence we can strip that part away from > task_struct::in_ubsan'. This does make sense to me. If we agree that we should change this behaviour, then maybe we should create an enhance request for this. Shall we? Thanks, Cheers,
I'll just chime in... I concur that the initial change I proposed is dangerous in the sense that it allows abidiff to be used in a fashion that will not find certain ABI differences. Even if the mechanism makes sense, perhaps the policy does not. However, it's already possible to say --harmless --no-harmful. The splitting of SIZE/OFFSET categories may have some value for libabigail itself. I think SIZE would still be overloaded, including alignment and some other miscellaneous things. As for a concrete proposal, if the two of you feel you have reached a consensus, could one of you paste it into the issue tracker? Now, taking an even bigger step back, the source of this request is to further compress reports for human consumption. This means removing redundant information but it also means finding smaller diff descriptions (while still being legible and having meaningful context). We're considering ways of presenting different levels of details to our users, you might consider these abidiff options as making a hierarchy: --redundant --no-redundant --leaf-changes-only --impacted-interfaces --leaf-changes-only --impacted-interface-counts (this doesn't exist) --leaf-changes-only --leaf-changes-only --summarise-offset-changes (this doesn't exist) --leaf-changes-only --offset-changes-are-harmless (was this request) At the lowest level here, we'd only see root causes of ABI differences - useful for people who know there's some difference and just want to track it down. At the top level we get the full impact, with the change graph expanded into a tree as far as possible. It's quite expensive (in terms of run-time) to generate multiple reports (Matthias, we should fork and wait if we need to do this) with different variations. It's also (as the present example demonstrates) not ideal to add every variation to abidiff itself. So we are doing some things with post-processing. It might be straightforward to do this for offsets, for example. If there were an XML format for the diffs as well... :-) I hope this helps fill in a bit more of the background. Regards, Giuliano. On Thu, 14 May 2020 at 09:35, Dodji Seketeli <dodji@seketeli.org> wrote: > Matthias Maennich <maennich@google.com> a écrit: > > [...] > > > I think I would like the one line of mention as I suggested. Omitting it > > conditionally introduces more logic and likely we confuse users. > > Either way, we'll have to introdroce more logic anyway, so I am not > concerned about that, if, of course that is useful. > > I guess what I wanted to say is that the one line mention seems like a > good idea because that example is simple. If you have more than one > non-contiguous addition/removal/changes of data members in the struct, > then I am not sure the one liner mention is still that "relevant". Or > maybe in that case we can avoir the one liner mention and emit the > in-extenso message as we have today? What do you think? > > > I also do not see them as redundant, just obvious and therefore a > > short form is enough to transmit this information. > > Hmmh, yes, I think you are right. The offset changes are not redundant > per se. They just happen to be caused by the change, but they don't > necessary have to be. Point taken. > > [...] > > > >>Also, would it make sense to emit those qualified data member names when > >>we are analyzying a C++ program? Or do you think it's more legible to > >>emit always emit non-qualified data member names? > > > > I think this very much applies to C structs as well. In fact, the above > > is a C example. > > Yeah, I got that. I really meant my question. I'll re-phrase it > differently. Do you think we should avoid removing the > full-qualification when we are analyzing c++ programs? Or do do you > think what you are proposing for C should apply for C++ programs > equally? I would think yes, but I am not sure. > > > I think we should use the non-qualified version if we > > are in a nested block where the rest of the qualified name is already > > emitted. In the above example, we are in the block of 'task_struct' > > differences, hence we can strip that part away from > > task_struct::in_ubsan'. > > This does make sense to me. > > If we agree that we should change this behaviour, then maybe we should > create an enhance request for this. Shall we? > > Thanks, > > Cheers, > > -- > Dodji >
On Thu, May 14, 2020 at 10:35:22AM +0200, Dodji Seketeli wrote: >Matthias Maennich <maennich@google.com> a écrit: > >[...] > >> I think I would like the one line of mention as I suggested. Omitting it >> conditionally introduces more logic and likely we confuse users. > >Either way, we'll have to introdroce more logic anyway, so I am not >concerned about that, if, of course that is useful. > >I guess what I wanted to say is that the one line mention seems like a >good idea because that example is simple. If you have more than one >non-contiguous addition/removal/changes of data members in the struct, >then I am not sure the one liner mention is still that "relevant". Or >maybe in that case we can avoir the one liner mention and emit the >in-extenso message as we have today? What do you think? > >> I also do not see them as redundant, just obvious and therefore a >> short form is enough to transmit this information. > >Hmmh, yes, I think you are right. The offset changes are not redundant >per se. They just happen to be caused by the change, but they don't >necessary have to be. Point taken. > >[...] > > >>>Also, would it make sense to emit those qualified data member names when >>>we are analyzying a C++ program? Or do you think it's more legible to >>>emit always emit non-qualified data member names? >> >> I think this very much applies to C structs as well. In fact, the above >> is a C example. > >Yeah, I got that. I really meant my question. I'll re-phrase it >differently. Do you think we should avoid removing the >full-qualification when we are analyzing c++ programs? Or do do you >think what you are proposing for C should apply for C++ programs >equally? I would think yes, but I am not sure. > >> I think we should use the non-qualified version if we >> are in a nested block where the rest of the qualified name is already >> emitted. In the above example, we are in the block of 'task_struct' >> differences, hence we can strip that part away from >> task_struct::in_ubsan'. > >This does make sense to me. > >If we agree that we should change this behaviour, then maybe we should >create an enhance request for this. Shall we? I think we do agree :-) I opened Bug 26012 and 26013 to track those. Cheers, Matthias > >Thanks, > >Cheers, > >-- > Dodji
On Thu, May 14, 2020 at 01:39:48PM +0100, Giuliano Procida wrote: >I'll just chime in... > >I concur that the initial change I proposed is dangerous in the sense that >it allows abidiff to be used in a fashion that will not find certain ABI >differences. Even if the mechanism makes sense, perhaps the policy does >not. However, it's already possible to say --harmless --no-harmful. > >The splitting of SIZE/OFFSET categories may have some value for libabigail >itself. I think SIZE would still be overloaded, including alignment and >some other miscellaneous things. > >As for a concrete proposal, if the two of you feel you have reached a >consensus, could one of you paste it into the issue tracker? > >Now, taking an even bigger step back, the source of this request is to >further compress reports for human consumption. This means removing >redundant information but it also means finding smaller diff descriptions >(while still being legible and having meaningful context). We're >considering ways of presenting different levels of details to our users, >you might consider these abidiff options as making a hierarchy: > >--redundant >--no-redundant >--leaf-changes-only --impacted-interfaces >--leaf-changes-only --impacted-interface-counts (this doesn't exist) >--leaf-changes-only >--leaf-changes-only --summarise-offset-changes (this doesn't exist) >--leaf-changes-only --offset-changes-are-harmless (was this request) > >At the lowest level here, we'd only see root causes of ABI differences - >useful for people who know there's some difference and just want to track >it down. >At the top level we get the full impact, with the change graph expanded >into a tree as far as possible. > >It's quite expensive (in terms of run-time) to generate multiple reports >(Matthias, we should fork and wait if we need to do this) with different >variations. It's also (as the present example demonstrates) not ideal to >add every variation to abidiff itself. So we are doing some things with >post-processing. It might be straightforward to do this for offsets, for >example. If there were an XML format for the diffs as well... :-) I agree that we can solve the problem either in abigail and use downstream more CPU to generate multiple reports at the same time or do some postprocessing of the most verbose report. I like the idea of an intermediate (XML?) format for the diff, actually. That would allow all sorts of reports and different flags without touching libabigail's core diffing. Yet, my main motivation here to chime in was to point out that we should improve the default for lossless compression and for human consumption. For more detail, we can add verbosity flags. Maybe even like you suggested above. But the first target group should be developers that need to fix an actual breakage. Cheers, Matthias > >I hope this helps fill in a bit more of the background. > >Regards, >Giuliano. > >On Thu, 14 May 2020 at 09:35, Dodji Seketeli <dodji@seketeli.org> wrote: > >> Matthias Maennich <maennich@google.com> a écrit: >> >> [...] >> >> > I think I would like the one line of mention as I suggested. Omitting it >> > conditionally introduces more logic and likely we confuse users. >> >> Either way, we'll have to introdroce more logic anyway, so I am not >> concerned about that, if, of course that is useful. >> >> I guess what I wanted to say is that the one line mention seems like a >> good idea because that example is simple. If you have more than one >> non-contiguous addition/removal/changes of data members in the struct, >> then I am not sure the one liner mention is still that "relevant". Or >> maybe in that case we can avoir the one liner mention and emit the >> in-extenso message as we have today? What do you think? >> >> > I also do not see them as redundant, just obvious and therefore a >> > short form is enough to transmit this information. >> >> Hmmh, yes, I think you are right. The offset changes are not redundant >> per se. They just happen to be caused by the change, but they don't >> necessary have to be. Point taken. >> >> [...] >> >> >> >>Also, would it make sense to emit those qualified data member names when >> >>we are analyzying a C++ program? Or do you think it's more legible to >> >>emit always emit non-qualified data member names? >> > >> > I think this very much applies to C structs as well. In fact, the above >> > is a C example. >> >> Yeah, I got that. I really meant my question. I'll re-phrase it >> differently. Do you think we should avoid removing the >> full-qualification when we are analyzing c++ programs? Or do do you >> think what you are proposing for C should apply for C++ programs >> equally? I would think yes, but I am not sure. >> >> > I think we should use the non-qualified version if we >> > are in a nested block where the rest of the qualified name is already >> > emitted. In the above example, we are in the block of 'task_struct' >> > differences, hence we can strip that part away from >> > task_struct::in_ubsan'. >> >> This does make sense to me. >> >> If we agree that we should change this behaviour, then maybe we should >> create an enhance request for this. Shall we? >> >> Thanks, >> >> Cheers, >> >> -- >> Dodji >>
Giuliano Procida <gprocida@google.com> a écrit: [...] > It's quite expensive (in terms of run-time) to generate multiple reports > (Matthias, we should fork and wait if we need to do this) with different > variations. It's also (as the present example demonstrates) not ideal to > add every variation to abidiff itself. So we are doing some things with > post-processing. [...] > If there were an XML format for the diffs as well... :-) Actually, I have been thinking about that at some point. My thinking was more around a textual serialization format for the diff reports. I think it's roughly equals what you mean by "XML format for the diffs". I went ahead and started to design an XML schema to define the format. The schema is in the relax-ng format and can be read at https://sourceware.org/git/?p=libabigail.git;a=blob;f=doc/manuals/ifac-schema.rng;h=e0676f6a233770ab09f296db6cfd2fb5421ed60f;hb=9105467c9020ebbbea689c056e681b54287db201. The (very light) introduction to that schema can be read at https://sourceware.org/git/?p=libabigail.git;a=blob;f=doc/manuals/libabigail-xml-report.rst;h=0e690167b6bbe8b8629c33baece62f4e43810af2;hb=9105467c9020ebbbea689c056e681b54287db201. The idea was to be able to emit a report in an abstract format so that other tools could come up with html (or whatever interactive format) reports as they'd see fit. I wanted to be able to share that abstract change reporting format with other tools like e.g, the "abi-compliance-checker" tool, but the author didn't seem interested at the time. That was ~ 5 years ago. I moved to other more pressing matters since then. > It might be straightforward to do this for offsets, for example. We can do whatever we want with post-processing, granted, so you could do the offsets change report formating there. Nevertheless, I think this is such a fundamental thing that abidiff ought to be able to do it anyway. Cheers,