Add new script add-abilist.py

Message ID 20150302080300.B464D4242A0BE@oldenburg.str.redhat.com
State Dropped
Headers

Commit Message

Florian Weimer March 2, 2015, 8:01 a.m. UTC
  This script adds new entries to the abilist files, in the right place
according to the sorting produced by scripts/abilist.awk.
---
 scripts/add-abilist.py | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 scripts/add-abilist.py
  

Comments

Roland McGrath March 2, 2015, 8:19 p.m. UTC | #1
What's wrong with 'make update-abi'?
  
Florian Weimer March 2, 2015, 8:49 p.m. UTC | #2
On 03/02/2015 09:19 PM, Roland McGrath wrote:
> What's wrong with 'make update-abi'?

It's very difficult to find. :-)  I haven't tried it, but I think it
only patches the current abilist file, and not the other architectures.
  
Roland McGrath March 2, 2015, 8:59 p.m. UTC | #3
> It's very difficult to find. :-)  I haven't tried it, but I think it
> only patches the current abilist file, and not the other architectures.

That's true.  It does an empirical update to the actual new reality, rather
than a specified updated to a putative new reality.
  
Florian Weimer March 2, 2015, 9:02 p.m. UTC | #4
On 03/02/2015 09:59 PM, Roland McGrath wrote:
>> It's very difficult to find. :-)  I haven't tried it, but I think it
>> only patches the current abilist file, and not the other architectures.
> 
> That's true.  It does an empirical update to the actual new reality, rather
> than a specified updated to a putative new reality.

I think both approaches have their advantages, at least for function
symbols.  Data symbols with their size information are more tricky.
  
Carlos O'Donell March 5, 2015, 7:30 p.m. UTC | #5
On 03/02/2015 04:02 PM, Florian Weimer wrote:
> On 03/02/2015 09:59 PM, Roland McGrath wrote:
>>> It's very difficult to find. :-)  I haven't tried it, but I think it
>>> only patches the current abilist file, and not the other architectures.
>>
>> That's true.  It does an empirical update to the actual new reality, rather
>> than a specified updated to a putative new reality.
> 
> I think both approaches have their advantages, at least for function
> symbols.  Data symbols with their size information are more tricky.

The `make update-abi` method that Roland describes is part of the general
glibc `Release` documentation here and referenced from the `Release`
process followed by a release manager or machine maintainer during each
release:

https://sourceware.org/glibc/wiki/Regeneration

I'm happy to have more utility scripts added to `scripts/`, but I would
also like to hear what use cases this is intended to help with and have
those documented in the `Regeneration` document so developers can learn
how to use it.

Cheers,
Carlos.
  
Florian Weimer March 5, 2015, 9:01 p.m. UTC | #6
On 03/05/2015 08:30 PM, Carlos O'Donell wrote:
> On 03/02/2015 04:02 PM, Florian Weimer wrote:
>> On 03/02/2015 09:59 PM, Roland McGrath wrote:
>>>> It's very difficult to find. :-)  I haven't tried it, but I think it
>>>> only patches the current abilist file, and not the other architectures.
>>>
>>> That's true.  It does an empirical update to the actual new reality, rather
>>> than a specified updated to a putative new reality.
>>
>> I think both approaches have their advantages, at least for function
>> symbols.  Data symbols with their size information are more tricky.
> 
> The `make update-abi` method that Roland describes is part of the general
> glibc `Release` documentation here and referenced from the `Release`
> process followed by a release manager or machine maintainer during each
> release:
> 
> https://sourceware.org/glibc/wiki/Regeneration

I was under the impression it is the job of the patch submitter to
ensure that a patch does not cause *known* test suite failures on any
architecture.  To me, this clearly means that I have to update the
*.abilist files in any patch that updates the ABI.

Doing it at release time might work as well, but it seems much better to
me to avoid any known test suite failures during development, so that
interpreting test suite results is easy as possible.
  
Carlos O'Donell March 5, 2015, 9:41 p.m. UTC | #7
On 03/05/2015 04:01 PM, Florian Weimer wrote:
> On 03/05/2015 08:30 PM, Carlos O'Donell wrote:
>> On 03/02/2015 04:02 PM, Florian Weimer wrote:
>>> On 03/02/2015 09:59 PM, Roland McGrath wrote:
>>>>> It's very difficult to find. :-)  I haven't tried it, but I think it
>>>>> only patches the current abilist file, and not the other architectures.
>>>>
>>>> That's true.  It does an empirical update to the actual new reality, rather
>>>> than a specified updated to a putative new reality.
>>>
>>> I think both approaches have their advantages, at least for function
>>> symbols.  Data symbols with their size information are more tricky.
>>
>> The `make update-abi` method that Roland describes is part of the general
>> glibc `Release` documentation here and referenced from the `Release`
>> process followed by a release manager or machine maintainer during each
>> release:
>>
>> https://sourceware.org/glibc/wiki/Regeneration
> 
> I was under the impression it is the job of the patch submitter to
> ensure that a patch does not cause *known* test suite failures on any
> architecture.  To me, this clearly means that I have to update the
> *.abilist files in any patch that updates the ABI.

You do.
 
> Doing it at release time might work as well, but it seems much better to
> me to avoid any known test suite failures during development, so that
> interpreting test suite results is easy as possible.

It is.

If your script helps with that, by say skipping the manual process of
running `make update-abi` on one system, and propagating that identical
change to all other target abi list files, then that's a good thing.

What you have proposed appears to be a "first step", in that it allows
you to add entries to all ABI files one symbol at a time.

Could we have a `make all-update-abi` target that ran `make update-abi`
and then ran your script to propagate all the changes to all other
targets? That's the only useful case I can come up with where your
script helps.

Thoughts?

Cheers,
Carlos.
  
Roland McGrath March 5, 2015, 10:03 p.m. UTC | #8
> Could we have a `make all-update-abi` target that ran `make update-abi`
> and then ran your script to propagate all the changes to all other
> targets? That's the only useful case I can come up with where your
> script helps.
> 
> Thoughts?

Anything with a manually-controlled step still assumes that the human gets
the details right.  Of course it would be caught eventually if they don't,
but it still seems like false security to me.

If we get to where we have a buildbot (even compile-only) for each
maintained configuration, then we can arrange to have the bot logs show
check-abi diffs in a digestible fashion.  Then one can commit the change
including the 'make update-abi'-created changes for every machine one
personally built for, wait for all the bots to cycle, and run some
bot-log-scraping script that has the effect of producing the same changes
'make update-abi' would do if you were able to build it locally.  Of course
that leaves a window of commit history where check-abi is in a failing
state for some subset of configurations, which is less than ideal.
  
Carlos O'Donell March 5, 2015, 10:06 p.m. UTC | #9
On 03/05/2015 05:03 PM, Roland McGrath wrote:
>> Could we have a `make all-update-abi` target that ran `make update-abi`
>> and then ran your script to propagate all the changes to all other
>> targets? That's the only useful case I can come up with where your
>> script helps.
>>
>> Thoughts?
> 
> Anything with a manually-controlled step still assumes that the human gets
> the details right.  Of course it would be caught eventually if they don't,
> but it still seems like false security to me.
> 
> If we get to where we have a buildbot (even compile-only) for each
> maintained configuration, then we can arrange to have the bot logs show
> check-abi diffs in a digestible fashion.  Then one can commit the change
> including the 'make update-abi'-created changes for every machine one
> personally built for, wait for all the bots to cycle, and run some
> bot-log-scraping script that has the effect of producing the same changes
> 'make update-abi' would do if you were able to build it locally.  Of course
> that leaves a window of commit history where check-abi is in a failing
> state for some subset of configurations, which is less than ideal.

Right, you would use trybot once we get it.

I'm arguing Florian's script is 50% of a solution. I've had to update
the abi files and it's annoying. Similarly Florian probably has also.

You really want:

(a) `make all-update-abi` to propagate `make update-abi` changes to all
    machine abis.

or

(b) A trybot that you can use to scrape the results of `make update-abi`
    which gives you perfect confidence that things are working as
    expected.

Cheers,
Carlos.
  
Roland McGrath March 5, 2015, 10:08 p.m. UTC | #10
> Right, you would use trybot once we get it.

Indeed.

> (a) `make all-update-abi` to propagate `make update-abi` changes to all
>     machine abis.

How do you imagine that can work?
  
Carlos O'Donell March 5, 2015, 10:17 p.m. UTC | #11
On 03/05/2015 05:08 PM, Roland McGrath wrote:
>> Right, you would use trybot once we get it.
> 
> Indeed.
> 
>> (a) `make all-update-abi` to propagate `make update-abi` changes to all
>>     machine abis.
> 
> How do you imagine that can work?

It can't, at least not 100% correctly. It's simply automation. The machine
maintainer still has to review, and if you check it in it could still fail,
but it's easier than the manual process I follow now.

Look at the check-localplt changes I made where I *tried* to get the symbols
of __tls_get_addr correct, and failed for at least 3 machines becuase the
ABI was structured in odd ways.

Nothing but trybot is 100% correct.

Cheers,
Carlos.
  
Florian Weimer March 6, 2015, 10:24 a.m. UTC | #12
On 03/05/2015 11:06 PM, Carlos O'Donell wrote:

> I'm arguing Florian's script is 50% of a solution. I've had to update
> the abi files and it's annoying. Similarly Florian probably has also.
> 
> You really want:
> 
> (a) `make all-update-abi` to propagate `make update-abi` changes to all
>     machine abis.

Can we change the format of the abi-list file so that it is a list of
rows, with space-separated columns

 <version> <symbol> <flags/size>

?  Then I can write the diffing and merging logic just with sort and
comm.  With the current format, I'd have to write another Python script
(or learn awk).
  
Carlos O'Donell March 6, 2015, 3:37 p.m. UTC | #13
On 03/06/2015 05:24 AM, Florian Weimer wrote:
> On 03/05/2015 11:06 PM, Carlos O'Donell wrote:
> 
>> I'm arguing Florian's script is 50% of a solution. I've had to update
>> the abi files and it's annoying. Similarly Florian probably has also.
>>
>> You really want:
>>
>> (a) `make all-update-abi` to propagate `make update-abi` changes to all
>>     machine abis.
> 
> Can we change the format of the abi-list file so that it is a list of
> rows, with space-separated columns
> 
>  <version> <symbol> <flags/size>
> 
> ?  Then I can write the diffing and merging logic just with sort and
> comm.  With the current format, I'd have to write another Python script
> (or learn awk).

Yeah, absolutely, you can change it to be whatever format
you think makes it easier to automate processing. However,
this also involves updating existing scripts that use the
old format, or rewriting such scripts in python.

I don't care if we repeat some data in the file if it simplifies
automation and thus simplifies my work as a developer.

Cheers,
Carlos.
  
Roland McGrath March 6, 2015, 5:55 p.m. UTC | #14
> Can we change the format of the abi-list file so that it is a list of
> rows, with space-separated columns
> 
>  <version> <symbol> <flags/size>
> 
> ?  Then I can write the diffing and merging logic just with sort and
> comm.  With the current format, I'd have to write another Python script
> (or learn awk).

Everyone should learn awk anyway.  But yes, I think it would be fine to
change the format to whatever is most convenient for maintenance.  The
original choices of format were somewhat motivated by keeping the files
smaller, but the scale of the file sizes in today's world makes this a
nonissue.
  
Joseph Myers March 6, 2015, 6:48 p.m. UTC | #15
On Thu, 5 Mar 2015, Carlos O'Donell wrote:

> (b) A trybot that you can use to scrape the results of `make update-abi`
>     which gives you perfect confidence that things are working as
>     expected.

One important case is that a bug in the change means that some 
architecture doesn't actually get the new function, and so if you don't 
update that architecture's ABI baselines in a way corresponding to other 
architectures, a make update-abi on that architecture results in no 
changes at all, and make check-abi does not fail there, and the new 
function is quietly missing there.

Manually updating ABIs, possibly helped by a script, to reflect the intent 
that the new function appears on all architectures, is one way to help 
avoid such bugs (by making them result in visible check-abi failures 
unless you accidentally miss updating the baselines for exactly the same 
architectures as miss the new functions).  Of course we should *also* add 
testcases that verify the new function calls at least link (even if no 
runtime test is possible without e.g. a new Linux kernel or running with 
privileges) - as that would also detect such problems.

(fallocate64 being added in 2.10 on 64-bit architectures but 2.11 on 
32-bit architectures 
<https://sourceware.org/ml/libc-hacker/2009-05/msg00003.html> is the 
prototypical example of a bug that could have been found this way.)
  
Carlos O'Donell March 6, 2015, 7:03 p.m. UTC | #16
On 03/06/2015 01:48 PM, Joseph Myers wrote:
> On Thu, 5 Mar 2015, Carlos O'Donell wrote:
> 
>> (b) A trybot that you can use to scrape the results of `make update-abi`
>>     which gives you perfect confidence that things are working as
>>     expected.
> 
> One important case is that a bug in the change means that some 
> architecture doesn't actually get the new function, and so if you don't 
> update that architecture's ABI baselines in a way corresponding to other 
> architectures, a make update-abi on that architecture results in no 
> changes at all, and make check-abi does not fail there, and the new 
> function is quietly missing there.

Oh, that's a really good point, and a very good reason for accepting
a script that simply updates all baselines.

> Manually updating ABIs, possibly helped by a script, to reflect the intent 
> that the new function appears on all architectures, is one way to help 
> avoid such bugs (by making them result in visible check-abi failures 
> unless you accidentally miss updating the baselines for exactly the same 
> architectures as miss the new functions).  Of course we should *also* add 
> testcases that verify the new function calls at least link (even if no 
> runtime test is possible without e.g. a new Linux kernel or running with 
> privileges) - as that would also detect such problems.

Agreed.

> (fallocate64 being added in 2.10 on 64-bit architectures but 2.11 on 
> 32-bit architectures 
> <https://sourceware.org/ml/libc-hacker/2009-05/msg00003.html> is the 
> prototypical example of a bug that could have been found this way.)

Yes.

Cheers,
Carlos.
  

Patch

diff --git a/scripts/add-abilist.py b/scripts/add-abilist.py
new file mode 100644
index 0000000..3d6b6b8
--- /dev/null
+++ b/scripts/add-abilist.py
@@ -0,0 +1,93 @@ 
+#!/usr/bin/python
+# Copyright (C) 2015 Free Software Foundation, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, see <http://www.gnu.org/licenses/>.
+
+# usage:
+#
+#   python scripts/add-abilist.py GLIBC_X.Y function1 function2 \
+#     --  $(find sysdeps/unix -name libc.abilist)
+#
+# By default, function symbols are added.  You can also add arguments
+# of the form "variable 0x4" to add symbols for variables.  (Note that
+# the size may be architecture-specific.)
+
+import sys
+
+def parse_arguments(argv):
+    if len(argv) < 5 or "--" not in argv[3:]:
+        sys.stderr.write(
+            "usage: {} VERSION SYMBOL... -- FILES...\n".format(argv[0]))
+        sys.exit(1)
+    version = argv[1]
+    for dashdash in range(2, len(argv)):
+        if argv[dashdash] == "--":
+            break
+    symbols = argv[2:dashdash]
+    files = argv[dashdash + 1:]
+    return version, symbols, files
+version, symbols, files = parse_arguments(sys.argv)
+
+def read_file(path):
+    result = {}
+    with file(path) as f:
+        version = None
+        for line in f:
+            if line[0] == ' ':
+                line = line.strip()
+                assert version is not None
+                try:
+                    result[version].add(line)
+                except KeyError:
+                    result[version] = set((line,))
+            else:
+                version = line.strip()
+    return result
+
+def write_file(path, versions):
+    with file(path, "w") as f:
+        for (version, symbols) in sorted(versions.items()):
+            f.write("{}\n".format(version))
+            for symbol in sorted(symbols):
+                f.write(" {}\n".format(symbol))
+
+def patch_file(path, version, symbols):
+    file_versions = read_file(path)
+    changed = []
+    try:
+        version_symbols = file_versions[version]
+    except KeyError:
+        version_symbols = set((version + " A",))
+        changed.append(version)
+        file_versions[version] = version_symbols
+    for symbol in symbols:
+        if symbol not in version_symbols:
+            changed.append(symbol)
+            if " " not in symbols:
+                symbol += " F" # Mark as function by default
+            version_symbols.add(symbol)
+    write_file(path, file_versions)
+    return tuple(changed)
+
+previous = ()
+for path in sorted(files):
+    if path[:2] == "./":
+        path = path[2:]
+    changed = patch_file(path, version, symbols)
+    if changed == previous:
+        changed = "Likewise."
+    else:
+        previous = changed
+        changed = "Add {}.".format(", ".join(changed))
+    sys.stdout.write("\t* {}: {}\n".format(path, changed))