Create script to convert old tests into Dwarf::assemble calls.

Message ID 20241205135550.9320-1-wqferr@gmail.com
State New
Headers
Series Create script to convert old tests into Dwarf::assemble calls. |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm success Test passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 success Test passed

Commit Message

William Ferreira Dec. 5, 2024, 1:55 p.m. UTC
  PR testsuite/32261 requests a script that could convert old .S-based
tests (that were made before dwarf.exp existed) into the new
Dwarf::assemble calls in Tcl. This commit is an initial implementation
of such a script. Python was chosen for convenience, and only relies on
a single external library.

Usage: the script operates not on .S files, but on ELF files with DWARF
information. To convert an old test, one must run said test via
`make check-gdb TESTS=testname` in their build directory. This will
produce, as a side effect, an ELF file the test used as an inferior, at
gdb/testsuite/outputs/testname/testname. This ELF file is this script's
input.

Reliability: not counting the limitations listed below, the script seems
functional enough to be worthy of discussion in the mailing list. I have
been testing it with different tests that already use Dwarf::assemble,
to see if the script can produce a similar call to it. Indeed, in the
cases that I've tested (including some more complex ones, marked with an
asterisk below) it is able to produce comparable output to the original
exp file. Of course, it can't reproduce the complex code *before* the
Dwarf::assemble call. Values calculated then are simply inlined.

The following .exp files have been tried in this way and their outputs
highly resemble the original:
- gdb.dwarf2/dynarr-ptr
- gdb.dwarf2/void-type
- gdb.dwarf2/ada-thick-pointer
- gdb.dwarf2/atomic-type
- gdb.dwarf2/dw2-entry-points (*)

The following .exp files DO NOT WORK with this script:
- gdb.dwarf2/cu-no-addrs
  - aranges not supported.
  - Compile unit hi_pc and low_pc hardcoded, prone to user error
    due to forgetting to replace with variables.

The following .exp files work, with caveats addressed in the limitations
section below.
- gdb.dwarf2/cpp-linkage-name
  - Works correctly except for one attribute of the form SPECIAL_expr.
- gdb.dwarf2/dw2-unusual-field-names
  - Same as above, with two instances of SPECIAL_expr.
- gdb.dwarf2/implptr-optimized-out
  - Same as above, with two instances of SPECIAL_expr.

Currently the script has the following known limitations:
- It does not support Tcl outside a small boilerplate and
  Dwarf::assemble calls.
- It does not support line tables.
- It does not use $srcfile and other variables in the call to
  Dwarf::assemble (since it can't know where it is safe to substitute).
- It does not support "fission" type DWARFs (in fact I still have no
  clue what those are).
- It does not support cu {label LABEL} {} CUs, mostly because I couldn't
  find the information using pyelftools.
- It sometimes outputs empty CUs at the start and end of the call. This
  might be a problem with my machine, but I've checked with DWARF dumps
  and they are indeed in the input ELF files generated by
  `make check-gdb`.
- It does not support attributes with the forms DW_FORM_block* and
  DW_FORM_exprloc. This is mostly not a concern of the difficulty of
  the implementation, but of it being an incomplete feature and, thus,
  more susceptible to users forgetting to correct its mistakes or
  unfinished values (please see discussion started by Guinevere at
  comment 23 https://sourceware.org/bugzilla/show_bug.cgi?id=32261#c23).
  The incompleteness of this feature is easy to demonstrate: any call to
  gdb_target_symbol, a common tool used in these attributes, needs a
  symbol name that is erased after compilation. There is no way to guess
  where that address being referenced in a DW_OP_addr comes from, and it
  can't be hard coded since it can change depending on the machine
  compiling it.

Please bring up any further shortcomings this script may have with your
expectations.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32261
---
 gdb/testsuite/lib/asm_to_dwarf_assembler.py | 830 ++++++++++++++++++++
 1 file changed, 830 insertions(+)
 create mode 100644 gdb/testsuite/lib/asm_to_dwarf_assembler.py
  

Comments

Simon Marchi Dec. 5, 2024, 5:25 p.m. UTC | #1
On 2024-12-05 08:55, William Ferreira wrote:
> PR testsuite/32261 requests a script that could convert old .S-based
> tests (that were made before dwarf.exp existed) into the new
> Dwarf::assemble calls in Tcl. This commit is an initial implementation
> of such a script. Python was chosen for convenience, and only relies on
> a single external library.

Thanks for doing this.  I don't have time for a deep review right now,
but I have a few superficial comments.

I'm not sure I would put it under testsuite/lib.  This directory
contains stuff used by the tests themselves (expect code) of test
programs.  Maybe right under testsuite?  There are already two Python
utils there.

Run black and flake8.  I would suggest setting up pre-commit for that:

https://sourceware.org/gdb/wiki/DeveloperTips#Setting_up_pre-commit

However, the flake8 entry in .pre-commit-config.yml isn't properly set
up to pick up gdb/testsuite/*.py, I will try to fix that right now.


> +def main(argv):
> +    try:
> +        filename = argv[1]
> +    except IndexError:
> +        print("Usage:")
> +        print("python ./asm_to_dwarf_assembler.py <path/to/elf/file>")
> +        sys.exit(1)

You import argparse but don't use it.  Any reason not to use it?

Simon
  
Luis Machado Dec. 5, 2024, 5:37 p.m. UTC | #2
On 12/5/24 17:25, Simon Marchi wrote:
> 
> 
> On 2024-12-05 08:55, William Ferreira wrote:
>> PR testsuite/32261 requests a script that could convert old .S-based
>> tests (that were made before dwarf.exp existed) into the new
>> Dwarf::assemble calls in Tcl. This commit is an initial implementation
>> of such a script. Python was chosen for convenience, and only relies on
>> a single external library.
> 
> Thanks for doing this.  I don't have time for a deep review right now,
> but I have a few superficial comments.
> 
> I'm not sure I would put it under testsuite/lib.  This directory
> contains stuff used by the tests themselves (expect code) of test
> programs.  Maybe right under testsuite?  There are already two Python
> utils there.

How about gdb/contrib?
  
Simon Marchi Dec. 5, 2024, 5:46 p.m. UTC | #3
On 2024-12-05 12:37, Luis Machado wrote:
> On 12/5/24 17:25, Simon Marchi wrote:
>>
>>
>> On 2024-12-05 08:55, William Ferreira wrote:
>>> PR testsuite/32261 requests a script that could convert old .S-based
>>> tests (that were made before dwarf.exp existed) into the new
>>> Dwarf::assemble calls in Tcl. This commit is an initial implementation
>>> of such a script. Python was chosen for convenience, and only relies on
>>> a single external library.
>>
>> Thanks for doing this.  I don't have time for a deep review right now,
>> but I have a few superficial comments.
>>
>> I'm not sure I would put it under testsuite/lib.  This directory
>> contains stuff used by the tests themselves (expect code) of test
>> programs.  Maybe right under testsuite?  There are already two Python
>> utils there.
> 
> How about gdb/contrib?
> 

I wouldn't expect to find things specific to the testsuite there.

Of course, to contradict that, I see there is:

  expect-read1.c
  expect-read1.sh

It seems like gdb/contrib/expect-read1.c is a simpler version of
gdb/testsuite/read1.c.  I wonder if the copy in contrib is really
used...

Simon
  
Luis Machado Dec. 5, 2024, 5:48 p.m. UTC | #4
On 12/5/24 17:46, Simon Marchi wrote:
> 
> 
> On 2024-12-05 12:37, Luis Machado wrote:
>> On 12/5/24 17:25, Simon Marchi wrote:
>>>
>>>
>>> On 2024-12-05 08:55, William Ferreira wrote:
>>>> PR testsuite/32261 requests a script that could convert old .S-based
>>>> tests (that were made before dwarf.exp existed) into the new
>>>> Dwarf::assemble calls in Tcl. This commit is an initial implementation
>>>> of such a script. Python was chosen for convenience, and only relies on
>>>> a single external library.
>>>
>>> Thanks for doing this.  I don't have time for a deep review right now,
>>> but I have a few superficial comments.
>>>
>>> I'm not sure I would put it under testsuite/lib.  This directory
>>> contains stuff used by the tests themselves (expect code) of test
>>> programs.  Maybe right under testsuite?  There are already two Python
>>> utils there.
>>
>> How about gdb/contrib?
>>
> 
> I wouldn't expect to find things specific to the testsuite there.

Fair enough. Contrib feels like a place to put misc helpers/scripts/tools that
get used sporadically.

> 
> Of course, to contradict that, I see there is:
> 
>   expect-read1.c
>   expect-read1.sh
> 
> It seems like gdb/contrib/expect-read1.c is a simpler version of
> gdb/testsuite/read1.c.  I wonder if the copy in contrib is really
> used...
> 
> Simon
  
Tom Tromey Dec. 9, 2024, 8:30 p.m. UTC | #5
>>>>> "Simon" == Simon Marchi <simark@simark.ca> writes:

>>> I'm not sure I would put it under testsuite/lib.  This directory
>>> contains stuff used by the tests themselves (expect code) of test
>>> programs.  Maybe right under testsuite?  There are already two Python
>>> utils there.

>> How about gdb/contrib?

Simon> I wouldn't expect to find things specific to the testsuite there.

I'd also suggest contrib:

It seems to me that running this program on the existing
pre-DWARF-assembler tests is a one-time thing: we convert them, remove
the old .s files, and move on.

However, the program can also be used (or modified so it can be...) to
convert other programs to DWARF assembler.  This will be useful when
writing new tests -- you can use it to generate a rough draft of the
test file.

In this case, contrib seems reasonable to me.


Another idea would be to make a new directory where we keep all the
maint-related Python code.  This would make it easier for us to reuse
code, have gdb-maint-specific modules for import & sharing, etc.

Tom
  
Guinevere Larsen Dec. 17, 2024, 4:43 p.m. UTC | #6
Thanks for working on this amazing script! This is really fancy,  and I 
really like what you got going

I have some specific feedback on some lines, but in general I really 
like where the script is going!
On 12/5/24 10:55 AM, William Ferreira wrote:
> PR testsuite/32261 requests a script that could convert old .S-based
> tests (that were made before dwarf.exp existed) into the new
> Dwarf::assemble calls in Tcl. This commit is an initial implementation
> of such a script. Python was chosen for convenience, and only relies on
> a single external library.
>
> Usage: the script operates not on .S files, but on ELF files with DWARF
> information. To convert an old test, one must run said test via
> `make check-gdb TESTS=testname` in their build directory. This will
> produce, as a side effect, an ELF file the test used as an inferior, at
> gdb/testsuite/outputs/testname/testname. This ELF file is this script's
> input.
>
> Reliability: not counting the limitations listed below, the script seems
> functional enough to be worthy of discussion in the mailing list. I have
> been testing it with different tests that already use Dwarf::assemble,
> to see if the script can produce a similar call to it. Indeed, in the
> cases that I've tested (including some more complex ones, marked with an
> asterisk below) it is able to produce comparable output to the original
> exp file. Of course, it can't reproduce the complex code *before* the
> Dwarf::assemble call. Values calculated then are simply inlined.
>
> The following .exp files have been tried in this way and their outputs
> highly resemble the original:
> - gdb.dwarf2/dynarr-ptr
> - gdb.dwarf2/void-type
> - gdb.dwarf2/ada-thick-pointer
> - gdb.dwarf2/atomic-type
> - gdb.dwarf2/dw2-entry-points (*)
>
> The following .exp files DO NOT WORK with this script:
> - gdb.dwarf2/cu-no-addrs
>    - aranges not supported.
>    - Compile unit hi_pc and low_pc hardcoded, prone to user error
>      due to forgetting to replace with variables.
>
> The following .exp files work, with caveats addressed in the limitations
> section below.
> - gdb.dwarf2/cpp-linkage-name
>    - Works correctly except for one attribute of the form SPECIAL_expr.
> - gdb.dwarf2/dw2-unusual-field-names
>    - Same as above, with two instances of SPECIAL_expr.
> - gdb.dwarf2/implptr-optimized-out
>    - Same as above, with two instances of SPECIAL_expr.
>
> Currently the script has the following known limitations:
> - It does not support Tcl outside a small boilerplate and
>    Dwarf::assemble calls.
I don't think we need to support anything other than creating the 
dwarf::assemble stuff. Everything else is usual boilerplate from the 
testsuite, so we're pretty familiar with it (or at least have plenty of 
things to double-check) while dwarf assembler isn't as common and much 
more complex.
> - It does not support line tables.
> - It does not use $srcfile and other variables in the call to
>    Dwarf::assemble (since it can't know where it is safe to substitute).
> - It does not support "fission" type DWARFs (in fact I still have no
>    clue what those are).
> - It does not support cu {label LABEL} {} CUs, mostly because I couldn't
>    find the information using pyelftools.
> - It sometimes outputs empty CUs at the start and end of the call. This
>    might be a problem with my machine, but I've checked with DWARF dumps
>    and they are indeed in the input ELF files generated by
>    `make check-gdb`.
> - It does not support attributes with the forms DW_FORM_block* and
>    DW_FORM_exprloc. This is mostly not a concern of the difficulty of
>    the implementation, but of it being an incomplete feature and, thus,
>    more susceptible to users forgetting to correct its mistakes or
>    unfinished values (please see discussion started by Guinevere at
>    comment 23 https://sourceware.org/bugzilla/show_bug.cgi?id=32261#c23).
>    The incompleteness of this feature is easy to demonstrate: any call to
>    gdb_target_symbol, a common tool used in these attributes, needs a
>    symbol name that is erased after compilation. There is no way to guess
>    where that address being referenced in a DW_OP_addr comes from, and it
>    can't be hard coded since it can change depending on the machine
>    compiling it.
>
> Please bring up any further shortcomings this script may have with your
> expectations.
>
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32261
> ---
>   gdb/testsuite/lib/asm_to_dwarf_assembler.py | 830 ++++++++++++++++++++
>   1 file changed, 830 insertions(+)
>   create mode 100644 gdb/testsuite/lib/asm_to_dwarf_assembler.py
>
> diff --git a/gdb/testsuite/lib/asm_to_dwarf_assembler.py b/gdb/testsuite/lib/asm_to_dwarf_assembler.py
> new file mode 100644
> index 00000000000..ea86c19c805
> --- /dev/null
> +++ b/gdb/testsuite/lib/asm_to_dwarf_assembler.py
> @@ -0,0 +1,830 @@
> +# Copyright 2024 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# Due to the pyelftools dependency, this script requires Python version
> +# 3.10 or greater to run.
> +
> +"""A utility to convert ELF files with DWARF info to Dwarf::assemble code.
> +
> +Usage:
> +    python ./asm_to_dwarf_assembler.py <path/to/elf/file>
> +
> +Dependencies:
> +    Python >= 3.10
> +    pyelftools >= 0.31
> +
> +Notes:
> +- Line tables are not currently supported.
> +- Non-contiguous subprograms are not currently supported.
> +- If you want to use $srcfile or similar, you must edit the references to the
> +  file name manually, including DW_AT_name attributes on compile units.
> +- If run with binaries generated by make check-gdb, it may include an
> +  additional compile_unit before and after the actual compile units. This is
> +  an artifact of the normal compilation process, as these CUs are indeed in
> +  the generated DWARF in some cases.
> +"""
> +
> +import sys
> +import argparse
> +import elftools
> +import re
> +
> +from io import BytesIO, IOBase
> +from typing import Any, Optional, Iterable, List, Union, Annotated
> +from functools import cache
> +from dataclasses import dataclass
> +from copy import copy
> +from datetime import datetime
> +from logging import getLogger
> +
> +from elftools import dwarf, elf
> +from elftools.elf.elffile import ELFFile
> +from elftools.dwarf.compileunit import CompileUnit as RawCompileUnit
> +from elftools.dwarf.die import DIE as RawDIE, AttributeValue
> +from elftools.common.exceptions import DWARFError
> +
> +
> +logger = getLogger(__file__)
> +
> +
> +EXPR_ATTRIBUTE_FORMS = [
> +    "DW_FORM_exprloc",
> +    "DW_FORM_block",
> +    "DW_FORM_block1",
> +    "DW_FORM_block2",
> +    "DW_FORM_block4",
> +]
Since these aren't supported, does it make sense to keep them here?
> +
> +
> +@dataclass
> +class DWARFOperation:
> +    """Values of the form DW_OP_<name>."""
> +    suffix: Annotated[str, 'Part of the name that succeeds "DW_OP_".']
> +    code: Annotated[int, "Numeric representation of the OP."]
> +    num_operands: Annotated[int, "Number of operands this operator expects."]
> +
> +
> +dwarf_operations: Annotated[
> +    dict[int, DWARFOperation],
> +    "Table of all DWARF operations as per table 7.9 of the DWARF5 spec."
> +] = {}
> +def _register_op(suffix: str, code: int, num_operands: int) -> None:
> +    assert code not in dwarf_operations, "Duplicate operation code found."
> +    dwarf_operations[code] = DWARFOperation(suffix, code, num_operands)
> +
> +def _register_all_ops() -> None:
> +    # Op codes 0x01 and 0x02 are reserved.
> +
> +    _register_op("addr", 0x03, 1)
> +
> +    # Op codes 0x04 and 0x05 are reserved.
> +
> +    _register_op("deref", 0x06, 0)
> +
> +    # Op code 0x07 is reserved.
> +
> +    _register_op("const1u", 0x08, 1)
> +    _register_op("const1s", 0x09, 1)
> +    _register_op("const2u", 0x0a, 1)
> +    _register_op("const2s", 0x0b, 1)
> +    _register_op("const4u", 0x0c, 1)
> +    _register_op("const4s", 0x0d, 1)
> +    _register_op("const8u", 0x0e, 1)
> +    _register_op("const8s", 0x0f, 1)
> +    _register_op("constu", 0x10, 1)
> +    _register_op("consts", 0x11, 1)
> +
> +    _register_op("dup", 0x12, 0)
> +    _register_op("drop", 0x13, 0)
> +    _register_op("over", 0x14, 0)
> +    _register_op("pick", 0x15, 1)
> +    _register_op("swap", 0x16, 0)
> +    _register_op("rot", 0x17, 0)
> +    _register_op("xderef", 0x18, 0)
> +
> +    _register_op("abs", 0x19, 0)
> +    _register_op("and", 0x1a, 0)
> +    _register_op("div", 0x1b, 0)
> +    _register_op("minus", 0x1c, 0)
> +    _register_op("mod", 0x1d, 0)
> +    _register_op("mul", 0x1e, 0)
> +    _register_op("neg", 0x1f, 0)
> +    _register_op("not", 0x20, 0)
> +    _register_op("or", 0x21, 0)
> +    _register_op("plus", 0x22, 0)
> +    _register_op("plus_uconst", 0x23, 1)
> +    _register_op("shl", 0x24, 0)
> +    _register_op("shr", 0x25, 0)
> +    _register_op("shra", 0x26, 0)
> +    _register_op("xor", 0x27, 0)
> +    _register_op("bra", 0x28, 1)
> +    _register_op("eq", 0x29, 0)
> +    _register_op("ge", 0x2a, 0)
> +    _register_op("gt", 0x2b, 0)
> +    _register_op("le", 0x2c, 0)
> +    _register_op("lt", 0x2d, 0)
> +    _register_op("ne", 0x2e, 0)
> +    _register_op("skip", 0x2f, 0)
> +
> +    for lit_nr in range(32):
> +        _register_op(f"lit{lit_nr}", 0x30 + lit_nr, 0)
> +
> +    for reg_nr in range(32):
> +        _register_op(f"reg{reg_nr}", 0x50 + reg_nr, 0)
> +
> +    for breg_nr in range(32):
> +        _register_op(f"breg{breg_nr}", 0x70 + breg_nr, 0)
> +
> +    _register_op("regx", 0x90, 1)
> +    _register_op("fbregx", 0x91, 1)
> +    _register_op("bregx", 0x92, 2)
> +    _register_op("piece", 0x93, 1)
> +    _register_op("deref_size", 0x94, 1)
> +    _register_op("xderef_size", 0x95, 1)
> +    _register_op("nop", 0x96, 0)
> +    _register_op("push_object_address", 0x97, 0)
> +
> +    _register_op("call2", 0x98, 1)
> +    _register_op("call4", 0x99, 1)
> +    _register_op("call_ref", 0x9a, 1)
> +    _register_op("form_tls_address", 0x9b, 0)
> +    _register_op("call_frame_cfa", 0x9c, 0)
> +
> +    _register_op("bit_piece", 0x9d, 2)
> +    _register_op("implicit_value", 0x9e, 2)
> +    _register_op("stack_value", 0x9f, 0)
> +    _register_op("implicit_pointer", 0xa0, 2)
> +    _register_op("addrx", 0xa1, 1)
> +    _register_op("constx", 0xa2, 1)
> +
> +    _register_op("entry_value", 0xa3, 2)
> +    _register_op("const_type", 0xa4, 3)
> +    _register_op("regval_type", 0xa5, 2)
> +    _register_op("deref_type", 0xa6, 2)
> +    _register_op("xderef_type", 0xa7, 2)
> +
> +    _register_op("convert", 0xa8, 1)
> +    _register_op("reinterpret", 0xa9, 1)
> +
> +
> +_register_all_ops()
Similar question to here, if I understood later code correctly. Do we 
want to keep this if we're not doing location expressions?
> +
> +
> +copyright_notice_template = """
> +# Copyright 2024-{current_year} Free Software Foundation, Inc.
If the script is adding copyright, it will be for a new file, and so 
copyright will always start on the year when the script is run. So it 
can just be {current_year}
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +""".lstrip()  # Strip to remove initial \n, but not the last one.
> +
> +# Workaround for my editor not to freak out over unclosed braces.
> +lbrace, rbrace = "{", "}"
> +
> +@cache
> +def get_indent_str(indent_count: int) -> str:
> +    """Get whitespace string to prepend to another for indenting."""
> +    indent = (indent_count // 2) * "\t"
> +    if indent_count % 2 == 1:
> +        indent += "    "
> +    return indent
> +
> +
> +def indent(line: str, indent_count: int) -> str:
> +    """Indent line by indent_count levels."""
> +    return get_indent_str(indent_count) + line
> +
> +
> +def labelify_str(s: str) -> str:
> +    """Make s appropriate for a label name."""
> +    # Replace "*" with the literal word "ptr".
> +    s = s.replace("*", "ptr")
> +
> +    # Replace any non-"word" characters by "_".
> +    s = re.sub(r"\W", "_", s)
> +
> +    # Remove consecutive "_"s.
> +    s = re.sub(r"__+", "_", s)
> +
> +    return s
> +
> +
> +class DWARFAttribute:
> +    """Storage unit for a single DWARF attribute.
> +
> +    All its values are strings that are usually passed on
> +    directly to format. The exceptions to this are attributes
> +    with int values with DW_FORM_ref4 or DW_FORM_ref_addr form.
> +    Their values are interpreted as the global offset of the DIE
> +    being referenced, which are looked up dynamically to fetch
> +    their labels.
> +    """
> +    def __init__(
> +        self,
> +        die_offset: int,
> +        name: str,
> +        value: str | bytes | int | bool,
> +        form=None,
> +    ):
> +        self.die_offset = die_offset
> +        self.name = name
> +        self.value = value
> +        self.form = form
> +
> +    def _format_expr_value(self) -> str:
> +        # This code is left here in case this is feature is desired in the
> +        # future. It is partly functional, being able to decode operations
> +        # but not operands due to their variable length in bytes.
> +
> +        # self.form = "SPECIAL_expr"
> +        # lines = [lbrace]
> +        # curr_idx = 0
> +        # while curr_idx < len(self.value):
> +        #     curr_op_code = self.value[curr_idx]
> +        #
> +        #     try:
> +        #         curr_op: DWARFOperation = dwarf_operations[curr_op_code]
> +        #     except KeyError as err:
> +        #         raise KeyError(
> +        #             f"Unknown op: {curr_op_code}, item {curr_idx} of"
> +        #             f" attribute {self.name}, DIE at offset"
> +        #             f" {self.die_offset}."
> +        #         ) from err
> +        #     # Empty arg here so it inserts a space before the first argument,
> +        #     # only if there is indeed a first argument.
> +        #     args = [""]
> +        #     for arg_nr in range(curr_op.num_operands):
> +        #         curr_idx += 1
> +        #         # TODO decode bytes according to specific operation.
> +        #         # The following line only reads the next BYTE of the sequence,
> +        #         # and needs to be adjusted to read the correct number of
> +        #         # bytes.
> +        #         # args.append(str(self.value[curr_idx]))
> +        #     op_line = "DW_OP_" + curr_op.suffix + " ".join(args)
> +        #     lines.append(op_line)
> +        #     curr_idx += 1
> +        #
> +        # lines.append(rbrace)
> +        # return "\n".join(lines)
> +        self.form = "SPECIAL_expr"
> +        return "{ TODO: Fill expr list }"
Maybe, rather than "TODO" you could add "MANUAL" or something, to make 
it clear that this is a manual process and expected to be so? (naming is 
hard, but I think todo implies that this will be improved soon, and I 
don't think that'd be the case?)
> +
> +    def _needs_escaping(self, str_value: str) -> bool:
> +        charset = set(str_value)
> +        return bool(charset.intersection({"{", "}", " ", "\t"}))
> +
> +    def _format_str(self, str_value: str) -> str:
> +        if self._needs_escaping(str_value):
> +            escaped_str = str(str_value)
> +            # Replace single escape (which is itself escaped because of regex)
> +            # with a double escape (which doesn't mean anything to regex so
> +            # it doesn't need escaping).
> +            escaped_str = re.sub(r"\\", r"\\", escaped_str)
> +            escaped_str = re.sub("([{}])", r"\\\1", escaped_str)
> +            return "{" + escaped_str + "}"
> +        else:
> +            return str_value
> +
> +    def _format_value(
> +        self,
> +        offset_die_lookup: dict[int, "DWARFDIE"],
> +        indent_count: int = 0
> +    ) -> str:
> +        if self.form in EXPR_ATTRIBUTE_FORMS:
> +            return self._format_expr_value()
> +        elif isinstance(self.value, bool):
> +            return str(int(self.value))
> +        elif isinstance(self.value, int):
> +            if self.form == "DW_FORM_ref4":
> +                # ref4-style referencing label.
> +                die: "DWARFDIE" = offset_die_lookup[self.value]
> +                return ":$" + die.tcl_label
> +            elif self.form == "DW_FORM_ref_addr":
> +                # ref_addr-style referencing label.
> +                die: "DWARFDIE" = offset_die_lookup[self.value]
> +                return "%$" + die.tcl_label
> +            else:
> +                return str(self.value)
> +        elif isinstance(self.value, bytes):
> +            return self._format_str(self.value.decode("ascii"))
> +        elif isinstance(self.value, str):
> +            return self._format_str(self.value)
> +        else:
> +            raise NotImplementedError(
> +                "Unknown data type: " + str(type(self.value))
> +            )
> +
> +    def format(
> +        self,
> +        offset_die_lookup: dict[int, "DWARFDIE"],
> +        indent_count: int = 0
> +    ) -> str:
> +        """Format the attribute in the form {name value form}.
> +
> +        If form is DW_FORM_exprloc or DW_FORM_block, see next section on
> +        DWARFOperations.
> +
> +        If it isn't, value is formatted as follows:
> +            If bool, use "1" if True, "0" if False.
> +            If int:
> +                If form is DW_FORM_ref4, use ":$label" where label is the
> +                    tcl_label of the DWARFDIE at offset "value".
> +                If form is DW_FORM_ref_addr, use "%$label" where label is
> +                    the tcl_label of the DWARFDIE at offset "value".
> +                Else, use value directly.
> +            If bytes, use value.decode("ascii")
> +            If str, use value directly.
> +            Any other type results in a NotImplementedError being raised.
> +
> +        Regarding DW_FORM_exprloc and DW_FORM_block:
> +            The form is replaced with SPECIAL_expr.
> +            The entries in the value are interpreted and decoded using the
> +            dwarf_operations dictionary, and replaced with their names where
> +            applicable.
> +        """
> +        s = lbrace
> +        s += self.name + " "
> +        s += self._format_value(offset_die_lookup)
> +
> +        # Only explicitly state form if it's not a reference.
> +        if self.form not in [None, "DW_FORM_ref4", "DW_FORM_ref_addr"]:
> +            s += " " + self.form
> +
> +        s += rbrace
> +        return indent(s, indent_count)
> +
> +
> +class DWARFDIE:
> +    """This script's parsed version of a RawDIE."""
> +    def __init__(
> +        self,
> +        offset: int,
> +        tag: str,
> +        attrs: dict[str, DWARFAttribute],
> +        tcl_label: Optional[str] = None
> +    ):
> +        self.offset = offset
> +        self.tag = tag
> +        self.attrs = copy(attrs)
> +        self.children = []
> +        self.tcl_label = tcl_label

I think it would be worth a comment here or in format lines, regarding 
what tcl_label is actually about.

it might not be self-evident what these are, especially for people 
unfamiliar with the assembler.

> +
> +    def format_lines(
> +        self,
> +        offset_die_lookup: dict[int, "DWARFDIE"],
> +        indent_count: int = 0
> +    ) -> list[str]:
> +        """Get the list of lines that represent this DIE in Dwarf assembler."""
> +        die_lines = []
> +
> +        # Prepend label to first line, if it's set.
> +        if self.tcl_label:
> +            first_line_start = self.tcl_label + ": "
> +        else:
> +            first_line_start = ""
> +
> +        # First line, including label.
> +        first_line = indent(
> +            first_line_start + self.tag + " " + lbrace,
> +            indent_count
> +        )
> +        die_lines.append(first_line)
> +
> +        # Format attributes, if any.
> +        if self.attrs:
> +            for attr_name, attr in self.attrs.items():
> +                attr_line = attr.format(
> +                    offset_die_lookup,
> +                    indent_count=indent_count+1
> +                )
> +                die_lines.append(attr_line)
> +            die_lines.append(indent(rbrace, indent_count))
> +        else:
> +            # Don't create a new line, just append and immediately close the
> +            # brace on the last line.
> +            die_lines[-1] += rbrace
> +
> +        # Format children, if any.
> +        if self.children:
> +            # Only open a new brace if there are any children for the
> +            # current DIE.
> +            die_lines[-1] += " " + lbrace
> +            for child in self.children:
> +                child_lines = child.format_lines(
> +                    offset_die_lookup,
> +                    indent_count=indent_count+1
> +                )
> +                die_lines.extend(child_lines)
> +            die_lines.append(indent(rbrace, indent_count))
> +
> +        return die_lines
> +
> +    def format(
> +        self,
> +        offset_die_lookup: dict[int, "DWARFDIE"],
> +        indent_count: int = 0
> +    ) -> str:
> +        """Join result from format_lines into a single str."""
> +        return "\n".join(self.format_lines(offset_die_lookup, indent_count))
> +
> +    def name(self) -> Optional[str]:
> +        """Get DW_AT_name (if present) decoded as ASCII."""
> +        raw_value = self.attrs.get("DW_AT_name")
> +        if raw_value is None:
> +            return None
> +        else:
> +            return raw_value.value.decode("ascii")
> +
> +    def type_name(self) -> str:
> +        """Name of Dwarf tag, with the "DW_TAG_" prefix removed."""
> +        return re.sub("DW_TAG_", "", self.tag)
> +
> +class DWARFCompileUnit(DWARFDIE):
> +    """Wrapper subclass for CU DIEs.
> +
> +    This is necessary due to the special format CUs take in Dwarf::assemble.
> +
> +    Instead of simply:
> +        DW_TAG_compile_unit {
> +            <attributes>
> +        } {
> +            <children>
> +        }
> +
> +    CUs are formatted as:
> +        cu { <cu_special_vars> } {
> +            DW_TAG_compile_unit {
> +                <attributes>
> +            } {
> +                <children>
> +            }
> +        }
> +    """
> +
> +    # Default value for parameter dwarf_version defined in dwarf.exp line 1552.
> +    default_dwarf_version = 4
> +
> +    # Default value for parameter is_fission defined in dwarf.exp line 1556.
> +    # Currently not implemented, see comment below.
> +    # default_is_fission = False
> +
> +    # Tag that signifies a DIE is a compile unit.
> +    compile_unit_tag = "DW_TAG_compile_unit"
> +
> +    def __init__(
> +        self,
> +        raw_die: RawDIE,
> +        raw_cu: RawCompileUnit,
> +        attrs: dict[str, DWARFAttribute],
> +    ):
> +        """Initialize additional instance variables for CU encoding.
> +
> +        The additional instance variables are:
> +            - is_64_bit: Optional[bool]
> +              default None
> +              Whether this CU is 64 bit or not.
> +            - dwarf_version: int
> +              default DWARFCompileUnit.default_dwarf_version
> +              Version of DWARF this CU is using.
> +            - addr_size: Optional[int]
> +              default None
> +              Size of an address in bytes.
> +
> +        These variables are used to configure the first parameter of the cu
> +        proc (which contains calls to the compile_unit proc in the body of
> +        Dwarf::assemble).
> +        """
> +        super().__init__(
> +            raw_die.offset,
> +            DWARFCompileUnit.compile_unit_tag,
> +            attrs
> +        )
> +        self.raw_cu = raw_cu
> +        self.is_64_bit: Optional[bool] = None
> +        self.dwarf_version: int = raw_cu.header.get(
> +            "version",
> +            DWARFCompileUnit.default_dwarf_version
> +        )
> +        self.addr_size: Optional[int] = raw_cu.header["address_size"]
> +
> +        # Fission is not currently implemented because I don't know where to
> +        # fetch this information from.
> +        # self.is_fission: bool = self.default_is_fission
> +
> +        # CU labels are not currently implemented because I haven't found where
> +        # pyelftools exposes this information.
> +        # self.cu_label: Optional[str] = None
> +
> +    def format_lines(
> +        self,
> +        offset_die_lookup: dict[int, DWARFDIE],
> +        indent_count: int = 0,
> +    ) -> list[str]:
> +        lines = []
> +        lines.append(self._get_header(indent_count))
> +        inner_lines = super().format_lines(
> +            offset_die_lookup,
> +            indent_count + 1
> +        )
> +        lines += inner_lines
> +        lines.append(indent(rbrace, indent_count))
> +        return lines
> +
> +    def _get_header(self, indent_count: int = 0) -> str:
> +        """Assemble the first line of the surrounding 'cu {} {}' proc call."""
> +        header = indent("cu " + lbrace, indent_count)
> +        cu_params = []
> +
> +        if self.is_64_bit is not None:
> +            # Convert from True/False to 1/0.
> +            param_value = int(self.is_64_bit)
> +            cu_params += ["is_64", str(param_value)]
> +
> +        if self.dwarf_version != DWARFCompileUnit.default_dwarf_version:
> +            cu_params += ["version", str(self.dwarf_version)]
> +
> +        if self.addr_size is not None:
> +            cu_params += ["addr_size", str(self.addr_size)]
> +
> +        # Fission is not currently implemented, see comment above.
> +        # if self.is_fission != DWARFCompileUnit.default_is_fission:
> +        #     # Same as is_64_bit conversion, True/False -> 1/0.
> +        #     param_value = int(self.is_fission)
> +        #     cu_params += ["fission", str(param_value)]
> +
> +        # CU labels are not currently implemented, see commend above.
> +        # if self.cu_label is not None:
> +        #     cu_params += ["label", self.cu_label]
> +
> +        if cu_params:
> +            header += " ".join(cu_params)
> +
> +        header += rbrace + " " + lbrace
> +        return header
> +
> +class DWARFParser:
> +    """Converter from pyelftools's DWARF representation to this script's."""
> +
> +    def __init__(self, elf_file: IOBase):
> +        """Init parser with file opened in binary mode.
> +
> +        File can be closed after this function is called.
> +        """
> +        self.raw_data = BytesIO(elf_file.read())
> +        self.elf_data = ELFFile(self.raw_data)
> +        self.dwarf_info = self.elf_data.get_dwarf_info()
> +        self.offset_to_die: dict[int, DWARFDIE] = {}
> +        self.label_to_die: dict[str, DWARFDIE] = {}
> +        self.referenced_offsets: set[int] = set()
> +        self.raw_cu_list: list[RawCompileUnit] = []
> +        self.top_level_dies: list[DWARFDIE] = []
> +        self.subprograms: list[DWARFDIE] = []
> +        self.taken_labels: set[str] = set()
> +
> +        self._read_cus()
> +        self._create_necessary_labels()
> +
> +    def _read_cus(self):
I think I'd personally rename this to _read_all_cus or something like 
that. It is easy to miss the S at the end and typo the function name or 
get confused.
> +        """Populate self.raw_cu_list with all CUs in self.dwarf_info."""
> +        for cu in self.dwarf_info.iter_CUs():
> +            self._read_cu(cu)
> +
> +    def _read_cu(self, raw_cu: RawCompileUnit):
> +        """Read a compile_unit into self.cu_list."""
> +        self.raw_cu_list.append(raw_cu)
> +        for raw_die in raw_cu.iter_DIEs():
> +            if not raw_die.is_null():
> +                self._parse_die(raw_die)
> +
> +    def _parse_die(self, raw_die: RawDIE) -> DWARFDIE:
> +        """Process a single DIE and add it to offset_to_die.
> +
> +        Look for DW_FORM_ref4 and DWD_FORM_ref_addr form attributes and replace
> +        them with the global offset of the referenced DIE, and marking the
> +        referenced DIE as "referenced". This will be used later to assign and
> +        use labels.
> +
> +        In case the DIE is a top-level DIE, add it to self.top_level_dies.
> +
> +        In case the DIE is a subprogram, add it to self.subprograms and call
> +        self._use_vars_for_low_and_high_pc_attr with it.
> +        """
> +        processed_attrs = {}
> +        attr_value: AttributeValue
> +        for attr_name, attr_value in raw_die.attributes.items():
> +            actual_value = attr_value.value
> +            if attr_value.form in ("DW_FORM_ref4", "DW_FORM_ref_addr"):
> +                referenced_die = raw_die.get_DIE_from_attribute(attr_name)
> +                actual_value = referenced_die.offset
> +                self.referenced_offsets.add(referenced_die.offset)
> +
> +            processed_attrs[attr_name] = DWARFAttribute(
> +                raw_die.offset,
> +                attr_name,
> +                actual_value,
According to the comment above, this should not send actual_value, but 
instead the string "referenced". Is this an oversight here or (more 
likely) in the comment?
> +                attr_value.form
> +            )
> +
> +        if raw_die.tag == DWARFCompileUnit.compile_unit_tag:
> +            # FIXME: This isn't pretty, as it relies on global parser state.
> +            # It also only works under the assumption CUs can't be nested.
> +            die_cu = self.raw_cu_list[-1]

We try and not get code into GDB that has a FIXME, so this has to be 
resolved before going in.

Would it make sense to call _parse_die with the CU that the DIE comes from?

> +            processed_die = DWARFCompileUnit(raw_die, die_cu, processed_attrs)
> +        else:
> +            processed_die = DWARFDIE(
> +                raw_die.offset,
> +                raw_die.tag,
> +                processed_attrs,
> +                None
> +            )
> +
> +        if raw_die.get_parent() is None:
> +            # Top level DIE
> +            self.top_level_dies.append(processed_die)
> +        else:
> +            # Setting the parent here assumes the parent was already processed
> +            # prior to this DIE being found.
> +            # As far as I'm aware, this is always true in DWARF.
> +            processed_parent = self.offset_to_die[raw_die.get_parent().offset]
> +            processed_parent.children.append(processed_die)
> +
> +        if processed_die.tag == "DW_TAG_subprogram":
> +            self.subprograms.append(processed_die)
> +            self._use_vars_for_low_and_high_pc_attr(processed_die)
> +
> +        self.offset_to_die[processed_die.offset] = processed_die
> +        return processed_die
> +
> +    def _create_necessary_labels(self):
> +        """Create labels to DIEs that were referenced by others."""
> +        for offset in self.referenced_offsets:
> +            die = self.offset_to_die[offset]
> +            self._create_label_for_die(die)
> +
> +    def _use_vars_for_low_and_high_pc_attr(self, subprogram: DWARFDIE) -> None:
> +        """Replace existing PC attributes with Tcl variables.
> +
> +        If DW_AT_low_pc exists for this DIE, replace it with accessing the
> +        variable whose name is given by self.subprogram_start_var(subprogram).
> +
> +        If DW_AT_high_pc exists for this DIE, replace it with accessing the
> +        variable whose name is given by self.subprogram_end_var(subprogram).
> +        """
> +        low_pc_attr_name = "DW_AT_low_pc"
> +        if low_pc_attr_name in subprogram.attrs:
> +            start = self.subprogram_start_var(subprogram)
> +            subprogram.attrs[low_pc_attr_name].value = start
> +
> +        high_pc_attr_name = "DW_AT_high_pc"
> +        if high_pc_attr_name in subprogram.attrs:
> +            end = self.subprogram_end_var(subprogram)
> +            subprogram.attrs[high_pc_attr_name].value = end
> +
> +    def _create_label_for_die(self, die: DWARFDIE) -> None:
> +        """Set tcl_label to a unique string among other DIEs for this parser.
> +
> +        As a first attempt, use labelify(die.name()). If the DIE does not have
> +        a name, use labelify(die.type_name()).
> +
> +        If the chosen initial label is already taken, try again appending "_2".
> +        While the attempt is still taken, try again replacing it with "_3", then
> +        "_4", and so on.
> +
> +        This function also creates an entry on self.label_to_die.
> +        """
> +        if die.tcl_label is not None:
> +            return
> +
> +        label = labelify_str(die.name() or die.type_name())
> +
> +        # Deduplicate label in case of collision
> +        if label in self.taken_labels:
> +            suffix_nr = 2
> +
> +            # Walrus operator to prevent writing the assembled label_suffix
> +            # string literal twice. This could be rewritten by copying the
> +            # string literal to the line after the end of the while loop,
> +            # but I deemed it would be too frail in case one of them needs
> +            # to be changed and the other is forgotten.
> +            while (new_label := f"{label}_{suffix_nr}") in self.taken_labels:
> +                suffix_nr += 1
> +            label = new_label
> +
> +        die.tcl_label = label
> +        self.label_to_die[label] = die
> +        self.taken_labels.add(label)
> +
> +    def subprogram_start_var(self, subprogram: DWARFDIE) -> str:
> +        """Name of the Tcl variable that holds the low PC for a subprogram."""
> +        return f"${subprogram.name()}_start"
> +
> +    def subprogram_end_var(self, subprogram: DWARFDIE) -> str:
> +        """Name of the Tcl variable that holds the high PC for a subprogram."""
> +        return f"${subprogram.name()}_end"
> +
> +    def all_labels(self) -> set[str]:
> +        """Get a copy of the set of all labels known to the parser so far."""
> +        return copy(self.taken_labels)
> +
> +
> +class DWARFAssemblerGenerator:
> +    """Class that generates Dwarf::assemble code out of a DWARFParser."""
> +
> +    def __init__(self, dwarf_parser: DWARFParser, output=sys.stdout):
> +        self.dwarf_parser = dwarf_parser
> +        self.output = output
> +
> +    def emit(self, line: str, indent_count: int) -> None:
> +        """Print a single line indented indent_count times to self.output.
> +
> +        If line is empty, it will always print an empty line, even with nonzero
> +        indent_count.
> +        """
> +        if line:
> +            line = get_indent_str(indent_count) + line
> +        print(line, file=self.output)
> +
> +    def generate_die(self, die: DWARFDIE, indent_count: int):
> +        """Generate the lines that represent a DIE."""
> +        die_lines = die.format(self.dwarf_parser.offset_to_die, indent_count)
> +        self.emit(die_lines, 0)
> +
> +    def generate(self):
> +        indent_count = 0
> +
> +        year = datetime.now().year
> +        copyright_notice = copyright_notice_template.format(current_year=year)
> +        for copyright_line in copyright_notice.split("\n"):
> +            self.emit(copyright_line, indent_count)
> +
> +        self.emit("load_lib dwarf.exp", indent_count)
> +        self.emit("require dwarf2_support", indent_count)
> +        self.emit("set asm_file [standard_output_file $srcfile2]", indent_count)
> +
> +        self.emit("", 0)
> +        self.emit("# TODO: call prepare_for_testing", indent_count)
> +
> +        self.emit("", 0)
> +        self.emit("Dwarf::assemble $asm_file " + lbrace, indent_count)
Honestly, you can just have "Dwarf::assemble $srcfile2" I think. no need 
to set an asm_file
> +
> +        # Begin Dwarf::assemble body.
> +        indent_count += 1
> +        self.emit("global srcdir subdir srcfile", indent_count)
> +
> +        all_labels = self.dwarf_parser.all_labels()
> +        self.emit("declare_labels " + " ".join(all_labels), indent_count)
> +
> +        self.emit("", 0)
> +        for subprogram in self.dwarf_parser.subprograms:
> +            self.emit(f"get_func_info {subprogram.name()}", indent_count)
> +
> +        for die in self.dwarf_parser.top_level_dies:
> +            self.generate_die(die, indent_count)
> +
> +        # TODO: line table, if it's within scope (it probably isn't).
> +
> +        # End Dwarf::assemble body.
> +        indent_count -= 1
> +        self.emit(rbrace, indent_count)
> +
> +def main(argv):
> +    try:
> +        filename = argv[1]
> +    except IndexError:
> +        print("Usage:")
> +        print("python ./asm_to_dwarf_assembler.py <path/to/elf/file>")
> +        sys.exit(1)
> +
> +    try:
> +        with open(filename, "rb") as elf_file:
> +            parser = DWARFParser(elf_file)
> +    except Exception as e:
> +        print("Error parsing ELF file. Does it contain DWARF information?")
> +        print(str(e))
> +        sys.exit(2)
> +    generator = DWARFAssemblerGenerator(parser)
> +    generator.generate()
> +
> +if __name__ == "__main__":
> +    main(sys.argv)
  
William Ferreira Dec. 17, 2024, 9:36 p.m. UTC | #7
I've addressed all of your concerns and I will send the next version of
the patch soon. One comment of yours in particular made me rethink a
comment section:

> According to the comment above, this should not send actual_value, but
> instead the string "referenced". Is this an oversight here or (more
> likely) in the comment?

The original meaning I thought the comment was passing along was that it
was keeping track of which DIEs were referenced, not that a string
literal "referenced" had a special meaning. I've since added more
comments in that region about how this is done - an instance variable -
and annotated said instance variable. Annotations are comments that are
visible in some IDEs' hover feature.

On Tue, Dec 17, 2024 at 1:43 PM Guinevere Larsen <guinevere@redhat.com> wrote:
>
> Thanks for working on this amazing script! This is really fancy,  and I
> really like what you got going
>
> I have some specific feedback on some lines, but in general I really
> like where the script is going!
> On 12/5/24 10:55 AM, William Ferreira wrote:
> > PR testsuite/32261 requests a script that could convert old .S-based
> > tests (that were made before dwarf.exp existed) into the new
> > Dwarf::assemble calls in Tcl. This commit is an initial implementation
> > of such a script. Python was chosen for convenience, and only relies on
> > a single external library.
> >
> > Usage: the script operates not on .S files, but on ELF files with DWARF
> > information. To convert an old test, one must run said test via
> > `make check-gdb TESTS=testname` in their build directory. This will
> > produce, as a side effect, an ELF file the test used as an inferior, at
> > gdb/testsuite/outputs/testname/testname. This ELF file is this script's
> > input.
> >
> > Reliability: not counting the limitations listed below, the script seems
> > functional enough to be worthy of discussion in the mailing list. I have
> > been testing it with different tests that already use Dwarf::assemble,
> > to see if the script can produce a similar call to it. Indeed, in the
> > cases that I've tested (including some more complex ones, marked with an
> > asterisk below) it is able to produce comparable output to the original
> > exp file. Of course, it can't reproduce the complex code *before* the
> > Dwarf::assemble call. Values calculated then are simply inlined.
> >
> > The following .exp files have been tried in this way and their outputs
> > highly resemble the original:
> > - gdb.dwarf2/dynarr-ptr
> > - gdb.dwarf2/void-type
> > - gdb.dwarf2/ada-thick-pointer
> > - gdb.dwarf2/atomic-type
> > - gdb.dwarf2/dw2-entry-points (*)
> >
> > The following .exp files DO NOT WORK with this script:
> > - gdb.dwarf2/cu-no-addrs
> >    - aranges not supported.
> >    - Compile unit hi_pc and low_pc hardcoded, prone to user error
> >      due to forgetting to replace with variables.
> >
> > The following .exp files work, with caveats addressed in the limitations
> > section below.
> > - gdb.dwarf2/cpp-linkage-name
> >    - Works correctly except for one attribute of the form SPECIAL_expr.
> > - gdb.dwarf2/dw2-unusual-field-names
> >    - Same as above, with two instances of SPECIAL_expr.
> > - gdb.dwarf2/implptr-optimized-out
> >    - Same as above, with two instances of SPECIAL_expr.
> >
> > Currently the script has the following known limitations:
> > - It does not support Tcl outside a small boilerplate and
> >    Dwarf::assemble calls.
> I don't think we need to support anything other than creating the
> dwarf::assemble stuff. Everything else is usual boilerplate from the
> testsuite, so we're pretty familiar with it (or at least have plenty of
> things to double-check) while dwarf assembler isn't as common and much
> more complex.
> > - It does not support line tables.
> > - It does not use $srcfile and other variables in the call to
> >    Dwarf::assemble (since it can't know where it is safe to substitute).
> > - It does not support "fission" type DWARFs (in fact I still have no
> >    clue what those are).
> > - It does not support cu {label LABEL} {} CUs, mostly because I couldn't
> >    find the information using pyelftools.
> > - It sometimes outputs empty CUs at the start and end of the call. This
> >    might be a problem with my machine, but I've checked with DWARF dumps
> >    and they are indeed in the input ELF files generated by
> >    `make check-gdb`.
> > - It does not support attributes with the forms DW_FORM_block* and
> >    DW_FORM_exprloc. This is mostly not a concern of the difficulty of
> >    the implementation, but of it being an incomplete feature and, thus,
> >    more susceptible to users forgetting to correct its mistakes or
> >    unfinished values (please see discussion started by Guinevere at
> >    comment 23 https://sourceware.org/bugzilla/show_bug.cgi?id=32261#c23).
> >    The incompleteness of this feature is easy to demonstrate: any call to
> >    gdb_target_symbol, a common tool used in these attributes, needs a
> >    symbol name that is erased after compilation. There is no way to guess
> >    where that address being referenced in a DW_OP_addr comes from, and it
> >    can't be hard coded since it can change depending on the machine
> >    compiling it.
> >
> > Please bring up any further shortcomings this script may have with your
> > expectations.
> >
> > Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32261
> > ---
> >   gdb/testsuite/lib/asm_to_dwarf_assembler.py | 830 ++++++++++++++++++++
> >   1 file changed, 830 insertions(+)
> >   create mode 100644 gdb/testsuite/lib/asm_to_dwarf_assembler.py
> >
> > diff --git a/gdb/testsuite/lib/asm_to_dwarf_assembler.py b/gdb/testsuite/lib/asm_to_dwarf_assembler.py
> > new file mode 100644
> > index 00000000000..ea86c19c805
> > --- /dev/null
> > +++ b/gdb/testsuite/lib/asm_to_dwarf_assembler.py
> > @@ -0,0 +1,830 @@
> > +# Copyright 2024 Free Software Foundation, Inc.
> > +
> > +# This program is free software; you can redistribute it and/or modify
> > +# it under the terms of the GNU General Public License as published by
> > +# the Free Software Foundation; either version 3 of the License, or
> > +# (at your option) any later version.
> > +#
> > +# This program is distributed in the hope that it will be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> > +
> > +# Due to the pyelftools dependency, this script requires Python version
> > +# 3.10 or greater to run.
> > +
> > +"""A utility to convert ELF files with DWARF info to Dwarf::assemble code.
> > +
> > +Usage:
> > +    python ./asm_to_dwarf_assembler.py <path/to/elf/file>
> > +
> > +Dependencies:
> > +    Python >= 3.10
> > +    pyelftools >= 0.31
> > +
> > +Notes:
> > +- Line tables are not currently supported.
> > +- Non-contiguous subprograms are not currently supported.
> > +- If you want to use $srcfile or similar, you must edit the references to the
> > +  file name manually, including DW_AT_name attributes on compile units.
> > +- If run with binaries generated by make check-gdb, it may include an
> > +  additional compile_unit before and after the actual compile units. This is
> > +  an artifact of the normal compilation process, as these CUs are indeed in
> > +  the generated DWARF in some cases.
> > +"""
> > +
> > +import sys
> > +import argparse
> > +import elftools
> > +import re
> > +
> > +from io import BytesIO, IOBase
> > +from typing import Any, Optional, Iterable, List, Union, Annotated
> > +from functools import cache
> > +from dataclasses import dataclass
> > +from copy import copy
> > +from datetime import datetime
> > +from logging import getLogger
> > +
> > +from elftools import dwarf, elf
> > +from elftools.elf.elffile import ELFFile
> > +from elftools.dwarf.compileunit import CompileUnit as RawCompileUnit
> > +from elftools.dwarf.die import DIE as RawDIE, AttributeValue
> > +from elftools.common.exceptions import DWARFError
> > +
> > +
> > +logger = getLogger(__file__)
> > +
> > +
> > +EXPR_ATTRIBUTE_FORMS = [
> > +    "DW_FORM_exprloc",
> > +    "DW_FORM_block",
> > +    "DW_FORM_block1",
> > +    "DW_FORM_block2",
> > +    "DW_FORM_block4",
> > +]
> Since these aren't supported, does it make sense to keep them here?
> > +
> > +
> > +@dataclass
> > +class DWARFOperation:
> > +    """Values of the form DW_OP_<name>."""
> > +    suffix: Annotated[str, 'Part of the name that succeeds "DW_OP_".']
> > +    code: Annotated[int, "Numeric representation of the OP."]
> > +    num_operands: Annotated[int, "Number of operands this operator expects."]
> > +
> > +
> > +dwarf_operations: Annotated[
> > +    dict[int, DWARFOperation],
> > +    "Table of all DWARF operations as per table 7.9 of the DWARF5 spec."
> > +] = {}
> > +def _register_op(suffix: str, code: int, num_operands: int) -> None:
> > +    assert code not in dwarf_operations, "Duplicate operation code found."
> > +    dwarf_operations[code] = DWARFOperation(suffix, code, num_operands)
> > +
> > +def _register_all_ops() -> None:
> > +    # Op codes 0x01 and 0x02 are reserved.
> > +
> > +    _register_op("addr", 0x03, 1)
> > +
> > +    # Op codes 0x04 and 0x05 are reserved.
> > +
> > +    _register_op("deref", 0x06, 0)
> > +
> > +    # Op code 0x07 is reserved.
> > +
> > +    _register_op("const1u", 0x08, 1)
> > +    _register_op("const1s", 0x09, 1)
> > +    _register_op("const2u", 0x0a, 1)
> > +    _register_op("const2s", 0x0b, 1)
> > +    _register_op("const4u", 0x0c, 1)
> > +    _register_op("const4s", 0x0d, 1)
> > +    _register_op("const8u", 0x0e, 1)
> > +    _register_op("const8s", 0x0f, 1)
> > +    _register_op("constu", 0x10, 1)
> > +    _register_op("consts", 0x11, 1)
> > +
> > +    _register_op("dup", 0x12, 0)
> > +    _register_op("drop", 0x13, 0)
> > +    _register_op("over", 0x14, 0)
> > +    _register_op("pick", 0x15, 1)
> > +    _register_op("swap", 0x16, 0)
> > +    _register_op("rot", 0x17, 0)
> > +    _register_op("xderef", 0x18, 0)
> > +
> > +    _register_op("abs", 0x19, 0)
> > +    _register_op("and", 0x1a, 0)
> > +    _register_op("div", 0x1b, 0)
> > +    _register_op("minus", 0x1c, 0)
> > +    _register_op("mod", 0x1d, 0)
> > +    _register_op("mul", 0x1e, 0)
> > +    _register_op("neg", 0x1f, 0)
> > +    _register_op("not", 0x20, 0)
> > +    _register_op("or", 0x21, 0)
> > +    _register_op("plus", 0x22, 0)
> > +    _register_op("plus_uconst", 0x23, 1)
> > +    _register_op("shl", 0x24, 0)
> > +    _register_op("shr", 0x25, 0)
> > +    _register_op("shra", 0x26, 0)
> > +    _register_op("xor", 0x27, 0)
> > +    _register_op("bra", 0x28, 1)
> > +    _register_op("eq", 0x29, 0)
> > +    _register_op("ge", 0x2a, 0)
> > +    _register_op("gt", 0x2b, 0)
> > +    _register_op("le", 0x2c, 0)
> > +    _register_op("lt", 0x2d, 0)
> > +    _register_op("ne", 0x2e, 0)
> > +    _register_op("skip", 0x2f, 0)
> > +
> > +    for lit_nr in range(32):
> > +        _register_op(f"lit{lit_nr}", 0x30 + lit_nr, 0)
> > +
> > +    for reg_nr in range(32):
> > +        _register_op(f"reg{reg_nr}", 0x50 + reg_nr, 0)
> > +
> > +    for breg_nr in range(32):
> > +        _register_op(f"breg{breg_nr}", 0x70 + breg_nr, 0)
> > +
> > +    _register_op("regx", 0x90, 1)
> > +    _register_op("fbregx", 0x91, 1)
> > +    _register_op("bregx", 0x92, 2)
> > +    _register_op("piece", 0x93, 1)
> > +    _register_op("deref_size", 0x94, 1)
> > +    _register_op("xderef_size", 0x95, 1)
> > +    _register_op("nop", 0x96, 0)
> > +    _register_op("push_object_address", 0x97, 0)
> > +
> > +    _register_op("call2", 0x98, 1)
> > +    _register_op("call4", 0x99, 1)
> > +    _register_op("call_ref", 0x9a, 1)
> > +    _register_op("form_tls_address", 0x9b, 0)
> > +    _register_op("call_frame_cfa", 0x9c, 0)
> > +
> > +    _register_op("bit_piece", 0x9d, 2)
> > +    _register_op("implicit_value", 0x9e, 2)
> > +    _register_op("stack_value", 0x9f, 0)
> > +    _register_op("implicit_pointer", 0xa0, 2)
> > +    _register_op("addrx", 0xa1, 1)
> > +    _register_op("constx", 0xa2, 1)
> > +
> > +    _register_op("entry_value", 0xa3, 2)
> > +    _register_op("const_type", 0xa4, 3)
> > +    _register_op("regval_type", 0xa5, 2)
> > +    _register_op("deref_type", 0xa6, 2)
> > +    _register_op("xderef_type", 0xa7, 2)
> > +
> > +    _register_op("convert", 0xa8, 1)
> > +    _register_op("reinterpret", 0xa9, 1)
> > +
> > +
> > +_register_all_ops()
> Similar question to here, if I understood later code correctly. Do we
> want to keep this if we're not doing location expressions?
> > +
> > +
> > +copyright_notice_template = """
> > +# Copyright 2024-{current_year} Free Software Foundation, Inc.
> If the script is adding copyright, it will be for a new file, and so
> copyright will always start on the year when the script is run. So it
> can just be {current_year}
> > +#
> > +# This program is free software; you can redistribute it and/or modify
> > +# it under the terms of the GNU General Public License as published by
> > +# the Free Software Foundation; either version 3 of the License, or
> > +# (at your option) any later version.
> > +#
> > +# This program is distributed in the hope that it will be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> > +""".lstrip()  # Strip to remove initial \n, but not the last one.
> > +
> > +# Workaround for my editor not to freak out over unclosed braces.
> > +lbrace, rbrace = "{", "}"
> > +
> > +@cache
> > +def get_indent_str(indent_count: int) -> str:
> > +    """Get whitespace string to prepend to another for indenting."""
> > +    indent = (indent_count // 2) * "\t"
> > +    if indent_count % 2 == 1:
> > +        indent += "    "
> > +    return indent
> > +
> > +
> > +def indent(line: str, indent_count: int) -> str:
> > +    """Indent line by indent_count levels."""
> > +    return get_indent_str(indent_count) + line
> > +
> > +
> > +def labelify_str(s: str) -> str:
> > +    """Make s appropriate for a label name."""
> > +    # Replace "*" with the literal word "ptr".
> > +    s = s.replace("*", "ptr")
> > +
> > +    # Replace any non-"word" characters by "_".
> > +    s = re.sub(r"\W", "_", s)
> > +
> > +    # Remove consecutive "_"s.
> > +    s = re.sub(r"__+", "_", s)
> > +
> > +    return s
> > +
> > +
> > +class DWARFAttribute:
> > +    """Storage unit for a single DWARF attribute.
> > +
> > +    All its values are strings that are usually passed on
> > +    directly to format. The exceptions to this are attributes
> > +    with int values with DW_FORM_ref4 or DW_FORM_ref_addr form.
> > +    Their values are interpreted as the global offset of the DIE
> > +    being referenced, which are looked up dynamically to fetch
> > +    their labels.
> > +    """
> > +    def __init__(
> > +        self,
> > +        die_offset: int,
> > +        name: str,
> > +        value: str | bytes | int | bool,
> > +        form=None,
> > +    ):
> > +        self.die_offset = die_offset
> > +        self.name = name
> > +        self.value = value
> > +        self.form = form
> > +
> > +    def _format_expr_value(self) -> str:
> > +        # This code is left here in case this is feature is desired in the
> > +        # future. It is partly functional, being able to decode operations
> > +        # but not operands due to their variable length in bytes.
> > +
> > +        # self.form = "SPECIAL_expr"
> > +        # lines = [lbrace]
> > +        # curr_idx = 0
> > +        # while curr_idx < len(self.value):
> > +        #     curr_op_code = self.value[curr_idx]
> > +        #
> > +        #     try:
> > +        #         curr_op: DWARFOperation = dwarf_operations[curr_op_code]
> > +        #     except KeyError as err:
> > +        #         raise KeyError(
> > +        #             f"Unknown op: {curr_op_code}, item {curr_idx} of"
> > +        #             f" attribute {self.name}, DIE at offset"
> > +        #             f" {self.die_offset}."
> > +        #         ) from err
> > +        #     # Empty arg here so it inserts a space before the first argument,
> > +        #     # only if there is indeed a first argument.
> > +        #     args = [""]
> > +        #     for arg_nr in range(curr_op.num_operands):
> > +        #         curr_idx += 1
> > +        #         # TODO decode bytes according to specific operation.
> > +        #         # The following line only reads the next BYTE of the sequence,
> > +        #         # and needs to be adjusted to read the correct number of
> > +        #         # bytes.
> > +        #         # args.append(str(self.value[curr_idx]))
> > +        #     op_line = "DW_OP_" + curr_op.suffix + " ".join(args)
> > +        #     lines.append(op_line)
> > +        #     curr_idx += 1
> > +        #
> > +        # lines.append(rbrace)
> > +        # return "\n".join(lines)
> > +        self.form = "SPECIAL_expr"
> > +        return "{ TODO: Fill expr list }"
> Maybe, rather than "TODO" you could add "MANUAL" or something, to make
> it clear that this is a manual process and expected to be so? (naming is
> hard, but I think todo implies that this will be improved soon, and I
> don't think that'd be the case?)
> > +
> > +    def _needs_escaping(self, str_value: str) -> bool:
> > +        charset = set(str_value)
> > +        return bool(charset.intersection({"{", "}", " ", "\t"}))
> > +
> > +    def _format_str(self, str_value: str) -> str:
> > +        if self._needs_escaping(str_value):
> > +            escaped_str = str(str_value)
> > +            # Replace single escape (which is itself escaped because of regex)
> > +            # with a double escape (which doesn't mean anything to regex so
> > +            # it doesn't need escaping).
> > +            escaped_str = re.sub(r"\\", r"\\", escaped_str)
> > +            escaped_str = re.sub("([{}])", r"\\\1", escaped_str)
> > +            return "{" + escaped_str + "}"
> > +        else:
> > +            return str_value
> > +
> > +    def _format_value(
> > +        self,
> > +        offset_die_lookup: dict[int, "DWARFDIE"],
> > +        indent_count: int = 0
> > +    ) -> str:
> > +        if self.form in EXPR_ATTRIBUTE_FORMS:
> > +            return self._format_expr_value()
> > +        elif isinstance(self.value, bool):
> > +            return str(int(self.value))
> > +        elif isinstance(self.value, int):
> > +            if self.form == "DW_FORM_ref4":
> > +                # ref4-style referencing label.
> > +                die: "DWARFDIE" = offset_die_lookup[self.value]
> > +                return ":$" + die.tcl_label
> > +            elif self.form == "DW_FORM_ref_addr":
> > +                # ref_addr-style referencing label.
> > +                die: "DWARFDIE" = offset_die_lookup[self.value]
> > +                return "%$" + die.tcl_label
> > +            else:
> > +                return str(self.value)
> > +        elif isinstance(self.value, bytes):
> > +            return self._format_str(self.value.decode("ascii"))
> > +        elif isinstance(self.value, str):
> > +            return self._format_str(self.value)
> > +        else:
> > +            raise NotImplementedError(
> > +                "Unknown data type: " + str(type(self.value))
> > +            )
> > +
> > +    def format(
> > +        self,
> > +        offset_die_lookup: dict[int, "DWARFDIE"],
> > +        indent_count: int = 0
> > +    ) -> str:
> > +        """Format the attribute in the form {name value form}.
> > +
> > +        If form is DW_FORM_exprloc or DW_FORM_block, see next section on
> > +        DWARFOperations.
> > +
> > +        If it isn't, value is formatted as follows:
> > +            If bool, use "1" if True, "0" if False.
> > +            If int:
> > +                If form is DW_FORM_ref4, use ":$label" where label is the
> > +                    tcl_label of the DWARFDIE at offset "value".
> > +                If form is DW_FORM_ref_addr, use "%$label" where label is
> > +                    the tcl_label of the DWARFDIE at offset "value".
> > +                Else, use value directly.
> > +            If bytes, use value.decode("ascii")
> > +            If str, use value directly.
> > +            Any other type results in a NotImplementedError being raised.
> > +
> > +        Regarding DW_FORM_exprloc and DW_FORM_block:
> > +            The form is replaced with SPECIAL_expr.
> > +            The entries in the value are interpreted and decoded using the
> > +            dwarf_operations dictionary, and replaced with their names where
> > +            applicable.
> > +        """
> > +        s = lbrace
> > +        s += self.name + " "
> > +        s += self._format_value(offset_die_lookup)
> > +
> > +        # Only explicitly state form if it's not a reference.
> > +        if self.form not in [None, "DW_FORM_ref4", "DW_FORM_ref_addr"]:
> > +            s += " " + self.form
> > +
> > +        s += rbrace
> > +        return indent(s, indent_count)
> > +
> > +
> > +class DWARFDIE:
> > +    """This script's parsed version of a RawDIE."""
> > +    def __init__(
> > +        self,
> > +        offset: int,
> > +        tag: str,
> > +        attrs: dict[str, DWARFAttribute],
> > +        tcl_label: Optional[str] = None
> > +    ):
> > +        self.offset = offset
> > +        self.tag = tag
> > +        self.attrs = copy(attrs)
> > +        self.children = []
> > +        self.tcl_label = tcl_label
>
> I think it would be worth a comment here or in format lines, regarding
> what tcl_label is actually about.
>
> it might not be self-evident what these are, especially for people
> unfamiliar with the assembler.
>
> > +
> > +    def format_lines(
> > +        self,
> > +        offset_die_lookup: dict[int, "DWARFDIE"],
> > +        indent_count: int = 0
> > +    ) -> list[str]:
> > +        """Get the list of lines that represent this DIE in Dwarf assembler."""
> > +        die_lines = []
> > +
> > +        # Prepend label to first line, if it's set.
> > +        if self.tcl_label:
> > +            first_line_start = self.tcl_label + ": "
> > +        else:
> > +            first_line_start = ""
> > +
> > +        # First line, including label.
> > +        first_line = indent(
> > +            first_line_start + self.tag + " " + lbrace,
> > +            indent_count
> > +        )
> > +        die_lines.append(first_line)
> > +
> > +        # Format attributes, if any.
> > +        if self.attrs:
> > +            for attr_name, attr in self.attrs.items():
> > +                attr_line = attr.format(
> > +                    offset_die_lookup,
> > +                    indent_count=indent_count+1
> > +                )
> > +                die_lines.append(attr_line)
> > +            die_lines.append(indent(rbrace, indent_count))
> > +        else:
> > +            # Don't create a new line, just append and immediately close the
> > +            # brace on the last line.
> > +            die_lines[-1] += rbrace
> > +
> > +        # Format children, if any.
> > +        if self.children:
> > +            # Only open a new brace if there are any children for the
> > +            # current DIE.
> > +            die_lines[-1] += " " + lbrace
> > +            for child in self.children:
> > +                child_lines = child.format_lines(
> > +                    offset_die_lookup,
> > +                    indent_count=indent_count+1
> > +                )
> > +                die_lines.extend(child_lines)
> > +            die_lines.append(indent(rbrace, indent_count))
> > +
> > +        return die_lines
> > +
> > +    def format(
> > +        self,
> > +        offset_die_lookup: dict[int, "DWARFDIE"],
> > +        indent_count: int = 0
> > +    ) -> str:
> > +        """Join result from format_lines into a single str."""
> > +        return "\n".join(self.format_lines(offset_die_lookup, indent_count))
> > +
> > +    def name(self) -> Optional[str]:
> > +        """Get DW_AT_name (if present) decoded as ASCII."""
> > +        raw_value = self.attrs.get("DW_AT_name")
> > +        if raw_value is None:
> > +            return None
> > +        else:
> > +            return raw_value.value.decode("ascii")
> > +
> > +    def type_name(self) -> str:
> > +        """Name of Dwarf tag, with the "DW_TAG_" prefix removed."""
> > +        return re.sub("DW_TAG_", "", self.tag)
> > +
> > +class DWARFCompileUnit(DWARFDIE):
> > +    """Wrapper subclass for CU DIEs.
> > +
> > +    This is necessary due to the special format CUs take in Dwarf::assemble.
> > +
> > +    Instead of simply:
> > +        DW_TAG_compile_unit {
> > +            <attributes>
> > +        } {
> > +            <children>
> > +        }
> > +
> > +    CUs are formatted as:
> > +        cu { <cu_special_vars> } {
> > +            DW_TAG_compile_unit {
> > +                <attributes>
> > +            } {
> > +                <children>
> > +            }
> > +        }
> > +    """
> > +
> > +    # Default value for parameter dwarf_version defined in dwarf.exp line 1552.
> > +    default_dwarf_version = 4
> > +
> > +    # Default value for parameter is_fission defined in dwarf.exp line 1556.
> > +    # Currently not implemented, see comment below.
> > +    # default_is_fission = False
> > +
> > +    # Tag that signifies a DIE is a compile unit.
> > +    compile_unit_tag = "DW_TAG_compile_unit"
> > +
> > +    def __init__(
> > +        self,
> > +        raw_die: RawDIE,
> > +        raw_cu: RawCompileUnit,
> > +        attrs: dict[str, DWARFAttribute],
> > +    ):
> > +        """Initialize additional instance variables for CU encoding.
> > +
> > +        The additional instance variables are:
> > +            - is_64_bit: Optional[bool]
> > +              default None
> > +              Whether this CU is 64 bit or not.
> > +            - dwarf_version: int
> > +              default DWARFCompileUnit.default_dwarf_version
> > +              Version of DWARF this CU is using.
> > +            - addr_size: Optional[int]
> > +              default None
> > +              Size of an address in bytes.
> > +
> > +        These variables are used to configure the first parameter of the cu
> > +        proc (which contains calls to the compile_unit proc in the body of
> > +        Dwarf::assemble).
> > +        """
> > +        super().__init__(
> > +            raw_die.offset,
> > +            DWARFCompileUnit.compile_unit_tag,
> > +            attrs
> > +        )
> > +        self.raw_cu = raw_cu
> > +        self.is_64_bit: Optional[bool] = None
> > +        self.dwarf_version: int = raw_cu.header.get(
> > +            "version",
> > +            DWARFCompileUnit.default_dwarf_version
> > +        )
> > +        self.addr_size: Optional[int] = raw_cu.header["address_size"]
> > +
> > +        # Fission is not currently implemented because I don't know where to
> > +        # fetch this information from.
> > +        # self.is_fission: bool = self.default_is_fission
> > +
> > +        # CU labels are not currently implemented because I haven't found where
> > +        # pyelftools exposes this information.
> > +        # self.cu_label: Optional[str] = None
> > +
> > +    def format_lines(
> > +        self,
> > +        offset_die_lookup: dict[int, DWARFDIE],
> > +        indent_count: int = 0,
> > +    ) -> list[str]:
> > +        lines = []
> > +        lines.append(self._get_header(indent_count))
> > +        inner_lines = super().format_lines(
> > +            offset_die_lookup,
> > +            indent_count + 1
> > +        )
> > +        lines += inner_lines
> > +        lines.append(indent(rbrace, indent_count))
> > +        return lines
> > +
> > +    def _get_header(self, indent_count: int = 0) -> str:
> > +        """Assemble the first line of the surrounding 'cu {} {}' proc call."""
> > +        header = indent("cu " + lbrace, indent_count)
> > +        cu_params = []
> > +
> > +        if self.is_64_bit is not None:
> > +            # Convert from True/False to 1/0.
> > +            param_value = int(self.is_64_bit)
> > +            cu_params += ["is_64", str(param_value)]
> > +
> > +        if self.dwarf_version != DWARFCompileUnit.default_dwarf_version:
> > +            cu_params += ["version", str(self.dwarf_version)]
> > +
> > +        if self.addr_size is not None:
> > +            cu_params += ["addr_size", str(self.addr_size)]
> > +
> > +        # Fission is not currently implemented, see comment above.
> > +        # if self.is_fission != DWARFCompileUnit.default_is_fission:
> > +        #     # Same as is_64_bit conversion, True/False -> 1/0.
> > +        #     param_value = int(self.is_fission)
> > +        #     cu_params += ["fission", str(param_value)]
> > +
> > +        # CU labels are not currently implemented, see commend above.
> > +        # if self.cu_label is not None:
> > +        #     cu_params += ["label", self.cu_label]
> > +
> > +        if cu_params:
> > +            header += " ".join(cu_params)
> > +
> > +        header += rbrace + " " + lbrace
> > +        return header
> > +
> > +class DWARFParser:
> > +    """Converter from pyelftools's DWARF representation to this script's."""
> > +
> > +    def __init__(self, elf_file: IOBase):
> > +        """Init parser with file opened in binary mode.
> > +
> > +        File can be closed after this function is called.
> > +        """
> > +        self.raw_data = BytesIO(elf_file.read())
> > +        self.elf_data = ELFFile(self.raw_data)
> > +        self.dwarf_info = self.elf_data.get_dwarf_info()
> > +        self.offset_to_die: dict[int, DWARFDIE] = {}
> > +        self.label_to_die: dict[str, DWARFDIE] = {}
> > +        self.referenced_offsets: set[int] = set()
> > +        self.raw_cu_list: list[RawCompileUnit] = []
> > +        self.top_level_dies: list[DWARFDIE] = []
> > +        self.subprograms: list[DWARFDIE] = []
> > +        self.taken_labels: set[str] = set()
> > +
> > +        self._read_cus()
> > +        self._create_necessary_labels()
> > +
> > +    def _read_cus(self):
> I think I'd personally rename this to _read_all_cus or something like
> that. It is easy to miss the S at the end and typo the function name or
> get confused.
> > +        """Populate self.raw_cu_list with all CUs in self.dwarf_info."""
> > +        for cu in self.dwarf_info.iter_CUs():
> > +            self._read_cu(cu)
> > +
> > +    def _read_cu(self, raw_cu: RawCompileUnit):
> > +        """Read a compile_unit into self.cu_list."""
> > +        self.raw_cu_list.append(raw_cu)
> > +        for raw_die in raw_cu.iter_DIEs():
> > +            if not raw_die.is_null():
> > +                self._parse_die(raw_die)
> > +
> > +    def _parse_die(self, raw_die: RawDIE) -> DWARFDIE:
> > +        """Process a single DIE and add it to offset_to_die.
> > +
> > +        Look for DW_FORM_ref4 and DWD_FORM_ref_addr form attributes and replace
> > +        them with the global offset of the referenced DIE, and marking the
> > +        referenced DIE as "referenced". This will be used later to assign and
> > +        use labels.
> > +
> > +        In case the DIE is a top-level DIE, add it to self.top_level_dies.
> > +
> > +        In case the DIE is a subprogram, add it to self.subprograms and call
> > +        self._use_vars_for_low_and_high_pc_attr with it.
> > +        """
> > +        processed_attrs = {}
> > +        attr_value: AttributeValue
> > +        for attr_name, attr_value in raw_die.attributes.items():
> > +            actual_value = attr_value.value
> > +            if attr_value.form in ("DW_FORM_ref4", "DW_FORM_ref_addr"):
> > +                referenced_die = raw_die.get_DIE_from_attribute(attr_name)
> > +                actual_value = referenced_die.offset
> > +                self.referenced_offsets.add(referenced_die.offset)
> > +
> > +            processed_attrs[attr_name] = DWARFAttribute(
> > +                raw_die.offset,
> > +                attr_name,
> > +                actual_value,
> According to the comment above, this should not send actual_value, but
> instead the string "referenced". Is this an oversight here or (more
> likely) in the comment?
> > +                attr_value.form
> > +            )
> > +
> > +        if raw_die.tag == DWARFCompileUnit.compile_unit_tag:
> > +            # FIXME: This isn't pretty, as it relies on global parser state.
> > +            # It also only works under the assumption CUs can't be nested.
> > +            die_cu = self.raw_cu_list[-1]
>
> We try and not get code into GDB that has a FIXME, so this has to be
> resolved before going in.
>
> Would it make sense to call _parse_die with the CU that the DIE comes from?
>
> > +            processed_die = DWARFCompileUnit(raw_die, die_cu, processed_attrs)
> > +        else:
> > +            processed_die = DWARFDIE(
> > +                raw_die.offset,
> > +                raw_die.tag,
> > +                processed_attrs,
> > +                None
> > +            )
> > +
> > +        if raw_die.get_parent() is None:
> > +            # Top level DIE
> > +            self.top_level_dies.append(processed_die)
> > +        else:
> > +            # Setting the parent here assumes the parent was already processed
> > +            # prior to this DIE being found.
> > +            # As far as I'm aware, this is always true in DWARF.
> > +            processed_parent = self.offset_to_die[raw_die.get_parent().offset]
> > +            processed_parent.children.append(processed_die)
> > +
> > +        if processed_die.tag == "DW_TAG_subprogram":
> > +            self.subprograms.append(processed_die)
> > +            self._use_vars_for_low_and_high_pc_attr(processed_die)
> > +
> > +        self.offset_to_die[processed_die.offset] = processed_die
> > +        return processed_die
> > +
> > +    def _create_necessary_labels(self):
> > +        """Create labels to DIEs that were referenced by others."""
> > +        for offset in self.referenced_offsets:
> > +            die = self.offset_to_die[offset]
> > +            self._create_label_for_die(die)
> > +
> > +    def _use_vars_for_low_and_high_pc_attr(self, subprogram: DWARFDIE) -> None:
> > +        """Replace existing PC attributes with Tcl variables.
> > +
> > +        If DW_AT_low_pc exists for this DIE, replace it with accessing the
> > +        variable whose name is given by self.subprogram_start_var(subprogram).
> > +
> > +        If DW_AT_high_pc exists for this DIE, replace it with accessing the
> > +        variable whose name is given by self.subprogram_end_var(subprogram).
> > +        """
> > +        low_pc_attr_name = "DW_AT_low_pc"
> > +        if low_pc_attr_name in subprogram.attrs:
> > +            start = self.subprogram_start_var(subprogram)
> > +            subprogram.attrs[low_pc_attr_name].value = start
> > +
> > +        high_pc_attr_name = "DW_AT_high_pc"
> > +        if high_pc_attr_name in subprogram.attrs:
> > +            end = self.subprogram_end_var(subprogram)
> > +            subprogram.attrs[high_pc_attr_name].value = end
> > +
> > +    def _create_label_for_die(self, die: DWARFDIE) -> None:
> > +        """Set tcl_label to a unique string among other DIEs for this parser.
> > +
> > +        As a first attempt, use labelify(die.name()). If the DIE does not have
> > +        a name, use labelify(die.type_name()).
> > +
> > +        If the chosen initial label is already taken, try again appending "_2".
> > +        While the attempt is still taken, try again replacing it with "_3", then
> > +        "_4", and so on.
> > +
> > +        This function also creates an entry on self.label_to_die.
> > +        """
> > +        if die.tcl_label is not None:
> > +            return
> > +
> > +        label = labelify_str(die.name() or die.type_name())
> > +
> > +        # Deduplicate label in case of collision
> > +        if label in self.taken_labels:
> > +            suffix_nr = 2
> > +
> > +            # Walrus operator to prevent writing the assembled label_suffix
> > +            # string literal twice. This could be rewritten by copying the
> > +            # string literal to the line after the end of the while loop,
> > +            # but I deemed it would be too frail in case one of them needs
> > +            # to be changed and the other is forgotten.
> > +            while (new_label := f"{label}_{suffix_nr}") in self.taken_labels:
> > +                suffix_nr += 1
> > +            label = new_label
> > +
> > +        die.tcl_label = label
> > +        self.label_to_die[label] = die
> > +        self.taken_labels.add(label)
> > +
> > +    def subprogram_start_var(self, subprogram: DWARFDIE) -> str:
> > +        """Name of the Tcl variable that holds the low PC for a subprogram."""
> > +        return f"${subprogram.name()}_start"
> > +
> > +    def subprogram_end_var(self, subprogram: DWARFDIE) -> str:
> > +        """Name of the Tcl variable that holds the high PC for a subprogram."""
> > +        return f"${subprogram.name()}_end"
> > +
> > +    def all_labels(self) -> set[str]:
> > +        """Get a copy of the set of all labels known to the parser so far."""
> > +        return copy(self.taken_labels)
> > +
> > +
> > +class DWARFAssemblerGenerator:
> > +    """Class that generates Dwarf::assemble code out of a DWARFParser."""
> > +
> > +    def __init__(self, dwarf_parser: DWARFParser, output=sys.stdout):
> > +        self.dwarf_parser = dwarf_parser
> > +        self.output = output
> > +
> > +    def emit(self, line: str, indent_count: int) -> None:
> > +        """Print a single line indented indent_count times to self.output.
> > +
> > +        If line is empty, it will always print an empty line, even with nonzero
> > +        indent_count.
> > +        """
> > +        if line:
> > +            line = get_indent_str(indent_count) + line
> > +        print(line, file=self.output)
> > +
> > +    def generate_die(self, die: DWARFDIE, indent_count: int):
> > +        """Generate the lines that represent a DIE."""
> > +        die_lines = die.format(self.dwarf_parser.offset_to_die, indent_count)
> > +        self.emit(die_lines, 0)
> > +
> > +    def generate(self):
> > +        indent_count = 0
> > +
> > +        year = datetime.now().year
> > +        copyright_notice = copyright_notice_template.format(current_year=year)
> > +        for copyright_line in copyright_notice.split("\n"):
> > +            self.emit(copyright_line, indent_count)
> > +
> > +        self.emit("load_lib dwarf.exp", indent_count)
> > +        self.emit("require dwarf2_support", indent_count)
> > +        self.emit("set asm_file [standard_output_file $srcfile2]", indent_count)
> > +
> > +        self.emit("", 0)
> > +        self.emit("# TODO: call prepare_for_testing", indent_count)
> > +
> > +        self.emit("", 0)
> > +        self.emit("Dwarf::assemble $asm_file " + lbrace, indent_count)
> Honestly, you can just have "Dwarf::assemble $srcfile2" I think. no need
> to set an asm_file
> > +
> > +        # Begin Dwarf::assemble body.
> > +        indent_count += 1
> > +        self.emit("global srcdir subdir srcfile", indent_count)
> > +
> > +        all_labels = self.dwarf_parser.all_labels()
> > +        self.emit("declare_labels " + " ".join(all_labels), indent_count)
> > +
> > +        self.emit("", 0)
> > +        for subprogram in self.dwarf_parser.subprograms:
> > +            self.emit(f"get_func_info {subprogram.name()}", indent_count)
> > +
> > +        for die in self.dwarf_parser.top_level_dies:
> > +            self.generate_die(die, indent_count)
> > +
> > +        # TODO: line table, if it's within scope (it probably isn't).
> > +
> > +        # End Dwarf::assemble body.
> > +        indent_count -= 1
> > +        self.emit(rbrace, indent_count)
> > +
> > +def main(argv):
> > +    try:
> > +        filename = argv[1]
> > +    except IndexError:
> > +        print("Usage:")
> > +        print("python ./asm_to_dwarf_assembler.py <path/to/elf/file>")
> > +        sys.exit(1)
> > +
> > +    try:
> > +        with open(filename, "rb") as elf_file:
> > +            parser = DWARFParser(elf_file)
> > +    except Exception as e:
> > +        print("Error parsing ELF file. Does it contain DWARF information?")
> > +        print(str(e))
> > +        sys.exit(2)
> > +    generator = DWARFAssemblerGenerator(parser)
> > +    generator.generate()
> > +
> > +if __name__ == "__main__":
> > +    main(sys.argv)
>
>
> --
> Cheers,
> Guinevere Larsen
> She/Her/Hers
>
  

Patch

diff --git a/gdb/testsuite/lib/asm_to_dwarf_assembler.py b/gdb/testsuite/lib/asm_to_dwarf_assembler.py
new file mode 100644
index 00000000000..ea86c19c805
--- /dev/null
+++ b/gdb/testsuite/lib/asm_to_dwarf_assembler.py
@@ -0,0 +1,830 @@ 
+# Copyright 2024 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Due to the pyelftools dependency, this script requires Python version
+# 3.10 or greater to run.
+
+"""A utility to convert ELF files with DWARF info to Dwarf::assemble code.
+
+Usage:
+    python ./asm_to_dwarf_assembler.py <path/to/elf/file>
+
+Dependencies:
+    Python >= 3.10
+    pyelftools >= 0.31
+
+Notes:
+- Line tables are not currently supported.
+- Non-contiguous subprograms are not currently supported.
+- If you want to use $srcfile or similar, you must edit the references to the
+  file name manually, including DW_AT_name attributes on compile units.
+- If run with binaries generated by make check-gdb, it may include an
+  additional compile_unit before and after the actual compile units. This is
+  an artifact of the normal compilation process, as these CUs are indeed in
+  the generated DWARF in some cases.
+"""
+
+import sys
+import argparse
+import elftools
+import re
+
+from io import BytesIO, IOBase
+from typing import Any, Optional, Iterable, List, Union, Annotated
+from functools import cache
+from dataclasses import dataclass
+from copy import copy
+from datetime import datetime
+from logging import getLogger
+
+from elftools import dwarf, elf
+from elftools.elf.elffile import ELFFile
+from elftools.dwarf.compileunit import CompileUnit as RawCompileUnit
+from elftools.dwarf.die import DIE as RawDIE, AttributeValue
+from elftools.common.exceptions import DWARFError
+
+
+logger = getLogger(__file__)
+
+
+EXPR_ATTRIBUTE_FORMS = [
+    "DW_FORM_exprloc",
+    "DW_FORM_block",
+    "DW_FORM_block1",
+    "DW_FORM_block2",
+    "DW_FORM_block4",
+]
+
+
+@dataclass
+class DWARFOperation:
+    """Values of the form DW_OP_<name>."""
+    suffix: Annotated[str, 'Part of the name that succeeds "DW_OP_".']
+    code: Annotated[int, "Numeric representation of the OP."]
+    num_operands: Annotated[int, "Number of operands this operator expects."]
+
+
+dwarf_operations: Annotated[
+    dict[int, DWARFOperation],
+    "Table of all DWARF operations as per table 7.9 of the DWARF5 spec."
+] = {}
+def _register_op(suffix: str, code: int, num_operands: int) -> None:
+    assert code not in dwarf_operations, "Duplicate operation code found."
+    dwarf_operations[code] = DWARFOperation(suffix, code, num_operands)
+
+def _register_all_ops() -> None:
+    # Op codes 0x01 and 0x02 are reserved.
+
+    _register_op("addr", 0x03, 1)
+
+    # Op codes 0x04 and 0x05 are reserved.
+
+    _register_op("deref", 0x06, 0)
+
+    # Op code 0x07 is reserved.
+
+    _register_op("const1u", 0x08, 1)
+    _register_op("const1s", 0x09, 1)
+    _register_op("const2u", 0x0a, 1)
+    _register_op("const2s", 0x0b, 1)
+    _register_op("const4u", 0x0c, 1)
+    _register_op("const4s", 0x0d, 1)
+    _register_op("const8u", 0x0e, 1)
+    _register_op("const8s", 0x0f, 1)
+    _register_op("constu", 0x10, 1)
+    _register_op("consts", 0x11, 1)
+
+    _register_op("dup", 0x12, 0)
+    _register_op("drop", 0x13, 0)
+    _register_op("over", 0x14, 0)
+    _register_op("pick", 0x15, 1)
+    _register_op("swap", 0x16, 0)
+    _register_op("rot", 0x17, 0)
+    _register_op("xderef", 0x18, 0)
+
+    _register_op("abs", 0x19, 0)
+    _register_op("and", 0x1a, 0)
+    _register_op("div", 0x1b, 0)
+    _register_op("minus", 0x1c, 0)
+    _register_op("mod", 0x1d, 0)
+    _register_op("mul", 0x1e, 0)
+    _register_op("neg", 0x1f, 0)
+    _register_op("not", 0x20, 0)
+    _register_op("or", 0x21, 0)
+    _register_op("plus", 0x22, 0)
+    _register_op("plus_uconst", 0x23, 1)
+    _register_op("shl", 0x24, 0)
+    _register_op("shr", 0x25, 0)
+    _register_op("shra", 0x26, 0)
+    _register_op("xor", 0x27, 0)
+    _register_op("bra", 0x28, 1)
+    _register_op("eq", 0x29, 0)
+    _register_op("ge", 0x2a, 0)
+    _register_op("gt", 0x2b, 0)
+    _register_op("le", 0x2c, 0)
+    _register_op("lt", 0x2d, 0)
+    _register_op("ne", 0x2e, 0)
+    _register_op("skip", 0x2f, 0)
+
+    for lit_nr in range(32):
+        _register_op(f"lit{lit_nr}", 0x30 + lit_nr, 0)
+
+    for reg_nr in range(32):
+        _register_op(f"reg{reg_nr}", 0x50 + reg_nr, 0)
+
+    for breg_nr in range(32):
+        _register_op(f"breg{breg_nr}", 0x70 + breg_nr, 0)
+
+    _register_op("regx", 0x90, 1)
+    _register_op("fbregx", 0x91, 1)
+    _register_op("bregx", 0x92, 2)
+    _register_op("piece", 0x93, 1)
+    _register_op("deref_size", 0x94, 1)
+    _register_op("xderef_size", 0x95, 1)
+    _register_op("nop", 0x96, 0)
+    _register_op("push_object_address", 0x97, 0)
+
+    _register_op("call2", 0x98, 1)
+    _register_op("call4", 0x99, 1)
+    _register_op("call_ref", 0x9a, 1)
+    _register_op("form_tls_address", 0x9b, 0)
+    _register_op("call_frame_cfa", 0x9c, 0)
+
+    _register_op("bit_piece", 0x9d, 2)
+    _register_op("implicit_value", 0x9e, 2)
+    _register_op("stack_value", 0x9f, 0)
+    _register_op("implicit_pointer", 0xa0, 2)
+    _register_op("addrx", 0xa1, 1)
+    _register_op("constx", 0xa2, 1)
+
+    _register_op("entry_value", 0xa3, 2)
+    _register_op("const_type", 0xa4, 3)
+    _register_op("regval_type", 0xa5, 2)
+    _register_op("deref_type", 0xa6, 2)
+    _register_op("xderef_type", 0xa7, 2)
+
+    _register_op("convert", 0xa8, 1)
+    _register_op("reinterpret", 0xa9, 1)
+
+
+_register_all_ops()
+
+
+copyright_notice_template = """
+# Copyright 2024-{current_year} Free Software Foundation, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+""".lstrip()  # Strip to remove initial \n, but not the last one.
+
+# Workaround for my editor not to freak out over unclosed braces.
+lbrace, rbrace = "{", "}"
+
+@cache
+def get_indent_str(indent_count: int) -> str:
+    """Get whitespace string to prepend to another for indenting."""
+    indent = (indent_count // 2) * "\t"
+    if indent_count % 2 == 1:
+        indent += "    "
+    return indent
+
+
+def indent(line: str, indent_count: int) -> str:
+    """Indent line by indent_count levels."""
+    return get_indent_str(indent_count) + line
+
+
+def labelify_str(s: str) -> str:
+    """Make s appropriate for a label name."""
+    # Replace "*" with the literal word "ptr".
+    s = s.replace("*", "ptr")
+
+    # Replace any non-"word" characters by "_".
+    s = re.sub(r"\W", "_", s)
+
+    # Remove consecutive "_"s.
+    s = re.sub(r"__+", "_", s)
+
+    return s
+
+
+class DWARFAttribute:
+    """Storage unit for a single DWARF attribute.
+
+    All its values are strings that are usually passed on
+    directly to format. The exceptions to this are attributes
+    with int values with DW_FORM_ref4 or DW_FORM_ref_addr form.
+    Their values are interpreted as the global offset of the DIE
+    being referenced, which are looked up dynamically to fetch
+    their labels.
+    """
+    def __init__(
+        self,
+        die_offset: int,
+        name: str,
+        value: str | bytes | int | bool,
+        form=None,
+    ):
+        self.die_offset = die_offset
+        self.name = name
+        self.value = value
+        self.form = form
+
+    def _format_expr_value(self) -> str:
+        # This code is left here in case this is feature is desired in the
+        # future. It is partly functional, being able to decode operations
+        # but not operands due to their variable length in bytes.
+
+        # self.form = "SPECIAL_expr"
+        # lines = [lbrace]
+        # curr_idx = 0
+        # while curr_idx < len(self.value):
+        #     curr_op_code = self.value[curr_idx]
+        #
+        #     try:
+        #         curr_op: DWARFOperation = dwarf_operations[curr_op_code]
+        #     except KeyError as err:
+        #         raise KeyError(
+        #             f"Unknown op: {curr_op_code}, item {curr_idx} of"
+        #             f" attribute {self.name}, DIE at offset"
+        #             f" {self.die_offset}."
+        #         ) from err
+        #     # Empty arg here so it inserts a space before the first argument,
+        #     # only if there is indeed a first argument.
+        #     args = [""]
+        #     for arg_nr in range(curr_op.num_operands):
+        #         curr_idx += 1
+        #         # TODO decode bytes according to specific operation.
+        #         # The following line only reads the next BYTE of the sequence,
+        #         # and needs to be adjusted to read the correct number of
+        #         # bytes.
+        #         # args.append(str(self.value[curr_idx]))
+        #     op_line = "DW_OP_" + curr_op.suffix + " ".join(args)
+        #     lines.append(op_line)
+        #     curr_idx += 1
+        #
+        # lines.append(rbrace)
+        # return "\n".join(lines)
+        self.form = "SPECIAL_expr"
+        return "{ TODO: Fill expr list }"
+
+    def _needs_escaping(self, str_value: str) -> bool:
+        charset = set(str_value)
+        return bool(charset.intersection({"{", "}", " ", "\t"}))
+
+    def _format_str(self, str_value: str) -> str:
+        if self._needs_escaping(str_value):
+            escaped_str = str(str_value)
+            # Replace single escape (which is itself escaped because of regex)
+            # with a double escape (which doesn't mean anything to regex so
+            # it doesn't need escaping).
+            escaped_str = re.sub(r"\\", r"\\", escaped_str)
+            escaped_str = re.sub("([{}])", r"\\\1", escaped_str)
+            return "{" + escaped_str + "}"
+        else:
+            return str_value
+
+    def _format_value(
+        self,
+        offset_die_lookup: dict[int, "DWARFDIE"],
+        indent_count: int = 0
+    ) -> str:
+        if self.form in EXPR_ATTRIBUTE_FORMS:
+            return self._format_expr_value()
+        elif isinstance(self.value, bool):
+            return str(int(self.value))
+        elif isinstance(self.value, int):
+            if self.form == "DW_FORM_ref4":
+                # ref4-style referencing label.
+                die: "DWARFDIE" = offset_die_lookup[self.value]
+                return ":$" + die.tcl_label
+            elif self.form == "DW_FORM_ref_addr":
+                # ref_addr-style referencing label.
+                die: "DWARFDIE" = offset_die_lookup[self.value]
+                return "%$" + die.tcl_label
+            else:
+                return str(self.value)
+        elif isinstance(self.value, bytes):
+            return self._format_str(self.value.decode("ascii"))
+        elif isinstance(self.value, str):
+            return self._format_str(self.value)
+        else:
+            raise NotImplementedError(
+                "Unknown data type: " + str(type(self.value))
+            )
+
+    def format(
+        self,
+        offset_die_lookup: dict[int, "DWARFDIE"],
+        indent_count: int = 0
+    ) -> str:
+        """Format the attribute in the form {name value form}.
+
+        If form is DW_FORM_exprloc or DW_FORM_block, see next section on
+        DWARFOperations.
+
+        If it isn't, value is formatted as follows:
+            If bool, use "1" if True, "0" if False.
+            If int:
+                If form is DW_FORM_ref4, use ":$label" where label is the
+                    tcl_label of the DWARFDIE at offset "value".
+                If form is DW_FORM_ref_addr, use "%$label" where label is
+                    the tcl_label of the DWARFDIE at offset "value".
+                Else, use value directly.
+            If bytes, use value.decode("ascii")
+            If str, use value directly.
+            Any other type results in a NotImplementedError being raised.
+
+        Regarding DW_FORM_exprloc and DW_FORM_block:
+            The form is replaced with SPECIAL_expr.
+            The entries in the value are interpreted and decoded using the
+            dwarf_operations dictionary, and replaced with their names where
+            applicable.
+        """
+        s = lbrace
+        s += self.name + " "
+        s += self._format_value(offset_die_lookup)
+
+        # Only explicitly state form if it's not a reference.
+        if self.form not in [None, "DW_FORM_ref4", "DW_FORM_ref_addr"]:
+            s += " " + self.form
+
+        s += rbrace
+        return indent(s, indent_count)
+
+
+class DWARFDIE:
+    """This script's parsed version of a RawDIE."""
+    def __init__(
+        self,
+        offset: int,
+        tag: str,
+        attrs: dict[str, DWARFAttribute],
+        tcl_label: Optional[str] = None
+    ):
+        self.offset = offset
+        self.tag = tag
+        self.attrs = copy(attrs)
+        self.children = []
+        self.tcl_label = tcl_label
+
+    def format_lines(
+        self,
+        offset_die_lookup: dict[int, "DWARFDIE"],
+        indent_count: int = 0
+    ) -> list[str]:
+        """Get the list of lines that represent this DIE in Dwarf assembler."""
+        die_lines = []
+
+        # Prepend label to first line, if it's set.
+        if self.tcl_label:
+            first_line_start = self.tcl_label + ": "
+        else:
+            first_line_start = ""
+
+        # First line, including label.
+        first_line = indent(
+            first_line_start + self.tag + " " + lbrace,
+            indent_count
+        )
+        die_lines.append(first_line)
+
+        # Format attributes, if any.
+        if self.attrs:
+            for attr_name, attr in self.attrs.items():
+                attr_line = attr.format(
+                    offset_die_lookup,
+                    indent_count=indent_count+1
+                )
+                die_lines.append(attr_line)
+            die_lines.append(indent(rbrace, indent_count))
+        else:
+            # Don't create a new line, just append and immediately close the
+            # brace on the last line.
+            die_lines[-1] += rbrace
+
+        # Format children, if any.
+        if self.children:
+            # Only open a new brace if there are any children for the
+            # current DIE.
+            die_lines[-1] += " " + lbrace
+            for child in self.children:
+                child_lines = child.format_lines(
+                    offset_die_lookup,
+                    indent_count=indent_count+1
+                )
+                die_lines.extend(child_lines)
+            die_lines.append(indent(rbrace, indent_count))
+
+        return die_lines
+
+    def format(
+        self,
+        offset_die_lookup: dict[int, "DWARFDIE"],
+        indent_count: int = 0
+    ) -> str:
+        """Join result from format_lines into a single str."""
+        return "\n".join(self.format_lines(offset_die_lookup, indent_count))
+
+    def name(self) -> Optional[str]:
+        """Get DW_AT_name (if present) decoded as ASCII."""
+        raw_value = self.attrs.get("DW_AT_name")
+        if raw_value is None:
+            return None
+        else:
+            return raw_value.value.decode("ascii")
+
+    def type_name(self) -> str:
+        """Name of Dwarf tag, with the "DW_TAG_" prefix removed."""
+        return re.sub("DW_TAG_", "", self.tag)
+
+class DWARFCompileUnit(DWARFDIE):
+    """Wrapper subclass for CU DIEs.
+
+    This is necessary due to the special format CUs take in Dwarf::assemble.
+
+    Instead of simply:
+        DW_TAG_compile_unit {
+            <attributes>
+        } {
+            <children>
+        }
+
+    CUs are formatted as:
+        cu { <cu_special_vars> } {
+            DW_TAG_compile_unit {
+                <attributes>
+            } {
+                <children>
+            }
+        }
+    """
+
+    # Default value for parameter dwarf_version defined in dwarf.exp line 1552.
+    default_dwarf_version = 4
+
+    # Default value for parameter is_fission defined in dwarf.exp line 1556.
+    # Currently not implemented, see comment below.
+    # default_is_fission = False
+
+    # Tag that signifies a DIE is a compile unit.
+    compile_unit_tag = "DW_TAG_compile_unit"
+
+    def __init__(
+        self,
+        raw_die: RawDIE,
+        raw_cu: RawCompileUnit,
+        attrs: dict[str, DWARFAttribute],
+    ):
+        """Initialize additional instance variables for CU encoding.
+
+        The additional instance variables are:
+            - is_64_bit: Optional[bool]
+              default None
+              Whether this CU is 64 bit or not.
+            - dwarf_version: int
+              default DWARFCompileUnit.default_dwarf_version
+              Version of DWARF this CU is using.
+            - addr_size: Optional[int]
+              default None
+              Size of an address in bytes.
+
+        These variables are used to configure the first parameter of the cu
+        proc (which contains calls to the compile_unit proc in the body of
+        Dwarf::assemble).
+        """
+        super().__init__(
+            raw_die.offset,
+            DWARFCompileUnit.compile_unit_tag,
+            attrs
+        )
+        self.raw_cu = raw_cu
+        self.is_64_bit: Optional[bool] = None
+        self.dwarf_version: int = raw_cu.header.get(
+            "version",
+            DWARFCompileUnit.default_dwarf_version
+        )
+        self.addr_size: Optional[int] = raw_cu.header["address_size"]
+
+        # Fission is not currently implemented because I don't know where to
+        # fetch this information from.
+        # self.is_fission: bool = self.default_is_fission
+
+        # CU labels are not currently implemented because I haven't found where
+        # pyelftools exposes this information.
+        # self.cu_label: Optional[str] = None
+
+    def format_lines(
+        self,
+        offset_die_lookup: dict[int, DWARFDIE],
+        indent_count: int = 0,
+    ) -> list[str]:
+        lines = []
+        lines.append(self._get_header(indent_count))
+        inner_lines = super().format_lines(
+            offset_die_lookup,
+            indent_count + 1
+        )
+        lines += inner_lines
+        lines.append(indent(rbrace, indent_count))
+        return lines
+
+    def _get_header(self, indent_count: int = 0) -> str:
+        """Assemble the first line of the surrounding 'cu {} {}' proc call."""
+        header = indent("cu " + lbrace, indent_count)
+        cu_params = []
+
+        if self.is_64_bit is not None:
+            # Convert from True/False to 1/0.
+            param_value = int(self.is_64_bit)
+            cu_params += ["is_64", str(param_value)]
+
+        if self.dwarf_version != DWARFCompileUnit.default_dwarf_version:
+            cu_params += ["version", str(self.dwarf_version)]
+
+        if self.addr_size is not None:
+            cu_params += ["addr_size", str(self.addr_size)]
+
+        # Fission is not currently implemented, see comment above.
+        # if self.is_fission != DWARFCompileUnit.default_is_fission:
+        #     # Same as is_64_bit conversion, True/False -> 1/0.
+        #     param_value = int(self.is_fission)
+        #     cu_params += ["fission", str(param_value)]
+
+        # CU labels are not currently implemented, see commend above.
+        # if self.cu_label is not None:
+        #     cu_params += ["label", self.cu_label]
+
+        if cu_params:
+            header += " ".join(cu_params)
+
+        header += rbrace + " " + lbrace
+        return header
+
+class DWARFParser:
+    """Converter from pyelftools's DWARF representation to this script's."""
+
+    def __init__(self, elf_file: IOBase):
+        """Init parser with file opened in binary mode.
+
+        File can be closed after this function is called.
+        """
+        self.raw_data = BytesIO(elf_file.read())
+        self.elf_data = ELFFile(self.raw_data)
+        self.dwarf_info = self.elf_data.get_dwarf_info()
+        self.offset_to_die: dict[int, DWARFDIE] = {}
+        self.label_to_die: dict[str, DWARFDIE] = {}
+        self.referenced_offsets: set[int] = set()
+        self.raw_cu_list: list[RawCompileUnit] = []
+        self.top_level_dies: list[DWARFDIE] = []
+        self.subprograms: list[DWARFDIE] = []
+        self.taken_labels: set[str] = set()
+
+        self._read_cus()
+        self._create_necessary_labels()
+
+    def _read_cus(self):
+        """Populate self.raw_cu_list with all CUs in self.dwarf_info."""
+        for cu in self.dwarf_info.iter_CUs():
+            self._read_cu(cu)
+
+    def _read_cu(self, raw_cu: RawCompileUnit):
+        """Read a compile_unit into self.cu_list."""
+        self.raw_cu_list.append(raw_cu)
+        for raw_die in raw_cu.iter_DIEs():
+            if not raw_die.is_null():
+                self._parse_die(raw_die)
+
+    def _parse_die(self, raw_die: RawDIE) -> DWARFDIE:
+        """Process a single DIE and add it to offset_to_die.
+
+        Look for DW_FORM_ref4 and DWD_FORM_ref_addr form attributes and replace
+        them with the global offset of the referenced DIE, and marking the
+        referenced DIE as "referenced". This will be used later to assign and
+        use labels.
+
+        In case the DIE is a top-level DIE, add it to self.top_level_dies.
+
+        In case the DIE is a subprogram, add it to self.subprograms and call
+        self._use_vars_for_low_and_high_pc_attr with it.
+        """
+        processed_attrs = {}
+        attr_value: AttributeValue
+        for attr_name, attr_value in raw_die.attributes.items():
+            actual_value = attr_value.value
+            if attr_value.form in ("DW_FORM_ref4", "DW_FORM_ref_addr"):
+                referenced_die = raw_die.get_DIE_from_attribute(attr_name)
+                actual_value = referenced_die.offset
+                self.referenced_offsets.add(referenced_die.offset)
+
+            processed_attrs[attr_name] = DWARFAttribute(
+                raw_die.offset,
+                attr_name,
+                actual_value,
+                attr_value.form
+            )
+
+        if raw_die.tag == DWARFCompileUnit.compile_unit_tag:
+            # FIXME: This isn't pretty, as it relies on global parser state.
+            # It also only works under the assumption CUs can't be nested.
+            die_cu = self.raw_cu_list[-1]
+            processed_die = DWARFCompileUnit(raw_die, die_cu, processed_attrs)
+        else:
+            processed_die = DWARFDIE(
+                raw_die.offset,
+                raw_die.tag,
+                processed_attrs,
+                None
+            )
+
+        if raw_die.get_parent() is None:
+            # Top level DIE
+            self.top_level_dies.append(processed_die)
+        else:
+            # Setting the parent here assumes the parent was already processed
+            # prior to this DIE being found.
+            # As far as I'm aware, this is always true in DWARF.
+            processed_parent = self.offset_to_die[raw_die.get_parent().offset]
+            processed_parent.children.append(processed_die)
+
+        if processed_die.tag == "DW_TAG_subprogram":
+            self.subprograms.append(processed_die)
+            self._use_vars_for_low_and_high_pc_attr(processed_die)
+
+        self.offset_to_die[processed_die.offset] = processed_die
+        return processed_die
+
+    def _create_necessary_labels(self):
+        """Create labels to DIEs that were referenced by others."""
+        for offset in self.referenced_offsets:
+            die = self.offset_to_die[offset]
+            self._create_label_for_die(die)
+
+    def _use_vars_for_low_and_high_pc_attr(self, subprogram: DWARFDIE) -> None:
+        """Replace existing PC attributes with Tcl variables.
+
+        If DW_AT_low_pc exists for this DIE, replace it with accessing the
+        variable whose name is given by self.subprogram_start_var(subprogram).
+
+        If DW_AT_high_pc exists for this DIE, replace it with accessing the
+        variable whose name is given by self.subprogram_end_var(subprogram).
+        """
+        low_pc_attr_name = "DW_AT_low_pc"
+        if low_pc_attr_name in subprogram.attrs:
+            start = self.subprogram_start_var(subprogram)
+            subprogram.attrs[low_pc_attr_name].value = start
+
+        high_pc_attr_name = "DW_AT_high_pc"
+        if high_pc_attr_name in subprogram.attrs:
+            end = self.subprogram_end_var(subprogram)
+            subprogram.attrs[high_pc_attr_name].value = end
+
+    def _create_label_for_die(self, die: DWARFDIE) -> None:
+        """Set tcl_label to a unique string among other DIEs for this parser.
+
+        As a first attempt, use labelify(die.name()). If the DIE does not have
+        a name, use labelify(die.type_name()).
+
+        If the chosen initial label is already taken, try again appending "_2".
+        While the attempt is still taken, try again replacing it with "_3", then
+        "_4", and so on.
+
+        This function also creates an entry on self.label_to_die.
+        """
+        if die.tcl_label is not None:
+            return
+
+        label = labelify_str(die.name() or die.type_name())
+
+        # Deduplicate label in case of collision
+        if label in self.taken_labels:
+            suffix_nr = 2
+
+            # Walrus operator to prevent writing the assembled label_suffix
+            # string literal twice. This could be rewritten by copying the
+            # string literal to the line after the end of the while loop,
+            # but I deemed it would be too frail in case one of them needs
+            # to be changed and the other is forgotten.
+            while (new_label := f"{label}_{suffix_nr}") in self.taken_labels:
+                suffix_nr += 1
+            label = new_label
+
+        die.tcl_label = label
+        self.label_to_die[label] = die
+        self.taken_labels.add(label)
+
+    def subprogram_start_var(self, subprogram: DWARFDIE) -> str:
+        """Name of the Tcl variable that holds the low PC for a subprogram."""
+        return f"${subprogram.name()}_start"
+
+    def subprogram_end_var(self, subprogram: DWARFDIE) -> str:
+        """Name of the Tcl variable that holds the high PC for a subprogram."""
+        return f"${subprogram.name()}_end"
+
+    def all_labels(self) -> set[str]:
+        """Get a copy of the set of all labels known to the parser so far."""
+        return copy(self.taken_labels)
+
+
+class DWARFAssemblerGenerator:
+    """Class that generates Dwarf::assemble code out of a DWARFParser."""
+
+    def __init__(self, dwarf_parser: DWARFParser, output=sys.stdout):
+        self.dwarf_parser = dwarf_parser
+        self.output = output
+
+    def emit(self, line: str, indent_count: int) -> None:
+        """Print a single line indented indent_count times to self.output.
+
+        If line is empty, it will always print an empty line, even with nonzero
+        indent_count.
+        """
+        if line:
+            line = get_indent_str(indent_count) + line
+        print(line, file=self.output)
+
+    def generate_die(self, die: DWARFDIE, indent_count: int):
+        """Generate the lines that represent a DIE."""
+        die_lines = die.format(self.dwarf_parser.offset_to_die, indent_count)
+        self.emit(die_lines, 0)
+
+    def generate(self):
+        indent_count = 0
+
+        year = datetime.now().year
+        copyright_notice = copyright_notice_template.format(current_year=year)
+        for copyright_line in copyright_notice.split("\n"):
+            self.emit(copyright_line, indent_count)
+
+        self.emit("load_lib dwarf.exp", indent_count)
+        self.emit("require dwarf2_support", indent_count)
+        self.emit("set asm_file [standard_output_file $srcfile2]", indent_count)
+
+        self.emit("", 0)
+        self.emit("# TODO: call prepare_for_testing", indent_count)
+
+        self.emit("", 0)
+        self.emit("Dwarf::assemble $asm_file " + lbrace, indent_count)
+
+        # Begin Dwarf::assemble body.
+        indent_count += 1
+        self.emit("global srcdir subdir srcfile", indent_count)
+
+        all_labels = self.dwarf_parser.all_labels()
+        self.emit("declare_labels " + " ".join(all_labels), indent_count)
+
+        self.emit("", 0)
+        for subprogram in self.dwarf_parser.subprograms:
+            self.emit(f"get_func_info {subprogram.name()}", indent_count)
+
+        for die in self.dwarf_parser.top_level_dies:
+            self.generate_die(die, indent_count)
+
+        # TODO: line table, if it's within scope (it probably isn't).
+
+        # End Dwarf::assemble body.
+        indent_count -= 1
+        self.emit(rbrace, indent_count)
+
+def main(argv):
+    try:
+        filename = argv[1]
+    except IndexError:
+        print("Usage:")
+        print("python ./asm_to_dwarf_assembler.py <path/to/elf/file>")
+        sys.exit(1)
+
+    try:
+        with open(filename, "rb") as elf_file:
+            parser = DWARFParser(elf_file)
+    except Exception as e:
+        print("Error parsing ELF file. Does it contain DWARF information?")
+        print(str(e))
+        sys.exit(2)
+    generator = DWARFAssemblerGenerator(parser)
+    generator.generate()
+
+if __name__ == "__main__":
+    main(sys.argv)