Read inlining info in NVIDIA extended line map

Message ID 7C166312-4876-444F-9B85-C7E30C8F4959@rice.edu
State Committed
Series Read inlining info in NVIDIA extended line map

Commit Message

John Mellor-Crummey Sept. 6, 2021, 12:07 a.m. UTC

As of CUDA 11.2, NVIDIA added extensions to the line map section
of CUDA binaries to represent inlined functions. These extensions
include

 - two new fields in a line table row to represent inline
   information: context, and functionname,

 - two new DWARF extended opcodes: DW_LNE_inlined_call,
   DW_LNE_set_function_name,

 - an additional word in the line table header that indicates
   the offset in the .debug_str section where the function
   names for this line table begin, and

 - two new functions in the libdw API: dwarf_linecontext and
   dwarf_linefunctionname, which return the new line table fields.

A line table row for an inlined function contains a non-zero
"context" value. The "context" field indicates the index of the
line table row that serves as the call site for an inlined
context.

The "functionname" field in a line table row is only meaningful
if the "context" field of the row is non-zero. A meaningful
"functionname" field contains an index into the .debug_str
section relative to the base offset established in the line table
header; the position in the .debug_str section indicates the name
of the inlined function.
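
To make the meaning of the two fields concrete, here is a minimal sketch
of how a reader of the line table might resolve them. It does not use
libdw; struct row, row_call_site and row_function_name are hypothetical
names invented for illustration, and the sketch assumes that "context"
is a 1-based row index so that 0 can mean "not inlined".

```c
#include <stddef.h>

/* Hypothetical, simplified model of one line table row.  This is not
   libdw's struct Dwarf_Line_s.  */
struct row
{
  unsigned int line;         /* source line number */
  unsigned int context;      /* 0 = not inlined; else index of call-site row */
  unsigned int functionname; /* offset relative to the header's base offset */
};

/* Return the call-site row for ROW, or NULL if ROW is not inlined.
   Assumption: context is 1-based, so row N is rows[N - 1].  */
static const struct row *
row_call_site (const struct row *rows, size_t nrows, const struct row *row)
{
  if (row->context == 0 || row->context > nrows)
    return NULL;
  return &rows[row->context - 1];
}

/* Return the name of the inlined function for ROW, or NULL if ROW is
   not inlined.  STR models the contents of .debug_str and BASE models
   the base offset taken from the line table header.  */
static const char *
row_function_name (const char *str, size_t str_size, size_t base,
                   const struct row *row)
{
  if (row->context == 0)
    return NULL;            /* functionname is only meaningful when inlined */
  size_t off = base + row->functionname;
  return off < str_size ? str + off : NULL;
}
```

For a row with context 1 and functionname 7, and a base offset of 6,
the call site is the first row and the name is the string that starts
at byte 13 of the modeled .debug_str.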

These extensions resemble the proposed DWARF extensions
(http://dwarfstd.org/ShowIssue.php?issue=140906.1) by Cary
Coutant, but are not identical.

This patch integrates support for handling NVIDIA's extended
line maps into elfutils' libdw library and the readelf command
line utility.

Since this support is a non-standard extension to DWARF, all code
that implements the extensions is bracketed by the markers
/* Begin NVIDIA_LINEMAP_INLINING_EXTENSIONS */ and
/* End NVIDIA_LINEMAP_INLINING_EXTENSIONS */.

The definition below

 #define NVIDIA_LINEMAP_INLINING_EXTENSIONS 1

is added to elfutils/version.h, which enables a client of elfutils
to test whether the NVIDIA line map extensions are present.
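
A client could test for the macro roughly as follows. This is only a
sketch under the assumption that the macro is defined as described in
this version of the patch; have_nvidia_linemap_extensions is a
hypothetical helper name, and the header is included only when the
compiler can find it.

```c
/* Include the installed header when the compiler can locate it.  */
#if defined(__has_include)
# if __has_include(<elfutils/version.h>)
#  include <elfutils/version.h>
# endif
#endif

/* Return 1 if the elfutils headers advertise the NVIDIA line map
   extensions, 0 otherwise.  Hypothetical helper for illustration.  */
static int
have_nvidia_linemap_extensions (void)
{
#ifdef NVIDIA_LINEMAP_INLINING_EXTENSIONS
  return 1;
#else
  return 0;
#endif
}
```

Code that calls dwarf_linecontext or dwarf_linefunctionname can be
placed under the same #ifdef so that it still builds against an
elfutils without the extensions.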

Note: The support for NVIDIA extended line maps adds two integer
fields (context and functionname) to struct Dwarf_Line_s, which
makes the structure about 30% larger.

The patch includes a binary testfile_nvidia_linemap.bz2 that
contains an NVIDIA extended linemap along with two tests that
read the line map.

 - A test script run-nvidia-extended-linemap-readelf.sh
   checks the output of readelf on the new test binary to
   validate its dump of the line op codes containing context
   and functionname entries.

 - A test program tests/nvidia_extended_linemap_libdw.c reads
   the extended line map with dwarf_next_lines and dumps the
   context and functionname fields of the line map where they
   are relevant, i.e. the value of context is non-zero. A test
   script run-nvidia-extended-linemap-libdw.sh runs this test
   and validates its output.

A patch with the new functionality described above is attached.
--
John Mellor-Crummey		Professor
Dept of Computer Science	Rice University
email: johnmc@rice.edu		phone: 713-348-5179

Signed-off-by: John M Mellor-Crummey <johnmc@rice.edu>
---
 ChangeLog                                    |   4 +
 config/version.h.in                          |   4 +
 libdw/Makefile.am                            |   1 +
 libdw/dwarf.h                                |   4 +
 libdw/dwarf_getsrclines.c                    |  52 ++++++-
 libdw/dwarf_linecontext.c                    |  47 ++++++
 libdw/dwarf_linefunctionname.c               |  47 ++++++
 libdw/libdw.h                                |  12 ++
 libdw/libdw.map                              |   2 +
 libdw/libdwP.h                               |   4 +
 src/readelf.c                                |  46 ++++++
 tests/Makefile.am                            |   3 +
 tests/nvidia_extended_linemap_libdw.c        | 153 +++++++++++++++++++
 tests/run-nvidia-extended-linemap-libdw.sh   |  41 +++++
 tests/run-nvidia-extended-linemap-readelf.sh | 114 ++++++++++++++
 tests/testfile_nvidia_linemap.bz2            | Bin 0 -> 2365 bytes
 16 files changed, 533 insertions(+), 1 deletion(-)
 create mode 100644 libdw/dwarf_linecontext.c
 create mode 100644 libdw/dwarf_linefunctionname.c
 create mode 100644 tests/nvidia_extended_linemap_libdw.c
 create mode 100755 tests/run-nvidia-extended-linemap-libdw.sh
 create mode 100755 tests/run-nvidia-extended-linemap-readelf.sh
 create mode 100644 tests/testfile_nvidia_linemap.bz2

GIT binary patch
literal 2365
zcmV-D3BvY5T4*^jL0KkKSq)>6PXG%lfB*mg|L^|i|Nqm@%YXm>|9|bF;0gptplMZb
z+z@oxek$MzA3=Le*sZ4QFf%t<T`QmfC=8JrjZZ;6N&1r>rk<(lJVeQ+CXEdl4WbN<
z003wJ222QOG|<yP27r2i7=%qUF)E&u)ENvDAT-IfH1#ls1k*qRBM``F02*iphK(5w
z0iZMnn3`cS28K;E(-R<Rp^>4JO$`Q^Ookzl7$XsrKr|R55rS!>07C#HCYdx~lTt}N
z38AT?G}BEz6Gx~S0Av6%G&D2~0017KWB>pLfCE4P28bFNG}BB>fu@E=hD|gy8euXR
zhCpD9Moj?FV2nlyri=j$0F0Vs(Sk(*1jt54hEjf{$kR~t(lp+b(<Xq(15E~jrk+q9
zqfHEkhL2EaGypVuhEG${Y^Ig~UxRRaEJ=4<WbwApGblg|mU#~?l+vLlQd5iQ9&V+j
zS6ewn1h^@)ItC<x_}SoALul#d(o~$6NV>p@T+?W=XfGcS<G97?76|H0A)#7}m4%D$
zl){2Ki-}g@dcI8OD<Vp$5F$Dwp_?`yv%mpFk<jL&xO6e{JpCKEpp6b1j2Z}5C7?OT
zNdc){{}xteZZ8SK01`yC9fd$h%1NwjvPDwV5G@H8Y@Ir;t|^E--yG)4tYPJ$Bwir|
zH1>&L5UbU0?35a#xs8L!r-U)TF1&F!__dKhBSQp!#+wj$%m9uMNR-G!K_M%dqZ6H;
zp6s^6LtA42tC>wzChpmB6xue3L}H1b<ePIc!X-KZgNKWWjf}zs{L8PiMA9T+r2yER
z#{&BqPz<p_ArB>!5<t@L_Le!M&YW<R2F0_EGcdiNHR-gzj<%``cqK9d;xR!J#XuVZ
z#uQ+Vm%M^RWdw*;1F`tQL`%eh(`q8EYatRzKX#pwocuB~oIdJ;zO|iCBg6duFXI9W
zFo8h=453TaHX|=AFfek2gOCjU>}H>R`jsiDJG5(on{65{32{>NRWayf(z@`1E`3y{
zl~q+m1VF!oZ8R<b+<2JonKJ294HBq!J2I&?M#u~{(5z2H)k=(oRtaYvtOB+r#Y3k;
zC7>)6HEO=U!CjHKL=i#lMp8Ugu|_5N>}!I}P<2H9jm?W|S`&9IL7fYl&9&Tajy_+U
zco?v~b4#nESGuIQ`WznwnVcDcvISVY32sJ@;zMlYbDE<_<YnWEy^6s~Akr65*7_x|
z=cGhyIbiK>Ci5dU5WJ7A^d;1t&%hFdjAKBEu4x*h0Wpf#4;~SW1h|NlaA3mRdio<%
zGpjR0sd)xQTDw99P<XZM0^$G-hw)*Dp{>Zxme&R|Hv~Sh-6jvSs*KAN5Z798xL3$;
z9FnOe=%tyNnVINO5fKqInrWo~+BvML@!#9pbEQiVVg!noOb1AemW7{KqN380XnRY1
z6{JWGivo+!W+h-4Y3?|6%5MW0g;{{xNZkBE)Ja^Fk<DfLOUuiGU|e;7DEHy&FTyC9
zw|@ElJs$fff7cGbVgpeTho{*g*|kHkokHMWV^Yqw$$NH5G4T8(xzbCN4SD<>-a=Dk
zS=wXMwrouPmhh`==%Vz67fDh|NyHUm+g!3v8g!M10QsMnf&`Qk+^IDn5Z;=Pj1~a5
zH&O@Dx5OLVo51JwnP)1TKRjw}r3i|mBC4vaL`EVgtVLB-h={~hRw5{<f-0)2s-h$1
z$kRYHs;aT68eDmjY!vQor*T(Q=XlE8YdDda9r-uO;Nkp-Ugd{*L9r+xqg-(@M@3W;
zDC8t_qJnLFArXM1;;4pCGBtyOBt;Vyh&;)~05=1ik9*E_DQ%T?tnGw&zTKOqIw1aW
zwnT<AmO#}`*(GvbI`=jT;Lk#Z2*>4IBBKH!Y!!n8(v!LfgK&f~C6?3$*akx+tT+~`
zQ&mt=O;aXEkO2drE@PBN-FLcr75GzJg+W0&@WT#_C_bl!Qmkb@<B?RrV`ti5ZFR$g
zCFUGX8|<!SUR_d9jWnPq1vMrXrl7eh0n1p~b!yuiSb4n8hoLl*Q@+TM%^dmj%_L)u
zq^FIRTb>|P;)P6=sVtW;lDLBoVZy4hQ58{8Mk=bRAgk!m5)flbxZQ3X+O!1lp@o(X
z+SMs00cDV_Ba9V5-r^D{G#CoWD#*P+kpPw8M6zZih>HSCTG3%@-dffX$!=A)MiVAu
zjx<?QS4~dMn;GkCB4sZx)5Nc`R7?8Au(ES4i1RJQZ7S?i^_SMjldfy0yGT+FA7z3G
z0!l4O$T4ul)*&IWS1}0$2oXpi(QC!X(G~F16;y3)yeVz>63v@7iqgAu&330$@?#DN
z6NEditP%rEZN^JA8Y?;mfGz=a@`QwllSz_F08}3)mY~3kP&Ps_ZMmM0c!-Q49`B4#
zcb+A-d|*WH1GzczT?Nf5cK!b{<D9hikOSD7ElCiJaX_p>F_9@3pgiJJ3pWL7W>gA4
zGiDm}BpCOl$LYMDSzZy{NiuK^^pSeEbtH;V1z?N>)ah6m1~G-bIK(87XT3qK5)l9x
z40mqd)(~-IuM57Vl3(pymwh1aNiBL67qdt&_v%@YFH69NQHiccgh3-{yq!_dYu|g@
z=@*8TomE+8s9{vLJ*R@(ZQtwkpx|qDhM_ZY0%9&mHga|(YR7p3b|5p@TdqM_Eg+-s
zLL!iMrX_|_A~gsaL?&2rselIk+iLV;pl>wLF0}Gs9l=WS)!Yq_pXg3`#*M}(Q-g2@
z1`~`<3%;7Tx%_a8V<Tz{&?L4+0hA(=M#z~Exmja+;7A%d0SX`pcKO0JV}Q^=-q$Wo
z!EI97Oj*lC>yAZl!6_V$UBw`=4iMR4eXP@L*K7((7gR*|jKtwYBx*=67Nb8((Hm10
zTeLcBi18~G1vRWQDD}RujxL{FF1l1;K}-aDVK9GC32vRua)mg)2*#^e6+Jh`NDdNs
zn#lrvt*a9C4j5hys&rMN&Pg;`(RCy8V3aTH7)tkQALtO<?Dk*^_65{eWYalsZm)fV
z&N@wqCZ`3Jqs$kY7Wv)oIMqYU>K1(%p_BZdf^uW4^<3GX)mdW&zZuC%MyCDnGxGg7
j?<uY9gnA3bP{k@Mli^~Jj?A#k|M<I-DZ+$+YaEJrhXo~*

literal 0
HcmV?d00001
  

Comments

John Mellor-Crummey Sept. 10, 2021, 3:49 p.m. UTC | #1
My previous patch submission seems to have been overlooked as buildbot issues consumed several days this week. However, discussion in the mailing list now seems to have moved on beyond my submission and I would like my patch considered. Here, I echo my previous submission, except I improved my submission by including the prefix [PATCH] in the subject and I included the patch in the message body rather than as an attachment.

Regarding the DWARF proposal by Cary Coutant for two-level linemaps: I now believe that NVIDIA's implementation is consistent with that proposal, except for the opcode used for inlined calls: Cary's proposal uses DW_LNS_inlined_call (a standard opcode), whereas NVIDIA's implementation uses DW_LNE_inlined_call (an extended opcode).

A note about the source code for the test case reading the extended linemap entries using libdw: this code was copied from another test that used dwarf_next_lines and extended with code that reads the new context and functionname fields of a line table entry.
--
John Mellor-Crummey		Professor
Dept of Computer Science	Rice University
email: johnmc@rice.edu		phone: 713-348-5179

> On Sep 5, 2021, at 7:07 PM, John Mellor-Crummey <johnmc@rice.edu> wrote:
> 
> As of CUDA 11.2, NVIDIA added extensions to the line map section
> of CUDA binaries to represent inlined functions. These extensions
> include
> 
> - two new fields in a line table row to represent inline 
>   information: context, and functionname,
> 
> - two new DWARF extended opcodes: DW_LNE_inlined_call, 
>   DW_LNE_set_function_name,
> 
> - an additional word in the line table header that indicates 
>   the offset in the .debug_str section where the function
>   names for this line table begin, and
> 
> - two new functions in the libdw API: dwarf_linecontext and 
>   dwarf_linefunctionname, which return the new line table fields.
> 
> A line table row for an inlined function contains a non-zero
> "context" value. The "context" field indicates the index of the
> line table row that serves as the call site for an inlined
> context.
> 
> The "functionname" field in a line table row is only meaningful
> if the "context" field of the row is non-zero. A meaningful
> "functionname" field contains an index into the .debug_str
> section relative to the base offset established in the line table
> header; the position in the .debug_str section indicates the name
> of the inlined function.
> 
> These extensions resemble the proposed DWARF extensions
> (http://dwarfstd.org/ShowIssue.php?issue=140906.1) by Cary
> Coutant, but are not identical.
> 
> This patch integrates support for handling NVIDIA's extended
> line maps into elfutils' libdw library and the readelf command
> line utility.
> 
> Since this support is a non-standard extension to DWARF, all code
> that implements the extensions is implemented between markers  
> /* Begin NVIDIA_LINEMAP_INLINING_EXTENSIONS */ and 
> /* End NVIDIA_LINEMAP_INLINING_EXTENSIONS */.
> 
> The definition below
> 
> #define NVIDIA_LINEMAP_INLINING_EXTENSIONS 1
> 
> is added to elfutils/version.h, which enables a client of elfutils 
> to test whether the NVIDIA line map extensions are present. 
> 
> Note: The support for NVIDIA extended line maps adds two integer
> fields (context and functionname) to struct Dwarf_Line_s, which
> makes the structure about 30% larger.
> 
> The patch includes a binary testfile_nvidia_linemap.bz2 that
> contains an NVIDIA extended linemap along with two tests that
> read the line map.
> 
> - A test script run-nvidia-extended-linemap-readelf.sh 
>   checks the output of readelf on the new test binary to 
>   validate its dump of the line op codes containing context 
>   and functionname entries.
> 
> - A test program tests/nvidia_extended_linemap_libdw.c reads 
>   the extended line map with dwarf_next_lines and dumps the 
>   context and functionname fields of the line map where they 
>   are relevant, i.e. the value of context is non-zero. A test 
>   script run-nvidia-extended-linemap-libdw.sh runs this test 
>   and validates its output.
> 
> A patch with the new functionality described above is attached.
> --
> John Mellor-Crummey		Professor
> Dept of Computer Science	Rice University
> email: johnmc@rice.edu		phone: 713-348-5179
  
Mark Wielaard Sept. 10, 2021, 5:11 p.m. UTC | #2
Hi John,

On Fri, 2021-09-10 at 10:49 -0500, John Mellor-Crummey via Elfutils-
devel wrote:
> My previous patch submission seems to have been overlooked as
> buildbot issues consumed several days this week. However, discussion
> in the mailing list now seems to have moved on beyond my submission
> and I would like my patch considered. Here, I echo my previous
> submission, except I improved my submission by including the prefix
> [PATCH] in the subject and I included the patch in the message body
> rather than as an attachment.

Sorry for the buildbot noise, that was indeed a little painful. But a
buildbot that randomly fails is not much fun, so it took highest
priority to get it back to green.

Your submission is really nice, having extensive documentation, all
features implemented, a testcase. Well done.

There are however some concerns outside your control. It is somewhat
disappointing NVIDIA didn't document this themselves, or try to
standardize this. You seem to have a good grasp of what was intended
though. We have to be careful not to extend the api in a way that makes
a better standard/implementation impossible. And the way we implemented
Dwarf_Lines isn't ideal, so this extension shows we should maybe change
it to be more efficient/compact. But maybe we can do that after adding
the extension, we should however have a plan.

> Regarding the DWARF proposal by Cary Coutant for two-level linemaps:
> I now believe that NVIDIA's implementation is consistent with that
> proposal although NVIDIA uses a DWARF extended opcode for inlined
> calls whereas Cary's proposal uses DW_LNS_inlined_call (a standard
> opcode), NVIDIA's implementation uses DW_LNE_inlined_call (an
> extended opcode).

The naming is one of the concerns. It would be better to use a name
like DW_LNE_NVIDIA_inlined_call and DW_LNE_NVIDIA_set_function_name to
show they are vendor extensions and don't clash with possible future
standard opcode names.

That it mimics the two-level linemaps proposal is a good thing. But
let's make sure that the new accessor functions, dwarf_linecontext and
dwarf_linefunctionname, are generic enough that they can hopefully be
reused when two-level linemaps or a similar proposal becomes a
standard.

> A note about the source code for the test case reading the extended
> linemap entries using libdw: this code was copied from another test
> that used dwarf_next_lines and extended with code that reads the new
> context and functionname fields of a line table entry.

Thanks for the test case, it makes clear how the new functionality can
be used. How was the test binary, testfile_nvidia_linemap, generated?
That should probably be documented inside the testcase.

I won't be able to review all the code right now, but here are some
high-level comments, so you know what I am thinking.

> > On Sep 5, 2021, at 7:07 PM, John Mellor-Crummey <johnmc@rice.edu> wrote:
> > 
> > As of CUDA 11.2, NVIDIA added extensions to the line map section
> > of CUDA binaries to represent inlined functions. These extensions
> > include
> > 
> > - two new fields in a line table row to represent inline 
> >   information: context, and functionname,

We didn't design Dwarf_Line_s to be very compact. We already have
the op_index, isa and discriminator fields, which are almost never used.
This adds two more. I wonder if we can split the struct up so that the
extra fields are only allocated when actually used.

Maybe this can be done with a flag in Dwarf_Lines_s indicating whether
the array contains only the basic line attributes or also some extended
values. Of course this makes dwarf_getsrclines even more complex
because it then has to expand the line state struct whenever it first
sees an extended attribute. But I would really like to see if we can
use less memory here. If we agree on some way to make that possible we
can implement it afterwards.

> > - two new DWARF extended opcodes: DW_LNE_inlined_call, 
> >   DW_LNE_set_function_name,

Like I said above I think these names should contain the "vendor" name
DW_LNE_NVIDIA_...

> > - an additional word in the line table header that indicates 
> >   the offset in the .debug_str section where the function
> >   names for this line table begin, and

This is the only part I think is somewhat tricky. I believe you
implement it cleverly by checking the header_length. And your
implementation should work, but it is also the part that no other tool
understands, which means any such binary cannot really be handled by
any other tool. And I don't really understand how this works when
linking objects with such an offset into .debug_str. Normally the
.debug_str section contains strings that can be merged, but that would
disrupt the order, so I don't fully understand how the function_name
indexes are kept correct. This would have been nicer as a DWARF5
extension, where there is already a .debug_str_offsets section (but
also confusing because there is also a special .debug_line_str section
where these strings should probably point at).

> > - two new functions in the libdw API: dwarf_linecontext and 
> >   dwarf_linefunctionname, which return the new line table fields.

I think it would be nice if we could make these functions return a
Dwarf_Line * and a const char * to make them future proof in case a
standard extension encodes these differently.

Cheers,

Mark
  
John Mellor-Crummey Sept. 15, 2021, 6:25 p.m. UTC | #3
Mark,

Thanks for your feedback. I am working on a new version of the patch that changes
the interface for dwarf_linecontext and dwarf_linefunctionname to return a line pointer and a character pointer so it will be future proof.

See my other comments below, e.g. about an idea for reworking the Dwarf_Line_s data structure.
--
John Mellor-Crummey		Professor
Dept of Computer Science	Rice University
email: johnmc@rice.edu		phone: 713-348-5179

> On Sep 10, 2021, at 12:11 PM, Mark Wielaard <mark@klomp.org> wrote:
> 
> Hi John,
> 
> On Fri, 2021-09-10 at 10:49 -0500, John Mellor-Crummey via Elfutils-
> devel wrote:
>> My previous patch submission seems to have been overlooked as
>> buildbot issues consumed several days this week. However, discussion
>> in the mailing list now seems to have moved on beyond my submission
>> and I would like my patch considered. Here, I echo my previous
>> submission, except I improved my submission by including the prefix
>> [PATCH] in the subject and I included the patch in the message body
>> rather than as an attachment.
> 
> Sorry for the buildbot noise, that was indeed a little painful. But a
> buildbot that randomly fails is not much fun, so it took highest
> priority to get it back to green.
> 
> Your submission is really nice, having extensive documentation, all
> features implemented, a testcase. Well done.
> 
> There are however some concerns outside your control. It is somewhat
> disappointing NVIDIA didn't document this themselves, or try to
> standardize this. You seem to have a good grasp of what was intended
> though. We have to be careful not to extend the api in a way that makes
> a better standard/implementation impossible. And the way we implemented
> Dwarf_Lines isn't ideal, so this extension shows we should maybe change
> it to be more efficient/compact. But maybe we can do that after adding
> the extension, we should however have a plan.
> 
>> Regarding the DWARF proposal by Cary Coutant for two-level linemaps:
>> I now believe that NVIDIA's implementation is consistent with that
>> proposal although NVIDIA uses a DWARF extended opcode for inlined
>> calls whereas Cary's proposal uses DW_LNS_inlined_call (a standard
>> opcode), NVIDIA's implementation uses DW_LNE_inlined_call (an
>> extended opcode).
> 
> The naming is one of the concerns. It would be better to use a name
> like DW_LNE_NVIDIA_inlined_call and DW_LNE_NVIDIA_set_function_name to
> show they are vendor extensions and don't clash with possible future
> standard opcode names.

I renamed the two new DWARF extended opcodes as you suggested.

> That it mimics the two-level linemaps proposal is a good thing. But
> let's make sure that the new accessor functions, dwarf_linecontext and
> dwarf_linefunctionname, are generic enough that they can hopefully be
> reused when two-level linemaps or a similar proposal becomes a
> standard.


>> A note about the source code for the test case reading the extended
>> linemap entries using libdw: this code was copied from another test
>> that used dwarf_next_lines and extended with code that reads the new
>> context and functionname fields of a line table entry.
> 
> Thanks for the test case, it makes clear how the new functionality can
> be used. How was the test binary, testfile_nvidia_linemap, generated?
> That should probably be documented inside the testcase.

I documented how the NVIDIA binary used in the two test cases was created
by adding comments to the two test cases.

> I won't be able to review all the code right now, but here are some
> high-level comments, so you know what I am thinking.
> 
>>> On Sep 5, 2021, at 7:07 PM, John Mellor-Crummey <johnmc@rice.edu> wrote:
>>> 
>>> As of CUDA 11.2, NVIDIA added extensions to the line map section
>>> of CUDA binaries to represent inlined functions. These extensions
>>> include
>>> 
>>> - two new fields in a line table row to represent inline 
>>>  information: context, and functionname,
> 
> We didn't design the Dwarf_Line_s very well/compact. We already have
> the op_index, isa and discriminator fields which are almost never used.
> This adds two more. I wonder if we can split the struct up so that the
> extra fields are only used when actually used.



> Maybe this can be done with a flag in Dwarf_Lines_s indicating whether
> the array contains only the basic line attributes or also some extended
> values. Of course this makes dwarf_getsrclines even more complex
> because it then has to expand the line state struct whenever it first
> sees an extended attribute. But I really like to see if we can use less
> memory here. If we agree on some way to make that possible we can
> implement it afterwards.

Here are my thoughts about that. 

I think the type for Dwarf_Lines should be opaque rather than simply
a pointer to a Dwarf_Line_s record. The dwarf_onesrcline routine
requires constant time random access to line records for efficiency.
Here is how I think that could be preserved.

Let the Dwarf_Line_s structure be a union type: a small record with a
bit or two that indicates the record kind, e.g.
enum dwarf_line_type { dwarf_line_compact, dwarf_line_nvidia_inline, dwarf_line_extended2, ... }.
All the fields for the compact record would be in one structure in the
union. Other structures in the union would just contain a pointer to an
out-of-band extended record that is stored elsewhere.

The storage could be managed as follows. Have a buffer to store the
lines. A sequence of compact Dwarf_Line_s records begins at the front of
the buffer. Any time an extended record is needed, allocate it at the
end of the buffer, before the previous extended record. An extended
record will have both a ... This could accommodate extended records of
various lengths. Allocation in the buffer would account for the fact
that compact line records are allocated at the front and the smaller
number of larger extended line records are allocated at the back. When
the next allocation would cause the cursors to cross in the middle, the
buffer is full and another buffer is needed. The Dwarf_Lines type could
have a pointer to a next Dwarf_Lines structure.

If there is concern that finding a line would not really be constant
time, because with a huge number of lines for a file a lookup could
require following an unbounded chain of pointers to next buffers, then
the buffers could be managed in a splay tree or a skip list, which would
give O(log n) time lookup of a line buffer followed by constant-time
indexing into the buffer.

> 
>>> - two new DWARF extended opcodes: DW_LNE_inlined_call, 
>>>  DW_LNE_set_function_name,
> 
> Like I said above I think these names should contain the "vendor" name
> DW_LNE_NVIDIA_...

Done.

> 
>>> - an additional word in the line table header that indicates 
>>>  the offset in the .debug_str section where the function
>>>  names for this line table begin, and
> 
> This is the only part I think is somewhat tricky. I believe you
> implement it cleverly by checking the header_length.

That strategy came from NVIDIA's implementation in cuda-gdb. They check whether there
is a gap between the end of the standard header and the end implied by the header length.
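
A rough sketch of that check, for illustration only: it assumes the
extra word is exactly 4 bytes and little-endian, and read_str_base and
its parameters are hypothetical names rather than code from the patch
or from cuda-gdb.

```c
#include <stddef.h>
#include <stdint.h>

/* HEADER_END is the first byte after the standard header fields that
   the parser consumed.  PROGRAM_START is the first byte of the line
   number program, i.e. the byte after the header_length field plus
   header_length.

   Returns 0 if there is no gap (no extension), 1 if a 4-byte gap was
   found and decoded into *BASE, and -1 if the header is inconsistent.
   Assumption: the extra word is 4 bytes, stored little-endian.  */
static int
read_str_base (const unsigned char *header_end,
               const unsigned char *program_start, uint32_t *base)
{
  if (header_end == program_start)
    return 0;
  if (header_end > program_start || program_start - header_end != 4)
    return -1;
  *base = ((uint32_t) header_end[0]
           | (uint32_t) header_end[1] << 8
           | (uint32_t) header_end[2] << 16
           | (uint32_t) header_end[3] << 24);
  return 1;
}
```

The decoded value would then serve as the base offset into .debug_str
to which each row's functionname is added.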

> And your
> implementation should work, but it is also the part that no other tool
> understands, which means any such binary cannot really be handled by
> any other tool. And I don't really understand how this works when
> linking objects with such an offset into .debug_str. Normally the
> .debug_str section contains strings that can be merged, but that would
> disrupt the order, so I don't fully understand how the function_name
> indexes are kept correct. This would have been nicer as a DWARF5
> extension, where there is already a .debug_str_offsets section (but
> also confusing because there is also a special .debug_line_str section
> where these strings should probably point at).
> 
>>> - two new functions in the libdw API: dwarf_linecontext and 
>>>  dwarf_linefunctionname, which return the new line table fields.
> 
> I think it would be nice if we could make these functions return a
> Dwarf_Line * and a const char * to make them future proof in case a
> standard extension encodes these differently.

I will work on a patch that does just this as soon as I have time.
  
John Mellor-Crummey Nov. 4, 2021, 9:41 p.m. UTC | #4
[We would really like this patch in the forthcoming release]

Attached is a new version of the patch for reading inlining information encoded in an enhanced line map format used in NVIDIA GPU binaries for CUDA 11.2+.

This is an updated version of a patch first submitted on Sept. 5. A copy of the original submission email is quoted below this note. 

Here I describe just the improvements to that patch that address Mark’s concerns:

(1) All of the code for handling NVIDIA DWARF extensions is always available; no special configuration switch is needed.
(2) All changes are bracketed by comments that mark them as NVIDIA extensions.
(3) The DWARF extended opcodes have been renamed with names that include NVIDIA.
(4) The two new API functions that surface the new information have been improved to separate the interface result from the internal representation (at Mark’s request):
	(4a) The API for extracting the name of an inlined function in a DWARF line now returns a const char * instead of a string table index.
	(4b) The API for extracting an inline “context” now returns a pointer to the DWARF line where the code is inlined rather than an unsigned int (an index into the line table that one could use to compute the pointer).
(5) There are test cases for readelf and libdw that use a binary generated by NVIDIA’s compiler. The test cases include information about how the binary was generated.
--
John Mellor-Crummey		Professor
Dept of Computer Science	Rice University
email: johnmc@rice.edu		phone: 713-348-5179

Description of the first version of the patch: 

> On Sep 5, 2021, at 7:07 PM, John Mellor-Crummey <johnmc@rice.edu> wrote:
> 
> As of CUDA 11.2, NVIDIA added extensions to the line map section
> of CUDA binaries to represent inlined functions. These extensions
> include
> 
> - two new fields in a line table row to represent inline 
>   information: context, and functionname,
> 
> - two new DWARF extended opcodes: DW_LNE_inlined_call, 
>   DW_LNE_set_function_name,
> 
> - an additional word in the line table header that indicates 
>   the offset in the .debug_str section where the function
>   names for this line table begin, and
> 
> - two new functions in the libdw API: dwarf_linecontext and 
>   dwarf_linefunctionname, which return the new line table fields.
> 
> A line table row for an inlined function contains a non-zero
> "context" value. The “context” field indicates the index of the
> line table row that serves as the call site for an inlined
> context.
> 
> The "functionname" field in a line table row is only meaningful
> if the "context" field of the row is non-zero. A meaningful
> "functionname" field contains an index into the .debug_str
> section relative to the base offset established in the line table
> header; the position in the .debug_str section indicates the name
> of the inlined function.
> 
> These extensions resemble the proposed DWARF extensions
> (http://dwarfstd.org/ShowIssue.php?issue=140906.1) by Cary
> Coutant, but are not identical.
> 
> This patch integrates support for handling NVIDIA's extended
> line maps into elfutils' libdw library and the readelf command
> line utility.
> 
> Since this support is a non-standard extension to DWARF, all code
> that implements the extensions is implemented between markers  
> /* Begin NVIDIA_LINEMAP_INLINING_EXTENSIONS */ and 
> /* End NVIDIA_LINEMAP_INLINING_EXTENSIONS */.
> 
> The definition below
> 
> #define NVIDIA_LINEMAP_INLINING_EXTENSIONS 1
> 
> is added to elfutils/version.h, which enables a client of elfutils 
> to test whether the NVIDIA line map extensions are present. 
> 
> Note: The support for NVIDIA extended line maps adds two integer
> fields (context and functionname) to struct Dwarf_Line_s, which
> makes the structure about 30% larger.
> 
> The patch includes a binary testfile_nvidia_linemap.bz2 that
> contains an NVIDIA extended linemap along with two tests that
> read the line map.
> 
> - A test script run-nvidia-extended-linemap-readelf.sh 
>   checks the output of readelf on the new test binary to 
>   validate its dump of the line op codes containing context 
>   and functionname entries.
> 
> - A test program tests/nvidia_extended_linemap_libdw.c reads 
>   the extended line map with dwarf_next_lines and dumps the 
>   context and functionname fields of the line map where they 
>   are relevant, i.e. the value of context is non-zero. A test 
>   script run-nvidia-extended-linemap-libdw.sh runs this test 
>   and validates its output.
> 
> A patch with the new functionality described above is attached.

Discussion about the first version of the patch:

> On Sep 15, 2021, at 1:25 PM, John Mellor-Crummey <johnmc@rice.edu> wrote:
> 
> Mark,
> 
> Thanks for your feedback. I am working on a new version of the patch that changes
> the interface for dwarf_linecontext and dwarf_linefunctionname to return a line pointer and a character pointer so it will be future proof.
> 
> See my other comments below, e.g. about an idea for reworking the Dwarf_Line_s data structure.
> --
> John Mellor-Crummey		Professor
> Dept of Computer Science	Rice University
> email: johnmc@rice.edu		phone: 713-348-5179
> 
>> On Sep 10, 2021, at 12:11 PM, Mark Wielaard <mark@klomp.org> wrote:
>> 
>> Hi John,
>> 
>> On Fri, 2021-09-10 at 10:49 -0500, John Mellor-Crummey via Elfutils-
>> devel wrote:
>>> My previous patch submission seems to have been overlooked as
>>> buildbot issues consumed several days this week. However, discussion
>>> in the mailing list now seems to have moved on beyond my submission
>>> and I would like my patch considered. Here, I echo my previous
>>> submission, except I improved my submission by including the prefix
>>> [PATCH] in the subject and I included the patch in the message body
>>> rather than as an attachment.
>> 
>> Sorry for the buildbot noise, that was indeed a little painful. But a
>> buildbot that randomly fails is not much fun, so it took highest
>> priority to get it back to green.
>> 
>> Your submission is really nice, having extensive documentation, all
>> features implemented, a testcase. Well done.
>> 
>> There are however some concerns outside your control. It is somewhat
>> disappointing NVIDIA didn't document this themselves, or try to
>> standardize this. You seem to have a good grasp of what was intended
>> though. We have to be careful not to extend the api in a way that makes
>> a better standard/implementation impossible. And the way we implemented
>> Dwarf_Lines isn't ideal, so this extension shows we should maybe change
>> it to be more efficient/compact. But maybe we can do that after adding
>> the extension, we should however have a plan.
>> 
>>> Regarding the DWARF proposal by Cary Coutant for two-level linemaps:
>>> I now believe that NVIDIA’s implementation is consistent with that
>>> proposal although NVIDIA uses a DWARF extended opcode for inlined
>>> calls whereas Cary’s proposal uses DW_LNS_inlined_call (a standard
>>> opcode), NVIDIA’s implementation uses DW_LNE_inlined_call (an
>>> extended opcode).
>> 
>> The naming is one of the concerns. It would be better to use a name
>> like DW_LNE_NVIDIA_inlined_call and DW_LNE_NVIDIA_set_function_name to
>> show they are vendor extensions and don't clash with possible future
>> standard opcode names.
> 
> I renamed the two new DWARF extended opcodes as you suggested.
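
To make the operand layout of the two extended opcodes concrete, here is a minimal standalone sketch of how they could be decoded. It is not the libdw code. The opcode values 144 and 145 come from the patch, and the NVIDIA-prefixed names follow the renaming agreed above. The helper read_uleb128, the struct inline_state, and the function decode_inline_op are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values as they appear in the patch; the NVIDIA-prefixed names
   follow the renaming agreed above.  */
enum
{
  DW_LNE_NVIDIA_inlined_call = 144,
  DW_LNE_NVIDIA_set_function_name = 145
};

/* Hypothetical holder for the two new line-state fields.  */
struct inline_state
{
  uint64_t context;        /* call-site row index, 0 = not inlined */
  uint64_t function_name;  /* offset into .debug_str */
};

/* Hypothetical helper: decode one ULEB128 value from [*pp, end).
   Returns 0 on success, -1 on truncated or overlong input.  */
static int
read_uleb128 (const unsigned char **pp, const unsigned char *end,
              uint64_t *out)
{
  assert (pp != NULL && *pp != NULL && end != NULL && out != NULL);
  uint64_t value = 0;
  const unsigned char *p = *pp;
  for (unsigned int shift = 0; p < end && shift < 64; shift += 7)
    {
      unsigned char byte = *p++;
      value |= (uint64_t) (byte & 0x7f) << shift;
      if ((byte & 0x80) == 0)
        {
          *pp = p;
          *out = value;
          return 0;
        }
    }
  return -1;
}

/* Decode the operands of one of the two extended opcodes into *st.
   str_base is the .debug_str base offset from the line table header.
   *st and *pp are only updated when the whole operand list decodes.  */
static int
decode_inline_op (unsigned int opcode, const unsigned char **pp,
                  const unsigned char *end, uint64_t str_base,
                  struct inline_state *st)
{
  const unsigned char *p = *pp;
  uint64_t context = st->context;
  uint64_t name;

  if (opcode != DW_LNE_NVIDIA_inlined_call
      && opcode != DW_LNE_NVIDIA_set_function_name)
    return -1;
  if (opcode == DW_LNE_NVIDIA_inlined_call
      && read_uleb128 (&p, end, &context) != 0)
    return -1;
  if (read_uleb128 (&p, end, &name) != 0)
    return -1;

  st->context = context;
  st->function_name = name + str_base;
  *pp = p;
  return 0;
}
```

The inlined-call opcode carries two ULEB128 operands (context, then function name), while set-function-name carries only the name. In both cases the name is rebased by the offset read from the line table header, as in the patch.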
> 
>> That it mimics the two-level linemaps proposal is a good thing. But
>> let's make sure that the new accessor functions, dwarf_linecontext and
>> dwarf_linefunctionname, are generic enough that they can hopefully be
>> reused when two-level linemaps or a similar proposal becomes a
>> standard.
> 
> 
>>> A note about the source code for the test case reading the extended
>>> linemap entries using libdw: this code was copied from another test
>>> that used dwarf_next_lines and extended with code that reads the new
>>> context and functionname fields of a line table entry.
>> 
>> Thanks for the test case, it makes clear how the new functionality can
>> be used. How was the test binary, testfile_nvidia_linemap, generated?
>> That should probably be documented inside the testcase.
> 
> I documented how the NVIDIA binary used in the two test cases was created
> by adding comments to the two test cases.
> 
>> I won't be able to review all the code right now, but here are some
>> high-level comments, so you know what I am thinking.
>> 
>> On Sep 5, 2021, at 7:07 PM, John Mellor-Crummey <johnmc@rice.edu <mailto:johnmc@rice.edu>>
>>>> wrote:
>>>> 
>>>> As of CUDA 11.2, NVIDIA added extensions to the line map section
>>>> of CUDA binaries to represent inlined functions. These extensions
>>>> include
>>>> 
>>>> - two new fields in a line table row to represent inline 
>>>>  information: context, and functionname,
>> 
>> We didn't design the Dwarf_Line_s very well/compact. We already have
>> the op_index, isa and discriminator fields which are almost never used.
>> This adds two more. I wonder if we can split the struct up so that the
>> extra fields are only used when actually used.
> 
> 
> 
>> Maybe this can be done with a flag in Dwarf_Lines_s indicating whether
>> the array contains only the basic line attributes or also some extended
>> values. Of course this makes dwarf_getsrclines even more complex
>> because it then has to expand the line state struct whenever it first
>> sees an extended attribute. But I really like to see if we can use less
>> memory here. If we agree on some way to make that possible we can
>> implement it afterwards.
> 
> Here are my thoughts about that. 
> 
> I think the type for Dwarf_Lines should be opaque rather than simply
> a pointer to a Dwarf_Line_s record. The dwarf_onesrcline routine
> requires constant time random access to line records for efficiency.
> Here is how I think that could be preserved.
> 
> Let the Dwarf_Line_s structure be a union type: a small record with a bit
> or two that indicates the record kind, e.g.
> enum dwarf_line_type { dwarf_line_compact, dwarf_line_nvidia_inline, dwarf_line_extended2, … }.
> All the fields for the compact record would be in one structure in the
> union. Other structures in the union would just contain a pointer to an
> out-of-band extended record that is stored elsewhere.
>
> The storage could be managed as follows. Have a buffer to store the
> lines. A sequence of compact Dwarf_Line_s records begins at the front of
> the buffer. Any time an extended record is needed, allocate it at the end
> of the buffer, before the previous extended record. An extended record
> will have both a [...]. This could accommodate extended records of
> various lengths. Allocations in the buffer would manage the fact that
> compact line records are allocated at the front and the smaller number of
> larger extended line records are allocated at the back. When the next
> allocation would cause the cursors to cross in the middle, the buffer is
> full and another buffer is needed. The Dwarf_Lines type could have a
> pointer to a next Dwarf_Lines structure.
>
> If there is concern that finding a line would not really be constant
> time, because with a huge number of lines for a file a line lookup could
> require following an unbounded chain of pointers to a next buffer, then
> the buffers could be managed in a splay tree or a skip list, which would
> give O(log n) time lookup of a line buffer followed by a constant time
> indexing into the buffer.
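
The two-ended buffer idea described above could look roughly like the following. This is a simplified sketch under stated assumptions, not a proposal for the actual libdw layout. Instead of a union, each compact record carries a kind tag and an optional pointer to an out-of-band record holding only the extra fields. All type and function names (line_buffer, compact_line, ext_record, line_buffer_add, and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum line_kind { LINE_COMPACT, LINE_NVIDIA_INLINE };

/* Out-of-band record, allocated from the back of the buffer.  */
struct ext_record { unsigned int context; unsigned int function_name; };

/* Fixed-size record, allocated from the front of the buffer.  */
struct compact_line
{
  uint64_t addr;
  unsigned int lineno;
  enum line_kind kind;
  struct ext_record *ext;   /* NULL when kind == LINE_COMPACT */
};

struct line_buffer
{
  struct line_buffer *next; /* chain link for when this buffer is full */
  unsigned char *data;      /* malloc'd storage */
  size_t front;             /* bytes used by compact records */
  size_t back;              /* offset of the lowest extended record */
};

static struct line_buffer *
line_buffer_new (size_t capacity)
{
  struct line_buffer *b = malloc (sizeof *b);
  if (b == NULL)
    return NULL;
  /* Keep back-allocated records aligned: round the capacity down.  */
  capacity -= capacity % sizeof (struct ext_record);
  b->data = malloc (capacity);
  if (b->data == NULL)
    {
      free (b);
      return NULL;
    }
  b->next = NULL;
  b->front = 0;
  b->back = capacity;
  return b;
}

/* Append one line; pass ext == NULL for a plain compact record.
   Returns NULL when the two cursors would cross (buffer full).  */
static struct compact_line *
line_buffer_add (struct line_buffer *b, uint64_t addr, unsigned int lineno,
                 const struct ext_record *ext)
{
  size_t need = sizeof (struct compact_line)
                + (ext != NULL ? sizeof (struct ext_record) : 0);
  assert (b->front <= b->back);
  if (b->back - b->front < need)
    return NULL;

  struct compact_line *line = (struct compact_line *) (b->data + b->front);
  b->front += sizeof *line;
  line->addr = addr;
  line->lineno = lineno;
  line->kind = LINE_COMPACT;
  line->ext = NULL;
  if (ext != NULL)
    {
      b->back -= sizeof *ext;
      line->ext = (struct ext_record *) (b->data + b->back);
      *line->ext = *ext;
      line->kind = LINE_NVIDIA_INLINE;
    }
  return line;
}

/* Constant-time random access within one buffer.  */
static struct compact_line *
line_buffer_get (const struct line_buffer *b, size_t idx)
{
  if (idx >= b->front / sizeof (struct compact_line))
    return NULL;
  return (struct compact_line *) b->data + idx;
}

static void
line_buffer_free (struct line_buffer *b)
{
  free (b->data);
  free (b);
}
```

Because compact records have a fixed size, indexing within a buffer stays constant time. Only the lookup of the right buffer in a chain would need the tree or skip list mentioned above.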
> 
>> 
>>>> - two new DWARF extended opcodes: DW_LNE_inlined_call, 
>>>>  DW_LNE_set_function_name,
>> 
>> Like I said above I think these names should contain the "vendor" name
>> DW_LNE_NVIDIA_...
> 
> Done.
> 
>> ...
> 
>> 
>>>> - two new functions in the libdw API: dwarf_linecontext and 
>>>>  dwarf_linefunctionname, which return the new line table fields.
>> 
>> I think it would be nice if we could make these functions return a
>> Dwarf_Line * and a const char * to make them future proof in case a standard extension encodes these differently.
  
Mark Wielaard Nov. 5, 2021, 9:34 a.m. UTC | #5
Hi,

On Thu, 2021-11-04 at 16:41 -0500, John Mellor-Crummey via Elfutils-
devel wrote:
> [We would really like this patch in the forthcoming release]
> 
> Attached is a new version of the patch for reading inlining
> information encoded in an enhanced line map format used in NVIDIA GPU
> binaries for CUDA 11.2+.

It looks like the attachment is missing. Or the mailing list removed it
for some reason, but I also didn't see it here:
https://sourceware.org/pipermail/elfutils-devel/2021q4/004307.html

Could you resend it?

Thanks,

Mark

> This is an updated version of a patch first submitted on Sept. 5. A
> copy of the original submission email is quoted below this note. 
> 
> Here I describe just the improvements to that patch that address
> Mark’s concerns:
> 
> (1) all of the code for handling NVIDIA DWARF extensions is always
> available; there is no special configuration switch needed.
> (2) all changes are bracketed by comments that mark them NVIDIA
> extensions
> (3) the DWARF extended opcodes have been renamed with names that
> include NVIDIA in them
> (4) the two new API functions to surface the new information have
> been improved to separate the interface result from the internal
> representation (at Mark’s request)
> 	(4a) the API for extracting the name of an inlined function in
> a DWARF line now returns a const char * instead of a string table
> index
> 	(4b) the API for extracting an inline “context” now returns a
> pointer to a DWARF line where the code is inlined rather than
> returning an unsigned int (an index into the line table that one
> could use to compute the pointer)
> (5) there are test cases for readelf and libdw that use a binary
> generated by NVIDIA’s compiler. the test cases include information
> about how the binary was generated
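
The interface change in (4a) and (4b) can be illustrated with a toy model that keeps the raw index and string offset internal and hands callers a row pointer and a C string. This is a sketch under assumptions, not the libdw implementation. The types toy_line and toy_lines and both functions are hypothetical stand-ins. Treating the context as a 1-based row index, so that 0 can mean "not inlined", is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the internal line table types.  */
struct toy_line
{
  unsigned int lineno;
  unsigned int context;        /* 0 = not inlined */
  unsigned int function_name;  /* offset into the string section */
};

struct toy_lines
{
  size_t nlines;
  struct toy_line *info;
};

/* Return the row that is the call site of LINE, or NULL when LINE is
   not inlined or the stored index is out of range.  */
static const struct toy_line *
toy_linecontext (const struct toy_lines *lines, const struct toy_line *line)
{
  if (lines == NULL || line == NULL)
    return NULL;
  if (line->context == 0 || line->context > lines->nlines)
    return NULL;
  /* Assumption for this sketch: context is a 1-based row index.  */
  return &lines->info[line->context - 1];
}

/* Return the inlined function's name, or NULL when the field is not
   meaningful (context == 0) or the offset is invalid.  */
static const char *
toy_linefunctionname (const char *debug_str, size_t debug_str_size,
                      const struct toy_line *line)
{
  if (debug_str == NULL || line == NULL || line->context == 0)
    return NULL;
  if (line->function_name >= debug_str_size)
    return NULL;
  /* Require a terminating NUL inside the section.  */
  if (memchr (debug_str + line->function_name, '\0',
              debug_str_size - line->function_name) == NULL)
    return NULL;
  return debug_str + line->function_name;
}
```

Returning pointers rather than raw integers means callers do not depend on how the context and name are encoded, which is the future-proofing asked for in the review.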
  
Mark Wielaard Nov. 10, 2021, 10:16 a.m. UTC | #6
Hi John,

On Thu, Nov 04, 2021 at 04:41:58PM -0500, John Mellor-Crummey via Elfutils-devel wrote:
> Here I describe just the improvements to that patch that address Mark’s concerns:
>
> (1) all of the code for handling NVIDIA DWARF extensions is always
> available; there is no special configuration switch needed.
> (2) all changes are bracketed by comments that mark them NVIDIA
> extensions
> (3) the DWARF extended opcodes have been renamed with names that
> include NVIDIA in them
> (4) the two new API functions to surface the new information have
> been improved to separate the interface result from the internal
> representation (at Mark’s request)
>      (4a) the API for extracting the name of an inlined function in a
>         DWARF line now returns a const char * instead of a string
>         table index
>      (4b) the API for extracting an inline “context” now returns a
>         pointer to a DWARF line where the code is inlined rather
>         than returning an unsigned int (an index into the line table
>         that one could use to compute the pointer)
> (5) there are test cases for readelf and libdw that use a binary
> generated by NVIDIA’s compiler. the test cases include information
> about how the binary was generated

This is really nice. I did make a few tweaks:

- Added your original overview as commit message because it contains
  all the relevant context and pointers to more information.

- Added ChangeLog and NEWS entries, mainly for my own review.

- I removed the bracketed comments, I think they cluttered the code
  and made it seem like we wanted to remove it or disable at some
  point. I think it should just be considered part of the normal code
  now.

- I removed the NVIDIA_LINEMAP_INLINING_EXTENSIONS define from
  version.h. If people want they can have a configure check for the
  new dwarf_linecontext or dwarf_linefunctionname functions or the
  DW_LNE_NVIDIA_inlined_call or DW_LNE_NVIDIA_set_function_name
  constants.

- I made dwarf_linefunctionname always return NULL on error (not
  the magic string "???", which is still used in readelf).

- Changed the header check to be exactly 4 bytes, so we are sure to be
  able to read the str offset completely (if it is smaller or larger
  we cannot handle it).

- The new dwarf_linecontext and dwarf_linefunctionname get their own
  new ELFUTILS_0.186 section in libdw.map because they are introduced
  with version 0.186.

- The new run-nvidia-extended-linemap-libdw.sh and
  run-nvidia-extended-linemap-readelf.sh scripts and
  testfile_nvidia_linemap.bz2 testfile were added to EXTRA_DIST so
  they show up in a dist tarball.
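
The "exactly 4 bytes" header check mentioned in the list above can be sketched as a small standalone function. This is an illustration only, not the committed code. The function name read_header_str_offset is hypothetical, and little-endian byte order is an assumption made to keep the example short (libdw reads in the byte order of the file).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* LINEP points just past the standard line table header fields and
   HEADER_END is header_start + header_length.

   Returns 0 and stores the .debug_str base offset in *OFFSET when the
   header has no extra bytes (offset 0) or exactly 4 extra bytes.
   Returns -1 for any other amount of leftover header data, since a
   smaller or larger field cannot be interpreted.  */
static int
read_header_str_offset (const unsigned char *linep,
                        const unsigned char *header_end,
                        uint32_t *offset)
{
  assert (linep != NULL && header_end != NULL && offset != NULL);
  if (linep > header_end)
    return -1;

  size_t left = (size_t) (header_end - linep);
  if (left == 0)
    {
      *offset = 0;   /* plain DWARF header, no extension word */
      return 0;
    }
  if (left != 4)
    return -1;

  /* Assumption for this sketch: the word is stored little-endian.  */
  *offset = (uint32_t) linep[0]
            | (uint32_t) linep[1] << 8
            | (uint32_t) linep[2] << 16
            | (uint32_t) linep[3] << 24;
  return 0;
}
```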

Patch as committed attached. Hope you don't mind the cleanups.

We still want to reduce the size of the Dwarf_Line_s and struct
line_state (independent of these extensions). I opened a new bug for
that: https://sourceware.org/bugzilla/show_bug.cgi?id=28574

Thanks,

Mark
  
John Mellor-Crummey Nov. 10, 2021, 3:14 p.m. UTC | #7
Mark,

Your tweaks are fine. Many thanks for accepting our patch before 186!
--
John Mellor-Crummey		Professor
Dept of Computer Science	Rice University
email: johnmc@rice.edu		phone: 713-348-5179

  

Patch

diff --git a/ChangeLog b/ChangeLog
index 6255fe61..d4b89f9c 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@ 
+2021-09-03  John Mellor-Crummey <johnmc@rice.edu>
+
+	* NEWS: Read inlining info in NVIDIA extended line map
+
 2021-08-10  Adrian Ratiu  <adrian.ratiu@collabora.com>
 
 	* configure.ac (AC_CACHE_CHECK): Rework std=gnu99 check to allow clang.
diff --git a/config/version.h.in b/config/version.h.in
index 34e62c3b..8b30d144 100644
--- a/config/version.h.in
+++ b/config/version.h.in
@@ -35,4 +35,8 @@ 
 #define _ELFUTILS_PREREQ(major, minor) \
   (_ELFUTILS_VERSION >= ((major) * 1000 + (minor)))
 
+/* Begin NVIDIA_LINEMAP_INLINING_EXTENSIONS */
+#define NVIDIA_LINEMAP_INLINING_EXTENSIONS 1
+/* End NVIDIA_LINEMAP_INLINING_EXTENSIONS */
+
 #endif	/* elfutils/version.h */
diff --git a/libdw/Makefile.am b/libdw/Makefile.am
index 6b7834af..4fda33bd 100644
--- a/libdw/Makefile.am
+++ b/libdw/Makefile.am
@@ -63,6 +63,7 @@  libdw_a_SOURCES = dwarf_begin.c dwarf_begin_elf.c dwarf_end.c dwarf_getelf.c \
 		  dwarf_linesrc.c dwarf_lineno.c dwarf_lineaddr.c \
 		  dwarf_linecol.c dwarf_linebeginstatement.c \
 		  dwarf_lineendsequence.c dwarf_lineblock.c \
+		  dwarf_linecontext.c dwarf_linefunctionname.c \
 		  dwarf_lineprologueend.c dwarf_lineepiloguebegin.c \
 		  dwarf_lineisa.c dwarf_linediscriminator.c \
 		  dwarf_lineop_index.c dwarf_line_file.c \
diff --git a/libdw/dwarf.h b/libdw/dwarf.h
index 19a4be96..2faf852a 100644
--- a/libdw/dwarf.h
+++ b/libdw/dwarf.h
@@ -844,6 +844,10 @@  enum
     DW_LNE_set_discriminator = 4,
 
     DW_LNE_lo_user = 128,
+    /* Begin  NVIDIA_LINEMAP_INLINING_EXTENSIONS */
+    DW_LNE_inlined_call = 144,
+    DW_LNE_set_function_name = 145,
+    /* End  NVIDIA_LINEMAP_INLINING_EXTENSIONS */
     DW_LNE_hi_user = 255
   };
 
diff --git a/libdw/dwarf_getsrclines.c b/libdw/dwarf_getsrclines.c
index d6a581ad..d7907d4d 100644
--- a/libdw/dwarf_getsrclines.c
+++ b/libdw/dwarf_getsrclines.c
@@ -93,6 +93,10 @@  struct line_state
   struct linelist *linelist;
   size_t nlinelist;
   unsigned int end_sequence;
+/* Begin NVIDIA_LINEMAP_INLINING_EXTENSIONS */
+  unsigned int context;
+  unsigned int function_name;
+/* End NVIDIA_LINEMAP_INLINING_EXTENSIONS */
 };
 
 static inline void
@@ -139,6 +143,10 @@  add_new_line (struct line_state *state, struct linelist *new_line)
   SET (epilogue_begin);
   SET (isa);
   SET (discriminator);
+/* Begin NVIDIA_LINEMAP_INLINING_EXTENSIONS */
+  SET (context);
+  SET (function_name);
+/* End NVIDIA_LINEMAP_INLINING_EXTENSIONS */
 
 #undef SET
 
@@ -165,6 +173,13 @@  read_srclines (Dwarf *dbg,
 #define MAX_STACK_FILES (MAX_STACK_ALLOC / 4)
 #define MAX_STACK_DIRS  (MAX_STACK_ALLOC / 16)
 
+/* Begin NVIDIA_LINEMAP_INLINING_EXTENSIONS */
+  /* reduce the MAX_STACK_LINES when using NVIDIA linemap inlining extensions, which
+     increase the size of the line structure by two unsigned int */
+#undef MAX_STACK_LINES
+#define MAX_STACK_LINES (MAX_STACK_ALLOC / 2)
+/* End NVIDIA_LINEMAP_INLINING_EXTENSIONS */
+
   /* Initial statement program state (except for stmt_list, see below).  */
   struct line_state state =
     {
@@ -180,7 +195,11 @@  read_srclines (Dwarf *dbg,
       .prologue_end = false,
       .epilogue_begin = false,
       .isa = 0,
-      .discriminator = 0
+      .discriminator = 0,
+/* Begin NVIDIA_LINEMAP_INLINING_EXTENSIONS */
+      .context = 0,
+      .function_name = 0
+/* End NVIDIA_LINEMAP_INLINING_EXTENSIONS */
     };
 
   /* The dirs normally go on the stack, but if there are too many
@@ -648,6 +667,14 @@  read_srclines (Dwarf *dbg,
 	}
     }
 
+/* Begin NVIDIA_LINEMAP_INLINING_EXTENSIONS */
+  unsigned int debug_str_offset __attribute__((unused)) = 0;
+  if (unlikely (linep < header_start + header_length)) {
+      /* CUBINs contain an unsigned 4-byte offset */
+      debug_str_offset = read_4ubyte_unaligned_inc (dbg, linep);
+  }
+/* End NVIDIA_LINEMAP_INLINING_EXTENSIONS */
+
   /* Consistency check.  */
   if (unlikely (linep != header_start + header_length))
     {
@@ -753,6 +780,10 @@  read_srclines (Dwarf *dbg,
 	      state.epilogue_begin = false;
 	      state.isa = 0;
 	      state.discriminator = 0;
+/* Begin NVIDIA_LINEMAP_INLINING_EXTENSIONS */
+	      state.context = 0;
+	      state.function_name = 0;
+/* End NVIDIA_LINEMAP_INLINING_EXTENSIONS */
 	      break;
 
 	    case DW_LNE_set_address:
@@ -831,6 +862,25 @@  read_srclines (Dwarf *dbg,
 	      get_uleb128 (state.discriminator, linep, lineendp);
 	      break;
 
+/* Begin NVIDIA_LINEMAP_INLINING_EXTENSIONS */
+	    case DW_LNE_inlined_call:
+	      if (unlikely (linep >= lineendp))
+		goto invalid_data;
+	      get_uleb128 (state.context, linep, lineendp);
+	      if (unlikely (linep >= lineendp))
+		goto invalid_data;
+	      get_uleb128 (state.function_name, linep, lineendp);
+	      state.function_name += debug_str_offset;
+	      break;
+
+	    case DW_LNE_set_function_name:
+	      if (unlikely (linep >= lineendp))
+		goto invalid_data;
+	      get_uleb128 (state.function_name, linep, lineendp);
+	      state.function_name += debug_str_offset;
+	      break;
+/* End NVIDIA_LINEMAP_INLINING_EXTENSIONS */
+
 	    default:
 	      /* Unknown, ignore it.  */
 	      if (unlikely ((size_t) (lineendp - (linep - 1)) < len))
diff --git a/libdw/dwarf_linecontext.c b/libdw/dwarf_linecontext.c
new file mode 100644
index 00000000..f1c78dfd
--- /dev/null
+++ b/libdw/dwarf_linecontext.c
@@ -0,0 +1,47 @@ 
+/* Return context in line.
+   This file is part of elfutils.
+   Written by John Mellor-Crummey <johnmc@rice.edu>, 2021.
+
+   This file is free software; you can redistribute it and/or modify
+   it under the terms of either
+
+     * the GNU Lesser General Public License as published by the Free
+       Software Foundation; either version 3 of the License, or (at
+       your option) any later version
+
+   or
+
+     * the GNU General Public License as published by the Free
+       Software Foundation; either version 2 of the License, or (at
+       your option) any later version
+
+   or both in parallel, as here.
+
+   elfutils is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received copies of the GNU General Public License and
+   the GNU Lesser General Public License along with this program.  If
+   not, see <http://www.gnu.org/licenses/>.  */
+
+#ifdef HAVE_CONFIG_H
+# include <config.h>
+#endif
+
+#include "libdwP.h"
+
+
+/* Begin NVIDIA_LINEMAP_INLINING_EXTENSIONS */
+int
+dwarf_linecontext (Dwarf_Line *line, unsigned int *contextp)
+{
+  if (line == NULL)
+    return -1;
+
+  *contextp =  line->context;
+
+  return 0;
+}
+/* End NVIDIA_LINEMAP_INLINING_EXTENSIONS */
diff --git a/libdw/dwarf_linefunctionname.c b/libdw/dwarf_linefunctionname.c
new file mode 100644
index 00000000..a7027135
--- /dev/null
+++ b/libdw/dwarf_linefunctionname.c
@@ -0,0 +1,47 @@ 
+/* Return function name in line.
+   This file is part of elfutils.
+   Written by John Mellor-Crummey <johnmc@rice.edu>, 2021.
+
+   This file is free software; you can redistribute it and/or modify
+   it under the terms of either
+
+     * the GNU Lesser General Public License as published by the Free
+       Software Foundation; either version 3 of the License, or (at
+       your option) any later version
+
+   or
+
+     * the GNU General Public License as published by the Free
+       Software Foundation; either version 2 of the License, or (at
+       your option) any later version
+
+   or both in parallel, as here.
+
+   elfutils is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received copies of the GNU General Public License and
+   the GNU Lesser General Public License along with this program.  If
+   not, see <http://www.gnu.org/licenses/>.  */
+
+#ifdef HAVE_CONFIG_H
+# include <config.h>
+#endif
+
+#include "libdwP.h"
+
+
+/* Begin NVIDIA_LINEMAP_INLINING_EXTENSIONS */
+int
+dwarf_linefunctionname (Dwarf_Line *line, unsigned int *functionnamep)
+{
+  if (line == NULL)
+    return -1;
+
+  *functionnamep =  line->function_name;
+
+  return 0;
+}
+/* End NVIDIA_LINEMAP_INLINING_EXTENSIONS */
diff --git a/libdw/libdw.h b/libdw/libdw.h
index 77174d28..730f9338 100644
--- a/libdw/libdw.h
+++ b/libdw/libdw.h
@@ -701,6 +701,18 @@  extern int dwarf_linediscriminator (Dwarf_Line *line, unsigned int *discp)
 extern const char *dwarf_linesrc (Dwarf_Line *line,
 				  Dwarf_Word *mtime, Dwarf_Word *length);
 
+/* Begin NVIDIA_LINEMAP_INLINING_EXTENSIONS */
+/* NVIDIA extension: Return inline context in this record. A non-zero context
+   value represents an inline context */
+extern int dwarf_linecontext (Dwarf_Line *line, unsigned int *contextp);
+
+/* NVIDIA extension: Return function name in this record. When context is
+   non-zero, the value of function name is an offset into the .debug_str section,
+   which contains a character string that specifies the name of an inlined
+   function. */
+extern int dwarf_linefunctionname (Dwarf_Line *line, unsigned int *functionnamep);
+/* End NVIDIA_LINEMAP_INLINING_EXTENSIONS */
+
 /* Return file information.  The returned string is NULL when
    an error occurred, or the file path.  The file path is either absolute
    or relative to the compilation directory.  See dwarf_decl_file.  */
diff --git a/libdw/libdw.map b/libdw/libdw.map
index 8ab0a2a0..2505cd46 100644
--- a/libdw/libdw.map
+++ b/libdw/libdw.map
@@ -67,8 +67,10 @@  ELFUTILS_0.122 {
     dwarf_linebeginstatement;
     dwarf_lineblock;
     dwarf_linecol;
+    dwarf_linecontext;
     dwarf_lineendsequence;
     dwarf_lineepiloguebegin;
+    dwarf_linefunctionname;
     dwarf_lineno;
     dwarf_lineprologueend;
     dwarf_linesrc;
diff --git a/libdw/libdwP.h b/libdw/libdwP.h
index 7174ea93..d8fa7e7e 100644
--- a/libdw/libdwP.h
+++ b/libdw/libdwP.h
@@ -291,6 +291,10 @@  struct Dwarf_Line_s
   unsigned int op_index:8;
   unsigned int isa:8;
   unsigned int discriminator:24;
+/* Begin NVIDIA_LINEMAP_INLINING_EXTENSIONS */
+  unsigned int context;
+  unsigned int function_name;
+/* End NVIDIA_LINEMAP_INLINING_EXTENSIONS */
 };
 
 struct Dwarf_Lines_s
diff --git a/src/readelf.c b/src/readelf.c
index 8191bde2..050f2592 100644
--- a/src/readelf.c
+++ b/src/readelf.c
@@ -8481,6 +8481,9 @@  print_debug_line_section (Dwfl_Module *dwflmod, Ebl *ebl, GElf_Ehdr *ehdr,
 	    goto invalid_data;
 	  header_length = read_8ubyte_unaligned_inc (dbg, linep);
 	}
+/* Begin NVIDIA_LINEMAP_INLINING_EXTENSIONS */
+      const unsigned char *header_start = linep;
+/* End NVIDIA_LINEMAP_INLINING_EXTENSIONS */
 
       /* Next the minimum instruction length.  */
       if ((size_t) (lineendp - linep) < 1)
@@ -8765,6 +8768,14 @@  print_debug_line_section (Dwfl_Module *dwflmod, Ebl *ebl, GElf_Ehdr *ehdr,
 	  ++linep;
 	}
 
+/* Begin NVIDIA_LINEMAP_INLINING_EXTENSIONS */
+      unsigned int debug_str_offset __attribute__((unused)) = 0;
+      if (unlikely (linep < header_start + header_length)) {
+	/* CUBINs contain an unsigned 4-byte offset */
+	debug_str_offset = read_4ubyte_unaligned_inc (dbg, linep);
+      }
+/* End NVIDIA_LINEMAP_INLINING_EXTENSIONS */
+
       if (linep == lineendp)
 	{
 	  puts (_("\nNo line number statements."));
@@ -8913,6 +8924,41 @@  print_debug_line_section (Dwfl_Module *dwflmod, Ebl *ebl, GElf_Ehdr *ehdr,
 		  printf (_(" set discriminator to %u\n"), u128);
 		  break;
 
+/* Begin NVIDIA_LINEMAP_INLINING_EXTENSIONS */
+		case DW_LNE_inlined_call:
+		  {
+		    if (unlikely (linep >= lineendp))
+		      goto invalid_data;
+
+		    unsigned int context;
+		    get_uleb128 (context, linep, lineendp);
+
+		    if (unlikely (linep >= lineendp))
+		      goto invalid_data;
+
+		    unsigned int function_name;
+		    get_uleb128 (function_name, linep, lineendp);
+		    function_name += debug_str_offset;
+
+		    printf (_(" inlined context %u, function name 0x%x \n"),
+			    context, function_name);
+		    break;
+		  }
+
+		case DW_LNE_set_function_name:
+		  {
+		    if (unlikely (linep >= lineendp))
+		      goto invalid_data;
+
+		    unsigned int function_name;
+		    get_uleb128 (function_name, linep, lineendp);
+		    function_name += debug_str_offset;
+
+		    printf (_(" set function name %u\n"), function_name);
+		  }
+		  break;
+/* End NVIDIA_LINEMAP_INLINING_EXTENSIONS */
+
 		default:
 		  /* Unknown, ignore it.  */
 		  puts (_(" unknown opcode"));
diff --git a/tests/Makefile.am b/tests/Makefile.am
index c586422e..263d6bc5 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -61,6 +61,7 @@  check_PROGRAMS = arextract arsymtest newfile saridx scnnames sectiondump \
 		  dwelf_elf_e_machine_string \
 		  getphdrnum leb128 read_unaligned \
 		  msg_tst system-elf-libelf-test \
+                  nvidia_extended_linemap_libdw \
 		  $(asm_TESTS)
 
 asm_TESTS = asm-tst1 asm-tst2 asm-tst3 asm-tst4 asm-tst5 \
@@ -189,6 +190,7 @@  TESTS = run-arextract.sh run-arsymtest.sh run-ar.sh newfile test-nlist \
 	leb128 read_unaligned \
 	msg_tst system-elf-libelf-test \
 	$(asm_TESTS) run-disasm-bpf.sh run-low_high_pc-dw-form-indirect.sh \
+	run-nvidia-extended-linemap-libdw.sh run-nvidia-extended-linemap-readelf.sh \
 	run-readelf-dw-form-indirect.sh run-strip-largealign.sh
 
 if !BIARCH
@@ -725,6 +727,7 @@  dwelf_elf_e_machine_string_LDADD = $(libelf) $(libdw)
 getphdrnum_LDADD = $(libelf) $(libdw)
 leb128_LDADD = $(libelf) $(libdw)
 read_unaligned_LDADD = $(libelf) $(libdw)
+nvidia_extended_linemap_libdw_LDADD = $(libelf) $(libdw)
 
 # We want to test the libelf header against the system elf.h header.
 # Don't include any -I CPPFLAGS. Except when we install our own elf.h.
diff --git a/tests/nvidia_extended_linemap_libdw.c b/tests/nvidia_extended_linemap_libdw.c
new file mode 100644
index 00000000..9f1e5efd
--- /dev/null
+++ b/tests/nvidia_extended_linemap_libdw.c
@@ -0,0 +1,153 @@ 
+/* Inspect nvidia extended linemap with dwarf_next_lines.
+   Copyright (C) 2002, 2004, 2018 Red Hat, Inc.
+   This file is part of elfutils.
+
+   This file is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   elfutils is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#ifdef HAVE_CONFIG_H
+# include <config.h>
+#endif
+
+#include <fcntl.h>
+#include <inttypes.h>
+#include <libelf.h>
+#include ELFUTILS_HEADER(dw)
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+
+
+int
+main (int argc, char *argv[])
+{
+  int result = 0;
+  int cnt;
+
+  for (cnt = 1; cnt < argc; ++cnt)
+    {
+      int fd = open (argv[cnt], O_RDONLY);
+
+      Dwarf *dbg = dwarf_begin (fd, DWARF_C_READ);
+      if  (dbg == NULL)
+	{
+	  printf ("%s not usable: %s\n", argv[cnt], dwarf_errmsg (-1));
+	  close  (fd);
+	  continue;
+	}
+
+      Dwarf_Off off;
+      Dwarf_Off next_off = 0;
+      Dwarf_CU *cu = NULL;
+      Dwarf_Lines *lb;
+      size_t nlb;
+      int res;
+      while ((res = dwarf_next_lines (dbg, off = next_off, &next_off, &cu,
+				      NULL, NULL, &lb, &nlb)) == 0)
+	{
+	  printf ("off = %" PRIu64 "\n", off);
+	  printf (" %zu lines\n", nlb);
+
+	  for (size_t i = 0; i < nlb; ++i)
+	    {
+	      Dwarf_Line *l = dwarf_onesrcline (lb, i);
+	      if (l == NULL)
+		{
+		  printf ("%s: cannot get individual line\n", argv[cnt]);
+		  result = 1;
+		  break;
+		}
+
+	      Dwarf_Addr addr;
+	      if (dwarf_lineaddr (l, &addr) != 0)
+		addr = 0;
+	      const char *file = dwarf_linesrc (l, NULL, NULL);
+	      int line;
+	      if (dwarf_lineno (l, &line) != 0)
+		line = 0;
+
+	      printf ("%" PRIx64 ": %s:%d:", (uint64_t) addr,
+		      file ?: "???", line);
+
+	      /* Getting the file path through the Dwarf_Files should
+		 result in the same path.  */
+	      Dwarf_Files *files;
+	      size_t idx;
+	      if (dwarf_line_file (l, &files, &idx) != 0)
+		{
+		  printf ("%s: cannot get file from line (%zd): %s\n",
+			  argv[cnt], i, dwarf_errmsg (-1));
+		  result = 1;
+		  break;
+		}
+	      const char *path = dwarf_filesrc (files, idx, NULL, NULL);
+	      if ((path == NULL && file != NULL)
+		  || (path != NULL && file == NULL)
+		  || (strcmp (file, path) != 0))
+		{
+		  printf ("%s: line %zd srcline (%s) != file srcline (%s)\n",
+			  argv[cnt], i, file ?: "???", path ?: "???");
+		  result = 1;
+		  break;
+		}
+
+	      int column;
+	      if (dwarf_linecol (l, &column) != 0)
+		column = 0;
+	      if (column >= 0)
+		printf ("%d:", column);
+
+              unsigned int context;
+	      if (dwarf_linecontext (l, &context) != 0)
+	        context = 0;
+              unsigned int functionname;
+	      if (dwarf_linefunctionname (l, &functionname) != 0)
+		functionname = 0;
+              if (context > 0) {
+		  printf (" context:%u, functionname:%u,", context, functionname);
+              }
+
+	      bool is_stmt;
+	      if (dwarf_linebeginstatement (l, &is_stmt) != 0)
+		is_stmt = false;
+	      bool end_sequence;
+	      if (dwarf_lineendsequence (l, &end_sequence) != 0)
+		end_sequence = false;
+	      bool basic_block;
+	      if (dwarf_lineblock (l, &basic_block) != 0)
+		basic_block = false;
+	      bool prologue_end;
+	      if (dwarf_lineprologueend (l, &prologue_end) != 0)
+		prologue_end = false;
+	      bool epilogue_begin;
+	      if (dwarf_lineepiloguebegin (l, &epilogue_begin) != 0)
+		epilogue_begin = false;
+	      printf (" is_stmt:%s, end_seq:%s, bb:%s, prologue:%s, epilogue:%s\n",
+		      is_stmt ? "yes" : "no", end_sequence ? "yes" : "no",
+		      basic_block ? "yes" : "no", prologue_end  ? "yes" : "no",
+		      epilogue_begin ? "yes" : "no");
+	    }
+	}
+
+      if (res < 0)
+	{
+	  printf ("dwarf_next_lines failed: %s\n", dwarf_errmsg (-1));
+	  result = 1;
+	}
+
+      dwarf_end (dbg);
+      close (fd);
+    }
+
+  return result;
+}
diff --git a/tests/run-nvidia-extended-linemap-libdw.sh b/tests/run-nvidia-extended-linemap-libdw.sh
new file mode 100755
index 00000000..b0386b49
--- /dev/null
+++ b/tests/run-nvidia-extended-linemap-libdw.sh
@@ -0,0 +1,41 @@ 
+# Copyright (C) 2011 Red Hat, Inc.
+# This file is part of elfutils.
+#
+# This file is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# elfutils is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. $srcdir/test-subr.sh
+
+testfiles testfile_nvidia_linemap
+testrun_compare ${abs_top_builddir}/tests/nvidia_extended_linemap_libdw testfile_nvidia_linemap << EOF
+off = 0
+ 18 lines
+0: /home/johnmc/hpctoolkit-gpu-samples/nvidia_extended_linemap4/main.cu:25:0: is_stmt:yes, end_seq:no, bb:no, prologue:no, epilogue:no
+10: /home/johnmc/hpctoolkit-gpu-samples/nvidia_extended_linemap4/main.cu:26:0: is_stmt:yes, end_seq:no, bb:no, prologue:no, epilogue:no
+40: /home/johnmc/hpctoolkit-gpu-samples/nvidia_extended_linemap4/main.cu:27:0: is_stmt:yes, end_seq:no, bb:no, prologue:no, epilogue:no
+90: /home/johnmc/hpctoolkit-gpu-samples/nvidia_extended_linemap4/main.cu:25:0: is_stmt:yes, end_seq:no, bb:no, prologue:no, epilogue:no
+a0: /home/johnmc/hpctoolkit-gpu-samples/nvidia_extended_linemap4/main.cu:28:0: is_stmt:yes, end_seq:no, bb:no, prologue:no, epilogue:no
+100: /home/johnmc/hpctoolkit-gpu-samples/nvidia_extended_linemap4/main.cu:28:0: is_stmt:yes, end_seq:no, bb:no, prologue:no, epilogue:no
+100: /home/johnmc/hpctoolkit-gpu-samples/nvidia_extended_linemap4/main.cu:8:0: context:6, functionname:0, is_stmt:yes, end_seq:no, bb:no, prologue:no, epilogue:no
+150: /home/johnmc/hpctoolkit-gpu-samples/nvidia_extended_linemap4/main.cu:9:0: context:6, functionname:0, is_stmt:yes, end_seq:no, bb:no, prologue:no, epilogue:no
+1e0: /home/johnmc/hpctoolkit-gpu-samples/nvidia_extended_linemap4/main.cu:31:0: is_stmt:yes, end_seq:no, bb:no, prologue:no, epilogue:no
+1e0: /home/johnmc/hpctoolkit-gpu-samples/nvidia_extended_linemap4/bar.h:6:0: context:9, functionname:4, is_stmt:yes, end_seq:no, bb:no, prologue:no, epilogue:no
+1e0: /home/johnmc/hpctoolkit-gpu-samples/nvidia_extended_linemap4/main.cu:8:0: context:10, functionname:0, is_stmt:yes, end_seq:no, bb:no, prologue:no, epilogue:no
+220: /home/johnmc/hpctoolkit-gpu-samples/nvidia_extended_linemap4/main.cu:9:0: context:10, functionname:0, is_stmt:yes, end_seq:no, bb:no, prologue:no, epilogue:no
+2b0: /home/johnmc/hpctoolkit-gpu-samples/nvidia_extended_linemap4/bar.h:7:0: context:9, functionname:4, is_stmt:yes, end_seq:no, bb:no, prologue:no, epilogue:no
+2f0: /home/johnmc/hpctoolkit-gpu-samples/nvidia_extended_linemap4/bar.h:8:0: context:9, functionname:4, is_stmt:yes, end_seq:no, bb:no, prologue:no, epilogue:no
+2f0: /home/johnmc/hpctoolkit-gpu-samples/nvidia_extended_linemap4/main.cu:18:0: context:14, functionname:8, is_stmt:yes, end_seq:no, bb:no, prologue:no, epilogue:no
+330: /home/johnmc/hpctoolkit-gpu-samples/nvidia_extended_linemap4/main.cu:19:0: context:14, functionname:8, is_stmt:yes, end_seq:no, bb:no, prologue:no, epilogue:no
+3c0: /home/johnmc/hpctoolkit-gpu-samples/nvidia_extended_linemap4/main.cu:33:0: is_stmt:yes, end_seq:no, bb:no, prologue:no, epilogue:no
+480: /home/johnmc/hpctoolkit-gpu-samples/nvidia_extended_linemap4/main.cu:33:0: is_stmt:yes, end_seq:yes, bb:no, prologue:no, epilogue:no
+EOF
diff --git a/tests/run-nvidia-extended-linemap-readelf.sh b/tests/run-nvidia-extended-linemap-readelf.sh
new file mode 100755
index 00000000..6ad96027
--- /dev/null
+++ b/tests/run-nvidia-extended-linemap-readelf.sh
@@ -0,0 +1,114 @@ 
+# Copyright (C) 2011 Red Hat, Inc.
+# This file is part of elfutils.
+#
+# This file is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# elfutils is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+. $srcdir/test-subr.sh
+
+testfiles testfile_nvidia_linemap
+testrun_compare ${abs_top_builddir}/src/readelf --debug-dump=line testfile_nvidia_linemap << EOF
+
+DWARF section [ 5] '.debug_line' at offset 0x3e0:
+
+Table at offset 0:
+
+ Length:                         253
+ DWARF version:                  2
+ Prologue length:                111
+ Address size:                   8
+ Segment selector size:          0
+ Min instruction length:         1
+ Max operations per instruction: 1
+ Initial value if 'is_stmt':     1
+ Line base:                      -5
+ Line range:                     14
+ Opcode base:                    10
+
+Opcodes:
+  [1]  0 arguments
+  [2]  1 argument
+  [3]  1 argument
+  [4]  1 argument
+  [5]  1 argument
+  [6]  0 arguments
+  [7]  0 arguments
+  [8]  0 arguments
+  [9]  1 argument
+
+Directory table:
+ /home/johnmc/hpctoolkit-gpu-samples/nvidia_extended_linemap4
+
+File name table:
+ Entry Dir   Time      Size      Name
+ 1     1     1626104146 1819      main.cu
+ 2     1     1626104111 211       bar.h
+
+Line number statements:
+ [    79] extended opcode 2:  set address to 0 <kernel>
+ [    84] set file to 1
+ [    86] advance line by constant 24 to 25
+ [    88] copy
+ [    89] special opcode 240: address+16 = 0x10 <kernel+0x10>, line+1 = 26
+ [    8a] advance line by constant 1 to 27
+ [    8c] advance address by 48 to 0x40 <kernel+0x40>
+ [    8e] copy
+ [    8f] advance line by constant -2 to 25
+ [    91] advance address by 80 to 0x90 <kernel+0x90>
+ [    94] copy
+ [    95] special opcode 242: address+16 = 0xa0 <kernel+0xa0>, line+3 = 28
+ [    96] advance address by 96 to 0x100 <kernel+0x100>
+ [    99] copy
+ [    9a] extended opcode 144:  inlined context 6, function name 0x0 
+ [    9f] advance line by constant -20 to 8
+ [    a1] copy
+ [    a2] advance line by constant 1 to 9
+ [    a4] advance address by 80 to 0x150 <kernel+0x150>
+ [    a7] copy
+ [    a8] extended opcode 144:  inlined context 0, function name 0x0 
+ [    ad] advance line by constant 22 to 31
+ [    af] advance address by 144 to 0x1e0 <kernel+0x1e0>
+ [    b2] copy
+ [    b3] set file to 2
+ [    b5] extended opcode 144:  inlined context 9, function name 0x4 
+ [    ba] advance line by constant -25 to 6
+ [    bc] copy
+ [    bd] set file to 1
+ [    bf] extended opcode 144:  inlined context 10, function name 0x0 
+ [    c4] advance line by constant 2 to 8
+ [    c6] copy
+ [    c7] advance line by constant 1 to 9
+ [    c9] advance address by 64 to 0x220 <kernel+0x220>
+ [    cc] copy
+ [    cd] set file to 2
+ [    cf] extended opcode 144:  inlined context 9, function name 0x4 
+ [    d4] advance line by constant -2 to 7
+ [    d6] advance address by 144 to 0x2b0 <kernel+0x2b0>
+ [    d9] copy
+ [    da] advance line by constant 1 to 8
+ [    dc] advance address by 64 to 0x2f0 <kernel+0x2f0>
+ [    df] copy
+ [    e0] set file to 1
+ [    e2] extended opcode 144:  inlined context 14, function name 0x8 
+ [    e7] advance line by constant 10 to 18
+ [    e9] copy
+ [    ea] advance line by constant 1 to 19
+ [    ec] advance address by 64 to 0x330 <kernel+0x330>
+ [    ef] copy
+ [    f0] extended opcode 144:  inlined context 0, function name 0x0 
+ [    f5] advance line by constant 14 to 33
+ [    f7] advance address by 144 to 0x3c0 <kernel+0x3c0>
+ [    fa] copy
+ [    fb] advance address by 192 to 0x480
+ [    fe] extended opcode 1:  end of sequence
+EOF
diff --git a/tests/testfile_nvidia_linemap.bz2 b/tests/testfile_nvidia_linemap.bz2
new file mode 100644
index 0000000000000000000000000000000000000000..8a6d09fbd1419af2b4664e8751207be470198f1a