diff mbox series

[v2] libebl: recognize FDO Packaging Metadata ELF note

Message ID 20211121194318.105654-1-luca.boccassi@gmail.com
State New
Headers show
Series [v2] libebl: recognize FDO Packaging Metadata ELF note | expand

Commit Message

luca.boccassi--- via Elfutils-devel Nov. 21, 2021, 7:43 p.m. UTC
From: Luca Boccassi <bluca@debian.org>

As defined on: https://systemd.io/COREDUMP_PACKAGE_METADATA/
this note will be used starting from Fedora 36. Allow
readelf --notes to pretty print it:

Note section [ 3] '.note.package' of 76 bytes at offset 0x2e8:
  Owner          Data size  Type
  FDO                   57  FDO_PACKAGING_METADATA
    Packaging Metadata: {"type":"deb","name":"fsverity-utils","version":"1.3-1"}

Signed-off-by: Luca Boccassi <luca.boccassi@microsoft.com>
---
v2: check that note payload is NULL terminated

 libebl/eblobjnote.c         | 3 +++
 libebl/eblobjnotetypename.c | 3 +++
 libelf/elf.h                | 3 +++
 3 files changed, 9 insertions(+)

Comments

Luca Boccassi Nov. 25, 2021, 5:02 p.m. UTC | #1
On Sun, 2021-11-21 at 19:43 +0000, luca.boccassi@gmail.com wrote:
> From: Luca Boccassi <bluca@debian.org>
> 
> As defined on: https://systemd.io/COREDUMP_PACKAGE_METADATA/
> this note will be used starting from Fedora 36. Allow
> readelf --notes to pretty print it:
> 
> Note section [ 3] '.note.package' of 76 bytes at offset 0x2e8:
>   Owner          Data size  Type
>   FDO                   57  FDO_PACKAGING_METADATA
>     Packaging Metadata: {"type":"deb","name":"fsverity-utils","version":"1.3-1"}
> 
> Signed-off-by: Luca Boccassi <luca.boccassi@microsoft.com>
> ---
> v2: check that note payload is NULL terminated
> 
>  libebl/eblobjnote.c         | 3 +++
>  libebl/eblobjnotetypename.c | 3 +++
>  libelf/elf.h                | 3 +++
>  3 files changed, 9 insertions(+)
> 
> diff --git a/libebl/eblobjnote.c b/libebl/eblobjnote.c
> index 36efe275..e3ad1409 100644
> --- a/libebl/eblobjnote.c
> +++ b/libebl/eblobjnote.c
> @@ -288,6 +288,9 @@ ebl_object_note (Ebl *ebl, uint32_t namesz, const char *name, uint32_t type,
>        if (descsz == 0 && type == NT_VERSION)
>  	return;
>  
> 
> 
> 
> +      if (strcmp ("FDO", name) == 0 && type == FDO_PACKAGING_METADATA && descsz > 0 && desc[descsz - 1] == '\0')
> +	printf("    Packaging Metadata: %.*s\n", (int) descsz, desc);
> +
>        /* Everything else should have the "GNU" owner name.  */
>        if (strcmp ("GNU", name) != 0)
>  	return;
> diff --git a/libebl/eblobjnotetypename.c b/libebl/eblobjnotetypename.c
> index 4662906d..863f20e4 100644
> --- a/libebl/eblobjnotetypename.c
> +++ b/libebl/eblobjnotetypename.c
> @@ -101,6 +101,9 @@ ebl_object_note_type_name (Ebl *ebl, const char *name, uint32_t type,
>  	  return buf;
>  	}
>  
> 
> 
> 
> +      if (strcmp (name, "FDO") == 0 && type == FDO_PACKAGING_METADATA)
> +	return "FDO_PACKAGING_METADATA";
> +
>        if (strcmp (name, "GNU") != 0)
>  	{
>  	  /* NT_VERSION is special, all data is in the name.  */
> diff --git a/libelf/elf.h b/libelf/elf.h
> index 8e3e618f..633f9f67 100644
> --- a/libelf/elf.h
> +++ b/libelf/elf.h
> @@ -1297,6 +1297,9 @@ typedef struct
>  /* Program property.  */
>  #define NT_GNU_PROPERTY_TYPE_0 5
>  
> 
> 
> 
> +/* Packaging metadata as defined on https://systemd.io/COREDUMP_PACKAGE_METADATA/ */
> +#define FDO_PACKAGING_METADATA 0xcafe1a7e
> +
>  /* Note section name of program property.   */
>  #define NOTE_GNU_PROPERTY_SECTION_NAME ".note.gnu.property"

Hi Mark,

I could move this definition to an internal header if the change to
elf.h blocks this patch, if you prefer? Let me know.
Mark Wielaard Nov. 30, 2021, 11:25 a.m. UTC | #2
Hi Luca,

On Thu, 2021-11-25 at 17:02 +0000, Luca Boccassi via Elfutils-devel
wrote:
> +/* Packaging metadata as defined on 
> > https://systemd.io/COREDUMP_PACKAGE_METADATA/ */
> > +#define FDO_PACKAGING_METADATA 0xcafe1a7e
> > +
> >  /* Note section name of program property.   */
> >  #define NOTE_GNU_PROPERTY_SECTION_NAME ".note.gnu.property"
> 
> I could move this definition to an internal header if the change to
> elf.h blocks this patch, if you prefer? Let me know.

It looks like it will be integrated into glibc elf.h later this week.
I'll resync elf.h then and apply the other half of your patch.

While looking at how to implement the json parsing of the note data I
noticed a couple of things that could be clarified in the spec to make
this more clear and less ambiguous.

The spec says "a key-value JSON format", "JSON payload" and "a JSON
string with the structure described below". Which isn't very exact, or
seems to describe multiple different JSON concepts which aren't exactly
the same thing. A JSON string is something different from a JSON object
(which is the only JSON value that has a key-value format). And it
doesn't really make clear what the encoding of the JSON itself is (it
cannot be a JSON string, because that itself is expressed in an
specific character encoding itself).

What I think the spec should say is something like:
"The note data is a single JSON object encoded as a zero terminated
UTF-8 string."

The spec does explain the requirements for JSON numbers, but doesn't
mention any for JSON strings or JSON objects. It would be good to also
explain how to make the strings and objects unambiguous.

For Objects it should require that all names are unique. See section 4
in rfc8259.

For Strings it should require that \uXXXX escaping isn't used and that
only control characters that have an explicit escape sequence are used
(\b, \n, \f, \r, \t) [if you allow them in the first place, they are
probably confusing and it may be good to simply say that Strings should
not contain any control characters or use \uXXXX escaping]. See section
7 and section 8 in rfc8259.

That should get rid of various corner cases that different parsers are
known to get wrong. Especially \uXXXX escaping is really confusing when
using the UTF-8 encoding (and it is totally necessary since UTF-8 can
express any valid UTF character already).

Cheers,

Mark
Luca Boccassi Nov. 30, 2021, 12:37 p.m. UTC | #3
On Tue, 2021-11-30 at 12:25 +0100, Mark Wielaard wrote:
> Hi Luca,
> 
> On Thu, 2021-11-25 at 17:02 +0000, Luca Boccassi via Elfutils-devel
> wrote:
> > +/* Packaging metadata as defined on 
> > > https://systemd.io/COREDUMP_PACKAGE_METADATA/ */
> > > +#define FDO_PACKAGING_METADATA 0xcafe1a7e
> > > +
> > >  /* Note section name of program property.   */
> > >  #define NOTE_GNU_PROPERTY_SECTION_NAME ".note.gnu.property"
> > 
> > I could move this definition to an internal header if the change to
> > elf.h blocks this patch, if you prefer? Let me know.
> 
> It looks like it will be integrated into glibc elf.h later this week.
> I'll resync elf.h then and apply the other half of your patch.
> 
> While looking at how to implement the json parsing of the note data I
> noticed a couple of things that could be clarified in the spec to make
> this more clear and less ambiguous.
> 
> The spec says "a key-value JSON format", "JSON payload" and "a JSON
> string with the structure described below". Which isn't very exact, or
> seems to describe multiple different JSON concepts which aren't exactly
> the same thing. A JSON string is something different from a JSON object
> (which is the only JSON value that has a key-value format). And it
> doesn't really make clear what the encoding of the JSON itself is (it
> cannot be a JSON string, because that itself is expressed in an
> specific character encoding itself).
> 
> What I think the spec should say is something like:
> "The note data is a single JSON object encoded as a zero terminated
> UTF-8 string."
> 
> The spec does explain the requirements for JSON numbers, but doesn't
> mention any for JSON strings or JSON objects. It would be good to also
> explain how to make the strings and objects unambiguous.
> 
> For Objects it should require that all names are unique. See section 4
> in rfc8259.
> 
> For Strings it should require that \uXXXX escaping isn't used and that
> only control characters that have an explicit escape sequence are used
> (\b, \n, \f, \r, \t) [if you allow them in the first place, they are
> probably confusing and it may be good to simply say that Strings should
> not contain any control characters or use \uXXXX escaping]. See section
> 7 and section 8 in rfc8259.
> 
> That should get rid of various corner cases that different parsers are
> known to get wrong. Especially \uXXXX escaping is really confusing when
> using the UTF-8 encoding (and it is totally necessary since UTF-8 can
> express any valid UTF character already).
> 
> Cheers,
> 
> Mark

Thanks, see:

https://github.com/systemd/systemd/pull/21578
Frank Ch. Eigler Nov. 30, 2021, 4:23 p.m. UTC | #4
Hi -

On Tue, Nov 30, 2021 at 12:25:41PM +0100, Mark Wielaard wrote:
> [...]
> The spec does explain the requirements for JSON numbers, but doesn't
> mention any for JSON strings or JSON objects. It would be good to also
> explain how to make the strings and objects unambiguous. [...]
> For Objects it should require that all names are unique. [...]
> For Strings it should require that \uXXXX escaping isn't used [...]
> 
> That should get rid of various corner cases that different parsers are
> known to get wrong. 

Are such buggy parsers seen in the wild, now, after all this time with
JSON?  It seems to me it's not really elfutils' or systemd's place to
impose -additional- constraints on customary JSON.


> Especially \uXXXX escaping is really confusing when using the UTF-8
> encoding (and it is totally necessary since UTF-8 can express any
> valid UTF character already).

Yes, and yet we have had the bidi situation recently where UTF-8 raw
codes could visually confuse a human reader whereas escaped \uXXXX
wouldn't.  If we forbid \uXXXX unilaterally, we literally become
incompatible with JSON (RFC8259 7. String. "Any character may be
escaped."), and for what?

Both these seem to be personal aesthetic decisions that need not be
imposed on a routine application of well-established standards.  We
have many industrial-strength json parser libraries to choose from,
so that part is IMO no longer a timely concern.


- FChE
Florian Weimer Nov. 30, 2021, 4:49 p.m. UTC | #5
* Frank Ch. Eigler via Elfutils-devel:

> Hi -
>
> On Tue, Nov 30, 2021 at 12:25:41PM +0100, Mark Wielaard wrote:
>> [...]
>> The spec does explain the requirements for JSON numbers, but doesn't
>> mention any for JSON strings or JSON objects. It would be good to also
>> explain how to make the strings and objects unambiguous. [...]
>> For Objects it should require that all names are unique. [...]
>> For Strings it should require that \uXXXX escaping isn't used [...]
>> 
>> That should get rid of various corner cases that different parsers are
>> known to get wrong. 
>
> Are such buggy parsers seen in the wild, now, after all this time with
> JSON?  It seems to me it's not really elfutils' or systemd's place to
> impose -additional- constraints on customary JSON.

JSON has been targeted at the Windows/Java UTF-16 world, there is always
going to be a mismatch if you try to represent it in UTF-8 or anything
that doesn't have surrogate pairs.

>> Especially \uXXXX escaping is really confusing when using the UTF-8
>> encoding (and it is totally necessary since UTF-8 can express any
>> valid UTF character already).
>
> Yes, and yet we have had the bidi situation recently where UTF-8 raw
> codes could visually confuse a human reader whereas escaped \uXXXX
> wouldn't.  If we forbid \uXXXX unilaterally, we literally become
> incompatible with JSON (RFC8259 7. String. "Any character may be
> escaped."), and for what?

RFC 8259 says this:

   However, the ABNF in this specification allows member names and
   string values to contain bit sequences that cannot encode Unicode
   characters; for example, "\uDEAD" (a single unpaired UTF-16
   surrogate).  Instances of this have been observed, for example, when
   a library truncates a UTF-16 string without checking whether the
   truncation split a surrogate pair.  The behavior of software that
   receives JSON texts containing such values is unpredictable; for
   example, implementations might return different values for the length
   of a string value or even suffer fatal runtime exceptions.

A UTF-8 environment has to enforce *some* additional constraints
compared to the official JSON syntax.

Thanks,
Florian
Mark Wielaard Nov. 30, 2021, 8:04 p.m. UTC | #6
Hi,

On Tue, Nov 30, 2021 at 05:49:58PM +0100, Florian Weimer wrote:
> * Frank Ch. Eigler via Elfutils-devel:
> > On Tue, Nov 30, 2021 at 12:25:41PM +0100, Mark Wielaard wrote:
> >> [...]
> >> The spec does explain the requirements for JSON numbers, but doesn't
> >> mention any for JSON strings or JSON objects. It would be good to also
> >> explain how to make the strings and objects unambiguous. [...]
> >> For Objects it should require that all names are unique. [...]
> >> For Strings it should require that \uXXXX escaping isn't used [...]
> >> 
> >> That should get rid of various corner cases that different parsers are
> >> known to get wrong. 
> >
> > Are such buggy parsers seen in the wild, now, after all this time with
> > JSON?  It seems to me it's not really elfutils' or systemd's place to
> > impose -additional- constraints on customary JSON.
> 
> JSON has been targeted at the Windows/Java UTF-16 world, there is always
> going to be a mismatch if you try to represent it in UTF-8 or anything
> that doesn't have surrogate pairs.

Right, the \uhex escaping is kind of odd when using UTF-8 because you
suddenly have to use surrogate pairs. It is correct that you may use
them according to the RFC. But the RFC also points out that if you do
there are various odd/bad corner cases to take care of and that it
makes comparing strings really awkward.

> >> Especially \uXXXX escaping is really confusing when using the UTF-8
> >> encoding (and it is totally necessary since UTF-8 can express any
> >> valid UTF character already).
> >
> > Yes, and yet we have had the bidi situation recently where UTF-8 raw
> > codes could visually confuse a human reader whereas escaped \uXXXX
> > wouldn't.  If we forbid \uXXXX unilaterally, we literally become
> > incompatible with JSON (RFC8259 7. String. "Any character may be
> > escaped."), and for what?
> 
> RFC 8259 says this:
> 
>    However, the ABNF in this specification allows member names and
>    string values to contain bit sequences that cannot encode Unicode
>    characters; for example, "\uDEAD" (a single unpaired UTF-16
>    surrogate).  Instances of this have been observed, for example, when
>    a library truncates a UTF-16 string without checking whether the
>    truncation split a surrogate pair.  The behavior of software that
>    receives JSON texts containing such values is unpredictable; for
>    example, implementations might return different values for the length
>    of a string value or even suffer fatal runtime exceptions.
> 
> A UTF-8 environment has to enforce *some* additional constraints
> compared to the official JSON syntax.

Right, all recommendations for encoding the JSON object, string and
numbers actually come from the RFC. I don't think it is odd to
explicitly specify the encoding of the package note as using a zero
terminated UTF-8 string and to specify the allowed JSON string and
number encodings. The contents of the package note is meant to be
human readable, so excluding control characters and excotic escape
sequence in the expected key/value pairs seems natural to me.

Cheers,

Mark
Frank Ch. Eigler Dec. 2, 2021, 3:16 p.m. UTC | #7
Hi -

> JSON has been targeted at the Windows/Java UTF-16 world, there is always
> going to be a mismatch if you try to represent it in UTF-8 or anything
> that doesn't have surrogate pairs.

The JSON RFC8259 8.1 mandates UTF-8 encoding for situations like ours.


> > Yes, and yet we have had the bidi situation recently where UTF-8 raw
> > codes could visually confuse a human reader whereas escaped \uXXXX
> > wouldn't.  If we forbid \uXXXX unilaterally, we literally become
> > incompatible with JSON (RFC8259 7. String. "Any character may be
> > escaped."), and for what?
> 
> RFC 8259 says this:
> 
>    However, the ABNF in this specification allows member names and
>    string values to contain bit sequences that cannot encode Unicode
>    characters; for example, "\uDEAD" (a single unpaired UTF-16
>    surrogate).  Instances of this have been observed, for example, when
>    a library truncates a UTF-16 string without checking whether the
>    truncation split a surrogate pair.  The behavior of software that
>    receives JSON texts containing such values is unpredictable; for
>    example, implementations might return different values for the length
>    of a string value or even suffer fatal runtime exceptions.
> 
> A UTF-8 environment has to enforce *some* additional constraints
> compared to the official JSON syntax.

I'm sorry, I don't see how.  If a JSON string were to include the
suspect "\uDEAD", but from observing our hypothetical "no escapes!"
rule they could reencode it as the UTF-8 octets 0xED 0xBA 0xAD.
ISTM we're no better off.


- FChE
Florian Weimer Dec. 2, 2021, 3:44 p.m. UTC | #8
* Frank Ch. Eigler:

>> > Yes, and yet we have had the bidi situation recently where UTF-8 raw
>> > codes could visually confuse a human reader whereas escaped \uXXXX
>> > wouldn't.  If we forbid \uXXXX unilaterally, we literally become
>> > incompatible with JSON (RFC8259 7. String. "Any character may be
>> > escaped."), and for what?
>> 
>> RFC 8259 says this:
>> 
>>    However, the ABNF in this specification allows member names and
>>    string values to contain bit sequences that cannot encode Unicode
>>    characters; for example, "\uDEAD" (a single unpaired UTF-16
>>    surrogate).  Instances of this have been observed, for example, when
>>    a library truncates a UTF-16 string without checking whether the
>>    truncation split a surrogate pair.  The behavior of software that
>>    receives JSON texts containing such values is unpredictable; for
>>    example, implementations might return different values for the length
>>    of a string value or even suffer fatal runtime exceptions.
>> 
>> A UTF-8 environment has to enforce *some* additional constraints
>> compared to the official JSON syntax.
>
> I'm sorry, I don't see how.  If a JSON string were to include the
> suspect "\uDEAD", but from observing our hypothetical "no escapes!"
> rule they could reencode it as the UTF-8 octets 0xED 0xBA 0xAD.
> ISTM we're no better off.

These octets aren't UTF-8.  UTF-8 never contains surrogate pairs (paired
or unpaired). 8-(

Thanks,
Florian
diff mbox series

Patch

diff --git a/libebl/eblobjnote.c b/libebl/eblobjnote.c
index 36efe275..e3ad1409 100644
--- a/libebl/eblobjnote.c
+++ b/libebl/eblobjnote.c
@@ -288,6 +288,9 @@  ebl_object_note (Ebl *ebl, uint32_t namesz, const char *name, uint32_t type,
       if (descsz == 0 && type == NT_VERSION)
 	return;
 
+      if (strcmp ("FDO", name) == 0 && type == FDO_PACKAGING_METADATA && descsz > 0 && desc[descsz - 1] == '\0')
+	printf("    Packaging Metadata: %.*s\n", (int) descsz, desc);
+
       /* Everything else should have the "GNU" owner name.  */
       if (strcmp ("GNU", name) != 0)
 	return;
diff --git a/libebl/eblobjnotetypename.c b/libebl/eblobjnotetypename.c
index 4662906d..863f20e4 100644
--- a/libebl/eblobjnotetypename.c
+++ b/libebl/eblobjnotetypename.c
@@ -101,6 +101,9 @@  ebl_object_note_type_name (Ebl *ebl, const char *name, uint32_t type,
 	  return buf;
 	}
 
+      if (strcmp (name, "FDO") == 0 && type == FDO_PACKAGING_METADATA)
+	return "FDO_PACKAGING_METADATA";
+
       if (strcmp (name, "GNU") != 0)
 	{
 	  /* NT_VERSION is special, all data is in the name.  */
diff --git a/libelf/elf.h b/libelf/elf.h
index 8e3e618f..633f9f67 100644
--- a/libelf/elf.h
+++ b/libelf/elf.h
@@ -1297,6 +1297,9 @@  typedef struct
 /* Program property.  */
 #define NT_GNU_PROPERTY_TYPE_0 5
 
+/* Packaging metadata as defined on https://systemd.io/COREDUMP_PACKAGE_METADATA/ */
+#define FDO_PACKAGING_METADATA 0xcafe1a7e
+
 /* Note section name of program property.   */
 #define NOTE_GNU_PROPERTY_SECTION_NAME ".note.gnu.property"