diff mbox series

getmntent: Consolidate character decoding

Message ID	20201215044812.1068400-1-siddhesh@sourceware.org
State	Dropped
Headers	DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 2C8EB385800D sender: dreamhost) by relay.mailchannels.net (Postfix) with ESMTPA id 37E96322571; Tue, 15 Dec 2020 04:48:24 +0000 (UTC) sender: siddhesh@gotplt.org) by pdx1-sub0-mail-a35.g.dreamhost.com (Postfix) with ESMTPSA id AA5617F506; Mon, 14 Dec 2020 20:48:21 -0800 (PST) To: libc-alpha@sourceware.org Subject: [PATCH] getmntent: Consolidate character decoding Date: Tue, 15 Dec 2020 10:18:12 +0530 Message-Id: <20201215044812.1068400-1-siddhesh@sourceware.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: list From: Siddhesh Poyarekar via Libc-alpha <libc-alpha@sourceware.org> Reply-To: Siddhesh Poyarekar <siddhesh@sourceware.org> Cc: fweimer@redhat.com Errors-To: libc-alpha-bounces@sourceware.org Sender: "Libc-alpha" <libc-alpha-bounces@sourceware.org>
Series	getmntent: Consolidate character decoding

Commit Message

Siddhesh Poyarekar Dec. 15, 2020, 4:48 a.m. UTC

The Linux kernel escapes some characters (whitespace and \) by
replacing them with their octal form in the format \xxx.  Use that
format to decode instead of looking for specific bytes so that the
check is extensible.  This way, if the kernel escapes additional
characters, glibc code won't have to change to accommodate it.  I
have, for example, proposed[1] to escape the '#' character since it
may interfere with parsing in getmntent.

The check for '\\' is kept intact even though, as of today, the
kernel does not write '\\' to escape a backslash; it writes the octal
equivalent instead.

[1] https://lore.kernel.org/linux-fsdevel/20201215042454.998361-1-siddhesh@gotplt.org/T/#u
---
 misc/mntent_r.c | 43 +++++++++++++++----------------------------
 1 file changed, 15 insertions(+), 28 deletions(-)
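
To make the \xxx format in the commit message concrete, here is a minimal sketch of the encoding side. It is not the kernel's code: the name escape_octal and the caller-supplied set of bytes to escape are assumptions made for illustration only.

```c
#include <string.h>

/* Hypothetical encoder (illustration only): copy SRC to DST, writing
   every byte that appears in ESC as a backslash followed by three
   octal digits.  DST must hold at least 4 * strlen (SRC) + 1 bytes.  */
static char *
escape_octal (char *dst, const char *src, const char *esc)
{
  char *wp = dst;

  for (const unsigned char *rp = (const unsigned char *) src;
       *rp != '\0'; ++rp)
    {
      if (strchr (esc, *rp) != NULL)
        {
          /* Split the byte into three octal digits, most significant
             first: a space (0x20) becomes \040.  */
          *wp++ = '\\';
          *wp++ = (char) ('0' + ((*rp >> 6) & 7));
          *wp++ = (char) ('0' + ((*rp >> 3) & 7));
          *wp++ = (char) ('0' + (*rp & 7));
        }
      else
        *wp++ = (char) *rp;
    }

  *wp = '\0';
  return dst;
}
```

With the escape set " \t\n\\#", the input "a b" is written as a\040b and a '#' is written as \043. A decoder that understands the general \xxx form handles both without needing a per-character case.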

Comments

Andreas Schwab Dec. 15, 2020, 9:27 a.m. UTC | #1

On Dec 15 2020, Siddhesh Poyarekar via Libc-alpha wrote:

> +	else
> +	  {
> +	    /* The kernel escapes special characters with their octal
> +	       equivalents.  */
> +	    char c = 64 * (rp[1] - '0') + 8 * (rp[2] - '0') + rp[3] - '0';
> +	    *wp++ = c;
> +	    rp += 3;

That should check that the characters are indeed digits and you are not
running past the end.

Andreas.
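
One way the requested checks could look is sketched below. This is not the v2 patch: decode_name_checked and is_octal_digit are hypothetical names, and limiting the first digit to 0..3 (so the value fits in one byte) is an assumption of this sketch. Because && short-circuits, a NUL terminator in rp[1], rp[2] or rp[3] stops the test before any later byte is read, so the loop never runs past the end of the string.

```c
/* Hypothetical helper (not in the patch): nonzero if C is an octal
   digit.  */
static int
is_octal_digit (char c)
{
  return c >= '0' && c <= '7';
}

/* Sketch of a checked decoder: decode \\ and \ooo in BUF in place.
   Anything that is not a complete, valid escape is copied verbatim.  */
static char *
decode_name_checked (char *buf)
{
  char *rp = buf;
  char *wp = buf;

  do
    {
      if (rp[0] == '\\' && rp[1] == '\\')
        {
          /* \\ stands for a single backslash.  */
          *wp++ = '\\';
          rp += 1;
        }
      else if (rp[0] == '\\'
               && rp[1] >= '0' && rp[1] <= '3'
               && is_octal_digit (rp[2])
               && is_octal_digit (rp[3]))
        {
          /* Three octal digits; the first is at most 3, so the
             result is in the range 0..255.  */
          *wp++ = (char) (64 * (rp[1] - '0') + 8 * (rp[2] - '0')
                          + (rp[3] - '0'));
          rp += 3;
        }
      else
        *wp++ = *rp;
    }
  while (*rp++ != '\0');

  return buf;
}
```

Given the buffer a\040b this yields "a b". Truncated or malformed input such as a\04 or a\089 is left unchanged. Note that \000 would decode to an embedded NUL and so cut the string short; a real implementation may prefer to reject it.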

Siddhesh Poyarekar Dec. 15, 2020, 9:41 a.m. UTC | #2

On 12/15/20 2:57 PM, Andreas Schwab wrote:
> On Dec 15 2020, Siddhesh Poyarekar via Libc-alpha wrote:
> 
>> +	else
>> +	  {
>> +	    /* The kernel escapes special characters with their octal
>> +	       equivalents.  */
>> +	    char c = 64 * (rp[1] - '0') + 8 * (rp[2] - '0') + rp[3] - '0';
>> +	    *wp++ = c;
>> +	    rp += 3;
> 
> That should check that the characters are indeed digits and you are not
> running past the end.

Indeed, I'll fix it up, thanks.

Siddhesh

Carlos O'Donell Dec. 15, 2020, 2:50 p.m. UTC | #3

On 12/14/20 11:48 PM, Siddhesh Poyarekar via Libc-alpha wrote:
> The Linux kernel escapes some characters (whitespace and \) by
> replacing them with their octal form in the format \xxx.  Use that
> format to decode instead of looking for specific bytes so that the
> check is extensible.  This way, if the kernel escapes additional
> characters, glibc code won't have to change to accommodate it.  I
> have, for example, proposed[1] to escape the '#' character since it
> may interfere with parsing in getmntent.
> 
> The check for '\\' is kept intact even though, as of today, the
> kernel does not write '\\' to escape a backslash; it writes the octal
> equivalent instead.
> 
> [1] https://lore.kernel.org/linux-fsdevel/20201215042454.998361-1-siddhesh@gotplt.org/T/#u
> ---
>  misc/mntent_r.c | 43 +++++++++++++++----------------------------
>  1 file changed, 15 insertions(+), 28 deletions(-)

Please consider how we would test this.

I would like to see additional coverage here to make sure this works
in the future.
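
As a rough idea of what such coverage might exercise, here is a standalone sketch that pushes one mounts-style line through the public setmntent/getmntent/endmntent interface and compares the decoded mount point. It is not written against glibc's own test framework; the helper name mount_dir_matches, the temporary-file template, and the assumption of a glibc system with a writable /tmp are all illustrative.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <mntent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical test helper: write LINE to a temporary file, parse it
   with getmntent, and compare the decoded mount point to EXPECTED.
   Returns 1 on match, 0 on mismatch, -1 if the setup fails.  */
static int
mount_dir_matches (const char *line, const char *expected)
{
  char path[] = "/tmp/tst-mntent-XXXXXX";
  int fd = mkstemp (path);
  if (fd < 0)
    return -1;

  size_t len = strlen (line);
  ssize_t written = write (fd, line, len);
  close (fd);
  if (written < 0 || (size_t) written != len)
    {
      unlink (path);
      return -1;
    }

  FILE *fp = setmntent (path, "r");
  if (fp == NULL)
    {
      unlink (path);
      return -1;
    }

  /* getmntent returns a pointer to static storage, so compare before
     closing the stream.  */
  struct mntent *m = getmntent (fp);
  int result = m != NULL && strcmp (m->mnt_dir, expected) == 0;

  endmntent (fp);
  unlink (path);
  return result;
}
```

A test could then call it with a line such as "/dev/sda1 /mnt/a\040b ext4 rw 0 0" and expect the mount point "/mnt/a b", adding further cases for tab, newline, backslash and malformed escapes.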

Siddhesh Poyarekar Dec. 15, 2020, 3 p.m. UTC | #4

On 12/15/20 8:20 PM, Carlos O'Donell via Libc-alpha wrote:
> On 12/14/20 11:48 PM, Siddhesh Poyarekar via Libc-alpha wrote:
>> The Linux kernel escapes some characters (whitespace and \) by
>> replacing them with their octal form in the format \xxx.  Use that
>> format to decode instead of looking for specific bytes so that the
>> check is extensible.  This way, if the kernel escapes additional
>> characters, glibc code won't have to change to accommodate it.  I
>> have, for example, proposed[1] to escape the '#' character since it
>> may interfere with parsing in getmntent.
>>
>> The check for '\\' is kept intact even though, as of today, the
>> kernel does not write '\\' to escape a backslash; it writes the octal
>> equivalent instead.
>>
>> [1] https://lore.kernel.org/linux-fsdevel/20201215042454.998361-1-siddhesh@gotplt.org/T/#u
>> ---
>>   misc/mntent_r.c | 43 +++++++++++++++----------------------------
>>   1 file changed, 15 insertions(+), 28 deletions(-)
> 
> Please consider how we would test this.
> 
> I would like to see additional coverage here to make sure this works
> in the future.
> 

Sure thing.  I realized my first attempt was quite a lazy one and the 
whole set of functions needs a closer look.  I'll post a v2 with all 
that and tests.

Siddhesh

Patch

diff --git a/misc/mntent_r.c b/misc/mntent_r.c
index 0e8f10007e..f80b563f39 100644
--- a/misc/mntent_r.c
+++ b/misc/mntent_r.c
@@ -76,35 +76,22 @@  decode_name (char *buf)
   char *wp = buf;
 
   do
-    if (rp[0] == '\\' && rp[1] == '0' && rp[2] == '4' && rp[3] == '0')
+    if (rp[0] == '\\')
       {
-	/* \040 is a SPACE.  */
-	*wp++ = ' ';
-	rp += 3;
-      }
-    else if (rp[0] == '\\' && rp[1] == '0' && rp[2] == '1' && rp[3] == '1')
-      {
-	/* \011 is a TAB.  */
-	*wp++ = '\t';
-	rp += 3;
-      }
-    else if (rp[0] == '\\' && rp[1] == '0' && rp[2] == '1' && rp[3] == '2')
-      {
-	/* \012 is a NEWLINE.  */
-	*wp++ = '\n';
-	rp += 3;
-      }
-    else if (rp[0] == '\\' && rp[1] == '\\')
-      {
-	/* We have to escape \\ to be able to represent all characters.  */
-	*wp++ = '\\';
-	rp += 1;
-      }
-    else if (rp[0] == '\\' && rp[1] == '1' && rp[2] == '3' && rp[3] == '4')
-      {
-	/* \134 is also \\.  */
-	*wp++ = '\\';
-	rp += 3;
+	if (rp[1] == '\\')
+	  {
+	    /* We have to escape \\ to be able to represent all characters.  */
+	    *wp++ = '\\';
+	    rp += 1;
+	  }
+	else
+	  {
+	    /* The kernel escapes special characters with their octal
+	       equivalents.  */
+	    char c = 64 * (rp[1] - '0') + 8 * (rp[2] - '0') + rp[3] - '0';
+	    *wp++ = c;
+	    rp += 3;
+	  }
       }
     else
       *wp++ = *rp;