getmntent: Consolidate character decoding

Message ID 20201215044812.1068400-1-siddhesh@sourceware.org
State Dropped
Headers
Series getmntent: Consolidate character decoding |

Commit Message

Siddhesh Poyarekar Dec. 15, 2020, 4:48 a.m. UTC
  The Linux kernel escapes some characters (whitespaces and \) by
replacing them with their octal form in the format \xxx.  Use that
format to decode insteald of looking for specific bytes so that the
check is extensible.  This way if the kernel escapes additional
characters, glibc code won't have to change to accommodate it.  I
have, for example, proposed[1] to escape the '#' character since it
may interfere with parsing in getmntent.

The check for '\\\\' is kept intact even though as of today, the
kernel does not write '\\\\' to escape a backslash; it write its octal
equivalent instead.

[1] https://lore.kernel.org/linux-fsdevel/20201215042454.998361-1-siddhesh@gotplt.org/T/#u
---
 misc/mntent_r.c | 43 +++++++++++++++----------------------------
 1 file changed, 15 insertions(+), 28 deletions(-)
  

Comments

Andreas Schwab Dec. 15, 2020, 9:27 a.m. UTC | #1
On Dez 15 2020, Siddhesh Poyarekar via Libc-alpha wrote:

> +	else
> +	  {
> +	    /* The kernel escapes special characters with their octal
> +	       equivalents.  */
> +	    char c = 64 * (rp[1] - '0') + 8 * (rp[2] - '0') + rp[3] - '0';
> +	    *wp++ = c;
> +	    rp += 3;

That should check that the characters are indeed digits and you are not
running past the end.

Andreas.
  
Siddhesh Poyarekar Dec. 15, 2020, 9:41 a.m. UTC | #2
On 12/15/20 2:57 PM, Andreas Schwab wrote:
> On Dez 15 2020, Siddhesh Poyarekar via Libc-alpha wrote:
> 
>> +	else
>> +	  {
>> +	    /* The kernel escapes special characters with their octal
>> +	       equivalents.  */
>> +	    char c = 64 * (rp[1] - '0') + 8 * (rp[2] - '0') + rp[3] - '0';
>> +	    *wp++ = c;
>> +	    rp += 3;
> 
> That should check that the characters are indeed digits and you are not
> running past the end.

Indeed, I'll fix it up, thanks.

Siddhesh
  
Carlos O'Donell Dec. 15, 2020, 2:50 p.m. UTC | #3
On 12/14/20 11:48 PM, Siddhesh Poyarekar via Libc-alpha wrote:
> The Linux kernel escapes some characters (whitespaces and \) by
> replacing them with their octal form in the format \xxx.  Use that
> format to decode insteald of looking for specific bytes so that the
> check is extensible.  This way if the kernel escapes additional
> characters, glibc code won't have to change to accommodate it.  I
> have, for example, proposed[1] to escape the '#' character since it
> may interfere with parsing in getmntent.
> 
> The check for '\\\\' is kept intact even though as of today, the
> kernel does not write '\\\\' to escape a backslash; it write its octal
> equivalent instead.
> 
> [1] https://lore.kernel.org/linux-fsdevel/20201215042454.998361-1-siddhesh@gotplt.org/T/#u
> ---
>  misc/mntent_r.c | 43 +++++++++++++++----------------------------
>  1 file changed, 15 insertions(+), 28 deletions(-)

Please consider how we would test this.

I would like to see additional coverage here to make sure this works
in the future.
  
Siddhesh Poyarekar Dec. 15, 2020, 3 p.m. UTC | #4
On 12/15/20 8:20 PM, Carlos O'Donell via Libc-alpha wrote:
> On 12/14/20 11:48 PM, Siddhesh Poyarekar via Libc-alpha wrote:
>> The Linux kernel escapes some characters (whitespaces and \) by
>> replacing them with their octal form in the format \xxx.  Use that
>> format to decode insteald of looking for specific bytes so that the
>> check is extensible.  This way if the kernel escapes additional
>> characters, glibc code won't have to change to accommodate it.  I
>> have, for example, proposed[1] to escape the '#' character since it
>> may interfere with parsing in getmntent.
>>
>> The check for '\\\\' is kept intact even though as of today, the
>> kernel does not write '\\\\' to escape a backslash; it write its octal
>> equivalent instead.
>>
>> [1] https://lore.kernel.org/linux-fsdevel/20201215042454.998361-1-siddhesh@gotplt.org/T/#u
>> ---
>>   misc/mntent_r.c | 43 +++++++++++++++----------------------------
>>   1 file changed, 15 insertions(+), 28 deletions(-)
> 
> Please consider how we would test this.
> 
> I would like to see additional coverage here to make sure this works
> in the future.
> 

Sure thing.  I realized my first attempt was quite a lazy one and the 
whole set of functions needs a closer look.  I'll post a v2 with all 
that and tests.

Siddhesh
  

Patch

diff --git a/misc/mntent_r.c b/misc/mntent_r.c
index 0e8f10007e..f80b563f39 100644
--- a/misc/mntent_r.c
+++ b/misc/mntent_r.c
@@ -76,35 +76,22 @@  decode_name (char *buf)
   char *wp = buf;
 
   do
-    if (rp[0] == '\\' && rp[1] == '0' && rp[2] == '4' && rp[3] == '0')
+    if (rp[0] == '\\')
       {
-	/* \040 is a SPACE.  */
-	*wp++ = ' ';
-	rp += 3;
-      }
-    else if (rp[0] == '\\' && rp[1] == '0' && rp[2] == '1' && rp[3] == '1')
-      {
-	/* \011 is a TAB.  */
-	*wp++ = '\t';
-	rp += 3;
-      }
-    else if (rp[0] == '\\' && rp[1] == '0' && rp[2] == '1' && rp[3] == '2')
-      {
-	/* \012 is a NEWLINE.  */
-	*wp++ = '\n';
-	rp += 3;
-      }
-    else if (rp[0] == '\\' && rp[1] == '\\')
-      {
-	/* We have to escape \\ to be able to represent all characters.  */
-	*wp++ = '\\';
-	rp += 1;
-      }
-    else if (rp[0] == '\\' && rp[1] == '1' && rp[2] == '3' && rp[3] == '4')
-      {
-	/* \134 is also \\.  */
-	*wp++ = '\\';
-	rp += 3;
+	if (rp[1] == '\\')
+	  {
+	    /* We have to escape \\ to be able to represent all characters.  */
+	    *wp++ = '\\';
+	    rp += 1;
+	  }
+	else
+	  {
+	    /* The kernel escapes special characters with their octal
+	       equivalents.  */
+	    char c = 64 * (rp[1] - '0') + 8 * (rp[2] - '0') + rp[3] - '0';
+	    *wp++ = c;
+	    rp += 3;
+	  }
       }
     else
       *wp++ = *rp;