getmntent: Consolidate character decoding
Commit Message
The Linux kernel escapes some characters (whitespaces and \) by
replacing them with their octal form in the format \xxx. Use that
format to decode insteald of looking for specific bytes so that the
check is extensible. This way if the kernel escapes additional
characters, glibc code won't have to change to accommodate it. I
have, for example, proposed[1] to escape the '#' character since it
may interfere with parsing in getmntent.
The check for '\\\\' is kept intact even though as of today, the
kernel does not write '\\\\' to escape a backslash; it write its octal
equivalent instead.
[1] https://lore.kernel.org/linux-fsdevel/20201215042454.998361-1-siddhesh@gotplt.org/T/#u
---
misc/mntent_r.c | 43 +++++++++++++++----------------------------
1 file changed, 15 insertions(+), 28 deletions(-)
Comments
On Dez 15 2020, Siddhesh Poyarekar via Libc-alpha wrote:
> + else
> + {
> + /* The kernel escapes special characters with their octal
> + equivalents. */
> + char c = 64 * (rp[1] - '0') + 8 * (rp[2] - '0') + rp[3] - '0';
> + *wp++ = c;
> + rp += 3;
That should check that the characters are indeed digits and you are not
running past the end.
Andreas.
On 12/15/20 2:57 PM, Andreas Schwab wrote:
> On Dez 15 2020, Siddhesh Poyarekar via Libc-alpha wrote:
>
>> + else
>> + {
>> + /* The kernel escapes special characters with their octal
>> + equivalents. */
>> + char c = 64 * (rp[1] - '0') + 8 * (rp[2] - '0') + rp[3] - '0';
>> + *wp++ = c;
>> + rp += 3;
>
> That should check that the characters are indeed digits and you are not
> running past the end.
Indeed, I'll fix it up, thanks.
Siddhesh
On 12/14/20 11:48 PM, Siddhesh Poyarekar via Libc-alpha wrote:
> The Linux kernel escapes some characters (whitespaces and \) by
> replacing them with their octal form in the format \xxx. Use that
> format to decode insteald of looking for specific bytes so that the
> check is extensible. This way if the kernel escapes additional
> characters, glibc code won't have to change to accommodate it. I
> have, for example, proposed[1] to escape the '#' character since it
> may interfere with parsing in getmntent.
>
> The check for '\\\\' is kept intact even though as of today, the
> kernel does not write '\\\\' to escape a backslash; it write its octal
> equivalent instead.
>
> [1] https://lore.kernel.org/linux-fsdevel/20201215042454.998361-1-siddhesh@gotplt.org/T/#u
> ---
> misc/mntent_r.c | 43 +++++++++++++++----------------------------
> 1 file changed, 15 insertions(+), 28 deletions(-)
Please consider how we would test this.
I would like to see additional coverage here to make sure this works
in the future.
On 12/15/20 8:20 PM, Carlos O'Donell via Libc-alpha wrote:
> On 12/14/20 11:48 PM, Siddhesh Poyarekar via Libc-alpha wrote:
>> The Linux kernel escapes some characters (whitespaces and \) by
>> replacing them with their octal form in the format \xxx. Use that
>> format to decode insteald of looking for specific bytes so that the
>> check is extensible. This way if the kernel escapes additional
>> characters, glibc code won't have to change to accommodate it. I
>> have, for example, proposed[1] to escape the '#' character since it
>> may interfere with parsing in getmntent.
>>
>> The check for '\\\\' is kept intact even though as of today, the
>> kernel does not write '\\\\' to escape a backslash; it write its octal
>> equivalent instead.
>>
>> [1] https://lore.kernel.org/linux-fsdevel/20201215042454.998361-1-siddhesh@gotplt.org/T/#u
>> ---
>> misc/mntent_r.c | 43 +++++++++++++++----------------------------
>> 1 file changed, 15 insertions(+), 28 deletions(-)
>
> Please consider how we would test this.
>
> I would like to see additional coverage here to make sure this works
> in the future.
>
Sure thing. I realized my first attempt was quite a lazy one and the
whole set of functions needs a closer look. I'll post a v2 with all
that and tests.
Siddhesh
@@ -76,35 +76,22 @@ decode_name (char *buf)
char *wp = buf;
do
- if (rp[0] == '\\' && rp[1] == '0' && rp[2] == '4' && rp[3] == '0')
+ if (rp[0] == '\\')
{
- /* \040 is a SPACE. */
- *wp++ = ' ';
- rp += 3;
- }
- else if (rp[0] == '\\' && rp[1] == '0' && rp[2] == '1' && rp[3] == '1')
- {
- /* \011 is a TAB. */
- *wp++ = '\t';
- rp += 3;
- }
- else if (rp[0] == '\\' && rp[1] == '0' && rp[2] == '1' && rp[3] == '2')
- {
- /* \012 is a NEWLINE. */
- *wp++ = '\n';
- rp += 3;
- }
- else if (rp[0] == '\\' && rp[1] == '\\')
- {
- /* We have to escape \\ to be able to represent all characters. */
- *wp++ = '\\';
- rp += 1;
- }
- else if (rp[0] == '\\' && rp[1] == '1' && rp[2] == '3' && rp[3] == '4')
- {
- /* \134 is also \\. */
- *wp++ = '\\';
- rp += 3;
+ if (rp[1] == '\\')
+ {
+ /* We have to escape \\ to be able to represent all characters. */
+ *wp++ = '\\';
+ rp += 1;
+ }
+ else
+ {
+ /* The kernel escapes special characters with their octal
+ equivalents. */
+ char c = 64 * (rp[1] - '0') + 8 * (rp[2] - '0') + rp[3] - '0';
+ *wp++ = c;
+ rp += 3;
+ }
}
else
*wp++ = *rp;