[1/2] Flag CPP_W_BIDIRECTIONAL so that source lines are escaped
Commit Message
Before:
Wbidirectional-1.c: In function ‘main’:
Wbidirectional-1.c:6:43: warning: unpaired UTF-8 bidirectional character detected [-Wbidirectional=]
6 | /* } if (isAdmin) begin admins only */
| ^
Wbidirectional-1.c:9:28: warning: unpaired UTF-8 bidirectional character detected [-Wbidirectional=]
9 | /* end admins only { */
| ^
Wbidirectional-11.c:6:15: warning: UTF-8 vs UCN mismatch when closing a context by "U+202C (POP DIRECTIONAL FORMATTING)" [-Wbidirectional=]
6 | int LRE__PDF_\u202c;
| ^
After setting rich_loc.set_escape_on_output (true):
Wbidirectional-1.c:6:43: warning: unpaired UTF-8 bidirectional character detected [-Wbidirectional=]
6 | /*<U+202E> } <U+2066>if (isAdmin)<U+2069> <U+2066> begin admins only */
| ^
Wbidirectional-1.c:9:28: warning: unpaired UTF-8 bidirectional character detected [-Wbidirectional=]
9 | /* end admins only <U+202E> { <U+2066>*/
| ^
Wbidirectional-11.c:6:15: warning: UTF-8 vs UCN mismatch when closing a context by "U+202C (POP DIRECTIONAL FORMATTING)" [-Wbidirectional=]
6 | int LRE_<U+202A>_PDF_\u202c;
| ^
libcpp/ChangeLog:
* lex.c (maybe_warn_bidi_on_close): Use a rich_location
and call set_escape_on_output (true) on it.
(maybe_warn_bidi_on_char): Likewise.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
---
libcpp/lex.c | 29 +++++++++++++++++------------
1 file changed, 17 insertions(+), 12 deletions(-)
Comments
On Tue, 2021-11-02 at 16:58 -0400, David Malcolm wrote:
> Before:
>
> Wbidirectional-1.c: In function ‘main’:
> Wbidirectional-1.c:6:43: warning: unpaired UTF-8 bidirectional
> character detected [-Wbidirectional=]
> 6 | /* } if (isAdmin) begin admins only */
> | ^
> Wbidirectional-1.c:9:28: warning: unpaired UTF-8 bidirectional
> character detected [-Wbidirectional=]
> 9 | /* end admins only { */
> | ^
>
> Wbidirectional-11.c:6:15: warning: UTF-8 vs UCN mismatch when
> closing a context by "U+202C (POP DIRECTIONAL FORMATTING)" [-
> Wbidirectional=]
> 6 | int LRE__PDF_\u202c;
> | ^
>
> After setting rich_loc.set_escape_on_output (true):
>
> Wbidirectional-1.c:6:43: warning: unpaired UTF-8 bidirectional
> character detected [-Wbidirectional=]
> 6 | /*<U+202E> } <U+2066>if (isAdmin)<U+2069> <U+2066>
> begin admins only */
>
> |
> ^
> Wbidirectional-1.c:9:28: warning: unpaired UTF-8 bidirectional
> character detected [-Wbidirectional=]
> 9 | /* end admins only <U+202E> { <U+2066>*/
> | ^
>
> Wbidirectional-11.c:6:15: warning: UTF-8 vs UCN mismatch when
> closing a context by "U+202C (POP DIRECTIONAL FORMATTING)" [-
> Wbidirectional=]
> 6 | int LRE_<U+202A>_PDF_\u202c;
> | ^
>
> libcpp/ChangeLog:
> * lex.c (maybe_warn_bidi_on_close): Use a rich_location
> and call set_escape_on_output (true) on it.
> (maybe_warn_bidi_on_char): Likewise.
>
> Signed-off-by: David Malcolm <dmalcolm@redhat.com>
[...snip...]
To be more explicit: part of the benefit of escaping non-ASCII bytes in
the source line is that it further mitigates against CVE-2021-42574,
since it "defangs" the bidi control characters - turning everything
into ASCII, so that the user can see the logical ordering of the
characters directly. A similar consideration applies to homoglyph
attacks.
Dave
@@ -1387,9 +1387,11 @@ maybe_warn_bidi_on_close (cpp_reader *pfile, const uchar *p)
const location_t loc
= linemap_position_for_column (pfile->line_table,
CPP_BUF_COLUMN (pfile->buffer, p));
- cpp_warning_with_line (pfile, CPP_W_BIDIRECTIONAL, loc, 0,
- "unpaired UTF-8 bidirectional character "
- "detected");
+ rich_location rich_loc (pfile->line_table, loc);
+ rich_loc.set_escape_on_output (true);
+ cpp_warning_at (pfile, CPP_W_BIDIRECTIONAL, &rich_loc,
+ "unpaired UTF-8 bidirectional character "
+ "detected");
}
/* We're done with this context. */
bidi::on_close ();
@@ -1414,6 +1416,9 @@ maybe_warn_bidi_on_char (cpp_reader *pfile, const uchar *p, bidi::kind kind,
const location_t loc
= linemap_position_for_column (pfile->line_table,
CPP_BUF_COLUMN (pfile->buffer, p));
+ rich_location rich_loc (pfile->line_table, loc);
+ rich_loc.set_escape_on_output (true);
+
/* It seems excessive to warn about a PDI/PDF that is closing
an opened context because we've already warned about the
opening character. Except warn when we have a UCN x UTF-8
@@ -1422,20 +1427,20 @@ maybe_warn_bidi_on_char (cpp_reader *pfile, const uchar *p, bidi::kind kind,
{
if (warn_bidi == bidirectional_unpaired
&& bidi::current_ctx_ucn_p () != ucn_p)
- cpp_warning_with_line (pfile, CPP_W_BIDIRECTIONAL, loc, 0,
- "UTF-8 vs UCN mismatch when closing "
- "a context by \"%s\"", bidi::to_str (kind));
+ cpp_warning_at (pfile, CPP_W_BIDIRECTIONAL, &rich_loc,
+ "UTF-8 vs UCN mismatch when closing "
+ "a context by \"%s\"", bidi::to_str (kind));
}
else if (warn_bidi == bidirectional_any)
{
if (kind == bidi::kind::PDF || kind == bidi::kind::PDI)
- cpp_warning_with_line (pfile, CPP_W_BIDIRECTIONAL, loc, 0,
- "\"%s\" is closing an unopened context",
- bidi::to_str (kind));
+ cpp_warning_at (pfile, CPP_W_BIDIRECTIONAL, &rich_loc,
+ "\"%s\" is closing an unopened context",
+ bidi::to_str (kind));
else
- cpp_warning_with_line (pfile, CPP_W_BIDIRECTIONAL, loc, 0,
- "found problematic Unicode character \"%s\"",
- bidi::to_str (kind));
+ cpp_warning_at (pfile, CPP_W_BIDIRECTIONAL, &rich_loc,
+ "found problematic Unicode character \"%s\"",
+ bidi::to_str (kind));
}
}
/* We're done with this context. */