From patchwork Wed Nov 10 08:58:50 2021
X-Patchwork-Submitter: Pierre-Marie de Rodat
X-Patchwork-Id: 47377
Date: Wed, 10 Nov 2021 08:58:50 +0000
To: gcc-patches@gcc.gnu.org
Subject: [Ada] Warn for bidirectional characters
Message-ID: <20211110085850.GA2811262@adacore.com>
From: Pierre-Marie de Rodat
Cc: Bob Duff

Bidirectional characters can cause security vulnerabilities, as
explained in the paper mentioned in a comment in this patch. Therefore,
we warn if such characters appear in string_literals,
character_literals, or comments.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

	* scng.adb (Check_Bidi): New procedure to give warning. Note
	that this is called only for non-ASCII characters, so should not
	be an efficiency issue.
	(Slit): Call Check_Bidi for wide characters in string_literals.
	(Minus_Case): Call Check_Bidi for wide characters in comments.
	(Char_Literal_Case): Call Check_Bidi for wide characters in
	character_literals. Move Accumulate_Checksum down, because
	otherwise, if Err is True, the Code is uninitialized.
	* errout.ads: Make the obsolete nature of "Insertion character
	?" more prominent; one should not have to read several
	paragraphs before finding out that it's obsolete.

diff --git a/gcc/ada/errout.ads b/gcc/ada/errout.ads
--- a/gcc/ada/errout.ads
+++ b/gcc/ada/errout.ads
@@ -275,7 +275,7 @@ package Errout is
    --    contain subprograms to be inlined in the main program. It is also
    --    used by the Compiler_Unit_Warning pragma for similar reasons.

-   --    Insertion character ? (Question: warning message)
+   --    Insertion character ? (Question: warning message -- OBSOLETE)
    --      The character ? appearing anywhere in a message makes the message
    --      warning instead of a normal error message, and the text of the
    --      message will be preceded by "warning:" in the normal case. The
@@ -302,7 +302,7 @@ package Errout is
    --      clear that the continuation is part of a warning message, but it is
    --      not necessary to go through any computational effort to include it.
    --
-   --      Note: this usage is obsolete, use ?? ?*? ?$? ?x? ?.x? ?_x? to
+   --      Note: this usage is obsolete; use ?? ?*? ?$? ?x? ?.x? ?_x? to
    --      specify the string to be added when Warn_Doc_Switch is set to True.
    --      If this switch is True, then for simple ? messages it has no effect.
    --      This simple form is to ease transition and may be removed later
diff --git a/gcc/ada/scng.adb b/gcc/ada/scng.adb
--- a/gcc/ada/scng.adb
+++ b/gcc/ada/scng.adb
@@ -322,6 +322,49 @@ package body Scng is
       --  Returns True if the scan pointer is pointing to the start of a wide
       --  character sequence, does not modify the scan pointer in any case.

+      procedure Check_Bidi (Code : Char_Code);
+      --  Give a warning if Code is a bidirectional character, which can cause
+      --  security vulnerabilities. See the following article:
+      --
+      --  @article{boucher_trojansource_2021,
+      --      title = {Trojan {Source}: {Invisible} {Vulnerabilities}},
+      --      author = {Nicholas Boucher and Ross Anderson},
+      --      year = {2021},
+      --      journal = {Preprint},
+      --      eprint = {2111.00169},
+      --      archivePrefix = {arXiv},
+      --      primaryClass = {cs.CR},
+      --      url = {https://arxiv.org/abs/2111.00169}
+      --  }
+
+      ----------------
+      -- Check_Bidi --
+      ----------------
+
+      type Bidi_Characters is
+        (LRE, RLE, LRO, RLO, LRI, RLI, FSI, PDF, PDI);
+      Bidi_Character_Codes : constant array (Bidi_Characters) of Char_Code :=
+        (LRE => 16#202A#,
+         RLE => 16#202B#,
+         LRO => 16#202D#,
+         RLO => 16#202E#,
+         LRI => 16#2066#,
+         RLI => 16#2067#,
+         FSI => 16#2068#,
+         PDF => 16#202C#,
+         PDI => 16#2069#);
+      --  Above are the bidirectional characters, along with their Unicode code
+      --  points.
+
+      procedure Check_Bidi (Code : Char_Code) is
+      begin
+         for Bidi_Code of Bidi_Character_Codes loop
+            if Code = Bidi_Code then
+               Error_Msg ("??bidirectional wide character", Wptr);
+            end if;
+         end loop;
+      end Check_Bidi;
+
       -----------------------
       -- Double_Char_Token --
       -----------------------
@@ -1070,6 +1113,8 @@ package body Scng is
                   if Err then
                      Error_Illegal_Wide_Character;
                      Code := Get_Char_Code (' ');
+                  else
+                     Check_Bidi (Code);
                   end if;

                   Accumulate_Checksum (Code);
@@ -1611,11 +1656,11 @@ package body Scng is

                   elsif Start_Of_Wide_Character then
                      declare
-                        Wptr : constant Source_Ptr := Scan_Ptr;
                         Code : Char_Code;
                         Err  : Boolean;

                      begin
+                        Wptr := Scan_Ptr;
                         Scan_Wide (Source, Scan_Ptr, Code, Err);

                         --  If not well formed wide character, then just skip
@@ -1629,6 +1674,8 @@ package body Scng is
                         elsif Is_UTF_32_Line_Terminator (UTF_32 (Code)) then
                            Scan_Ptr := Wptr;
                            exit;
+                        else
+                           Check_Bidi (Code);
                         end if;
                      end;

@@ -1736,7 +1783,6 @@ package body Scng is
             if Start_Of_Wide_Character then
                Wptr := Scan_Ptr;
                Scan_Wide (Source, Scan_Ptr, Code, Err);
-               Accumulate_Checksum (Code);

                if Err then
                   Error_Illegal_Wide_Character;
@@ -1752,8 +1798,12 @@ package body Scng is
                   Error_Msg -- CODEFIX
                     ("(Ada 2005) non-graphic character not permitted "
                      & "in character literal", Wptr);
+               else
+                  Check_Bidi (Code);
                end if;

+               Accumulate_Checksum (Code);
+
                if Source (Scan_Ptr) /= ''' then
                   Error_Msg_S ("missing apostrophe");
                else