Message ID | 20220607214342.19463-1-david.faust@oracle.com |
---|---|
Headers | From: David Faust via Gcc-patches <gcc-patches@gcc.gnu.org>; Reply-To: David Faust <david.faust@oracle.com>; To: gcc-patches@gcc.gnu.org; Cc: yhs@fb.com; Subject: [PATCH 0/9] Add debug_annotate attributes; Date: Tue, 7 Jun 2022 14:43:33 -0700; List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> |
Series | Add debug_annotate attributes |
Message
David Faust
June 7, 2022, 9:43 p.m. UTC
Hello,

This patch series adds support for:

- Two new C-language-level attributes that make it possible to associate (to
  "annotate" or to "tag") particular declarations and types with arbitrary
  strings. As explained below, this is intended to be used to, for example,
  characterize certain pointer types.

- The conveyance of that information in the DWARF output in the form of a new
  DIE: DW_TAG_GNU_annotation.

- The conveyance of that information in the BTF output in the form of two new
  kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.

All of these facilities are being added to the eBPF ecosystem, and support for
them exists in some form in LLVM.

Purpose
=======

1) Addition of C-family language constructs (attributes) to specify free-text
   tags on certain language elements, such as struct fields.

   The purpose of these annotations is to provide additional information about
   types, variables, and function parameters of interest to the kernel. A
   driving use case is to tag pointer types within the Linux kernel and eBPF
   programs with additional semantic information, such as '__user' or '__rcu'.

   For example, consider the Linux kernel function do_execve with the
   following declaration:

     static int do_execve(struct filename *filename,
        const char __user *const __user *__argv,
        const char __user *const __user *__envp);

   Here, __user could be defined with these annotations to record semantic
   information about the pointer parameters (e.g., they are user-provided) in
   DWARF and BTF information. Other kernel facilities such as the eBPF verifier
   can read the tags and make use of the information.

2) Conveying the tags in the generated DWARF debug info.

   The main motivation for emitting the tags in DWARF is that the Linux kernel
   generates its BTF information via pahole, using DWARF as a source:

       +--------+  BTF                  BTF   +----------+
       | pahole |-------> vmlinux.btf ------->| verifier |
       +--------+                             +----------+
           ^                                        ^
           |                                        |
     DWARF |                                    BTF |
           |                                        |
        vmlinux                              +-------------+
        module1.ko                           | BPF program |
        module2.ko                           +-------------+
          ...

   This is because:

   a) Unlike GCC, LLVM will only generate BTF for BPF programs.

   b) GCC can generate BTF for any target with -gbtf, but there is no
      support for linking/deduplicating BTF in the linker.

   In the scenario above, the verifier needs access to the pointer tags of
   both the kernel types/declarations (conveyed in the DWARF and translated
   to BTF by pahole) and those of the BPF program (available directly in BTF).

   Another motivation for having the tag information in DWARF, unrelated to
   BPF and BTF, is that the drgn project (another DWARF consumer) also wants
   to benefit from these tags in order to differentiate between different
   kinds of pointers in the kernel.

3) Conveying the tags in the generated BTF debug info.

   This is easy: the main purpose of having this info in BTF is for the
   compiled eBPF programs. The kernel verifier can then access the tags
   of pointers used by the eBPF programs.


For more information about these tags and the motivation behind them, please
refer to the following Linux kernel discussions:

  https://lore.kernel.org/bpf/20210914223004.244411-1-yhs@fb.com/
  https://lore.kernel.org/bpf/20211012164838.3345699-1-yhs@fb.com/
  https://lore.kernel.org/bpf/20211112012604.1504583-1-yhs@fb.com/


Implementation Overview
=======================

To enable these annotations, two new C language attributes are added:
__attribute__((debug_annotate_decl("foo"))) and
__attribute__((debug_annotate_type("bar"))). Both attributes accept a single
arbitrary string constant argument, which will be recorded in the generated
DWARF and/or BTF debug information. They have no effect on code generation.

Note that we are not using the same attribute names as LLVM (btf_decl_tag and
btf_type_tag, respectively). While these attributes are functionally very
similar, they have grown beyond purely BTF-specific uses, so inclusion of "btf"
in the attribute name seems misleading.

DWARF support is enabled via a new DW_TAG_GNU_annotation. When generating DWARF,
declarations and types will be checked for the corresponding attributes. If
present, a DW_TAG_GNU_annotation DIE will be created as a child of the DIE for
the annotated type or declaration, one for each tag. These DIEs link the
arbitrary tag value to the item they annotate.

For example, the following variable declaration:

  #define __typetag1 __attribute__((debug_annotate_type ("typetag1")))

  #define __decltag1 __attribute__((debug_annotate_decl ("decltag1")))
  #define __decltag2 __attribute__((debug_annotate_decl ("decltag2")))

  int * __typetag1 x __decltag1 __decltag2;

Produces the following DWARF information:

 <1><1e>: Abbrev Number: 3 (DW_TAG_variable)
    <1f>   DW_AT_name        : x
    <21>   DW_AT_decl_file   : 1
    <22>   DW_AT_decl_line   : 7
    <23>   DW_AT_decl_column : 18
    <24>   DW_AT_type        : <0x49>
    <28>   DW_AT_external    : 1
    <28>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0 (DW_OP_addr: 0)
    <32>   DW_AT_sibling     : <0x49>
 <2><36>: Abbrev Number: 1 (User TAG value: 0x6000)
    <37>   DW_AT_name        : (indirect string, offset: 0xd6): debug_annotate_decl
    <3b>   DW_AT_const_value : (indirect string, offset: 0xcd): decltag2
 <2><3f>: Abbrev Number: 1 (User TAG value: 0x6000)
    <40>   DW_AT_name        : (indirect string, offset: 0xd6): debug_annotate_decl
    <44>   DW_AT_const_value : (indirect string, offset: 0x0): decltag1
 <2><48>: Abbrev Number: 0
 <1><49>: Abbrev Number: 4 (DW_TAG_pointer_type)
    <4a>   DW_AT_byte_size   : 8
    <4b>   DW_AT_type        : <0x5d>
    <4f>   DW_AT_sibling     : <0x5d>
 <2><53>: Abbrev Number: 1 (User TAG value: 0x6000)
    <54>   DW_AT_name        : (indirect string, offset: 0x9): debug_annotate_type
    <58>   DW_AT_const_value : (indirect string, offset: 0x1d): typetag1
 <2><5c>: Abbrev Number: 0
 <1><5d>: Abbrev Number: 5 (DW_TAG_base_type)
    <5e>   DW_AT_byte_size   : 4
    <5f>   DW_AT_encoding    : 5 (signed)
    <60>   DW_AT_name        : int
 <1><64>: Abbrev Number: 0

In the case of BTF, the annotations are recorded in two type kinds recently
added to the BTF specification: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.
The above example declaration produces the following BTF information:

  [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
  [2] PTR '(anon)' type_id=3
  [3] TYPE_TAG 'typetag1' type_id=1
  [4] DECL_TAG 'decltag1' type_id=6 component_idx=-1
  [5] DECL_TAG 'decltag2' type_id=6 component_idx=-1
  [6] VAR 'x' type_id=2, linkage=global
  [7] DATASEC '.bss' size=0 vlen=1
      type_id=6 offset=0 size=8 (VAR 'x')


David Faust (9):
  dwarf: add dw_get_die_parent function
  include: Add new definitions
  c-family: Add debug_annotate attribute handlers
  dwarf: generate annotation DIEs
  ctfc: pass through debug annotations to BTF
  dwarf2ctf: convert annotation DIEs to CTF types
  btf: output decl_tag and type_tag records
  doc: document new attributes
  testsuite: add debug annotation tests

 gcc/btfout.cc                                 |  28 +++++
 gcc/c-family/c-attribs.cc                     |  43 +++++++
 gcc/ctf-int.h                                 |  29 +++++
 gcc/ctfc.cc                                   |  11 +-
 gcc/ctfc.h                                    |  17 ++-
 gcc/doc/extend.texi                           | 106 ++++++++++++++++
 gcc/dwarf2ctf.cc                              | 114 +++++++++++++++++-
 gcc/dwarf2out.cc                              | 102 ++++++++++++++++
 gcc/dwarf2out.h                               |   1 +
 .../gcc.dg/debug/btf/btf-decltag-func.c       |  18 +++
 .../gcc.dg/debug/btf/btf-decltag-sou.c        |  34 ++++++
 .../gcc.dg/debug/btf/btf-decltag-typedef.c    |  15 +++
 .../gcc.dg/debug/btf/btf-typetag-1.c          |  20 +++
 .../gcc.dg/debug/dwarf2/annotation-1.c        |  20 +++
 .../gcc.dg/debug/dwarf2/annotation-2.c        |  17 +++
 .../gcc.dg/debug/dwarf2/annotation-3.c        |  20 +++
 .../gcc.dg/debug/dwarf2/annotation-4.c        |  34 ++++++
 include/btf.h                                 |  17 ++-
 include/dwarf2.def                            |   4 +
 19 files changed, 639 insertions(+), 11 deletions(-)
 create mode 100644 gcc/ctf-int.h
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decltag-func.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decltag-sou.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-decltag-typedef.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-typetag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/annotation-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/annotation-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/annotation-3.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/annotation-4.c
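The cover letter says that __user "could be defined with these annotations". A minimal sketch of what that might look like follows. It is an illustration under stated assumptions, not the kernel's actual definition: the tag string "user" and the helper count_args are hypothetical, and the __has_attribute guard is added here so that the file still builds (with __user expanding to nothing) on compilers that do not know debug_annotate_type.

```c
#include <stddef.h>

/* Use the proposed attribute only when the compiler reports support for it.
   The tag string "user" is an assumption made for this sketch. */
#if defined(__has_attribute)
# if __has_attribute(debug_annotate_type)
#  define __user __attribute__((debug_annotate_type("user")))
# endif
#endif
#ifndef __user
# define __user /* no-op fallback: the annotation never affects code generation */
#endif

/* Hypothetical helper with the same pointer shape as do_execve's __argv:
   counts the entries of a NULL-terminated argument vector. */
static size_t count_args(const char __user *const __user *argv)
{
  size_t n = 0;

  while (argv[n] != NULL)
    n++;
  return n;
}
```

Because the attributes are described as having no effect on code generation, the function behaves identically whether or not the annotation is active; only the emitted debug information would differ.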
Comments
On 6/7/22 2:43 PM, David Faust wrote:
> [...]
>
> For example, the following variable declaration:
>
>   #define __typetag1 __attribute__((debug_annotate_type ("typetag1")))
>
>   #define __decltag1 __attribute__((debug_annotate_decl ("decltag1")))
>   #define __decltag2 __attribute__((debug_annotate_decl ("decltag2")))
>
>   int * __typetag1 x __decltag1 __decltag2;

Based on the above example

  static int do_execve(struct filename *filename,
     const char __user *const __user *__argv,
     const char __user *const __user *__envp);

should the above example be the below?

  int __typetag1 * x __decltag1 __decltag2

>
> Produces the following DWARF information:
>
>  <1><1e>: Abbrev Number: 3 (DW_TAG_variable)
>     <1f>   DW_AT_name        : x
> [...]
>  <1><64>: Abbrev Number: 0

Maybe you can also show what dwarf debug_info looks like?

> [...]
On 6/14/22 22:53, Yonghong Song wrote: > > > On 6/7/22 2:43 PM, David Faust wrote: >> Hello, >> >> This patch series adds support for: >> >> - Two new C-language-level attributes that allow to associate (to "annotate" or >> to "tag") particular declarations and types with arbitrary strings. As >> explained below, this is intended to be used to, for example, characterize >> certain pointer types. >> >> - The conveyance of that information in the DWARF output in the form of a new >> DIE: DW_TAG_GNU_annotation. >> >> - The conveyance of that information in the BTF output in the form of two new >> kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG. >> >> All of these facilities are being added to the eBPF ecosystem, and support for >> them exists in some form in LLVM. >> >> Purpose >> ======= >> >> 1) Addition of C-family language constructs (attributes) to specify free-text >> tags on certain language elements, such as struct fields. >> >> The purpose of these annotations is to provide additional information about >> types, variables, and function parameters of interest to the kernel. A >> driving use case is to tag pointer types within the linux kernel and eBPF >> programs with additional semantic information, such as '__user' or '__rcu'. >> >> For example, consider the linux kernel function do_execve with the >> following declaration: >> >> static int do_execve(struct filename *filename, >> const char __user *const __user *__argv, >> const char __user *const __user *__envp); >> >> Here, __user could be defined with these annotations to record semantic >> information about the pointer parameters (e.g., they are user-provided) in >> DWARF and BTF information. Other kernel facilites such as the eBPF verifier >> can read the tags and make use of the information. >> >> 2) Conveying the tags in the generated DWARF debug info. 
>> >> The main motivation for emitting the tags in DWARF is that the Linux kernel >> generates its BTF information via pahole, using DWARF as a source: >> >> +--------+ BTF BTF +----------+ >> | pahole |-------> vmlinux.btf ------->| verifier | >> +--------+ +----------+ >> ^ ^ >> | | >> DWARF | BTF | >> | | >> vmlinux +-------------+ >> module1.ko | BPF program | >> module2.ko +-------------+ >> ... >> >> This is because: >> >> a) Unlike GCC, LLVM will only generate BTF for BPF programs. >> >> b) GCC can generate BTF for whatever target with -gbtf, but there is no >> support for linking/deduplicating BTF in the linker. >> >> In the scenario above, the verifier needs access to the pointer tags of >> both the kernel types/declarations (conveyed in the DWARF and translated >> to BTF by pahole) and those of the BPF program (available directly in BTF). >> >> Another motivation for having the tag information in DWARF, unrelated to >> BPF and BTF, is that the drgn project (another DWARF consumer) also wants >> to benefit from these tags in order to differentiate between different >> kinds of pointers in the kernel. >> >> 3) Conveying the tags in the generated BTF debug info. >> >> This is easy: the main purpose of having this info in BTF is for the >> compiled eBPF programs. The kernel verifier can then access the tags >> of pointers used by the eBPF programs. >> >> >> For more information about these tags and the motivation behind them, please >> refer to the following linux kernel discussions: >> >> https://lore.kernel.org/bpf/20210914223004.244411-1-yhs@fb.com/ >> https://lore.kernel.org/bpf/20211012164838.3345699-1-yhs@fb.com/ >> https://lore.kernel.org/bpf/20211112012604.1504583-1-yhs@fb.com/ >> >> >> Implementation Overview >> ======================= >> >> To enable these annotations, two new C language attributes are added: >> __attribute__((debug_annotate_decl("foo"))) and >> __attribute__((debug_annotate_type("bar"))). 
Both attributes accept a single >> arbitrary string constant argument, which will be recorded in the generated >> DWARF and/or BTF debug information. They have no effect on code generation. >> >> Note that we are not using the same attribute names as LLVM (btf_decl_tag and >> btf_type_tag, respectively). While these attributes are functionally very >> similar, they have grown beyond purely BTF-specific uses, so inclusion of "btf" >> in the attribute name seems misleading. >> >> DWARF support is enabled via a new DW_TAG_GNU_annotation. When generating DWARF, >> declarations and types will be checked for the corresponding attributes. If >> present, a DW_TAG_GNU_annotation DIE will be created as a child of the DIE for >> the annotated type or declaration, one for each tag. These DIEs link the >> arbitrary tag value to the item they annotate. >> >> For example, the following variable declaration: >> >> #define __typetag1 __attribute__((debug_annotate_type ("typetag1"))) >> >> #define __decltag1 __attribute__((debug_annotate_decl ("decltag1"))) >> #define __decltag2 __attribute__((debug_annotate_decl ("decltag2"))) >> >> int * __typetag1 x __decltag1 __decltag2; > > Based on the above example > static int do_execve(struct filename *filename, > const char __user *const __user *__argv, > const char __user *const __user *__envp); > > Should the above example should be the below? > int __typetag1 * x __decltag1 __decltag2 > This example is not related to the one above. It is just meant to show the behavior of both attributes. My apologies for not making that clear. 
>> >> Produces the following DWARF information: >> >> <1><1e>: Abbrev Number: 3 (DW_TAG_variable) >> <1f> DW_AT_name : x >> <21> DW_AT_decl_file : 1 >> <22> DW_AT_decl_line : 7 >> <23> DW_AT_decl_column : 18 >> <24> DW_AT_type : <0x49> >> <28> DW_AT_external : 1 >> <28> DW_AT_location : 9 byte block: 3 0 0 0 0 0 0 0 0 (DW_OP_addr: 0) >> <32> DW_AT_sibling : <0x49> >> <2><36>: Abbrev Number: 1 (User TAG value: 0x6000) >> <37> DW_AT_name : (indirect string, offset: 0xd6): debug_annotate_decl >> <3b> DW_AT_const_value : (indirect string, offset: 0xcd): decltag2 >> <2><3f>: Abbrev Number: 1 (User TAG value: 0x6000) >> <40> DW_AT_name : (indirect string, offset: 0xd6): debug_annotate_decl >> <44> DW_AT_const_value : (indirect string, offset: 0x0): decltag1 >> <2><48>: Abbrev Number: 0 >> <1><49>: Abbrev Number: 4 (DW_TAG_pointer_type) >> <4a> DW_AT_byte_size : 8 >> <4b> DW_AT_type : <0x5d> >> <4f> DW_AT_sibling : <0x5d> >> <2><53>: Abbrev Number: 1 (User TAG value: 0x6000) >> <54> DW_AT_name : (indirect string, offset: 0x9): debug_annotate_type >> <58> DW_AT_const_value : (indirect string, offset: 0x1d): typetag1 >> <2><5c>: Abbrev Number: 0 >> <1><5d>: Abbrev Number: 5 (DW_TAG_base_type) >> <5e> DW_AT_byte_size : 4 >> <5f> DW_AT_encoding : 5 (signed) >> <60> DW_AT_name : int >> <1><64>: Abbrev Number: 0 > > Maybe you can also show what dwarf debug_info looks like I am not sure what you mean. This is the .debug_info section as output by readelf -w. I did trim some information not relevant to the discussion such as the DW_TAG_compile_unit DIE, for brevity. > >> >> In the case of BTF, the annotations are recorded in two type kinds recently >> added to the BTF specification: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG. 
>> The above example declaration produces the following BTF information:
>>
>>   [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
>>   [2] PTR '(anon)' type_id=3
>>   [3] TYPE_TAG 'typetag1' type_id=1
>>   [4] DECL_TAG 'decltag1' type_id=6 component_idx=-1
>>   [5] DECL_TAG 'decltag2' type_id=6 component_idx=-1
>>   [6] VAR 'x' type_id=2, linkage=global
>>   [7] DATASEC '.bss' size=0 vlen=1
>>           type_id=6 offset=0 size=8 (VAR 'x')
>>
>>
> [...]
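For readers who want to compile the example declaration quoted above with a compiler that does not carry this series, here is a minimal sketch. It assumes only that `__has_attribute` (or the fallback defined below) reports the proposed attribute names as unknown, in which case the macros expand to nothing. The helper `read_through_x` is hypothetical and exists only to show that the tags leave ordinary use of `x` unchanged.

```c
#include <stddef.h>

/* Fallback for compilers that do not provide __has_attribute. */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/* debug_annotate_type and debug_annotate_decl are the attribute names
   proposed in this series.  A compiler without the patches does not
   know them, so the macros then expand to nothing.  */
#if __has_attribute(debug_annotate_type)
# define __typetag1 __attribute__((debug_annotate_type ("typetag1")))
#else
# define __typetag1
#endif

#if __has_attribute(debug_annotate_decl)
# define __decltag1 __attribute__((debug_annotate_decl ("decltag1")))
# define __decltag2 __attribute__((debug_annotate_decl ("decltag2")))
#else
# define __decltag1
# define __decltag2
#endif

/* The declaration from the cover letter, unchanged.  */
int * __typetag1 x __decltag1 __decltag2;

/* Hypothetical helper (not part of the series): the tags are debug-info
   only, so x behaves like any other int pointer.  */
int read_through_x(int *target)
{
    x = target;
    return *x;
}
```

With a patched compiler, building this with `-g` or `-gbtf` and inspecting the result with `readelf -w` should show the annotation DIEs discussed in this thread; with an unpatched compiler the file still builds, just without the tags.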
On 6/15/22 1:57 PM, David Faust wrote: > > > On 6/14/22 22:53, Yonghong Song wrote: >> >> >> On 6/7/22 2:43 PM, David Faust wrote: >>> Hello, >>> >>> This patch series adds support for: >>> >>> - Two new C-language-level attributes that allow to associate (to "annotate" or >>> to "tag") particular declarations and types with arbitrary strings. As >>> explained below, this is intended to be used to, for example, characterize >>> certain pointer types. >>> >>> - The conveyance of that information in the DWARF output in the form of a new >>> DIE: DW_TAG_GNU_annotation. >>> >>> - The conveyance of that information in the BTF output in the form of two new >>> kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG. >>> >>> All of these facilities are being added to the eBPF ecosystem, and support for >>> them exists in some form in LLVM. >>> >>> Purpose >>> ======= >>> >>> 1) Addition of C-family language constructs (attributes) to specify free-text >>> tags on certain language elements, such as struct fields. >>> >>> The purpose of these annotations is to provide additional information about >>> types, variables, and function parameters of interest to the kernel. A >>> driving use case is to tag pointer types within the linux kernel and eBPF >>> programs with additional semantic information, such as '__user' or '__rcu'. >>> >>> For example, consider the linux kernel function do_execve with the >>> following declaration: >>> >>> static int do_execve(struct filename *filename, >>> const char __user *const __user *__argv, >>> const char __user *const __user *__envp); >>> >>> Here, __user could be defined with these annotations to record semantic >>> information about the pointer parameters (e.g., they are user-provided) in >>> DWARF and BTF information. Other kernel facilites such as the eBPF verifier >>> can read the tags and make use of the information. >>> >>> 2) Conveying the tags in the generated DWARF debug info. 
>>> >>> The main motivation for emitting the tags in DWARF is that the Linux kernel >>> generates its BTF information via pahole, using DWARF as a source: >>> >>> +--------+ BTF BTF +----------+ >>> | pahole |-------> vmlinux.btf ------->| verifier | >>> +--------+ +----------+ >>> ^ ^ >>> | | >>> DWARF | BTF | >>> | | >>> vmlinux +-------------+ >>> module1.ko | BPF program | >>> module2.ko +-------------+ >>> ... >>> >>> This is because: >>> >>> a) Unlike GCC, LLVM will only generate BTF for BPF programs. >>> >>> b) GCC can generate BTF for whatever target with -gbtf, but there is no >>> support for linking/deduplicating BTF in the linker. >>> >>> In the scenario above, the verifier needs access to the pointer tags of >>> both the kernel types/declarations (conveyed in the DWARF and translated >>> to BTF by pahole) and those of the BPF program (available directly in BTF). >>> >>> Another motivation for having the tag information in DWARF, unrelated to >>> BPF and BTF, is that the drgn project (another DWARF consumer) also wants >>> to benefit from these tags in order to differentiate between different >>> kinds of pointers in the kernel. >>> >>> 3) Conveying the tags in the generated BTF debug info. >>> >>> This is easy: the main purpose of having this info in BTF is for the >>> compiled eBPF programs. The kernel verifier can then access the tags >>> of pointers used by the eBPF programs. 
>>> >>> >>> For more information about these tags and the motivation behind them, please >>> refer to the following linux kernel discussions: >>> >>> https://lore.kernel.org/bpf/20210914223004.244411-1-yhs@fb.com/ >>> https://lore.kernel.org/bpf/20211012164838.3345699-1-yhs@fb.com/ >>> https://lore.kernel.org/bpf/20211112012604.1504583-1-yhs@fb.com/ >>> >>> >>> Implementation Overview >>> ======================= >>> >>> To enable these annotations, two new C language attributes are added: >>> __attribute__((debug_annotate_decl("foo"))) and >>> __attribute__((debug_annotate_type("bar"))). Both attributes accept a single >>> arbitrary string constant argument, which will be recorded in the generated >>> DWARF and/or BTF debug information. They have no effect on code generation. >>> >>> Note that we are not using the same attribute names as LLVM (btf_decl_tag and >>> btf_type_tag, respectively). While these attributes are functionally very >>> similar, they have grown beyond purely BTF-specific uses, so inclusion of "btf" >>> in the attribute name seems misleading. >>> >>> DWARF support is enabled via a new DW_TAG_GNU_annotation. When generating DWARF, >>> declarations and types will be checked for the corresponding attributes. If >>> present, a DW_TAG_GNU_annotation DIE will be created as a child of the DIE for >>> the annotated type or declaration, one for each tag. These DIEs link the >>> arbitrary tag value to the item they annotate. 
>>> >>> For example, the following variable declaration: >>> >>> #define __typetag1 __attribute__((debug_annotate_type ("typetag1"))) >>> >>> #define __decltag1 __attribute__((debug_annotate_decl ("decltag1"))) >>> #define __decltag2 __attribute__((debug_annotate_decl ("decltag2"))) >>> >>> int * __typetag1 x __decltag1 __decltag2; >> >> Based on the above example >> static int do_execve(struct filename *filename, >> const char __user *const __user *__argv, >> const char __user *const __user *__envp); >> >> Should the above example should be the below? >> int __typetag1 * x __decltag1 __decltag2 >> > > This example is not related to the one above. It is just meant to > show the behavior of both attributes. My apologies for not making > that clear. Okay, it should be fine if the dwarf debug_info is shown. > >>> >>> Produces the following DWARF information: >>> >>> <1><1e>: Abbrev Number: 3 (DW_TAG_variable) >>> <1f> DW_AT_name : x >>> <21> DW_AT_decl_file : 1 >>> <22> DW_AT_decl_line : 7 >>> <23> DW_AT_decl_column : 18 >>> <24> DW_AT_type : <0x49> >>> <28> DW_AT_external : 1 >>> <28> DW_AT_location : 9 byte block: 3 0 0 0 0 0 0 0 0 (DW_OP_addr: 0) >>> <32> DW_AT_sibling : <0x49> >>> <2><36>: Abbrev Number: 1 (User TAG value: 0x6000) >>> <37> DW_AT_name : (indirect string, offset: 0xd6): debug_annotate_decl >>> <3b> DW_AT_const_value : (indirect string, offset: 0xcd): decltag2 >>> <2><3f>: Abbrev Number: 1 (User TAG value: 0x6000) >>> <40> DW_AT_name : (indirect string, offset: 0xd6): debug_annotate_decl >>> <44> DW_AT_const_value : (indirect string, offset: 0x0): decltag1 >>> <2><48>: Abbrev Number: 0 >>> <1><49>: Abbrev Number: 4 (DW_TAG_pointer_type) >>> <4a> DW_AT_byte_size : 8 >>> <4b> DW_AT_type : <0x5d> >>> <4f> DW_AT_sibling : <0x5d> >>> <2><53>: Abbrev Number: 1 (User TAG value: 0x6000) >>> <54> DW_AT_name : (indirect string, offset: 0x9): debug_annotate_type >>> <58> DW_AT_const_value : (indirect string, offset: 0x1d): typetag1 >>> <2><5c>: Abbrev Number: 0 
>>>  <1><5d>: Abbrev Number: 5 (DW_TAG_base_type)
>>>     <5e>   DW_AT_byte_size   : 4
>>>     <5f>   DW_AT_encoding    : 5 (signed)
>>>     <60>   DW_AT_name        : int
>>>  <1><64>: Abbrev Number: 0

This shows the info in .debug_abbrev. What I mean is to
show the related info in the .debug_info section, which seems more useful for
understanding the relationships between different tags. Maybe this is
because I do not fully understand what <1>/<2> means in <1><49>,
<2><53>, etc.

>>
>> Maybe you can also show what dwarf debug_info looks like
> I am not sure what you mean. This is the .debug_info section as output
> by readelf -w. I did trim some information not relevant to the discussion
> such as the DW_TAG_compile_unit DIE, for brevity.
>
>>
>>>
>>> In the case of BTF, the annotations are recorded in two type kinds recently
>>> added to the BTF specification: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.
>>> The above example declaration produces the following BTF information:
>>>
>>>   [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
>>>   [2] PTR '(anon)' type_id=3
>>>   [3] TYPE_TAG 'typetag1' type_id=1
>>>   [4] DECL_TAG 'decltag1' type_id=6 component_idx=-1
>>>   [5] DECL_TAG 'decltag2' type_id=6 component_idx=-1
>>>   [6] VAR 'x' type_id=2, linkage=global
>>>   [7] DATASEC '.bss' size=0 vlen=1
>>>           type_id=6 offset=0 size=8 (VAR 'x')
>>>
>>>
>> [...]
Hi Yonghong.

> On 6/15/22 1:57 PM, David Faust wrote:
>>
>> On 6/14/22 22:53, Yonghong Song wrote:
>>>
>>>
>>> On 6/7/22 2:43 PM, David Faust wrote:
>>>> Hello,
>>>>
>>>> This patch series adds support for:
>>>> [...]
>>>> >>>> For example, the following variable declaration: >>>> >>>> #define __typetag1 __attribute__((debug_annotate_type ("typetag1"))) >>>> >>>> #define __decltag1 __attribute__((debug_annotate_decl ("decltag1"))) >>>> #define __decltag2 __attribute__((debug_annotate_decl ("decltag2"))) >>>> >>>> int * __typetag1 x __decltag1 __decltag2; >>> >>> Based on the above example >>> static int do_execve(struct filename *filename, >>> const char __user *const __user *__argv, >>> const char __user *const __user *__envp); >>> >>> Should the above example should be the below? >>> int __typetag1 * x __decltag1 __decltag2 >>> >> This example is not related to the one above. It is just meant to >> show the behavior of both attributes. My apologies for not making >> that clear. > > Okay, it should be fine if the dwarf debug_info is shown. > >> >>>> >>>> Produces the following DWARF information: >>>> >>>> <1><1e>: Abbrev Number: 3 (DW_TAG_variable) >>>> <1f> DW_AT_name : x >>>> <21> DW_AT_decl_file : 1 >>>> <22> DW_AT_decl_line : 7 >>>> <23> DW_AT_decl_column : 18 >>>> <24> DW_AT_type : <0x49> >>>> <28> DW_AT_external : 1 >>>> <28> DW_AT_location : 9 byte block: 3 0 0 0 0 0 0 0 0 (DW_OP_addr: 0) >>>> <32> DW_AT_sibling : <0x49> >>>> <2><36>: Abbrev Number: 1 (User TAG value: 0x6000) >>>> <37> DW_AT_name : (indirect string, offset: 0xd6): debug_annotate_decl >>>> <3b> DW_AT_const_value : (indirect string, offset: 0xcd): decltag2 >>>> <2><3f>: Abbrev Number: 1 (User TAG value: 0x6000) >>>> <40> DW_AT_name : (indirect string, offset: 0xd6): debug_annotate_decl >>>> <44> DW_AT_const_value : (indirect string, offset: 0x0): decltag1 >>>> <2><48>: Abbrev Number: 0 >>>> <1><49>: Abbrev Number: 4 (DW_TAG_pointer_type) >>>> <4a> DW_AT_byte_size : 8 >>>> <4b> DW_AT_type : <0x5d> >>>> <4f> DW_AT_sibling : <0x5d> >>>> <2><53>: Abbrev Number: 1 (User TAG value: 0x6000) >>>> <54> DW_AT_name : (indirect string, offset: 0x9): debug_annotate_type >>>> <58> DW_AT_const_value : (indirect string, 
offset: 0x1d): typetag1
>>>>  <2><5c>: Abbrev Number: 0
>>>>  <1><5d>: Abbrev Number: 5 (DW_TAG_base_type)
>>>>     <5e>   DW_AT_byte_size   : 4
>>>>     <5f>   DW_AT_encoding    : 5 (signed)
>>>>     <60>   DW_AT_name        : int
>>>>  <1><64>: Abbrev Number: 0
>
> This shows the info in .debug_abbrev. What I mean is to
> show the related info in the .debug_info section, which seems more useful for
> understanding the relationships between different tags. Maybe this is
> because I do not fully understand what <1>/<2> means in <1><49>,
> <2><53>, etc.

I think that dump actually shows .debug_info, with the abbrevs
expanded...

Anyway, it seems to us that the root of this problem is the fact that the
kernel sparse annotations, such as address_space(__user), are:

1) To be processed by an external kernel-specific tool
   (https://sparse.docs.kernel.org/en/latest/annotations.html) and not a
   C compiler, and therefore,

2) Not quite the same as compiler attributes (despite the way they
   look). In particular, they seem to assume an ordering different from
   that of GNU attributes: in some cases, given the same written order,
   they refer to different things! Which is quite unfortunate :(

Now, if I understood properly, you plan to change the definition of
__user and __kernel in the kernel sources in order to generate the tag
compiler attributes, correct?

Is that the reason why LLVM implements what we assume to be the sparse
ordering, and not the correct GNU attributes ordering, for the tag
attributes?

If that is so, we have quite a problem here: I don't think we can change
the way GCC handles GNU-like attributes just because the kernel sources
want to hook on these __user/__kernel sparse annotations to generate the
compiler tags, even if we could mayhaps get GCC to handle
debug_annotate_type and debug_annotate_decl differently. Some would say
doing so would perpetuate the mistake instead of fixing it...

Is my understanding correct?
>>>
>>> Maybe you can also show what dwarf debug_info looks like
>> I am not sure what you mean. This is the .debug_info section as output
>> by readelf -w. I did trim some information not relevant to the discussion
>> such as the DW_TAG_compile_unit DIE, for brevity.
>>
>>>
>>>>
>>>> In the case of BTF, the annotations are recorded in two type kinds recently
>>>> added to the BTF specification: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.
>>>> The above example declaration produces the following BTF information:
>>>>
>>>>   [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
>>>>   [2] PTR '(anon)' type_id=3
>>>>   [3] TYPE_TAG 'typetag1' type_id=1
>>>>   [4] DECL_TAG 'decltag1' type_id=6 component_idx=-1
>>>>   [5] DECL_TAG 'decltag2' type_id=6 component_idx=-1
>>>>   [6] VAR 'x' type_id=2, linkage=global
>>>>   [7] DATASEC '.bss' size=0 vlen=1
>>>>           type_id=6 offset=0 size=8 (VAR 'x')
>>>>
>>>>
>>> [...]
On 6/17/22 10:18 AM, Jose E. Marchesi wrote:
>
> Hi Yonghong.
>
>> On 6/15/22 1:57 PM, David Faust wrote:
>>>
>>> On 6/14/22 22:53, Yonghong Song wrote:
>>>>
>>>>
>>>> On 6/7/22 2:43 PM, David Faust wrote:
>>>>> Hello,
>>>>>
>>>>> This patch series adds support for:
>>>>> [...]
>>>>> >>>>> For example, the following variable declaration: >>>>> >>>>> #define __typetag1 __attribute__((debug_annotate_type ("typetag1"))) >>>>> >>>>> #define __decltag1 __attribute__((debug_annotate_decl ("decltag1"))) >>>>> #define __decltag2 __attribute__((debug_annotate_decl ("decltag2"))) >>>>> >>>>> int * __typetag1 x __decltag1 __decltag2; >>>> >>>> Based on the above example >>>> static int do_execve(struct filename *filename, >>>> const char __user *const __user *__argv, >>>> const char __user *const __user *__envp); >>>> >>>> Should the above example should be the below? >>>> int __typetag1 * x __decltag1 __decltag2 >>>> >>> This example is not related to the one above. It is just meant to >>> show the behavior of both attributes. My apologies for not making >>> that clear. >> >> Okay, it should be fine if the dwarf debug_info is shown. >> >>> >>>>> >>>>> Produces the following DWARF information: >>>>> >>>>> <1><1e>: Abbrev Number: 3 (DW_TAG_variable) >>>>> <1f> DW_AT_name : x >>>>> <21> DW_AT_decl_file : 1 >>>>> <22> DW_AT_decl_line : 7 >>>>> <23> DW_AT_decl_column : 18 >>>>> <24> DW_AT_type : <0x49> >>>>> <28> DW_AT_external : 1 >>>>> <28> DW_AT_location : 9 byte block: 3 0 0 0 0 0 0 0 0 (DW_OP_addr: 0) >>>>> <32> DW_AT_sibling : <0x49> >>>>> <2><36>: Abbrev Number: 1 (User TAG value: 0x6000) >>>>> <37> DW_AT_name : (indirect string, offset: 0xd6): debug_annotate_decl >>>>> <3b> DW_AT_const_value : (indirect string, offset: 0xcd): decltag2 >>>>> <2><3f>: Abbrev Number: 1 (User TAG value: 0x6000) >>>>> <40> DW_AT_name : (indirect string, offset: 0xd6): debug_annotate_decl >>>>> <44> DW_AT_const_value : (indirect string, offset: 0x0): decltag1 >>>>> <2><48>: Abbrev Number: 0 >>>>> <1><49>: Abbrev Number: 4 (DW_TAG_pointer_type) >>>>> <4a> DW_AT_byte_size : 8 >>>>> <4b> DW_AT_type : <0x5d> >>>>> <4f> DW_AT_sibling : <0x5d> >>>>> <2><53>: Abbrev Number: 1 (User TAG value: 0x6000) >>>>> <54> DW_AT_name : (indirect string, offset: 0x9): debug_annotate_type 
>>>>>     <58>   DW_AT_const_value : (indirect string, offset: 0x1d): typetag1
>>>>>  <2><5c>: Abbrev Number: 0
>>>>>  <1><5d>: Abbrev Number: 5 (DW_TAG_base_type)
>>>>>     <5e>   DW_AT_byte_size   : 4
>>>>>     <5f>   DW_AT_encoding    : 5 (signed)
>>>>>     <60>   DW_AT_name        : int
>>>>>  <1><64>: Abbrev Number: 0
>>
>> This shows the info in .debug_abbrev. What I mean is to
>> show the related info in the .debug_info section, which seems more useful for
>> understanding the relationships between different tags. Maybe this is
>> because I do not fully understand what <1>/<2> means in <1><49>,
>> <2><53>, etc.
>
> I think that dump actually shows .debug_info, with the abbrevs
> expanded...
>
> Anyway, it seems to us that the root of this problem is the fact that the
> kernel sparse annotations, such as address_space(__user), are:
>
> 1) To be processed by an external kernel-specific tool
>    (https://sparse.docs.kernel.org/en/latest/annotations.html) and not a
>    C compiler, and therefore,
>
> 2) Not quite the same as compiler attributes (despite the way they
>    look). In particular, they seem to assume an ordering different from
>    that of GNU attributes: in some cases, given the same written order,
>    they refer to different things! Which is quite unfortunate :(

Yes, currently __user/__kernel macros (implemented with address_space
attribute) are processed by macros.

>
> Now, if I understood properly, you plan to change the definition of
> __user and __kernel in the kernel sources in order to generate the tag
> compiler attributes, correct?

Right. The original __user definition looks like:
   # define __user          __attribute__((noderef, address_space(__user)))

The new attribute looks like:
   # define BTF_TYPE_TAG(value) __attribute__((btf_type_tag(#value)))
   # define __user    BTF_TYPE_TAG(user)

>
> Is that the reason why LLVM implements what we assume to be the sparse
> ordering, and not the correct GNU attributes ordering, for the tag
> attributes?

Note that __user attributes apply to pointees and not pointers.
Just like const int *p; the 'const' is not applied to the pointer 'p',
but to the pointee of 'p'.

What the current LLVM DWARF generation does, with
   pointer <--- btf_type_tag
is just ONE implementation. As I said earlier, I am okay with having a
DWARF implementation like
   p->btf_type_tag->const->int.
If you can propose an implementation like this in DWARF, I can propose
to change the implementation in LLVM.

>
> If that is so, we have quite a problem here: I don't think we can change
> the way GCC handles GNU-like attributes just because the kernel sources
> want to hook on these __user/__kernel sparse annotations to generate the
> compiler tags, even if we could mayhaps get GCC to handle
> debug_annotate_type and debug_annotate_decl differently. Some would say
> doing so would perpetuate the mistake instead of fixing it...
>
> Is my understanding correct?

Let us just say that the btf_type_tag attribute applies to pointees.
Does this help?

>
>>>>
>>>> Maybe you can also show what dwarf debug_info looks like
>>> I am not sure what you mean. This is the .debug_info section as output
>>> by readelf -w. I did trim some information not relevant to the discussion
>>> such as the DW_TAG_compile_unit DIE, for brevity.
>>>
>>>>
>>>>>
>>>>> In the case of BTF, the annotations are recorded in two type kinds recently
>>>>> added to the BTF specification: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.
>>>>> The above example declaration produces the following BTF information:
>>>>>
>>>>>   [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
>>>>>   [2] PTR '(anon)' type_id=3
>>>>>   [3] TYPE_TAG 'typetag1' type_id=1
>>>>>   [4] DECL_TAG 'decltag1' type_id=6 component_idx=-1
>>>>>   [5] DECL_TAG 'decltag2' type_id=6 component_idx=-1
>>>>>   [6] VAR 'x' type_id=2, linkage=global
>>>>>   [7] DATASEC '.bss' size=0 vlen=1
>>>>>           type_id=6 offset=0 size=8 (VAR 'x')
>>>>>
>>>>>
>>>> [...]
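The `const int *p` analogy used in the reply above can be checked with plain C. The sketch below illustrates only the `const` half of the argument: a qualifier written before the `*` describes the pointee, while one written after the `*` describes the pointer itself. The function names are hypothetical, and nothing here depends on the tag attributes under discussion.

```c
/* In "const int *p" the qualifier belongs to the pointee: p can be
   pointed somewhere else, but the int cannot be written through p.  */
int reseat_pointer_to_const(void)
{
    int a = 1, b = 2;
    const int *p = &a;  /* pointee is const-qualified */
    p = &b;             /* accepted: p itself is not const */
    /* "*p = 3;" would be rejected: the pointee is read-only through p */
    return *p;
}

/* Writing const after the '*' qualifies the pointer itself instead.  */
int write_through_const_pointer(void)
{
    int a = 1;
    int *const q = &a;  /* q is const, the pointee is not */
    *q = 5;             /* accepted: the pointee is writable */
    /* "q = 0;" would be rejected: q itself is read-only */
    return a;
}
```

Read the same way, `const char __user *p` would attach `__user` to the `char` pointee, which is the interpretation argued for in the reply. Whether GNU attribute syntax places the attribute there is exactly the open question in this thread.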