From patchwork Thu Jul 20 10:36:15 2023
X-Patchwork-Submitter: Lancelot SIX
X-Patchwork-Id: 72990
Subject: [PATCH v2] gdb/solib-rocm: limit the number of opened file descriptors
Date: Thu, 20 Jul 2023 10:36:15 +0000
Message-ID: <20230720103615.93419-1-lancelot.six@amd.com>
X-Mailer: git-send-email 2.34.1
List-Id: Gdb-patches mailing list
From: Lancelot SIX
Reply-To: Lancelot Six

Hi,

Here is a V2 for
https://sourceware.org/pipermail/gdb-patches/2023-June/200532.html.

Changes since V1:
- Addressed Pedro's comments.
- I have not changed rocm_solib_fd_cache::close to return void, as I
  really see it as a wrapper around target_fileio_close. However, I
  made sure the caller of rocm_solib_fd_cache::close checks the return
  value and issues a warning in case of error.

Best,
Lancelot.

---

ROCm programs can load a high number of compute kernels on GPU devices,
especially if lazy code-object loading has been disabled. Each code
object containing such a program is loaded once for each available
device, and GDB reports each instance as an individual shared library.
We came across situations where the number of shared libraries opened
by GDB exceeds the number of open files allowed for the process.

Increasing the open files limit works around the problem, but this
patch proposes a better approach.

Under the hood, the GPU code objects are embedded inside the host
application binary and shared library binaries. GDB currently opens
the underlying file once for each shared library it sees. That means
that the same file is re-opened every time a code object is loaded on
a GPU.

This patch proposes to open each underlying file only once. This is
done by implementing a reference counting mechanism, so that the
underlying file is opened the first time it is needed and closed when
the last BFD using it is closed.

On a program where GDB used to open about 1500 files to load all shared
libraries, this patch makes it so only 54 open file descriptors are
needed.

I have tested this patch on downstream ROCgdb's full testsuite and
upstream GDB testsuite with no regression.
---
 gdb/solib-rocm.c | 125 +++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 115 insertions(+), 10 deletions(-)

base-commit: 2de783f754c1b10b28271aa352881dc451efe341

diff --git a/gdb/solib-rocm.c b/gdb/solib-rocm.c
index 0ea3702e9e3..882920a3711 100644
--- a/gdb/solib-rocm.c
+++ b/gdb/solib-rocm.c
@@ -32,12 +32,111 @@
 #include "solist.h"
 #include "symfile.h"
 
+#include <unordered_map>
+
+namespace {
+
+/* Per inferior cache of opened file descriptors. */
+struct rocm_solib_fd_cache
+{
+  explicit rocm_solib_fd_cache (inferior *inf) : m_inferior (inf) {}
+  DISABLE_COPY_AND_ASSIGN (rocm_solib_fd_cache);
+
+  /* Return a read-only file descriptor to FILENAME and increment the
+     associated reference count.
+
+     Open the file FILENAME if it is not already opened, reuse the existing file
+     descriptor otherwise.
+
+     On error -1 is returned, and TARGET_ERRNO is set. */
+  int open (const std::string &filename, fileio_error *target_errno);
+
+  /* Decrement the reference count to FD and close FD if the reference count
+     reaches 0.
+
+     On success, return 0. On error, return -1 and set TARGET_ERRNO. */
+  int close (int fd, fileio_error *target_errno);
+
+private:
+  struct refcnt_fd
+  {
+    DISABLE_COPY_AND_ASSIGN (refcnt_fd);
+    refcnt_fd (int fd, int refcnt) : fd (fd), refcnt (refcnt) {}
+
+    int fd = -1;
+    int refcnt = 0;
+  };
+
+  inferior *m_inferior;
+  std::unordered_map<std::string, refcnt_fd> m_cache;
+};
+
+int
+rocm_solib_fd_cache::open (const std::string &filename,
+                           fileio_error *target_errno)
+{
+  auto it = m_cache.find (filename);
+  if (it == m_cache.end ())
+    {
+      /* The file is not yet opened on the target. */
+      int fd
+        = target_fileio_open (m_inferior, filename.c_str (), FILEIO_O_RDONLY,
+                              false, 0, target_errno);
+      if (fd != -1)
+        m_cache.emplace (std::piecewise_construct,
+                         std::forward_as_tuple (filename),
+                         std::forward_as_tuple (fd, 1));
+      return fd;
+    }
+  else
+    {
+      /* The file is already opened. Increment the refcnt and return the
+         already opened FD. */
+      it->second.refcnt++;
+      gdb_assert (it->second.fd != -1);
+      return it->second.fd;
+    }
+}
+
+int
+rocm_solib_fd_cache::close (int fd, fileio_error *target_errno)
+{
+  using cache_val = std::unordered_map<std::string, refcnt_fd>::value_type;
+  auto it
+    = std::find_if (m_cache.begin (), m_cache.end (),
+                    [fd](const cache_val &s) { return s.second.fd == fd; });
+
+  gdb_assert (it != m_cache.end ());
+
+  it->second.refcnt--;
+  if (it->second.refcnt == 0)
+    {
+      int ret = target_fileio_close (it->second.fd, target_errno);
+      m_cache.erase (it);
+      return ret;
+    }
+  else
+    {
+      /* Keep the FD open for the other users, return success. */
+      return 0;
+    }
+}
+
+} /* Anonymous namespace. */
+
 /* ROCm-specific inferior data. */
 
 struct solib_info
 {
+  explicit solib_info (inferior *inf)
+    : solib_list (nullptr), fd_cache (inf)
+  {};
+
   /* List of code objects loaded into the inferior. */
   so_list *solib_list;
+
+  /* Cache of opened FD in the inferior. */
+  rocm_solib_fd_cache fd_cache;
 };
 
 /* Per-inferior data key. */
@@ -70,7 +169,7 @@ get_solib_info (inferior *inf)
   solib_info *info = rocm_solib_data.get (inf);
 
   if (info == nullptr)
-    info = rocm_solib_data.emplace (inf);
+    info = rocm_solib_data.emplace (inf, inf);
 
   return info;
 }
@@ -217,7 +316,8 @@ struct rocm_code_object_stream_file final : rocm_code_object_stream
 {
   DISABLE_COPY_AND_ASSIGN (rocm_code_object_stream_file);
 
-  rocm_code_object_stream_file (int fd, ULONGEST offset, ULONGEST size);
+  rocm_code_object_stream_file (inferior *inf, int fd, ULONGEST offset,
+                                ULONGEST size);
 
   file_ptr read (void *buf, file_ptr size, file_ptr offset) override;
 
@@ -227,6 +327,9 @@ struct rocm_code_object_stream_file final : rocm_code_object_stream
 
 protected:
 
+  /* The inferior owning this code object stream. */
+  inferior *m_inf;
+
   /* The target file descriptor for this stream. */
   int m_fd;
 
@@ -239,8 +342,8 @@ struct rocm_code_object_stream_file final : rocm_code_object_stream
 };
 
 rocm_code_object_stream_file::rocm_code_object_stream_file
-  (int fd, ULONGEST offset, ULONGEST size)
-  : m_fd (fd), m_offset (offset), m_size (size)
+  (inferior *inf, int fd, ULONGEST offset, ULONGEST size)
+  : m_inf (inf), m_fd (fd), m_offset (offset), m_size (size)
 {
 }
 
@@ -305,8 +408,11 @@ rocm_code_object_stream_file::size ()
 
 rocm_code_object_stream_file::~rocm_code_object_stream_file ()
 {
+  auto info = get_solib_info (m_inf);
   fileio_error target_errno;
-  target_fileio_close (m_fd, &target_errno);
+  if (info->fd_cache.close (m_fd, &target_errno) != 0)
+    warning (_("Failed to close solib: %s"),
+             strerror (fileio_error_to_host (target_errno)));
 }
 
 /* Interface to a code object which lives in the inferior's memory. */
@@ -446,11 +552,9 @@ rocm_bfd_iovec_open (bfd *abfd, void *inferior_void)
 
   if (protocol == "file")
     {
+      auto info = get_solib_info (inferior);
       fileio_error target_errno;
-      int fd
-        = target_fileio_open (static_cast<struct inferior *> (inferior),
-                              decoded_path.c_str (), FILEIO_O_RDONLY,
-                              false, 0, &target_errno);
+      int fd = info->fd_cache.open (decoded_path, &target_errno);
 
       if (fd == -1)
         {
@@ -459,7 +563,8 @@ rocm_bfd_iovec_open (bfd *abfd, void *inferior_void)
           return nullptr;
         }
 
-      return new rocm_code_object_stream_file (fd, offset, size);
+      return new rocm_code_object_stream_file (inferior, fd, offset,
+                                               size);
     }
 
   if (protocol == "memory")
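
For readers who want to try the reference counting idea outside of GDB,
here is a minimal standalone sketch. It is not the code from the patch:
the class name fd_cache_sketch and the open_fn / close_fn callbacks are
hypothetical stand-ins for target_fileio_open and target_fileio_close,
and error reporting through fileio_error is left out. Unlike the patch,
closing an unknown descriptor returns -1 instead of asserting.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for target_fileio_open / target_fileio_close.
using open_fn = std::function<int (const std::string &)>;
using close_fn = std::function<int (int)>;

class fd_cache_sketch
{
public:
  fd_cache_sketch (open_fn do_open, close_fn do_close)
    : m_open (std::move (do_open)), m_close (std::move (do_close))
  {}

  // Return a descriptor for FILENAME, really opening it only on first
  // use.  Returns -1 if the underlying open fails.
  int open (const std::string &filename)
  {
    auto it = m_cache.find (filename);
    if (it != m_cache.end ())
      {
        // Already opened: hand out the same descriptor again.
        assert (it->second.fd != -1);
        ++it->second.refcnt;
        return it->second.fd;
      }

    int fd = m_open (filename);
    if (fd != -1)
      m_cache.emplace (filename, entry {fd, 1});
    return fd;
  }

  // Drop one reference to FD and really close it when the count
  // reaches 0.  Returns 0 on success, -1 if FD is unknown to the cache
  // or if the underlying close fails.
  int close (int fd)
  {
    for (auto it = m_cache.begin (); it != m_cache.end (); ++it)
      if (it->second.fd == fd)
        {
          if (--it->second.refcnt > 0)
            return 0;   // Other users still hold the descriptor.
          m_cache.erase (it);
          return m_close (fd);
        }
    return -1;
  }

  // Number of files currently held open by the cache.
  std::size_t size () const { return m_cache.size (); }

private:
  struct entry
  {
    int fd;
    int refcnt;
  };

  open_fn m_open;
  close_fn m_close;
  std::unordered_map<std::string, entry> m_cache;
};
```

With this sketch, opening the same file name twice calls the open
callback once and returns the same descriptor both times; the close
callback only runs on the second close. As in the patch, close looks the
entry up by descriptor with a linear scan, which seems acceptable as
long as the number of distinct files stays small.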