From patchwork Wed Nov  1 10:43:16 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Szabolcs Nagy
X-Patchwork-Id: 24032
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Message-ID: <59F9A544.9080709@arm.com>
Date: Wed, 01 Nov 2017 10:43:16 +0000
From: Szabolcs Nagy
To: GNU C Library
CC: nd@arm.com
Subject: [PATCH] aarch64: optimize _dl_tlsdesc_dynamic fast path

This patch will go on top of the lazy tlsdesc removal patch set.

From 9f713143d817fdf60233ecbc8104d6e9d028342a Mon Sep 17 00:00:00 2001
From: Szabolcs Nagy
Date: Tue, 24 Oct 2017 17:49:14 +0100
Subject: [PATCH] aarch64: optimize _dl_tlsdesc_dynamic fast path

Remove some load/store instructions from the dynamic tlsdesc resolver
fast path.  This gives around 20% faster tls access in dlopened shared
libraries (assuming glibc ran out of static tls space).

2017-10-25  Szabolcs Nagy

	* sysdeps/aarch64/dl-tlsdesc.S (_dl_tlsdesc_dynamic): Optimize.
---
 sysdeps/aarch64/dl-tlsdesc.S | 105 +++++++++++++++++++++----------------------
 1 file changed, 51 insertions(+), 54 deletions(-)

diff --git a/sysdeps/aarch64/dl-tlsdesc.S b/sysdeps/aarch64/dl-tlsdesc.S
index 70550c7ce0..1d2008cbf2 100644
--- a/sysdeps/aarch64/dl-tlsdesc.S
+++ b/sysdeps/aarch64/dl-tlsdesc.S
@@ -142,23 +142,17 @@ _dl_tlsdesc_undefweak:
 	cfi_startproc
 	.align 2
 _dl_tlsdesc_dynamic:
-# define NSAVEXREGPAIRS 2
-	stp	x29, x30, [sp,#-(32+16*NSAVEXREGPAIRS)]!
-	cfi_adjust_cfa_offset (32+16*NSAVEXREGPAIRS)
-	cfi_rel_offset (x29, 0)
-	cfi_rel_offset (x30, 8)
-	mov	x29, sp
 	DELOUSE (0)
 
 	/* Save just enough registers to support fast path, if we fall
 	   into slow path we will save additional registers.  */
-
-	stp	x1,  x2, [sp, #32+16*0]
-	stp	x3,  x4, [sp, #32+16*1]
-	cfi_rel_offset (x1, 32)
-	cfi_rel_offset (x2, 32+8)
-	cfi_rel_offset (x3, 32+16)
-	cfi_rel_offset (x4, 32+24)
+	stp	x1,  x2, [sp, #-32]!
+	stp	x3,  x4, [sp, #16]
+	cfi_adjust_cfa_offset (32)
+	cfi_rel_offset (x1, 0)
+	cfi_rel_offset (x2, 8)
+	cfi_rel_offset (x3, 16)
+	cfi_rel_offset (x4, 24)
 
 	mrs	x4, tpidr_el0
 	ldr	PTR_REG (1), [x0,#TLSDESC_ARG]
@@ -167,23 +161,18 @@ _dl_tlsdesc_dynamic:
 	ldr	PTR_REG (2), [x0,#DTV_COUNTER]
 	cmp	PTR_REG (3), PTR_REG (2)
 	b.hi	2f
-	ldr	PTR_REG (2), [x1,#TLSDESC_MODID]
+	/* Load r2 = td->tlsinfo.ti_module and r3 = td->tlsinfo.ti_offset.  */
+	ldp	PTR_REG (2), PTR_REG (3), [x1,#TLSDESC_MODID]
 	add	PTR_REG (0), PTR_REG (0), PTR_REG (2), lsl #(PTR_LOG_SIZE + 1)
 	ldr	PTR_REG (0), [x0] /* Load val member of DTV entry.  */
 	cmp	PTR_REG (0), #TLS_DTV_UNALLOCATED
 	b.eq	2f
-	ldr	PTR_REG (1), [x1,#TLSDESC_MODOFF]
-	add	PTR_REG (0), PTR_REG (0), PTR_REG (1)
-	sub	PTR_REG (0), PTR_REG (0), PTR_REG (4)
+	sub	PTR_REG (3), PTR_REG (3), PTR_REG (4)
+	add	PTR_REG (0), PTR_REG (0), PTR_REG (3)
 1:
-	ldp	x1, x2, [sp, #32+16*0]
-	ldp	x3, x4, [sp, #32+16*1]
-
-	ldp	x29, x30, [sp], #(32+16*NSAVEXREGPAIRS)
-	cfi_adjust_cfa_offset (-32-16*NSAVEXREGPAIRS)
-	cfi_restore (x29)
-	cfi_restore (x30)
-# undef NSAVEXREGPAIRS
+	ldp	x3, x4, [sp, #16]
+	ldp	x1, x2, [sp], #32
+	cfi_adjust_cfa_offset (-32)
 	RET
 2:
 	/* This is the slow path.  We need to call __tls_get_addr() which
@@ -191,29 +180,33 @@ _dl_tlsdesc_dynamic:
 	   callee will trash.  */
 
 	/* Save the remaining registers that we must treat as caller save.  */
-# define NSAVEXREGPAIRS 7
-	stp	x5,  x6, [sp, #-16*NSAVEXREGPAIRS]!
+# define NSAVEXREGPAIRS 8
+	stp	x29, x30, [sp,#-16*NSAVEXREGPAIRS]!
 	cfi_adjust_cfa_offset (16*NSAVEXREGPAIRS)
-	stp	x7,  x8, [sp, #16*1]
-	stp	x9, x10, [sp, #16*2]
-	stp	x11, x12, [sp, #16*3]
-	stp	x13, x14, [sp, #16*4]
-	stp	x15, x16, [sp, #16*5]
-	stp	x17, x18, [sp, #16*6]
-	cfi_rel_offset (x5, 0)
-	cfi_rel_offset (x6, 8)
-	cfi_rel_offset (x7, 16)
-	cfi_rel_offset (x8, 16+8)
-	cfi_rel_offset (x9, 16*2)
-	cfi_rel_offset (x10, 16*2+8)
-	cfi_rel_offset (x11, 16*3)
-	cfi_rel_offset (x12, 16*3+8)
-	cfi_rel_offset (x13, 16*4)
-	cfi_rel_offset (x14, 16*4+8)
-	cfi_rel_offset (x15, 16*5)
-	cfi_rel_offset (x16, 16*5+8)
-	cfi_rel_offset (x17, 16*6)
-	cfi_rel_offset (x18, 16*6+8)
+	cfi_rel_offset (x29, 0)
+	cfi_rel_offset (x30, 8)
+	mov	x29, sp
+	stp	x5,  x6, [sp, #16*1]
+	stp	x7,  x8, [sp, #16*2]
+	stp	x9, x10, [sp, #16*3]
+	stp	x11, x12, [sp, #16*4]
+	stp	x13, x14, [sp, #16*5]
+	stp	x15, x16, [sp, #16*6]
+	stp	x17, x18, [sp, #16*7]
+	cfi_rel_offset (x5, 16*1)
+	cfi_rel_offset (x6, 16*1+8)
+	cfi_rel_offset (x7, 16*2)
+	cfi_rel_offset (x8, 16*2+8)
+	cfi_rel_offset (x9, 16*3)
+	cfi_rel_offset (x10, 16*3+8)
+	cfi_rel_offset (x11, 16*4)
+	cfi_rel_offset (x12, 16*4+8)
+	cfi_rel_offset (x13, 16*5)
+	cfi_rel_offset (x14, 16*5+8)
+	cfi_rel_offset (x15, 16*6)
+	cfi_rel_offset (x16, 16*6+8)
+	cfi_rel_offset (x17, 16*7)
+	cfi_rel_offset (x18, 16*7+8)
 
 	SAVE_Q_REGISTERS
 
@@ -225,14 +218,18 @@ _dl_tlsdesc_dynamic:
 
 	RESTORE_Q_REGISTERS
 
-	ldp	x7,  x8, [sp, #16*1]
-	ldp	x9, x10, [sp, #16*2]
-	ldp	x11, x12, [sp, #16*3]
-	ldp	x13, x14, [sp, #16*4]
-	ldp	x15, x16, [sp, #16*5]
-	ldp	x17, x18, [sp, #16*6]
-	ldp	x5,  x6, [sp], #16*NSAVEXREGPAIRS
+	ldp	x5,  x6, [sp, #16*1]
+	ldp	x7,  x8, [sp, #16*2]
+	ldp	x9, x10, [sp, #16*3]
+	ldp	x11, x12, [sp, #16*4]
+	ldp	x13, x14, [sp, #16*5]
+	ldp	x15, x16, [sp, #16*6]
+	ldp	x17, x18, [sp, #16*7]
+
+	ldp	x29, x30, [sp], #16*NSAVEXREGPAIRS
 	cfi_adjust_cfa_offset (-16*NSAVEXREGPAIRS)
+	cfi_restore (x29)
+	cfi_restore (x30)
 	b	1b
 	cfi_endproc
 	.size	_dl_tlsdesc_dynamic, .-_dl_tlsdesc_dynamic
-- 
2.11.0
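
For readers who do not follow AArch64 assembly, the fast path after this
change can be modelled roughly in C as below.  This is only a sketch under
stated assumptions: every model_* name and the MODEL_DTV_UNALLOCATED macro
are made up for illustration and are simplified stand-ins, not the real
glibc definitions.  It shows the two checks that fall through to the slow
path (stale DTV generation, unallocated TLS block) and the reordered
arithmetic, where the offset is made relative to the thread pointer first
and then added to the block address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for glibc's TLS_DTV_UNALLOCATED marker.  */
#define MODEL_DTV_UNALLOCATED ((void *) -1)

/* Simplified, hypothetical models of the structures the resolver reads.  */
typedef union
{
  size_t counter;                                /* dtv[0]: generation */
  struct { void *val; void *to_free; } pointer;  /* dtv[n]: module n block */
} model_dtv_t;

struct model_tls_index { size_t ti_module; size_t ti_offset; };
struct model_dynamic_arg { struct model_tls_index tlsinfo; size_t gen_count; };

/* Fast path only.  Returns 1 and stores the offset relative to the thread
   pointer in *result, or returns 0 where the assembly branches to 2f.  */
static int
model_fast_path (const struct model_dynamic_arg *td, const model_dtv_t *dtv,
                 uintptr_t tp, ptrdiff_t *result)
{
  assert (result != NULL);

  if (td->gen_count > dtv[0].counter)
    return 0;                   /* DTV is stale (b.hi 2f).  */

  /* Both fields are fetched together, mirroring the single ldp.  */
  uintptr_t module = td->tlsinfo.ti_module;
  uintptr_t offset = td->tlsinfo.ti_offset;

  void *val = dtv[module].pointer.val;
  if (val == MODEL_DTV_UNALLOCATED)
    return 0;                   /* Block not allocated yet (b.eq 2f).  */

  offset -= tp;                                     /* sub r3, r3, r4 */
  *result = (ptrdiff_t) ((uintptr_t) val + offset); /* add r0, r0, r3 */
  return 1;
}

/* Self-check with made-up values; returns 1 if the model behaves as
   described above.  */
static int
model_demo (void)
{
  static char tcb[16];          /* Pretend thread control block.  */
  static char block[64];        /* Pretend TLS block of module 1.  */
  uintptr_t tp = (uintptr_t) tcb;

  model_dtv_t dtv[2];
  dtv[0].counter = 3;
  dtv[1].pointer.val = block;
  dtv[1].pointer.to_free = NULL;

  struct model_dynamic_arg td = { { 1, 24 }, 3 };
  ptrdiff_t off = 0;

  /* Up-to-date DTV and allocated block: tp + off must reach block[24].  */
  if (!model_fast_path (&td, dtv, tp, &off))
    return 0;
  if (tp + (uintptr_t) off != (uintptr_t) &block[24])
    return 0;

  /* Newer generation than the DTV has seen: slow path.  */
  td.gen_count = 4;
  if (model_fast_path (&td, dtv, tp, &off))
    return 0;

  /* Unallocated block: slow path.  */
  td.gen_count = 3;
  dtv[1].pointer.val = MODEL_DTV_UNALLOCATED;
  if (model_fast_path (&td, dtv, tp, &off))
    return 0;

  return 1;
}
```

Calling model_demo () from a test program exercises the fast path once and
both fall-through conditions.  The model deliberately leaves out the
register saving and the __tls_get_addr call of the slow path.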