From patchwork Thu Sep 16 01:14:15 2021
X-Patchwork-Submitter: "Kewen.Lin"
X-Patchwork-Id: 45063
From: "Kewen.Lin"
To: GCC Patches
Cc: Bill Schmidt, David Edelsohn, Segher Boessenkool
Subject: [PATCH] rs6000: Modify the way for extra penalized cost
Date: Thu, 16 Sep 2021 09:14:15 +0800

Hi,

This patch follows the discussion here[1], where Segher pointed out
that the existing way of guarding the extra penalized cost for
strided/elementwise loads with a magic bound doesn't scale.

Computing the penalty as nunits * stmt_cost can give a much
exaggerated cost. For example, for V16QI on P8 it's 16 * 20 = 320,
which is why we needed a bound.

To make it scale, this patch doesn't use nunits * stmt_cost any more,
but it still keeps nunits since there are actually nunits scalar loads.
Instead it multiplies nunits by a cost adjusted from stmt_cost: since
the current stmt_cost already takes nunits into account to some extent,
we can stabilize the cost for big nunits and retain the cost for small
nunits. After some tries, this patch gets the adjusted cost as:

    stmt_cost / (log2(nunits) * log2(nunits))

For V16QI, log2(nunits) is 4, so the adjusted cost would be 20 / 16 = 1
(integer division) and the total penalized cost is 16, which isn't
exaggerated. For V2DI, the adjusted cost would be 2 and the total
penalized cost is 4, which is the same as before.

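As a rough standalone illustration of the arithmetic above (not part of
the patch), the old and new penalties can be compared with a small C
sketch. The helper names sketch_exact_log2, old_penalty and new_penalty
are made up for this example, the bound of 12 is the old
MAX_PENALIZED_COST_FOR_CTOR, and the V2DI stmt_cost of 2 is only
inferred from the adjusted cost quoted above.

```c
#include <assert.h>

/* Hypothetical stand-in for GCC's exact_log2: returns log2 (x) when x
   is a power of two, otherwise -1.  */
static int
sketch_exact_log2 (unsigned int x)
{
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;
  int n = 0;
  while (x >>= 1)
    n++;
  return n;
}

/* Old scheme: nunits * stmt_cost, clamped to the magic bound of 12.  */
static unsigned int
old_penalty (unsigned int nunits, unsigned int stmt_cost)
{
  const unsigned int max_penalized_cost_for_ctor = 12;
  unsigned int extra_cost = nunits * stmt_cost;
  if (extra_cost > max_penalized_cost_for_ctor)
    extra_cost = max_penalized_cost_for_ctor;
  return extra_cost;
}

/* New scheme: nunits * (stmt_cost / log2 (nunits)^2), using integer
   division as in the patch.  */
static unsigned int
new_penalty (unsigned int nunits, unsigned int stmt_cost)
{
  int nunits_log2 = sketch_exact_log2 (nunits);
  assert (nunits_log2 > 0);
  unsigned int nunits_sq = (unsigned int) (nunits_log2 * nunits_log2);
  unsigned int adjusted_cost = stmt_cost / nunits_sq;
  assert (adjusted_cost > 0);
  return nunits * adjusted_cost;
}
```

With the numbers quoted above, new_penalty (16, 20) gives 16 where
old_penalty (16, 20) is clamped to 12, and both functions give 4 for
(2, 2).
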
btw, I tried to use one single log2(nunits), but the penalized cost was
still too big to fix the degraded benchmark blender_r.

Separate SPEC2017 evaluations on Power8, Power9 and Power10 with option
sets O2-vect and Ofast-unroll showed this change is neutral (that is,
it has the same effect as before).

Bootstrapped and regress-tested on powerpc64le-linux-gnu Power9.

Is it ok for trunk?

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579121.html

BR,
Kewen
-----
gcc/ChangeLog:

	* config/rs6000/rs6000.c (rs6000_update_target_cost_per_stmt): Adjust
	the way to compute extra penalized cost.
---
 gcc/config/rs6000/rs6000.c | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

--
2.25.1

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 4ab23b0ab33..e08b94c0447 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5454,17 +5454,23 @@ rs6000_update_target_cost_per_stmt (rs6000_cost_data *data,
 	{
 	  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
 	  unsigned int nunits = vect_nunits_for_cost (vectype);
-	  unsigned int extra_cost = nunits * stmt_cost;
-	  /* As function rs6000_builtin_vectorization_cost shows, we have
-	     priced much on V16QI/V8HI vector construction as their units,
-	     if we penalize them with nunits * stmt_cost, it can result in
-	     an unreliable body cost, eg: for V16QI on Power8, stmt_cost
-	     is 20 and nunits is 16, the extra cost is 320 which looks
-	     much exaggerated. So let's use one maximum bound for the
-	     extra penalized cost for vector construction here. */
-	  const unsigned int MAX_PENALIZED_COST_FOR_CTOR = 12;
-	  if (extra_cost > MAX_PENALIZED_COST_FOR_CTOR)
-	    extra_cost = MAX_PENALIZED_COST_FOR_CTOR;
+	  /* As function rs6000_builtin_vectorization_cost shows, we
+	     have priced much on V16QI/V8HI vector construction by
+	     considering their units, if we penalize them with nunits
+	     * stmt_cost here, it can result in an unreliable body cost,
+	     eg: for V16QI on Power8, stmt_cost is 20 and nunits is 16,
+	     the penalty will be 320 which looks much exaggerated. But
+	     there are actually nunits scalar loads, so we try to adopt
+	     one reasonable penalized cost for each load rather than
+	     stmt_cost. Here, with stmt_cost dividing by log2(nunits)^2,
+	     we can still retain the necessary penalty for small nunits
+	     meanwhile stabilize the penalty for big nunits. */
+	  int nunits_log2 = exact_log2 (nunits);
+	  gcc_assert (nunits_log2 > 0);
+	  unsigned int nunits_sq = nunits_log2 * nunits_log2;
+	  unsigned int adjusted_cost = stmt_cost / nunits_sq;
+	  gcc_assert (adjusted_cost > 0);
+	  unsigned int extra_cost = nunits * adjusted_cost;
 	  data->extra_ctor_cost += extra_cost;
 	}
     }