From patchwork Thu Feb 10 19:58:32 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella Netto
 <adhemerval.zanella@linaro.org>
X-Patchwork-Id: 51029
Return-Path: <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id CFE7D385842B
	for <patchwork@sourceware.org>; Thu, 10 Feb 2022 20:06:16 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org CFE7D385842B
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org;
	s=default; t=1644523576;
	bh=rbaOaESU0xM3jstDgSDs/SYsFJU7f3pS4YnLJenRfeY=;
	h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe:
	 List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:
	 From;
	b=q/NK5VWMUlUxyTdL4POeLcsFCjXJEsG/9K4aQqImA503aji/XYM2oFTJAi+jEwzIr
	 m2cH80NjwdSA1Oxa10t8x3jF4ZXBk9oBDv2GztP00dw7PSTA8+SeujubG8zASZ2dwF
	 12p9XLJ2qXBVpYzFY0Q5B9f8PRN7m8ACfM4LPBZ4=
X-Original-To: libc-alpha@sourceware.org
Delivered-To: libc-alpha@sourceware.org
Received: from mail-oo1-xc2c.google.com (mail-oo1-xc2c.google.com
 [IPv6:2607:f8b0:4864:20::c2c])
 by sourceware.org (Postfix) with ESMTPS id 80EE93858417
 for <libc-alpha@sourceware.org>; Thu, 10 Feb 2022 19:58:52 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 80EE93858417
Received: by mail-oo1-xc2c.google.com with SMTP id
 q145-20020a4a3397000000b002e85c7234b1so7726002ooq.8
 for <libc-alpha@sourceware.org>; Thu, 10 Feb 2022 11:58:52 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to
 :references:mime-version:content-transfer-encoding;
 bh=rbaOaESU0xM3jstDgSDs/SYsFJU7f3pS4YnLJenRfeY=;
 b=BcoVwH9K5xX+SBqVTsRf+vPvmkBgWcJVzQ67fdG+tq2jN4MNGc+4Lw1rGra4lj09Zd
 T+7j94npy3lEODgSn3glyS2HJsxtpbOrmCdqqRaxohdgnobewXawMbYeVlb2Pg3SzuDa
 rxJFRd1POLIAJjFjUGF4C7M3UIQMOZMEji/ynpQy9d0VkCg1MzlLFv6bz+gitFE7rbmz
 J7mkXkRPFsTIAuIIcB2A1OCCRbLSxg2FttS9beKNBiW98PLWQN1D/0YJ0GrXwwEcjH1C
 2q8LqTvVJ3Otn5Wun1kBe7FEm95aDWKkC3pt2pJMFmU71En2mljPDeCN+yix1WYVJx4j
 Wy1g==
X-Gm-Message-State: AOAM532s0UMXDpNWSKavS+TX147b9iSLHjouTtq8/dbf7zn8TT+DS8o+
 NwijKePS1QzReGQ723Az1sxntgbptufZQA==
X-Google-Smtp-Source: 
 ABdhPJz75QWYxi1hZVAZpDPXUHl6biRmMyMjb+i5mxh550ZL7YlV70SKSYziOiwGHKZs9p8Ihnj/4w==
X-Received: by 2002:a05:6870:3845:: with SMTP id
 z5mr1280277oal.42.1644523131694;
 Thu, 10 Feb 2022 11:58:51 -0800 (PST)
Received: from birita.. ([2804:431:c7ca:733:a925:765e:3799:3d34])
 by smtp.gmail.com with ESMTPSA id bg34sm8859219oob.14.2022.02.10.11.58.50
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Thu, 10 Feb 2022 11:58:51 -0800 (PST)
To: libc-alpha@sourceware.org, Wilco Dijkstra <Wilco.Dijkstra@arm.com>,
 "H . J . Lu" <hjl.tools@gmail.com>,
 Noah Goldstein <goldstein.w.n@gmail.com>
Subject: [PATCH 06/12] ia64: Remove bzero optimization
Date: Thu, 10 Feb 2022 16:58:32 -0300
Message-Id: <20220210195838.1036012-7-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220210195838.1036012-1-adhemerval.zanella@linaro.org>
References: <20220210195838.1036012-1-adhemerval.zanella@linaro.org>
MIME-Version: 1.0
X-Spam-Status: No, score=-11.7 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT,
 RCVD_IN_DNSWL_NONE, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
X-Patchwork-Original-From: Adhemerval Zanella via Libc-alpha
 <libc-alpha@sourceware.org>
From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Reply-To: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org
Sender: "Libc-alpha"
 <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org>

The symbol is not present current POSIX specification and compiler
already generates memset call.
---
 sysdeps/ia64/bzero.S | 312 -------------------------------------------
 1 file changed, 312 deletions(-)
 delete mode 100644 sysdeps/ia64/bzero.S

diff --git a/sysdeps/ia64/bzero.S b/sysdeps/ia64/bzero.S
deleted file mode 100644
index cd01abb436..0000000000
--- a/sysdeps/ia64/bzero.S
+++ /dev/null
@@ -1,312 +0,0 @@
-/* Optimized version of the standard bzero() function.
-   This file is part of the GNU C Library.
-   Copyright (C) 2000-2022 Free Software Foundation, Inc.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <https://www.gnu.org/licenses/>.  */
-
-/* Return: dest
-
-   Inputs:
-        in0:    dest
-        in1:    count
-
-   The algorithm is fairly straightforward: set byte by byte until we
-   we get to a 16B-aligned address, then loop on 128 B chunks using an
-   early store as prefetching, then loop on 32B chucks, then clear remaining
-   words, finally clear remaining bytes.
-   Since a stf.spill f0 can store 16B in one go, we use this instruction
-   to get peak speed.  */
-
-#include <sysdep.h>
-#undef ret
-
-#define dest		in0
-#define	cnt		in1
-
-#define tmp		r31
-#define save_lc		r30
-#define ptr0		r29
-#define ptr1		r28
-#define ptr2		r27
-#define ptr3		r26
-#define ptr9 		r24
-#define	loopcnt		r23
-#define linecnt		r22
-#define bytecnt		r21
-
-// This routine uses only scratch predicate registers (p6 - p15)
-#define p_scr		p6	// default register for same-cycle branches
-#define p_unalgn	p9
-#define p_y		p11
-#define p_n		p12
-#define p_yy		p13
-#define p_nn		p14
-
-#define movi0		mov
-
-#define MIN1		15
-#define MIN1P1HALF	8
-#define LINE_SIZE	128
-#define LSIZE_SH        7			// shift amount
-#define PREF_AHEAD	8
-
-#define USE_FLP
-#if defined(USE_INT)
-#define store		st8
-#define myval		r0
-#elif defined(USE_FLP)
-#define store		stf8
-#define myval		f0
-#endif
-
-.align	64
-ENTRY(bzero)
-{ .mmi
-	.prologue
-	alloc	tmp = ar.pfs, 2, 0, 0, 0
-	lfetch.nt1 [dest]
-	.save   ar.lc, save_lc
-	movi0	save_lc = ar.lc
-} { .mmi
-	.body
-	mov	ret0 = dest		// return value
-	nop.m	0
-	cmp.eq	p_scr, p0 = cnt, r0
-;; }
-{ .mmi
-	and	ptr2 = -(MIN1+1), dest	// aligned address
-	and	tmp = MIN1, dest	// prepare to check for alignment
-	tbit.nz p_y, p_n = dest, 0	// Do we have an odd address? (M_B_U)
-} { .mib
-	mov	ptr1 = dest
-	nop.i	0
-(p_scr)	br.ret.dpnt.many rp		// return immediately if count = 0
-;; }
-{ .mib
-	cmp.ne	p_unalgn, p0 = tmp, r0
-} { .mib					// NB: # of bytes to move is 1
-	sub	bytecnt = (MIN1+1), tmp		//     higher than loopcnt
-	cmp.gt	p_scr, p0 = 16, cnt		// is it a minimalistic task?
-(p_scr)	br.cond.dptk.many .move_bytes_unaligned	// go move just a few (M_B_U)
-;; }
-{ .mmi
-(p_unalgn) add	ptr1 = (MIN1+1), ptr2		// after alignment
-(p_unalgn) add	ptr2 = MIN1P1HALF, ptr2		// after alignment
-(p_unalgn) tbit.nz.unc p_y, p_n = bytecnt, 3	// should we do a st8 ?
-;; }
-{ .mib
-(p_y)	add	cnt = -8, cnt
-(p_unalgn) tbit.nz.unc p_yy, p_nn = bytecnt, 2	// should we do a st4 ?
-} { .mib
-(p_y)	st8	[ptr2] = r0,-4
-(p_n)	add	ptr2 = 4, ptr2
-;; }
-{ .mib
-(p_yy)	add	cnt = -4, cnt
-(p_unalgn) tbit.nz.unc p_y, p_n = bytecnt, 1	// should we do a st2 ?
-} { .mib
-(p_yy)	st4	[ptr2] = r0,-2
-(p_nn)	add	ptr2 = 2, ptr2
-;; }
-{ .mmi
-	mov	tmp = LINE_SIZE+1		// for compare
-(p_y)	add	cnt = -2, cnt
-(p_unalgn) tbit.nz.unc p_yy, p_nn = bytecnt, 0	// should we do a st1 ?
-} { .mmi
-	nop.m	0
-(p_y)	st2	[ptr2] = r0,-1
-(p_n)	add	ptr2 = 1, ptr2
-;; }
-
-{ .mmi
-(p_yy)	st1	[ptr2] = r0
-	cmp.gt	p_scr, p0 = tmp, cnt		// is it a minimalistic task?
-} { .mbb
-(p_yy)	add	cnt = -1, cnt
-(p_scr)	br.cond.dpnt.many .fraction_of_line	// go move just a few
-;; }
-{ .mib
-	nop.m 	0
-	shr.u	linecnt = cnt, LSIZE_SH
-	nop.b	0
-;; }
-
-	.align 32
-.l1b:	// ------------------//  L1B: store ahead into cache lines; fill later
-{ .mmi
-	and	tmp = -(LINE_SIZE), cnt		// compute end of range
-	mov	ptr9 = ptr1			// used for prefetching
-	and	cnt = (LINE_SIZE-1), cnt	// remainder
-} { .mmi
-	mov	loopcnt = PREF_AHEAD-1		// default prefetch loop
-	cmp.gt	p_scr, p0 = PREF_AHEAD, linecnt	// check against actual value
-;; }
-{ .mmi
-(p_scr)	add	loopcnt = -1, linecnt
-	add	ptr2 = 16, ptr1	// start of stores (beyond prefetch stores)
-	add	ptr1 = tmp, ptr1	// first address beyond total range
-;; }
-{ .mmi
-	add	tmp = -1, linecnt	// next loop count
-	movi0	ar.lc = loopcnt
-;; }
-.pref_l1b:
-{ .mib
-	stf.spill [ptr9] = f0, 128	// Do stores one cache line apart
-	nop.i   0
-	br.cloop.dptk.few .pref_l1b
-;; }
-{ .mmi
-	add	ptr0 = 16, ptr2		// Two stores in parallel
-	movi0	ar.lc = tmp
-;; }
-.l1bx:
- { .mmi
-	stf.spill [ptr2] = f0, 32
-	stf.spill [ptr0] = f0, 32
- ;; }
- { .mmi
-	stf.spill [ptr2] = f0, 32
-	stf.spill [ptr0] = f0, 32
- ;; }
- { .mmi
-	stf.spill [ptr2] = f0, 32
-	stf.spill [ptr0] = f0, 64
-	cmp.lt	p_scr, p0 = ptr9, ptr1	// do we need more prefetching?
- ;; }
-{ .mmb
-	stf.spill [ptr2] = f0, 32
-(p_scr)	stf.spill [ptr9] = f0, 128
-	br.cloop.dptk.few .l1bx
-;; }
-{ .mib
-	cmp.gt  p_scr, p0 = 8, cnt	// just a few bytes left ?
-(p_scr)	br.cond.dpnt.many  .move_bytes_from_alignment
-;; }
-
-.fraction_of_line:
-{ .mib
-	add	ptr2 = 16, ptr1
-	shr.u	loopcnt = cnt, 5   	// loopcnt = cnt / 32
-;; }
-{ .mib
-	cmp.eq	p_scr, p0 = loopcnt, r0
-	add	loopcnt = -1, loopcnt
-(p_scr)	br.cond.dpnt.many .store_words
-;; }
-{ .mib
-	and	cnt = 0x1f, cnt		// compute the remaining cnt
-	movi0   ar.lc = loopcnt
-;; }
-	.align 32
-.l2:	// -----------------------------//  L2A:  store 32B in 2 cycles
-{ .mmb
-	store	[ptr1] = myval, 8
-	store	[ptr2] = myval, 8
-;; } { .mmb
-	store	[ptr1] = myval, 24
-	store	[ptr2] = myval, 24
-	br.cloop.dptk.many .l2
-;; }
-.store_words:
-{ .mib
-	cmp.gt	p_scr, p0 = 8, cnt	// just a few bytes left ?
-(p_scr)	br.cond.dpnt.many .move_bytes_from_alignment	// Branch
-;; }
-
-{ .mmi
-	store	[ptr1] = myval, 8	// store
-	cmp.le	p_y, p_n = 16, cnt	//
-	add	cnt = -8, cnt		// subtract
-;; }
-{ .mmi
-(p_y)	store	[ptr1] = myval, 8	// store
-(p_y)	cmp.le.unc p_yy, p_nn = 16, cnt
-(p_y)	add	cnt = -8, cnt		// subtract
-;; }
-{ .mmi					// store
-(p_yy)	store	[ptr1] = myval, 8
-(p_yy)	add	cnt = -8, cnt		// subtract
-;; }
-
-.move_bytes_from_alignment:
-{ .mib
-	cmp.eq	p_scr, p0 = cnt, r0
-	tbit.nz.unc p_y, p0 = cnt, 2	// should we terminate with a st4 ?
-(p_scr)	br.cond.dpnt.few .restore_and_exit
-;; }
-{ .mib
-(p_y)	st4	[ptr1] = r0,4
-	tbit.nz.unc p_yy, p0 = cnt, 1	// should we terminate with a st2 ?
-;; }
-{ .mib
-(p_yy)	st2	[ptr1] = r0,2
-	tbit.nz.unc p_y, p0 = cnt, 0	// should we terminate with a st1 ?
-;; }
-
-{ .mib
-(p_y)	st1	[ptr1] = r0
-;; }
-.restore_and_exit:
-{ .mib
-	nop.m	0
-	movi0	ar.lc = save_lc
-	br.ret.sptk.many rp
-;; }
-
-.move_bytes_unaligned:
-{ .mmi
-       .pred.rel "mutex",p_y, p_n
-       .pred.rel "mutex",p_yy, p_nn
-(p_n)	cmp.le  p_yy, p_nn = 4, cnt
-(p_y)	cmp.le  p_yy, p_nn = 5, cnt
-(p_n)	add	ptr2 = 2, ptr1
-} { .mmi
-(p_y)	add	ptr2 = 3, ptr1
-(p_y)	st1	[ptr1] = r0, 1		// fill 1 (odd-aligned) byte
-(p_y)	add	cnt = -1, cnt		// [15, 14 (or less) left]
-;; }
-{ .mmi
-(p_yy)	cmp.le.unc p_y, p0 = 8, cnt
-	add	ptr3 = ptr1, cnt	// prepare last store
-	movi0	ar.lc = save_lc
-} { .mmi
-(p_yy)	st2	[ptr1] = r0, 4		// fill 2 (aligned) bytes
-(p_yy)	st2	[ptr2] = r0, 4		// fill 2 (aligned) bytes
-(p_yy)	add	cnt = -4, cnt		// [11, 10 (o less) left]
-;; }
-{ .mmi
-(p_y)	cmp.le.unc p_yy, p0 = 8, cnt
-	add	ptr3 = -1, ptr3		// last store
-	tbit.nz p_scr, p0 = cnt, 1	// will there be a st2 at the end ?
-} { .mmi
-(p_y)	st2	[ptr1] = r0, 4		// fill 2 (aligned) bytes
-(p_y)	st2	[ptr2] = r0, 4		// fill 2 (aligned) bytes
-(p_y)	add	cnt = -4, cnt		// [7, 6 (or less) left]
-;; }
-{ .mmi
-(p_yy)	st2	[ptr1] = r0, 4		// fill 2 (aligned) bytes
-(p_yy)	st2	[ptr2] = r0, 4		// fill 2 (aligned) bytes
-					// [3, 2 (or less) left]
-	tbit.nz p_y, p0 = cnt, 0	// will there be a st1 at the end ?
-} { .mmi
-(p_yy)	add	cnt = -4, cnt
-;; }
-{ .mmb
-(p_scr)	st2	[ptr1] = r0		// fill 2 (aligned) bytes
-(p_y)	st1	[ptr3] = r0		// fill last byte (using ptr3)
-	br.ret.sptk.many rp
-;; }
-END(bzero)