From patchwork Thu Apr 21 14:49:54 2016
X-Patchwork-Submitter: Stefan Liebler
X-Patchwork-Id: 11837
To: libc-alpha@sourceware.org
From: Stefan Liebler
Subject: Re: [PATCH 06/14] S390: Optimize iso-8859-1 to ibm037 iconv-module.
Date: Thu, 21 Apr 2016 16:49:54 +0200
References: <1456219278-5258-1-git-send-email-stli@linux.vnet.ibm.com>
 <1456219278-5258-7-git-send-email-stli@linux.vnet.ibm.com>
In-Reply-To: <1456219278-5258-7-git-send-email-stli@linux.vnet.ibm.com>

Here is an updated patch in which the labels in the inline assembly are
out-dented, as suggested by Florian.

On 02/23/2016 10:21 AM, Stefan Liebler wrote:
> This patch reworks the s390 specific module which used the z900
> translate one to one instruction. Now the g5 translate instruction is used,
> because it outperforms the troo instruction.
>
> ChangeLog:
>
> 	* sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c (TROO_LOOP):
> 	Rename to TR_LOOP and usage of tr instead of troo instruction.
> ---
>  sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c | 93 +++++++++++++++++-----------
>  1 file changed, 56 insertions(+), 37 deletions(-)
>
> diff --git a/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c b/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
> index c59f87f..4d79bbf 100644
> --- a/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
> +++ b/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
> @@ -1,7 +1,6 @@
>  /* Conversion between ISO 8859-1 and IBM037.
>
> -   This module uses the Z900 variant of the Translate One To One
> -   instruction.
> +   This module uses the translate instruction.
>     Copyright (C) 1997-2016 Free Software Foundation, Inc.
>
>     Author: Andreas Krebbel
> @@ -176,50 +175,70 @@ __attribute__ ((aligned (8))) =
>  #define MIN_NEEDED_FROM 1
>  #define MIN_NEEDED_TO 1
>
> -/* The Z900 variant of troo forces us to always specify a test
> -   character which ends the translation. So if we run into the
> -   situation where the translation has been interrupted due to the
> -   test character we translate the character by hand and jump back
> -   into the instruction. */
> -
> -#define TROO_LOOP(TABLE) \
> +#define TR_LOOP(TABLE) \
>  { \
> -  register const unsigned char test __asm__ ("0") = 0; \
> -  register const unsigned char *pTable __asm__ ("1") = TABLE; \
> -  register unsigned char *pOutput __asm__ ("2") = outptr; \
> -  register uint64_t length __asm__ ("3"); \
> -  const unsigned char* pInput = inptr; \
> -  uint64_t tmp; \
> -  \
> -  length = (inend - inptr < outend - outptr \
> -            ? inend - inptr : outend - outptr); \
> +  size_t length = (inend - inptr < outend - outptr \
> +                   ? inend - inptr : outend - outptr); \
>  \
> -  __asm__ volatile ("0: \n\t" \
> -                    " troo %0,%1 \n\t" \
> -                    " jz 1f \n\t" \
> -                    " jo 0b \n\t" \
> -                    " llgc %3,0(%1) \n\t" \
> -                    " la %3,0(%3,%4) \n\t" \
> -                    " mvc 0(1,%0),0(%3) \n\t" \
> -                    " aghi %1,1 \n\t" \
> -                    " aghi %0,1 \n\t" \
> -                    " aghi %2,-1 \n\t" \
> -                    " j 0b \n\t" \
> -                    "1: \n" \
> +  /* Process in 256 byte blocks. */ \
> +  if (__builtin_expect (length >= 256, 0)) \
> +    { \
> +      size_t blocks = length / 256; \
> +      __asm__ __volatile__("0: mvc 0(256,%[R_OUT]),0(%[R_IN])\n\t" \
> +                           "tr 0(256,%[R_OUT]),0(%[R_TBL])\n\t" \
> +                           "la %[R_IN],256(%[R_IN])\n\t" \
> +                           "la %[R_OUT],256(%[R_OUT])\n\t" \
> +                           "brctg %[R_LI],0b\n\t" \
> +                           : /* outputs */ [R_IN] "+a" (inptr) \
> +                             , [R_OUT] "+a" (outptr), [R_LI] "+d" (blocks) \
> +                           : /* inputs */ [R_TBL] "a" (TABLE) \
> +                           : /* clobber list */ "memory" \
> +                           ); \
> +      length = length % 256; \
> +    } \
>  \
> -                    : "+a" (pOutput), "+a" (pInput), "+d" (length), "=&a" (tmp) \
> -                    : "a" (pTable), "d" (test) \
> -                    : "cc"); \
> +  /* Process remaining 0...248 bytes in 8byte blocks. */ \
> +  if (length >= 8) \
> +    { \
> +      size_t blocks = length / 8; \
> +      for (int i = 0; i < blocks; i++) \
> +        { \
> +          outptr[0] = TABLE[inptr[0]]; \
> +          outptr[1] = TABLE[inptr[1]]; \
> +          outptr[2] = TABLE[inptr[2]]; \
> +          outptr[3] = TABLE[inptr[3]]; \
> +          outptr[4] = TABLE[inptr[4]]; \
> +          outptr[5] = TABLE[inptr[5]]; \
> +          outptr[6] = TABLE[inptr[6]]; \
> +          outptr[7] = TABLE[inptr[7]]; \
> +          inptr += 8; \
> +          outptr += 8; \
> +        } \
> +      length = length % 8; \
> +    } \
>  \
> -  inptr = pInput; \
> -  outptr = pOutput; \
> +  /* Process remaining 0...7 bytes. */ \
> +  switch (length) \
> +    { \
> +    case 7: outptr[6] = TABLE[inptr[6]]; \
> +    case 6: outptr[5] = TABLE[inptr[5]]; \
> +    case 5: outptr[4] = TABLE[inptr[4]]; \
> +    case 4: outptr[3] = TABLE[inptr[3]]; \
> +    case 3: outptr[2] = TABLE[inptr[2]]; \
> +    case 2: outptr[1] = TABLE[inptr[1]]; \
> +    case 1: outptr[0] = TABLE[inptr[0]]; \
> +    case 0: break; \
> +    } \
> +  inptr += length; \
> +  outptr += length; \
>  }
>
> +
>  /* First define the conversion function from ISO 8859-1 to CP037. */
>  #define MIN_NEEDED_INPUT MIN_NEEDED_FROM
>  #define MIN_NEEDED_OUTPUT MIN_NEEDED_TO
>  #define LOOPFCT FROM_LOOP
> -#define BODY TROO_LOOP (table_iso8859_1_to_cp037)
> +#define BODY TR_LOOP (table_iso8859_1_to_cp037)
>
>  #include
>
> @@ -228,7 +247,7 @@ __attribute__ ((aligned (8))) =
>  #define MIN_NEEDED_INPUT MIN_NEEDED_TO
>  #define MIN_NEEDED_OUTPUT MIN_NEEDED_FROM
>  #define LOOPFCT TO_LOOP
> -#define BODY TROO_LOOP (table_cp037_iso8859_1);
> +#define BODY TR_LOOP (table_cp037_iso8859_1);
>
>  #include
>
>

From d489351c09c82994adb872049fcb33bf189f86af Mon Sep 17 00:00:00 2001
From: Stefan Liebler
Date: Thu, 21 Apr 2016 12:42:49 +0200
Subject: [PATCH 06/14] S390: Optimize iso-8859-1 to ibm037 iconv-module.

This patch reworks the s390 specific module which used the z900
translate one to one instruction. Now the g5 translate instruction is used,
because it outperforms the troo instruction.

ChangeLog:

	* sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c (TROO_LOOP):
	Rename to TR_LOOP and usage of tr instead of troo instruction.
---
 sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c | 93 +++++++++++++++++-----------
 1 file changed, 56 insertions(+), 37 deletions(-)

diff --git a/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c b/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
index c59f87f..3b63e6a 100644
--- a/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
+++ b/sysdeps/s390/s390-64/iso-8859-1_cp037_z900.c
@@ -1,7 +1,6 @@
 /* Conversion between ISO 8859-1 and IBM037.

-   This module uses the Z900 variant of the Translate One To One
-   instruction.
+   This module uses the translate instruction.
    Copyright (C) 1997-2016 Free Software Foundation, Inc.

    Author: Andreas Krebbel
@@ -176,50 +175,70 @@ __attribute__ ((aligned (8))) =
 #define MIN_NEEDED_FROM 1
 #define MIN_NEEDED_TO 1

-/* The Z900 variant of troo forces us to always specify a test
-   character which ends the translation. So if we run into the
-   situation where the translation has been interrupted due to the
-   test character we translate the character by hand and jump back
-   into the instruction. */
-
-#define TROO_LOOP(TABLE) \
+#define TR_LOOP(TABLE) \
 { \
-  register const unsigned char test __asm__ ("0") = 0; \
-  register const unsigned char *pTable __asm__ ("1") = TABLE; \
-  register unsigned char *pOutput __asm__ ("2") = outptr; \
-  register uint64_t length __asm__ ("3"); \
-  const unsigned char* pInput = inptr; \
-  uint64_t tmp; \
-  \
-  length = (inend - inptr < outend - outptr \
-            ? inend - inptr : outend - outptr); \
+  size_t length = (inend - inptr < outend - outptr \
+                   ? inend - inptr : outend - outptr); \
 \
-  __asm__ volatile ("0: \n\t" \
-                    " troo %0,%1 \n\t" \
-                    " jz 1f \n\t" \
-                    " jo 0b \n\t" \
-                    " llgc %3,0(%1) \n\t" \
-                    " la %3,0(%3,%4) \n\t" \
-                    " mvc 0(1,%0),0(%3) \n\t" \
-                    " aghi %1,1 \n\t" \
-                    " aghi %0,1 \n\t" \
-                    " aghi %2,-1 \n\t" \
-                    " j 0b \n\t" \
-                    "1: \n" \
+  /* Process in 256 byte blocks. */ \
+  if (__builtin_expect (length >= 256, 0)) \
+    { \
+      size_t blocks = length / 256; \
+      __asm__ __volatile__("0: mvc 0(256,%[R_OUT]),0(%[R_IN])\n\t" \
+                           " tr 0(256,%[R_OUT]),0(%[R_TBL])\n\t" \
+                           " la %[R_IN],256(%[R_IN])\n\t" \
+                           " la %[R_OUT],256(%[R_OUT])\n\t" \
+                           " brctg %[R_LI],0b\n\t" \
+                           : /* outputs */ [R_IN] "+a" (inptr) \
+                             , [R_OUT] "+a" (outptr), [R_LI] "+d" (blocks) \
+                           : /* inputs */ [R_TBL] "a" (TABLE) \
+                           : /* clobber list */ "memory" \
+                           ); \
+      length = length % 256; \
+    } \
 \
-                    : "+a" (pOutput), "+a" (pInput), "+d" (length), "=&a" (tmp) \
-                    : "a" (pTable), "d" (test) \
-                    : "cc"); \
+  /* Process remaining 0...248 bytes in 8byte blocks. */ \
+  if (length >= 8) \
+    { \
+      size_t blocks = length / 8; \
+      for (int i = 0; i < blocks; i++) \
+        { \
+          outptr[0] = TABLE[inptr[0]]; \
+          outptr[1] = TABLE[inptr[1]]; \
+          outptr[2] = TABLE[inptr[2]]; \
+          outptr[3] = TABLE[inptr[3]]; \
+          outptr[4] = TABLE[inptr[4]]; \
+          outptr[5] = TABLE[inptr[5]]; \
+          outptr[6] = TABLE[inptr[6]]; \
+          outptr[7] = TABLE[inptr[7]]; \
+          inptr += 8; \
+          outptr += 8; \
+        } \
+      length = length % 8; \
+    } \
 \
-  inptr = pInput; \
-  outptr = pOutput; \
+  /* Process remaining 0...7 bytes. */ \
+  switch (length) \
+    { \
+    case 7: outptr[6] = TABLE[inptr[6]]; \
+    case 6: outptr[5] = TABLE[inptr[5]]; \
+    case 5: outptr[4] = TABLE[inptr[4]]; \
+    case 4: outptr[3] = TABLE[inptr[3]]; \
+    case 3: outptr[2] = TABLE[inptr[2]]; \
+    case 2: outptr[1] = TABLE[inptr[1]]; \
+    case 1: outptr[0] = TABLE[inptr[0]]; \
+    case 0: break; \
+    } \
+  inptr += length; \
+  outptr += length; \
 }

+
 /* First define the conversion function from ISO 8859-1 to CP037. */
 #define MIN_NEEDED_INPUT MIN_NEEDED_FROM
 #define MIN_NEEDED_OUTPUT MIN_NEEDED_TO
 #define LOOPFCT FROM_LOOP
-#define BODY TROO_LOOP (table_iso8859_1_to_cp037)
+#define BODY TR_LOOP (table_iso8859_1_to_cp037)

 #include

@@ -228,7 +247,7 @@ __attribute__ ((aligned (8))) =
 #define MIN_NEEDED_INPUT MIN_NEEDED_TO
 #define MIN_NEEDED_OUTPUT MIN_NEEDED_FROM
 #define LOOPFCT TO_LOOP
-#define BODY TROO_LOOP (table_cp037_iso8859_1);
+#define BODY TR_LOOP (table_cp037_iso8859_1);

 #include

-- 
2.5.5
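
For readers without an s390 machine at hand, the three-stage structure of
TR_LOOP can be sketched in portable C. This is only an illustration, not part
of the patch: translate_block_256 and translate_buffer are hypothetical names,
the mvc/tr pair is modelled as a copy followed by an in-place table lookup, and
the 0...7 byte tail uses a plain loop instead of the fall-through switch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Portable stand-in for the "mvc" + "tr" pair in the patch: copy one
   256-byte block to the output, then translate it in place through the
   table.  IN and OUT must not overlap.  */
static void
translate_block_256 (unsigned char *out, const unsigned char *in,
                     const unsigned char *table)
{
  memcpy (out, in, 256);
  for (size_t i = 0; i < 256; i++)
    out[i] = table[out[i]];
}

/* Hypothetical helper (not in the patch): translate LEN bytes from IN
   to OUT through a 256-entry TABLE, using the same staging as TR_LOOP.
   Returns the number of bytes written, which is always LEN.  */
static size_t
translate_buffer (unsigned char *out, const unsigned char *in, size_t len,
                  const unsigned char *table)
{
  size_t remaining = len;

  assert (table != NULL);

  /* Stage 1: whole 256-byte blocks (the inline assembly in the patch).  */
  for (size_t blocks = remaining / 256; blocks > 0; blocks--)
    {
      translate_block_256 (out, in, table);
      in += 256;
      out += 256;
    }
  remaining %= 256;

  /* Stage 2: 8-byte blocks; the patch unrolls this inner loop by hand.  */
  for (size_t blocks = remaining / 8; blocks > 0; blocks--)
    {
      for (size_t i = 0; i < 8; i++)
        out[i] = table[in[i]];
      in += 8;
      out += 8;
    }
  remaining %= 8;

  /* Stage 3: the last 0...7 bytes (a fall-through switch in the patch).  */
  for (size_t i = 0; i < remaining; i++)
    out[i] = table[in[i]];

  return len;
}
```

A length such as 531 (2 * 256 + 2 * 8 + 3) exercises all three stages. In the
module itself the length is the minimum of the remaining input and output
space, so the caller in this sketch is assumed to have done that clamping
already.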