From patchwork Fri Jun 21 12:52:39 2024
X-Patchwork-Submitter: Jan Beulich
X-Patchwork-Id: 92648
Message-ID: <7c01753d-2f2a-4642-bcbe-1cbcbe00cbc1@suse.com>
Date: Fri, 21 Jun 2024 14:52:39 +0200
Subject: [PATCH v2 6/8] x86/APX: optimize {nf}-form IMUL-by-power-of-2 to SHL
From: Jan Beulich
To: Binutils
Cc: "H.J. Lu", Lili Cui, "Jiang, Haochen"
References: <72726722-9d28-4d82-84ef-320e1786b0e4@suse.com>
In-Reply-To: <72726722-9d28-4d82-84ef-320e1786b0e4@suse.com>

..., for differing only in the resulting EFLAGS, which are left untouched
anyway. That's a shorter encoding, available as long as certain
constraints on operands are met; see code comments. (SHL-by-1 forms may
then be subject to further optimization that was introduced earlier.)

Note that kind of as a side effect this also converts multiplication by
1 to shift by 0, which is a plain move or even no-op anyway. That could
be further shrunk (as could be presence of shifts/rotates by 0 in the
original code as well as a fair set of other {nf}-form insns), yet the
expectation (for now) is that people won't write such code in the first
place.
---
RFC: Comparing i.op[2].regs against i.op[1].regs without first checking
     that operand 1 isn't a memory operand is at least UB-ish, for memory
     operands setting i.op[].disps instead (if anything). Do we deem this
     tolerable?
---
v2: New.

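For readers who want to check which immediates qualify, here is a minimal
standalone sketch of the immediate test and of the shift-count derivation.
It is not code from gas: the helper name imul_imm_to_shl_count and its
op_bits parameter are made up for illustration, and a plain loop stands in
for ffs(). It deliberately ignores the operand constraints (16-bit NDD
forms, -Os size considerations) which the patch checks separately.

```c
#include <stdint.h>

/* Hypothetical helper, not part of gas: return the SHL count which would
   replace the IMUL immediate IMM, or -1 if IMM does not qualify.
   OP_BITS is the operand size (16, 32, or 64).  */
static int
imul_imm_to_shl_count (int64_t imm, int op_bits)
{
  uint64_t v;
  int count = 0;

  if (imm > 0)
    {
      /* Positive values qualify only with exactly one bit set.  */
      if (imm & (imm - 1))
        return -1;
      v = (uint64_t) imm;
    }
  else if ((op_bits == 16 && imm == -0x8000)
           || (op_bits == 32 && imm == -(int64_t) 0x80000000))
    /* Sign-extended spelling of the top bit of a 16- or 32-bit operand.  */
    v = (uint64_t) 1 << (op_bits - 1);
  else
    return -1;

  /* Index of the single set bit, i.e. the equivalent of ffs (v) - 1.  */
  while (!(v & 1))
    {
      v >>= 1;
      count++;
    }
  return count;
}
```

With this sketch an immediate of 1 yields a count of 0 (the "shift by 0"
side effect mentioned above), -0x8000 yields 15 for a 16-bit operand, and
-0x80000000 is rejected for a 64-bit operand, matching the "Needs leaving
alone" case in the testsuite.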
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -5458,6 +5458,75 @@ optimize_nf_encoding (void)
       i.tm.operand_types[0].bitfield.imm1 = 1;
       i.imm_operands = 0;
     }
+  else if ((i.tm.base_opcode | 2) == 0x6b
+	   && i.op[0].imms->X_op == O_constant
+	   && (i.op[0].imms->X_add_number > 0
+	       ? !(i.op[0].imms->X_add_number & (i.op[0].imms->X_add_number - 1))
+	       /* optimize_imm() converts to sign-extended representation where
+		  possible (and input can also come with these specific numbers).  */
+	       : (i.types[i.operands - 1].bitfield.word
+		  && i.op[0].imms->X_add_number == -0x8000)
+		 || (i.types[i.operands - 1].bitfield.dword
+		     && i.op[0].imms->X_add_number + 1 == -0x7fffffff))
+	   /* 16-bit 3-operand non-ZU forms need leaving alone, to prevent
+	      zero-extension of the result.  Unless, of course, both non-
+	      immediate operands match (which can be converted to the non-NDD
+	      form).  */
+	   && (i.operands < 3
+	       || !i.types[2].bitfield.word
+	       || i.tm.mnem_off == MN_imulzu
+	       || i.op[2].regs == i.op[1].regs)
+	   /* When merely optimizing for size, exclude cases where we'd convert
+	      from Imm8S to Imm8 encoding, thus not actually reducing size.  */
+	   && (!optimize_for_space
+	       || i.tm.base_opcode == 0x69
+	       || !(i.op[0].imms->X_add_number & 0x7d)))
+    {
+      /* Optimize: -O:
+	   {nf} imul   $1<<N, ...   -> {nf} shl $N, ...
+	   {nf} imulzu $1<<N, ...   -> {nf} shl $N, ...
+       */
+      if (i.op[0].imms->X_add_number != 2)
+	{
+	  i.tm.base_opcode = 0xc0;
+	  i.op[0].imms->X_add_number = ffs (i.op[0].imms->X_add_number) - 1;
+	  i.tm.operand_types[0].bitfield.imm8 = 1;
+	  i.tm.operand_types[0].bitfield.imm16 = 0;
+	  i.tm.operand_types[0].bitfield.imm32 = 0;
+	  i.tm.operand_types[0].bitfield.imm32s = 0;
+	}
+      else
+	{
+	  i.tm.base_opcode = 0xd0;
+	  i.tm.operand_types[0].bitfield.imm1 = 1;
+	}
+      i.types[0] = i.tm.operand_types[0];
+      i.tm.extension_opcode = 4;
+      i.tm.opcode_modifier.w = 1;
+      i.tm.opcode_modifier.operandconstraint = 0;
+      if (i.operands == 3)
+	{
+	  if (i.op[2].regs == i.op[1].regs && i.tm.mnem_off != MN_imulzu)
+	    {
+	      /* Convert to non-NDD form.  This is required for 16-bit insns
+		 (to prevent zero-extension) and benign for others.  */
+	      i.operands = 2;
+	      i.reg_operands = 1;
+	    }
+	  else
+	    i.tm.opcode_modifier.vexvvvv = VexVVVV_DST;
+	}
+      else if (i.tm.mnem_off == MN_imulzu)
+	{
+	  /* Convert to NDD form, to effect zero-extension of the result.  */
+	  i.tm.opcode_modifier.vexvvvv = VexVVVV_DST;
+	  i.operands = 3;
+	  i.reg_operands = 2;
+	  i.op[2].regs = i.op[1].regs;
+	  i.tm.operand_types[2] = i.tm.operand_types[1];
+	  i.types[2] = i.types[1];
+	}
+    }
 
   if (optimize_for_space
       && i.encoding != encoding_evex
@@ -5604,6 +5673,7 @@ optimize_nf_encoding (void)
   else if (i.tm.base_opcode == 0x6b
 	   && !i.mem_operands
 	   && i.encoding != encoding_evex
+	   && i.tm.mnem_off != MN_imulzu
 	   && is_plausible_suffix (1)
 	   /* %rsp can't be the index.  */
 	   && is_index (i.op[1].regs)
--- a/gas/testsuite/gas/i386/x86-64-apx-nf.s
+++ b/gas/testsuite/gas/i386/x86-64-apx-nf.s
@@ -1472,4 +1472,40 @@ optimize:
 	{nf} imul $5, %r21w, %dx
 	{nf} imul $9, %r21w
 	.endif
+
+	# Note: 2-6 want leaving alone with -Os.
+	.irp n, 1, 2, 6, 7
+	# Note: 16-bit 3-operand src!=dst non-ZU form needs leaving alone.
+	{nf} imul $1<<\n, %\r\()dx, %\r\()cx
+	{nf} imul $1<<\n, (%rdx), %\r\()cx
+	{nf} imul $1<<\n, %\r\()cx, %\r\()cx
+	{nf} imul $1<<\n, %\r\()cx
+
+	.ifeqs "\r",""
+	{nf} imulzu $1<<\n, %dx, %cx
+	{nf} imulzu $1<<\n, (%rdx), %cx
+	{nf} imulzu $1<<\n, %cx, %cx
+	{nf} imulzu $1<<\n, %cx
+	.endif
+	.endr
+
+	.ifeqs "\r",""
+	# Note: 3-operand src!=dst non-ZU form needs leaving alone.
+	{nf} imul $1<<15, %dx, %cx
+	{nf} imul $-1<<15, (%rdx), %cx
+	{nf} imul $1<<15, %cx, %cx
+	{nf} imul $-1<<15, %cx
+	{nf} imulzu $1<<15, %cx
+	.endif
+
+	.ifeqs "\r","e"
+	{nf} imul $1<<31, %edx, %ecx
+	{nf} imul $-1<<31, (%rdx), %ecx
+	.endif
+
+	.ifeqs "\r","r"
+	{nf} imul $1<<30, %rdx, %rcx
+	# Needs leaving alone.
+	{nf} imul $-1<<31, %rdx, %rcx
+	.endif
 	.endr
--- a/gas/testsuite/gas/i386/x86-64-apx-nf-optimize.d
+++ b/gas/testsuite/gas/i386/x86-64-apx-nf-optimize.d
@@ -1522,14 +1522,87 @@ Disassembly of section \.text:
 [ ]*[a-f0-9]+:[ ]*66 d5 40 8d 44 6d 00[ ]+lea 0x0\(%rbp,%rbp,2\),%r16w
 [ ]*[a-f0-9]+:[ ]*66 d5 30 8d 54 ad 00[ ]+lea 0x0\(%r21,%r21,4\),%dx
 [ ]*[a-f0-9]+:[ ]*66 d5 70 8d 6c ed 00[ ]+lea 0x0\(%r21,%r21,8\),%r21w
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 6b ca 02 \{nf\} imul \$0x2,%dx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 6b 0a 02 \{nf\} imul \$0x2,\(%rdx\),%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 01 c9 \{nf\} add %cx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 01 c9 \{nf\} add %cx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c 01 d2 \{nf\} add %dx,%dx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c d1 22 \{nf\} shl \$1,\(%rdx\),%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c 01 c9 \{nf\} add %cx,%cx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c 01 c9 \{nf\} add %cx,%cx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 6b ca 04 \{nf\} imul \$0x4,%dx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 6b 0a 04 \{nf\} imul \$0x4,\(%rdx\),%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c c1 e1 02 \{nf\} shl \$0x2,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c c1 e1 02 \{nf\} shl \$0x2,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c c1 e2 02 \{nf\} shl \$0x2,%dx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c c1 22 02 \{nf\} shl \$0x2,\(%rdx\),%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c c1 e1 02 \{nf\} shl \$0x2,%cx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c c1 e1 02 \{nf\} shl \$0x2,%cx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 6b ca 40 \{nf\} imul \$0x40,%dx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 6b 0a 40 \{nf\} imul \$0x40,\(%rdx\),%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c c1 e1 06 \{nf\} shl \$0x6,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c c1 e1 06 \{nf\} shl \$0x6,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c c1 e2 06 \{nf\} shl \$0x6,%dx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c c1 22 06 \{nf\} shl \$0x6,\(%rdx\),%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c c1 e1 06 \{nf\} shl \$0x6,%cx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c c1 e1 06 \{nf\} shl \$0x6,%cx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 69 ca 80 00 \{nf\} imul \$0x80,%dx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 69 0a 80 00 \{nf\} imul \$0x80,\(%rdx\),%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c c1 e1 07 \{nf\} shl \$0x7,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c c1 e1 07 \{nf\} shl \$0x7,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c c1 e2 07 \{nf\} shl \$0x7,%dx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c c1 22 07 \{nf\} shl \$0x7,\(%rdx\),%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c c1 e1 07 \{nf\} shl \$0x7,%cx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c c1 e1 07 \{nf\} shl \$0x7,%cx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 69 ca 00 80 \{nf\} imul \$0x8000,%dx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 69 0a 00 80 \{nf\} imul \$0x8000,\(%rdx\),%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c c1 e1 0f \{nf\} shl \$0xf,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c c1 e1 0f \{nf\} shl \$0xf,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c c1 e1 0f \{nf\} shl \$0xf,%cx,%cx
 [ ]*[a-f0-9]+:[ ]*8d 14 49[ ]+lea \(%rcx,%rcx,2\),%edx
 [ ]*[a-f0-9]+:[ ]*8d 54 ad 00[ ]+lea 0x0\(%rbp,%rbp,4\),%edx
 [ ]*[a-f0-9]+:[ ]*8d 2c c9[ ]+lea \(%rcx,%rcx,8\),%ebp
 [ ]*[a-f0-9]+:[ ]*62 f4 7c 0c 6b d4 03[ ]+\{nf\} imul \$0x3,%esp,%edx
 [ ]*[a-f0-9]+:[ ]*62 f4 7c 0c 6b e4 05[ ]+\{nf\} imul \$0x5,%esp,%esp
+[ ]*[a-f0-9]+:[ ]*62 f4 74 1c 01 d2 \{nf\} add %edx,%edx,%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 74 1c d1 22 \{nf\} shl \$1,\(%rdx\),%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 7c 0c 01 c9 \{nf\} add %ecx,%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 7c 0c 01 c9 \{nf\} add %ecx,%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 74 1c c1 e2 02 \{nf\} shl \$0x2,%edx,%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 74 1c c1 22 02 \{nf\} shl \$0x2,\(%rdx\),%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 7c 0c c1 e1 02 \{nf\} shl \$0x2,%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 7c 0c c1 e1 02 \{nf\} shl \$0x2,%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 74 1c c1 e2 06 \{nf\} shl \$0x6,%edx,%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 74 1c c1 22 06 \{nf\} shl \$0x6,\(%rdx\),%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 7c 0c c1 e1 06 \{nf\} shl \$0x6,%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 7c 0c c1 e1 06 \{nf\} shl \$0x6,%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 74 1c c1 e2 07 \{nf\} shl \$0x7,%edx,%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 74 1c c1 22 07 \{nf\} shl \$0x7,\(%rdx\),%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 7c 0c c1 e1 07 \{nf\} shl \$0x7,%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 7c 0c c1 e1 07 \{nf\} shl \$0x7,%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 74 1c c1 e2 1f \{nf\} shl \$0x1f,%edx,%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 74 1c c1 22 1f \{nf\} shl \$0x1f,\(%rdx\),%ecx
 [ ]*[a-f0-9]+:[ ]*48 8d 14 49[ ]+lea \(%rcx,%rcx,2\),%rdx
 [ ]*[a-f0-9]+:[ ]*48 8d 54 ad 00[ ]+lea 0x0\(%rbp,%rbp,4\),%rdx
 [ ]*[a-f0-9]+:[ ]*48 8d 2c c9[ ]+lea \(%rcx,%rcx,8\),%rbp
 [ ]*[a-f0-9]+:[ ]*62 f4 fc 0c 6b d4 03[ ]+\{nf\} imul \$0x3,%rsp,%rdx
 [ ]*[a-f0-9]+:[ ]*62 f4 fc 0c 6b e4 05[ ]+\{nf\} imul \$0x5,%rsp,%rsp
+[ ]*[a-f0-9]+:[ ]*62 f4 f4 1c 01 d2 \{nf\} add %rdx,%rdx,%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 f4 1c d1 22 \{nf\} shl \$1,\(%rdx\),%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 fc 0c 01 c9 \{nf\} add %rcx,%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 fc 0c 01 c9 \{nf\} add %rcx,%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 f4 1c c1 e2 02 \{nf\} shl \$0x2,%rdx,%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 f4 1c c1 22 02 \{nf\} shl \$0x2,\(%rdx\),%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 fc 0c c1 e1 02 \{nf\} shl \$0x2,%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 fc 0c c1 e1 02 \{nf\} shl \$0x2,%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 f4 1c c1 e2 06 \{nf\} shl \$0x6,%rdx,%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 f4 1c c1 22 06 \{nf\} shl \$0x6,\(%rdx\),%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 fc 0c c1 e1 06 \{nf\} shl \$0x6,%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 fc 0c c1 e1 06 \{nf\} shl \$0x6,%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 f4 1c c1 e2 07 \{nf\} shl \$0x7,%rdx,%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 f4 1c c1 22 07 \{nf\} shl \$0x7,\(%rdx\),%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 fc 0c c1 e1 07 \{nf\} shl \$0x7,%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 fc 0c c1 e1 07 \{nf\} shl \$0x7,%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 f4 1c c1 e2 1e \{nf\} shl \$0x1e,%rdx,%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 fc 0c 69 ca 00 00 00 80 \{nf\} imul \$0xffffffff80000000,%rdx,%rcx
 #pass
--- a/gas/testsuite/gas/i386/x86-64-apx-nf-optimize-size.d
+++ b/gas/testsuite/gas/i386/x86-64-apx-nf-optimize-size.d
@@ -1522,14 +1522,87 @@ Disassembly of section \.text:
 [ ]*[a-f0-9]+:[ ]*62 e4 7d 0c 6b c5 03[ ]+\{nf\} imul \$0x3,%bp,%r16w
 [ ]*[a-f0-9]+:[ ]*62 fc 7d 0c 6b d5 05[ ]+\{nf\} imul \$0x5,%r21w,%dx
 [ ]*[a-f0-9]+:[ ]*62 ec 7d 0c 6b ed 09[ ]+\{nf\} imul \$0x9,%r21w,%r21w
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 6b ca 02 \{nf\} imul \$0x2,%dx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 6b 0a 02 \{nf\} imul \$0x2,\(%rdx\),%cx
+[ ]*[a-f0-9]+:[ ]*66 8d 0c 09[ ]+lea \(%rcx,%rcx,1\),%cx
+[ ]*[a-f0-9]+:[ ]*66 8d 0c 09[ ]+lea \(%rcx,%rcx,1\),%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c d1 e2 \{nf\} shl \$1,%dx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c d1 22 \{nf\} shl \$1,\(%rdx\),%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c d1 e1 \{nf\} shl \$1,%cx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c d1 e1 \{nf\} shl \$1,%cx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 6b ca 04 \{nf\} imul \$0x4,%dx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 6b 0a 04 \{nf\} imul \$0x4,\(%rdx\),%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 6b c9 04 \{nf\} imul \$0x4,%cx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 6b c9 04 \{nf\} imul \$0x4,%cx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 1c 6b ca 04 \{nf\} imulzu \$0x4,%dx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 1c 6b 0a 04 \{nf\} imulzu \$0x4,\(%rdx\),%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 1c 6b c9 04 \{nf\} imulzu \$0x4,%cx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 1c 6b c9 04 \{nf\} imulzu \$0x4,%cx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 6b ca 40 \{nf\} imul \$0x40,%dx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 6b 0a 40 \{nf\} imul \$0x40,\(%rdx\),%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 6b c9 40 \{nf\} imul \$0x40,%cx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 6b c9 40 \{nf\} imul \$0x40,%cx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 1c 6b ca 40 \{nf\} imulzu \$0x40,%dx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 1c 6b 0a 40 \{nf\} imulzu \$0x40,\(%rdx\),%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 1c 6b c9 40 \{nf\} imulzu \$0x40,%cx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 1c 6b c9 40 \{nf\} imulzu \$0x40,%cx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 69 ca 80 00 \{nf\} imul \$0x80,%dx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 69 0a 80 00 \{nf\} imul \$0x80,\(%rdx\),%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c c1 e1 07 \{nf\} shl \$0x7,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c c1 e1 07 \{nf\} shl \$0x7,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c c1 e2 07 \{nf\} shl \$0x7,%dx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c c1 22 07 \{nf\} shl \$0x7,\(%rdx\),%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c c1 e1 07 \{nf\} shl \$0x7,%cx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c c1 e1 07 \{nf\} shl \$0x7,%cx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 69 ca 00 80 \{nf\} imul \$0x8000,%dx,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c 69 0a 00 80 \{nf\} imul \$0x8000,\(%rdx\),%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c c1 e1 0f \{nf\} shl \$0xf,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 7d 0c c1 e1 0f \{nf\} shl \$0xf,%cx
+[ ]*[a-f0-9]+:[ ]*62 f4 75 1c c1 e1 0f \{nf\} shl \$0xf,%cx,%cx
 [ ]*[a-f0-9]+:[ ]*8d 14 49[ ]+lea \(%rcx,%rcx,2\),%edx
 [ ]*[a-f0-9]+:[ ]*8d 54 ad 00[ ]+lea 0x0\(%rbp,%rbp,4\),%edx
 [ ]*[a-f0-9]+:[ ]*8d 2c c9[ ]+lea \(%rcx,%rcx,8\),%ebp
 [ ]*[a-f0-9]+:[ ]*62 f4 7c 0c 6b d4 03[ ]+\{nf\} imul \$0x3,%esp,%edx
 [ ]*[a-f0-9]+:[ ]*62 f4 7c 0c 6b e4 05[ ]+\{nf\} imul \$0x5,%esp,%esp
+[ ]*[a-f0-9]+:[ ]*8d 0c 12[ ]+lea \(%rdx,%rdx,1\),%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 74 1c d1 22 \{nf\} shl \$1,\(%rdx\),%ecx
+[ ]*[a-f0-9]+:[ ]*8d 0c 09[ ]+lea \(%rcx,%rcx,1\),%ecx
+[ ]*[a-f0-9]+:[ ]*8d 0c 09[ ]+lea \(%rcx,%rcx,1\),%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 7c 0c 6b ca 04 \{nf\} imul \$0x4,%edx,%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 7c 0c 6b 0a 04 \{nf\} imul \$0x4,\(%rdx\),%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 7c 0c 6b c9 04 \{nf\} imul \$0x4,%ecx,%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 7c 0c 6b c9 04 \{nf\} imul \$0x4,%ecx,%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 7c 0c 6b ca 40 \{nf\} imul \$0x40,%edx,%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 7c 0c 6b 0a 40 \{nf\} imul \$0x40,\(%rdx\),%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 7c 0c 6b c9 40 \{nf\} imul \$0x40,%ecx,%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 7c 0c 6b c9 40 \{nf\} imul \$0x40,%ecx,%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 74 1c c1 e2 07 \{nf\} shl \$0x7,%edx,%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 74 1c c1 22 07 \{nf\} shl \$0x7,\(%rdx\),%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 7c 0c c1 e1 07 \{nf\} shl \$0x7,%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 7c 0c c1 e1 07 \{nf\} shl \$0x7,%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 74 1c c1 e2 1f \{nf\} shl \$0x1f,%edx,%ecx
+[ ]*[a-f0-9]+:[ ]*62 f4 74 1c c1 22 1f \{nf\} shl \$0x1f,\(%rdx\),%ecx
 [ ]*[a-f0-9]+:[ ]*48 8d 14 49[ ]+lea \(%rcx,%rcx,2\),%rdx
 [ ]*[a-f0-9]+:[ ]*48 8d 54 ad 00[ ]+lea 0x0\(%rbp,%rbp,4\),%rdx
 [ ]*[a-f0-9]+:[ ]*48 8d 2c c9[ ]+lea \(%rcx,%rcx,8\),%rbp
 [ ]*[a-f0-9]+:[ ]*62 f4 fc 0c 6b d4 03[ ]+\{nf\} imul \$0x3,%rsp,%rdx
 [ ]*[a-f0-9]+:[ ]*62 f4 fc 0c 6b e4 05[ ]+\{nf\} imul \$0x5,%rsp,%rsp
+[ ]*[a-f0-9]+:[ ]*48 8d 0c 12[ ]+lea \(%rdx,%rdx,1\),%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 f4 1c d1 22 \{nf\} shl \$1,\(%rdx\),%rcx
+[ ]*[a-f0-9]+:[ ]*48 8d 0c 09[ ]+lea \(%rcx,%rcx,1\),%rcx
+[ ]*[a-f0-9]+:[ ]*48 8d 0c 09[ ]+lea \(%rcx,%rcx,1\),%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 fc 0c 6b ca 04 \{nf\} imul \$0x4,%rdx,%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 fc 0c 6b 0a 04 \{nf\} imul \$0x4,\(%rdx\),%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 fc 0c 6b c9 04 \{nf\} imul \$0x4,%rcx,%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 fc 0c 6b c9 04 \{nf\} imul \$0x4,%rcx,%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 fc 0c 6b ca 40 \{nf\} imul \$0x40,%rdx,%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 fc 0c 6b 0a 40 \{nf\} imul \$0x40,\(%rdx\),%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 fc 0c 6b c9 40 \{nf\} imul \$0x40,%rcx,%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 fc 0c 6b c9 40 \{nf\} imul \$0x40,%rcx,%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 f4 1c c1 e2 07 \{nf\} shl \$0x7,%rdx,%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 f4 1c c1 22 07 \{nf\} shl \$0x7,\(%rdx\),%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 fc 0c c1 e1 07 \{nf\} shl \$0x7,%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 fc 0c c1 e1 07 \{nf\} shl \$0x7,%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 f4 1c c1 e2 1e \{nf\} shl \$0x1e,%rdx,%rcx
+[ ]*[a-f0-9]+:[ ]*62 f4 fc 0c 69 ca 00 00 00 80 \{nf\} imul \$0xffffffff80000000,%rdx,%rcx
 #pass
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -419,21 +419,21 @@ imul, 0xfaf, i386, Modrm|CheckOperandSiz
 imul, 0xaf, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|EVexMap4|NF, { Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 imul, 0x6b, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm8S, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 imul, 0x6b, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|EVexMap4|NF|Optimize, { Imm8S, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
-imulzu, 0x6b, APX_F, Modrm|No_bSuf|No_sSuf|EVexMap4|NF|ZU, { Imm8S, Reg16|Unspecified|BaseIndex, Reg16 }
+imulzu, 0x6b, APX_F, Modrm|No_bSuf|No_sSuf|EVexMap4|NF|ZU|Optimize, { Imm8S, Reg16|Unspecified|BaseIndex, Reg16 }
 imul, 0x69, i186, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Imm16|Imm32|Imm32S, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
-imul, 0x69, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|EVexMap4|NF, { Imm16|Imm32|Imm32S, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
-imulzu, 0x69, APX_F, Modrm|No_bSuf|No_sSuf|EVexMap4|NF|ZU, { Imm16, Reg16|Unspecified|BaseIndex, Reg16 }
+imul, 0x69, APX_F, Modrm|CheckOperandSize|No_bSuf|No_sSuf|EVexMap4|NF|Optimize, { Imm16|Imm32|Imm32S, Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+imulzu, 0x69, APX_F, Modrm|No_bSuf|No_sSuf|EVexMap4|NF|ZU|Optimize, { Imm16, Reg16|Unspecified|BaseIndex, Reg16 }
 // imul with 2 operands mimics imul with 3 by putting the register in
 // both i.rm.reg & i.rm.regmem fields.  RegKludge enables this
 // transformation.
 imul, 0x6b, i186, Modrm|No_bSuf|No_sSuf|RegKludge, { Imm8S, Reg16|Reg32|Reg64 }
 imul, 0x6b, APX_F, Modrm|No_bSuf|No_sSuf|RegKludge|EVexMap4|NF|Optimize, { Imm8S, Reg16|Reg32|Reg64 }
 imul, 0x69, i186, Modrm|No_bSuf|No_sSuf|RegKludge, { Imm16|Imm32|Imm32S, Reg16|Reg32|Reg64 }
-imul, 0x69, APX_F, Modrm|No_bSuf|No_sSuf|RegKludge|EVexMap4|NF, { Imm16|Imm32|Imm32S, Reg16|Reg32|Reg64 }
+imul, 0x69, APX_F, Modrm|No_bSuf|No_sSuf|RegKludge|EVexMap4|NF|Optimize, { Imm16|Imm32|Imm32S, Reg16|Reg32|Reg64 }
 // ZU is omitted here, for colliding with RegKludge.  process_operands() will
 // replace the constraint value after processing RegKludge.
-imulzu, 0x6b, APX_F, Modrm|No_bSuf|No_sSuf|RegKludge|EVexMap4|NF/*|ZU*/, { Imm8S, Reg16 }
-imulzu, 0x69, APX_F, Modrm|No_bSuf|No_sSuf|RegKludge|EVexMap4|NF/*|ZU*/, { Imm16, Reg16 }
+imulzu, 0x6b, APX_F, Modrm|No_bSuf|No_sSuf|RegKludge|EVexMap4|NF/*|ZU*/|Optimize, { Imm8S, Reg16 }
+imulzu, 0x69, APX_F, Modrm|No_bSuf|No_sSuf|RegKludge|EVexMap4|NF/*|ZU*/|Optimize, { Imm16, Reg16 }
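
As an aid for reading the -Os expectations in
x86-64-apx-nf-optimize-size.d against the new condition in
optimize_nf_encoding(), the size check can be looked at in isolation. The
sketch below is standalone and not gas code; the function name
shl_saves_bytes is made up for illustration. The 0x6b form carries a one
byte immediate, so swapping it for SHL with a one byte count gains
nothing; only a multiplier of 2 shrinks, because SHL-by-1 (opcode 0xd0)
has no immediate byte at all. The 0x69 form carries a 2 or 4 byte
immediate and hence always shrinks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the -Os part of the condition: does
   converting IMUL (opcode BASE_OPCODE, already qualifying immediate IMM)
   to SHL reduce the encoding size?  */
static bool
shl_saves_bytes (unsigned int base_opcode, int64_t imm)
{
  /* 0x69: Imm16/Imm32 always becomes an Imm8 count (or none).
     0x6b: of the single-bit Imm8S values 1 ... 0x40, only 2 has no bit
     in common with the mask 0x7d.  */
  return base_opcode == 0x69 || !(imm & 0x7d);
}
```

This matches the test output: with -Os, $0x4 and $0x40 stay IMUL, while
$0x2, $0x80, and $0x8000 are converted (subject to the separate operand
constraints).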