From patchwork Fri Jun 28 13:31:20 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Beulich
X-Patchwork-Id: 93023
Message-ID: <9701ab88-c5a5-4bb0-aa13-ac583f133c4e@suse.com>
Date: Fri, 28 Jun 2024 15:31:20 +0200
Subject: [PATCH WIP/RFC 11/11] gas: have scrubber retain more whitespace
From: Jan Beulich
To: Binutils
Cc: Nick Clifton, "ramana.radhakrishnan@arm.com", Richard Earnshaw,
 Marcus Shawcroft, "H.J. Lu", Joseph Myers
List-Id: Binutils mailing list

According to the description of the state machine, the expectation
appears to be that (leaving aside labels) any insn mnemonic or directive
would be followed by a comma separated list of operands. That may have
been true very long ago, but at the latest with the advent of more
elaborate macros this isn't the case anymore. Neither macro parameters
in macro definitions nor macro arguments in macro invocations are
required to be separated by commas. Hence whitespace serves a crucial
role there. Plus even without "real" macros issues exist, in e.g.

	.irp n, ...
	insn\n\(suffix) operand1, operand2
	.endr

Whitespace following the closing parenthesis would have been removed
(ahead of even processing the .irp), as the "opcode" was deemed to have
ended earlier already.

Therefore, squash the distinction between "opcode" and operands, i.e.
fold state 10 back into state 3. Also drop most of the distinction
between "symbol chars" and "relatively normal" ones. Not entirely
unexpectedly this results in the need to skip whitespace in a few more
places in arch-specific code.
---
The comma related special casing could of course be extended to adjacent
two-char comments; I simply didn't deem this as important.

WIP: likely there's also fallout for further architectures
WIP: see about dropping the D10V special case (not helped by there not
     being a dedicated maintainer)
WIP: revisit the TIC6X special case
WIP: testcase(s)
WIP: NEWS entry (perhaps also to warn people of possible perceived
     regressions, along the lines of what "Arm: correct macro use in gas
     testsuite" addresses)

--- a/gas/app.c
+++ b/gas/app.c
@@ -423,16 +423,18 @@ do_scrub_chars (size_t (*get) (char *, s
 
   /*State 0: beginning of normal line
           1: After first whitespace on line (flush more white)
-          2: After first non-white (opcode) on line (keep 1white)
-          3: after second white on line (into operands) (flush white)
+          2: After first non-white (opcode or maybe label when they're followed
+             by colons) on line (keep 1white)
+          3: after subsequent white on line (typically into operands)
+             (flush more white)
           4: after putting out a .linefile, put out digits
           5: parsing a string, then go to old-state
           6: putting out \ escape in a "d string.
           7: no longer used
           8: no longer used
-          9: After seeing symbol char in state 3 (keep 1white after symchar)
-         10: After seeing whitespace in state 9 (keep white before symchar)
-         11: After seeing a symbol character in state 0 (eg a label definition)
+          9: After seeing non-white in state 3 (keep 1white)
+         10: no longer used
+         11: After seeing a non-white character in state 0 (eg a label definition)
          -1: output string in out_string and go to the state in old_state
          12: no longer used
 #ifdef DOUBLEBAR_PARALLEL
@@ -899,7 +901,11 @@ do_scrub_chars (size_t (*get) (char *, s
                   && (state < 1 || strchr (tc_comment_chars, ch)))
               || IS_NEWLINE (ch)
               || IS_LINE_SEPARATOR (ch)
-              || IS_PARALLEL_SEPARATOR (ch))
+              || IS_PARALLEL_SEPARATOR (ch)
+              /* See comma related comment near the bottom of the function.
+                 Reasoning equally applies to whitespace preceding a comma in
+                 most cases.  */
+              || (ch == ',' && state > 2))
             {
               if (scrub_m68k_mri)
                 {
@@ -942,6 +948,7 @@ do_scrub_chars (size_t (*get) (char *, s
                  character at the beginning of a line.  */
               goto recycle;
             case 2:
+            case 9:
               state = 3;
               if (to + 1 < toend)
                 {
@@ -965,20 +972,6 @@ do_scrub_chars (size_t (*get) (char *, s
                   break;
                 }
               goto recycle;     /* Sp in operands */
-            case 9:
-            case 10:
-#ifndef TC_KEEP_OPERAND_SPACES
-              if (scrub_m68k_mri)
-#endif
-                {
-                  /* In MRI mode, we keep these spaces.  */
-                  state = 3;
-                  UNGET (ch);
-                  PUT (' ');
-                  break;
-                }
-              state = 10;       /* Sp after symbol char */
-              goto recycle;
             case 11:
               if (LABELS_WITHOUT_COLONS || flag_m68k_mri)
                 state = 1;
@@ -1049,27 +1042,17 @@ do_scrub_chars (size_t (*get) (char *, s
             {
               if (ch2 != EOF)
                 UNGET (ch2);
-              if (state == 9 || state == 10)
-                state = 3;
+              if (state == 1)
+                state = 2;
+              else if (state == 3)
+                state = 9;
               PUT (ch);
             }
           break;
 
         case LEX_IS_STRINGQUOTE:
           quotechar = ch;
-          if (state == 10)
-            {
-              /* Preserve the whitespace in foo "bar".  */
-              UNGET (ch);
-              state = 3;
-              PUT (' ');
-
-              /* PUT didn't jump out.  We could just break, but we
-                 know what will happen, so optimize a bit.  */
-              ch = GET ();
-              old_state = 9;
-            }
-          else if (state == 3)
+          if (state == 3)
             old_state = 9;
           else
             old_state = state;
@@ -1088,14 +1071,6 @@ do_scrub_chars (size_t (*get) (char *, s
               UNGET (c);
             }
 #endif
-          if (state == 10)
-            {
-              /* Preserve the whitespace in foo 'b'.  */
-              UNGET (ch);
-              state = 3;
-              PUT (' ');
-              break;
-            }
           ch = GET ();
           if (ch == EOF)
             {
@@ -1130,10 +1105,7 @@ do_scrub_chars (size_t (*get) (char *, s
               PUT (out_buf[0]);
               break;
             }
-          if (state == 9)
-            old_state = 3;
-          else
-            old_state = state;
+          old_state = state;
           state = -1;
           out_string = out_buf;
           PUT (*out_string++);
@@ -1143,10 +1115,10 @@ do_scrub_chars (size_t (*get) (char *, s
 #ifdef KEEP_WHITE_AROUND_COLON
           state = 9;
 #else
-          if (state == 9 || state == 10)
-            state = 3;
-          else if (state != 3)
+          if (state == 2 || state == 11)
             state = 1;
+          else
+            state = 9;
 #endif
           PUT (ch);
           break;
@@ -1271,7 +1243,7 @@ do_scrub_chars (size_t (*get) (char *, s
               break;
             }
 
-#ifdef TC_D10V
+#ifdef TC_D10V//todo drop?
           /* All insns end in a char for which LEX_IS_SYMBOL_COMPONENT is true.
              Trap is the only short insn that has a first operand that is
              neither register nor label.
@@ -1282,7 +1254,7 @@ do_scrub_chars (size_t (*get) (char *, s
              can recognize it as such.  */
           /* An alternative approach would be to reset the state to 1 when
              we see '||', '<'- or '->', but that seems to be overkill.  */
-          if (state == 10)
+          if (state == 3)
             PUT (' ');
 #endif
           /* We have a line comment character which is not at the
@@ -1296,7 +1268,7 @@ do_scrub_chars (size_t (*get) (char *, s
           if (scrub_m68k_mri
               && (ch == '!' || ch == '*' || ch == '#')
               && state != 1
-              && state != 10)
+              && state != 3)
             goto de_fault;
           /* Fall through.  */
         case LEX_IS_COMMENT_START:
@@ -1352,17 +1324,6 @@ do_scrub_chars (size_t (*get) (char *, s
 
           /* Fall through.  */
         case LEX_IS_SYMBOL_COMPONENT:
-          if (state == 10)
-            {
-              /* This is a symbol character following another symbol
-                 character, with whitespace in between.  We skipped
-                 the whitespace earlier, so output it now.  */
-              UNGET (ch);
-              state = 3;
-              PUT (' ');
-              break;
-            }
-
 #ifdef TC_Z80
           /* "af'" is a symbol containing '\''.  */
           if (state == 3 && (ch == 'a' || ch == 'A'))
@@ -1388,7 +1349,16 @@ do_scrub_chars (size_t (*get) (char *, s
                 }
             }
 #endif
-          if (state == 3)
+
+          /* Fall through.  */
+        default:
+        de_fault:
+          /* Some relatively `normal' character.  */
+          if (state == 0)
+            state = 11; /* Now seeing label definition.  */
+          else if (state == 1)
+            state = 2;  /* Ditto.  */
+          else if (state == 3)
             state = 9;
 
           /* This is a common case.  Quickly copy CH and all the
@@ -1436,52 +1406,15 @@ do_scrub_chars (size_t (*get) (char *, s
                 }
             }
 
-          /* Fall through.  */
-        default:
-        de_fault:
-          /* Some relatively `normal' character.  */
-          if (state == 0)
-            {
-              state = 11;       /* Now seeing label definition.  */
-            }
-          else if (state == 1)
-            {
-              state = 2;        /* Ditto.  */
-            }
-          else if (state == 9)
-            {
-              if (!IS_SYMBOL_COMPONENT (ch))
-                state = 3;
-            }
-          else if (state == 10)
-            {
-              if (ch == '\\')
-                {
-                  /* Special handling for backslash: a backslash may
-                     be the beginning of a formal parameter (of a
-                     macro) following another symbol character, with
-                     whitespace in between.  If that is the case, we
-                     output a space before the parameter.  Strictly
-                     speaking, correct handling depends upon what the
-                     macro parameter expands into; if the parameter
-                     expands into something which does not start with
-                     an operand character, then we don't want to keep
-                     the space.  We don't have enough information to
-                     make the right choice, so here we are making the
-                     choice which is more likely to be correct.  */
-                  if (to + 1 >= toend)
-                    {
-                      /* If we're near the end of the buffer, save the
-                         character for the next time round.  Otherwise
-                         we'll lose our state.  */
-                      UNGET (ch);
-                      goto tofull;
-                    }
-                  *to++ = ' ';
-                }
+          /* As a special case, to limit the delta to previous behavior, e.g.
+             also affecting listings, go straight to state 3 when seeing a
+             comma.  Commas are special: While they can be used to separate
+             macro parameters or arguments, they cannot (on their own, i.e.
+             without quoting) be arguments (or parameter default values).
+             Hence successive whitespace is not meaningful there.  */
+          if (ch == ',' && state == 9)
+            state = 3;
 
-              state = 3;
-            }
           PUT (ch);
           break;
         }
--- a/gas/config/tc-aarch64.c
+++ b/gas/config/tc-aarch64.c
@@ -641,6 +641,7 @@ const char FLT_CHARS[] = "rRsSfFdDxXeEpP
 static inline bool
 skip_past_char (char **str, char c)
 {
+  skip_whitespace (*str);
   if (**str == c)
     {
       (*str)++;
@@ -891,6 +892,7 @@ parse_reg (char **ccp)
   start++;
 #endif
 
+  skip_whitespace (start);
   p = start;
   if (!ISALPHA (*p) || !is_name_beginner (*p))
     return NULL;
@@ -1196,13 +1198,17 @@ parse_typed_reg (char **ccp, aarch64_reg
                  struct vector_type_el *typeinfo, unsigned int flags)
 {
   char *str = *ccp;
-  bool is_alpha = ISALPHA (*str);
-  const reg_entry *reg = parse_reg (&str);
+  bool is_alpha;
+  const reg_entry *reg;
   struct vector_type_el atype;
   struct vector_type_el parsetype;
   bool is_typed_vecreg = false;
   unsigned int err_flags = (flags & PTR_IN_REGLIST) ? SEF_IN_REGLIST : 0;
 
+  skip_whitespace (str);
+  is_alpha = ISALPHA (*str);
+  reg = parse_reg (&str);
+
   atype.defined = 0;
   atype.type = NT_invtype;
   atype.width = -1;
@@ -1420,10 +1426,7 @@ parse_vector_reg_list (char **ccp, aarch
   do
     {
       if (in_range)
-        {
-          str++;                /* skip over '-' */
-          val_range = val;
-        }
+        val_range = val;
 
       const reg_entry *reg;
       if (has_qualifier)
@@ -1485,7 +1488,8 @@ parse_vector_reg_list (char **ccp, aarch
       in_range = 0;
       ptr_flags |= PTR_GOOD_MATCH;
     }
-  while (skip_past_comma (&str) || (in_range = 1, *str == '-'));
+  while (skip_past_comma (&str)
+         || (in_range = 1, skip_past_char (&str, '-')));
 
   skip_whitespace (str);
   if (*str != '}')
@@ -8265,6 +8269,7 @@ parse_operands (char *str, const aarch64
     }
 
   /* Check if we have parsed all the operands.  */
+  skip_whitespace (str);
   if (*str != '\0' && ! error_p ())
     {
       /* Set I to the index of the last present operand; this is
--- a/gas/config/tc-arm.c
+++ b/gas/config/tc-arm.c
@@ -1148,6 +1148,8 @@ my_get_expression (expressionS * ep, cha
     prefix_mode = (prefix_mode == GE_OPT_PREFIX_BIG) ? prefix_mode
                   : GE_OPT_PREFIX;
 
+  skip_whitespace (*str);
+
   switch (prefix_mode)
     {
     case GE_NO_PREFIX: break;
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -15306,6 +15306,8 @@ RC_SAE_immediate (const char *imm_start)
       as_bad (_("Missing '}': '%s'"), imm_start);
       return 0;
     }
+  if (is_space_char (*pstr))
+    ++pstr;
   /* RC/SAE immediate string should contain nothing more.  */;
   if (*pstr != 0)
     {
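
For illustration only (not part of the patch): the whitespace rule the
app.c change is aiming for can be approximated by a standalone toy
model. The sketch assumes just two rules, namely that a run of blanks
following a non-blank character is kept as one space, and that blanks
adjacent to a comma are dropped. It does not model labels, strings,
comments, or any of the target specific special cases, and the names
toy_scrub() and toy_scrub_matches() are hypothetical, i.e. nothing like
them exists in gas.

```c
#include <assert.h>
#include <string.h>

/* Toy model only; this is NOT gas's do_scrub_chars().  It mimics just
   two rules: a run of blanks after a non-blank character is kept as a
   single space, and blanks next to a comma are dropped.  OUT must have
   room for strlen (IN) + 1 bytes.  */
void
toy_scrub (const char *in, char *out)
{
  int pending_space = 0;  /* Blank(s) seen, but not emitted yet.  */
  int flush_white = 1;    /* At line start or right after a comma.  */

  assert (in != NULL && out != NULL);
  for (; *in != '\0'; in++)
    {
      char ch = *in;

      if (ch == ' ' || ch == '\t')
        {
          /* Remember at most one blank, unless blanks are flushed.  */
          if (!flush_white)
            pending_space = 1;
          continue;
        }
      if (ch == ',')
        {
          /* Blanks ahead of the comma are dropped, and those
             following it get flushed as well.  */
          pending_space = 0;
          flush_white = 1;
          *out++ = ch;
          continue;
        }
      if (pending_space)
        *out++ = ' ';
      pending_space = 0;
      flush_white = 0;
      *out++ = ch;
    }
  *out = '\0';
}

/* Hypothetical helper for trying out the above: scrub IN into a local
   buffer and compare the result with EXPECTED.  */
int
toy_scrub_matches (const char *in, const char *expected)
{
  char buf[256];

  assert (strlen (in) < sizeof buf);
  toy_scrub (in, buf);
  return strcmp (buf, expected) == 0;
}
```

Under these assumptions the .irp body line from the description keeps
the space after the closing parenthesis, while the blanks around the
comma go away: toy_scrub_matches ("insn\\n\\(suffix)   operand1 ,  operand2",
"insn\\n\\(suffix) operand1,operand2") yields 1.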