Message ID | 20220318165214.2291065-1-adhemerval.zanella@linaro.org |
---|---|
Headers |
Return-Path: <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org> X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9ED183888C4E for <patchwork@sourceware.org>; Fri, 18 Mar 2022 16:52:41 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 9ED183888C4E DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1647622361; bh=tS04osOKLXS/zl6XYOWHZfQyysD7wRvq5j4lMf0wQng=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=OjLHAEsgGMQtYvUMKc9Fp0ZsMXTzTaTxhPFcNNKBmodsx+gsjoZd2Y/5EYP/6qemO ZHWkSlhGTC0mAnQ7VhejVv4278gFEfJS+Ij4XawTwQAVoD/F+B6z3VyyypVejJfrv6 9uzW0n8KC9tNtxsOvx2Xho+EY/N4ehEA63xi1OZI= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-oo1-xc32.google.com (mail-oo1-xc32.google.com [IPv6:2607:f8b0:4864:20::c32]) by sourceware.org (Postfix) with ESMTPS id 524BB3858C2C for <libc-alpha@sourceware.org>; Fri, 18 Mar 2022 16:52:20 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 524BB3858C2C Received: by mail-oo1-xc32.google.com with SMTP id w3-20020a4ac183000000b0031d806bbd7eso10753262oop.13 for <libc-alpha@sourceware.org>; Fri, 18 Mar 2022 09:52:20 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:subject:date:message-id:mime-version :content-transfer-encoding; bh=tS04osOKLXS/zl6XYOWHZfQyysD7wRvq5j4lMf0wQng=; b=d5oa5pVB8wiqTOqi1/WiJTfmlqpxRsWmhX8hOuT1iKZDVffVTSzEiw9RsgjclRgyvP ZeuHmesUUMZvZLWKhYD/mNPv2sJLpAVSeMY6Sx6ZTfQQ5s4fAr49Yoehzy7aoceeqxGA teEyiCpDqv+sOElka+rvNpCAH8K9xMs1UMUMn0FkW0B27ebkRXyA5QzLdj2ozHQn7NhC 5xvXtQGr6Ye9fvh487kgnCLFQ59aVuVXld/RRTQq6SRZ+7dtn2RgYEsPiFMhvEU+ATLW zw9K6pOT29tJBL+SqanHYQ6GuPquczFIzFy9tzFJ+/qmaubzwO+1eBt9vYCMJ/mb116A p/qA== X-Gm-Message-State: AOAM532xuUrFgv2B20y+tfJoho+7KJ3tuIi891dCCbQHouW7CEjN8wbA pAkxnTp9nq/M1hjzfiDn+Pnprhl1x/k7yg== X-Google-Smtp-Source: ABdhPJzsry/SJl3geNF2p8xpUHzcJzpk8ktJHszE10IYdOiYC6zNJTnodTZSsRHCKCSlRhWuNBIA6w== X-Received: by 2002:a05:6870:4187:b0:d9:f16d:f04a with SMTP id y7-20020a056870418700b000d9f16df04amr7406157oac.68.1647622339240; Fri, 18 Mar 2022 09:52:19 -0700 (PDT) Received: from birita.. ([2804:431:c7ca:99a3:99e2:1060:da92:ae49]) by smtp.gmail.com with ESMTPSA id c12-20020a9d75cc000000b005b24b061940sm3986417otl.33.2022.03.18.09.52.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 18 Mar 2022 09:52:18 -0700 (PDT) To: libc-alpha@sourceware.org, Paul Eggert <eggert@cs.ucla.edu> Subject: [PATCH v3 0/7] Refactor syslog implementation Date: Fri, 18 Mar 2022 13:52:07 -0300 Message-Id: <20220318165214.2291065-1-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.32.0 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org> List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe> List-Archive: <https://sourceware.org/pipermail/libc-alpha/> List-Post: <mailto:libc-alpha@sourceware.org> List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help> List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>, <mailto:libc-alpha-request@sourceware.org?subject=subscribe> From: Adhemerval Zanella via Libc-alpha <libc-alpha@sourceware.org> Reply-To: Adhemerval Zanella <adhemerval.zanella@linaro.org> Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" <libc-alpha-bounces+patchwork=sourceware.org@sourceware.org> |
Series |
Refactor syslog implementation
|
|
Message
Adhemerval Zanella Netto
March 18, 2022, 4:52 p.m. UTC
The main driver of this change is to move away of using 32-bit timestamps. Based on discussion where systemd should support RFC5424 [1], it decided to move away from implementing on glibc [2] and just replace the use of localtime by gmtime. It is a deviation from RFC3164, but it improves some corner cases [3]. [1] https://github.com/systemd/systemd/issues/19251 [2] https://sourceware.org/pipermail/libc-alpha/2022-February/136595.html [3] https://sourceware.org/pipermail/libc-alpha/2021-March/123583.html Adhemerval Zanella (7): support: Add xmkfifo misc: Add syslog test misc: syslog: Fix indentation and style misc: syslog: Simplify implementation misc: syslog: Use fixed-sized buffer misc: syslog: Move SYSLOG_NAME to USE_MISC (BZ #16355) misc: Use gmtime instead of localtime misc/Makefile | 2 + misc/sys/syslog.h | 4 +- misc/syslog.c | 488 +++++++++++++++++++++++----------------------- misc/tst-syslog.c | 477 ++++++++++++++++++++++++++++++++++++++++++++ support/Makefile | 1 + support/xmkfifo.c | 29 +++ support/xunistd.h | 1 + 7 files changed, 756 insertions(+), 246 deletions(-) create mode 100644 misc/tst-syslog.c create mode 100644 support/xmkfifo.c
Comments
Thanks for looking into this. I'm reviewing the patches all in one diff rather than one patch at a time, as that's more convenient for me: > -#ifdef SYSLOG_NAMES > +#if defined(SYSLOG_NAMES) && defined(__USE_MISC) Need spaces before parens. Better yet, omit the parens. Please do this systematically in #if. > + enum > { > + timestamp_size = sizeof "MMM DD hh:mm:ss ", > + bufs_size = 1024 > + }; As these enums are used only once it might be more readable to eliminate them and replace their uses with their definiens, e.g., char timestamp[sizeof "MMM DD hh:mm:ss "]; ... char bufs[1024]; since the later code uses "sizeof timestamp" and "sizeof bufs" anyway (as that's less error-prone). > + /* "%h %e %H:%M:%S " */ Please prefer "%b" to "%h" here and elsewhere, as they're equivalent and "%b" is more mnemonic (it's short for "%B"). > + /* We deviate from RFC3164 which states timestamp should be in localtime Please use imperative instead of plural form: "Deviate from" instead of "We deviate from". None of the new comments should need to use "we" or "us" or "our" or "ours". > + bool buf_malloced = false; This local var isn't needed. You can remove it, and replace its use with "buf != bufs", which is like what the old code did; this is a bit more efficient, I expect. > + bool has_ts = __gmtime64_r (&now, &now_tm) != NULL; It'll be slightly more efficient to replace this with: struct tm *now_tmp = __gmtime64_r (&now, &now_tm); bool has_ts = now_tmp != NULL; and replace the "&now_tm" with "now_tmp" in the next __strftime_l call. > + /* In the highly unlike case of gmtime_r failure (the clock being > + INT_MIN + 1900 or follow INT_MAX + 1900) we skip the hostname so the > + message is handl as valid PRI but without TIMESTAMP or invalid TIMESTAMP > + (which should force the relay to add the timestamp itself). */ Some English fixups. "unlike" -> "unlikely". No need for "highly". "the clock being INT_MIN + 1900 or follow INT_MAX + 1900" -> "tm_year out of int range". "we skip" -> "skip". "handl" -> "handled". I don't understand the bit about "without TIMESTAMP or invalid TIMESTAMP (which should force the relay to add the timestamp itself)". Since we're already departing from RFC 3164, aren't we already generating an invalid TIMESTAMP? And if so, why can't we output our own representation of the out-of-range timestamp, e.g., '@67768037170140800' to represent a timestamp that is 67768037170140800 seconds after the Epoch? Better yet, we could output the correct year by dividing the __time64_t value by 12622780800 (60 * 60 * 24 * the number of days in 400 Gregorian years), running __gmtime64_r on the remainder, and adding 400 times the quotient to the tm_year that __gmtime64_r gives us; this computation will always succeed and so we won't need to worry about __gmtime64_r failure. On platforms with leap seconds this approach would go very slightly wrong on timestamps millions of years in the future but those timestamps are wrong anyway (due to leap seconds we don't know about yet, plus we'll switch to some approach other than leap seconds by then anyway). > + pid != 0 ? "[" : "", pid, pid != 0 ? "]" : "" Is GCC smart enough to optimize this to be branch-free? If not, you can hand-optimize it as follows: "[" + (pid == 0), pid, "]" + (pid == 0) > + buf[bufsize - 1] != '\n' ? "\n" : ""); Similarly, this can be "\n" + (buf[bufsize - 1] == '\n'). > + if (l < sizeof (bufs)) Omit the unnecessary parentheses (for consistency with the other code). Also, this comparison isn't safe on admittedly-theoretical platforms where size_t is narrower than int. So I suggest: if (0 <= l && l < sizeof bufs) which is clearer and should be equally efficient. + if (l + vl < sizeof bufs) l + vl could have signed integer overflow, leading to undefined behavior. Also, this doesn't work if vl == -1. Also, we have the same theoretical problem as before. So change this to "if (0 <= vl && vl < sizeof bufs - l)". > + FILE *f = __open_memstream (&buf, &bufsize); > + if (f != NULL) I'm not seeing what the memstream buys you here, compared to a simple malloc. You can't generate anything longer than INT_MAX bytes, since fprintf won't let you. And you already know how many bytes to allocate, from the returned value of the call to snprintf on the too-small stack buffer. So just call malloc and then call snprintf again; there's no need for a memstream. (The existing code already has this problem of course.)
On 18/03/2022 18:11, Paul Eggert wrote: > Thanks for looking into this. I'm reviewing the patches all in one diff rather than one patch at a time, as that's more convenient for me: > > >> -#ifdef SYSLOG_NAMES >> +#if defined(SYSLOG_NAMES) && defined(__USE_MISC) > > Need spaces before parens. Better yet, omit the parens. Please do this systematically in #if. > Ack. > >> + enum >> { >> + timestamp_size = sizeof "MMM DD hh:mm:ss ", >> + bufs_size = 1024 >> + }; > > As these enums are used only once it might be more readable to eliminate them and replace their uses with their definiens, e.g., > > char timestamp[sizeof "MMM DD hh:mm:ss "]; > ... > char bufs[1024]; > > since the later code uses "sizeof timestamp" and "sizeof bufs" anyway (as that's less error-prone). Yeah, it seems it would be better in this case. > > >> + /* "%h %e %H:%M:%S " */ > > Please prefer "%b" to "%h" here and elsewhere, as they're equivalent and "%b" is more mnemonic (it's short for "%B"). > Ack, although in this case I am keeping the old code as-in. > >> + /* We deviate from RFC3164 which states timestamp should be in localtime > > Please use imperative instead of plural form: "Deviate from" instead of "We deviate from". None of the new comments should need to use "we" or "us" or "our" or "ours". > Ack. > >> + bool buf_malloced = false; > > This local var isn't needed. You can remove it, and replace its use with "buf != bufs", which is like what the old code did; this is a bit more efficient, I expect. > Ack. > >> + bool has_ts = __gmtime64_r (&now, &now_tm) != NULL; > > It'll be slightly more efficient to replace this with: > > struct tm *now_tmp = __gmtime64_r (&now, &now_tm); > bool has_ts = now_tmp != NULL; > > and replace the "&now_tm" with "now_tmp" in the next __strftime_l call. > Ack. > >> + /* In the highly unlike case of gmtime_r failure (the clock being >> + INT_MIN + 1900 or follow INT_MAX + 1900) we skip the hostname so the >> + message is handl as valid PRI but without TIMESTAMP or invalid TIMESTAMP >> + (which should force the relay to add the timestamp itself). */ > > Some English fixups. "unlike" -> "unlikely". No need for "highly". "the clock being INT_MIN + 1900 or follow INT_MAX + 1900" -> "tm_year out of int range". "we skip" -> "skip". "handl" -> "handled". Ack. > > I don't understand the bit about "without TIMESTAMP or invalid TIMESTAMP > (which should force the relay to add the timestamp itself)". Since we're already departing from RFC 3164, aren't we already generating an invalid TIMESTAMP? And if so, why can't we output our own representation of the out-of-range timestamp, e.g., '@67768037170140800' to represent a timestamp that is 67768037170140800 seconds after the Epoch? I meant the RFC3164 '4.3.2 Valid PRI but no TIMESTAMP or invalid TIMESTAMP', which states in such case the relay should be responsible to generate the timestamp itself. And I think that if the clock is in a such bogus state, I don't see a gain in generating timestamp. > > Better yet, we could output the correct year by dividing the __time64_t value by 12622780800 (60 * 60 * 24 * the number of days in 400 Gregorian years), running __gmtime64_r on the remainder, and adding 400 times the quotient to the tm_year that __gmtime64_r gives us; this computation will always succeed and so we won't need to worry about __gmtime64_r failure. On platforms with leap seconds this approach would go very slightly wrong on timestamps millions of years in the future but those timestamps are wrong anyway (due to leap seconds we don't know about yet, plus we'll switch to some approach other than leap seconds by then anyway). I really don't think we should bother for such corner cases, specially since from systemd discussion [1] usually the local relay will use different timestamps mechanisms than the one generated by the client (such as socket timetamp or better resolutions obtained by server itself). [1] https://github.com/systemd/systemd/issues/19251 > > >> + pid != 0 ? "[" : "", pid, pid != 0 ? "]" : "" > > Is GCC smart enough to optimize this to be branch-free? If not, you can hand-optimize it as follows: > > "[" + (pid == 0), pid, "]" + (pid == 0) I would guess so, but I did not bother to check. Your suggestion is slight simpler though. > >> + buf[bufsize - 1] != '\n' ? "\n" : ""); > > Similarly, this can be "\n" + (buf[bufsize - 1] == '\n'). > > Ack. >> + if (l < sizeof (bufs)) > > Omit the unnecessary parentheses (for consistency with the other code). > Also, this comparison isn't safe on admittedly-theoretical platforms where size_t is narrower than int. So I suggest: > > if (0 <= l && l < sizeof bufs) > > which is clearer and should be equally efficient. Ack. > > + if (l + vl < sizeof bufs) > > l + vl could have signed integer overflow, leading to undefined behavior. Also, this doesn't work if vl == -1. Also, we have the same theoretical problem as before. So change this to "if (0 <= vl && vl < sizeof bufs - l)". Ack. > > >> + FILE *f = __open_memstream (&buf, &bufsize); >> + if (f != NULL) > > I'm not seeing what the memstream buys you here, compared to a simple malloc. You can't generate anything longer than INT_MAX bytes, since fprintf won't let you. And you already know how many bytes to allocate, from the returned value of the call to snprintf on the too-small stack buffer. So just call malloc and then call snprintf again; there's no need for a memstream. (The existing code already has this problem of course.) Yeah, I tried to keep this part as is, but it seems better indeed.