From: "Sourceware to Gerrit sync (Code Review)"
To: Tom de Vries, Luis Machado, Tom Tromey, gdb-patches@sourceware.org
Cc: Simon Marchi
Date: Thu, 7 Nov 2019 04:51:06 -0500
Subject: [pushed] [gdb/contrib] Add words.sh script
X-Gerrit-PatchSet: 3
X-Gerrit-Change-Id: I7b119c9a4519cdbf62a3243d1df2927c80813e8b
X-Gerrit-Change-Number: 282
X-Gerrit-Commit: 496af5c81112807c9909fb7038404905e15950ea
Reply-To: noreply@gnutoolchain-gerrit.osci.io,
 simon.marchi@polymtl.ca, tdevries@suse.de, tromey@sourceware.org, luis.machado@linaro.org, gdb-patches@sourceware.org

The original change was created by Tom de Vries.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/binutils-gdb/+/282
......................................................................

[gdb/contrib] Add words.sh script

Add a script that takes a list of files as arguments and outputs a list of
the words in their C comments, each prefixed with its frequency.

For:
...
$ ./gdb/contrib/words.sh $(find gdb -type f -name "*.c" -o -name "*.h")
...
it generates a list of ~15000 words prefixed with frequency.

This could be used to generate a dictionary that is kept as part of the
sources, against which new code can be checked, generating a warning or
error.  The hope is that this check would be triggered frequently by
misspellings and only rarely by legitimate but uncommon words; otherwise,
the burden of updating the dictionary would be too great.

And for:
...
$ ./gdb/contrib/words.sh -f 1 $(find gdb -type f -name "*.c" -o -name "*.h")
...
it generates a list of ~5000 words with frequency 1.

This can be used to scan for misspellings manually.

Change-Id: I7b119c9a4519cdbf62a3243d1df2927c80813e8b
---
 A gdb/contrib/words.sh
 1 file changed, 129 insertions(+), 0 deletions(-)

diff --git a/gdb/contrib/words.sh b/gdb/contrib/words.sh
new file mode 100755
index 0000000..ae38539
--- /dev/null
+++ b/gdb/contrib/words.sh
@@ -0,0 +1,129 @@
+#!/bin/sh
+
+# Copyright (C) 2019 Free Software Foundation, Inc.
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see .
+
+# This script is intended to facilitate spell checking of comments in C
+# sources.  It:
+# - extracts comments from C files
+# - transforms the comments into a list of lowercase words
+# - prefixes each word with its frequency
+# - keeps only the words within a frequency range
+# - sorts the words, longest first
+#
+# For:
+# ...
+# $ ./gdb/contrib/words.sh $(find gdb -type f -name "*.c" -o -name "*.h")
+# ...
+# it generates a list of ~15000 words prefixed with frequency.
+#
+# This could be used to generate a dictionary that is kept as part of the
+# sources, against which new code can be checked, generating a warning or
+# error.  The hope is that this check would be triggered frequently by
+# misspellings and only rarely by legitimate but uncommon words;
+# otherwise, the burden of updating the dictionary would be too great.
+#
+# And for:
+# ...
+# $ ./gdb/contrib/words.sh -f 1 $(find gdb -type f -name "*.c" -o -name "*.h")
+# ...
+# it generates a list of ~5000 words with frequency 1.
+#
+# This can be used to scan for misspellings manually.
+#
+
+minfreq=
+maxfreq=
+while [ $# -gt 0 ]; do
+    case "$1" in
+        --freq|-f)
+            minfreq=$2
+            maxfreq=$2
+            shift 2
+            ;;
+        --min)
+            minfreq=$2
+            if [ "$maxfreq" = "" ]; then
+                maxfreq=0
+            fi
+            shift 2
+            ;;
+        --max)
+            maxfreq=$2
+            if [ "$minfreq" = "" ]; then
+                minfreq=0
+            fi
+            shift 2
+            ;;
+        *)
+            break;
+            ;;
+    esac
+done
+
+if [ "$minfreq" = "" ] && [ "$maxfreq" = "" ]; then
+    minfreq=0
+    maxfreq=0
+fi
+
+awkfile=$(mktemp)
+trap 'rm -f "$awkfile"' EXIT
+
+cat > "$awkfile" <\+\*-]/\n/g' \
+    | sed 's/\[/\n/g' \
+    | sed 's/\]/\n/g' \
+    | sed 's/[0-9][0-9]*/\n/g' \
+    | tr '[:upper:]' '[:lower:]' \
+    | sed 's/[ \t]*//g' \
+    | sort \
+    | uniq -c \
+    | awk "{ if (($minfreq == 0 || $minfreq <= \$1) \
+                 && ($maxfreq == 0 || \$1 <= $maxfreq)) { print \$0; } }" \
+    | awk '{ print length($0) " " $0; }' \
+    | sort -n -r \
+    | cut -d ' ' -f 2-
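The commit message suggests keeping a dictionary in the sources and checking new code against it. One way such a check might look is sketched below, assuming both the dictionary and the words to check come from words.sh output (lines of the form "COUNT WORD", as printed by uniq -c). The helper names make_dict and unknown_words, and the sample words, are hypothetical and not part of this patch:

```shell
#!/bin/sh
# Hypothetical sketch, not part of this patch: the dictionary check proposed
# in the commit message.  Input lines are assumed to look like words.sh
# output, i.e. "COUNT WORD" with leading spaces from uniq -c.

# make_dict: print the sorted, de-duplicated word column of stdin.
make_dict () {
    awk '{ print $2 }' | LC_ALL=C sort -u
}

# unknown_words DICT: print each word on stdin that is absent from DICT,
# a dictionary file produced by make_dict.  comm(1) needs both inputs
# sorted in the same collation order, hence LC_ALL=C on both sides.
unknown_words () {
    awk '{ print $2 }' | LC_ALL=C sort -u | LC_ALL=C comm -23 - "$1"
}

dict=$(mktemp)
trap 'rm -f "$dict"' EXIT

# Build a tiny dictionary from sample counts (placeholder data).
printf '%s\n' '     12 register' '      7 breakpoint' '      3 inferior' \
    | make_dict > "$dict"

# Check a second sample: only the word missing from the dictionary,
# the misspelling "regsiter", is printed.
printf '%s\n' '      2 breakpoint' '      1 regsiter' \
    | unknown_words "$dict"
```

Under this assumption, the dictionary would be generated once from a words.sh run over the whole tree and committed, and the check would run words.sh only on changed files. Whether a hit should produce a warning or an error is left open, as in the commit message.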