[RFC,pre-commit] Add codespell-log commit-msg hook

Message ID 20250410134533.12142-1-tdevries@suse.de
State Superseded
Headers
Series [RFC,pre-commit] Add codespell-log commit-msg hook |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 warning Skipped upon request
linaro-tcwg-bot/tcwg_gdb_build--master-arm warning Skipped upon request

Commit Message

Tom de Vries April 10, 2025, 1:45 p.m. UTC
  Now that we're using codespell to check spelling in gdb files, can we use
codespell to bring this spelling warning:
...
$ echo usuable | codespell -
1: usuable
	usuable ==> usable
...
to:
...
$ git commit -a -m "Usuable stuff"
...
?

First, let's look at a straightforward commit-msg hook implementation:
...
    - id: codespell
      name: codespell-commit-msg
      verbose: true
      always_run: true
      stages: [commit-msg]
...
installed using:
...
$ pre-commit install -t commit-msg
...

When trying the commit, we get:
...
$ echo "/* bla */" >> gdb/gdb.c
$ git commit -a -m "Usuable stuff"
black................................................(no files to check)Skipped
flake8...............................................(no files to check)Skipped
isort................................................(no files to check)Skipped
codespell............................................(no files to check)Skipped
check-include-guards.................................(no files to check)Skipped
black................................................(no files to check)Skipped
flake8...............................................(no files to check)Skipped
codespell............................................(no files to check)Skipped
codespell-commit-msg.....................................................Failed
- hook id: codespell
- duration: 0.06s
- exit code: 65

.git/COMMIT_EDITMSG:1: Usuable ==> Usable

check-include-guards.................................(no files to check)Skipped
$
...

The commit was aborted, but the commit message is still there:
...
$ cat .git/COMMIT_EDITMSG
Usuable stuff
...

We can retry and edit the commit message to clean up the typo:
...
$ git commit -e -F .git/COMMIT_EDITMSG -a
...
but it's a bit cumbersome.

Furthermore, say we fix a typo and want to document this in the ChangeLog, and
do:
...
$ git commit -m "Fixed typo: useable -> usable" -a
...

This commit cannot succeed, unless we add a codespell ignore tag, which feels
like taking it too far.

Both these problems can be addressed by setting things up in such a way that
the commit always succeeds, and codespell output is shown as a hint.

Ideally, we'd tell to pre-commit to implement this using some setting, but
there doesn't seem to be one.

So we use some indirection.  Instead of using native codespell, use a local
hook that calls a script gdb/contrib/codespell-log.sh, which calls pre-commit,
which calls codespell.

Using this approach, we get:
...
$ echo "/* bla */" >> gdb/gdb.c
$ git commit -a -m "Usuable stuff"
black................................................(no files to check)Skipped
flake8...............................................(no files to check)Skipped
isort................................................(no files to check)Skipped
codespell............................................(no files to check)Skipped
check-include-guards.................................(no files to check)Skipped
black................................................(no files to check)Skipped
flake8...............................................(no files to check)Skipped
codespell............................................(no files to check)Skipped
check-include-guards.................................(no files to check)Skipped
codespell-log............................................................Passed
- hook id: codespell-log
- duration: 0.18s

codespell-log-internal...................................................Failed
- hook id: codespell
- exit code: 65

.git/COMMIT_EDITMSG:1: Usuable ==> Usable

[codespell/codespell-log-2 d081bd25a40] Usuable stuff
 1 file changed, 1 insertion(+)
$
...

This is obviously convoluted, but it works.  Perhaps we can propose a
pre-commit improvement (always_pass) and simplify this eventually.
---
 .pre-commit-config.yaml      |  8 ++++++++
 gdb/contrib/codespell-log.sh | 29 +++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)
 create mode 100755 gdb/contrib/codespell-log.sh


base-commit: d1458933830456e54223d9fc61f0d9b3a19256f5
  

Comments

Tom de Vries April 24, 2025, 9:24 p.m. UTC | #1
On 4/10/25 15:45, Tom de Vries wrote:
> Now that we're using codespell to check spelling in gdb files, can we use
> codespell to bring this spelling warning:
> ...
> $ echo usuable | codespell -
> 1: usuable
> 	usuable ==> usable
> ...
> to:
> ...
> $ git commit -a -m "Usuable stuff"
> ...
> ?
> 

Ping.

I'm starting to think this should go in.

I just noticed a commit message using "alwasy", which would have been 
caught by codespell:
...
$ echo alwasy | codespell -
1: alwasy
	alwasy ==> always
...

So, any comments?

Thanks,
- Tom

> First, let's look at a straightforward commit-msg hook implementation:
> ...
>      - id: codespell
>        name: codespell-commit-msg
>        verbose: true
>        always_run: true
>        stages: [commit-msg]
> ...
> installed using:
> ...
> $ pre-commit install -t commit-msg
> ...
> 
> When trying the commit, we get:
> ...
> $ echo "/* bla */" >> gdb/gdb.c
> $ git commit -a -m "Usuable stuff"
> black................................................(no files to check)Skipped
> flake8...............................................(no files to check)Skipped
> isort................................................(no files to check)Skipped
> codespell............................................(no files to check)Skipped
> check-include-guards.................................(no files to check)Skipped
> black................................................(no files to check)Skipped
> flake8...............................................(no files to check)Skipped
> codespell............................................(no files to check)Skipped
> codespell-commit-msg.....................................................Failed
> - hook id: codespell
> - duration: 0.06s
> - exit code: 65
> 
> .git/COMMIT_EDITMSG:1: Usuable ==> Usable
> 
> check-include-guards.................................(no files to check)Skipped
> $
> ...
> 
> The commit was aborted, but the commit message is still there:
> ...
> $ cat .git/COMMIT_EDITMSG
> Usuable stuff
> ...
> 
> We can retry and edit the commit message to clean up the typo:
> ...
> $ git commit -e -F .git/COMMIT_EDITMSG -a
> ...
> but it's a bit cumbersome.
> 
> Furthermore, say we fix a typo and want to document this in the ChangeLog, and
> do:
> ...
> $ git commit -m "Fixed typo: useable -> usable" -a
> ...
> 
> This commit cannot succeed, unless we add a codespell ignore tag, which feels
> like taking it too far.
> 
> Both these problems can be addressed by setting things up in such a way that
> the commit always succeeds, and codespell output is shown as a hint.
> 
> Ideally, we'd tell to pre-commit to implement this using some setting, but
> there doesn't seem to be one.
> 
> So we use some indirection.  Instead of using native codespell, use a local
> hook that calls a script gdb/contrib/codespell-log.sh, which calls pre-commit,
> which calls codespell.
> 
> Using this approach, we get:
> ...
> $ echo "/* bla */" >> gdb/gdb.c
> $ git commit -a -m "Usuable stuff"
> black................................................(no files to check)Skipped
> flake8...............................................(no files to check)Skipped
> isort................................................(no files to check)Skipped
> codespell............................................(no files to check)Skipped
> check-include-guards.................................(no files to check)Skipped
> black................................................(no files to check)Skipped
> flake8...............................................(no files to check)Skipped
> codespell............................................(no files to check)Skipped
> check-include-guards.................................(no files to check)Skipped
> codespell-log............................................................Passed
> - hook id: codespell-log
> - duration: 0.18s
> 
> codespell-log-internal...................................................Failed
> - hook id: codespell
> - exit code: 65
> 
> .git/COMMIT_EDITMSG:1: Usuable ==> Usable
> 
> [codespell/codespell-log-2 d081bd25a40] Usuable stuff
>   1 file changed, 1 insertion(+)
> $
> ...
> 
> This is obviously convoluted, but it works.  Perhaps we can propose a
> pre-commit improvement (always_pass) and simplify this eventually.
> ---
>   .pre-commit-config.yaml      |  8 ++++++++
>   gdb/contrib/codespell-log.sh | 29 +++++++++++++++++++++++++++++
>   2 files changed, 37 insertions(+)
>   create mode 100755 gdb/contrib/codespell-log.sh
> 
> diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
> index 488e7812f94..ffa3fa55101 100644
> --- a/.pre-commit-config.yaml
> +++ b/.pre-commit-config.yaml
> @@ -38,6 +38,7 @@
>   # See https://pre-commit.com/hooks.html for more hooks
>   
>   minimum_pre_commit_version: 3.2.0
> +default_install_hook_types: [pre-commit, commit-msg]
>   repos:
>     - repo: https://github.com/psf/black-pre-commit-mirror
>       rev: 25.1.0
> @@ -80,3 +81,10 @@ repos:
>         # All gdb header files, but not headers in the test suite.
>         files: '^(gdb(support|server)?)/.*\.h$'
>         exclude: '.*/testsuite/.*'
> +    - id: codespell-log
> +      name: codespell-log
> +      language: script
> +      entry: gdb/contrib/codespell-log.sh
> +      verbose: true
> +      always_run: true
> +      stages: [commit-msg]
> diff --git a/gdb/contrib/codespell-log.sh b/gdb/contrib/codespell-log.sh
> new file mode 100755
> index 00000000000..380d34025ef
> --- /dev/null
> +++ b/gdb/contrib/codespell-log.sh
> @@ -0,0 +1,29 @@
> +#!/bin/sh
> +
> +gen_cfg ()
> +{
> +    cat > "$1" <<EOF
> +repos:
> +- repo: https://github.com/codespell-project/codespell
> +  rev: v2.4.1
> +  hooks:
> +  - id: codespell
> +    name: codespell-log-internal
> +    stages: [manual]
> +EOF
> +}
> +
> +cfg=$(mktemp)
> +
> +gen_cfg $cfg
> +
> +pre-commit \
> +    run \
> +    -c "$cfg" \
> +    codespell \
> +    --hook-stage manual \
> +    --files "$@" \
> +    || true
> +
> +rm -f \
> +   "$cfg"
> 
> base-commit: d1458933830456e54223d9fc61f0d9b3a19256f5
  
Simon Marchi April 25, 2025, 3 a.m. UTC | #2
On 2025-04-24 17:24, Tom de Vries wrote:
> On 4/10/25 15:45, Tom de Vries wrote:
>> Now that we're using codespell to check spelling in gdb files, can we use
>> codespell to bring this spelling warning:
>> ...
>> $ echo usuable | codespell -
>> 1: usuable
>>     usuable ==> usable
>> ...
>> to:
>> ...
>> $ git commit -a -m "Usuable stuff"
>> ...
>> ?
>>
> 
> Ping.
> 
> I'm starting to think this should go in.
> 
> I just noticed a commit message using "alwasy", which would have been caught by codespell:
> ...
> $ echo alwasy | codespell -
> 1: alwasy
>     alwasy ==> always
> ...
> 
> So, any comments?
> 
> Thanks,
> - Tom

I don't mind giving this a try.

There is one shellcheck warning in the new file (that should be a
pre-commit check too!):

    In contrib/codespell-log.sh line 18:
    gen_cfg $cfg
            ^--^ SC2086 (info): Double quote to prevent globbing and word splitting.

The new file would need a license, and ideally some comment to explain
what's going on.  With that:

Approved-By: Simon Marchi <simon.marchi@efficios.com>

Simon
  
Tom de Vries April 25, 2025, 8:59 a.m. UTC | #3
On 4/25/25 05:00, Simon Marchi wrote:
> 
> 
> On 2025-04-24 17:24, Tom de Vries wrote:
>> On 4/10/25 15:45, Tom de Vries wrote:
>>> Now that we're using codespell to check spelling in gdb files, can we use
>>> codespell to bring this spelling warning:
>>> ...
>>> $ echo usuable | codespell -
>>> 1: usuable
>>>      usuable ==> usable
>>> ...
>>> to:
>>> ...
>>> $ git commit -a -m "Usuable stuff"
>>> ...
>>> ?
>>>
>>
>> Ping.
>>
>> I'm starting to think this should go in.
>>
>> I just noticed a commit message using "alwasy", which would have been caught by codespell:
>> ...
>> $ echo alwasy | codespell -
>> 1: alwasy
>>      alwasy ==> always
>> ...
>>
>> So, any comments?
>>
>> Thanks,
>> - Tom
> 
> I don't mind giving this a try.
> 

Hi Simon,

thanks for the review.

> There is one shellcheck warning in the new file (that should be a
> pre-commit check too!):

I submitted an RFC here ( 
https://sourceware.org/pipermail/gdb-patches/2024-November/213400.html ).

> 
>      In contrib/codespell-log.sh line 18:
>      gen_cfg $cfg
>              ^--^ SC2086 (info): Double quote to prevent globbing and word splitting.
> 

Fixed.

> The new file would need a license, and ideally some comment to explain
> what's going on.

Fixed.

I've submitted a v2 ( I've submitted a v2 ) fixing/improving a few minor 
things:
- less verbose in case codespell succeeds
- use bash to be able to use arrays
- use trap EXIT to schedule cleanup
- use gdb/contrib/setup.cfg to for ignore setttings:
   - ignore-words = gdb/contrib/codespell-ignore-words.txt
   - uri-ignore-words-list = *
- add "set -e"

Thanks,
- Tom

> With that:
> 
> Approved-By: Simon Marchi <simon.marchi@efficios.com>
> 
> Simon
  

Patch

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 488e7812f94..ffa3fa55101 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -38,6 +38,7 @@ 
 # See https://pre-commit.com/hooks.html for more hooks
 
 minimum_pre_commit_version: 3.2.0
+default_install_hook_types: [pre-commit, commit-msg]
 repos:
   - repo: https://github.com/psf/black-pre-commit-mirror
     rev: 25.1.0
@@ -80,3 +81,10 @@  repos:
       # All gdb header files, but not headers in the test suite.
       files: '^(gdb(support|server)?)/.*\.h$'
       exclude: '.*/testsuite/.*'
+    - id: codespell-log
+      name: codespell-log
+      language: script
+      entry: gdb/contrib/codespell-log.sh
+      verbose: true
+      always_run: true
+      stages: [commit-msg]
diff --git a/gdb/contrib/codespell-log.sh b/gdb/contrib/codespell-log.sh
new file mode 100755
index 00000000000..380d34025ef
--- /dev/null
+++ b/gdb/contrib/codespell-log.sh
@@ -0,0 +1,29 @@ 
+#!/bin/sh
+
+gen_cfg ()
+{
+    cat > "$1" <<EOF
+repos:
+- repo: https://github.com/codespell-project/codespell
+  rev: v2.4.1
+  hooks:
+  - id: codespell
+    name: codespell-log-internal
+    stages: [manual]
+EOF
+}
+
+cfg=$(mktemp)
+
+gen_cfg $cfg
+
+pre-commit \
+    run \
+    -c "$cfg" \
+    codespell \
+    --hook-stage manual \
+    --files "$@" \
+    || true
+
+rm -f \
+   "$cfg"