[3/3] Add spell check pre-commit hook

Message ID 20241008074402.10374-3-tdevries@suse.de
State New
Headers
Series [1/3,gdb/contrib] Add spellcheck.sh --check |

Checks

Context Check Description
linaro-tcwg-bot/tcwg_gdb_build--master-aarch64 success Build passed
linaro-tcwg-bot/tcwg_gdb_build--master-arm success Build passed
linaro-tcwg-bot/tcwg_gdb_check--master-arm success Test passed
linaro-tcwg-bot/tcwg_gdb_check--master-aarch64 success Test passed

Commit Message

Tom de Vries Oct. 8, 2024, 7:44 a.m. UTC
  Add a call to gdb/contrib/spellcheck.sh --check in .pre-commit-config.yaml:
...
$ time git commit -a -m typo
black................................................(no files to check)Skipped
flake8...............................................(no files to check)Skipped
isort................................................(no files to check)Skipped
spell check..............................................................Failed
- hook id: spell-check
- exit code: 1

real    0m0,493s
user    0m0,258s
sys     0m0,318s
...

Tested on aarch64-linux.
---
 .pre-commit-config.yaml | 7 +++++++
 1 file changed, 7 insertions(+)
  

Comments

Tom de Vries Oct. 21, 2024, 1:08 p.m. UTC | #1
On 10/8/24 09:44, Tom de Vries wrote:
> Add a call to gdb/contrib/spellcheck.sh --check in .pre-commit-config.yaml:
> ...
> $ time git commit -a -m typo
> black................................................(no files to check)Skipped
> flake8...............................................(no files to check)Skipped
> isort................................................(no files to check)Skipped
> spell check..............................................................Failed
> - hook id: spell-check
> - exit code: 1
> 
> real    0m0,493s
> user    0m0,258s
> sys     0m0,318s
> ...
> 

I've committed the other patches from this series, but this one remains 
in review, so ... ping.

Thanks,
- Tom

> Tested on aarch64-linux.
> ---
>   .pre-commit-config.yaml | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
> index 87726aeb758..4a3fdd79f3b 100644
> --- a/.pre-commit-config.yaml
> +++ b/.pre-commit-config.yaml
> @@ -22,3 +22,10 @@ repos:
>       - id: isort
>         types_or: [file]
>         files: 'gdb/.*\.py(\.in)?$'
> +  - repo: local
> +    hooks:
> +    - id: spell-check
> +      name: spell check
> +      language: script
> +      entry: ./gdb/contrib/spellcheck.sh --check
> +      files: ^(gdb|gdbsupport|gdbserver)/
  
Tom Tromey Nov. 7, 2024, 2:57 p.m. UTC | #2
>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:

Tom> I've committed the other patches from this series, but this one
Tom> remains in review, so ... ping.

Ah, now I see the others already went in.
Totally fine of course, I just should have read the whole thread first.

I am in favor of this change.  I think it's a good improvement.
However I wonder -- how long does it take to run the script?
I ask because it seems like it will be run on essentially every commit.
Also, I can't recall how pre-commit works -- is the script run on the
entire tree or just the files being modified?

Assuming everything is ok though I think this should go in.
Approved-By: Tom Tromey <tom@tromey.com>

Tom
  
Tom de Vries Nov. 12, 2024, 10:15 a.m. UTC | #3
On 11/7/24 15:57, Tom Tromey wrote:
>>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:
> 
> Tom> I've committed the other patches from this series, but this one
> Tom> remains in review, so ... ping.
> 
> Ah, now I see the others already went in.
> Totally fine of course, I just should have read the whole thread first.
> 
> I am in favor of this change.  I think it's a good improvement.
> However I wonder -- how long does it take to run the script?

It depends.

Let's take the largest file in gdb* in terms of lines, that's 
gdb/doc/gdb.texinfo with ~51k lines:
...
$ time ./gdb/contrib/spellcheck.sh --check gdb/doc/gdb.texinfo

real	0m4.613s
user	0m4.405s
sys	0m0.215s
...

Now, let's take a small file with just 39 lines:
...
$ time ./gdb/contrib/spellcheck.sh --check gdb/gdb.c

real	0m0.445s
user	0m0.314s
sys	0m0.137s
...

This is with .git/wikipedia-common-misspellings.txt already downloaded, 
and .git/spell-check.pat1.$md5sum already generated.

> I ask because it seems like it will be run on essentially every commit.

Agreed, speed matters for this.

> Also, I can't recall how pre-commit works -- is the script run on the
> entire tree or just the files being modified?

Just on the files being modified.

[ Running it on the entire tree is slow:

$ time ./gdb/contrib/spellcheck.sh --check gdb*

real	1m9.906s
user	1m7.710s
sys	0m1.899s ]

I think it may be possible to speed up the check to only check the 
modified lines (see the line counter example below).

This works for checks that don't require context, or require a fixed 
amount of lines of context.

Currently, spellcheck.sh works on line-at-a-time basis, so no context used.

The question is, should it be sensitive to context?

The current implementation does not differentiate between comments and 
code, so when running it on directory sim we get:
...
diff --git a/sim/arm/armsupp.c b/sim/arm/armsupp.c
index 1a5eeaff1d6..1b92a3abc3e 100644
--- a/sim/arm/armsupp.c
+++ b/sim/arm/armsupp.c
@@ -390,9 +390,9 @@ ARMul_NthReg (ARMword instr, unsigned number)
  {
    unsigned bit, upto;

-  for (bit = 0, upto = 0; upto <= number; bit ++)
+  for (bit = 0, up to = 0; up to <= number; bit ++)
      if (BIT (bit))
-      upto ++;
+      up to ++;

    return (bit - 1);
  }
...

> Assuming everything is ok though I think this should go in.
> Approved-By: Tom Tromey <tom@tromey.com>

Thanks for the review, and I'm glad you're supporting the idea.

I think the potential 4.5 seconds delay is too long, so I'll hold off on 
committing this.

I'll work on a --pre-commit switch that only checks modified lines, and 
resubmit together with this patch.

Thanks,
- Tom

Patch:
...
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 87726aeb758..0b94dcc4744 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -22,3 +22,10 @@ repos:
      - id: isort
        types_or: [file]
        files: 'gdb/.*\.py(\.in)?$'
+  - repo: local
+    hooks:
+    - id: line-counter
+      name: line counter
+      language: script
+      entry: ./gdb/contrib/line-counter.sh
+      files: ^(gdb|gdbsupport|gdbserver)/
diff --git a/gdb/contrib/line-counter.sh b/gdb/contrib/line-counter.sh
new file mode 100755
index 00000000000..257ec3793bb
--- /dev/null
+++ b/gdb/contrib/line-counter.sh
@@ -0,0 +1,5 @@
+#!/bin/sh
+
+git diff --staged > DIFF
+
+git diff --staged | egrep -v "^\+\+\+" | egrep "^\+" | wc -l >> DIFF
...

Example output:
...
diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
index 53f41e67444..e08ee7fb675 100644
--- a/gdb/doc/gdb.texinfo
+++ b/gdb/doc/gdb.texinfo
@@ -51432,6 +51432,8 @@ Richard M. Stallman and Roland H. Pesch, July 1991.

  @printindex cp

+BLA
+
  @node Command and Variable Index
  @unnumbered Command, Variable, and Function Index

2
...
  
Tom de Vries Nov. 12, 2024, 1:07 p.m. UTC | #4
On 11/12/24 11:15, Tom de Vries wrote:
> On 11/7/24 15:57, Tom Tromey wrote:
>>>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:
>>
>> Tom> I've committed the other patches from this series, but this one
>> Tom> remains in review, so ... ping.
>>
>> Ah, now I see the others already went in.
>> Totally fine of course, I just should have read the whole thread first.
>>
>> I am in favor of this change.  I think it's a good improvement.
>> However I wonder -- how long does it take to run the script?
> 
> It depends.
> 
> Let's take the largest file in gdb* in terms of lines, that's gdb/doc/ 
> gdb.texinfo with ~51k lines:
> ...
> $ time ./gdb/contrib/spellcheck.sh --check gdb/doc/gdb.texinfo
> 
> real    0m4.613s
> user    0m4.405s
> sys    0m0.215s
> ...
> 
> Now, let's take a small file with just 39 lines:
> ...
> $ time ./gdb/contrib/spellcheck.sh --check gdb/gdb.c
> 
> real    0m0.445s
> user    0m0.314s
> sys    0m0.137s
> ...
> 
> This is with .git/wikipedia-common-misspellings.txt already downloaded, 
> and .git/spell-check.pat1.$md5sum already generated.
> 
>> I ask because it seems like it will be run on essentially every commit.
> 
> Agreed, speed matters for this.
> 
>> Also, I can't recall how pre-commit works -- is the script run on the
>> entire tree or just the files being modified?
> 
> Just on the files being modified.
> 
> [ Running it on the entire tree is slow:
> 
> $ time ./gdb/contrib/spellcheck.sh --check gdb*
> 
> real    1m9.906s
> user    1m7.710s
> sys    0m1.899s ]
> 
> I think it may be possible to speed up the check to only check the 
> modified lines (see the line counter example below).
> 
> This works for checks that don't require context, or require a fixed 
> amount of lines of context.
> 
> Currently, spellcheck.sh works on line-at-a-time basis, so no context used.
> 
> The question is, should it be sensitive to context?
> 
> The current implementation does not differentiate between comments and 
> code, so when running it on directory sim we get:
> ...
> diff --git a/sim/arm/armsupp.c b/sim/arm/armsupp.c
> index 1a5eeaff1d6..1b92a3abc3e 100644
> --- a/sim/arm/armsupp.c
> +++ b/sim/arm/armsupp.c
> @@ -390,9 +390,9 @@ ARMul_NthReg (ARMword instr, unsigned number)
>   {
>     unsigned bit, upto;
> 
> -  for (bit = 0, upto = 0; upto <= number; bit ++)
> +  for (bit = 0, up to = 0; up to <= number; bit ++)
>       if (BIT (bit))
> -      upto ++;
> +      up to ++;
> 
>     return (bit - 1);
>   }
> ...
> 
>> Assuming everything is ok though I think this should go in.
>> Approved-By: Tom Tromey <tom@tromey.com>
> 
> Thanks for the review, and I'm glad you're supporting the idea.
> 
> I think the potential 4.5 seconds delay is too long, so I'll hold off on 
> committing this.
> 
> I'll work on a --pre-commit switch that only checks modified lines, and 
> resubmit together with this patch.
> 

Submitted here ( 
https://sourceware.org/pipermail/gdb-patches/2024-November/213249.html ).

Thanks,
- Tom

> Patch:
> ...
> diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
> index 87726aeb758..0b94dcc4744 100644
> --- a/.pre-commit-config.yaml
> +++ b/.pre-commit-config.yaml
> @@ -22,3 +22,10 @@ repos:
>       - id: isort
>         types_or: [file]
>         files: 'gdb/.*\.py(\.in)?$'
> +  - repo: local
> +    hooks:
> +    - id: line-counter
> +      name: line counter
> +      language: script
> +      entry: ./gdb/contrib/line-counter.sh
> +      files: ^(gdb|gdbsupport|gdbserver)/
> diff --git a/gdb/contrib/line-counter.sh b/gdb/contrib/line-counter.sh
> new file mode 100755
> index 00000000000..257ec3793bb
> --- /dev/null
> +++ b/gdb/contrib/line-counter.sh
> @@ -0,0 +1,5 @@
> +#!/bin/sh
> +
> +git diff --staged > DIFF
> +
> +git diff --staged | egrep -v "^\+\+\+" | egrep "^\+" | wc -l >> DIFF
> ...
> 
> Example output:
> ...
> diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
> index 53f41e67444..e08ee7fb675 100644
> --- a/gdb/doc/gdb.texinfo
> +++ b/gdb/doc/gdb.texinfo
> @@ -51432,6 +51432,8 @@ Richard M. Stallman and Roland H. Pesch, July 1991.
> 
>   @printindex cp
> 
> +BLA
> +
>   @node Command and Variable Index
>   @unnumbered Command, Variable, and Function Index
> 
> 2
> ...
  

Patch

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 87726aeb758..4a3fdd79f3b 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -22,3 +22,10 @@  repos:
     - id: isort
       types_or: [file]
       files: 'gdb/.*\.py(\.in)?$'
+  - repo: local
+    hooks:
+    - id: spell-check
+      name: spell check
+      language: script
+      entry: ./gdb/contrib/spellcheck.sh --check
+      files: ^(gdb|gdbsupport|gdbserver)/