contrib/gcc-changelog: Accept only [PRnnnn] in subject

Message ID 20250311165708.963372-1-christophe.lyon@linaro.org
State New
Headers
Series contrib/gcc-changelog: Accept only [PRnnnn] in subject |

Commit Message

Christophe Lyon March 11, 2025, 4:57 p.m. UTC
  In https://gcc.gnu.org/contribute.html#patches we ask to use [PRnnnn]
without the Bugzilla component identifier and with no space between
'PR' and the number, but git_check_commit.py accepts all forms.  The
patch enforces what we document.

Note that this would reject a few of the recent commits.

contrib/ChangeLog:

	* gcc-changelog/git_commit.py (subject_pr_regex): Rename into
	subject_pr_component_regex.
	(subject_pr_space_regex): New.
	(subject_pr_paren_regex): New.
	(subject_pr2_regex): Remove matching parentheses and rename into
	subject_pr_regex.
	(GitCommit): Add checks for new regexps.
---
 contrib/gcc-changelog/git_commit.py | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)
  

Comments

Jonathan Wakely March 11, 2025, 5:19 p.m. UTC | #1
On Tue, 11 Mar 2025 at 16:57, Christophe Lyon
<christophe.lyon@linaro.org> wrote:
>
> In https://gcc.gnu.org/contribute.html#patches we ask to use [PRnnnn]
> without the Bugzilla component identifier and with no space between
> 'PR' and the number, but git_check_commit.py accepts all forms.  The
> patch enforces what we document.
>
> Note that this would reject a few of the recent commits.
>
> contrib/ChangeLog:
>
>         * gcc-changelog/git_commit.py (subject_pr_regex): Rename into
>         subject_pr_component_regex.
>         (subject_pr_space_regex): New.
>         (subject_pr_paren_regex): New.
>         (subject_pr2_regex): Remove matching parentheses and rename into
>         subject_pr_regex.
>         (GitCommit): Add checks for new regexps.
> ---
>  contrib/gcc-changelog/git_commit.py | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/contrib/gcc-changelog/git_commit.py b/contrib/gcc-changelog/git_commit.py
> index 5c0596c2627..245c8496553 100755
> --- a/contrib/gcc-changelog/git_commit.py
> +++ b/contrib/gcc-changelog/git_commit.py
> @@ -167,8 +167,10 @@ author_line_regex = \
>          re.compile(r'^(?P<datetime>\d{4}-\d{2}-\d{2})\ {2}(?P<name>.*  <.*>)')
>  additional_author_regex = re.compile(r'^\t(?P<spaces>\ *)?(?P<name>.*  <.*>)')
>  changelog_regex = re.compile(r'^(?:[fF]or +)?([a-z0-9+-/]*)ChangeLog:?')
> -subject_pr_regex = re.compile(r'(^|\W)PR\s+(?P<component>[a-zA-Z0-9+-]+)/(?P<pr>\d{4,7})')
> -subject_pr2_regex = re.compile(r'[(\[]PR\s*(?P<pr>\d{4,7})[)\]]')
> +subject_pr_regex = re.compile(r'\[PR(?P<pr>\d{4,7})\]')                    # [PRnnnn]
> +subject_pr_space_regex = re.compile(r'\[PR\s+(?P<pr>\d{4,7})\]')           # [PR nnnn]
> +subject_pr_paren_regex = re.compile(r'\(PR\s*(?P<pr>\d{4,7})\)')           # (PRnnnn) / (PR nnnn)
> +subject_pr_component_regex = re.compile(r'(^|\W)PR\s*(?P<component>[a-zA-Z0-9+-]+)/(?P<pr>\d{4,7})') # PRcomponent/nnnn or PR component/nnnn
>  pr_regex = re.compile(r'\tPR (?P<component>[a-z0-9+-]+\/)?(?P<pr>[0-9]+)$')
>  dr_regex = re.compile(r'\tDR ([0-9]+)$')
>  star_prefix_regex = re.compile(r'\t\*(?P<spaces>\ *)(?P<content>.*)')
> @@ -346,13 +348,15 @@ class GitCommit:
>          self.check_commit_email()
>
>          # Extract PR numbers form the subject line
> -        # Match either [PRnnnn] / (PRnnnn) or PR component/nnnn
> +        # Reject [PR nnnn] / (PR nnnn) or PR component/nnnn
>          if self.info.lines and not self.revert_commit:
> -            self.subject_prs = {m.group('pr') for m in subject_pr2_regex.finditer(info.lines[0])}
> -            for m in subject_pr_regex.finditer(info.lines[0]):
> -                if not m.group('component') in bug_components:
> -                    self.errors.append(Error('invalid PR component in subject', info.lines[0]))
> -                self.subject_prs.add(m.group('pr'))
> +            self.subject_prs = {m.group('pr') for m in subject_pr_regex.finditer(info.lines[0])}
> +            for m in subject_pr_space_regex.finditer(info.lines[0]):
> +                self.errors.append(Error('Use [PRnnn] in subject, not [PR nnn]', info.lines[0]))
> +            for m in subject_pr_paren_regex.finditer(info.lines[0]):
> +                self.errors.append(Error('Use [PRnnn] in subject, not (PRnnn)', info.lines[0]))
> +            for m in subject_pr_component_regex.finditer(info.lines[0]):
> +                self.errors.append(Error('Do not use PR component in subject', info.lines[0]))

Maybe "Do not use PR component/nnn in subject" here?

Otherwise I like the change and the patch looks correct (but I'm not
an approver for contrib).

>
>          # Allow complete deletion of ChangeLog files in a commit
>          project_files = [f for f in self.info.modified_files
> --
> 2.34.1
>
  
Richard Biener March 12, 2025, 7:04 a.m. UTC | #2
On Tue, Mar 11, 2025 at 5:58 PM Christophe Lyon
<christophe.lyon@linaro.org> wrote:
>
> In https://gcc.gnu.org/contribute.html#patches we ask to use [PRnnnn]
> without the Bugzilla component identifier and with no space between
> 'PR' and the number, but git_check_commit.py accepts all forms.  The
> patch enforces what we document.
>
> Note that this would reject a few of the recent commits.

Why would we be this restrictive?  I personally am using

bugzilla-component/number - description

IMO 'PR' is redundant and the component helps screening for area of
maintenance.

That said, I suppose my form wouldn't be rejected, so I'm happy.  I just
wonder why we should force something I don't remember we discussed
at any point.

Richard.

> contrib/ChangeLog:
>
>         * gcc-changelog/git_commit.py (subject_pr_regex): Rename into
>         subject_pr_component_regex.
>         (subject_pr_space_regex): New.
>         (subject_pr_paren_regex): New.
>         (subject_pr2_regex): Remove matching parentheses and rename into
>         subject_pr_regex.
>         (GitCommit): Add checks for new regexps.
> ---
>  contrib/gcc-changelog/git_commit.py | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/contrib/gcc-changelog/git_commit.py b/contrib/gcc-changelog/git_commit.py
> index 5c0596c2627..245c8496553 100755
> --- a/contrib/gcc-changelog/git_commit.py
> +++ b/contrib/gcc-changelog/git_commit.py
> @@ -167,8 +167,10 @@ author_line_regex = \
>          re.compile(r'^(?P<datetime>\d{4}-\d{2}-\d{2})\ {2}(?P<name>.*  <.*>)')
>  additional_author_regex = re.compile(r'^\t(?P<spaces>\ *)?(?P<name>.*  <.*>)')
>  changelog_regex = re.compile(r'^(?:[fF]or +)?([a-z0-9+-/]*)ChangeLog:?')
> -subject_pr_regex = re.compile(r'(^|\W)PR\s+(?P<component>[a-zA-Z0-9+-]+)/(?P<pr>\d{4,7})')
> -subject_pr2_regex = re.compile(r'[(\[]PR\s*(?P<pr>\d{4,7})[)\]]')
> +subject_pr_regex = re.compile(r'\[PR(?P<pr>\d{4,7})\]')                    # [PRnnnn]
> +subject_pr_space_regex = re.compile(r'\[PR\s+(?P<pr>\d{4,7})\]')           # [PR nnnn]
> +subject_pr_paren_regex = re.compile(r'\(PR\s*(?P<pr>\d{4,7})\)')           # (PRnnnn) / (PR nnnn)
> +subject_pr_component_regex = re.compile(r'(^|\W)PR\s*(?P<component>[a-zA-Z0-9+-]+)/(?P<pr>\d{4,7})') # PRcomponent/nnnn or PR component/nnnn
>  pr_regex = re.compile(r'\tPR (?P<component>[a-z0-9+-]+\/)?(?P<pr>[0-9]+)$')
>  dr_regex = re.compile(r'\tDR ([0-9]+)$')
>  star_prefix_regex = re.compile(r'\t\*(?P<spaces>\ *)(?P<content>.*)')
> @@ -346,13 +348,15 @@ class GitCommit:
>          self.check_commit_email()
>
>          # Extract PR numbers form the subject line
> -        # Match either [PRnnnn] / (PRnnnn) or PR component/nnnn
> +        # Reject [PR nnnn] / (PR nnnn) or PR component/nnnn
>          if self.info.lines and not self.revert_commit:
> -            self.subject_prs = {m.group('pr') for m in subject_pr2_regex.finditer(info.lines[0])}
> -            for m in subject_pr_regex.finditer(info.lines[0]):
> -                if not m.group('component') in bug_components:
> -                    self.errors.append(Error('invalid PR component in subject', info.lines[0]))
> -                self.subject_prs.add(m.group('pr'))
> +            self.subject_prs = {m.group('pr') for m in subject_pr_regex.finditer(info.lines[0])}
> +            for m in subject_pr_space_regex.finditer(info.lines[0]):
> +                self.errors.append(Error('Use [PRnnn] in subject, not [PR nnn]', info.lines[0]))
> +            for m in subject_pr_paren_regex.finditer(info.lines[0]):
> +                self.errors.append(Error('Use [PRnnn] in subject, not (PRnnn)', info.lines[0]))
> +            for m in subject_pr_component_regex.finditer(info.lines[0]):
> +                self.errors.append(Error('Do not use PR component in subject', info.lines[0]))
>
>          # Allow complete deletion of ChangeLog files in a commit
>          project_files = [f for f in self.info.modified_files
> --
> 2.34.1
>
  
Christophe Lyon March 12, 2025, 9:49 a.m. UTC | #3
On Wed, 12 Mar 2025 at 08:05, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Tue, Mar 11, 2025 at 5:58 PM Christophe Lyon
> <christophe.lyon@linaro.org> wrote:
> >
> > In https://gcc.gnu.org/contribute.html#patches we ask to use [PRnnnn]
> > without the Bugzilla component identifier and with no space between
> > 'PR' and the number, but git_check_commit.py accepts all forms.  The
> > patch enforces what we document.
> >
> > Note that this would reject a few of the recent commits.
>
> Why would we be this restrictive?  I personally am using

Well, someone made me realize that I should have used [PRnnnn] rather
than (PR nnnn) in a recent commit, as documented in "Contributing to
GCC".

Since I use gcc-verify, I looked at why I missed that, and hence
proposed this patch, so that our tools match what we document....

>
> bugzilla-component/number - description
>
> IMO 'PR' is redundant and the component helps screening for area of
> maintenance.
>
> That said, I suppose my form wouldn't be rejected, so I'm happy.  I just
> wonder why we should force something I don't remember we discussed
> at any point.
>

.... so another alternative is to change our document, to match the
actual practice and what the tools do?

Thanks,

Christophe

> Richard.
>
> > contrib/ChangeLog:
> >
> >         * gcc-changelog/git_commit.py (subject_pr_regex): Rename into
> >         subject_pr_component_regex.
> >         (subject_pr_space_regex): New.
> >         (subject_pr_paren_regex): New.
> >         (subject_pr2_regex): Remove matching parentheses and rename into
> >         subject_pr_regex.
> >         (GitCommit): Add checks for new regexps.
> > ---
> >  contrib/gcc-changelog/git_commit.py | 20 ++++++++++++--------
> >  1 file changed, 12 insertions(+), 8 deletions(-)
> >
> > diff --git a/contrib/gcc-changelog/git_commit.py b/contrib/gcc-changelog/git_commit.py
> > index 5c0596c2627..245c8496553 100755
> > --- a/contrib/gcc-changelog/git_commit.py
> > +++ b/contrib/gcc-changelog/git_commit.py
> > @@ -167,8 +167,10 @@ author_line_regex = \
> >          re.compile(r'^(?P<datetime>\d{4}-\d{2}-\d{2})\ {2}(?P<name>.*  <.*>)')
> >  additional_author_regex = re.compile(r'^\t(?P<spaces>\ *)?(?P<name>.*  <.*>)')
> >  changelog_regex = re.compile(r'^(?:[fF]or +)?([a-z0-9+-/]*)ChangeLog:?')
> > -subject_pr_regex = re.compile(r'(^|\W)PR\s+(?P<component>[a-zA-Z0-9+-]+)/(?P<pr>\d{4,7})')
> > -subject_pr2_regex = re.compile(r'[(\[]PR\s*(?P<pr>\d{4,7})[)\]]')
> > +subject_pr_regex = re.compile(r'\[PR(?P<pr>\d{4,7})\]')                    # [PRnnnn]
> > +subject_pr_space_regex = re.compile(r'\[PR\s+(?P<pr>\d{4,7})\]')           # [PR nnnn]
> > +subject_pr_paren_regex = re.compile(r'\(PR\s*(?P<pr>\d{4,7})\)')           # (PRnnnn) / (PR nnnn)
> > +subject_pr_component_regex = re.compile(r'(^|\W)PR\s*(?P<component>[a-zA-Z0-9+-]+)/(?P<pr>\d{4,7})') # PRcomponent/nnnn or PR component/nnnn
> >  pr_regex = re.compile(r'\tPR (?P<component>[a-z0-9+-]+\/)?(?P<pr>[0-9]+)$')
> >  dr_regex = re.compile(r'\tDR ([0-9]+)$')
> >  star_prefix_regex = re.compile(r'\t\*(?P<spaces>\ *)(?P<content>.*)')
> > @@ -346,13 +348,15 @@ class GitCommit:
> >          self.check_commit_email()
> >
> >          # Extract PR numbers form the subject line
> > -        # Match either [PRnnnn] / (PRnnnn) or PR component/nnnn
> > +        # Reject [PR nnnn] / (PR nnnn) or PR component/nnnn
> >          if self.info.lines and not self.revert_commit:
> > -            self.subject_prs = {m.group('pr') for m in subject_pr2_regex.finditer(info.lines[0])}
> > -            for m in subject_pr_regex.finditer(info.lines[0]):
> > -                if not m.group('component') in bug_components:
> > -                    self.errors.append(Error('invalid PR component in subject', info.lines[0]))
> > -                self.subject_prs.add(m.group('pr'))
> > +            self.subject_prs = {m.group('pr') for m in subject_pr_regex.finditer(info.lines[0])}
> > +            for m in subject_pr_space_regex.finditer(info.lines[0]):
> > +                self.errors.append(Error('Use [PRnnn] in subject, not [PR nnn]', info.lines[0]))
> > +            for m in subject_pr_paren_regex.finditer(info.lines[0]):
> > +                self.errors.append(Error('Use [PRnnn] in subject, not (PRnnn)', info.lines[0]))
> > +            for m in subject_pr_component_regex.finditer(info.lines[0]):
> > +                self.errors.append(Error('Do not use PR component in subject', info.lines[0]))
> >
> >          # Allow complete deletion of ChangeLog files in a commit
> >          project_files = [f for f in self.info.modified_files
> > --
> > 2.34.1
> >
  
Jakub Jelinek March 12, 2025, 9:55 a.m. UTC | #4
On Wed, Mar 12, 2025 at 10:49:58AM +0100, Christophe Lyon wrote:
> On Wed, 12 Mar 2025 at 08:05, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> > On Tue, Mar 11, 2025 at 5:58 PM Christophe Lyon
> > <christophe.lyon@linaro.org> wrote:
> > >
> > > In https://gcc.gnu.org/contribute.html#patches we ask to use [PRnnnn]
> > > without the Bugzilla component identifier and with no space between
> > > 'PR' and the number, but git_check_commit.py accepts all forms.  The
> > > patch enforces what we document.
> > >
> > > Note that this would reject a few of the recent commits.
> >
> > Why would we be this restrictive?  I personally am using
> 
> Well, someone made me realize that I should have used [PRnnnn] rather
> than (PR nnnn) in a recent commit, as documented in "Contributing to
> GCC".
> 
> Since I use gcc-verify, I looked at why I missed that, and hence
> proposed this patch, so that our tools match what we document....
> 
> >
> > bugzilla-component/number - description
> >
> > IMO 'PR' is redundant and the component helps screening for area of
> > maintenance.

The documented style is
bugzilla-component: description [PRnumber]
(it doesn't need to be exactly bugzilla-component, though sometimes the
component and bugzilla-component are the same thing).
I believe it has been discussed on the mailing lists before it was
added to contribute.html.

	Jakub
  
Jonathan Wakely March 12, 2025, 10:02 a.m. UTC | #5
On Wed, 12 Mar 2025 at 09:55, Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Wed, Mar 12, 2025 at 10:49:58AM +0100, Christophe Lyon wrote:
> > On Wed, 12 Mar 2025 at 08:05, Richard Biener <richard.guenther@gmail.com> wrote:
> > >
> > > On Tue, Mar 11, 2025 at 5:58 PM Christophe Lyon
> > > <christophe.lyon@linaro.org> wrote:
> > > >
> > > > In https://gcc.gnu.org/contribute.html#patches we ask to use [PRnnnn]
> > > > without the Bugzilla component identifier and with no space between
> > > > 'PR' and the number, but git_check_commit.py accepts all forms.  The
> > > > patch enforces what we document.
> > > >
> > > > Note that this would reject a few of the recent commits.
> > >
> > > Why would we be this restrictive?  I personally am using
> >
> > Well, someone made me realize that I should have used [PRnnnn] rather
> > than (PR nnnn) in a recent commit, as documented in "Contributing to
> > GCC".
> >
> > Since I use gcc-verify, I looked at why I missed that, and hence
> > proposed this patch, so that our tools match what we document....
> >
> > >
> > > bugzilla-component/number - description
> > >
> > > IMO 'PR' is redundant and the component helps screening for area of
> > > maintenance.
>
> The documented style is
> bugzilla-component: description [PRnumber]

And so I do think we should reject (PR nnn) at the end, for example.

Maybe the other checks don't need to be so strict, I don't feel
strongly about that.

> (it doesn't need to be exactly bugzilla-component, though sometimes the
> component and bugzilla-component are the same thing).
> I believe it has been discussed on the mailing lists before it was
> added to contribute.html.
>
>         Jakub
>
  
Richard Biener March 12, 2025, 10:16 a.m. UTC | #6
On Wed, Mar 12, 2025 at 11:02 AM Jonathan Wakely <jwakely@redhat.com> wrote:
>
> On Wed, 12 Mar 2025 at 09:55, Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Wed, Mar 12, 2025 at 10:49:58AM +0100, Christophe Lyon wrote:
> > > On Wed, 12 Mar 2025 at 08:05, Richard Biener <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Tue, Mar 11, 2025 at 5:58 PM Christophe Lyon
> > > > <christophe.lyon@linaro.org> wrote:
> > > > >
> > > > > In https://gcc.gnu.org/contribute.html#patches we ask to use [PRnnnn]
> > > > > without the Bugzilla component identifier and with no space between
> > > > > 'PR' and the number, but git_check_commit.py accepts all forms.  The
> > > > > patch enforces what we document.
> > > > >
> > > > > Note that this would reject a few of the recent commits.
> > > >
> > > > Why would we be this restrictive?  I personally am using
> > >
> > > Well, someone made me realize that I should have used [PRnnnn] rather
> > > than (PR nnnn) in a recent commit, as documented in "Contributing to
> > > GCC".
> > >
> > > Since I use gcc-verify, I looked at why I missed that, and hence
> > > proposed this patch, so that our tools match what we document....
> > >
> > > >
> > > > bugzilla-component/number - description
> > > >
> > > > IMO 'PR' is redundant and the component helps screening for area of
> > > > maintenance.
> >
> > The documented style is
> > bugzilla-component: description [PRnumber]
>
> And so I do think we should reject (PR nnn) at the end, for example.
>
> Maybe the other checks don't need to be so strict, I don't feel
> strongly about that.

The question is whether there's good heuristic to detect this is a fix
for a PR - such heuristic should look at the bugzilla relevant line
before the ChangeLog (though rejecting otherwise heuristically
identified PR fixes with that missing would be nice as well)

Richard.

>
> > (it doesn't need to be exactly bugzilla-component, though sometimes the
> > component and bugzilla-component are the same thing).
> > I believe it has been discussed on the mailing lists before it was
> > added to contribute.html.
> >
> >         Jakub
> >
>
  

Patch

diff --git a/contrib/gcc-changelog/git_commit.py b/contrib/gcc-changelog/git_commit.py
index 5c0596c2627..245c8496553 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -167,8 +167,10 @@  author_line_regex = \
         re.compile(r'^(?P<datetime>\d{4}-\d{2}-\d{2})\ {2}(?P<name>.*  <.*>)')
 additional_author_regex = re.compile(r'^\t(?P<spaces>\ *)?(?P<name>.*  <.*>)')
 changelog_regex = re.compile(r'^(?:[fF]or +)?([a-z0-9+-/]*)ChangeLog:?')
-subject_pr_regex = re.compile(r'(^|\W)PR\s+(?P<component>[a-zA-Z0-9+-]+)/(?P<pr>\d{4,7})')
-subject_pr2_regex = re.compile(r'[(\[]PR\s*(?P<pr>\d{4,7})[)\]]')
+subject_pr_regex = re.compile(r'\[PR(?P<pr>\d{4,7})\]')                    # [PRnnnn]
+subject_pr_space_regex = re.compile(r'\[PR\s+(?P<pr>\d{4,7})\]')           # [PR nnnn]
+subject_pr_paren_regex = re.compile(r'\(PR\s*(?P<pr>\d{4,7})\)')           # (PRnnnn) / (PR nnnn)
+subject_pr_component_regex = re.compile(r'(^|\W)PR\s*(?P<component>[a-zA-Z0-9+-]+)/(?P<pr>\d{4,7})') # PRcomponent/nnnn or PR component/nnnn
 pr_regex = re.compile(r'\tPR (?P<component>[a-z0-9+-]+\/)?(?P<pr>[0-9]+)$')
 dr_regex = re.compile(r'\tDR ([0-9]+)$')
 star_prefix_regex = re.compile(r'\t\*(?P<spaces>\ *)(?P<content>.*)')
@@ -346,13 +348,15 @@  class GitCommit:
         self.check_commit_email()
 
         # Extract PR numbers form the subject line
-        # Match either [PRnnnn] / (PRnnnn) or PR component/nnnn
+        # Reject [PR nnnn] / (PR nnnn) or PR component/nnnn
         if self.info.lines and not self.revert_commit:
-            self.subject_prs = {m.group('pr') for m in subject_pr2_regex.finditer(info.lines[0])}
-            for m in subject_pr_regex.finditer(info.lines[0]):
-                if not m.group('component') in bug_components:
-                    self.errors.append(Error('invalid PR component in subject', info.lines[0]))
-                self.subject_prs.add(m.group('pr'))
+            self.subject_prs = {m.group('pr') for m in subject_pr_regex.finditer(info.lines[0])}
+            for m in subject_pr_space_regex.finditer(info.lines[0]):
+                self.errors.append(Error('Use [PRnnn] in subject, not [PR nnn]', info.lines[0]))
+            for m in subject_pr_paren_regex.finditer(info.lines[0]):
+                self.errors.append(Error('Use [PRnnn] in subject, not (PRnnn)', info.lines[0]))
+            for m in subject_pr_component_regex.finditer(info.lines[0]):
+                self.errors.append(Error('Do not use PR component in subject', info.lines[0]))
 
         # Allow complete deletion of ChangeLog files in a commit
         project_files = [f for f in self.info.modified_files