[RFA] input: add get_source_text_between

Message ID 20221103195902.2114479-1-jason@redhat.com
State New
Headers
Series [RFA] input: add get_source_text_between |

Commit Message

Jason Merrill Nov. 3, 2022, 7:59 p.m. UTC
  Tested x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

The c++-contracts branch uses this to retrieve the source form of the
contract predicate, to be returned by contract_violation::comment().

gcc/ChangeLog:

	* input.cc (get_source_text_between): New fn.
	* input.h (get_source_text_between): Declare.
---
 gcc/input.h  |  1 +
 gcc/input.cc | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+)


base-commit: a4cd2389276a30c39034a83d640ce68fa407bac1
prerequisite-patch-id: 329bc16a88dc9a3b13cd3fcecb3678826cc592dc
prerequisite-patch-id: 49e922c10f6da687d9da3f6a0fd20f324bd352d6
  

Comments

David Malcolm Nov. 3, 2022, 11:06 p.m. UTC | #1
On Thu, 2022-11-03 at 15:59 -0400, Jason Merrill via Gcc-patches wrote:
> Tested x86_64-pc-linux-gnu, OK for trunk?
> 
> -- >8 --
> 
> The c++-contracts branch uses this to retrieve the source form of the
> contract predicate, to be returned by contract_violation::comment().
> 
> gcc/ChangeLog:
> 
>         * input.cc (get_source_text_between): New fn.
>         * input.h (get_source_text_between): Declare.
> ---
>  gcc/input.h  |  1 +
>  gcc/input.cc | 76
> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 77 insertions(+)
> 
> diff --git a/gcc/input.h b/gcc/input.h
> index 11c571d076f..f18769950b5 100644
> --- a/gcc/input.h
> +++ b/gcc/input.h
> @@ -111,6 +111,7 @@ class char_span
>  };
>  
>  extern char_span location_get_source_line (const char *file_path,
> int line);
> +extern char *get_source_text_between (location_t, location_t);
>  
>  extern bool location_missing_trailing_newline (const char
> *file_path);
>  
> diff --git a/gcc/input.cc b/gcc/input.cc
> index a28abfac5ac..9b36356338a 100644
> --- a/gcc/input.cc
> +++ b/gcc/input.cc
> @@ -949,6 +949,82 @@ location_get_source_line (const char *file_path,
> int line)
>    return char_span (buffer, len);
>  }
>  
> +/* Return a copy of the source text between two locations.  The
> caller is
> +   responsible for freeing the return value.  */
> +
> +char *
> +get_source_text_between (location_t start, location_t end)
> +{
> +  expanded_location expstart =
> +    expand_location_to_spelling_point (start,
> LOCATION_ASPECT_START);
> +  expanded_location expend =
> +    expand_location_to_spelling_point (end, LOCATION_ASPECT_FINISH);
> +
> +  /* If the locations are in different files or the end comes before
> the
> +     start, abort and return nothing.  */

I don't like the use of the term "abort" here, as it suggests to me the
use of abort(3).  Maybe "bail out" instead?


> +  if (!expstart.file || !expend.file)
> +    return NULL;
> +  if (strcmp (expstart.file, expend.file) != 0)
> +    return NULL;
> +  if (expstart.line > expend.line)
> +    return NULL;
> +  if (expstart.line == expend.line
> +      && expstart.column > expend.column)
> +    return NULL;

We occasionally use the convention that
  (column == 0)
means "the whole line".  Probably should detect that case and bail out
early for it.


> +
> +  /* For a single line we need to trim both edges.  */
> +  if (expstart.line == expend.line)
> +    {
> +      char_span line = location_get_source_line (expstart.file,
> expstart.line);
> +      if (line.length () < 1)
> +       return NULL;
> +      int s = expstart.column - 1;
> +      int l = expend.column - s;

Can we please avoid lower-case L "ell" for variable names, as it looks
so similar to the numeral for one.  "len" would be more descriptive
here.


> +      if (line.length () < (size_t)expend.column)
> +       return NULL;
> +      return line.subspan (s, l).xstrdup ();
> +    }
> +
> +  struct obstack buf_obstack;
> +  obstack_init (&buf_obstack);
> +
> +  /* Loop through all lines in the range and append each to buf; may
> trim
> +     parts of the start and end lines off depending on column
> values.  */
> +  for (int l = expstart.line; l <= expend.line; ++l)

Again, please let's not have a var named "l".  Maybe "iter_line" as
that's what is being done?

> +    {
> +      char_span line = location_get_source_line (expstart.file, l);
> +      if (line.length () < 1 && (l != expstart.line && l !=
> expend.line))

...especially as I *think* the first comparison is against numeral one,
whereas comparisons two and three are against lower-case ell, but I'm
having to squint at the font in my email client to be sure :-/

> +       continue;
> +
> +      /* For the first line in the range, only start at
> expstart.column */
> +      if (l == expstart.line)
> +       {
> +         if (expstart.column == 0)
> +           return NULL;
> +         if (line.length () < (size_t)expstart.column - 1)
> +           return NULL;
> +         line = line.subspan (expstart.column - 1,
> +                              line.length() - expstart.column + 1);
> +       }
> +      /* For the last line, don't go past expend.column */
> +      else if (l == expend.line)
> +       {
> +         if (line.length () < (size_t)expend.column)
> +           return NULL;
> +         line = line.subspan (0, expend.column);
> +       }
> +
> +      obstack_grow (&buf_obstack, line.get_buffer (), line.length
> ());

Is this accumulating the trailing newline characters into the
buf_obstack?  I *think* it is, but it seems worth a comment for each of
the three cases (first line, intermediate line, last line).

> +    }
> +
> +  /* NUL-terminate and finish the buf obstack.  */
> +  obstack_1grow (&buf_obstack, 0);
> +  const char *buf = (const char *) obstack_finish (&buf_obstack);
> +
> +  /* TODO should we collapse/trim newlines and runs of spaces?  */
> +  return xstrdup (buf);
> +}
> +

Do you have test coverage for this from the DejaGnu side?  If not, you
could add selftest coverage for this; see input.cc's
test_reading_source_line for something similar.

OK for trunk otherwise.

Hope this is helpful
Dave
  
Jason Merrill Nov. 4, 2022, 2:27 p.m. UTC | #2
On 11/3/22 19:06, David Malcolm wrote:
> On Thu, 2022-11-03 at 15:59 -0400, Jason Merrill via Gcc-patches wrote:
>> Tested x86_64-pc-linux-gnu, OK for trunk?
>>
>> -- >8 --
>>
>> The c++-contracts branch uses this to retrieve the source form of the
>> contract predicate, to be returned by contract_violation::comment().
>>
>> gcc/ChangeLog:
>>
>>          * input.cc (get_source_text_between): New fn.
>>          * input.h (get_source_text_between): Declare.
>> ---
>>   gcc/input.h  |  1 +
>>   gcc/input.cc | 76
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 77 insertions(+)
>>
>> diff --git a/gcc/input.h b/gcc/input.h
>> index 11c571d076f..f18769950b5 100644
>> --- a/gcc/input.h
>> +++ b/gcc/input.h
>> @@ -111,6 +111,7 @@ class char_span
>>   };
>>   
>>   extern char_span location_get_source_line (const char *file_path,
>> int line);
>> +extern char *get_source_text_between (location_t, location_t);
>>   
>>   extern bool location_missing_trailing_newline (const char
>> *file_path);
>>   
>> diff --git a/gcc/input.cc b/gcc/input.cc
>> index a28abfac5ac..9b36356338a 100644
>> --- a/gcc/input.cc
>> +++ b/gcc/input.cc
>> @@ -949,6 +949,82 @@ location_get_source_line (const char *file_path,
>> int line)
>>     return char_span (buffer, len);
>>   }
>>   
>> +/* Return a copy of the source text between two locations.  The
>> caller is
>> +   responsible for freeing the return value.  */
>> +
>> +char *
>> +get_source_text_between (location_t start, location_t end)
>> +{
>> +  expanded_location expstart =
>> +    expand_location_to_spelling_point (start,
>> LOCATION_ASPECT_START);
>> +  expanded_location expend =
>> +    expand_location_to_spelling_point (end, LOCATION_ASPECT_FINISH);
>> +
>> +  /* If the locations are in different files or the end comes before
>> the
>> +     start, abort and return nothing.  */
> 
> I don't like the use of the term "abort" here, as it suggests to me the
> use of abort(3).  Maybe "bail out" instead?

I went with "give up".

>> +  if (!expstart.file || !expend.file)
>> +    return NULL;
>> +  if (strcmp (expstart.file, expend.file) != 0)
>> +    return NULL;
>> +  if (expstart.line > expend.line)
>> +    return NULL;
>> +  if (expstart.line == expend.line
>> +      && expstart.column > expend.column)
>> +    return NULL;
> 
> We occasionally use the convention that
>    (column == 0)
> means "the whole line".  Probably should detect that case and bail out
> early for it.

Done.

>> +
>> +  /* For a single line we need to trim both edges.  */
>> +  if (expstart.line == expend.line)
>> +    {
>> +      char_span line = location_get_source_line (expstart.file,
>> expstart.line);
>> +      if (line.length () < 1)
>> +       return NULL;
>> +      int s = expstart.column - 1;
>> +      int l = expend.column - s;
> 
> Can we please avoid lower-case L "ell" for variable names, as it looks
> so similar to the numeral for one.  "len" would be more descriptive
> here.

Done.

>> +      if (line.length () < (size_t)expend.column)
>> +       return NULL;
>> +      return line.subspan (s, l).xstrdup ();
>> +    }
>> +
>> +  struct obstack buf_obstack;
>> +  obstack_init (&buf_obstack);
>> +
>> +  /* Loop through all lines in the range and append each to buf; may
>> trim
>> +     parts of the start and end lines off depending on column
>> values.  */
>> +  for (int l = expstart.line; l <= expend.line; ++l)
> 
> Again, please let's not have a var named "l".  Maybe "iter_line" as
> that's what is being done?
> 
>> +    {
>> +      char_span line = location_get_source_line (expstart.file, l);
>> +      if (line.length () < 1 && (l != expstart.line && l !=
>> expend.line))
> 
> ...especially as I *think* the first comparison is against numeral one,
> whereas comparisons two and three are against lower-case ell, but I'm
> having to squint at the font in my email client to be sure :-/

Done.  But also allow me to recommend

https://nodnod.net/posts/inconsolata-dz/

>> +       continue;
>> +
>> +      /* For the first line in the range, only start at
>> expstart.column */
>> +      if (l == expstart.line)
>> +       {
>> +         if (expstart.column == 0)
>> +           return NULL;
>> +         if (line.length () < (size_t)expstart.column - 1)
>> +           return NULL;
>> +         line = line.subspan (expstart.column - 1,
>> +                              line.length() - expstart.column + 1);
>> +       }
>> +      /* For the last line, don't go past expend.column */
>> +      else if (l == expend.line)
>> +       {
>> +         if (line.length () < (size_t)expend.column)
>> +           return NULL;
>> +         line = line.subspan (0, expend.column);
>> +       }
>> +
>> +      obstack_grow (&buf_obstack, line.get_buffer (), line.length
>> ());
> 
> Is this accumulating the trailing newline characters into the
> buf_obstack?  I *think* it is, but it seems worth a comment for each of
> the three cases (first line, intermediate line, last line).

It is not; I've added a comment to that effect, and also implemented the 
TODO of collapsing a series of whitespace.

>> +    }
>> +
>> +  /* NUL-terminate and finish the buf obstack.  */
>> +  obstack_1grow (&buf_obstack, 0);
>> +  const char *buf = (const char *) obstack_finish (&buf_obstack);
>> +
>> +  /* TODO should we collapse/trim newlines and runs of spaces?  */
>> +  return xstrdup (buf);
>> +}
>> +
> 
> Do you have test coverage for this from the DejaGnu side?  If not, you
> could add selftest coverage for this; see input.cc's
> test_reading_source_line for something similar.

There is test coverage for the output of the the contract violation 
handler, which involves printing the result of this function.
  
David Malcolm Nov. 4, 2022, 3:16 p.m. UTC | #3
On Fri, 2022-11-04 at 10:27 -0400, Jason Merrill wrote:
> On 11/3/22 19:06, David Malcolm wrote:
> > On Thu, 2022-11-03 at 15:59 -0400, Jason Merrill via Gcc-patches
> > wrote:

[...snip...]

> > 
> > 
> > Do you have test coverage for this from the DejaGnu side?  If not,
> > you
> > could add selftest coverage for this; see input.cc's
> > test_reading_source_line for something similar.
> 
> There is test coverage for the output of the the contract violation 
> handler, which involves printing the result of this function.

Thanks.   Is this test posted somwehere?  I was looking in:
 https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604974.html
but I'm not seeing it.  Sorry if I'm missing something here.

Ideally we should have coverage for the three cases of:
(a) bails out and returns NULL
(b) single-line case
(c) multi-line case


> index a28abfac5ac..04d0809bfdf 100644
> --- a/gcc/input.cc
> +++ b/gcc/input.cc
> @@ -949,6 +949,97 @@ location_get_source_line (const char *file_path, int line)
>    return char_span (buffer, len);
>  }

Strings in input.cc are not always NUL-terminated, so...

>  
> +/* Return a copy of the source text between two locations.  The caller is
> +   responsible for freeing the return value.  */

...please note in the comment that if non-NULL, the copy is NUL-
terminated (I checked both exit paths that can return non-NULL, and
they do NUL-terminate their buffers).

OK with that nit fixed.

Thanks
Dave
  
Jason Merrill Nov. 4, 2022, 5:06 p.m. UTC | #4
On 11/4/22 11:16, David Malcolm wrote:
> On Fri, 2022-11-04 at 10:27 -0400, Jason Merrill wrote:
>> On 11/3/22 19:06, David Malcolm wrote:
>>> On Thu, 2022-11-03 at 15:59 -0400, Jason Merrill via Gcc-patches
>>> wrote:
> 
> [...snip...]
> 
>>>
>>>
>>> Do you have test coverage for this from the DejaGnu side?  If not,
>>> you
>>> could add selftest coverage for this; see input.cc's
>>> test_reading_source_line for something similar.
>>
>> There is test coverage for the output of the the contract violation
>> handler, which involves printing the result of this function.
> 
> Thanks.   Is this test posted somwehere?  I was looking in:
>   https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604974.html
> but I'm not seeing it.  Sorry if I'm missing something here.

The tests are in the newer message

   https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605072.html

> Ideally we should have coverage for the three cases of:
> (a) bails out and returns NULL

This isn't tested because it should never happen.

> (b) single-line case

contracts-tmpl-spec3.C etc.

> (c) multi-line case

contracts-multiline1.C

>> index a28abfac5ac..04d0809bfdf 100644
>> --- a/gcc/input.cc
>> +++ b/gcc/input.cc
>> @@ -949,6 +949,97 @@ location_get_source_line (const char *file_path, int line)
>>     return char_span (buffer, len);
>>   }
> 
> Strings in input.cc are not always NUL-terminated, so...
> 
>>   
>> +/* Return a copy of the source text between two locations.  The caller is
>> +   responsible for freeing the return value.  */
> 
> ...please note in the comment that if non-NULL, the copy is NUL-
> terminated (I checked both exit paths that can return non-NULL, and
> they do NUL-terminate their buffers).
> 
> OK with that nit fixed.

Thanks, pushing this:
  
David Malcolm Nov. 5, 2022, 2 a.m. UTC | #5
On Fri, 2022-11-04 at 13:06 -0400, Jason Merrill wrote:
> On 11/4/22 11:16, David Malcolm wrote:
> > On Fri, 2022-11-04 at 10:27 -0400, Jason Merrill wrote:
> > > On 11/3/22 19:06, David Malcolm wrote:
> > > > On Thu, 2022-11-03 at 15:59 -0400, Jason Merrill via Gcc-
> > > > patches
> > > > wrote:
> > 
> > [...snip...]
> > 
> > > > 
> > > > 
> > > > Do you have test coverage for this from the DejaGnu side?  If
> > > > not,
> > > > you
> > > > could add selftest coverage for this; see input.cc's
> > > > test_reading_source_line for something similar.
> > > 
> > > There is test coverage for the output of the the contract
> > > violation
> > > handler, which involves printing the result of this function.
> > 
> > Thanks.   Is this test posted somwehere?  I was looking in:
> >  
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604974.html
> > but I'm not seeing it.  Sorry if I'm missing something here.
> 
> The tests are in the newer message
> 
>   
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605072.html
> 
> > Ideally we should have coverage for the three cases of:
> > (a) bails out and returns NULL
> 
> This isn't tested because it should never happen.

I'm guessing someone will run into it in the wild at some point.  With
very large files we eventually give up on tracking column numbers, and
so we get location_t values with column number == 0, and FWIW
  gcc/testsuite/gcc.dg/plugin/location_overflow_plugin.c
makes it feasible to test such cases.

But obviously that's low priority.

Dave

> 
> > (b) single-line case
> 
> contracts-tmpl-spec3.C etc.
> 
> > (c) multi-line case
> 
> contracts-multiline1.C
> 
> > > index a28abfac5ac..04d0809bfdf 100644
> > > --- a/gcc/input.cc
> > > +++ b/gcc/input.cc
> > > @@ -949,6 +949,97 @@ location_get_source_line (const char
> > > *file_path, int line)
> > >     return char_span (buffer, len);
> > >   }
> > 
> > Strings in input.cc are not always NUL-terminated, so...
> > 
> > >   
> > > +/* Return a copy of the source text between two locations.  The
> > > caller is
> > > +   responsible for freeing the return value.  */
> > 
> > ...please note in the comment that if non-NULL, the copy is NUL-
> > terminated (I checked both exit paths that can return non-NULL, and
> > they do NUL-terminate their buffers).
> > 
> > OK with that nit fixed.
> 
> Thanks, pushing this:
  

Patch

diff --git a/gcc/input.h b/gcc/input.h
index 11c571d076f..f18769950b5 100644
--- a/gcc/input.h
+++ b/gcc/input.h
@@ -111,6 +111,7 @@  class char_span
 };
 
 extern char_span location_get_source_line (const char *file_path, int line);
+extern char *get_source_text_between (location_t, location_t);
 
 extern bool location_missing_trailing_newline (const char *file_path);
 
diff --git a/gcc/input.cc b/gcc/input.cc
index a28abfac5ac..9b36356338a 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -949,6 +949,82 @@  location_get_source_line (const char *file_path, int line)
   return char_span (buffer, len);
 }
 
+/* Return a copy of the source text between two locations.  The caller is
+   responsible for freeing the return value.  */
+
+char *
+get_source_text_between (location_t start, location_t end)
+{
+  expanded_location expstart =
+    expand_location_to_spelling_point (start, LOCATION_ASPECT_START);
+  expanded_location expend =
+    expand_location_to_spelling_point (end, LOCATION_ASPECT_FINISH);
+
+  /* If the locations are in different files or the end comes before the
+     start, abort and return nothing.  */
+  if (!expstart.file || !expend.file)
+    return NULL;
+  if (strcmp (expstart.file, expend.file) != 0)
+    return NULL;
+  if (expstart.line > expend.line)
+    return NULL;
+  if (expstart.line == expend.line
+      && expstart.column > expend.column)
+    return NULL;
+
+  /* For a single line we need to trim both edges.  */
+  if (expstart.line == expend.line)
+    {
+      char_span line = location_get_source_line (expstart.file, expstart.line);
+      if (line.length () < 1)
+	return NULL;
+      int s = expstart.column - 1;
+      int l = expend.column - s;
+      if (line.length () < (size_t)expend.column)
+	return NULL;
+      return line.subspan (s, l).xstrdup ();
+    }
+
+  struct obstack buf_obstack;
+  obstack_init (&buf_obstack);
+
+  /* Loop through all lines in the range and append each to buf; may trim
+     parts of the start and end lines off depending on column values.  */
+  for (int l = expstart.line; l <= expend.line; ++l)
+    {
+      char_span line = location_get_source_line (expstart.file, l);
+      if (line.length () < 1 && (l != expstart.line && l != expend.line))
+	continue;
+
+      /* For the first line in the range, only start at expstart.column */
+      if (l == expstart.line)
+	{
+	  if (expstart.column == 0)
+	    return NULL;
+	  if (line.length () < (size_t)expstart.column - 1)
+	    return NULL;
+	  line = line.subspan (expstart.column - 1,
+			       line.length() - expstart.column + 1);
+	}
+      /* For the last line, don't go past expend.column */
+      else if (l == expend.line)
+	{
+	  if (line.length () < (size_t)expend.column)
+	    return NULL;
+	  line = line.subspan (0, expend.column);
+	}
+
+      obstack_grow (&buf_obstack, line.get_buffer (), line.length ());
+    }
+
+  /* NUL-terminate and finish the buf obstack.  */
+  obstack_1grow (&buf_obstack, 0);
+  const char *buf = (const char *) obstack_finish (&buf_obstack);
+
+  /* TODO should we collapse/trim newlines and runs of spaces?  */
+  return xstrdup (buf);
+}
+
 /* Determine if FILE_PATH missing a trailing newline on its final line.
    Only valid to call once all of the file has been loaded, by
    requesting a line number beyond the end of the file.  */