diff mbox series

[1/3] test-diff-suppr: Improve regexes in tests.

Message ID CAGvU0Hmyc_CYzD7Z--f=y6fcUAKxyZVgR_HF9DY9dVzHYaOmrw@mail.gmail.com
State Dropped
Headers show
Series [1/3] test-diff-suppr: Improve regexes in tests. | expand

Commit Message

Giuliano Procida April 22, 2020, 7:11 p.m. UTC
Hi there.

Thanks for the review.

On Wed, 22 Apr 2020 at 14:48, Dodji Seketeli <dodji@seketeli.org> wrote:
>
> Hello Giuliano,
>
> Giuliano Procida <gprocida@google.com> a ?crit:
>
> > The are some (mostly inconsequential) issues with regexes:
> >
> >   - Leading and trailing .* are redundant and can be removed.
>
> They are redundant I agree, but that redundancy can be useful, I think.
> The reason why I have put them in the test is that in the past, I have
> tried different regular expression engines.  Some of them would
> implement the quantifier in a greedy manner, rather than in a lazy
> manner.  Whereas, what I find more useful is the lazy variant.  So that
> writing ".*blah" would do what you would expect (rather than having to
> write .*?blah)

If we only care about match existence (which is the case for
libabigail), greediness shouldn't matter, the engine should backtrack.
Do you mean possessive matching (I've just found this on Wikipedia,
it's not something I thought regex engines had)? That would be painful
to work with.

If you want more features (like being able to match foofoo for any
foo), you'll want PCRE. If you are thinking of embedding libabigail in
a server, you'll want RE2.

> So putting that in those tests explicitly will hopefully help us catch
> potential issues if we are to switch to a different engine tomorrow.  So
> I'd prefer keeping that redundancy in those tests.

Fair enough.

> [...]
>
> >   - File name matches are full path
>
> Are they?

Correction, some file name matches seem to be full path. This may be a
bug. I've found:

Path in XML
corpus_is_suppressed_by_soname_or_filename
suppression_matches_soname_or_filename
matches_binary_name (@param binary_name the full path to the binary)

ELF path
suppression_can_match
matches_binary_name (@param binary_name the full path to the binary)

I've not traced through the code, only looked at it. This change
breaks make check:


> It seems to me file_suppression::suppress_file() acts on the base name
> of the file, rather than on its full path.
>
> I might be missing something here, please tell me.
>
> > so should start with (^|/) if trying to match a base name, assuming a
> > Unix-like filesystem.
>
> I'd need the above to be clarified to say for sure.
>
> >     Given these are just tests, it's not that important, but they
> >     still serve as examples.
> >     - In cases where the ^ anchor was used, full paths would usually
> >       fail to match. In such cases, the regex was being ignored for
> >       other reasons (see later patch) or is expected not to match
> >       anyway.
> >     - In many cases, the $ anchor could be considered to be missing.
> >   - The .ini parser unescapes string values, so escaping regex
> >     metacharacters requires a double backslash. Single backslashes
> >     are pointless.
> >   - The dot metacharacter is used unescaped in a few places where a
> >     literal was likely intended, so should be escaped.
> >   - The characters [ and ] don't need to be (.ini) escaped.
>
> They don't *need* to (in a string) I agree.  But it they are escaped,
> that should work.  It's a way to test the init parser as well.  So I'd
> keep the redundancy there as well.

I'd go as far to say "\." should be "\\.". In fact, changing all the
tests to this exposes a bug in the init printer (it doesn't re-escape
strings).

For the rest, I'm much less opinionated. :-)

> [...]
>
> Cheers,

Regards,
Giuliano.

> --
>                 Dodji
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
>

Comments

Giuliano Procida May 1, 2020, 3:56 p.m. UTC | #1
Hi Dodji.

On Wed, 22 Apr 2020 at 20:11, Giuliano Procida <gprocida@google.com> wrote:
>
> Hi there.
>
> Thanks for the review.
>
> On Wed, 22 Apr 2020 at 14:48, Dodji Seketeli <dodji@seketeli.org> wrote:
> >
> > Hello Giuliano,
> >
> > Giuliano Procida <gprocida@google.com> a ?crit:
> >
> > > The are some (mostly inconsequential) issues with regexes:
> > >
> > >   - Leading and trailing .* are redundant and can be removed.
> >
> > They are redundant I agree, but that redundancy can be useful, I think.
> > The reason why I have put them in the test is that in the past, I have
> > tried different regular expression engines.  Some of them would
> > implement the quantifier in a greedy manner, rather than in a lazy
> > manner.  Whereas, what I find more useful is the lazy variant.  So that
> > writing ".*blah" would do what you would expect (rather than having to
> > write .*?blah)
>
> If we only care about match existence (which is the case for
> libabigail), greediness shouldn't matter, the engine should backtrack.
> Do you mean possessive matching (I've just found this on Wikipedia,
> it's not something I thought regex engines had)? That would be painful
> to work with.
>
> If you want more features (like being able to match foofoo for any
> foo), you'll want PCRE. If you are thinking of embedding libabigail in
> a server, you'll want RE2.
>
> > So putting that in those tests explicitly will hopefully help us catch
> > potential issues if we are to switch to a different engine tomorrow.  So
> > I'd prefer keeping that redundancy in those tests.
>
> Fair enough.
>
> > [...]
> >
> > >   - File name matches are full path
> >
> > Are they?
>
> Correction, some file name matches seem to be full path. This may be a
> bug. I've found:

Have you had a chance to look at these cases where full path matching
is being done? Would you like a patch that makes it base name matching
instead?

Regards,
Giuliano.

> Path in XML
> corpus_is_suppressed_by_soname_or_filename
> suppression_matches_soname_or_filename
> matches_binary_name (@param binary_name the full path to the binary)
>
> ELF path
> suppression_can_match
> matches_binary_name (@param binary_name the full path to the binary)
>
> I've not traced through the code, only looked at it. This change
> breaks make check:
>
> diff --git a/tests/data/test-diff-suppr/test24-soname-suppr-2.txt
> b/tests/data/test-diff-suppr/test24-soname-suppr-2.txt
> index d11e632c..8ac78817 100644
> --- a/tests/data/test-diff-suppr/test24-soname-suppr-2.txt
> +++ b/tests/data/test-diff-suppr/test24-soname-suppr-2.txt
> @@ -1,4 +1,4 @@
>  [suppress_type]
> -  file_name_regexp = .*/libtest24-soname-v0.so$
> +  file_name_regexp = ^libtest24-soname-v0.so$
>    name = S
>    accessed_through = reference-or-pointer
> diff --git a/tests/data/test-diff-suppr/test24-soname-suppr-3.txt
> b/tests/data/test-diff-suppr/test24-soname-suppr-3.txt
> index 00d3428a..e94a6477 100644
> --- a/tests/data/test-diff-suppr/test24-soname-suppr-3.txt
> +++ b/tests/data/test-diff-suppr/test24-soname-suppr-3.txt
> @@ -1,4 +1,4 @@
>  [suppress_type]
> -  file_name_regexp = .*/libtest24-soname-v1.so$
> +  file_name_regexp = ^libtest24-soname-v1.so$
>    name = S
>    accessed_through = reference-or-pointer
>
> > It seems to me file_suppression::suppress_file() acts on the base name
> > of the file, rather than on its full path.
> >
> > I might be missing something here, please tell me.
> >
> > > so should start with (^|/) if trying to match a base name, assuming a
> > > Unix-like filesystem.
> >
> > I'd need the above to be clarified to say for sure.
> >
> > >     Given these are just tests, it's not that important, but they
> > >     still serve as examples.
> > >     - In cases where the ^ anchor was used, full paths would usually
> > >       fail to match. In such cases, the regex was being ignored for
> > >       other reasons (see later patch) or is expected not to match
> > >       anyway.
> > >     - In many cases, the $ anchor could be considered to be missing.
> > >   - The .ini parser unescapes string values, so escaping regex
> > >     metacharacters requires a double backslash. Single backslashes
> > >     are pointless.
> > >   - The dot metacharacter is used unescaped in a few places where a
> > >     literal was likely intended, so should be escaped.
> > >   - The characters [ and ] don't need to be (.ini) escaped.
> >
> > They don't *need* to (in a string) I agree.  But it they are escaped,
> > that should work.  It's a way to test the init parser as well.  So I'd
> > keep the redundancy there as well.
>
> I'd go as far to say "\." should be "\\.". In fact, changing all the
> tests to this exposes a bug in the init printer (it doesn't re-escape
> strings).
>
> For the rest, I'm much less opinionated. :-)
>
> > [...]
> >
> > Cheers,
>
> Regards,
> Giuliano.
>
> > --
> >                 Dodji
> >
> > --
> > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
> >
diff mbox series

Patch

diff --git a/tests/data/test-diff-suppr/test24-soname-suppr-2.txt
b/tests/data/test-diff-suppr/test24-soname-suppr-2.txt
index d11e632c..8ac78817 100644
--- a/tests/data/test-diff-suppr/test24-soname-suppr-2.txt
+++ b/tests/data/test-diff-suppr/test24-soname-suppr-2.txt
@@ -1,4 +1,4 @@ 
 [suppress_type]
-  file_name_regexp = .*/libtest24-soname-v0.so$
+  file_name_regexp = ^libtest24-soname-v0.so$
   name = S
   accessed_through = reference-or-pointer
diff --git a/tests/data/test-diff-suppr/test24-soname-suppr-3.txt
b/tests/data/test-diff-suppr/test24-soname-suppr-3.txt
index 00d3428a..e94a6477 100644
--- a/tests/data/test-diff-suppr/test24-soname-suppr-3.txt
+++ b/tests/data/test-diff-suppr/test24-soname-suppr-3.txt
@@ -1,4 +1,4 @@ 
 [suppress_type]
-  file_name_regexp = .*/libtest24-soname-v1.so$
+  file_name_regexp = ^libtest24-soname-v1.so$
   name = S
   accessed_through = reference-or-pointer