[v3,04/21] Escape names used in symbol whitelisting regex.

Message ID 20200424092132.150547-5-gprocida@google.com
State Superseded
Series Simplify regex and suppression parsing.

Commit Message

Giuliano Procida April 24, 2020, 9:21 a.m. UTC
There is the theoretical possibility that symbols may contain special
regex characters like '.' and '$'. This patch ensures all such
characters in symbol names are escaped before they are added to the
whitelisting regex.

	* include/abg-regex.h (escape): New string reference holder
	class. (operator<<): Declaration of std::ostream,
	regex::escape overload.
	* src/abg-regex.cc (operator<<): New std::ostream,
	regex::escape overload that outputs regex-escaped strings.
	(generate_from_strings): Make sure any special regex
	characters in symbol names are escaped.

Signed-off-by: Giuliano Procida <gprocida@google.com>
---
 include/abg-regex.h | 10 ++++++++++
 src/abg-regex.cc    | 27 +++++++++++++++++++++++++--
 2 files changed, 35 insertions(+), 2 deletions(-)
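
To make the motivation concrete, here is a minimal sketch (not part of the patch) of what can go wrong when a name is spliced into a POSIX extended regex without escaping. The helper name `matches` is hypothetical, and the sketch assumes the POSIX `<regex.h>` API (`regcomp`, `regexec`, `regfree`), which the `regex_t` references in the diff suggest is the engine in use.

```cpp
#include <cstddef>
#include <regex.h>
#include <string>

// Hypothetical helper, for illustration only: returns true if `text`
// matches the POSIX extended regex `pattern`. A pattern that fails to
// compile is treated as "no match".
static bool matches(const std::string& pattern, const std::string& text)
{
  regex_t re;
  if (regcomp(&re, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  const bool ok = regexec(&re, text.c_str(), 0, NULL, 0) == 0;
  regfree(&re);
  return ok;
}
```

With this helper, `matches("^(foo.bar)$", "fooXbar")` is true because the unescaped '.' matches any character, whereas `matches("^(foo\\.bar)$", "fooXbar")` is false and `matches("^(foo\\.bar)$", "foo.bar")` is true.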
  

Comments

Matthias Männich April 27, 2020, 11:14 a.m. UTC | #1
On Fri, Apr 24, 2020 at 10:21:15AM +0100, Giuliano Procida wrote:
>There is the theoretical possibility that symbols may contain special
>regex characters like '.' and '$'. This patch ensures all such
>characters in symbol names are escaped before they are added to the
>whitelisting regex.
>
>	* include/abg-regex.h (escape): New string reference holder
>	class. (operator<<): Declaration of std::ostream,
>	regex::escape overload.
>	* src/abg-regex.cc (operator<<): New std::ostream,
>	regex::escape overload that outputs regex-escaped strings.
>	(generate_from_strings): Make sure any special regex
>	characters in symbol names are escaped.
>
>Signed-off-by: Giuliano Procida <gprocida@google.com>
>---
> include/abg-regex.h | 10 ++++++++++
> src/abg-regex.cc    | 27 +++++++++++++++++++++++++--
> 2 files changed, 35 insertions(+), 2 deletions(-)
>
>diff --git a/include/abg-regex.h b/include/abg-regex.h
>index 2f638ef2..59976794 100644
>--- a/include/abg-regex.h
>+++ b/include/abg-regex.h
>@@ -58,6 +58,16 @@ struct regex_t_deleter
>   }
> };//end struct regex_deleter
>
>+/// A class to hold a reference to a string to regex escape.
>+struct escape
>+{
>+  escape(const std::string& str) : ref(str) { }
>+  const std::string& ref;
>+};
>+
>+std::ostream&
>+operator<<(std::ostream& os, const escape& esc);
>+
> std::string
> generate_from_strings(const std::vector<std::string>& strs);
>
>diff --git a/src/abg-regex.cc b/src/abg-regex.cc
>index 79a89033..90e4d144 100644
>--- a/src/abg-regex.cc
>+++ b/src/abg-regex.cc
>@@ -24,6 +24,7 @@
> ///
>
> #include <sstream>
>+#include <ostream>
Sort.

> #include "abg-sptr-utils.h"
> #include "abg-regex.h"
>
>@@ -56,6 +57,28 @@ sptr_utils::build_sptr<regex_t>()
> namespace regex
> {
>
>+/// Escape regex special characters in input string.
>+///
>+/// @param os the output stream being written to.
>+///
>+/// @param esc the escape object holding a reference to the string
>+/// needing to be escaped.
>+///
>+/// @return the output stream.
>+std::ostream&
>+operator<<(std::ostream& os, const escape& esc)
>+{
>+  static const std::string specials = "^.[$()|*+?{\\";

What about ']' and '}' ?

Cheers,
Matthias

>+  const std::string str = esc.ref;
>+  for (std::string::const_iterator i = str.begin(); i != str.end(); ++i)
>+    {
>+      if (specials.find(*i) != std::string::npos)
>+        os << '\\';
>+      os << *i;
>+    }
>+  return os;
>+}
>+
> /// Generate a regex pattern equivalent to testing set membership.
> ///
> /// A string will match the resulting pattern regex, if and only if it
>@@ -71,9 +94,9 @@ generate_from_strings(const std::vector<std::string>& strs)
>     return "^_^";
>   std::ostringstream os;
>   std::vector<std::string>::const_iterator i = strs.begin();
>-  os << "^(" << *i++;
>+  os << "^(" << escape(*i++);
>   while (i != strs.end())
>-    os << "|" << *i++;
>+    os << "|" << escape(*i++);
>   os << ")$";
>   return os.str();
> }
>-- 
>2.26.2.303.gf8c07b1a785-goog
>
  
Giuliano Procida April 27, 2020, 3:37 p.m. UTC | #2
Hi.

On Mon, 27 Apr 2020 at 12:14, Matthias Maennich <maennich@google.com> wrote:
>
> On Fri, Apr 24, 2020 at 10:21:15AM +0100, Giuliano Procida wrote:
> >There is the theoretical possibility that symbols may contain special
> >regex characters like '.' and '$'. This patch ensures all such
> >characters in symbol names are escaped before they are added to the
> >whitelisting regex.
> >
> >       * include/abg-regex.h (escape): New string reference holder
> >       class. (operator<<): Declaration of std::ostream,
> >       regex::escape overload.
> >       * src/abg-regex.cc (operator<<): New std::ostream,
> >       regex::escape overload that outputs regex-escaped strings.
> >       (generate_from_strings): Make sure any special regex
> >       characters in symbol names are escaped.
> >
> >Signed-off-by: Giuliano Procida <gprocida@google.com>
> >---
> > include/abg-regex.h | 10 ++++++++++
> > src/abg-regex.cc    | 27 +++++++++++++++++++++++++--
> > 2 files changed, 35 insertions(+), 2 deletions(-)
> >
> >diff --git a/include/abg-regex.h b/include/abg-regex.h
> >index 2f638ef2..59976794 100644
> >--- a/include/abg-regex.h
> >+++ b/include/abg-regex.h
> >@@ -58,6 +58,16 @@ struct regex_t_deleter
> >   }
> > };//end struct regex_deleter
> >
> >+/// A class to hold a reference to a string to regex escape.
> >+struct escape
> >+{
> >+  escape(const std::string& str) : ref(str) { }
> >+  const std::string& ref;
> >+};
> >+
> >+std::ostream&
> >+operator<<(std::ostream& os, const escape& esc);
> >+
> > std::string
> > generate_from_strings(const std::vector<std::string>& strs);
> >
> >diff --git a/src/abg-regex.cc b/src/abg-regex.cc
> >index 79a89033..90e4d144 100644
> >--- a/src/abg-regex.cc
> >+++ b/src/abg-regex.cc
> >@@ -24,6 +24,7 @@
> > ///
> >
> > #include <sstream>
> >+#include <ostream>
> Sort.

Done.

> > #include "abg-sptr-utils.h"
> > #include "abg-regex.h"
> >
> >@@ -56,6 +57,28 @@ sptr_utils::build_sptr<regex_t>()
> > namespace regex
> > {
> >
> >+/// Escape regex special characters in input string.
> >+///
> >+/// @param os the output stream being written to.
> >+///
> >+/// @param esc the escape object holding a reference to the string
> >+/// needing to be escaped.
> >+///
> >+/// @return the output stream.
> >+std::ostream&
> >+operator<<(std::ostream& os, const escape& esc)
> >+{
> >+  static const std::string specials = "^.[$()|*+?{\\";
>
> What about ']' and '}' ?

I stole my list from somewhere, possibly Wikipedia. To answer your
question: I left out ']' and '}' because they are only special when
preceded by '[' and '{' respectively.

However, omitting them can be confusing for humans, so I'll add them.
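
The claim about ']' and '}' can be checked against a POSIX regex implementation with a small sketch like the one below. It is not part of the patch; the helper name `try_match` is hypothetical, and the behaviour noted afterwards is what glibc's `regcomp` is expected to do, so other implementations may differ.

```cpp
#include <cstddef>
#include <regex.h>
#include <string>

// Hypothetical helper, for illustration only: returns 1 if `text`
// matches the POSIX extended regex `pattern`, 0 if it does not, and -1
// if the pattern fails to compile.
static int try_match(const std::string& pattern, const std::string& text)
{
  regex_t re;
  if (regcomp(&re, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  const int rc = regexec(&re, text.c_str(), 0, NULL, 0);
  regfree(&re);
  return rc == 0 ? 1 : 0;
}
```

On glibc, `try_match("^a]$", "a]")` and `try_match("^a}$", "a}")` both return 1, since an unpaired closing bracket or brace is taken literally, while `try_match("^a[$", "a[")` returns -1 because the unescaped '[' opens a bracket expression that is never closed.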

> Cheers,
> Matthias
>
> >+  const std::string str = esc.ref;
> >+  for (std::string::const_iterator i = str.begin(); i != str.end(); ++i)
> >+    {
> >+      if (specials.find(*i) != std::string::npos)
> >+        os << '\\';
> >+      os << *i;
> >+    }
> >+  return os;
> >+}
> >+
> > /// Generate a regex pattern equivalent to testing set membership.
> > ///
> > /// A string will match the resulting pattern regex, if and only if it
> >@@ -71,9 +94,9 @@ generate_from_strings(const std::vector<std::string>& strs)
> >     return "^_^";
> >   std::ostringstream os;
> >   std::vector<std::string>::const_iterator i = strs.begin();
> >-  os << "^(" << *i++;
> >+  os << "^(" << escape(*i++);
> >   while (i != strs.end())
> >-    os << "|" << *i++;
> >+    os << "|" << escape(*i++);
> >   os << ")$";
> >   return os.str();
> > }
> >--
> >2.26.2.303.gf8c07b1a785-goog
> >
  

Patch

diff --git a/include/abg-regex.h b/include/abg-regex.h
index 2f638ef2..59976794 100644
--- a/include/abg-regex.h
+++ b/include/abg-regex.h
@@ -58,6 +58,16 @@  struct regex_t_deleter
   }
 };//end struct regex_deleter
 
+/// A class to hold a reference to a string to regex escape.
+struct escape
+{
+  escape(const std::string& str) : ref(str) { }
+  const std::string& ref;
+};
+
+std::ostream&
+operator<<(std::ostream& os, const escape& esc);
+
 std::string
 generate_from_strings(const std::vector<std::string>& strs);
 
diff --git a/src/abg-regex.cc b/src/abg-regex.cc
index 79a89033..90e4d144 100644
--- a/src/abg-regex.cc
+++ b/src/abg-regex.cc
@@ -24,6 +24,7 @@ 
 ///
 
 #include <sstream>
+#include <ostream>
 #include "abg-sptr-utils.h"
 #include "abg-regex.h"
 
@@ -56,6 +57,28 @@  sptr_utils::build_sptr<regex_t>()
 namespace regex
 {
 
+/// Escape regex special characters in input string.
+///
+/// @param os the output stream being written to.
+///
+/// @param esc the escape object holding a reference to the string
+/// needing to be escaped.
+///
+/// @return the output stream.
+std::ostream&
+operator<<(std::ostream& os, const escape& esc)
+{
+  static const std::string specials = "^.[$()|*+?{\\";
+  const std::string str = esc.ref;
+  for (std::string::const_iterator i = str.begin(); i != str.end(); ++i)
+    {
+      if (specials.find(*i) != std::string::npos)
+        os << '\\';
+      os << *i;
+    }
+  return os;
+}
+
 /// Generate a regex pattern equivalent to testing set membership.
 ///
 /// A string will match the resulting pattern regex, if and only if it
@@ -71,9 +94,9 @@  generate_from_strings(const std::vector<std::string>& strs)
     return "^_^";
   std::ostringstream os;
   std::vector<std::string>::const_iterator i = strs.begin();
-  os << "^(" << *i++;
+  os << "^(" << escape(*i++);
   while (i != strs.end())
-    os << "|" << *i++;
+    os << "|" << escape(*i++);
   os << ")$";
   return os.str();
 }
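
For readers who want to try the escaping logic outside the library, the following is a standalone sketch under stated assumptions, not the code from any revision of this patch. It uses hypothetical free functions `escape_regex` and `alternation_of` in place of the `escape` holder and `generate_from_strings`, adds ']' and '}' to the list of specials as discussed in the review, and assumes the `"^_^"` return seen in the diff context is the empty-input case.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical free-function variant of the patch's operator<<:
// returns a copy of `str` with regex special characters backslash-escaped.
static std::string escape_regex(const std::string& str)
{
  // The list from the patch, plus ']' and '}' as discussed in review.
  static const std::string specials = "^.[]$()|*+?{}\\";
  std::string out;
  for (std::string::const_iterator i = str.begin(); i != str.end(); ++i)
    {
      if (specials.find(*i) != std::string::npos)
        out += '\\';
      out += *i;
    }
  return out;
}

// Hypothetical counterpart of generate_from_strings: builds an anchored
// alternation "^(a|b|...)$" from the escaped inputs.
static std::string alternation_of(const std::vector<std::string>& strs)
{
  // Assumption: an empty input yields a pattern meant to match nothing.
  if (strs.empty())
    return "^_^";
  std::ostringstream os;
  std::vector<std::string>::const_iterator i = strs.begin();
  os << "^(" << escape_regex(*i++);
  while (i != strs.end())
    os << "|" << escape_regex(*i++);
  os << ")$";
  return os.str();
}
```

For the inputs `a.b` and `c[0]`, `alternation_of` produces `^(a\.b|c\[0\])$`. Returning a `std::string` makes the sketch easy to test in isolation; the stream-based `escape` holder in the patch avoids that intermediate string.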