[RFC,2/7] Add libiberty/concat styled concat_path function

Message ID 20170112113217.48852-3-prudo@linux.vnet.ibm.com
State New, archived
Headers

Commit Message

Philipp Rudo Jan. 12, 2017, 11:32 a.m. UTC
  This commit adds concat_path function to concatenate an arbitrary number of
    path elements.  The function automatically adds an directory separator between
    two elements as needed.

    gdb/ChangeLog:

	* common/common-utils (startswith): Convert to C++.
	(endswith): New function.
	* utils.c (_concat_path, approx_path_length): New function.
	* utils.h (_concat_path): New export.
	(concat_path): New define.
---
 gdb/common/common-utils.h | 16 +++++++++++++---
 gdb/utils.c               | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 gdb/utils.h               | 17 +++++++++++++++++
 3 files changed, 76 insertions(+), 3 deletions(-)
  

Comments

Pedro Alves Jan. 12, 2017, noon UTC | #1
On 01/12/2017 11:32 AM, Philipp Rudo wrote:
>  static inline int
> -startswith (const char *string, const char *pattern)
> +startswith (std::string str, std::string pattern)

NAK.

This is passing by copy, so will force unnecessary deep
string copies all over the place.

>  {
> -  return strncmp (string, pattern, strlen (pattern)) == 0;
> +  return (str.find (pattern) == 0);
> +}
> +

I.e., before, this caused 0 copies:

  startswith ("foo, "f");

After, you force a deep string copy for "foo", and another
for "f".  It's as if you wrote:

  startswith (xstrdup ("foo), xstrdup ("f"));


Also, this function is a static inline in a header so that
the compiler can see when "pattern" is a string literal, and
thus strlen can be optimized to a plain 'sizeof (pattern) - 1',
which is very frequent.

If you want to add overloads that can take "const std::string &"
for convenience, to avoid str.c_str(), that's maybe fine,
but you'd have to add all the combinations of
'const char *' x 'const std::string &' in the parameters, I
suppose.

Thanks,
Pedro Alves
  
Philipp Rudo Jan. 12, 2017, 1:33 p.m. UTC | #2
Hi Pedro,

i see your point. 

My goal here was to get rid of any C-string. While making this patch i
also wanted to use it to get rid of all those

concat (path, need_dirsep ? SLASH_STRING : "", NULL)

or

strcat (path, "/")
strcat (path, file)

constructs. I gave up when it repeatedly caused memory leaks and use
after free errors because of the mixture of C and C++ strings. Fixing
them made the code less readable than before. Thus you should only use
one kind of string through out GDB, either char * or std::string. And
as GDB decided to move to C++ for me std::string is the way you should
go. Even when it costs performance.

Just my 0.02$
Philipp


On Thu, 12 Jan 2017 12:00:22 +0000
Pedro Alves <palves@redhat.com> wrote:

> On 01/12/2017 11:32 AM, Philipp Rudo wrote:
> >  static inline int
> > -startswith (const char *string, const char *pattern)
> > +startswith (std::string str, std::string pattern)  
> 
> NAK.
> 
> This is passing by copy, so will force unnecessary deep
> string copies all over the place.
> 
> >  {
> > -  return strncmp (string, pattern, strlen (pattern)) == 0;
> > +  return (str.find (pattern) == 0);
> > +}
> > +  
> 
> I.e., before, this caused 0 copies:
> 
>   startswith ("foo, "f");
> 
> After, you force a deep string copy for "foo", and another
> for "f".  It's as if you wrote:
> 
>   startswith (xstrdup ("foo), xstrdup ("f"));
> 
> 
> Also, this function is a static inline in a header so that
> the compiler can see when "pattern" is a string literal, and
> thus strlen can be optimized to a plain 'sizeof (pattern) - 1',
> which is very frequent.
> 
> If you want to add overloads that can take "const std::string &"
> for convenience, to avoid str.c_str(), that's maybe fine,
> but you'd have to add all the combinations of
> 'const char *' x 'const std::string &' in the parameters, I
> suppose.
> 
> Thanks,
> Pedro Alves
>
  
Pedro Alves Jan. 12, 2017, 1:48 p.m. UTC | #3
On 01/12/2017 01:33 PM, Philipp Rudo wrote:
> Hi Pedro,
> 
> i see your point. 
> 
> My goal here was to get rid of any C-string. While making this patch i
> also wanted to use it to get rid of all those
> 
> concat (path, need_dirsep ? SLASH_STRING : "", NULL)
> 
> or
> 
> strcat (path, "/")
> strcat (path, file)
> 
> constructs. I gave up when it repeatedly caused memory leaks and use
> after free errors because of the mixture of C and C++ strings. Fixing
> them made the code less readable than before. Thus you should only use
> one kind of string through out GDB, either char * or std::string. And
> as GDB decided to move to C++ for me std::string is the way you should
> go. 

Even if we used std::string throughout, we should still be careful
with unnecessary string copying.  "std::string" vs "const string &"
in function parameters (use the former only when the function already
needs to work with a copy).  

Similarly, please don't write:

+  for (std::string arg: args)
+    {

Please write instead:

 for (const std::string &arg : args)

Or 

 for (const auto &arg : args)

"for (std::string arg: args)" creates/destroys
one deep string copy on each iteration.

I hope it's obvious that I'm all for C++ conversion, but ...

> Even when it costs performance.

... not at any cost.  startswith _is_ used in performance
critical paths.

BTW, I've been thinking that we may want to add our version
of C++17 std::string_view to avoid these kinds of problems.

Thanks,
Pedro Alves
  
Philipp Rudo Jan. 12, 2017, 3:08 p.m. UTC | #4
On Thu, 12 Jan 2017 13:48:03 +0000
Pedro Alves <palves@redhat.com> wrote:

> On 01/12/2017 01:33 PM, Philipp Rudo wrote:
> > Hi Pedro,
> > 
> > i see your point. 
> > 
> > My goal here was to get rid of any C-string. While making this
> > patch i also wanted to use it to get rid of all those
> > 
> > concat (path, need_dirsep ? SLASH_STRING : "", NULL)
> > 
> > or
> > 
> > strcat (path, "/")
> > strcat (path, file)
> > 
> > constructs. I gave up when it repeatedly caused memory leaks and use
> > after free errors because of the mixture of C and C++ strings.
> > Fixing them made the code less readable than before. Thus you
> > should only use one kind of string through out GDB, either char *
> > or std::string. And as GDB decided to move to C++ for me
> > std::string is the way you should go.   
> 
> Even if we used std::string throughout, we should still be careful
> with unnecessary string copying.  "std::string" vs "const string &"
> in function parameters (use the former only when the function already
> needs to work with a copy).  

You are right here.

> Similarly, please don't write:
> 
> +  for (std::string arg: args)
> +    {
> 
> Please write instead:
> 
>  for (const std::string &arg : args)
> 
> Or 
> 
>  for (const auto &arg : args)
> 
> "for (std::string arg: args)" creates/destroys
> one deep string copy on each iteration.

Thanks for the tip. I will check my patches again.
 
> I hope it's obvious that I'm all for C++ conversion, but ...

its quite obvious that you are pro C++. My own feelings are quite
mixed. I must amid that the standard library is quite handy, at least
when it works. Debugging it is quite a pain. But the syntax sometimes
is unreadable, especially when you use those teplates containing
templates using types is different namespaces.
Furthermore i don't think that simply porting GDB to C++ solves its
major problem that there are no clear structures in code. Although it
could help if you not only port it one to one but restructure the code
while you work at it. But that would requires quite some work...

> > Even when it costs performance.  
> 
> ... not at any cost.  startswith _is_ used in performance
> critical paths.
>
> BTW, I've been thinking that we may want to add our version
> of C++17 std::string_view to avoid these kinds of problems.

As much as i know about C++ this sounds like a good idea.

Thanks
Philipp
  
Pedro Alves Jan. 12, 2017, 3:42 p.m. UTC | #5
On 01/12/2017 03:08 PM, Philipp Rudo wrote:
> its quite obvious that you are pro C++. My own feelings are quite
> mixed. I must amid that the standard library is quite handy, at least
> when it works. Debugging it is quite a pain.

There's definitely lots of scope for making C++ debugging less painful.
I think 2017 will see good advancements here.  For example, with the
palves/cp-linespec branch on my github, setting breakpoints in C++ methods
is soooooo much easier, mainly because I've made linespec tab completion
Just Work.  The whole "compile" feature for C++ (making use of g++ for
the parsing, via the libcc1 plugin) should be making it upstream this year.
Etc.

Keep in mind that a significant (if not the largest) chunk of
our users is using GDB to debug their C++ code too.  So in a sense,
any pain we now may feel, a good chunk of our users have been feeling
for a long while.  We just hadn't been dogfooding.

> But the syntax sometimes
> is unreadable, especially when you use those teplates containing
> templates using types is different namespaces.

OTOH, stepping through C code that emulates templates using
C #defines OTOH is just plain impossible, since all of it
is compiled down to a single source line...  So in that sense,
I think debugging C++ is better than C here.

It also helps if you're using a distro that installs pretty printers
for the standard library correctly.  Fedora does.  Ubuntu didn't use
to, but I don't know the current state.

Thanks,
Pedro Alves
  

Patch

diff --git a/gdb/common/common-utils.h b/gdb/common/common-utils.h
index 618d266..be3c106 100644
--- a/gdb/common/common-utils.h
+++ b/gdb/common/common-utils.h
@@ -74,13 +74,23 @@  char *savestring (const char *ptr, size_t len);
 
 extern char *safe_strerror (int);
 
-/* Return non-zero if the start of STRING matches PATTERN, zero
+/* Return non-zero if the start of STR matches PATTERN, zero
    otherwise.  */
 
 static inline int
-startswith (const char *string, const char *pattern)
+startswith (std::string str, std::string pattern)
 {
-  return strncmp (string, pattern, strlen (pattern)) == 0;
+  return (str.find (pattern) == 0);
+}
+
+/* Return non-zero if the end of STR matches PATTERN, zero
+   otherwise.  */
+
+static inline int
+endswith (std::string str, std::string pattern)
+{
+  return (str.rfind (pattern) != std::string::npos
+	  && str.rfind (pattern) == (str.length () - pattern.length ()));
 }
 
 ULONGEST strtoulst (const char *num, const char **trailer, int base);
diff --git a/gdb/utils.c b/gdb/utils.c
index ee37e49..e924b2c 100644
--- a/gdb/utils.c
+++ b/gdb/utils.c
@@ -3193,6 +3193,52 @@  substitute_path_component (std::string &str, const std::string &from,
     }
 }
 
+/* Approximate length of final path.  Helper for concat_path.  */
+
+static inline unsigned long
+approx_path_length (std::initializer_list<std::string> args,
+		   std::string dir_separator)
+{
+  size_t length = 0;
+
+  for (std::string arg: args)
+      length += arg.length () + dir_separator.length ();
+
+  return length;
+}
+
+/* See utils.h.  */
+
+std::string
+_concat_path (std::initializer_list<std::string> args,
+	      std::string dir_separator)
+{
+  std::string dst;
+  dst.reserve (approx_path_length (args, dir_separator));
+
+  for (std::string arg: args)
+    {
+      if (arg.empty ())
+	continue;
+
+      if (startswith (arg, dir_separator)
+	  && endswith (dst, dir_separator))
+	dst.erase (dst.length () - dir_separator.length (),
+		   dir_separator.length ());
+
+      else if (!dst.empty ()
+	       && !startswith (arg, dir_separator)
+	       && !endswith (dst, dir_separator)
+	       && dst != TARGET_SYSROOT_PREFIX)
+	dst += dir_separator;
+
+      dst += arg;
+    }
+
+  dst.shrink_to_fit ();
+  return dst;
+}
+
 #ifdef HAVE_WAITPID
 
 #ifdef SIGALRM
diff --git a/gdb/utils.h b/gdb/utils.h
index 674f672..5d68a18 100644
--- a/gdb/utils.h
+++ b/gdb/utils.h
@@ -24,6 +24,7 @@ 
 #include "exceptions.h"
 #include "common/scoped_restore.h"
 #include <chrono>
+#include <string>
 
 extern void initialize_utils (void);
 
@@ -143,6 +144,22 @@  extern void substitute_path_component (std::string &str,
 				       const std::string &from,
 				       const std::string &to);
 
+/* Concatenate an arbitrary number of path elements.  Adds and removes
+   directory separators as needed.
+
+   concat_path (/first, second)		=> /first/second
+   concat_path (first, second)		=> first/second
+   concat_path (first/, second)		=> first/second
+   concat_path (first, /second)		=> first/second
+   concat_path (first/, /second)	=> first/second
+   concat_path (target:, second)	=> target:second
+   */
+
+extern std::string _concat_path (std::initializer_list<std::string> args,
+				 std::string dir_separator);
+
+#define concat_path(...) _concat_path ({__VA_ARGS__}, SLASH_STRING)
+
 char *ldirname (const char *filename);
 
 extern int count_path_elements (const char *path);