[1/4] Improve identification of memory mappings

Message ID 1426707523-6499-2-git-send-email-sergiodj@redhat.com
State New, archived
Headers

Commit Message

Sergio Durigan Junior March 18, 2015, 7:38 p.m. UTC
  This commit implements the new 'enum memory_mapping_state', which can
be used to represent the different states of each memory mapping from
the inferior.  These states are:

  - MODIFIED, which means that the mapping should be dumped in
    corefiles

  - UNMODIFIED, which means that the mapping should not be dumped in
    corefiles (e.g., mappings that have been marked as VM_DONTDUMP), and

  - UNKNOWN, which means that we don't know whether the mapping should
    or should not be dumped.

The last state is not used in this patch, but will be used in the
following patches of this series.

It is also worth mentioning that I do not intend to commit this patch
independently.  IOW, I will merge everything before pushing one single
commit.

Changes from v2:

  - Moved declaration of 'enum memory_mapping_state' from
    gdb/common/common-defs.h to gdb/defs.h.

  - Renamed variable to 'modified_state' (instead of 'state').

gdb/ChangeLog:
2015-03-18  Sergio Durigan Junior  <sergiodj@redhat.com>
	    Jan Kratochvil  <jan.kratochvil@redhat.com>
	    Oleg Nesterov  <oleg@redhat.com>

	PR corefiles/16092
	* defs.h (enum memory_mapping_state): New enum.
	(find_memory_region_ftype): Remove 'int modified' parameter,
	replacing by 'enum memory_mapping_state modified_state'.
	* gcore.c (gcore_create_callback): Likewise.
	(objfile_find_memory_regions): Passing
	'MEMORY_MAPPING_UNKNOWN_STATE' or 'MEMORY_MAPPING_MODIFIED' when
	needed to 'func' callback, instead of saying the memory mapping
	was modified even without knowing it.
	* gnu-nat.c (gnu_find_memory_regions): Likewise.
	* linux-tdep.c (linux_find_memory_region_ftype): Remove 'int
	modified' parameter, replacing by 'enum memory_mapping_state
	modified_state'.
	(linux_find_memory_regions_thunk): Likewise.
	(linux_make_mappings_callback): Likewise.
	(find_mapping_size): Likewise.
	* procfs.c (find_memory_regions_callback): Likewise.
---
 gdb/defs.h       | 13 ++++++++++++-
 gdb/gcore.c      | 16 ++++++++++------
 gdb/gnu-nat.c    |  4 ++--
 gdb/linux-tdep.c | 25 ++++++++++++++++---------
 gdb/procfs.c     |  2 +-
 5 files changed, 41 insertions(+), 19 deletions(-)
  

Comments

Pedro Alves March 19, 2015, 10:39 a.m. UTC | #1
Thanks for splitting.  It indeed makes it easier to understand
the changes.

On 03/18/2015 07:38 PM, Sergio Durigan Junior wrote:
> This commit implements the new 'enum memory_mapping_state', which can
> be used to represent the different states of each memory mapping from
> the inferior.  These states are:
> 
>   - MODIFIED, which means that the mapping should be dumped in
>     corefiles
> 
>   - UNMODIFIED, which means that the mapping should not be dumped in
>     corefiles (e.g., mappings that have been marked as VM_DONTDUMP), and
> 
>   - UNKNOWN, which means that we don't know whether the mapping should
>     or should not be dumped.
> 

I'm sorry to push back on this, but it sounds to me that this is mixing
up orthogonal aspects.

For example, what if a mapping is indeed modified, but the tdep code
decides it should not be dumped?  With this interface, you need to
"lie" and pass down UNMODIFIED.

And then, what if a mapping is definitely not modified, but the
tdep code decides it should dumped (e.g., say we could tell that the
vdso mapping really wasn't modified, but we still want to dump
it anyhow because there's no file on the filesystem to read the
vdso contents from later at core load time).  With this interface,
you need to pass down either MODIFIED or UNKNOWN.

So it sounds to me that instead, the arch/target code that is walking
the memory mappings should just not call the "dump this mapping"
callback if it decides the mapping should not be dumped.

And if we do _that_ first, then, what other changes to
gcore_create_callback would be required to make it do what
we need?

This may need further discussion, but in any case, note that the
descriptions above of what each state means ...

> +/* Enum used to inform the state of a memory mapping.  This is used in
> +   functions implementing find_memory_region_ftype.  */
> +
> +enum memory_mapping_state
> +  {
> +    MEMORY_MAPPING_MODIFIED,
> +    MEMORY_MAPPING_UNMODIFIED,
> +    MEMORY_MAPPING_UNKNOWN_STATE,
> +  };

... should be here in the code.

Thanks,
Pedro Alves
  
Sergio Durigan Junior March 19, 2015, 11:06 p.m. UTC | #2
On Thursday, March 19 2015, Pedro Alves wrote:

> On 03/18/2015 07:38 PM, Sergio Durigan Junior wrote:
>> This commit implements the new 'enum memory_mapping_state', which can
>> be used to represent the different states of each memory mapping from
>> the inferior.  These states are:
>> 
>>   - MODIFIED, which means that the mapping should be dumped in
>>     corefiles
>> 
>>   - UNMODIFIED, which means that the mapping should not be dumped in
>>     corefiles (e.g., mappings that have been marked as VM_DONTDUMP), and
>> 
>>   - UNKNOWN, which means that we don't know whether the mapping should
>>     or should not be dumped.
>> 
>
> I'm sorry to push back on this, but it sounds to me that this is mixing
> up orthogonal aspects.
>
> For example, what if a mapping is indeed modified, but the tdep code
> decides it should not be dumped?  With this interface, you need to
> "lie" and pass down UNMODIFIED.
>
> And then, what if a mapping is definitely not modified, but the
> tdep code decides it should dumped (e.g., say we could tell that the
> vdso mapping really wasn't modified, but we still want to dump
> it anyhow because there's no file on the filesystem to read the
> vdso contents from later at core load time).  With this interface,
> you need to pass down either MODIFIED or UNKNOWN.
>
> So it sounds to me that instead, the arch/target code that is walking
> the memory mappings should just not call the "dump this mapping"
> callback if it decides the mapping should not be dumped.

Right, I agree there is some confusion in the terms being used here.
Thanks for giving examples that make this confusion obvious.

While I still think gcore_create_callback should probably receive more
attention, I will withdraw this patch because it doesn't really help to
fix the problem at hand, which is to make GDB obey
/proc/PID/coredump_filter.

> And if we do _that_ first, then, what other changes to
> gcore_create_callback would be required to make it do what
> we need?

If we do what you proposed, we wouldn't need to change
gcore_create_callback at all *to fix the specific problem of making GDB
obey /proc/PID/smaps*.  This is why, as I said, I am withdrawing this
patch.

However, IMHO gcore_create_callback still has some problems.  For
example, this heuristic used to determine whether a mapping should be
dumped or not:

  if (write == 0 && modified == 0 && !solib_keep_data_in_core (vaddr, size))
    {
      /* See if this region of memory lies inside a known file on disk.
	 If so, we can avoid copying its contents by clearing SEC_LOAD.  */
      struct objfile *objfile;
      struct obj_section *objsec;

      ALL_OBJSECTIONS (objfile, objsec)
	{
	  bfd *abfd = objfile->obfd;
	  asection *asec = objsec->the_bfd_section;
	  bfd_vma align = (bfd_vma) 1 << bfd_get_section_alignment (abfd,
								    asec);
	  bfd_vma start = obj_section_addr (objsec) & -align;
	  bfd_vma end = (obj_section_endaddr (objsec) + align - 1) & -align;

	  /* Match if either the entire memory region lies inside the
	     section (i.e. a mapping covering some pages of a large
	     segment) or the entire section lies inside the memory region
	     (i.e. a mapping covering multiple small sections).

	     This BFD was synthesized from reading target memory,
	     we don't want to omit that.  */
	  if (objfile->separate_debug_objfile_backlink == NULL
	      && ((vaddr >= start && vaddr + size <= end)
	          || (start >= vaddr && end <= vaddr + size))
	      && !(bfd_get_file_flags (abfd) & BFD_IN_MEMORY))
	    {
	      flags &= ~(SEC_LOAD | SEC_HAS_CONTENTS);
	      goto keep;	/* Break out of two nested for loops.  */
	    }
	}

    keep:;
    }

will not be used by any code, because everyone will be passing
'modified' as 1 with my following patch (the only code that could pass
'modified' as zero was linux_find_memory_regions_full, which I will
patch to only pass 1 as well).

> This may need further discussion, but in any case, note that the
> descriptions above of what each state means ...
>
>> +/* Enum used to inform the state of a memory mapping.  This is used in
>> +   functions implementing find_memory_region_ftype.  */
>> +
>> +enum memory_mapping_state
>> +  {
>> +    MEMORY_MAPPING_MODIFIED,
>> +    MEMORY_MAPPING_UNMODIFIED,
>> +    MEMORY_MAPPING_UNKNOWN_STATE,
>> +  };
>
> ... should be here in the code.

This is not needed anymore.

Thanks,
  
Pedro Alves March 20, 2015, 7:11 p.m. UTC | #3
On 03/19/2015 11:06 PM, Sergio Durigan Junior wrote:

> However, IMHO gcore_create_callback still has some problems.  For
> example, this heuristic used to determine whether a mapping should be
> dumped or not:
> 
>   if (write == 0 && modified == 0 && !solib_keep_data_in_core (vaddr, size))
>     {
>       /* See if this region of memory lies inside a known file on disk.
> 	 If so, we can avoid copying its contents by clearing SEC_LOAD.  */
>       struct objfile *objfile;
>       struct obj_section *objsec;
> 
>       ALL_OBJSECTIONS (objfile, objsec)
> 	{
> 	  bfd *abfd = objfile->obfd;
> 	  asection *asec = objsec->the_bfd_section;
> 	  bfd_vma align = (bfd_vma) 1 << bfd_get_section_alignment (abfd,
> 								    asec);
> 	  bfd_vma start = obj_section_addr (objsec) & -align;
> 	  bfd_vma end = (obj_section_endaddr (objsec) + align - 1) & -align;
> 
> 	  /* Match if either the entire memory region lies inside the
> 	     section (i.e. a mapping covering some pages of a large
> 	     segment) or the entire section lies inside the memory region
> 	     (i.e. a mapping covering multiple small sections).
> 
> 	     This BFD was synthesized from reading target memory,
> 	     we don't want to omit that.  */
> 	  if (objfile->separate_debug_objfile_backlink == NULL
> 	      && ((vaddr >= start && vaddr + size <= end)
> 	          || (start >= vaddr && end <= vaddr + size))
> 	      && !(bfd_get_file_flags (abfd) & BFD_IN_MEMORY))
> 	    {
> 	      flags &= ~(SEC_LOAD | SEC_HAS_CONTENTS);
> 	      goto keep;	/* Break out of two nested for loops.  */
> 	    }
> 	}
> 
>     keep:;
>     }
> 
> will not be used by any code, because everyone will be passing
> 'modified' as 1 with my following patch (the only code that could pass
> 'modified' as zero was linux_find_memory_regions_full, which I will
> patch to only pass 1 as well).

Alright.  Good that that now became clear.

I found the initial submission for that, btw:

  https://sourceware.org/ml/gdb-patches/2003-10/msg00164.html

I wonder whether it'd be worth to keep that somehow, for the fallback
cases when /proc//smaps or some other /proc file you're relying
on for file-backed read-only region identification is missing
(because old kernel, or even /proc not mounted).  Maybe not.

Thanks,
Pedro Alves
  
Sergio Durigan Junior March 20, 2015, 8:14 p.m. UTC | #4
On Friday, March 20 2015, Pedro Alves wrote:

> On 03/19/2015 11:06 PM, Sergio Durigan Junior wrote:
>
>> However, IMHO gcore_create_callback still has some problems.  For
>> example, this heuristic used to determine whether a mapping should be
>> dumped or not:
>> 
>>   if (write == 0 && modified == 0 && !solib_keep_data_in_core (vaddr, size))
>>     {
>>       /* See if this region of memory lies inside a known file on disk.
>> 	 If so, we can avoid copying its contents by clearing SEC_LOAD.  */
>>       struct objfile *objfile;
>>       struct obj_section *objsec;
>> 
>>       ALL_OBJSECTIONS (objfile, objsec)
>> 	{
>> 	  bfd *abfd = objfile->obfd;
>> 	  asection *asec = objsec->the_bfd_section;
>> 	  bfd_vma align = (bfd_vma) 1 << bfd_get_section_alignment (abfd,
>> 								    asec);
>> 	  bfd_vma start = obj_section_addr (objsec) & -align;
>> 	  bfd_vma end = (obj_section_endaddr (objsec) + align - 1) & -align;
>> 
>> 	  /* Match if either the entire memory region lies inside the
>> 	     section (i.e. a mapping covering some pages of a large
>> 	     segment) or the entire section lies inside the memory region
>> 	     (i.e. a mapping covering multiple small sections).
>> 
>> 	     This BFD was synthesized from reading target memory,
>> 	     we don't want to omit that.  */
>> 	  if (objfile->separate_debug_objfile_backlink == NULL
>> 	      && ((vaddr >= start && vaddr + size <= end)
>> 	          || (start >= vaddr && end <= vaddr + size))
>> 	      && !(bfd_get_file_flags (abfd) & BFD_IN_MEMORY))
>> 	    {
>> 	      flags &= ~(SEC_LOAD | SEC_HAS_CONTENTS);
>> 	      goto keep;	/* Break out of two nested for loops.  */
>> 	    }
>> 	}
>> 
>>     keep:;
>>     }
>> 
>> will not be used by any code, because everyone will be passing
>> 'modified' as 1 with my following patch (the only code that could pass
>> 'modified' as zero was linux_find_memory_regions_full, which I will
>> patch to only pass 1 as well).
>
> Alright.  Good that that now became clear.
>
> I found the initial submission for that, btw:
>
>   https://sourceware.org/ml/gdb-patches/2003-10/msg00164.html

Thanks for doing that.

> I wonder whether it'd be worth to keep that somehow, for the fallback
> cases when /proc//smaps or some other /proc file you're relying
> on for file-backed read-only region identification is missing
> (because old kernel, or even /proc not mounted).  Maybe not.

I will not touch this code in this series, of course, but I also think
that this part should be used more often.  For example, there are cases
when we are not sure whether the mapping needs to be dumped or not
(e.g., see gdb/gnu-nat.c), and currently we are just passing 'modified =
1', which makes GDB dump those pages.  I guess this is a situation where
this code could help.

Anyway, something that I can look into later.
  

Patch

diff --git a/gdb/defs.h b/gdb/defs.h
index 72512f6..37b1430 100644
--- a/gdb/defs.h
+++ b/gdb/defs.h
@@ -330,6 +330,16 @@  extern void init_source_path (void);
 
 /* From exec.c */
 
+/* Enum used to inform the state of a memory mapping.  This is used in
+   functions implementing find_memory_region_ftype.  */
+
+enum memory_mapping_state
+  {
+    MEMORY_MAPPING_MODIFIED,
+    MEMORY_MAPPING_UNMODIFIED,
+    MEMORY_MAPPING_UNKNOWN_STATE,
+  };
+
 /* * Process memory area starting at ADDR with length SIZE.  Area is
    readable iff READ is non-zero, writable if WRITE is non-zero,
    executable if EXEC is non-zero.  Area is possibly changed against
@@ -338,7 +348,8 @@  extern void init_source_path (void);
 
 typedef int (*find_memory_region_ftype) (CORE_ADDR addr, unsigned long size,
 					 int read, int write, int exec,
-					 int modified, void *data);
+					 enum memory_mapping_state state,
+					 void *data);
 
 /* * Possible lvalue types.  Like enum language, this should be in
    value.h, but needs to be here for the same reason.  */
diff --git a/gdb/gcore.c b/gdb/gcore.c
index 44b9d0c..751ddac 100644
--- a/gdb/gcore.c
+++ b/gdb/gcore.c
@@ -415,7 +415,8 @@  make_output_phdrs (bfd *obfd, asection *osec, void *ignored)
 
 static int
 gcore_create_callback (CORE_ADDR vaddr, unsigned long size, int read,
-		       int write, int exec, int modified, void *data)
+		       int write, int exec,
+		       enum memory_mapping_state modified_state, void *data)
 {
   bfd *obfd = data;
   asection *osec;
@@ -424,7 +425,8 @@  gcore_create_callback (CORE_ADDR vaddr, unsigned long size, int read,
   /* If the memory segment has no permissions set, ignore it, otherwise
      when we later try to access it for read/write, we'll get an error
      or jam the kernel.  */
-  if (read == 0 && write == 0 && exec == 0 && modified == 0)
+  if (read == 0 && write == 0 && exec == 0
+      && modified_state == MEMORY_MAPPING_UNMODIFIED)
     {
       if (info_verbose)
         {
@@ -435,7 +437,8 @@  gcore_create_callback (CORE_ADDR vaddr, unsigned long size, int read,
       return 0;
     }
 
-  if (write == 0 && modified == 0 && !solib_keep_data_in_core (vaddr, size))
+  if (write == 0 && modified_state == MEMORY_MAPPING_UNMODIFIED
+      && !solib_keep_data_in_core (vaddr, size))
     {
       /* See if this region of memory lies inside a known file on disk.
 	 If so, we can avoid copying its contents by clearing SEC_LOAD.  */
@@ -528,7 +531,8 @@  objfile_find_memory_regions (struct target_ops *self,
 			 1, /* All sections will be readable.  */
 			 (flags & SEC_READONLY) == 0, /* Writable.  */
 			 (flags & SEC_CODE) != 0, /* Executable.  */
-			 1, /* MODIFIED is unknown, pass it as true.  */
+			 MEMORY_MAPPING_UNKNOWN_STATE, /* MODIFIED is
+							 unknown.  */
 			 obfd);
 	  if (ret != 0)
 	    return ret;
@@ -541,7 +545,7 @@  objfile_find_memory_regions (struct target_ops *self,
 	     1, /* Stack section will be readable.  */
 	     1, /* Stack section will be writable.  */
 	     0, /* Stack section will not be executable.  */
-	     1, /* Stack section will be modified.  */
+	     MEMORY_MAPPING_MODIFIED, /* Stack section will be modified.  */
 	     obfd);
 
   /* Make a heap segment.  */
@@ -550,7 +554,7 @@  objfile_find_memory_regions (struct target_ops *self,
 	     1, /* Heap section will be readable.  */
 	     1, /* Heap section will be writable.  */
 	     0, /* Heap section will not be executable.  */
-	     1, /* Heap section will be modified.  */
+	     MEMORY_MAPPING_MODIFIED, /* Heap section will be modified.  */
 	     obfd);
 
   return 0;
diff --git a/gdb/gnu-nat.c b/gdb/gnu-nat.c
index d830773..60612a7 100644
--- a/gdb/gnu-nat.c
+++ b/gdb/gnu-nat.c
@@ -2611,7 +2611,7 @@  gnu_find_memory_regions (struct target_ops *self,
 		     last_protection & VM_PROT_READ,
 		     last_protection & VM_PROT_WRITE,
 		     last_protection & VM_PROT_EXECUTE,
-		     1, /* MODIFIED is unknown, pass it as true.  */
+		     MEMORY_MAPPING_UNKNOWN_STATE, /* MODIFIED is unknown.  */
 		     data);
 	  last_region_address = region_address;
 	  last_region_end = region_address += region_length;
@@ -2625,7 +2625,7 @@  gnu_find_memory_regions (struct target_ops *self,
 	     last_protection & VM_PROT_READ,
 	     last_protection & VM_PROT_WRITE,
 	     last_protection & VM_PROT_EXECUTE,
-	     1, /* MODIFIED is unknown, pass it as true.  */
+	     MEMORY_MAPPING_UNKNOWN_STATE, /* MODIFIED is unknown.  */
 	     data);
 
   return 0;
diff --git a/gdb/linux-tdep.c b/gdb/linux-tdep.c
index ea0d4cd..15725c7 100644
--- a/gdb/linux-tdep.c
+++ b/gdb/linux-tdep.c
@@ -807,7 +807,9 @@  linux_core_info_proc (struct gdbarch *gdbarch, const char *args,
 typedef int linux_find_memory_region_ftype (ULONGEST vaddr, ULONGEST size,
 					    ULONGEST offset, ULONGEST inode,
 					    int read, int write,
-					    int exec, int modified,
+					    int exec,
+					    enum memory_mapping_state
+					    modified_state,
 					    const char *filename,
 					    void *data);
 
@@ -847,7 +849,8 @@  linux_find_memory_regions_full (struct gdbarch *gdbarch,
 	  const char *permissions, *device, *filename;
 	  size_t permissions_len, device_len;
 	  int read, write, exec;
-	  int modified = 0, has_anonymous = 0;
+	  enum memory_mapping_state modified_state = MEMORY_MAPPING_UNMODIFIED;
+	  int has_anonymous = 0;
 
 	  read_mapping (line, &addr, &endaddr, &permissions, &permissions_len,
 			&offset, &device, &device_len, &inode, &filename);
@@ -885,18 +888,18 @@  linux_find_memory_regions_full (struct gdbarch *gdbarch,
 		      break;
 		    }
 		  if (number != 0)
-		    modified = 1;
+		    modified_state = MEMORY_MAPPING_MODIFIED;
 		}
 	    }
 
 	  /* Older Linux kernels did not support the "Anonymous:" counter.
 	     If it is missing, we can't be sure - dump all the pages.  */
 	  if (!has_anonymous)
-	    modified = 1;
+	    modified_state = MEMORY_MAPPING_MODIFIED;
 
 	  /* Invoke the callback function to create the corefile segment.  */
 	  func (addr, endaddr - addr, offset, inode,
-		read, write, exec, modified, filename, obfd);
+		read, write, exec, modified_state, filename, obfd);
 	}
 
       do_cleanups (cleanup);
@@ -926,12 +929,14 @@  struct linux_find_memory_regions_data
 static int
 linux_find_memory_regions_thunk (ULONGEST vaddr, ULONGEST size,
 				 ULONGEST offset, ULONGEST inode,
-				 int read, int write, int exec, int modified,
+				 int read, int write, int exec,
+				 enum memory_mapping_state modified_state,
 				 const char *filename, void *arg)
 {
   struct linux_find_memory_regions_data *data = arg;
 
-  return data->func (vaddr, size, read, write, exec, modified, data->obfd);
+  return data->func (vaddr, size, read, write, exec, modified_state,
+		     data->obfd);
 }
 
 /* A variant of linux_find_memory_regions_full that is suitable as the
@@ -1074,7 +1079,8 @@  static linux_find_memory_region_ftype linux_make_mappings_callback;
 static int
 linux_make_mappings_callback (ULONGEST vaddr, ULONGEST size,
 			      ULONGEST offset, ULONGEST inode,
-			      int read, int write, int exec, int modified,
+			      int read, int write, int exec,
+			      enum memory_mapping_state modified_state,
 			      const char *filename, void *data)
 {
   struct linux_make_mappings_data *map_data = data;
@@ -1872,7 +1878,8 @@  linux_gdb_signal_to_target (struct gdbarch *gdbarch,
 
 static int
 find_mapping_size (CORE_ADDR vaddr, unsigned long size,
-		   int read, int write, int exec, int modified,
+		   int read, int write, int exec,
+		   enum memory_mapping_state modified_state,
 		   void *data)
 {
   struct mem_range *range = data;
diff --git a/gdb/procfs.c b/gdb/procfs.c
index b62539f..d074dd3 100644
--- a/gdb/procfs.c
+++ b/gdb/procfs.c
@@ -4967,7 +4967,7 @@  find_memory_regions_callback (struct prmap *map,
 		  (map->pr_mflags & MA_READ) != 0,
 		  (map->pr_mflags & MA_WRITE) != 0,
 		  (map->pr_mflags & MA_EXEC) != 0,
-		  1, /* MODIFIED is unknown, pass it as true.  */
+		  MEMORY_MAPPING_UNKNOWN_STATE, /* MODIFIED is unknown.  */
 		  data);
 }