install_special_mapping && vm_pgoff (Was: vvar, gup && coredump)

Message ID 20150317134309.GA365@redhat.com
State New, archived

Commit Message

Oleg Nesterov March 17, 2015, 1:43 p.m. UTC
On 03/16, Oleg Nesterov wrote:
>
> On 03/16, Andy Lutomirski wrote:
> >
> > Ick, you're probably right.  For what it's worth, the vdso *seems* to
> > be okay (on 64-bit only, and only if you don't poke at it too hard) if
> > you mremap it in one piece.  CRIU does that.
>
> I need to run away till tomorrow, but looking at this code even if "one piece"
> case doesn't look right if it was cow'ed. I'll verify tomorrow.

And I am still not sure this all is 100% correct, but I got lost in this code.
Probably this is fine...

But at least the bug exposed by the test-case looks clear:

	do_linear_fault:

		vmf->pgoff = (((address & PAGE_MASK) - vma->vm_start) >> PAGE_SHIFT)
				+ vma->vm_pgoff;
		...

		special_mapping_fault:

			pgoff = vmf->pgoff - vma->vm_pgoff;


So special_mapping_fault() can only work if this mapping starts from the
first page in ->pages[].

So perhaps we need _something like_ the (wrong/incomplete) patch below...

Or, really, perhaps we can create a vdso_mapping, so that map_vdso() could
simply mmap the anon_inode file...

Oleg.
  

Comments

Andy Lutomirski March 18, 2015, 1:44 a.m. UTC | #1
On Tue, Mar 17, 2015 at 6:43 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 03/16, Oleg Nesterov wrote:
>>
>> On 03/16, Andy Lutomirski wrote:
>> >
>> > Ick, you're probably right.  For what it's worth, the vdso *seems* to
>> > be okay (on 64-bit only, and only if you don't poke at it too hard) if
>> > you mremap it in one piece.  CRIU does that.
>>
>> I need to run away till tomorrow, but looking at this code even if "one piece"
>> case doesn't look right if it was cow'ed. I'll verify tomorrow.
>
> And I am still not sure this all is 100% correct, but I got lost in this code.
> Probably this is fine...
>
> But at least the bug exposed by the test-case looks clear:
>
>         do_linear_fault:
>
>                 vmf->pgoff = (((address & PAGE_MASK) - vma->vm_start) >> PAGE_SHIFT)
>                                 + vma->vm_pgoff;
>                 ...
>
>                 special_mapping_fault:
>
>                         pgoff = vmf->pgoff - vma->vm_pgoff;
>
>
> So special_mapping_fault() can only work if this mapping starts from the
> first page in ->pages[].
>
> So perhaps we need _something like_ the (wrong/incomplete) patch below...
>
> Or, really, perhaps we can create a vdso_mapping, so that map_vdso() could
> simply mmap the anon_inode file...

That's slightly tricky, I think, because it could start showing up in
/proc/PID/map_files or whatever it's called, and I don't think we want
that.  I also don't want to commit to all special mappings everywhere
being semantically identical (there are already two kinds on both x86
and arm64, and I'd eventually like to have them vary per-process as
well).  None of that precludes using a non-NULL vm_file, but it's a
complication.

Your patch does look like a considerable improvement, though.  Let me
see if I can find some time to fold it in with the rest of my special
mapping rework over the next few days.

--Andy

>
> Oleg.
>
> --- x/mm/mmap.c
> +++ x/mm/mmap.c
> @@ -2832,6 +2832,8 @@ int insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma)
>         return 0;
>  }
>
> +bool is_special_vma(struct vm_area_struct *vma);
> +
>  /*
>   * Copy the vma structure to a new location in the same mm,
>   * prior to moving page table entries, to effect an mremap move.
> @@ -2851,7 +2853,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
>          * If anonymous vma has not yet been faulted, update new pgoff
>          * to match new location, to increase its chance of merging.
>          */
> -       if (unlikely(!vma->vm_file && !vma->anon_vma)) {
> +       if (unlikely(!vma->vm_file && !is_special_vma(vma) && !vma->anon_vma)) {
>                 pgoff = addr >> PAGE_SHIFT;
>                 faulted_in_anon_vma = false;
>         }
> @@ -2953,6 +2955,11 @@ static const struct vm_operations_struct legacy_special_mapping_vmops = {
>         .fault = special_mapping_fault,
>  };
>
> +bool is_special_vma(struct vm_area_struct *vma)
> +{
> +       return vma->vm_ops == &special_mapping_vmops;
> +}
> +
>  static int special_mapping_fault(struct vm_area_struct *vma,
>                                 struct vm_fault *vmf)
>  {
> @@ -2965,7 +2972,7 @@ static int special_mapping_fault(struct vm_area_struct *vma,
>          * We are allowed to do this because we are the mm; do not copy
>          * this code into drivers!
>          */
> -       pgoff = vmf->pgoff - vma->vm_pgoff;
> +       pgoff = vmf->pgoff;
>
>         if (vma->vm_ops == &legacy_special_mapping_vmops)
>                 pages = vma->vm_private_data;
> @@ -3014,6 +3021,7 @@ static struct vm_area_struct *__install_special_mapping(
>         if (ret)
>                 goto out;
>
> +       vma->vm_pgoff = 0;
>         mm->total_vm += len >> PAGE_SHIFT;
>
>         perf_event_mmap(vma);
>
  
Oleg Nesterov March 18, 2015, 6:06 p.m. UTC | #2
On 03/17, Andy Lutomirski wrote:
>
> On Tue, Mar 17, 2015 at 6:43 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > But at least the bug exposed by the test-case looks clear:
> >
> >         do_linear_fault:
> >
> >                 vmf->pgoff = (((address & PAGE_MASK) - vma->vm_start) >> PAGE_SHIFT)
> >                                 + vma->vm_pgoff;
> >                 ...
> >
> >                 special_mapping_fault:
> >
> >                         pgoff = vmf->pgoff - vma->vm_pgoff;
> >
> >
> > So special_mapping_fault() can only work if this mapping starts from the
> > first page in ->pages[].
> >
> > So perhaps we need _something like_ the (wrong/incomplete) patch below...
> >
> > Or, really, perhaps we can create a vdso_mapping, so that map_vdso() could
> > simply mmap the anon_inode file...
>
> That's slightly tricky, I think, because it could start showing up in
> /proc/PID/map_files or whatever it's called, and I don't think we want
> that.

Hmm. To me this looks like an improvement. And again, with this change
uprobe-in-vdso can work.

OK, this is off-topic right now; let's forget it for the moment.

> Your patch does look like a considerable improvement, though.  Let me
> see if I can find some time to fold it in with the rest of my special
> mapping rework over the next few days.

I'll try to recheck... Perhaps I'll send this (changed) patch for review.
This is a bugfix, even if the bug is minor.

And note that with this change vvar->access() becomes trivial. I think it
makes sense to fix "gup() fails in vvar" too. GDB developers have enough
other problems with the poor kernel interfaces ;)

Oleg.
  

Patch

--- x/mm/mmap.c
+++ x/mm/mmap.c
@@ -2832,6 +2832,8 @@ int insert_vm_struct(struct mm_struct *mm, struct vm_area_struct *vma)
 	return 0;
 }
 
+bool is_special_vma(struct vm_area_struct *vma);
+
 /*
  * Copy the vma structure to a new location in the same mm,
  * prior to moving page table entries, to effect an mremap move.
@@ -2851,7 +2853,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
 	 * If anonymous vma has not yet been faulted, update new pgoff
 	 * to match new location, to increase its chance of merging.
 	 */
-	if (unlikely(!vma->vm_file && !vma->anon_vma)) {
+	if (unlikely(!vma->vm_file && !is_special_vma(vma) && !vma->anon_vma)) {
 		pgoff = addr >> PAGE_SHIFT;
 		faulted_in_anon_vma = false;
 	}
@@ -2953,6 +2955,11 @@ static const struct vm_operations_struct legacy_special_mapping_vmops = {
 	.fault = special_mapping_fault,
 };
 
+bool is_special_vma(struct vm_area_struct *vma)
+{
+	return vma->vm_ops == &special_mapping_vmops;
+}
+
 static int special_mapping_fault(struct vm_area_struct *vma,
 				struct vm_fault *vmf)
 {
@@ -2965,7 +2972,7 @@ static int special_mapping_fault(struct vm_area_struct *vma,
 	 * We are allowed to do this because we are the mm; do not copy
 	 * this code into drivers!
 	 */
-	pgoff = vmf->pgoff - vma->vm_pgoff;
+	pgoff = vmf->pgoff;
 
 	if (vma->vm_ops == &legacy_special_mapping_vmops)
 		pages = vma->vm_private_data;
@@ -3014,6 +3021,7 @@ static struct vm_area_struct *__install_special_mapping(
 	if (ret)
 		goto out;
 
+	vma->vm_pgoff = 0;
 	mm->total_vm += len >> PAGE_SHIFT;
 
 	perf_event_mmap(vma);