[PING,PR,gdb/19361] Fix invalid comparison functions

Message ID 56823702.6020804@samsung.com
State New, archived
Headers

Commit Message

Yury Gribov Dec. 29, 2015, 7:32 a.m. UTC
  Hi all,

The attached patch fixes bugs in comparison functions qsort_cmp and
compare_processes.

I've tested the patch on x86_64-pc-linux-gnu (no regressions in
testsuite except for flakiness in gdb.threads and bigcore.exp).

These functions are passed to qsort(3) but do not obey standard symmetry
requirements mandated by the standard (grep for "total ordering" in
http://pubs.opengroup.org/onlinepubs/009695399/functions/qsort.html).
This causes undefined behavior at runtime which can e.g. cause qsort to
produce invalid results.

Compare_processes fails to properly compare process group leaders which
is probably a serious problem (e.g. resulting in invalid sort).

Qsort_cmp fails to produce proper result when comparing same element.
Sane qsort implementation probably don't call comparison callback on
same element so this may not be a big problem in practice but I think it
should still be fixed.

NB1: please Cc: me explicitly, I'm not subscribed to the list.
NB2: Bugs were found with SortChecker tool
(https://github.com/yugr/sortcheck).

Best regards,
Yury Gribov
  

Comments

Pedro Alves Dec. 29, 2015, 5:27 p.m. UTC | #1
On 12/29/2015 07:32 AM, Yury Gribov wrote:
> Hi all,
> 
> The attached patch fixes bugs in comparison functions qsort_cmp and
> compare_processes.
> 
> I've tested the patch on x86_64-pc-linux-gnu (no regressions in
> testsuite except for flakiness in gdb.threads and bigcore.exp).
> 
> These functions are passed to qsort(3) but do not obey standard symmetry
> requirements mandated by the standard (grep for "total ordering" in
> http://pubs.opengroup.org/onlinepubs/009695399/functions/qsort.html).
> This causes undefined behavior at runtime which can e.g. cause qsort to
> produce invalid results.
> 
> Compare_processes fails to properly compare process group leaders which
> is probably a serious problem (e.g. resulting in invalid sort).

I'm not sure whether it's possible that you end up with equivalent
elements in the list.  That is, two entries with the same pgid and pid.
I suppose it could, if the kernel doesn't build the /proc/ directory in one
go under a lock (or rcu), and a process that has been added to the directory
already just exited and the kernel reuses the pid for another process of
the same progress group while we're calling readdir...  Did you check?
I was under the impression the whole /proc subdir was built atomically
at open time.

> diff --git a/gdb/nat/linux-osdata.c b/gdb/nat/linux-osdata.c
> index 56a8fe6..25a310f 100644
> --- a/gdb/nat/linux-osdata.c
> +++ b/gdb/nat/linux-osdata.c
> @@ -420,9 +420,9 @@ compare_processes (const void *process1, const void *process2)
>    else
>      {
>        /* Process group leaders always come first, else sort by PID.  */
> -      if (pid1 == pgid1)
> +      if (pid1 == pgid1 && pid2 != pgid2)
>  	return -1;
> -      else if (pid2 == pgid2)
> +      else if (pid1 != pgid1 && pid2 == pgid2)
>  	return 1;
>        else if (pid1 < pid2)
>  	return -1;

In any case, seems to me that it'd result in simpler-to-read-code if
you rewrote it like this:

      /* Process group leaders always come first, else sort by PID.  */

      /* Easier to check for equivalent element first.  */
      if (pid1 == pid2)
        return 0;

      if (pid1 == pgid1)
	return -1;
      else if (pid2 == pgid2)
	return 1;
      else if (pid1 < pid2)
	return -1;
      else if (pid1 > pid2)
	return 1;

> 
> Qsort_cmp fails to produce proper result when comparing same element.
> Sane qsort implementation probably don't call comparison callback on
> same element 

One would hope...  AFAIK, the only real reason to compare same
object, is if you're sorting an array of pointers, and you can have
the same pointer included twice in the array being sorted.  It's still
not the same as comparing same element (the pointers are the elements), but
it's close.  But in this case, if that ever happened, surely something
else would have blown up already.

So how about we make that:

  if (sect1_addr < sect2_addr)
    return -1;
  else if (sect1_addr > sect2_addr)
    return 1;
-  else
+  if (sect1 != sect2)
    {

So that the assertion at the bottom is reached in that case? :

  /* Unreachable.  */
  gdb_assert_not_reached ("unexpected code path");
  return 0;
}

> so this may not be a big problem in practice but I think it
> should still be fixed.

Thanks,
Pedro Alves
  
Yury Gribov Dec. 29, 2015, 6:09 p.m. UTC | #2
On 12/29/2015 08:27 PM, Pedro Alves wrote:
> On 12/29/2015 07:32 AM, Yury Gribov wrote:
>> Hi all,
>>
>> The attached patch fixes bugs in comparison functions qsort_cmp and
>> compare_processes.
>>
>> I've tested the patch on x86_64-pc-linux-gnu (no regressions in
>> testsuite except for flakiness in gdb.threads and bigcore.exp).
>>
>> These functions are passed to qsort(3) but do not obey standard symmetry
>> requirements mandated by the standard (grep for "total ordering" in
>> http://pubs.opengroup.org/onlinepubs/009695399/functions/qsort.html).
>> This causes undefined behavior at runtime which can e.g. cause qsort to
>> produce invalid results.
>>
>> Compare_processes fails to properly compare process group leaders which
>> is probably a serious problem (e.g. resulting in invalid sort).
>
> I'm not sure whether it's possible that you end up with equivalent
> elements in the list.  That is, two entries with the same pgid and pid.
> I suppose it could, if the kernel doesn't build the /proc/ directory in one
> go under a lock (or rcu), and a process that has been added to the directory
> already just exited and the kernel reuses the pid for another process of
> the same progress group while we're calling readdir...  Did you check?
> I was under the impression the whole /proc subdir was built atomically
> at open time.
>
>> diff --git a/gdb/nat/linux-osdata.c b/gdb/nat/linux-osdata.c
>> index 56a8fe6..25a310f 100644
>> --- a/gdb/nat/linux-osdata.c
>> +++ b/gdb/nat/linux-osdata.c
>> @@ -420,9 +420,9 @@ compare_processes (const void *process1, const void *process2)
>>     else
>>       {
>>         /* Process group leaders always come first, else sort by PID.  */
>> -      if (pid1 == pgid1)
>> +      if (pid1 == pgid1 && pid2 != pgid2)
>>   	return -1;
>> -      else if (pid2 == pgid2)
>> +      else if (pid1 != pgid1 && pid2 == pgid2)
>>   	return 1;
>>         else if (pid1 < pid2)
>>   	return -1;
>
> In any case, seems to me that it'd result in simpler-to-read-code if
> you rewrote it like this:
>
>        /* Process group leaders always come first, else sort by PID.  */
>
>        /* Easier to check for equivalent element first.  */
>        if (pid1 == pid2)
>          return 0;
>
>        if (pid1 == pgid1)
> 	return -1;
>        else if (pid2 == pgid2)
> 	return 1;
>        else if (pid1 < pid2)
> 	return -1;
>        else if (pid1 > pid2)
> 	return 1;
>
>>
>> Qsort_cmp fails to produce proper result when comparing same element.
>> Sane qsort implementation probably don't call comparison callback on
>> same element
>
> One would hope...  AFAIK, the only real reason to compare same
> object, is if you're sorting an array of pointers, and you can have
> the same pointer included twice in the array being sorted.  It's still
> not the same as comparing same element (the pointers are the elements), but
> it's close.  But in this case, if that ever happened, surely something
> else would have blown up already.
>
> So how about we make that:
>
>    if (sect1_addr < sect2_addr)
>      return -1;
>    else if (sect1_addr > sect2_addr)
>      return 1;
> -  else
> +  if (sect1 != sect2)
>      {
>
> So that the assertion at the bottom is reached in that case? :
>
>    /* Unreachable.  */
>    gdb_assert_not_reached ("unexpected code path");
>    return 0;
> }
>
>> so this may not be a big problem in practice but I think it
>> should still be fixed.

Added my home address (to be able to answer on vacation).

-Y
  
Yuri Gribov Dec. 30, 2015, 8:18 p.m. UTC | #3
On Tue, Dec 29, 2015 at 9:09 PM, Yury Gribov <y.gribov@samsung.com> wrote:
> On 12/29/2015 08:27 PM, Pedro Alves wrote:
>>
>> On 12/29/2015 07:32 AM, Yury Gribov wrote:
>>>
>>> Hi all,
>>>
>>> The attached patch fixes bugs in comparison functions qsort_cmp and
>>> compare_processes.
>>>
>>> I've tested the patch on x86_64-pc-linux-gnu (no regressions in
>>> testsuite except for flakiness in gdb.threads and bigcore.exp).
>>>
>>> These functions are passed to qsort(3) but do not obey standard symmetry
>>> requirements mandated by the standard (grep for "total ordering" in
>>> http://pubs.opengroup.org/onlinepubs/009695399/functions/qsort.html).
>>> This causes undefined behavior at runtime which can e.g. cause qsort to
>>> produce invalid results.
>>>
>>> Compare_processes fails to properly compare process group leaders which
>>> is probably a serious problem (e.g. resulting in invalid sort).
>>
>>
>> I'm not sure whether it's possible that you end up with equivalent
>> elements in the list.  That is, two entries with the same pgid and pid.
>> I suppose it could, if the kernel doesn't build the /proc/ directory in
>> one
>> go under a lock (or rcu), and a process that has been added to the
>> directory
>> already just exited and the kernel reuses the pid for another process of
>> the same progress group while we're calling readdir...  Did you check?
>> I was under the impression the whole /proc subdir was built atomically
>> at open time.

Sorry, I should have been more wordy about the actual problem. With
current approach i.e.

  if (pid1 == pgid1)
    return -1;
  else if (pid2 == pgid2)
    return 1;

comparison of two group leaders is not going to be symmetric:

  cmp(lead_1, lead_2) == cmp(lead_2, lead_1) == -1

whereas qsort requires cmp(x, y) == -cmp(y, x) (symmetry requirement).
Such violations of ordering may easily cause sorting algorithm to
misbehave.

>>> Qsort_cmp fails to produce proper result when comparing same element.
>>> Sane qsort implementation probably don't call comparison callback on
>>> same element
>>
>> One would hope...  AFAIK, the only real reason to compare same
>> object, is if you're sorting an array of pointers, and you can have
>> the same pointer included twice in the array being sorted.  It's still
>> not the same as comparing same element (the pointers are the elements),
>> but
>> it's close.  But in this case, if that ever happened, surely something
>> else would have blown up already.
>>
>> So how about we make that:
>>
>>    if (sect1_addr < sect2_addr)
>>      return -1;
>>    else if (sect1_addr > sect2_addr)
>>      return 1;
>> -  else
>> +  if (sect1 != sect2)
>>      {
>>
>> So that the assertion at the bottom is reached in that case? :
>>
>>    /* Unreachable.  */
>>    gdb_assert_not_reached ("unexpected code path");
>>    return 0;
>> }

Makes perfect sense!

And thanks for detailed reply btw.

-Y
  
Pedro Alves Dec. 30, 2015, 9:25 p.m. UTC | #4
On 12/30/2015 08:18 PM, Yuri Gribov wrote:

> Sorry, I should have been more wordy about the actual problem. With
> current approach i.e.
> 
>   if (pid1 == pgid1)
>     return -1;
>   else if (pid2 == pgid2)
>     return 1;
> 
> comparison of two group leaders is not going to be symmetric:
> 
>   cmp(lead_1, lead_2) == cmp(lead_2, lead_1) == -1

Aaaaaaah, d'oh!  Thanks, it's obvious now, yes, we fail to consider
the case of both elements being leaders.  I couldn't see that
even after staring at the code for a while.  That hunk is OK as
is then.  (Please clarify this in the commit log.)

Thanks,
Pedro Alves
  
Pedro Alves Dec. 30, 2015, 9:35 p.m. UTC | #5
On 12/30/2015 09:25 PM, Pedro Alves wrote:
> On 12/30/2015 08:18 PM, Yuri Gribov wrote:
> 
>> Sorry, I should have been more wordy about the actual problem. With
>> current approach i.e.
>>
>>   if (pid1 == pgid1)
>>     return -1;
>>   else if (pid2 == pgid2)
>>     return 1;
>>
>> comparison of two group leaders is not going to be symmetric:
>>
>>   cmp(lead_1, lead_2) == cmp(lead_2, lead_1) == -1
> 
> Aaaaaaah, d'oh!  Thanks, it's obvious now, yes, we fail to consider
> the case of both elements being leaders.  I couldn't see that
> even after staring at the code for a while.  That hunk is OK as
> is then.  (Please clarify this in the commit log.)

Wait, no we don't...  If both are leaders when you get there, then
they must have different pgid's, and that case is handled before:

  /* Sort by PGID.  */
  if (pgid1 < pgid2)
    return -1;
  else if (pgid1 > pgid2)
    return 1;
  else

But let's assume not.  Let's assume we see two leaders when you get to the
code in question.  That means they have pgid1==pgid2.  Since by definition
being leader means pgid==pid, it must be that pid1 == pid2 as well.  That
is, this is really about comparing equivalent elements.  Which
brings us back again to:

      /* Easier to check for equivalent element first.  */
      if (pid1 == pid2)
        return 0;

Or am I confused again?

Thanks,
Pedro Alves
  
Yuri Gribov Jan. 2, 2016, 2:18 a.m. UTC | #6
On Thu, Dec 31, 2015 at 12:35 AM, Pedro Alves <palves@redhat.com> wrote:
> On 12/30/2015 09:25 PM, Pedro Alves wrote:
>> On 12/30/2015 08:18 PM, Yuri Gribov wrote:
>>
>>> Sorry, I should have been more wordy about the actual problem. With
>>> current approach i.e.
>>>
>>>   if (pid1 == pgid1)
>>>     return -1;
>>>   else if (pid2 == pgid2)
>>>     return 1;
>>>
>>> comparison of two group leaders is not going to be symmetric:
>>>
>>>   cmp(lead_1, lead_2) == cmp(lead_2, lead_1) == -1
>>
>> Aaaaaaah, d'oh!  Thanks, it's obvious now, yes, we fail to consider
>> the case of both elements being leaders.  I couldn't see that
>> even after staring at the code for a while.  That hunk is OK as
>> is then.  (Please clarify this in the commit log.)
>
> Wait, no we don't...  If both are leaders when you get there, then
> they must have different pgid's, and that case is handled before:
>
>   /* Sort by PGID.  */
>   if (pgid1 < pgid2)
>     return -1;
>   else if (pgid1 > pgid2)
>     return 1;
>   else
>
> But let's assume not.  Let's assume we see two leaders when you get to the
> code in question.  That means they have pgid1==pgid2.  Since by definition
> being leader means pgid==pid, it must be that pid1 == pid2 as well.  That
> is, this is really about comparing equivalent elements.  Which
> brings us back again to:
>
>       /* Easier to check for equivalent element first.  */
>       if (pid1 == pid2)
>         return 0;
>
> Or am I confused again?

Hm, this sounds reasonable. Let me see if I can repro this bug and get
the exact inputs which caused the problem (unfortunately I didn't
collect them during testing).

-Y
  

Patch

>From 5bc187e060a75f0dff83e2803adb53f41c2e6dcb Mon Sep 17 00:00:00 2001
From: Yury Gribov <y.gribov@samsung.com>
Date: Tue, 15 Dec 2015 10:56:06 +0300
Subject: [PATCH] Fix invalid comparison functions.

2015-12-15  Yury Gribov  <tetra2005@gmail.com>

	PR gdb/19361
        * nat/linux-osdata.c (compare_processes):
	Fix asymmetry in comparison function.
	* objfiles.c (qsort_cmp): Ditto.
---
 gdb/nat/linux-osdata.c | 4 ++--
 gdb/objfiles.c         | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/gdb/nat/linux-osdata.c b/gdb/nat/linux-osdata.c
index 56a8fe6..25a310f 100644
--- a/gdb/nat/linux-osdata.c
+++ b/gdb/nat/linux-osdata.c
@@ -420,9 +420,9 @@  compare_processes (const void *process1, const void *process2)
   else
     {
       /* Process group leaders always come first, else sort by PID.  */
-      if (pid1 == pgid1)
+      if (pid1 == pgid1 && pid2 != pgid2)
 	return -1;
-      else if (pid2 == pgid2)
+      else if (pid1 != pgid1 && pid2 == pgid2)
 	return 1;
       else if (pid1 < pid2)
 	return -1;
diff --git a/gdb/objfiles.c b/gdb/objfiles.c
index d33379f..4aa600b 100644
--- a/gdb/objfiles.c
+++ b/gdb/objfiles.c
@@ -1140,6 +1140,9 @@  qsort_cmp (const void *a, const void *b)
   const CORE_ADDR sect1_addr = obj_section_addr (sect1);
   const CORE_ADDR sect2_addr = obj_section_addr (sect2);
 
+  if (sect1 == sect2)
+    return 0;
+
   if (sect1_addr < sect2_addr)
     return -1;
   else if (sect1_addr > sect2_addr)
-- 
1.9.1