Fixing where test-strncmp could read beyond page boundary

Message ID a321b583-d108-3c45-3354-fa80ccef8171@us.ibm.com
State Superseded
Delegated to: Richard Henderson
Headers

Commit Message

Paul A. Clarke Dec. 20, 2016, 5:19 p.m. UTC
  In do_random_tests(), 2 "page" size buffers are used with various
offsets and lengths where the operations are fairly close to the end
of a page boundary.  Most of the computations are done carefully to
avoid crossing the page boundary, except the actual "size" which is
eventually passed to strncmp() (via CALL_IMPL()) is not.  "size" is
computed once and not adjusted to accommodate for the random offsets
into the buffers at which the strings to be compared start.

This change ensures that the "size" passed to strncmp() contains the
strings within the page.

I also added a number of comments within do_random_tests() in the
hope that someone new to the code will find it much more
comprehensible than I did.

2016-12-16  Paul A. Clarke  <pc@us.ibm.com>

     * string/test-strncmp.c (do_random_tests): Ensure "size" defines
     strings within a single page.
---
  string/test-strncmp.c | 40 +++++++++++++++++++++++++++++++++++++---
  1 file changed, 37 insertions(+), 3 deletions(-)
  

Comments

Richard Henderson Dec. 20, 2016, 5:29 p.m. UTC | #1
On 12/20/2016 09:19 AM, Paul Clarke wrote:
> This change ensures that the "size" passed to strncmp() contains the
> strings within the page.

If there is a known difference or a known EOS before the end of the page, then
SIZE can legitimately be beyond the end of the page and the routine should not
SEGV.

Are you certain you're not removing valid testing for this function?


r~
  
Paul A. Clarke Dec. 20, 2016, 6:46 p.m. UTC | #2
On 12/20/2016 11:29 AM, Richard Henderson wrote:
> On 12/20/2016 09:19 AM, Paul Clarke wrote:
>> This change ensures that the "size" passed to strncmp() contains the
>> strings within the page.
>
> If there is a known difference or a known EOS before the end of the page, then
> SIZE can legitimately be beyond the end of the page and the routine should not
> SEGV.
>
> Are you certain you're not removing valid testing for this function?

That's a fair question.

do_random_tests() checks if strncmp() gives correct results for different alignment and length combinations. It checks if strncmp() is not issuing unaligned loads beyond a native alignment: by placing the strings near the end of a page, such loads will provoke a seg fault. This test already takes pains to ensure the string proper does not cross a page boundary by doing various checks, but one check missing is for variable 'size'. The 'size' parameter can be used in the implementation of strncmp() to decide the loop value for the number of loads. So these checks help to make the testcase valid thereby any failure would be in the implementation of strncmp().

Regards,
PC
  
Richard Henderson Dec. 20, 2016, 7:15 p.m. UTC | #3
On 12/20/2016 10:46 AM, Paul Clarke wrote:
> The 'size' parameter can be used in the implementation of strncmp() to decide
> the loop value for the number of loads.

No, it can't.  I know because I've made that mistake before.


r~
  
Florian Weimer Dec. 21, 2016, 9:44 a.m. UTC | #4
On 12/20/2016 08:15 PM, Richard Henderson wrote:
> On 12/20/2016 10:46 AM, Paul Clarke wrote:
>> The 'size' parameter can be used in the implementation of strncmp() to decide
>> the loop value for the number of loads.
>
> No, it can't.  I know because I've made that mistake before.

Richard is right.  Even C11 is pretty clear that this optimization is 
not permitted because it says that “characters that follow a
null character are not compared”.

It's unspecified whether reading stops at the first difference.  But 
considering that you need to check for NUL bytes anyway, I don't think 
this matters in practice because any size-based optimization would be 
invalid anyway.

This is also relevant to real-world code, which may use

   strncmp (s, "expected-prefix", strlen ("expected-prefix")) == 0

to determine whether s begins with "expected-prefix".  s could point 
close to the end of the heap.

Thanks,
Florian
  
Paul A. Clarke Dec. 21, 2016, 7:13 p.m. UTC | #5
On 12/21/2016 03:44 AM, Florian Weimer wrote:
> On 12/20/2016 08:15 PM, Richard Henderson wrote:
>> On 12/20/2016 10:46 AM, Paul Clarke wrote:
>>> The 'size' parameter can be used in the implementation of strncmp() to decide
>>> the loop value for the number of loads.
>>
>> No, it can't.  I know because I've made that mistake before.
>
> Richard is right.  Even C11 is pretty clear that this optimization is
> not permitted because it says that “characters that follow a
> null character are not compared”.
>
> It's unspecified whether reading stops at the first difference.  But
> considering that you need to check for NUL bytes anyway, I don't think
> this matters in practice because any size-based optimization would be
> invalid anyway.
>
> This is also relevant to real-world code, which may use
>
>    strncmp (s, "expected-prefix", strlen ("expected-prefix")) == 0
>
> to determine whether s begins with "expected-prefix".  s could point
> close to the end of the heap.

Thanks for the thorough reviews. Looking more carefully this time, I agree.

Since I sifted through the fairly complex code and found it challenging, let me try to preserve some value from the effort and submit a patch with just the explanatory comments, in case someone else needs to understand the code in the future.

PC
  

Patch

diff --git a/string/test-strncmp.c b/string/test-strncmp.c
index fb57a9b..97768f9 100644
--- a/string/test-strncmp.c
+++ b/string/test-strncmp.c
@@ -264,33 +264,58 @@  do_random_tests (void)
    size_t i, j, n, align1, align2, pos, len1, len2, size;
    int result;
    long r;
+  /* Get pointers near the end of a "page".  */
    UCHAR *p1 = (UCHAR *) (buf1 + page_size - 512 * CHARBYTES);
    UCHAR *p2 = (UCHAR *) (buf2 + page_size - 512 * CHARBYTES);
  
    for (n = 0; n < ITERATIONS; n++)
      {
+      /* Start index for first string within p1.  */
        align1 = random () & 31;
+
+      /* Start index for second string within p2.  */
        if (random () & 1)
  	align2 = random () & 31;
        else
  	align2 = align1 + (random () & 24);
-      pos = random () & 511;
-      size = random () & 511;
+
+      /* Compute larger offset into buffers.  */
        j = align1 > align2 ? align1 : align2;
-      if (pos + j >= 511)
+
+      /* Position of difference between strings.  */
+      pos = random () & 511;
+
+      /* Ensure pos within range for strings.  */
+      if (j + pos >= 511)
  	pos = 510 - j - (random () & 7);
+
+      /* Maximum size of operation.  */
+      size = random () & 511;
+
+      /* Ensure size within range for buffers.  */
+      if (j + size >= 512)
+	size = 512 - j;
+
+      /* Actual length of 1st string.  */
        len1 = random () & 511;
        if (pos >= len1 && (random () & 1))
  	len1 = pos + (random () & 7);
        if (len1 + j >= 512)
  	len1 = 511 - j - (random () & 7);
+
+      /* Actual length of 2nd string.  */
+      /* Note len2 >= len1.  */
        if (pos >= len1)
  	len2 = len1;
        else
  	len2 = len1 + (len1 != 511 - j ? random () % (511 - j - len1) : 0);
+
+      /* Compute max length of strings, plus margin of dword.  */
        j = (pos > len2 ? pos : len2) + align1 + 64;
        if (j > 512)
  	j = 512;
+
+      /* Fill p1 with random data.  */
        for (i = 0; i < j; ++i)
  	{
  	  p1[i] = random () & 255;
@@ -301,6 +326,8 @@  do_random_tests (void)
  		p1[i] = 1 + (random () & 127);
  	    }
  	}
+
+      /* Fill p2 with random data.  */
        for (i = 0; i < j; ++i)
  	{
  	  p2[i] = random () & 255;
@@ -313,11 +340,15 @@  do_random_tests (void)
  	}
  
        result = 0;
+
+      /* Make strings the same, up to position pos.  */
        MEMCPY (p2 + align2, p1 + align1, pos);
+
        if (pos < len1)
  	{
  	  if (p2[align2 + pos] == p1[align1 + pos])
  	    {
+	      /* Insert difference.  */
  	      p2[align2 + pos] = random () & 255;
  	      if (p2[align2 + pos] == p1[align1 + pos])
  		p2[align2 + pos] = p1[align1 + pos] + 3 + (random () & 127);
@@ -325,12 +356,15 @@  do_random_tests (void)
  
  	  if (pos < size)
  	    {
+	      /* Set expectations.  */
  	      if (p1[align1 + pos] < p2[align2 + pos])
  		result = -1;
  	      else
  		result = 1;
  	    }
  	}
+
+      /* Null terminate strings.  */
        p1[len1 + align1] = 0;
        p2[len2 + align2] = 0;