Fix infinite loop in check_pf

Message ID EEBB90F5C3BBEF4688BEC315A6934C8047924B84@SVTEX01.simplivity.com
State Committed
Headers

Commit Message

Jim King Oct. 13, 2014, 8:51 p.m. UTC
  getaddrinfo() calls down to check_pf.c:make_request() while holding a lock.
make_request calls __recvmsg, and the documentation for recvmsg states that
a return code of 0 indicates an orderly peer shutdown.  Given it is possible
to get a return code of 0 (and presumably once you get 0, subsequent calls
will also get 0), there is an infinite loop.  I have a core where this
has happened once; as it enters this loop under lock it is possible to
exhaust other resources (in my case, we exhausted the number of file handles
available to a process as incoming requests were calling getaddrinfo).

line 173 begins the loop
line 182 __recvmsg is called and returns 0
line 191 for loop avoided, as NLMSG_OK (nlmh, 0) is always false
line 283 done is not set, so the loop repeats

ChangeLog:

2014-09-17  Jim King  <jim.king@simplivity.com>

        * sysdeps/unix/sysv/linux/check_pf.c (make_request): Avoid
        infinite loop when __recvmsg returns 0.

Patch:


---
James E. King, III
Architect
SimpliVity Corporation
8 Technology Drive, 2nd Floor
Westborough, MA 01581-1756
Ph: 855-SVT-INFO
  

Comments

Siddhesh Poyarekar Oct. 14, 2014, 8:29 a.m. UTC | #1
On Mon, Oct 13, 2014 at 08:51:28PM +0000, Jim King wrote:
> getaddrinfo() calls down to check_pf.c:make_request() while holding a lock.
> make_request calls __recvmsg, and the documentation for recvmsg states that
> a return code of 0 indicates an orderly peer shutdown.  Given it is possible
> to get a return code of 0 (and presumably once you get 0, subsequent calls
> will also get 0), there is an infinite loop.  I have a core where this
> has happened once; as it enters this loop under lock it is possible to
> exhaust other resources (in my case, we exhausted the number of file handles
> available to a process as incoming requests were calling getaddrinfo).

Interesting; it's a netlink socket which is assumed to not fail in
this manner.  The situation deserves a bit more investigation to
figure out the root cause, but the patch itself is fine.

Could you please file a bug report in the sourceware bugzilla and let
me know the bug number so that I can mention it in the commit log?

Thanks,
Siddhesh
  
Siddhesh Poyarekar Oct. 14, 2014, 3:30 p.m. UTC | #2
On Tue, Oct 14, 2014 at 01:12:40PM +0000, Jim King wrote:
> This was originally reported as bug 12926 and Paul Pluzhnikov
> provided the patch.  I reopened it a month ago - it doesn't look
> like anyone is looking.

Thanks, I'll commit this.

Thanks,
Siddhesh
  

Patch

diff --git a/sysdeps/unix/sysv/linux/check_pf.c b/sysdeps/unix/sysv/linux/check_pf.c
index c7fd9b0..976f249 100644
--- a/sysdeps/unix/sysv/linux/check_pf.c
+++ b/sysdeps/unix/sysv/linux/check_pf.c
@@ -180,7 +180,7 @@  make_request (int fd, pid_t pid)
        };
 
       ssize_t read_len = TEMP_FAILURE_RETRY (__recvmsg (fd, &msg, 0));
-      if (read_len < 0)
+      if (read_len <= 0)
        goto out_fail2;
 
       if (msg.msg_flags & MSG_TRUNC)