Fix infinite loop in check_pf
Commit Message
getaddrinfo() calls down to check_pf.c:make_request() while holding a lock.
make_request calls __recvmsg, and the documentation for recvmsg states that
a return code of 0 indicates an orderly peer shutdown. Given it is possible
to get a return code of 0 (and presumably once you get 0, subsequent calls
will also get 0), there is an infinite loop. I have a core where this
has happened once; as it enters this loop under lock it is possible to
exhaust other resources (in my case, we exhausted the number of file handles
available to a process as incoming requests were calling getaddrinfo).
line 173 begins the loop
line 182 __recvmsg is called and returns 0
line 191 for loop avoided, as NLMSG_OK (nlmh, 0) is always false
line 283 done is not set, so the loop repeats
ChangeLog:
2014-09-17 Jim King <jim.king@simplivity.com>
* sysdeps/unix/sysv/linux/check_pf.c (make_request): Avoid
infinite loop when __recvmsg returns 0.
Patch:
---
James E. King, III
Architect
SimpliVity Corporation
8 Technology Drive, 2nd Floor
Westborough, MA 01581-1756
Ph: 855-SVT-INFO
Comments
On Mon, Oct 13, 2014 at 08:51:28PM +0000, Jim King wrote:
> getaddrinfo() calls down to check_pf.c:make_request() while holding a lock.
> make_request calls __recvmsg, and the documentation for recvmsg states that
> a return code of 0 indicates an orderly peer shutdown. Given it is possible
> to get a return code of 0 (and presumably once you get 0, subsequent calls
> will also get 0), there is an infinite loop. I have a core where this
> has happened once; as it enters this loop under lock it is possible to
> exhaust other resources (in my case, we exhausted the number of file handles
> available to a process as incoming requests were calling getaddrinfo).
Interesting; it's a netlink socket which is assumed to not fail in
this manner. The situation deserves a bit more investigation to
figure out the root cause, but the patch itself is fine.
Could you please file a bug report in the sourceware bugzilla and let
me know the bug number so that I can mention it in the commit log?
Thanks,
Siddhesh
On Tue, Oct 14, 2014 at 01:12:40PM +0000, Jim King wrote:
> This was originally reported as bug 12926 and Paul Pluzhnikov
> provided the patch. I reopened it a month ago - it doesn't look
> like anyone is looking.
Thanks, I'll commit this.
Thanks,
Siddhesh
@@ -180,7 +180,7 @@ make_request (int fd, pid_t pid)
};
ssize_t read_len = TEMP_FAILURE_RETRY (__recvmsg (fd, &msg, 0));
- if (read_len < 0)
+ if (read_len <= 0)
goto out_fail2;
if (msg.msg_flags & MSG_TRUNC)