BUG/MEDIUM: listener/thread: fix a race when pausing a listener
There exists a race in the listener code where a thread might disable
receipt on a listener's FD then turn it to LI_PAUSED while at the same
time another one faces EAGAIN on accept() and enables it again via
fd_cant_recv(). The result is that the FD is in LI_PAUSED state with
its polling still enabled. listener_accept() does not do anything then
and doesn't disable the FD either, resulting in a thread eating all the
CPU as reported in issue #358. A solution would be to take the listener's
lock to perform the fd_cant_recv() call and do it only if the FD is still
in LI_READY state, but this would be totally overkill while in practice
the issue only happens during shutdown.
Instead what is done here is that when leaving we recheck the state and
disable polling if the listener is not in LI_READY state, which never
happens except when being limited. In the worst case there could be one
extra check per thread for the time required to converge, which is
absolutely nothing.
This fix was successfully tested, and should be backported to all
versions using the lock-free listeners, which means all those containing
commit 3f0d02bb ("MAJOR: listener: do not hold the listener lock in
listener_accept()"), hence 2.1, 2.0, 1.9.7+, 1.8.20+.
(cherry picked from commit 4c044e274c16fde42863c476449895b0fd603818)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 8d30120a21cc22138395c153930da6f81ebca013)
Signed-off-by: Willy Tarreau <w@1wt.eu>
1 file changed