BUG/MINOR: fd: protect fd state harder against a concurrent takeover

There's a theoretical race (that we failed to trigger) in function
fd_update_events(), which could strike on idle connections. The "locked"
variable will most often be 0 as the FD is bound to the current thread
only. Another thread could take the FD over once "locked" has been
evaluated, and change the thread and running masks. The first thread
would then update the FD's state non-atomically and possibly overwrite
what the other thread was preparing. It still looks like the FD's state
will ultimately converge though.

The fix is to set the running flag earlier, so that either a takeover()
attempt cannot succeed, or the fd_set_running() attempt fails, indicating
that nothing needs to be done on this FD.
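
As an illustration only, the ordering described above can be sketched
with C11 atomics. This is not the patched code: the sketch_*() names
are hypothetical, and packing both masks into a single 64-bit word is
an assumption made here so that the two masks always change together.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout for this sketch only: bits 0..31 hold the thread
 * mask, bits 32..63 hold the running mask, so both change atomically.
 */
typedef _Atomic uint64_t fd_masks_t;

#define THREAD_PART(w)  ((uint32_t)(w))
#define RUNNING_PART(w) ((uint32_t)((w) >> 32))

/* Mark the FD as running on the calling thread. Returns false if the
 * thread no longer owns the FD, i.e. nothing needs to be done on it.
 */
static bool sketch_set_running(fd_masks_t *m, uint32_t tid_bit)
{
    uint64_t old = atomic_load(m);

    do {
        if (!(THREAD_PART(old) & tid_bit))
            return false; /* taken over by another thread */
    } while (!atomic_compare_exchange_weak(m, &old,
                                           old | ((uint64_t)tid_bit << 32)));
    return true;
}

/* Drop the calling thread's running bit once it is done with the FD. */
static void sketch_clr_running(fd_masks_t *m, uint32_t tid_bit)
{
    atomic_fetch_and(m, ~((uint64_t)tid_bit << 32));
}

/* Move the FD to the thread owning new_tid_bit. Refused as long as any
 * thread has its running bit set.
 */
static bool sketch_takeover(fd_masks_t *m, uint32_t new_tid_bit)
{
    uint64_t old = atomic_load(m);

    do {
        if (RUNNING_PART(old))
            return false; /* someone is working on the FD */
    } while (!atomic_compare_exchange_weak(m, &old, (uint64_t)new_tid_bit));
    return true;
}
```

With such a scheme, a thread that calls sketch_set_running() before
touching the FD's state either blocks any takeover until it calls
sketch_clr_running(), or learns right away that the FD has moved.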

While this is sufficient for a simple fix to be backported, it leaves
the FD actively polled in the calling thread. This will trigger a second
wakeup which will notice the absence of tid_bit in the thread_mask and
get rid of the FD.

A more elaborate solution would consist in calling fd_set_running()
directly from the pollers before calling fd_update_events(), getting
rid of the thread_mask test and letting the caller eliminate that FD
from its list if needed.

Interestingly, this code also proves to be suboptimal in that it sets
the FD state twice instead of calculating the new state at once and
always using a CAS to set it. This is a leftover of a simplification
that went into 2.4 and which should be explored in a future patch.
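
A minimal sketch of what "calculating the new state at once" could look
like, assuming a single atomic state word. The SK_* flags and the
sketch_update_state() helper are hypothetical and only illustrate the
CAS loop; they are not the FD state flags or functions of the real code.

```c
#include <stdatomic.h>

/* Hypothetical flag values for this sketch only. */
#define SK_READY_R 0x1u
#define SK_READY_W 0x2u
#define SK_ERR     0x4u

/* Clear the bits in to_clear, then set the bits in to_set, as a single
 * atomic transition. Returns the state that was installed.
 */
static unsigned int sketch_update_state(_Atomic unsigned int *state,
                                        unsigned int to_clear,
                                        unsigned int to_set)
{
    unsigned int old = atomic_load(state);
    unsigned int new_state;

    do {
        /* recomputed from the freshest value after each failed CAS */
        new_state = (old & ~to_clear) | to_set;
    } while (!atomic_compare_exchange_weak(state, &old, new_state));
    return new_state;
}
```

Since the new value is derived from the last observed one and installed
in one step, a concurrent change is never overwritten blindly: the CAS
fails and the computation is redone on the updated value.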

This may be backported as far as 2.2.

(cherry picked from commit d5402b8df8a6c50c3b9080189bf3d95afd466feb)
Signed-off-by: Willy Tarreau <w@1wt.eu>
1 file changed