BUG/MEDIUM: fd/threads: fix a concurrency issue between add and rm on the same fd
There's a very hard-to-trigger bug in the FD list code where the
fd_add_to_fd_list() function assumes that if the FD it's trying to add
is already locked, it's in the process of being added. Unfortunately, it
can also be in the process of being removed. It is very hard to trigger
because it requires that one thread is removing the FD while another one
is adding it. First very few FDs run on multiple threads (listeners and
DNS), and second, it does not make sense to add and remove the FD at the
same time.
In practice the DNS code built on the older callback-only model does
perform bursts of fd_want_send() for all resolvers at once when it wants
to send a new query (dns_send_query()). And this is more likely to happen
when here are lots of resolutions in parallel and many resolvers, because
the dns_response_recv() callback can also trigger a series of queries on
all resolvers for each invalid response it receives. This means that it
really is perfectly possible to both stop and start in parallel during
short periods of time there.
This issue was not reported before 2.1, but 2.1 had the FD cache, built
on the exact same code base. It's very possible that the issue caused
exactly the opposite situation, where an event was occasionally lost,
causing a DNS retry that worked, and nobody noticing the problem in the
end. In 2.1 the lost entries are the updates asking for not polling for
writes anymore, and the effect is that the poller contiuously reports
writability on the socket when the issue happens.
This patch fixes bug #416 and must be backported as far as 1.8, and
absolutely requires that previous commit "MINOR: fd/threads: make
_GET_NEXT()/_GET_PREV() use the volatile attribute" is backported as
well otherwise it will make the issue worse.
Special thanks to Julien Pivotto for setting up a reliable reproducer
for this difficult issue.
(cherry picked from commit fc51f0f588b77458e340476f84c2d475f3a2a9d5)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 9342d55c55064abe979f20bbe0ddf7d1b3203e05)
Signed-off-by: Willy Tarreau <w@1wt.eu>
diff --git a/src/fd.c b/src/fd.c
index b735f56..3d361e8 100644
--- a/src/fd.c
+++ b/src/fd.c
@@ -201,8 +201,10 @@
redo_next:
next = _GET_NEXT(fd, off);
/* Check that we're not already in the cache, and if not, lock us. */
- if (next >= -2)
+ if (next > -2)
goto done;
+ if (next == -2)
+ goto redo_next;
if (!_HA_ATOMIC_CAS(&_GET_NEXT(fd, off), &next, -2))
goto redo_next;
__ha_barrier_atomic_store();