BUG/MAJOR: idle conns: schedule the cleanup task on the correct threads
The idle cleanup tasks' masks are wrong for threads 32 to 64 because
they are computed with "1 << i" on an int. This causes the wrong thread
to wake up and clean connections that it does not own, with a risk of
crash or infinite loop depending on concurrent accesses. For thread 32,
any thread between 32 and 64 will be woken up, while for threads 33 to
64, threads 1 to 32 will run the task instead.
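The effect described above can be illustrated with a minimal sketch,
assuming a 64-bit unsigned long and a 32-bit int. The helpers
thread_mask() and sign_bit_as_mask() are hypothetical names used only for
this illustration and are not part of HAProxy. Thread indexes are
zero-based here, so "thread 32" in the text corresponds to index 31. The
second helper models what "1 << 31" typically produces once widened to a
mask, without performing the undefined signed shift itself.

```c
#include <assert.h>
#include <limits.h>

/* Number of bits in an unsigned long, i.e. the maximum number of threads
 * a single mask can describe (64 on LP64 platforms).
 */
#define MASK_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: returns the mask with only bit <thr> set, where
 * <thr> is a zero-based thread index. The shift is performed on an
 * unsigned long so that it is well defined for every index below
 * MASK_BITS.
 */
static unsigned long thread_mask(unsigned int thr)
{
	assert(thr < MASK_BITS);
	return 1UL << thr;
}

/* Hypothetical helper: returns what an int holding only its sign bit
 * becomes once converted to unsigned long. With a 32-bit int and a
 * 64-bit long, the conversion sets bits 31 to 63, which is why a single
 * task ends up being eligible on many threads at once.
 */
static unsigned long sign_bit_as_mask(void)
{
	return (unsigned long)INT_MIN;
}
```

With these assumptions, thread_mask(31) has a single bit set while
sign_bit_as_mask() has 33 bits set, matching the "any thread between 32
and 64" behaviour. For indexes 32 and above, a shift on an int is
undefined in C, and the wrap to threads 1 to 32 reported here is what is
commonly observed on x86.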
This issue only affects deployments enabling more than 32 threads. While
it is not common in 1.9, where this has to be explicit and can easily be
dealt with by lowering the number of threads, it can be more common in
2.0 since by default the thread count is determined based on the number
of available processors, hence the MAJOR tag which is mostly relevant
to 2.x.
The problem was first introduced into 1.9-dev9 by commit 0c18a6fe3
("MEDIUM: servers: Add a way to keep idle connections alive.") and was
later moved to cfgparse.c by commit 980855bd9 ("BUG/MEDIUM: server:
initialize the orphaned conns lists and tasks at the end").
This patch needs to be backported as far as 1.9, with care as 1.9 is
slightly different there (uses idle_task[] instead of idle_conn_cleanup[]
like in 2.x).
(cherry picked from commit bbb5f1d6d2a9948409683aa5865c130801d193ad)
Signed-off-by: Willy Tarreau <w@1wt.eu>
diff --git a/src/cfgparse.c b/src/cfgparse.c
index a6a8764..3bbb802 100644
--- a/src/cfgparse.c
+++ b/src/cfgparse.c
@@ -3690,7 +3690,7 @@
idle_conn_task->process = srv_cleanup_idle_connections;
idle_conn_task->context = NULL;
for (i = 0; i < global.nbthread; i++) {
- idle_conn_cleanup[i] = task_new(1 << i);
+ idle_conn_cleanup[i] = task_new(1UL << i);
if (!idle_conn_cleanup[i])
goto err;
idle_conn_cleanup[i]->process = srv_cleanup_toremove_connections;