BUG/MINOR: haproxy/threads: try to make all threads leave together
There's a small issue when soft stop is combined with incoming
connection load balancing. A thread may dispatch a connection to
another thread at the very moment stopping=1 is set, and that second
thread may then leave run_poll_loop() upon seeing
(jobs - unstoppable_jobs) == 0, without ever picking the connection
from its queue. The visible effect is an occasional connection drop
on reload, since no remaining thread will ever pick that connection
up.
To address this, this patch adds a stopping_thread_mask variable in
which each thread sets its bit to acknowledge its willingness to stop
once its run queue is empty. Threads only leave once all of them have
acknowledged, so that if some late work arrives in a thread's queue,
that thread still has a chance to process it.
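The acknowledgement scheme described above can be sketched in isolation
as follows. This is only an illustrative model, not HAProxy code: it
uses C11 atomics instead of _HA_ATOMIC_OR(), and struct stop_state,
ack_stop() and may_leave() are hypothetical names. The run queue length
is passed as a parameter here, whereas the patch reads tasks_run_queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the globals touched by the patch. */
struct stop_state {
    atomic_ulong stopping_thread_mask; /* bits of threads that acknowledged the stop */
    unsigned long all_threads_mask;    /* bits of all running threads */
};

/* Called once per loop iteration: a thread acknowledges the stop
 * request only when stopping is set and it has nothing left to run.
 */
static void ack_stop(struct stop_state *st, unsigned long tid_bit,
                     bool stopping, unsigned int run_queue)
{
    assert(tid_bit != 0);
    if (stopping && run_queue == 0)
        atomic_fetch_or(&st->stopping_thread_mask, tid_bit);
}

/* Exit condition: no stoppable job left, nothing queued, and every
 * thread has acknowledged. Until the last thread acknowledges, the
 * others keep looping and can still pick up late work.
 */
static bool may_leave(struct stop_state *st, int jobs, int unstoppable_jobs,
                      unsigned int run_queue)
{
    unsigned long acked = atomic_load(&st->stopping_thread_mask);

    return (jobs - unstoppable_jobs) == 0 && run_queue == 0 &&
           (acked & st->all_threads_mask) == st->all_threads_mask;
}
```

With two threads (all_threads_mask = 0x3), a thread that has
acknowledged alone cannot leave: may_leave() only returns true once the
second thread has set its bit as well.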
This should be backported to 2.1 and 2.0.
(cherry picked from commit 4b3f27b67f2dec16bc06084df2dfe9f20072584e)
[wt: minor ctx adj: wake_expired_tasks() is earlier in 2.2]
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 2bb9a482aeaddaab5f91740afd92870abeed989f)
Signed-off-by: Willy Tarreau <w@1wt.eu>
diff --git a/src/haproxy.c b/src/haproxy.c
index 74fb70e..617a08d 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -142,6 +142,7 @@
unsigned long all_proc_mask = 1; /* mask of all processes */
volatile unsigned long sleeping_thread_mask = 0; /* Threads that are about to sleep in poll() */
+volatile unsigned long stopping_thread_mask = 0; /* Threads acknowledged stopping */
/* global options */
struct global global = {
@@ -2633,8 +2634,12 @@
/* Check if we can expire some tasks */
next = wake_expired_tasks();
+ if (stopping && tasks_run_queue == 0)
+ _HA_ATOMIC_OR(&stopping_thread_mask, tid_bit);
+
/* stop when there's nothing left to do */
- if ((jobs - unstoppable_jobs) == 0)
+ if ((jobs - unstoppable_jobs) == 0 && tasks_run_queue == 0 &&
+ (stopping_thread_mask & all_threads_mask) == all_threads_mask)
break;
/* also stop if we failed to cleanly stop all tasks */