BUG/MEDIUM: queue: implement a flag to check for the dequeuing As unveiled in GH issue #2711, commit 5541d4995d ("BUG/MEDIUM: queue: deal with a rare TOCTOU in assign_server_and_queue()") does have some side effects in that it can occasionally cause an endless loop. As Christopher analysed it, the problem is that process_srv_queue(), which uses a trylock in order to leave only one thread in charge of the dequeueing process, can lose the lock race against pendconn_add(). If this happens on the last served request, then there's no more thread to deal with the dequeuing, and assign_server_and_queue() will loop forever on a condition that was initially exepected to be extremely rare (and still is, except that now it can become sticky). Previously what was happening is that such queued requests would just time out and since that was very rare, nobody would notice. The root of the problem really is that trylock. It was added so that only one thread dequeues at a time but it doesn't offer only that guarantee since it also prevents a thread from dequeuing if another one is in the process of queuing. We need a different criterion. What we're doing now is to set a flag "dequeuing" in the server, which indicates that one thread is currently in the process of dequeuing requests. This one is atomically tested, and only if no thread is in this process, then the thread grabs the queue's lock and dequeues. This way it will be serialized with pendconn_add() and no request addition will be missed. It is not certain whether the original race covered by the fix above can still happen with this change, so better keep that fix for now. Thanks to @Yenya (Jan Kasprzak) for the precise and complete report allowing to spot the problem. This patch should be backported wherever the patch above was backported. (cherry picked from commit b11495652e724d71f1f4247332f060fe48577664) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com> (cherry picked from commit 36ea5570d2c2c966dc6fa16057727410e25e4a31) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com> (cherry picked from commit c9fa2098c20999af5b4e19af00f48aa0957896a7) Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

commit: c56b380082d5e06819ea6d382294e17c6ea2905a [log] [tgz]
author: Willy Tarreau <w@1wt.eu> Wed Sep 11 09:37:53 2024 +0200
committer: Christopher Faulet <cfaulet@haproxy.com> Tue Sep 17 09:53:49 2024 +0200
tree: 7eca1784a96e0d955d07a707899127e93cefa498
parent: f0af384d9f8f7506a0b59343b3981d6f9d303602 [diff]
diff --git a/include/haproxy/server-t.h b/include/haproxy/server-t.h
index 40a5278..95d1d10 100644
--- a/include/haproxy/server-t.h
+++ b/include/haproxy/server-t.h

@@ -319,6 +319,7 @@
 
 	struct queue queue;			/* pending connections */
 	struct mt_list sess_conns;		/* list of private conns managed by a session on this server */
+	unsigned int dequeuing;                 /* non-zero = dequeuing in progress (atomic) */
 
 	/* Element below are usd by LB algorithms and must be doable in
 	 * parallel to other threads reusing connections above.

diff --git a/src/queue.c b/src/queue.c
index 7df3379..6243960 100644
--- a/src/queue.c
+++ b/src/queue.c

@@ -379,12 +379,20 @@
 	 * However we still re-enter the loop for one pass if there's no
 	 * more served, otherwise we could end up with no other thread
 	 * trying to dequeue them.
+	 *
+	 * There's one racy part: we don't want to have more than one thread
+	 * in charge of dequeuing, hence the dequeung flag. We cannot rely
+	 * on a trylock here because it would compete against pendconn_add()
+	 * and would occasionally leave entries in the queue that are never
+	 * dequeued. Nobody else uses the dequeuing flag so when seeing it
+	 * non-null, we're certain that another thread is waiting on it.
 	 */
 	while (!stop && (done < global.tune.maxpollevents || !s->served) &&
 	       s->served < (maxconn = srv_dynamic_maxconn(s))) {
-		if (HA_SPIN_TRYLOCK(QUEUE_LOCK, &s->queue.lock) != 0)
+		if (HA_ATOMIC_XCHG(&s->dequeuing, 1))
 			break;
 
+		HA_SPIN_LOCK(QUEUE_LOCK, &s->queue.lock);
 		while (s->served < maxconn) {
 			stop = !pendconn_process_next_strm(s, p, px_ok);
 			if (stop)
@@ -394,6 +402,7 @@
 			if (done >= global.tune.maxpollevents)
 				break;
 		}
+		HA_ATOMIC_STORE(&s->dequeuing, 0);
 		HA_SPIN_UNLOCK(QUEUE_LOCK, &s->queue.lock);
 	}
commit	c56b380082d5e06819ea6d382294e17c6ea2905a	[log] [tgz]
author	Willy Tarreau <w@1wt.eu>	Wed Sep 11 09:37:53 2024 +0200
committer	Christopher Faulet <cfaulet@haproxy.com>	Tue Sep 17 09:53:49 2024 +0200
tree	7eca1784a96e0d955d07a707899127e93cefa498
parent	f0af384d9f8f7506a0b59343b3981d6f9d303602 [diff]