OPTIM: task: automatically adjust the default runqueue-depth to the threads

The recent reduction of the default runqueue size appeared to have
significantly lowered performance on low thread-count configs. Testing
various runqueue values on different workloads under thread counts
ranging from 1 to 64 showed that lower values are more optimal for high
thread counts and conversely. It could even be observed that the optimal
value for various workloads sits around 280/sqrt(nbthread), and probably
has to do with both the L3 cache usage and how to optimally interlace
the threads' activity to minimize contention. This is much easier to
configure automatically, so let's do this by default now.

(cherry picked from commit 060a7612487c244175fa1dc1e5b224015cbcf503)
Signed-off-by: Willy Tarreau <w@1wt.eu>
diff --git a/src/haproxy.c b/src/haproxy.c
index 9dfdd22..5da563b 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -2275,8 +2275,14 @@
 	if (global.tune.maxpollevents <= 0)
 		global.tune.maxpollevents = MAX_POLL_EVENTS;
 
-	if (global.tune.runqueue_depth <= 0)
-		global.tune.runqueue_depth = RUNQUEUE_DEPTH;
+	if (global.tune.runqueue_depth <= 0) {
+		/* tests on various thread counts from 1 to 64 have shown an
+		 * optimal queue depth following roughly 1/sqrt(threads).
+		 */
+		int s = my_flsl(global.nbthread);
+		s += (global.nbthread / s); // roughly twice the sqrt.
+		global.tune.runqueue_depth = RUNQUEUE_DEPTH * 2 / s;
+	}
 
 	if (global.tune.recv_enough == 0)
 		global.tune.recv_enough = MIN_RECV_AT_ONCE_ENOUGH;