MINOR: tasks: refine the default run queue depth

Since a lot of internal callbacks were turned to tasklets, the runqueue
depth had not been readjusted from the default 200, which was initially
used to favor batched processing. But nowadays it appears too large
already, based on the following tests conducted on an 8c16t machine with
a simple config involving "balance leastconn" and one server. Except for
the single-thread case, the setup always used both threads of each CPU
core involved, and the client was running over 1000 concurrent H1
connections. The number of requests per second is reported for each
(runqueue-depth, nbthread) couple:

  rq\thr|     1     2     4     8    16
  ------+------------------------------
      32|  120k  159k  276k  477k  698k
      40|  122k  160k  276k  478k  722k
      48|  121k  159k  274k  482k  720k
      64|  121k  160k  274k  469k  710k
     200|  114k  150k  247k  415k  613k  <-- default

It's possible to gain up to about 18% performance by lowering the
default value to 40. One possible explanation is that checking I/Os
more frequently allows buffers to be flushed faster and smooths the I/O
wait time over multiple operations, instead of alternating phases of
processing, waiting for locks and waiting for new I/Os.

The total round trip time also fell from 1.62ms to 1.40ms on average,
of which at least 0.5ms is attributed to the testing tools since this
is the minimum attainable on the loopback.

After some observation it would be nice to backport this to 2.3 and 2.2,
which show similar improvements, since some users have already observed
performance regressions between 1.6 and 2.2.

(cherry picked from commit 4327d0ac00fd29622e8558a206dfbf23eaeb9fe9)
Signed-off-by: Willy Tarreau <w@1wt.eu>

commit: 6c3d77629881f40e22d7dab35327c13ed41bd5c0
author: Willy Tarreau <w@1wt.eu> Fri Feb 19 15:11:55 2021 +0100
committer: Willy Tarreau <w@1wt.eu> Wed Mar 10 14:05:49 2021 +0100
tree: e76bda109ff7ae15d04e3d48af7a103eb5518488
parent: 656730d92f1c3afaed0baf67911d4ee055528e2e
diff --git a/doc/configuration.txt b/doc/configuration.txt
index e5cc4a9..3aca9c2 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -2398,10 +2398,12 @@
 
 tune.runqueue-depth <number>
   Sets the maximum amount of task that can be processed at once when running
-  tasks. The default value is 200. Increasing it may incur latency when
-  dealing with I/Os, making it too small can incur extra overhead. When
-  experimenting with much larger values, it may be useful to also enable
-  tune.sched.low-latency to limit the maximum latency to the lowest possible.
+  tasks. The default value is 40 which tends to show the highest request rates
+  and lowest latencies. Increasing it may incur latency when dealing with I/Os,
+  making it too small can incur extra overhead. When experimenting with much
+  larger values, it may be useful to also enable tune.sched.low-latency and
+  possibly tune.fd.edge-triggered to limit the maximum latency to the lowest
+  possible.
 
 tune.sched.low-latency { on | off }
   Enables ('on') or disables ('off') the low-latency task scheduler. By default