MEDIUM: listener: switch the default sharding to by-group

Sharding by-group is exactly identical to by-process for a single
group, and will use the same number of file descriptors for more than
one group, while significantly lowering the kernel's locking overhead.

Now that all special listeners (cli, peers) are properly handled, and
that support for SO_REUSEPORT is detected at runtime per protocol, there
should be no more reason for not switching to by-group by default.

That's what this patch does. It does only this and nothing else so that
it's easy to revert, should any issue be raised.

Testing on an AMD EPYC 74F3 featuring 24 cores and 48 threads
distributed into 8 core complexes of 3 cores each shows that
configuring 8 groups (one per CCX) is sufficient to roughly double the
forwarded connection rate from 112k to 214k/s, reducing kernel locking
from 71 to 55%.

commit: 0e875cf291fe5c5fbc108574de44f0cdc38a4051
author: Willy Tarreau <w@1wt.eu> Sun Apr 23 00:51:59 2023 +0200
committer: Willy Tarreau <w@1wt.eu> Sun Apr 23 10:18:16 2023 +0200
tree: 5d6c9d5db0f122a1e20abba491d9ecd3afc4ea51
parent: 7310164b2cbae510b17377973fab26bf85c7d6c6
diff --git a/doc/configuration.txt b/doc/configuration.txt
index 8fbe88a..3fc4ea0 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt

@@ -3045,7 +3045,8 @@
   sockets on a same address. Note that "by-group" will remain equivalent to
   "by-process" for default configurations involving a single thread group, and
   will fall back to sharing the same socket on systems that do not support this
-  mechanism. As such, it is the recommended setting.
+  mechanism. The default is "by-group" with a fallback to "by-process" for
+  systems or socket families that do not support multiple bindings.
 
 tune.listener.multi-queue { on | fair | off }
   Enables ('on' / 'fair') or disables ('off') the listener's multi-queue accept

diff --git a/src/haproxy.c b/src/haproxy.c
index 739183a..6d155f3 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c

@@ -205,7 +205,7 @@
 		.idle_timer = 1000, /* 1 second */
 #endif
 		.nb_stk_ctr = MAX_SESS_STKCTR,
-		.default_shards = 1, /* "by-process" = one shard per listener */
+		.default_shards = -2, /* by-group */
 #ifdef USE_QUIC
 		.quic_backend_max_idle_timeout = QUIC_TP_DFLT_BACK_MAX_IDLE_TIMEOUT,
 		.quic_frontend_max_idle_timeout = QUIC_TP_DFLT_FRONT_MAX_IDLE_TIMEOUT,
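
The behavior described in the commit message and in the updated documentation can be sketched as follows. This is only an illustration, not HAProxy's actual code: the value -2 for "by-group" and 1 for "by-process" come from the patch, while the helper resolve_shards(), its parameters, and the macro names are hypothetical.

```c
/* Illustrative sketch only; resolve_shards() and the macros below are
 * hypothetical names, not part of HAProxy. The values -2 ("by-group")
 * and 1 ("by-process") are the ones visible in the patch.
 */
#define SHARDS_BY_PROCESS  1    /* one shard (socket) per listener */
#define SHARDS_BY_GROUP   (-2)  /* one shard per thread group */

/* Returns the number of sockets to bind for a single listener.
 *   configured   : the default_shards setting
 *   nb_groups    : number of thread groups the listener is bound to
 *   reuseport_ok : non-zero if the socket family supports multiple
 *                  bindings on the same address (e.g. SO_REUSEPORT)
 */
static int resolve_shards(int configured, int nb_groups, int reuseport_ok)
{
	int shards = configured;

	if (configured == SHARDS_BY_GROUP)
		shards = nb_groups;

	/* Without multiple bindings all threads share the same socket,
	 * which is the "by-process" fallback described in the doc.
	 */
	if (!reuseport_ok || shards < 1)
		shards = SHARDS_BY_PROCESS;

	return shards;
}
```

Under these assumptions, resolve_shards(SHARDS_BY_GROUP, 1, 1) yields 1, which matches "by-process" for a single thread group, resolve_shards(SHARDS_BY_GROUP, 8, 1) yields 8 for the 8-group test setup, and resolve_shards(SHARDS_BY_GROUP, 8, 0) falls back to 1 when multiple bindings are not supported.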