MEDIUM: server: add a new pool-low-conn server setting

The problem with the way idle connections currently work is that it's
easy for a thread to steal all of its siblings' connections, then release
them, then have another thread do the same, and so on. This happens all
the more easily due to scheduling latencies, or to events merged inside
the same poll loop, which, when dealing with a fast server responding
within sub-millisecond delays, can easily result in only one thread being
fully at work at a time.

In such a case, we perform a huge number of takeover() calls, which
consume CPU and require quite some locking, sometimes resulting in lower
performance than expected.

In order to fight against this problem, this patch introduces a new server
setting "pool-low-conn", whose purpose is to dictate when a thread is
allowed to steal connections from its siblings. As long as the number of
idle connections remains at least as high as this value, a thread is
permitted to take over another thread's connection. When the idle
connection count becomes lower, a thread may only use its own connections
or create a new one. By proceeding like this even with a low number
(typically 2*nbthread), we quickly end up in a situation where all active
threads have a few connections. It then becomes possible to connect to a
server without bothering other threads the vast majority of the time,
while still being able to take over these connections when the number of
available FDs becomes low.
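
For instance, with the 16 threads used in the test below, a value of 32
(2*nbthread) may be set on each server line (the backend and server names
and the address here are only illustrative):

   backend app
       server srv1 192.0.2.11:8080 pool-low-conn 32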

We also use this threshold instead of global.nbthread in the connection
release logic, which allows more extra connections to be kept if needed.
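
This release-side change is not part of the hunks below; the following is
only a rough sketch of the intent, with a hypothetical helper name, not
the actual code:

   /* Hypothetical sketch, not the actual code: when releasing a
    * connection, decide whether an extra idle connection may be kept
    * around. The margin previously derived from global.nbthread now
    * comes from the server's pool-low-conn value (low_idle_conns)
    * when it is set, so a larger threshold keeps more connections.
    */
   static inline int srv_keep_extra_idle(const struct server *srv)
   {
   	int thresh = srv->low_idle_conns ? srv->low_idle_conns
   	                                 : global.nbthread;

   	return srv->curr_idle_conns < thresh;
   }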

A test performed with 10000 concurrent HTTP/1 connections, 16 threads
and 210 servers with a 1-millisecond server response time showed the
following numbers:

   haproxy 2.1.7:           185000 requests per second
   haproxy 2.2:             314000 requests per second
   haproxy 2.2 lowconn 32:  352000 requests per second

The takeover rate goes down from 300k/s to 13k/s. The difference is
further amplified as the response time shrinks.
diff --git a/src/backend.c b/src/backend.c
index 0faf724..f54181a 100644
--- a/src/backend.c
+++ b/src/backend.c
@@ -1076,13 +1076,14 @@
 {
 	struct mt_list *mt_list = is_safe ? srv->safe_conns : srv->idle_conns;
 	struct connection *conn;
-	int i;
+	int i; /* thread the connection is taken from */
 	int found = 0;
 
 	/* We need to lock even if this is our own list, because another
 	 * thread may be trying to migrate that connection, and we don't want
 	 * to end up with two threads using the same connection.
 	 */
+	i = tid; /* "i" tracks the thread the conn is taken from; ours first */
 	HA_SPIN_LOCK(OTHER_LOCK, &idle_conns[tid].toremove_lock);
 	conn = MT_LIST_POP(&mt_list[tid], struct connection *, list);
 	HA_SPIN_UNLOCK(OTHER_LOCK, &idle_conns[tid].toremove_lock);
@@ -1090,10 +1091,17 @@
 	/* If we found a connection in our own list, and we don't have to
 	 * steal one from another thread, then we're done.
 	 */
-	if (conn) {
-		i = tid;
-		goto fix_conn;
-	}
+	if (conn)
+		goto done;
+
+	/* Are we allowed to pick from another thread ? We'll still try
+	 * it if we're running low on FDs as we don't want to create
+	 * extra conns in this case, otherwise we can give up if we have
+	 * too few idle conns.
+	 */
+	if (srv->curr_idle_conns < srv->low_idle_conns &&
+	    ha_used_fds < global.tune.pool_low_count)
+		goto done;
 
 	/* Lookup all other threads for an idle connection, starting from tid + 1 */
 	for (i = tid; !found && (i = ((i + 1 == global.nbthread) ? 0 : i + 1)) != tid;) {
@@ -1116,8 +1124,8 @@
 
 	if (!found)
 		conn = NULL;
-	else {
-fix_conn:
+ done:
+	if (conn) {
 		conn->idle_time = 0;
 		_HA_ATOMIC_SUB(&srv->curr_idle_conns, 1);
 		_HA_ATOMIC_SUB(&srv->curr_idle_thr[i], 1);