MEDIUM: backend: use a trylock when trying to grab an idle connection
In conn_backend_get() we can cause some extreme contention due to the
idle_conns_lock. Indeed, even though it's per-thread, it still causes
high contention when running with many threads. The reason is that all
threads which do not have any idle connections are quickly skipped,
till the point where there are still some, so the first reaching that
point will grab the lock and the other ones wait behind. From this
point, all threads are synchronized waiting on the same lock, and
will follow the leader in small jumps, all hindering each other.
Here instead of doing this we're using a trylock. This way when a thread
is already checking a list, other ones will continue to next thread. In
the worst case, a high contention will lead to a few new connections to
be set up, but this may actually be what is required to avoid contention
in the first place. With this change, the contention has mostly
disappeared on this lock (it's still present in muxes and transport
layers due to the takeover).
Surprisingly, checking for emptiness of the tree root before taking
the lock didn't address any contention.
A few improvements are still possible and desirable here. The first
one would be to avoid seeing all threads jump to the next one. We
could have each thread use a different prime number as the increment
so as to spread them across the entire table instead of keeping them
synchronized. The second one is that the lock in the muck layers
shouldn't be needed to check for the tasklet's context availability.
(cherry picked from commit b1adf03df970958e82ad33efadba3c2586ffc02f)
[wt: this is a major cause of contention in highly threaded environments,
hence the backport. Minor ctx adjustments (list vs tree) and lock name]
Signed-off-by: Willy Tarreau <w@1wt.eu>
diff --git a/src/backend.c b/src/backend.c
index 522928e..c3894a5 100644
--- a/src/backend.c
+++ b/src/backend.c
@@ -1163,7 +1163,8 @@
if (!srv->curr_idle_thr[i] || i == tid)
continue;
- HA_SPIN_LOCK(OTHER_LOCK, &idle_conns[i].takeover_lock);
+ if (HA_SPIN_TRYLOCK(OTHER_LOCK, &idle_conns[i].takeover_lock) != 0)
+ continue;
mt_list_for_each_entry_safe(conn, &mt_list[i], list, elt1, elt2) {
if (conn->mux->takeover && conn->mux->takeover(conn, i) == 0) {
MT_LIST_DEL_SAFE(elt1);