MINOR: listener: improve incoming traffic distribution
By picking two random threads following the P2C (power of two choices)
algorithm, we occasionally observe asymmetric loads on bursts of small
session counts. This is typically what makes h2load take a while to go
all the way to 100%: if a thread gets two connections while the other
ones only have one, it takes twice as long to complete its work.
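For reference, the pure P2C selection that this patch replaces can be
sketched in plain C as below. It mirrors the removed lines of the diff:
one random draw is split into two distinct thread indexes, and the less
loaded of the two wins. The `loads` array, the caller-supplied `rnd`
value and the tie-breaking rule are assumptions of this sketch, not
HAProxy's accept queue code. Since both candidates are random, nothing
guarantees that an idle thread is among them.

```c
#include <assert.h>

/* Split one random value into two distinct thread indexes, as in the
 * removed code. `rnd` is a caller-supplied random value (for example
 * from random()); its low 8 bits are dropped. */
static void p2c_candidates(unsigned int count, unsigned int rnd,
                           unsigned int *t1, unsigned int *t2)
{
	unsigned int r = (rnd >> 8) % ((count - 1) * count);

	*t2 = r / count;        /* 0..count-2 */
	*t1 = r % count;        /* 0..count-1 */
	*t2 += *t1 + 1;         /* necessarily different from *t1 */
	if (*t2 >= count)
		*t2 -= count;
}

/* Pure P2C: return the less loaded of two random candidates.
 * `loads` is a hypothetical per-thread count of queued connections;
 * ties go to t1, which is a choice of this sketch. */
static unsigned int p2c_pick(unsigned int count, const unsigned int *loads,
                             unsigned int rnd)
{
	unsigned int t1, t2;

	assert(count > 1);
	p2c_candidates(count, rnd, &t1, &t2);
	return loads[t1] <= loads[t2] ? t1 : t2;
}
```

With `loads = {1, 1, 1, 0}`, a draw of 0 yields threads 0 and 1, so the
fourth connection lands on thread 0 while thread 3 stays idle.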
This patch proposes a modification of the P2C algorithm which seems
more suitable for this case: it mixes a rotating index with a random
value. The first candidate thread comes from the rotating index and the
second one is picked at random. This way, we're certain that all threads
are consulted in turn, while we're not forced to use the thread whose
turn it is if the random one is less loaded.
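The selection step can be sketched as follows, assuming the round-robin
index `t1` has already been taken from the shared counter (the patch
does that with an atomic compare-and-swap loop). As before, `loads`,
`rnd` and the tie rule (ties go to the round-robin candidate) are
assumptions of this sketch rather than the exact HAProxy logic.

```c
#include <assert.h>

/* First candidate: the round-robin index t1. Second candidate: a random
 * thread distinct from t1. Keep the less loaded one. `loads` is a
 * hypothetical per-thread count of queued connections. */
static unsigned int rr_p2c_pick(unsigned int t1, unsigned int count,
                                const unsigned int *loads, unsigned int rnd)
{
	unsigned int t2;

	assert(count > 1 && t1 < count);
	t2 = (rnd >> 8) % (count - 1);  /* 0..count-2 */
	t2 += t1 + 1;                   /* necessarily different from t1 */
	if (t2 >= count)
		t2 -= count;
	/* ties go to the round-robin candidate */
	return loads[t1] <= loads[t2] ? t1 : t2;
}
```

Since `t1` visits every thread in turn, an idle thread is a candidate at
least once per rotation and then wins, while a lightly loaded random
candidate can still take the connection away from a busy round-robin one.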
This significantly increases the traffic rate. h2load now completes
faster, and both the average H2 request rate and the TLS resume rate
increase by a bit more than 5% compared to pure P2C.
The index was placed into struct bind_conf because 1) it's faster there
and 2) it's the best place to optimally distribute traffic among a group
of listeners. It's the only element there that is modified at runtime,
and it will be quite cache-hot.
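The shared index is advanced with a compare-and-swap loop. A minimal
sketch using C11 <stdatomic.h> in place of HAProxy's HA_ATOMIC_CAS()
macro is shown below; `struct bind_conf_sketch` and `next_rr_thread()`
are hypothetical names. The loop in the patch presumably relies on the
CAS refreshing `t1` on failure, which is what
atomic_compare_exchange_weak() does with `cur` here.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct bind_conf, reduced to what the
 * round-robin index needs. */
struct bind_conf_sketch {
	unsigned int thr_count;       /* number of bound threads */
	_Atomic unsigned int thr_idx; /* shared round-robin index */
};

/* Return the current index and advance it modulo thr_count. On failure,
 * atomic_compare_exchange_weak() reloads `cur` with the value stored by
 * another thread, and the loop retries from there. */
static unsigned int next_rr_thread(struct bind_conf_sketch *bc)
{
	unsigned int cur = atomic_load(&bc->thr_idx);
	unsigned int next;

	do {
		next = cur + 1;
		if (next >= bc->thr_count)
			next = 0;
	} while (!atomic_compare_exchange_weak(&bc->thr_idx, &cur, next));
	return cur;
}
```

Consecutive successful exchanges hand out consecutive indexes (modulo
`thr_count`), so concurrent callers still walk through every thread in
turn.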
diff --git a/include/types/listener.h b/include/types/listener.h
index 876d3c3..95f3cc9 100644
--- a/include/types/listener.h
+++ b/include/types/listener.h
@@ -172,6 +172,7 @@
unsigned long bind_thread; /* bitmask of threads allowed to use these listeners */
unsigned long thr_2, thr_4, thr_8, thr_16; /* intermediate values for bind_thread counting */
unsigned int thr_count; /* #threads bound */
+ unsigned int thr_idx; /* thread indexes for queue distribution : (t2<<16)+t1 */
uint32_t ns_cip_magic; /* Excepted NetScaler Client IP magic number */
struct list by_fe; /* next binding for the same frontend, or NULL */
char *arg; /* argument passed to "bind" for better error reporting */
diff --git a/src/listener.c b/src/listener.c
index 9a9699c..3e080b4 100644
--- a/src/listener.c
+++ b/src/listener.c
@@ -847,12 +847,23 @@
count = l->bind_conf->thr_count;
if (count > 1 && (global.tune.options & GTUNE_LISTENER_MQ)) {
struct accept_queue_ring *ring;
- int r, t1, t2, q1, q2;
+ int t1, t2, q1, q2;
+
+ /* pick a first thread ID using a round robin index,
+ * and a second thread ID using a random. The
+ * connection will be assigned to the one with the
+ * least connections. This provides fairness on short
+ * connections (round robin) and on long ones (conn
+ * count).
+ */
+ t1 = l->bind_conf->thr_idx;
+ do {
+ t2 = t1 + 1;
+ if (t2 >= count)
+ t2 = 0;
+ } while (!HA_ATOMIC_CAS(&l->bind_conf->thr_idx, &t1, t2));
- /* pick two small distinct random values and drop lower bits */
- r = (random() >> 8) % ((count - 1) * count);
- t2 = r / count; // 0..thr_count-2
- t1 = r % count; // 0..thr_count-1
+ t2 = (random() >> 8) % (count - 1); // 0..thr_count-2
t2 += t1 + 1; // necessarily different from t1
if (t2 >= count)