BUG/MEDIUM: lb-leastconn: Reposition a server using the right eweight
Depending on the context, the current eweight or the next one must be used
to reposition a server in the tree. When the server state is updated, for
instance its weight, the next eweight must be used because it is not yet
committed. However, when the server is used, on normal conditions, the
current eweight must be used.
In fact, it is only a bug on the 1.8. On newer versions, the changes on a
server are performed synchronously. But it is safer to rely on the right
eweight value to avoid any futur bugs.
On the 1.8, it is important to do so, because the server state is updated
and committed inside the rendez-vous point. Thus, the next server state may
be unsync with the current state for a short time, waiting all threads join
the rendez-vous point. It is especially a problem if the next eweight is set
to 0. Because otherwise, it must not be used to reposition the server in the
tree, leading to a divide by 0.
This patch must be backported as far as 1.8.
(cherry picked from commit cb33d3ac7f8dbc3f7323606a521d29dc100adbda)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
diff --git a/src/lb_fwlc.c b/src/lb_fwlc.c
index bf5ae61..630467b 100644
--- a/src/lb_fwlc.c
+++ b/src/lb_fwlc.c
@@ -37,7 +37,7 @@
eb32_delete(&s->lb_node);
}
-/* Queue a server in its associated tree, assuming the weight is >0.
+/* Queue a server in its associated tree, assuming the <eweight> is >0.
* Servers are sorted by (#conns+1)/weight. To ensure maximum accuracy,
* we use (#conns+1)*SRV_EWGHT_MAX/eweight as the sorting key. The reason
* for using #conns+1 is to sort by weights in case the server is picked
@@ -47,13 +47,19 @@
* connection are always picked first so that under low loads, it's not
* always the single server with the highest weight that gets picked.
*
+ * NOTE: Depending on the calling context, we use s->next_eweight or
+ * s->cur_eweight. The next value is used when the server state is updated
+ * (because the weight changed for instance). During this step, the server
+ * state is not yet committed. The current value is used to reposition the
+ * server in the tree. This happens when the server is used.
+ *
* The server's lock and the lbprm's lock must be held.
*/
-static inline void fwlc_queue_srv(struct server *s)
+static inline void fwlc_queue_srv(struct server *s, unsigned int eweight)
{
unsigned int inflight = s->served + s->nbpend;
- s->lb_node.key = inflight ? (inflight + 1) * SRV_EWGHT_MAX / s->next_eweight : 0;
+ s->lb_node.key = inflight ? (inflight + 1) * SRV_EWGHT_MAX / eweight : 0;
eb32_insert(s->lb_tree, &s->lb_node);
}
@@ -68,7 +74,7 @@
HA_RWLOCK_WRLOCK(LBPRM_LOCK, &s->proxy->lbprm.lock);
if (s->lb_tree) {
fwlc_dequeue_srv(s);
- fwlc_queue_srv(s);
+ fwlc_queue_srv(s, s->cur_eweight);
}
HA_RWLOCK_WRUNLOCK(LBPRM_LOCK, &s->proxy->lbprm.lock);
}
@@ -183,7 +189,7 @@
}
/* note that eweight cannot be 0 here */
- fwlc_queue_srv(srv);
+ fwlc_queue_srv(srv, srv->next_eweight);
out_update_backend:
/* check/update tot_used, tot_weight */
@@ -244,7 +250,7 @@
srv->lb_tree = &p->lbprm.fwlc.act;
}
- fwlc_queue_srv(srv);
+ fwlc_queue_srv(srv, srv->next_eweight);
update_backend_weight(p);
HA_RWLOCK_WRUNLOCK(LBPRM_LOCK, &p->lbprm.lock);
@@ -284,7 +290,7 @@
if (!srv_currently_usable(srv))
continue;
srv->lb_tree = (srv->flags & SRV_F_BACKUP) ? &p->lbprm.fwlc.bck : &p->lbprm.fwlc.act;
- fwlc_queue_srv(srv);
+ fwlc_queue_srv(srv, srv->next_eweight);
}
}