BUG/MEDIUM: peers: reset starting point if peers appear disconnected for a long time
If two peers are disconnected and continue to process a large number of
local updates during that period, they may take a long time after
reconnecting before they resume pushing their updates, because the last
pushed update would appear internally to be in the future.
This patch fixes this by resetting the cursor on acked updates to the
oldest point still considered in the past when it appears to be in the
future, but this means we may lose some updates. A clean fix would be to
update the protocol to be able to signal to a remote peer that it has
not been updated for too long and needs a full resync, but this is not
yet supported by the protocol.
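As an illustration only, the wrapping comparison used by this patch can be
sketched in isolation as below. The helper names cursor_in_future() and
reset_cursor() are hypothetical and do not exist in src/peers.c; only the
test and the 2147483648U offset are taken from the diff. The sketch assumes
32-bit update counters and a compiler where converting an out-of-range
unsigned value to a signed one wraps (true for gcc and clang).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: non-zero when <cursor> appears to be ahead of
 * <localupdate> under wrapping 32-bit arithmetic. This mirrors the test
 * "(int)(st->table->localupdate - st->update) < 0" from the patch.
 */
static inline int cursor_in_future(uint32_t localupdate, uint32_t cursor)
{
	return (int32_t)(localupdate - cursor) < 0;
}

/* Hypothetical helper: if <cursor> appears to be in the future, move it
 * to localupdate + 2^31 (the "max past value" named in the patch);
 * otherwise leave it untouched.
 */
static inline uint32_t reset_cursor(uint32_t localupdate, uint32_t cursor)
{
	if (cursor_in_future(localupdate, cursor))
		return localupdate + 2147483648U;
	return cursor;
}

/* Small usage example with arbitrary values. */
static inline void peer_cursor_selfcheck(void)
{
	/* the cursor is 10 updates behind: nothing to do */
	assert(reset_cursor(100U, 90U) == 90U);

	/* more than 2^31 local updates happened since the cursor was
	 * saved, so it now looks as if it were in the future
	 */
	assert(cursor_in_future(0x8000005BU, 90U));
	assert(reset_cursor(0x8000005BU, 90U) == 0x5BU);
}
```

In the second case the cursor moves from 90 to 91, so the single update
between the two values is skipped; with a larger gap more updates would be
skipped, which is the loss mentioned above.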
This patch should be backported to all supported branches (>= 1.6).
(cherry picked from commit d9729da98262f2136ad4eac44c3ec2f710cb4a49)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 21680f6d9311339193220eb0d85aac79969fad4f)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 905295c9c5de3be91d1b2493bbb01f8f690181d7)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
diff --git a/src/peers.c b/src/peers.c
index 2793589..bf7e03d 100644
--- a/src/peers.c
+++ b/src/peers.c
@@ -2136,7 +2136,20 @@
/* Init cursors */
for (st = peer->tables; st ; st = st->next) {
st->last_get = st->last_acked = 0;
+ HA_SPIN_LOCK(STK_TABLE_LOCK, &st->table->lock);
+ /* if st->update appears to be in the future, it means
+ * that the last acked value is very old and we
+ * remained disconnected for too long to use this
+ * acknowledgement as a reset.
+ * We should update the protocol to be able to
+ * signal the remote peer that it needs a full resync.
+ * Here a partial fix consists in setting st->update to
+ * the max past value.
+ */
+ if ((int)(st->table->localupdate - st->update) < 0)
+ st->update = st->table->localupdate + (2147483648U);
st->teaching_origin = st->last_pushed = st->update;
+ HA_SPIN_UNLOCK(STK_TABLE_LOCK, &st->table->lock);
}
/* reset teaching and learning flags to 0 */
@@ -2173,7 +2186,20 @@
/* Init cursors */
for (st = peer->tables; st ; st = st->next) {
st->last_get = st->last_acked = 0;
+ HA_SPIN_LOCK(STK_TABLE_LOCK, &st->table->lock);
+ /* if st->update appears to be in the future, it means
+ * that the last acked value is very old and we
+ * remained disconnected for too long to use this
+ * acknowledgement as a reset.
+ * We should update the protocol to be able to
+ * signal the remote peer that it needs a full resync.
+ * Here a partial fix consists in setting st->update to
+ * the max past value.
+ */
+ if ((int)(st->table->localupdate - st->update) < 0)
+ st->update = st->table->localupdate + (2147483648U);
st->teaching_origin = st->last_pushed = st->update;
+ HA_SPIN_UNLOCK(STK_TABLE_LOCK, &st->table->lock);
}
/* Init confirm counter */