d16a1b2a818359e8c3ade85f789e66ed7ca9488c - haproxy

commit	d16a1b2a818359e8c3ade85f789e66ed7ca9488c
author	Willy Tarreau <w@1wt.eu>	Fri Apr 12 14:46:51 2013 +0200
committer	Willy Tarreau <w@1wt.eu>	Fri Apr 12 14:46:51 2013 +0200
tree	dc61c3d43155db16505366955d1639c4b792026a
parent	33c60dece549953e5e57969b4c8f2cd2d4ea2b3c

BUG/MAJOR: backend: consistent hash can loop forever in certain circumstances

When the parameter passed to a consistent hash is not found, we fall back to
round-robin using chash_get_next_server(). This one stores the last visited
server in lbprm.chash.last, which can be NULL upon the first invocation or if
the only server was recently brought up.

The loop used to scan for a server is able to skip the previously attempted
server in case of a redispatch, when that server is passed in srvtoavoid.
For this reason, the loop stops either when the currently considered server is
different from srvtoavoid, or when the scan comes back to the original
chash.last, which serves as the stop point.

A problem happens in a specific sequence: a connection to a server fails, then
all servers are removed from the farm, then the original server is added back
before the redispatch happens. At that point chash.last is NULL and srvtoavoid
is set to the only server in the farm. Every node visited then holds that
server, which is always equal to srvtoavoid, while the node itself is never
equal to the NULL stop point, so the loop never stops.
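
The scan and the failing sequence described above can be illustrated with a
small C model. This is a sketch, not HAProxy's code: the `struct ring` type,
the `ring_next()`/`ring_first()` helpers standing in for the tree traversal in
src/lb_chash.c, the integer server ids, and the `SCAN_CAP` guard are all
assumptions introduced so that the example terminates and can be tested.
Server saturation checks are omitted.

```c
#include <assert.h>

/* Hypothetical, simplified model of the consistent-hash ring: node i of the
 * tree carries server id srv[i], and -1 plays the role of a NULL pointer. */
#define NO_NODE      (-1)   /* stands in for a NULL node pointer */
#define NO_SERVER    (-1)   /* stands in for a NULL server pointer */
#define SCAN_ABORTED (-2)   /* returned when the test-only cap is hit */
#define SCAN_CAP     1000   /* test-only guard; the real loop has no cap */

struct ring {
    const int *srv;  /* server id of each node, in tree order */
    int nb;          /* number of nodes currently in the tree */
    int last;        /* models lbprm.chash.last; NO_NODE when unset */
};

/* The node after 'node', or NO_NODE past the end of the tree. */
static int ring_next(const struct ring *r, int node)
{
    assert(node >= 0 && node < r->nb);
    return node + 1 < r->nb ? node + 1 : NO_NODE;
}

/* The first node, or NO_NODE if the tree is empty. */
static int ring_first(const struct ring *r)
{
    return r->nb > 0 ? 0 : NO_NODE;
}

/* Round-robin scan as it behaved before the fix. The stop point is the
 * previous value of last. When that is NO_NODE, node never equals stop,
 * so the loop can only end on an empty farm or by finding a server other
 * than srvtoavoid. */
static int chash_scan_prefix(struct ring *r, int srvtoavoid)
{
    int node, stop, avoided = NO_SERVER, steps = 0;

    stop = node = r->last;
    do {
        if (node != NO_NODE)
            node = ring_next(r, node);
        if (node == NO_NODE)
            node = ring_first(r);  /* wrap around, or start from scratch */
        r->last = node;
        if (node == NO_NODE)
            return NO_SERVER;      /* empty farm */
        if (r->srv[node] != srvtoavoid)
            return r->srv[node];
        avoided = r->srv[node];    /* keep it as a last resort */
        if (++steps >= SCAN_CAP)
            return SCAN_ABORTED;   /* the real loop would spin forever */
    } while (node != stop);
    return avoided;
}
```

In this model, a one-server ring with `last` unset and `srvtoavoid` equal to
that server makes `chash_scan_prefix()` return `SCAN_ABORTED`, whereas the same
call with `last` already pointing at the node ends after one turn and falls
back to the avoided server.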

The fix consists of assigning the stop point to the first encountered node if
it was not yet set.
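
A simplified C model of the scan with the fix applied could look like the
sketch below. It is an illustration under assumptions, not the committed code:
`struct ring`, `chash_scan_fixed()`, and the integer server ids (with -1
standing in for NULL) are hypothetical, and server saturation checks are
omitted.

```c
#include <assert.h>

#define NO_NODE   (-1)   /* stands in for a NULL node pointer */
#define NO_SERVER (-1)   /* stands in for a NULL server pointer */

/* Hypothetical simplified ring: node i carries server id srv[i]. */
struct ring {
    const int *srv;  /* server id of each node, in tree order */
    int nb;          /* number of nodes currently in the tree */
    int last;        /* models lbprm.chash.last; NO_NODE when unset */
};

/* Round-robin scan with the fix applied. */
static int chash_scan_fixed(struct ring *r, int srvtoavoid)
{
    int node, stop, avoided = NO_SERVER;

    assert(r->last >= NO_NODE && r->last < r->nb);
    stop = node = r->last;
    do {
        if (node != NO_NODE)
            node = node + 1 < r->nb ? node + 1 : NO_NODE;  /* next node */
        if (node == NO_NODE)
            node = r->nb > 0 ? 0 : NO_NODE;  /* wrap around or start */
        r->last = node;
        if (node == NO_NODE)
            return NO_SERVER;      /* empty farm */

        /* The fix: with no previous position, the first node visited
         * becomes the stop point, so node != stop can fail and the
         * loop ends. */
        if (stop == NO_NODE)
            stop = node;

        if (r->srv[node] != srvtoavoid)
            return r->srv[node];
        avoided = r->srv[node];    /* remember it as a last resort */
    } while (node != stop);
    return avoided;                /* only srvtoavoid was usable */
}
```

With a single server that is also `srvtoavoid` and `last` unset, the call now
returns that server as a last resort instead of spinning. Note that in this
model, when `last` starts out unset, the `while` test fails right after the
first node, so only that node is examined on that call.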

This issue cannot happen with the map-based algorithm since it's based on an
index and not a stop point.
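
One way to see why an index-driven scan is safe: if the index is forced back
into range before the loop starts, the starting slot is always a real slot and
the loop runs at most once per slot. The sketch below is a hypothetical model
(`struct map`, `map_scan()`, and the integer server ids are illustrative names,
not HAProxy's), with server saturation checks omitted.

```c
#include <assert.h>

#define NO_SERVER (-1)   /* stands in for a NULL server pointer */

/* Hypothetical simplified map: slot i holds server id srv[i]. */
struct map {
    const int *srv;  /* server id per slot */
    int nb;          /* number of slots */
    int rr_idx;      /* next slot to try; may be out of range */
};

static int map_scan(struct map *m, int srvtoavoid)
{
    int idx, start, avoided = NO_SERVER, avoided_idx = 0;

    assert(m->nb >= 0);
    if (m->nb == 0)
        return NO_SERVER;
    /* Clamp the index first: the starting slot is then always valid. */
    if (m->rr_idx < 0 || m->rr_idx >= m->nb)
        m->rr_idx = 0;
    start = idx = m->rr_idx;
    do {
        int s = m->srv[idx];

        idx = (idx + 1) % m->nb;  /* advance, wrapping at the end */
        if (s != srvtoavoid) {
            m->rr_idx = idx;
            return s;
        }
        avoided = s;              /* remember it as a last resort */
        avoided_idx = idx;
    } while (idx != start);       /* at most nb iterations */
    m->rr_idx = avoided_idx;
    return avoided;
}
```

The loop bound comes from counting slots rather than from reaching a
particular node, so an unset previous position cannot make it spin.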

This issue was reported by Henry Qian who kindly provided lots of critically
useful information to figure out the conditions to reproduce the issue.

The fix needs to be backported to 1.4, which is also affected.

src/lb_chash.c

1 file changed
