Diff - 5a329cf0177af1a4c8b8a6c6ab8821a92952f794^! - haproxy

commit	5a329cf0177af1a4c8b8a6c6ab8821a92952f794	[log] [tgz]
author	Krzysztof Piotr Oledzki <ole@ans.pl>	Fri Feb 22 03:50:19 2008 +0100
committer	Willy Tarreau <w@1wt.eu>	Tue Mar 04 06:16:37 2008 +0100
tree	32c39f0c8b612085be6c93ec49787ab93c81b2ae
parent	626a19b66f769a87e7c995267ccedf14149e03b3 [diff] [blame]

[MEDIUM]: Prevent redispatcher from selecting the same server, version #3

When haproxy decides that session needs to be redispatched it chose a server,
but there is no guarantee for it to be a different one. So, it often
happens that selected server is exactly the same that it was previously, so
a client ends up with a 503 error anyway, especially when one sever has
much bigger weight than others.

Changes from the previous version:
 - drop stupid and unnecessary SN_DIRECT changes

 - assign_server(): use srvtoavoid to keep the old server and clear s->srv
    so SRV_STATUS_NOSRV guarantees that t->srv == NULL (again)
    and get_server_rr_with_conns has chances to work (previously
    we were passing a NULL here)

 - srv_redispatch_connect(): remove t->srv->cum_sess and t->srv->failed_conns
   incrementing as t->srv was guaranteed to be NULL

 - add avoididx to get_server_rr_with_conns. I hope I correctly understand this code.

 - fix http_flush_cookie_flags() and move it to assign_server_and_queue()
   directly. The code here was supposed to set CK_DOWN and clear CK_VALID,
   but: (TX_CK_VALID | TX_CK_DOWN) == TX_CK_VALID == TX_CK_MASK so:
	if ((txn->flags & TX_CK_MASK) == TX_CK_VALID)
		txn->flags ^= (TX_CK_VALID | TX_CK_DOWN);
   was really a:
	if ((txn->flags & TX_CK_MASK) == TX_CK_VALID)
		txn->flags &= TX_CK_VALID

   Now haproxy logs "--DI" after redispatching connection.

 - defer srv->redispatches++ and s->be->redispatches++ so there
   are called only if a conenction was redispatched, not only
   supposed to.

 - don't increment lbconn if redispatcher selected the same sarver

 - don't count unsuccessfully redispatched connections as redispatched
   connections

 - don't count redispatched connections as errors, so:

 - the number of connections effectively served by a server is:
 srv->cum_sess - srv->failed_conns - srv->retries - srv->redispatches
   and
 SUM(servers->failed_conns) == be->failed_conns

 - requires the "Don't increment server connections too much + fix retries" patch

 - needs little more testing and probably some discussion so reverting to the RFC state

Tests #1:
 retries 4
 redispatch

i) 1 server(s): b (wght=1, down)
  b) sessions=5, lbtot=1, err_conn=1, retr=4, redis=0
  -> request failed

ii) server(s): b (wght=1, down), u (wght=1, down)
  b) sessions=4, lbtot=1, err_conn=0, retr=3, redis=1
  u) sessions=1, lbtot=1, err_conn=1, retr=0, redis=0
  -> request FAILED

iii) 2 server(s): b (wght=1, down), u (wght=1, up)
  b) sessions=4, lbtot=1, err_conn=0, retr=3, redis=1
  u) sessions=1, lbtot=1, err_conn=0, retr=0, redis=0
  -> request OK

iv) 2 server(s): b (wght=100, down), u (wght=1, up)
  b) sessions=4, lbtot=1, err_conn=0, retr=3, redis=1
  u) sessions=1, lbtot=1, err_conn=0, retr=0, redis=0
  -> request OK

v) 1 server(s): b (down for first 4 SYNS)
  b) sessions=5, lbtot=1, err_conn=0, retr=4, redis=0
  -> request OK

Tests #2:
 retries 4

i) 1 server(s): b (down)
  b) sessions=5, lbtot=1, err_conn=1, retr=4, redis=0
  -> request FAILED

diff --git a/src/proto_http.c b/src/proto_http.c
index cd88905..f685201 100644
--- a/src/proto_http.c
+++ b/src/proto_http.c

@@ -2627,16 +2627,8 @@
 				if (may_dequeue_tasks(t->srv, t->be))
 					task_wakeup(t->srv->queue_mgt);
 
-				if (t->srv) {
-					t->srv->failed_conns++;
-					t->srv->redispatches++;
-				}
-				t->be->redispatches++;
-
+				/* it's left to the dispatcher to choose a server */
 				t->flags &= ~(SN_DIRECT | SN_ASSIGNED | SN_ADDR_SET);
-				t->flags |= SN_REDISP;
-				t->srv = NULL; /* it's left to the dispatcher to choose a server */
-				http_flush_cookie_flags(txn);
 
 				/* first, get a connection */
 				if (srv_redispatch_connect(t))