2015/08/06 - server connection sharing

Improvements on the connection sharing strategies
-------------------------------------------------

4 strategies are currently supported :
  - never
  - safe
  - aggressive
  - always

The "aggressive" and "always" strategies take into account whether the
connection has already been reused at least once. The principle is that the
subsequent requests of a session can be used to safely "validate" connection
reuse on newly added connections, and that such validated connections may then
be used even by the first request of another session. A validated connection
is a connection which has already been reused, hence proving that it
definitely supports multiple requests. Such connections are easy to verify :
after processing the response, if the txn already had the TX_NOT_FIRST flag,
then it was not the first request over that connection, and the connection is
validated as safe for reuse. Validated connections are put into a distinct
list : server->safe_conns.
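
As an illustration, the validation step after a response could look like the
minimal sketch below. The structures and helpers are simplified stand-ins
invented for this document, not the actual internal API :

    /* simplified stand-ins, not the real internal structures */
    struct conn {
            struct conn *next;
    };

    struct server {
            struct conn *idle_conns;     /* idle, not yet validated */
            struct conn *safe_conns;     /* validated for reuse */
    };

    #define TX_NOT_FIRST  0x01   /* txn flag from the text ; arbitrary value here */

    /* After the response is processed : if the txn already had TX_NOT_FIRST,
     * the request was not the first one over this connection, so the
     * connection has proven it supports multiple requests and goes to the
     * safe_conns list. In both cases the connection is stacked at the head
     * of its list (LIFO, see below).
     */
    static void release_server_conn(struct server *srv, struct conn *c,
                                    unsigned int txn_flags)
    {
            if (txn_flags & TX_NOT_FIRST) {
                    c->next = srv->safe_conns;
                    srv->safe_conns = c;
            }
            else {
                    c->next = srv->idle_conns;
                    srv->idle_conns = c;
            }
    }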

Incoming requests with TX_NOT_FIRST first pick from the regular idle_conns
list, so that any new idle connection gets validated as soon as possible.

Incoming requests without TX_NOT_FIRST pick only from the safe_conns list with
the "aggressive" strategy, which guarantees that the server properly supports
connection reuse ; with the "always" strategy they pick first from the
safe_conns list, then from the idle_conns list.

Connections are always stacked into the lists (LIFO) so that recent connections
have a higher chance of being converted (validated) and reused. This first
maximizes the likelihood that the connection still works, and also avoids
losing TCP metrics due to an idle state, seeing the congestion window drop,
and having the connection fall back to slow start mode.
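
Putting the previous paragraphs together, connection selection could be
sketched as below, using the same kind of simplified stand-ins. The behaviour
of "safe" for a session's first request (no reuse at all) is assumed here, as
it is not restated in this section :

    struct conn   { struct conn *next; };
    struct server { struct conn *idle_conns; struct conn *safe_conns; };

    enum reuse_strategy {
            REUSE_NEVER, REUSE_SAFE, REUSE_AGGRESSIVE, REUSE_ALWAYS
    };

    /* detach and return the most recently released connection (LIFO) */
    static struct conn *pick_head(struct conn **head)
    {
            struct conn *c = *head;

            if (c)
                    *head = c->next;
            return c;
    }

    /* Return an existing connection for the request, or NULL to open a new
     * one. <not_first> reflects the TX_NOT_FIRST flag of the transaction.
     */
    static struct conn *pick_server_conn(struct server *srv,
                                         enum reuse_strategy strat,
                                         int not_first)
    {
            struct conn *c;

            if (strat == REUSE_NEVER)
                    return NULL;

            if (not_first) {
                    /* probe unvalidated connections first so that new idle
                     * connections get validated as soon as possible
                     */
                    c = pick_head(&srv->idle_conns);
                    return c ? c : pick_head(&srv->safe_conns);
            }

            /* first request of a session */
            if (strat == REUSE_SAFE)
                    return NULL;

            c = pick_head(&srv->safe_conns);
            if (!c && strat == REUSE_ALWAYS)
                    c = pick_head(&srv->idle_conns);
            return c;
    }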


Handling connections in pools
-----------------------------

A per-server "pool-max" setting should be added so that unused idle connections
which are no longer attached to a session can be kept available for future
requests. The principle will be that attached connections are queued from the
front of the list while detached connections will be queued from the tail of
the list.

This way, most reused connections will be fairly recent and detached
connections will most often be ignored. The number of detached idle
connections in the lists should be accounted for (pool_used) and limited
(pool_max).
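
A sketch of this queuing policy follows. Only pool_used and pool_max come from
the text above ; the doubly-linked list, the attached flag and the behaviour
when the pool is full are illustrative placeholders :

    struct conn {
            struct conn *prev, *next;
            int attached;               /* still owned by a session ? */
    };

    struct server {
            struct conn *head, *tail;   /* idle connection pool */
            unsigned int pool_used;     /* number of detached idle conns */
            unsigned int pool_max;      /* limit on detached idle conns */
    };

    /* Attached connections are queued at the front where they are found
     * first ; detached ones go to the tail where they are mostly ignored
     * and easy to evict. Returns 0 if queued, -1 if the pool is full and
     * the detached connection should simply be closed instead.
     */
    static int pool_queue_conn(struct server *srv, struct conn *c)
    {
            if (!c->attached && srv->pool_used >= srv->pool_max)
                    return -1;

            if (c->attached) {
                    c->prev = NULL;
                    c->next = srv->head;
                    if (srv->head)
                            srv->head->prev = c;
                    else
                            srv->tail = c;
                    srv->head = c;
            }
            else {
                    c->next = NULL;
                    c->prev = srv->tail;
                    if (srv->tail)
                            srv->tail->next = c;
                    else
                            srv->head = c;
                    srv->tail = c;
                    srv->pool_used++;
            }
            return 0;
    }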

After some time, part of these detached idle connections should be killed. For
this, the list is walked from tail to head and connections without an owner
may be evicted. It may be useful to have a per-server pool_min setting
indicating how many idle connections should always remain in the pool, ready
for use by new requests. Conversely, a pool_low metric should be kept between
eviction runs, indicating the lowest number of detached connections that were
found in the pool.

For eviction, the principle of a half-life is appealing : over a given period
of time, half of the connections between pool_min and pool_low should be gone.
Since pool_low indicates how many connections remained unused over that
period, it makes sense to kill some of them.

In order to avoid killing thousands of connections in one run, the purge
interval should be split into smaller batches. Let's call N the ratio between
the half-life interval and the effective purge interval.

The algorithm consists in walking over the list from the tail at every
interval and killing ((pool_low - pool_min) + 2 * N - 1) / (2 * N) connections.
This ensures that half of the unused connections are killed over the half-life
period, in N batches of roughly population/2N entries each.
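
For example, with pool_low = 100, pool_min = 20 and N = 8, each run kills
((100 - 20) + 2 * 8 - 1) / (2 * 8) = 95 / 16 = 5 connections with integer
division, i.e. 40 connections after the 8 runs of a half-life period, which is
indeed half of the 80 excess ones. One possible sketch of such a purge step is
shown below ; as before, the structures and the conn_kill() helper are
simplified placeholders, and the exact bookkeeping of pool_low is only one
interpretation of the text :

    struct conn {
            struct conn *prev, *next;
            int attached;               /* still owned by a session ? */
    };

    struct server {
            struct conn *head, *tail;   /* idle connection pool */
            unsigned int pool_used;     /* detached idle conns in the pool */
            unsigned int pool_min;      /* idle conns to always keep around */
            unsigned int pool_low;      /* lowest pool_used seen since last run */
    };

    static void conn_kill(struct conn *c)
    {
            /* close and free the connection ; left out of the sketch */
            (void)c;
    }

    /* One purge step, run every (half_life / N) interval : walk the pool
     * from the tail and kill at most ((pool_low - pool_min) + 2N - 1) / 2N
     * detached connections, so that after N such steps half of the
     * connections that remained unused over the half-life period are gone.
     */
    static void pool_purge_step(struct server *srv, unsigned int n)
    {
            struct conn *c, *prev;
            unsigned int to_kill = 0;

            if (srv->pool_low > srv->pool_min)
                    to_kill = (srv->pool_low - srv->pool_min + 2 * n - 1) / (2 * n);

            for (c = srv->tail; c && to_kill; c = prev) {
                    prev = c->prev;
                    if (c->attached)
                            continue;   /* only connections without an owner */

                    /* unlink from the pool and kill */
                    if (c->prev)
                            c->prev->next = c->next;
                    else
                            srv->head = c->next;
                    if (c->next)
                            c->next->prev = c->prev;
                    else
                            srv->tail = c->prev;

                    srv->pool_used--;
                    conn_kill(c);
                    to_kill--;
            }

            /* restart the low-water mark for the next run */
            srv->pool_low = srv->pool_used;
    }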

Unsafe connections should be evicted first. There should be very few of them
since most of them are probed and become safe. Since detached connections are
quickly recycled and attached to a new session, there should not be too many
detached connections in the pool, and those which remain there may be killed
fairly quickly.

Another interesting point of pools is that when pool-max is not zero, it makes
sense to automatically enable pretend-keep-alive on non-private connections
going to the server, in order to be able to feed them back into the pool. With
the "aggressive" or "always" strategies, this can allow clients making a
single request over their connection to share persistent connections to the
servers.


2013/10/17 - server connection management and reuse

Current state
-------------

At the moment, a connection entity is needed to carry any address
information. This means in the following situations, we need a server
connection :

  - server is elected and the server's destination address is set

  - transparent mode is elected and the destination address is set from
    the incoming connection

  - proxy mode is enabled, and the destination's address is set during
    the parsing of the HTTP request

  - connection to the server fails and must be retried on the same
    server using the same parameters, especially the destination
    address (SN_ADDR_SET not removed)


On the accepting side, we have further requirements :

  - allocate a clean connection without a stream interface

  - incrementally set the accepted connection's parameters without
    clearing it, and keep track of what is set (eg: getsockname).

  - initialize a stream interface in established mode

  - attach the accepted connection to a stream interface

This means several things :

  - the connection has to be allocated on the fly the first time it is
    needed to store the source or destination address ;

  - the connection has to be attached to the stream interface at this
    moment ;

  - it must be possible to incrementally set some settings on the
    connection's addresses regardless of the connection's current state ;

  - the connection must not be released across connection retries ;

  - it must be possible to clear a connection's parameters for a
    redispatch without having to detach/attach the connection ;

  - we need to allocate a connection without an existing stream interface

So on the accept() side, it looks like this :

    fd = accept();
    conn = new_conn();
    get_some_addr_info(&conn->addr);
    ...
    si = new_si();
    si_attach_conn(si, conn);
    si_set_state(si, SI_ST_EST);
    ...
    get_more_addr_info(&conn->addr);

On the connect() side, it looks like this :

    si = new_si();
    while (!properly_connected) {
            if (!(conn = si->end)) {
                    /* first attempt : allocate and attach the connection */
                    conn = new_conn();
                    conn_clear(conn);
                    si_attach_conn(si, conn);
            }
            else {
                    /* retry : the connection is kept across retries */
                    if (connected) {
                            f = conn->flags & CO_FL_XPRT_TRACKED;
                            conn->flags &= ~CO_FL_XPRT_TRACKED;
                            conn_close(conn);
                            conn->flags |= f;
                    }
                    /* redispatch : only clear the connection's parameters */
                    if (!correct_dest)
                            conn_clear(conn);
            }
            set_some_addr_info(&conn->addr);
            si_set_state(si, SI_ST_CON);
            ...
            set_more_addr_info(&conn->addr);
            conn->connect();
            if (must_retry) {
                    close_conn(conn);
            }
    }

Note: we need to be able to set the control and transport protocols.
On outgoing connections, this is set once we know the destination address.
On incoming connections, this is set as early as possible (once we know
the source address).

The problem analysed below was solved on 2013/10/22 :
| ==> the real requirement is to know whether a connection is still valid or not
|     before deciding to close it. CO_FL_CONNECTED could be enough, though it
|     will not indicate connections that are still waiting for a connect to occur.
|     This combined with CO_FL_WAIT_L4_CONN and CO_FL_WAIT_L6_CONN should be OK.
|
| Alternatively, conn->xprt could be used for this, but it needs some careful
| checks (it's used by conn_full_close at least).
|
| Right now, conn_xprt_close() checks conn->xprt and sets it to NULL.
| conn_full_close() also checks conn->xprt and sets it to NULL, except
| that the check on ctrl is performed within xprt. So conn_xprt_close()
| followed by conn_full_close() will not close the file descriptor.
| Note that conn_xprt_close() is never called, maybe we should kill it ?
|
| Note: at the moment, it's problematic to leave conn->xprt at NULL before doing
| xprt_init() because we might end up with a pending file descriptor, or at
| least with some transport not de-initialized. We might thus need
| conn_xprt_close() when conn_xprt_init() fails.
|
| The fd should be conditioned by ->ctrl only, and the transport layer by ->xprt.
|
|   - conn_prepare_ctrl(conn, ctrl)
|   - conn_prepare_xprt(conn, xprt)
|   - conn_prepare_data(conn, data)
|
| Note: conn_xprt_init() needs conn->xprt so it's not a problem to set it early.
|
| One problem might be that conn_xprt_close() cannot know whether xprt_init()
| was called or not. That's where it might make sense to only set ->xprt during
| init. Except that this does not fly with outgoing connections (xprt_init is
| called after connect()).
|
| => currently conn_xprt_close() is only used by ssl_sock.c and decides whether
|    to do something based on ->xprt_ctx, which is set by ->init() from
|    xprt_init(). So there is nothing to worry about. We just need to restore
|    conn_xprt_close() and rely on ->ctrl to close the fd instead of ->xprt.
|
| => we have the same issue with conn_ctrl_close() : when is the fd supposed to
|    be valid ? On outgoing connections, the control is set much before the fd...