Willy Tarreau | c14b7d9 | 2014-06-19 16:03:41 +0200

2012/07/05 - Connection layering and sequencing


An FD has a state:
  - CLOSED
  - READY
  - ERROR (?)
  - LISTEN (?)

A connection has a state:
  - CLOSED
  - ACCEPTED
  - CONNECTING
  - ESTABLISHED
  - ERROR

A stream interface has a state:
  - INI, REQ, QUE, TAR, ASS, CON, CER, EST, DIS, CLO

Note that CON and CER might be replaced by EST if the connection state is used
instead. CON might even be better suited than EST to indicate that a connection
is known.

si_shutw() must do:

    data_shutw()
    if (shutr) {
        data_close()
        ctrl_shutw()
        ctrl_close()
    }

si_shutr() must do:

    data_shutr()
    if (shutw) {
        data_close()
        ctrl_shutr()
        ctrl_close()
    }

Each of these steps may fail, in which case the failed step must be retained
and the remaining operations postponed in an asynchronous task.
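
The sequence above, with failed steps recorded for an async retry, could be
sketched as follows. This is only an illustration under assumed names
(si_sketch, the PEND_* flags and the step callbacks are all invented here):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative sketch of the si_shutw() sequence: each step may fail, in
 * which case the remaining steps are recorded as pending so that an
 * asynchronous task can retry them later. All names are invented. */
enum {
    PEND_DATA_SHUTW = 0x1,
    PEND_DATA_CLOSE = 0x2,
    PEND_CTRL_SHUTW = 0x4,
    PEND_CTRL_CLOSE = 0x8,
};

struct si_sketch {
    bool shutr_done;   /* other direction already shut: full close wanted */
    unsigned pending;  /* steps left for the async task */
    bool (*data_shutw)(struct si_sketch *);
    bool (*data_close)(struct si_sketch *);
    bool (*ctrl_shutw)(struct si_sketch *);
    bool (*ctrl_close)(struct si_sketch *);
};

/* returns 0 when all requested steps completed, -1 when some were postponed */
static int si_shutw_sketch(struct si_sketch *si)
{
    unsigned close_steps = si->shutr_done ?
        (PEND_DATA_CLOSE | PEND_CTRL_SHUTW | PEND_CTRL_CLOSE) : 0;

    if (!si->data_shutw(si)) {
        si->pending |= PEND_DATA_SHUTW | close_steps;
        return -1;
    }
    if (si->shutr_done) {
        if (!si->data_close(si)) {
            si->pending |= PEND_DATA_CLOSE | PEND_CTRL_SHUTW | PEND_CTRL_CLOSE;
            return -1;
        }
        if (!si->ctrl_shutw(si)) {
            si->pending |= PEND_CTRL_SHUTW | PEND_CTRL_CLOSE;
            return -1;
        }
        if (!si->ctrl_close(si)) {
            si->pending |= PEND_CTRL_CLOSE;
            return -1;
        }
    }
    return 0;
}

/* trivial steps for a usage example */
static bool step_ok(struct si_sketch *si)   { (void)si; return true; }
static bool step_fail(struct si_sketch *si) { (void)si; return false; }
```

The async task then only has to look at ->pending to know which of the
remaining steps still need to be performed.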

The first asynchronous data_shut() might already fail, so it is mandatory to
save the other side's status with the connection in order to let the async
task know whether the 3 next steps must still be performed.

The connection (or perhaps the FD) needs to know:
  - the desired close operations: DSHR, DSHW, CSHR, CSHW
  - the completed close operations: DSHR, DSHW, CSHR, CSHW

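The desired/completed pair maps naturally to two bit fields; the operations
still owed to the async task are then just the difference. A minimal sketch,
with flag names taken from the note above but an invented layout:

```c
#include <assert.h>

/* Sketch of the desired/completed close-operation flags. DSHR/DSHW = data
 * shut read/write, CSHR/CSHW = ctrl shut read/write; the exact bit values
 * are an assumption. */
#define CO_FL_DSHR  0x01  /* data layer shut for reads */
#define CO_FL_DSHW  0x02  /* data layer shut for writes */
#define CO_FL_CSHR  0x04  /* ctrl layer shut for reads */
#define CO_FL_CSHW  0x08  /* ctrl layer shut for writes */

struct close_state {
    unsigned desired;    /* operations that were requested */
    unsigned completed;  /* operations that actually succeeded */
};

/* operations still owed to the async task */
static inline unsigned close_pending(const struct close_state *cs)
{
    return cs->desired & ~cs->completed;
}
```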

On the accept() side, we probably need to know:
  - if a header is expected (eg: accept-proxy)
  - if this header is still being waited for
    => maybe both pieces of info might be combined into one bit

  - if a data-layer accept() is expected
  - if a data-layer accept() has been started
  - if a data-layer accept() has been performed
    => possibly 2 bits, to indicate the need to free()

On the connect() side, we need to know:
  - the desire to send a header (eg: send-proxy)
  - if this header has been sent
    => maybe both pieces of info might be combined

  - if a data-layer connect() is expected
  - if a data-layer connect() has been started
  - if a data-layer connect() has been completed
    => possibly 2 bits, to indicate the need to free()

On the response side, we also need to know:
  - the desire to send a header (eg: health check response for monitor-net)
  - if this header was sent
    => might be the same as sending a header over a new connection

Note: monitor-net has precedence over the proxy protocol and data layers.
Same for health mode.

For multi-step operations, use 2 bits:
  00 = operation not desired, not performed
  10 = operation desired, not started
  11 = operation desired, started but not completed
  01 = operation desired, started and completed

  => X != 00 ==> operation desired
     X & 01  ==> operation at least started
     X & 10  ==> operation not completed
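
The encoding above can be checked with a few lines of C. This is a direct
transcription of the table (the enum and helper names are invented):

```c
#include <assert.h>

/* 2-bit multi-step operation encoding: bit 0x2 = "desired but not yet
 * completed", bit 0x1 = "at least started". */
enum op_state {
    OP_NONE    = 0x0, /* 00: not desired, not performed   */
    OP_DESIRED = 0x2, /* 10: desired, not started         */
    OP_STARTED = 0x3, /* 11: desired, started, incomplete */
    OP_DONE    = 0x1, /* 01: desired, started, completed  */
};

static inline int op_desired(enum op_state x)    { return x != OP_NONE; }
static inline int op_started(enum op_state x)    { return (x & 0x1) != 0; }
static inline int op_unfinished(enum op_state x) { return (x & 0x2) != 0; }
```

Note how the three checks from the table come out as single-instruction
tests, which is the point of this particular bit assignment.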

Note: there is no way to store status information for error reporting.

Note2: it would be nice if "tcp-request connection" rules could work at the
connection level, just after headers! This means support for tracking stick
tables, possibly without too much added complexity.


Proposal for the incoming connection sequence:

  - accept()
  - if monitor-net matches or if mode health => try to send the response
  - if accept-proxy, wait for the proxy request
  - if tcp-request connection, process the tcp rules and possibly keep the
    pointer to the stick-table
  - if SSL is enabled, switch to the SSL handshake
  - then switch to the DATA state and instantiate a session

We just need a map of handshake handlers on the connection. They all manage
the FD status themselves and set the callbacks themselves. If their work
succeeds, they remove themselves from the list. If it fails, they remain
subscribed and enable the required polling until they are woken up again or
the timeout strikes.
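
Such a handler map could be sketched as an ordered list attached to the
connection (all names here are invented for illustration, not an actual API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct connection_sketch;

/* Handshake handlers sit on a list attached to the connection; each
 * successful handler removes itself, a failing one stays subscribed until
 * the next wake-up or the timeout. */
struct hs_handler {
    bool (*run)(struct connection_sketch *conn);  /* true = work completed */
    struct hs_handler *next;
};

struct connection_sketch {
    struct hs_handler *handshakes;  /* pending handlers, in order */
};

/* Runs pending handlers in order; returns true when the handshake phase is
 * over and the connection may switch to the DATA state. */
static bool run_handshakes(struct connection_sketch *conn)
{
    while (conn->handshakes) {
        if (!conn->handshakes->run(conn))
            return false;  /* stay subscribed; polling will wake us up */
        conn->handshakes = conn->handshakes->next;  /* success: remove */
    }
    return true;
}

/* trivial handlers for a usage example */
static bool hs_ok(struct connection_sketch *c)    { (void)c; return true; }
static bool hs_stall(struct connection_sketch *c) { (void)c; return false; }
```

Stopping at the first failing handler preserves the required ordering (eg:
the PROXY line must be parsed before the SSL handshake starts).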

Identified handshake handlers for incoming connections:
  - HH_HEALTH (tries to send OK and dies)
  - HH_MONITOR_IN (matches src IP and adds/removes HH_SEND_OK/HH_SEND_HTTP_OK)
  - HH_SEND_OK (tries to send "OK" and dies)
  - HH_SEND_HTTP_OK (tries to send "HTTP/1.0 200 OK" and dies)
  - HH_ACCEPT_PROXY (waits for the PROXY line and parses it)
  - HH_TCP_RULES (processes TCP rules)
  - HH_SSL_HS (starts the SSL handshake)
  - HH_ACCEPT_SESSION (instantiates a session)

Identified handshake handlers for outgoing connections:
  - HH_SEND_PROXY (tries to build and send the PROXY line)
  - HH_SSL_HS (starts the SSL handshake)

For the pollers, we could check that the handshake handler list is not empty
and decide to call a generic connection handshake handler instead of the
usual callbacks. The problem is that pollers don't know about connections,
they know about FDs. So the entities which manage the handlers should update
the FD callbacks accordingly.

With a bit of care, we could have:
  - HH_SEND_LAST_CHUNK (sends the chunk pointed to by a pointer and dies)
    => merges HEALTH, SEND_OK and SEND_HTTP_OK

It sounds like the ctrl vs data states for the connection are per-direction
(eg: support an async ctrl shutw while still reading data).

Also support shutr/shutw status at L4/L7.

In practice, what we really need is:

    shutdown(conn) =
        conn.data.shut()
        conn.ctrl.shut()
        conn.fd.shut()

    close(conn) =
        conn.data.close()
        conn.ctrl.close()
        conn.fd.close()

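One way to express this layering is an ops vector per layer, walked top-down
by generic helpers. A minimal sketch (the structures and names are assumed
here, not taken from actual code):

```c
#include <assert.h>
#include <stddef.h>

/* One shut/close pair per layer (data, ctrl, fd); the generic helpers walk
 * the layers from the top down. */
struct layer_ops {
    void (*shut)(void *ctx);
    void (*close)(void *ctx);
};

struct conn_layers {
    void *ctx;                       /* shared context, eg. the connection */
    struct layer_ops data, ctrl, fd;
};

static void conn_shutdown(struct conn_layers *c)
{
    c->data.shut(c->ctx);   /* eg: SSL close_notify */
    c->ctrl.shut(c->ctx);   /* eg: shutdown() on the transport */
    c->fd.shut(c->ctx);     /* stop polling the FD */
}

static void conn_close(struct conn_layers *c)
{
    c->data.close(c->ctx);
    c->ctrl.close(c->ctx);
    c->fd.close(c->ctx);
}

/* tiny recorders for a usage example */
static int shut_calls, close_calls;
static void rec_shut(void *ctx)  { (void)ctx; shut_calls++; }
static void rec_close(void *ctx) { (void)ctx; close_calls++; }
```
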
With SSL over Remote TCP (RTCP + RSSL) to reach the server, we would have:

    HTTP -> RTCP+RSSL connection <-> RTCP+RRAW connection -> TCP+SSL connection

The connection has to be closed at 3 places after a successful response:
  - DATA (RSSL over RTCP)
  - CTRL (RTCP to close the connection to the server)
  - SOCK (FD to close the connection to the second process)

Externally, the connection is seen with very few flags:
  - SHR
  - SHW
  - ERR

We don't need a CLOSED flag since a connection must always be detached when
it's closed.

The internal status doesn't need to be exposed:
  - FD allocated       (Y/N)
  - CTRL initialized   (Y/N)
  - CTRL connected     (Y/N)
  - CTRL handlers done (Y/N)
  - CTRL failed        (Y/N)
  - CTRL shutr         (Y/N)
  - CTRL shutw         (Y/N)
  - DATA initialized   (Y/N)
  - DATA connected     (Y/N)
  - DATA handlers done (Y/N)
  - DATA failed        (Y/N)
  - DATA shutr         (Y/N)
  - DATA shutw         (Y/N)

(note that having flags for operations still needing completion might be
easier)
--------------

Maybe we need to be able to call conn->fdset() and conn->fdclr(), but it
sounds very unlikely since the only functions manipulating these are in the
code of the data/ctrl handlers.

FDSET/FDCLR cannot be directly controlled by the stream interface since it
also depends on the DATA layer (WANT_READ/WANT_WRITE).

But FDSET/FDCLR is probably controlled by whoever owns the connection (eg:
DATA).

Example: an SSL connection relies on an FD. The buffer is full and wants the
connection to stop reading. It must not stop the FD itself. It is the read
function which should notice that it has nothing to do with a read wake-up,
and which needs to disable reading.

Conversely, when calling conn->chk_rcv(), the reader might get a WANT_READ or
even WANT_WRITE and adjust the FDs accordingly.
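
The chk_rcv() feedback loop could look like this. The return values and the
fd_events stand-in are invented for illustration; the point is only that the
caller, not the data layer, touches the FD:

```c
#include <assert.h>

/* The data layer reports what it needs (eg: an SSL layer wanting to write
 * during a read) and the caller adjusts FD polling accordingly. */
enum chk_rcv_ret { CR_DONE, CR_WANT_READ, CR_WANT_WRITE };

struct fd_events {
    int want_read, want_write;  /* stand-ins for fdset()/fdclr() on the FD */
};

static void apply_chk_rcv(enum chk_rcv_ret ret, struct fd_events *ev)
{
    switch (ret) {
    case CR_WANT_READ:   /* data layer starved: poll for readability */
        ev->want_read = 1;
        break;
    case CR_WANT_WRITE:  /* eg: SSL renegotiation: poll for writability */
        ev->want_write = 1;
        break;
    case CR_DONE:        /* nothing more to receive for now */
        ev->want_read = 0;
        break;
    }
}
```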

------------------------

OK, the problem is simple: we don't manipulate the FD at the right level.
We should have:
  ->connect(), ->chk_snd(), ->chk_rcv(), ->shutw(), ->shutr(), which are
    called from the upper layer (buffer)
  ->recv(), ->send(), called from the lower layer

Note that the SHR is *reported* by the lower layer but can be forced by the
upper layer. In this case it's like a delayed abort. The difficulty consists
in knowing whether the output data were correctly read. We'd probably need to
drain incoming data past the active shutr().

The only four purposes of the top-down shutr() call are:
  - acknowledge a shut read report: could probably be done better
  - read timeout => disable reading: it's a delayed abort. We want to
    report that the buffer is SHR, maybe even the connection, but the
    FD clearly isn't.
  - read abort due to an error on the other side or a desire to close (eg:
    http-server-close): delayed abort
  - complete abort

The active shutr() is problematic as we can't disable reading if we expect
some exchanges for data acknowledgement. We probably need to drain data only
until the shutw() has been performed and ACKed.

A connection shut down for read would behave like this:

  1) bidir exchanges

  2) shutr() => read_abort_pending=1

  3) drain input, still send output

  4) shutw()

  5) drain input, wait for read0 or ack(shutw)

  6) close()
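
The six steps above form a small state machine, which could be transcribed
like this (state and function names are invented; only the transitions come
from the list):

```c
#include <assert.h>

/* Shut-down-for-read sequence as a state machine. */
enum rd_shut_state {
    ST_BIDIR,        /* 1) bidirectional exchanges             */
    ST_DRAIN_SEND,   /* 3) draining input, still sending       */
    ST_DRAIN_WAIT,   /* 5) draining input, waiting for read0   */
    ST_CLOSED        /* 6) closed                              */
};

struct rd_shut_conn {
    enum rd_shut_state st;
    int read_abort_pending;
};

static void rs_shutr(struct rd_shut_conn *c)         /* step 2 */
{
    if (c->st == ST_BIDIR) {
        c->read_abort_pending = 1;
        c->st = ST_DRAIN_SEND;
    }
}

static void rs_shutw(struct rd_shut_conn *c)         /* step 4 */
{
    if (c->st == ST_DRAIN_SEND)
        c->st = ST_DRAIN_WAIT;
}

static void rs_read0_or_ack(struct rd_shut_conn *c)  /* step 5 -> 6 */
{
    if (c->st == ST_DRAIN_WAIT)
        c->st = ST_CLOSED;  /* close() */
}
```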

--------------------- 2012/07/05 -------------------

Communications must be performed this way:

    connection <-> channel <-> connection

A channel is composed of flags and stats, and may store data in either a
buffer or a pipe. We need low-level operations between sockets and buffers
or pipes. Right now we only support sockets, but later we might support
remote sockets and maybe pipes or shared memory segments.

So we need:

  - raw_sock_to_buf()    => receive raw data from socket into buffer
  - raw_sock_to_pipe()   => receive raw data from socket into pipe (splice in)
  - raw_sock_from_buf()  => send raw data from buffer to socket
  - raw_sock_from_pipe() => send raw data from pipe to socket (splice out)

  - ssl_sock_to_buf()    => receive ssl data from socket into buffer
  - ssl_sock_to_pipe()   => receive ssl data from socket into a pipe (NULL)
  - ssl_sock_from_buf()  => send ssl data from buffer to socket
  - ssl_sock_from_pipe() => send ssl data from pipe to socket (NULL)

These functions should set status flags such as:

    #define ERR_IN   0x01
    #define ERR_OUT  0x02
    #define SHUT_IN  0x04
    #define SHUT_OUT 0x08
    #define EMPTY_IN 0x10
    #define FULL_OUT 0x20

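As an illustration, a receive function along these lines could report its
outcome through those flags. The buffer structure and error handling below
are assumptions for the sketch, not the real raw_sock_to_buf():

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

/* status flags, as proposed above */
#define ERR_IN   0x01
#define SHUT_IN  0x04

struct flat_buf {
    char   data[16384];
    size_t len;
};

/* Reads from the socket into a flat buffer. Returns the recv() result and
 * updates *flags: read0 means the peer shut its write side (SHUT_IN), a
 * real error (not EAGAIN/EINTR) sets ERR_IN. */
static ssize_t raw_sock_to_buf_sketch(int fd, struct flat_buf *b,
                                      unsigned *flags)
{
    ssize_t ret = recv(fd, b->data + b->len, sizeof(b->data) - b->len, 0);

    if (ret > 0)
        b->len += (size_t)ret;
    else if (ret == 0)
        *flags |= SHUT_IN;   /* read0: peer shut its write side */
    else if (errno != EAGAIN && errno != EINTR)
        *flags |= ERR_IN;    /* real error, not a retry case */
    return ret;
}
```

The ssl_sock_* variants would have the same shape, with recv() replaced by
the SSL layer's read primitive.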