Blame - doc/internals/notes-polling.txt - haproxy

Willy Tarreau, c89f665, 2019-09-06 18:50:32 +0200

2019-09-03

u8 fd.state;
u8 fd.ev;


ev = one of :
  #define FD_POLL_IN  0x01
  #define FD_POLL_PRI 0x02
  #define FD_POLL_OUT 0x04
  #define FD_POLL_ERR 0x08
  #define FD_POLL_HUP 0x10

Could we instead have :

  FD_WAIT_IN  0x01
  FD_WAIT_OUT 0x02
  FD_WAIT_PRI 0x04
  FD_SEEN_ERR 0x08
  FD_SEEN_HUP 0x10
  FD_WAIT_CON 0x20  <<= shouldn't this be in the connection itself in fact ?

=> not needed, covered by the state instead.

What is missing though is :
  - FD_DATA_PENDING -- overlaps with READY_R, OK if passed by pollers only
  - FD_EOI_PENDING
  - FD_ERR_PENDING
  - FD_EOI
  - FD_SHW
  - FD_ERR

fd_update_events() could do that :

  if ((fd_data_pending|fd_eoi_pending|fd_err_pending) && !(fd_err|fd_eoi))
      may_recv()

  if (fd_send_ok && !(fd_err|fd_shw))
      may_send()

  if (fd_err)
      wake()
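
The three tests above could be sketched in C as follows. This is only an
illustration: the notes name the flags but give no values, so the bit
assignments are assumptions, and the handlers (may_recv(), may_send(), wake())
are replaced by a hypothetical bitmask of requested actions so that the logic
can be checked in isolation.

```c
#include <stdint.h>

/* Hypothetical bit assignments: the notes name these flags but not their values. */
#define FD_DATA_PENDING 0x01
#define FD_EOI_PENDING  0x02
#define FD_ERR_PENDING  0x04
#define FD_EOI          0x08
#define FD_SHW          0x10
#define FD_ERR          0x20
#define FD_SEND_OK      0x40

/* Hypothetical action bits standing in for may_recv(), may_send() and wake(). */
#define ACT_MAY_RECV 0x1
#define ACT_MAY_SEND 0x2
#define ACT_WAKE     0x4

/* Returns the set of actions the update would trigger for <status>. */
unsigned int fd_status_to_actions(uint8_t status)
{
    unsigned int act = 0;

    /* something is pending on the read side and the read side is not final */
    if ((status & (FD_DATA_PENDING | FD_EOI_PENDING | FD_ERR_PENDING)) &&
        !(status & (FD_ERR | FD_EOI)))
        act |= ACT_MAY_RECV;

    /* sending is possible and the write side is neither in error nor shut */
    if ((status & FD_SEND_OK) && !(status & (FD_ERR | FD_SHW)))
        act |= ACT_MAY_SEND;

    /* a final error always wakes the owner up */
    if (status & FD_ERR)
        act |= ACT_WAKE;

    return act;
}
```

For instance, FD_ERR|FD_SEND_OK yields only ACT_WAKE: the final error masks
both the receive and the send paths.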

the poller could do that :
  HUP+OUT => always indicates a failed connect(), it should not lack ERR. Is this err_pending ?

  ERR HUP OUT IN
   0   0   0   0  => nothing
   0   0   0   1  => FD_DATA_PENDING
   0   0   1   0  => FD_SEND_OK
   0   0   1   1  => FD_DATA_PENDING|FD_SEND_OK
   0   1   0   0  => FD_EOI (|FD_SHW)
   0   1   0   1  => FD_DATA_PENDING|FD_EOI_PENDING (|FD_SHW)
   0   1   1   0  => FD_EOI |FD_ERR (|FD_SHW)
   0   1   1   1  => FD_EOI_PENDING (|FD_ERR_PENDING) |FD_DATA_PENDING (|FD_SHW)
   1   X   0   0  => FD_ERR | FD_EOI (|FD_SHW)
   1   X   X   1  => FD_ERR_PENDING | FD_EOI_PENDING | FD_DATA_PENDING (|FD_SHW)
   1   X   1   0  => FD_ERR | FD_EOI (|FD_SHW)

  OUT+HUP,OUT+HUP+ERR => FD_ERR

This reorders to:

  IN ERR HUP OUT
   0   0   0   0  => nothing
   0   0   0   1  => FD_SEND_OK
   0   0   1   0  => FD_EOI (|FD_SHW)

   0   X   1   1  => FD_ERR | FD_EOI (|FD_SHW)
   0   1   X   0  => FD_ERR | FD_EOI (|FD_SHW)
   0   1   X   1  => FD_ERR | FD_EOI (|FD_SHW)

   1   0   0   0  => FD_DATA_PENDING
   1   0   0   1  => FD_DATA_PENDING|FD_SEND_OK
   1   0   1   0  => FD_DATA_PENDING|FD_EOI_PENDING (|FD_SHW)
   1   0   1   1  => FD_EOI_PENDING (|FD_ERR_PENDING) |FD_DATA_PENDING (|FD_SHW)
   1   1   X   X  => FD_ERR_PENDING | FD_EOI_PENDING | FD_DATA_PENDING (|FD_SHW)
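
The reordered table maps fairly directly onto a decision tree keyed on IN
first. A minimal sketch, assuming the FD_POLL_* values listed at the top of
these notes and hypothetical bit values for the status flags (the notes do not
assign any). The parenthesized entries of the table, (|FD_SHW) and
(|FD_ERR_PENDING), are treated as optional and left out here.

```c
#include <stdint.h>

/* Poller event bits, as listed at the top of these notes. */
#define FD_POLL_IN  0x01
#define FD_POLL_PRI 0x02
#define FD_POLL_OUT 0x04
#define FD_POLL_ERR 0x08
#define FD_POLL_HUP 0x10

/* Hypothetical bit assignments for the status flags. */
#define FD_DATA_PENDING 0x01
#define FD_EOI_PENDING  0x02
#define FD_ERR_PENDING  0x04
#define FD_EOI          0x08
#define FD_SHW          0x10
#define FD_ERR          0x20
#define FD_SEND_OK      0x40

/* Translates poller events into status flags following the reordered table. */
uint8_t poll_to_status(uint8_t ev)
{
    int in  = !!(ev & FD_POLL_IN);
    int out = !!(ev & FD_POLL_OUT);
    int err = !!(ev & FD_POLL_ERR);
    int hup = !!(ev & FD_POLL_HUP);

    if (in) {
        /* data first: shutdown and error stay pending behind it */
        uint8_t st = FD_DATA_PENDING;

        if (err)
            st |= FD_ERR_PENDING | FD_EOI_PENDING;  /* 1 1 X X */
        else if (hup)
            st |= FD_EOI_PENDING;                   /* 1 0 1 X */
        else if (out)
            st |= FD_SEND_OK;                       /* 1 0 0 1 */
        return st;
    }

    if (err || (hup && out))
        return FD_ERR | FD_EOI;                     /* 0 1 X X and 0 X 1 1 */
    if (hup)
        return FD_EOI;                              /* 0 0 1 0 */
    if (out)
        return FD_SEND_OK;                          /* 0 0 0 1 */
    return 0;                                       /* 0 0 0 0 */
}
```

Note how HUP+OUT without IN is reported as a final error even when the poller
did not set ERR, which matches the failed connect() case mentioned above.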

Regarding "|SHW", it's normally useless since it will already have been done,
except on connect() error where this indicates there's no need for SHW.

FD_EOI and FD_SHW could be part of the state (FD_EV_SHUT_R, FD_EV_SHUT_W).
Then all states having one of these bits and another one would be transient and
need to resync. We could then have "fd_shut_recv" and "fd_shut_send" to switch
to these states.

The FD's ev then only needs to update EOI_PENDING, ERR_PENDING, ERR, DATA_PENDING.
With this said, these are not exactly polling states either, as err/eoi/shw are
orthogonal to the other states and are required to update them so that the polling
state really is DISABLED in the end. So we need more of an operational status for
the FD containing EOI_PENDING, EOI, ERR_PENDING, ERR, SHW, CLO?. These could be
classified in 3 categories: read:(OPEN,EOI_PENDING,EOI), write:(OPEN,SHW),
ctrl:(OPEN,ERR_PENDING,ERR,CLO). That would be 2 bits for R, 1 for W, 2 for ctrl,
or 5 in total vs 6 for individual ones, but would be harder to manipulate.
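
To illustrate the trade-off between the 5-bit packed form and individual
flags, here is a minimal sketch of the three categories packed into one byte.
The enum names, numeric encodings and field positions are all assumptions made
for this example; the notes only give the category contents and bit counts.

```c
#include <stdint.h>

/* Hypothetical encodings for the three categories named in the notes. */
enum fd_rd   { FD_RD_OPEN = 0, FD_RD_EOI_PENDING = 1, FD_RD_EOI = 2 };
enum fd_wr   { FD_WR_OPEN = 0, FD_WR_SHW = 1 };
enum fd_ctrl { FD_CT_OPEN = 0, FD_CT_ERR_PENDING = 1, FD_CT_ERR = 2, FD_CT_CLO = 3 };

#define FD_RD_SHIFT 0
#define FD_RD_MASK  0x03  /* bits 0-1: read side */
#define FD_WR_SHIFT 2
#define FD_WR_MASK  0x04  /* bit 2: write side */
#define FD_CT_SHIFT 3
#define FD_CT_MASK  0x18  /* bits 3-4: control */

/* Builds a packed 5-bit status from the three fields. */
uint8_t fd_status_pack(enum fd_rd rd, enum fd_wr wr, enum fd_ctrl ct)
{
    return (uint8_t)((rd << FD_RD_SHIFT) | (wr << FD_WR_SHIFT) | (ct << FD_CT_SHIFT));
}

/* Extracts the read-side field. */
enum fd_rd fd_status_rd(uint8_t st)
{
    return (enum fd_rd)((st & FD_RD_MASK) >> FD_RD_SHIFT);
}

/* Extracts the control field. */
enum fd_ctrl fd_status_ctrl(uint8_t st)
{
    return (enum fd_ctrl)((st & FD_CT_MASK) >> FD_CT_SHIFT);
}

/* Changing one field needs a mask-and-or instead of a single bit set. */
uint8_t fd_status_set_rd(uint8_t st, enum fd_rd rd)
{
    return (uint8_t)((st & ~FD_RD_MASK) | (rd << FD_RD_SHIFT));
}
```

The setter shows why this form is "harder to manipulate": moving from
EOI_PENDING to EOI is a read-modify-write of a 2-bit field, whereas with
individual flags it would be a plain OR.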

Proposal:
  - rename fdtab[].state to "polling_state"
  - rename fdtab[].ev to "status"

Note: POLLHUP is also reported if a listen() socket has gone in shutdown()
TEMPORARILY! Thus we may not always consider this as a final error.


Work hypothesis:

  SHUT RDY ACT
   0    0   0   => disabled
   0    0   1   => active
   0    1   0   => stopped
   0    1   1   => ready
   1    0   0   => final shut
   1    0   1   => shut pending without data
   1    1   0   => shut pending, stopped
   1    1   1   => shut pending

PB: we can land into final shut if one thread disables the FD while another
one that was waiting on it reports it as shut. Theoretically it should be
implicitly ready though, since reported. But if no data is reported, it
will be reportedly shut only. And no event will be reported then. This
might still make sense since it's not active, thus we don't want events.
But it will not be enabled later either in this case so the shut really
risks not being properly reported. The issue is that there's no difference
between a shut coming from the bottom and a shut coming from the top, and
we need an event to report activity here. Or we may consider that a poller
never leaves a final shut by itself (100) and always reports it as
shut+stop (thus ready) if it was not active. Alternately, if active is
disabled, shut should possibly be ignored, then a poller cannot report
shut. But shut+stopped seems the most suitable as it corresponds to
disabled->stopped transition.

Now let's add ERR. ERR necessarily implies SHUT as there doesn't seem to be a
valid case of ERR pending without shut pending.

  ERR SHUT RDY ACT
   0    0   0   0   => disabled
   0    0   0   1   => active
   0    0   1   0   => stopped
   0    0   1   1   => ready

   0    1   0   0   => final shut, no error
   0    1   0   1   => shut pending without data
   0    1   1   0   => shut pending, stopped
   0    1   1   1   => shut pending

   1    0   X   X   => invalid

   1    1   0   0   => final shut, error encountered
   1    1   0   1   => error pending without data
   1    1   1   0   => error pending after data, stopped
   1    1   1   1   => error pending
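
The 16 combinations above fit in a small lookup table. The sketch below is
only an illustration: the ST_* names and bit positions are assumptions (ACT in
bit 0 up to ERR in bit 3, mirroring the column order), and the strings are the
labels from the table.

```c
#include <stddef.h>

/* Hypothetical bit positions, mirroring the ERR SHUT RDY ACT columns. */
#define ST_ACT  0x1
#define ST_RDY  0x2
#define ST_SHUT 0x4
#define ST_ERR  0x8

/* A state is valid only if ERR never appears without SHUT. */
int fd_state_valid(unsigned int st)
{
    return st <= 0xf && (!(st & ST_ERR) || (st & ST_SHUT));
}

/* Returns the label of a state, or NULL for the invalid "1 0 X X" rows. */
const char *fd_state_name(unsigned int st)
{
    static const char *const names[16] = {
        [0x0] = "disabled",
        [0x1] = "active",
        [0x2] = "stopped",
        [0x3] = "ready",
        [0x4] = "final shut, no error",
        [0x5] = "shut pending without data",
        [0x6] = "shut pending, stopped",
        [0x7] = "shut pending",
        /* 0x8..0xb (ERR without SHUT) are left NULL on purpose */
        [0xc] = "final shut, error encountered",
        [0xd] = "error pending without data",
        [0xe] = "error pending after data, stopped",
        [0xf] = "error pending",
    };

    return st <= 0xf ? names[st] : NULL;
}
```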

So the algorithm for the poller is:
  - if (shutdown_pending or error) reported and ACT==0,
    report SHUT|RDY or SHUT|ERR|RDY
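
A possible reading of this rule, as a sketch only. The ST_* bit values and the
function name are assumptions. The notes only cover the ACT==0 case; leaving
RDY untouched when ACT is set is a guess made for this example.

```c
/* Hypothetical bit positions for the ERR SHUT RDY ACT state. */
#define ST_ACT  0x1
#define ST_RDY  0x2
#define ST_SHUT 0x4
#define ST_ERR  0x8

/* Returns the new state after the poller saw a shutdown and/or an error. */
unsigned int poller_report(unsigned int state, int shut_seen, int err_seen)
{
    if (!shut_seen && !err_seen)
        return state;

    /* ERR necessarily implies SHUT in this model */
    state |= ST_SHUT;
    if (err_seen)
        state |= ST_ERR;

    /* Never leave an inactive FD in "final shut" (RDY=0): flag it ready so
     * that the event is still visible once the FD gets enabled again.
     */
    if (!(state & ST_ACT))
        state |= ST_RDY;

    return state;
}
```

With this reading, a disabled FD (0000) that sees a shutdown moves to "shut
pending, stopped" (0110), which is the disabled->stopped transition the notes
favor.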

For read handlers :
  - if (!(flags & (RDY|ACT)))
      return
  - if (ready)
      try_to_read
  - if (err)
      report error
  - if (shut)
      read0

For write handlers:
  - if (!(flags & (RDY|ACT)))
      return
  - if (err||shut)
      report error
  - if (ready)
      try_to_write

For listeners:
  - if (!(flags & (RDY|ACT)))
      return
  - if (err||shut)
      pause
  - if (ready)
      try_to_accept
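
The three handler outlines can be transcribed literally as functions that
return the list of steps they would take. This is a sketch under assumptions:
the ST_* bits and DO_* step names are made up for the example, and each test
is kept independent exactly as listed, since the notes do not say whether an
error should skip the step that follows it.

```c
/* Hypothetical bit positions for the ERR SHUT RDY ACT state. */
#define ST_ACT  0x1
#define ST_RDY  0x2
#define ST_SHUT 0x4
#define ST_ERR  0x8

/* Hypothetical step bits standing in for the actions named in the notes. */
#define DO_READ   0x01  /* try_to_read   */
#define DO_ERROR  0x02  /* report error  */
#define DO_READ0  0x04  /* read0         */
#define DO_WRITE  0x08  /* try_to_write  */
#define DO_PAUSE  0x10  /* pause         */
#define DO_ACCEPT 0x20  /* try_to_accept */

unsigned int read_handler_steps(unsigned int flags)
{
    unsigned int steps = 0;

    if (!(flags & (ST_RDY | ST_ACT)))
        return 0;               /* neither ready nor active: nothing to do */
    if (flags & ST_RDY)
        steps |= DO_READ;       /* pending data is consumed first */
    if (flags & ST_ERR)
        steps |= DO_ERROR;
    if (flags & ST_SHUT)
        steps |= DO_READ0;
    return steps;
}

unsigned int write_handler_steps(unsigned int flags)
{
    unsigned int steps = 0;

    if (!(flags & (ST_RDY | ST_ACT)))
        return 0;
    if (flags & (ST_ERR | ST_SHUT))
        steps |= DO_ERROR;      /* a shut write side is an error for a writer */
    if (flags & ST_RDY)
        steps |= DO_WRITE;
    return steps;
}

unsigned int listener_steps(unsigned int flags)
{
    unsigned int steps = 0;

    if (!(flags & (ST_RDY | ST_ACT)))
        return 0;
    if (flags & (ST_ERR | ST_SHUT))
        steps |= DO_PAUSE;
    if (flags & ST_RDY)
        steps |= DO_ACCEPT;
    return steps;
}
```

A "final shut" state (SHUT alone) returns no step at all from any handler,
which is the reporting problem discussed in the work hypothesis.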

Kqueue reports events differently: it sets EV_EOF on READ or WRITE, which
we currently map to FD_POLL_HUP and FD_POLL_ERR. Thus kqueue reports only
POLLRDHUP and not POLLHUP, so for now a direct mapping of POLLHUP to
FD_POLL_HUP does NOT imply write closed with kqueue while it does for others.

Other approach, use the {RD,WR}_{ERR,SHUT,RDY} flags to build a composite
status in each poller and pass this to fd_update_events(). We normally
have enough to be precise, and the latter will rework the events.
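
As an illustration of the composite status, here is a minimal sketch for a
poll()-based poller. The flag values and the function name are assumptions,
and the mapping of each poll() event to a direction is one possible reading of
the notes, not an established one.

```c
#include <poll.h>

/* Hypothetical composite flags, one set per direction. */
#define RD_RDY  0x01
#define RD_SHUT 0x02
#define RD_ERR  0x04
#define WR_RDY  0x08
#define WR_SHUT 0x10
#define WR_ERR  0x20

/* Translates poll() revents into the composite status. */
unsigned int poll_to_composite(short revents)
{
    unsigned int st = 0;

    if (revents & POLLIN)
        st |= RD_RDY;
    if (revents & POLLOUT)
        st |= WR_RDY;
#ifdef POLLRDHUP
    /* Linux-specific: the peer closed its sending side only */
    if (revents & POLLRDHUP)
        st |= RD_SHUT;
#endif
    /* with poll(), POLLHUP is taken to mean both directions are closed */
    if (revents & POLLHUP)
        st |= RD_SHUT | WR_SHUT;
    /* poll() does not say which direction failed, so flag both */
    if (revents & POLLERR)
        st |= RD_ERR | WR_ERR;

    return st;
}
```

A kqueue poller would fill the same flags from its own events, for example
only RD_SHUT for an end-of-file seen on the read filter, so that
fd_update_events() no longer has to guess what POLLHUP meant for a given
poller.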

FIXME: Normally on KQUEUE we're supposed to look at kev[].fflags to get the error
on EV_EOF on read or write.