doc/design-thoughts/buffer-redesign.txt - haproxy

2012/02/27 - redesigning buffers for better simplicity - w@1wt.eu

1) Analysis
-----------

Buffer handling becomes complex because buffers are circular but many of their
users don't support wrapping operations (e.g. HTTP parsing). Because of this,
some buffer operations automatically realign buffers as soon as possible when
the buffer is empty, which makes it very hard to track buffer pointers outside
of the buffer struct itself. The buffer contains a pointer to the last
processed data (buf->lr) which is automatically realigned by such operations.
But in the end, its semantics are often unclear, and whether it is safe to use
it isn't always obvious, as it has acquired multiple roles over time.

A "struct buffer" is declared this way :

    struct buffer {
        unsigned int flags;             /* BF_* */
        int rex;                        /* expiration date for a read, in ticks */
        int wex;                        /* expiration date for a write or connect, in ticks */
        int rto;                        /* read timeout, in ticks */
        int wto;                        /* write timeout, in ticks */
        unsigned int l;                 /* data length */
        char *r, *w, *lr;               /* read ptr, write ptr, last read */
        unsigned int size;              /* buffer size in bytes */
        unsigned int send_max;          /* number of bytes the sender can consume from this buffer, <= l */
        unsigned int to_forward;        /* number of bytes to forward after send_max without a wake-up */
        unsigned int analysers;         /* bit field indicating what to do on the buffer */
        int analyse_exp;                /* expiration date for current analysers (if set) */
        void (*hijacker)(struct session *, struct buffer *); /* alternative content producer */
        unsigned char xfer_large;       /* number of consecutive large xfers */
        unsigned char xfer_small;       /* number of consecutive small xfers */
        unsigned long long total;       /* total data read */
        struct stream_interface *prod;  /* producer attached to this buffer */
        struct stream_interface *cons;  /* consumer attached to this buffer */
        struct pipe *pipe;              /* non-NULL only when data present */
        char data[0];                   /* <size> bytes */
    };

In order to address this, a struct http_msg was created with other pointers to
the buffer. The issue is that some of these pointers are absolute and others
are relative, sometimes to one another, sometimes to the beginning of the
buffer, which doesn't help at all when buffers get realigned.

A "struct http_msg" is defined this way :

    struct http_msg {
        unsigned int msg_state;
        unsigned int flags;
        unsigned int col, sov;          /* current header: colon, start of value */
        unsigned int eoh;               /* End Of Headers, relative to buffer */
        char *sol;                      /* start of line, also start of message when fully parsed */
        char *eol;                      /* end of line */
        unsigned int som;               /* Start Of Message, relative to buffer */
        int err_pos;                    /* err handling: -2=block, -1=pass, 0+=detected */
        union {                         /* useful start line pointers, relative to ->sol */
            struct {
                int l;                  /* request line length (not including CR) */
                int m_l;                /* METHOD length (method starts at ->som) */
                int u, u_l;             /* URI, length */
                int v, v_l;             /* VERSION, length */
            } rq;                       /* request line : field, length */
            struct {
                int l;                  /* status line length (not including CR) */
                int v_l;                /* VERSION length (version starts at ->som) */
                int c, c_l;             /* CODE, length */
                int r, r_l;             /* REASON, length */
            } st;                       /* status line : field, length */
        } sl;                           /* start line */
        unsigned long long chunk_len;
        unsigned long long body_len;
        char **cap;
    };
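
The difference between absolute pointers and origin-relative offsets under a
realign can be illustrated with a minimal sketch. The toy_buf type and the
toy_realign() and toy_at() helpers are hypothetical and heavily simplified
(no wrapping, fixed storage); they are not HAProxy code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified buffer: not the HAProxy struct. */
struct toy_buf {
    char  *origin;    /* absolute pointer to the first unprocessed byte */
    size_t len;       /* bytes present from origin onwards (no wrapping) */
    char   data[64];  /* storage area */
};

/* Move the pending data back to the start of the storage area. */
static void toy_realign(struct toy_buf *b)
{
    assert(b->len <= sizeof(b->data));
    memmove(b->data, b->origin, b->len);
    b->origin = b->data;
}

/* Resolve an origin-relative offset; it stays valid across realigns. */
static char toy_at(const struct toy_buf *b, size_t off)
{
    assert(off < b->len);
    return b->origin[off];
}
```

An offset saved before toy_realign() still designates the same byte afterwards,
whereas a saved "char *" keeps pointing at the old location in the storage
area.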


The first immediate observation is that nothing in a buffer should be relative
to the beginning of the storage area, everything should be relative to the
buffer's origin as a floating location. Right now the buffer's origin is equal
to (buf->w + buf->send_max). It is the place where the first byte of data not
yet scheduled for being forwarded is found.

  - buf->w is an absolute pointer, just like buf->data.
  - buf->send_max is a relative value which oscillates between 0 when nothing
    has to be forwarded, and buf->l when the whole buffer must be forwarded.
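
As a rough sketch, the current origin could be computed as follows. The
old_buf type is a hypothetical subset of the fields above with a fixed-size
storage area, and the wrap-around step is an assumption derived from the
buffer being circular; it is not spelled out in the text.

```c
#include <assert.h>
#include <stddef.h>

#define OLD_BUF_SIZE 32

/* Hypothetical subset of the fields discussed above. */
struct old_buf {
    char        *w;                  /* absolute write pointer */
    unsigned int send_max;           /* bytes scheduled for forwarding */
    char         data[OLD_BUF_SIZE]; /* storage area */
};

/* Return the origin, i.e. buf->w + buf->send_max, wrapped into the storage. */
static char *old_buf_origin(struct old_buf *b)
{
    size_t off;

    assert(b->w >= b->data && b->w < b->data + OLD_BUF_SIZE);
    assert(b->send_max <= OLD_BUF_SIZE);

    /* Work on offsets so that no out-of-bounds pointer is ever formed. */
    off = (size_t)(b->w - b->data) + b->send_max;
    if (off >= OLD_BUF_SIZE)
        off -= OLD_BUF_SIZE;         /* assumed circular wrap */
    return b->data + off;
}
```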


2) Proposal
-----------

By having such an origin, we could have everything in http_msg relative to this
origin. This would resist buffer realigns much better than right now.

At the moment we have msg->som which is relative to buf->data and which points
to the beginning of the message. The beginning of the message should always
be the buffer's origin. If data are to be skipped in the message, just wait for
send_max to become zero and move the origin forward; this would definitely get
rid of msg->som. This is already what is done in the HTTP parser, except that
it has to move both buf->lr and msg->som.

Following the same principle, we should then have a relative pointer in
http_msg to replace buf->lr. It would be relative to the buffer's origin and
would simply record which location was last visited.

Doing all this could result in more complex operations where more time is spent
adding buf->w to buf->send_max and then to msg->anything. It would probably make
more sense to define the buffer's origin as an absolute pointer and to have
both the buf->h (head) and buf->t (tail) pointers be positive and negative
positions relative to this origin. Operating on the buffer would then look like
this :

  - no buf->l anymore. buf->l is replaced by (head + tail)
  - no buf->lr anymore. Use origin + msg->last for instance
  - recv() : head += recv(origin + head);
  - send() : tail -= send(origin - tail, tail);
    thus, buf->t (tail) effectively replaces buf->send_max.
  - forward(N) : tail += N; origin += N;
  - realign() : origin = data
  - detect risk of wrapping of input : origin + head > data + size
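
The operations above can be sketched in C as follows. This is only an
illustration under stated assumptions: the nbuf type and all nb_*() names are
hypothetical, memory copies stand in for recv() and send(), wrapping is not
handled, forward() also decrements head (which the list does not spell out but
which keeps head + tail constant), and realign() is only handled when no output
is pending.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NB_SIZE 32

/* Hypothetical layout following the proposal; names are illustrative. */
struct nbuf {
    char  *origin;        /* first byte not yet scheduled for forwarding */
    size_t head;          /* input bytes stored at and after origin */
    size_t tail;          /* output bytes stored just before origin */
    char   data[NB_SIZE]; /* storage area */
};

static void nb_init(struct nbuf *b)
{
    b->origin = b->data;
    b->head = b->tail = 0;
}

/* Total data held; replaces the old buf->l. */
static size_t nb_len(const struct nbuf *b)
{
    return b->head + b->tail;
}

/* True when <more> extra input bytes would run past the storage end. */
static int nb_input_wraps(const struct nbuf *b, size_t more)
{
    return (size_t)(b->origin - b->data) + b->head + more > NB_SIZE;
}

/* Stand-in for recv(): append <len> bytes at origin + head. */
static size_t nb_put(struct nbuf *b, const void *src, size_t len)
{
    assert(!nb_input_wraps(b, len));
    memcpy(b->origin + b->head, src, len);
    b->head += len;
    return len;
}

/* Stand-in for send(): emit up to <max> bytes starting at origin - tail. */
static size_t nb_take(struct nbuf *b, void *dst, size_t max)
{
    size_t n = b->tail < max ? b->tail : max;

    memcpy(dst, b->origin - b->tail, n);
    b->tail -= n;
    return n;
}

/* Schedule <n> input bytes for sending by moving the origin over them. */
static void nb_forward(struct nbuf *b, size_t n)
{
    assert(n <= b->head);
    b->origin += n;
    b->tail += n;
    b->head -= n;         /* assumption: not listed in the text */
}

/* Realign pending input to the start of storage (no pending output only). */
static void nb_realign(struct nbuf *b)
{
    assert(b->tail == 0);
    memmove(b->data, b->origin, b->head);
    b->origin = b->data;
}
```

Note that no operation here touches any position stored outside the buffer:
anything expressed as an offset from origin remains valid after nb_realign().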

In general it looks like fewer pointers are manipulated for common operations,
and that maybe an additional wrapping test (hand-made modulo) will have to be
added to send() and recv() operations.
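
One possible shape for such a hand-made modulo is sketched below. The
wrap_off() and contig_len() helpers are hypothetical names, and the sketch
assumes offsets are measured from the start of the storage area and never
exceed twice its size.

```c
#include <assert.h>
#include <stddef.h>

/* Wrap an offset known to lie in [0, 2*size) back into [0, size). */
static size_t wrap_off(size_t off, size_t size)
{
    assert(size > 0 && off < 2 * size);
    return off >= size ? off - size : off;
}

/* Longest contiguous run usable at <off> for a transfer of <len> bytes,
 * i.e. the part that fits before the end of the storage area.
 */
static size_t contig_len(size_t off, size_t len, size_t size)
{
    assert(off < size);
    return len < size - off ? len : size - off;
}
```

A send() or recv() that may wrap would then be issued in at most two steps:
first for contig_len() bytes at the current offset, then for the remainder at
the wrapped offset.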


3) Caveats
----------

The first caveat is that the elements to modify appear in a very large number
of places.