Blame - doc/internals/body-parsing.txt - haproxy

Willy Tarreau	c006dab	2014-04-16 21:10:49 +0200	[diff] [blame]	1	2014/04/16 - Pointer assignments during processing of the HTTP body
				2
				3	In HAProxy, a struct http_msg is a descriptor for an HTTP message, which stores
				4	the state of an HTTP parser at any given instant, relative to a buffer which
				5	contains part of the message being inspected.
				6
				7	Currently, an http_msg holds a few pointers and offsets to some important
				8	locations in a message depending on the state the parser is in. Some of these
				9	pointers and offsets may move when data are inserted into or removed from the
				10	buffer, others won't move.
				11
				12	An important point is that the state of the parser only reflects what the
				13	parser is reading, and says nothing about what is being done with the message
				14	(e.g. forwarding).
				15
				16	For an HTTP message <msg> and a buffer <buf>, we have the following elements
				17	to work with :
				18
				19
				20	Buffer :
				21	--------
				22
				23	buf.size : the allocated size of the buffer. A message cannot be larger than
				24	this size. In general, a message will even be smaller because the
				25	size is almost always reduced by global.maxrewrite bytes.
				26
				27	buf.data : memory area containing the part of the message being worked on. This
				28	area is exactly <buf.size> bytes long. It should be seen as a sliding
				29	window over the message, but in terms of implementation, it's closer
				30	to a wrapping window. For ease of processing, new messages (requests
				31	or responses) are aligned to the beginning of the buffer so that they
				32	never wrap and common string processing functions can be used.
				33
				34	buf.p : memory pointer (char *) to the beginning of the buffer as the parser
				35	understands it. It commonly refers to the first character of an HTTP
				36	request or response, but during forwarding, it can point to other
				37	locations. This pointer always points to a location in <buf.data>.
				38
				39	buf.i : number of bytes after <buf.p> that are available in the buffer. If
				40	<buf.p + buf.i> exceeds <buf.data + buf.size>, then the pending data
				41	wrap at the end of the buffer and continue at <buf.data>.
				42
				43	buf.o : number of bytes already processed before <buf.p> that are pending
				44	for departure. These bytes may leave at any instant once a connection
				45	is established. They may wrap below <buf.data>, in which case they
				46	start near the end of the area, before <buf.data + buf.size>.
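
The wrapping rules for buf.i and buf.o can be illustrated with a minimal sketch.
The struct below is a hypothetical, simplified stand-in that only reuses the
field names from this section; it is not HAProxy's actual buffer definition, and
the helper names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified buffer: field names follow the text above, the
 * layout is an assumption and not HAProxy's real struct. */
struct wrap_buf {
    char  *data;  /* start of the storage area */
    size_t size;  /* allocated size of <data> */
    char  *p;     /* start of input data, always inside <data> */
    size_t i;     /* input bytes available from <p> */
    size_t o;     /* output bytes pending before <p> */
};

/* Number of input bytes readable from <p> before the data wrap to <data>. */
static size_t wrap_contig_input(const struct wrap_buf *b)
{
    size_t to_end = (size_t)(b->data + b->size - b->p);

    assert(b->i + b->o <= b->size);
    return b->i < to_end ? b->i : to_end;
}

/* Address of the first pending output byte. When <o> reaches below <data>,
 * the output starts near the end of the area instead. */
static char *wrap_output_start(const struct wrap_buf *b)
{
    size_t off = (size_t)(b->p - b->data);

    if (b->o <= off)
        return b->p - b->o;
    return b->data + b->size - (b->o - off);
}
```

With a 16-byte area, <buf.p> at offset 2 and buf.o = 5, the output starts at
offset 13, that is 3 bytes before the end of the area.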
				47
				48	It's common to call the part between buf.p and buf.p+buf.i the input buffer, and
				49	the part between buf.p-buf.o and buf.p the output buffer. This design permits
				50	efficient forwarding without copies. As a result, forwarding one byte from the
				51	input buffer to the output buffer only consists in :
				52	- incrementing buf.p
				53	- incrementing buf.o
				54	- decrementing buf.i
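
The three steps above generalize to forwarding <n> bytes at once. The following
is only a sketch under assumptions: the struct and the function name are
hypothetical, and the only thing added to the three steps is the wrapping of
<buf.p> inside the storage area.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified buffer reusing the field names of this document. */
struct fwd_buf {
    char  *data;  /* start of the storage area */
    size_t size;  /* allocated size of <data> */
    char  *p;     /* boundary between output and input */
    size_t i;     /* input bytes after <p> */
    size_t o;     /* output bytes before <p> */
};

/* Move <n> bytes from the input part to the output part without copying. */
static void fwd_buf_forward(struct fwd_buf *b, size_t n)
{
    size_t off = (size_t)(b->p - b->data);

    assert(n <= b->i);                     /* cannot forward more than is present */
    b->p  = b->data + (off + n) % b->size; /* advance buf.p, wrapping if needed */
    b->o += n;                             /* these bytes are now pending output */
    b->i -= n;                             /* and are no longer input */
}
```

No byte is moved in memory: only the boundary between the two parts changes.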
				55
				56
				57	Message :
				58	---------
				59	Unless stated otherwise, all values are relative to <buf.p>, and always lie
				60	between 0 and <buf.i>. These values are relative offsets and do not need to
				61	take wrapping into account: they are used as if the buffer were an infinite
				62	sliding window. The buffer management functions handle the wrapping
				63	automatically.
				64
				65	msg.next : points to the next byte to inspect. This offset is automatically
				66	adjusted when inserting/removing some headers. In data states, it is
				67	automatically adjusted to the number of bytes already inspected.
				68
				69	msg.sov : start of value. First character of the header's value in the header
				70	states, start of the body in the data states. Strictly positive
				71	values indicate that headers were not forwarded yet (<buf.p> is
				72	before the start of the body), and null or negative values are seen
				73	after headers are forwarded (<buf.p> is at or past the start of the
				74	body). The value stops changing when data start to leave the buffer
				75	(in order to avoid integer overflows). So the maximum possible range
				76	is -<buf.size> to +<buf.size>. This offset is automatically adjusted
				77	when inserting or removing some headers. It is useful to rewind the
				78	request buffer to the beginning of the body at any phase. The
				79	response buffer does not really use it since it is immediately
				80	forwarded to the client.
				81
				82	msg.sol : start of line. Points to the beginning of the current header line
				83	while parsing headers. It is cleared to zero in the BODY state,
				84	and contains exactly the number of bytes comprising the preceding
				85	chunk size in the DATA state (which can be zero), so that the sum of
				86	msg.sov + msg.sol always points to the beginning of data for all
				87	states starting with DATA. For chunked encoded messages, this sum
				88	always corresponds to the beginning of the current chunk of data as
				89	it appears in the buffer, or to be more precise, it corresponds to
				90	the first of the remaining bytes of chunked data to be inspected. In
				91	TRAILERS state, it contains the length of the last parsed part of
				92	the trailer headers.
				93
				94	msg.eoh : end of headers. Points to the CRLF (or LF) preceding the body and
				95	marking the end of headers. It is where new headers are appended.
				96	This offset is automatically adjusted when inserting/removing some
				97	headers. It always contains the size of the headers excluding the
				98	trailing CRLF even after headers have been forwarded.
				99
				100	msg.eol : end of line. Points to the CRLF or LF of the current header line
				101	being inspected during the various header states. In data states, it
				102	holds the trailing CRLF length (1 or 2) so that msg.eoh + msg.eol
				103	always equals the exact header length. It is not affected during data
				104	states nor by forwarding.
				105
				106	The beginning of the message headers can always be found this way even after
				107	headers or data have been forwarded, provided that everything is still present
				108	in the buffer :
				109
				110	headers = buf.p + msg->sov - msg->eoh - msg->eol
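
As a worked illustration of this formula, the sketch below computes the offset
of the headers relative to <buf.p>. The struct and function names are
hypothetical; only the three field names come from this document. The result is
a signed offset, and turning it into a pointer still requires the usual
wrapping handling.

```c
#include <assert.h>

/* Hypothetical subset of the message descriptor: only the three offsets
 * used by the formula above. */
struct hdr_msg {
    int sov;  /* start of body relative to buf.p, may be negative */
    int eoh;  /* size of the headers, excluding the final CRLF or LF */
    int eol;  /* length of that final CRLF (2) or LF (1) */
};

/* Offset of the first header byte relative to buf.p: zero before anything
 * was forwarded, negative once buf.p has moved past the headers. */
static int hdr_msg_headers_offset(const struct hdr_msg *m)
{
    assert(m->eol == 1 || m->eol == 2);
    return m->sov - m->eoh - m->eol;
}
```

For the request "GET / HTTP/1.1\r\nHost: a\r\n\r\n", eoh is 25 and eol is 2.
Before forwarding, sov is 27 and the offset is 0. Once the headers and 5 body
bytes have been forwarded, sov is -5 and the offset is -32.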
				111
				112
				113	Message length :
				114	----------------
				115	msg.chunk_len : number of bytes of the current chunk or total message body
				116	remaining to be inspected after msg.next. It is automatically
				117	incremented when parsing a chunk size, and decremented as data
				118	are forwarded.
				119
				120	msg.body_len : total message body length, for logging. Equals Content-Length
				121	when used, otherwise is the sum of all correctly parsed chunks.
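
The bookkeeping of these two counters for a chunked body can be sketched as
follows. This is an illustration only: the struct and function names are
hypothetical, and updating body_len at the moment a chunk size is parsed is an
assumption made for this sketch, not something this document specifies.

```c
#include <assert.h>

/* Hypothetical subset of the message descriptor holding the two lengths. */
struct len_msg {
    unsigned long long chunk_len;  /* bytes of the current chunk still to go */
    unsigned long long body_len;   /* total body length seen so far */
};

/* A chunk size line was parsed successfully: account for the new chunk. */
static void len_msg_chunk_parsed(struct len_msg *m, unsigned long long size)
{
    m->chunk_len += size;
    m->body_len  += size;  /* assumption: counted at parse time */
}

/* Up to <n> bytes of the current chunk were forwarded. Returns the number
 * of bytes actually accounted for, never more than chunk_len. */
static unsigned long long len_msg_forwarded(struct len_msg *m, unsigned long long n)
{
    if (n > m->chunk_len)
        n = m->chunk_len;
    m->chunk_len -= n;
    return n;
}
```

After a 5-byte chunk is parsed and fully forwarded, then a 3-byte chunk is
parsed and 2 bytes are forwarded, chunk_len is 1 and body_len is 8.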
				122
				123
				124	Message state :
				125	---------------
				126	msg.msg_state contains the current parser state, one of HTTP_MSG_*. The state
				127	indicates what byte is expected at msg->next.
				128
				129	HTTP_MSG_BODY : all headers have been parsed, parsing of body has not
				130	started yet.
				131
				132	HTTP_MSG_100_SENT : parsing of body has started. If a 100-Continue was needed
				133	it has already been sent.
				134
				135	HTTP_MSG_DATA : some bytes are remaining for either the whole body when
				136	the message size is determined by Content-Length, or for
				137	the current chunk in chunked-encoded mode.
				138
				139	HTTP_MSG_CHUNK_CRLF : msg->next points to the CRLF after the current data chunk.
				140
				141	HTTP_MSG_TRAILERS : msg->next points to the beginning of a possibly empty
				142	trailer line after the final empty chunk.
				143
				144	HTTP_MSG_DONE : all the Content-Length data has been inspected, or the
				145	final CRLF after trailers has been met.
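
For a body whose size is given by Content-Length, the states above are visited
in a simple sequence. The sketch below shows that sequence only. The enum is a
hypothetical mirror of the states listed in this section (the real HTTP_MSG_*
enum has more members and other values), and the chunked path is deliberately
left out because it involves states not described here.

```c
/* Hypothetical mirror of the states listed above, for illustration only. */
enum sk_state {
    SK_MSG_BODY,
    SK_MSG_100_SENT,
    SK_MSG_DATA,
    SK_MSG_CHUNK_CRLF,
    SK_MSG_TRAILERS,
    SK_MSG_DONE
};

/* One step of the sequence for a Content-Length body. <remaining> plays the
 * role of msg.chunk_len: the body bytes still to be inspected. */
static enum sk_state sk_step_content_length(enum sk_state st,
                                            unsigned long long remaining)
{
    switch (st) {
    case SK_MSG_BODY:     return SK_MSG_100_SENT;  /* body parsing starts */
    case SK_MSG_100_SENT: return SK_MSG_DATA;      /* body bytes expected next */
    case SK_MSG_DATA:     return remaining ? SK_MSG_DATA : SK_MSG_DONE;
    default:              return st;               /* not covered by this sketch */
    }
}
```

The state stays in DATA as long as body bytes remain, and only moves to DONE
once the remaining length has dropped to zero.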
				146
				147
				148	Message forwarding :
				149	--------------------
				150	Forwarding part of a message consists in advancing buf.p up to the point where
				151	it points to the byte following the last one to be forwarded. This can be done
				152	inline if enough bytes are present in the buffer, or in multiple steps if more
				153	buffers need to be forwarded (possibly including splicing). Thus by definition,
				154	after a block has been scheduled for being forwarded, msg->next and msg->sov
				155	must be reset.
				156
				157	The communication channel between the producer and the consumer holds a counter
				158	of extra bytes remaining to be forwarded directly without consulting analysers,
				159	after buf.p. This counter is called to_forward. It commonly holds the advertised
				160	chunk length or content-length that does not fit in the buffer. For example, if
				161	2000 bytes are to be forwarded, and 10 bytes are present after buf.p as reported
				162	by buf.i, then both buf.o and buf.p will advance by 10, buf.i will drop to zero,
				163	and to_forward will be set to 1990 so that in total, 2000 bytes will be forwarded.
				164	At the end of the forwarding, buf.p will point to the first byte to be inspected
				165	after the 2000 forwarded bytes.
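
The 2000-byte example can be reproduced with the sketch below. It is a
simplified model under assumptions: the struct and function names are
hypothetical, <p_off> stands in for buf.p as an unwrapped offset, and buffer
size limits, wrapping and the departure of output data are all ignored.

```c
#include <stddef.h>

/* Hypothetical, simplified channel state for illustration only. */
struct chan_sketch {
    size_t p_off;                   /* position of buf.p, unwrapped */
    size_t i;                       /* input bytes after buf.p */
    size_t o;                       /* output bytes before buf.p */
    unsigned long long to_forward;  /* bytes to forward without analysis */
};

/* Schedule <bytes> for forwarding: take what is already in the buffer and
 * leave the rest in to_forward. */
static void chan_schedule_forward(struct chan_sketch *c, unsigned long long bytes)
{
    size_t now = (bytes < c->i) ? (size_t)bytes : c->i;

    c->p_off += now;
    c->o     += now;
    c->i     -= now;
    c->to_forward += bytes - now;
}

/* <n> new bytes arrive: those covered by to_forward go straight to the
 * output part, the remainder becomes input to be inspected. */
static void chan_receive(struct chan_sketch *c, size_t n)
{
    size_t direct = (c->to_forward < n) ? (size_t)c->to_forward : n;

    c->p_off += direct;
    c->o     += direct;
    c->to_forward -= direct;
    c->i     += n - direct;
}
```

Starting with buf.i = 10 and scheduling 2000 bytes leaves to_forward at 1990.
If 1995 bytes then arrive, 1990 of them are forwarded directly and the last 5
remain as input, with buf.p located right after the 2000 forwarded bytes.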