2014/04/16 - Pointer assignments during processing of the HTTP body

In HAProxy, a struct http_msg is a descriptor for an HTTP message, which stores
the state of an HTTP parser at any given instant, relative to a buffer which
contains part of the message being inspected.

Currently, an http_msg holds a few pointers and offsets to some important
locations in a message depending on the state the parser is in. Some of these
pointers and offsets may move when data are inserted into or removed from the
buffer, while others won't move.

An important point is that the state of the parser only reflects what the
parser is reading, and not at all what is being done with the message (e.g.
forwarding).

For an HTTP message <msg> and a buffer <buf>, we have the following elements
to work with :
18

Buffer :
--------

buf.size : the allocated size of the buffer. A message cannot be larger than
           this size. In practice, a message will usually be even smaller
           because the usable size is almost always reduced by
           global.maxrewrite bytes.

buf.data : memory area containing the part of the message being worked on.
           This area is exactly <buf.size> bytes long. It should be seen as a
           sliding window over the message, but in terms of implementation,
           it's closer to a wrapping window. For ease of processing, new
           messages (requests or responses) are aligned to the beginning of
           the buffer so that they never wrap and common string processing
           functions can be used.

buf.p : memory pointer (char *) to the beginning of the buffer as the parser
        understands it. It commonly refers to the first character of an HTTP
        request or response, but during forwarding, it can point to other
        locations. This pointer always points to a location in <buf.data>.

buf.i : number of bytes after <buf.p> that are available in the buffer. If
        <buf.p + buf.i> exceeds <buf.data + buf.size>, then the pending data
        wrap at the end of the buffer and continue at <buf.data>.

buf.o : number of bytes already processed before <buf.p> that are pending
        for departure. These bytes may leave at any instant once a connection
        is established. They may also wrap: if <buf.p - buf.o> falls before
        <buf.data>, the output data start before <buf.data + buf.size> and
        continue at <buf.data>.

It's common to call the part between buf.p and buf.p+buf.i the input buffer, and
the part between buf.p-buf.o and buf.p the output buffer. This design permits
efficient forwarding without copies. As a result, forwarding one byte from the
input buffer to the output buffer only consists of :
  - incrementing buf.p
  - incrementing buf.o
  - decrementing buf.i


Message :
---------
Unless stated otherwise, all values are relative to <buf.p>, and always lie
between 0 and <buf.i>. These values are relative offsets and they do not need to
take wrapping into account: they are used as if the buffer were an infinite
sliding window. The buffer management functions handle the wrapping
automatically.

msg.next : points to the next byte to inspect. This offset is automatically
           adjusted when inserting/removing some headers. In data states, it
           is automatically adjusted to the number of bytes already inspected.

msg.sov : start of value. First character of the header's value in the header
          states, start of the body in the data states. Strictly positive
          values indicate that headers were not forwarded yet (<buf.p> is
          before the start of the body), and zero or negative values are seen
          after headers are forwarded (<buf.p> is at or past the start of the
          body). The value stops changing when data start to leave the buffer
          (in order to avoid integer overflows). So the maximum possible range
          is -<buf.size> to +<buf.size>. This offset is automatically adjusted
          when inserting or removing some headers. It is useful to rewind the
          request buffer to the beginning of the body at any phase. The
          response buffer does not really use it since it is immediately
          forwarded to the client.

msg.sol : start of line. Points to the beginning of the current header line
          while parsing headers. It is cleared to zero in the BODY state,
          and contains exactly the number of bytes comprising the preceding
          chunk size in the DATA state (which can be zero), so that the sum
          msg.sov + msg.sol always points to the beginning of data for all
          states starting with DATA. For chunk-encoded messages, this sum
          always corresponds to the beginning of the current chunk of data as
          it appears in the buffer, or to be more precise, it corresponds to
          the first of the remaining bytes of chunked data to be inspected. In
          the TRAILERS state, it contains the length of the last parsed part
          of the trailer headers.

msg.eoh : end of headers. Points to the CRLF (or LF) preceding the body and
          marking the end of headers. It is where new headers are appended.
          This offset is automatically adjusted when inserting/removing some
          headers. It always contains the size of the headers excluding the
          trailing CRLF even after headers have been forwarded.

msg.eol : end of line. Points to the CRLF or LF of the current header line
          being inspected during the various header states. In data states,
          it holds the trailing CRLF length (1 or 2) so that msg.eoh + msg.eol
          always equals the exact header length. It is affected neither
          during data states nor by forwarding.

The beginning of the message headers can always be found this way even after
headers or data have been forwarded, provided that everything is still present
in the buffer :

    headers = buf.p + msg->sov - msg->eoh - msg->eol


Message length :
----------------
msg.chunk_len : number of bytes of the current chunk or total message body
                remaining to be inspected after msg.next. It is automatically
                incremented when parsing a chunk size, and decremented as data
                are forwarded.

msg.body_len : total message body length, for logging. Equals Content-Length
               when that header is used, otherwise it is the sum of all
               correctly parsed chunks.


Message state :
---------------
msg.msg_state contains the current parser state, one of HTTP_MSG_*. The state
indicates what byte is expected at msg->next.

HTTP_MSG_BODY       : all headers have been parsed, parsing of the body has
                      not started yet.

HTTP_MSG_100_SENT   : parsing of the body has started. If a 100-Continue was
                      needed, it has already been sent.

HTTP_MSG_DATA       : some bytes remain for either the whole body when the
                      message size is determined by Content-Length, or for
                      the current chunk in chunked-encoded mode.

HTTP_MSG_CHUNK_CRLF : msg->next points to the CRLF after the current data
                      chunk.

HTTP_MSG_TRAILERS   : msg->next points to the beginning of a possibly empty
                      trailer line after the final empty chunk.

HTTP_MSG_DONE       : all the Content-Length data has been inspected, or the
                      final CRLF after trailers has been met.


Message forwarding :
--------------------
Forwarding part of a message consists of advancing buf.p up to the point where
it points to the byte following the last one to be forwarded. This can be done
inline if enough bytes are present in the buffer, or in multiple steps if more
buffers need to be forwarded (possibly including splicing). Thus by definition,
after a block has been scheduled for forwarding, msg->next and msg->sov must be
reset.

The communication channel between the producer and the consumer holds a counter
of extra bytes remaining to be forwarded directly without consulting analysers,
after buf.p. This counter is called to_forward. It commonly holds the advertised
chunk length or content-length that does not fit in the buffer. For example, if
2000 bytes are to be forwarded, and 10 bytes are present after buf.p as reported
by buf.i, then both buf.o and buf.p will advance by 10, buf.i will be reset to
zero, and to_forward will be set to 1990 so that in total, 2000 bytes will be
forwarded. At the end of the forwarding, buf.p will point to the first byte to
be inspected after the 2000 forwarded bytes.