Willy Tarreau | c006dab | 2014-04-16 21:10:49 +0200 | [diff] [blame] | 1 | 2014/04/16 - Pointer assignments during processing of the HTTP body |
| 2 | |
| 3 | In HAProxy, a struct http_msg is a descriptor for an HTTP message, which stores |
| 4 | the state of an HTTP parser at any given instant, relative to a buffer which |
| 5 | contains part of the message being inspected. |
| 6 | |
| 7 | Currently, an http_msg holds a few pointers and offsets to some important |
| 8 | locations in a message depending on the state the parser is in. Some of these |
| 9 | pointers and offsets may move when data are inserted into or removed from the |
| 10 | buffer, others won't move. |
| 11 | |
| 12 | An important point is that the state of the parser only reflects what the |
| 13 | parser is reading, and not at all what is being done with the message (e.g. |
| 14 | forwarding). |
| 15 | |
| 16 | For an HTTP message <msg> and a buffer <buf>, we have the following elements |
| 17 | to work with : |
| 18 | |
| 19 | |
| 20 | Buffer : |
| 21 | -------- |
| 22 | |
| 23 | buf.size : the allocated size of the buffer. A message cannot be larger than |
| 24 | this size. In general, a message will even be smaller because the |
| 25 | size is almost always reduced by global.maxrewrite bytes. |
| 26 | |
| 27 | buf.data : memory area containing the part of the message being worked on. This |
| 28 | area is exactly <buf.size> bytes long. It should be seen as a sliding |
| 29 | window over the message, but in terms of implementation, it's closer |
| 30 | to a wrapping window. For ease of processing, new messages (requests |
| 31 | or responses) are aligned to the beginning of the buffer so that they |
| 32 | never wrap and common string processing functions can be used. |
| 33 | |
| 34 | buf.p : memory pointer (char *) to the beginning of the buffer as the parser |
| 35 | understands it. It commonly refers to the first character of an HTTP |
| 36 | request or response, but during forwarding, it can point to other |
| 37 | locations. This pointer always points to a location in <buf.data>. |
| 38 | |
| 39 | buf.i : number of bytes after <buf.p> that are available in the buffer. If |
| 40 | <buf.p + buf.i> exceeds <buf.data + buf.size>, then the pending data |
| 41 | wrap at the end of the buffer and continue at <buf.data>. |
| 42 | |
| 43 | buf.o : number of bytes already processed before <buf.p> that are pending |
| 44 | for departure. These bytes may leave at any instant once a connection |
| 45 | is established. They may wrap too: when <buf.p - buf.o> falls before |
| 46 | <buf.data>, they start near the end, before <buf.data + buf.size>. |
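
The wrapping rules described for buf.i and buf.o can be sketched as a single
helper that turns a signed offset relative to <buf.p> into an address inside
the storage area. This is only an illustration: sketch_ptr_at() and its
parameters are hypothetical names, not HAProxy functions, and the sketch
assumes the offset never exceeds the buffer size in either direction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of HAProxy): returns the address of the
 * byte located <ofs> bytes away from <p> in a wrapping storage area of
 * <size> bytes starting at <data>. Positive offsets walk into the input
 * part, negative offsets walk back into the output part.
 */
static char *sketch_ptr_at(char *data, size_t size, char *p, ptrdiff_t ofs)
{
    ptrdiff_t pos;

    /* assumption of this sketch: at most one wrap in either direction */
    assert(ofs <= (ptrdiff_t)size && -ofs <= (ptrdiff_t)size);

    pos = (p - data) + ofs;
    if (pos < 0)
        pos += (ptrdiff_t)size;      /* wrapped before <data> */
    else if (pos >= (ptrdiff_t)size)
        pos -= (ptrdiff_t)size;      /* wrapped past <data + size> */
    return data + pos;
}
```

With such a helper, the end of the input data would be found at offset +buf.i
and the start of the output data at offset -buf.o.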
| 47 | |
| 48 | It's common to call the part between buf.p and buf.p+buf.i the input buffer, and |
| 49 | the part between buf.p-buf.o and buf.p the output buffer. This design permits |
| 50 | efficient forwarding without copies. As a result, forwarding one byte from the |
| 51 | input buffer to the output buffer consists only of : |
| 52 | - incrementing buf.p |
| 53 | - incrementing buf.o |
| 54 | - decrementing buf.i |
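
The three steps above can be sketched in C as follows. The structure is a
simplified, hypothetical stand-in that only reuses the field names from this
document (it is not HAProxy's real buffer definition), and sketch_forward()
is an illustrative name. The sketch generalizes the operation to <n> bytes
and wraps <p> back to <data> when it reaches the end of the storage area.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified buffer using the field names from the text. */
struct sketch_buf {
    char  *p;     /* beginning of the input as the parser sees it */
    size_t size;  /* allocated size of <data> */
    size_t i;     /* input bytes available after <p> */
    size_t o;     /* output bytes pending before <p> */
    char  *data;  /* storage area, <size> bytes long */
};

/* Moves <n> bytes from the input part to the output part. No byte is
 * copied, only the pointer and the two counters change.
 */
static void sketch_forward(struct sketch_buf *b, size_t n)
{
    size_t ofs;

    assert(n <= b->i);  /* cannot forward more than what is present */

    ofs = ((size_t)(b->p - b->data) + n) % b->size;
    b->p = b->data + ofs;   /* "incrementing buf.p", with wrapping */
    b->o += n;              /* "incrementing buf.o" */
    b->i -= n;              /* "decrementing buf.i" */
}
```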
| 55 | |
| 56 | |
| 57 | Message : |
| 58 | --------- |
| 59 | Unless stated otherwise, all values are relative to <buf.p>, and always lie |
| 60 | between 0 and <buf.i>. These values are relative offsets and they do not |
| 61 | need to take wrapping into account: they are used as if the buffer were an |
| 62 | infinite-length sliding window. The buffer management functions handle the |
| 63 | wrapping automatically. |
| 64 | |
| 65 | msg.next : points to the next byte to inspect. This offset is automatically |
| 66 | adjusted when inserting/removing some headers. In data states, it is |
| 67 | automatically adjusted to the number of bytes already inspected. |
| 68 | |
| 69 | msg.sov : start of value. First character of the header's value in the header |
| 70 | states, start of the body in the data states. Strictly positive |
| 71 | values indicate that headers were not forwarded yet (<buf.p> is |
| 72 | before the start of the body), and null or negative values are seen |
| 73 | after headers are forwarded (<buf.p> is at or past the start of the |
| 74 | body). The value stops changing when data start to leave the buffer |
| 75 | (in order to avoid integer overflows). So the maximum possible range |
| 76 | is -<buf.size> to +<buf.size>. This offset is automatically adjusted |
| 77 | when inserting or removing some headers. It is useful to rewind the |
| 78 | request buffer to the beginning of the body at any phase. The |
| 79 | response buffer does not really use it since it is immediately |
| 80 | forwarded to the client. |
| 81 | |
| 82 | msg.sol : start of line. Points to the beginning of the current header line |
| 83 | while parsing headers. It is cleared to zero in the BODY state, |
| 84 | and contains exactly the number of bytes comprising the preceding |
| 85 | chunk size in the DATA state (which can be zero), so that the sum of |
| 86 | msg.sov + msg.sol always points to the beginning of data for all |
| 87 | states starting with DATA. For chunked encoded messages, this sum |
| 88 | always corresponds to the beginning of the current chunk of data as |
| 89 | it appears in the buffer, or to be more precise, it corresponds to |
| 90 | the first of the remaining bytes of chunked data to be inspected. |
| 91 | |
| 92 | msg.eoh : end of headers. Points to the CRLF (or LF) preceding the body and |
| 93 | marking the end of headers. It is where new headers are appended. |
| 94 | This offset is automatically adjusted when inserting/removing some |
| 95 | headers. It always contains the size of the headers excluding the |
| 96 | trailing CRLF even after headers have been forwarded. |
| 97 | |
| 98 | msg.eol : end of line. Points to the CRLF or LF of the current header line |
| 99 | being inspected during the various header states. In data states, it |
| 100 | holds the trailing CRLF length (1 or 2) so that msg.eoh + msg.eol |
| 101 | always equals the exact header length. It is not affected during data |
| 102 | states nor by forwarding. |
| 103 | |
| 104 | The beginning of the message headers can always be found this way even after |
| 105 | headers or data have been forwarded, provided that everything is still present |
| 106 | in the buffer : |
| 107 | |
| 108 | headers = buf.p + msg->sov - msg->eoh - msg->eol |
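
As a worked illustration of this formula, the sketch below computes the
offset of the first header byte relative to <buf.p>. The structure and the
function name are hypothetical, and only the three fields involved are
modelled. The result is a plain offset: applying it to a real buffer would
still require the usual wrapping handling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of the message descriptor, data states only. */
struct sketch_msg {
    int sov;  /* start of body, relative to buf.p (may be negative) */
    int eoh;  /* size of the headers excluding the trailing CRLF */
    int eol;  /* length of the trailing CRLF or LF (1 or 2) */
};

/* Offset of the first header byte relative to buf.p, following
 * headers = buf.p + msg->sov - msg->eoh - msg->eol.
 */
static ptrdiff_t sketch_headers_offset(const struct sketch_msg *m)
{
    assert(m->eol == 1 || m->eol == 2);
    return (ptrdiff_t)m->sov - m->eoh - m->eol;
}
```

For example, with 100 bytes of headers followed by CRLF, sov is 102 before
anything is forwarded and the offset is 0 (the headers start at <buf.p>).
Once the headers and 50 bytes of body have been forwarded, sov is -50 and the
offset becomes -152, which means the headers start 152 bytes before <buf.p>.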
| 109 | |
| 110 | |
| 111 | Message length : |
| 112 | ---------------- |
| 113 | msg.chunk_len : number of bytes of the current chunk or total message body |
| 114 | remaining to be inspected after msg.next. It is automatically |
| 115 | incremented when parsing a chunk size, and decremented as data |
| 116 | are forwarded. |
| 117 | |
| 118 | msg.body_len : total message body length, for logging. Equals Content-Length |
| 119 | when used, otherwise is the sum of all correctly parsed chunks. |
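
The relation between the two counters for a chunked message can be sketched
as below. The structure and the function names are hypothetical, and updating
body_len at the moment a chunk size is parsed is an assumption of this
sketch; the text above only states that body_len ends up as the sum of all
correctly parsed chunks.

```c
#include <assert.h>

/* Hypothetical length counters, named after the fields described above. */
struct sketch_len {
    unsigned long long chunk_len;  /* bytes left in the current chunk */
    unsigned long long body_len;   /* total body length, for logging */
};

/* Called when a chunk size has been parsed successfully. */
static void sketch_chunk_parsed(struct sketch_len *l, unsigned long long size)
{
    l->chunk_len += size;
    l->body_len  += size;  /* assumption: accounted at parse time */
}

/* Called when <n> bytes of the current chunk have been forwarded. */
static void sketch_data_forwarded(struct sketch_len *l, unsigned long long n)
{
    assert(n <= l->chunk_len);
    l->chunk_len -= n;
}
```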
| 120 | |
| 121 | |
| 122 | Message state : |
| 123 | --------------- |
| 124 | msg.msg_state contains the current parser state, one of HTTP_MSG_*. The state |
| 125 | indicates what byte is expected at msg->next. |
| 126 | |
| 127 | HTTP_MSG_BODY : all headers have been parsed, parsing of body has not |
| 128 | started yet. |
| 129 | |
| 130 | HTTP_MSG_100_SENT : parsing of body has started. If a 100-Continue was needed |
| 131 | it has already been sent. |
| 132 | |
| 133 | HTTP_MSG_DATA : some bytes are remaining for either the whole body when |
| 134 | the message size is determined by Content-Length, or for |
| 135 | the current chunk in chunked-encoded mode. |
| 136 | |
| 137 | HTTP_MSG_CHUNK_CRLF : msg->next points to the CRLF after the current data chunk. |
| 138 | |
| 139 | HTTP_MSG_TRAILERS : msg->next points to the beginning of a possibly empty |
| 140 | trailer line after the final empty chunk. |
| 141 | |
| 142 | HTTP_MSG_DONE : all the Content-Length data has been inspected, or the |
| 143 | final CRLF after trailers has been met. |
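
One way to read the descriptions above is as a rule for what follows the
DATA state. The sketch below encodes that reading only: the SK_MSG_* names
and their values are hypothetical and merely mirror the states listed here,
and the function is not taken from the HAProxy sources.

```c
#include <assert.h>

/* Hypothetical mirror of the states listed above (not HAProxy's enum). */
enum sketch_state {
    SK_MSG_BODY,
    SK_MSG_100_SENT,
    SK_MSG_DATA,
    SK_MSG_CHUNK_CRLF,
    SK_MSG_TRAILERS,
    SK_MSG_DONE
};

/* Returns the state expected after inspecting data in the DATA state,
 * given the number of bytes still remaining in the current block.
 */
static enum sketch_state sketch_after_data(enum sketch_state cur, int chunked,
                                           unsigned long long chunk_len)
{
    assert(cur == SK_MSG_DATA);

    if (chunk_len > 0)
        return SK_MSG_DATA;        /* some bytes are still remaining */
    if (chunked)
        return SK_MSG_CHUNK_CRLF;  /* a CRLF follows each data chunk */
    return SK_MSG_DONE;            /* all Content-Length data was seen */
}
```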
| 144 | |
| 145 | |
| 146 | Message forwarding : |
| 147 | -------------------- |
| 148 | Forwarding part of a message consists of advancing buf.p up to the point where |
| 149 | it points to the byte following the last one to be forwarded. This can be done |
| 150 | inline if enough bytes are present in the buffer, or in multiple steps if more |
| 151 | buffers need to be forwarded (possibly including splicing). Thus by definition, |
| 152 | after a block has been scheduled for forwarding, msg->next and msg->sov |
| 153 | must be reset. |
| 154 | |
| 155 | The communication channel between the producer and the consumer holds a counter |
| 156 | of extra bytes remaining to be forwarded directly without consulting analysers, |
| 157 | after buf.p. This counter is called to_forward. It commonly holds the advertised |
| 158 | chunk length or content-length that does not fit in the buffer. For example, if |
| 159 | 2000 bytes are to be forwarded, and 10 bytes are present after buf.p as reported |
| 160 | by buf.i, then both buf.o and buf.p will advance by 10, buf.i will be reset, and |
| 161 | to_forward will be set to 1990 so that in total, 2000 bytes will be forwarded. |
| 162 | At the end of the forwarding, buf.p will point to the first byte to be inspected |
| 163 | after the 2000 forwarded bytes. |
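
The 2000-byte example can be replayed with the sketch below. The structure
is a hypothetical simplification in which buf.p is represented as an offset
inside the storage area, and sketch_schedule_forward() is an illustrative
name rather than an HAProxy function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified channel: a buffer plus the to_forward counter. */
struct sketch_chn {
    size_t size;                   /* buffer size */
    size_t p;                      /* buf.p as an offset within the buffer */
    size_t i;                      /* input bytes present after buf.p */
    size_t o;                      /* output bytes pending before buf.p */
    unsigned long long to_forward; /* bytes to forward once they arrive */
};

/* Schedules <bytes> for forwarding: what is already present moves to the
 * output part at once, the remainder is remembered in to_forward.
 */
static void sketch_schedule_forward(struct sketch_chn *c,
                                    unsigned long long bytes)
{
    size_t now = bytes < c->i ? (size_t)bytes : c->i;

    assert(c->i + c->o <= c->size);

    c->p = (c->p + now) % c->size;  /* buf.p advances, with wrapping */
    c->o += now;
    c->i -= now;
    c->to_forward += bytes - now;
}
```

Starting with i = 10 and o = 0, scheduling 2000 bytes leaves o = 10, i = 0 and
to_forward = 1990, which matches the figures given above.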