                -----------------------------------------------
                                    HTX API
                                  Version 1.0
                          ( Last update: 2019-06-20 )
                -----------------------------------------------
                          Author : Christopher Faulet
                      Contact : cfaulet at haproxy dot com

1. Background

Historically, HAProxy stored HTTP messages in a raw fashion in buffers, keeping
parsing information separately in a "struct http_msg" owned by the stream. It
was optimized for data transfer, but not so much for rewrites. It was also
HTTP/1 centered. While it was the only HTTP version supported, it was not a
problem. But with the rise of HTTP/2, it became hard to keep using this
representation.

In the early days of HTTP/2 support in HAProxy, H2 messages were converted into
H1. This was terribly inefficient because it required two parsing passes, a
first one in H2 and a second one in H1, with a conversion in the middle. And of
course, the same was also true in the opposite direction: outgoing H1 messages
had to be converted back to H2 to be sent. Even worse, because of the H2->H1
conversion, only client H2 connections were supported.

So, to address all these problems, we decided to replace the old raw
representation with a version-agnostic and self-structured internal HTTP
representation, the HTX. As an additional benefit, with this new representation,
message parsing and message processing are now separated, making all the HTTP
analysis simpler and cleaner. The parsing of HTTP messages is now handled by
the multiplexers (h1 or h2).


2. The HTX message

The HTX is a structure containing useful information about an HTTP message
followed by a contiguous array with some parts of the message. These parts are
called blocks. A block is composed of metadata (htx_blk) and an associated
payload. Blocks' metadata are stored starting from the end of the array while
their payloads are stored at the beginning. Blocks' metadata are often simply
called blocks. It is a misuse of language that simplifies explanations.

Internally, this structure is "hidden" in a buffer. This way, there are few
changes in the intermediate layers (stream-interface and channels). They still
manipulate buffers. Only the multiplexer and the stream have to know how data
are really stored. From the HTX perspective, a buffer is just a memory
area. When an HTX message is stored in a buffer, the buffer appears as full.

  * General view of an HTX message :


  buffer->area
       |
       |<------------ buffer->size == buffer->data ----------------------|
       |                                                                 |
       |     |<------------- Blocks array (htx->size) ------------------>|
       V     |                                                           |
       +-----+-----------------+-------------------------+---------------+
       | HTX |  PAYLOADS ==>   |                         |  <== HTX_BLKs |
       +-----+-----------------+-------------------------+---------------+
             |                 |                         |               |
             |<-payloads part->|<----- free space ------>|<-blocks part->|
                 (htx->data)


The blocks part remains linear and sorted. You may think of it as an array
with negative indexes. But, instead of using negative indexes, we use positive
positions to identify a block. This position is then converted to an address
relative to the beginning of the blocks array.

                   tail                                 head
                    |                                    |
                    V                                    V
           .....--+----+-----------------------+------+------+
                  | Bn |          ...          |  B1  |  B0  |
           .....--+----+-----------------------+------+------+
                  ^                            ^      ^
          Addr of the block   Addr of the block  Addr of the block
          at the position N   at the position 1  at the position 0
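
To make the conversion from a position to an address more concrete, here is a
minimal standalone sketch. It is not HAProxy's code: the toy_blk structure and
the toy_pos_to_addr() function are hypothetical names introduced for this
example, and it assumes 8-byte metadata stored backward from the end of the
blocks array, as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical block metadata, mirroring the two fields described in
 * this document (not the real "struct htx_blk" definition). */
struct toy_blk {
    uint32_t addr; /* payload address, relative to the blocks array */
    uint32_t info; /* type and payload length */
};

/* Returns the offset of the block at position <pos>, relative to the
 * beginning of a blocks array of <size> bytes. Position 0 uses the last
 * slot of the array, position 1 the slot just before it, and so on. */
static uint32_t toy_pos_to_addr(uint32_t size, uint32_t pos)
{
    /* the block must fit in the array */
    assert(((uint64_t)pos + 1) * sizeof(struct toy_blk) <= size);
    return size - (pos + 1) * (uint32_t)sizeof(struct toy_blk);
}
```

With a 16384-byte blocks array, position 0 maps to offset 16376 and position 1
to offset 16368.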


In the HTX structure, 3 "special" positions are stored:

  - tail  : Position of the newest inserted block
  - head  : Position of the oldest inserted block
  - first : Position of the first block to (re)start the analysis

The blocks part never wraps. If we have no space to allocate a new block and if
there is a hole at the beginning of the blocks part (so at the end of the blocks
array), we move back all blocks.


        tail           head                                tail           head
         |              |                                   |              |
         V              V                                   V              V
      ...+--------------+---------+    blocks  ...----------+--------------+
         | X== HTX_BLKS |         |    defrag               | <== HTX_BLKS |
      ...+--------------+---------+    =====>  ...----------+--------------+


The payloads part is a raw space that may wrap. You never access a block's
payload directly. Instead, you get a block to retrieve the address of its
payload.


          +------------------------( B0.addr )--------------------------+
          |    +-------------------( B1.addr )---------------------+    |
          |    |       +-----------( B2.addr )----------------+    |    |
          V    V       V                                      |    |    |
    +-----+----+-------+----+--------+-------------+-------+----+----+----+
    | HTX | P0 |  P1   | P2 | ...==> |             | <=... | B2 | B1 | B0 |
    +-----+----+-------+----+--------+-------------+-------+----+----+----+


Because the payloads part may wrap, there are 2 usable free spaces:

  - The free space in front of the blocks part. This one is used if and only
    if the other one has not been used yet.

  - The free space at the beginning of the message. Once this one is used, the
    other one is never used again, until a message defragmentation.


  * Linear payloads part :


         head_addr            end_addr      tail_addr
             |                    |             |
             V                    V             V
       +-----+--------------------+-------------+--------------------+-------...
       | HTX |                    |  PAYLOADS   |                    | HTX_BLKs
       +-----+--------------------+-------------+--------------------+-------...
             |<-- free space 2 -->|             |<-- free space 1 -->|
        (used if the other is too small)          (used in priority)


  * Wrapping payloads part :


                              head_addr  end_addr        tail_addr
                                   |        |                |
                                   V        V                V
       +-----+----+----------------+--------+----------------+-------+-------...
       | HTX |    | PAYLOADS part2 |        | PAYLOADS part1 |       | HTX_BLKs
       +-----+----+----------------+--------+----------------+-------+-------...
             |<-->|                |<------>|                |<----->|
            unusable               free space                unusable
           free space                                       free space


Finally, when the usable free space is not enough to store a new block, unusable
parts may be reclaimed with a full defragmentation. The payloads part is then
realigned at the beginning of the blocks array and the free space becomes
contiguous again.
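
The choice between the two usable free spaces can be sketched as follows. This
is a simplified model, not HAProxy's implementation: the toy_payloads structure,
its blks_start and wrapped fields and the toy_reserve_payload() function are
assumptions made for this example, and the room taken by the new block's
metadata is deliberately ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of the payloads part. All fields are
 * offsets relative to the beginning of the blocks array. */
struct toy_payloads {
    uint32_t head_addr;  /* start of the free space at the beginning */
    uint32_t end_addr;   /* end of the free space at the beginning */
    uint32_t tail_addr;  /* start of the free space in front of the blocks */
    uint32_t blks_start; /* where the blocks part begins */
    int wrapped;         /* set once the space at the beginning is used */
};

/* Reserves <sz> bytes of payload. Returns the payload address, or -1 if
 * no usable free space is large enough, meaning a defragmentation would
 * be required. */
static int64_t toy_reserve_payload(struct toy_payloads *p, uint32_t sz)
{
    uint32_t addr;

    assert(p->head_addr <= p->end_addr && p->tail_addr <= p->blks_start);

    if (!p->wrapped && sz <= p->blks_start - p->tail_addr) {
        /* free space 1: in front of the blocks part, used in priority */
        addr = p->tail_addr;
        p->tail_addr += sz;
        return addr;
    }
    if (sz <= p->end_addr - p->head_addr) {
        /* free space 2: at the beginning of the message. From now on,
         * the other one is no longer used. */
        addr = p->head_addr;
        p->head_addr += sz;
        p->wrapped = 1;
        return addr;
    }
    return -1;
}
```

In this model, a return value of -1 corresponds to the situation where only a
full defragmentation can make the unusable parts available again.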


3. The HTX blocks

An HTX block can be a start-line, a header, a body part or a trailer. For all
these types of block, a payload is attached to the block. It can also be a
marker: the end-of-headers, end-of-trailers or end-of-message. These markers
have no payload, but each of them counts for one byte. It is important not to
skip them when data are forwarded.

As already said, a block is composed of metadata and a payload. Metadata are
stored in the blocks part and are composed of 2 fields :

  - info : It is a 32-bit field containing the block's type on 4 bits followed
           by the payload length. See below for details.

  - addr : The payload's address, if any, relative to the beginning of the
           array used to store part of the HTTP message itself.


  * Block's info representation :

        0b 0000 0000 0000 0000 0000 0000 0000 0000
           ---- ------------------------ ---------
           type     value (1 MB max)     name length (header/trailer - 256B max)
                ----------------------------------
                     data length (256 MB max)
                     (body, method, path, version, status, reason)


Supported types are :

  - 0000 (0)  : The request start-line
  - 0001 (1)  : The response start-line
  - 0010 (2)  : A header block
  - 0011 (3)  : The end-of-headers marker
  - 0100 (4)  : A data block
  - 0101 (5)  : A trailer block
  - 0110 (6)  : The end-of-trailers marker
  - 0111 (7)  : The end-of-message marker
  - 1111 (15) : An unused block

Other types are unused for now and reserved for future extensions.
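
The layout of the info field can be sketched with a few shifts and masks. The
toy_info_*() helpers below are hypothetical names used for illustration only.
They simply follow the bit layout shown above: 4 bits of type, then either
20 bits of value length and 8 bits of name length, or 28 bits of data length.

```c
#include <assert.h>
#include <stdint.h>

/* Builds the info field of a header or trailer block: 4 bits of type,
 * 20 bits of value length, 8 bits of name length. */
static uint32_t toy_info_kv(uint32_t type, uint32_t name_len, uint32_t value_len)
{
    assert(type <= 0xf && name_len <= 0xff && value_len <= 0xfffff);
    return (type << 28) | (value_len << 8) | name_len;
}

/* Builds the info field of any other block: 4 bits of type followed by
 * 28 bits of data length. */
static uint32_t toy_info_data(uint32_t type, uint32_t data_len)
{
    assert(type <= 0xf && data_len <= 0xfffffff);
    return (type << 28) | data_len;
}

/* Accessors: each one extracts a single part of the info field. */
static uint32_t toy_info_type(uint32_t info)      { return info >> 28; }
static uint32_t toy_info_name_len(uint32_t info)  { return info & 0xff; }
static uint32_t toy_info_value_len(uint32_t info) { return (info >> 8) & 0xfffff; }
static uint32_t toy_info_data_len(uint32_t info)  { return info & 0xfffffff; }
```

For instance, a header block (type 2) with a 4-byte name and an 11-byte value
gets the info 0x20000B04.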

An HTX message is typically composed of the following blocks, in this order :

  - a start-line
  - zero or more header blocks
  - an end-of-headers marker
  - zero or more data blocks
  - zero or more trailer blocks (optional)
  - an end-of-trailers marker (optional but always set if there is at least
    one trailer block)
  - an end-of-message marker.

Only one HTTP request at a time can be stored in an HTX message. For HTTP
responses, it is more complicated. Only one "final" response can be stored in
an HTX message. It is a response with status-code 101, or greater than or equal
to 200. But it may be preceded by several 1xx informational responses. Such
responses are part of the same HTX message, so there is no end-of-message marker
for them.


3.1. The start-line

Every HTX message starts with a start-line. Its payload is a "struct htx_sl". In
addition to the parts of the HTTP start-line, this structure contains some
information about the represented HTTP message, mainly in the form of flags
(HTX_SL_F_*). For instance, if an HTTP message contains the header
"content-length", then the flag HTX_SL_F_CLEN is set.

Each HTTP message has its own start-line. So an HTX request has one and only one
start-line because it must contain only one HTTP request at a time. But an HTX
response may have more than one start-line if the final HTTP response is
preceded by some 1xx informational responses.

In HTTP/2, there is no start-line. So the H2 multiplexer must create one when it
converts an H2 message to HTX :

  - For the request, it uses the pseudo-headers ":method", ":path" or
    ":authority" depending on the method, and the hardcoded version "HTTP/2.0".

  - For the response, it uses the hardcoded version "HTTP/2.0", the
    pseudo-header ":status" and an empty reason.


3.2. The headers and trailers

HTX headers and trailers are quite similar. Different types are used to simplify
headers processing. But from the HTX point of view, there is no real difference,
except their position in the HTX message. The header blocks always follow an HTX
start-line while trailer blocks come after the data. If there is no data, they
follow the end-of-headers marker.

Headers and trailers are the only blocks containing a key/value payload. The
corresponding end-of marker must always be placed after each group to mark, as
its name suggests, the end.

In HTTP/1, trailers are only present on chunked messages. But chunked messages
do not always have trailers. In this case, the end-of-trailers block may or may
not be present. Multiplexers must be able to handle both situations. In HTTP/2,
trailers are only present if a HEADERS frame is sent after DATA frames.


3.3. The data

The payload body of an HTTP message is stored as DATA blocks in the HTX
message. For HTTP/1 messages, it is the message body without the chunk
formatting, if any. For HTTP/2, it is the payload of DATA frames.

The DATA blocks are the only HTX blocks that may be partially processed (copied
or removed). All other types of block must be entirely processed. This means
DATA blocks can be resized.


3.4. The end-of markers

These blocks are used to delimit parts of an HTX message. There are three
markers:

  - end-of-headers (EOH)
  - end-of-trailers (EOT)
  - end-of-message (EOM)

EOH and EOM are always present in an HTX message. EOT is optional.


4. The HTX API


4.1. Get/set HTX message from/to the underlying buffer

The first thing to do to process an HTX message is to get it from the underlying
buffer. There are 2 functions to do so, the second one relying on the first:

  - htxbuf() returns an HTX message from a buffer. It does not modify the
    buffer. It only initializes the HTX message if the buffer is empty.

  - htx_from_buf() uses htxbuf(). But it also updates the underlying buffer so
    that it appears as full.

Both functions return a "zero-sized" HTX message if the buffer is null. This
way, you are sure to always have a valid HTX message. The first function is the
default function to use. The second one is only useful when some content will be
added. For instance, it is used by the HTX analyzers when HAProxy generates a
response. This way, the buffer is in the right state and you don't need to take
care of it anymore, except on the possible error paths.

Once the processing is done, if the HTX message has been modified, the
underlying buffer must also be updated, unless you used htx_from_buf() and you
only added data. In all other cases, the function htx_to_buf() must be called.

Finally, the function htx_reset() may be called at any time to reset an HTX
message. And the function buf_room_for_htx_data() may be called to know if a raw
buffer is full from the HTX perspective. It is used during conversion from/to
the HTX.
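
The effect of these functions on the underlying buffer can be modelled with a
few lines of standalone C. The toy_buffer and toy_msg structures and the toy_*()
functions are hypothetical stand-ins, not the real API. In particular, the
sketch assumes that htx_to_buf() leaves an empty buffer when the message is
empty and a buffer that appears as full otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the buffer. */
struct toy_buffer {
    size_t size; /* allocated size of the memory area */
    size_t data; /* amount of data the buffer claims to hold */
};

/* Hypothetical stand-in for the HTX message. */
struct toy_msg {
    size_t used; /* number of blocks in the message */
};

/* Models htx_from_buf(): the buffer is made to appear as full. */
static void toy_from_buf(struct toy_buffer *buf)
{
    assert(buf != NULL);
    buf->data = buf->size;
}

/* Models htx_to_buf() under the assumption stated above: an empty
 * message leaves an empty buffer, any other message leaves a buffer
 * that appears as full. */
static void toy_to_buf(const struct toy_msg *msg, struct toy_buffer *buf)
{
    assert(msg != NULL && buf != NULL);
    buf->data = msg->used ? buf->size : 0;
}
```

This is why intermediate layers can keep manipulating buffers: they only see a
buffer that is either empty or full, never the blocks themselves.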


4.2. Helpers to deal with free space in an HTX message

Once you have an HTX message, the following functions may help you to process
it :

  - htx_used_space() and htx_meta_space() return, respectively, the total
    space used in an HTX message and the space used by blocks' metadata only.

  - htx_free_space() and htx_free_data_space() return, respectively, the total
    free space in an HTX message and the free space available for the payload
    if a new HTX block is stored (so it is the total free space minus the size
    of an HTX block).

  - htx_is_empty() and htx_is_not_empty() are boolean functions to know if an
    HTX message is empty or not.

  - htx_get_max_blksz() returns the maximum size available for the payload,
    without exceeding a given limit that includes the metadata.

  - htx_almost_full() should be used to know if an HTX message uses at least
    3/4 of its capacity.
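
The relations between these helpers can be sketched as follows. The toy_space
structure, the TOY_BLK_SIZE constant (8 bytes are assumed for one block's
metadata) and the toy_*() functions are hypothetical. They only mirror the
descriptions above and are not the real implementation.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BLK_SIZE 8u /* assumed size of one block's metadata */

/* Hypothetical, reduced HTX message: only what the helpers need. */
struct toy_space {
    uint32_t size; /* size of the blocks array */
    uint32_t data; /* bytes used by payloads */
    uint32_t used; /* number of blocks */
};

/* Space used by blocks' metadata only. */
static uint32_t toy_meta_space(const struct toy_space *m)
{
    return m->used * TOY_BLK_SIZE;
}

/* Total space used: payloads plus metadata. */
static uint32_t toy_used_space(const struct toy_space *m)
{
    return m->data + toy_meta_space(m);
}

/* Total free space. */
static uint32_t toy_free_space(const struct toy_space *m)
{
    assert(toy_used_space(m) <= m->size);
    return m->size - toy_used_space(m);
}

/* Free space left for a payload once a new block's metadata is stored. */
static uint32_t toy_free_data_space(const struct toy_space *m)
{
    uint32_t room = toy_free_space(m);

    return (room > TOY_BLK_SIZE) ? room - TOY_BLK_SIZE : 0;
}

/* True when at least 3/4 of the capacity is used. */
static int toy_almost_full(const struct toy_space *m)
{
    return !m->size || toy_free_space(m) <= m->size / 4;
}
```

With a 1024-byte array holding 10 blocks and 700 bytes of payload, 780 bytes
are used, 244 bytes are free and 236 bytes remain for the payload of a new
block. Such a message is reported as almost full.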


4.3. HTX blocks manipulation

Once you know how much space is available in an HTX message, the next step is to
add HTX blocks. First of all, the function htx_nbblks() returns the number of
blocks allocated in an HTX message. Then, there is an add function per block
type:

  - htx_add_stline() adds a start-line. The type (request or response) and the
    flags of the start-line must be provided, as well as its three parts
    (method, uri, version or version, status-code, reason).

  - htx_add_header() and htx_add_trailer() are similar. The name and the
    value must be provided. The inserted HTX block is returned on success or
    NULL if an error occurred.

  - htx_add_endof() must be used to add any end-of marker. The block's type
    (EOH, EOT or EOM) must be specified. The inserted HTX block is returned on
    success or NULL if an error occurred.

  - htx_add_all_headers() and htx_add_all_trailers() add, respectively, a list
    of headers and a list of trailers, followed by the appropriate end-of
    marker. On success, this marker is returned. Otherwise, NULL is
    returned. Note there is no rollback on the HTX message when an error
    occurs. Some headers or trailers may have been added. So it is the
    caller's responsibility to take care of that.

  - htx_add_data() must be used to add a DATA block. Unlike previous
    functions, this one returns the number of bytes copied or 0 if nothing was
    copied. If possible, the data are appended to the last DATA block, if
    any. Only a part of the payload may be copied because this function will
    try to limit the message defragmentation and the wrapping of blocks as far
    as possible. If you really need to add all data or nothing, the function
    htx_add_data_atonce() must be used instead. Because it tries to insert all
    the payload, this function returns the inserted block on success.
    Otherwise it returns NULL.

When an HTX block is added, it is always the last one (the tail). This is not
really handy if you need to add a block at a specific place. 2 functions may
help you (others could be added) :

  - htx_add_last_data() adds a DATA block just after all other DATA blocks and
    before any trailers and EOT or EOM markers. It relies on
    htx_add_data_atonce(), so a defragmentation may be performed.

  - htx_move_blk_before() moves a specific block just before another one, the
    "pivot". Both blocks must already be in the HTX message and the block to
    move must always be located after the "pivot".

Once added, there are three functions to update the block's payload :

  - htx_replace_stline() updates a start-line. The HTX block must be passed as
    argument. Only string parts of the start-line are updated by this
    function. On success, it returns the new start-line. So it is pretty easy
    to update its flags. NULL is returned if an error occurred.

  - htx_replace_header() fully replaces a header (its name and its value) by a
    new one. The HTX block must be passed as argument, as well as its new name
    and its new value. The new header can be smaller or larger than the old
    one. This function returns the new HTX block on success, or NULL if an
    error occurred.

  - htx_replace_blk_value() replaces a part of a block's payload or its
    totality. It works for HEADERS, TRAILERS or DATA blocks. The HTX block
    must be provided with the part to remove and the new one. The new part can
    be smaller or larger than the old one. This function returns the new HTX
    block on success, or NULL if an error occurred.

Finally, you may remove a block using the function htx_remove_blk(). This
function returns the block following the one removed or NULL if it is the tail
block.


4.4. The HTX start-line

Unlike other HTX blocks, the start-line is a bit special because its payload is
a structure followed by its three parts :

              +--------+-------+-------+-------+
              | HTX_SL | PART1 | PART2 | PART3 |
              +--------+-------+-------+-------+

Some macros and functions may help to manipulate these parts :

  - HTX_SL_P{N}_LEN() and HTX_SL_P{N}_PTR() are macros to get the length of a
    part and a pointer to it. {N} should be 1, 2 or 3.

  - HTX_SL_REQ_MLEN(), HTX_SL_REQ_ULEN(), HTX_SL_REQ_VLEN(),
    HTX_SL_REQ_MPTR(), HTX_SL_REQ_UPTR() and HTX_SL_REQ_VPTR() are macros to
    get info about a request start-line. These macros only wrap HTX_SL_P*
    ones.

  - HTX_SL_RES_VLEN(), HTX_SL_RES_CLEN(), HTX_SL_RES_RLEN(),
    HTX_SL_RES_VPTR(), HTX_SL_RES_CPTR() and HTX_SL_RES_RPTR() are macros to
    get info about a response start-line. These macros only wrap HTX_SL_P*
    ones.

  - htx_sl_p1(), htx_sl_p2() and htx_sl_p3() are functions to get the ist
    corresponding to the right part of a start-line.

  - htx_sl_req_meth(), htx_sl_req_uri() and htx_sl_req_vsn() get the ist
    corresponding to the right part of a request start-line.

  - htx_sl_res_vsn(), htx_sl_res_code() and htx_sl_res_reason() get the ist
    corresponding to the right part of a response start-line.
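
The layout of a start-line payload can be sketched as a small structure followed
by the three parts stored back to back. The toy_sl structure and the toy_sl_*()
functions below are hypothetical and much simpler than the real "struct
htx_sl". They only illustrate how a pointer to a part may be derived from the
lengths of the previous parts.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical start-line payload: a small header followed by the three
 * parts stored back to back (not the real "struct htx_sl"). */
struct toy_sl {
    unsigned int flags;
    unsigned int len[3]; /* lengths of PART1, PART2 and PART3 */
    char l[64];          /* PART1, PART2 and PART3, concatenated */
};

/* Returns a pointer to part <n> (1, 2 or 3). Its length is len[n - 1]. */
static const char *toy_sl_ptr(const struct toy_sl *sl, int n)
{
    unsigned int off = 0;
    int i;

    assert(n >= 1 && n <= 3);
    for (i = 0; i < n - 1; i++)
        off += sl->len[i];
    return sl->l + off;
}

/* Fills <sl> with the three parts of a start-line. Returns 1 on success,
 * 0 if the parts do not fit. */
static int toy_sl_set(struct toy_sl *sl, const char *p1, const char *p2,
                      const char *p3)
{
    const char *parts[3] = { p1, p2, p3 };
    unsigned int off = 0;
    int i;

    for (i = 0; i < 3; i++) {
        size_t len = strlen(parts[i]);

        if (len > sizeof(sl->l) - off)
            return 0;
        memcpy(sl->l + off, parts[i], len);
        sl->len[i] = (unsigned int)len;
        off += (unsigned int)len;
    }
    sl->flags = 0;
    return 1;
}
```

The parts are not NUL-terminated in this model, which is why a length is always
returned along with a pointer.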


4.5. Iterate on the HTX message

To iterate on an HTX message, the first thing to do is to get the HTX block to
start the loop. There are three special blocks in an HTX message that may be
good candidates to start a loop :

  * the head block. It is the oldest inserted block. Multiplexers always start
    to consume an HTX message from this block. The function htx_get_head()
    returns its position and htx_get_head_blk() returns the block itself. In
    addition, the function htx_get_head_type() returns its block's type.

  * the tail block. It is the newest inserted block. The function
    htx_get_tail() returns its position and htx_get_tail_blk() returns the
    block itself. In addition, the function htx_get_tail_type() returns its
    block's type.

  * the first block. It is the block where to (re)start the analysis. It is
    used as start point by HTX analyzers. The function htx_get_first() returns
    its position and htx_get_first_blk() returns the block itself. In addition,
    the function htx_get_first_type() returns its block's type.

For all these functions, if the HTX message is empty, -1 is returned for the
block's position, NULL instead of a block and HTX_BLK_UNUSED for its type.

Then to iterate on blocks, you may move forward or backward :

  * htx_get_prev() and htx_get_next() return, respectively, the position of the
    previous block or the next block, given a specific position. Or -1 if an
    edge is reached.

  * htx_get_prev_blk() and htx_get_next_blk() return, respectively, the previous
    block or the next one, given a specific block. Or NULL if an edge is
    reached.
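
A loop over the blocks can be sketched on positions only. The model below is
hypothetical (toy_range and the toy_*() functions are not part of the API) and
assumes that positions grow from the head to the tail, as in the figures of
section 2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical message reduced to its head and tail positions. Both are
 * -1 when the message is empty. */
struct toy_range {
    int32_t head; /* position of the oldest inserted block */
    int32_t tail; /* position of the newest inserted block */
};

/* Position following <pos>, or -1 if the tail is reached. */
static int32_t toy_get_next(const struct toy_range *r, int32_t pos)
{
    if (r->tail == -1 || pos >= r->tail)
        return -1;
    return pos + 1;
}

/* Position preceding <pos>, or -1 if the head is reached. */
static int32_t toy_get_prev(const struct toy_range *r, int32_t pos)
{
    if (r->head == -1 || pos <= r->head)
        return -1;
    return pos - 1;
}

/* Counts blocks by walking forward from the head. */
static int toy_count_blocks(const struct toy_range *r)
{
    int32_t pos;
    int n = 0;

    assert(r->head <= r->tail);
    for (pos = r->head; pos != -1; pos = toy_get_next(r, pos))
        n++;
    return n;
}
```

The same loop shape applies when starting from the first block instead of the
head. An empty message is handled naturally because the start position is
already -1.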


4.6. Advanced functions

Some more advanced functions may be used to do complex processing on the HTX
message. These functions are used by HTX analyzers or by multiplexers.

  * htx_truncate() removes all blocks after the one containing a specific
    offset relative to the head block of the HTX message. If the offset is
    inside a DATA block, it is truncated. For all other blocks, the removal
    starts at the next block.

  * htx_drain() tries to remove a specific amount of bytes of payload. If the
    last block is a DATA block, it may be truncated if necessary. All other
    blocks are removed at once or kept. This function returns a mixed value,
    with the first block not removed, or NULL if everything was removed, and
    the amount of data drained.

  * htx_xfer_blks() transfers HTX blocks from an HTX message to another,
    stopping on the first block of a specified type or when a specific amount
    of bytes, including meta-data, was moved. If the last block is a DATA
    block, it may be partially moved. All other blocks are transferred at once
    or kept. This function returns a mixed value, with the last block moved, or
    NULL if nothing was moved, and the amount of data transferred. When HEADERS
    or TRAILERS blocks must be transferred, this function transfers all of
    them. Otherwise, if it is not possible, it triggers an error. It is the
    caller's responsibility to transfer all headers or trailers at once.
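
The way htx_drain() treats DATA blocks differently from the other ones can be
illustrated with a standalone model. Everything below is hypothetical (toy_dblk,
toy_type and toy_drain() are not part of the API) and blocks are reduced to a
type and a size kept in a plain array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_type { TOY_HDR, TOY_DATA, TOY_OTHER };

/* Hypothetical block: a type and a payload size. */
struct toy_dblk {
    enum toy_type type;
    uint32_t sz;
};

/* Drains up to <max> bytes from the <*nb> blocks of <blks>, starting at
 * the head (index 0). Whole blocks are removed while they fit. A DATA
 * block that does not fit is shortened. Any other block that does not
 * fit is kept and stops the loop. Removed blocks are dropped by shifting
 * the array. Returns the number of bytes drained. */
static uint32_t toy_drain(struct toy_dblk *blks, size_t *nb, uint32_t max)
{
    uint32_t drained = 0;
    size_t first = 0, i;

    assert(blks != NULL && nb != NULL);

    while (first < *nb && max) {
        struct toy_dblk *blk = &blks[first];

        if (blk->sz > max) {
            if (blk->type == TOY_DATA) {
                /* only DATA blocks may be partially removed */
                blk->sz -= max;
                drained += max;
            }
            break;
        }
        drained += blk->sz;
        max -= blk->sz;
        first++;
    }

    /* <first> is now the index of the first block not removed */
    for (i = first; i < *nb; i++)
        blks[i - first] = blks[i];
    *nb -= first;
    return drained;
}
```

Draining 50 bytes from a 20-byte header block followed by a 100-byte DATA block
removes the header and leaves a 70-byte DATA block at the head. Draining
10 bytes from a message starting with a 20-byte header block removes nothing.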