                  -----------------------------------------------
                                      HTX API
                                    Version 1.1
                            ( Last update: 2021-02-24 )
                  -----------------------------------------------
                            Author : Christopher Faulet
                        Contact : cfaulet at haproxy dot com

1. Background

Historically, HAProxy stored HTTP messages in a raw fashion in buffers, keeping
parsing information separately in a "struct http_msg" owned by the stream. It
was optimized for data transfer, but not so much for rewrites. It was also
HTTP/1 centered. While it was the only HTTP version supported, it was not a
problem. But with the rise of HTTP/2, it became hard to keep using this
representation.

In the early days of HTTP/2 in HAProxy, H2 messages were converted into
H1. This was terribly inefficient because it required two parsing passes, a
first one in H2 and a second one in H1, with a conversion in the middle. And of
course, the same was also true in the opposite direction: outgoing H1 messages
had to be converted back to H2 to be sent. Even worse, because of the H2->H1
conversion, only client H2 connections were supported.

So, to address all these problems, we decided to replace the old raw
representation by a version-agnostic and self-structured internal HTTP
representation, the HTX. As an additional benefit, with this new representation,
the message parsing and its processing are now separated, making all the HTTP
analysis simpler and cleaner. The parsing of HTTP messages is now handled by
the multiplexers (h1 or h2).


2. The HTX message

The HTX is a structure containing useful information about an HTTP message
followed by a contiguous array with some parts of the message. These parts are
called blocks. A block is composed of metadata (htx_blk) and an associated
payload. Blocks' metadata are stored starting from the end of the array while
their payloads are stored at the beginning. Blocks' metadata are often simply
called blocks. It is a misuse of language that simplifies explanations.

Internally, this structure is "hidden" in a buffer. This way, there are few
changes in intermediate layers (stream-interface and channels). They still
manipulate buffers. Only the multiplexer and the stream have to know how data
are really stored. From the HTX perspective, a buffer is just a memory
area. When an HTX message is stored in a buffer, the buffer appears as full.

  * General view of an HTX message :


  buffer->area
      |
      |<------------ buffer->size == buffer->data ----------------------|
      |                                                                 |
      |     |<------------- Blocks array (htx->size) ------------------>|
      V     |                                                           |
      +-----+-----------------+-------------------------+---------------+
      | HTX |  PAYLOADS ==>   |                         | <== HTX_BLKs  |
      +-----+-----------------+-------------------------+---------------+
            |                 |                         |               |
            |<-payloads part->|<----- free space ------>|<-blocks part->|
                 (htx->data)


The blocks part remains linear and sorted. It may be seen as an array with
negative indexes. But, instead of using negative indexes, we use positive
positions to identify a block. This position is then converted to an address
relative to the beginning of the blocks array.

            tail                                 head
              |                                    |
              V                                    V
    .....--+----+-----------------------+------+------+
           | Bn |          ...          |  B1  |  B0  |
    .....--+----+-----------------------+------+------+
           ^                            ^      ^
    Addr of the block   Addr of the block   Addr of the block
    at the position N   at the position 1   at the position 0

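As an illustration, the conversion from a position to an address may be sketched
as follows. This is a standalone model and not HAProxy's code: the names toy_blk
and toy_pos_to_addr() are hypothetical, and the 8-byte metadata size is an
assumption derived from the two 32-bit fields (addr and info) that this document
gives for a block's metadata.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical block metadata: two 32-bit fields, so 8 bytes per block. */
struct toy_blk {
	uint32_t addr;
	uint32_t info;
};

/* Returns the offset of the metadata stored at position <pos>, relative to
 * the beginning of a blocks array of <size> bytes. Position 0 uses the last
 * slot of the array, position 1 the slot just before it, and so on.
 */
static uint32_t toy_pos_to_addr(uint32_t size, uint32_t pos)
{
	uint64_t end = ((uint64_t)pos + 1) * sizeof(struct toy_blk);

	assert(end <= size); /* the slot must fit in the array */
	return size - (uint32_t)end;
}
```

With a 1024-byte array, position 0 maps to offset 1016 and position 1 to offset
1008, which matches the right-to-left layout of the figure.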

In the HTX structure, 3 "special" positions are stored :

  - tail  : Position of the newest inserted block
  - head  : Position of the oldest inserted block
  - first : Position of the first block to (re)start the analysis

The blocks part never wraps. If we have no space to allocate a new block and if
there is a hole at the beginning of the blocks part (so at the end of the blocks
array), we move back all blocks.


      tail         head                              tail         head
        |            |                                 |            |
        V            V                                 V            V
    ...+--------------+---------+ blocks ...----------+--------------+
       | X== HTX_BLKS |         | defrag              | <== HTX_BLKS |
    ...+--------------+---------+ =====> ...----------+--------------+


The payloads part is a raw space that may wrap. A block's payload must never be
accessed directly. Instead a block must be selected to retrieve the address of
its payload.


          +------------------------( B0.addr )--------------------------+
          |    +-------------------( B1.addr )----------------------+   |
          |    |       +-----------( B2.addr )----------------+     |   |
          V    V       V                                      |     |   |
    +-----+----+-------+----+--------+-------------+-------+----+----+----+
    | HTX | P0 |  P1   | P2 | ...==> |             | <=... | B2 | B1 | B0 |
    +-----+----+-------+----+--------+-------------+-------+----+----+----+


Because the payloads part may wrap, there are 2 usable free spaces :

  - The free space in front of the blocks part. This one is used if and only if
    the other one was not used yet.

  - The free space at the beginning of the message. Once this one is used, the
    other one is never used again, until a message defragmentation.


  * Linear payloads part :


      head_addr            end_addr      tail_addr
          |                    |             |
          V                    V             V
    +-----+--------------------+-------------+--------------------+-------...
    | HTX |                    |  PAYLOADS   |                    | HTX_BLKs
    +-----+--------------------+-------------+--------------------+-------...
          |<-- free space 2 -->|             |<-- free space 1 -->|
     (used if the other is too small)          (used in priority)


  * Wrapping payloads part :


                         head_addr    end_addr        tail_addr
                                |        |                |
                                V        V                V
    +-----+----+----------------+--------+----------------+-------+-------...
    | HTX |    | PAYLOADS part2 |        | PAYLOADS part1 |       | HTX_BLKs
    +-----+----+----------------+--------+----------------+-------+-------...
          |<-->|                |<------>|                |<----->|
         unusable               free space                unusable
        free space                                       free space


Finally, when the usable free space is not enough to store a new block, unusable
parts may be recovered with a full defragmentation. The payloads part is then
realigned at the beginning of the blocks array and the free space becomes
continuous again.


3. The HTX blocks

An HTX block can be a start-line, a header, a body part or a trailer. For all
these types of block, a payload is attached to the block. It can also be a
marker, the end-of-headers or end-of-trailers. For these blocks, there is no
payload but each one counts for one byte. It is important not to skip them when
data are forwarded.

As already said, a block is composed of metadata and a payload. Metadata are
stored in the blocks part and are composed of 2 fields :

  - info : It is a 32-bit field containing the block's type on 4 bits followed
           by the payload length. See below for details.

  - addr : The payload's address, if any, relative to the beginning of the
           array used to store part of the HTTP message itself.


  * Block's info representation :

    0b 0000 0000 0000 0000 0000 0000 0000 0000
       ---- ------------------------ ---------
       type     value (1 MB max)     name length (header/trailer - 256B max)
            ----------------------------------
                 data length (256 MB max)
      (body, method, path, version, status, reason)

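The bit layout above may be sketched with a few helpers. This is only an
illustration: all toy_* names are hypothetical and are not part of the HTX API.
The sketch assumes the type sits in the 4 most significant bits and the name
length in the 8 least significant ones, as drawn in the figure.

```c
#include <assert.h>
#include <stdint.h>

/* Packs the info word of a header or trailer block: 4 bits of type, 20 bits
 * of value length and 8 bits of name length.
 */
static uint32_t toy_info_kv(uint32_t type, uint32_t name_len, uint32_t value_len)
{
	assert(type <= 0xF && name_len <= 0xFF && value_len <= 0xFFFFF);
	return (type << 28) | (value_len << 8) | name_len;
}

/* Packs the info word of a block carrying a single payload (body, method,
 * path, ...): 4 bits of type and 28 bits of data length.
 */
static uint32_t toy_info_data(uint32_t type, uint32_t data_len)
{
	assert(type <= 0xF && data_len <= 0xFFFFFFF);
	return (type << 28) | data_len;
}

/* Accessors for each field of an info word. */
static uint32_t toy_info_type(uint32_t info)      { return info >> 28; }
static uint32_t toy_info_name_len(uint32_t info)  { return info & 0xFF; }
static uint32_t toy_info_value_len(uint32_t info) { return (info >> 8) & 0xFFFFF; }
static uint32_t toy_info_data_len(uint32_t info)  { return info & 0xFFFFFFF; }
```

For instance, a header block (type 2) with a 12-byte name and a 5-byte value is
packed as 0x2000050C.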

Supported types are :

  - 0000 (0)  : The request start-line
  - 0001 (1)  : The response start-line
  - 0010 (2)  : A header block
  - 0011 (3)  : The end-of-headers marker
  - 0100 (4)  : A data block
  - 0101 (5)  : A trailer block
  - 0110 (6)  : The end-of-trailers marker
  - 1111 (15) : An unused block

Other types are unused for now and reserved for future extensions.

An HTX message is typically composed of the following blocks, in this order :

  - a start-line
  - zero or more header blocks
  - an end-of-headers marker
  - zero or more data blocks
  - zero or more trailer blocks (optional)
  - an end-of-trailers marker (optional but always set if there is at least
    one trailer block)

Only one HTTP request at a time can be stored in an HTX message. For HTTP
responses, it is more complicated. Only one "final" response can be stored in an
HTX message. It is a response with status-code 101 or greater than or equal to
200. But it may be preceded by several 1xx informational responses. Such
responses are part of the same HTX message.

When the end of the message is reached, a special flag is set on the message
(HTX_FL_EOM). It means no more data are expected for this message, except
tunneled data. But tunneled data will never be mixed with message data to avoid
ambiguities. Thus once the flag marking the end of the message is set, it is
easy to know where the message ends. The end is reached if the HTX message is
empty or on the tail HTX block in the HTX message. Once all blocks of the HTX
message are consumed, tunneled data, if any, may be transferred.

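The end-of-message rule may be sketched as follows. This is a simplified model
rather than HAProxy's code: the function toy_msg_ends_at() is hypothetical and
the numeric value chosen for the flag is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_FL_EOM 0x00000001u /* hypothetical value for the end-of-message flag */

/* Returns 1 if the whole message has been seen once the block at position
 * <pos> is consumed. <tail> is the position of the newest block, or -1 for an
 * empty message, in which case <pos> is ignored.
 */
static int toy_msg_ends_at(uint32_t flags, int32_t tail, int32_t pos)
{
	assert(tail >= -1);
	if (!(flags & TOY_FL_EOM))
		return 0; /* more blocks may still be appended */
	return tail == -1 || pos == tail;
}
```

Without the flag, reaching the tail block only means that no more blocks are
available for now.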

3.1. The start-line

Every HTX message starts with a start-line. Its payload is a "struct htx_sl". In
addition to the parts of the HTTP start-line, this structure contains some
information about the represented HTTP message, mainly in the form of flags
(HTX_SL_F_*). For instance, if an HTTP message contains the header
"content-length", then the flag HTX_SL_F_CLEN is set.

Each HTTP message has its own start-line. So an HTX request has one and only one
start-line because it must contain only one HTTP request at a time. But an HTX
response may have more than one start-line if the final HTTP response is
preceded by some 1xx informational responses.

In HTTP/2, there is no start-line. So the H2 multiplexer must create one when it
converts an H2 message to HTX :

  - For the request, it uses the pseudo-headers ":method", ":path" or
    ":authority" depending on the method and the hardcoded version "HTTP/2.0".

  - For the response, it uses the hardcoded version "HTTP/2.0", the
    pseudo-header ":status" and an empty reason.


3.2. The headers and trailers

HTX headers and trailers are quite similar. Different types are used to simplify
headers processing. But from the HTX point of view, there is no real difference,
except their position in the HTX message. The header blocks always follow an HTX
start-line while trailer blocks come after the data. If there is no data, they
follow the end-of-headers marker.

Headers and trailers are the only blocks containing a Key/Value payload. The
corresponding end-of marker must always be placed after each group to mark, as
its name suggests, the end.

In HTTP/1, trailers are only present on chunked messages. But chunked messages
do not always have trailers. In this case, the end-of-trailers block may or may
not be present. Multiplexers must be able to handle both situations. In HTTP/2,
trailers are only present if a HEADERS frame is sent after DATA frames.


3.3. The data

The payload body of an HTTP message is stored as DATA blocks in the HTX
message. For HTTP/1 messages, it is the message body without the chunks
formatting, if any. For HTTP/2, it is the payload of DATA frames.

The DATA blocks are the only HTX blocks that may be partially processed (copied
or removed). All other types of block must be entirely processed. This means
DATA blocks can be resized.


3.4. The end-of markers

These blocks are used to delimit parts of an HTX message. There are two
markers :

  - end-of-headers (EOH)
  - end-of-trailers (EOT)

EOH is always present in an HTX message. EOT is optional.


4. The HTX API


4.1. Get/set HTX message from/to the underlying buffer

The first thing to do to process an HTX message is to get it from the underlying
buffer. There are 2 functions to do so, the second one relying on the first :

  - htxbuf() returns an HTX message from a buffer. It does not modify the
    buffer. It only initializes the HTX message if the buffer is empty.

  - htx_from_buf() uses htxbuf(). But it also updates the underlying buffer so
    that it appears as full.

Both functions return a "zero-sized" HTX message if the buffer is null. This
way, the HTX message is always valid. The first function is the default function
to use. The second one is only useful when some content will be added. For
instance, it is used by the HTX analyzers when HAProxy generates a response.
Thus, the buffer is in the right state.

Once the processing is done, if the HTX message has been modified, the
underlying buffer must also be updated, except if htx_from_buf() was used _AND_
data was only added. For all other cases, the function htx_to_buf() must be
called.

Finally, the function htx_reset() may be called at any time to reset an HTX
message. And the function buf_room_for_htx_data() may be called to know if a raw
buffer is full from the HTX perspective. It is used during conversion from/to
the HTX.

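The relation between a buffer and the HTX message it hides may be sketched with
a toy model. None of the names below exist in HAProxy: toy_buf, toy_htx and the
toy_* functions only mimic the behaviour described in this section. Two
simplifications are assumptions of the sketch: a null buffer is rejected instead
of yielding a zero-sized message, and the resynchronization step marks the
buffer as empty when the message holds no block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_buf {
	void  *area; /* storage, assumed suitably aligned for struct toy_htx */
	size_t size; /* total size of the storage */
	size_t data; /* amount of data the buffer claims to hold */
};

struct toy_htx {
	uint32_t size; /* size of the blocks array following this header */
	uint32_t data; /* bytes used by payloads */
	int32_t  head; /* position of the oldest block, -1 if empty */
	int32_t  tail; /* position of the newest block, -1 if empty */
};

/* Views the buffer as a message without modifying the buffer. The message is
 * initialized only if the buffer is empty.
 */
static struct toy_htx *toy_htxbuf(const struct toy_buf *buf)
{
	struct toy_htx *htx = buf->area;

	assert(buf->area != NULL && buf->size > sizeof(*htx));
	if (buf->data == 0) {
		htx->size = (uint32_t)(buf->size - sizeof(*htx));
		htx->data = 0;
		htx->head = htx->tail = -1;
	}
	return htx;
}

/* Same view, but the buffer now appears as full. */
static struct toy_htx *toy_htx_from_buf(struct toy_buf *buf)
{
	struct toy_htx *htx = toy_htxbuf(buf);

	buf->data = buf->size;
	return htx;
}

/* Resynchronizes the buffer after processing: an empty message leaves an
 * empty buffer, anything else leaves a full one.
 */
static void toy_htx_to_buf(const struct toy_htx *htx, struct toy_buf *buf)
{
	buf->data = (htx->head == -1) ? 0 : buf->size;
}

/* Usage: returns 1 if the buffer state follows the message state. */
static int toy_buf_demo(void)
{
	static union {
		struct toy_htx htx;
		unsigned char  bytes[256];
	} storage;
	struct toy_buf buf = { &storage, sizeof(storage), 0 };
	struct toy_htx *htx = toy_htx_from_buf(&buf);

	if (buf.data != buf.size || htx->head != -1)
		return 0;
	toy_htx_to_buf(htx, &buf);  /* nothing was added: empty again */
	if (buf.data != 0)
		return 0;
	htx = toy_htx_from_buf(&buf);
	htx->head = htx->tail = 0;  /* pretend one block was added */
	toy_htx_to_buf(htx, &buf);
	return buf.data == buf.size;
}
```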

4.2. Helpers to deal with free space in an HTX message

Once an HTX message is available, the following functions may help to process
it :

  - htx_used_space() and htx_meta_space() return, respectively, the total
    space used in an HTX message and the space used by blocks' metadata only.

  - htx_free_space() and htx_free_data_space() return, respectively, the total
    free space in an HTX message and the free space available for the payload
    if a new HTX block is stored (so it is the total free space minus the size
    of an HTX block).

  - htx_is_empty() and htx_is_not_empty() are boolean functions to know if an
    HTX message is empty or not.

  - htx_get_max_blksz() returns the maximum size available for the payload,
    not exceeding a maximum, metadata included.

  - htx_almost_full() should be used to know if an HTX message uses at least
    3/4 of its capacity.

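The arithmetic behind these helpers may be sketched on a toy structure. The
names toy_htx, TOY_BLK_SIZE and toy_*() are hypothetical, and the 8-byte
metadata size is an assumption. The sketch only relies on what is stated in this
document: metadata use one fixed-size entry per position between head and tail,
and the free space for a new payload is the total free space minus one entry.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BLK_SIZE 8u /* assumed size of one block's metadata */

struct toy_htx {
	uint32_t size; /* size of the blocks array */
	uint32_t data; /* bytes used by payloads */
	int32_t  head; /* position of the oldest block, -1 if empty */
	int32_t  tail; /* position of the newest block, -1 if empty */
};

/* Space used by blocks' metadata only. */
static uint32_t toy_meta_space(const struct toy_htx *htx)
{
	if (htx->tail == -1)
		return 0;
	return (uint32_t)(htx->tail + 1 - htx->head) * TOY_BLK_SIZE;
}

/* Total space used: payloads plus metadata. */
static uint32_t toy_used_space(const struct toy_htx *htx)
{
	return htx->data + toy_meta_space(htx);
}

/* Total free space. */
static uint32_t toy_free_space(const struct toy_htx *htx)
{
	assert(toy_used_space(htx) <= htx->size);
	return htx->size - toy_used_space(htx);
}

/* Free space left for a payload once a new metadata entry is accounted for. */
static uint32_t toy_free_data_space(const struct toy_htx *htx)
{
	uint32_t room = toy_free_space(htx);

	return (room < TOY_BLK_SIZE) ? 0 : room - TOY_BLK_SIZE;
}

/* Returns 1 if less than a quarter of the capacity remains free. */
static int toy_almost_full(const struct toy_htx *htx)
{
	return htx->size == 0 || toy_free_space(htx) < htx->size / 4;
}
```

For a 1024-byte array holding 3 blocks and 100 bytes of payload, 124 bytes are
used, 900 are free and 892 remain available for a new payload.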

4.3. HTX Blocks manipulations

Once the available space in an HTX message is known, the next step is to add HTX
blocks. First of all the function htx_nbblks() returns the number of blocks
allocated in an HTX message. Then, there is an add function per block type :

  - htx_add_stline() adds a start-line. The type (request or response) and the
    flags of the start-line must be provided, as well as its three parts
    (method,uri,version or version,status-code,reason).

  - htx_add_header() and htx_add_trailer() are similar. The name and the
    value must be provided. The inserted HTX block is returned on success or
    NULL if an error occurred.

  - htx_add_endof() must be used to add any end-of marker. The block's type
    (EOH or EOT) must be specified. The inserted HTX block is returned on
    success or NULL if an error occurred.

  - htx_add_all_headers() and htx_add_all_trailers() add, respectively, a list
    of headers and a list of trailers, followed by the appropriate end-of
    marker. On success, this marker is returned. Otherwise, NULL is
    returned. Note there is no rollback on the HTX message when an error
    occurred. Some headers or trailers may have been added. So it is the
    caller's responsibility to take care of that.

  - htx_add_data() must be used to add a DATA block. Unlike previous
    functions, this one returns the number of bytes copied or 0 if nothing was
    copied. If possible, the data are appended to the tail block if it is a
    DATA block. Only a part of the payload may be copied because this function
    will try to limit the message defragmentation and the wrapping of blocks
    as far as possible.

  - htx_add_data_atonce() must be used if all data must be added or nothing.
    It tries to insert the whole payload and returns the inserted block on
    success. Otherwise it returns NULL.

When an HTX block is added, it is always the last one (the tail). This is not
really handy if a block must be added at a specific place. 2 functions may help
(others could be added) :

  - htx_add_last_data() adds a DATA block just after all other DATA blocks and
    before any trailers and EOT marker. It relies on htx_add_data_atonce(), so
    a defragmentation may be performed.

  - htx_move_blk_before() moves a specific block just before another one. Both
    blocks must already be in the HTX message and the block to move must
    always be placed after the "pivot".

Once a block is added, several functions can update its payload or remove it :

  - htx_replace_stline() updates a start-line. The HTX block must be passed as
    argument. Only string parts of the start-line are updated by this
    function. On success, it returns the new start-line. So it is pretty easy
    to update its flags. NULL is returned if an error occurred.

  - htx_replace_header() fully replaces a header (its name and its value) by a
    new one. The HTX block must be passed as argument, as well as its new name
    and its new value. The new header can be smaller or larger than the old
    one. This function returns the new HTX block on success, or NULL if an
    error occurred.

  - htx_replace_blk_value() replaces a part of a block's payload or its
    totality. It works for HEADERS, TRAILERS or DATA blocks. The HTX block
    must be provided with the part to remove and the new one. The new part can
    be smaller or larger than the old one. This function returns the new HTX
    block on success, or NULL if an error occurred.

  - htx_change_blk_value_len() changes the size of the value. It is the
    caller's responsibility to change the value itself, make sure there is
    enough space and update allocated value. This function updates the HTX
    message accordingly.

  - htx_set_blk_value_len() changes the size of the value. It is the caller's
    responsibility to change the value itself, make sure there is enough space
    and update allocated value. Unlike the function
    htx_change_blk_value_len(), this one does not update the HTX message. So
    it should be used with caution.

  - htx_cut_data_blk() removes <n> bytes from the beginning of a DATA
    block. The block's start address and its length are adjusted, and the
    htx's total data count is updated. This is used to mark that part of some
    data were transferred from a DATA block without removing this DATA
    block. No sanity check is performed, the caller is responsible for doing
    this exclusively on DATA blocks, and never removing more than the block's
    size.

  - htx_remove_blk() removes a block from an HTX message. It returns the
    following block or NULL if it is the tail block.

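As an example of such an update, the bookkeeping attributed to
htx_cut_data_blk() may be sketched on toy structures. The types and functions
below are hypothetical and keep only the fields touched by the operation. Unlike
the real function, which performs no sanity check, the sketch asserts its
preconditions.

```c
#include <assert.h>
#include <stdint.h>

struct toy_blk {
	uint32_t addr; /* start address of the payload */
	uint32_t len;  /* payload length */
};

struct toy_htx {
	uint32_t data; /* total bytes used by payloads in the message */
};

/* Drops <n> bytes from the beginning of a DATA block without removing the
 * block itself.
 */
static void toy_cut_data_blk(struct toy_htx *htx, struct toy_blk *blk, uint32_t n)
{
	assert(n <= blk->len && n <= htx->data);
	blk->addr += n;
	blk->len  -= n;
	htx->data -= n;
}

/* Usage: 40 of 100 payload bytes were forwarded. Returns 1 if the block and
 * the message are in the expected state.
 */
static int toy_cut_demo(void)
{
	struct toy_htx htx = { 100 };
	struct toy_blk blk = { 16, 100 };

	toy_cut_data_blk(&htx, &blk, 40);
	return blk.addr == 56 && blk.len == 60 && htx.data == 60;
}
```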

4.4. The HTX start-line

Unlike other HTX blocks, the start-line is a bit special because its payload is
a structure followed by its three parts :

    +--------+-------+-------+-------+
    | HTX_SL | PART1 | PART2 | PART3 |
    +--------+-------+-------+-------+

Some macros and functions may help to manipulate these parts :

  - HTX_SL_P{N}_LEN() and HTX_SL_P{N}_PTR() are macros to get the length of a
    part and a pointer on it. {N} should be 1, 2 or 3.

  - HTX_SL_REQ_MLEN(), HTX_SL_REQ_ULEN(), HTX_SL_REQ_VLEN(),
    HTX_SL_REQ_MPTR(), HTX_SL_REQ_UPTR() and HTX_SL_REQ_VPTR() are macros to
    get info about a request start-line. These macros only wrap HTX_SL_P*
    ones.

  - HTX_SL_RES_VLEN(), HTX_SL_RES_CLEN(), HTX_SL_RES_RLEN(),
    HTX_SL_RES_VPTR(), HTX_SL_RES_CPTR() and HTX_SL_RES_RPTR() are macros to
    get info about a response start-line. These macros only wrap HTX_SL_P*
    ones.

  - htx_sl_p1(), htx_sl_p2() and htx_sl_p3() are functions to get the ist
    corresponding to the right part of a start-line.

  - htx_sl_req_meth(), htx_sl_req_uri() and htx_sl_req_vsn() get the ist
    corresponding to the right part of a request start-line.

  - htx_sl_res_vsn(), htx_sl_res_code() and htx_sl_res_reason() get the ist
    corresponding to the right part of a response start-line.

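The "structure followed by its three parts" layout may be sketched with a
flexible array member. The structure toy_sl and the TOY_SL_* macros are
hypothetical: they only imitate the idea behind the HTX_SL_P{N}_LEN() and
HTX_SL_P{N}_PTR() macros and do not reproduce the real "struct htx_sl".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct toy_sl {
	uint32_t flags;
	uint32_t len[3]; /* lengths of PART1, PART2 and PART3 */
	char     l[];    /* the three parts, stored back to back */
};

#define TOY_SL_P1_LEN(sl) ((sl)->len[0])
#define TOY_SL_P2_LEN(sl) ((sl)->len[1])
#define TOY_SL_P3_LEN(sl) ((sl)->len[2])
#define TOY_SL_P1_PTR(sl) ((sl)->l)
#define TOY_SL_P2_PTR(sl) (TOY_SL_P1_PTR(sl) + TOY_SL_P1_LEN(sl))
#define TOY_SL_P3_PTR(sl) (TOY_SL_P2_PTR(sl) + TOY_SL_P2_LEN(sl))

/* Allocates a start-line and copies its three parts right after the
 * structure. Returns NULL if the allocation fails.
 */
static struct toy_sl *toy_sl_new(const char *p1, const char *p2, const char *p3)
{
	size_t n1, n2, n3;
	struct toy_sl *sl;

	assert(p1 != NULL && p2 != NULL && p3 != NULL);
	n1 = strlen(p1);
	n2 = strlen(p2);
	n3 = strlen(p3);
	sl = malloc(sizeof(*sl) + n1 + n2 + n3);
	if (sl == NULL)
		return NULL;
	sl->flags  = 0;
	sl->len[0] = (uint32_t)n1;
	sl->len[1] = (uint32_t)n2;
	sl->len[2] = (uint32_t)n3;
	memcpy(TOY_SL_P1_PTR(sl), p1, n1);
	memcpy(TOY_SL_P2_PTR(sl), p2, n2);
	memcpy(TOY_SL_P3_PTR(sl), p3, n3);
	return sl;
}

/* Usage: returns 1 if the parts of "GET /index.html HTTP/1.1" are found
 * where the macros say they are.
 */
static int toy_sl_demo(void)
{
	struct toy_sl *sl = toy_sl_new("GET", "/index.html", "HTTP/1.1");
	int ok;

	if (sl == NULL)
		return 0;
	ok = TOY_SL_P1_LEN(sl) == 3 &&
	     memcmp(TOY_SL_P1_PTR(sl), "GET", 3) == 0 &&
	     TOY_SL_P2_LEN(sl) == 11 &&
	     memcmp(TOY_SL_P2_PTR(sl), "/index.html", 11) == 0 &&
	     TOY_SL_P3_LEN(sl) == 8 &&
	     memcmp(TOY_SL_P3_PTR(sl), "HTTP/1.1", 8) == 0;
	free(sl);
	return ok;
}
```

The parts are not NUL-terminated in this sketch, so a length must always be used
together with the matching pointer.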

4.5. Iterate on the HTX message

To iterate on an HTX message, the first thing to do is to get the HTX block to
start the loop. There are three special blocks in an HTX message that may be
good candidates to start a loop :

  - the head block. It is the oldest inserted block. Multiplexers always start
    to consume an HTX message from this block. The function htx_get_head()
    returns its position and htx_get_head_blk() returns the block itself. In
    addition, the function htx_get_head_type() returns its type.

  - the tail block. It is the newest inserted block. The function
    htx_get_tail() returns its position and htx_get_tail_blk() returns the
    block itself. In addition, the function htx_get_tail_type() returns its
    type.

  - the first block. It is the block from which the analysis is (re)started.
    It is used as start point by HTX analyzers. The function htx_get_first()
    returns its position and htx_get_first_blk() returns the block itself. In
    addition, the function htx_get_first_type() returns its type.

For all these functions, if the HTX message is empty, -1 is returned for the
block's position, NULL instead of a block and HTX_BLK_UNUSED for its type.

Then to iterate on blocks, forward or backward :

  - htx_get_prev() and htx_get_next() return, respectively, the position of
    the previous block or the next block, given a specific position. Or -1 if
    an edge is reached.

  - htx_get_prev_blk() and htx_get_next_blk() return, respectively, the
    previous block or the next one, given a specific block. Or NULL if an edge
    is reached.

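Iterating by position may be sketched as follows. The toy_* names are
hypothetical and the model only keeps the head and tail positions. It follows
the convention described above: -1 is returned when an edge is reached, and an
empty message has no valid position.

```c
#include <assert.h>
#include <stdint.h>

struct toy_htx {
	int32_t head; /* position of the oldest block, -1 if empty */
	int32_t tail; /* position of the newest block, -1 if empty */
};

/* Position following <pos>, or -1 if <pos> is the tail. */
static int32_t toy_get_next(const struct toy_htx *htx, int32_t pos)
{
	assert(pos >= htx->head && pos <= htx->tail);
	return (pos == htx->tail) ? -1 : pos + 1;
}

/* Position preceding <pos>, or -1 if <pos> is the head. */
static int32_t toy_get_prev(const struct toy_htx *htx, int32_t pos)
{
	assert(pos >= htx->head && pos <= htx->tail);
	return (pos == htx->head) ? -1 : pos - 1;
}

/* Walks the message from the head to the tail and counts the positions. */
static int toy_count_forward(const struct toy_htx *htx)
{
	int n = 0;

	for (int32_t pos = htx->head; pos != -1; pos = toy_get_next(htx, pos))
		n++;
	return n;
}

/* Walks the message from the tail to the head and counts the positions. */
static int toy_count_backward(const struct toy_htx *htx)
{
	int n = 0;

	for (int32_t pos = htx->tail; pos != -1; pos = toy_get_prev(htx, pos))
		n++;
	return n;
}
```

Both loops stop immediately on an empty message because its head and tail are
already -1.
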
4.6. Access block content and info

The following functions may be used to retrieve information about a specific HTX
block :

  - htx_get_blk_pos() returns the position of a block. It must be in the HTX
    message.

  - htx_get_blk_ptr() returns a pointer on the payload of a block.

  - htx_get_blk_type() returns the type of a block.

  - htx_get_blksz() returns the payload size of a block.

  - htx_get_blk_name() returns the name of a block, only if it is a header or
    a trailer. Otherwise, it returns an empty string.

  - htx_get_blk_value() returns the value of a block, depending on its
    type. For header and trailer blocks, it is the value field. For markers
    (EOH or EOT), an empty string is returned. For other blocks an ist
    pointing on the block payload is returned.

  - htx_is_unique_blk() may be used to know if a block is the only one
    remaining inside an HTX message, excluding unused blocks. This function is
    pretty useful to determine the end of an HTX message, in conjunction with
    HTX_FL_EOM flag.

4.7. Advanced functions

Some more advanced functions may be used to do complex processing on the HTX
message. These functions are used by HTX analyzers or by multiplexers.

  - htx_truncate() removes all blocks after the one containing a specific
    offset relative to the head block of the HTX message. If the offset is
    inside a DATA block, it is truncated. For all other blocks, the removal
    starts at the next block.

  - htx_drain() tries to remove a specific amount of bytes of payload. If the
    tail block is a DATA block, it may be truncated if necessary. All other
    blocks are removed at once or kept. This function returns a mixed value,
    with the first block not removed, or NULL if everything was removed, and
    the amount of data drained.

  - htx_xfer_blks() transfers HTX blocks from an HTX message to another,
    stopping on the first block of a specified type or when a specific amount
    of bytes, including meta-data, was moved. If the tail block is a DATA
    block, it may be partially moved. All other blocks are transferred at once
    or kept. This function returns a mixed value, with the last block moved,
    or NULL if nothing was moved, and the amount of data transferred. When
    HEADERS or TRAILERS blocks must be transferred, this function transfers
    all of them. Otherwise, if it is not possible, it triggers an error. It is
    the caller's responsibility to transfer all headers or trailers at once.

  - htx_append_msg() appends an HTX message to another one. The whole message
    is copied or nothing is. So, if an error occurred, a rollback is
    performed. This function returns 1 on success and 0 on error.

  - htx_reserve_max_data() reserves the maximum possible size for an HTX data
    block, by extending an existing one or by creating a new one. It returns a
    compound result with the HTX block and the position where new data must be
    inserted (0 for a new block). If an error occurs or if there is no space
    left, NULL is returned instead of a pointer on an HTX block.

  - htx_find_offset() looks for the HTX block containing a specific offset,
    starting at the HTX message's head. The function returns the found HTX
    block and the position inside this block where the offset is. If the
    offset is outside of the HTX message, NULL is returned.

  - htx_defrag() defragments an HTX message. It removes unused blocks and
    unwraps the payloads part. A temporary buffer is used to do so. This
    function never fails. A referenced block may be provided. If so, the
    corresponding new block is returned. Otherwise, NULL is returned.
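
The lookup described for htx_find_offset() may be sketched on a plain array of
block sizes. The names toy_ret and toy_find_offset() are hypothetical. In this
simplified model, index 0 of the array stands for the head block and a position
of -1 plays the role of the NULL block returned when the offset is outside of
the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_ret {
	int32_t  pos; /* index of the block found, -1 if none */
	uint32_t off; /* offset inside this block */
};

/* Looks for the block containing <offset>, walking <nblk> block sizes from
 * the head. The offset is reduced by the size of each block skipped.
 */
static struct toy_ret toy_find_offset(const uint32_t *blksz, int32_t nblk,
                                      uint32_t offset)
{
	struct toy_ret ret = { -1, 0 };

	assert(blksz != NULL || nblk == 0);
	for (int32_t pos = 0; pos < nblk; pos++) {
		if (offset < blksz[pos]) {
			ret.pos = pos;
			ret.off = offset;
			break;
		}
		offset -= blksz[pos];
	}
	return ret;
}
```

With blocks of 10, 20 and 30 bytes, offset 25 falls at byte 15 of the second
block, while offset 60 is outside of the message.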