| 1 | ----------------------------------------- |
| 2 | Filters Guide - version 1.7 |
| 3 | ( Last update: 2016-04-18 ) |
| 4 | ------------------------------------------ |
| 5 | Author : Christopher Faulet |
| 6 | Contact : christopher dot faulet at capflam dot org |
| 7 | |
| 8 | |
| 9 | ABSTRACT |
| 10 | -------- |
| 11 | |
| 12 | The filters support is a new feature of HAProxy 1.7. It is a way to extend |
| 13 | HAProxy without touching its core code and, to a certain extent, without knowing |
| 14 | its internals. This feature will ease contributions by reducing the impact of |
| 15 | changes. Another advantage will be to simplify HAProxy by replacing some parts |
| 16 | with filters. As we will see, HTTP compression is the first feature moved |
| 17 | into a filter. |
| 18 | |
| 19 | This document describes how to write a filter and what you have to keep in mind |
| 20 | to do so. It also talks about the known limits and the pitfalls to avoid. |
| 21 | |
| 22 | As said, filters are still quite new. The API is not frozen and will be |
| 23 | updated/modified/improved/extended as needed. |
| 24 | |
| 25 | |
| 26 | |
| 27 | SUMMARY |
| 28 | ------- |
| 29 | |
| 30 | 1. Filters introduction |
| 31 | 2. How to use filters |
| 32 | 3. How to write a new filter |
| 33 | 3.1. API Overview |
| 34 | 3.2. Defining the filter name and its configuration |
| 35 | 3.3. Managing the filter lifecycle |
| 36 | 3.4. Handling the streams creation and destruction |
| 37 | 3.5. Analyzing the channels activity |
| 38 | 3.6. Filtering the data exchanged |
| 39 | 4. FAQ |
| 40 | |
| 41 | |
| 42 | |
| 43 | 1. FILTERS INTRODUCTION |
| 44 | ----------------------- |
| 45 | |
| 46 | First of all, to fully understand how filters work and how to create one, it is |
| 47 | best to know, at least from a distance, what a proxy (frontend/backend), a |
| 48 | stream and a channel are in HAProxy and how these entities are linked to each |
| 49 | other. doc/internals/entities.pdf is a good overview. |
| 50 | |
| 51 | Then, to support filters, many callbacks have been added to HAProxy at different |
| 52 | places, mainly around channel analyzers. Their purpose is to allow filters to |
| 53 | be involved in the data processing, from the stream creation/destruction to |
| 54 | the data forwarding. Depending on what it should do, a filter can implement all |
| 55 | or part of these callbacks. For now, existing callbacks are focused on |
| 56 | streams. But future improvements could enlarge the scope of filters. For |
| 57 | example, it could be useful to handle events at the connection level. |
| 58 | |
| 59 | In the HAProxy configuration file, a filter is declared in a proxy section, |
| 60 | except defaults. So the configuration corresponding to a filter declaration is |
| 61 | attached to a specific proxy, and will be shared by all its instances. It is |
| 62 | opaque from the HAProxy point of view: it is the filter's responsibility to |
| 63 | manage it. Each filter declaration matches a unique configuration. Several |
| 64 | declarations of the same filter in the same proxy will be handled as different |
| 65 | filters by HAProxy. |
| 66 | |
| 67 | A filter instance is represented by a partially opaque context (or a state) |
| 68 | attached to a stream and passed as arguments to callbacks. Through this context, |
| 69 | filter instances are stateful. Depending on whether the filter is declared in a |
| 70 | frontend or a backend section, its instances will be created, respectively, when |
| 71 | a stream is created or when a backend is selected. Their behaviors will also be |
| 72 | different. Only instances of filters declared in a frontend section will be |
| 73 | aware of the creation and the destruction of the stream, and will take part in |
| 74 | the channels analyzing before the backend is defined. |
| 75 | |
| 76 | It is important to remember that the configuration of a filter is shared by all |
| 77 | its instances, while the context of an instance is owned by a single stream. |
| 78 | |
| 79 | Filters are designed to be chained. It is possible to declare several filters in |
| 80 | the same proxy section. The declaration order is important because filters will |
| 81 | be called one after the other respecting this order. Frontend and backend |
| 82 | filters are also chained, frontend ones called first. Even if the filters |
| 83 | processing is serialized, each filter will behave as if it were alone (unless it |
| 84 | was developed to be aware of other filters). For all that, some constraints are |
| 85 | imposed on filters, especially when data exchanged between the client and the |
| 86 | server are processed. We will discuss these constraints again when we tackle |
| 87 | the subject of writing a filter. |
| 88 | |
| 89 | |
| 90 | |
| 91 | 2. HOW TO USE FILTERS |
| 92 | --------------------- |
| 93 | |
| 94 | To use a filter, you must use the parameter 'filter' followed by the filter name |
| 95 | and, optionally, its configuration in the desired listen, frontend or backend |
| 96 | section. For example: |
| 97 | |
| 98 | listen test |
| 99 | ... |
| 100 | filter trace name TST |
| 101 | ... |
| 102 | |
| 103 | |
| 104 | See doc/configuration.txt for a formal definition of the parameter 'filter'. |
| 105 | Note that additional parameters on the filter line must be parsed by the filter |
| 106 | itself. |
| 107 | |
| 108 | The list of available filters is reported by 'haproxy -vv': |
| 109 | |
| 110 | $> haproxy -vv |
| 111 | HA-Proxy version 1.7-dev2-3a1d4a-33 2016/03/21 |
| 112 | Copyright 2000-2016 Willy Tarreau <willy@haproxy.org> |
| 113 | |
| 114 | [...] |
| 115 | |
| 116 | Available filters : |
| 117 | [COMP] compression |
| 118 | [TRACE] trace |
| 119 | |
| 120 | |
| 121 | Multiple filter lines can be used in a proxy section to chain filters. Filters |
| 122 | will be called in the declaration order. |
| 123 | |
| 124 | Some filters can support implicit declarations in certain circumstances |
| 125 | (without the filter line). This is not recommended for new features but is |
| 126 | useful for existing ones moved into a filter, for backward compatibility |
| 127 | reasons. Implicit declarations are supported when there is only one filter used |
| 128 | on a proxy. When several filters are used, explicit declarations are mandatory. |
| 129 | The HTTP compression filter is one of these filters. Alone, using 'compression' |
| 130 | keywords is enough to use it. But when at least a second filter is used, a |
| 131 | filter line must be added. |
| 132 | |
| 133 | # filter line is optional |
| 134 | listen t1 |
| 135 | bind *:80 |
| 136 | compression algo gzip |
| 137 | compression offload |
| 138 | server srv x.x.x.x:80 |
| 139 | |
| 140 | # filter line is mandatory for the compression filter |
| 141 | listen t2 |
| 142 | bind *:81 |
| 143 | filter trace name T2 |
| 144 | filter compression |
| 145 | compression algo gzip |
| 146 | compression offload |
| 147 | server srv x.x.x.x:80 |
| 148 | |
| 149 | |
| 150 | |
| 151 | |
| 152 | 3. HOW TO WRITE A NEW FILTER |
| 153 | ---------------------------- |
| 154 | |
| 155 | If you want to write a filter, there are 2 header files that you must know: |
| 156 | |
| 157 | * include/types/filters.h: This is the main header file, containing all |
| 158 | important structures you will use. It represents |
| 159 | the filter API. |
| 160 | * include/proto/filters.h: This header file contains helper functions that |
| 161 | you may need to use. It also contains the internal |
| 162 | API used by HAProxy to handle filters. |
| 163 | |
| 164 | To ease the filters integration, it is better to follow some conventions: |
| 165 | |
| 166 | * Use 'flt_' prefix to name your filter (e.g: flt_http_comp or flt_trace). |
| 167 | * Keep everything related to your filter in the same file. |
| 168 | |
| 169 | The filter 'trace' can be used as a template to write your own filter. It is a |
| 170 | good start to see how filters really work. |
| 171 | |
| 172 | 3.1. API OVERVIEW |
| 173 | ----------------- |
| 174 | |
| 175 | Writing a filter comes down to writing functions and attaching them to the |
| 176 | existing callbacks. Available callbacks are listed in the following structure: |
| 177 | |
| 178 | struct flt_ops { |
| 179 | /* |
| 180 | * Callbacks to manage the filter lifecycle |
| 181 | */ |
| 182 | int (*init) (struct proxy *p, struct flt_conf *fconf); |
| 183 | void (*deinit)(struct proxy *p, struct flt_conf *fconf); |
| 184 | int (*check) (struct proxy *p, struct flt_conf *fconf); |
| 185 | |
| 186 | /* |
| 187 | * Stream callbacks |
| 188 | */ |
| 189 | int (*stream_start) (struct stream *s, struct filter *f); |
| 190 | void (*stream_stop) (struct stream *s, struct filter *f); |
| 191 | |
| 192 | /* |
| 193 | * Channel callbacks |
| 194 | */ |
| 195 | int (*channel_start_analyze)(struct stream *s, struct filter *f, |
| 196 | struct channel *chn); |
| 197 | int (*channel_analyze) (struct stream *s, struct filter *f, |
| 198 | struct channel *chn, |
| 199 | unsigned int an_bit); |
| 200 | int (*channel_end_analyze) (struct stream *s, struct filter *f, |
| 201 | struct channel *chn); |
| 202 | |
| 203 | /* |
| 204 | * HTTP callbacks |
| 205 | */ |
| 206 | int (*http_data) (struct stream *s, struct filter *f, |
| 207 | struct http_msg *msg); |
| 208 | int (*http_chunk_trailers)(struct stream *s, struct filter *f, |
| 209 | struct http_msg *msg); |
| 210 | int (*http_end) (struct stream *s, struct filter *f, |
| 211 | struct http_msg *msg); |
| 212 | int (*http_forward_data) (struct stream *s, struct filter *f, |
| 213 | struct http_msg *msg, |
| 214 | unsigned int len); |
| 215 | |
| 216 | void (*http_reset) (struct stream *s, struct filter *f, |
| 217 | struct http_msg *msg); |
| 218 | void (*http_reply) (struct stream *s, struct filter *f, |
| 219 | short status, |
| 220 | const struct chunk *msg); |
| 221 | |
| 222 | /* |
| 223 | * TCP callbacks |
| 224 | */ |
| 225 | int (*tcp_data) (struct stream *s, struct filter *f, |
| 226 | struct channel *chn); |
| 227 | int (*tcp_forward_data)(struct stream *s, struct filter *f, |
| 228 | struct channel *chn, |
| 229 | unsigned int len); |
| 230 | }; |
| 231 | |
| 232 | |
| 233 | We will explain in following parts when these callbacks are called and what they |
| 234 | should do. |
| 235 | |
| 236 | Filters are declared in proxy sections. So each proxy has an ordered list of |
| 237 | filters, possibly empty if no filter is used. When the configuration of a proxy |
| 238 | is parsed, each filter line represents an entry in this list. In the structure |
| 239 | 'proxy', the filters configurations are stored in the field 'filter_configs', |
| 240 | each one of type 'struct flt_conf *': |
| 241 | |
| 242 | /* |
| 243 | * Structure representing the filter configuration, attached to a proxy and |
| 244 | * accessible from a filter when instantiated in a stream |
| 245 | */ |
| 246 | struct flt_conf { |
| 247 | const char *id; /* The filter id */ |
| 248 | struct flt_ops *ops; /* The filter callbacks */ |
| 249 | void *conf; /* The filter configuration */ |
| 250 | struct list list; /* Next filter for the same proxy */ |
| 251 | }; |
| 252 | |
| 253 | * 'flt_conf.id' is an identifier, defined by the filter. It can be |
| 254 | NULL. HAProxy does not use this field. Filters can use it in log messages or |
| 255 | as a unique identifier to check multiple declarations. It is the filter's |
| 256 | responsibility to free it, if necessary. |
| 257 | |
| 258 | * 'flt_conf.conf' is opaque. It is the internal configuration of a filter, |
| 259 | generally allocated and filled by its parsing function (See § 3.2). It is |
| 260 | the filter's responsibility to free it. |
| 261 | |
| 262 | * 'flt_conf.ops' references the callbacks implemented by the filter. This |
| 263 | field must be set during the parsing phase (See § 3.2) and can be refined |
| 264 | during the initialization phase (See § 3.3). If it is dynamically allocated, |
| 265 | it is the filter's responsibility to free it. |
| 266 | |
| 267 | |
| 268 | The filter configuration is global and shared by all its instances. A filter |
| 269 | instance is created in the context of a stream and attached to this stream. In |
| 270 | the structure 'stream', the field 'strm_flt' is the state of all filter |
| 271 | instances attached to a stream: |
| 272 | |
| 273 | /* |
| 274 | * Structure representing the "global" state of filters attached to a |
| 275 | * stream. |
| 276 | */ |
| 277 | struct strm_flt { |
| 278 | struct list filters; /* List of filters attached to a stream */ |
| 279 | struct filter *current[2]; /* From which filter resume processing, for a specific channel. |
| 280 | * This is used for resumable callbacks only, |
| 281 | * If NULL, we start from the first filter. |
| 282 | * 0: request channel, 1: response channel */ |
| 283 | unsigned short flags; /* STRM_FL_* */ |
| 284 | unsigned char nb_req_data_filters; /* Number of data filters registered on the request channel */ |
| 285 | unsigned char nb_rsp_data_filters; /* Number of data filters registered on the response channel */ |
| 286 | }; |
| 287 | |
| 288 | |
| 289 | Filter instances attached to a stream are stored in the field |
| 290 | 'strm_flt.filters', each instance is of type 'struct filter *': |
| 291 | |
| 292 | /* |
| 293 | * Structure representing a filter instance attached to a stream |
| 294 | * |
| 295 | * 2D-Array fields are used to store info per channel. The first index |
| 296 | * stands for the request channel, and the second one for the response |
| 297 | * channel. Especially, <next> and <fwd> are offsets representing the amount of |
| 298 | * data that the filter has, respectively, parsed and forwarded on a |
| 299 | * channel. Filters can access these values using FLT_NXT and FLT_FWD |
| 300 | * macros. |
| 301 | */ |
| 302 | struct filter { |
| 303 | struct flt_conf *config; /* the filter's configuration */ |
| 304 | void *ctx; /* The filter context (opaque) */ |
| 305 | unsigned short flags; /* FLT_FL_* */ |
| 306 | unsigned int next[2]; /* Offset, relative to buf->p, to the next |
| 307 | * byte to parse for a specific channel |
| 308 | * 0: request channel, 1: response channel */ |
| 309 | unsigned int fwd[2]; /* Offset, relative to buf->p, to the next |
| 310 | * byte to forward for a specific channel |
| 311 | * 0: request channel, 1: response channel */ |
| 312 | struct list list; /* Next filter for the same proxy/stream */ |
| 313 | }; |
| 314 | |
| 315 | * 'filter.config' is the filter configuration previously described. All |
| 316 | instances of a filter share it. |
| 317 | |
| 318 | * 'filter.ctx' is an opaque context. It is managed by the filter, so it is its |
| 319 | responsibility to free it. |
| 320 | |
| 321 | * 'filter.next' and 'filter.fwd' will be described later (See § 3.6). |
| 322 | |
| 323 | |
| 324 | 3.2. DEFINING THE FILTER NAME AND ITS CONFIGURATION |
| 325 | --------------------------------------------------- |
| 326 | |
| 327 | When you write a filter, the first thing to do is to add it in the supported |
| 328 | filters. To do so, you must register its name as a valid keyword on the filter |
| 329 | line: |
| 330 | |
| 331 | /* Declare the filter parser for "my_filter" keyword */ |
| 332 | static struct flt_kw_list flt_kws = { "MY_FILTER_SCOPE", { }, { |
| 333 | { "my_filter", parse_my_filter_cfg }, |
| 334 | { NULL, NULL }, |
| 335 | } |
| 336 | }; |
| 337 | |
| 338 | __attribute__((constructor)) |
| 339 | static void |
| 340 | __my_filter_init(void) |
| 341 | { |
| 342 | flt_register_keywords(&flt_kws); |
| 343 | } |
| 344 | |
| 345 | |
| 346 | Then you must define the internal configuration your filter will use. For |
| 347 | example: |
| 348 | |
| 349 | struct my_filter_config { |
| 350 | struct proxy *proxy; |
| 351 | char *name; |
| 352 | /* ... */ |
| 353 | }; |
| 354 | |
| 355 | |
| 356 | You also must list all callbacks implemented by your filter. Here, we use a |
| 357 | global variable: |
| 358 | |
| 359 | struct flt_ops my_filter_ops = { |
| 360 | .init = my_filter_init, |
| 361 | .deinit = my_filter_deinit, |
| 362 | .check = my_filter_config_check, |
| 363 | |
| 364 | /* ... */ |
| 365 | }; |
| 366 | |
| 367 | |
| 368 | Finally, you must define the function to parse your filter configuration, here |
| 369 | 'parse_my_filter_cfg'. This function must parse all remaining keywords on the |
| 370 | filter line: |
| 371 | |
| 372 | /* Return -1 on error, else 0 */ |
| 373 | static int |
| 374 | parse_my_filter_cfg(char **args, int *cur_arg, struct proxy *px, |
| 375 | struct flt_conf *flt_conf, char **err) |
| 376 | { |
| 377 | struct my_filter_config *my_conf; |
| 378 | int pos = *cur_arg; |
| 379 | |
| 380 | /* Allocate the internal configuration used by the filter */ |
| 381 | my_conf = calloc(1, sizeof(*my_conf)); |
| 382 | if (!my_conf) { |
| 383 | memprintf(err, "%s: out of memory", args[*cur_arg]); |
| 384 | return -1; |
| 385 | } |
| 386 | my_conf->proxy = px; |
| 387 | |
| 388 | /* ... */ |
| 389 | |
| 390 | /* Parse all keywords supported by the filter and fill the internal |
| 391 | * configuration */ |
| 392 | pos++; /* Skip the filter name */ |
| 393 | while (*args[pos]) { |
| 394 | if (!strcmp(args[pos], "name")) { |
| 395 | if (!*args[pos + 1]) { |
| 396 | memprintf(err, "'%s' : '%s' option without value", |
| 397 | args[*cur_arg], args[pos]); |
| 398 | goto error; |
| 399 | } |
| 400 | my_conf->name = strdup(args[pos + 1]); |
| 401 | if (!my_conf->name) { |
| 402 | memprintf(err, "%s: out of memory", args[*cur_arg]); |
| 403 | goto error; |
| 404 | } |
| 405 | pos += 2; |
| 406 | } |
| 407 | |
| 408 | /* ... parse other keywords ... */ |
| 409 | } |
| 410 | *cur_arg = pos; |
| 411 | |
| 412 | /* Set callbacks supported by the filter */ |
| 413 | flt_conf->ops = &my_filter_ops; |
| 414 | |
| 415 | /* Last, save the internal configuration */ |
| 416 | flt_conf->conf = my_conf; |
| 417 | return 0; |
| 418 | |
| 419 | error: |
| 420 | if (my_conf->name) |
| 421 | free(my_conf->name); |
| 422 | free(my_conf); |
| 423 | return -1; |
| 424 | } |
| 425 | |
| 426 | |
| 427 | WARNING: In your parsing function, you must define 'flt_conf->ops'. You must |
| 428 | also parse all arguments on the filter line. This is mandatory. |
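
The argument-walking rule above can be tried outside HAProxy. The following is
only a minimal standalone sketch: 'struct demo_config' and 'demo_parse_args'
are hypothetical names, error messages are omitted, and rejecting unknown
keywords is an assumption made here so that the loop always terminates. It is
not the parsing function of any existing filter.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical configuration, mirroring 'struct my_filter_config' above. */
struct demo_config {
    char *name;
};

/* Walk the filter line starting at *cur_arg (the filter name itself).
 * 'args' must end with an empty string. On success, *cur_arg points past the
 * last consumed word and 0 is returned. On error, -1 is returned and
 * *cur_arg is left unchanged. */
static int demo_parse_args(char **args, int *cur_arg, struct demo_config *conf)
{
    int pos = *cur_arg + 1; /* skip the filter name */

    while (*args[pos]) {
        if (!strcmp(args[pos], "name")) {
            size_t len;
            char *copy;

            if (!*args[pos + 1])
                goto error; /* option without value */
            len = strlen(args[pos + 1]) + 1;
            copy = malloc(len);
            if (!copy)
                goto error;
            memcpy(copy, args[pos + 1], len);
            free(conf->name); /* keep only the last occurrence */
            conf->name = copy;
            pos += 2;
        }
        else
            goto error; /* unknown keyword (assumed to be an error here) */
    }
    *cur_arg = pos;
    return 0;

error:
    free(conf->name);
    conf->name = NULL;
    return -1;
}
```

With the words "my_filter", "name", "TST" and a terminating empty string, the
function returns 0 and leaves *cur_arg on the terminating word, which is how
the caller can tell that the whole filter line was consumed.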
| 429 | |
| 430 | In the previous example, we expect to read a filter line as follows: |
| 431 | |
| 432 | filter my_filter name MY_NAME ... |
| 433 | |
| 434 | |
| 435 | Optionally, by implementing the 'flt_ops.check' callback, you add a step to |
| 436 | check the internal configuration of your filter after the parsing phase, when |
| 437 | the HAProxy configuration is fully defined. For example: |
| 438 | |
| 439 | /* Check configuration of a trace filter for a specified proxy. |
| 440 | * Return 1 on error, else 0. */ |
| 441 | static int |
| 442 | my_filter_config_check(struct proxy *px, struct flt_conf *my_conf) |
| 443 | { |
| 444 | if (px->mode != PR_MODE_HTTP) { |
| 445 | Alert("The filter 'my_filter' cannot be used in non-HTTP mode.\n"); |
| 446 | return 1; |
| 447 | } |
| 448 | |
| 449 | /* ... */ |
| 450 | |
| 451 | return 0; |
| 452 | } |
| 453 | |
| 454 | |
| 455 | |
| 456 | 3.3. MANAGING THE FILTER LIFECYCLE |
| 457 | ---------------------------------- |
| 458 | |
| 459 | Once the configuration is parsed and checked, filters are ready to be used. There |
| 460 | are two callbacks to manage the filter lifecycle: |
| 461 | |
| 462 | * 'flt_ops.init': It initializes the filter for a proxy. You may define this |
| 463 | callback if you need to complete your filter configuration. |
| 464 | |
| 465 | * 'flt_ops.deinit': It cleans up what the parsing function and the init |
| 466 | callback have done. This callback is useful to release |
| 467 | memory allocated for the filter configuration. |
| 468 | |
| 469 | Here is an example: |
| 470 | |
| 471 | /* Initialize the filter. Returns -1 on error, else 0. */ |
| 472 | static int |
| 473 | my_filter_init(struct proxy *px, struct flt_conf *fconf) |
| 474 | { |
| 475 | struct my_filter_config *my_conf = fconf->conf; |
| 476 | |
| 477 | /* ... */ |
| 478 | |
| 479 | return 0; |
| 480 | } |
| 481 | |
| 482 | /* Free resources allocated by the filter. */ |
| 483 | static void |
| 484 | my_filter_deinit(struct proxy *px, struct flt_conf *fconf) |
| 485 | { |
| 486 | struct my_filter_config *my_conf = fconf->conf; |
| 487 | |
| 488 | if (my_conf) { |
| 489 | free(my_conf->name); |
| 490 | /* ... */ |
| 491 | free(my_conf); |
| 492 | } |
| 493 | fconf->conf = NULL; |
| 494 | } |
| 495 | |
| 496 | |
| 497 | TODO: Add callbacks to handle creation/destruction of filter instances. And |
| 498 | document it. |
| 499 | |
| 500 | |
| 501 | 3.4. HANDLING THE STREAMS CREATION AND DESTRUCTION |
| 502 | -------------------------------------------------- |
| 503 | |
| 504 | You may be interested in handling stream creation and destruction. If so, you |
| 505 | must define the following callbacks: |
| 506 | |
| 507 | * 'flt_ops.stream_start': It is called when a stream is started. This callback |
| 508 | can fail by returning a negative value. It will be |
| 509 | considered as a critical error by HAProxy which |
| 510 | disables the listener for a short time. |
| 511 | |
| 512 | * 'flt_ops.stream_stop': It is called when a stream is stopped. This callback |
| 513 | always succeeds. Anyway, it is too late to return an |
| 514 | error. |
| 515 | |
| 516 | For example: |
| 517 | |
| 518 | /* Called when a stream is created. Returns -1 on error, else 0. */ |
| 519 | static int |
| 520 | my_filter_stream_start(struct stream *s, struct filter *filter) |
| 521 | { |
| 522 | struct my_filter_config *my_conf = FLT_CONF(filter); |
| 523 | |
| 524 | /* ... */ |
| 525 | |
| 526 | return 0; |
| 527 | } |
| 528 | |
| 529 | /* Called when a stream is destroyed */ |
| 530 | static void |
| 531 | my_filter_stream_stop(struct stream *s, struct filter *filter) |
| 532 | { |
| 533 | struct my_filter_config *my_conf = FLT_CONF(filter); |
| 534 | |
| 535 | /* ... */ |
| 536 | } |
| 537 | |
| 538 | |
| 539 | WARNING: Handling the streams creation and destruction is only possible for |
| 540 | filters defined on proxies with the frontend capability. |
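
A common use of these two callbacks is to allocate and release the per-stream
context. The sketch below is only an illustration under stated assumptions:
'struct demo_flt_conf', 'struct demo_filter' and 'struct demo_ctx' are reduced,
hypothetical stand-ins for the structures of § 3.1, and the callbacks take only
the filter argument so that the code builds without the HAProxy headers.

```c
#include <stdlib.h>

/* Hypothetical, reduced stand-ins for 'struct flt_conf' and 'struct filter'. */
struct demo_flt_conf {
    void *conf;                   /* shared by all instances */
};

struct demo_filter {
    struct demo_flt_conf *config; /* same pointer for every instance */
    void *ctx;                    /* owned by exactly one stream */
};

/* Hypothetical per-stream state kept by the filter. */
struct demo_ctx {
    unsigned int nb_calls;
};

/* Same contract as 'flt_ops.stream_start': negative value on error. */
static int demo_stream_start(struct demo_filter *filter)
{
    struct demo_ctx *ctx = calloc(1, sizeof(*ctx));

    if (!ctx)
        return -1;
    filter->ctx = ctx;
    return 0;
}

/* Same contract as 'flt_ops.stream_stop': it cannot fail. */
static void demo_stream_stop(struct demo_filter *filter)
{
    free(filter->ctx);   /* the context dies with the stream */
    filter->ctx = NULL;  /* the shared configuration is left untouched */
}
```

Two instances created from the same declaration end up with the same 'config'
pointer but distinct 'ctx' pointers, which is the shared-configuration versus
per-stream-context split described in § 1.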
| 541 | |
| 542 | |
| 543 | 3.5. ANALYZING THE CHANNELS ACTIVITY |
| 544 | ------------------------------------ |
| 545 | |
| 546 | The main purpose of filters is to take part in the channels analyzing. To do so, |
| 547 | there is a callback, 'flt_ops.channel_analyze', called before each analyzer |
| 548 | attached to a channel, except analyzers responsible for the data |
| 549 | parsing/forwarding (TCP data or HTTP body). Concretely, on the request channel, |
| 550 | 'flt_ops.channel_analyze' could be called before following analyzers: |
| 551 | |
| 552 | * tcp_inspect_request (AN_REQ_INSPECT_FE and AN_REQ_INSPECT_BE) |
| 553 | * http_wait_for_request (AN_REQ_WAIT_HTTP) |
| 554 | * http_wait_for_request_body (AN_REQ_HTTP_BODY) |
| 555 | * http_process_req_common (AN_REQ_HTTP_PROCESS_FE) |
| 556 | * process_switching_rules (AN_REQ_SWITCHING_RULES) |
| 557 | * http_process_req_common (AN_REQ_HTTP_PROCESS_BE) |
| 558 | * http_process_tarpit (AN_REQ_HTTP_TARPIT) |
| 559 | * process_server_rules (AN_REQ_SRV_RULES) |
| 560 | * http_process_request (AN_REQ_HTTP_INNER) |
| 561 | * tcp_persist_rdp_cookie (AN_REQ_PRST_RDP_COOKIE) |
| 562 | * process_sticking_rules (AN_REQ_STICKING_RULES) |
| 563 | * flt_analyze_http_headers (AN_FLT_HTTP_HDRS) |
| 564 | |
| 565 | And on the response channel: |
| 566 | |
| 567 | * tcp_inspect_response (AN_RES_INSPECT) |
| 568 | * http_wait_for_response (AN_RES_WAIT_HTTP) |
| 569 | * process_store_rules (AN_RES_STORE_RULES) |
| 570 | * http_process_res_common (AN_RES_HTTP_PROCESS_BE) |
| 571 | * flt_analyze_http_headers (AN_FLT_HTTP_HDRS) |
| 572 | |
| 573 | Note that 'flt_analyze_http_headers' (AN_FLT_HTTP_HDRS) is a new analyzer. It |
| 574 | has been added to let filters analyze HTTP headers after all processing, just |
| 575 | before the data parsing/forwarding. |
| 576 | |
| 577 | Unlike the callbacks seen so far, 'flt_ops.channel_analyze' can |
| 578 | interrupt the stream processing. So a filter can decide to not execute the |
| 579 | analyzer that follows and wait for the next iteration. If there is more than one |
| 580 | filter, following ones are skipped. On the next iteration, the filtering resumes |
| 581 | where it was stopped, i.e. on the filter that has previously stopped the |
| 582 | processing. So it is possible for a filter to stop the stream processing for a |
| 583 | while before continuing. For example: |
| 584 | |
| 585 | /* Called before a processing happens on a given channel. |
| 586 | * Returns a negative value if an error occurs, 0 if it needs to wait, |
| 587 | * any other value otherwise. */ |
| 588 | static int |
| 589 | my_filter_chn_analyze(struct stream *s, struct filter *filter, |
| 590 | struct channel *chn, unsigned an_bit) |
| 591 | { |
| 592 | struct my_filter_config *my_conf = FLT_CONF(filter); |
| 593 | |
| 594 | switch (an_bit) { |
| 595 | case AN_REQ_WAIT_HTTP: |
| 596 | if (/* wait that a condition is verified before continuing */) |
| 597 | return 0; |
| 598 | break; |
| 599 | /* ... */ |
| 600 | } |
| 601 | return 1; |
| 602 | } |
| 603 | |
| 604 | * 'an_bit' is the analyzer id. All analyzers are listed in |
| 605 | 'include/types/channel.h'. |
| 606 | |
| 607 | * 'chn' is the channel on which the analyzing is done. You can know if it is |
| 608 | the request or the response channel by testing if CF_ISRESP flag is set: |
| 609 | |
| 610 | ((chn->flags & CF_ISRESP) == CF_ISRESP) |
| 611 | |
| 612 | |
| 613 | In the previous example, the stream processing is blocked before receipt of the |
| 614 | HTTP request until a condition is verified. |
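
The interrupt-and-resume behaviour described above can be modelled in a few
lines. This is only a sketch under stated assumptions: 'struct demo_chain' and
its index 'current' are hypothetical and merely play the role of the
'strm_flt.current' pointer of § 3.1. It is not the code HAProxy runs.

```c
#include <stddef.h>

#define DEMO_MAX_FILTERS 8

/* Hypothetical callback: <0 on error, 0 to wait, >0 to continue. */
typedef int (*demo_analyze_cb)(void *ctx);

struct demo_chain {
    demo_analyze_cb cb[DEMO_MAX_FILTERS]; /* filters, in declaration order */
    void *ctx[DEMO_MAX_FILTERS];          /* one context per filter */
    size_t nb;                            /* number of filters */
    size_t current;                       /* where to resume, 0 = first filter */
};

/* Run the filters placed before one analyzer. Returns <0 on error, 0 if a
 * filter asked to wait (the analyzer must not run yet), 1 otherwise. */
static int demo_run_filters(struct demo_chain *chain)
{
    size_t i;

    for (i = chain->current; i < chain->nb; i++) {
        int ret = chain->cb[i](chain->ctx[i]);

        if (ret < 0)
            return ret;
        if (ret == 0) {
            chain->current = i; /* following filters are skipped for now */
            return 0;
        }
    }
    chain->current = 0; /* every filter passed: reset for the next analyzer */
    return 1;
}

/* Example callback: counts its calls and always continues. */
static int demo_count(void *ctx)
{
    ++*(int *)ctx;
    return 1;
}

/* Example callback: waits on its first call, continues afterwards. */
static int demo_wait_once(void *ctx)
{
    int *calls = ctx;

    return (*calls)++ == 0 ? 0 : 1;
}
```

With the chain 'demo_count', 'demo_wait_once', 'demo_count', the first pass
stops on the second filter without calling the third one. The second pass
starts again from the second filter, so the first one is not called twice.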
| 615 | |
| 616 | To surround activity of a filter during the channel analyzing, two new analyzers |
| 617 | have been added: |
| 618 | |
| 619 | * 'flt_start_analyze' (AN_FLT_START_FE/AN_FLT_START_BE): For a specific |
| 620 | filter, this analyzer is called before any call to the 'channel_analyze' |
| 621 | callback. From the filter point of view, it calls the |
| 622 | 'flt_ops.channel_start_analyze' callback. |
| 623 | |
| 624 | * 'flt_end_analyze' (AN_FLT_END): For a specific filter, this analyzer is |
| 625 | called when all other analyzers have finished their processing. From the |
| 626 | filter point of view, it calls the 'flt_ops.channel_end_analyze' callback. |
| 627 | |
| 628 | For TCP streams, these analyzers are called only once. For HTTP streams, if the |
| 629 | client connection is kept alive, this happens at each request/response roundtrip. |
| 630 | |
| 631 | 'flt_ops.channel_start_analyze' and 'flt_ops.channel_end_analyze' callbacks can |
| 632 | interrupt the stream processing, as 'flt_ops.channel_analyze'. Here is an |
| 633 | example: |
| 634 | |
| 635 | /* Called when analyze starts for a given channel |
| 636 | * Returns a negative value if an error occurs, 0 if it needs to wait, |
| 637 | * any other value otherwise. */ |
| 638 | static int |
| 639 | my_filter_chn_start_analyze(struct stream *s, struct filter *filter, |
| 640 | struct channel *chn) |
| 641 | { |
| 642 | struct my_filter_config *my_conf = FLT_CONF(filter); |
| 643 | |
| 644 | /* ... TODO ... */ |
| 645 | |
| 646 | return 1; |
| 647 | } |
| 648 | |
| 649 | /* Called when analyze ends for a given channel |
| 650 | * Returns a negative value if an error occurs, 0 if it needs to wait, |
| 651 | * any other value otherwise. */ |
| 652 | static int |
| 653 | my_filter_chn_end_analyze(struct stream *s, struct filter *filter, |
| 654 | struct channel *chn) |
| 655 | { |
| 656 | struct my_filter_config *my_conf = FLT_CONF(filter); |
| 657 | |
| 658 | /* ... TODO ... */ |
| 659 | |
| 660 | return 1; |
| 661 | } |
| 662 | |
| 663 | |
| 664 | Workflow on channels can be summarized as follows: |
| 665 | |
| 666 | | |
| 667 | +----------+-----------+ |
| 668 | | flt_ops.stream_start | |
| 669 | +----------+-----------+ |
| 670 | | |
| 671 | ... |
| 672 | | |
| 673 | +-<-- [1] +------->---------+ |
| 674 | | --+ | | --+ |
| 675 | +------<----------+ | | +--------<--------+ | |
| 676 | | | | | | | | |
| 677 | V | | | V | | |
| 678 | +-------------------------------+ | | | +-------------------------------+ | | |
| 679 | | flt_start_analyze +-+ | | | flt_start_analyze +-+ | |
| 680 | |(flt_ops.channel_start_analyze)| | F | |(flt_ops.channel_start_analyze)| | |
| 681 | +---------------+---------------+ | R | +---------------+---------------+ | |
| 682 | | | O | | | |
| 683 | +------<--------+ | N ^ +--------<-------+ | B |
| 684 | | | | T | | | | A |
| 685 | +---------------+----------+ | | E | +---------------+----------+ | | C |
| 686 | |+--------------V-----------+ | | N | |+--------------V-----------+ | | K |
| 687 | ||+--------------------------+ | | D | ||+--------------------------+ | | E |
| 688 | ||| flt_ops.channel_analyze | | | | ||| flt_ops.channel_analyze | | | N |
| 689 | +|| V +--+ | | +|| V +---+ | D |
| 690 | +| analyzer | | | +| analyzer | | |
| 691 | +-------------+------------+ | | +-------------+------------+ | |
| 692 | | --+ | | | |
| 693 | +------------>------------+ ... | |
| 694 | | | |
| 695 | [ data filtering (see below) ] | |
| 696 | | | |
| 697 | ... | |
| 698 | | | |
| 699 | +--------<--------+ | |
| 700 | | | | |
| 701 | V | | |
| 702 | +-------------------------------+ | | |
| 703 | | flt_end_analyze +-+ | |
| 704 | | (flt_ops.channel_end_analyze) | | |
| 705 | +---------------+---------------+ | |
| 706 | | --+ |
| 707 | If HTTP stream, go back to [1] --<--+ |
| 708 | | |
| 709 | ... |
| 710 | | |
| 711 | +----------+-----------+ |
| 712 | | flt_ops.stream_stop | |
| 713 | +----------+-----------+ |
| 714 | | |
| 715 | V |
| 716 | |
| 717 | |
| 718 | TODO: Add pre/post analyzer callbacks with a mask. So, this part will be |
| 719 | massively refactored very soon. |
| 720 | |
| 721 | |
| 722 | 3.6. FILTERING THE DATA EXCHANGED |
| 723 | --------------------------------- |
| 724 | |
| 725 | WARNING: To fully understand this part, you must be aware of how the buffers |
| 726 | work in HAProxy. In particular, you must be comfortable with the idea |
| 727 | of circular buffers. See doc/internals/buffer-operations.txt and |
| 728 | doc/internals/buffer-ops.fig for details. |
| 729 | doc/internals/body-parsing.txt could also be useful. |
| 730 | |
| 731 | An extended feature of the filters is the data filtering. By default a filter |
| 732 | does not look into data exchanged between the client and the server because it |
| 733 | is expensive. Indeed, instead of forwarding data without any processing, each |
| 734 | byte needs to be buffered. |
| 735 | |
| 736 | So, to enable the data filtering on a channel, at any time, in one of the previous |
| 737 | callbacks, you should call the 'register_data_filter' function. And conversely, to |
| 738 | disable it, you should call the 'unregister_data_filter' function. For example: |
| 739 | |
| 740 | static int my_filter_chn_analyze(struct stream *s, struct filter *filter, |
| 741 | struct channel *chn, unsigned an_bit) |
| 742 | { |
| 743 | struct my_filter_config *my_conf = FLT_CONF(filter); |
| 744 | |
| 745 | /* 'chn' must be the request channel */ |
| 746 | if (!(chn->flags & CF_ISRESP) && an_bit == AN_FLT_HTTP_HDRS) { |
| 747 | struct http_txn *txn = s->txn; |
| 748 | struct http_msg *msg = &txn->req; |
| 749 | struct buffer *req = msg->chn->buf; |
| 750 | struct hdr_ctx ctx; |
| 751 | |
| 752 | /* Enable the data filtering for the request if 'X-Filter' header |
| 753 | * is set to 'true'. */ |
| 754 | if (http_find_header2("X-Filter", 8, req->p, &txn->hdr_idx, &ctx) && |
| 755 | ctx.vlen >= 4 && memcmp(ctx.line + ctx.val, "true", 4) == 0) |
| 756 | register_data_filter(s, chn, filter); |
| 757 | } |
| 758 | |
| 759 | return 1; |
| 760 | } |
| 761 | |
| 762 | Here, the data filtering is enabled if the HTTP header 'X-Filter' is found and |
| 763 | set to 'true'. |
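
Note that a header value found this way lives inside the buffer and is not
NUL-terminated, so its length has to be checked before its content is compared.
As an illustration only, here is a minimal standalone sketch of such a
comparison. The helper 'value_is_true' is hypothetical and is not part of the
HAProxy API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an HAProxy function): returns 1 if the 'vlen'
 * bytes starting at 'val' are exactly "true", 0 otherwise. 'val' does not
 * need to be NUL-terminated. */
int value_is_true(const char *val, size_t vlen)
{
    static const char expected[] = "true";
    const size_t explen = sizeof(expected) - 1;

    assert(val != NULL);

    /* Check the length first, so that memcmp() never reads past the value */
    return vlen == explen && memcmp(val, expected, explen) == 0;
}
```

With this check, a 3-byte value such as "tru" is rejected without reading the
bytes that follow it in the buffer.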
| 764 | |
| 765 | If several filters are declared, the evaluation order remains the same, |
| 766 | regardless of the order of the registrations to the data filtering. |
| 767 | |
| 768 | Depending on the stream type, TCP or HTTP, the way to handle data filtering will |
| 769 | be slightly different. Among other things, for HTTP streams, there are more |
| 770 | callbacks to help you to fully handle all steps of an HTTP transaction. But the |
| 771 | basis is the same. The data filtering is done in 2 stages: |
| 772 | |
| 773 | * The data parsing: At this stage, filters will analyze input data on a |
| 774 | channel. Once a filter has parsed some data, it cannot parse it again. At |
| 775 | any time, a filter can choose to not parse all available data. So, it is |
| 776 | possible for a filter to retain data for a while. Because filters are |
| 777 | chained, a filter cannot parse more data than its predecessors. Thus only |
| 778 | data considered as parsed by the last filter will be available to the next |
| 779 | stage, the data forwarding. |
| 780 | |
| 781 | * The data forwarding: At this stage, filters will decide how much data |
| 782 | HAProxy can forward among those considered as parsed at the previous |
| 783 | stage. Once a filter has marked data as forwardable, it cannot analyze it |
| 784 | anymore. At any time, a filter can choose to not forward all parsed |
| 785 | data. So, it is possible for a filter to retain data for a while. Because |
| 786 | filters are chained, a filter cannot forward more data than its |
| 787 | predecessors. Thus only data marked as forwardable by the last filter will |
| 788 | be actually forwarded by HAProxy. |
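
To make the chaining rules above more concrete, here is a small standalone
model of the two stages. It is only a sketch: 'struct model_filter',
'model_parse' and 'model_forward' are hypothetical names that do not exist in
HAProxy, and the offsets are cumulative byte counts instead of offsets relative
to 'buf->p':

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a filter: how far it has parsed and
 * forwarded, and the most it accepts to handle per call (0 = no limit). */
struct model_filter {
    unsigned int nxt;         /* bytes already parsed */
    unsigned int fwd;         /* bytes already marked as forwardable */
    unsigned int max_parse;   /* per-call parsing limit */
    unsigned int max_forward; /* per-call forwarding limit */
};

static unsigned int model_clamp(unsigned int len, unsigned int max)
{
    return (max != 0 && len > max) ? max : len;
}

/* Parsing stage: 'input' bytes are available. Returns the number of bytes
 * considered as parsed by the whole chain, i.e. by the last filter. */
unsigned int model_parse(struct model_filter *flt, size_t nb, unsigned int input)
{
    unsigned int limit = input; /* the first filter sees all input data */
    size_t i;

    for (i = 0; i < nb; i++) {
        assert(flt[i].nxt <= limit);
        flt[i].nxt += model_clamp(limit - flt[i].nxt, flt[i].max_parse);
        limit = flt[i].nxt; /* a filter cannot parse more than its predecessor */
    }
    return limit;
}

/* Forwarding stage: 'parsed' is the result of the parsing stage. Returns the
 * number of bytes that may actually be forwarded. */
unsigned int model_forward(struct model_filter *flt, size_t nb, unsigned int parsed)
{
    unsigned int limit = parsed; /* only parsed data can be forwarded */
    size_t i;

    for (i = 0; i < nb; i++) {
        assert(flt[i].fwd <= limit);
        flt[i].fwd += model_clamp(limit - flt[i].fwd, flt[i].max_forward);
        limit = flt[i].fwd; /* a filter cannot forward more than its predecessor */
    }
    return limit;
}
```

With two filters, where the first one parses at most 40 bytes per call and the
second one forwards at most 10 bytes per call, 100 bytes of input lead to 40
parsed bytes and 10 forwardable bytes after the first pass. The remaining data
is retained until the next pass.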
| 789 | |
| 790 | Internally, filters own 2 offsets, relative to 'buf->p', representing the |
| 791 | number of bytes already parsed in the available input data and the number of |
| 792 | bytes considered as forwarded. We will call these offsets, respectively, 'nxt' |
| 793 | and 'fwd'. The following macros reference these offsets: |
| 794 | |
| 795 | * FLT_NXT(flt, chn), flt_req_nxt(flt) and flt_rsp_nxt(flt) |
| 796 | |
| 797 | * FLT_FWD(flt, chn), flt_req_fwd(flt) and flt_rsp_fwd(flt) |
| 798 | |
| 799 | where 'flt' is the 'struct filter' passed as argument in all callbacks and 'chn' |
| 800 | is the considered channel. |
| 801 | |
| 802 | Using these offsets, the following operations on buffers are possible: |
| 803 | |
| 804 |     chn->buf->p + FLT_NXT(flt, chn) // the pointer on parsable data for |
| 805 |                                     // the filter 'flt' on the channel 'chn'. |
| 806 |     // Everything between chn->buf->p and 'nxt' offset was already parsed |
| 807 |     // by the filter. |
| 808 | |
| 809 |     chn->buf->i - FLT_NXT(flt, chn) // the number of bytes of parsable data for |
| 810 |                                     // the filter 'flt' on the channel 'chn'. |
| 811 | |
| 812 |     chn->buf->p + FLT_FWD(flt, chn) // the pointer on forwardable data for |
| 813 |                                     // the filter 'flt' on the channel 'chn'. |
| 814 |     // Everything between chn->buf->p and 'fwd' offset was already forwarded |
| 815 |     // by the filter. |
| 816 | |
| 817 | |
| 818 | Note that at any time, for a filter, 'nxt' offset is always greater than or |
| 819 | equal to 'fwd' offset. |
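
The expressions above are plain pointer arithmetic, so they can only be used
directly when the data does not wrap at the end of the buffer. The following
sketch models a circular buffer to show how an offset such as 'nxt' maps to a
position in the storage area, and how many bytes are contiguous from there.
'struct model_buf' and the 'model_*' functions are assumptions made for this
example, not HAProxy types or functions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a circular buffer holding input data only */
struct model_buf {
    char  *data; /* start of the storage area */
    size_t size; /* total size of the storage area */
    size_t p;    /* index in 'data' of the first byte of input data */
    size_t i;    /* number of bytes of input data */
};

/* Index in 'data' of the byte located 'ofs' bytes after 'p', with wrapping */
size_t model_index(const struct model_buf *buf, size_t ofs)
{
    assert(buf->size > 0 && buf->p < buf->size && ofs <= buf->i);
    return (buf->p + ofs) % buf->size;
}

/* Number of bytes still parsable by a filter whose 'nxt' offset is 'nxt' */
size_t model_parsable(const struct model_buf *buf, size_t nxt)
{
    assert(nxt <= buf->i);
    return buf->i - nxt;
}

/* Number of input bytes readable from offset 'ofs' without wrapping */
size_t model_contig(const struct model_buf *buf, size_t ofs)
{
    size_t idx    = model_index(buf, ofs);
    size_t left   = buf->i - ofs;     /* input bytes after 'ofs' */
    size_t to_end = buf->size - idx;  /* room before the end of the storage */

    return (left < to_end) ? left : to_end;
}
```

For a 16-byte storage area with 10 bytes of input starting at index 12, an
offset of 6 maps to index 2: the data wrapped, and only the first 4 bytes of
input were contiguous.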
| 820 | |
| 821 | TODO: Add schema with buffer states when there are 2 filters that analyze data. |
| 822 | |
| 823 | |
| 824 | 3.6.1 FILTERING DATA ON TCP STREAMS |
| 825 | ----------------------------------- |
| 826 | |
| 827 | The TCP data filtering is the easy case, because HAProxy does not parse these |
| 828 | data. So there are only two callbacks that you need to consider: |
| 829 | |
| 830 | * 'flt_ops.tcp_data': This callback is called when unparsed data are |
| 831 | available. If not defined, all available data will be considered as parsed |
| 832 | for the filter. |
| 833 | |
| 834 | * 'flt_ops.tcp_forward_data': This callback is called when parsed data are |
| 835 | available. If not defined, all parsed data will be considered as forwarded |
| 836 | for the filter. |
| 837 | |
| 838 | Here is an example: |
| 839 | |
| 840 |     /* Returns a negative value if an error occurs, else the number of |
| 841 |      * consumed bytes. */ |
| 842 |     static int |
| 843 |     my_filter_tcp_data(struct stream *s, struct filter *filter, |
| 844 |                        struct channel *chn) |
| 845 |     { |
| 846 |         struct my_filter_config *my_conf = FLT_CONF(filter); |
| 847 |         int avail = chn->buf->i - FLT_NXT(filter, chn); |
| 848 |         int ret   = avail; |
| 849 | |
| 850 |         /* Do not parse more than 'my_conf->max_parse' bytes at a time */ |
| 851 |         if (my_conf->max_parse != 0 && ret > my_conf->max_parse) |
| 852 |             ret = my_conf->max_parse; |
| 853 | |
| 854 |         /* if available data are not completely parsed, wake up the stream to |
| 855 |          * be sure to not freeze it. */ |
| 856 |         if (ret != avail) |
| 857 |             task_wakeup(s->task, TASK_WOKEN_MSG); |
| 858 |         return ret; |
| 859 |     } |
| 860 | |
| 861 | |
| 862 |     /* Returns a negative value if an error occurs, else the number of |
| 863 |      * forwarded bytes. */ |
| 864 |     static int |
| 865 |     my_filter_tcp_forward_data(struct stream *s, struct filter *filter, |
| 866 |                                struct channel *chn, unsigned int len) |
| 867 |     { |
| 868 |         struct my_filter_config *my_conf = FLT_CONF(filter); |
| 869 |         int ret = len; |
| 870 | |
| 871 |         /* Do not forward more than 'my_conf->max_forward' bytes at a time */ |
| 872 |         if (my_conf->max_forward != 0 && ret > my_conf->max_forward) |
| 873 |             ret = my_conf->max_forward; |
| 874 | |
| 875 |         /* if parsed data are not completely forwarded, wake up the stream to |
| 876 |          * be sure to not freeze it. */ |
| 877 |         if (ret != len) |
| 878 |             task_wakeup(s->task, TASK_WOKEN_MSG); |
| 879 |         return ret; |
| 880 |     } |
| 881 | |
| 882 | |
| 883 | |
| 884 | 3.6.2 FILTERING DATA ON HTTP STREAMS |
| 885 | ------------------------------------ |
| 886 | |
| 887 | The HTTP data filtering is a bit tricky because HAProxy will parse the body |
| 888 | structure, especially for chunked bodies. So basically there are the HTTP |
| 889 | counterparts of the previous callbacks: |
| 890 | |
| 891 | * 'flt_ops.http_data': This callback is called when unparsed data are |
| 892 | available. If not defined, all available data will be considered as parsed |
| 893 | for the filter. |
| 894 | |
| 895 | * 'flt_ops.http_forward_data': This callback is called when parsed data are |
| 896 | available. If not defined, all parsed data will be considered as forwarded |
| 897 | for the filter. |
| 898 | |
| 899 | But the prototype for these callbacks is slightly different. Instead of having |
| 900 | the channel as parameter, we have the HTTP message (struct http_msg). You need |
| 901 | to be careful when you use the 'http_msg.chunk_len' value. It is the number of |
| 902 | bytes remaining to parse in the HTTP body (or in the current chunk for chunked |
| 903 | messages). The HTTP parser of HAProxy uses it to compute the number of bytes |
| 904 | that it can consume: |
| 905 | |
| 906 |     /* Available input data in the current chunk from the HAProxy point of view. |
| 907 |      * msg->next bytes were already parsed. Without data filtering, HAProxy |
| 908 |      * will consume all of it. */ |
| 909 |     Bytes = MIN(msg->chunk_len, chn->buf->i - msg->next); |
| 910 | |
| 911 | |
| 912 | But in your filter, you need to recompute it: |
| 913 | |
| 914 |     /* Available input data in the current chunk from the filter point of view. |
| 915 |      * 'nxt' bytes were already parsed. */ |
| 916 |     Bytes = MIN(msg->chunk_len + msg->next, chn->buf->i) - FLT_NXT(flt, chn); |
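
As a worked example of the two formulas above, suppose that 100 bytes of input
are available, that HAProxy has already parsed 20 of them ('msg->next') and that
50 bytes remain in the current chunk. The sketch below only restates the two
computations as standalone functions. The function names and the use of
'unsigned int' for every value are simplifications made for this example:

```c
#include <assert.h>

#ifndef MIN
#define MIN(a, b) (((a) < (b)) ? (a) : (b))
#endif

/* Bytes of the current chunk that HAProxy itself could consume:
 * 'next' bytes of the 'buf_i' input bytes were already parsed. */
unsigned int haproxy_avail(unsigned int chunk_len, unsigned int next,
                           unsigned int buf_i)
{
    assert(next <= buf_i);
    return MIN(chunk_len, buf_i - next);
}

/* Bytes of the current chunk that a filter could still parse:
 * the chunk ends at offset 'chunk_len + next' (or at the end of the input
 * data) and the filter has already parsed 'nxt' bytes. */
unsigned int filter_avail(unsigned int chunk_len, unsigned int next,
                          unsigned int buf_i, unsigned int nxt)
{
    unsigned int end = MIN(chunk_len + next, buf_i);

    assert(nxt <= end);
    return end - nxt;
}
```

Here haproxy_avail(50, 20, 100) is 50, while for a filter whose 'nxt' offset is
30, filter_avail(50, 20, 100, 30) is 40: the chunk ends at offset 70 and the
filter has already parsed everything up to offset 30.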
| 917 | |
| 918 | |
| 919 | In addition to these callbacks, there are two others: |
| 920 | |
| 921 | * 'flt_ops.http_end': This callback is called when the whole HTTP |
| 922 | request/response is processed. It can interrupt the stream processing. So, |
| 923 | it could be used to synchronize the HTTP request with the HTTP response, for |
| 924 | example: |
| 925 | |
| 926 |     /* Returns a negative value if an error occurs, 0 if it needs to wait, |
| 927 |      * any other value otherwise. */ |
| 928 |     static int |
| 929 |     my_filter_http_end(struct stream *s, struct filter *filter, |
| 930 |                        struct http_msg *msg) |
| 931 |     { |
| 932 |         struct my_filter_ctx *my_ctx = filter->ctx; |
| 933 | |
| 934 | |
| 935 |         if (!(msg->chn->flags & CF_ISRESP)) /* The request */ |
| 936 |             my_ctx->end_of_req = 1; |
| 937 |         else /* The response */ |
| 938 |             my_ctx->end_of_rsp = 1; |
| 939 | |
| 940 |         /* Both the request and the response are finished */ |
| 941 |         if (my_ctx->end_of_req == 1 && my_ctx->end_of_rsp == 1) |
| 942 |             return 1; |
| 943 | |
| 944 |         /* Wait */ |
| 945 |         return 0; |
| 946 |     } |
| 947 | |
| 948 | |
| 949 | * 'flt_ops.http_chunk_trailers': This callback is called for chunked HTTP |
| 950 | messages only when all chunks were parsed. HTTP trailers can be parsed in |
| 951 | several passes. This callback will be called each time. The number of bytes |
| 952 | parsed by HAProxy at each iteration is stored in 'msg->sol'. |
| 953 | |
| 954 | Then, to finish, there are 2 informational callbacks: |
| 955 | |
| 956 | * 'flt_ops.http_reset': This callback is called when an HTTP message is |
| 957 | reset. This only happens when a '100-continue' response is received. It |
| 958 | could be useful to reset the filter context before receiving the true |
| 959 | response. |
| 960 | |
| 961 | * 'flt_ops.http_reply': This callback is called when, at any time, HAProxy |
| 962 | decides to stop the processing on an HTTP message and to send an internal |
| 963 | response to the client. This mainly happens when an error or a redirect |
| 964 | occurs. |
| 965 | |
| 966 | |
| 967 | 3.6.3 REWRITING DATA |
| 968 | -------------------- |
| 969 | |
| 970 | The last part, and the trickiest one about the data filtering, is about the data |
| 971 | rewriting. For now, the filter API does not offer a lot of functions to handle |
| 972 | it. There are only functions to notify HAProxy that the data size has changed to |
| 973 | let it update the internal state of filters. It is your responsibility to update |
| 974 | the data itself, i.e. the buffer offsets. For an HTTP message, you must also |
| 975 | update 'msg->next' and 'msg->chunk_len' values accordingly: |
| 976 | |
| 977 | * 'flt_change_next_size': This function must be called when a filter alters |
| 978 | incoming data. It updates 'nxt' offset value of all its predecessors. Not |
| 979 | calling this function when a filter changes the size of incoming data leads |
| 980 | to undefined behavior. |
| 981 | |
| 982 |     unsigned int avail = MIN(msg->chunk_len + msg->next, msg->chn->buf->i) - |
| 983 |                          flt_rsp_nxt(filter); |
| 984 | |
| 985 |     if (avail > 10 && /* ...Some condition... */) { |
| 986 |         /* Move the buffer forward to have buf->p pointing on unparsed |
| 987 |          * data */ |
| 988 |         b_adv(msg->chn->buf, flt_rsp_nxt(filter)); |
| 989 | |
| 990 |         /* Skip first 10 bytes by shifting the following data to the left. |
| 991 |          * To simplify this example, we consider a non-wrapping buffer */ |
| 992 |         memmove(msg->chn->buf->p, msg->chn->buf->p + 10, avail - 10); |
| 993 | |
| 994 |         /* Restore buf->p value */ |
| 995 |         b_rew(msg->chn->buf, flt_rsp_nxt(filter)); |
| 996 | |
| 997 |         /* Now update other filters */ |
| 998 |         flt_change_next_size(filter, msg->chn, -10); |
| 999 | |
| 1000 |         /* Update the buffer state */ |
| 1001 |         msg->chn->buf->i -= 10; |
| 1002 | |
| 1003 |         /* And update the HTTP message state */ |
| 1004 |         msg->chunk_len -= 10; |
| 1005 | |
| 1006 |         return (avail - 10); |
| 1007 |     } |
| 1008 |     else |
| 1009 |         return 0; /* Wait for more data */ |
| 1010 | |
| 1011 | |
| 1012 | * 'flt_change_forward_size': This function must be called when a filter alters |
| 1013 | parsed data. It updates offset values ('nxt' and 'fwd') of all filters. Not |
| 1014 | calling this function when a filter changes the size of parsed data leads to |
| 1015 | undefined behavior. |
| 1016 | |
| 1017 |     /* len is the number of bytes of forwardable data */ |
| 1018 |     if (len > 10 && /* ...Some condition... */) { |
| 1019 |         /* Move the buffer forward to have buf->p pointing on non-forwarded |
| 1020 |          * data */ |
| 1021 |         b_adv(msg->chn->buf, flt_rsp_fwd(filter)); |
| 1022 | |
| 1023 |         /* Skip first 10 bytes by shifting the following data to the left. |
| 1024 |          * To simplify this example, we consider a non-wrapping buffer */ |
| 1025 |         memmove(msg->chn->buf->p, msg->chn->buf->p + 10, len - 10); |
| 1026 | |
| 1027 |         /* Restore buf->p value */ |
| 1028 |         b_rew(msg->chn->buf, flt_rsp_fwd(filter)); |
| 1029 | |
| 1030 |         /* Now update other filters */ |
| 1031 |         flt_change_forward_size(filter, msg->chn, -10); |
| 1032 | |
| 1033 |         /* Update the buffer state */ |
| 1034 |         msg->chn->buf->i -= 10; |
| 1035 | |
| 1036 |         /* And update the HTTP message state */ |
| 1037 |         msg->next -= 10; |
| 1038 | |
| 1039 |         return (len - 10); |
| 1040 |     } |
| 1041 |     else |
| 1042 |         return 0; /* Wait for more data */ |
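
Both examples above rely on the same buffer manipulation: removing some bytes
at a given position by shifting the data that follows to the left, then
reporting the size change. Here is a minimal standalone sketch of that step on
a plain, non-wrapping byte array. 'model_remove_bytes' is a hypothetical helper
written for this example, not a function of the filter API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: removes 'count' bytes located at offset 'ofs' in the
 * 'len' bytes starting at 'data', by moving the tail to the left. Returns the
 * new length. memmove() is used because source and destination overlap. */
size_t model_remove_bytes(char *data, size_t len, size_t ofs, size_t count)
{
    assert(data != NULL);
    assert(ofs <= len && count <= len - ofs);

    /* destination: where the removed bytes started,
     * source: the first byte after the removed bytes */
    memmove(data + ofs, data + ofs + count, len - ofs - count);
    return len - count;
}
```

The difference between the new and the old length (-10 in the examples above)
is the value to pass to 'flt_change_next_size' or 'flt_change_forward_size'.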
| 1043 | |
| 1044 | |
| 1045 | TODO: implement all the stuff to easily rewrite data. For HTTP messages, this |
| 1046 | requires a chunked message. Otherwise the size of the data cannot be |
| 1047 | changed. |
| 1048 | |
| 1049 | |
| 1050 | |
| 1051 | |
| 1052 | 4. FAQ |
| 1053 | ------ |
| 1054 | |
| 1055 | 4.1. Detect multiple declarations of the same filter |
| 1056 | ---------------------------------------------------- |
| 1057 | |
| 1058 | TODO |