                   -----------------------------------------
                          Filters Guide - version 0.1
                          ( Last update: 2016-04-18 )
                   ------------------------------------------
                   Author : Christopher Faulet
                   Contact : christopher dot faulet at capflam dot org


ABSTRACT
--------

The filters support is a new feature of HAProxy 1.7. It is a way to extend
HAProxy without touching its core code and, to a certain extent, without knowing
its internals. This feature will ease contributions by reducing the impact of
changes. Another advantage will be to simplify HAProxy by replacing some parts
with filters. As we will see, the HTTP compression is, for example, the first
feature moved into a filter.

This document describes how to write a filter and what you have to keep in mind
to do so. It also covers the known limits and the pitfalls to avoid.

As said, filters are quite new for now. The API is not frozen and will be
updated/modified/improved/extended as needed.



SUMMARY
-------

  1.     Filters introduction
  2.     How to use filters
  3.     How to write a new filter
  3.1.     API Overview
  3.2.     Defining the filter name and its configuration
  3.3.     Managing the filter lifecycle
  3.4.     Handling the streams creation and destruction
  3.5.     Analyzing the channels activity
  3.6.     Filtering the data exchanged
  4.     FAQ



1. FILTERS INTRODUCTION
-----------------------

First of all, to fully understand how filters work and how to create one, it is
best to know, at least from a distance, what a proxy (frontend/backend), a
stream and a channel are in HAProxy and how these entities are linked to each
other. doc/internals/entities.pdf is a good overview.

Then, to support filters, many callbacks have been added to HAProxy at different
places, mainly around channel analyzers. Their purpose is to allow filters to
be involved in the data processing, from the stream creation/destruction to
the data forwarding. Depending on what it should do, a filter can implement all
or part of these callbacks. For now, existing callbacks are focused on
streams, but future improvements could enlarge the filters scope. For example,
it could be useful to handle events at the connection level.

In the HAProxy configuration file, a filter is declared in a proxy section,
except the 'defaults' section. So the configuration corresponding to a filter
declaration is attached to a specific proxy and will be shared by all its
instances. It is opaque from HAProxy's point of view: it is the filter's
responsibility to manage it. Each filter declaration matches a unique
configuration. Several declarations of the same filter in the same proxy will be
handled as different filters by HAProxy.

A filter instance is represented by a partially opaque context (or a state)
attached to a stream and passed as an argument to callbacks. Through this
context, filter instances are stateful. Depending on whether the filter is
declared in a frontend or a backend section, its instances will be created,
respectively, when a stream is created or when a backend is selected. Their
behaviors will also be different. Only instances of filters declared in a
frontend section will be aware of the creation and the destruction of the
stream, and will take part in the channels analyzing before the backend is
defined.

It is important to remember that the configuration of a filter is shared by all
its instances, while the context of an instance is owned by a unique stream.

Filters are designed to be chained. It is possible to declare several filters in
the same proxy section. The declaration order is important because filters will
be called one after the other respecting this order. Frontend and backend
filters are also chained, frontend ones being called first. Even if the filters
processing is serialized, each filter will behave as if it was alone (unless it
was developed to be aware of other filters). Nevertheless, some constraints are
imposed on filters, especially when data exchanged between the client and the
server are processed. We will discuss these constraints again when we tackle
the subject of writing a filter.
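
The calling order described above can be modeled with a small, self-contained
C sketch. It is not HAProxy code: 'struct demo_filter' and 'demo_chain_names()'
are hypothetical names, and the sketch only captures two rules stated here,
namely that filters run in declaration order and that frontend filters run
before backend ones.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified filter: only a name (not HAProxy's struct filter). */
struct demo_filter {
    const char *name;
};

/* Write into 'out' the names of the filters, comma-separated, in the order
 * they would be called for one event: frontend filters first, then backend
 * filters, each group in declaration order. Returns the length written. */
static int demo_chain_names(const struct demo_filter *fe, int nb_fe,
                            const struct demo_filter *be, int nb_be,
                            char *out, size_t outsz)
{
    int len = 0;
    int i;

    assert(outsz > 0);
    out[0] = '\0';
    for (i = 0; i < nb_fe + nb_be; i++) {
        const struct demo_filter *f = (i < nb_fe) ? &fe[i] : &be[i - nb_fe];
        int n = snprintf(out + len, outsz - (size_t)len, "%s%s",
                         len ? "," : "", f->name);

        if (n < 0 || (size_t)n >= outsz - (size_t)len)
            break; /* not enough room: stop (output is truncated) */
        len += n;
    }
    return len;
}
```

With two frontend filters and one backend filter, the backend one always comes
last, whatever the order in which the sections appear in the configuration.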



2. HOW TO USE FILTERS
---------------------

To use a filter, you must use the parameter 'filter' followed by the filter name
and, optionally, its configuration in the desired listen, frontend or backend
section. For example:

    listen test
        ...
        filter trace name TST
        ...


See doc/configuration.txt for a formal definition of the parameter 'filter'.
Note that additional parameters on the filter line must be parsed by the filter
itself.

The list of available filters is reported by 'haproxy -vv':

    $> haproxy -vv
    HA-Proxy version 1.7-dev2-3a1d4a-33 2016/03/21
    Copyright 2000-2016 Willy Tarreau <willy@haproxy.org>

    [...]

    Available filters :
            [COMP] compression
            [TRACE] trace


Multiple filter lines can be used in a proxy section to chain filters. Filters
will be called in the declaration order.

Some filters can support implicit declarations in certain circumstances
(without the filter line). This is not recommended for new features, but it is
useful for existing ones moved into a filter, for backward compatibility
reasons. Implicit declarations are supported when there is only one filter used
on a proxy. When several filters are used, explicit declarations are mandatory.
The HTTP compression filter is one of these filters. Alone, using 'compression'
keywords is enough to use it. But when at least a second filter is used, a
filter line must be added.

    # filter line is optional
    listen t1
        bind *:80
        compression algo gzip
        compression offload
        server srv x.x.x.x:80

    # filter line is mandatory for the compression filter
    listen t2
        bind *:81
        filter trace name T2
        filter compression
        compression algo gzip
        compression offload
        server srv x.x.x.x:80




3. HOW TO WRITE A NEW FILTER
----------------------------

If you want to write a filter, there are 2 header files that you must know:

  * include/types/filters.h: This is the main header file, containing all
                             important structures you will use. It represents
                             the filter API.
  * include/proto/filters.h: This header file contains helper functions that
                             you may need to use. It also contains the internal
                             API used by HAProxy to handle filters.

To ease the filters integration, it is better to follow some conventions:

  * Use the 'flt_' prefix to name your filter (e.g: flt_http_comp or flt_trace).
  * Keep everything related to your filter in the same file.

The filter 'trace' can be used as a template to write your own filter. It is a
good start to see how filters really work.

3.1 API OVERVIEW
----------------

Writing a filter comes down to writing functions and attaching them to the
existing callbacks. Available callbacks are listed in the following structure:

    struct flt_ops {
        /*
         * Callbacks to manage the filter lifecycle
         */
        int  (*init)  (struct proxy *p, struct flt_conf *fconf);
        void (*deinit)(struct proxy *p, struct flt_conf *fconf);
        int  (*check) (struct proxy *p, struct flt_conf *fconf);

        /*
         * Stream callbacks
         */
        int  (*stream_start) (struct stream *s, struct filter *f);
        void (*stream_stop)  (struct stream *s, struct filter *f);

        /*
         * Channel callbacks
         */
        int  (*channel_start_analyze)(struct stream *s, struct filter *f,
                                      struct channel *chn);
        int  (*channel_analyze)      (struct stream *s, struct filter *f,
                                      struct channel *chn,
                                      unsigned int an_bit);
        int  (*channel_end_analyze)  (struct stream *s, struct filter *f,
                                      struct channel *chn);

        /*
         * HTTP callbacks
         */
        int  (*http_data)          (struct stream *s, struct filter *f,
                                    struct http_msg *msg);
        int  (*http_chunk_trailers)(struct stream *s, struct filter *f,
                                    struct http_msg *msg);
        int  (*http_end)           (struct stream *s, struct filter *f,
                                    struct http_msg *msg);
        int  (*http_forward_data)  (struct stream *s, struct filter *f,
                                    struct http_msg *msg,
                                    unsigned int len);

        void (*http_reset)         (struct stream *s, struct filter *f,
                                    struct http_msg *msg);
        void (*http_reply)         (struct stream *s, struct filter *f,
                                    short status,
                                    const struct chunk *msg);

        /*
         * TCP callbacks
         */
        int  (*tcp_data)        (struct stream *s, struct filter *f,
                                 struct channel *chn);
        int  (*tcp_forward_data)(struct stream *s, struct filter *f,
                                 struct channel *chn,
                                 unsigned int len);
    };


We will explain in the following parts when these callbacks are called and what
they should do.

Filters are declared in proxy sections. So each proxy has an ordered list of
filters, possibly empty if no filter is used. When the configuration of a proxy
is parsed, each filter line represents an entry in this list. In the structure
'proxy', the filters configurations are stored in the field 'filter_configs',
each one of type 'struct flt_conf *':

    /*
     * Structure representing the filter configuration, attached to a proxy and
     * accessible from a filter when instantiated in a stream
     */
    struct flt_conf {
        const char     *id;   /* The filter id */
        struct flt_ops *ops;  /* The filter callbacks */
        void           *conf; /* The filter configuration */
        struct list     list; /* Next filter for the same proxy */
    };

  * 'flt_conf.id' is an identifier, defined by the filter. It can be
    NULL. HAProxy does not use this field. Filters can use it in log messages or
    as a unique identifier to check multiple declarations. It is the filter's
    responsibility to free it, if necessary.

  * 'flt_conf.conf' is opaque. It is the internal configuration of a filter,
    generally allocated and filled by its parsing function (See § 3.2). It is
    the filter's responsibility to free it.

  * 'flt_conf.ops' references the callbacks implemented by the filter. This
    field must be set during the parsing phase (See § 3.2) and can be refined
    during the initialization phase (See § 3.3). If it is dynamically allocated,
    it is the filter's responsibility to free it.


The filter configuration is global and shared by all its instances. A filter
instance is created in the context of a stream and attached to this stream. In
the structure 'stream', the field 'strm_flt' is the state of all filter
instances attached to a stream:

    /*
     * Structure representing the "global" state of filters attached to a
     * stream.
     */
    struct strm_flt {
        struct list    filters;             /* List of filters attached to a stream */
        struct filter *current[2];          /* From which filter to resume processing, for a specific channel.
                                             * This is used for resumable callbacks only.
                                             * If NULL, we start from the first filter.
                                             * 0: request channel, 1: response channel */
        unsigned short flags;               /* STRM_FL_* */
        unsigned char  nb_req_data_filters; /* Number of data filters registered on the request channel */
        unsigned char  nb_rsp_data_filters; /* Number of data filters registered on the response channel */
    };


Filter instances attached to a stream are stored in the field
'strm_flt.filters', each instance is of type 'struct filter *':

    /*
     * Structure representing a filter instance attached to a stream
     *
     * Two-element array fields are used to store info per channel. The first
     * index stands for the request channel, and the second one for the
     * response channel. Especially, <next> and <fwd> are offsets representing
     * the amount of data that the filter has, respectively, parsed and
     * forwarded on a channel. Filters can access these values using FLT_NXT
     * and FLT_FWD macros.
     */
    struct filter {
        struct flt_conf *config;  /* the filter's configuration */
        void            *ctx;     /* The filter context (opaque) */
        unsigned short   flags;   /* FLT_FL_* */
        unsigned int     next[2]; /* Offset, relative to buf->p, to the next
                                   * byte to parse for a specific channel
                                   * 0: request channel, 1: response channel */
        unsigned int     fwd[2];  /* Offset, relative to buf->p, to the next
                                   * byte to forward for a specific channel
                                   * 0: request channel, 1: response channel */
        struct list      list;    /* Next filter for the same proxy/stream */
    };

  * 'filter.config' is the filter configuration previously described. All
    instances of a filter share it.

  * 'filter.ctx' is an opaque context. It is managed by the filter, so it is its
    responsibility to free it.

  * 'filter.next' and 'filter.fwd' will be described later (See § 3.6).
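
The ownership rules above (a configuration shared by all instances, a context
private to each stream) can be illustrated with the following sketch. The
'demo_*' types are hypothetical, trimmed-down stand-ins for 'struct flt_conf'
and 'struct filter', and the allocation scheme is an assumption made for the
example, not how HAProxy allocates its filter instances.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for 'struct flt_conf': one per filter line. */
struct demo_flt_conf {
    const char *id;
    void *conf;                    /* filter's private configuration */
};

/* Hypothetical stand-in for 'struct filter': one per stream. */
struct demo_filter {
    struct demo_flt_conf *config;  /* shared, never freed by an instance */
    void *ctx;                     /* owned by this instance only */
    unsigned int next[2];          /* 0: request channel, 1: response channel */
    unsigned int fwd[2];
};

/* Create an instance for a new stream: it points to the shared
 * configuration and starts with zeroed offsets and no context. */
static struct demo_filter *demo_filter_new(struct demo_flt_conf *fconf)
{
    struct demo_filter *f = calloc(1, sizeof(*f));

    if (f)
        f->config = fconf;
    return f;
}

/* Destroy an instance: free only what the instance owns. */
static void demo_filter_free(struct demo_filter *f)
{
    if (!f)
        return;
    free(f->ctx);  /* per-stream context */
    free(f);       /* the shared configuration is left untouched */
}
```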


3.2. DEFINING THE FILTER NAME AND ITS CONFIGURATION
---------------------------------------------------

When you write a filter, the first thing to do is to add it to the supported
filters. To do so, you must register its name as a valid keyword on the filter
line:

    /* Declare the filter parser for "my_filter" keyword */
    static struct flt_kw_list flt_kws = { "MY_FILTER_SCOPE", { }, {
            { "my_filter", parse_my_filter_cfg },
            { NULL, NULL },
        }
    };

    __attribute__((constructor))
    static void
    __my_filter_init(void)
    {
        flt_register_keywords(&flt_kws);
    }
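
Conceptually, the registered keywords form a NULL-terminated table mapping a
keyword to its parsing function, which the configuration parser can search for
the name found on a filter line. The sketch below models such a lookup;
'struct demo_kw', 'demo_find_kw()' and the simplified parser signature are
hypothetical and do not reproduce HAProxy's internal lookup code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parser signature, much simpler than the real one. */
typedef int (*demo_parse_fct)(char **args, int *cur_arg);

/* Hypothetical keyword entry: a name and its parser. */
struct demo_kw {
    const char *kw;
    demo_parse_fct parse;
};

/* Example parser: consumes only the filter name. */
static int demo_parse_my_filter(char **args, int *cur_arg)
{
    (void)args;
    (*cur_arg)++;
    return 0;
}

/* NULL-terminated table, shaped like the one declared above. */
static const struct demo_kw demo_kws[] = {
    { "my_filter", demo_parse_my_filter },
    { NULL, NULL },
};

/* Return the entry matching 'name', or NULL if it is unknown. */
static const struct demo_kw *demo_find_kw(const char *name)
{
    const struct demo_kw *kw;

    for (kw = demo_kws; kw->kw != NULL; kw++) {
        if (strcmp(kw->kw, name) == 0)
            return kw;
    }
    return NULL;
}
```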


Then you must define the internal configuration your filter will use. For
example:

    struct my_filter_config {
        struct proxy *proxy;
        char         *name;
        /* ... */
    };


You also must list all callbacks implemented by your filter. Here, we use a
global variable:

    struct flt_ops my_filter_ops = {
        .init   = my_filter_init,
        .deinit = my_filter_deinit,
        .check  = my_filter_config_check,

        /* ... */
    };


Finally, you must define the function to parse your filter configuration, here
'parse_my_filter_cfg'. This function must parse all remaining keywords on the
filter line:

    /* Return -1 on error, else 0 */
    static int
    parse_my_filter_cfg(char **args, int *cur_arg, struct proxy *px,
                        struct flt_conf *flt_conf, char **err)
    {
        struct my_filter_config *my_conf;
        int pos = *cur_arg;

        /* Allocate the internal configuration used by the filter */
        my_conf = calloc(1, sizeof(*my_conf));
        if (!my_conf) {
            memprintf(err, "%s: out of memory", args[*cur_arg]);
            return -1;
        }
        my_conf->proxy = px;

        /* ... */

        /* Parse all keywords supported by the filter and fill the internal
         * configuration */
        pos++; /* Skip the filter name */
        while (*args[pos]) {
            if (!strcmp(args[pos], "name")) {
                if (!*args[pos + 1]) {
                    memprintf(err, "'%s' : '%s' option without value",
                              args[*cur_arg], args[pos]);
                    goto error;
                }
                free(my_conf->name); /* in case 'name' is repeated */
                my_conf->name = strdup(args[pos + 1]);
                if (!my_conf->name) {
                    memprintf(err, "%s: out of memory", args[*cur_arg]);
                    goto error;
                }
                pos += 2;
            }
            /* ... parse other keywords ... */
            else {
                /* Unknown keyword: report it instead of looping forever */
                memprintf(err, "'%s' : unknown keyword '%s'",
                          args[*cur_arg], args[pos]);
                goto error;
            }
        }
        *cur_arg = pos;

        /* Set callbacks supported by the filter */
        flt_conf->ops = &my_filter_ops;

        /* Last, save the internal configuration */
        flt_conf->conf = my_conf;
        return 0;

      error:
        free(my_conf->name); /* free(NULL) is a no-op */
        free(my_conf);
        return -1;
    }


WARNING: In your parsing function, you must define 'flt_conf->ops'. You must
         also parse all arguments on the filter line. This is mandatory.

In the previous example, we expect to read a filter line as follows:

    filter my_filter name MY_NAME ...


Optionally, by implementing the 'flt_ops.check' callback, you add a step to
check the internal configuration of your filter after the parsing phase, when
the HAProxy configuration is fully defined. For example:

    /* Check the configuration of my_filter for a specified proxy.
     * Return 1 on error, else 0. */
    static int
    my_filter_config_check(struct proxy *px, struct flt_conf *fconf)
    {
        if (px->mode != PR_MODE_HTTP) {
            Alert("The filter 'my_filter' cannot be used in non-HTTP mode.\n");
            return 1;
        }

        /* ... */

        return 0;
    }



3.3. MANAGING THE FILTER LIFECYCLE
----------------------------------

Once the configuration is parsed and checked, filters are ready to be used.
There are two callbacks to manage the filter lifecycle:

  * 'flt_ops.init': It initializes the filter for a proxy. You may define this
                    callback if you need to complete your filter configuration.

  * 'flt_ops.deinit': It cleans up what the parsing function and the init
                      callback have done. This callback is useful to release
                      memory allocated for the filter configuration.

Here is an example:

    /* Initialize the filter. Returns -1 on error, else 0. */
    static int
    my_filter_init(struct proxy *px, struct flt_conf *fconf)
    {
        struct my_filter_config *my_conf = fconf->conf;

        /* ... */

        return 0;
    }

    /* Free resources allocated by my_filter. */
    static void
    my_filter_deinit(struct proxy *px, struct flt_conf *fconf)
    {
        struct my_filter_config *my_conf = fconf->conf;

        if (my_conf) {
            free(my_conf->name);
            /* ... */
            free(my_conf);
        }
        fconf->conf = NULL;
    }
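
The lifecycle can be summarized by a hypothetical driver that runs 'check' and
then 'init' at startup and 'deinit' at shutdown, skipping any callback left
NULL. The names 'struct demo_ops', 'demo_start()' and 'demo_stop()' are
illustrative only, and the check-then-init order is an assumption based on the
wording of this section, not a description of HAProxy's startup code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_conf;  /* opaque filter configuration */

/* Hypothetical subset of 'struct flt_ops': lifecycle callbacks only. */
struct demo_ops {
    int  (*check)(struct demo_conf *conf);   /* returns 0 if OK */
    int  (*init)(struct demo_conf *conf);    /* returns 0 if OK */
    void (*deinit)(struct demo_conf *conf);
};

/* Run check then init; a NULL callback counts as success.
 * Returns 0 on success, -1 if the filter must be rejected. */
static int demo_start(const struct demo_ops *ops, struct demo_conf *conf)
{
    if (ops->check && ops->check(conf) != 0)
        return -1;
    if (ops->init && ops->init(conf) != 0)
        return -1;
    return 0;
}

/* Release the configuration, if the filter provides 'deinit'. */
static void demo_stop(const struct demo_ops *ops, struct demo_conf *conf)
{
    if (ops->deinit)
        ops->deinit(conf);
}

/* Example callbacks recording the order in which they are called. */
static char demo_trace[8];
static int demo_pos;

static void demo_record(char c)
{
    if (demo_pos < (int)sizeof(demo_trace) - 1)
        demo_trace[demo_pos++] = c;
}

static int  demo_check(struct demo_conf *c)    { (void)c; demo_record('C'); return 0; }
static int  demo_check_ko(struct demo_conf *c) { (void)c; demo_record('X'); return 1; }
static int  demo_init(struct demo_conf *c)     { (void)c; demo_record('I'); return 0; }
static void demo_deinit(struct demo_conf *c)   { (void)c; demo_record('D'); }
```

When 'check' fails, 'init' is never reached, which is why 'deinit' must cope
with a configuration that was parsed but not initialized.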


TODO: Add callbacks to handle creation/destruction of filter instances. And
      document it.


3.4. HANDLING THE STREAMS CREATION AND DESTRUCTION
--------------------------------------------------

You may be interested in handling stream creation and destruction. If so, you
must define the following callbacks:

  * 'flt_ops.stream_start': It is called when a stream is started. This callback
                            can fail by returning a negative value. It will be
                            considered as a critical error by HAProxy, which
                            disables the listener for a short time.

  * 'flt_ops.stream_stop': It is called when a stream is stopped. This callback
                           always succeeds. Anyway, it is too late to return an
                           error.

For example:

    /* Called when a stream is created. Returns -1 on error, else 0. */
    static int
    my_filter_stream_start(struct stream *s, struct filter *filter)
    {
        struct my_filter_config *my_conf = FLT_CONF(filter);

        /* ... */

        return 0;
    }

    /* Called when a stream is destroyed */
    static void
    my_filter_stream_stop(struct stream *s, struct filter *filter)
    {
        struct my_filter_config *my_conf = FLT_CONF(filter);

        /* ... */
    }


WARNING: Handling the streams creation and destruction is only possible for
         filters defined on proxies with the frontend capability.
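
A typical use of these callbacks is to allocate the per-stream context
('filter.ctx') in 'stream_start' and to release it in 'stream_stop'. The
self-contained sketch below models that pattern; 'struct demo_ctx',
'struct demo_filter' and the 'nb_bytes' field are hypothetical stand-ins, not
HAProxy types.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-stream state kept by the filter. */
struct demo_ctx {
    unsigned long long nb_bytes;
};

/* Hypothetical stand-in for 'struct filter' (only the context). */
struct demo_filter {
    void *ctx;
};

/* Allocate the context. Returns -1 on error, else 0, like 'stream_start'. */
static int demo_stream_start(struct demo_filter *filter)
{
    struct demo_ctx *ctx = calloc(1, sizeof(*ctx));

    if (!ctx)
        return -1;  /* a negative value is reported as a critical error */
    filter->ctx = ctx;
    return 0;
}

/* Release the context. Like 'stream_stop', this cannot fail. */
static void demo_stream_stop(struct demo_filter *filter)
{
    free(filter->ctx);
    filter->ctx = NULL;
}
```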


3.5. ANALYZING THE CHANNELS ACTIVITY
------------------------------------

The main purpose of filters is to take part in the channels analyzing. To do so,
there is a callback, 'flt_ops.channel_analyze', called before each analyzer
attached to a channel, except analyzers responsible for the data
parsing/forwarding (TCP data or HTTP body). Concretely, on the request channel,
'flt_ops.channel_analyze' could be called before the following analyzers:

  * tcp_inspect_request        (AN_REQ_INSPECT_FE and AN_REQ_INSPECT_BE)
  * http_wait_for_request      (AN_REQ_WAIT_HTTP)
  * http_wait_for_request_body (AN_REQ_HTTP_BODY)
  * http_process_req_common    (AN_REQ_HTTP_PROCESS_FE)
  * process_switching_rules    (AN_REQ_SWITCHING_RULES)
  * http_process_req_common    (AN_REQ_HTTP_PROCESS_BE)
  * http_process_tarpit        (AN_REQ_HTTP_TARPIT)
  * process_server_rules       (AN_REQ_SRV_RULES)
  * http_process_request       (AN_REQ_HTTP_INNER)
  * tcp_persist_rdp_cookie     (AN_REQ_PRST_RDP_COOKIE)
  * process_sticking_rules     (AN_REQ_STICKING_RULES)
  * flt_analyze_http_headers   (AN_FLT_HTTP_HDRS)

And on the response channel:

  * tcp_inspect_response       (AN_RES_INSPECT)
  * http_wait_for_response     (AN_RES_WAIT_HTTP)
  * process_store_rules        (AN_RES_STORE_RULES)
  * http_process_res_common    (AN_RES_HTTP_PROCESS_BE)
  * flt_analyze_http_headers   (AN_FLT_HTTP_HDRS)

Note that 'flt_analyze_http_headers' (AN_FLT_HTTP_HDRS) is a new analyzer. It
has been added to let filters analyze HTTP headers after all processing, just
before the data parsing/forwarding.

Unlike the other callbacks seen so far, 'flt_ops.channel_analyze' can
interrupt the stream processing. So a filter can decide not to execute the
analyzer that follows and wait for the next iteration. If there is more than
one filter, the following ones are skipped. On the next iteration, the
filtering resumes where it was stopped, i.e. on the filter that previously
stopped the processing. So it is possible for a filter to stop the stream
processing for a while before continuing. For example:

    /* Called before a processing happens on a given channel.
     * Returns a negative value if an error occurs, 0 if it needs to wait,
     * any other value otherwise. */
    static int
    my_filter_chn_analyze(struct stream *s, struct filter *filter,
                          struct channel *chn, unsigned int an_bit)
    {
        struct my_filter_config *my_conf = FLT_CONF(filter);

        switch (an_bit) {
            case AN_REQ_WAIT_HTTP:
                /* Wait until a condition is verified before continuing.
                 * 'my_condition_is_verified' is a hypothetical helper. */
                if (!my_condition_is_verified(s))
                    return 0;
                break;
            /* ... */
        }
        return 1;
    }

  * 'an_bit' is the analyzer id. All analyzers are listed in
    'include/types/channels.h'.

  * 'chn' is the channel on which the analyzing is done. You can know if it is
    the request or the response channel by testing if the CF_ISRESP flag is set:

        ((chn->flags & CF_ISRESP) == CF_ISRESP)


In the previous example, the stream processing is blocked before receipt of the
HTTP request until a condition is verified.
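
The resume behavior can be modeled without HAProxy. In this hypothetical
sketch, 'demo_run_chain()' calls the filters from a saved index (playing the
role of 'strm_flt.current'), stops at the first one returning 0 and restarts
from it on the next call. Only the return convention (negative: error, 0: wait,
other: continue) comes from the example above; the rest is an illustration.

```c
#include <assert.h>

/* Hypothetical callback: <0 error, 0 wait, >0 continue. */
typedef int (*demo_cb)(int filter_idx);

/* Call filters from '*current' up to the last one. Returns 1 when all of
 * them agreed to continue (and resets '*current' to 0), 0 if a filter asked
 * to wait ('*current' keeps its index so the next call resumes there), or
 * -1 on error. */
static int demo_run_chain(demo_cb cb, int nb_filters, int *current)
{
    while (*current < nb_filters) {
        int ret = cb(*current);

        if (ret < 0)
            return -1;
        if (ret == 0)
            return 0;    /* the following filters are skipped for now */
        (*current)++;
    }
    *current = 0;        /* next time, start again from the first filter */
    return 1;
}

/* Example: filter #1 waits until it has been called 3 times. */
static int demo_calls[2];

static int demo_cb_impl(int idx)
{
    demo_calls[idx]++;
    if (idx == 1 && demo_calls[idx] < 3)
        return 0;
    return 1;
}
```

Filter #0 is called only once: the two retries resume directly on filter #1.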

To surround the activity of a filter during the channel analyzing, two new
analyzers have been added:

  * 'flt_start_analyze' (AN_FLT_START_FE/AN_FLT_START_BE): For a specific
    filter, this analyzer is called before any call to the 'channel_analyze'
    callback. From the filter point of view, it calls the
    'flt_ops.channel_start_analyze' callback.

  * 'flt_end_analyze' (AN_FLT_END): For a specific filter, this analyzer is
    called when all other analyzers have finished their processing. From the
    filter point of view, it calls the 'flt_ops.channel_end_analyze' callback.

For TCP streams, these analyzers are called only once. For HTTP streams, if the
client connection is kept alive, this happens at each request/response
roundtrip.

'flt_ops.channel_start_analyze' and 'flt_ops.channel_end_analyze' callbacks can
interrupt the stream processing, as 'flt_ops.channel_analyze'. Here is an
example:

    /* Called when analyze starts for a given channel
     * Returns a negative value if an error occurs, 0 if it needs to wait,
     * any other value otherwise. */
    static int
    my_filter_chn_start_analyze(struct stream *s, struct filter *filter,
                                struct channel *chn)
    {
        struct my_filter_config *my_conf = FLT_CONF(filter);

        /* ... TODO ... */

        return 1;
    }

    /* Called when analyze ends for a given channel
     * Returns a negative value if an error occurs, 0 if it needs to wait,
     * any other value otherwise. */
    static int
    my_filter_chn_end_analyze(struct stream *s, struct filter *filter,
                              struct channel *chn)
    {
        struct my_filter_config *my_conf = FLT_CONF(filter);

        /* ... TODO ... */

        return 1;
    }


Workflow on channels can be summarized as follows:

                        |
             +----------+-----------+
             | flt_ops.stream_start |
             +----------+-----------+
                        |
                       ...
                        |
         [1] -->--------+
                        |
  FRONTEND              |
        +---------------+---------------+ <--+
        |       flt_start_analyze       |    | for each filter
        |(flt_ops.channel_start_analyze)| ---+
        +---------------+---------------+
                        |
        +---------------+---------------+ <--+
        |    flt_ops.channel_analyze    |    | for each filter, then
        |               V               |    | the analyzer; repeated
        |           analyzer            | ---+ for each analyzer
        +---------------+---------------+
                        |
  BACKEND               |
        +---------------+---------------+ <--+
        |       flt_start_analyze       |    | for each filter
        |(flt_ops.channel_start_analyze)| ---+
        +---------------+---------------+
                        |
        +---------------+---------------+ <--+
        |    flt_ops.channel_analyze    |    | for each filter, then
        |               V               |    | the analyzer; repeated
        |           analyzer            | ---+ for each analyzer
        +---------------+---------------+
                        |
                       ...
                        |
         [ data filtering (see below) ]
                        |
                       ...
                        |
        +---------------+---------------+ <--+
        |        flt_end_analyze        |    | for each filter
        | (flt_ops.channel_end_analyze) | ---+
        +---------------+---------------+
                        |
                        +-->-- If HTTP stream, go back to [1]
                        |
                       ...
                        |
             +----------+-----------+
             | flt_ops.stream_stop  |
             +----------+-----------+
                        |
                        V


TODO: Add pre/post analyzer callbacks with a mask. So, this part will be
      massively refactored very soon.


3.6. FILTERING THE DATA EXCHANGED
---------------------------------

WARNING: To fully understand this part, you must be aware of how the buffers
         work in HAProxy. In particular, you must be comfortable with the idea
         of circular buffers. See doc/internals/buffer-operations.txt and
         doc/internals/buffer-ops.fig for details.
         doc/internals/body-parsing.txt could also be useful.

An extended feature of the filters is the data filtering. By default a filter
does not look into data exchanged between the client and the server because it
is expensive. Indeed, instead of forwarding data without any processing, each
byte needs to be buffered.

So, to enable the data filtering on a channel, at any time, in one of the
previous callbacks, you should call the 'register_data_filter' function. And
conversely, to disable it, you should call the 'unregister_data_filter'
function. For example:

    static int
    my_filter_chn_analyze(struct stream *s, struct filter *filter,
                          struct channel *chn, unsigned int an_bit)
    {
        struct my_filter_config *my_conf = FLT_CONF(filter);

        /* 'chn' must be the request channel */
        if (!(chn->flags & CF_ISRESP) && an_bit == AN_FLT_HTTP_HDRS) {
            struct http_txn *txn = s->txn;
            struct http_msg *msg = &txn->req;
            struct buffer   *req = msg->chn->buf;
            struct hdr_ctx   ctx;

            /* Enable the data filtering for the request if 'X-Filter' header
             * is set to 'true'. */
            ctx.idx = 0; /* must be reset before the first lookup */
            if (http_find_header2("X-Filter", 8, req->p, &txn->hdr_idx, &ctx) &&
                ctx.vlen == 4 && memcmp(ctx.line + ctx.val, "true", 4) == 0)
                register_data_filter(s, chn, filter);
        }

        return 1;
    }

Here, the data filtering is enabled if the HTTP header 'X-Filter' is found and
set to 'true'.
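
When a header value is located by a pointer and a length, as 'ctx.line +
ctx.val' and 'ctx.vlen' do in the example above, its length should be checked
before its bytes: comparing 4 bytes of a 3-byte value reads past the value, and
a lower-bound length check alone lets a value such as 'true-ish' match. The
helper below is a hypothetical, self-contained sketch of an exact,
case-sensitive comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the 'len' bytes at 'val' are exactly "true", else 0.
 * 'val' does not need to be NUL-terminated. */
static int demo_value_is_true(const char *val, size_t len)
{
    static const char expected[] = "true";
    const size_t exp_len = sizeof(expected) - 1;

    return len == exp_len && memcmp(val, expected, exp_len) == 0;
}
```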

If several filters are declared, the evaluation order remains the same,
regardless of the order of the registrations to the data filtering.
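
That registration order does not change evaluation order can be shown with a
hypothetical sketch: each filter carries a per-channel 'registered' flag, a
per-channel counter plays the role of 'nb_req_data_filters' and
'nb_rsp_data_filters', and data callbacks are dispatched by walking the chain
in declaration order. None of the 'demo_*' names belong to HAProxy.

```c
#include <assert.h>
#include <string.h>

#define DEMO_NB_FILTERS 3

/* Hypothetical filter: index 0 = request channel, 1 = response channel. */
struct demo_filter {
    char name;
    int registered[2];
};

struct demo_stream {
    struct demo_filter filters[DEMO_NB_FILTERS];  /* declaration order */
    int nb_data_filters[2];
};

static void demo_register(struct demo_stream *s, int chn, int idx)
{
    if (!s->filters[idx].registered[chn]) {
        s->filters[idx].registered[chn] = 1;
        s->nb_data_filters[chn]++;
    }
}

static void demo_unregister(struct demo_stream *s, int chn, int idx)
{
    if (s->filters[idx].registered[chn]) {
        s->filters[idx].registered[chn] = 0;
        s->nb_data_filters[chn]--;
    }
}

/* Write in 'out' the names of the filters that see the data on 'chn', in
 * call order. 'out' must hold at least DEMO_NB_FILTERS + 1 chars. */
static void demo_data_order(const struct demo_stream *s, int chn, char *out)
{
    int i, n = 0;

    for (i = 0; i < DEMO_NB_FILTERS; i++) {
        if (s->filters[i].registered[chn])
            out[n++] = s->filters[i].name;
    }
    out[n] = '\0';
}
```

Registering filter C before filter A still yields the order "AC".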

Depending on the stream type, TCP or HTTP, the way to handle data filtering will
be slightly different. Among other things, for HTTP streams, there are more
callbacks to help you to fully handle all steps of an HTTP transaction. But the
basis is the same. The data filtering is done in 2 stages:

  * The data parsing: At this stage, filters will analyze input data on a
    channel. Once a filter has parsed some data, it cannot parse it again. At
    any time, a filter can choose not to parse all available data. So, it is
    possible for a filter to retain data for a while. Because filters are
    chained, a filter cannot parse more data than its predecessors. Thus only
    data considered as parsed by the last filter will be available to the next
    stage, the data forwarding.

  * The data forwarding: At this stage, filters will decide how much data
    HAProxy can forward among those considered as parsed at the previous
    stage. Once a filter has marked data as forwardable, it cannot analyze it
    anymore. At any time, a filter can choose not to forward all parsed
    data. So, it is possible for a filter to retain data for a while. Because
    filters are chained, a filter cannot forward more data than its
    predecessors. Thus only data marked as forwardable by the last filter will
    actually be forwarded by HAProxy.
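
The two constraints above (no filter parses or forwards more than its
predecessor, and only data parsed by the last filter can be forwarded) can be
simulated in a few lines. In this hypothetical sketch each filter states how
many bytes it accepts to parse and forward per round; the clamping is our model
of the rules, not HAProxy's implementation, and buffer consumption between
rounds is ignored.

```c
#include <assert.h>

static unsigned int demo_min(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* Hypothetical per-filter offsets, relative to the start of input data. */
struct demo_flt {
    unsigned int nxt;         /* bytes already parsed */
    unsigned int fwd;         /* bytes already forwarded */
    unsigned int want_parse;  /* bytes it accepts to parse per round */
    unsigned int want_fwd;    /* bytes it accepts to forward per round */
};

/* One round on 'avail' input bytes. Returns the 'fwd' offset of the last
 * filter, i.e. what would actually be forwarded. */
static unsigned int demo_round(struct demo_flt *f, int nb, unsigned int avail)
{
    int i;

    /* Stage 1, parsing: nobody parses beyond its predecessor. */
    for (i = 0; i < nb; i++) {
        unsigned int limit = (i == 0) ? avail : f[i - 1].nxt;

        f[i].nxt = demo_min(f[i].nxt + f[i].want_parse, limit);
    }

    /* Stage 2, forwarding: only data parsed by the last filter can be
     * forwarded, and nobody forwards beyond its predecessor. */
    for (i = 0; i < nb; i++) {
        unsigned int limit = (i == 0) ? f[nb - 1].nxt : f[i - 1].fwd;

        f[i].fwd = demo_min(f[i].fwd + f[i].want_fwd, limit);
        assert(f[i].fwd <= f[i].nxt);  /* 'nxt' is always >= 'fwd' */
    }
    return f[nb - 1].fwd;
}
```

With 6 input bytes, a first filter that forwards 3 bytes per round holds back
the second one, even though the second one accepts to forward 10.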

Internally, filters own 2 offsets, relative to 'buf->p', representing the
number of bytes already parsed in the available input data and the number of
bytes considered as forwarded. We will call these offsets, respectively, 'nxt'
and 'fwd'. The following macros reference these offsets:

  * FLT_NXT(flt, chn), flt_req_nxt(flt) and flt_rsp_nxt(flt)

  * FLT_FWD(flt, chn), flt_req_fwd(flt) and flt_rsp_fwd(flt)

where 'flt' is the 'struct filter' passed as argument in all callbacks and 'chn'
is the considered channel.

Using these offsets, the following operations on buffers are possible:

    chn->buf->p + FLT_NXT(flt, chn) // the pointer on parsable data for
                                    // the filter 'flt' on the channel 'chn'.
                                    // Everything between chn->buf->p and the
                                    // 'nxt' offset was already parsed by the
                                    // filter.

    chn->buf->i - FLT_NXT(flt, chn) // the number of bytes of parsable data for
                                    // the filter 'flt' on the channel 'chn'.

    chn->buf->p + FLT_FWD(flt, chn) // the pointer on forwardable data for
                                    // the filter 'flt' on the channel 'chn'.
                                    // Everything between chn->buf->p and the
                                    // 'fwd' offset was already forwarded by
                                    // the filter.


Note that at any time, for a filter, the 'nxt' offset is always greater than or
equal to the 'fwd' offset.
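
The offset arithmetic can be tried on a plain array. This sketch assumes a
simplified, non-wrapping buffer: 'struct demo_buf' reuses the 'p' and 'i' field
names for readability but is hypothetical, and 'nxt'/'fwd' are passed directly
instead of through FLT_NXT/FLT_FWD. Real HAProxy buffers are circular, so only
the offsets are illustrated.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical linear buffer: 'p' points to the first input byte and
 * 'i' is the number of input bytes (no wrapping, unlike HAProxy). */
struct demo_buf {
    char *p;
    unsigned int i;
};

/* Parsable data for a filter whose parse offset is 'nxt'. */
static char *demo_parsable(const struct demo_buf *b, unsigned int nxt,
                           unsigned int *len)
{
    assert(nxt <= b->i);
    *len = b->i - nxt;
    return b->p + nxt;
}

/* Data parsed but not yet forwarded: between 'fwd' and 'nxt'. */
static char *demo_pending(const struct demo_buf *b, unsigned int nxt,
                          unsigned int fwd, unsigned int *len)
{
    assert(fwd <= nxt && nxt <= b->i);
    *len = nxt - fwd;
    return b->p + fwd;
}
```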
820
821TODO: Add schema with buffer states when there is 2 filters that analyze data.
822
823
8243.6.1 FILTERING DATA ON TCP STREAMS
825-----------------------------------
826
827The TCP data filtering is the easy case, because HAProxy do not parse these
828data. So you have only two callbacks that you need to consider:
829
830 * 'flt_ops.tcp_data': This callback is called when unparsed data are
831 available. If not defined, all available data will be considered as parsed
832 for the filter.
833
834 * 'flt_ops.tcp_forward_data': This callback is called when parsed data are
835 available. If not defined, all parsed data will be considered as forwarded
836 for the filter.
837
838Here is an example:
839
840 /* Returns a negative value if an error occurs, else the number of
841 * consumed bytes. */
842 static int
843 my_filter_tcp_data(struct stream *s, struct filter *filter,
844 struct channel *chn)
845 {
846 struct my_filter_config *my_conf = FLT_CONF(filter);
847 int avail = chn->buf->i - FLT_NXT(filter, chn);
848 int ret = avail;
849
850 /* Do not parse more than 'my_conf->max_parse' bytes at a time */
851 if (my_conf->max_parse != 0 && ret > my_conf->max_parse)
852 ret = my_conf->max_parse;
853
854 /* if available data are not completely parsed, wake up the stream to
855 * be sure to not freeze it. */
856 if (ret != avail)
857 task_wakeup(s->task, TASK_WOKEN_MSG);
858 return ret;
859 }
860
861
    /* Returns a negative value if an error occurs, else the number of
     * forwarded bytes. */
    static int
    my_filter_tcp_forward_data(struct stream *s, struct filter *filter,
                               struct channel *chn, unsigned int len)
    {
        struct my_filter_config *my_conf = FLT_CONF(filter);
        int ret = len;

        /* Do not forward more than 'my_conf->max_forward' bytes at a time */
        if (my_conf->max_forward != 0 && ret > my_conf->max_forward)
            ret = my_conf->max_forward;

        /* if parsed data are not completely forwarded, wake up the stream to
         * be sure to not freeze it. */
        if (ret != len)
            task_wakeup(s->task, TASK_WOKEN_MSG);
        return ret;
    }
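Both callbacks may return less than what is available; the remaining bytes are
presented again on a later call, which is why the stream is woken up. The
standalone sketch below mimics that loop outside HAProxy, assuming data keeps
being offered until everything is parsed. toy_tcp_data() reproduces only the
'max_parse' logic of my_filter_tcp_data(), toy_drive() is a hypothetical
stand-in for HAProxy's own processing, and a NULL callback follows the
documented default (all available data considered as parsed).

```c
#include <stddef.h>

/* Simplified tcp_data callback: 'avail' stands for
 * chn->buf->i - FLT_NXT(filter, chn). Hypothetical signature. */
typedef int (*toy_data_cb)(unsigned int max_parse, int avail);

/* Same limit as my_filter_tcp_data(), without the task wakeup. */
static int toy_tcp_data(unsigned int max_parse, int avail)
{
    int ret = avail;

    if (max_parse != 0 && ret > (int)max_parse)
        ret = (int)max_parse;
    return ret;
}

/* Hypothetical driver: offers the unparsed part of 'total' bytes to 'cb'
 * until everything is parsed. Returns the number of passes, or -1 on error.
 * A NULL callback means all available data is considered parsed at once. */
static int toy_drive(toy_data_cb cb, unsigned int max_parse, int total)
{
    int nxt = 0, passes = 0;

    while (nxt < total) {
        int ret = cb ? cb(max_parse, total - nxt) : total - nxt;

        if (ret < 0)
            return -1;
        if (ret == 0)
            break; /* nothing consumed: would wait for more data */
        nxt += ret;
        passes++;
    }
    return passes;
}
```

With 'max_parse' set to 1000, 2500 available bytes are parsed in three passes
(1000, 1000, then 500); with 'max_parse' set to 0, or without a callback, a
single pass is enough.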



3.6.2 FILTERING DATA ON HTTP STREAMS
------------------------------------

Filtering HTTP data is a bit trickier because HAProxy parses the body
structure, especially for chunked bodies. So, basically, there are HTTP
counterparts to the previous callbacks:

  * 'flt_ops.http_data': This callback is called when unparsed data are
    available. If not defined, all available data will be considered as parsed
    for the filter.

  * 'flt_ops.http_forward_data': This callback is called when parsed data are
    available. If not defined, all parsed data will be considered as forwarded
    for the filter.

But the prototype of these callbacks is slightly different. Instead of the
channel, they take the HTTP message (struct http_msg) as parameter. You need to
be careful when you use the 'http_msg.chunk_len' value. It is the number of
bytes remaining to parse in the HTTP body (or in the current chunk for chunked
messages). The HTTP parser of HAProxy uses it to compute the number of bytes it
can consume:

    /* Available input data in the current chunk from the HAProxy point of view.
     * msg->next bytes were already parsed. Without data filtering, HAProxy
     * will consume all of it. */
    Bytes = MIN(msg->chunk_len, chn->buf->i - msg->next);


But in your filter, you need to recompute it:

    /* Available input data in the current chunk from the filter point of view.
     * 'nxt' bytes were already parsed. */
    Bytes = MIN(msg->chunk_len + msg->next, chn->buf->i) - FLT_NXT(flt, chn);
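Both formulas are plain arithmetic, so they can be checked with sample values.
The sketch below restates them as two hypothetical C functions,
haproxy_avail() and filter_avail(), with MIN() defined locally so that it
compiles on its own; the numbers used afterwards are made up.

```c
/* Local MIN() so that the snippet compiles on its own. */
#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/* Bytes HAProxy may consume in the current chunk: 'next' bytes of the
 * 'buf_i' input bytes were already parsed by HAProxy. */
static unsigned int haproxy_avail(unsigned int chunk_len, unsigned int next,
                                  unsigned int buf_i)
{
    return MIN(chunk_len, buf_i - next);
}

/* Bytes a filter may parse in the current chunk: 'flt_nxt' bytes were
 * already parsed by the filter. */
static unsigned int filter_avail(unsigned int chunk_len, unsigned int next,
                                 unsigned int buf_i, unsigned int flt_nxt)
{
    return MIN(chunk_len + next, buf_i) - flt_nxt;
}
```

For example, with chunk_len = 100, msg->next = 20, buf->i = 50 and 'nxt' = 5,
HAProxy may consume MIN(100, 30) = 30 bytes while the filter may parse
MIN(120, 50) - 5 = 45 bytes. When 'nxt' equals msg->next, both results are
identical, since MIN(c + n, i) - n = MIN(c, i - n).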


In addition to these callbacks, there are two others:

  * 'flt_ops.http_end': This callback is called when the whole HTTP
    request/response has been processed. It can interrupt the stream
    processing, so it can be used, for example, to synchronize the HTTP
    request with the HTTP response:

        /* Returns a negative value if an error occurs, 0 if it needs to wait,
         * any other value otherwise. */
        static int
        my_filter_http_end(struct stream *s, struct filter *filter,
                           struct http_msg *msg)
        {
            struct my_filter_ctx *my_ctx = filter->ctx;

            if (!(msg->chn->flags & CF_ISRESP)) /* The request */
                my_ctx->end_of_req = 1;
            else  /* The response */
                my_ctx->end_of_rsp = 1;

            /* Both the request and the response are finished */
            if (my_ctx->end_of_req == 1 && my_ctx->end_of_rsp == 1)
                return 1;

            /* Wait */
            return 0;
        }


  * 'flt_ops.http_chunk_trailers': This callback is called for chunked HTTP
    messages only, when all chunks have been parsed. HTTP trailers can be
    parsed in several passes, and this callback is called on each of them. The
    number of bytes parsed by HAProxy at each iteration is stored in
    'msg->sol'.
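Stripped of the HAProxy types, the synchronization done by my_filter_http_end()
boils down to the sketch below. 'struct toy_ctx' and toy_http_end() are
hypothetical, and the boolean replaces the (msg->chn->flags & CF_ISRESP) test.
Whichever message ends first gets 0 (wait); once both messages have ended, any
call returns 1.

```c
#include <stdbool.h>

/* Hypothetical per-stream context, mirroring 'struct my_filter_ctx'. */
struct toy_ctx {
    int end_of_req;
    int end_of_rsp;
};

/* Decision logic of my_filter_http_end(): 'is_resp' replaces the
 * (msg->chn->flags & CF_ISRESP) test. Returns 0 to wait, 1 to continue. */
static int toy_http_end(struct toy_ctx *ctx, bool is_resp)
{
    if (!is_resp)
        ctx->end_of_req = 1;
    else
        ctx->end_of_rsp = 1;

    /* Continue only once both messages have ended */
    return (ctx->end_of_req == 1 && ctx->end_of_rsp == 1) ? 1 : 0;
}
```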

Then, to finish, there are 2 informational callbacks:

  * 'flt_ops.http_reset': This callback is called when an HTTP message is
    reset. This only happens when a '100-continue' response is received. It
    can be used to reset the filter context before receiving the real
    response.

  * 'flt_ops.http_reply': This callback is called when HAProxy decides, at any
    time, to stop processing an HTTP message and to send an internal response
    to the client. This mainly happens when an error or a redirect occurs.


3.6.3 REWRITING DATA
--------------------

The last part of data filtering, and the trickiest one, is data rewriting. For
now, the filter API offers few functions to handle it: only functions that
notify HAProxy that the data size has changed, so that it can update the
internal state of the filters. It is your responsibility to update the data
itself, including the buffer offsets. For an HTTP message, you must also update
the 'msg->next' and 'msg->chunk_len' values accordingly:

  * 'flt_change_next_size': This function must be called when a filter alters
    incoming data. It updates the 'nxt' offset value of all its predecessors.
    Not calling this function when a filter changes the size of incoming data
    leads to undefined behavior.

        struct buffer *buf = msg->chn->buf;
        unsigned int avail = MIN(msg->chunk_len + msg->next, buf->i) -
                             flt_rsp_nxt(filter);

        if (avail > 10 && /* ...Some condition... */) {
            /* Move the buffer forward to have buf->p pointing on unparsed
             * data */
            b_adv(buf, flt_rsp_nxt(filter));

            /* Remove the first 10 bytes by moving the rest of the input data
             * down (memmove(dst, src, len)). To simplify this example, we
             * consider a non-wrapping buffer */
            memmove(buf->p, buf->p + 10, buf->i - 10);

            /* Restore buf->p value */
            b_rew(buf, flt_rsp_nxt(filter));

            /* Now update other filters */
            flt_change_next_size(filter, msg->chn, -10);

            /* Update the buffer state */
            buf->i -= 10;

            /* And update the HTTP message state */
            msg->chunk_len -= 10;

            return (avail - 10);
        }
        else
            return 0; /* Wait for more data */


  * 'flt_change_forward_size': This function must be called when a filter
    alters parsed data. It updates the offset values ('nxt' and 'fwd') of all
    filters. Not calling this function when a filter changes the size of
    parsed data leads to undefined behavior.

        struct buffer *buf = msg->chn->buf;

        /* len is the number of bytes of forwardable data */
        if (len > 10 && /* ...Some condition... */) {
            /* Move the buffer forward to have buf->p pointing on non-forwarded
             * data */
            b_adv(buf, flt_rsp_fwd(filter));

            /* Remove the first 10 bytes by moving the rest of the input data
             * down (memmove(dst, src, len)). To simplify this example, we
             * consider a non-wrapping buffer */
            memmove(buf->p, buf->p + 10, buf->i - 10);

            /* Restore buf->p value */
            b_rew(buf, flt_rsp_fwd(filter));

            /* Now update other filters */
            flt_change_forward_size(filter, msg->chn, -10);

            /* Update the buffer state */
            buf->i -= 10;

            /* And update the HTTP message state */
            msg->next -= 10;

            return (len - 10);
        }
        else
            return 0; /* Wait for more data */
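The two fragments above depend on HAProxy's buffer helpers, but the underlying
byte shuffling can be checked on its own. The standalone sketch below removes
'cut' bytes at offset 'off' of a flat, non-wrapping buffer, moving the tail of
the data down with memmove(dst, src, n), and then shifts the 'nxt' offsets of
the preceding filters, which is the bookkeeping flt_change_next_size() is
described as doing. 'struct toy_buf', toy_remove() and the 'pred_nxt' array
are hypothetical, and the sketch assumes those predecessors have already
parsed past the removed bytes.

```c
#include <string.h>

/* Hypothetical flat, non-wrapping buffer holding 'len' input bytes. */
struct toy_buf {
    char data[64];
    unsigned int len;
};

/* Removes 'cut' bytes at offset 'off': the bytes after them are moved down
 * with memmove(dst, src, n), the source being past the removed bytes. Then
 * shifts the 'nxt' offsets of the 'npred' preceding filters by -cut,
 * assuming they all parsed past the removed bytes. Returns the new input
 * length, or -1 if the range does not fit in the buffer. */
static int toy_remove(struct toy_buf *b, unsigned int off, unsigned int cut,
                      unsigned int *pred_nxt, unsigned int npred)
{
    unsigned int i;

    if (off > b->len || cut > b->len - off)
        return -1;
    memmove(b->data + off, b->data + off + cut, b->len - off - cut);
    b->len -= cut;
    for (i = 0; i < npred; i++)
        pred_nxt[i] -= cut;
    return (int)b->len;
}
```

For instance, removing the 10 digits from "HEADER0123456789abcdefTAIL" at
offset 6 leaves "HEADERabcdefTAIL" (16 bytes). Swapping the first two
memmove() arguments would shift the data up instead of dropping the removed
bytes.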


TODO: implement all the stuff to easily rewrite data. For HTTP messages, this
      requires a chunked message; otherwise, the size of the data cannot be
      changed.




4. FAQ
------

4.1. Detect multiple declarations of the same filter
----------------------------------------------------

TODO