[DOC] added documentation about HTTP header manipulations
This section has been inserted before the logging section.
diff --git a/doc/configuration.txt b/doc/configuration.txt
index 84ebc0e..4dc529d 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -854,7 +854,7 @@
capture cookie ASPSESSION len 32
See also : "capture request header", "capture response header" as well as
- section 2.5 about logging.
+ section 2.6 about logging.
capture request header <name> len <length>
@@ -891,7 +891,7 @@
capture request header X-Forwarded-For len 15
capture request header Referrer len 15
- See also : "capture cookie", "capture response header" as well as section 2.5
+ See also : "capture cookie", "capture response header" as well as section 2.6
about logging.
@@ -927,7 +927,7 @@
capture response header Content-length len 9
capture response header Location len 15
- See also : "capture cookie", "capture request header" as well as section 2.5
+ See also : "capture cookie", "capture request header" as well as section 2.6
about logging.
@@ -1706,7 +1706,7 @@
If this option has been enabled in a "defaults" section, it can be disabled
in a specific instance by prepending the "no" keyword before it.
- See also : "log", "monitor-net", "monitor-uri" and section 2.5 about logging.
+ See also : "log", "monitor-net", "monitor-uri" and section 2.6 about logging.
option forceclose
@@ -1905,7 +1905,7 @@
This option may be set either in the frontend or the backend.
- See also : section 2.5 about logging.
+ See also : section 2.6 about logging.
option logasap
@@ -1926,7 +1926,7 @@
"Content-Length" response header so that the logs at least indicate how many
bytes are expected to be transferred.
- See also : "option httplog", "capture response header", and section 2.5 about
+ See also : "option httplog", "capture response header", and section 2.6 about
logging.
@@ -2160,7 +2160,7 @@
This option may be set either in the frontend or the backend.
- See also : "option httplog", and section 2.5 about logging.
+ See also : "option httplog", and section 2.6 about logging.
option tcpsplice [ experimental ]
@@ -2245,7 +2245,7 @@
Arguments :
<string> is the complete line to be added. Any space or known delimiter
must be escaped using a backslash ('\'). Please refer to section
- 2.6 about HTTP header manipulation for more information.
+ 2.5 about HTTP header manipulation for more information.
A new line consisting in <string> followed by a line feed will be added after
the last header of an HTTP request.
@@ -2254,7 +2254,7 @@
and not to traffic generated by HAProxy, such as health-checks or error
responses.
- See also: "rspadd" and section 2.6 about HTTP header manipulation
+ See also: "rspadd" and section 2.5 about HTTP header manipulation
reqallow <search>
@@ -2285,7 +2285,7 @@
reqiallow ^Host:\ www\.
reqideny ^Host:\ .*\.local
- See also: "reqdeny", "acl", "block" and section 2.6 about HTTP header
+ See also: "reqdeny", "acl", "block" and section 2.5 about HTTP header
manipulation
@@ -2316,7 +2316,7 @@
reqidel ^X-Forwarded-For:.*
reqidel ^Cookie:.*SERVER=
- See also: "reqadd", "reqrep", "rspdel" and section 2.6 about HTTP header
+ See also: "reqadd", "reqrep", "rspdel" and section 2.5 about HTTP header
manipulation
@@ -2340,6 +2340,10 @@
headers. Keep in mind that URLs in request line are case-sensitive while
header names are not.
+ A denied request will generate an "HTTP 403 forbidden" response once the
+ complete request has been parsed. This is consistent with what is practised
+ using ACLs.
+
It is easier, faster and more powerful to use ACLs to write access policies.
Reqdeny, reqallow and reqpass should be avoided in new designs.
@@ -2348,7 +2352,7 @@
reqideny ^Host:\ .*\.local
reqiallow ^Host:\ www\.
- See also: "reqallow", "rspdeny", "acl", "block" and section 2.6 about HTTP
+ See also: "reqallow", "rspdeny", "acl", "block" and section 2.5 about HTTP
header manipulation
@@ -2380,7 +2384,7 @@
reqideny ^Host:\ .*\.local
reqiallow ^Host:\ www\.
- See also: "reqallow", "reqdeny", "acl", "block" and section 2.6 about HTTP
+ See also: "reqallow", "reqdeny", "acl", "block" and section 2.5 about HTTP
header manipulation
@@ -2401,7 +2405,7 @@
must be escaped using a backslash ('\'). References to matched
pattern groups are possible using the common \N form, with N
being a single digit between 0 and 9. Please refer to section
- 2.6 about HTTP header manipulation for more information.
+ 2.5 about HTTP header manipulation for more information.
Any line matching extended regular expression <search> in the request (both
the request line and header lines) will be completely replaced with <string>.
@@ -2419,7 +2423,7 @@
# replace "www.mydomain.com" with "www" in the host name.
reqirep ^Host:\ www.mydomain.com Host:\ www
- See also: "reqadd", "reqdel", "rsprep" and section 2.6 about HTTP header
+ See also: "reqadd", "reqdel", "rsprep" and section 2.5 about HTTP header
manipulation
@@ -2439,7 +2443,9 @@
A request containing any line which matches extended regular expression
<search> will be tarpitted, which means that it will connect to nowhere, will
- be kept open for a pre-defined time, then will return an HTTP error 500. The
+ be kept open for a pre-defined time, then will return an HTTP error 500 so
+ that the attacker does not suspect it has been tarpitted. The status 500 will
+ be reported in the logs, but the completion flags will indicate "PT". The
delay is defined by "timeout tarpit", or "timeout connect" if the former is
not set.
@@ -2455,7 +2461,7 @@
reqipass ^User-Agent:\.*(Mozilla|MSIE)
reqitarpit ^User-Agent:
- See also: "reqallow", "reqdeny", "reqpass", and section 2.6 about HTTP header
+ See also: "reqallow", "reqdeny", "reqpass", and section 2.5 about HTTP header
manipulation
@@ -2466,7 +2472,7 @@
Arguments :
<string> is the complete line to be added. Any space or known delimiter
must be escaped using a backslash ('\'). Please refer to section
- 2.6 about HTTP header manipulation for more information.
+ 2.5 about HTTP header manipulation for more information.
A new line consisting in <string> followed by a line feed will be added after
the last header of an HTTP response.
@@ -2475,7 +2481,7 @@
and not to traffic generated by HAProxy, such as health-checks or error
responses.
- See also: "reqadd" and section 2.6 about HTTP header manipulation
+ See also: "reqadd" and section 2.5 about HTTP header manipulation
rspdel <search>
@@ -2505,7 +2511,7 @@
# remove the Server header from responses
reqidel ^Server:.*
- See also: "rspadd", "rsprep", "reqdel" and section 2.6 about HTTP header
+ See also: "rspadd", "rsprep", "reqdel" and section 2.5 about HTTP header
manipulation
@@ -2529,9 +2535,9 @@
case-sensitive.
Main use of this keyword is to prevent sensitive information leak and to
- block the response before it reaches the client. If a response is denied,
- it will be replaced with an HTTP 502 error so that the client never gets
- the sensitive data.
+ block the response before it reaches the client. If a response is denied, it
+ will be replaced with an HTTP 502 error so that the client never retrieves
+ any sensitive data.
It is easier, faster and more powerful to use ACLs to write access policies.
Rspdeny should be avoided in new designs.
@@ -2540,7 +2546,7 @@
# Ensure that no content type matching ms-word will leak
rspideny ^Content-type:\.*/ms-word
- See also: "reqdeny", "acl", "block" and section 2.6 about HTTP header
+ See also: "reqdeny", "acl", "block" and section 2.5 about HTTP header
manipulation
@@ -2562,7 +2568,7 @@
must be escaped using a backslash ('\'). References to matched
pattern groups are possible using the common \N form, with N
being a single digit between 0 and 9. Please refer to section
- 2.6 about HTTP header manipulation for more information.
+ 2.5 about HTTP header manipulation for more information.
Any line matching extended regular expression <search> in the response (both
the response line and header lines) will be completely replaced with
@@ -2578,7 +2584,7 @@
# replace "Location: 127.0.0.1:8080" with "Location: www.mydomain.com"
rspirep ^Location:\ 127.0.0.1:8080 Location:\ www.mydomain.com
- See also: "rspadd", "rspdel", "reqrep" and section 2.6 about HTTP header
+ See also: "rspadd", "rspdel", "reqrep" and section 2.5 about HTTP header
manipulation
@@ -3822,8 +3828,104 @@
instance between 10 and 100 to leave enough room above and below for later
adjustments.
+
+2.5) HTTP header manipulation
+-----------------------------
+
+In HTTP mode, it is possible to rewrite, add or delete some of the request and
+response headers based on regular expressions. It is also possible to block a
+request or a response if a particular header matches a regular expression,
+which is enough to stop most elementary protocol attacks, and to protect
+against information leak from the internal network. But there is a limitation
+to this : since HAProxy's HTTP engine does not support keep-alive, only headers
+passed during the first request of a TCP session will be seen. All subsequent
+headers will be considered data only and not analyzed. Furthermore, HAProxy
+never touches data contents; it stops analysis at the end of headers.
+
+This section covers common usage of the following keywords, described in detail
+in section 2.2.1 :
+
+ - reqadd <string>
+ - reqallow <search>
+ - reqiallow <search>
+ - reqdel <search>
+ - reqidel <search>
+ - reqdeny <search>
+ - reqideny <search>
+ - reqpass <search>
+ - reqipass <search>
+ - reqrep <search> <replace>
+ - reqirep <search> <replace>
+ - reqtarpit <search>
+ - reqitarpit <search>
+ - rspadd <string>
+ - rspdel <search>
+ - rspidel <search>
+ - rspdeny <search>
+ - rspideny <search>
+ - rsprep <search> <replace>
+ - rspirep <search> <replace>
+
+With all these keywords, the same conventions are used. The <search> parameter
+is a POSIX extended regular expression (regex) which supports grouping through
+parentheses (without the backslash). Spaces and other delimiters must be
+prefixed with a backslash ('\') to avoid confusion with a field delimiter.
+Other characters may be prefixed with a backslash to change their meaning :
+
+ \t for a tab
+ \r for a carriage return (CR)
+ \n for a new line (LF)
+ \ to mark a space and differentiate it from a delimiter
+ \# to mark a sharp and differentiate it from a comment
+ \\ to use a backslash in a regex
+ \\\\ to use a backslash in the text (*2 for regex, *2 for haproxy)
+ \xXX to write the ASCII hex code XX as in the C language
+
+The <replace> parameter contains the string to be used to replace the largest
+portion of text matching the regex. It can make use of the special characters
+above, and can reference a substring which is delimited by parentheses in the
+regex, by writing a backslash ('\') immediately followed by one digit from 0 to
+9 indicating the group position (0 designating the entire line). This practice
+is very common among users of the "sed" program.
+
+The <string> parameter represents the string which will systematically be added
+after the last header line. It can also use the special character sequences
+above.
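+
+As an illustration of these conventions, the following sketch rewrites a
+request line using group references and adds a header containing an escaped
+space. The path prefix and the added header are arbitrary values chosen for
+the example, not recommended settings :
+
+    # turn "GET /static/img.png HTTP/1.0" into "GET /img.png HTTP/1.0" :
+    # \1 is the method, \2 is the remainder of the line after "/static/"
+    reqrep ^([^\ ]*)\ /static/(.*)   \1\ /\2
+
+    # the space following the colon is escaped so that it is not taken for
+    # a field delimiter
+    reqadd X-Forwarded-Proto:\ https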
+
+Notes related to these keywords :
+---------------------------------
+ - these keywords are not always convenient to allow/deny based on header
+ contents. It is strongly recommended to use ACLs with the "block" keyword
+ instead, resulting in far more flexible and manageable rules.
+
+ - lines are always considered as a whole. It is not possible to reference
+ a header name only or a value only. This is important because of the way
+ headers are written (notably the number of spaces after the colon).
+
+ - the first line is always considered as a header, which makes it possible to
+ rewrite or filter HTTP requests URIs or response codes, but in turn makes
+ it harder to distinguish between headers and request line. The regex prefix
+ ^[^\ \t]*[\ \t] matches any HTTP method followed by a space, and the prefix
+ ^[^\ \t:]*: matches any header name followed by a colon.
+
+ - for performance reasons, the number of characters added to a request or to
+ a response is limited at build time to values between 1 and 4 kB. This
+ should normally be far more than enough for most usages. If it is too short
+ on occasional usages, it is possible to gain some space by removing some
+ useless headers before adding new ones.
+
+ - keywords beginning with "reqi" and "rspi" behave like their counterparts
+ without the 'i' letter except that they ignore case when matching patterns.
+
+ - when a request passes through a frontend then a backend, all req* rules
+ from the frontend will be evaluated, then all req* rules from the backend
+ will be evaluated. The reverse path is applied to responses.
+
+ - req* statements are applied after "block" statements, so that "block" is
+ always the first one, but before "use_backend" in order to permit rewriting
+ before switching.
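+
+As a sketch of the first and last notes above, the regex rule and the ACL rule
+below both aim at rejecting requests whose Host header designates a ".local"
+name. They are close in intent but not strictly equivalent, since the regex
+matches ".local" anywhere in the value while "hdr_end" only matches at the end.
+The ACL names and the backend name "bk_www" are arbitrary and assumed to be
+declared elsewhere :
+
+    # regex form, discouraged in new designs
+    reqideny ^Host:\ .*\.local
+
+    # similar intent using an ACL and "block"
+    acl host_local hdr_end(host) -i .local
+    block if host_local
+
+    # req* rules are applied after "block" and before "use_backend", so the
+    # backend should be selected on the rewritten Host header
+    reqirep ^Host:\ www\.mydomain\.com   Host:\ www
+    acl host_www hdr(host) -i www
+    use_backend bk_www if host_www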
+
-2.5) Logging
+2.6) Logging
------------
[to do]