BUG/MINOR: http_fetch: make hdr_ip() reject trailing characters
The hdr_ip() sample fetch function will try to extract IP addresses
from a header field. These IP addresses are parsed using url2ipv4()
and if it fails it will fall back to inet_pton(AF_INET6), otherwise
will fail.
There is a small problem there which is that if a field starts with
an IP address and is immediately followed by some garbage, the IP
address part is still returned. This is a problem with fields such
as x-forwarded-for because it prevents detection of accidental
corruption or bug along the chain. For example, the following string:
x-forwarded-for: 1.2.3.4; 5.6.7.8
or this one:
x-forwarded-for: 1.2.3.4O ( the last one being the letter 'O')
would still return "1.2.3.4" despite the trailing characters. This is
bad because it will silently cover broken code running on intermediary
proxies and may even in some cases allow haproxy to pass improperly
formatted headers after they were apparently validated, for example,
if someone extracts the address from this field to place it into
another one.
This issue would only affect the IPv4 parser, because the IPv6 parser
already uses inet_pton() which fails at the first invalid character and
rejects trailing port numbers.
In strict compliance with RFC7239, let's make sure that if there are any
characters left in the string, the parsing fails and makes hdr_ip()
return nothing. However, a special case has to be handled to support
IPv4 addresses followed by a colon and a valid port number, because till
now the parser used to implicitly accept them and it appears that this
practice, though rare, does exist at least in Azure:
https://docs.microsoft.com/en-us/azure/application-gateway/how-application-gateway-works
This issue has always been there so the fix may be backported to all
versions. It will need the following commit in order to work as expected:
MINOR: tools: make url2ipv4 return the exact number of bytes parsed
Many thanks to https://twitter.com/melardev and the BitMEX Security Team
for their detailed report.
(cherry picked from commit 7b0e00d943feaa6320522c376fbbff18965b1345)
Signed-off-by: Willy Tarreau <w@1wt.eu>
diff --git a/doc/configuration.txt b/doc/configuration.txt
index 3170efa..5baacef 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -18330,7 +18330,11 @@
This extracts the last occurrence of header <name> in an HTTP request,
converts it to an IPv4 or IPv6 address and returns this address. When used
with ACLs, all occurrences are checked, and if <name> is omitted, every value
- of every header is checked.
+ of every header is checked. The parser strictly adheres to the format
+ described in RFC7239, with the extension that IPv4 addresses may optionally
+ be followed by a colon (':') and a valid decimal port number (0 to 65535),
+ which will be silently dropped. All other forms will not match and will
+ cause the address to be ignored.
The <occ> parameter is processed as with req.hdr().