The PROXY protocol - 2010/10/29 - Willy TARREAU
-----------------------------------------------

Relaying TCP connections through proxies generally involves a loss of the
original TCP connection parameters such as source and destination addresses,
ports, and so on. Some protocols make it a little bit easier to transfer such
information. For SMTP, Postfix authors have proposed the XCLIENT protocol which
received broad adoption and is particularly suited to mail exchanges. In HTTP,
we have the non-standard but omnipresent X-Forwarded-For header which relays
information about the original source address, and the less common
X-Original-To which relays information about the destination address.

However, both mechanisms require intermediaries to have knowledge of the
underlying protocol in order to implement them.

Then comes a new class of products which we'll call "dumb proxies", not because
they don't do anything, but because they're processing protocol-agnostic data.
Stunnel is an example of such a "dumb proxy". It talks raw TCP on one side, and
raw SSL on the other one, and does that reliably.

The problem with such a proxy when it is combined with another one such as
haproxy is to adapt it to talk the higher level protocol. A patch is available
for Stunnel to make it capable of inserting an X-Forwarded-For header in the
first HTTP request of each incoming connection. Haproxy is able to refrain from
adding another one when the connection comes from Stunnel, which makes it
possible to hide Stunnel from the servers.

The typical architecture becomes the following one :


    +--------+      HTTP                     :80 +----------+
    | client | --------------------------------> |          |
    |        |                                   | haproxy, |
    +--------+            +---------+            | 1 or 2   |
   /        /    HTTPS    | stunnel | HTTP   :81 | listening|
  <________/   ---------> | (server | ---------> | ports    |
                          |  mode)  |            |          |
                          +---------+            +----------+


The problem appears when haproxy runs with keep-alive on the side towards the
client. The Stunnel patch will only add the X-Forwarded-For header to the first
request of each connection and all subsequent requests will not have it. One
solution could be to improve the patch to make it support keep-alive and parse
all forwarded data, whether they're announced with a Content-Length or with a
Transfer-Encoding, taking care of special methods such as HEAD which announce
data without transferring them, etc... In fact, it would require implementing a
full HTTP stack in Stunnel. It would then become a lot more complex, a lot less
reliable and would no longer be the "dumb proxy" that fits every purpose.

In practice, we don't need to add a header for each request because we'll emit
the exact same information every time : the information related to the client
side connection. We could then cache that information in haproxy and use it for
every other request. But that becomes dangerous and is still limited to HTTP
only.

Another approach would be to prepend each connection with a line reporting the
characteristics of the other side's connection. This method is a lot simpler to
implement, does not require any protocol-specific knowledge on either side, and
completely fits the purpose. That's finally what we did with a small patch to
Stunnel and another one to haproxy. We have called this protocol the PROXY
protocol.

The PROXY protocol's goal is to fill the receiver's internal structures with
the information it could have found itself if it performed the accept from the
client. Thus right now we're supporting the following :
  - INET protocol and family (TCP over IPv4 or IPv6)
  - layer 3 source and destination addresses
  - layer 4 source and destination ports if any

Unlike the XCLIENT protocol, the PROXY protocol was designed with limited
extensibility in order to help the receiver parse it very fast, while keeping
it human-readable for better debugging possibilities. So it consists of exactly
the following block prepended before any data flowing from the dumb proxy to
the next hop :

  - a string identifying the protocol : "PROXY" ( \x50 \x52 \x4F \x58 \x59 )

  - exactly one space : " " ( \x20 )

  - a string indicating the proxied INET protocol and family. At the moment,
    only "TCP4" ( \x54 \x43 \x50 \x34 ) for TCP over IPv4, and "TCP6"
    ( \x54 \x43 \x50 \x36 ) for TCP over IPv6 are allowed. Unsupported or
    unknown protocols must be reported with the name "UNKNOWN" ( \x55 \x4E \x4B
    \x4E \x4F \x57 \x4E). The remaining fields of the line are then optional
    and may be ignored, until the CRLF is found.

  - exactly one space : " " ( \x20 )

  - the layer 3 source address in its canonical format. IPv4 addresses must be
    indicated as a series of exactly 4 integers in the range [0..255] inclusive
    written in decimal representation separated by exactly one dot between each
    other. Leading zeroes are not permitted in front of numbers in order to
    avoid any possible confusion with octal numbers. IPv6 addresses must be
    indicated as a series of 4 hexadecimal digits (upper or lower case)
    delimited by colons between each other, with the acceptance of one double
    colon sequence to replace the largest acceptable range of consecutive
    zeroes. The total number of decoded bits must exactly be 128. The
    advertised protocol family dictates what format to use.

  - exactly one space : " " ( \x20 )

  - the layer 3 destination address in its canonical format. It is the same
    format as the layer 3 source address and matches the same family.

  - exactly one space : " " ( \x20 )

  - the TCP source port represented as a decimal integer in the range
    [0..65535] inclusive. Leading zeroes are not permitted in front of numbers
    in order to avoid any possible confusion with octal numbers.

  - exactly one space : " " ( \x20 )

  - the TCP destination port represented as a decimal integer in the range
    [0..65535] inclusive. Leading zeroes are not permitted in front of numbers
    in order to avoid any possible confusion with octal numbers.

  - the CRLF sequence ( \x0D \x0A )
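
The field rules above can be sketched as a strict parser. The Python below is
only an illustration under stated assumptions, not the code used in haproxy or
Stunnel : the names ProxyError and parse_proxy_line are hypothetical, the
function expects exactly one complete line including its CRLF, and it accepts
IPv6 groups of 1 to 4 hexadecimal digits, which is one possible reading of the
address rule.

```python
import ipaddress
import re

# Decimal integer without leading zeroes ("0" itself is allowed).
_DECIMAL = re.compile(r"(0|[1-9][0-9]*)\Z")
# Only hexadecimal digits and colons: rejects dotted suffixes and scope ids.
_HEX_COLON = re.compile(r"[0-9A-Fa-f:]+\Z")


class ProxyError(ValueError):
    """Hypothetical error type: the line does not match the protocol."""


def _port(text):
    if not _DECIMAL.match(text) or int(text) > 65535:
        raise ProxyError("bad port: %r" % text)
    return int(text)


def _ipv4(text):
    parts = text.split(".")
    if len(parts) != 4 or not all(
        _DECIMAL.match(p) and int(p) <= 255 for p in parts
    ):
        raise ProxyError("bad IPv4 address: %r" % text)
    return text


def _ipv6(text):
    if not _HEX_COLON.match(text):
        raise ProxyError("bad IPv6 address: %r" % text)
    try:
        ipaddress.IPv6Address(text)  # checks grouping and the 128-bit total
    except ValueError:
        raise ProxyError("bad IPv6 address: %r" % text) from None
    return text


def parse_proxy_line(line):
    """Parse one complete PROXY line given as bytes, CRLF included.

    Returns None for the UNKNOWN family, otherwise the tuple
    (family, src_addr, dst_addr, src_port, dst_port).
    """
    if not line.endswith(b"\r\n"):
        raise ProxyError("missing CRLF")
    try:
        text = line[:-2].decode("ascii")
    except UnicodeDecodeError:
        raise ProxyError("non-ASCII byte in line") from None
    # For UNKNOWN, whatever follows on the line is ignored.
    if text == "PROXY UNKNOWN" or text.startswith("PROXY UNKNOWN "):
        return None
    fields = text.split(" ")  # a doubled space yields an empty field
    if len(fields) != 6 or fields[0] != "PROXY":
        raise ProxyError("malformed line")
    family, src, dst, sport, dport = fields[1:]
    if family == "TCP4":
        check = _ipv4
    elif family == "TCP6":
        check = _ipv6
    else:
        raise ProxyError("unsupported family: %r" % family)
    return (family, check(src), check(dst), _port(sport), _port(dport))
```

A caller would typically treat ProxyError as the signal to abort the
connection, and a None result as "use the real connection's parameters".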

The receiver MUST be configured to only receive this protocol and MUST NOT try
to guess whether the line is prepended or not. That means that the protocol
explicitly prevents port sharing between public and private access. Otherwise
it would become a big security issue. The receiver should ensure proper access
filtering so that only trusted proxies are allowed to use this protocol. The
receiver must wait for the CRLF sequence to decode the addresses in order to
ensure they are complete. Any sequence which does not exactly match the
protocol must be discarded and cause a connection abort. It is recommended
to abort the connection as soon as possible so that the emitter notices the
anomaly.
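
The "wait for the CRLF, abort early otherwise" behaviour can be sketched as a
small buffer-splitting helper. This is an illustration only : ProxyAbort,
split_proxy_header and MAX_HEADER are hypothetical names, and the 107-byte cap
is an assumption which is not stated in this document (the longest well-formed
TCP6 line built from the rules above is 104 bytes, so the cap leaves a small
margin).

```python
MAX_HEADER = 107  # assumption: upper bound on the line length, CRLF included
PREFIX = b"PROXY "


class ProxyAbort(Exception):
    """Hypothetical signal: the connection must be aborted."""


def split_proxy_header(buf):
    """Split the bytes received so far into (header_line, payload).

    Returns None while the CRLF has not been received yet, and raises
    ProxyAbort as soon as the data can no longer be a valid PROXY line.
    """
    # Abort early: the bytes received so far must be a prefix of "PROXY ".
    n = min(len(buf), len(PREFIX))
    if buf[:n] != PREFIX[:n]:
        raise ProxyAbort("not a PROXY line")
    # The CRLF must fit entirely within the first MAX_HEADER bytes.
    end = buf.find(b"\r\n", 0, MAX_HEADER)
    if end < 0:
        if len(buf) >= MAX_HEADER:
            raise ProxyAbort("line too long")
        return None  # incomplete: wait for more data
    return buf[:end + 2], buf[end + 2:]
```

The returned header line still has to be validated field by field; the payload
part is what gets handed to the higher level protocol once the line has been
skipped.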

If the announced transport protocol is "UNKNOWN", then the receiver knows that
the emitter talks the correct protocol, and may or may not decide to accept the
connection and use the real connection's parameters as if there was no such
protocol on the wire.

An example of such a line before an HTTP request would look like this (CR
marked as "\r" and LF marked as "\n") :

    PROXY TCP4 192.168.0.1 192.168.0.11 56324 443\r\n
    GET / HTTP/1.1\r\n
    Host: 192.168.0.11\r\n
    \r\n
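
On the emitter side, producing such a line could look like the sketch below.
It is not the Stunnel patch : build_proxy_line is a hypothetical helper, and
writing IPv6 addresses as eight fully expanded groups of 4 hexadecimal digits
is an assumption made here to keep the output unambiguous.

```python
import ipaddress


def _canonical(ip):
    """Format an address object the way the line expects it."""
    if ip.version == 4:
        return str(ip)  # dotted decimal, no leading zeroes
    digits = "%032x" % int(ip)  # 128 bits as 32 hexadecimal digits
    return ":".join(digits[i:i + 4] for i in range(0, 32, 4))


def build_proxy_line(src, dst, sport, dport):
    """Return the PROXY line (bytes, CRLF included) for one connection."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    if src_ip.version != dst_ip.version:
        raise ValueError("source and destination families differ")
    for port in (sport, dport):
        if not 0 <= port <= 65535:
            raise ValueError("port out of range: %r" % port)
    family = "TCP4" if src_ip.version == 4 else "TCP6"
    line = "PROXY %s %s %s %d %d\r\n" % (
        family, _canonical(src_ip), _canonical(dst_ip), sport, dport)
    return line.encode("ascii")
```

With the addresses and ports of the example above,
build_proxy_line("192.168.0.1", "192.168.0.11", 56324, 443) returns the first
line shown, which the emitter would write before any other data.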

For the emitter, the line is easy to put into the output buffers once the
connection is established. For the receiver, once the line is parsed, it's
easy to skip it from the input buffers.

We have a patch available for recent versions of Stunnel that brings it the
ability to be an emitter. The feature is called "sendproxy" there. The code
for the receiving side has been merged into haproxy and is enabled using the
"accept-proxy" keyword on a "bind" statement. Haproxy will use the transport
information from the PROXY protocol for logging, ACLs, etc... everywhere
information about the original connection is required.

It is possible that the protocol may slightly evolve to present other
information such as the incoming network interface, or the origin addresses in
case of network address translation happening before the first proxy, but this
is not identified as a requirement right now.
--