Willy Tarreau | 7f89851 | 2011-03-20 11:32:40 +0100 | [diff] [blame] | 1 | The PROXY protocol |
| 2 | Willy Tarreau |
| 3 | 2011/03/20 |
| 4 | |
| 5 | Abstract |
| 6 | |
| 7 | The PROXY protocol provides a convenient way to safely transport connection |
| 8 | information such as a client's address across multiple layers of NAT or TCP |
| 9 | proxies. It is designed to require few changes to existing components and |
| 10 | to limit the performance impact caused by the processing of the transported |
| 11 | information. |
| 12 | |
| 13 | |
| 14 | Revision history |
| 15 | |
| 16 | 2010/10/29 - first version |
| 17 | 2011/03/20 - update: implementation and security considerations |
| 18 | |
| 19 | |
| 20 | 1. Background |
Willy Tarreau | 640cf22 | 2010-10-29 21:46:16 +0200 | [diff] [blame] | 21 | |
| 22 | Relaying TCP connections through proxies generally involves a loss of the |
| 23 | original TCP connection parameters such as source and destination addresses, |
| 24 | ports, and so on. Some protocols make it a little bit easier to transfer such |
| 25 | information. For SMTP, Postfix authors have proposed the XCLIENT protocol which |
| 26 | received broad adoption and is particularly suited to mail exchanges. In HTTP, |
| 27 | we have the non-standard but omnipresent X-Forwarded-For header which relays |
| 28 | information about the original source address, and the less common |
| 29 | X-Original-To which relays information about the destination address. |
| 30 | |
| 31 | However, both mechanisms require a knowledge of the underlying protocol to be |
| 32 | implemented in intermediaries. |
| 33 | |
| 34 | Then comes a new class of products which we'll call "dumb proxies", not because |
| 35 | they don't do anything, but because they're processing protocol-agnostic data. |
| 36 | Stunnel is an example of such a "dumb proxy". It talks raw TCP on one side, and |
| 37 | raw SSL on the other one, and does that reliably. |
| 38 | |
| 39 | The problem with such a proxy when it is combined with another one such as |
| 40 | haproxy is to adapt it to talk the higher level protocol. A patch is available |
| 41 | for Stunnel to make it capable of inserting an X-Forwarded-For header in the |
| 42 | first HTTP request of each incoming connection. Haproxy is able to refrain from |
| 43 | adding another one when the connection comes from Stunnel, so that it's |
| 44 | possible to hide it from the servers. |
| 45 | |
| 46 | The typical architecture becomes the following one : |
| 47 | |
| 48 | |
| 49 |   +--------+      HTTP                     :80 +----------+ |
| 50 |   | client | --------------------------------> |          | |
| 51 |   |        |                                   | haproxy, | |
| 52 |   +--------+            +---------+            | 1 or 2   | |
| 53 |  /        /    HTTPS    | stunnel |  HTTP :81  | listening| |
| 54 | <________/   ---------> | (server | ---------> | ports    | |
| 55 |                         |  mode)  |            |          | |
| 56 |                         +---------+            +----------+ |
| 57 | |
| 58 | |
| 59 | The problem appears when haproxy runs with keep-alive on the side towards the |
| 60 | client. The Stunnel patch will only add the X-Forwarded-For header to the first |
| 61 | request of each connection and all subsequent requests will not have it. One |
| 62 | solution could be to improve the patch to make it support keep-alive and parse |
| 63 | all forwarded data, whether they're announced with a Content-Length or with a |
| 64 | Transfer-Encoding, taking care of special methods such as HEAD which announce |
| 65 | data without transferring them, etc... In fact, it would require implementing a |
| 66 | full HTTP stack in Stunnel. It would then become a lot more complex, a lot less |
| 67 | reliable and would no longer be the "dumb proxy" that fits every purpose. |
| 68 | |
| 69 | In practice, we don't need to add a header for each request because we'll emit |
| 70 | the exact same information every time : the information related to the client |
| 71 | side connection. We could then cache that information in haproxy and use it for |
| 72 | every other request. But that becomes dangerous and is still limited to HTTP |
| 73 | only. |
| 74 | |
| 75 | Another approach would be to prepend each connection with a line reporting the |
| 76 | characteristics of the other side's connection. This method is a lot simpler to |
| 77 | implement, does not require any protocol-specific knowledge on either side, and |
| 78 | completely fits the purpose. That's finally what we did with a small patch to |
| 79 | Stunnel and another one to haproxy. We have called this protocol the PROXY |
| 80 | protocol. |
| 81 | |
| 82 | |
| 83 | 2. The PROXY protocol |
| 84 | |
| 85 | The PROXY protocol's goal is to fill the receiver's internal structures with |
| 86 | the information it could have found itself if it performed the accept from the |
| 87 | client. Thus right now we're supporting the following : |
| 88 | - INET protocol and family (TCP over IPv4 or IPv6) |
| 89 | - layer 3 source and destination addresses |
| 90 | - layer 4 source and destination ports if any |
| 91 | |
| 92 | Unlike the XCLIENT protocol, the PROXY protocol was designed with limited |
| 93 | extensibility in order to help the receiver parse it very fast, while keeping |
| 94 | it human-readable for better debugging possibilities. So it consists of exactly |
| 95 | the following block prepended before any data flowing from the dumb proxy to |
| 96 | the next hop : |
| 97 | |
| 98 | - a string identifying the protocol : "PROXY" ( \x50 \x52 \x4F \x58 \x59 ) |
| 99 | |
| 100 | - exactly one space : " " ( \x20 ) |
| 101 | |
| 102 | - a string indicating the proxied INET protocol and family. At the moment, |
| 103 | only "TCP4" ( \x54 \x43 \x50 \x34 ) for TCP over IPv4, and "TCP6" |
| 104 | ( \x54 \x43 \x50 \x36 ) for TCP over IPv6 are allowed. Unsupported or |
| 105 | unknown protocols must be reported with the name "UNKNOWN" ( \x55 \x4E \x4B |
| 106 | \x4E \x4F \x57 \x4E). The remaining fields of the line are then optional |
| 107 | and may be ignored, until the CRLF is found. |
| 108 | |
| 109 | - exactly one space : " " ( \x20 ) |
| 110 | |
| 111 | - the layer 3 source address in its canonical format. IPv4 addresses must be |
| 112 | indicated as a series of exactly 4 integers in the range [0..255] inclusive |
| 113 | written in decimal representation separated by exactly one dot between each |
| 114 | other. Leading zeroes are not permitted in front of numbers in order to |
| 115 | avoid any possible confusion with octal numbers. IPv6 addresses must be |
| 116 | indicated as series of groups of 4 hexadecimal digits (upper or lower case) delimited |
| 117 | by colons between each other, with the acceptance of one double colon |
| 118 | sequence to replace the largest acceptable range of consecutive zeroes. The |
| 119 | total number of decoded bits must exactly be 128. The advertised protocol |
| 120 | family dictates what format to use. |
| 121 | |
| 122 | - exactly one space : " " ( \x20 ) |
| 123 | |
| 124 | - the layer 3 destination address in its canonical format. It is the same |
| 125 | format as the layer 3 source address and matches the same family. |
| 126 | |
| 127 | - exactly one space : " " ( \x20 ) |
| 128 | |
| 129 | - the TCP source port represented as a decimal integer in the range |
| 130 | [0..65535] inclusive. Leading zeroes are not permitted in front of numbers |
| 131 | in order to avoid any possible confusion with octal numbers. |
| 132 | |
| 133 | - exactly one space : " " ( \x20 ) |
| 134 | |
| 135 | - the TCP destination port represented as a decimal integer in the range |
| 136 | [0..65535] inclusive. Leading zeroes are not permitted in front of numbers |
| 137 | in order to avoid any possible confusion with octal numbers. |
| 138 | |
| 139 | - the CRLF sequence ( \x0D \x0A ) |
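
The fields listed above can be assembled by an emitter in a few lines. The
following Python sketch is only an illustration: the function name
build_proxy_line is hypothetical, and it assumes that the text form produced by
Python's ipaddress module is an acceptable canonical format for both families.

```python
import ipaddress


def build_proxy_line(src_ip, dst_ip, src_port, dst_port):
    """Return the PROXY line (as bytes) describing one TCP connection.

    Hypothetical helper, not taken from any existing implementation.
    """
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    # Both addresses must belong to the advertised family.
    if src.version != dst.version:
        raise ValueError("source and destination families differ")
    for port in (src_port, dst_port):
        if not 0 <= port <= 65535:
            raise ValueError("port out of range: %r" % (port,))
    family = "TCP4" if src.version == 4 else "TCP6"
    # Fields are separated by exactly one space and the line ends with CRLF.
    # %d never emits leading zeroes, and str() of an ipaddress object is
    # assumed here to be the canonical text form.
    line = "PROXY %s %s %s %d %d\r\n" % (family, src, dst, src_port, dst_port)
    return line.encode("ascii")
```

The emitter would write the returned bytes once, before any other data, right
after the connection to the next hop is established.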
| 140 | |
| 141 | The receiver MUST be configured to only receive this protocol and MUST NOT try |
| 142 | to guess whether the line is prepended or not. That means that the protocol |
| 143 | explicitly prevents port sharing between public and private access. Otherwise |
| 144 | it would become a big security issue. The receiver should ensure proper access |
| 145 | filtering so that only trusted proxies are allowed to use this protocol. The |
| 146 | receiver must wait for the CRLF sequence to decode the addresses in order to |
| 147 | ensure they are complete. Any sequence which does not exactly match the |
| 148 | protocol must be discarded and cause a connection abort. It is recommended |
| 149 | to abort the connection as soon as possible so that the emitter notices the |
| 150 | anomaly. |
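
A strict receiver could be sketched as follows in Python. This is an
illustration under stated assumptions, not a reference parser: the names
parse_proxy_line, _address and _port are hypothetical, the caller is assumed to
pass exactly one line including its CRLF, and shortened IPv6 groups are
accepted because Python's ipaddress module accepts them. Any ValueError is
meant to be turned into a connection abort by the caller.

```python
import ipaddress
import re

_DEC = r"(?:0|[1-9][0-9]*)"                     # decimal, no leading zeroes
_IPV4_RE = re.compile(r"%s(?:\.%s){3}" % (_DEC, _DEC))
_IPV6_RE = re.compile(r"[0-9A-Fa-f:]+")         # hex digits and colons only
_PORT_RE = re.compile(_DEC)


def _address(family, text):
    if family == "TCP4":
        if not _IPV4_RE.fullmatch(text):
            raise ValueError("bad IPv4 address: %r" % text)
        return ipaddress.IPv4Address(text)      # also checks the 0..255 range
    if not _IPV6_RE.fullmatch(text):
        raise ValueError("bad IPv6 address: %r" % text)
    return ipaddress.IPv6Address(text)          # also checks the 128-bit total


def _port(text):
    if not _PORT_RE.fullmatch(text) or int(text) > 65535:
        raise ValueError("bad port: %r" % text)
    return int(text)


def parse_proxy_line(line):
    """Parse one complete PROXY line given as bytes, CRLF included.

    Returns None for UNKNOWN, otherwise a tuple
    (family, source, destination, source_port, destination_port).
    Raises ValueError on anything that does not match exactly.
    """
    if not line.endswith(b"\r\n"):
        raise ValueError("CRLF not received yet")
    try:
        fields = line[:-2].decode("ascii").split(" ")
    except UnicodeDecodeError:
        raise ValueError("non-ASCII byte in PROXY line")
    if fields[0] != "PROXY" or len(fields) < 2:
        raise ValueError("not a PROXY line")
    if fields[1] == "UNKNOWN":
        return None                             # remaining fields are ignored
    if fields[1] not in ("TCP4", "TCP6") or len(fields) != 6:
        raise ValueError("malformed PROXY line")
    family = fields[1]
    return (family, _address(family, fields[2]), _address(family, fields[3]),
            _port(fields[4]), _port(fields[5]))
```

Splitting on a single space makes any doubled separator show up as an empty
field, which then fails validation instead of being silently tolerated.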
| 151 | |
| 152 | If the announced transport protocol is "UNKNOWN", then the receiver knows that |
| 153 | the emitter talks the correct protocol, and may or may not decide to accept the |
| 154 | connection and use the real connection's parameters as if there was no such |
| 155 | protocol on the wire. |
| 156 | |
| 157 | An example of such a line before an HTTP request would look like this (CR |
| 158 | marked as "\r" and LF marked as "\n") : |
| 159 | |
| 160 | PROXY TCP4 192.168.0.1 192.168.0.11 56324 443\r\n |
| 161 | GET / HTTP/1.1\r\n |
| 162 | Host: 192.168.0.11\r\n |
| 163 | \r\n |
| 164 | |
| 165 | For the emitter, the line is easy to put into the output buffers once the |
| 166 | connection is established. For the receiver, once the line is parsed, it's |
| 167 | easy to skip it from the input buffers. |
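
On the receiving side, skipping the line from the input buffer could look like
the Python sketch below. The function name split_proxy_line and the MAX_LINE
cap are hypothetical choices made for this illustration; the cap is not part of
the protocol description above. The early prefix check is one way to abort as
soon as possible when the peer is clearly not speaking the protocol.

```python
MAX_LINE = 256  # hypothetical cap chosen for this sketch


def split_proxy_line(buffer):
    """Separate the PROXY line from the payload that follows it.

    Returns (line, rest) once the CRLF has been received, or None when more
    data is needed. Raises ValueError when the buffer cannot be a PROXY line.
    """
    # Compare whatever has arrived so far against the expected prefix, so a
    # wrong first byte is rejected without waiting for a CRLF.
    if not b"PROXY ".startswith(buffer[:6]):
        raise ValueError("input does not start with 'PROXY '")
    end = buffer.find(b"\r\n")
    if end == -1:
        if len(buffer) >= MAX_LINE:
            raise ValueError("no CRLF within %d bytes" % MAX_LINE)
        return None
    if end + 2 > MAX_LINE:
        raise ValueError("PROXY line longer than %d bytes" % MAX_LINE)
    # Everything after the CRLF belongs to the proxied protocol, untouched.
    return buffer[:end + 2], buffer[end + 2:]
```

A receiver would call this each time new bytes are appended to the buffer and
hand the remaining bytes to the normal protocol handler once a pair is
returned.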
| 168 | |
| 169 | |
| 170 | 3. Implementations |
| 171 | |
| 172 | Haproxy 1.5 implements the PROXY protocol on both sides : |
| 173 | - the listening sockets accept the protocol when the "accept-proxy" setting |
| 174 | is passed to the "bind" keyword. Connections accepted on such listeners |
| 175 | will behave just as if the source really was the one advertised in the |
| 176 | protocol. This is true for logging, ACLs, content filtering, transparent |
| 177 | proxying, etc... |
| 178 | |
| 179 | - the protocol may be used to connect to servers if the "send-proxy" setting |
| 180 | is present on the "server" line. It is enabled on a per-server basis, so it |
| 181 | is possible to have it enabled for remote servers only and still have local |
| 182 | ones behave differently. If the incoming connection was accepted with the |
| 183 | "accept-proxy" setting, then the relayed information is the one advertised in this |
| 184 | connection's PROXY line. |
| 185 | |
| 186 | We have a patch available for recent versions of Stunnel that brings it the |
| 187 | ability to be an emitter. The feature is called "sendproxy" there. |
| 188 | |
| 189 | The protocol is so simple that it is expected that other implementations will |
| 190 | appear, especially in environments such as SMTP, IMAP, FTP, RDP where the |
| 191 | client's address is an important piece of information for the server and some |
| 192 | intermediaries. |
| 193 | |
| 194 | Proxy developers are encouraged to implement this protocol, because it will |
| 195 | make their products much more transparent in complex infrastructures, and will |
| 196 | get rid of a number of issues related to logging and access control. |
| 197 | |
| 198 | |
| 199 | 4. Security considerations |
| 200 | |
| 201 | The protocol was designed so as to be distinguishable from HTTP. It will not |
| 202 | parse as a valid HTTP request and an HTTP request will not parse as a valid |
| 203 | proxy request. That makes it easier to enforce its use on certain connections. |
| 204 | Implementers should be very careful about not trying to automatically detect |
| 205 | whether they have to decode the line or not, but rather to only rely on a |
| 206 | configuration parameter. Indeed, if the opportunity is left to a normal client |
| 207 | to use the protocol, he will be able to hide his activities or make them appear |
| 208 | as coming from someone else. However, accepting the line only from a number of |
| 209 | known sources should be safe. |
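
Restricting the protocol to known sources can be as simple as checking the
peer's address before decoding anything. The Python sketch below is only an
illustration: TRUSTED_PROXIES, its content and the function name
may_send_proxy_line are hypothetical, and a real deployment would load the list
from its own configuration.

```python
import ipaddress

# Hypothetical list of networks whose hosts are trusted to emit the line.
TRUSTED_PROXIES = [ipaddress.ip_network("192.168.0.0/24")]


def may_send_proxy_line(peer_ip):
    """Tell whether the directly connected peer is a trusted proxy.

    peer_ip is the address seen on the real TCP connection, never one
    taken from a PROXY line.
    """
    addr = ipaddress.ip_address(peer_ip)
    # Membership is False when the address and network families differ.
    return any(addr in net for net in TRUSTED_PROXIES)
```

A listener configured for the protocol would close connections from any peer
for which this check fails, rather than falling back to reading the data as a
normal client connection.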
| 210 | |
| 211 | |
| 212 | 5. Future developments |
| 213 | |
| 214 | It is possible that the protocol may slightly evolve to present other |
| 215 | information such as the incoming network interface, or the origin addresses in |
| 216 | case of network address translation happening before the first proxy, but this |
| 217 | is not identified as a requirement right now. Suggestions on improvements are |
| 218 | welcome. |
| 219 | |
| 220 | |
| 221 | 6. Contacts |
| 222 | |
| 223 | Please use w@1wt.eu to send any comments to the author. |
| 224 | |