2020/03/05                                              Willy Tarreau
                         HAProxy Technologies
                          The PROXY protocol
                            Versions 1 & 2

Abstract

The PROXY protocol provides a convenient way to safely transport connection
information such as a client's address across multiple layers of NAT or TCP
proxies. It is designed to require few changes to existing components and
to limit the performance impact caused by the processing of the transported
information.


Revision history

   2010/10/29 - first version
   2011/03/20 - update: implementation and security considerations
   2012/06/21 - add support for binary format
   2012/11/19 - final review and fixes
   2014/05/18 - modify and extend PROXY protocol version 2
   2014/06/11 - fix example code to consider ver+cmd merge
   2014/06/14 - fix v2 header check in example code, and update Forwarded spec
   2014/07/12 - update list of implementations (add Squid)
   2015/05/02 - update list of implementations and format of the TLV add-ons
   2017/03/10 - added the checksum, noop and more SSL-related TLV types,
                reserved TLV type ranges, added TLV documentation, clarified
                string encoding. With contributions from Andriy Palamarchuk
                (Amazon.com).
   2020/03/05 - added the unique ID TLV type (Tim Düsterhus)


1. Background

Relaying TCP connections through proxies generally involves a loss of the
original TCP connection parameters such as source and destination addresses,
ports, and so on. Some protocols make it a little bit easier to transfer such
information. For SMTP, Postfix authors have proposed the XCLIENT protocol [1]
which received broad adoption and is particularly suited to mail exchanges.
For HTTP, there is the "Forwarded" extension [2], which aims at replacing the
omnipresent "X-Forwarded-For" header which carries information about the
original source address, and the less common X-Original-To which carries
information about the destination address.

However, both mechanisms require knowledge of the underlying protocol to be
implemented in intermediaries.

Then comes a new class of products which we'll call "dumb proxies", not because
they don't do anything, but because they're processing protocol-agnostic data.
Both Stunnel[3] and Stud[4] are examples of such "dumb proxies". They talk raw
TCP on one side, and raw SSL on the other one, and do that reliably, without
any knowledge of what protocol is transported on top of the connection. HAProxy
running in pure TCP mode obviously falls into that category as well.

The problem with such a proxy when it is combined with another one such as
haproxy, is to adapt it to talk the higher level protocol. A patch is available
for Stunnel to make it capable of inserting an X-Forwarded-For header in the
first HTTP request of each incoming connection. HAProxy is able not to add
another one when the connection comes from Stunnel, so that it's possible to
hide it from the servers.

The typical architecture becomes the following one :


      +--------+      HTTP                      :80 +----------+
      | client | --------------------------------> |          |
      |        |                                   | haproxy, |
      +--------+             +---------+           | 1 or 2   |
     /        /     HTTPS    | stunnel |  HTTP  :81 | listening|
    <________/    ---------> | (server | ---------> | ports    |
                             | mode)   |            |          |
                             +---------+            +----------+


The problem appears when haproxy runs with keep-alive on the side towards the
client. The Stunnel patch will only add the X-Forwarded-For header to the first
request of each connection and all subsequent requests will not have it. One
solution could be to improve the patch to make it support keep-alive and parse
all forwarded data, whether they're announced with a Content-Length or with a
Transfer-Encoding, taking care of special methods such as HEAD which announce
data without transferring them, etc... In fact, it would require implementing a
full HTTP stack in Stunnel. It would then become a lot more complex, a lot less
reliable and would no longer be the "dumb proxy" that fits every purpose.

In practice, we don't need to add a header for each request because we'll emit
the exact same information every time : the information related to the client
side connection. We could then cache that information in haproxy and use it for
every other request. But that becomes dangerous and is still limited to HTTP
only.

Another approach consists in prepending each connection with a header reporting
the characteristics of the other side's connection. This method is simpler to
implement, does not require any protocol-specific knowledge on either side, and
completely fits the purpose since what is desired precisely is to know the
other side's connection endpoints. It is easy to perform for the sender (just
send a short header once the connection is established) and to parse for the
receiver (simply perform one read() on the incoming connection to fill in
addresses after an accept). The protocol used to carry connection information
across proxies was thus called the PROXY protocol.


2. The PROXY protocol header

This document uses a few terms that are worth explaining here :
  - "connection initiator" is the party requesting a new connection
  - "connection target" is the party accepting a connection request
  - "client" is the party for which a connection was requested
  - "server" is the party to which the client desired to connect
  - "proxy" is the party intercepting and relaying the connection
    from the client to the server.
  - "sender" is the party sending data over a connection.
  - "receiver" is the party receiving data from the sender.
  - "header" or "PROXY protocol header" is the block of connection information
    the connection initiator prepends at the beginning of a connection, which
    makes it the sender from the protocol point of view.

The PROXY protocol's goal is to fill the server's internal structures with the
information collected by the proxy that the server would have been able to get
by itself if the client was connecting directly to the server instead of via a
proxy. The information carried by the protocol is what the server would get
using getsockname() and getpeername() :
  - address family (AF_INET for IPv4, AF_INET6 for IPv6, AF_UNIX)
  - socket protocol (SOCK_STREAM for TCP, SOCK_DGRAM for UDP)
  - layer 3 source and destination addresses
  - layer 4 source and destination ports if any

Unlike the XCLIENT protocol, the PROXY protocol was designed with limited
extensibility in order to help the receiver parse it very fast. Version 1 was
focused on keeping it human-readable for better debugging possibilities, which
is always desirable for early adoption when few implementations exist. Version
2 adds support for a binary encoding of the header which is much more efficient
to produce and to parse, especially when dealing with IPv6 addresses that are
expensive to emit in ASCII form and to parse.

In both cases, the protocol simply consists in an easily parsable header placed
by the connection initiator at the beginning of each connection. The protocol
is intentionally stateless in that it does not expect the sender to wait for
the receiver before sending the header, nor the receiver to send anything back.

This specification supports two header formats, a human-readable format which
is the only format supported in version 1 of the protocol, and a binary format
which is only supported in version 2. Both formats were designed to ensure that
the header cannot be confused with common higher level protocols such as HTTP,
SSL/TLS, FTP or SMTP, and that both formats are easily distinguishable from
each other for the receiver.

Version 1 senders MAY only produce the human-readable header format. Version 2
senders MAY only produce the binary header format. Version 1 receivers MUST at
least implement the human-readable header format. Version 2 receivers MUST at
least implement the binary header format, and it is recommended that they also
implement the human-readable header format for better interoperability and ease
of upgrade when facing version 1 senders.

Both formats are designed to fit in the smallest TCP segment that any TCP/IP
host is required to support (576 - 40 = 536 bytes). This ensures that the whole
header will always be delivered at once when the socket buffers are still empty
at the beginning of a connection. The sender must always ensure that the header
is sent at once, so that the transport layer maintains atomicity along the path
to the receiver. The receiver may be tolerant to partial headers or may simply
drop the connection when receiving a partial header. The recommendation is to
be tolerant, but implementation constraints may not always easily permit this.
It is important to note that nothing forces any intermediary to forward the
whole header at once, because TCP is a streaming protocol which may be
processed one byte at a time if desired, causing the header to be fragmented
when reaching the receiver. But due to the places where such a protocol is
used, the above simplification generally is acceptable because the risk of
crossing such a device handling one byte at a time is close to zero.

The receiver MUST NOT start processing the connection before it receives a
complete and valid PROXY protocol header. This is particularly important for
protocols where the receiver is expected to speak first (eg: SMTP, FTP or SSH).
The receiver may apply a short timeout and decide to abort the connection if
the protocol header is not seen within a few seconds (at least 3 seconds to
cover a TCP retransmit).

The receiver MUST be configured to only receive the protocol described in this
specification and MUST NOT try to guess whether the protocol header is present
or not. This means that the protocol explicitly prevents port sharing between
public and private access. Otherwise it would open a major security breach by
allowing untrusted parties to spoof their connection addresses. The receiver
SHOULD ensure proper access filtering so that only trusted proxies are allowed
to use this protocol.

Some proxies are smart enough to understand transported protocols and to reuse
idle server connections for multiple messages. This typically happens in HTTP
where requests from multiple clients may be sent over the same connection. Such
proxies MUST NOT implement this protocol on multiplexed connections because the
receiver would use the address advertised in the PROXY header as the address of
all forwarded requests' senders. In fact, such proxies are not dumb proxies,
and since they do have a complete understanding of the transported protocol,
they MUST use the facilities provided by the transported protocol to present
the client's address.


2.1. Human-readable header format (Version 1)

This is the format specified in version 1 of the protocol. It consists in one
line of US-ASCII text matching exactly the following block, sent immediately
and at once upon the connection establishment and prepended before any data
flowing from the sender to the receiver :

  - a string identifying the protocol : "PROXY" ( \x50 \x52 \x4F \x58 \x59 )
    Seeing this string indicates that this is version 1 of the protocol.

  - exactly one space : " " ( \x20 )

  - a string indicating the proxied INET protocol and family. As of version 1,
    only "TCP4" ( \x54 \x43 \x50 \x34 ) for TCP over IPv4, and "TCP6"
    ( \x54 \x43 \x50 \x36 ) for TCP over IPv6 are allowed. Other, unsupported,
    or unknown protocols must be reported with the name "UNKNOWN" ( \x55 \x4E
    \x4B \x4E \x4F \x57 \x4E ). For "UNKNOWN", the rest of the line before the
    CRLF may be omitted by the sender, and the receiver must ignore anything
    presented before the CRLF is found. Note that an earlier version of this
    specification suggested to use this when sending health checks, but this
    causes issues with servers that reject the "UNKNOWN" keyword. Thus it is
    now recommended not to send "UNKNOWN" when the connection is expected to
    be accepted, but only when it is not possible to correctly fill the PROXY
    line.

  - exactly one space : " " ( \x20 )

  - the layer 3 source address in its canonical format. IPv4 addresses must be
    indicated as a series of exactly 4 integers in the range [0..255] inclusive
    written in decimal representation separated by exactly one dot between each
    other. Leading zeroes are not permitted in front of numbers in order to
    avoid any possible confusion with octal numbers. IPv6 addresses must be
    indicated as a series of sets of 4 hexadecimal digits (upper or lower case)
    delimited by colons between each other, with the acceptance of one double
    colon sequence to replace the largest acceptable range of consecutive
    zeroes. The total number of decoded bits must exactly be 128. The
    advertised protocol family dictates what format to use.

  - exactly one space : " " ( \x20 )

  - the layer 3 destination address in its canonical format. It is the same
    format as the layer 3 source address and matches the same family.

  - exactly one space : " " ( \x20 )

  - the TCP source port represented as a decimal integer in the range
    [0..65535] inclusive. Leading zeroes are not permitted in front of numbers
    in order to avoid any possible confusion with octal numbers.

  - exactly one space : " " ( \x20 )

  - the TCP destination port represented as a decimal integer in the range
    [0..65535] inclusive. Leading zeroes are not permitted in front of numbers
    in order to avoid any possible confusion with octal numbers.

  - the CRLF sequence ( \x0D \x0A )


The maximum line lengths the receiver must support including the CRLF are :
  - TCP/IPv4 :
      "PROXY TCP4 255.255.255.255 255.255.255.255 65535 65535\r\n"
    => 5 + 1 + 4 + 1 + 15 + 1 + 15 + 1 + 5 + 1 + 5 + 2 = 56 chars

  - TCP/IPv6 :
      "PROXY TCP6 ffff:f...f:ffff ffff:f...f:ffff 65535 65535\r\n"
    => 5 + 1 + 4 + 1 + 39 + 1 + 39 + 1 + 5 + 1 + 5 + 2 = 104 chars

  - unknown connection (short form) :
      "PROXY UNKNOWN\r\n"
    => 5 + 1 + 7 + 2 = 15 chars

  - worst case (optional fields set to 0xff) :
      "PROXY UNKNOWN ffff:f...f:ffff ffff:f...f:ffff 65535 65535\r\n"
    => 5 + 1 + 7 + 1 + 39 + 1 + 39 + 1 + 5 + 1 + 5 + 2 = 107 chars

So a 108-byte buffer is always enough to store all the line and a trailing zero
for string processing.
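
The line lengths computed above can be double-checked at compile time. The
following is only a small C11 sketch: the macro and constant names (PP1_LEN,
PP1_MAX_TCP4 and so on) are hypothetical and not part of this specification,
and the IPv6 worst case is assumed to be a fully expanded all-ones address.

```c
#include <stddef.h>

/* sizeof() on a string literal counts its trailing zero, so subtract 1 to
 * get the number of bytes that would appear on the wire.
 */
#define PP1_LEN(s) (sizeof(s) - 1)

/* Longest textual IPv6 address allowed by the format: 8 groups of 4 hex
 * digits and 7 colons, 39 characters in total.
 */
#define PP1_V6_MAX "ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff"

enum {
    PP1_MAX_TCP4 = PP1_LEN("PROXY TCP4 255.255.255.255 255.255.255.255 65535 65535\r\n"),
    PP1_MAX_TCP6 = PP1_LEN("PROXY TCP6 " PP1_V6_MAX " " PP1_V6_MAX " 65535 65535\r\n"),
    PP1_UNKNOWN  = PP1_LEN("PROXY UNKNOWN\r\n"),
    PP1_MAX_LINE = PP1_LEN("PROXY UNKNOWN " PP1_V6_MAX " " PP1_V6_MAX " 65535 65535\r\n"),
    PP1_BUF_SIZE = PP1_MAX_LINE + 1    /* room for the trailing zero */
};

_Static_assert(PP1_MAX_TCP4 == 56,  "TCP/IPv4 worst case");
_Static_assert(PP1_MAX_TCP6 == 104, "TCP/IPv6 worst case");
_Static_assert(PP1_UNKNOWN  == 15,  "short UNKNOWN form");
_Static_assert(PP1_MAX_LINE == 107, "overall worst case");
_Static_assert(PP1_BUF_SIZE == 108, "buffer including trailing zero");
```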

The receiver must wait for the CRLF sequence before starting to decode the
addresses in order to ensure they are complete and properly parsed. If the CRLF
sequence is not found in the first 107 characters, the receiver should declare
the line invalid. A receiver may reject an incomplete line which does not
contain the CRLF sequence in the first atomic read operation. The receiver must
not tolerate a single CR or LF character to end the line when a complete CRLF
sequence is expected.

Any sequence which does not exactly match the protocol must be discarded and
cause the receiver to abort the connection. It is recommended to abort the
connection as soon as possible so that the sender gets a chance to notice the
anomaly and log it.
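
The receiver rules above can be sketched as a strict parser. This is only an
illustration under stated assumptions, not a reference implementation: the
names pp1_info, pp1_field(), pp1_port() and pp1_parse() are hypothetical, the
sketch is tolerant to partial lines (it returns 0 until the CRLF is seen), and
address validation is delegated to inet_pton(), which on some systems may be
more permissive than the textual format described here (for instance it
accepts IPv6 addresses with an embedded dotted quad).

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#define PP1_LINE_MAX 107    /* longest valid line, CRLF included */

struct pp1_info {
    int      family;        /* AF_INET, AF_INET6, or AF_UNSPEC for UNKNOWN */
    uint8_t  src[16];       /* network byte order, 4 or 16 bytes used */
    uint8_t  dst[16];
    uint16_t sport, dport;  /* host byte order */
};

/* Copies the next space-terminated field into <out>, moves <*p> past it. */
static int pp1_field(const char **p, char *out, size_t outsz)
{
    const char *end = strchr(*p, ' ');
    size_t n = end ? (size_t)(end - *p) : 0;

    if (n == 0 || n >= outsz)
        return -1;
    memcpy(out, *p, n);
    out[n] = 0;
    *p = end + 1;
    return 0;
}

/* Strict decimal port: digits only, no leading zero, at most 65535. */
static int pp1_port(const char *s, uint16_t *port)
{
    size_t i, n = strlen(s);
    unsigned long v = 0;

    if (n == 0 || n > 5 || (n > 1 && s[0] == '0'))
        return -1;
    for (i = 0; i < n; i++) {
        if (s[i] < '0' || s[i] > '9')
            return -1;
        v = v * 10 + (unsigned long)(s[i] - '0');
    }
    if (v > 65535)
        return -1;
    *port = (uint16_t)v;
    return 0;
}

/* Returns the header length (> 0) when a valid line was parsed, 0 when the
 * CRLF has not been received yet, or -1 when the connection must be aborted.
 */
int pp1_parse(const char *buf, size_t len, struct pp1_info *out)
{
    char line[PP1_LINE_MAX + 1], tok[64];
    size_t max = len < PP1_LINE_MAX ? len : PP1_LINE_MAX;
    const char *lf = memchr(buf, '\n', max);
    const char *p;
    size_t i;

    if (!lf)                           /* no LF within the allowed window */
        return max == PP1_LINE_MAX ? -1 : 0;
    if (lf == buf || lf[-1] != '\r')
        return -1;                     /* LF not preceded by CR */
    i = (size_t)(lf - buf) - 1;        /* line length without the CRLF */
    if (memchr(buf, '\r', i) || memchr(buf, 0, i))
        return -1;                     /* stray CR or NUL inside the line */
    memcpy(line, buf, i);
    line[i] = 0;
    memset(out, 0, sizeof(*out));
    if (strncmp(line, "PROXY ", 6) != 0)
        return -1;
    p = line + 6;
    if (strncmp(p, "UNKNOWN", 7) == 0 && (p[7] == 0 || p[7] == ' ')) {
        out->family = AF_UNSPEC;       /* the rest of the line is ignored */
        return (int)i + 2;
    }
    if (strncmp(p, "TCP4 ", 5) == 0)
        out->family = AF_INET;
    else if (strncmp(p, "TCP6 ", 5) == 0)
        out->family = AF_INET6;
    else
        return -1;
    p += 5;
    if (pp1_field(&p, tok, sizeof(tok)) < 0 ||
        inet_pton(out->family, tok, out->src) != 1 ||
        pp1_field(&p, tok, sizeof(tok)) < 0 ||
        inet_pton(out->family, tok, out->dst) != 1 ||
        pp1_field(&p, tok, sizeof(tok)) < 0 ||
        pp1_port(tok, &out->sport) < 0 ||
        pp1_port(p, &out->dport) < 0)
        return -1;
    return (int)i + 2;
}
```

A caller would typically abort the connection on -1, keep reading on 0, and
skip the returned number of bytes from its input buffer on success.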

If the announced transport protocol is "UNKNOWN", then the receiver knows that
the sender speaks the correct PROXY protocol with the appropriate version, and
SHOULD accept the connection and use the real connection's parameters as if
there were no PROXY protocol header on the wire. However, senders SHOULD NOT
use the "UNKNOWN" protocol when they are the initiators of outgoing connections
because some receivers may reject them. When a load balancing proxy has to send
health checks to a server, it SHOULD build a valid PROXY line which it will
fill with a getsockname()/getpeername() pair indicating the addresses used. It
is important to understand that doing so is not appropriate when some source
address translation is performed between the sender and the receiver.

An example of such a line before an HTTP request would look like this (CR
marked as "\r" and LF marked as "\n") :

    PROXY TCP4 192.168.0.1 192.168.0.11 56324 443\r\n
    GET / HTTP/1.1\r\n
    Host: 192.168.0.11\r\n
    \r\n

For the sender, the header line is easy to put into the output buffers once the
connection is established. Note that since the line is always shorter than an
MSS, the sender is guaranteed to always be able to emit it at once and should
not even bother handling partial sends. For the receiver, once the header is
parsed, it is easy to skip it from the input buffers. Please consult section 9
for implementation suggestions.
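
On the sender side, building the line can be sketched as follows. The function
name pp1_build() and its parameters are hypothetical, and the sketch assumes
that inet_ntop() produces an acceptable textual form; note that for
IPv4-mapped IPv6 addresses inet_ntop() usually prints an embedded dotted quad,
which does not match the hexadecimal-only format described in this section, so
a real sender may need its own formatter for that case.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Writes a version 1 header for a TCP connection into <buf>.
 * <family> is AF_INET or AF_INET6, <src> and <dst> point to a struct in_addr
 * or struct in6_addr accordingly, ports are in host byte order.
 * Returns the line length (without the trailing zero), or -1 on error.
 */
int pp1_build(char *buf, size_t size, int family,
              const void *src, const void *dst,
              unsigned int sport, unsigned int dport)
{
    char s[INET6_ADDRSTRLEN], d[INET6_ADDRSTRLEN];
    int n;

    if (family != AF_INET && family != AF_INET6)
        return -1;
    if (sport > 65535 || dport > 65535)
        return -1;
    if (!inet_ntop(family, src, s, sizeof(s)) ||
        !inet_ntop(family, dst, d, sizeof(d)))
        return -1;

    /* %u never emits leading zeroes, as required for the ports */
    n = snprintf(buf, size, "PROXY %s %s %s %u %u\r\n",
                 family == AF_INET ? "TCP4" : "TCP6", s, d, sport, dport);
    if (n < 0 || (size_t)n >= size)
        return -1;                     /* truncated: buffer too small */
    return n;
}
```

With a 108-byte buffer the result always fits, and the returned length can be
passed to a single send() call right after the connection is established.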


2.2. Binary header format (version 2)

Producing human-readable IPv6 addresses and parsing them is very inefficient,
due to the multiple possible representation formats and the handling of compact
address format. It was also not possible to specify address families outside
IPv4/IPv6 nor non-TCP protocols. Another drawback of the human-readable format
is the fact that implementations need to parse all characters to find the
trailing CRLF, which makes it harder to read only the exact bytes count. Last,
the UNKNOWN address type has not always been accepted by servers as a valid
protocol because of its imprecise meaning.

Version 2 of the protocol thus introduces a new binary format which remains
distinguishable from version 1 and from other commonly used protocols. It was
specially designed in order to be incompatible with a wide range of protocols
and to be rejected by a number of common implementations of these protocols
when unexpectedly presented (please see section 7). Also for better processing
efficiency, IPv4 and IPv6 addresses are respectively aligned on 4 and 16 bytes
boundaries.

The binary header format starts with a constant 12 bytes block containing the
protocol signature :

      \x0D \x0A \x0D \x0A \x00 \x0D \x0A \x51 \x55 \x49 \x54 \x0A

Note that this block contains a null byte at the 5th position, so it must not
be handled as a null-terminated string.
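
Because of that null byte, the signature has to be compared with memcmp() and
an explicit length rather than with the str*() functions. The sketch below
shows one possible way for a receiver that supports both formats to tell them
apart from the first bytes received; the names pp2_sig and pp_detect_version()
are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const uint8_t pp2_sig[12] = {
    0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A
};

/* Looks at the first <len> bytes of a connection. Returns 2 for the binary
 * format, 1 for the human-readable format, 0 when the data matches neither,
 * and -1 when the bytes seen so far match a prefix of one format and more
 * data is needed to decide.
 */
int pp_detect_version(const uint8_t *buf, size_t len)
{
    size_t n;

    n = len < sizeof(pp2_sig) ? len : sizeof(pp2_sig);
    if (memcmp(buf, pp2_sig, n) == 0)
        return n == sizeof(pp2_sig) ? 2 : -1;

    n = len < 6 ? len : 6;
    if (memcmp(buf, "PROXY ", n) == 0)
        return n == 6 ? 1 : -1;

    return 0;
}
```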

The next byte (the 13th one) is the protocol version and command.

The highest four bits contain the version. As of this specification, it must
always be sent as \x2 and the receiver must only accept this value.

The lowest four bits represent the command :
  - \x0 : LOCAL : the connection was established on purpose by the proxy
    without being relayed. The connection endpoints are the sender and the
    receiver. Such connections exist when the proxy sends health-checks to the
    server. The receiver must accept this connection as valid and must use the
    real connection endpoints and discard the protocol block including the
    family which is ignored.

  - \x1 : PROXY : the connection was established on behalf of another node,
    and reflects the original connection endpoints. The receiver must then use
    the information provided in the protocol block to get the original address.

  - other values are unassigned and must not be emitted by senders. Receivers
    must drop connections presenting unexpected values here.
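
Splitting the 13th byte into its two 4-bit fields can be sketched as below.
The enum and function names are hypothetical; only the numeric values come
from the list above.

```c
#include <stdint.h>

enum pp2_cmd {
    PP2_CMD_LOCAL = 0x0,
    PP2_CMD_PROXY = 0x1
};

/* Returns the command (PP2_CMD_LOCAL or PP2_CMD_PROXY) when the version
 * nibble is 2 and the command is assigned, or -1 when the connection must
 * be dropped.
 */
int pp2_check_ver_cmd(uint8_t ver_cmd)
{
    if ((ver_cmd >> 4) != 0x2)         /* highest four bits: version */
        return -1;

    switch (ver_cmd & 0x0F) {          /* lowest four bits: command */
    case PP2_CMD_LOCAL:
        return PP2_CMD_LOCAL;
    case PP2_CMD_PROXY:
        return PP2_CMD_PROXY;
    default:
        return -1;                     /* unassigned command */
    }
}
```

For instance \x21 decodes as version 2 with the PROXY command, and \x20 as
version 2 with the LOCAL command.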

The 14th byte contains the transport protocol and address family. The highest 4
bits contain the address family, the lowest 4 bits contain the protocol.

The address family maps to the original socket family without necessarily
matching the values internally used by the system. It may be one of :

  - 0x0 : AF_UNSPEC : the connection is forwarded for an unknown, unspecified
    or unsupported protocol. The sender should use this family when sending
    LOCAL commands or when dealing with unsupported protocol families. The
    receiver is free to accept the connection anyway and use the real endpoint
    addresses or to reject it. The receiver should ignore address information.

  - 0x1 : AF_INET : the forwarded connection uses the AF_INET address family
    (IPv4). The addresses are exactly 4 bytes each in network byte order,
    followed by transport protocol information (typically ports).

  - 0x2 : AF_INET6 : the forwarded connection uses the AF_INET6 address family
    (IPv6). The addresses are exactly 16 bytes each in network byte order,
    followed by transport protocol information (typically ports).

  - 0x3 : AF_UNIX : the forwarded connection uses the AF_UNIX address family
    (UNIX). The addresses are exactly 108 bytes each.

  - other values are unspecified and must not be emitted in version 2 of this
    protocol and must be rejected as invalid by receivers.

The transport protocol is specified in the lowest 4 bits of the 14th byte :

  - 0x0 : UNSPEC : the connection is forwarded for an unknown, unspecified
    or unsupported protocol. The sender should use this family when sending
    LOCAL commands or when dealing with unsupported protocol families. The
    receiver is free to accept the connection anyway and use the real endpoint
    addresses or to reject it. The receiver should ignore address information.

  - 0x1 : STREAM : the forwarded connection uses a SOCK_STREAM protocol (eg:
    TCP or UNIX_STREAM). When used with AF_INET/AF_INET6 (TCP), the addresses
    are followed by the source and destination ports represented on 2 bytes
    each in network byte order.

  - 0x2 : DGRAM : the forwarded connection uses a SOCK_DGRAM protocol (eg:
    UDP or UNIX_DGRAM). When used with AF_INET/AF_INET6 (UDP), the addresses
    are followed by the source and destination ports represented on 2 bytes
    each in network byte order.

  - other values are unspecified and must not be emitted in version 2 of this
    protocol and must be rejected as invalid by receivers.

In practice, the following protocol bytes are expected :

  - \x00 : UNSPEC : the connection is forwarded for an unknown, unspecified
    or unsupported protocol. The sender should use this family when sending
    LOCAL commands or when dealing with unsupported protocol families. When
    used with a LOCAL command, the receiver must accept the connection and
    ignore any address information. For other commands, the receiver is free
    to accept the connection anyway and use the real endpoints addresses or to
    reject the connection. The receiver should ignore address information.

  - \x11 : TCP over IPv4 : the forwarded connection uses TCP over the AF_INET
    protocol family. Address length is 2*4 + 2*2 = 12 bytes.

  - \x12 : UDP over IPv4 : the forwarded connection uses UDP over the AF_INET
    protocol family. Address length is 2*4 + 2*2 = 12 bytes.

  - \x21 : TCP over IPv6 : the forwarded connection uses TCP over the AF_INET6
    protocol family. Address length is 2*16 + 2*2 = 36 bytes.

  - \x22 : UDP over IPv6 : the forwarded connection uses UDP over the AF_INET6
    protocol family. Address length is 2*16 + 2*2 = 36 bytes.

  - \x31 : UNIX stream : the forwarded connection uses SOCK_STREAM over the
    AF_UNIX protocol family. Address length is 2*108 = 216 bytes.

  - \x32 : UNIX datagram : the forwarded connection uses SOCK_DGRAM over the
    AF_UNIX protocol family. Address length is 2*108 = 216 bytes.


Only the UNSPEC protocol byte (\x00) is mandatory to implement on the receiver.
A receiver is not required to implement other ones, provided that it
automatically falls back to the UNSPEC mode for the valid combinations above
that it does not support.
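
The mapping from the 14th byte to the size of the address block can be
sketched as a small lookup. The function name pp2_addr_len() is hypothetical.
Treating a byte in which only one of the two nibbles is UNSPEC (for example
\x10) the same way as \x00 is an assumption of this sketch, not something
stated above.

```c
#include <stdint.h>

/* Returns the number of address bytes implied by the family/protocol byte:
 * 12, 36 or 216 for the combinations listed above, 0 when the byte must be
 * handled in UNSPEC mode, or -1 when either nibble holds an unspecified
 * value and the header must be rejected.
 */
int pp2_addr_len(uint8_t fam)
{
    uint8_t family = fam >> 4;         /* highest 4 bits: address family */
    uint8_t proto  = fam & 0x0F;       /* lowest 4 bits: transport protocol */

    if (family > 0x3 || proto > 0x2)
        return -1;
    if (family == 0x0 || proto == 0x0)
        return 0;                      /* assumption: partial UNSPEC == UNSPEC */

    switch (family) {
    case 0x1: return 2 * 4 + 2 * 2;    /* AF_INET  : 12 bytes */
    case 0x2: return 2 * 16 + 2 * 2;   /* AF_INET6 : 36 bytes */
    default:  return 2 * 108;          /* AF_UNIX  : 216 bytes */
    }
}
```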

The 15th and 16th bytes are the address length in bytes, in network byte order.
It is used so that the receiver knows how many address bytes to skip even when
it does not implement the presented protocol. Thus the length of the protocol
header in bytes is always exactly 16 + this value. When a sender presents a
LOCAL connection, it should not present any address so it sets this field to
zero. Receivers MUST always consider this field to skip the appropriate number
of bytes and must not assume zero is presented for LOCAL connections. When a
receiver accepts an incoming connection showing an UNSPEC address family or
protocol, it may or may not decide to log the address information if present.

So the 16-byte version 2 header can be described this way :

    struct proxy_hdr_v2 {
        uint8_t sig[12];  /* hex 0D 0A 0D 0A 00 0D 0A 51 55 49 54 0A */
        uint8_t ver_cmd;  /* protocol version and command */
        uint8_t fam;      /* address family and transport protocol */
        uint16_t len;     /* number of following bytes part of the header */
    };

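As a minimal sketch of the length rule above, the following C helper computes
the total header size from the first 16 bytes. The function name
pp2_total_len and the PP2_HDR_FIXED_LEN macro are illustrative only and are
not defined by this specification.

```c
#include <stddef.h>
#include <stdint.h>

#define PP2_HDR_FIXED_LEN 16   /* sig[12] + ver_cmd + fam + len (illustrative) */

/* Returns the total header size (16 + len), or 0 when fewer than 16 bytes
 * are available. The len field is assembled byte by byte so the result
 * depends neither on host endianness nor on buffer alignment.
 */
size_t pp2_total_len(const uint8_t *buf, size_t avail)
{
    uint16_t len;

    if (avail < PP2_HDR_FIXED_LEN)
        return 0;

    /* bytes 15 and 16 (offsets 14 and 15), network byte order */
    len = (uint16_t)((buf[14] << 8) | buf[15]);
    return PP2_HDR_FIXED_LEN + (size_t)len;
}
```

A receiver that does not support the announced family can still use this
value to skip the whole header before handing the stream to the application.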
Starting from the 17th byte, addresses are presented in network byte order.
The address order is always the same :
  - source layer 3 address in network byte order
  - destination layer 3 address in network byte order
  - source layer 4 address if any, in network byte order (port)
  - destination layer 4 address if any, in network byte order (port)

The address block may directly be sent from or received into the following
union which makes it easy to cast from/to the relevant socket native structs
depending on the address type :

    union proxy_addr {
        struct {        /* for TCP/UDP over IPv4, len = 12 */
            uint32_t src_addr;
            uint32_t dst_addr;
            uint16_t src_port;
            uint16_t dst_port;
        } ipv4_addr;
        struct {        /* for TCP/UDP over IPv6, len = 36 */
            uint8_t  src_addr[16];
            uint8_t  dst_addr[16];
            uint16_t src_port;
            uint16_t dst_port;
        } ipv6_addr;
        struct {        /* for AF_UNIX sockets, len = 216 */
            uint8_t src_addr[108];
            uint8_t dst_addr[108];
        } unix_addr;
    };

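The sketch below shows one possible way to turn a 12-byte IPv4 address block
into a printable string. It copies the fields with memcpy() rather than
casting, which avoids alignment assumptions. The function name
pp2_format_ipv4 is hypothetical; inet_ntop() and ntohs() are the standard
POSIX calls.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Formats "src_ip:src_port -> dst_ip:dst_port" from a 12-byte IPv4 address
 * block laid out as described above. Returns 0 on success, -1 on failure.
 */
int pp2_format_ipv4(const uint8_t blk[12], char *out, size_t outlen)
{
    struct in_addr src, dst;
    uint16_t sport, dport;
    char s[INET_ADDRSTRLEN], d[INET_ADDRSTRLEN];
    int n;

    /* addresses and ports are already in network byte order */
    memcpy(&src.s_addr, blk, 4);
    memcpy(&dst.s_addr, blk + 4, 4);
    memcpy(&sport, blk + 8, 2);
    memcpy(&dport, blk + 10, 2);

    if (!inet_ntop(AF_INET, &src, s, sizeof(s)) ||
        !inet_ntop(AF_INET, &dst, d, sizeof(d)))
        return -1;

    /* ports must be converted to host order before being printed */
    n = snprintf(out, outlen, "%s:%u -> %s:%u",
                 s, (unsigned)ntohs(sport), d, (unsigned)ntohs(dport));
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```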
The sender must ensure that the whole protocol header is sent at once. This
block is always smaller than an MSS, so there is no reason for it to be
segmented at the beginning of the connection. The receiver should also process
the header at once. The receiver must not start to parse an address before the
whole address block is received. The receiver must also reject incoming
connections containing partial protocol headers.

A receiver may be configured to support both version 1 and version 2 of the
protocol. Identifying the protocol version is easy :

  - if the incoming byte count is 16 or above and the first 13 bytes match
    the protocol signature block followed by the protocol version 2 :

       \x0D\x0A\x0D\x0A\x00\x0D\x0A\x51\x55\x49\x54\x0A\x20

  - otherwise, if the incoming byte count is 8 or above, and the first 5
    characters match the US-ASCII representation of "PROXY" then the protocol
    must be parsed as version 1 :

       \x50\x52\x4F\x58\x59

  - otherwise the protocol is not covered by this specification and the
    connection must be dropped.

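The three rules above can be sketched as a small C function. The name
pp_detect_version is illustrative. Only the high nibble of the 13th byte is
compared against the version, on the assumption that the low nibble of
ver_cmd carries the command; a receiver wanting a stricter match could
compare the full byte instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const uint8_t pp2_sig[12] = {
    0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A
};

/* Returns 2 or 1 for the detected protocol version, or -1 when the data is
 * not covered by the specification and the connection must be dropped.
 */
int pp_detect_version(const uint8_t *buf, size_t n)
{
    if (n >= 16 && memcmp(buf, pp2_sig, sizeof(pp2_sig)) == 0 &&
        (buf[12] & 0xF0) == 0x20)
        return 2;

    if (n >= 8 && memcmp(buf, "PROXY", 5) == 0)
        return 1;

    return -1;
}
```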
If the length specified in the PROXY protocol header indicates that additional
bytes are part of the header beyond the address information, a receiver may
choose to skip over and ignore those bytes, or attempt to interpret those
bytes.

The information in those bytes will be arranged in Type-Length-Value (TLV)
vectors in the following format. The first byte is the Type of the vector.
The next two bytes represent the length in bytes of the value (not including
the Type and Length bytes), and following the length field is the number of
bytes specified by the length.

    struct pp2_tlv {
        uint8_t type;
        uint8_t length_hi;
        uint8_t length_lo;
        uint8_t value[0];
    };

A receiver may choose to skip over and ignore the TLVs it is not interested in
or it does not understand. Senders can generate the TLVs only for
the information they choose to publish.

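A receiver-side walk over the TLV area might look like the sketch below. The
callback type pp2_tlv_cb and the function pp2_walk_tlvs are hypothetical
names chosen for this example. Each TLV is bounds-checked before its value is
handed to the callback, so a truncated or overlong vector is reported as an
error instead of being read past the end of the header.

```c
#include <stddef.h>
#include <stdint.h>

/* Called once per TLV; <val> points to <len> value bytes (illustrative). */
typedef void (*pp2_tlv_cb)(uint8_t type, const uint8_t *val, uint16_t len,
                           void *arg);

/* Walks <n> bytes of TLVs starting at <p>. Returns the number of TLVs
 * visited, or -1 if a TLV header or value would overrun the buffer.
 */
int pp2_walk_tlvs(const uint8_t *p, size_t n, pp2_tlv_cb cb, void *arg)
{
    int count = 0;

    while (n > 0) {
        uint16_t len;

        if (n < 3)                      /* type + 2 length bytes */
            return -1;
        len = (uint16_t)((p[1] << 8) | p[2]);
        if ((size_t)len > n - 3)        /* value must fit in what remains */
            return -1;
        if (cb)
            cb(p[0], p + 3, len, arg);  /* unknown types may be ignored */
        p += 3 + (size_t)len;
        n -= 3 + (size_t)len;
        count++;
    }
    return count;
}
```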
The following types have already been registered for the <type> field :

    #define PP2_TYPE_ALPN           0x01
    #define PP2_TYPE_AUTHORITY      0x02
    #define PP2_TYPE_CRC32C         0x03
    #define PP2_TYPE_NOOP           0x04
    #define PP2_TYPE_UNIQUE_ID      0x05
    #define PP2_TYPE_SSL            0x20
    #define PP2_SUBTYPE_SSL_VERSION 0x21
    #define PP2_SUBTYPE_SSL_CN      0x22
    #define PP2_SUBTYPE_SSL_CIPHER  0x23
    #define PP2_SUBTYPE_SSL_SIG_ALG 0x24
    #define PP2_SUBTYPE_SSL_KEY_ALG 0x25
    #define PP2_TYPE_NETNS          0x30


2.2.1. PP2_TYPE_ALPN

Application-Layer Protocol Negotiation (ALPN). It is a byte sequence defining
the upper layer protocol in use over the connection. The most common use case
will be to pass the exact copy of the ALPN extension of the Transport Layer
Security (TLS) protocol as defined by RFC7301 [9].


2.2.2. PP2_TYPE_AUTHORITY

Contains the host name value passed by the client, as a UTF8-encoded string.
When TLS is used on the client connection, this is the exact copy of the
"server_name" extension as defined by RFC3546 [10], section 3.1, often
referred to as "SNI". There are probably other situations where an authority
can be mentioned on a connection without TLS being involved at all.


2.2.3. PP2_TYPE_CRC32C

The value of the type PP2_TYPE_CRC32C is a 32-bit number storing the CRC32c
checksum of the PROXY protocol header.

When the checksum is supported by the sender, after constructing the header
the sender MUST:

 - initialize the checksum field to '0's.

 - calculate the CRC32c checksum of the PROXY header as described in RFC4960,
   Appendix B [8].

 - put the resultant value into the checksum field, and leave the rest of
   the bits unchanged.

If the checksum is provided as part of the PROXY header and the checksum
functionality is supported by the receiver, the receiver MUST:

 - store the received CRC32c checksum value aside.

 - replace the 32 bits of the checksum field in the received PROXY header with
   all '0's and calculate a CRC32c checksum value of the whole PROXY header.

 - verify that the calculated CRC32c checksum is the same as the received
   CRC32c checksum. If it is not, the receiver MUST treat the TCP connection
   providing the header as invalid.

The default procedure for handling an invalid TCP connection is to abort it.

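The receiver-side steps above can be sketched as follows. The bitwise CRC32c
routine uses the reflected Castagnoli polynomial 0x82F63B78 and is written
for clarity rather than speed. The names crc32c and pp2_crc32c_ok, as well as
the crc_off parameter (offset of the 4 checksum bytes inside the header), are
illustrative. The sketch also assumes that the checksum is stored in network
byte order, which this section does not state explicitly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78. */
uint32_t crc32c(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    int k;

    while (n--) {
        crc ^= *p++;
        for (k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Verifies the checksum of a complete PROXY header of <hdr_len> bytes whose
 * 4-byte checksum value starts at offset <crc_off>. The header is modified
 * during the computation and restored before returning. Returns 1 when the
 * checksum matches, 0 otherwise.
 */
int pp2_crc32c_ok(uint8_t *hdr, size_t hdr_len, size_t crc_off)
{
    uint8_t saved[4];
    uint32_t calc, recv;

    if (crc_off > hdr_len || hdr_len - crc_off < 4)
        return 0;

    memcpy(saved, hdr + crc_off, 4);      /* store the received value aside */
    memset(hdr + crc_off, 0, 4);          /* replace the field with zeroes */
    calc = crc32c(hdr, hdr_len);          /* checksum of the whole header */
    memcpy(hdr + crc_off, saved, 4);      /* restore the original bytes */

    /* assumption: the value is transported in network byte order */
    recv = ((uint32_t)saved[0] << 24) | ((uint32_t)saved[1] << 16) |
           ((uint32_t)saved[2] << 8)  |  (uint32_t)saved[3];
    return calc == recv;
}
```

A sender would follow the mirror procedure: zero the field, compute the
checksum over the finished header, then write the result into the field.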

2.2.4. PP2_TYPE_NOOP

The TLV of this type should be ignored when parsed. The value is zero or more
bytes. It can be used for data padding or alignment. Note that it can only be
used to align by 3 or more bytes because a TLV cannot be smaller than that.

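As an illustration of the 3-byte minimum, the hypothetical helper below
computes the value length a sender could give to a NOOP TLV so that the
header ends on a multiple of <align> bytes. When fewer than 3 bytes are
missing, one extra alignment unit is added because the TLV's own type and
length bytes already occupy 3 bytes.

```c
#include <stddef.h>

/* Returns the value length of the NOOP TLV needed to pad a header of <cur>
 * bytes up to a multiple of <align>, or -1 when no padding is needed (or
 * when <align> is zero). The TLV itself occupies 3 + value length bytes.
 */
int pp2_noop_value_len(size_t cur, size_t align)
{
    size_t pad;

    if (align == 0)
        return -1;

    pad = (align - cur % align) % align;
    if (pad == 0)
        return -1;                 /* already aligned */

    while (pad < 3)                /* a TLV cannot be smaller than 3 bytes */
        pad += align;

    return (int)(pad - 3);
}
```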

2.2.5. PP2_TYPE_UNIQUE_ID

The value of the type PP2_TYPE_UNIQUE_ID is an opaque byte sequence of up to
128 bytes generated by the upstream proxy that uniquely identifies the
connection.

The unique ID can be used to easily correlate connections across multiple
layers of proxies, without needing to look up IP addresses and port numbers.


2.2.6. The PP2_TYPE_SSL type and subtypes

For the type PP2_TYPE_SSL, the value is itself a structure defined like this :

    struct pp2_tlv_ssl {
        uint8_t  client;
        uint32_t verify;
        struct pp2_tlv sub_tlv[0];
    };

The <verify> field will be zero if the client presented a certificate
and it was successfully verified, and non-zero otherwise.

The <client> field is made of a bit field from the following values,
indicating which element is present :

    #define PP2_CLIENT_SSL           0x01
    #define PP2_CLIENT_CERT_CONN     0x02
    #define PP2_CLIENT_CERT_SESS     0x04

Note that each of these elements may lead to extra data being appended to
this TLV using a second level of TLV encapsulation. It is thus possible to
find multiple TLV values after this field. The total length of the pp2_tlv_ssl
TLV will reflect this.

The PP2_CLIENT_SSL flag indicates that the client connected over SSL/TLS. When
this flag is set, the US-ASCII string representation of the TLS version is
appended at the end of the field in the TLV format using the type
PP2_SUBTYPE_SSL_VERSION.

PP2_CLIENT_CERT_CONN indicates that the client provided a certificate over the
current connection. PP2_CLIENT_CERT_SESS indicates that the client provided a
certificate at least once over the TLS session this connection belongs to.

The second level TLV PP2_SUBTYPE_SSL_CIPHER provides the US-ASCII string name
of the used cipher, for example "ECDHE-RSA-AES128-GCM-SHA256".

The second level TLV PP2_SUBTYPE_SSL_SIG_ALG provides the US-ASCII string name
of the algorithm used to sign the certificate presented by the frontend when
the incoming connection was made over an SSL/TLS transport layer, for example
"SHA256".

The second level TLV PP2_SUBTYPE_SSL_KEY_ALG provides the US-ASCII string name
of the algorithm used to generate the key of the certificate presented by the
frontend when the incoming connection was made over an SSL/TLS transport layer,
for example "RSA2048".

In all cases, the string representation (in UTF8) of the Common Name field
(OID: 2.5.4.3) of the client certificate's Distinguished Name is appended
using the TLV format and the type PP2_SUBTYPE_SSL_CN. E.g. "example.com".

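The fixed part of the PP2_TYPE_SSL value occupies 5 bytes on the wire (1 for
<client>, 4 for <verify>), whereas a C compiler will normally pad the
structure above unless it is declared packed. The sketch below therefore
reads the fields byte by byte. The struct pp2_ssl_info and the function
pp2_parse_ssl are hypothetical names, and <verify> is assumed to be in
network byte order; only the zero / non-zero distinction matters to the
receiver.

```c
#include <stddef.h>
#include <stdint.h>

#define PP2_CLIENT_SSL           0x01
#define PP2_CLIENT_CERT_CONN     0x02
#define PP2_CLIENT_CERT_SESS     0x04

/* Decoded view of a PP2_TYPE_SSL value (illustrative). */
struct pp2_ssl_info {
    uint8_t        client;   /* PP2_CLIENT_* flags */
    uint32_t       verify;   /* zero means the certificate was verified */
    const uint8_t *sub;      /* start of the second-level TLVs */
    size_t         sub_len;  /* number of bytes of second-level TLVs */
};

/* Parses the <len> bytes of a PP2_TYPE_SSL value located at <val>.
 * Returns 0 on success, -1 if the value is shorter than the fixed part.
 */
int pp2_parse_ssl(const uint8_t *val, size_t len, struct pp2_ssl_info *out)
{
    if (len < 5)
        return -1;

    out->client  = val[0];
    out->verify  = ((uint32_t)val[1] << 24) | ((uint32_t)val[2] << 16) |
                   ((uint32_t)val[3] << 8)  |  (uint32_t)val[4];
    out->sub     = val + 5;
    out->sub_len = len - 5;
    return 0;
}
```

The <sub> and <sub_len> members can then be walked with the same logic as
the first-level TLVs to find PP2_SUBTYPE_SSL_VERSION and the other subtypes.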

2.2.7. The PP2_TYPE_NETNS type

The type PP2_TYPE_NETNS defines the value as the US-ASCII string representation
of the namespace's name.


2.2.8. Reserved type ranges

The following range of 16 type values is reserved for application-specific
data and will never be used by the PROXY protocol. If you need more values,
consider extending the range with a type field in your TLVs.

    #define PP2_TYPE_MIN_CUSTOM    0xE0
    #define PP2_TYPE_MAX_CUSTOM    0xEF

This range of 8 values is reserved for temporary experimental use by
application developers and protocol designers. The values from the range will
never be used by the PROXY protocol and should not be used by production
functionality.

    #define PP2_TYPE_MIN_EXPERIMENT 0xF0
    #define PP2_TYPE_MAX_EXPERIMENT 0xF7

The following range of 8 values is reserved for future use, potentially to
extend the protocol with multibyte type values.

    #define PP2_TYPE_MIN_FUTURE    0xF8
    #define PP2_TYPE_MAX_FUTURE    0xFF

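A parser may want to tell these ranges apart before deciding how to handle a
TLV. The following sketch is one way to do it; the enum pp2_type_class and
the function pp2_classify_type are illustrative names, and only the
PP2_TYPE_MIN_* values come from the definitions above.

```c
#include <stdint.h>

#define PP2_TYPE_MIN_CUSTOM     0xE0
#define PP2_TYPE_MIN_EXPERIMENT 0xF0
#define PP2_TYPE_MIN_FUTURE     0xF8

enum pp2_type_class {
    PP2_CLASS_OTHER,       /* below 0xE0: registered or unassigned types */
    PP2_CLASS_CUSTOM,      /* 0xE0..0xEF: application-specific */
    PP2_CLASS_EXPERIMENT,  /* 0xF0..0xF7: temporary experimental use */
    PP2_CLASS_FUTURE       /* 0xF8..0xFF: reserved for future use */
};

/* The ranges are contiguous up to 0xFF, so testing the lower bounds from
 * the highest range down is sufficient.
 */
enum pp2_type_class pp2_classify_type(uint8_t type)
{
    if (type >= PP2_TYPE_MIN_FUTURE)
        return PP2_CLASS_FUTURE;
    if (type >= PP2_TYPE_MIN_EXPERIMENT)
        return PP2_CLASS_EXPERIMENT;
    if (type >= PP2_TYPE_MIN_CUSTOM)
        return PP2_CLASS_CUSTOM;
    return PP2_CLASS_OTHER;
}
```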

3. Implementations

HAProxy 1.5 implements version 1 of the PROXY protocol on both sides :
  - the listening sockets accept the protocol when the "accept-proxy" setting
    is passed to the "bind" keyword. Connections accepted on such listeners
    will behave just as if the source really was the one advertised in the
    protocol. This is true for logging, ACLs, content filtering, transparent
    proxying, etc...

  - the protocol may be used to connect to servers if the "send-proxy" setting
    is present on the "server" line. It is enabled on a per-server basis, so it
    is possible to have it enabled for remote servers only and still have local
    ones behave differently. If the incoming connection was accepted with
    "accept-proxy", then the relayed information is the one advertised in this
    connection's PROXY line.

  - HAProxy 1.5 also implements version 2 of the PROXY protocol as a sender. In
    addition, a TLV with limited, optional, SSL information has been added.

Stunnel added support for version 1 of the protocol for outgoing connections in
version 4.45.

Stud added support for version 1 of the protocol for outgoing connections on
2011/06/29.

Postfix added support for version 1 of the protocol for incoming connections
in smtpd and postscreen in version 2.10.

A patch is available for Stud[5] to implement version 1 of the protocol on
incoming connections.

Support for versions 1 and 2 of the protocol was added to Varnish 4.1 [6].

Exim added support for version 1 and version 2 of the protocol for incoming
connections on 2014/05/13; this support will be released as part of version
4.83.

Squid added support for versions 1 and 2 of the protocol in version 3.5 [7].

Jetty 9.3.0 supports protocol version 1.

lighttpd added support for versions 1 and 2 of the protocol for incoming
connections in version 1.4.46 [11].

The protocol is simple enough that it is expected that other implementations
will appear, especially in environments such as SMTP, IMAP, FTP, RDP where the
client's address is an important piece of information for the server and some
intermediaries. In fact, several proprietary deployments have already done so
on FTP and SMTP servers.

Proxy developers are encouraged to implement this protocol, because it will
make their products much more transparent in complex infrastructures, and will
get rid of a number of issues related to logging and access control.


4. Architectural benefits
4.1. Multiple layers

Using the PROXY protocol instead of transparent proxy provides several benefits
in multiple-layer infrastructures. The first immediate benefit is that it
becomes possible to chain multiple layers of proxies and always present the
original IP address. For instance, let's consider the following 2-layer proxy
architecture :

                      Internet
                       ,---.                     | client to PX1:
                      (  X  )                    | native protocol
                       `---'                     |
                         |                       V
                      +--+--+      +-----+
                      | FW1 |------| PX1 |
                      +--+--+      +-----+   | PX1 to PX2: PROXY + native
                         |                   V
                      +--+--+      +-----+
                      | FW2 |------| PX2 |
                      +--+--+      +-----+   | PX2 to SRV: PROXY + native
                         |                   V
                      +--+--+
                      | SRV |
                      +-----+

Firewall FW1 receives traffic from internet-based clients and forwards it to
reverse-proxy PX1. PX1 adds a PROXY header then forwards to PX2 via FW2. PX2
is configured to read the PROXY header and to emit it on output. It then
connects to the origin server SRV and presents the original client's address
there. Since all TCP connection endpoints are real machines and are not
spoofed, there is no issue for the return traffic to pass via the firewalls and
reverse proxies. Using transparent proxy, this would be quite difficult because
the firewalls would have to deal with the client's address coming from the
proxies in the DMZ and would have to correctly route the return traffic there
instead of using the default route.


4.2. IPv4 and IPv6 integration

The protocol also eases IPv4 and IPv6 integration : if only the first layer
(FW1 and PX1) is IPv6-capable, it is still possible to present the original
client's IPv6 address to the target server even though the whole chain is only
connected via IPv4.


4.3. Multiple return paths

When transparent proxy is used, it is not possible to run multiple proxies
because the return traffic would follow the default route instead of finding
the proper proxy. Some tricks are sometimes possible using multiple server
addresses and policy routing but these are very limited.

Using the PROXY protocol, this problem disappears as the servers don't need
to route to the client, just to the proxy that forwarded the connection. So
it is perfectly possible to run a proxy farm in front of a very large server
farm and have it working effortlessly, even when dealing with multiple sites.

This is particularly important in Cloud-like environments where there is little
choice of binding to random addresses and where the lower processing power per
node generally requires multiple front nodes.

The example below illustrates the following case : virtualized infrastructures
are deployed in 3 datacenters (DC1..DC3). Each DC uses its own VIP which is
handled by the hosting provider's layer 3 load balancer. This load balancer
routes the traffic to a farm of layer 7 SSL/cache offloaders which load balance
among their local servers. The VIPs are advertised by geolocalised DNS so that
clients generally stick to a given DC. Since clients are not guaranteed to
stick to one DC, the L7 load balancing proxies have to know the other DCs'
servers that may be reached via the hosting provider's LAN or via the internet.
The L7 proxies use the PROXY protocol to reach the servers behind them, so that
even inter-DC traffic can forward the original client's address and the return
path is unambiguous. This would not be possible using transparent proxy because
most often the L7 proxies would not be able to spoof an address, and this would
never work between datacenters.

                              Internet

          DC1                    DC2                    DC3
         ,---.                  ,---.                  ,---.
        (  X  )                (  X  )                (  X  )
         `---'                  `---'                  `---'
           |    +-------+         |    +-------+         |    +-------+
           +----| L3 LB |         +----| L3 LB |         +----| L3 LB |
           |    +-------+         |    +-------+         |    +-------+
     ------+-------  ~ ~ ~  ------+-------  ~ ~ ~  ------+-------
     |||||    ||||          |||||    ||||          |||||    ||||
    50 SRV    4 PX         50 SRV    4 PX         50 SRV    4 PX


5. Security considerations

Version 1 of the protocol header (the human-readable format) was designed so as
to be distinguishable from HTTP. It will not parse as a valid HTTP request and
an HTTP request will not parse as a valid proxy request. Version 2 had to use a
non-parsable binary signature to make many products fail on this block. The
signature was designed to cause immediate failure on HTTP, SSL/TLS, SMTP, FTP,
and POP. It also causes aborts on LDAP and RDP servers (see section 6). That
makes it easier to enforce its use on certain connections and at the same
time, it ensures that improperly configured servers are quickly detected.

Implementers should be very careful about not trying to automatically detect
whether they have to decode the header or not, but rather they must only rely
on a configuration parameter. Indeed, if the opportunity is left to a normal
client to use the protocol, it will be able to hide its activities or make them
appear as coming from somewhere else. However, accepting the header only from a
number of known sources should be safe.

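One possible way to accept the header only from known sources is to compare
the peer address returned by accept() with a configured network before
attempting to decode anything. The sketch below handles IPv4 only, and the
function name peer_in_net is hypothetical; a real deployment would take the
list of trusted networks from its configuration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>

/* Returns 1 if <peer> belongs to the IPv4 network <net>/<prefix>, 0
 * otherwise (including when <net> cannot be parsed or <prefix> is > 32).
 */
int peer_in_net(struct in_addr peer, const char *net, unsigned prefix)
{
    struct in_addr base;
    uint32_t mask;

    if (prefix > 32 || inet_pton(AF_INET, net, &base) != 1)
        return 0;

    /* a zero prefix matches everything; avoid an undefined 32-bit shift */
    mask = prefix ? htonl(0xFFFFFFFFu << (32 - prefix)) : 0;

    return (peer.s_addr & mask) == (base.s_addr & mask);
}
```

A listener would only enable PROXY header decoding for a connection when
this check succeeds for one of its trusted networks, and would otherwise
treat the connection according to its local policy.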

6. Validation

The version 2 protocol signature has been sent to a wide variety of protocols
and implementations including old ones. The following protocols and products
have been tested to ensure the best possible behavior when the signature was
presented, even with minimal implementations :

  - HTTP :
    - Apache 1.3.33     : connection abort            => pass/optimal
    - Nginx 0.7.69      : 400 Bad Request + abort     => pass/optimal
    - lighttpd 1.4.20   : 400 Bad Request + abort     => pass/optimal
    - thttpd 2.20c      : 400 Bad Request + abort     => pass/optimal
    - mini-httpd-1.19   : 400 Bad Request + abort     => pass/optimal
    - haproxy 1.4.21    : 400 Bad Request + abort     => pass/optimal
    - Squid 3           : 400 Bad Request + abort     => pass/optimal
  - SSL :
    - stud 0.3.47       : connection abort            => pass/optimal
    - stunnel 4.45      : connection abort            => pass/optimal
    - nginx 0.7.69      : 400 Bad Request + abort     => pass/optimal
  - FTP :
    - Pure-ftpd 1.0.20  : 3*500 then 221 Goodbye      => pass/optimal
    - vsftpd 2.0.1      : 3*530 then 221 Goodbye      => pass/optimal
  - SMTP :
    - postfix 2.3       : 3*500 + 221 Bye             => pass/optimal
    - exim 4.69         : 554 + connection abort      => pass/optimal
  - POP :
    - dovecot 1.0.10    : 3*ERR + Logout              => pass/optimal
  - IMAP :
    - dovecot 1.0.10    : 5*ERR + hang                => pass/non-optimal
  - LDAP :
    - openldap 2.3      : abort                       => pass/optimal
  - SSH :
    - openssh 3.9p1     : abort                       => pass/optimal
  - RDP :
    - Windows XP SP3    : abort                       => pass/optimal

This means that most protocols and implementations will not be confused by an
incoming connection exhibiting the protocol signature, which avoids issues when
facing misconfigurations.


7. Future developments

It is possible that the protocol may slightly evolve to present other
information such as the incoming network interface, or the origin addresses in
case of network address translation happening before the first proxy, but this
is not identified as a requirement right now. Some deep thinking has been spent
on this and it appears that trying to add a few more pieces of information
opens a Pandora's box with many kinds of information, from MAC addresses to SSL
client certificates, which would make the protocol much more complex. So at
this point it is not planned. Suggestions on improvements are welcome.


8. Contacts and links

Please use w@1wt.eu to send any comments to the author.

The following links were referenced in the document.

[1] http://www.postfix.org/XCLIENT_README.html
[2] http://tools.ietf.org/html/rfc7239
[3] http://www.stunnel.org/
[4] https://github.com/bumptech/stud
[5] https://github.com/bumptech/stud/pull/81
[6] https://www.varnish-cache.org/docs/trunk/phk/ssl_again.html
[7] http://wiki.squid-cache.org/Squid-3.5
[8] https://tools.ietf.org/html/rfc4960#appendix-B
[9] https://tools.ietf.org/rfc/rfc7301.txt
[10] https://www.ietf.org/rfc/rfc3546.txt
[11] https://redmine.lighttpd.net/issues/2804

9. Sample code

The code below is an example of how a receiver may deal with both versions of
the protocol header for TCP over IPv4 or IPv6. The function is supposed to be
called upon a read event. Addresses may be directly copied into their final
memory location since they're transported in network byte order. The sending
side is even simpler and can easily be deduced from this sample code.

    #include <errno.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    struct sockaddr_storage from; /* already filled by accept() */
    struct sockaddr_storage to;   /* already filled by getsockname() */
    const char v2sig[12] = "\x0D\x0A\x0D\x0A\x00\x0D\x0A\x51\x55\x49\x54\x0A";

    /* returns 0 if needs to poll, <0 upon error or >0 if it did the job */
    int read_evt(int fd)
    {
        union {
            struct {
                char line[108];
            } v1;
            struct {
                uint8_t sig[12];
                uint8_t ver_cmd;
                uint8_t fam;
                uint16_t len;
                union {
                    struct {  /* for TCP/UDP over IPv4, len = 12 */
                        uint32_t src_addr;
                        uint32_t dst_addr;
                        uint16_t src_port;
                        uint16_t dst_port;
                    } ip4;
                    struct {  /* for TCP/UDP over IPv6, len = 36 */
                         uint8_t  src_addr[16];
                         uint8_t  dst_addr[16];
                         uint16_t src_port;
                         uint16_t dst_port;
                    } ip6;
                    struct {  /* for AF_UNIX sockets, len = 216 */
                         uint8_t src_addr[108];
                         uint8_t dst_addr[108];
                    } unx;
                } addr;
            } v2;
        } hdr;

        int size, ret;

        do {
            ret = recv(fd, &hdr, sizeof(hdr), MSG_PEEK);
        } while (ret == -1 && errno == EINTR);

        if (ret == -1)
            return (errno == EAGAIN) ? 0 : -1;

        if (ret >= 16 && memcmp(&hdr.v2, v2sig, 12) == 0 &&
            (hdr.v2.ver_cmd & 0xF0) == 0x20) {
            size = 16 + ntohs(hdr.v2.len);
            if (ret < size)
                return -1; /* truncated or too large header */

            switch (hdr.v2.ver_cmd & 0xF) {
            case 0x01: /* PROXY command */
                switch (hdr.v2.fam) {
                case 0x11:  /* TCPv4 */
                    if (size < 16 + (int)sizeof(hdr.v2.addr.ip4))
                        return -1; /* address block too short */
                    ((struct sockaddr_in *)&from)->sin_family = AF_INET;
                    ((struct sockaddr_in *)&from)->sin_addr.s_addr =
                        hdr.v2.addr.ip4.src_addr;
                    ((struct sockaddr_in *)&from)->sin_port =
                        hdr.v2.addr.ip4.src_port;
                    ((struct sockaddr_in *)&to)->sin_family = AF_INET;
                    ((struct sockaddr_in *)&to)->sin_addr.s_addr =
                        hdr.v2.addr.ip4.dst_addr;
                    ((struct sockaddr_in *)&to)->sin_port =
                        hdr.v2.addr.ip4.dst_port;
                    goto done;
                case 0x21:  /* TCPv6 */
                    if (size < 16 + (int)sizeof(hdr.v2.addr.ip6))
                        return -1; /* address block too short */
                    ((struct sockaddr_in6 *)&from)->sin6_family = AF_INET6;
                    memcpy(&((struct sockaddr_in6 *)&from)->sin6_addr,
                        hdr.v2.addr.ip6.src_addr, 16);
                    ((struct sockaddr_in6 *)&from)->sin6_port =
                        hdr.v2.addr.ip6.src_port;
                    ((struct sockaddr_in6 *)&to)->sin6_family = AF_INET6;
                    memcpy(&((struct sockaddr_in6 *)&to)->sin6_addr,
                        hdr.v2.addr.ip6.dst_addr, 16);
                    ((struct sockaddr_in6 *)&to)->sin6_port =
                        hdr.v2.addr.ip6.dst_port;
                    goto done;
                }
                /* unsupported protocol, keep local connection address */
                break;
            case 0x00: /* LOCAL command */
                /* keep local connection address for LOCAL */
                break;
            default:
                return -1; /* not a supported command */
            }
        }
        else if (ret >= 8 && memcmp(hdr.v1.line, "PROXY", 5) == 0) {
            char *end = memchr(hdr.v1.line, '\r', ret - 1);
            if (!end || end[1] != '\n')
                return -1; /* partial or invalid header */
            *end = '\0'; /* terminate the string to ease parsing */
            size = end + 2 - hdr.v1.line; /* skip header + CRLF */
            /* parse the V1 header using favorite address parsers like inet_pton.
             * return -1 upon error, or simply fall through to accept.
             */
        }
        else {
            /* Wrong protocol */
            return -1;
        }

    done:
        /* we need to consume the appropriate amount of data from the socket */
        do {
            ret = recv(fd, &hdr, size, 0);
        } while (ret == -1 && errno == EINTR);
        return (ret >= 0) ? 1 : -1;
    }
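
The sending side mentioned above can be sketched along the same lines. The
code below is only an illustration for TCP over IPv4, not part of the
reference implementation. The helper names build_v1_tcp4() and
build_v2_tcp4() are hypothetical, and the sketch assumes the caller already
holds the source and destination as struct sockaddr_in in network byte order.
It emits no TLVs and does not cover IPv6, AF_UNIX or the LOCAL command.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes a version 1 "PROXY TCP4 ..." line into buf.
 * Returns the number of bytes to send, or -1 on error or if buf is too small.
 */
static int build_v1_tcp4(char *buf, size_t buflen,
                         const struct sockaddr_in *src,
                         const struct sockaddr_in *dst)
{
    char s[INET_ADDRSTRLEN], d[INET_ADDRSTRLEN];
    int n;

    if (!inet_ntop(AF_INET, &src->sin_addr, s, sizeof(s)) ||
        !inet_ntop(AF_INET, &dst->sin_addr, d, sizeof(d)))
        return -1;

    /* ports are stored in network byte order, the text form is decimal */
    n = snprintf(buf, buflen, "PROXY TCP4 %s %s %u %u\r\n", s, d,
                 (unsigned)ntohs(src->sin_port),
                 (unsigned)ntohs(dst->sin_port));
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}

/* Hypothetical helper: writes a version 2 header for the PROXY command over
 * TCP/IPv4 into buf. Returns 28 (16 fixed bytes + 12 address bytes), or -1
 * if buf is too small.
 */
static int build_v2_tcp4(uint8_t *buf, size_t buflen,
                         const struct sockaddr_in *src,
                         const struct sockaddr_in *dst)
{
    static const uint8_t sig[12] = {
        0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A
    };
    uint16_t len = htons(12); /* size of the IPv4 address block */

    if (buflen < 28)
        return -1;

    memcpy(buf, sig, 12);
    buf[12] = 0x21;             /* version 2, PROXY command */
    buf[13] = 0x11;             /* AF_INET, STREAM */
    memcpy(buf + 14, &len, 2);
    /* addresses and ports are copied as-is: already in network byte order */
    memcpy(buf + 16, &src->sin_addr.s_addr, 4);
    memcpy(buf + 20, &dst->sin_addr.s_addr, 4);
    memcpy(buf + 24, &src->sin_port, 2);
    memcpy(buf + 26, &dst->sin_port, 2);
    return 28;
}
```

A sender would call one of these helpers right after connect() succeeds and
write the returned number of bytes before any payload. The fields are copied
with memcpy() rather than through a struct overlay so that the sketch does not
depend on compiler padding.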