2014/07/25                                                         Willy Tarreau
                                                            HAProxy Technologies
                               The PROXY protocol
                                 Versions 1 & 2

Abstract

    The PROXY protocol provides a convenient way to safely transport connection
    information such as a client's address across multiple layers of NAT or TCP
    proxies. It is designed to require few changes to existing components and
    to limit the performance impact caused by the processing of the transported
    information.


Revision history

    2010/10/29 - first version
    2011/03/20 - update: implementation and security considerations
    2012/06/21 - add support for binary format
    2012/11/19 - final review and fixes
    2014/05/18 - modify and extend PROXY protocol version 2
    2014/06/11 - fix example code to consider ver+cmd merge
    2014/06/14 - fix v2 header check in example code, and update Forwarded spec
    2014/07/12 - update list of implementations (add Squid)


1. Background

Relaying TCP connections through proxies generally involves a loss of the
original TCP connection parameters such as source and destination addresses,
ports, and so on. Some protocols make it a little bit easier to transfer such
information. For SMTP, Postfix authors have proposed the XCLIENT protocol [1]
which received broad adoption and is particularly suited to mail exchanges.
For HTTP, there is the "Forwarded" extension [2], which aims at replacing the
omnipresent "X-Forwarded-For" header which carries information about the
original source address, and the less common X-Original-To which carries
information about the destination address.

However, both mechanisms require a knowledge of the underlying protocol to be
implemented in intermediaries.

Then comes a new class of products which we'll call "dumb proxies", not because
they don't do anything, but because they're processing protocol-agnostic data.
Both Stunnel[3] and Stud[4] are examples of such "dumb proxies". They talk raw
TCP on one side, and raw SSL on the other one, and do that reliably, without
any knowledge of what protocol is transported on top of the connection. Haproxy
running in pure TCP mode obviously falls into that category as well.

The problem with such a proxy when it is combined with another one such as
haproxy, is to adapt it to talk the higher level protocol. A patch is available
for Stunnel to make it capable of inserting an X-Forwarded-For header in the
first HTTP request of each incoming connection. Haproxy is able not to add
another one when the connection comes from Stunnel, so that it's possible to
hide it from the servers.

The typical architecture becomes the following one :


      +--------+      HTTP                     :80 +----------+
      | client | --------------------------------> |          |
      |        |                                   | haproxy, |
      +--------+             +---------+           | 1 or 2   |
     /        /     HTTPS    | stunnel | HTTP  :81 | listening|
    <________/    ---------> | (server | --------> | ports    |
                             |  mode)  |           |          |
                             +---------+           +----------+


The problem appears when haproxy runs with keep-alive on the side towards the
client. The Stunnel patch will only add the X-Forwarded-For header to the first
request of each connection and all subsequent requests will not have it. One
solution could be to improve the patch to make it support keep-alive and parse
all forwarded data, whether they're announced with a Content-Length or with a
Transfer-Encoding, taking care of special methods such as HEAD which announce
data without transferring them, etc... In fact, it would require implementing a
full HTTP stack in Stunnel. It would then become a lot more complex, a lot less
reliable and would no longer be the "dumb proxy" that fits every purpose.

In practice, we don't need to add a header for each request because we'll emit
the exact same information every time : the information related to the client
side connection. We could then cache that information in haproxy and use it for
every other request. But that becomes dangerous and is still limited to HTTP
only.

Another approach consists in prepending each connection with a header reporting
the characteristics of the other side's connection. This method is simpler to
implement, does not require any protocol-specific knowledge on either side, and
completely fits the purpose since what is desired precisely is to know the
other side's connection endpoints. It is easy to perform for the sender (just
send a short header once the connection is established) and to parse for the
receiver (simply perform one read() on the incoming connection to fill in
addresses after an accept). The protocol used to carry connection information
across proxies was thus called the PROXY protocol.


2. The PROXY protocol header

This document uses a few terms that are worth explaining here :
  - "connection initiator" is the party requesting a new connection
  - "connection target" is the party accepting a connection request
  - "client" is the party for which a connection was requested
  - "server" is the party to which the client desired to connect
  - "proxy" is the party intercepting and relaying the connection
    from the client to the server.
  - "sender" is the party sending data over a connection.
  - "receiver" is the party receiving data from the sender.
  - "header" or "PROXY protocol header" is the block of connection information
    the connection initiator prepends at the beginning of a connection, which
    makes it the sender from the protocol point of view.

The PROXY protocol's goal is to fill the server's internal structures with the
information collected by the proxy that the server would have been able to get
by itself if the client was connecting directly to the server instead of via a
proxy. The information carried by the protocol is what the server would get
using getsockname() and getpeername() :
  - address family (AF_INET for IPv4, AF_INET6 for IPv6, AF_UNIX)
  - socket protocol (SOCK_STREAM for TCP, SOCK_DGRAM for UDP)
  - layer 3 source and destination addresses
  - layer 4 source and destination ports if any

Unlike the XCLIENT protocol, the PROXY protocol was designed with limited
extensibility in order to help the receiver parse it very fast. Version 1 was
focused on keeping it human-readable for better debugging possibilities, which
is always desirable for early adoption when few implementations exist. Version
2 adds support for a binary encoding of the header which is much more efficient
to produce and to parse, especially when dealing with IPv6 addresses that are
expensive to emit in ASCII form and to parse.

In both cases, the protocol simply consists in an easily parsable header placed
by the connection initiator at the beginning of each connection. The protocol
is intentionally stateless in that it does not expect the sender to wait for
the receiver before sending the header, nor the receiver to send anything back.

This specification supports two header formats, a human-readable format which
is the only format supported in version 1 of the protocol, and a binary format
which is only supported in version 2. Both formats were designed to ensure that
the header cannot be confused with common higher level protocols such as HTTP,
SSL/TLS, FTP or SMTP, and that both formats are easily distinguishable one from
each other for the receiver.

Version 1 senders MAY only produce the human-readable header format. Version 2
senders MAY only produce the binary header format. Version 1 receivers MUST at
least implement the human-readable header format. Version 2 receivers MUST at
least implement the binary header format, and it is recommended that they also
implement the human-readable header format for better interoperability and ease
of upgrade when facing version 1 senders.

Both formats are designed to fit in the smallest TCP segment that any TCP/IP
host is required to support (576 - 40 = 536 bytes). This ensures that the whole
header will always be delivered at once when the socket buffers are still empty
at the beginning of a connection. The sender must always ensure that the header
is sent at once, so that the transport layer maintains atomicity along the path
to the receiver. The receiver may be tolerant to partial headers or may simply
drop the connection when receiving a partial header. The recommendation is to
be tolerant, but implementation constraints may not always easily permit this.
It is important to note that nothing forces any intermediary to forward the
whole header at once, because TCP is a streaming protocol which may be
processed one byte at a time if desired, causing the header to be fragmented
when reaching the receiver. But due to the places where such a protocol is
used, the above simplification generally is acceptable because the risk of
crossing such a device handling one byte at a time is close to zero.

The receiver MUST NOT start processing the connection before it receives a
complete and valid PROXY protocol header. This is particularly important for
protocols where the receiver is expected to speak first (eg: SMTP, FTP or SSH).
The receiver may apply a short timeout and decide to abort the connection if
the protocol header is not seen within a few seconds (at least 3 seconds to
cover a TCP retransmit).

The receiver MUST be configured to only receive the protocol described in this
specification and MUST NOT try to guess whether the protocol header is present
or not. This means that the protocol explicitly prevents port sharing between
public and private access. Otherwise it would open a major security breach by
allowing untrusted parties to spoof their connection addresses. The receiver
SHOULD ensure proper access filtering so that only trusted proxies are allowed
to use this protocol.

Some proxies are smart enough to understand transported protocols and to reuse
idle server connections for multiple messages. This typically happens in HTTP
where requests from multiple clients may be sent over the same connection. Such
proxies MUST NOT implement this protocol on multiplexed connections because the
receiver would use the address advertised in the PROXY header as the address of
all forwarded requests' senders. In fact, such proxies are not dumb proxies,
and since they do have a complete understanding of the transported protocol,
they MUST use the facilities provided by the transported protocol to present
the client's address.


2.1. Human-readable header format (Version 1)

This is the format specified in version 1 of the protocol. It consists in one
line of ASCII text matching exactly the following block, sent immediately and
at once upon the connection establishment and prepended before any data flowing
from the sender to the receiver :

  - a string identifying the protocol : "PROXY" ( \x50 \x52 \x4F \x58 \x59 )
    Seeing this string indicates that this is version 1 of the protocol.

  - exactly one space : " " ( \x20 )

  - a string indicating the proxied INET protocol and family. As of version 1,
    only "TCP4" ( \x54 \x43 \x50 \x34 ) for TCP over IPv4, and "TCP6"
    ( \x54 \x43 \x50 \x36 ) for TCP over IPv6 are allowed. Other, unsupported,
    or unknown protocols must be reported with the name "UNKNOWN" ( \x55 \x4E
    \x4B \x4E \x4F \x57 \x4E ). For "UNKNOWN", the rest of the line before the
    CRLF may be omitted by the sender, and the receiver must ignore anything
    presented before the CRLF is found. Note that an earlier version of this
    specification suggested to use this when sending health checks, but this
    causes issues with servers that reject the "UNKNOWN" keyword. Thus it is
    now recommended not to send "UNKNOWN" when the connection is expected to
    be accepted, but only when it is not possible to correctly fill the PROXY
    line.

  - exactly one space : " " ( \x20 )

  - the layer 3 source address in its canonical format. IPv4 addresses must be
    indicated as a series of exactly 4 integers in the range [0..255] inclusive
    written in decimal representation separated by exactly one dot between each
    other. Leading zeroes are not permitted in front of numbers in order to
    avoid any possible confusion with octal numbers. IPv6 addresses must be
    indicated as a series of 4 hexadecimal digits (upper or lower case)
    delimited by colons between each other, with the acceptance of one double
    colon sequence to replace the largest acceptable range of consecutive
    zeroes. The total number of decoded bits must exactly be 128. The
    advertised protocol family dictates what format to use.

  - exactly one space : " " ( \x20 )

  - the layer 3 destination address in its canonical format. It is the same
    format as the layer 3 source address and matches the same family.

  - exactly one space : " " ( \x20 )

  - the TCP source port represented as a decimal integer in the range
    [0..65535] inclusive. Leading zeroes are not permitted in front of numbers
    in order to avoid any possible confusion with octal numbers.

  - exactly one space : " " ( \x20 )

  - the TCP destination port represented as a decimal integer in the range
    [0..65535] inclusive. Leading zeroes are not permitted in front of numbers
    in order to avoid any possible confusion with octal numbers.

  - the CRLF sequence ( \x0D \x0A )


The maximum line lengths the receiver must support including the CRLF are :
  - TCP/IPv4 :
      "PROXY TCP4 255.255.255.255 255.255.255.255 65535 65535\r\n"
    => 5 + 1 + 4 + 1 + 15 + 1 + 15 + 1 + 5 + 1 + 5 + 2 = 56 chars

  - TCP/IPv6 :
      "PROXY TCP6 ffff:f...f:ffff ffff:f...f:ffff 65535 65535\r\n"
    => 5 + 1 + 4 + 1 + 39 + 1 + 39 + 1 + 5 + 1 + 5 + 2 = 104 chars

  - unknown connection (short form) :
      "PROXY UNKNOWN\r\n"
    => 5 + 1 + 7 + 2 = 15 chars

  - worst case (optional fields set to 0xff) :
      "PROXY UNKNOWN ffff:f...f:ffff ffff:f...f:ffff 65535 65535\r\n"
    => 5 + 1 + 7 + 1 + 39 + 1 + 39 + 1 + 5 + 1 + 5 + 2 = 107 chars

So a 108-byte buffer is always enough to store all the line and a trailing zero
for string processing.

The receiver must wait for the CRLF sequence before starting to decode the
addresses in order to ensure they are complete and properly parsed. If the CRLF
sequence is not found in the first 107 characters, the receiver should declare
the line invalid. A receiver may reject an incomplete line which does not
contain the CRLF sequence in the first atomic read operation. The receiver must
not tolerate a single CR or LF character to end the line when a complete CRLF
sequence is expected.

Any sequence which does not exactly match the protocol must be discarded and
cause the receiver to abort the connection. It is recommended to abort the
connection as soon as possible so that the sender gets a chance to notice the
anomaly and log it.
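
As an illustration of the rules above, the following is a minimal sketch of a
strict version 1 receiver, not a reference implementation. The names
pp1_result, pp1_field, pp1_port and pp1_parse are hypothetical and do not come
from this specification. The sketch assumes that the platform's inet_pton() is
an acceptable validator for the address fields, although it may accept forms
this specification does not describe (such as IPv4-mapped IPv6 text).

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical result type, not defined by the specification. */
struct pp1_result {
    int family;              /* AF_INET, AF_INET6, or AF_UNSPEC for UNKNOWN */
    char src[40], dst[40];   /* textual addresses: up to 39 chars + NUL */
    unsigned src_port, dst_port;
};

/* Copies the field at *p into out; the field must be followed by sep. */
static int pp1_field(const char **p, char sep, char *out, size_t size)
{
    size_t len = strcspn(*p, " ");

    if (len == 0 || len >= size || (*p)[len] != sep)
        return -1;
    memcpy(out, *p, len);
    out[len] = '\0';
    *p += len + (sep != '\0');   /* step over the single space, if any */
    return 0;
}

/* Parses a decimal port in [0..65535] without leading zeroes. */
static int pp1_port(const char *s, unsigned *port)
{
    size_t i, len = strlen(s);
    unsigned v = 0;

    if (len == 0 || len > 5 || (len > 1 && s[0] == '0'))
        return -1;
    for (i = 0; i < len; i++) {
        if (s[i] < '0' || s[i] > '9')
            return -1;
        v = v * 10 + (unsigned)(s[i] - '0');
    }
    if (v > 65535)
        return -1;
    *port = v;
    return 0;
}

/* Returns the line length including CRLF, 0 if the line is still
 * incomplete, or -1 if it can never become a valid version 1 line. */
int pp1_parse(const char *buf, size_t len, struct pp1_result *res)
{
    char line[108], proto[8], port[6];
    unsigned char bin[16];
    size_t i, n, max = len < 107 ? len : 107;
    const char *p;

    for (i = 0; i + 1 < max; i++)
        if (buf[i] == '\r' && buf[i + 1] == '\n')
            break;
    if (i + 1 >= max)
        return len >= 107 ? -1 : 0;   /* no CRLF in the first 107 bytes */
    n = i;
    memcpy(line, buf, n);
    line[n] = '\0';

    if (n >= 13 && memcmp(line, "PROXY UNKNOWN", 13) == 0 &&
        (n == 13 || line[13] == ' ')) {
        res->family = AF_UNSPEC;      /* rest of the line is ignored */
        return (int)n + 2;
    }
    if (memchr(line, '\0', n) != NULL || strncmp(line, "PROXY ", 6) != 0)
        return -1;
    p = line + 6;
    if (pp1_field(&p, ' ', proto, sizeof(proto)) < 0)
        return -1;
    if (strcmp(proto, "TCP4") == 0)
        res->family = AF_INET;
    else if (strcmp(proto, "TCP6") == 0)
        res->family = AF_INET6;
    else
        return -1;
    if (pp1_field(&p, ' ', res->src, sizeof(res->src)) < 0 ||
        pp1_field(&p, ' ', res->dst, sizeof(res->dst)) < 0 ||
        inet_pton(res->family, res->src, bin) != 1 ||
        inet_pton(res->family, res->dst, bin) != 1 ||
        pp1_field(&p, ' ', port, sizeof(port)) < 0 ||
        pp1_port(port, &res->src_port) < 0 ||
        pp1_field(&p, '\0', port, sizeof(port)) < 0 ||
        pp1_port(port, &res->dst_port) < 0)
        return -1;
    return (int)n + 2;
}
```

A caller would typically read into a 108-byte buffer, call pp1_parse(), abort
the connection on a negative result, and on a positive result skip that many
bytes before handing the remaining data to the transported protocol.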

If the announced transport protocol is "UNKNOWN", then the receiver knows that
the sender speaks the correct PROXY protocol with the appropriate version, and
SHOULD accept the connection and use the real connection's parameters as if
there were no PROXY protocol header on the wire. However, senders SHOULD NOT
use the "UNKNOWN" protocol when they are the initiators of outgoing connections
because some receivers may reject them. When a load balancing proxy has to send
health checks to a server, it SHOULD build a valid PROXY line which it will
fill with a getsockname()/getpeername() pair indicating the addresses used. It
is important to understand that doing so is not appropriate when some source
address translation is performed between the sender and the receiver.

An example of such a line before an HTTP request would look like this (CR
marked as "\r" and LF marked as "\n") :

    PROXY TCP4 192.168.0.1 192.168.0.11 56324 443\r\n
    GET / HTTP/1.1\r\n
    Host: 192.168.0.11\r\n
    \r\n

For the sender, the header line is easy to put into the output buffers once the
connection is established. Note that since the line is always shorter than an
MSS, the sender is guaranteed to always be able to emit it at once and should
not even bother handling partial sends. For the receiver, once the header is
parsed, it is easy to skip it from the input buffers. Please consult section 9
for implementation suggestions.
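
On the sender side, building the line could look like the sketch below. It is
only an illustration under stated assumptions : the helper name
pp1_format_tcp4 is hypothetical, only the TCP4 case is covered, and both
endpoints are assumed to be available as struct sockaddr_in (for instance from
getpeername() and getsockname() on the client-side socket).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: formats a version 1 line for a TCP/IPv4 connection
 * into out. A 108-byte buffer is always large enough. Returns the line
 * length in bytes (without the trailing zero), or -1 on error. */
int pp1_format_tcp4(char *out, size_t size,
                    const struct sockaddr_in *src,
                    const struct sockaddr_in *dst)
{
    char s[INET_ADDRSTRLEN], d[INET_ADDRSTRLEN];
    int n;

    /* inet_ntop() emits dotted decimal without leading zeroes */
    if (!inet_ntop(AF_INET, &src->sin_addr, s, sizeof(s)) ||
        !inet_ntop(AF_INET, &dst->sin_addr, d, sizeof(d)))
        return -1;

    /* ports are stored in network byte order inside sockaddr_in */
    n = snprintf(out, size, "PROXY TCP4 %s %s %u %u\r\n", s, d,
                 (unsigned)ntohs(src->sin_port),
                 (unsigned)ntohs(dst->sin_port));
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

The resulting buffer is meant to be passed to a single send() call right after
the connection is established, a short write being treated as an error rather
than retried.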


2.2. Binary header format (version 2)

Producing human-readable IPv6 addresses and parsing them is very inefficient,
due to the multiple possible representation formats and the handling of compact
address format. It was also not possible to specify address families outside
IPv4/IPv6 nor non-TCP protocols. Another drawback of the human-readable format
is the fact that implementations need to parse all characters to find the
trailing CRLF, which makes it harder to read only the exact bytes count. Last,
the UNKNOWN address type has not always been accepted by servers as a valid
protocol because of its imprecise meaning.

Version 2 of the protocol thus introduces a new binary format which remains
distinguishable from version 1 and from other commonly used protocols. It was
specially designed in order to be incompatible with a wide range of protocols
and to be rejected by a number of common implementations of these protocols
when unexpectedly presented (please see section 7). Also for better processing
efficiency, IPv4 and IPv6 addresses are respectively aligned on 4 and 16 bytes
boundaries.

The binary header format starts with a constant 12 bytes block containing the
protocol signature :

    \x0D \x0A \x0D \x0A \x00 \x0D \x0A \x51 \x55 \x49 \x54 \x0A

Note that this block contains a null byte at the 5th position, so it must not
be handled as a null-terminated string.

The next byte (the 13th one) is the protocol version and command.

The highest four bits contain the version. As of this specification, it must
always be sent as \x2 and the receiver must only accept this value.

The lowest four bits represent the command :
  - \x0 : LOCAL : the connection was established on purpose by the proxy
    without being relayed. The connection endpoints are the sender and the
    receiver. Such connections exist when the proxy sends health-checks to the
    server. The receiver must accept this connection as valid and must use the
    real connection endpoints and discard the protocol block including the
    family which is ignored.

  - \x1 : PROXY : the connection was established on behalf of another node,
    and reflects the original connection endpoints. The receiver must then use
    the information provided in the protocol block to get the original address.

  - other values are unassigned and must not be emitted by senders. Receivers
    must drop connections presenting unexpected values here.

The 14th byte contains the transport protocol and address family. The highest 4
bits contain the address family, the lowest 4 bits contain the protocol.

The address family maps to the original socket family without necessarily
matching the values internally used by the system. It may be one of :

  - 0x0 : AF_UNSPEC : the connection is forwarded for an unknown, unspecified
    or unsupported protocol. The sender should use this family when sending
    LOCAL commands or when dealing with unsupported protocol families. The
    receiver is free to accept the connection anyway and use the real endpoint
    addresses or to reject it. The receiver should ignore address information.

  - 0x1 : AF_INET : the forwarded connection uses the AF_INET address family
    (IPv4). The addresses are exactly 4 bytes each in network byte order,
    followed by transport protocol information (typically ports).

  - 0x2 : AF_INET6 : the forwarded connection uses the AF_INET6 address family
    (IPv6). The addresses are exactly 16 bytes each in network byte order,
    followed by transport protocol information (typically ports).

  - 0x3 : AF_UNIX : the forwarded connection uses the AF_UNIX address family
    (UNIX). The addresses are exactly 108 bytes each.

  - other values are unspecified and must not be emitted in version 2 of this
    protocol and must be rejected as invalid by receivers.

The transport protocol is specified in the lowest 4 bits of the 14th byte :

  - 0x0 : UNSPEC : the connection is forwarded for an unknown, unspecified
    or unsupported protocol. The sender should use this protocol when sending
    LOCAL commands or when dealing with unsupported protocols. The receiver is
    free to accept the connection anyway and use the real endpoint addresses
    or to reject it. The receiver should ignore address information.

  - 0x1 : STREAM : the forwarded connection uses a SOCK_STREAM protocol (eg:
    TCP or UNIX_STREAM). When used with AF_INET/AF_INET6 (TCP), the addresses
    are followed by the source and destination ports represented on 2 bytes
    each in network byte order.

  - 0x2 : DGRAM : the forwarded connection uses a SOCK_DGRAM protocol (eg:
    UDP or UNIX_DGRAM). When used with AF_INET/AF_INET6 (UDP), the addresses
    are followed by the source and destination ports represented on 2 bytes
    each in network byte order.

  - other values are unspecified and must not be emitted in version 2 of this
    protocol and must be rejected as invalid by receivers.

In practice, the following protocol bytes are expected :

  - \x00 : UNSPEC : the connection is forwarded for an unknown, unspecified
    or unsupported protocol. The sender should use this family when sending
    LOCAL commands or when dealing with unsupported protocol families. When
    used with a LOCAL command, the receiver must accept the connection and
    ignore any address information. For other commands, the receiver is free
    to accept the connection anyway and use the real endpoints addresses or to
    reject the connection. The receiver should ignore address information.

  - \x11 : TCP over IPv4 : the forwarded connection uses TCP over the AF_INET
    protocol family. Address length is 2*4 + 2*2 = 12 bytes.

  - \x12 : UDP over IPv4 : the forwarded connection uses UDP over the AF_INET
    protocol family. Address length is 2*4 + 2*2 = 12 bytes.

  - \x21 : TCP over IPv6 : the forwarded connection uses TCP over the AF_INET6
    protocol family. Address length is 2*16 + 2*2 = 36 bytes.

  - \x22 : UDP over IPv6 : the forwarded connection uses UDP over the AF_INET6
    protocol family. Address length is 2*16 + 2*2 = 36 bytes.

  - \x31 : UNIX stream : the forwarded connection uses SOCK_STREAM over the
    AF_UNIX protocol family. Address length is 2*108 = 216 bytes.

  - \x32 : UNIX datagram : the forwarded connection uses SOCK_DGRAM over the
    AF_UNIX protocol family. Address length is 2*108 = 216 bytes.


Only the UNSPEC protocol byte (\x00) is mandatory. A receiver is not required
to implement other ones, provided that it automatically falls back to the
UNSPEC mode for the valid combinations above that it does not support.
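
The decoding of the 13th and 14th bytes described above can be sketched as
follows. The function names pp2_command and pp2_addr_len are hypothetical.
Returning -1 for any protocol byte outside the table above is a choice made by
this sketch; a receiver may instead prefer to fall back to the UNSPEC mode for
combinations it does not support.

```c
#include <stdint.h>

/* Validates the 13th byte. Returns the command (0 = LOCAL, 1 = PROXY),
 * or -1 if the version is not 2 or the command is unassigned. */
int pp2_command(uint8_t ver_cmd)
{
    if ((ver_cmd >> 4) != 0x2)      /* highest four bits: version */
        return -1;
    if ((ver_cmd & 0x0F) > 0x1)     /* lowest four bits: command */
        return -1;
    return ver_cmd & 0x0F;
}

/* Maps the 14th byte to the address length listed in the table above,
 * or returns -1 for a byte that is not part of that table. */
int pp2_addr_len(uint8_t fam)
{
    switch (fam) {
    case 0x00:                      /* UNSPEC */
        return 0;
    case 0x11:                      /* TCP over IPv4 */
    case 0x12:                      /* UDP over IPv4 */
        return 2 * 4 + 2 * 2;
    case 0x21:                      /* TCP over IPv6 */
    case 0x22:                      /* UDP over IPv6 */
        return 2 * 16 + 2 * 2;
    case 0x31:                      /* UNIX stream */
    case 0x32:                      /* UNIX datagram */
        return 2 * 108;
    default:
        return -1;
    }
}
```

For example, the byte pair \x21 \x11 decodes as a PROXY command for TCP over
IPv4, for which 12 address bytes are expected.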

The 15th and 16th bytes are the address length in bytes in network byte order.
It is used so that the receiver knows how many address bytes to skip even when
it does not implement the presented protocol. Thus the length of the protocol
header in bytes is always exactly 16 + this value. When a sender presents a
LOCAL connection, it should not present any address so it sets this field to
zero. Receivers MUST always consider this field to skip the appropriate number
of bytes and must not assume zero is presented for LOCAL connections. When a
receiver accepts an incoming connection showing an UNSPEC address family or
protocol, it may or may not decide to log the address information if present.

So the 16-byte version 2 header can be described this way :

    struct proxy_hdr_v2 {
        uint8_t sig[12];  /* hex 0D 0A 0D 0A 00 0D 0A 51 55 49 54 0A */
        uint8_t ver_cmd;  /* protocol version and command */
        uint8_t fam;      /* address family and transport protocol */
        uint16_t len;     /* number of following bytes part of the header */
    };

Starting from the 17th byte, addresses are presented in network byte order.
The address order is always the same :
  - source layer 3 address in network byte order
  - destination layer 3 address in network byte order
  - source layer 4 address if any, in network byte order (port)
  - destination layer 4 address if any, in network byte order (port)

The address block may directly be sent from or received into the following
union which makes it easy to cast from/to the relevant socket native structs
depending on the address type :

    union proxy_addr {
        struct {        /* for TCP/UDP over IPv4, len = 12 */
            uint32_t src_addr;
            uint32_t dst_addr;
            uint16_t src_port;
            uint16_t dst_port;
        } ipv4_addr;
        struct {        /* for TCP/UDP over IPv6, len = 36 */
             uint8_t  src_addr[16];
             uint8_t  dst_addr[16];
             uint16_t src_port;
             uint16_t dst_port;
        } ipv6_addr;
        struct {        /* for AF_UNIX sockets, len = 216 */
             uint8_t src_addr[108];
             uint8_t dst_addr[108];
        } unix_addr;
    };
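
As an illustration, a sender could serialize a complete version 2 header for
TCP over IPv4 as sketched below. The function name pp2_build_tcp4 is
hypothetical. The sketch writes the bytes one field at a time into a plain
buffer instead of casting the structures above, which is merely one way to
avoid depending on the compiler's struct layout.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes a 28-byte version 2 header (16-byte fixed
 * part + 12 address bytes) for a proxied TCP/IPv4 connection into out,
 * which must be at least 28 bytes long. Returns the header length. */
size_t pp2_build_tcp4(uint8_t *out,
                      const struct sockaddr_in *src,
                      const struct sockaddr_in *dst)
{
    static const uint8_t sig[12] = {
        0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A
    };
    uint16_t len = htons(12);                    /* 2*4 + 2*2 address bytes */

    memcpy(out, sig, sizeof(sig));               /* bytes 1..12: signature */
    out[12] = 0x21;                              /* version 2, command PROXY */
    out[13] = 0x11;                              /* AF_INET, STREAM */
    memcpy(out + 14, &len, 2);                   /* bytes 15..16: length */

    /* sockaddr_in already stores addresses and ports in network order */
    memcpy(out + 16, &src->sin_addr.s_addr, 4);
    memcpy(out + 20, &dst->sin_addr.s_addr, 4);
    memcpy(out + 24, &src->sin_port, 2);
    memcpy(out + 26, &dst->sin_port, 2);
    return 28;
}
```

The 28 bytes produced here are intended to be emitted with a single send()
call before any data of the transported protocol.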

The sender must ensure that all the protocol header is sent at once. This block
is always smaller than an MSS, so there is no reason for it to be segmented at
the beginning of the connection. The receiver should also process the header
at once. The receiver must not start to parse an address before the whole
address block is received. The receiver must also reject incoming connections
containing partial protocol headers.

A receiver may be configured to support both version 1 and version 2 of the
protocol. Identifying the protocol version is easy :

  - if the incoming byte count is 16 or above, the 12 first bytes match the
    protocol signature block below, and the highest four bits of the 13th
    byte contain the protocol version 2 (\x2), then the protocol must be
    parsed as version 2 :

        \x0D\x0A\x0D\x0A\x00\x0D\x0A\x51\x55\x49\x54\x0A

  - otherwise, if the incoming byte count is 8 or above, and the 5 first
    characters match the ASCII representation of "PROXY" then the protocol
    must be parsed as version 1 :

        \x50\x52\x4F\x58\x59

  - otherwise the protocol is not covered by this specification and the
    connection must be dropped.
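
The detection steps above can be sketched as follows. The function name
pp_detect and its hdr_len output parameter are hypothetical. Since the 13th
byte carries the version in its highest four bits and the command in its lowest
four bits, this sketch compares only the highest four bits with \x2. For
version 2 it also derives the total header length (16 + the length field).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 2 or 1 for the detected protocol version, or -1 when the data
 * matches neither format and the connection must be dropped. For version
 * 2, *hdr_len receives the total header length; for version 1 it is set
 * to 0 because the length is only known once the CRLF is found. */
int pp_detect(const uint8_t *buf, size_t len, size_t *hdr_len)
{
    static const uint8_t sig[12] = {
        0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A
    };

    /* memcmp() is required here: the signature embeds a null byte */
    if (len >= 16 && memcmp(buf, sig, sizeof(sig)) == 0 &&
        (buf[12] >> 4) == 0x2) {
        /* bytes 15 and 16 hold the length in network byte order */
        *hdr_len = 16 + (((size_t)buf[14] << 8) | buf[15]);
        return 2;
    }
    if (len >= 8 && memcmp(buf, "PROXY", 5) == 0) {
        *hdr_len = 0;
        return 1;
    }
    return -1;
}
```

A receiver that chooses to tolerate partial headers would need to wait for more
data instead of dropping the connection when fewer bytes than required have
been received so far.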

If the length specified in the PROXY protocol header indicates that additional
bytes are part of the header beyond the address information, a receiver may
choose to skip over and ignore those bytes, or attempt to interpret those
bytes.

The information in those bytes will be arranged in Type-Length-Value (TLV)
vectors in the following format. The first byte is the Type of the vector.
The next two bytes represent the length in bytes of the value (not including
the Type and Length bytes), and following the length field is the number of
bytes specified by the length.

    struct {
        uint8_t type;
        uint8_t length_hi;
        uint8_t length_lo;
        uint8_t value[0];
    } tlv;
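
A receiver that attempts to interpret those bytes could walk the vectors as in
the sketch below. The names pp2_tlv_cb and pp2_walk_tlvs are hypothetical, and
no Type value is assumed here. The sketch only shows the bounds checks needed
so that a truncated vector is never read past the end of the header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback invoked once per vector found. */
typedef void (*pp2_tlv_cb)(uint8_t type, const uint8_t *value,
                           uint16_t length, void *ctx);

/* Walks the TLV vectors stored in the 'remaining' bytes at p, which are
 * the header bytes located after the address block. Returns the number of
 * vectors visited, or -1 if a vector is truncated. cb may be NULL. */
int pp2_walk_tlvs(const uint8_t *p, size_t remaining, pp2_tlv_cb cb, void *ctx)
{
    int count = 0;

    while (remaining > 0) {
        uint16_t length;

        if (remaining < 3)                      /* type + 2 length bytes */
            return -1;
        length = (uint16_t)((p[1] << 8) | p[2]); /* length_hi, length_lo */
        if ((size_t)length > remaining - 3)     /* value overruns the header */
            return -1;
        if (cb)
            cb(p[0], p + 3, length, ctx);
        p += 3 + (size_t)length;
        remaining -= 3 + (size_t)length;
        count++;
    }
    return count;
}
```

The number of bytes to pass to such a function is the header's length field
minus the address length implied by the 14th byte.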


3. Implementations

Haproxy 1.5 implements version 1 of the PROXY protocol on both sides :
  - the listening sockets accept the protocol when the "accept-proxy" setting
    is passed to the "bind" keyword. Connections accepted on such listeners
    will behave just as if the source really was the one advertised in the
    protocol. This is true for logging, ACLs, content filtering, transparent
    proxying, etc...

  - the protocol may be used to connect to servers if the "send-proxy" setting
    is present on the "server" line. It is enabled on a per-server basis, so it
    is possible to have it enabled for remote servers only and still have local
    ones behave differently. If the incoming connection was accepted with the
    "accept-proxy", then the relayed information is the one advertised in this
    connection's PROXY line.

  - Haproxy 1.5 also implements version 2 of the PROXY protocol as a sender. In
    addition, a TLV with limited, optional, SSL information has been added.

Stunnel added support for version 1 of the protocol for outgoing connections in
version 4.45.

Stud added support for version 1 of the protocol for outgoing connections on
2011/06/29.

Postfix added support for version 1 of the protocol for incoming connections
in smtpd and postscreen in version 2.10.

A patch is available for Stud[5] to implement version 1 of the protocol on
incoming connections.

Support for the protocol in the Varnish cache is being considered [6].

Exim added support for version 1 and version 2 of the protocol for incoming
connections on 2014/05/13, and will be released as part of version 4.83.

The protocol is simple enough that it is expected that other implementations
will appear, especially in environments such as SMTP, IMAP, FTP, RDP where the
client's address is an important piece of information for the server and some
intermediaries. In fact, several proprietary deployments have already done so
on FTP and SMTP servers.

Proxy developers are encouraged to implement this protocol, because it will
make their products much more transparent in complex infrastructures, and will
get rid of a number of issues related to logging and access control.


4. Architectural benefits
4.1. Multiple layers

Using the PROXY protocol instead of transparent proxy provides several benefits
in multiple-layer infrastructures. The first immediate benefit is that it
becomes possible to chain multiple layers of proxies and always present the
original IP address. For instance, let's consider the following 2-layer proxy
architecture :

         Internet
          ,---.                     | client to PX1:
         (  X  )                    | native protocol
          `---'                     |
            |                       V
         +--+--+      +-----+
         | FW1 |------| PX1 |
         +--+--+      +-----+   | PX1 to PX2: PROXY + native
            |                   V
         +--+--+      +-----+
         | FW2 |------| PX2 |
         +--+--+      +-----+   | PX2 to SRV: PROXY + native
            |                   V
         +--+--+
         | SRV |
         +-----+

Firewall FW1 receives traffic from internet-based clients and forwards it to
reverse-proxy PX1. PX1 adds a PROXY header then forwards to PX2 via FW2. PX2
is configured to read the PROXY header and to emit it on output. It then
connects to the origin server SRV and presents the original client's address
there. Since all TCP connection endpoints are real machines and are not
spoofed, there is no issue for the return traffic to pass via the firewalls and
reverse proxies. Using transparent proxy, this would be quite difficult because
the firewalls would have to deal with the client's address coming from the
proxies in the DMZ and would have to correctly route the return traffic there
instead of using the default route.


4.2. IPv4 and IPv6 integration

The protocol also eases IPv4 and IPv6 integration : if only the first layer
(FW1 and PX1) is IPv6-capable, it is still possible to present the original
client's IPv6 address to the target server even though the rest of the chain
is only connected via IPv4.
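
As an illustration, the first layer could describe an IPv6 client in a version
1 header and send it over an IPv4 connection to the next hop. The sketch below
is only an example under stated assumptions: the helper name make_v1_tcp6 is
hypothetical, the addresses are expected to be already formatted as text, and
the addresses in the usage note come from the IPv6 documentation prefix.

```c
#include <stdio.h>

/* Hypothetical helper: formats a version 1 PROXY line for a TCP/IPv6 client.
 * The line may be sent on any transport, including an IPv4 connection.
 * Returns the line length, or -1 if it does not fit in the buffer.
 */
int make_v1_tcp6(char *out, size_t outsz, const char *src, const char *dst,
                 unsigned src_port, unsigned dst_port)
{
    int n = snprintf(out, outsz, "PROXY TCP6 %s %s %u %u\r\n",
                     src, dst, src_port, dst_port);

    return (n < 0 || (size_t)n >= outsz) ? -1 : n;
}
```

For example, make_v1_tcp6(line, sizeof(line), "2001:db8::1", "2001:db8::2",
56324, 443) fills a 108-byte buffer with a line that PX1 would emit before the
native protocol, whatever the address family of its connection to PX2.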


4.3. Multiple return paths

When transparent proxy is used, it is not possible to run multiple proxies
because the return traffic would follow the default route instead of finding
the proper proxy. Some tricks are sometimes possible using multiple server
addresses and policy routing but these are very limited.

Using the PROXY protocol, this problem disappears as the servers don't need
to route to the client, just to the proxy that forwarded the connection. So
it is perfectly possible to run a proxy farm in front of a very large server
farm and have it work effortlessly, even when dealing with multiple sites.

This is particularly important in Cloud-like environments where there is little
possibility of binding to arbitrary addresses and where the lower processing
power per node generally requires multiple front nodes.

The example below illustrates the following case : virtualized infrastructures
are deployed in 3 datacenters (DC1..DC3). Each DC uses its own VIP which is
handled by the hosting provider's layer 3 load balancer. This load balancer
routes the traffic to a farm of layer 7 SSL/cache offloaders which load balance
among their local servers. The VIPs are advertised by geolocalised DNS so that
clients generally stick to a given DC. Since clients are not guaranteed to
stick to one DC, the L7 load balancing proxies have to know the other DCs'
servers that may be reached via the hosting provider's LAN or via the internet.
The L7 proxies use the PROXY protocol to connect to the servers behind them, so
that even inter-DC traffic can forward the original client's address and the
return path is unambiguous. This would not be possible using transparent proxy
because most often the L7 proxies would not be able to spoof an address, and
this would never work between datacenters.

                               Internet

            DC1                  DC2                  DC3
           ,---.                ,---.                ,---.
          (  X  )              (  X  )              (  X  )
           `---'                `---'                `---'
             |    +-------+       |    +-------+       |    +-------+
             +----| L3 LB |       +----| L3 LB |       +----| L3 LB |
             |    +-------+       |    +-------+       |    +-------+
       ------+------- ~ ~ ~ ------+------- ~ ~ ~ ------+-------
       |||||   ||||         |||||   ||||         |||||   ||||
      50 SRV   4 PX        50 SRV   4 PX        50 SRV   4 PX


5. Security considerations

Version 1 of the protocol header (the human-readable format) was designed so as
to be distinguishable from HTTP. It will not parse as a valid HTTP request and
an HTTP request will not parse as a valid proxy request. Version 2 had to use a
non-parsable binary signature to make many products fail on this block. The
signature was designed to cause immediate failure on HTTP, SSL/TLS, SMTP, FTP,
and POP. It also causes aborts on LDAP and RDP servers (see section 6). That
makes it easier to enforce its use on certain connections and at the same
time, it ensures that improperly configured servers are quickly detected.

Implementers should be very careful about not trying to automatically detect
whether they have to decode the header or not, but rather they must only rely
on a configuration parameter. Indeed, if the opportunity is left to a normal
client to use the protocol, it will be able to hide its activities or make them
appear as coming from someone else. However, accepting the header only from a
number of known sources should be safe.
681
6826. Validation
Willy Tarreau7f898512011-03-20 11:32:40 +0100683
Willy Tarreau332d7b02012-11-19 11:27:29 +0100684The version 2 protocol signature has been sent to a wide variety of protocols
685and implementations including old ones. The following protocol and products
686have been tested to ensure the best possible behaviour when the signature was
687presented, even with minimal implementations :
Willy Tarreau7f898512011-03-20 11:32:40 +0100688
Willy Tarreau332d7b02012-11-19 11:27:29 +0100689 - HTTP :
690 - Apache 1.3.33 : connection abort => pass/optimal
691 - Nginx 0.7.69 : 400 Bad Request + abort => pass/optimal
692 - lighttpd 1.4.20 : 400 Bad Request + abort => pass/optimal
693 - thttpd 2.20c : 400 Bad Request + abort => pass/optimal
694 - mini-httpd-1.19 : 400 Bad Request + abort => pass/optimal
695 - haproxy 1.4.21 : 400 Bad Request + abort => pass/optimal
Willy Tarreaua124eb62014-07-12 17:31:07 +0200696 - Squid 3 : 400 Bad Request + abort => pass/optimal
Willy Tarreau332d7b02012-11-19 11:27:29 +0100697 - SSL :
698 - stud 0.3.47 : connection abort => pass/optimal
699 - stunnel 4.45 : connection abort => pass/optimal
700 - nginx 0.7.69 : 400 Bad Request + abort => pass/optimal
701 - FTP :
702 - Pure-ftpd 1.0.20 : 3*500 then 221 Goodbye => pass/optimal
703 - vsftpd 2.0.1 : 3*530 then 221 Goodbye => pass/optimal
704 - SMTP :
705 - postfix 2.3 : 3*500 + 221 Bye => pass/optimal
706 - exim 4.69 : 554 + connection abort => pass/optimal
707 - POP :
708 - dovecot 1.0.10 : 3*ERR + Logout => pass/optimal
709 - IMAP :
710 - dovecot 1.0.10 : 5*ERR + hang => pass/non-optimal
711 - LDAP :
712 - openldap 2.3 : abort => pass/optimal
713 - SSH :
714 - openssh 3.9p1 : abort => pass/optimal
715 - RDP :
716 - Windows XP SP3 : abort => pass/optimal
717
718This means that most protocols and implementations will not be confused by an
719incoming connection exhibiting the protocol signature, which avoids issues when
720facing misconfigurations.
721
722
7. Future developments

It is possible that the protocol may slightly evolve to present other
information such as the incoming network interface, or the origin addresses in
case of network address translation happening before the first proxy, but this
is not identified as a requirement right now. Some deep thinking has been spent
on this and it appears that trying to add a few more pieces of information
opens a Pandora's box of further candidates, from MAC addresses to SSL client
certificates, which would make the protocol much more complex. So at this point
it is not planned. Suggestions on improvements are welcome.


8. Contacts and links

Please use w@1wt.eu to send any comments to the author.

The following links were referenced in the document.

[1] http://www.postfix.org/XCLIENT_README.html
[2] http://tools.ietf.org/html/rfc7239
[3] http://www.stunnel.org/
[4] https://github.com/bumptech/stud
[5] https://github.com/bumptech/stud/pull/81
[6] https://www.varnish-cache.org/trac/wiki/Future_Protocols


9. Sample code

The code below is an example of how a receiver may deal with both versions of
the protocol header for TCP over IPv4 or IPv6. The function is supposed to be
called upon a read event. Addresses may be directly copied into their final
memory location since they're transported in network byte order. The sending
side is even simpler and can easily be deduced from this sample code.

    #include <errno.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    struct sockaddr_storage from; /* already filled by accept() */
    struct sockaddr_storage to;   /* already filled by getsockname() */
    const char v2sig[12] = "\x0D\x0A\x0D\x0A\x00\x0D\x0A\x51\x55\x49\x54\x0A";

    /* returns 0 if needs to poll, <0 upon error or >0 if it did the job */
    int read_evt(int fd)
    {
        union {
            struct {
                char line[108];
            } v1;
            struct {
                uint8_t sig[12];
                uint8_t ver_cmd;
                uint8_t fam;
                uint16_t len;
                union {
                    struct {  /* for TCP/UDP over IPv4, len = 12 */
                        uint32_t src_addr;
                        uint32_t dst_addr;
                        uint16_t src_port;
                        uint16_t dst_port;
                    } ip4;
                    struct {  /* for TCP/UDP over IPv6, len = 36 */
                        uint8_t  src_addr[16];
                        uint8_t  dst_addr[16];
                        uint16_t src_port;
                        uint16_t dst_port;
                    } ip6;
                    struct {  /* for AF_UNIX sockets, len = 216 */
                        uint8_t src_addr[108];
                        uint8_t dst_addr[108];
                    } unx;
                } addr;
            } v2;
        } hdr;

        int size, ret;

        do {
            ret = recv(fd, &hdr, sizeof(hdr), MSG_PEEK);
        } while (ret == -1 && errno == EINTR);

        if (ret == -1)
            return (errno == EAGAIN) ? 0 : -1;

        if (ret >= 16 && memcmp(&hdr.v2, v2sig, 12) == 0 &&
            (hdr.v2.ver_cmd & 0xF0) == 0x20) {
            /* the length field is in network byte order like the addresses */
            size = 16 + ntohs(hdr.v2.len);
            if (ret < size)
                return -1; /* truncated or too large header */

            switch (hdr.v2.ver_cmd & 0xF) {
            case 0x01: /* PROXY command */
                switch (hdr.v2.fam) {
                case 0x11:  /* TCPv4 */
                    ((struct sockaddr_in *)&from)->sin_family = AF_INET;
                    ((struct sockaddr_in *)&from)->sin_addr.s_addr =
                        hdr.v2.addr.ip4.src_addr;
                    ((struct sockaddr_in *)&from)->sin_port =
                        hdr.v2.addr.ip4.src_port;
                    ((struct sockaddr_in *)&to)->sin_family = AF_INET;
                    ((struct sockaddr_in *)&to)->sin_addr.s_addr =
                        hdr.v2.addr.ip4.dst_addr;
                    ((struct sockaddr_in *)&to)->sin_port =
                        hdr.v2.addr.ip4.dst_port;
                    goto done;
                case 0x21:  /* TCPv6 */
                    ((struct sockaddr_in6 *)&from)->sin6_family = AF_INET6;
                    memcpy(&((struct sockaddr_in6 *)&from)->sin6_addr,
                        hdr.v2.addr.ip6.src_addr, 16);
                    ((struct sockaddr_in6 *)&from)->sin6_port =
                        hdr.v2.addr.ip6.src_port;
                    ((struct sockaddr_in6 *)&to)->sin6_family = AF_INET6;
                    memcpy(&((struct sockaddr_in6 *)&to)->sin6_addr,
                        hdr.v2.addr.ip6.dst_addr, 16);
                    ((struct sockaddr_in6 *)&to)->sin6_port =
                        hdr.v2.addr.ip6.dst_port;
                    goto done;
                }
                /* unsupported protocol, keep local connection address */
                break;
            case 0x00: /* LOCAL command */
                /* keep local connection address for LOCAL */
                break;
            default:
                return -1; /* not a supported command */
            }
        }
        else if (ret >= 8 && memcmp(hdr.v1.line, "PROXY", 5) == 0) {
            char *end = memchr(hdr.v1.line, '\r', ret - 1);
            if (!end || end[1] != '\n')
                return -1; /* partial or invalid header */
            *end = '\0'; /* terminate the string to ease parsing */
            size = end + 2 - hdr.v1.line; /* skip header + CRLF */
            /* parse the V1 header using favorite address parsers like inet_pton.
             * return -1 upon error, or simply fall through to accept.
             */
        }
        else {
            /* Wrong protocol */
            return -1;
        }

      done:
        /* we need to consume the appropriate amount of data from the socket */
        do {
            ret = recv(fd, &hdr, size, 0);
        } while (ret == -1 && errno == EINTR);
        return (ret >= 0) ? 1 : -1;
    }
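
For the sending side, a possible sketch is given below for the version 2
header describing a TCP over IPv4 connection. It is only an illustration: the
function name make_v2_tcp4 is hypothetical, the caller is assumed to provide a
buffer of at least 28 bytes, and no TLV is appended. The signature, the 0x21
version/command byte, the 0x11 family byte and the 12-byte address block are
the same values the receiver above checks for.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Builds a version 2 PROXY header for TCP over IPv4 into out, which must hold
 * at least 28 bytes. Returns the number of bytes to send before the payload.
 */
size_t make_v2_tcp4(uint8_t *out, const struct sockaddr_in *src,
                    const struct sockaddr_in *dst)
{
    static const uint8_t sig[12] = {
        0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A
    };
    uint16_t len = htons(12);   /* address block length, network byte order */

    memcpy(out, sig, 12);
    out[12] = 0x21;             /* version 2, PROXY command */
    out[13] = 0x11;             /* TCP over IPv4 */
    memcpy(out + 14, &len, 2);
    /* sockaddr_in fields are already in network byte order */
    memcpy(out + 16, &src->sin_addr.s_addr, 4);
    memcpy(out + 20, &dst->sin_addr.s_addr, 4);
    memcpy(out + 24, &src->sin_port, 2);
    memcpy(out + 26, &dst->sin_port, 2);
    return 28;
}
```

The resulting bytes are meant to be written once, at the very beginning of the
outgoing connection, before any data of the native protocol.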