2014/05/10                                                        Willy Tarreau
                                                           HAProxy Technologies
                               The PROXY protocol
                                 Versions 1 & 2

Abstract

    The PROXY protocol provides a convenient way to safely transport connection
    information such as a client's address across multiple layers of NAT or TCP
    proxies. It is designed to require few changes to existing components and
    to limit the performance impact caused by the processing of the transported
    information.


Revision history

    2010/10/29 - first version
    2011/03/20 - update: implementation and security considerations
    2012/06/21 - add support for binary format
    2012/11/19 - final review and fixes
    2014/05/18 - modify and extend PROXY protocol version 2


1. Background

Relaying TCP connections through proxies generally involves a loss of the
original TCP connection parameters such as source and destination addresses,
ports, and so on. Some protocols make it a little bit easier to transfer such
information. For SMTP, Postfix authors have proposed the XCLIENT protocol [1]
which received broad adoption and is particularly suited to mail exchanges. In
HTTP, there is the "Forwarded-For" proposed standard [2]. This proposal aims at
replacing the omnipresent "X-Forwarded-For" header which carries information
about the original source address, and the less common X-Original-To which
carries information about the destination address.

However, both mechanisms require a knowledge of the underlying protocol to be
implemented in intermediaries.

Then comes a new class of products which we'll call "dumb proxies", not because
they don't do anything, but because they're processing protocol-agnostic data.
Both Stunnel[3] and Stud[4] are examples of such "dumb proxies". They talk raw
TCP on one side, and raw SSL on the other one, and do that reliably, without
any knowledge of what protocol is transported on top of the connection.

The problem with such a proxy when it is combined with another one such as
haproxy is to adapt it to talk the higher level protocol. A patch is available
for Stunnel to make it capable of inserting an X-Forwarded-For header in the
first HTTP request of each incoming connection. Haproxy is able not to add
another one when the connection comes from Stunnel, so that it's possible to
hide it from the servers.

The typical architecture becomes the following one :


      +--------+      HTTP                      :80 +----------+
      | client | --------------------------------> |          |
      |        |                                   | haproxy, |
      +--------+             +---------+           | 1 or 2   |
     /        /     HTTPS    | stunnel |  HTTP  :81 | listening|
    <________/    ---------> | (server | ---------> | ports    |
                             |  mode)  |           |          |
                             +---------+           +----------+


The problem appears when haproxy runs with keep-alive on the side towards the
client. The Stunnel patch will only add the X-Forwarded-For header to the first
request of each connection and all subsequent requests will not have it. One
solution could be to improve the patch to make it support keep-alive and parse
all forwarded data, whether they're announced with a Content-Length or with a
Transfer-Encoding, taking care of special methods such as HEAD which announce
data without transferring them, etc... In fact, it would require implementing a
full HTTP stack in Stunnel. It would then become a lot more complex, a lot less
reliable and would no longer be the "dumb proxy" that fits every purpose.

In practice, we don't need to add a header for each request because we'll emit
the exact same information every time : the information related to the client
side connection. We could then cache that information in haproxy and use it for
every other request. But that becomes dangerous and is still limited to HTTP
only.

Another approach consists in prepending each connection with a header reporting
the characteristics of the other side's connection. This method is simpler to
implement, does not require any protocol-specific knowledge on either side, and
completely fits the purpose since what is desired precisely is to know the
other side's connection endpoints. It is easy to perform for the sender (just
send a short header once the connection is established) and to parse for the
receiver (simply perform one read() on the incoming connection to fill in
addresses after an accept). The protocol used to carry connection information
across proxies was thus called the PROXY protocol.


2. The PROXY protocol header

This document uses a few terms that are worth explaining here :
  - "connection initiator" is the party requesting a new connection
  - "connection target" is the party accepting a connection request
  - "client" is the party for which a connection was requested
  - "server" is the party to which the client desired to connect
  - "proxy" is the party intercepting and relaying the connection
    from the client to the server.
  - "sender" is the party sending data over a connection.
  - "receiver" is the party receiving data from the sender.
  - "header" or "PROXY protocol header" is the block of connection information
    the connection initiator prepends at the beginning of a connection, which
    makes it the sender from the protocol point of view.

The PROXY protocol's goal is to fill the server's internal structures with the
information collected by the proxy that the server would have been able to get
by itself if the client was connecting directly to the server instead of via a
proxy. The information carried by the protocol is what the server would get
using getsockname() and getpeername() :
  - address family (AF_INET for IPv4, AF_INET6 for IPv6, AF_UNIX)
  - socket protocol (SOCK_STREAM for TCP, SOCK_DGRAM for UDP)
  - layer 3 source and destination addresses
  - layer 4 source and destination ports if any

Unlike the XCLIENT protocol, the PROXY protocol was designed with limited
extensibility in order to help the receiver parse it very fast. Version 1 was
focused on keeping it human-readable for better debugging possibilities, which
is always desirable for early adoption when few implementations exist. Version
2 adds support for a binary encoding of the header which is much more efficient
to produce and to parse, especially when dealing with IPv6 addresses that are
expensive to emit in ASCII form and to parse.

In both cases, the protocol simply consists in an easily parsable header placed
by the connection initiator at the beginning of each connection. The protocol
is intentionally stateless in that it does not expect the sender to wait for
the receiver before sending the header, nor the receiver to send anything back.

This specification supports two header formats, a human-readable format which
is the only format supported in version 1 of the protocol, and a binary format
which is only supported in version 2. Both formats were designed to ensure that
the header cannot be confused with common higher level protocols such as HTTP,
SSL/TLS, FTP or SMTP, and that both formats are easily distinguishable from
each other for the receiver.

Version 1 senders MAY only produce the human-readable header format. Version 2
senders MAY only produce the binary header format. Version 1 receivers MUST at
least implement the human-readable header format. Version 2 receivers MUST at
least implement the binary header format, and it is recommended that they also
implement the human-readable header format for better interoperability and ease
of upgrade when facing version 1 senders.

Both formats are designed to fit in the smallest TCP segment that any TCP/IP
host is required to support (576 - 40 = 536 bytes). This ensures that the whole
header will always be delivered at once when the socket buffers are still empty
at the beginning of a connection. The sender must always ensure that the header
is sent at once, so that the transport layer maintains atomicity along the path
to the receiver. The receiver may be tolerant to partial headers or may simply
drop the connection when receiving a partial header. The recommendation is to
be tolerant, but implementation constraints may not always easily permit this.
It is important to note that nothing forces any intermediary to forward the
whole header at once, because TCP is a streaming protocol which may be processed
one byte at a time if desired, causing the header to be fragmented when reaching
the receiver. But due to the places where such a protocol is used, the above
simplification generally is acceptable because the risk of crossing such a
device handling one byte at a time is close to zero.

The receiver MUST NOT start processing the connection before it receives a
complete and valid PROXY protocol header. This is particularly important for
protocols where the receiver is expected to speak first (eg: SMTP, FTP or SSH).
The receiver may apply a short timeout and decide to abort the connection if
the protocol header is not seen within a few seconds (at least 3 seconds to
cover a TCP retransmit).

The receiver MUST be configured to only receive the protocol described in this
specification and MUST NOT try to guess whether the protocol header is present
or not. This means that the protocol explicitly prevents port sharing between
public and private access. Otherwise it would open a major security breach by
allowing untrusted parties to spoof their connection addresses. The receiver
SHOULD ensure proper access filtering so that only trusted proxies are allowed
to use this protocol.

Some proxies are smart enough to understand transported protocols and to reuse
idle server connections for multiple messages. This typically happens in HTTP
where requests from multiple clients may be sent over the same connection. Such
proxies MUST NOT implement this protocol on multiplexed connections because the
receiver would use the address advertised in the PROXY header as the address of
all forwarded requests' senders. In fact, such proxies are not dumb proxies,
and since they do have a complete understanding of the transported protocol,
they MUST use the facilities provided by that protocol to present the client's
address.


2.1. Human-readable header format (Version 1)

This is the format specified in version 1 of the protocol. It consists in one
line of ASCII text matching exactly the following block, sent immediately and
at once upon the connection establishment and prepended before any data flowing
from the sender to the receiver :

  - a string identifying the protocol : "PROXY" ( \x50 \x52 \x4F \x58 \x59 )
    Seeing this string indicates that this is version 1 of the protocol.

  - exactly one space : " " ( \x20 )

  - a string indicating the proxied INET protocol and family. As of version 1,
    only "TCP4" ( \x54 \x43 \x50 \x34 ) for TCP over IPv4, and "TCP6"
    ( \x54 \x43 \x50 \x36 ) for TCP over IPv6 are allowed. Other, unsupported,
    or unknown protocols must be reported with the name "UNKNOWN" ( \x55 \x4E
    \x4B \x4E \x4F \x57 \x4E ). For "UNKNOWN", the rest of the line before the
    CRLF may be omitted by the sender, and the receiver must ignore anything
    presented before the CRLF is found. Note that an earlier version of this
    specification suggested using this when sending health checks, but this
    causes issues with servers that reject the "UNKNOWN" keyword. Thus it is
    now recommended not to send "UNKNOWN" when the connection is expected to
    be accepted, but only when it is not possible to correctly fill the PROXY
    line.

  - exactly one space : " " ( \x20 )

  - the layer 3 source address in its canonical format. IPv4 addresses must be
    indicated as a series of exactly 4 integers in the range [0..255] inclusive
    written in decimal representation separated by exactly one dot between each
    other. Leading zeroes are not permitted in front of numbers in order to
    avoid any possible confusion with octal numbers. IPv6 addresses must be
    indicated as series of 4 hexadecimal digits (upper or lower case) delimited
    by colons between each other, with the acceptance of one double colon
    sequence to replace the largest acceptable range of consecutive zeroes. The
    total number of decoded bits must exactly be 128. The advertised protocol
    family dictates what format to use.

  - exactly one space : " " ( \x20 )

  - the layer 3 destination address in its canonical format. It is the same
    format as the layer 3 source address and matches the same family.

  - exactly one space : " " ( \x20 )

  - the TCP source port represented as a decimal integer in the range
    [0..65535] inclusive. Leading zeroes are not permitted in front of numbers
    in order to avoid any possible confusion with octal numbers.

  - exactly one space : " " ( \x20 )

  - the TCP destination port represented as a decimal integer in the range
    [0..65535] inclusive. Leading zeroes are not permitted in front of numbers
    in order to avoid any possible confusion with octal numbers.

  - the CRLF sequence ( \x0D \x0A )


The maximum line lengths the receiver must support including the CRLF are :
  - TCP/IPv4 :
      "PROXY TCP4 255.255.255.255 255.255.255.255 65535 65535\r\n"
    => 5 + 1 + 4 + 1 + 15 + 1 + 15 + 1 + 5 + 1 + 5 + 2 = 56 chars

  - TCP/IPv6 :
      "PROXY TCP6 ffff:f...f:ffff ffff:f...f:ffff 65535 65535\r\n"
    => 5 + 1 + 4 + 1 + 39 + 1 + 39 + 1 + 5 + 1 + 5 + 2 = 104 chars

  - unknown connection (short form) :
      "PROXY UNKNOWN\r\n"
    => 5 + 1 + 7 + 2 = 15 chars

  - worst case (optional fields set to 0xff) :
      "PROXY UNKNOWN ffff:f...f:ffff ffff:f...f:ffff 65535 65535\r\n"
    => 5 + 1 + 7 + 1 + 39 + 1 + 39 + 1 + 5 + 1 + 5 + 2 = 107 chars

So a 108-byte buffer is always enough to store all the line and a trailing zero
for string processing.
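
As an illustration of the buffer sizing above, here is a minimal sketch of a
sender formatting a TCP4 line into a 108-byte buffer. The function name
pp1_build_tcp4 and its argument layout are hypothetical and are not part of
this specification; addresses are assumed to be passed as arrays of 4 octets.

```c
#include <stdio.h>

/* Hypothetical helper, not defined by the specification: formats a version 1
 * "PROXY TCP4" line into buf. Returns the line length (CRLF included, the
 * trailing zero excluded), or -1 if a port is out of range or the line does
 * not fit in the buffer.
 */
int pp1_build_tcp4(char *buf, size_t size,
                   const unsigned char src[4], const unsigned char dst[4],
                   unsigned int sport, unsigned int dport)
{
    int n;

    if (sport > 65535 || dport > 65535)
        return -1;

    /* %u never emits leading zeroes, as required for addresses and ports */
    n = snprintf(buf, size, "PROXY TCP4 %u.%u.%u.%u %u.%u.%u.%u %u %u\r\n",
                 (unsigned int)src[0], (unsigned int)src[1],
                 (unsigned int)src[2], (unsigned int)src[3],
                 (unsigned int)dst[0], (unsigned int)dst[1],
                 (unsigned int)dst[2], (unsigned int)dst[3],
                 sport, dport);

    if (n < 0 || (size_t)n >= size)
        return -1; /* output error or truncated line */
    return n;
}
```

With all fields at their maximum values, this sketch produces the 56-character
TCP/IPv4 line listed above.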

The receiver must wait for the CRLF sequence before starting to decode the
addresses in order to ensure they are complete and properly parsed. If the CRLF
sequence is not found in the first 107 characters, the receiver should declare
the line invalid. A receiver may reject an incomplete line which does not
contain the CRLF sequence in the first atomic read operation. The receiver must
not tolerate a single CR or LF character to end the line when a complete CRLF
sequence is expected.
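
The CRLF rules above can be sketched as follows. This is only one possible
reading of them: the function name pp1_line_length, the PP1_MAX_LINE macro and
the three-way return convention are assumptions made for the example.

```c
#include <stddef.h>

#define PP1_MAX_LINE 107 /* longest version 1 line, CRLF included */

/* Hypothetical helper: looks for the end of a version 1 line in the <len>
 * bytes received so far. Returns the line length including the CRLF,
 * 0 if more data is needed, or -1 if the line is invalid (single CR or LF,
 * or no CRLF within the first 107 characters).
 */
int pp1_line_length(const char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len && i < PP1_MAX_LINE; i++) {
        if (buf[i] == '\n')
            return -1;                 /* LF not preceded by a CR */
        if (buf[i] != '\r')
            continue;
        if (i + 1 >= PP1_MAX_LINE)
            return -1;                 /* CRLF would end past 107 chars */
        if (i + 1 >= len)
            return 0;                  /* the LF was not received yet */
        return buf[i + 1] == '\n' ? (int)(i + 2) : -1;
    }
    return len >= PP1_MAX_LINE ? -1 : 0;
}
```

A receiver which is not tolerant to partial headers may simply treat the 0
return value as an error as well.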

Any sequence which does not exactly match the protocol must be discarded and
cause the receiver to abort the connection. It is recommended to abort the
connection as soon as possible so that the sender gets a chance to notice the
anomaly and log it.

If the announced transport protocol is "UNKNOWN", then the receiver knows that
the sender speaks the correct PROXY protocol with the appropriate version, and
SHOULD accept the connection and use the real connection's parameters as if
there were no PROXY protocol header on the wire. However, senders SHOULD NOT
use the "UNKNOWN" protocol when they are the initiators of outgoing connections
because some receivers may reject them. When a load balancing proxy has to send
health checks to a server, it SHOULD build a valid PROXY line which it will
fill with a getsockname()/getpeername() pair indicating the addresses used. It
is important to understand that doing so is not appropriate when some source
address translation is performed between the sender and the receiver.

An example of such a line before an HTTP request would look like this (CR
marked as "\r" and LF marked as "\n") :

    PROXY TCP4 192.168.0.1 192.168.0.11 56324 443\r\n
    GET / HTTP/1.1\r\n
    Host: 192.168.0.11\r\n
    \r\n

For the sender, the header line is easy to put into the output buffers once the
connection is established. Note that since the line is always shorter than an
MSS, the sender is guaranteed to always be able to emit it at once and should
not even bother handling partial sends. For the receiver, once the header is
parsed, it is easy to skip it from the input buffers. Please consult section 9
for implementation suggestions.


2.2. Binary header format (version 2)

Producing human-readable IPv6 addresses and parsing them is very inefficient,
due to the multiple possible representation formats and the handling of compact
address format. It was also not possible to specify address families outside
IPv4/IPv6 nor non-TCP protocols. Another drawback of the human-readable format
is the fact that implementations need to parse all characters to find the
trailing CRLF, which makes it harder to read only the exact byte count. Last,
the UNKNOWN address type has not always been accepted by servers as a valid
protocol because of its imprecise meaning.

Version 2 of the protocol thus introduces a new binary format which remains
distinguishable from version 1 and from other commonly used protocols. It was
specially designed in order to be incompatible with a wide range of protocols
and to be rejected by a number of common implementations of these protocols
when unexpectedly presented (please see section 7). Also for better processing
efficiency, IPv4 and IPv6 addresses are respectively aligned on 4-byte and
16-byte boundaries.

The binary header format starts with a constant 12-byte block containing the
protocol signature :

    \x0D \x0A \x0D \x0A \x00 \x0D \x0A \x51 \x55 \x49 \x54 \x0A

Note that this block contains a null byte at the 5th position, so it must not
be handled as a null-terminated string.

The next byte (the 13th one) is the protocol version and command.

The highest four bits contain the version. As of this specification, it must
always be sent as \x2 and the receiver must only accept this value.

The lowest four bits represent the command :
  - \x0 : LOCAL : the connection was established on purpose by the proxy
    without being relayed. The connection endpoints are the sender and the
    receiver. Such connections exist when the proxy sends health-checks to the
    server. The receiver must accept this connection as valid and must use the
    real connection endpoints and discard the protocol block including the
    family which is ignored.

  - \x1 : PROXY : the connection was established on behalf of another node,
    and reflects the original connection endpoints. The receiver must then use
    the information provided in the protocol block to get the original address.

  - other values are unassigned and must not be emitted by senders. Receivers
    must drop connections presenting unexpected values here.
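
A receiver could decode this 13th byte along the following lines. The enum and
the function name pp2_parse_ver_cmd are hypothetical names chosen for this
sketch; only the bit layout and the accepted values come from the text above.

```c
#include <stdint.h>

/* Hypothetical names for the two commands assigned by the specification */
enum pp2_cmd {
    PP2_CMD_LOCAL = 0x0,
    PP2_CMD_PROXY = 0x1
};

/* Returns the command carried by the 13th byte, or -1 if the version is not
 * 2 or the command is unassigned, in which case the connection must be
 * dropped.
 */
int pp2_parse_ver_cmd(uint8_t ver_cmd)
{
    if ((ver_cmd >> 4) != 0x2)
        return -1;                     /* highest 4 bits: version */

    switch (ver_cmd & 0x0F) {          /* lowest 4 bits: command */
    case PP2_CMD_LOCAL:
        return PP2_CMD_LOCAL;
    case PP2_CMD_PROXY:
        return PP2_CMD_PROXY;
    default:
        return -1;
    }
}
```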

The 14th byte contains the transport protocol and address family. The highest 4
bits contain the address family, the lowest 4 bits contain the protocol.

The address family maps to the original socket family without necessarily
matching the values internally used by the system. It may be one of :

  - 0x0 : AF_UNSPEC : the connection is forwarded for an unknown, unspecified
    or unsupported protocol. The sender should use this family when sending
    LOCAL commands or when dealing with unsupported protocol families. The
    receiver is free to accept the connection anyway and use the real endpoint
    addresses or to reject it. The receiver should ignore address information.

  - 0x1 : AF_INET : the forwarded connection uses the AF_INET address family
    (IPv4). The addresses are exactly 4 bytes each in network byte order,
    followed by transport protocol information (typically ports).

  - 0x2 : AF_INET6 : the forwarded connection uses the AF_INET6 address family
    (IPv6). The addresses are exactly 16 bytes each in network byte order,
    followed by transport protocol information (typically ports).

  - 0x3 : AF_UNIX : the forwarded connection uses the AF_UNIX address family
    (UNIX). The addresses are exactly 108 bytes each.

  - other values are unspecified and must not be emitted in version 2 of this
    protocol and must be rejected as invalid by receivers.

The transport protocol is specified in the lowest 4 bits of the 14th byte :

  - 0x0 : UNSPEC : the connection is forwarded for an unknown, unspecified
    or unsupported protocol. The sender should use this protocol when sending
    LOCAL commands or when dealing with unsupported protocol families. The
    receiver is free to accept the connection anyway and use the real endpoint
    addresses or to reject it. The receiver should ignore address information.

  - 0x1 : STREAM : the forwarded connection uses a SOCK_STREAM protocol (eg:
    TCP or UNIX_STREAM). When used with AF_INET/AF_INET6 (TCP), the addresses
    are followed by the source and destination ports represented on 2 bytes
    each in network byte order.

  - 0x2 : DGRAM : the forwarded connection uses a SOCK_DGRAM protocol (eg:
    UDP or UNIX_DGRAM). When used with AF_INET/AF_INET6 (UDP), the addresses
    are followed by the source and destination ports represented on 2 bytes
    each in network byte order.

  - other values are unspecified and must not be emitted in version 2 of this
    protocol and must be rejected as invalid by receivers.

In practice, the following protocol bytes are expected :

  - \x00 : UNSPEC : the connection is forwarded for an unknown, unspecified
    or unsupported protocol. The sender should use this family when sending
    LOCAL commands or when dealing with unsupported protocol families. When
    used with a LOCAL command, the receiver must accept the connection and
    ignore any address information. For other commands, the receiver is free
    to accept the connection anyway and use the real endpoint addresses or to
    reject the connection. The receiver should ignore address information.

  - \x11 : TCP over IPv4 : the forwarded connection uses TCP over the AF_INET
    protocol family. Address length is 2*4 + 2*2 = 12 bytes.

  - \x12 : UDP over IPv4 : the forwarded connection uses UDP over the AF_INET
    protocol family. Address length is 2*4 + 2*2 = 12 bytes.

  - \x21 : TCP over IPv6 : the forwarded connection uses TCP over the AF_INET6
    protocol family. Address length is 2*16 + 2*2 = 36 bytes.

  - \x22 : UDP over IPv6 : the forwarded connection uses UDP over the AF_INET6
    protocol family. Address length is 2*16 + 2*2 = 36 bytes.

  - \x31 : UNIX stream : the forwarded connection uses SOCK_STREAM over the
    AF_UNIX protocol family. Address length is 2*108 = 216 bytes.

  - \x32 : UNIX datagram : the forwarded connection uses SOCK_DGRAM over the
    AF_UNIX protocol family. Address length is 2*108 = 216 bytes.


Only the UNSPEC protocol byte (\x00) is mandatory. A receiver is not required
to implement other ones, provided that it automatically falls back to the
UNSPEC mode for the valid combinations above that it does not support.
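
The list of expected protocol bytes can be turned into a small lookup. The
function name pp2_addr_len is hypothetical, and two choices below are
assumptions of this sketch rather than rules from the specification :
returning 0 for UNSPEC (no address is needed) and returning -1 for any byte
which is not in the list above.

```c
#include <stdint.h>

/* Hypothetical helper: returns the number of address bytes implied by the
 * 14th byte for the combinations listed above, or -1 for any other value.
 */
int pp2_addr_len(uint8_t fam)
{
    switch (fam) {
    case 0x00:              /* UNSPEC: address information is ignored */
        return 0;
    case 0x11:              /* TCP over IPv4 */
    case 0x12:              /* UDP over IPv4 */
        return 2 * 4 + 2 * 2;
    case 0x21:              /* TCP over IPv6 */
    case 0x22:              /* UDP over IPv6 */
        return 2 * 16 + 2 * 2;
    case 0x31:              /* UNIX stream */
    case 0x32:              /* UNIX datagram */
        return 2 * 108;
    default:
        return -1;
    }
}
```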

The 15th and 16th bytes are the address length in bytes, in network byte order.
It is used so that the receiver knows how many address bytes to skip even when
it does not implement the presented protocol. Thus the length of the protocol
header in bytes is always exactly 16 + this value. When a sender presents a
LOCAL connection, it should not present any address so it sets this field to
zero. Receivers MUST always consider this field to skip the appropriate number
of bytes and must not assume zero is presented for LOCAL connections. When a
receiver accepts an incoming connection showing an UNSPEC address family or
protocol, it may or may not decide to log the address information if present.

So the 16-byte version 2 header can be described this way :

    struct proxy_hdr_v2 {
        uint8_t sig[12];  /* hex 0D 0A 0D 0A 00 0D 0A 51 55 49 54 0A */
        uint8_t ver_cmd;  /* protocol version and command */
        uint8_t fam;      /* protocol family and address */
        uint16_t len;     /* number of following bytes part of the header */
    };

Starting from the 17th byte, addresses are presented in network byte order.
The address order is always the same :
  - source layer 3 address in network byte order
  - destination layer 3 address in network byte order
  - source layer 4 address if any, in network byte order (port)
  - destination layer 4 address if any, in network byte order (port)

The address block may directly be sent from or received into the following
union which makes it easy to cast from/to the relevant socket native structs
depending on the address type :

    union proxy_addr {
        struct {        /* for TCP/UDP over IPv4, len = 12 */
            uint32_t src_addr;
            uint32_t dst_addr;
            uint16_t src_port;
            uint16_t dst_port;
        } ipv4_addr;
        struct {        /* for TCP/UDP over IPv6, len = 36 */
            uint8_t  src_addr[16];
            uint8_t  dst_addr[16];
            uint16_t src_port;
            uint16_t dst_port;
        } ipv6_addr;
        struct {        /* for AF_UNIX sockets, len = 216 */
            uint8_t src_addr[108];
            uint8_t dst_addr[108];
        } unix_addr;
    };
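
As an example of the conversion to native socket structs, the sketch below
fills a struct sockaddr_in with the source endpoint of an IPv4 address block.
It assumes a POSIX system, and it copies from byte offsets instead of casting
to the union in order to avoid any alignment concern on the receive buffer.
The function name pp2_ipv4_source is hypothetical.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: <blk> points to the 12-byte IPv4 address block which
 * starts at the 17th byte of the header. Fields are already in network byte
 * order, so they are copied as-is into the socket address.
 */
void pp2_ipv4_source(const uint8_t *blk, struct sockaddr_in *sa)
{
    memset(sa, 0, sizeof(*sa));
    sa->sin_family = AF_INET;
    memcpy(&sa->sin_addr.s_addr, blk, 4);   /* src_addr at offset 0 */
    memcpy(&sa->sin_port, blk + 8, 2);      /* src_port at offset 8 */
}
```

The destination endpoint can be extracted the same way from offsets 4 (address)
and 10 (port).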

The sender must ensure that the whole protocol header is sent at once. This
block is always smaller than an MSS, so there is no reason for it to be
segmented at the beginning of the connection. The receiver should also process
the header at once. The receiver must not start to parse an address before the
whole address block is received. The receiver must also reject incoming
connections containing partial protocol headers.

A receiver may be configured to support both version 1 and version 2 of the
protocol. Identifying the protocol version is easy :

  - if the incoming byte count is 16 or above and the first 12 bytes match
    the protocol signature block below, with the highest four bits of the
    13th byte indicating protocol version 2 (\x2), then the protocol must be
    parsed as version 2 :

        \x0D\x0A\x0D\x0A\x00\x0D\x0A\x51\x55\x49\x54\x0A

  - otherwise, if the incoming byte count is 8 or above, and the first 5
    characters match the ASCII representation of "PROXY" then the protocol
    must be parsed as version 1 :

        \x50\x52\x4F\x58\x59

  - otherwise the protocol is not covered by this specification and the
    connection must be dropped.
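
The identification steps above can be sketched as follows. The version is
checked on the highest four bits of the 13th byte, following the definition of
the version and command byte given earlier. The names pp2_sig and
pp_detect_version are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* The 12-byte version 2 signature. It contains a null byte, so it is
 * compared with memcmp() and never handled as a string.
 */
static const unsigned char pp2_sig[12] = {
    0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A
};

/* Hypothetical helper: returns 2 or 1 depending on the header format found
 * at the beginning of <buf>, or -1 if the connection must be dropped.
 */
int pp_detect_version(const unsigned char *buf, size_t len)
{
    if (len >= 16 && memcmp(buf, pp2_sig, sizeof(pp2_sig)) == 0 &&
        (buf[12] & 0xF0) == 0x20)
        return 2;

    if (len >= 8 && memcmp(buf, "PROXY", 5) == 0)
        return 1;

    return -1;
}
```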

If the length specified in the PROXY protocol header indicates that additional
bytes are part of the header beyond the address information, a receiver may
choose to skip over and ignore those bytes, or attempt to interpret those
bytes.

The information in those bytes will be arranged in Type-Length-Value (TLV)
vectors in the following format. The first byte is the Type of the vector.
The next two bytes represent the length in bytes of the value (not including
the Type and Length bytes), and following the length field is the number of
bytes specified by the length.

    struct {
        uint8_t type;
        uint8_t length_hi;
        uint8_t length_lo;
        uint8_t value[0];
    } tlv;
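
A receiver which chooses to interpret those bytes has to walk the vectors one
after the other while checking that none of them overruns the header. The
sketch below only counts them; the function name pp2_count_tlvs and the choice
to reject a truncated vector are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: <buf> points to the bytes following the address
 * block and <len> is their count. Returns the number of TLV vectors found,
 * or -1 if a vector is truncated or its value runs past the end.
 */
int pp2_count_tlvs(const uint8_t *buf, size_t len)
{
    size_t off = 0;
    int count = 0;

    while (off < len) {
        size_t vlen;

        if (len - off < 3)
            return -1;                 /* incomplete type/length bytes */

        /* length_hi then length_lo, excluding the 3 leading bytes */
        vlen = ((size_t)buf[off + 1] << 8) | buf[off + 2];
        if (vlen > len - off - 3)
            return -1;                 /* value runs past the header */

        off += 3 + vlen;
        count++;
    }
    return count;
}
```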


3. Implementations

Haproxy 1.5 implements version 1 of the PROXY protocol on both sides :
  - the listening sockets accept the protocol when the "accept-proxy" setting
    is passed to the "bind" keyword. Connections accepted on such listeners
    will behave just as if the source really was the one advertised in the
    protocol. This is true for logging, ACLs, content filtering, transparent
    proxying, etc...

  - the protocol may be used to connect to servers if the "send-proxy" setting
    is present on the "server" line. It is enabled on a per-server basis, so it
    is possible to have it enabled for remote servers only and still have local
    ones behave differently. If the incoming connection was accepted with the
    "accept-proxy", then the relayed information is the one advertised in this
    connection's PROXY line.

  - Haproxy 1.5 also implements version 2 of the PROXY protocol as a sender. In
    addition, a TLV with limited, optional, SSL information has been added.

Stunnel added support for version 1 of the protocol for outgoing connections in
version 4.45.

Stud added support for version 1 of the protocol for outgoing connections on
2011/06/29.

Postfix added support for version 1 of the protocol for incoming connections
in smtpd and postscreen in version 2.10.

A patch is available for Stud[5] to implement version 1 of the protocol on
incoming connections.

Support for the protocol in the Varnish cache is being considered [6].

Exim added support for version 1 and version 2 of the protocol for incoming
connections on 2014/05/13, and will be released as part of version 4.83.

The protocol is simple enough that it is expected that other implementations
will appear, especially in environments such as SMTP, IMAP, FTP, RDP where the
client's address is an important piece of information for the server and some
intermediaries. In fact, several proprietary deployments have already done so
on FTP and SMTP servers.

Proxy developers are encouraged to implement this protocol, because it will
make their products much more transparent in complex infrastructures, and will
get rid of a number of issues related to logging and access control.
567
Willy Tarreau332d7b02012-11-19 11:27:29 +0100568
4. Architectural benefits
4.1. Multiple layers

Using the PROXY protocol instead of transparent proxy provides several benefits
in multiple-layer infrastructures. The first immediate benefit is that it
becomes possible to chain multiple layers of proxies and always present the
original IP address. For instance, let's consider the following 2-layer proxy
architecture :

         Internet
          ,---.                     | client to PX1:
         (  X  )                    | native protocol
          `---'                     |
            |                       V
         +--+--+      +-----+
         | FW1 |------| PX1 |
         +--+--+      +-----+       | PX1 to PX2: PROXY + native
            |                       V
         +--+--+      +-----+
         | FW2 |------| PX2 |
         +--+--+      +-----+       | PX2 to SRV: PROXY + native
            |                       V
         +--+--+
         | SRV |
         +-----+

Firewall FW1 receives traffic from internet-based clients and forwards it to
reverse-proxy PX1. PX1 adds a PROXY header then forwards to PX2 via FW2. PX2
is configured to read the PROXY header and to emit it on output. It then
connects to the origin server SRV and presents the original client's address
there. Since all TCP connection endpoints are real machines and are not
spoofed, there is no issue for the return traffic to pass via the firewalls and
reverse proxies. Using transparent proxy, this would be quite difficult because
the firewalls would have to deal with the client's address coming from the
proxies in the DMZ and would have to correctly route the return traffic there
instead of using the default route.
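
As a rough illustration of what PX1 does before relaying the native protocol,
the sketch below formats a version 1 header for TCP over IPv4. The function
name build_v1_tcp4 is an assumption made for this example, and a real proxy
would also have to handle IPv6 and partial writes.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: format a version 1 PROXY line for TCP over IPv4.
 * Returns the line length (without the trailing NUL), or -1 if the buffer
 * is too small or an address cannot be converted to text.
 */
int build_v1_tcp4(char *buf, size_t buflen,
                  const struct sockaddr_in *src,
                  const struct sockaddr_in *dst)
{
    char s[INET_ADDRSTRLEN], d[INET_ADDRSTRLEN];
    int len;

    if (!inet_ntop(AF_INET, &src->sin_addr, s, sizeof(s)) ||
        !inet_ntop(AF_INET, &dst->sin_addr, d, sizeof(d)))
        return -1;

    /* ports are stored in network byte order, hence ntohs() */
    len = snprintf(buf, buflen, "PROXY TCP4 %s %s %u %u\r\n", s, d,
                   (unsigned)ntohs(src->sin_port),
                   (unsigned)ntohs(dst->sin_port));

    return (len < 0 || (size_t)len >= buflen) ? -1 : len;
}
```

PX1 would send the resulting line once, immediately after connecting to PX2,
and only then start forwarding the client's data.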


4.2. IPv4 and IPv6 integration

The protocol also eases IPv4 and IPv6 integration : if only the first layer
(FW1 and PX1) is IPv6-capable, it is still possible to present the original
client's IPv6 address to the target server even though the whole chain is only
connected via IPv4.
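
The reason this works is that the header describes the original connection and
does not depend on the socket it is sent over. The sketch below builds a
version 2 header for a TCP over IPv6 connection; the resulting bytes could be
written to an IPv4 socket just as well. The function name build_v2_tcp6 is
hypothetical, and the layout is assumed to be the one used by the receiver
sample in section 9 (12-byte signature, ver_cmd, fam, len, then addresses and
ports in network byte order).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill buf with a version 2 header describing a TCP
 * connection between two IPv6 endpoints. Returns the header size (52 bytes),
 * or -1 if the buffer is too small.
 */
int build_v2_tcp6(uint8_t *buf, size_t buflen,
                  const struct sockaddr_in6 *src,
                  const struct sockaddr_in6 *dst)
{
    static const uint8_t sig[12] = {
        0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D,
        0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A
    };
    uint16_t len = htons(36);             /* 2 * 16 + 2 * 2 address bytes */

    if (buflen < 16 + 36)
        return -1;

    memcpy(buf, sig, 12);
    buf[12] = 0x21;                       /* version 2, PROXY command */
    buf[13] = 0x21;                       /* TCP over IPv6 */
    memcpy(buf + 14, &len, 2);
    memcpy(buf + 16, &src->sin6_addr, 16);
    memcpy(buf + 32, &dst->sin6_addr, 16);
    memcpy(buf + 48, &src->sin6_port, 2); /* already in network byte order */
    memcpy(buf + 50, &dst->sin6_port, 2);
    return 16 + 36;
}
```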


4.3. Multiple return paths

When transparent proxy is used, it is not possible to run multiple proxies
because the return traffic would follow the default route instead of finding
the proper proxy. Some tricks are sometimes possible using multiple server
addresses and policy routing but these are very limited.

Using the PROXY protocol, this problem disappears as the servers don't need
to route to the client, just to the proxy that forwarded the connection. So
it is perfectly possible to run a proxy farm in front of a very large server
farm and have it work effortlessly, even when dealing with multiple sites.

This is particularly important in Cloud-like environments where there is little
possibility of binding to arbitrary addresses and where the lower processing
power per node generally requires multiple front nodes.

The example below illustrates the following case : virtualized infrastructures
are deployed in 3 datacenters (DC1..DC3). Each DC uses its own VIP which is
handled by the hosting provider's layer 3 load balancer. This load balancer
routes the traffic to a farm of layer 7 SSL/cache offloaders which load balance
among their local servers. The VIPs are advertised by geolocalised DNS so that
clients generally stick to a given DC. Since clients are not guaranteed to
stick to one DC, the L7 load balancing proxies have to know the other DCs'
servers that may be reached via the hosting provider's LAN or via the internet.
The L7 proxies use the PROXY protocol to connect to the servers behind them, so
that even inter-DC traffic can forward the original client's address and the
return path is unambiguous. This would not be possible using transparent proxy
because most often the L7 proxies would not be able to spoof an address, and
this would never work between datacenters.

                               Internet

        DC1                      DC2                      DC3
       ,---.                    ,---.                    ,---.
      (  X  )                  (  X  )                  (  X  )
       `---'                    `---'                    `---'
         |    +-------+           |    +-------+           |    +-------+
         +----| L3 LB |           +----| L3 LB |           +----| L3 LB |
         |    +-------+           |    +-------+           |    +-------+
   ------+------- ~ ~ ~     ------+------- ~ ~ ~     ------+-------
   |||||   ||||             |||||   ||||             |||||   ||||
  50 SRV   4 PX            50 SRV   4 PX            50 SRV   4 PX


5. Security considerations

Version 1 of the protocol header (the human-readable format) was designed so as
to be distinguishable from HTTP. It will not parse as a valid HTTP request and
an HTTP request will not parse as a valid proxy request. Version 2 had to use a
non-parsable binary signature to make many products fail on this block. The
signature was designed to cause immediate failure on HTTP, SSL/TLS, SMTP, FTP,
and POP. It also causes aborts on LDAP and RDP servers (see section 6). That
makes it easier to enforce its use under certain connections and at the same
time, it ensures that improperly configured servers are quickly detected.

Implementers should be very careful about not trying to automatically detect
whether they have to decode the header or not, but rather they must only rely
on a configuration parameter. Indeed, if the opportunity is left to a normal
client to use the protocol, they will be able to hide their activities or make
them appear as coming from someone else. However, accepting the header only
from a number of known sources should be safe.
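
One possible way to restrict the header to known sources is to compare the
address of the directly connected peer against a configured list of networks
before attempting to decode anything. The sketch below does this for IPv4 only;
the trusted_net structure and the peer_is_trusted name are assumptions made for
this example and are not part of the protocol.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical configuration entry: an IPv4 network in host byte order. */
struct trusted_net {
    uint32_t addr;
    uint32_t mask;
};

/* Returns 1 if the directly connected peer (as reported by accept()) belongs
 * to one of the configured networks, 0 otherwise. The PROXY header should
 * only be expected and decoded when this returns 1.
 */
int peer_is_trusted(const struct sockaddr_in *peer,
                    const struct trusted_net *nets, size_t count)
{
    uint32_t ip = ntohl(peer->sin_addr.s_addr);
    size_t i;

    for (i = 0; i < count; i++) {
        if ((ip & nets[i].mask) == (nets[i].addr & nets[i].mask))
            return 1;
    }
    return 0;
}
```

The check is made on the transport-level peer address, never on the address
carried in the header, since the latter is precisely what an untrusted client
could forge.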


6. Validation

The version 2 protocol signature has been sent to a wide variety of protocols
and implementations including old ones. The following protocols and products
have been tested to ensure the best possible behaviour when the signature was
presented, even with minimal implementations :

  - HTTP :
    - Apache 1.3.33     : connection abort        => pass/optimal
    - Nginx 0.7.69      : 400 Bad Request + abort => pass/optimal
    - lighttpd 1.4.20   : 400 Bad Request + abort => pass/optimal
    - thttpd 2.20c      : 400 Bad Request + abort => pass/optimal
    - mini-httpd-1.19   : 400 Bad Request + abort => pass/optimal
    - haproxy 1.4.21    : 400 Bad Request + abort => pass/optimal
  - SSL :
    - stud 0.3.47       : connection abort        => pass/optimal
    - stunnel 4.45      : connection abort        => pass/optimal
    - nginx 0.7.69      : 400 Bad Request + abort => pass/optimal
  - FTP :
    - Pure-ftpd 1.0.20  : 3*500 then 221 Goodbye  => pass/optimal
    - vsftpd 2.0.1      : 3*530 then 221 Goodbye  => pass/optimal
  - SMTP :
    - postfix 2.3       : 3*500 + 221 Bye         => pass/optimal
    - exim 4.69         : 554 + connection abort  => pass/optimal
  - POP :
    - dovecot 1.0.10    : 3*ERR + Logout          => pass/optimal
  - IMAP :
    - dovecot 1.0.10    : 5*ERR + hang            => pass/non-optimal
  - LDAP :
    - openldap 2.3      : abort                   => pass/optimal
  - SSH :
    - openssh 3.9p1     : abort                   => pass/optimal
  - RDP :
    - Windows XP SP3    : abort                   => pass/optimal

This means that most protocols and implementations will not be confused by an
incoming connection exhibiting the protocol signature, which avoids issues when
facing misconfigurations.


7. Future developments

It is possible that the protocol may slightly evolve to present other
information such as the incoming network interface, or the origin addresses in
case of network address translation happening before the first proxy, but this
is not identified as a requirement right now. Some deep thinking has been spent
on this and it appears that trying to add a few more pieces of information
opens a Pandora's box of possible additions, from MAC addresses to SSL client
certificates, which would make the protocol much more complex. So at this point
it is not planned. Suggestions on improvements are welcome.


8. Contacts and links

Please use w@1wt.eu to send any comments to the author.

The following links were referenced in the document.

[1] http://www.postfix.org/XCLIENT_README.html
[2] http://tools.ietf.org/html/draft-ietf-appsawg-http-forwarded
[3] http://www.stunnel.org/
[4] https://github.com/bumptech/stud
[5] https://github.com/bumptech/stud/pull/81
[6] https://www.varnish-cache.org/trac/wiki/Future_Protocols


9. Sample code

The code below is an example of how a receiver may deal with both versions of
the protocol header for TCP over IPv4 or IPv6. The function is supposed to be
called upon a read event. Addresses may be directly copied into their final
memory location since they're transported in network byte order. The sending
side is even simpler and can easily be deduced from this sample code.

    #include <arpa/inet.h>    /* ntohs() */
    #include <errno.h>
    #include <netinet/in.h>   /* struct sockaddr_in, struct sockaddr_in6 */
    #include <stdint.h>
    #include <string.h>       /* memcmp(), memchr(), memcpy() */
    #include <sys/socket.h>   /* recv(), struct sockaddr_storage */

    struct sockaddr_storage from; /* already filled by accept() */
    struct sockaddr_storage to;   /* already filled by getsockname() */

    /* 12-byte version 2 signature. The version itself is carried in the upper
     * four bits of the 13th byte (ver_cmd), the command in the lower four.
     */
    const char v2sig[12] = "\x0D\x0A\x0D\x0A\x00\x0D\x0A\x51\x55\x49\x54\x0A";

    /* returns 0 if needs to poll, <0 upon error or >0 if it did the job */
    int read_evt(int fd)
    {
        union {
            struct {
                char line[108];
            } v1;
            struct {
                uint8_t sig[12];
                uint8_t ver_cmd;
                uint8_t fam;
                uint16_t len;       /* network byte order */
                union {
                    struct {  /* for TCP/UDP over IPv4, len = 12 */
                        uint32_t src_addr;
                        uint32_t dst_addr;
                        uint16_t src_port;
                        uint16_t dst_port;
                    } ip4;
                    struct {  /* for TCP/UDP over IPv6, len = 36 */
                         uint8_t  src_addr[16];
                         uint8_t  dst_addr[16];
                         uint16_t src_port;
                         uint16_t dst_port;
                    } ip6;
                    struct {  /* for AF_UNIX sockets, len = 216 */
                         uint8_t src_addr[108];
                         uint8_t dst_addr[108];
                    } unx;
                } addr;
            } v2;
        } hdr;

        int size, ret;

        /* look at the pending data without consuming it */
        do {
            ret = recv(fd, &hdr, sizeof(hdr), MSG_PEEK);
        } while (ret == -1 && errno == EINTR);

        if (ret == -1)
            return (errno == EAGAIN) ? 0 : -1;

        if (ret >= 16 && memcmp(&hdr.v2, v2sig, 12) == 0 &&
            (hdr.v2.ver_cmd & 0xF0) == 0x20) {
            /* len is transported in network byte order */
            size = 16 + ntohs(hdr.v2.len);
            if (ret < size)
                return -1; /* truncated or too large header */

            switch (hdr.v2.ver_cmd & 0xF) {
            case 0x01: /* PROXY command */
                switch (hdr.v2.fam) {
                case 0x11:  /* TCPv4 */
                    ((struct sockaddr_in *)&from)->sin_family = AF_INET;
                    ((struct sockaddr_in *)&from)->sin_addr.s_addr =
                        hdr.v2.addr.ip4.src_addr;
                    ((struct sockaddr_in *)&from)->sin_port =
                        hdr.v2.addr.ip4.src_port;
                    ((struct sockaddr_in *)&to)->sin_family = AF_INET;
                    ((struct sockaddr_in *)&to)->sin_addr.s_addr =
                        hdr.v2.addr.ip4.dst_addr;
                    ((struct sockaddr_in *)&to)->sin_port =
                        hdr.v2.addr.ip4.dst_port;
                    goto done;
                case 0x21:  /* TCPv6 */
                    ((struct sockaddr_in6 *)&from)->sin6_family = AF_INET6;
                    memcpy(&((struct sockaddr_in6 *)&from)->sin6_addr,
                        hdr.v2.addr.ip6.src_addr, 16);
                    ((struct sockaddr_in6 *)&from)->sin6_port =
                        hdr.v2.addr.ip6.src_port;
                    ((struct sockaddr_in6 *)&to)->sin6_family = AF_INET6;
                    memcpy(&((struct sockaddr_in6 *)&to)->sin6_addr,
                        hdr.v2.addr.ip6.dst_addr, 16);
                    ((struct sockaddr_in6 *)&to)->sin6_port =
                        hdr.v2.addr.ip6.dst_port;
                    goto done;
                }
                /* unsupported protocol, keep local connection address */
                break;
            case 0x00: /* LOCAL command */
                /* keep local connection address for LOCAL */
                break;
            default:
                return -1; /* not a supported command */
            }
        }
        else if (ret >= 8 && memcmp(hdr.v1.line, "PROXY", 5) == 0) {
            char *end = memchr(hdr.v1.line, '\r', ret - 1);
            if (!end || end[1] != '\n')
                return -1; /* partial or invalid header */
            *end = '\0'; /* terminate the string to ease parsing */
            size = end + 2 - hdr.v1.line; /* skip header + CRLF */
            /* parse the V1 header using favorite address parsers like inet_pton.
             * return -1 upon error, or simply fall through to accept.
             */
        }
        else {
            /* Wrong protocol */
            return -1;
        }

    done:
        /* we need to consume the appropriate amount of data from the socket */
        do {
            ret = recv(fd, &hdr, size, 0);
        } while (ret == -1 && errno == EINTR);
        return (ret >= 0) ? 1 : -1;
    }
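
The version 1 branch above leaves address parsing to the reader. One possible
way to do it, assuming the line has already been NUL-terminated as in the
sample, is sketched below. The name parse_v1_line is hypothetical, and sscanf()
is more tolerant of whitespace than the strict single-space format, so a
production parser would likely be stricter.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: parse a NUL-terminated version 1 line with its CRLF
 * removed, e.g. "PROXY TCP4 192.0.2.1 198.51.100.7 56324 443".
 * Returns 1 if both addresses were stored, 0 for "PROXY UNKNOWN" (the caller
 * keeps the connection's own addresses), or -1 on error.
 */
int parse_v1_line(const char *line,
                  struct sockaddr_storage *from,
                  struct sockaddr_storage *to)
{
    char proto[8], src[46], dst[46], extra;
    unsigned int sport, dport;
    int n;

    /* for UNKNOWN, the rest of the line is ignored */
    if (strncmp(line, "PROXY UNKNOWN", 13) == 0 &&
        (line[13] == ' ' || line[13] == '\0'))
        return 0;

    /* the trailing %c only matches if unexpected data follows the ports */
    n = sscanf(line, "PROXY %7s %45s %45s %u %u%c",
               proto, src, dst, &sport, &dport, &extra);
    if (n != 5 || sport > 65535 || dport > 65535)
        return -1;

    if (strcmp(proto, "TCP4") == 0) {
        struct sockaddr_in f = {0}, t = {0};

        if (inet_pton(AF_INET, src, &f.sin_addr) != 1 ||
            inet_pton(AF_INET, dst, &t.sin_addr) != 1)
            return -1;
        f.sin_family = t.sin_family = AF_INET;
        f.sin_port = htons((unsigned short)sport);
        t.sin_port = htons((unsigned short)dport);
        memcpy(from, &f, sizeof(f));
        memcpy(to, &t, sizeof(t));
        return 1;
    }

    if (strcmp(proto, "TCP6") == 0) {
        struct sockaddr_in6 f = {0}, t = {0};

        if (inet_pton(AF_INET6, src, &f.sin6_addr) != 1 ||
            inet_pton(AF_INET6, dst, &t.sin6_addr) != 1)
            return -1;
        f.sin6_family = t.sin6_family = AF_INET6;
        f.sin6_port = htons((unsigned short)sport);
        t.sin6_port = htons((unsigned short)dport);
        memcpy(from, &f, sizeof(f));
        memcpy(to, &t, sizeof(t));
        return 1;
    }

    return -1; /* unknown protocol family */
}
```

The output structures are only written once the whole line has been validated,
so a rejected header leaves the addresses filled by accept() and getsockname()
untouched.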