2012/11/19                                             Willy Tarreau
                                                          Exceliance
                          The PROXY protocol
                            Versions 1 & 2

Abstract

   The PROXY protocol provides a convenient way to safely transport connection
   information such as a client's address across multiple layers of NAT or TCP
   proxies. It is designed to require few changes to existing components and
   to limit the performance impact caused by the processing of the transported
   information.


Revision history

   2010/10/29 - first version
   2011/03/20 - update: implementation and security considerations
   2012/06/21 - add support for binary format
   2012/11/19 - final review and fixes


1. Background

Relaying TCP connections through proxies generally involves a loss of the
original TCP connection parameters such as source and destination addresses,
ports, and so on. Some protocols make it a little bit easier to transfer such
information. For SMTP, Postfix authors have proposed the XCLIENT protocol [1]
which received broad adoption and is particularly suited to mail exchanges. In
HTTP, there is the "Forwarded-For" proposed standard [2]. This proposal aims at
replacing the omnipresent "X-Forwarded-For" header which carries information
about the original source address, and the less common X-Original-To which
carries information about the destination address.

However, both mechanisms require knowledge of the underlying protocol to be
implemented in intermediaries.

Then comes a new class of products which we'll call "dumb proxies", not because
they don't do anything, but because they're processing protocol-agnostic data.
Both Stunnel[3] and Stud[4] are examples of such "dumb proxies". They talk raw
TCP on one side, and raw SSL on the other one, and do that reliably, without
any knowledge of what protocol is transported on top of the connection.

The problem with such a proxy when it is combined with another one such as
haproxy is adapting it to speak the higher level protocol. A patch is available
for Stunnel to make it capable of inserting an X-Forwarded-For header in the
first HTTP request of each incoming connection. Haproxy is able to refrain from
adding another one when the connection comes from Stunnel, so that it's
possible to hide it from the servers.

The typical architecture becomes the following one :


      +--------+      HTTP                      :80 +----------+
      | client | ---------------------------------> |          |
      |        |                                    | haproxy, |
      +--------+             +---------+            | 1 or 2   |
     /        /     HTTPS    | stunnel | HTTP   :81 | listening|
    <________/    ---------> | (server | ---------> | ports    |
                             |  mode)  |            |          |
                             +---------+            +----------+


The problem appears when haproxy runs with keep-alive on the side towards the
client. The Stunnel patch will only add the X-Forwarded-For header to the first
request of each connection and all subsequent requests will not have it. One
solution could be to improve the patch to make it support keep-alive and parse
all forwarded data, whether they're announced with a Content-Length or with a
Transfer-Encoding, taking care of special methods such as HEAD which announce
data without transferring them, etc... In fact, it would require implementing a
full HTTP stack in Stunnel. It would then become a lot more complex, a lot less
reliable and would no longer be the "dumb proxy" that fits every purpose.

In practice, we don't need to add a header for each request because we'll emit
the exact same information every time : the information related to the client
side connection. We could then cache that information in haproxy and use it for
every other request. But that becomes dangerous and is still limited to HTTP
only.

Another approach consists in prepending each connection with a header reporting
the characteristics of the other side's connection. This method is simpler to
implement, does not require any protocol-specific knowledge on either side, and
completely fits the purpose since what is desired precisely is to know the
other side's connection endpoints. It is easy to perform for the sender (just
send a short header once the connection is established) and to parse for the
receiver (simply perform one read() on the incoming connection to fill in
addresses after an accept). The protocol used to carry connection information
across proxies was thus called the PROXY protocol.


2. The PROXY protocol header

This document uses a few terms that are worth explaining here :
  - "connection initiator" is the party requesting a new connection
  - "connection target" is the party accepting a connection request
  - "client" is the party for which a connection was requested
  - "server" is the party to which the client desired to connect
  - "proxy" is the party intercepting and relaying the connection
    from the client to the server.
  - "sender" is the party sending data over a connection.
  - "receiver" is the party receiving data from the sender.
  - "header" or "PROXY protocol header" is the block of connection information
    the connection initiator prepends at the beginning of a connection, which
    makes it the sender from the protocol point of view.

The PROXY protocol's goal is to fill the server's internal structures with the
information collected by the proxy that the server would have been able to get
by itself if the client were connecting directly to the server instead of via a
proxy. The information carried by the protocol is what the server would get
using getsockname() and getpeername() :
  - address family (AF_INET for IPv4, AF_INET6 for IPv6, AF_UNIX)
  - socket protocol (SOCK_STREAM for TCP, SOCK_DGRAM for UDP)
  - layer 3 source and destination addresses
  - layer 4 source and destination ports if any

Unlike the XCLIENT protocol, the PROXY protocol was designed with limited
extensibility in order to help the receiver parse it very fast. Version 1 was
focused on keeping it human-readable for better debugging possibilities, which
is always desirable for early adoption when few implementations exist. Version
2 adds support for a binary encoding of the header which is much more efficient
to produce and to parse, especially when dealing with IPv6 addresses that are
expensive to emit in ASCII form and to parse.

In both cases, the protocol simply consists in an easily parsable header placed
by the connection initiator at the beginning of each connection. The protocol
is intentionally stateless in that it does not expect the sender to wait for
the receiver before sending the header, nor the receiver to send anything back.

This specification supports two header formats, a human-readable format which
is the only format supported in version 1 of the protocol, and a binary format
which is only supported in version 2. Both formats were designed to ensure that
the header cannot be confused with common higher level protocols such as HTTP,
SSL/TLS, FTP or SMTP, and that both formats are easily distinguishable from
each other for the receiver.

Version 1 senders MAY only produce the human-readable header format. Version 2
senders MAY only produce the binary header format. Version 1 receivers MUST at
least implement the human-readable header format. Version 2 receivers MUST at
least implement the binary header format, and it is recommended that they also
implement the human-readable header format for better interoperability and ease
of upgrade when facing version 1 senders.

Both formats are designed to fit in the smallest TCP segment that any TCP/IP
host is required to support (576 - 40 = 536 bytes). This ensures that the whole
header will always be delivered at once when the socket buffers are still empty
at the beginning of a connection. The sender must always ensure that the header
is sent at once, so that the transport layer maintains atomicity along the path
to the receiver. The receiver may be tolerant to partial headers or may simply
drop the connection when receiving a partial header. The recommendation is to
be tolerant, but implementation constraints may not always easily permit this.
It is important to note that nothing forces any intermediary to forward the
whole header at once, because TCP is a streaming protocol which may be
processed one byte at a time if desired, causing the header to be fragmented
when reaching the receiver. But due to the places where such a protocol is
used, the above simplification generally is acceptable because the risk of
crossing such a device handling one byte at a time is close to zero.

The receiver MUST NOT start processing the connection before it receives a
complete and valid PROXY protocol header. This is particularly important for
protocols where the receiver is expected to speak first (eg: SMTP, FTP or SSH).
The receiver may apply a short timeout and decide to abort the connection if
the protocol header is not seen within a few seconds (at least 3 seconds to
cover a TCP retransmit).

The receiver MUST be configured to only receive the protocol described in this
specification and MUST NOT try to guess whether the protocol header is present
or not. This means that the protocol explicitly prevents port sharing between
public and private access. Otherwise it would open a major security breach by
allowing untrusted parties to spoof their connection addresses. The receiver
SHOULD ensure proper access filtering so that only trusted proxies are allowed
to use this protocol.

Some proxies are smart enough to understand transported protocols and to reuse
idle server connections for multiple messages. This typically happens in HTTP
where requests from multiple clients may be sent over the same connection. Such
proxies MUST NOT implement this protocol on multiplexed connections because the
receiver would use the address advertised in the PROXY header as the address of
all forwarded requests' senders. In fact, such proxies are not dumb proxies,
and since they do have a complete understanding of the transported protocol,
they MUST use the facilities provided by that transported protocol to present
the client's address.


2.1. Human-readable header format (Version 1)

This is the format specified in version 1 of the protocol. It consists in one
line of ASCII text matching exactly the following block, sent immediately and
at once upon the connection establishment and prepended before any data flowing
from the sender to the receiver :

  - a string identifying the protocol : "PROXY" ( \x50 \x52 \x4F \x58 \x59 )
    Seeing this string indicates that this is version 1 of the protocol.

  - exactly one space : " " ( \x20 )

  - a string indicating the proxied INET protocol and family. As of version 1,
    only "TCP4" ( \x54 \x43 \x50 \x34 ) for TCP over IPv4, and "TCP6"
    ( \x54 \x43 \x50 \x36 ) for TCP over IPv6 are allowed. Other, unsupported,
    or unknown protocols must be reported with the name "UNKNOWN" ( \x55 \x4E
    \x4B \x4E \x4F \x57 \x4E ). For "UNKNOWN", the rest of the line before the
    CRLF may be omitted by the sender, and the receiver must ignore anything
    presented before the CRLF is found. Note that an earlier version of this
    specification suggested to use this when sending health checks, but this
    causes issues with servers that reject the "UNKNOWN" keyword. Thus it is
    now recommended not to send "UNKNOWN" when the connection is expected to
    be accepted, but only when it is not possible to correctly fill the PROXY
    line.

  - exactly one space : " " ( \x20 )

  - the layer 3 source address in its canonical format. IPv4 addresses must be
    indicated as a series of exactly 4 integers in the range [0..255] inclusive
    written in decimal representation separated by exactly one dot between each
    other. Leading zeroes are not permitted in front of numbers in order to
    avoid any possible confusion with octal numbers. IPv6 addresses must be
    indicated as a series of 4 hexadecimal digits (upper or lower case)
    delimited by colons between each other, with the acceptance of one double
    colon sequence to replace the largest acceptable range of consecutive
    zeroes. The total number of decoded bits must exactly be 128. The
    advertised protocol family dictates what format to use.

  - exactly one space : " " ( \x20 )

  - the layer 3 destination address in its canonical format. It is the same
    format as the layer 3 source address and matches the same family.

  - exactly one space : " " ( \x20 )

  - the TCP source port represented as a decimal integer in the range
    [0..65535] inclusive. Leading zeroes are not permitted in front of numbers
    in order to avoid any possible confusion with octal numbers.

  - exactly one space : " " ( \x20 )

  - the TCP destination port represented as a decimal integer in the range
    [0..65535] inclusive. Leading zeroes are not permitted in front of numbers
    in order to avoid any possible confusion with octal numbers.

  - the CRLF sequence ( \x0D \x0A )

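As an illustration of the sender side, the fields listed above can be emitted
with a single formatted write. The sketch below is only an example and not
part of the specification : the function name pp1_format_tcp4() is
hypothetical, and a POSIX system providing inet_ntop() is assumed.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes a "PROXY TCP4 ..." line describing src and
 * dst into buf. Returns the line length in bytes (trailing zero excluded),
 * or -1 if an address is not AF_INET or if buf is too small.
 */
int pp1_format_tcp4(char *buf, size_t size,
                    const struct sockaddr_in *src,
                    const struct sockaddr_in *dst)
{
    char s[INET_ADDRSTRLEN], d[INET_ADDRSTRLEN];
    int n;

    if (src->sin_family != AF_INET || dst->sin_family != AF_INET)
        return -1;
    if (!inet_ntop(AF_INET, &src->sin_addr, s, sizeof(s)) ||
        !inet_ntop(AF_INET, &dst->sin_addr, d, sizeof(d)))
        return -1;

    /* Single spaces between fields, decimal ports, CRLF at the end. */
    n = snprintf(buf, size, "PROXY TCP4 %s %s %u %u\r\n", s, d,
                 (unsigned)ntohs(src->sin_port),
                 (unsigned)ntohs(dst->sin_port));
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

The returned length is what a sender would pass to a single send() call once
the connection is established.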

The maximum line lengths the receiver must support including the CRLF are :
  - TCP/IPv4 :
      "PROXY TCP4 255.255.255.255 255.255.255.255 65535 65535\r\n"
    => 5 + 1 + 4 + 1 + 15 + 1 + 15 + 1 + 5 + 1 + 5 + 2 = 56 chars

  - TCP/IPv6 :
      "PROXY TCP6 ffff:f...f:ffff ffff:f...f:ffff 65535 65535\r\n"
    => 5 + 1 + 4 + 1 + 39 + 1 + 39 + 1 + 5 + 1 + 5 + 2 = 104 chars

  - unknown connection (short form) :
      "PROXY UNKNOWN\r\n"
    => 5 + 1 + 7 + 2 = 15 chars

  - worst case (optional fields set to 0xff) :
      "PROXY UNKNOWN ffff:f...f:ffff ffff:f...f:ffff 65535 65535\r\n"
    => 5 + 1 + 7 + 1 + 39 + 1 + 39 + 1 + 5 + 1 + 5 + 2 = 107 chars

So a 108-byte buffer is always enough to store all the line and a trailing zero
for string processing.

The receiver must wait for the CRLF sequence before starting to decode the
addresses in order to ensure they are complete and properly parsed. If the CRLF
sequence is not found in the first 107 characters, the receiver should declare
the line invalid. A receiver may reject an incomplete line which does not
contain the CRLF sequence in the first atomic read operation. The receiver must
not tolerate a single CR or LF character to end the line when a complete CRLF
sequence is expected.

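The CRLF rules above can be sketched as a small scanning function. This is
only an illustration under stated assumptions : the name pp1_line_length()
and its return convention are hypothetical, and a strict receiver is free to
treat the "incomplete" result as an error.

```c
#include <stddef.h>

#define PP1_MAX_LINE 107 /* longest valid line, CRLF included */

/*
 * Hypothetical helper. Looks for the end of a version 1 line in the
 * first bytes received on a connection.
 * Returns: > 0  length of the line including the CRLF,
 *            0  no CRLF yet, the line is still incomplete,
 *           -1  invalid: bare CR or LF, or no CRLF within 107 bytes.
 */
int pp1_line_length(const char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len && i < PP1_MAX_LINE; i++) {
        if (buf[i] == '\n')
            return -1;              /* LF not preceded by CR */
        if (buf[i] == '\r') {
            if (i + 1 >= PP1_MAX_LINE)
                return -1;          /* CRLF would end past 107 bytes */
            if (i + 1 >= len)
                return 0;           /* CR seen, LF not received yet */
            return buf[i + 1] == '\n' ? (int)(i + 2) : -1;
        }
    }
    return i >= PP1_MAX_LINE ? -1 : 0;
}
```

Only once a positive length is returned should the receiver start decoding
the addresses contained in the line.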
Any sequence which does not exactly match the protocol must be discarded and
cause the receiver to abort the connection. It is recommended to abort the
connection as soon as possible so that the sender gets a chance to notice the
anomaly and log it.

If the announced transport protocol is "UNKNOWN", then the receiver knows that
the sender speaks the correct PROXY protocol with the appropriate version, and
SHOULD accept the connection and use the real connection's parameters as if
there were no PROXY protocol header on the wire. However, senders SHOULD NOT
use the "UNKNOWN" protocol when they are the initiators of outgoing connections
because some receivers may reject them. When a load balancing proxy has to send
health checks to a server, it SHOULD build a valid PROXY line which it will
fill with a getsockname()/getpeername() pair indicating the addresses used. It
is important to understand that doing so is not appropriate when some source
address translation is performed between the sender and the receiver.

An example of such a line before an HTTP request would look like this (CR
marked as "\r" and LF marked as "\n") :

    PROXY TCP4 192.168.0.1 192.168.0.11 56324 443\r\n
    GET / HTTP/1.1\r\n
    Host: 192.168.0.11\r\n
    \r\n

For the sender, the header line is easy to put into the output buffers once the
connection is established. Note that since the line is always shorter than an
MSS, the sender is guaranteed to always be able to emit it at once and should
not even bother handling partial sends. For the receiver, once the header is
parsed, it is easy to skip it from the input buffers. Please consult section 9
for implementation suggestions.


2.2. Binary header format (version 2)

Producing human-readable IPv6 addresses and parsing them is very inefficient,
due to the multiple possible representation formats and the handling of compact
address format. It was also not possible to specify address families outside
IPv4/IPv6 nor non-TCP protocols. Another drawback of the human-readable format
is the fact that implementations need to parse all characters to find the
trailing CRLF, which makes it harder to read only the exact bytes count. Last,
the UNKNOWN address type has not always been accepted by servers as a valid
protocol because of its imprecise meaning.

Version 2 of the protocol thus introduces a new binary format which remains
distinguishable from version 1 and from other commonly used protocols. It was
specially designed in order to be incompatible with a wide range of protocols
and to be rejected by a number of common implementations of these protocols
when unexpectedly presented (please see section 7). Also for better processing
efficiency, IPv4 and IPv6 addresses are respectively aligned on 4 and 16 bytes
boundaries.

The binary header format starts with a constant 12 bytes block containing the
protocol signature :

   \x0D \x0A \x0D \x0A \x00 \x0D \x0A \x51 \x55 \x49 \x54 \x0A

Note that this block contains a null byte at the 5th position, so it must not
be handled as a null-terminated string.

The next byte (the 13th one) is the protocol version. As of this specification,
it must always be sent as \x02 and the receiver must only accept this value.

The 14th byte represents the command :
  - \x00 : LOCAL : the connection was established on purpose by the proxy
    without being relayed. The connection endpoints are the sender and the
    receiver. Such connections exist when the proxy sends health-checks to the
    server. The receiver must accept this connection as valid and must use the
    real connection endpoints and discard the protocol block including the
    family which is ignored.

  - \x01 : PROXY : the connection was established on behalf of another node,
    and reflects the original connection endpoints. The receiver must then use
    the information provided in the protocol block to get the original address.

  - other values are unassigned and must not be emitted by senders. Receivers
    must drop connections presenting unexpected values here.

The 15th byte contains the transport protocol and address family. The highest 4
bits contain the address family, the lowest 4 bits contain the protocol.

The address family maps to the original socket family without necessarily
matching the values internally used by the system. It may be one of :

  - 0x0 : AF_UNSPEC : the connection is forwarded for an unknown, unspecified
    or unsupported protocol. The sender should use this family when sending
    LOCAL commands or when dealing with unsupported protocol families. The
    receiver is free to accept the connection anyway and use the real endpoint
    addresses or to reject it. The receiver should ignore address information.

  - 0x1 : AF_INET : the forwarded connection uses the AF_INET address family
    (IPv4). The addresses are exactly 4 bytes each in network byte order,
    followed by transport protocol information (typically ports).

  - 0x2 : AF_INET6 : the forwarded connection uses the AF_INET6 address family
    (IPv6). The addresses are exactly 16 bytes each in network byte order,
    followed by transport protocol information (typically ports).

  - 0x3 : AF_UNIX : the forwarded connection uses the AF_UNIX address family
    (UNIX). The addresses are exactly 108 bytes each.

  - other values are unspecified and must not be emitted in version 2 of this
    protocol and must be rejected as invalid by receivers.

The transport protocol is specified in the lowest 4 bits of the 15th byte :

  - 0x0 : UNSPEC : the connection is forwarded for an unknown, unspecified
    or unsupported protocol. The sender should use this family when sending
    LOCAL commands or when dealing with unsupported protocol families. The
    receiver is free to accept the connection anyway and use the real endpoint
    addresses or to reject it. The receiver should ignore address information.

  - 0x1 : STREAM : the forwarded connection uses a SOCK_STREAM protocol (eg:
    TCP or UNIX_STREAM). When used with AF_INET/AF_INET6 (TCP), the addresses
    are followed by the source and destination ports represented on 2 bytes
    each in network byte order.

  - 0x2 : DGRAM : the forwarded connection uses a SOCK_DGRAM protocol (eg:
    UDP or UNIX_DGRAM). When used with AF_INET/AF_INET6 (UDP), the addresses
    are followed by the source and destination ports represented on 2 bytes
    each in network byte order.

  - other values are unspecified and must not be emitted in version 2 of this
    protocol and must be rejected as invalid by receivers.

In practice, the following protocol bytes are expected :

  - \x00 : UNSPEC : the connection is forwarded for an unknown, unspecified
    or unsupported protocol. The sender should use this family when sending
    LOCAL commands or when dealing with unsupported protocol families. When
    used with a LOCAL command, the receiver must accept the connection and
    ignore any address information. For other commands, the receiver is free
    to accept the connection anyway and use the real endpoints addresses or to
    reject the connection. The receiver should ignore address information.

  - \x11 : TCP over IPv4 : the forwarded connection uses TCP over the AF_INET
    protocol family. Address length is 2*4 + 2*2 = 12 bytes.

  - \x12 : UDP over IPv4 : the forwarded connection uses UDP over the AF_INET
    protocol family. Address length is 2*4 + 2*2 = 12 bytes.

  - \x21 : TCP over IPv6 : the forwarded connection uses TCP over the AF_INET6
    protocol family. Address length is 2*16 + 2*2 = 36 bytes.

  - \x22 : UDP over IPv6 : the forwarded connection uses UDP over the AF_INET6
    protocol family. Address length is 2*16 + 2*2 = 36 bytes.

  - \x31 : UNIX stream : the forwarded connection uses SOCK_STREAM over the
    AF_UNIX protocol family. Address length is 2*108 = 216 bytes.

  - \x32 : UNIX datagram : the forwarded connection uses SOCK_DGRAM over the
    AF_UNIX protocol family. Address length is 2*108 = 216 bytes.


Only the UNSPEC protocol byte (\x00) is mandatory. A receiver is not required
to implement other ones, provided that it automatically falls back to the
UNSPEC mode for the valid combinations above that it does not support.

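The list of expected protocol bytes can be sketched as a lookup that splits
the 15th byte into its two 4-bit fields. This is an illustrative example
only : the function name pp2_addr_len() and its return convention are
hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns the address block length associated with
 * a family/protocol byte (0 for UNSPEC), or -1 when the byte is not one
 * of the combinations listed above.
 */
int pp2_addr_len(uint8_t fam)
{
    uint8_t family = fam >> 4;   /* highest 4 bits: address family */
    uint8_t proto  = fam & 0x0F; /* lowest 4 bits: transport protocol */

    if (fam == 0x00)
        return 0;                /* UNSPEC */
    if (proto != 0x1 && proto != 0x2)
        return -1;               /* neither STREAM nor DGRAM */

    switch (family) {
    case 0x1: return 2 * 4 + 2 * 2;   /* AF_INET:  12 bytes */
    case 0x2: return 2 * 16 + 2 * 2;  /* AF_INET6: 36 bytes */
    case 0x3: return 2 * 108;         /* AF_UNIX: 216 bytes */
    default:  return -1;
    }
}
```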
The 16th byte is the address length in bytes. It is used so that the receiver
knows how many address bytes to skip even when it does not implement the
presented protocol. Thus the length of the protocol header in bytes is always
exactly 16 + this byte. This means that the largest protocol header may only
be 16 + 255 = 271 bytes, which fits in a usual MSS. When a sender presents a
LOCAL connection, it should not present any address so it sets this field to
zero. Receivers MUST always consider this field to skip the appropriate number
of bytes and must not assume zero is presented for LOCAL connections. When a
receiver accepts an incoming connection showing an UNSPEC address family or
protocol, it may or may not decide to log the address information if present.

So the 16-byte version 2 header can be described this way :

    struct proxy_hdr_v2 {
        uint8_t sig[12];  /* hex 0D 0A 0D 0A 00 0D 0A 51 55 49 54 0A */
        uint8_t ver;      /* hex 02 */
        uint8_t cmd;      /* hex 00 or 01 */
        uint8_t fam;      /* protocol family and address */
        uint8_t len;      /* number of following bytes part of the header */
    };

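A receiver could validate this fixed part before looking at any address, as
in the sketch below. It is an example under stated assumptions rather than a
reference implementation : the names pp2_sig and pp2_check_fixed() and the
return convention are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const uint8_t pp2_sig[12] = {
    0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A
};

/*
 * Hypothetical helper. Checks the 16 fixed bytes at the start of buf.
 * Returns the total header length (16 + length byte) when they are valid,
 * 0 when fewer than 16 bytes are available, -1 when they are invalid.
 * On success, *is_local is set to 1 for LOCAL and to 0 for PROXY.
 */
int pp2_check_fixed(const uint8_t *buf, size_t avail, int *is_local)
{
    if (avail < 16)
        return 0;
    /* memcmp() and not strcmp(): the signature contains a null byte */
    if (memcmp(buf, pp2_sig, sizeof(pp2_sig)) != 0)
        return -1;
    if (buf[12] != 0x02)        /* protocol version */
        return -1;
    if (buf[13] > 0x01)         /* only LOCAL and PROXY are assigned */
        return -1;
    *is_local = (buf[13] == 0x00);
    return 16 + buf[15];        /* the length byte must always be honoured */
}
```

The returned value tells the receiver how many bytes to consume before the
transported protocol starts, including for LOCAL connections.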
Starting from the 17th byte, addresses are presented in network byte order.
The address order is always the same :
  - source layer 3 address in network byte order
  - destination layer 3 address in network byte order
  - source layer 4 address if any, in network byte order (port)
  - destination layer 4 address if any, in network byte order (port)

The address block may directly be sent from or received into the following
union which makes it easy to cast from/to the relevant socket native structs
depending on the address type :

    union proxy_addr {
        struct {        /* for TCP/UDP over IPv4, len = 12 */
            uint32_t src_addr;
            uint32_t dst_addr;
            uint16_t src_port;
            uint16_t dst_port;
        } ipv4_addr;
        struct {        /* for TCP/UDP over IPv6, len = 36 */
             uint8_t  src_addr[16];
             uint8_t  dst_addr[16];
             uint16_t src_port;
             uint16_t dst_port;
        } ipv6_addr;
        struct {        /* for AF_UNIX sockets, len = 216 */
             uint8_t src_addr[108];
             uint8_t dst_addr[108];
        } unix_addr;
    };

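For the IPv4 case, the address block can be turned into native socket
structures without any byte swapping, since both use network byte order. The
sketch below is an illustration only : the function name
pp2_ipv4_to_sockaddr() is hypothetical, and it copies from a raw 12-byte
buffer with memcpy() instead of casting, which avoids alignment assumptions.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: fills src and dst from the 12-byte address block
 * of a TCP/UDP over IPv4 header (the bytes following the 16 fixed ones).
 * Fields are copied as-is because struct sockaddr_in also stores its
 * address and port in network byte order.
 */
void pp2_ipv4_to_sockaddr(const uint8_t block[12],
                          struct sockaddr_in *src,
                          struct sockaddr_in *dst)
{
    memset(src, 0, sizeof(*src));
    memset(dst, 0, sizeof(*dst));
    src->sin_family = AF_INET;
    dst->sin_family = AF_INET;
    memcpy(&src->sin_addr.s_addr, block,      4);  /* src_addr */
    memcpy(&dst->sin_addr.s_addr, block + 4,  4);  /* dst_addr */
    memcpy(&src->sin_port,        block + 8,  2);  /* src_port */
    memcpy(&dst->sin_port,        block + 10, 2);  /* dst_port */
}
```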
The sender must ensure that all the protocol header is sent at once. This block
is always smaller than an MSS, so there is no reason for it to be segmented at
the beginning of the connection. The receiver should also process the header
at once. The receiver must not start to parse an address before the whole
address block is received. The receiver must also reject incoming connections
containing partial protocol headers.

484
485A receiver may be configured to support both version 1 and version 2 of the
486protocol. Identifying the protocol version is easy :
487
488 - if the incoming byte count is 16 or above and the 13 first bytes match
489 the protocol signature block followed by the protocol version 2 :
490
491 \x0D\x0A\x0D\x0A\x00\x0D\x0A\x51\x55\x49\x54\x0A\x02
492
493 - otherwise, if the incoming byte count is 8 or above, and the 5 first
494 characters match the ASCII representation of "PROXY" then the protocol
495 must be parsed as version 1 :
496
497 \x50\x52\x4F\x58\x59
498
499 - otherwise the protocol is not covered by this specification and the
500 connection must be dropped.
501
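The three rules above map directly onto a small detection function. The
sketch below is only an example : the enum pp_version and the function name
pp_detect() are hypothetical.

```c
#include <stddef.h>
#include <string.h>

enum pp_version { PP_INVALID = -1, PP_V1 = 1, PP_V2 = 2 };

/*
 * Hypothetical helper applying the rules above to the first bytes read
 * from a connection.
 */
enum pp_version pp_detect(const unsigned char *buf, size_t len)
{
    /* 12-byte signature followed by the version byte \x02 */
    static const unsigned char v2_prefix[13] = {
        0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A,
        0x51, 0x55, 0x49, 0x54, 0x0A, 0x02
    };

    if (len >= 16 && memcmp(buf, v2_prefix, sizeof(v2_prefix)) == 0)
        return PP_V2;
    if (len >= 8 && memcmp(buf, "PROXY", 5) == 0)
        return PP_V1;
    return PP_INVALID;  /* not covered: the connection must be dropped */
}
```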

3. Implementations

Haproxy 1.5 implements version 1 of the PROXY protocol on both sides :
  - the listening sockets accept the protocol when the "accept-proxy" setting
    is passed to the "bind" keyword. Connections accepted on such listeners
    will behave just as if the source really was the one advertised in the
    protocol. This is true for logging, ACLs, content filtering, transparent
    proxying, etc...

  - the protocol may be used to connect to servers if the "send-proxy" setting
    is present on the "server" line. It is enabled on a per-server basis, so it
    is possible to have it enabled for remote servers only and still have local
    ones behave differently. If the incoming connection was accepted with the
    "accept-proxy", then the relayed information is the one advertised in this
    connection's PROXY line.

Stunnel added support for version 1 of the protocol for outgoing connections in
version 4.45.

Stud added support for version 1 of the protocol for outgoing connections on
2011/06/29.

Postfix added support for version 1 of the protocol for incoming connections
in smtpd and postscreen in version 2.10.

A patch is available for Stud[5] to implement version 1 of the protocol on
incoming connections.

Support for the protocol in the Varnish cache is being considered [6].

The protocol is simple enough that it is expected that other implementations
will appear, especially in environments such as SMTP, IMAP, FTP, RDP where the
client's address is an important piece of information for the server and some
intermediaries. In fact, several proprietary deployments have already done so
on FTP and SMTP servers.

Proxy developers are encouraged to implement this protocol, because it will
make their products much more transparent in complex infrastructures, and will
get rid of a number of issues related to logging and access control.


4. Architectural benefits
4.1. Multiple layers

Using the PROXY protocol instead of transparent proxy provides several benefits
in multiple-layer infrastructures. The first immediate benefit is that it
becomes possible to chain multiple layers of proxies and always present the
original IP address. For instance, let's consider the following 2-layer proxy
architecture :

              Internet
                ,---.                     | client to PX1:
               (  X  )                    | native protocol
                `---'                     |
                  |                       V
               +--+--+      +-----+
               | FW1 |------| PX1 |
               +--+--+      +-----+       | PX1 to PX2: PROXY + native
                  |                       V
               +--+--+      +-----+
               | FW2 |------| PX2 |
               +--+--+      +-----+       | PX2 to SRV: PROXY + native
                  |                       V
               +--+--+
               | SRV |
               +-----+

Willy Tarreau332d7b02012-11-19 11:27:29 +0100570Firewall FW1 receives traffic from internet-based clients and forwards it to
571reverse-proxy PX1. PX1 adds a PROXY header then forwards to PX2 via FW2. PX2
572is configured to read the PROXY header and to emit it on output. It then joins
573the origin server SRV and presents the original client's address there. Since
574all TCP connections endpoints are real machines and are not spoofed, there is
575no issue for the return traffic to pass via the firewalls and reverse proxies.
576Using transparent proxy, this would be quite difficult because the firewalls
577would have to deal with the client's address coming from the proxies in the DMZ
578and would have to correctly route the return traffic there instead of using the
579default route.


4.2. IPv4 and IPv6 integration

The protocol also eases IPv4 and IPv6 integration: if only the first layer
(FW1 and PX1) is IPv6-capable, it is still possible to present the original
client's IPv6 address to the target server even though the whole chain is only
connected via IPv4.
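
As a rough illustration of sections 4.1 and 4.2, the sketch below shows how a
proxy such as PX1 might build the version 1 (human-readable) header for the
next hop. The header describes the client's connection independently of the
transport used to reach the next layer, so a TCP6 header can be written on an
IPv4 connection. The function name make_v1_header and the example addresses
are assumptions made for this illustration only.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: formats a version 1 header describing the original
 * connection <src> -> <dst> into <buf>. Returns the header length, or -1 if
 * the address families are unsupported or the buffer is too small.
 */
int make_v1_header(char *buf, size_t len,
                   const struct sockaddr_storage *src,
                   const struct sockaddr_storage *dst)
{
    char s[INET6_ADDRSTRLEN], d[INET6_ADDRSTRLEN];
    const char *proto;
    unsigned int sport, dport;
    int n;

    if (src->ss_family != dst->ss_family)
        return -1;

    if (src->ss_family == AF_INET) {
        const struct sockaddr_in *a = (const struct sockaddr_in *)src;
        const struct sockaddr_in *b = (const struct sockaddr_in *)dst;

        if (!inet_ntop(AF_INET, &a->sin_addr, s, sizeof(s)) ||
            !inet_ntop(AF_INET, &b->sin_addr, d, sizeof(d)))
            return -1;
        proto = "TCP4";
        sport = ntohs(a->sin_port);
        dport = ntohs(b->sin_port);
    }
    else if (src->ss_family == AF_INET6) {
        const struct sockaddr_in6 *a = (const struct sockaddr_in6 *)src;
        const struct sockaddr_in6 *b = (const struct sockaddr_in6 *)dst;

        if (!inet_ntop(AF_INET6, &a->sin6_addr, s, sizeof(s)) ||
            !inet_ntop(AF_INET6, &b->sin6_addr, d, sizeof(d)))
            return -1;
        proto = "TCP6";
        sport = ntohs(a->sin6_port);
        dport = ntohs(b->sin6_port);
    }
    else
        return -1;

    /* the header must be sent once, before any payload of the connection */
    n = snprintf(buf, len, "PROXY %s %s %s %u %u\r\n",
                 proto, s, d, sport, dport);
    return (n > 0 && (size_t)n < len) ? n : -1;
}
```

With a client at 2001:db8::1 port 56324 and a destination at 2001:db8::2 port
443, this produces the line "PROXY TCP6 2001:db8::1 2001:db8::2 56324 443"
followed by CRLF, which PX1 could write first on its IPv4 connection to PX2.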


4.3. Multiple return paths

When transparent proxying is used, it is not possible to run multiple proxies
because the return traffic would follow the default route instead of finding
the proper proxy. Some tricks are sometimes possible using multiple server
addresses and policy routing but these are very limited.

Using the PROXY protocol, this problem disappears as the servers don't need
to route to the client, just to the proxy that forwarded the connection. So
it is perfectly possible to run a proxy farm in front of a very large server
farm and have it work effortlessly, even when dealing with multiple sites.

This is particularly important in Cloud-like environments where there is little
possibility of binding to arbitrary addresses and where the lower processing
power per node generally requires multiple front nodes.

The example below illustrates the following case: virtualized infrastructures
are deployed in 3 datacenters (DC1..DC3). Each DC uses its own VIP which is
handled by the hosting provider's layer 3 load balancer. This load balancer
routes the traffic to a farm of layer 7 SSL/cache offloaders which load balance
among their local servers. The VIPs are advertised by geolocalised DNS so that
clients generally stick to a given DC. Since clients are not guaranteed to
stick to one DC, the L7 load balancing proxies have to know the other DCs'
servers that may be reached via the hosting provider's LAN or via the internet.
The L7 proxies use the PROXY protocol to connect to the servers behind them, so
that even inter-DC traffic can forward the original client's address and the
return path is unambiguous. This would not be possible using transparent
proxying because most often the L7 proxies would not be able to spoof an
address, and this would never work between datacenters.

                            Internet

          DC1                  DC2                  DC3
         ,---.                ,---.                ,---.
        (  X  )              (  X  )              (  X  )
         `---'                `---'                `---'
           |    +-------+       |    +-------+       |    +-------+
           +----| L3 LB |       +----| L3 LB |       +----| L3 LB |
           |    +-------+       |    +-------+       |    +-------+
     ------+------- ~ ~ ~ ------+------- ~ ~ ~ ------+-------
     |||||   ||||         |||||   ||||         |||||   ||||
    50 SRV   4 PX        50 SRV   4 PX        50 SRV   4 PX


5. Security considerations

Version 1 of the protocol header (the human-readable format) was designed so as
to be distinguishable from HTTP. It will not parse as a valid HTTP request and
an HTTP request will not parse as a valid proxy request. Version 2 had to use a
non-parsable binary signature to make many products fail on this block. The
signature was designed to cause immediate failure on HTTP, SSL/TLS, SMTP, FTP,
and POP. It also causes aborts on LDAP and RDP servers (see section 6). That
makes it easier to enforce its use on certain connections and at the same
time, it ensures that improperly configured servers are quickly detected.

Implementers should be very careful not to try to automatically detect whether
they have to decode the header or not; they must rely only on a configuration
parameter. Indeed, if a normal client is given the opportunity to use the
protocol, it will be able to hide its activities or make them appear as coming
from someone else. However, accepting the header only from a number of known
sources should be safe.
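
One possible way to apply this rule is sketched below: the receiver decides
whether to expect a header solely from its configuration, here a list of
trusted IPv4 networks, and never from the content of the first bytes received.
The structure trusted_net and the function proxy_header_expected are
hypothetical names, and IPv6 peers are simply refused to keep the sketch short.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical configuration entry: one network allowed to send the header.
 * Both fields are stored in network byte order.
 */
struct trusted_net {
    uint32_t addr;
    uint32_t mask;
};

/* Returns 1 if the PROXY header must be decoded for a connection accepted
 * from <peer>, 0 otherwise. The decision only depends on the configured
 * list, never on what the peer sent.
 */
int proxy_header_expected(const struct sockaddr *peer,
                          const struct trusted_net *nets, size_t count)
{
    const struct sockaddr_in *in;
    size_t i;

    if (peer->sa_family != AF_INET)
        return 0; /* assumption: only IPv4 sources are configured */

    in = (const struct sockaddr_in *)peer;
    for (i = 0; i < count; i++) {
        if ((in->sin_addr.s_addr & nets[i].mask) ==
            (nets[i].addr & nets[i].mask))
            return 1;
    }
    return 0;
}
```

A listener configured this way would require the header on connections coming
from the listed proxies and treat any other source as a regular client or
reject it, depending on local policy.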


6. Validation

The version 2 protocol signature has been sent to a wide variety of protocols
and implementations including old ones. The following protocols and products
have been tested to ensure the best possible behaviour when the signature was
presented, even with minimal implementations:

  - HTTP :
    - Apache 1.3.33    : connection abort        => pass/optimal
    - Nginx 0.7.69     : 400 Bad Request + abort => pass/optimal
    - lighttpd 1.4.20  : 400 Bad Request + abort => pass/optimal
    - thttpd 2.20c     : 400 Bad Request + abort => pass/optimal
    - mini-httpd-1.19  : 400 Bad Request + abort => pass/optimal
    - haproxy 1.4.21   : 400 Bad Request + abort => pass/optimal
  - SSL :
    - stud 0.3.47      : connection abort        => pass/optimal
    - stunnel 4.45     : connection abort        => pass/optimal
    - nginx 0.7.69     : 400 Bad Request + abort => pass/optimal
  - FTP :
    - Pure-ftpd 1.0.20 : 3*500 then 221 Goodbye  => pass/optimal
    - vsftpd 2.0.1     : 3*530 then 221 Goodbye  => pass/optimal
  - SMTP :
    - postfix 2.3      : 3*500 + 221 Bye         => pass/optimal
    - exim 4.69        : 554 + connection abort  => pass/optimal
  - POP :
    - dovecot 1.0.10   : 3*ERR + Logout          => pass/optimal
  - IMAP :
    - dovecot 1.0.10   : 5*ERR + hang            => pass/non-optimal
  - LDAP :
    - openldap 2.3     : abort                   => pass/optimal
  - SSH :
    - openssh 3.9p1    : abort                   => pass/optimal
  - RDP :
    - Windows XP SP3   : abort                   => pass/optimal

This means that most protocols and implementations will not be confused by an
incoming connection exhibiting the protocol signature, which avoids issues when
facing misconfigurations.


7. Future developments

It is possible that the protocol may slightly evolve to present other
information such as the incoming network interface, or the origin addresses in
case of network address translation happening before the first proxy, but this
is not identified as a requirement right now. Some deep thinking has been spent
on this and it appears that trying to add a few more pieces of information
opens a Pandora's box of many others, from MAC addresses to SSL client
certificates, which would make the protocol much more complex. So at this point
it is not planned. Suggestions on improvements are welcome.


8. Contacts and links

Please use w@1wt.eu to send any comments to the author.

The following links were referenced in the document.

[1] http://www.postfix.org/XCLIENT_README.html
[2] http://tools.ietf.org/html/draft-ietf-appsawg-http-forwarded
[3] http://www.stunnel.org/
[4] https://github.com/bumptech/stud
[5] https://github.com/bumptech/stud/pull/81
[6] https://www.varnish-cache.org/trac/wiki/Future_Protocols


9. Sample code

The code below is an example of how a receiver may deal with both versions of
the protocol header for TCP over IPv4 or IPv6. The function is supposed to be
called upon a read event. Addresses may be directly copied into their final
memory location since they're transported in network byte order. The sending
side is even simpler and can easily be deduced from this sample code.

    #include <errno.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    struct sockaddr_storage from; /* already filled by accept() */
    struct sockaddr_storage to;   /* already filled by getsockname() */
    const char v2sig[13] = "\x0D\x0A\x0D\x0A\x00\x0D\x0A\x51\x55\x49\x54\x0A\x02";

    /* returns 0 if needs to poll, <0 upon error or >0 if it did the job */
    int read_evt(int fd)
    {
        union {
            struct {
                char line[108];
            } v1;
            struct {
                uint8_t sig[12];
                uint8_t ver;
                uint8_t cmd;
                uint8_t fam;
                uint8_t len;
                union {
                    struct {  /* for TCP/UDP over IPv4, len = 12 */
                        uint32_t src_addr;
                        uint32_t dst_addr;
                        uint16_t src_port;
                        uint16_t dst_port;
                    } ip4;
                    struct {  /* for TCP/UDP over IPv6, len = 36 */
                         uint8_t  src_addr[16];
                         uint8_t  dst_addr[16];
                         uint16_t src_port;
                         uint16_t dst_port;
                    } ip6;
                    struct {  /* for AF_UNIX sockets, len = 216 */
                         uint8_t src_addr[108];
                         uint8_t dst_addr[108];
                    } unx;
                } addr;
            } v2;
        } hdr;

        int size, ret;

        do {
            ret = recv(fd, &hdr, sizeof(hdr), MSG_PEEK);
        } while (ret == -1 && errno == EINTR);

        if (ret == -1)
            return (errno == EAGAIN) ? 0 : -1;

        if (ret >= 16 && memcmp(&hdr.v2, v2sig, 13) == 0) {
            size = 16 + hdr.v2.len;
            if (ret < size)
                return -1; /* truncated or too large header */

            switch (hdr.v2.cmd) {
            case 0x01: /* PROXY command */
                switch (hdr.v2.fam) {
                case 0x11:  /* TCPv4 */
                    ((struct sockaddr_in *)&from)->sin_family = AF_INET;
                    ((struct sockaddr_in *)&from)->sin_addr.s_addr =
                        hdr.v2.addr.ip4.src_addr;
                    ((struct sockaddr_in *)&from)->sin_port =
                        hdr.v2.addr.ip4.src_port;
                    ((struct sockaddr_in *)&to)->sin_family = AF_INET;
                    ((struct sockaddr_in *)&to)->sin_addr.s_addr =
                        hdr.v2.addr.ip4.dst_addr;
                    ((struct sockaddr_in *)&to)->sin_port =
                        hdr.v2.addr.ip4.dst_port;
                    goto done;
                case 0x21:  /* TCPv6 */
                    ((struct sockaddr_in6 *)&from)->sin6_family = AF_INET6;
                    memcpy(&((struct sockaddr_in6 *)&from)->sin6_addr,
                        hdr.v2.addr.ip6.src_addr, 16);
                    ((struct sockaddr_in6 *)&from)->sin6_port =
                        hdr.v2.addr.ip6.src_port;
                    ((struct sockaddr_in6 *)&to)->sin6_family = AF_INET6;
                    memcpy(&((struct sockaddr_in6 *)&to)->sin6_addr,
                        hdr.v2.addr.ip6.dst_addr, 16);
                    ((struct sockaddr_in6 *)&to)->sin6_port =
                        hdr.v2.addr.ip6.dst_port;
                    goto done;
                }
                /* unsupported protocol, keep local connection address */
                break;
            case 0x00: /* LOCAL command */
                /* keep local connection address for LOCAL */
                break;
            default:
                return -1; /* not a supported command */
            }
        }
        else if (ret >= 8 && memcmp(hdr.v1.line, "PROXY", 5) == 0) {
            char *end = memchr(hdr.v1.line, '\r', ret - 1);
            if (!end || end[1] != '\n')
                return -1; /* partial or invalid header */
            *end = '\0'; /* terminate the string to ease parsing */
            size = end + 2 - hdr.v1.line; /* skip header + CRLF */
            /* parse the V1 header using favorite address parsers like inet_pton.
             * return -1 upon error, or simply fall through to accept.
             */
        }
        else {
            /* Wrong protocol */
            return -1;
        }

    done:
        /* we need to consume the appropriate amount of data from the socket */
        do {
            ret = recv(fd, &hdr, size, 0);
        } while (ret == -1 && errno == EINTR);
        return (ret >= 0) ? 1 : -1;
    }
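
The sample above leaves the parsing of the version 1 line to the implementer.
A minimal sketch of that step is given below, assuming the line has already
been terminated at the CR as read_evt() does. The function name parse_v1_line
and its return convention are assumptions made for this illustration. Note
that sscanf() is more lenient about separators and number formats than the
specification, so a strict receiver would validate each field by hand.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: parses a NUL-terminated version 1 line (no CRLF).
 * Returns 1 if <from> and <to> were updated, 0 for "PROXY UNKNOWN" (keep
 * the local connection addresses), or -1 upon error.
 */
int parse_v1_line(const char *line, struct sockaddr_storage *from,
                  struct sockaddr_storage *to)
{
    char proto[8], src[INET6_ADDRSTRLEN], dst[INET6_ADDRSTRLEN], extra;
    unsigned int sport, dport;
    struct sockaddr_in s4, d4;
    struct sockaddr_in6 s6, d6;

    /* anything following UNKNOWN on the line is ignored */
    if (strncmp(line, "PROXY UNKNOWN", 13) == 0 &&
        (line[13] == '\0' || line[13] == ' '))
        return 0;

    /* exactly 5 fields are expected, trailing characters are an error */
    if (sscanf(line, "PROXY %7s %45s %45s %5u %5u%c",
               proto, src, dst, &sport, &dport, &extra) != 5)
        return -1;
    if (sport > 65535 || dport > 65535)
        return -1;

    if (strcmp(proto, "TCP4") == 0) {
        memset(&s4, 0, sizeof(s4));
        memset(&d4, 0, sizeof(d4));
        s4.sin_family = d4.sin_family = AF_INET;
        if (inet_pton(AF_INET, src, &s4.sin_addr) != 1 ||
            inet_pton(AF_INET, dst, &d4.sin_addr) != 1)
            return -1;
        s4.sin_port = htons(sport);
        d4.sin_port = htons(dport);
        memcpy(from, &s4, sizeof(s4));
        memcpy(to, &d4, sizeof(d4));
        return 1;
    }
    if (strcmp(proto, "TCP6") == 0) {
        memset(&s6, 0, sizeof(s6));
        memset(&d6, 0, sizeof(d6));
        s6.sin6_family = d6.sin6_family = AF_INET6;
        if (inet_pton(AF_INET6, src, &s6.sin6_addr) != 1 ||
            inet_pton(AF_INET6, dst, &d6.sin6_addr) != 1)
            return -1;
        s6.sin6_port = htons(sport);
        d6.sin6_port = htons(dport);
        memcpy(from, &s6, sizeof(s6));
        memcpy(to, &d6, sizeof(d6));
        return 1;
    }
    return -1; /* unknown protocol name */
}
```

In read_evt(), such a helper could be called on hdr.v1.line right after the
CR has been replaced with a NUL, returning -1 to the caller when it fails.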