2015/05/02                                             Willy Tarreau
                                                 HAProxy Technologies
                         The PROXY protocol
                           Versions 1 & 2

Abstract

    The PROXY protocol provides a convenient way to safely transport connection
    information such as a client's address across multiple layers of NAT or TCP
    proxies. It is designed to require few changes to existing components and
    to limit the performance impact caused by the processing of the transported
    information.


Revision history

    2010/10/29 - first version
    2011/03/20 - update: implementation and security considerations
    2012/06/21 - add support for binary format
    2012/11/19 - final review and fixes
    2014/05/18 - modify and extend PROXY protocol version 2
    2014/06/11 - fix example code to consider ver+cmd merge
    2014/06/14 - fix v2 header check in example code, and update Forwarded spec
    2014/07/12 - update list of implementations (add Squid)
    2015/05/02 - update list of implementations and format of the TLV add-ons


1. Background

Relaying TCP connections through proxies generally involves a loss of the
original TCP connection parameters such as source and destination addresses,
ports, and so on. Some protocols make it a little bit easier to transfer such
information. For SMTP, Postfix authors have proposed the XCLIENT protocol [1]
which received broad adoption and is particularly suited to mail exchanges.
For HTTP, there is the "Forwarded" extension [2], which aims at replacing the
omnipresent "X-Forwarded-For" header which carries information about the
original source address, and the less common X-Original-To which carries
information about the destination address.

However, both mechanisms require a knowledge of the underlying protocol to be
implemented in intermediaries.

Then comes a new class of products which we'll call "dumb proxies", not because
they don't do anything, but because they're processing protocol-agnostic data.
Both Stunnel[3] and Stud[4] are examples of such "dumb proxies". They talk raw
TCP on one side, and raw SSL on the other one, and do that reliably, without
any knowledge of what protocol is transported on top of the connection. Haproxy
running in pure TCP mode obviously falls into that category as well.

The problem with such a proxy when it is combined with another one such as
haproxy, is to adapt it to talk the higher level protocol. A patch is available
for Stunnel to make it capable of inserting an X-Forwarded-For header in the
first HTTP request of each incoming connection. Haproxy is able not to add
another one when the connection comes from Stunnel, so that it's possible to
hide it from the servers.

The typical architecture becomes the following one :


      +--------+      HTTP                     :80 +----------+
      | client | --------------------------------> |          |
      |        |                                   | haproxy, |
      +--------+            +---------+            | 1 or 2   |
     /        /    HTTPS    | stunnel |  HTTP  :81 | listening|
    <________/   ---------> | (server | ---------> | ports    |
                            |  mode)  |            |          |
                            +---------+            +----------+


The problem appears when haproxy runs with keep-alive on the side towards the
client. The Stunnel patch will only add the X-Forwarded-For header to the first
request of each connection and all subsequent requests will not have it. One
solution could be to improve the patch to make it support keep-alive and parse
all forwarded data, whether they're announced with a Content-Length or with a
Transfer-Encoding, taking care of special methods such as HEAD which announce
data without transferring them, etc... In fact, it would require implementing a
full HTTP stack in Stunnel. It would then become a lot more complex, a lot less
reliable and would no longer be the "dumb proxy" that fits every purpose.

In practice, we don't need to add a header for each request because we'll emit
the exact same information every time : the information related to the client
side connection. We could then cache that information in haproxy and use it for
every other request. But that becomes dangerous and is still limited to HTTP
only.

Another approach consists of prepending each connection with a header reporting
the characteristics of the other side's connection. This method is simpler to
implement, does not require any protocol-specific knowledge on either side, and
completely fits the purpose since what is desired precisely is to know the
other side's connection endpoints. It is easy to perform for the sender (just
send a short header once the connection is established) and to parse for the
receiver (simply perform one read() on the incoming connection to fill in
addresses after an accept). The protocol used to carry connection information
across proxies was thus called the PROXY protocol.


2. The PROXY protocol header

This document uses a few terms that are worth explaining here :
  - "connection initiator" is the party requesting a new connection
  - "connection target" is the party accepting a connection request
  - "client" is the party for which a connection was requested
  - "server" is the party to which the client desired to connect
  - "proxy" is the party intercepting and relaying the connection
    from the client to the server.
  - "sender" is the party sending data over a connection.
  - "receiver" is the party receiving data from the sender.
  - "header" or "PROXY protocol header" is the block of connection information
    the connection initiator prepends at the beginning of a connection, which
    makes it the sender from the protocol point of view.

The PROXY protocol's goal is to fill the server's internal structures with the
information collected by the proxy that the server would have been able to get
by itself if the client was connecting directly to the server instead of via a
proxy. The information carried by the protocol is what the server would get
using getsockname() and getpeername() :
  - address family (AF_INET for IPv4, AF_INET6 for IPv6, AF_UNIX)
  - socket protocol (SOCK_STREAM for TCP, SOCK_DGRAM for UDP)
  - layer 3 source and destination addresses
  - layer 4 source and destination ports if any

Unlike the XCLIENT protocol, the PROXY protocol was designed with limited
extensibility in order to help the receiver parse it very fast. Version 1 was
focused on keeping it human-readable for better debugging possibilities, which
is always desirable for early adoption when few implementations exist. Version
2 adds support for a binary encoding of the header which is much more efficient
to produce and to parse, especially when dealing with IPv6 addresses that are
expensive to emit in ASCII form and to parse.

In both cases, the protocol simply consists of an easily parsable header placed
by the connection initiator at the beginning of each connection. The protocol
is intentionally stateless in that it does not expect the sender to wait for
the receiver before sending the header, nor the receiver to send anything back.

This specification supports two header formats, a human-readable format which
is the only format supported in version 1 of the protocol, and a binary format
which is only supported in version 2. Both formats were designed to ensure that
the header cannot be confused with common higher level protocols such as HTTP,
SSL/TLS, FTP or SMTP, and that both formats are easily distinguishable from
each other for the receiver.

Version 1 senders MAY only produce the human-readable header format. Version 2
senders MAY only produce the binary header format. Version 1 receivers MUST at
least implement the human-readable header format. Version 2 receivers MUST at
least implement the binary header format, and it is recommended that they also
implement the human-readable header format for better interoperability and ease
of upgrade when facing version 1 senders.

Both formats are designed to fit in the smallest TCP segment that any TCP/IP
host is required to support (576 - 40 = 536 bytes). This ensures that the whole
header will always be delivered at once when the socket buffers are still empty
at the beginning of a connection. The sender must always ensure that the header
is sent at once, so that the transport layer maintains atomicity along the path
to the receiver. The receiver may be tolerant to partial headers or may simply
drop the connection when receiving a partial header. The recommendation is to
be tolerant, but implementation constraints may not always easily permit this.
It is important to note that nothing forces any intermediary to forward the
whole header at once, because TCP is a streaming protocol which may be processed
one byte at a time if desired, causing the header to be fragmented when reaching
the receiver. But due to the places where such a protocol is used, the above
simplification generally is acceptable because the risk of crossing such a
device handling one byte at a time is close to zero.

The receiver MUST NOT start processing the connection before it receives a
complete and valid PROXY protocol header. This is particularly important for
protocols where the receiver is expected to speak first (eg: SMTP, FTP or SSH).
The receiver may apply a short timeout and decide to abort the connection if
the protocol header is not seen within a few seconds (at least 3 seconds to
cover a TCP retransmit).

The receiver MUST be configured to only receive the protocol described in this
specification and MUST NOT try to guess whether the protocol header is present
or not. This means that the protocol explicitly prevents port sharing between
public and private access. Otherwise it would open a major security breach by
allowing untrusted parties to spoof their connection addresses. The receiver
SHOULD ensure proper access filtering so that only trusted proxies are allowed
to use this protocol.

Some proxies are smart enough to understand transported protocols and to reuse
idle server connections for multiple messages. This typically happens in HTTP
where requests from multiple clients may be sent over the same connection. Such
proxies MUST NOT implement this protocol on multiplexed connections because the
receiver would use the address advertised in the PROXY header as the address of
all forwarded requests' senders. In fact, such proxies are not dumb proxies,
and since they do have a complete understanding of the transported protocol,
they MUST use the facilities provided by that transported protocol to present
the client's address.


2.1. Human-readable header format (Version 1)

This is the format specified in version 1 of the protocol. It consists of one
line of ASCII text matching exactly the following block, sent immediately and
at once upon the connection establishment and prepended before any data flowing
from the sender to the receiver :

  - a string identifying the protocol : "PROXY" ( \x50 \x52 \x4F \x58 \x59 )
    Seeing this string indicates that this is version 1 of the protocol.

  - exactly one space : " " ( \x20 )

  - a string indicating the proxied INET protocol and family. As of version 1,
    only "TCP4" ( \x54 \x43 \x50 \x34 ) for TCP over IPv4, and "TCP6"
    ( \x54 \x43 \x50 \x36 ) for TCP over IPv6 are allowed. Other, unsupported,
    or unknown protocols must be reported with the name "UNKNOWN" ( \x55 \x4E
    \x4B \x4E \x4F \x57 \x4E ). For "UNKNOWN", the rest of the line before the
    CRLF may be omitted by the sender, and the receiver must ignore anything
    presented before the CRLF is found. Note that an earlier version of this
    specification suggested using this when sending health checks, but this
    causes issues with servers that reject the "UNKNOWN" keyword. Thus it is
    now recommended not to send "UNKNOWN" when the connection is expected to
    be accepted, but only when it is not possible to correctly fill the PROXY
    line.

  - exactly one space : " " ( \x20 )

  - the layer 3 source address in its canonical format. IPv4 addresses must be
    indicated as a series of exactly 4 integers in the range [0..255] inclusive
    written in decimal representation separated by exactly one dot between each
    other. Leading zeroes are not permitted in front of numbers in order to
    avoid any possible confusion with octal numbers. IPv6 addresses must be
    indicated as a series of 4 hexadecimal digits (upper or lower case)
    delimited by colons between each other, with the acceptance of one double
    colon sequence to replace the largest acceptable range of consecutive
    zeroes. The total number of decoded bits must exactly be 128. The
    advertised protocol family dictates what format to use.

  - exactly one space : " " ( \x20 )

  - the layer 3 destination address in its canonical format. It is the same
    format as the layer 3 source address and matches the same family.

  - exactly one space : " " ( \x20 )

  - the TCP source port represented as a decimal integer in the range
    [0..65535] inclusive. Leading zeroes are not permitted in front of numbers
    in order to avoid any possible confusion with octal numbers.

  - exactly one space : " " ( \x20 )

  - the TCP destination port represented as a decimal integer in the range
    [0..65535] inclusive. Leading zeroes are not permitted in front of numbers
    in order to avoid any possible confusion with octal numbers.

  - the CRLF sequence ( \x0D \x0A )


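As an illustration of the field list above, the following minimal sketch emits
a version 1 line for a TCP over IPv4 connection. The function name
pp1_build_tcp4() is hypothetical and not part of this specification; it assumes
the endpoints are available as struct sockaddr_in, as returned by getpeername()
and getsockname() on the proxy.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not defined by the specification): writes a version 1
 * header line for a TCP/IPv4 connection into <buf>. Returns the line length
 * in bytes (CRLF included), or -1 on error or truncation.
 */
int pp1_build_tcp4(char *buf, size_t size,
                   const struct sockaddr_in *src,
                   const struct sockaddr_in *dst)
{
    char s[INET_ADDRSTRLEN], d[INET_ADDRSTRLEN];
    int len;

    assert(buf != NULL && src != NULL && dst != NULL);

    /* inet_ntop() produces dotted decimal without leading zeroes */
    if (!inet_ntop(AF_INET, &src->sin_addr, s, sizeof(s)) ||
        !inet_ntop(AF_INET, &dst->sin_addr, d, sizeof(d)))
        return -1;

    /* fields are separated by exactly one space and terminated by CRLF */
    len = snprintf(buf, size, "PROXY TCP4 %s %s %u %u\r\n", s, d,
                   (unsigned)ntohs(src->sin_port),
                   (unsigned)ntohs(dst->sin_port));
    if (len < 0 || (size_t)len >= size)
        return -1;
    return len;
}
```

The sender would typically write the returned number of bytes with a single
send() call right after the connection is established.
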
The maximum line lengths the receiver must support including the CRLF are :
  - TCP/IPv4 :
      "PROXY TCP4 255.255.255.255 255.255.255.255 65535 65535\r\n"
    => 5 + 1 + 4 + 1 + 15 + 1 + 15 + 1 + 5 + 1 + 5 + 2 = 56 chars

  - TCP/IPv6 :
      "PROXY TCP6 ffff:f...f:ffff ffff:f...f:ffff 65535 65535\r\n"
    => 5 + 1 + 4 + 1 + 39 + 1 + 39 + 1 + 5 + 1 + 5 + 2 = 104 chars

  - unknown connection (short form) :
      "PROXY UNKNOWN\r\n"
    => 5 + 1 + 7 + 2 = 15 chars

  - worst case (optional fields set to 0xff) :
      "PROXY UNKNOWN ffff:f...f:ffff ffff:f...f:ffff 65535 65535\r\n"
    => 5 + 1 + 7 + 1 + 39 + 1 + 39 + 1 + 5 + 1 + 5 + 2 = 107 chars

So a 108-byte buffer is always enough to store all the line and a trailing zero
for string processing.

The receiver must wait for the CRLF sequence before starting to decode the
addresses in order to ensure they are complete and properly parsed. If the CRLF
sequence is not found in the first 107 characters, the receiver should declare
the line invalid. A receiver may reject an incomplete line which does not
contain the CRLF sequence in the first atomic read operation. The receiver must
not tolerate a single CR or LF character to end the line when a complete CRLF
sequence is expected.

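The CRLF rules above can be sketched as a small scanning function. This is
only an illustration under stated assumptions: the name pp1_line_length() and
its return convention are hypothetical, and the sketch takes the tolerant
approach of reporting "need more data" for a short read instead of rejecting
it.

```c
#include <assert.h>
#include <stddef.h>

#define PP1_MAX_LINE 107  /* longest valid version 1 line, CRLF included */

/* Hypothetical helper: returns the length of the header line including its
 * CRLF, 0 if more data is needed, or -1 if the line is invalid (lone CR or
 * LF, or no CRLF within the first 107 bytes).
 */
int pp1_line_length(const char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL);

    for (i = 0; i < len && i < PP1_MAX_LINE; i++) {
        if (buf[i] == '\n')
            return -1;          /* LF not preceded by CR */
        if (buf[i] != '\r')
            continue;
        if (i + 1 >= PP1_MAX_LINE)
            return -1;          /* the LF would be the 108th byte */
        if (i + 1 >= len)
            return 0;           /* CR is the last byte received so far */
        return buf[i + 1] == '\n' ? (int)(i + 2) : -1;
    }
    /* no CR seen: invalid once 107 bytes were scanned, else incomplete */
    return len >= PP1_MAX_LINE ? -1 : 0;
}
```

A strict receiver may instead treat the 0 return value as an error after its
first read, as permitted above.
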
Any sequence which does not exactly match the protocol must be discarded and
cause the receiver to abort the connection. It is recommended to abort the
connection as soon as possible so that the sender gets a chance to notice the
anomaly and log it.

If the announced transport protocol is "UNKNOWN", then the receiver knows that
the sender speaks the correct PROXY protocol with the appropriate version, and
SHOULD accept the connection and use the real connection's parameters as if
there were no PROXY protocol header on the wire. However, senders SHOULD NOT
use the "UNKNOWN" protocol when they are the initiators of outgoing connections
because some receivers may reject them. When a load balancing proxy has to send
health checks to a server, it SHOULD build a valid PROXY line which it will
fill with a getsockname()/getpeername() pair indicating the addresses used. It
is important to understand that doing so is not appropriate when some source
address translation is performed between the sender and the receiver.

An example of such a line before an HTTP request would look like this (CR
marked as "\r" and LF marked as "\n") :

    PROXY TCP4 192.168.0.1 192.168.0.11 56324 443\r\n
    GET / HTTP/1.1\r\n
    Host: 192.168.0.11\r\n
    \r\n

For the sender, the header line is easy to put into the output buffers once the
connection is established. Note that since the line is always shorter than an
MSS, the sender is guaranteed to always be able to emit it at once and should
not even bother handling partial sends. For the receiver, once the header is
parsed, it is easy to skip it from the input buffers. Please consult section 9
for implementation suggestions.


2.2. Binary header format (version 2)

Producing human-readable IPv6 addresses and parsing them is very inefficient,
due to the multiple possible representation formats and the handling of compact
address format. It was also not possible to specify address families outside
IPv4/IPv6 nor non-TCP protocols. Another drawback of the human-readable format
is the fact that implementations need to parse all characters to find the
trailing CRLF, which makes it harder to read only the exact byte count. Last,
the UNKNOWN address type has not always been accepted by servers as a valid
protocol because of its imprecise meaning.

Version 2 of the protocol thus introduces a new binary format which remains
distinguishable from version 1 and from other commonly used protocols. It was
specially designed in order to be incompatible with a wide range of protocols
and to be rejected by a number of common implementations of these protocols
when unexpectedly presented (please see section 7). Also for better processing
efficiency, IPv4 and IPv6 addresses are respectively aligned on 4-byte and
16-byte boundaries.

The binary header format starts with a constant 12-byte block containing the
protocol signature :

    \x0D \x0A \x0D \x0A \x00 \x0D \x0A \x51 \x55 \x49 \x54 \x0A

Note that this block contains a null byte at the 5th position, so it must not
be handled as a null-terminated string.

The next byte (the 13th one) is the protocol version and command.

The highest four bits contain the version. As of this specification, it must
always be sent as \x2 and the receiver must only accept this value.

The lowest four bits represent the command :
  - \x0 : LOCAL : the connection was established on purpose by the proxy
    without being relayed. The connection endpoints are the sender and the
    receiver. Such connections exist when the proxy sends health-checks to the
    server. The receiver must accept this connection as valid and must use the
    real connection endpoints and discard the protocol block including the
    family which is ignored.

  - \x1 : PROXY : the connection was established on behalf of another node,
    and reflects the original connection endpoints. The receiver must then use
    the information provided in the protocol block to get the original address.

  - other values are unassigned and must not be emitted by senders. Receivers
    must drop connections presenting unexpected values here.

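The split of the 13th byte described above can be sketched as follows. The
names pp2_cmd and pp2_parse_ver_cmd() are hypothetical and only serve the
illustration; the -1 return value stands for "drop the connection".

```c
#include <assert.h>
#include <stdint.h>

/* command values carried in the four lowest bits of the 13th byte */
enum pp2_cmd {
    PP2_CMD_LOCAL = 0x0,
    PP2_CMD_PROXY = 0x1
};

/* Hypothetical helper: validates the version held in the four highest bits
 * and returns the command from the four lowest bits, or -1 when either the
 * version or the command is not acceptable.
 */
int pp2_parse_ver_cmd(uint8_t ver_cmd)
{
    if ((ver_cmd >> 4) != 0x2)
        return -1;              /* only version 2 is accepted */

    switch (ver_cmd & 0x0F) {
    case PP2_CMD_LOCAL:
        return PP2_CMD_LOCAL;
    case PP2_CMD_PROXY:
        return PP2_CMD_PROXY;
    default:
        return -1;              /* unassigned command */
    }
}
```

With this layout, a LOCAL header carries \x20 in the 13th byte and a PROXY
header carries \x21.
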
The 14th byte contains the transport protocol and address family. The highest 4
bits contain the address family, the lowest 4 bits contain the protocol.

The address family maps to the original socket family without necessarily
matching the values internally used by the system. It may be one of :

  - 0x0 : AF_UNSPEC : the connection is forwarded for an unknown, unspecified
    or unsupported protocol. The sender should use this family when sending
    LOCAL commands or when dealing with unsupported protocol families. The
    receiver is free to accept the connection anyway and use the real endpoint
    addresses or to reject it. The receiver should ignore address information.

  - 0x1 : AF_INET : the forwarded connection uses the AF_INET address family
    (IPv4). The addresses are exactly 4 bytes each in network byte order,
    followed by transport protocol information (typically ports).

  - 0x2 : AF_INET6 : the forwarded connection uses the AF_INET6 address family
    (IPv6). The addresses are exactly 16 bytes each in network byte order,
    followed by transport protocol information (typically ports).

  - 0x3 : AF_UNIX : the forwarded connection uses the AF_UNIX address family
    (UNIX). The addresses are exactly 108 bytes each.

  - other values are unspecified and must not be emitted in version 2 of this
    protocol and must be rejected as invalid by receivers.

The transport protocol is specified in the lowest 4 bits of the 14th byte :

  - 0x0 : UNSPEC : the connection is forwarded for an unknown, unspecified
    or unsupported protocol. The sender should use this family when sending
    LOCAL commands or when dealing with unsupported protocol families. The
    receiver is free to accept the connection anyway and use the real endpoint
    addresses or to reject it. The receiver should ignore address information.

  - 0x1 : STREAM : the forwarded connection uses a SOCK_STREAM protocol (eg:
    TCP or UNIX_STREAM). When used with AF_INET/AF_INET6 (TCP), the addresses
    are followed by the source and destination ports represented on 2 bytes
    each in network byte order.

  - 0x2 : DGRAM : the forwarded connection uses a SOCK_DGRAM protocol (eg:
    UDP or UNIX_DGRAM). When used with AF_INET/AF_INET6 (UDP), the addresses
    are followed by the source and destination ports represented on 2 bytes
    each in network byte order.

  - other values are unspecified and must not be emitted in version 2 of this
    protocol and must be rejected as invalid by receivers.

In practice, the following protocol bytes are expected :

  - \x00 : UNSPEC : the connection is forwarded for an unknown, unspecified
    or unsupported protocol. The sender should use this family when sending
    LOCAL commands or when dealing with unsupported protocol families. When
    used with a LOCAL command, the receiver must accept the connection and
    ignore any address information. For other commands, the receiver is free
    to accept the connection anyway and use the real endpoints addresses or to
    reject the connection. The receiver should ignore address information.

  - \x11 : TCP over IPv4 : the forwarded connection uses TCP over the AF_INET
    protocol family. Address length is 2*4 + 2*2 = 12 bytes.

  - \x12 : UDP over IPv4 : the forwarded connection uses UDP over the AF_INET
    protocol family. Address length is 2*4 + 2*2 = 12 bytes.

  - \x21 : TCP over IPv6 : the forwarded connection uses TCP over the AF_INET6
    protocol family. Address length is 2*16 + 2*2 = 36 bytes.

  - \x22 : UDP over IPv6 : the forwarded connection uses UDP over the AF_INET6
    protocol family. Address length is 2*16 + 2*2 = 36 bytes.

  - \x31 : UNIX stream : the forwarded connection uses SOCK_STREAM over the
    AF_UNIX protocol family. Address length is 2*108 = 216 bytes.

  - \x32 : UNIX datagram : the forwarded connection uses SOCK_DGRAM over the
    AF_UNIX protocol family. Address length is 2*108 = 216 bytes.


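The list of expected protocol bytes above maps directly to a lookup of the
address block length. The sketch below uses a hypothetical function name,
pp2_addr_len(), and chooses to return -1 for any byte that is not in the list;
a real receiver may prefer a different policy for such values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns the length in bytes of the address block
 * implied by the 14th byte (family in the four highest bits, transport
 * protocol in the four lowest bits), or -1 for a byte not listed above.
 */
int pp2_addr_len(uint8_t fam)
{
    switch (fam) {
    case 0x00:              /* UNSPEC: no address information required */
        return 0;
    case 0x11:              /* TCP over IPv4 */
    case 0x12:              /* UDP over IPv4 */
        return 2 * 4 + 2 * 2;
    case 0x21:              /* TCP over IPv6 */
    case 0x22:              /* UDP over IPv6 */
        return 2 * 16 + 2 * 2;
    case 0x31:              /* UNIX stream */
    case 0x32:              /* UNIX datagram */
        return 2 * 108;
    default:
        return -1;
    }
}
```
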
Only the UNSPEC protocol byte (\x00) is mandatory. A receiver is not required
to implement other ones, provided that it automatically falls back to the
UNSPEC mode for the valid combinations above that it does not support.

The 15th and 16th bytes are the address length in bytes in network byte order.
It is used so that the receiver knows how many address bytes to skip even when
it does not implement the presented protocol. Thus the length of the protocol
header in bytes is always exactly 16 + this value. When a sender presents a
LOCAL connection, it should not present any address so it sets this field to
zero. Receivers MUST always consider this field to skip the appropriate number
of bytes and must not assume zero is presented for LOCAL connections. When a
receiver accepts an incoming connection showing an UNSPEC address family or
protocol, it may or may not decide to log the address information if present.

So the 16-byte version 2 header can be described this way :

    struct proxy_hdr_v2 {
        uint8_t sig[12];  /* hex 0D 0A 0D 0A 00 0D 0A 51 55 49 54 0A */
        uint8_t ver_cmd;  /* protocol version and command */
        uint8_t fam;      /* protocol family and address */
        uint16_t len;     /* number of following bytes part of the header */
    };

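Since the length field is carried in network byte order, the total header size
can be computed from the raw bytes without depending on the host endianness.
The following sketch assumes a hypothetical helper name, pp2_total_len(), and
a buffer holding at least the 16 fixed bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: <hdr> points to at least 16 received bytes. Returns
 * the total size of the version 2 header, which is the 16 fixed bytes plus
 * the value of the 15th and 16th bytes read as a big-endian integer.
 */
size_t pp2_total_len(const uint8_t *hdr)
{
    assert(hdr != NULL);
    return 16 + (((size_t)hdr[14] << 8) | hdr[15]);
}
```

A receiver would skip exactly this many bytes before handing the stream over
to the transported protocol, whatever the command and family are.
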
Starting from the 17th byte, addresses are presented in network byte order.
The address order is always the same :
  - source layer 3 address in network byte order
  - destination layer 3 address in network byte order
  - source layer 4 address if any, in network byte order (port)
  - destination layer 4 address if any, in network byte order (port)

The address block may directly be sent from or received into the following
union which makes it easy to cast from/to the relevant socket native structs
depending on the address type :

    union proxy_addr {
        struct {        /* for TCP/UDP over IPv4, len = 12 */
            uint32_t src_addr;
            uint32_t dst_addr;
            uint16_t src_port;
            uint16_t dst_port;
        } ipv4_addr;
        struct {        /* for TCP/UDP over IPv6, len = 36 */
            uint8_t  src_addr[16];
            uint8_t  dst_addr[16];
            uint16_t src_port;
            uint16_t dst_port;
        } ipv6_addr;
        struct {        /* for AF_UNIX sockets, len = 216 */
            uint8_t src_addr[108];
            uint8_t dst_addr[108];
        } unix_addr;
    };

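As a sketch of how the IPv4 layout above maps to native socket structures, the
function below fills two struct sockaddr_in from a 12-byte address block. The
name pp2_ipv4_to_sockaddr() is hypothetical. Because both the block and the
sockaddr_in fields are in network byte order, the bytes are copied unchanged.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: <blk> points to the 12-byte address block of a
 * TCP or UDP over IPv4 header (source address, destination address,
 * source port, destination port). No byte swapping is needed since
 * sockaddr_in also stores addresses and ports in network byte order.
 */
void pp2_ipv4_to_sockaddr(const uint8_t *blk,
                          struct sockaddr_in *src,
                          struct sockaddr_in *dst)
{
    assert(blk != NULL && src != NULL && dst != NULL);

    memset(src, 0, sizeof(*src));
    memset(dst, 0, sizeof(*dst));
    src->sin_family = AF_INET;
    dst->sin_family = AF_INET;

    memcpy(&src->sin_addr.s_addr, blk,      4);
    memcpy(&dst->sin_addr.s_addr, blk + 4,  4);
    memcpy(&src->sin_port,        blk + 8,  2);
    memcpy(&dst->sin_port,        blk + 10, 2);
}
```

Using memcpy() rather than a pointer cast avoids any alignment assumption on
the receive buffer.
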
The sender must ensure that all the protocol header is sent at once. This block
is always smaller than an MSS, so there is no reason for it to be segmented at
the beginning of the connection. The receiver should also process the header
at once. The receiver must not start to parse an address before the whole
address block is received. The receiver must also reject incoming connections
containing partial protocol headers.

A receiver may be configured to support both version 1 and version 2 of the
protocol. Identifying the protocol version is easy :

  - if the incoming byte count is 16 or above, the 12 first bytes match the
    protocol signature block and the four highest bits of the 13th byte hold
    the protocol version 2 (shown here with the command bits cleared) :

      \x0D\x0A\x0D\x0A\x00\x0D\x0A\x51\x55\x49\x54\x0A\x20

  - otherwise, if the incoming byte count is 8 or above, and the 5 first
    characters match the ASCII representation of "PROXY" then the protocol
    must be parsed as version 1 :

      \x50\x52\x4F\x58\x59

  - otherwise the protocol is not covered by this specification and the
    connection must be dropped.

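The identification steps above can be sketched as follows. The function name
pp_detect_version() is hypothetical. Since the version sits in the four
highest bits of the 13th byte, the sketch masks the command bits before
comparing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 12-byte version 2 signature; it contains a null byte, so it is compared
 * with memcmp() and never handled as a string.
 */
static const uint8_t pp2_sig[12] = {
    0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A
};

/* Hypothetical helper: returns 2 or 1 depending on the header version seen
 * in the first <len> bytes of <buf>, or -1 when the connection must be
 * dropped.
 */
int pp_detect_version(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);

    if (len >= 16 && memcmp(buf, pp2_sig, sizeof(pp2_sig)) == 0 &&
        (buf[12] & 0xF0) == 0x20)
        return 2;

    if (len >= 8 && memcmp(buf, "PROXY", 5) == 0)
        return 1;

    return -1;
}
```
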
If the length specified in the PROXY protocol header indicates that additional
bytes are part of the header beyond the address information, a receiver may
choose to skip over and ignore those bytes, or attempt to interpret those
bytes.

The information in those bytes will be arranged in Type-Length-Value (TLV)
vectors in the following format. The first byte is the Type of the vector.
The next two bytes represent the length in bytes of the value (not including
the Type and Length bytes), and following the length field is the number of
bytes specified by the length.

    struct pp2_tlv {
        uint8_t type;
        uint8_t length_hi;
        uint8_t length_lo;
        uint8_t value[0];
    };

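A receiver that chooses to interpret these bytes has to walk the vectors one
after the other while checking that each announced length fits in the remaining
data. The sketch below shows one possible way to do it; the function name
pp2_tlv_find() and its interface are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: looks for the first vector of type <type> in the
 * <len> bytes found at <tlvs> (the bytes following the address block).
 * Returns a pointer to its value and stores the value length in <*vlen>,
 * or returns NULL when the type is absent or a vector is truncated.
 */
const uint8_t *pp2_tlv_find(const uint8_t *tlvs, size_t len,
                            uint8_t type, size_t *vlen)
{
    size_t off = 0;

    assert(tlvs != NULL && vlen != NULL);

    while (len - off >= 3) {
        /* length_hi and length_lo form a 16-bit big-endian length */
        size_t l = ((size_t)tlvs[off + 1] << 8) | tlvs[off + 2];

        if (l > len - off - 3)
            return NULL;        /* value would overrun the header */
        if (tlvs[off] == type) {
            *vlen = l;
            return tlvs + off + 3;
        }
        off += 3 + l;
    }
    return NULL;
}
```
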
The following types have already been registered for the <type> field :

    #define PP2_TYPE_ALPN           0x01
    #define PP2_TYPE_AUTHORITY      0x02
    #define PP2_TYPE_SSL            0x20
    #define PP2_SUBTYPE_SSL_VERSION 0x21
    #define PP2_SUBTYPE_SSL_CN      0x22
    #define PP2_TYPE_NETNS          0x30


5362.2.1. The PP2_TYPE_SSL type and subtypes
Willy Tarreau7b7011c2015-05-02 15:13:07 +0200537
538For the type PP2_TYPE_SSL, the value is itselv a defined like this :
539
540 struct pp2_tlv_ssl {
541 uint8_t client;
542 uint32_t verify;
543 struct pp2_tlv sub_tlv[0];
544 };
545
Nikos Mavrogiannopoulosf1650a82015-08-24 15:53:18 +0200546The <verify> field will be zero if the client presented a certificate
547and it was successfully verified, and non-zero otherwise.
548
549The <client> field is made of a bit field from the following values,
Willy Tarreau7b7011c2015-05-02 15:13:07 +0200550indicating which element is present :
551
552 #define PP2_CLIENT_SSL 0x01
553 #define PP2_CLIENT_CERT_CONN 0x02
554 #define PP2_CLIENT_CERT_SESS 0x04
555
Nikos Mavrogiannopoulosf1650a82015-08-24 15:53:18 +0200556Note, that each of these elements may lead to extra data being appended to
557this TLV using a second level of TLV encapsulation. It is thus possible to
558find multiple TLV values after this field. The total length of the pp2_tlv_ssl
559TLV will reflect this.
Willy Tarreau7b7011c2015-05-02 15:13:07 +0200560
Nikos Mavrogiannopoulosf1650a82015-08-24 15:53:18 +0200561The PP2_CLIENT_SSL flag indicates that the client connected over SSL/TLS. When
562this field is present, the string representation of the TLS version is appended
563at the end of the field in the TLV format using the type PP2_SUBTYPE_SSL_VERSION.
Willy Tarreau7b7011c2015-05-02 15:13:07 +0200564
565PP2_CLIENT_CERT_CONN indicates that the client provided a certificate over the
566current connection. PP2_CLIENT_CERT_SESS indicates that the client provided a
Nikos Mavrogiannopoulosf1650a82015-08-24 15:53:18 +0200567certificate at least once over the TLS session this connection belongs to.
568
569In all cases, the string representation (in UTF8) of the Common Name field
570(OID: 2.5.4.3) of the client certificate's DistinguishedName, is appended
571using the TLV format and the type PP2_SUBTYPE_SSL_CN.
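
The two-level encapsulation described above may be decoded along the lines of
the sketch below. It is only an illustration: the names pp2_parse_ssl and
struct ssl_info are hypothetical, and the code assumes that the value is laid
out on the wire without padding (one byte for <client>, four bytes for <verify>
in network byte order) and is immediately followed by the sub-TLVs.

```c
#include <stddef.h>
#include <stdint.h>

#define PP2_SUBTYPE_SSL_VERSION  0x21
#define PP2_SUBTYPE_SSL_CN       0x22

/* Hypothetical result holder, not part of the specification. */
struct ssl_info {
    uint8_t client;          /* PP2_CLIENT_* flags */
    uint32_t verify;         /* 0 = certificate presented and verified */
    const uint8_t *version;  /* TLS version string (not 0-terminated) or NULL */
    size_t version_len;
    const uint8_t *cn;       /* Common Name (not 0-terminated) or NULL */
    size_t cn_len;
};

/* Decodes the <len> bytes at <val>, which are the value of a PP2_TYPE_SSL TLV.
 * Assumes a packed wire layout (1 byte <client>, 4 bytes <verify> in network
 * byte order) followed by sub-TLVs. Returns 0 on success, -1 if malformed.
 */
int pp2_parse_ssl(const uint8_t *val, size_t len, struct ssl_info *out)
{
    size_t pos = 5;

    if (len < 5)
        return -1;
    out->client = val[0];
    out->verify = ((uint32_t)val[1] << 24) | ((uint32_t)val[2] << 16) |
                  ((uint32_t)val[3] << 8) | val[4];
    out->version = out->cn = NULL;
    out->version_len = out->cn_len = 0;

    while (len - pos >= 3) {
        size_t l = ((size_t)val[pos + 1] << 8) | val[pos + 2];

        if (l > len - pos - 3)
            return -1;       /* sub-TLV overruns the enclosing TLV */
        if (val[pos] == PP2_SUBTYPE_SSL_VERSION) {
            out->version = val + pos + 3;
            out->version_len = l;
        } else if (val[pos] == PP2_SUBTYPE_SSL_CN) {
            out->cn = val + pos + 3;
            out->cn_len = l;
        }
        pos += 3 + l;        /* unknown sub-types are silently skipped */
    }
    return (pos == len) ? 0 : -1;  /* leftover bytes are rejected */
}
```

The decoder reads the fields byte by byte instead of casting the buffer to
struct pp2_tlv_ssl, because a compiler may insert padding between <client> and
<verify> unless the structure is explicitly packed.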


2.2.2. The PP2_TYPE_NETNS type

The type PP2_TYPE_NETNS defines the value as the string representation of the
namespace's name.


3. Implementations

Haproxy 1.5 implements version 1 of the PROXY protocol on both sides :
  - the listening sockets accept the protocol when the "accept-proxy" setting
    is passed to the "bind" keyword. Connections accepted on such listeners
    will behave just as if the source really was the one advertised in the
    protocol. This is true for logging, ACLs, content filtering, transparent
    proxying, etc...

  - the protocol may be used to connect to servers if the "send-proxy" setting
    is present on the "server" line. It is enabled on a per-server basis, so it
    is possible to have it enabled for remote servers only and still have local
    ones behave differently. If the incoming connection was accepted with the
    "accept-proxy", then the relayed information is the one advertised in this
    connection's PROXY line.

  - Haproxy 1.5 also implements version 2 of the PROXY protocol as a sender. In
    addition, a TLV with limited, optional, SSL information has been added.

Stunnel added support for version 1 of the protocol for outgoing connections in
version 4.45.

Stud added support for version 1 of the protocol for outgoing connections on
2011/06/29.

Postfix added support for version 1 of the protocol for incoming connections
in smtpd and postscreen in version 2.10.

A patch is available for Stud[5] to implement version 1 of the protocol on
incoming connections.

Support for versions 1 and 2 of the protocol was added to Varnish 4.1 [6].

Exim added support for version 1 and version 2 of the protocol for incoming
connections on 2014/05/13, and this support will be released as part of
version 4.83.

Squid added support for versions 1 and 2 of the protocol in version 3.5 [7].

Jetty 9.3.0 supports protocol version 1.

The protocol is simple enough that it is expected that other implementations
will appear, especially in environments such as SMTP, IMAP, FTP, RDP where the
client's address is an important piece of information for the server and some
intermediaries. In fact, several proprietary deployments have already done so
on FTP and SMTP servers.

Proxy developers are encouraged to implement this protocol, because it will
make their products much more transparent in complex infrastructures, and will
get rid of a number of issues related to logging and access control.


4. Architectural benefits
4.1. Multiple layers

Using the PROXY protocol instead of transparent proxy provides several benefits
in multiple-layer infrastructures. The first immediate benefit is that it
becomes possible to chain multiple layers of proxies and always present the
original IP address. For instance, let's consider the following 2-layer proxy
architecture :

         Internet
          ,---.                     | client to PX1:
         (  X  )                    | native protocol
          `---'                     |
            |                       V
         +--+--+      +-----+
         | FW1 |------| PX1 |
         +--+--+      +-----+   | PX1 to PX2: PROXY + native
            |                   V
         +--+--+      +-----+
         | FW2 |------| PX2 |
         +--+--+      +-----+   | PX2 to SRV: PROXY + native
            |                   V
         +--+--+
         | SRV |
         +-----+

Firewall FW1 receives traffic from internet-based clients and forwards it to
reverse-proxy PX1. PX1 adds a PROXY header then forwards to PX2 via FW2. PX2
is configured to read the PROXY header and to emit it on output. It then
connects to the origin server SRV and presents the original client's address
there. Since all TCP connection endpoints are real machines and are not
spoofed, there is no issue for the return traffic to pass via the firewalls and
reverse proxies. Using transparent proxy, this would be quite difficult because
the firewalls would have to deal with the client's address coming from the
proxies in the DMZ and would have to correctly route the return traffic there
instead of using the default route.
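
To make the "emit it on output" step more concrete, here is a minimal sketch of
how a proxy such as PX1 or PX2 might format the version 1 line it sends ahead
of the native protocol. The helper name pp1_build is hypothetical, and the
sketch only covers the TCP4 and TCP6 forms of the version 1 line.

```c
#include <stdio.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: formats a version 1 line for TCP over IPv4 or IPv6
 * from the client-side addresses <src> and <dst>, which must share the same
 * family. Returns the line length, or -1 on unsupported family or short
 * buffer. The line has to be sent before any payload.
 */
int pp1_build(char *out, size_t outlen,
              const struct sockaddr_storage *src,
              const struct sockaddr_storage *dst)
{
    char s[INET6_ADDRSTRLEN], d[INET6_ADDRSTRLEN];
    unsigned sport, dport;
    const char *fam;
    int ret;

    if (src->ss_family != dst->ss_family)
        return -1;

    if (src->ss_family == AF_INET) {
        const struct sockaddr_in *s4 = (const struct sockaddr_in *)src;
        const struct sockaddr_in *d4 = (const struct sockaddr_in *)dst;

        fam = "TCP4";
        if (!inet_ntop(AF_INET, &s4->sin_addr, s, sizeof(s)) ||
            !inet_ntop(AF_INET, &d4->sin_addr, d, sizeof(d)))
            return -1;
        sport = ntohs(s4->sin_port);
        dport = ntohs(d4->sin_port);
    } else if (src->ss_family == AF_INET6) {
        const struct sockaddr_in6 *s6 = (const struct sockaddr_in6 *)src;
        const struct sockaddr_in6 *d6 = (const struct sockaddr_in6 *)dst;

        fam = "TCP6";
        if (!inet_ntop(AF_INET6, &s6->sin6_addr, s, sizeof(s)) ||
            !inet_ntop(AF_INET6, &d6->sin6_addr, d, sizeof(d)))
            return -1;
        sport = ntohs(s6->sin6_port);
        dport = ntohs(d6->sin6_port);
    } else {
        return -1;
    }

    ret = snprintf(out, outlen, "PROXY %s %s %s %u %u\r\n",
                   fam, s, d, sport, dport);
    return (ret > 0 && (size_t)ret < outlen) ? ret : -1;
}
```

In the architecture above, PX2 would pass the addresses it decoded from PX1's
header, not the addresses of its own connection with PX1, so that SRV still
sees the original client.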


4.2. IPv4 and IPv6 integration

The protocol also eases IPv4 and IPv6 integration : if only the first layer
(FW1 and PX1) is IPv6-capable, it is still possible to present the original
client's IPv6 address to the target server even though the whole chain is only
connected via IPv4.
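
The point is that the address family is read from the header and not from the
socket on which the header arrives. The sketch below, a hypothetical pp1_parse
helper, decodes a version 1 line into a pair of socket addresses and will fill
IPv6 addresses even when the line was received over an IPv4 connection. It is
deliberately lenient about separators (sscanf accepts any run of white space),
so a strict receiver would need additional checks.

```c
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: parses a 0-terminated version 1 line without its CRLF,
 * e.g. "PROXY TCP6 2001:db8::1 2001:db8::2 56324 443", and fills <from> and
 * <to>. Returns 0 on success, -1 on error or unsupported family.
 */
int pp1_parse(const char *line, struct sockaddr_storage *from,
              struct sockaddr_storage *to)
{
    char fam[5], src[40], dst[40], extra;
    unsigned sport, dport;

    /* exactly 5 fields are expected; a 6th conversion means trailing data */
    if (sscanf(line, "PROXY %4s %39s %39s %5u %5u%c",
               fam, src, dst, &sport, &dport, &extra) != 5)
        return -1;
    if (sport > 65535 || dport > 65535)
        return -1;

    memset(from, 0, sizeof(*from));
    memset(to, 0, sizeof(*to));

    if (strcmp(fam, "TCP4") == 0) {
        struct sockaddr_in *f4 = (struct sockaddr_in *)from;
        struct sockaddr_in *t4 = (struct sockaddr_in *)to;

        if (inet_pton(AF_INET, src, &f4->sin_addr) != 1 ||
            inet_pton(AF_INET, dst, &t4->sin_addr) != 1)
            return -1;
        f4->sin_family = t4->sin_family = AF_INET;
        f4->sin_port = htons(sport);
        t4->sin_port = htons(dport);
        return 0;
    }
    if (strcmp(fam, "TCP6") == 0) {
        struct sockaddr_in6 *f6 = (struct sockaddr_in6 *)from;
        struct sockaddr_in6 *t6 = (struct sockaddr_in6 *)to;

        if (inet_pton(AF_INET6, src, &f6->sin6_addr) != 1 ||
            inet_pton(AF_INET6, dst, &t6->sin6_addr) != 1)
            return -1;
        f6->sin6_family = t6->sin6_family = AF_INET6;
        f6->sin6_port = htons(sport);
        t6->sin6_port = htons(dport);
        return 0;
    }
    return -1;   /* other families are left to the caller */
}
```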


4.3. Multiple return paths

When transparent proxy is used, it is not possible to run multiple proxies
because the return traffic would follow the default route instead of finding
the proper proxy. Some tricks are sometimes possible using multiple server
addresses and policy routing but these are very limited.

Using the PROXY protocol, this problem disappears as the servers don't need
to route to the client, just to the proxy that forwarded the connection. So
it is perfectly possible to run a proxy farm in front of a very large server
farm and have it work effortlessly, even when dealing with multiple sites.

This is particularly important in Cloud-like environments where there is little
choice of binding to random addresses and where the lower processing power per
node generally requires multiple front nodes.

The example below illustrates the following case : virtualized infrastructures
are deployed in 3 datacenters (DC1..DC3). Each DC uses its own VIP which is
handled by the hosting provider's layer 3 load balancer. This load balancer
routes the traffic to a farm of layer 7 SSL/cache offloaders which load balance
among their local servers. The VIPs are advertised by geolocalised DNS so that
clients generally stick to a given DC. Since clients are not guaranteed to
stick to one DC, the L7 load balancing proxies have to know the other DCs'
servers that may be reached via the hosting provider's LAN or via the internet.
The L7 proxies use the PROXY protocol to connect to the servers behind them, so
that even inter-DC traffic can forward the original client's address and the
return path is unambiguous. This would not be possible using transparent proxy
because most often the L7 proxies would not be able to spoof an address, and
this would never work between datacenters.

                              Internet

          DC1                    DC2                    DC3
         ,---.                  ,---.                  ,---.
        (  X  )                (  X  )                (  X  )
         `---'                  `---'                  `---'
           |    +-------+         |    +-------+         |    +-------+
           +----| L3 LB |         +----| L3 LB |         +----| L3 LB |
           |    +-------+         |    +-------+         |    +-------+
     ------+-------  ~ ~ ~  ------+-------  ~ ~ ~  ------+-------
     |||||    ||||          |||||    ||||          |||||    ||||
    50 SRV    4 PX         50 SRV    4 PX         50 SRV    4 PX


5. Security considerations

Version 1 of the protocol header (the human-readable format) was designed so as
to be distinguishable from HTTP. It will not parse as a valid HTTP request and
an HTTP request will not parse as a valid proxy request. Version 2 had to use a
non-parsable binary signature to make many products fail on this block. The
signature was designed to cause immediate failure on HTTP, SSL/TLS, SMTP, FTP,
and POP. It also causes aborts on LDAP and RDP servers (see section 6). That
makes it easier to enforce its use under certain connections and at the same
time, it ensures that improperly configured servers are quickly detected.

Implementers should be very careful about not trying to automatically detect
whether they have to decode the header or not, but rather they must only rely
on a configuration parameter. Indeed, if the opportunity is left to a normal
client to use the protocol, that client will be able to hide its activities or
make them appear as coming from someone else. However, accepting the header
only from a number of known sources should be safe.
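
One possible way to restrict the header to known sources is to compare the
address returned by accept() with a configured list of networks before any
decoding takes place. The sketch below is only an illustration under that
assumption: struct trusted_net and pp_peer_is_trusted are hypothetical names,
and only IPv4 peers are handled to keep it short.

```c
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical configuration entry: an IPv4 network allowed to send the
 * PROXY header, with address and mask in host byte order.
 */
struct trusted_net {
    uint32_t addr;
    uint32_t mask;
};

/* Returns 1 if <peer> (as filled by accept()) belongs to one of the <count>
 * networks in <nets>, otherwise 0. Non-IPv4 peers are never trusted here.
 */
int pp_peer_is_trusted(const struct sockaddr_storage *peer,
                       const struct trusted_net *nets, size_t count)
{
    uint32_t ip;
    size_t i;

    if (peer->ss_family != AF_INET)
        return 0;

    ip = ntohl(((const struct sockaddr_in *)peer)->sin_addr.s_addr);
    for (i = 0; i < count; i++) {
        if ((ip & nets[i].mask) == (nets[i].addr & nets[i].mask))
            return 1;
    }
    return 0;
}
```

The decision stays purely configuration-driven: the listener is either set up
to expect the header from these networks or it is not, and the content of the
first bytes received is never used to guess.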


6. Validation

The version 2 protocol signature has been sent to a wide variety of protocols
and implementations including old ones. The following protocols and products
have been tested to ensure the best possible behaviour when the signature was
presented, even with minimal implementations :

  - HTTP :
    - Apache 1.3.33     : connection abort        => pass/optimal
    - Nginx 0.7.69      : 400 Bad Request + abort => pass/optimal
    - lighttpd 1.4.20   : 400 Bad Request + abort => pass/optimal
    - thttpd 2.20c      : 400 Bad Request + abort => pass/optimal
    - mini-httpd-1.19   : 400 Bad Request + abort => pass/optimal
    - haproxy 1.4.21    : 400 Bad Request + abort => pass/optimal
    - Squid 3           : 400 Bad Request + abort => pass/optimal
  - SSL :
    - stud 0.3.47       : connection abort        => pass/optimal
    - stunnel 4.45      : connection abort        => pass/optimal
    - nginx 0.7.69      : 400 Bad Request + abort => pass/optimal
  - FTP :
    - Pure-ftpd 1.0.20  : 3*500 then 221 Goodbye  => pass/optimal
    - vsftpd 2.0.1      : 3*530 then 221 Goodbye  => pass/optimal
  - SMTP :
    - postfix 2.3       : 3*500 + 221 Bye         => pass/optimal
    - exim 4.69         : 554 + connection abort  => pass/optimal
  - POP :
    - dovecot 1.0.10    : 3*ERR + Logout          => pass/optimal
  - IMAP :
    - dovecot 1.0.10    : 5*ERR + hang            => pass/non-optimal
  - LDAP :
    - openldap 2.3      : abort                   => pass/optimal
  - SSH :
    - openssh 3.9p1     : abort                   => pass/optimal
  - RDP :
    - Windows XP SP3    : abort                   => pass/optimal

This means that most protocols and implementations will not be confused by an
incoming connection exhibiting the protocol signature, which avoids issues when
facing misconfigurations.


7. Future developments

It is possible that the protocol may slightly evolve to present other
information such as the incoming network interface, or the origin addresses in
case of network address translation happening before the first proxy, but this
is not identified as a requirement right now. Some deep thinking has been spent
on this and it appears that trying to add a few more pieces of information
opens a Pandora's box of further candidates, from MAC addresses to SSL client
certificates, which would make the protocol much more complex. So at this point
it is not planned. Suggestions on improvements are welcome.


8. Contacts and links

Please use w@1wt.eu to send any comments to the author.

The following links were referenced in the document.

[1] http://www.postfix.org/XCLIENT_README.html
[2] http://tools.ietf.org/html/rfc7239
[3] http://www.stunnel.org/
[4] https://github.com/bumptech/stud
[5] https://github.com/bumptech/stud/pull/81
[6] https://www.varnish-cache.org/docs/trunk/phk/ssl_again.html
[7] http://wiki.squid-cache.org/Squid-3.5


9. Sample code

The code below is an example of how a receiver may deal with both versions of
the protocol header for TCP over IPv4 or IPv6. The function is supposed to be
called upon a read event. Addresses may be directly copied into their final
memory location since they're transported in network byte order. The sending
side is even simpler and can easily be deduced from this sample code.

    #include <errno.h>
    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    struct sockaddr_storage from; /* already filled by accept() */
    struct sockaddr_storage to;   /* already filled by getsockname() */
    const char v2sig[12] = "\x0D\x0A\x0D\x0A\x00\x0D\x0A\x51\x55\x49\x54\x0A";

    /* returns 0 if needs to poll, <0 upon error or >0 if it did the job */
    int read_evt(int fd)
    {
        union {
            struct {
                char line[108];
            } v1;
            struct {
                uint8_t sig[12];
                uint8_t ver_cmd;
                uint8_t fam;
                uint16_t len;
                union {
                    struct {  /* for TCP/UDP over IPv4, len = 12 */
                        uint32_t src_addr;
                        uint32_t dst_addr;
                        uint16_t src_port;
                        uint16_t dst_port;
                    } ip4;
                    struct {  /* for TCP/UDP over IPv6, len = 36 */
                        uint8_t  src_addr[16];
                        uint8_t  dst_addr[16];
                        uint16_t src_port;
                        uint16_t dst_port;
                    } ip6;
                    struct {  /* for AF_UNIX sockets, len = 216 */
                        uint8_t src_addr[108];
                        uint8_t dst_addr[108];
                    } unx;
                } addr;
            } v2;
        } hdr;

        int size, ret;

        do {
            ret = recv(fd, &hdr, sizeof(hdr), MSG_PEEK);
        } while (ret == -1 && errno == EINTR);

        if (ret == -1)
            return (errno == EAGAIN) ? 0 : -1;

        if (ret >= 16 && memcmp(&hdr.v2, v2sig, 12) == 0 &&
            (hdr.v2.ver_cmd & 0xF0) == 0x20) {
            /* the length is transported in network byte order */
            size = 16 + ntohs(hdr.v2.len);
            if (ret < size)
                return -1; /* truncated or too large header */

            switch (hdr.v2.ver_cmd & 0xF) {
            case 0x01: /* PROXY command */
                switch (hdr.v2.fam) {
                case 0x11:  /* TCPv4 */
                    ((struct sockaddr_in *)&from)->sin_family = AF_INET;
                    ((struct sockaddr_in *)&from)->sin_addr.s_addr =
                        hdr.v2.addr.ip4.src_addr;
                    ((struct sockaddr_in *)&from)->sin_port =
                        hdr.v2.addr.ip4.src_port;
                    ((struct sockaddr_in *)&to)->sin_family = AF_INET;
                    ((struct sockaddr_in *)&to)->sin_addr.s_addr =
                        hdr.v2.addr.ip4.dst_addr;
                    ((struct sockaddr_in *)&to)->sin_port =
                        hdr.v2.addr.ip4.dst_port;
                    goto done;
                case 0x21:  /* TCPv6 */
                    ((struct sockaddr_in6 *)&from)->sin6_family = AF_INET6;
                    memcpy(&((struct sockaddr_in6 *)&from)->sin6_addr,
                        hdr.v2.addr.ip6.src_addr, 16);
                    ((struct sockaddr_in6 *)&from)->sin6_port =
                        hdr.v2.addr.ip6.src_port;
                    ((struct sockaddr_in6 *)&to)->sin6_family = AF_INET6;
                    memcpy(&((struct sockaddr_in6 *)&to)->sin6_addr,
                        hdr.v2.addr.ip6.dst_addr, 16);
                    ((struct sockaddr_in6 *)&to)->sin6_port =
                        hdr.v2.addr.ip6.dst_port;
                    goto done;
                }
                /* unsupported protocol, keep local connection address */
                break;
            case 0x00: /* LOCAL command */
                /* keep local connection address for LOCAL */
                break;
            default:
                return -1; /* not a supported command */
            }
        }
        else if (ret >= 8 && memcmp(hdr.v1.line, "PROXY", 5) == 0) {
            char *end = memchr(hdr.v1.line, '\r', ret - 1);
            if (!end || end[1] != '\n')
                return -1; /* partial or invalid header */
            *end = '\0'; /* terminate the string to ease parsing */
            size = end + 2 - hdr.v1.line; /* skip header + CRLF */
            /* parse the V1 header using favorite address parsers like inet_pton.
             * return -1 upon error, or simply fall through to accept.
             */
        }
        else {
            /* Wrong protocol */
            return -1;
        }

      done:
        /* we need to consume the appropriate amount of data from the socket */
        do {
            ret = recv(fd, &hdr, size, 0);
        } while (ret == -1 && errno == EINTR);
        return (ret >= 0) ? 1 : -1;
    }
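
For the sending side, a minimal sketch could look like the following. It
mirrors the version 2 layout used by the receiver above for the PROXY command
with TCP over IPv4 and no TLV. The function name pp2_build_tcp4 and its
arguments are hypothetical; a real sender would also cover IPv6, the LOCAL
command and optional TLVs.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical helper: writes a 28-byte version 2 header (PROXY command,
 * TCP over IPv4, no TLV) into <out> and returns its size, or 0 if <outlen>
 * is too small. <src> and <dst> already hold network byte order values, so
 * addresses and ports are copied as-is.
 */
size_t pp2_build_tcp4(uint8_t *out, size_t outlen,
                      const struct sockaddr_in *src,
                      const struct sockaddr_in *dst)
{
    static const uint8_t sig[12] = { 0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D,
                                     0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A };
    uint16_t len = htons(12);        /* address block length, network order */

    if (outlen < 28)
        return 0;

    memcpy(out, sig, 12);
    out[12] = 0x21;                  /* version 2, PROXY command */
    out[13] = 0x11;                  /* TCP over IPv4 */
    memcpy(out + 14, &len, 2);
    memcpy(out + 16, &src->sin_addr.s_addr, 4);
    memcpy(out + 20, &dst->sin_addr.s_addr, 4);
    memcpy(out + 24, &src->sin_port, 2);
    memcpy(out + 26, &dst->sin_port, 2);
    return 28;
}
```

The resulting bytes have to be written on the outgoing connection before any
data of the native protocol, typically in the same send() call as the first
payload bytes or just before them.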