2020/03/05                                                    Willy Tarreau
                                                        HAProxy Technologies
                               The PROXY protocol
                                 Versions 1 & 2

Abstract

   The PROXY protocol provides a convenient way to safely transport connection
   information such as a client's address across multiple layers of NAT or TCP
   proxies. It is designed to require few changes to existing components and
   to limit the performance impact caused by the processing of the transported
   information.


Revision history

   2010/10/29 - first version
   2011/03/20 - update: implementation and security considerations
   2012/06/21 - add support for binary format
   2012/11/19 - final review and fixes
   2014/05/18 - modify and extend PROXY protocol version 2
   2014/06/11 - fix example code to consider ver+cmd merge
   2014/06/14 - fix v2 header check in example code, and update Forwarded spec
   2014/07/12 - update list of implementations (add Squid)
   2015/05/02 - update list of implementations and format of the TLV add-ons
   2017/03/10 - added the checksum, noop and more SSL-related TLV types,
                reserved TLV type ranges, added TLV documentation, clarified
                string encoding. With contributions from Andriy Palamarchuk
                (Amazon.com).
   2020/03/05 - added the unique ID TLV type (Tim Düsterhus)


1. Background

Relaying TCP connections through proxies generally involves a loss of the
original TCP connection parameters such as source and destination addresses,
ports, and so on. Some protocols make it a little bit easier to transfer such
information. For SMTP, Postfix authors have proposed the XCLIENT protocol [1]
which received broad adoption and is particularly suited to mail exchanges.
For HTTP, there is the "Forwarded" extension [2], which aims at replacing the
omnipresent "X-Forwarded-For" header which carries information about the
original source address, and the less common X-Original-To which carries
information about the destination address.

However, both mechanisms require a knowledge of the underlying protocol to be
implemented in intermediaries.

Then comes a new class of products which we'll call "dumb proxies", not because
they don't do anything, but because they're processing protocol-agnostic data.
Both Stunnel[3] and Stud[4] are examples of such "dumb proxies". They talk raw
TCP on one side, and raw SSL on the other one, and do that reliably, without
any knowledge of what protocol is transported on top of the connection. Haproxy
running in pure TCP mode obviously falls into that category as well.

The problem with such a proxy when it is combined with another one such as
haproxy, is to adapt it to talk the higher level protocol. A patch is available
for Stunnel to make it capable of inserting an X-Forwarded-For header in the
first HTTP request of each incoming connection. Haproxy is able not to add
another one when the connection comes from Stunnel, so that it's possible to
hide it from the servers.

The typical architecture becomes the following one :


      +--------+      HTTP                     :80 +----------+
      | client | --------------------------------> |          |
      |        |                                   | haproxy, |
      +--------+             +---------+           | 1 or 2   |
     /        /     HTTPS    | stunnel | HTTP  :81 | listening|
    <________/    ---------> | (server | --------> | ports    |
                             |  mode)  |           |          |
                             +---------+           +----------+


The problem appears when haproxy runs with keep-alive on the side towards the
client. The Stunnel patch will only add the X-Forwarded-For header to the first
request of each connection and all subsequent requests will not have it. One
solution could be to improve the patch to make it support keep-alive and parse
all forwarded data, whether they're announced with a Content-Length or with a
Transfer-Encoding, taking care of special methods such as HEAD which announce
data without transferring them, etc... In fact, it would require implementing a
full HTTP stack in Stunnel. It would then become a lot more complex, a lot less
reliable and would no longer be the "dumb proxy" that fits every purpose.

In practice, we don't need to add a header for each request because we'll emit
the exact same information every time : the information related to the client
side connection. We could then cache that information in haproxy and use it for
every other request. But that becomes dangerous and is still limited to HTTP
only.

Another approach consists in prepending each connection with a header reporting
the characteristics of the other side's connection. This method is simpler to
implement, does not require any protocol-specific knowledge on either side, and
completely fits the purpose since what is desired precisely is to know the
other side's connection endpoints. It is easy to perform for the sender (just
send a short header once the connection is established) and to parse for the
receiver (simply perform one read() on the incoming connection to fill in
addresses after an accept). The protocol used to carry connection information
across proxies was thus called the PROXY protocol.


2. The PROXY protocol header

This document uses a few terms that are worth explaining here :
  - "connection initiator" is the party requesting a new connection
  - "connection target" is the party accepting a connection request
  - "client" is the party for which a connection was requested
  - "server" is the party to which the client desired to connect
  - "proxy" is the party intercepting and relaying the connection
    from the client to the server.
  - "sender" is the party sending data over a connection.
  - "receiver" is the party receiving data from the sender.
  - "header" or "PROXY protocol header" is the block of connection information
    the connection initiator prepends at the beginning of a connection, which
    makes it the sender from the protocol point of view.

The PROXY protocol's goal is to fill the server's internal structures with the
information collected by the proxy that the server would have been able to get
by itself if the client was connecting directly to the server instead of via a
proxy. The information carried by the protocol is what the server would get
using getsockname() and getpeername() :
  - address family (AF_INET for IPv4, AF_INET6 for IPv6, AF_UNIX)
  - socket protocol (SOCK_STREAM for TCP, SOCK_DGRAM for UDP)
  - layer 3 source and destination addresses
  - layer 4 source and destination ports if any

Unlike the XCLIENT protocol, the PROXY protocol was designed with limited
extensibility in order to help the receiver parse it very fast. Version 1 was
focused on keeping it human-readable for better debugging possibilities, which
is always desirable for early adoption when few implementations exist. Version
2 adds support for a binary encoding of the header which is much more efficient
to produce and to parse, especially when dealing with IPv6 addresses that are
expensive to emit in ASCII form and to parse.

In both cases, the protocol simply consists in an easily parsable header placed
by the connection initiator at the beginning of each connection. The protocol
is intentionally stateless in that it does not expect the sender to wait for
the receiver before sending the header, nor the receiver to send anything back.

This specification supports two header formats, a human-readable format which
is the only format supported in version 1 of the protocol, and a binary format
which is only supported in version 2. Both formats were designed to ensure that
the header cannot be confused with common higher level protocols such as HTTP,
SSL/TLS, FTP or SMTP, and that both formats are easily distinguishable from
each other for the receiver.

Version 1 senders MAY only produce the human-readable header format. Version 2
senders MAY only produce the binary header format. Version 1 receivers MUST at
least implement the human-readable header format. Version 2 receivers MUST at
least implement the binary header format, and it is recommended that they also
implement the human-readable header format for better interoperability and ease
of upgrade when facing version 1 senders.

Both formats are designed to fit in the smallest TCP segment that any TCP/IP
host is required to support (576 - 40 = 536 bytes). This ensures that the whole
header will always be delivered at once when the socket buffers are still empty
at the beginning of a connection. The sender must always ensure that the header
is sent at once, so that the transport layer maintains atomicity along the path
to the receiver. The receiver may be tolerant to partial headers or may simply
drop the connection when receiving a partial header. The recommendation is to
be tolerant, but implementation constraints may not always easily permit this.
It is important to note that nothing forces any intermediary to forward the
whole header at once, because TCP is a streaming protocol which may be
processed one byte at a time if desired, causing the header to be fragmented
when reaching the receiver. But due to the places where such a protocol is
used, the above simplification generally is acceptable because the risk of
crossing such a device handling one byte at a time is close to zero.

The receiver MUST NOT start processing the connection before it receives a
complete and valid PROXY protocol header. This is particularly important for
protocols where the receiver is expected to speak first (eg: SMTP, FTP or SSH).
The receiver may apply a short timeout and decide to abort the connection if
the protocol header is not seen within a few seconds (at least 3 seconds to
cover a TCP retransmit).

The receiver MUST be configured to only receive the protocol described in this
specification and MUST NOT try to guess whether the protocol header is present
or not. This means that the protocol explicitly prevents port sharing between
public and private access. Otherwise it would open a major security breach by
allowing untrusted parties to spoof their connection addresses. The receiver
SHOULD ensure proper access filtering so that only trusted proxies are allowed
to use this protocol.

Some proxies are smart enough to understand transported protocols and to reuse
idle server connections for multiple messages. This typically happens in HTTP
where requests from multiple clients may be sent over the same connection. Such
proxies MUST NOT implement this protocol on multiplexed connections because the
receiver would use the address advertised in the PROXY header as the address of
all forwarded requests' senders. In fact, such proxies are not dumb proxies,
and since they do have a complete understanding of the transported protocol,
they MUST use the facilities provided by that transported protocol to present
the client's address.


2.1. Human-readable header format (Version 1)

This is the format specified in version 1 of the protocol. It consists in one
line of US-ASCII text matching exactly the following block, sent immediately
and at once upon the connection establishment and prepended before any data
flowing from the sender to the receiver :

  - a string identifying the protocol : "PROXY" ( \x50 \x52 \x4F \x58 \x59 )
    Seeing this string indicates that this is version 1 of the protocol.

  - exactly one space : " " ( \x20 )

  - a string indicating the proxied INET protocol and family. As of version 1,
    only "TCP4" ( \x54 \x43 \x50 \x34 ) for TCP over IPv4, and "TCP6"
    ( \x54 \x43 \x50 \x36 ) for TCP over IPv6 are allowed. Other, unsupported,
    or unknown protocols must be reported with the name "UNKNOWN" ( \x55 \x4E
    \x4B \x4E \x4F \x57 \x4E ). For "UNKNOWN", the rest of the line before the
    CRLF may be omitted by the sender, and the receiver must ignore anything
    presented before the CRLF is found. Note that an earlier version of this
    specification suggested using this when sending health checks, but this
    causes issues with servers that reject the "UNKNOWN" keyword. Thus it is
    now recommended not to send "UNKNOWN" when the connection is expected to
    be accepted, but only when it is not possible to correctly fill the PROXY
    line.

  - exactly one space : " " ( \x20 )

  - the layer 3 source address in its canonical format. IPv4 addresses must be
    indicated as a series of exactly 4 integers in the range [0..255] inclusive
    written in decimal representation separated by exactly one dot between each
    other. Leading zeroes are not permitted in front of numbers in order to
    avoid any possible confusion with octal numbers. IPv6 addresses must be
    indicated as series of sets of 4 hexadecimal digits (upper or lower case)
    delimited by colons between each other, with the acceptance of one double
    colon sequence to replace the largest acceptable range of consecutive
    zeroes. The total number of decoded bits must exactly be 128. The
    advertised protocol family dictates what format to use.

  - exactly one space : " " ( \x20 )

  - the layer 3 destination address in its canonical format. It is the same
    format as the layer 3 source address and matches the same family.

  - exactly one space : " " ( \x20 )

  - the TCP source port represented as a decimal integer in the range
    [0..65535] inclusive. Leading zeroes are not permitted in front of numbers
    in order to avoid any possible confusion with octal numbers.

  - exactly one space : " " ( \x20 )

  - the TCP destination port represented as a decimal integer in the range
    [0..65535] inclusive. Leading zeroes are not permitted in front of numbers
    in order to avoid any possible confusion with octal numbers.

  - the CRLF sequence ( \x0D \x0A )


The maximum line lengths the receiver must support including the CRLF are :
  - TCP/IPv4 :
      "PROXY TCP4 255.255.255.255 255.255.255.255 65535 65535\r\n"
    => 5 + 1 + 4 + 1 + 15 + 1 + 15 + 1 + 5 + 1 + 5 + 2 = 56 chars

  - TCP/IPv6 :
      "PROXY TCP6 ffff:f...f:ffff ffff:f...f:ffff 65535 65535\r\n"
    => 5 + 1 + 4 + 1 + 39 + 1 + 39 + 1 + 5 + 1 + 5 + 2 = 104 chars

  - unknown connection (short form) :
      "PROXY UNKNOWN\r\n"
    => 5 + 1 + 7 + 2 = 15 chars

  - worst case (optional fields set to 0xff) :
      "PROXY UNKNOWN ffff:f...f:ffff ffff:f...f:ffff 65535 65535\r\n"
    => 5 + 1 + 7 + 1 + 39 + 1 + 39 + 1 + 5 + 1 + 5 + 2 = 107 chars

So a 108-byte buffer is always enough to store all the line and a trailing zero
for string processing.

The receiver must wait for the CRLF sequence before starting to decode the
addresses in order to ensure they are complete and properly parsed. If the CRLF
sequence is not found in the first 107 characters, the receiver should declare
the line invalid. A receiver may reject an incomplete line which does not
contain the CRLF sequence in the first atomic read operation. The receiver must
not tolerate a single CR or LF character to end the line when a complete CRLF
sequence is expected.
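
The CRLF rule above can be sketched as a small helper. This is only an
illustration under stated assumptions: the function name pp_v1_line_len() and
its return convention are hypothetical and not part of this specification. It
only locates the end of the line; field validation is left to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: locate the end of a version 1 header line.
 * Returns the line length including the CRLF, 0 if the CRLF has not been
 * seen yet and more data may still arrive, or -1 if the line is invalid
 * (bare LF, or no CRLF within the first 107 characters). */
int pp_v1_line_len(const char *buf, size_t len)
{
    size_t max = len < 107 ? len : 107;
    const char *lf = memchr(buf, '\n', max);

    if (!lf)
        return len >= 107 ? -1 : 0;   /* too long, or still incomplete */
    if (lf == buf || lf[-1] != '\r')
        return -1;                    /* a single LF must not end the line */
    return (int)(lf - buf + 1);
}
```

A receiver that prefers to reject partial headers can simply treat a return
value of 0 as an error after its first read.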

Any sequence which does not exactly match the protocol must be discarded and
cause the receiver to abort the connection. It is recommended to abort the
connection as soon as possible so that the sender gets a chance to notice the
anomaly and log it.

If the announced transport protocol is "UNKNOWN", then the receiver knows that
the sender speaks the correct PROXY protocol with the appropriate version, and
SHOULD accept the connection and use the real connection's parameters as if
there were no PROXY protocol header on the wire. However, senders SHOULD NOT
use the "UNKNOWN" protocol when they are the initiators of outgoing connections
because some receivers may reject them. When a load balancing proxy has to send
health checks to a server, it SHOULD build a valid PROXY line which it will
fill with a getsockname()/getpeername() pair indicating the addresses used. It
is important to understand that doing so is not appropriate when some source
address translation is performed between the sender and the receiver.

An example of such a line before an HTTP request would look like this (CR
marked as "\r" and LF marked as "\n") :

    PROXY TCP4 192.168.0.1 192.168.0.11 56324 443\r\n
    GET / HTTP/1.1\r\n
    Host: 192.168.0.11\r\n
    \r\n

For the sender, the header line is easy to put into the output buffers once the
connection is established. Note that since the line is always shorter than an
MSS, the sender is guaranteed to always be able to emit it at once and should
not even bother handling partial sends. For the receiver, once the header is
parsed, it is easy to skip it from the input buffers. Please consult section 9
for implementation suggestions.
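
On the sender side, building the line can be as simple as one snprintf() call.
The sketch below is illustrative only: the function name pp_v1_format_tcp4()
is hypothetical, and it assumes the caller already holds both addresses in
canonical dotted-decimal form.

```c
#include <stdio.h>

/* Hypothetical helper: format a version 1 "TCP4" header line into buf.
 * src and dst are assumed to already be canonical IPv4 strings.
 * Returns the line length (without the trailing zero), or -1 on error. */
int pp_v1_format_tcp4(char *buf, size_t size,
                      const char *src, const char *dst,
                      unsigned int src_port, unsigned int dst_port)
{
    int n;

    if (src_port > 65535 || dst_port > 65535)
        return -1;

    /* %u never emits leading zeroes, as required for the port fields */
    n = snprintf(buf, size, "PROXY TCP4 %s %s %u %u\r\n",
                 src, dst, src_port, dst_port);
    if (n < 0 || (size_t)n >= size)
        return -1;                    /* output error or line truncated */
    return n;
}
```

Called with a 108-byte buffer and the addresses from the example above, this
produces the 47-byte line "PROXY TCP4 192.168.0.1 192.168.0.11 56324 443\r\n".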


2.2. Binary header format (version 2)

Producing human-readable IPv6 addresses and parsing them is very inefficient,
due to the multiple possible representation formats and the handling of compact
address format. It was also not possible to specify address families outside
IPv4/IPv6 nor non-TCP protocols. Another drawback of the human-readable format
is the fact that implementations need to parse all characters to find the
trailing CRLF, which makes it harder to read only the exact bytes count. Last,
the UNKNOWN address type has not always been accepted by servers as a valid
protocol because of its imprecise meaning.

Version 2 of the protocol thus introduces a new binary format which remains
distinguishable from version 1 and from other commonly used protocols. It was
specially designed in order to be incompatible with a wide range of protocols
and to be rejected by a number of common implementations of these protocols
when unexpectedly presented (please see section 7). Also for better processing
efficiency, IPv4 and IPv6 addresses are respectively aligned on 4-byte and
16-byte boundaries.

The binary header format starts with a constant 12 bytes block containing the
protocol signature :

   \x0D \x0A \x0D \x0A \x00 \x0D \x0A \x51 \x55 \x49 \x54 \x0A

Note that this block contains a null byte at the 5th position, so it must not
be handled as a null-terminated string.

The next byte (the 13th one) is the protocol version and command.

The highest four bits contain the version. As of this specification, it must
always be sent as \x2 and the receiver must only accept this value.

The lowest four bits represent the command :
  - \x0 : LOCAL : the connection was established on purpose by the proxy
    without being relayed. The connection endpoints are the sender and the
    receiver. Such connections exist when the proxy sends health-checks to the
    server. The receiver must accept this connection as valid and must use the
    real connection endpoints and discard the protocol block including the
    family which is ignored.

  - \x1 : PROXY : the connection was established on behalf of another node,
    and reflects the original connection endpoints. The receiver must then use
    the information provided in the protocol block to get the original address.

  - other values are unassigned and must not be emitted by senders. Receivers
    must drop connections presenting unexpected values here.
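
As an illustration of the nibble layout described above, the 13th byte could
be validated as follows. The names pp2_parse_ver_cmd(), PP2_CMD_LOCAL and
PP2_CMD_PROXY are hypothetical and only serve this sketch.

```c
#include <stdint.h>

/* Hypothetical names for the two commands assigned by this specification */
enum { PP2_CMD_LOCAL = 0x0, PP2_CMD_PROXY = 0x1 };

/* Split the 13th byte into version (high nibble) and command (low nibble).
 * Returns the command, or -1 if the connection must be dropped. */
int pp2_parse_ver_cmd(uint8_t ver_cmd)
{
    if ((ver_cmd >> 4) != 0x2)
        return -1;                    /* only version 2 is accepted */

    switch (ver_cmd & 0x0F) {
    case PP2_CMD_LOCAL:
    case PP2_CMD_PROXY:
        return ver_cmd & 0x0F;
    default:
        return -1;                    /* unassigned command */
    }
}
```

With this layout, a version 2 PROXY command is the byte \x21 and a version 2
LOCAL command is the byte \x20.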

The 14th byte contains the transport protocol and address family. The highest 4
bits contain the address family, the lowest 4 bits contain the protocol.

The address family maps to the original socket family without necessarily
matching the values internally used by the system. It may be one of :

  - 0x0 : AF_UNSPEC : the connection is forwarded for an unknown, unspecified
    or unsupported protocol. The sender should use this family when sending
    LOCAL commands or when dealing with unsupported protocol families. The
    receiver is free to accept the connection anyway and use the real endpoint
    addresses or to reject it. The receiver should ignore address information.

  - 0x1 : AF_INET : the forwarded connection uses the AF_INET address family
    (IPv4). The addresses are exactly 4 bytes each in network byte order,
    followed by transport protocol information (typically ports).

  - 0x2 : AF_INET6 : the forwarded connection uses the AF_INET6 address family
    (IPv6). The addresses are exactly 16 bytes each in network byte order,
    followed by transport protocol information (typically ports).

  - 0x3 : AF_UNIX : the forwarded connection uses the AF_UNIX address family
    (UNIX). The addresses are exactly 108 bytes each.

  - other values are unspecified and must not be emitted in version 2 of this
    protocol and must be rejected as invalid by receivers.

The transport protocol is specified in the lowest 4 bits of the 14th byte :

  - 0x0 : UNSPEC : the connection is forwarded for an unknown, unspecified
    or unsupported protocol. The sender should use this family when sending
    LOCAL commands or when dealing with unsupported protocol families. The
    receiver is free to accept the connection anyway and use the real endpoint
    addresses or to reject it. The receiver should ignore address information.

  - 0x1 : STREAM : the forwarded connection uses a SOCK_STREAM protocol (eg:
    TCP or UNIX_STREAM). When used with AF_INET/AF_INET6 (TCP), the addresses
    are followed by the source and destination ports represented on 2 bytes
    each in network byte order.

  - 0x2 : DGRAM : the forwarded connection uses a SOCK_DGRAM protocol (eg:
    UDP or UNIX_DGRAM). When used with AF_INET/AF_INET6 (UDP), the addresses
    are followed by the source and destination ports represented on 2 bytes
    each in network byte order.

  - other values are unspecified and must not be emitted in version 2 of this
    protocol and must be rejected as invalid by receivers.

In practice, the following protocol bytes are expected :

  - \x00 : UNSPEC : the connection is forwarded for an unknown, unspecified
    or unsupported protocol. The sender should use this family when sending
    LOCAL commands or when dealing with unsupported protocol families. When
    used with a LOCAL command, the receiver must accept the connection and
    ignore any address information. For other commands, the receiver is free
    to accept the connection anyway and use the real endpoints addresses or to
    reject the connection. The receiver should ignore address information.

  - \x11 : TCP over IPv4 : the forwarded connection uses TCP over the AF_INET
    protocol family. Address length is 2*4 + 2*2 = 12 bytes.

  - \x12 : UDP over IPv4 : the forwarded connection uses UDP over the AF_INET
    protocol family. Address length is 2*4 + 2*2 = 12 bytes.

  - \x21 : TCP over IPv6 : the forwarded connection uses TCP over the AF_INET6
    protocol family. Address length is 2*16 + 2*2 = 36 bytes.

  - \x22 : UDP over IPv6 : the forwarded connection uses UDP over the AF_INET6
    protocol family. Address length is 2*16 + 2*2 = 36 bytes.

  - \x31 : UNIX stream : the forwarded connection uses SOCK_STREAM over the
    AF_UNIX protocol family. Address length is 2*108 = 216 bytes.

  - \x32 : UNIX datagram : the forwarded connection uses SOCK_DGRAM over the
    AF_UNIX protocol family. Address length is 2*108 = 216 bytes.
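
The list of expected protocol bytes maps naturally onto a lookup of the
address block size. The helper below is a sketch only: pp2_addr_len() is a
hypothetical name, and returning -1 for every byte outside the list above is
an assumption of this example rather than a rule of the specification.

```c
#include <stdint.h>

/* Hypothetical helper: size in bytes of the address block implied by the
 * 14th byte (address family in the high nibble, protocol in the low one).
 * Returns -1 for combinations not listed among the expected bytes. */
int pp2_addr_len(uint8_t fam)
{
    switch (fam) {
    case 0x00:                        /* UNSPEC: no address to decode */
        return 0;
    case 0x11:                        /* TCP over IPv4 */
    case 0x12:                        /* UDP over IPv4 */
        return 2 * 4 + 2 * 2;
    case 0x21:                        /* TCP over IPv6 */
    case 0x22:                        /* UDP over IPv6 */
        return 2 * 16 + 2 * 2;
    case 0x31:                        /* UNIX stream */
    case 0x32:                        /* UNIX datagram */
        return 2 * 108;
    default:
        return -1;
    }
}
```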


Only the UNSPEC protocol byte (\x00) is mandatory to implement on the receiver.
A receiver is not required to implement other ones, provided that it
automatically falls back to the UNSPEC mode for the valid combinations above
that it does not support.

The 15th and 16th bytes are the address length in bytes in network byte order.
It is used so that the receiver knows how many address bytes to skip even when
it does not implement the presented protocol. Thus the length of the protocol
header in bytes is always exactly 16 + this value. When a sender presents a
LOCAL connection, it should not present any address so it sets this field to
zero. Receivers MUST always consider this field to skip the appropriate number
of bytes and must not assume zero is presented for LOCAL connections. When a
receiver accepts an incoming connection showing an UNSPEC address family or
protocol, it may or may not decide to log the address information if present.
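
The "16 + this value" rule can be illustrated as follows. The function name
pp2_total_len() is hypothetical. The length is decoded byte by byte so that
the sketch does not depend on the host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: total size of a version 2 header, i.e. the 16 fixed
 * bytes plus the big-endian length stored in the 15th and 16th bytes.
 * Returns 0 when fewer than 16 bytes are available. */
size_t pp2_total_len(const uint8_t *buf, size_t avail)
{
    if (avail < 16)
        return 0;
    return 16 + (((size_t)buf[14] << 8) | buf[15]);
}
```

A receiver would typically wait until at least that many bytes are available
before decoding the addresses, and then skip exactly that many bytes, even for
LOCAL connections.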

So the 16-byte version 2 header can be described this way :

    struct proxy_hdr_v2 {
        uint8_t sig[12];  /* hex 0D 0A 0D 0A 00 0D 0A 51 55 49 54 0A */
        uint8_t ver_cmd;  /* protocol version and command */
        uint8_t fam;      /* address family and transport protocol */
        uint16_t len;     /* number of following bytes part of the header */
    };

Starting from the 17th byte, addresses are presented in network byte order.
The address order is always the same :
  - source layer 3 address in network byte order
  - destination layer 3 address in network byte order
  - source layer 4 address if any, in network byte order (port)
  - destination layer 4 address if any, in network byte order (port)

The address block may directly be sent from or received into the following
union which makes it easy to cast from/to the relevant socket native structs
depending on the address type :

    union proxy_addr {
        struct {        /* for TCP/UDP over IPv4, len = 12 */
            uint32_t src_addr;
            uint32_t dst_addr;
            uint16_t src_port;
            uint16_t dst_port;
        } ipv4_addr;
        struct {        /* for TCP/UDP over IPv6, len = 36 */
             uint8_t  src_addr[16];
             uint8_t  dst_addr[16];
             uint16_t src_port;
             uint16_t dst_port;
        } ipv6_addr;
        struct {        /* for AF_UNIX sockets, len = 216 */
             uint8_t src_addr[108];
             uint8_t dst_addr[108];
        } unix_addr;
    };

The sender must ensure that all the protocol header is sent at once. This block
is always smaller than an MSS, so there is no reason for it to be segmented at
the beginning of the connection. The receiver should also process the header
at once. The receiver must not start to parse an address before the whole
address block is received. The receiver must also reject incoming connections
containing partial protocol headers.

A receiver may be configured to support both version 1 and version 2 of the
protocol. Identifying the protocol version is easy :

  - if the incoming byte count is 16 or above, the first 12 bytes match the
    protocol signature block below, and the highest four bits of the 13th
    byte carry protocol version 2 (the byte is \x20 or \x21), then the
    protocol must be parsed as version 2 :

       \x0D\x0A\x0D\x0A\x00\x0D\x0A\x51\x55\x49\x54\x0A

  - otherwise, if the incoming byte count is 8 or above, and the first 5
    characters match the US-ASCII representation of "PROXY" then the protocol
    must be parsed as version 1 :

       \x50\x52\x4F\x58\x59

  - otherwise the protocol is not covered by this specification and the
    connection must be dropped.
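
The detection steps above can be sketched as shown below. The names
pp_detect_version() and pp2_sig are hypothetical. Since the version is carried
in the highest four bits of the 13th byte, this sketch tests that nibble
instead of comparing the whole byte.

```c
#include <stddef.h>
#include <string.h>

/* The 12-byte version 2 signature. It contains a null byte, so it is
 * compared with memcmp() and never handled as a C string. */
static const unsigned char pp2_sig[12] = {
    0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A
};

/* Hypothetical helper: returns 2 for a version 2 header, 1 for a version 1
 * header, and -1 when neither matches (the connection must be dropped). */
int pp_detect_version(const unsigned char *buf, size_t len)
{
    if (len >= 16 && memcmp(buf, pp2_sig, sizeof(pp2_sig)) == 0 &&
        (buf[12] >> 4) == 0x2)
        return 2;
    if (len >= 8 && memcmp(buf, "PROXY", 5) == 0)
        return 1;
    return -1;
}
```

A tolerant receiver may prefer to wait for more data instead of dropping the
connection when fewer than 8 bytes have been received so far.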

If the length specified in the PROXY protocol header indicates that additional
bytes are part of the header beyond the address information, a receiver may
choose to skip over and ignore those bytes, or attempt to interpret those
bytes.

The information in those bytes will be arranged in Type-Length-Value (TLV)
vectors in the following format. The first byte is the Type of the vector.
The next two bytes represent the length in bytes of the value (not including
the Type and Length bytes), and following the length field is the number of
bytes specified by the length.

        struct pp2_tlv {
            uint8_t type;
            uint8_t length_hi;
            uint8_t length_lo;
            uint8_t value[0];
        };

A receiver may choose to skip over and ignore the TLVs it is not interested in
or it does not understand. Senders can generate the TLVs only for
the information they choose to publish.
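
Skipping over unknown TLVs can be sketched as a simple walk over the bytes
that follow the address block. The function name pp2_tlv_find() is
hypothetical, and treating a truncated vector as "not found" is a choice made
for this example only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: find the first TLV of the wanted type in a buffer
 * holding only TLV vectors. On success, returns a pointer to the value and
 * stores its length in *vlen. Returns NULL if the type is absent or if a
 * vector is truncated. */
const uint8_t *pp2_tlv_find(const uint8_t *buf, size_t len,
                            uint8_t type, size_t *vlen)
{
    size_t pos = 0;

    while (len - pos >= 3) {
        /* 1 byte of type, then a 2-byte big-endian value length */
        size_t l = ((size_t)buf[pos + 1] << 8) | buf[pos + 2];

        if (l > len - pos - 3)
            return NULL;              /* value runs past the buffer */
        if (buf[pos] == type) {
            *vlen = l;
            return buf + pos + 3;
        }
        pos += 3 + l;                 /* skip a TLV we do not care about */
    }
    return NULL;
}
```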

The following types have already been registered for the <type> field :

        #define PP2_TYPE_ALPN           0x01
        #define PP2_TYPE_AUTHORITY      0x02
        #define PP2_TYPE_CRC32C         0x03
        #define PP2_TYPE_NOOP           0x04
        #define PP2_TYPE_UNIQUE_ID      0x05
        #define PP2_TYPE_SSL            0x20
        #define PP2_SUBTYPE_SSL_VERSION 0x21
        #define PP2_SUBTYPE_SSL_CN      0x22
        #define PP2_SUBTYPE_SSL_CIPHER  0x23
        #define PP2_SUBTYPE_SSL_SIG_ALG 0x24
        #define PP2_SUBTYPE_SSL_KEY_ALG 0x25
        #define PP2_TYPE_NETNS          0x30


2.2.1. PP2_TYPE_ALPN

Application-Layer Protocol Negotiation (ALPN). It is a byte sequence defining
the upper layer protocol in use over the connection. The most common use case
will be to pass the exact copy of the ALPN extension of the Transport Layer
Security (TLS) protocol as defined by RFC7301 [9].


2.2.2. PP2_TYPE_AUTHORITY

Contains the host name value passed by the client, as a UTF8-encoded string.
In case of TLS being used on the client connection, this is the exact copy of
the "server_name" extension as defined by RFC3546 [10], section 3.1, often
referred to as "SNI". There are probably other situations where an authority
can be mentioned on a connection without TLS being involved at all.


2.2.3. PP2_TYPE_CRC32C

The value of the type PP2_TYPE_CRC32C is a 32-bit number storing the CRC32c
checksum of the PROXY protocol header.

When the checksum is supported by the sender, after constructing the header
the sender MUST:

  - initialize the checksum field to '0's.

  - calculate the CRC32c checksum of the PROXY header as described in RFC4960,
    Appendix B [8].

  - put the resultant value into the checksum field, and leave the rest of
    the bits unchanged.

If the checksum is provided as part of the PROXY header and the checksum
functionality is supported by the receiver, the receiver MUST:

  - store the received CRC32c checksum value aside.

  - replace the 32 bits of the checksum field in the received PROXY header with
    all '0's and calculate a CRC32c checksum value of the whole PROXY header.

  - verify that the calculated CRC32c checksum is the same as the received
    CRC32c checksum. If it is not, the receiver MUST treat the TCP connection
    providing the header as invalid.

The default procedure for handling an invalid TCP connection is to abort it.
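
The receiver steps above can be sketched as follows. This is only a minimal
illustration: the function names crc32c and pp2_crc32c_ok are hypothetical, the
checksum is computed bit by bit rather than with a table, and the 4 bytes of
the checksum value are assumed to be stored most significant byte first, which
the text above does not spell out. <crc_off> is the offset of the checksum
value inside the header, as found while walking the TLVs.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78. */
uint32_t crc32c(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        crc ^= data[i];
        for (bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Returns 1 if the checksum stored at hdr[crc_off..crc_off+3] matches the
 * header, 0 otherwise. The checksum bytes are left zeroed on return.
 */
int pp2_crc32c_ok(uint8_t *hdr, size_t hdr_len, size_t crc_off)
{
    uint32_t received;

    if (hdr_len < 4 || crc_off > hdr_len - 4)
        return 0;

    /* store the received value aside (byte order is an assumption) */
    received = ((uint32_t)hdr[crc_off] << 24) |
               ((uint32_t)hdr[crc_off + 1] << 16) |
               ((uint32_t)hdr[crc_off + 2] << 8) |
               (uint32_t)hdr[crc_off + 3];

    /* replace the checksum field with zeroes, then checksum everything */
    hdr[crc_off] = hdr[crc_off + 1] = hdr[crc_off + 2] = hdr[crc_off + 3] = 0;

    return crc32c(hdr, hdr_len) == received;
}
```

A sender would follow the mirror procedure: zero the field, compute the
checksum over the complete header, then write the result into the field.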


2.2.4. PP2_TYPE_NOOP

The TLV of this type should be ignored when parsed. The value is zero or more
bytes. It can be used for data padding or alignment. Note that it can only pad
by 3 or more bytes, because a TLV cannot be smaller than that.


2.2.5. PP2_TYPE_UNIQUE_ID

The value of the type PP2_TYPE_UNIQUE_ID is an opaque byte sequence of up to
128 bytes generated by the upstream proxy that uniquely identifies the
connection.

The unique ID can be used to easily correlate connections across multiple
layers of proxies, without needing to look up IP addresses and port numbers.


2.2.6. The PP2_TYPE_SSL type and subtypes

For the type PP2_TYPE_SSL, the value is itself a structure defined like this :

        struct pp2_tlv_ssl {
            uint8_t  client;
            uint32_t verify;
            struct pp2_tlv sub_tlv[0];
        };

The <verify> field will be zero if the client presented a certificate
and it was successfully verified, and non-zero otherwise.

The <client> field is a bit field made of the following values,
indicating which element is present :

        #define PP2_CLIENT_SSL           0x01
        #define PP2_CLIENT_CERT_CONN     0x02
        #define PP2_CLIENT_CERT_SESS     0x04

Note that each of these elements may lead to extra data being appended to
this TLV using a second level of TLV encapsulation. It is thus possible to
find multiple TLV values after this field. The total length of the pp2_tlv_ssl
TLV will reflect this.

The PP2_CLIENT_SSL flag indicates that the client connected over SSL/TLS. When
this flag is set, the US-ASCII string representation of the TLS version is
appended at the end of the field in the TLV format using the type
PP2_SUBTYPE_SSL_VERSION.

PP2_CLIENT_CERT_CONN indicates that the client provided a certificate over the
current connection. PP2_CLIENT_CERT_SESS indicates that the client provided a
certificate at least once over the TLS session this connection belongs to.

The second level TLV PP2_SUBTYPE_SSL_CIPHER provides the US-ASCII string name
of the used cipher, for example "ECDHE-RSA-AES128-GCM-SHA256".

The second level TLV PP2_SUBTYPE_SSL_SIG_ALG provides the US-ASCII string name
of the algorithm used to sign the certificate presented by the frontend when
the incoming connection was made over an SSL/TLS transport layer, for example
"SHA256".

The second level TLV PP2_SUBTYPE_SSL_KEY_ALG provides the US-ASCII string name
of the algorithm used to generate the key of the certificate presented by the
frontend when the incoming connection was made over an SSL/TLS transport layer,
for example "RSA2048".

In all cases, the string representation (in UTF8) of the Common Name field
(OID: 2.5.4.3) of the client certificate's Distinguished Name is appended
using the TLV format and the type PP2_SUBTYPE_SSL_CN. E.g. "example.com".
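
The sketch below shows one possible way to split the value of a PP2_TYPE_SSL
TLV into its fixed part and its second-level TLVs. It assumes that the fields
are packed on the wire with no padding (1 byte of <client> followed by 4 bytes
of <verify>), so it reads bytes individually instead of casting to the
structure. The type ssl_info and the function pp2_parse_ssl are hypothetical
names used for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

#define PP2_CLIENT_SSL           0x01
#define PP2_CLIENT_CERT_CONN     0x02
#define PP2_CLIENT_CERT_SESS     0x04

/* Hypothetical result holder, not part of the specification. */
struct ssl_info {
    uint8_t client;            /* PP2_CLIENT_* bit field */
    int verified;              /* 1 if <verify> is zero, 0 otherwise */
    const uint8_t *sub_tlvs;   /* start of the second-level TLVs */
    size_t sub_len;            /* size in bytes of the second-level TLVs */
};

/* Parses the value of a PP2_TYPE_SSL TLV. Returns 0 on success, -1 if the
 * value is too short to hold <client> and <verify>.
 */
int pp2_parse_ssl(const uint8_t *val, size_t len, struct ssl_info *out)
{
    if (len < 5)
        return -1;

    out->client = val[0];
    /* testing for zero does not depend on the byte order of <verify> */
    out->verified = (val[1] | val[2] | val[3] | val[4]) == 0;
    out->sub_tlvs = val + 5;
    out->sub_len = len - 5;
    return 0;
}
```

The second-level TLVs found at <sub_tlvs> use the same Type-Length-Value
layout as the first level, so the same walking logic applies to them.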


2.2.7. The PP2_TYPE_NETNS type

The type PP2_TYPE_NETNS defines the value as the US-ASCII string representation
of the namespace's name.


2.2.8. Reserved type ranges

The following range of 16 type values is reserved for application-specific
data and will never be used by the PROXY Protocol. If you need more values
consider extending the range with a type field in your TLVs.

        #define PP2_TYPE_MIN_CUSTOM    0xE0
        #define PP2_TYPE_MAX_CUSTOM    0xEF

This range of 8 values is reserved for temporary experimental use by
application developers and protocol designers. The values from the range will
never be used by the PROXY protocol and should not be used by production
functionality.

        #define PP2_TYPE_MIN_EXPERIMENT 0xF0
        #define PP2_TYPE_MAX_EXPERIMENT 0xF7

The following range of 8 values is reserved for future use, potentially to
extend the protocol with multibyte type values.

        #define PP2_TYPE_MIN_FUTURE    0xF8
        #define PP2_TYPE_MAX_FUTURE    0xFF
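
A receiver may want to tell these ranges apart, for instance to log
experimental types differently. The sketch below is one way to do it; the enum
pp2_type_class and the function pp2_classify_type are hypothetical names, and
every value outside the three reserved ranges is simply reported as belonging
to the space managed by the PROXY protocol.

```c
#include <stdint.h>

#define PP2_TYPE_MIN_CUSTOM     0xE0
#define PP2_TYPE_MAX_CUSTOM     0xEF
#define PP2_TYPE_MIN_EXPERIMENT 0xF0
#define PP2_TYPE_MAX_EXPERIMENT 0xF7
#define PP2_TYPE_MIN_FUTURE     0xF8
#define PP2_TYPE_MAX_FUTURE     0xFF

/* Hypothetical classification, for illustration only. */
enum pp2_type_class {
    PP2_CLASS_STANDARD,    /* registered or not yet assigned by the protocol */
    PP2_CLASS_CUSTOM,      /* application-specific data */
    PP2_CLASS_EXPERIMENT,  /* temporary experimental use */
    PP2_CLASS_FUTURE       /* reserved for future use */
};

enum pp2_type_class pp2_classify_type(uint8_t type)
{
    /* PP2_TYPE_MAX_FUTURE is 0xFF, so no upper bound check is needed */
    if (type >= PP2_TYPE_MIN_FUTURE)
        return PP2_CLASS_FUTURE;
    if (type >= PP2_TYPE_MIN_EXPERIMENT)
        return PP2_CLASS_EXPERIMENT;
    if (type >= PP2_TYPE_MIN_CUSTOM)
        return PP2_CLASS_CUSTOM;
    return PP2_CLASS_STANDARD;
}
```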


3. Implementations

Haproxy 1.5 implements version 1 of the PROXY protocol on both sides :
  - the listening sockets accept the protocol when the "accept-proxy" setting
    is passed to the "bind" keyword. Connections accepted on such listeners
    will behave just as if the source really was the one advertised in the
    protocol. This is true for logging, ACLs, content filtering, transparent
    proxying, etc...

  - the protocol may be used to connect to servers if the "send-proxy" setting
    is present on the "server" line. It is enabled on a per-server basis, so it
    is possible to have it enabled for remote servers only and still have local
    ones behave differently. If the incoming connection was accepted with
    "accept-proxy", then the relayed information is the one advertised in this
    connection's PROXY line.

  - Haproxy 1.5 also implements version 2 of the PROXY protocol as a sender. In
    addition, a TLV with limited, optional, SSL information has been added.

Stunnel added support for version 1 of the protocol for outgoing connections in
version 4.45.

Stud added support for version 1 of the protocol for outgoing connections on
2011/06/29.

Postfix added support for version 1 of the protocol for incoming connections
in smtpd and postscreen in version 2.10.

A patch is available for Stud [5] to implement version 1 of the protocol on
incoming connections.

Support for versions 1 and 2 of the protocol was added to Varnish 4.1 [6].

Exim added support for version 1 and version 2 of the protocol for incoming
connections on 2014/05/13; this support will be released as part of version
4.83.

Squid added support for versions 1 and 2 of the protocol in version 3.5 [7].

Jetty 9.3.0 supports protocol version 1.

lighttpd added support for versions 1 and 2 of the protocol for incoming
connections in version 1.4.46 [11].

The protocol is simple enough that it is expected that other implementations
will appear, especially in environments such as SMTP, IMAP, FTP, RDP where the
client's address is an important piece of information for the server and some
intermediaries. In fact, several proprietary deployments have already done so
on FTP and SMTP servers.

Proxy developers are encouraged to implement this protocol, because it will
make their products much more transparent in complex infrastructures, and will
get rid of a number of issues related to logging and access control.


4. Architectural benefits
4.1. Multiple layers

Using the PROXY protocol instead of transparent proxy provides several benefits
in multiple-layer infrastructures. The first immediate benefit is that it
becomes possible to chain multiple layers of proxies and always present the
original IP address. For instance, let's consider the following 2-layer proxy
architecture :

         Internet
          ,---.                     | client to PX1:
         (  X  )                    | native protocol
          `---'                     |
            |                       V
         +--+--+      +-----+
         | FW1 |------| PX1 |
         +--+--+      +-----+   | PX1 to PX2: PROXY + native
            |                   V
         +--+--+      +-----+
         | FW2 |------| PX2 |
         +--+--+      +-----+   | PX2 to SRV: PROXY + native
            |                   V
         +--+--+
         | SRV |
         +-----+

Firewall FW1 receives traffic from internet-based clients and forwards it to
reverse-proxy PX1. PX1 adds a PROXY header then forwards to PX2 via FW2. PX2
is configured to read the PROXY header and to emit it on output. It then
connects to the origin server SRV and presents the original client's address
there. Since all TCP connection endpoints are real machines and are not
spoofed, there is no issue for the return traffic to pass via the firewalls and
reverse proxies. Using transparent proxy, this would be quite difficult because
the firewalls would have to deal with the client's address coming from the
proxies in the DMZ and would have to correctly route the return traffic there
instead of using the default route.


4.2. IPv4 and IPv6 integration

The protocol also eases IPv4 and IPv6 integration : if only the first layer
(FW1 and PX1) is IPv6-capable, it is still possible to present the original
client's IPv6 address to the target server even though the whole chain is only
connected via IPv4.


4.3. Multiple return paths

When transparent proxy is used, it is not possible to run multiple proxies
because the return traffic would follow the default route instead of finding
the proper proxy. Some tricks are sometimes possible using multiple server
addresses and policy routing but these are very limited.

Using the PROXY protocol, this problem disappears as the servers don't need
to route to the client, just to the proxy that forwarded the connection. So
it is perfectly possible to run a proxy farm in front of a very large server
farm and have it working effortlessly, even when dealing with multiple sites.

This is particularly important in Cloud-like environments where there is little
choice of binding to random addresses and where the lower processing power per
node generally requires multiple front nodes.

The example below illustrates the following case : virtualized infrastructures
are deployed in 3 datacenters (DC1..DC3). Each DC uses its own VIP which is
handled by the hosting provider's layer 3 load balancer. This load balancer
routes the traffic to a farm of layer 7 SSL/cache offloaders which load balance
among their local servers. The VIPs are advertised by geolocalised DNS so that
clients generally stick to a given DC. Since clients are not guaranteed to
stick to one DC, the L7 load balancing proxies have to know the other DCs'
servers that may be reached via the hosting provider's LAN or via the internet.
The L7 proxies use the PROXY protocol to connect to the servers behind them, so
that even inter-DC traffic can forward the original client's address and the
return path is unambiguous. This would not be possible using transparent proxy
because most often the L7 proxies would not be able to spoof an address, and
this would never work between datacenters.

                           Internet

         DC1                  DC2                  DC3
        ,---.                ,---.                ,---.
       (  X  )              (  X  )              (  X  )
        `---'                `---'                `---'
          |    +-------+       |    +-------+       |    +-------+
          +----| L3 LB |       +----| L3 LB |       +----| L3 LB |
          |    +-------+       |    +-------+       |    +-------+
    ------+------- ~ ~ ~ ------+------- ~ ~ ~ ------+-------
    |||||    ||||        |||||    ||||        |||||    ||||
    50 SRV   4 PX        50 SRV   4 PX        50 SRV   4 PX


5. Security considerations

Version 1 of the protocol header (the human-readable format) was designed so as
to be distinguishable from HTTP. It will not parse as a valid HTTP request and
an HTTP request will not parse as a valid proxy request. Version 2 had to use a
non-parsable binary signature to make many products fail on this block. The
signature was designed to cause immediate failure on HTTP, SSL/TLS, SMTP, FTP,
and POP. It also causes aborts on LDAP and RDP servers (see section 6). That
makes it easier to enforce its use on certain connections and at the same
time, it ensures that improperly configured servers are quickly detected.

Implementers should be very careful about not trying to automatically detect
whether they have to decode the header or not, but rather they must only rely
on a configuration parameter. Indeed, if the opportunity is left to a normal
client to use the protocol, it will be able to hide its activities or make them
appear as coming from somewhere else. However, accepting the header only from a
number of known sources should be safe.
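
As an illustration of the last point, the sketch below decides from
configuration alone whether a peer is allowed to send a PROXY header. The
list trusted_proxies, its example addresses and the function name
peer_may_send_proxy_header are hypothetical; a real implementation would load
the list from its own configuration and would likely support IPv6 and network
prefixes as well.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical configuration: proxies allowed to send a PROXY header.
 * The addresses are documentation examples, not real deployments.
 */
static const char *trusted_proxies[] = { "192.0.2.10", "192.0.2.11" };

/* Returns 1 if the header must be expected and decoded for this peer,
 * 0 if the connection must be handled without looking for a header.
 */
int peer_may_send_proxy_header(const struct sockaddr_in *peer)
{
    size_t i;

    for (i = 0; i < sizeof(trusted_proxies) / sizeof(trusted_proxies[0]); i++) {
        struct in_addr allowed;

        if (inet_pton(AF_INET, trusted_proxies[i], &allowed) == 1 &&
            allowed.s_addr == peer->sin_addr.s_addr)
            return 1;
    }
    return 0;
}
```

The decision is taken before reading any byte from the connection, so the
content sent by an untrusted client never influences it.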


6. Validation

The version 2 protocol signature has been sent to a wide variety of protocols
and implementations including old ones. The following protocols and products
have been tested to ensure the best possible behavior when the signature was
presented, even with minimal implementations :

  - HTTP :
      - Apache 1.3.33    : connection abort        => pass/optimal
      - Nginx 0.7.69     : 400 Bad Request + abort => pass/optimal
      - lighttpd 1.4.20  : 400 Bad Request + abort => pass/optimal
      - thttpd 2.20c     : 400 Bad Request + abort => pass/optimal
      - mini-httpd-1.19  : 400 Bad Request + abort => pass/optimal
      - haproxy 1.4.21   : 400 Bad Request + abort => pass/optimal
      - Squid 3          : 400 Bad Request + abort => pass/optimal
  - SSL :
      - stud 0.3.47      : connection abort        => pass/optimal
      - stunnel 4.45     : connection abort        => pass/optimal
      - nginx 0.7.69     : 400 Bad Request + abort => pass/optimal
  - FTP :
      - Pure-ftpd 1.0.20 : 3*500 then 221 Goodbye  => pass/optimal
      - vsftpd 2.0.1     : 3*530 then 221 Goodbye  => pass/optimal
  - SMTP :
      - postfix 2.3      : 3*500 + 221 Bye         => pass/optimal
      - exim 4.69        : 554 + connection abort  => pass/optimal
  - POP :
      - dovecot 1.0.10   : 3*ERR + Logout          => pass/optimal
  - IMAP :
      - dovecot 1.0.10   : 5*ERR + hang            => pass/non-optimal
  - LDAP :
      - openldap 2.3     : abort                   => pass/optimal
  - SSH :
      - openssh 3.9p1    : abort                   => pass/optimal
  - RDP :
      - Windows XP SP3   : abort                   => pass/optimal

This means that most protocols and implementations will not be confused by an
incoming connection exhibiting the protocol signature, which avoids issues when
facing misconfigurations.


7. Future developments

It is possible that the protocol may slightly evolve to present other
information such as the incoming network interface, or the origin addresses in
case of network address translation happening before the first proxy, but this
is not identified as a requirement right now. Some deep thinking has been spent
on this and it appears that trying to add a few more pieces of information
opens a Pandora's box with much more information, from MAC addresses to SSL
client certificates, which would make the protocol much more complex. So at
this point it is not planned. Suggestions on improvements are welcome.


8. Contacts and links

Please use w@1wt.eu to send any comments to the author.

The following links were referenced in the document.

[1] http://www.postfix.org/XCLIENT_README.html
[2] http://tools.ietf.org/html/rfc7239
[3] http://www.stunnel.org/
[4] https://github.com/bumptech/stud
[5] https://github.com/bumptech/stud/pull/81
[6] https://www.varnish-cache.org/docs/trunk/phk/ssl_again.html
[7] http://wiki.squid-cache.org/Squid-3.5
[8] https://tools.ietf.org/html/rfc4960#appendix-B
[9] https://tools.ietf.org/rfc/rfc7301.txt
[10] https://www.ietf.org/rfc/rfc3546.txt
[11] https://redmine.lighttpd.net/issues/2804

9. Sample code

The code below is an example of how a receiver may deal with both versions of
the protocol header for TCP over IPv4 or IPv6. The function is supposed to be
called upon a read event. Addresses may be directly copied into their final
memory location since they're transported in network byte order. The sending
side is even simpler and can easily be deduced from this sample code.

    /* headers needed to build this sample on a POSIX system */
    #include <arpa/inet.h>
    #include <errno.h>
    #include <netinet/in.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>

    struct sockaddr_storage from; /* already filled by accept() */
    struct sockaddr_storage to;   /* already filled by getsockname() */
    const char v2sig[12] = "\x0D\x0A\x0D\x0A\x00\x0D\x0A\x51\x55\x49\x54\x0A";

    /* returns 0 if needs to poll, <0 upon error or >0 if it did the job */
    int read_evt(int fd)
    {
        union {
            struct {
                char line[108];
            } v1;
            struct {
                uint8_t sig[12];
                uint8_t ver_cmd;
                uint8_t fam;
                uint16_t len;
                union {
                    struct {  /* for TCP/UDP over IPv4, len = 12 */
                        uint32_t src_addr;
                        uint32_t dst_addr;
                        uint16_t src_port;
                        uint16_t dst_port;
                    } ip4;
                    struct {  /* for TCP/UDP over IPv6, len = 36 */
                         uint8_t  src_addr[16];
                         uint8_t  dst_addr[16];
                         uint16_t src_port;
                         uint16_t dst_port;
                    } ip6;
                    struct {  /* for AF_UNIX sockets, len = 216 */
                         uint8_t src_addr[108];
                         uint8_t dst_addr[108];
                    } unx;
                } addr;
            } v2;
        } hdr;

        int size, ret;

        do {
            ret = recv(fd, &hdr, sizeof(hdr), MSG_PEEK);
        } while (ret == -1 && errno == EINTR);

        if (ret == -1)
            return (errno == EAGAIN) ? 0 : -1;

        if (ret >= 16 && memcmp(&hdr.v2, v2sig, 12) == 0 &&
            (hdr.v2.ver_cmd & 0xF0) == 0x20) {
            size = 16 + ntohs(hdr.v2.len);
            if (ret < size)
                return -1; /* truncated or too large header */

            switch (hdr.v2.ver_cmd & 0xF) {
            case 0x01: /* PROXY command */
                switch (hdr.v2.fam) {
                case 0x11:  /* TCPv4 */
                    ((struct sockaddr_in *)&from)->sin_family = AF_INET;
                    ((struct sockaddr_in *)&from)->sin_addr.s_addr =
                        hdr.v2.addr.ip4.src_addr;
                    ((struct sockaddr_in *)&from)->sin_port =
                        hdr.v2.addr.ip4.src_port;
                    ((struct sockaddr_in *)&to)->sin_family = AF_INET;
                    ((struct sockaddr_in *)&to)->sin_addr.s_addr =
                        hdr.v2.addr.ip4.dst_addr;
                    ((struct sockaddr_in *)&to)->sin_port =
                        hdr.v2.addr.ip4.dst_port;
                    goto done;
                case 0x21:  /* TCPv6 */
                    ((struct sockaddr_in6 *)&from)->sin6_family = AF_INET6;
                    memcpy(&((struct sockaddr_in6 *)&from)->sin6_addr,
                        hdr.v2.addr.ip6.src_addr, 16);
                    ((struct sockaddr_in6 *)&from)->sin6_port =
                        hdr.v2.addr.ip6.src_port;
                    ((struct sockaddr_in6 *)&to)->sin6_family = AF_INET6;
                    memcpy(&((struct sockaddr_in6 *)&to)->sin6_addr,
                        hdr.v2.addr.ip6.dst_addr, 16);
                    ((struct sockaddr_in6 *)&to)->sin6_port =
                        hdr.v2.addr.ip6.dst_port;
                    goto done;
                }
                /* unsupported protocol, keep local connection address */
                break;
            case 0x00: /* LOCAL command */
                /* keep local connection address for LOCAL */
                break;
            default:
                return -1; /* not a supported command */
            }
        }
        else if (ret >= 8 && memcmp(hdr.v1.line, "PROXY", 5) == 0) {
            char *end = memchr(hdr.v1.line, '\r', ret - 1);
            if (!end || end[1] != '\n')
                return -1; /* partial or invalid header */
            *end = '\0'; /* terminate the string to ease parsing */
            size = end + 2 - hdr.v1.line; /* skip header + CRLF */
            /* parse the V1 header using favorite address parsers like inet_pton.
             * return -1 upon error, or simply fall through to accept.
             */
        }
        else {
            /* Wrong protocol */
            return -1;
        }

    done:
        /* we need to consume the appropriate amount of data from the socket */
        do {
            ret = recv(fd, &hdr, size, 0);
        } while (ret == -1 && errno == EINTR);
        return (ret >= 0) ? 1 : -1;
    }
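
For the sending side mentioned above, a version 1 header for TCP over IPv4 is
a single text line. The sketch below builds such a line; the function name
pp1_format_tcp4 and its parameters are hypothetical, and the addresses are
assumed to be already available in their dotted-decimal text form. The
resulting line is meant to be sent once, before any other data on the
connection.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes "PROXY TCP4 <src> <dst> <sport> <dport>\r\n" into <buf>.
 * Returns the length of the line (without the trailing zero), or -1 if
 * <buf> is too small to hold it.
 */
int pp1_format_tcp4(char *buf, size_t buflen,
                    const char *src, const char *dst,
                    unsigned int src_port, unsigned int dst_port)
{
    int n = snprintf(buf, buflen, "PROXY TCP4 %s %s %u %u\r\n",
                     src, dst, src_port, dst_port);

    if (n < 0 || (size_t)n >= buflen)
        return -1; /* encoding error or truncated output */
    return n;
}
```

A buffer of 108 bytes, as used for the version 1 line in the receiver above,
is large enough for any valid line.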