2017/03/10                                                     Willy Tarreau
                                                           HAProxy Technologies
                               The PROXY protocol
                                 Versions 1 & 2

Abstract

   The PROXY protocol provides a convenient way to safely transport connection
   information such as a client's address across multiple layers of NAT or TCP
   proxies. It is designed to require few changes to existing components and
   to limit the performance impact caused by the processing of the transported
   information.


Revision history

   2010/10/29 - first version
   2011/03/20 - update: implementation and security considerations
   2012/06/21 - add support for binary format
   2012/11/19 - final review and fixes
   2014/05/18 - modify and extend PROXY protocol version 2
   2014/06/11 - fix example code to consider ver+cmd merge
   2014/06/14 - fix v2 header check in example code, and update Forwarded spec
   2014/07/12 - update list of implementations (add Squid)
   2015/05/02 - update list of implementations and format of the TLV add-ons
   2017/03/10 - added the checksum and more SSL-related TLV types, reserved TLV
                type ranges, added TLV documentation, clarified string
                encoding. With contributions from Andriy Palamarchuk
                (Amazon.com).


1. Background

Relaying TCP connections through proxies generally involves a loss of the
original TCP connection parameters such as source and destination addresses,
ports, and so on. Some protocols make it a little bit easier to transfer such
information. For SMTP, Postfix authors have proposed the XCLIENT protocol [1]
which received broad adoption and is particularly suited to mail exchanges.
For HTTP, there is the "Forwarded" extension [2], which aims to replace the
omnipresent "X-Forwarded-For" header which carries information about the
original source address, and the less common X-Original-To which carries
information about the destination address.

However, both mechanisms require knowledge of the underlying protocol to be
implemented in intermediaries.

Then comes a new class of products which we'll call "dumb proxies", not because
they don't do anything, but because they're processing protocol-agnostic data.
Both Stunnel [3] and Stud [4] are examples of such "dumb proxies". They talk raw
TCP on one side, and raw SSL on the other one, and do that reliably, without
any knowledge of what protocol is transported on top of the connection. Haproxy
running in pure TCP mode obviously falls into that category as well.

The problem with such a proxy, when it is combined with another one such as
haproxy, is adapting it to speak the higher level protocol. A patch is available
for Stunnel to make it capable of inserting an X-Forwarded-For header in the
first HTTP request of each incoming connection. Haproxy is able to avoid adding
another one when the connection comes from Stunnel, so that it's possible to
hide it from the servers.

The typical architecture becomes the following one :


      +--------+      HTTP                      :80 +----------+
      | client |  --------------------------------> |          |
      |        |                                    | haproxy, |
      +--------+             +---------+            | 1 or 2   |
     /        /     HTTPS    | stunnel |  HTTP  :81 | listening|
    <________/    ---------> | (server |  --------> |  ports   |
                             |  mode)  |            |          |
                             +---------+            +----------+


The problem appears when haproxy runs with keep-alive on the side towards the
client. The Stunnel patch will only add the X-Forwarded-For header to the first
request of each connection and all subsequent requests will not have it. One
solution could be to improve the patch to make it support keep-alive and parse
all forwarded data, whether they're announced with a Content-Length or with a
Transfer-Encoding, taking care of special methods such as HEAD which announce
data without transferring them, etc. In fact, it would require implementing a
full HTTP stack in Stunnel. It would then become a lot more complex, a lot less
reliable and would no longer be the "dumb proxy" that fits every purpose.

In practice, we don't need to add a header for each request because we'll emit
the exact same information every time : the information related to the client
side connection. We could then cache that information in haproxy and use it for
every other request. But that becomes dangerous and is still limited to HTTP
only.

Another approach consists of prepending each connection with a header reporting
the characteristics of the other side's connection. This method is simpler to
implement, does not require any protocol-specific knowledge on either side, and
completely fits the purpose since what is desired precisely is to know the
other side's connection endpoints. It is easy to perform for the sender (just
send a short header once the connection is established) and to parse for the
receiver (simply perform one read() on the incoming connection to fill in
addresses after an accept). The protocol used to carry connection information
across proxies was thus called the PROXY protocol.


2. The PROXY protocol header

This document uses a few terms that are worth explaining here :
  - "connection initiator" is the party requesting a new connection
  - "connection target" is the party accepting a connection request
  - "client" is the party for which a connection was requested
  - "server" is the party to which the client desired to connect
  - "proxy" is the party intercepting and relaying the connection
    from the client to the server.
  - "sender" is the party sending data over a connection.
  - "receiver" is the party receiving data from the sender.
  - "header" or "PROXY protocol header" is the block of connection information
    the connection initiator prepends at the beginning of a connection, which
    makes it the sender from the protocol point of view.

The PROXY protocol's goal is to fill the server's internal structures with the
information collected by the proxy that the server would have been able to get
by itself if the client were connecting directly to the server instead of via a
proxy. The information carried by the protocol is what the server would get
using getsockname() and getpeername() :
  - address family (AF_INET for IPv4, AF_INET6 for IPv6, AF_UNIX)
  - socket protocol (SOCK_STREAM for TCP, SOCK_DGRAM for UDP)
  - layer 3 source and destination addresses
  - layer 4 source and destination ports if any

Unlike the XCLIENT protocol, the PROXY protocol was designed with limited
extensibility in order to help the receiver parse it very fast. Version 1 was
focused on keeping it human-readable for better debugging possibilities, which
is always desirable for early adoption when few implementations exist. Version
2 adds support for a binary encoding of the header which is much more efficient
to produce and to parse, especially when dealing with IPv6 addresses that are
expensive to emit in ASCII form and to parse.

In both cases, the protocol simply consists of an easily parsable header placed
by the connection initiator at the beginning of each connection. The protocol
is intentionally stateless in that it does not expect the sender to wait for
the receiver before sending the header, nor the receiver to send anything back.

This specification supports two header formats, a human-readable format which
is the only format supported in version 1 of the protocol, and a binary format
which is only supported in version 2. Both formats were designed to ensure that
the header cannot be confused with common higher level protocols such as HTTP,
SSL/TLS, FTP or SMTP, and that both formats are easily distinguishable from
each other by the receiver.

Version 1 senders MAY only produce the human-readable header format. Version 2
senders MAY only produce the binary header format. Version 1 receivers MUST at
least implement the human-readable header format. Version 2 receivers MUST at
least implement the binary header format, and it is recommended that they also
implement the human-readable header format for better interoperability and ease
of upgrade when facing version 1 senders.

Both formats are designed to fit in the smallest TCP segment that any TCP/IP
host is required to support (576 - 40 = 536 bytes). This ensures that the whole
header will always be delivered at once when the socket buffers are still empty
at the beginning of a connection. The sender must always ensure that the header
is sent at once, so that the transport layer maintains atomicity along the path
to the receiver. The receiver may be tolerant to partial headers or may simply
drop the connection when receiving a partial header. The recommendation is to
be tolerant, but implementation constraints may not always easily permit this.
It is important to note that nothing forces any intermediary to forward the
whole header at once, because TCP is a streaming protocol which may be
processed one byte at a time if desired, causing the header to be fragmented
when reaching the receiver. But due to the places where such a protocol is
used, the above simplification is generally acceptable because the risk of
crossing such a device handling one byte at a time is close to zero.

The receiver MUST NOT start processing the connection before it receives a
complete and valid PROXY protocol header. This is particularly important for
protocols where the receiver is expected to speak first (eg: SMTP, FTP or SSH).
The receiver may apply a short timeout and decide to abort the connection if
the protocol header is not seen within a few seconds (at least 3 seconds to
cover a TCP retransmit).

The receiver MUST be configured to only receive the protocol described in this
specification and MUST NOT try to guess whether the protocol header is present
or not. This means that the protocol explicitly prevents port sharing between
public and private access. Otherwise it would open a major security breach by
allowing untrusted parties to spoof their connection addresses. The receiver
SHOULD ensure proper access filtering so that only trusted proxies are allowed
to use this protocol.

Some proxies are smart enough to understand transported protocols and to reuse
idle server connections for multiple messages. This typically happens in HTTP
where requests from multiple clients may be sent over the same connection. Such
proxies MUST NOT implement this protocol on multiplexed connections because the
receiver would use the address advertised in the PROXY header as the address of
all forwarded requests' senders. In fact, such proxies are not dumb proxies,
and since they do have a complete understanding of the transported protocol,
they MUST use the facilities provided by that transported protocol to present
the client's address.


2.1. Human-readable header format (Version 1)

This is the format specified in version 1 of the protocol. It consists of one
line of US-ASCII text matching exactly the following block, sent immediately
and at once upon the connection establishment and prepended before any data
flowing from the sender to the receiver :

  - a string identifying the protocol : "PROXY" ( \x50 \x52 \x4F \x58 \x59 )
    Seeing this string indicates that this is version 1 of the protocol.

  - exactly one space : " " ( \x20 )

  - a string indicating the proxied INET protocol and family. As of version 1,
    only "TCP4" ( \x54 \x43 \x50 \x34 ) for TCP over IPv4, and "TCP6"
    ( \x54 \x43 \x50 \x36 ) for TCP over IPv6 are allowed. Other, unsupported,
    or unknown protocols must be reported with the name "UNKNOWN" ( \x55 \x4E
    \x4B \x4E \x4F \x57 \x4E ). For "UNKNOWN", the rest of the line before the
    CRLF may be omitted by the sender, and the receiver must ignore anything
    presented before the CRLF is found. Note that an earlier version of this
    specification suggested using this when sending health checks, but this
    causes issues with servers that reject the "UNKNOWN" keyword. Thus it is
    now recommended not to send "UNKNOWN" when the connection is expected to
    be accepted, but only when it is not possible to correctly fill the PROXY
    line.

  - exactly one space : " " ( \x20 )

  - the layer 3 source address in its canonical format. IPv4 addresses must be
    indicated as a series of exactly 4 integers in the range [0..255] inclusive
    written in decimal representation and separated by exactly one dot. Leading
    zeroes are not permitted in front of numbers in order to avoid any possible
    confusion with octal numbers. IPv6 addresses must be indicated as series of
    4 hexadecimal digits (upper or lower case) delimited by colons, with one
    double colon sequence accepted to replace the largest acceptable range of
    consecutive zeroes. The total number of decoded bits must be exactly 128.
    The advertised protocol family dictates what format to use.

  - exactly one space : " " ( \x20 )

  - the layer 3 destination address in its canonical format. It is the same
    format as the layer 3 source address and matches the same family.

  - exactly one space : " " ( \x20 )

  - the TCP source port represented as a decimal integer in the range
    [0..65535] inclusive. Leading zeroes are not permitted in front of numbers
    in order to avoid any possible confusion with octal numbers.

  - exactly one space : " " ( \x20 )

  - the TCP destination port represented as a decimal integer in the range
    [0..65535] inclusive. Leading zeroes are not permitted in front of numbers
    in order to avoid any possible confusion with octal numbers.

  - the CRLF sequence ( \x0D \x0A )
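As an illustration of the port rules above, a receiver could validate each port field with a helper like the one below. This is a minimal C sketch, not taken from any existing implementation: the name pp_v1_parse_port is hypothetical, and it assumes the caller has already isolated the field (the characters between two separators). A lone "0" is accepted, on the reading that a single zero is not a leading zero.

```c
#include <stddef.h>

/* Hypothetical helper: validates one PROXY v1 port field of exactly `len`
 * characters. Returns the port (0..65535), or -1 if the field is empty,
 * too long, contains a non-digit, has a leading zero or exceeds 65535. */
static long pp_v1_parse_port(const char *s, size_t len)
{
    long val = 0;

    if (len == 0 || len > 5)        /* "65535" is the longest valid field */
        return -1;
    if (s[0] == '0' && len > 1)     /* leading zeroes are forbidden */
        return -1;
    for (size_t i = 0; i < len; i++) {
        if (s[i] < '0' || s[i] > '9')
            return -1;
        val = val * 10 + (s[i] - '0');
    }
    return val <= 65535 ? val : -1;
}
```

The same leading-zero test applies to each IPv4 octet, with 255 as the upper bound instead of 65535.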


The maximum line lengths the receiver must support including the CRLF are :
  - TCP/IPv4 :
      "PROXY TCP4 255.255.255.255 255.255.255.255 65535 65535\r\n"
    => 5 + 1 + 4 + 1 + 15 + 1 + 15 + 1 + 5 + 1 + 5 + 2 = 56 chars

  - TCP/IPv6 :
      "PROXY TCP6 ffff:f...f:ffff ffff:f...f:ffff 65535 65535\r\n"
    => 5 + 1 + 4 + 1 + 39 + 1 + 39 + 1 + 5 + 1 + 5 + 2 = 104 chars

  - unknown connection (short form) :
      "PROXY UNKNOWN\r\n"
    => 5 + 1 + 7 + 2 = 15 chars

  - worst case (optional fields set to 0xff) :
      "PROXY UNKNOWN ffff:f...f:ffff ffff:f...f:ffff 65535 65535\r\n"
    => 5 + 1 + 7 + 1 + 39 + 1 + 39 + 1 + 5 + 1 + 5 + 2 = 107 chars

So a 108-byte buffer is always enough to store the whole line and a trailing
zero for string processing.
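The arithmetic above can be checked by the compiler. This C11 sketch uses hypothetical macro names and spells out the "ffff:f...f:ffff" placeholders in full (8 groups of 4 hexadecimal digits plus 7 colons, i.e. 39 characters); a wrong count would stop compilation.

```c
/* Worst-case v1 lines with the IPv6 addresses written out in full. */
#define PP_V1_TCP4_MAX \
    "PROXY TCP4 255.255.255.255 255.255.255.255 65535 65535\r\n"
#define PP_V1_TCP6_MAX \
    "PROXY TCP6 ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff " \
    "ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff 65535 65535\r\n"
#define PP_V1_UNKNOWN_MAX \
    "PROXY UNKNOWN ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff " \
    "ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff 65535 65535\r\n"

/* sizeof() on a string literal counts the trailing NUL, hence the "- 1". */
_Static_assert(sizeof(PP_V1_TCP4_MAX) - 1 == 56, "TCP4 worst case");
_Static_assert(sizeof(PP_V1_TCP6_MAX) - 1 == 104, "TCP6 worst case");
_Static_assert(sizeof(PP_V1_UNKNOWN_MAX) - 1 == 107, "UNKNOWN worst case");

/* 107 characters plus one trailing zero for string processing. */
#define PP_V1_BUF_SIZE (sizeof(PP_V1_UNKNOWN_MAX))
```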

The receiver must wait for the CRLF sequence before starting to decode the
addresses in order to ensure they are complete and properly parsed. If the CRLF
sequence is not found in the first 107 characters, the receiver should declare
the line invalid. A receiver may reject an incomplete line which does not
contain the CRLF sequence in the first atomic read operation. The receiver must
not tolerate a single CR or LF character to end the line when a complete CRLF
sequence is expected.
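A receiver following these rules could scan its input as below. This is a sketch assuming the caller passes everything read so far; the function name and its 0/-1 return convention are illustrative and not part of the protocol. A 0 result lets a tolerant receiver wait for more data, while a stricter receiver may treat it as an error after its first read.

```c
#include <stddef.h>

#define PP_V1_MAX_LINE 107   /* longest valid line, CRLF included */

/* Hypothetical helper: looks for the end of a PROXY v1 line in the bytes
 * received so far. Returns the line length including the CRLF when found,
 * 0 when more data is needed, or -1 when the line is invalid: a lone CR or
 * LF, or no CRLF within the first 107 characters. */
static int pp_v1_line_len(const char *buf, size_t len)
{
    size_t lim = len < PP_V1_MAX_LINE ? len : PP_V1_MAX_LINE;

    for (size_t i = 0; i < lim; i++) {
        if (buf[i] == '\n')
            return -1;                  /* LF not preceded by CR */
        if (buf[i] != '\r')
            continue;
        if (i + 2 > PP_V1_MAX_LINE)
            return -1;                  /* CRLF would end past char 107 */
        if (i + 1 >= len)
            return 0;                   /* CR received, LF not yet */
        return buf[i + 1] == '\n' ? (int)(i + 2) : -1;  /* -1: lone CR */
    }
    return len >= PP_V1_MAX_LINE ? -1 : 0;
}
```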

Any sequence which does not exactly match the protocol must be discarded and
cause the receiver to abort the connection. It is recommended to abort the
connection as soon as possible so that the sender gets a chance to notice the
anomaly and log it.

If the announced transport protocol is "UNKNOWN", then the receiver knows that
the sender speaks the correct PROXY protocol with the appropriate version, and
SHOULD accept the connection and use the real connection's parameters as if
there were no PROXY protocol header on the wire. However, senders SHOULD NOT
use the "UNKNOWN" protocol when they are the initiators of outgoing connections
because some receivers may reject them. When a load balancing proxy has to send
health checks to a server, it SHOULD build a valid PROXY line which it will
fill with a getsockname()/getpeername() pair indicating the addresses used. It
is important to understand that doing so is not appropriate when some source
address translation is performed between the sender and the receiver.

An example of such a line before an HTTP request would look like this (CR
marked as "\r" and LF marked as "\n") :

    PROXY TCP4 192.168.0.1 192.168.0.11 56324 443\r\n
    GET / HTTP/1.1\r\n
    Host: 192.168.0.11\r\n
    \r\n
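On the sender side, a line like the first one above can be produced from the two endpoint structures of the client connection. This is a minimal POSIX C sketch restricted to TCP4 for brevity; the function name is hypothetical. With a source of 192.168.0.1:56324 and a destination of 192.168.0.11:443 it produces exactly the 47-byte first line of the example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: writes a PROXY v1 line for TCP over IPv4 into `out`.
 * `src` and `dst` hold the client-side endpoints as returned by
 * getpeername() and getsockname(). Returns the line length (without the
 * trailing NUL written by snprintf()), or -1 on error or truncation. */
static int pp_v1_format_tcp4(char *out, size_t size,
                             const struct sockaddr_in *src,
                             const struct sockaddr_in *dst)
{
    char s[INET_ADDRSTRLEN], d[INET_ADDRSTRLEN];
    int n;

    /* inet_ntop() emits dotted decimal without leading zeroes */
    if (!inet_ntop(AF_INET, &src->sin_addr, s, sizeof(s)) ||
        !inet_ntop(AF_INET, &dst->sin_addr, d, sizeof(d)))
        return -1;
    /* ports are stored in network byte order; %u prints no leading zero */
    n = snprintf(out, size, "PROXY TCP4 %s %s %u %u\r\n", s, d,
                 (unsigned)ntohs(src->sin_port),
                 (unsigned)ntohs(dst->sin_port));
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Since the line is shorter than an MSS, the caller can pass the result to a single send() before relaying any client data.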

For the sender, the header line is easy to put into the output buffers once the
connection is established. Note that since the line is always shorter than an
MSS, the sender is guaranteed to always be able to emit it at once and should
not even bother handling partial sends. For the receiver, once the header is
parsed, it is easy to skip it from the input buffers. Please consult section 9
for implementation suggestions.


2.2. Binary header format (version 2)

Producing human-readable IPv6 addresses and parsing them is very inefficient,
due to the multiple possible representation formats and the handling of compact
address format. It was also not possible to specify address families outside
IPv4/IPv6 or non-TCP protocols. Another drawback of the human-readable format
is the fact that implementations need to parse all characters to find the
trailing CRLF, which makes it harder to read exactly the number of bytes that
belong to the header. Lastly, the UNKNOWN address type has not always been
accepted by servers as a valid protocol because of its imprecise meaning.

Version 2 of the protocol thus introduces a new binary format which remains
distinguishable from version 1 and from other commonly used protocols. It was
specifically designed to be incompatible with a wide range of protocols and to
be rejected by a number of common implementations of these protocols when
unexpectedly presented (please see section 7). Also, for better processing
efficiency, IPv4 and IPv6 addresses are aligned on 4- and 16-byte boundaries
respectively.

The binary header format starts with a constant 12-byte block containing the
protocol signature :

   \x0D \x0A \x0D \x0A \x00 \x0D \x0A \x51 \x55 \x49 \x54 \x0A

Note that this block contains a null byte at the 5th position, so it must not
be handled as a null-terminated string.
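Because of that null byte, the signature has to be compared over an explicit length. A minimal C sketch (the constant and function names are hypothetical); written as a C string the block reads "\r\n\r\n\0\r\nQUIT\n":

```c
#include <stddef.h>
#include <string.h>

/* The 12-byte v2 signature. Because of the NUL at the 5th position it is
 * compared with memcmp() over an explicit length, never with strcmp() or
 * strncmp(), which would stop at the NUL. */
static const unsigned char PP2_SIGNATURE[12] = {
    0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A
};

/* Hypothetical helper: returns 1 if `buf` starts with the v2 signature. */
static int pp2_has_signature(const unsigned char *buf, size_t len)
{
    return len >= sizeof(PP2_SIGNATURE) &&
           memcmp(buf, PP2_SIGNATURE, sizeof(PP2_SIGNATURE)) == 0;
}
```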

The next byte (the 13th one) is the protocol version and command.

The highest four bits contain the version. As of this specification, it must
always be sent as \x2 and the receiver must only accept this value.

The lowest four bits represent the command :
  - \x0 : LOCAL : the connection was established on purpose by the proxy
    without being relayed. The connection endpoints are the sender and the
    receiver. Such connections exist when the proxy sends health-checks to the
    server. The receiver must accept this connection as valid and must use the
    real connection endpoints and discard the protocol block including the
    family which is ignored.

  - \x1 : PROXY : the connection was established on behalf of another node,
    and reflects the original connection endpoints. The receiver must then use
    the information provided in the protocol block to get the original address.

  - other values are unassigned and must not be emitted by senders. Receivers
    must drop connections presenting unexpected values here.
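In practice a sender therefore emits \x21 for PROXY and \x20 for LOCAL. Splitting and checking this byte on the receiver side can be sketched as follows; the macro and function names are hypothetical.

```c
#include <stdint.h>

#define PP2_CMD_LOCAL 0x0
#define PP2_CMD_PROXY 0x1

/* Hypothetical helper: checks the 13th header byte. Returns PP2_CMD_LOCAL
 * or PP2_CMD_PROXY, or -1 when the version is not 2 or the command is
 * unassigned, in which case the receiver must drop the connection. */
static int pp2_decode_ver_cmd(uint8_t ver_cmd)
{
    unsigned version = ver_cmd >> 4;     /* highest four bits */
    unsigned command = ver_cmd & 0x0F;   /* lowest four bits */

    if (version != 0x2)
        return -1;
    if (command != PP2_CMD_LOCAL && command != PP2_CMD_PROXY)
        return -1;
    return (int)command;
}
```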

The 14th byte contains the transport protocol and address family. The highest 4
bits contain the address family, the lowest 4 bits contain the protocol.

The address family maps to the original socket family without necessarily
matching the values internally used by the system. It may be one of :

  - 0x0 : AF_UNSPEC : the connection is forwarded for an unknown, unspecified
    or unsupported protocol. The sender should use this family when sending
    LOCAL commands or when dealing with unsupported protocol families. The
    receiver is free to accept the connection anyway and use the real endpoint
    addresses or to reject it. The receiver should ignore address information.

  - 0x1 : AF_INET : the forwarded connection uses the AF_INET address family
    (IPv4). The addresses are exactly 4 bytes each in network byte order,
    followed by transport protocol information (typically ports).

  - 0x2 : AF_INET6 : the forwarded connection uses the AF_INET6 address family
    (IPv6). The addresses are exactly 16 bytes each in network byte order,
    followed by transport protocol information (typically ports).

  - 0x3 : AF_UNIX : the forwarded connection uses the AF_UNIX address family
    (UNIX). The addresses are exactly 108 bytes each.

  - other values are unspecified and must not be emitted in version 2 of this
    protocol and must be rejected as invalid by receivers.

The transport protocol is specified in the lowest 4 bits of the 14th byte :

  - 0x0 : UNSPEC : the connection is forwarded for an unknown, unspecified
    or unsupported protocol. The sender should use this value when sending
    LOCAL commands or when dealing with unsupported protocol families. The
    receiver is free to accept the connection anyway and use the real endpoint
    addresses or to reject it. The receiver should ignore address information.

  - 0x1 : STREAM : the forwarded connection uses a SOCK_STREAM protocol (eg:
    TCP or UNIX_STREAM). When used with AF_INET/AF_INET6 (TCP), the addresses
    are followed by the source and destination ports represented on 2 bytes
    each in network byte order.

  - 0x2 : DGRAM : the forwarded connection uses a SOCK_DGRAM protocol (eg:
    UDP or UNIX_DGRAM). When used with AF_INET/AF_INET6 (UDP), the addresses
    are followed by the source and destination ports represented on 2 bytes
    each in network byte order.

  - other values are unspecified and must not be emitted in version 2 of this
    protocol and must be rejected as invalid by receivers.

In practice, the following protocol bytes are expected :

  - \x00 : UNSPEC : the connection is forwarded for an unknown, unspecified
    or unsupported protocol. The sender should use this family when sending
    LOCAL commands or when dealing with unsupported protocol families. When
    used with a LOCAL command, the receiver must accept the connection and
    ignore any address information. For other commands, the receiver is free
    to accept the connection anyway and use the real endpoint addresses or to
    reject the connection. The receiver should ignore address information.

  - \x11 : TCP over IPv4 : the forwarded connection uses TCP over the AF_INET
    protocol family. Address length is 2*4 + 2*2 = 12 bytes.

  - \x12 : UDP over IPv4 : the forwarded connection uses UDP over the AF_INET
    protocol family. Address length is 2*4 + 2*2 = 12 bytes.

  - \x21 : TCP over IPv6 : the forwarded connection uses TCP over the AF_INET6
    protocol family. Address length is 2*16 + 2*2 = 36 bytes.

  - \x22 : UDP over IPv6 : the forwarded connection uses UDP over the AF_INET6
    protocol family. Address length is 2*16 + 2*2 = 36 bytes.

  - \x31 : UNIX stream : the forwarded connection uses SOCK_STREAM over the
    AF_UNIX protocol family. Address length is 2*108 = 216 bytes.

  - \x32 : UNIX datagram : the forwarded connection uses SOCK_DGRAM over the
    AF_UNIX protocol family. Address length is 2*108 = 216 bytes.

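The list above maps directly to a lookup. A C sketch with a hypothetical function name; note that it only gives the size a receiver needs in order to decode the addresses, while the length field of the header still decides how many bytes to skip.

```c
#include <stdint.h>

/* Hypothetical helper: returns the address block size implied by the 14th
 * header byte, or -1 for a combination not listed in this specification. */
static int pp2_addr_len(uint8_t fam)
{
    switch (fam) {
    case 0x00:            return 0;    /* UNSPEC: addresses are ignored */
    case 0x11: case 0x12: return 12;   /* TCP/UDP over IPv4: 2*4 + 2*2  */
    case 0x21: case 0x22: return 36;   /* TCP/UDP over IPv6: 2*16 + 2*2 */
    case 0x31: case 0x32: return 216;  /* UNIX stream/datagram: 2*108   */
    default:              return -1;
    }
}
```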

Only the UNSPEC protocol byte (\x00) is mandatory to implement on the receiver.
A receiver is not required to implement other ones, provided that it
automatically falls back to the UNSPEC mode for the valid combinations above
that it does not support.

The 15th and 16th bytes are the address length in bytes in network byte order.
It is used so that the receiver knows how many address bytes to skip even when
it does not implement the presented protocol. Thus the length of the protocol
header in bytes is always exactly 16 + this value. When a sender presents a
LOCAL connection, it should not present any address so it sets this field to
zero. Receivers MUST always consider this field to skip the appropriate number
of bytes and must not assume zero is presented for LOCAL connections. When a
receiver accepts an incoming connection showing an UNSPEC address family or
protocol, it may or may not decide to log the address information if present.
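A receiver can thus compute how many bytes to consume as soon as the first 16 bytes are available. A C sketch with a hypothetical name; reading the two length bytes one at a time avoids both alignment and byte-order assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the total v2 header length, i.e. 16 plus the
 * big-endian value stored in the 15th and 16th bytes (offsets 14 and 15),
 * or 0 if the 16-byte fixed part has not been fully received yet. */
static size_t pp2_header_len(const uint8_t *buf, size_t avail)
{
    if (avail < 16)
        return 0;
    return 16 + (((size_t)buf[14] << 8) | buf[15]);
}
```

The result is at most 16 + 65535 bytes; a receiver that rejects partial headers compares it against the number of bytes obtained from its first read.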

So the 16-byte version 2 header can be described this way :

    struct proxy_hdr_v2 {
        uint8_t sig[12];  /* hex 0D 0A 0D 0A 00 0D 0A 51 55 49 54 0A */
        uint8_t ver_cmd;  /* protocol version and command */
        uint8_t fam;      /* address family and transport protocol */
        uint16_t len;     /* number of following bytes part of the header,
                             in network byte order */
    };

Starting from the 17th byte, addresses are presented in network byte order.
The address order is always the same :
  - source layer 3 address in network byte order
  - destination layer 3 address in network byte order
  - source layer 4 address if any, in network byte order (port)
  - destination layer 4 address if any, in network byte order (port)

The address block may directly be sent from or received into the following
union which makes it easy to cast from/to the relevant socket native structs
depending on the address type :

    union proxy_addr {
        struct {        /* for TCP/UDP over IPv4, len = 12 */
            uint32_t src_addr;
            uint32_t dst_addr;
            uint16_t src_port;
            uint16_t dst_port;
        } ipv4_addr;
        struct {        /* for TCP/UDP over IPv6, len = 36 */
            uint8_t  src_addr[16];
            uint8_t  dst_addr[16];
            uint16_t src_port;
            uint16_t dst_port;
        } ipv6_addr;
        struct {        /* for AF_UNIX sockets, len = 216 */
            uint8_t src_addr[108];
            uint8_t dst_addr[108];
        } unix_addr;
    };

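Following this layout, a sender could assemble a complete header for TCP over IPv4 as shown below. This POSIX C sketch uses a hypothetical function name and writes the bytes individually instead of casting to struct proxy_hdr_v2, which sidesteps alignment and padding concerns. The resulting 28 bytes are meant to be sent in a single write().

```c
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes a v2 header for TCP over IPv4 (PROXY command)
 * into `out`, which must hold at least 28 bytes, and returns its length.
 * sockaddr_in already stores addresses and ports in network byte order,
 * so they are copied as-is into the layout of union proxy_addr.ipv4_addr. */
static size_t pp2_build_tcp4(uint8_t *out, const struct sockaddr_in *src,
                             const struct sockaddr_in *dst)
{
    static const uint8_t sig[12] = {
        0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A
    };
    const uint16_t addr_len = 12;                 /* 2*4 + 2*2 */

    memcpy(out, sig, sizeof(sig));
    out[12] = 0x21;                               /* version 2, PROXY */
    out[13] = 0x11;                               /* AF_INET, STREAM (TCP) */
    out[14] = (uint8_t)(addr_len >> 8);           /* length, network order */
    out[15] = (uint8_t)(addr_len & 0xFF);
    memcpy(out + 16, &src->sin_addr.s_addr, 4);   /* src_addr */
    memcpy(out + 20, &dst->sin_addr.s_addr, 4);   /* dst_addr */
    memcpy(out + 24, &src->sin_port, 2);          /* src_port */
    memcpy(out + 26, &dst->sin_port, 2);          /* dst_port */
    return 16 + addr_len;
}
```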
The sender must ensure that the whole protocol header is sent at once. This
block is always smaller than an MSS, so there is no reason for it to be
segmented at the beginning of the connection. The receiver should also process
the header at once. The receiver must not start to parse an address before the
whole address block is received. The receiver must also reject incoming
connections containing partial protocol headers.

A receiver may be configured to support both version 1 and version 2 of the
protocol. Identifying the protocol version is easy :

  - if the incoming byte count is 16 or above, the first 12 bytes match the
    protocol signature block, and the highest four bits of the 13th byte
    carry the protocol version 2 :

           \x0D\x0A\x0D\x0A\x00\x0D\x0A\x51\x55\x49\x54\x0A

    followed by a byte whose highest four bits are \x2 (eg: \x21 for PROXY)

  - otherwise, if the incoming byte count is 8 or above, and the first 5
    characters match the US-ASCII representation of "PROXY" then the protocol
    must be parsed as version 1 :

           \x50\x52\x4F\x58\x59

  - otherwise the protocol is not covered by this specification and the
    connection must be dropped.
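The decision procedure above can be written as follows. This C sketch uses a hypothetical name and return convention (2 or 1 for the version, 0 for "need more data", -1 for "drop"). As a simplification, it keeps waiting until 16 bytes have arrived even when the first bytes already rule out both versions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: classifies the first bytes of a connection on a
 * listener configured for the PROXY protocol. Returns 2 or 1 for the
 * detected version, 0 if more bytes are needed before deciding, or -1 if
 * the data is not a PROXY header and the connection must be dropped. */
static int pp_detect_version(const unsigned char *buf, size_t len)
{
    static const unsigned char v2sig[12] = {
        0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A
    };

    /* version 2: signature, then 0x2 in the high nibble of byte 13 */
    if (len >= 16 && memcmp(buf, v2sig, sizeof(v2sig)) == 0 &&
        (buf[12] & 0xF0) == 0x20)
        return 2;
    /* version 1: the US-ASCII string "PROXY" */
    if (len >= 8 && memcmp(buf, "PROXY", 5) == 0)
        return 1;
    /* no match yet: below 16 bytes a header may still be arriving */
    return len < 16 ? 0 : -1;
}
```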

If the length specified in the PROXY protocol header indicates that additional
bytes are part of the header beyond the address information, a receiver may
choose to skip over and ignore those bytes, or attempt to interpret those
bytes.

The information in those bytes will be arranged in Type-Length-Value (TLV)
vectors in the following format. The first byte is the Type of the vector.
The next two bytes represent the length in bytes of the value (not including
the Type and Length bytes), and the length field is followed by the number of
bytes specified by the length.

        struct pp2_tlv {
            uint8_t type;
            uint8_t length_hi;
            uint8_t length_lo;
            uint8_t value[];    /* length_hi * 256 + length_lo bytes */
        };

A receiver may choose to skip over and ignore the TLVs it is not interested in
or does not understand. Senders can generate the TLVs only for the information
they choose to publish.
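Skipping unwanted TLVs only requires the length field. The sketch below, with a hypothetical name, looks up one type in the bytes that follow the address block. Treating a TLV that runs past the end of the header as malformed is an assumption of this sketch, since the text above does not say how to handle it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: searches the TLV area (the header bytes after the
 * address block) for the first TLV of the given type. Returns a pointer to
 * its value and stores the value length in *vlen, or returns NULL if the
 * type is absent or a TLV would run past the end of the area. */
static const uint8_t *pp2_find_tlv(const uint8_t *p, size_t len,
                                   uint8_t type, size_t *vlen)
{
    while (len >= 3) {
        size_t l = ((size_t)p[1] << 8) | p[2];  /* length_hi, length_lo */
        if (l > len - 3)
            return NULL;                        /* malformed: overrun */
        if (p[0] == type) {
            *vlen = l;
            return p + 3;
        }
        p += 3 + l;                             /* skip unwanted TLV */
        len -= 3 + l;
    }
    return NULL;
}
```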
534
Willy Tarreau7b7011c2015-05-02 15:13:07 +0200535The following types have already been registered for the <type> field :
536
Nikos Mavrogiannopoulosf1650a82015-08-24 15:53:18 +0200537 #define PP2_TYPE_ALPN 0x01
538 #define PP2_TYPE_AUTHORITY 0x02
Andriy Palamarchuk01105ac2017-03-14 18:59:09 -0400539 #define PP2_TYPE_CRC32C 0x03
Nikos Mavrogiannopoulosf1650a82015-08-24 15:53:18 +0200540 #define PP2_TYPE_SSL 0x20
541 #define PP2_SUBTYPE_SSL_VERSION 0x21
542 #define PP2_SUBTYPE_SSL_CN 0x22
Andriy Palamarchuk01105ac2017-03-14 18:59:09 -0400543 #define PP2_SUBTYPE_SSL_CIPHER 0x23
544 #define PP2_SUBTYPE_SSL_SIG_ALG 0x24
545 #define PP2_SUBTYPE_SSL_KEY_ALG 0x25
Nikos Mavrogiannopoulosf1650a82015-08-24 15:53:18 +0200546 #define PP2_TYPE_NETNS 0x30


2.2.1. PP2_TYPE_ALPN

Application-Layer Protocol Negotiation (ALPN). It is a byte sequence defining
the upper layer protocol in use over the connection. The most common use case
will be to pass an exact copy of the ALPN extension of the Transport Layer
Security (TLS) protocol as defined by RFC7301 [9].


2.2.2. PP2_TYPE_AUTHORITY

Contains the host name value passed by the client, as a UTF8-encoded string.
In case of TLS being used on the client connection, this is an exact copy of
the "server_name" extension as defined by RFC3546 [10], section 3.1, often
referred to as "SNI". There are probably other situations where an authority
can be mentioned on a connection without TLS being involved at all.
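
On the sending side, emitting such a vector only requires writing the type,
the two length bytes (high byte first) and then the value. The sketch below
uses a hypothetical helper named pp2_put_tlv() to append, for instance, a
PP2_TYPE_AUTHORITY entry carrying a host name. The caller is assumed to add the
returned byte count to the <len> field of the version 2 header. In this sketch
the string is written without a trailing NUL byte, since the TLV length is
enough to delimit it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PP2_TYPE_AUTHORITY 0x02

/* Hypothetical helper: appends one TLV to <out>, which has <room> bytes
 * available. Returns the number of bytes written, or 0 if the vector does
 * not fit or the value is too long for the 16-bit length field. */
static size_t pp2_put_tlv(uint8_t *out, size_t room, uint8_t type,
                          const void *value, size_t vlen)
{
    if (vlen > 0xFFFF || room < 3 || vlen > room - 3)
        return 0;
    out[0] = type;
    out[1] = (uint8_t)(vlen >> 8);     /* length_hi */
    out[2] = (uint8_t)(vlen & 0xFF);   /* length_lo */
    if (vlen)
        memcpy(out + 3, value, vlen);
    return 3 + vlen;
}
```

For example, pp2_put_tlv(buf, sizeof(buf), PP2_TYPE_AUTHORITY, "example.com",
11) writes 14 bytes: 0x02, 0x00, 0x0B, then the 11 characters of the host name.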


2.2.3. PP2_TYPE_CRC32C

The value of the type PP2_TYPE_CRC32C is a 32-bit number storing the CRC32c
checksum of the PROXY protocol header.

When the checksum is supported by the sender, after constructing the header
the sender MUST:

  - initialize the checksum field to '0's.

  - calculate the CRC32c checksum of the PROXY header as described in RFC4960,
    Appendix B [8].

  - put the resultant value into the checksum field, and leave the rest of
    the bits unchanged.

If the checksum is provided as part of the PROXY header and the checksum
functionality is supported by the receiver, the receiver MUST:

  - store the received CRC32c checksum value aside.

  - replace the 32 bits of the checksum field in the received PROXY header with
    all '0's and calculate a CRC32c checksum value of the whole PROXY header.

  - verify that the calculated CRC32c checksum is the same as the received
    CRC32c checksum. If it is not, the receiver MUST treat the TCP connection
    providing the header as invalid.

The default procedure for handling an invalid TCP connection is to abort it.
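
The sender and receiver steps above can be sketched in C as follows. The
checksum routine is a plain bitwise implementation of CRC32c (Castagnoli
polynomial in reflected form 0x82F63B78, initial value and final XOR of
0xFFFFFFFF), the algorithm RFC4960 Appendix B describes; real implementations
usually prefer a table-driven or hardware-assisted variant. The helpers
pp2_set_crc() and pp2_check_crc() are hypothetical. <crc_off> is assumed to be
the offset of the 4-byte checksum value inside the header, already located by
the caller. The value is also assumed to be stored in network byte order, which
the text above does not state.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *data++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Sender side (hypothetical): zero the field, checksum the whole header of
 * <len> bytes, then store the result (assumed big-endian) in the field. */
static void pp2_set_crc(uint8_t *hdr, size_t len, size_t crc_off)
{
    uint32_t crc;

    memset(hdr + crc_off, 0, 4);
    crc = crc32c(hdr, len);
    hdr[crc_off]     = (uint8_t)(crc >> 24);
    hdr[crc_off + 1] = (uint8_t)(crc >> 16);
    hdr[crc_off + 2] = (uint8_t)(crc >> 8);
    hdr[crc_off + 3] = (uint8_t)crc;
}

/* Receiver side (hypothetical): returns 1 if the checksum matches, else 0.
 * The received value is set aside, the field is zeroed for the computation,
 * then restored so that the caller's buffer is left unchanged. */
static int pp2_check_crc(uint8_t *hdr, size_t len, size_t crc_off)
{
    uint8_t saved[4];
    uint32_t got, calc;

    memcpy(saved, hdr + crc_off, 4);
    got = ((uint32_t)saved[0] << 24) | ((uint32_t)saved[1] << 16) |
          ((uint32_t)saved[2] << 8) | saved[3];
    memset(hdr + crc_off, 0, 4);
    calc = crc32c(hdr, len);
    memcpy(hdr + crc_off, saved, 4);
    return calc == got;
}
```

As a quick sanity check, crc32c() over the nine ASCII bytes "123456789" yields
0xE3069283, the standard check value for CRC32c.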


2.2.4. The PP2_TYPE_SSL type and subtypes

For the type PP2_TYPE_SSL, the value is itself defined like this :

        struct pp2_tlv_ssl {
            uint8_t  client;
            uint32_t verify;
            struct pp2_tlv sub_tlv[0];
        };

The <verify> field will be zero if the client presented a certificate
and it was successfully verified, and non-zero otherwise.

The <client> field is made of a bit field from the following values,
indicating which element is present :

        #define PP2_CLIENT_SSL           0x01
        #define PP2_CLIENT_CERT_CONN     0x02
        #define PP2_CLIENT_CERT_SESS     0x04

Note that each of these elements may lead to extra data being appended to
this TLV using a second level of TLV encapsulation. It is thus possible to
find multiple TLV values after this field. The total length of the pp2_tlv_ssl
TLV will reflect this.

The PP2_CLIENT_SSL flag indicates that the client connected over SSL/TLS. When
this flag is set, the US-ASCII string representation of the TLS version is
appended at the end of the field in the TLV format using the type
PP2_SUBTYPE_SSL_VERSION.

PP2_CLIENT_CERT_CONN indicates that the client provided a certificate over the
current connection. PP2_CLIENT_CERT_SESS indicates that the client provided a
certificate at least once over the TLS session this connection belongs to.

The second level TLV PP2_SUBTYPE_SSL_CIPHER provides the US-ASCII string name
of the cipher used, for example "ECDHE-RSA-AES128-GCM-SHA256".

The second level TLV PP2_SUBTYPE_SSL_SIG_ALG provides the US-ASCII string name
of the algorithm used to sign the certificate presented by the frontend when
the incoming connection was made over an SSL/TLS transport layer, for example
"SHA256".

The second level TLV PP2_SUBTYPE_SSL_KEY_ALG provides the US-ASCII string name
of the algorithm used to generate the key of the certificate presented by the
frontend when the incoming connection was made over an SSL/TLS transport layer,
for example "RSA2048".

In all cases, the string representation (in UTF8) of the Common Name field
(OID: 2.5.4.3) of the client certificate's Distinguished Name is appended
using the TLV format and the type PP2_SUBTYPE_SSL_CN, e.g. "example.com".
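
A receiver could decode the value of a PP2_TYPE_SSL vector along the lines of
the sketch below. Two assumptions are worth stating. First, the fields are read
byte by byte because, on the wire, <verify> is taken to start immediately after
the 1-byte <client> field; a compiler would normally pad the C structure above.
Second, <verify> is assumed to be in network byte order. The result structure
and the function names are hypothetical; only the TLS version and the Common
Name are extracted, and other sub-types are skipped.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PP2_CLIENT_SSL          0x01
#define PP2_SUBTYPE_SSL_VERSION 0x21
#define PP2_SUBTYPE_SSL_CN      0x22

/* Hypothetical result structure; strings are NUL-terminated copies. */
struct pp2_ssl_info {
    uint8_t  client;
    uint32_t verify;
    char     version[32];
    char     cn[256];
};

/* Copies a TLV string value into <dst>, truncating it to fit. */
static void copy_str(char *dst, size_t dstlen, const uint8_t *src, size_t n)
{
    if (n >= dstlen)
        n = dstlen - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Decodes the value of a PP2_TYPE_SSL TLV (<val>, <len> bytes).
 * Returns 0 on success, -1 if the value or a sub-TLV is truncated. */
static int pp2_parse_ssl(const uint8_t *val, size_t len,
                         struct pp2_ssl_info *out)
{
    size_t off = 5;   /* 1 byte <client> + 4 bytes <verify>, no padding */

    memset(out, 0, sizeof(*out));
    if (len < 5)
        return -1;
    out->client = val[0];
    out->verify = ((uint32_t)val[1] << 24) | ((uint32_t)val[2] << 16) |
                  ((uint32_t)val[3] << 8) | val[4];   /* assumed big-endian */

    while (off < len) {
        uint8_t type;
        size_t vlen;

        if (len - off < 3)
            return -1;
        type = val[off];
        vlen = ((size_t)val[off + 1] << 8) | val[off + 2];
        if (vlen > len - off - 3)
            return -1;
        if (type == PP2_SUBTYPE_SSL_VERSION)
            copy_str(out->version, sizeof(out->version), val + off + 3, vlen);
        else if (type == PP2_SUBTYPE_SSL_CN)
            copy_str(out->cn, sizeof(out->cn), val + off + 3, vlen);
        /* other sub-types are ignored */
        off += 3 + vlen;
    }
    return 0;
}
```

The truncation performed by copy_str() comes only from the fixed-size buffers
chosen for this sketch, not from any protocol limit.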


2.2.5. The PP2_TYPE_NETNS type

The type PP2_TYPE_NETNS defines the value as the US-ASCII string representation
of the network namespace's name.


2.2.6. Reserved type ranges

The following range of 16 type values is reserved for application-specific
data and will never be used by the PROXY Protocol. If you need more values,
consider extending the range with a type field in your TLVs.

        #define PP2_TYPE_MIN_CUSTOM     0xE0
        #define PP2_TYPE_MAX_CUSTOM     0xEF

The following range of 8 values is reserved for temporary experimental use by
application developers and protocol designers. The values from the range will
never be used by the PROXY protocol and should not be used by production
functionality.

        #define PP2_TYPE_MIN_EXPERIMENT 0xF0
        #define PP2_TYPE_MAX_EXPERIMENT 0xF7

The following range of 8 values is reserved for future use, potentially to
extend the protocol with multibyte type values.

        #define PP2_TYPE_MIN_FUTURE     0xF8
        #define PP2_TYPE_MAX_FUTURE     0xFF
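
Since the reserved ranges are contiguous, a receiver that wants to log or
reject unexpected types could classify the <type> byte as below. The enum and
function names are hypothetical; only the range constants come from this
section.

```c
#include <stdint.h>

#define PP2_TYPE_MIN_CUSTOM     0xE0
#define PP2_TYPE_MAX_CUSTOM     0xEF
#define PP2_TYPE_MIN_EXPERIMENT 0xF0
#define PP2_TYPE_MAX_EXPERIMENT 0xF7
#define PP2_TYPE_MIN_FUTURE     0xF8

enum pp2_type_class {
    PP2_CLASS_PROTOCOL,    /* 0x00-0xDF: assigned or assignable by the protocol */
    PP2_CLASS_CUSTOM,      /* application-specific data */
    PP2_CLASS_EXPERIMENT,  /* temporary experimental use */
    PP2_CLASS_FUTURE,      /* reserved for future extensions */
};

static enum pp2_type_class pp2_classify_type(uint8_t type)
{
    if (type >= PP2_TYPE_MIN_CUSTOM && type <= PP2_TYPE_MAX_CUSTOM)
        return PP2_CLASS_CUSTOM;
    if (type >= PP2_TYPE_MIN_EXPERIMENT && type <= PP2_TYPE_MAX_EXPERIMENT)
        return PP2_CLASS_EXPERIMENT;
    if (type >= PP2_TYPE_MIN_FUTURE)       /* range extends up to 0xFF */
        return PP2_CLASS_FUTURE;
    return PP2_CLASS_PROTOCOL;
}
```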


3. Implementations

Haproxy 1.5 implements version 1 of the PROXY protocol on both sides :
  - the listening sockets accept the protocol when the "accept-proxy" setting
    is passed to the "bind" keyword. Connections accepted on such listeners
    will behave just as if the source really was the one advertised in the
    protocol. This is true for logging, ACLs, content filtering, transparent
    proxying, etc.

  - the protocol may be used to connect to servers if the "send-proxy" setting
    is present on the "server" line. It is enabled on a per-server basis, so it
    is possible to have it enabled for remote servers only and still have local
    ones behave differently. If the incoming connection was accepted with the
    "accept-proxy" setting, then the relayed information is the one advertised
    in this connection's PROXY line.

  - Haproxy 1.5 also implements version 2 of the PROXY protocol as a sender. In
    addition, a TLV with limited, optional, SSL information has been added.

Stunnel added support for version 1 of the protocol for outgoing connections in
version 4.45.

Stud added support for version 1 of the protocol for outgoing connections on
2011/06/29.

Postfix added support for version 1 of the protocol for incoming connections
in smtpd and postscreen in version 2.10.

A patch is available for Stud [5] to implement version 1 of the protocol on
incoming connections.

Support for versions 1 and 2 of the protocol was added to Varnish 4.1 [6].

Exim added support for version 1 and version 2 of the protocol for incoming
connections on 2014/05/13; it will be released as part of version 4.83.

Squid added support for versions 1 and 2 of the protocol in version 3.5 [7].

Jetty 9.3.0 supports protocol version 1.

The protocol is simple enough that it is expected that other implementations
will appear, especially in environments such as SMTP, IMAP, FTP, RDP where the
client's address is an important piece of information for the server and some
intermediaries. In fact, several proprietary deployments have already done so
on FTP and SMTP servers.

Proxy developers are encouraged to implement this protocol, because it will
make their products much more transparent in complex infrastructures, and will
get rid of a number of issues related to logging and access control.


4. Architectural benefits
4.1. Multiple layers

Using the PROXY protocol instead of transparent proxy provides several benefits
in multiple-layer infrastructures. The first immediate benefit is that it
becomes possible to chain multiple layers of proxies and always present the
original IP address. For instance, let's consider the following 2-layer proxy
architecture :

                Internet
                 ,---.                     | client to PX1:
                (  X  )                    | native protocol
                 `---'                     |
                   |                       V
                +--+--+      +-----+
                | FW1 |------| PX1 |
                +--+--+      +-----+       | PX1 to PX2: PROXY + native
                   |                       V
                +--+--+      +-----+
                | FW2 |------| PX2 |
                +--+--+      +-----+       | PX2 to SRV: PROXY + native
                   |                       V
                +--+--+
                | SRV |
                +-----+

Firewall FW1 receives traffic from internet-based clients and forwards it to
reverse-proxy PX1. PX1 adds a PROXY header then forwards to PX2 via FW2. PX2
is configured to read the PROXY header and to emit it on output. It then joins
the origin server SRV and presents the original client's address there. Since
all TCP connection endpoints are real machines and are not spoofed, there is
no issue for the return traffic to pass via the firewalls and reverse proxies.
Using transparent proxy, this would be quite difficult because the firewalls
would have to deal with the client's address coming from the proxies in the DMZ
and would have to correctly route the return traffic there instead of using the
default route.


4.2. IPv4 and IPv6 integration

The protocol also eases IPv4 and IPv6 integration : if only the first layer
(FW1 and PX1) is IPv6-capable, it is still possible to present the original
client's IPv6 address to the target server even though the whole chain is only
connected via IPv4.


4.3. Multiple return paths

When transparent proxy is used, it is not possible to run multiple proxies
because the return traffic would follow the default route instead of finding
the proper proxy. Some tricks are sometimes possible using multiple server
addresses and policy routing but these are very limited.

Using the PROXY protocol, this problem disappears as the servers don't need
to route to the client, just to the proxy that forwarded the connection. So
it is perfectly possible to run a proxy farm in front of a very large server
farm and have it work effortlessly, even when dealing with multiple sites.

This is particularly important in Cloud-like environments where there is little
choice of binding to random addresses and where the lower processing power per
node generally requires multiple front nodes.

The example below illustrates the following case : virtualized infrastructures
are deployed in 3 datacenters (DC1..DC3). Each DC uses its own VIP which is
handled by the hosting provider's layer 3 load balancer. This load balancer
routes the traffic to a farm of layer 7 SSL/cache offloaders which load balance
among their local servers. The VIPs are advertised by geolocalised DNS so that
clients generally stick to a given DC. Since clients are not guaranteed to
stick to one DC, the L7 load balancing proxies have to know the other DCs'
servers that may be reached via the hosting provider's LAN or via the internet.
The L7 proxies use the PROXY protocol to join the servers behind them, so that
even inter-DC traffic can forward the original client's address and the return
path is unambiguous. This would not be possible using transparent proxy because
most often the L7 proxies would not be able to spoof an address, and this would
never work between datacenters.

                               Internet

          DC1                    DC2                    DC3
         ,---.                  ,---.                  ,---.
        (  X  )                (  X  )                (  X  )
         `---'                  `---'                  `---'
           |    +-------+         |    +-------+         |    +-------+
           +----| L3 LB |         +----| L3 LB |         +----| L3 LB |
           |    +-------+         |    +-------+         |    +-------+
     ------+------- ~ ~ ~   ------+------- ~ ~ ~   ------+-------
     |||||        ||||      |||||        ||||      |||||        ||||
    50 SRV        4 PX     50 SRV        4 PX     50 SRV        4 PX


5. Security considerations

Version 1 of the protocol header (the human-readable format) was designed so as
to be distinguishable from HTTP. It will not parse as a valid HTTP request and
an HTTP request will not parse as a valid proxy request. Version 2 adds a
non-parsable binary signature intended to make many products fail on this
block. The signature was designed to cause immediate failure on HTTP, SSL/TLS,
SMTP, FTP, and POP. It also causes aborts on LDAP and RDP servers (see section
6). That makes it easier to enforce its use under certain connections and at
the same time, it ensures that improperly configured servers are quickly
detected.

Implementers should be very careful not to try to automatically detect
whether they have to decode the header or not; rather, they must only rely
on a configuration parameter. Indeed, if the opportunity is left to a normal
client to use the protocol, it will be able to hide its activities or make them
appear as coming from someone else. However, accepting the header only from a
number of known sources should be safe.
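
One way to follow this advice is to decide whether the peer is allowed to send
a PROXY header at all, based on the address returned by accept() and before
reading anything. The sketch below checks an IPv4 peer against a configured
list of networks. The structure, the function name and the static allow-list
are hypothetical, and IPv6 peers would need an equivalent check. In the usage
intended here, a listener configured to expect the header would refuse
connections from any other source.

```c
#include <stddef.h>
#include <stdint.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical allow-list entry: an IPv4 network in host byte order. */
struct trusted_net {
    uint32_t addr;
    uint32_t mask;
};

/* Returns 1 if the IPv4 peer <sin> belongs to one of the <n> trusted
 * networks, 0 otherwise. Only such peers may send a PROXY header. */
static int pp_source_trusted(const struct sockaddr_in *sin,
                             const struct trusted_net *nets, size_t n)
{
    uint32_t a = ntohl(sin->sin_addr.s_addr);

    for (size_t i = 0; i < n; i++)
        if ((a & nets[i].mask) == (nets[i].addr & nets[i].mask))
            return 1;
    return 0;
}
```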


6. Validation

The version 2 protocol signature has been sent to a wide variety of protocols
and implementations including old ones. The following protocols and products
have been tested to ensure the best possible behavior when the signature was
presented, even with minimal implementations :

  - HTTP :
    - Apache 1.3.33    : connection abort         => pass/optimal
    - Nginx 0.7.69     : 400 Bad Request + abort  => pass/optimal
    - lighttpd 1.4.20  : 400 Bad Request + abort  => pass/optimal
    - thttpd 2.20c     : 400 Bad Request + abort  => pass/optimal
    - mini-httpd-1.19  : 400 Bad Request + abort  => pass/optimal
    - haproxy 1.4.21   : 400 Bad Request + abort  => pass/optimal
    - Squid 3          : 400 Bad Request + abort  => pass/optimal
  - SSL :
    - stud 0.3.47      : connection abort         => pass/optimal
    - stunnel 4.45     : connection abort         => pass/optimal
    - nginx 0.7.69     : 400 Bad Request + abort  => pass/optimal
  - FTP :
    - Pure-ftpd 1.0.20 : 3*500 then 221 Goodbye   => pass/optimal
    - vsftpd 2.0.1     : 3*530 then 221 Goodbye   => pass/optimal
  - SMTP :
    - postfix 2.3      : 3*500 + 221 Bye          => pass/optimal
    - exim 4.69        : 554 + connection abort   => pass/optimal
  - POP :
    - dovecot 1.0.10   : 3*ERR + Logout           => pass/optimal
  - IMAP :
    - dovecot 1.0.10   : 5*ERR + hang             => pass/non-optimal
  - LDAP :
    - openldap 2.3     : abort                    => pass/optimal
  - SSH :
    - openssh 3.9p1    : abort                    => pass/optimal
  - RDP :
    - Windows XP SP3   : abort                    => pass/optimal

This means that most protocols and implementations will not be confused by an
incoming connection exhibiting the protocol signature, which avoids issues when
facing misconfigurations.


7. Future developments

It is possible that the protocol may slightly evolve to present other
information such as the incoming network interface, or the origin addresses in
case of network address translation happening before the first proxy, but this
is not identified as a requirement right now. Some deep thinking has been spent
on this and it appears that trying to add a little more information opens a
Pandora's box with many kinds of information, from MAC addresses to SSL client
certificates, which would make the protocol much more complex. So at this point
it is not planned. Suggestions on improvements are welcome.


8. Contacts and links

Please use w@1wt.eu to send any comments to the author.

The following links were referenced in the document.

[1] http://www.postfix.org/XCLIENT_README.html
[2] http://tools.ietf.org/html/rfc7239
[3] http://www.stunnel.org/
[4] https://github.com/bumptech/stud
[5] https://github.com/bumptech/stud/pull/81
[6] https://www.varnish-cache.org/docs/trunk/phk/ssl_again.html
[7] http://wiki.squid-cache.org/Squid-3.5
[8] https://tools.ietf.org/html/rfc4960#appendix-B
[9] https://tools.ietf.org/rfc/rfc7301.txt
[10] https://www.ietf.org/rfc/rfc3546.txt


9. Sample code

The code below is an example of how a receiver may deal with both versions of
the protocol header for TCP over IPv4 or IPv6. The function is supposed to be
called upon a read event. Addresses may be directly copied into their final
memory location since they're transported in network byte order. The <len>
field of the version 2 header is transported in network byte order too, so it
must be converted with ntohs() before use. The sending side is even simpler
and can easily be deduced from this sample code.

        #include <errno.h>
        #include <stdint.h>
        #include <string.h>
        #include <sys/socket.h>
        #include <netinet/in.h>
        #include <arpa/inet.h>

        struct sockaddr_storage from; /* already filled by accept() */
        struct sockaddr_storage to;   /* already filled by getsockname() */
        const char v2sig[12] = "\x0D\x0A\x0D\x0A\x00\x0D\x0A\x51\x55\x49\x54\x0A";

        /* returns 0 if needs to poll, <0 upon error or >0 if it did the job */
        int read_evt(int fd)
        {
            union {
                struct {
                    char line[108];
                } v1;
                struct {
                    uint8_t sig[12];
                    uint8_t ver_cmd;
                    uint8_t fam;
                    uint16_t len;     /* network byte order */
                    union {
                        struct {  /* for TCP/UDP over IPv4, len = 12 */
                            uint32_t src_addr;
                            uint32_t dst_addr;
                            uint16_t src_port;
                            uint16_t dst_port;
                        } ip4;
                        struct {  /* for TCP/UDP over IPv6, len = 36 */
                             uint8_t  src_addr[16];
                             uint8_t  dst_addr[16];
                             uint16_t src_port;
                             uint16_t dst_port;
                        } ip6;
                        struct {  /* for AF_UNIX sockets, len = 216 */
                             uint8_t src_addr[108];
                             uint8_t dst_addr[108];
                        } unx;
                    } addr;
                } v2;
            } hdr;

            int size, ret;

            do {
                ret = recv(fd, &hdr, sizeof(hdr), MSG_PEEK);
            } while (ret == -1 && errno == EINTR);

            if (ret == -1)
                return (errno == EAGAIN) ? 0 : -1;

            if (ret >= 16 && memcmp(&hdr.v2, v2sig, 12) == 0 &&
                (hdr.v2.ver_cmd & 0xF0) == 0x20) {
                /* <len> must be converted from network byte order */
                size = 16 + ntohs(hdr.v2.len);
                if (ret < size)
                    return -1; /* truncated or too large header */

                switch (hdr.v2.ver_cmd & 0xF) {
                case 0x01: /* PROXY command */
                    switch (hdr.v2.fam) {
                    case 0x11:  /* TCPv4 */
                        ((struct sockaddr_in *)&from)->sin_family = AF_INET;
                        ((struct sockaddr_in *)&from)->sin_addr.s_addr =
                            hdr.v2.addr.ip4.src_addr;
                        ((struct sockaddr_in *)&from)->sin_port =
                            hdr.v2.addr.ip4.src_port;
                        ((struct sockaddr_in *)&to)->sin_family = AF_INET;
                        ((struct sockaddr_in *)&to)->sin_addr.s_addr =
                            hdr.v2.addr.ip4.dst_addr;
                        ((struct sockaddr_in *)&to)->sin_port =
                            hdr.v2.addr.ip4.dst_port;
                        goto done;
                    case 0x21:  /* TCPv6 */
                        ((struct sockaddr_in6 *)&from)->sin6_family = AF_INET6;
                        memcpy(&((struct sockaddr_in6 *)&from)->sin6_addr,
                            hdr.v2.addr.ip6.src_addr, 16);
                        ((struct sockaddr_in6 *)&from)->sin6_port =
                            hdr.v2.addr.ip6.src_port;
                        ((struct sockaddr_in6 *)&to)->sin6_family = AF_INET6;
                        memcpy(&((struct sockaddr_in6 *)&to)->sin6_addr,
                            hdr.v2.addr.ip6.dst_addr, 16);
                        ((struct sockaddr_in6 *)&to)->sin6_port =
                            hdr.v2.addr.ip6.dst_port;
                        goto done;
                    }
                    /* unsupported protocol, keep local connection address */
                    break;
                case 0x00: /* LOCAL command */
                    /* keep local connection address for LOCAL */
                    break;
                default:
                    return -1; /* not a supported command */
                }
            }
            else if (ret >= 8 && memcmp(hdr.v1.line, "PROXY", 5) == 0) {
                /* a v1 header is at most 107 bytes long, CRLF included */
                int lim = (ret < 107) ? ret : 107;
                char *end = memchr(hdr.v1.line, '\r', lim - 1);
                if (!end || end[1] != '\n')
                    return -1; /* partial or invalid header */
                *end = '\0'; /* terminate the string to ease parsing */
                size = end + 2 - hdr.v1.line; /* skip header + CRLF */
                /* parse the V1 header using favorite address parsers like inet_pton.
                 * return -1 upon error, or simply fall through to accept.
                 */
            }
            else {
                /* Wrong protocol */
                return -1;
            }

        done:
            /* we need to consume the appropriate amount of data from the socket */
            do {
                ret = recv(fd, &hdr, size, 0);
            } while (ret == -1 && errno == EINTR);
            return (ret >= 0) ? 1 : -1;
        }
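
Following the remark that the sending side can be deduced from the sample
code, here is a sketch of a sender building a version 2 header for a TCP over
IPv4 connection. The function name is hypothetical, the header carries no TLV,
and both addresses are assumed to be IPv4 sockets such as those returned by
getpeername() and getsockname() on the incoming connection. As in the receiver,
addresses and ports are copied unchanged because they are already in network
byte order; only <len> needs converting.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: writes a 28-byte PROXY v2 header (PROXY command,
 * TCP over IPv4, no TLV) into <out> and returns its size. */
static size_t pp2_build_tcp4(uint8_t out[28],
                             const struct sockaddr_in *src,
                             const struct sockaddr_in *dst)
{
    static const uint8_t sig[12] = {
        0x0D, 0x0A, 0x0D, 0x0A, 0x00, 0x0D, 0x0A, 0x51, 0x55, 0x49, 0x54, 0x0A
    };
    uint16_t len = htons(12);   /* 2 x 4-byte addresses + 2 x 2-byte ports */

    memcpy(out, sig, 12);
    out[12] = 0x21;             /* version 2, PROXY command */
    out[13] = 0x11;             /* TCP over IPv4 */
    memcpy(out + 14, &len, 2);
    /* addresses and ports are already in network byte order */
    memcpy(out + 16, &src->sin_addr.s_addr, 4);
    memcpy(out + 20, &dst->sin_addr.s_addr, 4);
    memcpy(out + 24, &src->sin_port, 2);
    memcpy(out + 26, &dst->sin_port, 2);
    return 28;
}
```

The resulting 28 bytes would be written to the server connection before any
client data is relayed.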