The PROXY protocol - 2010/10/29 - Willy TARREAU
-----------------------------------------------

Relaying TCP connections through proxies generally involves a loss of the
original TCP connection parameters such as source and destination addresses,
ports, and so on. Some protocols make it a little bit easier to transfer such
information. For SMTP, Postfix authors have proposed the XCLIENT protocol which
received broad adoption and is particularly suited to mail exchanges. In HTTP,
we have the non-standard but omnipresent X-Forwarded-For header which relays
information about the original source address, and the less common
X-Original-To which relays information about the destination address.

However, both mechanisms require knowledge of the underlying protocol to be
implemented in intermediaries.

Then comes a new class of products which we'll call "dumb proxies", not because
they don't do anything, but because they're processing protocol-agnostic data.
Stunnel is an example of such a "dumb proxy". It talks raw TCP on one side, and
raw SSL on the other one, and does that reliably.

The problem with such a proxy when it is combined with another one such as
haproxy is to adapt it to talk the higher-level protocol. A patch is available
for Stunnel to make it capable of inserting an X-Forwarded-For header in the
first HTTP request of each incoming connection. Haproxy is able to skip adding
another one when the connection comes from Stunnel, so that Stunnel's address
can be hidden from the servers.

The typical architecture becomes the following one :


      +--------+      HTTP                      :80 +----------+
      | client |  --------------------------------> |          |
      |        |                                    | haproxy, |
      +--------+             +---------+            | 1 or 2   |
     /        /     HTTPS    | stunnel |  HTTP  :81 | listening|
    <________/    ---------> | (server | ---------> |  ports   |
                             |  mode)  |            |          |
                             +---------+            +----------+


The problem appears when haproxy runs with keep-alive on the side towards the
client. The Stunnel patch will only add the X-Forwarded-For header to the first
request of each connection and all subsequent requests will not have it. One
solution could be to improve the patch to make it support keep-alive and parse
all forwarded data, whether they're announced with a Content-Length or with a
Transfer-Encoding, taking care of special methods such as HEAD which announce
data without transferring them, etc. In fact, it would require implementing a
full HTTP stack in Stunnel. It would then become a lot more complex, a lot less
reliable and would no longer be the "dumb proxy" that fits every purpose.

In practice, we don't need to add a header for each request because we'll emit
the exact same information every time : the information related to the
client-side connection. We could then cache that information in haproxy and use
it for every other request. But that becomes dangerous and is still limited to
HTTP only.

Another approach would be to prepend each connection with a line reporting the
characteristics of the other side's connection. This method is a lot simpler to
implement, does not require any protocol-specific knowledge on either side, and
completely fits the purpose. That is finally what we did, with a small patch to
Stunnel and another one to haproxy. We have called this protocol the PROXY
protocol.

The PROXY protocol's goal is to fill the receiver's internal structures with
the information it could have found itself if it had performed the accept()
from the client. Thus right now we support the following :
- INET protocol and family (TCP over IPv4 or IPv6)
- layer 3 source and destination addresses
- layer 4 source and destination ports if any

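To make "the information it could have found itself" concrete, here is a minimal Python sketch (not taken from haproxy or Stunnel) of how an emitter could collect these items from a TCP socket it has just accepted. It assumes the standard `socket` module: the client's address and port come from `getpeername()`, and the address and port the client connected to come from `getsockname()`. The function name `endpoints_from_accepted` is hypothetical.

```python
import socket


def endpoints_from_accepted(conn: socket.socket) -> tuple:
    """Return (family, src_addr, dst_addr, src_port, dst_port) as the proxy
    saw them when it accepted `conn` (hypothetical helper)."""
    if conn.family == socket.AF_INET:
        family = "TCP4"
    elif conn.family == socket.AF_INET6:
        family = "TCP6"
    else:
        # e.g. a UNIX socket: nothing that TCP4/TCP6 could describe.
        return ("UNKNOWN", None, None, None, None)
    # IPv4 gives (host, port); IPv6 gives (host, port, flowinfo, scope_id).
    peer = conn.getpeername()   # the client: layer 3/4 source
    local = conn.getsockname()  # what the client connected to: destination
    return (family, peer[0], local[0], peer[1], local[1])
```

These are exactly the values a receiver loses when a proxy sits in between, which is why the line carries them.
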
Unlike the XCLIENT protocol, the PROXY protocol was designed with limited
extensibility in order to help the receiver parse it very fast, while keeping
it human-readable for better debugging possibilities. So it consists of exactly
the following block prepended before any data flowing from the dumb proxy to
the next hop :

- a string identifying the protocol : "PROXY" ( \x50 \x52 \x4F \x58 \x59 )

- exactly one space : " " ( \x20 )

- a string indicating the proxied INET protocol and family. At the moment,
  only "TCP4" ( \x54 \x43 \x50 \x34 ) for TCP over IPv4, and "TCP6"
  ( \x54 \x43 \x50 \x36 ) for TCP over IPv6 are allowed. Unsupported or
  unknown protocols must be reported with the name "UNKNOWN" ( \x55 \x4E \x4B
  \x4E \x4F \x57 \x4E ). The remaining fields of the line are then optional
  and may be ignored, until the CRLF is found.

- exactly one space : " " ( \x20 )

- the layer 3 source address in its canonical format. IPv4 addresses must be
  indicated as a series of exactly 4 integers in the range [0..255] inclusive
  written in decimal representation separated by exactly one dot between each
  other. Leading zeroes are not permitted in front of numbers in order to
  avoid any possible confusion with octal numbers. IPv6 addresses must be
  indicated as a series of groups of 4 hexadecimal digits (upper or lower
  case) delimited by colons, with one double-colon sequence accepted to
  replace the longest run of consecutive zeroes. The total number of decoded
  bits must be exactly 128. The advertised protocol family dictates what
  format to use.

- exactly one space : " " ( \x20 )

- the layer 3 destination address in its canonical format. It is the same
  format as the layer 3 source address and matches the same family.

- exactly one space : " " ( \x20 )

- the TCP source port represented as a decimal integer in the range
  [0..65535] inclusive. Leading zeroes are not permitted in front of numbers
  in order to avoid any possible confusion with octal numbers.

- exactly one space : " " ( \x20 )

- the TCP destination port represented as a decimal integer in the range
  [0..65535] inclusive. Leading zeroes are not permitted in front of numbers
  in order to avoid any possible confusion with octal numbers.

- the CRLF sequence ( \x0D \x0A )

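The layout above is simple enough for an emitter to build the line with a single format string. The following Python sketch is illustrative only (it is not the Stunnel patch); `build_proxy_line` and `build_unknown_line` are hypothetical names. It relies on Python's `ipaddress` module and prints addresses in their exploded form, which for IPv6 means eight groups of exactly 4 hexadecimal digits without "::", the most literal reading of the address rule; many tools print a compressed form such as `2001:db8::1` instead.

```python
import ipaddress


def build_proxy_line(src_addr: str, dst_addr: str,
                     src_port: int, dst_port: int) -> bytes:
    """Build a PROXY line for a TCP connection (hypothetical helper)."""
    if "%" in src_addr or "%" in dst_addr:
        # Zone identifiers ("fe80::1%eth0") have no place in the format.
        raise ValueError("scoped IPv6 addresses are not representable")
    src = ipaddress.ip_address(src_addr)
    dst = ipaddress.ip_address(dst_addr)
    if src.version != dst.version:
        raise ValueError("source and destination must share one family")
    for port in (src_port, dst_port):
        if not 0 <= port <= 65535:
            raise ValueError(f"port out of range: {port}")
    proto = "TCP4" if src.version == 4 else "TCP6"
    # .exploded gives dotted decimal without leading zeroes for IPv4, and
    # eight groups of 4 hex digits for IPv6; "%d" never adds leading zeroes.
    line = "PROXY %s %s %s %d %d\r\n" % (
        proto, src.exploded, dst.exploded, src_port, dst_port)
    return line.encode("ascii")


def build_unknown_line() -> bytes:
    """Line for a connection that is neither TCP over IPv4 nor over IPv6."""
    return b"PROXY UNKNOWN\r\n"
```

For example, `build_proxy_line("192.168.0.1", "192.168.0.11", 56324, 443)` returns `b"PROXY TCP4 192.168.0.1 192.168.0.11 56324 443\r\n"`. Mixing an IPv4 and an IPv6 address is rejected, since a single family token has to describe both.
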
The receiver MUST be configured to only receive this protocol and MUST NOT try
to guess whether the line is prepended or not. That means that the protocol
explicitly prevents port sharing between public and private access. Otherwise
any client connecting directly could prepend a forged line and spoof its
addresses, which would be a big security issue. The receiver should ensure
proper access filtering so that only trusted proxies are allowed to use this
protocol. The receiver must wait for the CRLF sequence to decode the addresses
in order to ensure they are complete. Any sequence which does not exactly match
the protocol must be discarded and cause a connection abort. It is recommended
to abort the connection as soon as possible so that the emitter notices the
anomaly.

If the announced transport protocol is "UNKNOWN", then the receiver knows that
the emitter talks the correct protocol, and may or may not decide to accept the
connection and use the real connection's parameters as if there was no such
protocol on the wire.

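The receiver-side rules above, together with the "UNKNOWN" case, could be implemented along the following lines. This is a minimal Python sketch, not haproxy's parser: `parse_proxy_line`, `ProxyInfo` and the helper names are hypothetical. The function expects one complete line, CRLF included (i.e. the receiver has already waited for it), and signals any mismatch with `ValueError` so the caller can abort the connection. IPv6 validation is delegated to Python's `ipaddress` module, which is somewhat more lenient than the wording above (it also accepts groups shorter than 4 digits and an embedded IPv4 tail).

```python
import ipaddress
from typing import NamedTuple, Optional


class ProxyInfo(NamedTuple):
    family: str                      # "TCP4", "TCP6" or "UNKNOWN"
    src_addr: Optional[str] = None
    dst_addr: Optional[str] = None
    src_port: Optional[int] = None
    dst_port: Optional[int] = None


def _port(text: str) -> int:
    # Decimal in [0..65535], no leading zeroes ("0" itself is fine).
    if not text.isdigit() or (len(text) > 1 and text[0] == "0"):
        raise ValueError(f"bad port: {text!r}")
    if int(text) > 65535:
        raise ValueError(f"port out of range: {text!r}")
    return int(text)


def _ipv4(text: str) -> str:
    # Exactly 4 decimal integers in [0..255], no leading zeroes.
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"bad IPv4 address: {text!r}")
    for part in parts:
        if (not part.isdigit() or (len(part) > 1 and part[0] == "0")
                or int(part) > 255):
            raise ValueError(f"bad IPv4 address: {text!r}")
    return text


def _ipv6(text: str) -> str:
    # Assumption: ipaddress checks the 128-bit total and the single "::".
    if "%" in text:  # reject zone identifiers such as "fe80::1%eth0"
        raise ValueError(f"bad IPv6 address: {text!r}")
    ipaddress.IPv6Address(text)  # AddressValueError is a ValueError
    return text


def parse_proxy_line(line: bytes) -> ProxyInfo:
    """Strictly parse one complete PROXY line; raise ValueError otherwise."""
    if not line.endswith(b"\r\n"):
        raise ValueError("line must end with CRLF")
    text = line[:-2].decode("ascii")  # UnicodeDecodeError is a ValueError
    if "\r" in text or "\n" in text:
        raise ValueError("stray CR or LF")
    if text == "PROXY UNKNOWN" or text.startswith("PROXY UNKNOWN "):
        # Remaining fields are optional and ignored; the caller may fall
        # back to the real connection's parameters.
        return ProxyInfo("UNKNOWN")
    fields = text.split(" ")  # exactly one space between fields
    if len(fields) != 6 or fields[0] != "PROXY":
        raise ValueError("malformed PROXY line")
    _, family, src, dst, sport, dport = fields
    if family == "TCP4":
        check = _ipv4
    elif family == "TCP6":
        check = _ipv6
    else:
        raise ValueError(f"unsupported family: {family!r}")
    return ProxyInfo(family, check(src), check(dst), _port(sport), _port(dport))
```

The IPv4 check is written by hand rather than delegated to `ipaddress`, because only recent Python versions reject leading zeroes there, and that rule is what guards against the octal ambiguity.
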
An example of such a line before an HTTP request would look like this (CR
marked as "\r" and LF marked as "\n") :

    PROXY TCP4 192.168.0.1 192.168.0.11 56324 443\r\n
    GET / HTTP/1.1\r\n
    Host: 192.168.0.11\r\n
    \r\n

For the emitter, the line is easy to put into the output buffers once the
connection is established. For the receiver, once the line is parsed, it's
easy to strip it from the input buffers.

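On the receiving side, stripping the line amounts to cutting the input buffer at the first CRLF. The Python sketch below is a hypothetical helper (`split_proxy_header` is not an haproxy function): it returns `None` while the CRLF has not arrived yet, and raises `ValueError` as soon as the bytes received so far cannot be the start of a PROXY line, so that the connection can be aborted early as recommended above.

```python
from typing import Optional, Tuple

SIGNATURE = b"PROXY "


def split_proxy_header(buf: bytes) -> Optional[Tuple[bytes, bytes]]:
    """Split a receive buffer into (PROXY line, remaining payload).

    Returns None while more bytes are needed; raises ValueError on data
    that can no longer match the protocol.
    """
    # Compare only as many signature bytes as have been received so far.
    head = buf[:len(SIGNATURE)]
    if head != SIGNATURE[:len(head)]:
        raise ValueError("not a PROXY line")
    end = buf.find(b"\r\n")
    if end < 0:
        if b"\n" in buf:
            raise ValueError("LF without CR")
        return None  # keep waiting for the CRLF
    line, rest = buf[:end + 2], buf[end + 2:]
    if b"\n" in line[:-1]:
        raise ValueError("LF without CR")
    return line, rest
```

Applied to the example above, the first element is the PROXY line and `rest` starts at `GET / HTTP/1.1\r\n`, i.e. the data the receiver processes as if the line had never been on the wire. The line itself would still need to be validated field by field. The text does not define a maximum line length, so this sketch enforces none; a real receiver would probably bound how much it buffers while waiting.
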
We have a patch available for recent versions of Stunnel that gives it the
ability to be an emitter. The feature is called "send-proxy" there. The code
for the receiving side has been merged into haproxy and is enabled using the
"accept-proxy" keyword on a "bind" statement. Haproxy will use the transport
information from the PROXY protocol for logging, ACLs, etc., wherever
information about the original connection is required.

The protocol may slightly evolve to present other information such as the
incoming network interface, or the origin addresses in case of network address
translation happening before the first proxy, but this is not identified as a
requirement right now.
				159	--