                            -----------------------
                             HAProxy Starter Guide
                            -----------------------
                                  version 2.0


This document is an introduction to HAProxy for all those who don't know it, as
well as for those who want to re-discover it when they know older versions. Its
primary focus is to provide users with all the elements to decide if HAProxy is
the product they're looking for or not. Advanced users may find here some parts
of solutions to some ideas they had just because they were not aware of a given
new feature. Some sizing information is also provided, the product's lifecycle
is explained, and comparisons with partially overlapping products are provided.

This document doesn't provide any configuration help or hints, but it explains
where to find the relevant documents. The summary below is meant to help you
search sections by name and navigate through the document.

Note to documentation contributors :
    This document is formatted with 80 columns per line, with even number of
    spaces for indentation and without tabs. Please follow these rules strictly
    so that it remains easily printable everywhere. If you add sections, please
    update the summary below for easier searching.


Summary
-------

1.    Available documentation

2.    Quick introduction to load balancing and load balancers

3.    Introduction to HAProxy
3.1.      What HAProxy is and is not
3.2.      How HAProxy works
3.3.      Basic features
3.3.1.        Proxying
3.3.2.        SSL
3.3.3.        Monitoring
3.3.4.        High availability
3.3.5.        Load balancing
3.3.6.        Stickiness
3.3.7.        Sampling and converting information
3.3.8.        Maps
3.3.9.        ACLs and conditions
3.3.10.       Content switching
3.3.11.       Stick-tables
3.3.12.       Formatted strings
3.3.13.       HTTP rewriting and redirection
3.3.14.       Server protection
3.3.15.       Logging
3.3.16.       Statistics
3.4.      Advanced features
3.4.1.        Management
3.4.2.        System-specific capabilities
3.4.3.        Scripting
3.5.      Sizing
3.6.      How to get HAProxy

4.    Companion products and alternatives
4.1.      Apache HTTP server
4.2.      NGINX
4.3.      Varnish
4.4.      Alternatives


1. Available documentation
--------------------------

The complete HAProxy documentation is contained in the following documents.
Please be sure to consult the relevant documentation to save time and to get the
most accurate response to your needs. Also please refrain from sending questions
to the mailing list whose answers are already present in these documents.

  - intro.txt (this document) : it presents the basics of load balancing,
    HAProxy as a product, what it does, what it doesn't do, some known traps to
    avoid, some OS-specific limitations, how to get it, how it evolves, how to
    ensure you're running with all known fixes, how to update it, complements
    and alternatives.

  - management.txt : it explains how to start haproxy, how to manage it at
    runtime, how to manage it on multiple nodes, and how to proceed with
    seamless upgrades.

  - configuration.txt : the reference manual details all configuration keywords
    and their options. It is used when a configuration change is needed.

  - coding-style.txt : this is for developers who want to propose some code to
    the project. It explains the style to adopt for the code. It is not very
    strict and not all the code base completely respects it, but contributions
    which diverge too much from it will be rejected.

  - proxy-protocol.txt : this is the de-facto specification of the PROXY
    protocol which is implemented by HAProxy and a number of third party
    products.

  - README : how to build HAProxy from sources


2. Quick introduction to load balancing and load balancers
----------------------------------------------------------

Load balancing consists of aggregating multiple components in order to achieve
a total processing capacity above each component's individual capacity, without
any intervention from the end user and in a scalable way. This results in more
operations being performed simultaneously in the time it takes a component to
perform only one. A single operation however will still be performed on a single
component at a time and will not get faster than without load balancing. It
always requires at least as many operations as available components and an
efficient load balancing mechanism to make use of all components and to fully
benefit from the load balancing. A good example of this is the number of lanes
on a highway which allows as many cars to pass during the same time frame
without increasing their individual speed.
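
As a rough illustration of the highway analogy, the Python sketch below counts
how many operations complete within a time frame when identical components work
in parallel. The function name and the figures are made up for the example; it
only shows that capacity grows with the number of components while a single
operation is not made any faster.

```python
def completed_operations(components: int, pending_ops: int,
                         op_duration_ms: int, window_ms: int) -> int:
    """Operations finished within window_ms by identical parallel components."""
    # Each component can chain this many operations during the window.
    rounds = window_ms // op_duration_ms
    capacity = components * rounds
    # Extra components are useless when there are not enough operations.
    return min(pending_ops, capacity)


if __name__ == "__main__":
    for lanes in (1, 2, 4):
        done = completed_operations(lanes, 100, 500, 10_000)
        print(f"{lanes} component(s): {done} operations in 10 seconds")
```

With a single pending operation the result is 1 whatever the number of
components, which matches the remark that at least as many operations as
components are needed to benefit from load balancing.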

Examples of load balancing :

  - Process scheduling in multi-processor systems
  - Link load balancing (e.g. EtherChannel, Bonding)
  - IP address load balancing (e.g. ECMP, DNS round-robin)
  - Server load balancing (via load balancers)

The mechanism or component which performs the load balancing operation is
called a load balancer. In web environments these components are called a
"network load balancer", and more commonly a "load balancer" given that this
activity is by far the best known case of load balancing.

A load balancer may act :

  - at the link level : this is called link load balancing, and it consists of
    choosing what network link to send a packet to;

  - at the network level : this is called network load balancing, and it
    consists of choosing what route a series of packets will follow;

  - at the server level : this is called server load balancing and it consists
    of deciding what server will process a connection or request.

Two distinct technologies exist and address different needs, though with some
overlapping. In each case it is important to keep in mind that load balancing
consists of diverting the traffic from its natural flow and that doing so always
requires a minimum of care to maintain the required level of consistency between
all routing decisions.

The first one acts at the packet level and processes packets more or less
individually. There is a 1-to-1 relation between input and output packets, so
it is possible to follow the traffic on both sides of the load balancer using a
regular network sniffer. This technology can be very cheap and extremely fast.
It is usually implemented in hardware (ASICs) in order to reach line rate, such
as switches doing ECMP. Usually stateless, it can also be stateful (in which
case it considers the session a packet belongs to, and is called layer4-LB or
L4), may support DSR (direct server return, without passing through the LB
again) if the packets were not modified, but provides almost no content
awareness. This technology is very well suited to network-level load balancing,
though it is sometimes used for very basic server load balancing at high speed.

The second one acts on session contents. It requires that the input stream is
reassembled and processed as a whole. The contents may be modified, and the
output stream is segmented into new packets. For this reason it is generally
performed by proxies and they're often called layer 7 load balancers or L7.
This implies that there are two distinct connections, one on each side, and
that there is no relation between input and output packet sizes or counts.
Clients and servers are not required to use the same protocol (for example IPv4
vs IPv6, clear vs SSL). The operations are always stateful, and the return
traffic must pass through the load balancer. The extra processing comes with a
cost so it's not always possible to achieve line rate, especially with small
packets. On the other hand, it offers wide possibilities and is generally
achieved by pure software, even if embedded into hardware appliances. This
technology is very well suited for server load balancing.

Packet-based load balancers are generally deployed in cut-through mode, so they
are installed on the normal path of the traffic and divert it according to the
configuration. The return traffic doesn't necessarily pass through the load
balancer. Some modifications may be applied to the network destination address
in order to direct the traffic to the proper destination. In this case, it is
mandatory that the return traffic passes through the load balancer. If the
routes don't make this possible, the load balancer may also replace the
packets' source address with its own in order to force the return traffic to
pass through it.

Proxy-based load balancers are deployed as a server with their own IP addresses
and ports, without architecture changes. Sometimes this requires performing some
adaptations to the applications so that clients are properly directed to the
load balancer's IP address and not directly to the server's. Some load balancers
may have to adjust some servers' responses to make this possible (e.g. the HTTP
Location header field used in HTTP redirects). Some proxy-based load balancers
may intercept traffic for an address they don't own, and spoof the client's
address when connecting to the server. This allows them to be deployed as if
they were a regular router or firewall, in a cut-through mode very similar to
the packet-based load balancers. This is particularly appreciated for products
which combine both packet mode and proxy mode. In this case DSR is obviously
still not possible and the return traffic still has to be routed back to the
load balancer.

A very scalable layered approach would consist of having a front router which
receives traffic from multiple load balanced links, and uses ECMP to distribute
this traffic to a first layer of multiple stateful packet-based load balancers
(L4). These L4 load balancers in turn pass the traffic to an even larger number
of proxy-based load balancers (L7), which have to parse the contents to decide
what server will ultimately receive the traffic.
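
The first stage of such a layered setup can be pictured as a hash computed over
the fields which identify a flow, so that all packets of one connection reach
the same L4 load balancer. The Python sketch below is only an illustration
under that assumption: real routers use their own hardware hash functions, and
the function and node names here are hypothetical.

```python
import hashlib


def pick_next_hop(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                  proto: str, next_hops: list) -> str:
    """Map a flow 5-tuple to one next hop; the same flow gets the same hop."""
    key = f"{proto}|{src_ip}|{src_port}|{dst_ip}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


if __name__ == "__main__":
    l4_nodes = ["l4-a", "l4-b", "l4-c"]
    for port in (40001, 40002, 40003):
        hop = pick_next_hop("192.0.2.8", port, "198.51.100.7", 443, "tcp",
                            l4_nodes)
        print(port, "->", hop)
```

No state is kept per flow here, which is what makes this kind of distribution
cheap, but also why the choice changes for many flows when the list of next
hops changes.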

The number of components and possible paths for the traffic increases the risk
of failure; in very large environments, it is even normal to permanently have
a few faulty components being fixed or replaced. Load balancing done without
awareness of the whole stack's health significantly degrades availability. For
this reason, any sane load balancer will verify that the components it intends
to deliver the traffic to are still alive and reachable, and it will stop
delivering traffic to faulty ones. This can be achieved using various methods.

The most common one consists of periodically sending probes to ensure the
component is still operational. These probes are called "health checks". They
must be representative of the type of failure to address. For example a
ping-based check will not detect that a web server has crashed and doesn't
listen to a port anymore, while a connection to the port will verify this, and
a more advanced request may even validate that the server still works and that
the database it relies on is still accessible. Health checks often involve a
few retries to cover for occasional measuring errors. The period between checks
must be small enough to ensure the faulty component is not used for too long
after an error occurs.
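
A minimal Python sketch of this principle is given here, assuming a plain TCP
connection as the probe and a fixed number of consecutive results before the
state changes. The names tcp_probe, HealthState, rise and fall are chosen for
the example only and do not describe how any particular load balancer is
implemented.

```python
import socket


def tcp_probe(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


class HealthState:
    """Server state which only flips after `fall` consecutive failed probes
    or `rise` consecutive successful ones, to absorb measuring errors."""

    def __init__(self, rise: int = 2, fall: int = 3):
        self.rise = rise
        self.fall = fall
        self.up = True
        self._streak = 0  # consecutive results contradicting the current state

    def record(self, probe_ok: bool) -> bool:
        if probe_ok == self.up:
            self._streak = 0
        else:
            self._streak += 1
            needed = self.fall if self.up else self.rise
            if self._streak >= needed:
                self.up = probe_ok
                self._streak = 0
        return self.up
```

A checker would call something like state.record(tcp_probe(host, port)) once
per period, and stop sending traffic to the server while state.up is False.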

Other methods consist of sampling the production traffic sent to a destination
to observe if it is processed correctly or not, and to evict the components
which return inappropriate responses. However this requires sacrificing a part
of the production traffic and this is not always acceptable. A combination of
these two mechanisms provides the best of both worlds, with both of them being
used to detect a fault, and only health checks to detect the end of the fault.
A last method involves centralized reporting : a central monitoring agent
periodically updates all load balancers about all components' state. This gives
a global view of the infrastructure to all components, though sometimes with
less accuracy or responsiveness. It's best suited for environments with many
load balancers and many servers.

Layer 7 load balancers also face another challenge known as stickiness or
persistence. The principle is that they generally have to direct multiple
subsequent requests or connections from the same origin (such as an end user) to
the same target. The best known example is the shopping cart on an online
store. If each click leads to a new connection, the user must always be sent
to the server which holds their shopping cart. Content-awareness makes it easier
to spot some elements in the request to identify the server to deliver it to,
but that's not always enough. For example if the source address is used as a
key to pick a server, it can be decided that a hash-based algorithm will be
used and that a given IP address will always be sent to the same server based
on the remainder of dividing the address by the number of available servers.
But if one server fails, the result changes and all users are suddenly sent to
a different server and lose their shopping cart. The solution to this issue
consists of memorizing the chosen target so that each time the same visitor is
seen, they are directed to the same server regardless of the number of
available servers. The information may be stored in the load balancer's memory,
in which case it may have to be replicated to other load balancers if it's not
alone, or it may be stored in the client's memory using various methods
provided that the client is able to present this information back with every
request (cookie insertion, redirection to a sub-domain, etc). This mechanism
provides the extra benefit of not having to rely on unstable or unevenly
distributed information (such as the source IP address). This is in fact the
strongest reason to adopt a layer 7 load balancer instead of a layer 4 one.
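
The effect of a server failure on a purely hash-based choice, and the way a
memorized choice avoids it, can be sketched in Python as follows. This is a
simplified model written for this explanation: pick_by_hash and StickyTable are
hypothetical names, and the table is kept in the load balancer's memory only.

```python
import ipaddress


def pick_by_hash(client_ip: str, servers: list) -> str:
    """Stateless choice: address modulo the number of available servers."""
    return servers[int(ipaddress.ip_address(client_ip)) % len(servers)]


class StickyTable:
    """Remembers which server was chosen for each client address."""

    def __init__(self):
        self._chosen = {}

    def pick(self, client_ip: str, servers: list) -> str:
        server = self._chosen.get(client_ip)
        if server not in servers:  # first visit, or the server is gone
            server = pick_by_hash(client_ip, servers)
            self._chosen[client_ip] = server
        return server


if __name__ == "__main__":
    table = StickyTable()
    before = ["s1", "s2", "s3"]
    after = ["s1", "s2"]  # s3 has failed
    print(pick_by_hash("192.0.2.8", before), pick_by_hash("192.0.2.8", after))
    print(table.pick("192.0.2.8", before), table.pick("192.0.2.8", after))
```

The stateless choice moves this client from s2 to s1 when s3 disappears, even
though s2 is still alive, while the table keeps sending it to s2.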

In order to extract information such as a cookie, a host header field, a URL
or whatever, a load balancer may need to decrypt SSL/TLS traffic and even
possibly to re-encrypt it when passing it to the server. This expensive task
explains why in some high-traffic infrastructures, sometimes there may be a
lot of load balancers.

Since a layer 7 load balancer may perform a number of complex operations on the
traffic (decrypt, parse, modify, match cookies, decide what server to send to,
etc), it can definitely cause some trouble and will very commonly be accused of
being responsible for a lot of trouble that it only revealed. Often it will be
discovered that servers are unstable and periodically go up and down, or for
web servers, that they deliver pages with some hard-coded links forcing the
clients to connect directly to one specific server without passing via the load
balancer, or that they take ages to respond under high load causing timeouts.
That's why logging is an extremely important aspect of layer 7 load balancing.
Once a problem is reported, it is important to figure out whether the load
balancer made a wrong decision and, if so, why, so that it doesn't happen
anymore.


3. Introduction to HAProxy
--------------------------

HAProxy is written as "HAProxy" to designate the product, and as "haproxy" to
designate the executable program, software package or a process. However, both
are commonly used for both purposes, and are pronounced H-A-Proxy. Very early,
"haproxy" used to stand for "high availability proxy" and the name was written
in two separate words, though by now it means nothing else than "HAProxy".


3.1. What HAProxy is and isn't
------------------------------

HAProxy is :

  - a TCP proxy : it can accept a TCP connection from a listening socket,
    connect to a server and attach these sockets together allowing traffic to
    flow in both directions;

  - an HTTP reverse-proxy (called a "gateway" in HTTP terminology) : it presents
    itself as a server, receives HTTP requests over connections accepted on a
    listening TCP socket, and passes the requests from these connections to
    servers using different connections.

  - an SSL terminator / initiator / offloader : SSL/TLS may be used on the
    connection coming from the client, on the connection going to the server,
    or even on both connections.

  - a TCP normalizer : since connections are locally terminated by the operating
    system, there is no relation between both sides, so abnormal traffic such as
    invalid packets, flag combinations, window advertisements, sequence numbers,
    incomplete connections (SYN floods), etc. will not be passed to the other
    side. This protects fragile TCP stacks from protocol attacks, and also
    allows optimizing the connection parameters with the client without having
    to modify the servers' TCP stack settings.

  - an HTTP normalizer : when configured to process HTTP traffic, only valid
    complete requests are passed. This protects against a lot of protocol-based
    attacks. Additionally, protocol deviations for which there is a tolerance
    in the specification are fixed so that they don't cause problems on the
    servers (e.g. multiple-line headers).

  - an HTTP fixing tool : it can modify / fix / add / remove / rewrite the URL
    or any request or response header. This helps fix interoperability issues
    in complex environments.

  - a content-based switch : it can consider any element from the request to
    decide what server to pass the request or connection to. Thus it is possible
    to handle multiple protocols over the same port (e.g. HTTP, HTTPS, SSH).

  - a server load balancer : it can load balance TCP connections and HTTP
    requests. In TCP mode, load balancing decisions are taken for the whole
    connection. In HTTP mode, decisions are taken per request.

  - a traffic regulator : it can apply some rate limiting at various points,
    protect the servers against overloading, adjust traffic priorities based on
    the contents, and even pass such information to lower layers and outer
    network components by marking packets.

  - a protection against DDoS and service abuse : it can maintain a large number
    of statistics per IP address, URL, cookie, etc and detect when an abuse is
    happening, then take action (slow down the offenders, block them, send them
    to outdated contents, etc).

  - an observation point for network troubleshooting : due to the precision of
    the information reported in logs, it is often used to narrow down some
    network-related issues.

  - an HTTP compression offloader : it can compress responses which were not
    compressed by the server, thus reducing the page load time for clients with
    poor connectivity or using high-latency, mobile networks.
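
To give an idea of what fixing a tolerated protocol deviation can look like,
the Python sketch below joins multiple-line (folded) header lines into single
lines, as mentioned for the HTTP normalizer. It is an illustration written for
this guide and not HAProxy's actual code; the function name unfold_headers is
an assumption, and header lines are expected without their trailing CRLF.

```python
def unfold_headers(raw_lines: list) -> list:
    """Join folded (multiple-line) HTTP header lines into single lines."""
    unfolded = []
    for line in raw_lines:
        if line[:1] in (" ", "\t") and unfolded:
            # A line starting with a space or a tab continues the previous
            # header: append it after a single space.
            unfolded[-1] = unfolded[-1].rstrip() + " " + line.strip()
        else:
            unfolded.append(line)
    return unfolded


if __name__ == "__main__":
    headers = ["Host: example.com", "X-Long: first", "\tsecond", "Accept: */*"]
    for header in unfold_headers(headers):
        print(header)
```

A server which only understands single-line headers then receives
"X-Long: first second" instead of two separate lines.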

HAProxy is not :

  - an explicit HTTP proxy, i.e. the proxy that browsers use to reach the
    internet. There is excellent open-source software dedicated to this task,
    such as Squid. However HAProxy can be installed in front of such a proxy to
    provide load balancing and high availability.

  - a caching proxy : it will return the contents received from the server as-is
    and will not interfere with any caching policy. There is excellent
    open-source software for this task such as Varnish. HAProxy can be installed
    in front of such a cache to provide SSL offloading, and scalability through
    smart load balancing.

  - a data scrubber : it will not modify the body of requests or responses.

  - a web server : during startup, it isolates itself inside a chroot jail and
    drops its privileges, so that it will not perform any single file-system
    access once started. As such it cannot be turned into a web server. There
    is excellent open-source software for this such as Apache or Nginx, and
    HAProxy can be installed in front of them to provide load balancing and
    high availability.

  - a packet-based load balancer : it will not see IP packets nor UDP datagrams,
    will not perform NAT, let alone DSR. These are tasks for lower layers.
    Some kernel-based components such as IPVS (Linux Virtual Server) already do
    this pretty well and complement HAProxy perfectly.


3.2. How HAProxy works
----------------------

HAProxy is a single-threaded, event-driven, non-blocking engine combining a very
fast I/O layer with a priority-based scheduler. As it is designed with a data
forwarding goal in mind, its architecture is optimized to move data as fast as
possible with the fewest possible operations. As such it implements a layered
model offering bypass mechanisms at each level ensuring data doesn't reach
higher levels unless needed. Most of the processing is performed in the kernel,
and HAProxy does its best to help the kernel do the work as fast as possible by
giving some hints or by avoiding certain operations when it guesses they could
be grouped later. As a result, typical figures show 15% of the processing time
spent in HAProxy versus 85% in the kernel in TCP or HTTP close mode, and about
30% for HAProxy versus 70% for the kernel in HTTP keep-alive mode.

A single process can run many proxy instances; configurations as large as
300000 distinct proxies in a single process were reported to run fine. Thus
there is usually no need to start more than one process for all instances.

It is possible to make HAProxy run over multiple processes, but it comes with
a few limitations. In general it doesn't make sense in HTTP close or TCP modes
because the kernel-side doesn't scale very well with some operations such as
connect(). It scales pretty well for HTTP keep-alive mode but the performance
that can be achieved out of a single process generally exceeds common needs
by an order of magnitude. It does however make sense when used as an SSL
offloader, and this feature is well supported in multi-process mode.

HAProxy only requires the haproxy executable and a configuration file to run.
For logging it is highly recommended to have a properly configured syslog daemon
and log rotations in place. The configuration files are parsed before starting,
then HAProxy tries to bind all listening sockets, and refuses to start if
anything fails. Past this point it cannot fail anymore. This means that there
are no runtime failures and that if it agrees to start, it will work until it
is stopped.

Once HAProxy is started, it does exactly 3 things :

  - process incoming connections;

  - periodically check the servers' status (known as health checks);

  - exchange information with other haproxy nodes.

Processing incoming connections is by far the most complex task as it depends
on a lot of configuration possibilities, but it can be summarized as the 9 steps
below :

  - accept incoming connections from listening sockets that belong to a
    configuration entity known as a "frontend", which references one or multiple
    listening addresses;

  - apply the frontend-specific processing rules to these connections that may
    result in blocking them, modifying some headers, or intercepting them to
    execute some internal applets such as the statistics page or the CLI;

  - pass these incoming connections to another configuration entity representing
    a server farm known as a "backend", which contains the list of servers and
    the load balancing strategy for this server farm;

  - apply the backend-specific processing rules to these connections;

  - decide which server to forward the connection to according to the load
    balancing strategy;

  - apply the backend-specific processing rules to the response data;

  - apply the frontend-specific processing rules to the response data;

  - emit a log to report what happened in fine detail;

  - in HTTP, loop back to the second step to wait for a new request, otherwise
    close the connection.
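
The order of these steps can be modeled with a few lines of Python. This is a
conceptual sketch only, not a description of HAProxy's internals: the Frontend
and Backend classes, the rule callables, the round-robin choice and the
send_to_server and log parameters are all assumptions made for the example.
Accepting the connection (first step) and looping for the next request (last
step) are left to the caller.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Backend:
    name: str
    servers: list
    request_rules: list = field(default_factory=list)
    response_rules: list = field(default_factory=list)

    def __post_init__(self):
        self._rr = cycle(self.servers)  # round-robin as a stand-in strategy

    def pick_server(self):
        return next(self._rr)


@dataclass
class Frontend:
    name: str
    default_backend: Backend
    request_rules: list = field(default_factory=list)
    response_rules: list = field(default_factory=list)


def handle_request(frontend, request, send_to_server, log):
    for rule in frontend.request_rules:      # frontend rules on the request
        request = rule(request)
        if request is None:                  # a rule decided to block it
            log(f"{frontend.name}: request blocked")
            return None
    backend = frontend.default_backend       # pass to the backend
    for rule in backend.request_rules:       # backend rules on the request
        request = rule(request)
    server = backend.pick_server()           # load balancing decision
    response = send_to_server(server, request)
    for rule in backend.response_rules:      # backend rules on the response
        response = rule(response)
    for rule in frontend.response_rules:     # frontend rules on the response
        response = rule(response)
    log(f"{frontend.name} {backend.name}/{server} {request['path']}")
    return response
```

Here a request is a plain dictionary with a "path" key, and a rule is any
callable returning the possibly modified request or response, or None to block
a request at the frontend.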

Frontends and backends are sometimes considered as half-proxies, since they only
look at one side of an end-to-end connection; the frontend only cares about the
clients while the backend only cares about the servers. HAProxy also supports
full proxies which are exactly the union of a frontend and a backend. When HTTP
processing is desired, the configuration will generally be split into frontends
and backends as they open a lot of possibilities since any frontend may pass a
connection to any backend. With TCP-only proxies, using frontends and backends
rarely provides a benefit and the configuration can be more readable with full
proxies.


3.3. Basic features
-------------------

This section will enumerate a number of features that HAProxy implements, some
of which are generally expected from any modern load balancer, and some of
which are a direct benefit of HAProxy's architecture. More advanced features
will be detailed in the next section.


3.3.1. Basic features : Proxying
--------------------------------

Proxying is the action of transferring data between a client and a server over
two independent connections. The following basic features are supported by
HAProxy regarding proxying and connection management :

  - Provide the servers with a clean connection to protect them against any
    client-side defect or attack;

  - Listen to multiple IP addresses and/or ports, even port ranges;

  - Transparent accept : intercept traffic targeting any arbitrary IP address
    that doesn't even belong to the local system;

  - Server port doesn't need to be related to listening port, and may even be
    translated by a fixed offset (useful with ranges);

  - Transparent connect : spoof the client's (or any) IP address if needed
    when connecting to the server;

  - Provide a reliable return IP address to the servers in multi-site LBs;

  - Offload the server thanks to buffers and possibly short-lived connections
    to reduce their concurrent connection count and their memory footprint;

  - Optimize TCP stacks (e.g. SACK), congestion control, and reduce RTT impacts;

  - Support different protocol families on both sides (e.g. IPv4/IPv6/Unix);

  - Timeout enforcement : HAProxy supports multiple levels of timeouts depending
    on the stage the connection is in, so that a dead client or server, or an
    attacker cannot be granted resources for too long;

  - Protocol validation : HTTP, SSL, or payload are inspected and invalid
    protocol elements are rejected, unless instructed to accept them anyway;

  - Policy enforcement : ensure that only what is allowed may be forwarded;

  - Both incoming and outgoing connections may be limited to certain network
    namespaces (Linux only), making it easy to build a cross-container,
    multi-tenant load balancer;

  - PROXY protocol presents the client's IP address to the server even for
    non-HTTP traffic. This is an HAProxy extension that has by now been adopted
    by a number of third-party products, at least these ones at the time of
    writing :
      - client : haproxy, stud, stunnel, exaproxy, ELB, squid
      - server : haproxy, stud, postfix, exim, nginx, squid, node.js, varnish
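
As an illustration of the last item, the Python sketch below builds the
human-readable line that version 1 of the PROXY protocol places at the very
beginning of a connection to the server, before any application data. The
helper name proxy_v1_header is an assumption made for the example, and
proxy-protocol.txt remains the reference for the exact format.

```python
import ipaddress


def proxy_v1_header(src_ip: str, dst_ip: str,
                    src_port: int, dst_port: int) -> bytes:
    """Build a PROXY protocol version 1 header line for a TCP connection."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    if src.version != dst.version:
        raise ValueError("source and destination must use the same family")
    family = "TCP4" if src.version == 4 else "TCP6"
    # Order of the fields: source address, destination address, source port,
    # destination port, and the line is terminated by CRLF.
    line = f"PROXY {family} {src} {dst} {src_port} {dst_port}\r\n"
    return line.encode("ascii")


if __name__ == "__main__":
    print(proxy_v1_header("192.0.2.1", "198.51.100.7", 56324, 443))
```

The server side reads this line first and then treats the rest of the stream
as regular traffic coming from the advertised client address.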
516
517
5183.3.2. Basic features : SSL
519---------------------------
520
521HAProxy's SSL stack is recognized as one of the most featureful according to
522Google's engineers (http://istlsfastyet.com/). The most commonly used features
523making it quite complete are :
524
525 - SNI-based multi-hosting with no limit on sites count and focus on
526 performance. At least one deployment is known for running 50000 domains
527 with their respective certificates;
528
529 - support for wildcard certificates reduces the need for many certificates ;
530
531 - certificate-based client authentication with configurable policies on
532 failure to present a valid certificate. This allows to present a different
533 server farm to regenerate the client certificate for example;
534
535 - authentication of the backend server ensures the backend server is the real
536 one and not a man in the middle;
537
Patrick Starrdce734e2017-10-09 13:17:12 +0700538 - authentication with the backend server lets the backend server know it's
539 really the expected haproxy node that is connecting to it;
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200540
541 - TLS NPN and ALPN extensions make it possible to reliably offload SPDY/HTTP2
542 connections and pass them in clear text to backend servers;
543
544 - OCSP stapling further reduces first page load time by delivering inline an
545 OCSP response when the client sends a Certificate Status Request;
546
547 - Dynamic record sizing provides both high performance and low latency, and
548 significantly reduces page load time by letting the browser start to fetch
549 new objects while packets are still in flight;
550
551 - permanent access to all relevant SSL/TLS layer information for logging,
552 access control, reporting etc. These elements can be embedded into HTTP
553 headers or even into a PROXY protocol extension so that the offloaded server
554 gets all the information it would have had if it performed the SSL
555 termination itself.
556
557 - Detect, log and block certain known attacks even on vulnerable SSL libs,
558 such as the Heartbleed attack affecting certain versions of OpenSSL.
559
560 - support for stateless session resumption (RFC 5077 TLS Ticket extension).
561 TLS tickets can be updated from the CLI, which provides the means to implement
562 Perfect Forward Secrecy by frequently rotating the tickets.
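
The first two items of the list above (SNI-based multi-hosting and wildcard
certificates) can be pictured with a small Python sketch. It is only a model
under stated assumptions: the function name, the dictionary layout and the
file names are hypothetical, and the real lookup in HAProxy is more elaborate.

```python
def select_certificate(sni, certificates, default):
    """Pick a certificate for the server name announced by the client."""
    name = sni.lower().rstrip(".")
    # An exact match always wins.
    if name in certificates:
        return certificates[name]
    # Otherwise replace the leftmost label with "*" to try a wildcard entry.
    _, _, parent = name.partition(".")
    if parent:
        return certificates.get("*." + parent, default)
    return default


certificates = {
    "www.example.com": "www.pem",
    "*.example.com": "wildcard.pem",
}
```

A single wildcard entry covers every direct sub-domain, which is why wildcard
certificates reduce the number of certificates to load.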
563
564
5653.3.3. Basic features : Monitoring
566----------------------------------
567
568HAProxy focuses a lot on availability. As such it cares about server state,
569and about reporting its own state to other network components :
570
571 - Servers' state is continuously monitored using per-server parameters. This
572 ensures the path to the server is operational for regular traffic;
573
574 - Health checks support separate hysteresis for up and down transitions in
575 order to protect against state flapping;
576
577 - Checks can be sent to a different address/port/protocol : this makes it
578 easy to check a single service that is considered representative of multiple
579 ones, for example the HTTPS port for an HTTP+HTTPS server.
580
581 - Servers can track other servers and go down simultaneously : this ensures
582 that servers hosting multiple services can fail atomically and that no one
583 will be sent to a partially failed server;
584
585 - Agents may be deployed on the server to monitor load and health : a server
586 may be interested in reporting its load, operational status, administrative
587 status independently from what health checks can see. By running a simple
588 agent on the server, it's possible to consider the server's view of its own
589 health in addition to the health checks validating the whole path;
590
591 - Various check methods are available : TCP connect, HTTP request, SMTP hello,
592 SSL hello, LDAP, SQL, Redis, send/expect scripts, all with/without SSL;
593
594 - State change is notified in the logs and stats page with the failure reason
595 (e.g. the HTTP response received at the moment the failure was detected). An
596 e-mail can also be sent to a configurable address upon such a change ;
597
598 - Server state is also reported on the stats interface and can be used to take
599 routing decisions so that traffic may be sent to different farms depending
600 on their sizes and/or health (e.g. loss of an inter-DC link);
601
602 - HAProxy can use health check requests to pass information to the servers,
603 such as their names, weight, the number of other servers in the farm etc.
604 so that servers can adjust their response and decisions based on this
605 knowledge (e.g. postpone backups to keep more CPU available);
606
607 - Servers can use health checks to report more detailed state than just on/off
608 (e.g. I would like to stop, please stop sending new visitors);
609
610 - HAProxy itself can report its state to external components such as routers
611 or other load balancers, making it possible to build very complete
612 multi-path and multi-layer infrastructures.
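
The hysteresis on up and down transitions mentioned in the list above can be
sketched in a few lines of Python. This is a simplified model, not HAProxy's
implementation: the class name, the "rise" and "fall" parameter names and their
default values are assumptions made for the example.

```python
class ServerHealth:
    """Track one server's state with separate up and down thresholds."""

    def __init__(self, rise=2, fall=3):
        self.rise = rise    # consecutive successes needed to go up
        self.fall = fall    # consecutive failures needed to go down
        self.up = True
        self.streak = 0     # consecutive results contradicting current state

    def report(self, check_ok):
        """Feed one health check result and return the resulting state."""
        if check_ok == self.up:
            self.streak = 0           # the result confirms the current state
            return self.up
        self.streak += 1
        threshold = self.fall if self.up else self.rise
        if self.streak >= threshold:
            self.up = not self.up     # enough evidence: switch state
            self.streak = 0
        return self.up


health = ServerHealth(rise=2, fall=3)
checks = [False, False, True, False, False, False, True, True]
states = [health.report(ok) for ok in checks]
```

Two isolated failures followed by a success leave the server up; only three
failures in a row take it down, and two successes in a row bring it back.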
613
614
6153.3.4. Basic features : High availability
616-----------------------------------------
617
618Just like any serious load balancer, HAProxy cares a lot about availability to
619ensure the best global service continuity :
620
621 - Only valid servers are used ; the other ones are automatically evicted from
622 load balancing farms ; under certain conditions it is still possible to
623 force their use though;
624
625 - Support for a graceful shutdown so that it is possible to take servers out
626 of a farm without affecting any connection;
627
628 - Backup servers are automatically used when active servers are down and
629 replace them so that sessions are not lost when possible. This also allows
630 building multiple paths to reach the same server (e.g. multiple interfaces);
631
632 - Ability to return a global failed status for a farm when too many servers
633 are down. This, combined with the monitoring capabilities makes it possible
634 for an upstream component to choose a different LB node for a given service;
635
636 - Stateless design makes it easy to build clusters : by design, HAProxy does
637 its best to ensure the highest service continuity without having to store
638 information that could be lost in the event of a failure. This ensures that
639 a takeover is as seamless as possible;
640
641 - Integrates well with the standard VRRP daemon keepalived : HAProxy easily
642 tells keepalived about its state and copes very well with floating virtual
643 IP addresses. Note: prefer IP redundancy protocols (VRRP/CARP) over
644 cluster-based solutions (Heartbeat, ...) as they're the ones offering the
645 fastest, most seamless, and most reliable switchover.
646
647
6483.3.5. Basic features : Load balancing
649--------------------------------------
650
651HAProxy offers a fairly complete set of load balancing features, most of which
652are unfortunately not available in a number of other load balancing products :
653
654 - no less than 9 load balancing algorithms are supported, some of which apply
655 to input data to offer an infinite list of possibilities. The most common
656 ones are round-robin (for short connections, pick each server in turn),
657 leastconn (for long connections, pick the least recently used of the servers
658 with the lowest connection count), source (for SSL farms or terminal server
659 farms, the server directly depends on the client's source address), URI (for
660 HTTP caches, the server directly depends on the HTTP URI), hdr (the server
661 directly depends on the contents of a specific HTTP header field), first
662 (for short-lived virtual machines, all connections are packed on the
663 smallest possible subset of servers so that unused ones can be powered
664 down);
665
666 - all algorithms above support per-server weights so that it is possible to
667 accommodate different server generations in a farm, or direct a small
668 fraction of the traffic to specific servers (debug mode, running the next
669 version of the software, etc);
670
671 - dynamic weights are supported for round-robin, leastconn and consistent
672 hashing ; this allows server weights to be modified on the fly from the CLI
673 or even by an agent running on the server;
674
675 - slow-start is supported whenever a dynamic weight is supported; this allows
676 a server to progressively take the traffic. This is an important feature
677 for fragile application servers which need to compile classes at runtime
678 as well as cold caches which need to fill up before being run at full
679 throttle;
680
681 - hashing can apply to various elements such as client's source address, URL
682 components, query string element, header field values, POST parameter, RDP
683 cookie;
684
685 - consistent hashing protects server farms against massive redistribution when
686 adding or removing servers in a farm. That's very important in large cache
687 farms and it allows slow-start to be used to refill cold caches;
688
689 - a number of internal metrics such as the number of connections per server,
690 per backend, the number of available connection slots in a backend etc. make
691 it possible to build very advanced load balancing strategies.
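
To show why consistent hashing protects a farm against massive redistribution,
here is a minimal Python sketch of a hash ring. It is a generic textbook
construction and not HAProxy's algorithm: the class name, the use of SHA-256
and the number of points per server are assumptions made for the example.

```python
import bisect
import hashlib


def _hash(value):
    # Stable 64-bit hash; Python's built-in hash() changes between processes.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring using several points per server."""

    def __init__(self, servers, points_per_server=100):
        self.points_per_server = points_per_server
        self._ring = []                      # sorted list of (hash, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.points_per_server):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [point for point in self._ring if point[1] != server]

    def lookup(self, key):
        # First point at or after the key's hash, wrapping around the ring.
        index = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["cache1", "cache2", "cache3"])
keys = [f"/img/{i}.png" for i in range(500)]
before = {key: ring.lookup(key) for key in keys}
ring.remove("cache2")
after = {key: ring.lookup(key) for key in keys}
```

Only the keys that were served by the removed server move elsewhere; with a
plain "hash modulo number of servers" scheme most keys would change server.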
692
693
6943.3.6. Basic features : Stickiness
695----------------------------------
696
697Application load balancing would be useless without stickiness. HAProxy provides
698a fairly comprehensive set of possibilities to maintain a visitor on the same
699server even across various events such as server addition/removal, down/up
700cycles, and some methods are designed to be resistant to the distance between
701multiple load balancing nodes in that they don't require any replication :
702
703 - stickiness information can be individually matched and learned from
704 different places if desired. For example a JSESSIONID cookie may be matched
705 both in a cookie and in the URL. Up to 8 parallel sources can be learned at
706 the same time and each of them may point to a different stick-table;
707
708 - stickiness information can come from anything that can be seen within a
709 request or response, including source address, TCP payload offset and
710 length, HTTP query string elements, header field values, cookies, and so
711 on.
712
713 - stick-tables are replicated between all nodes in a multi-master fashion;
714
715 - commonly used elements such as SSL-ID or RDP cookies (for TSE farms) are
716 directly accessible to ease manipulation;
717
718 - all sticking rules may be dynamically conditioned by ACLs;
719
720 - it is possible to decide not to stick to certain servers, such as backup
721 servers, so that when the nominal server comes back, it automatically takes
722 the load back. This is often used in multi-path environments;
723
724 - in HTTP it is often preferred not to learn anything and instead manipulate
725 a cookie dedicated to stickiness. For this, it's possible to detect,
726 rewrite, insert or prefix such a cookie to let the client remember what
727 server was assigned;
728
729 - the server may decide to change or clean the stickiness cookie on logout,
730 so that leaving visitors are automatically unbound from the server;
731
732 - using ACL-based rules it is also possible to selectively ignore or enforce
733 stickiness regardless of the server's state; combined with advanced health
734 checks, that helps admins verify that the server they're installing is up
735 and running before presenting it to the whole world;
736
737 - an innovative mechanism to set a maximum idle time and duration on cookies
738 ensures that stickiness can be smoothly stopped on devices which are never
739 closed (smartphones, TVs, home appliances) without having to store them on
740 persistent storage;
741
742 - multiple server entries may share the same stickiness keys so that
743 stickiness is not lost in multi-path environments when one path goes down;
744
745 - soft-stop ensures that only users with stickiness information will continue
746 to reach the server they've been assigned to but no new users will go there.
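
The cookie insertion method described in the list above, where nothing is
learned and the client itself remembers its server, can be modelled with the
Python sketch below. The cookie name, the data layout and the function name are
hypothetical; the sketch only illustrates the principle.

```python
import itertools


def route(request_cookies, servers, round_robin, cookie_name="SERVERID"):
    """Return (server name, cookie to set or None) for one request.

    `servers` maps a cookie value to a dict with "name" and "up" keys.
    """
    sticky = servers.get(request_cookies.get(cookie_name))
    if sticky is not None and sticky["up"]:
        return sticky["name"], None            # honour existing stickiness
    # No usable cookie: load balance, then insert a cookie in the response.
    for _ in range(len(servers)):
        value = next(round_robin)
        if servers[value]["up"]:
            return servers[value]["name"], (cookie_name, value)
    raise RuntimeError("no server available")


servers = {
    "a": {"name": "app1", "up": True},
    "b": {"name": "app2", "up": True},
}
round_robin = itertools.cycle(servers)
```

No table is needed on the load balancer side, which is why this method does
not require any replication between load balancing nodes.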
747
748
7493.3.7. Basic features : Sampling and converting information
750-----------------------------------------------------------
751
752HAProxy supports information sampling using a wide set of "sample fetch
753functions". The principle is to extract pieces of information known as samples,
754for immediate use. This is used for stickiness, to build conditions, to produce
755information in logs or to enrich HTTP headers.
756
757Samples can be fetched from various sources :
758
759 - constants : integers, strings, IP addresses, binary blocks;
760
761 - the process : date, environment variables, server/frontend/backend/process
762 state, byte/connection counts/rates, queue length, random generator, ...
763
764 - variables : per-session, per-request, per-response variables;
765
766 - the client connection : source and destination addresses and ports, and all
767 related statistics counters;
768
769 - the SSL client session : protocol, version, algorithm, cipher, key size,
770 session ID, all client and server certificate fields, certificate serial,
771 SNI, ALPN, NPN, client support for certain extensions;
772
773 - request and response buffers contents : arbitrary payload at offset/length,
774 data length, RDP cookie, decoding of SSL hello type, decoding of TLS SNI;
775
776 - HTTP (request and response) : method, URI, path, query string arguments,
777 status code, header values, positional header value, cookies, captures,
778 authentication, body elements;
779
780A sample may then pass through a number of operators known as "converters" to
781experience some transformation. A converter consumes a sample and produces a
782new one, possibly of a completely different type. For example, a converter may
783be used to return only the integer length of the input string, or could turn a
784string to upper case. An arbitrary number of converters may be applied in
785series to a sample before final use. Among all available sample converters, the
786following ones are the most commonly used :
787
788 - arithmetic and logic operators : they make it possible to perform advanced
789 computation on input data, such as computing ratios, percentages or simply
790 converting from one unit to another one;
791
792 - IP address masks are useful when some addresses need to be grouped by larger
793 networks;
794
795 - data representation : URL-decode, base64, hex, JSON strings, hashing;
796
797 - string conversion : extract substrings at fixed positions, fixed length,
798 extract specific fields around certain delimiters, extract certain words,
799 change case, apply regex-based substitution;
800
801 - date conversion : convert to HTTP date format, convert local to UTC and
802 conversely, add or remove offset;
803
804 - look up an entry in a stick table to find statistics or assigned server;
805
806 - map-based key-to-value conversion from a file (mostly used for geolocation).
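
The relation between a sample fetch and a series of converters can be sketched
in Python as a value passed through a chain of functions. All names below are
hypothetical and only mimic the idea; they are not HAProxy keywords.

```python
from functools import reduce


def fetch_header(request, name):
    """Sample fetch: return a header value from a request dict, or None."""
    return request["headers"].get(name.lower())


# Converters consume one sample and produce a new one, possibly of another type.
def lower(sample):
    return sample.lower()


def field(index, delimiter):
    """Build a converter extracting the index-th field (starting at 1)."""
    return lambda sample: sample.split(delimiter)[index - 1]


def length(sample):
    return len(sample)


def apply_converters(sample, converters):
    """Pass a sample through each converter in series."""
    return reduce(lambda value, convert: convert(value), converters, sample)


request = {"headers": {"host": "WWW.Example.COM:8080"}}
host = fetch_header(request, "Host")
hostname = apply_converters(host, [lower, field(1, ":")])
hostname_length = apply_converters(host, [lower, field(1, ":"), length])
```

The last chain starts from a string and ends with an integer, which shows how
a converter may change the type of the sample it receives.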
807
808
8093.3.8. Basic features : Maps
810----------------------------
811
812Maps are a powerful type of converter which consists in loading a two-column
813file into memory at boot time, then looking up each input sample in the first
814column and either returning the corresponding value from the second column if
815the entry was found, or returning a default value. Since the output is also a
816sample, it can in turn undergo other transformations, including other map
817lookups. Maps are most commonly used to translate the client's IP address to
818an AS number or country code, since they support longest-match lookups for
819network addresses, but they can be used for various other purposes.
820
821Part of their strength comes from being updatable on the fly either from the CLI
822or from certain actions using other samples, making them capable of storing and
823retrieving information between subsequent accesses. Another strength comes from
824the binary-tree-based indexing which makes them extremely fast even when they
825contain hundreds of thousands of entries, making geolocation very cheap and easy
826to set up.
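
As a rough illustration of a map used for geolocation, the Python sketch below
loads two-column entries and returns the value of the most specific network
containing an address, or a default value. It assumes a "network value" line
layout and uses a linear scan for brevity, whereas the text above describes a
tree-based index; the function names are hypothetical.

```python
import ipaddress


def load_map(lines):
    """Parse "network value" lines into a list of (network, value) pairs."""
    entries = []
    for line in lines:
        line = line.split("#", 1)[0].strip()    # drop comments and blanks
        if not line:
            continue
        key, value = line.split(None, 1)
        entries.append((ipaddress.ip_network(key), value))
    return entries


def map_ip(entries, address, default=None):
    """Return the value of the longest matching network, else the default."""
    ip = ipaddress.ip_address(address)
    best = None
    for network, value in entries:
        if ip.version == network.version and ip in network:
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, value)
    return best[1] if best else default


entries = load_map([
    "192.0.2.0/24 FR",
    "192.0.2.128/25 DE  # more specific than the /24 above",
    "",
    "2001:db8::/32 US",
])
```

Since the result is an ordinary value, it could in turn be fed to another
lookup or converter, as explained above.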
827
828
8293.3.9. Basic features : ACLs and conditions
830-------------------------------------------
831
832Most operations in HAProxy can be made conditional. Conditions are built by
833combining multiple ACLs using logic operators (AND, OR, NOT). Each ACL is a
834series of tests based on the following elements :
835
836 - a sample fetch method to retrieve the element to test ;
837
838 - an optional series of converters to transform the element ;
839
840 - a list of patterns to match against ;
841
842 - a matching method to indicate how to compare the patterns with the sample.
843
844For example, the sample may be taken from the HTTP "Host" header, it could then
845be converted to lower case, then matched against a number of regex patterns
846using the regex matching method.
847
848Technically, ACLs are built on the same core as the maps; they share the exact
849same internal structure, pattern matching methods and performance. The only real
850difference is that instead of returning a sample, they only return "found" or
851"not found". In terms of usage, ACL patterns may be declared inline in the
852configuration file and do not require their own file. ACLs may be named for ease
853of use or to make configurations understandable. A named ACL may be declared
854multiple times and it will evaluate all definitions in turn until one matches.
855
856About 13 different pattern matching methods are provided, among which IP address
857mask, integer ranges, substrings, regex. They work like functions, and just like
858with any programming language, only what is needed is evaluated, so when a
859condition involving an OR is already true, next ones are not evaluated, and
860similarly when a condition involving an AND is already false, the rest of the
861condition is not evaluated.
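
The four elements of an ACL and the short-circuit evaluation described above
can be modelled in Python as follows. This is a sketch under assumptions: the
class, the two matching helpers and the ACL names are hypothetical, and the
"evaluated" list only exists to show which tests actually ran.

```python
import re


class ACL:
    """One ACL: sample fetch, converters, matching method, patterns."""

    def __init__(self, fetch, converters, match, patterns):
        self.fetch, self.converters = fetch, converters
        self.match, self.patterns = match, patterns

    def __call__(self, request):
        sample = self.fetch(request)
        if sample is None:
            return False
        for convert in self.converters:
            sample = convert(sample)
        return any(self.match(pattern, sample) for pattern in self.patterns)


def match_regex(pattern, sample):
    return re.search(pattern, sample) is not None


def match_end(pattern, sample):
    return sample.endswith(pattern)


evaluated = []      # records which ACLs were really evaluated


def traced(name, acl):
    def run(request):
        evaluated.append(name)
        return acl(request)
    return run


is_static = traced("is_static", ACL(
    lambda r: r["path"], [str.lower], match_end, [".png", ".css"]))
is_shop = traced("is_shop", ACL(
    lambda r: r["headers"].get("host"), [str.lower], match_regex,
    [r"^shop\.", r"^store\."]))

request = {"path": "/Logo.PNG", "headers": {"host": "Shop.example.com"}}
# "is_static OR is_shop": evaluation stops at the first true operand.
decision = is_static(request) or is_shop(request)
```

Here the first ACL already matches, so the regex-based one is never run.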
862
863There is no practical limit to the number of declared ACLs, and a handful of
864commonly used ones are provided. However experience has shown that setups using
865a lot of named ACLs are quite hard to troubleshoot and that sometimes using
866anonymous ACLs inline is easier as it needs fewer references out of the scope
867being analyzed.
868
869
8703.3.10. Basic features : Content switching
871------------------------------------------
872
873HAProxy implements a mechanism known as content-based switching. The principle
874is that a connection or request arrives on a frontend, then the information
875carried with this request or connection is processed, and at this point it is
876possible to write ACL-based conditions making use of this information to
877decide what backend will process the request. Thus the traffic is directed to
878one backend or another based on the request's contents. The most common example
879consists in using the Host header and/or elements from the path (sub-directories
880or file-name extensions) to decide whether an HTTP request targets a static
881object or the application, and to route static objects traffic to a backend made
882of fast and light servers, and all the remaining traffic to a more complex
883application server, thus constituting a fine-grained virtual hosting solution.
884This is quite convenient to make multiple technologies coexist as a more global
885solution.
886
887Another use case of content-switching consists in using different load balancing
888algorithms depending on various criteria. A cache may use a URI hash while an
889application would use round-robin.
890
891Last but not least, it allows multiple customers to use a small share of a
892common resource by enforcing per-backend (thus per-customer) connection limits.
893
894Content switching rules scale very well, though their performance may depend on
895the number and complexity of the ACLs in use. But it is also possible to write
896dynamic content switching rules where a sample value directly turns into a
897backend name and without making use of ACLs at all. Such configurations have
898been reported to work fine at least with 300000 backends in production.
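
A dynamic content switching rule of the kind described above, where a sample
value directly becomes a backend name, may be pictured with this Python sketch.
The "bk_" prefix, the backend names and the function name are assumptions made
for the example only.

```python
def choose_backend(request, backends, default_backend):
    """Turn the Host header directly into a backend name, without any ACL."""
    host = request["headers"].get("host", "")
    name = "bk_" + host.lower().split(":", 1)[0]
    # A single set lookup, whatever the number of declared backends.
    return name if name in backends else default_backend


backends = {"bk_www.example.com", "bk_static.example.com", "bk_default"}
static_request = {"headers": {"host": "Static.Example.com:443"}}
unknown_request = {"headers": {"host": "other.example.org"}}
```

The cost of such a rule does not grow with the number of backends, which fits
the very large setups mentioned above.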
899
900
9013.3.11. Basic features : Stick-tables
902-------------------------------------
903
904Stick-tables are commonly used to store stickiness information, that is, to keep
905a reference to the server a certain visitor was directed to. The key is then the
906identifier associated with the visitor (its source address, the SSL ID of the
907connection, an HTTP or RDP cookie, the customer number extracted from the URL or
908from the payload, ...) and the stored value is then the server's identifier.
909
910Stick tables may use 3 different types of samples for their keys : integers,
911strings and addresses. Only one stick-table may be referenced in a proxy, and it
912is designated everywhere with the proxy name. Up to 8 keys may be tracked in
913parallel. The server identifier is committed during request or response
914processing once both the key and the server are known.
915
916Stick-table contents may be replicated in active-active mode with other HAProxy
917nodes known as "peers" as well as with the new process during a reload operation
918so that all load balancing nodes share the same information and take the same
919routing decision if a client's requests are spread over multiple nodes.
920
921Since stick-tables are indexed on whatever identifies a client, they are
922often also used to store extra information such as per-client statistics. The
923extra statistics take some extra space and need to be explicitly declared. The
924type of statistics that may be stored includes the input and output bandwidth,
925the number of concurrent connections, the connection rate and count over a
926period, the amount and frequency of errors, some specific tags and counters,
927etc. In order to support keeping such information without being forced to
928stick to a given server, a special "tracking" feature is implemented which makes
929it possible to track up to 3 simultaneous keys from different tables
930regardless of stickiness rules. Each stored statistic may be searched, dumped
931and cleared from the CLI, which adds to the live troubleshooting capabilities.
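
The use of a table to keep per-client statistics can be approximated with the
Python sketch below, which stores request timestamps per key, expires idle
entries and derives a request rate over a period. It is a deliberately naive
model: the class and function names, the storage layout and the thresholds are
assumptions, not HAProxy's data structures.

```python
class StickTable:
    """Tiny in-memory table: key -> request timestamps, with expiry."""

    def __init__(self, period, expire):
        self.period = period    # window used to compute the request rate
        self.expire = expire    # idle time after which an entry is dropped
        self.entries = {}       # key -> list of request timestamps

    def purge(self, now):
        idle = [key for key, stamps in self.entries.items()
                if now - stamps[-1] >= self.expire]
        for key in idle:
            del self.entries[key]

    def track(self, key, now):
        """Record one request for `key`, return its count over the period."""
        self.purge(now)
        stamps = self.entries.setdefault(key, [])
        stamps.append(now)
        # Keep only the timestamps that fall inside the sliding window.
        stamps[:] = [t for t in stamps if now - t < self.period]
        return len(stamps)


def is_abuser(table, source, now, limit):
    return table.track(source, now) > limit


table = StickTable(period=10, expire=60)
verdicts = [is_abuser(table, "192.0.2.7", t, limit=3) for t in (0, 1, 2, 3, 20)]
```

The fourth request within the period exceeds the limit, while the request seen
much later is accepted again because the older ones left the window.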
932
933While this mechanism can be used to upgrade a returning visitor or to adjust
934the delivered quality of service depending on good or bad behavior, it is
935mostly used to fight against service abuse and more generally DDoS as it makes
936it possible to build complex models to detect certain bad behaviors at a high
937processing speed.
938
939
9403.3.12. Basic features : Formatted strings
941------------------------------------------
942
943There are many places where HAProxy needs to manipulate character strings, such
944as logs, redirects, header additions, and so on. In order to provide the
945greatest flexibility, the notion of Formatted strings was introduced, initially
946for logging purposes, which explains why it's still called "log-format". These
947strings contain escape characters which make it possible to insert various
948dynamic data including variables and sample fetch expressions, and even to
949adjust the encoding while the result is being turned into a string (for example,
950adding quotes). This provides a powerful way to build header contents or to
951customize log lines. Additionally, so that the most common strings remain
952simple to build, about 50 special tags are provided as shortcuts for information
953commonly used in logs.
954
955
9563.3.13. Basic features : HTTP rewriting and redirection
957-------------------------------------------------------
958
959Installing a load balancer in front of an application that was never designed
960for this can be a challenging task without the proper tools. One of the most
961commonly requested operations in this case is to adjust request and response
962headers to make the load balancer appear as the origin server and to fix hard
963coded information. This comes with changing the path in requests (which is
964strongly advised against), modifying the Host header field, modifying the
965Location response header field for redirects, modifying the path and domain
966attributes for cookies, and so on. It also happens that a number of servers are
967somewhat verbose and tend to leak too much information in the response, making
968them more vulnerable to targeted attacks. While it's theoretically not the role
969of a load balancer to clean this up, in practice it's located at the best place
970in the infrastructure to guarantee that everything is cleaned up.
971
972Similarly, sometimes the load balancer will have to intercept some requests and
973respond with a redirect to a new target URL. While some people tend to confuse
974redirects and rewriting, these are two completely different concepts, since the
975rewriting makes the client and the server see different things (and disagree on
976the location of the page being visited) while redirects ask the client to visit
977the new URL so that it sees the same location as the server.
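
The difference between rewriting and redirecting can be made concrete with a
short Python sketch. The request and response are plain dictionaries and both
function names are hypothetical; the point is only to show who ends up seeing
which path.

```python
def rewrite(request, old_prefix, new_prefix):
    """Rewriting: change what the server sees; the client is not told."""
    path = request["path"]
    if path.startswith(old_prefix):
        path = new_prefix + path[len(old_prefix):]
    return {**request, "path": path}        # this copy goes to the server


def redirect(request, old_prefix, new_prefix, code=301):
    """Redirect: answer the client and ask it to visit the new URL."""
    path = request["path"]
    if not path.startswith(old_prefix):
        return None                         # nothing to do, forward as usual
    location = new_prefix + path[len(old_prefix):]
    return {"status": code, "headers": {"location": location}}


request = {"method": "GET", "path": "/old/cart"}
forwarded = rewrite(request, "/old/", "/shop/")
response = redirect(request, "/old/", "/shop/")
```

After the rewrite the client still believes it visits "/old/cart" while the
server handles "/shop/cart"; after the redirect both agree on "/shop/cart".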
978
979In order to do this, HAProxy supports various possibilities for rewriting and
980redirects, among which :
981
982 - regex-based URL and header rewriting in requests and responses. Regex are
983 the most commonly used tool to modify header values since they're easy to
984 manipulate and well understood;
985
986 - headers may also be appended, deleted or replaced based on formatted strings
987 so that it is possible to pass information there (e.g. client side TLS
988 algorithm and cipher);
989
990 - HTTP redirects can use any 3xx code to a relative, absolute, or completely
991 dynamic (formatted string) URI;
992
993 - HTTP redirects also support some extra options such as setting or clearing
994 a specific cookie, dropping the query string, appending a slash if missing,
995 and so on;
996
997 - all operations support ACL-based conditions.
998
999
10003.3.14. Basic features : Server protection
1001------------------------------------------
1002
1003HAProxy does a lot to maximize service availability, and for this it makes
1004great efforts to protect servers against overloading and attacks. The first
1005and most important point is that only complete and valid requests are forwarded
1006to the servers. The initial reason is that HAProxy needs to find the protocol
1007elements it needs to stay synchronized with the byte stream, and the second
1008reason is that until the request is complete, there is no way to know if some
1009elements will change its semantics. The direct benefit from this is that servers
1010are not exposed to invalid or incomplete requests. This is a very effective
1011protection against slowloris attacks, which have almost no impact on HAProxy.
1012
1013Another important point is that HAProxy contains buffers to store requests and
1014responses, and that by only sending a request to a server when it's complete and
1015by reading the whole response very quickly from the local network, the server
1016side connection is used for a very short time and this preserves server
1017resources as much as possible.
1018
1019A direct extension to this is that HAProxy can artificially limit the number of
1020concurrent connections or outstanding requests to a server, which guarantees
1021that the server will never be overloaded even if it continuously runs at 100% of
1022its capacity during traffic spikes. All excess requests will simply be queued to
1023be processed when one slot is released. In the end, these huge resource savings
1024most often ensure so much better server response times that it ends up actually
1025being faster than overloading the server. Queued requests may be redispatched
1026to other servers, or even aborted in queue when the client aborts, which also
1027protects the servers against the "reload effect", where each click on "reload"
1028by a visitor on a slow-loading page usually induces a new request and maintains
1029the server in an overloaded state.
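
The queueing behaviour described above can be summarised by the following
Python sketch, where a server accepts a bounded number of concurrent requests
and the excess waits in a queue. The class and method names are hypothetical
and the model ignores timeouts and redispatching.

```python
from collections import deque


class LimitedServer:
    """Forward at most `maxconn` requests at once and queue the rest."""

    def __init__(self, maxconn):
        self.maxconn = maxconn
        self.active = set()
        self.queue = deque()

    def submit(self, request_id):
        if len(self.active) < self.maxconn:
            self.active.add(request_id)
            return "forwarded"
        self.queue.append(request_id)
        return "queued"

    def abort(self, request_id):
        """The client gave up: drop the request if it is still waiting."""
        if request_id in self.queue:
            self.queue.remove(request_id)

    def finish(self, request_id):
        """A slot is released: promote the oldest queued request, if any."""
        self.active.discard(request_id)
        if self.queue:
            promoted = self.queue.popleft()
            self.active.add(promoted)
            return promoted
        return None


server = LimitedServer(maxconn=2)
outcomes = [server.submit(i) for i in (1, 2, 3, 4)]
server.abort(3)                 # visitor 3 hit "reload" and left the queue
promoted = server.finish(1)
```

The server never handles more than two requests at a time, and the aborted
request is never sent to it at all.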
1030
1031The slow-start mechanism also protects restarting servers against high traffic
1032levels while they're still finalizing their startup or compiling some classes.
1033
1034Regarding the protocol-level protection, it is possible to relax the HTTP parser
1035to accept non standard-compliant but harmless requests or responses and even to
1036fix them. This allows buggy applications to remain accessible while a fix is
1037being developed. In parallel, offending messages are completely captured with a
1038detailed report that helps developers spot the issue in the application. The
1039most dangerous protocol violations are properly detected, then fixed or rejected.
1040For example malformed requests or responses with two Content-Length headers are
1041either fixed if the values are exactly the same, or rejected if they differ,
1042since it becomes a security problem. Protocol inspection is not limited to HTTP;
1043it is also available for other protocols like TLS or RDP.
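
The handling of duplicate Content-Length headers given as an example above can
be sketched in Python as follows. The function name and the representation of
headers as a list of pairs are assumptions made for the illustration.

```python
def normalize_content_length(headers):
    """Collapse identical Content-Length headers or reject the message.

    `headers` is a list of (name, value) pairs in their original order.
    """
    lengths = [value.strip() for name, value in headers
               if name.lower() == "content-length"]
    if len(set(lengths)) > 1:
        # Differing values make the message length ambiguous: reject it.
        raise ValueError("conflicting Content-Length values")
    seen = False
    fixed = []
    for name, value in headers:
        if name.lower() == "content-length":
            if seen:
                continue        # drop the identical duplicate
            seen = True
        fixed.append((name, value))
    return fixed


harmless = [("Host", "a"), ("Content-Length", "10"), ("content-length", "10")]
dangerous = [("Content-Length", "10"), ("Content-Length", "42")]
```

Identical duplicates are harmless and are merged, whereas differing values are
refused since two parties could disagree on where the message ends.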
1044
1045When a protocol violation or attack is detected, there are various options to
1046respond to the user, such as returning the common "HTTP 400 bad request",
1047closing the connection with a TCP reset, or faking an error after a long delay
1048("tarpit") to confuse the attacker. All of these contribute to protecting the
1049servers by discouraging the offending client from pursuing an attack that
1050becomes very expensive to maintain.
1051
1052HAProxy also offers some more advanced options to protect against accidental
1053data leaks and session crossing. Not only can it log suspicious server responses
1054but it will also log and optionally block a response which might affect a given
1055visitor's confidentiality. One such example is a cacheable cookie appearing in a
1056cacheable response, which may result in an intermediary cache delivering it
1057to another visitor, causing accidental session sharing.
1058
1059
3.3.15. Basic features : Logging
--------------------------------

Logging is an extremely important feature for a load balancer, first because a
load balancer is often wrongly accused of causing the problems it reveals, and
second because it is placed at a critical point in an infrastructure where all
normal and abnormal activity needs to be analyzed and correlated with other
components.

HAProxy provides very detailed logs, with millisecond accuracy and the exact
connection accept time that can be searched in firewall logs (e.g. for NAT
correlation). By default, TCP and HTTP logs are quite detailed and contain
everything needed for troubleshooting, such as source IP address and port,
frontend, backend, server, timers (request receipt duration, queue duration,
connection setup time, response headers time, data transfer time), global
process state, connection counts, queue status, retries count, detailed
stickiness actions and disconnect reasons, and header captures with a safe
output encoding. It is then possible to extend or replace this format to include
any sampled data, variables or captures, resulting in very detailed information.
For example, it is possible to log the number of cumulative requests or the
number of different URLs visited by a client.

The log level may be adjusted per request using standard ACLs, so it is possible
to automatically silence some logs considered as pollution and instead raise
warnings when some abnormal behavior happens for a small part of the traffic
(e.g. too many URLs or HTTP errors for a source address). Administrative logs
are also emitted with their own levels to inform about the loss or recovery of a
server for example.

Each frontend and backend may use multiple independent log outputs, which eases
multi-tenancy. Logs are preferably sent over UDP, possibly JSON-encoded, and are
truncated after a configurable line length in order to guarantee delivery.
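To illustrate why truncation helps with UDP delivery, here is a minimal Python
sketch that builds a syslog-style datagram and cuts it at a fixed length so that
it always fits in a single packet. The function names, the 1024-byte limit, the
facility and severity defaults and the destination address are all assumptions
chosen for the example; this is not how HAProxy's own emitter is written.

```python
import socket

def build_log_datagram(message, facility=16, severity=6, max_len=1024):
    """Build a syslog-style payload and truncate it to max_len bytes.

    The priority prefix is facility * 8 + severity (16 is local0, 6 is
    informational). Truncation works on bytes, so it may cut a multi-byte
    character at the very end of an over-long line.
    """
    priority = facility * 8 + severity
    payload = ("<%d>%s" % (priority, message)).encode("utf-8", "replace")
    return payload[:max_len]

def send_log(datagram, address=("127.0.0.1", 514)):
    """Send one datagram; the address is a placeholder for a log server."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram, address)
```

A line longer than the limit is shortened rather than split, so each log event
still travels in exactly one datagram.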
1092
1093
3.3.16. Basic features : Statistics
-----------------------------------

HAProxy provides a web-based statistics reporting interface with authentication,
security levels and scopes. It is thus possible to provide each hosted customer
with their own page showing only their own instances. This page can be located
in a hidden URL part of the regular web site so that no new port needs to be
opened. This page may also report the availability of other HAProxy nodes so
that it is easy to spot at a glance whether everything works as expected. The
view is synthetic with a lot of details accessible (such as error causes, last
access and last change duration, etc.), which are also accessible as a CSV table
that other tools may import to draw graphs. The page may self-refresh to be used
as a monitoring page on a large display. In administration mode, the page also
allows changing the server state to ease maintenance operations.
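As an illustration of importing the CSV export into another tool, the Python
sketch below turns such a table into dictionaries. It assumes the export starts
with a header line prefixed by "# " and uses a shortened, made-up sample; real
exports contain many more columns, and the helper name is hypothetical.

```python
import csv
import io

def parse_stats_csv(text):
    """Return one dict per CSV row, keyed by the column names of the header."""
    # The header line is assumed to start with "# "; drop that prefix.
    if text.startswith("# "):
        text = text[2:]
    return list(csv.DictReader(io.StringIO(text)))

# Shortened, made-up sample (real exports have many more columns).
SAMPLE = (
    "# pxname,svname,scur,status\n"
    "www,FRONTEND,12,OPEN\n"
    "app,srv1,5,UP\n"
    "app,srv2,0,DOWN\n"
)

rows = parse_stats_csv(SAMPLE)
down = [row["svname"] for row in rows if row["status"] == "DOWN"]
print(down)
```

From there, a graphing or alerting tool can pick the columns it needs by name
instead of relying on their position in the line.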
1108
1109
3.4. Advanced features
----------------------

3.4.1. Advanced features : Management
-------------------------------------

HAProxy is designed to remain extremely stable and safe to manage in a regular
production environment. It is provided as a single executable file which doesn't
require any installation process. Multiple versions can easily coexist, meaning
that it's possible (and recommended) to upgrade instances progressively by
order of importance instead of migrating all of them at once. Configuration
files are easily versioned. Configuration checking is done off-line so it
doesn't require restarting a service that might fail. During configuration
checks, a number of advanced mistakes may be detected (e.g. a rule hiding
another one, or stickiness that will not work) and detailed warnings and
configuration hints are proposed to fix them. Backwards configuration file
compatibility goes very far back in time, with version 1.5 still fully
supporting configurations for versions 1.1 written 13 years before, and 1.6
only dropping support for almost unused, obsolete keywords that can be
expressed differently. The configuration and software upgrade mechanism is
smooth and non-disruptive in that it allows old and new processes to coexist on
the system, each handling its own connections. System status, build options,
and library compatibility are reported on startup.

Some advanced features allow an application administrator to smoothly stop a
server, detect when there's no activity on it anymore, then take it off-line,
stop it, upgrade it and ensure it doesn't take any traffic while being upgraded,
then test it again through the normal path without opening it to the public, and
all of this without touching HAProxy at all. This ensures that even complicated
production operations may be done during opening hours with all technical
resources available.

The process tries to save resources as much as possible, uses memory pools to
save on allocation time and limit memory fragmentation, releases payload buffers
as soon as their contents are sent, and supports enforcing strong memory limits
above which connections have to wait for a buffer to become available instead of
allocating more memory. This system helps guarantee memory usage in certain
strict environments.

A command line interface (CLI) is available as a UNIX or TCP socket, to perform
a number of operations and to retrieve troubleshooting information. Nothing
done on this socket requires a configuration change, so it is mostly used for
temporary changes. Using this interface it is possible to change a server's
address, weight and status, to consult statistics and clear counters, dump and
clear stickiness tables, possibly selectively by key criteria, dump and kill
client-side and server-side connections, dump captured errors with a detailed
analysis of the exact cause and location of the error, dump, add and remove
entries from ACLs and maps, update TLS shared secrets, apply connection limits
and rate limits on the fly to arbitrary frontends (useful in shared hosting
environments), and disable a specific frontend to release a listening port
(useful when daytime operations are forbidden and a fix is needed nonetheless).
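A one-shot exchange with the CLI over a UNIX socket can be sketched in Python as
follows. The socket path is deployment-specific and must be supplied by the
caller, the helper names are hypothetical, and the sketch assumes that the CLI
closes the connection after answering a single command and that the "show info"
command returns "Name: value" lines.

```python
import socket

def cli_command(socket_path, command, timeout=5.0):
    """Send one command to the CLI socket and return the whole reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(command.encode("ascii") + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # peer closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace")

def parse_info(reply):
    """Turn "Name: value" lines into a dict, ignoring anything else."""
    info = {}
    for line in reply.splitlines():
        name, sep, value = line.partition(": ")
        if sep:
            info[name] = value
    return info
```

A monitoring script would then call something like
parse_info(cli_command(path, "show info")) with the path of its own socket.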
1161
For environments where SNMP is mandatory, at least two agents exist: one is
provided with the HAProxy sources and relies on the Net-SNMP Perl module.
Another one is provided with the commercial packages and doesn't require Perl.
Both are roughly equivalent in terms of coverage.

It is often recommended to install 4 utilities on the machine where HAProxy is
deployed :

  - socat (in order to connect to the CLI, though certain forks of netcat can
    also do it to some extent);

  - halog from the latest HAProxy version : this is the log analysis tool, it
    parses native TCP and HTTP logs extremely fast (1 to 2 GB per second) and
    extracts useful information and statistics such as requests per URL, per
    source address, URLs sorted by response time or error rate, termination
    codes etc. It was designed to be deployed on the production servers to
    help troubleshoot live issues so it has to be there ready to be used;

  - tcpdump : this is highly recommended to take the network traces needed to
    troubleshoot an issue that was made visible in the logs. There is a point
    where the application's and HAProxy's analyses will diverge and the network
    traces are the only way to say who's right and who's wrong. It's also
    fairly common to detect bugs in network stacks and hypervisors thanks to
    tcpdump;

  - strace : it is tcpdump's companion. It will report what HAProxy really sees
    and will help sort out the issues the operating system is responsible for
    from the ones HAProxy is responsible for. Strace is often requested when a
    bug in HAProxy is suspected.
1190
1191
3.4.2. Advanced features : System-specific capabilities
-------------------------------------------------------

Depending on the operating system HAProxy is deployed on, certain extra features
may be available or needed. While it is supported on a number of platforms,
HAProxy is primarily developed on Linux, which explains why some features are
only available on this platform.

The transparent bind and connect features, the support for binding connections
to a specific network interface, as well as the ability to bind multiple
processes to the same IP address and ports are only available on Linux and BSD
systems, though only Linux performs a kernel-side load balancing of the incoming
requests between the available processes.

On Linux, there are also a number of extra features and optimizations including
support for network namespaces (also known as "containers") allowing HAProxy to
be a gateway between all containers, the ability to set the MSS, Netfilter marks
and IP TOS field on the client side connection, support for TCP FastOpen on the
listening side, TCP user timeouts to let the kernel quickly kill connections
when it detects the client has disappeared before the configured timeouts, TCP
splicing to let the kernel forward data between the two sides of a connection
thus avoiding multiple memory copies, the ability to enable the "defer-accept"
bind option to only get notified of an incoming connection once data become
available in the kernel buffers, and the ability to send the request with the
ACK confirming a connect (sometimes called "piggy-back") which is enabled with
the "tcp-smart-connect" option. On Linux, HAProxy also takes great care of
manipulating the TCP delayed ACKs to save as many packets as possible on the
network.

Some systems have an unreliable clock which jumps back and forth in the past
and in the future. This used to happen with some NUMA systems where multiple
processors didn't see the exact same time of day, and recently it became more
common in virtualized environments where the virtual clock has no relation to
the real clock, resulting in huge time jumps (sometimes up to 30 seconds have
been observed). This causes a lot of trouble with respect to timeout enforcement
in general. Because of this flaw, HAProxy maintains its own monotonic clock
which is based on the system's clock but where drift is measured and compensated
for. This ensures that even with a very bad system clock, timers remain
reasonably accurate and timeouts continue to work. Note that this problem
affects all the software running on such systems and is not specific to HAProxy.
The common effects are spurious timeouts or application freezes. Thus if this
behavior is detected on a system, it must be fixed, regardless of the fact that
HAProxy protects itself against it.
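The idea of a clock that follows the system time while compensating for its
jumps can be sketched as follows. This is a simplified illustration in Python,
not HAProxy's actual algorithm: the class name, the 2-second threshold and the
choice to freeze time during a detected jump are all assumptions made for the
example.

```python
class CompensatedClock:
    """Clock that follows a wall clock but absorbs its sudden jumps."""

    def __init__(self, wall_clock, max_step=2.0):
        self._wall_clock = wall_clock  # callable returning seconds
        self._max_step = max_step      # largest step accepted as real time
        self._last_wall = wall_clock()
        self._offset = 0.0             # correction added to the wall clock

    def now(self):
        wall = self._wall_clock()
        step = wall - self._last_wall
        if step < 0 or step > self._max_step:
            # The wall clock jumped: cancel the jump so that time stands still
            # for this reading instead of leaping backwards or forwards.
            self._offset -= step
        self._last_wall = wall
        return wall + self._offset

# Simulated wall clock: +1s, then a 30s leap forward, then a leap backwards.
readings = iter([100, 101, 131, 132, 90, 91])
clock = CompensatedClock(lambda: next(readings))
print([clock.now() for _ in range(5)])
# prints [101.0, 101.0, 102.0, 102.0, 103.0]
```

The compensated time never goes backwards and never leaps ahead, so timeouts
computed from it neither fire early nor stall. The price of this simple variant
is that a genuine pause longer than the threshold is also absorbed.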
1235
1236
3.4.3. Advanced features : Scripting
------------------------------------

HAProxy can be built with support for the Lua embedded language, which opens a
wide area of new possibilities related to complex manipulation of requests or
responses, routing decisions, statistics processing and so on. Using Lua it is
even possible to establish parallel connections to other servers to exchange
information. This way it becomes possible (though complex) to develop an
authentication system for example. Please refer to the documentation in the file
"doc/lua-api/index.rst" for more information on how to use Lua.
1247
1248
3.5. Sizing
-----------

Typical CPU usage figures show 15% of the processing time spent in HAProxy
versus 85% in the kernel in TCP or HTTP close mode, and about 30% for HAProxy
versus 70% for the kernel in HTTP keep-alive mode. This means that the operating
system and its tuning have a strong impact on the global performance.

Usage varies a lot between users: some focus on bandwidth, others on request
rate, others on connection concurrency, others on SSL performance. This section
aims at providing a few elements to help with sizing.

It is important to keep in mind that every operation comes with a cost, so each
individual operation adds its overhead on top of the other ones, which may be
negligible in certain circumstances, and which may dominate in other cases.

When processing the requests from a connection, we can say that :

  - forwarding data costs less than parsing request or response headers;

  - parsing request or response headers costs less than establishing then
    closing a connection to a server;

  - establishing and closing a connection costs less than a TLS resume
    operation;

  - a TLS resume operation costs less than a full TLS handshake with a key
    computation;

  - an idle connection costs less CPU than a connection whose buffers hold data;

  - a TLS context costs even more memory than a connection with data.

So in practice, it is cheaper to process payload bytes than header bytes, thus
it is easier to achieve high network bandwidth with large objects (few requests
per volume unit) than with small objects (many requests per volume unit). This
explains why maximum bandwidth is always measured with large objects, while
request rate or connection rates are measured with small objects.

Some operations scale well on multiple processes spread over multiple CPUs,
and others don't scale as well. Network bandwidth doesn't scale very far because
the CPU is rarely the bottleneck for large objects; it's mostly the network
bandwidth and data buses to reach the network interfaces. The connection rate
doesn't scale well over multiple processors due to a few locks in the system
when dealing with the local ports table. The request rate over persistent
connections scales very well as it doesn't involve much memory nor network
bandwidth and doesn't require access to locked structures. TLS key computation
scales very well as it's totally CPU-bound. TLS resume scales moderately well,
but reaches its limits around 4 processes where the overhead of accessing the
shared table offsets the small gains expected from more power.
1298
The performance numbers one can expect from a very well tuned system are in the
following range. It is important to take them as orders of magnitude and to
expect significant variations in any direction based on the processor, IRQ
setting, memory type, network interface type, operating system tuning and so on.

The following numbers were found on a Core i7 running at 3.7 GHz equipped with
a dual-port 10 Gbps NIC, running Linux kernel 3.10, HAProxy 1.6 and OpenSSL
1.0.2. HAProxy was running as a single process on a single dedicated CPU core,
and two extra cores were dedicated to network interrupts :

  - 20 Gbps of maximum network bandwidth in clear text for objects 256 kB or
    higher, 10 Gbps for 41 kB or higher;

  - 4.6 Gbps of TLS traffic using AES256-GCM cipher with large objects;

  - 83000 TCP connections per second from client to server;

  - 82000 HTTP connections per second from client to server;

  - 97000 HTTP requests per second in server-close mode (keep-alive with the
    client, close with the server);

  - 243000 HTTP requests per second in end-to-end keep-alive mode;

  - 300000 filtered TCP connections per second (anti-DDoS);

  - 160000 HTTPS requests per second in keep-alive mode over persistent TLS
    connections;

  - 13100 HTTPS requests per second using TLS resumed connections;

  - 1300 HTTPS connections per second using TLS connections renegotiated with
    RSA2048;

  - 20000 concurrent saturated connections per GB of RAM, including the memory
    required for system buffers; it is possible to do better with careful tuning
    but this result is easy to achieve;

  - about 8000 concurrent TLS connections (client-side only) per GB of RAM,
    including the memory required for system buffers;

  - about 5000 concurrent end-to-end TLS connections (both sides) per GB of
    RAM including the memory required for system buffers.
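The memory figures above lend themselves to a quick back-of-the-envelope
estimate. The Python sketch below simply divides expected connection counts by
the per-GB figures quoted in this list; the function name and the idea of
summing the three categories are assumptions of the example, and the result
should be read as an order of magnitude only.

```python
# Concurrent connections per GB of RAM, as quoted in the list above.
PER_GB = {
    "clear": 20000,       # saturated clear-text connections
    "tls_client": 8000,   # TLS on the client side only
    "tls_both": 5000,     # end-to-end TLS (both sides)
}

def estimate_ram_gb(**connections):
    """Estimate RAM in GB for the given concurrent connection counts."""
    return sum(count / PER_GB[kind] for kind, count in connections.items())

print(estimate_ram_gb(clear=40000, tls_client=16000))
# prints 4.0
```

With these figures, 40000 clear-text connections plus 16000 client-side TLS
connections call for roughly 4 GB of RAM.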
1342
Thus a good rule of thumb to keep in mind is that the request rate is divided
by 10 between TLS keep-alive and TLS resume, and between TLS resume and TLS
renegotiation, while it's only divided by 3 between HTTP keep-alive and HTTP
close. Another good rule of thumb is to remember that a high-frequency core
with AES instructions can do around 5 Gbps of AES-GCM.
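These rules of thumb can be checked against the figures listed above. The short
Python sketch below only recomputes the ratios from those numbers; the variable
names are made up for the example.

```python
# Rates taken from the measurements listed above (per second).
tls_keepalive = 160000   # HTTPS requests over persistent TLS connections
tls_resume = 13100       # HTTPS requests using TLS resumed connections
tls_full = 1300          # HTTPS connections renegotiated with RSA2048
http_keepalive = 243000  # HTTP requests in end-to-end keep-alive mode
http_close = 82000       # HTTP connections per second

ratios = {
    "TLS keep-alive / TLS resume": tls_keepalive / tls_resume,
    "TLS resume / TLS renegotiation": tls_resume / tls_full,
    "HTTP keep-alive / HTTP close": http_keepalive / http_close,
}
for name, ratio in ratios.items():
    print("%-32s %.1f" % (name, ratio))
```

The three ratios come out at about 12.2, 10.1 and 3.0, which matches the
"divided by 10" and "divided by 3" orders of magnitude.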
1348
Having more cores rarely helps (except for TLS) and can even be
counter-productive due to the lower frequency. In general a small number of
high-frequency cores is better.

Another good rule of thumb is to consider that on the same server, HAProxy will
be able to saturate :

  - about 5-10 static file servers or caching proxies;

  - about 100 anti-virus proxies;

  - and about 100-1000 application servers depending on the technology in use.


3.6. How to get HAProxy
-----------------------

HAProxy is an open source project covered by the GPLv2 license, meaning that
everyone is allowed to redistribute it provided that access to the sources is
also provided upon request, especially if any modifications were made.

HAProxy evolves as a main development branch called "master" or "mainline", from
which new branches are derived once the code is considered stable. A lot of web
sites run some development branches in production on a voluntary basis, either
to participate in the project or because they need a bleeding edge feature, and
their feedback is highly valuable to fix bugs and judge the overall quality and
stability of the version being developed.

The new branches that are created when the code is stable enough constitute a
stable version and are generally maintained for several years, so that there is
no urgency to migrate to a newer branch even when you're not on the latest.
Once a stable branch is issued, it may only receive bug fixes, and very rarely
minor feature updates when that makes users' lives easier. All fixes that go
into a stable branch necessarily come from the master branch. This guarantees
that no fix will be lost after an upgrade. For this reason, if you fix a bug,
please make the patch against the master branch, not the stable branch. You may
even discover it was already fixed. This process also ensures that regressions
in a stable branch are extremely rare, so there is never any excuse for not
upgrading to the latest version in your current branch.

Branches are numbered with two digits delimited with a dot, such as "1.6". A
complete version includes one or two sub-version numbers indicating the level of
fix. For example, version 1.5.14 is the 14th fix release in branch 1.5 after
version 1.5.0 was issued. It contains 126 fixes for individual bugs, 24 updates
on the documentation, and 75 other backported patches, most of which were needed
to fix the aforementioned 126 bugs. An existing feature may never be modified
nor removed in a stable branch, in order to guarantee that upgrades within the
same branch will always be harmless.
1397
HAProxy is available from multiple sources, at different release rhythms :

  - The official community web site : http://www.haproxy.org/ : this site
    provides the sources of the latest development release, all stable releases,
    as well as nightly snapshots for each branch. The release cycle is not fast,
    several months between stable releases, or between development snapshots.
    Very old versions are still supported there. Everything is provided as
    sources only, so whatever comes from there needs to be rebuilt and/or
    repackaged;

  - A number of operating systems such as Linux distributions and BSD ports.
    These systems generally provide long-term maintained versions which do not
    always contain all the fixes from the official ones, but which at least
    contain the critical fixes. It is often a good option for most users who do
    not seek advanced configurations and just want to keep updates easy;

  - Commercial versions from http://www.haproxy.com/ : these are supported
    professional packages built for various operating systems or provided as
    appliances, based on the latest stable versions and including a number of
    features backported from the next release for which there is a strong
    demand. It is the best option for users seeking the latest features with
    the reliability of a stable branch, the fastest response time to fix bugs,
    or simply support contracts on top of an open source product.


In order to ensure that the version you're using is the latest one in your
branch, you need to proceed this way :

  - verify which HAProxy executable you're running : some systems ship it by
    default and administrators install their versions somewhere else on the
    system, so it is important to verify in the startup scripts which one is
    used;

  - determine which source your HAProxy version comes from. For this, it's
    generally sufficient to type "haproxy -v". A development version will
    appear like this, with the "dev" word after the branch number :

        HA-Proxy version 1.6-dev3-385ecc-68 2015/08/18

    A stable version will appear like this, as well as unmodified stable
    versions provided by operating system vendors :

        HA-Proxy version 1.5.14 2015/07/02

    And a nightly snapshot of a stable version will appear like this with a
    hexadecimal sequence after the version, and with the date of the snapshot
    instead of the date of the release :

        HA-Proxy version 1.5.14-e4766ba 2015/07/29

    Any other format may indicate a system-specific package with its own
    patch set. For example HAProxy Enterprise versions will appear with the
    following format (<branch>-<latest commit>-<revision>) :

        HA-Proxy version 1.5.0-994126-357 2015/07/02

  - for system-specific packages, you have to check with your vendor's package
    repository or update system to ensure that your system is still supported,
    and that fixes are still provided for your branch. For community versions
    coming from haproxy.org, just visit the site, verify the status of your
    branch and compare the latest version with yours to see if you're on the
    latest one. If not you can upgrade. If your branch is not maintained
    anymore, you're definitely very late and will have to consider an upgrade
    to a more recent branch (carefully read the README when doing so).
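The formats above can be told apart mechanically. The Python sketch below
classifies the first line printed by "haproxy -v" using only the four samples
shown in this section; the function name, the category labels and the regular
expressions are assumptions of the example, and builds that print the version
differently are reported as unknown.

```python
import re

_FIRST_LINE = re.compile(r"HA-Proxy version (\S+) (\d{4}/\d{2}/\d{2})")
_PATTERNS = [
    ("development", re.compile(r"\d+\.\d+-dev\d+(-\S+)?")),
    ("stable", re.compile(r"\d+\.\d+(\.\d+){1,2}")),
    ("snapshot", re.compile(r"\d+\.\d+(\.\d+){1,2}-[0-9a-f]+")),
]

def classify_version(line):
    """Return (kind, version, date) for the first line of "haproxy -v"."""
    match = _FIRST_LINE.match(line)
    if not match:
        return ("unknown", None, None)
    version, date = match.groups()
    for kind, pattern in _PATTERNS:
        if pattern.fullmatch(version):
            return (kind, version, date)
    # Anything else is likely a system-specific or vendor package.
    return ("vendor-specific", version, date)

print(classify_version("HA-Proxy version 1.5.14-e4766ba 2015/07/29"))
# prints ('snapshot', '1.5.14-e4766ba', '2015/07/29')
```

A maintenance script could use the returned kind to decide whether to compare
the version against haproxy.org or against the vendor's package repository.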
1462
HAProxy will have to be updated according to the source it came from. Usually it
follows the system vendor's way of upgrading a package. If it was taken from
sources, please read the README file in the sources directory after extracting
the sources and follow the instructions for your operating system.
1467
1468
4. Companion products and alternatives
--------------------------------------

HAProxy integrates fairly well with certain products listed below, which is why
they are mentioned here even if not directly related to HAProxy.


4.1. Apache HTTP server
-----------------------

Apache is the de facto standard HTTP server. It's a very complete and modular
project supporting both file serving and dynamic contents. It can serve as a
frontend for some application servers. It can even proxy requests and cache
responses. In all of these use cases, a front load balancer is commonly needed.
Apache can work in various modes, some being heavier than others. Certain
modules still require the heavier pre-forked model and will prevent Apache from
scaling well with a high number of connections. In this case HAProxy can provide
tremendous help by enforcing the per-server connection limits to a safe value,
which significantly speeds up the server and preserves its resources so that
they are better used by the application.

Apache can extract the client's address from the X-Forwarded-For header by using
the "mod_rpaf" extension. HAProxy will automatically feed this header when
"option forwardfor" is specified in its configuration. HAProxy may also offer a
nice protection to Apache when exposed to the internet, where it will better
resist a wide range of DoS attacks.


4.2. NGINX
----------

NGINX is the second de facto standard HTTP server. Just like Apache, it covers a
wide range of features. NGINX is built on a model similar to HAProxy's, so it
has no problem dealing with tens of thousands of concurrent connections. When
used as a gateway to some applications (e.g. using the included PHP FPM) it can
often be beneficial to set up some frontend connection limiting to reduce the
load on the PHP application. HAProxy will clearly be useful there both as a
regular load balancer and as the traffic regulator to speed up PHP by
decongesting it. Also since both products use very little CPU thanks to their
event-driven architecture, it's often easy to install both of them on the same
system. NGINX implements HAProxy's PROXY protocol, thus it is easy for HAProxy
to pass the client's connection information to NGINX so that the application
gets all the relevant information. Some benchmarks have also shown that for
large static file serving, implementing consistent hash on HAProxy in front of
NGINX can be beneficial by optimizing the OS' cache hit ratio, which is
basically multiplied by the number of server nodes.
1515
1516
4.3. Varnish
------------

Varnish is a smart caching reverse-proxy, probably best described as a web
application accelerator. Varnish doesn't implement SSL/TLS and wants to dedicate
all of its CPU cycles to what it does best. Varnish also implements HAProxy's
PROXY protocol so that HAProxy can very easily be deployed in front of Varnish
as an SSL offloader as well as a load balancer and pass it all relevant client
information. Also, Varnish naturally supports decompression from the cache when
a server has provided a compressed object, but doesn't compress by itself.
HAProxy can then be used to compress outgoing data when backend servers do not
implement compression, though it's rarely a good idea to compress on the load
balancer unless the traffic is low.

When building large caching farms across multiple nodes, HAProxy can make use of
consistent URL hashing to intelligently distribute the load to the caching nodes
and avoid cache duplication, resulting in a total cache size which is the sum of
all caching nodes.
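The effect of consistent URL hashing on a cache farm can be illustrated with a
generic hash ring. The Python sketch below shows the general technique only, not
HAProxy's implementation or hash function; the class name, the node names, the
use of SHA-1 and the number of points per node are all assumptions made for the
example.

```python
import bisect
import hashlib

def _point(key):
    """Map a string to a position on the ring (a 64-bit integer)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes, replicas=100):
        # Each node owns several points so that the load spreads evenly.
        self._ring = sorted(
            (_point("%s#%d" % (node, i)), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, url):
        """Return the node owning the first point at or after the URL's hash."""
        index = bisect.bisect_left(self._points, _point(url)) % len(self._ring)
        return self._ring[index][1]

urls = ["/img/%d.png" % i for i in range(1000)]
three = HashRing(["cache1", "cache2", "cache3"])
two = HashRing(["cache1", "cache2"])
moved = sum(1 for u in urls if three.node_for(u) != two.node_for(u))
on_cache3 = sum(1 for u in urls if three.node_for(u) == "cache3")
print(moved == on_cache3)
# prints True
```

Each URL always lands on the same node, so an object is cached only once in the
farm, and removing a node only remaps the URLs that node owned while the other
caches keep their content.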
1535
1536
4.4. Alternatives
-----------------

Linux Virtual Server (LVS or IPVS) is the layer 4 load balancer included within
the Linux kernel. It works at the packet level and handles TCP and UDP. In most
cases it's more a complement than an alternative since it doesn't have layer 7
knowledge at all.

Pound is another well-known load balancer. It's much simpler and has far fewer
features than HAProxy but for many very basic setups both can be used. Its
author has always focused on code auditability first and wants to keep the
feature set small. Its thread-based architecture scales less well with high
connection counts, but it's a good product.

Pen is a fairly lightweight load balancer. It supports SSL and maintains
persistence using a fixed-size table of its clients' IP addresses. It supports a
packet-oriented mode allowing it to support direct server return and UDP to some
extent. It is meant for small loads (the persistence table only has 2048
entries).

NGINX can do some load balancing to some extent, though it's clearly not its
primary function. Production traffic is used to detect server failures, the
load balancing algorithms are more limited, and the stickiness is very limited.
But it can make sense in some simple deployment scenarios where it is already
present. The good thing is that since it integrates very well with HAProxy,
there's nothing wrong with adding HAProxy later when its limits have been
reached.

Varnish also does some load balancing of its backend servers and does support
real health checks. It doesn't implement stickiness however, so just like with
NGINX, as long as stickiness is not needed that can be enough to start with.
And similarly, since HAProxy and Varnish integrate so well together, it's easy
to add HAProxy later into the mix to complement the feature set.
1569