blob: 325ba76388bd344b644126c437ba8efd454a8646 [file] [log] [blame]
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001 -----------------------
2 HAProxy Starter Guide
3 -----------------------
Willy Tarreau1db55792020-11-05 17:20:35 +01004 version 2.4
Willy Tarreaud8e42b62015-08-18 21:51:36 +02005
6
7This document is an introduction to HAProxy for all those who don't know it, as
8well as for those who want to re-discover it when they know older versions. Its
9primary focus is to provide users with all the elements to decide if HAProxy is
10the product they're looking for or not. Advanced users may find here some parts
11of solutions to some ideas they had just because they were not aware of a given
Davor Ocelic4094ce12017-12-19 23:30:39 +010012new feature. Some sizing information is also provided, the product's lifecycle
Willy Tarreaud8e42b62015-08-18 21:51:36 +020013is explained, and comparisons with partially overlapping products are provided.
14
Davor Ocelic4094ce12017-12-19 23:30:39 +010015This document doesn't provide any configuration help or hints, but it explains
Willy Tarreaud8e42b62015-08-18 21:51:36 +020016where to find the relevant documents. The summary below is meant to help you
17search sections by name and navigate through the document.
18
19Note to documentation contributors :
20 This document is formatted with 80 columns per line, with even number of
21 spaces for indentation and without tabs. Please follow these rules strictly
22 so that it remains easily printable everywhere. If you add sections, please
23 update the summary below for easier searching.
24
25
26Summary
27-------
28
291. Available documentation
30
312. Quick introduction to load balancing and load balancers
32
333. Introduction to HAProxy
343.1. What HAProxy is and is not
353.2. How HAProxy works
363.3. Basic features
373.3.1. Proxying
383.3.2. SSL
393.3.3. Monitoring
403.3.4. High availability
413.3.5. Load balancing
423.3.6. Stickiness
Willy Tarreaucf5b8ab2022-05-31 16:23:06 +0200433.3.7. Logging
443.4. Standard features
453.4.1. Sampling and converting information
463.4.2. Maps
473.4.3. ACLs and conditions
483.4.4. Content switching
493.4.5. Stick-tables
503.4.6. Formatted strings
513.4.7. HTTP rewriting and redirection
523.4.8. Server protection
533.4.9. Statistics
543.5. Advanced features
553.5.1. Management
563.5.2. System-specific capabilities
573.5.3. Scripting
583.6. Sizing
593.7. How to get HAProxy
Willy Tarreaud8e42b62015-08-18 21:51:36 +020060
614. Companion products and alternatives
624.1. Apache HTTP server
634.2. NGINX
644.3. Varnish
654.4. Alternatives
66
Willy Tarreau65626232020-05-05 18:08:07 +0200675. Contacts
68
Willy Tarreaud8e42b62015-08-18 21:51:36 +020069
701. Available documentation
71--------------------------
72
73The complete HAProxy documentation is contained in the following documents.
74Please ensure to consult the relevant documentation to save time and to get the
75most accurate response to your needs. Also please refrain from sending questions
76to the mailing list whose responses are present in these documents.
77
78 - intro.txt (this document) : it presents the basics of load balancing,
79 HAProxy as a product, what it does, what it doesn't do, some known traps to
80 avoid, some OS-specific limitations, how to get it, how it evolves, how to
Davor Ocelic4094ce12017-12-19 23:30:39 +010081 ensure you're running with all known fixes, how to update it, complements
82 and alternatives.
Willy Tarreaud8e42b62015-08-18 21:51:36 +020083
Willy Tarreau373933d2015-10-13 16:32:20 +020084 - management.txt : it explains how to start haproxy, how to manage it at
Davor Ocelic4094ce12017-12-19 23:30:39 +010085 runtime, how to manage it on multiple nodes, and how to proceed with
86 seamless upgrades.
Willy Tarreau373933d2015-10-13 16:32:20 +020087
Willy Tarreaud8e42b62015-08-18 21:51:36 +020088 - configuration.txt : the reference manual details all configuration keywords
89 and their options. It is used when a configuration change is needed.
90
Willy Tarreaud8e42b62015-08-18 21:51:36 +020091 - coding-style.txt : this is for developers who want to propose some code to
Davor Ocelic4094ce12017-12-19 23:30:39 +010092 the project. It explains the style to adopt for the code. It is not very
93 strict and not all the code base completely respects it, but contributions
Willy Tarreaud8e42b62015-08-18 21:51:36 +020094 which diverge too much from it will be rejected.
95
96 - proxy-protocol.txt : this is the de-facto specification of the PROXY
97 protocol which is implemented by HAProxy and a number of third party
98 products.
99
Davor Ocelic4094ce12017-12-19 23:30:39 +0100100 - README : how to build HAProxy from sources
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200101
102
1032. Quick introduction to load balancing and load balancers
104----------------------------------------------------------
105
106Load balancing consists in aggregating multiple components in order to achieve
107a total processing capacity above each component's individual capacity, without
108any intervention from the end user and in a scalable way. This results in more
Willy Tarreaueff04f42015-08-27 14:44:43 +0200109operations being performed simultaneously by the time it takes a component to
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200110perform only one. A single operation however will still be performed on a single
111component at a time and will not get faster than without load balancing. It
112always requires at least as many operations as available components and an
113efficient load balancing mechanism to make use of all components and to fully
114benefit from the load balancing. A good example of this is the number of lanes
115on a highway which allows as many cars to pass during the same time frame
116without increasing their individual speed.
117
118Examples of load balancing :
119
120 - Process scheduling in multi-processor systems
Davor Ocelic4094ce12017-12-19 23:30:39 +0100121 - Link load balancing (e.g. EtherChannel, Bonding)
122 - IP address load balancing (e.g. ECMP, DNS round-robin)
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200123 - Server load balancing (via load balancers)
124
125The mechanism or component which performs the load balancing operation is
126called a load balancer. In web environments these components are called a
127"network load balancer", and more commonly a "load balancer" given that this
128activity is by far the best known case of load balancing.
129
130A load balancer may act :
131
132 - at the link level : this is called link load balancing, and it consists in
Patrick Starrdce734e2017-10-09 13:17:12 +0700133 choosing what network link to send a packet to;
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200134
135 - at the network level : this is called network load balancing, and it
Patrick Starrdce734e2017-10-09 13:17:12 +0700136 consists in choosing what route a series of packets will follow;
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200137
138 - at the server level : this is called server load balancing and it consists
139 in deciding what server will process a connection or request.
140
141Two distinct technologies exist and address different needs, though with some
Willy Tarreaueff04f42015-08-27 14:44:43 +0200142overlapping. In each case it is important to keep in mind that load balancing
143consists in diverting the traffic from its natural flow and that doing so always
144requires a minimum of care to maintain the required level of consistency between
145all routing decisions.
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200146
147The first one acts at the packet level and processes packets more or less
148individually. There is a 1-to-1 relation between input and output packets, so
149it is possible to follow the traffic on both sides of the load balancer using a
Davor Ocelic4094ce12017-12-19 23:30:39 +0100150regular network sniffer. This technology can be very cheap and extremely fast.
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200151It is usually implemented in hardware (ASICs) allowing to reach line rate, such
Davor Ocelic4094ce12017-12-19 23:30:39 +0100152as switches doing ECMP. Usually stateless, it can also be stateful (consider
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200153the session a packet belongs to and called layer4-LB or L4), may support DSR
154(direct server return, without passing through the LB again) if the packets
155were not modified, but provides almost no content awareness. This technology is
156very well suited to network-level load balancing, though it is sometimes used
157for very basic server load balancing at high speed.
158
159The second one acts on session contents. It requires that the input streams is
160reassembled and processed as a whole. The contents may be modified, and the
161output stream is segmented into new packets. For this reason it is generally
162performed by proxies and they're often called layer 7 load balancers or L7.
163This implies that there are two distinct connections on each side, and that
164there is no relation between input and output packets sizes nor counts. Clients
165and servers are not required to use the same protocol (for example IPv4 vs
166IPv6, clear vs SSL). The operations are always stateful, and the return traffic
167must pass through the load balancer. The extra processing comes with a cost so
168it's not always possible to achieve line rate, especially with small packets.
169On the other hand, it offers wide possibilities and is generally achieved by
170pure software, even if embedded into hardware appliances. This technology is
171very well suited for server load balancing.
172
173Packet-based load balancers are generally deployed in cut-through mode, so they
174are installed on the normal path of the traffic and divert it according to the
175configuration. The return traffic doesn't necessarily pass through the load
176balancer. Some modifications may be applied to the network destination address
177in order to direct the traffic to the proper destination. In this case, it is
178mandatory that the return traffic passes through the load balancer. If the
179routes doesn't make this possible, the load balancer may also replace the
180packets' source address with its own in order to force the return traffic to
181pass through it.
182
Davor Ocelic4094ce12017-12-19 23:30:39 +0100183Proxy-based load balancers are deployed as a server with their own IP addresses
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200184and ports, without architecture changes. Sometimes this requires to perform some
185adaptations to the applications so that clients are properly directed to the
186load balancer's IP address and not directly to the server's. Some load balancers
Davor Ocelic4094ce12017-12-19 23:30:39 +0100187may have to adjust some servers' responses to make this possible (e.g. the HTTP
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200188Location header field used in HTTP redirects). Some proxy-based load balancers
189may intercept traffic for an address they don't own, and spoof the client's
190address when connecting to the server. This allows them to be deployed as if
191they were a regular router or firewall, in a cut-through mode very similar to
192the packet based load balancers. This is particularly appreciated for products
193which combine both packet mode and proxy mode. In this case DSR is obviously
194still not possible and the return traffic still has to be routed back to the
195load balancer.
196
197A very scalable layered approach would consist in having a front router which
198receives traffic from multiple load balanced links, and uses ECMP to distribute
199this traffic to a first layer of multiple stateful packet-based load balancers
200(L4). These L4 load balancers in turn pass the traffic to an even larger number
201of proxy-based load balancers (L7), which have to parse the contents to decide
202what server will ultimately receive the traffic.
203
204The number of components and possible paths for the traffic increases the risk
205of failure; in very large environments, it is even normal to permanently have
206a few faulty components being fixed or replaced. Load balancing done without
207awareness of the whole stack's health significantly degrades availability. For
208this reason, any sane load balancer will verify that the components it intends
209to deliver the traffic to are still alive and reachable, and it will stop
210delivering traffic to faulty ones. This can be achieved using various methods.
211
212The most common one consists in periodically sending probes to ensure the
213component is still operational. These probes are called "health checks". They
214must be representative of the type of failure to address. For example a ping-
215based check will not detect that a web server has crashed and doesn't listen to
216a port anymore, while a connection to the port will verify this, and a more
217advanced request may even validate that the server still works and that the
218database it relies on is still accessible. Health checks often involve a few
219retries to cover for occasional measuring errors. The period between checks
220must be small enough to ensure the faulty component is not used for too long
221after an error occurs.
222
223Other methods consist in sampling the production traffic sent to a destination
Davor Ocelic4094ce12017-12-19 23:30:39 +0100224to observe if it is processed correctly or not, and to evict the components
Patrick Starrdce734e2017-10-09 13:17:12 +0700225which return inappropriate responses. However this requires to sacrifice a part
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200226of the production traffic and this is not always acceptable. A combination of
227these two mechanisms provides the best of both worlds, with both of them being
228used to detect a fault, and only health checks to detect the end of the fault.
229A last method involves centralized reporting : a central monitoring agent
230periodically updates all load balancers about all components' state. This gives
231a global view of the infrastructure to all components, though sometimes with
232less accuracy or responsiveness. It's best suited for environments with many
233load balancers and many servers.
234
235Layer 7 load balancers also face another challenge known as stickiness or
236persistence. The principle is that they generally have to direct multiple
237subsequent requests or connections from a same origin (such as an end user) to
238the same target. The best known example is the shopping cart on an online
239store. If each click leads to a new connection, the user must always be sent
240to the server which holds his shopping cart. Content-awareness makes it easier
241to spot some elements in the request to identify the server to deliver it to,
242but that's not always enough. For example if the source address is used as a
243key to pick a server, it can be decided that a hash-based algorithm will be
244used and that a given IP address will always be sent to the same server based
245on a divide of the address by the number of available servers. But if one
246server fails, the result changes and all users are suddenly sent to a different
247server and lose their shopping cart. The solution against this issue consists
248in memorizing the chosen target so that each time the same visitor is seen,
249he's directed to the same server regardless of the number of available servers.
250The information may be stored in the load balancer's memory, in which case it
251may have to be replicated to other load balancers if it's not alone, or it may
252be stored in the client's memory using various methods provided that the client
253is able to present this information back with every request (cookie insertion,
254redirection to a sub-domain, etc). This mechanism provides the extra benefit of
255not having to rely on unstable or unevenly distributed information (such as the
256source IP address). This is in fact the strongest reason to adopt a layer 7
257load balancer instead of a layer 4 one.
258
259In order to extract information such as a cookie, a host header field, a URL
260or whatever, a load balancer may need to decrypt SSL/TLS traffic and even
Davor Ocelic4094ce12017-12-19 23:30:39 +0100261possibly to re-encrypt it when passing it to the server. This expensive task
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200262explains why in some high-traffic infrastructures, sometimes there may be a
263lot of load balancers.
264
265Since a layer 7 load balancer may perform a number of complex operations on the
266traffic (decrypt, parse, modify, match cookies, decide what server to send to,
267etc), it can definitely cause some trouble and will very commonly be accused of
268being responsible for a lot of trouble that it only revealed. Often it will be
269discovered that servers are unstable and periodically go up and down, or for
270web servers, that they deliver pages with some hard-coded links forcing the
271clients to connect directly to one specific server without passing via the load
272balancer, or that they take ages to respond under high load causing timeouts.
273That's why logging is an extremely important aspect of layer 7 load balancing.
274Once a trouble is reported, it is important to figure if the load balancer took
275a wrong decision and if so why so that it doesn't happen anymore.
276
277
2783. Introduction to HAProxy
279--------------------------
280
Davor Ocelic4094ce12017-12-19 23:30:39 +0100281HAProxy is written as "HAProxy" to designate the product, and as "haproxy" to
282designate the executable program, software package or a process. However, both
283are commonly used for both purposes, and are pronounced H-A-Proxy. Very early,
284"haproxy" used to stand for "high availability proxy" and the name was written
285in two separate words, though by now it means nothing else than "HAProxy".
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200286
287
Davor Ocelic4094ce12017-12-19 23:30:39 +01002883.1. What HAProxy is and isn't
289------------------------------
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200290
291HAProxy is :
292
293 - a TCP proxy : it can accept a TCP connection from a listening socket,
294 connect to a server and attach these sockets together allowing traffic to
Willy Tarreauec8962c2020-05-05 17:39:16 +0200295 flow in both directions; IPv4, IPv6 and even UNIX sockets are supported on
296 either side, so this can provide an easy way to translate addresses between
297 different families.
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200298
299 - an HTTP reverse-proxy (called a "gateway" in HTTP terminology) : it presents
300 itself as a server, receives HTTP requests over connections accepted on a
301 listening TCP socket, and passes the requests from these connections to
Willy Tarreauec8962c2020-05-05 17:39:16 +0200302 servers using different connections. It may use any combination of HTTP/1.x
303 or HTTP/2 on any side and will even automatically detect the protocol
304 spoken on each side when ALPN is used over TLS.
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200305
306 - an SSL terminator / initiator / offloader : SSL/TLS may be used on the
307 connection coming from the client, on the connection going to the server,
Willy Tarreauec8962c2020-05-05 17:39:16 +0200308 or even on both connections. A lot of settings can be applied per name
309 (SNI), and may be updated at runtime without restarting. Such setups are
310 extremely scalable and deployments involving tens to hundreds of thousands
311 of certificates were reported.
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200312
313 - a TCP normalizer : since connections are locally terminated by the operating
314 system, there is no relation between both sides, so abnormal traffic such as
315 invalid packets, flag combinations, window advertisements, sequence numbers,
316 incomplete connections (SYN floods), or so will not be passed to the other
317 side. This protects fragile TCP stacks from protocol attacks, and also
318 allows to optimize the connection parameters with the client without having
319 to modify the servers' TCP stack settings.
320
321 - an HTTP normalizer : when configured to process HTTP traffic, only valid
322 complete requests are passed. This protects against a lot of protocol-based
323 attacks. Additionally, protocol deviations for which there is a tolerance
324 in the specification are fixed so that they don't cause problem on the
Davor Ocelic4094ce12017-12-19 23:30:39 +0100325 servers (e.g. multiple-line headers).
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200326
327 - an HTTP fixing tool : it can modify / fix / add / remove / rewrite the URL
328 or any request or response header. This helps fixing interoperability issues
329 in complex environments.
330
331 - a content-based switch : it can consider any element from the request to
332 decide what server to pass the request or connection to. Thus it is possible
Davor Ocelic4094ce12017-12-19 23:30:39 +0100333 to handle multiple protocols over a same port (e.g. HTTP, HTTPS, SSH).
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200334
335 - a server load balancer : it can load balance TCP connections and HTTP
336 requests. In TCP mode, load balancing decisions are taken for the whole
337 connection. In HTTP mode, decisions are taken per request.
338
339 - a traffic regulator : it can apply some rate limiting at various points,
340 protect the servers against overloading, adjust traffic priorities based on
341 the contents, and even pass such information to lower layers and outer
342 network components by marking packets.
343
344 - a protection against DDoS and service abuse : it can maintain a wide number
345 of statistics per IP address, URL, cookie, etc and detect when an abuse is
346 happening, then take action (slow down the offenders, block them, send them
347 to outdated contents, etc).
348
349 - an observation point for network troubleshooting : due to the precision of
350 the information reported in logs, it is often used to narrow down some
351 network-related issues.
352
353 - an HTTP compression offloader : it can compress responses which were not
354 compressed by the server, thus reducing the page load time for clients with
355 poor connectivity or using high-latency, mobile networks.
356
Willy Tarreauec8962c2020-05-05 17:39:16 +0200357 - a caching proxy : it may cache responses in RAM so that subsequent requests
358 for the same object avoid the cost of another network transfer from the
359 server as long as the object remains present and valid. It will however not
360 store objects to any persistent storage. Please note that this caching
361 feature is designed to be maintenance free and focuses solely on saving
362 haproxy's precious resources and not on save the server's resources. Caches
363 designed to optimize servers require much more tuning and flexibility. If
364 you instead need such an advanced cache, please use Varnish Cache, which
365 integrates perfectly with haproxy, especially when SSL/TLS is needed on any
366 side.
367
368 - a FastCGI gateway : FastCGI can be seen as a different representation of
369 HTTP, and as such, HAProxy can directly load-balance a farm comprising any
370 combination of FastCGI application servers without requiring to insert
371 another level of gateway between them. This results in resource savings and
372 a reduction of maintenance costs.
373
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200374HAProxy is not :
375
Davor Ocelic4094ce12017-12-19 23:30:39 +0100376 - an explicit HTTP proxy, i.e. the proxy that browsers use to reach the
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200377 internet. There are excellent open-source software dedicated for this task,
378 such as Squid. However HAProxy can be installed in front of such a proxy to
379 provide load balancing and high availability.
380
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200381 - a data scrubber : it will not modify the body of requests nor responses.
382
Willy Tarreauec8962c2020-05-05 17:39:16 +0200383 - a static web server : during startup, it isolates itself inside a chroot
384 jail and drops its privileges, so that it will not perform any single file-
385 system access once started. As such it cannot be turned into a static web
386 server (dynamic servers are supported through FastCGI however). There are
387 excellent open-source software for this such as Apache or Nginx, and
388 HAProxy can be easily installed in front of them to provide load balancing,
389 high availability and acceleration.
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200390
391 - a packet-based load balancer : it will not see IP packets nor UDP datagrams,
392 will not perform NAT or even less DSR. These are tasks for lower layers.
393 Some kernel-based components such as IPVS (Linux Virtual Server) already do
394 this pretty well and complement perfectly with HAProxy.
395
396
3973.2. How HAProxy works
398----------------------
399
Willy Tarreauec8962c2020-05-05 17:39:16 +0200400HAProxy is an event-driven, non-blocking engine combining a very fast I/O layer
401with a priority-based, multi-threaded scheduler. As it is designed with a data
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200402forwarding goal in mind, its architecture is optimized to move data as fast as
Willy Tarreauec8962c2020-05-05 17:39:16 +0200403possible with the least possible operations. It focuses on optimizing the CPU
404cache's efficiency by sticking connections to the same CPU as long as possible.
405As such it implements a layered model offering bypass mechanisms at each level
406ensuring data doesn't reach higher levels unless needed. Most of the processing
407is performed in the kernel, and HAProxy does its best to help the kernel do the
408work as fast as possible by giving some hints or by avoiding certain operation
409when it guesses they could be grouped later. As a result, typical figures show
41015% of the processing time spent in HAProxy versus 85% in the kernel in TCP or
411HTTP close mode, and about 30% for HAProxy versus 70% for the kernel in HTTP
412keep-alive mode.
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200413
414A single process can run many proxy instances; configurations as large as
Willy Tarreauec8962c2020-05-05 17:39:16 +0200415300000 distinct proxies in a single process were reported to run fine. A single
416core, single CPU setup is far more than enough for more than 99% users, and as
417such, users of containers and virtual machines are encouraged to use the
418absolute smallest images they can get to save on operational costs and simplify
419troubleshooting. However the machine HAProxy runs on must never ever swap, and
420its CPU must not be artificially throttled (sub-CPU allocation in hypervisors)
421nor be shared with compute-intensive processes which would induce a very high
422context-switch latency.
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200423
Willy Tarreauec8962c2020-05-05 17:39:16 +0200424Threading allows to exploit all available processing capacity by using one
425thread per CPU core. This is mostly useful for SSL or when data forwarding
426rates above 40 Gbps are needed. In such cases it is critically important to
427avoid communications between multiple physical CPUs, which can cause strong
428bottlenecks in the network stack and in HAProxy itself. While counter-intuitive
429to some, the first thing to do when facing some performance issues is often to
430reduce the number of CPUs HAProxy runs on.
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200431
432HAProxy only requires the haproxy executable and a configuration file to run.
433For logging it is highly recommended to have a properly configured syslog daemon
Willy Tarreauec8962c2020-05-05 17:39:16 +0200434and log rotations in place. Logs may also be sent to stdout/stderr, which can be
435useful inside containers. The configuration files are parsed before starting,
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200436then HAProxy tries to bind all listening sockets, and refuses to start if
437anything fails. Past this point it cannot fail anymore. This means that there
438are no runtime failures and that if it accepts to start, it will work until it
439is stopped.
440
441Once HAProxy is started, it does exactly 3 things :
442
443 - process incoming connections;
444
445 - periodically check the servers' status (known as health checks);
446
447 - exchange information with other haproxy nodes.
448
449Processing incoming connections is by far the most complex task as it depends
450on a lot of configuration possibilities, but it can be summarized as the 9 steps
451below :
452
453 - accept incoming connections from listening sockets that belong to a
454 configuration entity known as a "frontend", which references one or multiple
455 listening addresses;
456
457 - apply the frontend-specific processing rules to these connections that may
458 result in blocking them, modifying some headers, or intercepting them to
459 execute some internal applets such as the statistics page or the CLI;
460
461 - pass these incoming connections to another configuration entity representing
462 a server farm known as a "backend", which contains the list of servers and
463 the load balancing strategy for this server farm;
464
465 - apply the backend-specific processing rules to these connections;
466
467 - decide which server to forward the connection to according to the load
468 balancing strategy;
469
470 - apply the backend-specific processing rules to the response data;
471
472 - apply the frontend-specific processing rules to the response data;
473
474 - emit a log to report what happened in fine details;
475
476 - in HTTP, loop back to the second step to wait for a new request, otherwise
477 close the connection.
478
479Frontends and backends are sometimes considered as half-proxies, since they only
480look at one side of an end-to-end connection; the frontend only cares about the
481clients while the backend only cares about the servers. HAProxy also supports
482full proxies which are exactly the union of a frontend and a backend. When HTTP
483processing is desired, the configuration will generally be split into frontends
484and backends as they open a lot of possibilities since any frontend may pass a
485connection to any backend. With TCP-only proxies, using frontends and backends
486rarely provides a benefit and the configuration can be more readable with full
487proxies.
488
489
4903.3. Basic features
491-------------------
492
493This section will enumerate a number of features that HAProxy implements, some
494of which are generally expected from any modern load balancer, and some of
495which are a direct benefit of HAProxy's architecture. More advanced features
496will be detailed in the next section.
497
498
4993.3.1. Basic features : Proxying
500--------------------------------
501
502Proxying is the action of transferring data between a client and a server over
Patrick Starrdce734e2017-10-09 13:17:12 +0700503two independent connections. The following basic features are supported by
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200504HAProxy regarding proxying and connection management :
505
506 - Provide the server with a clean connection to protect them against any
507 client-side defect or attack;
508
Patrick Starrdce734e2017-10-09 13:17:12 +0700509 - Listen to multiple IP addresses and/or ports, even port ranges;
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200510
Davor Ocelic4094ce12017-12-19 23:30:39 +0100511 - Transparent accept : intercept traffic targeting any arbitrary IP address
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200512 that doesn't even belong to the local system;
513
514 - Server port doesn't need to be related to listening port, and may even be
515 translated by a fixed offset (useful with ranges);
516
517 - Transparent connect : spoof the client's (or any) IP address if needed
518 when connecting to the server;
519
520 - Provide a reliable return IP address to the servers in multi-site LBs;
521
522 - Offload the server thanks to buffers and possibly short-lived connections
523 to reduce their concurrent connection count and their memory footprint;
524
Davor Ocelic4094ce12017-12-19 23:30:39 +0100525 - Optimize TCP stacks (e.g. SACK), congestion control, and reduce RTT impacts;
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200526
Davor Ocelic4094ce12017-12-19 23:30:39 +0100527 - Support different protocol families on both sides (e.g. IPv4/IPv6/Unix);
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200528
529 - Timeout enforcement : HAProxy supports multiple levels of timeouts depending
530 on the stage the connection is, so that a dead client or server, or an
531 attacker cannot be granted resources for too long;
532
533 - Protocol validation: HTTP, SSL, or payload are inspected and invalid
534 protocol elements are rejected, unless instructed to accept them anyway;
535
536 - Policy enforcement : ensure that only what is allowed may be forwarded;
537
538 - Both incoming and outgoing connections may be limited to certain network
539 namespaces (Linux only), making it easy to build a cross-container,
540 multi-tenant load balancer;
541
542 - PROXY protocol presents the client's IP address to the server even for
543 non-HTTP traffic. This is an HAProxy extension that was adopted by a number
544 of third-party products by now, at least these ones at the time of writing :
545 - client : haproxy, stud, stunnel, exaproxy, ELB, squid
546 - server : haproxy, stud, postfix, exim, nginx, squid, node.js, varnish
547
548
5493.3.2. Basic features : SSL
550---------------------------
551
552HAProxy's SSL stack is recognized as one of the most featureful according to
553Google's engineers (http://istlsfastyet.com/). The most commonly used features
554making it quite complete are :
555
556 - SNI-based multi-hosting with no limit on sites count and focus on
557 performance. At least one deployment is known for running 50000 domains
558 with their respective certificates;
559
560 - support for wildcard certificates reduces the need for many certificates ;
561
562 - certificate-based client authentication with configurable policies on
563 failure to present a valid certificate. This allows to present a different
564 server farm to regenerate the client certificate for example;
565
566 - authentication of the backend server ensures the backend server is the real
567 one and not a man in the middle;
568
Patrick Starrdce734e2017-10-09 13:17:12 +0700569 - authentication with the backend server lets the backend server know it's
570 really the expected haproxy node that is connecting to it;
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200571
572 - TLS NPN and ALPN extensions make it possible to reliably offload SPDY/HTTP2
573 connections and pass them in clear text to backend servers;
574
575 - OCSP stapling further reduces first page load time by delivering inline an
576 OCSP response when the client requests a Certificate Status Request;
577
578 - Dynamic record sizing provides both high performance and low latency, and
579 significantly reduces page load time by letting the browser start to fetch
580 new objects while packets are still in flight;
581
582 - permanent access to all relevant SSL/TLS layer information for logging,
Davor Ocelic4094ce12017-12-19 23:30:39 +0100583 access control, reporting etc. These elements can be embedded into HTTP
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200584 header or even as a PROXY protocol extension so that the offloaded server
585 gets all the information it would have had if it performed the SSL
586 termination itself.
587
588 - Detect, log and block certain known attacks even on vulnerable SSL libs,
589 such as the Heartbleed attack affecting certain versions of OpenSSL.
590
Pavlos Parissisba56d9c2015-08-24 13:14:32 +0200591 - support for stateless session resumption (RFC 5077 TLS Ticket extension).
592 TLS tickets can be updated from CLI which provides them means to implement
593 Perfect Forward Secrecy by frequently rotating the tickets.
594
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200595
5963.3.3. Basic features : Monitoring
597----------------------------------
598
599HAProxy focuses a lot on availability. As such it cares about servers state,
600and about reporting its own state to other network components :
601
Patrick Starrdce734e2017-10-09 13:17:12 +0700602 - Servers' state is continuously monitored using per-server parameters. This
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200603 ensures the path to the server is operational for regular traffic;
604
605 - Health checks support two hysteresis for up and down transitions in order
606 to protect against state flapping;
607
608 - Checks can be sent to a different address/port/protocol : this makes it
609 easy to check a single service that is considered representative of multiple
610 ones, for example the HTTPS port for an HTTP+HTTPS server.
611
612 - Servers can track other servers and go down simultaneously : this ensures
Davor Ocelic4094ce12017-12-19 23:30:39 +0100613 that servers hosting multiple services can fail atomically and that no one
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200614 will be sent to a partially failed server;
615
616 - Agents may be deployed on the server to monitor load and health : a server
617 may be interested in reporting its load, operational status, administrative
Patrick Starrdce734e2017-10-09 13:17:12 +0700618 status independently from what health checks can see. By running a simple
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200619 agent on the server, it's possible to consider the server's view of its own
620 health in addition to the health checks validating the whole path;
621
622 - Various check methods are available : TCP connect, HTTP request, SMTP hello,
623 SSL hello, LDAP, SQL, Redis, send/expect scripts, all with/without SSL;
624
625 - State change is notified in the logs and stats page with the failure reason
Davor Ocelic4094ce12017-12-19 23:30:39 +0100626 (e.g. the HTTP response received at the moment the failure was detected). An
Willy Tarreaueff04f42015-08-27 14:44:43 +0200627 e-mail can also be sent to a configurable address upon such a change ;
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200628
629 - Server state is also reported on the stats interface and can be used to take
630 routing decisions so that traffic may be sent to different farms depending
Davor Ocelic4094ce12017-12-19 23:30:39 +0100631 on their sizes and/or health (e.g. loss of an inter-DC link);
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200632
633 - HAProxy can use health check requests to pass information to the servers,
Davor Ocelic4094ce12017-12-19 23:30:39 +0100634 such as their names, weight, the number of other servers in the farm etc.
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200635 so that servers can adjust their response and decisions based on this
Davor Ocelic4094ce12017-12-19 23:30:39 +0100636 knowledge (e.g. postpone backups to keep more CPU available);
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200637
638 - Servers can use health checks to report more detailed state than just on/off
Davor Ocelic4094ce12017-12-19 23:30:39 +0100639 (e.g. I would like to stop, please stop sending new visitors);
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200640
641 - HAProxy itself can report its state to external components such as routers
642 or other load balancers, allowing to build very complete multi-path and
643 multi-layer infrastructures.
644
645
6463.3.4. Basic features : High availability
647-----------------------------------------
648
649Just like any serious load balancer, HAProxy cares a lot about availability to
650ensure the best global service continuity :
651
Davor Ocelic4094ce12017-12-19 23:30:39 +0100652 - Only valid servers are used ; the other ones are automatically evicted from
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200653 load balancing farms ; under certain conditions it is still possible to
654 force to use them though;
655
656 - Support for a graceful shutdown so that it is possible to take servers out
657 of a farm without affecting any connection;
658
659 - Backup servers are automatically used when active servers are down and
660 replace them so that sessions are not lost when possible. This also allows
Davor Ocelic4094ce12017-12-19 23:30:39 +0100661 to build multiple paths to reach the same server (e.g. multiple interfaces);
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200662
663 - Ability to return a global failed status for a farm when too many servers
664 are down. This, combined with the monitoring capabilities makes it possible
665 for an upstream component to choose a different LB node for a given service;
666
667 - Stateless design makes it easy to build clusters : by design, HAProxy does
668 its best to ensure the highest service continuity without having to store
669 information that could be lost in the event of a failure. This ensures that
670 a takeover is the most seamless possible;
671
672 - Integrates well with standard VRRP daemon keepalived : HAProxy easily tells
Patrick Starrdce734e2017-10-09 13:17:12 +0700673 keepalived about its state and copes very well with floating virtual IP
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200674 addresses. Note: only use IP redundancy protocols (VRRP/CARP) over cluster-
675 based solutions (Heartbeat, ...) as they're the ones offering the fastest,
676 most seamless, and most reliable switchover.
677
678
6793.3.5. Basic features : Load balancing
680--------------------------------------
681
682HAProxy offers a fairly complete set of load balancing features, most of which
683are unfortunately not available in a number of other load balancing products :
684
Willy Tarreauec8962c2020-05-05 17:39:16 +0200685 - no less than 10 load balancing algorithms are supported, some of which apply
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200686 to input data to offer an infinite list of possibilities. The most common
687 ones are round-robin (for short connections, pick each server in turn),
688 leastconn (for long connections, pick the least recently used of the servers
689 with the lowest connection count), source (for SSL farms or terminal server
Davor Ocelic4094ce12017-12-19 23:30:39 +0100690 farms, the server directly depends on the client's source address), URI (for
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200691 HTTP caches, the server directly depends on the HTTP URI), hdr (the server
692 directly depends on the contents of a specific HTTP header field), first
693 (for short-lived virtual machines, all connections are packed on the
694 smallest possible subset of servers so that unused ones can be powered
695 down);
696
697 - all algorithms above support per-server weights so that it is possible to
698 accommodate from different server generations in a farm, or direct a small
699 fraction of the traffic to specific servers (debug mode, running the next
700 version of the software, etc);
701
702 - dynamic weights are supported for round-robin, leastconn and consistent
703 hashing ; this allows server weights to be modified on the fly from the CLI
704 or even by an agent running on the server;
705
706 - slow-start is supported whenever a dynamic weight is supported; this allows
707 a server to progressively take the traffic. This is an important feature
708 for fragile application servers which require to compile classes at runtime
709 as well as cold caches which need to fill up before being run at full
710 throttle;
711
712 - hashing can apply to various elements such as client's source address, URL
713 components, query string element, header field values, POST parameter, RDP
714 cookie;
715
716 - consistent hashing protects server farms against massive redistribution when
717 adding or removing servers in a farm. That's very important in large cache
718 farms and it allows slow-start to be used to refill cold caches;
719
720 - a number of internal metrics such as the number of connections per server,
721 per backend, the amount of available connection slots in a backend etc makes
722 it possible to build very advanced load balancing strategies.
723
724
7253.3.6. Basic features : Stickiness
726----------------------------------
727
728Application load balancing would be useless without stickiness. HAProxy provides
729a fairly comprehensive set of possibilities to maintain a visitor on the same
730server even across various events such as server addition/removal, down/up
731cycles, and some methods are designed to be resistant to the distance between
732multiple load balancing nodes in that they don't require any replication :
733
734 - stickiness information can be individually matched and learned from
735 different places if desired. For example a JSESSIONID cookie may be matched
736 both in a cookie and in the URL. Up to 8 parallel sources can be learned at
737 the same time and each of them may point to a different stick-table;
738
739 - stickiness information can come from anything that can be seen within a
740 request or response, including source address, TCP payload offset and
Patrick Starrdce734e2017-10-09 13:17:12 +0700741 length, HTTP query string elements, header field values, cookies, and so
Davor Ocelic4094ce12017-12-19 23:30:39 +0100742 on.
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200743
Davor Ocelic4094ce12017-12-19 23:30:39 +0100744 - stick-tables are replicated between all nodes in a multi-master fashion;
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200745
746 - commonly used elements such as SSL-ID or RDP cookies (for TSE farms) are
747 directly accessible to ease manipulation;
748
Davor Ocelic4094ce12017-12-19 23:30:39 +0100749 - all sticking rules may be dynamically conditioned by ACLs;
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200750
751 - it is possible to decide not to stick to certain servers, such as backup
752 servers, so that when the nominal server comes back, it automatically takes
753 the load back. This is often used in multi-path environments;
754
Davor Ocelic4094ce12017-12-19 23:30:39 +0100755 - in HTTP it is often preferred not to learn anything and instead manipulate
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200756 a cookie dedicated to stickiness. For this, it's possible to detect,
757 rewrite, insert or prefix such a cookie to let the client remember what
758 server was assigned;
759
760 - the server may decide to change or clean the stickiness cookie on logout,
761 so that leaving visitors are automatically unbound from the server;
762
763 - using ACL-based rules it is also possible to selectively ignore or enforce
764 stickiness regardless of the server's state; combined with advanced health
765 checks, that helps admins verify that the server they're installing is up
766 and running before presenting it to the whole world;
767
768 - an innovative mechanism to set a maximum idle time and duration on cookies
769 ensures that stickiness can be smoothly stopped on devices which are never
770 closed (smartphones, TVs, home appliances) without having to store them on
771 persistent storage;
772
773 - multiple server entries may share the same stickiness keys so that
774 stickiness is not lost in multi-path environments when one path goes down;
775
776 - soft-stop ensures that only users with stickiness information will continue
777 to reach the server they've been assigned to but no new users will go there.
778
779
Willy Tarreaucf5b8ab2022-05-31 16:23:06 +02007803.3.7. Basic features : Logging
781-------------------------------
782
783Logging is an extremely important feature for a load balancer, first because a
784load balancer is often wrongly accused of causing the problems it reveals, and
785second because it is placed at a critical point in an infrastructure where all
786normal and abnormal activity needs to be analyzed and correlated with other
787components.
788
789HAProxy provides very detailed logs, with millisecond accuracy and the exact
790connection accept time that can be searched in firewalls logs (e.g. for NAT
791correlation). By default, TCP and HTTP logs are quite detailed and contain
792everything needed for troubleshooting, such as source IP address and port,
793frontend, backend, server, timers (request receipt duration, queue duration,
794connection setup time, response headers time, data transfer time), global
795process state, connection counts, queue status, retries count, detailed
796stickiness actions and disconnect reasons, header captures with a safe output
797encoding. It is then possible to extend or replace this format to include any
798sampled data, variables, captures, resulting in very detailed information. For
799example it is possible to log the number of cumulative requests or number of
800different URLs visited by a client.
801
802The log level may be adjusted per request using standard ACLs, so it is possible
803to automatically silent some logs considered as pollution and instead raise
804warnings when some abnormal behavior happen for a small part of the traffic
805(e.g. too many URLs or HTTP errors for a source address). Administrative logs
806are also emitted with their own levels to inform about the loss or recovery of a
807server for example.
808
809Each frontend and backend may use multiple independent log outputs, which eases
810multi-tenancy. Logs are preferably sent over UDP, maybe JSON-encoded, and are
811truncated after a configurable line length in order to guarantee delivery. But
812it is also possible to send them to stdout/stderr or any file descriptor, as
813well as to a ring buffer that a client can subscribe to in order to retrieve
814them.
815
816
8173.3.8. Basic features : Statistics
818----------------------------------
819
820HAProxy provides a web-based statistics reporting interface with authentication,
821security levels and scopes. It is thus possible to provide each hosted customer
822with his own page showing only his own instances. This page can be located in a
823hidden URL part of the regular web site so that no new port needs to be opened.
824This page may also report the availability of other HAProxy nodes so that it is
825easy to spot if everything works as expected at a glance. The view is synthetic
826with a lot of details accessible (such as error causes, last access and last
827change duration, etc), which are also accessible as a CSV table that other tools
828may import to draw graphs. The page may self-refresh to be used as a monitoring
829page on a large display. In administration mode, the page also allows to change
830server state to ease maintenance operations.
831
832A Prometheus exporter is also provided so that the statistics can be consumed
833in a different format depending on the deployment.
834
835
8363.4. Standard features
837----------------------
838
839In this section, some features that are very commonly used in HAProxy but are
840not necessarily present on other load balancers are enumerated.
841
842
8433.4.1. Standard features : Sampling and converting information
844--------------------------------------------------------------
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200845
846HAProxy supports information sampling using a wide set of "sample fetch
847functions". The principle is to extract pieces of information known as samples,
848for immediate use. This is used for stickiness, to build conditions, to produce
849information in logs or to enrich HTTP headers.
850
851Samples can be fetched from various sources :
852
853 - constants : integers, strings, IP addresses, binary blocks;
854
855 - the process : date, environment variables, server/frontend/backend/process
856 state, byte/connection counts/rates, queue length, random generator, ...
857
858 - variables : per-session, per-request, per-response variables;
859
860 - the client connection : source and destination addresses and ports, and all
861 related statistics counters;
862
863 - the SSL client session : protocol, version, algorithm, cipher, key size,
864 session ID, all client and server certificate fields, certificate serial,
865 SNI, ALPN, NPN, client support for certain extensions;
866
867 - request and response buffers contents : arbitrary payload at offset/length,
868 data length, RDP cookie, decoding of SSL hello type, decoding of TLS SNI;
869
870 - HTTP (request and response) : method, URI, path, query string arguments,
Davor Ocelic4094ce12017-12-19 23:30:39 +0100871 status code, headers values, positional header value, cookies, captures,
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200872 authentication, body elements;
873
874A sample may then pass through a number of operators known as "converters" to
875experience some transformation. A converter consumes a sample and produces a
876new one, possibly of a completely different type. For example, a converter may
877be used to return only the integer length of the input string, or could turn a
878string to upper case. Any arbitrary number of converters may be applied in
879series to a sample before final use. Among all available sample converters, the
880following ones are the most commonly used :
881
882 - arithmetic and logic operators : they make it possible to perform advanced
883 computation on input data, such as computing ratios, percentages or simply
884 converting from one unit to another one;
885
Patrick Starrdce734e2017-10-09 13:17:12 +0700886 - IP address masks are useful when some addresses need to be grouped by larger
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200887 networks;
888
Davor Ocelic4094ce12017-12-19 23:30:39 +0100889 - data representation : URL-decode, base64, hex, JSON strings, hashing;
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200890
891 - string conversion : extract substrings at fixed positions, fixed length,
892 extract specific fields around certain delimiters, extract certain words,
Patrick Starrdce734e2017-10-09 13:17:12 +0700893 change case, apply regex-based substitution;
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200894
Davor Ocelic4094ce12017-12-19 23:30:39 +0100895 - date conversion : convert to HTTP date format, convert local to UTC and
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200896 conversely, add or remove offset;
897
898 - lookup an entry in a stick table to find statistics or assigned server;
899
900 - map-based key-to-value conversion from a file (mostly used for geolocation).
901
902
Willy Tarreaucf5b8ab2022-05-31 16:23:06 +02009033.4.2. Standard features : Maps
904-------------------------------
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200905
906Maps are a powerful type of converter consisting in loading a two-columns file
907into memory at boot time, then looking up each input sample from the first
908column and either returning the corresponding pattern on the second column if
909the entry was found, or returning a default value. The output information also
910being a sample, it can in turn experience other transformations including other
911map lookups. Maps are most commonly used to translate the client's IP address
912to an AS number or country code since they support a longest match for network
913addresses but they can be used for various other purposes.
914
915Part of their strength comes from being updatable on the fly either from the CLI
916or from certain actions using other samples, making them capable of storing and
917retrieving information between subsequent accesses. Another strength comes from
Patrick Starrdce734e2017-10-09 13:17:12 +0700918the binary tree based indexation which makes them extremely fast even when they
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200919contain hundreds of thousands of entries, making geolocation very cheap and easy
920to set up.
921
922
Willy Tarreaucf5b8ab2022-05-31 16:23:06 +02009233.4.3. Standard features : ACLs and conditions
924----------------------------------------------
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200925
926Most operations in HAProxy can be made conditional. Conditions are built by
927combining multiple ACLs using logic operators (AND, OR, NOT). Each ACL is a
928series of tests based on the following elements :
929
930 - a sample fetch method to retrieve the element to test ;
931
932 - an optional series of converters to transform the element ;
933
934 - a list of patterns to match against ;
935
936 - a matching method to indicate how to compare the patterns with the sample
937
938For example, the sample may be taken from the HTTP "Host" header, it could then
939be converted to lower case, then matched against a number of regex patterns
940using the regex matching method.
941
942Technically, ACLs are built on the same core as the maps, they share the exact
943same internal structure, pattern matching methods and performance. The only real
944difference is that instead of returning a sample, they only return "found" or
945or "not found". In terms of usage, ACL patterns may be declared inline in the
946configuration file and do not require their own file. ACLs may be named for ease
947of use or to make configurations understandable. A named ACL may be declared
948multiple times and it will evaluate all definitions in turn until one matches.
949
950About 13 different pattern matching methods are provided, among which IP address
951mask, integer ranges, substrings, regex. They work like functions, and just like
952with any programming language, only what is needed is evaluated, so when a
953condition involving an OR is already true, next ones are not evaluated, and
954similarly when a condition involving an AND is already false, the rest of the
955condition is not evaluated.
956
957There is no practical limit to the number of declared ACLs, and a handful of
958commonly used ones are provided. However experience has shown that setups using
959a lot of named ACLs are quite hard to troubleshoot and that sometimes using
Patrick Starrdce734e2017-10-09 13:17:12 +0700960anonymous ACLs inline is easier as it requires less references out of the scope
Davor Ocelic4094ce12017-12-19 23:30:39 +0100961being analyzed.
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200962
963
Willy Tarreaucf5b8ab2022-05-31 16:23:06 +02009643.4.4. Standard features : Content switching
965--------------------------------------------
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200966
967HAProxy implements a mechanism known as content-based switching. The principle
968is that a connection or request arrives on a frontend, then the information
969carried with this request or connection are processed, and at this point it is
970possible to write ACLs-based conditions making use of these information to
971decide what backend will process the request. Thus the traffic is directed to
972one backend or another based on the request's contents. The most common example
973consists in using the Host header and/or elements from the path (sub-directories
974or file-name extensions) to decide whether an HTTP request targets a static
975object or the application, and to route static objects traffic to a backend made
976of fast and light servers, and all the remaining traffic to a more complex
977application server, thus constituting a fine-grained virtual hosting solution.
978This is quite convenient to make multiple technologies coexist as a more global
979solution.
980
981Another use case of content-switching consists in using different load balancing
982algorithms depending on various criteria. A cache may use a URI hash while an
Davor Ocelic4094ce12017-12-19 23:30:39 +0100983application would use round-robin.
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200984
985Last but not least, it allows multiple customers to use a small share of a
986common resource by enforcing per-backend (thus per-customer connection limits).
987
988Content switching rules scale very well, though their performance may depend on
989the number and complexity of the ACLs in use. But it is also possible to write
990dynamic content switching rules where a sample value directly turns into a
991backend name and without making use of ACLs at all. Such configurations have
992been reported to work fine at least with 300000 backends in production.
993
994
Willy Tarreaucf5b8ab2022-05-31 16:23:06 +02009953.4.5. Standard features : Stick-tables
996---------------------------------------
Willy Tarreaud8e42b62015-08-18 21:51:36 +0200997
998Stick-tables are commonly used to store stickiness information, that is, to keep
999a reference to the server a certain visitor was directed to. The key is then the
1000identifier associated with the visitor (its source address, the SSL ID of the
1001connection, an HTTP or RDP cookie, the customer number extracted from the URL or
1002from the payload, ...) and the stored value is then the server's identifier.
1003
1004Stick tables may use 3 different types of samples for their keys : integers,
1005strings and addresses. Only one stick-table may be referenced in a proxy, and it
Davor Ocelic4094ce12017-12-19 23:30:39 +01001006is designated everywhere with the proxy name. Up to 8 keys may be tracked in
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001007parallel. The server identifier is committed during request or response
1008processing once both the key and the server are known.
1009
1010Stick-table contents may be replicated in active-active mode with other HAProxy
1011nodes known as "peers" as well as with the new process during a reload operation
1012so that all load balancing nodes share the same information and take the same
Davor Ocelic4094ce12017-12-19 23:30:39 +01001013routing decision if client's requests are spread over multiple nodes.
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001014
1015Since stick-tables are indexed on what allows to recognize a client, they are
1016often also used to store extra information such as per-client statistics. The
1017extra statistics take some extra space and need to be explicitly declared. The
1018type of statistics that may be stored includes the input and output bandwidth,
1019the number of concurrent connections, the connection rate and count over a
1020period, the amount and frequency of errors, some specific tags and counters,
Davor Ocelic4094ce12017-12-19 23:30:39 +01001021etc. In order to support keeping such information without being forced to
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001022stick to a given server, a special "tracking" feature is implemented and allows
1023to track up to 3 simultaneous keys from different tables at the same time
1024regardless of stickiness rules. Each stored statistics may be searched, dumped
1025and cleared from the CLI and adds to the live troubleshooting capabilities.
1026
1027While this mechanism can be used to surclass a returning visitor or to adjust
Davor Ocelic4094ce12017-12-19 23:30:39 +01001028the delivered quality of service depending on good or bad behavior, it is
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001029mostly used to fight against service abuse and more generally DDoS as it allows
Davor Ocelic4094ce12017-12-19 23:30:39 +01001030to build complex models to detect certain bad behaviors at a high processing
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001031speed.
1032
1033
Willy Tarreaucf5b8ab2022-05-31 16:23:06 +020010343.4.6. Standard features : Formatted strings
1035--------------------------------------------
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001036
1037There are many places where HAProxy needs to manipulate character strings, such
1038as logs, redirects, header additions, and so on. In order to provide the
Davor Ocelic4094ce12017-12-19 23:30:39 +01001039greatest flexibility, the notion of Formatted strings was introduced, initially
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001040for logging purposes, which explains why it's still called "log-format". These
1041strings contain escape characters allowing to introduce various dynamic data
1042including variables and sample fetch expressions into strings, and even to
1043adjust the encoding while the result is being turned into a string (for example,
Willy Tarreauec8962c2020-05-05 17:39:16 +02001044adding quotes). This provides a powerful way to build header contents, to build
1045response data or even response templates, or to customize log lines.
1046Additionally, in order to remain simple to build most common strings, about 50
1047special tags are provided as shortcuts for information commonly used in logs.
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001048
1049
Willy Tarreaucf5b8ab2022-05-31 16:23:06 +020010503.4.7. Standard features : HTTP rewriting and redirection
1051---------------------------------------------------------
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001052
1053Installing a load balancer in front of an application that was never designed
1054for this can be a challenging task without the proper tools. One of the most
1055commonly requested operation in this case is to adjust requests and response
1056headers to make the load balancer appear as the origin server and to fix hard
1057coded information. This comes with changing the path in requests (which is
1058strongly advised against), modifying Host header field, modifying the Location
1059response header field for redirects, modifying the path and domain attribute
1060for cookies, and so on. It also happens that a number of servers are somewhat
1061verbose and tend to leak too much information in the response, making them more
Davor Ocelic4094ce12017-12-19 23:30:39 +01001062vulnerable to targeted attacks. While it's theoretically not the role of a load
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001063balancer to clean this up, in practice it's located at the best place in the
1064infrastructure to guarantee that everything is cleaned up.
1065
1066Similarly, sometimes the load balancer will have to intercept some requests and
1067respond with a redirect to a new target URL. While some people tend to confuse
1068redirects and rewriting, these are two completely different concepts, since the
1069rewriting makes the client and the server see different things (and disagree on
1070the location of the page being visited) while redirects ask the client to visit
1071the new URL so that it sees the same location as the server.
1072
1073In order to do this, HAProxy supports various possibilities for rewriting and
Davor Ocelic4094ce12017-12-19 23:30:39 +01001074redirects, among which :
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001075
1076 - regex-based URL and header rewriting in requests and responses. Regex are
1077 the most commonly used tool to modify header values since they're easy to
1078 manipulate and well understood;
1079
Davor Ocelic4094ce12017-12-19 23:30:39 +01001080 - headers may also be appended, deleted or replaced based on formatted strings
1081 so that it is possible to pass information there (e.g. client side TLS
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001082 algorithm and cipher);
1083
1084 - HTTP redirects can use any 3xx code to a relative, absolute, or completely
Davor Ocelic4094ce12017-12-19 23:30:39 +01001085 dynamic (formatted string) URI;
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001086
1087 - HTTP redirects also support some extra options such as setting or clearing
1088 a specific cookie, dropping the query string, appending a slash if missing,
1089 and so on;
1090
Willy Tarreauec8962c2020-05-05 17:39:16 +02001091 - a powerful "return" directive allows to customize every part of a response
1092 like status, headers, body using dynamic contents or even template files.
1093
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001094 - all operations support ACL-based conditions;
1095
1096
Willy Tarreaucf5b8ab2022-05-31 16:23:06 +020010973.4.8. Standard features : Server protection
1098--------------------------------------------
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001099
Davor Ocelic4094ce12017-12-19 23:30:39 +01001100HAProxy does a lot to maximize service availability, and for this it takes
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001101large efforts to protect servers against overloading and attacks. The first
1102and most important point is that only complete and valid requests are forwarded
1103to the servers. The initial reason is that HAProxy needs to find the protocol
1104elements it needs to stay synchronized with the byte stream, and the second
1105reason is that until the request is complete, there is no way to know if some
1106elements will change its semantics. The direct benefit from this is that servers
1107are not exposed to invalid or incomplete requests. This is a very effective
1108protection against slowloris attacks, which have almost no impact on HAProxy.
1109
1110Another important point is that HAProxy contains buffers to store requests and
1111responses, and that by only sending a request to a server when it's complete and
1112by reading the whole response very quickly from the local network, the server
1113side connection is used for a very short time and this preserves server
1114resources as much as possible.
1115
1116A direct extension to this is that HAProxy can artificially limit the number of
1117concurrent connections or outstanding requests to a server, which guarantees
1118that the server will never be overloaded even if it continuously runs at 100% of
1119its capacity during traffic spikes. All excess requests will simply be queued to
1120be processed when one slot is released. In the end, this huge resource savings
1121most often ensures so much better server response times that it ends up actually
1122being faster than by overloading the server. Queued requests may be redispatched
1123to other servers, or even aborted in queue when the client aborts, which also
1124protects the servers against the "reload effect", where each click on "reload"
1125by a visitor on a slow-loading page usually induces a new request and maintains
1126the server in an overloaded state.
1127
1128The slow-start mechanism also protects restarting servers against high traffic
1129levels while they're still finalizing their startup or compiling some classes.
1130
1131Regarding the protocol-level protection, it is possible to relax the HTTP parser
Davor Ocelic4094ce12017-12-19 23:30:39 +01001132to accept non standard-compliant but harmless requests or responses and even to
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001133fix them. This allows bogus applications to be accessible while a fix is being
Patrick Starrdce734e2017-10-09 13:17:12 +07001134developed. In parallel, offending messages are completely captured with a
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001135detailed report that help developers spot the issue in the application. The most
1136dangerous protocol violations are properly detected and dealt with and fixed.
1137For example malformed requests or responses with two Content-length headers are
1138either fixed if the values are exactly the same, or rejected if they differ,
1139since it becomes a security problem. Protocol inspection is not limited to HTTP,
1140it is also available for other protocols like TLS or RDP.
1141
1142When a protocol violation or attack is detected, there are various options to
1143respond to the user, such as returning the common "HTTP 400 bad request",
Davor Ocelic4094ce12017-12-19 23:30:39 +01001144closing the connection with a TCP reset, or faking an error after a long delay
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001145("tarpit") to confuse the attacker. All of these contribute to protecting the
1146servers by discouraging the offending client from pursuing an attack that
1147becomes very expensive to maintain.
1148
1149HAProxy also proposes some more advanced options to protect against accidental
1150data leaks and session crossing. Not only it can log suspicious server responses
1151but it will also log and optionally block a response which might affect a given
1152visitors' confidentiality. One such example is a cacheable cookie appearing in a
1153cacheable response and which may result in an intermediary cache to deliver it
1154to another visitor, causing an accidental session sharing.
1155
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001156
Willy Tarreaucf5b8ab2022-05-31 16:23:06 +020011573.5. Advanced features
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001158----------------------
1159
Willy Tarreaucf5b8ab2022-05-31 16:23:06 +020011603.5.1. Advanced features : Management
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001161-------------------------------------
1162
1163HAProxy is designed to remain extremely stable and safe to manage in a regular
1164production environment. It is provided as a single executable file which doesn't
1165require any installation process. Multiple versions can easily coexist, meaning
1166that it's possible (and recommended) to upgrade instances progressively by
Davor Ocelic4094ce12017-12-19 23:30:39 +01001167order of importance instead of migrating all of them at once. Configuration
1168files are easily versioned. Configuration checking is done off-line so it
1169doesn't require to restart a service that will possibly fail. During
1170configuration checks, a number of advanced mistakes may be detected (e.g. a rule
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001171hiding another one, or stickiness that will not work) and detailed warnings and
1172configuration hints are proposed to fix them. Backwards configuration file
1173compatibility goes very far away in time, with version 1.5 still fully
1174supporting configurations for versions 1.1 written 13 years before, and 1.6
1175only dropping support for almost unused, obsolete keywords that can be done
1176differently. The configuration and software upgrade mechanism is smooth and non
1177disruptive in that it allows old and new processes to coexist on the system,
Davor Ocelic4094ce12017-12-19 23:30:39 +01001178each handling its own connections. System status, build options, and library
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001179compatibility are reported on startup.
1180
1181Some advanced features allow an application administrator to smoothly stop a
1182server, detect when there's no activity on it anymore, then take it off-line,
1183stop it, upgrade it and ensure it doesn't take any traffic while being upgraded,
1184then test it again through the normal path without opening it to the public, and
1185all of this without touching HAProxy at all. This ensures that even complicated
1186production operations may be done during opening hours with all technical
1187resources available.
1188
1189The process tries to save resources as much as possible, uses memory pools to
1190save on allocation time and limit memory fragmentation, releases payload buffers
1191as soon as their contents are sent, and supports enforcing strong memory limits
1192above which connections have to wait for a buffer to become available instead of
1193allocating more memory. This system helps guarantee memory usage in certain
1194strict environments.
1195
1196A command line interface (CLI) is available as a UNIX or TCP socket, to perform
1197a number of operations and to retrieve troubleshooting information. Everything
1198done on this socket doesn't require a configuration change, so it is mostly used
1199for temporary changes. Using this interface it is possible to change a server's
1200address, weight and status, to consult statistics and clear counters, dump and
1201clear stickiness tables, possibly selectively by key criteria, dump and kill
1202client-side and server-side connections, dump captured errors with a detailed
1203analysis of the exact cause and location of the error, dump, add and remove
1204entries from ACLs and maps, update TLS shared secrets, apply connection limits
1205and rate limits on the fly to arbitrary frontends (useful in shared hosting
1206environments), and disable a specific frontend to release a listening port
1207(useful when daytime operations are forbidden and a fix is needed nonetheless).
Willy Tarreauec8962c2020-05-05 17:39:16 +02001208Updating certificates and their configuration on the fly is permitted, as well
1209as enabling and consulting traces of every processing step of the traffic.
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001210
1211For environments where SNMP is mandatory, at least two agents exist, one is
Davor Ocelic4094ce12017-12-19 23:30:39 +01001212provided with the HAProxy sources and relies on the Net-SNMP Perl module.
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001213Another one is provided with the commercial packages and doesn't require Perl.
1214Both are roughly equivalent in terms of coverage.
1215
1216It is often recommended to install 4 utilities on the machine where HAProxy is
1217deployed :
1218
1219 - socat (in order to connect to the CLI, though certain forks of netcat can
1220 also do it to some extents);
1221
1222 - halog from the latest HAProxy version : this is the log analysis tool, it
1223 parses native TCP and HTTP logs extremely fast (1 to 2 GB per second) and
1224 extracts useful information and statistics such as requests per URL, per
1225 source address, URLs sorted by response time or error rate, termination
Davor Ocelic4094ce12017-12-19 23:30:39 +01001226 codes etc. It was designed to be deployed on the production servers to
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001227 help troubleshoot live issues so it has to be there ready to be used;
1228
1229 - tcpdump : this is highly recommended to take the network traces needed to
1230 troubleshoot an issue that was made visible in the logs. There is a moment
1231 where application and haproxy's analysis will diverge and the network traces
1232 are the only way to say who's right and who's wrong. It's also fairly common
1233 to detect bugs in network stacks and hypervisors thanks to tcpdump;
1234
1235 - strace : it is tcpdump's companion. It will report what HAProxy really sees
1236 and will help sort out the issues the operating system is responsible for
1237 from the ones HAProxy is responsible for. Strace is often requested when a
1238 bug in HAProxy is suspected;
1239
1240
Willy Tarreaucf5b8ab2022-05-31 16:23:06 +020012413.5.2. Advanced features : System-specific capabilities
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001242-------------------------------------------------------
1243
1244Depending on the operating system HAProxy is deployed on, certain extra features
1245may be available or needed. While it is supported on a number of platforms,
Patrick Starrdce734e2017-10-09 13:17:12 +07001246HAProxy is primarily developed on Linux, which explains why some features are
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001247only available on this platform.
1248
1249The transparent bind and connect features, the support for binding connections
1250to a specific network interface, as well as the ability to bind multiple
1251processes to the same IP address and ports are only available on Linux and BSD
1252systems, though only Linux performs a kernel-side load balancing of the incoming
1253requests between the available processes.
1254
1255On Linux, there are also a number of extra features and optimizations including
1256support for network namespaces (also known as "containers") allowing HAProxy to
1257be a gateway between all containers, the ability to set the MSS, Netfilter marks
1258and IP TOS field on the client side connection, support for TCP FastOpen on the
1259listening side, TCP user timeouts to let the kernel quickly kill connections
1260when it detects the client has disappeared before the configured timeouts, TCP
1261splicing to let the kernel forward data between the two sides of a connections
1262thus avoiding multiple memory copies, the ability to enable the "defer-accept"
1263bind option to only get notified of an incoming connection once data become
1264available in the kernel buffers, and the ability to send the request with the
Davor Ocelic4094ce12017-12-19 23:30:39 +01001265ACK confirming a connect (sometimes called "piggy-back") which is enabled with
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001266the "tcp-smart-connect" option. On Linux, HAProxy also takes great care of
1267manipulating the TCP delayed ACKs to save as many packets as possible on the
1268network.
1269
1270Some systems have an unreliable clock which jumps back and forth in the past
1271and in the future. This used to happen with some NUMA systems where multiple
1272processors didn't see the exact same time of day, and recently it became more
1273common in virtualized environments where the virtual clock has no relation with
1274the real clock, resulting in huge time jumps (sometimes up to 30 seconds have
1275been observed). This causes a lot of trouble with respect to timeout enforcement
1276in general. Due to this flaw of these systems, HAProxy maintains its own
1277monotonic clock which is based on the system's clock but where drift is measured
1278and compensated for. This ensures that even with a very bad system clock, timers
1279remain reasonably accurate and timeouts continue to work. Note that this problem
1280affects all the software running on such systems and is not specific to HAProxy.
1281The common effects are spurious timeouts or application freezes. Thus if this
Davor Ocelic4094ce12017-12-19 23:30:39 +01001282behavior is detected on a system, it must be fixed, regardless of the fact that
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001283HAProxy protects itself against it.
1284
Willy Tarreauec8962c2020-05-05 17:39:16 +02001285On Linux, a new starting process may communicate with the previous one to reuse
1286its listening file descriptors so that the listening sockets are never
Thayne McCombscdbcca92021-01-07 21:24:41 -07001287interrupted during the process's replacement.
Willy Tarreauec8962c2020-05-05 17:39:16 +02001288
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001289
Willy Tarreaucf5b8ab2022-05-31 16:23:06 +020012903.5.3. Advanced features : Scripting
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001291------------------------------------
1292
1293HAProxy can be built with support for the Lua embedded language, which opens a
1294wide area of new possibilities related to complex manipulation of requests or
1295responses, routing decisions, statistics processing and so on. Using Lua it is
1296even possible to establish parallel connections to other servers to exchange
1297information. This way it becomes possible (though complex) to develop an
1298authentication system for example. Please refer to the documentation in the file
1299"doc/lua-api/index.rst" for more information on how to use Lua.
1300
1301
Willy Tarreaucf5b8ab2022-05-31 16:23:06 +020013023.5.4. Advanced features: Tracing
Willy Tarreauec8962c2020-05-05 17:39:16 +02001303---------------------------------
1304
1305At any moment an administrator may connect over the CLI and enable tracing in
1306various internal subsystems. Various levels of details are provided by default
1307so that in practice anything between one line per request to 500 lines per
1308request can be retrieved. Filters as well as an automatic capture on/off/pause
1309mechanism are available so that it really is possible to wait for a certain
1310event and watch it in detail. This is extremely convenient to diagnose protocol
1311violations from faulty servers and clients, or denial of service attacks.
1312
1313
Willy Tarreaucf5b8ab2022-05-31 16:23:06 +020013143.6. Sizing
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001315-----------
1316
1317Typical CPU usage figures show 15% of the processing time spent in HAProxy
1318versus 85% in the kernel in TCP or HTTP close mode, and about 30% for HAProxy
1319versus 70% for the kernel in HTTP keep-alive mode. This means that the operating
1320system and its tuning have a strong impact on the global performance.
1321
1322Usages vary a lot between users, some focus on bandwidth, other ones on request
Davor Ocelic4094ce12017-12-19 23:30:39 +01001323rate, others on connection concurrency, others on SSL performance. This section
1324aims at providing a few elements to help with this task.
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001325
1326It is important to keep in mind that every operation comes with a cost, so each
1327individual operation adds its overhead on top of the other ones, which may be
1328negligible in certain circumstances, and which may dominate in other cases.
1329
1330When processing the requests from a connection, we can say that :
1331
1332 - forwarding data costs less than parsing request or response headers;
1333
1334 - parsing request or response headers cost less than establishing then closing
1335 a connection to a server;
1336
1337 - establishing an closing a connection costs less than a TLS resume operation;
1338
1339 - a TLS resume operation costs less than a full TLS handshake with a key
1340 computation;
1341
1342 - an idle connection costs less CPU than a connection whose buffers hold data;
1343
1344 - a TLS context costs even more memory than a connection with data;
1345
1346So in practice, it is cheaper to process payload bytes than header bytes, thus
1347it is easier to achieve high network bandwidth with large objects (few requests
1348per volume unit) than with small objects (many requests per volume unit). This
1349explains why maximum bandwidth is always measured with large objects, while
1350request rate or connection rates are measured with small objects.
1351
Davor Ocelic4094ce12017-12-19 23:30:39 +01001352Some operations scale well on multiple processes spread over multiple CPUs,
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001353and others don't scale as well. Network bandwidth doesn't scale very far because
1354the CPU is rarely the bottleneck for large objects, it's mostly the network
Davor Ocelic4094ce12017-12-19 23:30:39 +01001355bandwidth and data buses to reach the network interfaces. The connection rate
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001356doesn't scale well over multiple processors due to a few locks in the system
1357when dealing with the local ports table. The request rate over persistent
1358connections scales very well as it doesn't involve much memory nor network
1359bandwidth and doesn't require to access locked structures. TLS key computation
1360scales very well as it's totally CPU-bound. TLS resume scales moderately well,
1361but reaches its limits around 4 processes where the overhead of accessing the
1362shared table offsets the small gains expected from more power.
1363
1364The performance numbers one can expect from a very well tuned system are in the
1365following range. It is important to take them as orders of magnitude and to
1366expect significant variations in any direction based on the processor, IRQ
1367setting, memory type, network interface type, operating system tuning and so on.
1368
Davor Ocelic4094ce12017-12-19 23:30:39 +01001369The following numbers were found on a Core i7 running at 3.7 GHz equipped with
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001370a dual-port 10 Gbps NICs running Linux kernel 3.10, HAProxy 1.6 and OpenSSL
13711.0.2. HAProxy was running as a single process on a single dedicated CPU core,
1372and two extra cores were dedicated to network interrupts :
1373
1374 - 20 Gbps of maximum network bandwidth in clear text for objects 256 kB or
1375 higher, 10 Gbps for 41kB or higher;
1376
1377 - 4.6 Gbps of TLS traffic using AES256-GCM cipher with large objects;
1378
1379 - 83000 TCP connections per second from client to server;
1380
1381 - 82000 HTTP connections per second from client to server;
1382
1383 - 97000 HTTP requests per second in server-close mode (keep-alive with the
1384 client, close with the server);
1385
1386 - 243000 HTTP requests per second in end-to-end keep-alive mode;
1387
1388 - 300000 filtered TCP connections per second (anti-DDoS)
1389
1390 - 160000 HTTPS requests per second in keep-alive mode over persistent TLS
1391 connections;
1392
1393 - 13100 HTTPS requests per second using TLS resumed connections;
1394
Davor Ocelic4094ce12017-12-19 23:30:39 +01001395 - 1300 HTTPS connections per second using TLS connections renegotiated with
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001396 RSA2048;
1397
1398 - 20000 concurrent saturated connections per GB of RAM, including the memory
1399 required for system buffers; it is possible to do better with careful tuning
Davor Ocelic4094ce12017-12-19 23:30:39 +01001400 but this result it easy to achieve.
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001401
1402 - about 8000 concurrent TLS connections (client-side only) per GB of RAM,
1403 including the memory required for system buffers;
1404
1405 - about 5000 concurrent end-to-end TLS connections (both sides) per GB of
1406 RAM including the memory required for system buffers;
1407
Willy Tarreaucf5b8ab2022-05-31 16:23:06 +02001408A more recent benchmark featuring the multi-thread enabled HAProxy 2.4 on a
140964-core ARM Graviton2 processor in AWS reached 2 million HTTPS requests per
1410second at sub-millisecond response time, and 100 Gbps of traffic:
1411
1412 https://www.haproxy.com/blog/haproxy-forwards-over-2-million-http-requests-per-second-on-a-single-aws-arm-instance/
1413
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001414Thus a good rule of thumb to keep in mind is that the request rate is divided
1415by 10 between TLS keep-alive and TLS resume, and between TLS resume and TLS
Davor Ocelic4094ce12017-12-19 23:30:39 +01001416renegotiation, while it's only divided by 3 between HTTP keep-alive and HTTP
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001417close. Another good rule of thumb is to remember that a high frequency core
Willy Tarreaucf5b8ab2022-05-31 16:23:06 +02001418with AES instructions can do around 20 Gbps of AES-GCM per core.
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001419
1420Another good rule of thumb is to consider that on the same server, HAProxy will
1421be able to saturate :
1422
1423 - about 5-10 static file servers or caching proxies;
1424
1425 - about 100 anti-virus proxies;
1426
Willy Tarreau16af23c2015-08-27 16:30:53 +02001427 - and about 100-1000 application servers depending on the technology in use.
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001428
1429
Willy Tarreaucf5b8ab2022-05-31 16:23:06 +020014303.7. How to get HAProxy
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001431-----------------------
1432
Davor Ocelic4094ce12017-12-19 23:30:39 +01001433HAProxy is an open source project covered by the GPLv2 license, meaning that
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001434everyone is allowed to redistribute it provided that access to the sources is
1435also provided upon request, especially if any modifications were made.
1436
1437HAProxy evolves as a main development branch called "master" or "mainline", from
1438which new branches are derived once the code is considered stable. A lot of web
1439sites run some development branches in production on a voluntarily basis, either
1440to participate to the project or because they need a bleeding edge feature, and
1441their feedback is highly valuable to fix bugs and judge the overall quality and
Patrick Starrdce734e2017-10-09 13:17:12 +07001442stability of the version being developed.
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001443
1444The new branches that are created when the code is stable enough constitute a
1445stable version and are generally maintained for several years, so that there is
1446no emergency to migrate to a newer branch even when you're not on the latest.
1447Once a stable branch is issued, it may only receive bug fixes, and very rarely
1448minor feature updates when that makes users' life easier. All fixes that go into
1449a stable branch necessarily come from the master branch. This guarantees that no
1450fix will be lost after an upgrade. For this reason, if you fix a bug, please
1451make the patch against the master branch, not the stable branch. You may even
1452discover it was already fixed. This process also ensures that regressions in a
1453stable branch are extremely rare, so there is never any excuse for not upgrading
1454to the latest version in your current branch.
1455
Willy Tarreauec8962c2020-05-05 17:39:16 +02001456Branches are numbered with two digits delimited with a dot, such as "1.6".
1457Since 1.9, branches with an odd second digit are mostly focused on sensitive
1458technical updates and more aimed at advanced users because they are likely to
1459trigger more bugs than the other ones. They are maintained for about a year
1460only and must not be deployed where they cannot be rolled back in emergency. A
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001461complete version includes one or two sub-version numbers indicating the level of
1462fix. For example, version 1.5.14 is the 14th fix release in branch 1.5 after
1463version 1.5.0 was issued. It contains 126 fixes for individual bugs, 24 updates
1464on the documentation, and 75 other backported patches, most of which were needed
Patrick Starrdce734e2017-10-09 13:17:12 +07001465to fix the aforementioned 126 bugs. An existing feature may never be modified
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001466nor removed in a stable branch, in order to guarantee that upgrades within the
1467same branch will always be harmless.
1468
1469HAProxy is available from multiple sources, at different release rhythms :
1470
1471 - The official community web site : http://www.haproxy.org/ : this site
1472 provides the sources of the latest development release, all stable releases,
1473 as well as nightly snapshots for each branch. The release cycle is not fast,
1474 several months between stable releases, or between development snapshots.
1475 Very old versions are still supported there. Everything is provided as
1476 sources only, so whatever comes from there needs to be rebuilt and/or
1477 repackaged;
1478
Willy Tarreauec8962c2020-05-05 17:39:16 +02001479 - GitHub : https://github.com/haproxy/haproxy/ : this is the mirror for the
1480 development branch only, which provides integration with the issue tracker,
1481 continuous integration and code coverage tools. This is exclusively for
1482 contributors;
1483
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001484 - A number of operating systems such as Linux distributions and BSD ports.
1485 These systems generally provide long-term maintained versions which do not
1486 always contain all the fixes from the official ones, but which at least
1487 contain the critical fixes. It often is a good option for most users who do
1488 not seek advanced configurations and just want to keep updates easy;
1489
1490 - Commercial versions from http://www.haproxy.com/ : these are supported
1491 professional packages built for various operating systems or provided as
1492 appliances, based on the latest stable versions and including a number of
1493 features backported from the next release for which there is a strong
1494 demand. It is the best option for users seeking the latest features with
1495 the reliability of a stable branch, the fastest response time to fix bugs,
Davor Ocelic4094ce12017-12-19 23:30:39 +01001496 or simply support contracts on top of an open source product;
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001497
1498
1499In order to ensure that the version you're using is the latest one in your
1500branch, you need to proceed this way :
1501
1502 - verify which HAProxy executable you're running : some systems ship it by
1503 default and administrators install their versions somewhere else on the
1504 system, so it is important to verify in the startup scripts which one is
1505 used;
1506
1507 - determine which source your HAProxy version comes from. For this, it's
1508 generally sufficient to type "haproxy -v". A development version will
1509 appear like this, with the "dev" word after the branch number :
1510
Willy Tarreau58000fe2021-05-09 06:25:16 +02001511 HAProxy version 2.4-dev18-a5357c-137 2021/05/09 - https://haproxy.org/
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001512
1513 A stable version will appear like this, as well as unmodified stable
1514 versions provided by operating system vendors :
1515
Willy Tarreau58000fe2021-05-09 06:25:16 +02001516 HAProxy version 1.5.14 2015/07/02
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001517
1518 And a nightly snapshot of a stable version will appear like this with an
1519 hexadecimal sequence after the version, and with the date of the snapshot
1520 instead of the date of the release :
1521
Willy Tarreau58000fe2021-05-09 06:25:16 +02001522 HAProxy version 1.5.14-e4766ba 2015/07/29
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001523
1524 Any other format may indicate a system-specific package with its own
1525 patch set. For example HAProxy Enterprise versions will appear with the
1526 following format (<branch>-<latest commit>-<revision>) :
1527
Willy Tarreau58000fe2021-05-09 06:25:16 +02001528 HAProxy version 1.5.0-994126-357 2015/07/02
1529
1530 Please note that historically versions prior to 2.4 used to report the
1531 process name with a hyphen between "HA" and "Proxy", including those above
1532 which were adjusted to show the correct format only, so better ignore this
1533 word or use a relaxed match in scripts. Additionally, modern versions add
1534 a URL linking to the project's home.
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001535
Willy Tarreau58000fe2021-05-09 06:25:16 +02001536 Finally, versions 2.1 and above will include a "Status" line indicating
Willy Tarreauec8962c2020-05-05 17:39:16 +02001537 whether the version is safe for production or not, and if so, till when, as
1538 well as a link to the list of known bugs affecting this version.
1539
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001540 - for system-specific packages, you have to check with your vendor's package
1541 repository or update system to ensure that your system is still supported,
1542 and that fixes are still provided for your branch. For community versions
1543 coming from haproxy.org, just visit the site, verify the status of your
1544 branch and compare the latest version with yours to see if you're on the
1545 latest one. If not you can upgrade. If your branch is not maintained
1546 anymore, you're definitely very late and will have to consider an upgrade
1547 to a more recent branch (carefully read the README when doing so).
1548
1549HAProxy will have to be updated according to the source it came from. Usually it
1550follows the system vendor's way of upgrading a package. If it was taken from
1551sources, please read the README file in the sources directory after extracting
1552the sources and follow the instructions for your operating system.
1553
1554
15554. Companion products and alternatives
1556--------------------------------------
1557
1558HAProxy integrates fairly well with certain products listed below, which is why
Davor Ocelic4094ce12017-12-19 23:30:39 +01001559they are mentioned here even if not directly related to HAProxy.
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001560
1561
15624.1. Apache HTTP server
1563-----------------------
1564
1565Apache is the de-facto standard HTTP server. It's a very complete and modular
1566project supporting both file serving and dynamic contents. It can serve as a
Michael Prokop4438c602019-05-24 10:25:45 +02001567frontend for some application servers. It can even proxy requests and cache
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001568responses. In all of these use cases, a front load balancer is commonly needed.
Patrick Starrdce734e2017-10-09 13:17:12 +07001569Apache can work in various modes, some being heavier than others. Certain
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001570modules still require the heavier pre-forked model and will prevent Apache from
1571scaling well with a high number of connections. In this case HAProxy can provide
1572a tremendous help by enforcing the per-server connection limits to a safe value
1573and will significantly speed up the server and preserve its resources that will
1574be better used by the application.
1575
1576Apache can extract the client's address from the X-Forwarded-For header by using
1577the "mod_rpaf" extension. HAProxy will automatically feed this header when
1578"option forwardfor" is specified in its configuration. HAProxy may also offer a
1579nice protection to Apache when exposed to the internet, where it will better
Davor Ocelic4094ce12017-12-19 23:30:39 +01001580resist a wide number of types of DoS attacks.
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001581
1582
15834.2. NGINX
1584----------
1585
1586NGINX is the second de-facto standard HTTP server. Just like Apache, it covers a
1587wide range of features. NGINX is built on a similar model as HAProxy so it has
1588no problem dealing with tens of thousands of concurrent connections. When used
Davor Ocelic4094ce12017-12-19 23:30:39 +01001589as a gateway to some applications (e.g. using the included PHP FPM) it can often
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001590be beneficial to set up some frontend connection limiting to reduce the load
1591on the PHP application. HAProxy will clearly be useful there both as a regular
Davor Ocelic4094ce12017-12-19 23:30:39 +01001592load balancer and as the traffic regulator to speed up PHP by decongesting
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001593it. Also since both products use very little CPU thanks to their event-driven
1594architecture, it's often easy to install both of them on the same system. NGINX
1595implements HAProxy's PROXY protocol, thus it is easy for HAProxy to pass the
1596client's connection information to NGINX so that the application gets all the
1597relevant information. Some benchmarks have also shown that for large static
1598file serving, implementing consistent hash on HAProxy in front of NGINX can be
1599beneficial by optimizing the OS' cache hit ratio, which is basically multiplied
1600by the number of server nodes.
1601
1602
16034.3. Varnish
1604------------
1605
1606Varnish is a smart caching reverse-proxy, probably best described as a web
1607application accelerator. Varnish doesn't implement SSL/TLS and wants to dedicate
1608all of its CPU cycles to what it does best. Varnish also implements HAProxy's
1609PROXY protocol so that HAProxy can very easily be deployed in front of Varnish
1610as an SSL offloader as well as a load balancer and pass it all relevant client
1611information. Also, Varnish naturally supports decompression from the cache when
1612a server has provided a compressed object, but doesn't compress however. HAProxy
1613can then be used to compress outgoing data when backend servers do not implement
1614compression, though it's rarely a good idea to compress on the load balancer
1615unless the traffic is low.
1616
1617When building large caching farms across multiple nodes, HAProxy can make use of
1618consistent URL hashing to intelligently distribute the load to the caching nodes
1619and avoid cache duplication, resulting in a total cache size which is the sum of
Willy Tarreauec8962c2020-05-05 17:39:16 +02001620all caching nodes. In addition, caching of very small dumb objects for a short
1621duration on HAProxy can sometimes save network round trips and reduce the CPU
1622load on both the HAProxy and the Varnish nodes. This is only possible is no
1623processing is done on these objects on Varnish (this is often referred to as
1624the notion of "favicon cache", by which a sizeable percentage of useless
1625downstream requests can sometimes be avoided). However do not enable HAProxy
1626caching for a long time (more than a few seconds) in front of any other cache,
1627that would significantly complicate troubleshooting without providing really
1628significant savings.
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001629
1630
16314.4. Alternatives
1632-----------------
1633
1634Linux Virtual Server (LVS or IPVS) is the layer 4 load balancer included within
1635the Linux kernel. It works at the packet level and handles TCP and UDP. In most
1636cases it's more a complement than an alternative since it doesn't have layer 7
1637knowledge at all.
1638
1639Pound is another well-known load balancer. It's much simpler and has much less
1640features than HAProxy but for many very basic setups both can be used. Its
1641author has always focused on code auditability first and wants to maintain the
1642set of features low. Its thread-based architecture scales less well with high
1643connection counts, but it's a good product.
1644
1645Pen is a quite light load balancer. It supports SSL, maintains persistence using
1646a fixed-size table of its clients' IP addresses. It supports a packet-oriented
1647mode allowing it to support direct server return and UDP to some extents. It is
1648meant for small loads (the persistence table only has 2048 entries).
1649
1650NGINX can do some load balancing to some extents, though it's clearly not its
1651primary function. Production traffic is used to detect server failures, the
1652load balancing algorithms are more limited, and the stickiness is very limited.
1653But it can make sense in some simple deployment scenarios where it is already
1654present. The good thing is that since it integrates very well with HAProxy,
Davor Ocelic4094ce12017-12-19 23:30:39 +01001655there's nothing wrong with adding HAProxy later when its limits have been
1656reached.
Willy Tarreaud8e42b62015-08-18 21:51:36 +02001657
1658Varnish also does some load balancing of its backend servers and does support
1659real health checks. It doesn't implement stickiness however, so just like with
1660NGINX, as long as stickiness is not needed that can be enough to start with.
1661And similarly, since HAProxy and Varnish integrate so well together, it's easy
1662to add it later into the mix to complement the feature set.
1663
Willy Tarreau65626232020-05-05 18:08:07 +02001664
16655. Contacts
1666-----------
1667
1668If you want to contact the developers or any community member about anything,
1669the best way to do it usually is via the mailing list by sending your message
1670to haproxy@formilux.org. Please note that this list is public and its archives
1671are public as well so you should avoid disclosing sensitive information. A
1672thousand of users of various experience levels are present there and even the
1673most complex questions usually find an optimal response relatively quickly.
1674Suggestions are welcome too. For users having difficulties with e-mail, a
1675Discourse platform is available at http://discourse.haproxy.org/ . However
1676please keep in mind that there are less people reading questions there and that
1677most are handled by a really tiny team. In any case, please be patient and
1678respectful with those who devote their spare time helping others.
1679
1680I you believe you've found a bug but are not sure, it's best reported on the
1681mailing list. If you're quite convinced you've found a bug, that your version
1682is up-to-date in its branch, and you already have a GitHub account, feel free
1683to go directly to https://github.com/haproxy/haproxy/ and file an issue with
1684all possibly available details. Again, this is public so be careful not to post
1685information you might later regret. Since the issue tracker presents itself as
1686a very long thread, please avoid pasting very long dumps (a few hundreds lines
1687or more) and attach them instead.
1688
1689If you've found what you're absolutely certain can be considered a critical
1690security issue that would put many users in serious trouble if discussed in a
1691public place, then you can send it with the reproducer to security@haproxy.org.
1692A small team of trusted developers will receive it and will be able to propose
1693a fix. We usually don't use embargoes and once a fix is available it gets
1694merged. In some rare circumstances it can happen that a release is coordinated
1695with software vendors. Please note that this process usually messes up with
1696eveyone's work, and that rushed up releases can sometimes introduce new bugs,
1697so it's best avoided unless strictly necessary; as such, there is often little
1698consideration for reports that needlessly cause such extra burden, and the best
1699way to see your work credited usually is to provide a working fix, which will
1700appear in changelogs.