blob: af66de2a30275168babacc3e895b906aae7c4f26 [file] [log] [blame]
Bhaskar Maddala10e26de2013-11-20 12:55:21 -050012013/11/20 - How hashing works internally in haproxy - maddalab@gmail.com
2
3This document describes how Haproxy implements hashing both map-based and
4consistent hashing, both prior to versions 1.5 and the motivation and tests
William Lallemand4f392792020-06-12 17:31:06 +02005that were done when providing additional options starting in version 2.0
Bhaskar Maddala10e26de2013-11-20 12:55:21 -05006
7A note on hashing in general, hash functions strive to have little
8correlation between input and output. The heart of a hash function is its
9mixing step. The behavior of the mixing step largely determines whether the
10hash function is collision-resistant. Hash functions that are collision
11resistant are more likely to have an even distribution of load.
12
13The purpose of the mixing function is to spread the effect of each message
14bit throughout all the bits of the internal state. Ideally every bit in the
15hash state is affected by every bit in the message. And we want to do that
16as quickly as possible simply for the sake of program performance. A
17function is said to satisfy the strict avalanche criterion if, whenever a
18single input bit is complemented (toggled between 0 and 1), each of the
19output bits should change with a probability of one half for an arbitrary
20selection of the remaining input bits.
21
22To guard against a combination of hash function and input that results in
23high rate of collisions, haproxy implements an avalanche algorithm on the
24result of the hashing function. In all versions 1.4 and prior avalanche is
25always applied when using the consistent hashing directive. It is intended
26to provide quite a good distribution for little input variations. The result
27is quite suited to fit over a 32-bit space with enough variations so that
28a randomly picked number falls equally before any server position, which is
29ideal for consistently hashed backends, a common use case for caches.
30
31In all versions 1.4 and prior Haproxy implements the SDBM hashing function.
32However tests show that alternatives to SDBM have a better cache
33distribution on different hashing criteria. Additional tests involving
34alternatives for hash input and an option to trigger avalanche, we found
35different algorithms perform better on different criteria. DJB2 performs
36well when hashing ascii text and is a good choice when hashing on host
37header. Other alternatives perform better on numbers and are a good choice
38when using source ip. The results also vary by use of the avalanche flag.
39
40The results of the testing can be found under the tests folder. Here is
41a summary of the discussion on the results on 1 input criteria and the
42methodology used to generate the results.
43
44A note of the setup when validating the results independently, one
45would want to avoid backend server counts that may skew the results. As
46an example with DJB2 avoid 33 servers. Please see the implementations of
47the hashing function, which can be found in the links under references.
48
49The following was the set up used
50
51(a) hash-type consistent/map-based
52(b) avalanche on/off
53(c) balanche host(hdr)
54(d) 3 criteria for inputs
55 - ~ 10K requests, including duplicates
56 - ~ 46K requests, unique requests from 1 MM requests were obtained
57 - ~ 250K requests, including duplicates
58(e) 17 servers in backend, all servers were assigned the same weight
59
60Result of the hashing were obtained across the server via monitoring log
61files for haproxy. Population Standard deviation was used to evaluate the
62efficacy of the hashing algorithm. Lower standard deviation, indicates
63a better distribution of load across the backends.
64
65On 10K requests, when using consistent hashing with avalanche on host
66headers, DJB2 significantly out performs SDBM. Std dev on SDBM was 48.95
67and DJB2 was 26.29. This relationship is inverted with avalanche disabled,
68however DJB2 with avalanche enabled out performs SDBM with avalanche
69disabled.
70
71On map-based hashing SDBM out performs DJB2 irrespective of the avalanche
72option. SDBM without avalanche is marginally better than with avalanche.
73DJB2 performs significantly worse with avalanche enabled.
74
75Summary: The results of the testing indicate that there isn't a hashing
76algorithm that can be applied across all input criteria. It is necessary
77to support alternatives to SDBM, which is generally the best option, with
78algorithms that are better for different inputs. Avalanche is not always
79applicable and may result in less smooth distribution.
80
81References:
Adam Mills342b4062020-04-14 16:45:15 -070082Mixing Functions/Avalanche: https://papa.bretmulvey.com/post/124027987928/hash-functions
Bhaskar Maddala10e26de2013-11-20 12:55:21 -050083Hash Functions: http://www.cse.yorku.ca/~oz/hash.html