               Using Linux TCP Splicing with HAProxy
                    Willy Tarreau <w@1wt.eu>
                         - 2007/01/06 -


Alexandre Cassen has started a project called Linux Layer7 Switching (L7SW),
whose goal is to provide kernel services to help userland proxies achieve
very high performance. Right now, the project consists of a loadable kernel
module providing TCP Splicing under Linux.

TCP Splicing is a method by which a userland proxy can tell the kernel that
it has no value to add to the data part of a connection, and that the kernel
can perform the transfers itself, thus relieving the proxy of a potentially
heavy job. There are two advantages to this method :

  - it reduces the number of process wakeups
  - it reduces the number of data copies between user-space and kernel buffers

This method is particularly suited to protocols in which data is sent until
the end of the session. This is the case for FTP data for instance, and it
is also the case for the BODY part of HTTP/1.0.

The great news is that haproxy has been designed from the beginning with a
clear distinction between the headers and the DATA phase, so adding hooks to
Alex's library was child's play.

Be careful ! Both packages are to be considered BETA software ! Run them on
your systems if you want, but do not complain if they crash twice a day !
Anyway, they seem stable on our test machines.

In order to use TCP Splicing with haproxy, you need :

  - Linux Layer7 Switching code version 0.1.1 : [ http://linux-l7sw.sf.net/ ]
  - Haproxy version 1.3.5 : [ http://haproxy.1wt.eu/download/1.3/src/ ]

Then untar both packages in any location. Let's assume you'll be using /tmp.
First extract L7SW :

  $ cd /tmp
  $ tar zxf layer7switch-0.1.1.tar.gz
  $ cd layer7switch-0.1.1

L7SW currently only supports Linux kernel 2.6.19+. If you prefer to use it
on a more stable kernel, such as 2.6.16.X, you can apply this patch to the
L7SW directory :

  [ http://haproxy.1wt.eu/download/patches/tcp_splice-0.1.1-linux-2.6.16.diff ]

  $ patch -p1 -d kernel < tcp_splice-0.1.1-linux-2.6.16.diff

Alternatively, if you prefer to run it on 2.4.33+, you can apply this patch
to the L7SW directory :

  [ http://haproxy.1wt.eu/download/patches/tcp_splice-0.1.1-linux-2.4.33.diff ]

  $ patch -p1 -d kernel < tcp_splice-0.1.1-linux-2.4.33.diff
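
The choice between the two patches can be scripted. The following is only a
sketch: the helper name l7sw_patch_for is hypothetical, and the version
patterns simply encode the three cases described above (2.6.19+ unpatched,
2.6.16.X, 2.4.33+). Any other kernel is reported as unsupported, which is an
assumption, since nothing is said about it here.

```shell
# Hypothetical helper: print the L7SW 0.1.1 patch matching a kernel release,
# "none" when the module builds unpatched, or fail for any other kernel.
l7sw_patch_for() {
    case $1 in
        2.6.16|2.6.16.*|2.6.16-*)
            echo "tcp_splice-0.1.1-linux-2.6.16.diff" ;;
        2.6.19*|2.6.[2-9][0-9]*)
            echo "none" ;;
        2.4.3[3-9]*)
            echo "tcp_splice-0.1.1-linux-2.4.33.diff" ;;
        *)
            echo "unsupported kernel: $1" >&2
            return 1 ;;
    esac
}

# Ask about the running kernel, without aborting if it is not supported.
l7sw_patch_for "$(uname -r)" || true
```

The printed file name can then be passed to "patch -p1 -d kernel" exactly as
shown above.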

Then build the kernel module as described in the L7SW README. Basically, you
just have to do this once your tree has been patched :

  $ cd kernel
  $ make

You can either install the resulting module (tcp_splice) or load it now. During
early testing periods, it might be preferable to avoid installing anything and
just load it manually :

  $ sudo insmod tcp_splice.*o
  $ cd ..

Now that the module is loaded, you need to build the libtcpsplice library on
which haproxy currently relies :

  $ cd userland/libtcpsplice
  $ make
  $ cd ..

For the adventurous, there's also a proof of concept in the userland/switchd
directory. It may be useful if you encounter problems with haproxy, for
instance, but it is not needed at all here.

OK, L7SW is ready. Now you have to extract haproxy and tell it to build using
libtcpsplice :

  $ cd /tmp
  $ tar zxf haproxy-1.3.5.tar.gz
  $ cd haproxy-1.3.5
  $ make USE_TCPSPLICE=1 TCPSPLICEDIR=/tmp/layer7switch-0.1.1/userland/libtcpsplice

Other make options are strongly recommended, such as CPU=, REGEX=, and above
all, TARGET=, so that you use the best syscalls and functions for your system.
Generally you will use TARGET=linux26, but 2.4 users with an epoll-patched
kernel will use TARGET=linux24e. This is very important because failing to
specify those options will disable important optimizations, which might hide
the tcpsplice benefits ! Please consult haproxy's README.
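
As a rough sketch of how TARGET= combines with the tcpsplice options, the
hypothetical helper below (haproxy_make_cmd is not part of haproxy) prints a
build command for a given kernel release. It assumes that 2.4 kernels are
epoll-patched, and it leaves CPU= and REGEX= out because their best values
depend on your machine; take them from the README.

```shell
# Hypothetical helper: print a haproxy build command for a kernel release ($1)
# and a libtcpsplice directory ($2).
haproxy_make_cmd() {
    case $1 in
        2.6.*) target=linux26 ;;
        2.4.*) target=linux24e ;;   # assumes an epoll-patched 2.4 kernel
        *) echo "no TARGET known for kernel $1" >&2; return 1 ;;
    esac
    echo "make TARGET=$target USE_TCPSPLICE=1 TCPSPLICEDIR=$2"
}

haproxy_make_cmd 2.6.19 /tmp/layer7switch-0.1.1/userland/libtcpsplice
```

The helper only prints the command, so you can review it and append CPU= and
REGEX= before running it from the haproxy-1.3.5 directory.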

Now that haproxy is built with tcpsplice support and the module is loaded,
you have to write a config. There is an example in the 'examples' directory.
Basically, you just have to add the "option tcpsplice" keyword BOTH in the
frontend AND in the backend sections that you want to accelerate.

If the option is specified only in the frontend or only in the backend, then
no acceleration will be used. It is designed this way to allow some front-back
combinations to use it without forcing others to use it. Of course, if you use
a single "listen" section, you just have to specify it once.
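
To illustrate the frontend/backend rule, here is a minimal sketch which writes
a configuration fragment to a temporary file and checks that the option is
present in both sections. Only "option tcpsplice" comes from this document.
The section names, address and port are hypothetical, the other keywords are
written from memory of the haproxy 1.3 syntax, and global settings and
timeouts are omitted, so treat examples/tcp-splicing-sample.cfg as the
reference.

```shell
# Write a sample configuration fragment to a temporary file. Section names,
# address and port are hypothetical placeholders.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
frontend www_in
    bind :8000
    mode http
    option tcpsplice
    default_backend www_out

backend www_out
    mode http
    option tcpsplice
    server srv1 10.0.3.1:80
EOF

# Both sections must carry the option, so it should appear exactly twice.
splice_count=$(grep -c 'option tcpsplice' "$cfg")
echo "option tcpsplice found $splice_count times in $cfg"
```

Removing the option from either section of such a file is an easy way to
compare accelerated and non-accelerated transfers.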

As of now (l7sw-0.1.1 and haproxy-1.3.5), you need the CAP_NET_ADMIN capability
to START and to RUN. For human beings, this means that you have to start
haproxy as root and keep it running as root, so it must not drop its
privileges. This is somewhat annoying, but we'll try to find a solution later.

Also, l7sw-0.1.1 supports neither TCP window scaling nor SACK yet, so you have
to disable both features on the proxy (the commands below also turn off DSACK
and enable TIME_WAIT recycling) :

  $ sudo sysctl -w net.ipv4.tcp_window_scaling=0
  $ sudo sysctl -w net.ipv4.tcp_sack=0
  $ sudo sysctl -w net.ipv4.tcp_dsack=0
  $ sudo sysctl -w net.ipv4.tcp_tw_recycle=1
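
Before starting haproxy, it may be worth verifying that these settings are
really in effect. The sketch below uses a hypothetical helper,
check_splice_sysctls, which reads the three "disable" settings directly from
/proc/sys/net/ipv4. The directory can be overridden, which is only there to
make the check easy to try against a fake tree.

```shell
# Hypothetical helper: check that window scaling, SACK and DSACK are disabled.
# $1 is the directory holding the ipv4 entries (default: /proc/sys/net/ipv4).
check_splice_sysctls() {
    dir=${1:-/proc/sys/net/ipv4}
    status=0
    for pair in tcp_window_scaling=0 tcp_sack=0 tcp_dsack=0; do
        name=${pair%%=*}
        want=${pair#*=}
        have=$(cat "$dir/$name" 2>/dev/null)
        if [ "$have" != "$want" ]; then
            echo "$name is '$have', expected $want"
            status=1
        fi
    done
    return $status
}

check_splice_sysctls || echo "tune the sysctls before enabling tcpsplice"
```

The function returns non-zero as soon as one value differs, so it can be used
as a guard in a start script.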

You can now check that everything works as expected. Run "vmstat 1" or "top"
in one terminal, and haproxy in another one :

  $ sudo ./haproxy -f examples/tcp-splicing-sample.cfg

Transferring large files through it should not load it much. You should
observe something like 10% CPU instead of 95% when transferring 1 MB files at
full speed. You can play with the tcpsplice option in the configuration to see
the effects.


Troubleshooting
---------------

This software is still beta, and you will probably run into some problems.
I personally ran into a few issues that we'll try to address with Alex. First
of all, I had occasional lockups on my SMP machine which I never had on a UP
(uniprocessor) one. So if you get problems on an SMP machine, please reboot it
in UP and do not waste your time on this.

I also noticed that sometimes, some sessions remained established even after
the end of the program. You might also see some situations where even after
the proxy's exit, the traffic still passes through the system. It may happen
when you have a limited source port range and you reuse a TIME_WAIT session
matching exactly the same source and destination. This will need to be
addressed too.

You can play with tcp_splice variables and timeouts under /proc/sys/net/ :

  $ ls /proc/sys/net/tcp_splice/
  debug_level        timeout_established  timeout_listen   timeout_synsent
  timeout_close      timeout_finwait      timeout_synack   timeout_timewait
  timeout_closewait  timeout_lastack      timeout_synrecv

  $ sysctl net/tcp_splice
  net.tcp_splice.debug_level = 0
  net.tcp_splice.timeout_synack = 120
  net.tcp_splice.timeout_listen = 120
  net.tcp_splice.timeout_lastack = 30
  net.tcp_splice.timeout_closewait = 60
  net.tcp_splice.timeout_close = 10
  net.tcp_splice.timeout_timewait = 120
  net.tcp_splice.timeout_finwait = 120
  net.tcp_splice.timeout_synrecv = 60
  net.tcp_splice.timeout_synsent = 120
  net.tcp_splice.timeout_established = 900

You can also consult the full session list here :

$ head /proc/net/tcp_splice_conn
FromIP   FPrt ToIP     TPrt LocalIP  LPrt DestIP   DPrt State Expires
0A000301 4EBB 0A000302 1F40 0A000302 817B 0A000301 0050 CLOSE       7
0A000301 4E9B 0A000302 1F40 0A000302 8165 0A000301 0050 CLOSE       7

Since a session stays in the CLOSE state for at least 10 seconds (see
timeout_close above), you just have to read this file less than 10 seconds
after a test to see its session.
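
The addresses and ports in this list are printed in hexadecimal. The sketch
below decodes one line into a readable form. The two helpers are hypothetical,
and the byte order is an assumption: addresses are read most significant byte
first, which is consistent with 1F40 and 0050 being ports 8000 and 80 in the
sample above.

```shell
# Convert 8 hex digits to a dotted-quad address, most significant byte first
# (an assumption about the file format, see the note above).
hex_to_ip() {
    v=$((0x$1))
    echo "$(( (v >> 24) & 255 )).$(( (v >> 16) & 255 )).$(( (v >> 8) & 255 )).$(( v & 255 ))"
}

# Decode one data line of /proc/net/tcp_splice_conn into
# "from -> to | local -> dest state expires".
decode_splice_conn() {
    set -- $1
    echo "$(hex_to_ip "$1"):$((0x$2)) -> $(hex_to_ip "$3"):$((0x$4))" \
         "| $(hex_to_ip "$5"):$((0x$6)) -> $(hex_to_ip "$7"):$((0x$8)) $9 ${10}"
}

decode_splice_conn "0A000301 4EBB 0A000302 1F40 0A000302 817B 0A000301 0050 CLOSE 7"
```

For the first sample line, this prints
"10.0.3.1:20155 -> 10.0.3.2:8000 | 10.0.3.2:33147 -> 10.0.3.1:80 CLOSE 7".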

Please report your successes, failures, suggestions or fixes to the L7SW
mailing list here (do not use the list to report other haproxy bugs) :

  https://lists.sourceforge.net/lists/listinfo/linux-l7sw-devel


Motivations
-----------

I've always wanted haproxy to be the fastest and most reliable software load
balancer available. L7SW is an opportunity to get a huge performance boost on
high traffic sites (e.g. photo sharing, streaming, ...). In turn, I find it a
shame that Alex wastes his time redeveloping a proxy as a proof of concept for
his kernel code. While it is a fun game to enter into, it really becomes harder
when you need to get close to customers' needs. So by porting haproxy early to
L7SW, I both get an idea of what it will soon be capable of and help Alex
spend more time on the complex kernel part.

Have fun !
Willy