Using Linux TCP Splicing with HAProxy
Willy Tarreau <w@1wt.eu>
- 2007/01/06 -


Alexandre Cassen has started a project called Linux Layer7 Switching (L7SW),
whose goal is to provide kernel services to help userland proxies achieve
very high performance. Right now, the project consists of a loadable kernel
module providing TCP Splicing under Linux.

TCP Splicing is a method by which a userland proxy can tell the kernel that
it has no value to add to the data part of a connection, and that the kernel
can perform the transfers itself, thus relieving the proxy of a potentially
heavy job. There are two advantages to this method :

  - it reduces the number of process wakeups
  - it reduces the number of data copies between user-space and kernel buffers

This method is particularly suited to protocols in which data is sent until
the end of the session. This is the case for FTP data for instance, and it
is also the case for the BODY part of HTTP/1.0.

The great news is that haproxy has been designed from the beginning with a
clear distinction between the headers and the DATA phase, so it was child's
play to add hooks to Alex's library in it.

Be careful! Both L7SW and this haproxy version are to be considered BETA
software ! Run them on your systems if you want, but do not complain if they
crash twice a day ! Anyway, they seem stable on our test machines.

In order to use TCP Splicing on haproxy, you need :

  - Linux Layer7 Switching code version 0.1.1 : [ http://linux-l7sw.sf.net/ ]
  - Haproxy version 1.3.5 : [ http://haproxy.1wt.eu/download/1.3/src/ ]

Then, you must untar both packages in any location. Let's assume you'll be
using /tmp. First extract l7sw :

  $ cd /tmp
  $ tar zxf layer7switch-0.1.1.tar.gz
  $ cd layer7switch-0.1.1

L7SW currently only supports Linux kernel 2.6.19+. If you prefer to use it
on a more stable kernel, such as 2.6.16.X, you can apply this patch to the
L7SW directory :

  [ http://haproxy.1wt.eu/download/patches/tcp_splice-0.1.1-linux-2.6.16.diff ]

  $ patch -p1 -d kernel < tcp_splice-0.1.1-linux-2.6.16.diff

Alternatively, if you prefer to run it on 2.4.33+, you can apply this patch
to the L7SW directory :

  [ http://haproxy.1wt.eu/download/patches/tcp_splice-0.1.1-linux-2.4.33.diff ]

  $ patch -p1 -d kernel < tcp_splice-0.1.1-linux-2.4.33.diff

Then build the kernel module as described in the L7SW README. Basically, you
just have to do this once your tree has been patched :

  $ cd kernel
  $ make

You can either install the resulting module (tcp_splice) or load it now. During
early testing periods, it might be preferable to avoid installing anything and
just load it manually :

  $ sudo insmod tcp_splice.*o
  $ cd ..
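Before going further, it can be worth confirming that the kernel really
registered the module. The helper below is only a sketch : the name
module_listed is made up for this example and is not part of L7SW. It simply
looks for a module name in the first column of a /proc/modules-style listing
read from standard input.

```shell
# Succeed if the module named in $1 appears in the first column of a
# /proc/modules-style listing read from standard input.
# (module_listed is a hypothetical helper name, not part of L7SW.)
module_listed() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}
```

It could be used like this once the module has been loaded :

  $ module_listed tcp_splice < /proc/modules && echo "tcp_splice is loaded"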

Now that the module is loaded, you need to build the libtcpsplice library on
which haproxy currently relies :

  $ cd userland/libtcpsplice
  $ make
  $ cd ..

For the adventurous, there's also a proof of concept in the userland/switchd
directory. It may be useful if you encounter problems with haproxy for
instance, but it is not needed at all here.

OK, L7SW is ready. Now you have to extract haproxy and tell it to build using
libtcpsplice :

  $ cd /tmp
  $ tar zxf haproxy-1.3.5.tar.gz
  $ cd haproxy-1.3.5
  $ make USE_TCPSPLICE=1 TCPSPLICEDIR=/tmp/layer7switch-0.1.1/userland/libtcpsplice

There are other make options which are strongly recommended, such as
CPU=, REGEX=, and above all, TARGET= so that you use the best syscalls and
functions for your system. Generally you will use TARGET=linux26, but 2.4 users
with an epoll-patched kernel will use TARGET=linux24e. This is very important
because failing to specify those options will disable important optimizations,
which might hide the tcpsplice benefits ! Please consult haproxy's README.
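To make the TARGET= advice concrete, here is a small sketch that maps a kernel
release string to the two targets named above. The name pick_target is
hypothetical, and the 2.4 branch assumes that the kernel carries the epoll
patch. Any other kernel is deliberately left to haproxy's README.

```shell
# Map a kernel release (as printed by "uname -r") to one of the two TARGET
# values named in this document. pick_target is a hypothetical helper.
pick_target() {
    case "$1" in
        2.6.*) echo linux26 ;;
        2.4.*) echo linux24e ;;   # assumption: epoll-patched 2.4 kernel
        *)     return 1 ;;        # not covered here, see haproxy's README
    esac
}
```

The result can then be passed to make together with the tcpsplice options :

  $ make TARGET=$(pick_target "$(uname -r)") USE_TCPSPLICE=1 \
         TCPSPLICEDIR=/tmp/layer7switch-0.1.1/userland/libtcpsplice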

Now that you have haproxy built with support for tcpsplice, and the module is
loaded, you have to write a config. There is an example in the 'examples'
directory. Basically, you just have to add the "option tcpsplice" keyword BOTH
in the frontend AND in the backend sections that you want to accelerate.

If the option is specified only in the frontend or only in the backend, then
no acceleration will be used. It is designed this way to allow some front-back
combinations to use it without forcing others to use it. Of course, if you use
a single "listen" section, you just have to specify it once.
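As an illustration, a frontend/backend pair with splicing enabled on both
sides might look like the fragment below. This is only a sketch : the section
names, the listening port and the server address are made up, timeouts and
other usual settings are left out, and it assumes that your haproxy version
supports the "default_backend" keyword. The sample file shipped in the
'examples' directory remains the reference.

```
# Sketch only: names, port and address are invented for illustration.
frontend public
    bind :8000
    mode http
    # required on the frontend side ...
    option tcpsplice
    default_backend servers

backend servers
    mode http
    # ... and on the backend side, otherwise nothing is spliced
    option tcpsplice
    server srv1 10.0.3.1:80
```

Removing either of the two "option tcpsplice" lines is an easy way to compare
CPU usage with and without acceleration.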

As of now (l7sw-0.1.1 and haproxy-1.3.5), you need the CAP_NET_ADMIN capability
to START and to RUN. For human beings, it means that you have to start haproxy
as root and keep it running as root, so it must not drop its privileges. This
is somewhat annoying, but we'll try to find a solution later.

Also, l7sw-0.1.1 does not yet support TCP window scaling or SACK. So you have
to disable both features on the proxy :

  $ sudo sysctl -w net.ipv4.tcp_window_scaling=0
  $ sudo sysctl -w net.ipv4.tcp_sack=0
  $ sudo sysctl -w net.ipv4.tcp_dsack=0
  $ sudo sysctl -w net.ipv4.tcp_tw_recycle=1
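A quick way to make sure that the three features are really off is to filter
the output of sysctl. The function below is a sketch under the assumption that
the settings are printed as "name = value" lines. Its name,
check_splice_sysctls, is hypothetical. It does not look at tcp_tw_recycle,
which is set to 1 above.

```shell
# Read "name = value" lines on stdin and report every one of the three
# settings below that is not 0. Exit non-zero if any is still enabled.
# (check_splice_sysctls is a hypothetical helper name.)
check_splice_sysctls() {
    awk -F' = ' '
        $1 == "net.ipv4.tcp_window_scaling" ||
        $1 == "net.ipv4.tcp_sack" ||
        $1 == "net.ipv4.tcp_dsack" {
            if ($2 != 0) { print "still enabled: " $1; bad = 1 }
        }
        END { exit bad }
    '
}
```

For example :

  $ sysctl net.ipv4.tcp_window_scaling net.ipv4.tcp_sack net.ipv4.tcp_dsack \
      | check_splice_sysctls && echo "OK for tcp_splice"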

You can now check that everything works as expected. Run "vmstat 1" or "top"
in one terminal, and haproxy in another one :

  $ sudo ./haproxy -f examples/tcp-splicing-sample.cfg

Transferring large files through it should not affect it much. You should
observe something like 10% CPU instead of 95% when transferring 1 MB files at
full speed. You can play with the tcpsplice option in the configuration to see
the effects.


Troubleshooting
---------------

This software is still beta, and you will probably encounter some caveats.
I personally ran into a few issues that we'll try to address with Alex. First
of all, I had occasional lockups on my SMP machine which I never had on a UP
one. So if you get problems on an SMP machine, please reboot it in UP and do
not waste your time on this.

I also noticed that sometimes, some sessions remained established even after
the end of the program. You might also see some situations where even after
the proxy's exit, the traffic still passes through the system. It may happen
when you have a limited source port range and you reuse a TIME_WAIT session
matching exactly the same source and destinations. This will need to be
addressed too.

You can play with tcp_splice variables and timeouts here in /proc/sys/net/ :

  $ ls /proc/sys/net/tcp_splice/
  debug_level        timeout_established  timeout_listen   timeout_synsent
  timeout_close      timeout_finwait      timeout_synack   timeout_timewait
  timeout_closewait  timeout_lastack      timeout_synrecv

  $ sysctl net/tcp_splice
  net.tcp_splice.debug_level = 0
  net.tcp_splice.timeout_synack = 120
  net.tcp_splice.timeout_listen = 120
  net.tcp_splice.timeout_lastack = 30
  net.tcp_splice.timeout_closewait = 60
  net.tcp_splice.timeout_close = 10
  net.tcp_splice.timeout_timewait = 120
  net.tcp_splice.timeout_finwait = 120
  net.tcp_splice.timeout_synrecv = 60
  net.tcp_splice.timeout_synsent = 120
  net.tcp_splice.timeout_established = 900

You can also consult the full session list here :

  $ head /proc/net/tcp_splice_conn
  FromIP   FPrt ToIP     TPrt LocalIP  LPrt DestIP   DPrt State  Expires
  0A000301 4EBB 0A000302 1F40 0A000302 817B 0A000301 0050 CLOSE        7
  0A000301 4E9B 0A000302 1F40 0A000302 8165 0A000301 0050 CLOSE        7

Since a session stays in the CLOSE state for at least 10 seconds, you just
have to read this file less than 10 seconds after a test to see the session.
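The addresses and ports in this list are printed in hexadecimal, which is not
very pleasant to read. The helpers below are a sketch that converts them to
the usual decimal notation. All three function names are hypothetical, and
the code assumes that addresses are printed most significant byte first, as
the sample above suggests (0A000301 being 10.0.3.1).

```shell
# Decode an 8-digit hex IPv4 address, assuming the most significant byte is
# printed first. (Hypothetical helper names, not part of L7SW.)
hex2ip() {
    a=$(printf '%s' "$1" | cut -c1-2)
    b=$(printf '%s' "$1" | cut -c3-4)
    c=$(printf '%s' "$1" | cut -c5-6)
    d=$(printf '%s' "$1" | cut -c7-8)
    echo "$((0x$a)).$((0x$b)).$((0x$c)).$((0x$d))"
}

# Decode a 4-digit hex port number.
hex2port() {
    echo "$((0x$1))"
}

# Read a tcp_splice_conn-style listing on stdin, skip the header line and
# print each session with its four endpoints in decimal form.
decode_splice_conn() {
    sed 1d | while read -r fip fprt tip tprt lip lprt dip dprt state expires; do
        echo "$(hex2ip "$fip"):$(hex2port "$fprt") -> $(hex2ip "$tip"):$(hex2port "$tprt")" \
             "| $(hex2ip "$lip"):$(hex2port "$lprt") -> $(hex2ip "$dip"):$(hex2port "$dprt")" \
             "$state $expires"
    done
}
```

Fed with the first session shown above, it prints the From/To pair, then the
Local/Dest pair :

  $ decode_splice_conn < /proc/net/tcp_splice_conn
  10.0.3.1:20155 -> 10.0.3.2:8000 | 10.0.3.2:33147 -> 10.0.3.1:80 CLOSE 7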

Please report your successes, failures, suggestions or fixes to the L7SW
mailing list here (do not use the list to report other haproxy bugs) :

  https://lists.sourceforge.net/lists/listinfo/linux-l7sw-devel


Motivations
-----------

I've always wanted haproxy to be the fastest and most reliable software load
balancer available. L7SW is an opportunity to get a huge performance boost
on high traffic sites (e.g. photo sharing, streaming, ...). In turn, I find it
a shame that Alex wastes his time redeveloping a proxy as a proof of concept
for his kernel code. While it is a fun game to enter into, it really becomes
harder when you need to get close to customers' needs. So by porting haproxy
early to L7SW, I get both the opportunity to get an idea of what it will soon
be capable of, and help Alex spend more time on the complex kernel part.

Have fun !
Willy