MAJOR: compression: integrate support for libslz

This library is designed to emit a zlib-compatible stream with no
memory usage and to favor resource savings over compression ratio.
While zlib requires 256 kB of RAM per compression context (and can only
support 4000 connections per GB of RAM), the stateless compression
offered by libslz does not need to retain buffers between subsequent
calls. In theory this slightly reduces the compression ratio but in
practice it does not have that much of an effect since the zlib
window is limited to 32kB.

Libslz is available at :

      http://git.1wt.eu/web?p=libslz.git

It was designed for web compression and provides a lot of savings
over zlib in haproxy. Here are the preliminary results on a single
core of a core2-quad 3.0 GHz in 32-bit for only 300 concurrent
sessions visiting the home page of www.haproxy.org (76 kB) with
the default 16kB buffers :

          BW In      BW Out     BW Saved   Ratio   memory VSZ/RSS
zlib      237 Mbps    92 Mbps   145 Mbps   2.58     84M /  69M
slz       733 Mbps   380 Mbps   353 Mbps   1.93    5.9M / 4.2M

So while the compression ratio is lower, the bandwidth savings are
much more important due to the significantly lower compression cost
which allows to consume even more data from the servers. In the
example above, zlib became the bottleneck at 24% of the output
bandwidth. Also the difference in memory usage is obvious.

More tests run on a single core of a core i5-3320M, with 500 concurrent
users and the default 16kB buffers :

At 100% CPU (no limit) :
          BW In      BW Out     BW Saved   Ratio   memory VSZ/RSS  hits/s
zlib      480 Mbps   188 Mbps   292 Mbps   2.55     130M / 101M     744
slz      1700 Mbps   810 Mbps   890 Mbps   2.10    23.7M / 9.7M    2382

At 85% CPU (limited) :
          BW In      BW Out     BW Saved   Ratio   memory VSZ/RSS  hits/s
zlib     1240 Mbps   976 Mbps   264 Mbps   1.27     130M / 100M    1738
slz      1600 Mbps   976 Mbps   624 Mbps   1.64    23.7M / 9.7M    2210

The most important benefit really happens when the CPU usage is
limited by "maxcompcpuusage" or the BW limited by "maxcomprate" :
in order to preserve resources, haproxy throttles the compression
ratio until usage is within limits. Since slz is much cheaper, the
average compression ratio is much higher and the input bandwidth
is quite higher for one Gbps output.

Other tests made with some reference files :

                           BW In     BW Out    BW Saved  Ratio  hits/s
daniels.html       zlib  1320 Mbps  163 Mbps  1157 Mbps   8.10    1925
                   slz   3600 Mbps  580 Mbps  3020 Mbps   6.20    5300

tv.com/listing     zlib   980 Mbps  124 Mbps   856 Mbps   7.90     310
                   slz   3300 Mbps  553 Mbps  2747 Mbps   5.97    1100

jquery.min.js      zlib   430 Mbps  180 Mbps   250 Mbps   2.39     547
                   slz   1470 Mbps  764 Mbps   706 Mbps   1.92    1815

bootstrap.min.css  zlib   790 Mbps  165 Mbps   625 Mbps   4.79     777
                   slz   2450 Mbps  650 Mbps  1800 Mbps   3.77    2400

So on top of saving a lot of memory, slz is constantly 2.5-3.5 times
faster than zlib and results in providing more savings for a fixed CPU
usage. For links smaller than 100 Mbps, zlib still provides a better
compression ratio, at the expense of a much higher CPU usage.

Larger input files provide slightly higher bandwidth for both libs, at
the expense of a bit more memory usage for zlib (it converges to 256kB
per connection).
4 files changed