OPTIM: hpack-huff: reduce the cache footprint of the huffman decoder
Some tables are currently used to decode bit blocks and lengths, and
these lookups show up in perf top. We have four 512-byte tables and one
64-byte one. Looking closer, the second half of each table (the length)
has so few distinct values that most of the time it can be computed with
a single "if", and never more than three. This alone allows us to cut
the tables in half. In addition, one table (bits 15-11) is only 32
entries long, while another one (bits 11-4) starts at 0x60, so the two
can be merged since they do not overlap, further saving size. We're now
down to four 256-entry tables.
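The idea can be sketched roughly as follows. This is an illustrative
example only: the thresholds and ranges below are made up, not the real
HPACK code boundaries or HAProxy's actual table layout.

```c
#include <stdint.h>

/* Before: each entry stores both the decoded symbol and the code
 * length, so a 256-entry table costs 512 bytes.
 */
struct dec_entry {
	uint8_t sym;
	uint8_t len;
};

/* After: only the 256-byte symbol table is kept, and the length is
 * recomputed from the 8-bit block value with one or two comparisons,
 * since only a few distinct lengths occur within any one table.
 * Thresholds here are hypothetical, for illustration only.
 */
static inline uint8_t block_len(uint8_t code)
{
	if (code < 0xb8)	/* most frequent range: short codes */
		return 5;
	if (code < 0xf8)	/* next range */
		return 6;
	return 7;		/* remaining codes */
}
```

The trade-off is a couple of predictable branches per block in exchange
for halving the tables' cache footprint, which is what pays off in the
hot path.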
This is visible in h3 and h2, where the max request rate is slightly
higher (e.g. +1.6% for h2). The huff_dec() function itself got slightly
larger, but the overall code size shrank:
$ nm --size haproxy-before | grep huff_dec
000000000000029e T huff_dec
$ nm --size haproxy-after | grep huff_dec
0000000000000345 T huff_dec
$ size haproxy-before haproxy-after
   text    data     bss      dec    hex filename
7591126  569268 2761348 10921742 a6a70e haproxy-before
7591082  568180 2761348 10920610 a6a2a2 haproxy-after
1 file changed