b55fcf2ee8d5bc4c12f4b0c34e8cb5c2ffbe5946 - haproxy

commit	b55fcf2ee8d5bc4c12f4b0c34e8cb5c2ffbe5946
author	Willy Tarreau <w@1wt.eu>	Thu Oct 28 22:48:29 2010 +0200
committer	Willy Tarreau <w@1wt.eu>	Sat Oct 30 19:04:36 2010 +0200
tree	38d8bfeb352f8fba21647882c00438c32a1c93dd
parent	6190b7d9dca43c18c1f37b96091610976289561c

[BUG] ebtree: fix duplicate strings insertion

(update to ebtree 6.0.4)

The recent fix fd301cc1370cd4977fe175dfa4544c7dc0e7ce6b was incorrect: it
returned one excess byte, causing some duplicates not to be detected. It
added 8 bits to count the trailing zero, but those bits were already implied
by the pre-incrementation of the pointer.

Fixing this was still not enough. When string_equal_bits() was applied to
two identical strings, it returned a number of bits covering the trailing
zero, so subsequent calls were applied to the first byte after this trailing
zero. That byte was often zero when inserting from raw files, which explains
why the issue was not discovered earlier. But when the data comes from a
reused area, duplicate strings were not correctly detected when inserting
into the tree.

Several solutions were tested, and the only efficient one is to make
string_equal_bits() notify the caller that the end of the string was reached.
It now returns zero in that case, and the callers just have to ensure that
when they get a zero, they stop using that bit until a dup tree or a leaf is
encountered.
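
The behaviour described above can be illustrated with a minimal sketch. This
is not the ebtree code: the name sketch_equal_bits() and its exact signature
are hypothetical, and it only assumes what the text states, namely that the
function reports the number of identical leading bits and returns zero once
both strings end without any difference.

```c
#include <stddef.h>

/* Hypothetical sketch, NOT the ebtree implementation.
 * Compares two NUL-terminated strings starting at byte (ignore / 8) and
 * returns the number of identical leading bits. Returns 0 when both
 * terminators are reached without finding a difference, so the caller knows
 * the strings are identical and never reads past the trailing zero.
 */
static size_t sketch_equal_bits(const unsigned char *a,
                                const unsigned char *b,
                                size_t ignore)
{
    size_t pos = ignore >> 3;   /* skip whole bytes already known equal */

    for (;;) {
        unsigned char x = a[pos];
        unsigned char diff = x ^ b[pos];

        if (diff) {
            /* Bytes differ: count the identical high-order bits, since
             * strings are compared from the most significant bit down.
             */
            size_t bits = pos << 3;

            while (!(diff & 0x80)) {
                diff <<= 1;
                bits++;
            }
            return bits;
        }
        if (!x)
            return 0;           /* end of both strings: identical */
        pos++;
    }
}
```

With this sketch, two buffers holding "ab\0X" and "ab\0Y" yield 0, because
the comparison stops at the terminator instead of continuing into leftover
bytes from a reused area. Note that in this sketch a difference on the very
first bit of the strings would also yield 0; the sketch does not try to
disambiguate that case.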

This fix brought the unexpected bonus of simplifying the insertion code a bit
and making it slightly faster to process duplicates.

The impact for haproxy was that if many similar string patterns were loaded
from a file, there was a potential risk that their insertion or matching could
have been slower. The bigger impact was with the URL sorting feature of halog,
which is not yet merged and is how this bug was discovered.
(cherry picked from commit 518d59ec9ba43705f930f9ece3749c450fd005df)

3 files changed
