/*
 * Elastic Binary Trees - generic macros and structures.
 * Version 6.0.6
 * (C) 2002-2011 - Willy Tarreau <w@1wt.eu>
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation, version 2.1
 * exclusively.
 *
 * This library is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with this library; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 */



/*
General idea:
-------------
In a radix binary tree, we may have up to 2N-1 nodes for N keys if all of
them are leaves. If we find a way to differentiate intermediate nodes (later
called "nodes") and final nodes (later called "leaves"), and we associate
them by two, it is possible to build a sort of self-contained radix tree with
intermediate nodes always present. It will not be as cheap as the ultree for
optimal cases as shown below, but the optimal case almost never happens :

Eg, to store 8, 10, 12, 13, 14 :

          ultree          this theoretical tree

           8                    8
          / \                  / \
        10   12              10   12
            /  \                 /  \
          13    14             12    14
                              /  \
                            12    13

Note that on real-world tests (with a scheduler), it was verified that the
case with data on an intermediate node never happens. This is because the
data spectrum is too large for such coincidences to happen. It would require
for instance that a task has its expiration time at an exact second, with
other tasks sharing that second. This is too rare to try to optimize for it.

What is interesting is that the node will only be added above the leaf when
necessary, which implies that it will always remain somewhere above it. So
both the leaf and the node can share the exact value of the leaf, because
when going down the node, the bit mask will be applied to comparisons. So we
are tempted to have one single key shared between the node and the leaf.

The bit only serves the nodes, and the dups only serve the leaves. So we can
put a lot of information in common. This results in one single entity with
two branch pointers and two parent pointers, one for the node part, and one
for the leaf part :

       node's         leaf's
       parent         parent
         |              |
       [node]         [leaf]
        /  \
     left   right
   branch   branch

The node may very well refer to its leaf counterpart in one of its branches,
indicating that its own leaf is just below it :

       node's
       parent
         |
       [node]
        /  \
     left   [leaf]
   branch

Adding keys in such a tree simply consists in inserting nodes between
other nodes and/or leaves :

                [root]
                  |
               [node2]
                /   \
         [leaf1]   [node3]
                    /   \
             [leaf2]   [leaf3]

On this diagram, we notice that [node2] and [leaf2] have been pulled away
from each other due to the insertion of [node3], just as if there were an
elastic between both parts. This elastic-like behaviour gave its name to
the tree : "Elastic Binary Tree", or "EBtree". The entity which associates a
node part and a leaf part will be called an "EB node".

We also notice on the diagram that there is a root entity required to attach
the tree. It only contains two branches and there is nothing above it. This
is an "EB root". Some will note that [leaf1] has no [node1]. One property of
the EBtree is that all nodes have their branches filled, and that if a node
has only one branch, it does not need to exist. Here, [leaf1] was added
below [root] and did not need any node.

An EB node contains :
  - a pointer to the node's parent (node_p)
  - a pointer to the leaf's parent (leaf_p)
  - two branches pointing to lower nodes or leaves (branches)
  - a bit position (bit)
  - an optional key.

The key here is optional because it's used only during insertion, in order
to classify the nodes. Nothing else in the tree structure requires knowledge
of the key. This makes it possible to write type-agnostic primitives for
everything, and type-specific insertion primitives. This has led to
considering two types of EB nodes. The type-agnostic ones will serve as a
header for the other ones, and will simply be called "struct eb_node". The
other ones will have their type indicated in the structure name. Eg:
"struct eb32_node" for nodes carrying 32 bit keys.

We will also note that the two branches in a node serve exactly the same
purpose as an EB root. For this reason, a "struct eb_root" will be used as
well inside the struct eb_node. In order to ease pointer manipulation and
root detection when walking upwards, all the pointers inside an eb_node will
point to the eb_root part of the referenced EB nodes, relying on the same
principle as the linked lists in Linux.

Another important point to note is that when walking inside a tree, it is
very convenient to know where a node is attached in its parent, and what
type of branch it has below it (leaf or node). In order to simplify the
operations and to speed up the processing, it was decided in this specific
implementation to use the lowest bit from the pointer to designate the side
of the upper pointers (left/right) and the type of a branch (leaf/node).
This practice is not mandatory by design; it is an implementation-specific
optimisation permitted on all platforms on which data must be aligned. All
known 32 bit platforms align their integers and pointers to 32 bits, leaving
the two lower bits unused. So, we say that the pointers are "tagged". And
since they designate pointers to root parts, we simply call them
"tagged root pointers", or "eb_troot" in the code.
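
As an illustration of this convention (a minimal sketch only; the actual
helpers are eb_dotag(), eb_untag() and eb_gettag() defined further down in
this file), tagging and untagging are plain pointer arithmetic :

    troot = (eb_troot_t *)((void *)&node->branches + EB_NODE); // tag a node branch
    tag   = (unsigned long)troot & 1;                           // EB_NODE here
    root  = (struct eb_root *)((void *)troot - tag);            // back to the root part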

Duplicate keys are stored in a special manner. When inserting a key, if
the same one is found, then an incremental binary tree is built at this
place from these keys. This ensures that no special case has to be written
to handle duplicates when walking through the tree or when deleting entries.
It also guarantees that duplicates will be walked in the exact same order
they were inserted. This is very important when trying to achieve fair
processing distribution for instance.

Algorithmic complexity can be derived from 3 variables :
  - the number of possible different keys in the tree : P
  - the number of entries in the tree                 : N
  - the number of duplicates for one key              : D

Note that this tree is deliberately NOT balanced. For this reason, the worst
case may happen with a small tree (eg: 32 distinct keys of one bit). BUT,
the operations required to manage such data are so cheap that they make it
worth using even under such conditions. For instance, a balanced tree may
require only 6 levels to store those 32 keys whereas this tree will require
32. But if per-level operations are 5 times cheaper, it wins.

Minimal, Maximal and Average times are specified in number of operations.
Minimal is given for the best condition, Maximal for the worst condition, and
the average is reported for a tree containing random keys. An operation
generally consists in jumping from one node to the other.

Complexity :
  - lookup              : min=1, max=log(P), avg=log(N)
  - insertion from root : min=1, max=log(P), avg=log(N)
  - insertion of dups   : min=1, max=log(D), avg=log(D)/2 after lookup
  - deletion            : min=1, max=1,      avg=1
  - prev/next           : min=1, max=log(P), avg=2 :
      N/2 nodes need 1 hop  => 1*N/2
      N/4 nodes need 2 hops => 2*N/4
      N/8 nodes need 3 hops => 3*N/8
      ...
      N/2^k nodes need k hops => k*N/2^k
      Total cost for all N nodes : sum[k=1..log2(N)](k*N/2^k) ~= 2*N
      Average cost per node      : total/N ~= 2
      (eg: for N=8, (1*4 + 2*2 + 3*1)/8 = 11/8, and the ratio tends
      towards 2 as N grows)

This design is currently limited to only two branches per node. Most of the
tree descent algorithm would be compatible with more branches (eg: 4, to cut
the height in half), but this would probably require more complex operations
and the deletion algorithm would be problematic.

Useful properties :
  - a node is always added above the leaf it is tied to, and can never get
    below it nor into another branch. This implies that leaves directly
    attached to the root do not use their node part, which is indicated by
    a NULL value in node_p. This also enhances the cache efficiency when
    walking down the tree, because when the leaf is reached, its node part
    will already have been visited (unless it's the first leaf in the tree).

  - pointers to lower nodes or leaves are stored in "branch" pointers. Only
    the root node may have a NULL in either branch; this is not possible for
    other branches. Since the nodes are attached to the left branch of the
    root, it is not possible to see a NULL left branch when walking up a
    tree. Thus, an empty tree is immediately identified by a NULL left
    branch at the root. Conversely, the one and only way to identify the
    root node is to check that its right branch is NULL. Note that the
    NULL pointer may have a few low-order bits set.

  - a node connected to its own leaf will have branch[0|1] pointing to
    itself, and leaf_p pointing to itself.

  - a node can never have node_p pointing to itself.

  - a node is linked in a tree if and only if it has a non-null leaf_p.

  - a node can never have both branches equal, except for the root which can
    have them both NULL.

  - deletion only applies to leaves. When a leaf is deleted, its parent must
    be released too (unless it's the root), and its sibling must attach to
    the grand-parent, replacing the parent. Also, when a leaf is deleted,
    the node tied to this leaf will be removed and must be released too. If
    this node is different from the leaf's parent, the freshly released
    leaf's parent will be used to replace the node which must go. A released
    node will never be used anymore, so there's no point in tracking it.

  - the bit index in a node indicates the bit position in the key which is
    represented by the branches. That means that a node with (bit == 0) is
    just above two leaves. Negative bit values are used to build a duplicate
    tree. The first node above two identical leaves gets (bit == -1). This
    value logarithmically decreases as the duplicate tree grows. During
    duplicate insertion, a node is inserted above the highest bit value (the
    lowest absolute value) in the tree during the right-sided walk. If bit
    -1 is not encountered (highest < -1), we insert above the last leaf.
    Otherwise, we insert above the node with the highest value which was not
    equal to its parent's value plus one.

  - the "eb_next" primitive walks from left to right, which means from lower
    to higher keys. It returns duplicates in the order they were inserted.
    The "eb_first" primitive returns the left-most entry.

  - the "eb_prev" primitive walks from right to left, which means from
    higher to lower keys. It returns duplicates in the opposite order they
    were inserted. The "eb_last" primitive returns the right-most entry.

  - a tree which has 1 in the lower bit of its root's right branch is a
    tree with unique nodes. This means that a node whose key already exists
    in the tree will not be inserted, and the previous entry will be
    returned instead.

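As an illustrative sketch (not part of this file's API), a typical use is to
embed a typed EB node into an application structure and to walk the tree
with the generic primitives below. The eb32_node type and the eb32_insert()
function are assumed here to come from eb32tree.h, which builds on this
header :

    struct task {
        struct eb32_node wq;      // key = expiration date
        // ... application data ...
    };

    EB_TREE_HEAD(wait_queue);     // struct eb_root wait_queue = EB_ROOT;

    // insertion is done by the typed variant (declared in eb32tree.h)
    task->wq.key = expiration;
    eb32_insert(&wait_queue, &task->wq);

    // walking all leaves in ascending key order uses the generic part
    struct eb_node *node;
    for (node = eb_first(&wait_queue); node; node = eb_next(node)) {
        struct task *t = eb_entry(node, struct task, wq.node);
        // ... process t ...
    }
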
*/

#ifndef _EBTREE_H
#define _EBTREE_H

#include <stdlib.h>
#include <haproxy/api.h>

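/* Returns the position of the highest bit set in the 8-bit value <x>, from 1
 * for 0x01 to 8 for 0x80. The result is undefined for x == 0. The constant
 * 0xFFFFAA50 packs, in 2-bit fields, the value of flsnz(n)-1 for each n in
 * 0..15, which replaces the last conditional steps with a table lookup.
 */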
static inline int flsnz8_generic(unsigned int x)
{
        int ret = 0;
        if (x >> 4) { x >>= 4; ret += 4; }
        return ret + ((0xFFFFAA50U >> (x << 1)) & 3) + 1;
}

/* Note: we never need to run fls on null keys, so we can optimize the fls
 * function by removing a conditional jump.
 */
#if defined(__i386__) || defined(__x86_64__)
/* this code is similar on 32 and 64 bit */
static inline int flsnz(int x)
{
        int r;
        __asm__("bsrl %1,%0\n"
                : "=r" (r) : "rm" (x));
        return r+1;
}

static inline int flsnz8(unsigned char x)
{
        int r;
        __asm__("movzbl %%al, %%eax\n"
                "bsrl %%eax,%0\n"
                : "=r" (r) : "a" (x));
        return r+1;
}

#else
// returns 1 to 32 for 1<<0 to 1<<31. Undefined for 0.
#define flsnz(___a) ({ \
        register int ___x, ___bits = 0; \
        ___x = (___a); \
        if (___x & 0xffff0000) { ___x &= 0xffff0000; ___bits += 16;} \
        if (___x & 0xff00ff00) { ___x &= 0xff00ff00; ___bits += 8;} \
        if (___x & 0xf0f0f0f0) { ___x &= 0xf0f0f0f0; ___bits += 4;} \
        if (___x & 0xcccccccc) { ___x &= 0xcccccccc; ___bits += 2;} \
        if (___x & 0xaaaaaaaa) { ___x &= 0xaaaaaaaa; ___bits += 1;} \
        ___bits + 1; \
        })

static inline int flsnz8(unsigned int x)
{
        return flsnz8_generic(x);
}


#endif

static inline int fls64(unsigned long long x)
{
        unsigned int h;
        unsigned int bits = 32;

        h = x >> 32;
        if (!h) {
                h = x;
                bits = 0;
        }
        return flsnz(h) + bits;
}

#define fls_auto(x) ((sizeof(x) > 4) ? fls64(x) : flsnz(x))

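/* Examples (sanity checks derived from the definitions above): flsnz(1) == 1,
 * flsnz(0x10) == 5, flsnz(0x80000000) == 32, fls64(1ULL << 40) == 41.
 */
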
/* Linux-like "container_of". It returns a pointer to the structure of type
 * <type> which has its member <name> stored at address <ptr>.
 */
#ifndef container_of
#define container_of(ptr, type, name) ((type *)(((void *)(ptr)) - ((long)&((type *)0)->name)))
#endif

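/* For example, with "struct foo { int a; struct eb_node n; } *f", the
 * expression container_of(&f->n, struct foo, n) yields f again. This is how
 * eb_entry() below recovers the enclosing structure from an eb_node pointer.
 */
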
/* returns a pointer to the structure of type <type> which has its member <name>
 * stored at address <ptr>, unless <ptr> is 0, in which case 0 is returned.
 */
#ifndef container_of_safe
#define container_of_safe(ptr, type, name) \
        ({ void *__p = (ptr); \
           __p ? (type *)(__p - ((long)&((type *)0)->name)) : (type *)0; \
        })
#endif

/* Number of bits per node, and number of leaves per node */
#define EB_NODE_BITS          1
#define EB_NODE_BRANCHES      (1 << EB_NODE_BITS)
#define EB_NODE_BRANCH_MASK   (EB_NODE_BRANCHES - 1)

/* Be careful not to tweak those values. The walking code is optimized for NULL
 * detection on the assumption that the following values are intact.
 */
#define EB_LEFT     0
#define EB_RGHT     1
#define EB_LEAF     0
#define EB_NODE     1

/* Tags to set in root->b[EB_RGHT] :
 * - EB_NORMAL is a normal tree which stores duplicate keys.
 * - EB_UNIQUE is a tree which stores unique keys.
 */
#define EB_NORMAL   0
#define EB_UNIQUE   1

/* This is the same as an eb_node pointer, except that the lower bit embeds
 * a tag. See eb_dotag()/eb_untag()/eb_gettag(). This tag has two meanings :
 *  - 0=left, 1=right to designate the parent's branch for leaf_p/node_p
 *  - 0=link, 1=leaf  to designate the branch's type for branch[]
 */
typedef void eb_troot_t;

/* The eb_root connects the node which contains it, to two nodes below it, one
 * of which may be the same node. At the top of the tree, we use an eb_root
 * too, which always has its right branch NULL (except for possible low-order
 * tag bits).
 */
struct eb_root {
        eb_troot_t    *b[EB_NODE_BRANCHES]; /* left and right branches */
};

/* The eb_node contains the two parts, one for the leaf, which always exists,
 * and one for the node, which remains unused in the very first node inserted
 * into the tree. This structure is 20 bytes per node on 32-bit machines. Do
 * not change the order, benchmarks have shown that it's optimal this way.
 * Note: be careful about this struct's alignment if it gets included into
 * another struct and some atomic ops are expected on the keys or the node.
 */
struct eb_node {
        struct eb_root     branches; /* branches, must be at the beginning */
        eb_troot_t        *node_p;   /* link node's parent */
        eb_troot_t        *leaf_p;   /* leaf node's parent */
        short int          bit;      /* link's bit position. */
        short unsigned int pfx;      /* data prefix length, always related to leaf */
} __attribute__((packed));

/* Return the structure of type <type> whose member <member> points to <ptr> */
#define eb_entry(ptr, type, member) container_of(ptr, type, member)

/* The root of a tree is an eb_root initialized with both pointers NULL.
 * During its life, only the left pointer will change. The right one will
 * always remain NULL, which is the way we detect it.
 */
#define EB_ROOT \
        (struct eb_root) { \
                .b = {[0] = NULL, [1] = NULL }, \
        }

#define EB_ROOT_UNIQUE \
        (struct eb_root) { \
                .b = {[0] = NULL, [1] = (void *)1 }, \
        }

#define EB_TREE_HEAD(name) \
        struct eb_root name = EB_ROOT
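
/* Example (illustrative only) : declaring an empty tree accepting duplicate
 * keys, and one enforcing unique keys :
 *
 *     EB_TREE_HEAD(timers);                        // same as "= EB_ROOT"
 *     struct eb_root uniq_tree = EB_ROOT_UNIQUE;   // right branch tagged 1
 */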


/***************************************\
 * Private functions. Not for end-user *
\***************************************/

/* Converts a root pointer to its equivalent eb_troot_t pointer,
 * ready to be stored in ->branch[], leaf_p or node_p. NULL is not
 * conserved. To be used with EB_LEAF, EB_NODE, EB_LEFT or EB_RGHT in <tag>.
 */
static inline eb_troot_t *eb_dotag(const struct eb_root *root, const int tag)
{
        return (eb_troot_t *)((void *)root + tag);
}

/* Converts an eb_troot_t pointer to its equivalent eb_root pointer,
 * for use with pointers from ->branch[], leaf_p or node_p. NULL is conserved
 * as long as the tree is not corrupted. To be used with EB_LEAF, EB_NODE,
 * EB_LEFT or EB_RGHT in <tag>.
 */
static inline struct eb_root *eb_untag(const eb_troot_t *troot, const int tag)
{
        return (struct eb_root *)((void *)troot - tag);
}

/* returns the tag associated with an eb_troot_t pointer */
static inline int eb_gettag(eb_troot_t *troot)
{
        return (unsigned long)troot & 1;
}

/* Converts an eb_troot_t pointer to its equivalent eb_root pointer and clears
 * the tag, no matter what its value was.
 */
static inline struct eb_root *eb_clrtag(const eb_troot_t *troot)
{
        return (struct eb_root *)((unsigned long)troot & ~1UL);
}

/* Returns a pointer to the eb_node holding <root> */
static inline struct eb_node *eb_root_to_node(struct eb_root *root)
{
        return container_of(root, struct eb_node, branches);
}

/* Walks down starting at root pointer <start>, and always walking on side
 * <side>. It either returns the node hosting the first leaf on that side,
 * or NULL if no leaf is found. <start> may either be NULL or a branch pointer.
 * The pointer to the leaf (or NULL) is returned.
 */
static inline struct eb_node *eb_walk_down(eb_troot_t *start, unsigned int side)
{
        /* A NULL pointer on an empty tree root will be returned as-is */
        while (eb_gettag(start) == EB_NODE)
                start = (eb_untag(start, EB_NODE))->b[side];
        /* NULL is left untouched (root==eb_node, EB_LEAF==0) */
        return eb_root_to_node(eb_untag(start, EB_LEAF));
}

/* This function is used to build a tree of duplicates by adding a new node to
 * a subtree of at least 2 entries. It will probably never be needed inlined,
 * and it is not for end-user.
 */
static forceinline struct eb_node *
__eb_insert_dup(struct eb_node *sub, struct eb_node *new)
{
        struct eb_node *head = sub;

        eb_troot_t *new_left = eb_dotag(&new->branches, EB_LEFT);
        eb_troot_t *new_rght = eb_dotag(&new->branches, EB_RGHT);
        eb_troot_t *new_leaf = eb_dotag(&new->branches, EB_LEAF);

        /* first, identify the deepest hole on the right branch */
        while (eb_gettag(head->branches.b[EB_RGHT]) != EB_LEAF) {
                struct eb_node *last = head;
                head = container_of(eb_untag(head->branches.b[EB_RGHT], EB_NODE),
                                    struct eb_node, branches);
                if (head->bit > last->bit + 1)
                        sub = head;     /* there's a hole here */
        }

        /* Here we have a leaf attached to (head)->b[EB_RGHT] */
        if (head->bit < -1) {
                /* A hole exists just before the leaf, we insert there */
                new->bit = -1;
                sub = container_of(eb_untag(head->branches.b[EB_RGHT], EB_LEAF),
                                   struct eb_node, branches);
                head->branches.b[EB_RGHT] = eb_dotag(&new->branches, EB_NODE);

                new->node_p = sub->leaf_p;
                new->leaf_p = new_rght;
                sub->leaf_p = new_left;
                new->branches.b[EB_LEFT] = eb_dotag(&sub->branches, EB_LEAF);
                new->branches.b[EB_RGHT] = new_leaf;
                return new;
        } else {
                int side;
                /* No hole was found before a leaf. We have to insert above
                 * <sub>. Note that we cannot be certain that <sub> is attached
                 * to the right of its parent, as this is only true if <sub>
                 * is inside the dup tree, not at the head.
                 */
                new->bit = sub->bit - 1; /* install at the lowest level */
                side = eb_gettag(sub->node_p);
                head = container_of(eb_untag(sub->node_p, side), struct eb_node, branches);
                head->branches.b[side] = eb_dotag(&new->branches, EB_NODE);

                new->node_p = sub->node_p;
                new->leaf_p = new_rght;
                sub->node_p = new_left;
                new->branches.b[EB_LEFT] = eb_dotag(&sub->branches, EB_NODE);
                new->branches.b[EB_RGHT] = new_leaf;
                return new;
        }
}


/**************************************\
 * Public functions, for the end-user *
\**************************************/

/* Return non-zero if the tree is empty, otherwise zero */
static inline int eb_is_empty(const struct eb_root *root)
{
        return !root->b[EB_LEFT];
}

/* Return non-zero if the node is a duplicate, otherwise zero */
static inline int eb_is_dup(const struct eb_node *node)
{
        return node->bit < 0;
}

/* Return the first leaf in the tree starting at <root>, or NULL if none */
static inline struct eb_node *eb_first(struct eb_root *root)
{
        return eb_walk_down(root->b[0], EB_LEFT);
}

/* Return the last leaf in the tree starting at <root>, or NULL if none */
static inline struct eb_node *eb_last(struct eb_root *root)
{
        return eb_walk_down(root->b[0], EB_RGHT);
}

/* Return previous leaf node before an existing leaf node, or NULL if none. */
static inline struct eb_node *eb_prev(struct eb_node *node)
{
        eb_troot_t *t = node->leaf_p;

        while (eb_gettag(t) == EB_LEFT) {
                /* Walking up from left branch. We must ensure that we never
                 * walk beyond root.
                 */
                if (unlikely(eb_clrtag((eb_untag(t, EB_LEFT))->b[EB_RGHT]) == NULL))
                        return NULL;
                t = (eb_root_to_node(eb_untag(t, EB_LEFT)))->node_p;
        }
        /* Note that <t> cannot be NULL at this stage */
        t = (eb_untag(t, EB_RGHT))->b[EB_LEFT];
        return eb_walk_down(t, EB_RGHT);
}

/* Return next leaf node after an existing leaf node, or NULL if none. */
static inline struct eb_node *eb_next(struct eb_node *node)
{
        eb_troot_t *t = node->leaf_p;

        while (eb_gettag(t) != EB_LEFT)
                /* Walking up from right branch, so we cannot be below root */
                t = (eb_root_to_node(eb_untag(t, EB_RGHT)))->node_p;

        /* Note that <t> cannot be NULL at this stage */
        t = (eb_untag(t, EB_LEFT))->b[EB_RGHT];
        if (eb_clrtag(t) == NULL)
                return NULL;
        return eb_walk_down(t, EB_LEFT);
}

/* Return previous leaf node within a duplicate sub-tree, or NULL if none. */
static inline struct eb_node *eb_prev_dup(struct eb_node *node)
{
        eb_troot_t *t = node->leaf_p;

        while (eb_gettag(t) == EB_LEFT) {
                /* Walking up from left branch. We must ensure that we never
                 * walk beyond root.
                 */
                if (unlikely(eb_clrtag((eb_untag(t, EB_LEFT))->b[EB_RGHT]) == NULL))
                        return NULL;
                /* if the current node leaves a dup tree, quit */
                if ((eb_root_to_node(eb_untag(t, EB_LEFT)))->bit >= 0)
                        return NULL;
                t = (eb_root_to_node(eb_untag(t, EB_LEFT)))->node_p;
        }
        /* Note that <t> cannot be NULL at this stage */
        if ((eb_root_to_node(eb_untag(t, EB_RGHT)))->bit >= 0)
                return NULL;
        t = (eb_untag(t, EB_RGHT))->b[EB_LEFT];
        return eb_walk_down(t, EB_RGHT);
}

/* Return next leaf node within a duplicate sub-tree, or NULL if none. */
static inline struct eb_node *eb_next_dup(struct eb_node *node)
{
        eb_troot_t *t = node->leaf_p;

        while (eb_gettag(t) != EB_LEFT) {
                /* Walking up from right branch, so we cannot be below root */
                /* if the current node leaves a dup tree, quit */
                if ((eb_root_to_node(eb_untag(t, EB_RGHT)))->bit >= 0)
                        return NULL;
                t = (eb_root_to_node(eb_untag(t, EB_RGHT)))->node_p;
        }

        /* Note that <t> cannot be NULL at this stage */
        if ((eb_root_to_node(eb_untag(t, EB_LEFT)))->bit >= 0)
                return NULL;
        t = (eb_untag(t, EB_LEFT))->b[EB_RGHT];
        if (eb_clrtag(t) == NULL)
                return NULL;
        return eb_walk_down(t, EB_LEFT);
}

/* Return previous leaf node before an existing leaf node, skipping duplicates,
 * or NULL if none. */
static inline struct eb_node *eb_prev_unique(struct eb_node *node)
{
        eb_troot_t *t = node->leaf_p;

        while (1) {
                if (eb_gettag(t) != EB_LEFT) {
                        node = eb_root_to_node(eb_untag(t, EB_RGHT));
                        /* if we're right and not in duplicates, stop here */
                        if (node->bit >= 0)
                                break;
                        t = node->node_p;
                }
                else {
                        /* Walking up from left branch. We must ensure that we never
                         * walk beyond root.
                         */
                        if (unlikely(eb_clrtag((eb_untag(t, EB_LEFT))->b[EB_RGHT]) == NULL))
                                return NULL;
                        t = (eb_root_to_node(eb_untag(t, EB_LEFT)))->node_p;
                }
        }
        /* Note that <t> cannot be NULL at this stage */
        t = (eb_untag(t, EB_RGHT))->b[EB_LEFT];
        return eb_walk_down(t, EB_RGHT);
}

/* Return next leaf node after an existing leaf node, skipping duplicates, or
 * NULL if none.
 */
static inline struct eb_node *eb_next_unique(struct eb_node *node)
{
        eb_troot_t *t = node->leaf_p;

        while (1) {
                if (eb_gettag(t) == EB_LEFT) {
                        if (unlikely(eb_clrtag((eb_untag(t, EB_LEFT))->b[EB_RGHT]) == NULL))
                                return NULL;    /* we reached root */
                        node = eb_root_to_node(eb_untag(t, EB_LEFT));
                        /* if we're left and not in duplicates, stop here */
                        if (node->bit >= 0)
                                break;
                        t = node->node_p;
                }
                else {
                        /* Walking up from right branch, so we cannot be below root */
                        t = (eb_root_to_node(eb_untag(t, EB_RGHT)))->node_p;
                }
        }

        /* Note that <t> cannot be NULL at this stage */
        t = (eb_untag(t, EB_LEFT))->b[EB_RGHT];
        if (eb_clrtag(t) == NULL)
                return NULL;
        return eb_walk_down(t, EB_LEFT);
}


/* Removes a leaf node from the tree if it was still in it. Marks the node
 * as unlinked.
 */
static forceinline void __eb_delete(struct eb_node *node)
{
        __label__ delete_unlink;
        unsigned int pside, gpside, sibtype;
        struct eb_node *parent;
        struct eb_root *gparent;

        if (!node->leaf_p)
                return;

        /* we need the parent, our side, and the grand parent */
        pside = eb_gettag(node->leaf_p);
        parent = eb_root_to_node(eb_untag(node->leaf_p, pside));

        /* We likely have to release the parent link, unless it's the root,
         * in which case we only set our branch to NULL. Note that we can
         * only be attached to the root by its left branch.
         */

        if (eb_clrtag(parent->branches.b[EB_RGHT]) == NULL) {
                /* we're just below the root, it's trivial. */
                parent->branches.b[EB_LEFT] = NULL;
                goto delete_unlink;
        }

        /* To release our parent, we have to identify our sibling, and reparent
         * it directly to/from the grand parent. Note that the sibling can
         * either be a link or a leaf.
         */

        gpside = eb_gettag(parent->node_p);
        gparent = eb_untag(parent->node_p, gpside);

        gparent->b[gpside] = parent->branches.b[!pside];
        sibtype = eb_gettag(gparent->b[gpside]);

        if (sibtype == EB_LEAF) {
                eb_root_to_node(eb_untag(gparent->b[gpside], EB_LEAF))->leaf_p =
                        eb_dotag(gparent, gpside);
        } else {
                eb_root_to_node(eb_untag(gparent->b[gpside], EB_NODE))->node_p =
                        eb_dotag(gparent, gpside);
        }
        /* Mark the parent unused. Note that we do not check if the parent is
         * our own node, but that's not a problem because if it is, it will be
         * marked unused at the same time, which we'll use below to know we can
         * safely remove it.
         */
        parent->node_p = NULL;

        /* The parent node has been detached, and is currently unused. It may
         * belong to another node, so we cannot remove it that way. Also, our
         * own node part might still be used, so we can use this spare node
         * to replace ours if needed.
         */

        /* If our link part is unused, we can safely exit now */
        if (!node->node_p)
                goto delete_unlink;

        /* From now on, <node> and <parent> are necessarily different, and the
         * <node>'s node part is in use. By definition, <parent> is at least
         * below <node>, so keeping its key for the bit string is OK.
         */

        parent->node_p = node->node_p;
        parent->branches = node->branches;
        parent->bit = node->bit;

        /* We must now update the new node's parent... */
        gpside = eb_gettag(parent->node_p);
        gparent = eb_untag(parent->node_p, gpside);
        gparent->b[gpside] = eb_dotag(&parent->branches, EB_NODE);

        /* ... and its branches */
        for (pside = 0; pside <= 1; pside++) {
                if (eb_gettag(parent->branches.b[pside]) == EB_NODE) {
                        eb_root_to_node(eb_untag(parent->branches.b[pside], EB_NODE))->node_p =
                                eb_dotag(&parent->branches, pside);
                } else {
                        eb_root_to_node(eb_untag(parent->branches.b[pside], EB_LEAF))->leaf_p =
                                eb_dotag(&parent->branches, pside);
                }
        }
 delete_unlink:
        /* Now the node has been completely unlinked */
        node->leaf_p = NULL;
        return; /* tree is not empty yet */
}


/* Compare blocks <a> and <b> byte-to-byte, from bit <ignore> to bit <len-1>.
 * Return the number of equal bits between strings, assuming that the first
 * <ignore> bits are already identical. It is possible to return slightly more
 * than <len> bits if <len> does not stop on a byte boundary and we find exact
 * bytes. Note that parts or all of <ignore> bits may be rechecked. It is only
 * passed here as a hint to speed up the check.
 */
static forceinline int equal_bits(const unsigned char *a,
                                  const unsigned char *b,
                                  int ignore, int len)
{
        for (ignore >>= 3, a += ignore, b += ignore, ignore <<= 3;
             ignore < len; ) {
                unsigned char c;

                a++; b++;
                ignore += 8;
                c = b[-1] ^ a[-1];

                if (c) {
                        /* OK now we know that old and new differ at byte <ptr> and that <c> holds
                         * the bit differences. We have to find what bit is differing and report
                         * it as the number of identical bits. Note that low bit numbers are
                         * assigned to high positions in the byte, as we compare them as strings.
                         */
                        ignore -= flsnz8(c);
                        break;
                }
        }
        return ignore;
}

/* check that the two blocks <a> and <b> are equal on <len> bits. If it is
 * known that they already match on some leading bytes, this number of equal
 * bytes to be skipped may be passed in <skip>. It returns 0 if they match,
 * otherwise non-zero.
 */
static forceinline int check_bits(const unsigned char *a,
                                  const unsigned char *b,
                                  int skip,
                                  int len)
{
        int bit, ret;

        /* This uncommon construction gives the best performance on x86 because
         * it makes heavy use of multiple-index addressing and parallel instructions,
         * and it prevents gcc from reordering the loop since it is already
         * properly oriented. Tested to be fine with 2.95 to 4.2.
         */
        bit = ~len + (skip << 3) + 9; // = (skip << 3) + (8 - len)
        ret = a[skip] ^ b[skip];
        if (unlikely(bit >= 0))
                return ret >> bit;
        while (1) {
                skip++;
                if (ret)
                        return ret;
                ret = a[skip] ^ b[skip];
                bit += 8;
                if (bit >= 0)
                        return ret >> bit;
        }
}


/* Compare strings <a> and <b> byte-to-byte, from bit <ignore> to the last 0.
 * Return the number of equal bits between strings, assuming that the first
 * <ignore> bits are already identical. Note that parts or all of <ignore> bits
 * may be rechecked. It is only passed here as a hint to speed up the check.
 * The caller is responsible for not passing an <ignore> value larger than any
 * of the two strings. However, referencing any bit from the trailing zero is
 * permitted. Equal strings are reported as a negative number of bits, which
 * indicates the end was reached.
 */
static forceinline int string_equal_bits(const unsigned char *a,
                                         const unsigned char *b,
                                         int ignore)
{
        int beg;
        unsigned char c;

        beg = ignore >> 3;

        /* skip known and identical bits. We stop at the first different byte
         * or at the first zero we encounter on either side.
         */
        while (1) {
                unsigned char d;

                c = a[beg];
                d = b[beg];
                beg++;

                c ^= d;
                if (c)
                        break;
                if (!d)
                        return -1;
        }
        /* OK now we know that a and b differ at byte <beg>, or that both are zero.
         * We have to find what bit is differing and report it as the number of
         * identical bits. Note that low bit numbers are assigned to high positions
         * in the byte, as we compare them as strings.
         */
        return (beg << 3) - flsnz8(c);
}

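/* Compare the bit at position <pos> in blocks <a> and <b>, with bit 0 being
 * the most significant bit of the first byte (string ordering, as above).
 * Returns -1 if a<b, 0 if they are equal, 1 if a>b at that bit.
 */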
static forceinline int cmp_bits(const unsigned char *a, const unsigned char *b, unsigned int pos)
{
        unsigned int ofs;
        unsigned char bit_a, bit_b;

        ofs = pos >> 3;
        pos = ~pos & 7;

        bit_a = (a[ofs] >> pos) & 1;
        bit_b = (b[ofs] >> pos) & 1;

        return bit_a - bit_b; /* -1: a<b; 0: a=b; 1: a>b */
}

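/* Return the bit at position <pos> in block <a>, using the same MSB-first
 * numbering as cmp_bits() above: get_bit(a, 0) is the top bit of a[0].
 */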
static forceinline int get_bit(const unsigned char *a, unsigned int pos)
{
        unsigned int ofs;

        ofs = pos >> 3;
        pos = ~pos & 7;
        return (a[ofs] >> pos) & 1;
}

/* These functions are declared in ebtree.c */
void eb_delete(struct eb_node *node);
struct eb_node *eb_insert_dup(struct eb_node *sub, struct eb_node *new);
int eb_memcmp(const void *m1, const void *m2, size_t len);

#endif /* _EBTREE_H */

/*
 * Local variables:
 * c-indent-level: 8
 * c-basic-offset: 8
 * End:
 */