Blame - include/common/ebtree.h - haproxy

blob: a2024bc9d2dbf37d7ba48b51ff7f9ff6f5fc87ba

Willy Tarreau	e6d2e4d	2007-11-15 23:56:17 +0100	[diff] [blame]	1	/*
				2	* Elastic Binary Trees - generic macros and structures.
Willy Tarreau	1fb6c87	2008-05-16 19:48:20 +0200	[diff] [blame]	3	* Version 4.0
Willy Tarreau	70bcfb7	2008-01-27 02:21:53 +0100	[diff] [blame]	4	* (C) 2002-2008 - Willy Tarreau <w@1wt.eu>
Willy Tarreau	e6d2e4d	2007-11-15 23:56:17 +0100	[diff] [blame]	5	*
				6	* This program is free software; you can redistribute it and/or modify
				7	* it under the terms of the GNU General Public License as published by
				8	* the Free Software Foundation; either version 2 of the License, or
				9	* (at your option) any later version.
				10	*
				11	* This program is distributed in the hope that it will be useful,
				12	* but WITHOUT ANY WARRANTY; without even the implied warranty of
				13	* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
				14	* GNU General Public License for more details.
				15	*
				16	* You should have received a copy of the GNU General Public License
				17	* along with this program; if not, write to the Free Software
				18	* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
				19	*
				20	*
				21	* Short history :
				22	*
				23	* 2007/09/28: full support for the duplicates tree => v3
				24	* 2007/07/08: merge back cleanups from kernel version.
				25	* 2007/07/01: merge into Linux Kernel (try 1).
				26	* 2007/05/27: version 2: compact everything into one single struct
				27	* 2007/05/18: adapted the structure to support embedded nodes
				28	* 2007/05/13: adapted to mempools v2.
				29	*/
				30
				31
				32
				33	/*
				34	General idea:
				35	-------------
				36	In a radix binary tree, we may have up to 2N-1 nodes for N keys if all of
				37	them are leaves. If we find a way to differentiate intermediate nodes (later
				38	called "nodes") and final nodes (later called "leaves"), and we associate
				39	them by two, it is possible to build sort of a self-contained radix tree with
				40	intermediate nodes always present. It will not be as cheap as the ultree for
				41	optimal cases as shown below, but the optimal case almost never happens :
				42
				43	Eg, to store 8, 10, 12, 13, 14 :
				44
				45	             ultree          this theoretical tree
				46
				47	               8                   8
				48	              / \                 / \
				49	             10 12               10 12
				50	               /  \                /  \
				51	              13  14              12  14
				52	                                 / \
				53	                                12 13
				54
				55	Note that on real-world tests (with a scheduler), it was verified that the
				56	case with data on an intermediate node never happens. This is because the
				57	data spectrum is too large for such coincidences to happen. It would require
				58	for instance that a task has its expiration time at an exact second, with
				59	other tasks sharing that second. This is too rare to try to optimize for it.
				60
				61	What is interesting is that the node will only be added above the leaf when
				62	necessary, which implies that it will always remain somewhere above it. So
				63	both the leaf and the node can share the exact value of the leaf, because
				64	when going down the node, the bit mask will be applied to comparisons. So we
				65	are tempted to have one single key shared between the node and the leaf.
				66
				67	The bit only serves the nodes, and the dups only serve the leaves. So we can
				68	put a lot of information in common. This results in one single entity with
				69	two branch pointers and two parent pointers, one for the node part, and one
				70	for the leaf part :
				71
				72	              node's         leaf's
				73	              parent         parent
				74	                |              |
				75	              [node]         [leaf]
				76	               / \
				77	           left   right
				78	         branch   branch
				79
				80	The node may very well refer to its leaf counterpart in one of its branches,
				81	indicating that its own leaf is just below it :
				82
				83	              node's
				84	              parent
				85	                |
				86	              [node]
				87	               / \
				88	           left   [leaf]
				89	         branch
				90
				91	Adding keys in such a tree simply consists in inserting nodes between
				92	other nodes and/or leaves :
				93
				94	                [root]
				95	                  |
				96	               [node2]
				97	                /   \
				98	          [leaf1]   [node3]
				99	                      / \
				100	               [leaf2]   [leaf3]
				101
				102	On this diagram, we notice that [node2] and [leaf2] have been pulled away
				103	from each other due to the insertion of [node3], just as if there would be
				104	an elastic between both parts. This elastic-like behaviour gave its name to
				105	the tree : "Elastic Binary Tree", or "EBtree". The entity which associates a
				106	node part and a leaf part will be called an "EB node".
				107
				108	We also notice on the diagram that there is a root entity required to attach
				109	the tree. It only contains two branches and there is nothing above it. This
				110	is an "EB root". Some will note that [leaf1] has no [node1]. One property of
				111	the EBtree is that all nodes have their branches filled, and that if a node
				112	has only one branch, it does not need to exist. Here, [leaf1] was added
				113	below [root] and did not need any node.
				114
				115	An EB node contains :
				116	- a pointer to the node's parent (node_p)
				117	- a pointer to the leaf's parent (leaf_p)
				118	- two branches pointing to lower nodes or leaves (branches)
				119	- a bit position (bit)
				120	- an optional key.
				121
				122	The key here is optional because it's used only during insertion, in order
				123	to classify the nodes. Nothing else in the tree structure requires knowledge
				124	of the key. This makes it possible to write type-agnostic primitives for
				125	everything, and type-specific insertion primitives. This has led us to consider
				126	two types of EB nodes. The type-agnostic ones will serve as a header for the
				127	other ones, and will simply be called "struct eb_node". The other ones will
				128	have their type indicated in the structure name. Eg: "struct eb32_node" for
				129	nodes carrying 32 bit keys.
				130
				131	We will also note that the two branches in a node serve exactly the same
				132	purpose as an EB root. For this reason, a "struct eb_root" will be used as
				133	well inside the struct eb_node. In order to ease pointer manipulation and
				134	ROOT detection when walking upwards, all the pointers inside an eb_node will
				135	point to the eb_root part of the referenced EB nodes, relying on the same
				136	principle as the linked lists in Linux.
				137
				138	Another important point to note, is that when walking inside a tree, it is
				139	very convenient to know where a node is attached in its parent, and what
				140	type of branch it has below it (leaf or node). In order to simplify the
				141	operations and to speed up the processing, it was decided in this specific
				142	implementation to use the lowest bit from the pointer to designate the side
				143	of the upper pointers (left/right) and the type of a branch (leaf/node).
				144	This practice is not mandatory by design, but an implementation-specific
				145	optimisation permitted on all platforms on which data must be aligned. All
				146	known 32 bit platforms align their integers and pointers to 32 bits, leaving
				147	the two lower bits unused. So, we say that the pointers are "tagged". And
				148	since they designate pointers to root parts, we simply call them
				149	"tagged root pointers", or "eb_troot" in the code.
				150
				151	Duplicate keys are stored in a special manner. When inserting a key, if
				152	the same one is found, then an incremental binary tree is built at this
				153	place from these keys. This ensures that no special case has to be written
				154	to handle duplicates when walking through the tree or when deleting entries.
				155	It also guarantees that duplicates will be walked in the exact same order
				156	they were inserted. This is very important when trying to achieve fair
				157	processing distribution for instance.
				158
				159	Algorithmic complexity can be derived from 3 variables :
				160	- the number of possible different keys in the tree : P
				161	- the number of entries in the tree : N
				162	- the number of duplicates for one key : D
				163
				164	Note that this tree is deliberately NOT balanced. For this reason, the worst
				165	case may happen with a small tree (eg: 32 distinct keys of one bit). BUT,
				166	the operations required to manage such data are so cheap that they make
				167	it worth using it even under such conditions. For instance, a balanced tree
				168	may require only 6 levels to store those 32 keys when this tree will
				169	require 32. But if per-level operations are 5 times cheaper, it wins.
				170
				171	Minimal, Maximal and Average times are specified in number of operations.
				172	Minimal is given for best condition, Maximal for worst condition, and the
				173	average is reported for a tree containing random keys. An operation
				174	generally consists in jumping from one node to the other.
				175
				176	Complexity :
				177	- lookup : min=1, max=log(P), avg=log(N)
				178	- insertion from root : min=1, max=log(P), avg=log(N)
				179	- insertion of dups : min=1, max=log(D), avg=log(D)/2 after lookup
				180	- deletion : min=1, max=1, avg=1
				181	- prev/next : min=1, max=log(P), avg=2 :
				182	N/2 nodes need 1 hop => 1*N/2
				183	N/4 nodes need 2 hops => 2*N/4
				184	N/8 nodes need 3 hops => 3*N/8
				185	...
				186	N/x nodes need log2(x) hops => log2(x)*N/x, with x = 2^k
				187	Total cost for all N nodes : sum[k=1..log2(N)](k*N/2^k) = N*sum[k=1..log2(N)](k/2^k)
				188	Average cost across N nodes = total / N = sum[k=1..log2(N)](k/2^k), which tends to 2
				189
				190	This design is currently limited to only two branches per node. Most of the
				191	tree descent algorithm would be compatible with more branches (eg: 4, to cut
				192	the height in half), but this would probably require more complex operations
				193	and the deletion algorithm would be problematic.
				194
				195	Useful properties :
				196	- a node is always added above the leaf it is tied to, and never can get
				197	below nor in another branch. This implies that leaves directly attached
				198	to the root do not use their node part, which is indicated by a NULL
				199	value in node_p. This also enhances the cache efficiency when walking
				200	down the tree, because when the leaf is reached, its node part will
				201	already have been visited (unless it's the first leaf in the tree).
				202
				203	- pointers to lower nodes or leaves are stored in "branch" pointers. Only
				204	the root node may have a NULL in either branch, it is not possible for
				205	other branches. Since the nodes are attached to the left branch of the
				206	root, it is not possible to see a NULL left branch when walking up a
				207	tree. Thus, an empty tree is immediately identified by a NULL left
				208	branch at the root. Conversely, the one and only way to identify the
Willy Tarreau	1fb6c87	2008-05-16 19:48:20 +0200	[diff] [blame]	209	root node is to check that its right branch is NULL. Note that the
				210	NULL pointer may have a few low-order bits set.
Willy Tarreau	e6d2e4d	2007-11-15 23:56:17 +0100	[diff] [blame]	211
				212	- a node connected to its own leaf will have branch[0|1] pointing to
				213	itself, and leaf_p pointing to itself.
				214
				215	- a node can never have node_p pointing to itself.
				216
				217	- a node is linked in a tree if and only if it has a non-null leaf_p.
				218
				219	- a node can never have both branches equal, except for the root which can
				220	have them both NULL.
				221
				222	- deletion only applies to leaves. When a leaf is deleted, its parent must
				223	be released too (unless it's the root), and its sibling must attach to
				224	the grand-parent, replacing the parent. Also, when a leaf is deleted,
				225	the node tied to this leaf will be removed and must be released too. If
				226	this node is different from the leaf's parent, the freshly released
				227	leaf's parent will be used to replace the node which must go. A released
				228	node will never be used anymore, so there's no point in tracking it.
				229
				230	- the bit index in a node indicates the bit position in the key which is
				231	represented by the branches. That means that a node with (bit == 0) is
				232	just above two leaves. Negative bit values are used to build a duplicate
				233	tree. The first node above two identical leaves gets (bit == -1). This
				234	value logarithmically decreases as the duplicate tree grows. During
				235	duplicate insertion, a node is inserted above the highest bit value (the
				236	lowest absolute value) in the tree during the right-sided walk. If bit
				237	-1 is not encountered (highest < -1), we insert above last leaf.
				238	Otherwise, we insert above the node with the highest value which was not
				239	equal to the one of its parent + 1.
				240
				241	- the "eb_next" primitive walks from left to right, which means from lower
				242	to higher keys. It returns duplicates in the order they were inserted.
				243	The "eb_first" primitive returns the left-most entry.
				244
				245	- the "eb_prev" primitive walks from right to left, which means from
				246	higher to lower keys. It returns duplicates in the opposite order they
				247	were inserted. The "eb_last" primitive returns the right-most entry.
				248
Willy Tarreau	1fb6c87	2008-05-16 19:48:20 +0200	[diff] [blame]	249	- a tree which has 1 in the lower bit of its root's right branch is a
				250	tree with unique nodes. This means that a node whose key already
				251	exists in the tree will not be inserted, and the previous entry will
				252	be returned instead.
				253
Willy Tarreau	e6d2e4d	2007-11-15 23:56:17 +0100	[diff] [blame]	254	*/
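
The "avg=2" figure given above for prev/next can be checked numerically. The following is only a minimal sketch, assuming a perfectly filled tree in which a fraction 1/2^k of the entries needs k hops; the name avg_hops_sketch is hypothetical and does not exist in ebtree.

```c
#include <assert.h>

/* Average number of hops for prev/next in an idealized, fully populated
 * tree of <levels> levels: half of the entries need 1 hop, a quarter
 * need 2 hops, and so on. This computes sum[k=1..levels](k / 2^k).
 * Hypothetical helper, not part of ebtree.
 */
static double avg_hops_sketch(int levels)
{
	double sum = 0.0;
	double weight = 0.5;	/* fraction of entries needing k hops */
	int k;

	assert(levels >= 1);
	for (k = 1; k <= levels; k++) {
		sum += k * weight;
		weight /= 2.0;
	}
	return sum;
}
```

The partial sums equal 2 - (levels + 2) / 2^levels, so they approach 2 from below as the tree gets deeper without ever reaching it.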
				255
Willy Tarreau	f56fd8a	2007-11-19 18:43:04 +0100	[diff] [blame]	256	#ifndef _COMMON_EBTREE_H
				257	#define _COMMON_EBTREE_H
Willy Tarreau	e6d2e4d	2007-11-15 23:56:17 +0100	[diff] [blame]	258
				259	#include <stdlib.h>
Willy Tarreau	be68e6a	2007-11-21 10:34:15 +0100	[diff] [blame]	260	#include <common/config.h>
Willy Tarreau	e6d2e4d	2007-11-15 23:56:17 +0100	[diff] [blame]	261
				262	/* Note: we never need to run fls on null keys, so we can optimize the fls
				263	* function by removing a conditional jump.
				264	*/
				265	#if defined(__i386__)
				266	static inline int flsnz(int x)
				267	{
				268	int r;
				269	__asm__("bsrl %1,%0\n"
				270	: "=r" (r) : "rm" (x));
				271	return r+1;
				272	}
				273	#else
				274	// returns 1 to 32 for 1<<0 to 1<<31. Undefined for 0.
				275	#define flsnz(___a) ({ \
				276	register int ___x, ___bits = 0; \
				277	___x = (___a); \
				278	if (___x & 0xffff0000) { ___x &= 0xffff0000; ___bits += 16;} \
				279	if (___x & 0xff00ff00) { ___x &= 0xff00ff00; ___bits += 8;} \
				280	if (___x & 0xf0f0f0f0) { ___x &= 0xf0f0f0f0; ___bits += 4;} \
				281	if (___x & 0xcccccccc) { ___x &= 0xcccccccc; ___bits += 2;} \
				282	if (___x & 0xaaaaaaaa) { ___x &= 0xaaaaaaaa; ___bits += 1;} \
				283	___bits + 1; \
				284	})
				285	#endif
				286
				287	static inline int fls64(unsigned long long x)
				288	{
				289	unsigned int h;
				290	unsigned int bits = 32;
				291
				292	h = x >> 32;
				293	if (!h) {
				294	h = x;
				295	bits = 0;
				296	}
				297	return flsnz(h) + bits;
				298	}
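
The portable flsnz() fallback and fls64() above can be restated as plain functions to make the bit search easier to follow. This is a sketch under the assumption that unsigned int is 32 bits wide; the names flsnz_sketch and fls64_sketch are hypothetical and are not the macros defined in this header.

```c
#include <assert.h>

/* Returns the position of the highest set bit, counted from 1
 * (1 for 1<<0, 32 for 1<<31). Undefined for 0, hence the assert.
 * Each step keeps only the upper half of the remaining candidate bits.
 */
static int flsnz_sketch(unsigned int x)
{
	int bits = 0;

	assert(x != 0);
	if (x & 0xffff0000U) { x &= 0xffff0000U; bits += 16; }
	if (x & 0xff00ff00U) { x &= 0xff00ff00U; bits += 8; }
	if (x & 0xf0f0f0f0U) { x &= 0xf0f0f0f0U; bits += 4; }
	if (x & 0xccccccccU) { x &= 0xccccccccU; bits += 2; }
	if (x & 0xaaaaaaaaU) { x &= 0xaaaaaaaaU; bits += 1; }
	return bits + 1;
}

/* 64-bit variant: search the upper word first, else the lower one. */
static int fls64_sketch(unsigned long long x)
{
	unsigned int h = (unsigned int)(x >> 32);
	int bits = 32;

	if (!h) {
		h = (unsigned int)x;
		bits = 0;
	}
	return flsnz_sketch(h) + bits;
}
```

As in the original, a zero argument is never expected, which is what allows the conditional on null input to be dropped.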
				299
				300	#define fls_auto(x) ((sizeof(x) > 4) ? fls64(x) : flsnz(x))
				301
				302	/* Linux-like "container_of". It returns a pointer to the structure of type
				303	* <type> which has its member <name> stored at address <ptr>.
				304	*/
				305	#ifndef container_of
				306	#define container_of(ptr, type, name) ((type *)(((void *)(ptr)) - ((long)&((type *)0)->name)))
				307	#endif
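
To illustrate what container_of() computes, here is a small sketch that recovers an enclosing structure from a pointer to one of its members. It assumes a standard offsetof()-based variant rather than the arithmetic on void pointers used above (which relies on a GCC extension); container_of_sketch, link_sketch, task_sketch and task_of are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* offsetof()-based variant of the macro; an assumption for portability,
 * not the definition used in this header.
 */
#define container_of_sketch(ptr, type, name) \
	((type *)((char *)(ptr) - offsetof(type, name)))

/* Hypothetical embedded member, playing the role of an eb_node */
struct link_sketch {
	int dummy;
};

/* Hypothetical user structure carrying the embedded member */
struct task_sketch {
	int id;
	struct link_sketch link;
};

/* Returns the task which embeds link <l> */
static struct task_sketch *task_of(struct link_sketch *l)
{
	assert(l != NULL);
	return container_of_sketch(l, struct task_sketch, link);
}
```

This is the same mechanism that eb_entry() relies on to go from a tree node back to the user's structure.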
				308
				309	/*
				310	* Gcc >= 3 provides the ability for the program to give hints to the compiler
				311	* about what branch of an if is most likely to be taken. This helps the
				312	* compiler produce the most compact critical paths, which is generally better
				313	* for the cache and to reduce the number of jumps. Be very careful not to use
				314	* this in inline functions, because the code reordering it causes very often
				315	* has a negative impact on the calling functions.
				316	*/
Willy Tarreau	70bcfb7	2008-01-27 02:21:53 +0100	[diff] [blame]	317	#if !defined(likely)
				318	#if __GNUC__ < 3
Willy Tarreau	e6d2e4d	2007-11-15 23:56:17 +0100	[diff] [blame]	319	#define __builtin_expect(x,y) (x)
Willy Tarreau	70bcfb7	2008-01-27 02:21:53 +0100	[diff] [blame]	320	#define likely(x) (x)
				321	#define unlikely(x) (x)
				322	#elif __GNUC__ < 4
				323	/* gcc 3.x does the best job at this */
Willy Tarreau	e6d2e4d	2007-11-15 23:56:17 +0100	[diff] [blame]	324	#define likely(x) (__builtin_expect((x) != 0, 1))
				325	#define unlikely(x) (__builtin_expect((x) != 0, 0))
Willy Tarreau	70bcfb7	2008-01-27 02:21:53 +0100	[diff] [blame]	326	#else
				327	/* GCC 4.x is stupid, it performs the comparison then compares it to 1,
				328	* so we cheat in a dirty way to prevent it from doing this. This will
				329	* only work with ints and booleans though.
				330	*/
				331	#define likely(x) (x)
Willy Tarreau	75875a7	2008-07-06 15:18:50 +0200	[diff] [blame]	332	#define unlikely(x) (__builtin_expect((unsigned long)(x), 0))
Willy Tarreau	70bcfb7	2008-01-27 02:21:53 +0100	[diff] [blame]	333	#endif
Willy Tarreau	e6d2e4d	2007-11-15 23:56:17 +0100	[diff] [blame]	334	#endif
				335
Willy Tarreau	75cf17e	2008-08-29 15:48:49 +0200	[diff] [blame]	336	/* By default, gcc does not inline large chunks of code, but we want it to
				337	* respect our choices.
				338	*/
				339	#if !defined(forceinline)
				340	#if __GNUC__ < 3
				341	#define forceinline inline
				342	#else
				343	#define forceinline inline __attribute__((always_inline))
				344	#endif
				345	#endif
				346
Willy Tarreau	e6d2e4d	2007-11-15 23:56:17 +0100	[diff] [blame]	347	/* Support passing function parameters in registers. For this, the
				348	* CONFIG_EBTREE_REGPARM macro has to be set to the maximal number of registers
				349	* allowed. Some functions have intentionally received a regparm lower than
				350	* their parameter count, it is in order to avoid register clobbering where
				351	* they are called.
				352	*/
				353	#ifndef REGPRM1
				354	#if CONFIG_EBTREE_REGPARM >= 1
				355	#define REGPRM1 __attribute__((regparm(1)))
				356	#else
				357	#define REGPRM1
				358	#endif
				359	#endif
				360
				361	#ifndef REGPRM2
				362	#if CONFIG_EBTREE_REGPARM >= 2
				363	#define REGPRM2 __attribute__((regparm(2)))
				364	#else
				365	#define REGPRM2 REGPRM1
				366	#endif
				367	#endif
				368
				369	#ifndef REGPRM3
				370	#if CONFIG_EBTREE_REGPARM >= 3
				371	#define REGPRM3 __attribute__((regparm(3)))
				372	#else
				373	#define REGPRM3 REGPRM2
				374	#endif
				375	#endif
				376
				377	/* Number of bits per node, and number of leaves per node */
				378	#define EB_NODE_BITS 1
				379	#define EB_NODE_BRANCHES (1 << EB_NODE_BITS)
				380	#define EB_NODE_BRANCH_MASK (EB_NODE_BRANCHES - 1)
				381
				382	/* Be careful not to tweak those values. The walking code is optimized for NULL
				383	* detection on the assumption that the following values are intact.
				384	*/
				385	#define EB_LEFT 0
				386	#define EB_RGHT 1
				387	#define EB_LEAF 0
				388	#define EB_NODE 1
				389
Willy Tarreau	1fb6c87	2008-05-16 19:48:20 +0200	[diff] [blame]	390	/* Tags to set in root->b[EB_RGHT] :
				391	* - EB_NORMAL is a normal tree which stores duplicate keys.
				392	* - EB_UNIQUE is a tree which stores unique keys.
				393	*/
				394	#define EB_NORMAL 0
				395	#define EB_UNIQUE 1
				396
Willy Tarreau	e6d2e4d	2007-11-15 23:56:17 +0100	[diff] [blame]	397	/* This is the same as an eb_node pointer, except that the lower bit embeds
				398	* a tag. See eb_dotag()/eb_untag()/eb_gettag(). This tag has two meanings :
				399	* - 0=left, 1=right to designate the parent's branch for leaf_p/node_p
				400	* - 0=leaf, 1=link to designate the branch's type for branch[]
				401	*/
				402	typedef void eb_troot_t;
				403
				404	/* The eb_root connects the node which contains it, to two nodes below it, one
				405	* of which may be the same node. At the top of the tree, we use an eb_root
Willy Tarreau	1fb6c87	2008-05-16 19:48:20 +0200	[diff] [blame]	406	* too, which always has its right branch NULL (possibly with low-order tag bits set).
Willy Tarreau	e6d2e4d	2007-11-15 23:56:17 +0100	[diff] [blame]	407	*/
				408	struct eb_root {
				409		eb_troot_t *b[EB_NODE_BRANCHES]; /* left and right branches */
				410	};
				411
				412	/* The eb_node contains the two parts, one for the leaf, which always exists,
				413	* and one for the node, which remains unused in the very first node inserted
				414	* into the tree. This structure is 20 bytes per node on 32-bit machines. Do
				415	* not change the order, benchmarks have shown that it's optimal this way.
				416	*/
				417	struct eb_node {
				418		struct eb_root branches; /* branches, must be at the beginning */
				419		eb_troot_t *node_p; /* link node's parent */
				420		eb_troot_t *leaf_p; /* leaf node's parent */
				421		int bit; /* link's bit position. */
				422	};
				423
				424	/* Return the structure of type <type> whose member <member> points to <ptr> */
				425	#define eb_entry(ptr, type, member) container_of(ptr, type, member)
				426
				427	/* The root of a tree is an eb_root initialized with both pointers NULL.
				428	* During its life, only the left pointer will change. The right one will
				429	* always remain NULL, which is the way we detect it.
				430	*/
				431	#define EB_ROOT \
				432	(struct eb_root) { \
				433	.b = {[0] = NULL, [1] = NULL }, \
				434	}
				435
Willy Tarreau	1fb6c87	2008-05-16 19:48:20 +0200	[diff] [blame]	436	#define EB_ROOT_UNIQUE \
				437	(struct eb_root) { \
				438	.b = {[0] = NULL, [1] = (void *)1 }, \
				439	}
				440
Willy Tarreau	e6d2e4d	2007-11-15 23:56:17 +0100	[diff] [blame]	441	#define EB_TREE_HEAD(name) \
				442	struct eb_root name = EB_ROOT
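
The root conventions described above (left branch NULL when the tree is empty, right branch always NULL apart from the "unique" tag bit) can be sketched with a few predicates. This is only an illustration under those stated conventions; sk_root, SK_ROOT, SK_ROOT_UNIQUE and the sk_is_*() helpers are hypothetical names, not part of this header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct eb_root */
struct sk_root {
	void *b[2];	/* [0] = left branch, [1] = right branch */
};

/* Initializers mirroring EB_ROOT and EB_ROOT_UNIQUE */
#define SK_ROOT        { { NULL, NULL } }
#define SK_ROOT_UNIQUE { { NULL, (void *)1 } }

/* An empty tree has a NULL left branch at its root */
static int sk_is_empty(const struct sk_root *root)
{
	assert(root != NULL);
	return root->b[0] == NULL;
}

/* The low bit of the root's right branch flags a unique-keys tree */
static int sk_is_unique(const struct sk_root *root)
{
	assert(root != NULL);
	return (int)((uintptr_t)root->b[1] & 1);
}

/* A root is recognized by a right branch that is NULL once untagged */
static int sk_is_root(const struct sk_root *root)
{
	assert(root != NULL);
	return ((uintptr_t)root->b[1] & ~(uintptr_t)1) == 0;
}
```

Keeping the flag in the otherwise unused right branch means the root needs no extra field to record whether duplicates are allowed.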
				443
				444
				445	/***************************************\
				446	* Private functions. Not for end-user *
				447	\***************************************/
				448
				449	/* Converts a root pointer to its equivalent eb_troot_t pointer,
				450	* ready to be stored in ->branch[], leaf_p or node_p. NULL is not
				451	* conserved. To be used with EB_LEAF, EB_NODE, EB_LEFT or EB_RGHT in <tag>.
				452	*/
				453	static inline eb_troot_t *eb_dotag(const struct eb_root *root, const int tag)
				454	{
				455		return (eb_troot_t *)((void *)root + tag);
				456	}
				457
				458	/* Converts an eb_troot_t pointer to its equivalent eb_root pointer,
				459	* for use with pointers from ->branch[], leaf_p or node_p. NULL is conserved
				460	* as long as the tree is not corrupted. To be used with EB_LEAF, EB_NODE,
				461	* EB_LEFT or EB_RGHT in <tag>.
				462	*/
				463	static inline struct eb_root *eb_untag(const eb_troot_t *troot, const int tag)
				464	{
				465		return (struct eb_root *)((void *)troot - tag);
				466	}
				467
				468	/* returns the tag associated with an eb_troot_t pointer */
				469	static inline int eb_gettag(eb_troot_t *troot)
				470	{
				471	return (unsigned long)troot & 1;
				472	}
				473
				474	/* Converts an eb_troot_t pointer to its equivalent eb_root pointer and clears
				475	* the tag, no matter what its value was.
				476	*/
				477	static inline struct eb_root *eb_clrtag(const eb_troot_t *troot)
				478	{
				479		return (struct eb_root *)((unsigned long)troot & ~1UL);
				480	}
				481
				482	/* Returns a pointer to the eb_node holding <root> */
				483	static inline struct eb_node *eb_root_to_node(struct eb_root *root)
				484	{
				485		return container_of(root, struct eb_node, branches);
				486	}
				487
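
The tagging helpers above can be exercised in isolation to see that a tag survives a round trip through a pointer. The sketch below assumes that the root structure is aligned on at least two bytes, so that the lowest address bit is free, and it uses uintptr_t arithmetic instead of arithmetic on void pointers; all *_sketch names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct eb_root */
struct root_sketch {
	void *b[2];
};

/* Stores <tag> (0 or 1) in the lowest bit of the root pointer */
static void *dotag_sketch(const struct root_sketch *root, int tag)
{
	assert(tag == 0 || tag == 1);
	return (void *)((uintptr_t)root + (uintptr_t)tag);
}

/* Removes a tag whose value is already known */
static struct root_sketch *untag_sketch(const void *troot, int tag)
{
	return (struct root_sketch *)((uintptr_t)troot - (uintptr_t)tag);
}

/* Extracts the tag from a tagged pointer */
static int gettag_sketch(const void *troot)
{
	return (int)((uintptr_t)troot & 1);
}

/* Clears the tag whatever its value is */
static struct root_sketch *clrtag_sketch(const void *troot)
{
	return (struct root_sketch *)((uintptr_t)troot & ~(uintptr_t)1);
}
```

Untagging with a known tag is a plain subtraction, which is why the walking code prefers it over masking whenever the tag value has just been tested.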
				488	/* Walks down starting at root pointer <start>, and always walking on side
				489	* <side>. It either returns the node hosting the first leaf on that side,
				490	* or NULL if no leaf is found. <start> may either be NULL or a branch pointer.
				491	* The pointer to the leaf (or NULL) is returned.
				492	*/
				493	static inline struct eb_node *eb_walk_down(eb_troot_t *start, unsigned int side)
				494	{
				495		/* A NULL pointer on an empty tree root will be returned as-is */
				496		while (eb_gettag(start) == EB_NODE)
				497			start = (eb_untag(start, EB_NODE))->b[side];
				498		/* NULL is left untouched (root==eb_node, EB_LEAF==0) */
				499		return eb_root_to_node(eb_untag(start, EB_LEAF));
				500	}
				501
				502	/* This function is used to build a tree of duplicates by adding a new node to
				503	* a subtree of at least 2 entries. It will probably never be needed inlined,
				504	* and it is not for end-user.
				505	*/
Willy Tarreau	75cf17e	2008-08-29 15:48:49 +0200	[diff] [blame]	506	static forceinline struct eb_node *
Willy Tarreau	e6d2e4d	2007-11-15 23:56:17 +0100	[diff] [blame]	507	__eb_insert_dup(struct eb_node *sub, struct eb_node *new)
				508	{
				509	struct eb_node *head = sub;
				510
				511	eb_troot_t *new_left = eb_dotag(&new->branches, EB_LEFT);
				512	eb_troot_t *new_rght = eb_dotag(&new->branches, EB_RGHT);
				513	eb_troot_t *new_leaf = eb_dotag(&new->branches, EB_LEAF);
				514
				515	/* first, identify the deepest hole on the right branch */
				516	while (eb_gettag(head->branches.b[EB_RGHT]) != EB_LEAF) {
				517	struct eb_node *last = head;
				518	head = container_of(eb_untag(head->branches.b[EB_RGHT], EB_NODE),
				519	struct eb_node, branches);
				520	if (head->bit > last->bit + 1)
				521	sub = head; /* there's a hole here */
				522	}
				523
				524	/* Here we have a leaf attached to (head)->b[EB_RGHT] */
				525	if (head->bit < -1) {
				526	/* A hole exists just before the leaf, we insert there */
				527	new->bit = -1;
				528	sub = container_of(eb_untag(head->branches.b[EB_RGHT], EB_LEAF),
				529	struct eb_node, branches);
				530	head->branches.b[EB_RGHT] = eb_dotag(&new->branches, EB_NODE);
				531
				532	new->node_p = sub->leaf_p;
				533	new->leaf_p = new_rght;
				534	sub->leaf_p = new_left;
				535	new->branches.b[EB_LEFT] = eb_dotag(&sub->branches, EB_LEAF);
				536	new->branches.b[EB_RGHT] = new_leaf;
				537	return new;
				538	} else {
				539	int side;
				540	/* No hole was found before a leaf. We have to insert above
				541	* <sub>. Note that we cannot be certain that <sub> is attached
				542	* to the right of its parent, as this is only true if <sub>
				543	* is inside the dup tree, not at the head.
				544	*/
				545	new->bit = sub->bit - 1; /* install at the lowest level */
				546	side = eb_gettag(sub->node_p);
				547	head = container_of(eb_untag(sub->node_p, side), struct eb_node, branches);
				548	head->branches.b[side] = eb_dotag(&new->branches, EB_NODE);
				549
				550	new->node_p = sub->node_p;
				551	new->leaf_p = new_rght;
				552	sub->node_p = new_left;
				553	new->branches.b[EB_LEFT] = eb_dotag(&sub->branches, EB_NODE);
				554	new->branches.b[EB_RGHT] = new_leaf;
				555	return new;
				556	}
				557	}
				558
				559
				560	/**************************************\
				561	* Public functions, for the end-user *
				562	\**************************************/
				563
				564	/* Return the first leaf in the tree starting at <root>, or NULL if none */
				565	static inline struct eb_node *eb_first(struct eb_root *root)
				566	{
				567		return eb_walk_down(root->b[0], EB_LEFT);
				568	}
				569
				570	/* Return the last leaf in the tree starting at <root>, or NULL if none */
				571	static inline struct eb_node *eb_last(struct eb_root *root)
				572	{
				573		return eb_walk_down(root->b[0], EB_RGHT);
				574	}
				575
				576	/* Return previous leaf node before an existing leaf node, or NULL if none. */
				577	static inline struct eb_node *eb_prev(struct eb_node *node)
				578	{
				579	eb_troot_t *t = node->leaf_p;
				580
				581	while (eb_gettag(t) == EB_LEFT) {
				582	/* Walking up from left branch. We must ensure that we never
				583	* walk beyond root.
				584	*/
Willy Tarreau	1fb6c87	2008-05-16 19:48:20 +0200	[diff] [blame]	585	if (unlikely(eb_clrtag((eb_untag(t, EB_LEFT))->b[EB_RGHT]) == NULL))
Willy Tarreau	e6d2e4d	2007-11-15 23:56:17 +0100	[diff] [blame]	586	return NULL;
				587	t = (eb_root_to_node(eb_untag(t, EB_LEFT)))->node_p;
				588	}
				589	/* Note that <t> cannot be NULL at this stage */
				590	t = (eb_untag(t, EB_RGHT))->b[EB_LEFT];
				591	return eb_walk_down(t, EB_RGHT);
				592	}
				593
				594	/* Return next leaf node after an existing leaf node, or NULL if none. */
				595	static inline struct eb_node *eb_next(struct eb_node *node)
				596	{
				597	eb_troot_t *t = node->leaf_p;
				598
				599	while (eb_gettag(t) != EB_LEFT)
				600	/* Walking up from right branch, so we cannot be below root */
				601	t = (eb_root_to_node(eb_untag(t, EB_RGHT)))->node_p;
				602
				603	/* Note that <t> cannot be NULL at this stage */
				604	t = (eb_untag(t, EB_LEFT))->b[EB_RGHT];
Willy Tarreau	1fb6c87	2008-05-16 19:48:20 +0200	[diff] [blame]	605	if (eb_clrtag(t) == NULL)
				606	return NULL;
Willy Tarreau	e6d2e4d	2007-11-15 23:56:17 +0100	[diff] [blame]	607	return eb_walk_down(t, EB_LEFT);
				608	}
				609
				610	/* Return previous leaf node before an existing leaf node, skipping duplicates,
				611	* or NULL if none. */
				612	static inline struct eb_node *eb_prev_unique(struct eb_node *node)
				613	{
				614	eb_troot_t *t = node->leaf_p;
				615
				616	while (1) {
				617	if (eb_gettag(t) != EB_LEFT) {
				618	node = eb_root_to_node(eb_untag(t, EB_RGHT));
				619	/* if we're right and not in duplicates, stop here */
				620	if (node->bit >= 0)
				621	break;
				622	t = node->node_p;
				623	}
				624	else {
				625	/* Walking up from left branch. We must ensure that we never
				626	* walk beyond root.
				627	*/
Willy Tarreau	1fb6c87	2008-05-16 19:48:20 +0200	[diff] [blame]	628	if (unlikely(eb_clrtag((eb_untag(t, EB_LEFT))->b[EB_RGHT]) == NULL))
Willy Tarreau	e6d2e4d	2007-11-15 23:56:17 +0100	[diff] [blame]	629	return NULL;
				630	t = (eb_root_to_node(eb_untag(t, EB_LEFT)))->node_p;
				631	}
				632	}
				633	/* Note that <t> cannot be NULL at this stage */
				634	t = (eb_untag(t, EB_RGHT))->b[EB_LEFT];
				635	return eb_walk_down(t, EB_RGHT);
				636	}
				637
				638	/* Return next leaf node after an existing leaf node, skipping duplicates, or
				639	* NULL if none.
				640	*/
				641	static inline struct eb_node *eb_next_unique(struct eb_node *node)
				642	{
				643	eb_troot_t *t = node->leaf_p;
				644
				645	while (1) {
				646	if (eb_gettag(t) == EB_LEFT) {
Willy Tarreau	1fb6c87	2008-05-16 19:48:20 +0200	[diff] [blame]	647	if (unlikely(eb_clrtag((eb_untag(t, EB_LEFT))->b[EB_RGHT]) == NULL))
Willy Tarreau	e6d2e4d	2007-11-15 23:56:17 +0100	[diff] [blame]	648	return NULL; /* we reached root */
				649	node = eb_root_to_node(eb_untag(t, EB_LEFT));
				650	/* if we're left and not in duplicates, stop here */
				651	if (node->bit >= 0)
				652	break;
				653	t = node->node_p;
				654	}
				655	else {
				656	/* Walking up from right branch, so we cannot be below root */
				657	t = (eb_root_to_node(eb_untag(t, EB_RGHT)))->node_p;
				658	}
				659	}
				660
				661	/* Note that <t> cannot be NULL at this stage */
				662	t = (eb_untag(t, EB_LEFT))->b[EB_RGHT];
Willy Tarreau	1fb6c87	2008-05-16 19:48:20 +0200	[diff] [blame]	663	if (eb_clrtag(t) == NULL)
				664	return NULL;
Willy Tarreau	e6d2e4d	2007-11-15 23:56:17 +0100	[diff] [blame]	665	return eb_walk_down(t, EB_LEFT);
				666	}
				667
				668
				669	/* Removes a leaf node from the tree if it was still in it. Marks the node
				670	* as unlinked.
				671	*/
Willy Tarreau	75cf17e	2008-08-29 15:48:49 +0200	[diff] [blame]	672	static forceinline void __eb_delete(struct eb_node *node)
Willy Tarreau	e6d2e4d	2007-11-15 23:56:17 +0100	[diff] [blame]	673	{
				674	__label__ delete_unlink;
				675	unsigned int pside, gpside, sibtype;
				676	struct eb_node *parent;
				677	struct eb_root *gparent;
				678
				679	if (!node->leaf_p)
				680	return;
				681
				682	/* we need the parent, our side, and the grand parent */
				683	pside = eb_gettag(node->leaf_p);
				684	parent = eb_root_to_node(eb_untag(node->leaf_p, pside));
				685
				686	/* We likely have to release the parent link, unless it's the root,
				687	* in which case we only set our branch to NULL. Note that we can
				688	* only be attached to the root by its left branch.
				689	*/
				690
Willy Tarreau	1fb6c87	2008-05-16 19:48:20 +0200	[diff] [blame]	691	if (eb_clrtag(parent->branches.b[EB_RGHT]) == NULL) {
Willy Tarreau	e6d2e4d	2007-11-15 23:56:17 +0100	[diff] [blame]	692	/* we're just below the root, it's trivial. */
				693	parent->branches.b[EB_LEFT] = NULL;
				694	goto delete_unlink;
				695	}
				696
				697	/* To release our parent, we have to identify our sibling, and reparent
				698	* it directly to/from the grand parent. Note that the sibling can
				699	* either be a link or a leaf.
				700	*/
				701
				702	gpside = eb_gettag(parent->node_p);
				703	gparent = eb_untag(parent->node_p, gpside);
				704
				705	gparent->b[gpside] = parent->branches.b[!pside];
				706	sibtype = eb_gettag(gparent->b[gpside]);
				707
				708	if (sibtype == EB_LEAF) {
				709	eb_root_to_node(eb_untag(gparent->b[gpside], EB_LEAF))->leaf_p =
				710	eb_dotag(gparent, gpside);
				711	} else {
				712	eb_root_to_node(eb_untag(gparent->b[gpside], EB_NODE))->node_p =
				713	eb_dotag(gparent, gpside);
				714	}
				715	/* Mark the parent unused. Note that we do not check if the parent is
				716	* our own node, but that's not a problem because if it is, it will be
				717	* marked unused at the same time, which we'll use below to know we can
				718	* safely remove it.
				719	*/
				720	parent->node_p = NULL;
				721
				722	/* The parent node has been detached, and is currently unused. It may
				723	* belong to another node, so we cannot remove it that way. Also, our
				724	* own node part might still be used. so we can use this spare node
				725	* to replace ours if needed.
				726	*/
				727
				728	/* If our link part is unused, we can safely exit now */
				729	if (!node->node_p)
				730	goto delete_unlink;
				731
				732	/* From now on, <node> and <parent> are necessarily different, and the
				733	* <node>'s node part is in use. By definition, <parent> is at least
				734	* below <node>, so keeping its key for the bit string is OK.
				735	*/
				736
				737	parent->node_p = node->node_p;
				738	parent->branches = node->branches;
				739	parent->bit = node->bit;
				740
				741	/* We must now update the new node's parent... */
				742	gpside = eb_gettag(parent->node_p);
				743	gparent = eb_untag(parent->node_p, gpside);
				744	gparent->b[gpside] = eb_dotag(&parent->branches, EB_NODE);
				745
				746	/* ... and its branches */
				747	for (pside = 0; pside <= 1; pside++) {
				748	if (eb_gettag(parent->branches.b[pside]) == EB_NODE) {
				749	eb_root_to_node(eb_untag(parent->branches.b[pside], EB_NODE))->node_p =
				750	eb_dotag(&parent->branches, pside);
				751	} else {
				752	eb_root_to_node(eb_untag(parent->branches.b[pside], EB_LEAF))->leaf_p =
				753	eb_dotag(&parent->branches, pside);
				754	}
				755	}
				756	delete_unlink:
				757	/* Now the node has been completely unlinked */
				758	node->leaf_p = NULL;
				759	return; /* tree is not empty yet */
				760	}
				761
				762	/* These functions are declared in ebtree.c */
				763	void eb_delete(struct eb_node *node);
				764	REGPRM1 struct eb_node *eb_insert_dup(struct eb_node *sub, struct eb_node *new);
				765
Willy Tarreau	f56fd8a	2007-11-19 18:43:04 +0100	[diff] [blame]	766	#endif /* _COMMON_EBTREE_H */
Willy Tarreau	e6d2e4d	2007-11-15 23:56:17 +0100	[diff] [blame]	767
				768	/*
				769	* Local variables:
				770	* c-indent-level: 8
				771	* c-basic-offset: 8
				772	* End:
				773	*/