blob: 854666ab590490743a3a15e00e6ffc5970c7fba4 [file] [log] [blame]
Willy Tarreaue6d2e4d2007-11-15 23:56:17 +01001/*
2 * Elastic Binary Trees - generic macros and structures.
3 * (C) 2002-2007 - Willy Tarreau <w@1wt.eu>
4 *
5 * This program is free software; you can redistribute it and/or modify
6 * it under the terms of the GNU General Public License as published by
7 * the Free Software Foundation; either version 2 of the License, or
8 * (at your option) any later version.
9 *
10 * This program is distributed in the hope that it will be useful,
11 * but WITHOUT ANY WARRANTY; without even the implied warranty of
12 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
13 * GNU General Public License for more details.
14 *
15 * You should have received a copy of the GNU General Public License
16 * along with this program; if not, write to the Free Software
17 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
18 *
19 *
20 * Short history :
21 *
22 * 2007/09/28: full support for the duplicates tree => v3
23 * 2007/07/08: merge back cleanups from kernel version.
24 * 2007/07/01: merge into Linux Kernel (try 1).
25 * 2007/05/27: version 2: compact everything into one single struct
26 * 2007/05/18: adapted the structure to support embedded nodes
27 * 2007/05/13: adapted to mempools v2.
28 */
29
30
31
32/*
33 General idea:
34 -------------
35 In a radix binary tree, we may have up to 2N-1 nodes for N keys if all of
36 them are leaves. If we find a way to differentiate intermediate nodes (later
37 called "nodes") and final nodes (later called "leaves"), and we associate
38 them by two, it is possible to build sort of a self-contained radix tree with
39 intermediate nodes always present. It will not be as cheap as the ultree for
40 optimal cases as shown below, but the optimal case almost never happens :
41
42 Eg, to store 8, 10, 12, 13, 14 :
43
44 ultree this theorical tree
45
46 8 8
47 / \ / \
48 10 12 10 12
49 / \ / \
50 13 14 12 14
51 / \
52 12 13
53
54 Note that on real-world tests (with a scheduler), is was verified that the
55 case with data on an intermediate node never happens. This is because the
56 data spectrum is too large for such coincidences to happen. It would require
57 for instance that a task has its expiration time at an exact second, with
58 other tasks sharing that second. This is too rare to try to optimize for it.
59
60 What is interesting is that the node will only be added above the leaf when
61 necessary, which implies that it will always remain somewhere above it. So
62 both the leaf and the node can share the exact value of the leaf, because
63 when going down the node, the bit mask will be applied to comparisons. So we
64 are tempted to have one single key shared between the node and the leaf.
65
66 The bit only serves the nodes, and the dups only serve the leaves. So we can
67 put a lot of information in common. This results in one single entity with
68 two branch pointers and two parent pointers, one for the node part, and one
69 for the leaf part :
70
71 node's leaf's
72 parent parent
73 | |
74 [node] [leaf]
75 / \
76 left right
77 branch branch
78
79 The node may very well refer to its leaf counterpart in one of its branches,
80 indicating that its own leaf is just below it :
81
82 node's
83 parent
84 |
85 [node]
86 / \
87 left [leaf]
88 branch
89
90 Adding keys in such a tree simply consists in inserting nodes between
91 other nodes and/or leaves :
92
93 [root]
94 |
95 [node2]
96 / \
97 [leaf1] [node3]
98 / \
99 [leaf2] [leaf3]
100
101 On this diagram, we notice that [node2] and [leaf2] have been pulled away
102 from each other due to the insertion of [node3], just as if there would be
103 an elastic between both parts. This elastic-like behaviour gave its name to
104 the tree : "Elastic Binary Tree", or "EBtree". The entity which associates a
105 node part and a leaf part will be called an "EB node".
106
107 We also notice on the diagram that there is a root entity required to attach
108 the tree. It only contains two branches and there is nothing above it. This
109 is an "EB root". Some will note that [leaf1] has no [node1]. One property of
110 the EBtree is that all nodes have their branches filled, and that if a node
111 has only one branch, it does not need to exist. Here, [leaf1] was added
112 below [root] and did not need any node.
113
114 An EB node contains :
115 - a pointer to the node's parent (node_p)
116 - a pointer to the leaf's parent (leaf_p)
117 - two branches pointing to lower nodes or leaves (branches)
118 - a bit position (bit)
119 - an optional key.
120
121 The key here is optional because it's used only during insertion, in order
122 to classify the nodes. Nothing else in the tree structure requires knowledge
123 of the key. This makes it possible to write type-agnostic primitives for
124 everything, and type-specific insertion primitives. This has led to consider
125 two types of EB nodes. The type-agnostic ones will serve as a header for the
126 other ones, and will simply be called "struct eb_node". The other ones will
127 have their type indicated in the structure name. Eg: "struct eb32_node" for
128 nodes carrying 32 bit keys.
129
130 We will also node that the two branches in a node serve exactly the same
131 purpose as an EB root. For this reason, a "struct eb_root" will be used as
132 well inside the struct eb_node. In order to ease pointer manipulation and
133 ROOT detection when walking upwards, all the pointers inside an eb_node will
134 point to the eb_root part of the referenced EB nodes, relying on the same
135 principle as the linked lists in Linux.
136
137 Another important point to note, is that when walking inside a tree, it is
138 very convenient to know where a node is attached in its parent, and what
139 type of branch it has below it (leaf or node). In order to simplify the
140 operations and to speed up the processing, it was decided in this specific
141 implementation to use the lowest bit from the pointer to designate the side
142 of the upper pointers (left/right) and the type of a branch (leaf/node).
143 This practise is not mandatory by design, but an implementation-specific
144 optimisation permitted on all platforms on which data must be aligned. All
145 known 32 bit platforms align their integers and pointers to 32 bits, leaving
146 the two lower bits unused. So, we say that the pointers are "tagged". And
147 since they designate pointers to root parts, we simply call them
148 "tagged root pointers", or "eb_troot" in the code.
149
150 Duplicate keys are stored in a special manner. When inserting a key, if
151 the same one is found, then an incremental binary tree is built at this
152 place from these keys. This ensures that no special case has to be written
153 to handle duplicates when walking through the tree or when deleting entries.
154 It also guarantees that duplicates will be walked in the exact same order
155 they were inserted. This is very important when trying to achieve fair
156 processing distribution for instance.
157
158 Algorithmic complexity can be derived from 3 variables :
159 - the number of possible different keys in the tree : P
160 - the number of entries in the tree : N
161 - the number of duplicates for one key : D
162
163 Note that this tree is deliberately NOT balanced. For this reason, the worst
164 case may happen with a small tree (eg: 32 distinct keys of one bit). BUT,
165 the operations required to manage such data are so much cheap that they make
166 it worth using it even under such conditions. For instance, a balanced tree
167 may require only 6 levels to store those 32 keys when this tree will
168 require 32. But if per-level operations are 5 times cheaper, it wins.
169
170 Minimal, Maximal and Average times are specified in number of operations.
171 Minimal is given for best condition, Maximal for worst condition, and the
172 average is reported for a tree containing random keys. An operation
173 generally consists in jumping from one node to the other.
174
175 Complexity :
176 - lookup : min=1, max=log(P), avg=log(N)
177 - insertion from root : min=1, max=log(P), avg=log(N)
178 - insertion of dups : min=1, max=log(D), avg=log(D)/2 after lookup
179 - deletion : min=1, max=1, avg=1
180 - prev/next : min=1, max=log(P), avg=2 :
181 N/2 nodes need 1 hop => 1*N/2
182 N/4 nodes need 2 hops => 2*N/4
183 N/8 nodes need 3 hops => 3*N/8
184 ...
185 N/x nodes need log(x) hops => log2(x)*N/x
186 Total cost for all N nodes : sum[i=1..N](log2(i)*N/i) = N*sum[i=1..N](log2(i)/i)
187 Average cost across N nodes = total / N = sum[i=1..N](log2(i)/i) = 2
188
189 This design is currently limited to only two branches per node. Most of the
190 tree descent algorithm would be compatible with more branches (eg: 4, to cut
191 the height in half), but this would probably require more complex operations
192 and the deletion algorithm would be problematic.
193
194 Useful properties :
195 - a node is always added above the leaf it is tied to, and never can get
196 below nor in another branch. This implies that leaves directly attached
197 to the root do not use their node part, which is indicated by a NULL
198 value in node_p. This also enhances the cache efficiency when walking
199 down the tree, because when the leaf is reached, its node part will
200 already have been visited (unless it's the first leaf in the tree).
201
202 - pointers to lower nodes or leaves are stored in "branch" pointers. Only
203 the root node may have a NULL in either branch, it is not possible for
204 other branches. Since the nodes are attached to the left branch of the
205 root, it is not possible to see a NULL left branch when walking up a
206 tree. Thus, an empty tree is immediately identified by a NULL left
207 branch at the root. Conversely, the one and only way to identify the
208 root node is to check that it right branch is NULL.
209
210 - a node connected to its own leaf will have branch[0|1] pointing to
211 itself, and leaf_p pointing to itself.
212
213 - a node can never have node_p pointing to itself.
214
215 - a node is linked in a tree if and only if it has a non-null leaf_p.
216
217 - a node can never have both branches equal, except for the root which can
218 have them both NULL.
219
220 - deletion only applies to leaves. When a leaf is deleted, its parent must
221 be released too (unless it's the root), and its sibling must attach to
222 the grand-parent, replacing the parent. Also, when a leaf is deleted,
223 the node tied to this leaf will be removed and must be released too. If
224 this node is different from the leaf's parent, the freshly released
225 leaf's parent will be used to replace the node which must go. A released
226 node will never be used anymore, so there's no point in tracking it.
227
228 - the bit index in a node indicates the bit position in the key which is
229 represented by the branches. That means that a node with (bit == 0) is
230 just above two leaves. Negative bit values are used to build a duplicate
231 tree. The first node above two identical leaves gets (bit == -1). This
232 value logarithmically decreases as the duplicate tree grows. During
233 duplicate insertion, a node is inserted above the highest bit value (the
234 lowest absolute value) in the tree during the right-sided walk. If bit
235 -1 is not encountered (highest < -1), we insert above last leaf.
236 Otherwise, we insert above the node with the highest value which was not
237 equal to the one of its parent + 1.
238
239 - the "eb_next" primitive walks from left to right, which means from lower
240 to higher keys. It returns duplicates in the order they were inserted.
241 The "eb_first" primitive returns the left-most entry.
242
243 - the "eb_prev" primitive walks from right to left, which means from
244 higher to lower keys. It returns duplicates in the opposite order they
245 were inserted. The "eb_last" primitive returns the right-most entry.
246
247 */
248
Willy Tarreauf56fd8a2007-11-19 18:43:04 +0100249#ifndef _COMMON_EBTREE_H
250#define _COMMON_EBTREE_H
Willy Tarreaue6d2e4d2007-11-15 23:56:17 +0100251
252#include <stdlib.h>
Willy Tarreaube68e6a2007-11-21 10:34:15 +0100253#include <common/config.h>
Willy Tarreaue6d2e4d2007-11-15 23:56:17 +0100254
255/* Note: we never need to run fls on null keys, so we can optimize the fls
256 * function by removing a conditional jump.
257 */
258#if defined(__i386__)
259static inline int flsnz(int x)
260{
261 int r;
262 __asm__("bsrl %1,%0\n"
263 : "=r" (r) : "rm" (x));
264 return r+1;
265}
266#else
267// returns 1 to 32 for 1<<0 to 1<<31. Undefined for 0.
268#define flsnz(___a) ({ \
269 register int ___x, ___bits = 0; \
270 ___x = (___a); \
271 if (___x & 0xffff0000) { ___x &= 0xffff0000; ___bits += 16;} \
272 if (___x & 0xff00ff00) { ___x &= 0xff00ff00; ___bits += 8;} \
273 if (___x & 0xf0f0f0f0) { ___x &= 0xf0f0f0f0; ___bits += 4;} \
274 if (___x & 0xcccccccc) { ___x &= 0xcccccccc; ___bits += 2;} \
275 if (___x & 0xaaaaaaaa) { ___x &= 0xaaaaaaaa; ___bits += 1;} \
276 ___bits + 1; \
277 })
278#endif
279
280static inline int fls64(unsigned long long x)
281{
282 unsigned int h;
283 unsigned int bits = 32;
284
285 h = x >> 32;
286 if (!h) {
287 h = x;
288 bits = 0;
289 }
290 return flsnz(h) + bits;
291}
292
293#define fls_auto(x) ((sizeof(x) > 4) ? fls64(x) : flsnz(x))
294
295/* Linux-like "container_of". It returns a pointer to the structure of type
296 * <type> which has its member <name> stored at address <ptr>.
297 */
298#ifndef container_of
299#define container_of(ptr, type, name) ((type *)(((void *)(ptr)) - ((long)&((type *)0)->name)))
300#endif
301
302/*
303 * Gcc >= 3 provides the ability for the program to give hints to the compiler
304 * about what branch of an if is most likely to be taken. This helps the
305 * compiler produce the most compact critical paths, which is generally better
306 * for the cache and to reduce the number of jumps. Be very careful not to use
307 * this in inline functions, because the code reordering it causes very often
308 * has a negative impact on the calling functions.
309 */
310#if __GNUC__ < 3 && !defined(__builtin_expect)
311#define __builtin_expect(x,y) (x)
312#endif
313
314#ifndef likely
315#define likely(x) (__builtin_expect((x) != 0, 1))
316#define unlikely(x) (__builtin_expect((x) != 0, 0))
317#endif
318
319/* Support passing function parameters in registers. For this, the
320 * CONFIG_EBTREE_REGPARM macro has to be set to the maximal number of registers
321 * allowed. Some functions have intentionally received a regparm lower than
322 * their parameter count, it is in order to avoid register clobbering where
323 * they are called.
324 */
325#ifndef REGPRM1
326#if CONFIG_EBTREE_REGPARM >= 1
327#define REGPRM1 __attribute__((regparm(1)))
328#else
329#define REGPRM1
330#endif
331#endif
332
333#ifndef REGPRM2
334#if CONFIG_EBTREE_REGPARM >= 2
335#define REGPRM2 __attribute__((regparm(2)))
336#else
337#define REGPRM2 REGPRM1
338#endif
339#endif
340
341#ifndef REGPRM3
342#if CONFIG_EBTREE_REGPARM >= 3
343#define REGPRM3 __attribute__((regparm(3)))
344#else
345#define REGPRM3 REGPRM2
346#endif
347#endif
348
349/* Number of bits per node, and number of leaves per node */
350#define EB_NODE_BITS 1
351#define EB_NODE_BRANCHES (1 << EB_NODE_BITS)
352#define EB_NODE_BRANCH_MASK (EB_NODE_BRANCHES - 1)
353
354/* Be careful not to tweak those values. The walking code is optimized for NULL
355 * detection on the assumption that the following values are intact.
356 */
357#define EB_LEFT 0
358#define EB_RGHT 1
359#define EB_LEAF 0
360#define EB_NODE 1
361
362/* This is the same as an eb_node pointer, except that the lower bit embeds
363 * a tag. See eb_dotag()/eb_untag()/eb_gettag(). This tag has two meanings :
364 * - 0=left, 1=right to designate the parent's branch for leaf_p/node_p
365 * - 0=link, 1=leaf to designate the branch's type for branch[]
366 */
367typedef void eb_troot_t;
368
369/* The eb_root connects the node which contains it, to two nodes below it, one
370 * of which may be the same node. At the top of the tree, we use an eb_root
371 * too, which always has its right branch NULL.
372 */
373struct eb_root {
374 eb_troot_t *b[EB_NODE_BRANCHES]; /* left and right branches */
375};
376
377/* The eb_node contains the two parts, one for the leaf, which always exists,
378 * and one for the node, which remains unused in the very first node inserted
379 * into the tree. This structure is 20 bytes per node on 32-bit machines. Do
380 * not change the order, benchmarks have shown that it's optimal this way.
381 */
382struct eb_node {
383 struct eb_root branches; /* branches, must be at the beginning */
384 eb_troot_t *node_p; /* link node's parent */
385 eb_troot_t *leaf_p; /* leaf node's parent */
386 int bit; /* link's bit position. */
387};
388
389/* Return the structure of type <type> whose member <member> points to <ptr> */
390#define eb_entry(ptr, type, member) container_of(ptr, type, member)
391
392/* The root of a tree is an eb_root initialized with both pointers NULL.
393 * During its life, only the left pointer will change. The right one will
394 * always remain NULL, which is the way we detect it.
395 */
396#define EB_ROOT \
397 (struct eb_root) { \
398 .b = {[0] = NULL, [1] = NULL }, \
399 }
400
401#define EB_TREE_HEAD(name) \
402 struct eb_root name = EB_ROOT
403
404
405/***************************************\
406 * Private functions. Not for end-user *
407\***************************************/
408
409/* Converts a root pointer to its equivalent eb_troot_t pointer,
410 * ready to be stored in ->branch[], leaf_p or node_p. NULL is not
411 * conserved. To be used with EB_LEAF, EB_NODE, EB_LEFT or EB_RGHT in <tag>.
412 */
413static inline eb_troot_t *eb_dotag(const struct eb_root *root, const int tag)
414{
415 return (eb_troot_t *)((void *)root + tag);
416}
417
418/* Converts an eb_troot_t pointer pointer to its equivalent eb_root pointer,
419 * for use with pointers from ->branch[], leaf_p or node_p. NULL is conserved
420 * as long as the tree is not corrupted. To be used with EB_LEAF, EB_NODE,
421 * EB_LEFT or EB_RGHT in <tag>.
422 */
423static inline struct eb_root *eb_untag(const eb_troot_t *troot, const int tag)
424{
425 return (struct eb_root *)((void *)troot - tag);
426}
427
428/* returns the tag associated with an eb_troot_t pointer */
429static inline int eb_gettag(eb_troot_t *troot)
430{
431 return (unsigned long)troot & 1;
432}
433
434/* Converts a root pointer to its equivalent eb_troot_t pointer and clears the
435 * tag, no matter what its value was.
436 */
437static inline struct eb_root *eb_clrtag(const eb_troot_t *troot)
438{
439 return (struct eb_root *)((unsigned long)troot & ~1UL);
440}
441
442/* Returns a pointer to the eb_node holding <root> */
443static inline struct eb_node *eb_root_to_node(struct eb_root *root)
444{
445 return container_of(root, struct eb_node, branches);
446}
447
448/* Walks down starting at root pointer <start>, and always walking on side
449 * <side>. It either returns the node hosting the first leaf on that side,
450 * or NULL if no leaf is found. <start> may either be NULL or a branch pointer.
451 * The pointer to the leaf (or NULL) is returned.
452 */
453static inline struct eb_node *eb_walk_down(eb_troot_t *start, unsigned int side)
454{
455 /* A NULL pointer on an empty tree root will be returned as-is */
456 while (eb_gettag(start) == EB_NODE)
457 start = (eb_untag(start, EB_NODE))->b[side];
458 /* NULL is left untouched (root==eb_node, EB_LEAF==0) */
459 return eb_root_to_node(eb_untag(start, EB_LEAF));
460}
461
462/* This function is used to build a tree of duplicates by adding a new node to
463 * a subtree of at least 2 entries. It will probably never be needed inlined,
464 * and it is not for end-user.
465 */
466static inline struct eb_node *
467__eb_insert_dup(struct eb_node *sub, struct eb_node *new)
468{
469 struct eb_node *head = sub;
470
471 struct eb_troot *new_left = eb_dotag(&new->branches, EB_LEFT);
472 struct eb_troot *new_rght = eb_dotag(&new->branches, EB_RGHT);
473 struct eb_troot *new_leaf = eb_dotag(&new->branches, EB_LEAF);
474
475 /* first, identify the deepest hole on the right branch */
476 while (eb_gettag(head->branches.b[EB_RGHT]) != EB_LEAF) {
477 struct eb_node *last = head;
478 head = container_of(eb_untag(head->branches.b[EB_RGHT], EB_NODE),
479 struct eb_node, branches);
480 if (head->bit > last->bit + 1)
481 sub = head; /* there's a hole here */
482 }
483
484 /* Here we have a leaf attached to (head)->b[EB_RGHT] */
485 if (head->bit < -1) {
486 /* A hole exists just before the leaf, we insert there */
487 new->bit = -1;
488 sub = container_of(eb_untag(head->branches.b[EB_RGHT], EB_LEAF),
489 struct eb_node, branches);
490 head->branches.b[EB_RGHT] = eb_dotag(&new->branches, EB_NODE);
491
492 new->node_p = sub->leaf_p;
493 new->leaf_p = new_rght;
494 sub->leaf_p = new_left;
495 new->branches.b[EB_LEFT] = eb_dotag(&sub->branches, EB_LEAF);
496 new->branches.b[EB_RGHT] = new_leaf;
497 return new;
498 } else {
499 int side;
500 /* No hole was found before a leaf. We have to insert above
501 * <sub>. Note that we cannot be certain that <sub> is attached
502 * to the right of its parent, as this is only true if <sub>
503 * is inside the dup tree, not at the head.
504 */
505 new->bit = sub->bit - 1; /* install at the lowest level */
506 side = eb_gettag(sub->node_p);
507 head = container_of(eb_untag(sub->node_p, side), struct eb_node, branches);
508 head->branches.b[side] = eb_dotag(&new->branches, EB_NODE);
509
510 new->node_p = sub->node_p;
511 new->leaf_p = new_rght;
512 sub->node_p = new_left;
513 new->branches.b[EB_LEFT] = eb_dotag(&sub->branches, EB_NODE);
514 new->branches.b[EB_RGHT] = new_leaf;
515 return new;
516 }
517}
518
519
520/**************************************\
521 * Public functions, for the end-user *
522\**************************************/
523
524/* Return the first leaf in the tree starting at <root>, or NULL if none */
525static inline struct eb_node *eb_first(struct eb_root *root)
526{
527 return eb_walk_down(root->b[0], EB_LEFT);
528}
529
530/* Return the last leaf in the tree starting at <root>, or NULL if none */
531static inline struct eb_node *eb_last(struct eb_root *root)
532{
533 return eb_walk_down(root->b[0], EB_RGHT);
534}
535
536/* Return previous leaf node before an existing leaf node, or NULL if none. */
537static inline struct eb_node *eb_prev(struct eb_node *node)
538{
539 eb_troot_t *t = node->leaf_p;
540
541 while (eb_gettag(t) == EB_LEFT) {
542 /* Walking up from left branch. We must ensure that we never
543 * walk beyond root.
544 */
545 if (unlikely((eb_untag(t, EB_LEFT))->b[EB_RGHT] == NULL))
546 return NULL;
547 t = (eb_root_to_node(eb_untag(t, EB_LEFT)))->node_p;
548 }
549 /* Note that <t> cannot be NULL at this stage */
550 t = (eb_untag(t, EB_RGHT))->b[EB_LEFT];
551 return eb_walk_down(t, EB_RGHT);
552}
553
554/* Return next leaf node after an existing leaf node, or NULL if none. */
555static inline struct eb_node *eb_next(struct eb_node *node)
556{
557 eb_troot_t *t = node->leaf_p;
558
559 while (eb_gettag(t) != EB_LEFT)
560 /* Walking up from right branch, so we cannot be below root */
561 t = (eb_root_to_node(eb_untag(t, EB_RGHT)))->node_p;
562
563 /* Note that <t> cannot be NULL at this stage */
564 t = (eb_untag(t, EB_LEFT))->b[EB_RGHT];
565 return eb_walk_down(t, EB_LEFT);
566}
567
568/* Return previous leaf node before an existing leaf node, skipping duplicates,
569 * or NULL if none. */
570static inline struct eb_node *eb_prev_unique(struct eb_node *node)
571{
572 eb_troot_t *t = node->leaf_p;
573
574 while (1) {
575 if (eb_gettag(t) != EB_LEFT) {
576 node = eb_root_to_node(eb_untag(t, EB_RGHT));
577 /* if we're right and not in duplicates, stop here */
578 if (node->bit >= 0)
579 break;
580 t = node->node_p;
581 }
582 else {
583 /* Walking up from left branch. We must ensure that we never
584 * walk beyond root.
585 */
586 if (unlikely((eb_untag(t, EB_LEFT))->b[EB_RGHT] == NULL))
587 return NULL;
588 t = (eb_root_to_node(eb_untag(t, EB_LEFT)))->node_p;
589 }
590 }
591 /* Note that <t> cannot be NULL at this stage */
592 t = (eb_untag(t, EB_RGHT))->b[EB_LEFT];
593 return eb_walk_down(t, EB_RGHT);
594}
595
596/* Return next leaf node after an existing leaf node, skipping duplicates, or
597 * NULL if none.
598 */
599static inline struct eb_node *eb_next_unique(struct eb_node *node)
600{
601 eb_troot_t *t = node->leaf_p;
602
603 while (1) {
604 if (eb_gettag(t) == EB_LEFT) {
605 if (unlikely((eb_untag(t, EB_LEFT))->b[EB_RGHT] == NULL))
606 return NULL; /* we reached root */
607 node = eb_root_to_node(eb_untag(t, EB_LEFT));
608 /* if we're left and not in duplicates, stop here */
609 if (node->bit >= 0)
610 break;
611 t = node->node_p;
612 }
613 else {
614 /* Walking up from right branch, so we cannot be below root */
615 t = (eb_root_to_node(eb_untag(t, EB_RGHT)))->node_p;
616 }
617 }
618
619 /* Note that <t> cannot be NULL at this stage */
620 t = (eb_untag(t, EB_LEFT))->b[EB_RGHT];
621 return eb_walk_down(t, EB_LEFT);
622}
623
624
625/* Removes a leaf node from the tree if it was still in it. Marks the node
626 * as unlinked.
627 */
628static inline void __eb_delete(struct eb_node *node)
629{
630 __label__ delete_unlink;
631 unsigned int pside, gpside, sibtype;
632 struct eb_node *parent;
633 struct eb_root *gparent;
634
635 if (!node->leaf_p)
636 return;
637
638 /* we need the parent, our side, and the grand parent */
639 pside = eb_gettag(node->leaf_p);
640 parent = eb_root_to_node(eb_untag(node->leaf_p, pside));
641
642 /* We likely have to release the parent link, unless it's the root,
643 * in which case we only set our branch to NULL. Note that we can
644 * only be attached to the root by its left branch.
645 */
646
647 if (parent->branches.b[EB_RGHT] == NULL) {
648 /* we're just below the root, it's trivial. */
649 parent->branches.b[EB_LEFT] = NULL;
650 goto delete_unlink;
651 }
652
653 /* To release our parent, we have to identify our sibling, and reparent
654 * it directly to/from the grand parent. Note that the sibling can
655 * either be a link or a leaf.
656 */
657
658 gpside = eb_gettag(parent->node_p);
659 gparent = eb_untag(parent->node_p, gpside);
660
661 gparent->b[gpside] = parent->branches.b[!pside];
662 sibtype = eb_gettag(gparent->b[gpside]);
663
664 if (sibtype == EB_LEAF) {
665 eb_root_to_node(eb_untag(gparent->b[gpside], EB_LEAF))->leaf_p =
666 eb_dotag(gparent, gpside);
667 } else {
668 eb_root_to_node(eb_untag(gparent->b[gpside], EB_NODE))->node_p =
669 eb_dotag(gparent, gpside);
670 }
671 /* Mark the parent unused. Note that we do not check if the parent is
672 * our own node, but that's not a problem because if it is, it will be
673 * marked unused at the same time, which we'll use below to know we can
674 * safely remove it.
675 */
676 parent->node_p = NULL;
677
678 /* The parent node has been detached, and is currently unused. It may
679 * belong to another node, so we cannot remove it that way. Also, our
680 * own node part might still be used. so we can use this spare node
681 * to replace ours if needed.
682 */
683
684 /* If our link part is unused, we can safely exit now */
685 if (!node->node_p)
686 goto delete_unlink;
687
688 /* From now on, <node> and <parent> are necessarily different, and the
689 * <node>'s node part is in use. By definition, <parent> is at least
690 * below <node>, so keeping its key for the bit string is OK.
691 */
692
693 parent->node_p = node->node_p;
694 parent->branches = node->branches;
695 parent->bit = node->bit;
696
697 /* We must now update the new node's parent... */
698 gpside = eb_gettag(parent->node_p);
699 gparent = eb_untag(parent->node_p, gpside);
700 gparent->b[gpside] = eb_dotag(&parent->branches, EB_NODE);
701
702 /* ... and its branches */
703 for (pside = 0; pside <= 1; pside++) {
704 if (eb_gettag(parent->branches.b[pside]) == EB_NODE) {
705 eb_root_to_node(eb_untag(parent->branches.b[pside], EB_NODE))->node_p =
706 eb_dotag(&parent->branches, pside);
707 } else {
708 eb_root_to_node(eb_untag(parent->branches.b[pside], EB_LEAF))->leaf_p =
709 eb_dotag(&parent->branches, pside);
710 }
711 }
712 delete_unlink:
713 /* Now the node has been completely unlinked */
714 node->leaf_p = NULL;
715 return; /* tree is not empty yet */
716}
717
718/* These functions are declared in ebtree.c */
719void eb_delete(struct eb_node *node);
720REGPRM1 struct eb_node *eb_insert_dup(struct eb_node *sub, struct eb_node *new);
721
Willy Tarreauf56fd8a2007-11-19 18:43:04 +0100722#endif /* _COMMON_EBTREE_H */
Willy Tarreaue6d2e4d2007-11-15 23:56:17 +0100723
724/*
725 * Local variables:
726 * c-indent-level: 8
727 * c-basic-offset: 8
728 * End:
729 */