blob: 36fcef50773d4d083d159b418bab73bdca55d105 [file] [log] [blame]
Willy Tarreauc2186022009-10-26 19:48:54 +01001/*
2 * Elastic Binary Trees - generic macros and structures.
Willy Tarreauf3bfede2011-07-25 11:38:17 +02003 * Version 6.0.6
4 * (C) 2002-2011 - Willy Tarreau <w@1wt.eu>
Willy Tarreauc2186022009-10-26 19:48:54 +01005 *
Willy Tarreauf3bfede2011-07-25 11:38:17 +02006 * This library is free software; you can redistribute it and/or
7 * modify it under the terms of the GNU Lesser General Public
8 * License as published by the Free Software Foundation, version 2.1
9 * exclusively.
Willy Tarreauc2186022009-10-26 19:48:54 +010010 *
Willy Tarreauf3bfede2011-07-25 11:38:17 +020011 * This library is distributed in the hope that it will be useful,
Willy Tarreauc2186022009-10-26 19:48:54 +010012 * but WITHOUT ANY WARRANTY; without even the implied warranty of
Willy Tarreauf3bfede2011-07-25 11:38:17 +020013 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
14 * Lesser General Public License for more details.
Willy Tarreauc2186022009-10-26 19:48:54 +010015 *
Willy Tarreauf3bfede2011-07-25 11:38:17 +020016 * You should have received a copy of the GNU Lesser General Public
17 * License along with this library; if not, write to the Free Software
18 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Willy Tarreauc2186022009-10-26 19:48:54 +010019 */
20
21
22
23/*
24 General idea:
25 -------------
26 In a radix binary tree, we may have up to 2N-1 nodes for N keys if all of
27 them are leaves. If we find a way to differentiate intermediate nodes (later
28 called "nodes") and final nodes (later called "leaves"), and we associate
29 them by two, it is possible to build sort of a self-contained radix tree with
30 intermediate nodes always present. It will not be as cheap as the ultree for
31 optimal cases as shown below, but the optimal case almost never happens :
32
33 Eg, to store 8, 10, 12, 13, 14 :
34
35 ultree this theorical tree
36
37 8 8
38 / \ / \
39 10 12 10 12
40 / \ / \
41 13 14 12 14
42 / \
43 12 13
44
45 Note that on real-world tests (with a scheduler), is was verified that the
46 case with data on an intermediate node never happens. This is because the
47 data spectrum is too large for such coincidences to happen. It would require
48 for instance that a task has its expiration time at an exact second, with
49 other tasks sharing that second. This is too rare to try to optimize for it.
50
51 What is interesting is that the node will only be added above the leaf when
52 necessary, which implies that it will always remain somewhere above it. So
53 both the leaf and the node can share the exact value of the leaf, because
54 when going down the node, the bit mask will be applied to comparisons. So we
55 are tempted to have one single key shared between the node and the leaf.
56
57 The bit only serves the nodes, and the dups only serve the leaves. So we can
58 put a lot of information in common. This results in one single entity with
59 two branch pointers and two parent pointers, one for the node part, and one
60 for the leaf part :
61
62 node's leaf's
63 parent parent
64 | |
65 [node] [leaf]
66 / \
67 left right
68 branch branch
69
70 The node may very well refer to its leaf counterpart in one of its branches,
71 indicating that its own leaf is just below it :
72
73 node's
74 parent
75 |
76 [node]
77 / \
78 left [leaf]
79 branch
80
81 Adding keys in such a tree simply consists in inserting nodes between
82 other nodes and/or leaves :
83
84 [root]
85 |
86 [node2]
87 / \
88 [leaf1] [node3]
89 / \
90 [leaf2] [leaf3]
91
92 On this diagram, we notice that [node2] and [leaf2] have been pulled away
93 from each other due to the insertion of [node3], just as if there would be
94 an elastic between both parts. This elastic-like behaviour gave its name to
95 the tree : "Elastic Binary Tree", or "EBtree". The entity which associates a
96 node part and a leaf part will be called an "EB node".
97
98 We also notice on the diagram that there is a root entity required to attach
99 the tree. It only contains two branches and there is nothing above it. This
100 is an "EB root". Some will note that [leaf1] has no [node1]. One property of
101 the EBtree is that all nodes have their branches filled, and that if a node
102 has only one branch, it does not need to exist. Here, [leaf1] was added
103 below [root] and did not need any node.
104
105 An EB node contains :
106 - a pointer to the node's parent (node_p)
107 - a pointer to the leaf's parent (leaf_p)
108 - two branches pointing to lower nodes or leaves (branches)
109 - a bit position (bit)
110 - an optional key.
111
112 The key here is optional because it's used only during insertion, in order
113 to classify the nodes. Nothing else in the tree structure requires knowledge
114 of the key. This makes it possible to write type-agnostic primitives for
115 everything, and type-specific insertion primitives. This has led to consider
116 two types of EB nodes. The type-agnostic ones will serve as a header for the
117 other ones, and will simply be called "struct eb_node". The other ones will
118 have their type indicated in the structure name. Eg: "struct eb32_node" for
119 nodes carrying 32 bit keys.
120
121 We will also node that the two branches in a node serve exactly the same
122 purpose as an EB root. For this reason, a "struct eb_root" will be used as
123 well inside the struct eb_node. In order to ease pointer manipulation and
124 ROOT detection when walking upwards, all the pointers inside an eb_node will
125 point to the eb_root part of the referenced EB nodes, relying on the same
126 principle as the linked lists in Linux.
127
128 Another important point to note, is that when walking inside a tree, it is
129 very convenient to know where a node is attached in its parent, and what
130 type of branch it has below it (leaf or node). In order to simplify the
131 operations and to speed up the processing, it was decided in this specific
132 implementation to use the lowest bit from the pointer to designate the side
133 of the upper pointers (left/right) and the type of a branch (leaf/node).
134 This practise is not mandatory by design, but an implementation-specific
135 optimisation permitted on all platforms on which data must be aligned. All
136 known 32 bit platforms align their integers and pointers to 32 bits, leaving
137 the two lower bits unused. So, we say that the pointers are "tagged". And
138 since they designate pointers to root parts, we simply call them
139 "tagged root pointers", or "eb_troot" in the code.
140
141 Duplicate keys are stored in a special manner. When inserting a key, if
142 the same one is found, then an incremental binary tree is built at this
143 place from these keys. This ensures that no special case has to be written
144 to handle duplicates when walking through the tree or when deleting entries.
145 It also guarantees that duplicates will be walked in the exact same order
146 they were inserted. This is very important when trying to achieve fair
147 processing distribution for instance.
148
149 Algorithmic complexity can be derived from 3 variables :
150 - the number of possible different keys in the tree : P
151 - the number of entries in the tree : N
152 - the number of duplicates for one key : D
153
154 Note that this tree is deliberately NOT balanced. For this reason, the worst
155 case may happen with a small tree (eg: 32 distinct keys of one bit). BUT,
156 the operations required to manage such data are so much cheap that they make
157 it worth using it even under such conditions. For instance, a balanced tree
158 may require only 6 levels to store those 32 keys when this tree will
159 require 32. But if per-level operations are 5 times cheaper, it wins.
160
161 Minimal, Maximal and Average times are specified in number of operations.
162 Minimal is given for best condition, Maximal for worst condition, and the
163 average is reported for a tree containing random keys. An operation
164 generally consists in jumping from one node to the other.
165
166 Complexity :
167 - lookup : min=1, max=log(P), avg=log(N)
168 - insertion from root : min=1, max=log(P), avg=log(N)
169 - insertion of dups : min=1, max=log(D), avg=log(D)/2 after lookup
170 - deletion : min=1, max=1, avg=1
171 - prev/next : min=1, max=log(P), avg=2 :
172 N/2 nodes need 1 hop => 1*N/2
173 N/4 nodes need 2 hops => 2*N/4
174 N/8 nodes need 3 hops => 3*N/8
175 ...
176 N/x nodes need log(x) hops => log2(x)*N/x
177 Total cost for all N nodes : sum[i=1..N](log2(i)*N/i) = N*sum[i=1..N](log2(i)/i)
178 Average cost across N nodes = total / N = sum[i=1..N](log2(i)/i) = 2
179
180 This design is currently limited to only two branches per node. Most of the
181 tree descent algorithm would be compatible with more branches (eg: 4, to cut
182 the height in half), but this would probably require more complex operations
183 and the deletion algorithm would be problematic.
184
185 Useful properties :
186 - a node is always added above the leaf it is tied to, and never can get
187 below nor in another branch. This implies that leaves directly attached
188 to the root do not use their node part, which is indicated by a NULL
189 value in node_p. This also enhances the cache efficiency when walking
190 down the tree, because when the leaf is reached, its node part will
191 already have been visited (unless it's the first leaf in the tree).
192
193 - pointers to lower nodes or leaves are stored in "branch" pointers. Only
194 the root node may have a NULL in either branch, it is not possible for
195 other branches. Since the nodes are attached to the left branch of the
196 root, it is not possible to see a NULL left branch when walking up a
197 tree. Thus, an empty tree is immediately identified by a NULL left
198 branch at the root. Conversely, the one and only way to identify the
199 root node is to check that it right branch is NULL. Note that the
200 NULL pointer may have a few low-order bits set.
201
202 - a node connected to its own leaf will have branch[0|1] pointing to
203 itself, and leaf_p pointing to itself.
204
205 - a node can never have node_p pointing to itself.
206
207 - a node is linked in a tree if and only if it has a non-null leaf_p.
208
209 - a node can never have both branches equal, except for the root which can
210 have them both NULL.
211
212 - deletion only applies to leaves. When a leaf is deleted, its parent must
213 be released too (unless it's the root), and its sibling must attach to
214 the grand-parent, replacing the parent. Also, when a leaf is deleted,
215 the node tied to this leaf will be removed and must be released too. If
216 this node is different from the leaf's parent, the freshly released
217 leaf's parent will be used to replace the node which must go. A released
218 node will never be used anymore, so there's no point in tracking it.
219
220 - the bit index in a node indicates the bit position in the key which is
221 represented by the branches. That means that a node with (bit == 0) is
222 just above two leaves. Negative bit values are used to build a duplicate
223 tree. The first node above two identical leaves gets (bit == -1). This
224 value logarithmically decreases as the duplicate tree grows. During
225 duplicate insertion, a node is inserted above the highest bit value (the
226 lowest absolute value) in the tree during the right-sided walk. If bit
227 -1 is not encountered (highest < -1), we insert above last leaf.
228 Otherwise, we insert above the node with the highest value which was not
229 equal to the one of its parent + 1.
230
231 - the "eb_next" primitive walks from left to right, which means from lower
232 to higher keys. It returns duplicates in the order they were inserted.
233 The "eb_first" primitive returns the left-most entry.
234
235 - the "eb_prev" primitive walks from right to left, which means from
236 higher to lower keys. It returns duplicates in the opposite order they
237 were inserted. The "eb_last" primitive returns the right-most entry.
238
239 - a tree which has 1 in the lower bit of its root's right branch is a
240 tree with unique nodes. This means that when a node is inserted with
241 a key which already exists will not be inserted, and the previous
242 entry will be returned.
243
244 */
245
246#ifndef _EBTREE_H
247#define _EBTREE_H
248
249#include <stdlib.h>
Willy Tarreau4c7e4b72020-05-27 12:58:42 +0200250#include <haproxy/api.h>
Willy Tarreauc2186022009-10-26 19:48:54 +0100251
Willy Tarreau3a932442010-05-09 19:29:23 +0200252static inline int flsnz8_generic(unsigned int x)
253{
254 int ret = 0;
255 if (x >> 4) { x >>= 4; ret += 4; }
256 return ret + ((0xFFFFAA50U >> (x << 1)) & 3) + 1;
257}
258
Willy Tarreauc2186022009-10-26 19:48:54 +0100259/* Note: we never need to run fls on null keys, so we can optimize the fls
260 * function by removing a conditional jump.
261 */
Willy Tarreau3a932442010-05-09 19:29:23 +0200262#if defined(__i386__) || defined(__x86_64__)
263/* this code is similar on 32 and 64 bit */
Willy Tarreauc2186022009-10-26 19:48:54 +0100264static inline int flsnz(int x)
265{
266 int r;
267 __asm__("bsrl %1,%0\n"
268 : "=r" (r) : "rm" (x));
269 return r+1;
270}
Willy Tarreau3a932442010-05-09 19:29:23 +0200271
272static inline int flsnz8(unsigned char x)
273{
274 int r;
275 __asm__("movzbl %%al, %%eax\n"
276 "bsrl %%eax,%0\n"
277 : "=r" (r) : "a" (x));
278 return r+1;
279}
280
Willy Tarreauc2186022009-10-26 19:48:54 +0100281#else
282// returns 1 to 32 for 1<<0 to 1<<31. Undefined for 0.
283#define flsnz(___a) ({ \
284 register int ___x, ___bits = 0; \
285 ___x = (___a); \
286 if (___x & 0xffff0000) { ___x &= 0xffff0000; ___bits += 16;} \
287 if (___x & 0xff00ff00) { ___x &= 0xff00ff00; ___bits += 8;} \
288 if (___x & 0xf0f0f0f0) { ___x &= 0xf0f0f0f0; ___bits += 4;} \
289 if (___x & 0xcccccccc) { ___x &= 0xcccccccc; ___bits += 2;} \
290 if (___x & 0xaaaaaaaa) { ___x &= 0xaaaaaaaa; ___bits += 1;} \
291 ___bits + 1; \
292 })
Willy Tarreau3a932442010-05-09 19:29:23 +0200293
294static inline int flsnz8(unsigned int x)
295{
296 return flsnz8_generic(x);
297}
298
299
Willy Tarreauc2186022009-10-26 19:48:54 +0100300#endif
301
302static inline int fls64(unsigned long long x)
303{
304 unsigned int h;
305 unsigned int bits = 32;
306
307 h = x >> 32;
308 if (!h) {
309 h = x;
310 bits = 0;
311 }
312 return flsnz(h) + bits;
313}
314
315#define fls_auto(x) ((sizeof(x) > 4) ? fls64(x) : flsnz(x))
316
317/* Linux-like "container_of". It returns a pointer to the structure of type
318 * <type> which has its member <name> stored at address <ptr>.
319 */
320#ifndef container_of
321#define container_of(ptr, type, name) ((type *)(((void *)(ptr)) - ((long)&((type *)0)->name)))
322#endif
323
Willy Tarreau2b570202013-05-07 15:58:28 +0200324/* returns a pointer to the structure of type <type> which has its member <name>
325 * stored at address <ptr>, unless <ptr> is 0, in which case 0 is returned.
326 */
327#ifndef container_of_safe
328#define container_of_safe(ptr, type, name) \
329 ({ void *__p = (ptr); \
330 __p ? (type *)(__p - ((long)&((type *)0)->name)) : (type *)0; \
331 })
332#endif
333
Willy Tarreauc2186022009-10-26 19:48:54 +0100334/* Number of bits per node, and number of leaves per node */
335#define EB_NODE_BITS 1
336#define EB_NODE_BRANCHES (1 << EB_NODE_BITS)
337#define EB_NODE_BRANCH_MASK (EB_NODE_BRANCHES - 1)
338
339/* Be careful not to tweak those values. The walking code is optimized for NULL
340 * detection on the assumption that the following values are intact.
341 */
342#define EB_LEFT 0
343#define EB_RGHT 1
344#define EB_LEAF 0
345#define EB_NODE 1
346
347/* Tags to set in root->b[EB_RGHT] :
348 * - EB_NORMAL is a normal tree which stores duplicate keys.
349 * - EB_UNIQUE is a tree which stores unique keys.
350 */
351#define EB_NORMAL 0
352#define EB_UNIQUE 1
353
354/* This is the same as an eb_node pointer, except that the lower bit embeds
355 * a tag. See eb_dotag()/eb_untag()/eb_gettag(). This tag has two meanings :
356 * - 0=left, 1=right to designate the parent's branch for leaf_p/node_p
357 * - 0=link, 1=leaf to designate the branch's type for branch[]
358 */
359typedef void eb_troot_t;
360
361/* The eb_root connects the node which contains it, to two nodes below it, one
362 * of which may be the same node. At the top of the tree, we use an eb_root
363 * too, which always has its right branch NULL (+/1 low-order bits).
364 */
365struct eb_root {
366 eb_troot_t *b[EB_NODE_BRANCHES]; /* left and right branches */
367};
368
369/* The eb_node contains the two parts, one for the leaf, which always exists,
370 * and one for the node, which remains unused in the very first node inserted
371 * into the tree. This structure is 20 bytes per node on 32-bit machines. Do
372 * not change the order, benchmarks have shown that it's optimal this way.
Willy Tarreau2c315ee2020-02-21 15:47:36 +0100373 * Note: be careful about this struct's alignment if it gets included into
374 * another struct and some atomic ops are expected on the keys or the node.
Willy Tarreauc2186022009-10-26 19:48:54 +0100375 */
376struct eb_node {
377 struct eb_root branches; /* branches, must be at the beginning */
378 eb_troot_t *node_p; /* link node's parent */
379 eb_troot_t *leaf_p; /* leaf node's parent */
Willy Tarreau3a932442010-05-09 19:29:23 +0200380 short int bit; /* link's bit position. */
Willy Tarreau22c0a932011-07-25 12:22:44 +0200381 short unsigned int pfx; /* data prefix length, always related to leaf */
Willy Tarreau41136de2020-02-22 15:55:33 +0100382} __attribute__((packed));
Willy Tarreauc2186022009-10-26 19:48:54 +0100383
384/* Return the structure of type <type> whose member <member> points to <ptr> */
385#define eb_entry(ptr, type, member) container_of(ptr, type, member)
386
387/* The root of a tree is an eb_root initialized with both pointers NULL.
388 * During its life, only the left pointer will change. The right one will
389 * always remain NULL, which is the way we detect it.
390 */
391#define EB_ROOT \
392 (struct eb_root) { \
393 .b = {[0] = NULL, [1] = NULL }, \
394 }
395
396#define EB_ROOT_UNIQUE \
397 (struct eb_root) { \
398 .b = {[0] = NULL, [1] = (void *)1 }, \
399 }
400
401#define EB_TREE_HEAD(name) \
402 struct eb_root name = EB_ROOT
403
404
405/***************************************\
406 * Private functions. Not for end-user *
407\***************************************/
408
409/* Converts a root pointer to its equivalent eb_troot_t pointer,
410 * ready to be stored in ->branch[], leaf_p or node_p. NULL is not
411 * conserved. To be used with EB_LEAF, EB_NODE, EB_LEFT or EB_RGHT in <tag>.
412 */
413static inline eb_troot_t *eb_dotag(const struct eb_root *root, const int tag)
414{
415 return (eb_troot_t *)((void *)root + tag);
416}
417
418/* Converts an eb_troot_t pointer pointer to its equivalent eb_root pointer,
419 * for use with pointers from ->branch[], leaf_p or node_p. NULL is conserved
420 * as long as the tree is not corrupted. To be used with EB_LEAF, EB_NODE,
421 * EB_LEFT or EB_RGHT in <tag>.
422 */
423static inline struct eb_root *eb_untag(const eb_troot_t *troot, const int tag)
424{
425 return (struct eb_root *)((void *)troot - tag);
426}
427
428/* returns the tag associated with an eb_troot_t pointer */
429static inline int eb_gettag(eb_troot_t *troot)
430{
431 return (unsigned long)troot & 1;
432}
433
434/* Converts a root pointer to its equivalent eb_troot_t pointer and clears the
435 * tag, no matter what its value was.
436 */
437static inline struct eb_root *eb_clrtag(const eb_troot_t *troot)
438{
439 return (struct eb_root *)((unsigned long)troot & ~1UL);
440}
441
442/* Returns a pointer to the eb_node holding <root> */
443static inline struct eb_node *eb_root_to_node(struct eb_root *root)
444{
445 return container_of(root, struct eb_node, branches);
446}
447
448/* Walks down starting at root pointer <start>, and always walking on side
449 * <side>. It either returns the node hosting the first leaf on that side,
450 * or NULL if no leaf is found. <start> may either be NULL or a branch pointer.
451 * The pointer to the leaf (or NULL) is returned.
452 */
453static inline struct eb_node *eb_walk_down(eb_troot_t *start, unsigned int side)
454{
455 /* A NULL pointer on an empty tree root will be returned as-is */
456 while (eb_gettag(start) == EB_NODE)
457 start = (eb_untag(start, EB_NODE))->b[side];
458 /* NULL is left untouched (root==eb_node, EB_LEAF==0) */
459 return eb_root_to_node(eb_untag(start, EB_LEAF));
460}
461
462/* This function is used to build a tree of duplicates by adding a new node to
463 * a subtree of at least 2 entries. It will probably never be needed inlined,
464 * and it is not for end-user.
465 */
466static forceinline struct eb_node *
467__eb_insert_dup(struct eb_node *sub, struct eb_node *new)
468{
469 struct eb_node *head = sub;
470
Willy Tarreau655c84a2011-09-19 20:36:45 +0200471 eb_troot_t *new_left = eb_dotag(&new->branches, EB_LEFT);
472 eb_troot_t *new_rght = eb_dotag(&new->branches, EB_RGHT);
473 eb_troot_t *new_leaf = eb_dotag(&new->branches, EB_LEAF);
Willy Tarreauc2186022009-10-26 19:48:54 +0100474
475 /* first, identify the deepest hole on the right branch */
476 while (eb_gettag(head->branches.b[EB_RGHT]) != EB_LEAF) {
477 struct eb_node *last = head;
478 head = container_of(eb_untag(head->branches.b[EB_RGHT], EB_NODE),
479 struct eb_node, branches);
480 if (head->bit > last->bit + 1)
481 sub = head; /* there's a hole here */
482 }
483
484 /* Here we have a leaf attached to (head)->b[EB_RGHT] */
485 if (head->bit < -1) {
486 /* A hole exists just before the leaf, we insert there */
487 new->bit = -1;
488 sub = container_of(eb_untag(head->branches.b[EB_RGHT], EB_LEAF),
489 struct eb_node, branches);
490 head->branches.b[EB_RGHT] = eb_dotag(&new->branches, EB_NODE);
491
492 new->node_p = sub->leaf_p;
493 new->leaf_p = new_rght;
494 sub->leaf_p = new_left;
495 new->branches.b[EB_LEFT] = eb_dotag(&sub->branches, EB_LEAF);
496 new->branches.b[EB_RGHT] = new_leaf;
497 return new;
498 } else {
499 int side;
500 /* No hole was found before a leaf. We have to insert above
501 * <sub>. Note that we cannot be certain that <sub> is attached
502 * to the right of its parent, as this is only true if <sub>
503 * is inside the dup tree, not at the head.
504 */
505 new->bit = sub->bit - 1; /* install at the lowest level */
506 side = eb_gettag(sub->node_p);
507 head = container_of(eb_untag(sub->node_p, side), struct eb_node, branches);
508 head->branches.b[side] = eb_dotag(&new->branches, EB_NODE);
509
510 new->node_p = sub->node_p;
511 new->leaf_p = new_rght;
512 sub->node_p = new_left;
513 new->branches.b[EB_LEFT] = eb_dotag(&sub->branches, EB_NODE);
514 new->branches.b[EB_RGHT] = new_leaf;
515 return new;
516 }
517}
518
519
520/**************************************\
521 * Public functions, for the end-user *
522\**************************************/
523
Willy Tarreaufdc10182010-05-16 21:13:24 +0200524/* Return non-zero if the tree is empty, otherwise zero */
Willy Tarreau43be3402019-10-02 15:21:58 +0200525static inline int eb_is_empty(const struct eb_root *root)
Willy Tarreaufdc10182010-05-16 21:13:24 +0200526{
527 return !root->b[EB_LEFT];
528}
529
Willy Tarreau2b570202013-05-07 15:58:28 +0200530/* Return non-zero if the node is a duplicate, otherwise zero */
Willy Tarreau43be3402019-10-02 15:21:58 +0200531static inline int eb_is_dup(const struct eb_node *node)
Willy Tarreau2b570202013-05-07 15:58:28 +0200532{
533 return node->bit < 0;
534}
535
Willy Tarreauc2186022009-10-26 19:48:54 +0100536/* Return the first leaf in the tree starting at <root>, or NULL if none */
537static inline struct eb_node *eb_first(struct eb_root *root)
538{
539 return eb_walk_down(root->b[0], EB_LEFT);
540}
541
542/* Return the last leaf in the tree starting at <root>, or NULL if none */
543static inline struct eb_node *eb_last(struct eb_root *root)
544{
545 return eb_walk_down(root->b[0], EB_RGHT);
546}
547
548/* Return previous leaf node before an existing leaf node, or NULL if none. */
549static inline struct eb_node *eb_prev(struct eb_node *node)
550{
551 eb_troot_t *t = node->leaf_p;
552
553 while (eb_gettag(t) == EB_LEFT) {
554 /* Walking up from left branch. We must ensure that we never
555 * walk beyond root.
556 */
557 if (unlikely(eb_clrtag((eb_untag(t, EB_LEFT))->b[EB_RGHT]) == NULL))
558 return NULL;
559 t = (eb_root_to_node(eb_untag(t, EB_LEFT)))->node_p;
560 }
561 /* Note that <t> cannot be NULL at this stage */
562 t = (eb_untag(t, EB_RGHT))->b[EB_LEFT];
563 return eb_walk_down(t, EB_RGHT);
564}
565
566/* Return next leaf node after an existing leaf node, or NULL if none. */
567static inline struct eb_node *eb_next(struct eb_node *node)
568{
569 eb_troot_t *t = node->leaf_p;
570
571 while (eb_gettag(t) != EB_LEFT)
572 /* Walking up from right branch, so we cannot be below root */
573 t = (eb_root_to_node(eb_untag(t, EB_RGHT)))->node_p;
574
575 /* Note that <t> cannot be NULL at this stage */
Willy Tarreau2b570202013-05-07 15:58:28 +0200576 t = (eb_untag(t, EB_LEFT))->b[EB_RGHT];
577 if (eb_clrtag(t) == NULL)
578 return NULL;
579 return eb_walk_down(t, EB_LEFT);
580}
581
582/* Return previous leaf node within a duplicate sub-tree, or NULL if none. */
583static inline struct eb_node *eb_prev_dup(struct eb_node *node)
584{
585 eb_troot_t *t = node->leaf_p;
586
587 while (eb_gettag(t) == EB_LEFT) {
588 /* Walking up from left branch. We must ensure that we never
589 * walk beyond root.
590 */
591 if (unlikely(eb_clrtag((eb_untag(t, EB_LEFT))->b[EB_RGHT]) == NULL))
592 return NULL;
593 /* if the current node leaves a dup tree, quit */
594 if ((eb_root_to_node(eb_untag(t, EB_LEFT)))->bit >= 0)
595 return NULL;
596 t = (eb_root_to_node(eb_untag(t, EB_LEFT)))->node_p;
597 }
598 /* Note that <t> cannot be NULL at this stage */
599 if ((eb_root_to_node(eb_untag(t, EB_RGHT)))->bit >= 0)
600 return NULL;
601 t = (eb_untag(t, EB_RGHT))->b[EB_LEFT];
602 return eb_walk_down(t, EB_RGHT);
603}
604
605/* Return next leaf node within a duplicate sub-tree, or NULL if none. */
606static inline struct eb_node *eb_next_dup(struct eb_node *node)
607{
608 eb_troot_t *t = node->leaf_p;
609
610 while (eb_gettag(t) != EB_LEFT) {
611 /* Walking up from right branch, so we cannot be below root */
612 /* if the current node leaves a dup tree, quit */
613 if ((eb_root_to_node(eb_untag(t, EB_RGHT)))->bit >= 0)
614 return NULL;
615 t = (eb_root_to_node(eb_untag(t, EB_RGHT)))->node_p;
616 }
617
Remi Tricot-Le Breton0179e1d2021-05-18 18:56:42 +0200618 /* Note that <t> cannot be NULL at this stage. If our leaf is directly
619 * under the root, we must not try to cast the leaf_p into a eb_node*
620 * since it is a pointer to an eb_root.
621 */
622 if (eb_clrtag((eb_untag(t, EB_LEFT))->b[EB_RGHT]) == NULL)
623 return NULL;
624
Willy Tarreau2b570202013-05-07 15:58:28 +0200625 if ((eb_root_to_node(eb_untag(t, EB_LEFT)))->bit >= 0)
626 return NULL;
Willy Tarreauc2186022009-10-26 19:48:54 +0100627 t = (eb_untag(t, EB_LEFT))->b[EB_RGHT];
Willy Tarreauc2186022009-10-26 19:48:54 +0100628 return eb_walk_down(t, EB_LEFT);
629}
630
631/* Return previous leaf node before an existing leaf node, skipping duplicates,
632 * or NULL if none. */
633static inline struct eb_node *eb_prev_unique(struct eb_node *node)
634{
635 eb_troot_t *t = node->leaf_p;
636
637 while (1) {
638 if (eb_gettag(t) != EB_LEFT) {
639 node = eb_root_to_node(eb_untag(t, EB_RGHT));
640 /* if we're right and not in duplicates, stop here */
641 if (node->bit >= 0)
642 break;
643 t = node->node_p;
644 }
645 else {
646 /* Walking up from left branch. We must ensure that we never
647 * walk beyond root.
648 */
649 if (unlikely(eb_clrtag((eb_untag(t, EB_LEFT))->b[EB_RGHT]) == NULL))
650 return NULL;
651 t = (eb_root_to_node(eb_untag(t, EB_LEFT)))->node_p;
652 }
653 }
654 /* Note that <t> cannot be NULL at this stage */
655 t = (eb_untag(t, EB_RGHT))->b[EB_LEFT];
656 return eb_walk_down(t, EB_RGHT);
657}
658
659/* Return next leaf node after an existing leaf node, skipping duplicates, or
660 * NULL if none.
661 */
662static inline struct eb_node *eb_next_unique(struct eb_node *node)
663{
664 eb_troot_t *t = node->leaf_p;
665
666 while (1) {
667 if (eb_gettag(t) == EB_LEFT) {
668 if (unlikely(eb_clrtag((eb_untag(t, EB_LEFT))->b[EB_RGHT]) == NULL))
669 return NULL; /* we reached root */
670 node = eb_root_to_node(eb_untag(t, EB_LEFT));
671 /* if we're left and not in duplicates, stop here */
672 if (node->bit >= 0)
673 break;
674 t = node->node_p;
675 }
676 else {
677 /* Walking up from right branch, so we cannot be below root */
678 t = (eb_root_to_node(eb_untag(t, EB_RGHT)))->node_p;
679 }
680 }
681
682 /* Note that <t> cannot be NULL at this stage */
683 t = (eb_untag(t, EB_LEFT))->b[EB_RGHT];
684 if (eb_clrtag(t) == NULL)
685 return NULL;
686 return eb_walk_down(t, EB_LEFT);
687}
688
689
690/* Removes a leaf node from the tree if it was still in it. Marks the node
691 * as unlinked.
692 */
693static forceinline void __eb_delete(struct eb_node *node)
694{
695 __label__ delete_unlink;
696 unsigned int pside, gpside, sibtype;
697 struct eb_node *parent;
698 struct eb_root *gparent;
699
700 if (!node->leaf_p)
701 return;
702
703 /* we need the parent, our side, and the grand parent */
704 pside = eb_gettag(node->leaf_p);
705 parent = eb_root_to_node(eb_untag(node->leaf_p, pside));
706
707 /* We likely have to release the parent link, unless it's the root,
708 * in which case we only set our branch to NULL. Note that we can
709 * only be attached to the root by its left branch.
710 */
711
712 if (eb_clrtag(parent->branches.b[EB_RGHT]) == NULL) {
713 /* we're just below the root, it's trivial. */
714 parent->branches.b[EB_LEFT] = NULL;
715 goto delete_unlink;
716 }
717
718 /* To release our parent, we have to identify our sibling, and reparent
719 * it directly to/from the grand parent. Note that the sibling can
720 * either be a link or a leaf.
721 */
722
723 gpside = eb_gettag(parent->node_p);
724 gparent = eb_untag(parent->node_p, gpside);
725
726 gparent->b[gpside] = parent->branches.b[!pside];
727 sibtype = eb_gettag(gparent->b[gpside]);
728
729 if (sibtype == EB_LEAF) {
730 eb_root_to_node(eb_untag(gparent->b[gpside], EB_LEAF))->leaf_p =
731 eb_dotag(gparent, gpside);
732 } else {
733 eb_root_to_node(eb_untag(gparent->b[gpside], EB_NODE))->node_p =
734 eb_dotag(gparent, gpside);
735 }
736 /* Mark the parent unused. Note that we do not check if the parent is
737 * our own node, but that's not a problem because if it is, it will be
738 * marked unused at the same time, which we'll use below to know we can
739 * safely remove it.
740 */
741 parent->node_p = NULL;
742
743 /* The parent node has been detached, and is currently unused. It may
744 * belong to another node, so we cannot remove it that way. Also, our
745 * own node part might still be used. so we can use this spare node
746 * to replace ours if needed.
747 */
748
749 /* If our link part is unused, we can safely exit now */
750 if (!node->node_p)
751 goto delete_unlink;
752
753 /* From now on, <node> and <parent> are necessarily different, and the
754 * <node>'s node part is in use. By definition, <parent> is at least
755 * below <node>, so keeping its key for the bit string is OK.
756 */
757
758 parent->node_p = node->node_p;
759 parent->branches = node->branches;
760 parent->bit = node->bit;
761
762 /* We must now update the new node's parent... */
763 gpside = eb_gettag(parent->node_p);
764 gparent = eb_untag(parent->node_p, gpside);
765 gparent->b[gpside] = eb_dotag(&parent->branches, EB_NODE);
766
767 /* ... and its branches */
768 for (pside = 0; pside <= 1; pside++) {
769 if (eb_gettag(parent->branches.b[pside]) == EB_NODE) {
770 eb_root_to_node(eb_untag(parent->branches.b[pside], EB_NODE))->node_p =
771 eb_dotag(&parent->branches, pside);
772 } else {
773 eb_root_to_node(eb_untag(parent->branches.b[pside], EB_LEAF))->leaf_p =
774 eb_dotag(&parent->branches, pside);
775 }
776 }
777 delete_unlink:
778 /* Now the node has been completely unlinked */
779 node->leaf_p = NULL;
780 return; /* tree is not empty yet */
781}
782
783/* Compare blocks <a> and <b> byte-to-byte, from bit <ignore> to bit <len-1>.
784 * Return the number of equal bits between strings, assuming that the first
785 * <ignore> bits are already identical. It is possible to return slightly more
786 * than <len> bits if <len> does not stop on a byte boundary and we find exact
787 * bytes. Note that parts or all of <ignore> bits may be rechecked. It is only
788 * passed here as a hint to speed up the check.
789 */
Willy Tarreau3a932442010-05-09 19:29:23 +0200790static forceinline int equal_bits(const unsigned char *a,
791 const unsigned char *b,
792 int ignore, int len)
Willy Tarreauc2186022009-10-26 19:48:54 +0100793{
Willy Tarreau3a932442010-05-09 19:29:23 +0200794 for (ignore >>= 3, a += ignore, b += ignore, ignore <<= 3;
795 ignore < len; ) {
796 unsigned char c;
Willy Tarreauc2186022009-10-26 19:48:54 +0100797
Willy Tarreau3a932442010-05-09 19:29:23 +0200798 a++; b++;
799 ignore += 8;
800 c = b[-1] ^ a[-1];
801
802 if (c) {
803 /* OK now we know that old and new differ at byte <ptr> and that <c> holds
804 * the bit differences. We have to find what bit is differing and report
805 * it as the number of identical bits. Note that low bit numbers are
806 * assigned to high positions in the byte, as we compare them as strings.
807 */
808 ignore -= flsnz8(c);
809 break;
810 }
811 }
812 return ignore;
813}
Willy Tarreauc2186022009-10-26 19:48:54 +0100814
Willy Tarreau3a932442010-05-09 19:29:23 +0200815/* check that the two blocks <a> and <b> are equal on <len> bits. If it is known
816 * they already are on some bytes, this number of equal bytes to be skipped may
817 * be passed in <skip>. It returns 0 if they match, otherwise non-zero.
818 */
819static forceinline int check_bits(const unsigned char *a,
820 const unsigned char *b,
821 int skip,
822 int len)
823{
824 int bit, ret;
825
826 /* This uncommon construction gives the best performance on x86 because
827 * it makes heavy use multiple-index addressing and parallel instructions,
828 * and it prevents gcc from reordering the loop since it is already
829 * properly oriented. Tested to be fine with 2.95 to 4.2.
Willy Tarreauc2186022009-10-26 19:48:54 +0100830 */
Willy Tarreau3a932442010-05-09 19:29:23 +0200831 bit = ~len + (skip << 3) + 9; // = (skip << 3) + (8 - len)
832 ret = a[skip] ^ b[skip];
833 if (unlikely(bit >= 0))
834 return ret >> bit;
835 while (1) {
836 skip++;
837 if (ret)
838 return ret;
839 ret = a[skip] ^ b[skip];
840 bit += 8;
841 if (bit >= 0)
842 return ret >> bit;
843 }
Willy Tarreauc2186022009-10-26 19:48:54 +0100844}
845
Willy Tarreau3a932442010-05-09 19:29:23 +0200846
Willy Tarreauc2186022009-10-26 19:48:54 +0100847/* Compare strings <a> and <b> byte-to-byte, from bit <ignore> to the last 0.
848 * Return the number of equal bits between strings, assuming that the first
849 * <ignore> bits are already identical. Note that parts or all of <ignore> bits
850 * may be rechecked. It is only passed here as a hint to speed up the check.
851 * The caller is responsible for not passing an <ignore> value larger than any
852 * of the two strings. However, referencing any bit from the trailing zero is
Willy Tarreaub55fcf22010-10-28 22:48:29 +0200853 * permitted. Equal strings are reported as a negative number of bits, which
854 * indicates the end was reached.
Willy Tarreauc2186022009-10-26 19:48:54 +0100855 */
Willy Tarreau3a932442010-05-09 19:29:23 +0200856static forceinline int string_equal_bits(const unsigned char *a,
857 const unsigned char *b,
858 int ignore)
Willy Tarreauc2186022009-10-26 19:48:54 +0100859{
Willy Tarreau3a932442010-05-09 19:29:23 +0200860 int beg;
Willy Tarreauc2186022009-10-26 19:48:54 +0100861 unsigned char c;
862
863 beg = ignore >> 3;
864
865 /* skip known and identical bits. We stop at the first different byte
866 * or at the first zero we encounter on either side.
867 */
868 while (1) {
869 unsigned char d;
870
871 c = a[beg];
872 d = b[beg];
873 beg++;
874
875 c ^= d;
876 if (c)
877 break;
878 if (!d)
Willy Tarreaub55fcf22010-10-28 22:48:29 +0200879 return -1;
Willy Tarreauc2186022009-10-26 19:48:54 +0100880 }
Willy Tarreauc2186022009-10-26 19:48:54 +0100881 /* OK now we know that a and b differ at byte <beg>, or that both are zero.
882 * We have to find what bit is differing and report it as the number of
883 * identical bits. Note that low bit numbers are assigned to high positions
884 * in the byte, as we compare them as strings.
885 */
Willy Tarreau3a932442010-05-09 19:29:23 +0200886 return (beg << 3) - flsnz8(c);
Willy Tarreauc2186022009-10-26 19:48:54 +0100887}
888
889static forceinline int cmp_bits(const unsigned char *a, const unsigned char *b, unsigned int pos)
890{
891 unsigned int ofs;
892 unsigned char bit_a, bit_b;
893
894 ofs = pos >> 3;
895 pos = ~pos & 7;
896
897 bit_a = (a[ofs] >> pos) & 1;
898 bit_b = (b[ofs] >> pos) & 1;
899
900 return bit_a - bit_b; /* -1: a<b; 0: a=b; 1: a>b */
901}
902
903static forceinline int get_bit(const unsigned char *a, unsigned int pos)
904{
905 unsigned int ofs;
906
907 ofs = pos >> 3;
908 pos = ~pos & 7;
909 return (a[ofs] >> pos) & 1;
910}
911
912/* These functions are declared in ebtree.c */
913void eb_delete(struct eb_node *node);
Willy Tarreau03e78532020-02-25 07:38:05 +0100914struct eb_node *eb_insert_dup(struct eb_node *sub, struct eb_node *new);
Willy Tarreau853926a2020-06-16 11:10:53 +0200915int eb_memcmp(const void *m1, const void *m2, size_t len);
Willy Tarreauc2186022009-10-26 19:48:54 +0100916
917#endif /* _EB_TREE_H */
918
919/*
920 * Local variables:
921 * c-indent-level: 8
922 * c-basic-offset: 8
923 * End:
924 */