Willy Tarreau | e6d2e4d | 2007-11-15 23:56:17 +0100 | [diff] [blame] | 1 | /* |
| 2 | * Elastic Binary Trees - generic macros and structures. |
Willy Tarreau | 1fb6c87 | 2008-05-16 19:48:20 +0200 | [diff] [blame] | 3 | * Version 4.0 |
Willy Tarreau | 70bcfb7 | 2008-01-27 02:21:53 +0100 | [diff] [blame] | 4 | * (C) 2002-2008 - Willy Tarreau <w@1wt.eu> |
Willy Tarreau | e6d2e4d | 2007-11-15 23:56:17 +0100 | [diff] [blame] | 5 | * |
| 6 | * This program is free software; you can redistribute it and/or modify |
| 7 | * it under the terms of the GNU General Public License as published by |
| 8 | * the Free Software Foundation; either version 2 of the License, or |
| 9 | * (at your option) any later version. |
| 10 | * |
| 11 | * This program is distributed in the hope that it will be useful, |
| 12 | * but WITHOUT ANY WARRANTY; without even the implied warranty of |
| 13 | * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
| 14 | * GNU General Public License for more details. |
| 15 | * |
| 16 | * You should have received a copy of the GNU General Public License |
| 17 | * along with this program; if not, write to the Free Software |
| 18 | * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA |
| 19 | * |
| 20 | * |
| 21 | * Short history : |
| 22 | * |
| 23 | * 2007/09/28: full support for the duplicates tree => v3 |
| 24 | * 2007/07/08: merge back cleanups from kernel version. |
| 25 | * 2007/07/01: merge into Linux Kernel (try 1). |
| 26 | * 2007/05/27: version 2: compact everything into one single struct |
| 27 | * 2007/05/18: adapted the structure to support embedded nodes |
| 28 | * 2007/05/13: adapted to mempools v2. |
| 29 | */ |
| 30 | |
| 31 | |
| 32 | |
| 33 | /* |
| 34 | General idea: |
| 35 | ------------- |
| 36 | In a radix binary tree, we may have up to 2N-1 nodes for N keys if all of |
| 37 | them are leaves. If we find a way to differentiate intermediate nodes (later |
| 38 | called "nodes") and final nodes (later called "leaves"), and we associate |
| 39 | them by two, it is possible to build sort of a self-contained radix tree with |
| 40 | intermediate nodes always present. It will not be as cheap as the ultree for |
| 41 | optimal cases as shown below, but the optimal case almost never happens : |
| 42 | |
| 43 | Eg, to store 8, 10, 12, 13, 14 : |
| 44 | |
| 45 | ultree this theorical tree |
| 46 | |
| 47 | 8 8 |
| 48 | / \ / \ |
| 49 | 10 12 10 12 |
| 50 | / \ / \ |
| 51 | 13 14 12 14 |
| 52 | / \ |
| 53 | 12 13 |
| 54 | |
| 55 | Note that on real-world tests (with a scheduler), is was verified that the |
| 56 | case with data on an intermediate node never happens. This is because the |
| 57 | data spectrum is too large for such coincidences to happen. It would require |
| 58 | for instance that a task has its expiration time at an exact second, with |
| 59 | other tasks sharing that second. This is too rare to try to optimize for it. |
| 60 | |
| 61 | What is interesting is that the node will only be added above the leaf when |
| 62 | necessary, which implies that it will always remain somewhere above it. So |
| 63 | both the leaf and the node can share the exact value of the leaf, because |
| 64 | when going down the node, the bit mask will be applied to comparisons. So we |
| 65 | are tempted to have one single key shared between the node and the leaf. |
| 66 | |
| 67 | The bit only serves the nodes, and the dups only serve the leaves. So we can |
| 68 | put a lot of information in common. This results in one single entity with |
| 69 | two branch pointers and two parent pointers, one for the node part, and one |
| 70 | for the leaf part : |
| 71 | |
| 72 | node's leaf's |
| 73 | parent parent |
| 74 | | | |
| 75 | [node] [leaf] |
| 76 | / \ |
| 77 | left right |
| 78 | branch branch |
| 79 | |
| 80 | The node may very well refer to its leaf counterpart in one of its branches, |
| 81 | indicating that its own leaf is just below it : |
| 82 | |
| 83 | node's |
| 84 | parent |
| 85 | | |
| 86 | [node] |
| 87 | / \ |
| 88 | left [leaf] |
| 89 | branch |
| 90 | |
| 91 | Adding keys in such a tree simply consists in inserting nodes between |
| 92 | other nodes and/or leaves : |
| 93 | |
| 94 | [root] |
| 95 | | |
| 96 | [node2] |
| 97 | / \ |
| 98 | [leaf1] [node3] |
| 99 | / \ |
| 100 | [leaf2] [leaf3] |
| 101 | |
| 102 | On this diagram, we notice that [node2] and [leaf2] have been pulled away |
| 103 | from each other due to the insertion of [node3], just as if there would be |
| 104 | an elastic between both parts. This elastic-like behaviour gave its name to |
| 105 | the tree : "Elastic Binary Tree", or "EBtree". The entity which associates a |
| 106 | node part and a leaf part will be called an "EB node". |
| 107 | |
| 108 | We also notice on the diagram that there is a root entity required to attach |
| 109 | the tree. It only contains two branches and there is nothing above it. This |
| 110 | is an "EB root". Some will note that [leaf1] has no [node1]. One property of |
| 111 | the EBtree is that all nodes have their branches filled, and that if a node |
| 112 | has only one branch, it does not need to exist. Here, [leaf1] was added |
| 113 | below [root] and did not need any node. |
| 114 | |
| 115 | An EB node contains : |
| 116 | - a pointer to the node's parent (node_p) |
| 117 | - a pointer to the leaf's parent (leaf_p) |
| 118 | - two branches pointing to lower nodes or leaves (branches) |
| 119 | - a bit position (bit) |
| 120 | - an optional key. |
| 121 | |
| 122 | The key here is optional because it's used only during insertion, in order |
| 123 | to classify the nodes. Nothing else in the tree structure requires knowledge |
| 124 | of the key. This makes it possible to write type-agnostic primitives for |
| 125 | everything, and type-specific insertion primitives. This has led to consider |
| 126 | two types of EB nodes. The type-agnostic ones will serve as a header for the |
| 127 | other ones, and will simply be called "struct eb_node". The other ones will |
| 128 | have their type indicated in the structure name. Eg: "struct eb32_node" for |
| 129 | nodes carrying 32 bit keys. |
| 130 | |
| 131 | We will also node that the two branches in a node serve exactly the same |
| 132 | purpose as an EB root. For this reason, a "struct eb_root" will be used as |
| 133 | well inside the struct eb_node. In order to ease pointer manipulation and |
| 134 | ROOT detection when walking upwards, all the pointers inside an eb_node will |
| 135 | point to the eb_root part of the referenced EB nodes, relying on the same |
| 136 | principle as the linked lists in Linux. |
| 137 | |
| 138 | Another important point to note, is that when walking inside a tree, it is |
| 139 | very convenient to know where a node is attached in its parent, and what |
| 140 | type of branch it has below it (leaf or node). In order to simplify the |
| 141 | operations and to speed up the processing, it was decided in this specific |
| 142 | implementation to use the lowest bit from the pointer to designate the side |
| 143 | of the upper pointers (left/right) and the type of a branch (leaf/node). |
| 144 | This practise is not mandatory by design, but an implementation-specific |
| 145 | optimisation permitted on all platforms on which data must be aligned. All |
| 146 | known 32 bit platforms align their integers and pointers to 32 bits, leaving |
| 147 | the two lower bits unused. So, we say that the pointers are "tagged". And |
| 148 | since they designate pointers to root parts, we simply call them |
| 149 | "tagged root pointers", or "eb_troot" in the code. |
| 150 | |
| 151 | Duplicate keys are stored in a special manner. When inserting a key, if |
| 152 | the same one is found, then an incremental binary tree is built at this |
| 153 | place from these keys. This ensures that no special case has to be written |
| 154 | to handle duplicates when walking through the tree or when deleting entries. |
| 155 | It also guarantees that duplicates will be walked in the exact same order |
| 156 | they were inserted. This is very important when trying to achieve fair |
| 157 | processing distribution for instance. |
| 158 | |
| 159 | Algorithmic complexity can be derived from 3 variables : |
| 160 | - the number of possible different keys in the tree : P |
| 161 | - the number of entries in the tree : N |
| 162 | - the number of duplicates for one key : D |
| 163 | |
| 164 | Note that this tree is deliberately NOT balanced. For this reason, the worst |
| 165 | case may happen with a small tree (eg: 32 distinct keys of one bit). BUT, |
| 166 | the operations required to manage such data are so much cheap that they make |
| 167 | it worth using it even under such conditions. For instance, a balanced tree |
| 168 | may require only 6 levels to store those 32 keys when this tree will |
| 169 | require 32. But if per-level operations are 5 times cheaper, it wins. |
| 170 | |
| 171 | Minimal, Maximal and Average times are specified in number of operations. |
| 172 | Minimal is given for best condition, Maximal for worst condition, and the |
| 173 | average is reported for a tree containing random keys. An operation |
| 174 | generally consists in jumping from one node to the other. |
| 175 | |
| 176 | Complexity : |
| 177 | - lookup : min=1, max=log(P), avg=log(N) |
| 178 | - insertion from root : min=1, max=log(P), avg=log(N) |
| 179 | - insertion of dups : min=1, max=log(D), avg=log(D)/2 after lookup |
| 180 | - deletion : min=1, max=1, avg=1 |
| 181 | - prev/next : min=1, max=log(P), avg=2 : |
| 182 | N/2 nodes need 1 hop => 1*N/2 |
| 183 | N/4 nodes need 2 hops => 2*N/4 |
| 184 | N/8 nodes need 3 hops => 3*N/8 |
| 185 | ... |
| 186 | N/x nodes need log(x) hops => log2(x)*N/x |
| 187 | Total cost for all N nodes : sum[i=1..N](log2(i)*N/i) = N*sum[i=1..N](log2(i)/i) |
| 188 | Average cost across N nodes = total / N = sum[i=1..N](log2(i)/i) = 2 |
| 189 | |
| 190 | This design is currently limited to only two branches per node. Most of the |
| 191 | tree descent algorithm would be compatible with more branches (eg: 4, to cut |
| 192 | the height in half), but this would probably require more complex operations |
| 193 | and the deletion algorithm would be problematic. |
| 194 | |
| 195 | Useful properties : |
| 196 | - a node is always added above the leaf it is tied to, and never can get |
| 197 | below nor in another branch. This implies that leaves directly attached |
| 198 | to the root do not use their node part, which is indicated by a NULL |
| 199 | value in node_p. This also enhances the cache efficiency when walking |
| 200 | down the tree, because when the leaf is reached, its node part will |
| 201 | already have been visited (unless it's the first leaf in the tree). |
| 202 | |
| 203 | - pointers to lower nodes or leaves are stored in "branch" pointers. Only |
| 204 | the root node may have a NULL in either branch, it is not possible for |
| 205 | other branches. Since the nodes are attached to the left branch of the |
| 206 | root, it is not possible to see a NULL left branch when walking up a |
| 207 | tree. Thus, an empty tree is immediately identified by a NULL left |
| 208 | branch at the root. Conversely, the one and only way to identify the |
Willy Tarreau | 1fb6c87 | 2008-05-16 19:48:20 +0200 | [diff] [blame] | 209 | root node is to check that it right branch is NULL. Note that the |
| 210 | NULL pointer may have a few low-order bits set. |
Willy Tarreau | e6d2e4d | 2007-11-15 23:56:17 +0100 | [diff] [blame] | 211 | |
| 212 | - a node connected to its own leaf will have branch[0|1] pointing to |
| 213 | itself, and leaf_p pointing to itself. |
| 214 | |
| 215 | - a node can never have node_p pointing to itself. |
| 216 | |
| 217 | - a node is linked in a tree if and only if it has a non-null leaf_p. |
| 218 | |
| 219 | - a node can never have both branches equal, except for the root which can |
| 220 | have them both NULL. |
| 221 | |
| 222 | - deletion only applies to leaves. When a leaf is deleted, its parent must |
| 223 | be released too (unless it's the root), and its sibling must attach to |
| 224 | the grand-parent, replacing the parent. Also, when a leaf is deleted, |
| 225 | the node tied to this leaf will be removed and must be released too. If |
| 226 | this node is different from the leaf's parent, the freshly released |
| 227 | leaf's parent will be used to replace the node which must go. A released |
| 228 | node will never be used anymore, so there's no point in tracking it. |
| 229 | |
| 230 | - the bit index in a node indicates the bit position in the key which is |
| 231 | represented by the branches. That means that a node with (bit == 0) is |
| 232 | just above two leaves. Negative bit values are used to build a duplicate |
| 233 | tree. The first node above two identical leaves gets (bit == -1). This |
| 234 | value logarithmically decreases as the duplicate tree grows. During |
| 235 | duplicate insertion, a node is inserted above the highest bit value (the |
| 236 | lowest absolute value) in the tree during the right-sided walk. If bit |
| 237 | -1 is not encountered (highest < -1), we insert above last leaf. |
| 238 | Otherwise, we insert above the node with the highest value which was not |
| 239 | equal to the one of its parent + 1. |
| 240 | |
| 241 | - the "eb_next" primitive walks from left to right, which means from lower |
| 242 | to higher keys. It returns duplicates in the order they were inserted. |
| 243 | The "eb_first" primitive returns the left-most entry. |
| 244 | |
| 245 | - the "eb_prev" primitive walks from right to left, which means from |
| 246 | higher to lower keys. It returns duplicates in the opposite order they |
| 247 | were inserted. The "eb_last" primitive returns the right-most entry. |
| 248 | |
Willy Tarreau | 1fb6c87 | 2008-05-16 19:48:20 +0200 | [diff] [blame] | 249 | - a tree which has 1 in the lower bit of its root's right branch is a |
| 250 | tree with unique nodes. This means that when a node is inserted with |
| 251 | a key which already exists will not be inserted, and the previous |
| 252 | entry will be returned. |
| 253 | |
Willy Tarreau | e6d2e4d | 2007-11-15 23:56:17 +0100 | [diff] [blame] | 254 | */ |
| 255 | |
Willy Tarreau | f56fd8a | 2007-11-19 18:43:04 +0100 | [diff] [blame] | 256 | #ifndef _COMMON_EBTREE_H |
| 257 | #define _COMMON_EBTREE_H |
Willy Tarreau | e6d2e4d | 2007-11-15 23:56:17 +0100 | [diff] [blame] | 258 | |
| 259 | #include <stdlib.h> |
Willy Tarreau | be68e6a | 2007-11-21 10:34:15 +0100 | [diff] [blame] | 260 | #include <common/config.h> |
Willy Tarreau | e6d2e4d | 2007-11-15 23:56:17 +0100 | [diff] [blame] | 261 | |
| 262 | /* Note: we never need to run fls on null keys, so we can optimize the fls |
| 263 | * function by removing a conditional jump. |
| 264 | */ |
| 265 | #if defined(__i386__) |
| 266 | static inline int flsnz(int x) |
| 267 | { |
| 268 | int r; |
| 269 | __asm__("bsrl %1,%0\n" |
| 270 | : "=r" (r) : "rm" (x)); |
| 271 | return r+1; |
| 272 | } |
| 273 | #else |
| 274 | // returns 1 to 32 for 1<<0 to 1<<31. Undefined for 0. |
| 275 | #define flsnz(___a) ({ \ |
| 276 | register int ___x, ___bits = 0; \ |
| 277 | ___x = (___a); \ |
| 278 | if (___x & 0xffff0000) { ___x &= 0xffff0000; ___bits += 16;} \ |
| 279 | if (___x & 0xff00ff00) { ___x &= 0xff00ff00; ___bits += 8;} \ |
| 280 | if (___x & 0xf0f0f0f0) { ___x &= 0xf0f0f0f0; ___bits += 4;} \ |
| 281 | if (___x & 0xcccccccc) { ___x &= 0xcccccccc; ___bits += 2;} \ |
| 282 | if (___x & 0xaaaaaaaa) { ___x &= 0xaaaaaaaa; ___bits += 1;} \ |
| 283 | ___bits + 1; \ |
| 284 | }) |
| 285 | #endif |
| 286 | |
| 287 | static inline int fls64(unsigned long long x) |
| 288 | { |
| 289 | unsigned int h; |
| 290 | unsigned int bits = 32; |
| 291 | |
| 292 | h = x >> 32; |
| 293 | if (!h) { |
| 294 | h = x; |
| 295 | bits = 0; |
| 296 | } |
| 297 | return flsnz(h) + bits; |
| 298 | } |
| 299 | |
| 300 | #define fls_auto(x) ((sizeof(x) > 4) ? fls64(x) : flsnz(x)) |
| 301 | |
| 302 | /* Linux-like "container_of". It returns a pointer to the structure of type |
| 303 | * <type> which has its member <name> stored at address <ptr>. |
| 304 | */ |
| 305 | #ifndef container_of |
| 306 | #define container_of(ptr, type, name) ((type *)(((void *)(ptr)) - ((long)&((type *)0)->name))) |
| 307 | #endif |
| 308 | |
| 309 | /* |
| 310 | * Gcc >= 3 provides the ability for the program to give hints to the compiler |
| 311 | * about what branch of an if is most likely to be taken. This helps the |
| 312 | * compiler produce the most compact critical paths, which is generally better |
| 313 | * for the cache and to reduce the number of jumps. Be very careful not to use |
| 314 | * this in inline functions, because the code reordering it causes very often |
| 315 | * has a negative impact on the calling functions. |
| 316 | */ |
Willy Tarreau | 70bcfb7 | 2008-01-27 02:21:53 +0100 | [diff] [blame] | 317 | #if !defined(likely) |
| 318 | #if __GNUC__ < 3 |
Willy Tarreau | e6d2e4d | 2007-11-15 23:56:17 +0100 | [diff] [blame] | 319 | #define __builtin_expect(x,y) (x) |
Willy Tarreau | 70bcfb7 | 2008-01-27 02:21:53 +0100 | [diff] [blame] | 320 | #define likely(x) (x) |
| 321 | #define unlikely(x) (x) |
| 322 | #elif __GNUC__ < 4 |
| 323 | /* gcc 3.x does the best job at this */ |
Willy Tarreau | e6d2e4d | 2007-11-15 23:56:17 +0100 | [diff] [blame] | 324 | #define likely(x) (__builtin_expect((x) != 0, 1)) |
| 325 | #define unlikely(x) (__builtin_expect((x) != 0, 0)) |
Willy Tarreau | 70bcfb7 | 2008-01-27 02:21:53 +0100 | [diff] [blame] | 326 | #else |
| 327 | /* GCC 4.x is stupid, it performs the comparison then compares it to 1, |
| 328 | * so we cheat in a dirty way to prevent it from doing this. This will |
| 329 | * only work with ints and booleans though. |
| 330 | */ |
| 331 | #define likely(x) (x) |
Willy Tarreau | 75875a7 | 2008-07-06 15:18:50 +0200 | [diff] [blame] | 332 | #define unlikely(x) (__builtin_expect((unsigned long)(x), 0)) |
Willy Tarreau | 70bcfb7 | 2008-01-27 02:21:53 +0100 | [diff] [blame] | 333 | #endif |
Willy Tarreau | e6d2e4d | 2007-11-15 23:56:17 +0100 | [diff] [blame] | 334 | #endif |
| 335 | |
Willy Tarreau | 75cf17e | 2008-08-29 15:48:49 +0200 | [diff] [blame] | 336 | /* By default, gcc does not inline large chunks of code, but we want it to |
| 337 | * respect our choices. |
| 338 | */ |
| 339 | #if !defined(forceinline) |
| 340 | #if __GNUC__ < 3 |
| 341 | #define forceinline inline |
| 342 | #else |
| 343 | #define forceinline inline __attribute__((always_inline)) |
| 344 | #endif |
| 345 | #endif |
| 346 | |
Willy Tarreau | e6d2e4d | 2007-11-15 23:56:17 +0100 | [diff] [blame] | 347 | /* Support passing function parameters in registers. For this, the |
| 348 | * CONFIG_EBTREE_REGPARM macro has to be set to the maximal number of registers |
| 349 | * allowed. Some functions have intentionally received a regparm lower than |
| 350 | * their parameter count, it is in order to avoid register clobbering where |
| 351 | * they are called. |
| 352 | */ |
| 353 | #ifndef REGPRM1 |
| 354 | #if CONFIG_EBTREE_REGPARM >= 1 |
| 355 | #define REGPRM1 __attribute__((regparm(1))) |
| 356 | #else |
| 357 | #define REGPRM1 |
| 358 | #endif |
| 359 | #endif |
| 360 | |
| 361 | #ifndef REGPRM2 |
| 362 | #if CONFIG_EBTREE_REGPARM >= 2 |
| 363 | #define REGPRM2 __attribute__((regparm(2))) |
| 364 | #else |
| 365 | #define REGPRM2 REGPRM1 |
| 366 | #endif |
| 367 | #endif |
| 368 | |
| 369 | #ifndef REGPRM3 |
| 370 | #if CONFIG_EBTREE_REGPARM >= 3 |
| 371 | #define REGPRM3 __attribute__((regparm(3))) |
| 372 | #else |
| 373 | #define REGPRM3 REGPRM2 |
| 374 | #endif |
| 375 | #endif |
| 376 | |
| 377 | /* Number of bits per node, and number of leaves per node */ |
| 378 | #define EB_NODE_BITS 1 |
| 379 | #define EB_NODE_BRANCHES (1 << EB_NODE_BITS) |
| 380 | #define EB_NODE_BRANCH_MASK (EB_NODE_BRANCHES - 1) |
| 381 | |
| 382 | /* Be careful not to tweak those values. The walking code is optimized for NULL |
| 383 | * detection on the assumption that the following values are intact. |
| 384 | */ |
| 385 | #define EB_LEFT 0 |
| 386 | #define EB_RGHT 1 |
| 387 | #define EB_LEAF 0 |
| 388 | #define EB_NODE 1 |
| 389 | |
Willy Tarreau | 1fb6c87 | 2008-05-16 19:48:20 +0200 | [diff] [blame] | 390 | /* Tags to set in root->b[EB_RGHT] : |
| 391 | * - EB_NORMAL is a normal tree which stores duplicate keys. |
| 392 | * - EB_UNIQUE is a tree which stores unique keys. |
| 393 | */ |
| 394 | #define EB_NORMAL 0 |
| 395 | #define EB_UNIQUE 1 |
| 396 | |
Willy Tarreau | e6d2e4d | 2007-11-15 23:56:17 +0100 | [diff] [blame] | 397 | /* This is the same as an eb_node pointer, except that the lower bit embeds |
| 398 | * a tag. See eb_dotag()/eb_untag()/eb_gettag(). This tag has two meanings : |
| 399 | * - 0=left, 1=right to designate the parent's branch for leaf_p/node_p |
| 400 | * - 0=link, 1=leaf to designate the branch's type for branch[] |
| 401 | */ |
| 402 | typedef void eb_troot_t; |
| 403 | |
| 404 | /* The eb_root connects the node which contains it, to two nodes below it, one |
| 405 | * of which may be the same node. At the top of the tree, we use an eb_root |
Willy Tarreau | 1fb6c87 | 2008-05-16 19:48:20 +0200 | [diff] [blame] | 406 | * too, which always has its right branch NULL (+/1 low-order bits). |
Willy Tarreau | e6d2e4d | 2007-11-15 23:56:17 +0100 | [diff] [blame] | 407 | */ |
| 408 | struct eb_root { |
| 409 | eb_troot_t *b[EB_NODE_BRANCHES]; /* left and right branches */ |
| 410 | }; |
| 411 | |
| 412 | /* The eb_node contains the two parts, one for the leaf, which always exists, |
| 413 | * and one for the node, which remains unused in the very first node inserted |
| 414 | * into the tree. This structure is 20 bytes per node on 32-bit machines. Do |
| 415 | * not change the order, benchmarks have shown that it's optimal this way. |
| 416 | */ |
| 417 | struct eb_node { |
| 418 | struct eb_root branches; /* branches, must be at the beginning */ |
| 419 | eb_troot_t *node_p; /* link node's parent */ |
| 420 | eb_troot_t *leaf_p; /* leaf node's parent */ |
| 421 | int bit; /* link's bit position. */ |
| 422 | }; |
| 423 | |
| 424 | /* Return the structure of type <type> whose member <member> points to <ptr> */ |
| 425 | #define eb_entry(ptr, type, member) container_of(ptr, type, member) |
| 426 | |
| 427 | /* The root of a tree is an eb_root initialized with both pointers NULL. |
| 428 | * During its life, only the left pointer will change. The right one will |
| 429 | * always remain NULL, which is the way we detect it. |
| 430 | */ |
| 431 | #define EB_ROOT \ |
| 432 | (struct eb_root) { \ |
| 433 | .b = {[0] = NULL, [1] = NULL }, \ |
| 434 | } |
| 435 | |
Willy Tarreau | 1fb6c87 | 2008-05-16 19:48:20 +0200 | [diff] [blame] | 436 | #define EB_ROOT_UNIQUE \ |
| 437 | (struct eb_root) { \ |
| 438 | .b = {[0] = NULL, [1] = (void *)1 }, \ |
| 439 | } |
| 440 | |
Willy Tarreau | e6d2e4d | 2007-11-15 23:56:17 +0100 | [diff] [blame] | 441 | #define EB_TREE_HEAD(name) \ |
| 442 | struct eb_root name = EB_ROOT |
| 443 | |
| 444 | |
| 445 | /***************************************\ |
| 446 | * Private functions. Not for end-user * |
| 447 | \***************************************/ |
| 448 | |
| 449 | /* Converts a root pointer to its equivalent eb_troot_t pointer, |
| 450 | * ready to be stored in ->branch[], leaf_p or node_p. NULL is not |
| 451 | * conserved. To be used with EB_LEAF, EB_NODE, EB_LEFT or EB_RGHT in <tag>. |
| 452 | */ |
| 453 | static inline eb_troot_t *eb_dotag(const struct eb_root *root, const int tag) |
| 454 | { |
| 455 | return (eb_troot_t *)((void *)root + tag); |
| 456 | } |
| 457 | |
| 458 | /* Converts an eb_troot_t pointer pointer to its equivalent eb_root pointer, |
| 459 | * for use with pointers from ->branch[], leaf_p or node_p. NULL is conserved |
| 460 | * as long as the tree is not corrupted. To be used with EB_LEAF, EB_NODE, |
| 461 | * EB_LEFT or EB_RGHT in <tag>. |
| 462 | */ |
| 463 | static inline struct eb_root *eb_untag(const eb_troot_t *troot, const int tag) |
| 464 | { |
| 465 | return (struct eb_root *)((void *)troot - tag); |
| 466 | } |
| 467 | |
| 468 | /* returns the tag associated with an eb_troot_t pointer */ |
| 469 | static inline int eb_gettag(eb_troot_t *troot) |
| 470 | { |
| 471 | return (unsigned long)troot & 1; |
| 472 | } |
| 473 | |
| 474 | /* Converts a root pointer to its equivalent eb_troot_t pointer and clears the |
| 475 | * tag, no matter what its value was. |
| 476 | */ |
| 477 | static inline struct eb_root *eb_clrtag(const eb_troot_t *troot) |
| 478 | { |
| 479 | return (struct eb_root *)((unsigned long)troot & ~1UL); |
| 480 | } |
| 481 | |
| 482 | /* Returns a pointer to the eb_node holding <root> */ |
| 483 | static inline struct eb_node *eb_root_to_node(struct eb_root *root) |
| 484 | { |
| 485 | return container_of(root, struct eb_node, branches); |
| 486 | } |
| 487 | |
| 488 | /* Walks down starting at root pointer <start>, and always walking on side |
| 489 | * <side>. It either returns the node hosting the first leaf on that side, |
| 490 | * or NULL if no leaf is found. <start> may either be NULL or a branch pointer. |
| 491 | * The pointer to the leaf (or NULL) is returned. |
| 492 | */ |
| 493 | static inline struct eb_node *eb_walk_down(eb_troot_t *start, unsigned int side) |
| 494 | { |
| 495 | /* A NULL pointer on an empty tree root will be returned as-is */ |
| 496 | while (eb_gettag(start) == EB_NODE) |
| 497 | start = (eb_untag(start, EB_NODE))->b[side]; |
| 498 | /* NULL is left untouched (root==eb_node, EB_LEAF==0) */ |
| 499 | return eb_root_to_node(eb_untag(start, EB_LEAF)); |
| 500 | } |
| 501 | |
| 502 | /* This function is used to build a tree of duplicates by adding a new node to |
| 503 | * a subtree of at least 2 entries. It will probably never be needed inlined, |
| 504 | * and it is not for end-user. |
| 505 | */ |
Willy Tarreau | 75cf17e | 2008-08-29 15:48:49 +0200 | [diff] [blame] | 506 | static forceinline struct eb_node * |
Willy Tarreau | e6d2e4d | 2007-11-15 23:56:17 +0100 | [diff] [blame] | 507 | __eb_insert_dup(struct eb_node *sub, struct eb_node *new) |
| 508 | { |
| 509 | struct eb_node *head = sub; |
| 510 | |
| 511 | struct eb_troot *new_left = eb_dotag(&new->branches, EB_LEFT); |
| 512 | struct eb_troot *new_rght = eb_dotag(&new->branches, EB_RGHT); |
| 513 | struct eb_troot *new_leaf = eb_dotag(&new->branches, EB_LEAF); |
| 514 | |
| 515 | /* first, identify the deepest hole on the right branch */ |
| 516 | while (eb_gettag(head->branches.b[EB_RGHT]) != EB_LEAF) { |
| 517 | struct eb_node *last = head; |
| 518 | head = container_of(eb_untag(head->branches.b[EB_RGHT], EB_NODE), |
| 519 | struct eb_node, branches); |
| 520 | if (head->bit > last->bit + 1) |
| 521 | sub = head; /* there's a hole here */ |
| 522 | } |
| 523 | |
| 524 | /* Here we have a leaf attached to (head)->b[EB_RGHT] */ |
| 525 | if (head->bit < -1) { |
| 526 | /* A hole exists just before the leaf, we insert there */ |
| 527 | new->bit = -1; |
| 528 | sub = container_of(eb_untag(head->branches.b[EB_RGHT], EB_LEAF), |
| 529 | struct eb_node, branches); |
| 530 | head->branches.b[EB_RGHT] = eb_dotag(&new->branches, EB_NODE); |
| 531 | |
| 532 | new->node_p = sub->leaf_p; |
| 533 | new->leaf_p = new_rght; |
| 534 | sub->leaf_p = new_left; |
| 535 | new->branches.b[EB_LEFT] = eb_dotag(&sub->branches, EB_LEAF); |
| 536 | new->branches.b[EB_RGHT] = new_leaf; |
| 537 | return new; |
| 538 | } else { |
| 539 | int side; |
| 540 | /* No hole was found before a leaf. We have to insert above |
| 541 | * <sub>. Note that we cannot be certain that <sub> is attached |
| 542 | * to the right of its parent, as this is only true if <sub> |
| 543 | * is inside the dup tree, not at the head. |
| 544 | */ |
| 545 | new->bit = sub->bit - 1; /* install at the lowest level */ |
| 546 | side = eb_gettag(sub->node_p); |
| 547 | head = container_of(eb_untag(sub->node_p, side), struct eb_node, branches); |
| 548 | head->branches.b[side] = eb_dotag(&new->branches, EB_NODE); |
| 549 | |
| 550 | new->node_p = sub->node_p; |
| 551 | new->leaf_p = new_rght; |
| 552 | sub->node_p = new_left; |
| 553 | new->branches.b[EB_LEFT] = eb_dotag(&sub->branches, EB_NODE); |
| 554 | new->branches.b[EB_RGHT] = new_leaf; |
| 555 | return new; |
| 556 | } |
| 557 | } |
| 558 | |
| 559 | |
| 560 | /**************************************\ |
| 561 | * Public functions, for the end-user * |
| 562 | \**************************************/ |
| 563 | |
| 564 | /* Return the first leaf in the tree starting at <root>, or NULL if none */ |
| 565 | static inline struct eb_node *eb_first(struct eb_root *root) |
| 566 | { |
| 567 | return eb_walk_down(root->b[0], EB_LEFT); |
| 568 | } |
| 569 | |
| 570 | /* Return the last leaf in the tree starting at <root>, or NULL if none */ |
| 571 | static inline struct eb_node *eb_last(struct eb_root *root) |
| 572 | { |
| 573 | return eb_walk_down(root->b[0], EB_RGHT); |
| 574 | } |
| 575 | |
| 576 | /* Return previous leaf node before an existing leaf node, or NULL if none. */ |
| 577 | static inline struct eb_node *eb_prev(struct eb_node *node) |
| 578 | { |
| 579 | eb_troot_t *t = node->leaf_p; |
| 580 | |
| 581 | while (eb_gettag(t) == EB_LEFT) { |
| 582 | /* Walking up from left branch. We must ensure that we never |
| 583 | * walk beyond root. |
| 584 | */ |
Willy Tarreau | 1fb6c87 | 2008-05-16 19:48:20 +0200 | [diff] [blame] | 585 | if (unlikely(eb_clrtag((eb_untag(t, EB_LEFT))->b[EB_RGHT]) == NULL)) |
Willy Tarreau | e6d2e4d | 2007-11-15 23:56:17 +0100 | [diff] [blame] | 586 | return NULL; |
| 587 | t = (eb_root_to_node(eb_untag(t, EB_LEFT)))->node_p; |
| 588 | } |
| 589 | /* Note that <t> cannot be NULL at this stage */ |
| 590 | t = (eb_untag(t, EB_RGHT))->b[EB_LEFT]; |
| 591 | return eb_walk_down(t, EB_RGHT); |
| 592 | } |
| 593 | |
| 594 | /* Return next leaf node after an existing leaf node, or NULL if none. */ |
| 595 | static inline struct eb_node *eb_next(struct eb_node *node) |
| 596 | { |
| 597 | eb_troot_t *t = node->leaf_p; |
| 598 | |
| 599 | while (eb_gettag(t) != EB_LEFT) |
| 600 | /* Walking up from right branch, so we cannot be below root */ |
| 601 | t = (eb_root_to_node(eb_untag(t, EB_RGHT)))->node_p; |
| 602 | |
| 603 | /* Note that <t> cannot be NULL at this stage */ |
| 604 | t = (eb_untag(t, EB_LEFT))->b[EB_RGHT]; |
Willy Tarreau | 1fb6c87 | 2008-05-16 19:48:20 +0200 | [diff] [blame] | 605 | if (eb_clrtag(t) == NULL) |
| 606 | return NULL; |
Willy Tarreau | e6d2e4d | 2007-11-15 23:56:17 +0100 | [diff] [blame] | 607 | return eb_walk_down(t, EB_LEFT); |
| 608 | } |
| 609 | |
| 610 | /* Return previous leaf node before an existing leaf node, skipping duplicates, |
| 611 | * or NULL if none. */ |
| 612 | static inline struct eb_node *eb_prev_unique(struct eb_node *node) |
| 613 | { |
| 614 | eb_troot_t *t = node->leaf_p; |
| 615 | |
| 616 | while (1) { |
| 617 | if (eb_gettag(t) != EB_LEFT) { |
| 618 | node = eb_root_to_node(eb_untag(t, EB_RGHT)); |
| 619 | /* if we're right and not in duplicates, stop here */ |
| 620 | if (node->bit >= 0) |
| 621 | break; |
| 622 | t = node->node_p; |
| 623 | } |
| 624 | else { |
| 625 | /* Walking up from left branch. We must ensure that we never |
| 626 | * walk beyond root. |
| 627 | */ |
Willy Tarreau | 1fb6c87 | 2008-05-16 19:48:20 +0200 | [diff] [blame] | 628 | if (unlikely(eb_clrtag((eb_untag(t, EB_LEFT))->b[EB_RGHT]) == NULL)) |
Willy Tarreau | e6d2e4d | 2007-11-15 23:56:17 +0100 | [diff] [blame] | 629 | return NULL; |
| 630 | t = (eb_root_to_node(eb_untag(t, EB_LEFT)))->node_p; |
| 631 | } |
| 632 | } |
| 633 | /* Note that <t> cannot be NULL at this stage */ |
| 634 | t = (eb_untag(t, EB_RGHT))->b[EB_LEFT]; |
| 635 | return eb_walk_down(t, EB_RGHT); |
| 636 | } |
| 637 | |
| 638 | /* Return next leaf node after an existing leaf node, skipping duplicates, or |
| 639 | * NULL if none. |
| 640 | */ |
| 641 | static inline struct eb_node *eb_next_unique(struct eb_node *node) |
| 642 | { |
| 643 | eb_troot_t *t = node->leaf_p; |
| 644 | |
| 645 | while (1) { |
| 646 | if (eb_gettag(t) == EB_LEFT) { |
Willy Tarreau | 1fb6c87 | 2008-05-16 19:48:20 +0200 | [diff] [blame] | 647 | if (unlikely(eb_clrtag((eb_untag(t, EB_LEFT))->b[EB_RGHT]) == NULL)) |
Willy Tarreau | e6d2e4d | 2007-11-15 23:56:17 +0100 | [diff] [blame] | 648 | return NULL; /* we reached root */ |
| 649 | node = eb_root_to_node(eb_untag(t, EB_LEFT)); |
| 650 | /* if we're left and not in duplicates, stop here */ |
| 651 | if (node->bit >= 0) |
| 652 | break; |
| 653 | t = node->node_p; |
| 654 | } |
| 655 | else { |
| 656 | /* Walking up from right branch, so we cannot be below root */ |
| 657 | t = (eb_root_to_node(eb_untag(t, EB_RGHT)))->node_p; |
| 658 | } |
| 659 | } |
| 660 | |
| 661 | /* Note that <t> cannot be NULL at this stage */ |
| 662 | t = (eb_untag(t, EB_LEFT))->b[EB_RGHT]; |
Willy Tarreau | 1fb6c87 | 2008-05-16 19:48:20 +0200 | [diff] [blame] | 663 | if (eb_clrtag(t) == NULL) |
| 664 | return NULL; |
Willy Tarreau | e6d2e4d | 2007-11-15 23:56:17 +0100 | [diff] [blame] | 665 | return eb_walk_down(t, EB_LEFT); |
| 666 | } |
| 667 | |
| 668 | |
| 669 | /* Removes a leaf node from the tree if it was still in it. Marks the node |
| 670 | * as unlinked. |
| 671 | */ |
Willy Tarreau | 75cf17e | 2008-08-29 15:48:49 +0200 | [diff] [blame] | 672 | static forceinline void __eb_delete(struct eb_node *node) |
Willy Tarreau | e6d2e4d | 2007-11-15 23:56:17 +0100 | [diff] [blame] | 673 | { |
| 674 | __label__ delete_unlink; |
| 675 | unsigned int pside, gpside, sibtype; |
| 676 | struct eb_node *parent; |
| 677 | struct eb_root *gparent; |
| 678 | |
| 679 | if (!node->leaf_p) |
| 680 | return; |
| 681 | |
| 682 | /* we need the parent, our side, and the grand parent */ |
| 683 | pside = eb_gettag(node->leaf_p); |
| 684 | parent = eb_root_to_node(eb_untag(node->leaf_p, pside)); |
| 685 | |
| 686 | /* We likely have to release the parent link, unless it's the root, |
| 687 | * in which case we only set our branch to NULL. Note that we can |
| 688 | * only be attached to the root by its left branch. |
| 689 | */ |
| 690 | |
Willy Tarreau | 1fb6c87 | 2008-05-16 19:48:20 +0200 | [diff] [blame] | 691 | if (eb_clrtag(parent->branches.b[EB_RGHT]) == NULL) { |
Willy Tarreau | e6d2e4d | 2007-11-15 23:56:17 +0100 | [diff] [blame] | 692 | /* we're just below the root, it's trivial. */ |
| 693 | parent->branches.b[EB_LEFT] = NULL; |
| 694 | goto delete_unlink; |
| 695 | } |
| 696 | |
| 697 | /* To release our parent, we have to identify our sibling, and reparent |
| 698 | * it directly to/from the grand parent. Note that the sibling can |
| 699 | * either be a link or a leaf. |
| 700 | */ |
| 701 | |
| 702 | gpside = eb_gettag(parent->node_p); |
| 703 | gparent = eb_untag(parent->node_p, gpside); |
| 704 | |
| 705 | gparent->b[gpside] = parent->branches.b[!pside]; |
| 706 | sibtype = eb_gettag(gparent->b[gpside]); |
| 707 | |
| 708 | if (sibtype == EB_LEAF) { |
| 709 | eb_root_to_node(eb_untag(gparent->b[gpside], EB_LEAF))->leaf_p = |
| 710 | eb_dotag(gparent, gpside); |
| 711 | } else { |
| 712 | eb_root_to_node(eb_untag(gparent->b[gpside], EB_NODE))->node_p = |
| 713 | eb_dotag(gparent, gpside); |
| 714 | } |
| 715 | /* Mark the parent unused. Note that we do not check if the parent is |
| 716 | * our own node, but that's not a problem because if it is, it will be |
| 717 | * marked unused at the same time, which we'll use below to know we can |
| 718 | * safely remove it. |
| 719 | */ |
| 720 | parent->node_p = NULL; |
| 721 | |
| 722 | /* The parent node has been detached, and is currently unused. It may |
| 723 | * belong to another node, so we cannot remove it that way. Also, our |
| 724 | * own node part might still be used. so we can use this spare node |
| 725 | * to replace ours if needed. |
| 726 | */ |
| 727 | |
| 728 | /* If our link part is unused, we can safely exit now */ |
| 729 | if (!node->node_p) |
| 730 | goto delete_unlink; |
| 731 | |
| 732 | /* From now on, <node> and <parent> are necessarily different, and the |
| 733 | * <node>'s node part is in use. By definition, <parent> is at least |
| 734 | * below <node>, so keeping its key for the bit string is OK. |
| 735 | */ |
| 736 | |
| 737 | parent->node_p = node->node_p; |
| 738 | parent->branches = node->branches; |
| 739 | parent->bit = node->bit; |
| 740 | |
| 741 | /* We must now update the new node's parent... */ |
| 742 | gpside = eb_gettag(parent->node_p); |
| 743 | gparent = eb_untag(parent->node_p, gpside); |
| 744 | gparent->b[gpside] = eb_dotag(&parent->branches, EB_NODE); |
| 745 | |
| 746 | /* ... and its branches */ |
| 747 | for (pside = 0; pside <= 1; pside++) { |
| 748 | if (eb_gettag(parent->branches.b[pside]) == EB_NODE) { |
| 749 | eb_root_to_node(eb_untag(parent->branches.b[pside], EB_NODE))->node_p = |
| 750 | eb_dotag(&parent->branches, pside); |
| 751 | } else { |
| 752 | eb_root_to_node(eb_untag(parent->branches.b[pside], EB_LEAF))->leaf_p = |
| 753 | eb_dotag(&parent->branches, pside); |
| 754 | } |
| 755 | } |
| 756 | delete_unlink: |
| 757 | /* Now the node has been completely unlinked */ |
| 758 | node->leaf_p = NULL; |
| 759 | return; /* tree is not empty yet */ |
| 760 | } |
| 761 | |
| 762 | /* These functions are declared in ebtree.c */ |
| 763 | void eb_delete(struct eb_node *node); |
| 764 | REGPRM1 struct eb_node *eb_insert_dup(struct eb_node *sub, struct eb_node *new); |
| 765 | |
Willy Tarreau | f56fd8a | 2007-11-19 18:43:04 +0100 | [diff] [blame] | 766 | #endif /* _COMMON_EBTREE_H */ |
Willy Tarreau | e6d2e4d | 2007-11-15 23:56:17 +0100 | [diff] [blame] | 767 | |
| 768 | /* |
| 769 | * Local variables: |
| 770 | * c-indent-level: 8 |
| 771 | * c-basic-offset: 8 |
| 772 | * End: |
| 773 | */ |