BUG/MEDIUM: queue: make pendconn_cond_unlink() really thread-safe
A crash reported in github issue #880 looks impossible unless
pendconn_cond_unlink() occasionally sees a null leaf_p when attempting
to remove an entry, which seems to be confirmed by the reporter. What
seems to be happening is that depending on compiler optimizations,
this pointer can appear as null while pointers are moved if one of
the node's parents is removed from or inserted into the tree. There's
no explicit null of the pointer during these operations but those
pointers are rewritten in multiple steps and nothing prevents this
situation from happening, and there are no particular barrier nor
atomic ops around this.
This test was used to avoid unnecessary locking, for already deleted
entries, but looking at the code it appears that pendconn_free() already
resets s->pend_pos that's used as <p> there, and that the other call
reasons are after an error where the connection will be dropped as
well. So we don't save anything by doing this test, and make it
unsafe. The older code used to check for list emptiness there and
not inside pendconn_unlink(), which explains why the code has stayed
there. Let's just remove this now.
Thanks to @jaroslawr for reporting this issue in great details and for
testing the proposed fix.
This should be backpored to 1.8, where the test on LIST_ISEMPTY should
be moved to pendconn_unlink() instead (inside the lock, just like 2.0+).
(cherry picked from commit fac0f645dff9fb1854db32da9fa7c317eaaeede1)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 3376d9f52e047214061aeedaf117a2006ce60d70)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 3e4a7b7f725228fc95927a8765728e8bedf47574)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
1 file changed