BUG/MAJOR: quic: complete thread migration before tcp-rules
A quic_conn is instantiated and tied on the first thread which has
received the first INITIAL packet. After handshake completion,
listener_accept() is called. For each quic_conn, a new thread is
selected among the least loaded ones Note that this occurs earlier if
handling 0-RTT data.
This thread connection migration is done in two steps :
* inside listener_accept(), on the origin thread, quic_conn
tasks/tasklet are killed. After this, no quic_conn related processing
will occur on this thread. The connection is flagged with
QUIC_FL_CONN_AFFINITY_CHANGED.
* as soon as the first quic_conn related processing occurs on the new
thread, the migration is finalized. This allows to allocate the new
tasks/tasklet directly on the destination thread.
This last step on the new thread must be done prior to other quic_conn
access. There is two events which may trigger it :
* a packet is received on the new thread. In this case,
qc_finalize_affinity_rebind() is called from quic_dgram_parse().
* the recently accepted connection is popped from accept_queue_ring via
accept_queue_process(). This will called session_accept_fd() as
listener.bind_conf.accept callback. This instantiates a new session
and start connection stack via conn_xprt_start(), which itself calls
qc_xprt_start() where qc_finalize_affinity_rebind() is used.
A condition was recently found which could cause a closing to be used
with qc_finalize_affinity_rebind() which is forbidden with a BUG_ON().
This lat step was not compatible with layer 4 rule such as "tcp-request
connection reject" which closes the connection early. In this case, most
of the body of session_accept_fd() is skipped, including
qc_xprt_start(), so thread migration is not finalized. At the end of the
function, conn_xprt_close() is then called which flags the connection as
CLOSING.
If a datagram is received for this connection before it is released,
this will call qc_finalize_affinity_rebind() which triggers its BUG_ON()
to prevent thread migration for CLOSING quic_conn.
FATAL: bug condition "qc->flags & ((1U << 29)|(1U << 30))" matched at src/quic_conn.c:2036
Thread 3 "haproxy" received signal SIGILL, Illegal instruction.
[Switching to Thread 0x7ffff794f700 (LWP 2973030)]
0x00005555556221f3 in qc_finalize_affinity_rebind (qc=0x7ffff002d060) at src/quic_conn.c:2036
2036 BUG_ON(qc->flags & (QUIC_FL_CONN_CLOSING|QUIC_FL_CONN_DRAINING));
(gdb) bt
#0 0x00005555556221f3 in qc_finalize_affinity_rebind (qc=0x7ffff002d060) at src/quic_conn.c:2036
#1 0x0000555555682463 in quic_dgram_parse (dgram=0x7fff5003ef10, from_qc=0x0, li=0x555555f38670) at src/quic_rx.c:2602
#2 0x0000555555651aae in quic_lstnr_dghdlr (t=0x555555fc4440, ctx=0x555555fc3f78, state=32832) at src/quic_sock.c:189
#3 0x00005555558c9393 in run_tasks_from_lists (budgets=0x7ffff7944c90) at src/task.c:596
#4 0x00005555558c9e8e in process_runnable_tasks () at src/task.c:876
#5 0x000055555586b7b2 in run_poll_loop () at src/haproxy.c:2966
#6 0x000055555586be87 in run_thread_poll_loop (data=0x555555d3d340 <ha_thread_info+64>) at src/haproxy.c:3165
#7 0x00007ffff7b59609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#8 0x00007ffff7a7e133 in clone () from /lib/x86_64-linux-gnu/libc.so.6
To fix this issue, ensure quic_conn migration is completed earlier
inside session_accept_fd(), before any tcp rules processing. This is
done by moving qc_finalize_affinity_rebind() invocation from
qc_xprt_start() to qc_conn_init().
This must be backported up to 2.7.
(cherry picked from commit a8968701c036ca427d96d99894e135984e372004)
[ad: move comment from quic_rx.c to quic_conn.c]
Signed-off-by: Amaury Denoyelle <adenoyelle@haproxy.com>
2 files changed