BUG/MEDIUM: connection: report connection errors even when no mux is installed

An annoying issue was met when testing the reverse-http mechanism, by
which failed connection attempts would apparently not be attempted again
when there was no connect timeout. It turned out to be more generalized
than the rhttp system, and actually affects all outgoing connections
relying on NPN or ALPN to choose the mux, on which no mux is installed
and for which the subscriber (ssl_sock) must be notified instead.

The problem appeared during 2.2-dev1 development. First, commit
062df2c23 ("MEDIUM: backend: move the connection finalization step to
back_handle_st_con()") broke the error reporting by testing CO_FL_ERROR
only under CO_FL_CONNECTED. While it still worked OK for cases where a
mux was present, it did not for this specific situation because no
single error path would be considered when no mux was present. Changing
the CO_FL_CONNECTED test to also include CO_FL_ERROR did work, until
a few commits later with 477902bd2 ("MEDIUM: connections: Get ride of
the xprt_done callback.") which removed the xprt_done callback that was
used to indicate success or failure of the transport layer setup, since,
as the commit explains, we can report this via the mux. What this last
commit says is true, except when there is no mux.

For this, however, the sock_conn_iocb() function (formerly conn_fd_handler)
is called for such errors, evaluates a number of conditions, none of which
is matched in this error condition case, since sock_conn_check() instantly
reports an error causing a jump to the leave label. There, the mux is
notified if installed, and the function returns. In other error condition
cases, readiness and activity are checked for both sides, the tasklets
woken up and the corresponding subscriber flags removed. This means that
a sane (and safe) approach would consist in just notifying the subscriber
in case of error, if such a subscriber still exists: if still there, it
means the event hasn't been caught earlier, then it's the right moment
to report it. And since this is done after conn_notify_mux(), it still
leaves all control to the mux once it's installed.

This commit should be progressively backported as far as 2.2 since it's
where the problem was introduced. It's important to clearly check the
error path in each function to make sure the fix still does what it's
supposed to.

(cherry picked from commit 6a4591c3d09da3fbd949073a32285a4afaa6785d)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 482038c9a8665134a2761a31be5f056e501529a7)
[cf: Slightly adapted because context differs. The tasklet is woken up before
     clearing the subs events]
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 92e11cee7f693cbbcc8f41749d0e6127dfdd19cf)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 2a3b41f0587a1b649489cb70fdd5c5c4c2085ba1)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
1 file changed