BUG/MAJOR: ssl: free the generated SSL_CTX if the LRU cache is disabled

Kim Seri reported that haproxy 1.6.0 crashes after a few requests
when a bind line has SSL enabled with more than one certificate. This
was caused by an insufficient condition to free generated certs during
ssl_sock_close() which can also catch other certs.

Christopher Faulet analysed the situation like this :

-------
First the LRU tree is only initialized when the SSL certs generation is
configured on a bind line. So, in the most of cases, it is NULL (it is
not the same thing than empty).
When the SSL certs generation is used, if the cache is not NULL, a such
certificate is pushed in the cache and there is no need to release it
when the connection is closed.
But it can be disabled in the configuration. So in that case, we must
free the generated certificate when the connection is closed.

Then here, we have really a bug. Here is the buggy part:

3125)      if (conn->xprt_ctx) {
3126) #ifdef SSL_CTRL_SET_TLSEXT_HOSTNAME
3127)              if (!ssl_ctx_lru_tree && objt_listener(conn->target)) {
3128)                      SSL_CTX *ctx = SSL_get_SSL_CTX(conn->xprt_ctx);
3129)                      if (ctx != 3130)
 SSL_CTX_free(ctx);
3131)              }
3133)              SSL_free(conn->xprt_ctx);
3134)              conn->xprt_ctx = NULL;
3135)              sslconns--;
3136)      }

The check on the line 3127 is not enough to determine if this is a
generated certificate or not. Because ssl_ctx_lru_tree is NULL,
generated certificates, if any, must be freed. But here ctx should also
be compared to all SNI certificates and not only to default_ctx. Because
of this bug, when a SNI certificate is used for a connection, it is
erroneously freed when this connection is closed.
-------

Christopher provided this reliable reproducer :

----------
global
    tune.ssl.default-dh-param   2048
    daemon

listen ssl_server
    mode tcp
    bind 127.0.0.1:4443 ssl crt srv1.test.com.pem crt srv2.test.com.pem

    timeout connect 5000
    timeout client  30000
    timeout server  30000

    server srv A.B.C.D:80

You just need to generate 2 SSL certificates with 2 CN (here
srv1.test.com and srv2.test.com).

Then, by doing SSL requests with the first CN, there is no problem. But
with the second CN, it should segfault on the 2nd request.

openssl s_client -connect 127.0.0.1:4443 -servername srv1.test.com // OK
openssl s_client -connect 127.0.0.1:4443 -servername srv1.test.com // OK

But,

openssl s_client -connect 127.0.0.1:4443 -servername srv2.test.com // OK
openssl s_client -connect 127.0.0.1:4443 -servername srv2.test.com // KO
-----------

A long discussion led to the following proposal which this patch implements :

- the cert is generated. It gets a refcount = 1.
- we assign it to the SSL. Its refcount becomes two.
- we try to insert it into the tree. The tree will handle its freeing
  using SSL_CTX_free() during eviction.
- if we can't insert into the tree because the tree is disabled, then
  we have to call SSL_CTX_free() ourselves, then we'd rather do it
  immediately. It will more closely mimmick the case where the cert
  is added to the tree and immediately evicted by concurrent activity
  on the cache.
- we never have to call SSL_CTX_free() during ssl_sock_close() because
  the SSL session only relies on openssl doing the right thing based on
  the refcount only.
- thus we never need to know how the cert was created since the
  SSL_CTX_free() is either guaranteed or already done for generated
  certs, and this protects other ones against any accidental call to
  SSL_CTX_free() without having to track where the cert comes from.

This patch also reduces the inter-dependence between the LRU tree and
the SSL stack, so it should cause less sweating to migrate to threads
later.

This bug is specific to 1.6.0, as it was introduced after dev7 by
this fix :

   d2cab92 ("BUG/MINOR: ssl: fix management of the cache where forged certificates are stored")

Thus a backport to 1.6 is required, but not to 1.5.
1 file changed