f175b08bfbf81d6808b9ce474b8aa28cce805f9e - haproxy

commit	f175b08bfbf81d6808b9ce474b8aa28cce805f9e	[log] [tgz]
author	Aurelien DARRAGON <adarragon@haproxy.com>	Wed Feb 01 17:22:32 2023 +0100
committer	Christopher Faulet <cfaulet@haproxy.com>	Wed Apr 05 08:58:16 2023 +0200
tree	c477db7baf4fcbc804609cbbff614e9e12c15018
parent	75b9d1c041c0c814f91b07b789ac997aa952a4b9 [diff]
BUG/MINOR: server/del: fix srv->next pointer consistency

We recently discovered a bug which affects dynamic server deletion:

When a server is deleted, it is removed from the "visible" server list.
But as we've seen in previous commit
("MINOR: server: add SRV_F_DELETED flag"), it can still be accessed by
someone who keeps a reference on it (waiting for the final srv_drop()).
Throughout this transient state, server ptr is still valid (may be
dereferenced) and the flag SRV_F_DELETED is set.

However, as the server is not part of server list anymore, we have
an issue: srv->next pointer won't be updated anymore as the only place
where we perform such update is in cli_parse_delete_server() by
iterating over the "visible" server list.

Because of this, we cannot guarantee that a server with the
SRV_F_DELETED flag has a valid 'next' ptr: 'next' could be pointing
to a fully removed (already freed) server.

This problem can be easily demonstrated with server dumping in
the stats:

server list dumping is performed in stats_dump_proxy_to_buffer()
The function can be interrupted and resumed later by design.
ie: output buffer is full: partial dump and finish the dump after
the flush

This is implemented by calling srv_take() on the server being dumped,
and only releasing it when we're done with it using srv_drop().
(drop can be delayed after function resume if buffer is full)

While the function design seems OK, it works with the assumption that
srv->next will still be valid after the function resumes, which is
not true. (especially if multiple servers are being removed in between
the 2 dumping attempts)

In practice, this did not cause any crash yet (at least this was not
reported so far), because server dumping is so fast that it is very
unlikely that multiple server deletions make their way between 2
dumping attempts in most setups. But still, this is a problem that we
need to address because some upcoming work might depend on this
assumption as well and for the moment it is not safe at all.

========================================================================

Here is a quick reproducer:

With this patch, we're creating a large deletion window of 3s as soon
as we reach a server named "t2" while iterating over the list.

This will give us plenty of time to perform multiple deletions before
the function is resumed.

    |  diff --git a/src/stats.c b/src/stats.c
    |  index 84a4f9b6e..15e49b4cd 100644
    |  --- a/src/stats.c
    |  +++ b/src/stats.c
    |  @@ -3189,11 +3189,24 @@ int stats_dump_proxy_to_buffer(struct stconn *sc, struct htx *htx,
    |                    * Temporarily increment its refcount to prevent its
    |                    * anticipated cleaning. Call free_server to release it.
    |                    */
    |  +                struct server *orig = ctx->obj2;
    |                   for (; ctx->obj2 != NULL;
    |                          ctx->obj2 = srv_drop(sv)) {
    |
    |                           sv = ctx->obj2;
    |  +                        printf("sv = %s\n", sv->id);
    |                           srv_take(sv);
    |  +                        if (!strcmp("t2", sv->id) && orig == px->srv) {
    |  +                                printf("deletion window: 3s\n");
    |  +                                thread_idle_now();
    |  +                                thread_harmless_now();
    |  +                                sleep(3);
    |  +                                thread_harmless_end();
    |  +
    |  +                                thread_idle_end();
    |  +
    |  +                                goto full; /* simulate full buffer */
    |  +                        }
    |
    |                           if (htx) {
    |                                   if (htx_almost_full(htx))
    |  @@ -4353,6 +4366,7 @@ static void http_stats_io_handler(struct appctx *appctx)
    |           struct channel *res = sc_ic(sc);
    |           struct htx *req_htx, *res_htx;
    |
    |  +        printf("http dump\n");
    |           /* only proxy stats are available via http */
    |           ctx->domain = STATS_DOMAIN_PROXY;
    |

Ok, we're ready, now we start haproxy with the following conf:

       global
          stats socket /tmp/ha.sock  mode 660  level admin  expose-fd listeners thread 1-1
          nbthread 2

       frontend stats
           mode http
           bind *:8081 thread 2-2
           stats enable
           stats uri /

       backend farm
               server t1 127.0.0.1:1899  disabled
               server t2 127.0.0.1:18999 disabled
               server t3 127.0.0.1:18998 disabled
               server t4 127.0.0.1:18997 disabled

And finally, we execute the following script:

       curl localhost:8081/stats&
       sleep .2
       echo "del server farm/t2" | nc -U /tmp/ha.sock
       echo "del server farm/t3" | nc -U /tmp/ha.sock

This should be enough to reveal the issue, I easily manage to
consistently crash haproxy with the following reproducer:

    http dump
    sv = t1
    http dump
    sv = t1
    sv = t2
    deletion window = 3s
    [NOTICE]   (2940566) : Server deleted.
    [NOTICE]   (2940566) : Server deleted.
    http dump
    sv = t2
    sv = �����U
    [1]    2940566 segmentation fault (core dumped)  ./haproxy -f ttt.conf

========================================================================

To fix this, we add prev_deleted mt_list in server struct.
For a given "visible" server, this list will contain the pending
"deleted" servers references that point to it using their 'next' ptr.

This way, whenever this "visible" server is going to be deleted via
cli_parse_delete_server() it will check for servers in its
'prev_deleted' list and update their 'next' pointer so that they no
longer point to it, and then it will push them in its
'next->prev_deleted' list to transfer the update responsibility to the
next 'visible' server (if next != NULL).

Then, following the same logic, the server about to be removed in
cli_parse_delete_server() will push itself as well into its
'next->prev_deleted' list (if next != NULL) so that it may still use its
'next' ptr for the time it is in transient removal state.

In srv_drop(), right before the server is finally freed, we make sure
to remove it from the 'next->prev_deleted' list so that 'next' won't
try to perform the pointers update for this server anymore.
This has to be done atomically to prevent 'next' srv from accessing a
purged server.

As a result:
  for a valid server, either deleted or not, 'next' ptr will always
  point to a non deleted (ie: visible) server.

With the proposed fix, and several removal combinations (including
unordered cli_parse_delete_server() and srv_drop() calls), I cannot
reproduce the crash anymore.

Example tricky removal sequence that is now properly handled:

sv list: t1,t2,t3,t4,t5,t6

ops:
   take(t2)
   del(t4)
   del(t3)
   del(t5)
   drop(t3)
   drop(t4)
   drop(t5)
   drop(t2)
2 files changed
tree: c477db7baf4fcbc804609cbbff614e9e12c15018