29bc8e9dabe249ae7ffbbd4c34973456afc88cb7 - haproxy

commit	29bc8e9dabe249ae7ffbbd4c34973456afc88cb7
author	Aurelien DARRAGON <adarragon@haproxy.com>	Wed Mar 29 16:18:50 2023 +0200
committer	Christopher Faulet <cfaulet@haproxy.com>	Mon Apr 24 11:47:31 2023 +0200
tree	2d840482461d7ae0dfdda3fd6e83d0ea6166a292
parent	644358e0d35fd07aa3057388a3cda5a048db1684

BUG/MEDIUM: proxy/sktable: prevent watchdog trigger on soft-stop

During soft-stop, manage_proxy() (p->task) will try to purge
trashable (expired and not referenced) sticktable entries,
effectively releasing the process memory to leave some space
for new processes.

This is done by calling stktable_trash_oldest(), immediately
followed by a pool_gc() to give the memory back to the OS.

As already mentioned in dfe7925 ("BUG/MEDIUM: stick-table:
limit the time spent purging old entries"), calling
stktable_trash_oldest() with a huge batch can result in the function
spending too much time searching and purging entries, and ultimately
triggering the watchdog.

Recently, an internal issue was reported in which we could see
that the watchdog was being triggered in stktable_trash_oldest()
on soft-stop (thus initiated by manage_proxy()).

According to the report, the crash seems to occur only since 5938021
("BUG/MEDIUM: stick-table: do not leave entries in end of window during purge").

This could be the result of stktable_trash_oldest() now working
as expected, and thus spending a large amount of time purging
entries when called with a large enough <to_batch>.

Instead of adding new checks in stktable_trash_oldest(), here we
chose to address the issue directly in manage_proxy().

Since stktable_trash_oldest() is called with
<to_batch> == <p->table->current>, a single call may end up purging
almost the whole table. This is likely to cause issues during soft-stop
if a large table, full prior to the soft-stop, suddenly sees most of
its entries become trashable because of the soft-stop.
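To make the problem concrete, here is a minimal C model of the pre-fix
purge step. It is only a sketch under stated assumptions: struct
mock_table, mock_trash_oldest() and purge_once_unbounded() are
hypothetical stand-ins, not HAProxy code, and the stand-in simply
assumes that the purge function trashes up to <to_batch> trashable
entries and returns how many it removed.

```c
#include <assert.h>

/* Hypothetical stand-in for a stick-table: only the counters matter here. */
struct mock_table {
	int current;   /* entries currently stored */
	int trashable; /* entries expired and no longer referenced */
};

/* Hypothetical stand-in for stktable_trash_oldest(): purges at most
 * <to_batch> trashable entries and returns how many were purged.
 */
static int mock_trash_oldest(struct mock_table *t, int to_batch)
{
	int n;

	assert(to_batch >= 0);
	n = t->trashable < to_batch ? t->trashable : to_batch;
	t->trashable -= n;
	t->current -= n;
	return n;
}

/* Pre-fix behaviour as described in the commit message: the batch size
 * is the whole table, so the work done in this single task run grows
 * with the table size.
 */
static int purge_once_unbounded(struct mock_table *t)
{
	return mock_trash_oldest(t, t->current);
}
```

With a table of 1,000,000 entries that all become trashable at once,
purge_once_unbounded() removes all 1,000,000 of them in one call, i.e.
within a single task run.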

Moreover, we should note that the call to stktable_trash_oldest() is
immediately followed by a call to pool_gc():

We know for sure that pool_gc(), as it involves malloc_trim() on
glibc, is rather expensive, and the more memory to reclaim,
the longer the call.

We need to ensure that stktable_trash_oldest() and the subsequent
pool_gc() call both theoretically fit in a single task execution window
to avoid contention, and thus prevent the watchdog from being triggered.

To do this, we now allocate a "budget" for each purging attempt.
The budget is capped at 32K, which means that each sticktable cleanup
attempt will trash at most 32K entries.
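A sketch of how such a budget might be computed, assuming "32K" means
32768 (1 << 15) and that the budget is the table's current size capped
at that value. purge_budget() and PURGE_BUDGET_MAX are hypothetical
names, not taken from the actual patch.

```c
#include <assert.h>

/* Assumed meaning of "32K": 32768 entries per cleanup attempt. */
#define PURGE_BUDGET_MAX (1 << 15)

/* Hypothetical helper: the batch handed to the purge function is the
 * table's current size, capped at PURGE_BUDGET_MAX.
 */
static int purge_budget(int current)
{
	assert(current >= 0);
	return current < PURGE_BUDGET_MAX ? current : PURGE_BUDGET_MAX;
}
```

Small tables are still purged in one go (purge_budget(100) is 100),
while a table holding 1,000,000 entries gets a budget of 32768.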

The 32K value is quite arbitrary here, and might need to be adjusted
or even derived from other parameters if it fails to properly address
the issue or introduces new side-effects.
The goal is to find a good balance between the max duration of each
cleanup batch and the frequency of (expensive) pool_gc() calls.
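The trade-off can be put in numbers with a ceiling division. This is a
rough sketch that assumes every cleanup attempt uses its full budget
except the last one, and that each attempt freeing entries is followed
by exactly one pool_gc() call; purge_attempts() is a hypothetical
helper.

```c
#include <assert.h>

/* Hypothetical helper: number of cleanup attempts needed to trash
 * <trashable> entries when each attempt may trash at most <budget>
 * entries. Under the assumptions above, this is also roughly the
 * number of pool_gc() calls spent on the purge.
 */
static long purge_attempts(long trashable, long budget)
{
	assert(trashable >= 0 && budget > 0);
	return (trashable + budget - 1) / budget; /* ceiling division */
}
```

For 1,000,000 trashable entries, a 32768-entry budget gives 31 attempts
(and so about 31 pool_gc() calls), whereas a 1000-entry budget would
give 1000 shorter attempts, each paying for its own pool_gc().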

If most of the budget is actually spent trashing entries, then the task
will immediately be rescheduled to continue the purge.
This way, the purge is effectively batched over multiple task runs.
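The rescheduling logic can be simulated with the minimal model below.
It assumes that "most of the budget" means "more than half of it" and
that a task run on an empty table does no work. mock_table, purge_run()
and purge_until_idle() are hypothetical, and the simulated pool_gc() is
just a counter.

```c
#include <assert.h>

#define PURGE_BUDGET_MAX (1 << 15) /* assumed meaning of "32K" */

/* Hypothetical stand-in for a stick-table: only the counters matter. */
struct mock_table {
	int current;   /* entries currently stored */
	int trashable; /* expired and unreferenced entries */
};

struct purge_stats {
	int runs;     /* task runs on a non-empty table */
	int gc_calls; /* simulated pool_gc() calls */
};

/* One simulated task run: trash at most one budget worth of entries.
 * Returns 1 if the task should be rescheduled immediately, 0 otherwise.
 */
static int purge_run(struct mock_table *t, struct purge_stats *st)
{
	int budget, cleaned;

	assert(t->trashable <= t->current);
	if (!t->current)
		return 0; /* empty table: nothing to do */

	budget = t->current < PURGE_BUDGET_MAX ? t->current : PURGE_BUDGET_MAX;
	cleaned = t->trashable < budget ? t->trashable : budget;
	t->trashable -= cleaned;
	t->current -= cleaned;
	st->runs++;

	if (!cleaned)
		return 0;
	st->gc_calls++; /* stands for releasing freed memory via pool_gc() */

	/* assumption: "most of the budget" == more than half of it */
	return cleaned > budget / 2;
}

/* Runs the simulated task back to back while it asks to be rescheduled. */
static struct purge_stats purge_until_idle(struct mock_table *t)
{
	struct purge_stats st = { 0, 0 };

	while (purge_run(t, &st))
		;
	return st;
}
```

Starting from 100,000 entries that are all trashable, the purge is
split into 4 task runs (32768 + 32768 + 32768 + 1696 entries), each
followed by one simulated pool_gc() call. If only 10,000 of the 100,000
entries are trashable, the first run cleans all of them, uses less than
half of its budget and does not ask to be rescheduled.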

This may be slowly backported to all stable versions.
[Please note that this commit depends on 6e1fe25 ("MINOR: proxy/pool:
prevent unnecessary calls to pool_gc()")]

(cherry picked from commit 7f01f0a8efae25d14176d43b149aeeceb4df6116)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 98f94b377ca6c5ceb718a568fc10a1e66a24f253)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit 103cd522990c3658f9b0d343b35189aa4bb86d82)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
(cherry picked from commit ce31f3e906bf7f485452cb11ca4ed665f5b64bbb)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>

src/proxy.c

1 file changed
