| From git@z Thu Jan 1 00:00:00 1970 |
| Subject: [PATCH v2] napi: fix race inside napi_enable |
| From: Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
| Date: Sat, 18 Sep 2021 16:52:32 +0800 |
| Message-Id: <20210918085232.71436-1-xuanzhuo@linux.alibaba.com> |
| To: netdev@vger.kernel.org, linyunsheng@huawei.com |
| Cc: "David S. Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>, Eric Dumazet <edumazet@google.com>, Daniel Borkmann <daniel@iogearbox.net>, Antoine Tenart <atenart@kernel.org>, Alexander Lobakin <alobakin@pm.me>, Wei Wang <weiwan@google.com>, Taehee Yoo <ap420073@gmail.com>,Björn Töpel <bjorn@kernel.org>, Arnd Bergmann <arnd@arndb.de>, Kumar Kartikeya Dwivedi <memxor@gmail.com>, Neil Horman <nhorman@redhat.com>, Dust Li <dust.li@linux.alibaba.com> |
| List-Id: <netdev.vger.kernel.org> |
| MIME-Version: 1.0 |
| Content-Type: text/plain; charset="utf-8" |
| Content-Transfer-Encoding: 7bit |
| |
| The sequence below can leave napi.state with NAPI_STATE_SCHED set while |
| the napi is not on any poll_list, which causes napi_disable() to get stuck. |
| |
| The "NAPI_STATE_" prefix is omitted in the figure below, and |
| NAPI_STATE_HASHED is left out of napi.state. |
| |
| CPU0                           | CPU1                             | napi.state |
| =============================================================================== |
| napi_disable()                 |                                  | SCHED | NPSVC |
| napi_enable()                  |                                  | |
| {                              |                                  | |
|   smp_mb__before_atomic();     |                                  | |
|   clear_bit(SCHED, &n->state); |                                  | NPSVC |
|                                | napi_schedule_prep()             | SCHED | NPSVC |
|                                | napi_poll()                      | |
|                                |   napi_complete_done()           | |
|                                |   {                              | |
|                                |     if (n->state & (NPSVC |      | (1) |
|                                |                     _BUSY_POLL)) | |
|                                |       return false;              | |
|                                |     ................             | |
|                                |   }                              | SCHED | NPSVC |
|                                |                                  | |
|   clear_bit(NPSVC, &n->state); |                                  | SCHED |
| }                              |                                  | |
|                                |                                  | |
| napi_schedule_prep()           |                                  | SCHED | MISSED (2) |
| |
| (1) napi_complete_done() returns early here because NAPI_STATE_NPSVC is set. |
| (2) NAPI_STATE_SCHED is already set, so napi.poll_list is not added to |
| sd->poll_list. |
| |
| Since NAPI_STATE_SCHED is already set and the napi is not on the |
| sd->poll_list queue, nothing will ever poll it, so NAPI_STATE_SCHED is |
| never cleared. |
| |
| 1. This queue no longer receives packets. |
| 2. If napi_disable() is then called with rtnl_lock held, it never |
| returns, so rtnl_lock stays held and the whole system is affected. |
| |
| This patch implements napi_enable() with cmpxchg so that both bits are |
| cleared in a single atomic update, which closes the race caused by |
| clearing them separately. |
| |
| Fixes: 2d8bff12699abc ("netpoll: Close race condition between poll_one_napi and napi_disable") |
| Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> |
| Reviewed-by: Dust Li <dust.li@linux.alibaba.com> |
| --- |
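For reviewers who want to poke at the idea outside the kernel, here is a
minimal userspace sketch of the same compare-and-swap loop using C11
atomics. It is only a model, not the kernel code: the STATE_* masks,
model_enable() and model_schedule_prep() are hypothetical stand-ins, and
model_schedule_prep() is deliberately simplified (it ignores the DISABLE
and MISSED handling done by the real napi_schedule_prep()).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the NAPIF_STATE_* masks; values are illustrative. */
#define STATE_SCHED    (1UL << 0)
#define STATE_NPSVC    (1UL << 1)
#define STATE_THREADED (1UL << 2)

/* Model of the patched napi_enable(): clear SCHED and NPSVC in one atomic step. */
static void model_enable(_Atomic unsigned long *state, bool threaded)
{
    unsigned long val = atomic_load(state);
    unsigned long new;

    do {
        assert(val & STATE_SCHED);  /* plays the role of the BUG_ON() */
        new = val & ~(STATE_SCHED | STATE_NPSVC);
        if (threaded)
            new |= STATE_THREADED;
        /* On failure, val is refreshed with the current state and we retry. */
    } while (!atomic_compare_exchange_weak(state, &val, new));
}

/* Simplified model of napi_schedule_prep(): true only if SCHED was not set. */
static bool model_schedule_prep(_Atomic unsigned long *state)
{
    unsigned long old = atomic_fetch_or(state, STATE_SCHED);

    return !(old & STATE_SCHED);
}
```

Because SCHED and NPSVC disappear in the same update, no other CPU can
observe the intermediate "NPSVC only" state shown in the figure, so a
concurrent scheduler either sees the napi still owned (SCHED set) or fully
enabled.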
| net/core/dev.c | 16 ++++++++++------ |
| 1 file changed, 10 insertions(+), 6 deletions(-) |
| |
| diff --git a/net/core/dev.c b/net/core/dev.c |
| index 74fd402d26dd..7ee9fecd3aff 100644 |
| --- a/net/core/dev.c |
| +++ b/net/core/dev.c |
| @@ -6923,12 +6923,16 @@ EXPORT_SYMBOL(napi_disable); |
|   */ |
|  void napi_enable(struct napi_struct *n) |
|  { |
| -	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state)); |
| -	smp_mb__before_atomic(); |
| -	clear_bit(NAPI_STATE_SCHED, &n->state); |
| -	clear_bit(NAPI_STATE_NPSVC, &n->state); |
| -	if (n->dev->threaded && n->thread) |
| -		set_bit(NAPI_STATE_THREADED, &n->state); |
| +	unsigned long val, new; |
| + |
| +	do { |
| +		val = READ_ONCE(n->state); |
| +		BUG_ON(!test_bit(NAPI_STATE_SCHED, &val)); |
| + |
| +		new = val & ~(NAPIF_STATE_SCHED | NAPIF_STATE_NPSVC); |
| +		if (n->dev->threaded && n->thread) |
| +			new |= NAPIF_STATE_THREADED; |
| +	} while (cmpxchg(&n->state, val, new) != val); |
|  } |
|  EXPORT_SYMBOL(napi_enable); |
| |
| |
| -- |
| 2.31.0 |
| |
| |