Menu

Commit [r9557]  Maximize  Restore  History

qla2x00t-32gbit: Fix hang during NVMe session tear down

The following hung task call trace was seen:

[ 1230.183294] INFO: task qla2xxx_wq:523 blocked for more than 120 seconds.
[ 1230.197749] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1230.205585] qla2xxx_wq D 0 523 2 0x80004000
[ 1230.205636] Workqueue: qla2xxx_wq qlt_free_session_done [qla2xxx]
[ 1230.205639] Call Trace:
[ 1230.208100] __schedule+0x2c4/0x700
[ 1230.211607] schedule+0x38/0xa0
[ 1230.214769] schedule_timeout+0x246/0x2f0
[ 1230.222651] wait_for_completion+0x97/0x100
[ 1230.226921] qlt_free_session_done+0x6a0/0x6f0 [qla2xxx]
[ 1230.232254] process_one_work+0x1a7/0x360

...when device side port resets were done.

Abort threads were getting out without processing due to the "deleted"
flag check. The delete thread, meanwhile, could not proceed with a
logout (that would have cleared out pending requests) as the logout IOCB
work was not progressing. It appears like the hung qlt_free_session_done()
thread is causing the ha->wq works on hold. The qlt_free_session_done()
was hung waiting for nvme_fc_unregister_remoteport() + localport_delete cb
to be complete, which would only happen when all I/Os are released.

Fix this by allowing abort to progress until device delete is completely
done. This should make the qlt_free_session_done() proceed without hang and
thus clear up the deadlock.

Link: https://lore.kernel.org/r/20210817051315.2477-5-njavali@...
Signed-off-by: Arun Easi <aeasi@...>
Signed-off-by: Nilesh Javali <njavali@...>
Signed-off-by: Martin K. Petersen <martin.petersen@...>
[ commit 310e69edfbd57995868a428eeddea09a7b5d2749 upstream ]

bvassche 2021-09-05

changed /trunk/qla2x00t-32gbit/qla_nvme.c
/trunk/qla2x00t-32gbit/qla_nvme.c Diff Switch to side-by-side view
Loading...
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.