Commit 3a6acfd

Ole John Aske authored and dahlerlend committed
Bug#36066725 Regular mgmd hangs when sending it a stop node for ndbmtd

Root cause: the mutexes 'theMultiTransporterMutex' and
'clusterMgrThreadMutex' are taken in opposite order in two call chains:

1) ClusterMgr::threadMain() -> lock() -> NdbMutex_Lock(clusterMgrThreadMutex)
   - ::threadMain(), holding clusterMgrThreadMutex
       -> TransporterFacade::startConnecting()
   - TF::startConnecting -> lockMultiTransporters()
       <<<< HANG while holding clusterMgrThreadMutex

2) TransporterRegistry::report_disconnect() -> lockMultiTransporters()
   - ::report_disconnect(), holding theMultiTransporterMutex,
       -> TransporterFacade::reportDisconnect()
   - TF::reportDisconnect -> ClusterMgr::reportDisconnected()
   - ClusterMgr::reportDisconnected() -> lock()
   - lock() -> NdbMutex_Lock(clusterMgrThreadMutex)
       <<<< Held by 1)

The patch changes TransporterRegistry::report_disconnect() so that
theMultiTransporterMutex is released before calling
reportDisconnect(NodeId).

It should be sufficient to hold theMultiTransporterMutex while
::report_disconnect() checks whether we are disconnecting a
multiTransporter and whether all its Trps are in DISCONNECTED state.
Once that check has finished, 'ready_to_disconnect' is set up and
theMultiTransporterMutex can be released before calling
reportDisconnect().

Change-Id: I19be0d9d92184efb8f20a92aa7189b9b85f069bc

1 parent 76d2002 commit 3a6acfd

1 file changed

+1
-1
lines changed

storage/ndb/src/common/transporter/TransporterRegistry.cpp

Lines changed: 1 addition & 1 deletion
@@ -2373,14 +2373,14 @@ void TransporterRegistry::report_disconnect(TransporterReceiveHandle &recvdata,
       remove_allTransporters(this_trp);
     }
   }  // End of multiTransporter DISCONNECT handling
+  unlockMultiTransporters();
 
   if (ready_to_disconnect)  // 5)
   {
     DEBUG_FPRINTF((stderr, "(%u) -> reportDisconnect(node_id=%u)\n",
                    localNodeId, node_id));
     recvdata.reportDisconnect(node_id, errnum);
   }
-  unlockMultiTransporters();
   DBUG_VOID_RETURN;
 }
 
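
The ordering change in the hunk above can be illustrated with a minimal, standalone sketch. It is not NDB code: the two `std::mutex` objects only stand in for theMultiTransporterMutex and clusterMgrThreadMutex, and every function name ending in `_sketch`, as well as `run_both_chains`, is hypothetical. It shows the pattern the commit message describes: compute `ready_to_disconnect` while holding the first mutex, release it, and only then call into code that takes the second mutex.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two mutexes named in the commit message.
std::mutex multi_transporter_mutex;  // role of theMultiTransporterMutex
std::mutex cluster_mgr_mutex;        // role of clusterMgrThreadMutex

std::atomic<int> disconnects_reported{0};

// Callee of chain 2: takes the cluster manager mutex, in the way
// ClusterMgr::reportDisconnected() -> lock() is described above.
void report_disconnected_sketch() {
  std::lock_guard<std::mutex> guard(cluster_mgr_mutex);
  ++disconnects_reported;
}

// Chain 2 after the patch: decide under the transporter mutex,
// release it, then call out without holding it.
void report_disconnect_sketch(bool all_trps_disconnected) {
  bool ready_to_disconnect = false;
  {
    std::lock_guard<std::mutex> guard(multi_transporter_mutex);
    ready_to_disconnect = all_trps_disconnected;
  }  // transporter mutex released here, before the callback
  if (ready_to_disconnect) {
    report_disconnected_sketch();
  }
}

// Chain 1: holds the cluster manager mutex, then takes the
// transporter mutex (threadMain -> startConnecting in the message).
void start_connecting_sketch() {
  std::lock_guard<std::mutex> outer(cluster_mgr_mutex);
  std::lock_guard<std::mutex> inner(multi_transporter_mutex);
}

// Runs both chains concurrently and returns how many disconnects
// were reported. Chain 2 never holds both mutexes at once, so the
// two threads cannot wait on each other in a cycle.
int run_both_chains(int iterations) {
  assert(iterations >= 0);
  disconnects_reported = 0;
  std::thread t1([iterations] {
    for (int i = 0; i < iterations; ++i) start_connecting_sketch();
  });
  std::thread t2([iterations] {
    for (int i = 0; i < iterations; ++i) report_disconnect_sketch(true);
  });
  t1.join();
  t2.join();
  return disconnects_reported.load();
}
```

Calling `run_both_chains(1000)` from a test or a `main()` exercises both chains against each other. If `report_disconnect_sketch` instead called `report_disconnected_sketch()` inside the inner scope, the two threads would take the mutexes in opposite order and could hang, which is the situation the bug report describes.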