Commit 60e1b37

Author: Ole John Aske (committed)
Fix regression introduced by fix for bug#19524096.
That fix caused ndb_global_schema_lock_error to fail, as mysqld ended up in a deadlock between the TransporterFacade and ClusterMgr mutexes in ClusterMgr::is_cluster_completely_unavailable(). There are likely other deadlock scenarios as well.

The fix is to remove the Guard locking clusterMgrThreadMutex. The rationale for this, and its (lack of) correctness, is discussed in a mail pasted below.

There are also a few small improvements to ::is_cluster_completely_unavailable():
1. Don't copy the entire 'trp_node' struct; read it by reference instead.
2. Replace usage of 'm_impl->m_transporter_facade' with 'getTransporter()', as this is the pattern used elsewhere in this code block.

--------------- Pasted mail with some background ----------------

Subject: (Dead-) locking of ClusterMgr::theNodes[] structures

Hi

Writing this as a note to myself and others after having analyzed the failure of ndb_global_schema_lock_error.test. That test timed out as mysqld ended up in a deadlock between ClusterMgr::clusterMgrThreadMutex and TransporterFacade::theMutexPtr.

ClusterMgr maintains node state & info in theNodes[]. From external components, this info is accessed through ClusterMgr::getNodeInfo(). theNodes[] is only updated from within ClusterMgr; all external access is read only.

Updates to theNodes[] are done partly from within several ClusterMgr::exec<foo> methods, and partly from ClusterMgr::reportConnected() / ::reportDisconnected(). All updates seem to be done with ClusterMgr::clusterMgrThreadMutex locked.

Several ClusterMgr methods are available for inspecting node status from other components; these all use information from theNodes[].
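The 'Guard' the patch removes behaves like an RAII scoped lock: acquire a mutex on construction, release it when the enclosing scope exits, so no return path can leak the lock. A minimal standalone analogue, assuming that semantics (this is not the actual NDB Guard class; std::mutex stands in for NdbMutex):

```cpp
#include <cassert>
#include <mutex>

// Minimal RAII analogue of the removed 'Guard': lock on construction,
// unlock on destruction, so early returns cannot leave the mutex held.
class Guard {
public:
  explicit Guard(std::mutex& m) : m_mutex(m) { m_mutex.lock(); }
  ~Guard() { m_mutex.unlock(); }
  Guard(const Guard&) = delete;
  Guard& operator=(const Guard&) = delete;
private:
  std::mutex& m_mutex;
};

std::mutex clusterMgrThreadMutex;  // stand-in for the real member mutex
int guarded_value = 0;             // illustrative shared state

void update() {
  Guard g(clusterMgrThreadMutex);  // held until 'update' returns
  ++guarded_value;
}
```

The convenience is also the hazard: a Guard inside a getter silently nests inside whatever mutexes the caller already holds, which is exactly how the deadlock described above arises.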
Some of the more commonly used of these methods are:

- TransporterFacade::get_node_alive(n) (reads 'theClusterMgr->getNodeInfo(n)')
- TransporterFacade::get_an_alive_node() (implemented on top of ::get_node_alive(n))
- NdbImpl::get_node_stopping(n) (reads 'theClusterMgr->getNodeInfo(n)')
- NdbImpl::get_node_alive(n) (reads 'theClusterMgr->getNodeInfo(n)')
- NdbImpl::get<foo> ... a lot more node state getters ...

The locking schema used to provide atomicity of theNodes[] for the above methods is ... mixed, and not really defined at all as far as I can tell. Some examples:

- NdbDictInterface::dictSignal(), NdbDictInterface::forceGCPWait() & NdbDictInterface::listObjects(): Before calling ::get_node_alive() / ::get_an_alive_node(), a PollGuard is set. PollGuard calls trp_client::start_poll(), which is different pre/post 7.3:
  - Pre 7.3: trp_client::start_poll -> TransporterFacade::start_poll -> lock TransporterFacade mutex (a global mutex).
  - 7.3: trp_client::start_poll locks trp_client m_mutex, then calls TransporterFacade::start_poll ... no locking, and the mutex is gone in this version.
  Observation: there is no locking of ClusterMgr::clusterMgrThreadMutex here, neither pre nor post 7.3.
- Ndb_cluster_connection::wait_until_ready(), Ndb_cluster_connection_impl::get_next_alive_node() and Ndb_cluster_connection::get_no_ready(): These all set the TransporterFacade mutex.
- Ndb::sendRecSignal(): Sets a PollGuard as above, which locks either the TransporterFacade or the trp_client mutex.
- Ndb::sendPrepTrans(): Documents in comments that the TransporterFacade mutex should be locked prior to the call.

So this has become a total mess. It might seem like it was the intention prior to 7.3 that the TransporterFacade mutex should be held when accessing theNodes[], or any methods that access it.
After 7.3, a mix of the TransporterFacade and trp_client mutexes is effectively used.

Updating looks like it sets the ClusterMgr mutex to protect these, which will of course not work as the readers don't set this mutex. However, it could be that all updates happen at places where they are called from the TransporterFacade. Here we *used to* hold the TransporterFacade mutex prior to 7.3, which would make some sense. This all needs more investigation ... and some documentation in the code. In the current state there is certainly no effective concurrency protection of the node state info in 7.3+. It could be that it works in 7.1 & 7.2.

On top of this, the fix for bug#19524096 introduced ClusterMgr::is_cluster_completely_unavailable(), which is also based on the node status available in theNodes[]. Probably in good faith, backed by the fact that updates of theNodes[] were protected with clusterMgrThreadMutex, that method grabs that lock before accessing theNodes[] info. Actually, this method is *the only* node state getter which does any explicit locking ... and it was doomed to fail, as this was completely untested territory. See the other mail about how it could deadlock with the TransporterFacade mutex. Sidenote: any other node state getter attempting to follow the same locking pattern would likely have deadlocked the same way.

::is_cluster_completely_unavailable() is called from within code which also calls ::get_node_alive() and ::get_an_alive_node(), without using any locking protection for these.

Based on this I will propose a patch for the bug#19524096 regression which simply removes the mutex locking from ::is_cluster_completely_unavailable(). This will not be entirely kosher based on how the shared node state structs should have been mutex protected. However, based on my discussion above, there are already so many violations in this area that one more should not matter. A bigger effort should be taken to clean this up entirely.
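The deadlock analyzed above is the classic two-mutex lock-order inversion: one thread holds the TransporterFacade mutex and blocks on the ClusterMgr mutex while another thread holds them in the opposite order. A standalone sketch, using plain std::mutex rather than the NDB classes, of how a deadlock-avoiding acquisition (std::scoped_lock, which internally orders the locks) lets two such paths run concurrently without wedging:

```cpp
#include <cassert>
#include <mutex>
#include <thread>

std::mutex facade_mutex;       // stand-in for TransporterFacade::theMutexPtr
std::mutex cluster_mgr_mutex;  // stand-in for ClusterMgr::clusterMgrThreadMutex
int node_state_reads = 0;
int node_state_writes = 0;

// Both paths name the mutexes in opposite orders, which with plain
// sequential lock() calls is exactly the inversion that deadlocks.
// std::scoped_lock acquires its arguments deadlock-free regardless of
// the order they are listed in, so the two threads cannot wedge.
void reader_path() {
  for (int i = 0; i < 1000; ++i) {
    std::scoped_lock lk(facade_mutex, cluster_mgr_mutex);
    ++node_state_reads;
  }
}

void writer_path() {
  for (int i = 0; i < 1000; ++i) {
    std::scoped_lock lk(cluster_mgr_mutex, facade_mutex);  // opposite order
    ++node_state_writes;
  }
}

void run_both() {
  std::thread t1(reader_path);
  std::thread t2(writer_path);
  t1.join();
  t2.join();
}
```

This is only an illustration of the failure mode and one generic remedy; the patch itself takes the other way out and drops one of the two locks entirely.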
1 parent: 7e78ee8 · commit: 60e1b37

File tree: 2 files changed (+14, -8 lines)


storage/ndb/src/ndbapi/ClusterMgr.cpp

Lines changed: 9 additions & 3 deletions
@@ -1,5 +1,5 @@
 /*
-   Copyright (c) 2003, 2014, Oracle and/or its affiliates. All rights reserved.
+   Copyright (c) 2003, 2015, Oracle and/or its affiliates. All rights reserved.
 
    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
@@ -1053,10 +1053,16 @@ ClusterMgr::is_cluster_completely_unavailable()
 {
   bool ret_code = true;
 
-  Guard g(clusterMgrThreadMutex);
+  /**
+   * This method (and several other 'node state getters') allow
+   * reading of theNodes[] from multiple block threads while
+   * ClusterMgr concurrently updates them. Thus, a mutex should
+   * have been expected here. See bug#20391191, and addendum patches
+   * to bug#19524096, to understand what prevents us from locking (yet)
+   */
   for (NodeId n = 1; n < MAX_NDB_NODES ; n++)
   {
-    const trp_node node = getNodeInfo(n);
+    const trp_node& node = theNodes[n];
     if (!node.defined)
     {
       /**

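As a standalone illustration of the patched loop: binding a const reference avoids copying each trp_node per iteration, and the array is read directly with no lock, as the commit message argues other getters already do. The trp_node fields and the availability rule below are simplified assumptions for the sketch, not the real NDB definitions:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the NDB types; the real trp_node carries
// much more state than this.
struct trp_node {
  bool defined = false;
  bool connected = false;
};

constexpr std::size_t MAX_NDB_NODES = 49;
trp_node theNodes[MAX_NDB_NODES];  // stand-in for ClusterMgr::theNodes[]

// Sketch of the patched reader: iterate by const reference instead of
// copying each struct, and read theNodes[] directly (deliberately
// unlocked, per the discussion in the commit message). The "available"
// rule here (defined && connected) is an assumption for illustration.
bool is_cluster_completely_unavailable() {
  for (std::size_t n = 1; n < MAX_NDB_NODES; ++n) {
    const trp_node& node = theNodes[n];  // reference, no struct copy
    if (!node.defined)
      continue;
    if (node.connected)
      return false;  // at least one defined node is reachable
  }
  return true;
}
```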
storage/ndb/src/ndbapi/NdbDictionaryImpl.cpp

Lines changed: 5 additions & 5 deletions
@@ -1,5 +1,5 @@
 /*
-   Copyright (c) 2003, 2014, Oracle and/or its affiliates. All rights reserved.
+   Copyright (c) 2003, 2015, Oracle and/or its affiliates. All rights reserved.
 
    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
@@ -2384,7 +2384,7 @@ NdbDictInterface::dictSignal(NdbApiSignal* sig,
   }
   DBUG_PRINT("info", ("node %d", node));
   if(node == 0){
-    if (m_impl->m_transporter_facade->is_cluster_completely_unavailable())
+    if (getTransporter()->is_cluster_completely_unavailable())
     {
       m_error.code= 4009;
     }
@@ -5972,7 +5972,7 @@ NdbDictInterface::listObjects(NdbApiSignal* signal,
   PollGuard poll_guard(* m_impl);
   Uint16 aNodeId = getTransporter()->get_an_alive_node();
   if (aNodeId == 0) {
-    if (m_impl->m_transporter_facade->is_cluster_completely_unavailable())
+    if (getTransporter()->is_cluster_completely_unavailable())
     {
       m_error.code= 4009;
     }
@@ -6141,7 +6141,7 @@ NdbDictInterface::forceGCPWait(int type)
   PollGuard pg(* m_impl);
   Uint16 aNodeId = getTransporter()->get_an_alive_node();
   if (aNodeId == 0) {
-    if (m_impl->m_transporter_facade->is_cluster_completely_unavailable())
+    if (getTransporter()->is_cluster_completely_unavailable())
     {
       m_error.code= 4009;
     }
@@ -6183,7 +6183,7 @@ NdbDictInterface::forceGCPWait(int type)
   m_impl->lock();
   Uint16 aNodeId = getTransporter()->get_an_alive_node();
   if (aNodeId == 0) {
-    if (m_impl->m_transporter_facade->is_cluster_completely_unavailable())
+    if (getTransporter()->is_cluster_completely_unavailable())
     {
       m_error.code= 4009;
     }

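All four patched call sites share one shape: when no alive node can be found, ::is_cluster_completely_unavailable() decides whether error 4009 ('Cluster Failure', the code visible in the diff) applies. The branch taken when the cluster is only partially unavailable is elided in the diff, so the fallback value below is a hypothetical placeholder, not a real NDB error code:

```cpp
#include <cassert>

// 4009 appears in the diff; ERR_FALLBACK is a hypothetical placeholder
// for whatever the elided else-branch assigns in the real code.
constexpr int NDB_ERR_CLUSTER_FAILURE = 4009;
constexpr int ERR_FALLBACK = 0;  // placeholder, not a real NDB code

// Condensed shape of the patched call sites: pick the error based on
// whether the whole cluster is down or only the chosen node is gone.
int classify_no_alive_node(bool cluster_completely_unavailable) {
  if (cluster_completely_unavailable)
    return NDB_ERR_CLUSTER_FAILURE;
  return ERR_FALLBACK;
}
```

The patch does not change this decision; it only reaches the checker through getTransporter() instead of m_impl->m_transporter_facade, matching the surrounding code.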
0 commit comments
