Fix regression introduced by fix for bug#19524096.
That fix caused ndb_global_schema_lock_error to fail as mysqld
ended up in a deadlock between the TransporterFacade and ClusterMgr mutexes
in ClusterMgr::is_cluster_completely_unavailable(). There are likely
other deadlock scenarios as well.
The fix is to remove the Guard locking clusterMgrThreadMutex.
The rationale for this, and its (lack of) correctness, are discussed in
a mail pasted below.
There are also a few small improvements to ::is_cluster_completely_unavailable()
(a small illustrative sketch follows this list):
1. Don't copy the entire 'trp_node' struct; read it by reference instead.
2. Replace usage of 'm_impl->m_transporter_facade' with 'getTransporter()',
   as this is the pattern used elsewhere in this code block.
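For illustration, here is a minimal self-contained sketch of the access pattern
the patched method ends up with: no Guard on clusterMgrThreadMutex, and each
theNodes[] entry read through a const reference instead of being copied. The
types below are simplified stand-ins (the 'defined' and 'm_alive' fields are
assumptions), not the real NDB API structures or the literal patch:

  #include <cstdio>

  /* Simplified stand-ins for the real NDB API types. */
  struct trp_node {
    bool defined;      /* slot is in use (assumed field)           */
    bool m_alive;      /* node currently reachable (assumed field) */
  };

  static const int MAX_NODES = 49;

  struct ClusterMgrSketch {
    trp_node theNodes[MAX_NODES];

    /* Shape of ::is_cluster_completely_unavailable() after the fix:
     * no 'Guard g(clusterMgrThreadMutex)' any more (it could deadlock
     * against a caller already holding the TransporterFacade mutex),
     * and each node entry is read by reference rather than copied. */
    bool is_cluster_completely_unavailable() const {
      for (int n = 1; n < MAX_NODES; n++) {
        const trp_node& node = theNodes[n];  /* by reference, no struct copy */
        if (node.defined && node.m_alive)
          return false;                      /* at least one usable node */
      }
      return true;                           /* nothing reachable at all */
    }
  };

  int main() {
    ClusterMgrSketch mgr = {};
    std::printf("unavailable=%d\n", mgr.is_cluster_completely_unavailable()); /* 1 */
    mgr.theNodes[3] = { true, true };
    std::printf("unavailable=%d\n", mgr.is_cluster_completely_unavailable()); /* 0 */
    return 0;
  }

The second listed improvement (using getTransporter() instead of
m_impl->m_transporter_facade) is a consistency change in the surrounding code
block and is not shown here.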
--------------- Pasted mail with some background ----------------
Subject: (Dead-) locking of ClusterMgr::theNodes[] structures
Hi
Writing this as a note to myself and others after having analyzed
the failure of ndb_global_schema_lock_error.test. That test
timed out as mysqld ended up in a deadlock between
ClusterMgr::clusterMgrThreadMutex and TransporterFacade::theMutexPtr.
ClusterMgr maintains node state & info in theNodes[]. From external
components, this info is accessed through ClusterMgr::getNodeInfo().
theNodes[] is only updated from within ClusterMgr; all external access
is read only.
Updates to theNodes[] are partly done from within several ClusterMgr::exec<foo>
methods, and partly from ClusterMgr::reportConnected() / ::reportDisconnected().
All updates seem to be done with ClusterMgr::clusterMgrThreadMutex locked.
Several ClusterMgr methods are available for inspecting node status
from other components; these all use information from theNodes[].
Some of the more commonly used of these methods are (see the sketch after this list):
- TransporterFacade::get_node_alive(n) (Reads 'theClusterMgr->getNodeInfo(n)')
- TransporterFacade::get_an_alive_node() (Implemented on top of ::get_node_alive(n))
- NdbImpl::get_node_stopping(n) (Reads 'theClusterMgr->getNodeInfo(n)')
- NdbImpl::get_node_alive(n) (Reads 'theClusterMgr->getNodeInfo(n)')
- NdbImpl::get<foo> ...A lot more node state getters....
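As a rough, self-contained sketch of how these getters hang together (the
types are simplified stand-ins, and the m_alive field and exact signatures
are assumptions, not the real headers); the point is that they all funnel
through ClusterMgr::getNodeInfo() without any locking of their own:

  typedef unsigned NodeId;
  static const NodeId MAX_NODES = 49;

  /* Simplified stand-ins -- the real ClusterMgr/TransporterFacade carry
   * far more state than this. */
  struct trp_node { bool m_alive; };

  struct ClusterMgr {
    trp_node theNodes[MAX_NODES];
    /* the single access point used by external components */
    const trp_node& getNodeInfo(NodeId n) const { return theNodes[n]; }
  };

  struct TransporterFacade {
    ClusterMgr* theClusterMgr;

    /* TransporterFacade::get_node_alive(n): a thin read of ClusterMgr state */
    bool get_node_alive(NodeId n) const {
      return theClusterMgr->getNodeInfo(n).m_alive;
    }

    /* TransporterFacade::get_an_alive_node(): built on top of get_node_alive() */
    NodeId get_an_alive_node() const {
      for (NodeId n = 1; n < MAX_NODES; n++)
        if (get_node_alive(n))
          return n;
      return 0;                    /* 0 == no alive node found */
    }
  };

  int main() {
    ClusterMgr mgr = {};
    TransporterFacade facade = { &mgr };
    mgr.theNodes[7].m_alive = true;                /* pretend node 7 is alive */
    return facade.get_an_alive_node() == 7 ? 0 : 1;
  }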
The locking scheme used to provide atomicity of theNodes[] for the above
methods is ... mixed, and not really defined at all as far as I can tell.
Some examples:
- NdbDictInterface::dictSignal(), NdbDictInterface::forceGCPWait() & NdbDictInterface::listObjects():
Before calling get_node_alive() / ::get_an_alive_node(), a PollGuard is set.
PollGuard calls trp_client::start_poll(), which is different pre/post 7.3:
  - Pre 7.3: trp_client::start_poll
      -> TransporterFacade::start_poll
      -> lock TransporterFacade mutex (a global mutex)
  - 7.3: trp_client::start_poll, lock trp_client m_mutex
      -> TransporterFacade::start_poll ...no locking, and the mutex is gone in this version
Observation: There is no locking of ClusterMgr::clusterMgrThreadMutex here,
neither pre nor post 7.3.
- Ndb_cluster_connection::wait_until_ready(),
  Ndb_cluster_connection_impl::get_next_alive_node(), and
  Ndb_cluster_connection::get_no_ready():
  These all set the TransporterFacade mutex.
- Ndb::sendRecSignal
  Sets a PollGuard as above, which locks either the TransporterFacade or
  the trp_client mutex.
- Ndb::sendPrepTrans()
  Documents in comments that the TransporterFacade mutex should be locked prior to the call.
So this has become a total mess. It might seem like the intention prior
to 7.3 was that the TransporterFacade mutex should be held
when accessing theNodes[], or when calling any method that accesses it.
After 7.3 a mix of the TransporterFacade and trp_client mutexes is effectively used.
The updaters look like they take the ClusterMgr mutex to protect these, which will
of course not work as the readers don't take this mutex. However, it could be
that all updates happen at places where they are called from the TransporterFacade.
There we *used to* hold the TransporterFacade mutex prior to 7.3, which would make
some sense. This all needs more investigation ... and some documentation in the code...
In the current state there is certainly no effective concurrency protection of
the node state info in 7.3+. It could be that it works in 7.1 & 7.2.
On top of this, the fix for bug#19524096 introduced
ClusterMgr::is_cluster_completely_unavailable(), which is also based on
the node status available in theNodes[]. Probably in good faith,
backed by the fact that updates of theNodes[] were protected with clusterMgrThreadMutex,
that method grabs that lock before accessing the theNodes[] info.
Actually, this method is *the only* node state getter which
does any explicit locking ... and it was doomed to fail, as this
was completely untested territory. See the other mail about how
it could deadlock with the TransporterFacade mutex.
Sidenote: any other node state getter attempting to follow the
same locking pattern would likely have deadlocked the same way (a generic
illustration of the lock-order inversion follows below).
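To make the deadlock concrete, here is a generic, self-contained illustration
of the lock-order inversion using plain std::mutex (not the actual NDB mutexes
or call paths, just the same locking shape):

  #include <mutex>

  /* 'facade_mutex' stands in for the TransporterFacade mutex,
   * 'cluster_mgr_mutex' for ClusterMgr::clusterMgrThreadMutex. */
  std::mutex facade_mutex;
  std::mutex cluster_mgr_mutex;

  /* Path 1: an API client, already inside a PollGuard (facade mutex held),
   * calls a node state getter that then takes the ClusterMgr mutex. */
  void client_path() {
    std::lock_guard<std::mutex> facade(facade_mutex);    /* lock A ... */
    std::lock_guard<std::mutex> mgr(cluster_mgr_mutex);  /* ... then B */
    /* ... read theNodes[] ... */
  }

  /* Path 2: the ClusterMgr thread, holding its own mutex while handling a
   * node event, calls into the facade and takes the facade mutex. */
  void cluster_mgr_path() {
    std::lock_guard<std::mutex> mgr(cluster_mgr_mutex);  /* lock B ... */
    std::lock_guard<std::mutex> facade(facade_mutex);    /* ... then A */
    /* ... deliver a node state change ... */
  }

  int main() {
    /* Run sequentially this is harmless; run concurrently from two threads,
     * each can grab its first mutex and then block forever on the other's --
     * the deadlock seen in ndb_global_schema_lock_error. */
    client_path();
    cluster_mgr_path();
    return 0;
  }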
::is_cluster_completely_unavailable() is called from within code
which also calls ::get_node_alive() and ::get_an_alive_node(),
without using any locking protection for these. Based on this I will
propose a patch for the bug#19524096 regression, which simply
removes the mutex locking from ::is_cluster_completely_unavailable().
This will not be entirely kosher, given how the shared node state
structs should have been mutex protected. However, based on my discussion
above, there are already so many violations in this area that one
more should not matter. A bigger effort should be made to clean this up entirely.
Bug#20379693: MISSING BREAK STATEMENT IN SWITCH "MYSQLD_GET_ONE_OPTION" FUNCTION IN MYSQLD.C
Description:
===========
Missing break statements in switch case within
"mysqld_get_one_option" function.
    case OPT_BINLOGGING_IMPOSSIBLE_MODE:
      WARN_DEPRECATED(NULL, "--binlogging_impossible_mode",
                      "'--binlog_error_action'");
    case OPT_SIMPLIFIED_BINLOG_GTID_RECOVERY:
      WARN_DEPRECATED(NULL,
                      "--simplified_binlog_gtid_recovery",
                      "'--binlog_gtid_simple_recovery'");
Fix:
===
Added missing break statements.
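For clarity, here is a self-contained sketch of the corrected cases; the
WARN_DEPRECATED stub and the enum values below are just scaffolding so the
fragment compiles on its own, not the real mysqld.cc definitions:

  #include <cstdio>

  /* Stub scaffolding -- the real definitions live in the server sources. */
  #define WARN_DEPRECATED(thd, old_opt, new_opt) \
    std::printf("'%s' is deprecated, use %s instead\n", old_opt, new_opt)
  enum { OPT_BINLOGGING_IMPOSSIBLE_MODE = 1,
         OPT_SIMPLIFIED_BINLOG_GTID_RECOVERY };

  /* Sketch of the fixed switch cases in mysqld_get_one_option(): each
   * deprecation warning is now followed by its own 'break', so the first
   * case no longer falls through into the second. */
  void handle_option(int optid) {
    switch (optid) {
    case OPT_BINLOGGING_IMPOSSIBLE_MODE:
      WARN_DEPRECATED(NULL, "--binlogging_impossible_mode",
                      "'--binlog_error_action'");
      break;                                   /* was missing */
    case OPT_SIMPLIFIED_BINLOG_GTID_RECOVERY:
      WARN_DEPRECATED(NULL,
                      "--simplified_binlog_gtid_recovery",
                      "'--binlog_gtid_simple_recovery'");
      break;                                   /* was missing */
    default:
      break;
    }
  }

  int main() {
    handle_option(OPT_BINLOGGING_IMPOSSIBLE_MODE);   /* warns once, not twice */
    return 0;
  }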
Updated the copyright year in the welcome message for MySQL
Bug #19183565 CREATE DYNAMIC INNODB_TMPDIR VARIABLE TO CONTROL WHERE INNODB WRITES TEMP FILES - Reverting the patch.
Bug #19875710 NDB : COMMIT ACK MARKERS LEAK @ LQH IN 7.4
Fix TC ref counting of Commit Ack markers in LQH so as not
to leak markers at LQH.
TC has one TC-Commit-Ack-Marker record per transaction which
is used to track which nodes and LDM instances hold
LQH-Commit-Ack-Marker records.
This is used when receiving TC_COMMIT_ACK to know which nodes
and LDM instances should be sent a REMOVE_MARKER_ORD signal.
TC only needs one operation per transaction to have
LQH-Commit-Ack-marker records (in each live node in one
nodegroup), so the approach taken is to request them
for all write operations until one of the write
operations succeeds (and keeps its marker at LQH).
After this, subsequent write operations needn't allocate
markers at LQH.
Write operations that don't succeed and don't immediately
cause a transaction abort (e.g. those defined with
IgnoreError which, for example, find no matching row or find that the
row already exists) are aborted (and discarded at LQH), and so they
leave no LQH-Commit-Ack marker.
Where a transaction prepares write operations that all fail at
LQH, there will be no LQH-Commit-Ack markers, and so no need
for a TC-Commit-Ack marker. This is handled using a reference
count of how many LQH-Commit-Ack markers have been requested
*or acknowledged*. If this becomes == 0 then there's no need
for a TC-Commit-Ack marker.
TC uses a per-transaction state and a per-transaction reference
counter to manage this.
The bug is that the reference count was only covering the
outstanding requests, and not the LQH-Commit-Ack markers that
were acknowledged. In other words, the reference count was
decremented in execLQHKEYCONF, even though the LQHKEYCONF signified
that an LQH-Commit-Ack marker had been allocated on that LQH instance.
In certain situations this resulted in the allocated LQH-Commit-Ack
markers being leaked, and eventually this causes the cluster to become
read only, as new write operations cannot allocate LQH-Commit-Ack markers.
Bug seems to have been added as part of
Bug #19451060 BUG#73339 IN MYSQL BUG SYSTEM, NDBREQUIRE INCORRECT
The fix is to *not* decrement the reference count in execLQHKEYCONF.
However, the current implementation 'forgets' that an operation resulted in
marker allocation (and reference count increment) after LQHKEYCONF is
processed.
To solve this, TC is modified to record which operations caused
LQH-Commit-Ack markers to be allocated, so that during the
per-operation phase of transaction ABORT or COMMIT, the
reference count can be decremented and so re-checked for
consistency.
Some additional jam()s and comments are added.
A new ndbinfo.ndb$pools pool is added - LQH Commit Ack Markers.
This is used in the testcase to ensure that all LQH Commit Ack
markers are released, and may be useful for problem diagnosis
in future.
Replication is used in the test to get batching of write operations
and NdbApi AO_IgnoreError flag setting.
Some basic transaction abort testcases are added which showed problems
with a partial fix.