WL#13574 Include MDL and ACL locks in MTS deadlock detection infra-structure [post-push]
Description
-----------
This patch addresses three issues introduced by the WL#13574 patch:
1) Multi-threaded applier stopping with a segmentation fault in ARM
environments.
2) Pure virtual method invocation error in MDL graph infra-structure, being
sporadically observed.
3) The `rpl_gtid.rpl_gtid_mts_spco_deadlock_other_locks` test-case is
failing in the 8.0 and trunk branches.
Analysis/Fix
------------
The analysis and proposed fix for each of the above issues:
1) Within `memory::Aligned_atomic`, the L1 cache-line size is fetched
programmatically at runtime, in order to optimize memory usage. The
method by which this configuration value is acquired differs from OS to
OS. For Linux, the method being used was to read a file from the `proc`
filesystem. This is not portable: on ARM, for instance, that file
doesn't exist. The proper way is to call `sysconf()` with the
`_SC_LEVEL1_DCACHE_LINESIZE` name.
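The `sysconf()` approach described in item 1 can be sketched as follows.
This is a minimal illustration, not the code in `memory::Aligned_atomic`:
the function name `l1_cache_line_size` and the 64-byte fallback are
assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <unistd.h>

// Hypothetical fallback, used when the OS cannot report the value.
constexpr std::size_t kFallbackCacheLineSize = 64;

// Returns the L1 data cache-line size in bytes, queried at runtime.
// Hypothetical helper, written only to illustrate the sysconf() call.
std::size_t l1_cache_line_size() {
#if defined(_SC_LEVEL1_DCACHE_LINESIZE)
  // sysconf() returns -1 on error, and some platforms report 0 when the
  // value is unknown, so any non-positive result is treated as
  // "unavailable".
  long size = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
  if (size > 0) return static_cast<std::size_t>(size);
#endif
  return kFallbackCacheLineSize;
}
```

The preprocessor guard keeps the sketch compiling on systems whose
`<unistd.h>` does not define `_SC_LEVEL1_DCACHE_LINESIZE`. In that case,
only the fallback value is returned.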
2) In the `Commit_order_manager::wait_on_graph` method, a local
`Commit_order_lock_graph` object, `ticket`, is created, and a reference
to it is passed on to the MDL graph as a node to wait for. In the same
method, a `raii::Sentry<>` object is created, in order to remove the
`ticket` reference from the MDL graph at the end of the scope. The
`Commit_order_lock_graph` reference stored in the MDL graph is accessed
by every thread that executes a deadlock search on the MDL graph.
The problem was that the `raii::Sentry<>` object was being instantiated
**before** the `Commit_order_lock_graph` object. Since the order of
disposal is the inverse of the order of creation, the
`Commit_order_lock_graph` object was being disposed of prior to the
invocation of the clean-up by the `raii::Sentry<>`. So, there was a time
window between both disposals in which the object was already disposed
of but still referenced by the MDL graph. This is fixed by inverting the
order of creation of the two objects.
Why not a segmentation fault? Because the `Commit_order_lock_graph`
object is local, its memory is still there, on the stack. OTOH, the
destructor for the object was already invoked, meaning there is no
memory violation but the information about the object is cleared. Hence
the _pure virtual method invocation_ error when the object is accessed
through the `Commit_order_lock_graph` reference cast to the parent
class, `MDL_wait_for_subgraph`.
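The ordering problem described in item 2 can be reproduced in isolation.
The sketch below uses hypothetical stand-ins (`Node`, `Sentry`, `events`),
not the MySQL classes. It only records the order in which the node is
destroyed and the clean-up runs, for both declaration orders.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; these are not MySQL classes.
std::vector<std::string> events;  // records the order of operations

struct Node {  // plays the role of the local `ticket` object
  ~Node() { events.push_back("node destroyed"); }
};

class Sentry {  // minimal scope guard: runs `cleanup` on destruction
 public:
  explicit Sentry(std::function<void()> cleanup)
      : m_cleanup(std::move(cleanup)) {}
  ~Sentry() { m_cleanup(); }
  Sentry(const Sentry &) = delete;
  Sentry &operator=(const Sentry &) = delete;

 private:
  std::function<void()> m_cleanup;
};

// Buggy order: the guard is declared first, so it is destroyed last and
// the node is already gone when the clean-up runs.
std::vector<std::string> run_buggy_order() {
  events.clear();
  {
    Sentry guard{[] { events.push_back("node unregistered"); }};
    Node ticket;
  }
  return events;
}

// Fixed order: the node is declared first, so the guard is destroyed
// first and the clean-up runs while the node is still alive.
std::vector<std::string> run_fixed_order() {
  events.clear();
  {
    Node ticket;
    Sentry guard{[] { events.push_back("node unregistered"); }};
  }
  return events;
}
```

With the buggy order the recorded sequence is "node destroyed" followed
by "node unregistered". That gap is the window in which another thread
could still reach the destroyed object through the graph. With the fixed
order the sequence is reversed.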
3) Within a multi-threaded applier with _replica-preserve-commit-order_
enabled, there are two execution paths by which the multi-threaded
applier coordinator may exit due to a deadlock: (a) all workers are
waiting on the commit order queue and they are all asked to back off by
the MDL graph infra-structure; (b) there are workers that haven't
arrived at the commit stage yet and will back off due to the state of
the commit order queue. The different paths make the applier output
different error messages. The test-cases needed to be updated to reflect
the difference. For each test case, both execution paths are now
exercised and tested.
Reviewed-by: Pedro Gomes <[email protected]>
Reviewed-by: Sven Sandberg <[email protected]>
RB: 25616
  --let $wait_condition= SELECT count(*) = $mts_spco_gd_pending_workers FROM information_schema.processlist WHERE STATE="Waiting for preceding transaction to commit"
- --source include/wait_condition.inc
+ if ($mts_spco_gd_worker_3_only_runs_after_deadlock == 0)