-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
MDEV-37541 Race of rolling back and committing transaction to binlog #4301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 10.6
Are you sure you want to change the base?
Conversation
This is the 1st of two commits to demonstrate the failure which is that a rolling back transaction while serializes its completion in Engine before a contender transaction the two are logged into binary log in reverse order.
The 2nd commit of two contains the fixes. Two transactions could binlog their completions in opposite to how it is done in Engine. That is is rare situations ROLLBACK in Engine could be scheduled first by 2pc coordinator to leave by chance binlogging of the transaction and whatever might be dependent on it in execution time. This is fixed with ensuring the binlog rollback handlerton is always executed first in 2pc.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @andrelkin ! Thanks for the patch!
I left a couple questions on a comment.
|
||
int err; | ||
if (has_binlog_hton(ha_info) && | ||
(err= binlog_hton->rollback(binlog_hton, thd, all))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't had time to fully investigate some of my questions/concerns before I leave for vacation, but I figured I'd at least voice them before I leave.
@knielsen mentions on JIRA
Hm. Normally this kind of wrong order is prevented by ensuring that binlogging always happens before releasing table and row locks.
Apparently, this is violated here for ROLLBACK. Not sure why or how the ROLLBACK is executed before binlogging happens.
Which I interpret as supporting your solution. But to me it looks a bit unconventional:
- Why does the 2pc coordinator/orderer not consider rollback (or does it and I just misunderstand)?
- Would it be better to integrate the rollback case with the 2pc coordinator?
Otherwise, it seems we may run other opposite binlog-order problems when rollback is BINLOG->ENGINE
and commit can be ENGINE->BINLOG
for direct writes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the 2pc coordinator/orderer not consider rollback
I am somewhat lost with the question.. Let me make it fully clear, the matter at hand is about the engine rollback
runs ahead of its order in the 2pc protocol. The engine completes (a similar would be with COMMIT, but it does not exist that thanks to the same measure as in the patch) the trx before binlog (as coordinator) gets informed.
andrelkin ***@***.***> writes:
This is the 1st of two commits to demonstrate the failure which is
that a rolling back transaction while serializes its completion in
Engine before a contender transaction the two are logged into binary
log in reverse order.<!--
diff --git a/sql/handler.cc b/sql/handler.cc
index 5d7138490fdd2..c8ddb0b5ebd5f 100644
--- a/sql/handler.cc
+++ b/sql/handler.cc
@@ -2306,11 +2306,18 @@ int ha_rollback_trans(THD *thd, bool all)
if (is_real_trans) /* not a statement commit */
thd->stmt_map.close_transient_cursors();
+ int err;
+ if (has_binlog_hton(ha_info) &&
+ (err= binlog_hton->rollback(binlog_hton, thd, all)))
+ {
+ my_error(ER_ERROR_DURING_COMMIT, MYF(0), err);
+ error= 1;
+ }
for (; ha_info; ha_info= ha_info_next)
{
- int err;
handlerton *ht= ha_info->ht();
- if ((err= ht->rollback(ht, thd, all)))
+
+ if (ht != binlog_hton && (err= ht->rollback(ht, thd, all)))
If binlog needs to do some operation before engines roll back (and I agree
this is needed here), then that operation should not be done inside the
`binlog_hton->rollback()` call. This is really ugly, first we iterate a list
to find some specific element and operate on it, then we iterate the list
again and have a check to skip that specific element. Instead, use an
explicit call into the binlog that does the operations needed, at the point
it time where it should happen.
(And yes, I know this is done similar in `commit_one_phase_2()`, and it is
as wrong there as it is here. Repeating a sin doesn't make it right ;-).
Also, do we really need to do this in 10.6? That doesn't seem right. This
looks like a bug that has always been there, and IIUC it only occurs when
mixing transactional and non-transactional tables in the same "transaction",
something that is in any case fragile and problem-ridden.
- Kristian.
|
Yes we unfortunately do, there's an active support case. And for that reason I suggest we keep the ugly style for 10.6 alone. After all its pattern is proven to be effectual and robust. For 10.11 and up I am preparing much better shaped code that addresses the search for binlog hton. I hope the two branches compromise makes sense to you. |
andrelkin ***@***.***> writes:
andrelkin left a comment (MariaDB/server#4301)
@knielsen
> do we really need to do this in 10.6?
Yes we unfortunately do, there's an *active* support case. And for
That's not a valid reason.
I suggest we keep the ugly style for 10.6 alone. After all
its pattern is proven to be effectual and robust.
For 10.11 and up I am preparing much better shaped code that addresses
the search for binlog hton.
I hope the two branches compromise makes sense to you.
Ok, sounds reasonable.
- Kristian.
|
> + int err;
> + if (has_binlog_hton(ha_info) &&
> + (err= binlog_hton->rollback(binlog_hton, thd, all)))
> + {
> + my_error(ER_ERROR_DURING_COMMIT, MYF(0), err);
> + error= 1;
> + }
> for (; ha_info; ha_info= ha_info_next)
> {
> - int err;
> handlerton *ht= ha_info->ht();
> - if ((err= ht->rollback(ht, thd, all)))
> +
> + if (ht != binlog_hton && (err= ht->rollback(ht, thd, all)))
If binlog needs to do some operation before engines roll back (and I agree
this is needed here), then that operation should not be done inside the
`binlog_hton->rollback()` call. This is really ugly, first we iterate a list
to find some specific element and operate on it, then we iterate the list
again and have a check to skip that specific element. Instead, use an
explicit call into the binlog that does the operations needed, at the point
it time where it should happen.
Andrei, please hold off with this patch for a bit.
I just talked to Monty, he mentioned that the binlog is supposed to always
be first in the ha_info list. Possibly something at some point broke that,
and this could be the root cause of this bug (or regression in this case).
Monty wants to look into this and discuss with Serg. If we can ensure that
binlog is always first in the list, then it automatically should fix this
case, and could fix other corner cases as well.
- Kristian.
|
Not that it's impossible but I've been pessimistic about it as there're a number of ways to screw this nice property. Just one simple case. |
Well, the above
Now
My feeling remains the same: it costs nothing to locate B because I had A which aimed to keep |
Added Monty and Serg to the discussion about
|
I gave it some more thinking. Here is a summary of There exist few patterns when
where it's There also exist a harder pattern like mentioned two 2pc engines with one them
If we go via p.1, we'll to fix
I don't see why the 2nd, being clearly of lesser efforts, should be lesser favorable. Dear reviewers, please respond with your ideas or voting. |
Somewhat polished version of p.2 of my last comments is pushed to bb-10.11-andrei with the idea it to be 10.11+ version while bb-10.6-MDEV-37541 to cover 10.6 alone. |
Disclaimer: I only looked at the first commit. For now I'm trying to understand a problem and what a solution should be, not to discuss a particular solution. I see a few rather suspicious issues demonstrated by the first commit:
|
The immediate issue is indeed that a Memory update caused the binlog branch registration at logging in ROW format and that's due to a legacy pre- (I did not track it further down)
So we do that for any engine modification. The practice introduced back in time may (does) need revising just as you and Kristian point. However as comments mention this issue is wider than how it appears here. See the |
binlog-filtered doesn't seem to matter, but, indeed, with two engines one can have binlog in the middle. Still, Kristian and my point stands — Rollback should do full 2pc if it needs to write something to binlog. |
:-( I over-complicated it: {Innodb, Binlog, Rocksdb} is a natural order for two engine trx like
For what reason actually? The policy of ROLLBACK-ended logging, no need to remind you, is merely to record side effects which of course are non-transactional by definition. And in that sense such logging is similar to any non-transactional logging. PS. I read a key Kristian's note
(rather than |
What do you mean, for what reason, that's how 2pc works. if there's one participant that cannot do 2pc, it should be in the middle. That's why the protocol is what it is — every engine prepares, binlog commits (writes to binlog, it's tc->log call), then everyone commits. As for rollbacks, there's normally no need to do that, every participant (including binlog) simply throws transaction changes away. But a "rollback" when a non-trans engine is involved is not really a rollback. That engine has the changes committed. And binlog needs to commit (write transaction down persistently) too, so it's a commit, of a sort. And must be subjected to full 2pc. {Innodb, Binlog, Rocksdb} case is not a problem, it'll be a true rollback and nothing will be written to binlog. It doesn't matter whether nothing will be written to binlog first or last. |
What engine committed? None of the
just as you say the transaction is fine to rollback anyway it likes. What happens is that along with the transactional changes there's a non-transactional modification which may be placed in the transactional cache. And the whole cache with ROLLBACK termination is logged to reflect that fact.
Not a problem without side effects. But we have ones in this bug... I should've mentioned that just in case in my last comments). |
Aria. Or MyISAM. Any non-transactional engine. "Committed" in the sense that changes made between
what problem do you have in this bug with {Innodb, Binlog, Rocksdb} case? |
Right. So we can say it one-phase-committed. We have two different ways to log such 1pc engine statement when they mingle a 2pc engine transaction.
When one of two 2pc transactions that depend on each other through Innodb engine, and the dependency parent is represented by the list of handlertons as |
@vuvova , just one little thing, there is a detail that seems to have been left out
but may be critical to the understanding. And it is that is not the Memory table that produces the side effect in the test. It's DROP TEMPORARY TABLE which causes |
This is the 1st of two commits to demonstrate the failure which is
that a rolling back transaction while serializes its completion in
Engine before a contender transaction the two are logged into binary
log in reverse order.
Description
TODO: fill description here
Release Notes
TODO: What should the release notes say about this change?
Include any changed system variables, status variables or behaviour. Optionally list any https://mariadb.com/kb/ pages that need changing.
How can this PR be tested?
TODO: modify the automated test suite to verify that the PR causes MariaDB to behave as intended.
Consult the documentation on "Writing good test cases".
If the changes are not amenable to automated testing, please explain why not and carefully describe how to test manually.
Basing the PR against the correct MariaDB version
main
branch.PR quality check