-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
MDEV-4698 #4364
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 10.11
Are you sure you want to change the base?
Conversation
* Scan the relay log for GTIDs and the number of events in a partially-logged transaction * For scanning performance, add `Gtid_list_log_event` headers to relay logs to parallel binary logs * Fall back to `@@relay_log_recovery` when unsuccessful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note
If a domain is not listed in a GTID position, it is treated as positioned before the domain’s first event.
This allows replicas to pick up new domains without configuration.
| while ((ev= Log_event::read_log_event(round == 1 ? first_log : &log, &error, | ||
| fdle, opt_master_verify_checksum)) | ||
| /* | ||
| A slave will verify the checksum in the SQL thread; | ||
| the relay log does not care about checksums until then. | ||
| */ | ||
| fdle, !is_relay_log && opt_master_verify_checksum)) | ||
| && ev->is_valid()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like TC_LOG_BINLOG::recover() aborts with no error when it encounters an “invalid” event.
This might be okay for the relay log, but I doubt it’s correct for binlog full-recovery.
| DBUG_ASSERT(rpl_global_gtid_slave_state->loaded); | ||
| if (rpl_load_gtid_state(&mi->gtid_current_pos, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method uses rpl_global_gtid_slave_state a little later than this place.
I wonder what has loaded the SQL position, guaranteed?
sql/log.cc
Outdated
| Gtid_list_log_event gl_ev= is_relay_log ? | ||
| Gtid_list_log_event(&mi->gtid_current_pos, 0) : | ||
| Gtid_list_log_event(&rpl_global_gtid_binlog_state, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The binary log uses a rpl_binlog_state, which uses Domain-Server ID pairs as keys, as noted in MDEV-34614 (comment); whereas an IO thread uses a slave_connection_state (which appears to be intended for status tracking by a primary), which only uses Domain IDs as keys.
To bridge the different data models of these GTID lists with reüsable code, perhaps we could use a common interface superclass for these two and similar others:
mariadb-binlog --start/stop-position(after MDEV-37231): Domain & Server ID → Sequence Number- Binary log: Domain ID → hashmap[Server ID → GTID], caches of the latest GTID and the next Sequence Number
- Although we might prefer soft-deprecating this file-based Binlog in favour of Binlog-in-InnoDB.
- Binlog dump thread: Domain ID → Server ID, Sequence Number, flags
- IO thread: Domain ID → Server ID, Sequence Number (no
slave_connection_stateflags used, apparently) @@gtid_slave_pos(rpl_slave_state): Domain ID → Server ID, Sequence Number, various other state info
| } | ||
| } | ||
| if (ddl_log_close_binlogged_events(&ddl_log_ids)) | ||
| if (!is_relay_log && ddl_log_close_binlogged_events(&ddl_log_ids)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me a while to find that XA-less binlog recovery is generally good for GTID-only relay log scanning – except for this step.
The hindrance when reading the code was how TC_LOG_BINLOG::recover() blends GTID and XA recoveries so much, XA-less recovery still tracks some XA things that it doesn’t need, such as xids and several Recovery_context fields.
| the .state file to restore the binlog state. This allows to copy a server | ||
| to provision a new one without copying the binlog files (except the | ||
| master-bin.state file) and still preserve the correct binlog state. | ||
| Note that relay logs do not cache a `.state` file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I omitted enabling this to reduce the impact of this “bug fix”.
This aside, how beneficial would it be (for the main branch)?
* (re)initialize `Gtid_IO_pos` from `@@gtid_slave`/
`current_pos` when discarding unprocessed relay logs
* `@@relay_log_recovery`
* CHANGE MASTER
* RESET REPLICA
* a bunch of ~~(attempt to)~~ fix some problems from Snapshot 1
* Forward-declare `Master_info` with the matching composite keyword.
* `Master_info` is only available when `HAVE_REPLICATION`.
* That `GTID_LIST_EVENT` was never checksummed.
* The scan for the `FORMAT_DESCRIPTION_EVENT` heading was
terminating at that `GTID_LIST_EVENT`.
* Update MTR result to include those `GTID_LIST_EVENT`s
Description
Gtid_list_log_eventheaders to relay logs to parallel binary logs@@relay_log_recoverywhen unsuccessfulTODO: fill description here
How can this PR be tested?
TODO: modify the automated test suite to verify that the PR causes MariaDB to behave as intended.
Consult the documentation on "Writing good test cases".
PR quality check
mainbranch.