Skip to content

Conversation

@ParadoxV5
Copy link
Contributor

Description

  • Scan the relay log for GTIDs and the number of events in a partially-logged transaction
  • For scanning performance, add Gtid_list_log_event headers to relay logs to parallel binary logs
  • Fall back to @@relay_log_recovery when unsuccessful

TODO: fill description here

  1. What problem is the patch trying to solve?
  2. If some output changed that is not visible in a test case, what was it looking like before the change and how it's looking with this patch applied?
  3. Do you think this patch might introduce side-effects in other parts of the server?

How can this PR be tested?

TODO: modify the automated test suite to verify that the PR causes MariaDB to behave as intended.
Consult the documentation on "Writing good test cases".

PR quality check

  • *This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.
  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

* Scan the relay log for GTIDs and the number
  of events in a partially-logged transaction
* For scanning performance, add `Gtid_list_log_event`
  headers to relay logs to parallel binary logs
* Fall back to `@@relay_log_recovery` when unsuccessful
@ParadoxV5 ParadoxV5 added MariaDB Corporation Replication Patches involved in replication labels Oct 15, 2025
Copy link
Contributor Author

@ParadoxV5 ParadoxV5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

If a domain is not listed in a GTID position, it is treated as positioned before the domain’s first event.
This allows replicas to pick up new domains without configuration.

Comment on lines 11740 to 11746
while ((ev= Log_event::read_log_event(round == 1 ? first_log : &log, &error,
fdle, opt_master_verify_checksum))
/*
A slave will verify the checksum in the SQL thread;
the relay log does not care about checksums until then.
*/
fdle, !is_relay_log && opt_master_verify_checksum))
&& ev->is_valid())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like TC_LOG_BINLOG::recover() aborts with no error when it encounters an “invalid” event.
This might be okay for the relay log, but I doubt it’s correct for binlog full-recovery.

Comment on lines +3920 to +3921
DBUG_ASSERT(rpl_global_gtid_slave_state->loaded);
if (rpl_load_gtid_state(&mi->gtid_current_pos,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method uses rpl_global_gtid_slave_state a little later than this place.
I wonder what has loaded the SQL position, guaranteed?

sql/log.cc Outdated
Comment on lines 4119 to 4121
Gtid_list_log_event gl_ev= is_relay_log ?
Gtid_list_log_event(&mi->gtid_current_pos, 0) :
Gtid_list_log_event(&rpl_global_gtid_binlog_state, 0);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The binary log uses a rpl_binlog_state, which uses Domain-Server ID pairs as keys, as noted in MDEV-34614 (comment); whereas an IO thread uses a slave_connection_state (which appears to be intended for status tracking by a primary), which only uses Domain IDs as keys.
To bridge the different data models of these GTID lists with reüsable code, perhaps we could use a common interface superclass for these two and similar others:

  • mariadb-binlog --start/stop-position (after MDEV-37231): Domain & Server ID → Sequence Number
  • Binary log: Domain ID → hashmap[Server ID → GTID], caches of the latest GTID and the next Sequence Number
    • Although we might prefer soft-deprecating this file-based Binlog in favour of Binlog-in-InnoDB.
  • Binlog dump thread: Domain ID → Server ID, Sequence Number, flags
  • IO thread: Domain ID → Server ID, Sequence Number (no slave_connection_state flags used, apparently)
  • @@gtid_slave_pos (rpl_slave_state): Domain ID → Server ID, Sequence Number, various other state info

}
}
if (ddl_log_close_binlogged_events(&ddl_log_ids))
if (!is_relay_log && ddl_log_close_binlogged_events(&ddl_log_ids))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a while to find that XA-less binlog recovery is generally good for GTID-only relay log scanning – except for this step.

The hindrance when reading the code was how TC_LOG_BINLOG::recover() blends GTID and XA recoveries so much, XA-less recovery still tracks some XA things that it doesn’t need, such as xids and several Recovery_context fields.

the .state file to restore the binlog state. This allows to copy a server
to provision a new one without copying the binlog files (except the
master-bin.state file) and still preserve the correct binlog state.
Note that relay logs do not cache a `.state` file.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I omitted enabling this to reduce the impact of this “bug fix”.
This aside, how beneficial would it be (for the main branch)?

* (re)initialize `Gtid_IO_pos` from `@@gtid_slave`/
  `current_pos` when discarding unprocessed relay logs
  * `@@relay_log_recovery`
  * CHANGE MASTER
  * RESET REPLICA
* a bunch of ~~(attempt to)~~ fix some problems from Snapshot 1
  * Forward-declare `Master_info` with the matching composite keyword.
  * `Master_info` is only available when `HAVE_REPLICATION`.
  * That `GTID_LIST_EVENT` was never checksummed.
  * The scan for the `FORMAT_DESCRIPTION_EVENT` heading was
    terminating at that `GTID_LIST_EVENT`.
  * Update MTR result to include those `GTID_LIST_EVENT`s
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

MariaDB Corporation Replication Patches involved in replication

Development

Successfully merging this pull request may close these issues.

1 participant