fix: partial replication data loss #5297

Open, wants to merge 1 commit into main

Conversation

kostasrim
Contributor

The issue is that the replica increments its count of executed journal records for every journal entry, even though an entry is not always executed. For example, the context can be cancelled (e.g., the connection to the master is lost) before the command is dispatched for execution. When the replica reconnects, it resumes from the next lsn and skips the journal entry that was never dispatched, which leads to data loss.

What's described was reproduced in #5277

resolves #5275
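
The ordering problem described above can be sketched roughly as follows. This is a minimal illustration and not the actual patch: `CancelContextSketch`, `ReplicaSketch`, `ApplyEntry`, and `RecordsExecuted` are hypothetical names, and only `journal_rec_executed_` is taken from the diff. The point is that the counter advances only after the entry has been dispatched, so a cancelled entry is not counted as executed.

```cpp
#include <atomic>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for the replication context; not Dragonfly's real type.
struct CancelContextSketch {
  std::atomic<bool> cancelled{false};
  bool IsCancelled() const { return cancelled.load(std::memory_order_relaxed); }
};

// Hypothetical replica; only journal_rec_executed_ is a name taken from the diff.
class ReplicaSketch {
 public:
  // Applies one journal entry. Returns true only if the entry was dispatched.
  bool ApplyEntry(const CancelContextSketch& cntx,
                  const std::function<void()>& dispatch) {
    if (cntx.IsCancelled()) {
      // The entry was never executed, so it must not be counted. Otherwise
      // the replica would resume past it after reconnecting.
      return false;
    }
    dispatch();
    // Count the record only once it has actually been dispatched.
    journal_rec_executed_.fetch_add(1, std::memory_order_relaxed);
    return true;
  }

  uint64_t RecordsExecuted() const {
    return journal_rec_executed_.load(std::memory_order_relaxed);
  }

 private:
  std::atomic<uint64_t> journal_rec_executed_{0};
};
```

With this ordering, calling `ApplyEntry` after the context has been cancelled leaves `RecordsExecuted()` unchanged, so a resume point derived from the counter would still include the entry that was skipped.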

@kostasrim kostasrim self-assigned this Jun 13, 2025
// got cancelled, e.g, replication connection broke), we will get
// inconsistent data because the replica will resume from the next
// lsn of the master and this lsn entry will be lost.
journal_rec_executed_.fetch_add(1, std::memory_order_relaxed);
Contributor Author

I believe there is a more general issue here: lsn values and executed records might not map one-to-one. If that's the case, we could still lose data in certain scenarios.

// TODO investigate this
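
To make the concern concrete, here is a small sketch of how a counter-derived resume point could drift from the real one. It rests on a loud assumption that is not confirmed anywhere in this PR: that some records might not consume an lsn. `RecordSketch`, `ResumeByCount`, and `ResumeByLastLsn` are hypothetical names introduced only for this illustration.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical record shape. ASSUMPTION: std::nullopt models a record that
// does not consume an lsn. Whether such records exist is what the TODO asks.
struct RecordSketch {
  std::optional<uint64_t> lsn;
};

// Resume point derived from a plain record counter.
uint64_t ResumeByCount(uint64_t start_lsn,
                       const std::vector<RecordSketch>& executed) {
  return start_lsn + executed.size();
}

// Resume point derived from the last lsn that was actually applied.
uint64_t ResumeByLastLsn(uint64_t start_lsn,
                         const std::vector<RecordSketch>& executed) {
  uint64_t next = start_lsn;
  for (const RecordSketch& rec : executed) {
    if (rec.lsn) {
      next = *rec.lsn + 1;
    }
  }
  return next;
}
```

Under that assumption, any record without an lsn makes `ResumeByCount` overshoot `ResumeByLastLsn`, and the entries in between would never be requested again.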

Successfully merging this pull request may close these issues.

test_partial_sync failed