Skip to content

Conversation

pcd1193182
Copy link
Contributor

Sponsored by: [Klara, Inc.; Wasabi Technology, Inc.]

Motivation and Context

When a write comes in via dmu_sync_late_arrival, its txg is equal to the open TXG. If that write gangs, and we have not yet activated the new gang header feature, and the gang header we pick can store a larger gang header, we will try to schedule the upgrade for the open TXG + 1. In debug mode, this causes an assertion to trip in txg_verify. I can pretty reliably reproduce this on a performance test setup I have.

Description

This PR sets the TXG for activating the feature to be at most the current open TXG. Activating the feature a TXG early shouldn't cause any problems, since we don't use the activation txg directly. I believe this method of doing accessing the current open TXG is safe; the current open TXG could increase while the comparison/replacement is happening, but I don't believe a value larger than the open txg can get into txg_verify with this code. And we don't use atomics anywhere else to access this field, so it shouldn't be necessary here either.

How Has This Been Tested?

Manual testing with the workload that originally triggered the problem.

Unfortunately I haven't been able to find a small reproducer for this. I've tried a few things, but we need the first thing that gangs to be a dmu_sync_late_arrival call, which is not trivial to orchestrate. If anyone has ideas for a test that would work, I'm happy to try them out.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

When a write comes in via dmu_sync_late_arrival, its txg is equal to the
open TXG. If that write gangs, and we have not yet activated the new
gang header feature, and the gang header we pick can store a larger gang
header, we will try to schedule the upgrade for the open TXG + 1. In
debug mode, this causes an assertion to trip. This PR sets the TXG for
activating the feature to be at most the current open TXG.

Signed-off-by: Paul Dagnelie <[email protected]>
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
@amotin
Copy link
Member

amotin commented Oct 9, 2025

I have feeling this is a workaround on a workaround. The txg is when we really want this feature to be activated. We increment it by 1 for the case when the code is called from syncing context. But could we increment it only when we know that it is already syncing? dmu_sync_late_arrival() should hold the transaction open, so it should not get into syncing until the write completes.

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Oct 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Status: Code Review Needed Ready for review and testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants