concurrency: update wait queues in AddDiscoveredLock #147065

Open
wants to merge 1 commit into master from ssd/add-discovered-recomputes-wait-queue

Conversation

stevendanna
Collaborator

@stevendanna stevendanna commented May 21, 2025

Concurrent readers racing with a retrying writer can end up in a situation where AddDiscoveredLock moves the timestamp of a held intent past the read timestamp of a waiting reader. The reader should be unblocked in this case but previously wasn't, resulting in a lock table verification assertion failure of the form:

error: non locking reader ... does not conflict with lock holder

This is my best theory for what is happening in #146749, based on increased logging. It is difficult to be certain, given that the required ordering of events is hard to observe with locking.

See the comment in the test for a more complete timeline that can lead to the bug.
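
To make the conflict rule concrete, here is a schematic sketch of the situation in Go. The timestamp type and readerConflicts helper below are simplified stand-ins for illustration only, not the actual lock table code:

    package main

    import "fmt"

    // timestamp is a simplified stand-in for an HLC timestamp.
    type timestamp struct{ wallNanos int64 }

    func (t timestamp) Less(o timestamp) bool { return t.wallNanos < o.wallNanos }

    // A non-locking reader conflicts with a held intent only if the intent's
    // timestamp is at or below the reader's read timestamp.
    func readerConflicts(readTS, intentTS timestamp) bool {
        return !readTS.Less(intentTS)
    }

    func main() {
        readTS := timestamp{wallNanos: 10}

        // The reader discovers an intent at ts 8 and queues behind it.
        fmt.Println(readerConflicts(readTS, timestamp{wallNanos: 8})) // true: must wait

        // AddDiscoveredLock later learns the same intent was rewritten at ts 12.
        // The reader no longer conflicts and must be removed from the wait queue;
        // leaving it there is what trips the verification assertion above.
        fmt.Println(readerConflicts(readTS, timestamp{wallNanos: 12})) // false: should proceed
    }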

Fixes #146749

Release note: None

@stevendanna stevendanna requested a review from a team as a code owner May 21, 2025 12:19
@stevendanna stevendanna requested a review from miraradeva May 21, 2025 12:19
@cockroach-teamcity
Member

This change is Reviewable

@stevendanna stevendanna force-pushed the ssd/add-discovered-recomputes-wait-queue branch from 2aae2e1 to 566e0bf on May 21, 2025 12:31
Concurrent readers racing with a retrying writer can end up in a
situation where AddDiscoveredLock moves the timestamp of a held intent
past the read timestamp of a waiting reader. The reader should be
unblocked in this case but previously wasn't, resulting in a lock
table verification assertion failure of the form:

   error: non locking reader ... does not conflict with lock holder

This is my best theory for what is happening in cockroachdb#146749,
based on increased logging. It is difficult to be certain, given that
the required ordering of events is hard to observe with locking.

See the comment in the test for a more complete timeline that can lead
to the bug.

Fixes cockroachdb#146749

Release note: None
@stevendanna stevendanna force-pushed the ssd/add-discovered-recomputes-wait-queue branch from 566e0bf to 75d36be on May 21, 2025 13:54
@stevendanna stevendanna removed the request for review from a team May 21, 2025 13:55
@stevendanna stevendanna assigned arulajmani and unassigned arulajmani May 21, 2025
@stevendanna stevendanna requested a review from arulajmani May 21, 2025 13:55
Contributor

@miraradeva miraradeva left a comment

Looks reasonable to me, but maybe @arulajmani should take a look too. There are so many branches in some of these lock table functions, and I'm never sure when we need to informActiveWaiters vs recomputeWaitQueues vs both.

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)


pkg/kv/kvserver/concurrency/lock_table.go line 3117 at r2 (raw file):

	if tl.replicatedInfo.isEmpty() {
		tl.replicatedInfo.acquire(foundLock.Strength, foundLock.Txn.WriteTimestamp)

Can we end up in a similar situation in this branch? E.g. the lock was held unreplicated at first, then held as an intent by the same transaction at a higher timestamp, and then it was discovered by some other request.


pkg/kv/kvserver/concurrency/testdata/lock_table/add_discovered line 170 at r1 (raw file):

# update at step 12.
#
# Note that in the test as written, the followin are how the reqests align

nit: typos on "following" and "requests"

Contributor

@miraradeva miraradeva left a comment

Sorry, I didn't mean to block it.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @stevendanna)

Collaborator Author

@stevendanna stevendanna left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @miraradeva)


pkg/kv/kvserver/concurrency/lock_table.go line 3117 at r2 (raw file):

Previously, miraradeva (Mira Radeva) wrote…

Can we end up in a similar situation in this branch? E.g. the lock was held unreplicated at first, then held as an intent by the same transaction at a higher timestamp, and then it was discovered by some other request.

I think that "the lock was held unreplicated at first" would still fall under the kl.isLockedBy branch assuming the discovered lock is for the same transaction.

That said though, I do kinda wonder if we could re-arrange this function a bit. It seems to me that instead of calling tl.replicatedInfo.acquire(foundLock.Strength, foundLock.Txn.WriteTimestamp) conditionally here, we could remove it from the first branch, call it unconditionally here, and then always recompute the wait queues. If you take the else branch above, then I think isEmpty() is true by definition.
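
Roughly the shape I have in mind, as a standalone sketch with toy types rather than the real lock table structures (the fields and signatures here are made up purely for illustration):

    package main

    import "fmt"

    type ts struct{ wall int64 }

    // replicatedInfo is a toy stand-in for the replicated lock holder info.
    type replicatedInfo struct {
        held bool
        ts   ts
    }

    func (r *replicatedInfo) acquire(t ts) {
        r.held = true
        if r.ts.wall < t.wall {
            r.ts = t // ratchet the holder timestamp forward
        }
    }

    type keyLocks struct {
        replicated replicatedInfo
        holderTxn  int64
    }

    func (kl *keyLocks) isLockedBy(txnID int64) bool { return kl.holderTxn == txnID }

    func (kl *keyLocks) recomputeWaitQueues() {
        // Placeholder for releasing waiters that no longer conflict.
        fmt.Println("recomputing wait queues")
    }

    // addDiscoveredLock shows the proposed shape: update the replicated holder
    // info unconditionally and always recompute the wait queues, rather than
    // doing each only on some branches.
    func (kl *keyLocks) addDiscoveredLock(txnID int64, intentTS ts) {
        if !kl.isLockedBy(txnID) {
            kl.holderTxn = txnID // previously unheld, or held by another txn
        }
        kl.replicated.acquire(intentTS)
        kl.recomputeWaitQueues()
    }

    func main() {
        kl := &keyLocks{holderTxn: 1, replicated: replicatedInfo{held: true, ts: ts{wall: 10}}}
        kl.addDiscoveredLock(1, ts{wall: 12}) // same txn rediscovered at a higher timestamp
    }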

@stevendanna
Collaborator Author

@arulajmani If you have some time for review, this would clear a flaky test.

@arulajmani
Collaborator

Sorry I missed this. I'll have a look today after our weekly meeting.

Collaborator

@arulajmani arulajmani left a comment

Nice sleuthing here!

Reviewed all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miraradeva and @stevendanna)


pkg/kv/kvserver/concurrency/testdata/lock_table/add_discovered line 159 at r2 (raw file):

# 6 |                          | FinishReq                |                            |                             |
# 8 |                          |                          |                            | SeqReq                      |
# 9 |                          |                          | AddDiscovered(intent@ts10) |                             |

@stevendanna is another way to look at this issue that r1@ts10 doesn't conflict on latches with w2@ts12? Even though this doesn't cause the test to fail, because r1 has pulled the wrong intent into the lock table, r1 shouldn't really be waiting on the intent written by w2.

What if we had w2 acquire a write latch at Txn.MinTimestamp (in this case ts@10) instead? Conceptually, you could think of re-writing an intent at a higher timestamp as intent resolution. Should we be mimicking how intent resolution does latching, for similar reasons?

minTxnTS = t.IntentTxn.MinTimestamp

Collaborator Author

@stevendanna stevendanna left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @miraradeva)


pkg/kv/kvserver/concurrency/testdata/lock_table/add_discovered line 159 at r2 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

@stevendanna is another way to look at this issue that r1@ts10 doesn't conflict on latches with w2@ts12? Even though this doesn't cause the test to fail, because r1 has pulled the wrong intent into the lock table, r1 shouldn't really be waiting on the intent written by w2.

What if we had w2 acquire a write latch at Txn.MinTimestamp (in this case ts@10) instead? Conceptually, you could think of re-writing an intent at a higher timestamp as intent resolution. Should we be mimicking how intent resolution does latching, for similar reasons?

minTxnTS = t.IntentTxn.MinTimestamp

Yes, I definitely think another way to look at this is from the latching angle. Latching at the MinTimestamp is an interesting idea.

It would mean a retrying transaction is more likely to be blocked by concurrent readers. But whether that is a bad thing isn't so clear to me. In this case it does seem like it would have been better. Perhaps we can use this vecindex workload as something of a benchmark to see how they compare.
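
For illustration, here's a toy sketch of the timestamp-aware latch conflict rule we're relying on; the predicate below is a simplified assumption, not the spanlatch manager itself:

    package main

    import "fmt"

    type ts struct{ wall int64 }

    // A non-locking read latched at readTS conflicts with a write latch only if
    // the write latch's timestamp is at or below readTS; a write above the
    // read's timestamp would be invisible to the read anyway.
    func readWriteLatchConflict(readTS, writeLatchTS ts) bool {
        return writeLatchTS.wall <= readTS.wall
    }

    func main() {
        r1 := ts{wall: 10}

        // w2 latches at its WriteTimestamp (ts12): no conflict, so r1 runs
        // concurrently and can discover w2's stale ts10 intent.
        fmt.Println(readWriteLatchConflict(r1, ts{wall: 12})) // false

        // w2 latches at its Txn.MinTimestamp (ts10) instead: the reader and the
        // retrying writer serialize, and r1 only sees the rewritten intent.
        fmt.Println(readWriteLatchConflict(r1, ts{wall: 10})) // true
    }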

Collaborator Author

@stevendanna stevendanna left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @miraradeva)


pkg/kv/kvserver/concurrency/testdata/lock_table/add_discovered line 159 at r2 (raw file):

Previously, stevendanna (Steven Danna) wrote…

Yes, I definitely think another way to look at this is from the latching angle. Latching at the MinTimestamp is an interesting idea.

It would mean a retrying transaction is more likely to be blocked by concurrent readers. But whether that is a bad thing isn't so clear to me. In this case it does seem like it would have been better. Perhaps we can use this vecindex workload as something of a benchmark to see how they compare.

Apologies for the delay here. I think this fix is correct, but I am still interested in exploring changing the latching timestamp.

I'll try to get this done this week.

stevendanna added a commit to stevendanna/cockroach that referenced this pull request Jun 25, 2025
Retrying writers currently latch at their WriteTimestamp, allowing them
to overlap with readers who might be reading intents from their previous
Epoch. This is fine, but does lead to some interesting situations like
the one described in cockroachdb#147065. In some cases, such readers will end up
unnecessarily waiting on the old intent and will only be unblocked when
a subsequent reader discovers the updated intent or when the writer
completes its transaction.

Latching at the MinTimestamp would disallow the overlapping reader and
writer. The reader would wait until the writer had updated its intent
and then would proceed with its read.

Release note: None
@stevendanna
Collaborator Author

I've opened an experiment for the change to the latch timestamp here: #148802

I personally think we should probably do both fixes.

Successfully merging this pull request may close these issues.

sql/vecindex: TestVecIndexConcurrency failed [panic in lockTable]