Skip to content

testing/synctest: isolation failure in encoding/json #73733

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
neild opened this issue May 15, 2025 · 4 comments
Open

testing/synctest: isolation failure in encoding/json #73733

neild opened this issue May 15, 2025 · 4 comments
Assignees
Labels
NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@neild
Copy link
Contributor

neild commented May 15, 2025

encoding/json uses a WaitGroup to coordinate access to a cache between multiple goroutines:
https://go.googlesource.com/go/+/refs/tags/go1.24.3/src/encoding/json/encode.go#359

When multiple synctest bubbles access the cache at the same time, a goroutine in one bubble can become blocked on a goroutine in the other bubble. Since we don't track whether the WaitGroup was created in a bubble, or which bubble it was created in, this looks like a deadlock to synctest.

A possible fix is to change encoding/json to use a sync.Once here. A goroutine blocked on a Once is not "durably" blocked in synctest terms. Also, a sync.Once is probably a clearer way of expressing the intent of this encoding/json code.

@mknyszek mknyszek added NeedsFix The path to resolution is known, but the work has not been done. NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels May 15, 2025
@mknyszek mknyszek added this to the Backlog milestone May 15, 2025
@mknyszek mknyszek removed the NeedsFix The path to resolution is known, but the work has not been done. label May 15, 2025
@mknyszek mknyszek modified the milestones: Backlog, Go1.25 May 15, 2025
@neild neild self-assigned this May 15, 2025
@dsnet
Copy link
Member

dsnet commented May 15, 2025

FYI, the v2 implementation doesn't wait for completion for computation for the full type tree:

v, _ := lookupArshalerCache.LoadOrStore(t, fncs)

It defers recursive type computation lazily with a sync.Once as suggested:

var (
once sync.Once
keyFncs *arshaler
valFncs *arshaler
)
init := func() {
keyFncs = lookupArshaler(t.Key())
valFncs = lookupArshaler(t.Elem())
}

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/673335 mentions this issue: encoding/json: avoid supurious synctest deadlock detection

@chad-bekmezian-snap
Copy link

Isn’t this an issue that anybody using synctest could easily run into while writing completely valid code? Why wouldn’t we want to address the underlying issue, rather than the symptom?

@neild
Copy link
Contributor Author

neild commented May 16, 2025

Why wouldn’t we want to address the underlying issue, rather than the symptom?

Do you have a specific suggestion on how we can address the underlying issue?

There's quite a bit of discussion in #67434 about the general problem of goroutines in a synctest bubble accessing global concurrency constructs. We arrived at a set of compromises which I think are, while imperfect, good enough to make the synctest package more useful than not. I don't believe there is any perfect solution to this problem, although it's probable that we can do better than the current implementation. (In particular, it's worth considering whether we can report cross-bubble use of a WaitGroup as happened here--if this had been a panic, it would still be a problem but it would have been a simpler one to diagnose.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests

5 participants