-
Notifications
You must be signed in to change notification settings - Fork 12
Minimize the blast radius of lock-counter bugs #7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds defensive logic around recycling lock counter objects via sync.Pool
to ensure their internal counters are always zeroed and to panic in test mode if invariants are violated.
- Introduce
rwPoolPut
to validate and resetrwlockCtr
counters before putting back into the pool. - Wrap pool-Get in both
get
andLock
methods with test-mode checks and explicit zeroing of counters. - Replace direct
sync.Pool.Put
calls inUnlock
/RUnlock
(and inMutexMap
) with the new defensive helper.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
rwmutexmap.go | Added rwPoolPut wrapper, replaced direct Put calls, and added test-only counter invariant checks around pool Get /Put . |
mutexmap.go | Added test-only panic checks and defensive waiters resets in Lock /Unlock , and replaced direct pool Put with defensive cleanup. |
Comments suppressed due to low confidence (3)
rwmutexmap.go:28
- There are no tests exercising the new
rwPoolPut
behavior (e.g., confirming counters are zeroed). Add unit tests to explicitly cover both the normal path and panic path in test mode.
func rwPoolPut(nameLock *rwlockCtr) {
rwmutexmap.go:34
- The call to testing.Testing() will fail to compile because
testing.Testing
is not part of Go's testing API. Consider using a build tag or an injectable flag to enable test-only checks instead of importingtesting
in production code.
if testing.Testing() {
mutexmap.go:43
- Similar to the above,
testing.Testing()
does not exist in thetesting
package. Remove this import and guard or replace it with a valid mechanism for test-only assertions.
if testing.Testing() {
faa2054
to
b058b63
Compare
Recycling objects with a sync.Pool risks amplifying the impact of bugs. If the object is not reset to be functionally indistinguishable from a newly-constructed one before being returned to the pool, the unlucky recipient of the recycled object could misbehave through no fault of its own. And what's worse, such bugs would likely manifest as Heisenbugs as they would only happen when objects returned to the pool are being reused, i.e. a program with lots of concurrency and lots of lock churn. Make the pool optimization functionally transparent so the impact of counter bugs does not extend beyond the scope it would have had without the pool optimization. Defensively reset the lock counters before returning them to the pool and after taking them from the pool. And add testing-only code which asserts that the invariants are upheld to ensure that the defensive coding is not papering over real bugs. Signed-off-by: Cory Snider <[email protected]>
import "testing" | ||
|
||
func isTesting() bool { | ||
return testing.Testing() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I like importing testing like this, though admittedly I'd not seen this function before.
Are we expecting users of this library to have the different behavior under test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps some package-level bool that's set as part of tests in this repository (so both "testing" to be true, and the bool to be set?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
admittedly I'd not seen this function before.
It's new as of go1.21. The proposal was https://go.dev/issue/52600
Are we expecting users of this library to have the different behavior under test?
If this library is free of bugs there will be no observed difference in behaviour in testing vs. application code. But if this library does contain latent bugs, I figure this strikes the best compromise between stability in production and discoverability of bugs.
There are a few ways to go about defending against lock counter / concurrency bugs:
- Defensively reset the structs unconditionally. Heisenbugs in production may never get caught, or could get misattributed to the application. This is closest to the v1 behaviour.
- Unconditionally assert the invariants. Bugs will get noticed and correctly attributed to the library, but at the risk of programs panicking in production.
- Defensively reset the structs in applications, assert the invariants in test binaries. Test coverage of the library expands to the test suites of our users, without causing any grief in production.
- Assert the invariants in moby/locker CI; defensive resetting everywhere else, including importer test suites. Only marginally better than option 1 as bugs will only be caught if we get really lucky in a CI run of the moby/locker tests.
I would rather go with option 2 if we can't come to a consensus on 3. Better to fail fast and definitively than send application maintainers on a wild goose chase for "impossible" unbounded concurrency as a consequence of bugs in moby/locker.
Perhaps some package-level bool that's set as part of tests in this repository (so both "testing" to be true, and the bool to be set?)
There's no point to call testing.Testing()
in _test.go files as execution of code in those files already implies that a go test
program is being executed.
Recycling objects with a sync.Pool risks amplifying the impact of bugs. If the object is not reset to be functionally indistinguishable from a newly-constructed one before being returned to the pool, the unlucky recipient of the recycled object could misbehave through no fault of its own. And what's worse, such bugs would likely manifest as Heisenbugs as they would only happen when objects returned to the pool are being reused, i.e. a program with lots of concurrency and lots of lock churn.
Make the pool optimization functionally transparent so the impact of counter bugs does not extend beyond the scope it would have had without the pool optimization. Defensively reset the lock counters before returning them to the pool and after taking them from the pool. And add testing-only code which asserts that the invariants are upheld to ensure that the defensive coding is not papering over real bugs.