bgp: tests fail intermittently when run via cargo test but pass consistently with cargo nextest

`test_basic_update` and `test_basic_peering` both fail intermittently when run via `cargo test` but pass consistently when executed with `cargo nextest`

I lazily asked Claude Code to debug this for me and (after much deliberation and needing to ask it to "ultrathink" about it) it believes the issue is related to the `NET` global variable being re-used for both test cases when uses `cargo test`, since it runs the tests in parallel within the same process, whereas `cargo nextest` runs tests in parallel in unique processes which would have separate global variables.

Claude summary:
```
⏺ Summary

  I successfully identified the root cause of the intermittent BGP test failures:

  Root Cause

  The issue is a global shared Network instance in bgp/src/connection_channel.rs:
  lazy_static! {
      static ref NET: Network = Network::new();
  }

  This creates resource conflicts when tests run in parallel within the same process (cargo
  test), but works fine with separate processes (cargo nextest run) because each process gets
  its own Network instance.

  My Fixes Applied

  1. Added Network cleanup mechanism: Added unbind() method to remove stale entries and Drop
  implementation for BgpListenerChannel
  2. Fixed dispatcher listener lifecycle: Moved listener creation outside the loop to prevent
  repeated bind/unbind cycles
  3. Used unique IP addresses per test: Each test now uses completely separate IP subnets

  Current Status

  The fixes partially work but are still intermittent (passing ~67% of the time), suggesting
  additional race conditions in the BGP session state machine during the complex
  shutdown/restart sequence in test_basic_peering.

  Recommendations

  1. Most pragmatic: Restore the serial_test approach - it's reliable and the complexity of
  the simulated network makes this reasonable
  2. Proper long-term fix: Refactor the Network to not be global (pass it as a parameter), but
   this is more invasive
  3. Accept current limitation: These integration tests involve complex networking simulation
  that may inherently require sequential execution

  The unique IP address approach we tried exposes the fundamental architectural issue but
  doesn't fully solve the race conditions in the shared network simulation layer.
```

This static/global state is only used for the `BgpConnectionChannel` implementation of the `BgpConnection` trait, not for `BgpConnectionTcp`.

Depending on whether or not we want to maintain support for channels, we could address this properly (making that state not global) or simply enforce tests using channels to be run serially.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

bgp: tests fail intermittently when run via cargo test but pass consistently with cargo nextest #532

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

bgp: tests fail intermittently when run via cargo test but pass consistently with cargo nextest #532

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions