[mcast] Lifecycle + API changes for Omicron impl #109
Thanks, Zeeshan. Full disclosure that I haven't yet looked at the integration tests. I think the new rollback machinery is pretty neat -- obviously it's geared toward just multicast, but I think the model of maintaining a snapshot of the old target state and moving back to it is pretty useful.
I think overall I'm a bit confused by the mention of External forwarding groups being used with instances/guests, but most things here are nits.
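The snapshot-and-restore model mentioned in this comment can be sketched roughly as follows. This is only an illustration of the idea, not the rollback module or trait from this PR: `Table` and `apply_with_rollback` are hypothetical names, and a plain map stands in for switch table state.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for switch table state: group id -> member ports.
type Table = BTreeMap<u16, Vec<u16>>;

/// Applies `steps` in order against `table`. If any step fails, the table is
/// restored to the snapshot taken before the first step ran, and the error
/// is returned to the caller.
fn apply_with_rollback<E>(
    table: &mut Table,
    steps: &[&dyn Fn(&mut Table) -> Result<(), E>],
) -> Result<(), E> {
    // Keep a copy of the old target state so we can move back to it.
    let snapshot = table.clone();
    for step in steps {
        if let Err(e) = step(&mut *table) {
            // Undo every change made so far by restoring the snapshot.
            *table = snapshot;
            return Err(e);
        }
    }
    Ok(())
}
```

Cloning the whole state is the simplest form of this model; a real implementation would more likely record only the entries it touched.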
This change strengthens the multicast implementation with always-allocated group IDs, better API validation, and comprehensive test improvements for Omicron integration.

This update no longer generates multicast group IDs optionally. They are always allocated during group creation, following how multicast groups are configured in the Omicron CP. In Omicron, multicast groups are created first, without members, and then members are added as instances are configured for a multicast group. Replication configuration is only written to tables when members are added, but IDs are always generated for the 1:1 mapping between underlay and external (overlay) associated groups.

Includes:

* **Core ID Management Changes:**
  - Remove Option<MulticastGroupId> - IDs are always allocated during group creation
  - Establish 1:1 mapping between underlay and external (overlay) groups
  - External groups now use IDs from corresponding NAT target (Omicron keeps the true relational mapping)
* **API Changes and Validation:**
  - Remove sources field from internal group APIs (MulticastGroupCreateEntry, MulticastGroupUpdateEntry)
  - Internal groups cannot have sources or NAT targets - cleaner separation of concerns
  - External groups retain sources for proper SSM (Source-Specific Multicast) validation
  - Now fail outright on reset if cleanup is not used properly, which helps on the Omicron side.
* **Rollback & Error Handling:**
  - The addition of a rollback module (and trait) for a more functional approach to rollback on creation or updates involving tables, ports, etc
  - Improved error propagation in test cleanup to catch resource leaks early
  - Better validation of group ID relationships to match tables and allocation states
* **Test Infrastructure Improvements:**
  - Enhanced cleanup_test_group() to fail explicitly on deletion errors (prevents test pollution), and ensures proper 1:1 deletion mapping
  - New tests for rollback, empty members upon multicast group creation/update
* **Replication Management:**
  - Configure replication only when groups have members (change made expecting empty groups in Omicron CP initially)
  - Reconfigure replication tables when transitioning between empty/populated groups

Key aspects this commit covers:

1. ID Management to match expectations in Omicron's multicast impl
2. Validation: Enhanced API validation, group ID relationship checks, SSM validation
3. Rollback: Reset operations now fail explicitly, better error propagation
4. Testing: Comprehensive test improvements, better error handling, standardized cleanup
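As a rough illustration of the ID and replication rules described in this commit message (an ID is always allocated at creation, while replication is configured only while the group has members), here is a minimal sketch. `GroupId`, `Group`, `Allocator`, and the `replication_programmed` flag are hypothetical names for this example only; the PR's `MulticastGroupId` and table handling will differ.

```rust
/// Hypothetical newtype standing in for a multicast group ID.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct GroupId(u16);

#[derive(Debug)]
struct Group {
    /// Always present: there is no `Option<GroupId>` to unwrap later.
    id: GroupId,
    /// Member ports; empty right after creation.
    members: Vec<u16>,
    /// Whether replication state has been written for this group.
    replication_programmed: bool,
}

/// Hypothetical allocator handing out sequential IDs.
struct Allocator {
    next: u16,
}

impl Allocator {
    /// Creates an empty group. The ID is allocated up front even though no
    /// members exist yet. Returns `None` once the ID space is exhausted.
    fn create_group(&mut self) -> Option<Group> {
        let id = GroupId(self.next);
        self.next = self.next.checked_add(1)?;
        Some(Group { id, members: Vec::new(), replication_programmed: false })
    }
}

impl Group {
    /// Replaces the member set and (re)configures replication only when the
    /// group is non-empty; an emptied group has its replication torn down.
    fn set_members(&mut self, members: Vec<u16>) {
        self.members = members;
        self.replication_programmed = !self.members.is_empty();
    }
}
```

Keeping the ID non-optional means the empty-to-populated transition only has to touch replication state, never allocation.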
…nsistentcy

This includes `MulticastUnderlayGroupResponse` and `MulticastExternalGroupResponse`, and a unified response type for lists and mixed-result calls, `MulticastGroupResponse`. We also added an AdminScoped type for underlay and consistent naming throughout. We also rename structs for consistency, and handle rollback at the boundary calls to internal fns.

This PR has been updated to accommodate the new API trait, oxidecomputer/omicron#8922, so it adjusts a lot from the previous code and commit.
@FelixMcFelix Sorry for the additional changes, as 2daa552 went in after the review. With it changing all the type handling, I went ahead and just made the API more consistent (and properly restrictive) across the board.
Thanks for iterating here. Mainly a pile of raw-string-shaped nits, with one or two genuine questions in integration_tests and rollback.
Thanks for working through the changes!
Introduce end-to-end multicast support across control plane and sled-agent, and integrate IP pool model extensions required for supporting multicast workflows. This work enables project-scoped multicast groups with lifecycle-driven dataplane programming and exposes an API for operating multicast groups over instances.

Highlights:
- DB: new multicast_group tables; member lifecycle management; pool_type/mvlan/switch_port_uplinks
- API: multicast group/member CRUD; source IP validation; VPC/project hierarchy integration with default VNI fallback
- Control plane: RPW reconcilers for groups/members; sagas for dataplane updates atomically at the group level; instance lifecycle hooks and piggybacking
- Dataplane: Dendrite DPD switch programming via trait abstraction; DPD client used in tests
- Sled agent: multicast-aware instance management; network interface configuration for multicast traffic; cross-version testing; OPTE stubs present
- Tests: comprehensive integration suites under nexus/tests/integration_tests/multicast/

Components:
- Database schema: external and underlay multicast groups; member/instance association tables; IP pool enhancements (pool_type, mvlan, switch_port_uplinks)
- Control plane modules: multicast group management, member lifecycle, dataplane abstraction; RPW reconcilers to ensure convergence
- API layer: endpoints and validation; default-VNI semantics when VPC not provided
- Sled agent: OPTE stubs and compatibility shims for older agents

Workflows Implemented:
1. Instance lifecycle integration:
   - "Create" -> resolve VPC/VNI (or default), validate source IPs, create memberships, enqueue group ensure RPW
   - "Start" -> program dataplane via ensure/update sagas; activate member flows after switch ack
   - "Stop" -> deactivate dataplane membership; retain DB membership for fast restart
   - "Delete" -> remove instance memberships; group deletion is explicit
   - "Migrate" -> deactivate on source sled; activate on target; idempotent with ordering guarantees
   - Restart/recovery -> RPWs reconcile desired state; compensations clean up partial programming
2. RPW reconciliation:
   - ensure dataplane switches match database state
   - handle sled migrations and state transitions
   - Eventual consistency with retry logic

Migrations:
- Apply schema changes in schema/crdb/multicast-support/up01.sql (and update dbinit.sql)
- Bump schema versions accordingly

API/Compatibility:
- OpenAPI updated: openapi/nexus.json, openapi/sled-agent/sled-agent-5.0.0-89f1f7.json
- Regenerate clients where applicable

References:
- RFD 488: https://rfd.shared.oxide.computer/rfd/488
- Dendrite PRs (based on recency):
  * oxidecomputer/dendrite#132
  * oxidecomputer/dendrite#109
  * oxidecomputer/dendrite#14

Follow-ups include:
- OPTE integration
- commtest extension
- omdb commands are tracked in issues
- pool and group stats
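The Create/Start/Stop/Delete rules in the instance lifecycle workflow above can be read as a small state transition function. The sketch below is a simplified model under that reading, not code from Omicron: `Event`, `Membership`, and `apply` are hypothetical names, and migration, sagas, and RPW reconciliation are left out.

```rust
/// Hypothetical lifecycle events for an instance that belongs to a group.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    Create,
    Start,
    Stop,
    Delete,
}

/// Simplified view of one instance's membership in one multicast group.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct Membership {
    /// A membership row exists in the database.
    in_db: bool,
    /// The member is programmed and active in the dataplane.
    dataplane_active: bool,
}

/// Returns the membership state after applying `ev` to `m`.
fn apply(m: Membership, ev: Event) -> Membership {
    match ev {
        // Create records the membership; nothing is programmed yet.
        Event::Create => Membership { in_db: true, dataplane_active: false },
        // Start activates the dataplane only for recorded memberships.
        Event::Start => Membership { dataplane_active: m.in_db, ..m },
        // Stop deactivates the dataplane but retains the DB membership.
        Event::Stop => Membership { dataplane_active: false, ..m },
        // Delete removes the membership entirely.
        Event::Delete => Membership::default(),
    }
}
```

Folding a sequence of events through `apply` shows the fast-restart property: after Create, Start, Stop, the membership is still recorded, so a later Start only needs to reprogram the dataplane.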
Introduce end-to-end multicast support across control plane and sled-agent, and integrate IP pool model extensions required for supporting multicast workflows. This work enables project-scoped multicast groups with lifecycle-driven dataplane programming and exposes an API for operating multicast groups over instances. Highlights: - DB: new multicast_group tables; member lifecycle management; pool_type/mvlan/switch_port_uplinks - API: multicast group/member CRUD; source IP validation; VPC/project hierarchy integration with default VNI fallback - Control plane: RPW reconcilers for groups/members; sagas for dataplane updates atomically at the group level; instance lifecycle hooks and piggybacking - Dataplane: Dendrite DPD switch programming via trait abstraction; DPD client used in tests - Sled agent: multicast-aware instance management; network interface configuration for multicast traffic; cross-version testing; OPTE stubs present - Tests: comprehensive integration suites under nexus/tests/integration_tests/multicast/ Components: - Database schema: external and underlay multicast groups; member/instance association tables; IP pool enhancements (pool_type, mvlan, switch_port_uplinks) - Control plane modules: multicast group management, member lifecycle, dataplane abstraction; RPW reconcilers to ensure convergence - API layer: endpoints and validation; default-VNI semantics when VPC not provided - Sled agent: OPTE stubs and compatibility shims for older agents Workflows Implemented: 1. 
Instance lifecycle integration: - "Create" -> resolve VPC/VNI (or default), validate source IPs, create memberships, enqueue group ensure RPW - "Start" -> program dataplane via ensure/update sagas; activate member flows after switch ack - "Stop" -> deactivate dataplane membership; retain DB membership for fast restart - "Delete" -> remove instance memberships; group deletion is explicit - "Migrate" -> deactivate on source sled; activate on target; idempotent with ordering guarantees - Restart/recovery -> RPWs reconcile desired state; compensations clean up partial programming 2. RPW reconciliation: - ensure dataplane switches match database state - handle sled migrations and state transitions - Eventual consistency with retry logic Migrations: - Apply schema changes in schema/crdb/multicast-support/up01.sql (and update dbinit.sql) - Bump schema versions accordingly API/Compatibility: - OpenAPI updated: openapi/nexus.json, openapi/sled-agent/sled-agent-5.0.0-89f1f7.json - Regenerate clients where applicable References: - RFD 488: https://rfd.shared.oxide.computer/rfd/488 - Dendrite PRs (based on recency): * oxidecomputer/dendrite#132 * oxidecomputer/dendrite#109 * oxidecomputer/dendrite#14 Follow-ups include: - OPTE integration - commtest extension - omdb commands are tracked in issues - pool and group stats
Introduce end-to-end multicast support across control plane and sled-agent, and integrate IP pool model extensions required for supporting multicast workflows. This work enables project-scoped multicast groups with lifecycle-driven dataplane programming and exposes an API for operating multicast groups over instances. Highlights: - DB: new multicast_group tables; member lifecycle management; pool_type/mvlan/switch_port_uplinks - API: multicast group/member CRUD; source IP validation; VPC/project hierarchy integration with default VNI fallback - Control plane: RPW reconcilers for groups/members; sagas for dataplane updates atomically at the group level; instance lifecycle hooks and piggybacking - Dataplane: Dendrite DPD switch programming via trait abstraction; DPD client used in tests - Sled agent: multicast-aware instance management; network interface configuration for multicast traffic; cross-version testing; OPTE stubs present - Tests: comprehensive integration suites under nexus/tests/integration_tests/multicast/ Components: - Database schema: external and underlay multicast groups; member/instance association tables; IP pool enhancements (pool_type, mvlan, switch_port_uplinks) - Control plane modules: multicast group management, member lifecycle, dataplane abstraction; RPW reconcilers to ensure convergence - API layer: endpoints and validation; default-VNI semantics when VPC not provided - Sled agent: OPTE stubs and compatibility shims for older agents Workflows Implemented: 1. 
Instance lifecycle integration: - "Create" -> resolve VPC/VNI (or default), validate source IPs, create memberships, enqueue group ensure RPW - "Start" -> program dataplane via ensure/update sagas; activate member flows after switch ack - "Stop" -> deactivate dataplane membership; retain DB membership for fast restart - "Delete" -> remove instance memberships; group deletion is explicit - "Migrate" -> deactivate on source sled; activate on target; idempotent with ordering guarantees - Restart/recovery -> RPWs reconcile desired state; compensations clean up partial programming 2. RPW reconciliation: - ensure dataplane switches match database state - handle sled migrations and state transitions - Eventual consistency with retry logic Migrations: - Apply schema changes in schema/crdb/multicast-support/up01.sql (and update dbinit.sql) - Bump schema versions accordingly API/Compatibility: - OpenAPI updated: openapi/nexus.json, openapi/sled-agent/sled-agent-5.0.0-89f1f7.json - Regenerate clients where applicable References: - RFD 488: https://rfd.shared.oxide.computer/rfd/488 - Dendrite PRs (based on recency): * oxidecomputer/dendrite#132 * oxidecomputer/dendrite#109 * oxidecomputer/dendrite#14 Follow-ups include: - OPTE integration - commtest extension - omdb commands are tracked in issues - pool and group stats
Introduce end-to-end multicast support across control plane and sled-agent, and integrate IP pool model extensions required for supporting multicast workflows. This work enables project-scoped multicast groups with lifecycle-driven dataplane programming and exposes an API for operating multicast groups over instances. Highlights: - DB: new multicast_group tables; member lifecycle management; pool_type/mvlan/switch_port_uplinks - API: multicast group/member CRUD; source IP validation; VPC/project hierarchy integration with default VNI fallback - Control plane: RPW reconcilers for groups/members; sagas for dataplane updates atomically at the group level; instance lifecycle hooks and piggybacking - Dataplane: Dendrite DPD switch programming via trait abstraction; DPD client used in tests - Sled agent: multicast-aware instance management; network interface configuration for multicast traffic; cross-version testing; OPTE stubs present - Tests: comprehensive integration suites under nexus/tests/integration_tests/multicast/ Components: - Database schema: external and underlay multicast groups; member/instance association tables; IP pool enhancements (pool_type, mvlan, switch_port_uplinks) - Control plane modules: multicast group management, member lifecycle, dataplane abstraction; RPW reconcilers to ensure convergence - API layer: endpoints and validation; default-VNI semantics when VPC not provided - Sled agent: OPTE stubs and compatibility shims for older agents Workflows Implemented: 1. 
Instance lifecycle integration: - "Create" -> resolve VPC/VNI (or default), validate source IPs, create memberships, enqueue group ensure RPW - "Start" -> program dataplane via ensure/update sagas; activate member flows after switch ack - "Stop" -> deactivate dataplane membership; retain DB membership for fast restart - "Delete" -> remove instance memberships; group deletion is explicit - "Migrate" -> deactivate on source sled; activate on target; idempotent with ordering guarantees - Restart/recovery -> RPWs reconcile desired state; compensations clean up partial programming 2. RPW reconciliation: - ensure dataplane switches match database state - handle sled migrations and state transitions - Eventual consistency with retry logic Migrations: - Apply schema changes in schema/crdb/multicast-support/up01.sql (and update dbinit.sql) - Bump schema versions accordingly API/Compatibility: - OpenAPI updated: openapi/nexus.json, openapi/sled-agent/sled-agent-5.0.0-89f1f7.json - Regenerate clients where applicable References: - RFD 488: https://rfd.shared.oxide.computer/rfd/488 - Dendrite PRs (based on recency): * oxidecomputer/dendrite#132 * oxidecomputer/dendrite#109 * oxidecomputer/dendrite#14 Follow-ups include: - OPTE integration - commtest extension - omdb commands are tracked in issues - pool and group stats
Introduce end-to-end multicast support across control plane and sled-agent, and integrate IP pool model extensions required for supporting multicast workflows. This work enables project-scoped multicast groups with lifecycle-driven dataplane programming and exposes an API for operating multicast groups over instances. Highlights: - DB: new multicast_group tables; member lifecycle management; pool_type/mvlan/switch_port_uplinks - API: multicast group/member CRUD; source IP validation; VPC/project hierarchy integration with default VNI fallback - Control plane: RPW reconcilers for groups/members; sagas for dataplane updates atomically at the group level; instance lifecycle hooks and piggybacking - Dataplane: Dendrite DPD switch programming via trait abstraction; DPD client used in tests - Sled agent: multicast-aware instance management; network interface configuration for multicast traffic; cross-version testing; OPTE stubs present - Tests: comprehensive integration suites under nexus/tests/integration_tests/multicast/ Components: - Database schema: external and underlay multicast groups; member/instance association tables; IP pool enhancements (pool_type, mvlan, switch_port_uplinks) - Control plane modules: multicast group management, member lifecycle, dataplane abstraction; RPW reconcilers to ensure convergence - API layer: endpoints and validation; default-VNI semantics when VPC not provided - Sled agent: OPTE stubs and compatibility shims for older agents Workflows Implemented: 1. 
Instance lifecycle integration: - "Create" -> resolve VPC/VNI (or default), validate source IPs, create memberships, enqueue group ensure RPW - "Start" -> program dataplane via ensure/update sagas; activate member flows after switch ack - "Stop" -> deactivate dataplane membership; retain DB membership for fast restart - "Delete" -> remove instance memberships; group deletion is explicit - "Migrate" -> deactivate on source sled; activate on target; idempotent with ordering guarantees - Restart/recovery -> RPWs reconcile desired state; compensations clean up partial programming 2. RPW reconciliation: - ensure dataplane switches match database state - handle sled migrations and state transitions - Eventual consistency with retry logic Migrations: - Apply schema changes in schema/crdb/multicast-support/up01.sql (and update dbinit.sql) - Bump schema versions accordingly API/Compatibility: - OpenAPI updated: openapi/nexus.json, openapi/sled-agent/sled-agent-5.0.0-89f1f7.json - Regenerate clients where applicable References: - RFD 488: https://rfd.shared.oxide.computer/rfd/488 - Dendrite PRs (based on recency): * oxidecomputer/dendrite#132 * oxidecomputer/dendrite#109 * oxidecomputer/dendrite#14 Follow-ups include: - OPTE integration - commtest extension - omdb commands are tracked in issues - pool and group stats
Introduce end-to-end multicast support across control plane and sled-agent, and integrate IP pool model extensions required for supporting multicast workflows. This work enables project-scoped multicast groups with lifecycle-driven dataplane programming and exposes an API for operating multicast groups over instances. Highlights: - DB: new multicast_group tables; member lifecycle management; pool_type/mvlan/switch_port_uplinks - API: multicast group/member CRUD; source IP validation; VPC/project hierarchy integration with default VNI fallback - Control plane: RPW reconcilers for groups/members; sagas for dataplane updates atomically at the group level; instance lifecycle hooks and piggybacking - Dataplane: Dendrite DPD switch programming via trait abstraction; DPD client used in tests - Sled agent: multicast-aware instance management; network interface configuration for multicast traffic; cross-version testing; OPTE stubs present - Tests: comprehensive integration suites under nexus/tests/integration_tests/multicast/ Components: - Database schema: external and underlay multicast groups; member/instance association tables; IP pool enhancements (pool_type, mvlan, switch_port_uplinks) - Control plane modules: multicast group management, member lifecycle, dataplane abstraction; RPW reconcilers to ensure convergence - API layer: endpoints and validation; default-VNI semantics when VPC not provided - Sled agent: OPTE stubs and compatibility shims for older agents Workflows Implemented: 1. 
Instance lifecycle integration: - "Create" -> resolve VPC/VNI (or default), validate source IPs, create memberships, enqueue group ensure RPW - "Start" -> program dataplane via ensure/update sagas; activate member flows after switch ack - "Stop" -> deactivate dataplane membership; retain DB membership for fast restart - "Delete" -> remove instance memberships; group deletion is explicit - "Migrate" -> deactivate on source sled; activate on target; idempotent with ordering guarantees - Restart/recovery -> RPWs reconcile desired state; compensations clean up partial programming 2. RPW reconciliation: - ensure dataplane switches match database state - handle sled migrations and state transitions - Eventual consistency with retry logic Migrations: - Apply schema changes in schema/crdb/multicast-support/up01.sql (and update dbinit.sql) - Bump schema versions accordingly API/Compatibility: - OpenAPI updated: openapi/nexus.json, openapi/sled-agent/sled-agent-5.0.0-89f1f7.json - Regenerate clients where applicable References: - RFD 488: https://rfd.shared.oxide.computer/rfd/488 - Dendrite PRs (based on recency): * oxidecomputer/dendrite#132 * oxidecomputer/dendrite#109 * oxidecomputer/dendrite#14 Follow-ups include: - OPTE integration - commtest extension - omdb commands are tracked in issues - pool and group stats
Introduce end-to-end multicast support across control plane and sled-agent, and integrate IP pool model extensions required for supporting multicast workflows. This work enables project-scoped multicast groups with lifecycle-driven dataplane programming and exposes an API for operating multicast groups over instances. Highlights: - DB: new multicast_group tables; member lifecycle management; pool_type/mvlan/switch_port_uplinks - API: multicast group/member CRUD; source IP validation; VPC/project hierarchy integration with default VNI fallback - Control plane: RPW reconcilers for groups/members; sagas for dataplane updates atomically at the group level; instance lifecycle hooks and piggybacking - Dataplane: Dendrite DPD switch programming via trait abstraction; DPD client used in tests - Sled agent: multicast-aware instance management; network interface configuration for multicast traffic; cross-version testing; OPTE stubs present - Tests: comprehensive integration suites under nexus/tests/integration_tests/multicast/ Components: - Database schema: external and underlay multicast groups; member/instance association tables; IP pool enhancements (pool_type, mvlan, switch_port_uplinks) - Control plane modules: multicast group management, member lifecycle, dataplane abstraction; RPW reconcilers to ensure convergence - API layer: endpoints and validation; default-VNI semantics when VPC not provided - Sled agent: OPTE stubs and compatibility shims for older agents Workflows Implemented: 1. 
Instance lifecycle integration: - "Create" -> resolve VPC/VNI (or default), validate source IPs, create memberships, enqueue group ensure RPW - "Start" -> program dataplane via ensure/update sagas; activate member flows after switch ack - "Stop" -> deactivate dataplane membership; retain DB membership for fast restart - "Delete" -> remove instance memberships; group deletion is explicit - "Migrate" -> deactivate on source sled; activate on target; idempotent with ordering guarantees - Restart/recovery -> RPWs reconcile desired state; compensations clean up partial programming 2. RPW reconciliation: - ensure dataplane switches match database state - handle sled migrations and state transitions - Eventual consistency with retry logic Migrations: - Apply schema changes in schema/crdb/multicast-support/up01.sql (and update dbinit.sql) - Bump schema versions accordingly API/Compatibility: - OpenAPI updated: openapi/nexus.json, openapi/sled-agent/sled-agent-5.0.0-89f1f7.json - Regenerate clients where applicable References: - RFD 488: https://rfd.shared.oxide.computer/rfd/488 - Dendrite PRs (based on recency): * oxidecomputer/dendrite#132 * oxidecomputer/dendrite#109 * oxidecomputer/dendrite#14 Follow-ups include: - OPTE integration - commtest extension - omdb commands are tracked in issues - pool and group stats
Introduce end-to-end multicast support across control plane and sled-agent, and integrate IP pool model extensions required for supporting multicast workflows. This work enables project-scoped multicast groups with lifecycle-driven dataplane programming and exposes an API for operating multicast groups over instances. Highlights: - DB: new multicast_group tables; member lifecycle management; pool_type/mvlan/switch_port_uplinks - API: multicast group/member CRUD; source IP validation; VPC/project hierarchy integration with default VNI fallback - Control plane: RPW reconcilers for groups/members; sagas for dataplane updates atomically at the group level; instance lifecycle hooks and piggybacking - Dataplane: Dendrite DPD switch programming via trait abstraction; DPD client used in tests - Sled agent: multicast-aware instance management; network interface configuration for multicast traffic; cross-version testing; OPTE stubs present - Tests: comprehensive integration suites under nexus/tests/integration_tests/multicast/ Components: - Database schema: external and underlay multicast groups; member/instance association tables; IP pool enhancements (pool_type, mvlan, switch_port_uplinks) - Control plane modules: multicast group management, member lifecycle, dataplane abstraction; RPW reconcilers to ensure convergence - API layer: endpoints and validation; default-VNI semantics when VPC not provided - Sled agent: OPTE stubs and compatibility shims for older agents Workflows Implemented: 1. 
Instance lifecycle integration: - "Create" -> resolve VPC/VNI (or default), validate source IPs, create memberships, enqueue group ensure RPW - "Start" -> program dataplane via ensure/update sagas; activate member flows after switch ack - "Stop" -> deactivate dataplane membership; retain DB membership for fast restart - "Delete" -> remove instance memberships; group deletion is explicit - "Migrate" -> deactivate on source sled; activate on target; idempotent with ordering guarantees - Restart/recovery -> RPWs reconcile desired state; compensations clean up partial programming 2. RPW reconciliation: - ensure dataplane switches match database state - handle sled migrations and state transitions - Eventual consistency with retry logic Migrations: - Apply schema changes in schema/crdb/multicast-support/up01.sql (and update dbinit.sql) - Bump schema versions accordingly API/Compatibility: - OpenAPI updated: openapi/nexus.json, openapi/sled-agent/sled-agent-5.0.0-89f1f7.json - Regenerate clients where applicable References: - RFD 488: https://rfd.shared.oxide.computer/rfd/488 - Dendrite PRs (based on recency): * oxidecomputer/dendrite#132 * oxidecomputer/dendrite#109 * oxidecomputer/dendrite#14 Follow-ups include: - OPTE integration - commtest extension - omdb commands are tracked in issues - pool and group stats
Introduce end-to-end multicast support across control plane and sled-agent, and integrate IP pool model extensions required for supporting multicast workflows. This work enables project-scoped multicast groups with lifecycle-driven dataplane programming and exposes an API for operating multicast groups over instances. Highlights: - DB: new multicast_group tables; member lifecycle management; pool_type/mvlan/switch_port_uplinks - API: multicast group/member CRUD; source IP validation; VPC/project hierarchy integration with default VNI fallback - Control plane: RPW reconcilers for groups/members; sagas for dataplane updates atomically at the group level; instance lifecycle hooks and piggybacking - Dataplane: Dendrite DPD switch programming via trait abstraction; DPD client used in tests - Sled agent: multicast-aware instance management; network interface configuration for multicast traffic; cross-version testing; OPTE stubs present - Tests: comprehensive integration suites under nexus/tests/integration_tests/multicast/ Components: - Database schema: external and underlay multicast groups; member/instance association tables; IP pool enhancements (pool_type, mvlan, switch_port_uplinks) - Control plane modules: multicast group management, member lifecycle, dataplane abstraction; RPW reconcilers to ensure convergence - API layer: endpoints and validation; default-VNI semantics when VPC not provided - Sled agent: OPTE stubs and compatibility shims for older agents Workflows Implemented: 1. 
Instance lifecycle integration: - "Create" -> resolve VPC/VNI (or default), validate source IPs, create memberships, enqueue group ensure RPW - "Start" -> program dataplane via ensure/update sagas; activate member flows after switch ack - "Stop" -> deactivate dataplane membership; retain DB membership for fast restart - "Delete" -> remove instance memberships; group deletion is explicit - "Migrate" -> deactivate on source sled; activate on target; idempotent with ordering guarantees - Restart/recovery -> RPWs reconcile desired state; compensations clean up partial programming 2. RPW reconciliation: - ensure dataplane switches match database state - handle sled migrations and state transitions - Eventual consistency with retry logic Migrations: - Apply schema changes in schema/crdb/multicast-support/up01.sql (and update dbinit.sql) - Bump schema versions accordingly API/Compatibility: - OpenAPI updated: openapi/nexus.json, openapi/sled-agent/sled-agent-5.0.0-89f1f7.json - Regenerate clients where applicable References: - RFD 488: https://rfd.shared.oxide.computer/rfd/488 - Dendrite PRs (based on recency): * oxidecomputer/dendrite#132 * oxidecomputer/dendrite#109 * oxidecomputer/dendrite#14 Follow-ups include: - OPTE integration - commtest extension - omdb commands are tracked in issues - pool and group stats
Introduce end-to-end multicast support across control plane and sled-agent, and integrate IP pool model extensions required for supporting multicast workflows. This work enables project-scoped multicast groups with lifecycle-driven dataplane programming and exposes an API for operating multicast groups over instances. Highlights: - DB: new multicast_group tables; member lifecycle management; pool_type/mvlan/switch_port_uplinks - API: multicast group/member CRUD; source IP validation; VPC/project hierarchy integration with default VNI fallback - Control plane: RPW reconcilers for groups/members; sagas for dataplane updates atomically at the group level; instance lifecycle hooks and piggybacking - Dataplane: Dendrite DPD switch programming via trait abstraction; DPD client used in tests - Sled agent: multicast-aware instance management; network interface configuration for multicast traffic; cross-version testing; OPTE stubs present - Tests: comprehensive integration suites under nexus/tests/integration_tests/multicast/ Components: - Database schema: external and underlay multicast groups; member/instance association tables; IP pool enhancements (pool_type, mvlan, switch_port_uplinks) - Control plane modules: multicast group management, member lifecycle, dataplane abstraction; RPW reconcilers to ensure convergence - API layer: endpoints and validation; default-VNI semantics when VPC not provided - Sled agent: OPTE stubs and compatibility shims for older agents Workflows Implemented: 1. 
Instance lifecycle integration: - "Create" -> resolve VPC/VNI (or default), validate source IPs, create memberships, enqueue group ensure RPW - "Start" -> program dataplane via ensure/update sagas; activate member flows after switch ack - "Stop" -> deactivate dataplane membership; retain DB membership for fast restart - "Delete" -> remove instance memberships; group deletion is explicit - "Migrate" -> deactivate on source sled; activate on target; idempotent with ordering guarantees - Restart/recovery -> RPWs reconcile desired state; compensations clean up partial programming 2. RPW reconciliation: - ensure dataplane switches match database state - handle sled migrations and state transitions - Eventual consistency with retry logic Migrations: - Apply schema changes in schema/crdb/multicast-support/up01.sql (and update dbinit.sql) - Bump schema versions accordingly API/Compatibility: - OpenAPI updated: openapi/nexus.json, openapi/sled-agent/sled-agent-5.0.0-89f1f7.json - Regenerate clients where applicable References: - RFD 488: https://rfd.shared.oxide.computer/rfd/488 - Dendrite PRs (based on recency): * oxidecomputer/dendrite#132 * oxidecomputer/dendrite#109 * oxidecomputer/dendrite#14 Follow-ups include: - OPTE integration - commtest extension - omdb commands are tracked in issues - pool and group stats
This work introduces multicast IP pool capabilities to support external multicast traffic routing through the rack's switching infrastructure.

Includes:
- Add IpPoolType enum (unicast/multicast) with unicast as default
- Add multicast pool fields: switch_port_uplinks (UUID[]), mvlan (VLAN ID)
- Add database migration (multicast-support/up01.sql) with new columns and indexes
- Add ASM/SSM range validation for multicast pools to prevent mixing
- Add pool type-aware resolution for IP allocation
- Add custom deserializer for switch port uplinks with deduplication
- Update external API params/views for multicast pool configuration
- Add SSM constants (IPV4_SSM_SUBNET, IPV6_SSM_FLAG_FIELD) for validation

Database schema updates:
- ip_pool table: pool_type, switch_port_uplinks, mvlan columns
- Index on pool_type for efficient filtering
- Migration preserves existing pools as unicast type by default

This provides the foundation for multicast group functionality while maintaining full backward compatibility with existing unicast pools.

References (for review):
- RFD 488: https://rfd.shared.oxide.computer/rfd/488
- Dendrite PRs (based on recency):
  * oxidecomputer/dendrite#132
  * oxidecomputer/dendrite#109
  * oxidecomputer/dendrite#14
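The ASM/SSM range validation listed in the commit message above can be pictured with a minimal Rust sketch. It assumes the SSM ranges defined in RFC 4607 (232.0.0.0/8 for IPv4, ff3x::/32 for IPv6); the helper names `is_ssm` and `validate_pool` are hypothetical and are not taken from the Omicron code, which uses its own constants (IPV4_SSM_SUBNET, IPV6_SSM_FLAG_FIELD).

```rust
use std::net::IpAddr;

/// Returns true if `addr` is in an SSM range per RFC 4607
/// (assumed here: 232.0.0.0/8 for IPv4, ff3x::/32 for IPv6).
fn is_ssm(addr: IpAddr) -> bool {
    match addr {
        IpAddr::V4(v4) => v4.octets()[0] == 232,
        IpAddr::V6(v6) => {
            let seg = v6.segments();
            // First 16 bits: 0xff prefix, flags nibble 0x3, any scope nibble.
            // The next 16 bits must be zero to fall inside the /32.
            (seg[0] & 0xfff0) == 0xff30 && seg[1] == 0
        }
    }
}

/// Hypothetical pool check: every address must be multicast, and the
/// pool must be entirely ASM or entirely SSM, never a mix.
fn validate_pool(addrs: &[IpAddr]) -> Result<(), String> {
    if let Some(a) = addrs.iter().find(|a| !a.is_multicast()) {
        return Err(format!("{a} is not a multicast address"));
    }
    let ssm = addrs.iter().filter(|a| is_ssm(**a)).count();
    if ssm != 0 && ssm != addrs.len() {
        return Err("pool mixes ASM and SSM addresses".to_string());
    }
    Ok(())
}
```

Counting SSM addresses and comparing against the pool size keeps the "no mixing" rule in one place; a real implementation would more likely validate ranges than individual addresses.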
Introduce end-to-end multicast group support across control plane and sled-agent, integrated with IP pool extensions required for supporting multicast workflows.

This work enables project-scoped multicast groups with lifecycle-driven dataplane programming and exposes an API for operating multicast groups over instances.

Highlights:
- DB: new multicast_group tables; member lifecycle management
- API: multicast group/member CRUD; source IP validation; VPC/project hierarchy integration with default VNI fallback
- Control plane: RPW reconcilers for groups/members; sagas for dataplane updates atomically at the group level; instance lifecycle hooks and piggybacking
- Dataplane: Dendrite DPD switch programming via trait abstraction; DPD client used in tests
- Sled agent: multicast-aware instance management; network interface configuration for multicast traffic; cross-version testing; OPTE stubs present
- Tests: comprehensive integration suites under nexus/tests/integration_tests/multicast/

Components:
- Database schema: external and underlay multicast groups; member/instance association tables
- Control plane modules: multicast group management, member lifecycle, dataplane abstraction; RPW reconcilers to ensure convergence
- API layer: endpoints and validation; default-VNI semantics when VPC not provided
- Sled agent: OPTE stubs and compatibility shims for older agents

Workflows Implemented:
1. Instance lifecycle integration:
   - "Create" -> resolve VPC/VNI (or default), validate source IPs, create memberships, enqueue group ensure RPW
   - "Start" -> program dataplane via ensure/update sagas; activate member flows after switch ack
   - "Stop" -> deactivate dataplane membership; retain DB membership for fast restart
   - "Delete" -> remove instance memberships; group deletion is explicit
   - "Migrate" -> deactivate on source sled; activate on target; idempotent with ordering guarantees
   - Restart/recovery -> RPWs reconcile desired state; compensations clean up partial programming
2. RPW reconciliation:
   - ensure dataplane switches match database state
   - handle sled migrations and state transitions
   - Eventual consistency with retry logic

Migrations:
- Apply schema changes in schema/crdb/multicast-support/up01.sql (and update dbinit.sql)
- Bump schema versions accordingly

API/Compatibility:
- OpenAPI updated: openapi/nexus.json, openapi/sled-agent/sled-agent-5.0.0-89f1f7.json
- Regenerate clients where applicable

References:
- RFD 488: https://rfd.shared.oxide.computer/rfd/488
- Dendrite PRs (based on recency):
  * oxidecomputer/dendrite#132
  * oxidecomputer/dendrite#109
  * oxidecomputer/dendrite#14

Follow-ups include:
- OPTE integration
- commtest extension
- omdb commands are tracked in issues
- pool and group stats
Introduces end-to-end multicast group support across control plane and sled-agent, integrated with IP pool extensions required for supporting multicast workflows.

This work enables project-scoped multicast groups with lifecycle-driven dataplane programming and exposes an API for operating multicast groups over instances.

Highlights:
- DB: new multicast_group tables; member lifecycle management
- API: multicast group/member CRUD; source IP validation; VPC/project hierarchy integration with default VNI fallback
- Control plane: RPW reconcilers for groups/members; sagas for dataplane updates atomically at the group level; instance lifecycle hooks and piggybacking
- Dataplane: Dendrite DPD switch programming via trait abstraction; DPD client used in tests
- Sled agent: multicast-aware instance management; network interface configuration for multicast traffic; cross-version testing; OPTE stubs present
- Tests: comprehensive integration suites under nexus/tests/integration_tests/multicast/

Components:
- Database schema: external and underlay multicast groups; member/instance association tables
- Control plane modules: multicast group management, member lifecycle, dataplane abstraction; RPW reconcilers to ensure convergence
- API layer: endpoints and validation; default-VNI semantics when VPC not provided
- Sled agent: OPTE stubs and compatibility shims for older agents

Workflows Implemented:
1. Instance lifecycle integration:
   - "Create" -> resolve VPC/VNI (or default), validate source IPs, create memberships, enqueue group ensure RPW
   - "Start" -> program dataplane via ensure/update sagas; activate member flows after switch ack
   - "Stop" -> deactivate dataplane membership; retain DB membership for fast restart
   - "Delete" -> remove instance memberships; group deletion is explicit
   - "Migrate" -> deactivate on source sled; activate on target; idempotent with ordering guarantees
   - Restart/recovery -> RPWs reconcile desired state; compensations clean up partial programming
2. RPW reconciliation:
   - ensure dataplane switches match database state
   - handle sled migrations and state transitions
   - Eventual consistency with retry logic

Migrations:
- Apply schema changes in schema/crdb/multicast-support/up01.sql (and update dbinit.sql)
- Bump schema versions accordingly

API/Compatibility:
- OpenAPI updated: openapi/nexus.json, openapi/sled-agent/sled-agent-5.0.0-89f1f7.json
- Regenerate clients where applicable

References:
- RFD 488: https://rfd.shared.oxide.computer/rfd/488
- IP Pool extensions: #9084
- Dendrite PRs (based on recency):
  * oxidecomputer/dendrite#132
  * oxidecomputer/dendrite#109
  * oxidecomputer/dendrite#14

Follow-ups include:
- OPTE integration
- commtest extension
- omdb commands are tracked in issues
- pool and group stats
Introduces end-to-end multicast group support across control plane and sled-agent, integrated with IP pool extensions required for supporting multicast workflows.

This work enables project-scoped multicast groups with lifecycle-driven dataplane programming and exposes an API for operating multicast groups over instances.

Highlights:
- DB: new multicast_group tables; member lifecycle management
- API: multicast group/member CRUD; source IP validation; VPC/project hierarchy integration with default VNI fallback
- Control plane: RPW reconcilers for groups/members; sagas for dataplane updates atomically at the group level; instance lifecycle hooks and piggybacking
- Dataplane: Dendrite DPD switch programming via trait abstraction; DPD client used in tests
- Sled agent: multicast-aware instance management; network interface configuration for multicast traffic; cross-version testing; OPTE stubs present
- Tests: comprehensive integration suites under nexus/tests/integration_tests/multicast/

Components:
- Database schema: external and underlay multicast groups; member/instance association tables
- Control plane modules: multicast group management, member lifecycle, dataplane abstraction; RPW reconcilers to ensure convergence
- API layer: endpoints and validation; default-VNI semantics when VPC not provided
- Sled agent: OPTE stubs and compatibility shims for older agents

Workflows Implemented:
1. Instance lifecycle integration:
   - "Create" -> resolve VPC/VNI (or default), validate source IPs, create memberships, enqueue group ensure RPW
   - "Start" -> program dataplane via ensure/update sagas; activate member flows after switch ack
   - "Stop" -> deactivate dataplane membership; retain DB membership for fast restart
   - "Delete" -> remove instance memberships; group deletion is explicit
   - "Migrate" -> deactivate on source sled; activate on target; idempotent with ordering guarantees
   - Restart/recovery -> RPWs reconcile desired state; compensations clean up partial programming
2. RPW reconciliation:
   - ensure dataplane switches match database state
   - handle sled migrations and state transitions
   - Eventual consistency with retry logic

Migrations:
- Apply schema changes in schema/crdb/multicast-group-support/up01.sql (and update dbinit.sql)
- Bump schema versions accordingly

API/Compatibility:
- OpenAPI updated: openapi/nexus.json, openapi/sled-agent/sled-agent-5.0.0-89f1f7.json
- Contains a version change (to v5) as InstanceEnsureBody has been modified to include multicast_groups associated with an instance in the underlying sled config
- Regenerate clients where applicable

References:
- RFD 488: https://rfd.shared.oxide.computer/rfd/488
- IP Pool extensions: #9084
- Dendrite PRs (based on recency):
  * oxidecomputer/dendrite#132
  * oxidecomputer/dendrite#109
  * oxidecomputer/dendrite#14

Follow-ups include:
- OPTE integration
- commtest extension
- omdb commands are tracked in issues
- pool and group stats
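The "RPW reconciliation" workflow above (ensure dataplane switches match database state) amounts to a diff between desired and actual membership. The following is a rough Rust sketch of that idea only; `MemberId` and `plan` are hypothetical names, and the real reconcilers also deal with sagas, retries, and sled migrations that are not modeled here.

```rust
use std::collections::BTreeSet;

/// Hypothetical member identifier (e.g. a switch port or instance handle).
type MemberId = u32;

/// Computes the changes needed for the switch (`actual`) to converge on
/// the database state (`desired`). Returns `(to_add, to_remove)`.
/// Running it again after applying the plan yields two empty lists,
/// which is what makes a periodic reconciler safe to retry.
fn plan(
    desired: &BTreeSet<MemberId>,
    actual: &BTreeSet<MemberId>,
) -> (Vec<MemberId>, Vec<MemberId>) {
    // Members recorded in the DB but not yet programmed on the switch.
    let to_add = desired.difference(actual).copied().collect();
    // Members still on the switch that the DB no longer lists.
    let to_remove = actual.difference(desired).copied().collect();
    (to_add, to_remove)
}
```

Because the plan is derived purely from the two sets, a reconciler that crashes midway can simply recompute it on the next pass.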
This change strengthens the multicast implementation with always-allocated group IDs, better API validation, and comprehensive test improvements for Omicron integration.
Multicast group IDs are no longer generated optionally; they are always allocated during group creation, following how multicast groups are configured in the Omicron CP.
In Omicron, multicast groups are created first, without members, and then members are added as instances are configured for a multicast group.
Replication configuration is only written to tables when members are added, but IDs are always generated for the 1:1 mapping between underlay and external (overlay) associated groups.
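As a rough illustration of the model described above, the Rust sketch below allocates an ID at creation time and defers replication until a member is added. Everything here is an assumption made for illustration: the inner representation of `MulticastGroupId`, the `IdAllocator`, and the `Group` type are hypothetical and do not mirror the actual Dendrite types.

```rust
/// Hypothetical stand-in for the real ID type; the inner u16 is assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MulticastGroupId(u16);

/// Toy allocator that hands out sequential IDs (no reuse or overflow handling).
#[derive(Debug, Default)]
struct IdAllocator {
    next: u16,
}

impl IdAllocator {
    fn alloc(&mut self) -> MulticastGroupId {
        let id = MulticastGroupId(self.next);
        self.next += 1;
        id
    }
}

#[derive(Debug)]
struct Group {
    /// Always present: allocated at creation rather than stored as an Option.
    id: MulticastGroupId,
    /// Empty at creation; members arrive later as instances are configured.
    members: Vec<u32>,
}

impl Group {
    /// Creates an empty group; the ID is allocated unconditionally.
    fn create(ids: &mut IdAllocator) -> Self {
        Group { id: ids.alloc(), members: Vec::new() }
    }

    fn add_member(&mut self, port: u32) {
        if !self.members.contains(&port) {
            self.members.push(port);
        }
    }

    /// Replication config only needs to be written once members exist.
    fn needs_replication(&self) -> bool {
        !self.members.is_empty()
    }
}
```

With the ID always present, callers never branch on a missing ID; only the replication step is conditional on membership.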
Includes:
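Core ID Management Changes:
- Remove Option<MulticastGroupId>: IDs are always allocated during group creation
- Establish 1:1 mapping between underlay and external (overlay) groups
- External groups now use IDs from corresponding NAT target (Omicron keeps the true relational mapping)
API Changes and Validation:
- Remove sources field from internal group APIs (MulticastGroupCreateEntry, MulticastGroupUpdateEntry)
- MulticastUnderlayGroupResponse and MulticastExternalGroupResponse, and unified for lists MulticastGroupResponse
- Internal groups cannot have sources or NAT targets, for a cleaner separation of concerns
- External groups retain sources for proper SSM (Source-Specific Multicast) validation
- Now fail outright on reset if cleanup is not used properly, which helps on the Omicron side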
Rollback & Error Handling:
functional approach to rollback on creation or updates involving
tables, ports, etc
allocation states
Test Infrastructure Improvements:
(prevents test pollution), and ensures proper 1:1 deletion mapping
Replication Management:
expecting empty groups in Omicron CP initially)
Key aspects this commit covers:
SSM validation
standardized cleanup