Skip to content

xds: Convert CdsLb to XdsDepManager #12140

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 11, 2025
Merged

xds: Convert CdsLb to XdsDepManager #12140

merged 2 commits into from
Jun 11, 2025

Conversation

ejona86
Copy link
Member

@ejona86 ejona86 commented Jun 9, 2025

I noticed we deviated from gRFC A37 in some ways. It turned out those were added to the gRFC later in grpc/proposal#344:

  • NACKing empty aggregate clusters
  • Failing aggregate cluster when children could not be loaded
  • Recusion limit of 16. We had this behavior already, but it was ascribed to matching C++

There's disagreement on whether we should actually fail the aggregate cluster for bad children, so I'm preserving the pre-existing behavior for now.

The code is now doing a depth-first leaf traversal, not breadth-first. This was odd to see, but the code was also pretty old, so the reasoning seems lost to history. Since we haven't seen more than a single level of aggregate clusters in practice, this wouldn't have been noticed by users.

XdsDependencyManager.start() was created to guarantee that the callback could not be called before returning from the constructor. Otherwise XDS_CLUSTER_SUBSCRIPT_REGISTRY could potentially be null.


While I made a point to call out those behavior changes. They are all relatively small tweaks to the code.

There are a few miscellaneous changes in this that could have been split out (mingRingSize, start(), Closeable, error handling, XdsDepMan's test), but I don't think they'll slow down the review and the reason for noticing is this PR. I think those lines will be pretty easy though. The hard part of this review should be CdsLoadBalancer2 and test, and splitting things out didn't seem like it'd speed the review. I'd be happy to split some out if you'd think it'd help.

I noticed we deviated from gRFC A37 in some ways. It turned out those
were added to the gRFC later in grpc/proposal#344:
- NACKing empty aggregate clusters
- Failing aggregate cluster when children could not be loaded
- Recusion limit of 16. We had this behavior already, but it was
  ascribed to matching C++

There's disagreement on whether we should actually fail the aggregate
cluster for bad children, so I'm preserving the pre-existing behavior
for now.

The code is now doing a depth-first leaf traversal, not breadth-first.
This was odd to see, but the code was also pretty old, so the reasoning
seems lost to history. Since we haven't seen more than a single level of
aggregate clusters in practice, this wouldn't have been noticed by
users.

XdsDependencyManager.start() was created to guarantee that the callback
could not be called before returning from the constructor. Otherwise
XDS_CLUSTER_SUBSCRIPT_REGISTRY could potentially be null.
@ejona86 ejona86 requested a review from kannanjgithub June 9, 2025 21:18
It was actually fine as-is. I had left it in to do an audit of all the
failures, but so many are impossible it turned out not to matter much.
And then I forgot to delete the comment.
Copy link
Member Author

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments about the changes to XdsDepManagerTest, which I changed because I was basing the CdsLb2Test off of it.

@@ -183,9 +167,6 @@ public void tearDown() throws InterruptedException {
xdsDependencyManager.shutdown();
}
xdsClient.shutdown();
channel.shutdown(); // channel not owned by XdsClient
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment was simply wrong. GrpcXdsTransport shuts down the channel when no longer used.

@@ -183,9 +167,6 @@ public void tearDown() throws InterruptedException {
xdsDependencyManager.shutdown();
}
xdsClient.shutdown();
channel.shutdown(); // channel not owned by XdsClient

xdsServer.shutdownNow().awaitTermination(5, TimeUnit.SECONDS);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unnecessary, as the server has been registered with the cleanup rule.


private XdsClient xdsClient = XdsTestUtils.createXdsClient(
Collections.singletonList("control-plane"),
serverInfo -> new GrpcXdsTransportFactory.GrpcXdsTransport(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of ignoring the serverInfo, I choose to use it as the in-process server name. The SERVER_URI string previously was unused.

@ejona86 ejona86 merged commit 297ab05 into grpc:master Jun 11, 2025
15 of 16 checks passed
@ejona86 ejona86 deleted the a74-cds branch June 11, 2025 18:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants