-
Notifications
You must be signed in to change notification settings - Fork 3.9k
xds: Convert CdsLb to XdsDepManager #12140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I noticed we deviated from gRFC A37 in some ways. It turned out those were added to the gRFC later in grpc/proposal#344: - NACKing empty aggregate clusters - Failing aggregate cluster when children could not be loaded - Recusion limit of 16. We had this behavior already, but it was ascribed to matching C++ There's disagreement on whether we should actually fail the aggregate cluster for bad children, so I'm preserving the pre-existing behavior for now. The code is now doing a depth-first leaf traversal, not breadth-first. This was odd to see, but the code was also pretty old, so the reasoning seems lost to history. Since we haven't seen more than a single level of aggregate clusters in practice, this wouldn't have been noticed by users. XdsDependencyManager.start() was created to guarantee that the callback could not be called before returning from the constructor. Otherwise XDS_CLUSTER_SUBSCRIPT_REGISTRY could potentially be null.
It was actually fine as-is. I had left it in to do an audit of all the failures, but so many are impossible it turned out not to matter much. And then I forgot to delete the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments about the changes to XdsDepManagerTest, which I changed because I was basing the CdsLb2Test off of it.
@@ -183,9 +167,6 @@ public void tearDown() throws InterruptedException { | |||
xdsDependencyManager.shutdown(); | |||
} | |||
xdsClient.shutdown(); | |||
channel.shutdown(); // channel not owned by XdsClient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment was simply wrong. GrpcXdsTransport shuts down the channel when no longer used.
@@ -183,9 +167,6 @@ public void tearDown() throws InterruptedException { | |||
xdsDependencyManager.shutdown(); | |||
} | |||
xdsClient.shutdown(); | |||
channel.shutdown(); // channel not owned by XdsClient | |||
|
|||
xdsServer.shutdownNow().awaitTermination(5, TimeUnit.SECONDS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unnecessary, as the server has been registered with the cleanup rule.
|
||
private XdsClient xdsClient = XdsTestUtils.createXdsClient( | ||
Collections.singletonList("control-plane"), | ||
serverInfo -> new GrpcXdsTransportFactory.GrpcXdsTransport( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of ignoring the serverInfo, I choose to use it as the in-process server name. The SERVER_URI string previously was unused.
I noticed we deviated from gRFC A37 in some ways. It turned out those were added to the gRFC later in grpc/proposal#344:
There's disagreement on whether we should actually fail the aggregate cluster for bad children, so I'm preserving the pre-existing behavior for now.
The code is now doing a depth-first leaf traversal, not breadth-first. This was odd to see, but the code was also pretty old, so the reasoning seems lost to history. Since we haven't seen more than a single level of aggregate clusters in practice, this wouldn't have been noticed by users.
XdsDependencyManager.start() was created to guarantee that the callback could not be called before returning from the constructor. Otherwise XDS_CLUSTER_SUBSCRIPT_REGISTRY could potentially be null.
While I made a point to call out those behavior changes. They are all relatively small tweaks to the code.
There are a few miscellaneous changes in this that could have been split out (mingRingSize, start(), Closeable, error handling, XdsDepMan's test), but I don't think they'll slow down the review and the reason for noticing is this PR. I think those lines will be pretty easy though. The hard part of this review should be CdsLoadBalancer2 and test, and splitting things out didn't seem like it'd speed the review. I'd be happy to split some out if you'd think it'd help.