
fix: turn atlas-connect-cluster async #343


Open
wants to merge 13 commits into main

Conversation

@fmenezes (Collaborator) commented Jul 7, 2025

fixes #321

Proposed changes

Turn the atlas-connect-cluster tool into an async tool. Unfortunately, we rely on the Atlas user creation API to connect, and propagation of the user from the control plane to the data plane can take time; there is no state to query other than checking whether the user has access to the database.

Checklist

@coveralls (Collaborator) commented Jul 7, 2025

Pull Request Test Coverage Report for Build 16145285047

Details

  • 22 of 48 (45.83%) changed or added relevant lines in 2 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-1.9%) to 73.688%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
src/session.ts | 3 | 5 | 60.0%
src/tools/atlas/metadata/connectCluster.ts | 19 | 43 | 44.19%

Files with Coverage Reduction | New Missed Lines | %
src/tools/atlas/metadata/connectCluster.ts | 1 | 44.34%

Totals Coverage Status:
Change from base Build 16135914646: -1.9%
Covered Lines: 853
Relevant Lines: 1065

💛 - Coveralls

@coveralls (Collaborator) commented:

Pull Request Test Coverage Report for Build 16123662219

Details

  • 0 of 34 (0.0%) changed or added relevant lines in 1 file are covered.
  • 129 unchanged lines in 15 files lost coverage.
  • Overall coverage decreased (-13.3%) to 60.879%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
src/tools/atlas/metadata/connectCluster.ts | 0 | 34 | 0.0%

Files with Coverage Reduction | New Missed Lines | %
src/common/atlas/generatePassword.ts | 3 | 25.0%
src/tools/atlas/create/createFreeCluster.ts | 3 | 57.14%
src/tools/atlas/read/inspectCluster.ts | 3 | 23.53%
src/tools/atlas/read/listAlerts.ts | 3 | 20.0%
src/tools/atlas/read/listOrgs.ts | 3 | 72.73%
src/tools/atlas/read/inspectAccessList.ts | 4 | 33.33%
src/tools/atlas/create/createDBUser.ts | 6 | 25.0%
src/tools/atlas/create/createProject.ts | 8 | 46.67%
src/common/atlas/cluster.ts | 9 | 0.0%
src/tools/atlas/create/createAccessList.ts | 10 | 16.13%

Totals Coverage Status:
Change from base Build 16077040616: -13.3%
Covered Lines: 704
Relevant Lines: 1046

💛 - Coveralls

@fmenezes fmenezes marked this pull request as ready for review July 8, 2025 06:54
@Copilot Copilot AI review requested due to automatic review settings July 8, 2025 06:54
@fmenezes fmenezes requested a review from a team as a code owner July 8, 2025 06:54
@Copilot Copilot AI (Contributor) left a comment

Pull Request Overview

This PR refactors the atlas-connect-cluster tool into an asynchronous background process and updates integration tests to poll until the cluster connection is established.

  • Tests now loop with retries to wait for the async connection to succeed.
  • ConnectClusterTool has been split into query, prepare, and background connect phases, returning immediately with an “Attempting…” message.
  • Added new log IDs for connect attempts and successes in logger.ts.
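To make the shape of that split concrete, here is a minimal TypeScript sketch. It is an illustration only: the interfaces, method names, and placeholder connection string are assumptions, not the actual code in src/tools/atlas/metadata/connectCluster.ts.

// Illustrative sketch of the query / prepare / background-connect split.
interface ServiceProvider {
    runCommand(db: string, command: Record<string, unknown>): Promise<unknown>;
}

interface Session {
    serviceProvider?: ServiceProvider;
    connectToMongoDB(connectionString: string): Promise<void>;
}

class ConnectClusterToolSketch {
    constructor(private readonly session: Session) {}

    // Query phase: is the current connection already usable?
    private async queryConnection(): Promise<"connected" | "disconnected"> {
        if (!this.session.serviceProvider) {
            return "disconnected";
        }
        await this.session.serviceProvider.runCommand("admin", { ping: 1 });
        return "connected";
    }

    // Prepare phase: create the temporary DB user and build the connection
    // string; details omitted, placeholder value only.
    private async prepareConnection(clusterName: string): Promise<string> {
        return `mongodb+srv://user:password@${clusterName}.example.mongodb.net/`;
    }

    // Connect phase: kick off the connection in the background and return
    // immediately so the tool call stays fast.
    async execute(clusterName: string): Promise<string> {
        const state = await this.queryConnection().catch(() => "disconnected");
        if (state === "connected") {
            return "Cluster is already connected.";
        }
        const connectionString = await this.prepareConnection(clusterName);
        void this.session.connectToMongoDB(connectionString); // intentionally not awaited
        return "Attempting to connect to the cluster...";
    }
}

Returning immediately keeps the tool call well inside the client's default timeout; the integration tests then poll the tool until it reports "Cluster is already connected."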
Comments suppressed due to low confidence (1)

tests/integration/tools/atlas/clusters.test.ts:195

  • Add an assertion after the loop (or inside the success branch) to fail the test if the cluster never reports "Cluster is already connected.", otherwise the test may silently pass without verifying a successful connection.
                for (let i = 0; i < 600; i++) {
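For illustration, a sketch of the suggested pattern with both a post-loop assertion and the retry count and delay pulled into named constants; the helper name and constants are placeholders, not the test's actual identifiers.

// Sketch only: names are assumptions, not the actual test code.
const RETRY_COUNT = 600;
const RETRY_DELAY_MS = 500; // 600 * 500 ms = 5 minutes

async function waitForClusterConnection(callConnectTool: () => Promise<string>): Promise<void> {
    for (let i = 0; i < RETRY_COUNT; i++) {
        const response = await callConnectTool();
        if (response.includes("Cluster is already connected.")) {
            return; // connection confirmed
        }
        await new Promise((resolve) => setTimeout(resolve, RETRY_DELAY_MS));
    }
    // Fail loudly instead of letting the test pass without ever connecting.
    throw new Error("Cluster never reported a successful connection");
}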

Comment on lines +43 to +46
await this.session.serviceProvider.runCommand("admin", {
    ping: 1,
});
return "connected";
Copilot AI commented Jul 8, 2025

Wrap the runCommand('admin', { ping: 1 }) call in a try/catch so transient ping errors don’t bubble up and trigger a full reconnection flow prematurely.

Suggested change
-await this.session.serviceProvider.runCommand("admin", {
-    ping: 1,
-});
-return "connected";
+try {
+    await this.session.serviceProvider.runCommand("admin", {
+        ping: 1,
+    });
+    return "connected";
+} catch (error) {
+    logger.warn(LogId.ConnectionPingError, `Ping command failed: ${error.message}`);
+    return "connecting";
+}


@fmenezes (Collaborator, Author) replied:

no need, I'm bubbling up the error

Comment on lines 122 to 123
for (let i = 0; i < 600; i++) {
    // try for 5 minutes
Copilot AI commented Jul 8, 2025

Extract the retry count (600) and delay (500ms) into named constants to improve readability and ease future adjustments.

Suggested change
-for (let i = 0; i < 600; i++) {
-    // try for 5 minutes
+for (let i = 0; i < RETRY_COUNT; i++) {
+    // try for RETRY_COUNT attempts


src/logger.ts Outdated
@@ -17,6 +17,8 @@ export const LogId = {
     atlasDeleteDatabaseUserFailure: mongoLogId(1_001_002),
     atlasConnectFailure: mongoLogId(1_001_003),
     atlasInspectFailure: mongoLogId(1_001_004),
+    atlasConnectAttempt: mongoLogId(1_001_005),
+    atlasConnectSuccessed: mongoLogId(1_001_006),
Copilot AI commented Jul 8, 2025

The log ID key atlasConnectSuccessed is misspelled; consider renaming it to atlasConnectSucceeded or atlasConnectSuccess for clarity.

Suggested change
-atlasConnectSuccessed: mongoLogId(1_001_006),
+atlasConnectSucceeded: mongoLogId(1_001_006),


Comment on lines 108 to 110
const connectionString = cn.toString();

return connectionString;
Collaborator commented:
Suggested change
-const connectionString = cn.toString();
-
-return connectionString;
+return cn.toString();

Comment on lines 148 to 149
groupId: this.session.connectedAtlasCluster?.projectId || "",
username: this.session.connectedAtlasCluster?.username || "",
Collaborator commented:

If those are not set, does it make sense to make that call at all?
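For illustration only, a sketch of such a guard. The projectId and username fields come from the quoted snippet; someAtlasApiCall is a hypothetical placeholder for whichever Atlas API call this feeds, and the early return is the point of the sketch.

interface ConnectedAtlasCluster {
    projectId?: string;
    username?: string;
}

// Skip the API call entirely when the identifiers are missing, instead of
// sending empty strings as fallback values.
async function callIfUserKnown(
    connectedAtlasCluster: ConnectedAtlasCluster | undefined,
    someAtlasApiCall: (params: { groupId: string; username: string }) => Promise<void>
): Promise<void> {
    const projectId = connectedAtlasCluster?.projectId;
    const username = connectedAtlasCluster?.username;
    if (!projectId || !username) {
        return; // nothing to act on, so don't make the call at all
    }
    await someAtlasApiCall({ groupId: projectId, username });
}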

);
}

protected async execute({ projectId, clusterName }: ToolArgs<typeof this.argsShape>): Promise<CallToolResult> {
Collaborator commented:

I'm not so sure about the DX changes here - any reason not to do the polling internally and only return a result once we're connected? Assuming the common use case for connecting to a cluster is to do something with it (e.g. insert/query documents), the user will be waiting anyway, so we might as well just notify them when we're done. Note that once we add support for streamable http, we'd be able to use the progress notifications mechanism to notify the user that something is going on, so they won't just be staring at a spinner for 10-15 seconds.

@fmenezes (Collaborator, Author) replied Jul 8, 2025:

Timeouts: the client side controls them, and there is no way to extend the timeout per tool. Even our CI is intermittent; notice I added a very generous timeout (5 minutes).

@fmenezes (Collaborator, Author) replied:

this is coming from #321

@fmenezes (Collaborator, Author) replied:

Poking around, I've found that the TypeScript SDK's default timeout for tool calls is 1 minute (https://github.com/modelcontextprotocol/typescript-sdk/blob/main/src/shared/protocol.ts#L53C45-L53C50).

Collaborator commented:

Hm... looking at #321, it doesn't appear like the issue is a timeout, but rather that the original 10-ish second window we gave the db user to be connected isn't enough (assuming that the auth error is due to the user still not being created after 20 retries with 500 ms delay each).

How would you feel about a middle ground where we poll internally for e.g. 45 seconds and, if we still don't have a valid user, return something along the lines of "provisioning the user and connecting to the cluster is taking a bit longer, so check back in 15-20 seconds"? That way, for the majority of users we'd still connect within the first tool call, while still providing a graceful fallback for the few where it takes a lot longer.

@fmenezes (Collaborator, Author) replied:

Right, I think that is acceptable

@fmenezes (Collaborator, Author) replied:

I'll limit it to 30 seconds, given we have two extra API calls that can also eat some time.
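A rough sketch of that middle ground, for illustration only: poll internally within a fixed budget (30 seconds per the comment above, comfortably under the client's 1-minute default), and fall back to a "check back shortly" message if the user still hasn't propagated. Function names and messages are placeholders, not the PR's actual code.

// Sketch only: illustrates the agreed behaviour, not the actual implementation.
const INTERNAL_POLL_BUDGET_MS = 30_000; // stay well under the client's 1-minute default timeout
const POLL_INTERVAL_MS = 500;

async function connectWithinBudget(tryConnectOnce: () => Promise<boolean>): Promise<string> {
    const deadline = Date.now() + INTERNAL_POLL_BUDGET_MS;
    while (Date.now() < deadline) {
        if (await tryConnectOnce()) {
            return "Connected to the cluster.";
        }
        await new Promise((resolve) => setTimeout(resolve, POLL_INTERVAL_MS));
    }
    // The user may still be propagating from the control plane to the data
    // plane; keep the attempt going in the background and tell the caller to
    // check back shortly.
    return "Connecting is taking a bit longer than usual; check back in 15-20 seconds.";
}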

@fmenezes fmenezes requested a review from nirinchev July 8, 2025 13:44
@fmenezes (Collaborator, Author) commented Jul 8, 2025

@nirinchev this is ready for another look


Successfully merging this pull request may close these issues.

atlas-connect-cluster command randomly fails to connect
3 participants