Crash in NetworkTransport on reconnect #137

Open
@nighthawk

Describe the bug
I hit a runtime crash in NetworkTransport when an MCP client reconnects (mcp-swift-sdk version 0.9.0).

To Reproduce
I haven't been able to reproduce it reliably yet. In my case, the host macOS app was already running when I launched Zed, which had the MCP server configured. Zed appears to have reconnected, and that caused a runtime crash in the host macOS app.

Logs

macOS crash log snippet:

0x1a76fa1f8 _assertionFailure(_:_:file:line:flags:) + 176
1   libswift_Concurrency.dylib    	       0x27ecd64e4 CheckedContinuation.resume(throwing:) + 348
2   MyHostApp                      	       0x1003beb94 closure #1 in NetworkTransport.handleReconnection(error:continuation:context:) + 180

Relevant code snippet:

        private func handleReconnection(
            error: Swift.Error,
            continuation: CheckedContinuation<Void, Swift.Error>,
            context: String
        ) async {
            if !isStopping,
                reconnectionConfig.enabled,
                reconnectAttempt < reconnectionConfig.maxAttempts
            {
                // Try to reconnect with exponential backoff
                reconnectAttempt += 1
                logger.info(
                    "Attempting reconnection after \(context) (\(reconnectAttempt)/\(reconnectionConfig.maxAttempts))..."
                )

                // Calculate backoff delay
                let delay = reconnectionConfig.backoffDelay(for: reconnectAttempt)

                // Schedule reconnection attempt after delay
                Task {
                    try? await Task.sleep(for: .seconds(delay))
                    if !isStopping {
                        // Cancel the current connection before attempting to reconnect.
                        self.connection.cancel()
                        // Resume original continuation with error; outer logic or a new call to connect() will handle retry.
                        continuation.resume(throwing: error)
                    } else {
                        continuation.resume(throwing: error)  // Stopping, so fail.
                    }
                }
            } else {
                // Not configured to reconnect, exceeded max attempts, or stopping
                self.connection.cancel()  // Ensure connection is cancelled
                continuation.resume(throwing: error)
            }
        }

Additional context

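The crash is triggered when continuation.resume(throwing:) is called more than once for the same continuation. A CheckedContinuation must be resumed exactly once and traps on a second resume, which matches the assertion failure in the crash log. I haven't yet worked out the full flow or how a double resume can happen here, and it is hard to debug because it seems intermittent.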
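
As a possible stopgap while the root cause is unknown, the continuation could be wrapped so that only the first resume takes effect. The following is a minimal sketch, not the SDK's code: `ResumeOnce`, `DemoError`, and `simulateDoubleFailure` are hypothetical names introduced for illustration, and the sketch assumes nothing about which code path performs the second resume.

```swift
import Foundation

/// Hypothetical helper (not part of mcp-swift-sdk): wraps a CheckedContinuation
/// so that only the first resume call takes effect and later calls are ignored.
final class ResumeOnce<T: Sendable>: @unchecked Sendable {
    private let lock = NSLock()
    private var continuation: CheckedContinuation<T, Error>?

    init(_ continuation: CheckedContinuation<T, Error>) {
        self.continuation = continuation
    }

    /// Removes the stored continuation under the lock; returns nil on later calls.
    private func take() -> CheckedContinuation<T, Error>? {
        lock.lock()
        defer { lock.unlock() }
        let taken = continuation
        continuation = nil
        return taken
    }

    /// Returns true if this call resumed the continuation, false if it was already resumed.
    @discardableResult
    func resume(returning value: T) -> Bool {
        guard let taken = take() else { return false }
        taken.resume(returning: value)
        return true
    }

    /// Returns true if this call resumed the continuation, false if it was already resumed.
    @discardableResult
    func resume(throwing error: Error) -> Bool {
        guard let taken = take() else { return false }
        taken.resume(throwing: error)
        return true
    }
}

/// Hypothetical error type used only by the demo below.
struct DemoError: Error {}

/// Simulates two code paths failing the same pending operation.
/// With a bare CheckedContinuation the second resume would trap.
func simulateDoubleFailure() async -> (threw: Bool, secondResumed: Bool) {
    var secondResumed = true
    do {
        try await withCheckedThrowingContinuation { (c: CheckedContinuation<Void, Error>) in
            let once = ResumeOnce(c)
            once.resume(throwing: DemoError())                  // first resume wins
            secondResumed = once.resume(throwing: DemoError())  // ignored
        }
        return (false, secondResumed)
    } catch {
        return (true, secondResumed)
    }
}
```

Under this approach, handleReconnection(error:continuation:context:) would take the wrapper instead of the raw CheckedContinuation<Void, Swift.Error>. This only hides the symptom: the Bool result can be logged to find out which path resumes second, which may help locate the underlying race.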

Suggestions appreciated for how to address this.

Metadata

Labels: bug
