sending pdf file over socket connection is too slow. #1623

Open
vishal180618 opened this issue Apr 22, 2025 · 1 comment

@vishal180618

I’m using a WebSocket-based setup with a private server (hosted in Romania) and a client located in Asia. On a freshly established connection, sending a file (typically a PDF between 8 and 20 MB) takes around 6 to 8 seconds, which already feels slow compared to downloading the same file over HTTP (usually under 2 seconds).

The issue becomes more noticeable if the WebSocket connection stays open but idle for about 30 minutes or more. After that, sending the same file takes around 20 seconds.

Has anyone encountered similar behavior? Are there best practices or recommended approaches to maintain good performance on long-lived idle WebSocket connections?

Thanks in advance!

client.py:

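import asyncio
import json
import logging
import uuid

# Assumed import path (the asyncio implementation in recent websockets releases).
from websockets.asyncio.client import connect

logger = logging.getLogger(__name__)


class WebSocketClient: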
    def __init__(self, uri):
        self.connection_id = uuid.uuid4().hex[:5]
        self.uri = uri
        self.websocket = None

    async def connect(self):
        """Establish and maintain a persistent WebSocket connection."""
        while True:
            try:
                self.websocket = await connect(
                    self.uri, max_size=None, ping_interval=20, ping_timeout=120
                )
                logger.info(f"Connected to {self.uri} | ID: {self.connection_id}")

                try:
                    logger.info(f"Warming up connection {self.connection_id} with dummy PDF")
                    import os
                    script_dir = os.path.dirname(os.path.abspath(__file__))
                    warmup_path = os.path.join(script_dir, 'warmup_8mb.pdf')

                    with open(warmup_path, 'rb') as fp:
                        await self.websocket.send(fp.read())
                    _ = await self.websocket.recv()
                    logger.info(f"Connection {self.connection_id} warm-up complete")
                except Exception as e:
                    logger.warning(f"Warm-up failed: {e}")

                await self.websocket.wait_closed()

            except Exception as e:  # Handles ALL disconnections
                logger.error(f"WebSocketClient.connect: Connection {self.connection_id} lost: {e}. Retrying in 1 second...")

            # Ensure the connection is closed before retrying
            if self.websocket:
                await self.websocket.close()
                self.websocket = None

            await asyncio.sleep(1)  # Wait before reconnecting


    async def send_pdf(self, pdf_bytes):
        """Send PDF bytes to the server (fails if disconnected)."""
        if not self.websocket:
            logger.error(f"Cannot send PDF: WebSocket {self.connection_id} is closed.")
            raise ConnectionError("WebSocket is not connected. Call 'connect()' first.")

        await self.websocket.send(pdf_bytes)

    async def receive_response(self):
        """Receive a response from the server (fails if disconnected)."""
        if not self.websocket:
            logger.error(f"Cannot receive response: WebSocket {self.connection_id} is closed.")
            raise ConnectionError("WebSocket is not connected. Call 'connect()' first.")
        response = await self.websocket.recv()
        return json.loads(response)  # Assuming the server sends a JSON response.

    async def close(self):
        """Close the WebSocket connection cleanly."""
        if self.websocket:
            await self.websocket.close()
            self.websocket = None
            logger.info(f"WebSocket connection {self.connection_id} closed.")

server.py:

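import asyncio
import json
import logging

# Assumed import path (the asyncio implementation in recent websockets releases).
from websockets.asyncio.server import serve

logging.basicConfig(level=logging.INFO)


async def hello(websocket):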
    client_address = getattr(websocket, "remote_address", "Unknown")
    logging.info(f"Client connected: {client_address}")
    try:
        while True:
            # Receive the PDF bytes from the client
            pdf_bytes = await websocket.recv()
            logging.info(
                f"<<< Received PDF bytes (size: {len(pdf_bytes)} bytes) from {client_address}"
            )

            # Process the PDF file
            output = process_pdf_file(pdf_bytes)  # Assume this returns a dictionary

            # Send the output back to the client
            await websocket.send(json.dumps(output))
            logging.info(f">>> Sent data: {output} to {client_address}")
    except Exception as e:
        logging.error(f"Error handling client {client_address}: {e}", exc_info=True)
    finally:
        logging.info(f"Connection closed: {client_address}")


async def main():
    logging.info("Starting WebSocket server on port 4441 (all interfaces)")
    async with serve(hello, host="", port=4441, max_size=None, ping_timeout=120):
        await asyncio.Future()  # Keep the server running forever


if __name__ == "__main__":
    asyncio.run(main())
@aaugustin
Member

About "it takes 6–8 seconds instead of 2 seconds for an HTTP download".

  • If you're I/O-bound, which your message suggests, a larger write_limit might help;
  • If you're CPU-bound on compression (= you run a very very cheap VPS), disabling it with compression=None will help;
  • Make sure that from websockets.speedups import apply_mask works on both the client and the server; if it doesn't, then for sure large messages will be slow.
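The three suggestions above could be checked and applied along these lines. This is only a sketch: the helper names speedups_available and tuned_options are made up for illustration, and the 1 MiB write_limit is an arbitrary starting value to experiment with, not a recommendation from the library.

```python
def speedups_available() -> bool:
    """Return True if the optional C extension of websockets can be imported."""
    try:
        from websockets.speedups import apply_mask  # noqa: F401
    except ImportError:
        # Either websockets is not installed or the C extension was not built.
        return False
    return True


def tuned_options(write_limit: int = 1024 * 1024) -> dict:
    """Keyword arguments to try on both connect() and serve().

    The 1 MiB default is an assumption chosen for illustration; measure
    with several values before settling on one.
    """
    return {
        "max_size": None,            # as in the original client and server
        "compression": None,         # skip per-message deflate if CPU-bound
        "write_limit": write_limit,  # larger write buffer if I/O-bound
    }


if __name__ == "__main__":
    print("C speedups available:", speedups_available())
    print("Options to pass to connect()/serve():", tuned_options())
    # Client: await connect(uri, ping_interval=20, ping_timeout=120, **tuned_options())
    # Server: serve(hello, host="", port=4441, ping_timeout=120, **tuned_options())
```

Run the script on both machines: if speedups_available() returns False on either side, fixing that is probably worth doing before tuning anything else.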

There's no reason why performance would degrade as the connection gets older. Something's happening on the network path between your two endpoints; possibly a network component does traffic shaping and throttles long-lived connections? To debug this, you may well have to pull out Wireshark and check what's happening on the TCP connection.

You may also want to try sending the same PDF over a plain TCP connection with asyncio.start_server and asyncio.open_connection. If you have the same problem, then the cause is at a lower level than websockets (possibly network, host, OS, Python, asyncio).
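A minimal sketch of such a plain TCP baseline is shown below. The 8-byte length prefix, the function names, and the loopback address are assumptions made for illustration; for a real comparison, run the server part on the remote host, point the client at it, and send the actual PDF bytes both on a fresh connection and after a long idle period.

```python
import asyncio
import time


async def handle(reader, writer):
    """Read one length-prefixed payload, then acknowledge its size."""
    size = int.from_bytes(await reader.readexactly(8), "big")
    payload = await reader.readexactly(size)
    writer.write(len(payload).to_bytes(8, "big"))
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def timed_send(host, port, payload):
    """Send payload over plain TCP; return (acked size, elapsed seconds)."""
    reader, writer = await asyncio.open_connection(host, port)
    start = time.perf_counter()
    writer.write(len(payload).to_bytes(8, "big"))
    writer.write(payload)
    await writer.drain()
    acked = int.from_bytes(await reader.readexactly(8), "big")
    elapsed = time.perf_counter() - start
    writer.close()
    await writer.wait_closed()
    return acked, elapsed


async def run_baseline(size=8 * 1024 * 1024):
    """Loopback demo: start a server on a free port and time one transfer."""
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await timed_send("127.0.0.1", port, bytes(size))


if __name__ == "__main__":
    acked, elapsed = asyncio.run(run_baseline())
    print(f"Server acknowledged {acked} bytes in {elapsed:.3f} s")
```

If the plain TCP transfer shows the same slowdown after the connection has been idle, the websockets library is unlikely to be the cause.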
