# Crawl4AI MCP Server

A powerful Model Context Protocol (MCP) server that provides web scraping and crawling capabilities using Crawl4AI. This server acts as the "hands and eyes" for client-side AI, enabling intelligent web content analysis and extraction.

## Features

- 🔍 **Page Structure Analysis**: Extract clean HTML or Markdown content from any webpage
- 🎯 **Schema-Based Extraction**: Precision data extraction using CSS selectors and AI-generated schemas
- 📸 **Screenshot Capture**: Visual webpage representation for analysis
- ⚡ **Async Operations**: Non-blocking web crawling with progress reporting
- 🛡️ **Error Handling**: Comprehensive error handling and validation
- 📊 **MCP Integration**: Full Model Context Protocol compatibility with logging and progress tracking

## Architecture

```
┌─────────────────┐    ┌───────────────────┐    ┌─────────────────┐
│   Client AI     │    │  Crawl4AI MCP     │    │   Web Content   │
│   ("Brain")     │◄──►│   Server          │◄──►│   (Websites)    │
│                 │    │  ("Hands & Eyes") │    │                 │
└─────────────────┘    └───────────────────┘    └─────────────────┘
```
- **FastMCP**: Handles MCP protocol and tool registration
- **AsyncWebCrawler**: Provides async web scraping capabilities
- **Stdio Transport**: MCP-compatible communication channel
- **Error-Safe Logging**: All logs directed to stderr to prevent protocol corruption
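The error-safe logging point above can be sketched with the standard library: route every log record to stderr so stdout stays reserved for stdio-transport protocol messages. The logger name and format here are illustrative assumptions, not the server's actual configuration.

```python
# Minimal sketch of error-safe logging for a stdio MCP server:
# every record goes to stderr so stdout carries only protocol messages.
# The logger name and format are illustrative assumptions.
import logging
import sys

def configure_safe_logging() -> logging.Logger:
    """Build a logger that writes to stderr, never stdout."""
    logger = logging.getLogger("crawl4ai-mcp")
    handler = logging.StreamHandler(sys.stderr)  # stderr keeps stdout protocol-clean
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate records via the root logger
    return logger

logger = configure_safe_logging()
logger.info("server starting")  # emitted on stderr only
```

Any `print()` or root-logger write to stdout would be parsed as a (malformed) protocol message by the client, which is why the handler is pinned to `sys.stderr`.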

## Installation

### Prerequisites

- Python 3.10 or higher
- pip package manager

### Setup

1. Clone or download this repository:

   ```bash
   git clone <repository-url>
   cd crawl4ai-mcp
   ```

2. Create and activate a virtual environment:

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Install Playwright browsers (required for screenshots):

   ```bash
   playwright install
   ```

## Usage

### Starting the Server

```bash
# Activate virtual environment
source venv/bin/activate

# Start the MCP server
python3 crawl4ai_mcp_server.py
```

### Testing with MCP Inspector

For interactive testing and development:

```bash
# Start MCP Inspector interface
fastmcp dev crawl4ai_mcp_server.py
```

This will start a web interface (usually at http://localhost:6274) where you can test all tools interactively.

## Available Tools

### 1. server_status

**Purpose**: Get server health and capabilities information

**Parameters**: None

Example response:

```json
{
  "server_name": "Crawl4AI-MCP-Server",
  "version": "1.0.0",
  "status": "operational",
  "capabilities": ["web_crawling", "content_extraction", "screenshot_capture", "schema_based_extraction"]
}
```

### 2. get_page_structure

**Purpose**: Extract webpage content for analysis (the "eyes" function)

**Parameters**:

- `url` (string): The webpage URL to analyze
- `format` (string, optional): Output format, `"html"` or `"markdown"` (default: `"html"`)

Example:

```json
{
  "url": "https://example.com",
  "format": "html"
}
```

### 3. crawl_with_schema

**Purpose**: Precision data extraction using CSS selectors (the "hands" function)

**Parameters**:

- `url` (string): The webpage URL to extract data from
- `extraction_schema` (string): JSON string defining field names and CSS selectors

Example schema:

```json
{
  "title": "h1",
  "description": "p.description",
  "price": ".price-value",
  "author": ".author-name",
  "tags": ".tag"
}
```

Example usage:

```json
{
  "url": "https://example.com/product",
  "extraction_schema": "{\"title\": \"h1\", \"price\": \".price\", \"description\": \"p\"}"
}
```
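Because `extraction_schema` arrives as a JSON *string*, the server has to parse and validate it before crawling. A minimal sketch of that step, assuming the documented "field name to CSS selector" shape (the function name is illustrative):

```python
# Minimal sketch of validating a crawl_with_schema extraction_schema
# argument before handing it to the crawler. The helper name is
# illustrative, not part of the server's API.
import json

def parse_extraction_schema(schema_str: str) -> dict[str, str]:
    """Parse a JSON string mapping field names to CSS selectors."""
    try:
        schema = json.loads(schema_str)
    except json.JSONDecodeError as exc:
        raise ValueError(f"extraction_schema is not valid JSON: {exc}") from exc
    if not isinstance(schema, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in schema.items()
    ):
        raise ValueError("extraction_schema must map field names to CSS selector strings")
    return schema

schema = parse_extraction_schema('{"title": "h1", "price": ".price"}')
```

Validating up front means a malformed schema produces one clear error message instead of a failure deep inside the crawl.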

### 4. take_screenshot

**Purpose**: Capture a visual representation of a webpage

**Parameters**:

- `url` (string): The webpage URL to screenshot

Example:

```json
{
  "url": "https://example.com"
}
```

**Returns**: Base64-encoded PNG image data with metadata
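On the client side, the base64 payload has to be decoded back into PNG bytes before it can be viewed. A sketch, assuming a response field named `image_data` (the field name is a guess about the payload, not the server's documented schema):

```python
# Minimal sketch of consuming a take_screenshot result. The response
# field name "image_data" is an assumption about the payload, not the
# server's documented schema.
import base64

def save_screenshot(response: dict, path: str) -> int:
    """Decode the base64 PNG payload, write it to disk, return its size in bytes."""
    png_bytes = base64.b64decode(response["image_data"])
    with open(path, "wb") as f:
        f.write(png_bytes)
    return len(png_bytes)

# Illustrative round trip with fake PNG header bytes
fake = {"image_data": base64.b64encode(b"\x89PNG\r\n").decode()}
size = save_screenshot(fake, "screenshot.png")
```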

## Integration with Claude Desktop

To use this server with Claude Desktop, add this configuration to your Claude Desktop settings:

```json
{
  "mcpServers": {
    "crawl4ai": {
      "command": "python3",
      "args": ["/path/to/crawl4ai-mcp/crawl4ai_mcp_server.py"],
      "env": {}
    }
  }
}
```

Replace `/path/to/crawl4ai-mcp/` with the actual path to your installation directory.

## Error Handling

All tools include comprehensive error handling and return structured JSON responses:

```json
{
  "error": "Error description",
  "url": "https://example.com",
  "success": false
}
```

Common error scenarios:

- Invalid URL format
- Network connectivity issues
- Invalid extraction schemas
- Screenshot capture failures

## Development

### Project Structure

```
crawl4ai-mcp/
├── crawl4ai_mcp_server.py    # Main server implementation
├── requirements.txt          # Python dependencies
├── pyproject.toml            # Project configuration
├── USAGE_EXAMPLES.md         # Detailed usage examples
└── README.md                 # This file
```

### Dependencies

- `fastmcp`: FastMCP framework for MCP server development
- `crawl4ai`: Core web crawling and extraction library
- `pydantic`: Data validation and parsing
- `playwright`: Browser automation for screenshots

### Testing

Run the linter to ensure code quality:

```bash
ruff check .
```

Test server startup:

```bash
python3 crawl4ai_mcp_server.py
```

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly with MCP Inspector
5. Submit a pull request

## License

This project is open source. See the LICENSE file for details.

## Support

For issues and questions:

1. Check the troubleshooting section in USAGE_EXAMPLES.md
2. Test with MCP Inspector to isolate issues
3. Verify all dependencies are correctly installed
4. Ensure the virtual environment is activated

## Acknowledgments

- **Crawl4AI**: Powerful web crawling and extraction capabilities
- **FastMCP**: Streamlined MCP server development framework
- **Model Context Protocol**: Standardized AI tool integration
