A powerful Model Context Protocol (MCP) server that provides web scraping and crawling capabilities using Crawl4AI. This server acts as the "hands and eyes" for client-side AI, enabling intelligent web content analysis and extraction.
- 🔍 Page Structure Analysis: Extract clean HTML or Markdown content from any webpage
- 🎯 Schema-Based Extraction: Precision data extraction using CSS selectors and AI-generated schemas
- 📸 Screenshot Capture: Visual webpage representation for analysis
- ⚡ Async Operations: Non-blocking web crawling with progress reporting
- 🛡️ Error Handling: Comprehensive error handling and validation
- 📊 MCP Integration: Full Model Context Protocol compatibility with logging and progress tracking
```
┌─────────────────┐    ┌───────────────────┐    ┌─────────────────┐
│    Client AI    │    │   Crawl4AI MCP    │    │   Web Content   │
│    ("Brain")    │◄──►│      Server       │◄──►│   (Websites)    │
│                 │    │ ("Hands & Eyes")  │    │                 │
└─────────────────┘    └───────────────────┘    └─────────────────┘
```
- FastMCP: Handles MCP protocol and tool registration
- AsyncWebCrawler: Provides async web scraping capabilities
- Stdio Transport: MCP-compatible communication channel
- Error-Safe Logging: All logs directed to stderr to prevent protocol corruption
- Python 3.10 or higher
- pip package manager
1. Clone or download this repository:

   ```bash
   git clone <repository-url>
   cd crawl4ai-mcp
   ```

2. Create and activate a virtual environment:

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Install the Playwright browsers (required for screenshots):

   ```bash
   playwright install
   ```
```bash
# Activate the virtual environment
source venv/bin/activate

# Start the MCP server
python3 crawl4ai_mcp_server.py
```

For interactive testing and development:

```bash
# Start the MCP Inspector interface
fastmcp dev crawl4ai_mcp_server.py
```

This starts a web interface (usually at http://localhost:6274) where you can test all tools interactively.
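Under the hood, the stdio transport exchanges JSON-RPC 2.0 messages as defined by the Model Context Protocol spec. A sketch of what a tool invocation looks like on the wire (the tool name and arguments below are hypothetical examples, not this server's actual registered names):

```python
import json

# Shape of an MCP "tools/call" request (JSON-RPC 2.0 per the MCP spec).
# The tool name and arguments are hypothetical, for illustration only.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_page_structure",
        "arguments": {"url": "https://example.com", "format": "markdown"},
    },
}

# The message travels over stdout/stdin as a single serialized line.
wire_message = json.dumps(request)
print(wire_message)
```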
Purpose: Get server health and capabilities information
Parameters: None
Example Response:

```json
{
  "server_name": "Crawl4AI-MCP-Server",
  "version": "1.0.0",
  "status": "operational",
  "capabilities": ["web_crawling", "content_extraction", "screenshot_capture", "schema_based_extraction"]
}
```

Purpose: Extract webpage content for analysis (the "eyes" function)
Parameters:
- `url` (string): The webpage URL to analyze
- `format` (string, optional): Output format, "html" or "markdown" (default: "html")
Example:

```json
{
  "url": "https://example.com",
  "format": "html"
}
```

Purpose: Precision data extraction using CSS selectors (the "hands" function)
Parameters:
- `url` (string): The webpage URL to extract data from
- `extraction_schema` (string): JSON string defining field names and CSS selectors
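Because `extraction_schema` is a JSON string nested inside a JSON payload, hand-escaping the inner quotes is error-prone; building the schema as a plain dict and serializing it with `json.dumps` is safer. A minimal sketch (the field names and selectors are illustrative):

```python
import json

# Build the selector schema as a plain dict, then serialize it once;
# json.dumps produces the escaped string form the tool expects.
# Field names and selectors here are illustrative.
schema = {"title": "h1", "price": ".price", "description": "p"}

payload = {
    "url": "https://example.com/product",
    "extraction_schema": json.dumps(schema),
}

# Round-trip check: the server can recover the original mapping.
assert json.loads(payload["extraction_schema"]) == schema
```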
Example Schema:

```json
{
  "title": "h1",
  "description": "p.description",
  "price": ".price-value",
  "author": ".author-name",
  "tags": ".tag"
}
```

Example Usage:

```json
{
  "url": "https://example.com/product",
  "extraction_schema": "{\"title\": \"h1\", \"price\": \".price\", \"description\": \"p\"}"
}
```

Purpose: Capture a visual representation of a webpage
Parameters:
- `url` (string): The webpage URL to screenshot
Example:

```json
{
  "url": "https://example.com"
}
```

Returns: Base64-encoded PNG image data with metadata
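On the client side, the base64 payload must be decoded back to raw bytes before the image can be saved or displayed. A sketch, assuming a response field named `screenshot` (the exact field name may differ):

```python
import base64

# Hypothetical response: the tool returns base64-encoded PNG data
# (the "screenshot" field name and fake bytes are assumed for illustration).
response = {
    "url": "https://example.com",
    "success": True,
    "screenshot": base64.b64encode(b"\x89PNG\r\n\x1a\n...fake-image-data...").decode("ascii"),
}

# Decode back to raw PNG bytes; a real client would write these to a
# .png file or hand them to an image viewer.
png_bytes = base64.b64decode(response["screenshot"])
assert png_bytes.startswith(b"\x89PNG")  # PNG files begin with this magic number
```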
To use this server with Claude Desktop, add this configuration to your Claude Desktop settings:
```json
{
  "mcpServers": {
    "crawl4ai": {
      "command": "python3",
      "args": ["/path/to/crawl4ai-mcp/crawl4ai_mcp_server.py"],
      "env": {}
    }
  }
}
```

Replace /path/to/crawl4ai-mcp/ with the actual path to your installation directory.
All tools include comprehensive error handling and return structured JSON responses:
```json
{
  "error": "Error description",
  "url": "https://example.com",
  "success": false
}
```

Common error scenarios:
- Invalid URL format
- Network connectivity issues
- Invalid extraction schemas
- Screenshot capture failures
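A client can branch on the `success` flag before touching any other field. A minimal sketch using the error shape shown above:

```python
import json

# Structured error response in the shape the server returns on failure.
raw = '{"error": "Invalid URL format", "url": "not-a-url", "success": false}'
result = json.loads(raw)

if not result.get("success", True):
    # Surface the server-reported error rather than parsing partial data.
    message = f"crawl failed for {result['url']}: {result['error']}"
else:
    message = "ok"

print(message)  # → crawl failed for not-a-url: Invalid URL format
```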
```
crawl4ai-mcp/
├── crawl4ai_mcp_server.py   # Main server implementation
├── requirements.txt         # Python dependencies
├── pyproject.toml           # Project configuration
├── USAGE_EXAMPLES.md        # Detailed usage examples
└── README.md                # This file
```
- fastmcp: FastMCP framework for MCP server development
- crawl4ai: Core web crawling and extraction library
- pydantic: Data validation and parsing
- playwright: Browser automation for screenshots
Run the linter to ensure code quality:

```bash
ruff check .
```

Test server startup:

```bash
python3 crawl4ai_mcp_server.py
```

- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly with MCP Inspector
- Submit a pull request
This project is open source. See the LICENSE file for details.
For issues and questions:
- Check the troubleshooting section in USAGE_EXAMPLES.md
- Test with MCP Inspector to isolate issues
- Verify all dependencies are correctly installed
- Ensure virtual environment is activated
- Crawl4AI: Powerful web crawling and extraction capabilities
- FastMCP: Streamlined MCP server development framework
- Model Context Protocol: Standardized AI tool integration