# Crawl4AI MCP Server

A powerful Model Context Protocol (MCP) server that provides web scraping and crawling capabilities using Crawl4AI. This server acts as the "hands and eyes" for client-side AI, enabling intelligent web content analysis and extraction.

## Features

- 🔍 **Page Structure Analysis**: Extract clean HTML or Markdown content from any webpage
- 🎯 **Schema-Based Extraction**: Precision data extraction using CSS selectors and AI-generated schemas
- 📸 **Screenshot Capture**: Visual webpage representation for analysis
- ⚡ **Async Operations**: Non-blocking web crawling with progress reporting
- 🛡️ **Error Handling**: Comprehensive error handling and validation
- 📊 **MCP Integration**: Full Model Context Protocol compatibility with logging and progress tracking

## Architecture

```
┌─────────────────┐    ┌───────────────────┐    ┌─────────────────┐
│   Client AI     │    │  Crawl4AI MCP     │    │   Web Content   │
│   ("Brain")     │◄──►│   Server          │◄──►│   (Websites)    │
│                 │    │  ("Hands & Eyes") │    │                 │
└─────────────────┘    └───────────────────┘    └─────────────────┘
```
- **FastMCP**: Handles MCP protocol and tool registration
- **AsyncWebCrawler**: Provides async web scraping capabilities
- **Stdio Transport**: MCP-compatible communication channel
- **Error-Safe Logging**: All logs directed to stderr to prevent protocol corruption
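The error-safe logging point above can be sketched with the standard library: route every log record to stderr so stdout stays reserved for stdio-transport protocol messages. The logger name and format here are illustrative assumptions, not the server's actual configuration.

```python
# Minimal sketch of error-safe logging for a stdio MCP server:
# every record goes to stderr so stdout carries only protocol messages.
# The logger name and format are illustrative assumptions.
import logging
import sys

def configure_safe_logging() -> logging.Logger:
    """Build a logger that writes to stderr, never stdout."""
    logger = logging.getLogger("crawl4ai-mcp")
    handler = logging.StreamHandler(sys.stderr)  # stderr keeps stdout protocol-clean
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate records via the root logger
    return logger

logger = configure_safe_logging()
logger.info("server starting")  # emitted on stderr only
```

Any `print()` or root-logger write to stdout would be parsed as a (malformed) protocol message by the client, which is why the handler is pinned to `sys.stderr`.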

## Installation

### Prerequisites

- Python 3.10 or higher
- pip package manager

### Setup

1. Clone or download this repository:

   ```bash
   git clone <repository-url>
   cd crawl4ai-mcp
   ```

2. Create and activate a virtual environment:

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Install Playwright browsers (required for screenshots):

   ```bash
   playwright install
   ```

## Usage

### Starting the Server

```bash
# Activate virtual environment
source venv/bin/activate

# Start the MCP server
python3 crawl4ai_mcp_server.py
```

### Testing with MCP Inspector

For interactive testing and development:

```bash
# Start MCP Inspector interface
fastmcp dev crawl4ai_mcp_server.py
```

This will start a web interface (usually at http://localhost:6274) where you can test all tools interactively.

## Available Tools

### 1. server_status

**Purpose**: Get server health and capabilities information

**Parameters**: None

Example response:

```json
{
  "server_name": "Crawl4AI-MCP-Server",
  "version": "1.0.0",
  "status": "operational",
  "capabilities": ["web_crawling", "content_extraction", "screenshot_capture", "schema_based_extraction"]
}
```

### 2. get_page_structure

**Purpose**: Extract webpage content for analysis (the "eyes" function)

**Parameters**:

- `url` (string): The webpage URL to analyze
- `format` (string, optional): Output format, `"html"` or `"markdown"` (default: `"html"`)

Example:

```json
{
  "url": "https://example.com",
  "format": "html"
}
```

### 3. crawl_with_schema

**Purpose**: Precision data extraction using CSS selectors (the "hands" function)

**Parameters**:

- `url` (string): The webpage URL to extract data from
- `extraction_schema` (string): JSON string defining field names and CSS selectors

Example schema:

```json
{
  "title": "h1",
  "description": "p.description",
  "price": ".price-value",
  "author": ".author-name",
  "tags": ".tag"
}
```

Example usage:

```json
{
  "url": "https://example.com/product",
  "extraction_schema": "{\"title\": \"h1\", \"price\": \".price\", \"description\": \"p\"}"
}
```
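Because `extraction_schema` arrives as a JSON *string*, the server has to parse and validate it before crawling. A minimal sketch of that step, assuming the documented "field name to CSS selector" shape (the function name is illustrative):

```python
# Minimal sketch of validating a crawl_with_schema extraction_schema
# argument before handing it to the crawler. The helper name is
# illustrative, not part of the server's API.
import json

def parse_extraction_schema(schema_str: str) -> dict[str, str]:
    """Parse a JSON string mapping field names to CSS selectors."""
    try:
        schema = json.loads(schema_str)
    except json.JSONDecodeError as exc:
        raise ValueError(f"extraction_schema is not valid JSON: {exc}") from exc
    if not isinstance(schema, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in schema.items()
    ):
        raise ValueError("extraction_schema must map field names to CSS selector strings")
    return schema

schema = parse_extraction_schema('{"title": "h1", "price": ".price"}')
```

Validating up front means a malformed schema produces one clear error message instead of a failure deep inside the crawl.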

### 4. take_screenshot

**Purpose**: Capture a visual representation of a webpage

**Parameters**:

- `url` (string): The webpage URL to screenshot

Example:

```json
{
  "url": "https://example.com"
}
```

**Returns**: Base64-encoded PNG image data with metadata
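On the client side, the base64 payload has to be decoded back into PNG bytes before it can be viewed. A sketch, assuming a response field named `image_data` (the field name is a guess about the payload, not the server's documented schema):

```python
# Minimal sketch of consuming a take_screenshot result. The response
# field name "image_data" is an assumption about the payload, not the
# server's documented schema.
import base64

def save_screenshot(response: dict, path: str) -> int:
    """Decode the base64 PNG payload, write it to disk, return its size in bytes."""
    png_bytes = base64.b64decode(response["image_data"])
    with open(path, "wb") as f:
        f.write(png_bytes)
    return len(png_bytes)

# Illustrative round trip with fake PNG header bytes
fake = {"image_data": base64.b64encode(b"\x89PNG\r\n").decode()}
size = save_screenshot(fake, "screenshot.png")
```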

## Integration with Claude Desktop

To use this server with Claude Desktop, add this configuration to your Claude Desktop settings:

```json
{
  "mcpServers": {
    "crawl4ai": {
      "command": "python3",
      "args": ["/path/to/crawl4ai-mcp/crawl4ai_mcp_server.py"],
      "env": {}
    }
  }
}
```

Replace `/path/to/crawl4ai-mcp/` with the actual path to your installation directory.

## Error Handling

All tools include comprehensive error handling and return structured JSON responses:

```json
{
  "error": "Error description",
  "url": "https://example.com",
  "success": false
}
```

Common error scenarios:

- Invalid URL format
- Network connectivity issues
- Invalid extraction schemas
- Screenshot capture failures

## Development

### Project Structure

```
crawl4ai-mcp/
├── crawl4ai_mcp_server.py    # Main server implementation
├── requirements.txt          # Python dependencies
├── pyproject.toml            # Project configuration
├── USAGE_EXAMPLES.md         # Detailed usage examples
└── README.md                 # This file
```

### Dependencies

- `fastmcp`: FastMCP framework for MCP server development
- `crawl4ai`: Core web crawling and extraction library
- `pydantic`: Data validation and parsing
- `playwright`: Browser automation for screenshots

### Testing

Run the linter to ensure code quality:

```bash
ruff check .
```

Test server startup:

```bash
python3 crawl4ai_mcp_server.py
```

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly with MCP Inspector
5. Submit a pull request

## License

This project is open source. See the LICENSE file for details.

## Support

For issues and questions:

1. Check the troubleshooting section in USAGE_EXAMPLES.md
2. Test with MCP Inspector to isolate issues
3. Verify all dependencies are correctly installed
4. Ensure the virtual environment is activated

## Acknowledgments

- **Crawl4AI**: Powerful web crawling and extraction capabilities
- **FastMCP**: Streamlined MCP server development framework
- **Model Context Protocol**: Standardized AI tool integration
