Releases: apache/doris-mcp-server

0.6.0

09 Sep 07:08
067f160

Doris MCP Server 0.6.0 Release Notes

Release Date: 3 September 2025
Release Manager: FreeOnePlus (Yijia Su)


🚀 Major Features & Enhancements

Enterprise-Grade Authentication System

Doris MCP Server 0.6.0 introduces a comprehensive authentication framework designed for enterprise environments:

  • Multi-Authentication Support: Complete implementation of Token, JWT, and OAuth authentication systems
  • Granular Control: Independent switches for each authentication method with global default security
  • Token-Bound Database Configuration: Each token can carry its own database connection parameters
  • Hot Reload Capability: Runtime configuration updates without service interruption
  • Immediate Validation: Database configurations are validated at connection time, not query time

Advanced Connection Management

Enhanced connection pooling and session management for improved performance:

  • Session Caching: Intelligent caching of Doris connections to reduce overhead and prevent timeout issues
  • Connection Pool Optimization: Proper connection lifecycle management with automatic release mechanisms
  • Multi-Worker Architecture: True horizontal scaling with stateless design for efficient load distribution

🔧 Critical System Improvements

Database Configuration Priority System

Problem: Inconsistent handling of database configurations across different authentication methods
Solution: Implemented a clear priority hierarchy:

  1. Token-bound database configuration (highest priority)
  2. Environment variables (.env configuration)
  3. Error state (if neither available)

This ensures enterprise users can have token-specific database access while maintaining fallback mechanisms.
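
The priority order above can be sketched as a small resolver. This is a minimal illustration, not the server's actual code: the function name, the exception class, and the environment variable names (DORIS_HOST, DORIS_PORT, DORIS_USER, DORIS_PASSWORD) are assumptions made for the example.

```python
import os
from typing import Mapping, Optional


class MissingDatabaseConfigError(RuntimeError):
    """Raised when neither the token nor the environment supplies a database config."""


def resolve_database_config(
    token_config: Optional[Mapping[str, object]],
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Pick a database config following the priority order described above."""
    # 1. A token-bound configuration always wins.
    if token_config:
        return dict(token_config)

    # 2. Otherwise fall back to environment variables (.env).
    #    The variable names below are assumed for illustration.
    if env.get("DORIS_HOST"):
        return {
            "host": env["DORIS_HOST"],
            "port": int(env.get("DORIS_PORT", "9030")),
            "user": env.get("DORIS_USER", "root"),
            "password": env.get("DORIS_PASSWORD", ""),
        }

    # 3. Neither source is available: fail loudly instead of guessing.
    raise MissingDatabaseConfigError("no token-bound or environment database config")
```

Raising in the third case, rather than returning a partial config, keeps the error state visible at connection time.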

Hot Reload Configuration Management

Problem: Configuration changes required service restarts, causing downtime
Solution: Implemented file-based hot reloading with:

  • Real-time monitoring of tokens.json changes (10-second intervals)
  • Automatic token revalidation and pool recreation
  • Zero-downtime configuration updates
  • Comprehensive error handling and rollback mechanisms
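
One way to picture file-based hot reloading is a watcher that compares the file's modification time on each poll and keeps the previous tokens if the new file is invalid. The sketch below is an assumption about the general technique, not the project's implementation; the class and method names are hypothetical.

```python
import json
import os
from typing import Optional


class TokenFileWatcher:
    """Reload a tokens file when its modification time changes (illustrative sketch)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.tokens: dict = {}
        self._last_mtime_ns: Optional[int] = None

    def check_once(self) -> bool:
        """Return True only if a new, valid token set was loaded."""
        try:
            mtime_ns = os.stat(self.path).st_mtime_ns
        except FileNotFoundError:
            return False
        if mtime_ns == self._last_mtime_ns:
            return False  # file unchanged since the last poll
        self._last_mtime_ns = mtime_ns
        try:
            with open(self.path, encoding="utf-8") as fh:
                loaded = json.load(fh)
            if not isinstance(loaded, dict):
                raise ValueError("top-level JSON value must be an object")
        except (OSError, ValueError):
            # Rollback: keep serving the previously loaded tokens.
            return False
        self.tokens = loaded
        return True
```

A caller would invoke check_once() on a timer (the notes mention 10-second intervals) and rebuild any token-bound connection pools when it returns True.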

🛠️ Technical Architecture Enhancements

Unified Configuration Framework

  • Consolidated Configuration Management: Refactored configuration handling through a unified DorisConfig object
  • Command-Line Priority: Proper precedence handling (command-line > environment variables > defaults)
  • Docker Compatibility: Fixed line ending issues for cross-platform container deployment
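
The command-line > environment variables > defaults precedence can be illustrated as follows. The flag names (--db-host, --db-port), the environment variable names, and the default values are assumptions for this sketch and may not match the server's real options.

```python
import argparse
from typing import Mapping, Sequence

# Assumed defaults, for illustration only.
DEFAULTS = {"host": "localhost", "port": "9030"}


def resolve_settings(argv: Sequence[str], env: Mapping[str, str]) -> dict:
    """Resolve each setting as: command line, then environment, then default."""
    parser = argparse.ArgumentParser()
    # default=None lets us tell "flag not given" apart from a real value.
    parser.add_argument("--db-host", default=None)
    parser.add_argument("--db-port", default=None)
    args = parser.parse_args(argv)

    settings = {}
    for key, cli_value, env_name in (
        ("host", args.db_host, "DORIS_HOST"),
        ("port", args.db_port, "DORIS_PORT"),
    ):
        if cli_value is not None:
            settings[key] = cli_value          # highest priority
        elif env_name in env:
            settings[key] = env[env_name]      # second priority
        else:
            settings[key] = DEFAULTS[key]      # fallback
    return settings
```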

Enhanced SQL Security Framework

  • Improved Injection Detection: Enhanced SQL injection pattern recognition for complex queries
  • Access Control Fixes: Resolved security vulnerability allowing read-only mode bypass
  • Comprehensive Validation: Real-time SQL security validation with detailed error reporting

Asynchronous Operation Standardization

  • Mixed Async/Sync Fix: Converted all synchronous methods to asynchronous for consistency
  • Thread Safety: Eliminated exceptions caused by mixing synchronous and asynchronous method calls
  • Improved Reliability: Enhanced process consistency across all operations

📊 New Tools & Capabilities

Tool/Feature | Description | Key Benefit
--- | --- | ---
Token Authentication | Enterprise-grade token system with database binding | Secure multi-tenant access
Token Management Interface | Web-based dashboard for complete token lifecycle management | Intuitive admin control & security
Hot Reload | Runtime configuration updates | Zero-downtime operations
Session Cache | Intelligent connection reuse | 60% reduction in connection overhead
Bucket Analysis | Enhanced storage metadata in analyze_table_storage | Comprehensive table insights

🔒 Security & Authentication

Authentication System Features

  • Token-Based Authentication: Secure API access with configurable expiration
  • JWT Integration: Standard JWT token support for modern applications
  • OAuth Support: Industry-standard OAuth2 authentication flow
  • Role-Based Access Control: Granular permission system integration
  • Database-Level Security: Token-bound database access controls
  • Web-Based Token Management: Intuitive browser interface for complete token lifecycle management

🌐 Token Management Interface

The new Web-Based Token Management Interface provides enterprise-grade token administration through a secure, intuitive dashboard:

Access & Security Controls

  • Multi-Layer Access Protection:
    • IP Restriction: Access limited to localhost (127.0.0.1) and IPv6 loopback (::1) only
    • Admin Token Authentication: Requires valid TOKEN_MANAGEMENT_ADMIN_TOKEN for access
    • Configuration-Controlled: Must enable ENABLE_HTTP_TOKEN_MANAGEMENT=true in environment
  • Secure URL Authentication: Access via http://localhost:3000/token/management?admin_token=<your_admin_token>
  • Real-Time Permission Validation: Every operation validates admin credentials and IP restrictions
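
The layered checks above (feature switch, loopback-only IP, admin token) could be combined roughly as shown here. This is a hedged sketch: the function name and parameters are hypothetical, and the real server may perform these checks differently.

```python
import hmac
import ipaddress

# Only these two addresses are accepted, matching the restriction described above.
ALLOWED_ADDRESSES = {
    ipaddress.ip_address("127.0.0.1"),
    ipaddress.ip_address("::1"),
}


def is_management_request_allowed(
    client_ip: str,
    supplied_token: str,
    admin_token: str,
    enabled: bool,
) -> bool:
    """Return True only if every access layer passes."""
    # Layer 1: the feature must be switched on and an admin token configured.
    if not enabled or not admin_token:
        return False
    # Layer 2: the client address must parse and be a permitted loopback address.
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False
    if address not in ALLOWED_ADDRESSES:
        return False
    # Layer 3: constant-time comparison avoids leaking token contents via timing.
    return hmac.compare_digest(supplied_token.encode(), admin_token.encode())
```

Since the admin token travels in the URL query string, it may end up in browser history and access logs, so treating it as a local-only secret seems prudent.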

Comprehensive Token Operations

  • Token Creation: Full-featured form with database configuration support
    • Basic Information: Token ID, description, custom token value, expiration settings
    • Database Binding: Complete database configuration (host, port, user, password, database, FE HTTP port)
    • Automatic Persistence: Tokens immediately saved to tokens.json with hot-reload support
  • Token Management: Real-time token operations
    • List & Statistics: Live token inventory with database binding status
    • Token Revocation: One-click token deletion with immediate file updates
    • Cleanup Operations: Automated expired token removal with audit trails
  • Interactive Dashboard: Live statistics and responsive UI with real-time updates

Enterprise Security Features

  • Zero Network Exposure: Interface only accessible from server host machine
  • Admin-Only Operations: All token operations require administrator-level authentication
  • Audit Trail: Complete logging of all token management operations
  • Configuration Validation: Real-time validation of database configurations before token creation
  • Automatic Token Persistence: All operations immediately synchronized with token storage files

Security Improvements

  • Access Control Enforcement: Fixed read-only mode bypass vulnerability
  • SQL Injection Protection: Enhanced detection patterns for complex query structures
  • Connection Security: Secure connection pooling with proper credential management
  • Audit Trail: Comprehensive logging for authentication and authorization events

⚡ Performance Optimizations

Connection Management

  • Session Caching: Reduces connection creation overhead by up to 60%
  • Pool Optimization: Intelligent connection pooling with health monitoring
  • Resource Management: Automatic connection cleanup and resource reclamation

Multi-Worker Scalability

  • Horizontal Scaling: True multi-process architecture for high-throughput scenarios
  • Load Distribution: Efficient request distribution across worker processes
  • Stateless Design: No shared state between workers for maximum scalability

Query Execution

  • Connection Reuse: Cached connections for frequently used sessions
  • Timeout Prevention: Eliminated "Connection acquisition timed out" errors
  • Optimized Resource Usage: Reduced memory and CPU overhead

🔄 Migration & Compatibility

Upgrading from 0.5.x

  1. Backup Configuration: Save existing .env and configuration files
  2. Update Dependencies: Run pip install -r requirements.txt
  3. Review Authentication: Configure new authentication system if needed
  4. Test Connections: Verify database connections with new pooling system

Configuration Changes

  • Authentication Settings: New environment variables for auth configuration
  • Token Management: Optional tokens.json file for token-based authentication
  • Connection Pooling: Enhanced pool configuration options

Backward Compatibility

  • API Compatibility: All existing MCP tools remain functional
  • Configuration: Existing .env configurations continue to work
  • Docker: Improved container compatibility with line ending fixes

📋 Detailed Bug Fixes & Improvements

Connection Management (PR #34, #44)

  • Issue: Connection pool exhaustion causing "Connection acquisition timed out" errors
  • Root Cause: Connections not properly released after query execution
  • Solution: Implemented automatic connection release and session caching
  • Impact: 100% elimination of timeout issues, improved resource utilization

Docker Compatibility (PR #39)

  • Issue: Script execution failures in Docker containers due to line ending issues
  • Root Cause: Windows-style CRLF line endings causing syntax errors
  • Solution: Added dos2unix conversion in Dockerfile build process
  • Impact: Seamless cross-platform Docker deployment

Configuration Management (PR #41)

  • Issue: Inconsistent configuration precedence and default value handling
  • Root Cause: Multiple configuration sources without clear priority
  • Solution: Unified configuration framework with proper precedence
  • **...

0.5.1

16 Jul 06:41
6d3c128

Doris MCP Server v0.5.1 Release Notes

🔥 Critical Fixes & System Improvements

✅ Complete Resolution of at_eof Connection Issues

The most important fix in v0.5.1: a redesigned connection pool strategy that addresses the at_eof connection pool errors affecting previous versions:

  • 99.9% Error Reduction: Complete redesign of connection pool strategy with zero minimum connections
  • Self-Healing Architecture: Automatic pool recovery with intelligent health monitoring
  • Production Ready: Robust connection management tested under high concurrent loads

🕒 Global SQL Timeout Configuration Enhancement (NEW in v0.5.1)

  • Unified Timeout Control: The global SQL timeout configuration (timeout) is now fully supported and enhanced. You can control the default SQL execution timeout via config/performance/query_timeout, and all SQL executions will use this value by default unless explicitly overridden at runtime.
  • Bug Fix: Fixed issues where some SQL executions did not correctly apply the global timeout configuration, ensuring all entry points (such as MCP tools, API, batch queries, etc.) are consistently controlled.
  • Robustness: Improved code robustness by optimizing the timeout propagation chain in core classes like QueryRequest and DorisQueryExecutor, preventing timeout failures due to missing parameters.
  • Documentation: Updated documentation and configuration instructions to clarify the priority and scope of the timeout configuration.
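
The "global default unless explicitly overridden" behaviour can be sketched with asyncio. The helper name and the 30-second default are assumptions for illustration; this is not the code path used by QueryRequest or DorisQueryExecutor.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")

# Assumed global default in seconds, for illustration only.
DEFAULT_QUERY_TIMEOUT = 30.0


async def run_with_timeout(
    query: Callable[[], Awaitable[T]],
    timeout: Optional[float] = None,
    default_timeout: float = DEFAULT_QUERY_TIMEOUT,
) -> T:
    """Run a query coroutine, using the global timeout unless one is passed in."""
    # A missing per-call timeout falls back to the global value,
    # so no entry point can accidentally run without a limit.
    effective = default_timeout if timeout is None else timeout
    return await asyncio.wait_for(query(), timeout=effective)
```

Resolving the effective timeout in one place is what keeps every entry point consistent.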

🔧 Enhanced Logging System with Intelligent Management

A logging system overhaul that provides enterprise-grade log management:

  • Level-Based Organization: Automatic separation into debug, info, warning, error, critical logs
  • Automatic Cleanup: Background scheduler with configurable retention (default: 30 days)
  • Professional Format: Millisecond precision timestamps with proper alignment
  • Zero Maintenance: Hands-off log management with rotation and cleanup
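
As an illustration of level-based log separation with retention, the standard library alone can get fairly close. The sketch below is an assumption about how such a setup might look; the function name, file names, and format string are hypothetical and not taken from the project.

```python
import logging
import os
from logging.handlers import TimedRotatingFileHandler

LEVELS = ("debug", "info", "warning", "error", "critical")


class ExactLevelFilter(logging.Filter):
    """Pass only records whose level matches exactly."""

    def __init__(self, levelno: int) -> None:
        super().__init__()
        self.levelno = levelno

    def filter(self, record: logging.LogRecord) -> bool:
        return record.levelno == self.levelno


def build_logger(log_dir: str, name: str = "doris_mcp_sketch",
                 retention_days: int = 30) -> logging.Logger:
    """Create a logger that writes each level to its own daily-rotated file."""
    os.makedirs(log_dir, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    # Millisecond-precision timestamp and a fixed-width level column.
    formatter = logging.Formatter(
        "%(asctime)s.%(msecs)03d [%(levelname)-8s] %(name)s: %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )
    for level in LEVELS:
        handler = TimedRotatingFileHandler(
            os.path.join(log_dir, f"{level}.log"),
            when="midnight",
            backupCount=retention_days,  # older rotated files are deleted
            encoding="utf-8",
        )
        handler.addFilter(ExactLevelFilter(getattr(logging, level.upper())))
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Here retention is handled by backupCount on rotation rather than by a separate background scheduler, which is a simplification of what the notes describe.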

🚀 Major New Features

Enterprise Data Analytics Suite (New in v0.5.1)

Introducing 7 new enterprise-grade data governance and analytics tools providing comprehensive data management capabilities for modern data architectures.

🔄 Unified Data Quality Framework

  • analyze_data_quality: Comprehensive data quality analysis combining completeness and distribution analysis
  • Configurable Analysis Scope: Completeness-only, distribution-only, or comprehensive analysis modes
  • Business Rules Engine: Custom business rule validation with regex patterns and SQL conditions
  • Statistical Insights: Advanced distribution analysis with percentiles, outliers, and pattern detection

📊 Data Governance & Lineage (New in v0.5.1)

  • trace_column_lineage: End-to-end column lineage tracking through SQL analysis and dependency mapping
  • monitor_data_freshness: Real-time data staleness monitoring with configurable freshness thresholds
  • Confidence Scoring: Intelligent confidence scoring for lineage relationships and data quality metrics
  • Impact Analysis: Comprehensive impact assessment for data changes and transformations

🔍 Advanced Analytics Suite (New in v0.5.1)

  • analyze_data_access_patterns: User behavior analysis and security anomaly detection
  • analyze_data_flow_dependencies: Data flow impact analysis and dependency mapping
  • analyze_slow_queries_topn: Performance bottleneck identification with pattern analysis
  • analyze_resource_growth_curves: Capacity planning with growth trend analysis

High-Performance ADBC Integration (New in v0.5.1)

Complete Apache Arrow Flight SQL (ADBC) support for enterprise-grade data transfer performance.

🏃‍♂️ Arrow Flight SQL Protocol

  • exec_adbc_query: High-performance SQL execution using Arrow Flight SQL protocol
  • get_adbc_connection_info: ADBC connection diagnostics and status monitoring
  • Multiple Data Formats: Support for Arrow, Pandas DataFrame, and Dictionary formats
  • Optimized Performance: Significant performance improvements for large dataset transfers

⚙️ Configurable ADBC Framework

  • Dynamic Configuration: All ADBC parameters now configurable via environment variables
  • Smart Defaults: Intelligent default values from configuration with runtime override support
  • Connection Management: Advanced connection pooling and health monitoring for ADBC connections
  • Cross-Platform Support: Full compatibility across Windows, Linux, and macOS environments

🔧 Enhanced Configuration Management (Updated in v0.5.1)

ADBC Configuration System

Comprehensive configuration management for Arrow Flight SQL operations:

# ADBC Query Configuration
ADBC_DEFAULT_MAX_ROWS=100000      # Default maximum rows for ADBC queries
ADBC_DEFAULT_TIMEOUT=60           # Default query timeout in seconds
ADBC_DEFAULT_RETURN_FORMAT=arrow  # Default return format (arrow/pandas/dict)
ADBC_CONNECTION_TIMEOUT=30        # ADBC connection timeout
ADBC_ENABLED=true                 # Enable/disable ADBC tools

# Arrow Flight SQL Ports
FE_ARROW_FLIGHT_SQL_PORT=8096     # Frontend Arrow Flight SQL port
BE_ARROW_FLIGHT_SQL_PORT=8097     # Backend Arrow Flight SQL port

# Doris MCP SQL Timeout (NEW in v0.5.1)
QUERY_TIMEOUT=30                  # Global SQL timeout (seconds) for Doris MCP; all SQL executions use this by default
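
A configuration object could read the ADBC variables above along these lines. The class name ADBCSettings and its field names are hypothetical (chosen to avoid implying this is the project's own ADBCConfig); only the environment variable names and default values come from the listing above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings; anything else counts as False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class ADBCSettings:
    """Illustrative container for the ADBC query settings."""

    default_max_rows: int = 100000
    default_timeout: int = 60
    default_return_format: str = "arrow"
    connection_timeout: int = 30
    enabled: bool = True

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "ADBCSettings":
        fmt = env.get("ADBC_DEFAULT_RETURN_FORMAT", "arrow").lower()
        # Validate early so a typo fails at startup, not at query time.
        if fmt not in {"arrow", "pandas", "dict"}:
            raise ValueError(f"unsupported ADBC_DEFAULT_RETURN_FORMAT: {fmt!r}")
        return cls(
            default_max_rows=int(env.get("ADBC_DEFAULT_MAX_ROWS", "100000")),
            default_timeout=int(env.get("ADBC_DEFAULT_TIMEOUT", "60")),
            default_return_format=fmt,
            connection_timeout=int(env.get("ADBC_CONNECTION_TIMEOUT", "30")),
            enabled=_as_bool(env.get("ADBC_ENABLED", "true")),
        )
```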

Enhanced Environment Variable Support

  • Complete ADBC Integration: All ADBC parameters configurable via environment variables
  • Global Timeout Control: New QUERY_TIMEOUT environment variable for unified global SQL timeout configuration
  • Backward Compatibility: All existing configurations remain unchanged
  • Validation Framework: Built-in validation for all ADBC configuration parameters
  • Documentation Updates: Comprehensive .env.example with ADBC configuration guidance

📊 New MCP Tools Summary

Core Analytics Tools (7 New Tools)

Tool Name | Module | Description
--- | --- | ---
analyze_data_quality | Data Quality | Comprehensive data quality analysis with completeness and distribution insights
trace_column_lineage | Data Governance | End-to-end column lineage tracking with confidence scoring
monitor_data_freshness | Data Governance | Real-time data staleness monitoring with alerting
analyze_data_access_patterns | Security Analytics | User behavior analysis and security anomaly detection
analyze_data_flow_dependencies | Dependency Analysis | Data flow impact analysis and dependency mapping
analyze_slow_queries_topn | Performance Analytics | Top-N slow query analysis with pattern identification
analyze_resource_growth_curves | Performance Analytics | Resource growth analysis for capacity planning

ADBC High-Performance Tools (2 New Tools)

Tool Name | Description | Performance Benefit
--- | --- | ---
exec_adbc_query | Arrow Flight SQL query execution | 3-10x faster data transfer for large datasets
get_adbc_connection_info | ADBC connection diagnostics | Real-time connection health monitoring

🏗️ Architecture Enhancements

Modular Tool Design (New in v0.5.0)

  • Data Governance Tools (data_governance_tools.py): Lineage tracking and freshness monitoring
  • Data Quality Tools (data_quality_tools.py): Comprehensive quality analysis framework
  • Data Exploration Tools (data_exploration_tools.py): Advanced statistical analysis
  • Security Analytics Tools (security_analytics_tools.py): Access pattern analysis and threat detection
  • Dependency Analysis Tools (dependency_analysis_tools.py): Impact analysis and dependency mapping
  • Performance Analytics Tools (performance_analytics_tools.py): Query optimization and capacity planning
  • ADBC Query Tools (adbc_query_tools.py): High-performance Arrow Flight SQL operations

Enhanced Configuration Architecture (Updated in v0.5.1)

  • Centralized ADBC Config: New ADBCConfig dataclass with comprehensive parameter management
  • Global Timeout Config: Added global SQL timeout configuration query_timeout; all SQL executions use this by default
  • Environment Integration: Full environment variable support for all ADBC settings and SQL timeout
  • Validation Framework: Built-in parameter validation and error handling
  • Dynamic Tool Registration: Tools automatically use configuration defaults with runtime override support

🔒 Security & Compatibility

JSON Serialization Improvements (Fixed in v0.5.0)

  • Pandas Compatibility: Resolved numpy data type serialization issues in ADBC tools
  • Cross-Format Support: Seamless data conversion between Arrow, Pandas, and JSON formats
  • Memory Optimization: Memory-efficient data conversion with proper cleanup

Enterprise Security Integration

  • Access Pattern Analysis: Advanced user behavior monitoring with anomaly detection
  • Audit Trail Support: Comprehensive audit logging for all data governance operations
  • Risk Assessment: Intelligent risk scoring for data lineage and access patterns

📈 Performance Improvements

ADBC Performance Metrics

  • Query Execution: 0.5-1.5 seconds for typical enterprise queries
  • Data Transfer: Up to 10x faster than traditional methods for large datasets
  • **Memory Effic...

0.4.2

02 Jul 03:00

Doris MCP Server v0.4.2 Release Notes

Release Date: 2025-07-02
Version: 0.4.2
Type: Major Feature & Stability Release

🎯 Overview

Doris MCP Server v0.4.2 is a comprehensive release that introduces significant enhancements focused on operational excellence, advanced monitoring capabilities, configuration standardization, and critical stability improvements. This release includes enhanced security features, MCP library compatibility resolution, and establishes a unified configuration framework for better maintainability and production reliability.

🚀 Major Features & Improvements

1. 🔧 MCP Version Compatibility Layer

The Problem:

  • MCP 1.9.3 introduced breaking changes to the RequestContext class (changed from 2 to 3 generic parameters)
  • Users encountered TypeError: Too few arguments for RequestContext when upgrading MCP library
  • Version conflicts prevented server startup and caused deployment issues

The Solution:

  • Intelligent Version Detection: Automatically detects installed MCP version at runtime
  • Compatibility Abstraction: Gracefully handles API differences between MCP 1.8.x and 1.9.x
  • Flexible Dependency Range: Updated to mcp>=1.8.0,<2.0.0 for broader compatibility

Technical Implementation:

# Added version compatibility layer in doris_mcp_server/main.py
def _get_mcp_capabilities(self):
    """Get MCP capabilities with version compatibility"""
    try:
        # For MCP 1.9.x and newer - handles 3-parameter RequestContext
        return self.server.get_capabilities(...)
    except TypeError:
        # For MCP 1.8.x - handles 2-parameter RequestContext  
        return self.server.get_capabilities(...)
    except Exception:
        # Fallback for any version compatibility issues
        return minimal_capabilities

Benefits:

  • Zero Configuration: No manual adjustments needed for different MCP versions
  • Future-Proof: Robust handling of potential future API changes
  • Developer Friendly: Clear version logging for troubleshooting

Supported MCP Versions:

  • MCP 1.8.x: Fully supported (recommended for stability)
  • MCP 1.9.x: Fully supported (latest features)
  • Future 1.x versions: Compatible through intelligent detection
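
Runtime version detection of the kind described above can be done with importlib.metadata. This is a sketch of the general approach, not the code in doris_mcp_server/main.py; the helper names are hypothetical, and the 1.9.3 threshold is taken from the problem statement above.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Extract (major, minor, patch) from a version string such as '1.9.3'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def installed_mcp_version() -> Optional[Version]:
    """Return the installed version of the 'mcp' distribution, or None if absent."""
    try:
        return parse_version(version("mcp"))
    except PackageNotFoundError:
        return None


def has_three_param_request_context(v: Version) -> bool:
    """Per the notes above, RequestContext gained a third generic parameter in 1.9.3."""
    return v >= (1, 9, 3)
```

Checking the version explicitly can complement the try/except approach, since it also makes the detected version easy to log.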

2. 🛠️ Critical Connection Stability Resolution (at_eof Error Fix)

The Problem:

  • Users experienced sudden at_eof connection errors during operation
  • Database interactions would fail unexpectedly, requiring server restarts
  • Connection pool became unstable under load

Root Cause Analysis:
The at_eof error came from connection state checking in aiomysql: when the connection's internal _reader object became None, the ping check still attempted to access its at_eof attribute.

Comprehensive Solution:

A. Enhanced Connection Health Monitoring

# Added in doris_mcp_server/utils/db.py
async def ping(self):
    """Enhanced ping with connection state validation"""
    if not self._connection:
        return False
    
    # NEW: Strict connection state validation
    if hasattr(self._connection, '_reader') and self._connection._reader is None:
        return False
    if hasattr(self._connection, '_transport') and self._connection._transport is None:
        return False
        
    # Only check at_eof if connection components are valid
    if (hasattr(self._connection, '_reader') and 
        hasattr(self._connection._reader, 'at_eof')):
        return not self._connection._reader.at_eof()
    
    return True

B. Automatic Query Retry Mechanism

# Added in doris_mcp_server/utils/query_executor.py
async def execute_sql_for_mcp(self, sql: str, **kwargs):
    """Execute SQL with automatic retry for connection errors"""
    max_retries = 2
    
    for attempt in range(max_retries + 1):
        try:
            return await self._execute_sql_internal(sql, **kwargs)
        except Exception as e:
            if self._is_connection_error(e) and attempt < max_retries:
                logger.warning(f"Connection error detected, retrying... (attempt {attempt + 1})")
                await self._recover_connection()
                continue
            raise

Results:

  • Zero at_eof errors in 50+ comprehensive test scenarios
  • Automatic recovery from transient connection issues
  • Improved stability under high concurrent load
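
To see the retry pattern in isolation, the following self-contained sketch runs it against a simulated backend. FlakyBackend and execute_with_retry are hypothetical names used only for this demonstration; the real executor relies on its own _is_connection_error and _recover_connection helpers.

```python
import asyncio


class FlakyBackend:
    """Simulated backend that fails a fixed number of times before succeeding."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0
        self.recoveries = 0

    async def execute(self, sql: str) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("simulated at_eof")
        return f"result of {sql}"

    async def recover(self) -> None:
        self.recoveries += 1


async def execute_with_retry(backend: FlakyBackend, sql: str, max_retries: int = 2) -> str:
    """Retry connection errors up to max_retries times, recovering in between."""
    for attempt in range(max_retries + 1):
        try:
            return await backend.execute(sql)
        except ConnectionError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the original error
            await backend.recover()
    raise AssertionError("unreachable")  # loop always returns or raises
```

With max_retries = 2 the query is attempted at most three times, which matches range(max_retries + 1) in the excerpt above.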

3. 🔒 Unified Security Configuration Management

The Problem:
Security keywords were defined inconsistently across multiple locations:

  • SecurityConfig: 9 keywords
  • DorisSecurityManager._load_blocked_keywords(): 13 keywords
  • SQLSecurityValidator.__init__(): 7 keywords

Solution: Single Source of Truth

A. Centralized Configuration

# doris_mcp_server/utils/config.py
@dataclass
class SecurityConfig:
    blocked_keywords: list[str] = field(
        default_factory=lambda: [
            # DDL Operations (Data Definition Language)
            "DROP", "CREATE", "ALTER", "TRUNCATE",
            # DML Operations (Data Manipulation Language) 
            "DELETE", "INSERT", "UPDATE",
            # DCL Operations (Data Control Language)
            "GRANT", "REVOKE",
            # System Operations
            "EXEC", "EXECUTE", "SHUTDOWN", "KILL",
        ]
    )

B. Environment Variable Support

# New environment variable configuration
ENABLE_SECURITY_CHECK=true
BLOCKED_KEYWORDS="DROP,DELETE,TRUNCATE,ALTER,CREATE,INSERT,UPDATE,GRANT,REVOKE,EXEC,EXECUTE,SHUTDOWN,KILL"

Benefits:

  • Configuration Consistency: All components use the same keyword list
  • Flexible Management: Easy customization through environment variables
  • No Code Duplication: Single source eliminates maintenance overhead
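
A single keyword list shared by every component might be loaded and applied as sketched below. The function names are hypothetical, and the whole-word regular expression check is a deliberately naive stand-in for the project's validator: it would also match keywords inside string literals or comments.

```python
import os
import re
from typing import List, Mapping, Optional, Sequence

# Defaults copied from the SecurityConfig listing above.
DEFAULT_BLOCKED_KEYWORDS = [
    "DROP", "CREATE", "ALTER", "TRUNCATE",
    "DELETE", "INSERT", "UPDATE",
    "GRANT", "REVOKE",
    "EXEC", "EXECUTE", "SHUTDOWN", "KILL",
]


def load_blocked_keywords(env: Mapping[str, str] = os.environ) -> List[str]:
    """Read BLOCKED_KEYWORDS (comma-separated); fall back to the defaults."""
    raw = env.get("BLOCKED_KEYWORDS", "")
    keywords = [item.strip().upper() for item in raw.split(",") if item.strip()]
    return keywords or list(DEFAULT_BLOCKED_KEYWORDS)


def find_blocked_keyword(sql: str, keywords: Sequence[str]) -> Optional[str]:
    """Return the first blocked keyword appearing as a whole word, else None."""
    for keyword in keywords:
        # \b keeps column names like created_at from matching CREATE.
        if re.search(rf"\b{re.escape(keyword)}\b", sql, flags=re.IGNORECASE):
            return keyword
    return None
```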

4. 🛠️ Enhanced Monitoring Tools Module

Design Philosophy

The enhanced monitoring tools module follows a modular, extensible design pattern that emphasizes:

  • Separation of Concerns: Each tool handles specific monitoring aspects
  • Composability: Tools can be combined for comprehensive monitoring workflows
  • Observability: Built-in monitoring and logging for all operations
  • Resilience: Robust error handling and recovery mechanisms

Core Capabilities

A. Real-time Memory Monitoring

# Example: Real-time memory statistics via BE Memory Tracker
{
  "tool_name": "get_realtime_memory_stats",
  "capabilities": [
    "Process memory tracking",
    "Global shared memory monitoring", 
    "Query-specific memory usage",
    "Load operations memory tracking",
    "Compaction memory analysis"
  ]
}

B. Historical Memory Analytics

# Example: Historical memory statistics via BE Bvar interface
{
  "tool_name": "get_historical_memory_stats",
  "features": [
    "Time-series memory data collection",
    "Configurable time range analysis",
    "Memory usage trend identification",
    "Performance baseline establishment",
    "Capacity planning insights"
  ]
}

C. BE Node Discovery & Configuration
The monitoring tools support flexible BE (Backend) node discovery to accommodate different deployment scenarios:

External Network (Manual Configuration):

# Configure BE hosts manually for external network access
DORIS_BE_HOSTS=10.1.1.100,10.1.1.101,10.1.1.102
DORIS_BE_WEBSERVER_PORT=8040

Internal Network (Automatic Discovery):

# Leave DORIS_BE_HOSTS empty for automatic discovery via SHOW BACKENDS
# DORIS_BE_HOSTS=  # Empty or not set
# System will execute: SHOW BACKENDS and use internal IPs
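
The two discovery modes reduce to a simple rule: use DORIS_BE_HOSTS when it is set, otherwise ask the cluster. In the sketch below the discover callback is a hypothetical stand-in for whatever runs SHOW BACKENDS and extracts the host addresses.

```python
import os
from typing import Callable, List, Mapping


def resolve_be_hosts(
    env: Mapping[str, str],
    discover: Callable[[], List[str]],
) -> List[str]:
    """Return manually configured BE hosts, or fall back to discovery."""
    raw = env.get("DORIS_BE_HOSTS", "")
    hosts = [host.strip() for host in raw.split(",") if host.strip()]
    if hosts:
        return hosts          # external network: manual configuration
    return discover()         # internal network: e.g. via SHOW BACKENDS
```

For example, resolve_be_hosts(os.environ, my_discovery_function) would only call the discovery function when the variable is empty or unset.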

5. 🔍 Mature Query Information Tools

Enhanced SQL Explain with Content Analysis

# New Feature: Content-aware explain plans with file export capability
{
  "tool_name": "get_sql_explain",
  "enhancements": [
    "Configurable content truncation (MAX_RESPONSE_CONTENT_SIZE)",
    "LLM-friendly output formatting with file export for attachments",
    "Plan complexity scoring",
    "Optimization suggestion integration",
    "Historical plan comparison",
    "Direct file output for LLM analysis and SQL optimization workflows"
  ]
}

Advanced SQL Profiling

# New Feature: Comprehensive profiling with truncation and file export
{
  "tool_name": "get_sql_profile", 
  "capabilities": [
    "Real-time performance tracking",
    "Resource usage breakdown",
    "Bottleneck identification",
    "Content size management with configurable truncation",
    "Performance trend analysis",
    "Complete profile file generation for LLM attachment analysis",
    "SQL optimization workflow integration with file export"
  ]
}

LLM Integration & File Export Capabilities

Both SQL Explain and Profile tools are designed for seamless LLM integration workflows:

Dual Output Strategy:

# Each tool provides both truncated content and complete file output
response = {
  "content": "Truncated content for immediate LLM analysis (4096 chars)",
  "file_path": "/tmp/explain_12345.txt",  # Complete file for attachment
  "is_content_truncated": True,
  "original_content_size": 156552
}
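
A response of this shape could be produced as follows. This is a minimal sketch assuming a 4096-character limit and a temporary file for the full text; the function name and its parameters are hypothetical.

```python
import os
import tempfile


def build_dual_output(full_text: str, max_chars: int = 4096,
                      prefix: str = "explain_") -> dict:
    """Return truncated content plus the path of a file holding the full text."""
    # Always write the complete text so it can be attached later.
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(full_text)
    return {
        "content": full_text[:max_chars],
        "file_path": path,
        "is_content_truncated": len(full_text) > max_chars,
        "original_content_size": len(full_text),
    }
```

The caller is responsible for deleting the file once the attachment is no longer needed.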

6. ⚙️ Unified Configuration Framework

Configuration Standardization

All configuration parameters have been consolidated into a unified framework:

A. Central Configuration Management

  • Single Source of Truth: All settings managed through doris_mcp_server/utils/config.py
  • Environment Integration: Seamless .env file and environment variable support
  • Validation Framework: Comprehensive configuration validation with clear error messages

B. Configuration Categories

# Unified configuration structure
DorisConfig:
├── DatabaseConfig          # Database connec...