Skip to content

sawantudayan/ModelShield

 
 

Repository files navigation

🛡️ ModelShield

Production-Ready ModelShield Framework

Python 3.8+ License: MIT FastAPI Contributions Welcome

ModelShield is a comprehensive, production-ready framework for implementing safety guardrails in Large Language Model (LLM) applications. It provides multi-layered protection against prompt injection, content policy violations, and ensures responsible AI deployment.

🌟 Key Features

Multi-Layered Detection

  • 📝 Pattern-Based Detection: Comprehensive regex patterns for known attack vectors
  • 🔍 Enhanced PII Detection: Microsoft Presidio integration for context-aware personal data identification
  • 🧠 Semantic Analysis: Transformer-based ML models for detecting obfuscated and novel attacks
  • 📤 Output Scanning: Real-time validation of model responses before delivery

Production Ready

  • ⚡ FastAPI Server: High-performance async API with automatic documentation
  • 🔄 Context-Aware: Intelligent severity adjustment based on use case (educational, medical, creative)
  • 📊 Comprehensive Monitoring: Detailed analytics and violation tracking
  • 🎯 Extensive Coverage: 500+ detection patterns across 12+ violation categories

Developer Friendly

  • 🔧 Easy Integration: Simple API endpoints and Python SDK
  • 📋 Comprehensive Testing: Built-in evaluation framework with benchmarking
  • 🎨 Customizable: Extensible pattern library and configurable thresholds
  • 📖 Rich Documentation: Interactive API docs and examples

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/baw117/ModelShield.git
cd modelshield

# Install dependencies
pip install -r requirements.txt

# Download spaCy model for enhanced PII detection
python -m spacy download en_core_web_sm

# Start the server
python startup.py

Basic Usage

from guardrails import ProductionGuardrailsServer, ProductionConfig

# Create configuration
config = ProductionConfig(
    enable_semantic_detection=True,
    enable_enhanced_pii=True,
    enable_output_scanning=True
)

# Start server
server = ProductionGuardrailsServer(config)
server.run(host="0.0.0.0", port=8000)

API Usage

# Test input validation
curl -X POST "http://localhost:8000/validate-input" \
     -H "Content-Type: application/json" \
     -d '{"prompt": "Hello, how can I help you today?", "user_id": "user123"}'

# Test output validation
curl -X POST "http://localhost:8000/validate-output" \
     -H "Content-Type: application/json" \
     -d '{"response_text": "Here is some helpful information...", "original_prompt": "How do I...?"}'

🏗️ Architecture

ModelShield employs a multi-layered detection approach:

Input → Pattern Detection → PII Detection → Semantic Analysis → Context Analysis → Decision
         ↓                  ↓               ↓                 ↓               ↓
    Regex Rules         Presidio NLP    Transformer ML    Educational?    Allow/Block/Filter
    • Prompt Injection  • Email/SSN     • Toxicity       • Medical?      
    • Hate Speech      • Credit Cards   • Obfuscation    • Creative?     
    • Violence         • Phone Numbers  • Similarity     • News?         

📊 Detection Categories

Category Coverage Examples
Prompt Injection 25+ patterns System override attempts, instruction bypass
PII Detection 15+ types Email, SSN, credit cards, API keys
Content Safety 200+ patterns Inappropriate content across multiple domains
Context Analysis 4 contexts Educational, medical, creative, news
Output Validation 10+ checks Response filtering, data leak prevention

🔧 Configuration

config = ProductionConfig(
    # Core Detection
    enable_semantic_detection=True,    # ML-based detection
    enable_enhanced_pii=True,          # Presidio integration
    enable_output_scanning=True,       # Response validation
    enable_context_analysis=True,      # Context-aware decisions
    
    # Performance
    max_concurrent_requests=100,       # Rate limiting
    request_timeout_seconds=30,        # Timeout settings
    enable_caching=True,               # Response caching
    
    # Security
    enable_rate_limiting=True,         # Request rate limits
    enable_audit_logging=True,         # Security event logging
    
    # Model Settings
    presidio_confidence_threshold=0.7,  # PII detection threshold
    semantic_confidence_threshold=0.6,  # ML model threshold
)

📈 Performance

  • Latency: < 100ms average response time
  • Throughput: 1000+ requests/second
  • Accuracy: 95%+ detection rate with <2% false positives
  • Coverage: 500+ attack patterns across 12 categories

🧪 Testing

# Run comprehensive test suite
python test_server.py

# Run evaluation benchmarks
python -c "
from comprehensive_test_guardrails import main
import asyncio
asyncio.run(main())
"

# Test specific components
python pii_detector.py
python semantic_detector.py

🤝 Contributing

We welcome contributions! ModelShield thrives on community input to stay ahead of evolving threats.

Areas where we need help:

  • 🔍 New Attack Patterns: Help us identify and defend against emerging threats
  • 🌍 Multi-language Support: Extend detection to non-English content
  • Performance Optimization: Improve speed and efficiency
  • 🧪 Testing & Validation: Expand our test coverage
  • 📚 Documentation: Improve guides and examples

See CONTRIBUTING.md for detailed guidelines.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

📞 Support

🙏 Acknowledgments

  • Microsoft Presidio for PII detection capabilities
  • Hugging Face Transformers for semantic analysis models
  • FastAPI for the robust web framework
  • spaCy for NLP processing

⭐ Star History

If ModelShield helps secure your AI applications, please consider giving us a star! ⭐


Built with ❤️ for the AI Safety Community

Making AI systems safer, one guardrail at a time.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%