A modern, full-stack chat application demonstrating how to integrate a React frontend with a Go backend and run local Large Language Models (LLMs) using Docker's Model Runner. This project features a comprehensive Redis-powered observability stack with real-time monitoring, analytics, and distributed tracing.
This project showcases a complete Generative AI interface with enterprise-grade observability that includes:
- React/TypeScript frontend with a responsive chat UI
- Go backend server for API handling
- Integration with Docker's Model Runner to run Llama 3.2 locally
- Redis Stack with TimeSeries for data persistence and analytics
- Comprehensive observability with metrics, logging, and tracing
- NEW: Redis-powered analytics with real-time performance monitoring
- Enhanced Docker Compose setup with full observability stack
- 💬 Interactive chat interface with message history
- 🔄 Real-time streaming responses (tokens appear as they're generated)
- 🌓 Light/dark mode support based on user preference
- 🐳 Dockerized deployment for easy setup and portability
- 🏠 Run AI models locally without cloud API dependencies
- 🔒 Cross-origin resource sharing (CORS) enabled
- 🧪 Integration testing using Testcontainers
- 📊 Redis-powered metrics and performance monitoring
- 📝 Structured logging with zerolog
- 🔍 Distributed tracing with OpenTelemetry & Jaeger
- 📈 Grafana dashboards for visualization
- 🚀 Advanced llama.cpp performance metrics
- 🆕 Redis Stack with TimeSeries, Search, and JSON support
- 🆕 Redis Exporter for Prometheus metrics integration
- 🆕 Token Analytics Service for usage tracking
- 🆕 Production-ready health checks and service dependencies
- 🆕 Auto-configured Grafana with Prometheus and Redis datasources
The application now consists of a comprehensive observability stack:
```
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│    Frontend     │ ───> │     Backend     │ ───> │  Model Runner   │
│   (React/TS)    │      │      (Go)       │      │  (Llama 3.2)    │
└─────────────────┘      └─────────────────┘      └─────────────────┘
      :3000                    :8080                    :12434

┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│     Grafana     │ <─── │   Prometheus    │      │     Jaeger      │
│   Dashboards    │      │     Metrics     │      │     Tracing     │
└─────────────────┘      └─────────────────┘      └─────────────────┘
      :3001                    :9091                    :16686

┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│   Redis Stack   │      │ Redis Exporter  │      │ Token Analytics │
│  DB + Insight   │      │  (Prometheus)   │      │     Service     │
└─────────────────┘      └─────────────────┘      └─────────────────┘
   :6379, :8001                :9121                    :8082

┌─────────────────┐
│ Redis TimeSeries│
│     Service     │
└─────────────────┘
      :8085
```
- Docker and Docker Compose
- Git
- Go 1.19 or higher (for local development)
- Node.js and npm (for frontend development)
Before starting, pull the required model:
```bash
docker model pull ai/llama3.2:1B-Q8_0
```

Start the complete AIWatch observability stack:
```bash
# Clone the repository
git clone https://github.com/collabnix/aiwatch.git
cd aiwatch

# Start the complete stack (builds and runs all services)
docker-compose up -d --build
```

After deployment, access these services:
| Service | URL | Credentials | Purpose |
|---|---|---|---|
| AIWatch Frontend | http://localhost:3000 | - | Main chat interface |
| Grafana | http://localhost:3001 | admin/admin | Monitoring dashboards |
| Redis Insight | http://localhost:8001 | - | Redis database GUI |
| Prometheus | http://localhost:9091 | - | Metrics collection |
| Jaeger | http://localhost:16686 | - | Distributed tracing |
| Token Analytics | http://localhost:8082 | - | Usage analytics API |
| TimeSeries API | http://localhost:8085 | - | Redis TimeSeries service |
After deployment, verify the observability stack is working:
- **Check the Grafana connection:**
  - Visit http://localhost:3001
  - Log in with admin/admin
  - Go to Configuration > Data Sources
  - Verify the Prometheus datasource shows "✅ Data source is working"
  - Verify the Redis datasource is configured

- **Check Prometheus targets:**
  - Visit http://localhost:9091/targets
  - All targets should show State: UP:
    - `prometheus:9090` (Prometheus itself)
    - `redis-exporter:9121` (Redis metrics)
    - `backend:9090` (Backend metrics)
    - `token-analytics:8082` (Analytics metrics)

- **View the pre-built dashboard:**
  - In Grafana, go to Dashboards
  - Open "AIWatch Redis Monitoring"
  - You should see Redis metrics: Memory Usage, Connected Clients, Commands/sec
The Redis-powered stack comprises the following components:

- **Redis Database (Port 6379)**
  - Primary data store for chat history and session management
  - Redis TimeSeries for metrics storage
  - Redis JSON for complex data structures
  - Redis Search for full-text capabilities

- **Redis Insight (Port 8001)**
  - Web-based Redis GUI for database inspection
  - Real-time monitoring of Redis performance
  - Key-value browser and query interface

- **Redis Exporter (Port 9121)**
  - Exports Redis metrics to Prometheus
  - Monitors memory usage, command statistics, and connection counts
  - Integrates with alerting systems

- **Token Analytics Service (Port 8082)**
  - Tracks token usage patterns and costs
  - API endpoint for analytics queries
  - Integrates with the frontend metrics display

- **Redis TimeSeries Service (Port 8085)**
  - Dedicated API for time-series data operations
  - Historical performance data storage
  - Real-time metrics aggregation
Together, these services provide:

- Real-time Redis Metrics: Memory usage, commands/sec, connections
- Token Usage Analytics: Input/output tokens, cost tracking, usage patterns
- Performance Monitoring: Response times, throughput, error rates
- Historical Data: Time-series storage of all metrics for trend analysis
- Grafana Integration: Pre-configured dashboards for Redis monitoring
- Auto-configured Datasources: Prometheus and Redis datasources automatically set up
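As an illustration of how such time-series data can be stored and queried, here is a redis-cli sketch using Redis TimeSeries commands; the key name and labels are hypothetical, not the service's actual schema:

```bash
# Record an output-token sample at the current server time ('*')
redis-cli TS.ADD tokens:output '*' 42 LABELS direction output

# Read back the full history, aggregated into per-minute sums
redis-cli TS.RANGE tokens:output - + AGGREGATION sum 60000
```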
The frontend is built with React, TypeScript, and Vite:
```bash
cd frontend
npm install
npm run dev
```

This will start the development server at http://localhost:3000.
The Go backend can be run directly:
```bash
go mod download
go run main.go
```

Make sure to set the required environment variables from backend.env:

- `BASE_URL`: URL for the model runner
- `MODEL`: Model identifier to use
- `API_KEY`: API key for authentication (defaults to "ollama")
- `REDIS_ADDR`: Redis connection address (redis:6379)
- `LOG_LEVEL`: Logging level (debug, info, warn, error)
- `LOG_PRETTY`: Whether to output pretty-printed logs
- `TRACING_ENABLED`: Enable OpenTelemetry tracing
- `OTLP_ENDPOINT`: OpenTelemetry collector endpoint
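A minimal `backend.env` might look like this; the values below are illustrative, not the repository's exact defaults (in particular, check the Model Runner URL for your Docker setup):

```env
BASE_URL=http://model-runner.docker.internal/engines/v1/
MODEL=ai/llama3.2:1B-Q8_0
API_KEY=ollama
REDIS_ADDR=redis:6379
LOG_LEVEL=info
LOG_PRETTY=false
TRACING_ENABLED=true
OTLP_ENDPOINT=jaeger:4318
```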
- The frontend sends chat messages to the backend API
- The backend formats the messages and sends them to the Model Runner
- Chat history and session data are stored in Redis
- The LLM processes the input and generates a response
- The backend streams the tokens back to the frontend as they're generated
- Token analytics are collected and stored in Redis TimeSeries
- Redis metrics are exported to Prometheus for monitoring
- Observability components collect metrics, logs, and traces throughout the process
- Grafana dashboards provide real-time visualization of system performance
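To see the streaming behavior from the command line, something like the following works; the `/chat` path and payload shape are assumptions here, so check `main.go` for the actual route:

```bash
# -N turns off curl's output buffering so tokens print as they stream in
curl -N -X POST http://localhost:8080/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Summarize what AIWatch does."}'
```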
```
├── compose.yaml          # Complete observability stack deployment
├── backend.env           # Backend environment variables
├── main.go               # Go backend server
├── frontend/             # React frontend application
│   ├── src/              # Source code
│   │   ├── components/   # React components
│   │   ├── App.tsx       # Main application component
│   │   └── ...
├── pkg/                  # Go packages
│   ├── logger/           # Structured logging
│   ├── metrics/          # Prometheus metrics
│   ├── middleware/       # HTTP middleware
│   ├── tracing/          # OpenTelemetry tracing
│   └── health/           # Health check endpoints
├── prometheus/           # Prometheus configuration
│   └── prometheus.yml    # Scraping configuration
├── grafana/              # Grafana configuration
│   ├── provisioning/     # Auto-configuration
│   │   ├── datasources/  # Prometheus & Redis datasources
│   │   └── dashboards/   # Dashboard provisioning
│   └── dashboards/       # Pre-built dashboard JSON files
├── redis/                # Redis configuration
│   └── redis.conf        # Redis server configuration
├── observability/        # Observability documentation
└── ...
```
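The `prometheus/prometheus.yml` scrape configuration is what ties the monitoring targets together. A sketch consistent with the target names listed earlier (the scrape interval is illustrative):

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["prometheus:9090"]       # Prometheus itself
  - job_name: redis
    static_configs:
      - targets: ["redis-exporter:9121"]   # Redis metrics
  - job_name: backend
    static_configs:
      - targets: ["backend:9090"]          # Backend metrics
  - job_name: token-analytics
    static_configs:
      - targets: ["token-analytics:8082"]  # Analytics metrics
```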
The application includes detailed llama.cpp metrics displayed directly in the UI:
- Tokens per Second: Real-time generation speed
- Context Window Size: Maximum tokens the model can process
- Prompt Evaluation Time: Time spent processing the input prompt
- Memory per Token: Memory usage efficiency
- Thread Utilization: Number of threads used for inference
- Batch Size: Inference batch size
These metrics help you understand the performance characteristics of llama.cpp models and tune configurations accordingly.
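As a rough sketch of how such a number is derived (not the project's actual implementation), tokens per second is simply the generated token count divided by the generation wall time:

```go
package main

import (
	"fmt"
	"time"
)

// tokensPerSecond computes generation speed from a token count and the
// wall-clock time spent generating. Illustrative helper, not project code.
func tokensPerSecond(tokens int, elapsed time.Duration) float64 {
	if elapsed <= 0 {
		return 0
	}
	return float64(tokens) / elapsed.Seconds()
}

func main() {
	// 128 tokens in 4.2s ≈ 30.5 tokens/s
	fmt.Printf("%.1f tokens/s\n", tokensPerSecond(128, 4200*time.Millisecond))
}
```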
The project includes comprehensive observability features:

**Metrics:**
- Model performance (latency, time to first token)
- Token usage (input and output counts)
- Request rates and error rates
- Active request monitoring
- Redis performance metrics (memory, commands, connections)
- Token analytics with cost tracking
- llama.cpp specific performance metrics
**Logging:**
- Structured JSON logs with zerolog
- Log levels (debug, info, warn, error, fatal)
- Request logging middleware
- Error tracking
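A minimal zerolog setup along these lines produces the structured JSON described above (a sketch, not the repository's exact `pkg/logger` code):

```go
package main

import (
	"os"

	"github.com/rs/zerolog"
)

func main() {
	// JSON logs to stdout, with a timestamp on every entry.
	logger := zerolog.New(os.Stdout).With().Timestamp().Logger()
	zerolog.SetGlobalLevel(zerolog.InfoLevel)

	// Fields are typed, so logs stay machine-parseable.
	logger.Info().
		Str("component", "backend").
		Int("status", 200).
		Msg("request completed")
}
```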
**Tracing:**
- Request flow tracing with OpenTelemetry
- Integration with Jaeger for visualization
- Span context propagation
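Span creation follows the standard OpenTelemetry pattern; a minimal sketch assuming a TracerProvider exporting to Jaeger has already been registered globally (not the repository's exact `pkg/tracing` code):

```go
package tracing

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

// handleChat demonstrates span creation and context propagation.
func handleChat(ctx context.Context, outputTokens int) {
	tracer := otel.Tracer("aiwatch-backend")

	ctx, span := tracer.Start(ctx, "handle-chat-request")
	defer span.End()

	span.SetAttributes(attribute.Int("tokens.output", outputTokens))

	_ = ctx // pass ctx into downstream calls so child spans nest correctly
}
```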
For more information, see the observability documentation in the `observability/` directory.
The Redis setup includes:
- Persistence: RDB and AOF enabled for data durability
- Memory Optimization: Configured for optimal performance
- Security: Protected mode disabled for development (configure for production)
- TimeSeries: Enabled for metrics storage
- Networking: Bridge network for service communication
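A `redis/redis.conf` along these lines would match the points above; the values are illustrative and should be tuned for your environment:

```conf
# Persistence: RDB snapshots plus an append-only file for durability
save 900 1
appendonly yes
appendfsync everysec

# Development only -- re-enable protected mode for production
protected-mode no

# Memory ceiling and eviction behaviour
maxmemory 512mb
maxmemory-policy allkeys-lru
```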
All services include:
- Health Checks: Automated service health monitoring
- Restart Policies: Automatic restart on failure
- Resource Limits: Memory and CPU constraints
- Logging: Centralized log collection
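For example, a Redis service entry in `compose.yaml` might declare its health check and restart policy like this (a sketch, not the repository's exact file):

```yaml
services:
  redis:
    image: redis/redis-stack:latest
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
```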
Provisioning is automatic:

- Grafana Datasources: Automatically configured Prometheus and Redis connections
- Dashboard Provisioning: Pre-built Redis monitoring dashboard
- Prometheus Targets: All services automatically discovered and monitored
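Grafana's file-based provisioning makes this possible; a sketch of what a file under `grafana/provisioning/datasources/` might contain (field values are illustrative):

```yaml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090   # internal service name, not localhost
    isDefault: true
  - name: Redis
    type: redis-datasource
    access: proxy
    url: redis:6379
```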
You can customize the application by:
- Changing the model in `backend.env` to use a different LLM
- Modifying the frontend components for a different UI experience
- Extending the backend API with additional functionality
- Customizing the Grafana dashboards for different metrics
- Adjusting llama.cpp parameters for performance optimization
- Configuring Redis for different persistence and performance requirements
- Adding custom analytics using the Token Analytics Service API
- Creating custom dashboards in Grafana for specific monitoring needs
- Adding new datasources in `grafana/provisioning/datasources/`
The project includes integration tests using Testcontainers:

```bash
cd tests
go test -v
```
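The tests follow the usual testcontainers-go pattern; here is a minimal sketch of spinning up a throwaway Redis for a test (an assumption about the test shape, not the repository's exact code):

```go
package tests

import (
	"context"
	"testing"

	"github.com/testcontainers/testcontainers-go"
	"github.com/testcontainers/testcontainers-go/wait"
)

func TestWithRedis(t *testing.T) {
	ctx := context.Background()

	// Start a disposable Redis Stack container for this test only.
	redisC, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
		ContainerRequest: testcontainers.ContainerRequest{
			Image:        "redis/redis-stack:latest",
			ExposedPorts: []string{"6379/tcp"},
			WaitingFor:   wait.ForListeningPort("6379/tcp"),
		},
		Started: true,
	})
	if err != nil {
		t.Fatal(err)
	}
	defer func() { _ = redisC.Terminate(ctx) }()

	// Endpoint returns the host:port mapped to the container's 6379.
	endpoint, err := redisC.Endpoint(ctx, "")
	if err != nil {
		t.Fatal(err)
	}
	t.Logf("redis available at %s", endpoint)
}
```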
Common issues and fixes:

- Model not loading: Ensure you've pulled the model with `docker model pull ai/llama3.2:1B-Q8_0`
- Connection errors: Verify Docker network settings and that the Model Runner is running
- Streaming issues: Check CORS settings in the backend code
- Metrics not showing: Verify that Prometheus can reach the backend metrics endpoint
- Redis connection failed: Check Redis container status and network connectivity
- llama.cpp metrics missing: Confirm that your model is indeed a llama.cpp model
- Grafana dashboards empty: Ensure Prometheus is collecting metrics and data source is configured correctly
- Redis Insight not accessible: Check if port 8001 is available and Redis container is running
- Token analytics not working: Verify Redis TimeSeries module is loaded and service dependencies are met
- Performance degradation: Monitor Redis memory usage and consider adjusting configuration
- Data not persisting: Check Redis volume mounts and persistence configuration
If Grafana shows "No data" in its dashboards:
- **Check the datasource configuration:**

  ```bash
  # Verify Prometheus is accessible from the Grafana container
  docker exec aiwatch-grafana wget -qO- "http://prometheus:9090/api/v1/query?query=up"
  ```

- **Check Prometheus targets:**

  ```bash
  # View Prometheus targets status
  curl http://localhost:9091/api/v1/targets
  ```

- **Restart the stack (if needed):**

  ```bash
  docker-compose down
  docker-compose up -d --build
  ```
A common cause of this symptom is Docker networking: services within the Docker network must communicate using service names (such as prometheus:9090) rather than localhost:9090. The stack addresses this by:
- ✅ Mounting the `prometheus.yml` configuration file properly
- ✅ Using correct service names in Prometheus targets
- ✅ Auto-configuring Grafana datasources with proper internal URLs
- ✅ Adding a pre-built Redis monitoring dashboard
Monitor service health using:
```bash
# Check all container status
docker-compose ps

# View specific service logs
docker-compose logs redis
docker-compose logs grafana
docker-compose logs prometheus
docker-compose logs token-analytics
```

For production Redis deployments:

- Memory Management: Configure `maxmemory` and eviction policies
- Persistence: Balance between RDB and AOF based on use case
- Networking: Use Redis clustering for high availability
- Monitoring: Set up alerts for memory usage and connection limits
For llama.cpp performance tuning:

- Thread Configuration: Optimize thread count based on CPU cores
- Memory Settings: Configure context window based on available RAM
- Batch Processing: Adjust batch size for optimal throughput
If upgrading from a previous version:
- Backup existing data (if any)
- Stop current services: `docker-compose down`
- Start with the new compose file: `docker-compose up -d --build`
- Verify all services: check health endpoints and Grafana dashboards
- Import existing data into Redis if needed
This project is licensed under the MIT License.
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Docker Model Runner team for local LLM capabilities
- Redis Stack for comprehensive data management
- Grafana and Prometheus communities for observability tools
- OpenTelemetry project for distributed tracing standards