Infrastructure configuration for deploying Microsoft's NLWeb on oracle.spiderstrategies.com - a natural language web interface for SpiderStrategies.com content.
This repository contains all the configuration files, scripts, and setup automation needed to deploy and maintain an NLWeb instance that provides AI-powered natural language querying capabilities for the SpiderStrategies.com website.
The setup creates a complete web service stack:
- NLWeb Application: Microsoft's NLWeb running on Python/FastAPI
- Web Server: Nginx with SSL termination via Let's Encrypt
- Vector Database: Qdrant (local file-based storage)
- AI Services: OpenAI API for embeddings and LLM
- Content Source: SpiderStrategies.com (crawled via sitemap)
- Automation: Systemd service + cron-based recrawling
You will need:
- An Ubuntu 22.04 server
- The domain oracle.spiderstrategies.com pointing to your server
- An OpenAI API key

To install:
- Clone this repository on your server:
  ```bash
  git clone https://github.com/SpiderStrategies/spiderstrategies.com-oracle.git
  cd spiderstrategies.com-oracle
  ```
- Run the bootstrap script:
  ```bash
  sudo ./bootstrap/setup.sh
  ```
- When prompted, edit the .env file to add your OpenAI API key:
  ```bash
  sudo nano /opt/nlweb/src/.env
  # Add: OPENAI_API_KEY=your_key_here
  ```
- Complete the setup; the script handles the rest automatically.
The service will be available at https://oracle.spiderstrategies.com once setup completes.
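If you would rather not edit .env in an editor (for example when re-provisioning the server), the key can be set from a script instead. This is a minimal sketch that is not part of this repository: `set_env_var` is a hypothetical name, and it assumes the file holds plain `KEY=value` lines. It replaces an existing assignment instead of appending a second one, and writes back through the original file so its owner and permissions stay as they were.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with this repo): set KEY=VALUE in an
# env file, replacing an existing KEY= line rather than appending a duplicate.
set -euo pipefail

set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  touch "$file"
  tmp="$(mktemp)"
  # Copy every line except an existing assignment of this key.
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back through the existing file so its owner and mode are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

In a root shell (`sudo -i`), source the function and run `set_env_var /opt/nlweb/src/.env OPENAI_API_KEY "$key"`, then `systemctl restart nlweb` so the running service picks up the new value.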
The bootstrap/setup.sh script performs a complete automated installation:
- Updates Ubuntu packages
- Installs Python 3, pip, git, nginx, certbot, and build tools
- Creates the /opt/nlweb directory structure
- Clones Microsoft's NLWeb repository to /opt/nlweb/src
- Clones this infrastructure repo to /opt/nlweb/infra
- Copies configuration files from this repo to NLWeb
- Sets up Python virtual environment and installs dependencies
- Stops nginx temporarily
- Obtains Let's Encrypt SSL certificate for the domain
- Restarts nginx with SSL configuration
- Installs custom nginx configuration with SSL and reverse proxy:
  - Redirects HTTP to HTTPS
  - Proxies requests to NLWeb on port 8000
- Installs systemd service file for NLWeb
- Enables auto-start on boot
- Starts the service
- Installs cron script for periodic site recrawling
- Schedules daily crawls at 3 AM to keep content fresh
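The ordering above matters mostly around the certificate step: stopping nginx first suggests certbot runs in standalone mode, which answers the HTTP-01 challenge on port 80 itself. The following is a simplified, hypothetical sketch of that sequence, not the contents of bootstrap/setup.sh. It assumes the upstream URL `https://github.com/microsoft/NLWeb.git` and the Ubuntu package names shown, leaves out the config-copy and dependency-install steps (their paths are not documented here), and only prints the commands when `DRY_RUN=1`.

```bash
#!/usr/bin/env bash
# Simplified sketch of the bootstrap ORDER only; not the real setup.sh.
# Source this file, then call `main` (or `DRY_RUN=1 main` to just print).
set -euo pipefail

DOMAIN="oracle.spiderstrategies.com"
ROOT="/opt/nlweb"
NLWEB_REPO="https://github.com/microsoft/NLWeb.git"  # assumed upstream URL
INFRA_REPO="https://github.com/SpiderStrategies/spiderstrategies.com-oracle.git"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_packages() {
  run apt-get update
  run apt-get -y upgrade
  # python3-venv is needed on Ubuntu for `python3 -m venv`.
  run apt-get install -y python3 python3-pip python3-venv git nginx certbot build-essential
}

fetch_code() {
  run mkdir -p "$ROOT/data"
  run git clone "$NLWEB_REPO" "$ROOT/src"
  run git clone "$INFRA_REPO" "$ROOT/infra"
  run python3 -m venv "$ROOT/venv"
}

obtain_certificate() {
  # Standalone mode binds port 80 itself, so nginx must release it first.
  run systemctl stop nginx
  run certbot certonly --standalone -d "$DOMAIN"
  run systemctl start nginx
}

enable_service() {
  # Reload unit files after installing nlweb.service, then enable and start it.
  run systemctl daemon-reload
  run systemctl enable --now nlweb
}

main() {
  install_packages
  fetch_code
  obtain_certificate
  enable_service
}
```

Keeping each phase in its own function makes it easy to rerun one step (for example `obtain_certificate`) after a partial failure.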
Configuration files in this repository:
- config_embedding.yaml: Configures OpenAI text-embedding-3-small for content vectorization
- config_llm.yaml: Configures GPT-4.1 and GPT-4.1-mini for query processing
- config_retrieval.yaml: Configures a local Qdrant database for vector storage
- nginx.conf: Production nginx configuration with SSL termination and reverse proxy to NLWeb
- nlweb.service: Systemd service definition for running NLWeb as a daemon
- recrawl_site.sh: Daily cron script that stops the service, recrawls SpiderStrategies.com content, and restarts it
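recrawl_site.sh stops the service before crawling. One likely reason (an assumption, not stated in this repo) is that Qdrant's local file-based mode lets only one process open the storage directory at a time. The sketch below shows that stop, crawl, restart pattern with an EXIT trap so a failed crawl still brings the service back up. It is not the repository's script: `CRAWL_CMD` is a placeholder for whatever NLWeb loader command the real script runs, and the `LOG_FILE`, `SERVICE`, and `SYSTEMCTL` overrides are hypothetical, added so the flow can be tested without touching systemd.

```bash
#!/usr/bin/env bash
# Sketch of a stop -> crawl -> restart cycle; not the repo's recrawl_site.sh.
set -euo pipefail

LOG_FILE="${LOG_FILE:-/opt/nlweb/crawl.log}"
SERVICE="${SERVICE:-nlweb}"
SYSTEMCTL="${SYSTEMCTL:-systemctl}"  # overridable so the flow can be tested
# Placeholder: replace with the actual NLWeb crawl/load command.
CRAWL_CMD="${CRAWL_CMD:-echo CRAWL_CMD not set >&2; exit 1}"

log() { printf '%s %s\n' "$(date '+%F %T')" "$*" >> "$LOG_FILE"; }

# Body runs in a subshell ( ... ), so the EXIT trap fires when it returns.
recrawl() (
  log "stopping $SERVICE"
  "$SYSTEMCTL" stop "$SERVICE"
  # Restart on every exit path, including a failed crawl.
  trap '"$SYSTEMCTL" start "$SERVICE"; log "started $SERVICE"' EXIT
  log "crawl starting"
  if ! bash -c "$CRAWL_CMD" >> "$LOG_FILE" 2>&1; then
    log "crawl FAILED"
    exit 1
  fi
  log "crawl finished"
)
```

The explicit `if !` check is used instead of relying on `set -e`, because `set -e` is suspended when a function is called from an `if` or `||` context. A daily 3 AM crontab entry for the installed script would look like `0 3 * * * /usr/local/bin/recrawl_site.sh`.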
Server directory layout:

```
/opt/nlweb/
├── src/        # NLWeb application (cloned from Microsoft)
├── infra/      # This infrastructure repo
├── data/       # Qdrant database and crawl cache
├── venv/       # Python virtual environment
└── crawl.log   # Crawling activity log
```
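When debugging a fresh install, a quick check of this layout can rule out a half-finished bootstrap. The function below is a hypothetical sketch, not part of this repo: it reports any of the four directories that are missing and whether the virtual environment has an interpreter. It skips crawl.log on the assumption that the log only appears after the first crawl.

```bash
#!/usr/bin/env bash
# Hypothetical check (not part of this repo): report missing pieces of the
# /opt/nlweb layout. Returns non-zero if anything is missing.
set -euo pipefail

check_layout() {
  local root="${1:-/opt/nlweb}" missing=0 d
  for d in src infra data venv; do
    if [ ! -d "$root/$d" ]; then
      echo "missing directory: $root/$d"
      missing=1
    fi
  done
  # A venv is only usable if its interpreter is present.
  if [ -d "$root/venv" ] && [ ! -x "$root/venv/bin/python" ]; then
    echo "no interpreter in venv: $root/venv/bin/python"
    missing=1
  fi
  return "$missing"
}
```

Run it with no arguments on the server, or pass another root directory to check a copy of the layout.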
Viewing logs:

```bash
# Service logs
sudo journalctl -u nlweb -f
# Crawl logs
sudo tail -f /opt/nlweb/crawl.log
```

Common operations:

```bash
# Restart service
sudo systemctl restart nlweb
# Manual recrawl
sudo /usr/local/bin/recrawl_site.sh
# Check service status
sudo systemctl status nlweb
```

Updating:

```bash
# Update infrastructure config
cd /opt/nlweb/infra && git pull
sudo systemctl restart nlweb
# Update NLWeb application
cd /opt/nlweb/src && git pull
sudo systemctl restart nlweb
```

Security:
- SSL certificate auto-renews via Let's Encrypt
- Service runs as the ubuntu user (non-root)
- OpenAI API key stored in an environment file (not in git)
- Nginx handles all external traffic and SSL termination
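To spot-check the secret-handling points above, the hypothetical helper below flags an env file that other users can read or that git is tracking. It is a sketch, not part of this repo, and assumes GNU `stat` (as on Ubuntu). A file owned by the service user with mode 600 passes.

```bash
#!/usr/bin/env bash
# Hypothetical check: fail if the env file is world-accessible or tracked by git.
# Assumes GNU coreutils `stat` (Ubuntu).
set -euo pipefail

check_env_file() {
  local file="${1:-/opt/nlweb/src/.env}" mode
  mode="$(stat -c '%a' "$file")"
  # The last octal digit covers "other" users; it should be 0 for a secret.
  if [ "${mode: -1}" != "0" ]; then
    echo "world-accessible ($mode): $file"
    return 1
  fi
  # ls-files --error-unmatch exits 0 only when git tracks the path.
  if git -C "$(dirname "$file")" ls-files --error-unmatch "$(basename "$file")" >/dev/null 2>&1; then
    echo "tracked by git: $file"
    return 1
  fi
  echo "ok: $file"
}
```

Run it as the owner of the checkout, e.g. `sudo -u ubuntu bash -c "$(declare -f check_env_file); check_env_file"`. Recent git versions refuse to operate on repositories owned by another user, and in that case the git test would silently report the file as untracked.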
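Troubleshooting:

If the service fails to start, read its full journal:

```bash
sudo journalctl -u nlweb --no-pager -l
```

For SSL certificate problems, test renewal and restart nginx:

```bash
sudo certbot renew --dry-run
sudo systemctl restart nginx
```

For database errors, check that /opt/nlweb/data/db exists and is writable by the ubuntu user.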
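The database check can be scripted. The function below is a hypothetical sketch (not part of this repo) that reports whether the directory exists and whether the user running it can write to it; the point is to run it as the service user rather than as root, since root can write almost anywhere.

```bash
#!/usr/bin/env bash
# Hypothetical check: does the Qdrant data directory exist, and can the
# current user write to it?
set -euo pipefail

check_db_dir() {
  local dir="${1:-/opt/nlweb/data/db}"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "not writable by $(id -un): $dir"
    return 1
  fi
  echo "ok: $dir"
}
```

Run it as the service user with `sudo -u ubuntu bash -c "$(declare -f check_db_dir); check_db_dir"`. If it reports a permission problem, `sudo chown -R ubuntu:ubuntu /opt/nlweb/data` is one possible fix, assuming the ubuntu user should own the whole data tree.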
This infrastructure is specific to the oracle.spiderstrategies.com deployment. For changes:
- Test modifications locally
- Update configuration files in this repo
- Deploy via the bootstrap script or manual file copying
- Restart relevant services
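For the manual-copy route, a helper that skips unchanged files and keeps a backup makes the copy step repeatable and easy to roll back. The function name and any destination paths you pass it are hypothetical; bootstrap/setup.sh determines where the real files live.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy one config file into place, keeping a .bak of
# the previous version and skipping the copy when nothing changed.
set -euo pipefail

deploy_file() {
  local src="$1" dest="$2" mode="${3:-644}"
  if [ -f "$dest" ] && cmp -s "$src" "$dest"; then
    echo "unchanged: $dest"
    return 0
  fi
  if [ -f "$dest" ]; then
    cp -p "$dest" "$dest.bak"
  fi
  install -m "$mode" "$src" "$dest"
  echo "updated: $dest"
}
```

Pass mode `755` for recrawl_site.sh. After copying the nginx configuration, validate it with `sudo nginx -t` before `sudo systemctl reload nginx`; after replacing nlweb.service, run `sudo systemctl daemon-reload` before restarting the service.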
For questions about the underlying NLWeb application, see the Microsoft NLWeb repository.