spiderstrategies.com-oracle

Infrastructure configuration for deploying Microsoft's NLWeb on oracle.spiderstrategies.com, a natural language web interface for SpiderStrategies.com content.

Overview

This repository contains the configuration files, scripts, and setup automation needed to deploy and maintain an NLWeb instance that provides AI-powered natural language querying for the SpiderStrategies.com website.

Architecture

The setup creates a complete web service stack:

  • NLWeb Application: Microsoft's NLWeb running on Python/FastAPI
  • Web Server: Nginx with SSL termination via Let's Encrypt
  • Vector Database: Qdrant (local file-based storage)
  • AI Services: OpenAI API for embeddings and LLM
  • Content Source: SpiderStrategies.com (crawled via sitemap)
  • Automation: Systemd service + cron-based recrawling

Quick Start

Prerequisites

  • Ubuntu 22.04 server
  • Domain oracle.spiderstrategies.com pointing to your server
  • OpenAI API key

Installation

  1. Clone this repository on your server:

    git clone https://github.com/SpiderStrategies/spiderstrategies.com-oracle.git
    cd spiderstrategies.com-oracle
  2. Run the bootstrap script:

    sudo ./bootstrap/setup.sh
  3. When prompted, edit the .env file to add your OpenAI API key:

    sudo nano /opt/nlweb/src/.env
    # Add: OPENAI_API_KEY=your_key_here
  4. Save the file and let the script continue. It handles the remaining steps automatically.

The service will be available at https://oracle.spiderstrategies.com once setup completes.
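
If the service fails to answer queries after setup, a missing or placeholder API key is a likely cause. The sketch below checks the env file from step 3. The function name `has_openai_key` is hypothetical and not part of this repo.

```shell
# Sketch: succeed only if the env file defines a non-empty OPENAI_API_KEY
# that is not the README placeholder. has_openai_key is a hypothetical helper.
has_openai_key() {
  local env_file="$1"
  local value
  # Return 2 if the file cannot be read (for example, wrong path or permissions).
  [ -r "$env_file" ] || return 2
  # Take the last OPENAI_API_KEY= line, keep everything after "=", drop quotes.
  value=$(grep -E '^OPENAI_API_KEY=' "$env_file" | tail -n 1 | cut -d= -f2- | tr -d "\"'")
  [ -n "$value" ] && [ "$value" != "your_key_here" ]
}

# Usage on the server:
#   has_openai_key /opt/nlweb/src/.env && echo "key present" || echo "key missing"
```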

What the Setup Does

The bootstrap/setup.sh script performs a complete automated installation:

1. System Dependencies

  • Updates Ubuntu packages
  • Installs Python 3, pip, git, nginx, certbot, and build tools
  • Creates /opt/nlweb directory structure

2. Application Setup

  • Clones Microsoft's NLWeb repository to /opt/nlweb/src
  • Clones this infrastructure repo to /opt/nlweb/infra
  • Copies configuration files from this repo to NLWeb
  • Sets up Python virtual environment and installs dependencies

3. SSL Certificate

  • Stops nginx temporarily
  • Obtains Let's Encrypt SSL certificate for the domain
  • Restarts nginx with SSL configuration
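
The stop, obtain, restart sequence above suggests certbot's standalone authenticator, which needs port 80 free. The following is a minimal sketch under that assumption, not the actual contents of bootstrap/setup.sh. `ADMIN_EMAIL` and the `run` helper are illustrative names, and commands are only printed unless `DRY_RUN=0` is set.

```shell
# Sketch of the certificate step. Assumes certbot's standalone mode;
# ADMIN_EMAIL is a placeholder contact address, not taken from this repo.
DOMAIN="oracle.spiderstrategies.com"
ADMIN_EMAIL="${ADMIN_EMAIL:-admin@example.com}"

# Print commands by default; execute them only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

obtain_certificate() {
  # Free port 80 so certbot can answer the HTTP challenge itself.
  run systemctl stop nginx
  run certbot certonly --standalone --non-interactive --agree-tos \
      -m "$ADMIN_EMAIL" -d "$DOMAIN"
  # Bring nginx back whether or not certbot succeeded.
  run systemctl start nginx
}

# Usage (as root): DRY_RUN=0 obtain_certificate
```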

4. Web Server Configuration

  • Installs custom nginx configuration with SSL and reverse proxy
  • Redirects HTTP to HTTPS
  • Proxies requests to NLWeb on port 8000
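
To make the three bullets concrete, the sketch below writes a pair of server blocks: one redirecting HTTP to HTTPS, one terminating TLS and proxying to port 8000. It is an illustration only. The certificate paths assume certbot's default /etc/letsencrypt/live layout, `write_nginx_conf` is a hypothetical helper, and the nginx/nginx.conf shipped in this repo may differ.

```shell
# Sketch: generate a minimal nginx config for the given domain.
# Assumes certbot's default certificate locations; write_nginx_conf is hypothetical.
write_nginx_conf() {
  local domain="$1" out="$2"
  cat > "$out" <<EOF
server {
    listen 80;
    server_name ${domain};
    # Redirect all plain HTTP traffic to HTTPS.
    return 301 https://\$host\$request_uri;
}

server {
    listen 443 ssl;
    server_name ${domain};

    ssl_certificate     /etc/letsencrypt/live/${domain}/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/${domain}/privkey.pem;

    location / {
        # NLWeb listens locally on port 8000.
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host \$host;
        proxy_set_header X-Real-IP \$remote_addr;
        proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto \$scheme;
    }
}
EOF
}

# Usage: write_nginx_conf oracle.spiderstrategies.com /tmp/nlweb.conf
# Validate with "sudo nginx -t" once the file is installed.
```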

5. Service Management

  • Installs systemd service file for NLWeb
  • Enables auto-start on boot
  • Starts the service
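
For orientation, a unit file for this kind of service might look like the one rendered below. This is a sketch, not the shipped systemd/nlweb.service. The start command is deliberately left as a parameter because this README does not state NLWeb's entry point, and the use of `EnvironmentFile` and `WorkingDirectory` is an assumption. `write_nlweb_unit` is a hypothetical helper.

```shell
# Sketch: render a systemd unit for NLWeb. The start command is passed in
# because the real entry point is not documented here (assumption).
write_nlweb_unit() {
  local out="$1" start_cmd="$2"
  cat > "$out" <<EOF
[Unit]
Description=NLWeb natural language interface
After=network-online.target
Wants=network-online.target

[Service]
User=ubuntu
WorkingDirectory=/opt/nlweb/src
EnvironmentFile=/opt/nlweb/src/.env
ExecStart=${start_cmd}
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
}

# Usage on the server (replace the placeholder with the real start command):
#   write_nlweb_unit /etc/systemd/system/nlweb.service "/opt/nlweb/venv/bin/python <entry point>"
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now nlweb
```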

6. Content Crawling

  • Installs cron script for periodic site recrawling
  • Schedules daily crawls at 3 AM to keep content fresh
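
A daily 3 AM schedule corresponds to the cron fields `0 3 * * *`. The sketch below prints one possible crontab line. Appending output to /opt/nlweb/crawl.log from the cron entry is an assumption, since the script may do its own logging, and `recrawl_cron_line` is a hypothetical helper.

```shell
# Sketch: emit a crontab line for a daily 03:00 recrawl.
# Assumption: logging is done by redirecting output in the cron entry.
recrawl_cron_line() {
  # Fields: minute hour day-of-month month day-of-week command
  printf '%s\n' '0 3 * * * /usr/local/bin/recrawl_site.sh >> /opt/nlweb/crawl.log 2>&1'
}

# Usage: add the line to root's crontab, replacing any earlier entry.
#   ( sudo crontab -l 2>/dev/null | grep -v recrawl_site.sh; recrawl_cron_line ) | sudo crontab -
```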

Configuration Files

AI & Embeddings (config/)

  • config_embedding.yaml: Configures OpenAI text-embedding-3-small for content vectorization
  • config_llm.yaml: Configures GPT-4.1 and GPT-4.1-mini for query processing
  • config_retrieval.yaml: Configures local Qdrant database for vector storage

Web Server (nginx/)

  • nginx.conf: Production nginx configuration with SSL termination and reverse proxy to NLWeb

Service Management (systemd/)

  • nlweb.service: Systemd service definition for running NLWeb as a daemon

Automation (cron/)

  • recrawl_site.sh: Daily cron script that stops the service, recrawls SpiderStrategies.com content, and restarts the service
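
The stop, recrawl, restart flow can be sketched as follows. The service is presumably stopped because a file-based Qdrant store is normally opened by one process at a time, though this README does not say so. `CRAWL_CMD` is a placeholder: the actual NLWeb loader invocation is not shown here. The point of the sketch is that the service is started again even when the crawl fails.

```shell
# Sketch of a recrawl wrapper. CRAWL_CMD is a placeholder, not the real
# NLWeb command. Commands are printed unless DRY_RUN=0.
SERVICE="nlweb"
LOG="${LOG:-/opt/nlweb/crawl.log}"
CRAWL_CMD="${CRAWL_CMD:-echo replace-with-the-NLWeb-crawl-command}"

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

recrawl() {
  local status=0
  run systemctl stop "$SERVICE"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $CRAWL_CMD >> $LOG"
  else
    # Record a crawl failure instead of aborting, so the restart still happens.
    sh -c "$CRAWL_CMD" >> "$LOG" 2>&1 || status=$?
  fi
  run systemctl start "$SERVICE"
  # Report the crawl's exit status to the caller (cron, in this setup).
  return "$status"
}

# Usage (as root): DRY_RUN=0 CRAWL_CMD="<crawl command>" recrawl
```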

Directory Structure (Post-Installation)

/opt/nlweb/
├── src/           # NLWeb application (cloned from Microsoft)
├── infra/         # This infrastructure repo
├── data/          # Qdrant database and crawl cache
├── venv/          # Python virtual environment
└── crawl.log      # Crawling activity log

Maintenance

Viewing Logs

# Service logs
sudo journalctl -u nlweb -f

# Crawl logs  
sudo tail -f /opt/nlweb/crawl.log

Manual Operations

# Restart service
sudo systemctl restart nlweb

# Manual recrawl
sudo /usr/local/bin/recrawl_site.sh

# Check service status
sudo systemctl status nlweb

Updates

# Update infrastructure config
cd /opt/nlweb/infra && git pull
# Setup copies config files into place, so re-copy any that changed before restarting
sudo systemctl restart nlweb

# Update NLWeb application
cd /opt/nlweb/src && git pull  
sudo systemctl restart nlweb

Security Notes

  • SSL certificate auto-renews via Let's Encrypt
  • Service runs as ubuntu user (non-root)
  • OpenAI API key stored in environment file (not in git)
  • Nginx handles all external traffic and SSL termination

Troubleshooting

Service Won't Start

sudo journalctl -u nlweb --no-pager -l

SSL Certificate Issues

sudo certbot renew --dry-run
sudo systemctl restart nginx

Database Issues

Check that /opt/nlweb/data/db exists and is writable by the ubuntu user.
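
One way to run that check from any account is sketched below. `check_writable_by` is a hypothetical helper. It tests directly when already running as the target user and otherwise goes through `sudo -u`.

```shell
# Sketch: verify that a directory exists and is writable by a given user.
# check_writable_by is a hypothetical helper, not part of this repo.
check_writable_by() {
  local user="$1" dir="$2"
  [ -d "$dir" ] || { echo "missing: $dir"; return 1; }
  if [ "$(id -un)" = "$user" ]; then
    # Already the target user: test write permission directly.
    [ -w "$dir" ]
  else
    # Otherwise run the test as the target user.
    sudo -u "$user" test -w "$dir"
  fi || { echo "not writable by $user: $dir"; return 1; }
  echo "ok: $dir"
}

# Usage on the server:
#   check_writable_by ubuntu /opt/nlweb/data/db
# A possible fix if it fails: sudo chown -R ubuntu:ubuntu /opt/nlweb/data
```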

Contributing

This infrastructure is specific to the oracle.spiderstrategies.com deployment. For changes:

  1. Test modifications locally
  2. Update configuration files in this repo
  3. Deploy via the bootstrap script or manual file copying
  4. Restart relevant services

For questions about the underlying NLWeb application, see the Microsoft NLWeb repository.
