DrJLabs/NeoRAG

Fancryrag Neo4j GraphRAG Baseline

This project bootstraps a Neo4j-backed GraphRAG pipeline using Astral's uv package manager.

Quick Start

# Install dependencies
uv sync

# Launch Neo4j with APOC on the internal rag-net network
docker compose up -d neo4j

# Create the vector index (cosine, 768 dimensions)
make index

# Create or verify the full-text index used for lexical recall
make fulltext-index

# Run the ingestion pipeline over a text file
printf 'Alice founded Acme Corp in 2012. Bob joined in 2015.' > sample.txt
make ingest f=sample.txt

# Inspect graph counts
make counts

Configure secrets in .env.local before running ingestion. When targeting a local OpenAI-compatible embedding server (e.g., localhost:20010), set:

EMBEDDING_API_BASE_URL=http://localhost:20010/v1
EMBEDDING_API_KEY=<token or dummy if not required>

If you are using OpenAI project-scoped keys (sk-proj-...), also capture the project identifier issued by OpenAI:

OPENAI_API_KEY=sk-proj-...
OPENAI_PROJECT=proj_...

Project keys require the project field so the Python SDK can route requests to the correct workspace.
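As a sketch, client construction for both key styles might look like the following (the helper name is hypothetical, not part of this repo):

```python
import os

def openai_client_kwargs(env: dict) -> dict:
    """Build keyword arguments for openai.OpenAI().

    Project-scoped keys (sk-proj-...) also need the project identifier;
    classic keys do not. Helper name and structure are illustrative.
    """
    kwargs = {"api_key": env["OPENAI_API_KEY"]}
    if env["OPENAI_API_KEY"].startswith("sk-proj-"):
        kwargs["project"] = env["OPENAI_PROJECT"]
    return kwargs

# Usage (requires the openai package and a populated .env.local):
# from openai import OpenAI
# client = OpenAI(**openai_client_kwargs(dict(os.environ)))
```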

For hybrid search, set the index configuration variables (sensible defaults shown):

INDEX_NAME=text_embeddings
FULLTEXT_INDEX_NAME=chunk_text_fulltext
FULLTEXT_LABEL=Chunk
FULLTEXT_PROPERTY=text
FULLTEXT_READY_ATTEMPTS=10
FULLTEXT_READY_DELAY=3

The full-text index script is idempotent; rerun make fulltext-index after ingestion jobs or schema changes to keep lexical search synchronized with vector metadata.
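Idempotent creation in Neo4j 5 rests on the IF NOT EXISTS clause; a minimal sketch of the statement the script presumably issues (the function name and driver wiring below are illustrative, not the project's actual code):

```python
def fulltext_index_cypher(name: str, label: str, prop: str) -> str:
    """Render an idempotent full-text index DDL statement (Neo4j 5 syntax)."""
    return (
        f"CREATE FULLTEXT INDEX {name} IF NOT EXISTS "
        f"FOR (n:{label}) ON EACH [n.{prop}]"
    )

# Executing it requires the neo4j driver and a running instance:
# from neo4j import GraphDatabase
# with GraphDatabase.driver(uri, auth=(user, password)) as driver:
#     driver.execute_query(fulltext_index_cypher(
#         "chunk_text_fulltext", "Chunk", "text"))
```

Because of IF NOT EXISTS, re-running the statement against an existing index is a no-op rather than an error.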

FULLTEXT_READY_ATTEMPTS and FULLTEXT_READY_DELAY tune how long the provisioning script waits for Neo4j to accept connections before failing. Defaults cover local Docker startup, but tighten them for pre-warmed environments or extend for slower clusters.
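The wait-and-retry behaviour those two variables control can be sketched as follows (illustrative only; the real script's structure may differ):

```python
import time

def wait_until_ready(check, attempts: int = 10, delay: float = 3.0) -> int:
    """Call `check` until it stops raising, sleeping `delay` seconds
    between tries; re-raise after `attempts` consecutive failures.
    Returns the 1-based attempt number that succeeded."""
    for attempt in range(1, attempts + 1):
        try:
            check()  # e.g. a Neo4j driver connectivity probe
            return attempt
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

With the defaults (10 attempts, 3-second delay) this gives Neo4j roughly half a minute to come up before the script fails.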

FastMCP Hybrid Server

Story 1.2 introduces a Google OAuth-protected FastMCP server that fronts the HybridCypherRetriever. Configure the additional environment variables (see .env.example for defaults):

  • MCP_BASE_URL: Public base URL used during OAuth callbacks (https://...).
  • MCP_SERVER_HOST, MCP_SERVER_PORT, MCP_SERVER_PATH: Local binding for the HTTP transport; these default to 0.0.0.0:8080 and /mcp.
  • GOOGLE_OAUTH_CLIENT_ID, GOOGLE_OAUTH_CLIENT_SECRET: Credentials issued via Google Cloud Console.
  • GOOGLE_OAUTH_REQUIRED_SCOPES: Comma-separated scopes. The baseline requests openid and userinfo.email.
  • HYBRID_RETRIEVAL_QUERY_PATH: Path to the Cypher projection appended after the hybrid search prelude. The default file queries/hybrid_retrieval.cypher returns the node, its text, and the combined score.
  • EMBEDDING_MODEL, EMBEDDING_TIMEOUT_SECONDS, EMBEDDING_MAX_RETRIES: Tuning knobs for the OpenAI-compatible embedding client that powers query embeddings.
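For example, the comma-separated GOOGLE_OAUTH_REQUIRED_SCOPES value can be split like this (helper name is hypothetical; the server's actual parsing may differ):

```python
def parse_scopes(raw: str) -> list:
    """Split a comma-separated scope string into a clean list,
    tolerating stray whitespace and empty entries."""
    return [scope.strip() for scope in raw.split(",") if scope.strip()]
```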

After populating .env.local, start the server:

uv run python servers/mcp_hybrid_google.py

Structured JSON logs announce startup, incoming tool invocations, embedding latencies, and retries. The /mcp/search tool returns both normalized vector and full-text scores so downstream clients (e.g., ChatGPT) can reason about result ranking. Use the /mcp/fetch tool to retrieve a specific node by elementId and see its metadata.
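Normalizing per-retriever scores usually means min-max scaling each result list into [0, 1] before merging; a hedged sketch of that idea (not necessarily what HybridCypherRetriever does internally):

```python
def min_max_normalize(scores: list) -> list:
    """Scale a list of scores into [0, 1]; a constant list maps to all 1.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]
```

Scaling both score lists onto the same range is what makes the vector and full-text scores comparable for downstream ranking.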

Neo4j Container Layout

The docker-compose.yml file provisions a single neo4j service on an isolated bridge network named rag-net. The container mounts persistent volumes for /data, /logs, and /plugins, pulls the official neo4j:5.18 image, and enables APOC via the NEO4J_PLUGINS env setting. Credentials come from .env.local, allowing make up and make down to manage the lifecycle with docker compose. Re-run make index whenever the embedding dimension changes (default: 768 for the local model).
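For reference, a vector index matching those defaults can be declared with Neo4j 5 DDL; the sketch below only renders the statement (the helper and the embedding property name are illustrative, while the index name and dimension mirror this project's defaults):

```python
def vector_index_cypher(name: str, label: str, prop: str,
                        dimensions: int, similarity: str = "cosine") -> str:
    """Render an idempotent vector index DDL statement (Neo4j 5 syntax)."""
    return (
        f"CREATE VECTOR INDEX {name} IF NOT EXISTS "
        f"FOR (n:{label}) ON (n.{prop}) "
        "OPTIONS {indexConfig: {"
        f"`vector.dimensions`: {dimensions}, "
        f"`vector.similarity_function`: '{similarity}'"
        "}}"
    )

# Assumes chunks store their vectors on an `embedding` property:
print(vector_index_cypher("text_embeddings", "Chunk", "embedding", 768))
```

If you switch to an embedding model with a different output size, the dimension in this statement must change with it, which is why make index has to be re-run.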

About

Neo4j GraphRAG baseline configured with uv, Neo4j APOC container, and native pipeline config.
