forked from jolovicdev/shandu

An OpenAI DeepResearch alternative: an AI-driven research system that performs comprehensive, iterative research on any topic using multiple search engines and LLMs.

Rogerspy/shandu

 
 


Shandu: Advanced AI Research System for Comprehensive Knowledge Synthesis

Shandu is a cutting-edge AI research assistant that performs in-depth, multi-source research on any topic using advanced language models, intelligent web scraping, and iterative exploration to generate comprehensive, well-structured reports with proper citations.

MIT License Python 3.9+

🔍 What is Shandu?

Shandu is an intelligent, LLM-powered research system that automates the research process end to end, from initial query clarification through in-depth content analysis to report generation. Built on LangGraph's state-based workflow, it recursively explores a topic, evaluating sources, extracting content, and synthesizing what it learns.

Key Use Cases

  • Academic Research: Generate literature reviews, background information, and complex topic analyses
  • Market Intelligence: Analyze industry trends, competitor strategies, and market opportunities
  • Content Creation: Produce well-researched articles, blog posts, and reports with proper citations
  • Technology Exploration: Track emerging technologies, innovations, and technical developments
  • Policy Analysis: Research regulations, compliance requirements, and policy implications
  • Competitive Analysis: Compare products, services, and company strategies across industries

⚙️ How Shandu Works

flowchart TB
    subgraph Input
        Q[User Query]
        B[Breadth Parameter]
        D[Depth Parameter]
    end

    DR[Deep Research] -->
    SQ[SERP Queries] -->
    PR[Process Results]

    subgraph Results[Results]
        direction TB
        NL((Learnings))
        ND((Directions))
    end

    PR --> NL
    PR --> ND

    DP{depth > 0?}

    RD["Next Direction:
    - Prior Goals
    - New Questions
    - Learnings"]

    MR[Markdown Report]

    %% Main Flow
    Q & B & D --> DR

    %% Results to Decision
    NL & ND --> DP

    %% Circular Flow
    DP -->|Yes| RD
    RD -->|New Context| DR

    %% Final Output
    DP -->|No| MR

    %% Styling
    classDef input fill:#7bed9f,stroke:#2ed573,color:black
    classDef process fill:#70a1ff,stroke:#1e90ff,color:black
    classDef recursive fill:#ffa502,stroke:#ff7f50,color:black
    classDef output fill:#ff4757,stroke:#ff6b81,color:black
    classDef results fill:#a8e6cf,stroke:#3b7a57,color:black

    class Q,B,D input
    class DR,SQ,PR process
    class DP,RD recursive
    class MR output
    class NL,ND results
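
The loop in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not Shandu's implementation: `generate_queries`, `process_results`, `deep_research`, and `to_markdown` are hypothetical names, the search and LLM steps are replaced by stubs, and halving the breadth at each level is an assumption made only to keep the example small.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ResearchResult:
    learnings: List[str] = field(default_factory=list)
    visited_queries: List[str] = field(default_factory=list)


def generate_queries(topic: str, learnings: List[str], breadth: int) -> List[str]:
    """Hypothetical stub for the LLM call that proposes SERP queries."""
    return [f"{topic} [angle {i + 1}]" for i in range(breadth)]


def process_results(query: str) -> Tuple[List[str], List[str]]:
    """Hypothetical stub for search + scraping + analysis.

    Returns (learnings, directions) for a single query.
    """
    return [f"learning from {query!r}"], [f"open question about {query}"]


def deep_research(topic: str, breadth: int, depth: int,
                  result: Optional[ResearchResult] = None) -> ResearchResult:
    if result is None:
        result = ResearchResult()
    for query in generate_queries(topic, result.learnings, breadth):
        result.visited_queries.append(query)
        learnings, directions = process_results(query)
        result.learnings.extend(learnings)
        # The "depth > 0?" decision: recurse only while depth remains.
        remaining_depth = depth - 1
        if remaining_depth > 0:
            for direction in directions:
                # Assumption: narrow the search as we go deeper.
                deep_research(direction, max(1, breadth // 2),
                              remaining_depth, result)
    return result


def to_markdown(topic: str, result: ResearchResult) -> str:
    """Final output step: render accumulated learnings as a Markdown list."""
    bullets = "\n".join(f"- {item}" for item in result.learnings)
    return f"# {topic}\n\n{bullets}\n"
```

Calling `deep_research("energy storage", breadth=2, depth=2)` runs two top-level queries and one follow-up under each, so the total number of queries grows with both parameters.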

🌟 Key Features

  • Intelligent State-based Workflow: Leverages LangGraph for a structured, step-by-step research process with clear state transitions
  • Iterative Deep Exploration: Recursively explores topics with dynamic depth and breadth, adapting to information discovered
  • Multi-source Information Synthesis: Analyzes data from search engines, web content, and structured knowledge bases
  • Enhanced Web Scraping: Features dynamic JS rendering, content extraction, and ethical scraping practices
  • Smart Source Evaluation: Automatically assesses source credibility, relevance, and information value
  • Content Analysis Pipeline: Uses advanced NLP to extract key information, identify patterns, and synthesize findings
  • Comprehensive Report Generation: Creates detailed, well-structured reports with proper citations and thematic organization
  • Parallel Processing Architecture: Implements concurrent operations for efficient multi-query execution
  • Adaptive Search Strategy: Dynamically adjusts search queries based on discovered information and knowledge gaps
  • Full Citation Management: Properly attributes all sources with formatted citations in multiple styles
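
To make the parallel-processing idea concrete, here is a small `asyncio` sketch of running several queries concurrently while capping how many are in flight. It does not reflect Shandu's internals: `run_query` and `run_queries` are hypothetical names, and the network call is simulated with a short sleep.

```python
import asyncio
from typing import Dict, List


async def run_query(query: str, limiter: asyncio.Semaphore) -> Dict[str, object]:
    """Hypothetical stand-in for one search-and-scrape task."""
    async with limiter:  # at most `max_concurrent` tasks pass this point at once
        await asyncio.sleep(0.01)  # simulated network latency
        return {"query": query, "results": [f"result for {query}"]}


async def run_queries(queries: List[str],
                      max_concurrent: int = 4) -> List[Dict[str, object]]:
    limiter = asyncio.Semaphore(max_concurrent)
    tasks = [run_query(q, limiter) for q in queries]
    # gather() returns results in the same order as the input tasks.
    return await asyncio.gather(*tasks)
```

A caller would drive this with `asyncio.run(run_queries(["a", "b", "c"], max_concurrent=2))`. The semaphore is what keeps a large breadth value from opening an unbounded number of connections.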

🚀 Installation

# Install from PyPI
pip install shandu

# Install from source
git clone https://github.com/jolovicdev/shandu.git
cd shandu
pip install -e .

🏁 Quick Start

# Configure API settings (supports various LLM providers)
shandu configure

# Run comprehensive research
shandu research "Your research query" --depth 2 --breadth 4 --output report.md

# Quick AI-powered search with web scraping (you don't need Perplexity!)
shandu aisearch "Who is the current sitting president of United States?" --detailed

# Basic multi-engine search
shandu search "Your search query"

📝 Example Reports

The examples directory contains comprehensive research reports generated by Shandu; one is listed under Detailed Usage below.

📚 Detailed Usage

Research Command

# --depth:   how deep to explore (1-5, default: 2)
# --breadth: how many parallel queries (2-10, default: 4)
# --output:  save to file instead of terminal
# --verbose: show detailed progress
shandu research "Your research query" \
    --depth 3 \
    --breadth 5 \
    --output report.md \
    --verbose

Example Reports

You can find example reports in the examples directory:

  1. High Energy Storage Report - A comprehensive analysis of technological innovations in renewable energy storage (2020-2025)
    shandu research "What are the technological advancements in renewable energy storage (e.g., batteries, hydrogen) between 2020 and 2025, and how have they impacted energy grid reliability?" --depth 2 --breadth 2 -o "examples/o3-mini-high_energy-storage-report.md"

AI Search Command with Scraping

# --engines:     comma-separated list of search engines
# --max-results: maximum number of results to return
# --output:      save to file instead of terminal
# --detailed:    generate a detailed analysis
shandu aisearch "Your search query" \
    --engines "google,duckduckgo" \
    --max-results 15 \
    --output results.md \
    --detailed

💻 Python API

from shandu.agents import ResearchGraph
from langchain_openai import ChatOpenAI

# Initialize with custom LLM if desired
llm = ChatOpenAI(model="gpt-4")

# Initialize the research graph
researcher = ResearchGraph(
    llm=llm,
    temperature=0.5
)

# Perform deep research
results = researcher.research_sync(
    query="Your research query",
    depth=3,       # How deep to go with recursive research
    breadth=4,     # How many parallel queries to explore
    detail_level="high"
)

# Print or save results
print(results.to_markdown())

🧩 Advanced Architecture

Shandu implements a sophisticated multi-stage research pipeline:

  1. Query Clarification: Interactive questions to understand research needs
  2. Research Planning: Strategic planning for comprehensive topic coverage
  3. Iterative Exploration:
    • Smart query generation based on knowledge gaps
    • Multi-engine search with parallelized execution
    • Relevance filtering of search results
    • Intelligent web scraping with content extraction
    • Source credibility assessment
    • Information analysis and synthesis
    • Reflection on findings to identify gaps
  4. Report Generation:
    • Theme extraction and organization
    • Multi-step report enhancement
    • Citation formatting and management
    • Section expansion for comprehensive coverage
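
The stages above map naturally onto a state machine: each stage reads a shared state, returns an updated copy, and a routing function picks the next stage. The sketch below shows that pattern in plain Python rather than with the LangGraph API, so it runs without extra dependencies. All node names, state fields, and the placeholder findings are assumptions for illustration only.

```python
from typing import Callable, Dict, List, TypedDict

END = "end"


class ResearchState(TypedDict):
    query: str
    plan: List[str]
    gaps: List[str]
    findings: List[str]
    report: str


def clarify_node(state: ResearchState) -> ResearchState:
    # Stand-in for interactive clarification: just normalize the query.
    return {**state, "query": state["query"].strip()}


def plan_node(state: ResearchState) -> ResearchState:
    q = state["query"]
    plan = [f"background on {q}", f"recent developments in {q}"]
    return {**state, "plan": plan, "gaps": list(plan)}


def explore_node(state: ResearchState) -> ResearchState:
    # Close one knowledge gap per pass; real work would search and scrape here.
    gap, *remaining = state["gaps"]
    finding = f"placeholder finding for: {gap}"
    return {**state, "gaps": remaining,
            "findings": state["findings"] + [finding]}


def report_node(state: ResearchState) -> ResearchState:
    lines = [f"# Report: {state['query']}", ""]
    lines += [f"- {finding}" for finding in state["findings"]]
    return {**state, "report": "\n".join(lines)}


NODES: Dict[str, Callable[[ResearchState], ResearchState]] = {
    "clarify": clarify_node,
    "plan": plan_node,
    "explore": explore_node,
    "report": report_node,
}

# Routing: "explore" loops back to itself until no gaps remain (reflection).
EDGES: Dict[str, Callable[[ResearchState], str]] = {
    "clarify": lambda s: "plan",
    "plan": lambda s: "explore",
    "explore": lambda s: "explore" if s["gaps"] else "report",
    "report": lambda s: END,
}


def run_pipeline(query: str) -> ResearchState:
    state: ResearchState = {"query": query, "plan": [], "gaps": [],
                            "findings": [], "report": ""}
    current = "clarify"
    while current != END:
        state = NODES[current](state)
        current = EDGES[current](state)
    return state
```

Keeping the routing in a separate table from the node functions is the main design point: a stage can be added or the loop condition changed without touching the other stages.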

🔌 Supported Search Engines & Sources

  • Google Search
  • DuckDuckGo
  • Wikipedia
  • ArXiv (academic papers)
  • Custom search engines can be added
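
One way to picture pluggable engines is a small registry plus a merge step that removes duplicate URLs across engines. The following is a hypothetical sketch, not Shandu's extension API: `register_engine`, `multi_search`, and the two demo engines are invented for illustration and return canned results instead of calling a real search service.

```python
from typing import Callable, Dict, List
from urllib.parse import urlsplit

SearchFn = Callable[[str, int], List[dict]]
ENGINES: Dict[str, SearchFn] = {}


def register_engine(name: str) -> Callable[[SearchFn], SearchFn]:
    """Decorator that adds a search function to the registry."""
    def decorator(fn: SearchFn) -> SearchFn:
        ENGINES[name] = fn
        return fn
    return decorator


@register_engine("demo_a")
def demo_a(query: str, max_results: int) -> List[dict]:
    return [
        {"title": f"{query} overview", "url": "https://example.com/a"},
        {"title": f"{query} details", "url": "https://example.com/b"},
    ][:max_results]


@register_engine("demo_b")
def demo_b(query: str, max_results: int) -> List[dict]:
    return [
        {"title": f"{query} details (mirror)", "url": "https://EXAMPLE.com/b/"},
        {"title": f"{query} paper", "url": "https://example.org/c"},
    ][:max_results]


def normalize(url: str) -> str:
    """Crude URL key: lowercase host plus path without a trailing slash."""
    parts = urlsplit(url)
    return f"{parts.netloc.lower()}{parts.path.rstrip('/')}"


def multi_search(query: str, engines: List[str],
                 max_results: int = 10) -> List[dict]:
    seen = set()
    merged = []
    for name in engines:
        for hit in ENGINES[name](query, max_results):
            key = normalize(hit["url"])
            if key not in seen:  # keep the first engine's copy of each URL
                seen.add(key)
                merged.append({**hit, "engine": name})
    return merged[:max_results]
```

With both demo engines selected, the second copy of `example.com/b` is dropped and each surviving hit records which engine produced it.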

📊 Technical Capabilities

  • Dynamic JS Rendering: Handles JavaScript-heavy websites
  • Content Extraction: Identifies and extracts main content from web pages
  • Parallel Processing: Concurrent execution of searches and scraping
  • Caching: Efficient caching of search results and scraped content
  • Rate Limiting: Respectful access to web resources
  • Robots.txt Compliance: Ethical web scraping practices
  • Flexible Output Formats: Markdown, JSON, plain text
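
Caching, rate limiting, and robots.txt compliance can be combined in one small wrapper around whatever function actually downloads a page. The sketch below uses the standard library's `urllib.robotparser`; the `PoliteFetcher` class, its parameters, and the injected `fetch` callable are assumptions for illustration and are not taken from Shandu's code.

```python
import time
from typing import Callable, Dict, Optional
from urllib.robotparser import RobotFileParser


class PoliteFetcher:
    """Hypothetical wrapper: robots.txt check, crawl delay, in-memory cache."""

    def __init__(self, robots_txt: str, user_agent: str,
                 fetch: Callable[[str], str], min_delay: float = 0.0) -> None:
        self.parser = RobotFileParser()
        # parse() accepts the robots.txt content as a list of lines.
        self.parser.parse(robots_txt.splitlines())
        self.user_agent = user_agent
        self.fetch = fetch
        declared = self.parser.crawl_delay(user_agent)
        # Prefer the site's declared Crawl-delay, else fall back to min_delay.
        self.delay = float(declared) if declared is not None else min_delay
        self.cache: Dict[str, str] = {}
        self._last_request: Optional[float] = None

    def get(self, url: str) -> Optional[str]:
        if url in self.cache:  # cached pages cost no request
            return self.cache[url]
        if not self.parser.can_fetch(self.user_agent, url):
            return None  # disallowed by robots.txt
        if self._last_request is not None:
            wait = self.delay - (time.monotonic() - self._last_request)
            if wait > 0:
                time.sleep(wait)
        self._last_request = time.monotonic()
        self.cache[url] = self.fetch(url)
        return self.cache[url]
```

Passing the downloader in as `fetch` keeps the policy (what may be fetched, and how often) separate from the transport, so the same wrapper works with a plain HTTP client or a JavaScript-rendering browser.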

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.
