Auto-Slides: Automatic Academic Presentation Generation with Multi-Agent Collaboration

Yuheng Yang¹ Wenjia Jiang¹ Yang Wang¹ Yiwei Wang² Chi Zhang¹*
¹AGI Lab, Westlake University ²University of California at Merced
[email protected]

Auto-Slides is an intelligent system that automatically converts academic research papers into well-structured, pedagogically optimized presentation slides. Built on large language models and cognitive science principles, it creates multimodal presentations with interactive customization capabilities.

🚀 Features

Intelligent PDF Processing: Automatically extracts text, figures, tables, and document structure from research papers using marker-pdf (OCR and layout analysis)
Multi-Agent Framework: Employs specialized agents for content extraction, presentation planning, verification, and repair
Interactive Customization: Supports real-time refinement through natural language dialogue
Pedagogical Optimization: Creates presentation-oriented narratives that enhance learning and comprehension
Multimodal Output: Generates slides with proper figure placement, table formatting, and code syntax highlighting
Multiple Themes: Supports various Beamer themes (Madrid, Berlin, Singapore, etc.)
Bilingual Support: Works with both English and Chinese papers
Speech Generation: Optional accompanying speech script generation

📋 Requirements

Python 3.8+
LaTeX environment (TeX Live or MiKTeX)
OpenAI API key
8GB+ RAM (for marker-pdf model)
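Before the first run, a quick preflight check can catch the most common setup problems. The script below is a minimal sketch and is not part of the repository: it uses only the standard library to confirm the Python version, look for pdflatex on PATH, and check that OPENAI_API_KEY is set in the environment. It does not check the 8GB RAM requirement, since the standard library has no portable way to read total memory; the function name preflight is hypothetical.

```python
import os
import shutil
import sys


def preflight(env=None):
    """Check the documented prerequisites; return {requirement: passed}."""
    env = os.environ if env is None else env
    return {
        # Python 3.8+ is the documented minimum version.
        "python>=3.8": sys.version_info >= (3, 8),
        # pdflatex must be on PATH unless --skip-compilation is used.
        "pdflatex": shutil.which("pdflatex") is not None,
        # The OpenAI key is required; a blank value counts as missing.
        "OPENAI_API_KEY": bool(env.get("OPENAI_API_KEY", "").strip()),
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{'ok     ' if ok else 'MISSING'}  {name}")
```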

🛠️ Installation

1. Clone the Repository

git clone https://github.com/jizew/Auto-Slides.git
cd Auto-Slides

2. Create Virtual Environment (Recommended)

python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate  # Windows

3. Install Dependencies

pip install -r requirements.txt

4. Download Required Models

Important: Download the marker-pdf model for PDF processing:

python down_model.py

This will download the marker-pdf model to the models/ directory (~2GB).

5. Configure API Keys

Option 1: Use the provided template (Recommended)

Rename the provided template file and add your API key:

# Rename the template file
mv ".env copy" .env

# Edit the .env file and add your OpenAI API key
nano .env  # or use your preferred editor

Option 2: Create from scratch

Create a .env file in the project root:

# OpenAI API Configuration (Required)
OPENAI_API_KEY=your_openai_api_key_here

# Optional: LangSmith for monitoring (uncomment if needed)
# LANGCHAIN_TRACING_V2="true"
# LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
# LANGCHAIN_API_KEY="your_langsmith_api_key_here"
# LANGCHAIN_PROJECT="auto-slides"

Note: The ".env copy" file is a template listing all available configuration options. Rename it to .env and set OPENAI_API_KEY to your actual API key.
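To sanity-check a .env file before running the tool, the sketch below parses it using common dotenv conventions: KEY=VALUE lines, # comments and blank lines ignored, matching surrounding quotes stripped. It is a hypothetical, standard-library-only helper (parse_dotenv and load_dotenv_file are illustrative names); the README does not say how Auto-Slides itself loads .env, and full dotenv libraries handle more edge cases such as export prefixes, escapes, and multiline values.

```python
import os


def parse_dotenv(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments; strip matching quotes."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
            value = value[1:-1]
        values[key] = value
    return values


def load_dotenv_file(path=".env", override=False):
    """Read a .env file into os.environ without clobbering existing variables."""
    with open(path, encoding="utf-8") as fh:
        values = parse_dotenv(fh.read())
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```

After calling load_dotenv_file(".env"), checking os.environ.get("OPENAI_API_KEY") confirms the key was picked up. By default, variables already set in the shell take precedence over the file.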

🎯 Quick Start

Basic Usage

Convert a research paper to presentation slides:

python main.py path/to/your/paper.pdf

Advanced Options

python main.py path/to/your/paper.pdf \
    --language en \
    --model gpt-4o \
    --theme Madrid \
    --output-dir output \
    --verbose
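The same options can be driven from a script, for example to convert a folder of papers unattended. The sketch below only assembles the documented flags into an argument list and calls main.py once per PDF via subprocess; build_command and convert_folder are hypothetical helpers, it assumes it is run from the repository root, and the dry_run default returns the commands without executing them. It adds --no-interactive-revise unless asked otherwise, because interactive revision is on by default and would presumably wait for user input after each paper.

```python
import subprocess
import sys
from pathlib import Path


def build_command(pdf_path, language="en", model="gpt-4o", theme="Madrid",
                  output_dir="output", verbose=False, interactive_revise=False):
    """Build an argv list for main.py using only the documented flags."""
    cmd = [sys.executable, "main.py", str(pdf_path),
           "--language", language, "--model", model,
           "--theme", theme, "--output-dir", output_dir]
    if verbose:
        cmd.append("--verbose")
    if not interactive_revise:
        # Interactive revision is enabled by default; disable it for batch runs.
        cmd.append("--no-interactive-revise")
    return cmd


def convert_folder(folder, dry_run=True, **options):
    """Convert every PDF in folder; with dry_run=True only return the commands."""
    commands = [build_command(p, **options) for p in sorted(Path(folder).glob("*.pdf"))]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing paper
    return commands
```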

Disable Interactive Revision

By default, Auto-Slides enables interactive revision after generating slides. To disable this feature:

python main.py path/to/your/paper.pdf --no-interactive-revise

Revision Mode

Modify existing presentations with feedback:

python main.py --revise \
    --original-plan=path/to/plan.json \
    --previous-tex=path/to/output.tex \
    --feedback="Please make the title more prominent"
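The revision flags expect paths to files from a previous run. According to the Output Structure section, plans are stored under output/plan/<session_id>/ and LaTeX files under output/tex/<session_id>/, but the exact file names are not documented, so the sketch below simply picks the most recently modified *.json and *.tex in those folders. Both helpers (newest, revision_command) are hypothetical; check which files were chosen before relying on them.

```python
import sys
from pathlib import Path


def newest(directory, pattern):
    """Return the most recently modified file matching pattern, or None."""
    files = [p for p in Path(directory).glob(pattern) if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


def revision_command(output_dir, session_id, feedback):
    """Build a main.py --revise command from a previous run's plan and TEX files."""
    root = Path(output_dir)
    plan = newest(root / "plan" / session_id, "*.json")
    tex = newest(root / "tex" / session_id, "*.tex")
    if plan is None or tex is None:
        raise FileNotFoundError(f"no plan JSON or TEX found for session {session_id!r}")
    return [
        sys.executable, "main.py", "--revise",
        f"--original-plan={plan}",
        f"--previous-tex={tex}",
        f"--feedback={feedback}",  # one argv item, so no shell quoting is needed
    ]
```

Passing the returned list to subprocess.run starts the revision with the same flags as the command above.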

📖 Command Line Options

Required Arguments

pdf_path: Path to the input PDF file

Optional Arguments

Basic Options

--output-dir, -o: Output directory (default: output)
--language, -l: Output language (zh or en, default: en)
--model, -m: Language model to use (default: gpt-4o)
--theme: Beamer theme (default: Madrid)
--verbose, -v: Show verbose logs

Processing Options

--disable-llm-enhancement: Use basic PDF parsing only
--skip-compilation, -s: Generate TEX only, skip PDF compilation
--max-retries, -r: Maximum retries for compilation (default: 5)

Interactive Features

--interactive, -i: Enable interactive mode for presentation plan optimization
--no-interactive-revise: Disable interactive revision after slide generation (enabled by default)

Verification & Repair

--enable-verification: Enable content verification (default: enabled)
--enable-auto-repair: Enable automatic content repair (default: enabled)
--disable-verification: Disable verification and repair (fast mode)

Speech Generation

--enable-speech: Generate accompanying speech script
--speech-duration: Target speech duration in minutes (default: 15)
--speech-style: Speech style (academic_conference, classroom, industry_presentation, public_talk)

Revision Mode

--revise, -R: Enable revision mode
--original-plan: Path to original presentation plan JSON
--previous-tex: Path to previous TEX file
--feedback: User feedback for modifications
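For reference, the options above correspond roughly to the argparse parser sketched below. This is a hypothetical reconstruction from the documented names, short flags, and defaults, not the project's actual main.py: the value types (for example an integer --speech-duration) are assumptions, --enable-verification and --enable-auto-repair are left out because both behaviours are already on by default, and pdf_path is made optional here so that the Revision Mode example, which omits it, still parses.

```python
import argparse


def build_parser():
    """Hypothetical argparse mirror of the documented main.py options."""
    p = argparse.ArgumentParser(prog="main.py")
    # Documented as required; optional here so --revise invocations parse.
    p.add_argument("pdf_path", nargs="?")
    # Basic options
    p.add_argument("--output-dir", "-o", default="output")
    p.add_argument("--language", "-l", choices=["zh", "en"], default="en")
    p.add_argument("--model", "-m", default="gpt-4o")
    p.add_argument("--theme", default="Madrid")
    p.add_argument("--verbose", "-v", action="store_true")
    # Processing options
    p.add_argument("--disable-llm-enhancement", action="store_true")
    p.add_argument("--skip-compilation", "-s", action="store_true")
    p.add_argument("--max-retries", "-r", type=int, default=5)
    # Interactive features and verification
    p.add_argument("--interactive", "-i", action="store_true")
    p.add_argument("--no-interactive-revise", action="store_true")
    p.add_argument("--disable-verification", action="store_true")
    # Speech generation
    p.add_argument("--enable-speech", action="store_true")
    p.add_argument("--speech-duration", type=int, default=15)
    p.add_argument("--speech-style", choices=["academic_conference", "classroom",
                                              "industry_presentation", "public_talk"])
    # Revision mode
    p.add_argument("--revise", "-R", action="store_true")
    p.add_argument("--original-plan")
    p.add_argument("--previous-tex")
    p.add_argument("--feedback")
    return p
```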

📁 Output Structure

The system generates organized output in the specified directory:

output/
├── raw/<session_id>/          # Extracted PDF content
├── plan/<session_id>/         # Presentation plans (JSON)
├── tex/<session_id>/          # Generated LaTeX files and PDFs
├── images/<session_id>/       # Extracted figures and tables
├── verification/<session_id>/ # Verification reports
├── repair/<session_id>/       # Auto-repair results
└── speech/<session_id>/       # Generated speech scripts
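Because every stage writes to <output-dir>/<stage>/<session_id>/, the paths for one run can be computed directly, which is handy for the troubleshooting step of checking output/images/<session_id>/. The helpers below are a hypothetical sketch: how session IDs are generated is not documented, and the image file extensions list_images filters on are an assumption.

```python
from pathlib import Path

# Stage subdirectories as listed in the output tree.
STAGES = ("raw", "plan", "tex", "images", "verification", "repair", "speech")


def session_paths(output_dir, session_id):
    """Map each stage name to <output_dir>/<stage>/<session_id>."""
    root = Path(output_dir)
    return {stage: root / stage / session_id for stage in STAGES}


def list_images(output_dir, session_id):
    """List extracted image files for a session (empty if none exist)."""
    folder = session_paths(output_dir, session_id)["images"]
    suffixes = {".png", ".jpg", ".jpeg", ".pdf"}  # assumed image formats
    return sorted(p.name for p in folder.glob("*") if p.suffix.lower() in suffixes)
```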

🔧 System Architecture

Auto-Slides uses a multi-agent framework made up of the following components:

PDF Parser: Extracts content using marker-pdf and OCR technologies
Presentation Planner: Generates structured presentation plans based on cognitive science principles
Verification Agent: Ensures content coverage and accuracy through automated validation
Repair Agent: Automatically fixes identified issues and improves content completeness
TEX Generator: Creates high-quality LaTeX Beamer code with proper formatting
Interactive Editor: Enables real-time customization through natural language dialogue

A paper passes through these components in stages: content extraction, presentation planning, verification and repair, and LaTeX generation, with the Interactive Editor applying user feedback to the plan or the generated slides.
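The planning, verification, and repair stages can be pictured as a small loop. The sketch below is illustrative only: in Auto-Slides these agents are LLM-backed, whereas here they are plain Python stand-ins (Plan, run_pipeline, toy_plan, toy_verify, and toy_repair are hypothetical names), the "coverage" check is a simple heading match, and the bounded number of repair rounds is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Plan:
    slides: List[str]
    issues: List[str] = field(default_factory=list)


def run_pipeline(paper: str,
                 plan_fn: Callable[[str], Plan],
                 verify_fn: Callable[[str, Plan], List[str]],
                 repair_fn: Callable[[Plan, List[str]], Plan],
                 max_rounds: int = 2) -> Plan:
    """Plan once, then alternate verification and repair until no issues remain."""
    plan = plan_fn(paper)
    for _ in range(max_rounds):
        plan.issues = verify_fn(paper, plan)
        if not plan.issues:
            return plan
        plan = repair_fn(plan, plan.issues)
    plan.issues = verify_fn(paper, plan)  # record anything still unresolved
    return plan


def _headings(paper: str) -> List[str]:
    # Toy structure: sections separated by blank lines; first line is the heading.
    return [s.strip().splitlines()[0] for s in paper.split("\n\n") if s.strip()]


def toy_plan(paper: str) -> Plan:
    # Deliberately drops the last section so verification has something to catch.
    return Plan(slides=_headings(paper)[:-1])


def toy_verify(paper: str, plan: Plan) -> List[str]:
    # Coverage check: every section heading should appear as a slide.
    return [h for h in _headings(paper) if h not in plan.slides]


def toy_repair(plan: Plan, issues: List[str]) -> Plan:
    # Repair by appending one slide per missing section.
    return Plan(slides=plan.slides + issues)


if __name__ == "__main__":
    paper = "Introduction\nWhy.\n\nMethod\nHow.\n\nResults\nWhat."
    print(run_pipeline(paper, toy_plan, toy_verify, toy_repair).slides)
```

Keeping each agent behind a plain function signature makes it easy to swap a stub for a model call without touching the loop.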

🎨 Supported Themes

Madrid (default)
Berlin
Singapore
Warsaw
Copenhagen
Other Beamer themes (pass the theme name with --theme)

🌍 Language Support

English: Full support with optimized prompts
Chinese: Full support with Chinese-specific processing

⚠️ Troubleshooting

Common Issues

Q: Images not extracted or missing?

Ensure the marker-pdf model is downloaded: python down_model.py
Check that images exist in output/images/<session_id>/

Q: API key not working?

Verify OPENAI_API_KEY is set in your .env file
Check that your OpenAI account has sufficient credits

Q: LaTeX compilation fails?

Ensure LaTeX is installed and pdflatex is in your PATH
The system automatically uses -shell-escape for code highlighting
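A compile loop with a retry budget, in the spirit of --max-retries, might look like the sketch below. The README does not say what Auto-Slides changes between attempts (rerunning an unchanged file that failed will usually fail again), so this version takes an optional, hypothetical fix callback that receives the pdflatex output and may edit the .tex file before the next try. The run parameter defaults to subprocess.run and exists so the loop can be tested without a LaTeX installation.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Optional


def pdflatex_command(tex_name: str) -> List[str]:
    # -shell-escape: needed for code highlighting, per the README.
    # -interaction=nonstopmode / -halt-on-error: fail fast instead of prompting.
    return ["pdflatex", "-shell-escape", "-interaction=nonstopmode",
            "-halt-on-error", tex_name]


def compile_with_retries(tex_path: str,
                         max_retries: int = 5,
                         run: Callable = subprocess.run,
                         fix: Optional[Callable[[Path, str], None]] = None) -> int:
    """Return the 1-based attempt that succeeded, or 0 if every attempt failed."""
    tex = Path(tex_path)
    for attempt in range(1, max_retries + 1):
        # Run inside the .tex file's folder so relative figure paths resolve.
        result = run(pdflatex_command(tex.name), cwd=str(tex.parent),
                     capture_output=True, text=True)
        if result.returncode == 0:
            return attempt
        if fix is not None and attempt < max_retries:
            fix(tex, result.stdout)  # hypothetical hook: edit the .tex before retrying
    return 0
```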

Q: Memory issues with large PDFs?

The marker-pdf model requires ~8GB RAM
Consider using --disable-llm-enhancement for basic processing

Performance Tips

Use --disable-verification for faster processing
Enable --skip-compilation to generate TEX only
Use --no-interactive-revise to skip interactive features

📚 Citation

If you use Auto-Slides in your research, please cite our paper:

@article{yang2025autoslides,
  title={Auto-Slides: Automatic Academic Presentation Generation with Multi-Agent Collaboration},
  author={Yang, Yuheng and Jiang, Wenjia and Wang, Yang and Wang, Yiwei and Zhang, Chi},
  journal={arXiv preprint arXiv:2509.11062},
  year={2025},
  note={AGI Lab, Westlake University; University of California at Merced; Corresponding author: Chi Zhang},
  url={https://auto-slides.github.io/},
  eprint={2509.11062},
  archivePrefix={arXiv},
  primaryClass={cs.AI}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📞 Contact

Project Page: auto-slides.github.io
Paper: arXiv:2509.11062
Issues: GitHub Issues
Email: [email protected]

🙏 Acknowledgments

marker-pdf for PDF processing
OpenAI for language models
LangChain for LLM orchestration
Beamer for presentation framework
