AI-Podcast is an intelligent audio content generation platform powered by Multi-Agent collaboration. It transforms topics, documents, or URLs into immersive, podcast-style audio conversations. By leveraging state-of-the-art LLMs and TTS models, it orchestrates a team of AI agents—from planners to directors and voice actors—to produce high-quality, structured, and engaging audio content in real time.
- 👥 Flexible Cast Formats: Supports Solo (Monologue), Duo (Dialogue), and Multi-person (Roundtable) modes to suit different content styles.
- 🧠 Adaptive Depth Modes:
  - Lite Mode: Quick, concise summaries for rapid consumption.
  - Deep Exploration Mode: In-depth analysis where agents perform parallel web searches and structured outlining for comprehensive coverage.
- 🎭 Custom Personas & Voices: Users can fully customize character personalities (system prompts) and timbre (voice cloning/selection); see the persona sketch after this list.
- 🗣️ Real-time Interaction: Supports user intervention, allowing you to join the discussion and steer the conversation in real time.
- 📚 Diverse Inputs: Generate podcasts from a simple Topic, uploaded Documents (PDF/Text), or URLs.
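As an illustration of the persona customization above, a character can be described by a small spec pairing a system prompt with a voice reference. This is a minimal sketch; the `Persona` structure and its field names are hypothetical, not the project's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Persona:
    """Hypothetical character spec: personality prompt plus voice settings."""
    name: str
    system_prompt: str                   # personality injected into the role agent
    voice_ref_wav: Optional[str] = None  # reference audio for voice cloning
    voice_preset: str = "default"        # fallback built-in timbre

host = Persona(
    name="Ada",
    system_prompt="You are a witty, skeptical podcast host who asks sharp follow-ups.",
    voice_ref_wav="voices/ada_sample.wav",
)
```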
This project is built upon a robust stack of cutting-edge open-source tools:
- Multi-Agent Framework: AgentScope - Orchestrates the complex interaction between agents.
- Large Language Models (LLM): `Qwen/Qwen3-8B`, `tencent/Hunyuan-7B-Instruct`
- Text-to-Speech (TTS): `FunAudioLLM/CosyVoice2-0.5B` - Provides natural, emotional, and streaming speech synthesis.
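To show where the streaming synthesis fits, here is a minimal sketch based on the public CosyVoice2 API from the FunAudioLLM/CosyVoice repository; the checkpoint path, reference audio, and transcript are placeholders, and exact signatures may vary across CosyVoice versions.

```python
import torchaudio
from cosyvoice.cli.cosyvoice import CosyVoice2
from cosyvoice.utils.file_utils import load_wav

# Load the local CosyVoice2 checkpoint (path is a placeholder).
cosyvoice = CosyVoice2('models/tts/CosyVoice2-0.5B',
                       load_jit=False, load_trt=False, fp16=False)

# Zero-shot voice cloning: a short reference clip plus its transcript.
prompt_speech_16k = load_wav('voices/host_sample.wav', 16000)

# stream=True yields audio chunks as they are synthesized (low-latency playback).
for i, chunk in enumerate(cosyvoice.inference_zero_shot(
        'Welcome back to the show!',
        'Transcript of the reference clip.',
        prompt_speech_16k, stream=True)):
    torchaudio.save(f'chunk_{i}.wav', chunk['tts_speech'], cosyvoice.sample_rate)
```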
In the Deep Exploration Mode, the system employs a sophisticated chain of agents to ensure the content is factual, structured, and engaging.
```mermaid
graph TD
    %% Nodes
    User([👤 User Input])
    Planner[🧠 Planner Agent]
    SubPlanner[🔎 Sub Planner Agents]
    Outline[📝 Podcast Outline]
    Director[🎬 Director Agent]
    Roles[🗣️ Role Agents]
    ScreenWriter[✍️ ScreenWriter Agent]
    TTS[🌊 TTS Engine]
    Audio([🔊 Streamed Audio])
    Summary[📉 Summary Agent]
    Memory[(💾 Global Memory)]

    %% Flow
    User -->|Topic / Doc / URL| Planner
    Planner -->|Identify Intent & Distribute Tasks| SubPlanner

    subgraph Exploration [Parallel Exploration Phase]
        SubPlanner -->|Web Search & Research| SubPlanner
    end

    SubPlanner -->|Finalize Structure| Outline

    subgraph Production [Chapter Loop Production]
        Outline -->|Iterate Chapters| Director
        Director -->|Set Emotion & Direction| Roles
        Roles -->|Broadcast Information| Roles
        Roles -->|Raw Dialogue| ScreenWriter
        ScreenWriter -->|Polish & Format| TTS
        TTS -.->|Real-time Stream| Audio
    end

    %% Memory & Loop
    ScreenWriter --> Summary
    Summary -->|Update Context| Memory
    Memory -.->|Context Feedback| Director
    Memory -.->|Context Feedback| Roles
```
- Planner Agent: Analyzes user intent and creates a high-level directive.
- Sub Planner Agents: Execute parallel tasks (including Web Search) to gather information and draft a detailed Podcast Outline.
- Director Agent: For each chapter, sets the emotional tone and guides the conversation flow.
- Role Agents: Adopt specific personas and generate dialogue using an Information Broadcast mechanism to share knowledge.
- ScreenWriter Agent: Aggregates the dialogue, polishes the script for natural flow, and hands it off to TTS.
- TTS Engine: Converts the script into audio via streaming for low latency.
- Summary Agent & Memory: Maintains the context of the conversation to ensure consistency across chapters.
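To make the loop above concrete, here is a short sketch of the exploration fan-out and the chapter production cycle. Every name in it is a hypothetical stand-in for the project's AgentScope-based agents, not the repository's actual API.

```python
import asyncio

# Hypothetical stand-ins for the AgentScope-based agents; all names and
# signatures below are illustrative assumptions, not the project's API.

async def sub_planner_research(subtopic: str) -> str:
    """A Sub Planner Agent researching one subtopic (web search elided)."""
    await asyncio.sleep(0)  # placeholder for an async web-search call
    return f"research notes on {subtopic!r}"

async def build_outline(topic: str) -> list:
    """The Planner fans subtopics out to Sub Planners in parallel, then merges."""
    subtopics = [f"{topic}: background", f"{topic}: open questions"]
    notes = await asyncio.gather(*(sub_planner_research(s) for s in subtopics))
    # Finalize Structure: the real system uses an LLM to turn notes into chapters.
    return [f"Chapter {i + 1}: {note}" for i, note in enumerate(notes)]

def produce_podcast(outline, director, roles, screenwriter, tts, memory):
    """Chapter Loop Production: Director -> Roles -> ScreenWriter -> TTS."""
    for chapter in outline:
        direction = director.direct(chapter, memory)    # set emotion & pacing
        lines = [role.speak(direction, memory) for role in roles]
        script = screenwriter.polish(lines)             # natural-flow rewrite
        yield from tts.stream(script)                   # low-latency audio chunks
        memory.update(script)  # the Summary Agent would condense this first

outline = asyncio.run(build_outline("quantum computing"))
```

The design point mirrored here is that memory is written once per chapter and read back by both the Director and the Role agents, which is what keeps later chapters consistent with earlier ones.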
- Core Framework: Set up the AgentScope environment.
- Cast Modes: Support for Single, Dual, and Multi-agent conversations.
- Customization: Support for user-defined personas and specific voice timbres.
- Deep Exploration Mode: Integrate Sub-Planner parallel search and outline generation (Pending Merge).
- Input Handling:
  - User Topic Input
  - Document (PDF/Text) Parsing (Pending Merge)
  - URL Content Extraction (Pending Merge)
- Real-time Interaction: Enable users to interrupt and participate in the chat (Pending Merge).
- Python 3.9+
- CUDA-compatible GPU (for local LLM/TTS inference)
- **Clone the repository**

  ```bash
  git clone https://github.com/h2h2h/multi_agents_podcast.git
  cd multi_agents_podcast
  ```

- **Install dependencies**

  ```bash
  pip install -r requirements.txt
  ```

- **Model Setup**
  - Download `CosyVoice2-0.5B` and place it in the `models/tts` directory.
  - Configure your LLM endpoints (or local paths for Qwen/Hunyuan) in `config.yaml`; a hypothetical layout is sketched below.
  - Download: Todo
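For reference, a `config.yaml` along these lines would cover the endpoint settings above. The layout is purely illustrative; every key name here is an assumption, so consult the repository's actual schema.

```yaml
# Hypothetical config.yaml layout; all key names are illustrative assumptions.
llm:
  provider: openai_compatible        # or "local" for on-device Qwen/Hunyuan
  model: Qwen/Qwen3-8B
  api_base: http://localhost:8000/v1
  api_key: YOUR_API_KEY

tts:
  model_dir: models/tts/CosyVoice2-0.5B
  stream: true                       # stream audio chunks for low latency
```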
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License.