Let every interaction be driven by understanding. · Enterprise-Grade Intelligent Memory System
💬 More than memory — it's foresight.
EverMemOS is a forward-thinking intelligent system.
While traditional AI memory serves merely as a "look-back" database, EverMemOS enables AI not only to "remember" what happened, but also to "understand" the meaning behind those memories and use them to guide current actions and decisions. In the EverMemOS demo tools, you can watch EverMemOS extract important information from your history and then recall your preferences, habits, and past context during conversations, just like a friend who truly knows you.
On the LoCoMo benchmark, our approach built upon EverMemOS achieved a reasoning accuracy of 92.3% (evaluated by LLM-Judge), outperforming comparable methods in our evaluation.
[2025-11-02] 🎉 🎉 🎉 EverMemOS v1.0.0 Released!
Build AI memory that never forgets, so every conversation builds on previous understanding.
Beyond "fragments," connecting "stories": Automatically linking conversation pieces to build clear thematic context, enabling AI to "truly understand." When facing multi-threaded conversations, it naturally distinguishes between "Project A progress discussion" and "Team B strategy planning," maintaining coherent contextual logic within each theme. |
Beyond "retrieval," intelligent "perception": Proactively capturing deep connections between memories and tasks, enabling AI to "think thoroughly" at critical moments. Imagine: When a user asks for "food recommendations," the AI proactively recalls "you had dental surgery two days ago" as a key piece of information, automatically adjusting suggestions to avoid unsuitable options. |
Beyond "records," dynamic "growth": Real-time user profile updates that get to know you better with each conversation, enabling AI to "recognize you authentically." Every interaction subtly updates the AI's understanding of you — preferences, style, and focus points all continuously evolve. |
EverMemOS is an open-source project designed to provide long-term memory capabilities to conversational AI agents. It extracts, structures, and retrieves information from conversations, enabling agents to maintain context, recall past interactions, and progressively build user profiles. This results in more personalized, coherent, and intelligent conversations.
📄 Paper Coming Soon - Our technical paper is in preparation. Stay tuned!
EverMemOS operates along two main tracks: memory construction and memory perception. Together they form a cognitive loop that continuously absorbs, consolidates, and applies past information, so every response is grounded in real context and long-term memory.
Memory construction layer: builds structured, retrievable long-term memory from raw conversation data.
Core elements:
- ⚛️ Atomic memory unit MemCell: the core structured unit distilled from conversations for downstream organization and reference (sketched below)
- 🗂️ Multi-level memory: integrate related fragments by theme and storyline to form reusable, hierarchical memories
- 🏷️ Multiple memory types: covering episodes, profiles, preferences, relationships, semantic knowledge, basic facts, and core memories
Workflow:
- MemCell extraction: identify key information in conversations to generate atomic memories
- Memory construction: integrate by theme and participants to form episodes and profiles
- Storage and indexing: persist data and build keyword and semantic indexes to support fast recall
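To make the atomic unit concrete, here is a minimal, illustrative sketch of what a MemCell might contain. The field names are assumptions for illustration only, not the project's actual schema (see `src/memory_layer/memcell_extractor/` for the real implementation):

```python
from dataclasses import dataclass, field


# Illustrative only: these field names are assumptions, not EverMemOS's schema.
@dataclass
class MemCell:
    """An atomic memory unit distilled from a conversation."""
    content: str                      # the distilled fact or event
    memory_type: str                  # e.g. "episode", "profile", "preference",
                                      # "relationship", "semantic", "fact", "core"
    source_message_ids: list[str] = field(default_factory=list)  # provenance
    participants: list[str] = field(default_factory=list)        # who was involved
    timestamp: str = ""               # when the source message was sent


# Downstream, related MemCells are grouped by theme and participants
# into higher-level episodes and profiles.
cell = MemCell(
    content="User had dental surgery two days ago",
    memory_type="episode",
    source_message_ids=["msg_042"],
    participants=["user_001"],
    timestamp="2025-02-01T10:00:00+08:00",
)
```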
Memory perception layer: quickly recalls relevant memories through multi-round reasoning and intelligent fusion, achieving precise contextual awareness.
- 🧪 Hybrid Retrieval (RRF Fusion): runs semantic and keyword retrieval in parallel and seamlessly fuses the results with the Reciprocal Rank Fusion algorithm (see the sketch below)
- 📊 Intelligent Reranking (Reranker): reorders candidate memories by deep relevance, prioritizing the most critical information; batched concurrent processing with exponential-backoff retries keeps it stable under high throughput
- 🎓 LLM-Guided Multi-Round Recall: when initial results are insufficient, generates 2-3 complementary queries and retrieves and fuses them in parallel, automatically identifying missing information and proactively filling retrieval blind spots
- 🔀 Multi-Query Parallel Strategy: when a single query cannot fully express the intent, generates multiple queries from complementary perspectives and enhances coverage of complex intents through multi-path RRF fusion
- ⚡ Lightweight Fast Mode: for latency-sensitive scenarios, skips LLM calls and uses RRF-fused hybrid retrieval alone, flexibly balancing speed and quality
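For reference, the fusion step can be expressed in a few lines. The following is a self-contained sketch of the standard Reciprocal Rank Fusion algorithm, not EverMemOS's internal implementation:

```python
from collections import defaultdict


def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: defaultdict[str, float] = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Fuse a semantic (embedding) ranking with a keyword (BM25) ranking.
semantic = ["mem_3", "mem_1", "mem_7"]
keyword = ["mem_1", "mem_9", "mem_3"]
print(rrf_fuse([semantic, keyword]))  # mem_1 and mem_3 rise to the top
```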
- Context Integration: concatenates recalled multi-level memories (episodes, profiles, preferences) with the current conversation (sketched below)
- Traceable Reasoning: the model generates responses grounded in explicit memory evidence, avoiding hallucination
💡 Through the cognitive loop of "Structured Memory → Multi-Strategy Recall → Intelligent Retrieval → Contextual Reasoning", the AI always "thinks with memory", achieving true contextual awareness.
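As a rough illustration of the final two steps, recalled memories can be concatenated ahead of the current conversation so the model answers against explicit evidence. This is a sketch; the actual prompt wording and structure used by EverMemOS are assumptions here:

```python
def build_prompt(memories: list[str], conversation: list[str], question: str) -> str:
    """Assemble recalled memories and the current conversation into one grounded prompt."""
    memory_block = "\n".join(f"- {m}" for m in memories)
    history_block = "\n".join(conversation)
    return (
        "Relevant long-term memories (cite these when answering):\n"
        f"{memory_block}\n\n"
        "Current conversation:\n"
        f"{history_block}\n\n"
        f"User question: {question}"
    )


print(build_prompt(
    memories=["User had dental surgery two days ago", "User prefers spicy food"],
    conversation=["User: I'm hungry, any ideas?"],
    question="What should I eat tonight?",
))
```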
Directory Structure
memsys-opensource/
├── src/ # Source code directory
│ ├── agentic_layer/ # Agentic layer - unified memory interface
│ ├── memory_layer/ # Memory layer - memory extraction
│ │ ├── memcell_extractor/ # MemCell extractor
│ │ ├── memory_extractor/ # Memory extractor
│ │ └── prompts/ # LLM prompt templates
│ ├── retrieval_layer/ # Retrieval layer - memory retrieval
│ ├── biz_layer/ # Business layer - business logic
│ ├── infra_layer/ # Infrastructure layer
│ ├── core/ # Core functionality (DI/lifecycle/middleware)
│ ├── component/ # Components (LLM adapters, etc.)
│ └── common_utils/ # Common utilities
├── demo/ # Demo code
├── data/ # Sample conversation data
├── evaluation/ # Evaluation scripts
│ └── src/ # Evaluation framework source code
├── data_format/ # Data format definitions
├── docs/ # Documentation
├── config.json # Configuration file
├── env.template # Environment variable template
├── pyproject.toml # Project configuration
└── README.md # Project description
- Python 3.10+
- uv (recommended package manager)
- Docker 20.10+ and Docker Compose 2.0+
- At least 4GB of available RAM (for Elasticsearch and Milvus)
Use Docker Compose to start all dependency services (MongoDB, Elasticsearch, Milvus, Redis) with one command:
# 1. Clone the repository
git clone https://github.com/EverMind-AI/EverMemOS.git
cd EverMemOS
# 2. Start Docker services
docker-compose up -d
# 3. Verify service status
docker-compose ps
# 4. Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# 5. Install project dependencies
uv sync
# 6. Configure environment variables
cp env.template .env
# Edit the .env file and fill in the necessary configurations:
# - LLM_API_KEY: Enter your LLM API Key (for memory extraction)
# - DEEPINFRA_API_KEY: Enter your DeepInfra API Key (for Embedding and Rerank)

Docker Services:
| Service | Host Port | Container Port | Purpose |
|---|---|---|---|
| MongoDB | 27017 | 27017 | Primary database for storing memory cells and profiles |
| Elasticsearch | 19200 | 9200 | Keyword search engine (BM25) |
| Milvus | 19530 | 19530 | Vector database for semantic retrieval |
| Redis | 6379 | 6379 | Cache service |
💡 Connection Tips:

- Use host ports when connecting (e.g., `localhost:19200` for Elasticsearch)
- MongoDB credentials: `admin` / `memsys123` (local development only)
- Stop services: `docker-compose down` | View logs: `docker-compose logs -f`
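To quickly confirm that all four services are reachable on their host ports, you can run a short standard-library check like the one below (a convenience sketch, not part of the repository):

```python
import socket

# Host ports from the table above (local development defaults).
SERVICES = {
    "MongoDB": 27017,
    "Elasticsearch": 19200,
    "Milvus": 19530,
    "Redis": 6379,
}

for name, port in SERVICES.items():
    try:
        with socket.create_connection(("localhost", port), timeout=2):
            print(f"[OK]   {name} is listening on localhost:{port}")
    except OSError:
        print(f"[FAIL] {name} is not reachable on localhost:{port}")
```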
📖 MongoDB detailed installation guide: MongoDB Installation Guide
EverMemOS offers multiple usage methods. Choose the one that best suits your needs:
The demo showcases the end-to-end functionality of EverMemOS.
🚀 Quick Start: Simple Demo (Recommended) ⭐
The fastest way to experience EverMemOS! Just 2 steps to see memory storage and retrieval in action:
# Step 1: Start the API server (in terminal 1)
uv run python src/bootstrap.py src/run.py --port 8001
# Step 2: Run the simple demo (in terminal 2)
uv run python src/bootstrap.py demo/simple_demo.py

What it does:
- Stores 4 conversation messages about sports hobbies
- Waits 10 seconds for indexing
- Searches for relevant memories with 3 different queries
- Shows complete workflow with friendly explanations
Perfect for: First-time users, quick testing, understanding core concepts
See the demo code at demo/simple_demo.py
We also provide a full-featured experience:
Prerequisites: Start the API Server
# Terminal 1: Start the API server (required)
uv run python src/bootstrap.py src/run.py --port 8001

💡 Tip: Keep the API server running throughout. All following operations should be performed in another terminal.
Step 1: Extract Memories
Run the memory extraction script to process sample conversation data and build the memory database:
# Terminal 2: Run the extraction script
uv run python src/bootstrap.py demo/extract_memory.py

This script performs the following actions:
- Calls `demo.tools.clear_all_data.clear_all_memories()` so the demo starts from an empty MongoDB/Elasticsearch/Milvus/Redis state. Ensure the dependency stack launched by `docker-compose` is running before executing the script, otherwise the wipe step will fail.
- Loads `data/assistant_chat_zh.json`, appends `scene="assistant"` to each message, and streams every entry to `http://localhost:8001/api/v3/agentic/memorize`. Update the `base_url`, `data_file`, or `profile_scene` constants in `demo/extract_memory.py` if you host the API on another endpoint or want to ingest a different scenario.
- Writes through the HTTP API only: MemCells, episodes, and profiles are created inside your databases, not under `demo/memcell_outputs/`. Inspect MongoDB (and Milvus/Elasticsearch) to verify ingestion, or proceed directly to the chat demo.
💡 Tip: For detailed configuration instructions and usage guide, please refer to the Demo Documentation.
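If you would rather script your own ingestion than replay the demo data, a minimal client can post messages to the same endpoint. The sketch below mirrors what `demo/extract_memory.py` does; the payload fields follow the curl example shown later in the API section:

```python
import requests

MEMORIZE_URL = "http://localhost:8001/api/v3/agentic/memorize"

# One message in the shape accepted by the memorize endpoint;
# scene must be "assistant" or "group_chat".
message = {
    "message_id": "msg_001",
    "create_time": "2025-02-01T10:00:00+08:00",
    "sender": "user_001",
    "sender_name": "Alice",
    "content": "I started swimming twice a week this month",
    "group_id": "demo_group",
    "group_name": "Assistant Chat",
    "scene": "assistant",
}

resp = requests.post(MEMORIZE_URL, json=message, timeout=30)
resp.raise_for_status()
print(resp.status_code, resp.text)
```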
Step 2: Chat with Memory
After extracting memories, start the interactive chat demo:
# Terminal 2: Run the chat program (ensure API server is still running)
uv run python src/bootstrap.py demo/chat_with_memory.py

This program loads `.env` via python-dotenv, verifies that at least one LLM key (`LLM_API_KEY`, `OPENROUTER_API_KEY`, or `OPENAI_API_KEY`) is available, and connects to MongoDB through `demo.utils.ensure_mongo_beanie_ready` to enumerate groups that already contain MemCells. Each user query invokes `api/v3/agentic/retrieve_lightweight` unless you explicitly select Agentic mode, in which case the orchestrator switches to `api/v3/agentic/retrieve_agentic` and warns about the additional LLM latency.
Interactive Workflow:
- Select Language: choose a `zh` or `en` terminal UI.
- Select Scenario Mode: Assistant (one-on-one) or Group Chat (multi-speaker analysis).
- Select Conversation Group: groups are read live from MongoDB via `query_all_groups_from_mongodb`; run the extraction step first so the list is non-empty.
- Select Retrieval Mode: `rrf`, `embedding`, `bm25`, or LLM-guided Agentic retrieval.
- Start Chatting: pose questions, inspect the retrieved memories displayed before each response, and use `help`, `clear`, `reload`, or `exit` to manage the session.
The evaluation framework provides a unified, modular way to benchmark memory systems on standard datasets (LoCoMo, LongMemEval, PersonaMem).
Quick Test (Smoke Test):
# Test with limited data to verify everything works
# Default: first conversation, first 10 messages, first 3 questions
uv run python -m evaluation.cli --dataset locomo --system evermemos --smoke
# Custom smoke test: 20 messages, 5 questions
uv run python -m evaluation.cli --dataset locomo --system evermemos \
--smoke --smoke-messages 20 --smoke-questions 5
# Test different datasets
uv run python -m evaluation.cli --dataset longmemeval --system evermemos --smoke
uv run python -m evaluation.cli --dataset personamem --system evermemos --smoke
# Test specific stages (e.g., only search and answer)
uv run python -m evaluation.cli --dataset locomo --system evermemos \
--smoke --stages search answer
# View smoke test results quickly
cat evaluation/results/locomo-evermemos-smoke/report.txt

Full Evaluation:
# Evaluate EvermemOS on LoCoMo benchmark
uv run python -m evaluation.cli --dataset locomo --system evermemos
# Evaluate on other datasets
uv run python -m evaluation.cli --dataset longmemeval --system evermemos
uv run python -m evaluation.cli --dataset personamem --system evermemos
# Use --run-name to distinguish multiple runs (useful for A/B testing)
uv run python -m evaluation.cli --dataset locomo --system evermemos --run-name baseline
uv run python -m evaluation.cli --dataset locomo --system evermemos --run-name experiment1
# Resume from checkpoint if interrupted (automatic)
# Just re-run the same command - it will detect and resume from checkpoint
uv run python -m evaluation.cli --dataset locomo --system evermemos

View Results:
# Results are saved to evaluation/results/{dataset}-{system}[-{run-name}]/
cat evaluation/results/locomo-evermemos/report.txt # Summary metrics
cat evaluation/results/locomo-evermemos/eval_results.json # Detailed per-question results
cat evaluation/results/locomo-evermemos/pipeline.log # Execution logs

The evaluation pipeline consists of 4 stages (add → search → answer → evaluate) with automatic checkpointing and resume support.
⚙️ Evaluation Configuration:

- Data Preparation: place datasets in `evaluation/data/` (see `evaluation/README.md`)
- Environment: configure `.env` with LLM API keys (see `env.template`)
- Installation: run `uv sync --group evaluation` to install dependencies
- Custom Config: copy and modify the YAML files in `evaluation/config/systems/` or `evaluation/config/datasets/`
- Advanced Usage: see `evaluation/README.md` for checkpoint management, stage-specific runs, and system comparisons
Prerequisites: Start the API Server
Before calling the API, make sure the API server is running:
# Start the API server
uv run python src/bootstrap.py src/run.py --port 8001

💡 Tip: Keep the API server running throughout. All following API calls should be performed in another terminal.
Use the V3 API to store a single message memory:
Example: Store single message memory
curl -X POST http://localhost:8001/api/v3/agentic/memorize \
-H "Content-Type: application/json" \
-d '{
"message_id": "msg_001",
"create_time": "2025-02-01T10:00:00+08:00",
"sender": "user_103",
"sender_name": "Chen",
"content": "We need to complete the product design this week",
"group_id": "group_001",
"group_name": "Project Discussion Group",
"scene": "group_chat"
}'

ℹ️ `scene` is a required field; it only supports `assistant` or `group_chat` and specifies the memory extraction strategy.
ℹ️ By default, all memory types are extracted and stored.
API Features:
- `/api/v3/agentic/memorize`: store a single message memory
- `/api/v3/agentic/retrieve_lightweight`: lightweight memory retrieval (Embedding + BM25 + RRF)
- `/api/v3/agentic/retrieve_agentic`: agentic memory retrieval (LLM-guided multi-round intelligent retrieval)
For more API details, please refer to Agentic V3 API Documentation.
🔍 Retrieve Memories
EverMemOS provides two retrieval modes: Lightweight (fast) and Agentic (intelligent).
Lightweight Retrieval
| Parameter | Required | Description |
|---|---|---|
| `query` | Yes* | Natural language query (*optional for profile data source) |
| `user_id` | No | User ID |
| `data_source` | Yes | `episode` / `event_log` / `semantic_memory` / `profile` |
| `memory_scope` | Yes | `personal` (user_id only) / `group` (group_id only) / `all` (both) |
| `retrieval_mode` | Yes | `embedding` / `bm25` / `rrf` (recommended) |
| `group_id` | No | Group ID |
| `current_time` | No | Filter valid semantic_memory (format: YYYY-MM-DD) |
| `top_k` | No | Number of results (default: 5) |
Example 1: Personal Memory
Example: Personal Memory Retrieval
curl -X POST http://localhost:8001/api/v3/agentic/retrieve_lightweight \
-H "Content-Type: application/json" \
-d '{
"query": "What sports does the user like?",
"user_id": "user_001",
"data_source": "episode",
"memory_scope": "personal",
"retrieval_mode": "rrf"
}'

Example 2: Group Memory
Example: Group Memory Retrieval
curl -X POST http://localhost:8001/api/v3/agentic/retrieve_lightweight \
-H "Content-Type: application/json" \
-d '{
"query": "Discuss project progress",
"group_id": "project_team_001",
"data_source": "episode",
"memory_scope": "group",
"retrieval_mode": "rrf"
}'

Agentic Retrieval
LLM-guided multi-round intelligent search with automatic query refinement and result reranking.
Example: Agentic Retrieval
curl -X POST http://localhost:8001/api/v3/agentic/retrieve_agentic \
-H "Content-Type: application/json" \
-d '{
"query": "What foods might the user like?",
"user_id": "user_001",
"group_id": "chat_group_001",
"top_k": 20,
"llm_config": {
"model": "gpt-4o-mini",
"api_key": "your_api_key"
}
}'
⚠️ Agentic retrieval requires an LLM API key and takes longer, but provides higher-quality results for queries that require multiple memory sources and complex logic.
📖 Full Documentation: Agentic V3 API | Testing Tool: `demo/tools/test_retrieval_comprehensive.py`
EverMemOS supports a standardized group chat data format (GroupChatFormat). You can use scripts for batch storage:
# Use script for batch storage (Chinese data)
uv run python src/bootstrap.py src/run_memorize.py \
--input data/group_chat_zh.json \
--api-url http://localhost:8001/api/v3/agentic/memorize \
--scene group_chat
# Or use English data
uv run python src/bootstrap.py src/run_memorize.py \
--input data/group_chat_en.json \
--api-url http://localhost:8001/api/v3/agentic/memorize \
--scene group_chat
# Validate file format
uv run python src/bootstrap.py src/run_memorize.py \
--input data/group_chat_en.json \
--scene group_chat \
--validate-only

ℹ️ Scene Parameter Explanation: The `scene` parameter is required and specifies the memory extraction strategy:

- Use `assistant` for one-on-one conversations with an AI assistant
- Use `group_chat` for multi-person group discussions

Note: In your data files, you may see `scene` values like `work` or `company`; these are internal scene descriptors in the data format. The `--scene` command-line parameter uses different values (`assistant` / `group_chat`) to specify which extraction pipeline to apply.
GroupChatFormat Example:
{
"version": "1.0.0",
"conversation_meta": {
"group_id": "group_001",
"name": "Project Discussion Group",
"user_details": {
"user_101": {
"full_name": "Alice",
"role": "Product Manager"
}
}
},
"conversation_list": [
{
"message_id": "msg_001",
"create_time": "2025-02-01T10:00:00+08:00",
"sender": "user_101",
"content": "Good morning everyone"
}
]
}

For complete format specifications, please refer to Group Chat Format Specification.
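For quick programmatic checks of your own files, the fields shown above can be verified with a few lines of Python. This is an informal sketch covering only the documented fields; the official validator remains `run_memorize.py --validate-only`:

```python
import json

# Informal check of the fields shown in the GroupChatFormat example above.
with open("data/group_chat_en.json", encoding="utf-8") as f:
    doc = json.load(f)

meta = doc["conversation_meta"]
messages = doc["conversation_list"]

assert doc["version"] == "1.0.0"
assert "group_id" in meta and "user_details" in meta
for msg in messages:
    # Every message needs an id, a timestamp, a sender, and content.
    assert {"message_id", "create_time", "sender", "content"} <= msg.keys()

print(f"{meta.get('name', meta['group_id'])}: {len(messages)} messages look well-formed")
```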
For detailed installation, configuration, and usage instructions, please refer to:
- 📚 Quick Start Guide - Complete installation and configuration steps
- 📖 API Usage Guide - API endpoints and data format details
- 🔧 Development Guide - Architecture design and development best practices
- 🚀 Bootstrap Usage - Script runner usage instructions
- 📝 Group Chat Format Specification - Standardized data format
- Quick Start Guide - Installation, configuration, and startup
- Development Guide - Architecture design and best practices
- Bootstrap Usage - Script runner
- Agentic V3 API - Agentic layer API
- Dependency Injection Framework - DI container usage guide
- 📖 Demo Guide - Interactive examples and memory extraction demos
- 📊 Data Guide - Sample conversation data and format specifications
- 📊 Evaluation Guide - Testing EverMemOS-based methods on standard benchmarks
EverMemOS adopts a layered architecture design, mainly including:
- Agentic Layer: Memory extraction, vectorization, retrieval, and reranking
- Memory Layer: MemCell extraction, episodic memory management
- Retrieval Layer: Multi-modal retrieval and result ranking
- Business Layer: Business logic and data operations
- Infrastructure Layer: Database, cache, message queue adapters, etc.
- Core Framework: Dependency injection, middleware, queue management, etc.
For more architectural details, please refer to the Development Guide.
We welcome all forms of contributions! Whether it's reporting bugs, proposing new features, or submitting code improvements.
Before contributing, please read our Contributing Guide to learn about:
- Development environment setup
- Code standards and best practices
- Git commit conventions (Gitmoji)
- Pull Request process
We are building a vibrant open-source community!
Thanks to all the developers who have contributed to this project!
If you use EverMemOS in your research, please cite our paper (coming soon):
Coming soon
This project is licensed under the Apache License 2.0. This means you are free to use, modify, and distribute this project, with the following key conditions:
- You must include a copy of the Apache 2.0 license
- You must state any significant changes made to the code
- You must retain all copyright, patent, trademark, and attribution notices
- If a NOTICE file is included, you must include it in your distribution
Thanks to the following projects and communities for their inspiration and support:
- Memos - Thanks to the Memos project, a comprehensive, standardized open-source note-taking service that provided valuable inspiration for our memory system design.
- Nemori - Thanks to the Nemori project, a self-organizing long-term memory substrate for agentic LLM workflows that likewise inspired our memory system design.
If this project helps you, please give us a ⭐️
Made with ❤️ by the EverMemOS Team
