EverMemOS

Website

Let every interaction be driven by understanding. · Enterprise-Grade Intelligent Memory System


English | 简体中文


💬 More than memory — it's foresight.

EverMemOS is a forward-thinking intelligent system.
While traditional AI memory serves merely as a "look-back" database, EverMemOS enables AI not only to "remember" what happened, but also to "understand" the meaning behind those memories and use them to guide current actions and decisions. In the EverMemOS demo tools, you can see how EverMemOS extracts important information from your conversation history and then recalls your preferences, habits, and past events during later conversations, just like a friend who truly knows you. On the LoCoMo benchmark, our approach built on EverMemOS achieved a reasoning accuracy of 92.3% (evaluated by LLM-Judge), outperforming comparable methods in our evaluation.


📢 Latest Updates

[2025-11-02] 🎉 🎉 🎉 EverMemOS v1.0.0 Released!

  • Stable Version: AI Memory System officially open sourced
  • 📚 Complete Documentation: Quick start guide and comprehensive API documentation
  • 📈 Benchmark Testing: LoCoMo dataset benchmark evaluation pipeline
  • 🖥️ Demo Tools: Get started quickly with easy-to-use demos

🎯 Core Vision

Build AI memory that never forgets, making every conversation built on previous understanding.


💡 Unique Advantages

🔗 Coherent Narrative

Beyond "fragments," connecting "stories": Automatically linking conversation pieces to build clear thematic context, enabling AI to "truly understand."

When facing multi-threaded conversations, it naturally distinguishes between "Project A progress discussion" and "Team B strategy planning," maintaining coherent contextual logic within each theme.

From scattered phrases to complete narratives, AI no longer just "understands one sentence" but "understands the whole story."

🧠 Evidence-Based Perception

Beyond "retrieval," intelligent "perception": Proactively capturing deep connections between memories and tasks, enabling AI to "think thoroughly" at critical moments.

Imagine: When a user asks for "food recommendations," the AI proactively recalls "you had dental surgery two days ago" as a key piece of information, automatically adjusting suggestions to avoid unsuitable options.

This is Contextual Awareness — enabling AI thinking to be truly built on understanding rather than isolated responses.

💾 Living Profiles

Beyond "records," dynamic "growth": Real-time user profile updates that get to know you better with each conversation, enabling AI to "recognize you authentically."

Every interaction subtly updates the AI's understanding of you — preferences, style, and focus points all continuously evolve.

As interactions deepen, it doesn't just "remember what you said," but is "learning who you are."



📖 Project Introduction

EverMemOS is an open-source project designed to provide long-term memory capabilities to conversational AI agents. It extracts, structures, and retrieves information from conversations, enabling agents to maintain context, recall past interactions, and progressively build user profiles. This results in more personalized, coherent, and intelligent conversations.

📄 Paper Coming Soon - Our technical paper is in preparation. Stay tuned!

🎯 System Framework

EverMemOS operates along two main tracks: memory construction and memory perception. Together they form a cognitive loop that continuously absorbs, consolidates, and applies past information, so every response is grounded in real context and long-term memory.

Overview

🧩 Memory Construction

Memory construction layer: builds structured, retrievable long-term memory from raw conversation data.

  • Core elements

    • ⚛️ Atomic memory unit MemCell: the core structured unit distilled from conversations for downstream organization and reference (a sketch of such a unit follows the workflow below)
    • 🗂️ Multi-level memory: integrate related fragments by theme and storyline to form reusable, hierarchical memories
    • 🏷️ Multiple memory types: covering episodes, profiles, preferences, relationships, semantic knowledge, basic facts, and core memories
  • Workflow

    1. MemCell extraction: identify key information in conversations to generate atomic memories
    2. Memory construction: integrate by theme and participants to form episodes and profiles
    3. Storage and indexing: persist data and build keyword and semantic indexes to support fast recall
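
To make the atomic unit concrete, here is a minimal, illustrative sketch of what a MemCell might carry. The field names are assumptions for illustration, not the actual EverMemOS schema; see the API examples below for the real request format.

# Illustrative only: field names are assumptions, not the EverMemOS schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MemCell:
    """Hypothetical atomic memory unit distilled from a conversation."""
    content: str                      # the distilled fact or statement
    memory_type: str                  # e.g. "episode", "profile", "preference"
    user_id: str | None = None        # owner, for personal memories
    group_id: str | None = None       # conversation group it came from
    source_message_ids: list[str] = field(default_factory=list)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

cell = MemCell(
    content="Chen wants the product design finished this week",
    memory_type="episode",
    user_id="user_103",
    group_id="group_001",
    source_message_ids=["msg_001"],
)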

🔎 Memory Perception

Memory perception layer: quickly recalls relevant memories through multi-round reasoning and intelligent fusion, achieving precise contextual awareness.

🎯 Intelligent Retrieval Tools

  • 🧪 Hybrid Retrieval (RRF Fusion)
    Parallel execution of semantic and keyword retrieval, seamlessly fused with the Reciprocal Rank Fusion (RRF) algorithm (a minimal sketch follows this list)

  • 📊 Intelligent Reranking (Reranker)
    Reorders candidate memories by deep relevance, prioritizing the most critical information
    Batch concurrent processing with exponential backoff retries keeps it stable under high throughput
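
The RRF fusion step mentioned above is simple to sketch. Assuming each retriever returns a ranked list of memory IDs, the standard Reciprocal Rank Fusion formula looks like this (an illustrative sketch, not the EverMemOS implementation):

# Standard RRF over ranked ID lists; a sketch, not the EverMemOS code.
def rrf_fuse(result_lists, k=60, top_k=5):
    """Fuse several ranked lists; rank is 1-based, k dampens tail results."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

semantic_hits = ["mem_7", "mem_2", "mem_9"]   # embedding retrieval, best first
bm25_hits = ["mem_2", "mem_4", "mem_7"]       # keyword retrieval, best first
print(rrf_fuse([semantic_hits, bm25_hits]))   # items ranked well by both rise to the top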

🤖 Agentic Intelligent Retrieval

  • 🎓 LLM-Guided Multi-Round Recall
    When initial results are insufficient, generates 2-3 complementary queries, then retrieves and fuses them in parallel
    Automatically identifies missing information, proactively filling retrieval blind spots

  • 🔀 Multi-Query Parallel Strategy
    When a single query cannot fully express intent, generate multiple complementary perspective queries
    Enhance coverage of complex intents through multi-path RRF fusion

  • ⚡ Lightweight Fast Mode
    For latency-sensitive scenarios, skip LLM calls and use RRF-fused hybrid retrieval
    Flexibly balance between speed and quality

🧠 Reasoning Fusion

  • Context Integration: Concatenate recalled multi-level memories (episodes, profiles, preferences) with the current conversation (a minimal sketch follows this list)
  • Traceable Reasoning: Model generates responses based on explicit memory evidence, avoiding hallucination
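
As a rough illustration of the context-integration step above, the sketch below assembles retrieved memories and the current conversation into a single grounded prompt. The prompt wording and memory fields are assumptions for illustration, not the templates in src/memory_layer/prompts/.

# Illustrative prompt assembly; field names and wording are assumptions.
def build_prompt(memories, conversation):
    """Ground the model's answer in explicit memory evidence."""
    evidence = "\n".join(f"- [{m['type']}] {m['content']}" for m in memories)
    history = "\n".join(f"{turn['speaker']}: {turn['text']}" for turn in conversation)
    return (
        "Answer using only the memory evidence below; say so if it is insufficient.\n\n"
        f"Memory evidence:\n{evidence}\n\n"
        f"Current conversation:\n{history}"
    )

prompt = build_prompt(
    memories=[{"type": "episode", "content": "User had dental surgery two days ago"}],
    conversation=[{"speaker": "user", "text": "Any food recommendations for dinner?"}],
)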

💡 Through the cognitive loop of "Structured Memory → Multi-Strategy Recall → Intelligent Retrieval → Contextual Reasoning", the AI always "thinks with memory", achieving true contextual awareness.

📁 Project Structure

Directory Structure
memsys-opensource/
├── src/                              # Source code directory
│   ├── agentic_layer/                # Agentic layer - unified memory interface
│   ├── memory_layer/                 # Memory layer - memory extraction
│   │   ├── memcell_extractor/        # MemCell extractor
│   │   ├── memory_extractor/         # Memory extractor
│   │   └── prompts/                  # LLM prompt templates
│   ├── retrieval_layer/              # Retrieval layer - memory retrieval
│   ├── biz_layer/                    # Business layer - business logic
│   ├── infra_layer/                  # Infrastructure layer
│   ├── core/                         # Core functionality (DI/lifecycle/middleware)
│   ├── component/                    # Components (LLM adapters, etc.)
│   └── common_utils/                 # Common utilities
├── demo/                             # Demo code
├── data/                             # Sample conversation data
├── evaluation/                       # Evaluation scripts
│   └── src/                          # Evaluation framework source code
├── data_format/                      # Data format definitions
├── docs/                             # Documentation
├── config.json                       # Configuration file
├── env.template                      # Environment variable template
├── pyproject.toml                    # Project configuration
└── README.md                         # Project description

🚀 Quick Start

Prerequisites

  • Python 3.10+
  • uv (recommended package manager)
  • Docker 20.10+ and Docker Compose 2.0+
  • At least 4GB of available RAM (for Elasticsearch and Milvus)

Installation

Using Docker for Dependency Services ⭐

Use Docker Compose to start all dependency services (MongoDB, Elasticsearch, Milvus, Redis) with one command:

# 1. Clone the repository
git clone https://github.com/EverMind-AI/EverMemOS.git
cd EverMemOS

# 2. Start Docker services
docker-compose up -d

# 3. Verify service status
docker-compose ps

# 4. Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# 5. Install project dependencies
uv sync

# 6. Configure environment variables
cp env.template .env
# Edit the .env file and fill in the necessary configurations:
#   - LLM_API_KEY: Enter your LLM API Key (for memory extraction)
#   - DEEPINFRA_API_KEY: Enter your DeepInfra API Key (for Embedding and Rerank)

Docker Services:

Service         Host Port   Container Port   Purpose
MongoDB         27017       27017            Primary database for storing memory cells and profiles
Elasticsearch   19200       9200             Keyword search engine (BM25)
Milvus          19530       19530            Vector database for semantic retrieval
Redis           6379        6379             Cache service

💡 Connection Tips:

  • Use host ports when connecting (e.g., localhost:19200 for Elasticsearch); a quick Python connectivity check is sketched after these tips
  • MongoDB credentials: admin / memsys123 (local development only)
  • Stop services: docker-compose down | View logs: docker-compose logs -f
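
If you want to confirm the stack is reachable before moving on, a check along these lines can help. It is a local-development sketch that assumes the default ports and credentials above, and that requests, pymongo, redis, and pymilvus are installed in your environment:

# Local-development connectivity check; adjust hosts/ports if you changed the compose file.
import requests
import redis
from pymongo import MongoClient
from pymilvus import connections

MongoClient("mongodb://admin:memsys123@localhost:27017").admin.command("ping")  # MongoDB
print(requests.get("http://localhost:19200").json()["version"]["number"])       # Elasticsearch
redis.Redis(host="localhost", port=6379).ping()                                 # Redis
connections.connect(alias="default", host="localhost", port="19530")            # Milvus
print("All dependency services are reachable.")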

📖 MongoDB detailed installation guide: MongoDB Installation Guide


How to Use

EverMemOS offers multiple usage methods. Choose the one that best suits your needs:


🎯 Run Demo: Memory Extraction and Interactive Chat

The demo showcases the end-to-end functionality of EverMemOS.


🚀 Quick Start: Simple Demo (Recommended)

The fastest way to experience EverMemOS! Just 2 steps to see memory storage and retrieval in action:

# Step 1: Start the API server (in terminal 1)
uv run python src/bootstrap.py src/run.py --port 8001

# Step 2: Run the simple demo (in terminal 2)
uv run python src/bootstrap.py demo/simple_demo.py

What it does:

  • Stores 4 conversation messages about sports hobbies
  • Waits 10 seconds for indexing
  • Searches for relevant memories with 3 different queries
  • Shows complete workflow with friendly explanations

Perfect for: First-time users, quick testing, understanding core concepts

See the demo code at demo/simple_demo.py
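
If you prefer to drive the same flow from your own code instead of the demo script, a minimal sketch using only the documented V3 endpoints might look like this. It assumes the API server from Step 1 is running on port 8001 and that requests is installed; the message content is made up for illustration.

import time
import requests

BASE = "http://localhost:8001/api/v3/agentic"

# Store one message (same payload shape as the memorize example further below)
requests.post(f"{BASE}/memorize", json={
    "message_id": "demo_msg_001",
    "create_time": "2025-02-01T10:00:00+08:00",
    "sender": "user_001",
    "sender_name": "Alex",
    "content": "I go swimming every Saturday morning",
    "group_id": "demo_group_001",
    "group_name": "Demo Chat",
    "scene": "assistant",
}).raise_for_status()

time.sleep(10)  # give indexing a moment, as the simple demo does

# Retrieve it back with lightweight RRF retrieval
resp = requests.post(f"{BASE}/retrieve_lightweight", json={
    "query": "What sports does the user like?",
    "user_id": "user_001",
    "data_source": "episode",
    "memory_scope": "personal",
    "retrieval_mode": "rrf",
})
print(resp.json())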


We also provide a full-featured experience:

Prerequisites: Start the API Server

# Terminal 1: Start the API server (required)
uv run python src/bootstrap.py src/run.py --port 8001

💡 Tip: Keep the API server running throughout. All following operations should be performed in another terminal.


Step 1: Extract Memories

Run the memory extraction script to process sample conversation data and build the memory database:

# Terminal 2: Run the extraction script
uv run python src/bootstrap.py demo/extract_memory.py

This script performs the following actions:

  • Calls demo.tools.clear_all_data.clear_all_memories() so the demo starts from an empty MongoDB/Elasticsearch/Milvus/Redis state. Ensure the dependency stack launched by docker-compose is running before executing the script, otherwise the wipe step will fail.
  • Loads data/assistant_chat_zh.json, appends scene="assistant" to each message, and streams every entry to http://localhost:8001/api/v3/agentic/memorize. Update the base_url, data_file, or profile_scene constants in demo/extract_memory.py if you host the API on another endpoint or want to ingest a different scenario.
  • Writes through the HTTP API only: MemCells, episodes, and profiles are created inside your databases, not under demo/memcell_outputs/. Inspect MongoDB (and Milvus/Elasticsearch) to verify ingestion or proceed directly to the chat demo.

💡 Tip: For detailed configuration instructions and usage guide, please refer to the Demo Documentation.

Step 2: Chat with Memory

After extracting memories, start the interactive chat demo:

# Terminal 2: Run the chat program (ensure API server is still running)
uv run python src/bootstrap.py demo/chat_with_memory.py

This program loads .env via python-dotenv, verifies that at least one LLM key (LLM_API_KEY, OPENROUTER_API_KEY, or OPENAI_API_KEY) is available, and connects to MongoDB through demo.utils.ensure_mongo_beanie_ready to enumerate groups that already contain MemCells. Each user query invokes api/v3/agentic/retrieve_lightweight unless you explicitly select the Agentic mode, in which case the orchestrator switches to api/v3/agentic/retrieve_agentic and warns about the additional LLM latency.

Interactive Workflow:

  1. Select Language: Choose a zh or en terminal UI.
  2. Select Scenario Mode: Assistant (one-on-one) or Group Chat (multi-speaker analysis).
  3. Select Conversation Group: Groups are read live from MongoDB via query_all_groups_from_mongodb; run the extraction step first so the list is non-empty.
  4. Select Retrieval Mode: rrf, embedding, bm25, or LLM-guided Agentic retrieval.
  5. Start Chatting: Pose questions, inspect the retrieved memories that are displayed before each response, and use help, clear, reload, or exit to manage the session.

📊 Run Evaluation: Performance Testing

The evaluation framework provides a unified, modular way to benchmark memory systems on standard datasets (LoCoMo, LongMemEval, PersonaMem).

Quick Test (Smoke Test):

# Test with limited data to verify everything works
# Default: first conversation, first 10 messages, first 3 questions
uv run python -m evaluation.cli --dataset locomo --system evermemos --smoke

# Custom smoke test: 20 messages, 5 questions
uv run python -m evaluation.cli --dataset locomo --system evermemos \
    --smoke --smoke-messages 20 --smoke-questions 5

# Test different datasets
uv run python -m evaluation.cli --dataset longmemeval --system evermemos --smoke
uv run python -m evaluation.cli --dataset personamem --system evermemos --smoke

# Test specific stages (e.g., only search and answer)
uv run python -m evaluation.cli --dataset locomo --system evermemos \
    --smoke --stages search answer

# View smoke test results quickly
cat evaluation/results/locomo-evermemos-smoke/report.txt

Full Evaluation:

# Evaluate EvermemOS on LoCoMo benchmark
uv run python -m evaluation.cli --dataset locomo --system evermemos

# Evaluate on other datasets
uv run python -m evaluation.cli --dataset longmemeval --system evermemos
uv run python -m evaluation.cli --dataset personamem --system evermemos

# Use --run-name to distinguish multiple runs (useful for A/B testing)
uv run python -m evaluation.cli --dataset locomo --system evermemos --run-name baseline
uv run python -m evaluation.cli --dataset locomo --system evermemos --run-name experiment1

# Resume from checkpoint if interrupted (automatic)
# Just re-run the same command - it will detect and resume from checkpoint
uv run python -m evaluation.cli --dataset locomo --system evermemos

View Results:

# Results are saved to evaluation/results/{dataset}-{system}[-{run-name}]/
cat evaluation/results/locomo-evermemos/report.txt          # Summary metrics
cat evaluation/results/locomo-evermemos/eval_results.json   # Detailed per-question results
cat evaluation/results/locomo-evermemos/pipeline.log        # Execution logs

The evaluation pipeline consists of 4 stages (add → search → answer → evaluate) with automatic checkpointing and resume support.

⚙️ Evaluation Configuration:

  • Data Preparation: Place datasets in evaluation/data/ (see evaluation/README.md)
  • Environment: Configure .env with LLM API keys (see env.template)
  • Installation: Run uv sync --group evaluation to install dependencies
  • Custom Config: Copy and modify YAML files in evaluation/config/systems/ or evaluation/config/datasets/
  • Advanced Usage: See evaluation/README.md for checkpoint management, stage-specific runs, and system comparisons

🔌 Call API Endpoints

Prerequisites: Start the API Server

Before calling the API, make sure the API server is running:

# Start the API server
uv run python src/bootstrap.py src/run.py --port 8001

💡 Tip: Keep the API server running throughout. All following API calls should be performed in another terminal.


Use V3 API to store single message memory:

Example: Store single message memory
curl -X POST http://localhost:8001/api/v3/agentic/memorize \
  -H "Content-Type: application/json" \
  -d '{
    "message_id": "msg_001",
    "create_time": "2025-02-01T10:00:00+08:00",
    "sender": "user_103",
    "sender_name": "Chen",
    "content": "We need to complete the product design this week",
    "group_id": "group_001",
    "group_name": "Project Discussion Group",
    "scene": "group_chat"
  }'

ℹ️ scene is a required field; it only supports assistant or group_chat and specifies the memory extraction strategy.
ℹ️ By default, all memory types are extracted and stored.

API Features:

  • /api/v3/agentic/memorize: Store single message memory
  • /api/v3/agentic/retrieve_lightweight: Lightweight memory retrieval (Embedding + BM25 + RRF)
  • /api/v3/agentic/retrieve_agentic: Agentic memory retrieval (LLM-guided multi-round intelligent retrieval)

For more API details, please refer to Agentic V3 API Documentation.


🔍 Retrieve Memories

EverMemOS provides two retrieval modes: Lightweight (fast) and Agentic (intelligent).

Lightweight Retrieval

Parameter        Required   Description
query            Yes*       Natural language query (*optional for profile data source)
user_id          No         User ID
data_source      Yes        episode / event_log / semantic_memory / profile
memory_scope     Yes        personal (user_id only) / group (group_id only) / all (both)
retrieval_mode   Yes        embedding / bm25 / rrf (recommended)
group_id         No         Group ID
current_time     No         Filter valid semantic_memory (format: YYYY-MM-DD)
top_k            No         Number of results (default: 5)

Example 1: Personal Memory

Example: Personal Memory Retrieval
curl -X POST http://localhost:8001/api/v3/agentic/retrieve_lightweight \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What sports does the user like?",
    "user_id": "user_001",
    "data_source": "episode",
    "memory_scope": "personal",
    "retrieval_mode": "rrf"
  }'

Example 2: Group Memory

Example: Group Memory Retrieval
curl -X POST http://localhost:8001/api/v3/agentic/retrieve_lightweight \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Discuss project progress",
    "group_id": "project_team_001",
    "data_source": "episode",
    "memory_scope": "group",
    "retrieval_mode": "rrf"
  }'

Agentic Retrieval

LLM-guided multi-round intelligent search with automatic query refinement and result reranking.

Example: Agentic Retrieval
curl -X POST http://localhost:8001/api/v3/agentic/retrieve_agentic \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What foods might the user like?",
    "user_id": "user_001",
    "group_id": "chat_group_001",
    "top_k": 20,
    "llm_config": {
      "model": "gpt-4o-mini",
      "api_key": "your_api_key"
    }
  }'

⚠️ Agentic retrieval requires an LLM API key and takes longer, but provides higher-quality results for queries that draw on multiple memory sources and complex logic.

📖 Full Documentation: Agentic V3 API | Testing Tool: demo/tools/test_retrieval_comprehensive.py


📦 Batch Store Group Chat Memory

EverMemOS supports a standardized group chat data format (GroupChatFormat). You can use scripts for batch storage:

# Use script for batch storage (Chinese data)
uv run python src/bootstrap.py src/run_memorize.py \
  --input data/group_chat_zh.json \
  --api-url http://localhost:8001/api/v3/agentic/memorize \
  --scene group_chat 

# Or use English data
uv run python src/bootstrap.py src/run_memorize.py \
  --input data/group_chat_en.json \
  --api-url http://localhost:8001/api/v3/agentic/memorize \
  --scene group_chat

# Validate file format
uv run python src/bootstrap.py src/run_memorize.py \
  --input data/group_chat_en.json \
  --scene group_chat \
  --validate-only

ℹ️ Scene Parameter Explanation: The scene parameter is required and specifies the memory extraction strategy:

  • Use assistant for one-on-one conversations with AI assistant
  • Use group_chat for multi-person group discussions

Note: In your data files, you may see scene values like work or company - these are internal scene descriptors in the data format. The --scene command-line parameter uses different values (assistant/group_chat) to specify which extraction pipeline to apply.

GroupChatFormat Example:

{
  "version": "1.0.0",
  "conversation_meta": {
    "group_id": "group_001",
    "name": "Project Discussion Group",
    "user_details": {
      "user_101": {
        "full_name": "Alice",
        "role": "Product Manager"
      }
    }
  },
  "conversation_list": [
    {
      "message_id": "msg_001",
      "create_time": "2025-02-01T10:00:00+08:00",
      "sender": "user_101",
      "content": "Good morning everyone"
    }
  ]
}
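
If you need to ingest such a file without the helper script, a rough sketch of the loop it performs is shown below. This assumes the same memorize endpoint and payload fields documented above; it is not the run_memorize.py implementation.

# Illustrative ingestion loop for a GroupChatFormat file.
import json
import requests

API_URL = "http://localhost:8001/api/v3/agentic/memorize"

with open("data/group_chat_en.json", encoding="utf-8") as f:
    payload = json.load(f)

meta = payload["conversation_meta"]
for msg in payload["conversation_list"]:
    body = {
        **msg,                          # message_id, create_time, sender, content
        "group_id": meta["group_id"],
        "group_name": meta["name"],
        "sender_name": meta["user_details"].get(msg["sender"], {}).get("full_name", msg["sender"]),
        "scene": "group_chat",
    }
    requests.post(API_URL, json=body).raise_for_status()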

For complete format specifications, please refer to Group Chat Format Specification.

More Details

For detailed installation, configuration, and usage instructions, please refer to:

📚 Documentation

Developer Docs

API Documentation

Core Framework

Demos & Evaluation

🏗️ Architecture Design

EverMemOS adopts a layered architecture design, mainly including:

  • Agentic Layer: Memory extraction, vectorization, retrieval, and reranking
  • Memory Layer: MemCell extraction, episodic memory management
  • Retrieval Layer: Multi-modal retrieval and result ranking
  • Business Layer: Business logic and data operations
  • Infrastructure Layer: Database, cache, message queue adapters, etc.
  • Core Framework: Dependency injection, middleware, queue management, etc.

For more architectural details, please refer to the Development Guide.

🤝 Contributing

We welcome all forms of contributions! Whether it's reporting bugs, proposing new features, or submitting code improvements.

Before contributing, please read our Contributing Guide to learn about:

  • Development environment setup
  • Code standards and best practices
  • Git commit conventions (Gitmoji)
  • Pull Request process

🌟 Join Us

We are building a vibrant open-source community!

Contact

GitHub Issues · GitHub Discussions · Email · Reddit · X

Contributors

Thanks to all the developers who have contributed to this project!

📖 Citation

If you use EverMemOS in your research, please cite our paper (coming soon):

Coming soon

📄 License

This project is licensed under the Apache License 2.0. This means you are free to use, modify, and distribute this project, with the following key conditions:

  • You must include a copy of the Apache 2.0 license
  • You must state any significant changes made to the code
  • You must retain all copyright, patent, trademark, and attribution notices
  • If a NOTICE file is included, you must include it in your distribution

🙏 Acknowledgments

Thanks to the following projects and communities for their inspiration and support:

  • Memos - A comprehensive, standardized open-source note-taking service that provided valuable inspiration for our memory system design.

  • Nemori - A self-organising long-term memory substrate for agentic LLM workflows that provided valuable inspiration for our memory system design.


If this project helps you, please give us a ⭐️

Made with ❤️ by the EverMemOS Team
