EverMemOS

Website

Let every interaction be driven by understanding. · Enterprise-Grade Intelligent Memory System


English | 简体中文


💬 More than memory — it's foresight.

EverMemOS is a forward-thinking intelligent system.
While traditional AI memory serves merely as a "look-back" database, EverMemOS enables AI not only to "remember" what happened, but also to "understand" the meaning behind those memories and use them to guide current actions and decisions. In the EverMemOS demo tools, you can see how EverMemOS extracts important information from your conversation history and then recalls your preferences, habits, and past events during later conversations, just like a friend who truly knows you. On the LoCoMo benchmark, our approach built on EverMemOS achieved a reasoning accuracy of 92.3% (evaluated by LLM-Judge), outperforming comparable methods in our evaluation.


📢 Latest Updates

[2025-11-02] 🎉 🎉 🎉 EverMemOS v1.0.0 Released!

  • Stable Version: AI Memory System officially open sourced
  • 📚 Complete Documentation: Quick start guide and comprehensive API documentation
  • 📈 Benchmark Testing: LoCoMo dataset benchmark evaluation pipeline
  • 🖥️ Demo Tools: Get started quickly with easy-to-use demos

🎯 Core Vision

Build AI memory that never forgets, making every conversation built on previous understanding.


💡 Unique Advantages

🔗 Coherent Narrative

Beyond "fragments," connecting "stories": Automatically linking conversation pieces to build clear thematic context, enabling AI to "truly understand."

When facing multi-threaded conversations, it naturally distinguishes between "Project A progress discussion" and "Team B strategy planning," maintaining coherent contextual logic within each theme.

From scattered phrases to complete narratives, AI no longer just "understands one sentence" but "understands the whole story."

🧠 Evidence-Based Perception

Beyond "retrieval," intelligent "perception": Proactively capturing deep connections between memories and tasks, enabling AI to "think thoroughly" at critical moments.

Imagine: When a user asks for "food recommendations," the AI proactively recalls "you had dental surgery two days ago" as a key piece of information, automatically adjusting suggestions to avoid unsuitable options.

This is Contextual Awareness — enabling AI thinking to be truly built on understanding rather than isolated responses.

💾 Living Profiles

Beyond "records," dynamic "growth": Real-time user profile updates that get to know you better with each conversation, enabling AI to "recognize you authentically."

Every interaction subtly updates the AI's understanding of you — preferences, style, and focus points all continuously evolve.

As interactions deepen, it doesn't just "remember what you said," but is "learning who you are."



📖 Project Introduction

EverMemOS is an open-source project designed to provide long-term memory capabilities to conversational AI agents. It extracts, structures, and retrieves information from conversations, enabling agents to maintain context, recall past interactions, and progressively build user profiles. This results in more personalized, coherent, and intelligent conversations.

📄 Paper Coming Soon - Our technical paper is in preparation. Stay tuned!

🎯 System Framework

EverMemOS operates along two main tracks: memory construction and memory perception. Together they form a cognitive loop that continuously absorbs, consolidates, and applies past information, so every response is grounded in real context and long-term memory.

Overview

🧩 Memory Construction

Memory construction layer: builds structured, retrievable long-term memory from raw conversation data.

  • Core elements

    • ⚛️ Atomic memory unit MemCell: the core structured unit distilled from conversations for downstream organization and reference (a sketch of such a unit follows the workflow below)
    • 🗂️ Multi-level memory: integrate related fragments by theme and storyline to form reusable, hierarchical memories
    • 🏷️ Multiple memory types: covering episodes, profiles, preferences, relationships, semantic knowledge, basic facts, and core memories
  • Workflow

    1. MemCell extraction: identify key information in conversations to generate atomic memories
    2. Memory construction: integrate by theme and participants to form episodes and profiles
    3. Storage and indexing: persist data and build keyword and semantic indexes to support fast recall
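
To make the atomic unit concrete, here is a minimal, illustrative sketch of what a MemCell might carry. The field names are assumptions for illustration, not the actual EverMemOS schema; see the API examples below for the real request format.

# Illustrative only: field names are assumptions, not the EverMemOS schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MemCell:
    """Hypothetical atomic memory unit distilled from a conversation."""
    content: str                      # the distilled fact or statement
    memory_type: str                  # e.g. "episode", "profile", "preference"
    user_id: str | None = None        # owner, for personal memories
    group_id: str | None = None       # conversation group it came from
    source_message_ids: list[str] = field(default_factory=list)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

cell = MemCell(
    content="Chen wants the product design finished this week",
    memory_type="episode",
    user_id="user_103",
    group_id="group_001",
    source_message_ids=["msg_001"],
)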

🔎 Memory Perception

Memory perception layer: quickly recalls relevant memories through multi-round reasoning and intelligent fusion, achieving precise contextual awareness.

🎯 Intelligent Retrieval Tools

  • 🧪 Hybrid Retrieval (RRF Fusion)
    Parallel execution of semantic and keyword retrieval, seamlessly fused with the Reciprocal Rank Fusion (RRF) algorithm (a minimal sketch follows this list)

  • 📊 Intelligent Reranking (Reranker)
    Reorders candidate memories by deep relevance, prioritizing the most critical information
    Batch concurrent processing with exponential backoff retries keeps it stable under high throughput
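
The RRF fusion step mentioned above is simple to sketch. Assuming each retriever returns a ranked list of memory IDs, the standard Reciprocal Rank Fusion formula looks like this (an illustrative sketch, not the EverMemOS implementation):

# Standard RRF over ranked ID lists; a sketch, not the EverMemOS code.
def rrf_fuse(result_lists, k=60, top_k=5):
    """Fuse several ranked lists; rank is 1-based, k dampens tail results."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

semantic_hits = ["mem_7", "mem_2", "mem_9"]   # embedding retrieval, best first
bm25_hits = ["mem_2", "mem_4", "mem_7"]       # keyword retrieval, best first
print(rrf_fuse([semantic_hits, bm25_hits]))   # items ranked well by both rise to the top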

🤖 Agentic Intelligent Retrieval

  • 🎓 LLM-Guided Multi-Round Recall
    When initial results are insufficient, generates 2-3 complementary queries, then retrieves and fuses them in parallel
    Automatically identifies missing information, proactively filling retrieval blind spots

  • 🔀 Multi-Query Parallel Strategy
    When a single query cannot fully express intent, generate multiple complementary perspective queries
    Enhance coverage of complex intents through multi-path RRF fusion

  • ⚡ Lightweight Fast Mode
    For latency-sensitive scenarios, skip LLM calls and use RRF-fused hybrid retrieval
    Flexibly balance between speed and quality

🧠 Reasoning Fusion

  • Context Integration: Concatenate recalled multi-level memories (episodes, profiles, preferences) with the current conversation (a minimal sketch follows this list)
  • Traceable Reasoning: Model generates responses based on explicit memory evidence, avoiding hallucination
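
As a rough illustration of the context-integration step above, the sketch below assembles retrieved memories and the current conversation into a single grounded prompt. The prompt wording and memory fields are assumptions for illustration, not the templates in src/memory_layer/prompts/.

# Illustrative prompt assembly; field names and wording are assumptions.
def build_prompt(memories, conversation):
    """Ground the model's answer in explicit memory evidence."""
    evidence = "\n".join(f"- [{m['type']}] {m['content']}" for m in memories)
    history = "\n".join(f"{turn['speaker']}: {turn['text']}" for turn in conversation)
    return (
        "Answer using only the memory evidence below; say so if it is insufficient.\n\n"
        f"Memory evidence:\n{evidence}\n\n"
        f"Current conversation:\n{history}"
    )

prompt = build_prompt(
    memories=[{"type": "episode", "content": "User had dental surgery two days ago"}],
    conversation=[{"speaker": "user", "text": "Any food recommendations for dinner?"}],
)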

💡 Through the cognitive loop of "Structured Memory → Multi-Strategy Recall → Intelligent Retrieval → Contextual Reasoning", the AI always "thinks with memory", achieving true contextual awareness.

📁 Project Structure

Directory Structure
memsys-opensource/
├── src/                              # Source code directory
│   ├── agentic_layer/                # Agentic layer - unified memory interface
│   ├── memory_layer/                 # Memory layer - memory extraction
│   │   ├── memcell_extractor/        # MemCell extractor
│   │   ├── memory_extractor/         # Memory extractor
│   │   └── prompts/                  # LLM prompt templates
│   ├── retrieval_layer/              # Retrieval layer - memory retrieval
│   ├── biz_layer/                    # Business layer - business logic
│   ├── infra_layer/                  # Infrastructure layer
│   ├── core/                         # Core functionality (DI/lifecycle/middleware)
│   ├── component/                    # Components (LLM adapters, etc.)
│   └── common_utils/                 # Common utilities
├── demo/                             # Demo code
├── data/                             # Sample conversation data
├── evaluation/                       # Evaluation scripts
│   └── src/                          # Evaluation framework source code
├── data_format/                      # Data format definitions
├── docs/                             # Documentation
├── config.json                       # Configuration file
├── env.template                      # Environment variable template
├── pyproject.toml                    # Project configuration
└── README.md                         # Project description

🚀 Quick Start

Prerequisites

  • Python 3.10+
  • uv (recommended package manager)
  • Docker 20.10+ and Docker Compose 2.0+
  • At least 4GB of available RAM (for Elasticsearch and Milvus)

Installation

Using Docker for Dependency Services ⭐

Use Docker Compose to start all dependency services (MongoDB, Elasticsearch, Milvus, Redis) with one command:

# 1. Clone the repository
git clone https://github.com/EverMind-AI/EverMemOS.git
cd EverMemOS

# 2. Start Docker services
docker-compose up -d

# 3. Verify service status
docker-compose ps

# 4. Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# 5. Install project dependencies
uv sync

# 6. Configure environment variables
cp env.template .env
# Edit the .env file and fill in the necessary configurations:
#   - LLM_API_KEY: Enter your LLM API Key (for memory extraction)
#   - DEEPINFRA_API_KEY: Enter your DeepInfra API Key (for Embedding and Rerank)

Docker Services:

Service         Host Port   Container Port   Purpose
MongoDB         27017       27017            Primary database for storing memory cells and profiles
Elasticsearch   19200       9200             Keyword search engine (BM25)
Milvus          19530       19530            Vector database for semantic retrieval
Redis           6379        6379             Cache service

💡 Connection Tips:

  • Use host ports when connecting (e.g., localhost:19200 for Elasticsearch); a quick Python connectivity check is sketched after these tips
  • MongoDB credentials: admin / memsys123 (local development only)
  • Stop services: docker-compose down | View logs: docker-compose logs -f
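
If you want to confirm the stack is reachable before moving on, a check along these lines can help. It is a local-development sketch that assumes the default ports and credentials above, and that requests, pymongo, redis, and pymilvus are installed in your environment:

# Local-development connectivity check; adjust hosts/ports if you changed the compose file.
import requests
import redis
from pymongo import MongoClient
from pymilvus import connections

MongoClient("mongodb://admin:memsys123@localhost:27017").admin.command("ping")  # MongoDB
print(requests.get("http://localhost:19200").json()["version"]["number"])       # Elasticsearch
redis.Redis(host="localhost", port=6379).ping()                                 # Redis
connections.connect(alias="default", host="localhost", port="19530")            # Milvus
print("All dependency services are reachable.")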

📖 MongoDB detailed installation guide: MongoDB Installation Guide


How to Use

EverMemOS offers multiple usage methods. Choose the one that best suits your needs:


🎯 Run Demo: Memory Extraction and Interactive Chat

The demo showcases the end-to-end functionality of EverMemOS.


🚀 Quick Start: Simple Demo (Recommended)

The fastest way to experience EverMemOS! Just 2 steps to see memory storage and retrieval in action:

# Step 1: Start the API server (in terminal 1)
uv run python src/bootstrap.py src/run.py --port 8001

# Step 2: Run the simple demo (in terminal 2)
uv run python src/bootstrap.py demo/simple_demo.py

What it does:

  • Stores 4 conversation messages about sports hobbies
  • Waits 10 seconds for indexing
  • Searches for relevant memories with 3 different queries
  • Shows complete workflow with friendly explanations

Perfect for: First-time users, quick testing, understanding core concepts

See the demo code at demo/simple_demo.py
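
If you prefer to drive the same flow from your own code instead of the demo script, a minimal sketch using only the documented V3 endpoints might look like this. It assumes the API server from Step 1 is running on port 8001 and that requests is installed; the message content is made up for illustration.

import time
import requests

BASE = "http://localhost:8001/api/v3/agentic"

# Store one message (same payload shape as the memorize example further below)
requests.post(f"{BASE}/memorize", json={
    "message_id": "demo_msg_001",
    "create_time": "2025-02-01T10:00:00+08:00",
    "sender": "user_001",
    "sender_name": "Alex",
    "content": "I go swimming every Saturday morning",
    "group_id": "demo_group_001",
    "group_name": "Demo Chat",
    "scene": "assistant",
}).raise_for_status()

time.sleep(10)  # give indexing a moment, as the simple demo does

# Retrieve it back with lightweight RRF retrieval
resp = requests.post(f"{BASE}/retrieve_lightweight", json={
    "query": "What sports does the user like?",
    "user_id": "user_001",
    "data_source": "episode",
    "memory_scope": "personal",
    "retrieval_mode": "rrf",
})
print(resp.json())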


We also provide a full-featured experience:

Prerequisites: Start the API Server

# Terminal 1: Start the API server (required)
uv run python src/bootstrap.py src/run.py --port 8001

💡 Tip: Keep the API server running throughout. All following operations should be performed in another terminal.


Step 1: Extract Memories

Run the memory extraction script to process sample conversation data and build the memory database:

# Terminal 2: Run the extraction script
uv run python src/bootstrap.py demo/extract_memory.py

This script performs the following actions:

  • Calls demo.tools.clear_all_data.clear_all_memories() so the demo starts from an empty MongoDB/Elasticsearch/Milvus/Redis state. Ensure the dependency stack launched by docker-compose is running before executing the script, otherwise the wipe step will fail.
  • Loads data/assistant_chat_zh.json, appends scene="assistant" to each message, and streams every entry to http://localhost:8001/api/v3/agentic/memorize. Update the base_url, data_file, or profile_scene constants in demo/extract_memory.py if you host the API on another endpoint or want to ingest a different scenario.
  • Writes through the HTTP API only: MemCells, episodes, and profiles are created inside your databases, not under demo/memcell_outputs/. Inspect MongoDB (and Milvus/Elasticsearch) to verify ingestion or proceed directly to the chat demo.

💡 Tip: For detailed configuration instructions and usage guide, please refer to the Demo Documentation.

Step 2: Chat with Memory

After extracting memories, start the interactive chat demo:

# Terminal 2: Run the chat program (ensure API server is still running)
uv run python src/bootstrap.py demo/chat_with_memory.py

This program loads .env via python-dotenv, verifies that at least one LLM key (LLM_API_KEY, OPENROUTER_API_KEY, or OPENAI_API_KEY) is available, and connects to MongoDB through demo.utils.ensure_mongo_beanie_ready to enumerate groups that already contain MemCells. Each user query invokes api/v3/agentic/retrieve_lightweight unless you explicitly select the Agentic mode, in which case the orchestrator switches to api/v3/agentic/retrieve_agentic and warns about the additional LLM latency.

Interactive Workflow:

  1. Select Language: Choose a zh or en terminal UI.
  2. Select Scenario Mode: Assistant (one-on-one) or Group Chat (multi-speaker analysis).
  3. Select Conversation Group: Groups are read live from MongoDB via query_all_groups_from_mongodb; run the extraction step first so the list is non-empty.
  4. Select Retrieval Mode: rrf, embedding, bm25, or LLM-guided Agentic retrieval.
  5. Start Chatting: Pose questions, inspect the retrieved memories that are displayed before each response, and use help, clear, reload, or exit to manage the session.

📊 Run Evaluation: Performance Testing

The evaluation framework provides a unified, modular way to benchmark memory systems on standard datasets (LoCoMo, LongMemEval, PersonaMem).

Quick Test (Smoke Test):

# Test with limited data to verify everything works
# Default: first conversation, first 10 messages, first 3 questions
uv run python -m evaluation.cli --dataset locomo --system evermemos --smoke

# Custom smoke test: 20 messages, 5 questions
uv run python -m evaluation.cli --dataset locomo --system evermemos \
    --smoke --smoke-messages 20 --smoke-questions 5

# Test different datasets
uv run python -m evaluation.cli --dataset longmemeval --system evermemos --smoke
uv run python -m evaluation.cli --dataset personamem --system evermemos --smoke

# Test specific stages (e.g., only search and answer)
uv run python -m evaluation.cli --dataset locomo --system evermemos \
    --smoke --stages search answer

# View smoke test results quickly
cat evaluation/results/locomo-evermemos-smoke/report.txt

Full Evaluation:

# Evaluate EvermemOS on LoCoMo benchmark
uv run python -m evaluation.cli --dataset locomo --system evermemos

# Evaluate on other datasets
uv run python -m evaluation.cli --dataset longmemeval --system evermemos
uv run python -m evaluation.cli --dataset personamem --system evermemos

# Use --run-name to distinguish multiple runs (useful for A/B testing)
uv run python -m evaluation.cli --dataset locomo --system evermemos --run-name baseline
uv run python -m evaluation.cli --dataset locomo --system evermemos --run-name experiment1

# Resume from checkpoint if interrupted (automatic)
# Just re-run the same command - it will detect and resume from checkpoint
uv run python -m evaluation.cli --dataset locomo --system evermemos

View Results:

# Results are saved to evaluation/results/{dataset}-{system}[-{run-name}]/
cat evaluation/results/locomo-evermemos/report.txt          # Summary metrics
cat evaluation/results/locomo-evermemos/eval_results.json   # Detailed per-question results
cat evaluation/results/locomo-evermemos/pipeline.log        # Execution logs

The evaluation pipeline consists of 4 stages (add → search → answer → evaluate) with automatic checkpointing and resume support.

⚙️ Evaluation Configuration:

  • Data Preparation: Place datasets in evaluation/data/ (see evaluation/README.md)
  • Environment: Configure .env with LLM API keys (see env.template)
  • Installation: Run uv sync --group evaluation to install dependencies
  • Custom Config: Copy and modify YAML files in evaluation/config/systems/ or evaluation/config/datasets/
  • Advanced Usage: See evaluation/README.md for checkpoint management, stage-specific runs, and system comparisons

🔌 Call API Endpoints

Prerequisites: Start the API Server

Before calling the API, make sure the API server is running:

# Start the API server
uv run python src/bootstrap.py src/run.py --port 8001

💡 Tip: Keep the API server running throughout. All following API calls should be performed in another terminal.


Use V3 API to store single message memory:

Example: Store single message memory
curl -X POST http://localhost:8001/api/v3/agentic/memorize \
  -H "Content-Type: application/json" \
  -d '{
    "message_id": "msg_001",
    "create_time": "2025-02-01T10:00:00+08:00",
    "sender": "user_103",
    "sender_name": "Chen",
    "content": "We need to complete the product design this week",
    "group_id": "group_001",
    "group_name": "Project Discussion Group",
    "scene": "group_chat"
  }'

ℹ️ scene is a required field; it only supports assistant or group_chat and specifies the memory extraction strategy.
ℹ️ By default, all memory types are extracted and stored.

API Features:

  • /api/v3/agentic/memorize: Store single message memory
  • /api/v3/agentic/retrieve_lightweight: Lightweight memory retrieval (Embedding + BM25 + RRF)
  • /api/v3/agentic/retrieve_agentic: Agentic memory retrieval (LLM-guided multi-round intelligent retrieval)

For more API details, please refer to Agentic V3 API Documentation.


🔍 Retrieve Memories

EverMemOS provides two retrieval modes: Lightweight (fast) and Agentic (intelligent).

Lightweight Retrieval

Parameter        Required   Description
query            Yes*       Natural language query (*optional for profile data source)
user_id          No         User ID
data_source      Yes        episode / event_log / semantic_memory / profile
memory_scope     Yes        personal (user_id only) / group (group_id only) / all (both)
retrieval_mode   Yes        embedding / bm25 / rrf (recommended)
group_id         No         Group ID
current_time     No         Filter valid semantic_memory (format: YYYY-MM-DD)
top_k            No         Number of results (default: 5)

Example 1: Personal Memory

Example: Personal Memory Retrieval
curl -X POST http://localhost:8001/api/v3/agentic/retrieve_lightweight \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What sports does the user like?",
    "user_id": "user_001",
    "data_source": "episode",
    "memory_scope": "personal",
    "retrieval_mode": "rrf"
  }'

Example 2: Group Memory

Example: Group Memory Retrieval
curl -X POST http://localhost:8001/api/v3/agentic/retrieve_lightweight \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Discuss project progress",
    "group_id": "project_team_001",
    "data_source": "episode",
    "memory_scope": "group",
    "retrieval_mode": "rrf"
  }'

Agentic Retrieval

LLM-guided multi-round intelligent search with automatic query refinement and result reranking.

Example: Agentic Retrieval
curl -X POST http://localhost:8001/api/v3/agentic/retrieve_agentic \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What foods might the user like?",
    "user_id": "user_001",
    "group_id": "chat_group_001",
    "top_k": 20,
    "llm_config": {
      "model": "gpt-4o-mini",
      "api_key": "your_api_key"
    }
  }'

⚠️ Agentic retrieval requires an LLM API key and takes longer, but provides higher-quality results for queries that draw on multiple memory sources and complex logic.

📖 Full Documentation: Agentic V3 API | Testing Tool: demo/tools/test_retrieval_comprehensive.py


📦 Batch Store Group Chat Memory

EverMemOS supports a standardized group chat data format (GroupChatFormat). You can use scripts for batch storage:

# Use script for batch storage (Chinese data)
uv run python src/bootstrap.py src/run_memorize.py \
  --input data/group_chat_zh.json \
  --api-url http://localhost:8001/api/v3/agentic/memorize \
  --scene group_chat 

# Or use English data
uv run python src/bootstrap.py src/run_memorize.py \
  --input data/group_chat_en.json \
  --api-url http://localhost:8001/api/v3/agentic/memorize \
  --scene group_chat

# Validate file format
uv run python src/bootstrap.py src/run_memorize.py \
  --input data/group_chat_en.json \
  --scene group_chat \
  --validate-only

ℹ️ Scene Parameter Explanation: The scene parameter is required and specifies the memory extraction strategy:

  • Use assistant for one-on-one conversations with AI assistant
  • Use group_chat for multi-person group discussions

Note: In your data files, you may see scene values like work or company - these are internal scene descriptors in the data format. The --scene command-line parameter uses different values (assistant/group_chat) to specify which extraction pipeline to apply.

GroupChatFormat Example:

{
  "version": "1.0.0",
  "conversation_meta": {
    "group_id": "group_001",
    "name": "Project Discussion Group",
    "user_details": {
      "user_101": {
        "full_name": "Alice",
        "role": "Product Manager"
      }
    }
  },
  "conversation_list": [
    {
      "message_id": "msg_001",
      "create_time": "2025-02-01T10:00:00+08:00",
      "sender": "user_101",
      "content": "Good morning everyone"
    }
  ]
}
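
If you need to ingest such a file without the helper script, a rough sketch of the loop it performs is shown below. This assumes the same memorize endpoint and payload fields documented above; it is not the run_memorize.py implementation.

# Illustrative ingestion loop for a GroupChatFormat file.
import json
import requests

API_URL = "http://localhost:8001/api/v3/agentic/memorize"

with open("data/group_chat_en.json", encoding="utf-8") as f:
    payload = json.load(f)

meta = payload["conversation_meta"]
for msg in payload["conversation_list"]:
    body = {
        **msg,                          # message_id, create_time, sender, content
        "group_id": meta["group_id"],
        "group_name": meta["name"],
        "sender_name": meta["user_details"].get(msg["sender"], {}).get("full_name", msg["sender"]),
        "scene": "group_chat",
    }
    requests.post(API_URL, json=body).raise_for_status()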

For complete format specifications, please refer to Group Chat Format Specification.

More Details

For detailed installation, configuration, and usage instructions, please refer to:

📚 Documentation

Developer Docs

API Documentation

Core Framework

Demos & Evaluation

🏗️ Architecture Design

EverMemOS adopts a layered architecture design, mainly including:

  • Agentic Layer: Memory extraction, vectorization, retrieval, and reranking
  • Memory Layer: MemCell extraction, episodic memory management
  • Retrieval Layer: Multi-modal retrieval and result ranking
  • Business Layer: Business logic and data operations
  • Infrastructure Layer: Database, cache, message queue adapters, etc.
  • Core Framework: Dependency injection, middleware, queue management, etc.

For more architectural details, please refer to the Development Guide.

🤝 Contributing

We welcome all forms of contributions! Whether it's reporting bugs, proposing new features, or submitting code improvements.

Before contributing, please read our Contributing Guide to learn about:

  • Development environment setup
  • Code standards and best practices
  • Git commit conventions (Gitmoji)
  • Pull Request process

🌟 Join Us

We are building a vibrant open-source community!

Contact

GitHub Issues · GitHub Discussions · Email · Reddit · X

Contributors

Thanks to all the developers who have contributed to this project!

📖 Citation

If you use EverMemOS in your research, please cite our paper (coming soon):

Coming soon

📄 License

This project is licensed under the Apache License 2.0. This means you are free to use, modify, and distribute this project, with the following key conditions:

  • You must include a copy of the Apache 2.0 license
  • You must state any significant changes made to the code
  • You must retain all copyright, patent, trademark, and attribution notices
  • If a NOTICE file is included, you must include it in your distribution

🙏 Acknowledgments

Thanks to the following projects and communities for their inspiration and support:

  • Memos - A comprehensive, standardized open-source note-taking service that provided valuable inspiration for our memory system design.

  • Nemori - A self-organising long-term memory substrate for agentic LLM workflows that provided valuable inspiration for our memory system design.


If this project helps you, please give us a ⭐️

Made with ❤️ by the EverMemOS Team
