3 unstable releases

Uses Rust 2024 edition

0.2.1 Dec 25, 2025
0.2.0 Dec 24, 2025
0.1.0 Dec 24, 2025

AGPL-3.0-or-later

140KB
3.5K SLoC

memory-indexer

In-memory multilingual full-text indexer with pinyin-first search plus prefix and fuzzy recall, built for chat memory, note-taking, and local knowledge bases.

Highlights

  • Out-of-the-box CJK support

    • Chinese and pinyin fuzzy search

    • Japanese/Korean n-grams with custom dictionaries

    • mixed-script text supported

  • Ranking and routing

    • BM25 with minimum-should-match

    • ASCII queries auto-route exact → pinyin → fuzzy

    • non-ASCII queries use 2/3-gram recall plus Levenshtein fuzzy matching

  • Highlight-friendly offsets: UTF-8/UTF-16 positions supported

  • Index snapshots: compressed binary format for persistence and fast loading

  • Pluggable dictionaries: inject or train Japanese/Hangul dictionaries for better tokenization
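
The non-ASCII fuzzy path listed above (2/3-gram recall followed by Levenshtein filtering) can be illustrated with a small standalone sketch. It does not call memory-indexer; `char_ngrams`, `levenshtein`, and `fuzzy_candidates` are hypothetical helpers, and using "shares at least one 2- or 3-gram" as the candidate filter with a fixed edit budget is an assumption, not the crate's documented behavior.

```rust
use std::collections::HashSet;

/// Split text into overlapping n-grams of Unicode scalar values, so each CJK
/// character counts as one unit.
fn char_ngrams(text: &str, n: usize) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    if n == 0 || chars.len() < n {
        return Vec::new();
    }
    chars.windows(n).map(|w| w.iter().collect::<String>()).collect()
}

/// Classic two-row dynamic-programming Levenshtein distance over chars.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for i in 1..=a.len() {
        curr[0] = i;
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1).min(curr[j - 1] + 1).min(prev[j - 1] + cost);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Hypothetical fuzzy lookup: n-gram recall, then edit-distance verification.
fn fuzzy_candidates<'a>(query: &str, terms: &[&'a str], max_edits: usize) -> Vec<&'a str> {
    // Collect the query's 2-grams and 3-grams.
    let mut query_grams: HashSet<String> = HashSet::new();
    for n in 2..=3 {
        query_grams.extend(char_ngrams(query, n));
    }
    terms
        .iter()
        .copied()
        // Recall: keep terms sharing at least one 2/3-gram with the query.
        .filter(|t| (2..=3).any(|n| char_ngrams(t, n).iter().any(|g| query_grams.contains(g))))
        // Verify: keep terms within the edit budget.
        .filter(|t| levenshtein(query, t) <= max_edits)
        .collect()
}

fn main() {
    let terms = ["你好世界", "你们好", "世界和平"];
    println!("{:?}", char_ngrams("你好世界", 2)); // ["你好", "好世", "世界"]
    // A one-character typo (届 instead of 界) still recalls "你好世界".
    println!("{:?}", fuzzy_candidates("你好世届", &terms, 1)); // ["你好世界"]
}
```

The n-gram step keeps the expensive edit-distance check off most of the vocabulary. In this sketch a single-character query has no 2-grams and therefore recalls nothing, so a real implementation would need a separate path for it.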
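
BM25 with minimum-should-match can be sketched as follows, assuming pre-tokenized documents, the common defaults k1 = 1.2 and b = 0.75, and a Lucene-style IDF. The function `bm25_search` is hypothetical; the crate's actual parameters and tokenization are not documented here.

```rust
/// Hypothetical BM25 scorer over pre-tokenized documents. Returns
/// (document index, score) pairs sorted by descending score, keeping only
/// documents that contain at least `min_should_match` distinct query terms.
fn bm25_search(docs: &[Vec<&str>], query: &[&str], min_should_match: usize) -> Vec<(usize, f64)> {
    const K1: f64 = 1.2; // term-frequency saturation
    const B: f64 = 0.75; // strength of document-length normalization

    // Score each distinct query term once.
    let mut terms: Vec<&str> = query.to_vec();
    terms.sort_unstable();
    terms.dedup();

    let n = docs.len() as f64;
    let total_len: usize = docs.iter().map(|d| d.len()).sum();
    let avgdl = if docs.is_empty() { 0.0 } else { total_len as f64 / n };

    let mut results = Vec::new();
    for (id, doc) in docs.iter().enumerate() {
        let mut score = 0.0;
        let mut matched = 0;
        for &term in &terms {
            let tf = doc.iter().filter(|&&t| t == term).count() as f64;
            if tf == 0.0 {
                continue;
            }
            matched += 1;
            let df = docs.iter().filter(|d| d.iter().any(|&t| t == term)).count() as f64;
            let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln(); // Lucene-style, always > 0
            let norm = K1 * (1.0 - B + B * doc.len() as f64 / avgdl);
            score += idf * tf * (K1 + 1.0) / (tf + norm);
        }
        // Minimum-should-match gate; a document must also match at least one term.
        if matched > 0 && matched >= min_should_match {
            results.push((id, score));
        }
    }
    results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    results
}

fn main() {
    let docs = vec![
        vec!["fuzzy", "search", "handles", "typos"],
        vec!["memory", "indexer"],
        vec!["fuzzy", "memory", "search"],
    ];
    for (id, score) in bm25_search(&docs, &["fuzzy", "search"], 2) {
        println!("doc {id}: {score:.3}");
    }
}
```

Minimum-should-match trades recall for precision: with two query terms and `min_should_match = 2`, only documents containing both terms are scored, and among those the shorter document ranks higher because of length normalization.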
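
One common way a word list improves CJK tokenization is forward maximum matching: at each position, take the longest dictionary entry and fall back to a single character. This is an illustrative technique, not necessarily what memory-indexer does with injected or trained dictionaries; `max_match` and its `max_len` parameter are hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical forward-maximum-matching tokenizer: at each position, take the
/// longest dictionary word of at most `max_len` characters, or a single
/// character when nothing in the dictionary matches.
fn max_match(text: &str, dict: &HashSet<&str>, max_len: usize) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        // Start with the longest window that fits, never less than one char.
        let mut len = max_len.max(1).min(chars.len() - i);
        while len > 1 {
            let candidate: String = chars[i..i + len].iter().collect();
            if dict.contains(candidate.as_str()) {
                break;
            }
            len -= 1;
        }
        tokens.push(chars[i..i + len].iter().collect::<String>());
        i += len;
    }
    tokens
}

fn main() {
    let dict: HashSet<&str> = ["東京", "大学", "東京大学", "行く"].into_iter().collect();
    println!("{:?}", max_match("東京大学に行く", &dict, 4)); // ["東京大学", "に", "行く"]
}
```

Greedy longest-match is simple and fast but can mis-segment ambiguous strings; statistical or lattice-based segmenters resolve those cases at higher cost.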

Quick start

```rust
use memory_indexer::{InMemoryIndex, SearchMode};

fn main() {
    let mut index = InMemoryIndex::default();
    index.add_doc("kb", "doc-cn", "你好世界 memory-indexer", true);
    index.add_doc("kb", "doc-en", "fuzzy search handles typos", true);

    // Auto mode picks exact, pinyin, or fuzzy matching
    let hits = index.search_hits("kb", "nihao");

    // Explicit modes
    let fuzzy = index.search_with_mode("kb", "memry-indexer", SearchMode::Fuzzy);
    let pinyin_prefix = index.search_with_mode_hits("kb", "nhs", SearchMode::Pinyin);

    // Highlight spans (UTF-16 positions by default)
    let spans = index.get_matches("kb", "doc-cn", "nihao");

    // Snapshot persistence
    let snapshot = index.get_snapshot_data("kb").unwrap();
    // index.load_snapshot("kb", snapshot);
}
```
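
The auto mode routes an ASCII query through exact, then pinyin, then fuzzy search. A minimal sketch of such a fallback chain, assuming each stage is a callback returning document ids and that the first non-empty stage wins; `Route` and `route_query` are hypothetical names, and skipping the pinyin stage for non-ASCII queries is an assumption.

```rust
/// Search stages in the order an auto-routed query tries them.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Route {
    Exact,
    Pinyin,
    Fuzzy,
}

static ASCII_STAGES: [Route; 3] = [Route::Exact, Route::Pinyin, Route::Fuzzy];
// Assumption: non-ASCII queries skip the pinyin stage in this sketch.
static OTHER_STAGES: [Route; 2] = [Route::Exact, Route::Fuzzy];

/// Hypothetical fallback chain: run each stage via `search` and return the
/// first non-empty hit list together with the stage that produced it.
fn route_query<F>(query: &str, mut search: F) -> Option<(Route, Vec<String>)>
where
    F: FnMut(Route, &str) -> Vec<String>,
{
    let stages: &[Route] = if query.is_ascii() { &ASCII_STAGES[..] } else { &OTHER_STAGES[..] };
    for &stage in stages {
        let hits = search(stage, query);
        if !hits.is_empty() {
            return Some((stage, hits));
        }
    }
    None
}

fn main() {
    // Toy backend standing in for a real index.
    let backend = |route: Route, q: &str| -> Vec<String> {
        match (route, q) {
            (Route::Exact, "fuzzy") => vec!["doc-en".to_string()],
            (Route::Pinyin, "nihao") => vec!["doc-cn".to_string()],
            _ => Vec::new(),
        }
    };
    println!("{:?}", route_query("nihao", backend)); // Some((Pinyin, ["doc-cn"]))
}
```

Stopping at the first non-empty stage keeps cheap exact lookups on the fast path and only pays for fuzzy matching when nothing else hits.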
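
The `nhs` query in the quick start is a pinyin-initials prefix of 你好世 (ni hao shi). A sketch of how full-pinyin and initial-letter keys might be built and prefix-matched, assuming a caller-supplied character-to-syllable table; `pinyin_keys` and `pinyin_prefix_match` are hypothetical, and a real table would need to cover far more characters and handle polyphonic ones.

```rust
use std::collections::HashMap;

/// Build (full pinyin, first letters) keys for `text` from a caller-supplied
/// char -> syllable table. Returns None if any character is missing.
fn pinyin_keys(text: &str, table: &HashMap<char, &str>) -> Option<(String, String)> {
    let mut full = String::new();
    let mut initials = String::new();
    for c in text.chars() {
        let syllable = table.get(&c)?;
        full.push_str(syllable);
        initials.push(syllable.chars().next()?);
    }
    Some((full, initials))
}

/// An ASCII query matches if it is a prefix of either key.
fn pinyin_prefix_match(query: &str, text: &str, table: &HashMap<char, &str>) -> bool {
    match pinyin_keys(text, table) {
        Some((full, initials)) => full.starts_with(query) || initials.starts_with(query),
        None => false,
    }
}

fn main() {
    // Tiny hypothetical table; real data would cover thousands of characters.
    let table: HashMap<char, &str> =
        [('你', "ni"), ('好', "hao"), ('世', "shi"), ('界', "jie")].into_iter().collect();
    println!("{:?}", pinyin_keys("你好世界", &table)); // Some(("nihaoshijie", "nhsj"))
    println!("{}", pinyin_prefix_match("nhs", "你好世界", &table)); // true
}
```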
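
Highlight spans default to UTF-16 positions, which is how JavaScript, Java, and .NET index strings, while Rust's `str` methods return UTF-8 byte offsets. Converting between them means summing the UTF-16 length of every character before the offset; a minimal sketch with a hypothetical helper `utf8_to_utf16_offset` (not part of memory-indexer):

```rust
/// Convert a UTF-8 byte offset into a UTF-16 code-unit offset.
/// Panics if `byte_offset` is not on a char boundary or is past the end.
fn utf8_to_utf16_offset(text: &str, byte_offset: usize) -> usize {
    text[..byte_offset].chars().map(char::len_utf16).sum()
}

fn main() {
    let text = "你好世界 memory-indexer";
    let start = text.find("memory").unwrap();
    let end = start + "memory".len();
    println!(
        "UTF-8 {}..{} -> UTF-16 {}..{}",
        start,
        end,
        utf8_to_utf16_offset(text, start),
        utf8_to_utf16_offset(text, end)
    ); // UTF-8 13..19 -> UTF-16 5..11
}
```

Each of the four CJK characters is three bytes in UTF-8 but one code unit in UTF-16, so `memory` starts at byte 13 but at UTF-16 offset 5. Characters outside the Basic Multilingual Plane, such as most emoji, take two UTF-16 code units.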

Development

  • Tests: cargo test
  • Benchmarks: cargo bench

License

AGPL-3.0-or-later

Dependencies

~15MB
~199K SLoC