3 unstable releases
Uses the Rust 2024 edition
| Version | Released |
|---|---|
| 0.2.1 (latest) | Dec 25, 2025 |
| 0.2.0 | Dec 24, 2025 |
| 0.1.0 | Dec 24, 2025 |
#502 in Text processing
140KB, 3.5K SLoC
memory-indexer
In-memory multilingual full-text indexer with pinyin-first search plus prefix and fuzzy recall, built for chat memory, note-taking, or local knowledge bases.
Highlights
- Out-of-the-box CJK support
  - Chinese and pinyin fuzzy search
  - Japanese/Korean n-grams with custom dictionaries
  - Mixed-script text supported
- Ranking and routing
  - BM25 with minimum-should-match
  - ASCII queries auto-route exact → pinyin → fuzzy
  - Non-ASCII queries use 2/3-gram + Levenshtein fuzzy matching
- Highlight-friendly offsets: UTF-8/UTF-16 positions supported
- Index snapshots: compressed binary format for persistence and fast loading
- Pluggable dictionaries: inject or train Japanese/Hangul dictionaries for better tokenization
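
To make the "2/3-gram + Levenshtein" idea concrete, here is a minimal standalone sketch of the two building blocks: character n-gram extraction and edit distance. It is not taken from the crate's source; the function names `char_ngrams` and `levenshtein` are hypothetical, and the crate's real tokenizer and scoring may work differently.

```rust
/// Split text into overlapping character n-grams.
/// Works on Unicode scalar values, so CJK text is never split mid-character.
/// (Hypothetical helper for illustration, not part of memory-indexer's API.)
fn char_ngrams(text: &str, n: usize) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    if n == 0 || chars.len() < n {
        return Vec::new();
    }
    chars
        .windows(n)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Levenshtein edit distance using two rolling rows.
/// (Hypothetical helper for illustration, not part of memory-indexer's API.)
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i chars of `a` and first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for (i, ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost) // substitution (or match)
                .min(prev[j + 1] + 1)      // deletion
                .min(curr[j] + 1);         // insertion
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

fn main() {
    // Bigrams of a four-character Chinese string: 你好, 好世, 世界
    println!("{:?}", char_ngrams("你好世界", 2));
    // One missing letter, so the distance is 1
    println!("{}", levenshtein("memry", "memory"));
}
```

A fuzzy matcher built on these pieces would typically use shared n-grams to shortlist candidate terms cheaply, then keep only candidates whose edit distance falls under a threshold.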
Quick start
use memory_indexer::{InMemoryIndex, SearchMode};
let mut index = InMemoryIndex::default();
index.add_doc("kb", "doc-cn", "你好世界 memory-indexer", true);
index.add_doc("kb", "doc-en", "fuzzy search handles typos", true);
// Auto chooses between exact / pinyin / fuzzy
let hits = index.search_hits("kb", "nihao");
// Explicit modes
let fuzzy = index.search_with_mode("kb", "memry-indexer", SearchMode::Fuzzy);
let pinyin_prefix = index.search_with_mode_hits("kb", "nhs", SearchMode::Pinyin);
// Highlight spans (UTF-16 positions by default)
let spans = index.get_matches("kb", "doc-cn", "nihao");
// Snapshot persistence
let snapshot = index.get_snapshot_data("kb").unwrap();
// index.load_snapshot("kb", snapshot);
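
Because highlight spans may be reported in UTF-16 positions while Rust strings are indexed by UTF-8 bytes, it can help to see how the two offset systems relate. The sketch below uses only the standard library; `utf8_to_utf16_offset` is a hypothetical helper, not a function exported by memory-indexer, and the exact shape of the spans returned by `get_matches` is not assumed here.

```rust
/// Convert a UTF-8 byte offset into a UTF-16 code-unit offset.
/// Returns `None` if `byte_offset` is out of range or falls inside a
/// multi-byte character.
/// (Hypothetical helper for illustration, not part of memory-indexer's API.)
fn utf8_to_utf16_offset(text: &str, byte_offset: usize) -> Option<usize> {
    if !text.is_char_boundary(byte_offset) {
        return None;
    }
    // Sum the UTF-16 length (1 or 2 code units) of every char before the offset.
    Some(text[..byte_offset].chars().map(char::len_utf16).sum())
}

fn main() {
    let text = "你好世界 memory-indexer";
    // Each of the four CJK characters takes 3 bytes in UTF-8 but 1 unit in
    // UTF-16, so byte 13 (the start of "memory") maps to UTF-16 offset 5.
    println!("{:?}", utf8_to_utf16_offset(text, 13));
    // Byte 1 is in the middle of '你', so there is no valid mapping.
    println!("{:?}", utf8_to_utf16_offset(text, 1));
}
```

UTF-16 positions are convenient when highlights are rendered in a JavaScript or other UTF-16-based front end, while UTF-8 byte offsets are what Rust string slicing expects.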
Development
- Tests: `cargo test`
- Benchmarks: `cargo bench`
License
AGPL-3.0-or-later
Dependencies
~15MB, ~199K SLoC