How it works:
- Storage is a single SQLite database file plus a local LanceDB vector index. No server, cloud service, or API key is needed.
- Retrieval is hybrid: BM25 (rank-bm25), vector search (sentence-transformers), and an entity co-occurrence graph each produce a ranking, and the rankings are merged with reciprocal rank fusion. The idea is to find the right memory, not just the closest one.
- It plugs into the agent's lifecycle via MCP: before the agent responds, relevant memories are added to its input; after each turn, decisions and new learnings are recorded automatically. You never have to tell it to "remember this".
- It maintains a per-project dictionary that builds itself from your memories, which improves recall for project-specific vocabulary.
- It can run fully offline. Embedding is done locally by default, and if you point it at a locally installed Ollama model, even the optional large language model features (consolidation, de-duplication, and chatting about your memories) stay on your machine.
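
To make the fusion step in the retrieval bullet concrete, here is a minimal sketch of reciprocal rank fusion in plain Python. It is not the project's actual code: the function name, the memory IDs, and the constant `k=60` (a commonly used default) are assumptions for illustration, and the three input lists stand in for whatever the BM25, vector, and graph retrievers would return.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of memory IDs into one ranking.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda m: (-scores[m], m))


# Hypothetical ranked outputs from the three retrievers.
bm25_hits = ["m3", "m1", "m7"]
vector_hits = ["m1", "m4", "m3"]
graph_hits = ["m1", "m7"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits, graph_hits])
print(fused)
```

In this toy input, `m1` comes out on top because it ranks well in all three lists, even though BM25 alone preferred `m3`. That is the sense in which fusion favors the memory several signals agree on over the one that is closest by a single measure.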