VaultMem — Building Encrypted Memory for AI Agents
I spent the last few months building something I couldn’t find anywhere else: a memory library for AI agents where the platform developer cryptographically cannot read what the user stores.
Install: pip install vaultmem · Code: github.com/aag1091-alt/vaultmem-sdk · Preprint: doi.org/10.5281/zenodo.19154079
The problem
Every AI memory library I looked at — mem0, Zep, LangMem, Letta — stores user memories in plaintext on the platform’s servers. The platform operator can read everything their users have ever told the agent. This isn’t a privacy policy failure; it’s a structural one. When the key lives with the operator, the guarantee lives with the operator.
This matters more than it sounds. The deepest use case for persistent AI memory isn’t task management — it’s candor. An agent that remembers across sessions only becomes genuinely useful when you can be completely honest with it. A person working through a difficult decision. Someone with early-stage dementia building a longitudinal record of their life. A user who just wants their preferences remembered without those preferences living in a corporate database.
For that to work, the privacy guarantee has to be mathematical, not a promise in a privacy policy.
What VaultMem is
VaultMem is an embeddable Python library. A platform developer adds it as a dependency, and their users get encrypted, portable, private memory — without the developer being able to read it.
The analogy I keep coming back to: VaultMem is to AI memory what 1Password is to credentials. The library doesn’t know what it’s storing. The platform receives opaque ciphertext it can store, back up, and serve back — but cannot decrypt.
from vaultmem import VaultSession, OllamaEmbedder

embedder = OllamaEmbedder("http://localhost:11434")

# Create an encrypted vault
with VaultSession.create("./vault", passphrase="s3cr3t", owner="alice",
                         embedder=embedder) as s:
    s.add("I met Bob at the AI conference yesterday")
    s.add("I prefer concise bullet-point answers")
    s.add("My go-to language for data work is Python")

# Search across sessions
with VaultSession.open("./vault", passphrase="s3cr3t",
                       embedder=embedder) as s:
    results = s.search("Bob conference")
    for r in results:
        print(f"[{r.tier}] {r.score:.3f} {r.atom.content}")
Output:
[ATOM] 0.847 I met Bob at the AI conference yesterday
How the encryption works
The key architecture has two layers:
User passphrase
→ Argon2id(passphrase, salt) ← 64 MiB, 3 iterations
→ KEK (Key Encryption Key) ← in RAM only; zeroed after open
→ AES-GCM-Dec(KEK, wrapped_MEK)
→ MEK (Master Encryption Key) ← in RAM during session only
→ AES-256-GCM(memory atom) ← to disk
The Master Encryption Key (MEK) is generated once at vault creation and stored only as a ciphertext wrapped with the user’s passphrase-derived key. It never touches disk in plaintext. When the session closes, it’s zeroed from RAM.
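To make the two layers concrete, here is a minimal sketch of the wrap-and-unwrap flow. It is not VaultMem's actual code. It assumes the `cryptography` package for AES-GCM, and it uses scrypt from the standard library as a stand-in for Argon2id so the example needs only one third-party dependency. The function names, the record layout, and the scrypt parameters are all hypothetical.

```python
import hashlib
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def derive_kek(passphrase: str, salt: bytes) -> bytes:
    """Derive a 32-byte key-encryption key (KEK) from the passphrase.

    Stand-in KDF: scrypt. The design described above uses Argon2id
    (64 MiB, 3 iterations) at this step.
    """
    return hashlib.scrypt(passphrase.encode("utf-8"), salt=salt,
                          n=2**14, r=8, p=1, dklen=32)


def create_vault_keys(passphrase: str) -> dict:
    """Generate a random MEK and return only its wrapped (encrypted) form."""
    salt = os.urandom(16)
    nonce = os.urandom(12)
    mek = AESGCM.generate_key(bit_length=256)
    wrapped = AESGCM(derive_kek(passphrase, salt)).encrypt(nonce, mek, None)
    # Only salt, nonce, and ciphertext are persisted; the MEK itself is not.
    return {"salt": salt, "nonce": nonce, "wrapped_mek": wrapped}


def unwrap_mek(passphrase: str, record: dict) -> bytes:
    """Recover the MEK; raises InvalidTag if the passphrase is wrong."""
    kek = derive_kek(passphrase, record["salt"])
    return AESGCM(kek).decrypt(record["nonce"], record["wrapped_mek"], None)


# Usage: create the keys once, then unwrap at session open and encrypt an atom.
record = create_vault_keys("s3cr3t")
mek = unwrap_mek("s3cr3t", record)
atom_nonce = os.urandom(12)
atom_ciphertext = AESGCM(mek).encrypt(atom_nonce, b"I prefer concise answers", None)
```

The platform only ever sees `record` and `atom_ciphertext`. Without the passphrase, neither yields the MEK or the plaintext.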
A platform developer who integrates VaultMem receives encrypted blobs from their users. They can store them, back them up, sync them — but they cannot read them. The math makes this true regardless of what their privacy policy says.
Two things I’m particularly happy with:
Passphrase changes are O(1). Because the MEK is wrapped separately from the data, changing a passphrase means re-wrapping one key, not re-encrypting every memory atom. The work is a key derivation plus a single AES-GCM wrap, so a vault with 10,000 atoms changes passphrase as quickly as a vault with ten.
Per-atom tamper detection. Each atom is encrypted with AES-256-GCM, which includes a 128-bit authentication tag. Each atom also has its UUID bound as Additional Authenticated Data, so you can’t transplant a valid ciphertext from one slot to another. Any tampering is detected before the data surfaces.
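Both properties can be illustrated in a few lines. This is a sketch under assumptions, not the library's internals: it uses `AESGCM` from the `cryptography` package, and the helper names and the `nonce || ciphertext` blob layout are hypothetical.

```python
import os
import uuid

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def seal_atom(mek: bytes, atom_id: uuid.UUID, plaintext: bytes) -> bytes:
    """Encrypt one atom with its UUID as additional authenticated data."""
    nonce = os.urandom(12)
    # Blob layout (an assumption for this sketch): nonce || ciphertext+tag
    return nonce + AESGCM(mek).encrypt(nonce, plaintext, atom_id.bytes)


def open_atom(mek: bytes, atom_id: uuid.UUID, blob: bytes) -> bytes:
    """Decrypt an atom; raises InvalidTag if the blob or its slot changed."""
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(mek).decrypt(nonce, ciphertext, atom_id.bytes)


def rewrap_mek(old_kek: bytes, new_kek: bytes,
               nonce: bytes, wrapped: bytes) -> tuple[bytes, bytes]:
    """Passphrase change: re-encrypt the 32-byte MEK only, never the atoms."""
    mek = AESGCM(old_kek).decrypt(nonce, wrapped, None)
    new_nonce = os.urandom(12)
    return new_nonce, AESGCM(new_kek).encrypt(new_nonce, mek, None)


# Transplanting a valid ciphertext into another slot fails authentication.
mek = AESGCM.generate_key(bit_length=256)
slot_a, slot_b = uuid.uuid4(), uuid.uuid4()
blob = seal_atom(mek, slot_a, b"I met Bob at the AI conference yesterday")
try:
    open_atom(mek, slot_b, blob)
except InvalidTag:
    print("transplant rejected")
```

`rewrap_mek` touches a fixed amount of data however many atoms exist, which is where the O(1) passphrase change comes from.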
The memory model
Memories are stored as MemoryObject instances implementing the Memory-as-Asset 5-tuple: content, context, owner, permissions, versioning state.
Four memory types: EPISODIC (things that happened), SEMANTIC (facts), PERSONA (preferences and habits), PROCEDURAL (how-to knowledge). Type is assigned by a deterministic classifier — no LLM call, no network request.
The more interesting part is the granularity hierarchy:
- ATOM — a single memory statement. Immutable once written.
- COMPOSITE — a type-homogeneous grouping of related atoms.
- AFFINITY — a derived atom representing a recurring pattern. “User drinks coffee every morning” synthesized from twenty individual coffee mentions.
AFFINITY atoms carry a significance score:
σ = (1 − e^(−κ × freq)) × e^(−λ × days_since_last)
Frequency drives the first factor (with diminishing returns), recency drives the second. The parameters vary by data class — medical memories decay slower than general ones.
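The formula translates directly into Python. The κ and λ values below are invented for illustration, since the per-class parameters aren't listed here. The only property carried over from the text is that the medical class uses a smaller λ and therefore decays more slowly.

```python
import math


def significance(freq: int, days_since_last: float,
                 kappa: float, lam: float) -> float:
    """sigma = (1 - e^(-kappa * freq)) * e^(-lam * days_since_last)."""
    saturation = 1.0 - math.exp(-kappa * freq)    # diminishing returns in freq
    recency = math.exp(-lam * days_since_last)    # exponential decay over time
    return saturation * recency


# Hypothetical per-class parameters (not taken from the library).
GENERAL = {"kappa": 0.15, "lam": 0.05}
MEDICAL = {"kappa": 0.15, "lam": 0.005}

for days in (0, 30, 180):
    general = significance(20, days, **GENERAL)
    medical = significance(20, days, **MEDICAL)
    print(f"{days:>3} days  general={general:.3f}  medical={medical:.3f}")
```

With the same mention count, the two classes start at the same score and diverge as days pass.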
Three-tier retrieval — and why it matters
The retrieval model searches AFFINITY → COMPOSITE → ATOM and re-ranks results globally. On factual queries this behaves identically to flat cosine search — no regression. On pattern queries it’s a different story.
I ran a benchmark to verify this. Three topic clusters (coffee habits, Python usage, exercise routine), each with one AFFINITY atom and five ATOM-granularity memories including a deliberately tricky decoy: an episodic memory containing the exact query keywords, designed to fool flat cosine.
For the query “my morning coffee habits”, flat cosine ranks the decoy atom first (it contains the words “coffee habits” and “morning”) and the AFFINITY second. Three-tier flips this — the AFFINITY’s significance score (~0.92 for a well-established habit) outweighs the cosine advantage of the decoy.
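The flip can be reproduced with a toy re-ranker. Everything specific here is an assumption for illustration: the blending rule (cosine scaled by 1 + σ for AFFINITY entries), the cosine scores, and the class and function names. The text above does not specify how VaultMem actually combines cosine similarity with significance.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    tier: str                  # "ATOM", "COMPOSITE", or "AFFINITY"
    cosine: float              # similarity to the query (made-up values below)
    significance: float = 0.0  # only meaningful for AFFINITY entries


def final_score(c: Candidate) -> float:
    # Hypothetical blending rule: boost AFFINITY entries by their significance.
    if c.tier == "AFFINITY":
        return c.cosine * (1.0 + c.significance)
    return c.cosine


def rerank(candidates: list[Candidate]) -> list[Candidate]:
    """Rank candidates from all tiers together by blended score."""
    return sorted(candidates, key=final_score, reverse=True)


pattern_query = [
    Candidate("Talked about morning coffee habits at lunch", "ATOM", 0.80),
    Candidate("User drinks coffee every morning", "AFFINITY", 0.70, 0.92),
    Candidate("Bought a new grinder", "ATOM", 0.55),
]

flat = sorted(pattern_query, key=lambda c: c.cosine, reverse=True)
tiered = rerank(pattern_query)
print("flat first:  ", flat[0].tier)
print("tiered first:", tiered[0].tier)
```

Under this rule a weakly matching AFFINITY entry (say cosine 0.30) still loses to a strongly matching atom (cosine 0.85), which is the behaviour you would want on specific queries.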
| Metric | Flat cosine | Three-tier | Δ |
|---|---|---|---|
| MRR (pattern queries) | 0.50 | 1.00 | +0.50 |
| MRR (specific queries) | 1.00 | 1.00 | 0.00 |
Pattern queries: AFFINITY surfaced first every time. Specific queries: no regression. The significance weight isn’t a heuristic — it’s a correction for a systematic bias in pure cosine retrieval.
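For readers unfamiliar with the metric, MRR is the mean of 1/rank of the correct result across queries. The rank lists below are a reconstruction consistent with the table, assuming one pattern query per topic cluster. They are not the raw benchmark output.

```python
def mean_reciprocal_rank(ranks: list[int]) -> float:
    """Average of 1/rank, where rank is the 1-based position of the target."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Assumed ranks of the AFFINITY entry on the three pattern queries.
flat_cosine_ranks = [2, 2, 2]   # decoy ranked first each time
three_tier_ranks = [1, 1, 1]    # AFFINITY ranked first each time

print(mean_reciprocal_rank(flat_cosine_ranks))
print(mean_reciprocal_rank(three_tier_ranks))
```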
The .vmem format
Vaults are stored as two files:
my_vault/
├── current.vmem # encrypted index (48-byte self-describing header)
└── current.atoms # append-only encrypted atom blocks
The 48-byte header is the only plaintext in the file. It carries the magic bytes (VMEM), format version, encryption algorithm, KDF, and file type — enough for any conforming implementation to decrypt the vault given the passphrase. No prior coordination with the writing implementation needed.
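A reader's first step might look like the sketch below. Only two facts come from the description above: the header is 48 bytes and carries the magic bytes VMEM. That the magic sits at offset 0 is an assumption, and the remaining fields are left unparsed because their layout isn't given here. The function name is hypothetical.

```python
import os
import tempfile

HEADER_SIZE = 48
MAGIC = b"VMEM"


def read_header(path: str) -> bytes:
    """Return the raw 48-byte plaintext header after basic sanity checks."""
    with open(path, "rb") as f:
        header = f.read(HEADER_SIZE)
    if len(header) != HEADER_SIZE:
        raise ValueError("file too short to contain a .vmem header")
    # Assumption: the magic bytes are the first four bytes of the header.
    if not header.startswith(MAGIC):
        raise ValueError("not a .vmem file (bad magic bytes)")
    return header


# Demo against a placeholder file: magic, zero padding, then fake ciphertext.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "current.vmem")
    with open(path, "wb") as f:
        f.write(MAGIC + bytes(HEADER_SIZE - len(MAGIC)) + b"encrypted index")
    header = read_header(path)
    print(len(header), header[:4])
```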
Atoms are append-only and immutable. Checkpoints update only the index file, atomically via fsync + os.rename(). Crash-safe.
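The write-then-rename pattern is a standard one, sketched here with the standard library only. The function name is hypothetical, and the sketch uses `os.replace()` rather than `os.rename()` because it also overwrites an existing destination on Windows. On POSIX the two behave the same for this purpose.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see the old or new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the destination.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())    # force the bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic swap of the directory entry
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


# Demo: two successive checkpoints of an index file.
with tempfile.TemporaryDirectory() as d:
    index_path = os.path.join(d, "current.vmem")
    atomic_write(index_path, b"checkpoint 1")
    atomic_write(index_path, b"checkpoint 2")
    with open(index_path, "rb") as f:
        print(f.read())
```

A crash before the rename leaves the previous index intact. A stricter variant would also fsync the containing directory on POSIX so the rename itself is durable.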
The format is designed to be the interoperability layer for a future where encrypted memory vaults move between applications and agents — the same way a PDF can be opened by any conforming reader regardless of which app produced it.
What’s in the library
About 2,100 lines of Python, 39 unit tests, three core dependencies (cryptography, argon2-cffi, numpy). Sentence-transformers or a local Ollama instance for embeddings — or bring your own.
# LocalEmbedder: no network, runs on CPU
from vaultmem import LocalEmbedder
embedder = LocalEmbedder()

# OllamaEmbedder: GPU-accelerated, localhost or Tailscale
from vaultmem import OllamaEmbedder
embedder = OllamaEmbedder("http://localhost:11434", model="all-minilm")

# Custom: implement .embed(text) -> list[float]
class MyEmbedder:
    def embed(self, text: str) -> list[float]: ...

    @property
    def dimension(self) -> int: return 384
The research angle
VaultMem is also a research paper — the first implementation of Layer 1 of the Memory-as-Asset framework (Pan, Huang & Yang, 2026). The paper formalizes the threat model, scores ten existing AI memory systems against a four-dimension privacy taxonomy (no prior system achieves all four properties simultaneously), and specifies the .vmem format as a portable standard.
I submitted the preprint to Zenodo and I’m targeting ACM CCS 2026 for the full paper.
Try it
pip install vaultmem # core
pip install "vaultmem[local]" # + local embeddings (sentence-transformers)
The library is intentionally small — the code is the specification. If you’re building an AI agent and want to give your users memory that’s actually theirs, this is the starting point.