claude-mem is a Claude Code plugin that captures everything your AI coding agent does during a session, compresses it with AI, and injects relevant context back into future sessions. Created by Alex Newman, the plugin hit #1 on GitHub Trending in February 2026, accumulating over 21,000 stars. It solves a problem every developer using AI coding agents has experienced: you close the terminal, and the agent forgets everything it learned about your codebase, your conventions, and the decisions you made together.
The core architecture is straightforward. Five lifecycle hooks observe every tool invocation during a Claude Code session. A worker service compresses those observations using Claude’s Agent SDK. The compressed summaries land in a dual storage layer (SQLite for keyword search, ChromaDB for semantic search) and get automatically injected into future sessions. The result is roughly 10x token efficiency compared to manually copying context between sessions.
The Amnesia Problem Is Costing Developers Real Time
Every AI coding agent on the market today is stateless by default. Cursor, GitHub Copilot, Claude Code, Codex: start a new session and the agent has zero memory of what you built yesterday. No recollection of the architectural decisions you discussed. No awareness of the bugs you fixed or the patterns you established.
The cost is concrete. Developers using AI coding agents report spending 15-30 minutes per session re-establishing context: re-explaining project structure, re-describing naming conventions, re-stating constraints the agent already learned in the previous session. Multiply that across a team of 10 engineers running 3-4 AI sessions per day, and you are looking at 30-60 hours per week of pure repetition.
Claude Code’s built-in CLAUDE.md file partially addresses this by letting you write static instructions the agent reads at session start. But CLAUDE.md is manual. You write it once and update it occasionally. It does not capture the dynamic context that accumulates during a session: the debugging rabbit holes, the tool outputs, the incremental understanding of a specific module’s quirks.
claude-mem fills the gap between static project instructions and the dynamic session context that makes an AI coding agent genuinely useful over time. One developer on X reported that generated code follows project conventions about 85% of the time with persistent memory, compared to 30% without it.
Five Hooks, One Worker Service: How claude-mem Captures Context
claude-mem registers five lifecycle hooks through Claude Code’s plugin system. Each hook fires at a specific point in the session lifecycle, and together they create a complete record of what happened.
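Claude Code hooks are external commands the editor invokes with a JSON payload on stdin, and whatever a SessionStart hook writes to stdout becomes injected context. A minimal sketch of the shape such a handler might take (the field names and memory contents here are illustrative, not claude-mem's actual schema):

```python
import json

# Sketch of a SessionStart-style hook handler. Payload fields and the
# memory lookup are illustrative stand-ins, not claude-mem's real schema.
def handle_session_start(raw_payload: str) -> str:
    payload = json.loads(raw_payload)
    project = payload.get("cwd", "unknown project")
    # A real hook would query the memory stores here; we stub the result.
    memories = ["Refactored payment module to the strategy pattern"]
    lines = [f"Relevant memories for {project}:"]
    lines += [f"- {m}" for m in memories]
    return "\n".join(lines)  # hook stdout becomes injected context

example = json.dumps({"hook_event_name": "SessionStart", "cwd": "/repo/app"})
print(handle_session_start(example))
```

The other four hooks follow the same command-plus-JSON contract; they differ only in when they fire and what the payload describes.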
SessionStart: Injecting Past Context
When you open a new Claude Code session, the SessionStart hook fires immediately. It queries both the SQLite and ChromaDB databases for memories relevant to the current project directory. Matching summaries get injected into Claude’s system prompt before you type your first message. This is how the agent “remembers” what happened in previous sessions without you saying a word.
The retrieval is project-aware. If you switch between a React frontend repo and a Python backend repo, claude-mem serves different memories for each. It also prioritizes recent sessions over older ones, so stale context does not crowd out current work.
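Project scoping plus recency ordering is easy to picture as a query. A toy sketch with an in-memory SQLite table (the schema and column names are illustrative, not claude-mem's actual tables):

```python
import sqlite3
import time

# Toy sketch of project-scoped, recency-ordered retrieval.
# Schema and rows are illustrative, not claude-mem's actual storage.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories (project TEXT, summary TEXT, created_at REAL)")
now = time.time()
db.executemany("INSERT INTO memories VALUES (?, ?, ?)", [
    ("/repo/frontend", "Adopted React Query for data fetching", now - 86400),
    ("/repo/backend",  "Fixed N+1 query in /orders endpoint",   now - 3600),
    ("/repo/backend",  "Switched auth to JWT refresh tokens",   now - 604800),
])

def memories_for(project: str, limit: int = 5) -> list[str]:
    # Scope to the current project directory and prefer recent sessions.
    cur = db.execute(
        "SELECT summary FROM memories WHERE project = ? "
        "ORDER BY created_at DESC LIMIT ?", (project, limit))
    return [row[0] for row in cur]

print(memories_for("/repo/backend"))
```

Frontend memories never leak into backend sessions, and newer summaries rank ahead of stale ones.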
UserPromptSubmit: Capturing Intent
The UserPromptSubmit hook records what you asked the agent to do. This matters because tool outputs alone do not capture intent. If you asked Claude to “refactor the payment module to use the strategy pattern,” that instruction provides critical framing for the file edits and terminal commands that followed. Without it, a future session would see the changes but not understand why they were made.
PostToolUse: Recording Every Action
This is the workhorse hook. Every time Claude executes a tool (editing a file, running a bash command, searching the codebase, reading a file), PostToolUse captures both the input and the output. A single debugging session might generate hundreds of PostToolUse events as Claude reads files, runs tests, edits code, and re-runs tests.
Raw PostToolUse data would be enormous. That is where the compression pipeline comes in, handled by the worker service running on port 37777.
Stop and SessionEnd: Compression and Persistence
When a session ends, two things happen in sequence. First, the Stop hook triggers a two-stage compression process. An AI model (running through Claude’s Agent SDK) reads the raw observations and generates semantic summaries. These summaries capture the key decisions, findings, and outcomes without preserving every line of tool output.
Second, SessionEnd persists everything. The compressed summaries go into both SQLite (for structured queries and full-text search via FTS5) and ChromaDB (for semantic similarity search via embeddings). A background worker handles this asynchronously so it does not block your terminal.
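The fan-out into two stores can be sketched in a few lines. Here `compress()` is a stub standing in for the AI summarization pass, and the list is a stand-in for a ChromaDB collection; names and record shapes are assumptions for illustration:

```python
import sqlite3

# Sketch of the SessionEnd persistence step: one compressed summary fans
# out to both stores. compress() stubs the AI pass; shapes are illustrative.
def compress(observations: list[dict]) -> str:
    files = sorted({o["file"] for o in observations if "file" in o})
    return f"{len(observations)} tool calls; files touched: {', '.join(files)}"

sql = sqlite3.connect(":memory:")
sql.execute("CREATE TABLE summaries (text TEXT)")
vector_store: list[str] = []  # stand-in for a ChromaDB collection

def persist(observations: list[dict]) -> str:
    summary = compress(observations)
    sql.execute("INSERT INTO summaries VALUES (?)", (summary,))  # keyword store
    vector_store.append(summary)                                 # semantic store
    return summary

obs = [{"tool": "Edit", "file": "auth.py"},
       {"tool": "Bash"},
       {"tool": "Edit", "file": "tests/test_auth.py"}]
print(persist(obs))
```

Running the write path off the session's critical path is what keeps the terminal responsive: the hooks only enqueue, and the worker does the summarization and inserts.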
The Dual Database Architecture
claude-mem’s storage layer uses two databases for a reason. Each handles a different type of retrieval, and together they cover the full spectrum of how developers actually search for past context.
SQLite with FTS5: The “When and What” Database
SQLite stores structured records: sessions, individual observations, and AI-generated summaries. Its FTS5 full-text search extension handles keyword queries efficiently. When you ask “what did we change in the authentication module?” the system can match on exact terms like “auth,” “login,” and “JWT” to find relevant sessions.
FTS5 is fast, deterministic, and requires zero external dependencies. No vector database server to keep running, no embedding model to call. For precise queries where you know the terminology, keyword search is often more reliable than semantic search.
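The whole keyword path fits in stdlib Python, since CPython's bundled SQLite ships with FTS5. A runnable sketch (the table layout is illustrative, not claude-mem's):

```python
import sqlite3

# FTS5 keyword search over session summaries. Requires an SQLite build
# with FTS5 enabled (standard CPython includes it). Layout is illustrative.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE summaries USING fts5(session, text)")
db.executemany("INSERT INTO summaries VALUES (?, ?)", [
    ("s1", "Fixed JWT expiry bug in the auth login flow"),
    ("s2", "Tuned Postgres indexes for the reports page"),
    ("s3", "Added rate limiting middleware to the API gateway"),
])

# MATCH supports boolean operators, so one query fans out across synonyms.
cur = db.execute(
    "SELECT session FROM summaries WHERE summaries MATCH 'auth OR login OR JWT'")
print([row[0] for row in cur])  # only the auth-related session matches
```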
ChromaDB: The “Similar Concept” Database
ChromaDB powers semantic search via embeddings. When keyword search fails (because you are asking about “that thing we did with the payment flow” instead of knowing the exact function name), Chroma finds conceptually related observations even without exact keyword overlap.
The two databases complement each other. SQLite answers “show me everything from last Tuesday’s debugging session.” Chroma answers “what did we learn about rate limiting, even if we never used that exact phrase?”
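The semantic side boils down to nearest-neighbor search over embedding vectors. A toy sketch with hand-made three-dimensional vectors standing in for real learned embeddings (ChromaDB does the same ranking at scale; the documents and vectors here are invented for illustration):

```python
import math

# Toy semantic search. Real systems use learned embeddings; here hand-made
# vectors over three invented axes [payments, auth, performance] stand in.
docs = {
    "Reworked Stripe checkout retry logic": [0.9, 0.1, 0.0],
    "Fixed JWT refresh token rotation":     [0.0, 0.9, 0.1],
    "Cached slow dashboard queries":        [0.1, 0.0, 0.9],
}

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def search(query_vec: list[float], k: int = 1) -> list[str]:
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:k]

# "that thing we did with the payment flow" embeds near the payments axis,
# despite sharing no keywords with the stored summary.
print(search([0.8, 0.1, 0.1]))
```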
Memory Retrieval Layers
claude-mem retrieves memories through a layered approach designed for token efficiency. The first layer returns a compact index: memory IDs with one-line descriptions. Claude scans this index and requests full details only for the memories it actually needs. A timeline layer lets Claude inspect what happened before and after a selected memory, reconstructing the decision flow without loading entire sessions.
This layered retrieval means a typical session injection consumes 500-2,000 tokens of context, not 20,000. That leaves most of your context window available for the actual coding work.
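The index-then-details pattern is simple to demonstrate. In this sketch the memory contents and sizes are invented, and character counts stand in for token counts:

```python
# Sketch of layered retrieval: ship a cheap index first, fetch full detail
# on demand. Memory contents and sizes are invented for illustration.
MEMORIES = {
    "m1": {"title": "JWT refresh bug fixed in auth.py",      "detail": "..." * 200},
    "m2": {"title": "Adopted strategy pattern in payments",  "detail": "..." * 200},
    "m3": {"title": "CI flake traced to test ordering",      "detail": "..." * 200},
}

def index_layer() -> list[str]:
    # Layer 1: one line per memory, a few tokens each.
    return [f"{mid}: {m['title']}" for mid, m in MEMORIES.items()]

def detail_layer(wanted: list[str]) -> dict[str, str]:
    # Layer 2: full text only for the memories the agent asked for.
    return {mid: MEMORIES[mid]["detail"] for mid in wanted}

index = index_layer()
details = detail_layer(["m2"])  # the agent expands only what it needs

full_cost = sum(len(m["detail"]) for m in MEMORIES.values())
layered_cost = sum(len(line) for line in index) + len(details["m2"])
print(f"layered payload is {layered_cost / full_cost:.0%} of eager loading")
```

The savings grow with the memory store: the index cost scales with the number of memories, while the detail cost stays bounded by what the agent actually opens.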
Why Memory Infrastructure Matters Beyond claude-mem
claude-mem is one plugin for one tool. The broader trend is that memory is becoming a first-class infrastructure layer across the entire AI agent ecosystem.
Mem0 provides a managed memory layer that works across multiple LLM providers, reporting a 26% accuracy boost over baseline RAG on conversational benchmarks. Zep offers a similar persistent memory service with built-in fact extraction and entity tracking. Redis published a dedicated AI agent orchestration framework that treats memory as a core primitive alongside state management and tool routing.
Andrew Ng’s Context Hub takes a different approach: rather than compressing past sessions, it lets agents build up institutional knowledge about APIs and codebases over time, similar to how a human developer accumulates tribal knowledge.
The pattern across all of these is the same. Stateless agents are a prototype-stage limitation, not a feature. Production-grade AI agents need to remember what they learned, and the teams building memory infrastructure are treating it with the same seriousness as database design or caching strategy.
VentureBeat’s 2026 prediction that contextual memory will surpass RAG for agentic AI is already playing out. The question is no longer whether AI coding agents need persistent memory, but which memory architecture wins.
Getting Started with claude-mem
Installation takes under two minutes. The plugin requires Node.js 18+ and Bun (for the worker service).
```shell
npx claude-mem@latest init
```
This command registers the lifecycle hooks in your Claude Code configuration and starts the worker service. From that point, every Claude Code session in the initialized project automatically captures and retrieves memories.
The worker service exposes a web UI at http://localhost:37777 where you can browse your memory stream, inspect individual observations, and manage settings. You can also query memories directly through MCP tools that claude-mem exposes to Claude Code, letting the agent search its own memory as part of its reasoning process.
For teams, claude-mem stores everything locally by default (SQLite files in your project directory). There is no cloud sync, no data leaving your machine. The project is licensed under AGPL-3.0, so you can inspect every line of code that touches your session data.
Frequently Asked Questions
What is claude-mem and how does it work?
claude-mem is an open-source Claude Code plugin that captures everything the AI coding agent does during a session (file edits, bash commands, search results) through five lifecycle hooks, compresses those observations using AI, and injects relevant context back into future sessions. It uses SQLite for keyword search and ChromaDB for semantic search, achieving roughly 10x token efficiency compared to manual context management.
Does claude-mem send my code to the cloud?
No. claude-mem stores all data locally in SQLite and ChromaDB databases within your project directory. No session data leaves your machine. The project is licensed under AGPL-3.0, so the full source code is available for audit.
How much context does claude-mem add to each session?
claude-mem uses a layered retrieval approach that typically adds 500-2,000 tokens of compressed context per session. It first returns a compact index of relevant memories, then loads full details only for the ones Claude actually needs, preserving most of your context window for coding work.
Can I use claude-mem with other AI coding agents besides Claude Code?
claude-mem is built specifically for Claude Code’s lifecycle hook system and currently only works with Claude Code. For other AI coding agents, alternatives like Mem0, Zep, or custom MCP-based memory servers provide similar persistent memory capabilities across different platforms.
Why did claude-mem become so popular on GitHub?
claude-mem hit #1 on GitHub Trending because it solved a universal pain point: AI coding agents forgetting everything between sessions. Developers reported spending 15-30 minutes per session re-establishing context, and claude-mem eliminated that friction with automatic capture and injection of relevant past context.
