AI Agents Need Memory, Not Databases: Inside Uteke, the Offline-First Semantic Memory Engine
Uteke gives AI agents persistent, searchable memory in a single binary — no API keys, no Docker, no cloud. Here's what makes it different.
Why Do AI Agents Have Amnesia?
Here's a problem most AI builders encounter sooner or later: your agent does something brilliant in one session, then forgets everything by the next. Context windows reset, conversation history evaporates, and all those carefully learned preferences, decisions, and insights vanish into thin air.
The standard fix has been to hook agents up to cloud databases — Mem0, Letta, Zep — services that store and retrieve memories via API calls. These tools work, but they come with a hidden cost: your data leaves your machine, travels through the internet, and lands on someone else's infrastructure. For developers building privacy-conscious agents, working with sensitive data, or running in air-gapped environments, that's a non-starter.
Enter Uteke — a Rust-based semantic memory engine that gives AI agents persistent, searchable memory without ever calling an external API.
What Makes Uteke Different From Other Memory Solutions?
Uteke positions itself as the privacy-first, zero-dependency alternative to cloud-based memory layers. Here's how it compares to the established players:
| Feature | Uteke | Mem0 | Letta | Zep |
|---|---|---|---|---|
| Setup | Single binary | pip + Docker + Qdrant | pip + Docker + Postgres | pip + Docker + Neo4j |
| API Keys Needed | None | OpenAI/LLM key | LLM key | LLM + vector DB |
| Offline | Fully | Cloud embedding | Needs LLM server | Needs LLM + Neo4j |
| Semantic Search | ONNX + FTS5 hybrid | Cloud embedding | Keyword + archival | GraphRAG |
| Full-Text Search | FTS5 built-in | No | Keyword only | No |
| Knowledge Graph | SQLite-backed (v0.2) | No | No | Neo4j required |
| Code Chunking | AST-aware (5 langs) | No | No | No |
| Recall Speed | ~30ms (library mode) | Network round-trip | Network round-trip | Network round-trip |
| Privacy | Data never leaves machine | Data sent to LLM | Data sent to LLM | Data sent to LLM |
| License | Apache 2.0 | Apache 2.0 | Apache 2.0 | Apache 2.0 |
The key differentiator: Uteke runs entirely on your hardware. It needs no Docker containers, no database servers, and no Python dependencies: one binary, plus a ~188MB embedding model downloaded on first run, is all it takes to get operational.
How Does the Hybrid Search System Work?
Most memory solutions rely on either vector similarity or keyword matching. Uteke does both simultaneously. Every stored memory gets converted into a 768-dimensional vector using an ONNX Runtime model (EmbeddingGemma Q4) running locally on your CPU. At the same time, the raw text is indexed in SQLite's FTS5 full-text search engine.
When you query for a memory, both systems run in parallel and their results are merged using Reciprocal Rank Fusion (RRF) — an algorithm that combines ranked lists without needing score normalization. The result is a search that understands semantic meaning and catches exact keyword matches, giving you better recall than either approach alone.
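Reciprocal Rank Fusion is simple enough to sketch in a few lines. The version below is a minimal illustration, not Uteke's code: it assumes the conventional constant k = 60 from the original RRF paper and equal weight for the vector and FTS5 lists. Uteke's actual constant and weighting are not documented here, and the memory IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of memory IDs with RRF.

    Each memory scores sum(1 / (k + rank)) over the lists it appears in,
    with 1-based ranks. Only ranks are used, so the raw scores of the
    vector search and FTS5 never need to be normalized against each other.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists for the query "when do we deploy?"
vector_hits = ["mem-deploy", "mem-release-notes", "mem-ci"]  # semantic order
fts5_hits = ["mem-deploy", "mem-ci", "mem-staging-config"]   # keyword order

for memory_id, score in reciprocal_rank_fusion([vector_hits, fts5_hits]):
    print(f"{memory_id}: {score:.5f}")
```

A memory that ranks reasonably in both lists (here `mem-ci`) overtakes one that ranks second in only a single list (`mem-release-notes`), which is the behavior that makes RRF a good fit for hybrid search.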
```bash
# Store a memory with rich metadata
uteke remember "Deploy v2.1 to staging" --tags deploy,staging \
  --entity staging-server --category infrastructure
# Hybrid recall: semantic + keyword, merged by RRF
uteke recall "when do we deploy?"
# Start server mode for persistent fast access (~42ms)
uteke-serve --port 8767
```
What Are Rooms and Why Do They Matter?
Uteke introduced Rooms — a way to group memories by context. Think of a Room as a container for everything related to a specific meeting, project, or discussion thread. Each memory in a Room carries author attribution, so you can trace who contributed what.
```bash
# Create a room for your project
uteke room create "Project Alpha" --tags project,alpha
# Add memories to the room
uteke room add "Project Alpha" "Backend uses PostgreSQL with connection pooling"
# Recall from a specific room
uteke room recall "Project Alpha" --query "database setup"
# Generate a structured document from room memories
uteke room document "Project Alpha"
```
This is particularly useful for multi-agent systems. Each agent gets its own namespace: a fully isolated memory space with zero overhead. Combine that with Rooms, and you have the foundation for collaborative AI systems where agents share context within projects while keeping their individual memories separate.
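The isolation model can be pictured as two kinds of containers: a private namespace per agent, and shared Rooms whose entries carry an author. The in-memory sketch below only illustrates that idea; `MemoryStore`, `RoomMemory`, and their methods are hypothetical names, not anything Uteke exposes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RoomMemory:
    author: str  # the agent that contributed this memory (author attribution)
    text: str


@dataclass
class MemoryStore:
    """Hypothetical layout: one private namespace per agent plus shared Rooms."""
    namespaces: dict = field(default_factory=dict)  # agent -> [text, ...]
    rooms: dict = field(default_factory=dict)       # room name -> [RoomMemory, ...]

    def remember(self, agent, text):
        self.namespaces.setdefault(agent, []).append(text)

    def room_add(self, room, author, text):
        self.rooms.setdefault(room, []).append(RoomMemory(author, text))

    def visible_to(self, agent, joined_rooms):
        """An agent reads its own namespace plus the Rooms it joined, never another agent's namespace."""
        own = list(self.namespaces.get(agent, []))
        shared = [
            f"[{room}] {memory.author}: {memory.text}"
            for room in joined_rooms
            for memory in self.rooms.get(room, [])
        ]
        return own + shared


store = MemoryStore()
store.remember("planner", "Prefers small, reviewable PRs")
store.remember("coder", "Runs the linter before every commit")
store.room_add("Project Alpha", "coder", "Backend uses PostgreSQL with connection pooling")
for line in store.visible_to("planner", ["Project Alpha"]):
    print(line)
```

The planner sees the coder's Room contribution, with attribution, but not the coder's private note.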
Can You Time-Travel Through Your AI's Memories?
One of the most interesting features is time-travel queries. Uteke tracks when memories are created, modified, and deprecated. This means you can ask: "What did my agent know about this topic on June 1st?" and get results reflecting the state of knowledge at that specific point in time.
```bash
# Recall memories as they existed on a specific date
uteke recall "deployment strategy" --at 2026-06-01T12:00:00Z
# List all memories known at a point in time
uteke list --at 2026-06-01T12:00:00Z
```
This has practical implications for debugging agent decisions, maintaining audit trails, and understanding how an agent's knowledge evolved over time. Built-in point-in-time recall like this is rare among agent memory tools.
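One way to model a point-in-time query is to give every version of a memory a validity interval and filter on it. The sketch below assumes that model; the `MemoryVersion` record and its field names are hypothetical and are not Uteke's schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class MemoryVersion:
    """One version of a memory and the interval during which it was current."""
    memory_id: str
    content: str
    valid_from: datetime                 # when this version was created or last edited
    valid_to: Optional[datetime] = None  # when it was superseded or deprecated; None = still live


def parse_utc(text: str) -> datetime:
    """Parse an ISO-8601 timestamp such as 2026-06-01T12:00:00Z as timezone-aware UTC."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def known_at(versions: List[MemoryVersion], at: datetime) -> List[MemoryVersion]:
    """Versions that were current at instant `at`, using half-open [valid_from, valid_to)."""
    return [
        v for v in versions
        if v.valid_from <= at and (v.valid_to is None or at < v.valid_to)
    ]


history = [
    MemoryVersion("m1", "Deploy with blue-green", parse_utc("2026-05-01T00:00:00Z"),
                  parse_utc("2026-06-10T00:00:00Z")),
    MemoryVersion("m1", "Deploy with canary releases", parse_utc("2026-06-10T00:00:00Z")),
    MemoryVersion("m2", "Staging runs on a single VM", parse_utc("2026-05-15T00:00:00Z"),
                  parse_utc("2026-05-20T00:00:00Z")),
]

for version in known_at(history, parse_utc("2026-06-01T12:00:00Z")):
    print(version.memory_id, version.content)
```

Using half-open intervals means that at the exact moment a memory is edited, only the new version counts as known, so no instant ever returns two versions of the same memory.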
How Does the Relationship Graph Connect Memories?
Memories in isolation are useful. Memories connected with explicit relationships are powerful. Uteke's relationship graph lets you link memories with typed edges: supersedes, contradicts, references, and part_of. When you recall a memory, you can traverse these relationships to surface related context.
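A recall that follows relationships up to a fixed depth amounts to a breadth-first walk that stops a set number of hops from the seed memory. This sketch assumes edges are stored as (source, type, target) triples and can be followed in either direction; Uteke's actual storage and direction rules aren't described here, and the memory IDs are hypothetical.

```python
from collections import defaultdict, deque

EDGE_TYPES = {"supersedes", "contradicts", "references", "part_of"}


def related(edges, seed, depth):
    """Return {memory_id: (hops, edge_type)} for memories within `depth` hops of `seed`."""
    adjacency = defaultdict(list)
    for source, edge_type, target in edges:
        if edge_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge_type}")
        # Assumption: relationships can be followed in both directions.
        adjacency[source].append((target, edge_type))
        adjacency[target].append((source, edge_type))

    found = {}
    visited = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == depth:
            continue  # do not expand past the depth limit
        for neighbor, edge_type in adjacency[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                found[neighbor] = (hops + 1, edge_type)
                queue.append((neighbor, hops + 1))
    return found


edges = [
    ("auth-v2", "supersedes", "auth-v1"),
    ("auth-v2", "references", "jwt-notes"),
    ("jwt-notes", "part_of", "security-guide"),
    ("security-guide", "references", "tls-setup"),
]
print(related(edges, "auth-v2", depth=2))
```

With `depth=2`, the walk reaches `security-guide` through `jwt-notes` but stops before `tls-setup`, which sits three hops away.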
```bash
# Recall and follow relationship edges
uteke recall "authentication approach" --related --depth 2
```
Combined with smart decay, a composite importance scoring system in which you can pin critical memories so they are never forgotten, Uteke gives agents a way to manage the knowledge lifecycle without manual curation.
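The exact decay formula isn't described here, so the following only illustrates how a composite score with pinning could work. The 30-day half-life, the logarithmic access boost, the 0.05 forgetting threshold, and every name in the block are assumptions chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    importance: float     # base importance, 0.0 to 1.0
    age_days: float       # days since the memory was last accessed
    access_count: int = 0
    pinned: bool = False


def decayed_score(memory: Memory, half_life_days: float = 30.0, access_weight: float = 0.1) -> float:
    """Hypothetical composite score: importance x exponential decay x usage boost."""
    if memory.pinned:
        return memory.importance  # pinned memories skip decay entirely
    decay = 0.5 ** (memory.age_days / half_life_days)  # halves every half_life_days
    usage = 1.0 + access_weight * math.log1p(memory.access_count)
    return memory.importance * decay * usage


def should_forget(memory: Memory, threshold: float = 0.05) -> bool:
    """Unpinned memories scoring below the threshold become candidates for removal."""
    return not memory.pinned and decayed_score(memory) < threshold


memories = [
    Memory("Prod DB credentials rotate monthly", importance=0.9, age_days=120, pinned=True),
    Memory("Tried a new linter config", importance=0.3, age_days=120),
    Memory("Team prefers squash merges", importance=0.6, age_days=10, access_count=5),
]
for m in memories:
    print(f"{decayed_score(m):.3f} forget={should_forget(m)} {m.text}")
```

Both old memories are 120 days old, but only the unpinned one decays past the threshold; frequent access keeps the squash-merge preference comfortably above it.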
What's New in v0.2: Knowledge Graphs, Code, and Structured Data?
Version 0.2.0 shipped on June 14, 2026, and it's a substantial upgrade. The headline features solve three real problems that memory engines typically ignore: relational data, code understanding, and bulk knowledge import.
SQLite-Backed Knowledge Graph
Prior to v0.2, Uteke's relationship tracking was useful but limited to linking memories. The new knowledge graph adds dedicated graph_nodes and graph_edges tables in SQLite, giving you a proper graph database on top of your semantic memory. You can upsert nodes, add typed edges, find paths between nodes using BFS, and query relationships — all from the CLI.
```bash
# Create graph nodes
uteke graph nodes upsert "Alice" --type person --tags engineer
uteke graph nodes upsert "Project X" --type project --tags active
# Connect them
uteke graph edges add "Alice" "leads" "Project X"
# Find paths between any two nodes
uteke graph path "Alice" "v2-release"
# Query all relationships for a node
uteke graph neighbors "Alice"
# Get graph stats
uteke graph stats
```
This is powerful for agents that need to maintain organizational knowledge, team structures, or project dependencies, without needing a separate Neo4j instance. The BFS pathfinding with parent tracking means your agent can answer questions like "how is Alice connected to the v2 release?" with a traversable path, not just a similarity score.
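BFS with parent tracking explores the graph one hop at a time and records how each node was first reached, so the shortest path can be rebuilt by walking those parent pointers back from the target. Here is a minimal in-memory sketch. It assumes edges are (source, relation, target) triples that can be traversed in both directions, and the `ships_in` and `reviews` relations are made-up examples rather than anything Uteke defines.

```python
from collections import defaultdict, deque


def shortest_path(edges, start, goal):
    """BFS with parent tracking; returns [(from, relation, to), ...] or None if unreachable."""
    adjacency = defaultdict(list)
    for source, relation, target in edges:
        adjacency[source].append((target, relation))
        adjacency[target].append((source, relation))  # assumption: traverse both ways

    parent = {start: None}  # node -> (previous node, relation used to reach it)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for neighbor, relation in adjacency[node]:
            if neighbor not in parent:
                parent[neighbor] = (node, relation)
                queue.append(neighbor)

    if goal not in parent:
        return None
    # Walk the parent pointers back from the goal, then reverse into start-to-goal order.
    path = []
    node = goal
    while parent[node] is not None:
        previous, relation = parent[node]
        path.append((previous, relation, node))
        node = previous
    path.reverse()
    return path


edges = [
    ("Alice", "leads", "Project X"),
    ("Project X", "ships_in", "v2-release"),
    ("Bob", "reviews", "v2-release"),
]
# The arrow shows traversal order, which may run against a stored edge's direction.
for source, relation, target in shortest_path(edges, "Alice", "v2-release"):
    print(f"{source} --{relation}--> {target}")
```

Because BFS visits nodes in order of hop count, the first time it reaches the goal is along a path with the fewest hops.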
Structured JSON Memory
Not all knowledge is prose. Configuration data, API schemas, user preferences — structured data is a first-class citizen in v0.2. Uteke now auto-detects JSON content and sets content_type='json', then flattens it for embedding so semantic search works on structured data too.
```bash
# Store structured data (auto-detected as JSON)
uteke remember '{"name":"Alice","role":"engineer","team":"backend"}' --tags team
# Recall by semantic meaning (finds JSON content)
uteke recall "who is on the backend team?"
# Filter JSON memories by key-value
uteke list --where role=engineer
# Pretty-print JSON results
uteke recall "team structure" --content-format json
```
The flattening logic converts {"name": "Alice"} to "name: Alice" for embedding, which means semantic search naturally surfaces structured records alongside prose memories. The --where filter adds exact-match capability for JSON fields, giving you the best of both worlds.
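The flattening step and the exact-match filter are easy to picture in code. Only the flat `{"name": "Alice"}` case is described above, so this sketch's handling of nested data (dotted keys, `[i]` for list items) and its function names are assumptions.

```python
import json
from typing import Any, List


def _scalar_text(value: Any) -> str:
    # Strings stay as-is; numbers, booleans and null use their JSON spelling.
    return value if isinstance(value, str) else json.dumps(value)


def flatten_json(value: Any, prefix: str = "") -> List[str]:
    """Flatten JSON into 'path: value' lines (dotted keys, [i] for list items)."""
    if isinstance(value, dict):
        items = [(f"{prefix}.{key}" if prefix else str(key), child) for key, child in value.items()]
    elif isinstance(value, list):
        items = [(f"{prefix}[{i}]", child) for i, child in enumerate(value)]
    else:
        return [f"{prefix}: {_scalar_text(value)}" if prefix else _scalar_text(value)]
    lines: List[str] = []
    for path, child in items:
        lines.extend(flatten_json(child, path))
    return lines


def to_embedding_text(raw: str) -> str:
    """Text that would be handed to the embedding model for a JSON memory."""
    return "\n".join(flatten_json(json.loads(raw)))


def matches_where(raw: str, key: str, expected: str) -> bool:
    """Exact top-level key=value match, in the spirit of `--where role=engineer`."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return False  # prose memories never match a field filter
    return isinstance(record, dict) and key in record and _scalar_text(record[key]) == expected


raw = '{"name":"Alice","role":"engineer","team":"backend"}'
print(to_embedding_text(raw))
print(matches_where(raw, "role", "engineer"))
```

The embedding sees readable `key: value` lines, while the filter compares against the parsed record, so the two paths complement each other.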
AST-Aware Code Chunking
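Agents that work with codebases need memory that understands code structure. v0.2 adds a regex-based chunker that splits code files by functions, classes, and blocks without requiring tree-sitter or any external dependency. It is billed as AST-aware, but as a regex-based approach it matches declaration patterns rather than parsing a full syntax tree.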
```bash
# Import a codebase into memory
uteke import src/main.rs --tags rust,backend --format code
# The chunker handles Rust, Go, Python, TypeScript/JS, and Dart
# Each function/class becomes a separate memory with context
```
Supported languages: Rust, Go, Python, TypeScript/JavaScript, and Dart. Unknown languages fall back to whole-file import. Combined with the extract_imports() function, agents can import entire codebases and semantically search for specific functions or patterns.
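To show how a regex-based chunker can approximate code structure without a parser, here is a small sketch that splits at top-level declarations. The patterns cover only Python and Go and are assumptions for illustration, not Uteke's rules. As described above for Uteke, an unrecognized language falls back to a single whole-file chunk.

```python
import re
from typing import List, Tuple

# Hypothetical top-level declaration patterns for two languages only.
# Limitation: decorators or comments directly above a declaration stay with the previous chunk.
DECLARATION_PATTERNS = {
    "python": re.compile(r"^(?:async\s+def|def|class)\s+(\w+)", re.MULTILINE),
    "go": re.compile(r"^(?:func\s+(?:\([^)]*\)\s*)?|type\s+)(\w+)", re.MULTILINE),
}


def chunk_source(source: str, language: str) -> List[Tuple[str, str]]:
    """Split source into (name, code) chunks at top-level declarations."""
    pattern = DECLARATION_PATTERNS.get(language)
    matches = list(pattern.finditer(source)) if pattern else []
    if not matches:
        return [("<file>", source)]  # unknown language or no declarations: whole file
    chunks = []
    preamble = source[: matches[0].start()].strip()
    if preamble:
        chunks.append(("<preamble>", preamble))  # imports, package line, constants
    for match, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following is not None else len(source)
        chunks.append((match.group(1), source[match.start():end].rstrip()))
    return chunks


sample = '''import os

def load_config(path):
    return open(path).read()

class Deployer:
    def run(self):
        return os.getenv("TARGET")
'''
for name, body in chunk_source(sample, "python"):
    print(f"--- {name} ---")
    print(body)
```

Because the patterns are anchored at the start of a line, the indented `def run` stays inside the `Deployer` chunk instead of becoming a chunk of its own.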
External Knowledge Import
Bulk import has been one of the most requested features. v0.2 adds uteke import with auto-format detection for markdown, JSONL, and plain text files.
```bash
# Import markdown: splits by headings, each section = one memory
uteke import docs/architecture.md --tags architecture,docs
# Import JSONL (each line = one memory)
uteke import data/conversations.jsonl --tags training
# Import from stdin
echo "Important context about the deployment" | uteke import - --tags note
# Text files: splits by double newline (paragraphs)
uteke import notes.txt --tags notes
```
Markdown files get split by heading boundaries, so each section becomes a discrete, searchable memory. This is particularly useful for importing documentation, wikis, or knowledge bases into Uteke's semantic memory.
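The three splitting rules described here are straightforward to sketch. The extension-based format detection and the function names below are assumptions; Uteke's real auto-detection may also inspect file contents.

```python
import json
import re
from pathlib import Path
from typing import List


def split_markdown(text: str) -> List[str]:
    """One chunk per ATX heading section (a line starting with 1-6 '#' and a space).

    Simplification: '#' lines inside fenced code blocks would also start a new section.
    """
    sections, current = [], []
    for line in text.splitlines():
        if re.match(r"^#{1,6}\s", line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    return [s for s in sections if s]


def split_plain_text(text: str) -> List[str]:
    """One chunk per paragraph, i.e. per run of text separated by blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_jsonl(text: str) -> List[str]:
    """One chunk per non-empty line; each line must be valid JSON."""
    chunks = []
    for line in text.splitlines():
        if line.strip():
            json.loads(line)  # raises json.JSONDecodeError on a malformed line
            chunks.append(line.strip())
    return chunks


def split_for_import(filename: str, text: str) -> List[str]:
    """Pick a splitter from the file extension (stdin, named '-', is treated as plain text)."""
    suffix = Path(filename).suffix.lower()
    if suffix in {".md", ".markdown"}:
        return split_markdown(text)
    if suffix == ".jsonl":
        return split_jsonl(text)
    return split_plain_text(text)


doc = "# Architecture\nOverview text.\n\n## Storage\nSQLite with FTS5.\n\n## Search\nHybrid RRF."
for chunk in split_for_import("architecture.md", doc):
    print(repr(chunk))
```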
Docker and Environment Variable Support
v0.2 also adds first-class Docker support with a docker-compose.yml that includes health checks and volume persistence. For server deployments, environment variables now cover all major configuration options: UTEKE_LOG_LEVEL, UTEKE_SERVER_HOST, UTEKE_SERVER_PORT, UTEKE_RECALL_MIN_SCORE, and UTEKE_RECALL_STRICT.
```bash
# Pull and run with Docker
docker compose up -d
# Configure via environment
UTEKE_RECALL_MIN_SCORE=0.3 uteke-serve --port 8767
```
Resolution order: CLI flag > environment variable > config file > default. This makes deployment in containerized and multi-agent environments significantly easier.
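The resolution order can be expressed as a short chain of lookups. This sketch assumes each setting's environment variable is `UTEKE_` plus the upper-cased key, which matches the variable names listed above; the function name and the default values in the example are illustrative, not Uteke's real defaults.

```python
def resolve_setting(name, cli_args, env, config_file, defaults):
    """Return (value, source) using the order CLI flag > env var > config file > default.

    `name` is a config key such as "recall_min_score"; its environment variable
    is assumed to be UTEKE_<NAME>, e.g. UTEKE_RECALL_MIN_SCORE.
    """
    if cli_args.get(name) is not None:
        return cli_args[name], "cli"
    env_key = f"UTEKE_{name.upper()}"
    if env_key in env:
        return env[env_key], "env"
    if name in config_file:
        return config_file[name], "config"
    return defaults[name], "default"


# Illustrative values only; in a real process `env` would be os.environ.
defaults = {"recall_min_score": "0.0", "server_port": "8080", "log_level": "info"}
config_file = {"recall_min_score": "0.2", "server_port": "8767"}
env = {"UTEKE_RECALL_MIN_SCORE": "0.3"}

for key in ("recall_min_score", "server_port", "log_level"):
    print(key, resolve_setting(key, {}, env, config_file, defaults))
```

Returning the source alongside the value makes it easy to log where each effective setting came from, which helps when debugging containerized deployments.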
What's the Performance Like in Practice?
Speed matters when memory recall is part of every agent interaction. Uteke's library mode delivers ~30ms recall times. The server mode (uteke-serve) answers warm queries in ~42ms: slightly slower than library mode, but about 75x faster than a CLI cold start, because the ONNX model loads once and stays resident in memory.
CLI cold start dropped from ~3 seconds to ~20ms for non-embedding commands by lazily loading the ONNX model only when needed. Commands like list, stats, tags, and the new graph subcommands now start instantly.
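Lazy loading is a small pattern: keep the model behind a property that is computed on first access and cached afterward. The Python sketch below is only an analogy for the behavior described above; the `Engine` class, its methods, and the toy embedding are hypothetical. In Rust, a common way to express the same idea is `std::sync::OnceLock`, though which mechanism Uteke uses isn't stated.

```python
from functools import cached_property
from typing import Callable, List


class Engine:
    """Hypothetical engine that loads the embedding model only when a command needs it."""

    def __init__(self) -> None:
        self.model_loads = 0
        self._memories = ["Deploy v2.1 to staging", "Backend uses PostgreSQL"]

    @cached_property
    def embedder(self) -> Callable[[str], List[float]]:
        # Stand-in for the expensive step (reading the ONNX model from disk).
        self.model_loads += 1
        return lambda text: [float(len(text))]  # toy embedding, not a real model

    def list_memories(self) -> List[str]:
        return list(self._memories)  # no embeddings needed, so the model is never loaded

    def recall(self, query: str) -> List[float]:
        return self.embedder(query)  # first call loads the model; later calls reuse it


engine = Engine()
engine.list_memories()
print(engine.model_loads)
engine.recall("when do we deploy?")
engine.recall("database setup")
print(engine.model_loads)
```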
Is Uteke Ready for Production?
Uteke is open-source under Apache 2.0, written in Rust, and available as a single binary for Linux (x86_64 and ARM64), macOS (Apple Silicon), and Windows. The project has hit v0.2.0 with schema migrations handling automatic database upgrades, 108 unit tests across the workspace, and a Docker quickstart for one-command deployment.
For AI agent developers who need persistent memory without cloud dependencies, Uteke is worth serious consideration. It's particularly compelling for organizations pushing toward open-source AI infrastructure, and for use cases where data sovereignty and supply chain security are non-negotiable.
The pluggable embedding architecture means that as new models emerge, Uteke can adopt them without architectural changes. The Embedder trait currently supports ONNX backends, with OpenAI and Ollama adapters on the roadmap.
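A pluggable embedder comes down to a small interface that the rest of the engine depends on. In Rust that is a trait; the Python sketch below mirrors the idea with `typing.Protocol`. The method name `embed`, the `dimensions` attribute, and the toy hashing backend are hypothetical; Uteke's actual trait signature isn't shown here.

```python
import hashlib
import math
from typing import List, Protocol


class Embedder(Protocol):
    """What the engine needs from any backend (ONNX, or a future remote adapter)."""
    dimensions: int

    def embed(self, text: str) -> List[float]:
        ...


class HashingEmbedder:
    """Toy offline backend: hashes tokens into a fixed-size, L2-normalized vector."""

    def __init__(self, dimensions: int = 8) -> None:
        self.dimensions = dimensions

    def embed(self, text: str) -> List[float]:
        vector = [0.0] * self.dimensions
        for token in text.lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            vector[digest[0] % self.dimensions] += 1.0
        norm = math.sqrt(sum(v * v for v in vector)) or 1.0  # avoid dividing by zero
        return [v / norm for v in vector]


def cosine(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))  # inputs are already unit length


def index_memories(embedder: Embedder, texts: List[str]) -> List[List[float]]:
    """Engine code sees only the Embedder interface, so backends can be swapped freely."""
    return [embedder.embed(t) for t in texts]


vectors = index_memories(HashingEmbedder(), ["deploy to staging", "deploy to staging"])
print(round(cosine(vectors[0], vectors[1]), 6))
```

Swapping backends means passing a different object that satisfies `Embedder`; nothing in `index_memories` changes.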
You can check it out at github.com/codecoradev/uteke — installation is a single curl command, and your agent will have memory in under a minute.