Accuracy is half the story.
Token cost is the other half.
All 20 facts are written to a single shared namespace; 40 queries must each find the right fact among everything stored. The efficiency score compounds accuracy and injection size: a system that returns too much context loses points even if it finds the right answer.
The pool test
C1 tests each fact in isolation — one fact per namespace, zero competition for retrieval. C2 is harder: all 20 facts are written to a single shared namespace, and each query must return the one relevant fact from the full pool.
This tests how well each system concentrates its injection on relevant content versus returning bulk context. In production, a memory pool grows over time — systems that return more context with each query impose higher token cost at inference time.
For Iranti, the shared namespace is a single project/user context. For Shodh, a single user ID. For Mem0, a single user ID with one Chroma collection. For Graphiti, a single group_id containing all episodes.
Efficiency formula
The efficiency score is a compound metric that penalizes token bloat:
Tokens are counted as whitespace-delimited words in the returned context string. Per-query averages are computed across all 40 queries.
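The exact formula is not stated above, but all four reported scores are consistent with efficiency = accuracy (%) divided by average tokens per query, with Iranti landing at the 5.0 maximum. The sketch below encodes that inferred formula together with the whitespace token count; treat the formula as a reconstruction from the published numbers, not an official definition.

```python
def count_tokens(context: str) -> int:
    """Tokens are whitespace-delimited words in the returned context string."""
    return len(context.split())

def efficiency(accuracy_pct: float, tokens_per_query: float) -> float:
    # Inferred: accuracy (%) / avg tok/query reproduces every reported score.
    return accuracy_pct / tokens_per_query

results = {
    "Iranti":   efficiency(100, 20),  # 5.0 (the normalization maximum)
    "Shodh":    efficiency(92, 66),   # ~1.39
    "Mem0":     efficiency(80, 18),   # ~4.44
    "Graphiti": efficiency(60, 49),   # ~1.22
}
```

The metric is deliberately harsh on bulk injection: doubling the returned context halves the score even at perfect accuracy.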
Three-axis comparison
[Chart: per-system bars, where bar length = relative token cost; efficiency scores normalized to a 5.0 maximum.]
Why Shodh's token cost collapses efficiency
In C1 (isolated namespaces), Shodh returned 20 tokens per query — identical to Iranti. This is because each namespace contained exactly one fact, so recall returned exactly that fact's text.
In C2 (shared pool of 20 facts), Shodh's token count jumps to 66 tokens per query. Shodh's recall implementation returns the full text of each matched memory without summarization or truncation. When the pool contains 20 facts and recall is tuned to return top-k results, the returned context includes multiple full fact texts.
At 66 tok/query across 40 queries, Shodh injects 2,640 tokens of memory context into a typical session — versus Iranti's 800. At scale, this differential compounds across every turn that involves memory retrieval.
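The session totals above are straight multiplication, and one plausible decomposition of Shodh's 66-token average follows from its top-k behavior. The k = 3 and ~22-token fact length below are illustrative assumptions that happen to reproduce the observed average, not measured values:

```python
# Session-level injection cost: per-query tokens x queries per session.
SHODH_TOK, IRANTI_TOK, QUERIES = 66, 20, 40

shodh_session = SHODH_TOK * QUERIES    # 2,640 tokens
iranti_session = IRANTI_TOK * QUERIES  # 800 tokens

# Illustrative decomposition (assumed, not measured): if recall returns
# the top-k matches verbatim and each stored fact averages ~22 words,
# k = 3 yields exactly the observed 66 tok/query.
k, avg_fact_tokens = 3, 22
assert k * avg_fact_tokens == SHODH_TOK
```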
Shodh still finds the right answer 92% of the time; the accuracy degradation from isolated to pool is small (100% → 92%). The problem is not retrieval quality but injection volume: the correct fact is in the context, surrounded by other facts.
| System | Isolated tok/q | Pool tok/q |
|---|---|---|
| Iranti | 20 | 20 |
| Shodh | 20 | 66 ↑ |
| Mem0 | 13 | 18 |
| Graphiti | 37 | 49 |
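The table's most telling ratio is how much each system's injection grows when the pool becomes shared. A quick derivation from the numbers above:

```python
# Pool-vs-isolated growth factor per system, from the table above.
tok = {  # system -> (isolated tok/q, pool tok/q)
    "Iranti":   (20, 20),
    "Shodh":    (20, 66),
    "Mem0":     (13, 18),
    "Graphiti": (37, 49),
}
growth = {name: pool / iso for name, (iso, pool) in tok.items()}
# Shodh's injection grows 3.3x under a shared pool; Iranti's stays flat.
```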
Iranti's attend-based injection returns only the entity facts relevant to the current query — 20 tokens whether the pool has 1 fact or 1,000. This is a consequence of structured entity+key addressing: the query maps to specific entity attributes, not a full-text search across all stored content.
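One way to picture the difference: structured addressing resolves a query to a single (entity, key) slot, while full-text recall scores every stored fact and returns the top matches verbatim. The sketch below is purely illustrative; `attend`, `recall`, and the example facts are hypothetical stand-ins, not the real Iranti or Shodh APIs.

```python
# Hypothetical pool: (entity, attribute) -> fact text.
store = {
    ("user", "favorite_color"): "The user's favorite color is teal.",
    ("user", "home_city"): "The user lives in Lagos.",
    # ... remaining facts share the same pool
}

def attend(entity: str, key: str) -> str:
    """Structured lookup: exactly one fact back, regardless of pool size."""
    return store[(entity, key)]

def recall(query: str, k: int = 3) -> list[str]:
    """Naive full-text recall: up to k full fact texts, by word overlap."""
    words = set(query.lower().split())
    scored = sorted(store.values(),
                    key=lambda t: -len(words & set(t.lower().split())))
    return scored[:k]
```

With `attend`, injection size is the length of one fact no matter how large `store` grows; with `recall`, it scales with k times the average fact length, which is the shape of the 20 vs 66 tok/query gap.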
Key findings
Iranti's structured attend-based injection returns only the relevant entity fact — 20 tok/query regardless of pool size.
Shodh's recall returns full memory text from the pool: 66 tok/query. Despite 92% accuracy, that injection volume collapses its efficiency score to 1.39.
Mem0 is lean (18 tok/query) but at 80% accuracy — efficiency score 4.44, second only to Iranti.
Graphiti returns ~49 tok/query of rephrased edge facts at 60% accuracy, yielding the worst efficiency score: 1.22.