Accuracy is half the story.
Token cost is the other half.
All 20 facts are written to a single shared namespace; 40 queries must each find the right fact among everything stored. The efficiency score compounds accuracy and injection size: a system that returns too much context loses points even if it finds the right answer.
The pool test
C1 tests each fact in isolation — one fact per namespace, zero competition for retrieval. C2 is harder: all 20 facts are written to a single shared namespace, and each query must return the one relevant fact from the full pool.
This tests how well each system concentrates its injection on relevant content versus returning bulk context. In production, a memory pool grows over time — systems that return more context with each query impose higher token cost at inference time.
For Iranti, the shared namespace is a single project/user context. For Shodh, a single user ID. For Mem0, a single user ID with one Chroma collection. For Graphiti, a single group_id containing all episodes.
Efficiency formula
The efficiency score is a compound metric that penalizes token bloat:
Tokens are counted as whitespace-delimited words in the returned context string. Per-query averages are computed across all 40 queries.
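The exact formula is not stated above, but all four reported scores are consistent with efficiency = accuracy (%) divided by average tokens per query, with Iranti landing at the 5.0 maximum. The sketch below encodes that inferred formula together with the whitespace token count; treat the formula as a reconstruction from the published numbers, not an official definition.

```python
def count_tokens(context: str) -> int:
    """Tokens are whitespace-delimited words in the returned context string."""
    return len(context.split())

def efficiency(accuracy_pct: float, tokens_per_query: float) -> float:
    # Inferred: accuracy (%) / avg tok/query reproduces every reported score.
    return accuracy_pct / tokens_per_query

results = {
    "Iranti":   efficiency(100, 20),  # 5.0 (the normalization maximum)
    "Shodh":    efficiency(92, 66),   # ~1.39
    "Mem0":     efficiency(80, 18),   # ~4.44
    "Graphiti": efficiency(60, 49),   # ~1.22
}
```

The metric is deliberately harsh on bulk injection: doubling the returned context halves the score even at perfect accuracy.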
Three-axis comparison
[Chart: per-system bars, where bar length = relative token cost; efficiency scores normalized to a 5.0 maximum.]
Why Shodh's token cost collapses efficiency
In C1 (isolated namespaces), Shodh returned 20 tokens per query — identical to Iranti. This is because each namespace contained exactly one fact, so recall returned exactly that fact's text.
In C2 (shared pool of 20 facts), Shodh's token count jumps to 66 tokens per query. Shodh's recall implementation returns the full text of each matched memory without summarization or truncation. When the pool contains 20 facts and recall is tuned to return top-k results, the returned context includes multiple full fact texts.
At 66 tok/query across 40 queries, Shodh injects 2,640 tokens of memory context into a typical session — versus Iranti's 800. At scale, this differential compounds across every turn that involves memory retrieval.
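The session totals above are straight multiplication, and one plausible decomposition of Shodh's 66-token average follows from its top-k behavior. The k = 3 and ~22-token fact length below are illustrative assumptions that happen to reproduce the observed average, not measured values:

```python
# Session-level injection cost: per-query tokens x queries per session.
SHODH_TOK, IRANTI_TOK, QUERIES = 66, 20, 40

shodh_session = SHODH_TOK * QUERIES    # 2,640 tokens
iranti_session = IRANTI_TOK * QUERIES  # 800 tokens

# Illustrative decomposition (assumed, not measured): if recall returns
# the top-k matches verbatim and each stored fact averages ~22 words,
# k = 3 yields exactly the observed 66 tok/query.
k, avg_fact_tokens = 3, 22
assert k * avg_fact_tokens == SHODH_TOK
```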
Shodh still finds the right answer 92% of the time; the accuracy degradation from isolated to pool is small (100% → 92%). The problem is not retrieval quality but injection volume: the correct fact is in the context, surrounded by other facts.
| System | Isolated tok/q | Pool tok/q |
|---|---|---|
| Iranti | 20 | 20 |
| Shodh | 20 | 66 ↑ |
| Mem0 | 13 | 18 |
| Graphiti | 37 | 49 |
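The table's most telling ratio is how much each system's injection grows when the pool becomes shared. A quick derivation from the numbers above:

```python
# Pool-vs-isolated growth factor per system, from the table above.
tok = {  # system -> (isolated tok/q, pool tok/q)
    "Iranti":   (20, 20),
    "Shodh":    (20, 66),
    "Mem0":     (13, 18),
    "Graphiti": (37, 49),
}
growth = {name: pool / iso for name, (iso, pool) in tok.items()}
# Shodh's injection grows 3.3x under a shared pool; Iranti's stays flat.
```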
Iranti's attend-based injection returns only the entity facts relevant to the current query — 20 tokens whether the pool has 1 fact or 1,000. This is a consequence of structured entity+key addressing: the query maps to specific entity attributes, not a full-text search across all stored content.
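One way to picture the difference: structured addressing resolves a query to a single (entity, key) slot, while full-text recall scores every stored fact and returns the top matches verbatim. The sketch below is purely illustrative; `attend`, `recall`, and the example facts are hypothetical stand-ins, not the real Iranti or Shodh APIs.

```python
# Hypothetical pool: (entity, attribute) -> fact text.
store = {
    ("user", "favorite_color"): "The user's favorite color is teal.",
    ("user", "home_city"): "The user lives in Lagos.",
    # ... remaining facts share the same pool
}

def attend(entity: str, key: str) -> str:
    """Structured lookup: exactly one fact back, regardless of pool size."""
    return store[(entity, key)]

def recall(query: str, k: int = 3) -> list[str]:
    """Naive full-text recall: up to k full fact texts, by word overlap."""
    words = set(query.lower().split())
    scored = sorted(store.values(),
                    key=lambda t: -len(words & set(t.lower().split())))
    return scored[:k]
```

With `attend`, injection size is the length of one fact no matter how large `store` grows; with `recall`, it scales with k times the average fact length, which is the shape of the 20 vs 66 tok/query gap.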
Key findings
Iranti's structured attend-based injection returns only the relevant entity fact — 20 tok/query regardless of pool size.
Shodh's recall returns full memory text from the pool: 66 tok/query. Despite 92% accuracy, that injection volume collapses its efficiency score to 1.39.
Mem0 is lean (18 tok/query) but at 80% accuracy — efficiency score 4.44, second only to Iranti.
Graphiti returns ~49 tok/query of rephrased edge facts at 60% accuracy, yielding the worst efficiency score: 1.22.