# Context Economy
Iranti uses 37% fewer input tokens by turn 15 of a coding session than an agent that re-reads files on recall turns. Token counts are exact, measured via the Anthropic countTokens API rather than char/4 estimates.
## Token divergence over time
Cumulative input tokens at each turn. Both arms are identical through turn 7. The gap widens monotonically from turn 8 as recall turns accumulate.
## Per-turn results
Full 15-turn breakdown. Tokens are cumulative input tokens at the start of each turn.
| Turn | Phase | No memory | With Iranti | Saved |
|---|---|---|---|---|
| 1 | establishment | 1,081 | 1,081 | — |
| 2 | establishment | 1,556 | 1,556 | — |
| 3 | establishment | 1,969 | 1,969 | — |
| 4 | establishment | 2,379 | 2,379 | — |
| 5 | establishment | 2,779 | 2,779 | — |
| 6 | establishment | 3,252 | 3,252 | — |
| 7 | establishment | 3,781 | 3,781 | — |
| 8 | recall | 4,220 | 3,980 | 6% |
| 9 | recall | 4,730 | 4,163 | 12% |
| 10 | recall | 5,236 | 4,355 | 17% |
| 11 | recall | 5,802 | 4,542 | 22% |
| 12 | recall | 6,256 | 4,769 | 24% |
| 13 | recall | 6,711 | 4,981 | 26% |
| 14 | recall | 8,043 | 5,362 | 33% |
| 15 | recall | 8,949 | 5,677 | 37% |
## What this measures
Every multi-turn AI coding session accumulates tokens. When an agent needs a specific value from an earlier file — a config key, a function signature, a database schema — it either keeps the file in context (inflating token count every turn) or re-reads it (adding a full tool result to the window). Either way, context grows faster than it needs to.
Iranti's inject blocks are the alternative: instead of the full file (~300–600 tok), the agent receives a compact structured fact (~50–150 tok) with exactly the value it needed. The difference compounds across every recall turn.
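As a sanity check, the "Saved" column in the table above can be recomputed from the cumulative token counts. A minimal sketch (`savedPercent` is a hypothetical helper, not part of Iranti):

```typescript
// Recompute per-turn savings from cumulative input tokens:
// saved % = (noMemory - withIranti) / noMemory, rounded to whole percent.
function savedPercent(noMemory: number, withIranti: number): number {
  return Math.round(((noMemory - withIranti) / noMemory) * 100);
}

console.log(savedPercent(4220, 3980)); // 6  -- turn 8
console.log(savedPercent(8949, 5677)); // 37 -- turn 15
```

Because the counts are cumulative, the percentage itself grows each turn even though the per-turn delta is roughly constant.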
- **No memory:** the agent re-reads the relevant source file on recall turns. Each file read adds the full file content as a tool result to the messages array, and all prior tool results also accumulate, so the window grows with every turn.
- **With Iranti:** the agent receives a compact inject block on recall turns. Iranti's identity-first retrieval returns only the needed fact; the inject block is ~50–150 tokens vs. ~300–600 for a file re-read.
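Since every recall turn's addition stays in the window for the rest of the session, the gap between the two arms grows by roughly the per-turn difference each turn. A toy model using illustrative midpoint sizes (not the measured values above; `tokenGapAfter` is a hypothetical helper):

```typescript
// Illustrative midpoints of the ranges quoted above, not measured values.
const FILE_REREAD = 450; // midpoint of ~300-600 tokens
const INJECT_BLOCK = 100; // midpoint of ~50-150 tokens

// Each recall turn's tool result persists in every later turn's input,
// so the token gap compounds by the per-turn difference.
function tokenGapAfter(recallTurns: number): number {
  return recallTurns * (FILE_REREAD - INJECT_BLOCK);
}

console.log(tokenGapAfter(8)); // 2800
```

Eight recall turns at these midpoints predict a ~2,800-token gap, the same order as the measured 3,272-token gap at turn 15 (8,949 − 5,677).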
## How we measured it
The benchmark uses the Anthropic client.beta.messages.countTokens() API to get exact token counts for the full messages array at each turn — no generation, no sampling, no char/4 approximation. Both arms run concurrently via Promise.all() per turn for a fair comparison.
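The per-turn measurement might look like the sketch below (assumed shapes; the real harness may differ). The count function is injected so the loop stays independent of the SDK client:

```typescript
// Assumed message and counter shapes -- the real benchmark may differ.
type Message = { role: "user" | "assistant"; content: string };
type CountFn = (messages: Message[]) => Promise<number>;

// Count both arms concurrently so they stay in lockstep per turn.
async function measureTurn(
  noMemoryMessages: Message[],
  withIrantiMessages: Message[],
  count: CountFn,
): Promise<{ noMemory: number; withIranti: number }> {
  const [noMemory, withIranti] = await Promise.all([
    count(noMemoryMessages),
    count(withIrantiMessages),
  ]);
  return { noMemory, withIranti };
}

// With the Anthropic SDK, `count` would wrap the countTokens endpoint --
// the call shape follows the client.beta.messages.countTokens() mention
// above; check it against your SDK version:
//
//   const count: CountFn = async (messages) => {
//     const res = await client.beta.messages.countTokens({
//       model: "claude-sonnet-4-6",
//       messages,
//     });
//     return res.input_tokens;
//   };
```

Injecting the counter also makes the loop trivially testable with a fake counter, without network access or an API key.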
- 7 synthetic TypeScript/SQL files covering a fictional auth system (~300–600 tok each)
- 15-turn DebugAuth session: 7 establishment turns, then 8 recall turns
- Establishment turns are identical across both arms, so there is no divergence until recall
- Recall turns: NO_MEMORY re-reads the relevant file; WITH_IRANTI uses a pre-computed v0.3.11 inject block
- Model: claude-sonnet-4-6 (token counts are model-specific)
- Context window: 200k tokens; the session reaches 4.5% (NO_MEMORY) vs. 2.8% (WITH_IRANTI) of it by turn 15
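The context-window fractions in the last bullet follow directly from the turn-15 counts. A quick arithmetic check (`windowPct` is a hypothetical helper, not benchmark code):

```typescript
// Fraction of a 200k-token context window used, to one decimal place.
const CONTEXT_WINDOW = 200_000;

function windowPct(tokens: number): number {
  return Math.round((tokens / CONTEXT_WINDOW) * 1000) / 10;
}

console.log(windowPct(8_949)); // 4.5 -- NO_MEMORY at turn 15
console.log(windowPct(5_677)); // 2.8 -- WITH_IRANTI at turn 15
```

Neither arm comes close to filling the window in a 15-turn session; the point of the benchmark is the rate of growth, which determines how long a session can run before context pressure forces truncation.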