Entity Fact Retrieval under Distractor Density
Does Iranti's exact entity/key lookup maintain precision as distractor entity count grows — a regime where context-reading is expected to degrade? A Needle-in-a-Haystack variant adapted for structured fact retrieval.
Results at a glance
Accuracy vs scale
Both arms at ceiling through N=1000; baseline infeasible at N=5000
Both arms at ceiling through N=1000. At N=5000 (~276k tokens), the baseline becomes infeasible — the haystack document exceeds Claude's 200k context window. Iranti returned 4/4 facts via exact key lookup. The dashed region marks where baseline context-reading can no longer run.
⚑ N=20+adversarial: wrong facts injected for needle entities. Baseline unaffected at this scale.
Methodology
What this measures
In production multi-agent systems, agents accumulate many facts about many entities across their lifetime. If those facts live in context, retrieval precision degrades as transcript length grows — the model has more to sort through, with more surface area for confusion. If facts live in a structured KB, retrieval is an O(1) exact lookup unaffected by how many other entities exist.
B1 tests this directly: we embed two target entities ("needle" entities) among N distractor entities, ask 10 questions about the needle entities, and compare context-reading accuracy against Iranti's iranti_query(entity, key) lookup.
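A minimal sketch of the haystack construction this implies, assuming a simple line-per-fact format and midpoint placement for the needles (the exact blurb format and needle position used in B1 are not specified here; `make_haystack` and its details are illustrative, and Bob's fact value is a placeholder):

```python
import random

def make_haystack(n_distractors, needles, seed=0):
    """Build a haystack document: N distractor entity blurbs plus the
    needle entities' facts inserted at a fixed (midpoint) position."""
    rng = random.Random(seed)
    blurbs = [
        f"researcher/distractor_{i} | affiliation: Lab {rng.randint(1, 99)}"
        for i in range(n_distractors)
    ]
    mid = n_distractors // 2
    needle_blurbs = [
        f"{entity} | {key}: {value}"
        for entity, facts in needles.items()
        for key, value in facts.items()
    ]
    return "\n".join(blurbs[:mid] + needle_blurbs + blurbs[mid:])

needles = {
    "researcher/alice_chen": {"affiliation": "MIT Computer Science"},
    "researcher/bob_okafor": {"affiliation": "Example University"},  # placeholder value
}
doc = make_haystack(1000, needles)
print(len(doc.splitlines()))  # 1002 lines: 1000 distractors + 2 needle facts
```

The context-reading arm receives `doc` plus the questions; the Iranti arm never sees `doc` at query time and answers via `iranti_query(entity, key)` alone.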
Inspired by Greg Kamradt's Needle-in-a-Haystack (2023) and RULER (Hsieh et al., 2024). Our adaptation replaces the sentence needle with a structured entity/key fact and replaces document-position variation with entity-count variation.
Conditions
Tested scales
| Condition | Haystack size | Notes |
|---|---|---|
| N=5 | ~400 tok | Short haystack — high signal-to-noise |
| N=20 | ~1.6k tok | Medium haystack |
| N=20+adversarial | ~1.6k tok | Wrong values injected for needle entities to test confound resistance |
| N=100 | ~8k tok | Larger haystack — both arms still at ceiling |
| N=500 | ~28k tok | Long context — both arms still at ceiling |
| N=1000 | ~57k tok | Null differential confirmed — both arms still at ceiling |
| N=5000 | ~276k tok | First positive differential: Iranti 4/4, baseline infeasible (document exceeds 200k context window) |
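The infeasibility threshold follows from simple arithmetic. Assuming the per-entity token cost implied by the table (~57k tokens at N=1000, i.e. roughly 57 tokens per distractor — an approximation, not a measured constant), a quick feasibility check looks like:

```python
# Rough per-entity token cost implied by the table above (~57k tokens at
# N=1000). Approximation for illustration, not a measured constant.
TOKENS_PER_ENTITY = 57
CONTEXT_WINDOW = 200_000  # Claude's context window per the text above

def haystack_fits(n):
    """Can a haystack of n distractor entities fit in the context window?"""
    return n * TOKENS_PER_ENTITY <= CONTEXT_WINDOW

for n in (100, 1000, 5000):
    print(n, haystack_fits(n))  # 100 True / 1000 True / 5000 False
```

At N=5000 the estimate (~285k tokens, vs. the observed ~276k) exceeds the window, which is why the baseline arm cannot run at that scale while the KB lookup is unaffected.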
Test data
The needle entities
Two fictional researchers with four facts each. Embedded at a fixed position in every haystack.
Write arm trial results
Full ingest → retrieve cycle
Both entities written to KB via iranti_write, then retrieved via iranti_query with no haystack document in context. Tests the full pipeline — not just retrieval from pre-existing KB data. Each cell below represents one trial.
| Entity | affiliation | publication_count | previous_employer | research_focus |
|---|---|---|---|---|
| researcher/alice_chen | ✓ | ✓ | ✓ | ✓ |
| researcher/bob_okafor | ✓ | ✓ | ✓ | ✓ |
All 8 cells confirmed correct. Zero hallucinations. Zero cross-entity contamination (alice facts never attributed to bob, and vice versa).
| Trial | Entity | Key | Result |
|---|---|---|---|
| IB1 | researcher/alice_chen | affiliation | ✓ correct |
| IB2 | researcher/alice_chen | publication_count | ✓ correct |
| IB3 | researcher/alice_chen | previous_employer | ✓ correct |
| IB4 | researcher/alice_chen | research_focus | ✓ correct |
| IB5 | researcher/bob_okafor | affiliation | ✓ correct |
| IB6 | researcher/bob_okafor | publication_count | ✓ correct |
| IB7 | researcher/bob_okafor | previous_employer | ✓ correct |
| IB8 | researcher/bob_okafor | research_focus | ✓ correct |
Write arm accuracy: 8/8 (100%). Recall: 8/8. Precision: 8/8.
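The ingest → retrieve cycle the IB trials exercise can be sketched with an in-memory stand-in for the KB. `FakeIranti` below is hypothetical (the real `iranti_write`/`iranti_query` signatures and return shapes may differ), and the fact values are placeholders:

```python
class FakeIranti:
    """In-memory stand-in for the Iranti KB, illustrating the
    write -> query cycle. Real tool signatures may differ."""
    def __init__(self):
        self._kb = {}

    def iranti_write(self, entity, key, value):
        self._kb[(entity, key)] = value

    def iranti_query(self, entity, key):
        # Exact (entity, key) lookup: a dict probe, independent of KB size.
        value = self._kb.get((entity, key))
        return {"found": value is not None, "entity": entity, "key": key, "value": value}

kb = FakeIranti()
facts = {
    "researcher/alice_chen": ["affiliation", "publication_count", "previous_employer", "research_focus"],
    "researcher/bob_okafor": ["affiliation", "publication_count", "previous_employer", "research_focus"],
}
for entity, keys in facts.items():
    for key in keys:
        kb.iranti_write(entity, key, f"<{key} for {entity}>")  # placeholder values

results = [kb.iranti_query(e, k)["found"] for e, ks in facts.items() for k in ks]
print(sum(results), "/", len(results))  # 8 / 8
```

Because the lookup is keyed on the exact `(entity, key)` pair, retrieval cost and precision do not depend on how many other entities the KB holds — the property the distractor-density sweep tests.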
The metadata advantage
Provenance, not just value
Every iranti_query result includes confidence, source, timestamp, and contested status. Context-reading returns a value and nothing else. The metadata is what allows downstream agents to reason about reliability — without needing to re-read the source document.
```json
{
  "found": true,
  "entity": "researcher/alice_chen",
  "key": "affiliation",
  "value": "MIT Computer Science",
  "confidence": 0.98,
  "source": "agent/site_main",
  "validFrom": "2026-03-20T14:30:00Z",
  "contested": false
}
```
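A downstream agent can gate on that metadata directly. A minimal sketch, assuming the result shape shown above (the `usable` helper and the 0.9 confidence floor are illustrative, not part of Iranti's API):

```python
import json

# The query result shape shown above.
record = json.loads(
    '{"found": true, "entity": "researcher/alice_chen", '
    '"key": "affiliation", "value": "MIT Computer Science", '
    '"confidence": 0.98, "source": "agent/site_main", '
    '"validFrom": "2026-03-20T14:30:00Z", "contested": false}'
)

def usable(rec, min_confidence=0.9):
    """Hypothetical downstream gate: trust a fact only if it was found,
    is uncontested, and meets a confidence floor."""
    return rec["found"] and not rec["contested"] and rec["confidence"] >= min_confidence

print(usable(record))  # True
```

Context-reading offers no equivalent gate: with only a bare value, a downstream agent cannot distinguish a high-confidence fact from a contested or stale one without re-reading the source.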
Analysis
Key findings and limitations
- iranti_search (semantic) vs iranti_query (exact): the semantic path returned the target fact in 5th position, not first. For structured fact retrieval, exact key lookup is the correct tool.
- Every iranti_query result includes confidence, source, validFrom, and contested fields. Context-reading returns none of this metadata.
- Full trial execution records, baseline runs, dataset definitions, and statistical notes are in the benchmarking repository.