Benchmark B6

Text Ingestion Pipeline
8/8 facts correct across 2 entities. Fixed in v0.2.16.

B6 tests iranti_ingest: give Iranti a text passage and have it extract and store facts automatically. Earlier versions showed contamination (v0.2.12) and a silent chunker failure (v0.2.14). In v0.2.16, the pipeline works correctly — 8 out of 8 facts extracted, zero cross-entity contamination.

Executed 2026-03-21 · n=2 entities, 8 facts · Fixed in v0.2.16

Results at a glance

  • 8/8 — Facts correctly extracted across 2 researcher entities
  • 2/2 — Entities with fully correct extraction (4 facts each)
  • 0 — Cross-entity contamination errors (was 3/3 in v0.2.12)

Finding: iranti_ingest now works cleanly with a real AI provider. Facts extracted from each passage match the source — no bleed between entities, no hallucinated values, no silent failures.

What this measures

Structured writes via iranti_write require the agent to explicitly name the key and provide the value. That works well — other benchmarks confirm it. But it shifts the extraction burden onto the calling agent. A more productive interface lets an agent hand Iranti a block of text and have the storage layer extract the facts automatically.

That is what iranti_ingest does. B6 tests whether this automatic extraction pipeline works reliably. The input is a short, unambiguous passage about a fictional researcher containing exactly 4 facts. A working ingest pipeline returns 4/4 correct per entity.
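The contrast between the two write paths can be sketched with a toy in-memory KB. The function names mirror the real tools, but the bodies are illustrative stand-ins: the real extractor is an AI provider, not the single regex used below, and the actual call signatures may differ.

```python
import re

# Toy in-memory KB: (entity, key) -> value
KB: dict[tuple[str, str], str] = {}

def iranti_write(entity: str, key: str, value: str) -> None:
    """Structured write: the caller names the key and supplies the value."""
    KB[(entity, key)] = value

def iranti_ingest(entity: str, text: str) -> int:
    """Ingest: extract facts from free text, then store them.
    Stand-in extraction: one hard-coded pattern for publication counts."""
    stored = 0
    m = re.search(r"published (\d+) papers", text)
    if m:
        iranti_write(entity, "publication_count", m.group(1))
        stored += 1
    return stored

# Structured path: extraction burden on the caller.
iranti_write("elena", "institution", "Stanford AI Lab")

# Ingest path: the storage layer does the extraction.
extracted = iranti_ingest("elena", "She has published 22 papers to date.")
```

The point of the sketch is the shift in responsibility: with `iranti_write` the agent must already have parsed the fact; with `iranti_ingest` it hands over raw text and the pipeline decides what to store.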

In v0.2.16, with a real AI provider, both entities returned 4/4. The full version history below shows how we arrived at a clean result.

Honest version history

B6 ran against three versions. The first two showed serious problems. The third is clean.

Version | Result | Score | Notes
v0.2.12 | FAIL — contamination | 1/4 | 3/3 wrong values matched existing KB entries
v0.2.14 | FAIL — chunker silent failure | 0/4 | Chunker defect — ingest returned no extractions
v0.2.16 | PASS — fixed | 8/8 | Both entities fully correct, no contamination

Limitation: v0.2.12 — contamination pattern. 3 out of 4 wrong values each matched a real entry from a different entity already in the KB. Confidence scores reported 87–92 on all 4 extractions, including the 3 wrong ones — making automated validation unreliable. The root cause was a mock provider artifact interacting with KB context.

Limitation: v0.2.14 — chunker silent failure. A defect in the chunker caused the pipeline to return zero extractions without surfacing an error. The calling agent received an empty response and had no reliable way to detect the failure.

Finding: v0.2.16 — fixed. Both the contamination root cause (mock provider interaction) and the chunker defect were resolved between v0.2.14 and v0.2.16. The pipeline now runs cleanly against a real AI provider.
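The v0.2.14 failure mode — an empty result with no error — is detectable at the call site. Below is a minimal caller-side guard, assuming a hypothetical response shape (`{"extractions": [...]}`); the real iranti_ingest return type may differ.

```python
class IngestError(RuntimeError):
    """Raised when ingest silently returns nothing for a non-trivial passage."""

def checked_ingest(ingest_fn, entity: str, text: str, min_facts: int = 1) -> list:
    """Call an ingest function and fail loudly on an empty extraction set."""
    response = ingest_fn(entity=entity, text=text)
    extractions = response.get("extractions", [])
    if text.strip() and len(extractions) < min_facts:
        raise IngestError(
            f"ingest returned {len(extractions)} extractions for a "
            f"{len(text)}-char passage; expected at least {min_facts}"
        )
    return extractions

# Stand-in that reproduces the v0.2.14 symptom: empty result, no error.
def broken_ingest(entity: str, text: str) -> dict:
    return {"extractions": []}
```

A guard like this would have converted the v0.2.14 silent failure into an explicit exception the calling agent could act on.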

How it works now — source to extraction

Entity 1 (Dr. Elena Vasquez). All 4 facts extracted and stored correctly. No values from other KB entries.

Source text (input to iranti_ingest)

"Dr. Elena Vasquez is a researcher at Stanford AI Lab. She has published 22 papers to date. Prior to her current role, she worked at Google Brain from 2016 to 2018. Her research focuses on out-of-distribution generalization."

Clear, unambiguous — 4 distinct facts
What iranti_ingest stored
  • Institution: Stanford AI Lab
  • Publications: 22
  • previous_employer.employer: Google Brain
  • Focus: out-of-distribution generalization
All 4 facts extracted correctly. No contamination.
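Scoring an entity reduces to an exact comparison of stored values against ground truth from the source passage. The harness below is our own sketch, not part of Iranti, and the flat key names are illustrative normalizations of the facts listed above.

```python
# Ground truth transcribed from the source passage for entity 1.
ground_truth = {
    "institution": "Stanford AI Lab",
    "publication_count": "22",
    "previous_employer.employer": "Google Brain",
    "research_focus": "out-of-distribution generalization",
}

# What the pipeline stored (here: a fully correct run, as in v0.2.16).
stored = {
    "institution": "Stanford AI Lab",
    "publication_count": "22",
    "previous_employer.employer": "Google Brain",
    "research_focus": "out-of-distribution generalization",
}

# One point per key whose stored value exactly matches ground truth.
correct = sum(stored.get(k) == v for k, v in ground_truth.items())
score = f"{correct}/{len(ground_truth)}"
```

A contaminated run (v0.2.12-style) would show mismatched values and score 1/4; a silent chunker failure (v0.2.14-style) would leave `stored` empty and score 0/4.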

Full extraction results — both entities

Each row shows the ground truth from the source text and what Iranti stored. All 8 facts correct across both entities.

Entity 1 — Dr. Elena Vasquez (Stanford AI Lab)
Fact | Ground truth | What Iranti stored | Result
Institution | Stanford AI Lab | Stanford AI Lab | ✓
Publication count | 22 | 22 | ✓
Previous employer | Google Brain (2016–2018) | Google Brain / 2016 / 2018 | ✓
Research focus | out-of-distribution generalization | out-of-distribution generalization | ✓
Total | 4 facts | 4 correct | 4/4

Entity 2 — Dr. Marcus Chen (MIT CSAIL)
Fact | Ground truth | What Iranti stored | Result
Institution | MIT CSAIL | MIT CSAIL | ✓
Publication count | 17 | 17 | ✓
Previous employer | DeepMind (2019–2021) | DeepMind / 2019 / 2021 | ✓
Research focus | causal inference | causal inference | ✓
Total | 4 facts | 4 correct | 4/4

Important behavioral note — compound fact decomposition

The Librarian decomposes compound facts into sub-keys. A fact like "Google Brain 2016–2018" is not stored as a single string under previous_employer. Instead, three separate keys are written:

Stored keys (sub-key decomposition)
  • previous_employer.employer → "Google Brain"
  • previous_employer.employment_start_year → "2016"
  • previous_employer.employment_end_year → "2018"

Querying previous_employer (the parent key) returns nothing. Query the sub-keys directly.

Limitation: Design your queries for sub-keys. If you ingest a compound fact and then query the parent key (e.g., previous_employer), the query will return nothing. This is by design — the Librarian favors precision over convenience. Query previous_employer.employer instead. If you mix iranti_ingest-written entries with structured iranti_write entries for the same entity, ensure the key shapes are consistent.
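The decomposition itself can be illustrated with a small stand-in. The real Librarian performs this split with an AI provider; the regex below is only a sketch that reproduces the stored-key shape shown above.

```python
import re

def decompose_employment(parent_key: str, value: str) -> dict[str, str]:
    """Split a compound 'Employer YYYY–YYYY' value into typed sub-keys.
    Non-compound values are stored under the parent key unchanged."""
    m = re.match(
        r"(?P<employer>.+?)\s+(?P<start>\d{4})\s*[–-]\s*(?P<end>\d{4})$", value
    )
    if not m:
        return {parent_key: value}
    return {
        f"{parent_key}.employer": m.group("employer"),
        f"{parent_key}.employment_start_year": m.group("start"),
        f"{parent_key}.employment_end_year": m.group("end"),
    }

keys = decompose_employment("previous_employer", "Google Brain 2016–2018")
```

A simple value like "Stanford AI Lab" passes through unsplit, while the compound employment fact fans out into the three sub-keys the tables above show.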

Honest limitations

Limitation: n=2 entities, single test session. Both entities returned 4/4, but the sample size is small. Behavior at scale — larger KBs, longer passages, more entity types — is not yet characterized.

Limitation: Real AI provider required. The v0.2.12 contamination pattern was a testing artifact of the mock provider interacting with KB context. The v0.2.16 results were produced with a real AI provider. iranti_ingest should not be used with a mock provider in production contexts.

Limitation: Sub-key decomposition can break queries. Compound facts are always decomposed into sub-keys. If callers query the parent key, they get nothing. This is consistent behavior but requires awareness at the query layer.

Note: Prior versions showed serious problems. v0.2.12 and v0.2.14 both produced wrong results through different mechanisms. The fix landed in v0.2.16. Version pinning matters for this feature.
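The query-layer awareness the sub-key limitation calls for can be packaged into a small helper: on a parent-key miss, fall back to a prefix scan over its sub-keys. This assumes a flat key-to-value view of an entity's stored entries; the real query API may differ.

```python
def query_with_subkey_fallback(entries: dict[str, str], key: str) -> dict[str, str]:
    """Return {key: value} on an exact hit; otherwise return every
    'key.<subkey>' entry, so parent-key queries still surface data."""
    if key in entries:
        return {key: entries[key]}
    prefix = key + "."
    return {k: v for k, v in entries.items() if k.startswith(prefix)}

# Entries shaped like the stored keys from the decomposition example.
entries = {
    "previous_employer.employer": "Google Brain",
    "previous_employer.employment_start_year": "2016",
    "previous_employer.employment_end_year": "2018",
}
hit = query_with_subkey_fallback(entries, "previous_employer")
```

With a helper like this in the calling agent, a parent-key query degrades gracefully into the full set of sub-keys instead of an empty result.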

Key findings

Finding: iranti_ingest works correctly in v0.2.16 with a real AI provider. 8 out of 8 facts extracted correctly across 2 entities. Zero cross-entity contamination. The pipeline is now ready for evaluation in production-like conditions.

Finding: Compound facts are decomposed — design queries accordingly. The Librarian splits compound values into typed sub-keys. This is intentional precision. Query the sub-keys, not the parent key, and ensure consistency if mixing ingest-written and structured-write entries.

Finding: The contamination pattern in v0.2.12 was a testing artifact. Using a mock provider introduced the contamination pattern. Real provider runs do not exhibit it. The KB storage layer itself was reliable — the problem was upstream in extraction.

Finding: Write durability works. What gets stored via iranti_ingest is correct and retrievable. The pipeline now composes cleanly with the rest of the KB surface.
Raw data

Full trial execution records, source text, extraction output, and methodology notes are available in the benchmarking repository.