Benchmark B6

Text Ingestion Pipeline
8/8 facts correct across 2 entities. Fixed in v0.2.16.

B6 tests iranti_ingest: give Iranti a text passage and have it extract and store facts automatically. Earlier versions showed contamination (v0.2.12) and a silent chunker failure (v0.2.14). In v0.2.16, the pipeline works correctly — 8 out of 8 facts extracted, zero cross-entity contamination.

Executed 2026-03-21 · n=2 entities, 8 facts · Fixed in v0.2.16

Results at a glance

  • 8/8 — Facts correctly extracted across 2 researcher entities
  • 2/2 — Entities with fully correct extraction (4 facts each)
  • 0 — Cross-entity contamination errors (was 3/3 in v0.2.12)

Finding: iranti_ingest now works cleanly with a real AI provider. Facts extracted from each passage match the source — no bleed between entities, no hallucinated values, no silent failures.

What this measures

Structured writes via iranti_write require the agent to explicitly name the key and provide the value. That works well — other benchmarks confirm it. But it shifts the extraction burden onto the calling agent. A more productive interface lets an agent hand Iranti a block of text and have the storage layer extract the facts automatically.

That is what iranti_ingest does. B6 tests whether this automatic extraction pipeline works reliably. The input is a short, unambiguous passage about a fictional researcher containing exactly 4 facts. A working ingest pipeline returns 4/4 correct per entity.
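The contrast between the two write paths can be sketched with a toy in-memory KB. The function names mirror the real tools, but the bodies are illustrative stand-ins: the real extractor is an AI provider, not the single regex used below, and the actual call signatures may differ.

```python
import re

# Toy in-memory KB: (entity, key) -> value
KB: dict[tuple[str, str], str] = {}

def iranti_write(entity: str, key: str, value: str) -> None:
    """Structured write: the caller names the key and supplies the value."""
    KB[(entity, key)] = value

def iranti_ingest(entity: str, text: str) -> int:
    """Ingest: extract facts from free text, then store them.
    Stand-in extraction: one hard-coded pattern for publication counts."""
    stored = 0
    m = re.search(r"published (\d+) papers", text)
    if m:
        iranti_write(entity, "publication_count", m.group(1))
        stored += 1
    return stored

# Structured path: extraction burden on the caller.
iranti_write("elena", "institution", "Stanford AI Lab")

# Ingest path: the storage layer does the extraction.
extracted = iranti_ingest("elena", "She has published 22 papers to date.")
```

The point of the sketch is the shift in responsibility: with `iranti_write` the agent must already have parsed the fact; with `iranti_ingest` it hands over raw text and the pipeline decides what to store.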

In v0.2.16, with a real AI provider, both entities returned 4/4. The full version history below shows how we arrived at a clean result.

Honest version history

B6 ran against three versions. The first two showed serious problems. The third is clean.

Version | Result | Score | Notes
v0.2.12 | FAIL — contamination | 1/4 | 3/3 wrong values matched existing KB entries
v0.2.14 | FAIL — chunker silent failure | 0/4 | Chunker defect — ingest returned no extractions
v0.2.16 | PASS — fixed | 8/8 | Both entities fully correct, no contamination

Limitation: v0.2.12 — contamination pattern. 3 out of 4 wrong values each matched a real entry from a different entity already in the KB. Confidence scores reported 87–92 on all 4 extractions, including the 3 wrong ones — making automated validation unreliable. The root cause was a mock provider artifact interacting with KB context.

Limitation: v0.2.14 — chunker silent failure. A defect in the chunker caused the pipeline to return zero extractions without surfacing an error. The calling agent received an empty response and had no reliable way to detect the failure.

Finding: v0.2.16 — fixed. Both the contamination root cause (mock provider interaction) and the chunker defect were resolved between v0.2.14 and v0.2.16. The pipeline now runs cleanly against a real AI provider.
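The v0.2.14 failure mode — an empty result with no error — is detectable at the call site. Below is a minimal caller-side guard, assuming a hypothetical response shape (`{"extractions": [...]}`); the real iranti_ingest return type may differ.

```python
class IngestError(RuntimeError):
    """Raised when ingest silently returns nothing for a non-trivial passage."""

def checked_ingest(ingest_fn, entity: str, text: str, min_facts: int = 1) -> list:
    """Call an ingest function and fail loudly on an empty extraction set."""
    response = ingest_fn(entity=entity, text=text)
    extractions = response.get("extractions", [])
    if text.strip() and len(extractions) < min_facts:
        raise IngestError(
            f"ingest returned {len(extractions)} extractions for a "
            f"{len(text)}-char passage; expected at least {min_facts}"
        )
    return extractions

# Stand-in that reproduces the v0.2.14 symptom: empty result, no error.
def broken_ingest(entity: str, text: str) -> dict:
    return {"extractions": []}
```

A guard like this would have converted the v0.2.14 silent failure into an explicit exception the calling agent could act on.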

How it works now — source to extraction

Entity 1 (Dr. Elena Vasquez). All 4 facts extracted and stored correctly. No values from other KB entries.

Source text (input to iranti_ingest)

"Dr. Elena Vasquez is a researcher at Stanford AI Lab. She has published 22 papers to date. Prior to her current role, she worked at Google Brain from 2016 to 2018. Her research focuses on out-of-distribution generalization."

Clear, unambiguous — 4 distinct facts
What iranti_ingest stored
  • Institution: Stanford AI Lab
  • Publications: 22
  • previous_employer.employer: Google Brain
  • Focus: out-of-distribution generalization
All 4 facts extracted correctly. No contamination.
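Scoring an entity reduces to an exact comparison of stored values against ground truth from the source passage. The harness below is our own sketch, not part of Iranti, and the flat key names are illustrative normalizations of the facts listed above.

```python
# Ground truth transcribed from the source passage for entity 1.
ground_truth = {
    "institution": "Stanford AI Lab",
    "publication_count": "22",
    "previous_employer.employer": "Google Brain",
    "research_focus": "out-of-distribution generalization",
}

# What the pipeline stored (here: a fully correct run, as in v0.2.16).
stored = {
    "institution": "Stanford AI Lab",
    "publication_count": "22",
    "previous_employer.employer": "Google Brain",
    "research_focus": "out-of-distribution generalization",
}

# One point per key whose stored value exactly matches ground truth.
correct = sum(stored.get(k) == v for k, v in ground_truth.items())
score = f"{correct}/{len(ground_truth)}"
```

A contaminated run (v0.2.12-style) would show mismatched values and score 1/4; a silent chunker failure (v0.2.14-style) would leave `stored` empty and score 0/4.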

Full extraction results — both entities

Each row shows the ground truth from the source text and what Iranti stored. All 8 facts correct across both entities.

Entity 1 — Dr. Elena Vasquez (Stanford AI Lab)
Fact | Ground truth | What Iranti stored | Result
Institution | Stanford AI Lab | Stanford AI Lab | ✓
Publication count | 22 | 22 | ✓
Previous employer | Google Brain (2016–2018) | Google Brain / 2016 / 2018 | ✓
Research focus | out-of-distribution generalization | out-of-distribution generalization | ✓
Total | 4 facts | 4 correct | 4/4

Entity 2 — Dr. Marcus Chen (MIT CSAIL)
Fact | Ground truth | What Iranti stored | Result
Institution | MIT CSAIL | MIT CSAIL | ✓
Publication count | 17 | 17 | ✓
Previous employer | DeepMind (2019–2021) | DeepMind / 2019 / 2021 | ✓
Research focus | causal inference | causal inference | ✓
Total | 4 facts | 4 correct | 4/4

Important behavioral note — compound fact decomposition

The Librarian decomposes compound facts into sub-keys. A fact like "Google Brain 2016–2018" is not stored as a single string under previous_employer. Instead, three separate keys are written:

Stored keys (sub-key decomposition)
  • previous_employer.employer → "Google Brain"
  • previous_employer.employment_start_year → "2016"
  • previous_employer.employment_end_year → "2018"

Querying previous_employer (the parent key) returns nothing. Query the sub-keys directly.

Limitation: Design your queries for sub-keys. If you ingest a compound fact and then query the parent key (e.g., previous_employer), the query will return nothing. This is by design — the Librarian favors precision over convenience. Query previous_employer.employer instead. If you mix iranti_ingest-written entries with structured iranti_write entries for the same entity, ensure the key shapes are consistent.
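The decomposition itself can be illustrated with a small stand-in. The real Librarian performs this split with an AI provider; the regex below is only a sketch that reproduces the stored-key shape shown above.

```python
import re

def decompose_employment(parent_key: str, value: str) -> dict[str, str]:
    """Split a compound 'Employer YYYY–YYYY' value into typed sub-keys.
    Non-compound values are stored under the parent key unchanged."""
    m = re.match(
        r"(?P<employer>.+?)\s+(?P<start>\d{4})\s*[–-]\s*(?P<end>\d{4})$", value
    )
    if not m:
        return {parent_key: value}
    return {
        f"{parent_key}.employer": m.group("employer"),
        f"{parent_key}.employment_start_year": m.group("start"),
        f"{parent_key}.employment_end_year": m.group("end"),
    }

keys = decompose_employment("previous_employer", "Google Brain 2016–2018")
```

A simple value like "Stanford AI Lab" passes through unsplit, while the compound employment fact fans out into the three sub-keys the tables above show.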

Honest limitations

Limitation: n=2 entities, single test session. Both entities returned 4/4, but the sample size is small. Behavior at scale — larger KBs, longer passages, more entity types — is not yet characterized.

Limitation: Real AI provider required. The v0.2.12 contamination pattern was a testing artifact of the mock provider interacting with KB context. The v0.2.16 results were produced with a real AI provider. iranti_ingest should not be used with a mock provider in production contexts.

Limitation: Sub-key decomposition can break queries. Compound facts are always decomposed into sub-keys. If callers query the parent key, they get nothing. This is consistent behavior but requires awareness at the query layer.

Note: Prior versions showed serious problems. v0.2.12 and v0.2.14 both produced wrong results through different mechanisms. The fix landed in v0.2.16. Version pinning matters for this feature.
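The query-layer awareness the sub-key limitation calls for can be packaged into a small helper: on a parent-key miss, fall back to a prefix scan over its sub-keys. This assumes a flat key-to-value view of an entity's stored entries; the real query API may differ.

```python
def query_with_subkey_fallback(entries: dict[str, str], key: str) -> dict[str, str]:
    """Return {key: value} on an exact hit; otherwise return every
    'key.<subkey>' entry, so parent-key queries still surface data."""
    if key in entries:
        return {key: entries[key]}
    prefix = key + "."
    return {k: v for k, v in entries.items() if k.startswith(prefix)}

# Entries shaped like the stored keys from the decomposition example.
entries = {
    "previous_employer.employer": "Google Brain",
    "previous_employer.employment_start_year": "2016",
    "previous_employer.employment_end_year": "2018",
}
hit = query_with_subkey_fallback(entries, "previous_employer")
```

With a helper like this in the calling agent, a parent-key query degrades gracefully into the full set of sub-keys instead of an empty result.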

Key findings

Finding: iranti_ingest works correctly in v0.2.16 with a real AI provider. 8 out of 8 facts extracted correctly across 2 entities. Zero cross-entity contamination. The pipeline is now ready for evaluation in production-like conditions.

Finding: Compound facts are decomposed — design queries accordingly. The Librarian splits compound values into typed sub-keys. This is intentional precision. Query the sub-keys, not the parent key, and ensure consistency if mixing ingest-written and structured-write entries.

Finding: The contamination pattern in v0.2.12 was a testing artifact. Using a mock provider introduced the contamination pattern. Real provider runs do not exhibit it. The KB storage layer itself was reliable — the problem was upstream in extraction.

Finding: Write durability works. What gets stored via iranti_ingest is correct and retrievable. The pipeline now composes cleanly with the rest of the KB surface.
Raw data

Full trial execution records, source text, extraction output, and methodology notes are available in the benchmarking repository.