Benchmark B2

Cross-Session Memory Persistence

Can facts written to Iranti in one process session be retrieved accurately in a later, entirely separate session — with no in-context knowledge of what was written? This is the core promise of durable memory infrastructure.

Executed 2026-03-20 → 2026-03-21 · 5 entities × 4 facts = 20 facts · small-n — indicative only

Results at a glance

20/20 — facts recalled in Session 2 (100%)
0/20 — baseline, stateless LLM (definitional)
0 — hallucinated facts returned

Finding: Iranti provides cross-session persistence that a stateless LLM cannot. The 0% baseline is not a performance measurement — it is definitional. Without external memory, a new session has no access to prior session data.

The mechanism

Session 1 writes 20 facts via iranti_write. The KB is the only bridge between sessions. Session 2 opens with no knowledge of Session 1 — only iranti_query calls recover the facts.
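This flow can be sketched in code. A minimal sketch assuming a hypothetical in-memory KB wrapper — the `iranti_write` / `iranti_query` names come from the text above, but their signatures and the `KB` class are illustrative assumptions, not the real client API:

```python
class KB:
    """Stand-in for durable storage shared across process sessions.

    Hypothetical wrapper for this sketch; only the method names
    iranti_write / iranti_query come from the benchmark text.
    """

    def __init__(self, store=None):
        # `store` stands in for the durable backing store that
        # survives the process boundary.
        self.store = store if store is not None else {}

    def iranti_write(self, entity, key, value):
        self.store[(entity, key)] = value

    def iranti_query(self, entity, key):
        return self.store.get((entity, key))


# --- Session 1: write facts; the store is the only artifact kept ---
kb = KB()
kb.iranti_write("researcher/priya_nair", "affiliation", "University of Toronto")
persisted = kb.store

# --- Session 2: a fresh client with no in-context knowledge ---
# Only the persisted store bridges the two sessions.
kb2 = KB(store=persisted)
assert kb2.iranti_query("researcher/priya_nair", "affiliation") == "University of Toronto"
```

The design point the sketch isolates: Session 2 holds no reference to Session 1's objects, only to the persisted store.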

Session 1 (write: iranti_write ×20) → KB persists (durable storage) → Session 2 (retrieve, no context: iranti_query ×20)

Recall rate

How many of the 20 written facts were correctly recalled in Session 2? The baseline is 0% by definition — stateless LLMs have no memory between process invocations.

Iranti: 20 / 20 (100%)
Baseline (LLM without memory): 0 / 20 (0%)

Definitional — a stateless LLM has no access to Session 1 data in Session 2. This is not an empirical measurement.

The evidence — 5 × 4 persistence grid

Five fictional researcher entities, each with four facts, written in Session 1 and retrieved in Session 2. Every cell below confirms a correct retrieval. The KB was the only persistence mechanism between sessions.

Entity             S1 write   affiliation   pub_count   prev_employer   research_focus   S2 read
priya_nair         W          ✓             ✓           ✓               ✓                R
james_osei         W          ✓             ✓           ✓               ✓                R
yuki_tanaka        W          ✓             ✓           ✓               ✓                R
fatima_al_rashid   W          ✓             ✓           ✓               ✓                R
marco_deluca       W          ✓             ✓           ✓               ✓                R
Total              5          5/5           5/5         5/5             5/5              20/20

Each checkmark represents a fact correctly retrieved in Session 2 with no in-context knowledge of what was written. W = written (Session 1). R = retrieved (Session 2).

Ground truth entities

Five fictional researchers. All entity IDs and fact values are synthetic — fabricated specifically for this benchmark, not drawn from real individuals.

researcher/priya_nair
  affiliation: University of Toronto
  publication_count: 34
  previous_employer: IBM Research (2016–2020)
  research_focus: federated learning

researcher/james_osei
  affiliation: Oxford Machine Learning Research Group
  publication_count: 19
  previous_employer: Meta AI (2021–2023)
  research_focus: graph neural networks

researcher/yuki_tanaka
  affiliation: KAIST AI Institute
  publication_count: 28
  previous_employer: Samsung Research (2017–2021)
  research_focus: vision-language models

researcher/fatima_al_rashid
  affiliation: KAUST
  publication_count: 41
  previous_employer: Microsoft Research (2015–2019)
  research_focus: causal inference

researcher/marco_deluca
  affiliation: ETH Zurich AI Center
  publication_count: 56
  previous_employer: NVIDIA Research (2018–2022)
  research_focus: hardware-efficient neural networks
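Encoded as plain data, the ground truth above also makes the 20/20 recall computation precise. The entity IDs and fact values are copied from the listing; the `score_recall` helper is a hypothetical scorer written for this sketch, not part of Iranti:

```python
# Ground truth from the benchmark: 5 synthetic entities x 4 fact keys.
GROUND_TRUTH = {
    "researcher/priya_nair": {
        "affiliation": "University of Toronto",
        "publication_count": 34,
        "previous_employer": "IBM Research (2016–2020)",
        "research_focus": "federated learning",
    },
    "researcher/james_osei": {
        "affiliation": "Oxford Machine Learning Research Group",
        "publication_count": 19,
        "previous_employer": "Meta AI (2021–2023)",
        "research_focus": "graph neural networks",
    },
    "researcher/yuki_tanaka": {
        "affiliation": "KAIST AI Institute",
        "publication_count": 28,
        "previous_employer": "Samsung Research (2017–2021)",
        "research_focus": "vision-language models",
    },
    "researcher/fatima_al_rashid": {
        "affiliation": "KAUST",
        "publication_count": 41,
        "previous_employer": "Microsoft Research (2015–2019)",
        "research_focus": "causal inference",
    },
    "researcher/marco_deluca": {
        "affiliation": "ETH Zurich AI Center",
        "publication_count": 56,
        "previous_employer": "NVIDIA Research (2018–2022)",
        "research_focus": "hardware-efficient neural networks",
    },
}


def score_recall(retrieve):
    """Count facts whose retrieved value exactly matches ground truth.

    `retrieve(entity, key)` is any retrieval function under test.
    Returns (hits, total); exact match is required, so a confabulated
    or cross-entity-blended value scores zero.
    """
    total = sum(len(facts) for facts in GROUND_TRUTH.values())
    hits = sum(
        1
        for entity, facts in GROUND_TRUTH.items()
        for key, expected in facts.items()
        if retrieve(entity, key) == expected
    )
    return hits, total


# A perfect store (as in the benchmark result) scores 20/20.
store = {(e, k): v for e, facts in GROUND_TRUTH.items() for k, v in facts.items()}
assert score_recall(lambda e, k: store.get((e, k))) == (20, 20)
```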

What this measures

Large language models are stateless. Every new API call or process invocation starts with a blank context window. Any facts a model encountered in a previous session are gone — unless they were written to durable external storage and re-injected into the next session.

This statefulness gap matters acutely for agents. An agent that remembers a user's preferences, builds a research profile over multiple calls, or tracks evolving project state — all of that continuity depends entirely on what is written to persistent storage between sessions.

B2 tests whether Iranti's KB is genuinely durable across distinct process invocations, not merely intra-session consistent. The test is intentionally simple: if anything breaks at this level (write, persist, retrieve across boundary), everything built on top of it fails.

This benchmark is not about recall precision under noise (that is B1). It is about the existence of the persistence guarantee — does the KB survive the session boundary?
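The process-boundary requirement can be checked mechanically. A minimal sketch, using a JSON file as a stand-in for the KB and two separate Python interpreter invocations as the two sessions — everything here is illustrative, not Iranti's mechanism:

```python
import os
import subprocess
import sys
import tempfile

# "Session 1": a child process that writes one fact to disk and exits.
writer = (
    "import json, sys\n"
    "with open(sys.argv[1], 'w') as f:\n"
    "    json.dump({'affiliation': 'University of Toronto'}, f)\n"
)

# "Session 2": a separate child process that knows nothing about the
# writer; it can only recover the fact from the file on disk.
reader = (
    "import json, sys\n"
    "with open(sys.argv[1]) as f:\n"
    "    print(json.load(f)['affiliation'])\n"
)

path = os.path.join(tempfile.mkdtemp(), "kb.json")
subprocess.run([sys.executable, "-c", writer, path], check=True)
out = subprocess.run(
    [sys.executable, "-c", reader, path],
    check=True, capture_output=True, text=True,
)
assert out.stdout.strip() == "University of Toronto"
```

Because the writer process has exited before the reader starts, any successful read demonstrates durability across a genuine process boundary, not intra-session retrieval.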

Additional cross-session evidence

Beyond the synthetic benchmark, production writes from unrelated work confirm the same property.

ticket/cp_t010
2026-03-20 → 2026-03-21

Entity and fact data written during B1 benchmark execution. Retrieved correctly in B2 planning session the following day.

ticket/cp_t011
2026-03-20 → 2026-03-21

Cross-reference written during benchmark logging. Confirmed retrievable in separate process session.

These are incidental observations from live work, not controlled trials. They corroborate the benchmark result but do not replace it.

Threats to validity

Limitation: Single-arm test. There is no independently operated baseline arm. The 0% baseline is an analytic claim (stateless LLMs have no cross-session memory), not an experimentally measured result. This is appropriate for a definitional claim but would not satisfy an independent review board.

Limitation: Possible intra-session retrieval. If the write and retrieve operations occurred within a single long-running process — rather than two genuinely separate invocations — this would be intra-session retrieval, not cross-session. The 2026-03-20 → 2026-03-21 ticket evidence is the clearest demonstration of the genuine session boundary being crossed.

Limitation: Synthetic entities. All five researcher entities are fabricated. The KB could not have populated these facts from model weights. This controls for a confound but means results say nothing about retrieval quality on real, semantically complex entities.

Note: Small N. 20 facts across 5 entities is a correctness check, not a reliability study. A persistence guarantee should hold at 100% for all N — but this benchmark does not stress-test volume, concurrency, or failure modes.

Key properties confirmed

Finding: Durable persistence across the session boundary. Facts written in Session 1 are available in Session 2. The KB survived the process invocation boundary.

Finding: Zero hallucination on retrieval. All 20 returned facts matched the ground truth exactly. No values were confabulated or blended across entities.

Finding: Entity isolation maintained. With 5 researcher entities sharing the same 4 key names, retrieval returned the correct value for the correct entity in every case. No cross-entity bleed was observed.

Finding: Retrieved facts carry provenance. iranti_query responses include confidence, source, validFrom, and contested metadata. An LLM without memory returns none of this — only the value, with no audit trail.

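The provenance property can be illustrated with a sketch of a retrieved fact's shape. The four metadata field names (confidence, source, validFrom, contested) come from the text; the surrounding structure and all example values are assumptions for illustration:

```python
from dataclasses import dataclass


@dataclass
class RetrievedFact:
    """Hypothetical shape of an iranti_query response.

    Field names follow the metadata listed in the findings above;
    the class itself and the example values are illustrative.
    """
    value: str
    confidence: float   # how certain the KB is about this value
    source: str         # where the fact came from (audit trail)
    validFrom: str      # when the fact became valid
    contested: bool     # whether a conflicting write exists


# Example: the affiliation fact as it might come back with provenance.
fact = RetrievedFact(
    value="University of Toronto",
    confidence=0.98,            # hypothetical value
    source="session-1 write",   # hypothetical value
    validFrom="2026-03-20",
    contested=False,
)
assert fact.value == "University of Toronto"
```

A bare LLM answer would correspond to `fact.value` alone; the remaining fields are what the audit trail adds.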
Raw data

Full trial execution records, session logs, entity definitions, and methodology notes in the benchmarking repository.