Benchmark B2

Cross-Session Memory Persistence

Can facts written to Iranti in one process session be retrieved accurately in a later, entirely separate session — with no in-context knowledge of what was written? This is the core promise of durable memory infrastructure.

Executed 2026-03-20 → 2026-03-21 · 5 entities × 4 facts = 20 facts · small-n — indicative only

Results at a glance

20/20 — facts recalled in Session 2 (100%)
0/20 — baseline, stateless LLM (definitional)
0 — hallucinated facts returned

Finding: Iranti provides cross-session persistence that a stateless LLM cannot. The 0% baseline is not a performance measurement — it is definitional. Without external memory, a new session has no access to prior session data.

The mechanism

Session 1 writes 20 facts via iranti_write. The KB is the only bridge between sessions. Session 2 opens with no knowledge of Session 1 — only iranti_query calls recover the facts.
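This flow can be sketched in code. A minimal sketch assuming a hypothetical in-memory KB wrapper — the `iranti_write` / `iranti_query` names come from the text above, but their signatures and the `KB` class are illustrative assumptions, not the real client API:

```python
class KB:
    """Stand-in for durable storage shared across process sessions.

    Hypothetical wrapper for this sketch; only the method names
    iranti_write / iranti_query come from the benchmark text.
    """

    def __init__(self, store=None):
        # `store` stands in for the durable backing store that
        # survives the process boundary.
        self.store = store if store is not None else {}

    def iranti_write(self, entity, key, value):
        self.store[(entity, key)] = value

    def iranti_query(self, entity, key):
        return self.store.get((entity, key))


# --- Session 1: write facts; the store is the only artifact kept ---
kb = KB()
kb.iranti_write("researcher/priya_nair", "affiliation", "University of Toronto")
persisted = kb.store

# --- Session 2: a fresh client with no in-context knowledge ---
# Only the persisted store bridges the two sessions.
kb2 = KB(store=persisted)
assert kb2.iranti_query("researcher/priya_nair", "affiliation") == "University of Toronto"
```

The design point the sketch isolates: Session 2 holds no reference to Session 1's objects, only to the persisted store.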

Session 1 (write: iranti_write ×20) → KB persists (durable storage) → Session 2 (retrieve, no context: iranti_query ×20)

Recall rate

How many of the 20 written facts were correctly recalled in Session 2? The baseline is 0% by definition — stateless LLMs have no memory between process invocations.

Iranti: 20 / 20 (100%)
Baseline (LLM without memory): 0 / 20 (0%)

Definitional — a stateless LLM has no access to Session 1 data in Session 2. This is not an empirical measurement.

The evidence — 5 × 4 persistence grid

Five fictional researcher entities, each with four facts, written in Session 1 and retrieved in Session 2. Every cell below confirms a correct retrieval. The KB was the only persistence mechanism between sessions.

Entity             S1 write   affiliation   pub_count   prev_employer   research_focus   S2 read
priya_nair         W          ✓             ✓           ✓               ✓                R
james_osei         W          ✓             ✓           ✓               ✓                R
yuki_tanaka        W          ✓             ✓           ✓               ✓                R
fatima_al_rashid   W          ✓             ✓           ✓               ✓                R
marco_deluca       W          ✓             ✓           ✓               ✓                R
Total              5          5/5           5/5         5/5             5/5              20/20

Each checkmark represents a fact correctly retrieved in Session 2 with no in-context knowledge of what was written. W = written (Session 1). R = retrieved (Session 2).

Ground truth entities

Five fictional researchers. All entity IDs and fact values are synthetic — fabricated specifically for this benchmark, not drawn from real individuals.

researcher/priya_nair
  affiliation: University of Toronto
  publication_count: 34
  previous_employer: IBM Research (2016–2020)
  research_focus: federated learning

researcher/james_osei
  affiliation: Oxford Machine Learning Research Group
  publication_count: 19
  previous_employer: Meta AI (2021–2023)
  research_focus: graph neural networks

researcher/yuki_tanaka
  affiliation: KAIST AI Institute
  publication_count: 28
  previous_employer: Samsung Research (2017–2021)
  research_focus: vision-language models

researcher/fatima_al_rashid
  affiliation: KAUST
  publication_count: 41
  previous_employer: Microsoft Research (2015–2019)
  research_focus: causal inference

researcher/marco_deluca
  affiliation: ETH Zurich AI Center
  publication_count: 56
  previous_employer: NVIDIA Research (2018–2022)
  research_focus: hardware-efficient neural networks
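Encoded as plain data, the ground truth above also makes the 20/20 recall computation precise. The entity IDs and fact values are copied from the listing; the `score_recall` helper is a hypothetical scorer written for this sketch, not part of Iranti:

```python
# Ground truth from the benchmark: 5 synthetic entities x 4 fact keys.
GROUND_TRUTH = {
    "researcher/priya_nair": {
        "affiliation": "University of Toronto",
        "publication_count": 34,
        "previous_employer": "IBM Research (2016–2020)",
        "research_focus": "federated learning",
    },
    "researcher/james_osei": {
        "affiliation": "Oxford Machine Learning Research Group",
        "publication_count": 19,
        "previous_employer": "Meta AI (2021–2023)",
        "research_focus": "graph neural networks",
    },
    "researcher/yuki_tanaka": {
        "affiliation": "KAIST AI Institute",
        "publication_count": 28,
        "previous_employer": "Samsung Research (2017–2021)",
        "research_focus": "vision-language models",
    },
    "researcher/fatima_al_rashid": {
        "affiliation": "KAUST",
        "publication_count": 41,
        "previous_employer": "Microsoft Research (2015–2019)",
        "research_focus": "causal inference",
    },
    "researcher/marco_deluca": {
        "affiliation": "ETH Zurich AI Center",
        "publication_count": 56,
        "previous_employer": "NVIDIA Research (2018–2022)",
        "research_focus": "hardware-efficient neural networks",
    },
}


def score_recall(retrieve):
    """Count facts whose retrieved value exactly matches ground truth.

    `retrieve(entity, key)` is any retrieval function under test.
    Returns (hits, total); exact match is required, so a confabulated
    or cross-entity-blended value scores zero.
    """
    total = sum(len(facts) for facts in GROUND_TRUTH.values())
    hits = sum(
        1
        for entity, facts in GROUND_TRUTH.items()
        for key, expected in facts.items()
        if retrieve(entity, key) == expected
    )
    return hits, total


# A perfect store (as in the benchmark result) scores 20/20.
store = {(e, k): v for e, facts in GROUND_TRUTH.items() for k, v in facts.items()}
assert score_recall(lambda e, k: store.get((e, k))) == (20, 20)
```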

What this measures

Large language models are stateless. Every new API call or process invocation starts with a blank context window. Any facts a model encountered in a previous session are gone — unless they were written to durable external storage and re-injected into the next session.

This statefulness gap matters acutely for agents. An agent that remembers a user's preferences, builds a research profile over multiple calls, or tracks evolving project state — all of that continuity depends entirely on what is written to persistent storage between sessions.

B2 tests whether Iranti's KB is genuinely durable across distinct process invocations, not merely intra-session consistent. The test is intentionally simple: if anything breaks at this level (write, persist, retrieve across boundary), everything built on top of it fails.

This benchmark is not about recall precision under noise (that is B1). It is about the existence of the persistence guarantee — does the KB survive the session boundary?
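The process-boundary requirement can be checked mechanically. A minimal sketch, using a JSON file as a stand-in for the KB and two separate Python interpreter invocations as the two sessions — everything here is illustrative, not Iranti's mechanism:

```python
import os
import subprocess
import sys
import tempfile

# "Session 1": a child process that writes one fact to disk and exits.
writer = (
    "import json, sys\n"
    "with open(sys.argv[1], 'w') as f:\n"
    "    json.dump({'affiliation': 'University of Toronto'}, f)\n"
)

# "Session 2": a separate child process that knows nothing about the
# writer; it can only recover the fact from the file on disk.
reader = (
    "import json, sys\n"
    "with open(sys.argv[1]) as f:\n"
    "    print(json.load(f)['affiliation'])\n"
)

path = os.path.join(tempfile.mkdtemp(), "kb.json")
subprocess.run([sys.executable, "-c", writer, path], check=True)
out = subprocess.run(
    [sys.executable, "-c", reader, path],
    check=True, capture_output=True, text=True,
)
assert out.stdout.strip() == "University of Toronto"
```

Because the writer process has exited before the reader starts, any successful read demonstrates durability across a genuine process boundary, not intra-session retrieval.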

Additional cross-session evidence

Beyond the synthetic benchmark, production writes from unrelated work confirm the same property.

ticket/cp_t010
2026-03-20 → 2026-03-21

Entity and fact data written during B1 benchmark execution. Retrieved correctly in B2 planning session the following day.

ticket/cp_t011
2026-03-20 → 2026-03-21

Cross-reference written during benchmark logging. Confirmed retrievable in separate process session.

These are incidental observations from live work, not controlled trials. They corroborate the benchmark result but do not replace it.

Threats to validity

Limitation: Single-arm test. There is no independently operated baseline arm. The 0% baseline is an analytic claim (stateless LLMs have no cross-session memory), not an experimentally measured result. This is appropriate for a definitional claim but would not satisfy an independent review board.

Limitation: Possible intra-session retrieval. If the write and retrieve operations occurred within a single long-running process — rather than two genuinely separate invocations — this would be intra-session retrieval, not cross-session. The 2026-03-20 → 2026-03-21 ticket evidence is the clearest demonstration of the genuine session boundary being crossed.

Limitation: Synthetic entities. All five researcher entities are fabricated. The KB could not have populated these facts from model weights. This controls for a confound but means results say nothing about retrieval quality on real, semantically complex entities.

Note: Small N. 20 facts across 5 entities is a correctness check, not a reliability study. A persistence guarantee should hold at 100% for all N — but this benchmark does not stress-test volume, concurrency, or failure modes.

Key properties confirmed

Finding: Durable persistence across the session boundary. Facts written in Session 1 are available in Session 2. The KB survived the process invocation boundary.

Finding: Zero hallucination on retrieval. All 20 returned facts matched the ground truth exactly. No values were confabulated or blended across entities.

Finding: Entity isolation maintained. With 5 researcher entities sharing the same 4 key names, retrieval returned the correct value for the correct entity in every case. No cross-entity bleed was observed.

Finding: Retrieved facts carry provenance. iranti_query responses include confidence, source, validFrom, and contested metadata. An LLM without memory returns none of this — only the value, with no audit trail.

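The provenance property can be illustrated with a sketch of a retrieved fact's shape. The four metadata field names (confidence, source, validFrom, contested) come from the text; the surrounding structure and all example values are assumptions for illustration:

```python
from dataclasses import dataclass


@dataclass
class RetrievedFact:
    """Hypothetical shape of an iranti_query response.

    Field names follow the metadata listed in the findings above;
    the class itself and the example values are illustrative.
    """
    value: str
    confidence: float   # how certain the KB is about this value
    source: str         # where the fact came from (audit trail)
    validFrom: str      # when the fact became valid
    contested: bool     # whether a conflicting write exists


# Example: the affiliation fact as it might come back with provenance.
fact = RetrievedFact(
    value="University of Toronto",
    confidence=0.98,            # hypothetical value
    source="session-1 write",   # hypothetical value
    validFrom="2026-03-20",
    contested=False,
)
assert fact.value == "University of Toronto"
```

A bare LLM answer would correspond to `fact.value` alone; the remaining fields are what the audit trail adds.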
Raw data

Full trial execution records, session logs, entity definitions, and methodology notes in the benchmarking repository.