High-energy physics has a data problem, or rather, a data volume problem. The four major LHC experiments collectively accumulated over 2 exabytes of ROOT data through Runs 1–3. With the High-Luminosity LHC (HL-LHC) expected to begin data taking by the end of this decade, that number is projected to balloon beyond 10 exabytes under management. To put this in perspective, 90% of all LHC data will be produced during the HL-LHC era. Storage alone consumes roughly half of the total HEP computing budget. Every percentage point of compression matters. Every microsecond of read latency, multiplied across billions of events, adds up to real compute hours and real cost.

For a quarter of a century, TTree, ROOT's columnar data storage format, has been the backbone of essentially every HEP analysis pipeline. It stores the events, branches, and leaves that physicists query, plot, and fit every day. But TTree was designed in the mid-1990s, before there was even a ratified C++ standard, before parallel computing was mainstream, and before SSDs, NVMe drives, and object stores existed. The computing landscape of 2025 is unrecognizable compared to 1995. The question is no longer "can we keep TTree alive?" but rather "can we afford not to replace it?"

Enter RNTuple, ROOT's next-generation I/O subsystem, built from the ground up to be the designated data format for LHC data starting in Run 4.

What Exactly Is TTree, and Why Has It Worked So Well?

Before discussing RNTuple, it's worth appreciating what TTree got right. TTree is a columnar storage system, meaning that instead of writing complete event records one after another (row-wise), it stores all values of a given data member ("branch") contiguously on disk. If you only want to plot the transverse momentum (pT) of muons across 100 million events, TTree lets you read just the pT column without touching anything else: charges, vertex positions, and isolation variables never get loaded into memory.

This columnar design was ahead of its time. TTree also provided seamless C++ integration: you could store arbitrary C++ objects, including nested structures and collections, without explicitly defining data schemas. You just handed it your classes, and ROOT's dictionary system took care of serialization. TTree organized data into "baskets" (compressed blocks of column data) and "clusters" (groups of entries that form a self-contained unit for I/O). The combination of columnar layout, transparent compression, and deep C++ integration made TTree the de facto standard for HEP event data.
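To make the write path concrete, here is a minimal sketch of storing a user-defined class in a TTree. The `Muon` struct and file/tree names are hypothetical; storing a `std::vector` of a custom class assumes a ROOT dictionary has been generated for it.

```cpp
#include "TFile.h"
#include "TTree.h"
#include <vector>

// Hypothetical event object; a ROOT dictionary is assumed to exist for it.
struct Muon {
    float pt, eta, phi;
};

void write_events() {
    TFile f("events.root", "RECREATE");
    TTree tree("Events", "demo events");

    std::vector<Muon> muons;
    // No explicit schema: the dictionary tells ROOT how to stream the class.
    tree.Branch("muons", &muons);

    for (int i = 0; i < 1000; ++i) {
        muons.assign(2, Muon{30.f + i, 0.5f, 1.2f});
        tree.Fill();   // values accumulate in per-branch baskets
    }
    tree.Write();      // baskets are compressed and flushed; clusters finalized
}
```

Each `Fill()` appends one entry; ROOT decides when to compress a basket and when to close a cluster, which is exactly the machinery described above.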

But TTree also accumulated 25 years of design debt.

Where TTree Shows Its Age

Several fundamental limitations have become increasingly painful as data volumes and the hardware landscape have evolved:

  1. The I/O path is essentially serial. TTree predates mainstream multi-core computing and was never designed for concurrent reading and writing.

  2. Its in-memory representation requires copies and pointer chasing, which maps poorly onto vectorized CPUs and GPUs.

  3. The binary format was never formally specified; third-party readers must reverse-engineer behavior from ROOT's own implementation.

  4. Support for modern C++ types such as std::variant and std::optional is limited or absent.

  5. The format assumes POSIX-style files and cannot natively target object stores or exploit fast NVMe access patterns.

None of these are flaws in the context of when TTree was designed. They are consequences of a format that has been remarkably successful for 25 years but was never designed for the world we now inhabit.

RNTuple: A Clean-Sheet Redesign

RNTuple (short for "nested tuple") is not an incremental improvement to TTree. It is a backwards-incompatible redesign of both the binary on-disk format and the C++ API. The ROOT team, led by Jakob Blomer at CERN, made the deliberate decision to break compatibility with TTree in order to fully exploit optimizations that would be impossible under the constraints of backward compatibility. This was not a decision taken lightly, but the HL-LHC data challenge made it necessary.

The design of RNTuple draws on the ROOT team's quarter-century of experience with TTree, combined with best practices from modern industry columnar formats like Apache Parquet and Arrow. The result is purpose-built for the unique characteristics of HEP data: statistically independent events, complex nested data structures, columnar access patterns, and write-once-read-many workflows.

RNTuple's internal architecture is organized into four well-separated layers:

  1. Event Iteration Layer: The user-facing API. Provides convenient, type-safe interfaces for looping over events, creating models, reading and writing entries. This is where RNTupleReader, RNTupleWriter, and RNTupleModel live.

  2. Logical Layer: Maps complex C++ types (including nested collections, std::variant, std::optional, user-defined classes) onto flat columns of fundamental types. This is where the recursive decomposition of C++ objects into serializable fields happens.

  3. Primitives Layer: Groups ranges of column elements into pages, the basic unit of compression and I/O. A page contains a contiguous range of values for a single column.

  4. Storage Layer: Handles the actual I/O of pages and metadata (headers, footers, page lists). This layer abstracts over different storage backends: local files, XRootD, object stores (DAOS, S3), NVMe with Direct I/O.

This layered design has practical consequences. Adding support for a new C++ type means only touching the logical layer. Adding support for a new storage backend means only touching the storage layer. The concerns are genuinely separated.
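As a sketch of the logical layer at work, the snippet below declares fields of increasingly complex types; RNTuple decomposes each into flat columns of fundamentals behind the scenes. (Class names follow the current ROOT API; depending on the ROOT version they live in `ROOT::` or `ROOT::Experimental::`.)

```cpp
#include <ROOT/RNTupleModel.hxx>
#include <optional>
#include <vector>

using ROOT::Experimental::RNTupleModel;  // plain ROOT:: in newer releases

void build_model() {
    auto model = RNTupleModel::Create();

    // A jagged collection: internally an offset column plus a float column.
    auto pts = model->MakeField<std::vector<float>>("muon_pt");

    // An optional value: internally a presence column plus the payload column.
    auto met = model->MakeField<std::optional<float>>("met");
}
```

The user only ever sees typed fields; the splitting into offset, presence, and value columns is entirely the logical layer's job, which is why new types can be added without touching the I/O code below it.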

On-Disk Format: What Changed and Why

The first production version of the RNTuple binary format specification was released in November 2024 as part of ROOT 6.34. This is a formally specified format with a public specification document, a first for ROOT, and a significant step for interoperability with third-party tools (like Uproot in Python).

Here are the key architectural differences from TTree:

  1. Data is stored little-endian, matching modern CPUs, so pages can be consumed without byte swapping (TTree stores big-endian).

  2. Pages and clusters replace baskets: pages are the unit of compression, clusters the self-contained unit of I/O, and page lists let readers locate data without scanning the file.

  3. Pages and metadata carry checksums, so on-disk corruption is detected at read time rather than silently propagated.

  4. Collections are encoded with offset columns instead of per-entry counts, making nested data cheap to index.

  5. The metadata (header, footer, page lists) is compact and formally specified, shrinking per-file overhead and enabling third-party implementations.

The Numbers: Storage and Throughput

The performance claims are backed by extensive benchmarking against real experiment data models from ATLAS, CMS, and LHCb. The ROOT team has reported files on the order of 10–20% smaller than their TTree equivalents at the same compression settings, together with substantially higher read throughput, enough to saturate modern NVMe devices that TTree leaves mostly idle.

When you multiply these savings across exabytes, the cost implications for storage and network bandwidth are staggering.

The API: Modern, Safe, and Familiar (Enough)

RNTuple's C++ API follows modern core guidelines: type-safe templates, smart pointer ownership semantics, and RResult<> error handling that avoids silent failures. Compare:

// TTree: runtime type matching, raw pointers, silent failure modes
float pt;
tree->SetBranchAddress("muon_pt", &pt);  // typo? wrong type? you'll only know at runtime

// RNTuple: compile-time type safety, clear ownership
auto model = RNTupleModel::Create();
auto pt = model->MakeField<float>("muon_pt");
auto reader = RNTupleReader::Open(std::move(model), "Events", "data.root");
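Continuing the snippet above, a minimal read loop might look like the following. `GetEntryRange` and `LoadEntry` are the reader's iteration API; the histogram object is a hypothetical placeholder.

```cpp
// Assumes `reader` and the bound field pointer `pt` from the snippet above,
// plus a hypothetical histogram hMuonPt (e.g., a TH1F).
for (auto i : reader->GetEntryRange()) {
    reader->LoadEntry(i);   // deserializes entry i into the model's fields
    hMuonPt.Fill(*pt);      // *pt now holds muon_pt for this entry
}
```

Because `pt` was created through `MakeField<float>`, a type mismatch against the on-disk schema fails loudly at `Open` time instead of silently corrupting values mid-loop.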

For analysis physicists, the most important thing is that RDataFrame works identically with both TTree and RNTuple. If your analysis is written using RDataFrame, switching from TTree to RNTuple is essentially a one-line change; in ROOT 6.32 and later, RDataFrame auto-detects the format entirely. Your cuts, histograms, and column definitions don't change at all.
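A sketch of what "works identically" means in practice: the analysis below runs unchanged whether "Events" in "data.root" is a TTree or an RNTuple (both names are placeholders, and `muon_pt` is assumed to be a scalar column).

```cpp
#include <ROOT/RDataFrame.hxx>

void analyze() {
    // From ROOT 6.32+, the constructor detects the on-disk format itself.
    ROOT::RDataFrame df("Events", "data.root");

    auto h = df.Filter("muon_pt > 25")   // same cut expression either way
               .Histo1D("muon_pt");      // same histogram booking either way
    h->Draw();
}
```

The format switch happens entirely below the RDataFrame abstraction, which is what makes the migration nearly invisible to end-stage analysis code.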

For framework developers (reconstruction, simulation, derivation), RNTuple provides the RNTupleWriter interface with model creation, field definition, and entry-based filling that maps naturally onto existing TTree-based workflows.
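A condensed sketch of that write path follows; file and field names are placeholders, and depending on the ROOT version the classes live in `ROOT::` or `ROOT::Experimental::`.

```cpp
#include <ROOT/RNTupleModel.hxx>
#include <ROOT/RNTupleWriter.hxx>
#include <utility>

using ROOT::Experimental::RNTupleModel;
using ROOT::Experimental::RNTupleWriter;

void write_ntuple() {
    auto model = RNTupleModel::Create();
    auto pt = model->MakeField<float>("muon_pt");

    // The writer takes ownership of the model and creates the output file.
    auto writer = RNTupleWriter::Recreate(std::move(model), "Events",
                                          "output.root");
    for (int i = 0; i < 1000; ++i) {
        *pt = 25.f + 0.01f * i;  // set this entry's field values
        writer->Fill();          // commit one entry; pages flush in clusters
    }
}   // writer destructor finalizes header, footer, and page-list metadata
```

The model/field/fill triad mirrors TTree's branch-address-then-Fill pattern closely enough that framework code ports with local, mechanical changes.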

The Migration Path

The transition from TTree to RNTuple is already underway. ATLAS, CMS, and LHCb have all integrated initial RNTuple support into their experiment frameworks, with the goal of making RNTuple the production format for new data by the start of Run 4.

For existing data, ROOT provides RNTupleImporter, a tool that converts TTree datasets to RNTuple:

auto importer = RNTupleImporter::Create("input.root", "Events", "output.root");
importer->Import();

And critically: TTree is not going away. It will remain in ROOT indefinitely. The exabytes of existing TTree data will continue to be readable. The transition is about new data and new analyses, not about obsoleting the past.

Why This Matters Beyond the Numbers

The move from TTree to RNTuple is not just a performance optimization. It represents a fundamental rethinking of how HEP stores and accesses data, informed by both 25 years of TTree experience and the realities of modern computing.

The ROOT team is continuing to develop RNTuple with several areas of active work: precision cascades for separating lossy compression levels into different files, deeper integration with analysis facilities, GPU-friendly data transfer mechanisms, and schema evolution maturity. The collaboration with experiments is bidirectional: early adopters provide feedback that shapes the API, while the ROOT team ensures that the format supports the full breadth of experiment data models.

For anyone in the field: try it out. Convert a TTree to RNTuple with the importer, run your RDataFrame analysis, and see the difference yourself. The format is production-ready on disk, the APIs are stabilizing rapidly, and the performance improvements are real and measured.

After 25 years of TTree, the future of HEP I/O is being written: little-endian, checksummed, and very, very fast.


References and further reading: