Mohamed Elashri

Two Uproot Mysteries

So I was working on parsing some ntuples for the gazillionth time and, naturally, decided to stick with uproot, the so-called modern solution for reading ROOT files in Python. Because in 2025 we obviously need yet another way to handle TTree structures, this time from a library that hates you just a little less than ROOT itself.

Let’s begin with the classic: eager vs. lazy loading. You’d think that calling .arrays() would get you arrays, right? But no. Depending on whether you passed a single branch or several, and whether you asked for library="np" or library="ak", you might get a NumPy array, an Awkward Array, a dictionary of arrays, or an empty array wrapped in sixteen layers of abstraction that looks like data but isn’t. And of course, none of this is consistent unless you read the fine print of the documentation, where they casually mention that sometimes you get a RecordArray pretending to be a NumPy structured array. Which is cool, because what you really wanted was a flat array of floats from a leaf called Track_PT, and instead you ended up with an object that fails silently when passed to your histogramming code, because “iteration” means something different now.
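My coping strategy is to pick library="np" and normalize whatever comes back before it goes anywhere near a histogram. As far as I can tell, in that mode tree.arrays(["Track_PT"], library="np") hands you a dict keyed by branch name, tree["Track_PT"].array(library="np") hands you the array directly, and a jagged branch comes back as a dtype=object array whose entries are per-event arrays. The helper flatten_branch below is hypothetical (it is not part of uproot), and the sample arrays are made up to mimic those shapes so the sketch runs without a ROOT file.

```python
import numpy as np


def flatten_branch(result, branch="Track_PT"):
    """Normalize a library="np" result into a flat float64 array.

    Hypothetical helper. `result` may be the dict returned by
    tree.arrays([...], library="np"), a plain numeric array, or a
    dtype=object array holding one sub-array per entry (jagged branch).
    """
    # tree.arrays(..., library="np") returns {branch_name: array}
    if isinstance(result, dict):
        result = result[branch]

    arr = np.asarray(result)

    # Jagged branch: object array of per-entry arrays, so concatenate them
    if arr.dtype == object:
        if arr.size == 0:
            return np.empty(0, dtype=np.float64)
        return np.concatenate(
            [np.asarray(x, dtype=np.float64).ravel() for x in arr]
        )

    # Already numeric: just force it to 1D float64
    return arr.astype(np.float64).ravel()


# Made-up data shaped like a jagged Track_PT branch with three events
jagged = np.empty(3, dtype=object)
jagged[0] = np.array([12.5, 3.1])
jagged[1] = np.array([], dtype=np.float64)  # an event with no tracks
jagged[2] = np.array([7.0])

pt = flatten_branch({"Track_PT": jagged})
print(pt.tolist())  # [12.5, 3.1, 7.0]

counts, edges = np.histogram(pt, bins=4, range=(0.0, 20.0))
print(counts.tolist())  # [1, 1, 1, 0]
```

If you would rather stay in Awkward land, the rough equivalent is ak.flatten on the jagged column followed by ak.to_numpy; the sketch sticks to NumPy so it runs without a ROOT file or an Awkward install.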

Then there’s the string decoding. Ah yes, TObjString and its wonderful legacy. You’d think reading a string from a ROOT file would be straightforward; after all, ASCII is solved, UTF-8 is solved, and even ROOT has had decades to figure it out. But uproot will happily give you a byte string that looks like it was read from a VAX system via a serial port. Sometimes it decodes fine. Other times it’s a bytes object that mysteriously needs .decode("utf-8"), but only after you’ve cast it through four Awkward Array conversions and flattened it twice. And don’t get me started on reading a list of strings, because suddenly you’re handed an ak.Array where every entry is a list of one string, except when it’s a list of zero strings, or worse, a NumPy array with dtype=object that breaks every downstream NumPy function you try to use.
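When all I want is plain Python strings at the other end, the defensive move is to turn the column into ordinary Python objects first (for an Awkward Array, ak.to_list does that) and then normalize every entry by hand. The helpers below are a hypothetical sketch, not part of uproot; they assume each entry is a str, bytes, None, or a list holding zero or one of those, and that the payload is UTF-8 unless you say otherwise. NumPy's bytes_ and str_ scalars subclass bytes and str, so they pass through too.

```python
def normalize_string(value, encoding="utf-8"):
    """Collapse the shapes a string entry can arrive in into str or None.

    Hypothetical helper. Handles str, bytes/bytearray, None, and lists or
    tuples holding zero or one of those (the "list of one string" case).
    """
    # Unwrap single-element containers; an empty one means "no string here"
    while isinstance(value, (list, tuple)):
        if len(value) == 0:
            return None
        if len(value) > 1:
            raise ValueError(f"expected at most one string, got {len(value)}")
        value = value[0]

    if value is None:
        return None
    if isinstance(value, (bytes, bytearray)):
        # errors="replace" keeps undecodable bytes visible instead of raising
        return bytes(value).decode(encoding, errors="replace")
    if isinstance(value, str):
        return value
    raise TypeError(f"unexpected entry type {type(value).__name__}")


def normalize_strings(entries, encoding="utf-8"):
    """Apply normalize_string to every entry of a per-event list."""
    return [normalize_string(v, encoding) for v in entries]


# Made-up entries covering the shapes described above
raw = [b"Run2024", ["muon"], [], "already text", [b"caf\xc3\xa9"]]
print(normalize_strings(raw))
# ['Run2024', 'muon', None, 'already text', 'café']
```

Returning None for empty entries, rather than an empty string, keeps "nothing was stored" distinguishable from "an empty string was stored"; swap it for "" if your downstream code prefers that.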

In the end, uproot is still the only sane choice if you want to avoid diving into PyROOT and compiling things at 2 AM on lxplus. But sane doesn’t mean it won’t drive you slightly insane. It’s like talking to a reasonable person who insists on answering every question with a riddle. Sure, you get the answer, but you have to guess the context first.