This is an essay I wrote while attending the Fast Machine Learning in Science conference at ETH Zurich. It was shaped by the discussions there about machine learning model inference across different fields of fundamental science, most of which were about using FPGAs as accelerators for ML inference. I presented my work on integrating a complex end-to-end, neural-network-based primary vertex locator algorithm into the LHCb HLT1 trigger framework (Allen). Take it as an opinion piece, and feel free to email me with any comments; I will amend the essay here, since I don't have a comment section yet.
Introduction
The choice of LHCb to rely exclusively on GPUs for its Run 3 and Run 4 real-time processing, abandoning both FPGAs and the traditional hardware-trigger paradigm, was one of the boldest decisions in the landscape of high-energy physics data acquisition. While this move delivered operational simplicity and flexibility, it also imposed structural limitations on what can be achieved in the earliest stages of online event reconstruction. To understand these trade-offs, one must compare LHCb’s architecture not only against what FPGAs could have offered, but also against the strategies of ATLAS, CMS, and ALICE.
The Role of FPGAs in Hardware Triggers
At the LHC, hardware triggers based on FPGAs have been central to real-time data selection since the beginning. Their pipelined, deterministic architectures guarantee latencies on the order of microseconds. In ATLAS and CMS, the Level-1 (L1) trigger layer is implemented almost entirely in FPGAs and ASICs. This stage takes the 40 MHz collision stream and rapidly reduces it to O(100 kHz), passing only a fraction of events onward to higher-level processing. The algorithms at this level focus on calorimeter sums, muon system patterns, and fast track stubs, all well-suited to FPGA logic.
FPGAs offer three unique advantages at this stage:
- Deterministic timing and reliability at extreme rates.
- High efficiency in simple yet throughput-heavy tasks.
- Potential to integrate compact ML inference engines directly into the lowest level of decision-making.
Research in both ATLAS and CMS has already demonstrated FPGA-based ML applications, such as jet tagging and tau identification, that exploit quantized networks and systolic arrays for inference at nanosecond-scale latencies. This is the place where ML matters most: at the very front line of rejection, before most collisions are discarded forever.
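As a rough illustration of what such a network reduces to at the hardware level, the sketch below shows a single quantized (fixed-point) dense layer with a ReLU activation, written in plain C++. The bit widths, layer sizes, and values are hypothetical placeholders; in practice, tools such as hls4ml generate this kind of fully unrolled logic from a trained, quantization-aware model, so that each multiply-accumulate maps onto its own piece of FPGA fabric and the latency is fixed by construction.

```cpp
// Sketch of one quantized dense layer + ReLU, the arithmetic core that
// FPGA ML engines unroll into parallel, fixed-latency logic.
// All bit widths, sizes, and values here are illustrative placeholders.
#include <algorithm>
#include <array>
#include <cstdint>
#include <cstdio>

constexpr int N_IN      = 16;  // hypothetical input features (e.g. calorimeter sums)
constexpr int N_OUT     = 8;   // hypothetical hidden units
constexpr int FRAC_BITS = 6;   // fixed-point fractional bits

using Weights = std::array<std::array<int8_t, N_IN>, N_OUT>;  // 8-bit weights
using Biases  = std::array<int32_t, N_OUT>;

// On an FPGA this double loop is fully unrolled: each multiply-accumulate
// becomes its own DSP slice and the whole layer finishes in a few clock ticks.
std::array<int32_t, N_OUT> dense_relu(const std::array<int8_t, N_IN>& x,
                                      const Weights& w, const Biases& b) {
    std::array<int32_t, N_OUT> y{};
    for (int o = 0; o < N_OUT; ++o) {
        int32_t acc = b[o];
        for (int i = 0; i < N_IN; ++i)
            acc += int32_t{w[o][i]} * int32_t{x[i]};
        y[o] = std::max(acc, int32_t{0}) >> FRAC_BITS;  // ReLU + rescale
    }
    return y;
}

int main() {
    std::array<int8_t, N_IN> x{};
    x.fill(10);                 // dummy quantized inputs
    Weights w{};                // dummy weights (all zero, so output is just the bias)
    Biases b{};
    b.fill(1 << FRAC_BITS);
    const auto y = dense_relu(x, w, b);
    std::printf("first output: %d\n", y[0]);
    return 0;
}
```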
LHCb’s Different Path: GPUs and Allen
LHCb made a deliberate decision to remove this hardware layer altogether. Instead of an FPGA-based L1 trigger, the experiment processes every event at 30 MHz directly in software. The first decision layer is HLT1, which runs entirely on GPUs within the Allen framework. In practical terms, HLT1 plays the same structural role in LHCb’s architecture as Level-1 does in ATLAS and CMS, but it is implemented with GPUs rather than FPGAs.
The flexibility this affords is substantial. Allen can run nearly offline-quality algorithms at the very first stage, reconstructing tracks, vertices, and physics objects without the need to design and validate FPGA firmware. This makes it possible to adapt selections rapidly as physics priorities evolve. For a forward spectrometer like LHCb, where rare decays may have subtle signatures, this flexibility is especially valuable.
But the absence of a hardware layer carries heavy consequences. Unlike FPGAs, GPUs do not provide deterministic cycle-level execution. Achieving the required throughput demands strict control of streams, memory residency, and kernel scheduling. To manage this, Allen enforces rigid execution and memory models. This ensures performance, but it complicates the integration of external machine learning libraries such as ONNX Runtime, cuDNN, or TensorRT. Adapting neural networks to Allen’s framework often requires custom abstraction layers and careful engineering. What could have been natural in an FPGA pipeline becomes cumbersome in the GPU-only environment.
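To make that engineering overhead concrete, here is a minimal sketch, not Allen’s actual API, of the kind of hand-written CUDA kernel one ends up producing instead of simply calling an inference runtime: a tiny two-layer MLP fused into a single kernel that reads and writes preallocated device buffers, one event per thread block. The layer sizes, names, and buffer layout are assumptions made purely for illustration.

```cuda
// Hypothetical sketch: a small two-layer MLP fused into one CUDA kernel,
// the style of hand-rolled inference a framework like Allen pushes you toward
// when embedding ONNX Runtime / cuDNN / TensorRT is impractical.
// Names, sizes, and buffer layout are illustrative, not Allen's real interface.
#include <cuda_runtime.h>
#include <math.h>

constexpr int N_IN  = 16;   // assumed input features per event
constexpr int N_HID = 32;   // assumed hidden-layer width

__global__ void mlp_inference(const float* __restrict__ features,  // [n_events * N_IN]
                              const float* __restrict__ w1,        // [N_HID * N_IN]
                              const float* __restrict__ b1,        // [N_HID]
                              const float* __restrict__ w2,        // [N_HID]
                              const float* __restrict__ b2,        // [1]
                              float* __restrict__ scores,          // [n_events]
                              int n_events) {
    const int event = blockIdx.x;          // one event per thread block
    if (event >= n_events) return;

    __shared__ float hidden[N_HID];
    const float* x = features + event * N_IN;

    // Layer 1: each thread computes one or more hidden units (ReLU).
    for (int h = threadIdx.x; h < N_HID; h += blockDim.x) {
        float acc = b1[h];
        for (int i = 0; i < N_IN; ++i) acc += w1[h * N_IN + i] * x[i];
        hidden[h] = fmaxf(acc, 0.f);
    }
    __syncthreads();

    // Layer 2: thread 0 reduces the hidden layer to a single sigmoid score.
    if (threadIdx.x == 0) {
        float acc = b2[0];
        for (int h = 0; h < N_HID; ++h) acc += w2[h] * hidden[h];
        scores[event] = 1.f / (1.f + expf(-acc));
    }
}

// Launched as one step of a fixed sequence, on buffers owned by the framework's
// memory manager, e.g.:  mlp_inference<<<n_events, 32, 0, stream>>>(...);
```

Every network integrated this way has to be ported, validated, and scheduled by hand, which is exactly the overhead that a drop-in inference library or an FPGA toolflow would otherwise absorb.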
Most importantly, HLT1 must execute full reconstruction for every event within tight latency budgets. Without a hardware prefilter, there is no early rejection of trivial background. This raises the computational burden, increases power consumption, and narrows the scope for innovative ML algorithms that could have been deployed at the very first decision point.
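A back-of-envelope budget shows how tight this is; the farm size below is a round assumed number, not the actual LHCb configuration:

```cpp
// Rough throughput budget for a software-only first trigger stage.
// The farm size is an assumed round number, not the real LHCb card count.
#include <cstdio>

int main() {
    const double input_rate_hz = 30e6;   // full event rate reaching HLT1
    const double n_gpus        = 300.0;  // assumed size of the GPU farm
    const double per_gpu_hz    = input_rate_hz / n_gpus;   // rate each card must absorb
    const double budget_us     = 1e6 / per_gpu_hz;         // amortized time per event
    std::printf("per-GPU rate: %.0f kHz, budget: %.1f microseconds per event\n",
                per_gpu_hz / 1e3, budget_us);
    return 0;
}
```

Under these assumptions, each card has only of order ten microseconds per event, amortized over the full reconstruction chain, so every additional ML model must justify its cost within that budget.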
How ATLAS, CMS, and ALICE Differ
ATLAS and CMS retain their FPGA-based Level-1 triggers. These systems ensure that only a sharply reduced event stream reaches the software-based HLT. ML research in both experiments focuses increasingly on this L1 stage, because this is where inference can shape the downstream dataset most effectively. Their software HLTs, although sophisticated and capable of hosting ML, operate at rates already reduced in hardware, which changes the computational problem entirely.
ALICE, by contrast, adopted a continuous readout model. It records essentially all collisions and reconstructs them later using a GPU-heavy offline farm. This approach reflects ALICE’s heavy-ion program and the different rate environment in which it operates. While conceptually similar to LHCb in its GPU reliance, ALICE is not tasked with reducing a 30 MHz proton-proton input in real time.
Within this spectrum, LHCb is singular. It is the only LHC experiment where the very first selection layer is software-defined, GPU-based, and must execute complete event-level algorithms on the full collision rate. This makes it more flexible than ATLAS and CMS, but less efficient and less deterministic than FPGA-based solutions.
What Was Gained and What Was Lost
The GPU-only path simplified deployment. Most physicists are more comfortable with C++ and CUDA than with FPGA firmware development. Updates are distributed as software releases rather than hardware redesigns. The architecture is homogeneous, making operations and scaling straightforward. LHCb also tied its future to the rapid evolution of GPUs, which continue to expand their ML capabilities.
But the losses are significant. By giving up the hardware trigger, LHCb lost the opportunity to deploy ML inference exactly where it matters most: at the first rejection layer. It also accepted a system that consumes more power and demands more rigid internal frameworks to manage execution. In comparison, ATLAS and CMS can continue experimenting with FPGA-based ML at Level-1 while shielding their software stages from the full rate of unfiltered events.
Conclusion
The decision to adopt a fully GPU-based trigger at LHCb was a conscious prioritization of flexibility and rapid adaptability over efficiency and determinism. It gave the experiment unique strengths but also introduced hard limits. Where ATLAS and CMS can place machine learning directly into their FPGA-based Level-1 triggers, LHCb must carry the entire burden in software. This narrows the scope of what can be attempted in HLT1 and leaves some of the potential of early-stage ML inference untapped.
The boldness of the LHCb design cannot be denied, but it has also exposed the limits of abandoning hardware triggers entirely. In the long run, hybrid designs or new abstraction layers may be needed to recover some of the advantages that were sacrificed in the name of flexibility.
We must also consider future advances in both hardware and software that could bridge the gap between the flexibility of a fully software-based approach and the efficiency of hardware triggers. As ML techniques continue to evolve, there may be opportunities to integrate them more effectively into the trigger chain, regardless of the underlying architecture. This is a path we may need to pursue not only for the HL-LHC but also for future projects such as the FCC. An FCC-era accelerator, running roughly 40 years from now, will be an interesting challenge for the community; but on that timescale, both the community and the technological landscape will be entirely different.