Mohamed Elashri

Machine Learning-Based b-Jet Tagging in pp Collisions at √s = 13 TeV

By Hadi Hassan, Neelkamal Mallick, and D.J. Kim

Recently, I have taken an interest in b-tagging and machine learning applications in this area, so I would like to comment on the first recent paper I read on the matter. The paper introduces a method using a CNN to tag b-jets in proton-proton collisions at 13 TeV. The authors use Pythia8 for event generation and apply smearing to mimic the ALICE detector's resolution. Features from jets, tracks, and secondary vertices are used to train the model. The results show that their ML model outperforms traditional methods such as the secondary-vertex and impact-parameter approaches.
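To make the setup concrete, here is a minimal sketch of the kind of tagger the paper describes: a convolutional branch over per-track features combined with jet-level and secondary-vertex features, producing a b-jet score. The architecture, layer sizes, and feature counts below are my own assumptions for illustration, not the authors' actual model.

```python
# Minimal sketch (NOT the authors' architecture): a CNN-style tagger over
# per-track features, combined with jet-level and secondary-vertex features.
# All layer sizes and feature counts are illustrative assumptions.
import torch
import torch.nn as nn

class BJetTagger(nn.Module):
    def __init__(self, n_track_feats=6, n_jet_feats=4, n_sv_feats=5):
        super().__init__()
        # 1x1 convolutions act per track over the (zero-padded) track list
        self.track_conv = nn.Sequential(
            nn.Conv1d(n_track_feats, 32, kernel_size=1), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=1), nn.ReLU(),
        )
        # jet- and SV-level features join the pooled track representation
        self.head = nn.Sequential(
            nn.Linear(64 + n_jet_feats + n_sv_feats, 64), nn.ReLU(),
            nn.Linear(64, 1),  # logit for b vs. non-b
        )

    def forward(self, tracks, jet_feats, sv_feats):
        # tracks: (batch, n_track_feats, max_tracks)
        x = self.track_conv(tracks).mean(dim=2)          # pool over tracks
        x = torch.cat([x, jet_feats, sv_feats], dim=1)   # combine branches
        return self.head(x).squeeze(1)

# toy forward pass with random inputs, just to show the shapes
model = BJetTagger()
logits = model(torch.randn(8, 6, 20), torch.randn(8, 4), torch.randn(8, 5))
print(torch.sigmoid(logits))  # per-jet b-tag scores
```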

Key Takeaways

Context & Analysis

The first is that the performance degrades for low-pT jets due to reduced track multiplicity and shorter decay lengths. The authors do not discuss any mitigation path, such as exploring dynamic feature selection (a toy sketch of that idea follows below). This is particularly important because the upcoming HL-LHC runs will be high-luminosity environments where low-pT jets dominate, which limits the method's applicability exactly where it would be needed.
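As a toy illustration of what dynamic feature selection could look like, the snippet below switches between feature subsets depending on jet pT. The threshold, feature names, and the idea of dropping secondary-vertex features at low pT are all my assumptions, not anything proposed in the paper.

```python
# Hypothetical "dynamic feature selection": different feature subsets for
# low- and high-pT jets. Threshold and feature names are assumed, not from the paper.
import numpy as np

LOW_PT_FEATURES  = ["track_ip_significance", "jet_mass"]          # usable at low pT
HIGH_PT_FEATURES = ["sv_decay_length", "sv_mass", "n_sv_tracks"]  # need displaced vertices

def select_features(jet: dict, pt_threshold: float = 20.0) -> np.ndarray:
    names = LOW_PT_FEATURES if jet["pt"] < pt_threshold else LOW_PT_FEATURES + HIGH_PT_FEATURES
    return np.array([jet.get(name, 0.0) for name in names])

jet = {"pt": 12.0, "track_ip_significance": 3.1, "jet_mass": 6.4}
print(select_features(jet))  # only the low-pT subset is used for this jet
```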

The second point is that while smearing adds some realism to the ALICE detector simulation, it only approximates detector effects: it does not account for complexities like pileup, non-Gaussian resolution tails, or detector inefficiencies (e.g., missed tracks or vertices). Work like this would benefit from a full GEANT4 simulation, because the current oversimplification will, for example, overestimate the secondary-vertex reconstruction accuracy (a toy illustration of the missing tails follows below).
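Here is a small toy comparison of purely Gaussian smearing against a response with a wide non-Gaussian tail. The resolutions and tail fraction are made-up numbers chosen only to show how much a tail changes the rate of badly mismeasured impact parameters, which is what secondary-vertex reconstruction ultimately depends on.

```python
# Toy comparison: Gaussian smearing vs. a response with a non-Gaussian tail.
# All numbers (resolutions, tail fraction) are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
true_d0 = rng.exponential(scale=0.05, size=100_000)  # toy track impact parameters [cm]

# purely Gaussian smearing with a single resolution parameter
gauss_smeared = true_d0 + rng.normal(0.0, 0.01, size=true_d0.size)

# toy non-Gaussian response: 90% narrow core + 10% wide tail (mismeasured tracks)
tail = rng.random(true_d0.size) < 0.10
sigma = np.where(tail, 0.05, 0.01)
realistic_smeared = true_d0 + rng.normal(0.0, sigma)

for name, x in [("gaussian only", gauss_smeared), ("with tails", realistic_smeared)]:
    print(name, "fraction mismeasured by > 0.03 cm:", np.mean(np.abs(x - true_d0) > 0.03))
```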

Let's also talk about the fact that the paper is light on details about how the current model was chosen. Ablation studies and hyperparameter optimization details are absent, raising questions about whether the design is optimal. That discussion would be more useful from the ML point of view than the heavy focus on the physics aspects. Let's be real: this is introductory work, and a lot more would be needed before anything like this could be added to ALICE.

On that note, the most important point for me is that training on 40M events with 350k b-jets for 300 epochs implies significant computational resources, yet no discussion of inference speed or feasibility for triggering is provided (a rough sketch of such a latency check is below). Maybe this was outside the scope of the paper, but realistically, in my experience, most of the remaining work lives in exactly those details, and people are rarely happy with how slow and difficult that part turns out to be.
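For completeness, here is the rough kind of latency check I would have liked to see reported. The model below is an arbitrary stand-in, and the number it prints says nothing about the real tagger; the point is only that measuring per-jet inference time is cheap to do and directly informs any trigger-level discussion.

```python
# Rough per-jet inference latency measurement on a toy stand-in model.
# Real numbers depend on the actual architecture, hardware, and batching,
# none of which the paper reports.
import time
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(40, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
).eval()
batch = torch.randn(256, 40)  # 256 jets, 40 input features (assumed)

with torch.no_grad():
    model(batch)  # warm-up
    start = time.perf_counter()
    for _ in range(100):
        model(batch)
    elapsed = time.perf_counter() - start

print(f"~{elapsed / (100 * 256) * 1e6:.2f} us per jet on CPU for this toy model")
```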