Let me complain about lxplus
Most people logging into lxplus arrive with reasonable expectations: a stable environment to develop, test, and iterate before pushing jobs to the grid. What they often find instead is a system built around very different assumptions. lxplus isn’t designed as a workstation or development platform; it’s a launchpad. And once you understand that, many of the daily frustrations begin to make sense.
The most fundamental of these frictions is the lack of session stability. lxplus nodes sit behind load balancers that reassign VMs regularly. If your session is idle, or sometimes even active, it can vanish without warning. `tmux` and `screen` help against client-side disconnects, but not against the host being recycled entirely. This is especially frustrating for users connecting from campuses or ISPs that apply stateful NAT or mix IPv6 with tunneled IPv4: intermittent delays caused by handshake retransmissions can silently trip SSH’s default keepalive mechanisms, dropping connections even during interactive use. One of the common complaints about eduroam is actually lxplus disconnecting very frequently because of these odd network conditions (I see you, Geneva Airport eduroam).
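One mitigation is to send application-level keepalives from your side, so NAT state and idle timers get refreshed before they expire. A minimal sketch for `~/.ssh/config`; the 30-second interval and the multiplexing options are sensible defaults, not anything lxplus requires:

```
Host lxplus*.cern.ch
    # Send an encrypted keepalive every 30 s; give up after 4 missed replies
    ServerAliveInterval 30
    ServerAliveCountMax 4
    # Reuse one authenticated connection for additional shells and copies
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h-%p
    ControlPersist 10m
```

Connection multiplexing also means you authenticate (and type your OTP, where required) once per `ControlPersist` window rather than once per shell.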
Storage limitations make this worse. AFS is quota-limited to a few GB, just enough to compile a project or two. Exceed that, and even basic commands like `make` fail until you manually remove files. EOS, the alternative, provides more space but suffers from inconsistent performance. Copying large ROOT files, training ML models, or simply running a `hadd` job to merge ntuples becomes a waiting game. I/O-heavy workflows hit a bottleneck, and if a job tries to write back many output files, the tail latency often causes batch retries.
lxplus also enforces strict resource limits. Processes that cross memory thresholds are killed, sometimes silently. A job that merges ntuples can die midway, leaving only partial output and a cryptic email arriving much later. Since storage is remote and job restarts are manual, this often means redoing the full operation from scratch. Even modest local machines outperform lxplus for such tasks.
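When a long merge can be killed at any moment, it helps to at least make the output atomic: write to a temporary name and rename only on success, so a kill never leaves behind a file that looks complete. A sketch of the pattern; the `hadd` invocation is commented out and replaced by a stand-in so the sketch runs without ROOT, and `merged.root` and the input glob are assumed names:

```shell
#!/bin/sh
set -e                          # abort on the first failing command
out="merged.root"
tmp="${out}.part"

# The real merge on lxplus would be something like (assumed invocation):
#   hadd -f "$tmp" input_*.root
: > "$tmp"                      # stand-in: create the output so the sketch runs anywhere

mv "$tmp" "$out"                # the rename is the "commit": nobody ever reads a partial file
echo "published $out"
```

If the process is killed before the `mv`, you find a `.part` file and know the merge must be redone; you never mistake a truncated file for a finished one.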
This is compounded by configuration drift. lxplus updates CVMFS trees frequently, especially for shared software like ROOT, Python packages, or LHCb-specific tools. Local institutional clusters often lag behind due to administrative constraints. As a result, code that compiles and runs on lxplus may fail on your university’s farm, with symbol mismatches or missing libraries. Some users respond by copying dependencies manually or freezing CVMFS snapshots, but both approaches are brittle and require constant maintenance.
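One way to reduce the drift is to pin both machines to the same published software stack rather than whatever "latest" resolves to on each side. A sketch, assuming your site mounts the same CVMFS repository; the exact view version and platform string are placeholders to adapt to what your experiment actually uses:

```shell
# Put the same line in your setup script on BOTH lxplus and the university
# farm, so each environment resolves the identical ROOT/Python stack:
source /cvmfs/sft.cern.ch/lcg/views/LCG_105/x86_64-el9-gcc12-opt/setup.sh
```

Pinning an explicit view means an lxplus-side update can no longer silently change your toolchain between two runs of the same analysis.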
The workflow itself starts to stretch. Reading data from EOS is slow. If a session drops, you lose your working state. Reconnecting means repeating the setup, enduring the slow reads again, and hoping memory limits aren’t breached. This makes fast feedback cycles difficult. Instead of edit-compile-test loops, you batch changes and hope for the best, which introduces new risk and delays the discovery of bugs.
Still, a few practical shifts can help smooth the experience. Treat lxplus as a control interface, not a workstation. Use it to sync code, test small units, submit jobs, and monitor them. Keep heavy development and file operations on a machine you control, whether that’s a laptop or a university cluster with persistent storage and predictable behavior. When you must move data off EOS, split it into manageable archives, say 200 to 300 MB, and use `xrdcp`¹ with resume options to hedge against link failures. Avoid generating many small files; bundle them into containers and unpack them elsewhere. And when possible, nudge your team or institution to keep CVMFS snapshots in sync with lxplus, so your tools behave the same across environments.
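The bundling step can be sketched with plain `tar` and `split`; the ~250 MB chunk size and the directory names are arbitrary choices, and the dummy files just make the example self-contained:

```shell
#!/bin/sh
set -e
mkdir -p outputs bundles
# Dummy stand-ins for the many small result files you'd really have on EOS:
for i in 1 2 3; do head -c 1024 /dev/zero > "outputs/ntuple_$i.root"; done

# One tar stream, cut into ~250 MB pieces; a failed transfer then costs you
# one chunk instead of a single multi-GB archive:
tar -cf - outputs | split -b 250M - bundles/outputs.tar.part.

# On the receiving side, reassembly is just:
#   cat bundles/outputs.tar.part.* | tar -xf -
ls bundles/
```

Each piece can then be copied individually with `xrdcp`, retried independently, and verified by size before you delete anything on the EOS side.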
For anything that lasts longer than a coffee break, don’t trust a terminal pane. Submit it as a batch job through HTCondor. Keep the interactive work short, intentional, and resilient to disconnections.
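A minimal submit description is enough for most of these jobs. A sketch, where `merge.sh` is an assumed wrapper script and `+JobFlavour` is the CERN-specific knob for requesting a wall-time class:

```
# merge.sub -- submit with: condor_submit merge.sub
executable   = merge.sh
output       = merge.$(ClusterId).out
error        = merge.$(ClusterId).err
log          = merge.$(ClusterId).log
+JobFlavour  = "workday"
queue
```

The "workday" flavour corresponds to roughly a working day of wall time on CERN's pool; shorter flavours like "longlunch" schedule faster. Either way, the job survives your SSH session dying, which is the whole point.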
None of these steps make lxplus a full-featured development machine, but that was never its job. Used well, it is a clean gateway into CERN infrastructure, a stable authentication point, and a job dispatcher. Trying to stretch it into more than that invites frustration. Instead, shape your workflow to what lxplus is, and you’ll recover a measure of speed, clarity, and control. And maybe even a bit of sanity.
¹ `xrdcp` is a command-line tool for copying files to and from EOS (CERN’s large-scale disk storage service) and other XRootD servers. It supports options for retrying and resuming interrupted transfers, which helps in unreliable network conditions. ↩︎