Code Sync Across Network-Restricted Machines at CERN

When working with CERN’s network-restricted machines (often used in experimental trigger development), we may need to connect through multiple SSH hops. Typically, this means logging into lxplus, then jumping to an intermediate node such as lbgw, and finally reaching the restricted machine. For anyone who prefers local tools like VSCode or JetBrains IDEs, this setup can be challenging: installing and maintaining their remote backends in a restricted environment is resource-intensive and often impractical.

To solve this, we’ll set up SSH key-based authentication across all nodes and use a bidirectional sync script that keeps the local and remote copies of the project in step automatically. This lets us work with all our favorite local tools while the remote environment stays up to date in near real time. With CERN IT testing two-factor authentication (2FA) for SSH access, the approach is also timely, since it removes the need for repeated 2FA prompts.

With this setup, we avoid installing heavy remote servers for VSCode or JetBrains, keeping our development environment flexible and efficient.

If you just need the script, you can download it as a gist here.


Required Dependencies and Supported OS

This setup requires a few essential tools that are typically pre-installed on Linux and macOS:

  1. SSH: Secure access across multiple hops is central to this setup.
  2. rsync: Provides efficient file syncing between local and remote systems.

OS Compatibility
  • Linux: Fully compatible with Linux distributions, especially those using apt (Debian/Ubuntu) or yum/dnf (RHEL/Fedora) for package management.
  • macOS: Compatible with Homebrew-installed dependencies.
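
Before installing anything, a quick check confirms whether both tools are already available on the local machine (this check is just a convenience, not part of the sync script):

# Report any missing tool
for tool in ssh rsync; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool is not installed"
done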

Installing Dependencies

On Linux, we can install rsync with:

# Debian/Ubuntu
sudo apt install rsync

# RHEL/Fedora
sudo dnf install rsync   # on older releases: sudo yum install rsync

For macOS:

  1. Install Homebrew if we haven’t already:
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
    
  2. Install rsync:
    brew install rsync
    

Setting Up SSH Key Authentication Across Hops

Using SSH keys simplifies access and removes the need to re-enter credentials at each hop, which is especially useful for those testing CERN IT’s 2FA SSH access.

Generate an SSH key pair on our local machine if we haven’t already:

ssh-keygen -t rsa -b 4096 -C "email@example.com"
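
An Ed25519 key is an equally valid alternative if we prefer it and the hosts in the chain accept it; the rest of the setup is identical:

ssh-keygen -t ed25519 -C "email@example.com"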

Once we have the keys, copy the public key to each machine in our SSH chain, which might look like lxplus ➞ lbgw ➞ restricted machine. Set up the keys in the following order:

  1. Add the Key to lxplus:

    ssh-copy-id -i ~/.ssh/id_rsa.pub username@lxplus.cern.ch
    
  2. Add the Key to lbgw via lxplus:

    ssh -J username@lxplus.cern.ch username@lbgw 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys' < ~/.ssh/id_rsa.pub
    
  3. Add the Key to the Restricted Machine via lbgw:

    ssh -J username@lxplus.cern.ch,username@lbgw username@restricted_machine 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys' < ~/.ssh/id_rsa.pub
    

This key setup enables seamless SSH access across the hops, letting us reach the final machine without re-entering credentials at each step.
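
To confirm the keys work end to end before touching any configuration, we can run a quick non-interactive command across the full chain (using the same placeholder hostnames as above). If everything is in place, this prints the hostname of the restricted machine without any password prompt:

ssh -J username@lxplus.cern.ch,username@lbgw username@restricted_machine hostname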

To streamline further, configure our ~/.ssh/config file with the multi-hop settings:

Host lxplus
    User username
    HostName lxplus.cern.ch
    ForwardAgent yes

Host lbgw
    User username
    HostName lbgw
    ProxyJump lxplus

Host restricted_machine
    User username
    HostName hostname_of_restricted_machine
    ProxyJump lbgw

With this setup, we can reach the restricted machine by typing ssh restricted_machine.
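
Because rsync runs over SSH, the same alias works for file transfers too; a dry run with -n is a harmless way to confirm that the hop chain and paths resolve before the sync script depends on them (paths here are placeholders):

rsync -avzn /path/to/local/project/ restricted_machine:/path/to/remote/project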


Writing a Real-Time Bidirectional Sync Script with Checksum Comparison

With SSH keys and simplified connections in place, we can now create a script that monitors our local and remote project directories for changes using checksum comparison. By generating a checksum of each directory, the script detects when changes occur and syncs in both directions as needed. This approach provides efficient, real-time sync across our machines.

The sync script is organized into logical parts for flexibility and ease of maintenance.

Script Structure

Before writing the script, it helps to think about its structure and the functions it will need. Let’s outline the requirements:

Requirements:
  1. Sync changes from local to remote and vice versa.
  2. Monitor both local and remote directories for changes.
  3. Update files on both machines whenever changes are detected.
  4. Handle both local and remote directories.
  5. Generate checksums for local and remote directories.
  6. Implement best practices for logging and error handling.

Based on these requirements, we can write the script in functional blocks; the following sections show how each requirement maps to a simple function.
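
As a rough roadmap, the layout sketched below names the blocks we will fill in one by one (function names match the listings that follow):

# sync.sh - overall layout
# 1. argument parsing and logging helpers
# 2. generate_checksum()    - hash the contents of a directory tree
# 3. sync_local_to_remote() - push local changes with rsync
# 4. sync_remote_to_local() - pull remote changes with rsync
# 5. monitor_changes()      - compare checksums in a loop and trigger syncs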

Functions

Parameters and Logging
#!/bin/bash

# Parse user inputs
REMOTE_USER="$1"
REMOTE_SERVER="$2"
REMOTE_PATH="$3"
LOCAL_PATH="$4"
SYNC_INTERVAL="${5:-10}"

# Define colors for logs
GREEN='\033[0;32m'
RED='\033[0;31m'
NC='\033[0m' # No color

log_info() { echo -e "${GREEN}[INFO]${NC} $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $1"; }

This part sets up color-coded logging functions and parses user inputs for the remote and local paths, with an optional sync interval (defaulting to 10 seconds).
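
Since the four path arguments are all required, a small guard placed right after the logging helpers (an addition, not part of the original listing) catches missing inputs early:

# Abort early if a required argument is missing
if [ -z "$REMOTE_USER" ] || [ -z "$REMOTE_SERVER" ] || [ -z "$REMOTE_PATH" ] || [ -z "$LOCAL_PATH" ]; then
    log_error "Usage: $0 <remote_user> <remote_server> <remote_path> <local_path> [sync_interval]"
    exit 1
fi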

Checksum Generation
# Function to generate a single checksum for all files in a directory
generate_checksum() {
    # cd first so file paths are relative; otherwise the local and remote
    # checksums would differ simply because of the leading directory names
    (cd "$1" && find . -type f ! -path "*/node_modules/*" -exec md5sum {} + 2>/dev/null | sort | md5sum | awk '{print $1}')
}

This function generates a single checksum for an entire directory tree, excluding node_modules. (Note for macOS users: md5sum comes from GNU coreutils, which macOS does not ship by default; brew install coreutils is one way to get it.) To exclude another directory as well, we can pass an additional argument to the function:

# Function to generate a checksum with an extra excluded directory
generate_checksum() {
    # "$2" names one more directory to skip
    (cd "$1" && find . -type f ! -path "*/node_modules/*" ! -path "*/$2/*" -exec md5sum {} + 2>/dev/null | sort | md5sum | awk '{print $1}')
}
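
For example, to also skip a build directory (the directory name here is only an illustration), the call becomes:

local_checksum=$(generate_checksum "$LOCAL_PATH" "build")
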
Bidirectional Sync Functions
# Sync function from local to remote
sync_local_to_remote() {
    log_info "Syncing changes from local to remote"
    # -u skips files that are newer on the remote side, so the latest edit wins
    rsync -avzu --exclude 'node_modules' "$LOCAL_PATH/" "${REMOTE_USER}@${REMOTE_SERVER}:${REMOTE_PATH}"
}

# Sync function from remote to local
sync_remote_to_local() {
    log_info "Syncing changes from remote to local"
    # -u skips files that are newer on the local side
    rsync -avzu --exclude 'node_modules' "${REMOTE_USER}@${REMOTE_SERVER}:${REMOTE_PATH}/" "$LOCAL_PATH"
}

These functions use rsync to copy files and directories in each direction, excluding node_modules; the -u (--update) flag skips files that are newer on the receiving side, so whichever copy was edited most recently wins.
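
If more directories should be skipped, rsync accepts repeated --exclude flags; the extra names below are purely illustrative:

rsync -avzu --exclude 'node_modules' --exclude '.git' --exclude 'build' "$LOCAL_PATH/" "${REMOTE_USER}@${REMOTE_SERVER}:${REMOTE_PATH}"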

Monitoring for Changes
# Monitor changes based on checksum comparison
monitor_changes() {
    while true; do
        local_checksum=$(generate_checksum "$LOCAL_PATH")
        remote_checksum=$(ssh "$REMOTE_USER@$REMOTE_SERVER" "cd $REMOTE_PATH && find . -type f ! -path '*/node_modules/*' -exec md5sum {} + 2>/dev/null | sort | md5sum | awk '{print \$1}'")

        if [ "$local_checksum" != "$remote_checksum" ]; then
            log_info "Detected changes. Syncing..."
            sync_remote_to_local
            sync_local_to_remote
        else
            log_info "No changes detected. Skipping sync."
        fi

        # Wait before the next check
        sleep "$SYNC_INTERVAL"
    done
}
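
One last piece is still missing: the listings above only define functions, so the script needs a short entry point at the bottom that kicks everything off. A minimal sketch could be:

# Entry point: run one initial sync in both directions, then keep watching
log_info "Starting bidirectional sync between $LOCAL_PATH and ${REMOTE_SERVER}:${REMOTE_PATH}"
sync_remote_to_local
sync_local_to_remote
monitor_changes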

With these pieces in place, the script is ready to run. Save it as sync.sh.

Running the Script

After creating the script, make it executable and run it with our desired paths and sync interval:

chmod +x sync.sh
./sync.sh username restricted_machine /path/to/remote/project /path/to/local/project 10
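
The script loops until interrupted, so it can also be left running in the background (or inside tmux/screen); for example:

nohup ./sync.sh username restricted_machine /path/to/remote/project /path/to/local/project 10 > sync.log 2>&1 &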

This setup lets us work seamlessly with our local development tools without installing them on the remote environment. With the dependencies covered for both Linux and macOS, this guide offers a flexible workflow for CERN developers on network-restricted machines, reducing the friction caused by multi-hop SSH connections and 2FA.