gparallel 🖥️🚀

gparallel is a GPU-aware parallel job scheduler with a tmux-like TUI for managing GPU workloads on a single machine. Perfect for researchers and ML engineers who need to maximize GPU utilization without the complexity of cluster managers.


Features

  • 🚀 Automatic GPU allocation - Jobs distributed across available GPUs
  • 📊 Real-time TUI - Beautiful terminal interface with GPU status, job queue, and live logs
  • 📜 Smart scrolling - Navigate through large job queues with arrow keys
  • 🔄 Job state tracking - Visual indicators for QUEUE, RUN, DONE, and FAIL states
  • 💾 GPU memory monitoring - Real-time memory usage with color-coded indicators
  • 🎯 GPU status indicators - ● (running) / ○ (idle) status for each GPU
  • ⚡ Non-blocking execution - Jobs start immediately as GPUs become available
  • 🛑 Graceful shutdown - Ctrl+C kills all running jobs cleanly

Quick Start

# Create a command list (one command per line)
cat > commands.txt <<'EOF'
python train.py --model bert --epochs 10
python train.py --model gpt2 --epochs 20
python eval.py --checkpoint model_best.pt
EOF

# Run with TUI (default)
gparallel commands.txt

# Run without TUI for CI/scripting
gparallel commands.txt --no-tui

Terminal UI

When running in TUI mode, gparallel displays a comprehensive interface:

┌─ GPUs ───────────────────┐┌─ Job queue ──────────────────────────────────┐
│0 ● RTX4090  20312 MB     ││3c93a4f1 train.py --model bert    RUN  G0    │
│1 ○ RTX4090  24576 MB     ││[f1ed8a92 train.py --model gpt2    QUEUE]    │ 
│2 ● RTX4090  18432 MB     ││a8b2c3d4 eval.py --checkpoint...  RUN  G2    │
└──────────────────────────┘└──────────────────────────────────────────────┘
┌─ Live log : job #f1ed8a92 (tail -f) ─────────────────────────────────┐
│No logs yet for job f1ed8a92 (train.py --model gpt2)                  │
│                                                                       │
└───────────────────────────────────────────────────────────────────────┘
↑/↓ Navigate jobs  q Quit (jobs continue)  Ctrl+C Force quit & stop all jobs  Auto-exit when all jobs complete

UI Components

  1. GPU Panel (top-left)

    • GPU ID and name
    • Status indicator: ● (running job) / ○ (idle)
    • Available memory in MB with color coding (see the sketch after this list):
      • 🟢 Green: <50% usage
      • 🟡 Yellow: 50-80% usage
      • 🔴 Red: >80% usage
  2. Job Queue Panel (top-right)

    • Job ID (first 8 chars of UUID)
    • Command (truncated if too long)
    • State: QUEUE, RUN (with GPU), DONE, or FAIL
    • Scrollable with ↑/↓ keys when many jobs exist
  3. Live Log Panel (bottom)

    • Shows stdout/stderr from selected job
    • Auto-selects first job
    • Updates in real-time
    • Limited to last 1000 lines per job
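
The color coding above maps straight from a usage percentage to a color. As a minimal Rust sketch of that rule (the function name memory_color is hypothetical; only the thresholds come from this README):

// Hypothetical helper mirroring the TUI's memory color coding.
fn memory_color(used_mb: u64, total_mb: u64) -> &'static str {
    let pct = used_mb as f64 / total_mb as f64 * 100.0;
    if pct < 50.0 {
        "green"          // <50% used: plenty of headroom
    } else if pct <= 80.0 {
        "yellow"         // 50-80% used: getting full
    } else {
        "red"            // >80% used: near capacity
    }
}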

Keyboard Controls

  • ↑/↓ - Navigate through jobs in the queue
  • q - Quit gparallel (jobs continue running in background)
  • Ctrl+C - Force quit and terminate all running jobs

Installation

From Source (Recommended)

# Clone the repository
git clone https://github.com/combinatrix-ai/gparallel
cd gparallel

# Build with Cargo
cargo build --release

# Copy to your PATH
sudo cp target/release/gparallel /usr/local/bin/

Using Cargo

# Once published to crates.io
cargo install gparallel

Dependencies

  • Rust 1.70+
  • NVIDIA drivers and CUDA toolkit
  • Terminal with UTF-8 support for UI elements

Usage

Basic Usage

# Run jobs from a file
gparallel jobs.txt

# Specify visible GPUs
CUDA_VISIBLE_DEVICES=0,2,4 gparallel jobs.txt

# Disable TUI for scripts/CI
gparallel jobs.txt --no-tui

Command File Format

Create a text file with one command per line:

# train_jobs.txt
python train.py --config configs/experiment1.yaml
python train.py --config configs/experiment2.yaml
python train.py --config configs/experiment3.yaml
./run_benchmark.sh --gpu-test
jupyter nbconvert --execute notebook.ipynb

Generating Commands Dynamically

# Generate parameter sweep
for lr in 0.01 0.001 0.0001; do
    for bs in 32 64 128; do
        echo "python train.py --lr $lr --batch-size $bs"
    done
done > sweep_jobs.txt

gparallel sweep_jobs.txt

How It Works

  1. GPU Detection

    • Respects CUDA_VISIBLE_DEVICES if set
    • Uses NVML for GPU information and memory monitoring (see the first sketch after this list)
    • Falls back to nvidia-smi if NVML unavailable
    • Assumes single GPU if detection fails
  2. Job Scheduling

    • Round-robin assignment to available GPUs
    • Jobs queued when all GPUs busy
    • Immediate dispatch when GPU becomes free
    • Each job gets exclusive use of its assigned GPU via CUDA_VISIBLE_DEVICES (see the second sketch after this list)
  3. Process Management

    • Spawns jobs via bash -c
    • Captures stdout/stderr to memory buffers
    • Tracks process IDs for signal handling
    • Updates job states in real-time
  4. Memory Monitoring

    • Polls GPU memory every 2 seconds
    • Updates display with current free memory
    • Color-codes based on usage percentage
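
For illustration, here is a rough sketch of the detection and polling loop from items 1 and 4, written against the nvml-wrapper crate. The crate choice and program structure are assumptions, not gparallel's actual source:

use nvml_wrapper::Nvml;
use std::{thread, time::Duration};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Init fails if the NVIDIA driver is missing; gparallel would fall
    // back to nvidia-smi, or assume a single GPU, in that case.
    let nvml = Nvml::init()?;
    loop {
        for i in 0..nvml.device_count()? {
            let dev = nvml.device_by_index(i)?;
            let mem = dev.memory_info()?; // total / used / free, in bytes
            println!("GPU {i} {}: {} MB free", dev.name()?, mem.free / 1_048_576);
        }
        thread::sleep(Duration::from_secs(2)); // the 2-second poll interval
    }
}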
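
And a sketch of the per-job GPU pinning from items 2 and 3; spawn_on_gpu is an illustrative name, not gparallel's API:

use std::io;
use std::process::{Child, Command, Stdio};

// Spawn one queued command on its assigned GPU. Because the child
// process only sees that device, unmodified scripts that grab
// "GPU 0" still land on the right card.
fn spawn_on_gpu(cmd: &str, gpu: usize) -> io::Result<Child> {
    Command::new("bash")
        .arg("-c")
        .arg(cmd) // the command line exactly as written in the jobs file
        .env("CUDA_VISIBLE_DEVICES", gpu.to_string())
        .stdout(Stdio::piped()) // captured for the live-log panel
        .stderr(Stdio::piped())
        .spawn()
}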

Command Line Options

gparallel [OPTIONS] <FILENAME>

Arguments:
  <FILENAME>  File containing commands to execute (one per line)

Options:
      --no-tui                     Disable TUI and use plain text output
  -h, --help                       Print help
  -V, --version                    Print version

Troubleshooting

GPU Detection Issues

If gparallel detects the wrong set of GPUs:

# Check NVIDIA driver
nvidia-smi

# Force specific GPUs
export CUDA_VISIBLE_DEVICES=0,1,2
gparallel jobs.txt

TUI Not Displaying

If the TUI doesn't appear:

# Check terminal capabilities
echo $TERM

# Force non-TUI mode
gparallel jobs.txt --no-tui

Jobs Not Starting

Common causes:

  • All GPUs busy (check GPU panel)
  • Previous job hasn't released GPU yet
  • Command syntax error (check failed jobs)

Contributing

Contributions welcome! Please open an issue or submit a pull request.

Development Setup

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Clone and build
git clone https://github.com/combinatrix-ai/gparallel
cd gparallel
cargo build

# Run tests
cargo test

# Run with debug output
RUST_LOG=debug cargo run -- test_jobs.txt

License

MIT License - see LICENSE file for details.
