Train with GPUs
Train ZenML pipelines on GPUs and scale out with 🤗 Accelerate.
Need more compute than your laptop can offer? This tutorial shows how to:
Request GPU resources for individual steps.
Build a CUDA‑enabled container image so the GPU is actually visible.
Reset the CUDA cache between steps (optional but handy for memory‑heavy jobs).
Scale to multiple GPUs or nodes with the 🤗 Accelerate integration.
If your orchestrator supports it, you can reserve CPU, GPU, and RAM directly on a ZenML @step:
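A minimal sketch using ZenML's ResourceSettings (the step name and resource amounts are illustrative; adjust them to your workload):

```python
from zenml import step
from zenml.config import ResourceSettings

# Illustrative amounts: 8 CPUs, 1 GPU, 16 GB of RAM.
@step(settings={"resources": ResourceSettings(cpu_count=8, gpu_count=1, memory="16GB")})
def training_step() -> None:
    ...  # your GPU training code
```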
👉 Check your orchestrator's docs; some (e.g. SkyPilot) expose dedicated settings instead of ResourceSettings.
Requesting a GPU is not enough; your Docker image also needs the CUDA runtime. Use the official CUDA-enabled images for TensorFlow/PyTorch, or the pre-built ones offered by AWS, GCP, or Azure.
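For example, you can point ZenML's DockerSettings at a CUDA-enabled parent image (the image tag below is just an example; pick one that matches your framework and driver version):

```python
from zenml import pipeline
from zenml.config import DockerSettings

# Example CUDA-enabled parent image; choose a tag compatible with your drivers.
docker_settings = DockerSettings(
    parent_image="pytorch/pytorch:2.2.0-cuda11.8-cudnn8-runtime"
)

@pipeline(settings={"docker": docker_settings})
def gpu_pipeline():
    ...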
If you need to squeeze every last MB out of the GPU, consider clearing the CUDA cache by calling cleanup_memory() at the start of each GPU step.
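cleanup_memory() is a helper you define yourself; a minimal PyTorch-based sketch:

```python
import gc

import torch

def cleanup_memory() -> None:
    # Run garbage collection until nothing more is freed,
    # then release cached CUDA memory back to the driver.
    while gc.collect():
        pass
    torch.cuda.empty_cache()
```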
ZenML integrates with the Hugging Face Accelerate launcher. Wrap your training step with run_with_accelerate to fan it out over multiple GPUs or machines:
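A sketch of the pattern, using the decorator from ZenML's Hugging Face integration (the step arguments and values are illustrative):

```python
from zenml import pipeline, step
from zenml.integrations.huggingface.steps import run_with_accelerate

# Launch four processes, one per GPU.
@run_with_accelerate(num_processes=4, multi_gpu=True)
@step
def training_step(dataset_path: str, epochs: int) -> None:
    ...  # each launched process runs this body

@pipeline
def training_pipeline():
    # Accelerate-decorated steps must be called with keyword arguments.
    training_step(dataset_path="data/train", epochs=3)
```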
Common arguments:
num_processes: total number of processes to launch (one per GPU)
multi_gpu=True: enable multi-GPU mode
cpu=True: force CPU training
mixed_precision: "fp16", "bf16", or "no"
Accelerate‑decorated steps must be called with keyword arguments and cannot be wrapped a second time inside the pipeline definition.
Use the same CUDA image as above and add Accelerate to the requirements:
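For instance (image tag and package list are illustrative):

```python
from zenml import pipeline
from zenml.config import DockerSettings

docker_settings = DockerSettings(
    parent_image="pytorch/pytorch:2.2.0-cuda11.8-cudnn8-runtime",
    requirements=["accelerate", "torchvision"],
)

@pipeline(settings={"docker": docker_settings})
def training_pipeline():
    ...
```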
| Symptom | Fix |
| --- | --- |
| GPU is unused | Verify the CUDA toolkit inside the container (nvcc --version) and check driver compatibility |
| OOM even after cache reset | Reduce batch size, use gradient accumulation, or request more GPU memory |
| Accelerate hangs | Make sure ports are open between nodes; pass main_process_port explicitly |
Need help? Join us on Slack.