Failure to Initialize Container: unsatisfied condition: cuda>=12.6 #6465


Closed
Jack-Khuu opened this issue Mar 25, 2025 · 3 comments

Comments

Jack-Khuu commented Mar 25, 2025

GPU-based workflows fail after bumping the CUDA requirement from 12.4 -> 12.6 in torchchat.
Would love some help updating the driver, or suggestions on how to update the test config.

Example Run: https://github.com/pytorch/torchchat/actions/runs/14053926197/job/39349546495

docker: Error response from daemon: failed to create task for container: failed to create shim task: 
OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , 

stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: requirement error: unsatisfied condition: cuda>=12.6, 
please update your driver to a newer version, or use an earlier cuda container: unknown.
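For reference, the "CUDA Version" shown in the nvidia-smi header is the newest CUDA runtime the installed driver supports, so printing it on the runner would confirm whether the driver tops out at 12.4. A minimal sketch (the `debug-gpu-driver` job name and running directly on the g5 runner label are assumptions for illustration, not part of the existing workflow):

```yaml
  debug-gpu-driver:
    # Hypothetical diagnostic job: report the runner's NVIDIA driver version
    # and the highest CUDA version that driver supports.
    runs-on: linux.g5.4xlarge.nvidia.gpu
    steps:
      - name: Report NVIDIA driver / supported CUDA
        run: |
          nvidia-smi                                                   # header prints "CUDA Version: 12.x"
          nvidia-smi --query-gpu=driver_version --format=csv,noheader  # bare driver version
```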

Test Runner Config: https://github.com/pytorch/torchchat/blob/fea361f6cce0b1cdd54cc211dde19266753b60fc/.github/workflows/more-tests.yml#L11-L19

  test-cuda:
    permissions:
      id-token: write
      contents: read
    uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
    with:
      runner: linux.g5.4xlarge.nvidia.gpu
      gpu-arch-type: cuda
      gpu-arch-version: "12.6"
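As an interim workaround along the lines of the error's "use an earlier cuda container" suggestion, the job could be pinned back to the previously working toolkit version until the runner driver is updated. A sketch, assuming 12.4 images are still available to linux_job_v2:

```yaml
  test-cuda:
    permissions:
      id-token: write
      contents: read
    uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
    with:
      runner: linux.g5.4xlarge.nvidia.gpu
      gpu-arch-type: cuda
      gpu-arch-version: "12.4"  # roll back to 12.4 until the runner driver supports CUDA 12.6
```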

Related past issue: #5191

@clee2000 (Contributor) commented:

cc @atalman? @seemethere I'm not sure who owns nova.

I think it needs to use a newer CUDA driver, which would require changes to linux_job_v2 to take an input for this, but I also see pytorch/pytorch jobs named cuda12.6 that show 12.4 when nvidia-smi is run. Does the 12.6 on pytorch/pytorch just mean that the binary was built with 12.6 but not necessarily run on 12.6?
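If linux_job_v2 were extended that way, the call site might look roughly like the sketch below. Note that `driver-version` is a hypothetical input invented here for illustration; linux_job_v2.yml does not currently document such a parameter, and the value is only an example of a driver series new enough for CUDA 12.6:

```yaml
  test-cuda:
    uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
    with:
      runner: linux.g5.4xlarge.nvidia.gpu
      gpu-arch-type: cuda
      gpu-arch-version: "12.6"
      # Hypothetical input: linux_job_v2.yml would have to be changed to accept this
      # and provision/select a driver new enough for CUDA 12.6 on the runner.
      driver-version: "560"
```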

@HonestDeng commented:

> cc @atalman? @seemethere I'm not sure who owns nova.
>
> I think it needs to use a newer CUDA driver, which would require changes to linux_job_v2 to take an input for this, but I also see pytorch/pytorch jobs named cuda12.6 that show 12.4 when nvidia-smi is run. Does the 12.6 on pytorch/pytorch just mean that the binary was built with 12.6 but not necessarily run on 12.6?

Hi. I'm the owner of "Update CI Jobs in anticipation for Cuda 12.4 deprecation" (pytorch/torchchat#1515).

I'm new to torchchat and a bit confused by what you said. Do you mean we should update linux_job_v2.yml to use a newer CUDA driver?

Thanks.

@Jack-Khuu (Author) commented:

Ah, thanks @clee2000

@HonestDeng I'll follow up with you on Discord 😃
