
container run failed when using containerd instead of docker #1138

Open
@williamfpx

Description


1. Issue or feature description
```
$ ctr run --rm -t --gpus 0 general-vllm-infer-service:0.1.8 test nvidia-smi
ctr: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: ldcache error: process /sbin/ldconfig.real failed with error code: 1: unknown
```
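Note the failing path: /sbin/ldconfig.real is the Debian/Ubuntu name for the host's ldconfig, while CentOS 7 only ships /sbin/ldconfig, so any hook that hardcodes the .real suffix fails here. A quick check (assuming a stock CentOS 7 host):

```
# CentOS 7 only ships /sbin/ldconfig; the ".real" name is a Debian/Ubuntu convention
ls -l /sbin/ldconfig /sbin/ldconfig.real

# the ldconfig path the toolkit uses when invoked through nvidia-container-runtime
# (the "@" prefix means "run this binary from the host")
grep ldconfig /etc/nvidia-container-runtime/config.toml
```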
Here is /etc/containerd/config.toml:

```toml
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "nvidia"
disable_snapshot_annotations = true
discard_unpacked_layers = false
ignore_rdt_not_enabled_errors = false
no_pivot = false
systemd_cgroup = false

  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]

    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
      base_runtime_spec = ""
      cni_conf_dir = ""
      cni_max_conf_num = 0
      container_annotations = []
      pod_annotations = []
      privileged_without_host_devices = false
      runtime_engine = ""
      runtime_path = ""
      runtime_root = ""
      runtime_type = "io.containerd.runc.v2"

      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
        BinaryName = "/usr/bin/nvidia-container-runtime"
        CriuImagePath = ""
        CriuPath = ""
        CriuWorkPath = ""
        IoGid = 0
        IoUid = 0
        NoNewKeyring = false
        NoPivotRoot = false
        Root = ""
        ShimCgroup = ""
        SystemdCgroup = false

    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
      base_runtime_spec = ""
      cni_conf_dir = ""
      cni_max_conf_num = 0
      container_annotations = []
      pod_annotations = []
      privileged_without_host_devices = false
      runtime_engine = ""
      runtime_path = ""
      runtime_root = ""
      runtime_type = "io.containerd.runc.v2"

      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
        BinaryName = ""
        CriuImagePath = ""
        CriuPath = ""
        CriuWorkPath = ""
        IoGid = 0
        IoUid = 0
        NoNewKeyring = false
        NoPivotRoot = false
        Root = ""
        ShimCgroup = ""
        SystemdCgroup = false

```
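Note that everything under [plugins."io.containerd.grpc.v1.cri"] only applies to CRI clients (Kubernetes, crictl); plain ctr talks to containerd directly and never reads default_runtime_name. With ctr, --gpus 0 instead injects a prestart hook that calls nvidia-container-cli itself. A possible workaround (a sketch, not verified here) is to point ctr at the nvidia runtime binary explicitly and let its hook do the device injection:

```
# run through the nvidia runtime explicitly, since plain `ctr` never consults
# the CRI plugin's default_runtime_name; NVIDIA_VISIBLE_DEVICES drives the
# legacy hook's injection (drop --gpus so the hook is not added twice)
ctr run --rm -t \
  --runc-binary /usr/bin/nvidia-container-runtime \
  --env NVIDIA_VISIBLE_DEVICES=0 \
  general-vllm-infer-service:0.1.8 test nvidia-smi
```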

Using docker, it works:

```
$ docker run --rm --gpus all general-vllm-infer-service:0.1.8 nvidia-smi
Tue Jun 10 14:57:12 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.12              Driver Version: 550.90.12      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A10                     Off |   00000000:05:00.0 Off |                    0 |
|  0%   46C    P0             62W /  150W |       1MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```
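As far as I can tell, the docker path succeeds because --gpus is handled by nvidia-container-runtime-hook (installed by the toolkit), which reads /etc/nvidia-container-runtime/config.toml and therefore picks up the distro-correct ldconfig path. Two quick checks:

```
# which runtimes docker knows about
docker info | grep -i -A3 runtimes

# the hook binary docker's --gpus path relies on
command -v nvidia-container-runtime-hook
```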

2. Steps to reproduce the issue

```
$ ctr run --rm -t --gpus 0 general-vllm-infer-service:0.1.8 test nvidia-smi
```

Environment:
Host OS: CentOS 7
containerd: 1.6.33-3.1
docker: 26.1.4
nvidia-container-toolkit: 1.17.8
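Before comparing engines, the toolkit itself can be sanity-checked on the host; this invocation is from NVIDIA's troubleshooting docs (-k loads kernel modules, -d sends debug output to the terminal):

```
# exercise nvidia-container-cli directly, independent of docker/containerd
nvidia-container-cli -k -d /dev/tty info
```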

3. What differences exist between them

Comparing the specs generated by containerd and docker, the significant difference is below:

[Image: diff of the OCI runtime specs generated by containerd and docker]

Additionally, when using docker run, logs appear in /var/log/nvidia-container-runtime.log and /var/log/nvidia-container-toolkit.log, but nothing is written when using ctr run.
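That matches the code paths: those debug sinks are configured in /etc/nvidia-container-runtime/config.toml and are only honored when nvidia-container-runtime / nvidia-container-runtime-hook actually run, i.e. on the docker path; ctr run --gpus invokes nvidia-container-cli directly and skips both binaries, hence no log files. To confirm what is configured:

```
# the debug destinations only apply when the nvidia runtime/hook is invoked;
# plain `ctr run --gpus` bypasses both, so these files stay empty
grep -n debug /etc/nvidia-container-runtime/config.toml
```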

/var/log/nvidia-container-toolkit.log:

```
I0610 03:24:34.656249 3245 nvc.c:396] initializing library context (version=1.17.8, build=6eda4d76c8c5f8fc174e4abca83e513fb4dd63b0)
I0610 03:24:34.656307 3245 nvc.c:367] using root /
I0610 03:24:34.656311 3245 nvc.c:368] using ldcache /etc/ld.so.cache
I0610 03:24:34.656313 3245 nvc.c:369] using unprivileged user 65534:65534
I0610 03:24:34.656330 3245 nvc.c:413] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0610 03:24:34.656370 3245 nvc.c:415] dxcore initialization failed, continuing assuming a non-WSL environment
I0610 03:24:34.659852 3260 nvc.c:278] loading kernel module nvidia
I0610 03:24:34.660003 3260 nvc.c:282] running mknod for /dev/nvidiactl
I0610 03:24:34.660093 3260 nvc.c:286] running mknod for /dev/nvidia0
I0610 03:24:34.660132 3260 nvc.c:290] running mknod for all nvcaps in /dev/nvidia-caps
I0610 03:24:34.663391 3260 nvc.c:218] running mknod for /dev/nvidia-caps/nvidia-cap1 from /proc/driver/nvidia/capabilities/mig/config
I0610 03:24:34.663505 3260 nvc.c:218] running mknod for /dev/nvidia-caps/nvidia-cap2 from /proc/driver/nvidia/capabilities/mig/monitor
I0610 03:24:34.665444 3260 nvc.c:304] loading kernel module nvidia_uvm
I0610 03:24:34.665496 3260 nvc.c:308] running mknod for /dev/nvidia-uvm
I0610 03:24:34.665561 3260 nvc.c:313] loading kernel module nvidia_modeset
I0610 03:24:34.665594 3260 nvc.c:317] running mknod for /dev/nvidia-modeset
```

/var/log/nvidia-container-runtime.log:

```
{"level":"debug","msg":"Checking candidate '/usr/bin/runc'","time":"2025-06-10T03:24:31Z"}
{"level":"debug","msg":"Found 1 candidates; ignoring further candidates","time":"2025-06-10T03:24:31Z"}
{"level":"debug","msg":"Using bundle directory: /run/containerd/io.containerd.runtime.v2.task/moby/2978ebc37fded9a3b952a46eb0c9d3fa1186d3101c2da522fc06c96b35098a3a","time":"2025-06-10T03:24:31Z"}
{"level":"info","msg":"Using OCI specification file path: /run/containerd/io.containerd.runtime.v2.task/moby/2978ebc37fded9a3b952a46eb0c9d3fa1186d3101c2da522fc06c96b35098a3a/config.json","time":"2025-06-10T03:24:31Z"}
{"level":"debug","msg":"Is WSL-based system? false: could not load DXCore library: libdxcore.so: cannot open shared object file: No such file or directory","time":"2025-06-10T03:24:31Z"}
{"level":"debug","msg":"Is Tegra-based system? false: /sys/devices/soc0/family file not found","time":"2025-06-10T03:24:31Z"}
{"level":"debug","msg":"Is NVML-based system? true: found NVML library","time":"2025-06-10T03:24:31Z"}
{"level":"debug","msg":"Has only integrated GPUs? false: device "NVIDIA A10" does not use nvgpu module","time":"2025-06-10T03:24:32Z"}

4. Two questions to be solved

Question 1: What's wrong with my image general-vllm-infer-service:0.1.8? Why does it work using docker but not using containerd?

Question 2: The container specs are different. How does that happen? And how can I get the nvidia-container-toolkit log when using containerd?
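On question 2, to make the comparison reproducible, both generated OCI specs can be dumped and their hooks sections diffed. The commands below are a sketch (they assume jq is installed; the moby bundle ID is the one that appears in nvidia-container-runtime.log above):

```
# spec containerd generated for the ctr container named "test"
ctr containers info test | jq .Spec.hooks

# spec docker generated; the bundle path appears in nvidia-container-runtime.log
jq .hooks /run/containerd/io.containerd.runtime.v2.task/moby/<container-id>/config.json
```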
