NemoMegatron-aws-optimized docker missed libucc.so.1 #707

Open
@tangxianfeng

Description

The Docker image cannot import torch: line 23 of the Dockerfile removes hpcx, and it never seems to be installed again.

Error example:
python3 -c "import torch; print(torch.cuda.is_available())"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 368, in <module>
    from torch._C import *  # noqa: F403
ImportError: libucc.so.1: cannot open shared object file: No such file or directory
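
For anyone trying to reproduce this without importing torch, here is a minimal sketch that asks the dynamic loader directly whether a shared library can be opened. The helper name `missing_libraries` is hypothetical and not part of the image or of torch. The only library name taken from this report is `libucc.so.1`.

```python
import ctypes


def missing_libraries(names):
    """Return the subset of `names` that the dynamic loader cannot open.

    Hypothetical helper for illustration only.
    """
    missing = []
    for name in names:
        try:
            # Ask the dynamic loader to open the library by its file name,
            # using the normal search path (e.g. LD_LIBRARY_PATH on Linux).
            ctypes.CDLL(name)
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # libucc.so.1 is the library named in the ImportError above.
    for name in missing_libraries(["libucc.so.1"]):
        print(f"cannot load {name}")
```

If this reports `libucc.so.1` inside the container, the library is either absent or not on the loader's search path, which would be consistent with hpcx having been removed and not reinstalled.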

Labels: bug